Carbon Prices Forecasting Using Group Information

Xiaohang Ren; Kang Yuan; Lizhu Tao; Cheng Yan

I. Introduction

Greenhouse gas emissions now receive widespread global attention. Studies estimate that each ton of CO₂ emitted causes damage worth approximately US$50 (Revesz et al., 2017). To mitigate worsening environmental problems, the Kyoto Protocol, adopted in Kyoto, Japan, in 1997, turned the market for carbon emission rights into a reality. The carbon futures market is a derivative of the carbon spot market. In principle it serves price discovery and risk hedging functions, and it plays a decisive role in the rational allocation of carbon resources, risk reduction and climate change mitigation. The European Union Emissions Trading Scheme (EU ETS) is the earliest and largest carbon emissions trading market. This paper studies the international carbon futures market through EU carbon allowance (EUA) futures, the main carbon futures contract under the EU ETS.

Accurate and stable carbon price forecasts are important for policy making by government departments and decision making by market participants (Ren et al., 2022; Xiong et al., 2019). However, there are still relatively few studies on carbon price forecasting. For example, Benz and Truck (2009) use Markov switching and AR-GARCH models to forecast the EUA price, based on historical data. Chevallier (2009) considers seven macroeconomic variables as predictors of carbon prices and uses a variety of GARCH-type models for forecasting. Byun and Cho (2013) employ GARCH-type models to study the forecasting of carbon price volatility and include several energy futures prices as predictors. Hong et al. (2017) construct a multiple linear regression forecasting model to predict carbon prices using seven variables from the stock and energy markets.

The main contributions of this paper are as follows.

First, we introduce a large number of macroeconomic variables as predictors of the carbon price (EU carbon allowance futures). Existing studies typically include only a few exogenous predictors (Byun & Cho, 2013; Chevallier, 2009; Hong et al., 2017). Our 44 macroeconomic variables contain information on energy commodities, financial markets and economic activity in different countries, so the paper extends the literature on carbon price predictors.

Second, we forecast carbon prices with six statistical models: ARMAX, GARCHX, LASSO, Adaptive LASSO, Group SCAD and Group LASSO. Benz and Truck (2009) and Byun and Cho (2013) compare several traditional statistical models for carbon price prediction, but they do not consider high-dimensional statistical models. We compare the forecast accuracy of high-dimensional and traditional models and find that the high-dimensional models are generally better suited to carbon price prediction when the set of predictors is high-dimensional.

Finally, we compare the high-dimensional models with one another. Group high-dimensional models forecast carbon prices better when the predictors have a distinct group structure.

The rest of the paper is structured as follows. Section II presents the methodology and data, Section III reports the empirical results, and Section IV concludes.

II. Methodology and data

A. The candidate models

We extend the ARMA model (Box et al., 2015; Whittle, 1953) with exogenous inputs to obtain the ARMAX model:

\[ Y_{t}=c+\varepsilon_{t}+\sum_{i=1}^{p} \alpha_{i} Y_{t-i}+\sum_{i=1}^{q} \beta_{i} \varepsilon_{t-i}+\sum_{i=1}^{g} \gamma_{i} X_{t-i}, \tag{1} \]

where $\alpha_{i}$, $\beta_{i}$ and $\gamma_{i}$ are the coefficients on the lagged observations $Y_{t-i}$, the lagged white-noise errors $\varepsilon_{t-i}$ and the lagged exogenous covariates $X_{t-i}$, respectively, and $c$ is a constant.

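Specifically, we use the ARMAX (1,1,1) model to model and forecast the carbon price return $p_{t}$. This specification has one autoregressive lag, one moving-average lag and one lag of the exogenous covariate:

\[ p_{t}=c+\varepsilon_{t}+\alpha p_{t-1}+\beta \varepsilon_{t-1}+\gamma X_{t-1}. \tag{2} \]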
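To make the recursion in Eq. (2) concrete, here is a minimal Python sketch of a one-step-ahead ARMAX (1,1,1) forecast. It assumes the parameters $c$, $\alpha$, $\beta$ and $\gamma$ have already been estimated, and it does not show the estimation step. It also assumes the pre-sample error is zero. The function name `armax111_forecast` is hypothetical.

```python
def armax111_forecast(p, x, c, alpha, beta, gamma):
    """One-step-ahead forecast of p_{T+1} from Eq. (2) with known parameters.

    p: carbon returns p_1..p_T; x: exogenous covariate X_1..X_T (same length).
    """
    if len(p) != len(x) or not p:
        raise ValueError("p and x must be non-empty and of equal length")
    eps_prev = 0.0  # assumption: the pre-sample error is set to zero
    # Recover in-sample errors recursively:
    # eps_t = p_t - (c + alpha*p_{t-1} + beta*eps_{t-1} + gamma*X_{t-1})
    for t in range(1, len(p)):
        fitted = c + alpha * p[t - 1] + beta * eps_prev + gamma * x[t - 1]
        eps_prev = p[t] - fitted
    # The future shock eps_{T+1} has conditional mean zero, so it drops out
    return c + alpha * p[-1] + beta * eps_prev + gamma * x[-1]


# Usage with illustrative (made-up) parameter values
print(armax111_forecast([0.0, 0.1], [0.0, 0.0], c=0.0, alpha=0.0, beta=0.5, gamma=0.0))
```

The moving-average term is why the errors must be rebuilt recursively. The forecast depends on the last in-sample error, not only on observed data.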

As with the ARMAX model, we construct the GARCHX model as follows:

\[ y_{t}=\mu_{t}+\varepsilon_{t}, \quad \varepsilon_{t} \mid \Psi_{t-1} \sim N\left(0, \sigma_{t}^{2}\right), \tag{3} \]

\[ \sigma_{t}^{2}=\omega+\sum_{i=1}^{q} \alpha_{i} \varepsilon_{t-i}^{2}+\sum_{i=1}^{p} \beta_{i} \sigma_{t-i}^{2}+\sum_{i=1}^{n} \gamma_{i} X_{t-i}, \tag{4} \]

where $\Psi_{t-1}$ denotes the information set available at time $t-1$, $\varepsilon_{t}$ is the error term at time $t$, $\mu_{t}$ is the conditional mean of $y_{t}$, $\sigma_{t}^{2}$ is the conditional variance, and $X$ is the exogenous covariate. $\omega$, $\alpha_{i}$, $\beta_{i}$ and $\gamma_{i}$ are the corresponding parameters.

Specifically, we employ the GARCHX (1,1,1) model to forecast the carbon price return. The model is:

\[ p_{t}=\mu_{t}+\varepsilon_{t}, \quad \varepsilon_{t} \mid \Psi_{t-1} \sim N\left(0, \sigma_{t}^{2}\right), \tag{5} \]

\[ \sigma_{t}^{2}=\omega+\alpha \varepsilon_{t-1}^{2}+\beta \sigma_{t-1}^{2}+\gamma X_{t-1}. \tag{6} \]
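The conditional-variance recursion can be sketched in Python for given parameter values. This sketch uses the ARCH term $\varepsilon_{t-1}^{2}$ of Eq. (4) and leaves out estimation, which is usually done by maximum likelihood. The starting value and the function name `garchx11_variances` are assumptions made for illustration.

```python
def garchx11_variances(eps, x, omega, alpha, beta, gamma):
    """Conditional variances of a GARCHX(1,1,1) for given parameters.

    eps: residuals eps_1..eps_T (p_t - mu_t); x: covariate X_1..X_T.
    Returns [sigma2_1, ..., sigma2_T, sigma2_{T+1}]; the last entry is the
    one-step-ahead variance forecast.
    """
    if len(eps) != len(x) or not eps:
        raise ValueError("eps and x must be non-empty and of equal length")
    # assumption: start the recursion at the mean squared residual
    sigma2 = [sum(e * e for e in eps) / len(eps)]
    for t in range(1, len(eps) + 1):
        # sigma2_{t+1} = omega + alpha*eps_t^2 + beta*sigma2_t + gamma*X_t
        s2 = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1] + gamma * x[t - 1]
        # Note: the exogenous term can make s2 negative unless the
        # parameters are constrained during estimation
        sigma2.append(s2)
    return sigma2


print(garchx11_variances([0.1, -0.2], [0.0, 0.0], omega=0.01, alpha=0.5, beta=0.0, gamma=0.0))
```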

Following Tibshirani (1996), the LASSO estimator is constructed as follows:

\[ \hat{\beta}^{L A S S O}=\arg \min _{\beta}\left\{\frac{1}{N}\|Y-X \beta\|_{2}^{2}+\lambda \sum_{i=1}^{N}\left|\beta_{i}\right|\right\}, \tag{7} \]

where $\lambda \geq 0$ is the regularization parameter. The $\ell_{1}$ penalty $\sum_{i=1}^{N}\left|\beta_{i}\right|$ induces sparsity by shrinking some coefficients exactly to zero.

The carbon price prediction model with LASSO is:

\[ \hat{p}_{t+1}=\hat{\beta}_{0}^{LASSO}+\sum_{i=1}^{N} \hat{\beta}_{i}^{LASSO} x_{i, t}, \tag{8} \]

where the coefficients are estimated from the data available at time $t$,

\[ \begin{aligned} \hat{\beta}^{LASSO}=\arg \min _{\beta}\bigg\{&\frac{1}{t-1} \sum_{l=1}^{t-1}\left(p_{l+1}-\beta_{0}-\sum_{i=1}^{N} \beta_{i} x_{i, l}\right)^{2} \\ &+\lambda_{cv} \sum_{i=1}^{N}\left|\beta_{i}\right|\bigg\}, \end{aligned} \tag{9} \]

and $\lambda_{cv}$ is the regularization parameter selected by cross-validation.
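Cyclic coordinate descent is one standard way to solve a problem of the form in Eqs. (7) and (9). The Python sketch below shows it with an unpenalized intercept. The paper does not say which solver it uses, and $\lambda$ is fixed here rather than cross-validated. Because the squared loss has no factor of 1/2, each coordinate update soft-thresholds at $\lambda/2$. The names `soft_threshold` and `lasso_cd` are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=500):
    """Cyclic coordinate descent for
    (1/n) * sum_l (y_l - b0 - sum_i b_i X[l][i])^2 + lam * sum_i |b_i|,
    with an unpenalized intercept b0. X is a list of rows."""
    n, p = len(y), len(X[0])
    b0, b = 0.0, [0.0] * p
    # (1/n) * sum_l X[l][j]^2 for each predictor j
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        # Intercept update: mean of y minus the current linear predictor
        b0 = sum(y[l] - sum(X[l][k] * b[k] for k in range(p)) for l in range(n)) / n
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # all-zero column: coefficient stays at zero
            # rho = (1/n) * sum_l X[l][j] * (partial residual excluding predictor j)
            rho = sum(
                X[l][j] * (y[l] - b0 - sum(X[l][k] * b[k] for k in range(p) if k != j))
                for l in range(n)
            ) / n
            # Exact minimizer of the one-dimensional subproblem
            b[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return b0, b


# Toy usage (made-up data): y = 1 + 2*x1 - x2 without noise
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1 + 2 * r[0] - r[1] for r in X]
print(lasso_cd(X, y, lam=0.0))
```

A forecast in the sense of Eq. (8) is then `b0 + sum(bi * xi for bi, xi in zip(b, x_t))` for the latest predictor vector `x_t`.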

Following Zou (2006), the Adaptive LASSO estimator is

\[ \hat{\beta}^{adapt}=\arg \min _{\beta}\left\{\frac{1}{N}\|Y-X \beta\|_{2}^{2}+\lambda \sum_{i=1}^{N} \frac{\left|\beta_{i}\right|}{\left|\hat{\beta}_{init, i}\right|}\right\}, \tag{10} \]

where $\hat{\beta}_{init, i}$ is an initial estimate of $\beta_{i}$, so that predictors with small initial estimates are penalized more heavily. The carbon price prediction model with Adaptive LASSO is

\[ \hat{p}_{t+1}=\hat{\beta}_{0}^{adapt}+\sum_{i=1}^{N} \hat{\beta}_{i}^{adapt} x_{i, t}, \tag{11} \]

where

\[ \begin{aligned} \hat{\beta}^{adapt}=\arg \min _{\beta}\bigg\{&\frac{1}{t-1} \sum_{l=1}^{t-1}\left(p_{l+1}-\beta_{0}-\sum_{i=1}^{N} \beta_{i} x_{i, l}\right)^{2} \\ &+\lambda_{cv} \sum_{i=1}^{N} \frac{\left|\beta_{i}\right|}{\left|\hat{\beta}_{init, i}\right|}\bigg\}. \end{aligned} \tag{12} \]
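The only change from the plain LASSO is a predictor-specific weight $1/|\hat{\beta}_{init,i}|$ on the penalty. Coordinate descent therefore uses a threshold of $\lambda/(2|\hat{\beta}_{init,i}|)$ for predictor $i$. Below is a minimal Python sketch under the same assumptions as a standard coordinate-descent LASSO: fixed $\lambda$, an unpenalized intercept, and a user-supplied `beta_init`. A zero initial estimate is treated as an infinite weight. The function name is hypothetical.

```python
def adaptive_lasso_cd(X, y, lam, beta_init, n_iter=500):
    """Weighted coordinate descent for
    (1/n) * ||y - b0 - X b||^2 + lam * sum_i |b_i| / |beta_init[i]|.
    A zero initial estimate gives an infinite weight, so b_i stays at zero."""
    n, p = len(y), len(X[0])
    b0, b = 0.0, [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        # Unpenalized intercept update
        b0 = sum(y[l] - sum(X[l][k] * b[k] for k in range(p)) for l in range(n)) / n
        for j in range(p):
            if beta_init[j] == 0.0 or col_sq[j] == 0.0:
                b[j] = 0.0
                continue
            rho = sum(
                X[l][j] * (y[l] - b0 - sum(X[l][k] * b[k] for k in range(p) if k != j))
                for l in range(n)
            ) / n
            # Predictor-specific threshold: lam/2 scaled by 1/|beta_init_j|
            thresh = lam / (2.0 * abs(beta_init[j]))
            if rho > thresh:
                b[j] = (rho - thresh) / col_sq[j]
            elif rho < -thresh:
                b[j] = (rho + thresh) / col_sq[j]
            else:
                b[j] = 0.0
    return b0, b


# Toy usage: a tiny initial estimate for the second predictor removes it
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1 + 2 * r[0] - r[1] for r in X]
print(adaptive_lasso_cd(X, y, lam=0.1, beta_init=[2.0, 1e-6]))
```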

B. Group method

In practice, the coefficient vector of a high-dimensional regression model often has a group structure. The index set $\{1,2, \ldots, p\}$ of the predictors is partitioned into $q$ groups $\left\{g_{1}, g_{2}, \ldots, g_{q}\right\}$, that is, $\bigcup_{j=1}^{q} g_{j}=\{1,2, \ldots, p\}$ and $g_{j} \cap g_{k}=\varnothing$ for $j \neq k$. We combine this group structure with two penalized methods, LASSO and SCAD, and evaluate their performance in forecasting carbon prices.

Following Yuan and Lin (2006), the group LASSO estimator is

\[ \hat{\beta}^{gLASSO}=\arg \min _{\beta}\left\{\frac{1}{N}\|Y-X \beta\|_{2}^{2}+\lambda \sum_{j=1}^{q} m_{j}\left\|\beta_{g_{j}}\right\|_{2}\right\}, \tag{13} \]

where $\beta_{g_{j}}$ is the sub-vector of coefficients in group $g_{j}$, and $\left\|\beta_{g_{j}}\right\|_{2}=\left(\sum_{l=1}^{T_{j}} \beta_{g_{j}, l}^{2}\right)^{1/2}$ is its Euclidean norm. The weight $m_{j}$ adjusts the penalty for differences in group size and is usually set to

\[ m_{j}=\sqrt{T_{j}}, \tag{14} \]

where $T_{j}$ is the number of parameters in the $j$th group.
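The group penalty in Eqs. (13) and (14) can be evaluated directly. The Python sketch below also checks that the groups partition the coefficient indices. The group memberships are a hypothetical toy example. They are not the paper's five groups (energy, energy prices, stock indices, monetary policy and economic information), whose exact composition is not listed here.

```python
import math


def group_lasso_penalty(beta, groups):
    """Sum over groups of m_j * ||beta_{g_j}||_2 with m_j = sqrt(T_j).

    groups: list of index lists that must partition range(len(beta)).
    """
    flat = sorted(i for g in groups for i in g)
    # Partition check: every index appears exactly once (disjoint, covering)
    if flat != list(range(len(beta))):
        raise ValueError("groups must partition the coefficient indices")
    total = 0.0
    for g in groups:
        m_j = math.sqrt(len(g))  # T_j = number of parameters in group j
        norm = math.sqrt(sum(beta[i] ** 2 for i in g))
        total += m_j * norm
    return total


# Hypothetical toy example: predictors 0-1 form one group, predictor 2 another
print(group_lasso_penalty([3.0, 4.0, 0.0], [[0, 1], [2]]))
```

Because the penalty depends on each group only through its Euclidean norm, it encourages a whole group to be zero or non-zero together.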

The carbon price prediction model with Group LASSO is

\[ \hat{p}_{t+1}=\hat{\beta}_{0}^{gLASSO}+\sum_{i=1}^{N} \hat{\beta}_{i}^{gLASSO} x_{i, t}, \tag{15} \]

where

\[ \begin{aligned} \hat{\beta}^{gLASSO}=\arg \min _{\beta}\bigg\{&\frac{1}{t-1} \sum_{l=1}^{t-1}\left(p_{l+1}-\beta_{0}-\sum_{i=1}^{N} \beta_{i} x_{i, l}\right)^{2} \\ &+\lambda_{cv} \sum_{j=1}^{q} m_{j}\left\|\beta_{g_{j}}\right\|_{2}\bigg\}. \end{aligned} \tag{16} \]

The group SCAD estimator is

\[ \hat{\beta}^{gSCAD}=\arg \min _{\beta}\left\{\frac{1}{N}\|Y-X \beta\|_{2}^{2}+\sum_{j=1}^{q} P_{\lambda}\left(\left\|\beta_{g_{j}}\right\|_{2}\right)\right\}, \tag{17} \]

where $\beta_{g_{j}}$ collects the coefficients in group $g_{j}$, and $P_{\lambda}(\cdot)$ is the SCAD penalty with tuning constant $a>2$:

\[ P_{\lambda}(|x|)=\begin{cases} \lambda|x|, & \text{if } |x| \leq \lambda, \\ -\dfrac{|x|^{2}-2 a \lambda|x|+\lambda^{2}}{2(a-1)}, & \text{if } \lambda<|x| \leq a \lambda, \\ \dfrac{(a+1) \lambda^{2}}{2}, & \text{if } |x|>a \lambda. \end{cases} \tag{18} \]
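A direct Python transcription of Eq. (18) shows how SCAD works in three regimes. It behaves like the LASSO penalty near zero, blends through a quadratic piece, and is constant for large arguments, so large coefficients are not shrunk further. The group version applies this penalty to each group's Euclidean norm. The paper does not report $a$. The default $a = 3.7$ below is an assumption, although it is a common choice in the SCAD literature.

```python
import math


def scad_penalty(x, lam, a=3.7):
    """SCAD penalty P_lambda(|x|) of Eq. (18); requires a > 2."""
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    t = abs(x)
    if t <= lam:
        return lam * t  # linear (LASSO-like) part near zero
    if t <= a * lam:
        # quadratic blend joining the two outer pieces continuously
        return -(t ** 2 - 2 * a * lam * t + lam ** 2) / (2 * (a - 1))
    return (a + 1) * lam ** 2 / 2  # constant: no extra shrinkage of large values


def group_scad_penalty(beta, groups, lam, a=3.7):
    """Penalty term of Eq. (17): SCAD applied to each group's Euclidean norm."""
    return sum(scad_penalty(math.sqrt(sum(beta[i] ** 2 for i in g)), lam, a) for g in groups)


print([scad_penalty(v, lam=1.0) for v in (0.5, 1.0, 2.0, 3.7, 10.0)])
```

Evaluating at $|x| = \lambda$ and $|x| = a\lambda$ confirms that the pieces of Eq. (18) join continuously.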

The carbon price prediction model with Group SCAD is

\[ \hat{p}_{t+1}=\hat{\beta}_{0}^{gSCAD}+\sum_{i=1}^{N} \hat{\beta}_{i}^{gSCAD} x_{i, t}, \tag{19} \]

where

\[ \begin{aligned} \hat{\beta}^{gSCAD}=\arg \min _{\beta}\bigg\{&\frac{1}{t-1} \sum_{l=1}^{t-1}\left(p_{l+1}-\beta_{0}-\sum_{i=1}^{N} \beta_{i} x_{i, l}\right)^{2} \\ &+\sum_{j=1}^{q} P_{\lambda_{cv}}\left(\left\|\beta_{g_{j}}\right\|_{2}\right)\bigg\}. \end{aligned} \tag{20} \]

C. Out-of-sample Comparisons

Evaluating a model on the same data used to estimate it tends to reward over-fitting, so we assess predictive power out of sample. Specifically, we split the sample into an in-sample period and an out-of-sample period. Each model is estimated on the in-sample data, and its forecasts are compared with the realized values in the out-of-sample period.
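The paper does not state whether the models are re-estimated as the forecast origin moves. Eqs. (9), (12), (16) and (20) index the estimation sample by $t$, which suggests a recursive scheme. The Python sketch below therefore assumes an expanding window, in which each forecast uses only observations available at its origin. `fit_predict` stands in for any of the candidate models, and `historical_mean` is a hypothetical toy forecaster.

```python
def expanding_window_forecasts(y, m, fit_predict):
    """Recursive (expanding-window) out-of-sample forecasts.

    y: full series; m: number of in-sample observations.
    fit_predict(history) -> forecast of the next value, using only `history`.
    Returns (actuals, forecasts) for the out-of-sample periods m+1..T.
    """
    if not 0 < m < len(y):
        raise ValueError("need 0 < m < len(y)")
    actuals, forecasts = [], []
    for t in range(m, len(y)):
        history = y[:t]  # information available at the forecast origin
        forecasts.append(fit_predict(history))
        actuals.append(y[t])
    return actuals, forecasts


def historical_mean(history):
    """Hypothetical toy forecaster used only for illustration."""
    return sum(history) / len(history)


print(expanding_window_forecasts([1.0, 2.0, 3.0, 4.0], 2, historical_mean))
```

Passing the model as a function keeps the evaluation loop identical across the benchmark and candidate models.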

Following Campbell and Thompson (2008) and the related literature, we use four measures of out-of-sample performance. These are the mean squared prediction error (MSPE), the mean absolute prediction error (MAPE), the out-of-sample $R^{2}$ statistic based on MSPE ($R^{2}_{MSPE}$), and the out-of-sample $R^{2}$ statistic based on the absolute prediction error ($R^{2}_{MAPE}$).

$R^{2}_{MSPE}$ is defined as follows:

\[ R_{M S P E}^{2}=1-\frac{M S P E_{C}}{M S P E_{B}}, \tag{21} \]

and

\[ M S P E_{B}=\frac{1}{q} \sum_{i=1}^{q}\left(r_{m+i}-\hat{r}_{m+i}^{B}\right)^{2}, \tag{22} \]

\[ M S P E_{C}=\frac{1}{q} \sum_{i=1}^{q}\left(r_{m+i}-\hat{r}_{m+i}^{C}\right)^{2}, \tag{23} \]

where $MSPE_{B}$ and $MSPE_{C}$ are the MSPEs of the benchmark and candidate models, respectively, $m$ is the number of in-sample observations and $q$ is the number of out-of-sample forecasts. $r_{m+i}$ denotes the realized value at time $m+i$. $\hat{r}_{m+i}^{B}$ and $\hat{r}_{m+i}^{C}$ denote the out-of-sample forecasts of the benchmark and candidate models for time $m+i$, respectively.

Likewise, $R^{2}_{MAPE}$ is defined as

\[ R_{M A P E}^{2}=1-\frac{M A P E_{C}}{M A P E_{B}}, \tag{24} \]

and

\[ M A P E_{B}=\frac{1}{q} \sum_{i=1}^{q}\left|r_{m+i}-\hat{r}_{m+i}^{B}\right|, \tag{25} \]

\[ M A P E_{C}=\frac{1}{q} \sum_{i=1}^{q}\left|r_{m+i}-\hat{r}_{m+i}^{C}\right|, \tag{26} \]

where $MAPE_{B}$ and $MAPE_{C}$ are the MAPEs of the benchmark and candidate models, respectively, and $r_{m+i}$, $\hat{r}_{m+i}^{B}$ and $\hat{r}_{m+i}^{C}$ are defined as in Eqs. (22) and (23).
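The four evaluation measures translate directly into code. In this Python sketch, MAPE is the mean *absolute* prediction error defined in Eqs. (25) and (26). It is not the percentage error that the same acronym often denotes elsewhere. The function names and the toy data are hypothetical.

```python
def _check(actual, pred):
    if len(actual) != len(pred) or not actual:
        raise ValueError("actual and pred must be non-empty and of equal length")


def mspe(actual, pred):
    """Mean squared prediction error over the q out-of-sample periods, Eqs. (22)-(23)."""
    _check(actual, pred)
    return sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)


def mape(actual, pred):
    """Mean absolute prediction error, Eqs. (25)-(26); not a percentage error."""
    _check(actual, pred)
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def oos_r2(candidate_error, benchmark_error):
    """Out-of-sample R^2 of Eqs. (21) and (24): positive if the candidate beats the benchmark."""
    return 1.0 - candidate_error / benchmark_error


# Toy usage with made-up returns and forecasts
actual, bench, cand = [0.1, -0.2], [0.0, 0.0], [0.1, -0.1]
print(oos_r2(mspe(actual, cand), mspe(actual, bench)), oos_r2(mape(actual, cand), mape(actual, bench)))
```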

D. Data

Our target is the EU carbon futures price. We use the monthly closing prices of the ICE ECX EUA futures continuous contract. Carbon futures returns, the dependent variable in this study, are the log differences of consecutive month-end closing prices. The sample period runs from January 2011 to December 2020, and the data are obtained from DataStream. Table 1 reports descriptive statistics for the carbon futures returns.
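The return transformation amounts to $r_{t}=\ln P_{t}-\ln P_{t-1}$, where $P_{t}$ is the month-end closing price. The same transformation is applied to the macroeconomic predictors. A minimal Python sketch follows, with made-up prices. A price series with $T+1$ observations yields $T$ returns.

```python
import math


def log_returns(prices):
    """r_t = ln(P_t) - ln(P_{t-1}) for a series of positive prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    return [math.log(prices[t]) - math.log(prices[t - 1]) for t in range(1, len(prices))]


print(log_returns([10.0, 11.0, 9.9]))  # made-up month-end prices
```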

The positive mean monthly return indicates that carbon prices trended upward over the sample period. The negative skewness means that the return series is left-skewed, and the kurtosis value indicates that the return series is platykurtic. The ADF and Jarque-Bera tests indicate that the carbon futures returns are stationary and non-normal.

We use 44 macroeconomic variables as predictors. They cover energy markets, financial markets and economic fundamentals, all of which may carry information about future carbon prices. These data are also obtained from DataStream. We further classify the 44 variables into five groups: energy, energy prices, stock indices, monetary policy and economic information. Each group contains the same type of indicator for different regions. To ensure stationarity, we use log returns of all macroeconomic variables.

Table 1. Descriptive statistics for the carbon price return data

Mean	Stdev	Skewness	Kurtosis	ADF	Jarque-Bera
0.0065	0.1373	-0.6408	1.5527	-4.3550***	0.0000***

Notes: This table reports summary statistics for the response variable, the monthly carbon price returns, over the sample period from January 2011 to December 2020. ADF reports the Augmented Dickey-Fuller test statistic under the null hypothesis of non-stationarity. Jarque-Bera reports the p-value of the Jarque-Bera test under the null hypothesis of normality. ***, ** and * denote statistical significance at the 1%, 5% and 10% levels, respectively.
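The skewness, kurtosis and Jarque-Bera entries in Table 1 come from standard moment formulas. The Python sketch below uses population (biased) moments and reports *excess* kurtosis, $m_4/m_2^2-3$. Software packages differ in both respects, and Table 1 does not state which convention it follows, so this is only an illustration. Under normality the Jarque-Bera statistic is asymptotically $\chi^{2}_{2}$, whose survival function is $e^{-x/2}$. The function name is hypothetical.

```python
import math


def moment_stats(x):
    """Skewness, excess kurtosis, Jarque-Bera statistic and its asymptotic p-value."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square(2) upper tail
    return skew, excess_kurt, jb, p_value


print(moment_stats([-1.0, 0.0, 1.0]))  # toy data
```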

III. Empirical results

A. Baseline Out-of-sample Forecasting Results

Table 2 reports the out-of-sample forecasting performance of each model, estimated on the in-sample data, using two evaluation criteria, MSPE and MAPE. By construction, smaller MSPE and MAPE values indicate more accurate forecasts, and larger values indicate larger forecast errors. Results for the six-month and twelve-month forecasts are reported in Table 2(a) and Table 2(b), respectively.

Table 2 shows that, in the six-month forecast, the high-dimensional LASSO and Adaptive LASSO models generally outperform the traditional time-series models, ARMAX (1,1,1) and GARCHX (1,1,1). The one exception is that Adaptive LASSO has a higher MAPE than GARCHX (1,1,1) (0.1484 versus 0.1358). In the twelve-month forecast, however, LASSO and Adaptive LASSO outperform ARMAX (1,1,1) but underperform GARCHX (1,1,1).

Table 2. The MSPE and MAPE in the six-month and twelve-month forecasts

Forecasting model	(a) Six-month forecast		(b) Twelve-month forecast
	MSPE	MAPE	MSPE	MAPE
ARMAX (1,1,1)	0.1057	0.2692	0.1701	0.3057
GARCHX (1,1,1)	0.0321	0.1358	0.0264	0.1333
LASSO	0.0122	0.0941	0.0468	0.1476
Adaptive LASSO	0.0222	0.1484	0.0444	0.146
Group SCAD	0.0115	0.0942	0.0153	0.1086
Group LASSO	0.0089	0.0816	0.0157	0.1079

Notes: Table 2 reports the MSPE and MAPE of the six-month and twelve-month forecasts. Smaller MSPE and MAPE values indicate higher forecast accuracy.

Among the four high-dimensional models, the two group models (Group SCAD and Group LASSO) generally achieve smaller values of both evaluation criteria than LASSO and Adaptive LASSO. The only exception is that Group SCAD has a marginally higher MAPE than LASSO in the six-month forecast (0.0942 versus 0.0941). In the twelve-month forecast, the results are fully consistent with our expectations.

B. Comparison of the predictive performance between different models

Table 2 suggests that the high-dimensional models (LASSO, Adaptive LASSO, Group SCAD and Group LASSO) generally forecast carbon prices better than the traditional time-series models (ARMAX (1,1,1) and GARCHX (1,1,1)) when a large number of predictors is available. This motivates a more detailed comparison of the high-dimensional models. Specifically, we use each traditional time-series model as the benchmark and each high-dimensional model as a candidate. We evaluate each candidate by its $R^{2}_{MSPE}$ and $R^{2}_{MAPE}$ relative to the benchmark. Larger values indicate stronger predictive performance relative to the benchmark. A positive value means that the candidate model forecasts better than the benchmark, and a negative value means that it forecasts worse.

Table 3 reports the $R^{2}_{MSPE}$ and $R^{2}_{MAPE}$ values obtained with ARMAX (1,1,1) and GARCHX (1,1,1) as the benchmark. Table 3(a) uses ARMAX (1,1,1) as the benchmark model, and Table 3(b) uses GARCHX (1,1,1). Panels A and B report the six-month and twelve-month forecasts, respectively, and the first column identifies the candidate model.

Table 3 shows that the $R^{2}_{MSPE}$ and $R^{2}_{MAPE}$ values of the group high-dimensional models are positive in both the six-month and the twelve-month forecasts, whether the benchmark is ARMAX (1,1,1) or GARCHX (1,1,1). Hence, the group high-dimensional models forecast carbon prices out of sample more accurately than the traditional time-series models in both tests.

Table 3. Out-of-sample $R^{2}_{MSPE}$ and $R^{2}_{MAPE}$ of the candidate models relative to the benchmark models

Forecasting model	(a) ARMAX (1,1,1) as the benchmark model		(b) GARCHX (1,1,1) as the benchmark model
	$R^{2}_{MSPE}$	$R^{2}_{MAPE}$	$R^{2}_{MSPE}$	$R^{2}_{MAPE}$
Panel A: Six-month forecast
LASSO	0.8845	0.6504	0.6199	0.3070
Adaptive LASSO	0.7899	0.4487	0.3084	-0.0927
Group SCAD	0.8912	0.6500	0.6417	0.3063
Group LASSO	0.9157	0.6968	0.7227	0.3991

Panel B: Twelve-month forecast
LASSO	0.7248	0.5171	-0.7727	-0.1072
Adaptive LASSO	0.7389	0.5224	-0.6818	-0.0952
Group SCAD	0.9100	0.6447	0.4204	0.1852
Group LASSO	0.9077	0.6470	0.4053	0.1905

Notes: Table 3 reports $R^{2}_{MSPE}$ and $R^{2}_{MAPE}$ for the six-month and twelve-month forecasts. Table 3(a) and 3(b) show the results when the benchmark model is ARMAX (1,1,1) and GARCHX (1,1,1), respectively. The larger the positive value of $R^{2}_{MSPE}$ and $R^{2}_{MAPE}$, the better the predictive ability of the candidate model relative to the corresponding benchmark.
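Each $R^{2}$ depends only on the candidate's and the benchmark's error measures, so every entry of Table 3 can be recomputed from Table 2. For example, LASSO relative to GARCHX (1,1,1) in the twelve-month forecast gives $1-0.0468/0.0264\approx-0.7727$. The Python sketch below does this for all cases. Small last-digit differences can arise because Table 2 reports rounded values. The dictionary and function names are hypothetical.

```python
# (MSPE, MAPE) pairs copied from Table 2
table2 = {
    "six-month": {
        "ARMAX": (0.1057, 0.2692), "GARCHX": (0.0321, 0.1358),
        "LASSO": (0.0122, 0.0941), "Adaptive LASSO": (0.0222, 0.1484),
        "Group SCAD": (0.0115, 0.0942), "Group LASSO": (0.0089, 0.0816),
    },
    "twelve-month": {
        "ARMAX": (0.1701, 0.3057), "GARCHX": (0.0264, 0.1333),
        "LASSO": (0.0468, 0.1476), "Adaptive LASSO": (0.0444, 0.146),
        "Group SCAD": (0.0153, 0.1086), "Group LASSO": (0.0157, 0.1079),
    },
}


def r2_table(errors, benchmark):
    """(R^2_MSPE, R^2_MAPE) of each high-dimensional model relative to `benchmark`."""
    b_mspe, b_mape = errors[benchmark]
    return {
        model: (round(1 - mspe / b_mspe, 4), round(1 - mape / b_mape, 4))
        for model, (mspe, mape) in errors.items()
        if model not in ("ARMAX", "GARCHX")
    }


for horizon, errors in table2.items():
    for bench in ("ARMAX", "GARCHX"):
        print(horizon, bench, r2_table(errors, bench))
```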

We also consider the performance of the LASSO and Adaptive LASSO models. With ARMAX (1,1,1) as the benchmark, their $R^{2}_{MSPE}$ and $R^{2}_{MAPE}$ values are positive in both tests. With GARCHX (1,1,1) as the benchmark, the six-month $R^{2}_{MSPE}$ values of both models are positive, as is the $R^{2}_{MAPE}$ of LASSO. The six-month $R^{2}_{MAPE}$ of Adaptive LASSO, however, is slightly negative, because its MAPE of 0.1484 exceeds the 0.1358 of GARCHX (1,1,1) in Table 2. Notably, both LASSO and Adaptive LASSO have negative $R^{2}_{MSPE}$ and $R^{2}_{MAPE}$ relative to GARCHX (1,1,1) in the twelve-month forecast. High-dimensional models therefore do not outperform traditional time-series models in carbon price forecasting in all cases.

Finally, we compare the $R^{2}_{MSPE}$ and $R^{2}_{MAPE}$ values of the group high-dimensional models with those of the other high-dimensional models. In most cases the group models have larger values. The exception is the six-month forecast, where the $R^{2}_{MAPE}$ values of Group SCAD are marginally below those of LASSO. In particular, the $R^{2}_{MSPE}$ and $R^{2}_{MAPE}$ values of Group LASSO exceed those of LASSO and Adaptive LASSO in every case. Group LASSO is therefore the best carbon price forecaster among these three models under our forecasting conditions. This suggests that exploiting the group structure of the predictors improves the forecasting models in our study, a result also supported by Table 2.

IV. Conclusion

This study uses monthly data from 2011 to 2020 on EU carbon allowance futures returns and 44 macroeconomic indicators to forecast carbon prices with six statistical models. In particular, we propose using two group high-dimensional models for carbon price forecasting. The results show that the high-dimensional models generally forecast carbon prices better than the traditional time-series models, suggesting that incorporating a broad set of macroeconomic factors is useful for carbon price prediction. Moreover, among the high-dimensional models, those that exploit the group structure of the predictors outperform the others.

Our findings have important implications for policymakers, investors and other stakeholders, because they highlight the predictive content of macroeconomic factors for carbon price dynamics. Policymakers can improve the effectiveness of intervention policies by drawing on the key predictors and on expected carbon prices. Effective carbon price forecasts can also help investors diversify their portfolios and manage risk.
