Best procedure to select predictors and ARIMA model terms #68

maxdevblock · 2026-01-05T10:01:45Z


I'm looking for a valid procedure to select the predictors and the $\text{ARIMA}(p,d,q)(P,D,Q)[S]$ model terms in a forecasting framework.

I have a dataset from $t=1$ to $t=T$, with a target variable and $p$ predictors. I want to find the best subset of $k \leq p$ predictors and the $(p,d,q)(P,D,Q)$ ARIMA terms that minimize the prediction error up to $t=T+h$ (with $S=12$).
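To make the target concrete, here is a minimal plain-Python sketch of one way to score "prediction error up to $t=T+h$". It assumes forecast paths of length $h$ have been produced from several forecast origins and compared with the realised values; the function name `rmse_by_horizon` and the pooling over origins are illustrative assumptions, not part of the question.

```python
import math
from typing import List, Sequence, Tuple


def rmse_by_horizon(
    actual_paths: Sequence[Sequence[float]],
    forecast_paths: Sequence[Sequence[float]],
) -> Tuple[List[float], float]:
    """Return (RMSE for each horizon 1..h, RMSE pooled over all horizons).

    Each element of actual_paths / forecast_paths holds the h values that follow
    one forecast origin; every path must have the same length h.
    """
    if not actual_paths or len(actual_paths) != len(forecast_paths):
        raise ValueError("need the same, non-zero number of actual and forecast paths")
    h = len(actual_paths[0])
    if h == 0:
        raise ValueError("paths must contain at least one horizon")
    sq_by_h: List[List[float]] = [[] for _ in range(h)]  # squared errors grouped by horizon
    for act, fc in zip(actual_paths, forecast_paths):
        if len(act) != h or len(fc) != h:
            raise ValueError("all paths must have length h")
        for j, (a, f) in enumerate(zip(act, fc)):
            sq_by_h[j].append((a - f) ** 2)
    per_h = [math.sqrt(sum(s) / len(s)) for s in sq_by_h]
    pooled = math.sqrt(sum(sum(s) for s in sq_by_h) / (h * len(actual_paths)))
    return per_h, pooled
```

The first entry of the per-horizon list is the one-step RMSE; the later entries show how accuracy degrades with the horizon, which is what a multi-step criterion is sensitive to.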

The first procedure I thought about is:

1. For each of the $2^p$ predictor combinations:
   a. find the best $(p,d,q)(P,D,Q)$ ARIMA model by minimizing the corrected Akaike Information Criterion (AICc) over the full dataset $t=1,\ldots,T$, using the Hyndman-Khandakar algorithm;
   b. split the dataset into training and test sets, so that the training set spans $t=1,\ldots,\tau$ and the test set spans $t=\tau+1,\ldots,T$ with $\tau < T$, for $u$ values of $\tau$;
   c. for each train-test split, refit the model (re-estimating the parameters) and calculate the test-RMSE;
   d. average the test-RMSE over the $u$ splits.
2. Among the $2^p$ models resulting from point (1), choose the one minimizing the average test-RMSE.
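A minimal plain-Python sketch of the first procedure, to make the loop structure explicit. It assumes two hypothetical callbacks that are not part of any particular library: `select` stands in for an automatic AICc-based order search (for example a Hyndman-Khandakar implementation), and `refit` re-estimates a regression with ARIMA errors for fixed orders on the training span and returns forecasts. The time index $t=1,\ldots,T$ maps to Python positions `0..T-1`, so the training span $1,\ldots,\tau$ is `y[:tau]`.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Series = Sequence[float]
Orders = Tuple[int, int, int, int, int, int]  # (p, d, q, P, D, Q); S = 12 is fixed elsewhere

# Hypothetical callback types: plug in real ARIMA tooling here.
# select(y, X) -> orders chosen by AICc on the data it is given
SelectFn = Callable[[Series, Dict[str, Series]], Orders]
# refit(y_train, X_train, orders, X_future, horizon) -> list of `horizon` forecasts
RefitFn = Callable[[Series, Dict[str, Series], Orders, Dict[str, Series], int], List[float]]


def all_subsets(names: Sequence[str]) -> List[Tuple[str, ...]]:
    """All 2**len(names) predictor combinations, including the empty one."""
    return [c for k in range(len(names) + 1) for c in combinations(names, k)]


def procedure_one(y: Series, X: Dict[str, Series], taus: Sequence[int],
                  select: SelectFn, refit: RefitFn) -> Tuple[Tuple[str, ...], Orders, float]:
    """Return (best predictor subset, its ARIMA orders, its average test-RMSE)."""
    T = len(y)
    results = []
    for subset in all_subsets(sorted(X)):
        Xs = {name: X[name] for name in subset}
        # (a) orders chosen once, by AICc on the full sample t = 1..T
        orders = select(y, Xs)
        rmses = []
        for tau in taus:  # (b) u splits: train t = 1..tau, test t = tau+1..T
            if not 0 < tau < T:
                raise ValueError("each tau must satisfy 0 < tau < T")
            X_tr = {n: v[:tau] for n, v in Xs.items()}
            X_te = {n: v[tau:] for n, v in Xs.items()}
            # (c) re-estimate the parameters on the training span, orders held fixed
            fc = refit(y[:tau], X_tr, orders, X_te, T - tau)
            sq = [(a - f) ** 2 for a, f in zip(y[tau:], fc)]
            rmses.append(math.sqrt(sum(sq) / len(sq)))
        # (d) average test-RMSE over the u splits
        results.append((subset, orders, sum(rmses) / len(rmses)))
    # step 2: keep the combination with the smallest average test-RMSE
    return min(results, key=lambda r: r[2])
```

Each split's RMSE pools every horizon from $\tau+1$ to $T$, following step (b); restricting the test window to the first $h$ observations after $\tau$ would be a natural variant if only horizons up to $h$ matter.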

A second procedure (more difficult to implement) could be:

1. For each of the $2^p$ predictor combinations:
   a. split the dataset into training and test sets, so that the training set spans $t=1,\ldots,\tau$ and the test set spans $t=\tau+1,\ldots,T$ with $\tau < T$, for $u$ values of $\tau$;
   b. use an ad-hoc algorithm, similar to Hyndman-Khandakar but minimizing the average test-RMSE, to find the best $(p,d,q)(P,D,Q)$ ARIMA terms.
2. Among the $2^p$ models resulting from point (1), choose the one minimizing the average test-RMSE.
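A comparable sketch of the second procedure, in which the orders themselves are chosen by average test-RMSE. As a simplifying assumption it scores an explicit list of candidate orders exhaustively instead of running a stepwise, Hyndman-Khandakar-like search. `refit` is a hypothetical callback that re-estimates a regression with ARIMA errors for the given orders on $t=1,\ldots,\tau$ and returns forecasts for the remaining $T-\tau$ periods.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Series = Sequence[float]
Orders = Tuple[int, int, int, int, int, int]  # (p, d, q, P, D, Q); S = 12 fixed elsewhere
# Hypothetical: refit(y_train, X_train, orders, X_future, horizon) -> forecasts
RefitFn = Callable[[Series, Dict[str, Series], Orders, Dict[str, Series], int], List[float]]


def avg_test_rmse(y: Series, Xs: Dict[str, Series], orders: Orders,
                  taus: Sequence[int], refit: RefitFn) -> float:
    """Average RMSE over splits with train t = 1..tau and test t = tau+1..T."""
    T = len(y)
    rmses = []
    for tau in taus:
        fc = refit(y[:tau], {n: v[:tau] for n, v in Xs.items()}, orders,
                   {n: v[tau:] for n, v in Xs.items()}, T - tau)
        sq = [(a - f) ** 2 for a, f in zip(y[tau:], fc)]
        rmses.append(math.sqrt(sum(sq) / len(sq)))
    return sum(rmses) / len(rmses)


def procedure_two(y: Series, X: Dict[str, Series], taus: Sequence[int],
                  candidates: Sequence[Orders], refit: RefitFn):
    """Return (best subset, best orders, average test-RMSE) over subsets x candidates."""
    best: Optional[Tuple[Tuple[str, ...], Orders, float]] = None
    names = sorted(X)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):  # 2**len(names) predictor combinations
            Xs = {n: X[n] for n in subset}
            # (b) choose the orders for this subset directly by average test-RMSE
            for orders in candidates:
                score = avg_test_rmse(y, Xs, orders, taus, refit)
                if best is None or score < best[2]:
                    best = (subset, orders, score)
    return best
```

Here every predictor combination costs `len(candidates) * len(taus)` refits, instead of one order search plus `len(taus)` refits.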

Which of these two procedures is likely to be better? Is there a best practice, or a more rigorous and tested approach?

Thank you.

Answered by robjhyndman


robjhyndman · 2026-01-05T21:15:59Z

Maintainer

Great question! I've done something similar to the first procedure before, and it works pretty well. Note that minimizing the AICc is asymptotically equivalent to minimizing one-step RMSE on cross-validated test sets, so there is no guarantee that the ARIMA model will be optimal for multi-step forecasting. On the other hand, if the data truly come from the fitted model, then optimizing for one-step RMSE will also give the optimal model for multi-step RMSE.
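For reference, the AICc mentioned here can be computed from a fitted model's maximised log-likelihood. A minimal sketch, assuming `k` counts every estimated parameter (for a regression with ARIMA errors: the ARMA and seasonal ARMA coefficients, any constant, the regression coefficients and the innovation variance) and `n` is the number of observations actually used in estimation:

```python
def aicc(loglik: float, k: int, n: int) -> float:
    """Corrected AIC: AIC + 2k(k+1)/(n-k-1), where AIC = -2*loglik + 2k."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = -2.0 * loglik + 2.0 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)
```

For example, `aicc(-100.0, 3, 50)` gives $206 + 24/46 \approx 206.52$. The correction term shrinks as `n` grows, so in large samples the AICc behaves like the AIC.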

The second approach focuses more directly on the multi-step RMSE, but is less efficient in choosing the ARIMA model, as there is only a limited number of training/test splits that you can average over. It will also be much slower, as the model search space is much bigger (it includes all the possible ARIMA models in each iteration).
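A back-of-the-envelope count of model fits illustrates the speed difference. Every number here is an assumption for illustration only: `n_stepwise` is a guess at how many models a stepwise AICc search fits, and the second procedure is taken to search an exhaustive grid of `n_candidates` orders.

```python
def fits_procedure_one(n_pred: int, n_stepwise: int, u: int) -> int:
    """Each of the 2**n_pred subsets: one stepwise AICc search plus u refits."""
    return 2 ** n_pred * (n_stepwise + u)


def fits_procedure_two(n_pred: int, n_candidates: int, u: int) -> int:
    """Each subset: every candidate order combination refitted on all u splits."""
    return 2 ** n_pred * n_candidates * u


# Illustrative grid: p, q in 0..3; d, D in 0..1; P, Q in 0..2
n_candidates = 4 * 2 * 4 * 3 * 2 * 3  # (p, d, q, P, D, Q) combinations = 576
print(fits_procedure_one(10, 30, 12))  # 43008
print(fits_procedure_two(10, n_candidates, 12))  # 7077888
```

Under these assumed numbers (10 predictors, 12 splits, about 30 fits per stepwise search), the second procedure needs roughly 165 times as many fits.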

I don't know of any literature on this, but I would guess that it won't make that much difference to forecast accuracy, and so I'd go for the faster approach, namely the first procedure.

3 replies

maxdevblock Jan 6, 2026
Author

Hi Rob. Many thanks.

The only paper I've found so far about a rigorous approach for selecting the best SARIMAX model is

Xie, J. (2023). Identifying optimal indicators and lag terms for nowcasting models. International Monetary Fund. https://doi.org/10.5089/9798400235177.001

The selection procedure there is "simpler" (it doesn't involve the average test-RMSE) because it is designed for nowcasting. Xie also suggests checking the estimates for significance, which definitely makes sense to me because, in this case, the model would also be valid for explanation (and might, perhaps, narrow the prediction intervals?).
Thus, I'm thinking of changing point (2) of the first procedure to:

Among the $2^p$ models resulting from point (1), choose the one minimizing the average test-RMSE and whose parameter p-values are all less than $\alpha$ (Xie suggests a 5% significance level)

so that the selected model would finally be:

- the model minimizing the average test-RMSE (over $u$ train-test splits),
- with estimates significant at the chosen $\alpha$ level,
- and with the ARIMA terms minimizing the AICc (and, asymptotically, the one-step RMSE) over the full dataset for its predictor combination.
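The modified step (2) amounts to filtering the candidates before taking the minimum. A minimal sketch, assuming each fitted model has already been summarised in a hypothetical `Candidate` record holding its average test-RMSE and the p-values of its estimated coefficients:

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Candidate:  # hypothetical summary of one fitted model
    predictors: Tuple[str, ...]
    orders: Tuple[int, int, int, int, int, int]  # (p, d, q, P, D, Q)
    avg_test_rmse: float
    p_values: Tuple[float, ...]  # one per estimated coefficient


def select_with_significance(cands: Sequence[Candidate],
                             alpha: float = 0.05) -> Optional[Candidate]:
    """Smallest average test-RMSE among models whose p-values are all < alpha."""
    ok = [c for c in cands if all(pv < alpha for pv in c.p_values)]
    return min(ok, key=lambda c: c.avg_test_rmse) if ok else None
```

If no candidate passes the filter the function returns `None`; a model with no estimated coefficients passes vacuously, since `all()` of an empty sequence is `True`.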

What do you think about it?

Thank you.

robjhyndman Jan 6, 2026
Maintainer

Statistical significance is a completely different problem from selecting predictors. I would not confuse them.

maxdevblock Jan 12, 2026
Author

Thank you, @robjhyndman.

Indeed, on closer reading of Xie's paper, no specific reason is given for this choice, nor are there references to other scholars who made the same choice, or any proof that it improves the model.

I will therefore proceed with the first approach, as it is fully defensible and does not conflate different theoretical frameworks.

OTexts
