Time series forecasting predicts future values based on historical patterns. Businesses use it for demand forecasting, financial planning, capacity planning, and anomaly detection. Despite the AI hype cycle, classical statistical methods remain competitive with deep learning for many forecasting tasks. Choosing the right approach depends on data characteristics, forecast horizon, and accuracy requirements.

Understanding Your Data

Before selecting a model, understand the time series characteristics:

Trend. Is there a long-term upward or downward direction? Revenue growing year over year, user base declining month over month.

Seasonality. Are there repeating patterns at fixed intervals? Retail sales spike in December, website traffic drops on weekends, energy demand peaks in summer.

Cyclicality. Are there patterns that do not have fixed periods? Business cycles, market sentiment shifts. Unlike seasonality, cycles do not have predictable timing.

Noise. How much random variation exists? Highly noisy series are harder to forecast accurately.

Stationarity. Do the statistical properties (mean, variance) change over time? Many models assume stationarity; non-stationary series need differencing or detrending.

External factors. Are there external variables that influence the series? Weather affecting energy demand, promotions affecting sales, holidays affecting traffic.

Approach Selection

Classical Statistical Methods

ARIMA (AutoRegressive Integrated Moving Average). The workhorse of time series forecasting. Handles trend through differencing, captures autocorrelation patterns. Works well for univariate series with clear autocorrelation structure. Use auto_arima (pmdarima library) to automatically select parameters.

Exponential Smoothing (ETS). Models trend and seasonality through weighted averages of past observations. Simple to implement and interpret. Works well for series with clear trend and seasonal patterns.

Prophet. Facebook’s forecasting library. Handles seasonality (daily, weekly, yearly), holidays, and trend changepoints automatically. Good for business time series with irregular holidays and strong seasonal patterns. Easy to use but less flexible than custom models.

When to use classical methods: Single series forecasting, moderate data volume, need for interpretability, limited compute resources, no complex cross-variable relationships.

Machine Learning Methods

Gradient Boosted Trees (XGBoost, LightGBM). Treat forecasting as a regression problem. Create features from lagged values, rolling statistics, calendar features, and external variables. Very effective when external features are important predictors.

Feature engineering is critical: lag features (value at t-1, t-7, t-30), rolling statistics (7-day mean, 30-day std), calendar features (day of week, month, holiday), and domain-specific features (promotion active, weather forecast).

When to use ML methods: Multiple external features influence the forecast, complex nonlinear relationships, large datasets, ensemble of many series.

Deep Learning Methods

LSTM and GRU. Recurrent neural networks that capture sequential dependencies. Can model complex temporal patterns. Require more data and compute than classical methods. Often do not outperform simpler methods on small to medium datasets.

Temporal Fusion Transformers (TFT). Transformer-based architecture designed for time series. Handles multiple input types (static covariates, known future inputs, observed past inputs). Provides interpretable attention weights.

N-BEATS and N-HiTS. Neural network architectures designed specifically for forecasting. Strong performance on benchmarks. Less need for feature engineering than tree-based methods.

Amazon Chronos. Foundation model for time series. Pre-trained on diverse time series data. Can forecast without task-specific training. Good for zero-shot forecasting when historical data is limited.

When to use deep learning: Large datasets (10,000+ time points), complex patterns that simpler methods miss, multiple related series (cross-series learning), sufficient compute budget for training.

Evaluation

Train-Test Split

Time series data is ordered - never use random splits. Always use temporal splits:

  • Training set: Historical data up to a cutoff point
  • Validation set: Data immediately after training, used for model selection
  • Test set: Most recent data, used for final evaluation

Metrics

MAE (Mean Absolute Error). Average absolute difference between predicted and actual values. Easy to interpret in the data’s units.

MAPE (Mean Absolute Percentage Error). Average percentage error. Useful for comparing across series with different scales. Problematic when actual values are near zero.

RMSE (Root Mean Squared Error). Penalizes large errors more than MAE. Use when large errors are particularly costly.

SMAPE (Symmetric Mean Absolute Percentage Error). Symmetric version of MAPE that handles near-zero values better.

Backtesting

Evaluate the model across multiple historical periods, not just one train-test split:

  1. Train on data up to time T, forecast T+1 to T+H
  2. Train on data up to time T+1, forecast T+2 to T+H+1
  3. Repeat across multiple origins

This gives a more robust estimate of forecast accuracy than a single split.

Production Considerations

Retraining frequency. How often should the model be retrained? For daily forecasts, weekly or monthly retraining is common. For rapidly changing patterns, more frequent retraining is needed.

Forecast monitoring. Compare forecasts to actual values as actuals become available. Track forecast accuracy over time. Alert when accuracy degrades beyond a threshold.

Probabilistic forecasts. Point forecasts (single number) are often insufficient. Provide prediction intervals (80% and 95% confidence bands) so decision makers understand forecast uncertainty.

Human judgment integration. For high-stakes forecasts (annual budgets, capacity planning), combine model forecasts with human domain expertise. Structured judgment processes (Delphi method) can improve accuracy.

Multiple models. No single model is best for all series. Use an ensemble or model selection process that chooses the best model per series based on validation performance.

Start with the simplest model that works (often ARIMA or Prophet), measure its performance with proper backtesting, and add complexity only if the simpler approach falls short. In many practical business forecasting scenarios, well-tuned classical methods match or outperform deep learning approaches.

Sources and Further Reading

  1. Holt, C.C. (1957/2004). “Forecasting seasonals and trends by exponentially weighted moving averages.” ONR Research Memorandum, Carnegie Institute of Technology, 1957. Reprinted in International Journal of Forecasting 20(1), pp. 5–10, 2004. — Original formulation of exponential smoothing (ETS).
  2. Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. Holden-Day. — Foundational text establishing the ARIMA framework that remains standard methodology.
  3. Taylor, S.J. and Letham, B. (2018). “Forecasting at Scale.” The American Statistician 72(1), pp. 37–45. — The paper behind Facebook Prophet, describing the decomposable additive model for business time series. https://doi.org/10.1080/00031305.2017.1380080
  4. Lim, B., Arik, S.Ö., Loeff, N., and Pfister, T. (2021). “Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting.” International Journal of Forecasting 37(4), pp. 1748–1764. — Introduces TFT; provides interpretable attention across multiple input types. https://doi.org/10.1016/j.ijforecast.2021.03.012
  5. Makridakis, S., Spiliotis, E., and Assimakopoulos, V. (2018). “The M4 Competition: Results, Findings, Conclusion and Way Forward.” International Journal of Forecasting 34(4), pp. 802–808. — Large-scale empirical comparison across methods; documents conditions where classical methods match or outperform deep learning.