Lecture 22

Introduction to Time Series Forecasting

Monday

  • We covered the following topics:
    • Date-time data in R (Date, POSIXct, POSIXlt)

    • Time series data/structure (ts, tsibble, tibble + Date column)

    • Time series decomposition

      • Trend, seasonality, and residuals
    • Smoothing time series data

      • Smoothing with moving averages

Today

  • Forecasting with two distinct models: (1) ARIMA and (2) Prophet
  • Understanding modeltime + tidymodels integration
  • Forecasting Process for time series

Toy Example: River Time Series & ARIMA

Let’s say we’re watching how much water flows in a river every day:

  • Monday: 50 cfs
  • Tuesday: 60 cfs
  • Wednesday: 70 cfs

We want to guess tomorrow’s flow.

1. AutoRegressive (AR)

  • This means: “Look at yesterday and the day before!”

  • If flow has been going up each day, AR says: **“Tomorrow might go up again …”**

  • It’s like the river has a pattern.

2. Integrated (I)

  • Sometimes the flow just keeps climbing — like during a spring snowmelt!

  • To help ARIMA think better, we subtract yesterday’s number from today’s.

  • This makes the numbers more steady so ARIMA can do its magic.

3. Moving Average (MA)

  • The river sometimes gets a surprise spike in flow (like a big rainstorm 🌧️).

  • MA looks at those surprises and helps smooth them out.

  • So if it rained two days ago, MA might say: “Don’t expect another surprise tomorrow.”

Put It All Together: AR + I + MA = ARIMA

  1. AR: Uses the river’s memory (trend)
  2. I: Calms the structural components (season)
  3. MA: Handles noisy surprises (noise)

Now we can forecast tomorrow’s flow!
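
As a toy calculation (purely illustrative; the numbers and names below belong only to this example):

flow   <- c(mon = 50, tue = 60, wed = 70)  # daily flow in cfs
change <- diff(flow)                       # the "I" idea: day-over-day changes (10, 10)

# The "AR" idea: if the change has been steady, carry the last change forward
flow[["wed"]] + tail(change, 1)            # naive guess for Thursday: 80 cfs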

ARIMA

The basics of an ARIMA (AutoRegressive Integrated Moving Average) model include:

  • AR: AutoRegressive part (past values):
    • AR(1) = current value depends on the previous value
    • AR(2) = current value depends on the previous two values
  • I: Integrated part (differencing to make data stationary)
    • Differencing removes trends and seasonality
    • e.g., diff(co2, lag = 12) removes annual seasonality (see the sketch after this list)
  • MA: Moving Average part (past errors)
    • MA(1) = current value depends on the previous error
    • MA(2) = current value depends on the previous two errors
  • p, d, q: Parameters for AR, I, and MA
    • p = number of lagged values (AR)
    • d = number of differences (I)
    • q = number of lagged errors (MA)
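
For example, here is a quick look at how differencing (the "I" step above) makes the co2 series stationary (a base-R sketch):

plot(co2)                        # raw series: upward trend plus an annual cycle
plot(diff(co2))                  # d = 1: removes the long-term trend
plot(diff(diff(co2), lag = 12))  # a seasonal (lag-12) difference then removes the annual cycle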

AIC

The AIC metric helps choose between models/parameters:

  • It rewards:
    • Good predictions
    • Simplicity
  • It punishes:
    • Complexity (too many parameters)
    • Overfitting (fitting noise instead of the trend)
  • Lower AIC = better model (see the sketch below)
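
For instance, you can fit two candidate models by hand and compare their AIC values (a sketch using forecast::Arima(); the specific orders here are arbitrary choices):

library(forecast)

m1 <- Arima(co2, order = c(0, 1, 1), seasonal = c(0, 1, 1))
m2 <- Arima(co2, order = c(2, 1, 2), seasonal = c(1, 1, 2))

AIC(m1)
AIC(m2)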

A Simple Forecasting Example

  • auto.arima() is a function from the forecast package that automatically selects the best ARIMA model for your data according to the AICc criterion.
  • The AICc (Akaike Information Criterion corrected) is a measure of the relative quality of statistical models for a given dataset.
  • It is used to compare different models and select the one that best fits the data while penalizing for complexity.
  • The auto.arima() function will automatically select the best parameters (p,d,q) based on the AICc criterion.

library(forecast)

co2_arima <- auto.arima(co2)

summary(co2_arima)
#> Series: co2 
#> ARIMA(1,1,1)(1,1,2)[12] 
#> 
#> Coefficients:
#>          ar1      ma1     sar1     sma1     sma2
#>       0.2569  -0.5847  -0.5489  -0.2620  -0.5123
#> s.e.  0.1406   0.1204   0.5880   0.5701   0.4819
#> 
#> sigma^2 = 0.08576:  log likelihood = -84.39
#> AIC=180.78   AICc=180.97   BIC=205.5
#> 
#> Training set error measures:
#>                      ME     RMSE       MAE         MPE       MAPE      MASE
#> Training set 0.01742092 0.287159 0.2303994 0.005073769 0.06845665 0.1819636
#>                      ACF1
#> Training set -0.002858162

Forecasting with ARIMA

  • The forecast() function is used to generate forecasts from the fitted ARIMA model.
  • The h argument specifies the number of periods to forecast into the future.
co2_forecast <- forecast(co2_arima, h = 60)

plot(co2_forecast)

🔢 ARIMA(1,1,1)(1,1,2)[12]

ARIMA Notation can be broken into two parts:

1. Non-seasonal part: ARIMA(1, 1, 1)

This is the “regular” ARIMA:

  • AR(1): One autoregressive term — the model looks at one lag of the time series.

  • I(1): One differencing — the model uses the change in values instead of raw values to make the series more stationary.

  • MA(1): One moving average term — the model corrects using one lag of the error term.

2. Seasonal part: (1, 1, 2)[12]

This is the seasonal pattern — repeated every 12 time units (like months in a year):

  • SAR(1): One seasonal autoregressive term — it uses the value from 12 time steps ago.

  • SI(1): One seasonal difference — subtracts the value from 12 steps ago to remove seasonal patterns.

  • SMA(2): Two seasonal moving average terms — uses errors from 12 and 24 time steps ago.

  • [12]: This is the seasonal period, i.e., it’s a yearly pattern with monthly data.

🔢 ARIMA(1,1,1)(1,1,2)[12]

Hey, ARIMA please…

“Model the data using a mix of its last value, the last error, and their seasonal versions from 12 months ago — but first difference it once to remove trend and once seasonally to remove yearly patterns.”
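
If you wanted to write this model out by hand rather than letting auto.arima() pick it, the notation maps directly onto forecast::Arima() arguments (a sketch; the [12] period is taken from frequency(co2)):

library(forecast)

# order    = c(p, d, q): the non-seasonal part
# seasonal = c(P, D, Q): the seasonal part, with the period taken from frequency(co2), i.e. 12
manual_fit <- Arima(co2, order = c(1, 1, 1), seasonal = c(1, 1, 2))

summary(manual_fit)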

Note

ARIMA modeling works well when data is stationary.

  • Stationarity means the statistical properties of the series (mean, variance) do not change over time.

  • Non-stationary data can lead to unreliable forecasts and misleading results.
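
A quick way to check how much differencing a series needs is with the ndiffs() and nsdiffs() helpers from the forecast package (a small sketch):

library(forecast)

ndiffs(co2)   # estimated number of regular differences needed (the "d")
nsdiffs(co2)  # estimated number of seasonal differences needed (the "D")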

Prophet

  • Prophet is an open-source tool for forecasting time series data.

  • Developed by Facebook (Meta)

  • Designed for analysts and data scientists

  • Handles missing data, outliers, and seasonality

Key Features of Prophet

✅ Additive Model: Trend + Seasonality + Holidays + Noise
✅ Automatic Changepoint Detection
✅ Support for Custom Holidays & Events
✅ Flexible Seasonality (daily/weekly/yearly)
✅ Easy-to-use API in R and Python

Prophet’s Model Structure

Prophet decomposes time series into components:

\[ y(t) = g(t) + s(t) + h(t) + ε(t) \]

  • g(t): Trend (linear or logistic growth)

  • s(t): Seasonality (Fourier series)

  • h(t): Holiday effects

  • ε(t): Error term (noise)

📌 Assumes additive components by default; multiplicative also possible

A Simple Forecasting Example

Your time series must have: (1) a ds column (date/timestamp) and (2) a y column (value to forecast)

library(prophet)
prophet_mod <- tsibble::as_tsibble(co2) |> 
  # prophet requires ds and y columns
  dplyr::rename(ds = index, y = value) |> 
  prophet()
# Make future dataframe and predict
future   <- make_future_dataframe(prophet_mod, periods = 1000)

forecast <- predict(prophet_mod, future)

# Plot the forecast
plot(prophet_mod, forecast) + 
  theme_minimal() 

Pros / Cons

| Feature | ARIMA | Prophet |
|---|---|---|
| Statistical rigor | Based on strong statistical theory; well-studied | Intuitive, decomposable model (trend + seasonality + events) |
| Interpretability | Clear interpretation of AR, MA, differencing terms | Plots components like trend/seasonality directly |
| Flexibility (SARIMA) | Seasonal ARIMA can handle seasonal structure | Handles multiple seasonalities natively (yearly, weekly, daily) |
| Control over params | Fine-tuned control over differencing, lags, and model order | Easy to specify changepoints, seasonality, and custom events |
| Statistical testing | Includes AIC/BIC for model selection | Cross-validation support; uncertainty intervals included |
| Requires stationarity | Time series must be stationary or made so (differencing) | Handles non-stationary data out of the box |
| Model complexity | Needs careful tuning (p, d, q) and domain expertise | Defaults work well; limited tuning needed |
| Holiday effects | Must be encoded manually | Easy to include holidays/events |
| Multivariate support | Basic ARIMA doesn’t support exogenous variables easily (need ARIMAX) | Supports external regressors with add_regressor() |
| Non-linear trends | Poor performance with structural breaks or non-linear growth | Handles changepoints and logistic growth models well |
| Seasonality limits | SARIMA handles only one seasonal period well | Built-in multiple seasonal components (e.g., daily, weekly, yearly) |

TL;DR

  • Use ARIMA if you want a classic, statistical model with deep customization and you’re comfortable making your data stationary.
  • Use Prophet if you want a fast, robust, and intuitive model for business or environmental data with strong seasonal effects and irregular events.

Too much complexity!

  • Each model has its own requirements, arguments, and tuning parameters.

  • Similar to our ML models, this introduces a large time sink, opportunities for error, and complexity.

  • modeltime brings tidy workflows to time series forecasting using the parsnip and workflows frameworks from tidymodels.

  • Combine multiple models (ARIMA, Prophet, XGBoost) in one framework to compare, calibrate, and forecast them together.

Modeltime Integration

library(modeltime)
library(tidymodels)
library(timetk)

1. Create a time series split …

  • Use time_series_split() to make a train/test set.

  • Setting assess tells the function how much of the most recent data to hold out as the testing set (here, "60 months").

  • Setting cumulative = TRUE tells the sampling to use all of the prior data as the training set.

co2_tbl <-  tsibble::as_tsibble(co2) |> 
  as_tibble() |>
  mutate(date = as.Date(index), index = NULL) 

splits <- time_series_split(co2_tbl, assess = "60 months", cumulative = TRUE)

training <-  training(splits)
testing  <-  testing(splits)

2. Specify Models …

Just like tidymodels …

  • Model Spec: arima_reg()/prophet_reg()/prophet_boost() <– This sets up your general model algorithm and key parameters

  • Model Mode: mode = "regression"/mode = "classification" <– This sets the model mode to regression or classification (time series forecasting is always regression!)

  • Set Engine: set_engine("auto_arima")/set_engine("prophet") <– This selects the specific package function to use; you can add any function-level arguments here.

mods <- list(
  arima_reg() |>  set_engine("auto_arima"),
  
  arima_boost(min_n = 2, learn_rate = 0.015) |> set_engine(engine = "auto_arima_xgboost"),
  
  prophet_reg() |> set_engine("prophet"),
  
  prophet_boost() |> set_engine("prophet_xgboost"),
  
  # Exponential Smoothing State Space model
  exp_smoothing() |> set_engine(engine = "ets"),
  
  # Multivariate Adaptive Regression Spline model
  mars(mode = "regression") |> set_engine("earth") 
)

3. Fit Models …

  • Use purrr::map() to fit the models to the training data.

  • Fit Model: fit(value ~ date, training) <– All modeltime models require a date column to be a regressor.

models <- map(mods, ~ fit(.x, value ~ date, data = training))

4. Build modeltime table …

  • Use modeltime_table() to combine multiple models into a single table that can be used for calibration, accuracy, and forecasting.

modeltime_table():

  • Creates a table of models

  • Validates that all objects are models (parsnip or workflows objects) and all models have been fitted

  • Provides an ID and Description of the models

as_modeltime_table():

  • Converts a list of models to a modeltime table. Useful if programmatically creating Modeltime Tables from models stored in a list (e.g., from purrr::map()).
(models_tbl <- as_modeltime_table(models))
#> # Modeltime Table
#> # A tibble: 6 × 3
#>   .model_id .model   .model_desc            
#>       <int> <list>   <chr>                  
#> 1         1 <fit[+]> ARIMA(1,1,1)(2,1,2)[12]
#> 2         2 <fit[+]> ARIMA(1,1,1)(0,1,1)[12]
#> 3         3 <fit[+]> PROPHET                
#> 4         4 <fit[+]> PROPHET                
#> 5         5 <fit[+]> ETS(M,A,A)             
#> 6         6 <fit[+]> EARTH

Notes:

modeltime_table() does some basic checking to ensure all models are fit and organized into a scalable structure called a “Modeltime Table” that is used as part of our forecasting workflow.

  • It’s expected that tuning and parameter selection is performed prior to incorporating into a Modeltime Table.
  • If you try to add an unfitted model, the modeltime_table() will complain (throw an informative error) saying you need to fit() the model.

5. Calibrate the Models …

  • Use modeltime_calibrate() to evaluate the models on the test set.

  • Calibrating adds a new column, .calibration_data, with the test predictions and residuals inside.

(calibration_table <- modeltime_calibrate(models_tbl, testing, quiet = FALSE))
#> # Modeltime Table
#> # A tibble: 6 × 5
#>   .model_id .model   .model_desc             .type .calibration_data
#>       <int> <list>   <chr>                   <chr> <list>           
#> 1         1 <fit[+]> ARIMA(1,1,1)(2,1,2)[12] Test  <tibble [60 × 4]>
#> 2         2 <fit[+]> ARIMA(1,1,1)(0,1,1)[12] Test  <tibble [60 × 4]>
#> 3         3 <fit[+]> PROPHET                 Test  <tibble [60 × 4]>
#> 4         4 <fit[+]> PROPHET                 Test  <tibble [60 × 4]>
#> 5         5 <fit[+]> ETS(M,A,A)              Test  <tibble [60 × 4]>
#> 6         6 <fit[+]> EARTH                   Test  <tibble [60 × 4]>

A few notes on Calibration:

  • Calibration builds confidence intervals and accuracy metrics

  • Calibration Data is the predictions and residuals that are calculated from out-of-sample data.

  • After calibrating, the calibration data follows the data through the forecasting workflow.
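
If you want to peek at the calibration data itself, you can pull the nested tibble out of the table (a sketch, shown for the first model only):

calibration_table |> 
  dplyr::slice(1) |> 
  dplyr::pull(.calibration_data)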

Testing Forecast & Accuracy Evaluation

There are 2 critical parts to an evaluation.

  • Evaluating the Test (Out of Sample) Accuracy
  • Visualizing the Forecast vs Test Data Set

Accuracy

modeltime_accuracy() collects common accuracy metrics using yardstick functions:

  • MAE - Mean absolute error, mae()
  • MAPE - Mean absolute percentage error, mape()
  • MASE - Mean absolute scaled error, mase()
  • SMAPE - Symmetric mean absolute percentage error, smape()
  • RMSE - Root mean squared error, rmse()
  • RSQ - R-squared, rsq()
modeltime_accuracy(calibration_table) |> 
  arrange(mae)
#> # A tibble: 6 × 9
#>   .model_id .model_desc             .type   mae  mape  mase smape  rmse   rsq
#>       <int> <chr>                   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1         1 ARIMA(1,1,1)(2,1,2)[12] Test  0.810 0.224 0.687 0.224 0.967 0.973
#> 2         2 ARIMA(1,1,1)(0,1,1)[12] Test  0.885 0.244 0.750 0.245 1.05  0.969
#> 3         3 PROPHET                 Test  1.06  0.295 0.899 0.294 1.16  0.986
#> 4         4 PROPHET                 Test  1.06  0.295 0.899 0.294 1.16  0.986
#> 5         5 ETS(M,A,A)              Test  1.48  0.409 1.25  0.410 1.71  0.958
#> 6         6 EARTH                   Test  1.89  0.526 1.61  0.526 2.25  0.499

Forecast

  • Use modeltime_forecast() to generate forecasts over the 60-month test period.
  • Use plot_modeltime_forecast() to visualize the forecasts.
(forecast <- calibration_table  |> 
  modeltime_forecast(h = "60 months", 
                     new_data = testing,
                     actual_data = co2_tbl) )
#> # Forecast Results
#> 
#> # A tibble: 828 × 7
#>    .model_id .model_desc .key   .index     .value .conf_lo .conf_hi
#>        <int> <chr>       <fct>  <date>      <dbl>    <dbl>    <dbl>
#>  1        NA ACTUAL      actual 1959-01-01   315.       NA       NA
#>  2        NA ACTUAL      actual 1959-02-01   316.       NA       NA
#>  3        NA ACTUAL      actual 1959-03-01   316.       NA       NA
#>  4        NA ACTUAL      actual 1959-04-01   318.       NA       NA
#>  5        NA ACTUAL      actual 1959-05-01   318.       NA       NA
#>  6        NA ACTUAL      actual 1959-06-01   318        NA       NA
#>  7        NA ACTUAL      actual 1959-07-01   316.       NA       NA
#>  8        NA ACTUAL      actual 1959-08-01   315.       NA       NA
#>  9        NA ACTUAL      actual 1959-09-01   314.       NA       NA
#> 10        NA ACTUAL      actual 1959-10-01   313.       NA       NA
#> # ℹ 818 more rows

Visualize

plot_modeltime_forecast(forecast)

Refit to Full Dataset & Forecast Forward

The final step is to refit the models to the full dataset using modeltime_refit() and forecast them forward.

refit_tbl <- calibration_table |>
    modeltime_refit(data = co2_tbl)

refit_tbl |>
    modeltime_forecast(h = "3 years", actual_data = co2_tbl) |>
    plot_modeltime_forecast()

Why refit?

The models have all changed! This is the (potential) benefit of refitting.

More often than not refitting is a good idea. Refitting:

  • Retrieves your model and preprocessing steps

  • Refits the model to the new data

  • Recalculates any automations. This includes:

    • Recalculating the changepoints for the Earth Model
    • Recalculating the ARIMA and ETS parameters
  • Preserves any parameter selections. This includes:

    • Any other defaults that are not automatic calculations are used.

Growing Ecosystem

Summary & Takeaways

  • Environmental science is rich with time series applications

  • Time series data is sequential and ordered

  • lubridate simplifies handling and extracting date-time features

  • Use ts, zoo, xts for classic or irregular data

  • Use tsibble, feasts, and fable for tidy workflows

  • Learn to decompose, smooth, and forecast

  • Use modeltime for advanced modeling and forecasting

Assignment

  • Use modeltime to forecast the next 12 months of streamflow in the Poudre River, building on the last assignment.
  • Use the prophet_reg() and arima_reg() functions to create Prophet and ARIMA models for forecasting.
  • Use dataRetrieval to download the observed daily streamflow for those 12 months. Aggregate this data to monthly averages and compare it to the predictions made by your modeltime models (a starter sketch follows this list).
  • Compute the R² value between the model predictions and the observed data using a linear model and report what it means.
  • Last, generate a plot of the predicted vs. observed values, including a 1:1 line and a linear model line.
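
A possible starting point for the download-and-compare steps is sketched below. The gauge number, date range, and the pred_monthly object (your monthly modeltime predictions) are assumptions you will need to swap for your own values:

library(dataRetrieval)
library(dplyr)
library(lubridate)

# Assumed gauge: Cache la Poudre River at Fort Collins (check the site used in your last assignment)
poudre_daily <- readNWISdv(siteNumbers = "06752260",
                           parameterCd = "00060",      # daily discharge (cfs)
                           startDate = "2023-01-01",   # placeholder: the 12 months you forecasted
                           endDate   = "2023-12-31") |> 
  renameNWISColumns()                                  # renames the discharge column to Flow

# Aggregate the daily observations to monthly means
obs_monthly <- poudre_daily |> 
  mutate(month = floor_date(Date, "month")) |> 
  group_by(month) |> 
  summarise(obs_flow = mean(Flow, na.rm = TRUE))

# Join to your monthly predictions (hypothetical pred_monthly: month + pred_flow columns),
# then report R^2 from a linear model of predicted vs. observed
comparison <- left_join(obs_monthly, pred_monthly, by = "month")
summary(lm(pred_flow ~ obs_flow, data = comparison))$r.squared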