Time series data/structure (ts, tsibble, tibble + Date column)
Time series decomposition
Trend, seasonality, and residuals
Smoothing time series data
Smoothing with moving averages
Today
Forecasting with two distinct models (1) ARIMA (2) Prophet
Understanding modeltime + tidymodels integration
Forecasting Process for time series
Toy Example: River Time Series & ARIMA
Let’s say we’re watching how much water flows in a river every day:
Monday: 50 cfs
Tuesday: 60 cfs
Wednesday: 70 cfs
We want to guess tomorrow’s flow.
1. AutoRegressive (AR)
This means: “Look at yesterday and the day before!”
If flow has been going up each day, AR says: ** “Tomorrow might go up again …” **
It’s like the river has a pattern.
2. Integrated (I)
Sometimes the flow just keeps climbing — like during a spring snowmelt!
To help ARIMA think better, we subtract yesterday’s number from today’s.
This makes the numbers more steady so ARIMA can do its magic.
3. Moving Average (MA)
The river sometimes gets a surprise flux (like a big rainstorm 🌧️).
MA looks at those surprises and helps smooth them out.
So if it rained two days ago, MA might say: “Don’t expect another surprise tomorrow.”
Put It All Together: AR + I + MA = ARIMA
AR: Uses the river’s memory (trend)
I: Calms the sturcutral components (season)
MA: Handles noisy surprises (noise)
Now we can forecast tomorrow’s flow!
ARIMA
The basics of a ARIMA (AutoRegressive Integrated Moving Average) Model include:
AR: AutoRegressive part (past values):
AR(1) = current value depends on the previous value
AR(2) = current value depends on the previous two values
I: Integrated part (differencing to make data stationary)
Differencing removes trends and seasonality
e.g., diff(co2, lag = 12) removes annual seasonality
MA: Moving Average part (past errors)
MA(1) = current value depends on the previous error
MA(2) = current value depends on the previous two errors
p, d, q: Parameters for AR, I, and MA
p = number of lagged values (AR)
d = number of differences (I)
q = number of lagged errors (MA)
AIC
The AIC metric helps choose between models/parameters:
It rewards:
Good predictions
Simplicity
It punishes:
Complexity (too many parameters)
Overfitting (fitting noise instead of the trend)
Lower AIC = better model
A Simple Forecasting Example
auto.arima() is a function from the forecast package that automatically selects the best ARIMA model for your data accoridning toe to the AICc criterion.
The AICc (Akaike Information Criterion corrected) is a measure of the relative quality of statistical models for a given dataset.
It is used to compare different models and select the one that best fits the data while penalizing for complexity.
The auto.arima() function will automatically select the best parameters (p,d,q) based on the AICc criterion.
library(forecast)co2_arima <-auto.arima(co2)summary(co2_arima)#> Series: co2 #> ARIMA(1,1,1)(1,1,2)[12] #> #> Coefficients:#> ar1 ma1 sar1 sma1 sma2#> 0.2569 -0.5847 -0.5489 -0.2620 -0.5123#> s.e. 0.1406 0.1204 0.5880 0.5701 0.4819#> #> sigma^2 = 0.08576: log likelihood = -84.39#> AIC=180.78 AICc=180.97 BIC=205.5#> #> Training set error measures:#> ME RMSE MAE MPE MAPE MASE#> Training set 0.01742092 0.287159 0.2303994 0.005073769 0.06845665 0.1819636#> ACF1#> Training set -0.002858162
Forecasting with ARIMA
The forecast() function is used to generate forecasts from the fitted ARIMA model.
The h argument specifies the number of periods to forecast into the future.
co2_forecast <-forecast(co2_arima, h =60)plot(co2_forecast)
🔢 ARIMA(1,1,1)(1,1,2)[12]
ARIMA Notation can be broken into two parts:
1. Non-seasonal part: ARIMA(1, 1, 1)
This is the “regular” ARIMA: - AR(1): One autoregressive term — the model looks at one lag of the time series.
I(1): One differencing — the model uses the change in values instead of raw values to make the series more stationary.
MA(1): One moving average term — the model corrects using one lag of the error term.
2. Seasonal part: (1, 1, 2)[12]
This is the seasonal pattern — repeated every 12 time units (like months in a year):
SAR(1): One seasonal autoregressive term — it uses the value from 12 time steps ago.
SI(1): One seasonal difference — subtracts the value from 12 steps ago to remove seasonal patterns.
SMA(2): Two seasonal moving average terms — uses errors from 12 and 24 time steps ago.
[12]: This is the seasonal period, i.e., it’s a yearly pattern with monthly data.
🔢 ARIMA(1,1,1)(1,1,2)[12]
Hey, ARIMA please…
“Model the data using a mix of its last value, the last error, and their seasonal versions from 12 months ago — but first difference it once to remove trend and once seasonally to remove yearly patterns.”
Note
ARIMA modeling works well when data is stationary. - Stationarity means the statistical properties of the series (mean, variance) do not change over time. - Non-stationary data can lead to unreliable forecasts and misleading results.
Prophet
Prophet is an open-source tool for forecasting time series data.
Developed by Facebook (Meta)
Designed for analysts and data scientists
Handles missing data, outliers, and seasonality
Key Features of Prophet
✅ Additive Model: Trend + Seasonality + Holidays + Noise
✅ Automatic Changepoint Detection
✅ Support for Custom Holidays & Events
✅ Flexible Seasonality (daily/weekly/yearly)
✅ Easy-to-use API in R and Python
Prophet’s Model Structure
Prophet decomposes time series into components:
\[ y(t) = g(t) + s(t) + h(t) + ε(t) \]
g(t): Trend (linear or logistic growth)
s(t): Seasonality (Fourier series)
h(t): Holiday effects
ε(t): Error term (noise)
📌 Assumes additive components by default; multiplicative also possible
A Simple Forecasting Example
Your time series must have: (1) ds column (date/timestamp) (2) y column (value to forecast)
library(prophet)prophet_mod <- tsibble::as_tsibble(co2) |># prophet requires ds and y columns dplyr::rename(ds = index, y = value) |>prophet()
# Make future dataframe and predictfuture <-make_future_dataframe(prophet_mod, periods =1000)forecast <-predict(prophet_mod, future)# Plot the forecastplot(prophet_mod, forecast) +theme_minimal()
Pros / Cons
Feature
ARIMA
Prophet
Statistical rigor
Based on strong statistical theory; well-studied
Intuitive, decomposable model (trend + seasonality + events)
Interpretability
Clear interpretation of AR, MA, differencing terms
Model Spec: arima_reg()/prophet_reg()/prophet_boost() <– This sets up your general model algorithm and key parameters
Model Mode: mode = "regression"/mode = "classification" <– This sets the model mode to regression or classification (timeseries is always regression!)
Set Engine: set_engine("auto_arima")/set_engine("prophet") / <– This selects the specific package-function to use, you can add any function-level arguments here.
modeltime_table() does some basic checking to ensure all models are fit and organized into a scalable structure called a “Modeltime Table” that is used as part of our forecasting workflow.
It’s expected that tuning and parameter selection is performed prior to incorporating into a Modeltime Table.
If you try to add an unfitted model, the modeltime_table() will complain (throw an informative error) saying you need to fit() the model.
5. Calibrate the Models …
Use modeltime_calibrate() to evaluate the models on the test set.
Calibrating adds a new column, .calibration_data, with the test predictions and residuals inside.
Calibration builds confidence intervals and accuracy metrics
Calibration Data is the predictions and residuals that are calculated from out-of-sample data.
After calibrating, the calibration data follows the data through the forecasting workflow.
Testing Forecast & Accuracy Evaluation
There are 2 critical parts to an evaluation.
Evaluating the Test (Out of Sample) Accuracy
Visualizing the Forecast vs Test Data Set
Accuracy
modeltime_accuracy() collects common accuracy metrics using yardstick functions:
MAE - Mean absolute error, mae()
MAPE - Mean absolute percentage error, mape()
MASE - Mean absolute scaled error, mase()
SMAPE - Symmetric mean absolute percentage error, smape()
RMSE - Root mean squared error, rmse()
RSQ - R-squared, rsq()
modeltime_accuracy(calibration_table) |>arrange(mae)#> # A tibble: 6 × 9#> .model_id .model_desc .type mae mape mase smape rmse rsq#> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>#> 1 1 ARIMA(1,1,1)(2,1,2)[12] Test 0.810 0.224 0.687 0.224 0.967 0.973#> 2 2 ARIMA(1,1,1)(0,1,1)[12] Test 0.885 0.244 0.750 0.245 1.05 0.969#> 3 3 PROPHET Test 1.06 0.295 0.899 0.294 1.16 0.986#> 4 4 PROPHET Test 1.06 0.295 0.899 0.294 1.16 0.986#> 5 5 ETS(M,A,A) Test 1.48 0.409 1.25 0.410 1.71 0.958#> 6 6 EARTH Test 1.89 0.526 1.61 0.526 2.25 0.499
Forecast
Use modeltime_forecast() to generate forecasts for the next 120 months (10 years).
Use plot_modeltime_forecast() to visualize the forecasts.
(forecast <- calibration_table |>modeltime_forecast(h ="60 months", new_data = testing,actual_data = co2_tbl) )#> # Forecast Results#> #> # A tibble: 828 × 7#> .model_id .model_desc .key .index .value .conf_lo .conf_hi#> <int> <chr> <fct> <date> <dbl> <dbl> <dbl>#> 1 NA ACTUAL actual 1959-01-01 315. NA NA#> 2 NA ACTUAL actual 1959-02-01 316. NA NA#> 3 NA ACTUAL actual 1959-03-01 316. NA NA#> 4 NA ACTUAL actual 1959-04-01 318. NA NA#> 5 NA ACTUAL actual 1959-05-01 318. NA NA#> 6 NA ACTUAL actual 1959-06-01 318 NA NA#> 7 NA ACTUAL actual 1959-07-01 316. NA NA#> 8 NA ACTUAL actual 1959-08-01 315. NA NA#> 9 NA ACTUAL actual 1959-09-01 314. NA NA#> 10 NA ACTUAL actual 1959-10-01 313. NA NA#> # ℹ 818 more rows
Vizualize
plot_modeltime_forecast(forecast)
Refit to Full Dataset & Forecast Forward
The final step is to refit the models to the full dataset using modeltime_refit() and forecast them forward.
The models have all changed! This is the (potential) benefit of refitting.
More often than not refitting is a good idea. Refitting:
Retrieves your model and preprocessing steps
Refits the model to the new data
Recalculates any automations. This includes:
Recalculating the changepoints for the Earth Model
Recalculating the ARIMA and ETS parameters
Preserves any parameter selections. This includes:
Any other defaults that are not automatic calculations are used.
Growing Ecosystem
Summary & Takeaways
Environmental science is rich with time series applications
Time series data is sequential and ordered
lubridate simplifies handling and extracting date-time features
Use ts, zoo, xts for classic or irregular data
Use tsibble, feasts, and fable for tidy workflows
Learn to decompose, smooth, and forecast
Use modeltime for advanced modeling and forecasting
Assignment
Use modeltime to forecast the next 12 months of streamflow data in the Poudre River based on last time assignment.
Use the prophet_reg(), and arima_reg() function to create a Prophet model for forecasting.
Use dataRetrieval to download daily streamflow for the next 12 months. Aggregate this data to monthly averages and compare it to the predictions made by your modeltime model.
Compute the R2 value between the model predictions and the observed data using a linear model and report the meaning.
Last, generate a plot of the Predicted vs Observed values and include a 1:1 line, and a linear model line.