Beyond Tradition: Harnessing Machine Learning for Demand Forecasting

In our previous article, we established robust baselines using traditional time series models like ARIMA and ETS. We validated stationarity, checked residuals, and built statistically sound forecasts. But we hit a fundamental limitation: these models only look at the past values of the series itself.

What about all the business context we simulated in our dataset? Promotions, holidays, product categories—this information is crucial for accurate demand planning. Today, we break free from tradition and harness Machine Learning to create forecasts that understand the real world.

The Power of Context: Why ML for Time Series?

Traditional models are powerful but myopic. Machine Learning models excel when we can provide them with relevant features:

Promotions: A 30% discount will likely boost demand—this isn’t just a statistical pattern.
Holidays: Christmas shopping behavior is fundamentally different from a random Tuesday.
Product Categories: Electronics and clothing have completely different seasonal patterns.
Day of Week: Weekend shopping behavior varies significantly.

Let’s enhance our single-product view from Article 2 to a multi-product, feature-rich approach.

Feature Engineering: The Real Magic

The key to successful ML for time series is thoughtful feature engineering. We’ll transform our raw data into features that help the model understand temporal patterns.

The entire code is available here: ml_time_series.ipynb

Creating Temporal Features

def create_features(df):
    df = df.copy()
    df['date'] = pd.to_datetime(df.index)

    # Basic date features
    df['day_of_week'] = df['date'].dt.dayofweek
    df['day_of_month'] = df['date'].dt.day
    df['week_of_year'] = df['date'].dt.isocalendar().week
    df['month'] = df['date'].dt.month
    df['quarter'] = df['date'].dt.quarter
    df['year'] = df['date'].dt.year
    df['is_weekend'] = (df['date'].dt.dayofweek >= 5).astype(int)

    # Cyclical encoding for periodic features
    df['day_of_week_sin'] = np.sin(2 * np.pi * df['day_of_week']/7)
    df['day_of_week_cos'] = np.cos(2 * np.pi * df['day_of_week']/7)
    df['month_sin'] = np.sin(2 * np.pi * df['month']/12)
    df['month_cos'] = np.cos(2 * np.pi * df['month']/12)

    # Holiday proximity (days until next major holiday)
    def days_to_holiday(date):
        holidays = {
            'christmas': pd.Timestamp(f'{date.year}-12-25'),
            'new_year': pd.Timestamp(f'{date.year+1}-01-01'),
            'july_4': pd.Timestamp(f'{date.year}-07-04')
        }
        min_days = 365
        for holiday in holidays.values():
            days = abs((date - holiday).days)
            min_days = min(min_days, days)
        return min_days

    df['days_to_holiday'] = df['date'].apply(days_to_holiday)

    return df

# Apply feature engineering
df_enhanced = create_features(df).reset_index(drop=True)

Creating Lag and Window Features

This is where we help the model understand recent trends and patterns.

# Create lag and rolling window features for a specific product
def create_lag_features(df, product_id, lag_periods=[1, 7, 14, 28], window_sizes=[7, 28]):
    product_df = df[df['product_id'] == product_id].copy().sort_values('date')

    # Lag features
    for lag in lag_periods:
        product_df[f'lag_{lag}'] = product_df['units_sold'].shift(lag)

    # Rolling statistics
    for window in window_sizes:
        product_df[f'rolling_mean_{window}'] = product_df['units_sold'].shift(1).rolling(window=window).mean()
        product_df[f'rolling_std_{window}'] = product_df['units_sold'].shift(1).rolling(window=window).std()
        product_df[f'rolling_max_{window}'] = product_df['units_sold'].shift(1).rolling(window=window).max()

    # Price change features
    product_df['price_change_1d'] = product_df['selling_price'].pct_change(1)
    product_df['price_change_7d'] = product_df['selling_price'].pct_change(7)

    return product_df

# Let's use our star product P003
product_ml = create_lag_features(df_enhanced, 'P003')

Training Our First ML Model

Now we have rich features that capture both temporal patterns and business context.

# Prepare features and target
feature_columns = [
    'day_of_week_sin', 'day_of_week_cos', 'month_sin', 'month_cos',
    'is_weekend', 'days_to_holiday', 'promotion', 'holiday',
    'lag_1', 'lag_7', 'lag_14', 'lag_28',
    'rolling_mean_7', 'rolling_std_7', 'rolling_mean_28',
    'price_change_1d', 'price_change_7d'
]

# Remove rows with NaN values (from lag features)
product_ml_clean = product_ml.dropna(subset=feature_columns + ['units_sold'])

# Split chronologically (important for time series!)
split_date = '2024-01-01'
train_mask = product_ml_clean['date'] < split_date
test_mask = product_ml_clean['date'] >= split_date

X_train = product_ml_clean[train_mask][feature_columns]
X_test = product_ml_clean[test_mask][feature_columns]
y_train = product_ml_clean[train_mask]['units_sold']
y_test = product_ml_clean[test_mask]['units_sold']

print(f"Training samples: {len(X_train)}, Test samples: {len(X_test)}")

# Train Random Forest model
rf_model = RandomForestRegressor(
    n_estimators=100,
    max_depth=10,
    random_state=42,
    n_jobs=-1
)

rf_model.fit(X_train, y_train)

# Generate predictions
y_pred_rf = rf_model.predict(X_test)

rf_mae = mean_absolute_error(y_test, y_pred_rf)

print(f"Random Forest MAE: {rf_mae:.2f}")
print(f"ARIMA Baseline MAE: {arima_mae:.2f}")
print(f"Improvement: {((arima_mae - rf_mae) / arima_mae * 100):.1f}%")

Feature Importance: Understanding What Drives Forecasts

One major advantage of tree-based models is interpretability through feature importance.

feature_importance = pd.DataFrame({
    'feature': feature_columns,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=True)

plt.figure(figsize=(10, 8))
plt.barh(feature_importance['feature'], feature_importance['importance'])
plt.title('Random Forest Feature Importance')
plt.xlabel('Importance Score')
plt.tight_layout()
plt.show()

Visualizing ML vs Traditional Forecasts

# Compare forecasts
plt.figure(figsize=(14, 8))

# Get the test period dates
test_dates = pd.Series(y_test.values, index=product_ml_clean[test_mask]['date'])
test_dates = test_dates.resample('W').sum()
#print(test_dates)
test_rf_forecast = pd.Series(y_pred_rf, index=product_ml_clean[test_mask]['date'])
test_rf_forecast = test_rf_forecast.resample('W').sum()
# Arima serie test
plt.plot(test.index, arima_forecast, label='ARIMA Forecast', color='blue', alpha=0.8)
# ETS forcast
plt.plot(test.index, ets_forecast, label='ETS Forecast', alpha=0.8)
#ML RF
plt.plot(test_dates.index, test_dates.values, label='Actual Sales', color='black', linewidth=2)
plt.plot(test_rf_forecast.index, test_rf_forecast.values, label='Random Forest Forecast', color='red', alpha=0.8)

plt.title('ML vs Traditional Forecasting: Product P003')
plt.legend()
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

The Results: Context is King

In our experiments with the simulated data, the ML approach typically shows:

The 15-30% improvement in MAE over traditional models. In this cases is even better (88%)
Better capture of promotion effects
More accurate holiday season predictions
Ability to learn across multiple products (when we extend the approach)

But We’ve Created New Challenges

Feature Storage: Now we need to maintain and update all these engineered features.
Data Leakage: Creating lag features requires careful chronological splitting.
Model Complexity: We’ve traded statistical assumptions for feature engineering complexity.
Scale: How do we efficiently create these features for 10,000 products?

The Bridge to MLOps

This ML approach sets the stage for our next critical topic: MLOps. We’ve moved from simple scripts to a more complex pipeline that needs:

Reproducible feature engineering
Versioned datasets
Organized project structure
Model and feature monitoring

The local ML prototype works beautifully for one product, but the real business value comes from scaling this to the entire product catalog reliably.

We’ve enhanced our forecasting power significantly by incorporating business context. But with great power comes great responsibility—the responsibility to build systems that can handle this complexity at scale.