What this is
A pure-SQL pipeline that produces a 30-day daily revenue forecast from
your GA4 BigQuery export. Trains an ARIMA_PLUS_XREG model on revenue
with four ARIMA_PLUS sub-models forecasting the upstream funnel metrics
that feed it. Output is a single BigQuery view designed to plug straight
into Data Studio.
No Python, no Vertex AI, no Cloud Functions, no orchestrator. The whole pipeline is two scheduled BigQuery queries — one that refreshes data daily, one that retrains weekly — plus a view that re-evaluates on every query.
How it works
Two phases:
- Build (first run, then weekly retrain) — aggregate GA4 events and
ad metrics into a daily fact table, then train the main XREG model on
revenue plus the four univariate
ARIMA_PLUSsub-models on the funnel metrics that become its future regressors. - Forecast (daily refresh, live on every query) — the sub-forecasts predict tomorrow’s funnel volume; the trained XREG model uses those plus your planned ad spend to predict tomorrow’s revenue. A view unions forecast with history for Data Studio to consume.
After the first run you’ll have 11 objects in your dataset: 3 tables,
6 models, and 2 views — including a sales_forecast_explained view
backed by ML.EXPLAIN_FORECAST that decomposes every prediction into
trend, seasonality, holiday lift, and per-regressor attribution.
Why this design
- Pure SQL. Lives entirely in BigQuery + scheduled queries. No Python orchestrator, no separate ML platform, no IAM service-account juggling between systems.
- Scenario planning is built in. Because ad spend is an explicit regressor with a forecast-time future-values table, you can override the trailing-average defaults with a planned media buy and the forecast updates the next time the view is queried.
- Custom occasions, not just calendar holidays. A
custom_holidaystable lets you name your own events — Black Friday week, anniversary sales, May campaign — and the model learns a lift per named occasion it applies to future occurrences. - Explainability as a first-class output. Every forecasted point decomposes into trend, weekly seasonality, yearly seasonality, holiday attribution, and per-regressor contribution. If you don’t know where each predicted dollar comes from, the model isn’t ready to drive budget decisions.
A note on model quality
Auto-ARIMA always produces output. That output looks the same whether
the fit is excellent or barely there. Section §4 of the SQL is the
evaluation step that decides whether to trust what you’ve built — train
an eval model on data ending 30 days ago, run ML.EVALUATE on the
held-out window, read the MAPE. Healthy ecommerce MAPE for daily revenue
is 8–15%; above 20%, investigate before shipping the forecast to a
dashboard.
A longer walkthrough is in the companion post: Forecasting GA4 revenue in pure SQL with BigQuery ML — and the model-quality checks you can’t skip.
Status
This pipeline is pre-release. It’s been validated on synthetic data and small production datasets, but real-world ecommerce has quirks that only show up at scale. I’m actively recruiting testers — see the Contributing section of the README for the ideal tester profile. The single most useful contribution is opening an issue with your MAPE and a sentence about the business profile, so a public benchmark table can take shape.
Get it
Full SQL, dataflow diagrams, IAM setup, troubleshooting guide: