Facebook’s Prophet Forecasting Crystal Ball

ODSC - Open Data Science
5 min read · Jun 23, 2021

Facebook’s Prophet is one of the most popular forecasting approaches today. Its usage is well described and its code cleanly documented, so instead of giving usage examples, we will look under the hood to understand where the novelties of this Bayesian model lie.

https://xkcd.com/605/

The Model

The second you hear the model incorporates trend, seasonality, and holiday effects, you think: “Yeah, Sherlock, true rocket (data) science. That’s how we’ve been decomposing time series from long before Facebook itself existed.”

However, in the time series world, it’s not only a matter of WHAT but also a matter of HOW. On top of that, the full model also includes additional, user-defined regressors:

(Yes, you see correctly, we are yet again the slaves of the most popular distribution on our planet.)
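Written out, the decomposition reads roughly as follows. This is a sketch in the notation of the Prophet paper, where g is the trend, s the seasonality and h the holiday effect; the regressor term and its symbols (x_j, β_j) are labels assumed here for illustration.

```latex
y(t) = g(t) + s(t) + h(t) + \sum_{j} \beta_j x_j(t) + \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2)
```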

Seasonality

As a former harmonic analysis lover, let me start with the seasonality component, which simply incorporates a Fourier series into the model:

where P stands for period.
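For reference, a truncated Fourier series of this kind is usually written as below. The order N and the coefficient names a_n, b_n are notation assumed here, following the Prophet paper.

```latex
s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
```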

A note here: please keep in mind that we can achieve very similar results for the seasonality itself regardless of whether we use Facebook’s Prophet, SARIMAX, or good old Linear Regression. (For SARIMAX, using Fourier series as regressors instead of relying on differencing could be controversial; however, implementations of multiple seasonalities for SARIMAX are not widely available, so this can be a nice hack around it.) The main difference is the way the parameters are estimated: with a MAP (Maximum A Posteriori) estimate, an MLE (Maximum Likelihood Estimate), or Least Squares. Just to remind you, Least Squares is quite different from the other two, as the aim is to minimize the residual sum of squares:

For such a calculation no distributional assumptions are needed (however, they still come into play later, when making inferences about the distribution of the parameters!).
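To make the Least Squares route concrete, here is a minimal sketch that builds Fourier columns for a weekly period and minimizes RSS(β) = Σ_t (y_t − x_tᵀβ)² with NumPy. The helper name `fourier_features`, the synthetic data and the chosen order are all assumptions made for illustration; this is not Prophet’s own code.

```python
import numpy as np

def fourier_features(t, period, order):
    """Columns cos(2*pi*n*t/P) for n = 1..order, then sin(2*pi*n*t/P) for n = 1..order."""
    t = np.asarray(t, dtype=float)
    n = np.arange(1, order + 1)
    angles = 2.0 * np.pi * np.outer(t, n) / period
    return np.hstack([np.cos(angles), np.sin(angles)])

# Synthetic daily series with a weekly pattern (made-up data, 20 full weeks).
rng = np.random.default_rng(0)
t = np.arange(140)
y = (2.0 * np.sin(2 * np.pi * t / 7)
     + 0.5 * np.cos(4 * np.pi * t / 7)
     + rng.normal(0.0, 0.1, t.size))

# Design matrix: intercept, 3 cosine columns, 3 sine columns.
X = np.column_stack([np.ones(t.size), fourier_features(t, period=7, order=3)])

# Ordinary least squares: minimizes the residual sum of squares directly.
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = float(np.sum((y - X @ beta_ls) ** 2))
```

With this column layout, `beta_ls[4]` is the coefficient of the first sine term and `beta_ls[2]` that of the second cosine term, so they should land close to the 2.0 and 0.5 used to generate the data.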

MLE and MAP have certain similarities. MLE is about maximizing the log-likelihood function:

and MAP is about maximizing the posterior, namely:
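In symbols, the two estimators can be sketched as follows, with θ standing for the model parameters and y for the observed data (both symbols are assumed here). The second equality for MAP follows from Bayes’ rule after dropping the term that does not depend on θ.

```latex
\begin{align}
\hat{\theta}_{\mathrm{MLE}} &= \arg\max_{\theta} \, \log p(y \mid \theta) \\
\hat{\theta}_{\mathrm{MAP}} &= \arg\max_{\theta} \, p(\theta \mid y)
  = \arg\max_{\theta} \, \bigl[ \log p(y \mid \theta) + \log p(\theta) \bigr]
\end{align}
```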

The chart below perfectly explains the difference between them:

https://www.robots.ox.ac.uk/~az/lectures/est/lect34.pdf

While MAP seems more reasonable, the requirement to know the prior probabilities is not trivial. In the case of the seasonality component of the Prophet method, the prior distribution of the coefficients is assumed to be normal (the hero distribution yet again), zero-centered, with a default scale of 10 (tunable by the user).
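To see what such a prior does, consider a simplified setting: Gaussian noise with a known standard deviation and a zero-centered normal prior on the coefficients. Under these assumptions the MAP estimate has a closed form identical to ridge regression. The sketch below uses that shortcut; the function name `map_estimate` and the data are hypothetical, and Prophet itself estimates all parameters jointly in Stan rather than via this formula.

```python
import numpy as np

def map_estimate(X, y, noise_sigma, prior_scale):
    """MAP for y ~ N(X @ beta, noise_sigma^2) with prior beta ~ N(0, prior_scale^2).

    Maximizing the log-posterior is equivalent to minimizing
    ||y - X beta||^2 + (noise_sigma / prior_scale)^2 * ||beta||^2.
    """
    lam = (noise_sigma / prior_scale) ** 2
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

# Made-up regression problem, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
true_beta = np.array([1.5, -2.0, 0.0, 0.7])
y = X @ true_beta + rng.normal(0.0, 1.0, size=30)

# Under Gaussian noise, the MLE coincides with ordinary least squares.
beta_mle = np.linalg.lstsq(X, y, rcond=None)[0]
# A wide prior (scale 10, as in the default mentioned above) barely moves the estimate.
beta_wide = map_estimate(X, y, noise_sigma=1.0, prior_scale=10.0)
# A tight prior pulls the coefficients strongly towards zero.
beta_tight = map_estimate(X, y, noise_sigma=1.0, prior_scale=0.1)
```

The practical reading: a large prior scale leaves the seasonality almost unregularized, while a small one shrinks it, which is the knob Prophet exposes as `seasonality_prior_scale`.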

While tidying up some notions, it’s also worth mentioning that estimating the seasonality coefficients doesn’t yet require GAMs (Generalized Additive Models) or spline fitting. Let me refresh the GAM definition:

with g being the link function and f_i the unspecified smooth functions. In the case of the Fourier series, our functions are well defined, so we fall back to simple Linear Regression with its linearity assumption:
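As a reminder, the two forms can be sketched like this. The first line is the usual textbook GAM definition; the second is the special case with an identity link and known sine and cosine basis functions, where only the coefficients a_n, b_n (notation assumed here) remain to be estimated.

```latex
\begin{align}
g\bigl(\mathbb{E}[Y]\bigr) &= \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m) \\
\mathbb{E}[Y] &= \beta_0 + \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
\end{align}
```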

Holidays

And suddenly we all dream about the world coming back to normal and traveling somewhere warm this summer…

Facebook’s Prophet contains a nice framework for holidays: you may either define your own or leverage the predefined, country-specific holidays. What’s more, it has the comfy flexibility of adding both holidays and their neighboring days to the model. All of that may not be revolutionary in terms of the model itself, but it sure as hell spares us time in the data preparation department.
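To illustrate what “holidays and their neighboring days” means in terms of features, here is a pandas-only sketch. The table layout (`holiday`, `ds`, `lower_window`, `upper_window`) mirrors the columns Prophet expects for custom holidays, but the helper `holiday_indicators` is hypothetical and only approximates the idea: one 0/1 indicator per holiday and day offset, each of which would then get its own coefficient.

```python
import pandas as pd

def holiday_indicators(dates, holidays):
    """Build one 0/1 column per (holiday, day offset) pair.

    `holidays` needs the columns: holiday, ds, lower_window, upper_window.
    Offsets run from lower_window (usually <= 0) to upper_window (>= 0).
    """
    dates = pd.DatetimeIndex(dates)
    out = pd.DataFrame(index=dates)
    for row in holidays.itertuples(index=False):
        for offset in range(row.lower_window, row.upper_window + 1):
            col = f"{row.holiday}_{offset:+d}"
            day = row.ds + pd.Timedelta(days=offset)
            hit = (dates == day).astype(int)
            # Accumulate, so repeated occurrences of a holiday share one column.
            out[col] = out.get(col, 0) + hit
    return out

# Hypothetical holiday table: Christmas with one neighboring day on each side.
holidays = pd.DataFrame({
    "holiday": ["christmas", "christmas"],
    "ds": pd.to_datetime(["2020-12-25", "2021-12-25"]),
    "lower_window": [-1, -1],
    "upper_window": [1, 1],
})

dates = pd.date_range("2020-12-20", "2020-12-31", freq="D")
features = holiday_indicators(dates, holidays)
```

Here `features` has three columns (`christmas_-1`, `christmas_+0`, `christmas_+1`), each flagging exactly one day in the chosen date range.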

Trend

As you’ve all already guessed (or scrolled to the last part right away), most tricks are incorporated in the trend part. The authors drew on their experience to craft realistic trends: linear (no escaping the trivial cases!) and logistic, saturating growth:

where C_t denotes the maximum capacity, k the growth rate, and m the offset.
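With these symbols, the basic saturating-growth trend (before any changepoints are added) takes the familiar logistic form; g(t) as the name of the trend function is notation assumed here.

```latex
g(t) = \frac{C_t}{1 + \exp\bigl(-k\,(t - m)\bigr)}
```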

A capacity that changes over time helps with modeling ‘capped’ forecasts. In Facebook’s case, for example, the number of users is limited by the number of people with access to the Internet.
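A small sketch of the logistic trend with a constant and with a time-varying capacity may help here. The function name `logistic_trend` and all parameter values are made up for illustration.

```python
import numpy as np

def logistic_trend(t, cap, k, m):
    """Logistic growth cap / (1 + exp(-k * (t - m))).

    `cap` may be a scalar or an array with one capacity value per time point.
    """
    t = np.asarray(t, dtype=float)
    cap = np.broadcast_to(np.asarray(cap, dtype=float), t.shape)
    return cap / (1.0 + np.exp(-k * (t - m)))

t = np.arange(0, 101)

# Constant capacity: the curve saturates just below 100.
constant_cap = logistic_trend(t, cap=100.0, k=0.15, m=40.0)

# Capacity growing linearly over time: the "ceiling" itself keeps rising.
growing_cap = logistic_trend(t, cap=100.0 + 0.5 * t, k=0.15, m=40.0)
```

At t = m the constant-capacity curve sits at exactly half of the capacity, which is a handy sanity check when choosing the offset.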

To make the trend even more practical, they’ve also added changepoints to the equation, to reflect business changes that affect trends (detailed formulas can be found in their paper). However, instead of quoting the formulas, let’s simply see what example saturating growth curves may look like:

With changing capacity:

Or with two changepoints:

Note how a capacity that changes over time, together with no limits on the changepoints (either their number or their impact), makes the model highly flexible. If you feel like playing with different parameter settings, just visit a little playground I prepared for you.
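For readers who prefer code to sliders, here is a rough sketch of a logistic trend with changepoints, written from my reading of the Prophet paper: each changepoint adds a rate adjustment, and the offset is shifted so that the curve stays continuous. The function name, argument names and parameter values are assumptions, and the sketch expects sorted changepoints and growth rates that never reach zero.

```python
import numpy as np

def piecewise_logistic(t, cap, k, m, changepoints, deltas):
    """Logistic trend whose growth rate changes by deltas[j] at changepoints[j]."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    deltas = np.asarray(deltas, dtype=float)

    # Offset adjustments that keep the curve continuous at each changepoint:
    # k_prev * (s_j - m_prev) must equal k_next * (s_j - m_next).
    gammas = np.zeros_like(deltas)
    k_prev, m_prev = k, m
    for j in range(len(s)):
        k_next = k_prev + deltas[j]
        gammas[j] = (s[j] - m_prev) * (1.0 - k_prev / k_next)
        k_prev, m_prev = k_next, m_prev + gammas[j]

    # a[i, j] = 1 if changepoint j has already happened at time t[i].
    a = (t[:, None] >= s[None, :]).astype(float)
    k_t = k + a @ deltas   # growth rate in force at each time point
    m_t = m + a @ gammas   # matching offset at each time point
    return cap / (1.0 + np.exp(-k_t * (t - m_t)))

t = np.linspace(0, 100, 1001)
# Two changepoints: growth speeds up at t=30, then slows down sharply at t=60.
trend = piecewise_logistic(t, cap=100.0, k=0.05, m=50.0,
                           changepoints=[30.0, 60.0], deltas=[0.10, -0.12])
```

Passing an array for `cap` combines both sources of flexibility discussed above, since the capacity simply broadcasts against the time grid.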

The Beginning of

With the ‘trend, seasonality, holidays’ trinity demystified, we can finally consciously experiment with the model and check whether in our case it outperforms the alternatives. And if you want to demystify and play with more time-series approaches, I will be telling more time(less) modeling stories on the 13th of July in my Ai+ session “Time Series Forecasting with Python.” Enjoy forecasting!

About the author/Ai+ Presenter on Facebook’s Prophet: Marta Markiewicz

Currently Senior (Big) Data Scientist at InPost and Lecturer at Wroclaw University of Economics and Business, previously Head of Data Science at Objectivity, with a background in Mathematical Statistics. For almost 10 years, she has been discovering the potential of data in various business domains, from medical data, through retail, HR, finance, aviation, real estate, logistics, … She deeply believes in the power of data in every area of life. Article writer, conference speaker, and, privately, a passionate dancer and handmade jewelry creator.
