notebooks/handling_shocks.ipynb
%matplotlib inline
from prophet import Prophet
import pandas as pd
from matplotlib import pyplot as plt
import logging
logging.getLogger('prophet').setLevel(logging.ERROR)
import warnings
warnings.filterwarnings("ignore")
plt.rcParams['figure.figsize'] = 9, 6
As a result of the lockdowns caused by the COVID-19 pandemic, many time series experienced "shocks" during 2020. For example, media consumption (Netflix, YouTube) and e-commerce transactions (Amazon, eBay) spiked, whilst attendance at in-person events declined dramatically.
Most of these time series would also maintain their new level for a period of time, subject to fluctuations driven by the easing of lockdowns and/or vaccines.
Seasonal patterns could also have changed. For example, before the COVID lockdowns people may have consumed less media (in total hours) on weekdays than on weekends, but during lockdowns weekday consumption could be much closer to weekend consumption.
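In this page we'll explore some strategies for capturing these effects using Prophet's functionality: treating lockdown periods as one-off holidays so that dips and spikes are not absorbed into the trend, adjusting the trend changepoints and their flexibility, and using conditional seasonalities to let weekly patterns change after the onset of COVID.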
For this case study we'll use Pedestrian Sensor data from the City of Melbourne. This data measures foot traffic from sensors in various places in the central business district, and we've chosen one sensor (Sensor_ID = 4) and aggregated the values to a daily grain.
The aggregated dataset can be found in the examples folder here.
df = pd.read_csv('https://raw.githubusercontent.com/facebook/prophet/main/examples/example_pedestrians_covid.csv')
df.set_index('ds').plot();
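We can see two key events in the time series: the initial lockdown in Victoria from late March to early June 2020, and a second, longer lockdown from July to late October 2020.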
There are also shorter periods of strict lockdown that lead to sudden dips in the time series: 5 days in February 2021, and 14 days from late May to early June 2021.
First we'll fit a model with the default Prophet settings:
m = Prophet()
m = m.fit(df)
future = m.make_future_dataframe(periods=366)
forecast = m.predict(future)
m.plot(forecast)
plt.axhline(y=0, color='red')
plt.title('Default Prophet');
m.plot_components(forecast);
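Calling make_future_dataframe(periods=366) appends 366 daily dates after the last observed date to the history dates, which covers one year of forecasts. The pandas-only sketch below approximates the resulting frame. sketch_future_dataframe is a hypothetical helper, not part of Prophet. It assumes daily data and the default of keeping the history.

```python
import pandas as pd

def sketch_future_dataframe(history_ds, periods, freq="D"):
    """Hypothetical approximation of m.make_future_dataframe(periods=...)
    for daily data, keeping the history dates (include_history=True)."""
    history_ds = pd.to_datetime(pd.Series(history_ds))
    last_date = history_ds.max()
    # Generate one extra date so last_date itself can be dropped,
    # leaving exactly `periods` new dates after the history.
    new_dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    new_dates = new_dates[new_dates > last_date][:periods]
    all_dates = pd.concat([history_ds, pd.Series(new_dates)], ignore_index=True)
    return pd.DataFrame({"ds": all_dates})

# Usage with a small synthetic history (not the Melbourne data).
hist_dates = pd.date_range("2021-01-01", periods=10, freq="D")
future_sketch = sketch_future_dataframe(hist_dates, periods=366)
print(len(future_sketch))  # 376 = 10 history rows + 366 future rows
```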
The model seems to fit reasonably well to past data, but notice how we're capturing the dips, and the spikes after the dips, as part of the trend component.
By default, the model assumes that these large spikes are possible in the future, even though we realistically won't see something of the same magnitude within our forecast horizon (1 year in this case). This leads to a fairly optimistic forecast of the recovery of foot traffic in 2022.
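The optimism comes from how the trend is extrapolated. As we understand Prophet's approach, it simulates future trend uncertainty by placing random future changepoints at the same average rate as in the history. Each rate change is drawn from a Laplace distribution whose scale is the mean absolute size of the fitted historical changes. Large COVID-driven rate changes therefore widen the range of projected trends.

The sketch below is a simplified numpy illustration of that idea, not Prophet's code. simulate_future_trend_paths and width are hypothetical helpers, and time is assumed to be scaled so that the history spans [0, 1].

```python
import numpy as np

def simulate_future_trend_paths(hist_deltas, k_last, y_end, horizon, n_paths=1000, seed=0):
    """Simplified sketch of Prophet-style trend uncertainty (not Prophet's exact code).

    Assumptions: the history spans scaled time [0, 1] and the forecast covers
    [1, 1 + horizon]; future changepoints occur at the same average rate as the
    historical ones; each future rate change ~ Laplace(0, mean(|hist_deltas|)).
    """
    rng = np.random.default_rng(seed)
    hist_deltas = np.asarray(hist_deltas, dtype=float)
    n_hist = len(hist_deltas)
    scale = np.mean(np.abs(hist_deltas)) + 1e-8
    t_future = np.linspace(1.0, 1.0 + horizon, 50)
    paths = np.empty((n_paths, t_future.size))
    for i in range(n_paths):
        # Number of future changepoints: same rate per unit of time as the history.
        n_new = rng.poisson(n_hist * horizon)
        cp_times = np.sort(rng.uniform(1.0, 1.0 + horizon, n_new))
        new_deltas = rng.laplace(0.0, scale, n_new)
        # Continue the last fitted slope, adding a slope change after each new changepoint.
        trend = y_end + k_last * (t_future - 1.0)
        for s, d in zip(cp_times, new_deltas):
            trend = trend + d * np.clip(t_future - s, 0.0, None)
        paths[i] = trend
    return t_future, paths

def width(paths):
    """Width of the 10th-90th percentile band at the end of the horizon."""
    end = paths[:, -1]
    return np.percentile(end, 90) - np.percentile(end, 10)

# Same final slope and level; only the size of the historical rate changes differs.
t_future, paths_calm = simulate_future_trend_paths([0.01, -0.02, 0.015], k_last=0.1, y_end=1.0, horizon=0.25)
_, paths_shocked = simulate_future_trend_paths([0.01, -3.0, 2.5], k_last=0.1, y_end=1.0, horizon=0.25)
print(width(paths_calm) < width(paths_shocked))  # True
```

Using the same seed for both calls isolates the effect of the historical change sizes. Every other random draw is shared, so the wider band comes only from the larger deltas.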
To prevent large dips and spikes from being captured by the trend component, we can treat the days impacted by COVID-19 as holidays that will not repeat again in the future. Adding custom holidays is explained in more detail here. We set up a DataFrame like so to describe the periods affected by lockdowns:
lockdowns = pd.DataFrame([
{'holiday': 'lockdown_1', 'ds': '2020-03-21', 'lower_window': 0, 'ds_upper': '2020-06-06'},
{'holiday': 'lockdown_2', 'ds': '2020-07-09', 'lower_window': 0, 'ds_upper': '2020-10-27'},
{'holiday': 'lockdown_3', 'ds': '2021-02-13', 'lower_window': 0, 'ds_upper': '2021-02-17'},
{'holiday': 'lockdown_4', 'ds': '2021-05-28', 'lower_window': 0, 'ds_upper': '2021-06-10'},
])
for t_col in ['ds', 'ds_upper']:
    lockdowns[t_col] = pd.to_datetime(lockdowns[t_col])
lockdowns['upper_window'] = (lockdowns['ds_upper'] - lockdowns['ds']).dt.days
lockdowns
The resulting DataFrame has one row per lockdown. holiday names the lockdown, and ds specifies its start. lower_window is 0, so each effect starts on ds itself. ds_upper is not used by Prophet, but it's a convenient way for us to calculate upper_window, which tells Prophet that the lockdown spans that many days after its start. Note that the holidays regression is inclusive of the upper bound. Note also that since we don't specify any future dates, Prophet will assume that these holidays will not occur again when creating the future dataframe (and hence they won't affect our projections). This is different from how we would specify a recurring holiday.
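To see what "inclusive of the upper bound" means for these windows, the sketch below expands each lockdown into the individual dates it covers, from ds + lower_window through ds + upper_window. expand_holiday_windows is a hypothetical helper for illustration only, not part of Prophet. The lockdowns frame is rebuilt so that the block runs on its own.

```python
import pandas as pd

lockdowns = pd.DataFrame([
    {'holiday': 'lockdown_1', 'ds': '2020-03-21', 'lower_window': 0, 'ds_upper': '2020-06-06'},
    {'holiday': 'lockdown_2', 'ds': '2020-07-09', 'lower_window': 0, 'ds_upper': '2020-10-27'},
    {'holiday': 'lockdown_3', 'ds': '2021-02-13', 'lower_window': 0, 'ds_upper': '2021-02-17'},
    {'holiday': 'lockdown_4', 'ds': '2021-05-28', 'lower_window': 0, 'ds_upper': '2021-06-10'},
])
for t_col in ['ds', 'ds_upper']:
    lockdowns[t_col] = pd.to_datetime(lockdowns[t_col])
lockdowns['upper_window'] = (lockdowns['ds_upper'] - lockdowns['ds']).dt.days

def expand_holiday_windows(holidays):
    """Hypothetical helper: list every date covered by each holiday window,
    from ds + lower_window to ds + upper_window, both ends inclusive."""
    rows = []
    for _, row in holidays.iterrows():
        for offset in range(int(row['lower_window']), int(row['upper_window']) + 1):
            rows.append({'holiday': row['holiday'],
                         'ds': row['ds'] + pd.Timedelta(days=offset)})
    return pd.DataFrame(rows)

daily = expand_holiday_windows(lockdowns)
# Days covered per lockdown: 78, 111, 5 and 14 (upper_window + 1 in each case).
print(daily.groupby('holiday').size())
```

The 5-day and 14-day counts for lockdown_3 and lockdown_4 match the short lockdowns described earlier.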
m2 = Prophet(holidays=lockdowns)
m2 = m2.fit(df)
future2 = m2.make_future_dataframe(periods=366)
forecast2 = m2.predict(future2)
m2.plot(forecast2)
plt.axhline(y=0, color='red')
plt.title('Lockdowns as one-off holidays');
m2.plot_components(forecast2);
In an environment where behaviours are constantly changing, it's important to ensure that the trend component of the model captures emerging patterns without overfitting to them.
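The trend changepoints documentation explains two things we could tweak with the trend component. The first is the locations of potential changepoints, which by default are 25 dates evenly spaced over the first 80% of the history. The second is the strength of the changepoint prior (changepoint_prior_scale), which determines how flexible the trend is. Its default value is 0.05, and increasing it allows the trend to fit more closely to the observed data.
We plot the trend component and changepoints detected by our current model below.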
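For intuition about what a changepoint does, the Prophet paper describes the trend as piecewise linear: g(t) = (k + a(t)ᵀδ)·t + (m + a(t)ᵀγ). Here a_j(t) = 1 once t passes changepoint s_j, and γ_j = -s_j·δ_j keeps the curve continuous. changepoint_prior_scale is the scale τ of a Laplace prior on the rate changes, δ_j ~ Laplace(0, τ), so a larger τ lets the fitted δ_j be larger. The sketch below evaluates that formula with numpy. piecewise_linear_trend is a hypothetical helper, not Prophet's internal function.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Piecewise-linear trend in the form described for Prophet:
    g(t) = (k + a(t) . delta) * t + (m + a(t) . gamma), with gamma_j = -s_j * delta_j,
    where a_j(t) = 1 once t >= s_j and 0 before.
    """
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    a = (t[:, None] >= s[None, :]).astype(float)  # shape (len(t), len(s))
    gamma = -s * deltas  # offset adjustments that keep g(t) continuous at each s_j
    return (k + a @ deltas) * t + (m + a @ gamma)

# One changepoint at t = 0.5 that changes the slope from +1 to -1.
t = np.linspace(0.0, 1.0, 11)
g = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[0.5], deltas=[-2.0])
print(np.round(g, 2))  # rises to 0.5 at t = 0.5, then falls back to 0 at t = 1
```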
from prophet.plot import add_changepoints_to_plot
fig = m2.plot(forecast2)
a = add_changepoints_to_plot(fig.gca(), m2, forecast2)
The detected changepoints look reasonable, and the future trend tracks the latest upwards trend in activity, but not to the extent of late 2020. This seems suitable for a best guess of future activity.
We can see what the forecast would look like if we wanted to emphasize COVID patterns more in model training; we can do this by adding more potential changepoints after 2020 and making the trend more flexible.
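For comparison with the hand-picked m3_changepoints, the sketch below approximates Prophet's default placement of potential changepoints. With n_changepoints=25 and changepoint_range=0.8 (Prophet's defaults), the candidates are spread evenly by row over the first 80% of the history, with the first row skipped. No default candidate falls in the most recent 20% of the history. m3_changepoints keeps the same total of 25 but deliberately concentrates 15 of them between February 2020 and April 2021. default_changepoints is a hypothetical helper written from our reading of Prophet's behaviour, not its actual code.

```python
import numpy as np
import pandas as pd

def default_changepoints(history_ds, n_changepoints=25, changepoint_range=0.8):
    """Hypothetical approximation of Prophet's default changepoint placement:
    n_changepoints dates spread evenly (by row) over the first
    changepoint_range fraction of the history, skipping the first row."""
    ds = pd.Series(pd.to_datetime(history_ds)).sort_values().reset_index(drop=True)
    hist_size = int(np.floor(len(ds) * changepoint_range))
    cp_indexes = np.linspace(0, hist_size - 1, n_changepoints + 1).round().astype(int)
    return ds.iloc[cp_indexes[1:]].reset_index(drop=True)

# Synthetic daily history of 1,000 days (not the Melbourne dataset).
history = pd.date_range('2018-01-01', periods=1000, freq='D')
cps = default_changepoints(history)
print(len(cps), cps.iloc[0].date(), cps.iloc[-1].date())
```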
m3_changepoints = (
# 10 potential changepoints in 2.5 years
pd.date_range('2017-06-02', '2020-01-01', periods=10).date.tolist() +
# 15 potential changepoints in 1 year 2 months
pd.date_range('2020-02-01', '2021-04-01', periods=15).date.tolist()
)
# Default changepoint_prior_scale is 0.05, so 1.0 will lead to much more flexibility in comparison.
m3 = Prophet(holidays=lockdowns, changepoints=m3_changepoints, changepoint_prior_scale=1.0)
m3 = m3.fit(df)
forecast3 = m3.predict(future2)
from prophet.plot import add_changepoints_to_plot
fig = m3.plot(forecast3)
a = add_changepoints_to_plot(fig.gca(), m3, forecast3)
We're seeing many changepoints detected post-COVID, matching the various fluctuations from loosening / tightening lockdowns. Overall the trend curve and forecasted trend are quite similar to our previous model, but we're seeing a lot more uncertainty because of the higher number of trend changes we picked up in the history.
We probably wouldn't pick this model over the model with default parameters as a best estimate, but it's a good demonstration of how we can incorporate our beliefs into the model about which patterns are important to capture.
The seasonal component plots in the previous sections show a peak of activity on Friday compared to other days of the week. If we're not sure whether this will still hold post-lockdown, we can add conditional seasonalities to the model. Conditional seasonalities are explained in more detail here.
First we define boolean columns in the history dataframe to flag "pre covid" and "post covid" periods:
df2 = df.copy()
df2['pre_covid'] = pd.to_datetime(df2['ds']) < pd.to_datetime('2020-03-21')
df2['post_covid'] = ~df2['pre_covid']
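The same two flags are needed again for the future dataframe, and both frames must use the same cutoff date. A small helper avoids duplicating that logic. add_covid_flags is a hypothetical name used for illustration. It uses the same 2020-03-21 cutoff as df2.

```python
import pandas as pd

COVID_START = pd.Timestamp('2020-03-21')  # same cutoff as used for df2

def add_covid_flags(frame, cutoff=COVID_START):
    """Hypothetical helper: return a copy of `frame` with boolean
    pre_covid / post_covid columns derived from its `ds` column."""
    out = frame.copy()
    out['pre_covid'] = pd.to_datetime(out['ds']) < cutoff
    out['post_covid'] = ~out['pre_covid']
    return out

demo = add_covid_flags(pd.DataFrame({'ds': ['2020-03-20', '2020-03-21', '2021-01-01']}))
print(demo)
```

The cutoff day itself (2020-03-21) is flagged as post_covid, matching the strict `<` comparison in the original cells.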
The conditional seasonality we're interested in modelling here is the day-of-week ("weekly") seasonality. To do this, we first turn off the default weekly_seasonality when we create the Prophet model.
m4 = Prophet(holidays=lockdowns, weekly_seasonality=False)
We then add this weekly seasonality manually as two different model components: one for pre-COVID and one for post-COVID. Note that fourier_order=3 is the default setting for weekly seasonality. After this we can run .fit().
m4.add_seasonality(
name='weekly_pre_covid',
period=7,
fourier_order=3,
condition_name='pre_covid',
)
m4.add_seasonality(
name='weekly_post_covid',
period=7,
fourier_order=3,
condition_name='post_covid',
);
m4 = m4.fit(df2)
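As we understand Prophet's implementation, each seasonality is represented by Fourier features: sine/cosine pairs of 2πn·t/period, with t in days. A fourier_order of 3 gives 6 columns. For a conditional seasonality, those features are set to zero on every row where the condition column is False. weekly_pre_covid and weekly_post_covid therefore each get their own coefficients, and only one of them is active on any given date. The sketch below reproduces that idea with numpy. fourier_features and conditional_features are hypothetical helpers, not Prophet's API.

```python
import numpy as np
import pandas as pd

def fourier_features(ds, period, order):
    """Fourier basis in the style used for Prophet seasonalities:
    sin/cos pairs of 2*pi*n*t/period for n = 1..order, t in days since 1970-01-01."""
    ds = pd.Series(pd.to_datetime(ds))
    t = (ds - pd.Timestamp('1970-01-01')).dt.total_seconds().to_numpy() / 86400.0
    cols = []
    for n in range(1, order + 1):
        cols.append(np.sin(2 * np.pi * n * t / period))
        cols.append(np.cos(2 * np.pi * n * t / period))
    return np.column_stack(cols)

def conditional_features(ds, condition, period=7, order=3):
    """Zero out the seasonal features on rows where `condition` is False,
    so this seasonality only contributes on the flagged dates."""
    X = fourier_features(ds, period, order)
    X[~np.asarray(condition, dtype=bool)] = 0.0
    return X

dates = pd.date_range('2020-03-14', '2020-03-28', freq='D')
pre = dates < pd.Timestamp('2020-03-21')
X_pre = conditional_features(dates, pre)    # active before the cutoff only
X_post = conditional_features(dates, ~pre)  # active from the cutoff onwards
print(X_pre.shape, X_post.shape)  # (15, 6) (15, 6)
```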
We also need to create the pre_covid and post_covid flags in the future dataframe. This is so that Prophet can apply the correct weekly seasonality parameters to each future date.
future4 = m4.make_future_dataframe(periods=366)
future4['pre_covid'] = pd.to_datetime(future4['ds']) < pd.to_datetime('2020-03-21')
future4['post_covid'] = ~future4['pre_covid']
forecast4 = m4.predict(future4)
m4.plot(forecast4)
plt.axhline(y=0, color='red')
plt.title('Lockdowns as one-off holidays + Conditional weekly seasonality');
m4.plot_components(forecast4);
Interestingly, the model with conditional seasonalities suggests that, post-COVID, pedestrian activity peaks on Saturdays, instead of Fridays. This could make sense if most people are still working from home and are hence less likely to go out on Friday nights. From a prediction perspective this would only be important if we care about predicting weekdays vs. weekends accurately, but overall this kind of exploration helps us gain insight into how COVID has changed behaviours.
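To check the peak day numerically rather than by eye, we can average a weekly component by day name. The sketch below does this with pandas. peak_weekday is a hypothetical helper, and it assumes that, as for the built-in components, the forecast has a column named after each seasonality (for example weekly_post_covid). It is demonstrated on a synthetic frame rather than on forecast4.

```python
import pandas as pd

def peak_weekday(forecast, component, start=None, end=None):
    """Hypothetical helper: name of the weekday with the highest mean value of
    `component`, optionally restricted to start <= ds < end."""
    ds = pd.to_datetime(forecast['ds'])
    mask = pd.Series(True, index=forecast.index)
    if start is not None:
        mask &= ds >= pd.Timestamp(start)
    if end is not None:
        mask &= ds < pd.Timestamp(end)
    means = forecast.loc[mask].groupby(ds[mask].dt.day_name())[component].mean()
    return means.idxmax()

# Synthetic stand-in for a forecast: the component is 1.0 on Saturdays, 0.0 otherwise.
dates = pd.date_range('2021-01-04', periods=14, freq='D')
fake = pd.DataFrame({'ds': dates,
                     'weekly_post_covid': [1.0 if d.day_name() == 'Saturday' else 0.0 for d in dates]})
print(peak_weekday(fake, 'weekly_post_covid'))  # Saturday

# With the real model (assumed column name):
# peak_weekday(forecast4, 'weekly_post_covid', start='2020-03-21')
```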
A lot of the content in this page was inspired by this GitHub discussion. We've covered a few pieces of low-hanging fruit for tweaking Prophet models when faced with shocks such as COVID, but there are many other possible approaches as well, such as:
Overall, though, it's difficult to be confident in our forecasts in these environments, where rules are constantly changing and outbreaks occur randomly. In this scenario it's more important to constantly re-train and re-evaluate our models, and to clearly communicate the increased uncertainty in forecasts.