Leveraging Time-Series Segmentation and Machine Learning for Better Forecasting Accuracy

ODSC - Open Data Science
4 min read · Mar 17, 2023


Several papers have discussed the importance of segmenting time series into groups and modeling each group separately to improve overall forecasting accuracy. But what does this look like in practice? And why not simply use an AutoML (Automated Machine Learning) package or an auto-forecasting tool and let it do the job for you?

The answer is quite simple, but we need to dig a bit deeper to understand it. An AutoML tool will usually take all the data you have available, develop several models, and then select the best-performing one as a global ‘champion’ to generate forecasts for all time series. An auto-forecasting tool will usually compare various statistical models (sometimes deep-learning models as well) for each time series and then select the best-performing one, based on the user’s criteria, to model that specific series.
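To make the difference concrete, here is a minimal sketch in plain Python. It is not how any particular AutoML or auto-forecasting product works: the three toy forecasters (`naive`, `historical_mean`, `drift`), the holdout split, and the made-up series are all assumptions chosen for illustration, standing in for the much richer model families such tools compare.

```python
from statistics import mean

# Three deliberately simple stand-in forecasters (hypothetical candidates).
def naive(history, horizon):
    return [history[-1]] * horizon

def historical_mean(history, horizon):
    return [mean(history)] * horizon

def drift(history, horizon):
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]

CANDIDATES = {"naive": naive, "mean": historical_mean, "drift": drift}

def mae(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def holdout_errors(series, horizon):
    # Hold out the last `horizon` points and score every candidate on them.
    train, test = series[:-horizon], series[-horizon:]
    return {name: mae(test, f(train, horizon)) for name, f in CANDIDATES.items()}

def global_champion(all_series, horizon):
    # One model for everything: lowest summed holdout error across all series.
    totals = {name: 0.0 for name in CANDIDATES}
    for series in all_series.values():
        for name, err in holdout_errors(series, horizon).items():
            totals[name] += err
    return min(totals, key=totals.get)

def per_series_champions(all_series, horizon):
    # One model per series: lowest holdout error for that series alone.
    champions = {}
    for key, series in all_series.items():
        errs = holdout_errors(series, horizon)
        champions[key] = min(errs, key=errs.get)
    return champions

# Made-up toy data.
toy = {
    "trending": [10, 12, 14, 16, 18, 20, 22, 24],
    "flat": [6, 4, 7, 5, 6, 4, 6, 5],
}
print(global_champion(toy, horizon=2))
print(per_series_champions(toy, horizon=2))
```

On this toy data the global champion is `drift`, yet the flat series is better served by `mean`, which is exactly the gap that per-series (or per-segment) selection is meant to close.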

However, we already know that:

  1. Machine Learning models deliver better results in terms of accuracy when we are dealing with interrelated series and complex patterns in our data.
  2. Machine Learning and Deep Learning models are much more computationally intensive compared to statistical methods, which makes them more expensive to run and scale in the cloud.
  3. Classical statistical methods (such as Exponential Smoothing and ARIMA) can outperform more complex techniques when we have enough data history and specific characteristics in our data such as trend and seasonality.

Wouldn’t it be great to segment the time series first and then apply the appropriate modeling strategy to each group, delivering optimal results in terms of both accuracy and computational efficiency? Absolutely! So how do we choose among the many clustering methods available?

While there are many different methods to explore (such as similarity-based, component-based, and k-means clustering), SAS Visual Forecasting provides a demand classification template in which time series are grouped based on their historical demand patterns and an appropriate pre-selected modeling method is applied to each group.

The groups are formed following the flow below (for a more detailed explanation have a look at the documentation here).
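The exact rules of the SAS template are described in its documentation and are not reproduced here. Purely to illustrate the general segment-then-route idea, the sketch below groups series with three simple heuristics: history length, share of zero-demand periods, and autocorrelation at the seasonal lag. The thresholds, group labels, and routing table are all hypothetical choices made for this example, not the SAS classification logic.

```python
from statistics import mean

def autocorrelation(values, lag):
    # Sample autocorrelation at the given lag (0.0 for degenerate input).
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0 or lag >= len(values):
        return 0.0
    num = sum((values[i] - m) * (values[i - lag] - m)
              for i in range(lag, len(values)))
    return num / denom

def classify(values, season_length=12, min_history=24,
             max_zero_share=0.5, seasonal_acf=0.5):
    # All thresholds are illustrative assumptions, not tuned values.
    if len(values) < min_history:
        return "short_history"
    zero_share = sum(1 for v in values if v == 0) / len(values)
    if zero_share > max_zero_share:
        return "intermittent"
    if autocorrelation(values, season_length) > seasonal_acf:
        return "year_round_seasonal"
    return "year_round_non_seasonal"

# Hypothetical routing of each group to a modeling strategy.
ROUTING = {
    "short_history": "naive or moving-average model",
    "intermittent": "intermittent-demand model",
    "year_round_seasonal": "seasonal statistical model",
    "year_round_non_seasonal": "non-seasonal statistical or ML model",
}

# Made-up monthly series.
examples = {
    "new_product": [3, 5, 4],
    "spare_part": [0, 0, 0, 5] * 9,
    "ice_cream": [10, 12, 15, 20, 30, 45, 60, 45, 30, 20, 15, 12] * 3,
    "steady_grower": list(range(100, 136)),
}
for name, values in examples.items():
    group = classify(values)
    print(name, "->", group, "->", ROUTING[group])
```

The point of the design is that the cheap checks run once per series, so the expensive models only need to be trained on the groups where they are likely to pay off.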

Now let’s take this a step further. The default pre-selected models and settings can already improve forecasting accuracy by quite a bit, but a further significant uplift can be realized by modifying the automatically generated pipelines.

In the short clip below we showcase the logic. In SAS VF you can have many competing forecasting pipelines. In our example, we use two: the auto-forecasting template, which selects the best-performing method from various statistical models, and the demand classification template, which implements the flow presented above and creates a nested forecasting pipeline for each group that is formed.

In the video, you can see that 519 time series are classified as ‘Year Round — Non-Seasonal’, which means that these series have a long time span without seasonal patterns. This is a great opportunity to try machine learning and other advanced techniques and see whether they improve forecasting accuracy. In our example we compared four options: a gradient boosting model, a panel series neural network, a hybrid technique called Stacked Model (which combines the results of a neural network with those of statistical time series techniques), and the pre-selected non-seasonal model node, which runs a series of non-seasonal statistical time series methods and selects the best-performing one.
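How the Stacked Model combines its components internally is not covered in this post. As a generic illustration of the stacking idea only, the sketch below learns a single blending weight between two sets of forecasts on a validation window by least squares, then applies it. The function names and the numbers (standing in for neural network and statistical forecasts) are made up for this example.

```python
def blend_weight(actual, forecast_a, forecast_b):
    # Least-squares weight w for w * a + (1 - w) * b, clipped to [0, 1].
    num = sum((y - b) * (a - b)
              for y, a, b in zip(actual, forecast_a, forecast_b))
    den = sum((a - b) ** 2 for a, b in zip(forecast_a, forecast_b))
    if den == 0:
        return 0.5  # identical forecasts: any weight gives the same blend
    return min(1.0, max(0.0, num / den))

def blend(forecast_a, forecast_b, weight):
    return [weight * a + (1 - weight) * b
            for a, b in zip(forecast_a, forecast_b)]

# Made-up validation data: one model runs high, the other runs low.
actual = [100, 110, 120]
neural_net = [104, 114, 124]   # hypothetical neural network forecasts
statistical = [96, 106, 116]   # hypothetical statistical forecasts

w = blend_weight(actual, neural_net, statistical)
print(w, blend(neural_net, statistical, w))
```

Here the two models err in opposite directions, so an equal blend cancels the errors; real stacking approaches typically learn richer combinations, but the motivation is the same.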

For the data we used, the Stacked Model reduces the Weighted Mean Absolute Error (WMAE) by 5.5% without much tuning of the algorithm, which is not bad at all! After implementing our changes, the demand classification pipeline reduces the overall error of our forecasting process by approximately 21% compared to the auto-forecasting pipeline, which is quite impressive!
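The post does not spell out how WMAE is computed, so the sketch below assumes one common form: absolute errors weighted per observation and divided by the sum of the weights. It also shows how a relative error reduction between two pipelines is calculated. The weights and numbers are invented for illustration and are unrelated to the results above.

```python
def wmae(actual, predicted, weights):
    # Assumed definition: sum(w_i * |y_i - yhat_i|) / sum(w_i).
    weighted_errors = sum(w * abs(a - p)
                          for a, p, w in zip(actual, predicted, weights))
    return weighted_errors / sum(weights)

def relative_reduction(baseline_error, new_error):
    # Fraction by which the new error is lower than the baseline error.
    return (baseline_error - new_error) / baseline_error

# Made-up example: the second series matters three times as much.
actual = [100, 200]
weights = [1, 3]
baseline = wmae(actual, [90, 220], weights)   # hypothetical pipeline A
improved = wmae(actual, [95, 210], weights)   # hypothetical pipeline B
print(baseline, improved, relative_reduction(baseline, improved))
```

Because the reduction is relative to the baseline, a figure like 21% only makes sense when both pipelines are scored on the same series, horizon, and weights.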

Once we are happy with our forecasts, we can also add business knowledge that is not captured in the data by using the “Overrides” tab, and then explore the key information of our project in the automated report generated in the “Insights” tab, which can be exported as a PDF and shared with other business units. Job done!

If you want to try out this advanced capability, make sure to check out the SAS Viya Free Trial, which gives you access to the platform in the cloud for 14 days, with the option to extend if needed.

Happy forecasting!

About the Author:

Spiros Potamitis is a data scientist with extensive experience in the development and implementation of advanced analytics solutions across different industries. He provides subject matter expertise in the areas of forecasting, machine learning, and AI. Spiros has also worked in and led advanced analytics teams in various sectors such as credit risk, customer insights, and CRM.

Originally posted on OpenDataScience.com

