More Accurate Promotion Forecasting with Machine Learning

Oct 18, 2017 6 min

The importance of sales promotions as drivers of consumer behavior has grown over the last few decades, and a significant proportion of all retail sales are now made through promotions. The increased use of digital marketing with personalized, time-dependent offers only amplifies this phenomenon. The management and forecasting of promotional activities must inevitably keep pace with this trend, which requires sophisticated promotion forecasting and planning methods.

In this whitepaper, we’ll take a look at how moving from a traditional category-based model to one that uses machine learning increases forecasting accuracy and supports promotion planning.

Categorization Approach and Its Challenges

Traditionally, a widely adopted method for forecasting promotions has been to categorize promotions into distinct groups based on their characteristics: for example, the price reduction offered, the kind of special in-store display used, and the marketing activities executed during the promotion. An upcoming promotion then gets its forecast from the average sales performance of the group of past promotions with similar characteristics.

This method is already quite sophisticated, especially if the groups can be defined freely, for example based on product categories or location: bread can respond differently to promotional activities than candy, and small stores can respond differently than large stores. Existing best-in-class promotion forecasting solutions can also intelligently search for an appropriate grouping when exactly matching data isn’t available. For example, let’s say there is no historical data for promotions with the exact same characteristics for a certain SKU-store combination, or that the number of those promotions is too low for the estimate to be statistically significant. Best-in-class solutions can then search for the most closely matching group of promotions to derive as accurate a forecast as possible, starting from the SKU-store level and moving up through, for example, the product and location hierarchies as well as partially matching groups.
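To make the fallback idea concrete, here is a minimal sketch of a category-average forecast that starts at the SKU-store level and widens the reference group until enough promotions are found. The data, the field layout, the hierarchy levels and the `MIN_SAMPLES` threshold are all invented for illustration; a real solution would use its own hierarchy and a proper significance check.

```python
from statistics import mean

# Hypothetical promotion history: (sku, store, category, promo_type, uplift).
# All values are invented for illustration only.
HISTORY = [
    ("bread-01", "store-A", "bakery", "coupon/20-40%", 1.8),
    ("bread-01", "store-B", "bakery", "coupon/20-40%", 2.0),
    ("bread-02", "store-A", "bakery", "coupon/20-40%", 2.2),
    ("bread-02", "store-B", "bakery", "coupon/20-40%", 2.4),
    ("candy-01", "store-A", "sweets", "coupon/20-40%", 3.6),
]

MIN_SAMPLES = 3  # assumed threshold for a "reliable enough" average


def category_forecast(sku, store, category, promo_type):
    """Return (level, average uplift) from the most specific group with enough data."""
    # Levels are ordered from most to least specific (an assumed hierarchy).
    levels = [
        ("sku-store", lambda r: r[0] == sku and r[1] == store),
        ("sku", lambda r: r[0] == sku),
        ("category", lambda r: r[2] == category),
        ("all", lambda r: True),
    ]
    for name, matches in levels:
        uplifts = [r[4] for r in HISTORY if r[3] == promo_type and matches(r)]
        if len(uplifts) >= MIN_SAMPLES:
            return name, mean(uplifts)
    return None, None  # no level has enough reference promotions of this type


level, uplift = category_forecast("bread-01", "store-A", "bakery", "coupon/20-40%")
print(level, round(uplift, 2))
```

In this toy data set there is only one matching promotion at the SKU-store level and two at the SKU level, so the forecast falls back to the average of the whole product category.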

Figure 1. Example of a set of promotions and categories for each promotion. With a categorical approach, the best forecast is achieved by using an exactly matching set of promotions as reference data. However, that approach neglects partially matching data (for example, ‘Coupon / 20-40% price-cut’ is a different category than ‘Coupon / 20-40% price-cut / Multibuy’ or ‘Coupon / No-price-cut’).

However, this category-based approach also has some drawbacks:

  • The limitations of categorizing: Let’s say the price reductions of promotions are grouped into the categories “0-20%”, “20-40%” and “over 40%”. The forecasted effects of price reductions of 21% and 39% are then equal, even though they should probably be closer to the effects of the categories “0-20%” and “over 40%” respectively. Defining more fine-grained categories, however, reduces the reliability and accuracy of your promotion forecasts, as the amount of data behind each distinct category shrinks. Adding manual rules, on the other hand, unnecessarily increases the complexity of your promotion forecasting setup and makes it a lot harder to manage.
  • Utilization of the data: Using only exactly matching reference promotions (e.g. same price reduction category, same in-store display and same marketing activities) as input in your promotion forecasting model neglects a big chunk of data that partially matches the upcoming promotion (e.g. different price reduction category, but same in-store display and same marketing activities) and that would be valuable information for promotion forecasting.
  • Big data isn’t always meaningful data: When going through promotion master data, it’s not unusual to find promotions that belong to, for example, more than 10 categories, all of which are ‘required’ to be used in forecasting: we live in the big data era, after all. However, big data doesn’t always mean meaningful data. It may be that only three or four of those categories contribute to sales performance, or that the 10 categories could be combined into a smaller number of categories for forecasting purposes. With the category-based approach, it’s not immediately clear which categories are relevant to forecasting and which are not.
  • Simultaneous promotions with different categorizations: Promotion data can include promotions that overlap with each other, which makes it difficult to say which part of the promotion performance is attributable to which promotion and category combination.
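The first drawback in the list above is easy to see in a few lines of code. The sketch below maps discounts to the three example categories and looks up an average uplift per category. The uplift figures are invented, and we assume that a discount of exactly 20% or 40% falls into the lower category.

```python
def price_cut_category(discount):
    """Map a discount fraction to the example price-cut categories."""
    if discount <= 0.20:  # assumption: boundary values belong to the lower category
        return "0-20%"
    if discount <= 0.40:
        return "20-40%"
    return "over 40%"


# Invented average uplift per category, for illustration only.
CATEGORY_UPLIFT = {"0-20%": 1.3, "20-40%": 2.0, "over 40%": 3.1}

for discount in (0.19, 0.21, 0.39, 0.41):
    category = price_cut_category(discount)
    print(f"{discount:.0%} -> {category}: forecast uplift {CATEGORY_UPLIFT[category]}")
```

Discounts of 21% and 39% receive exactly the same forecast, while the forecast jumps between 19% and 21% and again between 39% and 41%, even though each of those pairs differs by only two percentage points.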

Machine Learning Increases Forecast Accuracy and Supports Promotion Planning

Machine learning overcomes the challenges of the category-based approach. Machine learning algorithms identify and define the relationships between variables (in this case promotion attributes) and outcomes (sales). In practice, machine learning algorithms constantly adjust the models for the relationships between demand-influencing factors and actual demand. This minimizes the difference between the data and the model’s output, thus trying to ‘explain’ the data within the given parameters. So instead of asking, “How have similar promotions performed in the past?” you could instead ask, “What’s the demand impact of a 24% price discount compared to the impact of a special display? What would be the effect of both combined?” Ultimately, machine learning tackles common challenges by:

  • Utilizing continuous variables instead of category variables: Establishing a continuous relationship between, for example, the price decrease and demand allows you to estimate the discount’s effect on demand more precisely. No need to categorize anything or add any heuristics or manual rules!
  • Separating the effects from one another: Machine learning estimates the effects of individual variables as well as combinations of those variables and thus draws insights from the whole data set, not only from similar promotions.
  • Separating meaningful data from the noise: Machine learning allows you to determine which variables have a significant impact on promotional sales and which don’t. That information can then be used in forecasting.
Figure 2. The forecast model generated and constantly updated using machine learning separates the impact of different promotion attributes. Available data is better utilized as the model is based on almost 200,000 data points.
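As a rough illustration of these ideas, the sketch below fits a model with a continuous discount variable, a display flag and their interaction, so that the effect of each attribute and of their combination is estimated separately. It uses ordinary least squares on the logarithm of the sales uplift as a simple stand-in; the article does not say which learning algorithm is actually used. The training data is synthetic and noise-free, generated from the invented effects in `TRUE_BETA`, and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def features(discount, display):
    """Intercept, continuous discount, display flag, and their interaction."""
    return [1.0, discount, float(display), discount * display]


# Synthetic, noise-free data generated from known (invented) effects.
TRUE_BETA = [0.0, 2.0, 0.4, 1.0]
promos = [(d / 100, disp) for d in range(0, 50, 5) for disp in (0, 1)]
X = [features(d, disp) for d, disp in promos]
y = [sum(b * f for b, f in zip(TRUE_BETA, row)) for row in X]  # log of sales uplift

beta = fit_least_squares(X, y)


def predict_uplift(discount, display):
    """Predicted sales multiplier for any discount, not just the observed ones."""
    log_uplift = sum(b * f for b, f in zip(beta, features(discount, display)))
    return math.exp(log_uplift)


print([round(b, 3) for b in beta])
print(predict_uplift(0.24, 0), predict_uplift(0.24, 1))
```

Because the discount enters as a continuous variable, the model can answer the question about a 24% discount directly, with or without a display, even though no promotion in the training data had exactly that discount.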

Benchmarking machine learning-based promotional forecasts against previous best practice reveals significant benefits. We have seen forecast errors decrease by up to 15% and forecasts stabilize substantially, meaning that the number of high error peaks has decreased significantly.

The added value here isn’t only about forecasting more accurately: the promotion planning process can also be improved by using machine learning. Knowing the effect of individual promotion characteristics on the demand of each individual SKU-store combination allows planning to be done in a more refined way, by concentrating promotional activities where they matter most. For example, there’s no point in adding heavy discounts or marketing activities to SKUs that respond poorly to these tactics. Turning this around, it’s possible to optimize the price decrease and promotional activities to meet certain sales targets, e.g. in markdown optimization, where the target is to sell out the existing inventory. These activities help to increase the efficiency of the promotion planning and management process, and ultimately to increase profit.
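A minimal sketch of this kind of target-driven planning, assuming a fitted response of the form uplift = exp(coef × discount) for each SKU. The coefficients, SKU names, the 70% discount cap and the 1% search step are all invented for illustration.

```python
import math

# Hypothetical fitted discount responses per SKU (invented coefficients).
RESPONSE = {"sku-responsive": 3.0, "sku-unresponsive": 0.3}


def expected_units(baseline_units, coef, discount):
    """Forecast units sold under the assumed response: baseline * exp(coef * discount)."""
    return baseline_units * math.exp(coef * discount)


def smallest_discount_for_target(baseline_units, coef, target_units, max_discount=0.70):
    """Return the smallest discount (in 1% steps) whose forecast meets the target, or None."""
    for pct in range(0, int(round(max_discount * 100)) + 1):
        discount = pct / 100
        if expected_units(baseline_units, coef, discount) >= target_units:
            return discount
    return None  # target not reachable within the allowed discount range


for sku, coef in RESPONSE.items():
    print(sku, smallest_discount_for_target(100, coef, 200))
```

For the responsive SKU, the search finds a discount that doubles sales; for the unresponsive one it returns `None`, which signals that a price cut alone won’t reach the target and the promotional effort is better spent elsewhere.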

Best Practices for Forecasting Promotions

When implementing a promotion forecasting process, the golden rule of “garbage in, garbage out” applies, and it holds even more strongly for more sophisticated promotion forecasting models.

Therefore, a good first step is to ensure that the promotion master data is in good shape: promotion prices are correct, and promotion characteristics are thoroughly and systematically set up. This isn’t as self-evident as it sounds: manual typos are quite common in promotion master data, e.g. a promotion price of 49.9 instead of 4.99, or ‘tv marketing’ instead of ‘tv-marketing’. Cleaning the master data really pays off when automating the promotion forecasting process. Read our blog post on mastering master data to dig deeper into the subject.
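Checks like these are straightforward to automate. The sketch below shows two simple ones: normalizing free-text promotion labels so that spelling variants collapse into one value, and flagging promotion prices that look implausible next to the regular price. The function names and the `min_ratio` threshold are assumptions made for illustration, not a description of any particular product.

```python
import re


def normalize_label(label):
    """Lower-case a label and collapse runs of spaces, underscores and hyphens into one hyphen."""
    return re.sub(r"[\s_-]+", "-", label.strip().lower())


def price_looks_wrong(promo_price, regular_price, min_ratio=0.1):
    """Flag promo prices above the regular price or below min_ratio of it (assumed rule)."""
    return promo_price > regular_price or promo_price < min_ratio * regular_price


print(normalize_label("tv marketing"), normalize_label("TV-Marketing"))
print(price_looks_wrong(49.9, 5.99), price_looks_wrong(4.99, 5.99))
```

A promotion price of 49.9 against a regular price of 5.99 is flagged for review, while 4.99 passes. Flagged rows are best reviewed by a person rather than corrected automatically, since the rule cannot tell which digit was mistyped.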

The second step is to implement machine learning for demand forecasting to automate the process. Nobody wants to spend countless hours going through the different models on different levels for possibly millions of SKU-store combinations.

Once the promotion forecasting process has been implemented satisfactorily, the results can also be put to use outside forecasting: start planning promotional activities based on the results of the automated analysis.

Figure 3. Results of machine learning coded into ‘Responsiveness indices (scale 1-5)’, to act as part of the promotion planning process.

Written by

Aki Ali-Vehmas

Data Scientist