More Accurate Promotion Forecasting with Causal Modeling

Oct 18, 2017 6 min

The importance of sales promotions as drivers of consumer behavior has grown over the last decades, and a significant proportion of all retail sales are now made through promotions. The increased use of digital marketing with personalized, time-dependent offers only amplifies this phenomenon. The management and forecasting of promotional activities must inevitably keep pace with this trend, which calls for sophisticated promotion forecasting and planning methods.

In this whitepaper we’ll take a look at how moving from a traditional category-based model to using causal methods increases forecasting accuracy and supports promotion planning.

Categorization Approach and Its Challenges

Traditionally, a widely adopted method for forecasting promotions has been to categorize promotions into distinct groups based on their characteristics, for example what price reduction the promotion had, what kind of special in-store display was used and what kind of marketing activities were executed during the promotion. The upcoming promotion would then get its promotion forecast based on the average sales performance of the group of promotions with similar characteristics.

This method is already quite sophisticated, especially if the groups can be defined freely, for example based on product categories or location: bread can respond differently to promotional activities than candy, and small stores can respond differently than large stores. Existing best-in-class promotion forecasting solutions can also intelligently search for an appropriate grouping when exactly matching data isn’t available. For example, let’s say there is no historical data for promotions with the exact same characteristics for a certain SKU-store combination, or that the number of those promotions is too low for the estimate to be statistically significant. The best-in-class solutions can then search for the most closely matching group of promotions to derive as accurate a promotion forecast as possible, starting from the SKU-store level and then moving up, for example, the product and location hierarchies and considering partially matching groups.

Figure 1. Example of a set of promotions and categories for each promotion. With a categorical approach, the best forecast is achieved by using an exactly matching set of promotions as reference data. However, that approach neglects partially matching data (for example, ‘Coupon / 20-40% price-cut’ is a different category than ‘Coupon / 20-40% price-cut / Multibuy’ or ‘Coupon / No-price-cut’).
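The category-based lookup with a fallback through the hierarchy can be sketched roughly as follows. This is a minimal illustration, not how any particular solution implements it: the history rows, the uplift values, the three fallback levels and the `MIN_OBSERVATIONS` threshold are all assumptions made up for the example.

```python
from statistics import mean

# Hypothetical promotion history: (sku, store, category, observed uplift factor).
HISTORY = [
    ("bread", "store_1", ("coupon", "20-40%"), 1.8),
    ("bread", "store_1", ("coupon", "20-40%"), 2.0),
    ("bread", "store_2", ("coupon", "20-40%"), 1.6),
    ("bread", "store_2", ("coupon", "0-20%"), 1.2),
    ("candy", "store_1", ("coupon", "20-40%"), 3.0),
]

MIN_OBSERVATIONS = 2  # assumed minimum number of reference promotions


def category_forecast(sku, store, category):
    """Average uplift of exactly matching promotions, widening the scope if data is thin."""
    levels = [
        ("sku-store", lambda row: row[0] == sku and row[1] == store),
        ("sku", lambda row: row[0] == sku),
        ("all", lambda row: True),
    ]
    for level_name, in_scope in levels:
        uplifts = [row[3] for row in HISTORY if in_scope(row) and row[2] == category]
        if len(uplifts) >= MIN_OBSERVATIONS:
            return level_name, mean(uplifts)
    return None, None  # no level has enough exactly matching promotions


# Enough history on the SKU-store level, so no fallback is needed.
print(category_forecast("bread", "store_1", ("coupon", "20-40%")))
# Only one matching promotion in store_2, so the SKU level is used instead.
print(category_forecast("bread", "store_2", ("coupon", "20-40%")))
```

Note that the ‘0-20%’ bread promotion never contributes to a ‘20-40%’ forecast, however thin the data gets: only exact category matches count.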

However, this category-based approach also has some drawbacks:

  • The limitations of categorizing: Let’s say the price reductions of promotions are categorized into “0-20%”, “20-40%” and “over 40%”. This leads to a situation where the forecasted effects of price reductions of 21% and 39% are equal, even though they probably should be closer to the effects of the “0-20%” and “over 40%” categories respectively. However, defining more fine-grained categories reduces the reliability and accuracy of your promotion forecasts, as the amount of data behind each distinct category is reduced. Adding manual rules, on the other hand, increases the complexity of your promotion forecasting setup and makes it a lot harder to manage.
  • Utilization of the data: Using only exactly matching reference promotions (e.g. same price reduction category, same in-store display and same marketing activities) as input in your promotion forecasting model neglects a big chunk of data that partially matches the upcoming promotion (e.g. different price reduction category, but same in-store display and same marketing activities) and that would be valuable information for promotion forecasting.
  • Big data isn’t always meaningful data: When going through promotion master data, it is quite common to find promotions that belong to, for example, more than 10 categories, all of which are ‘required’ to be used in forecasting: we live in the big data era after all. However, big data doesn’t always mean meaningful data: it may be that only three or four of those categories contribute to sales performance, or that the 10 categories could be combined into a smaller number of categories for forecasting purposes. With the category-based approach it’s not immediately clear which categories are relevant for forecasting and which are not.
  • Simultaneous promotions with different categorizations: Promotion data can include promotions that overlap with each other, which makes it difficult to say which part of the promotion performance is attributable to which promotion and category combination.
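The first drawback is easy to see with a few lines of code. The sketch below uses the three price reduction categories from the example above; the uplift factor per category and the handling of the category boundaries (exactly 20% counted as “0-20%”, exactly 40% as “20-40%”) are assumptions for illustration only.

```python
# Hypothetical average uplift factor observed for each price reduction category.
CATEGORY_UPLIFT = {"0-20%": 1.2, "20-40%": 1.8, "over 40%": 2.6}


def discount_category(discount_pct):
    """Map a price reduction in percent to its category (boundary handling is assumed)."""
    if discount_pct <= 20:
        return "0-20%"
    if discount_pct <= 40:
        return "20-40%"
    return "over 40%"


def categorical_forecast(baseline_units, discount_pct):
    """Forecast promotional sales as baseline sales times the category's average uplift."""
    return baseline_units * CATEGORY_UPLIFT[discount_category(discount_pct)]


for pct in (20, 21, 39, 41):
    print(pct, discount_category(pct), categorical_forecast(100, pct))
```

Discounts of 21% and 39% get exactly the same forecast, while moving from 20% to 21% or from 39% to 41% makes the forecast jump.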

Causal Modeling Increases Forecast Accuracy and Supports Promotion Planning

To overcome the drawbacks of the category-based approach, causal methods can be used. These methods are designed to extract the cause-effect relationships between a certain group of variables. In practice, causal methods fit a model to the input data so that the difference between the data and the model output is minimized, and thus try to ‘explain’ the data as well as possible with the given parameters. So, when forecasting promotional sales using causal methods, instead of asking “How have similar promotions performed in the past?”, you could ask “What’s the effect of a 24% price discount on demand, what’s the effect of a special display, or of both combined?”. In more detail, causal methods tackle the drawbacks identified above by:

  • Utilizing continuous variables instead of category variables: Establishing a continuous relationship between e.g. the price decrease and demand allows you to estimate the discount’s effect on demand more precisely. No need to categorize anything or add any heuristics or manual rules!
  • Separating the effects from one another: Causal methods can shed light on the estimated effects of individual variables and of combinations of those variables, thus drawing insights from the whole data set, not only from exactly matching promotions.
  • Separating meaningful data from the noise: Examining the results of causal methods, and automating that analysis, allows you to determine which variables have a significant impact on promotional sales and which don’t, and to use that information in forecasting.
Figure 2. Causal model, with each distinct promotion category effect separated. The data is utilized much more deeply: the model consists of almost 200 000 data points.
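To make the idea concrete, here is a minimal sketch of one possible causal model: a log-linear regression of sales on a continuous discount and a display flag, fitted with ordinary least squares. The model form, the synthetic data and its ‘true’ coefficients are all assumptions chosen for illustration; the text above does not prescribe a specific model, and a real setup would use an established statistics library rather than the hand-written solver used here to keep the example self-contained.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic promotions generated from assumed "true" effects plus noise.
rng = random.Random(0)
TRUE_BASE, TRUE_DISCOUNT, TRUE_DISPLAY = math.log(100), 3.0, 0.4
rows, targets = [], []
for _ in range(200):
    discount = rng.uniform(0.0, 0.5)   # continuous price reduction, 0 to 50 %
    display = rng.choice([0, 1])       # special in-store display used or not
    noise = rng.gauss(0, 0.05)
    rows.append([1.0, discount, float(display)])
    targets.append(TRUE_BASE + TRUE_DISCOUNT * discount + TRUE_DISPLAY * display + noise)

base, b_discount, b_display = fit_least_squares(rows, targets)


def predicted_uplift(discount, display):
    """Estimated sales multiplier relative to a week without promotion."""
    return math.exp(b_discount * discount + b_display * display)


print("24% discount:          ", round(predicted_uplift(0.24, 0), 2))
print("display only:          ", round(predicted_uplift(0.0, 1), 2))
print("24% discount + display:", round(predicted_uplift(0.24, 1), 2))
```

Every promotion in the data contributes to both coefficients, whether or not it matches the upcoming promotion exactly, and a 24% discount gets its own estimate instead of the average of a “20-40%” bucket.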

Utilizing causal models in promotion forecasting and benchmarking them against the previous Historical Averages model has revealed significant benefits: We have seen forecast error decrease by up to 15 percentage points on average, and forecasts have become substantially more stable, meaning that the number of very high error peaks has decreased significantly.

The added value here isn’t only about forecasting more accurately: the promotion planning process can also be improved by using causal methods. Knowing the effect of individual promotion characteristics on the demand of each individual SKU-store combination allows planning to be done in a more refined way, by concentrating promotional activities where they matter the most. For example, there’s no point in adding heavy discounts or marketing activities to SKUs that respond poorly to price decreases or advertising. Turning that around, it’s possible to optimize the price decrease and promotional activities to meet certain sales targets, e.g. in the case of markdown sales, where the target is to sell out the existing inventory. These activities help to increase the efficiency of the promotion planning and management process, and ultimately to increase profit.
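As a rough illustration of planning towards a sales target, the sketch below inverts an assumed log-linear demand model, units = baseline × exp(coefficient × discount), to find the smallest discount expected to reach a target. The model form, the function name `discount_for_target` and the 70% cap on the discount are assumptions for this example, not something prescribed above.

```python
import math


def discount_for_target(baseline_units, target_units, discount_coef, max_discount=0.7):
    """Smallest discount (as a fraction) expected to reach the target, or None if out of reach.

    Assumes units = baseline_units * exp(discount_coef * discount).
    """
    if target_units <= baseline_units:
        return 0.0  # the target is expected to be met without any discount
    if discount_coef <= 0:
        return None  # this SKU does not respond to price decreases
    needed = math.log(target_units / baseline_units) / discount_coef
    return needed if needed <= max_discount else None


# A responsive SKU: doubling sales needs a discount of roughly 23 %.
print(discount_for_target(100, 200, discount_coef=3.0))
# A poorly responding SKU: no discount within the cap doubles sales.
print(discount_for_target(100, 200, discount_coef=0.5))
```

The second call returns None, which is the planning signal mentioned above: heavy discounts on that SKU would cost margin without moving enough volume.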

Best Practices for Forecasting Promotions

When implementing a promotion forecasting process, the golden rule of garbage in, garbage out applies. With more sophisticated promotion forecasting models, it is even more true.

Therefore, a good first step is to ensure that the promotion master data is in good shape: promotion prices are correct and promotion characteristics are thoroughly and systematically set up. This isn’t as self-evident as it sounds: typos from manual entry are quite typical in promotion master data, e.g. a promotion price of 49.9 instead of 4.99, or ‘tv marketing’ instead of ‘tv-marketing’. Cleaning the master data really pays off when automating the promotion forecasting process. Read our blog post on mastering master data to dig deeper into the subject.
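Two simple checks of this kind could look like the sketch below. The helper names, the normalization rule and the plausibility rule (a promotion price should be below the regular price) are assumptions for illustration; real master data usually needs more checks than these.

```python
import re


def normalize_label(label):
    """Lower-case a label and turn runs of spaces, underscores or hyphens into one hyphen."""
    return re.sub(r"[\s_-]+", "-", label.strip().lower())


def suspicious_prices(promo_prices, regular_price):
    """Return promotion prices that are not below the regular price."""
    return [price for price in promo_prices if price >= regular_price]


# 'tv marketing' and 'TV-Marketing' end up as the same characteristic.
print(normalize_label("tv marketing"), normalize_label("TV-Marketing"))
# With a hypothetical regular price of 5.99, the mistyped 49.9 is flagged for review.
print(suspicious_prices([4.99, 49.9, 4.49], regular_price=5.99))
```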

The second step is to establish a process where causal promotion forecasting models are estimated on multiple levels and the most appropriate of them is chosen. The acceptance criteria should consider at least the reliability of the estimates as well as the granularity of the data: usually the lower the level of the estimation, the better, meaning for example that in a typical retail setting, the SKU-store level yields the most accurate results. The promotion forecasting process should also be automated: nobody wants to spend countless hours going through the different models on different levels for possibly millions of SKU-store combinations.
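One way to automate that choice is sketched below: candidate models are listed from the most granular level upwards, and the first one that passes the acceptance criteria is used. The two criteria shown (a minimum number of observations and a maximum relative standard error of the coefficient), their thresholds and the level names are assumptions for this example.

```python
def choose_model(candidates, min_observations=30, max_relative_error=0.25):
    """Return the most granular candidate that passes the acceptance criteria.

    `candidates` must be ordered from the lowest level (e.g. SKU-store) upwards.
    Each candidate is a dict with the keys 'level', 'n_obs', 'coef' and 'std_err'.
    """
    for candidate in candidates:
        enough_data = candidate["n_obs"] >= min_observations
        reliable = (
            candidate["coef"] != 0
            and abs(candidate["std_err"] / candidate["coef"]) <= max_relative_error
        )
        if enough_data and reliable:
            return candidate
    return None  # nothing acceptable: fall back to e.g. a manual forecast


# Hypothetical estimates of the discount effect for one SKU-store combination.
candidates = [
    {"level": "sku-store", "n_obs": 8, "coef": 2.5, "std_err": 1.5},
    {"level": "sku-region", "n_obs": 45, "coef": 3.1, "std_err": 0.5},
    {"level": "category-chain", "n_obs": 900, "coef": 2.8, "std_err": 0.1},
]
print(choose_model(candidates)["level"])
```

Here the SKU-store estimate is rejected because it rests on too few promotions, so the next level up is used.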

Once the promotion forecasting process has been satisfactorily implemented, the results can also be taken outside forecasting: start planning promotional activities based on the results of the automated analysis.

Figure 3. Results of causal methods coded into ‘Responsiveness indices (scale 1-5)’, to act as part of the promotion planning process.
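How the indices in the figure were coded is not described here, so the following is only one plausible scheme: rank the SKUs by their estimated coefficient and split the ranking into five equally sized groups. The function name and the coefficient values are hypothetical.

```python
def responsiveness_indices(coefs):
    """Map each SKU's estimated coefficient to an index from 1 (least) to 5 (most responsive).

    Assumed coding: SKUs are ranked by coefficient and split into five equal-sized groups.
    """
    ranked = sorted(coefs, key=coefs.get)
    n = len(ranked)
    return {sku: 1 + (5 * position) // n for position, sku in enumerate(ranked)}


# Hypothetical estimated discount coefficients per SKU.
discount_coefs = {"bread": 0.2, "candy": 3.1, "milk": 1.0, "soda": 2.0, "rice": 0.6}
print(responsiveness_indices(discount_coefs))
```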

Written by

Aki Ali-Vehmas

Data Scientist