Introduction: What is a Good Level of Forecast Accuracy?
“What would you consider a good level of forecast accuracy in our business?” is probably the single most frequent question we get from customers, consultants and other business experts alike.
Unfortunately, we feel that isn’t the right question to ask. Firstly, because in any retail or supply chain planning context, forecasting is always a means to an end, not the end itself. We need to keep in mind that a forecast is relevant only in its capacity of enabling us to achieve other goals, such as improved on-shelf availability, reduced food waste, or more effective assortments.
Secondly, although forecasting is an important part of any planning activity, it still represents only one cogwheel in the planning machinery, meaning that there are other factors that may have a significant impact on the outcome. Accurate forecasting is often truly crucial, but from time to time other factors are more important to attaining the desired results. (You can read more about how this can be seen in a store replenishment context in a recent master’s thesis commissioned by RELEX.)
We are, of course, not saying that you should stop measuring forecast accuracy altogether. It is an important tool for root cause analysis and for detecting systematic changes in forecast accuracy early on.
However, to get truly valuable insights from measuring forecast accuracy you need to understand:
1. The role of demand forecasting in attaining business results. Forecast accuracy is crucial when managing short shelf-life products, such as fresh food. However, for other products, such as slow-movers with long shelf-life, other parts of your planning process may have a bigger impact on your business results. Do you know for which products and situations forecast accuracy is a key driver of business results?
2. What factors affect the attainable forecast accuracy. Demand forecasts are inherently uncertain; that is why we call them forecasts rather than plans. In some circumstances demand forecasting is, however, easier than in others. Do you know when you can rely more heavily on forecasting and when, on the contrary, you need to set up your operations to have a higher tolerance for forecast errors?
3. How to assess forecast quality. Forecast metrics can be used for monitoring performance and detecting anomalies, but how can you tell whether your forecasts are already of high quality or whether there is still significant room for improvement in your forecast accuracy?
4. How the main forecast accuracy metrics work. When measuring forecast accuracy, the same data set can give good or horrible scores depending on the chosen metric and how you conduct the calculations. Do you understand why?
5. How to monitor forecast accuracy. No forecast metric is universally better than another. Do you know what forecast accuracy metrics to use and how?
In the following chapters, we will explain these facets of forecasting and why forecast accuracy is a good servant but a poor master.
Chapter 1: The Role of Demand Forecasting in Attaining Business Results
If you are not in the business of predicting weather, the value of a forecast comes from applying it as part of a planning process.
Good demand forecasts reduce uncertainty. In retail distribution and store replenishment, the benefits of good forecasting include the ability to attain excellent product availability with reduced safety stocks, minimized waste, as well as better margins, as the need for clearance sales is reduced. Further up the supply chain, good forecasting allows manufacturers to secure availability of relevant raw and packaging materials and operate their production with lower capacity, time and inventory buffers.
Forecasts are obviously important. If there are low-hanging fruit in demand forecasting, it always makes sense to harvest them. For example, if retailers are not yet taking advantage of modern tools allowing them to automatically select and employ the most effective combination of different time-series forecasting approaches and machine learning, the investment is going to pay off.
On the other hand, it is also obvious that demand forecasts will always be inaccurate to some degree and that the planning process must accommodate this.
In some cases, it may simply be more cost-effective to mitigate the effect of forecast errors rather than invest in further increasing the forecast accuracy. In inventory management, the cost of a moderate increase in safety stock for a long life-cycle and long shelf-life product may be quite reasonable in comparison to having demand planners spend a lot of time further fine-tuning forecasting models or doing manual changes to the demand forecast. Furthermore, if the remaining forecast error is caused by essentially random variation in demand, any attempt to further increase forecast accuracy will be fruitless.
In addition, there may be other factors with a bigger impact on the business result than perfecting the demand forecast. See Figure 1 for an example of using forecasting to drive replenishment planning for grocery stores. Although the forecast accuracy for the example product and store is quite good, there is still systematic waste due to product spoilage. When digging deeper into the matter, it becomes clear that the main culprit behind the excessive waste is the product’s presentation stock, i.e. the amount of stock needed to keep its shelf space sufficiently full to maintain an attractive display. By assigning less space to the product in question (Figure 2), the inventory levels can be pushed down, allowing for 100% availability with no waste, without changing the forecast.
Figure 1: The daily forecast accuracy for this product in this store is already quite good, but there is still systematic waste due to spoilage.
Figure 2: When reducing the shelf space assigned to the product in Figure 1, less stock is needed to make the shelf look sufficiently full, allowing for 100% on-shelf availability without waste. If one wanted to further reduce inventory, the next steps would be to look at the product’s relatively large batch size or at increasing the number of delivery days from the current four per week.
The conclusion that can be drawn from the above examples is that even near-perfect forecasts do not produce excellent business results if the other parts of the planning process are not equally good.
In some situations, such as fresh food retail, forecasting is crucial. It makes business sense to invest in forecast accuracy by making sure weekday-related variation in sales is effectively captured, by using advanced forecasting models such as regression analysis and machine learning to forecast the effects of promotions and of cannibalization that may diminish demand for substitute items, and by taking weather forecasts into account. (You can read more about fresh food forecasting and replenishment in our e-book.) However, all this work will not pay off if batch sizes are too large or there is excessive presentation stock. Also, when weekday variation in sales is significant, you need to be able to dynamically adjust your safety stock per weekday to optimize availability and waste.
This, of course, holds true for any planning process. If you only focus on forecasts and do not spend time on optimizing the other elements impacting your business results, such as safety stocks, lead times, batch sizes or planning cycles, you will reach a point where additional improvements in forecast accuracy will only marginally improve the actual business results.
Exhibit 1: The Danger of Focusing on Forecast Accuracy Rather than Business Results
In recent years, we have seen an increasing trend among retailers to apply forecast competitions for choosing between providers of planning software. Essentially, this means that all vendors get the same data from the retailers, which they will then insert into their planning tools to show what kind of forecast accuracy they can provide. May the best forecast win!
We are very much in favor of all approaches to buying software that include customers getting hands-on experience of the software and an opportunity to test its capabilities before making a purchase decision.
There is, however, also reason for caution when setting up forecast competitions. In some cases, we have been forced to choose between presenting the forecast that gets us the best score for the selected forecast accuracy metric and presenting the forecast that we know would be the best fit for its intended use. How can this happen?
Let us illustrate this with two simple yet true examples from retail store replenishment.
Our first example product is a typical slow-mover (see Figure 3). The day-level forecast accuracy measured as 1-MAD/Mean (see Chapter 4 for more information on the main forecast metrics) at 2% seems horribly low. Yet, in practice even a perfect forecast would not have any impact on the business results; the on-shelf availability is already perfect and the stock levels are determined by the presentation stock requirements and batch size of this product (see Figure 4). As the forecast is almost unbiased, it also works well as the basis for calculating projected store orders to drive forecasting at the supplying warehouse.
Figure 3: For this slow-moving product, the day-level forecast accuracy (measured as 100% – MAD/Mean in percent) is horribly low at 2% and the week-level accuracy rather low at 66%. The forecast bias is, however, perfect at 100%.
Figure 4: The forecast for our example product in Figure 3 has very little impact on store replenishment. The lowest inventory level is set by the minimum presentation stock requirement (to keep shelves sufficiently full) and the highest inventory level by the relatively large batch size in which the product is delivered.
Interestingly, by manipulating the forecast model to consistently under-estimate demand, the day-level forecast accuracy for our example product can be significantly increased. However, at the same time, this would introduce a significant bias to the forecast with the potential of significantly hurting supply planning, in a situation where store forecasts form the basis for the distribution center forecast. Furthermore, there would be no positive impact on store replenishment.
So, for our slow-moving example product, the forecast giving us a better score for the selected forecast accuracy metric is less fit for its purpose of driving replenishment to the stores and distribution centers than the forecast attaining a worse forecast accuracy score.
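The effect described above can be reproduced in a few lines of Python. The sketch below uses made-up sales for a hypothetical slow-mover (two units sold in ten days), not the data behind Figure 3. It measures accuracy as 1 - MAD/Mean and bias as total forecast divided by total sales, as elsewhere in this guide; the function names are our own, chosen for illustration.

```python
def accuracy_1_minus_mad_over_mean(sales, forecast):
    """Forecast accuracy as 1 - MAD/Mean, with MAD taken over the periods."""
    mad = sum(abs(f - s) for s, f in zip(sales, forecast)) / len(sales)
    mean_sales = sum(sales) / len(sales)
    return 1 - mad / mean_sales


def bias(sales, forecast):
    """Forecast bias as total forecast divided by total sales (target: 1.0)."""
    return sum(forecast) / sum(sales)


# Hypothetical slow-mover (illustrative data): two units sold over ten days.
sales = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]

unbiased = [0.2] * 10  # matches average daily sales
zero = [0.0] * 10      # consistently under-estimates demand

for name, fc in [("unbiased", unbiased), ("always zero", zero)]:
    acc = accuracy_1_minus_mad_over_mean(sales, fc)
    print(f"{name}: accuracy {acc:.0%}, bias {bias(sales, fc):.0%}")
```

With these illustrative numbers, the unbiased forecast scores about -60% on day-level accuracy, while the forecast that always predicts zero scores 0%. The "better" score comes at the price of a bias of 0% instead of 100%, which is exactly the trade-off that would hurt supply planning further up the chain.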
Our second example, a typical fast-moving product, has a lot more sales, which makes it possible to identify a systematic weekday-related sales pattern (see Figure 5). As a result of the high sales volume, the demand for this product is much less influenced by random variation, enabling quite accurate day-level forecasts. Also, due to the considerable sales volume and frequent deliveries, the forecast is truly driving store replenishment and making sure the store is stocked up nicely just before the demand peaks (Figure 5).
Figure 5: Our second example is a product with significantly more sales in this store. It displays a clear, predictable weekday-related sales pattern. The day-level forecast accuracy is good (83%). The forecast displays very little bias.
Figure 6: In our second example, the forecast has a clear impact on store replenishment, with deliveries arriving nicely before the demand peaks, securing perfect availability and attractive shelves, with a reasonable amount of stock.
For the fast-moving product, the same forecast accuracy metric that was problematic for the slow-moving product truly reflects the forecast’s fit for purpose.
You may be interested in knowing what we did when we faced the ethical dilemma of presenting our potential customer with either a better-scoring or a more fit-for-purpose forecast. We did not resort to delivering simply what the customer asked for, but rather delivered what they needed. However, we did present both forecasts and used detailed stock simulations to explain why our recommended choice was a better fit. Without this analysis, the conclusion of the forecast competition would have been wrong. Therefore, we strongly encourage companies to review the effectiveness of forecasts in the context they will be used in, for example using simulation.
Chapter 2: What Factors Affect the Attainable Forecast Accuracy
There are several factors that have an impact on what level of forecast accuracy can realistically be attained. This is one of the reasons why it is so difficult to do forecast accuracy comparisons between companies or even between products within the same company.
There are a few basic rules of thumb:
Forecasts are more accurate when sales volumes are high: It is in general easier to attain a good forecast accuracy for large sales volumes. If a store only sells one or two units of an item per day, even a one-unit random variation in sales will result in a large percentage forecast error. By the same token, large volumes lend themselves to leveling out random variation. For example, if hundreds of people buy the same product, such as a 12 oz. Coke can, on a daily basis, even a busload of tourists stopping by that store to pick up a can each will not have a significant impact on forecast accuracy. However, if the same tourists have on their way happened to receive a mouthwatering recommendation for a beer-seasoned mustard stocked by the store, their purchases will correspond to a month’s worth of normal sales and most likely leave the shelves all cleaned out.
This means that demand is easier to forecast for hypermarkets and megastores than for convenience stores or chains of small hardware stores. Likewise, it is easier to forecast for discounters than for similar-sized supermarkets, because regular supermarkets might have an assortment ten times larger in terms of SKUs, meaning average sales per item are far lower.
Forecast accuracy improves with the level of aggregation: When aggregating over SKUs or over time, the same effect of larger volumes dampening the impact of random variation can be seen. This means that forecast accuracy measured on a product group level or for a chain of stores is higher than when looking at individual SKUs in specific stores. Likewise, the forecast accuracy measured on a monthly or weekly rather than a daily basis is usually significantly higher.
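A rough way to see why both volume and aggregation help is to assume, purely for illustration, that daily demand for an item follows a Poisson distribution with mean \lambda. This distributional assumption is ours and is not established by the examples in this guide. Under it, random variation relative to average volume (the coefficient of variation, CV) shrinks with the square root of the volume:

```latex
\[
\mathrm{CV} = \frac{\sigma}{\mu} = \frac{\sqrt{\lambda}}{\lambda} = \frac{1}{\sqrt{\lambda}},
\qquad
\lambda = 2 \;\Rightarrow\; \mathrm{CV} \approx 71\%,
\qquad
\lambda = 200 \;\Rightarrow\; \mathrm{CV} \approx 7\%.
\]
\[
\text{Summing } n \text{ independent days or items with mean } \lambda:
\qquad
\mathrm{CV}_{\text{aggregated}} = \frac{1}{\sqrt{n\lambda}},
\qquad
n = 7,\ \lambda = 2 \;\Rightarrow\; \mathrm{CV} \approx 27\%.
\]
```

Under this simplified assumption, an item selling two units a day has random day-to-day variation of roughly 71% of its average sales, which no forecast can remove, whereas the same item viewed on a weekly level has only about 27%.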
Short-term forecasts are more accurate than long-term forecasts: A longer forecasting horizon significantly increases the chance of changes not known to us yet having an impact on future demand. A simple example is weather-dependent demand. If we need to make decisions on what quantities of summer clothes to buy or produce half a year or even longer in advance, there is currently no way of knowing what the weather in the summer is going to be. On the other hand, if we are managing replenishment of ice-cream to grocery stores, we can make use of short-term weather forecasts when planning how much ice-cream to ship to each store.
Forecasting is easier in stable businesses: It goes without saying that it is always easier to attain a good forecast accuracy for mature products with stable demand than for new products. Forecasting in fast fashion is harder than in grocery. In grocery, retailers following a year-round low-price model find forecasting easier than competitors that rely heavily on promotions or frequent assortment changes.
Chapter 3: How to Assess Forecast Quality
Now that we have established that there cannot be any universal benchmarks for when forecast accuracy can be considered satisfactory or unsatisfactory, how do we go about identifying the potential for improvement in forecast accuracy?
As stated in the introduction, the first step is assessing your business results and the role forecasting plays in attaining them. If forecasting turns out to be a main culprit explaining disappointing business results, you need to assess whether your forecasting performance is satisfactory.
These are some of the questions you need to dig into:
Do your forecasts accurately capture systematic variation in demand? There are usually many types of variation in demand that are somewhat systematic. There may be seasonality, such as demand for tea increasing in the winter time, or trends, such as an ongoing increase in demand for organic food, that can be detected by examining past sales data. In addition, especially at the store and product level, many products have distinct weekday-related variation in demand. A good forecasting system that applies automatic optimization of forecast models should be able to identify these kinds of systematic patterns without manual intervention.
Do your forecasts accurately capture the impact of events known beforehand? Internal business decisions, such as promotions, price changes and assortment changes have a direct impact on demand. If these planned changes are not reflected in your forecast, you need to fix your planning process before you can start addressing forecast accuracy. The next step then is to examine how you forecast for example the impact of promotions. Are you already taking advantage of all available data, such as promotion type, marketing activities, price discounts, in-store displays etc. in forecasting, or could you improve forecast accuracy through more sophisticated forecasting? (You can read more about how we use causal models to forecast the impact of promotions here.)
In addition to your organization’s own business decisions, there are external factors that have an impact on demand. Some of these are known well in advance, such as holidays or local festivals. One-off events typically require manual planning, but for recurring events, such as Easter, for which past data is available, forecasting can be highly automated. Some external factors naturally take us by surprise, such as a specific product taking off in social media. In Finland, this happened recently with cauliflower, for which demand doubled in response to a social media campaign initiated by a few concerned citizens who wanted to help farmers move an exceptionally large crop. Even when the information becomes available only after important business decisions have been made, it is important to use the information to cleanse the data used for forecasting to avoid errors in future forecasts.
Does your forecast accuracy behave in a predictable way? It is often more important to understand in which situations and for which products forecasts can be expected to be good or bad, rather than to pour vast resources into perfecting forecasts that are by their nature unreliable. Understanding when forecast accuracy is likely to be low makes it possible to do a risk analysis of the consequences of over- and under-forecasting and to make business decisions accordingly.
A good example of this is an FMCG manufacturer we have worked with, who has a process for identifying potential “stars” in their portfolio of new products. “Star” products have the potential of selling exceptionally well, but they are rare and seen only a couple of times per year. As the products have limited shelf-life, the manufacturer does not want to risk potentially very inflated forecasts driving up inventory just in case. Rather, they make sure they have production capacity, raw materials and packaging supplies to be able to deal with a situation where the original forecast turns out to be too low.
The need for predictable forecast behavior is also the reason why we apply extreme care when taking new forecasting methods, such as different machine learning algorithms, into use. For example, when testing different variants of machine learning on promotion data, we discarded one approach that was on average slightly more accurate than some others, but significantly less robust and more difficult for the average demand planner to understand. Occasional extreme forecast errors can be very detrimental to your performance when the planning process has been set up to tolerate a certain level of uncertainty. Furthermore, such errors reduce the demand planners’ confidence in the forecast calculations, which can significantly hurt work efficiency.
If demand changes in ways that cannot be explained or demand is affected by factors for which information is not available early enough to impact business decisions, you simply must find ways of making the process less dependent on forecast accuracy.
We already mentioned weather as one external factor having an impact on demand. In the short term, weather forecasts can be used to drive replenishment to stores (you can read more about how to use machine learning to benefit from weather data in your forecasting here). However, long-term weather forecasts are still too uncertain to provide value in demand planning that needs to be done months ahead of sales.
In very weather-dependent businesses, such as winter sports gear, our recommendation is to make a business decision concerning what inventory levels to go for. For high-margin items, the business impact of losing sales due to stock-outs is usually worse than the impact of needing to resort to clearance sales to get rid of excess stock, which is why it may make sense to plan in accordance with favorable weather. For low-margin items, rebates may quickly turn products unprofitable, which is why it may be wiser to have a more cautious inventory plan.
In any case, setting your operations up so that final decisions on where to position stock are made as late as possible allows for collecting more information and improving forecast accuracy. In practice, this can mean holding back a proportion of inventory at your distribution centers to be allocated to the regions that have the most favorable conditions and the best chance of selling the goods at full price. (You can read more about managing seasonal products here.)
Chapter 4: How the Main Forecast Accuracy Metrics Work
Depending on the chosen metric, level of aggregation and forecasting horizon, you can get very different results on forecast accuracy for the exact same data set. To be able to analyze forecasts and track the development of forecasts accuracy over time, it is necessary to understand the basic characteristics of the most commonly used forecast accuracy metrics.
There is probably an infinite number of forecast accuracy metrics, but most of them are variations of the following three: forecast bias, mean absolute deviation (MAD), and mean absolute percentage error (MAPE). We will have a closer look at these next. Do not let the simple appearance of these metrics fool you. After explaining the basics, we will delve into the intricacies of how the metrics are calculated in practice and show how simple and completely justifiable changes in the calculation logic have the power of radically altering the forecast accuracy results.
Forecast bias is the difference between forecast and sales. If the forecast over-estimates sales, the forecast bias is considered positive. If the forecast under-estimates sales, the forecast bias is considered negative. If you want to examine bias as a percentage of sales, then simply divide total forecast by total sales – results of more than 100% mean that you are over-forecasting and results below 100% that you are under-forecasting.
In many cases it is useful to know if demand is systematically over- or under-estimated. For example, even if a slight forecast bias would not have a notable effect on store replenishment, it can lead to over- or under-supply at the central warehouse or distribution centers if this kind of systematic error concerns many stores.
A word of caution: When looking at aggregations over several products or long periods of time, the bias metric does not give you much information on the quality of the detailed forecasts. The bias metric only tells you whether the overall forecast was good or not. It can easily disguise very large errors. You can find an example of this in Table 1.
Table 1: This example shows sales and forecasts for three items for a single week. As you can see, even though the biases for each individual product are quite large, the group-level bias calculated by comparing total sales to total forecast for all three products is very small. In other words, on an aggregated level the forecast looks great and the bias is close to the target of 100%.
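As a minimal sketch of how group-level bias can hide item-level errors, the following Python snippet computes bias as forecast divided by sales for three items and for their total. The figures are hypothetical and chosen only for illustration; they are not the values from Table 1.

```python
def bias(sales, forecast):
    """Bias as forecast divided by sales; 1.0 (100%) means no bias."""
    return forecast / sales


# Hypothetical weekly figures (illustrative only): item -> (sales, forecast)
items = {"A": (10, 15), "B": (20, 12), "C": (70, 72)}

for name, (s, f) in items.items():
    print(f"{name}: {bias(s, f):.0%}")

# Group-level bias compares total forecast to total sales.
total_sales = sum(s for s, _ in items.values())
total_forecast = sum(f for _, f in items.values())
group_bias = bias(total_sales, total_forecast)
print(f"group: {group_bias:.0%}")
```

In this made-up data, the item-level biases are 150%, 60% and about 103%, yet the over- and under-forecasts nearly cancel out and the group-level bias is 99%.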
Mean absolute deviation (MAD) is another commonly used forecasting metric. This metric shows how large an error, on average, you have in your forecast. However, as the MAD metric gives you the average error in units, it is not very useful for comparisons. An average error of 1,000 units may be very large when looking at a product that sells only 5,000 units per period, but marginal for an item that sells 100,000 units in the same time.
Mean absolute percentage error (MAPE) is akin to the MAD metric, but expresses the forecast error in relation to sales volume. Basically, it tells you by what percentage your forecasts are off, on average. This is probably the single most commonly used forecasting metric in demand planning.
As the MAPE calculation gives equal weight to all items, be it products or time periods, it quickly gives you very large error percentages if you include lots of slow-sellers in the data set, as relative errors amongst slow-sellers can appear rather large even when the absolute errors are not (see Table 2 for an example of this). In fact, a typical problem when using the MAPE metric for slow-sellers on the day level is that sales are zero on some days, making it impossible to calculate a MAPE score.
Table 2: Individual and average MAD and MAPE metrics for the same three products.
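The two error metrics can be sketched in Python as follows. The data is hypothetical and is not taken from Table 2, and the decision to return None when a sales value is zero is our own illustrative choice for handling the undefined case described above.

```python
def mad(sales, forecast):
    """Mean absolute deviation: average absolute error, in units."""
    return sum(abs(f - s) for s, f in zip(sales, forecast)) / len(sales)


def mape(sales, forecast):
    """Mean absolute percentage error: average of |error| / sales.

    Returns None if any sales value is zero, because the percentage
    error of that observation is undefined (illustrative choice).
    """
    if any(s == 0 for s in sales):
        return None
    return sum(abs(f - s) / s for s, f in zip(sales, forecast)) / len(sales)


# Hypothetical weekly sales and forecasts for three products.
sales = [10, 20, 70]
forecast = [15, 12, 72]

print(mad(sales, forecast))        # (5 + 8 + 2) / 3 units
print(mape(sales, forecast))       # (50% + 40% + ~2.9%) / 3
print(mape([0, 1, 2], [1, 1, 1]))  # None: one observation has zero sales
```

Note how the two slow-sellers dominate the MAPE even though the largest absolute error is only eight units: every product counts equally in the average, regardless of its volume.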
Measuring forecast accuracy is not only about selecting the right metric or metrics. There are a few more things to consider when deciding how you should calculate your forecast accuracy:
Measuring accuracy or measuring error: This may seem obvious, but we will mention it anyway, as over the years we have seen some very smart people get confused over this. Despite its name, forecast bias expressed as a percentage is read like an accuracy measure: the target level is 1 or 100%, and the distance above or below that target is the deviation. MAD and MAPE, however, measure forecast error, meaning that 0 or 0% is the target and larger numbers indicate a larger error.
Aggregating data or aggregating metrics: One of the biggest factors affecting what results your forecast accuracy metrics produce is the selected level of aggregation in terms of number of products or over time. As discussed earlier, forecast accuracies are typically better when viewed on the aggregated level. However, when measuring forecast accuracy at aggregate levels, you also need to be careful about how you perform the calculations. As we will demonstrate below, it can make a huge difference whether you apply the metrics to aggregated data or calculate averages of the detailed metrics.
In the example (see Table 3), we have a group of three products, their sales and forecasts from a single week as well as their respective MAPEs. The bottom row shows sales, forecasts, and the MAPE calculated at a product group level, based on the aggregated numbers. Using this method, we get a group-level MAPE of 3%. However, as we saw earlier in Table 2, if one first calculates the product-level MAPE metrics and then calculates a group-level average, we arrive at a group-level MAPE of 33%.
Table 3: When calculating the MAPE using aggregated sales and aggregated forecast for the three products, the resulting group-level MAPE is much lower than when calculating the average of the individual products’ respective MAPE results.
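The difference between the two calculation orders can be sketched in Python. The numbers below are hypothetical and do not reproduce Table 3; they only show how far apart the two results can be for the same data.

```python
def ape(sales, forecast):
    """Absolute percentage error for a single observation."""
    return abs(forecast - sales) / sales


# Hypothetical weekly figures (illustrative only): product -> (sales, forecast)
products = {"A": (10, 15), "B": (20, 12), "C": (70, 72)}

# Option 1: aggregate the data first, then apply the metric.
total_sales = sum(s for s, _ in products.values())
total_forecast = sum(f for _, f in products.values())
mape_of_totals = ape(total_sales, total_forecast)

# Option 2: apply the metric per product, then average the metrics.
average_of_mapes = sum(ape(s, f) for s, f in products.values()) / len(products)

print(f"MAPE on aggregated data:  {mape_of_totals:.0%}")
print(f"Average of product MAPEs: {average_of_mapes:.0%}")
```

For this made-up data, the first option gives 1% and the second about 31%, because over- and under-forecasts cancel out when the data is summed before the error is measured.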
Which number is correct? The answer is that both are, but they should be used in different situations and never be compared to one another.
For example, when assessing forecast quality from a store replenishment perspective, one could easily argue that the low forecast error of 3% on the aggregated level would in this case be quite misleading. However, if the forecast is used for business decisions on a more aggregated level, such as planning picking resources at a distribution center, the lower forecast error of 3% may be perfectly relevant.
The same dynamics are at play when aggregating over periods of time. The data in the previous examples were on a weekly level, but the results would look quite different if we calculated the MAPE for each weekday separately and then took the average of those metrics. In the first example (Table 2), the product-level MAPE scores based on weekly data were between 12% and 50%. However, the product-level averages calculated based on the day-level MAPE scores vary between 23% and 71% (see Table 4). By calculating the average of these latter MAPEs we get a third suggestion for the error across the group of products: 54%. This score is again quite different from the 33% we got when calculating MAPE based on week and product level data and the 3% we got when calculating it based on week and product group level data.
Table 4: The same example data presented on a day-level, including day and product level MAPE.
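The same comparison over time can be sketched for a single product. The daily figures below are hypothetical and are not the data in Table 4.

```python
def ape(sales, forecast):
    """Absolute percentage error for a single observation."""
    return abs(forecast - sales) / sales


# Hypothetical daily sales and forecasts for one product, Monday to Sunday.
daily_sales = [4, 2, 5, 3, 6, 10, 10]
daily_forecast = [5, 3, 4, 4, 5, 8, 12]

# Week level: sum the days first, then measure the error once.
week_level_mape = ape(sum(daily_sales), sum(daily_forecast))

# Day level: measure the error for each day, then average.
day_level_mape = sum(
    ape(s, f) for s, f in zip(daily_sales, daily_forecast)
) / len(daily_sales)

print(f"week level: {week_level_mape:.1%}")
print(f"day level:  {day_level_mape:.1%}")
```

Here the week-level error is 2.5%, while the average of the daily errors is about 26%, even though both are calculated from exactly the same sales and forecasts.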
Which metric is the most relevant? If these were forecasts for a manufacturer that applies weekly or longer planning cycles, measuring accuracy on the week level makes sense. But if we are dealing with a grocery store receiving six deliveries a week and demonstrating a clear weekday-related pattern in sales, keeping track of daily forecast accuracy is much more important, especially if the items in question have a short shelf-life.
Arithmetic average or weighted average: One can argue that an error of 54% does not give the right picture of what is happening in our example. After all, Product C represents over two thirds of total sales and its forecast error is much smaller than for the low-volume products. Should not the forecast metric somehow reflect the importance of the different products? This can be resolved by weighting the forecast error by sales, as we have done for the MAPE metric in Table 5 below. The resulting metric is called the volume-weighted MAPE or MAD/mean ratio.
Table 5: Volume-weighted MAPE results per product (calculated from daily sales data) and for the group of products.
As you see in Table 5, the product-level volume-weighted MAPE results are different from our earlier MAPE results. This is because the MAPE for each day is weighted by the sales for that day. The underlying logic here is that if you only sell one unit a day, an error of 100% is not as bad as when you sold 10 units and suffered the same error. On the group level, the volume-weighted MAPE is now much smaller, demonstrating the impact of placing more importance on the more stable high-volume product.
The choice between arithmetic and weighted averages is a matter of judgment and preference. On the one hand, it makes sense to give more weight to products with higher sales, but on the other hand, this way you may lose sight of under-performing slow-movers.
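A minimal Python sketch of the volume-weighted MAPE follows, using hypothetical daily data rather than the figures in Table 5. It also checks that weighting each day's percentage error by that day's sales gives the same number as dividing MAD by mean sales, which is why the two names are used interchangeably.

```python
def plain_mape(sales, forecast):
    """Unweighted MAPE: every day counts equally (sales must be non-zero)."""
    return sum(abs(f - s) / s for s, f in zip(sales, forecast)) / len(sales)


def volume_weighted_mape(sales, forecast):
    """Sum of absolute errors divided by sum of sales."""
    return sum(abs(f - s) for s, f in zip(sales, forecast)) / sum(sales)


# Hypothetical daily sales and forecasts for one product, Monday to Sunday.
daily_sales = [4, 2, 5, 3, 6, 10, 10]
daily_forecast = [5, 3, 4, 4, 5, 8, 12]

weighted = volume_weighted_mape(daily_sales, daily_forecast)

# Equivalent formulation 1: sales-weighted average of the daily percentage errors.
sales_weighted_average = sum(
    s * (abs(f - s) / s) for s, f in zip(daily_sales, daily_forecast)
) / sum(daily_sales)

# Equivalent formulation 2: MAD divided by mean sales.
n = len(daily_sales)
mad = sum(abs(f - s) for s, f in zip(daily_sales, daily_forecast)) / n
mad_over_mean = mad / (sum(daily_sales) / n)

print(f"plain MAPE:           {plain_mape(daily_sales, daily_forecast):.1%}")
print(f"volume-weighted MAPE: {weighted:.1%}")
```

With this made-up data, the plain MAPE is about 26% and the volume-weighted MAPE 22.5%, since the high-volume weekend days have the smallest percentage errors. One practical side effect of the weighted form is that a single day with zero sales no longer makes the metric impossible to calculate, as long as total sales are not zero.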
The final or earlier versions of the forecast: As discussed earlier, the longer into the future one forecasts, the less accurate the forecast is going to be. Typically, forecasts are calculated several months into the future and then updated, for example, on a weekly basis. So, for a given week you normally calculate multiple forecasts over time, meaning you have several different forecasts with different time lags. The forecasts should get more accurate when you get closer to the week that you are forecasting, meaning that your forecast accuracy will look very different depending on which forecast version you use in calculating it.
The forecast version you should use when measuring forecast accuracy is the forecast for which the time lag matches when important business decisions are made. In retail distribution and inventory management, the relevant lag is usually the lead time for a product. If a supplier delivers from the Far East with a lead time of 12 weeks, what matters is what your forecast quality was when the order was created, not what the forecast was when the products arrived.
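One way to put this into practice is to archive every forecast version together with its lag and to measure accuracy against the version whose lag matches the lead time. The sketch below assumes a hypothetical archive keyed by target week and lag; the data structure, names and numbers are all illustrative and do not describe any particular system.

```python
# Hypothetical forecast archive: (target_week, lag_in_weeks) -> forecast in units,
# where lag is how many weeks before the target week the forecast was made.
forecast_versions = {
    (30, 12): 900,
    (30, 4): 1050,
    (30, 1): 1180,
}
actual_sales = {30: 1200}  # hypothetical actual sales per week


def error_at_lag(target_week, lag_weeks):
    """Absolute percentage error of the forecast made lag_weeks before target_week."""
    forecast = forecast_versions[(target_week, lag_weeks)]
    actual = actual_sales[target_week]
    return abs(forecast - actual) / actual


lead_time_weeks = 12  # e.g. a supplier with a 12-week lead time

print(f"error at order time: {error_at_lag(30, lead_time_weeks):.1%}")
print(f"error one week out:  {error_at_lag(30, 1):.1%}")
```

In this example the final forecast is off by less than 2%, but the forecast that actually drove the purchase order was off by 25%, and that is the number that explains the resulting stock position.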
Chapter 5: How to Monitor Forecast Accuracy
In terms of assessing forecast accuracy, no metric is universally better than another. It is all a question of what you want to use the metric for:
1. Forecast bias tells you whether you are systematically over- or under-forecasting. The other metrics do not tell you that.
2. MAD measures forecast error in units. It can, for example, be used for comparing the results of different forecast models applied to the same product. However, the MAD metric is not suitable for comparison between different data sets.
3. MAPE is better for comparisons as the forecast error is put in relation to sales. However, as all products are given the same weight, it can give very high error values when the sample contains many slow-movers. By using a volume-weighted MAPE, more importance is placed on the high-sellers. The downside of this is that even very high forecast errors for slow-movers can go unnoticed.
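To illustrate why bias answers a different question than the error metrics, here is a small Python sketch. It assumes bias is measured as the mean signed error in units and MAD as the mean absolute error in units. Both are common definitions, but check how your own system defines them. The data is hypothetical.

```python
def forecast_bias(sales, forecasts):
    """Mean signed error in units; positive means over-forecasting."""
    return sum(f - s for s, f in zip(sales, forecasts)) / len(sales)


def mad(sales, forecasts):
    """Mean absolute deviation in units; direction of the error is ignored."""
    return sum(abs(f - s) for s, f in zip(sales, forecasts)) / len(sales)


# Hypothetical product selling 10 units a day, forecast in two different ways.
sales = [10, 10, 10, 10]
alternating = [12, 8, 12, 8]     # errors of +2 and -2 cancel out over time
always_high = [12, 12, 12, 12]   # systematically 2 units too high

for name, forecasts in [("alternating", alternating), ("always high", always_high)]:
    print(name, "bias:", forecast_bias(sales, forecasts), "MAD:", mad(sales, forecasts))
```

Both forecasts have a MAD of 2 units, so the error metric alone cannot tell them apart. Only the bias (0 versus +2 units) reveals that the second forecast is systematically too high.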
The forecast accuracy metric should also be selected to match the relevant levels of aggregation and the relevant planning horizon. In Table 6 we present a few examples of different planning processes utilizing forecasts and typical levels of aggregation over products and time as well as the time spans associated with those planning tasks.
Table 6: Examples of different planning processes using forecasts.
To make things even more complicated, the same forecast is often used for several different purposes, meaning that several metrics with different levels of aggregation and different time spans are commonly required.
A good example is store replenishment and inventory management at the supplying distribution center. Our recommendation is to use the same forecast that drives store replenishment translated into projected store orders to drive inventory management at the distribution center (DC). In this way, changes in the stores’ inventory parameters, replenishment schedules as well as planned changes in the stores’ stock positions, caused for example by the need to build stock in stores to prepare for a promotion or in association with a product launch, are immediately reflected in the DC’s order forecast.
This means that the stores’ forecasts need to be sufficiently accurate not only days but in many cases several weeks or even months ahead. The requirements for the store forecasts and the DC forecast are, however, not the same. The store-level forecast needs to be accurate on the store and product level whereas the DC-level forecast needs to be accurate for the full order volume per product and all stores. On the DC level, aggregation typically reduces the forecast error per product. However, we need to be careful about systematic bias in the forecasts, as a tendency to over- or under-forecast store demand may become aggravated through aggregation.
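The effect of aggregation can be illustrated with a small Python sketch. The four stores, their sales, and both sets of forecasts are hypothetical. Errors are expressed relative to total sales, which is one of several reasonable choices.

```python
def store_level_error(sales, forecasts):
    """Absolute errors measured store by store, relative to total sales."""
    return sum(abs(f - s) for s, f in zip(sales, forecasts)) / sum(sales)


def dc_level_error(sales, forecasts):
    """Absolute error of the aggregated (all-store) forecast, relative to total sales."""
    return abs(sum(forecasts) - sum(sales)) / sum(sales)


# Hypothetical: one product, four stores, each selling 10 units.
sales = [10, 10, 10, 10]
noisy_unbiased = [12, 8, 11, 9]   # store errors point in different directions
biased = [11, 11, 11, 11]         # every store is over-forecast by one unit

for name, forecasts in [("noisy but unbiased", noisy_unbiased), ("biased", biased)]:
    print(
        f"{name}: store level {store_level_error(sales, forecasts):.0%}, "
        f"DC level {dc_level_error(sales, forecasts):.0%}"
    )
```

The noisy forecast shows a 15% error on the store level but 0% on the DC level, because the store errors cancel out. The biased forecast looks better in the stores (10%) yet keeps its full 10% error at the DC: a systematic error adds up unit for unit instead of cancelling.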
The number of forecasts in a retail or supply chain planning context is typically very large to begin with and dealing with multiple metrics means that the number is increased even further. This means that you need an exception-based process for monitoring accuracy. Otherwise, your demand planners will either be completely swamped or risk losing valuable demand signals in the averages.
To be able to effectively identify relevant exceptions, it usually makes sense to classify products based on their importance and predictability. This can be done in many ways, but a simple starting point is to classify products based on sales value (ABC classification), which reflects economic impact, and sales frequency (XYZ classification), which tends to correlate with more accurate forecasting. For high sales value and sales frequency AX products, for example, a high forecast accuracy is realistic and the consequences of deviations quite significant, which is why the exception threshold should be kept low and reactions to forecast errors be quick. For low sales frequency products, your process needs to be more tolerant to forecast errors and exception thresholds should be set accordingly.
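A minimal Python sketch of such a classification is shown below. All cut-offs and thresholds are hypothetical placeholders chosen for illustration (80%/95% cumulative sales value for ABC, share of days with sales for XYZ, and a threshold per class). They are not recommended values and should be tuned to your own business.

```python
def abc_classes(sales_value_by_product, a_cutoff=0.80, b_cutoff=0.95):
    """Rank products by sales value and classify by cumulative share.

    A product is classified on the cumulative share reached *before* it is
    added, so the top seller is always an A product. Cut-offs are assumptions.
    """
    total = sum(sales_value_by_product.values())
    ranked = sorted(sales_value_by_product.items(), key=lambda kv: kv[1], reverse=True)
    classes, cumulative = {}, 0.0
    for product, value in ranked:
        share_before = cumulative / total
        if share_before < a_cutoff:
            classes[product] = "A"
        elif share_before < b_cutoff:
            classes[product] = "B"
        else:
            classes[product] = "C"
        cumulative += value
    return classes


def xyz_class(days_with_sales, total_days, x_cutoff=0.8, y_cutoff=0.3):
    """Classify by sales frequency: share of days with at least one sale."""
    frequency = days_with_sales / total_days
    if frequency >= x_cutoff:
        return "X"
    return "Y" if frequency >= y_cutoff else "Z"


# Hypothetical exception thresholds on volume-weighted MAPE:
# a base level per XYZ class, tightened for A and relaxed for C products.
XYZ_BASE = {"X": 0.20, "Y": 0.40, "Z": 0.80}
ABC_FACTOR = {"A": 0.75, "B": 1.00, "C": 1.25}


def exception_threshold(abc, xyz):
    return XYZ_BASE[xyz] * ABC_FACTOR[abc]


def is_exception(abc, xyz, weighted_mape):
    """Flag a product only when its error exceeds the threshold for its class."""
    return weighted_mape > exception_threshold(abc, xyz)


sales_value = {"P1": 700, "P2": 200, "P3": 60, "P4": 40}
print(abc_classes(sales_value))
print(is_exception("A", "X", 0.25), is_exception("C", "Z", 0.25))
```

The same 25% error is flagged for a hypothetical AX product but tolerated for a CZ product, which is the kind of differentiation an exception-based process relies on.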
Another good approach, which we recommend using in combination with the above, is singling out products or situations where forecast accuracy is known to be a challenge or of crucial importance. A typical example is fresh or other short shelf-life products, which should be monitored very carefully as forecast errors quickly translate into waste or lost sales. Special situations, such as new kinds of promotions or product introductions can require special attention even when the products have longer shelf-life.
Of course, to get value out of monitoring forecast accuracy you need to be able to react to exceptions. Simply addressing exceptions by manually correcting erroneous forecasts will not help you in the long run as it does nothing to improve the forecasting process. Therefore, you need to make sure your forecasting system 1) is transparent enough for your demand planners to understand how any given forecast was formed and 2) allows your demand planners to control how forecasts are calculated (see Exhibit 2).
Exhibit 2: To Deal with Forecast Errors, You Need to Be Able to Understand and Control Your Forecasting System
Sophisticated forecasting involves using a multitude of forecasting methods considering many different demand-influencing factors. To be able to adjust forecasts that do not meet your business requirements, you need to understand where the forecast errors come from.
To efficiently debug forecasts, you need to be able to separate the different forecast components. In simple terms, this means visibility into baseline forecast, forecasted impact of promotions and events, as well as manual adjustments to the forecast separately (see Figure 7). Especially when forecasts are adjusted manually, it is very important to continuously monitor the added value of these changes. Several studies indicate that the human brain is not well suited for forecasting and that many of the changes made, especially small increases to forecasts, are not well grounded.
Figure 7: In this example, it is easy to see that both the baseline sales and forecast are very stable and changes in demand very promotion-driven. Forecast accuracy for promotions is also good.
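One simple way to monitor the added value of manual adjustments is to compare the forecast error with and without them. The Python sketch below assumes the final forecast is the sum of three separately stored components (baseline, promotion uplift, and manual adjustment). The function name and the numbers are hypothetical.

```python
def adjustment_value_added(sales, baseline, promo_uplift, manual_adjustment):
    """Reduction in total absolute error achieved by the manual adjustments.

    Positive: the adjustments improved the forecast.
    Negative: the system forecast alone would have been more accurate.
    """
    system = [b + p for b, p in zip(baseline, promo_uplift)]
    final = [f + m for f, m in zip(system, manual_adjustment)]
    system_error = sum(abs(f - s) for s, f in zip(sales, system))
    final_error = sum(abs(f - s) for s, f in zip(sales, final))
    return system_error - final_error


# Hypothetical three weeks with a promotion in week 2 and small manual
# increases in the non-promotion weeks.
sales = [100, 150, 100]
baseline = [100, 100, 100]
promo_uplift = [0, 40, 0]
manual_adjustment = [5, 0, 5]

print(adjustment_value_added(sales, baseline, promo_uplift, manual_adjustment))
```

Here the result is -10: the system forecast was off by 10 units in total, while the manually adjusted forecast was off by 20, so the small increases made the forecast worse.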
In many cases, it is also very valuable to be able to go back in time to review what the forecast looked like in the past when an important business decision was made. Was a big purchase order, for example, placed because the actual forecast at that time contained a planned promotion that was later removed? In that case, the root cause for poor forecast accuracy was not the forecasting itself, but rather a lack of synchronization in planning.
Some forecasting systems on the market look like black boxes to the users: data goes in, forecasts come out. This approach would work fine if forecasts were 100% accurate, but forecasts are never fully reliable. Therefore, you need to make sure your forecasting system 1) is transparent enough for your demand planners to understand how any given forecast was formed and 2) allows your demand planners to control how forecasts are calculated. (You can read more about how we allow users to manage forecast and other calculations using our business rules engine here.)
Conclusion: Measuring Forecast Accuracy is a Good Servant But a Poor Master
At this point, we have produced more than 7,000 words of text and still not answered the original question of how high your forecast accuracy should be. You probably see now why we are sometimes tempted just to name an arbitrary number, like 95%, and move on. However, especially these days when there is so much hype around machine learning, we fear that the focus in improving retail and supply chain planning is shifting too much towards increasing forecast accuracy at the expense of improving the effectiveness of the full planning process. While our customers are enjoying the benefits of increased forecast accuracy with our machine learning algorithms, we still strongly feel that there is a need to discuss the role of forecasting in the bigger picture.
For some products, it is easy to attain a very high forecast accuracy. For others, it is more cost-effective to work on mitigating the consequences of forecast errors. For the ones that fall somewhere in-between, you need to continuously evaluate the quality of your forecast and how it works together with the rest of your planning process.
Good forecast accuracy alone does not equate to a successful business. Therefore, measuring forecast accuracy is a good servant, but a poor master.
To summarize, here are a few key principles to bear in mind when measuring forecast accuracy:
1. Primarily measure what you need to achieve, such as efficiency or profitability. Use this information to focus on situations where good forecasting matters. Ignore areas where it will make little or no difference. Keep in mind that forecasting is a means to an end. It is a tool to help you get the best results: high sales volumes, low waste, great availability, good profits, and happy customers.
2. Understand the role of forecasts in attaining business results and improve forecasting as well as the other parts of the planning processes in parallel. Optimize safety stocks, lead times, planning cycles and demand forecasting in a coordinated fashion, focusing on the parts of the process that matter the most. Critically review assortments, batch sizes and promotional activities that do not drive business performance. Great forecast accuracy is no consolation if you are not getting the most important things right.
3. Make sure your forecast accuracy metrics match your planning processes and use several metrics in combination. Choose the right aggregation level, weighting, and lag for each purpose and monitor your forecast metrics continuously to spot any changes. Often the best insights are available when you use more than one metric at the same time. Most of this monitoring can and should be automated, so that only relevant exceptions are highlighted.
4. If you want to compare your forecast accuracy to that of other companies, it is crucial to make sure you are comparing like with like and understand how the metrics are calculated. The realistic levels of forecast accuracy can vary significantly from business to business and between products even in the same segment depending on strategy, assortment width, marketing activities, and dependence on external factors, such as the weather. Furthermore, you can easily get significantly better or worse results when calculating essentially the same forecast accuracy metric in different ways. Remember that forecasting is not a competition to get the best numbers. To learn from others, study how they do forecasting, use forecasts and develop their planning processes, rather than focusing on numbers without context.