7 February, 2020

*This article continues from our 101 guide to predictive analytics. Here we go into another level of detail and look at why not all predictive analytics / demand forecasting models are equal.*

*As you might imagine in the world of data science, there's a whole lot more detail you could go into - but one step at a time... What follows is a quick-fire introduction to a few concepts:*

There are two main types of predictive analysis: continuous and categorical.

Continuous models are concerned with forecasting the value of a given variable - say the amount of money a customer will spend when they receive a catalogue, or sales on a particular day.

Categorical models predict whether something will happen or not. The simplest and most common is a binary yes/no question: will customer X pay back their loan or not? Will England win the 2023 Rugby World Cup? (Good luck predicting that one.) However, it's quite possible to build categorical models with more outcomes: will this customer rent a hatchback, estate or SUV?
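To make the distinction concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy data, the choice of predictors (past orders; passengers and bags), and the two very simple methods (a least-squares line for the continuous case, a nearest-centroid rule for the categorical one). It is not a description of how any particular forecasting engine works.

```python
# Toy data, invented for illustration only.
# Continuous target: spend (in pounds) against number of past orders.
past_orders = [1, 2, 3, 4, 5]
spend = [12.0, 22.0, 32.0, 42.0, 52.0]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(past_orders, spend)


def predict_spend(n_orders):
    """Continuous model: the answer is a number on a scale."""
    return slope * n_orders + intercept


# Categorical target: which car type a customer rents,
# based on (passengers, bags). Again, made-up rows.
rentals = [
    ((1, 1), "hatchback"), ((2, 1), "hatchback"),
    ((3, 4), "estate"), ((4, 4), "estate"),
    ((5, 2), "SUV"), ((6, 3), "SUV"),
]


def centroids(rows):
    """Average feature vector for each label."""
    grouped = {}
    for features, label in rows:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*points))
        for label, points in grouped.items()
    }


CENTRES = centroids(rentals)


def predict_car(features):
    """Categorical model: the answer is one label from a fixed set."""
    def sq_dist(centre):
        return sum((a - b) ** 2 for a, b in zip(features, centre))
    return min(CENTRES, key=lambda label: sq_dist(CENTRES[label]))


print(predict_spend(6))     # a number
print(predict_car((6, 2)))  # a label
```

The point is the shape of the output rather than the methods themselves: `predict_spend` returns a value, `predict_car` returns one of a fixed set of categories.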

In general, some models are better at continuous variable forecasting, others at categorical problems. Which leads nicely onto a discussion about the statistical algorithms that sit at the heart of all predictive analytics engines...

There is a whole host of statistical methods that can be used in predictive analytics, including decision trees, logistic regression, support vector machines (SVMs), Bayesian classification, neural networks and more.

As you might imagine, each has its own benefits and drawbacks depending on the particular problem in question. Perhaps counterintuitively though, choosing a method is typically not the most important factor in getting accurate results for a business or public sector problem: the differences between algorithms tend to be small next to the effect of incomplete, noisy data shaped by many external factors. If you work at CERN and are trying to detect the Higgs boson then it's a different matter, of course. (See below for how algorithms vary in their ability to explain their results.)

These statistical algorithms do the heavy lifting in predictive analytics engines. However, they are only as good as the data that's fed into them, and that data is never perfect. Missing values, outliers, in-built biases and more can all skew the output, which is where machine learning (ML) comes in.
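As a rough illustration of the kind of clean-up involved, here is a minimal rule-based sketch in Python. The sales figures, the `clean` helper and the threshold `k` are all assumptions invented for this example: it fills gaps with the median and flags values that sit a long way from it. A real pipeline would be considerably more careful, and this is not a claim about how any particular engine does it.

```python
from statistics import median


def clean(values, k=5.0):
    """Fill missing values with the median and flag likely outliers.

    `k` is an arbitrary threshold chosen for this sketch: a value is
    flagged when it lies more than k median-absolute-deviations (MAD)
    away from the median.
    """
    observed = [v for v in values if v is not None]
    mid = median(observed)
    mad = median([abs(v - mid) for v in observed])
    filled = [mid if v is None else v for v in values]
    outliers = [i for i, v in enumerate(filled) if abs(v - mid) > k * mad]
    return filled, outliers


# Made-up daily sales with two gaps and one suspicious spike.
daily_sales = [120, 130, None, 125, 128, 900, 122, None, 127]
filled, outliers = clean(daily_sales)
print(filled)    # gaps replaced by the median
print(outliers)  # positions of values flagged for review
```

The median is used rather than the mean because a single extreme value (the 900 here) would drag the mean a long way, while the median barely moves.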

ML can also automate the process of feature extraction, whereby the original variables are transformed into 'features' that are more useful in the prediction model. Take a customer's age in a marketing situation: the likelihood of making a purchase may increase up to a certain age, plateau, and then start to decrease. So age is an important predictor of purchase likelihood, but the relationship is not linear. Of course, we don't know the shape of this relationship before we start, and ML can automate the process of working out which transformations have the greatest predictive power.
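Here is a toy version of that search, sketched in Python. The ages and purchase rates are invented, the three candidate transformations are an arbitrary shortlist, and absolute Pearson correlation is used as a crude stand-in for 'predictive power'. All of those are assumptions for the sake of the example; real systems try many more transformations and score them on held-out data.

```python
import math

# Made-up data: purchase rate rises with age, plateaus, then falls.
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
purchase_rate = [0.10, 0.18, 0.26, 0.33, 0.38, 0.40,
                 0.40, 0.37, 0.31, 0.23, 0.14]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


mean_age = sum(ages) / len(ages)

# Candidate features derived from the raw variable (hypothetical shortlist).
candidates = {
    "age": lambda a: a,
    "log(age)": lambda a: math.log(a),
    "(age - mean age)^2": lambda a: (a - mean_age) ** 2,
}

# Score each candidate by how strongly it tracks the target.
scores = {
    name: pearson([transform(a) for a in ages], purchase_rate)
    for name, transform in candidates.items()
}
best = max(scores, key=lambda name: abs(scores[name]))

for name, score in scores.items():
    print(f"{name:>20}: {score:+.2f}")
print("strongest candidate:", best)
```

On this data raw age correlates only weakly with purchase rate, because the rise and the fall largely cancel out, whereas the squared distance from the average age tracks it closely. The variable was informative all along; it just needed reshaping.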

Some of the algorithms described above (e.g. SVM, neural networks) effectively work as a black box: the output may be very accurate, but you have no idea what is behind the forecast. On the other hand, an algorithm like logistic regression actually quantifies the impact of individual variables - so perhaps the amount of rainfall has X impact, and the fact that it's a Monday has Y impact.
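To show what 'quantifying the impact' looks like in practice, here is a small logistic regression written from scratch in Python. The twelve rows of data, the variable names (`rainfall_mm`, `is_monday`, `busy_day`) and the training settings are all made up for this sketch, and plain gradient descent is used only to keep the example self-contained. In practice you would use a statistics library.

```python
import math

# Invented rows: (rainfall_mm, is_monday, busy_day)
days = [
    (0, 0, 1), (2, 0, 1), (4, 0, 1), (6, 0, 0), (8, 0, 1), (10, 0, 0),
    (0, 1, 1), (2, 1, 0), (4, 1, 1), (6, 1, 0), (8, 1, 0), (10, 1, 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, learning_rate=0.05, steps=20000):
    """Batch gradient descent on the logistic log-loss.

    Each row is (feature_1, ..., feature_n, label) with label 0 or 1.
    Returns (bias, weights).
    """
    n_features = len(rows[0]) - 1
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for *features, label in rows:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label
            grad_b += error
            for j, x in enumerate(features):
                grad_w[j] += error * x
        bias -= learning_rate * grad_b / len(rows)
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
    return bias, weights


bias, (w_rain, w_monday) = fit_logistic(days)

# Each coefficient is the change in log-odds of a busy day per unit of
# the variable; exp(coefficient) turns that into an odds multiplier.
print(f"per mm of rain: odds x {math.exp(w_rain):.2f}")
print(f"being a Monday: odds x {math.exp(w_monday):.2f}")
```

Both multipliers come out below 1 on this data, meaning more rain and Mondays each reduce the odds of a busy day. That per-variable readout is exactly what a black-box model does not hand you directly.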

Whether this matters depends on how you intend to use the forecast, but in general it's very helpful to understand the underlying factors behind a prediction. Many organisations spend a lot of time debating what's happening and why, and predictive analytics has the ability to shed a rigorous, data-fuelled light on that.

Thanks for reading.

Skarp uses machine learning-powered predictive analytics to generate accurate, automated demand forecasts - and an explanation of what is actually driving performance.

By removing uncertainty and quantifying the impact of factors affecting performance, Skarp can reduce costs and improve customer satisfaction.

We offer a fully-managed service, designed for organisations with limited in-house data science resources.

There is no setup fee or minimum contract term with Skarp, and we offer all new clients a proof of concept free of charge. We believe the accuracy of our forecasts will speak for itself.