Ever wondered why analysts caution that “a model with 99% accuracy can be misleading”? This post is the first in a series on how models are interpreted. This knowledge matters because machine learning is now easily accessible in many automated marketing platforms: the ability to build and deploy a model is no longer the exclusive domain of data scientists.
Assume the task is to predict whether individuals on a given website are potential buyers or non-buyers (converters vs non-converters). Further assume the historical conversion rate is low (less than 2% of website users converted), making non-converters the majority class (the remaining 98%). In this situation, a bad model that predicts ALL visitors (100%) as non-converters will be 98% accurate. However, this model is not useful, given the purpose of the project is to predict converters. You can flip this around and swap website conversion for any other classification problem with a rare class.
As an example: classifying patients as cancer vs non-cancer based on the symptoms they experience. Assuming very few patients have cancer, any model can be highly accurate simply by predicting that no one has cancer. This is dangerous.
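To make the accuracy trap concrete, here is a minimal sketch in Python (using NumPy and scikit-learn, both assumed to be available; the dataset is synthetic, not from the post) that simulates a 2% conversion rate and scores the “predict everyone as a non-converter” model:

```python
# Sketch of the accuracy trap: a model that predicts "non-converter" for
# everyone still scores ~98% accuracy when only ~2% of visitors convert.
import numpy as np
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# Simulate 10,000 website visitors where only ~2% actually converted (label 1).
y_true = (rng.random(10_000) < 0.02).astype(int)

# A "bad" model that simply predicts non-converter (0) for every visitor.
y_pred = np.zeros_like(y_true)

print(f"Accuracy: {accuracy_score(y_true, y_pred):.1%}")  # ~98%, yet zero converters found
```

The headline accuracy looks excellent even though the model never identifies a single converter.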
The remedy
There are two angles from which to attack this problem: changing the data that goes into the model (let’s call this input) and changing how we measure the model’s effectiveness (let’s call this output). Note: input and output are not academic terms; we are just using them to frame the solution.

On the input side, we can feed the model a smaller dataset that does not have a “majority class” problem, perhaps a sub-sample of the original dataset in which the majority class drops from, say, 98% of the data to 50% or 60%. This is called downsampling. An alternative to downsampling is upsampling: duplicating the data rows for the minority class (in the examples above, simply create copies of converters and cancer patients). A sketch of both appears below.

On the output side, we need to “zoom in” on the parts of the prediction we most need the model to get right. This means looking at precision and recall in addition to accuracy. Given this requires explaining the concepts of True and False Positives and True and False Negatives, we will explore it in the next post.
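Here is a hedged sketch of downsampling and upsampling with pandas; the DataFrame `df` and its `converted` column are illustrative assumptions, not code from the post:

```python
# Toy example of rebalancing an imbalanced dataset with pandas.
import pandas as pd

df = pd.DataFrame({
    "converted": [1] * 20 + [0] * 980,   # toy data: ~2% converters
    "feature": range(1000),
})

minority = df[df["converted"] == 1]
majority = df[df["converted"] == 0]

# Downsampling: keep every converter, sample the non-converters down
# so the two classes end up roughly 50/50.
downsampled = pd.concat([
    minority,
    majority.sample(n=len(minority), random_state=0),
])

# Upsampling: duplicate converter rows (sampling with replacement)
# until they match the size of the majority class.
upsampled = pd.concat([
    majority,
    minority.sample(n=len(majority), replace=True, random_state=0),
])

print(downsampled["converted"].value_counts())
print(upsampled["converted"].value_counts())
```

Either rebalanced dataset can then be passed to whatever model you are training; the choice between the two usually comes down to how much data you can afford to throw away versus how much duplication you are comfortable with.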
Some models are not concerned with mere classification; they predict a numerical output instead. In the earlier example of identifying the 2% of website converters, we could analyze the probability of converting produced by the model, P(Convert), rather than counting converters vs non-converters (binary values: 0 or 1), since P(Convert) is continuous and ranges from 0 to 1. We could then zoom in on, say, the 5% of the dataset with the highest predicted P(Convert) and check how well the actual 2% of converters are concentrated in that smaller subset. This is known as a lift analysis.
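A rough sketch of such a lift calculation follows, assuming we already have a model’s predicted probabilities and the true labels; both arrays here are synthetic placeholders rather than real scores:

```python
# Lift analysis sketch: rank visitors by predicted P(Convert), take the top 5%,
# and compare the conversion rate in that slice to the overall baseline.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.02).astype(int)
# Fake scores: converters tend to receive slightly higher P(Convert).
p_convert = np.clip(rng.random(10_000) * 0.3 + y_true * 0.3, 0, 1)

scored = pd.DataFrame({"y": y_true, "p": p_convert}).sort_values("p", ascending=False)

# Take the top 5% of visitors ranked by P(Convert).
top = scored.head(int(len(scored) * 0.05))

baseline_rate = scored["y"].mean()   # ~2% if we targeted visitors at random
top_rate = top["y"].mean()           # conversion rate inside the top 5%
print(f"Lift in top 5%: {top_rate / baseline_rate:.1f}x")
```

A lift well above 1x tells us the model’s probabilities are genuinely concentrating the converters near the top of the ranking, which is far more informative here than overall accuracy.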
In conclusion, a good practitioner needs to recognize when to come at the problem from a different angle. An impressive value on a pre-selected metric (accuracy) might be leading us down the wrong path; we need to be nimble and pivot to build robust predictions.