Why model accuracy alone isn’t enough
Imagine you are working on a credit card fraud detection dataset in which 99% of the observations contain no fraudulent activity. It is easy to build a model that shows 99% accuracy yet never detects fraud: a model that simply predicts "no fraud" for every transaction is right 99% of the time!
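To see the trap concretely, here is a minimal Python sketch using made-up labels (not a real dataset): 1,000 transactions, 1% of them fraudulent, and a hypothetical "model" that always predicts "no fraud". The helper name `accuracy` is our own, chosen for illustration.

```python
# Made-up labels: 990 legitimate transactions (0) and 10 fraudulent ones (1),
# so 99% of the observations contain no fraud.
y_true = [0] * 990 + [1] * 10

# A hypothetical "model" that ignores its input and always predicts "no fraud".
y_pred = [0] * len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Number of fraudulent transactions the model actually flags.
frauds_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(f"accuracy = {accuracy(y_true, y_pred):.2%}")  # accuracy = 99.00%
print(f"frauds caught = {frauds_caught}")            # frauds caught = 0
```

The accuracy looks excellent, yet not a single fraudulent transaction is caught.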
Developing an effective machine learning model can be tricky, and accuracy alone cannot tell you whether your model is really working.
Several evaluation techniques can show how well a model performs. In this article, we will discuss two metrics, Precision and Recall, and then learn to evaluate model performance using a ROC curve.
Consider a binary classification problem, such as a fraud detection dataset in which 0 means no fraud and 1 means fraud.
A True Positive is when the actual class is 1 and the model also predicts 1.
A False Positive is when the actual class is 0 but the model predicts 1. Similarly, a False Negative is when the actual class is 1 but the model predicts 0, and a True Negative is when both the actual and the predicted class are 0.
In summary, True Positives and True Negatives are correct predictions, while False Positives and False Negatives are errors.
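The four outcomes can be counted directly from paired labels. This is a small sketch in plain Python, assuming labels are encoded as 0 and 1 only; the function `confusion_counts` and the example data are hypothetical. Libraries such as scikit-learn offer `sklearn.metrics.confusion_matrix` for the same purpose.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN and TN for binary labels where 1 = fraud, 0 = no fraud."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # fraud correctly flagged
        elif actual == 0 and predicted == 1:
            fp += 1  # legitimate transaction wrongly flagged
        elif actual == 1 and predicted == 0:
            fn += 1  # fraud missed
        else:
            tn += 1  # legitimate transaction correctly passed (assumes 0/1 labels)
    return tp, fp, fn, tn

# Hypothetical example: eight transactions and one model's predictions.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1]
print(confusion_counts(y_true, y_pred))  # (3, 2, 1, 2)
```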
Precision measures what fraction of the predicted positives are True Positives: of all the transactions flagged as fraud, how many really were fraud.
Precision = True Positives / (True Positives + False Positives)
A higher Precision is better. A model that produces no False Positives has a Precision of 1.
Recall measures what fraction of the actually fraudulent transactions the model detects as fraudulent.
Recall = True Positives / (True Positives + False Negatives)
A higher Recall is better. A model that produces no False Negatives has a Recall of 1.
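Both formulas translate directly into code. In this sketch, returning 0.0 when a denominator is zero is an assumed convention (scikit-learn's `precision_score` and `recall_score` expose a `zero_division` parameter for the same situation); the counts are hypothetical.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    # Assumed convention: 0.0 when the model predicts no positives at all,
    # since TP + FP = 0 would otherwise divide by zero.
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp, fn):
    """Fraction of actual positives that the model detects."""
    # Same assumed convention when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

# Hypothetical counts: 3 TP, 2 FP, 1 FN.
print(precision(3, 2))  # 0.6
print(recall(3, 1))     # 0.75

# The "always predict no fraud" model on 10 frauds: 0 TP, 0 FP, 10 FN.
print(precision(0, 0), recall(0, 10))  # 0.0 0.0
```

The last line shows why accuracy misleads: the model that scored 99% accuracy has a Recall of 0.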
Imagine a student who has memorized a foreign language without understanding it. That student may score 100% on a test but be unable to use the language in real life. A single high score can be just as misleading for a model, which is why Precision and Recall are examined together. Unfortunately, the two play a game of tug of war: improving Precision usually reduces Recall, and vice versa.
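The tug of war typically comes from the classification threshold: the model outputs a score, and any transaction scoring at or above the threshold is flagged as fraud. The sketch below uses made-up scores and a hypothetical helper `precision_recall_at` to show Precision rising and Recall falling as the threshold is raised.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is predicted as 1."""
    tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    fn = sum(1 for t, s in zip(y_true, scores) if s < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical fraud scores from some model (higher = more likely fraud).
y_true = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

for threshold in (0.3, 0.5, 0.75):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

This prints:

```
threshold=0.3: precision=0.57, recall=1.00
threshold=0.5: precision=0.75, recall=0.75
threshold=0.75: precision=1.00, recall=0.50
```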
To fully evaluate a model, both Precision and Recall have to be examined and tuned. Only when their balance suits the purpose of the model can we say that the model is effective. One way to see this trade-off across every possible threshold is the Receiver Operating Characteristic (ROC) curve.
A ROC curve is a graph that shows the performance of a classification model at all classification thresholds by plotting the True Positive Rate (TPR, also known as Recall) against the False Positive Rate (FPR).
TPR = True Positives / (True Positives + False Negatives)
FPR = False Positives / (False Positives + True Negatives)
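To draw a ROC curve, each distinct score can be used as a threshold and the resulting (FPR, TPR) pair recorded. This is a minimal sketch, assuming 0/1 labels with both classes present (otherwise a denominator would be zero); `roc_points` and the data are hypothetical. scikit-learn's `sklearn.metrics.roc_curve` computes the same kind of points.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs, one per distinct score used as the threshold."""
    positives = sum(y_true)               # TP + FN at any threshold
    negatives = len(y_true) - positives   # FP + TN at any threshold
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        tpr = tp / positives   # TP / (TP + FN)
        fpr = fp / negatives   # FP / (FP + TN)
        points.append((fpr, tpr))
    return points

# Hypothetical labels and model scores.
y_true = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
for fpr, tpr in roc_points(y_true, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting these pairs with FPR on the x-axis and TPR on the y-axis gives the ROC curve, which runs from (0, 0) to (1, 1).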
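The following figure shows two ROC curves. The larger the Area Under the Curve (AUC), the better the model separates the two classes: an AUC of 1.0 means a perfect classifier, while 0.5 is no better than random guessing.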
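The AUC can also be computed without drawing the curve, using the known equivalence that ROC AUC equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties counted as half). This pairwise sketch is O(P × N) and meant only for illustration; `roc_auc` and both models' scores are hypothetical, and `sklearn.metrics.roc_auc_score` is the usual library route.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
model_a = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]  # hypothetical scores
model_b = [0.5] * 10  # gives every transaction the same score

print(roc_auc(y_true, model_a))  # 0.875
print(roc_auc(y_true, model_b))  # 0.5
```

A model that gives every transaction the same score lands at 0.5, no better than random, however high its accuracy may be on an imbalanced dataset.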
Try it yourself: load some classification data into mlOS, build several models using a dozen different algorithms, and compare them. Even when their accuracies are similar, their ROC curves can reveal the difference between a good model and a bad one.
Three things to remember:
- Accuracy is not the only indicator of model performance
- Precision and Recall trade off against each other and can be balanced to suit the model's purpose
- The ROC curve and its AUC give a fuller picture of performance across all thresholds