ROC (Receiver Operating Characteristic) Curve

A Receiver Operating Characteristic (ROC) curve is a graphical representation of a binary classifier’s diagnostic ability. Its origins are in signal detection theory, but it is now used in a variety of fields, including medicine, radiography, natural-hazard prediction, and machine learning.

  • The widespread use of the ROC curve across these fields speaks to its importance.

ROC curve formula

A ROC curve in machine learning is created by plotting the true positive rate (TPR) against the false positive rate (FPR). The true positive rate (TP / (TP + FN)) is the fraction of all positive observations that were correctly predicted to be positive.

Similarly, the false positive rate (FP / (TN + FP)) is the fraction of negative observations that are mistakenly predicted to be positive. In medical testing, for example, the true positive rate is the percentage of patients who are correctly identified as testing positive for the disease in question.

  • The ROC curve uses both TPR and FPR across a range of classification thresholds. As the threshold is lowered, more items are classified as positive, increasing both the false positives and the true positives (see the sketch below).
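
Below is a minimal sketch, using made-up labels and scores, of how TPR and FPR are computed at a few thresholds, and how scikit-learn’s roc_curve sweeps every distinct score as a threshold to trace the full curve.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy ground-truth labels and predicted scores (probability of the positive class).
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.55, 0.6, 0.7, 0.05])

def tpr_fpr(y_true, y_score, threshold):
    """Compute TPR = TP/(TP+FN) and FPR = FP/(FP+TN) at a single threshold."""
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

# Lowering the threshold flags more items as positive, so both TPR and FPR grow.
for t in [0.9, 0.7, 0.5, 0.3, 0.1]:
    tpr, fpr = tpr_fpr(y_true, y_score, t)
    print(f"threshold={t:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")

# scikit-learn sweeps every distinct score as a threshold to trace the full curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
```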

In ROC space, a discrete classifier that yields only the predicted class corresponds to a single point. With probabilistic classifiers, however, which output a probability or score indicating the degree to which an instance belongs to one class rather than the other, we can generate a full curve by varying the score threshold.

Many discrete classifiers can be turned into scoring classifiers by examining their internal statistics. A decision tree, for example, can use the proportion of positive instances at a leaf node as a score rather than only reporting the node’s majority class.
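
As a concrete illustration (a sketch on synthetic data), scikit-learn’s DecisionTreeClassifier exposes exactly this: predict_proba returns the class proportions of the leaf a sample lands in, and those scores can then be thresholded to trace an ROC curve.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_curve

# Small synthetic binary problem (illustrative only).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A shallow tree: predict() yields a hard class, but predict_proba() exposes the
# proportion of training instances of each class in the leaf a sample falls into.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
leaf_scores = tree.predict_proba(X_test)[:, 1]   # score for the positive class

# These scores can be thresholded at different values to trace a full ROC curve.
fpr, tpr, thresholds = roc_curve(y_test, leaf_scores)
```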

ROC curve analysis

The ROC curve depicts the trade-off between TPR (sensitivity) and 1 – FPR (specificity). Classifiers whose curves sit nearer the top-left corner perform better. As a baseline, a random classifier is expected to produce points along the diagonal (FPR = TPR); the closer a curve comes to this 45-degree diagonal of ROC space, the less accurate the test.

It’s worth noting that the ROC curve is independent of the class distribution. This makes it well suited to evaluating classifiers that predict rare events such as disease or natural disasters. Using accuracy ((TP + TN) / (FN + FP + TP + TN)) to evaluate performance, on the other hand, would favor classifiers that always predict a negative outcome for rare events.
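
A small sketch on synthetic, heavily imbalanced labels makes the point: a classifier that always predicts the negative class gets very high accuracy yet has an AUC of 0.5, i.e., no discriminative ability.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)

# Heavily imbalanced labels: only ~1% positives (a "rare event").
y_true = (rng.random(10_000) < 0.01).astype(int)

# A useless classifier that always predicts "negative" with a constant score of 0.
trivial_scores = np.zeros_like(y_true, dtype=float)
trivial_preds = np.zeros_like(y_true)

print("accuracy:", accuracy_score(y_true, trivial_preds))   # ~0.99, looks great
print("ROC AUC :", roc_auc_score(y_true, trivial_scores))   # 0.5, reveals it learned nothing
```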

The area under the curve (AUC)

It can be useful to summarize each classifier’s performance into a single measure when comparing different classifiers. A common approach is to calculate the area under the ROC curve, abbreviated as AUC.

  • An outstanding model has an AUC close to 1, indicating that it has a high level of separability. 

An AUC approaching 0 indicates a poor model with the lowest measure of separability: it predicts 1s as 0s and 0s as 1s, i.e., its ranking is inverted. An AUC of 0.5 means the model has no ability to distinguish between the classes.

Equivalently, the AUC is the probability that a randomly selected positive instance is ranked higher than a randomly selected negative instance.
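
A brief sketch (toy labels and scores) that computes the AUC with scikit-learn and checks this ranking interpretation by brute force over all positive/negative pairs:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy data (same illustrative values as above).
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.55, 0.6, 0.7, 0.05])

# Library computation of the area under the ROC curve.
auc = roc_auc_score(y_true, y_score)

# Brute-force check of the ranking interpretation: over all (positive, negative)
# pairs, how often does the positive instance receive the higher score?
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
print(auc, np.mean(pairs))   # the two numbers agree
```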

A classifier with a high AUC can still perform worse than one with a lower AUC in a particular region of ROC space. As a general measure of predictive quality, however, the AUC works well in practice.

The following are two reasons why AUC is desirable:

AUC is scale-invariant. Rather than measuring absolute values, it assesses how well predictions are ranked.

AUC is insensitive to the classification threshold. It measures the quality of the model’s predictions irrespective of the threshold chosen.

Both of these properties, however, come with caveats that may limit AUC’s usefulness in certain situations:

Scale invariance isn’t always a good thing. There are occasions when we need well-calibrated probability outputs, and AUC cannot tell us whether the probabilities are calibrated.

Threshold invariance is not always desirable either. When the costs of false negatives and false positives are very different, it may be crucial to minimize one type of classification error. In email spam detection, for example, you’ll probably want to prioritize minimizing false positives (legitimate mail flagged as spam), even at the cost of a considerable rise in false negatives. AUC is not a useful metric for this kind of optimization.
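
One common workaround, sketched below on illustrative scores, is to pick the operating threshold directly from the ROC curve, e.g., the threshold that maximizes recall of spam while keeping the false positive rate under a budget.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Scores from a spam classifier (illustrative values); 1 = spam, 0 = legitimate mail.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1])
y_score = np.array([0.05, 0.2, 0.9, 0.65, 0.3, 0.85, 0.4, 0.7, 0.55, 0.1, 0.6, 0.95])

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Keep the false positive rate (legitimate mail flagged as spam) under a budget,
# then pick the threshold that maximizes recall of spam within that budget.
max_fpr = 0.10
ok = fpr <= max_fpr
best = np.argmax(tpr[ok])
print("threshold:", thresholds[ok][best], "TPR:", tpr[ok][best], "FPR:", fpr[ok][best])
```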