How to Detect Concept Drift with Machine Learning Monitoring

What is Concept Drift?

Let us start by defining concept drift and elaborating on its importance. According to Wikipedia:

“In predictive analytics and Machine Learning, concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes.”

Concept drift (a.k.a. model drift) is part of the Machine Learning lifecycle, and it is perhaps the primary reason why we need to refresh and retrain ML models. As the incoming data drifts away from the historical data that was used for training, the relationships and correlations between features change as well. Mathematically, concept drift is defined as a change in the distribution P(y|X), where y is the real label and X are the available features. The problem with concept drift is that a core assumption of Machine Learning is that the training distribution reflects the “real-world” distribution; otherwise, nothing ensures that the trained model is fit for the target task.
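To make the definition concrete, here is a toy sketch (not from any real system; the threshold values are made up for illustration) in which P(X) stays the same but the labeling rule P(y|X) changes, so a model frozen on the old concept loses accuracy:

```python
import numpy as np

rng = np.random.default_rng(0)

# The feature distribution P(X) is identical in both periods...
x_before = rng.normal(0.0, 1.0, 5000)
x_after = rng.normal(0.0, 1.0, 5000)

# ...but the labeling rule P(y|X) changes: the true decision threshold moves.
y_before = (x_before > 0.5).astype(int)  # old concept
y_after = (x_after > 1.0).astype(int)    # new concept

# A model that memorized the old threshold now misclassifies
# every point that falls in the gap (0.5, 1.0].
frozen_predictions = (x_after > 0.5).astype(int)
accuracy = (frozen_predictions == y_after).mean()
print(f"Accuracy after the concept drifted: {accuracy:.2f}")
```

Even though nothing about the inputs changed, the model's accuracy drops by roughly the probability mass of the region where the old and new concepts disagree.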

To simplify this idea, consider this example in the cyber security field. Imagine you are trying to develop a system that notifies you about potential Denial-of-Service attacks. Say your model features the number of requests received by the server per minute. When this model was trained, 1,000 requests a minute was an extremely large number of requests that indicated something fishy was going on. But what if your company launched an advertising campaign that made your website much more popular? Here, the concept of suspicious behavior has drifted due to changes in reality.
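The DoS scenario above can be sketched in a few lines (all numbers are hypothetical, chosen only to illustrate the point): an alert threshold tuned on pre-campaign traffic starts firing constantly once legitimate traffic grows.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical request rates (requests per minute), one day each
normal_traffic = rng.poisson(lam=300, size=1440)     # before the campaign
campaign_traffic = rng.poisson(lam=1200, size=1440)  # after the campaign

ALERT_THRESHOLD = 1000  # tuned on pre-campaign data: 1,000 req/min looked fishy

alert_rate_before = (normal_traffic > ALERT_THRESHOLD).mean()
alert_rate_after = (campaign_traffic > ALERT_THRESHOLD).mean()
print(f"Alert rate before campaign: {alert_rate_before:.1%}")
print(f"Alert rate after campaign:  {alert_rate_after:.1%}")
```

The threshold itself never changed; reality did, which is exactly why the concept of “suspicious behavior” has to be re-learned.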

Detecting concept drift early on is essential for maintaining up-to-date models in production that continuously provide value to your company. Ideally, this should be incorporated as part of a robust framework for monitoring ML models in production.

Types of Concept Drift

Gradual Concept Drift. This happens frequently as a result of the dynamic nature of businesses. Fraud detection is one of many areas where it occurs: as fraudsters continually adapt to fraud detection techniques, a model based on historical data on fraudulent transactions gradually becomes obsolete, degrading its performance. This can also happen when users or potential users of a product change their preferences over time.

Recurring Concept Drift. This is caused by periodic, seasonal events. The seasonality itself can often be forecasted by the team, but the ML model cannot take the changes into account unless it is explicitly designed to. This happens, for example, when customer behavior shifts with the seasons, or when product discounts temporarily change user preferences.

Instantaneous Concept Drift. This is usually caused by abrupt events that appear as outliers compared to normal observations, or by issues in the data pipeline that degrade data quality and, in turn, model performance. The COVID-19 pandemic is a prime example of an outlier event that teams never planned for: during the outbreak, user behavior changed drastically, impacting businesses, and the models they relied on, unequally.

Models drift to a great degree when there is data drift, so data drift detection should be a priority when using a monitoring tool as a proactive measure against concept drift. For example, model drift can happen when users change preferences or when upstream data changes in the data pipeline. Being proactive by monitoring both data drift and concept drift can save your projects and precious time.

How to Detect Concept Drift

We begin by selecting an appropriate drift detection algorithm. For streaming data, a popular choice is ADWIN (ADaptive WINdowing), while for batched data, the popular choices are the Kolmogorov–Smirnov test, the chi-squared test or adversarial validation.
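For batched data, a two-sample Kolmogorov–Smirnov test is one of the simplest checks: compare a reference batch (e.g., from training time) against a recent production batch, feature by feature. A minimal sketch using SciPy, with synthetic data standing in for real batches:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

reference = rng.normal(0.0, 1.0, 2000)   # batch from training time
production = rng.normal(0.3, 1.0, 2000)  # live batch with a slight mean shift

# Two-sample KS test: are the two samples drawn from the same distribution?
statistic, p_value = stats.ks_2samp(reference, production)
print(f"KS statistic: {statistic:.3f}, p-value: {p_value:.2e}")
if p_value < 0.05:
    print("Distributions differ significantly -> possible drift")
```

The same pattern applies per feature; for categorical features, a chi-squared test on category counts plays the analogous role.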

Next, we apply the selected algorithm separately to the labels, the model’s predictions, and the data features. Drift in any one of these categories may be significant in its own way. Drift in the labels (label drift) indicates a change in how the classes are represented in the real world or in your sampling or processing method, and possibly also a concept drift. Similarly, drift in your model’s predictions indicates data drift in important features, and perhaps also a concept drift. Drift in any individual feature is worth noting, but for some features it may not have a strong effect on your model’s quality. In mathematical notation: concept drift is a change in P(y|X), label drift is a change in P(y), prediction drift is a change in P(ŷ), and data drift is a change in P(X).

To sum it up, our model learns to simulate P(y|X) during training, so concept drift, by definition, implies that our model will not be fit for the task. Label drift, prediction drift, and data drift are metrics that are easier to measure directly and may be strong indicators of concept drift.
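As a quick illustration of the label-drift case, a chi-squared test on class counts can flag a shift in P(y). This is a minimal sketch with made-up counts, not production data:

```python
import numpy as np
from scipy import stats

# Hypothetical class counts: training set vs. a recent production window
train_counts = np.array([900, 100])   # 10% positive class at training time
recent_counts = np.array([800, 200])  # 20% positive class in production

# Chi-squared test of homogeneity on the 2x2 contingency table
table = np.array([train_counts, recent_counts])
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p-value = {p_value:.2e}")
if p_value < 0.05:
    print("Label distribution has shifted -> investigate for concept drift")
```

A significant result does not prove concept drift on its own, but it is a cheap, label-only signal that something upstream deserves a closer look.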



We will show a basic example for detecting concept drift with the ADaptive WINdowing (ADWIN) algorithm using the river Python library for online ML. We begin by defining three different distributions for the data which we then concatenate to reflect a signal that drifts over time. Think of the data as being the true labels, predictions, or individual features.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec

# Generate data for 3 distributions
random_state = np.random.RandomState(seed=42)
dist_a = random_state.normal(0.8, 0.05, 1000)
dist_b = random_state.normal(0.4, 0.02, 1000)
dist_c = random_state.normal(0.6, 0.1, 1000)

# Concatenate data to simulate a data stream with 2 drifts
stream = np.concatenate((dist_a, dist_b, dist_c))

Next, we plot the data:

# Auxiliary function to plot the data stream and the three distributions
def plot_data(dist_a, dist_b, dist_c, drifts=None):
    fig = plt.figure(figsize=(7, 3), tight_layout=True)
    gs = gridspec.GridSpec(1, 2, width_ratios=[3, 1])
    ax1, ax2 = plt.subplot(gs[0]), plt.subplot(gs[1])
    ax1.plot(stream, label='Stream')  # uses the global `stream` defined above
    ax2.hist(dist_a, label=r'$dist_a$')
    ax2.hist(dist_b, label=r'$dist_b$')
    ax2.hist(dist_c, label=r'$dist_c$')
    if drifts is not None:
        for drift_detected in drifts:
            ax1.axvline(drift_detected, color='red')
    plt.show()

plot_data(dist_a, dist_b, dist_c)

Which results in the following graph:

On the left is the synthetic signal, while on the right are the histograms for sets drawn from each of the three distributions. As we can see, the signal has two points with significant drift.

Finally, we try to detect the drift using the ADWIN algorithm.

from river import drift

drift_detector = drift.ADWIN()
drifts = []

for i, val in enumerate(stream):
    drift_detector.update(val)   # Data is processed one sample at a time
    if drift_detector.change_detected:
        # The drift detector indicates after each sample if there is a drift
        print(f'Change detected at index {i}')
        drifts.append(i)         # Record the drift location for plotting
        drift_detector.reset()   # As a best practice, we reset the detector

plot_data(dist_a, dist_b, dist_c, drifts)

Change detected at index 1055
Change detected at index 2079

As you can see, the algorithm detected two drift points (at indices 1055 and 2079) that are quite close to the actual points where the drift occurs (indices 1000 and 2000).


Concept drift or ML model drift is a common issue with Machine Learning models in production that is often not dealt with properly. Incorporating some basic monitoring mechanisms can help you detect potential errors early on and keep your models fresh and relevant. Deepchecks offers services to assist with this process, enabling your data science team to focus more on researching new exciting problems.

Don’t forget to ⭐ our Github repo, it’s really a big deal for open-source-led companies like Deepchecks.

Further Reading

Model drift in ML monitoring

Sethi, Tegjyot Singh, and Mehmed Kantardzic. “On the Reliable Detection of Concept Drift from Streaming Unlabeled Data.” Expert Systems with Applications (2017).

Žliobaitė, Indrė. “Learning under Concept Drift: An Overview.” (2010).

