
When is data drift OK, and when do I have to fix something?

Tiara Williamson

Data drift occurs when the real-world data a model sees in production differs from the input data used to train and validate it. It is worth watching, but whether it is a problem depends on the context. Drift is generally acceptable when it is caused by seasonality, shifts in human preference, or trends: these are outside your control, and they are signals your model needs to account for.
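One common way to put a number on the difference between training data and production data is the Population Stability Index (PSI), which compares how a feature's values fall into bins in each sample. The sketch below is a minimal, dependency-free Python version for a single numeric feature; the bin count, the `eps` floor for empty bins, and the synthetic Gaussian data are illustrative assumptions, not part of the original answer.

```python
import math
import random
from bisect import bisect_right


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index of one numeric feature: reference (e.g. training)
    sample vs. current (e.g. recent production) sample."""
    ref_sorted = sorted(reference)
    # Bin edges at reference quantiles, so each bin holds roughly 1/n_bins of the reference.
    edges = [ref_sorted[len(ref_sorted) * i // n_bins] for i in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Clamp empty bins to eps so the log term below stays finite.
        return [max(c / len(values), eps) for c in counts]

    expected = proportions(reference)
    actual = proportions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


random.seed(0)
train = [random.gauss(0, 1) for _ in range(5000)]
prod_same = [random.gauss(0, 1) for _ in range(5000)]     # same distribution as training
prod_shifted = [random.gauss(1, 1) for _ in range(5000)]  # mean shifted by one std dev

print(f"PSI, no drift:   {psi(train, prod_same):.3f}")
print(f"PSI, mean shift: {psi(train, prod_shifted):.3f}")
```

A frequently quoted rule of thumb reads PSI below 0.1 as little change and above 0.25 as a significant shift, but these cut-offs are conventions. A high PSI tells you the data moved, not whether the move is harmful, which is exactly the judgment the rest of this answer is about.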

  • Seasonality.
    A seasonal pattern is one that repeats over a fixed period (e.g., monthly, yearly, or twice a year). Some level of seasonality is normal in many ML use cases. For example, sales of a particular product may peak at the same time every year, more sales may happen during festive periods than during non-festive ones, and temperatures are higher in summer than in winter. Models that predict these quantities need to account for the seasons to maintain good performance.
  • Human preference and trends.
    Changes in human preference can lead to a trend: a sustained increase or decrease in how often a product is purchased.
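For seasonal drift, much of the practical difference comes from choosing the right reference data. The following is a hypothetical Python illustration: the `history` layout, the sales numbers, and the z-test on the mean (a deliberately simple stand-in for a proper drift test) are all assumptions. It checks the same December data twice, once against an off-season baseline and once against the same season in earlier years.

```python
from statistics import mean, stdev


def drift_alert(baseline, current, z_threshold=3.0):
    """Flag drift if the mean of `current` is far from the mean of `baseline`,
    measured in standard errors (a simple z-test on the mean)."""
    standard_error = stdev(baseline) / len(current) ** 0.5
    z = (mean(current) - mean(baseline)) / standard_error
    return abs(z) > z_threshold


# Hypothetical sales figures for one product, keyed by the month they were observed in.
history = {
    6: [100, 105, 95, 102, 98],     # June (off-season)
    12: [200, 210, 190, 205, 195],  # December (festive peak)
}
this_december = [198, 203, 207, 201]

# Naive check against an off-season baseline: the festive peak looks like drift.
print(drift_alert(history[6], this_december))   # True
# Season-aware check against past Decembers: the same data is within the normal range.
print(drift_alert(history[12], this_december))  # False
```

The data is identical in both calls; only the baseline changes. Comparing against the matching season is one simple way to keep expected seasonal drift from raising alarms.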

You can't fully control these factors, so drift caused by them is acceptable. It can even help businesses change their approach or adapt to customer preferences and trends. Anticipate seasonal effects ahead of time and plan the steps you will take when they occur. If your training data already covers these patterns (for example, several full seasonal cycles), the drift may not affect performance much. Otherwise, you can fix the model reactively once the drift starts to hurt its performance.
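The advice above (plan for expected seasonal drift, and act when drift actually hurts performance) can be expressed as a small decision rule. This is a hypothetical Python sketch, not a Deepchecks API: the `DriftReport` fields, the action names, and the 5% tolerated relative drop are illustrative assumptions you would tune for your own use case.

```python
from dataclasses import dataclass


@dataclass
class DriftReport:
    drift_detected: bool    # output of whatever drift test you run (hypothetical field)
    expected_season: bool   # True if this period was flagged in advance as seasonal
    baseline_metric: float  # e.g. accuracy on validation data (higher is better)
    current_metric: float   # same metric on recent labelled production data


def decide_action(report, max_relative_drop=0.05):
    drop = (report.baseline_metric - report.current_metric) / report.baseline_metric
    if drop > max_relative_drop:
        return "fix"          # performance is hurt: retrain, reweight, or add features
    if report.drift_detected and not report.expected_season:
        return "investigate"  # unexplained drift, model still fine: look for the cause
    return "ok"               # no drift, or anticipated seasonal drift without a performance loss


print(decide_action(DriftReport(True, True, 0.90, 0.89)))   # ok
print(decide_action(DriftReport(True, False, 0.90, 0.89)))  # investigate
print(decide_action(DriftReport(True, True, 0.90, 0.80)))   # fix
```

Checking performance first reflects the point of the answer: drift on its own is a signal, and the model needs fixing when that drift degrades its predictions.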
