Data drift occurs when the input data used to train and validate a model differs from the data the model sees in the real world. It deserves attention, but depending on the context, it is not necessarily a problem. Drift caused by seasonality or by shifts in human preference and trends is expected. These factors are outside your control, and they indicate changes that your model needs to account for.
- Seasonality.
Seasonality is a pattern that repeats at a regular interval (e.g., monthly, biannually, yearly). It is normal for some ML use cases to experience a degree of seasonality. For example, sales of a particular product could peak at a certain time of year, more sales may happen during festive periods than during non-festive periods, and temperatures are higher in summer than in winter. Models that predict these quantities need to account for the seasons to maintain optimal performance.
- Human preference and trends.
Changes in human preference can lead to a trend: an increase or decrease in how often a product is purchased.
You can’t fully control these factors, so drift that occurs for these reasons is acceptable. It can even help businesses change their approach or adapt to customer preferences and trends. Anticipate seasonal effects ahead of time, and plan the steps you will take before they occur. If your training data is robust enough, drift might not affect performance much. You can also fix problems reactively once they start to affect your model's performance.
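One way to plan for seasonality is to compare incoming data against reference data from the same season, rather than against the whole training set. The sketch below is a minimal illustration using only the Python standard library. The function name `seasonal_drift_score`, the sales figures, and the idea of scoring drift as a distance in standard deviations are all assumptions made for this example, not a prescribed method.

```python
from statistics import mean, stdev


def seasonal_drift_score(reference, current, month):
    """Distance between the current mean and the reference mean for the
    same calendar month, in units of the reference standard deviation.

    reference: list of (month, value) pairs from the training period.
    current:   list of values observed in production during `month`.
    """
    # Keep only reference values from the same month as the current batch.
    same_month = [value for m, value in reference if m == month]
    if len(same_month) < 2:
        raise ValueError("need at least two reference values for this month")
    spread = stdev(same_month)
    if spread == 0:
        raise ValueError("reference values for this month have zero spread")
    return abs(mean(current) - mean(same_month)) / spread


# Made-up unit sales for June and December across three years.
reference_sales = [
    (6, 100), (6, 105), (6, 95),
    (12, 200), (12, 210), (12, 190),
]

# High December sales match earlier Decembers, so the score is low.
december_score = seasonal_drift_score(reference_sales, [205, 195], month=12)

# The same sales level in June is far from earlier Junes, so the score is high.
june_score = seasonal_drift_score(reference_sales, [200, 205], month=6)

print(december_score, june_score)
```

A high December figure is not flagged because it is compared with earlier Decembers, while the same figure in June stands out. The cutoff above which a score counts as drift is left open here, since it would depend on the use case.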