Addressing Drifts in Time-Series Forecasting

This blog post was written by Tonye Harry as part of the Deepchecks Community Blog. If you would like to contribute your own blog post, feel free to reach out to us via blog@deepchecks.com. We typically pay a symbolic fee for content that’s accepted by our reviewers.


A time series is a sequence of observations (data points) recorded in chronological order, typically obtained by measuring something repeatedly over a period of time. Time series are used in electrocardiogram (ECG) recordings, which track heart activity in real time and help detect abnormalities in a patient’s cardiovascular system. Other popular examples are forecasting weather conditions from past weather patterns and trading markets based on past data.

More businesses are beginning to see the value in forecasting. It helps them plan operations and allocate people or capital to vital business functions. A food company can use time series forecasting to predict the shelf life of its products so it can plan ahead. Companies that rely heavily on international supply chains often use real-time predictive analytics to predict the arrival time of their supplies, which gives them better visibility into delivery logistics and helps them keep sufficient inventory. Production and manufacturing companies can track and analyze time series data from sensors on their machinery to anticipate and solve problems in their workflows, saving time and money.

As the volume of data generated worldwide grows and data streaming becomes more efficient, time series analysis and forecasting have become more important. This analysis can even save lives, for example through disease outbreak surveillance.

From research scientists to business managers, many people depend on time series data, which often arrives as streaming data. It is unrealistic to expect predictable and consistent data points in every situation, because most data changes over time. Since real-world events are dynamic, the distribution of the data can change. This often leads to data drift, which can severely degrade forecasts.

This article discusses time series analysis and forecasting, focusing on the challenges, drifts in time series, and how to address them.

Time Series Analysis and Forecasting

Fig. 1 Time series forecasting over a period of time. Source

Time Series Analysis

Time series analysis is the process of obtaining statistical information and meaningful summaries from data points grouped chronologically. That analysis can be used to understand patterns or trends over a period of time. The applications of this range from weather, rainfall, and heart rate monitoring to quarterly sales and stock price analysis. In astronomy, experts use it to analyze objects and events of interest (like supernovae) and calibrate their equipment as they research. Time series can be used to monitor patterns of supernovae transitions to better understand how the universe changes over time.

Healthcare facilities also use time series analysis to identify patterns and trends in the length of in-patient stays, the types of treatment patients receive, and when they receive them relative to changes in their condition over time. The facility can then use these findings (length of stay, treatment, etc.) to determine the best times to deliver care and to shorten patients’ typical hospital stays. Zhou et al. deployed a time series forecasting model to predict the number of new in-patient admissions in a hospital to tackle overcrowding.

Types of Time Series Analysis

  • Classification: detects and assigns categories to time series data.
  • Segmentation: segments the data to display the underlying characteristics of the information source.
  • Curve Fitting: studies the relationships of variables within the data by plotting the data along a curve. The goal of curve fitting is to identify the ideal set of parameters to minimize error.
  • Exploratory Analysis: focuses on the relationships between desired variables present in the time series data, typically through data visualizations.
  • Predictive Analysis (Forecasting): predicts future data values using historical trends.

These are a few of the types of time series analysis; others include signal estimation, function approximation, and intervention analysis. Models and techniques commonly used in time series analysis include the ARIMA model, multivariate models, and the Holt-Winters method.

This article focuses on Time Series Forecasting.

Time Series Forecasting

Time series forecasting uses historical data to predict future values. There are two types of large-scale forecasts that professionals are interested in:

Demand Forecast. This is used to anticipate demand for products and services. It can answer questions such as how much money needs to be allocated to each product, or how many cars will be purchased, and so need to be produced, in the next 12 months for inventory planning.

Growth Forecast. This enables businesses to anticipate a number of key performance indicators (KPIs) like future revenue growth or expenses. It answers questions about how much will be spent on cloud computing services next quarter, or how long it will take to hit company targets.

A bad forecast can severely and negatively affect both demand and growth use-cases for businesses that rely on forecasting. Companies should define the needs and expectations for the forecasting tool before implementing it. Before starting to develop a forecasting system, consider the following questions:

  1. What needs to be forecasted, and at what scale?
  2. What’s the cost of creating this system?
  3. What value will be derived from this and how will it be measured?

Keep in mind that the quality of the forecast is greatly influenced by the quality of the data utilized. Therefore, expectations for the system should be reasonable, given the data and resources available.

Time Series Forecasting Challenges (Drifts)

As exciting as the applications of time series forecasting might be, it has challenges. A major concern is concept drift. This happens when the relationship between the input features X and the target variable Y, i.e., the conditional distribution P(Y|X), changes over time. A model trained on the old relationship then loses accuracy on new data. A change in the distribution of the inputs themselves, P(X), is usually called data drift, and it can also hurt the accuracy and performance of a forecasting model.

The following are typical concept drift problems that data scientists in different domains with varied forecasting use-cases encounter.

Trends (Recurrent Drift)

This is very common. A trend is a sustained upward or downward movement in time series data relative to earlier data, and every business needs to consider scenarios where it happens and what to do about it. For example, revenue for a thriving technology company might increase yearly. Although this is ideal for a profitable organization, a forecast of next year’s revenue that assumes the current level will persist is likely to be inaccurate.

Fig. 2. Recurrent drift. Source

Seasonality (Gradual Drift)

This describes how the data changes gradually over time. Seasonality is related to a trend but is not the same thing. A trend is a sustained rise or fall in the data that can change direction with time. Seasonality is a periodic change in data patterns, typically seen when the data rises above a reference point and then falls back. Common seasonal periods are hours of a day or months of a year. For example, toy purchases may be exceptionally high during the Christmas season and then return to baseline in February or another non-festive month. Seasonal changes must be identified in order to handle them effectively and improve forecasting performance.

Fig. 3. Gradual drift. Source

Unique Events (Sudden Drift)

These are events in a particular field of interest that can lead to spikes in streaming data. A sales event or marketing campaign for a phone retailer, for example, may cause an unusual increase in the amount of incoming data. It is important to identify unique occurrences that might directly and significantly affect future observations in the specific field of use. A Black Friday event will have a greater impact on a large U.S. retailer than a Best Buddy Day event, so big events should be actively monitored. This can be done by determining whether such occurrences are related, filtering them out where appropriate, and learning the patterns in the time window around each event.

Fig. 4. Sudden drift. Source
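
One simple way to surface candidate unique events is to flag observations that sit far from the recent past. The sketch below is only an illustration under assumed settings: the function name `flag_spikes`, the 7-step trailing window, the z-score threshold of 3, and the sales figures are all hypothetical and not taken from any particular tool.

```python
from statistics import fmean, pstdev

def flag_spikes(values, window=7, z_threshold=3.0):
    """Return indices whose value lies more than z_threshold standard
    deviations from the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(values)):
        recent = values[i - window:i]
        mean, std = fmean(recent), pstdev(recent)
        # Skip flat windows, where the z-score is undefined.
        if std > 0 and abs(values[i] - mean) / std > z_threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily sales with one promotion-style spike at index 8.
daily_sales = [100, 104, 98, 101, 103, 99, 102, 100, 350, 101, 97, 103]
print(flag_spikes(daily_sales))  # [8]
```

Note that a spike inflates the statistics of the windows that follow it, so in practice flagged points are often excluded or capped before later windows are evaluated.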

Addressing Drifts in Time Series

The drifts covered in this article will happen no matter which forecasting model is used. They are not inherently bad, because they reveal the nature of the streaming data and the events it represents. Collected data often exhibits some level of repetition or “rhythm”, so drift is to be expected. Here are frequently used methods to address drifts:

Preprocessing and Feature Engineering

Detecting trends and seasonality in the preprocessing stage can be helpful in understanding why they happen and finding the correlation between events that cause a drift in the data.

Detrending removes trends that distort the data and gives better visibility to useful subtrends, which can improve model performance in some cases. It can be done during the preprocessing stage, before the data is used for forecasting, with linear detrending or differencing. These methods are simple, but they have disadvantages that must be weighed against the desired results: they can leave local trends in the data, accumulate errors, or overfit local trends.
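
The two detrending methods named above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the helper names `linear_detrend` and `difference` and the example series are assumptions made for this sketch.

```python
from statistics import fmean

def linear_detrend(values):
    """Subtract the least-squares straight line fitted against the time index."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = fmean(values)
    slope = (
        sum((i - t_mean) * (v - v_mean) for i, v in enumerate(values))
        / sum((i - t_mean) ** 2 for i in range(n))
    )
    intercept = v_mean - slope * t_mean
    return [v - (slope * i + intercept) for i, v in enumerate(values)]

def difference(values, lag=1):
    """Return values[i] - values[i - lag]; the result is `lag` items shorter."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]

# Hypothetical series: a trend of +2 per step plus a small repeating wiggle.
series = [2 * i + w for i, w in zip(range(12), [0, 1, 0, -1] * 3)]
detrended = linear_detrend(series)   # residuals around the fitted line
differenced = difference(series)     # step-to-step changes
```

Linear detrending keeps the series length and removes one global line, while differencing shortens the series by `lag` points and has to be reversed (by cumulative summing) to bring forecasts back to the original scale.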

Seasonality can be modeled by Recurrent Neural Networks (RNNs), which learn the data’s sequential characteristics, and by linear models when suitable features are supplied. For a feature-based model (e.g., a linear regression model), practitioners can analyze and identify seasonality, then add a time feature to the data based on the period of the seasonal change (feature engineering). For instance, a “week of the month” or “hour of the day” feature can be added to the time series data if the seasonality occurs monthly or daily. Seasonality can also be detected during the preprocessing stage by visualizing the data, decomposing it into components, and checking whether it is stationary, before using differencing to deseasonalize it. This manual approach can work for a small project or for debugging seasonality concerns, but at large scale teams need an automatic seasonality detection algorithm. Automated ways to detect seasonality include the Fast Fourier Transform (FFT), Change Point Detection (CPD), and the autocorrelation function (ACF).
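
As an illustration of automatic seasonality detection, the sketch below uses the autocorrelation function: it picks the lag at which a series correlates most strongly with a shifted copy of itself. The function names, the `max_lag` search range, and the weekly example data are assumptions for this sketch, and it expects a series whose trend has already been removed.

```python
from statistics import fmean

def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at a positive lag."""
    mean = fmean(values)
    centered = [v - mean for v in values]
    denominator = sum(c * c for c in centered)
    numerator = sum(centered[i] * centered[i - lag] for i in range(lag, len(centered)))
    return numerator / denominator

def dominant_period(values, max_lag):
    """Return the lag in 2..max_lag with the highest autocorrelation."""
    return max(range(2, max_lag + 1), key=lambda lag: autocorrelation(values, lag))

# Hypothetical daily series that repeats every 7 steps (a weekly cycle).
weekly_pattern = [5, 3, 2, 2, 4, 9, 10]
series = weekly_pattern * 8
print(dominant_period(series, max_lag=10))  # 7
```

On noisy real data the peak is less clean, so it is common to require the winning autocorrelation to exceed some minimum value before treating the lag as a seasonal period.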

Fig. 5 Difference between observations and features for feature engineering. Source
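
The time features described above, such as “hour of the day” and “week of the month”, can be derived directly from timestamps. The sketch below is one possible way to do it; the function name `add_time_features`, the feature names, and the convention that days 1 to 7 form week 1 of the month are assumptions made for this example.

```python
from datetime import datetime, timedelta

def add_time_features(timestamps):
    """Return one feature dict per timestamp for a feature-based model."""
    rows = []
    for ts in timestamps:
        rows.append({
            "hour_of_day": ts.hour,                  # 0-23, for daily seasonality
            "day_of_week": ts.weekday(),             # 0 = Monday, 6 = Sunday
            "week_of_month": (ts.day - 1) // 7 + 1,  # 1-5, for monthly seasonality
            "month": ts.month,                       # 1-12, for yearly seasonality
        })
    return rows

# Hypothetical observations taken every 6 hours.
start = datetime(2022, 12, 1, 0, 0)
timestamps = [start + timedelta(hours=6 * i) for i in range(8)]
features = add_time_features(timestamps)
```

These columns would then be joined to the observed values so that a model such as linear regression can learn a separate effect for each seasonal position.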

Monitoring

Monitor your forecasting system. Tools like Evidently AI can monitor data distributions and detect drift. With these tools, teams can get alerts when a monitored metric crosses a set threshold or when a substantial amount of drift is detected. This lets practitioners troubleshoot faster than they could without monitoring.
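
To make the idea of a drift threshold concrete, here is a hand-rolled sketch that compares a reference window with a recent window using the two-sample Kolmogorov-Smirnov statistic. It does not use or imitate the API of Evidently AI or any other monitoring tool; the function names, the threshold of 0.3, and the data are hypothetical.

```python
from bisect import bisect_right

def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap

def drift_alert(reference, current, threshold=0.3):
    """Return True when the statistic exceeds the (hypothetical) alert threshold."""
    return ks_statistic(reference, current) > threshold

reference_window = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11]
same_regime = [11, 12, 10, 13, 12, 11, 12, 10, 13, 11]
shifted_regime = [v + 5 for v in reference_window]

print(drift_alert(reference_window, same_regime))     # False
print(drift_alert(reference_window, shifted_regime))  # True
```

A fixed threshold is the simplest choice; a statistical test with a p-value, or a threshold tuned on historical windows, would usually be more robust for small samples.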

Retrain and Update Your Model

Test different models to see which performs best for the specific use case, and update your model accordingly. Also tune hyperparameters (number of layers, learning rate, etc.) in sequential models such as RNNs and Long Short-Term Memory (LSTM) networks, which might improve performance. Retrain the model periodically so that it keeps up with the current data.
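
Comparing candidate models is usually done with a rolling-origin backtest: at each step the model sees only the past and is scored on the next observation. The sketch below illustrates this with two deliberately simple baselines standing in for real models; the function names, the season length of 7, and the data are assumptions for the example.

```python
def naive_forecast(history, season_length):
    """Predict that the next value equals the last observed value."""
    return history[-1]

def seasonal_naive_forecast(history, season_length):
    """Predict the value observed one full season ago."""
    return history[-season_length]

def backtest_mae(series, forecaster, season_length, min_train):
    """Rolling-origin evaluation: forecast series[t] from series[:t] and
    return the mean absolute error over all t >= min_train."""
    errors = [
        abs(series[t] - forecaster(series[:t], season_length))
        for t in range(min_train, len(series))
    ]
    return sum(errors) / len(errors)

# Hypothetical series with a strict weekly cycle.
series = [5, 3, 2, 2, 4, 9, 10] * 6
candidates = {"naive": naive_forecast, "seasonal_naive": seasonal_naive_forecast}
scores = {
    name: backtest_mae(series, forecaster, season_length=7, min_train=14)
    for name, forecaster in candidates.items()
}
best = min(scores, key=scores.get)
print(best)  # seasonal_naive
```

Rerunning the same backtest on recent data at regular intervals gives a simple signal for when the deployed model should be retrained or replaced.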

Online Learning

Online learning continuously updates a model as new data streams into the forecasting system. Online learning algorithms are a good solution if the model can keep up with the speed and volume of the streaming data and adapt to changing data distributions.
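
A very small example of an online learner is simple exponential smoothing updated one observation at a time. The sketch below is illustrative only: the class name, the smoothing factor of 0.3, and the synthetic stream with a sudden level shift are assumptions, and real systems typically use richer models.

```python
class OnlineExponentialSmoother:
    """One-step-ahead forecaster that updates its level with every observation."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha   # higher alpha adapts faster but is noisier
        self.level = None

    def predict(self):
        """Forecast for the next observation (None before any data is seen)."""
        return self.level

    def update(self, observation):
        """Move the level a fraction alpha of the way toward the new observation."""
        if self.level is None:
            self.level = float(observation)
        else:
            self.level += self.alpha * (observation - self.level)
        return self.level

# Hypothetical stream with a sudden drift from 10 to 20 halfway through.
stream = [10.0] * 20 + [20.0] * 20
model = OnlineExponentialSmoother(alpha=0.3)
absolute_errors = []
for y in stream:
    forecast = model.predict()   # predict first, then learn from the outcome
    if forecast is not None:
        absolute_errors.append(abs(y - forecast))
    model.update(y)
```

The error jumps when the level shifts and then shrinks with each update, which is the adaptive behavior that makes online learning attractive under drift.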

Conclusion

As more data is generated every day, time series analysis and forecasting become more and more popular across a wide range of industries, enabling organizations to make informed decisions. Drifts are rather common, and when they do occur, teams should recognize how much they impact their project and act as soon as they can to prevent a decline in the prediction model’s performance. A brief reminder: you can’t solve every drift at once, so prepare for each drift that comes, deal with them as they occur, and adapt as much as possible.
