Machine learning models now drive many of the most critical business decisions. As a result, once deployed to production, it is essential that these models stay relevant to the most recent data.
If there is data skew, a model may fall out of context because the data distribution in production differs from the one used during training. It is also possible that a feature is no longer available in production data, or that the model no longer applies because the real-world environment, in other words, user behavior, has changed.
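One common way to quantify this kind of data skew is the Population Stability Index (PSI), which compares a feature's binned distribution in production against the training data. PSI is not mentioned in the text above; it is one illustrative technique among several, and the rule-of-thumb thresholds (below 0.1 stable, above 0.2 significant shift) are conventions, not universal laws. A minimal sketch with synthetic data:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a training sample
    (expected) and a production sample (actual) of one feature.
    Rule of thumb: < 0.1 stable, 0.1-0.2 moderate shift,
    > 0.2 significant shift. Illustrative helper, not a standard API."""
    lo = min(expected.min(), actual.min())
    hi = max(expected.max(), actual.max())
    edges = np.linspace(lo, hi, bins + 1)  # shared equal-width bins
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions so empty bins do not produce log(0)
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(42)
train = rng.normal(0.0, 1.0, 10_000)        # training distribution
prod_stable = rng.normal(0.0, 1.0, 10_000)  # no drift
prod_shifted = rng.normal(1.0, 1.0, 10_000) # mean shifted by one sigma

print(psi(train, prod_stable))   # small, well under 0.1
print(psi(train, prod_shifted))  # large, well over 0.2
```

In practice such a check would run per feature on a schedule, with the bin edges frozen from the training set rather than recomputed each time.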
Feedback mechanisms are crucial in many areas of life, including business. The concept of a feedback loop is simple: you make something, measure information about it, and use that information to improve the output. It is a continuous cycle of observation and improvement. A feedback loop can be added to anything that has observable data and room for improvement, and ML models can undoubtedly benefit from one.
Data ingestion, pre-processing, model building and evaluation, and finally deployment are the stages of a typical ML workflow. However, one important element is missing: feedback.
The basic goal of any model monitoring approach is to establish this critical feedback loop from the deployment phase back to the model development phase. This allows the ML model to improve over time by informing the decision of whether to update the model or keep the current one. To support this decision, the model monitoring framework should track and report several model metrics in the two situations below.
In the first situation, training data is available, so production metrics can be compared against their training-time baselines. In the second, no training data is available, and the framework computes the model metrics using only the data collected after deployment.
Depending on which of the two situations applies, the metrics described in the next section determine whether a model in production requires an update or other intervention.
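The retrain-or-keep decision at the heart of this feedback loop can be sketched as a simple rule that compares monitored metrics against thresholds. The metric names, threshold values, and the `MonitoringReport` structure below are all illustrative assumptions, not part of any specific framework:

```python
from dataclasses import dataclass

@dataclass
class MonitoringReport:
    """Metrics a monitoring job might emit for one model version.
    Field names and semantics are illustrative assumptions."""
    accuracy: float      # available only once ground truth arrives
    feature_psi: float   # stability metric on the input data
    error_rate: float    # operational metric: failed predictions

def needs_update(report, baseline_accuracy,
                 max_accuracy_drop=0.05, max_psi=0.2, max_error_rate=0.01):
    """Feedback-loop decision: return the reasons (if any) to flag
    the model for retraining; an empty list means keep it as-is.
    Thresholds here are placeholders, tuned per use case in practice."""
    reasons = []
    if baseline_accuracy - report.accuracy > max_accuracy_drop:
        reasons.append("performance degraded")
    if report.feature_psi > max_psi:
        reasons.append("input distribution shifted")
    if report.error_rate > max_error_rate:
        reasons.append("operational errors elevated")
    return reasons

healthy = MonitoringReport(accuracy=0.91, feature_psi=0.05, error_rate=0.002)
drifting = MonitoringReport(accuracy=0.84, feature_psi=0.31, error_rate=0.002)
print(needs_update(healthy, baseline_accuracy=0.92))   # []
print(needs_update(drifting, baseline_accuracy=0.92))  # two reasons
```

A production system would additionally log the reasons, alert an on-call engineer, and possibly trigger an automated retraining pipeline rather than just returning a list.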
The most useful model monitoring metrics fall into three categories based on their reliance on data and/or the machine learning model.
A model performance monitoring framework should ideally include one or two metrics from each of the three categories; if tradeoffs are necessary, one can start with operational metrics and add the others as the model matures. Operational metrics should be checked in real time, or at least daily, while stability and model performance metrics can be monitored weekly or at longer intervals, depending on the domain and business environment.
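The cadence described above can be expressed as a small scheduling configuration. The category names follow the text (operational, stability, performance), but the specific intervals and metric names are illustrative assumptions:

```python
from datetime import timedelta

# Illustrative check cadences for the three metric categories;
# actual intervals depend on the domain and business environment.
MONITORING_SCHEDULE = {
    "operational": {   # latency, error rate, throughput
        "interval": timedelta(minutes=5),  # near real-time
        "metrics": ["latency_p95_ms", "error_rate", "requests_per_min"],
    },
    "stability": {     # drift in inputs and predictions
        "interval": timedelta(days=7),
        "metrics": ["feature_psi", "prediction_psi"],
    },
    "performance": {   # quality against ground truth, when available
        "interval": timedelta(days=7),
        "metrics": ["accuracy", "auc"],
    },
}

def due_checks(elapsed):
    """Return the categories whose check interval has elapsed
    since their last run (toy scheduler for illustration)."""
    return [name for name, cfg in MONITORING_SCHEDULE.items()
            if elapsed >= cfg["interval"]]

print(due_checks(timedelta(hours=1)))  # only the operational checks
print(due_checks(timedelta(days=8)))   # all three categories
```

In a real deployment this configuration would typically live in the orchestration layer (a cron-style scheduler or workflow engine) rather than in application code.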
For mature ML systems, monitoring the MLOps lifecycle, MLOps pipelines, and the MLOps platform has become a necessity. Developing such a framework is critical to ensuring the ML system's consistency and robustness, since failing to do so risks losing the end user's confidence, which can be fatal for the product. As a result, it is essential to plan for monitoring in the overall solution architecture of any ML use case implementation.