DEEPCHECKS GLOSSARY

MLOps Monitoring

Machine learning models now drive many of the most critical business decisions. As a result, once these models are deployed to production, it is essential that they remain relevant in the context of the most recent data.

A model may fall out of context when there is data skew, i.e., the data distribution in production differs from the distribution used during training. It is also possible that a feature is no longer available in production data, or that the model no longer applies because the real-world environment, or, to put it another way, user behavior, has changed.

Feedback

Feedback mechanisms are crucial in many parts of life, including business. The concept of a feedback loop is simple: you make something, measure information about it, and use that knowledge to improve the output. It is a never-ending cycle of observation and improvement. A feedback loop can be built into anything with observable data and room for improvement, and ML models can undoubtedly benefit from one.

Data ingestion, pre-processing, model building and evaluation, and finally deployment are all steps in a typical ML workflow. However, one important step is missing: feedback.

The basic goal of any model monitoring approach is to establish this critical feedback loop from the deployment phase back to the model development phase. This allows the ML model to improve over time, since monitoring informs the decision of whether to update the model or keep the current one. To support this decision, the model monitoring framework should track and report several model metrics under whichever of the two situations below applies.

  • Training data is available: the framework computes the model metrics on both the training data and the post-deployment production data, and compares the results to reach a conclusion.

  • No training data is available: the framework computes the model metrics using only the data available after deployment.

Based on which of these two situations applies, the metrics described in the next section are computed to determine whether a model in production requires an update or other intervention.
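
As a rough illustration, the sketch below shows how the two situations differ in code, assuming a scikit-learn-style classifier and ground-truth labels that eventually arrive for production data; the helper names are purely illustrative and not part of any library.

```python
from sklearn.metrics import accuracy_score

def monitor_with_training_data(model, X_train, y_train, X_prod, y_prod):
    """Situation 1: training data is available, so the same metric is
    computed on the training (reference) data and on production data,
    and the two are compared."""
    train_acc = accuracy_score(y_train, model.predict(X_train))
    prod_acc = accuracy_score(y_prod, model.predict(X_prod))
    # A large gap suggests the model has degraded in production.
    return {"train_accuracy": train_acc,
            "prod_accuracy": prod_acc,
            "gap": train_acc - prod_acc}

def monitor_without_training_data(model, prod_batches):
    """Situation 2: no training data is available, so the metric is
    tracked over successive production batches and the trend is
    monitored instead of a fixed reference value."""
    return [accuracy_score(y_batch, model.predict(X_batch))
            for X_batch, y_batch in prod_batches]
```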

Measurements

Model monitoring metrics are commonly divided into three categories based on their reliance on data and/or the machine learning model itself.

A model performance monitoring framework should preferably include one or two metrics from each of the three categories; if trade-offs must be made, one can start with operational metrics and add the others as the model matures. Furthermore, operational metrics should be checked in real time or at least daily, while stability and model performance metrics can be monitored weekly or at longer intervals, depending on the domain and business environment.

  • Stability Metrics — We can use these measures to detect two sorts of data distribution shift (a PSI sketch follows this list):
  1. Prior Probability Shift – Captures the shift in the distribution of the predicted outputs and/or the dependent variable between the training and production data, as well as across different time frames of the production data.
  2. Covariate Shift – Captures the shift in the distribution of each independent variable between the training and production data, or across different time frames of the production data, as appropriate.
  • Evaluation Metrics — These metrics help detect concept drift, i.e., whether the relationship between the independent and dependent variables has changed. They do so by comparing the quality of the currently deployed model against its quality at training time or at an earlier period after deployment. Based on this comparison, a decision can be made about whether to rework the deployed model.
  • Operational Metrics – These indicators tell us how well the deployed model is functioning in terms of utilization. Unlike the previous two categories, they are independent of model type and data and require no additional inputs (a small tracker sketch also follows this list).
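
For the stability metrics above, one widely used measure is the Population Stability Index (PSI), which can be applied to a single feature (covariate shift) or to the model's predicted outputs (prior probability shift). Below is a minimal sketch; the binning scheme and the thresholds in the closing comment are conventional rules of thumb, not fixed standards.

```python
import numpy as np

def population_stability_index(reference, production, bins=10):
    """PSI between a reference (training) sample and a production sample
    of a single numeric variable."""
    # Derive the bin edges from the reference distribution.
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_counts, _ = np.histogram(reference, bins=edges)
    prod_counts, _ = np.histogram(production, bins=edges)

    # Convert counts to proportions; a small epsilon avoids log(0).
    eps = 1e-6
    ref_pct = ref_counts / max(ref_counts.sum(), 1) + eps
    prod_pct = prod_counts / max(prod_counts.sum(), 1) + eps

    return float(np.sum((prod_pct - ref_pct) * np.log(prod_pct / ref_pct)))

# Common rule of thumb: PSI < 0.1 indicates little shift, 0.1-0.25 a
# moderate shift, and > 0.25 a significant shift worth investigating.
```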
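
Operational metrics, by contrast, need neither labels nor training data. The sketch below tracks request count, average latency, and error rate over a sliding window around a model's predict call; the OperationalMonitor class is illustrative only, not part of any particular library.

```python
import time
from collections import deque

class OperationalMonitor:
    """Tracks simple operational metrics for a deployed model:
    request count, average latency over a sliding window, and error rate."""

    def __init__(self, window=1000):
        self.latencies = deque(maxlen=window)
        self.requests = 0
        self.errors = 0

    def predict(self, model, X):
        """Wrap a prediction call and record its latency and outcome."""
        self.requests += 1
        start = time.perf_counter()
        try:
            return model.predict(X)
        except Exception:
            self.errors += 1
            raise
        finally:
            self.latencies.append(time.perf_counter() - start)

    def summary(self):
        avg_latency = (sum(self.latencies) / len(self.latencies)
                       if self.latencies else 0.0)
        error_rate = self.errors / self.requests if self.requests else 0.0
        return {"requests": self.requests,
                "avg_latency_seconds": avg_latency,
                "error_rate": error_rate}
```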

Conclusion

For mature ML systems, monitoring the MLOps lifecycle, MLOps pipelines, and the MLOps platform has become a necessity. Developing such a framework is critical to ensuring the ML system’s consistency and robustness, since failing to do so risks losing the end user’s confidence, which can be fatal. It is therefore important to include and plan for monitoring in the overall solution architecture of any ML use case implementation.