Machine Learning (ML) models are becoming increasingly popular across industries. They are involved in sensitive decision-making processes, where mistakes carry a high price. Despite these stakes, many ML models go into production unmonitored and without proper testing, leaving them exposed to risk. This is largely because MLOps is still a young discipline, and no standardized method for monitoring ML models has yet emerged.
Who’s In Charge
Visualization of the different parties involved in the ML model development and upkeep (source)
A production-ready ML model is often the product of contributors from various fields. While the data engineers explore, clean, and construct the data pipelines, the data scientists experiment with different models and evaluate their performance. ML engineers then productionize the chosen model and incorporate it into the production environment, and DevOps is in charge of deployment and monitoring. Each role has different expertise and a different idea of what monitoring consists of. This complex setup puts companies in a situation where no single body takes full responsibility for the ML models in production. To create a proper monitoring system, the different parties need to form a common language and address the different aspects of monitoring ML systems.
Often no single body takes full responsibility for the ML models in production (source)
What Should be Monitored
While DevOps teams are usually familiar with metrics such as uptime, latency, and resource utilization, they might not be familiar with ML model monitoring metrics like accuracy, precision and recall, or slice-based performance metrics.
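To make "slice-based performance metrics" concrete, here is a minimal sketch of computing precision and recall per data slice (e.g. per device type or region). The slice keys and records are illustrative assumptions, not part of any specific monitoring tool.

```python
# Hypothetical sketch: per-slice precision and recall for a binary classifier.
# Records are (slice_key, y_true, y_pred) tuples; the keys are made up.
from collections import defaultdict

def slice_metrics(records):
    """Return {slice_key: {"precision": ..., "recall": ...}} for binary labels."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for key, y_true, y_pred in records:
        c = counts[key]
        if y_pred == 1 and y_true == 1:
            c["tp"] += 1
        elif y_pred == 1 and y_true == 0:
            c["fp"] += 1
        elif y_pred == 0 and y_true == 1:
            c["fn"] += 1
    metrics = {}
    for key, c in counts.items():
        p_denom = c["tp"] + c["fp"]
        r_denom = c["tp"] + c["fn"]
        metrics[key] = {
            "precision": c["tp"] / p_denom if p_denom else 0.0,
            "recall": c["tp"] / r_denom if r_denom else 0.0,
        }
    return metrics

records = [
    ("mobile", 1, 1), ("mobile", 0, 1), ("mobile", 1, 1),
    ("desktop", 1, 0), ("desktop", 1, 1), ("desktop", 0, 0),
]
print(slice_metrics(records))
```

A model with solid overall accuracy can still fail badly on one slice; tracking metrics per slice surfaces that before your users do.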
Since the performance of an ML model is tied directly with the data it operates on, it is essential to monitor both the model’s output and input features at different stages in order to detect data drifts and schema changes early on, and identify the source of any issues that may arise.
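One simple way to monitor an input feature for drift is the Population Stability Index (PSI), which compares the feature's live distribution against its training distribution. This is a minimal sketch; the bucket count, sample data, and the common 0.2 alert threshold are illustrative assumptions.

```python
# Hypothetical sketch: Population Stability Index for one numeric feature.
# Higher PSI means the live distribution has drifted further from training.
import math

def psi(expected, actual, buckets=10):
    """Compare two samples of a numeric feature bucketed on the training range."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / buckets for i in range(buckets + 1)]
    edges[-1] = float("inf")  # catch live values above the training maximum

    def fractions(sample):
        counts = [0] * buckets
        for v in sample:
            for i in range(buckets):
                if v < edges[i + 1] or i == buckets - 1:
                    counts[i] += 1
                    break
        # small floor avoids log(0) on empty buckets
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [i / 100 for i in range(1000)]             # stand-in training feature
live_ok = [i / 100 for i in range(1000)]           # same distribution
live_drifted = [5 + i / 100 for i in range(1000)]  # shifted distribution

print(psi(train, live_ok))       # near 0: no drift
print(psi(train, live_drifted))  # large: worth alerting (e.g. PSI > 0.2)
```

Running the same comparison on the model's output distribution catches drift even when ground-truth labels arrive late or not at all.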
It can be tricky to define when something “goes wrong”, because ML models are statistical objects that are expected to make some mistakes. Testing ML models is therefore not the same as traditional software testing.
What Can Go Wrong With Your Model After Deployment
When your ML model meets the real world, issues may come up either immediately or after some time in production.
Dev/Production Data Mismatch. Many ML models are trained on hand-crafted clean datasets. When those models are applied to real-world data, they perform poorly because of that mismatch. Even a minor mismatch in the structure of the data fed to your ML model can greatly impact the model’s performance.
Data Integrity Issues. The data pipeline is often complex and the format of the data may change over time. Fields could be renamed, categories may be added or split, and more. Any such change can drastically affect your model’s performance.
Data Drift and Concept Drift. Data in the real world is constantly changing. Social trends, market shifts, and global events affect every industry. These may affect the distribution of the data being fed to your model, or that of the desired target prediction. The data you’ve trained your model on becomes less and less relevant over time. The format of the data is still valid, but your model will become stale and performance will deteriorate as time passes.
Serving Issues. It is also possible for your model to not receive the traffic you expected, or for model latency to be so high that predictions are not actually incorporated into the system as intended.
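Data integrity issues like the ones above can be caught with a lightweight schema check on incoming records before they reach the model. The field names, types, and allowed categories below are illustrative assumptions, not a real production schema.

```python
# Hypothetical sketch: validate incoming records against an expected schema.
# EXPECTED_SCHEMA and ALLOWED_CATEGORIES are made-up examples.
EXPECTED_SCHEMA = {
    "age": (int, float),
    "country": str,
    "plan": str,
}
ALLOWED_CATEGORIES = {"plan": {"free", "pro", "enterprise"}}

def validate_record(record):
    """Return a list of integrity problems found in one input record."""
    problems = []
    for field, types in EXPECTED_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], types):
            problems.append(f"bad type for {field}: {type(record[field]).__name__}")
    for field, allowed in ALLOWED_CATEGORIES.items():
        if field in record and record[field] not in allowed:
            problems.append(f"unknown category for {field}: {record[field]!r}")
    return problems

print(validate_record({"age": 31, "country": "DE", "plan": "pro"}))  # []
print(validate_record({"age": "31", "plan": "premium"}))             # 3 problems
```

Logging or alerting on these problems, rather than silently feeding malformed records to the model, turns a renamed field or a new category from a slow performance decay into an immediate, attributable incident.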
These common problems can be detected early with proper ML model monitoring, and save you and your company from a potential catastrophe. Without proper monitoring, some of these issues can go unnoticed, as ML models are relatively “hidden” in the production environment and are often treated as black boxes.
Be in Control
Proper monitoring will enable you to be in full control of your ML models (source)
Monitoring ML models in production will enable you to be in control of your product, detect issues early on, and intervene immediately when action is needed. You will be notified if the data pipeline breaks, a certain feature becomes unavailable, or your model has simply become stale and must be retrained. Continuous evaluation of ML model performance metrics will give you peace of mind, knowing your model is operating as expected. ML model performance monitoring practices play a significant role in the transition towards more reliable and unbiased AI systems.
If you currently have any ML models in production, ask yourself the following questions:
- What metrics am I constantly monitoring? Are they a good enough indicator of success?
- How will I know when there is a mismatch between the development data and the production data?
- How long will it take me to find out if my model is not working as expected? How long will it take me to identify the source of the problem?