Machine learning models are becoming increasingly popular across a variety of industries. They are involved in decision-making on sensitive matters, where any mistake carries a high price. Despite the sensitivity of these applications, many ML models go into production unmonitored and without proper testing, and thus are exposed to significant risk. This is perhaps because MLOps is still a young field, and there is currently no standardized method for monitoring machine learning models.
Who’s In Charge?
Visualization of the different parties involved in the ML model development and upkeep (source)
A production-ready ML model is often the product of contributors from various fields. Data engineers explore, clean, and construct the data pipelines; data scientists experiment with different models and evaluate their performance; ML engineers then productionize the chosen model and incorporate it into the production environment; and finally, DevOps engineers are in charge of deployment and monitoring. Each of these roles has different expertise and a different idea of what monitoring consists of. This complex setup can leave companies in a situation where no single body takes full responsibility for the ML models in production. In order to create a proper monitoring system, the different parties need to form a common language and address the different aspects of monitoring ML systems.
Often no single body takes full responsibility for the ML models in production (source)
What Should be Monitored?
While DevOps teams are usually familiar with metrics such as uptime, resource utilization, and latency, they might not be familiar with machine learning monitoring metrics such as accuracy, precision, and recall, or with concepts such as slice-based performance metrics.
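To make slice-based metrics concrete, here is a minimal sketch of computing accuracy per data slice. The function name and the example data are illustrative, not part of any particular monitoring tool:

```python
from collections import defaultdict

def slice_accuracy(y_true, y_pred, slices):
    """Accuracy broken down per data slice (e.g. per country or device type)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, s in zip(y_true, y_pred, slices):
        total[s] += 1
        correct[s] += int(truth == pred)
    return {s: correct[s] / total[s] for s in total}

# A model can look acceptable overall while failing badly on one slice:
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0]
slices = ["web", "web", "web", "mobile", "mobile", "mobile"]
print(slice_accuracy(y_true, y_pred, slices))
# overall accuracy is 50%, but the "mobile" slice is 0% correct
```

This is exactly the kind of signal an aggregate accuracy number hides, which is why slice-based views matter for monitoring.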
Additionally, as the performance of an ML model is tied directly with the data it operates on, it is essential to monitor both the model’s output and the input features at different stages in order to detect data drifts and schema changes early on, and identify the source of any issues that may arise.
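One lightweight way to monitor input features at each pipeline stage is to log summary statistics and compare them over time. A minimal sketch, with an illustrative function name:

```python
def feature_summary(values):
    """Summary statistics worth logging for a numeric feature at each
    pipeline stage: comparing these across stages and over time helps
    surface drifts and schema changes early."""
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    mean = sum(present) / len(present)
    var = sum((v - mean) ** 2 for v in present) / len(present)
    return {
        "missing_rate": n_missing / len(values),
        "mean": mean,
        "std": var ** 0.5,
        "min": min(present),
        "max": max(present),
    }

print(feature_summary([1.0, 2.0, 3.0, None]))
```

A sudden jump in `missing_rate` or a shift in `mean` between two stages points you at the stage where the problem was introduced.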
Finally, it can be tricky to define when something “goes wrong”, as ML models are statistical objects that are allowed to make some mistakes. Testing machine learning models is therefore not quite the same as traditional software testing.
What Can Go Wrong With Your Model After Deployment?
When your ML model meets the real world there are a variety of issues that may come up, either immediately or after some time in production.
Dev/Production data mismatch: Many ML models are trained on hand-crafted, clean datasets. When these models are then applied to real-world data, they perform poorly due to this mismatch. Even a minor mismatch in the structure of the data fed to your ML model can have a huge impact on the model’s performance.
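A simple way to catch such mismatches is to capture per-feature statistics at training time and compare incoming production batches against them. This sketch is illustrative (the three-standard-deviation threshold is an assumption, not a recommendation):

```python
def mismatch_alerts(train_stats, prod_batch, threshold=3.0):
    """Flag features whose production mean deviates from the training mean
    by more than `threshold` training standard deviations.
    `train_stats` maps feature name -> (mean, std) captured at training time;
    `prod_batch` is a list of feature dicts seen in production."""
    alerts = []
    for name, (mean, std) in train_stats.items():
        values = [row[name] for row in prod_batch if name in row]
        if not values:
            alerts.append(f"{name}: missing from production data")
            continue
        prod_mean = sum(values) / len(values)
        if std > 0 and abs(prod_mean - mean) / std > threshold:
            alerts.append(f"{name}: mean shifted to {prod_mean:.1f} (train: {mean:.1f})")
    return alerts

# Training saw ages around 35; production suddenly serves a much older population:
print(mismatch_alerts({"age": (35.0, 5.0)}, [{"age": 70}, {"age": 72}]))
```

Running this check on every batch, rather than once at launch, also catches mismatches that creep in later.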
Data integrity issues: The data pipeline is often complex and the format of the data may change over time, fields could be renamed, categories may be added or split, and more. Any such change can have a dramatic effect on your model performance.
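Renamed fields and new categories can both be caught with a basic schema check on incoming records. A minimal sketch, assuming you recorded the expected fields and category values at training time:

```python
def schema_issues(expected_fields, allowed_categories, record):
    """Check one incoming record against the schema seen at training time.
    `allowed_categories` maps a categorical field to the set of values
    the model was trained on."""
    issues = []
    for field in expected_fields:
        if field not in record:
            issues.append(f"missing field: {field}")
    for field in record:
        if field not in expected_fields:
            issues.append(f"unexpected field: {field}")  # possibly renamed upstream
    for field, allowed in allowed_categories.items():
        value = record.get(field)
        if value is not None and value not in allowed:
            issues.append(f"unseen category in {field}: {value!r}")
    return issues

# "country" was renamed to "nation" upstream -> two issues are flagged:
print(schema_issues({"age", "country"}, {"country": {"US", "DE"}},
                    {"age": 30, "nation": "FR"}))
```

In practice you would run a check like this at the entrance to the model, before the bad record ever reaches it.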
Data drift and concept drift: Data in the real world is constantly changing. Social trends, market shifts, and global events affect each and every industry. These in turn may affect the distribution of the data that is being fed to your model, or that of the desired target prediction. Thus the data you’ve trained your model on becomes less and less relevant over time. The format of the data is still valid but your model will become stale and performance will deteriorate over time.
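One common drift metric is the Population Stability Index (PSI), which compares the binned distribution of a feature (or of the model's predictions) between training and production. The implementation below is a simplified sketch with equal-width bins; real monitoring systems often use quantile bins and windowed comparisons:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a training sample (`expected`)
    and a production sample (`actual`). Common rule of thumb:
    < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 significant drift."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def fractions(sample):
        counts = [0] * bins
        for v in sample:
            counts[sum(v > e for e in edges)] += 1  # bin index
        # small floor avoids log(0) for empty bins
        return [max(c / len(sample), 1e-4) for c in counts]

    p, q = fractions(expected), fractions(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

train = list(range(100))
print(psi(train, train))                      # identical data -> no drift
print(psi(train, [v + 50 for v in train]))    # shifted data -> large PSI
```

Tracking PSI per feature over time tells you not only that the model went stale, but which inputs changed under it.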
Serving issues: Finally, it may even be possible that your model is not receiving the traffic you expect it to receive, or that model latency is so high that the predictions are not actually being incorporated in the system as intended.
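Traffic volume and latency can be tracked with a simple sliding-window monitor. The class name, thresholds, and SLO below are illustrative assumptions, not recommendations:

```python
import time
from collections import deque

class ServingMonitor:
    """Tracks request volume and latency over a sliding window and reports
    alerts when traffic drops or tail latency exceeds an SLO."""

    def __init__(self, window_seconds=60, min_requests=10, max_p95_ms=200):
        self.window = window_seconds
        self.min_requests = min_requests
        self.max_p95_ms = max_p95_ms
        self.events = deque()  # (timestamp, latency_ms)

    def record(self, latency_ms, now=None):
        now = time.time() if now is None else now
        self.events.append((now, latency_ms))
        # drop events that have fallen out of the window
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()

    def alerts(self):
        if len(self.events) < self.min_requests:
            return ["traffic below expected volume"]
        latencies = sorted(l for _, l in self.events)
        p95 = latencies[int(0.95 * (len(latencies) - 1))]
        if p95 > self.max_p95_ms:
            return [f"p95 latency {p95:.0f}ms exceeds SLO"]
        return []
```

A sudden "traffic below expected volume" alert is often the first sign that a caller upstream stopped sending requests, long before anyone notices the predictions are missing.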
These are some of the common problems that proper machine learning model monitoring can detect early on, saving you and your company from a potential catastrophe. Without proper monitoring, some of these issues can go unnoticed, as ML models are relatively “hidden” in the production environment and are often treated as black boxes.
Be in Control
Proper monitoring will enable you to be in full control of your ML models (source)
Monitoring ML models in production will enable you to be in control of your product, detect issues early on, and intervene right away when action is needed. You will be notified if the data pipeline breaks, a certain feature becomes unavailable, or your model has simply become stale and must be retrained. Additionally, continuous evaluation of machine learning model performance metrics will give you peace of mind, knowing that your model is operating as expected. Machine learning model performance monitoring practices play a significant role in the transition towards more reliable and unbiased AI systems.
If you have any ML models in production at the moment ask yourself the following questions:
- What are the metrics that I am constantly monitoring? Are they a good enough indicator of success?
- How will I know when there is some mismatch between the development data and the production data?
- How long will it take me to find out my model is not working as expected? How long will it take me to identify the source of the problem?