Are you constantly nervous about hidden issues in your machine learning modeling framework? Do you want to know how to automate model monitoring well? Do you need to decide what to monitor and how?
This article will explain why you should automate machine learning monitoring and share the best practices for setting up your monitoring framework.
Why Automate Your Machine Learning Model Monitoring Framework
A core challenge with monitoring is that it is an “important but not urgent” task. It is critical to prevent and address issues in your pipeline, but other urgent and important tasks always take priority.
The common advice for addressing non-urgent important tasks is to schedule them in your calendar. Scheduling is a great method for tasks with uniform importance, but monitoring is not like that. If your pipeline works, constant checking is a waste of time. However, when a problem emerges, you really have to know about it in time and respond.
Automating your machine learning model monitoring framework is a better answer. It keeps you assured that things work when they do and makes issues urgent when they become relevant.
For this reason, you should automate the monitoring of your machine learning workflow following best practices.
In the rest of this article, you will learn what to do to automate your machine learning monitoring well. Specifically, we will discuss the following areas:
- What to monitor within your machine learning workflow;
- Using metrics vs. logs in monitoring;
- Best ways to present your monitoring results;
- Monitoring as a software development problem;
- How to choose the proper monitoring tool.
What to Monitor in Machine Learning: Stages, Versions, and Events
Monitoring your machine learning model is more than just getting an update on your model’s latest accuracy.
Your aim with monitoring is not to be notified of low performance after the fact. Your aim is to be alerted to model performance degradation well in advance so you can take action.
Even when you want to directly evaluate your models’ output, you may not have immediate access to ground truths. Instead, you need to use proxy measures to assess your model’s performance.
For these reasons, you need to monitor the internals of your workflow. Because monitoring the entire workflow is impractical and expensive, you need to identify specific checkpoints to watch.
Identify checkpoints along the following dimensions:
- Stages;
- Versions;
- Events.
Mark these checkpoints with categories and tags so you will be able to analyze them meaningfully.
Place checkpoints at the border of the stages you want to monitor.
Your machine learning workflow consists of different stages (e.g., the steps in your pipeline). Identify the main stages of your machine learning workflow and think through which of them you want to get information about.
The relevant stages are specific to your case. Here are the most common categories:
- Upstream data inflow;
- Data processing and feature engineering steps in your machine learning pipeline;
- Modeling steps (e.g., sampling, cross-validation, testing);
- Interactions between your machine learning models (e.g., between training and serving);
- DevOps processes;
- Business functions.
Introduce tags that mark your workflow stage versions.
As your project grows, you generate newer versions of your workflow, and you want to monitor and understand them. Because of the specifics of machine learning, versions do not mean only code but also datasets, models, environments, or even clients.
Mark one-off events to identify their effect.
Things will happen to your machine learning workflow. For example, you may replace a model with a better one, introduce a new stage, or change the way you receive data, or your business division may launch a marketing campaign.
These events can affect your model performance. Mark them in your monitoring framework so you do not confuse their effect with model degradation.
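The three dimensions above (stages, versions, and one-off events) can be sketched as plain tagged records. This is a minimal illustration, not a prescribed schema: the class names `Checkpoint` and `Event`, their fields, and the helper `events_before` are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Checkpoint:
    """One monitoring observation, tagged by stage and versions (hypothetical schema)."""
    stage: str        # e.g. "feature_engineering" or "serving"
    metric: str       # e.g. "null_rate"
    value: float
    versions: dict = field(default_factory=dict)  # code, data, model, environment...
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Event:
    """A one-off event that may explain a change in the metrics."""
    name: str
    timestamp: datetime


def events_before(checkpoint, events):
    """Return the names of events that happened at or before the checkpoint."""
    return [e.name for e in events if e.timestamp <= checkpoint.timestamp]


launch = Event("marketing_campaign_launch", datetime(2024, 1, 10, tzinfo=timezone.utc))
cp = Checkpoint(
    stage="serving",
    metric="mean_prediction",
    value=0.42,
    versions={"model": "v2", "data": "2024-01"},
    timestamp=datetime(2024, 1, 12, tzinfo=timezone.utc),
)
print(events_before(cp, [launch]))
```

Keeping events next to the checkpoint records makes it possible to ask, for any shift in a metric, which known events preceded it before concluding that the model has degraded.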
Machine Learning Model Monitoring Metrics vs. Logs
The two main ways you collect data about your machine learning workflow are logs and metrics.
They both have their strengths and weaknesses, and you should evaluate their use for each checkpoint you want to monitor. Consider the table below for each monitoring checkpoint to decide whether you should generate metrics or collect logs for them.
| |Logs|Metrics|
|---|---|---|
|Context provided|Rich, high number of dimensions|Poor, few dimensions|
|Analytics|Log analysis; requires tooling|Statistics, dashboards|
|Generation and storage cost|Can explode|Low, constant (but issues with high-cardinality metrics)|
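To make the trade-off concrete, here is a rough sketch of the same checkpoint reported both ways, using only the Python standard library. The helper names `record_metric`, `metric_mean`, and `log_event`, and the in-memory dictionaries, are assumptions for illustration; a real setup would send both to a monitoring backend.

```python
import json
import logging
from collections import defaultdict

logger = logging.getLogger("ml_monitoring")

# Metrics: cheap, low-dimensional aggregates keyed by (stage, name).
metric_totals = defaultdict(float)
metric_counts = defaultdict(int)


def record_metric(stage, name, value):
    """Fold one observation into a running aggregate; storage stays constant."""
    key = (stage, name)
    metric_totals[key] += value
    metric_counts[key] += 1


def metric_mean(stage, name):
    """Return the mean of all values recorded for this (stage, name) pair."""
    key = (stage, name)
    return metric_totals[key] / metric_counts[key]


def log_event(stage, message, **context):
    """Emit one rich record per event; storage grows with every call."""
    record = {"stage": stage, "message": message, **context}
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line


record_metric("ingest", "null_rate", 0.1)
record_metric("ingest", "null_rate", 0.3)
log_event("ingest", "row rejected", row_id=17, reason="negative age")
```

The metric keeps a constant-size aggregate no matter how many rows pass through, while each log line carries arbitrary context at the cost of one stored record per event.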
Present Monitoring Results
The two main ways to present your machine learning monitoring framework’s results are to visualize them on dashboards or send them out as part of an alerting system.
Visualizations and Dashboards
You can make good use of dashboards by giving them a meaningful organization. If you have different models, monitor them in separate dashboards, but use a common structure so they stay comparable to each other.
A Deepchecks dashboard summarizing model results
Alerts
Alerts are critical components of automated monitoring, as they are what puts the ‘automated’ into the framework.
You can set up alerts to go out in the following ways:
- At a particular pre-planned event (e.g., when you retrain your model, when you import new data, etc.);
- At fixed intervals (e.g., every 24 hours);
- Whenever an unexpected event happens (e.g., noticing data drift, fall in model performance, etc.).
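As one possible illustration of the third trigger, the sketch below flags data drift with the Population Stability Index (PSI). PSI is only one of many drift measures and is our choice for this example, not something the framework prescribes. The function names, the bin edges, and the 0.2 threshold (a commonly cited rule of thumb) are assumptions, and the code assumes both samples are non-empty.

```python
import math


def psi(reference, current, edges, eps=1e-6):
    """Population Stability Index between two samples over fixed bin edges."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # The bin index is the number of edges that v has reached or passed.
            i = sum(1 for e in edges if v >= e)
            counts[i] += 1
        total = len(values)
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / total, eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def check_drift(reference, current, edges, threshold=0.2):
    """Return the PSI score and whether it exceeds the (assumed) threshold."""
    score = psi(reference, current, edges)
    return {"drifted": score > threshold, "psi": score}


training_scores = [0.1, 0.2, 0.6, 0.7]
serving_scores = [0.6, 0.7, 0.8, 0.9]
print(check_drift(training_scores, serving_scores, edges=[0.5]))
```

A check like this can run on a schedule, with an alert sent only when `drifted` is true.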
For these alerts to be effective, think about which situations will cause your monitoring system to send them out, who will read them, and what the optimal response should be.
An effective alert message contains the following information about the event:
- What has happened?
- Where in the workflow has it happened, and when?
- Why has it happened; what might have caused it?
- Why is this event important; what are its possible effects?
To make it easy for the alert’s recipient to quickly address the problem, provide the necessary context and environment:
- Auto-prepared data;
- Related logs;
- Coding environment for data exploration and debugging.
Create case-specific and personalized alert messages by following these practices. If your system has several alerts to send to the same recipient, prioritize them with a ranking algorithm based on the issues’ importance and the user’s behavior.
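An alert that answers these questions and carries its own context could be modeled as below. This is a sketch under stated assumptions: the `Alert` fields, the integer `severity` score, and the `ack_rate_by_stage` mapping (standing in for "the user's behavior") are hypothetical, and the ranking is a plain sort rather than a learned algorithm.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    what: str       # what has happened
    where: str      # stage in the workflow
    when: str       # timestamp, kept as a string for simplicity
    cause: str      # suspected cause
    impact: str     # why the event matters
    severity: int   # hypothetical importance score; higher is more urgent
    context_links: tuple = ()  # e.g. related logs, prepared data, a notebook

    def render(self):
        """Build the message body, one line per question the alert answers."""
        lines = [
            f"What: {self.what}",
            f"Where/when: {self.where} at {self.when}",
            f"Possible cause: {self.cause}",
            f"Why it matters: {self.impact}",
        ]
        lines += [f"Context: {link}" for link in self.context_links]
        return "\n".join(lines)


def prioritize(alerts, ack_rate_by_stage=None):
    """Order by severity, then by how often the user acted on that stage before."""
    ack = ack_rate_by_stage or {}
    return sorted(
        alerts,
        key=lambda a: (a.severity, ack.get(a.where, 0.0)),
        reverse=True,
    )
```

Sorting on a tuple keeps the rule transparent: importance decides first, and past user behavior only breaks ties.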
Monitoring is Software Development
Your machine learning monitoring framework is a software development project, and you should treat it as one. Be sure to follow software development best practices with it, including version control, documentation, and testing.
Monitoring has performance requirements just like other software projects. Evaluate the resource needs of your metrics and logs and determine when and how frequently you will need to generate them.
For example, recalculating metrics on the whole dataset is probably not a good idea if you work with huge multi-dimensional datasets. In this case, implement metrics and calculations that you can execute on batches.
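As a small example of a batch-friendly metric, the sketch below keeps a running mean and population variance that are updated one batch at a time, so the full dataset never has to be reloaded. The class name `RunningStats` is hypothetical; the update rule is the standard formula for merging the statistics of two groups.

```python
class RunningStats:
    """Mean and population variance that can be updated one batch at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, batch):
        """Merge one batch of numbers into the running statistics."""
        batch = list(batch)
        if not batch:
            return
        b_n = len(batch)
        b_mean = sum(batch) / b_n
        b_m2 = sum((x - b_mean) ** 2 for x in batch)
        delta = b_mean - self.mean
        total = self.n + b_n
        # Combine the existing statistics with those of the new batch.
        self.mean += delta * b_n / total
        self.m2 += b_m2 + delta ** 2 * self.n * b_n / total
        self.n = total

    @property
    def variance(self):
        """Population variance of everything seen so far (0.0 if empty)."""
        return self.m2 / self.n if self.n else 0.0


stats = RunningStats()
for batch in ([1.0, 2.0], [3.0, 4.0]):
    stats.update(batch)
print(stats.mean, stats.variance)
```

Only three numbers are stored between batches, so the cost of the metric does not grow with the size of the dataset.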
As your modeling workflow grows, so will your monitoring framework. Think about how you will extend it as you introduce additional modeling features.
Build Models for Monitoring
How you design your machine learning workflow influences how well you can monitor it. If it is hard to generate metrics about specific modeling steps, you will have difficulty building a monitoring framework on top of it.
For this reason, build your machine learning workflow with monitoring in mind:
- Modularity: Maintain a modular structure with clear boundaries between steps;
- Version control: Track versions of your data, configuration, and code;
- Observability: Make your code observable, and avoid opaque and black-box models;
- Reproducibility: Make it possible to reproduce results with given parameters.
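One way to apply the modularity, version control, and observability principles above is to wrap each pipeline step so that it reports at its own boundary. The decorator `monitored_step`, the `step_records` list, and the example step `drop_missing` are hypothetical names for this sketch, which assumes each step takes and returns a sized collection of rows.

```python
import functools
import time

step_records = []  # in-memory sink; a real setup would ship these elsewhere


def monitored_step(name, version="unversioned"):
    """Wrap a pipeline step so each call leaves a record at its boundary."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(rows, *args, **kwargs):
            start = time.perf_counter()
            result = func(rows, *args, **kwargs)
            step_records.append({
                "step": name,
                "version": version,
                "rows_in": len(rows),
                "rows_out": len(result),
                "seconds": time.perf_counter() - start,
            })
            return result
        return wrapper
    return decorator


@monitored_step("drop_missing", version="1.0")
def drop_missing(rows):
    """Example step: remove rows that are None."""
    return [r for r in rows if r is not None]


clean = drop_missing([1, None, 3])
print(step_records[-1]["rows_in"], step_records[-1]["rows_out"])
```

Because the monitoring logic lives in the wrapper, the steps themselves stay small and testable, and adding a checkpoint to a new step is a one-line change.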
Select the Right Monitoring Tools
To automate monitoring, you need to introduce monitoring agents into your workflow.
When you implement monitoring agents, you can choose the monitoring tools of the platforms you use or external monitoring services.
Platform-specific Monitoring Tools
If you want to implement only a few simple monitoring agents, you can use the monitoring services of the machine learning platforms you use. They can be useful to investigate the most common issues related to the platform’s models. Unfortunately, they are platform-dependent, and you cannot set them to monitor business logic.
External Monitoring Services
If you have a complex workflow, use multiple modeling frameworks, want to include business logic, and want to see more than just a simple alert, you might use external monitoring services.
One benefit of these services is platform independence. You can use them to monitor multiple machine learning frameworks, domain-specific issues, and their integrations with your analytics and DevOps processes.
Even if you are currently using a single framework, you can future-proof your monitoring with an external tool that lets you compare current results with future ones generated with a different machine learning framework.
Automate Your Machine Learning Model Monitoring Framework
A good automated machine learning monitoring framework like Deepchecks alerts you when an issue is coming up and provides you with the relevant context to take action.
With Deepchecks, you can include monitoring checkpoints into your machine learning workflow, present model metrics on dashboards, and customize alerts. We built our machine learning monitoring framework with software development and machine learning principles in mind so you can rely on it even if your project involves multiple machine learning platforms.