
MLOps: Best Practices

This blog post was written by Preet Sanghavi as part of the Deepchecks Community Blog. If you would like to contribute your own blog post, feel free to reach out to us via blog@deepchecks.com. We typically pay a symbolic fee for content that's accepted by our reviewers.

Introduction

Data is the new oil. Refined data opens the door to a wide range of applications. With the advent of Machine Learning and Artificial Intelligence, this “oil” can be leveraged to our advantage to make highly accurate predictions, recommendations, and classifications. The operations involved in building, deploying, and maintaining a model are generally referred to as MLOps. Let us explore some of the best practices associated with it.

Let us begin by defining Machine Learning Operations (MLOps). MLOps refers to the practice of developing efficient and usable models, deploying them, reviewing their inferences, and monitoring them in the production environment.

As we know, the three most important steps of a Machine Learning pipeline are:

  1. Model Development
  2. Model Deployment
  3. Model Monitoring

Machine Learning engineers are generally well versed in model development, but it is deployment and monitoring that pose the major issues. MLOps ensures that the entire pipeline flows smoothly and that our model stays relevant and sustainable for long periods of time, so it is important to explore the best practices that need to be followed with respect to MLOps.

MLOps BEST PRACTICES


Practice 1: TYPE OF DEPLOYMENT

It is important to determine the type of deployment before starting development, because some packages will work only with specific frameworks. Other concerns such as the business use case, organizational scaling, and availability of resources also need to be considered while deploying a model. For example, a real-time face detection model may not work optimally when deployed on a serverless platform, where long boot-up times can cause high latency. One therefore needs to decide whether to integrate the ML model into the software or to use it separately as a service.

Deployment of a model can be categorized into one of the following:

  1. Live predictions using an application programming interface (API) to serve the results.
    Here, we can use a framework like Flask to wrap our model and serve it to the user (see the sketch after this list).
    Example: serving real-time predictions for sports scores.
  2. Batch-wise production mode.
    In this type of deployment, we provide the user with an offline model so that there is no need for a server. This type of deployment is generally preferred for serving large numbers of requests and executing intricate models. We can simply schedule the training or testing using Prefect in Python (a Prefect sketch appears a little further below).
    Example: serving weekly or bi-weekly product sales predictions using Prefect.
  3. Embedded model on a website or a mobile device.
    This type of deployment helps to significantly reduce latency and data consumption. Edge devices such as mobile phones and IoT devices have limited computational power and storage capacity due to the nature of their hardware, so models usually have to be compressed first.
    Example: quantizing an intricate classification model and deploying it on an edge device such as a mobile phone to understand the purchasing trends of a user.
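
To illustrate the first option, here is a minimal sketch that wraps a pickled model behind a Flask prediction endpoint. The file name model.pkl, the /predict route, and the feature payload are placeholders rather than parts of any particular project.

# Minimal sketch of live serving: a pre-trained model behind an HTTP API.
# "model.pkl", the route, and the feature layout are placeholders.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()  # e.g. {"features": [5.1, 3.5, 1.4, 0.2]}
    prediction = model.predict([payload["features"]])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Example Code for Serving Live Predictions in Python using Flask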

Once you’ve determined the type of deployment, it is important to note that MLOps follows the continuous integration and continuous delivery (CI/CD) principle, similar to DevOps. The difference is that in an ML pipeline, the data is revamped along with the models. MLOps can also help deploy a multi-step pipeline that retrains the model whenever new data is fetched.
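
Below is a rough sketch of such a multi-step retraining pipeline using Prefect, tying together the batch scheduling mentioned above. The task bodies, the tiny in-line dataset, and the scikit-learn model are placeholders used only to keep the example self-contained.

# Sketch of a multi-step retraining pipeline with Prefect.
# The task bodies and the toy dataset are placeholders; replace them
# with your own data-loading and training logic.
from prefect import flow, task
from sklearn.linear_model import LogisticRegression

@task
def fetch_data():
    # In practice: pull the latest data from a warehouse or feature store.
    X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
    y = [0, 0, 1, 1]
    return X, y

@task
def train_model(X, y):
    return LogisticRegression().fit(X, y)

@task
def evaluate(model, X, y):
    return model.score(X, y)

@flow
def retraining_pipeline():
    X, y = fetch_data()
    model = train_model(X, y)
    print(f"Retrained model accuracy: {evaluate(model, X, y):.2f}")

if __name__ == "__main__":
    retraining_pipeline()  # in production, this flow would run on a schedule

Example Code for a Retraining Pipeline in Python using Prefect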

Practice 2: MONITORING

Having deployed the model, it is critical to monitor its performance to see whether it works as intended. There is a plethora of things that can go wrong once a Machine Learning model is in production. Monitoring helps us detect changes in the data or in the relationship between the input and the target variable. Deepchecks has created a comprehensive list of checks, along with related tests, that help us understand the different things that can go wrong and why monitoring is necessary after deployment.
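
As an example of such a check, the sketch below runs the open-source deepchecks train_test_validation suite to compare a training set against newer production data. The toy DataFrames and the "label" column are placeholders.

# Sketch: comparing training data with new production data using the
# open-source deepchecks library. The toy DataFrames are placeholders.
import pandas as pd
from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import train_test_validation

train_df = pd.DataFrame({"age": [25, 32, 47, 51], "income": [40, 55, 80, 90], "label": [0, 1, 1, 0]})
prod_df = pd.DataFrame({"age": [23, 35, 44, 58], "income": [38, 60, 75, 95], "label": [0, 1, 0, 1]})

train_ds = Dataset(train_df, label="label")
prod_ds = Dataset(prod_df, label="label")

suite = train_test_validation()
result = suite.run(train_dataset=train_ds, test_dataset=prod_ds)
result.save_as_html("validation_report.html")  # inspect which checks passed or failed

Example Code for Data Validation in Python using Deepchecks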

Practice 3: REPRODUCIBILITY & TRACKING

Choosing an ML model can be tricky. You would need to test multiple models before finalizing a particular one and keep track of their results. It helps to track these models by using a different Git branch for every model, which makes comparison and selection easy when choosing the appropriate model for production. It is also important to have reproducible models that can be reinstated or replicated given the same inputs. We can increase reproducibility by tracking data lineage, tracking file versions, and saving the metadata of the models. Tools such as Neptune, Weights & Biases, Comet, and MLflow are widely used to track model performance and visualize results on an interactive dashboard.
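
A small sketch of experiment tracking with MLflow is shown below; the run name, model choice, and logged metric are illustrative placeholders.

# Sketch: logging parameters, metrics, and the trained model with MLflow
# so that each experiment run can be compared and reproduced later.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

with mlflow.start_run(run_name="rf-baseline"):
    n_estimators = 100
    model = RandomForestClassifier(n_estimators=n_estimators).fit(X, y)

    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")  # saves the model artifact for later reinstatement

Example Code for Experiment Tracking in Python using MLflow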

Practice 4: DATA GOVERNANCE & FEATURE STORES

Managing data is another challenging problem when operating Machine Learning models. Data works as a critical asset in the industry, from which inferences can be harnessed based on trends and relations. Data privacy is extremely important in the healthcare and financial domains, given the tighter regulations with respect to data control. One way to manage such data is by using a feature store. Feature stores help govern and manage data by making it readily accessible and verifiable against regulatory rules. They improve collaboration between team members working with the same data, enable multi-source data consumption, and help avoid data duplication. Databases commonly used as online feature stores include Cassandra, Redis, and MySQL Cluster.
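
As a toy sketch of the online-store idea, the snippet below keeps a user's latest features in a Redis hash so the serving layer can fetch them with low latency at prediction time. The key layout and feature names are assumptions, not a full feature-store implementation.

# Sketch: using Redis as a simple online feature store.
# Each user's latest features live in a hash keyed by user id; the key
# layout and feature names below are illustrative placeholders.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Write (or refresh) features for a user, e.g. from a batch pipeline.
r.hset("features:user:42", mapping={"avg_basket_value": 37.5, "orders_last_30d": 4})

# Read the features back at serving time.
features = r.hgetall("features:user:42")
print(features)  # {'avg_basket_value': '37.5', 'orders_last_30d': '4'}

Example Code for an Online Feature Store in Python using Redis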

Practice 5: ROBUSTNESS / LOAD TESTING

“But it worked fine on my device!” is a common thing to say when your ML model fails to deliver in a production environment. Most goal-oriented models are developed to cater to users via platforms such as Netflix’s recommender system. It is important to simulate the behavior of these models under high loads similar to those of a production environment. This is called load testing, and it accounts for software disruption due to extremely high traffic. Tools like NeoLoad or Locust are widely used for this purpose.

from locust import HttpUser, task, between

class QuickstartUser(HttpUser):
    # Each simulated user waits 1-2 seconds between tasks
    wait_time = between(1, 2)

    def on_start(self):
        # Runs once per simulated user before its tasks: log in first
        self.client.post("/login", json={"username":"foo", "password":"bar"})

    @task
    def hello_world(self):
        self.client.get("/hello")
        self.client.get("/world")

    @task(3)
    def view_item(self):
        # Weighted 3x: picked three times as often as hello_world
        for item_id in range(10):
            self.client.get(f"/item?id={item_id}", name="/item")

Example Code for Load Testing in Python using Locust

Summary

It is evident that MLOps plays a crucial role when developing and deploying a Machine Learning model, ensuring the model's longevity and sustainability over time. Implementing the aforementioned best practices helps us fully leverage the benefits of MLOps, such as reproducibility, ease of deployment, and resource and lifecycle management.

To explore Deepchecks’ open-source library, go try it out yourself! Don’t forget to ⭐ their GitHub repo; it’s really a big deal for open-source-led companies like Deepchecks.

