
MLOps Best Practices

This blog post was written by Preet Sanghavi as part of the Deepchecks Community Blog. If you would like to contribute your own blog post, feel free to reach out to us via blog@deepchecks.com. We typically pay a symbolic fee for content that’s accepted by our reviewers.

“Data is the new oil.” Once refined, data has wide-ranging applications. With the advent of Machine Learning and Artificial Intelligence, this ‘oil’ can be leveraged to make highly accurate predictions, recommendations, and classifications. The many operations involved in building, deploying, and maintaining a model are collectively referred to as MLOps. Let us explore some of the best practices associated with MLOps.

Before we begin, let us clarify the term Machine Learning Operations, or MLOps. It refers to the practice of developing efficient and usable models, deploying them, reviewing their inference, and monitoring them in the production environment.

As we know, the three most important steps of a machine learning pipeline are:

  1. Model Development.
  2. Model Deployment.
  3. Model Monitoring.



Machine Learning Engineers are generally well versed in model development. However, deployment and monitoring are where they usually run into trouble. MLOps ensures that the entire pipeline flows smoothly and that our model stays relevant and sustainable for long periods of time. Thus, it is important to explore the best practices to follow with respect to MLOps.




Practice 1: DEPLOYMENT

It is important to decide on the type of deployment before starting model development, because some packages work only with specific frameworks. Moreover, other concerns such as the business use case, organizational scaling, and availability of resources also need to be considered when deploying a model. For example, a real-time face detection model may not work optimally when deployed on a serverless platform, where long boot-up times can cause high latency. Thus, one needs to know whether to integrate an ML model into the software or to use it separately as a service.

Deployment of a model can be categorized into one of the following:

  1. Live predictions using an application programming interface (API) to serve the results.
    Here, we can use a framework like Flask to wrap our model and serve it to the user.
    For example, serving real-time predictions of sports scores.
  2. Batch-wise production mode.
    In this type of deployment, we provide the user with an offline model, so there is no need for a server. This type of deployment is generally preferred for serving large numbers of requests and executing intricate models. Moreover, we can simply schedule the training or testing using Prefect in Python.
    For example, serving weekly or bi-weekly product sales prediction using Prefect.
  3. Embedded model on a website or a mobile device.
    This type of deployment can help us significantly reduce latency and data consumption. Edge devices such as mobile and IoT devices have limited computation power and storage capacity due to the nature of their hardware.
    For example, quantizing an intricate classification model and deploying it on an edge device such as a mobile phone to understand a user’s purchasing trends.
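As a sketch of the first option above, a trained model can be wrapped in a minimal Flask service. The endpoint name, the request format, and the stand-in model below are illustrative assumptions; a real service would load a previously trained model instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

class ThresholdModel:
    """Stand-in for a trained model; a real app would unpickle one here."""
    def predict(self, rows):
        # Toy rule: classify as 1 when the feature sum exceeds 10.
        return [int(sum(row) > 10.0) for row in rows]

model = ThresholdModel()

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [4.0, 7.0]}.
    features = request.get_json()["features"]
    return jsonify({"prediction": model.predict([features])[0]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Running this exposes a `/predict` endpoint that returns live predictions, which is the pattern the first deployment option describes.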

Having chosen the type of deployment, it is important to note that MLOps follows the continuous integration and continuous delivery (CI/CD) principle, similar to DevOps. The difference is that in an ML pipeline, the data is revamped along with the models. Moreover, MLOps can also help us deploy a multi-step pipeline that retrains the model when new data is fetched.
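The multi-step retrain-on-new-data idea can be sketched as plain Python stages. The stage names, the toy “model” (a single scale factor), and the acceptance threshold are illustrative assumptions; in practice a scheduler such as Prefect would orchestrate these steps.

```python
def fetch_data():
    # In practice: pull fresh labeled rows from a store or API.
    return [(1.0, 1.2), (2.0, 2.1), (3.0, 2.9)]

def train(rows):
    # Toy model: predict y as x times a learned scale factor.
    scale = sum(y for _, y in rows) / sum(x for x, _ in rows)
    return scale

def evaluate(model, rows):
    # Mean absolute error of the toy model on the rows.
    return sum(abs(x * model - y) for x, y in rows) / len(rows)

def run_pipeline(max_error=0.5):
    rows = fetch_data()
    model = train(rows)
    error = evaluate(model, rows)
    # Promote the retrained model only if it is good enough.
    return {"model": model, "error": error, "deployed": error <= max_error}
```

Each stage depends only on the previous one, which is what lets a CI/CD system rerun the whole chain whenever new data arrives.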

Practice 2: MONITORING

Having deployed the model, it is critical to monitor its performance to see whether it works as intended. There is a plethora of things that can go wrong once a Machine Learning model is deployed. Monitoring helps us find any change in the data or in the relationship between the input and the target variable. Deepchecks has created a comprehensive list of checks, with related tests, that helps us understand the different things that can go wrong and why monitoring is necessary after deployment.
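One simple monitoring check is to compare the distribution of a live feature against its training distribution, for example with the population stability index (PSI). The bin count and the 0.2 alert threshold below are common conventions rather than fixed rules, and the sample data is synthetic.

```python
import math

def psi(expected, actual, bins=10):
    """Population stability index between two samples of one feature."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def frac(sample, i):
        count = sum(1 for v in sample if lo + i * width <= v < lo + (i + 1) * width)
        if i == bins - 1:
            # Include the top edge in the last bin.
            count += sum(1 for v in sample if v == hi)
        # Floor at a tiny fraction to avoid log(0).
        return max(count / len(sample), 1e-6)

    return sum(
        (frac(actual, i) - frac(expected, i)) * math.log(frac(actual, i) / frac(expected, i))
        for i in range(bins)
    )

train_scores = [0.1 * i for i in range(100)]       # training distribution
live_scores = [0.1 * i + 3.0 for i in range(100)]  # shifted live distribution
drifted = psi(train_scores, live_scores) > 0.2     # common alert threshold
```

A PSI near zero means the live data still looks like the training data; a large value, as with the shifted sample here, is the kind of signal a monitoring system would raise an alert on.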


Practice 3: TRACKING

Choosing an ML model can be tricky. One needs to test multiple models before finalizing a particular one, and to keep track of their results, possibly by using a different Git branch for every model. This makes comparison and selection easy when choosing the appropriate model for production. It is also important to have reproducible models that can be reinstated or replicated given the same inputs. We can increase reproducibility by tracking data lineage, tracking file versions, and saving the models’ metadata. Tools such as Neptune, Weights & Biases, Comet, and MLflow are widely used to track model performance and visualize results on an interactive dashboard.
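The core idea of tracking, recording each run’s parameters, metrics, and versions, then picking the best run, can be illustrated without any particular tool. A real setup would delegate this to Neptune, MLflow, etc.; the run names, metric, and versions below are made up for illustration.

```python
# Minimal sketch of experiment tracking: record each candidate run,
# then pick the best one for production.
runs = []

def log_run(params, metrics, git_branch, data_version):
    runs.append({
        "params": params,
        "metrics": metrics,
        "git_branch": git_branch,      # one branch per candidate model
        "data_version": data_version,  # recorded for reproducibility
    })

log_run({"model": "logreg", "C": 1.0}, {"f1": 0.81}, "model/logreg", "v3")
log_run({"model": "xgboost", "depth": 6}, {"f1": 0.87}, "model/xgb", "v3")
log_run({"model": "rf", "trees": 200}, {"f1": 0.84}, "model/rf", "v3")

best = max(runs, key=lambda r: r["metrics"]["f1"])
```

Because every run stores its branch and data version alongside its score, the winning model can be reinstated later from exactly the code and data that produced it.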


Practice 4: DATA MANAGEMENT

Managing data is another challenging problem when operating machine learning models. Data is a critical asset in industry, one from which inferences can be drawn based on trends and relations. Moreover, in the healthcare and finance domains, data privacy is extremely important given the tight regulations around data control. One way to manage such data is with a feature store. Feature stores help govern and manage data by making it readily accessible and verifiable against regulatory rules. They improve collaboration between team members working with the same data, support multi-source data consumption, and help avoid data duplication. Some of the databases used for online feature stores include Cassandra, Redis, and MySQL Cluster.
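Conceptually, a feature store maps an entity key to a named set of feature values that every team reads from the same place. The in-memory sketch below only illustrates that interface; real deployments back it with a database such as Redis or Cassandra, and the entity and feature names here are invented.

```python
class FeatureStore:
    """Toy in-memory feature store keyed by entity id."""

    def __init__(self):
        self._table = {}

    def put(self, entity_id, features):
        # All teams write through one interface, avoiding duplication.
        self._table.setdefault(entity_id, {}).update(features)

    def get(self, entity_id, feature_names):
        # Serve a consistent feature vector for training or inference.
        row = self._table.get(entity_id, {})
        return {name: row.get(name) for name in feature_names}

store = FeatureStore()
store.put("user_42", {"avg_basket_value": 31.5, "visits_last_30d": 7})
vector = store.get("user_42", ["avg_basket_value", "visits_last_30d"])
```

Because both training pipelines and live services call the same `get`, the features seen at training time match the ones served in production.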




Practice 5: LOAD TESTING

“But it worked fine on my device.” This is a very common refrain after an ML model fails to deliver in a production environment. Most goal-oriented models are developed to serve users via platforms such as Netflix’s recommender system. It is important to simulate the behavior of these models under high loads similar to those of a production environment. This is called load testing, and it accounts for software disruption under extremely high traffic. Tools like NeoLoad and Locust are widely used for this purpose.

from locust import HttpUser, task, between

class QuickstartUser(HttpUser):
    # Each simulated user waits 1-2 seconds between tasks.
    wait_time = between(1, 2)

    def on_start(self):
        # Runs once per simulated user before its tasks start.
        self.client.post("/login", json={"username": "foo", "password": "bar"})

    @task
    def hello_world(self):
        self.client.get("/hello")
        self.client.get("/world")

    @task(3)
    def view_item(self):
        # Weight 3: executed three times as often as hello_world.
        for item_id in range(10):
            self.client.get(f"/item?id={item_id}", name="/item")

Example code for load testing in Python using Locust


It is evident that MLOps plays a crucial role in developing and deploying a machine learning model. It ensures the model’s longevity and sustainability over a long period of time. Implementing the aforementioned best practices helps us fully leverage the benefits of MLOps, such as reproducibility, ease of deployment, and resource and lifecycle management.

To explore Deepchecks’ open-source library, go try it out yourself! And don’t forget to star their GitHub repo; it’s really a big deal for open-source-led companies like Deepchecks.
