
MLOps End-to-End Solution With Open-Source Tools

This blog post was written by Tonye Harry as part of the Deepchecks Community Blog. If you would like to contribute your own blog post, feel free to reach out to us via blog@deepchecks.com. We typically pay a symbolic fee for content that's accepted by our reviewers.


As the adoption of machine learning and artificial intelligence continues to spread across a wide range of software products and services, so do the best practices and tools that facilitate the testing, deployment, management, and monitoring of ML models.

MLOps provides these best practices, tools, and frameworks for businesses:

  • To encourage project sustainability by automating ML processes and auditing models, documenting the steps taken to build and deploy models for easier collaboration and explainability;
  • To increase the productivity of practitioners by providing them with the right infrastructures or environment to work and collaborate on projects, reducing time spent on tasks like manually sourcing data, code, or training models; and
  • To increase reliability that allows teams to effectively create and work with KPIs and policies that guide the quality of every process in the life cycle of the ML product.

All of this is useful for scaling ML use cases such as Natural Language Processing (NLP), computer vision, time series forecasting, anomaly detection, and predictive maintenance, with optimal product performance, well-managed resources, and satisfied users.

Fig. 1 Robotic device – Source

This article discusses an end-to-end MLOps architecture along with open source tools that can assist in accelerating each stage of your machine learning solution. By utilizing a platform-agnostic approach to discuss MLOps architecture, this article can serve as a guide for picking open source tools that can be employed in building an end-to-end MLOps solution.

Understanding MLOps

As far as the definitions of MLOps go, Google's definition is spot on:

“MLOps is an ML engineering culture and practice that aims at unifying ML system development (Dev) and ML system operations (Ops).” 

Compared to DevOps or DataOps, MLOps is not about a set of specific tools or technologies.

“Some key best practices are having a reproducible pipeline for data preparation and training, having a centralized experiment tracking system with well-defined metrics, and implementing a model management solution that makes it easy to compare alternative models across various metrics and roll back to an old model if there is a problem in production.”

Matei Zaharia, Chief Technologist at Databricks

It is crucial for data science and DevOps professionals, along with software development and operations teams, to collaborate continuously and communicate throughout the process. More often than not, collaboration is not limited to these teams only. Depending on the type of business, Subject Matter Experts (SMEs) might be involved to ask business questions and make sure the model’s performance meets their business goals.

The MLOps methodology allows machine learning engineers and data scientists to focus on the core development of the model rather than spending time on tasks such as preprocessing data, configuring environments, or monitoring models. This can be done with the help of commercial or open source MLOps tools. The choice of tool is influenced in most part by available resources (mostly financial) and the stability of the tool.

Why Choose Open-source Tools

Fig. 2 Open Source – Source

The idea that companies turn to open source solutions only to cut costs is a popular misconception. There are many reasons to choose open source software for your overall business needs. Open source software is stable and highly performant because developers continuously monitor it, close security gaps, and fix bugs. Moreover, most successful open source tools have a vibrant community of dedicated users and developers that provides built-in support, codebase longevity, and a continuous stream of new features.

Besides supporting a wide variety of environments, architectures, and use cases, open source tools can be integrated with new technology quite easily as organizations see fit. It is also worth noting that with open source, you are not limited to a specific vendor. That said, relatively new open source tools may lack adequate support and documentation, and may contain security vulnerabilities that haven't yet been detected.

On the other hand, proprietary tools give organizations a higher level of certainty about a tool's ability to perform, with frequent updates and ample support even in the early stages of the product. Proprietary software is typically exhaustively tested and has fewer security issues than open source tools. From a business perspective, however, being locked into a single vendor makes switching expensive, and if the software is discontinued without immediate alternatives, the situation can become costly.


Open-source Tools for Efficient MLOps Solutions

As part of the development of an end-to-end MLOps platform, it's always a good idea to start by designing a framework or architecture based on your requirements. Input from all relevant teams should be obtained when putting together this framework, as MLOps solutions hinge heavily on effective cross-team communication to succeed.

The architecture has two phases, Continuous Integration (CI) and Continuous Delivery (CD), with different but related processes taking place in each. Teams use a CI/CD pipeline to guarantee the continuous development and delivery of models.

Below is an example of an end-to-end architecture:

Fig. 3: An MLOps architecture showing its different stages and parts. Source: Author

By implementing a CI/CD pipeline, ML teams can build robust, reliable models more rapidly and efficiently. A key goal of the continuous integration phase is to ensure the seamless integration and synchronization of code changes made by multiple contributors. The ML pipelines should be rerun whenever there are changes to the ML system. In general, ML pipelines consist of data manipulation, training, testing, model deployment, and the generation of reports and artifacts. It is worth noting that testing is crucial in the ML lifecycle: it detects bugs and issues in data and models, giving ML teams standardized feedback loops that let them update models quickly and recover performance fast.

The CI/CD stages described in figure 3 are investigated below with recommended open source tools that can help you at each stage of the pipeline.

Source Code

The CI pipeline starts with the source, such as a model code repository.

The ML pipeline is automatically triggered when a change is pushed to the code repository. Code can be stored on GitHub, one of the most popular version control platforms. With GitHub, you can use branches and pull requests to control what code gets merged where, and by whom.

Feature Store

A feature store is a relatively new concept and an optional part of the architecture, but bigger teams training a large number of models should consider using one. A feature store lets you keep track of the features used to train models. Every model has to access the data and apply some transformation to turn it into features, which the model then uses for training. These features can be stored, versioned, and organized in a feature store so that models using the same kinds of features can fetch them directly from the store. This permits the reuse of features across different projects and models.

Feast offers functionality for storing and managing features for machine learning models. It supports online and offline features, for real-time model inference as well as for model training and batch scoring. To get into more detail about how to create a feature store with Feast, Kedion has written a Medium article exploring the topic further.
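
To make the idea concrete, here is a minimal in-memory sketch of what a feature store does. This is plain illustrative Python, not the Feast API:

```python
# Illustrative sketch of a feature store: features are stored once,
# keyed by entity and versioned, so multiple models can reuse them.

class FeatureStore:
    def __init__(self):
        # {feature_name: {version: {entity_id: value}}}
        self._features = {}

    def put(self, name, version, values):
        self._features.setdefault(name, {})[version] = dict(values)

    def get(self, name, entity_ids, version=None):
        versions = self._features[name]
        if version is None:
            version = max(versions)  # default to the latest version
        table = versions[version]
        return [table[e] for e in entity_ids]

store = FeatureStore()
store.put("avg_order_value", version=1, values={"user_1": 42.0, "user_2": 13.5})
store.put("avg_order_value", version=2, values={"user_1": 45.0, "user_2": 12.0})

# Two different models can fetch the same, consistently versioned features.
print(store.get("avg_order_value", ["user_1"]))             # latest version
print(store.get("avg_order_value", ["user_2"], version=1))  # pinned version
```

A real feature store like Feast adds persistence, online/offline serving, and point-in-time correctness on top of this basic contract: write a feature once, read it anywhere, pinned to a version.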

ML Pipeline

Kubeflow is a very popular open source tool for creating ML pipelines (i.e., workflows for building, training and deploying ML models). This open source toolkit facilitates the scaling of ML models because it runs on Kubernetes. It handles all the container orchestration and management, allowing data scientists to focus on creating their machine learning workflows. As illustrated in the architecture above, the ML workflow involves defining steps for data processing and manipulation, model training, and validation.

By using the Kubeflow Automated pipeLines Engine (KALE) together with GitOps via GitHub Actions, it is possible to run everything from Git repositories like GitHub without switching contexts. Learn more about this in an article by Weaveworks. If you'd prefer not to use a separate CI tool, you can stick with Argo, which Kubeflow uses under the hood to run your workflows automatically. Note that Kubeflow pipelines are written in a domain-specific language (DSL) provided by the KFP SDK, which is fairly easy to learn.
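
The flavor of a pipeline DSL can be sketched in plain Python with a decorator that registers steps. This mimics only the general shape of decorated pipeline code; it is not the actual KFP SDK, which compiles steps into containerized Kubernetes tasks:

```python
# Illustrative mini-DSL: decorated functions become pipeline steps that
# are registered and then executed in order, sharing a state dict.

STEPS = []

def component(fn):
    # Register a function as a pipeline step.
    STEPS.append(fn)
    return fn

@component
def load_data(state):
    state["data"] = [1.0, 2.0, 3.0, 4.0]

@component
def train_model(state):
    state["model_mean"] = sum(state["data"]) / len(state["data"])

@component
def validate(state):
    state["valid"] = 0.0 < state["model_mean"] < 10.0

def run_pipeline():
    state = {}
    for step in STEPS:
        step(state)
    return state

state = run_pipeline()
print(state["model_mean"], state["valid"])
```

The real DSL does much more (each component runs in its own container, artifacts are passed explicitly, and the graph can branch), but the authoring experience — annotate ordinary Python functions and let the engine orchestrate them — is the same.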

Kubeflow stands on three pillars: scalability, portability, and composability.

It can increase or decrease resources according to project needs, and your project can run on different types of infrastructure. Composability means that your project keeps working even when it is broken down into several independent components. In addition, Kubeflow can run pipelines simultaneously, generating several models at once.

The model validation testing stage can be done with Deepchecks, an open source tool that uses test suites to validate ML models and datasets. It is a flexible option, as it integrates very well with Hugging Face, Databricks, and Apache Airflow, among other tools. Once our model has been validated, we are ready for deployment and can move into the CD phase.
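
Conceptually, a validation suite is just a collection of checks run against a dataset and model, with the results aggregated into a pass/fail report. Here is a minimal pure-Python sketch of that idea (the checks and thresholds are made up for illustration; this is not the Deepchecks API):

```python
# Illustrative validation suite: each check inspects the data or model
# and returns pass/fail; the suite aggregates the results.

def check_no_missing_labels(data):
    return all(y is not None for _, y in data)

def check_feature_range(data, low=0, high=100):
    return all(low <= x <= high for x, _ in data)

def check_min_accuracy(model, data, threshold=0.8):
    correct = sum(1 for x, y in data if model(x) == y)
    return correct / len(data) >= threshold

def run_suite(model, data):
    results = {
        "no_missing_labels": check_no_missing_labels(data),
        "feature_range": check_feature_range(data),
        "min_accuracy": check_min_accuracy(model, data),
    }
    results["passed"] = all(results.values())
    return results

# Toy model: classifies a feature as 1 if it exceeds 50, else 0.
model = lambda x: 1 if x > 50 else 0
data = [(10, 0), (60, 1), (80, 1), (30, 0), (55, 0)]

results = run_suite(model, data)
print(results["passed"])
```

A tool like Deepchecks ships with a large library of such checks (data integrity, train/test distribution, model performance) so teams don't have to write and maintain them by hand.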

Continuous Deployment ensures that new changes, releases, or models are deployed and efficiently brought to users. Doing so relieves the team of the burden of maintaining the release, accelerates the deployment of software to customers, and enables continuous learning through customer feedback loops.

Let’s take a closer look at our CD environment.

Model Registry

Fig. 4 – MLflow and Kubeflow: source

It’s difficult to keep track of how a model was created, where packaged models are located, and what was done to make them work. A trustworthy model registry that includes all models’ metadata, such as hyperparameters, metrics, code and dataset versions, and evaluation predictions, can be quite useful.

A model registry offers a common location for managing various model artifacts, tracking, and governing models throughout the ML lifecycle. It also provides teams with a way to collaborate at different stages of the life cycle, as it contains several environments, ranging from staging to production to archived.

MLflow is a great fit for this case, as it seamlessly integrates with Kubeflow. It is an easy-to-use open source platform with a centralized model store that provides APIs for features like model versioning, stage transitions (e.g., from staging to production), annotations, and model lineage. Kubeflow does experiment tracking as well, but not at the same level of detail as MLflow. MLflow’s APIs can be called directly in the pipeline code, and all the required model metadata is stored on the MLflow server. All these artifacts can then be viewed via the MLflow user interface.
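
The core registry idea — versioned entries with metadata and a lifecycle stage per model — can be sketched in a few lines of plain Python. The class, model name, and metrics below are illustrative; MLflow's actual API and storage differ:

```python
# Illustrative model registry: each registered model accumulates
# versioned entries carrying metadata and a lifecycle stage.

class ModelRegistry:
    STAGES = ("staging", "production", "archived")

    def __init__(self):
        self._models = {}  # {name: [version entries]}

    def register(self, name, artifact_uri, metadata):
        versions = self._models.setdefault(name, [])
        entry = {
            "version": len(versions) + 1,
            "artifact_uri": artifact_uri,
            "metadata": metadata,  # hyperparameters, metrics, data version...
            "stage": "staging",    # new versions start in staging
        }
        versions.append(entry)
        return entry["version"]

    def transition(self, name, version, stage):
        assert stage in self.STAGES
        self._models[name][version - 1]["stage"] = stage

    def latest(self, name, stage):
        matches = [v for v in self._models[name] if v["stage"] == stage]
        return matches[-1] if matches else None

registry = ModelRegistry()
v1 = registry.register("churn-model", "s3://models/churn/1", {"auc": 0.81})
v2 = registry.register("churn-model", "s3://models/churn/2", {"auc": 0.86})
registry.transition("churn-model", v2, "production")

prod = registry.latest("churn-model", "production")
print(prod["version"], prod["metadata"]["auc"])
```

The value of the pattern is that deployment code never hardcodes an artifact path; it simply asks the registry for the latest production version of a named model.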

Model Serving

At this point, the model should be stored in the production environment of a model registry (the MLflow server) and ready to be deployed and served. It quickly becomes apparent that the work is not yet done once the machine learning model is deployed to production. At this point, important questions about how to ship the model, its accessibility to end users, system metrics like latency and uptime, and model scalability should be considered. Teams can try building their own model deployment platform on Amazon Web Services offerings like EKS or Lambda to solve all of these problems, but they can save a ton of time and effort with Cortex.

Cortex is an open source tool that lets you deploy all types of models, allowing your APIs to automatically scale to handle production workloads. It supports multiple frameworks, including TensorFlow, PyTorch, and scikit-learn. Cortex can run inference on both CPU and GPU infrastructure and, notably, can update APIs post-deployment without any downtime.
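
Serving tools like Cortex generally have you implement a predictor: an object that loads the model once at startup and then handles each request. Here is a sketch of that shape in plain Python, with a stand-in model and an assumed config format; it is simplified for illustration rather than copied from Cortex's interface:

```python
# Illustrative serving predictor: the model is loaded once in __init__
# and reused for every request handled by predict().

class PythonPredictor:
    def __init__(self, config):
        # A real deployment would download and deserialize the model
        # artifact referenced by `config`; we use a stand-in model here.
        self.threshold = config.get("threshold", 0.5)
        self.model = lambda features: sum(features) / len(features)

    def predict(self, payload):
        # Called once per request with the deserialized request body.
        score = self.model(payload["features"])
        return {"score": score, "positive": score >= self.threshold}

predictor = PythonPredictor({"threshold": 0.6})
print(predictor.predict({"features": [0.9, 0.7, 0.8]}))
```

The split matters operationally: loading the model in the constructor keeps per-request latency low, and the serving platform can spin up more predictor replicas to scale the API horizontally.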

Model Monitoring

Model monitoring is a continuous process, so it is important to identify the elements to monitor and create a strategy for handling them. ML systems have to be monitored to detect model degradation, which can lead to suboptimal performance. Operational resources such as GPU usage and the number of API calls should also be monitored so the system runs without interruption.

Deepchecks is a great tool not only for testing and validating your model, but also for monitoring models after they have been deployed. It offers flexibility through both an open source and a commercial offering, depending on your needs. Other model monitoring tools worth mentioning are Prometheus and Grafana. Monitoring tools track real-time measurements and visualize them on a dashboard, gathering both model quality information, such as outlier detection and model drift, and operational metrics, such as request rate and latency.
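
One simple form of drift monitoring compares a live window of a feature against its training-time baseline. The following is a minimal sketch of that idea (a z-score on the feature mean; real monitoring tools use richer statistics such as PSI or KS tests):

```python
# Illustrative drift monitor: flag drift when the live mean of a
# feature moves too many baseline standard deviations from its
# training-time mean.

import statistics

def drift_alert(baseline, live, z_threshold=3.0):
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    z = abs(statistics.mean(live) - mu) / sigma
    return {"z_score": z, "drift": z > z_threshold}

baseline = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8]   # training data
stable_window = [10.1, 9.9, 10.4]                     # live, unchanged
shifted_window = [15.0, 16.2, 14.8]                   # live, drifted

print(drift_alert(baseline, stable_window)["drift"])
print(drift_alert(baseline, shifted_window)["drift"])
```

In production such a check would run on a schedule over recent requests, with alerts routed to a dashboard or pager; that is exactly the kind of plumbing tools like Deepchecks, Prometheus, and Grafana provide out of the box.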


Open source tools are easy for teams to use and adopt because of their flexibility and the supportive communities that create new features and fix bugs regularly. The table below lists the open source tools discussed in this article, along with alternatives. Whichever tools are chosen, the end-to-end architecture is designed to operate as a well-functioning MLOps solution.

MLOps Stage | Open-source Tool | Alternatives
Source Code | GitHub | Bitbucket
Feature Store | Feast | Hopsworks
ML Pipeline | Kubeflow | Polyaxon
Model Validation Testing/Maintenance | Deepchecks | Etiq AI, Great Expectations
Model Registry | MLflow | Neptune
Model Serving | Cortex | Seldon Core
Model Monitoring | Deepchecks | Prometheus, Grafana

In the end, each team needs to find the right mix of MLOps products and practices that best fits their use cases.

