Model Versioning for ML Models: A Comprehensive Guide

This blog post was written by Tonye Harry as part of the Deepchecks Community Blog. If you would like to contribute your own blog post, feel free to reach out to us via blog@deepchecks.com. We typically pay a symbolic fee for content that's accepted by our reviewers.


Fig 1. Image credit: Roman Synkevych

In a typical software development process, changes are made rapidly and continuously due to the experimental nature of technology development. As a result, versioning has become an essential component to keep track of modifications made to the source code and identify the team member responsible for them.

This is particularly relevant in Machine Learning (ML) systems, where teams need to track changes in data, code, and the model being developed to achieve optimal results. Specifically, three types of versioning are crucial in an ML system:

  • Data Versioning: This mostly involves tracking and managing changes to the data utilized in creating the model.
  • Code Versioning: This enables the tracking of modifications made to the source code that powers an ML system, ensuring transparency and facilitating collaboration.
  • Model Versioning: This ensures that modifications to the machine learning model are tracked and managed. It can also involve some aspects of data versioning when needed.

In this article, we will focus on model versioning and provide a comprehensive guide to help you understand what it is and the various technicalities involved.

Model versioning

It is crucial to understand version control to appreciate model versioning.

Version control

Version control is the process of tracking and managing modifications to software code or ML systems. It maintains a detailed record of changes to a system, enabling data science teams to revert to previous (favorable) versions and collaborate effectively.

Model versioning, on the other hand, is a specific type of version control focused on tracking changes made to the ML model in a machine learning system. By versioning the model, teams can maintain a complete history of changes made to the model, enabling them to reproduce results, debug issues, and collaborate effectively.

In addition, model versioning can track datasets, metrics, hyperparameters, algorithms, and artifacts to ensure transparency and accuracy in the ML development process.
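
As a rough illustration of what such a record could hold, here is a minimal sketch in Python. The class name, field names, and values are hypothetical and not tied to any particular tool; the idea is simply that each version carries its dataset reference, hyperparameters, metrics, and artifacts in one serializable entry.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ModelVersionRecord:
    """One entry in a model's version history (field names are illustrative)."""
    model_name: str
    version: str
    algorithm: str
    dataset: str                                          # identifier of the training data
    hyperparameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)         # paths to weights, plots, etc.

    def to_json(self) -> str:
        # sort_keys keeps the layout stable, so diffs between versions stay readable
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ModelVersionRecord":
        return cls(**json.loads(text))


record = ModelVersionRecord(
    model_name="my-model",
    version="1.0.0",
    algorithm="small CNN",
    dataset="train-2022.01.01",
    hyperparameters={"learning_rate": 0.001, "batch_size": 32, "epochs": 10},
    metrics={"test_accuracy": 0.90},                      # placeholder value, not a measured result
    artifacts=["model.pth"],
)
print(record.to_json())
```

Because the record is plain JSON, it can be committed next to the code so that each change to the model leaves a reviewable trace.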

Importance of version control in ML development

Because ML model development is iterative, continuous modifications to the model, data, or code are common. Machine learning model versioning plays a crucial role in tracking and managing these changes by creating simple, retrievable records of each modification. Here are some key benefits:

  • Enables accurate reproduction of previous results
  • Facilitates debugging problems and collaborating more effectively
  • Allows for continuous improvement and optimization of datasets, code, and models to improve the performance of the ML system.
  • Improves the reproducibility of the entire project.

Overview of version control systems

A Version Control System (VCS) is a software tool that enables developers to track and manage changes to source code, data, or models. By programmatically versioning files and projects, these tools relieve data scientists of the burden of manual versioning and enable team collaboration. They also reduce the risk of a single point of failure compared to manual versioning, where all changes can be lost if your disk gets damaged.

There are three primary types of version control systems:

  • Local Version Control Systems (LVCS)
  • Centralized Version Control Systems (CVCS)
  • Distributed Version Control Systems (DVCS)

Local Version Control System (LVCS):

An LVCS keeps a database of changes to the files and directories in a project, allowing you to revert to previous versions of the project in case of any issues. The system stores a complete copy of the project in the local database, which allows the team to work offline without a network connection. However, because everything lives on one machine with inadequate backup capabilities, it remains a single point of failure.

Fig 2. A diagram illustrating a local version control system. Image credit: Git

Centralized Version Control System (CVCS):

A CVCS stores the code in a central repository, and collaborators work on a local copy of the code. Changes are made to the local copy and then committed to the central repository. Examples of CVCS include Subversion (SVN) and Perforce.

Fig 3. A diagram illustrating a centralized version control system. Image credit: Git

Distributed Version Control System (DVCS):

In a DVCS, each collaborator has a local copy of the entire repository, including its history, which enables them to work offline and commit changes to the local repository. Teams usually still designate a shared central repository, and changes are merged into it as needed. Examples of DVCS include Git, Mercurial, and Bazaar.

Git is a version control system, while GitHub is a cloud-based hosting service that helps you manage Git repositories.

Fig 4. A diagram illustrating a distributed version control system. Image credit: Git

| Command | Git | Hg (or Mercurial) | bzr (or Bazaar) |
|---|---|---|---|
| Popularity | Most popular | Second most popular | Less popular |
| Performance | Very fast | Fast | Slow |
| Windows support | Good | Good | Good |
| Large projects | Better suited for very large projects | Better suited for medium-sized projects | Better suited for small projects |
| Hosting | GitHub, GitLab, Bitbucket, and others | Bitbucket, SourceForge, and others | Launchpad, SourceForge, and others |

Table 1. A table showing the differences and similarities between Git, Mercurial, and Bazaar

Testing. CI/CD. Monitoring.

Because ML systems are more fragile than you think. All based on our open-source core.

Deepchecks HubOur GithubOpen Source

Applying Git to model versioning

Git is the most popular version control system among developers and data scientists alike. It has a very reliable workflow, is supported by most third-party platforms such as GitHub and GitLab, and benefits from the development community's wide adoption of distributed version control systems.

Here is an example of using Git for model versioning on your local machine. You can also connect to your GitHub account to interact with your Git repository.

Note: the Git commands below are run in a terminal; the rest of the code in this article is written in Python.

## initialize a new Git repository in your project directory

git init

## create a new PyTorch model and save it to a file
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(32 * 8 * 8, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = nn.functional.relu(x)
        x = self.pool(x)
        x = self.conv2(x)
        x = nn.functional.relu(x)
        x = self.pool(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = nn.functional.relu(x)
        x = self.fc2(x)
        return x

model = MyModel()
torch.save(model.state_dict(), 'model.pth')

## add the model file to the Git repository and commit the changes
git add model.pth
git commit -m "Initial version of PyTorch model"

## train the model on some data and save the new version

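# Load some data and train the model
# (train_dataset is assumed to be an existing torch Dataset of 3x32x32 images with 10 classes)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    for images, labels in train_loader:
        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()                    # compute gradients
        optimizer.step()                   # update the weights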

# Save the new version of the model
torch.save(model.state_dict(), 'model_v2.pth')

## Add the new model version to the Git repository and commit the changes

git add model_v2.pth
git commit -m "Trained model on some data"

This is a general example showing the idea behind model versioning with git and might be different in your case.
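
If you prefer to drive these steps from Python, the sketch below wraps the same `git add` and `git commit` commands and optionally adds an annotated tag (`git tag -a`) so that a model version gets a readable name. The helper names and the `dry_run` flag are hypothetical, and tagging is an addition to the workflow shown above rather than part of it.

```python
import subprocess


def build_git_commands(model_path, message, tag=None):
    """Return the git invocations needed to commit (and optionally tag) a model file."""
    commands = [
        ["git", "add", model_path],
        ["git", "commit", "-m", message],
    ]
    if tag is not None:
        # An annotated tag gives the commit a human-readable version name.
        commands.append(["git", "tag", "-a", tag, "-m", message])
    return commands


def commit_model(model_path, message, tag=None, dry_run=True):
    """Run the commands, or only return them when dry_run is True."""
    commands = build_git_commands(model_path, message, tag)
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # raises if git reports an error
    return commands


# Dry run: nothing is executed, the planned commands are only listed.
planned = commit_model("model_v2.pth", "Trained model on some data", tag="v2")
for command in planned:
    print(" ".join(command))
```

Keeping the dry run as the default makes it easy to inspect what would be executed before touching the repository.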

Model versioning in MLOps: Using Deepchecks

Deepchecks is an open-source MLOps tool that can be very useful for model versioning. It helps data scientists track and manage changes made to their machine learning models over time. With Deepchecks, you can compare different versions of your models and identify changes that may have caused a drop in performance.

The Deepchecks SaaS platform (Deepchecks Pro) also acts as versioning software: it handles MLOps data versioning and machine learning model versioning, and helps manage versions of individual model evaluation metrics.

With its detailed visualizations of evaluation metrics, it helps you evaluate the effectiveness of each version of your models. Deepchecks can be integrated with popular versioning systems like Git to ensure that your models are properly versioned and documented, making model management, tracking, and collaboration easier.

| Feature | Deepchecks Pro | Git |
|---|---|---|
| Versioning | Its version control is designed specifically for machine learning models | General-purpose version control system |
| Ease of Use | Easy to set up and use with a user-friendly interface | Might be intimidating for first-time users, especially when it requires a command-line interface |
| Data Tracking | Tracks and logs data used to train models, making it easy to reproduce experiments and results | Limited data tracking capabilities |
| Model Tracking | Tracks all versions of a model, including the model architecture, weights, and evaluation metrics | Tracks only code changes, requires manual tracking of model versions |
| Collaboration | Allows for seamless collaboration among team members by providing a centralized platform for sharing and reviewing model versions | Collaboration can be difficult without proper branching and merging |
| Scalability | Can handle large-scale machine learning projects with many contributors and models | Limited scalability for larger projects |
| Interpretability | Provides explainability of model versions through tracking of hyperparameters and training data, facilitating model interpretation | Limited interpretability of model versions |

Table 2. A table showing the differences between Deepchecks Pro and Git

Here is an example of how model versioning can be done with Deepchecks:

## Installing deepchecks package

import sys
!{sys.executable} -m pip install -U deepchecks
!{sys.executable} -m pip install -U deepchecks-client

Preparing the reference data

from deepchecks.tabular.datasets.regression.airbnb import load_data, \
    load_pre_calculated_prediction, load_pre_calculated_feature_importance

ref_dataset, _ = load_data(data_format='Dataset')
ref_predictions, _ = load_pre_calculated_prediction()
feature_importance = load_pre_calculated_feature_importance() # Optional
neighbourhood_group               0.1
neighbourhood                     0.2
room_type                         0.1
minimum_nights                    0.1
number_of_reviews                 0.1
reviews_per_month                 0.1
calculated_host_listings_count    0.1
availability_365                  0.1
has_availability                  0.1

dtype: float64

Creating the Data Schema

from deepchecks_client import DeepchecksClient, create_schema, read_schema

schema_file_path = 'schema_file.yaml'
create_schema(dataset=ref_dataset, schema_output_file=schema_file_path)
# Note: for conveniently changing the auto-inferred schema it's recommended to edit the textual file with an app of your choice.
# After editing, you can use the `read_schema` function to verify the validity of the syntax in your updated schema.
Schema was successfully generated and saved to schema_file.yaml.

{'additional_data': {}, 'features': {'availability_365': 'integer', 'calculated_host_listings_count': 'integer', 'has_availability': 'categorical', 'minimum_nights': 'integer', 'neighbourhood': 'categorical', 'neighbourhood_group': 'categorical', 'number_of_reviews': 'integer', 'reviews_per_month': 'numeric', 'room_type': 'categorical'}}

Creating a Model Version

import os

# Point the host to deepchecks app
host = os.environ.get('DEEPCHECKS_API_HOST')  # Replace this with https://app.deepchecks.com
# note to put the API token in your environment variables. Or alternatively (less recommended):
# os.environ['DEEPCHECKS_API_TOKEN'] = 'uncomment-this-line-and-insert-your-api-token-here'
dc_client = DeepchecksClient(host=host, token=os.getenv('DEEPCHECKS_API_TOKEN'))
model_name = 'Airbnb'

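model_version = dc_client.create_tabular_model_version(model_name=model_name, version_name='ver_1',
                                                       # The remaining arguments were cut off in this listing. They are
                                                       # reconstructed from the variables defined above; check them
                                                       # against your deepchecks-client version.
                                                       schema=schema_file_path,
                                                       feature_importance=feature_importance,
                                                       reference_dataset=ref_dataset,
                                                       reference_predictions=ref_predictions,
                                                       task_type='regression')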
Model Airbnb was successfully created!. Default checks, monitors and alerts added.
Model version ver_1 was successfully created.
Reference data uploaded.

Uploading Production Data and Predictions

timestamp, label_col = 'datestamp', 'price'
_, prod_data = load_data(data_format='DataFrame')
_, prod_predictions = load_pre_calculated_prediction()

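# The start of this call was cut off in this listing. The call below is a reconstruction
# (assumed: log_batch on the model version client); check it against your deepchecks-client version.
model_version.log_batch(sample_ids=prod_data.index,
                        data=prod_data.drop([timestamp, label_col], axis=1),
                        timestamps=prod_data[timestamp],
                        predictions=prod_predictions)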
/home/runner/work/mon/mon/.venv/lib/python3.9/site-packages/deepchecks_client/tabular/client.py:661: UserWarning:

Index of provided "data" dataframe completely matches "sample_ids" array, are you sure that "samples_ids" array is correct and contains unique sample identifiers?

10000 new samples sent.
10000 new samples sent.
10000 new samples sent.
10000 new samples sent.
2225 new samples sent.
Upload finished successfully but might take time to ingest into the system, see for status.

Updating the Labels

model_client = dc_client.get_or_create_model(model_name)
model_client.log_batch_labels(sample_ids=prod_data.index, labels=prod_data[label_col])
10000 labels sent.
10000 labels sent.
10000 labels sent.
10000 labels sent.
2225 labels sent.

Delete Model

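# CAUTION: This will delete the model, all model versions, and all associated datasets.
# The call itself was missing from this listing; delete_model is assumed here, so verify it against your client version.
dc_client.delete_model(model_name)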

To get access to an account, you can request one for your organization and check out the quickstart guide to test the platform and its suitability for your organization’s use case.

What needs to be versioned?

Apart from the code and data in an ML development lifecycle, the model and its environment are the most important things to version for reproducibility and model optimization.

Model:

  • Model architecture or algorithm, hyperparameters (batch size, learning rate, epochs, etc.), and weights can be versioned.
  • Model evaluation metrics and results for each version, including test accuracy and other relevant performance indicators, should be documented. This enhances the model’s explainability and performance throughout experimentation.

Training and Deployment environments:

  • The configurations used for training and deploying models should be versioned. These configurations include dependencies like libraries and packages to ensure training and deployment environment consistency.
  • The deployment scripts used to deploy the model can be versioned to enable the reproducibility of the deployment process.
  • The dependencies required for the deployment environment, such as the operating system, runtime libraries, and software packages, can be versioned to ensure the consistency of the deployment environment.
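
One possible way to capture the environment side of this list is to snapshot the interpreter, operating system, and installed package versions alongside each model version. The sketch below uses only the Python standard library; the function name and the layout of the snapshot are assumptions, not a fixed format.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Collect interpreter, OS, and installed package versions into a dict."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing metadata
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": sys.implementation.name,
        "os": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


env = snapshot_environment()
print(json.dumps(env, indent=2)[:300])  # preview the beginning of the snapshot
```

Saving this snapshot with the model makes it possible to spot dependency differences between the training and deployment environments later on.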

Types of Model Versions

To ensure consistency and clarity in model versioning, different types of version numbers are used to indicate the scope of the changes made to a model. These version numbers are typically broken down into:

  • Major version
  • Minor version
  • Patch version

Major Version: A major version indicates a significant change that could impact the performance or functionality of your model. Typically, a major version update involves a significant change in your model’s architecture, algorithms, or training data. You can also introduce new features or capabilities using this. Major versions are typically denoted by incrementing the first digit in the version number.

Minor Version: A minor version indicates a smaller change that typically does not significantly affect the model’s performance or functionality. For example, a minor version update could involve a bug fix, a small optimization, or a new feature that does not fundamentally alter the model’s behavior. Minor versions are typically denoted by incrementing the second digit in the version number.

Patch Version: A patch version indicates a small change or bug fix that is made to a specific version of the model. Patch versions are typically denoted by incrementing the third digit in the version number.

Fig 5. Components of versioning numbering. Image credit: Geeksforgeeks
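
The numbering rules above can be sketched as a small Python class. `ModelVersion` and its methods are hypothetical names; the sketch assumes the common convention that bumping a part resets the parts to its right to zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ModelVersion:
    """A major.minor.patch version; ordering compares fields left to right."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text):
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump(self, part):
        """Return a new version with the given part incremented."""
        if part == "major":
            return ModelVersion(self.major + 1, 0, 0)
        if part == "minor":
            return ModelVersion(self.major, self.minor + 1, 0)
        if part == "patch":
            return ModelVersion(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown version part: {part!r}")

    def __str__(self):
        return f"{self.major}.{self.minor}.{self.patch}"


current = ModelVersion.parse("1.4.2")
print(current.bump("patch"))   # 1.4.3
print(current.bump("minor"))   # 1.5.0
print(current.bump("major"))   # 2.0.0
```

Comparing versions as integers rather than strings matters: `ModelVersion(1, 10, 0)` sorts after `ModelVersion(1, 9, 5)`, whereas the strings "1.10.0" and "1.9.5" would sort the other way.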

Versioning schemes

Choosing a versioning scheme is an essential step toward efficient collaboration in ML development. Failure to do so could result in confusion and wasted time, especially for larger projects. Here are some versioning schemes to consider when developing machine learning models:

  • Semantic Versioning:
    Semantic versioning is a widely used versioning scheme that uses a three-part version number consisting of major, minor, and patch versions. The version number is typically written in the format “major.minor.patch”. This scheme is often used for software libraries, frameworks, and APIs.
  • Calendar Versioning:
    Calendar versioning is a versioning scheme that uses the date of release as the version number. For example, a model released on January 1st, 2022 would have a version number of 2022.01.01. This scheme is often used for data science projects where the focus is on tracking changes over time.

    Fig 6. Components of calendar versioning. Image credit: Datalust

  • Sequential Versioning:
    Sequential versioning is a versioning scheme that uses a simple sequential numbering system to track versions. Each new version is assigned the next available number in the sequence (e.g., 1, 2, 3, etc.). This scheme is often used for small projects or individual models.
  • Git Commit Hash:
    This scheme versions a model by the Git commit hash, a unique identifier for every commit made to a code repository (for example, ‘4dfc13a’ or ‘8c2fb85’). It allows for precise tracking of changes made to the model and its associated code. It is commonly used for collaborative development on machine learning projects, where multiple developers may be changing the codebase at the same time.

Fig 7. Git commit hash. Image credit: Git
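
To make the calendar, sequential, and commit-hash schemes more concrete, here is a minimal Python sketch of how each identifier could be produced. The function names are hypothetical. The commit-hash helper relies on `git rev-parse --short HEAD` and returns `None` when Git is not installed or the current directory is not a repository.

```python
import subprocess
from datetime import date


def calendar_version(release_date):
    """Calendar versioning: the release date in YYYY.MM.DD form."""
    return release_date.strftime("%Y.%m.%d")


def next_sequential_version(existing_versions):
    """Sequential versioning: the next free number, starting at 1."""
    return max(existing_versions, default=0) + 1


def git_short_hash():
    """Commit-hash versioning: short hash of the current commit, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


print(calendar_version(date(2022, 1, 1)))    # 2022.01.01
print(next_sequential_version([1, 2, 3]))    # 4
print(git_short_hash())                      # depends on the repository you run this in
```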

Challenges and Considerations

While model versioning is an essential part of machine learning development, it comes with its own set of challenges. As models become more complex and teams grow, it becomes increasingly important to manage the versioning process effectively. Here are a few considerations that should be made depending on the size of your project:

  • Data management and versioning
  • Scalability and resource requirements
  • Integrating with existing workflows and systems

Data management and versioning:

One major challenge in model versioning is data management and versioning. Keeping track of changes in data used to train models is crucial as it can have a significant impact on model performance.

However, managing large datasets and tracking changes to them can be challenging, especially when dealing with distributed datasets across multiple machines. It is essential to establish proper protocols and workflows for data versioning to ensure data consistency, reliability, and easy tracking of changes.
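
One simple protocol, sketched below under the assumption that the dataset lives in a local directory, is to compute a fingerprint of the data and store it with each model version. If the fingerprint changes, the training data changed. The function name is hypothetical, and dedicated data versioning tools offer far more than this.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(data_dir):
    """Hash the relative path and content of every file under data_dir, in sorted order."""
    digest = hashlib.sha256()
    root = Path(data_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large files do not need to fit in memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

Calling `dataset_fingerprint("data/")` before each training run and recording the result next to the model version is enough to detect silent changes to the training data.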

Scalability and resource requirements:

Another challenge in model versioning is scalability and resource requirements. As models grow in complexity, their versioning requirements can become increasingly complex, and storing multiple versions of models can quickly become resource-intensive. It is crucial to consider the scalability and resource requirements of model versioning systems when implementing them to ensure they can handle increasing model complexity and volume.
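
One common way to limit storage growth is to store each model file under the hash of its content, so identical files are kept only once. The sketch below illustrates the idea with the standard library; the function names and the flat store layout are assumptions, not a recommendation of a specific tool.

```python
import hashlib
import shutil
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_artifact(src, store_dir):
    """Copy src into store_dir under its content hash; skip the copy if it is already stored."""
    digest = file_sha256(src)
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    target = store / digest
    if not target.exists():
        shutil.copyfile(src, target)
    return digest
```

The returned digest can serve as the reference to the weights in a version record, and re-saving an unchanged model costs no extra disk space.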

Integrating with existing workflows and systems:

Integrating model versioning with existing workflows and systems can also pose a challenge. Different teams may have different tools and workflows, and integrating model versioning into these can require significant effort. It is essential to choose a model versioning system that can integrate seamlessly with existing tools and workflows to minimize disruption and ensure smooth collaboration across teams.

Best Practices for Model Versioning

To ensure efficient collaboration with your team throughout the timeline of a project, it is important to consider version management from the outset.

When doing this, remember that model versioning should also involve monitoring and evaluating the model’s performance over time to ensure it continues to meet business and performance requirements. One best practice is to establish automated testing and monitoring processes that evaluate the model’s performance at regular intervals. This includes tracking metrics like accuracy, precision, and recall and comparing them against previous versions. For instance, tools like Deepchecks can help track and evaluate model performance metrics over time.

Fig 8. Diagram showing the dashboard for Deepchecks Pro. Image credit: Deepchecks Pro
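
Independently of any specific tool, an automated check that compares a new version's metrics against the previous version could look like the sketch below. The function name, the tolerance, and the metric values are hypothetical placeholders, and the check assumes that higher values are better for every metric.

```python
def find_regressions(baseline, candidate, tolerance=0.01):
    """Return metrics whose value dropped by more than `tolerance` versus the baseline."""
    regressions = {}
    for name, old_value in baseline.items():
        new_value = candidate.get(name)
        if new_value is None:
            continue  # metric not reported for the candidate version
        drop = old_value - new_value
        if drop > tolerance:
            regressions[name] = drop
    return regressions


# Placeholder numbers for illustration only, not measured results.
previous_version = {"accuracy": 0.91, "precision": 0.88, "recall": 0.85}
new_version = {"accuracy": 0.92, "precision": 0.84, "recall": 0.85}

regressed = find_regressions(previous_version, new_version)
print(sorted(regressed))   # ['precision']
```

A check like this can run in CI after each training job, blocking the promotion of a version that regresses on any tracked metric.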


In conclusion, the significance of model versioning cannot be overstated, especially in the context of machine learning development. It gives teams a record of all changes made to their models, making it easy to review them and to revert to previous versions if need be. This enhances continuous experimentation, which in turn can improve the model and the overall ML system.

As we look to the future, there is hope that the process of model versioning will become automated, similar to how Google Colab saves code as you work. Gradually, the integration of versioning with MLOps platforms and tools is also expected to significantly increase reproducibility and interpretability. Additionally, we must emphasize the importance of maintaining detailed records of each version, including relevant performance metrics, as this is an essential part of understanding the strengths and weaknesses of our models. It is only through continuous experimentation and careful monitoring of our models that we can ensure optimal performance in ML projects.

If you would like to test the Deepchecks model monitoring platform, apply for an invite and join the Deepchecks community.
