
Model Registry

The main purpose of a model registry is to provide a central location to store models that are ready for production. In the registry, developers can pool their work with that of other teams and stakeholders to manage the whole organization’s model lifecycle in concert. A data scientist may submit trained models to a model registry; once registered, models can be tested, validated, and deployed to production.

The model registry consists of:

  • Centralized storage for all sorts of models, where they are kept for quick retrieval by applications (or services). Without a central repository for model artifacts, developers end up keeping their work in disorganized files scattered across source code repositories. An open-source model registry simplifies the process by providing a single central ML model repository.

This consolidated storage makes it simple for data teams to see the current state of all models at once.

  • Collaborative unit for asset lifecycle management. The model registry offers ML teams a shared unit for working with and sharing models. It supports collaboration by:
  1. Bridging the gap between experimentation and production.
  2. Providing teams with a centralized UI for model collaboration.
  3. Providing a model consumption interface for downstream systems, as sketched below.
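
As a concrete illustration of that consumption interface, here is a minimal sketch using MLflow’s models:/ URI scheme (MLflow is one open-source registry, used here purely as an example; the model name “churn-classifier” and version 3 are hypothetical placeholders):

```python
# A sketch of the consumption side. Assumes an already-populated MLflow
# registry; "churn-classifier" and version "3" are hypothetical placeholders.
import mlflow.pyfunc

# Downstream code refers to the model only by its registry name and
# version, never by file paths or training-run details.
model = mlflow.pyfunc.load_model("models:/churn-classifier/3")

# `features` would be a pandas DataFrame matching the model's input schema:
# predictions = model.predict(features)
```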

Importance of Model Registration

Without a model registry, machine learning engineers are more likely to take shortcuts or make costly mistakes.

Here is what could happen if you neglect model registration:

  • Mislabeled model artifacts. It can be difficult to determine which artifacts (files) came from which training job, and if artifacts are passed around over email or instant messaging, files are easily mixed up.
  • Data loss or deletion, which happens when teams don’t keep a record of how and when they used specific datasets.
  • Missing or unknown source code versions. Even the best models can produce unexpected or incorrect results, and a common pitfall is losing the original training code or forgetting which version of it produced the model. Teams may then have to invalidate the existing model and train a new one just to understand the problem, an unnecessary repetition of work.
  • Undocumented model performance. As teams iterate, they quickly accumulate several versions of a model for a given task, and if performance data are scattered across notebooks and files, meaningful comparisons between versions become difficult.

How an ML Model Registry Works

Each model in a model registry is given a unique identifier, often called a model ID or UUID. Most registry tools also provide a way to track multiple versions of the same model. Data science and machine learning teams can use the model ID and version to compare and deploy models with confidence.
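
To make that bookkeeping concrete, here is a minimal in-memory sketch in plain Python (a hypothetical design, not any particular registry product) that assigns each registered model a UUID and an auto-incremented version:

```python
import uuid
from dataclasses import dataclass

@dataclass
class RegisteredModel:
    name: str
    model_id: str      # unique identifier (UUID)
    version: int       # incremented on each re-registration
    artifact_uri: str  # where the serialized model lives

class ModelRegistry:
    """Toy in-memory registry: one entry per (name, version)."""
    def __init__(self):
        self._models: dict[str, list[RegisteredModel]] = {}

    def register(self, name: str, artifact_uri: str) -> RegisteredModel:
        versions = self._models.setdefault(name, [])
        entry = RegisteredModel(
            name=name,
            model_id=str(uuid.uuid4()),  # fresh UUID for each version
            version=len(versions) + 1,   # 1, 2, 3, ...
            artifact_uri=artifact_uri,
        )
        versions.append(entry)
        return entry

    def get(self, name: str, version: int) -> RegisteredModel:
        return self._models[name][version - 1]

registry = ModelRegistry()
v1 = registry.register("churn-classifier", "s3://models/churn/v1.pkl")
v2 = registry.register("churn-classifier", "s3://models/churn/v2.pkl")
assert registry.get("churn-classifier", 2).model_id == v2.model_id
```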

Additionally, registry tools store parameters and metrics. For illustration, when registering a model, training and evaluation jobs might record hyperparameter settings and performance metrics (such as accuracy). Storing these values makes model comparison straightforward: when teams build new models, this information helps them determine whether an updated version is an improvement over earlier ones. Many registry tools also feature a graphical user interface for displaying these parameters and metrics.
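
For example, with MLflow (used here purely as an illustration; the registered name “churn-classifier” is a placeholder), a training script can log hyperparameters and metrics, register the model, and later compare versions programmatically:

```python
# A hedged sketch using MLflow; "churn-classifier" is a placeholder name.
import mlflow
from mlflow.tracking import MlflowClient
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=5).fit(X, y)

with mlflow.start_run():
    mlflow.log_param("max_depth", 5)                  # hyperparameter setting
    mlflow.log_metric("accuracy", model.score(X, y))  # evaluation metric
    # Logs the artifact and creates a new version under the registered name.
    mlflow.sklearn.log_model(model, "model",
                             registered_model_name="churn-classifier")

# Compare registered versions by the metrics stored with their runs.
client = MlflowClient()
for mv in client.search_model_versions("name='churn-classifier'"):
    metrics = client.get_run(mv.run_id).data.metrics
    print(f"version {mv.version}: accuracy={metrics.get('accuracy')}")
```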

Generally speaking, model registries consist of these components, illustrated in the sketch after this list:

  • Object storage for model artifacts and large binary files.
  • A structured or semi-structured database for model metadata.
  • A graphical user interface for inspecting and comparing registered models.
  • A programmatic API for retrieving model artifacts and metadata by model ID.
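
Here is a toy sketch of how the first, second, and fourth components fit together (a hypothetical plain-Python design, not any specific product): a local directory stands in for object storage, SQLite for the metadata database, and a small function for the programmatic API; a UI would sit on top of the same metadata.

```python
# Toy sketch of a registry's storage-facing components (hypothetical design):
# a directory stands in for object storage, SQLite for the metadata database,
# and get_artifact_path() for the programmatic API.
import json
import shutil
import sqlite3
import uuid
from pathlib import Path

ARTIFACT_STORE = Path("artifact-store")       # stand-in for object storage
ARTIFACT_STORE.mkdir(exist_ok=True)
db = sqlite3.connect("registry-metadata.db")  # stand-in for the metadata DB
db.execute("""CREATE TABLE IF NOT EXISTS models
              (model_id TEXT PRIMARY KEY, name TEXT, metadata TEXT)""")

def register_model(name: str, artifact_file: str, metadata: dict) -> str:
    """Copy the artifact into the store, index its metadata, return the ID."""
    model_id = str(uuid.uuid4())
    shutil.copy(artifact_file, ARTIFACT_STORE / model_id)
    db.execute("INSERT INTO models VALUES (?, ?, ?)",
               (model_id, name, json.dumps(metadata)))
    db.commit()
    return model_id

def get_artifact_path(model_id: str) -> Path:
    """The 'API' component: resolve a model ID to its stored artifact."""
    row = db.execute("SELECT model_id FROM models WHERE model_id = ?",
                     (model_id,)).fetchone()
    if row is None:
        raise KeyError(f"unknown model ID: {model_id}")
    return ARTIFACT_STORE / row[0]

# Usage (assumes a serialized model file exists locally):
# model_id = register_model("churn-classifier", "model.pkl", {"accuracy": 0.91})
# print(get_artifact_path(model_id))
```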

Endnotes

Model registry tools are an essential part of a viable MLOps architecture. Model registries support data scientists’ research and development, speed up model deployment, and make otherwise complicated audit and governance processes practical.
