Deepchecks ML Testing

ML Model Testing & CI/CD

Deepchecks ML Testing is an open-source, Python-based solution for comprehensively validating your machine learning models and data with minimal effort, in both the research and the production phases.

Key Capabilities of ML Testing

Data Integrity


When you have a fresh dataset and want to validate your data’s correctness and uncover inconsistencies such as conflicting labels or data duplicates.

Model Evaluation


When you have a trained model and want to examine performance metrics, compare it to various benchmarks, and create a clear and granular picture for validating the model’s behavior (e.g., segments where it underperforms).

Train-Test Validation


When you have separate datasets (such as train and test, or training data collected at different times) and want to validate that they are representative of each other and don’t have issues such as drift or leakage.

ML Validation Continuity from Research to Production


You can use the exact set (or a subset) of the checks that were used during research for CI/CD and production monitoring. This ensures that the deep knowledge your data science team has will be used by the ML engineers in later phases of the model and data lifecycle.

Code-Level Root Cause Analysis


You can segment the data to pinpoint the area where the model or data seems to fail, and then hand that off to the data science team for code-level analysis. This means quicker root cause analysis cycles (up to 70% of the time is usually spent on the initial analysis, which is saved here).
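For instance, the WeakSegmentsPerformance check (also shown in the feature list below) surfaces the data segments where the model performs worst. A minimal sketch, assuming a deepchecks test_dataset and a fitted model are already defined:

from deepchecks.tabular.checks import WeakSegmentsPerformance

# Locate the data segments with the weakest model performance
result = WeakSegmentsPerformance().run(test_dataset, model)
result.show()  # display the weakest segments for further analysis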

Deepchecks Open Source: For ML Practitioners From Research to Production

Deepchecks ML Testing is a Python-based solution for comprehensively validating your machine learning models and data with minimal effort, in both the research and the production phases. It includes checks related to various types of issues, such as model performance, data integrity, distribution mismatches, and more. Model and data validation is one of the most important processes that Data Scientists and ML Engineers deal with while scaling up from the “laboratory phase” to ML systems that provide continuous value. We would typically recommend “kicking the tires” with the Deepchecks Testing module as a first step, and then deploying the Deepchecks Monitoring module when the timing is right.

How Does It Work?

Suites are composed of checks. Each check contains outputs displayed in a notebook and/or conditions with a pass/fail output.

Conditions can be added to or removed from a check;

Checks can be edited, and added to or removed from a suite;

Suites can be created from scratch or forked from an existing suite.


The checks and suites are the foundation for the reports (testing module) and the dashboards (monitoring module). The testing package contains extensive pre-built suites, which can easily be extended with custom checks and suites.
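A minimal sketch of composing a custom suite from scratch (the condition method follows the deepchecks tabular API, and train_dataset is assumed to be a deepchecks Dataset):

from deepchecks.tabular import Suite
from deepchecks.tabular.checks import MixedNulls, StringMismatch

# A custom suite: one check carrying a pass/fail condition,
# and one check that only produces display output
custom_suite = Suite(
    'My Custom Integrity Suite',
    StringMismatch().add_condition_no_variants(),
    MixedNulls(),
)
suite_result = custom_suite.run(train_dataset)
suite_result.save_as_html('report.html')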

Testing: Key Features & Checks


Data Integrity

from deepchecks.tabular.suites import data_integrity
from deepchecks.tabular.checks import StringMismatch

# Full pre-built suite
suite = data_integrity()
suite_result = suite.run(train_dataset)

# Single check
check = StringMismatch()
result = check.run(dataset)
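The suites and checks above run on deepchecks Dataset objects, which wrap a pandas DataFrame with metadata. A minimal sketch (the file name and column names are illustrative):

import pandas as pd
from deepchecks.tabular import Dataset

df = pd.read_csv('train.csv')  # illustrative file name
train_dataset = Dataset(df, label='target', cat_features=['city', 'device'])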

Train-Test Validation (Distribution Checks)
from deepchecks.tabular.suites import train_test_validation
from deepchecks.tabular.checks import PredictionDrift

# Full pre-built suite
suite = train_test_validation()
suite_result = suite.run(train_dataset, test_dataset)

# Single drift check
check = PredictionDrift()
result = check.run(train_dataset, test_dataset)
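Drift checks can also carry a pass/fail condition. A sketch, assuming the datasets above; the condition method follows the deepchecks drift API (exact signature may vary by version), and the 0.2 threshold is illustrative:

from deepchecks.tabular.checks import PredictionDrift

# Fail the check if the measured drift score is 0.2 or higher
check = PredictionDrift().add_condition_drift_score_less_than(0.2)
result = check.run(train_dataset, test_dataset)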

Model Evaluation

from deepchecks.tabular.suites import model_evaluation
from deepchecks.tabular.checks import WeakSegmentsPerformance

# Full pre-built suite
suite = model_evaluation()
suite_result = suite.run(train_dataset, test_dataset, model)

# Single check
check = WeakSegmentsPerformance()
result = check.run(test_dataset, model)
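The model argument can be any fitted estimator that follows the scikit-learn API. A minimal sketch, assuming the Dataset objects from above:

from sklearn.ensemble import RandomForestClassifier
from deepchecks.tabular.suites import model_evaluation

# Fit any scikit-learn-compatible model on the training data
model = RandomForestClassifier().fit(
    train_dataset.features_columns, train_dataset.label_col
)
suite_result = model_evaluation().run(train_dataset, test_dataset, model)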

Checks for Unstructured Data

pip install -U "deepchecks[nlp]"
pip install -U "deepchecks[nlp-properties]"
pip install -U "deepchecks[vision]"
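After installing the NLP extra, text data is wrapped in a TextData object. A minimal sketch (the texts, labels, and task_type value are illustrative, following the deepchecks NLP API):

from deepchecks.nlp import TextData
from deepchecks.nlp.suites import data_integrity

texts = ['great product', 'terrible support']  # illustrative samples
labels = [1, 0]                                # illustrative labels
train_data = TextData(texts, label=labels, task_type='text_classification')
suite_result = data_integrity().run(train_data)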

Model Explainability Checks


Open Source & Community

Deepchecks is committed to keeping the ML evaluation package open-source and community-focused.

Past Events

End-2-End Evaluation of RAG-Based Applications | LLM Evaluation
LLM Application Observability | Deepchecks Evaluation
Config-Driven Development for LLMs: Versioning, Routing, & Evaluating LLMs

Recent Blog Posts

The Best 10 LLM Evaluation Tools in 2024