Deepchecks LLM Evaluation for RAG Apps

Everything you need for testing and evaluating your large-language-model-based applications, so you can build production-ready Retrieval-Augmented Generation (RAG) applications.
Test Different Options Objectively

Evaluate your RAG app holistically and test different components to find the best combination.

Test different:

  • LLM models
  • Prompts
  • Chunking strategies
  • Embedding models
  • Retrieval methods

Make product decisions and vendor selection metric-driven (see the sketch below).
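
For illustration only, here is a minimal metric-driven grid search over component choices. This is not the Deepchecks API; OPTIONS, run_rag_pipeline, and score_answers are hypothetical placeholders you would wire to your own stack and scoring:

    from itertools import product

    # Hypothetical component options for your own RAG stack.
    OPTIONS = {
        "llm": ["model-a", "model-b"],
        "chunking": ["fixed-512", "semantic"],
        "embeddings": ["embed-small", "embed-large"],
        "retrieval": ["dense", "hybrid"],
    }

    def run_rag_pipeline(config, questions):
        # Stand-in: wire this to your actual RAG application.
        return [f"answer to {q!r} from {config['llm']}" for q in questions]

    def score_answers(answers):
        # Stand-in metric: replace with real grounding/relevance/correctness scoring.
        return sum(len(a) for a in answers) / (100.0 * max(len(answers), 1))

    def grid_search(questions):
        results = []
        for combo in product(*OPTIONS.values()):
            config = dict(zip(OPTIONS.keys(), combo))
            results.append((score_answers(run_rag_pipeline(config, questions)), config))
        # The highest-scoring combination wins the comparison.
        best_score, best_config = max(results, key=lambda r: r[0])
        print(f"best combination: {best_config} (score={best_score:.3f})")

    grid_search(["What is the refund policy?"])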

Understand How Each Step Can Be Improved

Grounded in Context

Measures how well the LLM output is grounded in the context retrieved for the question.

Retrieval Relevance

Measures how relevant the retrieved context is to the question.

Correctness

Measures how correct the LLM's final answer is.

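To make the three metrics concrete, here are naive, dependency-free token-overlap illustrations of what each one captures. These are not Deepchecks' actual scoring implementations, just toy versions:

    def _tokens(text):
        return set(text.lower().split())

    def grounded_in_context(answer, context):
        # Share of answer tokens that also appear in the retrieved context.
        ans = _tokens(answer)
        return len(ans & _tokens(context)) / max(len(ans), 1)

    def retrieval_relevance(question, context):
        # Share of question tokens covered by the retrieved context.
        q = _tokens(question)
        return len(q & _tokens(context)) / max(len(q), 1)

    def correctness(answer, expected):
        # Jaccard overlap between the answer and a reference answer.
        ans, ref = _tokens(answer), _tokens(expected)
        return len(ans & ref) / max(len(ans | ref), 1)
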
Automatic Annotation & Scoring

Deepchecks automatically annotates LLM interactions using a combination of open-source models, proprietary models, and LLMs.

Easily configure the out-of-the-box scoring to further improve accuracy (a hypothetical configuration sketch follows the list below).

  • Choose the properties to prioritize
  • Refine properties and similarity thresholds
  • Change the location of each step in the pipeline
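
A sketch of what such a scoring configuration might look like. This is illustrative only; the property names, thresholds, and step labels are assumptions, not the product's actual schema:

    scoring_config = {
        # Properties to prioritize when aggregating an interaction's score.
        "priority_properties": ["grounded_in_context", "correctness"],
        # Per-property similarity thresholds; scores below these flag the interaction.
        "thresholds": {
            "grounded_in_context": 0.8,
            "retrieval_relevance": 0.7,
            "correctness": 0.9,
        },
        # Which pipeline step each property is evaluated at.
        "pipeline_steps": {
            "retrieval_relevance": "retrieval",
            "grounded_in_context": "generation",
            "correctness": "generation",
        },
    }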

Mine Hard Samples for Fine-Tuning & Debugging

Easily extract edge cases or sets of samples where your RAG application doesn't perform well. Use them to modify the code or prompt, or download them for the next iteration of re-training.
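
A minimal sketch of what that extraction could look like outside the product; the interaction fields, example data, and score threshold here are hypothetical:

    import json

    def mine_hard_samples(interactions, threshold=0.5):
        # Keep interactions whose annotation score falls below the threshold.
        return [ix for ix in interactions if ix["score"] < threshold]

    def export_for_finetuning(samples, path):
        # Write hard samples as JSONL prompt/completion pairs for re-training.
        with open(path, "w") as f:
            for s in samples:
                record = {"prompt": s["question"], "completion": s["expected"]}
                f.write(json.dumps(record) + "\n")

    interactions = [
        {"question": "What is our refund window?", "expected": "30 days", "score": 0.31},
        {"question": "Who founded the company?", "expected": "Jane Doe", "score": 0.92},
    ]
    export_for_finetuning(mine_hard_samples(interactions), "hard_samples.jsonl")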

LLMOps.Space

Deepchecks is a founding member of LLMOps.Space, a global community for LLM
practitioners. The community focuses on LLMOps-related content, discussions, and
events. Join thousands of practitioners on our Discord.
Join Discord Server

LLMOps Past Events

  • End-2-End Evaluation of RAG-Based Applications | LLM Evaluation
  • LLM Application Observability | Deepchecks Evaluation
  • Config-Driven Development for LLMs: Versioning, Routing, & Evaluating LLMs

Featured Content

The Best 10 LLM Evaluation Tools in 2024