LLM Hallucination Detection and Mitigation: Best Techniques

Introduction

Large Language Model (LLM) adoption is reaching another level in 2024. As Valuates reports, the LLM market was valued at 10.5 Billion USD in 2022 and is anticipated to hit 40.8 Billion USD by 2029, with a staggering Compound Annual Growth Rate (CAGR) of 21.4%.

Imagine a machine so fluent in language that it can write poems, translate languages, and answer your questions in captivating detail. LLMs are doing just that, rapidly transforming fields like communication, education, and creative expression. Yet amid their brilliance lies a hidden vulnerability: the whisper of hallucination. These AI models can sometimes invent facts, fabricate stories, or simply get things wrong.

These hallucinations might seem harmless at first glance: a sprinkle of fiction in a poem, a mistranslated phrase. But the consequences can be real, ranging from misleading information and biased outputs to eroded trust in the technology. So it becomes crucial to ask: how can we detect and mitigate these hallucinations and ensure LLMs deliver truth rather than fantastical fabrications?

This blog dives into the world of LLM hallucinations, exploring their causes, uncovering the best detection techniques, and delving into mitigation strategies.

Understanding Hallucinations

Have you ever encountered a text that seems perfectly crafted, with impeccable grammar and flowing sentences, yet leaves you scratching your head wondering if it’s actually true? This is the world of LLM hallucinations, where large language models like GPT-3 and GPT-4 can become storytellers gone rogue. While they excel at producing coherent text, they sometimes steer off course, creating outputs that are factually inaccurate or entirely nonsensical. These hallucinations pose a significant challenge, with the potential to spread misinformation and undermine the credibility of these AI models.

Hallucinations in LLMs can stem from various root causes, including data biases and uncertainties in the training data and model architecture. Here’s a breakdown of these causes and the differentiation between types of hallucinations.

Root Causes of Hallucinations

  • Data biases: LLMs learn from vast amounts of text data, which may contain biases or inaccuracies. If the training data is skewed towards certain perspectives or contains factual errors, the model may generate hallucinations that reflect these biases.
  • Uncertainty: LLMs operate based on probabilistic principles, and in situations where there’s ambiguity or uncertainty in the input, the model might generate hallucinations as it attempts to fill in the gaps or make sense of the information.

Types of Hallucinations

  • Factual errors: These hallucinations involve the model generating information that is factually incorrect based on the input or context. For example, if a question asks for the capital of France and the model erroneously responds with “Berlin,” it’s a factual error hallucination.
  • Creative enrichment: In some cases, LLMs may produce responses that go beyond the input data, adding imaginative or creative elements. These hallucinations might include fictional narratives, improbable scenarios, or unexpected connections between concepts.

Real-world examples of LLM hallucinations have been observed in various contexts, including chatbots, language generation tasks, and question-answering systems. These hallucinations can sometimes lead to misinformation or misunderstandings if not appropriately identified and addressed. Therefore, it’s crucial to understand the underlying causes and types of hallucinations to improve the reliability and accuracy of LLM-generated content.

LLM Hallucination Detection Techniques

1. Log Probability

Log probability is a fundamental technique for detecting hallucinations in LLMs. It provides a metric for gauging the likelihood of a generated text sequence by assessing how well it aligns with the language patterns the model has learned. When LLMs hallucinate, they often produce text that deviates significantly from these expected patterns, and log probability calculations help us identify such anomalies.

Seq-Logprob (model confidence)

Sequence log probability (Seq-Logprob) is a metric used to measure how likely a generated text sequence is based on the language model’s understanding. It offers a way to measure the LLM’s confidence in its generated text. Let’s understand the equation of Seq-Logprob.

Seq-Logprob = (1/L) × Σ (from k=1 to L) log P(yk | y<k, x, θ)

Where,

  • P(yk | y<k, x, θ) = The conditional probability of producing the k-th word (yk) in the text, given the previously generated words (y<k), the source text (x), and the model’s parameters (θ).
  • Σ = summation of the log probabilities of each word.
  • k = 1 to L = iteration through each word in the generated text.
  • 1/L = normalization by the length (L) of the generated text.

When an LLM hallucinates, it often produces words or phrases that are unlikely or illogical within the given context. These unexpected sequences result in a lower overall Seq-Logprob score, signaling to us that the text may contain incorrect or invented information. Therefore, a low Seq-Logprob serves as a warning sign, allowing us to potentially filter out or flag hallucinatory output.

Understanding Log Probability

In NLP and machine learning, probabilities are often represented using logarithms. This is because probabilities can become extremely small, making them computationally difficult to work with. Taking the logarithm of probabilities simplifies calculations and allows for easier comparison of values.
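To see why, consider a quick Python illustration (the per-token probabilities below are made up, not taken from any real model): multiplying many small probabilities drives the product toward zero, while summing their logarithms keeps the numbers in a manageable range.

```python
import math

# Illustrative per-token probabilities for a 50-token sequence (not real model outputs)
token_probs = [0.1] * 50

# Multiplying raw probabilities shrinks toward zero very fast (about 1e-50 here)
# and will underflow to 0.0 for longer sequences or smaller probabilities.
product = math.prod(token_probs)

# Summing log probabilities carries the same information in a stable range.
log_prob_sum = sum(math.log(p) for p in token_probs)  # about -115.13

print(product, log_prob_sum)
```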

Detection Method

When an LLM generates text, each word or token in the sequence is assigned a probability based on the model’s training data and architecture. The log probability of the entire generated sequence is computed by summing the logarithms of the probabilities of each individual word or token (and, for Seq-Logprob, normalizing by the sequence length). A low log probability suggests that the sequence is improbable based on the model’s training data, which can indicate a potential hallucination.

Example

Let’s consider an example where an LLM is asked to generate a sentence about a topic it has been trained on, such as “climate change.” The model generates the following sentence: “Polar bears enjoy sunbathing on the beaches of Antarctica during the winter.”

Assume the following probabilities based on the model’s understanding of the training data.

P(“polar bears”) = 0.05
P(“enjoy”) = 0.1
P(“sunbathing”) = 0.15
P(“on the beaches”) = 0.2
P(“of Antarctica”) = 0.02
P(“during the winter”) = 0.01

Let’s calculate the log probabilities (base 10) for the generated text.

log(P(“polar bears”)) = log(0.05) ≈ -1.30
log(P(“enjoy”)) = log(0.1) ≈ -1.00
log(P(“sunbathing”)) = log(0.15) ≈ -0.82
log(P(“on the beaches”)) = log(0.2) ≈ -0.70
log(P(“of Antarctica”)) = log(0.02) ≈ -1.70
log(P(“during the winter”)) = log(0.01) ≈ -2.00

Now, calculate the Seq-Logprob using the above-mentioned formula.

Seq-Logprob = 1/6 * [log(0.05) + log(0.1) + log(0.15) + log(0.2) + log(0.02) + log(0.01)]
Seq-Logprob ≈ -1.25
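As a minimal Python sketch of this calculation, the snippet below reproduces the worked example, using base-10 logarithms as in the hand calculation; the phrase probabilities and the flagging threshold are illustrative values rather than real model outputs.

```python
import math

# Illustrative phrase probabilities from the example above (not real model outputs)
phrase_probs = {
    "polar bears": 0.05,
    "enjoy": 0.1,
    "sunbathing": 0.15,
    "on the beaches": 0.2,
    "of Antarctica": 0.02,
    "during the winter": 0.01,
}

# Seq-Logprob: length-normalized sum of log probabilities
log_probs = [math.log10(p) for p in phrase_probs.values()]
seq_logprob = sum(log_probs) / len(log_probs)

print(round(seq_logprob, 2))  # -1.25

# Flag the output for review if confidence falls below a chosen threshold
THRESHOLD = -1.0  # illustrative cut-off; in practice, tune on validation data
if seq_logprob < THRESHOLD:
    print("Low model confidence - possible hallucination, flag for review")
```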

Interpreting scores

Log probabilities are always negative. This is because probabilities themselves are between 0 and 1, and their logarithms fall below zero. A higher absolute value (e.g., -5.0) suggests a lower probability and potential problems with the generated text. A Seq-Logprob closer to zero (e.g., -0.2) would indicate a higher probability sequence.

Our Seq-Logprob score of -1.25 implies that the generated text is somewhat unusual relative to the model’s training data. The score is moderate rather than extreme, so the text is not necessarily wrong, but it warrants further attention.

2. Sentence Similarity

Sentence similarity is a method used to compare the generated text to the source material or training data. It measures how closely the generated text aligns with the language patterns and content of the original data. In the context of hallucination detection in LLMs, significant deviations between the generated text and the source material can indicate potential hallucinations.

Cosine Similarity (A, B) = [Σ (from i=1 to n) Ai × Bi] / (||A|| × ||B||)

Where,

  • A & B = vector representations of the two sentences being compared.
  • Ai and Bi = the i-th components of vectors A and B.
  • ||A|| = magnitude (length) of vector A.
  • ||B|| = magnitude (length) of vector B.
  • Σ (from i=1 to n) Ai × Bi = the dot product of vectors A and B.

The equation calculates the cosine similarity between two vectors representing sentences or pieces of text. This score quantifies the semantic similarity between them.

Understanding Sentence Similarity

Sentence similarity is a concept in NLP that quantifies how similar two sentences are to each other. There are various metrics and techniques for measuring sentence similarity, including cosine similarity, Jaccard similarity, and embedding-based methods. These methods typically compare features or representations of sentences, such as word embeddings or syntactic structures, to compute a similarity score.

Detection Method

When an LLM generates text, the resulting sequence of words can be compared to the sentences in the model’s training data or a reference corpus.

Sentence similarity metrics are then applied to compute a similarity score between the generated text and the reference sentences. A high similarity score indicates that the generated text closely resembles the source material, while a low score suggests significant divergence. If the similarity score falls below a certain threshold, it raises concerns that the generated text may be a hallucination.

Example

Let’s understand the cosine similarity with an example. Consider the following sentences generated by an LLM.

A – Learning AI can be hard
B – Learning AI can be easy

Represent each sentence as a binary bag-of-words vector over the combined vocabulary {learning, AI, can, be, hard, easy}, giving A = [1, 1, 1, 1, 1, 0] and B = [1, 1, 1, 1, 0, 1]. Let’s find the cosine similarity using the above formula.

  • Calculate the dot product between A and B = 1×1 + 1×1 + 1×1 + 1×1 + 1×0 + 0×1 = 4.
  • Calculate the magnitude of vector A = √(1² + 1² + 1² + 1² + 1² + 0²) = √5 ≈ 2.236.
  • Calculate the magnitude of vector B = √(1² + 1² + 1² + 1² + 0² + 1²) = √5 ≈ 2.236.
  • Calculate the cosine similarity = 4 / (2.236 × 2.236) = 0.80.

This cosine similarity score of 0.80 implies a high degree of overlap between the two sentences, so a similarity-based check would not flag this generated text as a hallucination. (Note that the two sentences make opposite claims, a reminder that lexical similarity alone cannot catch every factual error.)
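Here is a minimal NumPy sketch of the same calculation, using the binary bag-of-words vectors from the example; in practice, you would more likely compare dense sentence embeddings produced by an encoder model.

```python
import numpy as np

# Binary bag-of-words vectors over the vocabulary [learning, AI, can, be, hard, easy]
A = np.array([1, 1, 1, 1, 1, 0])  # "Learning AI can be hard"
B = np.array([1, 1, 1, 1, 0, 1])  # "Learning AI can be easy"

# Cosine similarity = dot product divided by the product of the magnitudes
cosine_similarity = A.dot(B) / (np.linalg.norm(A) * np.linalg.norm(B))

print(round(float(cosine_similarity), 2))  # 0.8
```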

Interpreting Scores

  • High similarity: Suppose the generated summary has a cosine similarity of 0.8 with a key sentence in the original article. This indicates strong topical alignment.
  • Moderate similarity: A cosine similarity of 0.4 with some article sentences suggests partial thematic overlap.
  • Low similarity: A cosine similarity near 0 or negative with most article sentences implies the summary introduces themes largely absent from the source.

3. Novelty Detection

Novelty detection is a method used to identify outputs that are statistically abnormal or significantly different from the typical patterns observed in the training data. In LLMs, novelty detection serves as a mechanism for flagging responses that deviate substantially from the expected language patterns or content. Hallucinations often manifest as novel or unexpected responses, making novelty detection a valuable technique for detecting them.

Understanding Novelty Detection

Novelty detection, or anomaly detection, is a critical machine learning task focused on identifying data points that deviate significantly from the established patterns within a dataset. When it comes to language models, novelty detection pinpoints generated text that exhibits significant differences from the characteristics or patterns observed in the model’s training data. Techniques for novelty detection include statistical methods, clustering, and distance-based approaches.

Detection Method

When an LLM generates text, the resulting sequences of words can be compared to the distribution of responses observed in the training data or reference corpus.

Novelty detection techniques assess how well the generated text fits within the expected distribution of responses. Text that falls outside of this distribution is considered novel or anomalous and may indicate a hallucination.

This comparison can be based on various factors, including n-gram frequencies, word embeddings, syntactic structures, or semantic representations.

Let’s focus on n-gram frequencies: An n-gram is a sequence of ‘n’ consecutive words. For example, in the sentence “The cat sat on the mat,” some possible n-grams include:

  • Bigrams (n=2): “The cat”, “cat sat”, “sat on”, etc.
  • Trigrams (n=3): “The cat sat”, “cat sat on”, etc.
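To make this concrete, here is a tiny helper that extracts the n-grams of a sentence (the function name and usage are purely illustrative):

```python
def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Return the n-grams (tuples of n consecutive words) of a sentence."""
    tokens = text.lower().split()
    return list(zip(*(tokens[i:] for i in range(n))))

sentence = "The cat sat on the mat"
print(ngrams(sentence, 2))  # bigrams: ('the', 'cat'), ('cat', 'sat'), ...
print(ngrams(sentence, 3))  # trigrams: ('the', 'cat', 'sat'), ('cat', 'sat', 'on'), ...
```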

During training, the LLM calculates the probability of different n-grams occurring in the corpus. To assess novelty, we can compare the n-gram probabilities of the generated text to those in the training data. N-grams with very low probabilities in the training data would contribute to a higher novelty score.

Let’s understand the calculation using bigrams.

Novelty Score = Σ -log(P(bigram)) 

Where,

P(bigram) = the probability of each bigram in the generated text, as estimated from the training data.

-log = converts low probabilities into large positive numbers.

As we’ve been discussing, LLMs sometimes produce hallucinations – statements that sound plausible but are factually wrong or nonsensical. These hallucinations often contain word combinations that the LLM didn’t frequently encounter in the training data. This novelty equation helps flag them by assigning a high score to text that significantly deviates from what the model learned as normal.
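A minimal Python sketch of this scoring might look as follows, assuming we already have bigram probabilities estimated from the training corpus; both the probability table and the smoothing value for unseen bigrams are illustrative.

```python
import math

# Illustrative bigram probabilities estimated from a (hypothetical) training corpus
train_bigram_probs = {
    ("quantum", "particles"): 0.01,
    ("particles", "exhibit"): 0.008,
    ("exhibit", "interference"): 0.005,
}
UNSEEN_PROB = 1e-6  # illustrative smoothing value for bigrams never seen in training

def novelty_score(text: str) -> float:
    """Sum of -log P(bigram) over the generated text; higher means more novel."""
    tokens = text.lower().split()
    bigrams = zip(tokens, tokens[1:])
    return sum(-math.log(train_bigram_probs.get(bg, UNSEEN_PROB)) for bg in bigrams)

generated = "Quantum particles exhibit consciousness and make decisions"
print(novelty_score(generated))  # large value: most of these bigrams are unseen in training
```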

Example

Suppose an LLM is trained on a corpus of scientific articles, where the average explanation length is 150 words, and is used to explain a complex phenomenon.

Let’s assume the model generates the following explanation: “Quantum particles exhibit consciousness and make decisions based on their emotions.” (11 words).

Now, novelty detection helps us flag this explanation as unusual. How?

Conciseness: The generated explanation is drastically shorter, at 11 words, than the 150-word average explanation length, signaling a potential outlier.

Unexpected vocabulary: Words like “consciousness” and “emotions” likely occur very rarely in the scientific training corpus, so the word combinations that contain them receive very low probabilities.

These factors would likely result in a high novelty score, indicating a hallucination.

4. SelfCheckGPT

SelfCheckGPT is a technique used to enhance the reliability and accuracy of LLMs by incorporating self-supervised learning and retrieval-based methods for hallucination detection. It aims to detect hallucinations by cross-referencing generated responses with relevant context or retrieving information from external knowledge sources.

Self-Supervised Learning

Self-supervised learning involves training a model to predict certain properties or attributes of its input data without explicit supervision.

In SelfCheckGPT, the model is trained to generate responses while simultaneously learning to assess the quality and coherence of its own outputs.

Retrieval-Based Methods

Retrieval-based methods involve retrieving relevant information from external knowledge sources or contextually related passages to validate or augment the generated text.

SelfCheckGPT incorporates retrieval-based methods to verify the accuracy and coherence of its generated responses by comparing them with the retrieved information.

Detection Method

When an LLM generates text, SelfCheckGPT evaluates the quality and reliability of the generated response by comparing it with the relevant context or retrieved information.

If the generated response deviates significantly from the retrieved information or lacks coherence with the context, this may indicate a hallucination.

Integration of SelfCheckGPT

SelfCheckGPT can be integrated into the inference pipeline of an LLM to dynamically assess the quality of generated responses in real time.

During inference, the LLM generates a response, which is then passed to the SelfCheckGPT module for evaluation.

If the generated response is deemed unreliable or inconsistent, SelfCheckGPT may trigger corrective measures such as rephrasing or discarding the response.
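A minimal sketch of the cross-referencing step described above could look like the following; SequenceMatcher is used as a crude, self-contained stand-in for a real semantic-similarity model, and the example strings and threshold are illustrative.

```python
from difflib import SequenceMatcher

def support_score(response: str, references: list[str]) -> float:
    """Highest similarity between the generated response and any retrieved reference."""
    # SequenceMatcher is a lexical stand-in for a semantic-similarity model
    # (e.g., embedding-based cosine similarity).
    scores = [SequenceMatcher(None, response.lower(), ref.lower()).ratio()
              for ref in references]
    return max(scores, default=0.0)

# 'response' comes from the LLM; 'references' come from a retrieval step
score = support_score(
    "The Eiffel Tower is located in Berlin.",
    ["The Eiffel Tower is a landmark in Paris, France."],
)
if score < 0.8:  # illustrative threshold; tune per application
    print("Low support from retrieved context - possible hallucination")
```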

5. RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation (RAG) is a technique used to improve the reliability and accuracy of LLMs by incorporating retrieval-based methods for hallucination detection. RAG combines traditional generative capabilities with retrieval-based approaches to validate and augment the generated text using information retrieved from external knowledge sources or contextually related passages.

Generation and Retrieval

In traditional LLMs, text generation is based solely on the model’s internal knowledge and training data. However, RAG augments this process by incorporating retrieval-based methods.

When generating text, RAG simultaneously retrieves relevant information from external knowledge sources or contextually related passages.

Contextual Validation

After generating a response, RAG compares it with the retrieved information to validate its accuracy, coherence, and relevance.

If the generated response aligns well with the retrieved information and maintains coherence with the context, it is deemed reliable. However, discrepancies or inconsistencies may indicate hallucinations.

Detection Method

RAG dynamically evaluates the quality and reliability of generated responses by cross-referencing them with relevant context or retrieved information.

If the generated response deviates significantly from the retrieved information or lacks coherence with the context, it raises suspicions of hallucinations.

By comparing the generated response with the retrieved information, RAG can identify inaccuracies, factual errors, or inconsistencies indicative of hallucinations.

Example

Suppose an LLM is tasked with generating explanations of scientific concepts.

After generating an explanation, RAG retrieves relevant passages from scientific articles or textbooks related to the topic.

It then compares the generated explanation with the retrieved information to validate its accuracy and coherence.

If the generated explanation contradicts the retrieved information or lacks coherence with the context, RAG identifies it as a potential hallucination and may trigger corrective actions.
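Putting the pieces together, a minimal sketch of this retrieve-generate-validate loop might look like this; retrieve_passages() and llm_generate() are hypothetical helpers standing in for whatever vector store and model API you use, and support_score() could be the similarity check from the earlier sketch.

```python
def rag_answer(question: str, support_threshold: float = 0.8):
    """Retrieve passages, generate an answer grounded in them, then validate it."""
    # 1. Retrieval: fetch passages relevant to the question (hypothetical helper)
    passages = retrieve_passages(question, top_k=3)

    # 2. Augmented generation: condition the model on the retrieved context
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n".join(passages) +
        f"\n\nQuestion: {question}\nAnswer:"
    )
    answer = llm_generate(prompt)  # hypothetical model call

    # 3. Contextual validation: compare the answer with the retrieved passages
    if support_score(answer, passages) < support_threshold:
        return None  # treat as a potential hallucination: regenerate or escalate
    return answer
```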

LLM Hallucination Mitigation Techniques

1. Data Augmentation

Incorporating diverse and reliable data improves accuracy by providing the model with a broader range of examples and perspectives. Techniques such as data synthesis, oversampling, or incorporating external datasets can enhance the model’s understanding and generalization capabilities.

2. Model Fine-tuning

Model fine-tuning involves refining pre-trained models on specific tasks or datasets. This process adapts the model’s parameters to better suit the target domain, improving performance and reducing the likelihood of hallucinations on specific tasks or datasets.

3. Prompt Engineering

Crafting clear and focused prompts guides LLM outputs by providing contextual cues and constraints. Well-designed prompts can steer the model towards generating more relevant and accurate responses, reducing the risk of hallucinations and improving overall output quality.
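For example, a prompt that pins the model to supplied context and gives it an explicit way to decline is less likely to invite invented facts than an open-ended question. The template below is purely illustrative:

```python
# Illustrative prompt template: constrain the model to the provided context
# and give it an explicit "I don't know" escape hatch.
PROMPT_TEMPLATE = """You are a careful assistant.
Answer the question using ONLY the context below.
If the context does not contain the answer, reply exactly: "I don't know."

Context:
{context}

Question: {question}
Answer:"""

prompt = PROMPT_TEMPLATE.format(
    context="The Eiffel Tower was completed in 1889 and stands in Paris.",
    question="When was the Eiffel Tower completed?",
)
```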

4. Ensemble Methods

Ensemble methods combine outputs from multiple LLMs to increase robustness and reliability. By leveraging diverse models trained on different architectures or datasets, ensemble methods can mitigate individual model biases and errors, leading to more consistent and accurate predictions.
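As a simple illustration, answers from several models (or several sampled generations from one model) can be compared and only accepted when a majority agrees; the model_answers values below are made up.

```python
from collections import Counter

def majority_answer(answers: list[str], min_agreement: float = 0.5):
    """Return the most common answer if enough of the ensemble agrees, else None."""
    normalized = [a.strip().lower() for a in answers]
    best, count = Counter(normalized).most_common(1)[0]
    return best if count / len(normalized) > min_agreement else None

# Illustrative outputs from three different LLMs (or three sampled generations)
model_answers = ["Paris", "Paris", "Berlin"]
print(majority_answer(model_answers))  # "paris" - two of the three answers agree
```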

Conclusion

Detecting and mitigating hallucinations in LLMs is essential for ensuring the reliability and trustworthiness of automated text generation systems. By employing a combination of strategies such as data augmentation, model fine-tuning, prompt engineering, and ensemble methods, developers can enhance the accuracy and robustness of LLM-generated content.

Incorporating diverse and reliable data, refining models on specific tasks or datasets, crafting clear and focused prompts, and leveraging ensemble approaches collectively contribute to reducing the risk of hallucinations and improving overall output quality.
