Ever since the debut of ChatGPT, large language models (LLMs) have surged in popularity. They are reshaping the landscape of AI-driven product development and have become a pivotal technology for building LLM-powered applications. Among the tools available for building such applications, the LangChain framework makes it easy to create applications that use LLMs. In this article, I will introduce you to the core components of LangChain. By the end, you will know what LangChain is and how it works.
What is LangChain?
LangChain is a valuable resource for building applications that employ LLMs. Working directly with LLMs can be intricate, but LangChain provides a straightforward interface that streamlines connecting LLMs to your application. Moreover, LangChain can couple LLMs with diverse data sources (such as files, other applications, and API data), extending their insights beyond their inherent general knowledge. This augmentation gives LLMs context-specific information and broadens their scope; integrating various data sources makes your applications more powerful and adaptable. Lastly, LangChain lets LLMs engage with their environment and decide which actions to take next.
LangChain introduces modular abstractions for essential components required to interact with language models. These components are intentionally designed to ensure simplicity in their utilization, whether you are integrating them within the broader LangChain framework or not.
These components include: Schema, Models, Prompts, Indexes, Memory, Chains, and Agents.
In this tutorial, we’ll be using the Python version of LangChain. To follow along, you should have the langchain Python package installed, along with all necessary API keys ready for use.
To install the langchain Python package, simply use the following pip command:
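A minimal install command looks like the following (adding the `openai` package as well, since the examples in this article assume the OpenAI provider; exact package names and versions may differ in newer releases):

```shell
pip install langchain openai
```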
Given the GitHub repository’s high activity level, make sure you’re using an up-to-date version.
Developing an application using LLMs necessitates the use of API keys for certain services, some of which may come with associated costs. The choice of LLM provider marks the initial step, requiring an API key for the chosen provider. For AI developers, the decision typically revolves around the trade-off between performance and expenditure, leading to the choice between proprietary and open-source foundation models.
Proprietary models are closed-source foundation models from companies such as OpenAI and Cohere, built by large expert teams with substantial AI budgets. Generally larger in scale than open-source models, they deliver stronger performance, albeit often through more costly APIs. In this article, we’ll be using OpenAI. It’s worth emphasizing that although the OpenAI API is affordable for experimentation, it is not free of charge. To obtain an OpenAI API key, you need an OpenAI account; create a “New secret key” within the API keys section. Open-source models like BLOOM and LLaMA are typically smaller, with somewhat diminished capabilities compared to proprietary alternatives, but they present a more cost-effective choice. A multitude of open-source models are hosted on the Hugging Face model registry, which serves as a communal hub. To acquire a Hugging Face API key, you need a Hugging Face account; generate a “New token” within the Access Tokens section.
Components of LangChain
Within the framework of LangChain, the Schema pertains to the fundamental data types and structures utilized across the codebase. It encompasses four core components: Text, ChatMessages, Documents, and Examples.
- Text: Engaging with language models predominantly occurs through text, which serves as the primary conduit for interaction. Many models follow a “text in, text out” paradigm, effectively positioning text at the core of their operation. As a result, LangChain’s design places notable importance on text-related interfaces: text is the channel through which user input is received and model output is delivered.
- ChatMessages: End users primarily engage with these systems through a chat interface. Certain model providers have even begun offering access to their underlying APIs in a manner that anticipates chat messages. These messages typically encompass a content field linked to a user designation, commonly consisting of text. Presently, the supported user categories are System, Human, and AI. Chat messages enable the AI to formulate responses grounded in the provided context.
- SystemChatMessage: A message providing instructions to the AI system.
- HumanChatMessage: A message from a human interacting with the AI system.
- AIChatMessage: A message from the AI system.
- Documents: A Document is unstructured data made up of page content (the actual information) and metadata (extra details describing the data’s attributes). The Document schema allows the AI system to handle and study unstructured data, which is a significant portion of real-world information. Through comprehension of unstructured data, AI systems gain valuable insights and improve prediction accuracy. Documents present text segments combined with related metadata.
- Examples: Examples are input and corresponding output pairs, representing inputs for a function and their expected results. These pairs serve roles in both model training and evaluation. The system learns to generate correct outputs for specific inputs by showcasing instances of accurate input-output associations. This forms a crucial element of AI system training and is pivotal in assessing system performance.
Models serve as the powerhouse of AI-driven applications. This component encompasses three distinct model types: Language Models, Chat Models, and Text Embedding Models.
Language Models: Within LangChain, models furnish an interface for Language Models. These specialized Language models are tailored to operate seamlessly with textual data, serving as both input and output. The landscape is currently witnessing a proliferation of diverse LLMs (both open-source and proprietary). LangChain facilitates integrations with an extensive spectrum of models while providing a unified and simplified interface to manage them. See the list of supported LLMs in LangChain. Let’s see an example where we use an LLM with text inputs:
Chat models: Chat models process sequences of messages to produce structured message outputs, catering to well-organized interactions. Their core function is to manage lists of chat messages, taking such a list as input and generating a chat message as output. This proves invaluable for applications with interactive chat interfaces, where conversation flow must be handled adeptly and responses developed in real time. Chat models engage effectively with chat messages and can exhibit heightened creativity and adaptability depending on their configuration. See the list of supported chat models in LangChain. Let’s see an example where we implement this with text inputs and the chat schema below:
Text Embedding: They accept textual input and convert it into a series/vector of floating-point numbers, effectively converting language into numerical forms. These numerical representations, known as embeddings, are strategically crafted to encapsulate the semantic essence of the input text. Through this translation of language into numeric form, these models enable machines to handle and comprehend text in a more abstract yet potent manner. Text embedding models transform text input into a list of floating-point numbers (embeddings), thus presenting a numerical portrayal of the input text. These embeddings play a pivotal role in extracting information from textual data. This proves indispensable in tasks demanding semantic comprehension or text comparison. See a list of various embedding models in LangChain. Let’s see an example where we implement this below:
A prompt serves as an instruction for an LLM. LLMs have democratized AI interaction – now, instead of code, prompts in natural language allow anyone to engage with AI. In essence, prompting involves encapsulating your intention within a natural-language query, which guides the model to generate the intended response. The prompt components within LangChain assist in directing our models through text instructions. When you devise an effective prompt that elicits the desired LLM output, you may use it as a template for various purposes. This is where LangChain introduces “PromptTemplates,” which facilitate the construction of prompts from different components. Through the utilization of prompt templates, LangChain streamlines prompt management and optimization.
Prompt Templates: Think of the Prompt template as a structured framework or blueprint designed for prompts. This framework features placeholders, which can be filled in with specific details or examples. Opting for a prompt template offers numerous advantages compared to manually crafting prompts with f-strings. This approach permits the efficient reuse of prompts when applicable. There are two primary categories of prompt templates: text prompt templates and chat prompt templates. Text Prompt Templates require a string of text as input. Conversely, Chat prompt templates necessitate a list of chat messages as input. Each chat message is assigned a role, which can be one of System, Human, or AI. Prompt templates can be approached from both a zero-shot and a few-shot perspective. In the zero-shot context, you rely on the assumption that the LLM has received adequate training on pertinent data to generate satisfactory responses. On the other hand, in the few-shot scenario, you augment the prompt by including a handful of examples to improve the quality of the LLM’s output. Please note that these prompt templates can accept multiple input variables, as demonstrated in the provided examples. Certain examples will involve the utilization of LangChain’s chain component, which we’ll delve into more deeply later. However, consider them as a means of automatically combining various LLM calls and actions or even just combining different components seamlessly.
- Example Selectors: We can go a step further by selecting which examples to incorporate into prompts. To enhance the prompt creation process, LangChain introduces ExampleSelectors, which enable a more dynamic and context-aware approach. While it’s feasible to hardcode examples into prompts, greater efficacy is often achieved when examples are chosen dynamically. An ExampleSelector takes user input and returns a list of examples to incorporate. Strategies the example selector can employ to dynamically select examples for the prompt include:
- Select based on example length
- Select based on maximal marginal relevance
- Select based on n-gram overlap
- Select based on similarity
Let’s explore one of these strategies, and feel free to experiment with the others as well.
Output Parser: While language models mainly yield textual results, there are times when a more organized structure is needed. These output parsers are dedicated classes tailored to transform and order responses systematically. Every OutputParser should have two core methods: “get_format_instructions” and “parse.” The “get_format_instructions” method offers a guideline, typically in a string, indicating the desired structure of the model’s output. Conversely, “parse” takes a string—usually the response from the language model—and converts it into an organized format. Additionally, an optional “parse_with_prompt” method can be incorporated. It utilizes the model’s response and the initial prompt to produce structured outcomes. Some of the parsers include list, json, datetime, enum, auto-fixing, retry, and structured output parsers.
The limitation of LLMs lies in their inherent lack of contextual information, particularly when it comes to accessing specific details. Despite an LLM’s extensive pre-trained knowledge base, you can improve contextual alignment with your specific use case by incorporating supplementary context within the prompt. This gap can be addressed by granting LLMs access to pertinent external data. Often, this context is supplied as documents or data obtained from external sources, and indexes play a pivotal role here. Indexes frequently serve as the link between documents and the model, offering a simplified interface for both structured and unstructured data.
Document loader: Document Loaders serve to retrieve data from a diverse spectrum of external sources. A Document encapsulates textual content paired with its corresponding metadata. Within LangChain, three overarching categories of document loaders are available:
- Transform loaders: These loaders convert data from specific formats, such as CSV, PDF, SQL, etc., into the Document format.
- Public dataset or service loaders: Tailored for particular public web services like Wikipedia and YouTube.
- Private dataset or service loaders: Designed for non-public datasets and services like Google Drive, AWS S3, and Slack, these loaders necessitate authenticated access to the resources.
Document Loaders play an essential role in collecting documents from extensive sources. A comprehensive list of these loaders can be found here.
Text splitter: In many instances, documents can become overly lengthy, requiring segmentation into manageable sections. When confronted with extensive text sources, dividing the text into chunks becomes essential before loading. However, maintaining semantic coherence within these segments is crucial. This is where text splitters come into play. Using text splitters, you can fragment a document into smaller, more digestible pieces, enabling the model to process the content more efficiently. The process involves breaking the text into compact, semantically meaningful chunks while maintaining context through overlap between these chunks. Several text split strategies are available, including code-based splitting, character-based splitting, token-based splitting, recursive character splitting, and markdown header-based splitting.
Retriever: Retrievers connect documents with language models, enabling a language model to query stored documents. They feature a single method, “get_relevant_documents,” which accepts the user’s query and returns a list of pertinent documents, making them ideal for Q&A.
Vector store: As we observed earlier, after segmenting a substantial collection of documents into smaller, semantically connected text chunks, it is imperative to preserve this “relatedness” data for subsequent queries or across different scenarios. This degree of relatedness is captured by an embedding, typically maintained within a vector store or database. One of the prevailing methods for storing and searching over unstructured data involves embedding the data and retaining the resulting embedding vectors. At query time, the unstructured query is embedded, and the embedding vectors most closely aligned to the embedded query are retrieved. A vector store streamlines storing embedded data and executing vector-based searches. It’s simply a database that stores embedding vectors and makes them easily searchable. LangChain offers wrappers for many vector stores, including FAISS, Chroma, Pinecone, Milvus, and more. See the full list of supported vector stores in LangChain.
At its core, a conversational system must have the capability to access historical messages, as this is pivotal for effective interactions. LangChain’s memory component facilitates the storage and retrieval of chat history. One might ponder: why not use the memory as a part of a prompt? Such an approach would unnecessarily elongate our prompt, which is inefficient. This component adeptly handles the storage and retrieval of conversations. It can maintain the complete dialogue, focus on the most recent interactions, provide a summary, extract, and showcase details from archived entities when mentioned, or employ a tailored strategy. The memory component ensures that incoming questions aren’t processed in isolation but are cross-referenced with prior interactions. These strategies span short-term and long-term memory, tailored to diverse application needs. Short-term memory revolves around maintaining context within a single discussion, often encompassing prior messages or their summaries. On the other hand, long-term memory focuses on accessing and updating data across multiple chats, which is where sophisticated systems like vector stores become essential. Memory is vital for a consistent and relevant conversational AI experience. Short-term memory maintains context in current chats, while long-term memory recalls insights from past conversations, enabling deeper, personalized interactions.
LangChain provides tools to add memory to a system. These tools can work on their own or be combined into a chain. A memory system handles two tasks – reading and writing. Every chain operates based on specific inputs. Some inputs come directly from the user, while others are retrieved from memory. In a typical operation, a chain interacts with its memory twice:
- After getting user inputs but before processing them, the chain READS from memory to enhance these inputs.
- After the main processing, but before giving the final answer, the chain WRITES the session’s data to memory for later use.
LangChain offers a diverse range of memory types that can be tailored to fit your specific needs when integrated into a chain:
- Conversation Buffer Memory: Retains specific messages, allowing extraction to variables.
- Conversation Buffer Window: Preserves the last ‘k’ interactions.
- Entity Memory: Stores details about distinct entities.
- Conversation Knowledge Graph Memory: Utilizes a knowledge graph to reconstruct memory.
- Conversation Summary Memory: Keeps an evolving summary of conversations.
- Conversation Summary Buffer Memory: Keeps a buffer of the most recent interactions (bounded by token length) and folds older interactions into a running summary rather than discarding them.
- Conversation Token Buffer Memory: Saves the last ‘x’ interactions, gauged by token size.
- Vector Store-Backed Memory: Archives interactions within a vector database.
Furthermore, LangChain introduces the ChatMessageHistory class for managing memory externally to the chain, allowing for preserving both Human and AI messages and subsequent retrieval. Lastly, LangChain seamlessly integrates with third-party databases and tools for enhanced versatility.
The name “LangChain” is a fusion of “Lang” and “Chain,” underscoring the significance of chains within the LangChain framework. A chain serves as a comprehensive conduit, seamlessly linking multiple LangChain components. It facilitates the automatic amalgamation of various LLM calls and actions. This was evident in our previous examples, where we integrated LLMs with prompt templates.
- LLMChain: An LLMChain is the most common type of chain. It comprises a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain accepts various input variables and employs the PromptTemplate to craft them into a prompt. This prompt is then fed to the model. If an OutputParser is included, it refines the LLM’s output into a definitive format. We’ve demonstrated this approach in earlier examples.
- Index-related chains: Chains in this category facilitate interactions with indexes, merging our data housed in the indexes with LLMs. As LangChain seeks more efficient ways to relay multiple documents to the language model, this domain remains a hotbed of research. Currently, LangChain embraces four techniques:
- Stuffing: This approach involves embedding all pertinent data directly into the prompt, serving as context for the language model.
- Map Reduce: Here, an initial prompt processes each data chunk. Subsequently, another prompt integrates all the initial outputs.
- Refine: This method starts with an initial prompt on the first data chunk, yielding some output. This output, alongside the next document, is presented for subsequent documents, prompting the LLM to refine its output based on the new information.
- Map-Rerank: Each data chunk undergoes an initial prompt that attempts to fulfill the task and assigns a confidence score to its answer. Responses are ranked by these scores, and the top-ranked answer is returned.
Let’s delve into two prominent chaining methods: SimpleSequentialChain and load_summarize_chain. The SimpleSequentialChain allows for straightforward chaining, where the output from one component becomes the input for the next. load_summarize_chain, on the other hand, is tailored for efficiently processing extensive documents to produce summaries.
Agents utilize LLMs to select and sequence their actions. An LLM’s role isn’t limited to generating text; it is also pivotal in informed decision-making. Echoing OpenAI CEO Sam Altman’s observation, LLMs effectively act as robust “reasoning engines.” While chains consist of predefined sequences of actions, agents use the LLM’s analytical prowess to dynamically decide which actions to take and in what order. Key concepts associated with agents include:
- Tool: Functionality called upon by an agent, symbolizing its capability. This layer of abstraction streamlines the interaction between LLMs and agents. It is essential to equip agents with the appropriate tools and describe these tools in ways that maximize their utility for the agent.
- Toolkit: A curated set of tools accessible to an agent. These assemblies, when used in tandem, can effectively accomplish distinct tasks. By possessing diverse tools, a toolkit ensures the agent can tackle tasks using the most fitting method.
- Agent Executor: The driving force behind an agent’s interaction with tools. In essence, it’s the agent’s operational backbone. It implements the decisions made by the agent, directs tasks to the correct tools, and manages the resulting data, updating the agent’s status accordingly.
LangChain classifies its agents into two main categories, as described in its official documentation:
- Action Agents: These agents leverage LLMs to determine the sequence and type of actions: either using a tool and observing its result, or responding to the user. Here are the diverse action agents LangChain offers:
- Zero-shot ReAct
- Structured Input ReAct
- OpenAI Functions
- Self-ask with Search
- ReAct Document Store
- Plan-and-execute Agents: These agents strategize before embarking on a series of actions.
Consider an instance where we aim to use an LLM to tackle a mathematical problem. Given that LLMs can sometimes produce inaccurate results or “hallucinate,” we enhance accuracy by incorporating an agent equipped with a calculator.
This article has provided a thorough overview of LangChain’s core components: Schema, Models, Prompts, Indexes, Memory, Chains, and Agents. While the library boasts a broader range of features than covered here, rapid advancements may render some details in this piece outdated. Nonetheless, the foundational concepts remain consistent. LangChain empowers anyone with coding skills to create applications powered by LLMs.
I am looking forward to seeing what you will be creating!