Creating Chatbots with RAG using LangChain
Part 2 of the series "Everything You Need to Know About LangChain (Before Starting an AI Project)"
In the last edition, we began our series on LangChain, one of the most powerful libraries for building applications that combine language models with external data.
In this article, we’ll see how to build a complete chatbot that uses RAG (Retrieval-Augmented Generation) to provide accurate answers based on specific documents.
This new approach is transforming the way AI systems generate responses by combining the power of large language models (LLMs) with the accuracy of specific data sources.
What is RAG?
RAG stands for “Retrieval-Augmented Generation”. The idea is simple but powerful: instead of an LLM responding only based on what it “knows” internally, it retrieves relevant information from a database or documents before generating the answer.
This greatly improves accuracy and contextualization.
This architecture combines two essential components: an information retrieval system and a language generation model.
Imagine a specialist who, before answering a complex question, consults their personal library to find the most accurate and up-to-date information. That’s exactly what RAG does, but in an automated way and at computational speed.
How does RAG work?
The RAG process can be divided into three main stages:
1. Retrieval
When a question is asked, the system first searches a vector database for relevant documents or text passages. This search uses semantic similarity techniques, finding content related to the question’s context, even if it doesn’t contain the exact words.
2. Augmentation
The retrieved documents are then combined with the original question, creating an enriched context. This context provides the language model with specific and up-to-date information about the topic in question.
3. Generation
Finally, the language model uses both its internal knowledge and the retrieved information to generate a precise and contextually relevant answer.
The basic chatbot flow is:
Receive the user’s question
Retrieve relevant documents from a knowledge base (such as a database, PDF, articles, etc.)
Feed these documents and the question to the language model, which generates an answer based on that specific context
Deliver the answer to the user.
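Conceptually, the whole flow fits in a few lines. The sketch below is illustrative pseudocode, not a real LangChain API: rag_answer, retriever, and llm are placeholder names standing in for the components we’ll build in the example later in this article.
# Illustrative sketch of the RAG flow (placeholder names, not a specific API)
def rag_answer(question, retriever, llm):
    # 1. Retrieval: fetch the chunks most relevant to the question
    docs = retriever.get_relevant_documents(question)
    # 2. Augmentation: combine the retrieved text with the original question
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = f"Answer based on the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    # 3. Generation: the model answers using the enriched context
    return llm.invoke(prompt)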
Why use LangChain for this?
LangChain offers ready-to-use components to build this pipeline in a modular and scalable way:
Retriever: to fetch the right information using vectors, embeddings, and specialized databases.
Chain: to connect the retrieval process with the generation model.
Memory: to store context and dialogue history, making the chatbot feel more natural. (we’ll explore this in the future!)
Example – Wikipedia Page
Imagine you have a document base about your product.
With LangChain, you can create a chatbot that, upon receiving a question, searches for the most relevant excerpts and generates a detailed response—even if the question is highly specific.
To illustrate, let’s load the Wikipedia page about the sloth (“bicho-preguiça” in Portuguese) using LangChain’s WikipediaLoader document loader.
from langchain.document_loaders import WikipediaLoader
# Load content from Wikipedia
loader = WikipediaLoader("Sloth", lang="en", load_max_docs=1, doc_content_chars_max=10000)
docs = loader.load()
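A quick sanity check confirms the page was actually loaded (the metadata keys shown are the ones WikipediaLoader typically populates and may vary by version):
# Inspect the loaded document
print(len(docs))                      # expected: 1
print(docs[0].metadata.get("title"))  # page title
print(docs[0].page_content[:200])     # first 200 characters of the article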
Now, we’ll split the retrieved document into smaller parts (chunks) using RecursiveCharacterTextSplitter.
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
texts = text_splitter.split_documents(docs)
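With chunk_size=1000 and chunk_overlap=100 applied to a document capped at 10,000 characters, we should end up with roughly a dozen chunks. A quick check:
# Check how the document was split
print(f"{len(texts)} chunks generated")
print(texts[0].page_content[:120])  # preview of the first chunk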
Let’s generate embeddings (vector representations) for the chunks using the multilingual model sentence-transformers/distiluse-base-multilingual-cased-v2, and store them in a vector database (FAISS).
Next, we create our information retriever.
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/distiluse-base-multilingual-cased-v2")
# Store in FAISS
vectorstore = FAISS.from_documents(texts, embeddings)
# Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
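Before wiring the retriever into a chain, it helps to test it in isolation. The call below uses the classic get_relevant_documents method, consistent with the rest of this example; newer LangChain versions expose retriever.invoke(...) instead.
# Fetch the 3 chunks most similar to a test question
results = retriever.get_relevant_documents("How many sloth species are there?")
for i, doc in enumerate(results, 1):
    print(f"--- chunk {i} ---")
    print(doc.page_content[:150])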
Now, let’s create our chatbot using ChatGoogleGenerativeAI with the gemini-1.5-flash model.
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI
import os
# Configure your Gemini API key
os.environ["GOOGLE_API_KEY"] = ""
# Create the RAG chain with Gemini
llm = ChatGoogleGenerativeAI(model="models/gemini-1.5-flash", temperature=0.3)
# Connect our retriever to the chain
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
# User question
# The answer will be based on the information from the Wikipedia page
question = "What species of sloths exist?"
answer = qa_chain.run(question)
print("Answer:", answer)
The answer:
🤖 Answer: There are six extant sloth species in two genera: *Bradypus* (three-toed sloths) and *Choloepus* (two-toed sloths).
With just a few lines of code, we built a chatbot with access to an external knowledge base!
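If you want to check which excerpts grounded the answer, RetrievalQA accepts a return_source_documents flag; calling the chain with a dict then returns both the answer and the chunks it used:
# Rebuild the chain so it also returns the retrieved chunks
qa_chain = RetrievalQA.from_chain_type(
    llm=llm, retriever=retriever, return_source_documents=True
)
result = qa_chain({"query": question})
print(result["result"])                  # the generated answer
for doc in result["source_documents"]:   # the chunks the answer drew on
    print("-", doc.page_content[:100])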
🚀 To access the code with more LangChain usage examples, check out our Colab Notebooks area, with ready-to-run notebooks! Look for LangChain-chatbot-RAG.ipynb.
Conclusion
The RAG technique represents an important advancement in building more accurate and reliable AI systems. By combining language models with specific data sources, it offers an effective solution to the limitations of traditional AI.
LangChain provides a robust platform for building RAG-powered chatbots, enabling the creation of sophisticated solutions with relatively simple code. Its modular architecture makes it easy to customize and scale the system according to your specific needs.
With the example presented in this article, we have a solid foundation to build chatbots capable of answering precise questions based on our own documents.
Note: This article is intended for educational purposes, with a simple example to facilitate understanding. For production implementations, always consider aspects such as security, scalability, and monitoring.
The next step is to experiment with different configurations, explore other types of document loaders, and consider integrations with web interfaces to create a complete user experience.
This is the second article in our series on LangChain. Stay tuned for the next posts, where we’ll explore how to create chatbots with memory, interface (Streamlit), and agent calls!