How to build a RAG application from scratch with the least possible AI hallucinations

Zhenya

May 14, 2024

Imagine a world where AI assistants can not only answer your questions but also provide insightful summaries, craft compelling creative text formats, and even access and integrate the latest information. This is the power of Retrieval-Augmented Generation (RAG) applications.

Traditionally, LLMs have revolutionized tasks like question answering and content creation. However, their reliance on pre-trained data can limit their access to the most recent information and domain-specific knowledge. RAG applications bridge this gap by combining the best of both worlds: the information retrieval capabilities of search engines and the powerful text generation abilities of LLMs.

This article empowers you to take control and build your own RAG application. We'll walk you through the step-by-step process, from understanding the core components to implementing them for real-world use.

Building Your RAG Application: A Step-by-Step Guide

The world of RAG applications is no longer just for tech giants. With the right tools and guidance, you can build your own intelligent assistant! Here's a breakdown of the key steps involved:

Step 1: Setting up the Environment

Before we jump into building the RAG application, we need to set up the environment. We will install several Python packages, including LangChain, OpenAI, and FastAPI. We will also use the standard-library os module together with python-dotenv to load and access API keys.

First of all, make sure you have Python 3 installed on your machine. If so, create a virtual environment and activate it.
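
On macOS or Linux, for example, you can do that with:

python3 -m venv .venv
source .venv/bin/activate

Then you can install the required packages with this command: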

pip install langchain openai "fastapi[standard]" python-dotenv langchain-openai langchain-community langchain-chroma langchain-text-splitters

After that you can create a .env file in your project repository and write there the API keys that will be used:

OPENAI_API_KEY="<copy-paste-your-api-key-here>"

Well done! Now you can start writing the code!

Step 2: Building RAG with LangChain

One of the most popular frameworks for developers who want to build applications with LLM capabilities is LangChain. It lets you connect an LLM to external data sources, such as web pages, so that responses are grounded in up-to-date, domain-specific information rather than only the model's training data.

LangChain offers several ways to build a RAG pipeline; here we will compose a simple RAG chain using the LangChain Expression Language. Below are the functions that set it up.

The code below sets up a retrieval-based question-answering pipeline. It consists of five steps: loading the environment variables and the OpenAI API key, fetching a web page and scraping its contents, splitting the text into manageable chunks, embedding each chunk and indexing the embeddings in a Chroma vector store, and finally building a RAG chain that retrieves the chunks most relevant to a user's question and passes them, together with the question, to an OpenAI model to generate an answer.

import os
from langchain import hub
from langchain_chroma import Chroma
from langchain_core.vectorstores import VectorStoreRetriever
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
import dotenv
from langchain_openai import ChatOpenAI

dotenv.load_dotenv()

# ChatOpenAI and OpenAIEmbeddings pick up OPENAI_API_KEY from the environment automatically.
openai_key = os.getenv("OPENAI_API_KEY")

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

# Pull a standard RAG prompt template from the LangChain Hub.
prompt = hub.pull("rlm/rag-prompt")



def scrape_url(url: str) -> list:
    # Load the contents of the page at the given URL into LangChain documents.
    loader = WebBaseLoader(web_paths=(url,))
    docs = loader.load()
    return docs

def create_retriever(docs: list) -> VectorStoreRetriever:
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(docs)
    vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

    # Expose the vector store as a retriever so the chain can fetch the most relevant chunks.
    retriever = vectorstore.as_retriever()
    return retriever


def format_docs(docs):
    # Join the retrieved documents into a single context string for the prompt.
    return "\n\n".join(doc.page_content for doc in docs)

def ask_rag(question: str, retriever: VectorStoreRetriever) -> str:
    # Compose the chain: retrieve context, format it, fill the prompt, call the LLM, and parse the output to text.
    rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

    answer = rag_chain.invoke(question)
    return answer

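To sanity-check the pipeline before wiring it into an API, you can call these functions directly from a Python shell or a small script. A minimal example, assuming the code above is saved as rag.py and the .env file is in place:

from rag import scrape_url, create_retriever, ask_rag

# Scrape a page, index it, and ask a question against it.
docs = scrape_url("https://docs.langwatch.ai/concepts")
retriever = create_retriever(docs)
print(ask_rag("What is a thread?", retriever=retriever))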

Step 3: Integrating with the Backend

A RAG application also needs a backend that connects the LLM to its users. This is where FastAPI comes in: a fast, efficient Python web framework. You can create a complete FastAPI application in a single file.

Code Snippet

from fastapi import FastAPI

# Helper functions from Step 2, saved as rag.py in the same directory.
from rag import scrape_url, create_retriever, ask_rag

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}

@app.post("/ask")
def ask(question: str, link: str):
    docs = scrape_url(link)
    retriever = create_retriever(docs)
    response = ask_rag(question=question, retriever=retriever)
    return {"message": response}

You can start the FastAPI server by running the command below in the terminal (save the code above as main.py and the Step 2 code as rag.py, since main.py imports from it):

fastapi dev main.py

Bingo! Your app's interactive API documentation is now available at http://localhost:8000/docs. You can chat with your link's data by sending a POST request to the /ask endpoint with the question and link query parameters.

Try giving https://docs.langwatch.ai/concepts as a data source to the RAG and asking it what spans and threads are. Spoiler: they are important concepts for LLM observability that you might need later.
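
For example, with the server running locally and the requests package installed (pip install requests), a quick test from Python could look like this:

import requests

# Send the question and link as query parameters to the /ask endpoint.
response = requests.post(
    "http://localhost:8000/ask",
    params={
        "question": "What are spans and threads?",
        "link": "https://docs.langwatch.ai/concepts",
    },
)
print(response.json()["message"])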

Evaluating The Quality of Your RAG

Now that the development phase is over, let's talk about evaluating your RAG application with LangWatch. LangWatch offers a whole range of tools to help you understand and improve your LLM applications: you can monitor quality, gain insights into user activity, and examine individual interactions.

With LangWatch's Evaluations, you can better understand user interactions and sentiment and pinpoint areas that need improvement. Alongside evaluation criteria such as reliability and faithfulness scores, the platform provides tools for assessing quality and performance, including detection of jailbreak attempts and biased outputs, as well as real-time mitigation of hallucinated responses.

Reach out to us at contact@langwatch.ai or book a demo at https://get.langwatch.ai/request-a-demo, and we'll be happy to help you further.
