Deploying an OpenAI RAG Application to AWS ElasticBeanstalk

Zhenya

Jul 27, 2024

Welcome back to our series on building customer support chatbots using Retrieval Augmented Generation (RAG) with OpenAI in Python. Today, we're diving deeper into creating a chatbot that can efficiently answer questions based on provided links. Our setup will utilize FastAPI for the backend, LangChain RAG for the retrieval and generation process, and LangWatch for monitoring. By the end of this tutorial, you'll have a fully functional chatbot deployed on AWS Elastic Beanstalk.

Here's what you'll need to follow along:

  1. Python 3.11 installed on your machine

  2. An AWS account (sign up here)

  3. Access to LangWatch (get started here)

  4. Familiarity with FastAPI and LangChain

We'll kick things off by setting up our development environment, followed by creating the RAG module, building the FastAPI server, containerizing our application, and finally, deploying it to AWS Elastic Beanstalk. So, let's get started and bring your chatbot to life! 🚀

Step 0 - Setting up the Environment for OpenAI RAG in Python

First, let's install the necessary dependencies for OpenAI RAG in Python. Create a Python virtual environment and install the required libraries:

pip install fastapi uvicorn langwatch langchain langchain-openai langchain-community langchain-chroma python-dotenv beautifulsoup4
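The install command above assumes you're working inside an activated virtual environment; if you haven't created one yet, a typical setup on macOS/Linux looks like this:

python3.11 -m venv venv
source venv/bin/activate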

Next, create a .env file to securely store your API keys:

OPENAI_API_KEY=<your-api-key>
LANGWATCH_API_KEY=<your-api-key>

Awesome! Now let's write some code.

Step 1 - Creating the RAG Module with OpenAI in Python

We'll start by coding our RAG module with OpenAI in Python. We'll structure our codebase to have a modular RAG component that can be easily imported and used with various input parameters. Additionally, we'll have a FastAPI backend to connect the RAG module with users through an API.
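Throughout the tutorial we'll assume a minimal project layout like the one below. The file names rag.py and main.py are assumptions on our side, but they match the imports in Step 2 and the Dockerfile command in Step 3:

.
├── rag.py             # the RAG module (this step)
├── main.py            # the FastAPI server (Step 2)
├── requirements.txt   # dependencies (Step 3)
├── Dockerfile         # container definition (Step 3)
└── .env               # your API keys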

import os
from langchain import hub
from langchain_chroma import Chroma
from langchain_core.vectorstores import VectorStoreRetriever
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables.config import RunnableConfig
from langchain_core.runnables.passthrough import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
import dotenv
import langwatch

# Load the API keys from .env; ChatOpenAI and OpenAIEmbeddings read
# OPENAI_API_KEY from the environment automatically.
dotenv.load_dotenv()

openai_key = os.getenv("OPENAI_API_KEY")

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

# Pull a standard RAG prompt template from the LangChain Hub.
prompt = hub.pull("rlm/rag-prompt")



def scrape_url(url: str) -> list:
    # Load the contents of the given web page into LangChain documents.
    loader = WebBaseLoader(
        web_paths=(url,),
        bs_kwargs=dict()
    )
    docs = loader.load()
    return docs

def create_retriever(docs: list) -> VectorStoreRetriever:
    # Chunk the documents, embed the chunks and index them in Chroma.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(docs)
    vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

    # Expose the vector store as a retriever for the RAG chain.
    retriever = vectorstore.as_retriever()
    return retriever


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

@langwatch.trace()
def ask_rag(question: str, retriever: VectorStoreRetriever):
    # Route the chain's LangChain callbacks to the current LangWatch trace.
    config = RunnableConfig(
        callbacks=[
            langwatch.get_current_trace().get_langchain_callback(),
        ]
    )
    question_runnable = RunnablePassthrough().with_config(config=config)
    rag_chain = (
        {"context": retriever | format_docs, "question": question_runnable}
        | prompt
        | llm
        | StrOutputParser()
    )

    answer = rag_chain.invoke(question)
    return answer

This code composes our RAG pipeline from a few simple functions, each corresponding to an important step of Retrieval Augmented Generation: scraping a web page, splitting it into chunks and creating vector embeddings, and building a retriever that can fetch the related chunks. The last function performs question answering with the help of the RAG pipeline.

Pay attention to how we add LangWatch tracing in the ask_rag function: we add a decorator above the function definition and create a runnable with a custom configuration that sends its callbacks to LangWatch.
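Before wiring this up to an API, you can sanity-check the module from a Python shell. This is just a quick sketch, assuming the code above lives in rag.py and your .env file contains the API keys:

from rag import scrape_url, create_retriever, ask_rag

# Any public web page works here; this URL is only an example.
docs = scrape_url("https://en.wikipedia.org/wiki/Retrieval-augmented_generation")
retriever = create_retriever(docs)   # chunk, embed and index the page
print(ask_rag("What is retrieval augmented generation?", retriever=retriever))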

Step 2 - Building a FastAPI Server for OpenAI RAG

Next, we'll create a FastAPI server that allows users to interact with the RAG module via an API. The server exposes a single /ask endpoint where users can make POST requests with two input arguments: the link to scrape and the question they ask.

from fastapi import FastAPI

from rag import scrape_url, create_retriever, ask_rag

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}

@app.post("/ask")
def ask(question: str, link: str):
    # Scrape the page, index it, then answer the question with the RAG chain.
    docs = scrape_url(link)
    retriever = create_retriever(docs)
    response = ask_rag(question=question, retriever=retriever)
    return {"message": response}


Step 3 - Containerizing the OpenAI RAG Python App

As a third step, we'll make our application deployable as a Docker container. Docker simplifies the deployment process and removes most dependency issues. Our application will consist of a single Docker container running the FastAPI backend.

FROM python:3.11

WORKDIR /app

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]

Congrats! Now you can run your application as a Docker container. You can build an image with this command, run from the root of the project:

docker build -t <image-name> .

Then you can run the following command and access your application at http://localhost:8080/docs:

docker run -d --name <container-name> --env-file .env -p 8080:8080 <image-name>
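If the container doesn't respond, its logs are the quickest way to find out why:

docker logs <container-name>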

Nice progress!

Step 4 - Installing the AWS CLI and Granting Permissions

Once our application is containerized, we can move on to deployment. On macOS you can install the AWS CLI with Homebrew:

brew install awscli

Great! Now you can connect to your AWS account from your terminal. But before doing so, you have to configure the connection with the right user:

aws configure

After running this command you will be prompted for an AWS Access Key ID, Secret Access Key, region, and output format. You can create the access keys in the AWS console under IAM → Users → [Your User] → Security credentials. Pick the region closest to you as the input region; the output format can be left empty.

You can later verify your AWS configuration by running this command

aws sts get-caller-identity
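If your credentials are configured correctly, this returns a small JSON document identifying your user, roughly of this shape (values below are placeholders):

{
    "UserId": "<user-id>",
    "Account": "<account-id>",
    "Arn": "arn:aws:iam::<account-id>:user/<your-user>"
}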

Finally, before trying to deploy your application, we need to ensure your IAM user has the necessary permissions to create and manage Elastic Beanstalk environments. Attach the following policies to your IAM user:

AWSElasticBeanstalkFullAccess
IAMFullAccess
AmazonEC2FullAccess
AmazonS3FullAccess

You can attach these policies via the AWS Management Console under IAM → Users → [Your User] → Add permissions → Attach policies directly. (If AWSElasticBeanstalkFullAccess is no longer available in your account, AWS has replaced it with the managed policy AdministratorAccess-AWSElasticBeanstalk.)
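If you prefer the terminal, the same policies can be attached with the AWS CLI, one command per policy, for example:

aws iam attach-user-policy --user-name <your-user> --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess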

Once this part is done, good job! We'll have our app accessible on the internet in a few moments.

Step 5 - Deploying OpenAI RAG Python App on AWS Elastic Beanstalk

First, let's install the Elastic Beanstalk CLI on our machine.

brew install aws-elasticbeanstalk

Next, we'll initialize Elastic Beanstalk in the root of the project directory. Running this command creates a .elasticbeanstalk folder inside the project. Pay attention: we specify docker as the platform.

eb init -p docker ragapp

Once it's initialized, we have to create the environment for our deployment; I call mine eb-env. This command packages the application, uploads it to Elastic Beanstalk, and provisions the environment the application will run in. It can take a few minutes to complete.

eb create eb-env

Before deploying the application, we also have to set the API keys in the deployed environment:

eb setenv OPENAI_API_KEY=your_openai_api_key LANGWATCH_API_KEY=your_langwatch_api_key

Finally, you can deploy the app with a simple:

eb deploy

Now, you can check the status of your deployment and find out the link to your deployed APIs.

eb status

You should see something like:

Environment details for: eb-env
  Application name: ragapp
  Region: us-west-2
  Deployed Version: app-13d3-240710_145838723247
  Environment ID: e-anrxpim3ms
  Platform: arn:aws:elasticbeanstalk:us-west-2::platform/Docker running on 64bit Amazon Linux 2/3.8.3
  Tier: WebServer-Standard-1.0
  CNAME: eb-env.eba-hgsmkwpy.us-west-2.elasticbeanstalk.com
  Status: Ready
  Health: Green

Here, you need to pay attention to two indicators:

  1. Health - should be Green

  2. CNAME - the public URL for accessing your application.

Finally, you can navigate to eb-env.eba-hgsmkwpy.us-west-2.elasticbeanstalk.com/docs (substituting your own CNAME) and play with the available APIs.

You've successfully built and deployed a customer support chatbot using OpenAI RAG in Python, FastAPI, and AWS Elastic Beanstalk.

The last step is to unlock the black box, see the results coming out of the LLM, and improve and iterate. For this, we'd be happy to onboard you on LangWatch.

Did you like this tutorial? Let us know your feedback and we'd be happy to support you.

Happy coding!
