Deploying an OpenAI RAG Application to AWS ElasticBeanstalk

Zhenya
Jul 27, 2024
Welcome back to our series on building customer support chatbots using Retrieval Augmented Generation (RAG) with OpenAI in Python. Today, we're diving deeper into creating a chatbot that can efficiently answer questions based on provided links. Our setup will utilize FastAPI for the backend, LangChain RAG for the retrieval and generation process, and LangWatch for monitoring. By the end of this tutorial, you'll have a fully functional chatbot deployed on AWS Elastic Beanstalk.
Here's what you'll need to follow along:
Python 3.11 installed on your machine
An AWS account (sign up here)
Access to LangWatch (get started here)
Familiarity with FastAPI and LangChain
We'll kick things off by setting up our development environment, followed by creating the RAG module, building the FastAPI server, containerizing our application, and finally, deploying it to AWS Elastic Beanstalk. So, let's get started and bring your chatbot to life! 🚀
Step 0 - Setting up the Environment for OpenAI RAG in Python
First, let's install the necessary dependencies for OpenAI RAG in Python. Create a Python virtual environment and install the required libraries:
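A typical setup could look like the following. The exact package list is an assumption based on the tools used later in this tutorial (FastAPI, LangChain, OpenAI, FAISS, LangWatch); adjust versions as needed:

```shell
# Create and activate a virtual environment (Python 3.11)
python3.11 -m venv venv
source venv/bin/activate

# Install the libraries used in this tutorial
pip install fastapi uvicorn langchain langchain-openai langchain-community \
    langchain-text-splitters faiss-cpu beautifulsoup4 python-dotenv langwatch
```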
Next, create a .env file to securely store your API keys:
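For example (the variable names assume the defaults read by the OpenAI and LangWatch SDKs; replace the placeholders with your own keys):

```
OPENAI_API_KEY=sk-...
LANGWATCH_API_KEY=...
```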
Awesome! Now let's write some code.
Step 1 - Creating the RAG Module with OpenAI in Python
We'll start by coding our RAG module with OpenAI in Python. We'll structure our codebase to have a modular RAG component that can be easily imported and used with various input parameters. Additionally, we'll have a FastAPI backend to connect the RAG module with users through an API.
This code composes our RAG pipeline from a few simple functions, each representing a key step of Retrieval Augmented Generation: scraping web pages, creating vector embeddings from them, and building retrievers that fetch related data. The last function performs question answering with the help of the RAG pipeline.
Pay attention to how we add LangWatch tracing in the ask_rag function: we add a decorator at the top of the function definition, and we create a runnable instance with our custom configuration that sends its callbacks to LangWatch.
Step 2 - Building a FastAPI Server for OpenAI RAG
Next, we'll create a FastAPI server that allows users to interact with the RAG module via an API. The server will have a single endpoint where users can make POST requests with two input arguments: the link to scrape and the question they ask.
Step 3 - Containerizing the OpenAI RAG Python App
As a third step, we will make our application deployable as a Docker container. Docker simplifies the deployment process and avoids most dependency issues. Our application will consist of a single Docker container embedding the FastAPI backend.
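A minimal Dockerfile sketch, assuming the server lives in app.py and the dependencies are pinned in a requirements.txt file:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first to take advantage of Docker layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8080
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
```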
Congrats! Now you can run your application as a Docker container. You can build an image with this command, run from the root of the project:
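The image tag rag-chatbot is an arbitrary choice; use any name you like:

```shell
docker build -t rag-chatbot .
```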
Later, you can run the following command and access your application at http://localhost:8080/docs.
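For example, passing the .env file from Step 0 so the container gets your API keys:

```shell
docker run -p 8080:8080 --env-file .env rag-chatbot
```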
Nice progress!
Step 4 - Installing the AWS CLI and Granting Permissions
Once our application is containerized, we can move on to deployment. On macOS, you can install the AWS CLI with this command:
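Using Homebrew (on other platforms, the AWS CLI installers from AWS work just as well):

```shell
brew install awscli
```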
Great! Now you can connect to your AWS account from your terminal. But before doing so, you have to configure the connection with the right user:
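The interactive configuration wizard:

```shell
aws configure
```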
After running this command, you will be prompted for your AWS Access Key ID, Secret Access Key, region, and output format. You can create the access keys from the AWS dashboard under IAM → Users → [Your User]. Make sure to pick the region nearest to you. The output format can be left empty or set to None.
You can then verify your AWS configuration by running this command:
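This prints the account and user ARN the CLI is authenticated as:

```shell
aws sts get-caller-identity
```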
Finally, before deploying your application, ensure your IAM user has the necessary permissions to create and manage Elastic Beanstalk environments. Attach the following policies to your IAM user:
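For example, the AWS-managed AdministratorAccess-AWSElasticBeanstalk policy grants the permissions needed to create and manage environments. As an alternative to the console, you can attach it from the CLI (the user name below is a placeholder):

```shell
aws iam attach-user-policy \
    --user-name your-user \
    --policy-arn arn:aws:iam::aws:policy/AdministratorAccess-AWSElasticBeanstalk
```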
You can attach these policies via the AWS Management Console under IAM → Users → [Your User] → Add permissions → Attach policies directly.
After this part is done, good job! We will have our app accessible on the internet in a few moments.
Step 5 - Deploying OpenAI RAG Python App on AWS Elastic Beanstalk
Finally, let's install the Elastic Beanstalk CLI on our machine.
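The EB CLI can be installed with pip:

```shell
pip install awsebcli
```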
Next, we will initialize Elastic Beanstalk in the root of the project directory. Running this command creates a .elasticbeanstalk folder inside the project. Pay attention: we specify Docker as the platform.
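For example (the application name rag-chatbot is an arbitrary choice, and the region should match the one you configured earlier):

```shell
eb init -p docker rag-chatbot --region us-west-2
```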
After it is initialized, we have to create the corresponding environment for our deployment; I call my environment eb-env. This command will package the application and upload it to Elastic Beanstalk, and it will also create the environment the application runs in.
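```shell
eb create eb-env
```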
Before deploying the application, we also have to set the API keys in our deployed environment:
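Replace the placeholders with your real keys:

```shell
eb setenv OPENAI_API_KEY=sk-... LANGWATCH_API_KEY=...
```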
Finally, you can deploy the app with a simple command:
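```shell
eb deploy
```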
Now, you can check the status of your deployment and find the link to your deployed API:
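```shell
eb status
```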
You should see something like:
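Abridged output; the values, and in particular the CNAME, will differ for your environment:

```
Environment details for: eb-env
  ...
  CNAME: eb-env.eba-hgsmkwpy.us-west-2.elasticbeanstalk.com
  ...
  Health: Green
```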
Here, you need to pay attention to two indicators:
Health - should be Green
CNAME - a public URL for accessing your application.
Finally, you can navigate to eb-env.eba-hgsmkwpy.us-west-2.elasticbeanstalk.com/docs and play with the available APIs.
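For instance, assuming the server exposes its POST endpoint at /ask with a JSON body carrying the link and the question (the path and field names are illustrative), a request could look like:

```shell
curl -X POST "http://eb-env.eba-hgsmkwpy.us-west-2.elasticbeanstalk.com/ask" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "question": "What is this page about?"}'
```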
You've successfully built and deployed a customer support chatbot using OpenAI RAG in Python, FastAPI, and AWS Elastic Beanstalk.
The last step is to unlock the black box, see the results coming out of the LLM, and improve and iterate. For this, we'd be happy to onboard you on LangWatch.
Did you like this tutorial? Let us know your feedback and we'd be happy to support you.
Happy coding!