WIP: Feature: Create a CLI to run or deploy any Streamlit POC #327
flamingquaks committed Oct 29, 2024
1 parent 2783e60 commit efa34b3
Showing 20 changed files with 1,326 additions and 27 deletions.
43 changes: 43 additions & 0 deletions ADDING_A_NEW_POC.md
@@ -0,0 +1,43 @@
# Adding a new POC to the GenAI-QuickStart-POCs repo.

The following guidance is applicable to **python** POCs only. If you are developing a .NET POC, development is via Visual Studio. For .NET, please follow design patterns currently present in `genai-quickstart-pocs-dot-net/Genai.Quickstart.Pocs`.

## Using Projen to manage Python Projects.

As the GenAI-QuickStart-POCs repo has grown, the maintainers group has decided to introduce **Projen** to define a project structure for POCs and allow new POCs to be added programmatically, ensuring consistent developer & user experiences, consistent documentation, & ease of long-term management of the repository.

To learn more about Projen, visit [https://projen.io/docs/introduction/](https://projen.io/docs/introduction/)

## Synthesizing the base POC

- Navigate to `.projenrc.ts` in the root of the repository. This file contains the definitions of all python projects.
- Following the pattern of existing repositories, define the new streamlit POC. The definition allows for details to be provided that will be added to the README template. This includes details such as additional prerequisites, POC goal, file walkthrough, extra steps the user needs to follow to run the POC, and more.
A new POC at its most basic would look like this:
```typescript
new StreamlitQuickStartPOC({
parentProject: project,
pocName: 'Amazon Bedrock Video Chapter Creator POC',
pocPackageName: 'amazon-bedrock-video-chapter-creator-poc',
additionalDeps: ['langchain@^0.1', 'pandas', 'opensearch-py', 'thefuzz'],
pocDescription: 'This is sample code demonstrating the use of Amazon Transcribe, Amazon OpenSearch Serverless, Amazon Bedrock and Generative AI, to a implement video chapter generator and video search sample. The application is constructed with a simple streamlit frontend where users can upload a video that will be stored, transcribed and have searchable chapters generated. Additionally, if you have videos already uploaded to S3 and have subtitles for the video already created in `.srt` format, you can skip transcribing and jump straight into generating chapters.\n\nThe sample also includes a second UI that allows the user to ask about a topic. This will search the video chapters from the videos you\'ve provided and provide a video, set to a specific chapter, that was the closest match to the inquiry.',
});
```
This is a basic definition. To add more information to the `README` file, add the `readme` property as defined in `StreamlitQuickStartPOCProps` in `projenrc/projects/streamlit-quickstart-poc.ts`.
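As a sketch, a definition that also populates the `readme` property might look like the following. The exact shape of the property is defined in `StreamlitQuickStartPOCProps` in `projenrc/projects/streamlit-quickstart-poc.ts`, so treat the field names below as illustrative, not authoritative:

```typescript
// Illustrative only — check StreamlitQuickStartPOCProps in
// projenrc/projects/streamlit-quickstart-poc.ts for the real field names.
new StreamlitQuickStartPOC({
  parentProject: project,
  pocName: 'My New POC',
  pocPackageName: 'my-new-poc',
  pocDescription: 'A short description used in the generated README.',
  readme: {
    // e.g. extra prerequisites, a file walkthrough, or additional
    // steps the user must follow before running the POC
  },
});
```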

- Once you've defined the POC, open your terminal and navigate to the root of this repository. Execute the following command:
```shell
npx projen
```
If prompted to install `projen`, proceed to install it.
- After the command execution completes, you will see a new folder and contents added to `genai-quickstart-pocs-python`. Open this folder to see the base POC added. The POC is now ready to be developed.

## Developing the POC

Generally speaking, developing the POC should be no different with or without projen.
The following are the exceptions you should note:

- **If you need to add python dependencies**: `projen` is responsible for managing the python dependencies and generates the `requirements.txt` file. Do not modify the `requirements.txt`. To add dependencies, locate the POC in the `.projenrc.ts` file and add `additionalDeps`, which is a string array. You can just add the dependency as a string to the array or you can additionally include the version number, like `langchain@^0.2`. After you've updated the dependencies, re-run `npx projen` to refresh dependencies. Return your terminal to the POC directory and follow the README guidance for installing the dependencies.
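As a sketch, adding dependencies to an existing POC definition in `.projenrc.ts` might look like this (POC name and dependencies below are placeholders):

```typescript
new StreamlitQuickStartPOC({
  parentProject: project,
  pocName: 'My New POC',
  pocPackageName: 'my-new-poc',
  // Plain package name, or pin a version with the name@version form:
  additionalDeps: ['pandas', 'langchain@^0.2'],
});
```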

- Developing the README file: `projen` automates the creation of the `README` file. If you want to add details to the README, it is recommended to define them in the `.projenrc.ts` file. This allows projen to keep README files consistent. That being said, once a README file is generated in a POC, it will NOT be overwritten by `projen`. If you add more details to `.projenrc.ts`, you can delete the README in the POC directory to have it regenerated on the next `npx projen`.

- README walkthrough GIF & architecture image:
```diff
@@ -19,8 +19,7 @@
 llm = Bedrock(
     credentials_profile_name=os.getenv("profile_name"),
     model_id="amazon.titan-text-express-v1",
-    endpoint_url="https://bedrock-runtime.us-east-1.amazonaws.com",
-    region_name="us-east-1",
+    region_name=os.getenv("region_name"),
     verbose=True
 )
```
@@ -0,0 +1,115 @@
# Amazon-Bedrock-Amazon-RDS-POC

This is sample code demonstrating the use of Amazon Bedrock and Generative AI to use natural language questions to query relational data stores, specifically Amazon RDS. This example leverages the MOMA Open Source Database: https://github.com/MuseumofModernArt/collection.

![Alt text](images/demo.gif)

# **Goal of this Repo:**

The goal of this repo is to provide users the ability to use Amazon Bedrock and generative AI to take natural language questions, and transform them into relational database queries against Amazon RDS Databases. This repo is designed to work with
Amazon RDS Postgres, but can be configured to work with other database engine types.
This repo comes with a basic frontend to help users stand up a proof of concept in just a few minutes.

The architecture and flow of the sample application will be:

![Alt text](images/architecture.png "POC Architecture")

When a user interacts with the GenAI app, the flow is as follows:

1. The user makes a request, asking a natural language question based on the data in Amazon RDS to the GenAI app (app.py).
2. This natural language question is passed into Amazon Bedrock, which takes the natural language question and creates a SQL query (amazonRDS_bedrock_query.py).
3. The created SQL query is then executed against your Amazon RDS database to begin retrieving the data (amazonRDS_bedrock_query.py).
4. The data is retrieved from your Amazon RDS Database and is passed back into Amazon Bedrock, to generate a natural language answer based on the retrieved data (amazonRDS_bedrock_query.py).
5. The LLM returns a natural language response to the user through the streamlit frontend based on the retrieved data (app.py).
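The five steps above amount to a small pipeline. As a schematic only (the function names below are illustrative stand-ins, not the actual API of amazonRDS_bedrock_query.py):

```python
# Schematic of the question -> SQL -> answer flow described above.
# The three callables are stand-ins for the Bedrock and RDS interactions.
def answer_question(question, generate_sql, run_sql, summarize):
    sql = generate_sql(question)       # step 2: Bedrock turns NL into SQL
    rows = run_sql(sql)                # step 3: query runs against Amazon RDS
    return summarize(question, rows)   # steps 4-5: Bedrock answers in NL
```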

# How to use this Repo:

## Prerequisites:

1. Amazon Bedrock Access and CLI Credentials. Ensure that the proper FM model access is provided in the Amazon Bedrock console
2. Amazon RDS Access and the ability to create a database.
3. Ensure Python 3.10 is installed on your machine; it is the most stable version of Python for the packages we will be using. It can be downloaded [here](https://www.python.org/downloads/release/python-3100/).
4. Please note that this project leverages the [langchain-experimental](https://pypi.org/project/langchain-experimental/) package which has known vulnerabilities.

## Step 1:

The first step of utilizing this repo is performing a git clone of the repository.

```
git clone https://github.com/aws-samples/genai-quickstart-pocs.git
```

After cloning the repo onto your local machine, open it up in your favorite code editor. The file structure of this repo is broken into 4 key files:
the app.py file, the amazonRDS_bedrock_query.py file, the moma_examples.yaml file, and the requirements.txt. The app.py file houses the frontend application (streamlit app).
The amazonRDS_bedrock_query.py file contains connectors into your Amazon RDS instance and the interaction with Amazon Bedrock through LangChain's SQLDatabaseChain.
The moma_examples.yaml file contains several sample prompts that will be used to implement a few-shot prompting technique. Last, the requirements.txt
file has all the requirements needed to get the sample application up and running.

## Step 2:

Set up a python virtual environment in the root directory of the repository and ensure that you are using Python 3.10. This can be done by running the following commands:

```
pip install virtualenv
python3.10 -m venv venv
```

The virtual environment will be extremely useful when you begin installing the requirements. If you need more clarification on the creation of the virtual environment please refer to this [blog](https://www.freecodecamp.org/news/how-to-setup-virtual-environments-in-python/).
After the virtual environment is created, ensure that it is activated, following the activation steps of the virtual environment tool you are using. For the `venv` created above, on macOS/Linux:

```
source venv/bin/activate
```

On Windows, use `venv\Scripts\activate` instead.

After your virtual environment has been created and activated, you can install all the requirements found in the requirements.txt file by running this command in the root of this repo's directory in your terminal:

```
pip install -r requirements.txt
```

## Step 3:

Now that all the requirements have been successfully installed in your virtual environment we can begin configuring environment variables.
You will first need to create a .env file in the root of this repo. Within the .env file you just created you will need to configure the .env to contain:

```
profile_name=<aws_cli_profile_name>
rds_username=<rds_database_username>
rds_password=<rds_database_password>
rds_endpoint=<rds_database_endpoint>
rds_port=<rds_port>
rds_db_name=<rds_database_name>
```
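As an illustration of how these variables might be consumed, here is a hypothetical helper (the actual code in amazonRDS_bedrock_query.py may differ) that assumes the .env values have been loaded into the process environment, e.g. via python-dotenv:

```python
import os

def rds_connection_uri():
    """Assemble a SQLAlchemy-style Postgres URI from the .env variables.

    Illustrative helper only; assumes the variables listed above are
    present in the process environment.
    """
    return (
        "postgresql+psycopg2://"
        f"{os.environ['rds_username']}:{os.environ['rds_password']}"
        f"@{os.environ['rds_endpoint']}:{os.environ['rds_port']}"
        f"/{os.environ['rds_db_name']}"
    )
```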

Please ensure that your AWS CLI Profile has access to Amazon Bedrock!

Depending on the region and model that you are planning to use Amazon Bedrock in, you may need to reconfigure lines 19-25 in the amazonRDS_bedrock_query.py file:

```
llm = Bedrock(
credentials_profile_name=os.getenv("profile_name"),
model_id="amazon.titan-text-express-v1",
endpoint_url="https://bedrock-runtime.us-east-1.amazonaws.com",
region_name="us-east-1",
verbose=True
)
```
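If you would rather not hard-code the region, one option is to read it from the environment as well. A minimal sketch (the helper below is hypothetical, not part of this repo):

```python
import os

def bedrock_llm_kwargs():
    """Keyword arguments for the Bedrock LLM, with the region taken
    from the environment (falling back to us-east-1)."""
    return {
        "credentials_profile_name": os.getenv("profile_name"),
        "model_id": "amazon.titan-text-express-v1",
        "region_name": os.getenv("region_name", "us-east-1"),
        "verbose": True,
    }
```

These kwargs could then be passed along as `Bedrock(**bedrock_llm_kwargs())`.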

# Step 4

If you would like to use this repo with the sample data, you will need to upload the two sample data files found in the sample data directory as two individual tables to your Amazon RDS Postgres Database.

If you prefer to use your own database/tables in your Amazon RDS instance, I would highly recommend reviewing the moma_examples.yaml file in the SampleData directory to see how prompts are constructed for this sample application and spend the time creating 5 - 10 prompts that resemble your dataset more closely.

# Step 5

At this point the application should be ready to go. To start up the application with its basic frontend you simply need to run the following command in your terminal while in the root of the repository's directory:

```
streamlit run app.py
```

As soon as the application is up and running in your browser of choice you can begin asking natural language questions against your Amazon RDS Database.
@@ -0,0 +1,115 @@
# Amazon-Bedrock-Amazon-Redshift-POC

This is sample code demonstrating the use of Amazon Bedrock and Generative AI to use natural language questions to query relational data stores, specifically Amazon Redshift. This example leverages the MOMA Open Source Database: https://github.com/MuseumofModernArt/collection.

**Please Note: If you don't want to build this from scratch, Amazon Redshift now supports GenAI capabilities natively, more information on that can be found [here](https://aws.amazon.com/blogs/aws/amazon-redshift-adds-new-ai-capabilities-to-boost-efficiency-and-productivity/).**

![Alt text](images/demo.gif)
# **Goal of this Repo:**

The goal of this repo is to provide users the ability to use Amazon Bedrock and generative AI to take natural language questions, and transform them into relational database queries against Amazon Redshift Databases. This repo is designed to work with
Amazon Redshift Provisioned Clusters. This repo comes with a basic frontend to help users stand up a proof of concept in just a few minutes.

The architecture and flow of the sample application will be:

![Alt text](images/architecture.png "POC Architecture")

When a user interacts with the GenAI app, the flow is as follows:

1. The user makes a request, asking a natural language question based on the data in Amazon Redshift to the GenAI app (app.py).
2. This natural language question is passed into Amazon Bedrock, which takes the natural language question and creates a SQL query (amazon_redshift_bedrock_query.py).
3. The created SQL query is then executed against your Amazon Redshift cluster to begin retrieving the data (amazon_redshift_bedrock_query.py).
4. The data is retrieved from your Amazon Redshift cluster and is passed back into Amazon Bedrock, to generate a natural language answer based on the retrieved data (amazon_redshift_bedrock_query.py).
5. The LLM returns a natural language response to the user through the streamlit frontend based on the retrieved data (app.py).

# How to use this Repo:

## Prerequisites:

1. Amazon Bedrock Access and CLI Credentials. Ensure that the proper FM model access is provided in the Amazon Bedrock console
2. Amazon Redshift Access and the ability to create an Amazon Redshift Provisioned cluster.
3. Ensure Python 3.10 is installed on your machine; it is the most stable version of Python for the packages we will be using. It can be downloaded [here](https://www.python.org/downloads/release/python-3100/).
4. Please note that this project leverages the [langchain-experimental](https://pypi.org/project/langchain-experimental/) package which has known vulnerabilities.

## Step 1:

The first step of utilizing this repo is performing a git clone of the repository.

```
git clone https://github.com/aws-samples/genai-quickstart-pocs.git
```

After cloning the repo onto your local machine, open it up in your favorite code editor. The file structure of this repo is broken into 4 key files:
the app.py file, the amazon_redshift_bedrock_query.py file, the moma_examples.yaml file, and the requirements.txt. The app.py file houses the frontend application (streamlit app).
The amazon_redshift_bedrock_query.py file contains connectors into your Amazon Redshift cluster and the interaction with Amazon Bedrock through LangChain's SQLDatabaseChain.
The moma_examples.yaml file contains several sample prompts that will be used to implement a few-shot prompting technique. Last, the requirements.txt
file has all the requirements needed to get the sample application up and running.

## Step 2:

Set up a python virtual environment in the root directory of the repository and ensure that you are using Python 3.10. This can be done by running the following commands:

```
pip install virtualenv
python3.10 -m venv venv
```

The virtual environment will be extremely useful when you begin installing the requirements. If you need more clarification on the creation of the virtual environment please refer to this [blog](https://www.freecodecamp.org/news/how-to-setup-virtual-environments-in-python/).
After the virtual environment is created, ensure that it is activated, following the activation steps of the virtual environment tool you are using. For the `venv` created above, on macOS/Linux:

```
source venv/bin/activate
```

On Windows, use `venv\Scripts\activate` instead.

After your virtual environment has been created and activated, you can install all the requirements found in the requirements.txt file by running this command in the root of this repo's directory in your terminal:

```
pip install -r requirements.txt
```

## Step 3:

Now that all the requirements have been successfully installed in your virtual environment we can begin configuring environment variables.
You will first need to create a .env file in the root of this repo. Within the .env file you just created you will need to configure the .env to contain:

```
profile_name=<AWS_CLI_PROFILE_NAME>
redshift_host=<REDSHIFT_HOST_URL> example -> redshift-cluster-1.abcdefghijk123.us-east-1.redshift.amazonaws.com
redshift_port=<REDSHIFT_PORT>
redshift_database=<REDSHIFT_DATABASE_NAME>
redshift_username=<REDSHIFT_USERNAME>
redshift_password=<REDSHIFT_PASSWORD>
```
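For illustration, a hypothetical helper that assembles a Postgres-protocol connection string from these variables (assuming they have been loaded into the environment, e.g. via python-dotenv; the actual code in amazon_redshift_bedrock_query.py may differ):

```python
import os

def redshift_connection_uri():
    """Build a Postgres-protocol URI for the Redshift cluster from the
    .env variables above (illustrative helper only)."""
    return (
        "postgresql+psycopg2://"
        f"{os.environ['redshift_username']}:{os.environ['redshift_password']}"
        f"@{os.environ['redshift_host']}:{os.environ['redshift_port']}"
        f"/{os.environ['redshift_database']}"
    )
```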

Please ensure that your AWS CLI Profile has access to Amazon Bedrock!

Depending on the region and model that you are planning to use Amazon Bedrock in, you may need to reconfigure lines 19-25 in the amazon_redshift_bedrock_query.py file:

```
llm = Bedrock(
credentials_profile_name=os.getenv("profile_name"),
model_id="amazon.titan-text-express-v1",
endpoint_url="https://bedrock-runtime.us-east-1.amazonaws.com",
region_name="us-east-1",
verbose=True
)
```

# Step 4

If you would like to use this repo with the sample data, you will need to upload the two sample data files found in the sample data directory as two individual tables to your Amazon Redshift Cluster.

If you prefer to use your own database/tables in your Amazon Redshift Cluster, I would highly recommend reviewing the moma_examples.yaml file in the SampleData directory to see how prompts are constructed for this sample application and spend the time creating 5 - 10 prompts that resemble your dataset more closely.

# Step 5

At this point the application should be ready to go. To start up the application with its basic frontend you simply need to run the following command in your terminal while in the root of the repository's directory:

```
streamlit run app.py
```

As soon as the application is up and running in your browser of choice you can begin asking natural language questions against your Amazon Redshift Cluster.


```diff
@@ -2,4 +2,4 @@
 st.title('Hello!')
 st.write("This is a Streamlit app written in Python.")
-st.write("To edit this app, go to `{{ outDir }}`.")
+st.write("To edit this app, go to `/Users/awsrudy/Documents/Projects/genai-quickstart-pocs/genai-quickstart-pocs-python/amazon-bedrock-semantic-cache-poc-main`.")
```
```diff
@@ -2,4 +2,4 @@
 st.title('Hello!')
 st.write("This is a Streamlit app written in Python.")
-st.write("To edit this app, go to `{{ outDir }}`.")
+st.write("To edit this app, go to `/Users/awsrudy/Documents/Projects/genai-quickstart-pocs/genai-quickstart-pocs-python/amazon-bedrock-translation-poc`.")
```
