Build a Web-based GPT Chatbot with Custom Knowledge

Photo by Vincent Yuan @USA / Unsplash

The advancement of generative AI is remarkable. Now, with approximately 70 lines of Python code predominantly leveraging OpenAI GPT, llama-index, and Streamlit, you can craft chatbots infused with your specialized domain knowledge. This enables the creation of a personalized assistant tailored specifically for your needs.

This post shows how you can build your own powerful chatbot on top of your own data!

1 Dependencies

A few Python packages are needed for this app:

pip install streamlit openai llama-index nltk

Also, please get an OpenAI API key by following this guide:

  • Go to https://platform.openai.com/account/api-keys.
  • Click on the + Create new secret key button.
  • Enter an identifier name (optional) and click on the Create secret key button.
  • Copy the API key to be used in this tutorial.

2 Create the Chatbot

The project folder can be set up like this:

chatbot
|_ main.py
|_ data
|_ .streamlit
   |_ secrets.toml
File Tree

These files are:

  • a main.py to store the code of the app.
  • a data folder that stores the input data.
  • a secrets.toml in the .streamlit folder that stores the OpenAI API key, like openai_key = 'Your OpenAI API key'

The main.py looks like this; it is all that is needed to pull up the chatbot:

import streamlit as st
from llama_index import VectorStoreIndex, ServiceContext, Document
from llama_index.llms import OpenAI
import openai
from llama_index import SimpleDirectoryReader

st.set_page_config(page_title="Chat with your domain knowledge, powered by LlamaIndex", page_icon="🦙", layout="centered", initial_sidebar_state="auto", menu_items=None)
openai.api_key = st.secrets.openai_key
st.title("Chat with the domain knowledge, powered by LlamaIndex 💬🦙")
st.info("Check out the full tutorial to build this app in our [blog post](https://blog.streamlit.io/build-a-chatbot-with-custom-data-sources-powered-by-llamaindex/)", icon="📃")
         
if "messages" not in st.session_state.keys(): # Initialize the chat messages history
    st.session_state.messages = [
        {"role": "assistant", "content": "Ask me a question about the files you uploaded just now!"}
    ]

# Utility functions
def save_uploadedfile(uploadedfile):
    import os

    # Persist the uploaded file into the data folder for indexing
    with open(os.path.join("data", uploadedfile.name), "wb") as f:
        f.write(uploadedfile.getbuffer())

    return st.success("File saved to data folder!")
    
    
@st.cache_resource(show_spinner=False)
def load_data():
    with st.spinner(text="Loading and indexing your docs - hang tight! This should take 1-2 minutes."):
        reader = SimpleDirectoryReader(input_dir="./data", recursive=True)
        docs = reader.load_data()
        service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo", temperature=0.5, system_prompt="You are an expert on fraud detection and your job is to answer technical questions. Assume that all questions are related to credit card fraud detection. Keep your answers technical and based on facts - do not hallucinate features."))
        index = VectorStoreIndex.from_documents(docs, service_context=service_context)
        return index

    
    
# Main app

datafile = st.file_uploader("Upload your data (string only)",type=['str','csv','txt'])

if datafile is not None:
    save_uploadedfile(datafile)
    index = load_data()

    if "chat_engine" not in st.session_state.keys(): # Initialize the chat engine
        st.session_state.chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)

    if prompt := st.chat_input("Your question"): # Prompt for user input and save to chat history
        st.session_state.messages.append({"role": "user", "content": prompt})

    for message in st.session_state.messages: # Display the prior chat messages
        with st.chat_message(message["role"]):
            st.write(message["content"])

    # If last message is not from assistant, generate a new response
    if st.session_state.messages[-1]["role"] != "assistant":
        with st.chat_message("assistant"):
            with st.spinner("Thinking..."):
                response = st.session_state.chat_engine.chat(prompt)
                st.write(response.response)
                message = {"role": "assistant", "content": response.response}
                st.session_state.messages.append(message) # Add response to message history
GPT Chatbot

Intuitively, the app accepts input from users; in this example it only accepts text and CSV files, but you can tweak the accepted input types by following the Streamlit guide.
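Under the hood, the type= argument of st.file_uploader is just an extension whitelist. A hypothetical helper that mirrors the check in this app (Streamlit performs this filtering internally; the function below is only for illustration):

```python
from pathlib import Path

# Extensions matching type=['str','csv','txt'] in the uploader above
ALLOWED_EXTENSIONS = {".str", ".csv", ".txt"}

def is_allowed(filename: str) -> bool:
    """Return True if the file's extension is whitelisted (case-insensitive)."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

To accept other formats such as PDFs, you would add the corresponding extension to the type= list in st.file_uploader.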

The input file is then saved and used to build a vector store, which GPT refers to during the question-answering sessions.
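The retrieval idea behind the vector store can be sketched with a toy example: each document chunk is turned into a vector, and at query time the most similar chunks are found by cosine similarity. The bag-of-words "embedding" below is only an illustration of the concept, not llama-index's actual implementation (which uses OpenAI embeddings):

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str]) -> str:
    """Return the chunk most similar to the query."""
    q = embed(query)
    return max(chunks, key=lambda c: cosine(q, embed(c)))

chunks = [
    "credit card fraud is detected with anomaly models",
    "the weather today is sunny and warm",
]
best = retrieve("how to detect credit card fraud", chunks)
```

In the real app, llama-index handles the chunking, embedding, and retrieval, and passes the retrieved context to GPT along with your question.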

3 Activate the Chatbot

In a terminal on your laptop, run this command:

streamlit run main.py
Activate the App

Then the app is up and you can upload your own files. Once that is done, the chat box will appear and you can ask questions about the data.

The sample credit card transaction data can be downloaded at datacamp.
Chatbot UI
💡
Watch out for the cost of OpenAI API usage; set a spending limit just to avoid unexpectedly intensive usage.

Reference

LlamaIndex app demo on Streamlit Blog

Build a chatbot with custom data sources, powered by LlamaIndex
Augment any LLM with your own data in 43 lines of code!

Streamlit file uploader

st.file_uploader - Streamlit Docs
st.file_uploader displays a file uploader widget.