Support Conversational History in RAG Pipelines with Llama 3

In Retrieval-Augmented Generation (RAG) pipelines, it's crucial to help chatbots recall previous conversations, because users often ask follow-up questions that rely on earlier context. Those prompts can be too terse to stand on their own, since users assume the earlier discussion is still in scope. To tackle this challenge, incorporating chat history into the LLM's question-answering context enables it to retrieve the relevant information for new queries.

This post presents a solution leveraging LangChain, Llama 3-8B, and Ollama, which can efficiently run on an M2 Pro MacBook Pro with 16 GB memory.

1 Dependencies

1.1 Ollama and Llama 3 Model

First, install Ollama on the MacBook. Ollama can utilize the machine's GPUs, ensuring efficient inference, provided there is sufficient memory. Llama 3-8B performs well on machines with 16 GB of memory.

💡
Ollama can be downloaded here: https://ollama.com/

Once it is installed, you can use the command below in the terminal to pull the Llama 3-8B model:

ollama pull llama3
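Once the pull completes, you can optionally confirm that the model is available and responds before moving on (both commands are part of the standard Ollama CLI):

ollama list
ollama run llama3 "Reply with one short sentence to confirm you are working."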

1.2 Python Dependencies

Now, let's import the required packages to construct a RAG system with chat history, utilizing the LangChain toolkit.

# Models
from langchain.llms import LlamaCpp
from langchain.chat_models import ChatOpenAI

# Setup
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Vector store
from langchain.document_loaders import  TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

# LangChain supports many other chat models. Here, we're using Ollama
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

# RAG with Memory 
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

# Display results
import markdown
from IPython.display import display, Markdown, Latex

2 Create Vector Store

The source data consists of a summary of important events and statistics from the week of May 13th, 2024, as published by Yahoo Finance. This data is not included in the training set of Llama 3. For demonstration purposes, the news is extracted to a text file and utilized in the code to create the Chroma vector store and retriever.

source_data_path = '../data/yahoo.txt'

# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

loader = TextLoader(source_data_path)

documents = loader.load()

# split the documents into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

embedding = HuggingFaceEmbeddings()

vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embedding
                                 # persist_directory=persist_directory
                                )
                                
retriever = vectordb.as_retriever(search_kwargs={"k": 5})
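Optionally, the retriever can be sanity-checked before it is wired into any chain; get_relevant_documents returns the top-k chunks for a query (the query string below is only an illustration):

sample_docs = retriever.get_relevant_documents("April CPI expectations")
for doc in sample_docs:
    print(doc.page_content[:200], '\n---')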

3 Create the LLM Object

Make sure Ollama is running and the Llama 3 model has been downloaded; then the code below defines the LLM object used in the pipeline:

llm = ChatOllama(model="llama3",
                temperature=0.1)

4 RAG with Memory

In essence, there needs to be a place to store the chat history, the chat history has to be added to the RAG prompt so that the LLM can access past conversation, and the history must be updated after each round of conversation. Below is a way to use BaseChatMessageHistory to address this need:

### Contextualize question ###
contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)


### Answer question ###
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

Then wrap the RAG chain with per-session chat history management:

### Statefully manage chat history ###
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

Now let's test whether the model understands the Yahoo Finance analysis. The first question is: "What is the Wall Street expectation of the April Consumer Price Index (CPI)?"

llm_response = conversational_rag_chain.invoke(
    {"input": "What is the wall street expectation of the April Consumer Price Index (CPI)?"},
    config={
        "configurable": {"session_id": "abc123"}
    },  # constructs a key "abc123" in `store`.
)["answer"]

print('='*50)
display(Markdown(llm_response))

The response is:

According to the text, Wall Street expects an annual gain of 3.4% for headline CPI, which includes the price of food and energy, a decrease from the 3.5% headline number in March. Additionally, prices are expected to rise 0.4% on a month-over-month basis, in line with March's rise.

This is aligned with the source:

[Image: Yahoo Finance Analysis]

Next, a follow-up question is asked based on the previous answer, to calculate double the expected CPI:

llm_response = conversational_rag_chain.invoke(
    {"input": "What is the double of the expected CPI in the prior answer?"},
    config={
        "configurable": {"session_id": "abc123"}
    },  # constructs a key "abc123" in `store`.
)["answer"]

print('='*50)
display(Markdown(llm_response))

And this is the output:

The expected annual gain for headline CPI is 3.4%. The double of this value would be:

2 x 3.4% = 6.8%

So, the double of the expected CPI is 6.8%.

So the model successfully picks up the information it returned earlier and answers the new question correctly.
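To double-check that the memory mechanism is doing the work, you can also inspect the per-session history directly; each round of conversation appends a human message and an AI message (a quick sketch using the session id from above):

history = get_session_history("abc123")
for msg in history.messages:
    print(type(msg).__name__, ':', msg.content[:80])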

5 Summary

This enhanced solution extends the capabilities of a regular RAG by supporting chat history, making it highly beneficial for multiple rounds of conversations. With Ollama, experiments like this can be run on an affordable laptop with embedded GPUs. A special acknowledgment to Meta for their great work in improving Llama 3.

Build a Regulation Assistant Powered by Llama 2 and Streamlit with Google Colab GPUs

In our previous discussion, we explored the concept of creating a web chatbot using Llama 2. However, an incredibly practical application of chatbots is their ability to field questions within specific domains of knowledge. For example, a chatbot can be trained on policies, regulations, and laws, effectively functioning as a knowledge assistant that users can collaborate with. This functionality holds significant value for enterprise users, who often have vast repositories of internal documents that can be utilized to train the chatbot. Employees can then leverage the chatbot as a quick reference tool.

Furthermore, this solution can be entirely constructed using open-source components, eliminating the need to rely on external APIs like OpenAI and alleviating any privacy concerns.

This post showcases a compliance assistant built with the utilization of the open-source large language model Llama 2, in conjunction with retrieval-augmented generation (RAG), all presented through a user-friendly web interface powered by Streamlit.

💡
The code can be replicated on Google Colab, using free T4 GPUs. Kudos to Google.

1 Dependencies

Firstly, install a few dependencies:

!pip install -q streamlit

!npm install localtunnel

# GPU setup of LangChain
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --force-reinstall llama-cpp-python==0.2.28  --no-cache-dir

!pip install huggingface_hub  chromadb langchain sentence-transformers pypdf 

Then download the Llama 2 model to the Colab notebook:

!wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_0.gguf

1.1 Mount the Google Drive

This chatbot needs to retrieve documents from a vector database built from embeddings of the regulation PDFs. The PDFs are saved in Google Drive, so let's mount it so the code can access them:

# Mount the google drive
from google.colab import drive
drive.mount('/gdrive')

2 Build the Web Chatbot

The web chatbot looks like this:

[Image: A Compliance Assistant]

Below is the entire code for the compliance assistant; the details of each part are introduced in the following sections:

%%writefile app.py

import streamlit as st
import os

from langchain.llms import LlamaCpp
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

from langchain.llms import LlamaCpp

from langchain_community.document_loaders import PyPDFLoader, PyPDFDirectoryLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

from langchain.chains import RetrievalQA




# App title
st.set_page_config(page_title="🦙💬 Llama 2 Chatbot")

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])



# ====================== RAG ======================

# Encoding the PDFs
pdf_folder_path = '/gdrive/MyDrive/Research/Data/GenAI/PDFs'

loader = PyPDFDirectoryLoader(pdf_folder_path)

documents = loader.load()

# split the documents into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Create vector DB, embed and store the texts
# Supplying a persist_directory will store the embeddings on disk
persist_directory = 'db'

## HuggingFace embeddings run locally and are free to use (no OpenAI API key needed)
embedding = HuggingFaceEmbeddings()

vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embedding,
                                 persist_directory=persist_directory)

retriever = vectordb.as_retriever(search_kwargs={"k": 5})

# ====================== App ======================
with st.sidebar:
    st.title('🦙💬 Llama 2 Chatbot')


    st.subheader('Models and parameters')
    selected_model = st.sidebar.selectbox('Choose a Llama2 model', ['Llama2-7B', 'Llama2-13B'], key='selected_model')

    if selected_model == 'Llama2-7B':
        llm_path = llama_model_path
    elif selected_model == 'Llama2-13B':
        # only the 7B weights are downloaded in this demo, so both options load the same file
        llm_path = llama_model_path

    temperature = st.sidebar.slider('temperature', min_value=0.01, max_value=5.0, value=0.1, step=0.01)
    top_p = st.sidebar.slider('top_p', min_value=0.01, max_value=1.0, value=0.9, step=0.01)
    max_length = st.sidebar.slider('max_length', min_value=32, max_value=128, value=120, step=8)
    st.markdown('📖 Learn how to build this app in this [blog](https://blog.streamlit.io/how-to-build-a-llama-2-chatbot/)!')


    llm = LlamaCpp(
      model_path=llm_path,
      temperature=temperature,
      top_p=top_p,
      n_ctx=2048,
      n_gpu_layers=n_gpu_layers,
      n_batch=n_batch,
      callback_manager=callback_manager,
      verbose=True,
    )

    # use another LangChain's chain, RetrievalQA, to associate Llama with the loaded documents stored in the vector db



# Store LLM generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

# Display or clear chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

def clear_chat_history():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]
st.sidebar.button('Clear Chat History', on_click=clear_chat_history)


# Function for generating LLaMA2 response. Refactored from https://github.com/a16z-infra/llama2-chatbot
def generate_llama2_response(prompt_input):

    pre_prompt = """[INST] <<SYS>>
                  You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.

                  If you cannot answer the question from the given documents, please state that you do not have an answer.\n
                  """


    for dict_message in st.session_state.messages:
        if dict_message["role"] == "user":
            pre_prompt += "User: " + dict_message["content"] + "\n\n"
        else:
            pre_prompt += "Assistant: " + dict_message["content"] + "\n\n"

    prompt = pre_prompt +  "{context}User : {question}" + "[/INST]"
    llama_prompt = PromptTemplate(template=prompt, input_variables=["context","question"])


    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=retriever,
         chain_type_kwargs={"prompt": llama_prompt}
    )

    result = qa_chain.run({
                            "query": prompt_input})


    return result

# User-provided prompt
if prompt := st.chat_input():
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

# Generate a new response if last message is not from assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = generate_llama2_response(prompt)
            placeholder = st.empty()
            full_response = ''
            for item in response:
                full_response += item
                placeholder.markdown(full_response)
            placeholder.markdown(full_response)
    message = {"role": "assistant", "content": full_response}
    st.session_state.messages.append(message)

2.1 Model Setup

In the code, first tweak the parameters to match your hardware, model, and objectives:

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

The free version of Colab does not have much memory, so the llama-2-7b-chat.Q5_0.gguf model is used here, but you can use a larger model for better performance.
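If you are unsure how many layers can be offloaded, it helps to check which GPU is attached to the Colab runtime before tuning n_gpu_layers and n_batch (assuming a CUDA runtime such as the free T4):

# show the GPU model and available VRAM
!nvidia-smi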

2.2 Vector Database

To perform RAG, a vector database has to be created first. In this example, the code reads the Regulation B and Regulation Z PDFs, embeds them, and builds a vector database from the result:


# ====================== RAG ======================

# Encoding the PDFs
pdf_folder_path = '/gdrive/MyDrive/Research/Data/GenAI/PDFs'

loader = PyPDFDirectoryLoader(pdf_folder_path)

documents = loader.load()

# split the documents into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Create vector DB, embed and store the texts
# Supplying a persist_directory will store the embeddings on disk
persist_directory = 'db'

## HuggingFace embeddings run locally and are free to use (no OpenAI API key needed)
embedding = HuggingFaceEmbeddings()

vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embedding,
                                 persist_directory=persist_directory)

retriever = vectordb.as_retriever(search_kwargs={"k": 5})
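Because persist_directory is supplied, the embeddings can be written to disk and reloaded later without re-embedding the PDFs, which saves time across Colab sessions. A minimal sketch of the persist-and-reload pattern:

# write the current collection to disk ...
vectordb.persist()

# ... and load it back later without re-processing the PDFs
vectordb = Chroma(persist_directory=persist_directory,
                  embedding_function=embedding)
retriever = vectordb.as_retriever(search_kwargs={"k": 5})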

2.3 Message Management

Next, set up how the chatbot's messages are stored, displayed, and cleared:

# Store LLM generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

# Display or clear chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

def clear_chat_history():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]
st.sidebar.button('Clear Chat History', on_click=clear_chat_history)

2.4 Get LLM Response

The function below appends the chat history to the prompt and uses the vector database created above to retrieve context for the answer.

💡
Note that this QA chain is different from a regular LLM chain.
# Function for generating LLaMA2 response based on RAG.
def generate_llama2_response(prompt_input):

    pre_prompt = """[INST] <<SYS>>
                  You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.

                  If you cannot answer the question from the given documents, please state that you do not have an answer.\n
                  """


    for dict_message in st.session_state.messages:
        if dict_message["role"] == "user":
            pre_prompt += "User: " + dict_message["content"] + "\n\n"
        else:
            pre_prompt += "Assistant: " + dict_message["content"] + "\n\n"

    prompt = pre_prompt +  "{context}User : {question}" + "[/INST]"
    llama_prompt = PromptTemplate(template=prompt, input_variables=["context","question"])


    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=retriever,
         chain_type_kwargs={"prompt": llama_prompt}
    )

    result = qa_chain.run({
                            "query": prompt_input})


    return result

2.5 Conversation

The code below drives the question-and-answer loop in which the chatbot responds to users' questions:

# User-provided prompt
if prompt := st.chat_input():
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

# Generate a new response if last message is not from assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = generate_llama2_response(prompt)
            placeholder = st.empty()
            full_response = ''
            for item in response:
                full_response += item
                placeholder.markdown(full_response)
            placeholder.markdown(full_response)
    message = {"role": "assistant", "content": full_response}
    st.session_state.messages.append(message)

3 Start the Chatbot

You can bring up the chatbot with the commands below:

!streamlit run app.py --server.address=localhost &>/content/logs.txt &

import urllib
print("Password/Enpoint IP for localtunnel is:",urllib.request.urlopen('https://ipv4.icanhazip.com').read().decode('utf8').strip("\n"))

!npx localtunnel --port 8501

The result shows a password to access the web app:

Password/Endpoint IP for localtunnel is: 34.125.220.166
npx: installed 22 in 2.393s
your url is: https://hot-pets-chew.loca.lt

Go to that URL, enter the password, and enjoy!

4 Summary

This post demonstrates the construction of a versatile chatbot capable of more than just conversation. Specifically, it covers the following key features:

  • Creation of a vector database utilizing domain knowledge.
  • Ability of the chatbot to retrieve information from the vector database and respond to user queries.
  • User-friendly interface for ease of use.

This approach is scalable across various applications, as chatbots excel in information retrieval when equipped with a reliable database as the source of truth. Stay tuned for further insights into valuable applications of this technology.

Unveiling the Deal: What Happens When Companies Merge

This post walks through the most important processes in a company merger and answers the questions of greatest interest to employees, shareholders, and customers.

1 Company Merger Process

Acquiring a company in the U.S. is a complex process with various stages and potential outcomes. Here's a breakdown of the typical steps and what you can expect:

1. Pre-Negotiation:

  • Target Identification: The acquiring company identifies potential targets based on strategic fit, market potential, and other criteria.
  • Initial Contact: Discreet inquiries are made to gauge interest and gather information.
  • Non-Disclosure Agreement (NDA): Both parties sign an NDA to protect confidential information during discussions.

2. Due Diligence:

  • In-depth Investigation: The acquiring company assesses the target's financial health, operations, legal status, and other critical factors.
  • Valuation: Financial experts determine the target company's fair market value.

3. Negotiation and Agreement:

  • Letter of Intent (LOI): A non-binding agreement outlining key terms like price, structure, and timelines.
  • Negotiation: Both sides negotiate the final terms of the acquisition agreement, including purchase price, payment methods, and deal structure.
  • Definitive Agreement: A legally binding document outlining all agreed-upon terms and conditions.

4. Regulatory Approvals:

  • Antitrust Review: The deal might require approval from the Federal Trade Commission (FTC) or other regulatory bodies to ensure fair competition.
  • Industry-Specific Approvals: Depending on the industry, further regulatory approvals might be necessary.

5. Closing and Integration:

  • Closing: All legal formalities are completed, and the acquisition is finalized.
  • Integration: The acquiring company integrates the target's operations, employees, and systems into its own structure. This can be a complex and lengthy process.

What to expect:

  • Timeframe: The process can take months or even years, depending on the complexity of the deal and regulatory hurdles.
  • Costs: Significant legal, financial, and integration costs are involved.
  • Uncertainty: Regulatory approvals and market conditions can impact the deal's outcome.
  • Impact: Acquisitions can affect employees, customers, and the industry at large.

Additional points to consider:

  • There are different types of acquisitions, such as stock purchases, asset purchases, and mergers. Each has its own nuances.
  • Friendly acquisitions involve cooperation between both parties, while hostile takeovers involve a more aggressive approach.
  • The specific process and outcomes can vary significantly depending on the size, industry, and circumstances of the companies involved.

Now, let's break down each stage and dive deeper into how it works, with some examples.

[Photo by Vincent Yuan @USA / Unsplash]

2 Pre-negotiation

The pre-negotiation phase in a company acquisition lays the groundwork for a successful deal or identifies potential roadblocks early on. Here's a more detailed breakdown of this crucial stage:

1. Target Identification:

  • Strategic fit: Aligning the target's strengths and weaknesses with the acquirer's goals and existing business.
  • Market potential: Assessing the target's market share, growth potential, and competitive landscape.
  • Financial attractiveness: Analyzing profitability, debt levels, and valuation multiples.

Examples:

  • Amazon's acquisition of Whole Foods: Focused on expanding Amazon's grocery delivery and brick-and-mortar presence.
  • Disney's acquisition of Marvel Entertainment: Aimed at acquiring valuable intellectual property and expanding its superhero universe.

2. Initial Contact:

  • Discreet approach: Using intermediaries, investment bankers, or direct contact depending on the situation and target receptivity.
  • Information gathering: Gauging the target's general interest, financial health, and potential deal structure.
  • Non-Disclosure Agreement (NDA): Protecting confidential information shared during discussions.

Example:

  • Microsoft's acquisition of LinkedIn: Initial contact reportedly occurred through a mutual acquaintance who connected Satya Nadella and Jeff Weiner.

3. Due Diligence Preparation:

  • Gathering internal resources: Assembling legal, financial, and operational teams for in-depth analysis.
  • Developing a due diligence plan: Defining scope, timelines, and key areas of investigation.
  • Negotiating access: Securing permission to review the target's financial records, contracts, and other sensitive information.

4. Non-Binding Negotiations:

  • Indicative offer: Presenting a non-binding price range based on preliminary valuation and market conditions.
  • Structure exploration: Discussing potential deal structures (stock purchase, asset purchase, merger) and their implications.
  • Exclusivity agreement (optional): Granting the acquirer temporary exclusive negotiation rights in exchange for a fee.

Remember:

  • Pre-negotiation is a delicate dance between expressing interest without revealing your hand too soon.
  • Thorough due diligence is crucial for understanding potential risks and opportunities.
  • Non-binding negotiations help refine deal terms and identify potential dealbreakers before investing significant resources.
[Photo by Vincent Yuan @USA / Unsplash]

3 Due Diligence

Due diligence is a crucial step in the company merger process, allowing the acquiring company to gain a deep understanding of the target company's financial health, operations, legal status, and potential risks. Here's a more specific breakdown of how it typically works:

Stages of Due Diligence:

1. Pre-Diligence:

  • Initial research and information gathering about the target company.
  • Signing a Non-Disclosure Agreement (NDA) to protect confidential information.

2. Financial Due Diligence:

  • Reviewing financial statements, tax returns, and internal controls.
  • Assessing the company's financial performance, profitability, and debt levels.
  • Identifying potential financial risks and liabilities.

3. Operational Due Diligence:

  • Evaluating the target company's business operations, processes, and systems.
  • Analyzing market position, competitive landscape, and customer base.
  • Identifying potential operational challenges and opportunities.

4. Legal Due Diligence:

  • Reviewing legal documents, contracts, and intellectual property rights.
  • Assessing potential legal risks, compliance issues, and litigation exposure.
  • Ensuring the target company is operating legally and has a clear title to assets.

5. Environmental Due Diligence:

  • Assessing potential environmental liabilities and regulatory compliance.
  • Identifying any environmental hazards or contamination on the target company's property.

6. Human Resources Due Diligence:

  • Evaluating the target company's workforce, employee contracts, and labor relations.
  • Identifying potential human resource risks and liabilities, such as employee lawsuits or unionization efforts.

Additional Points:

  • The specific scope and depth of due diligence vary depending on the size and complexity of the deal.
  • Experienced professionals, such as accountants, lawyers, and consultants, are often involved in the process.
  • Due diligence findings can impact the negotiation of the deal terms and price.
[Photo by Vincent Yuan @USA / Unsplash]

4 Negotiation and Agreement

The negotiation and agreement phase is arguably the most critical stage in an acquisition, where the terms are hammered out and the deal's fate is determined. Here's an in-depth look at how it typically unfolds:

1. Letter of Intent (LOI):

  • Non-binding document outlining key deal terms: Price, structure, timelines, contingencies, and exclusivity provisions.
  • Serves as a roadmap for further negotiations: Prevents wasting time if fundamental differences exist.
  • May include break-up fees: To compensate the target if the deal falls through due to the acquirer's actions.

Example:

  • SoftBank's acquisition of WeWork: The complex LOI included contingencies based on WeWork's financial performance.

2. Negotiation of Definitive Agreement:

  • Intensive process involving lawyers, advisors, and executives: Each side advocates for their best interests.
  • Key areas of negotiation: Purchase price, payment structure, warranties, indemnification, employee-related matters, and regulatory approvals.
  • Back-and-forth through drafts and revisions: Striving for a mutually beneficial agreement.

3. Deal Sweeteners:

  • Non-cash consideration: Stock, earn-outs, or other creative structures to bridge valuation gaps.
  • Management incentives: Retention packages or equity grants to key employees.

Examples:

  • Disney's acquisition of 21st Century Fox: Included a complex stock-based deal structure.
  • Elon Musk's acquisition of Twitter: Involved offering severance packages to some employees.

4. Finalizing the Agreement:

  • Legal review and approvals by boards and shareholders: Ensuring compliance and alignment.
  • Signing ceremony: Formalizing the agreement and marking a significant milestone.

Additional Points:

  • Negotiation is a dynamic process with power struggles and potential deadlocks.
  • Effective communication, flexibility, and a win-win mindset are crucial for success.
  • Cultural differences and regulatory complexities can add layers to the process.
[Photo by Vincent Yuan @USA / Unsplash]

5 Navigating the Regulatory Maze

Regulatory approval is a crucial hurdle in many acquisitions, aiming to ensure fair competition, consumer protection, and other societal considerations. Here's an overview of the process and common examples:

1. Identifying Relevant Regulators:

  • Industry-Specific Agencies: Depending on the industry, agencies like the Federal Trade Commission (FTC), Department of Justice (DOJ), or the Federal Communications Commission (FCC) might be involved.
  • Antitrust Regulators: The FTC and DOJ hold primary authority for antitrust reviews to prevent mergers that reduce competition.
  • Other Potential Regulators: Depending on the deal's specifics, agencies like the Securities and Exchange Commission (SEC) or state regulators might also weigh in.

2. Filing and Review Process:

  • Filing: Companies submit detailed information about the merger, including market analyses and justifications.
  • Initial Review: Regulators assess the potential impact on competition and other relevant factors.
  • Second Request: If concerns arise, regulators can request more information and conduct deeper investigations.
  • Public Comment: In some cases, the public can submit comments on the proposed merger.

3. Approval or Challenge:

  • Clearance: If regulators determine no significant anti-competitive harms, they grant approval.
  • Conditions: Approvals might come with conditions aimed at mitigating potential harms, like divestitures or restrictions on specific practices.
  • Challenge: If regulators believe the deal violates competition laws, they can file lawsuits to block it.

4. Timeline:

  • The process can vary significantly depending on the complexity of the deal and the level of scrutiny required. It can take anywhere from weeks to months, or even years in complex cases.

Examples:

  • AT&T's attempted acquisition of T-Mobile: The DOJ blocked the merger due to concerns about reduced competition in the wireless market.
  • Facebook's acquisition of WhatsApp: The FTC initially challenged the deal but ultimately approved it with conditions.

Additional Points:

  • The regulatory landscape can be complex and constantly evolving, requiring expert legal counsel for navigating the process.
  • The level of scrutiny and potential challenges can significantly impact the deal timeline and feasibility.
  • Understanding the regulatory environment and proactively addressing potential concerns is crucial for a successful acquisition.
[Photo by Vincent Yuan @USA / Unsplash]

6 Closing and Integration

Closing and integration mark the final chapter in a company merger, but they bring their own set of challenges and complexities. Here's a detailed breakdown:

Closing:

  • Formalization: Final documents are signed, legal formalities are completed, and the acquisition officially closes.
  • Regulatory Approvals: If required, all necessary regulatory approvals must be secured before closing.
  • Funding and Payment: The acquiring company finalizes the payment to the target company, often in cash, stock, or a combination.
  • Shareholder Votes: For public companies, shareholder approval might be required before closing.

Example:

  • CVS Health's acquisition of Aetna: The deal closed in 2018 after receiving regulatory approval and shareholder votes from both companies.

Integration:

  • Combining Operations: Merging business functions, systems, and teams from both companies.
  • Cultural Integration: Aligning company cultures, values, and communication styles.
  • Employee Transitions: Addressing employee concerns, managing potential layoffs, and implementing training programs.
  • Synergy Realization: Identifying and capturing cost savings, revenue growth, and other value-creation opportunities.

Challenges and Risks:

  • Integration complexity: Cultural clashes, resistance to change, and IT system integration issues can be difficult to overcome.
  • Synergy realization: Achieving projected synergies can be slower and more challenging than anticipated.
  • Employee morale and retention: Managing employee anxiety, skills gaps, and potential talent loss during integration is crucial.

Examples:

  • Disney's acquisition of Fox: The integration process was complex due to the size and diverse businesses involved.
  • Kraft Heinz's acquisition of Unilever: The merger failed to achieve expected synergies and led to cultural clashes.

Additional Points:

  • Effective communication, change management strategies, and strong leadership are crucial for successful integration.
  • The integration process can take months or even years, and requires ongoing monitoring and adjustments.
  • The success of a merger ultimately hinges on a smooth and well-executed closing and integration phase.
[Photo by Vincent Yuan @USA / Unsplash]

7 Frequently Asked Questions

Here are some of the most commonly asked questions about company mergers from the perspectives of employees, shareholders and customers:

Employees

Job security is, of course, the No. 1 question. During a company merger, the evaluation of employee jobs typically involves several considerations. Let's explore this from different angles:

Internal Assessment by the Merging Companies:

  • The merging companies themselves evaluate employee roles, responsibilities, and skills. They assess which positions are redundant, which are critical, and which can be integrated.
  • Job evaluations may involve comparing job descriptions, performance records, and qualifications.

Consulting Companies or HR Experts:

  • Some mergers engage external consulting firms or HR experts to assist in evaluating employees.
  • These experts analyze factors such as job functions, competencies, and market value.
  • They may provide recommendations on retaining key talent, aligning compensation, and managing workforce transitions.

Retention of Key Employees:

  • Identifying and retaining key employees is crucial. These are individuals with specialized skills, institutional knowledge, or leadership roles.
  • Companies consider factors like expertise, client relationships, and strategic importance.

Redundancies and Layoffs:

  • Unfortunately, some positions become redundant due to overlapping functions after the merger.
  • Companies decide which roles to eliminate based on business needs, cost savings, and efficiency.
  • Severance packages may be offered to affected employees.

Skill Assessment and Fit:

  • Companies evaluate whether employees’ skills align with the merged organization’s goals.
  • They consider adaptability, willingness to learn, and cultural fit.

Shareholders

Here is a breakdown of how shareholders are impacted during a company merger:

Exchange of Shares:

  • In a stock-for-stock merger, shareholders of both companies receive shares in the new combined entity.
  • The exchange ratio determines whether one company’s shareholders receive a premium above their share price before the merger announcement.
  • If the merger is favorable, shares of both companies may rise.

Dilution of Control:

  • Shareholders whose shares are not exchanged find their control diluted.
  • New shares issued to the other company’s shareholders reduce the control of existing shareholders.

Temporary Volatility:

  • Shareholders of the acquiring firm may experience a temporary drop in share value before the merger.
  • Shareholders of the target firm may see a rise in share value during the period.

Customers

Let's break down how customers are impacted during a company merger:

Service Disruptions and Miscommunications:

  • Integration efforts can divert attention from day-to-day operations, leading to miscommunications with customers.
  • Poorly managed systems migrations may cause confusion or delays in service.

Changes in Customer Service:

  • Customer service levels may fluctuate due to adjustments in staff, processes, or technology.
  • Customers might experience longer wait times or inconsistent support.

Product and Service Offerings:

  • Choices available to customers may change.
  • Some products or services may be discontinued, while new ones may be introduced.

Pricing and Terms:

  • Pricing structures could shift. Customers may face price increases or discounts.
  • Contract terms might be modified, affecting existing agreements.

Brand Perception and Loyalty:

  • Mergers can stress relationships with customers.
  • Brand loyalty may be tested as customers adapt to the new entity.

Communication Efforts:

  • Effective communication about the merger’s benefits and changes is crucial.
  • Transparency helps maintain customer trust.


Built a Chatbot with Streamlit and Llama 2 with Google Colab GPUs from Scratch

So far, we have talked about a lot of things regarding Llama 2:

  • Swift inference powered by GPUs
  • Thoughtful responses with appropriate prompts
  • Question answering utilizing a knowledge database
  • A user-friendly web interface

You can find those informative posts in the GenAI section of Spacecraft as below:

[Link: GenAI - Spacecraft]

This post shows a product that makes the best of all the things learned so far and builds a web-based chatbot powered by a local Llama 2 model, running on Google Colab with GPUs.

2 Dependencies

Firstly, install a few dependencies:

!pip install -q streamlit

!npm install localtunnel

# GPU setup of LangChain
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --force-reinstall llama-cpp-python==0.2.28  --no-cache-dir

!pip install huggingface_hub  chromadb langchain sentence-transformers pinecone_client

Then download the Llama 2 model to the Colab notebook:

!wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_0.gguf

3 Build the Web Chatbot

The web chatbot looks like this:

[Image: Llama 2 Chatbot]

You need to write the app code to the disk first:

%%writefile app.py

import streamlit as st
import os

from langchain.llms import LlamaCpp
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

from langchain.llms import LlamaCpp


# App title
st.set_page_config(page_title="🦙💬 Llama 2 Chatbot")

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Sidebar: model selection and parameters
with st.sidebar:
    st.title('🦙💬 Llama 2 Chatbot')


    st.subheader('Models and parameters')
    selected_model = st.sidebar.selectbox('Choose a Llama2 model', ['Llama2-7B', 'Llama2-13B'], key='selected_model')

    if selected_model == 'Llama2-7B':
        llm_path = llama_model_path
    elif selected_model == 'Llama2-13B':
        # only the 7B weights are downloaded in this demo, so both options load the same file
        llm_path = llama_model_path

    temperature = st.sidebar.slider('temperature', min_value=0.01, max_value=5.0, value=0.1, step=0.01)
    top_p = st.sidebar.slider('top_p', min_value=0.01, max_value=1.0, value=0.9, step=0.01)
    max_length = st.sidebar.slider('max_length', min_value=32, max_value=128, value=120, step=8)
    st.markdown('📖 Learn how to build this app in this [blog](https://blog.streamlit.io/how-to-build-a-llama-2-chatbot/)!')


    llm = LlamaCpp(
      model_path=llm_path,
      temperature=temperature,
      top_p=top_p,
      n_ctx=2048,
      n_gpu_layers=n_gpu_layers,
      n_batch=n_batch,
      callback_manager=callback_manager,
      verbose=True,
    )

# Store LLM generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

# Display or clear chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

def clear_chat_history():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]
st.sidebar.button('Clear Chat History', on_click=clear_chat_history)


# Function for generating LLaMA2 response. Refactored from https://github.com/a16z-infra/llama2-chatbot
def generate_llama2_response(prompt_input):

    pre_prompt = """[INST] <<SYS>>
                  You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.

                  If you cannot answer the question from the given documents, please state that you do not have an answer.\n
                  """


    for dict_message in st.session_state.messages:
        if dict_message["role"] == "user":
            pre_prompt += "User: " + dict_message["content"] + "\n\n"
        else:
            pre_prompt += "Assistant: " + dict_message["content"] + "\n\n"

    prompt = pre_prompt +  "User : {question}" + "[/INST]"
    llama_prompt = PromptTemplate(template=prompt, input_variables=["question"])

    chain = LLMChain(llm=llm, prompt=llama_prompt)

    result = chain({
                "question": prompt_input
                 })


    return result['text']

# User-provided prompt
if prompt := st.chat_input():
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

# Generate a new response if last message is not from assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = generate_llama2_response(prompt)
            placeholder = st.empty()
            full_response = ''
            for item in response:
                full_response += item
                placeholder.markdown(full_response)
            placeholder.markdown(full_response)
    message = {"role": "assistant", "content": full_response}
    st.session_state.messages.append(message)

3.1 Model Setup

In the code, first tweak the parameters to match your hardware, model, and objectives:

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

The free version of Colab does not have much memory, so the llama-2-7b-chat.Q5_0.gguf model is used here, but you can use a larger model for better performance.

3.2 Message Management

Next, set up how the chatbot's messages are stored, displayed, and cleared:

# Store LLM generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

# Display or clear chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

def clear_chat_history():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]
st.sidebar.button('Clear Chat History', on_click=clear_chat_history)

3.3 Get LLM Response

The function below appends the chat history to the prompt and gets the model's response:

def generate_llama2_response(prompt_input):

    pre_prompt = """[INST] <<SYS>>
                  You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.

                  If you cannot answer the question from the given documents, please state that you do not have an answer.\n
                  """


    for dict_message in st.session_state.messages:
        if dict_message["role"] == "user":
            pre_prompt += "User: " + dict_message["content"] + "\n\n"
        else:
            pre_prompt += "Assistant: " + dict_message["content"] + "\n\n"

    prompt = pre_prompt +  "User : {question}" + "[/INST]"
    llama_prompt = PromptTemplate(template=prompt, input_variables=["question"])

    chain = LLMChain(llm=llm, prompt=llama_prompt)

    result = chain({
                "question": prompt_input
                 })


    return result['text']

3.4 Conversation

The code below drives the question-and-answer loop in which the chatbot responds to users' questions:

# User-provided prompt
if prompt := st.chat_input():
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

# Generate a new response if last message is not from assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = generate_llama2_response(prompt)
            placeholder = st.empty()
            full_response = ''
            for item in response:
                full_response += item
                placeholder.markdown(full_response)
            placeholder.markdown(full_response)
    message = {"role": "assistant", "content": full_response}
    st.session_state.messages.append(message)

4 Start the Chatbot

You can bring up the chatbot with the commands below:

!streamlit run app.py --server.address=localhost &>/content/logs.txt &

import urllib
print("Password/Enpoint IP for localtunnel is:",urllib.request.urlopen('https://ipv4.icanhazip.com').read().decode('utf8').strip("\n"))

!npx localtunnel --port 8501

The result shows a password to access the web app:

Password/Endpoint IP for localtunnel is: 35.185.197.1
npx: installed 22 in 2.393s
your url is: https://hot-pets-chew.loca.lt

Go to that URL, enter the password, and enjoy!

5 Conclusion

This post consolidates information to transform your local Llama 2 model into a fully functional chatbot. Moreover, you have the flexibility to craft specialized assistants for distinct domains by customizing the system prompts, all at no additional cost.
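For instance, swapping in a different system prompt is all it takes to turn the generic assistant into a domain-specific one; the persona below is purely illustrative:

# hypothetical system prompt for a travel-planning assistant;
# substitute it for pre_prompt inside generate_llama2_response
pre_prompt = """[INST] <<SYS>>
You are a travel-planning assistant. Suggest concrete itineraries, keep answers concise,
and say so explicitly when you are unsure about prices or schedules.
<</SYS>>\n
"""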

Let's build something cool!

Question Answering on Multiple Files with Llama 2 and RAG

In the previous post, we discussed the process of utilizing Llama 2 and retrieval augmented generation (RAG) for question answering. However, the method shared was designed for a single file, and in many scenarios, it's essential for the chatbot to have knowledge about all the information across multiple input files. This post will demonstrate how to achieve this capability with Llama 2 at no cost.

This post will show:

  • Run Llama 2 with GPUs
  • Create a vector store based on multiple files
  • Question answering based on RAG with multiple files in the vector store

1 Get Llama 2 Ready

Firstly, install the Python dependencies, download the Llama 2 model, and load it. This part is identical to the previous post, so the details are not repeated here.

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

!pip install huggingface_hub   chromadb langchain sentence-transformers pinecone_client

import numpy as np
import pandas as pd

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

from langchain.llms import LlamaCpp
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

# Vector store
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

# Show result
import markdown

!wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_0.gguf

# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.


llm = LlamaCpp(
    model_path=llama_model_path,
    temperature=0.1,
    top_p=1,
    n_ctx=16000,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
)
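Before building the vector store, a quick smoke test confirms that the model loads and generates text; the exact wording of the output will vary from run to run:

print(llm("Q: In one sentence, what is retrieval augmented generation? A:"))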

2 Create Vector Database

Firstly, let's download a dataset:

!wget -q https://www.dropbox.com/s/vs6ocyvpzzncvwh/new_articles.zip
!unzip -q new_articles.zip -d new_articles

These are a bunch of news text files:

[Image: Input News Data]

2.1 Load Files

Load the files using the DirectoryLoader provided by LangChain:

from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = DirectoryLoader('./new_articles/', glob="./*.txt", loader_cls=TextLoader)

documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)
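To get a feel for how much the splitter expanded the corpus, you can print a quick count of source files versus chunks:

print(f"{len(documents)} source documents were split into {len(texts)} chunks")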

2.2 Create the Database

from langchain.embeddings import HuggingFaceEmbeddings


# Save the db in the disk
persist_directory = 'db'

# HuggingFace embedding is free!
embedding = HuggingFaceEmbeddings()

vectordb = Chroma.from_documents(documents=texts, 
                                 embedding=embedding,
                                 persist_directory=persist_directory)

You can persist the database to disk and load it back into the workflow like this:

vectordb.persist()
vectordb = None

vectordb = Chroma(persist_directory=persist_directory, 
                  embedding_function=embedding)

2.3 Make a Retriever

💡
The number of chunks retrieved impacts the result; the k value is the parameter to tweak per your use case.
retriever = vectordb.as_retriever(search_kwargs={"k": 5})

3 RAG

We then use RetrievalQA to retrieve documents from the vector database and supply them to Llama 2 as additional context, thereby increasing its knowledge.

Firstly, create the qa_chain:

# use another LangChain's chain, RetrievalQA, to associate Llama with the loaded documents stored in the vector db
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever
)
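If you also want to see which chunks an answer was grounded on, RetrievalQA can return the retrieved documents alongside the answer via its return_source_documents option; a variant of the chain above:

qa_chain_with_sources = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    return_source_documents=True
)

result = qa_chain_with_sources({"query": "Any news about Hugging Face and ServiceNow?"})
print(result["result"])
for doc in result["source_documents"]:
    print(doc.metadata)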

Then let's ask a few questions about the input documents. Here is the first question:

query = "Any news about Hugging Face and ServiceNow? Also include the source in the response."
llm_response = qa_chain(query)

The result looks like this:

Hugging Face raised $35 million from investors including ServiceNow, according to TechCrunch on May 18, 2022. (Source: TechCrunch)

Let's ask another question:

query = "Any news about Google IO 2023? Also include the source in the response."
llm_response = qa_chain(query)

The answer to the 2nd question is:

Based on the provided context, it seems that Google IO 2023 is expected to announce new hardware, including a foldable smartphone called Pixel Fold, and possibly a budget device called Pixel 7a, as well as updates to Wear OS and developer tools. Additionally, there may be news about Google's AI plans, with generative AI (like Bard) appearing across Google's line of products. However, I don't know the exact details or timeline of these announcements, as the provided context only provides general information about what to expect from the conference.

4 Summary

Up to this point, you can envision the possibilities that Llama 2 unlocks within this workflow, alongside other techniques highlighted in my blog. Notably, it encompasses:

  • Swift inference powered by GPUs
  • Thoughtful responses with appropriate prompts
  • Question answering utilizing a knowledge database
  • A user-friendly web interface

These building blocks empower developers to create more robust applications than ever before. Stay tuned for the unveiling of more exciting products!

Job Aid of Running Streamlit App on Google Colab

Streamlit is a user-friendly, open-source Python framework designed to effortlessly create and share interactive data applications. Whether you're a data scientist, engineer, or analyst, Streamlit empowers you to transform your scripts into robust web applications within minutes, all within the familiar Python environment.

Google Colab, on the other hand, provides a seamless environment for testing ideas related to app development, model training, and Gen AI experiments. It eliminates the need for manual setup of the coding environment and offers the added advantage of free GPUs.

The synergy between Streamlit and Google Colab becomes even more compelling when you can translate your demonstrations into interactive web applications. This enables you to effectively operationalize your ideas. In this post, we'll explore how to leverage Streamlit to build web applications seamlessly within the Google Colab environment.

1 Install Dependencies

Firstly, install Streamlit:

!pip install -q streamlit

Then install localtunnel to serve the Streamlit app

!npm install localtunnel

2 Build Your Apps

Create a demo web application like below:

%%writefile app.py

import streamlit as st

st.write('Hello, *World!* :sunglasses:')

Then run the app using below command:

!streamlit run app.py --server.address=localhost &>/content/logs.txt &

A few files will be created, as shown below:

Job Aid of Running Streamlit App on Google Colab
File System

3 Expose the App

Let's expose the app on port 8501:

import urllib
print("Password/Enpoint IP for localtunnel is:",urllib.request.urlopen('https://ipv4.icanhazip.com').read().decode('utf8').strip("\n"))

!npx localtunnel --port 8501

The return will be like this:

Password/Endpoint IP for localtunnel is: 35.245.122.211
npx: installed 22 in 1.71s
your url is: https://itchy-bikes-smoke.loca.lt

Copy that password and click the URL; it will lead you to a page:

Job Aid of Running Streamlit App on Google Colab
Landing Page

Once you enter the password, the web app is now yours:

Job Aid of Running Streamlit App on Google Colab
The Hello World App

4 Conclusion

This job aid shows how you can build a web app within Google Colab. Now you can move one step further and try to build something cool 😎

]]>
<![CDATA[How to Prompt Correctly with Llama 2?]]>https://realvincentyuan.github.io/Spacecraft/how-to-prompt-correctly-with-llama-2/668a1b22ac15d470add4a2e3Sun, 28 Jan 2024 04:26:24 GMT

You may have encountered instances where Llama 2 provides irrelevant, redundant, or potentially harmful responses. Such outcomes can be perplexing and may lead users to disengage. A contributing factor is often the incorrect use of prompts. Therefore, this post introduces best practices for prompting when developing GenAI apps with Llama 2.

The sample code can run on Google Colab with GPUs; kindly check the post below for the GPU configuration of Llama 2.

Run Llama 2 with Retrieval Augmented Generation in Google Colab with GPUs
Run Llama2 with RAG in Google Colab.
How to Prompt Correctly with Llama 2?

This post will show:

  • Run Llama 2 with GPUs
  • Comparison of different prompts and their impact on the responses of Llama 2
  • Prompt design for chat, with awareness of historical messages

1 Get Llama 2 Ready

Firstly, install the Python dependencies, then download and load the Llama 2 model. This part is identical to the reference link above, so the details are not repeated here.

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

!pip install huggingface_hub   chromadb langchain sentence-transformers pinecone_client

import numpy as np
import pandas as pd

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

from langchain.llms import LlamaCpp
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

# Vector store
from langchain.document_loaders import CSVLoader
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

# Show result
import markdown

!wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_0.gguf

# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

from langchain.llms import LlamaCpp
llm = LlamaCpp(
    model_path=llama_model_path,
    temperature=0.1,
    top_p=1,
    n_ctx=16000,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
)

2 Impact of Different Prompts

It is pretty amazing that slightly different prompts can lead to quite different responses. This can be seen from the simple tests below.

2.1 Just Ask Questions

For instance, the most straightforward way is just to ask what you want like below:

Testing_message = "The Stoxx Europe 600 index slipped 0.5% at the close, extending a lackluster start to the year."

# Use LangChain's PromptTemplate and LLMChain
prompt = PromptTemplate.from_template(
    "Extract the named entity information from below text: {text}"
)

chain = LLMChain(llm=llm, prompt=prompt)
answer = chain.invoke(Testing_message)

The answer is like below:

 The index has fallen 3.7% since the beginning of January and is down 12.9% from its peak in August last year.
Please provide the named entities as follows:
1. Stoxx Europe 600
2. index
3. Europe
4. January
5. August

As you can see, Llama 2 first repeats the sentence and adds extra information before answering the question, which is not what users expect and makes the output feel somewhat out of control.

2.2 Prompt with System Message

By slightly adjusting the prompt, the response becomes much more reasonable.

prompt = PromptTemplate.from_template(
    "[INST]Extract the important Named Entity Recognition information from this text: {text}, do not add unrelated content in the reply.[/INST]"
)
chain = LLMChain(llm=llm, prompt=prompt)
answer = chain.invoke(Testing_message)

The response becomes:

  Sure! Here are the important named entities recognized in the given text:

1. Stoxx Europe 600 - Index
2. Europe - Continent

Now it does not change the sentence and only answers the question the user asks. This version makes more sense simply because of the addition of [INST] and [/INST] in the prompt. [INST] is one of the special tokens used in the model training process, described in the Llama 2 paper, which helps the model understand the structure of the conversation.

There is also a more flexible way to do this, with the addition of a customizable system message as below:

# creating prompt for large language model
pre_prompt = """[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.

If you cannot answer the question from the given documents, please state that you do not have an answer.
<</SYS>>
"""

prompt = pre_prompt + "{context}\n" + "Question : {question}" + "[/INST]"
llama_prompt = PromptTemplate(template=prompt, input_variables=["context", "question"])

chain = LLMChain(llm=llm, prompt=llama_prompt)

result = chain({ "context" : "Extract the named entity information from below sentences:",
                "question": Testing_message
                 })

The result is as below:

  Sure, I'd be happy to help! Here is the named entity information extracted from the sentence you provided:

* Stoxx Europe 600 index
* Europe
* year

I hope this helps! Let me know if you have any other questions.

In fact, the template below strictly follows the training procedure of Llama 2. With it, you can customize the system message more flexibly, though the response might look similar to that of the simplified version shown above.

<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_message }} [/INST]
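
For illustration, a small helper like the one below shows one way to fill this template in Python. The function name and arguments are my own, not part of LangChain or the Llama 2 tooling:

def format_llama2_prompt(system_prompt, user_message):
    # Assemble a single-turn Llama 2 chat prompt following the template above
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

print(format_llama2_prompt(
    "You are a helpful, respectful and honest assistant.",
    "Extract the named entities from: The Stoxx Europe 600 index slipped 0.5%."
))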

2.3 Prompt with Chat History

Llama 2 does not have memory, so using the prompt above would not allow the model to answer questions based on the chat history. To address this challenge, we need to embed the chat history into the prompt so that the model is exposed to it.

Below is a vanilla version that works. First, run the first round of question answering:

def create_chat_prompt(pre_prompt, chat_messages):
  # Append the prior conversation turns to the system prompt
  for dict_message in chat_messages:
      if dict_message["role"] == "user":
          pre_prompt += "User: " + dict_message["content"] + "\n\n"
      else:
          pre_prompt += "Assistant: " + dict_message["content"] + "\n\n"

  # Leave a placeholder for the new question and close the instruction block
  prompt = pre_prompt + "User : {question}" + "[/INST]"
  return prompt

messages = [{"role": "assistant", "content": "How may I assist you today?"}]

# creating prompt for large language model
pre_prompt = """[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.

If you cannot answer the question from the given documents, please state that you do not have an answer.
<</SYS>>
"""


# Build the prompt from the system message and the (initially short) chat history
prompt = create_chat_prompt(pre_prompt=pre_prompt, chat_messages=messages)

llama_chat_prompt = PromptTemplate(template=prompt, input_variables=["question"])

chat_chain = LLMChain(llm=llm, prompt=llama_chat_prompt)

chat_question = { "question": "What do you think are the most beautiful things in the Autumn in the U.S.?"}

messages.append({"role": "user", "content": chat_question['question']})

response = chat_chain(chat_question)

messages.append({"role": "Assistant", "content": response['text']})

The response is like:

I'm here to help you with your question. However, I must inform you that the concept of "beauty" is subjective and can vary from person to person. What one person finds beautiful may not be the same for another.
That being said, Autumn in the U.S. is known for its vibrant foliage, with trees changing colors from green to shades of orange, red, and yellow. The most beautiful things about Autumn in the U.S. can be observed through the following:
1. Leaves: The vibrant colors of the leaves during Autumn are a sight to behold. The changing hues of orange, red, and yellow create a picturesque landscape that is both breathtaking and Instagram-worthy.
2. Fall Foliage: The term "fall foliage" refers to the vibrant colors of the leaves during Autumn. The changing colors of the leaves are a result of the trees preparing for winter by storing energy in their roots, stems, and branches.
3. Pumpkins: Pumpkins are a staple of Autumn in the U.S., often used in decorations

This is a pretty standard response. Now let's ask a follow-up question based on the response above:

prompt = create_chat_prompt(pre_prompt=pre_prompt, 
                            chat_messages=messages)
                            
llama_chat_prompt = PromptTemplate(template=prompt, input_variables=["question"])

chat_chain = LLMChain(llm=llm, prompt=llama_chat_prompt)


chat_question = { "question": "For the 2nd point you mentioned above, can you please make a 3 day travel plan?"}
messages.append({"role": "user", "content": chat_question})

response = chat_chain(chat_question)

messages.append({"role": "Assistant", "content": response['text']})

As you can see, the question does not specify what the 2nd point is, yet the model is able to grasp it based on the chat history. The response is as below:

Of course! I'd be happy to help you create a 3-day travel plan for experiencing the beautiful things about Autumn in the U.S. Here is a sample itinerary:
Day 1:
* Stop 1: Take a scenic drive through the Adirondack Mountains in upstate New York. The mountains offer breathtaking views of the changing leaves, and there are many scenic overlooks and hiking trails to explore.
* Stop 2: Visit the Hudson River Valley, which is known for its picturesque towns, farms, and vineyards. Take a stroll through the charming streets of Cold Spring or Beacon, and enjoy the fall foliage along the riverfront.
Day 2:
* Stop 1: Head to New England, specifically Vermont or New Hampshire, for some of the most spectacular fall foliage in the country. Take a drive through the Green Mountains or White Mountains, and stop at scenic overlooks and hiking trails along the way.
* Stop 2: Visit the coastal towns of Maine, such as Kennebunkport or Camden

3 Summary

Some of the snippets are not wrapped into functions, purely for demo purposes, but you can see that by adding system messages and chat history into the prompt, Llama 2 becomes even more intelligent and helpful.

So far, we have covered topics of Llama 2 regarding:

  • Fast inference using GPUs
  • Better prompt tactics for reasonable response
  • Chat with Llama 2
  • RAG for domain knowledge question & answering

This means that a lot of useful apps powered by Llama 2 can be built using the above tech stack. Stay tuned for more valuable sharing!

Reference

How to Prompt Llama 2:

Llama 2 is here - get it on Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
How to Prompt Correctly with Llama 2?

]]>
<![CDATA[Build a Fraud Intelligence Analyst Powered by Llama 2 in Google Colab with GPUs]]>https://realvincentyuan.github.io/Spacecraft/build-a-fraud-intelligence-analyst-powered-by-llama-2-in-google-colab-with-gpus/668a1b22ac15d470add4a2e2Sat, 13 Jan 2024 04:55:34 GMT

Last time, we introduced how to use GPUs in Google Colab to run RAG with Llama 2. Today, a practical use case is discussed - fraudulent credit card transaction detection, powered by Llama 2.

Run Llama 2 with Retrieval Augmented Generation in Google Colab with GPUs
Run Llama2 with RAG in Google Colab.
Build a Fraud Intelligence Analyst Powered by Llama 2 in Google Colab with GPUs

Fraud detection is a critical task for businesses of all sizes. By identifying and investigating fraudulent transactions, businesses can protect their bottom line and keep their customers safe.

Llama 2 is a large language model that can be used to generate text, translate languages, write different kinds of creative content, and more. In this post, we'll show you how to use Llama 2 to build a Fraud Intelligence Analyst that can detect fraudulent patterns of credit card transactions and answer any questions regarding the transactions.

This Fraud Intelligence Analyst can be used to help fraud detection analysts and data scientists build better solutions to the fraud detection problem. By providing insights into the data, the Fraud Intelligence Analyst can help analysts identify new patterns of fraud and develop new strategies to combat it.

This post will show:

  • Load the Llama 2 gguf model from HuggingFace
  • Run Llama 2 with GPUs
  • Create a vector store from a CSV file that has credit card transaction data
  • Perform question answering using Retrieval Augmented Generation (RAG)

1 Dependencies

Firstly, install Python dependencies as below:

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

!pip install huggingface_hub   chromadb langchain sentence-transformers pinecone_client

Then import dependencies as below:

import numpy as np
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

from langchain.llms import LlamaCpp
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

# Vector store
from langchain.document_loaders import CSVLoader
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

# Show result
import markdown

This credit card transaction dataset will be used to create the vector store:

from google.colab import drive
drive.mount('/content/drive')

source_text_file = '/content/drive/MyDrive/Research/Data/GenAI/credit_card_fraud.csv'

The transaction data is like below:

transaction time merchant amt city_pop is_fraud
2019-01-01 00:00:44 "Heller, Gutmann and Zieme" 107.23 149 0
2019-01-01 00:00:51 Lind-Buckridge 220.11 4154 0
2019-01-01 00:07:27 Kiehn Inc 96.29 589 0
2019-01-01 00:09:03 Beier-Hyatt 7.77 899 0
2019-01-01 00:21:32 Bruen-Yost 6.85 471 1
💡
The public fraud credit card transaction data can be found here: https://www.datacamp.com/workspace/datasets/dataset-python-credit-card-fraud
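
Before building the vector store, it can help to take a quick look at the CSV with pandas. This is a minimal sketch and assumes the column names shown above (e.g. is_fraud):

import pandas as pd

# Quick sanity check of the transaction data
df = pd.read_csv(source_text_file)
print(df.shape)
print(df.head())
print(df['is_fraud'].value_counts())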

2 Load Llama 2 from HuggingFace

Firstly create a callback manager for the streaming output of text, and specify the model names in the HuggingFace:

# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Download the model
!wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_0.gguf

Then specify the model path to be loaded into LlamaCpp:

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

Specify the GPU settings:

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

Next, let's load the model using langchain as below:

from langchain.llms import LlamaCpp
llm = LlamaCpp(
    model_path=llama_model_path,
    temperature=0.0,
    top_p=1,
    n_ctx=16000,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
)
💡
Be sure to set n_gpu_layers and n_batch; the output shows BLAS = 1 if the GPU offload is configured correctly.

3 Question Answering

This time the CSV loader is used to embed a table and create a vector database; the Llama 2 model will then answer questions based on that file.

3.1 Create a Vector Store

Firstly let's load the CSV data from Colab:

embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

loader = CSVLoader(source_text_file, encoding="windows-1252")
documents = loader.load()

# Create a vector store
db = Chroma.from_documents(documents, embedding_function)
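
As a quick check that the store was populated (a sketch, not part of the original flow), you can run a similarity search against it before building the QA chain:

# Retrieve the closest matching row for a test query
matches = db.similarity_search("fraudulent transaction", k=1)
print(matches[0].page_content)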

3.2 RAG

We then use RetrievalQA to retrieve documents from the vector database and give Llama 2 more context, thereby extending its knowledge.

# use another LangChain's chain, RetrievalQA, to associate Llama with the loaded documents stored in the vector db
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=db.as_retriever(search_kwargs={"k": 1})
)

Then the model is ready for your questions:

question = "Do you see any common patter for those fraudulent transactions? think about this step by step and provide examples for each pattern that you found."
result = qa_chain({"query": question})
print(markdown.markdown(result['result']))

The response is like:

Yes, I can identify some common patterns in the provided data for fraudulent transactions. Here are some examples of each pattern I found:

1. Recurring Transactions: There are several recurring transactions in the dataset, such as those with the same date and time every day or week. For example, transaction #86cad0e7682a85fa6418dde1a0a33a44 has a recurrence pattern of every Monday at 5:50 AM. While this alone does not necessarily indicate fraud, it could be a sign of automated or scripted transactions.

2. High-Value Transactions: Some transactions have unusually high values compared to the average transaction amount for the merchant and category. For example, transaction #86cad0e7682a85fa6418dde1a0a33a44 has an amt of $32.6, which is significantly higher than the average transaction amount for gas transport merchants in Browning, MO ($19.2). This could indicate a fraudulent transaction.

3. Multiple Transactions from Same IP Address: ...

4 Conclusion

In fraud detection, case studies are a common and important part of the process. However, they can be labor-intensive to create. Llama 2 and RAG can help to automate this process, making it more efficient and effective.

Llama 2 and RAG can be used to generate case studies that are tailored to specific questions or scenarios. This can help fraud detection analysts to identify patterns and trends that they might not otherwise have seen. Additionally, the case studies can be used to train new analysts on the latest fraud detection techniques.

Llama 2 and RAG are still in development, but they have the potential to revolutionize the way fraud detection case studies are conducted. By making it easier to create and analyze case studies, these tools can help fraud detection analysts stay ahead of the curve.

Stay tuned for more applications like this one!

Reference

Langchain - llama.cpp:

Llama.cpp | 🦜️🔗 Langchain
llama-cpp-python is a
Build a Fraud Intelligence Analyst Powered by Llama 2 in Google Colab with GPUs

Create a vector store using CSV files:

How to use CSV files in vector stores with Langchain
A guide for using CSV files in vector stores with langchain
Build a Fraud Intelligence Analyst Powered by Llama 2 in Google Colab with GPUs

]]>
<![CDATA[Run Llama 2 with Retrieval Augmented Generation in Google Colab with GPUs]]>https://realvincentyuan.github.io/Spacecraft/run-llama-2-with-retrieval-augmented-generation-rag-in-google-colab-with-gpus/668a1b22ac15d470add4a2e1Sun, 07 Jan 2024 21:53:28 GMT

Utilizing GenAI models on Colab with its free GPUs proves advantageous for GenAI developers. It enables faster execution compared to personal computers lacking powerful GPUs, thereby allowing the testing of more ideas within the same timeframe.

Run Llama 2 with Retrieval Augmented Generation in Google Colab with GPUs
Colab GPU

This post will show you how you can:

  • Load the Llama 2 gguf model from HuggingFace
  • Run Llama 2 with GPUs
  • Create a vector store using Pinecone
  • Perform question answering using Retrieval Augmented Generation (RAG)

1 Dependencies

Firstly, install Python dependencies as below:

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

!pip install huggingface_hub   chromadb langchain sentence-transformers pinecone_client

Then import dependencies as below:

import numpy as np
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

from langchain.llms import LlamaCpp
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

Then mount the Google Drive to load the NBA player sample data shared by Meta in the llama-recipes repo. This dataset will be used to create the vector store:

from google.colab import drive
drive.mount('/content/drive')

source_text_file = '/content/drive/MyDrive/Research/Data/GenAI/nba.txt'
Run Llama 2 with Retrieval Augmented Generation in Google Colab with GPUs
NBA Player Sample Data

2 Load Llama 2 from HuggingFace

Firstly create a callback manager for the streaming output of text, and specify the model names in the HuggingFace:

# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Download the model
!wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_0.gguf

Then specify the model path to be loaded into LlamaCpp:

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

Specify the GPU settings:

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

Next, let's load the model using langchain as below:

from langchain.llms import LlamaCpp
llm = LlamaCpp(
    model_path=llama_model_path,
    temperature=0.0,
    top_p=1,
    n_ctx=16000,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
)
💡
Be sure to set n_gpu_layers and n_batch; the output shows BLAS = 1 if the GPU offload is configured correctly.

3 RAG

Retrieval Augmented Generation (RAG) is important because it addresses key limitations of large language models (LLMs). Here's why:

  • Factual Accuracy: LLMs can be creative and articulate, but they aren't always truthful. RAG integrates external knowledge sources, ensuring generated responses are grounded in real facts.
  • Reduced Hallucinations: LLMs can sometimes invent information or make false claims. RAG combats hallucinations by providing LLMs with reliable context from external sources.
  • Domain Expertise: LLMs struggle with specialized topics. RAG allows them access to specific knowledge bases, like medical journals or legal documents, enhancing their responses in niche areas.
  • Transparency and Trust: RAG systems can show their work! Users can see the sources used to generate responses, building trust and enabling fact-checking.

In short, RAG makes LLMs more reliable, accurate, and versatile, opening doors for their use in areas like education, legal advice, and scientific research. It's a crucial step towards trustworthy and grounded AI.

3.1 Initialize Pinecone

Let's import a few related packages and initialize Pinecone - a vector store provider.

💡
Quick start for Pinecone setup: https://docs.pinecone.io/docs/quickstart
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

from langchain.embeddings import HuggingFaceEmbeddings

Get your Pinecone API key and env here:

PINECONE_API_KEY = ''
PINECONE_ENV = ''

And initialize it:

import pinecone
from langchain.vectorstores import Pinecone

# Initialize Pinecone
pinecone.init(
    api_key=PINECONE_API_KEY,  
    environment=PINECONE_ENV  
)

pinecone_index_nm = 'qabot'

3.2 Create a Vector Store

Firstly let's load the data from Colab:

embeddings = HuggingFaceEmbeddings()

# Load the document, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader(source_text_file).load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

Then create the vector store:

# Send embedding vectors to Pinecone with Langchain

vstore = Pinecone.from_documents(documents, embeddings, index_name=pinecone_index_nm)
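
As a quick sanity check (a sketch, not part of the original workflow), you can query the new index directly before attaching it to a chain:

# Return the most similar chunk for a test question
matches = vstore.similarity_search("Atlanta Hawks roster", k=1)
print(matches[0].page_content[:300])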

3.3 RAG

We then use RetrievalQA to retrieve documents from the vector database and give Llama 2 more context, thereby extending its knowledge.

# use another LangChain's chain, RetrievalQA, to associate Llama with the loaded documents stored in the vector db
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vstore.as_retriever(search_kwargs={"k": 1})
)

Then the model is ready for your questions:

question = "Who is the tallest in Atlanta Hawks"
result = qa_chain({"query": question})

The response is like:

{'query': 'Who is the tallest in Atlanta Hawks',
 'result': ' The tallest player on the Atlanta Hawks roster is Saddiq Bey at 6\'7".'}

4 Conclusion

Utilizing the open-source Llama 2 model with RAG, you can create a robust chatbot tailored to your domain knowledge. This capability proves highly beneficial for enterprise users, as it circumvents privacy concerns and data leaks, ensuring everything operates in-house in theory.

However, there's still more to uncover in our quest to construct a secure and responsible GenAI app at the enterprise level. Stay tuned for further updates.

Reference

Langchain - llama.cpp:

Llama.cpp | 🦜️🔗 Langchain
llama-cpp-python is a
Run Llama 2 with Retrieval Augmented Generation in Google Colab with GPUs
]]>
<![CDATA[Say Hello to 2024]]>https://realvincentyuan.github.io/Spacecraft/say-hello-to-2024/668a1b22ac15d470add4a2e0Sun, 31 Dec 2023 20:51:10 GMT

As we near the conclusion of 2023, I find myself eager to reflect on the year and gather my thoughts for the upcoming year, 2024. This past year has been incredibly fruitful and meaningful, prompting me to consider my plans and aspirations for the future in this post.

1 Things Accomplished

The theme for 2023 revolves around stepping out of one's comfort zone! Here's a glimpse into my endeavors:

  • Relocated to the U.S. and established a new life.
  • Spearheaded a sophisticated project for my company.
  • Rekindled my passion for tennis, making remarkable advancements.
  • Crafted SpaceCraft (the blog) from the ground up.
  • Explored numerous destinations across the U.S. and captured stunning moments through photography.

Shall we delve deeper into each of these experiences?

1.1 Migration to the U.S.

Making the decision to leave my entire family, including my 18-month-old daughter, behind in pursuit of a better life in the U.S. was an arduous choice for a 30-year-old man like myself. My aim is to advance my career prospects, ensure a secure future for my family, and offer enhanced educational opportunities for my daughter.

The day it actually happened marked one of the most challenging moments for all of us as a family, bidding farewell at Shanghai Airport. Uncertainty clouded my mind—I couldn't fathom when I might reunite with them. I found myself alone in a country thousands of miles away from home, with the responsibility to establish a life on my own.

Upon arriving in the U.S., the challenges multiplied. Everything seemed different—I struggled to open a bank account, acquire a social security number, and even rent a car. Simple tasks like shopping for groceries became a puzzle as I navigated unfamiliar stores. Loneliness loomed large, especially during nights and weekends, making the initial period here deeply unsettling.

Say Hello to 2024
Photo by Vincent Yuan @USA / Unsplash

However, I persevered by adhering to my goals, enabling me to realign both my work and personal life swiftly. As I honed my focus, I gradually embraced the lifestyle here through adaptation and learning. It's crucial to prioritize life goals, exhibit resilience, and persist despite distractions. This approach significantly impacts your long-term journey, making a substantial difference.

1.2 Progress in Work

Since arriving in the U.S., my primary focus at work has been credit card application fraud detection, employing a combination of knowledge graph and machine learning techniques. This project stands out as one of the most intricate endeavors I've undertaken, encompassing both technical complexities and strategic business considerations. Leveraging insights from previous experiences was instrumental in its success, and I've predominantly shared these insights within the Tech section of this blog, using publicly available datasets.

1.3 Polished Tennis Skills

Playing tennis can evoke a complex array of emotions, particularly when the game proves challenging to grasp. In such instances, the experience can be far from enjoyable. Picture yourself repeatedly collecting tennis balls under scorching temperatures exceeding 35°C (95°F) for 2 to 3 hours due to an inability to sustain a rally with opponents at the outset.

Say Hello to 2024
Practice of Tennis, Photo by Vincent Yuan @USA / Unsplash

Tennis ceased to be just a recreational activity; it evolved into a pursuit that toughened me. Throughout summers, I devoted numerous nights and weekends to honing my serving and striking skills against a wall, armed only with a bottle of water and a tree brimming with tennis balls cascading out of the court. It marked the initial occasion I approached a sport with such earnestness and committed myself to extensive practice, determined to master it regardless of the challenges.

While I may not yet classify myself as an intermediate amateur player, significant progress has undeniably been achieved. Now, I derive more enjoyment from the game than the frustration it once caused me.

1.4 Developed SpaceCraft

I deliberated extensively before deciding whether to relaunch my blog and determining its content and style. Fortunately, I reached a conclusion from an objective viewpoint: just write if I enjoy it, keeping it that simple!

I consider myself fortunate not to have tainted it by viewing it solely as a means for making money, as this would significantly impact the content I share. Therefore, I approach it as a platform to chronicle my life and impart something meaningful to the world—nothing more!

For insights into the technical aspects of constructing this blog, please refer to the series below:

Host a Ghost Blog on AWS in 2023 (IV) - Create a Functional Table of Contents
Table of contents is easy for users to navigate through the long article, but Ghost did not provide a nice out-of-the-box solution.
Say Hello to 2024

1.5 Traveling and Learning

Traveling has significantly enriched my life in the U.S. I've gained extensive knowledge about this country and captured numerous wonderful photos along the way. It brings me immense joy to know that many of my friends thoroughly enjoy the posts in the Life section of my blog and admire my photography.

Since February 2023, I've curated and uploaded 863 photos to my portfolio on Unsplash. Remarkably, these images have garnered an average of 200,000 views and 1,000 downloads per month.

Photography is a passion of mine, and I find genuine happiness in knowing that my work resonates with people worldwide.

Vincent Yuan @USA (@vincentyuan87) | Unsplash Photo Community
See 863 of the best free to download photos, images, and wallpapers by Vincent Yuan @USA on Unsplash.
Say Hello to 2024

2 Looking Ahead to 2024

I believe I'm headed in the right direction, with no significant changes needed. However, I plan to dedicate more time to integrating generative AI into my workflow as I see this as a crucial factor for the long-term development of my career. I'll be sharing more posts on this topic later when the timing is right.

In addition to work, I'll be allocating most of my time to the following:

  • Investment
  • Photography
  • Tennis

I'll also be exploring these subjects for potential posts in either the Pro or Life columns. Stay tuned for more!

]]>
<![CDATA[Highlights of the Trip to the East Coast of the United States]]>https://realvincentyuan.github.io/Spacecraft/highlights-of-the-trip-to-east-coast-america/668a1b22ac15d470add4a2dfSun, 31 Dec 2023 05:14:02 GMT

It's been some time since my last trip, and this time, I embarked on a 7-day adventure spanning Washington D.C., Delaware, Philadelphia (Philly), and New York. Equipped with my new camera, the Canon EOS R8 paired with a 24-105mm lens, purchased during the Black Friday season, I seized the opportunity to capture numerous breathtaking vistas during the entirety of my journey.

Highlights of the Trip to the East Coast of the United States
Route

1 Washington D.C.

Washington, D.C., the capital of the United States, is known for several prominent aspects:

  • Monuments and Memorials: It's famous for its iconic landmarks like the Washington Monument, Lincoln Memorial, Jefferson Memorial, and the Martin Luther King Jr. Memorial, which honor key figures and pivotal moments in American history.
  • Government Buildings: Washington, D.C. is home to the White House (the residence of the U.S. President), the U.S. Capitol (where Congress meets), the Supreme Court, and various federal agencies, making it the political center of the country.
  • Museums and Cultural Institutions: The city hosts numerous world-class museums and galleries, including the Smithsonian Institution, comprising multiple museums such as the National Air and Space Museum, National Museum of American History, National Museum of Natural History, and many others, showcasing art, history, culture, and scientific achievements.
  • Cherry Blossom Festival: Each spring, the city's cherry blossom trees bloom, drawing millions of visitors to the National Mall. The National Cherry Blossom Festival celebrates this natural beauty and includes various events and activities.
  • Cultural Diversity: Washington, D.C. is a diverse city with a rich cultural tapestry, reflected in its neighborhoods, cuisine, festivals, and events. It's a melting pot of different cultures, languages, and traditions.
  • Education and Research: The city hosts several renowned universities, think tanks, and research institutions, including Georgetown University, George Washington University, and the Brookings Institution, contributing to its intellectual vibrancy.
  • Historic Neighborhoods: Areas like Georgetown, Capitol Hill, and Dupont Circle offer historic charm, cobblestone streets, unique architecture, and a blend of residential, commercial, and cultural spaces.
  • Political Activism and Protests: Being the seat of the U.S. government, Washington, D.C. often serves as a focal point for political activism, demonstrations, and protests on various national and international issues.
  • Sports and Entertainment: The city has professional sports teams like the Washington Football Team (NFL), Washington Nationals (MLB), Washington Wizards (NBA), and Capitals (NHL), along with a vibrant entertainment scene with theaters, live music venues, and restaurants.
  • International Influence: As the capital of the United States, Washington, D.C. plays a significant role in international diplomacy, hosting embassies, international organizations, and summits that shape global policies.

These elements collectively contribute to the distinctiveness and significance of Washington, D.C. in the United States and worldwide.

Highlights of the Trip to the East Coast of the United States
Washington Monument, Photo by Vincent Yuan @USA / Unsplash

The grand government buildings here stand out from those in other cities. They exude a sense of decency and cleanliness, and they blend in seamlessly with the rest of the city's architecture.

Highlights of the Trip to the East Coast of the United States
United States Capitol, Photo by Vincent Yuan @USA / Unsplash

During the Christmas season, the city transforms with an abundance of Christmas trees and dazzling lights. This festive setup adds a welcoming and cozy vibe to the cityscape, making it more accessible and warm compared to other times of the year.

Highlights of the Trip to the East Coast of the United States
Library of Congress, Photo by Vincent Yuan @USA / Unsplash

Georgetown stands apart from downtown D.C. with its unique charm. Here, you'll find quaint yet elegant buildings that create a different vibe. The streets are brimming with charming shops, offering a delightful experience. It's the perfect spot to savor local seafood, coffee, and pastries.

Highlights of the Trip to the East Coast of the United States
Georgetown, Photo by Vincent Yuan @USA / Unsplash

Please be aware that most museums in D.C. offer free admission, providing ample opportunities to explore and learn. They are fantastic places to discover and expand your knowledge.

Highlights of the Trip to the East Coast of the United States
Smithsonian National Museum of Natural History, Photo by Vincent Yuan @USA / Unsplash

2 Delaware

We dedicated a day to Delaware, not so much for its tourist spots, but for its remarkable sales tax-free advantage. Outlets Delaware stands out as an excellent shopping destination when in the state.

That said, we also cherished the picturesque views of the Atlantic Ocean near Rehoboth Beach, Delaware.

Highlights of the Trip to the East Coast of the United States
Rehoboth Beach, Photo by Vincent Yuan @USA / Unsplash

The water alongside the beach shimmered with vibrant hues from the afternoon sunlight, creating a picturesque scene. It was an ideal spot for a leisurely walk and relaxation, offering a clean and tranquil ambiance.

3 Philadelphia

This was where I learned a significant part of American history, because Philadelphia witnessed several crucial historical events. For instance, at Independence Hall, the Declaration of Independence was signed on August 2, 1776.

Highlights of the Trip to the East Coast of the United States
Independence Hall, Photo by Vincent Yuan @USA / Unsplash

At the Independence Hall tour, knowledgeable rangers guide you through crucial historical events along the timeline. This immersive experience allows you to vividly sense the significant, destiny-altering moments of the United States.

Moreover, if you haven't tried it yet, don't miss the chance to savor a Philly cheese steak—it's truly something special!

4 New York

New York City offers an abundance of attractions and experiences, and it is one of the most dynamic cities on the East Coast in my eyes.

4.1 Museums

  • The Metropolitan Museum of Art (The Met): One of the world's largest and most comprehensive art museums, housing an extensive collection spanning various cultures and time periods.
Highlights of the Trip to the East Coast of the United States
The Met, Photo by Vincent Yuan @USA / Unsplash
  • Museum of Modern Art (MoMA): Known for its impressive collection of modern and contemporary art, including works by Picasso, Van Gogh, and Warhol.
  • American Museum of Natural History: Features fascinating exhibits on natural history, including dinosaur fossils, animal dioramas, and a planetarium.
  • The Guggenheim Museum: Known for its unique architecture designed by Frank Lloyd Wright and its collection of modern and contemporary art.
  • The Whitney Museum of American Art: Showcases a vast collection of American contemporary art, including works by renowned artists.

4.2 Scenery

  • Central Park: A vast green oasis in the heart of Manhattan offering scenic walking paths, lakes, meadows, and recreational activities like boating, biking, and picnicking.
Highlights of the Trip to the East Coast of the United States
Bethesda Terrace, Photo by Vincent Yuan @USA / Unsplash
  • The High Line: A unique elevated park built on a former railway line, offering stunning views of the cityscape, art installations, gardens, and seating areas.
  • Brooklyn Bridge: An iconic suspension bridge offering beautiful views of the Manhattan skyline and the East River, perfect for walking or cycling across.
Highlights of the Trip to the East Coast of the United States
Brooklyn Bridge, Photo by Vincent Yuan @USA / Unsplash
  • Top of the Rock and Empire State Building: Observation decks providing panoramic views of the city from above.
Highlights of the Trip to the East Coast of the United States
Top of the Rock, Photo by Vincent Yuan @USA / Unsplash
  • Statue of Liberty and Ellis Island: Visit these historic landmarks to learn about immigration history and enjoy views of the New York Harbor.
Highlights of the Trip to the East Coast of the United States
Statue of Liberty, Photo by Vincent Yuan @USA / Unsplash

4.3 Food

  • Diverse Cuisine: New York City offers a vast array of cuisines from around the world. Enjoy everything from Michelin-starred restaurants to food trucks and local delis.
  • Pizza: Grab a slice of New York-style pizza from iconic spots like Joe's Pizza, Di Fara Pizza, or Lombardi's.
  • Bagels: Sample authentic New York bagels from places like Russ & Daughters, Ess-a-Bagel, or Absolute Bagels.
  • Food Markets: Visit food markets like Chelsea Market or Smorgasburg to indulge in a variety of local and international food vendors.
  • Fine Dining: Explore acclaimed restaurants offering diverse culinary experiences, from high-end steak houses to innovative fine dining establishments.

New York City's diverse cultural offerings, breathtaking scenery, and vibrant culinary scene ensure there's something enjoyable for every visitor. We spent 3 days in New York City, and it was not nearly enough to enjoy this amazing city.

5 Remarks

On the East Coast, while you may not encounter as many stunning natural landscapes within the renowned national parks, the diverse blend of culture, architecture, museums, and culinary delights offers a unique experience unlike other parts of America.

Such a trip serves as a refreshing escape, ideal for unwinding after a year of hard work. It's a wonderful way to conclude 2023. Until 2024—Happy New Year!

]]>
<![CDATA[Build a Web-based GPT Chatbot with Custom Knowledge]]>https://realvincentyuan.github.io/Spacecraft/build-a-gpt-chatbot/668a1b22ac15d470add4a2deWed, 13 Dec 2023 04:01:57 GMT

The advancement of generative AI is remarkable. Now, with approximately 70 lines of Python code predominantly leveraging OpenAI GPT, llama-index, and Streamlit, you can craft chatbots infused with your specialized domain knowledge. This enables the creation of a personalized assistant tailored specifically for your needs.

This post is going to share how you can build your own powerful Chatbot for your own data!

1 Dependency

A few Python packages are needed for this app:

pip install streamlit openai llama-index nltk

Also, please get an OpenAI API key by following this guide:

  • Go to https://platform.openai.com/account/api-keys.
  • Click on the + Create new secret key button.
  • Enter an identifier name (optional) and click on the Create secret key button.
  • Copy the API key to be used in this tutorial.

2 Create the Chatbot

The project folder can be set up like this:

chatbot
|_ main.py
|_ data
|_ .streamlit
   |_ secrets.toml
File Tree

These files are:

  • a main.py to store the code of the app.
  • a data folder that stores the input data.
  • a secrets.toml in the .streamlit folder that holds the OpenAI API key, like openai_key = 'Your OpenAI API key'

The main.py looks like this; this is all it needs to pull up the chatbot:

import streamlit as st
from llama_index import VectorStoreIndex, ServiceContext, Document
from llama_index.llms import OpenAI
import openai
from llama_index import SimpleDirectoryReader

st.set_page_config(page_title="Chat with the Streamlit docs, powered by LlamaIndex", page_icon="🦙", layout="centered", initial_sidebar_state="auto", menu_items=None)
openai.api_key = st.secrets.openai_key
st.title("Chat with the domain knowledge, powered by LlamaIndex 💬🦙")
st.info("Check out the full tutorial to build this app in our [blog post](https://blog.streamlit.io/build-a-chatbot-with-custom-data-sources-powered-by-llamaindex/)", icon="📃")
         
if "messages" not in st.session_state.keys(): # Initialize the chat messages history
    st.session_state.messages = [
        {"role": "assistant", "content": "Ask me a question about the fils you uploaded just now!"}
    ]

# Utilities functions    
def save_uploadedfile(uploadedfile):
    import os
    
    with open(os.path.join("data",uploadedfile.name),"wb") as f:
         f.write(uploadedfile.getbuffer())
            
    return st.success(f"File saved to data folder!" )    
    
    
@st.cache_resource(show_spinner=False)
def load_data():
    with st.spinner(text="Loading and indexing the Streamlit docs – hang tight! This should take 1-2 minutes."):
        reader = SimpleDirectoryReader(input_dir="./data", recursive=True)
        docs = reader.load_data()
        service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo", temperature=0.5, system_prompt="You are an expert on the fraud detection and your job is to answer technical questions. Assume that all questions are related to the credit card fraud detection. Keep your answers technical and based on facts – do not hallucinate features."))
        index = VectorStoreIndex.from_documents(docs, service_context=service_context)
        return index

    
    
# Main app

datafile = st.file_uploader("Upload your data (string only)",type=['str','csv','txt'])

if datafile is not None:

    save_uploadedfile(datafile)


    index = load_data()

    if "chat_engine" not in st.session_state.keys(): # Initialize the chat engine
            st.session_state.chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)

    if prompt := st.chat_input("Your question"): # Prompt for user input and save to chat history
        st.session_state.messages.append({"role": "user", "content": prompt})

    for message in st.session_state.messages: # Display the prior chat messages
        with st.chat_message(message["role"]):
            st.write(message["content"])

    # If last message is not from assistant, generate a new response
    if st.session_state.messages[-1]["role"] != "assistant":
        with st.chat_message("assistant"):
            with st.spinner("Thinking..."):
                response = st.session_state.chat_engine.chat(prompt)
                st.write(response.response)
                message = {"role": "assistant", "content": response.response}
                st.session_state.messages.append(message) # Add response to message history
GPT Chatbot

Intuitively, it accepts input from users; in this example it only accepts text and CSV files, and you can tweak the accepted input types following the Streamlit guide.

The input file is then saved and used to create a vector store, which the GPT model refers to during the question-answering sessions.

3 Activate the Chatbot

In the terminal of your laptop, run command:

streamlit run main.py
Activate the App

The app is then up, and you can upload your own files to it. Once that is done, the chat box will appear and you can ask questions regarding the data.

The sample credit card transaction data can be downloaded at datacamp.
Build a Web-based GPT Chatbot with Custom Knowledge
Chatbot UI
💡
Watch out for the cost of OpenAI API usage; set a spending limit to avoid unexpectedly intensive usage.

Reference

LlamaIndex app demo on Streamlit Blog

Build a chatbot with custom data sources, powered by LlamaIndex
Augment any LLM with your own data in 43 lines of code!
Build a Web-based GPT Chatbot with Custom Knowledge

Streamlit file uploader

st.file_uploader - Streamlit Docs
st.file_uploader displays a file uploader widget.
Build a Web-based GPT Chatbot with Custom Knowledge

]]>
<![CDATA[Build a macOS App Powered by Llama 2 in Three Steps]]>https://realvincentyuan.github.io/Spacecraft/build-a-macos-app-powered-by-llama-2-in-three-steps/668a1b22ac15d470add4a2ddThu, 07 Dec 2023 03:30:11 GMT

Generative AI has been generating a lot of buzz lately, and among the most discussed open-source large language models is Llama 2 made by Meta. Recently, the renowned Hugging Face team introduced a tool enabling the utilization of large language models within macOS applications. This post aims to guide you through the process of integrating and leveraging Llama 2 effortlessly, empowering you to build your own applications seamlessly.

1 Prerequisite

In order to run the steps in this post, it is suggested that you have:

  • A computer running macOS
  • Git, or GitHub client
  • Xcode, available for free in macOS App Store
  • Installed a few dependent Python packages

Following these steps, you can build a macOS app that you can interact with like the one below:

Build a macOS App Powered by Llama 2 in Three Steps
macOS App Powered by Llama 2

2 Swift-Chat Code

Clone the Swift-chat GitHub repo by using below command:

git clone https://github.com/huggingface/swift-chat

If you are not familiar with Git/GitHub, kindly go to this tutorial:

How to Use GitHub without Writing a Single Line of Code
Healthy food for thoughts about this beautiful life!
Build a macOS App Powered by Llama 2 in Three Steps

3 Xcode Build

Use Xcode to open the project file - SwiftChat.xcodeproj in that Git repo cloned in the last step:

Build a macOS App Powered by Llama 2 in Three Steps
Xcode Project

Click the play button on the top left to build the project and the app will show up.

4 Download Llama 2 CoreML Model

A CoreML model is required to be loaded into the app. There are many ways to convert PyTorch/TensorFlow models into a CoreML model, as quoted below:

1. Use the transformers-to-coreml conversion Space:
This is an automated tool built on top of exporters (see below) that either works for your model, or doesn't. It requires no coding: enter the Hub model identifier, select the task you plan to use the model for, and click apply. If the conversion succeeds, you can push the converted Core ML weights to the Hub, and you are done!
2. Use exporters, a Python conversion package built on top of Apple's coremltools (see below).
This library gives you a lot more options to configure the conversion task. In addition, it lets you create your own conversion configuration class, which you may use for additional control or to work around conversion issues.
3. Use coremltools, Apple's conversion package.
This is the lowest-level approach and therefore provides maximum control. It can still fail for some models (especially new ones), but you always have the option to dive inside the source code and try to figure out why.

But per my experiment, the easiest way is to download the Llama 2 CoreML model from Hugging Face as below:

4.1 Install huggingface_hub

In the terminal, run below command to install huggingface_hub:

pip install huggingface_hub
Install huggingface_hub

4.2 Download the Llama 2 CoreML Model

Once huggingface_hub is installed, you can use huggingface-cli to download the model:

huggingface-cli download --local-dir-use-symlinks False --local-dir ~/Download/Llama-2-7b-chat-coreml coreml-projects/Llama-2-7b-chat-coreml
Download Llama CoreML Model

You will be asked to provide a Hugging Face token; if you do not have one, just click the link in the output and generate the token, then paste it in the terminal. You can also tweak the path where the model is saved by altering the value behind the --local-dir parameter.
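
If you prefer to stay in Python instead of the CLI, a rough equivalent uses huggingface_hub's snapshot_download; this is a sketch, and the target folder is just an example:

from huggingface_hub import snapshot_download

# Download the Core ML weights from the Hugging Face Hub to a local folder
snapshot_download(
    repo_id="coreml-projects/Llama-2-7b-chat-coreml",
    local_dir="Llama-2-7b-chat-coreml",
    local_dir_use_symlinks=False,
)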

5 Run the App

If the preceding steps have been successful, the app's user interface should now be visible. Proceed by clicking the button located in the left sidebar to load the model, enabling you to enjoy utilizing your personalized application!

This additional information serves to enhance the details not covered in the original post, simplifying the process. Much credit goes to the Hugging Face team for their remarkable efforts in this endeavor—kudos to them!

Reference

Post regarding the release of Swift Transformers:

Releasing Swift Transformers: Run On-Device LLMs in Apple Devices
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
Build a macOS App Powered by Llama 2 in Three Steps

Llama 2 CoreML model made by Hugging Face

coreml-projects/Llama-2-7b-chat-coreml · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
Build a macOS App Powered by Llama 2 in Three Steps
]]>
<![CDATA[An Overview of Generative AI]]>https://realvincentyuan.github.io/Spacecraft/an-overview-of-generative-ai/668a1b22ac15d470add4a2dcSun, 03 Dec 2023 20:51:43 GMT

Generative AI (GenAI) is definitely a hype; it has been overwhelming since the debut of ChatGPT in late 2022, and ever since then, a lot of progress has been made in both tech and regulation. Therefore, this post summarizes the most important things to know about GenAI with a holistic view.

1 Introduction

Generative AI refers to a subset of artificial intelligence that focuses on creating or generating new content, data, or outputs based on patterns and knowledge learned from existing data. This form of AI is particularly adept at generating novel content that resembles and aligns with the patterns observed in the training data. Generative models aim to produce outputs that mimic the characteristics of the input data they were trained on.

1.1 Evolution of GenAI

The history and evolution of generative AI span several decades and have seen significant advancements in machine learning and artificial intelligence research. Here's a brief overview:

Early Years (1950s - 1980s):

  • Early Concepts: The foundational concepts of artificial intelligence emerged in the 1950s and 1960s. Researchers explored the idea of machines simulating human-like intelligence.
  • Early Generative Models: Early generative models like Markov chains and simple probabilistic models were developed. These models generated basic sequences of text or data based on probabilistic rules.

Neural Networks Resurgence (1980s - 2000s):

  • Neural Networks: Neural networks gained attention in the 1980s and 1990s, but due to limitations in computing power and data, they were not extensively explored for generative tasks.
  • Restricted Success: Generative models during this period struggled due to challenges in training deep networks and limited datasets.

Rise of Deep Learning (2010s - Present):

  • Deep Learning Revolution: The advent of big data, increased computing power, and breakthroughs in deep learning techniques led to a resurgence of interest in generative models.
  • Generative Adversarial Networks (GANs): Introduced by Ian Goodfellow and his colleagues in 2014, GANs became a groundbreaking architecture for generative tasks. GANs involve a generator and discriminator competing against each other, leading to impressive results in image generation and beyond.
  • Variational Autoencoders (VAEs): VAEs emerged as another prominent framework for generative modeling, focusing on learning latent representations and generating data based on learned distributions.
  • Transformer Models: Transformers, introduced in the paper Attention is All You Need by Vaswani et al. in 2017, revolutionized natural language processing (NLP) and enabled advanced text generation tasks like language translation, text summarization, and more.

2 Core Concepts

Generative AI is based on several fundamental principles that enable machines to generate new content or data resembling patterns observed in the training data. Key principles include unsupervised learning, probabilistic modeling, and latent space representation.

2.1 Unsupervised Learning:

  • Definition: Unsupervised learning involves training AI models on unlabeled data without explicit supervision or predefined targets.
  • Role in Generative AI: Generative models use unsupervised learning to extract patterns and structures from the data. They learn the underlying distribution of the input data to generate new samples.
  • Example: Autoencoders and Generative Adversarial Networks (GANs) are unsupervised learning approaches widely used in generative AI.

2.2 Probabilistic Modeling:

  • Definition: Probabilistic modeling involves representing uncertainty in data using probability distributions and statistical methods.
  • Role in Generative AI: Generative models use probability distributions to capture the complexity of the data. They learn the likelihood of generating data and sample from these distributions to create new data points.
  • Example: Variational Autoencoders (VAEs) use probabilistic modeling to learn a latent space representation and generate data samples based on the learned distributions.

2.3 Latent Space Representation:

  • Definition: Latent space is a lower-dimensional representation learned by the model that captures meaningful features or characteristics of the data.
  • Role in Generative AI: Generative models encode high-dimensional data into a lower-dimensional latent space. This space represents underlying features, allowing the model to generate new data points by decoding these representations.
  • Example: In VAEs, the encoder-decoder architecture learns to map data into a latent space and reconstruct data from latent space representations.

2.4 How They Intersect in Generative AI:

  • Unsupervised Learning + Probabilistic Modeling: Generative models leverage unsupervised learning techniques to capture data distributions probabilistically. They learn representations of data and generate new samples probabilistically based on these learned distributions.
  • Latent Space Representation + Probabilistic Modeling: Models like VAEs use latent space representations that follow specific probability distributions. By sampling from these distributions, they generate new data points with controlled characteristics.

Generative AI models employ these principles to understand, model, and generate new data that captures the underlying patterns and structures present in the training data. These principles contribute to the creativity and flexibility of generative models, enabling them to produce realistic and diverse outputs.
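To make these principles concrete, below is a minimal, illustrative sketch of the VAE-style sampling step described above. It uses toy numbers and no training, and assumes only NumPy; the encoder and decoder are stand-ins rather than learned networks.

import numpy as np

# Toy illustration of VAE-style generation (no training involved):
# an "encoder" provides the parameters of a latent Gaussian q(z|x),
# a latent sample is drawn via the reparameterization trick,
# and a "decoder" maps the sample back to data space.
rng = np.random.default_rng(42)

latent_dim = 2
data_dim = 4

# Stand-in encoder outputs: mean and log-variance of the latent distribution
mu = np.array([0.5, -1.0])
log_var = np.array([-0.2, 0.1])

# Reparameterization trick: z = mu + sigma * epsilon, with epsilon ~ N(0, I)
epsilon = rng.standard_normal(latent_dim)
z = mu + np.exp(0.5 * log_var) * epsilon

# Stand-in decoder: a fixed linear map from latent space to data space
W = rng.standard_normal((data_dim, latent_dim))
b = np.zeros(data_dim)
x_generated = W @ z + b

print("Latent sample z:", z)
print("Generated data point:", x_generated)
Toy sketch of latent-space sampling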

3 Applications

Generative AI is rapidly transforming various industries and has led to the development of innovative applications that are changing the way we interact with technology. Here are some of the most famous applications of generative AI today, along with examples and references:

3.1 Image Generation and Manipulation

a) Artbreeder: A collaborative art platform that allows users to create and share AI-generated images and manipulate existing images using a variety of AI-powered tools. (https://www.artbreeder.com/)

3.2 Text Generation and Summarization

a) Jasper: An AI-powered writing assistant that helps writers generate creative text formats, like poems, code, scripts, musical pieces, email, letters, etc. (https://www.jasper.ai/)

3.3 Code Generation

a) GitHub Copilot: An AI-powered code completion tool that helps programmers write more efficient and error-free code by suggesting relevant code snippets based on the context. (https://github.com/features/copilot)

3.4 Audio and Music Generation

a) MuseNet: A generative model that can generate new musical pieces in various styles, including classical, jazz, and pop. (https://openai.com/research/musenet)

3.5 Video Generation

a) Runway ML: A cloud-based platform that provides various AI-powered tools for video editing, including automatic background removal, object tracking, and style transfer. (https://runwayml.com/)

3.6 Drug Discovery and Development

a) Atomwise: An AI-powered drug discovery platform that uses generative models to design new drug candidates with improved properties. (https://www.atomwise.com/)

3.7 Customer Service and Chatbots

a) Dialpad: An AI-powered customer service platform that uses chatbots to provide automated support and answer customer questions. (https://www.dialpad.com/features/artificial-intelligence/)

These examples demonstrate the wide range of applications for generative AI, and its potential to revolutionize various industries. As generative AI models continue to evolve, we can expect to see even more innovative applications emerge in the years to come.

4 Ethical Considerations

While Generative AI has the potential to transform industries and bring about a productivity revolution, it also raises ethical concerns that need to be addressed.

Some of the key ethical concerns of using Generative AI include:

Distribution of harmful content: While Generative AI systems can generate human-like content that enhances business productivity, they can also produce harmful or offensive content. The most concerning harm stems from tools like Deepfakes that can create false images, videos, text, or speech that is agenda-driven or fuels hate speech. Such harmful content calls for human intervention to align it with the business ethics of the organization leveraging this technology.

Copyright and legal exposure: Generative AI models are trained on large volumes of data that may include copyrighted material, which can infringe upon the copyrights and intellectual property rights of other companies. This can create legal, reputational, and financial risk for the company using pre-trained models and can negatively impact creators and copyright holders.

Data privacy violations: The underlying training data may contain sensitive information, including personally identifiable information (PII). Common ethical principles such as transparency, accountability, data privacy, and robustness place responsibility on the technology providers. Keeping informed of developments in these areas and engaging with the discourse around AI ethics is an essential part of making sure generative AI remains safe for all users.

Addressing these concerns is essential to responsible Generative AI development and deployment.

5 Outlook

Here are some of the key trends and challenges that are likely to shape the future of GenAI:

Big picture

Continued growth and investment: The field of GenAI is expected to continue to grow rapidly in the coming years, as more investment pours into the development of new models and applications.

Increasing sophistication of models: GenAI models are becoming increasingly sophisticated, thanks to advances in machine learning and artificial intelligence.

Expanding range of applications: GenAI is being used in a wider range of applications, from creating art and music to designing drugs and developing new materials.

Growing concerns about ethical implications: There is growing concern about the ethical implications of GenAI, such as the potential for bias, discrimination, and misuse.

Individual

New creative tools: GenAI is providing individuals with new tools for creativity, such as software that can generate realistic images, music, and code.

Personalized experiences: GenAI is being used to create personalized experiences for individuals, such as customized news feeds and recommendations.

Augmented decision-making: GenAI can be used to augment human decision-making, by providing insights and recommendations.

Job displacement: GenAI has the potential to displace some jobs, as machines become capable of performing tasks that were previously done by humans.

Overall, the outlook for GenAI is positive. GenAI has the potential to make our lives easier, more creative, and more productive. However, it is important to be aware of the ethical implications of this technology and to develop safeguards to ensure that it is used responsibly.

]]>
<![CDATA[Named Entity Recognition and Knowledge Graph for Natural Language Understanding]]>https://realvincentyuan.github.io/Spacecraft/ner-graph-1/668a1b22ac15d470add4a2dbMon, 27 Nov 2023 03:37:21 GMT

When handling textual descriptions in data science tasks, numerous approaches exist to comprehend them, including topic modeling, text classification, and sentiment analysis. However, these methods often require additional explanation before audiences can fully grasp the results, particularly individuals without technical expertise.

In this post, a novel method will be introduced, aiming to interpret relationships between elements in an easily understandable manner. This approach utilizes named entity recognition and knowledge graphs, offering a more accessible way to comprehend connections within the data.

1 Named Entity Recognition

Named Entity Recognition (NER) is a subfield of natural language processing (NLP) that deals with the identification and classification of named entities in text. Named entities are real-world objects or concepts that are mentioned in text, such as people, places, organizations, and products.

Here are a few well-known NER solutions built with Python:

  1. spaCy: spaCy is a widely used and efficient library for natural language processing in Python. It provides a pre-trained statistical model for named entity recognition and allows customization of NER pipelines.
  2. NLTK (Natural Language Toolkit): NLTK is a comprehensive library for working with human language data. It includes various tools and modules for NER, although it may require more manual configuration than spaCy.
  3. Hugging Face Transformers: Hugging Face Transformers is a popular library providing various pre-trained language models for NER and other NLP tasks. It's known for its extensive collection of state-of-the-art models and easy-to-use pipelines.
  4. AllenNLP: AllenNLP is a powerful NLP library built on PyTorch, offering pre-built models and tools for various NLP tasks, including NER.
  5. Flair: Flair is an NLP library that allows for easy use of pre-trained word embeddings and state-of-the-art NLP models. It provides functionality for NER among other tasks.
  6. Stanford NER: Stanford NER is a widely-used Java-based tool that also offers Python bindings. It uses a statistical model to recognize named entities in text.

Here, spaCy is used to perform NER on the texts below:

texts = [
    "Apple is planning to release a new iPhone model next month.",
    "Samsung unveiled its latest flagship smartphone at the tech conference.",
    "Google is investing heavily in artificial intelligence research.",
    "Microsoft announced a major update for its Windows operating system.",
    "Amazon launched a new line of smart home devices.",
    "Google faced criticism over its data privacy policies.",
    "Tesla's CEO Elon Musk tweeted about the company's future plans.",
    "Uber is expanding its services to additional cities worldwide.",
    "Netflix released a teaser for its upcoming original series.",
    "Apple introduced new features for its social media platform."
]

The spaCy pipeline is then applied, and the results are converted into a Pandas DataFrame:

💡
Run python -m spacy download en_core_web_md in the terminal if you have not downloaded en_core_web_md before.
import pandas as pd
import spacy

# Load the English language model for spaCy
nlp = spacy.load("en_core_web_md")

# Initialize lists to store extracted data
data = {'text': [], 'entities': [], 'labels': []}

# Process each text using spaCy NER
for text in texts:
    doc = nlp(text)
    entities = [ent.text for ent in doc.ents]
    labels = [ent.label_ for ent in doc.ents]
    
    # Append data to respective lists
    data['text'].append(text)
    data['entities'].append(entities)
    data['labels'].append(labels)

# Create a pandas DataFrame
sp_df = pd.DataFrame(data)

sp_df.reset_index(inplace=True)
sp_df
NER with spaCy

This is the output:

index text entities labels
0 0 Apple is planning to release a new iPhone mode... [Apple, iPhone, next month] [ORG, PRODUCT, DATE]
1 1 Samsung unveiled its latest flagship smartphon... [Samsung] [ORG]
2 2 Google is investing heavily in artificial inte... [Google] [ORG]
3 3 Microsoft announced a major update for its Win... [Microsoft, Windows] [ORG, PRODUCT]
4 4 Amazon launched a new line of smart home devices. [Amazon] [ORG]
5 5 Google faced criticism over its data privacy p... [Google] [ORG]
6 6 Tesla's CEO Elon Musk tweeted about the compan... [Tesla, Elon Musk] [ORG, PERSON]
7 7 Uber is expanding its services to additional c... [Uber] [ORG]
8 8 Netflix released a teaser for its upcoming ori... [Netflix] [ORG]
9 9 Apple introduced new features for its social m... [Apple] [ORG]
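If the label abbreviations in the output are unfamiliar, spaCy ships with a small helper to look them up:

import spacy

# Print human-readable descriptions of the entity labels returned above
for label in ["ORG", "PRODUCT", "DATE", "PERSON"]:
    print(label, "->", spacy.explain(label))
Explain entity labels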

2 Graph Analytics

2.1 Create the Relationship DataFrame

This step creates a DataFrame containing the source and target of each relationship:

# Empty lists to store relationship data
indexes = []
text_entities = []
entity_types = []
relationships = []

# Iterate through each row in the DataFrame
for idx, row in sp_df.iterrows():
    index = row['index']
    text = row['text']
    entities = row['entities']
    labels = row['labels']

    # Extract relationships between text and entities based on labels
    for i in range(len(entities)):
        indexes.append(index)
        text_entities.append(text)
        entity_types.append(entities[i])
        relationships.append(labels[i])

# Create a new DataFrame for relationships
relationship_df = pd.DataFrame({
    'Index':indexes,
    'Text': text_entities,
    'Entities': entity_types,
    'Relationship': relationships
})
Create Relationships

The relationship_df looks like the table below, where Index works as the source and Entities as the target.

Index Text Entities Relationship
0 0 Apple is planning to release a new iPhone mode... Apple ORG
1 0 Apple is planning to release a new iPhone mode... iPhone PRODUCT
2 0 Apple is planning to release a new iPhone mode... next month DATE
3 1 Samsung unveiled its latest flagship smartphon... Samsung ORG
4 2 Google is investing heavily in artificial inte... Google ORG
5 3 Microsoft announced a major update for its Win... Microsoft ORG
6 3 Microsoft announced a major update for its Win... Windows PRODUCT
7 4 Amazon launched a new line of smart home devices. Amazon ORG
8 5 Google faced criticism over its data privacy p... Google ORG
9 6 Tesla's CEO Elon Musk tweeted about the compan... Tesla ORG
10 6 Tesla's CEO Elon Musk tweeted about the compan... Elon Musk PERSON
11 7 Uber is expanding its services to additional c... Uber ORG
12 8 Netflix released a teaser for its upcoming ori... Netflix ORG
13 9 Apple introduced new features for its social m... Apple ORG

2.2 Create Graphs

Given the above relationships, the graph can be created as below:

import networkx as nx

# Assuming relationship_df is the DataFrame from the previous step
# This DataFrame has columns 'Text', 'Entities', 'Relationship'

# Create an empty directed graph
graph = nx.DiGraph()

# Add edges and relationships to the graph
for _, row in relationship_df.iterrows():
    index = row['Index']
    text = row['Text']
    entity = row['Entities']
    relationship = row['Relationship']
    graph.add_edge(index, entity, relationship=relationship, desc=text)

# Display nodes, edges, and their attributes
print("Nodes:", graph.nodes())
print("Edges:")
for edge in graph.edges(data=True):
    print(edge)
Create Graphs

The output is:

Nodes: [0, 'Apple', 'iPhone', 'next month', 1, 'Samsung', 2, 'Google', 3, 'Microsoft', 'Windows', 4, 'Amazon', 5, 6, 'Tesla', 'Elon Musk', 7, 'Uber', 8, 'Netflix', 9]
Edges:
(0, 'Apple', {'relationship': 'ORG', 'desc': 'Apple is planning to release a new iPhone model next month.'})
(0, 'iPhone', {'relationship': 'PRODUCT', 'desc': 'Apple is planning to release a new iPhone model next month.'})
(0, 'next month', {'relationship': 'DATE', 'desc': 'Apple is planning to release a new iPhone model next month.'})
(1, 'Samsung', {'relationship': 'ORG', 'desc': 'Samsung unveiled its latest flagship smartphone at the tech conference.'})
(2, 'Google', {'relationship': 'ORG', 'desc': 'Google is investing heavily in artificial intelligence research.'})
(3, 'Microsoft', {'relationship': 'ORG', 'desc': 'Microsoft announced a major update for its Windows operating system.'})
(3, 'Windows', {'relationship': 'PRODUCT', 'desc': 'Microsoft announced a major update for its Windows operating system.'})
(4, 'Amazon', {'relationship': 'ORG', 'desc': 'Amazon launched a new line of smart home devices.'})
(5, 'Google', {'relationship': 'ORG', 'desc': 'Google faced criticism over its data privacy policies.'})
(6, 'Tesla', {'relationship': 'ORG', 'desc': "Tesla's CEO Elon Musk tweeted about the company's future plans."})
(6, 'Elon Musk', {'relationship': 'PERSON', 'desc': "Tesla's CEO Elon Musk tweeted about the company's future plans."})
(7, 'Uber', {'relationship': 'ORG', 'desc': 'Uber is expanding its services to additional cities worldwide.'})
(8, 'Netflix', {'relationship': 'ORG', 'desc': 'Netflix released a teaser for its upcoming original series.'})
(9, 'Apple', {'relationship': 'ORG', 'desc': 'Apple introduced new features for its social media platform.'})
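The attributes stored on each edge can also be queried directly, which is handy when tracing an entity back to its source sentence. A small example, assuming the graph built above is still in memory:

# Retrieve the attributes stored on the edge between text 0 and the entity 'Apple'
edge_data = graph.get_edge_data(0, 'Apple')
print(edge_data['relationship'])  # ORG
print(edge_data['desc'])          # the original sentence
Query edge attributes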

2.3 Visualization

The linkage can be visualized as below:

import matplotlib.pyplot as plt

# Perform connected component analysis
connected_components = list(nx.connected_components(graph.to_undirected()))

# Find the connected components with more than one node
connected_components = [component for component in connected_components if len(component) > 1]

# Display the connected components
print("Connected components:")
for idx, component in enumerate(connected_components, start=1):
    print(f"Component {idx}: {component}")

# Visualize the graph (optional)
pos = nx.spring_layout(graph)
nx.draw(graph, pos, with_labels=True, node_size=500, font_size=8)
plt.show()
Visualization of Graphs

The output is:

Connected components:
Component 1: {0, 9, 'next month', 'Apple', 'iPhone'}
Component 2: {1, 'Samsung'}
Component 3: {2, 5, 'Google'}
Component 4: {'Microsoft', 3, 'Windows'}
Component 5: {4, 'Amazon'}
Component 6: {'Elon Musk', 'Tesla', 6}
Component 7: {'Uber', 7}
Component 8: {8, 'Netflix'}
Linkage

3 Conclusion

Unlike the other NLP tasks highlighted earlier, this NER plus knowledge graph approach surfaces the crucial information within the text and leverages graph structures to interconnect shared information. The method tends to be more deterministic, sidestepping lengthy learning curves and potential controversies when conveying insights to an audience.

Moreover, numerous graph algorithms can be employed to delve deeper into the connections among these elements. These explorations will be detailed in subsequent posts.
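As a small preview of what such algorithms can do, the hedged sketch below (assuming the graph built in section 2.2 is still in memory) computes degree centrality to surface the most connected nodes, which in this toy dataset are the entities mentioned in multiple texts:

import networkx as nx

# Degree centrality highlights the nodes (texts or entities) with the most connections;
# entities such as 'Apple' and 'Google' that appear in several texts score higher.
centrality = nx.degree_centrality(graph)

# Show the five most connected nodes first
for node, score in sorted(centrality.items(), key=lambda item: item[1], reverse=True)[:5]:
    print(node, round(score, 3))
Degree centrality of the graph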

]]>