Build a Fraud Intelligence Analyst Powered by Llama 2 in Google Colab with GPUs

Photo by Vincent Yuan @USA / Unsplash

Last time, we introduced how to use GPUs in Google Colab to run RAG with Llama 2. Today we look at a practical use case: detecting fraudulent credit card transactions with Llama 2.

Previous post: Run Llama 2 with Retrieval Augmented Generation in Google Colab with GPUs

Fraud detection is a critical task for businesses of all sizes. By identifying and investigating fraudulent transactions, businesses can protect their bottom line and keep their customers safe.

Llama 2 is a large language model that can be used to generate text, translate languages, write different kinds of creative content, and more. In this post, we'll show you how to use Llama 2 to build a Fraud Intelligence Analyst that can detect fraudulent patterns of credit card transactions and answer any questions regarding the transactions.

This Fraud Intelligence Analyst can be used to help fraud detection analysts and data scientists build better solutions to the fraud detection problem. By providing insights into the data, the Fraud Intelligence Analyst can help analysts identify new patterns of fraud and develop new strategies to combat it.

This post shows how to:

  • Load a Llama 2 GGUF model from HuggingFace
  • Run Llama 2 with GPUs
  • Create a vector store from a CSV file that has credit card transaction data
  • Perform question answering using Retrieval Augmented Generation (RAG)

1 Dependencies

Firstly, install Python dependencies as below:

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

!pip install huggingface_hub chromadb langchain sentence-transformers pinecone_client

Then import dependencies as below:

import numpy as np
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

from langchain.llms import LlamaCpp
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

# Vector store
from langchain.document_loaders import CSVLoader
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

# Show result
import markdown

This credit card transaction dataset will be used to create the vector store:

from google.colab import drive
drive.mount('/content/drive')

source_text_file = '/content/drive/MyDrive/Research/Data/GenAI/credit_card_fraud.csv'

The transaction data looks like this:

transaction time      merchant                     amt     city_pop  is_fraud
2019-01-01 00:00:44   "Heller, Gutmann and Zieme"  107.23  149       0
2019-01-01 00:00:51   Lind-Buckridge               220.11  4154      0
2019-01-01 00:07:27   Kiehn Inc                    96.29   589       0
2019-01-01 00:09:03   Beier-Hyatt                  7.77    899       0
2019-01-01 00:21:32   Bruen-Yost                   6.85    471       1
💡
The public fraud credit card transaction data can be found here: https://www.datacamp.com/workspace/datasets/dataset-python-credit-card-fraud
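
Before embedding anything, it can help to confirm how the file parses. The snippet below is a minimal sketch that uses only the standard library and runs on the five sample rows shown above rather than the full CSV. The header names are copied from the sample display, so the real file may use different ones, and `summarize` is a hypothetical helper name.

```python
import csv
import io

# The five sample rows shown above, written as CSV text.
# Header names are copied from the sample display; the real file may differ.
SAMPLE_CSV = '''transaction time,merchant,amt,city_pop,is_fraud
2019-01-01 00:00:44,"Heller, Gutmann and Zieme",107.23,149,0
2019-01-01 00:00:51,Lind-Buckridge,220.11,4154,0
2019-01-01 00:07:27,Kiehn Inc,96.29,589,0
2019-01-01 00:09:03,Beier-Hyatt,7.77,899,0
2019-01-01 00:21:32,Bruen-Yost,6.85,471,1
'''

def summarize(csv_text):
    """Return (row count, fraud count, mean amount) for a CSV string."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    fraud_rows = [r for r in rows if r["is_fraud"] == "1"]
    mean_amt = sum(float(r["amt"]) for r in rows) / len(rows)
    return len(rows), len(fraud_rows), mean_amt

n_rows, n_fraud, mean_amt = summarize(SAMPLE_CSV)
print(f"{n_rows} rows, {n_fraud} flagged as fraud, mean amt {mean_amt:.2f}")
```

Note that the quoted merchant name contains a comma, which is why a real CSV parser is used here instead of splitting on commas.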

2 Load Llama 2 from HuggingFace

First, create a callback manager for streaming text output, then download the model file from HuggingFace:

# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Download the model
!wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_0.gguf
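
As an alternative to wget, the `hf_hub_download` function imported earlier can fetch the same file. This is a sketch, not what this post ran: the repository and file name are taken from the URL above, and `fetch_model` is a hypothetical wrapper.

```python
# Repository and file name taken from the wget URL above.
REPO_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
FILENAME = "llama-2-7b-chat.Q5_0.gguf"

def fetch_model(repo_id=REPO_ID, filename=FILENAME):
    """Download the GGUF file into the local Hugging Face cache and return its path."""
    # Imported lazily so that defining this function needs no network access.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=repo_id, filename=filename)

# In Colab, the returned path can be used as the model path:
# model_path = fetch_model()
```

The download lands in the Hugging Face cache directory, so the returned path should be passed to the loader instead of the bare file name.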

Then specify the model path to be loaded into LlamaCpp:

model_path = 'llama-2-7b-chat.Q5_0.gguf'

Specify the GPU settings:

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
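
One rough way to reason about n_gpu_layers is to assume the model weights are spread evenly across layers and see how many fit in free VRAM. The helper below is only a back-of-the-envelope sketch: `layers_that_fit`, the 2 GiB reserve, and the example sizes are all illustrative assumptions, not measurements from this notebook.

```python
def layers_that_fit(model_bytes, n_layers, free_vram_bytes, reserve_bytes=2 * 1024**3):
    """Estimate how many layers fit in VRAM, assuming equal-sized layers.

    reserve_bytes is a hypothetical safety margin for the KV cache and
    scratch buffers; tune it for your own setup.
    """
    per_layer = model_bytes / n_layers
    usable = max(free_vram_bytes - reserve_bytes, 0)
    return min(n_layers, int(usable // per_layer))

GIB = 1024**3
# Illustrative numbers only: a 4.65 GiB model file with 32 layers.
big_gpu = layers_that_fit(4.65 * GIB, 32, 15 * GIB)   # plenty of free VRAM
small_gpu = layers_that_fit(4.65 * GIB, 32, 4 * GIB)  # a much smaller GPU
print(big_gpu, small_gpu)
```

When the estimate equals the model's layer count, the whole model fits, and any n_gpu_layers value at or above that count should offload everything.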

Next, let's load the model using langchain as below:

from langchain.llms import LlamaCpp
llm = LlamaCpp(
    model_path=model_path,
    temperature=0.0,
    top_p=1,
    n_ctx=4096,  # Llama 2 was trained with a 4096-token context window
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
)
💡
Be sure to set n_gpu_layers and n_batch. If the GPU build is set up correctly, the verbose output shows BLAS = 1.

3 Question Answering

This time the CSV loader is used to load a table, which is then embedded into a vector database. The Llama 2 model then answers questions based on that file.

3.1 Create a Vector Store

First, let's load the CSV data from Google Drive and embed it:

embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

loader = CSVLoader(source_text_file, encoding="windows-1252")
documents = loader.load()

# Create a vector store
db = Chroma.from_documents(documents, embedding_function)
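
It helps to know what actually gets embedded. As far as we understand, CSVLoader produces one document per row, with the row written as "column: value" lines. The sketch below imitates that formatting with the standard library only; `row_to_document` is a hypothetical helper and the header names are copied from the sample table, so treat this as an approximation of the loader's output rather than its implementation.

```python
import csv
import io

def row_to_document(row):
    """Format one CSV row as 'column: value' lines, one line per column."""
    return "\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items())

# One sample row; header names are copied from the sample table above.
sample = '''transaction time,merchant,amt,city_pop,is_fraud
2019-01-01 00:21:32,Bruen-Yost,6.85,471,1
'''
rows = list(csv.DictReader(io.StringIO(sample)))
doc_text = row_to_document(rows[0])
print(doc_text)
```

Each transaction therefore becomes its own small text chunk in the vector store, which matters later when choosing how many chunks to retrieve.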

3.2 RAG

We then use RetrievalQA to retrieve the most relevant documents from the vector database and pass them to Llama 2 as context, so the model can answer from the transaction data instead of relying only on what it learned during training.

# use another LangChain's chain, RetrievalQA, to associate Llama with the loaded documents stored in the vector db
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=db.as_retriever(search_kwargs={"k": 1})
)
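
To make the retrieval step less of a black box, here is a toy sketch of the idea: score each document against the question, keep the top k, and place them in the prompt. It uses word counts and cosine similarity as a stand-in for the sentence-transformer embeddings, and the prompt template is hypothetical, so this illustrates the pattern rather than what RetrievalQA does internally.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a sentence-transformer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question, context_docs):
    """Place the retrieved documents into one prompt (hypothetical template)."""
    context = "\n\n".join(context_docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

docs = [
    "merchant: Lind-Buckridge\namt: 220.11\nis_fraud: 0",
    "merchant: Bruen-Yost\namt: 6.85\nis_fraud: 1",
]
question = "What was the amt for merchant Bruen-Yost?"
top = retrieve(question, docs, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

With k set to 1, only a single transaction row reaches the model. For questions about patterns across many transactions, a larger k may give the model more to work with, at the cost of a longer prompt.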

Then the model is ready for your questions:

question = "Do you see any common patterns for those fraudulent transactions? Think about this step by step and provide examples for each pattern that you found."
result = qa_chain({"query": question})
print(markdown.markdown(result['result']))

The response looks like this:

Yes, I can identify some common patterns in the provided data for fraudulent transactions. Here are some examples of each pattern I found:

1. Recurring Transactions: There are several recurring transactions in the dataset, such as those with the same date and time every day or week. For example, transaction #86cad0e7682a85fa6418dde1a0a33a44 has a recurrence pattern of every Monday at 5:50 AM. While this alone does not necessarily indicate fraud, it could be a sign of automated or scripted transactions.

2. High-Value Transactions: Some transactions have unusually high values compared to the average transaction amount for the merchant and category. For example, transaction #86cad0e7682a85fa6418dde1a0a33a44 has an amt of $32.6, which is significantly higher than the average transaction amount for gas transport merchants in Browning, MO ($19.2). This could indicate a fraudulent transaction.

3. Multiple Transactions from Same IP Address:

4 Conclusion

In fraud detection, case studies are a common and important part of the process. However, they can be labor-intensive to create. Llama 2 and RAG can help to automate this process, making it more efficient and effective.

Llama 2 and RAG can be used to generate case studies that are tailored to specific questions or scenarios. This can help fraud detection analysts to identify patterns and trends that they might not otherwise have seen. Additionally, the case studies can be used to train new analysts on the latest fraud detection techniques.

Llama 2 and RAG tooling are still maturing, but they have the potential to change how fraud detection case studies are conducted. By making it easier to create and analyze case studies, these tools can help fraud detection analysts stay ahead of the curve.

Stay tuned for more applications like this one!
