Spacecraft

Introduction to Financial Services in TikTok and ByteDance Ecosystems

Vincent Yuan — Sun, 01 Jun 2025 16:40:48 GMT

TikTok and its parent company ByteDance have rapidly expanded into financial services, leveraging their massive user bases and integrated e-commerce platforms. This report examines the lifecycle of Buy Now, Pay Later (BNPL), cash loans, and seller finance products within these ecosystems, detailing application processes, inherent risks, and control mechanisms. By analyzing operational frameworks, regulatory challenges, and consumer behavior trends, this study provides a comprehensive overview of how these financial tools function and their implications for users and merchants.

1. Overview of Financial Products in TikTok and ByteDance

1.1 Buy Now, Pay Later (BNPL)

BNPL services on TikTok, such as TikTok PayLater, allow users to split purchases into interest-free installments. This product targets Gen Z consumers, enabling them to defer payments for goods purchased on TikTok Shop. ByteDance’s broader strategy includes partnerships with fintech firms to streamline cross-border transactions and enhance user engagement.

1.2 Cash Loans

ByteDance has ventured into cash loans through apps like Manfen in China, offering credit limits up to ¥200,000 with low daily interest rates. These loans are marketed as quick solutions for personal expenses, leveraging ByteDance’s data analytics to assess creditworthiness.

1.3 Seller Finance

TikTok Shop Capital provides merchant financing, offering upfront payments to sellers while buyers repay via installments. Partners handle underwriting, with repayments automatically deducted from sales proceeds. This model ensures cash flow stability for small businesses while expanding TikTok’s e-commerce dominance.

Photo by Poster POS / Unsplash

2. Application Processes for Customers and Merchants

2.1 BNPL Activation on TikTok

To activate TikTok PayLater:

Navigate to TikTok Shop and select a product.
At checkout, choose “PayLater” and complete identity verification (e.g., government ID, facial scan).
Set a payment PIN and agree to terms.
Receive instant credit approval, often with limits up to RM 10,000 or PHP 50,000.

ByteDance’s systems use algorithms to evaluate income, occupation, and transaction history, minimizing manual reviews.

2.2 Cash Loan Applications

For Manfen and similar services:

Submit a valid phone number and national ID.
Undergo facial recognition via the app.
Receive loan offers within minutes, with funds disbursed directly to linked accounts.

Approval relies on behavioral data from ByteDance’s platforms, reducing reliance on traditional credit scores.

2.3 Seller Finance Enrollment

Eligible TikTok merchants receive invitations via the Seller Center. Requirements include:

Consistent sales history.
Linked bank accounts for automatic repayment.
Compliance with TikTok’s merchant policies.

Funds are disbursed within 48 hours, with repayments deducted daily or weekly from sales revenue.

Photo by Poster POS / Unsplash

3. Lifecycle of Financial Products

3.1 BNPL Lifecycle

Application: Instant approval via in-app checkout.
Disbursement: Merchants receive full payment upfront; users incur deferred debt.
Repayment: Four equal installments over six weeks, auto-debited from linked cards.
Closure: Account suspended if payments lapse; defaults reported to credit bureaus in some regions.

3.2 Cash Loan Lifecycle

Origination: Algorithmic underwriting using social media and transaction data.
Disbursement: Funds transferred instantly post-approval.
Repayment: Fixed daily/weekly payments over 3–12 months.
Default Management: Penalties include increased interest and debt collection.

3.3 Seller Finance Lifecycle

Offer Generation: Pre-approved limits based on sales performance.
Funding: Immediate liquidity for inventory or expansion.
Repayment: Daily deductions (e.g., 10% of sales) until the advance is settled.
Renewal: Limits increase with reliable repayment history.

Photo by Obi / Unsplash

4. Risks Associated with TikTok/ByteDance Financial Products

4.1 Consumer Risks

Debt Spiral: BNPL’s “pay-in-four” model obscures total costs, leading many Gen Z users to miss payments.
Data Privacy: ByteDance’s access to user behavior data raises concerns about misuse for targeted lending.
Credit Score Impact: Late BNPL payments are increasingly reported to credit agencies, affecting future borrowing.

4.2 Merchant Risks

Overleveraging: Small businesses may overextend with TikTok Shop Capital, risking cash flow shortages.
Platform Dependency: Revenue tied to TikTok’s algorithms; sudden policy changes could disrupt operations.

4.3 Systemic Risks

Regulatory Scrutiny: U.S. and EU regulators are investigating BNPL for inadequate affordability checks.
Operational Failures: Automated repayment systems face glitches, causing erroneous penalties.

Photo by Scott Graham / Unsplash

5. Risk Control Measures

5.1 Algorithmic Underwriting

ByteDance uses machine learning to assess:

BNPL: Purchase history, device location, and social connections.
Cash Loans: App engagement metrics.
Seller Finance: Sales velocity and customer reviews.

5.2 Credit Limits and Repayment Safeguards

Dynamic Limits: BNPL caps adjust based on spending patterns.
Auto-Debit Prioritization: Repayments are prioritized over other transactions to reduce defaults.

5.3 Regulatory Compliance

Licensing: ByteDance acquired financial licenses to legitimize offline transactions.
Transparency: TikTok PayLater now discloses late fees during checkout.

5.4 Consumer Education

In-App Tutorials: Guides on repayment schedules and penalty avoidance.
Spending Alerts: Notifications for upcoming dues, reducing missed payments.

6. Conclusion

TikTok and ByteDance’s financial services reflect a strategic push to monetize their ecosystems, blending e-commerce with embedded finance. While BNPL and seller finance products enhance accessibility, they introduce significant risks—from consumer debt cycles to regulatory backlash. ByteDance’s use of algorithmic underwriting and partnerships with traditional banks mitigates some risks but underscores the need for global regulatory frameworks tailored to digital lending. Moving forward, balancing innovation with consumer protection will be critical as these platforms redefine retail finance.

Payment Service Provider Case Study: Airwallex - Part 2

Vincent Yuan — Sun, 04 May 2025 02:26:02 GMT

Last part, the most important services that Airwallex provides have been briefed.

Payment Service Provider Case Study: Airwallex - Part 1

SpacecraftVincent Yuan

Most Important Services of Airwallex

It is to dive deeper into how Airwallex ensures security for its services as this is one of the most critical cornerstones of a financial services company. Airwallex has established a multi-layered security architecture that combines regulatory compliance, advanced technological safeguards, and proactive threat management to protect its global financial platform. The company's approach addresses both technical vulnerabilities and operational risks through seven key pillars of security.

1. Regulatory Compliance and Certifications

Airwallex maintains the highest international security certifications, demonstrating third-party validation of its controls:

1.1 Payment Card Industry Compliance

As a PCI-DSS Level 1 certified service provider, Airwallex adheres to strict requirements for payment card data protection, including secure network architecture and regular vulnerability testing. This certification covers all card processing activities and requires annual audits by qualified security assessors.

💡

PCI DSS stands for Payment Card Industry Data Security Standard. It's a global standard designed to protect credit card data and ensure the secure handling of payment card information by businesses. Essentially, it's a set of rules and guidelines that organizations must follow to safeguard cardholder data and prevent fraud.

1.2 Information Security Management

The company holds ISO 27001 certification, implementing systematic controls for data confidentiality, integrity, and availability through risk assessments and security measures. This framework governs how Airwallex manages sensitive information across its global operations.

1.3 Financial Services Auditing

Regular SOC 2 Type II audits validate the effectiveness of Airwallex's security controls related to availability, processing integrity, and confidentiality. Ernst & Young conducts these assessments annually, with reports available through Airwallex's security portal.

Photo by 2H Media / Unsplash

2. Technical Security Infrastructure

2.1 Encryption Protocols

All customer data undergoes dual encryption using TLS v1.2 for data in transit and AES-256 for data at rest. This ensures protection against interception during transmission and unauthorized access to stored information, meeting banking-grade security standards.

2.2 Access Management

A mandatory two-factor authentication (2FA) system requires mobile device verification for all account logins. The platform implements role-based access controls with audit logging, while privileged access requires justification and time-bound approvals.

2.3 Network Protection

Airwallex employs a defense-in-depth network strategy featuring:

Next-generation firewalls with intrusion prevention systems
Distributed denial-of-service (DDoS) mitigation through cloud-based scrubbing centers
Continuous traffic monitoring using machine learning anomaly detection

Photo by Taylor Vick / Unsplash

3. Fraud Prevention Mechanisms

3.1 Real-Time Detection Systems

Machine learning models analyze transaction patterns across 150+ billion annual data points, identifying fraudulent activity within 500 milliseconds. The system incorporates behavioral biometrics and device fingerprinting to detect account takeover attempts.

3.2 Identity Verification

Integration with Trulioo's platform enables:

Document verification with liveness detection
Facial recognition biometric matching
Global watchlist screening against PEPs and sanctions lists
This layered approach reduces synthetic identity fraud by 68% compared to industry averages.

3.3 Card Security Measures

All card transactions utilize 3D Secure 2.0 authentication and tokenization. The platform's AI detects card-not-present fraud with 99.2% accuracy while maintaining sub-second authorization times.

Photo by SumUp / Unsplash

4. Data Protection Practices

4.1 Segregation Architecture

Customer funds reside in segregated accounts at partner banks like JP Morgan and Standard Chartered, separate from operational accounts. Daily reconciliation processes ensure transaction integrity across 13 banking jurisdictions.

4.2 Vulnerability Management

Airwallex's invite-only Bug Bounty Program rewards ethical hackers for identifying vulnerabilities, complemented by quarterly penetration tests from firms like NCC Group. The company maintains a mean time to patch of 4.3 hours for critical vulnerabilities.

4.3 Secure Development Lifecycle

All code undergoes static/dynamic analysis and manual review before deployment. The CI/CD pipeline includes automated security testing, with containerized microservices reducing attack surface areas.

5. Operational Security Controls

5.1 24/7 Monitoring

A global security operations center (SOC) analyzes 2.7 million security events daily using SIEM integration with Splunk and Sumo Logic. The team maintains a 15-second response SLA for critical alerts.

5.2 Employee Security

All staff complete quarterly security training with phishing simulation tests achieving 97% detection rates. Access to production systems requires Just-In-Time provisioning and hardware security keys.

5.3 Business Continuity

Airwallex's infrastructure spans 23 availability zones across AWS and Google Cloud, enabling automatic failover with RPO/RTO metrics of <5 minutes. Disaster recovery drills occur bi-annually with full transaction replay testing.

Photo by Sincerely Media / Unsplash

6. Financial Safeguards

6.1 Fund Protection

Partner banks provide FDIC insurance eligibility up to $250,000 per qualified account in the US. Airwallex maintains excess deposit insurance through Lloyd's of London for enterprise clients.

6.2 Transaction Verification

The platform employs multi-signature approval workflows for high-value transactions, requiring consensus from geographically distributed authorization nodes.

7 Conclusion

Through this comprehensive security framework, Airwallex establishes trust with businesses operating in 150+ countries, processing over $50 billion annually while maintaining zero material security breaches since inception. The company's proactive approach positions it as a leader in financial infrastructure security, continually adapting to emerging cyber threats in the global payments landscape.

Payment Service Provider Case Study: Airwallex - Part 1

Vincent Yuan — Sat, 03 May 2025 21:32:07 GMT

Airwallex is a financial technology (fintech) platform that provides cross-border payment solutions and financial services for businesses. Founded in 2015 in Melbourne, Australia, it has quickly grown into a global platform that helps companies manage international payments, treasury, and expenses. Though not a bank, Airwallex is licensed and regulated to provide electronic money and financial services in multiple regions including Australia, Canada, China, Hong Kong, Lithuania, Malaysia, New Zealand, Singapore, the Netherlands, the UK, and the US.

This post is to introduce the most important services that Airwallex provides.

1 Core Financial Infrastructure and Services

Airwallex has developed a robust global financial platform that serves over 150,000 businesses worldwide. The company processed approximately $50 billion in annualized transactions as of 2022 and reached a valuation of US$5.5 billion. Their service offering spans multiple areas of financial operations:

1.1 Business Accounts and Global Money Management

Airwallex provides businesses with multi-currency accounts that function similarly to virtual business accounts. These accounts enable companies to:

Open local currency accounts to receive funds in 20+ currencies
Access banking services in 13 major markets, allowing businesses to receive payments like a local entity
Manage all funds centrally through a unified platform
Convert funds at market-leading interbank rates with markups as low as 0.2% above the interbank exchange rate

The platform eliminates traditional banking obstacles like paperwork and queues while offering superior global coverage across currencies and markets.

1.2 International Payments and Transfers

One of Airwallex's core strengths is facilitating efficient cross-border payments:

Transfer funds in 60+ currencies to over 150 countries worldwide
Achieve remarkably fast processing times with 70% of remittances credited within the same day and 95% of global transfers arriving within the same day
Pay suppliers and employees globally without excessive wait times or fees
Utilize local payment rails to reduce costs associated with international transfers

1.3 Payment Acceptance Solutions

Airwallex offers comprehensive payment processing capabilities that allow businesses to:

Accept payments from both domestic and international customers across 35 countries in Asia-Pacific, Europe, and the Americas
Create localized checkout experiences with pricing in multiple currencies
Offer various alternative payment methods to accommodate customer preferences
Hold earnings in their Airwallex wallet to avoid additional foreign exchange fees
Integrate through multiple options, including payment links, eCommerce platform connections, or custom checkout experiences via APIs

In April 2024, Airwallex expanded its payment acceptance solution to the United States, enabling U.S.-based merchants to accept domestic and international payments while giving foreign merchants with U.S. entities the ability to provide localized payment experiences.

1.4 Corporate Cards and Expense Management

Airwallex's integrated spend management solutions include:

Multi-currency corporate cards that can be issued instantly
Customizable controls to limit spending at the card level
Expense tracking and employee reimbursement capabilities
Zero fees on international purchases, saving on typical foreign transaction charges

1.5 Bill Pay and Accounts Payable

The platform streamlines bill payment processes through:

Centralized management of domestic and international bills
OCR technology that automatically extracts relevant data from uploaded or emailed bills
Customized multi-layer approval workflows to align with company spend policies
Direct payment of bills in multiple currencies at competitive FX rates
Integration with accounting software for faster reconciliation

Photo by Clay Banks / Unsplash

2 Advanced Financial Technology

2.1 API Infrastructure and Embedded Finance

Airwallex has built a sophisticated API infrastructure that enables:

Integration of financial services directly into business workflows
Development of custom financial solutions through flexible API access
Embedded finance opportunities that allow non-financial businesses to offer financial services
Global treasury solutions for streamlining payroll and other financial operations

The company's embedded finance solutions are categorized into three main areas:

Global Treasury: Enabling platforms to provide collection, storage, and disbursement of funds worldwide using 160+ payment methods and payouts to 150+ countries
Payments for Platforms: Allowing programmatic creation of connected accounts with merchants or customers, automatic fund splitting, and global payment acceptance
Banking as a Service: Enabling platforms to embed traditional banking products such as accounts, cards, and borrowing within their own offerings

2.2 Payroll Solutions

Airwallex offers specialized solutions for payroll processing:

Global payroll payment services for international workforces
Domestic payroll payout capabilities
Automated and streamlined end-to-end payroll processes
Business payroll APIs that enable payments to employees worldwide

Photo by Clay Banks / Unsplash

3 Technological Foundation and Future Direction

Built on robust cloud infrastructure through Google Cloud, Airwallex maintains impressive performance metrics, including availability over 99.95% and latency well below 200 milliseconds. The platform can scale to support up to 50,000 transactions per second.

As of early 2025, Airwallex continues to expand its capabilities with new features such as bulk transfer approvals on mobile, instant transfers via Airwallex Pay, and manual fund capture for Shopline transactions. The company is also pursuing geographic expansion, with recent moves into Latin American markets including Mexico and Brazil.

Fraud Detection Mechanisms in the MANIC Payment Network Ecosystem

Vincent Yuan — Sun, 27 Apr 2025 21:50:09 GMT

Risk Mitigation in the MANIC Payment Scheme: A Component-Level Analysis

Risk Mitigation in the MANIC Payment Scheme.

SpacecraftVincent Yuan

The MANIC framework-comprising Merchant, Acquiring Bank, Network, Issuing Bank, and Customer-forms the backbone of credit card transactions. Each entity plays a critical role in detecting and mitigating fraud at different stages of the payment workflow. This report examines the fraud detection strategies employed by each component, leveraging advanced technologies, regulatory frameworks, and collaborative data-sharing practices to secure the ecosystem.

1 Merchant: Initial Transaction Screening and Risk Mitigation

Merchants serve as the entry point for transactions and implement pre-authorization fraud checks to filter suspicious activity before transmitting requests to acquirers. Device fingerprinting analyzes hardware and software configurations to identify anomalies, such as mismatched geolocation data or spoofed devices. Velocity checks flag unusual transaction patterns, such as rapid-fire purchases from a single IP address or device, which often indicate card-testing attacks. For high-risk transactions, merchants deploy 3D Secure (3DS), requiring customers to authenticate via one-time passwords or biometric verification, shifting liability to issuers upon successful authentication.

Point-of-sale (POS) systems also incorporate encryption and tokenization to safeguard cardholder data. For example, Clover’s POS systems encrypt transaction data end-to-end, preventing skimming attacks targeting legacy systems. Additionally, merchants monitor for force-posted fraud, where criminals use forged authorization codes to process offline transactions. By restricting weekend or holiday sales volumes and validating authorization codes, merchants reduce exposure to these schemes.

2 Acquiring Bank: Real-Time Monitoring and Merchant Profiling

Acquiring banks partner with merchants to process transactions while enforcing anti-fraud protocols. Advanced solutions like BANKiQ’s Fraud Risk Control (FRC) platform screen merchants during onboarding, analyzing business types, transaction histories, and hidden affiliations to flag high-risk entities. Post-onboarding, acquirers employ real-time transaction monitoring to detect anomalies such as sudden spikes in chargebacks or mismatched merchant category codes (MCCs).

Mastercard’s Excessive Fraud Merchant (EFM) Program exemplifies acquirer-level oversight. It calculates monthly fraud ratios by dividing fraud chargebacks by prior-month sales, imposing fines on merchants exceeding thresholds (e.g., ≥1,000 transactions and ≥$50,000 in fraud). Acquirers also leverage network-level data to identify cross-merchant fraud patterns. For instance, a surge in declines from specific card ranges may indicate coordinated card-testing attacks, prompting acquirers to block implicated IP addresses or devices.

3 Network: AI-Driven Authorization and Ecosystem-Wide Threat Intelligence

Networks like Visa and Mastercard act as central hubs for transaction routing and fraud analytics. Visa’s Advanced Authorization uses machine learning to evaluate over 500 risk attributes-including spending habits, device type, and transaction location-generating a risk score (1–99) in milliseconds. High-risk transactions trigger automatic alerts to issuers, enabling real-time declines. This system has reduced Visa’s global fraud rate to <0.1%, despite a 10x increase in transaction volume since 2005.

Visa’s Scam Disruption Practice extends beyond transactional analysis, deploying dark web surveillance and generative AI to map scam networks. By correlating phishing domains, fraudulent merchant accounts, and money mule accounts, Visa dismantles entire fraud ecosystems. In 2024, this initiative prevented $350 million in fraud, highlighting the effectiveness of proactive threat hunting. Networks also standardize security protocols, such as 3DS mandates in the EU and UK, which reduce card-not-present (CNP) fraud by requiring multi-factor authentication.

4 Issuing Bank: Behavioral Analytics and Post-Authorization Controls

Issuing banks finalize transaction approvals while safeguarding cardholder accounts. Real-time risk scoring tools, like those from Sardine.ai, analyze device handling patterns (e.g., phone tilt angles) and typing rhythms to distinguish legitimate users from fraudsters. Unusual activities, such as foreign transactions or rapid gift card purchases, trigger automatic holds and SMS alerts to cardholders.

Post-authorization, issuers review CVV mismatches and address verification service (AVS) failures to identify stolen cards. For example, a transaction approved despite an incorrect CVV may indicate account compromise, prompting the issuer to freeze the card and contact the customer. Machine learning models trained on historical fraud data further refine detection accuracy. J.P. Morgan’s fraud team, for instance, uses transaction velocity rules to block bots executing card-testing attacks, reducing false declines by 22%.

5 Customer: Behavioral Triggers and Authentication Participation

While customers do not directly implement fraud controls, their behavior influences risk assessments. Sudden deviations from typical spending patterns-such as large purchases at unfamiliar merchants-activate issuer-level flags. Customers also participate in 3DS authentication, verifying transactions via OTPs or biometrics, which reduces friendly fraud claims by confirming intent.

Educating customers on recognizing phishing attempts and securing card details remains critical. For instance, Visa’s public awareness campaigns have reduced social engineering scams by 18% in markets with high adoption of 3DS.

Photo by CardMapr.nl / Unsplash

6 Conclusion: Collaborative Defense Across the MANIC Framework

The MANIC ecosystem’s fraud detection efficacy stems from layered defenses at each transactional node. Merchants filter early-stage risks, acquirers enforce compliance, networks deploy AI-driven analytics, issuers monitor behavior, and customers contribute through authentication. Emerging technologies like generative AI and decentralized fraud databases promise further enhancements, enabling real-time adaptation to evolving threats. However, persistent challenges-such as cross-border fraud and deepfake-enabled social engineering-demand continued innovation and global cooperation among MANIC stakeholders.

Risk Mitigation in the MANIC Payment Scheme: A Component-Level Analysis

Vincent Yuan — Sun, 27 Apr 2025 19:41:22 GMT

The MANIC Scheme in Payment Networks: A Comprehensive Analysis of Transaction Ecosystems

A Comprehensive Analysis of Transaction Ecosystems.

SpacecraftVincent Yuan

The MANIC framework’s interconnected structure introduces risks at every node, from merchant fraud to network vulnerabilities. Below, we dissect risks specific to each participant and outline mitigation strategies informed by industry practices and technological innovations.

1. Merchant Risks

Primary Threats:

Chargebacks: High dispute rates (e.g., ≥1% of transactions) trigger penalties and account termination.
Data Breaches: Weak PCI DSS compliance exposes cardholder data to theft.
Reputational Risk: Association with fraudulent or high-risk industries (e.g., CBD, gambling).

💡

PCI DSS (Payment Card Industry Data Security Standard) compliance refers to a set of security standards designed to protect cardholder data across payment networks. It is a framework of best practices and guidelines established by the PCI Security Standards Council to ensure that all organizations handling credit card and payment data maintain secure systems and processes to protect that data from fraud, breaches, and theft.

Mitigation Strategies:

Dynamic Fraud Detection: Deploy AI-driven tools (e.g., NMI’s machine learning models) to flag suspicious transactions using behavioral analytics (typing speed, device fingerprints).
Tokenization: Replace sensitive data with tokens to reduce breach impact.
Rolling Reserves: Maintain 5–10% of transaction volume in reserve accounts to offset chargeback liabilities.

2. Acquiring Bank Risks

Primary Threats:

Merchant Default: High-risk merchants (e.g., those in crypto) may suddenly cease operations, leaving unresolved chargebacks.
Compliance Failures: Violations of AML/KYC regulations incur fines up to $1M per incident.

Mitigation Strategies:

Enhanced Underwriting: Use AI underwriting tools to assess merchant credit scores, industry risk tiers, and transaction history.
Real-Time Monitoring: Track chargeback ratios and transaction velocity via platforms like Stax, triggering alerts for anomalies (e.g., >50% MoM volume spikes).
Contractual Safeguards: Enforce early termination clauses for merchants exceeding agreed chargeback thresholds.

3. Network Risks

Primary Threats:

Illicit Use: Money laundering via prepaid cards or anonymized transactions.
Operational Disruptions: Downtime in clearing systems (e.g., VisaNet outages) halts global transactions.

Mitigation Strategies:

Link Analysis: Map transactional relationships to uncover fraud rings (e.g., detecting mule accounts funding terror groups).
MACH Architecture: Adopt cloud-native, microservices-based systems (e.g., Visa Direct) for 99.999% uptime and rapid failover.
Geo-Blocking: Restrict transactions from high-risk jurisdictions flagged in OFAC lists.

4. Issuing Bank Risks

Primary Threats:

Credit Risk: Cardholder defaults (e.g., 3.5% delinquency rates in Q1 2025).
Account Takeovers: Stolen credentials used for unauthorized purchases.

Mitigation Strategies:

Behavioral Biometrics: Deploy passive authentication via typing cadence or screen-touch pressure analysis.
Dynamic Credit Limits: Adjust spending caps in real-time based on cardholder income signals (e.g., Plaid’s cash flow verification).
3D Secure 2.0: Mandate biometric authentication for high-value online transactions.

5. Customer Risks

Primary Threats:

Identity Theft: Stolen card details sold on dark web markets (e.g., $40 avg. price per credit card dump).
Friendly Fraud: False chargeback claims (“item not received”) cost merchants $25B annually.

Mitigation Strategies:

EMV® 3-D Secure: Shift liability to issuers via cryptogram-based authentication.
Transactional Transparency: Provide real-time SMS updates with delivery tracking links to deter false disputes.
Education Campaigns: Teach customers to recognize phishing attempts via issuer-branded tutorials.

Cross-Component Risk Synergies

Risk Type	Collaborative Mitigation
Data Breaches	End-to-end encryption (E2EE) across MANIC nodes, audited quarterly via PCI DSS-certified tools.
Money Laundering	Shared blockchain ledgers between issuers and networks for immutable transaction tracing.
Systemic Fraud	Federated machine learning models pooling anonymized data from acquirers and networks.

Photo by SumUp / Unsplash

Future-Proofing the MANIC Model

Quantum-Resistant Cryptography: Preparing for Y2Q threats with lattice-based algorithms (NIST-standardized by 2026).
Decentralized Identity: Letting customers control data via self-sovereign wallets (e.g., Mastercard’s ID Service).
AI Co-Pilots: Tools like Stripe Radar 2.0 auto-negotiate chargebacks using generative AI for evidence compilation.

By layering these technical, contractual, and educational safeguards, stakeholders can reduce MANIC-related losses by 40–60% while maintaining transaction velocity. However, balancing security with user experience remains pivotal—overly stringent measures (e.g., step-up auth for $10 purchases) risk cart abandonment rates exceeding 35%.

The MANIC Scheme in Payment Networks: A Comprehensive Analysis of Transaction Ecosystems

Vincent Yuan — Sun, 27 Apr 2025 18:31:41 GMT

The modern payment ecosystem relies on a complex interplay of stakeholders to facilitate secure and efficient financial transactions. At the core of this system lies the MANIC scheme, an acronym representing the five critical entities involved in credit card processing: Merchant, Acquiring Bank, Network, Issuing Bank, and Customer/Cardholder. This framework ensures seamless coordination across parties, enabling billions of transactions daily. Below, we dissect each component of the MANIC model, analyze their interdependencies, and explore the technical and economic mechanisms underpinning this ecosystem.

1 The MANIC Framework: Core Components and Roles

1.1 Merchant: The Transaction Initiator

Merchants form the entry point of the payment lifecycle. These entities — ranging from retail stores to online platforms — accept card payments in exchange for goods or services. When a customer initiates a transaction, the merchant’s payment infrastructure (e.g., point-of-sale systems, e-commerce gateways) captures card details and forwards them to the acquiring bank.

Key Responsibilities:

Transaction Initiation: Triggering the authorization process by submitting payment requests.
Compliance: Adhering to Payment Card Industry Data Security Standards (PCI DSS) to protect cardholder data.
Settlement: Receiving funds post-transaction after fees are deducted by intermediaries.

Challenges:

Fee Structures: Merchants bear costs such as interchange fees (paid to issuing banks) and assessment fees (paid to networks).
Fraud Management: Implementing tools like tokenization and 3D Secure to mitigate risks.

1.2 Acquiring Bank: The Merchant’s Financial Partner

Acquiring banks (or “acquirers”) act as intermediaries between merchants and the broader payment network. Institutions like Chase or Worldpay provide merchant accounts, enabling businesses to accept card payments. Their role extends beyond transaction routing; they assume liability for chargebacks and ensure regulatory compliance.

Operational Workflow:

Authorization Request: The acquirer forwards transaction details to the card network.
Funds Settlement: After deducting fees, the acquirer deposits the net amount into the merchant’s account.
Dispute Resolution: Managing chargebacks and reconciling transactional discrepancies.

Economic Model:

Acquirers profit from markup fees added to interchange rates. For example, a $100 transaction with a 1.65% interchange fee and a 0.20% acquirer markup yields $1.85 in revenue.

1.3 Network: The Interchange Facilitator

Card networks (Visa, Mastercard, etc.) serve as communication highways, connecting acquirers and issuers. They standardize protocols (e.g., ISO 8583 messaging) and enforce security measures while monetizing transaction volume through assessment fees.

Technical Infrastructure:

Authorization Routing: Networks validate transactions by checking against issuer-defined rules (e.g., available credit, fraud flags).
Clearing and Settlement: Batch processing of transactions ensures funds move from issuers to acquirers.

Innovations:

Cloud-Native Architectures: Modern networks like Visa Direct leverage MACH (Microservices, API-first, Cloud-native, Headless) principles for real-time payments.
Tokenization: Replacing sensitive card data with tokens to enhance security in digital transactions.

1.4 Issuing Bank: The Cardholder’s Financial Institution

Issuing banks (e.g., Bank of America) provide credit/debit cards to consumers. They authorize transactions based on available credit, manage fraud detection systems, and settle obligations with acquirers via networks.

Authorization Process:

Risk Assessment: Algorithms evaluate transaction patterns, location, and purchase amount to flag suspicious activity.
Funds Reservation: Temporarily holding the transaction amount until settlement.

Revenue Streams:

Interchange Fees: Issuers earn ~1.3–2.5% of transaction value, compensating for credit risk and rewards programs.
Interest and Penalties: Revenue from cardholder balances and late fees.

1.5 Customer: The Transaction Originator

Cardholders initiate payments by presenting their cards at merchant terminals. Their interaction triggers the MANIC chain, culminating in funds transfer and monthly billing cycles.

Security Considerations:

EMV Chips: Reduce counterfeit fraud through dynamic authentication.
Biometric Authentication: Fingerprint/face recognition in mobile wallets (Apple Pay, Google Pay) enhances security.

Photo by Vincent Yuan @USA / Unsplash

2 Transaction Flow Under the MANIC Model

2.1 Step 1: Authorization

Customer swipes a card at a Merchant’s terminal.
Acquirer sends an authorization request via the Network to the Issuer.
Issuer approves/declines based on fraud checks and available credit.

2.2 Step 2: Authentication

3D Secure: For online transactions, cardholders authenticate via one-time passwords or biometrics.

2.3 Step 3: Clearing and Settlement

Batch Processing: At day’s end, the Acquirer submits batched transactions to the Network, which routes them to Issuers.
Net Settlement: Issuers transfer funds to acquirers via the network, minus interchange fees.

Simplified MANIC Transaction Flow:

Customer → Merchant → Acquirer → Network → Issuer → (Approval) → Network → Acquirer → Merchant

Photo by Clay Banks / Unsplash

3 Economic Dynamics and Fee Structures

The MANIC ecosystem thrives on fee-sharing mechanisms:

Interchange Fees: Paid by acquirers to issuers, typically 1.5–2.5% per transaction.
Assessment Fees: Networks charge 0.13–0.15% of volume for infrastructure use.
Acquirer Markup: Variable fees added to interchange, often 0.20–0.50%.

Example: A $100 purchase may incur $1.80 in interchange (1.8%), $0.15 in assessment fees, and $0.30 in acquirer markup, totaling $2.25 in processing costs.

4 Challenges and Future Trends

4.1 Regulatory Scrutiny

Payment for Order Flow (PFOF): Critics argue such practices create conflicts of interest, prompting EU plans to ban PFOF by 2026.
Interchange Caps: Regulations like the Durbin Amendment (US) limit debit card interchange fees, pressuring issuer revenues.

4.2 Technological Disruption

Decentralized Finance (DeFi): Blockchain-based systems challenge traditional networks by enabling peer-to-peer settlements.
Real-Time Payments: FedNow (US) and SEPA Instant (EU) bypass card networks, reducing reliance on MANIC intermediaries.

Photo by Maarten van den Heuvel / Unsplash

Conclusion

The MANIC scheme represents a finely tuned orchestra of financial institutions, networks, and end-users. While the model has enabled global commerce scalability, emerging technologies and regulatory shifts pose existential questions. Banks and networks adopting MACH architectures and AI-driven fraud detection are poised to lead the next evolution of payment ecosystems. However, balancing innovation with interoperability — ensuring the MANIC framework adapts without fragmenting — remains the industry’s paramount challenge.

Support Conversational History in RAG Pipelines with Llama 3

Vincent Yuan — Sun, 07 Jul 2024 20:47:28 GMT

In Retrieval-Augmented Generation (RAG) pipelines, it's crucial to help chatbots recall previous conversations, as users may ask follow-up questions that rely on earlier context. However, users' prompts might lack sufficient context, assuming previous discussions are still relevant. To tackle this challenge, incorporating chat history into LLMs' question-answering context enables them to retrieve relevant information for new queries.

This post presents a solution leveraging LangChain, Llama 3-8B, and Ollama, which can efficiently run on an M2 Pro MacBook Pro with 16 GB memory.

1 Dependencies

1.1 Ollama and Llama 3 Model

Firstly, Ollama should be installed on a MacBook. Ollama can utilize the GPUs of the machine, ensuring efficient inference, provided there is sufficient memory. Llama 3-8B performs well on machines with 16 GB of memory.

💡

Ollama can be downloaded here: https://ollama.com/

Once it is downloaded, can use below command in the terminal to pull the Llama 3-8B model:

ollama pull llama3

1.2 Python Dependencies

Now, let's import the required packages to construct a RAG system with chat history, utilizing the LangChain toolkit.

# Models
from langchain.llms import LlamaCpp
from langchain.chat_models import ChatOpenAI

# Setup
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Vector store
from langchain.document_loaders import  TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

# LangChain supports many other chat models. Here, we're using Ollama
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

# RAG with Memory 
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

# Display results
import markdown
from IPython.display import display, Markdown, Latex

2 Create Vector Store

The source data consists of a summary of important events and statistics from the week of May 13th, 2024, as published by Yahoo Finance. This data is not included in the training set of Llama 3. For demonstration purposes, the news is extracted to a text file and utilized in the code to create the Chroma vector store and retriever.

source_data_path = '../data/yahoo.txt'

# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

loader = TextLoader(source_data_path)

documents = loader.load()

#splitting the text into
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

embedding = HuggingFaceEmbeddings()

vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embedding
                                 # persist_directory=persist_directory
                                )
                                
retriever = vectordb.as_retriever(search_kwargs={"k": 5})

3 Create the LLM Object

Make sure the Ollama is on and the LLama 3 model has been downloaded, then below code can be used to define a LLM object in the pipeline:

llm = ChatOllama(model="llama3",
                temperature=0.1)

4 RAG with Memory

In essence, there should be place to store chat history, also the the chat history is added to the prompt in RAG, so that the LLM can access past conversation, also the chat history is update after each round of conversation. Below is a way to use the BaseChatMessageHistory to address this need:

### Contextualize question ###
contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)


### Answer question ###
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

Then define the RAG chain:

### Statefully manage chat history ###
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

Then let's try if the model understands the Yahoo Finance analysis, the question is What is the wall street expectation of the April Consumer Price Index (CPI)?.

llm_response = conversational_rag_chain.invoke(
    {"input": "What is the wall street expectation of the April Consumer Price Index (CPI)?"},
    config={
        "configurable": {"session_id": "abc123"}
    },  # constructs a key "abc123" in `store`.
)["answer"]

print('='*50)
display(Markdown(llm_response))

The response is:

According to the text, Wall Street expects an annual gain of 3.4% for headline CPI, which includes the price of food and energy, a decrease from the 3.5% headline number in March. Additionally, prices are expected to rise 0.4% on a month-over-month basis, in line with March's rise.

This is aligned with the source:

Yahoo Finance Analysis

Then, a question is asked based on the output of last question to calculate the double of the expected CPI:

llm_response = conversational_rag_chain.invoke(
    {"input": "What is the double of the expected CPI in the prior answer?"},
    config={
        "configurable": {"session_id": "abc123"}
    },  # constructs a key "abc123" in `store`.
)["answer"]

print('='*50)
display(Markdown(llm_response))

And this is the output:

The expected annual gain for headline CPI is 3.4%. The double of this value would be:

2 x 3.4% = 6.8%

So, the double of the expected CPI is 6.8%.

So the model successfully picks up the information that it returns in the past and answer correctly to the new question.

5 Summary

This enhanced solution extends the capabilities of a regular RAG by supporting chat history, making it highly beneficial for multiple rounds of conversations. With Ollama, experiments like this can be run on an affordable laptop with embedded GPUs. A special acknowledgment to Meta for their great work in improving Llama 3.

Build a Regulation Assistant Powered by Llama 2 and Streamlit with Google Colab GPUs

Vincent Yuan — Sun, 07 Jul 2024 20:42:19 GMT

In our previous discussion, we explored the concept of creating a web chatbot using Llama 2. However, an incredibly practical application of chatbots is their ability to field questions within specific domains of knowledge. For example, a chatbot can be trained on policies, regulations, and laws, effectively functioning as a knowledge assistant that users can collaborate with. This functionality holds significant value for enterprise users, who often have vast repositories of internal documents that can be utilized to train the chatbot. Employees can then leverage the chatbot as a quick reference tool.

Furthermore, this solution can be entirely constructed using open-source components, eliminating the need to rely on external APIs like OpenAI and alleviating any privacy concerns.

This post showcases a compliance assistant built with the utilization of the open-source large language model Llama 2, in conjunction with retrieval-augmented generation (RAG), all presented through a user-friendly web interface powered by Streamlit.

💡

The code can be replicated on Google Colab, using free T4 GPUs. Kudos to Google.

1 Dependencies

Firstly, install a few dependencies:

!pip install -q streamlit

!npm install localtunnel

# GPU setup of LangChain
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --force-reinstall llama-cpp-python==0.2.28  --no-cache-dir

!pip install huggingface_hub  chromadb langchain sentence-transformers pypdf

Then download the Llama 2 model to the Colab notebook:

!wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_0.gguf

1.1 Mount the Google Drive

This chatbot needs to retrieve documents from a vector database which is composed of embeddings of regulations PDFs. The PDFs are saved in Google Drive, so let's mount the Google Drive so the code can access the PDFs:

# Mount the google drive
from google.colab import drive
drive.mount('/gdrive')

2 Build the Web Chatbot

The web chatbot is like this:

A Compliance Assistant

Below is the entire code to build the compliance assistant, the details of each part will be introduced in the follow section:

%%writefile app.py

import streamlit as st
import os

from langchain.llms import LlamaCpp
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

from langchain.llms import LlamaCpp

from langchain_community.document_loaders import PyPDFLoader, PyPDFDirectoryLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

from langchain.chains import RetrievalQA




# App title
st.set_page_config(page_title="🦙💬 Llama 2 Chatbot")

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])



# ====================== RAG ======================

# Encoding the PDFs
pdf_folder_path = '/gdrive/MyDrive/Research/Data/GenAI/PDFs'

loader = PyPDFDirectoryLoader(pdf_folder_path)

documents = loader.load()

#splitting the text into
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Create vector DB, embed and store the texts
# Supplying a persist_directory will store the embeddings on disk
persist_directory = 'db'

## here we are using OpenAI embeddings but in future we will swap out to local embeddings
embedding = HuggingFaceEmbeddings()

vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embedding,
                                 persist_directory=persist_directory)

retriever = vectordb.as_retriever(search_kwargs={"k": 5})

# ====================== App ======================
with st.sidebar:
    st.title('🦙💬 Llama 2 Chatbot')


    st.subheader('Models and parameters')
    selected_model = st.sidebar.selectbox('Choose a Llama2 model', ['Llama2-7B', 'Llama2-13B'], key='selected_model')

    if selected_model == 'Llama2-7B':
        llm_path = llama_model_path
    elif selected_model == 'Llama2-13B':
        llm_path = llama_model_path

    temperature = st.sidebar.slider('temperature', min_value=0.01, max_value=5.0, value=0.1, step=0.01)
    top_p = st.sidebar.slider('top_p', min_value=0.01, max_value=1.0, value=0.9, step=0.01)
    max_length = st.sidebar.slider('max_length', min_value=32, max_value=128, value=120, step=8)
    st.markdown('📖 Learn how to build this app in this [blog](https://blog.streamlit.io/how-to-build-a-llama-2-chatbot/)!')


    llm = LlamaCpp(
      model_path=llm_path,
      temperature=temperature,
      top_p=top_p,
      n_ctx=2048,
      n_gpu_layers=n_gpu_layers,
      n_batch=n_batch,
      callback_manager=callback_manager,
      verbose=True,
    )

    # use another LangChain's chain, RetrievalQA, to associate Llama with the loaded documents stored in the vector db



# Store LLM generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

# Display or clear chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

def clear_chat_history():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]
st.sidebar.button('Clear Chat History', on_click=clear_chat_history)


# Function for generating LLaMA2 response. Refactored from https://github.com/a16z-infra/llama2-chatbot
def generate_llama2_response(prompt_input):

    pre_prompt = """[INST] <>
                  You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.

                  If you cannot answer the question from the given documents, please state that you do not have an answer.\n
                  """


    for dict_message in st.session_state.messages:
        if dict_message["role"] == "user":
            pre_prompt += "User: " + dict_message["content"] + "\n\n"
        else:
            pre_prompt += "Assistant: " + dict_message["content"] + "\n\n"

    prompt = pre_prompt +  "{context}User : {question}" + "[\INST]"
    llama_prompt = PromptTemplate(template=prompt, input_variables=["context","question"])


    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=retriever,
         chain_type_kwargs={"prompt": llama_prompt}
    )

    result = qa_chain.run({
                            "query": prompt_input})


    return result

# User-provided prompt
if prompt := st.chat_input():
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

# Generate a new response if last message is not from assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = generate_llama2_response(prompt)
            placeholder = st.empty()
            full_response = ''
            for item in response:
                full_response += item
                placeholder.markdown(full_response)
            placeholder.markdown(full_response)
    message = {"role": "assistant", "content": full_response}
    st.session_state.messages.append(message)

2.1 Model Setup

In the code, firstly tweak the params per your hardware, models and objectives:

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

The free version of Colab does not have too much memory so here the llama-2-7b-chat.Q5_0.gguf is used but you can use a larger model for better performance.

2.2 Vector Database

In order to perform RAG, a vector database has to be created first, in this example, the code read the regulation B and regulation Z PDFs and embed them, then a vector database is created based on that:


# ====================== RAG ======================

# Encoding the PDFs
pdf_folder_path = '/gdrive/MyDrive/Research/Data/GenAI/PDFs'

loader = PyPDFDirectoryLoader(pdf_folder_path)

documents = loader.load()

#splitting the text into
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Create vector DB, embed and store the texts
# Supplying a persist_directory will store the embeddings on disk
persist_directory = 'db'

## here we are using OpenAI embeddings but in future we will swap out to local embeddings
embedding = HuggingFaceEmbeddings()

vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embedding,
                                 persist_directory=persist_directory)

retriever = vectordb.as_retriever(search_kwargs={"k": 5})

2.3 Message Management

Then, these are the setup for the display/clear of messages of the chatbot:

# Store LLM generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

# Display or clear chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

def clear_chat_history():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]
st.sidebar.button('Clear Chat History', on_click=clear_chat_history)

2.4 Get LLM Response

Below function appends the chat history into the prompt and use the vector database created above to retrieve answers.

💡

Note that this QA chain is different from a regular LLM chain.

# Function for generating LLaMA2 response based on RAG.
def generate_llama2_response(prompt_input):

    pre_prompt = """[INST] <>
                  You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.

                  If you cannot answer the question from the given documents, please state that you do not have an answer.\n
                  """


    for dict_message in st.session_state.messages:
        if dict_message["role"] == "user":
            pre_prompt += "User: " + dict_message["content"] + "\n\n"
        else:
            pre_prompt += "Assistant: " + dict_message["content"] + "\n\n"

    prompt = pre_prompt +  "{context}User : {question}" + "[\INST]"
    llama_prompt = PromptTemplate(template=prompt, input_variables=["context","question"])


    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=retriever,
         chain_type_kwargs={"prompt": llama_prompt}
    )

    result = qa_chain.run({
                            "query": prompt_input})


    return result

2.5 Conversation

Below shows the question and answering process, the chatbot responses to users' questions:

# User-provided prompt
if prompt := st.chat_input():
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

# Generate a new response if last message is not from assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = generate_llama2_response(prompt)
            placeholder = st.empty()
            full_response = ''
            for item in response:
                full_response += item
                placeholder.markdown(full_response)
            placeholder.markdown(full_response)
    message = {"role": "assistant", "content": full_response}
    st.session_state.messages.append(message)

3 Start the Chatbot

You can bring up the chatbot by using below command:

!streamlit run app.py --server.address=localhost &>/content/logs.txt &

import urllib
print("Password/Enpoint IP for localtunnel is:",urllib.request.urlopen('https://ipv4.icanhazip.com').read().decode('utf8').strip("\n"))

!npx localtunnel --port 8501

The result shows a password to access the web app:

Password/Enpoint IP for localtunnel is: 34.125.220.166
npx: installed 22 in 2.393s
your url is: https://hot-pets-chew.loca.lt

Go to that url and enter the password, and enjoy the time!

4 Summary

This post demonstrates the construction of a versatile chatbot capable of more than just conversation. Specifically, it covers the following key features:

Creation of a vector database utilizing domain knowledge.
Ability of the chatbot to retrieve information from the vector database and respond to user queries.
User-friendly interface for ease of use.

This approach is scalable across various applications, as chatbots excel in information retrieval when equipped with a reliable database as the source of truth. Stay tuned for further insights into valuable applications of this technology.

Unveiling the Deal: What Happens When Companies Merge

Vincent Yuan — Sun, 07 Jul 2024 20:40:17 GMT

This post is to go through the most important processes in the merger of companies and answer most interested questions for employees, shareholders and customers!

1 Company Merger Process

Acquiring a company in the U.S. is a complex process with various stages and potential outcomes. Here's a breakdown of the typical steps and what you can expect:

1. Pre-Negotiation:

Target Identification: The acquiring company identifies potential targets based on strategic fit, market potential, and other criteria.
Initial Contact: Discreet inquiries are made to gauge interest and gather information.
Non-Disclosure Agreement (NDA): Both parties sign an NDA to protect confidential information during discussions.

2. Due Diligence:

In-depth Investigation: The acquiring company assesses the target's financial health, operations, legal status, and other critical factors.
Valuation: Financial experts determine the target company's fair market value.

3. Negotiation and Agreement:

Letter of Intent (LOI): A non-binding agreement outlining key terms like price, structure, and timelines.
Negotiation: Both sides negotiate the final terms of the acquisition agreement, including purchase price, payment methods, and deal structure.
Definitive Agreement: A legally binding document outlining all agreed-upon terms and conditions.

4. Regulatory Approvals:

Antitrust Review: The deal might require approval from the Federal Trade Commission (FTC) or other regulatory bodies to ensure fair competition.
Industry-Specific Approvals: Depending on the industry, further regulatory approvals might be necessary.

5. Closing and Integration:

Closing: All legal formalities are completed, and the acquisition is finalized.
Integration: The acquiring company integrates the target's operations, employees, and systems into its own structure. This can be a complex and lengthy process.

What to expect:

Timeframe: The process can take months or even years, depending on the complexity of the deal and regulatory hurdles.
Costs: Significant legal, financial, and integration costs are involved.
Uncertainty: Regulatory approvals and market conditions can impact the deal's outcome.
Impact: Acquisitions can affect employees, customers, and the industry at large.

Additional points to consider:

There are different types of acquisitions, such as stock purchases, asset purchases, and mergers. Each has its own nuances.
Friendly acquisitions involve cooperation between both parties, while hostile takeovers involve a more aggressive approach.
The specific process and outcomes can vary significantly depending on the size, industry, and circumstances of the companies involved.

Now, let's break down each process and dive deep into how each process works, with some examples.

Photo by Vincent Yuan @USA / Unsplash

2 Pre-negotiation

The pre-negotiation phase in a company acquisition lays the groundwork for a successful deal or identifies potential roadblocks early on. Here's a more detailed breakdown of this crucial stage:

1. Target Identification:

Strategic fit: Aligning the target's strengths and weaknesses with the acquirer's goals and existing business.
Market potential: Assessing the target's market share, growth potential, and competitive landscape.
Financial attractiveness: Analyzing profitability, debt levels, and valuation multiples.

Examples:

Amazon's acquisition of Whole Foods: Focused on expanding Amazon's grocery delivery and brick-and-mortar presence.
Disney's acquisition of Marvel Entertainment: Aimed at acquiring valuable intellectual property and expanding its superhero universe.

2. Initial Contact:

Discreet approach: Using intermediaries, investment bankers, or direct contact depending on the situation and target receptivity.
Information gathering: Gauging the target's general interest, financial health, and potential deal structure.
Non-Disclosure Agreement (NDA): Protecting confidential information shared during discussions.

Example:

Microsoft's acquisition of LinkedIn: Initial contact reportedly occurred through a mutual acquaintance who connected Satya Nadella and Jeff Weiner.

3. Due Diligence Preparation:

Gathering internal resources: Assembling legal, financial, and operational teams for in-depth analysis.
Developing a due diligence plan: Defining scope, timelines, and key areas of investigation.
Negotiating access: Securing permission to review the target's financial records, contracts, and other sensitive information.

4. Non-Binding Negotiations:

Indicative offer: Presenting a non-binding price range based on preliminary valuation and market conditions.
Structure exploration: Discussing potential deal structures (stock purchase, asset purchase, merger) and their implications.
Exclusivity agreement (optional): Granting the acquirer temporary exclusive negotiation rights in exchange for a fee.

Remember:

Pre-negotiation is a delicate dance between expressing interest without revealing your hand too soon.
Thorough due diligence is crucial for understanding potential risks and opportunities.
Non-binding negotiations help refine deal terms and identify potential dealbreakers before investing significant resources.

Photo by Vincent Yuan @USA / Unsplash

3 Due Diligence

Due diligence is a crucial step in the company merger process, allowing the acquiring company to gain a deep understanding of the target company's financial health, operations, legal status, and potential risks. Here's a more specific breakdown of how it typically works:

Stages of Due Diligence:

1. Pre-Diligence:

Initial research and information gathering about the target company.
Signing a Non-Disclosure Agreement (NDA) to protect confidential information.

2. Financial Due Diligence:

Reviewing financial statements, tax returns, and internal controls.
Assessing the company's financial performance, profitability, and debt levels.
Identifying potential financial risks and liabilities.

3. Operational Due Diligence:

Evaluating the target company's business operations, processes, and systems.
Analyzing market position, competitive landscape, and customer base.
Identifying potential operational challenges and opportunities.

4. Legal Due Diligence:

Reviewing legal documents, contracts, and intellectual property rights.
Assessing potential legal risks, compliance issues, and litigation exposure.
Ensuring the target company is operating legally and has a clear title to assets.

5. Environmental Due Diligence:

Assessing potential environmental liabilities and regulatory compliance.
Identifying any environmental hazards or contamination on the target company's property.

6. Human Resources Due Diligence:

Evaluating the target company's workforce, employee contracts, and labor relations.
Identifying potential human resource risks and liabilities, such as employee lawsuits or unionization efforts.

Additional Points:

The specific scope and depth of due diligence vary depending on the size and complexity of the deal.
Experienced professionals, such as accountants, lawyers, and consultants, are often involved in the process.
Due diligence findings can impact the negotiation of the deal terms and price.

Photo by Vincent Yuan @USA / Unsplash

4 Negotiation and Agreement

The negotiation and agreement phase is arguably the most critical stage in an acquisition, where the terms are hammered out and the deal's fate is determined. Here's an in-depth look at how it typically unfolds:

1. Letter of Intent (LOI):

Non-binding document outlining key deal terms: Price, structure, timelines, contingencies, and exclusivity provisions.
Serves as a roadmap for further negotiations: Prevents wasting time if fundamental differences exist.
May include break-up fees: To compensate the target if the deal falls through due to the acquirer's actions.

Example:

SoftBank's acquisition of WeWork: The complex LOI included contingencies based on WeWork's financial performance.

2. Negotiation of Definitive Agreement:

Intensive process involving lawyers, advisors, and executives:Each side advocates for their best interests.
Key areas of negotiation: Purchase price, payment structure, warranties, indemnification, employee-related matters, and regulatory approvals.
Back-and-forth through drafts and revisions: Striving for a mutually beneficial agreement.

3. Deal Sweeteners:

Non-cash consideration: Stock, earn-outs, or other creative structures to bridge valuation gaps.
Management incentives: Retention packages or equity grants to key employees.

Examples:

Disney's acquisition of 21st Century Fox: Included a complex stock-based deal structure.
Elon Musk's acquisition of Twitter: Involved offering severance packages to some employees.

4. Finalizing the Agreement:

Legal review and approvals by boards and shareholders:Ensuring compliance and alignment.
Signing ceremony: Formalizing the agreement and marking a significant milestone.

Additional Points:

Negotiation is a dynamic process with power struggles and potential deadlocks.
Effective communication, flexibility, and a win-win mindset are crucial for success.
Cultural differences and regulatory complexities can add layers to the process.

Photo by Vincent Yuan @USA / Unsplash

5 Navigating the Regulatory Maze

Regulatory approval is a crucial hurdle in many acquisitions, aiming to ensure fair competition, consumer protection, and other societal considerations. Here's an overview of the process and common examples:

1. Identifying Relevant Regulators:

Industry-Specific Agencies: Depending on the industry, agencies like the Federal Trade Commission (FTC), Department of Justice (DOJ), or the Federal Communications Commission (FCC) might be involved.
Antitrust Regulators: The FTC and DOJ hold primary authority for antitrust reviews to prevent mergers that reduce competition.
Other Potential Regulators: Depending on the deal's specifics, agencies like the Securities and Exchange Commission (SEC) or state regulators might also weigh in.

2. Filing and Review Process:

Filing: Companies submit detailed information about the merger, including market analyses and justifications.
Initial Review: Regulators assess the potential impact on competition and other relevant factors.
Second Request: If concerns arise, regulators can request more information and conduct deeper investigations.
Public Comment: In some cases, the public can submit comments on the proposed merger.

3. Approval or Challenge:

Clearance: If regulators determine no significant anti-competitive harms, they grant approval.
Conditions: Approvals might come with conditions aimed at mitigating potential harms, like divestitures or restrictions on specific practices.
Challenge: If regulators believe the deal violates competition laws, they can file lawsuits to block it.

4. Timeline:

The process can vary significantly depending on the complexity of the deal and the level of scrutiny required. It can take anywhere from weeks to months, or even years in complex cases.

Examples:

AT&T's attempted acquisition of T-Mobile: The DOJ blocked the merger due to concerns about reduced competition in the wireless market.
Facebook's acquisition of WhatsApp: The FTC initially challenged the deal but ultimately approved it with conditions.

Additional Points:

The regulatory landscape can be complex and constantly evolving, requiring expert legal counsel for navigating the process.
The level of scrutiny and potential challenges can significantly impact the deal timeline and feasibility.
Understanding the regulatory environment and proactively addressing potential concerns is crucial for a successful acquisition.

Photo by Vincent Yuan @USA / Unsplash

6 Closing and Integration

Closing and integration mark the final chapter in a company merger, but they bring their own set of challenges and complexities. Here's a detailed breakdown:

Closing:

Formalization: Final documents are signed, legal formalities are completed, and the acquisition officially closes.
Regulatory Approvals: If required, all necessary regulatory approvals must be secured before closing.
Funding and Payment: The acquiring company finalizes the payment to the target company, often in cash, stock, or a combination.
Shareholder Votes: For public companies, shareholder approval might be required before closing.

Example:

CVS Health's acquisition of Aetna: The deal closed in 2018 after receiving regulatory approval and shareholder votes from both companies.

Integration:

Combining Operations: Merging business functions, systems, and teams from both companies.
Cultural Integration: Aligning company cultures, values, and communication styles.
Employee Transitions: Addressing employee concerns, managing potential layoffs, and implementing training programs.
Synergy Realization: Identifying and capturing cost savings, revenue growth, and other value-creation opportunities.

Challenges and Risks:

Integration complexity: Cultural clashes, resistance to change, and IT system integration issues can be difficult to overcome.
Synergy realization: Achieving projected synergies can be slower and more challenging than anticipated.
Employee morale and retention: Managing employee anxiety, skills gaps, and potential talent loss during integration is crucial.

Examples:

Disney's acquisition of Fox: The integration process was complex due to the size and diverse businesses involved.
Kraft Heinz's acquisition of Unilever: The merger failed to achieve expected synergies and led to cultural clashes.

Additional Points:

Effective communication, change management strategies, and strong leadership are crucial for successful integration.
The integration process can take months or even years, and requires ongoing monitoring and adjustments.
The success of a merger ultimately hinges on a smooth and well-executed closing and integration phase.

Photo by Vincent Yuan @USA / Unsplash

7 Frequently Asked Questions

Here are some of the most commonly asked questions about company mergers from the perspectives of employees, shareholders and customers:

Employees

For sure, the job security is the no.1 question. During a company merger, the evaluation of employee jobs typically involves several considerations. Let’s explore this from different angles:

Internal Assessment by the Merging Companies:

The merging companies themselves evaluate employee roles, responsibilities, and skills. They assess which positions are redundant, which are critical, and which can be integrated.
Job evaluations may involve comparing job descriptions, performance records, and qualifications.

Consulting Companies or HR Experts:

Some mergers engage external consulting firms or HR experts to assist in evaluating employees.
These experts analyze factors such as job functions, competencies, and market value.
They may provide recommendations on retaining key talent, aligning compensation, and managing workforce transitions.

Retention of Key Employees:

Identifying and retaining key employees is crucial. These are individuals with specialized skills, institutional knowledge, or leadership roles.
Companies consider factors like expertise, client relationships, and strategic importance.

Redundancies and Layoffs:

Unfortunately, some positions become redundant due to overlapping functions after the merger.
Companies decide which roles to eliminate based on business needs, cost savings, and efficiency.
Severance packages may be offered to affected employees.

Skill Assessment and Fit:

Companies evaluate whether employees’ skills align with the merged organization’s goals.
They consider adaptability, willingness to learn, and cultural fit.

Shareholders

This is a breakdown to show how shareholders are impacted during a company merger:

Exchange of Shares:

In a stock-for-stock merger, shareholders of both companies receive shares in the new combined entity.
The exchange ratio determines whether one company’s shareholders receive a premium above their share price before the merger announcement.
If the merger is favorable, shares of both companies may rise.

Dilution of Control:

Shareholders whose shares are not exchanged find their control diluted.
New shares issued to the other company’s shareholders reduce the control of existing shareholders.

Temporary Volatility:

Shareholders of the acquiring firm may experience a temporary drop in share value before the merger.
Shareholders of the target firm may see a rise in share value during the period.

Customers

Certainly! Let’s break down how customers are impacted during a company merger:aa

Service Disruptions and Miscommunications:

Integration efforts can divert attention from day-to-day operations, leading to miscommunications with customers.
Poorly managed systems migrations may cause confusion or delays in service.

Changes in Customer Service:

Customer service levels may fluctuate due to adjustments in staff, processes, or technology.
Customers might experience longer wait times or inconsistent support.

Product and Service Offerings:

Choices available to customers may change.
Some products or services may be discontinued, while new ones may be introduced.

Pricing and Terms:

Pricing structures could shift. Customers may face price increases or discounts.
Contract terms might be modified, affecting existing agreements.

Brand Perception and Loyalty:

Mergers can stress relationships with customers.
Brand loyalty may be tested as customers adapt to the new entity.

Communication Efforts:

Effective communication about the merger’s benefits and changes is crucial.
Transparency helps maintain customer trust.

Reference

Investopedia: Mergers & Acquisitions:https://www.investopedia.com/terms/m/mergersandacquisitions.asp
Harvard Business Review: The Art of the Deal:https://hbr.org/podcast/2016/02/the-art-of-the-interview
Mergers & Acquisitions Journal: https://mergersandinquisitions.com/
Department of Justice: Antitrust Division:https://www.justice.gov/atr/
Antitrust Source: https://www.antitrustsource.com/

Built a Chatbot with Streamlit and Llama 2 with Google Colab GPUs from Scratch

Vincent Yuan — Sun, 07 Jul 2024 20:31:52 GMT

So far, we have talked about a lot of things regarding Llama 2:

Swift inference powered by GPUs
Thoughtful responses with appropriate prompts
Question answering utilizing a knowledge database
A user-friendly web interface

You can find those informative posts in the GenAI section of Spacecraft as below:

GenAI - Spacecraft

Spacecraft

This post shows a product that makes the best of all the things learned, and build a web-based chatbot powered by a local Llama 2 model, running on Google Colab with GPUs.

2 Dependencies

Firstly, install a few dependencies:

!pip install -q streamlit

!npm install localtunnel

# GPU setup of LangChain
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --force-reinstall llama-cpp-python==0.2.28  --no-cache-dir

!pip install huggingface_hub  chromadb langchain sentence-transformers pinecone_client

Then download the Llama 2 model to the Colab notebook:

!wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_0.gguf

3 Build the Web Chatbot

The web chatbot is like this:

Llama 2 Chatbot

You need to write the app code to the disk first:

%%writefile app.py

import streamlit as st
import os

from langchain.llms import LlamaCpp
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

from langchain.llms import LlamaCpp


# App title
st.set_page_config(page_title="🦙💬 Llama 2 Chatbot")

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Replicate Credentials
with st.sidebar:
    st.title('🦙💬 Llama 2 Chatbot')


    st.subheader('Models and parameters')
    selected_model = st.sidebar.selectbox('Choose a Llama2 model', ['Llama2-7B', 'Llama2-13B'], key='selected_model')

    if selected_model == 'Llama2-7B':
        llm_path = llama_model_path
    elif selected_model == 'Llama2-13B':
        llm_path = llama_model_path

    temperature = st.sidebar.slider('temperature', min_value=0.01, max_value=5.0, value=0.1, step=0.01)
    top_p = st.sidebar.slider('top_p', min_value=0.01, max_value=1.0, value=0.9, step=0.01)
    max_length = st.sidebar.slider('max_length', min_value=32, max_value=128, value=120, step=8)
    st.markdown('📖 Learn how to build this app in this [blog](https://blog.streamlit.io/how-to-build-a-llama-2-chatbot/)!')


    llm = LlamaCpp(
      model_path=llm_path,
      temperature=temperature,
      top_p=top_p,
      n_ctx=2048,
      n_gpu_layers=n_gpu_layers,
      n_batch=n_batch,
      callback_manager=callback_manager,
      verbose=True,
    )

# Store LLM generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

# Display or clear chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

def clear_chat_history():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]
st.sidebar.button('Clear Chat History', on_click=clear_chat_history)


# Function for generating LLaMA2 response. Refactored from https://github.com/a16z-infra/llama2-chatbot
def generate_llama2_response(prompt_input):

    pre_prompt = """[INST] <>
                  You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.

                  If you cannot answer the question from the given documents, please state that you do not have an answer.\n
                  """


    for dict_message in st.session_state.messages:
        if dict_message["role"] == "user":
            pre_prompt += "User: " + dict_message["content"] + "\n\n"
        else:
            pre_prompt += "Assistant: " + dict_message["content"] + "\n\n"

    prompt = pre_prompt +  "User : {question}" + "[\INST]"
    llama_prompt = PromptTemplate(template=prompt, input_variables=["question"])

    chain = LLMChain(llm=llm, prompt=llama_prompt)

    result = chain({
                "question": prompt_input
                 })


    return result['text']

# User-provided prompt
if prompt := st.chat_input():
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

# Generate a new response if last message is not from assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = generate_llama2_response(prompt)
            placeholder = st.empty()
            full_response = ''
            for item in response:
                full_response += item
                placeholder.markdown(full_response)
            placeholder.markdown(full_response)
    message = {"role": "assistant", "content": full_response}
    st.session_state.messages.append(message)

3.1 Model Setup

In the code, firstly tweak the params per your hardware, models and objectives:

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

The free version of Colab does not too much memory so here the llama-2-7b-chat.Q5_0.gguf is used but you can use a larger model for better performance.

3.2 Message Management

Then, these are the setup for the display/clear of messages of the chatbot:

# Store LLM generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

# Display or clear chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

def clear_chat_history():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]
st.sidebar.button('Clear Chat History', on_click=clear_chat_history)

3.3 Get LLM Response

Below function appends the chat history into the prompt and get the response of model:

def generate_llama2_response(prompt_input):

    pre_prompt = """[INST] <>
                  You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.

                  If you cannot answer the question from the given documents, please state that you do not have an answer.\n
                  """


    for dict_message in st.session_state.messages:
        if dict_message["role"] == "user":
            pre_prompt += "User: " + dict_message["content"] + "\n\n"
        else:
            pre_prompt += "Assistant: " + dict_message["content"] + "\n\n"

    prompt = pre_prompt +  "User : {question}" + "[\INST]"
    llama_prompt = PromptTemplate(template=prompt, input_variables=["question"])

    chain = LLMChain(llm=llm, prompt=llama_prompt)

    result = chain({
                "question": prompt_input
                 })


    return result['text']

3.4 Conversation

Below shows the question and answering process, the chatbot responses to users' questions:

# User-provided prompt
if prompt := st.chat_input():
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

# Generate a new response if last message is not from assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = generate_llama2_response(prompt)
            placeholder = st.empty()
            full_response = ''
            for item in response:
                full_response += item
                placeholder.markdown(full_response)
            placeholder.markdown(full_response)
    message = {"role": "assistant", "content": full_response}
    st.session_state.messages.append(message)

4 Start the Chatbot

You can bring up the chatbot by using below command:

!streamlit run app.py --server.address=localhost &>/content/logs.txt &

import urllib
print("Password/Enpoint IP for localtunnel is:",urllib.request.urlopen('https://ipv4.icanhazip.com').read().decode('utf8').strip("\n"))

!npx localtunnel --port 8501

The result shows a password to access the web app:

Password/Enpoint IP for localtunnel is: 35.185.197.1
npx: installed 22 in 2.393s
your url is: https://hot-pets-chew.loca.lt

Go to that url and enter the password, and enjoy the time!

5 Conclusion

This post consolidates information to transform your local Llama 2 model into a fully functional chatbot. Moreover, you have the flexibility to craft specialized assistants for distinct domains by customizing the system prompts, all at no additional cost.

Let's build something cool!

Question Answering on Multiple Files with Llama 2 and RAG

Vincent Yuan — Sun, 07 Jul 2024 20:29:41 GMT

In the previous post, we discussed the process of utilizing Llama 2 and retrieval augmented generation (RAG) for question answering. However, the method shared was designed for a single file, and in many scenarios, it's essential for the chatbot to have knowledge about all the information across multiple input files. This post will demonstrate how to achieve this capability with Llama 2 at no cost.

This post will show:

Run Llama 2 with GPUs
Create a vector store based on multiple files
Question answering based on RAG with multiple files in the vector store

1 Get Llama 2 Ready

Firstly, install Python dependencies, download the Llama 2 model, and load Llama 2 model. This part is identical to the reference link above so no details are shared repeatedly.

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

!pip install huggingface_hub   chromadb langchain sentence-transformers pinecone_client

import numpy as np
import pandas as pd

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

from langchain.llms import LlamaCpp
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

# Vector store
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

# Show result
import markdown

!wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_0.gguf

# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.


llm = LlamaCpp(
    model_path=llama_model_path,
    temperature=0.1,
    top_p=1,
    n_ctx=16000,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
)

2 Create Vector Database

Firstly, let's download some dataset:

!wget -q https://www.dropbox.com/s/vs6ocyvpzzncvwh/new_articles.zip
!unzip -q new_articles.zip -d new_articles

These are a bunch of news text files:

Input News Data

2.1 Load Files

Load the files using DirectoryLoader made by LangChain:

from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = DirectoryLoader('./new_articles/', glob="./*.txt", loader_cls=TextLoader)

documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

2.2 Create the Database

from langchain.embeddings import HuggingFaceEmbeddings


# Save the db in the disk
persist_directory = 'db'

# HuggingFace embedding is free!
embedding = HuggingFaceEmbeddings()

vectordb = Chroma.from_documents(documents=texts, 
                                 embedding=embedding,
                                 persist_directory=persist_directory)

You can save the database in the disk and load it back to the workflow in below ways:

vectordb.persist()
vectordb = None

vectordb = Chroma(persist_directory=persist_directory, 
                  embedding_function=embedding)

2.3 Make a Retriever

💡

The number of files to be searched impacts the result, thekvalue is a parameter to tweak per your use case.

retriever = vectordb.as_retriever(search_kwargs={"k": 5})

3 RAG

We then use RetrievalQA to retrieve the documents from the vector database and give the model more context on Llama 2, thereby increasing its knowledge.

Firstly, create the qa_chain:

# use another LangChain's chain, RetrievalQA, to associate Llama with the loaded documents stored in the vector db
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever
)

Then let's ask a few questions regarding the input documents, here comes the 1st question:

query = "Any news about Hugging Face and ServiceNow? Also include the source in the response."
llm_response = qa_chain(query)

The result is like:

Hugging Face raised $35 million from investors including ServiceNow, according to TechCrunch on May 18, 2022. (Source: TechCrunch)

Let's ask another question:

query = "Any news about Google IO 2023? Also include the source in the response."
llm_response = qa_chain(query)

The answer to the 2nd question is:

Based on the provided context, it seems that Google IO 2023 is expected to announce new hardware, including a foldable smartphone called Pixel Fold, and possibly a budget device called Pixel 7a, as well as updates to Wear OS and developer tools. Additionally, there may be news about Google's AI plans, with generative AI (like Bard) appearing across Google's line of products. However, I don't know the exact details or timeline of these announcements, as the provided context only provides general information about what to expect from the conference.

4 Summary

Up to this point, you can envision the possibilities that Llama 2 unlocks within this workflow, alongside other techniques highlighted in my blog. Notably, it encompasses:

Swift inference powered by GPUs
Thoughtful responses with appropriate prompts
Question answering utilizing a knowledge database
A user-friendly web interface

These building blocks empower developers to create more robust applications than ever before. Stay tuned for the unveiling of more exciting products!

Job Aid of Running Streamlit App on Google Colab

Vincent Yuan — Sun, 07 Jul 2024 20:26:08 GMT

Streamlit is a user-friendly, open-source Python framework designed to effortlessly create and share interactive data applications. Whether you're a data scientist, engineer, or analyst, Streamlit empowers you to transform your scripts into robust web applications within minutes, all within the familiar Python environment.

Google Colab, on the other hand, provides a seamless environment for testing ideas related to app development, model training, and Gen AI experiments. It eliminates the need for manual setup of the coding environment and offers the added advantage of free GPUs.

The synergy between Streamlit and Google Colab becomes even more compelling when you can translate your demonstrations into interactive web applications. This enables you to effectively operationalize your ideas. In this post, we'll explore how to leverage Streamlit to build web applications seamlessly within the Google Colab environment.

1 Install Dependencies

Firstly, install Streamlit:

!pip install -q streamlit

Then install localtunnel to serve the Streamlit app

!npm install localtunnel

2 Build Your Apps

Create a demo web application like below:

%%writefile app.py

import streamlit as st

st.write('Hello, *World!* :sunglasses:')

Then run the app using below command:

!streamlit run app.py --server.address=localhost &>/content/logs.txt &

And a few files should be created and shown like this:

File System

3 Expose the App

Let's expose the app to the port 8051:

import urllib
print("Password/Enpoint IP for localtunnel is:",urllib.request.urlopen('https://ipv4.icanhazip.com').read().decode('utf8').strip("\n"))

!npx localtunnel --port 8501

The return will be like this:

Password/Enpoint IP for localtunnel is: 35.245.122.211
npx: installed 22 in 1.71s
your url is: https://itchy-bikes-smoke.loca.lt

Copy that password, and click the url, it will lead you to a page:

Landing Page

Once your enter the password, the web app is now yours:

The Hello World App

4 Conclusion

This job aid shows how you can build a web app within Google Colab, now you can move one step further and try to build something cool 😎

How to Prompt Correctly with Llama 2?

Vincent Yuan — Sun, 28 Jan 2024 04:26:24 GMT

Uncertain if you've encountered instances where Llama 2 provides irrelevant, redundant, or potentially harmful responses. Such outcomes can be perplexing and may lead users to disengage. A contributing factor to this issue is often the incorrect utilization of prompts. Therefore, this post aims to introduce best practices for prompting when developing GenAI apps with Llama 2.

The sample code can run on Google Colab with GPUs, kindly check below post for the GPU configuration of Llama 2.

Run Llama 2 with Retrieval Augmented Generation in Google Colab with GPUs

Run Llama2 with RAG in Google Colab.

SpacecraftVincent Yuan

This post will show:

Run Llama 2 with GPUs
Comparison of different prompts and the impact to the response of Llama 2
Prompt design for chat, with awareness of historical messages

1 Get Llama 2 Ready

Firstly, install Python dependencies, download the Llama 2 model, and load Llama 2 model. This part is identical to the reference link above so no details are shared repeatedly.

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

!pip install huggingface_hub   chromadb langchain sentence-transformers pinecone_client

import numpy as np
import pandas as pd

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

from langchain.llms import LlamaCpp
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

# Vector store
from langchain.document_loaders import CSVLoader
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

# Show result
import markdown

!wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_0.gguf

# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llama_model_path = 'llama-2-7b-chat.Q5_0.gguf'

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

from langchain.llms import LlamaCpp
llm = LlamaCpp(
    model_path=llama_model_path,
    temperature=0.1,
    top_p=1,
    n_ctx=16000,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
)

2 Impact of Different Prompts

It is pretty amazing that slightly different prompts will lead to quite different response. This can be reflected by simple testing as below.

2.1 Just Ask Questions

For instance, the most straightforward way is just to ask what you want like below:

Testing_message = "The Stoxx Europe 600 index slipped 0.5% at the close, extending a lackluster start to the year."

# Use LangChain's PromptTemplate and LLMChain
prompt = PromptTemplate.from_template(
    "Extract the named entity information from below text: {text}"
)

chain = LLMChain(llm=llm, prompt=prompt)
answer = chain.invoke(Testing_message)

The answer is like below:

 The index has fallen 3.7% since the beginning of January and is down 12.9% from its peak in August last year.
Please provide the named entities as follows:
1. Stoxx Europe 600
2. index
3. Europe
4. January
5. August

As you can see, Llama 2 firstly repeats the sentence and also adds more info, then answers the question, which is not expected by users as it seems to be out of control in a sense.

2.2 Prompt with System Message

By slightly adjusting the prompt, the response will become more normal.

prompt = PromptTemplate.from_template(
    "[INST]Extract the important Named Entity Recoginiton information from this text: {text}, do not add unrelated content in the reply.[/INST]"
)
chain = LLMChain(llm=llm, prompt=prompt)
answer = chain.invoke(Testing_message)

The response becomes:

  Sure! Here are the important named entities recognized in the given text:

1. Stoxx Europe 600 - Index
2. Europe - Continent

So now it does not change the sentence, and only answers the question that user asks. This version makes more sense simply because the addition of [INST] and [/INST] in the prompt. [INST] is part of the token used in the model training process, shared in the Llama 2 paper, which helps model understand the conversation.

Also, there is a more flexible way to do this, also with the addition of customizable system message as below:

# creating prompt for large language model
pre_prompt = """[INST] <>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.

If you cannot answer the question from the given documents, please state that you do not have an answer.\n
"""

prompt = pre_prompt + "{context}\n" +"Question : {question}" + "[\INST]"
llama_prompt = PromptTemplate(template=prompt, input_variables=["context", "question"])

chain = LLMChain(llm=llm, prompt=llama_prompt)

result = chain({ "context" : "Extract the named entity information from below sentences:",
                "question": Testing_message
                 })

The result is as below:

  Sure, I'd be happy to help! Here is the named entity information extracted from the sentence you provided:

* Stoxx Europe 600 index
* Europe
* year

I hope this helps! Let me know if you have any other questions.

In fact this is the template strictly following the training procedure of Llama 2. And with above template, you can customize the system message more flexibly though the response might look similar to a simplified version as shown above.

[INST] <>
{{ system_prompt }}
<>

{{ user_message }} [/INST]

2.3 Prompt with Chat History

Llama 2 does not have memory, using above prompt would not allow the model to answer your question based on the chat history, in order to address this challenge, we need to embed the chat history into the prompt so that the model is exposed to it.

~~Below is a vanilla version that works, firstly have the 1st round the question & answering:~~

def create_chat_prompt(pre_prompt, chat_messages):
  for dict_message in chat_messages:
      if dict_message["role"] == "user":
          pre_prompt += "User: " + dict_message["content"] + "\n\n"
      else:
          pre_prompt += "Assistant: " + dict_message["content"] + "\n\n"


  prompt = pre_prompt + "User : {question}" + "[\INST]"
  return prompt

messages = [{"role": "assistant", "content": "How may I assist you today?"}]

# creating prompt for large language model
pre_prompt = """[INST] <>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.

If you cannot answer the question from the given documents, please state that you do not have an answer.\n
"""


llama_chat_prompt = PromptTemplate(template=prompt, input_variables=["question"])

chat_chain = LLMChain(llm=llm, prompt=llama_chat_prompt)

chat_question = { "question": "What do you think are the most beautiful things in the Autumn in the U.S.?"}

messages.append({"role": "user", "content": chat_question['question']})

response = chat_chain(chat_question)

messages.append({"role": "Assistant", "content": response['text']})

~~The response is like:~~

I'm here to help you with your question. However, I must inform you that the concept of "beauty" is subjective and can vary from person to person. What one person finds beautiful may not be the same for another.
That being said, Autumn in the U.S. is known for its vibrant foliage, with trees changing colors from green to shades of orange, red, and yellow. The most beautiful things about Autumn in the U.S. can be observed through the following:
1. Leaves: The vibrant colors of the leaves during Autumn are a sight to behold. The changing hues of orange, red, and yellow create a picturesque landscape that is both breathtaking and Instagram-worthy.
2. Fall Foliage: The term "fall foliage" refers to the vibrant colors of the leaves during Autumn. The changing colors of the leaves are a result of the trees preparing for winter by storing energy in their roots, stems, and branches.
3. Pumpkins: Pumpkins are a staple of Autumn in the U.S., often used in decorations

~~This is a pretty standard response, then let's ask a follow-up question based on the response above:~~

prompt = create_chat_prompt(pre_prompt=pre_prompt, 
                            chat_messages=messages)
                            
llama_chat_prompt = PromptTemplate(template=prompt, input_variables=["question"])

chat_chain = LLMChain(llm=llm, prompt=llama_chat_prompt)


chat_question = { "question": "For the 2nd point you mentioned above, can you please make a 3 day travel plan?"}
messages.append({"role": "user", "content": chat_question})

response = chat_chain(chat_question)

messages.append({"role": "Assistant", "content": response['text']})

~~As you can see, the question does not specify what 2nd point is, while the model is able to grasp that based on the chat history, and the response is like below:~~

Of course! I'd be happy to help you create a 3-day travel plan for experiencing the beautiful things about Autumn in the U.S. Here is a sample itinerary:
Day 1:
* Stop 1: Take a scenic drive through the Adirondack Mountains in upstate New York. The mountains offer breathtaking views of the changing leaves, and there are many scenic overlooks and hiking trails to explore.
* Stop 2: Visit the Hudson River Valley, which is known for its picturesque towns, farms, and vineyards. Take a stroll through the charming streets of Cold Spring or Beacon, and enjoy the fall foliage along the riverfront.
Day 2:
* Stop 1: Head to New England, specifically Vermont or New Hampshire, for some of the most spectacular fall foliage in the country. Take a drive through the Green Mountains or White Mountains, and stop at scenic overlooks and hiking trails along the way.
* Stop 2: Visit the coastal towns of Maine, such as Kennebunkport or Camden

3 Summary

Some the snippets are not made into a function just for demo purposes, while you can see by adding system messages and chat history into the prompt, Llama 2 becomes even more intelligent and helpful.

~~So far, we have covered topics of Llama 2 regarding:~~

~~Fast inference using GPUs~~
~~Better prompt tactics for reasonable response~~
~~Chat with Llama 2~~
~~RAG for domain knowledge question & answering~~

~~This means that a lot of useful apps powered by Llama 2 can be built using above tech stack. Stay tuned for more valuable sharing!~~

Reference

~~How to Prompt Llama 2:~~

Llama 2 is here - get it on Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
get it on Hugging Face

Build a Fraud Intelligence Analyst Powered by Llama 2 in Google Colab with GPUs

Vincent Yuan — Sat, 13 Jan 2024 04:55:34 GMT

Last time, we introduced how to use GPUs in Google Colab to run RAG with Llama 2. Today, a practical use case is discussed - fraudulent credit card transaction detection, powered by Llama 2.
Run Llama 2 with Retrieval Augmented Generation in Google Colab with GPUs
Run Llama2 with RAG in Google Colab.
SpacecraftVincent Yuan
Fraud detection is a critical task for businesses of all sizes. By identifying and investigating fraudulent transactions, businesses can protect their bottom line and keep their customers safe.
Llama 2 is a large language model that can be used to generate text, translate languages, write different kinds of creative content, and more. In this post, we'll show you how to use Llama 2 to build a Fraud Intelligence Analyst that can detect fraudulent patterns of credit card transactions and answer any questions regarding the transactions.
This Fraud Intelligence Analyst can be used to help fraud detection analysts and data scientists build better solutions to the fraud detection problem. By providing insights into the data, the Fraud Intelligence Analyst can help analysts identify new patterns of fraud and develop new strategies to combat it.
This post will show:
Load Llama 2 gguf model from HuggingFace
Run Llam2 2 with GPUs
Create a vector store from a CSV file that has credit card transaction data
Perform question and answering using Retrieval Augmented Generation(RAG)
1 Dependencies
Firstly, install Python dependencies as below:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir !pip install huggingface_hub chromadb langchain sentence-transformers pinecone_client
Then import dependencies as below:
import numpy as np from huggingface_hub import hf_hub_download from llama_cpp import Llama from langchain.llms import LlamaCpp from langchain.chains import LLMChain from langchain.callbacks.manager import CallbackManager from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler from langchain.prompts import PromptTemplate # Vector store from langchain.document_loaders import CSVLoader from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings from langchain.vectorstores import Chroma # Show result import markdown
This credit card transaction dataset will be used to create the vector store:
from google.colab import drive drive.mount('/content/drive') source_text_file = '/content/drive/MyDrive/Research/Data/GenAI/credit_card_fraud.csv'
The transaction data is like below:

transaction time merchant amt city_pop is_fraud

2019-01-01 00:00:44 "Heller, Gutmann and Zieme" 107.23 149 0

2019-01-01 00:00:51 Lind-Buckridge 220.11 4154 0

2019-01-01 00:07:27 Kiehn Inc 96.29 589 0

2019-01-01 00:09:03 Beier-Hyatt 7.77 899 0

2019-01-01 00:21:32 Bruen-Yost 6.85 471 1

💡
The public fraud credit card transaction data can be found here: https://www.datacamp.com/workspace/datasets/dataset-python-credit-card-fraud
2 Load Llama 2 from HuggingFace
Firstly create a callback manager for the streaming output of text, and specify the model names in the HuggingFace:
# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question callback_manager = CallbackManager([StreamingStdOutCallbackHandler()]) # Download the model !wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_0.gguf
Then specify the model path to be loaded into LlamaCpp:
model_path = 'llama-2-7b-chat.Q5_0.gguf'
Specify the GPU settings:
n_gpu_layers = 40 # Change this value based on your model and your GPU VRAM pool. n_batch = 512 # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
Next, let's load the model using langchain as below:
from langchain.llms import LlamaCpp llm = LlamaCpp( model_path=llama_model_path, temperature=0.0, top_p=1, n_ctx=16000, n_gpu_layers=n_gpu_layers, n_batch=n_batch, callback_manager=callback_manager, verbose=True, )
💡
Be sure to set up n_gpu_layers and n_batch, it shows BLAS = 1 in the output if it is set up correctly.
3 Question Answering
This time the CSV loader is used to embed a table and create a vector database, then the LLama 2 model will answer questions based on that file.
3.1 Create a Vector Store
Firstly let's load the CSV data from Colab:
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2") loader = CSVLoader(source_text_file, encoding="windows-1252") documents = loader.load() # Create a vector store db = Chroma.from_documents(documents, embedding_function)
3.2 RAG
We then use RetrievalQA to retrieve the documents from the vector database and give the model more context on Llama 2, thereby increasing its knowledge.
# use another LangChain's chain, RetrievalQA, to associate Llama with the loaded documents stored in the vector db from langchain.chains import RetrievalQA qa_chain = RetrievalQA.from_chain_type( llm, retriever=vstore.as_retriever(search_kwargs={"k": 1}) )
Then the model is ready for your questions:
question = "Do you see any common patter for those fraudulent transactions? think about this step by step and provide examples for each pattern that you found." result = qa_chain({"query": question}) print(markdown.markdown(result['result']))
The response is like:
Yes, I can identify some common patterns in the provided data for fraudulent transactions. Here are some examples of each pattern I found: 1. Recurring Transactions: There are several recurring transactions in the dataset, such as those with the same date and time every day or week. For example, transaction #86cad0e7682a85fa6418dde1a0a33a44 has a recurrence pattern of every Monday at 5:50 AM. While this alone does not necessarily indicate fraud, it could be a sign of automated or scripted transactions. 2. High-Value Transactions: Some transactions have unusually high values compared to the average transaction amount for the merchant and category. For example, transaction #86cad0e7682a85fa6418dde1a0a33a44 has an amt of $32.6, which is significantly higher than the average transaction amount for gas transport merchants in Browning, MO ($19.2). This could indicate a fraudulent transaction. 3. Multiple Transactions from Same IP Address:
Yes, I can identify some common patterns in the provided data for fraudulent transactions.
4 Conclusion
In fraud detection, case studies are a common and important part of the process. However, they can be labor-intensive to create. Llama 2 and RAG can help to automate this process, making it more efficient and effective.
Llama 2 and RAG can be used to generate case studies that are tailored to specific questions or scenarios. This can help fraud detection analysts to identify patterns and trends that they might not otherwise have seen. Additionally, the case studies can be used to train new analysts on the latest fraud detection techniques.
Llama 2 and RAG are still in development, but they have the potential to revolutionize the way that fraud detection case study is conducted. By making it easier to create and analyze case studies, these tools can help fraud detection analysts to stay ahead of the curve.
Stay tuned for more applications like this one!
Reference
Langchain - llama.cpp:
Llama.cpp | 🦜️🔗 Langchain
llama-cpp-python is a
🦜️🔗 LangChain
Create a vector store using CSV files:
How to use CSV files in vector stores with Langchain
A guide for using CSV files in vector stores with langchain
how.wtf

Run Llama 2 with Retrieval Augmented Generation in Google Colab with GPUs

Vincent Yuan — Sun, 07 Jan 2024 21:53:28 GMT

Utilizing GenAI models on Colab with its free GPUs proves advantageous for GenAI developers. It enables faster execution compared to personal computers lacking powerful GPUs, thereby allowing the testing of more ideas within the same timeframe.
Colab GPU
This post will show you how you can:
Load Llama 2 gguf model from HuggingFace
Run Llam2 2 with GPUs
Create a vector store using Pinecone
Perform question and answering using Retrieval Augmented Generation(RAG)
1 Dependencies
Firstly, install Python dependencies as below:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir !pip install huggingface_hub chromadb langchain sentence-transformers pinecone_client
Then import dependencies as below:
import numpy as np from huggingface_hub import hf_hub_download from llama_cpp import Llama from langchain.llms import LlamaCpp from langchain.chains import LLMChain from langchain.callbacks.manager import CallbackManager from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler from langchain.prompts import PromptTemplate
Then mount the Google Drive to load the NBA player sample data shared by Meta in the llama-recipes repo. This dataset will be used to create the vector store:
from google.colab import drive drive.mount('/content/drive') source_text_file = '/content/drive/MyDrive/Research/Data/GenAI/nba.txt'
NBA Player Sample Data
2 Load Llama 2 from HuggingFace
Firstly create a callback manager for the streaming output of text, and specify the model names in the HuggingFace:
# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question callback_manager = CallbackManager([StreamingStdOutCallbackHandler()]) # Download the model !wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_0.gguf
Then specify the model path to be loaded into LlamaCpp:
model_path = 'llama-2-7b-chat.Q5_0.gguf'
Specify the GPU settings:
n_gpu_layers = 40 # Change this value based on your model and your GPU VRAM pool. n_batch = 512 # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
Next, let's load the model using langchain as below:
from langchain.llms import LlamaCpp llm = LlamaCpp( model_path=llama_model_path, temperature=0.0, top_p=1, n_ctx=16000, n_gpu_layers=n_gpu_layers, n_batch=n_batch, callback_manager=callback_manager, verbose=True, )
💡
Be sure to set up n_gpu_layers and n_batch, it shows BLAS = 1 in the output if it is set up correctly.
3 RAG
Retrieval Augmented Generation (RAG) is important because it addresses key limitations of large language models (LLMs). Here's why:
Factual Accuracy: LLMs can be creative and articulate, but they aren't always truthful. RAG integrates external knowledge sources, ensuring generated responses are grounded in real facts.
Reduced Hallucinations: LLMs can sometimes invent information or make false claims. RAG combats hallucinations by providing LLMs with reliable context from external sources.
Domain Expertise: LLMs struggle with specialized topics. RAG allows them access to specific knowledge bases, like medical journals or legal documents, enhancing their responses in niche areas.
Transparency and Trust: RAG systems can show their work! Users can see the sources used to generate responses, building trust and enabling fact-checking.
In short, RAG makes LLMs more reliable, accurate, and versatile, opening doors for their use in areas like education, legal advice, and scientific research. It's a crucial step towards trustworthy and grounded AI.
3.1 Initialize Pinecone
Let's import a few related packages and initialize Pinecone - a vector store provider.
💡
Quick start for Pinecone setup: https://docs.pinecone.io/docs/quickstart
from langchain.document_loaders import TextLoader from langchain.text_splitter import CharacterTextSplitter from langchain.vectorstores import Chroma from langchain.embeddings import HuggingFaceEmbeddings
Get your Pinecone API key and env here:
PINECONE_API_KEY = '' PINECONE_ENV = ''
And initialize it:
import pinecone from langchain.vectorstores import Pinecone # Initialize Pinecone pinecone.init( api_key=PINECONE_API_KEY, environment=PINECONE_ENV ) pinecone_index_nm = 'qabot'
3.2 Create a Vector Store
Firstly let's load the data from Colab:
embeddings = HuggingFaceEmbeddings() # Load the document, split it into chunks, embed each chunk and load it into the vector store. raw_documents = TextLoader(source_text_file).load() text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) documents = text_splitter.split_documents(raw_documents)
Then create the vector store:
# Send embedding vectors to Pinecone with Langchain vstore = Pinecone.from_documents(documents, embeddings, index_name=pinecone_index_nm)
3.3 RAG
We then use RetrievalQA to retrieve the documents from the vector database and give the model more context on Llama 2, thereby increasing its knowledge.
# use another LangChain's chain, RetrievalQA, to associate Llama with the loaded documents stored in the vector db from langchain.chains import RetrievalQA qa_chain = RetrievalQA.from_chain_type( llm, retriever=vstore.as_retriever(search_kwargs={"k": 1}) )
Then the model is ready for your questions:
question = "Who is the tallest in Atlanta Hawks" result = qa_chain({"query": question})
The response is like:
{'query': 'Who is the tallest in Atlanta Hawks', 'result': ' The tallest player on the Atlanta Hawks roster is Saddiq Bey at 6\'7".'}
4 Conclusion
Utilizing the open-source Llama 2 model with RAG, you can create a robust chatbot tailored to your domain knowledge. This capability proves highly beneficial for enterprise users, as it circumvents privacy concerns and data leaks, ensuring everything operates in-house in theory.
However, there's still more to uncover in our quest to construct a secure and responsible GenAI app at the enterprise level. Stay tuned for further updates.
Reference
Langchain - llama.cpp:
Llama.cpp | 🦜️🔗 Langchain
llama-cpp-python is a
🦜️🔗 LangChain

transaction time	merchant	amt	city_pop	is_fraud
2019-01-01 00:00:44	"Heller, Gutmann and Zieme"	107.23	149	0
2019-01-01 00:00:51	Lind-Buckridge	220.11	4154	0
2019-01-01 00:07:27	Kiehn Inc	96.29	589	0
2019-01-01 00:09:03	Beier-Hyatt	7.77	899	0
2019-01-01 00:21:32	Bruen-Yost	6.85	471	1