RAG (Retrieval-Augmented Generation)

LLMs have two major weaknesses for enterprise adoption:

1. Hallucination: When a model lacks the facts, it still predicts the most plausible-sounding next words, so it can produce a confident but false answer.
2. Knowledge Cutoff: A model knows only what was in its training data. An LLM trained in 2023 knows nothing about your company's proprietary Q3 2024 financial report.

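RAG is an architectural pattern that addresses both issues without the cost of re-training or fine-tuning the LLM: relevant documents are retrieved at query time and supplied to the model as context. It reduces hallucination rather than eliminating it, because the model can still misread or ignore the retrieved text.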

The RAG Architecture

1. The Ingestion Pipeline (Data Preparation)

Extraction: Pulling text from proprietary PDFs, Websites, or SQL databases.
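Chunking: LLMs have context limits, measured in tokens, so they can't read a 10,000-page book in one go. Split the text into overlapping chunks (e.g., 500 words per chunk, with some words repeated between neighboring chunks) so that context spanning a chunk boundary is not lost.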
Embedding: Every chunk (often thousands of them) is passed into an Embedding Model (like Amazon Titan Embeddings), which converts the text into a numerical Vector.
Vector Storage: These vectors are saved, together with the original chunk text, into a vector store (like the FAISS library or a managed Vector Database such as Pinecone).
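
The Chunking step above can be sketched in a few lines. This is a minimal illustration, not the method used by any particular library: the function name chunk_words and the default overlap of 50 words are assumptions made for the example, and it counts whitespace-separated words rather than model tokens.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to `chunk_size` words.

    Neighboring chunks share `overlap` words. Both defaults are
    illustrative assumptions, not recommended values.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Tiny demo with a 4-word window and 1 word of overlap.
demo = chunk_words("a b c d e f g", chunk_size=4, overlap=1)
print(demo)  # ['a b c d', 'd e f g']
```

Real pipelines often split on token counts or on sentence and paragraph boundaries instead of raw word counts; the sliding-window idea stays the same.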

2. The Retrieval & Generation Pipeline (User Request)

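Query Embedding: A user asks: "What is our current sick leave policy?" This question is converted into a Vector using the same Embedding Model that was used during ingestion, so that question and chunk vectors are comparable.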
Semantic Search: The Vector DB runs a nearest-neighbor search and returns the top few document chunks (e.g., 3 paragraphs from the HR handbook) whose vectors are closest to the question's Vector, typically measured by cosine similarity.
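Prompt Augmentation: The retrieved chunks and the user's question are inserted into a prompt template behind the scenes. (This step is sometimes loosely called "prompt injection", but that term normally refers to a security attack, so it is avoided here.)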
Generation: The prompt is sent to the LLM (e.g., GPT-4 or Claude 3). The LLM follows the strict instructions, reads the chunks, and generates an answer grounded in the provided chunks rather than in its training data.
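
The Query Embedding and Semantic Search steps can be sketched with the standard library alone. Everything here is an assumption made for illustration: toy_embed is a bag-of-words stand-in for a real embedding model (it matches shared words, not meaning), the in-memory list stands in for a Vector DB, and the cafeteria and expense-report chunks are invented filler.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(question: str, index: list[tuple[str, Counter]], k: int = 3) -> list[str]:
    """Return the k chunk texts whose vectors are closest to the question."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


chunks = [
    "Sick leave accrues at 3 hours per bi-weekly pay period...",
    "Sick leave maxes out at 40 hours carried over...",
    "The cafeteria is open from 8 am to 3 pm on weekdays.",  # invented filler
    "Expense reports are due by the fifth of each month.",   # invented filler
]

# "Ingestion": embed each chunk once and keep text and vector together.
index = [(c, toy_embed(c)) for c in chunks]

# "Retrieval": embed the question and fetch the closest chunks.
results = top_k("What is our current sick leave policy?", index, k=2)
for r in results:
    print(r)
```

With this toy data the two sick-leave chunks rank above the unrelated ones. A real system would replace toy_embed with calls to an embedding model and the list scan with a vector index.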

The Hidden Strict Prompt

When you use a RAG system, your question is wrapped in a larger instruction prompt before it reaches the LLM. The user normally never sees this wrapper. It looks like this:

You are a helpful company assistant. 
Answer the User's question using ONLY the provided Context chunks below. 
If the Context does not contain the answer, you MUST say "I do not know" and refuse to answer. Do not use outside knowledge.

Context Chunk 1: "Sick leave accrues at 3 hours per bi-weekly pay period..."
Context Chunk 2: "Sick leave maxes out at 40 hours carried over..."

User Question: What is our current sick leave policy?
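
Assembling that prompt is plain string formatting. The sketch below reproduces the wording of the prompt above; the function name build_prompt is an assumption for illustration, and the result would still need to be sent to an LLM through whatever client you use.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Wrap the user's question and the retrieved chunks in the strict prompt."""
    context = "\n".join(
        f'Context Chunk {i}: "{chunk}"' for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "You are a helpful company assistant.\n"
        "Answer the User's question using ONLY the provided Context chunks below.\n"
        'If the Context does not contain the answer, you MUST say "I do not know" '
        "and refuse to answer. Do not use outside knowledge.\n\n"
        f"{context}\n\n"
        f"User Question: {question}"
    )


prompt = build_prompt(
    "What is our current sick leave policy?",
    [
        "Sick leave accrues at 3 hours per bi-weekly pay period...",
        "Sick leave maxes out at 40 hours carried over...",
    ],
)
print(prompt)
```

Keeping the template in one function makes it easy to change the instructions or the number of chunks without touching the retrieval code.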

How to execute the examples:

Go to the Examples/ folder and run the script: python GenAI_LangChain_RAG.py