amazingak1/rag-gemini

📘 PDF Chat Application (RAG with Highlighted Sources)

This project is a Streamlit-based PDF Question Answering app built using Retrieval-Augmented Generation (RAG).
You can upload a PDF, ask questions about its content, and get short, accurate answers generated by Gemini, along with highlighted source chunks from the PDF.


🚀 Features

  • 📄 Upload any PDF file
  • 🔍 Semantic search using FAISS
  • 🧠 Text embeddings via HuggingFace (MiniLM)
  • 🤖 Answer generation using Google Gemini
  • 🖍️ Highlighted source chunks with page numbers
  • ⚡ Cached vector store for fast performance
  • ❌ Responds with "I don't know" if the answer is not in the PDF
  • 🧾 Stores chat history

🛠️ Tech Stack

  • Streamlit – UI
  • LangChain – Orchestration & document processing
  • FAISS – Vector database
  • HuggingFace Embeddings – Semantic text embeddings
  • Google Gemini – Answer generation
  • PyPDFLoader – PDF extraction

📂 Project Structure

├── main3.py # Streamlit app
├── .env # API keys
├── requirements.txt # Dependencies
└── README.md


📦 Installation

pip install -r requirements.txt

🧠 How It Works

  • Upload a PDF – Use the file uploader widget
  • PDF Processing – The document is split into overlapping chunks (800 characters each, with a 150-character overlap)
  • Embeddings – Chunks are converted into embeddings using HuggingFace MiniLM
  • Vector Store – Embeddings are stored in FAISS for fast retrieval
  • Query Processing – The user's question is used to retrieve the 3 most relevant chunks
  • Answer Generation – Gemini generates an answer strictly from retrieved context
  • Source Display – Original chunks are shown with page numbers and highlighting
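
The chunking and retrieval steps above can be sketched in plain Python. This is only an illustration, not the code in main3.py: the app uses LangChain for splitting and FAISS for search, whereas the hypothetical helpers below (`split_with_overlap`, `cosine`, `top_k`) use a fixed character window and a brute-force cosine search over toy vectors. The 800/150/3 defaults are the values stated in this README.

```python
import math


def split_with_overlap(text, chunk_size=800, overlap=150):
    """Split text into fixed-size character windows that overlap.

    Assumption: a plain sliding window. A real text splitter may also
    try to break on paragraph or sentence boundaries.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=3):
    """Return the indices of the k most similar chunk vectors, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

With these defaults each window starts 650 characters after the previous one, so the last 150 characters of a chunk are repeated at the start of the next. The overlap reduces the chance that a sentence relevant to the question is cut in half at a chunk boundary.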

🧪 Example Prompts

Try asking these questions after uploading a PDF:

  • "What is the main topic discussed in this PDF?"
  • "Summarize the key points"
  • "What does the document say about [specific topic]?"
  • "Who are the authors?"

⚠️ Important Notes

Accuracy & Grounding

  • ✅ Answers are generated only from the chunks retrieved from the PDF
  • ✅ Designed to avoid hallucinations – If the retrieved context does not contain the answer, the app responds with "I don't know"
  • ❌ Does not generate information outside the PDF content
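
One common way to get this behaviour is to put the retrieved chunks and an explicit fallback instruction into the prompt. The sketch below shows the idea; the function name `build_grounded_prompt` and the prompt wording are assumptions for illustration, and the actual prompt used in main3.py may differ.

```python
def build_grounded_prompt(question, chunks):
    """Build a prompt that restricts the model to the retrieved context.

    chunks: list of (page_number, text) tuples (hypothetical shape).
    """
    context = "\n\n".join(f"[page {page}] {text}" for page, text in chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, reply exactly: I don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

An instruction like this makes out-of-context answers less likely, but it depends on the model following the prompt, so the retrieved sources are still worth checking.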

PDF Compatibility

  • ✅ Works best with text-based PDFs
  • ❌ May struggle with scanned images (OCR not included)
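
A rough way to spot a scanned PDF before indexing is to check how many pages produced any extractable text. The helpers below are a hypothetical heuristic, not part of the app; `page_texts` stands for the per-page strings returned by whatever PDF loader is in use, and the `min_chars` and `threshold` values are arbitrary.

```python
def text_coverage(page_texts, min_chars=20):
    """Fraction of pages with at least min_chars non-whitespace characters."""
    if not page_texts:
        return 0.0
    with_text = sum(
        1 for text in page_texts if len("".join(text.split())) >= min_chars
    )
    return with_text / len(page_texts)


def looks_scanned(page_texts, threshold=0.5):
    """Guess that a PDF is image-only when most pages yield little or no text."""
    return text_coverage(page_texts) < threshold
```

If `looks_scanned` returns True, the app could warn the user that the file probably needs OCR before it can be queried.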

Performance

  • Vector stores are cached for faster repeated queries
  • First query may take a few seconds while embeddings are computed
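
This README does not say how the cache is keyed. One plausible approach, sketched below, is to key the vector store on a hash of the uploaded file's bytes so the same PDF is only embedded once. `get_or_build_index` and the `build_index` callback are hypothetical names; a Streamlit app might instead rely on Streamlit's built-in caching decorators.

```python
import hashlib

# In-memory cache: SHA-256 of the PDF bytes -> built index (any object).
_index_cache = {}


def get_or_build_index(pdf_bytes, build_index):
    """Return a cached index for these bytes, building it on first use.

    build_index: callable taking the PDF bytes and returning the index
    (for example, a function that chunks, embeds and stores the text).
    """
    key = hashlib.sha256(pdf_bytes).hexdigest()
    if key not in _index_cache:
        _index_cache[key] = build_index(pdf_bytes)
    return _index_cache[key]
```

Hashing the content rather than the filename means a re-upload of the same file hits the cache, while a different file with the same name does not.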
