RanulND/ScholarQA

Comparing the Performance of LLMs in RAG-Based Question-Answering: A Case Study in Computer Science Literature

This repository contains the code and datasets used for the paper "Comparing the Performance of LLMs in RAG-Based Question-Answering: A Case Study in Computer Science Literature", presented at the 5th International Conference on Artificial Intelligence in Education Technology, hosted by the University of Barcelona. Access the full paper here.

This paper compares the performance of popular open-source LLMs such as Mistral-7b-instruct, LLaMa2-7b-chat, Falcon-7b-instruct and Orca-mini-v3-7b, and OpenAI’s trending GPT-3.5 in RAG-based question-answering.

🌟 Guide to the repo

  1. ChatBot
    Contains the scripts used for developing the chatbot.
  • chatbot_utils.py: Contains the parameters set for each LLM
  • hybrid_chat.py: Script for the hybrid approach, which uses answer candidates from the knowledge graph together with a LangChain LLMChain to generate the answer.
  • prompt.py: Contains user-defined prompts to match different scenarios/LLMs
  • vector_chat.py: Builds the chat pipeline, combining the chat history with the documents retrieved from the vector db to answer questions.
  • vectordb_retriever.py: Runs a vector search against the FAISS vector db when the user inputs a question.
  2. DataBase

    2.1 Knowledge Graph

    • Contains the script used to transform Cypher text into Neo4j Knowledge Graph representations.
    • At this stage of the research, knowledge graphs are not fully implemented, so please treat this part as a testing phase and prioritize the vector db.

    2.2 VectorDB

    • Contains the script used to embed document text into vectors and store them in the FAISS vector db.
    • Please note that all the documents are first represented using the JSON format before embedding.
  3. Document Processing

  • Includes some of the preprocessing steps carried out on the documents.
  • Also includes the script to convert text documents into JSON format.
  4. Evaluation
  • A script to calculate the cosine similarity between the generated answer and the answer candidate.
  • Used to mathematically determine the accuracy of the chatbot in answer generation.
  5. Sample Files
  • Contains some of the documents, and their JSON versions, that we used to test the chatbot functionality.
  • Please note that all our LLM prompts are engineered to answer questions from the domains covered in the sample data: LLMs, Quantum Computing, and Edge Computing.
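
To make the retrieval step described under ChatBot more concrete, here is a minimal sketch of the idea, not the code in vectordb_retriever.py or vector_chat.py. It swaps FAISS for a brute-force nearest-neighbour search by Euclidean distance over toy vectors, and the names `DOCS`, `retrieve` and `build_prompt`, as well as the prompt layout, are hypothetical and chosen only for illustration.

```python
import math

# Toy 3-dimensional "embeddings" (assumption: real vectors would come from
# an embedding model and be stored in the FAISS vector db).
DOCS = {
    "llm_intro": [0.9, 0.1, 0.0],
    "quantum_basics": [0.0, 0.9, 0.2],
    "edge_overview": [0.1, 0.0, 0.95],
}


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_vec, docs, k=2):
    """Return the ids of the k documents closest to query_vec (brute force)."""
    ranked = sorted(docs, key=lambda doc_id: l2_distance(query_vec, docs[doc_id]))
    return ranked[:k]


def build_prompt(question, history, passages):
    """Combine retrieved passages and chat history into one prompt string.

    The layout is a hypothetical example, not the prompts in prompt.py.
    """
    context = "\n".join(passages)
    history_text = "\n".join(f"{role}: {text}" for role, text in history)
    return f"Context:\n{context}\n\nHistory:\n{history_text}\n\nQuestion: {question}"


if __name__ == "__main__":
    hits = retrieve([1.0, 0.0, 0.0], DOCS, k=2)
    print(hits)
    print(build_prompt("What is RAG?", [("user", "Hi")], hits))
```

A vector db such as FAISS does the same job as `retrieve`, but with index structures that scale to far larger collections than a sorted scan.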

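The cosine similarity check described under Evaluation can be sketched as follows. This is an illustration under stated assumptions rather than the repository's evaluation script: it compares simple word-count vectors, whereas the actual script may well compare embedding vectors instead. The function names `tokenize` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts.

    Assumption: word counts stand in for whatever vector representation
    the real evaluation uses.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only words present in both texts contribute to the dot product.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    generated = "Edge computing processes data close to where it is produced."
    candidate = "Edge computing processes data near its source."
    print(round(cosine_similarity(generated, candidate), 3))
```

A score near 1 means the generated answer and the answer candidate use largely the same vocabulary, and a score of 0 means they share no words.
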
🛠 Run it

  1. Clone the repository to your local machine

    git clone https://github.com/RanulND/ScholarQA.git
  2. Once the cloning process is completed, navigate into the cloned directory.

    cd ScholarQA
  3. Install requirements

    pip install -r requirements.txt

🪄 Customise to match your style & feel the magic!
