Evaluating ChatGPT’s Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness
Updated Aug 17, 2024 - Python
[ACL'24] Official implementation of the paper "Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs" (https://aclanthology.org/2024.findings-acl.168)
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
Evaluating the faithfulness of long-context language models
Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"
The dataset and official implementation for "Discursive Socratic Questioning: Evaluating the Faithfulness of Language Models’ Understanding of Discourse Relations" (ACL 2024)
The code for our paper "Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning". Do not hesitate to open an issue if you run into any trouble!
Koishi's Day 2024 Paper (NeurIPS 2024): An advanced persona-driven role-playing system with global faithfulness quantification and optimization. In memory of Koishi's Day 2024.
Open-source Python toolkit for evaluating RAG pipelines. LLM-as-judge for faithfulness, relevancy, and context precision with Claude and GPT-4 backends.
Novel data representation leading to granular citations and higher accuracy
A new training framework for Trustworthy Large Reasoning Models
On the evaluation of deep learning interpretability methods for medical images under the scope of faithfulness
[EMNLP 2023] A Causal View of Entity Bias in (Large) Language Models
FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation
Official PyTorch implementation of Faithfulness Serum (ACL Main 2026) - a training-free method that improves the faithfulness of LLM explanations by guiding generation with attribution-based signals.
[ACL 2025] Reranking-based Generation for Unbiased Perspective Summarization
The official repo for the EMNLP 2025 paper "NormXLogit: The Head-on-Top Never Lies"
End-to-end RAG evaluation kit. Auto-generate test questions from your corpus, score responses on faithfulness/relevance/completeness with LLM-as-judge, produce quality reports. Works with any RAG implementation.
This project applies Explainable AI techniques to a Student Dropout dataset, covering pre-, in-, and post-modeling explanations, as well as an analysis of their quality. The project was developed for the "Advanced Topics on Machine Learning" course (1st semester, 1st year of the Master's Degree in Artificial Intelligence).