
fix: predict all CoQA turn answers instead of only the last turn #3704

Draft
rahulraj-jhawar-devrev wants to merge 1 commit into EleutherAI:main from rahulraj-jhawar-devrev:rahulraj/contributions


@rahulraj-jhawar-devrev

Fixes #1231

Problem

The CoQA task implementation only predicts the answer to the final turn of each conversation. The official CoQA benchmark evaluates predictions for all turn_ids and averages the results across turns.
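To illustrate why last-turn-only scoring diverges from the official protocol, here is a minimal sketch of CoQA-style token F1 averaged over every turn. It assumes a single gold answer per turn; the official script additionally handles multiple reference answers, which is omitted here. The function names are hypothetical and are not the harness's API.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, then tokenize on whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return s.split()


def token_f1(prediction: str, gold: str) -> float:
    """Bag-of-tokens F1 between one prediction and one gold answer."""
    pred_toks = normalize_answer(prediction)
    gold_toks = normalize_answer(gold)
    if not pred_toks or not gold_toks:
        # Both empty counts as a match; exactly one empty is a miss.
        return float(pred_toks == gold_toks)
    common = Counter(pred_toks) & Counter(gold_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def average_f1_over_turns(predictions: list[str], golds: list[str]) -> float:
    """Mean F1 across all turns of a conversation (one prediction per turn)."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("need one prediction per gold answer, and at least one turn")
    scores = [token_f1(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores)


preds = ["white", "in a barn", "no"]
golds = ["white", "in a barn near the farm house", "yes"]
print(average_f1_over_turns(preds, golds))  # all three turns
print(token_f1(preds[-1], golds[-1]))       # last turn only
```

On this toy conversation the all-turn average is 11/21 (about 0.52), while scoring only the last turn gives 0.0, so the two protocols can report very different numbers for the same model outputs.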

Changes

  • Modified CoQA utils to iterate over all turns instead of just the last one
  • Maintains conversation context (previous Q&A pairs) for each turn prediction
  • Output format matches official CoQA evaluation expectations
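A minimal sketch of what "maintaining conversation context" can look like for a single turn follows. It assumes the story plus parallel lists of question and answer strings, and a "Q: … / A: …" prompt template; the template and the helper name `build_turn_prompt` are illustrative assumptions, not necessarily the exact format used in this PR.

```python
def build_turn_prompt(story: str, questions: list[str], answers: list[str], turn: int) -> str:
    """Build the prompt for `turn` (0-based): story, gold Q&A history, then the open question.

    The "Q: ... / A: ..." layout is an assumed template for illustration.
    """
    if len(questions) != len(answers):
        raise ValueError("questions and answers must be parallel lists")
    if not 0 <= turn < len(questions):
        raise IndexError(f"turn {turn} out of range for {len(questions)} turns")
    prompt = story + "\n\n"
    # Earlier turns are shown with their gold answers as conversation history.
    for q, a in zip(questions[:turn], answers[:turn]):
        prompt += f"Q: {q}\n\nA: {a}\n\n"
    # The current question is left open for the model to complete.
    prompt += f"Q: {questions[turn]}\n\nA:"
    return prompt


print(build_turn_prompt("Story text.", ["Who?", "Where?"], ["A cat.", "A barn."], 1))
```

Because the history uses gold answers rather than the model's own earlier predictions, each turn is scored independently, and an error on one turn does not propagate into the next turn's context.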

Implementation Details

  • Added a process_docs function that expands each conversation into multiple instances (one per turn)
  • Each expanded instance contains the story and conversation history up to that specific turn
  • The model predicts the answer for each turn with the full context of the previous Q&A pairs
  • Version bumped to 4.0 to reflect the change in evaluation behavior
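The expansion step described above can be sketched roughly as follows. This is a hypothetical, dependency-free version: it assumes each document is a dict with a `story` string and parallel `questions`/`answers` lists of strings, which may differ from the actual dataset schema. In lm-evaluation-harness, a `process_docs` hook receives a Hugging Face dataset rather than a plain list, so a real implementation would need to convert to and from that form.

```python
from typing import Any


def process_docs(docs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Expand each conversation into one instance per turn (hypothetical schema)."""
    expanded = []
    for doc in docs:
        questions, answers = doc["questions"], doc["answers"]
        if len(questions) != len(answers):
            raise ValueError("questions and answers must be parallel lists")
        for turn, (question, answer) in enumerate(zip(questions, answers)):
            expanded.append(
                {
                    "story": doc["story"],
                    # Gold Q&A pairs from all earlier turns of this conversation.
                    "history": list(zip(questions[:turn], answers[:turn])),
                    "question": question,
                    "answer": answer,
                    # 1-based, matching CoQA's turn_id numbering.
                    "turn_id": turn + 1,
                }
            )
    return expanded


docs = [{"story": "Story text.", "questions": ["Who?", "Where?"], "answers": ["A cat.", "A barn."]}]
for inst in process_docs(docs):
    print(inst["turn_id"], inst["history"], inst["question"])
```

One consequence of this design is that the number of evaluated instances grows from one per story to one per turn, so evaluation cost scales with the number of turns in each conversation.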

🤖 Generated with Claude Code


Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Rahulraj Jhawar seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.


Successfully merging this pull request may close these issues.

CoQA's implementation only predicts the last answer of each text
