Add ifeval_pt to harness #3622

Open
Nkluge-correa wants to merge 22 commits into EleutherAI:main from Polygl0t:ifeval_pt

@Nkluge-correa

This PR introduces a Portuguese variant of the original ifeval task (i.e., ifeval_pt).

Because adapting ifeval required changes to some supporting scripts, we decided (for this initial PR) to keep the Portuguese variant separate from the existing implementation. Our goal is to make it easier for maintainers to review the changes and decide on the most appropriate integration strategy (e.g., keeping it as a standalone task or incorporating it into a multilingual version with refactored supporting scripts).
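To make the second integration option concrete, here is a minimal sketch of what a language-parameterized check could look like in a multilingual refactor. It is not taken from this PR or from the existing ifeval scripts: the names `CONSTRAINED_OPTIONS` and `follows_constrained_response`, the `lang` parameter, and the Portuguese strings are all assumptions made up for illustration.

```python
# Illustrative sketch only: names, structure, and strings are assumptions,
# not the code in this PR or in the existing ifeval implementation.
from typing import Dict, Tuple

# Hypothetical per-language answer options for a "constrained response" check.
CONSTRAINED_OPTIONS: Dict[str, Tuple[str, ...]] = {
    "en": ("My answer is yes.", "My answer is no.", "My answer is maybe."),
    "pt": (
        "Minha resposta é sim.",
        "Minha resposta é não.",
        "Minha resposta é talvez.",
    ),
}


def follows_constrained_response(response: str, lang: str = "en") -> bool:
    """Return True if the response contains one of the allowed options for `lang`."""
    options = CONSTRAINED_OPTIONS[lang]
    text = response.strip()
    return any(option in text for option in options)
```

With a layout like this, a single checker would serve both languages and only the lookup table would grow as new languages are added. A standalone task, as in this PR, instead keeps a separate copy of the supporting scripts per language.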

The dataset and metadata were manually translated by our team, which includes native Portuguese speakers. We also carefully reviewed the resulting evaluation outputs to ensure the correctness and consistency of the task. Results obtained using this evaluation are reported in our paper: https://arxiv.org/abs/2603.03543.

Note: We are also preparing additional Portuguese evaluations (including GSM8K and RULER variants), which we plan to submit in separate PRs.

Nkluge-correa requested a review from baberabb as a code owner on March 5, 2026, 09:35
Nkluge-correa requested a review from 0xSMT as a code owner on March 27, 2026, 23:50
@Nkluge-correa
Author

Hello!
Is there anything still missing from this PR for it to be reviewed and considered? I have a few additional tasks I’d like to officially add to the harness, but I want to avoid overwhelming the maintainers with multiple PRs. I’m holding off on opening more until this one is resolved, so we can tackle things one at a time.
