Skip to content

fix: recompute overlap after trimming for merge mode#676

Merged
sfchen merged 1 commit intoOpenGene:masterfrom
KimBioInfoStudio:fix/stale-overlap-merge-675
Mar 30, 2026
Merged

fix: recompute overlap after trimming for merge mode#676
sfchen merged 1 commit intoOpenGene:masterfrom
KimBioInfoStudio:fix/stale-overlap-merge-675

Conversation

@KimBioInfoStudio
Copy link
Copy Markdown
Member

Summary

Fixes #675 — merge mode in v1.3 produces non-empty reverse unpaired output where v1.2 produced an empty file.

Root cause: The overlap caching optimization (introduced in 12b1738) computes the overlap result once before adapter/polyX/maxLen trimming and reuses it for the merge step. After trimming shortens the reads, the stale pre-trim overlap may incorrectly report "not overlapped" for pairs that do overlap post-trim. These pairs skip merging and fall through to the normal PE processing path, where single-passing reads get routed to the unpaired output — producing data in a file that should be empty.

Fix: Always recompute the overlap on post-trim reads before the merge step, restoring v1.2 behavior.

Test plan

  • Run fastp with --merge and --unpaired2 on paired-end data containing adapter sequences
  • Verify reverse unpaired output matches v1.2 behavior (empty when expected)
  • Verify merged output contains the same reads as v1.2

🤖 Generated with Claude Code

The overlap caching optimization introduced in v1.3 computed the overlap
result once before adapter/polyX/maxLen trimming and reused it for the
merge step. This caused the merge to use stale overlap data from
untrimmed reads, leading to pairs that should merge post-trim being
incorrectly routed to the unpaired output path instead.

Always recompute the overlap on the post-trim reads before merging,
restoring v1.2 behavior where the merge step analyzed trimmed reads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@KimBioInfoStudio
Copy link
Copy Markdown
Member Author

@bernt-matthias Thanks for reporting this. We've identified the root cause and submitted this fix.

Root cause: The v1.3 overlap caching optimization computes the overlap result once before adapter trimming and reuses it for the merge step. After adapter trimming shortens the reads, the stale pre-trim overlap can incorrectly report "not overlapped" for pairs that actually overlap post-trim. These pairs skip merging and fall through to the normal paired-end processing path, where a single-passing read gets written to the unpaired output — producing data in a file that was empty in v1.2.

Fix: Recompute the overlap on the post-trim reads before the merge step, restoring v1.2 behavior.

Could you verify this fix against your Galaxy test case? That would help confirm it fully resolves the discrepancy you observed.

@bernt-matthias
Copy link
Copy Markdown

I'm happy to help. Could you provide me a prebuilt binary or tell me how to produce one? Otherwise you could open a draft pull request on bioconda against this commit. Then I could test the artifacts.

@KimBioInfoStudio
Copy link
Copy Markdown
Member Author

I'm happy to help. Could you provide me a prebuilt binary or tell me how to produce one? Otherwise you could open a draft pull request on bioconda against this commit. Then I could test the artifacts.

@bernt-matthias let me do the 2nd one

KimBioInfoStudio added a commit to KimBioInfoStudio/bioconda-recipes that referenced this pull request Mar 30, 2026
Point source to KimBioInfoStudio/fastp@9d8e888 which recomputes
overlap after trimming, fixing incorrect unpaired output in merge mode.
This is a test build for verification — not for merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@KimBioInfoStudio
Copy link
Copy Markdown
Member Author

@bernt-matthias Draft PR opened on bioconda-recipes: bioconda/bioconda-recipes#63976

Once CI completes, you can grab the built binaries from the artifacts (linux-x86_64, linux-aarch64, osx-x86_64, osx-arm64). Let us know how it goes!

@bernt-matthias
Copy link
Copy Markdown

I tested successfully with the linux container built by bioconda.

@sfchen sfchen merged commit f6d2d34 into OpenGene:master Mar 30, 2026
2 checks passed
@sfchen
Copy link
Copy Markdown
Member

sfchen commented Mar 30, 2026

I am going to release v1.3.1 for a fast fix.

@KimBioInfoStudio KimBioInfoStudio deleted the fix/stale-overlap-merge-675 branch March 30, 2026 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test discrepancy in version 1.3

3 participants