fix: recompute overlap after trimming for merge mode#676
fix: recompute overlap after trimming for merge mode#676sfchen merged 1 commit intoOpenGene:masterfrom
Conversation
The overlap caching optimization introduced in v1.3 computed the overlap result once before adapter/polyX/maxLen trimming and reused it for the merge step. This caused the merge to use stale overlap data from untrimmed reads, leading to pairs that should merge post-trim being incorrectly routed to the unpaired output path instead. Always recompute the overlap on the post-trim reads before merging, restoring v1.2 behavior where the merge step analyzed trimmed reads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@bernt-matthias Thanks for reporting this. We've identified the root cause and submitted this fix. Root cause: The v1.3 overlap caching optimization computes the overlap result once before adapter trimming and reuses it for the merge step. After adapter trimming shortens the reads, the stale pre-trim overlap can incorrectly report "not overlapped" for pairs that actually overlap post-trim. These pairs skip merging and fall through to the normal paired-end processing path, where a single-passing read gets written to the unpaired output — producing data in a file that was empty in v1.2. Fix: Recompute the overlap on the post-trim reads before the merge step, restoring v1.2 behavior. Could you verify this fix against your Galaxy test case? That would help confirm it fully resolves the discrepancy you observed. |
|
I'm happy to help. Could you provide me a prebuilt binary or tell me how to produce one? Otherwise you could open a draft pull request on bioconda against this commit. Then I could test the artifacts. |
@bernt-matthias let me do the 2nd one |
Point source to KimBioInfoStudio/fastp@9d8e888 which recomputes overlap after trimming, fixing incorrect unpaired output in merge mode. This is a test build for verification — not for merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@bernt-matthias Draft PR opened on bioconda-recipes: bioconda/bioconda-recipes#63976 Once CI completes, you can grab the built binaries from the artifacts (linux-x86_64, linux-aarch64, osx-x86_64, osx-arm64). Let us know how it goes! |
|
I tested successfully with the linux container built by bioconda. |
|
I am going to release v1.3.1 for a fast fix. |
Summary
Fixes #675 — merge mode in v1.3 produces non-empty reverse unpaired output where v1.2 produced an empty file.
Root cause: The overlap caching optimization (introduced in
12b1738) computes the overlap result once before adapter/polyX/maxLen trimming and reuses it for the merge step. After trimming shortens the reads, the stale pre-trim overlap may incorrectly report "not overlapped" for pairs that do overlap post-trim. These pairs skip merging and fall through to the normal PE processing path, where single-passing reads get routed to the unpaired output — producing data in a file that should be empty.Fix: Always recompute the overlap on post-trim reads before the merge step, restoring v1.2 behavior.
Test plan
--mergeand--unpaired2on paired-end data containing adapter sequences🤖 Generated with Claude Code