fix: pwrite spin-wait livelock on CPU-constrained systems (#678) by KimBioInfoStudio · Pull Request #679 · OpenGene/fastp

KimBioInfoStudio · 2026-04-08T06:24:42Z

Summary

Fixes #678 — regression introduced in v1.2.0 where large PE gz FASTQ files hang indefinitely with high thread counts.

--detect_adapter_for_pe is not related to this bug — the reporter happened to use it, but the deadlock is in the pack distribution and pwrite synchronization.

Root Cause

Bug 1: Lock-free list deadlock (primary)

Introduced in v1.2.0, when PACK_SIZE was increased from 256 to 1000 and PACK_IN_MEM_LIMIT was reduced from 128 to 32.

SingleProducerSingleConsumerList::canBeConsumed() returns false for a single-item list because nextItemReady=false and producerFinished=false. When thread_count > PACK_IN_MEM_LIMIT (e.g. -w 64 with limit 32), each worker gets at most 1 pack before reader backpressure kicks in. Workers cannot consume their only pack → mPackProcessedCounter never advances → readers stay blocked → deadlock.
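
The difference between the two predicates can be sketched with a small single-threaded model. This is only an illustration under stated assumptions: `ListModel`, `produce`, `consume`, `canBeConsumedOld`, and `canBeConsumedNew` are hypothetical names, the counters are plain integers rather than the atomics a real lock-free list would need, and none of it is fastp's actual `SingleProducerSingleConsumerList` code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, single-threaded model of the list's bookkeeping. Field names
// follow the PR description; the real class layout is an assumption.
struct ListModel {
    std::size_t produced = 0;       // items appended by the producer
    std::size_t consumed = 0;       // items removed by the consumer
    bool nextItemReady = false;     // set once the unconsumed head has a successor
    bool producerFinished = false;  // set once the producer signals completion
};

// Old check as described above: a lone item is not consumable.
inline bool canBeConsumedOld(const ListModel& l) {
    return l.nextItemReady || l.producerFinished;
}

// New check from the fix: any unconsumed item is consumable.
inline bool canBeConsumedNew(const ListModel& l) {
    return l.produced > l.consumed;
}

// Appends one pack; the head gains a successor only if an item was already pending.
inline void produce(ListModel& l) {
    if (l.produced > l.consumed) l.nextItemReady = true;
    ++l.produced;
}

// Removes one pack under the new check.
inline void consume(ListModel& l) {
    assert(canBeConsumedNew(l));
    ++l.consumed;
    l.nextItemReady = (l.produced - l.consumed) > 1;
}
```

With exactly one pack in the list, the old check reports nothing to consume while the new one lets the worker proceed, which is the situation each worker is in when thread_count exceeds PACK_IN_MEM_LIMIT.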

Bug 2: Pwrite spin-wait livelock

Introduced in v1.2.0 with the parallel libdeflate gzip compression + pwrite mechanism.

The pwrite offset ring used hardware pause/yield instructions that do not yield to the OS scheduler. Under CPU contention (Docker containers with limited CPUs), spinning threads consume all CPU, preventing the predecessor thread from publishing its sequence — output stays at 0 bytes.
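
A blocking hand-off avoids this because a waiting thread sleeps in the kernel instead of burning its time slice. The sketch below shows one way to order offset reservations with `std::condition_variable`. It assumes each writer knows the sequence number of its pack; `OffsetSequencer` and `reserve` are hypothetical names, and this is not the code in `writerthread.cpp`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical ordered hand-off: the writer holding sequence number `seq`
// may reserve its file offset only after writer `seq - 1` has done so.
class OffsetSequencer {
public:
    // Blocks (without spinning) until it is `seq`'s turn, reserves `size`
    // bytes, and returns the offset at which the caller may pwrite.
    std::uint64_t reserve(std::uint64_t seq, std::size_t size) {
        std::unique_lock<std::mutex> lock(mMutex);
        assert(seq >= mNextSeq);  // a sequence that already passed would wait forever
        mCond.wait(lock, [&] { return mNextSeq == seq; });
        const std::uint64_t offset = mNextOffset;
        mNextOffset += size;
        ++mNextSeq;
        lock.unlock();
        mCond.notify_all();  // wake the successors; only the next one proceeds
        return offset;
    }

private:
    std::mutex mMutex;
    std::condition_variable mCond;
    std::uint64_t mNextSeq = 0;     // sequence number allowed to reserve next
    std::uint64_t mNextOffset = 0;  // first unreserved byte in the output file
};
```

Each worker would call `reserve(seq, compressedSize)` and then write at the returned offset. A waiter that is not next in line goes back to sleep, so the predecessor can be scheduled even with only one or two CPUs available.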

Bug 3: Repeated buffer reallocation

Introduced in v1.2.0 alongside the pwrite mechanism.

mCompBufSize was shared across all threads and never updated after init, causing every worker to re-allocate its compress buffer on every call when data exceeded the initial 500KB estimate.
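
Per-thread size tracking can be sketched as follows. The name `mCompBufSizes` comes from the fix; `PerThreadCompBuf`, `acquire`, and `reallocCount` are hypothetical helpers for illustration, and the real buffer management in `writerthread.cpp` may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-thread compress buffers. Each thread touches only its own
// slot, so the slots need no locking once the container is constructed.
class PerThreadCompBuf {
public:
    PerThreadCompBuf(std::size_t threads, std::size_t initialSize)
        : mCompBufs(threads), mCompBufSizes(threads, initialSize), mReallocCount(threads, 0) {
        for (auto& buf : mCompBufs) buf.reset(new char[initialSize]);
    }

    // Returns thread `tid`'s buffer, growing it only when `needed` exceeds
    // the size recorded for that thread.
    char* acquire(std::size_t tid, std::size_t needed) {
        assert(tid < mCompBufs.size());
        if (needed > mCompBufSizes[tid]) {
            mCompBufs[tid].reset(new char[needed]);
            mCompBufSizes[tid] = needed;  // record the growth so later calls reuse the buffer
            ++mReallocCount[tid];
        }
        return mCompBufs[tid].get();
    }

    std::size_t reallocCount(std::size_t tid) const { return mReallocCount[tid]; }

private:
    std::vector<std::unique_ptr<char[]>> mCompBufs;
    std::vector<std::size_t> mCompBufSizes;
    std::vector<std::size_t> mReallocCount;  // instrumentation for the sketch only
};
```

Because the recorded size is updated on growth, a thread that once needed 600KB allocates once and then reuses that buffer for every later call that fits.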

Fix

| Bug | Fix | File |
| --- | --- | --- |
| List deadlock | `produced > consumed` replaces `nextItemReady \|\| producerFinished` | `singleproducersingleconsumerlist.h` |
| Pwrite livelock | `std::condition_variable` replaces hardware spin-wait | `writerthread.cpp`, `writerthread.h` |
| Buffer realloc | Per-thread `mCompBufSizes[]` replaces shared `mCompBufSize` | `writerthread.cpp`, `writerthread.h` |

Test plan

- make -j8 compiles on macOS
- PE gz output with -w 4/8/16/64 all produce correct non-zero output
- -w 64 with 500K reads completes in 25s (previously hung indefinitely)
- Reproduced deadlock before fix: -w 64 → 0-byte output, confirmed via sample stack traces (all workers blocked at processorTask:1029 waiting for input, both readers blocked in backpressure)
- Verify on Linux Docker with --cpus=2 -w 8 (original reporter's environment)

🤖 Generated with Claude Code

sfchen · 2026-04-08T06:33:27Z

Tested, but this PR doesn't fix #678

KimBioInfoStudio · 2026-04-08T07:52:56Z

@sfchen could you give it another try? You can find the binary in CI: https://github.com/OpenGene/fastp/actions/runs/24123835261?pr=679

…Gene#678)

Two bugs caused hangs with large PE gz FASTQ files:

1. Lock-free list deadlock: canBeConsumed() returned false for single-item lists. When thread_count > PACK_IN_MEM_LIMIT, each worker gets ≤1 pack before reader backpressure — deadlock. Fix: mark first item consumable via nextItemReady on produce().

2. Pwrite spin-wait livelock: hardware pause/yield do not yield OS timeslice. Under CPU contention (Docker), workers starve the predecessor thread. Fix: sleep_for(1μs) to yield CPU.

Also fix per-thread compress buffer tracking (mCompBufSize was shared and never updated, causing repeated reallocation).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

KimBioInfoStudio marked this pull request as draft April 8, 2026 06:35

KimBioInfoStudio force-pushed the fix/pwrite-livelock-678 branch 3 times, most recently from 7d8a087 to ca0c7b9 Compare April 8, 2026 07:40

KimBioInfoStudio mentioned this pull request Apr 8, 2026

Performance regression in v1.3.1 vs v1.1.0 on large PE gz FASTQ with --detect_adapter_for_pe #678

Closed

KimBioInfoStudio force-pushed the fix/pwrite-livelock-678 branch 5 times, most recently from a6c4d9c to e23b8e8 Compare April 8, 2026 09:35

KimBioInfoStudio force-pushed the fix/pwrite-livelock-678 branch from e23b8e8 to 19602ae Compare April 8, 2026 09:46

sfchen marked this pull request as ready for review April 8, 2026 10:09

sfchen merged commit 95903bf into OpenGene:master Apr 8, 2026
2 checks passed
