fix: pwrite spin-wait livelock on CPU-constrained systems (#678) #679

Merged
sfchen merged 1 commit into OpenGene:master from KimBioInfoStudio:fix/pwrite-livelock-678
Apr 8, 2026

KimBioInfoStudio (Member) commented Apr 8, 2026

Summary

Fixes #678, a regression introduced in v1.2.0 in which processing large paired-end (PE) gzipped FASTQ files hangs indefinitely at high thread counts.

--detect_adapter_for_pe is unrelated to this bug. The reporter happened to use it, but the deadlock lies in the pack distribution and pwrite synchronization.

Root Cause

Bug 1: Lock-free list deadlock (primary)

Introduced in v1.2.0, when PACK_SIZE was increased from 256 to 1000 and PACK_IN_MEM_LIMIT was reduced from 128 to 32.

SingleProducerSingleConsumerList::canBeConsumed() returns false for a single-item list, because in that state nextItemReady and producerFinished are both false. When thread_count > PACK_IN_MEM_LIMIT (e.g. -w 64 with a limit of 32), each worker receives at most one pack before reader backpressure kicks in. No worker can consume its only pack, so mPackProcessedCounter never advances, the readers stay blocked, and the pipeline deadlocks.
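The difference between the two predicates can be illustrated with a small single-threaded model. This is only a sketch and not the fastp class: SpscListModel, its fields, and the rule used for nextItemReady() (true only once a second item is linked behind the head) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Simplified, single-threaded model of the list state. This is NOT the
// fastp class; field and method names are assumptions for illustration.
struct SpscListModel {
    std::size_t produced = 0;      // items appended by the producer
    std::size_t consumed = 0;      // items already taken by the consumer
    bool producerFinished = false; // producer has signalled end of input

    // Assumption: the flag only becomes true once a second item is
    // linked behind the current head.
    bool nextItemReady() const { return produced - consumed >= 2; }

    // Pre-fix predicate as described in the PR text.
    bool canBeConsumedOld() const { return nextItemReady() || producerFinished; }

    // Post-fix predicate: any unconsumed item is consumable.
    bool canBeConsumedNew() const { return produced > consumed; }
};

// State of a worker that holds exactly one pack while the reader is blocked.
inline SpscListModel singlePackState() {
    SpscListModel s;
    s.produced = 1;
    assert(s.consumed == 0 && !s.producerFinished);
    return s;
}
```

In this model, singlePackState() is not consumable under the old predicate but is consumable under the new one, which matches the stall described above.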

Bug 2: Pwrite spin-wait livelock

Introduced in v1.2.0 with the parallel libdeflate gzip compression + pwrite mechanism.

The pwrite offset ring waited with hardware pause/yield instructions, which do not yield the thread's timeslice to the OS scheduler. Under CPU contention (e.g. Docker containers with limited CPUs), the spinning threads consume all available CPU and prevent the predecessor thread from publishing its sequence, so the output stays at 0 bytes.
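A blocking alternative to the spin-wait can be sketched with std::condition_variable. The TurnGate class and the runInOrder() helper below are hypothetical names, not code from writerthread.cpp; they only show waiters sleeping in the kernel until their predecessor has published, so a predecessor that needs the CPU can get it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical turn gate: ticket N may proceed once tickets 0..N-1 have published.
class TurnGate {
public:
    // Block (sleeping, not spinning) until it is `ticket`'s turn.
    void waitForTurn(std::size_t ticket) {
        std::unique_lock<std::mutex> lock(mMutex);
        mCond.wait(lock, [&] { return mNext == ticket; });
    }

    // Publish completion of the current ticket and wake all waiters.
    void publish() {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            ++mNext;
        }
        mCond.notify_all();
    }

private:
    std::mutex mMutex;
    std::condition_variable mCond;
    std::size_t mNext = 0; // next ticket allowed to proceed
};

// Run `n` threads that each record their ticket in strict ticket order.
std::vector<std::size_t> runInOrder(std::size_t n) {
    TurnGate gate;
    std::vector<std::size_t> order;
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i) {
        // Start in reverse so later tickets tend to arrive first and must wait.
        std::size_t ticket = n - 1 - i;
        threads.emplace_back([&gate, &order, ticket] {
            gate.waitForTurn(ticket);
            order.push_back(ticket); // only the current ticket holder runs here
            gate.publish();
        });
    }
    for (auto& t : threads) t.join();
    assert(order.size() == n);
    return order;
}
```

Calling runInOrder(8) should return the tickets 0 through 7 in ascending order regardless of how the threads are scheduled. The trade-off is a mutex acquisition per hand-off, which is likely small next to compression and I/O costs, although that is not measured here.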

Bug 3: Repeated buffer reallocation

Introduced in v1.2.0 alongside the pwrite mechanism.

mCompBufSize was shared across all threads and never updated after initialization, so once the data exceeded the initial 500KB estimate every worker reallocated its compress buffer on every call.
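Per-thread capacity tracking can be sketched as follows. CompressBuffers, acquire(), and reallocCount() are hypothetical names rather than the actual writerthread.cpp members; the point is only that each slot records its own grown capacity, so a buffer that has been enlarged once is reused on later calls.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-thread compression buffers; names are illustrative.
class CompressBuffers {
private:
    struct Slot {
        std::unique_ptr<char[]> data;
        std::size_t size = 0;     // current capacity of this thread's buffer
        std::size_t reallocs = 0; // number of times the buffer has grown
    };
    std::vector<Slot> mSlots;

public:
    CompressBuffers(std::size_t threads, std::size_t initialSize) : mSlots(threads) {
        for (Slot& s : mSlots) {
            s.data.reset(new char[initialSize]);
            s.size = initialSize;
        }
    }

    // Return a buffer of at least `needed` bytes for thread `tid`.
    // Each thread must only ever pass its own `tid`, so no lock is taken.
    char* acquire(std::size_t tid, std::size_t needed) {
        assert(tid < mSlots.size());
        Slot& s = mSlots[tid];
        if (needed > s.size) {
            s.data.reset(new char[needed]);
            s.size = needed; // record the new capacity so later calls reuse it
            ++s.reallocs;
        }
        return s.data.get();
    }

    std::size_t reallocCount(std::size_t tid) const { return mSlots[tid].reallocs; }
};
```

With an initial size of 500 * 1024 bytes, repeated requests for 600 * 1024 bytes on the same slot grow the buffer once and then reuse it, whereas a size that is never updated would trigger a reallocation on every such call.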

Fix

| Bug | Fix | File |
| --- | --- | --- |
| List deadlock | `produced > consumed` replaces `nextItemReady \|\| producerFinished` | singleproducersingleconsumerlist.h |
| Pwrite livelock | std::condition_variable replaces hardware spin-wait | writerthread.cpp, writerthread.h |
| Buffer realloc | Per-thread mCompBufSizes[] replaces shared mCompBufSize | writerthread.cpp, writerthread.h |

Test plan

  • make -j8 compiles on macOS
  • PE gz output with -w 4/8/16/64 all produce correct non-zero output
  • -w 64 with 500K reads completes in 25s (previously hung indefinitely)
  • Reproduced deadlock before fix: -w 64 → 0-byte output, confirmed via sample stack traces (all workers blocked at processorTask:1029 waiting for input, both readers blocked in backpressure)
  • Verify on Linux Docker with --cpus=2 -w 8 (original reporter's environment)

🤖 Generated with Claude Code

sfchen (Member) commented Apr 8, 2026

Tested, but this PR doesn't fix #678

KimBioInfoStudio marked this pull request as draft April 8, 2026 06:35
KimBioInfoStudio force-pushed the fix/pwrite-livelock-678 branch 3 times, most recently from 7d8a087 to ca0c7b9 April 8, 2026 07:40
KimBioInfoStudio (Member, Author) commented Apr 8, 2026

@sfchen could you give it another try? You can find the binary in CI: https://github.com/OpenGene/fastp/actions/runs/24123835261?pr=679

…Gene#678)

Two bugs caused hangs with large PE gz FASTQ files:

1. Lock-free list deadlock: canBeConsumed() returned false for
   single-item lists. When thread_count > PACK_IN_MEM_LIMIT, each
   worker gets ≤1 pack before reader backpressure — deadlock.
   Fix: mark first item consumable via nextItemReady on produce().

2. Pwrite spin-wait livelock: hardware pause/yield do not yield OS
   timeslice. Under CPU contention (Docker), workers starve the
   predecessor thread. Fix: sleep_for(1μs) to yield CPU.

Also fix per-thread compress buffer tracking (mCompBufSize was shared
and never updated, causing repeated reallocation).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
KimBioInfoStudio force-pushed the fix/pwrite-livelock-678 branch from e23b8e8 to 19602ae April 8, 2026 09:46
sfchen marked this pull request as ready for review April 8, 2026 10:09
sfchen merged commit 95903bf into OpenGene:master Apr 8, 2026
2 checks passed
Development

Successfully merging this pull request may close these issues.

Performance regression in v1.3.1 vs v1.1.0 on large PE gz FASTQ with --detect_adapter_for_pe
