fix: pwrite spin-wait livelock on CPU-constrained systems (#678) · PR #679
Merged
sfchen merged 1 commit into OpenGene:master on Apr 8, 2026
Conversation
Member: Tested, but this PR doesn't fix #678
Force-pushed from 7d8a087 to ca0c7b9
Member (Author): @sfchen could you have a try? You can find the binary in CI: https://github.com/OpenGene/fastp/actions/runs/24123835261?pr=679
Force-pushed from a6c4d9c to e23b8e8
Commit message (…Gene#678):

Two bugs caused hangs with large PE gz FASTQ files:

1. Lock-free list deadlock: `canBeConsumed()` returned false for single-item lists. When thread_count > PACK_IN_MEM_LIMIT, each worker gets ≤1 pack before reader backpressure — deadlock. Fix: mark the first item consumable via `nextItemReady` on `produce()`.
2. Pwrite spin-wait livelock: hardware pause/yield do not yield the OS timeslice. Under CPU contention (Docker), workers starve the predecessor thread. Fix: `sleep_for(1μs)` to yield CPU.

Also fix per-thread compress buffer tracking (`mCompBufSize` was shared and never updated, causing repeated reallocation).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from e23b8e8 to 19602ae
Summary
Fixes #678 — regression introduced in v1.2.0 where large PE gz FASTQ files hang indefinitely with high thread counts.
`--detect_adapter_for_pe` is not related to this bug: the reporter happened to use it, but the deadlock is in the pack distribution and pwrite synchronization.

Root Cause
Bug 1: Lock-free list deadlock (primary)
Introduced in v1.2.0, when `PACK_SIZE` was increased from 256→1000 and `PACK_IN_MEM_LIMIT` was reduced from 128→32.

`SingleProducerSingleConsumerList::canBeConsumed()` returns `false` for a single-item list, because `nextItemReady=false` and `producerFinished=false`. When `thread_count > PACK_IN_MEM_LIMIT` (e.g. `-w 64` with a limit of 32), each worker gets at most 1 pack before reader backpressure kicks in. Workers cannot consume their only pack → `mPackProcessedCounter` never advances → readers stay blocked → deadlock.
Bug 2: Pwrite spin-wait livelock

Introduced in v1.2.0 with the parallel libdeflate gzip compression + pwrite mechanism.
The pwrite offset ring used hardware `pause`/`yield` instructions, which do not yield to the OS scheduler. Under CPU contention (e.g. Docker containers with limited CPUs), the spinning threads consume all available CPU, preventing the predecessor thread from ever publishing its sequence number, so output stays at 0 bytes.
Bug 3: Repeated buffer reallocation

Introduced in v1.2.0 alongside the pwrite mechanism.

`mCompBufSize` was shared across all threads and never updated after init, causing every worker to re-allocate its compress buffer on every call once the data exceeded the initial 500KB estimate.
Fix

| Change | Files |
| --- | --- |
| `produced > consumed` replaces `nextItemReady \|\| producerFinished` | `singleproducersingleconsumerlist.h` |
| `std::condition_variable` replaces the hardware spin-wait | `writerthread.cpp`, `writerthread.h` |
| Per-thread `mCompBufSizes[]` replaces the shared `mCompBufSize` | `writerthread.cpp`, `writerthread.h` |

Test plan
- `make -j8` compiles on macOS
- `-w 4/8/16/64` all produce correct non-zero output
- `-w 64` with 500K reads completes in 25s (previously hung indefinitely)
- Reproduced the bug: `-w 64` → 0-byte output, confirmed via `sample` stack traces (all workers blocked at `processorTask:1029` waiting for input, both readers blocked in backpressure)
- `--cpus=2 -w 8` (original reporter's environment)

🤖 Generated with Claude Code