fix: disable lockedSynchronizers in dumpAllThreads to avoid ZGC safepoint heap scan #16195
eddieran wants to merge 2 commits into apache:3.3 from
… heap scan (apache#16194) `ThreadMXBean.dumpAllThreads(true, true)` with lockedSynchronizers=true forces the JVM to scan the entire heap at a safepoint to find all AbstractOwnableSynchronizer instances. On ZGC with large heaps (65GB+), this causes ~37-second safepoint pauses that freeze all application threads, leading to cascading thread pool exhaustion. Change lockedSynchronizers from true to false. This retains locked monitor information (derived from thread stacks, cheap) but skips the expensive heap scan. Only java.util.concurrent.locks ownership info is lost from the thread dump output. Fixes apache#16194
Context is documented in the issue and PR description. Fixes apache#16194
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff, 3.3 vs. #16195:

| Metric | 3.3 | #16195 | +/- |
|---|---|---|---|
| Coverage | 60.80% | 60.75% | -0.05% |
| Complexity | 11756 | 11750 | -6 |
| Files | 1953 | 1953 | |
| Lines | 89118 | 89118 | |
| Branches | 13444 | 13444 | |
| Hits | 54188 | 54145 | -43 |
| Misses | 29368 | 29397 | +29 |
| Partials | 5562 | 5576 | +14 |

Flags with carried forward coverage won't be shown.
What is the purpose of the change
Fixes #16194
`JVMUtil.jstack()` calls `ThreadMXBean.dumpAllThreads(true, true)`. The `lockedSynchronizers=true` parameter forces the JVM to scan the entire Java heap at a safepoint to find all `AbstractOwnableSynchronizer` instances. On ZGC with large heaps, this causes catastrophic safepoint pauses (36–39 seconds measured on a 65GB heap with ~1950 threads) that freeze the entire application.

This PR changes `lockedSynchronizers` from `true` to `false`, eliminating the heap scan.

Root Cause
On ZGC, `HeapInspection::find_instances_at_safepoint()` iterates over the entire heap, and every object reference must pass through ZGC's load barrier (color bit check → relocate → forwarding table → remap). On our 65GB heap, this resulted in a ~37-second safepoint pause. For comparison, normal ZGC safepoint operations (Mark Start, Mark End, Relocate Start) complete in 0.1–0.8ms.

The OpenJDK community has already addressed this on the tooling side (JDK-8324066: "clhsdb jstack should not scan for j.u.c locks by default"), but the programmatic API (`ThreadMXBean.dumpAllThreads`) has no such protection.

Production Impact
When `AbortPolicyWithReport` fires on ZGC with a large heap, the resulting `dumpAllThreads(true, true)` call causes a ~37s full application freeze, which in turn leads to cascading thread pool exhaustion.

Brief changelog
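The change itself is confined to the second argument of `dumpAllThreads`. As a hedged illustration only, the hypothetical `JstackSketch` below shows a jstack-style helper built on the standard `java.lang.management` API with the two flags annotated; it is not Dubbo's `JVMUtil` implementation, and its output format is not meant to match it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical stand-in for a jstack()-style utility; not Dubbo's JVMUtil.
public class JstackSketch {

    public static String jstack() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // lockedMonitors = true: monitor ownership is read from thread stacks (cheap).
        // lockedSynchronizers = false: skips the safepoint heap scan for
        // AbstractOwnableSynchronizer instances.
        ThreadInfo[] infos = mx.dumpAllThreads(true, false);
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            // Note: ThreadInfo.toString() prints only the top few stack frames.
            sb.append(info);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(jstack());
    }
}
```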
`JVMUtil.jstack()`: Change `dumpAllThreads(true, true)` to `dumpAllThreads(true, false)`.

What is lost
Only the "Locked synchronizers" section at the bottom of each thread's dump, i.e. `java.util.concurrent.locks` ownership such as `ReentrantLock` / `ReentrantReadWriteLock`. All other diagnostic info is retained:

- `synchronized` block contention (`BLOCKED on ...`)
- `synchronized` monitor ownership (`- locked ...`)

Verifying this change
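The split between retained and dropped information can also be observed directly with the standard `ThreadInfo` accessors. In the hypothetical `LockedInfoDemo` below (not a test from this PR), the current thread holds one `synchronized` monitor and one `ReentrantLock`, then dumps all threads with each `lockedSynchronizers` value. `getLockedMonitors()` should report the monitor either way, while `getLockedSynchronizers()` returns an empty array when synchronizers were not requested.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; not part of Dubbo.
public class LockedInfoDemo {
    private static final Object MONITOR = new Object();
    private static final ReentrantLock LOCK = new ReentrantLock();

    /** Returns {monitorVisible, reentrantLockVisible} for the current thread. */
    public static boolean[] inspect(boolean lockedSynchronizers) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long selfId = Thread.currentThread().getId();
        boolean monitorVisible = false;
        boolean lockVisible = false;
        synchronized (MONITOR) {
            LOCK.lock();
            try {
                for (ThreadInfo info : mx.dumpAllThreads(true, lockedSynchronizers)) {
                    if (info.getThreadId() != selfId) {
                        continue;
                    }
                    // Derived from the thread's stack frames: present with either flag value.
                    for (MonitorInfo mi : info.getLockedMonitors()) {
                        if (mi.getIdentityHashCode() == System.identityHashCode(MONITOR)) {
                            monitorVisible = true;
                        }
                    }
                    // Empty array when lockedSynchronizers == false.
                    for (LockInfo li : info.getLockedSynchronizers()) {
                        if (li.getClassName().startsWith("java.util.concurrent.locks.ReentrantLock")) {
                            lockVisible = true;
                        }
                    }
                }
            } finally {
                LOCK.unlock();
            }
        }
        return new boolean[] {monitorVisible, lockVisible};
    }

    public static void main(String[] args) {
        boolean[] before = inspect(true);  // old call: dumpAllThreads(true, true)
        boolean[] after = inspect(false);  // new call: dumpAllThreads(true, false)
        System.out.printf("before: monitor=%b lock=%b%n", before[0], before[1]);
        System.out.printf("after:  monitor=%b lock=%b%n", after[0], after[1]);
    }
}
```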
Existing tests pass: all tests in `AbortPolicyWithReportTest` mock the `jstack()` method and are not affected by the parameter change.

The fix can be verified by:
AbortPolicyWithReport
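For a rough local comparison of the two flag settings, a timing harness like the hypothetical `DumpTimingSketch` below can be used. It measures the caller's wall-clock time, which includes the safepoint pause; running the JVM with `-Xlog:safepoint` (JDK 9+) shows the pause itself. On a small heap both calls will likely finish quickly: the multi-second gap described above was observed on a 65GB ZGC heap, so results on other heap sizes and collectors will differ.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical timing harness; not part of Dubbo.
public class DumpTimingSketch {

    /** Wall-clock duration, in milliseconds, of one dumpAllThreads call. */
    public static long timeDumpMillis(boolean lockedSynchronizers) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long start = System.nanoTime();
        mx.dumpAllThreads(true, lockedSynchronizers);
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Warm up once so class loading is not counted in the timings below.
        timeDumpMillis(false);
        System.out.println("dumpAllThreads(true, true):  " + timeDumpMillis(true) + " ms");
        System.out.println("dumpAllThreads(true, false): " + timeDumpMillis(false) + " ms");
    }
}
```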