Skip to content

Support jdk.CPUTimeSample event in ProfileResults for Java 25+ (JEP 509)#15983

Merged
mikemccand merged 10 commits intoapache:mainfrom
iprithv:fix/profile-results-cputime-sample
Apr 29, 2026
Merged

Support jdk.CPUTimeSample event in ProfileResults for Java 25+ (JEP 509)#15983
mikemccand merged 10 commits intoapache:mainfrom
iprithv:fix/profile-results-cputime-sample

Conversation

@iprithv
Copy link
Copy Markdown
Contributor

@iprithv iprithv commented Apr 24, 2026

Description

Java 25 added CPU-time profiling (JEP 509) which emits jdk.CPUTimeSample
events. ProfileResults did not recognize this event type, causing empty
profiling reports when the CPU-time profiler is active.

Resolves #15926

@github-actions github-actions Bot added this to the 11.0.0 milestone Apr 24, 2026
Copy link
Copy Markdown
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa, thank you for picking this up @iprithv -- I left some comments about maybe being less lenient about two sampling methods used a the same time.

// needed. When available, we skip legacy execution samples to avoid double-counting.
return name.equals("jdk.CPUTimeSample");
}
return (name.equals("jdk.ExecutionSample") || name.equals("jdk.NativeMethodSample"))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think NativeMethodSample is also a dup, if CPUTime and wall-clock sampling are both enabled?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I expanded the comment to say that explicitly so it matches the code. Thanks!

Comment thread gradle/testing/profiling.jfc Outdated
<setting name="period">1 ms</setting>
</event>

<event name="jdk.CPUTimeSample">
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we turn off the old (wall-clock) sampling (jdk.ExecutionSample, jdk.NativeMethodSample) and just fully commit to CPUTime. In nightly benchmarks we've found the 1 msec sampling to add non-trivial overhead (I think ~5-7% slower).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1msec sampling was necessary for tests in order to find places wasting CPU. most tests just don't run that long.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done,

  1. Added gradle/testing/profiling.linux.jfc: legacy samplers off, jdk.CPUTimeSample on with a 1 ms throttle so short chunks still get dense enough samples.
  2. Left gradle/testing/profiling.jfc for non-Linux with 1 ms ExecutionSample / NativeMethodSample.
  3. CodeProfilingPlugin loads the Linux file when os.name contains linux, otherwise profiling.jfc.

Thanks both!


// Pre-scan to detect if CPU-time samples (Java 25+, JEP 509) are available.
// If so, prefer them over legacy execution samples to avoid double-counting.
boolean hasCPUTimeSamples = "cpu".equals(mode) && detectCPUTimeSamples(files);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this stricter? Throw an exception if we see a mix of CPUTime and legacy events? Lucene (and in general anyone profiling, hmm except maybe profiler debuggers heh) should only have one sampler enabled at a time?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially added the strict mixed-sampler check here, but removed it after @rmuir follow up review since the two Lucene JFC files are now mutually exclusive.

So the exclusivity now lives in the selected JFC config instead of a pre-scan in ProfileResults. Thanks!

* Pre-scan recording files to detect if any jdk.CPUTimeSample events are present. When they are,
* we prefer them over legacy jdk.ExecutionSample/jdk.NativeMethodSample to avoid double-counting.
*/
static boolean detectCPUTimeSamples(List<String> files) throws IOException {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this logic, should we just have profiling.linux.jfc that is loaded for linux, and profiling.jfc used for others?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done. Linux uses gradle/testing/profiling.linux.jfc, everyone else gradle/testing/profiling.jfc, selected in CodeProfilingPlugin from the host OS. Thanks!

iprithv added 2 commits April 28, 2026 02:44
Signed-off-by: prithvi <prithvisivasankar@gmail.com>
@iprithv iprithv requested review from mikemccand and rmuir April 27, 2026 21:30
String name = event.getEventType().getName();
switch (mode) {
case "cpu":
if (hasCPUTimeSamples) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this boolean parameter nor the if block.

Just something like:

return name.equals("jdk.CPUTimeSample") || 
       name.equals("jdk.ExecutionSample") ||
       name.equals("jdk.NativeMethodSample");

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, updated. thanks!

Comment thread gradle/testing/profiling.jfc Outdated
<!--
Collects only execution and method samples at a low interval
Wall-clock execution samples for CPU profiling on platforms where JEP 509 CPU-time sampling
is unavailable (non-Linux). On Linux, CodeProfilingPlugin uses profiling.linux.jfc instead.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a profiling.linux.jfc in your PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, missed it in adding in previous commit. Thanks!

* wall-clock samples ({@code jdk.ExecutionSample} / {@code jdk.NativeMethodSample}); only one
* CPU sampler should be enabled in JFR settings.
*/
static boolean resolveCpuTimeSampling(List<String> files) throws IOException {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the whole function as the two files are mutually exclusive. This is very verbose and not needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, done. thanks!

}

// Pre-scan recordings: decide CPU-time vs legacy sampling and reject mixed configurations.
boolean hasCPUTimeSamples = "cpu".equals(mode) && resolveCpuTimeSampling(files);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nuke this one too

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, removed this too. thanks!

Signed-off-by: prithvi <prithvisivasankar@gmail.com>
@iprithv iprithv requested a review from rmuir April 28, 2026 06:49
Copy link
Copy Markdown
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the all the iterations @iprithv -- I think this is close!

/**
* JEP 509 CPU-time sampling ({@code jdk.CPUTimeSample}) is implemented on Linux. Use a Linux-only
* JFR template with CPU-time sampling only; other hosts keep wall-clock execution sampling for
* short tests.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, it's for all tests, not just short tests right? Maybe just remove for short tests?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done. Updated. Thanks!

Provider<Boolean> frametypesOption) {}

/**
* JEP 509 CPU-time sampling ({@code jdk.CPUTimeSample}) is implemented on Linux. Use a Linux-only
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe is currently (Java 25) implemented only on Linux, so we use a Linux-only JFR template...?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, updated. Thanks!

*/
private static String profilingSettingsPath() {
String os = System.getProperty("os.name", "").toLowerCase(Locale.ROOT);
return os.contains("linux") ? "testing/profiling.linux.jfc" : "testing/profiling.jfc";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, instead of baking linux in the name, could we use cputime? Maybe in the future other OS's can support CPU time sampling too...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, makes sense. Thanks!

Comment thread gradle/testing/profiling.cputime.jfc Outdated
-->
<configuration version="2.0" label="TestProfilingLinux" description="CPU-time sampling for unit tests (Linux)" provider="Apache">
<event name="jdk.ExecutionSample">
<setting name="enabled">false</setting>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, are these events enabled by default, so we have to have these entries (jdk.ExecutionSample and jdk.NativeMethodSample) to explicitly then disable them?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you are correct, I don't believe it is strictly necessary to have them in this file with enabled: false.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the explicit disabled entries. The CPU time template now only enables jdk.CPUTimeSample plus the allocation events used by heap profiling. Thanks!

Comment thread lucene/CHANGES.txt Outdated

Build
---------------------
* GITHUB#15926: Support jdk.CPUTimeSample event (Java 25+, JEP 509) in ProfileResults.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we word a bit more user-friendly-y? Maybe Use the new (Java 25+, JEP 509) CPU time sampling profiler when available (currently just on Linux), and fall back to the legacy wall-clock...?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, done. thanks!

iprithv added 3 commits April 28, 2026 20:03
Signed-off-by: prithvi <prithvisivasankar@gmail.com>
Signed-off-by: prithvi <prithvisivasankar@gmail.com>
@iprithv iprithv requested a review from mikemccand April 28, 2026 14:49
Copy link
Copy Markdown
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I'll merge & backport! Thank you @iprithv and @rmuir.

This should also fix mikemccand/luceneutil#562, so we can finally see CPUTime profiler JFRs from nightly benchy and on blunders.io! (Oooh I think blunders.io flame charts are already working w/ CPUTime sampling).

@mikemccand mikemccand merged commit dd94a52 into apache:main Apr 29, 2026
13 checks passed
@mikemccand
Copy link
Copy Markdown
Member

Hmm, there is some pre-existing issue that causes gradle not to print the JFR stack traces. Instead it says this:

> Task :showJfrProfileSummary
One of the test tasks failed, skipping.

Yet tests were successful, nothing seems to have failed.

This is what I'm running:

./gradlew -p lucene/core test -Ptests.profile=true -Ptests.profile.stacksize=8 -Ptests.profile.linenumbers=true

Somehow in CodeProfilingPlugin.java this part is false(?)-failing:

        if (getShouldRunConditions().get().stream().anyMatch(value -> value == false)) {
          getLogger().lifecycle("One of the test tasks failed, skipping.");
          return;
        }

I haven't debugged any further yet ... if I disable that check then I get some profiler output but I don't think it's counting CPUTime events ... not sure:

Aggregated summary of 1 jfr profile.
PROFILE SUMMARY from 105485 events (total: 105485)
  tests.profile.mode=cpu
  tests.profile.count=10
  tests.profile.stacksize=8
  tests.profile.linenumbers=true
PERCENT       CPU SAMPLES   STACK
0.47%         498           sun.nio.ch.UnixFileDispatcherImpl#pread0() [Native code]
                              at sun.nio.ch.UnixFileDispatcherImpl#pread():57 [JIT compiled code]
                              at sun.nio.ch.IOUtil#readIntoNativeBuffer():339 [JIT compiled code]
                              at sun.nio.ch.IOUtil#read():307 [JIT compiled code]
                              at sun.nio.ch.IOUtil#read():284 [Inlined code]
                              at sun.nio.ch.FileChannelImpl#readInternal():1198 [JIT compiled code]
                              at sun.nio.ch.FileChannelImpl#implRead():1170 [JIT compiled code]
                              at sun.nio.ch.FileChannelImpl#read():1152 [JIT compiled code]
0.25%         265           org.apache.lucene.util.ByteBlockPool$DirectTrackingAllocator#getByteBlock():97 [JIT compiled cod\
e]
                              at org.apache.lucene.util.ByteBlockPool#nextBuffer():196 [JIT compiled code]
                              at org.apache.lucene.index.TermsHashPerField#initStreamSlices():161 [JIT compiled code]
                              at org.apache.lucene.index.TermsHashPerField#add():139 [Inlined code]
                              at org.apache.lucene.index.TermsHashPerField#add():207 [JIT compiled code]
                              at org.apache.lucene.index.IndexingChain$PerField#invertTokenStream():1365 [JIT compiled code]
                              at org.apache.lucene.index.IndexingChain$PerField#invert():1256 [JIT compiled code]
                              at org.apache.lucene.index.IndexingChain#processField():773 [JIT compiled code]

I'm also baffled by the total event count (105485) ... it doesn't match what jfr summary says for jdk.CPUTimeSample:

base) raptorlake:trunk[main]$ jfr summary ./lucene/core/build/tests-cwd/hotspot-pid-3877240-id-1-2026_04_29_12_55_47.jfr

 Version: 2.1
 Chunks: 7
 Start: 2026-04-29 16:53:42 (UTC)
 Duration: 125 s

 Event Type                              Count  Size (bytes)
=============================================================
 jdk.ObjectAllocationInNewTLAB          164932       3167800
 jdk.CPUTimeSample                      111550       1964718
 jdk.ObjectAllocationOutsideTLAB         22485        386121
 org.junit.TestExecution                  8673        671942
 jdk.Checkpoint                           7698      81389727

@iprithv
Copy link
Copy Markdown
Contributor Author

iprithv commented Apr 29, 2026

@mikemccand getDidWork() was the wrong check here. It can be false for skipped/up-to-date/not-selected test tasks even when nothing failed. I changed this to check task.getState().getFailure() == null instead here

On the count mismatch, I think ProfileResults is counting CPUTime samples that have a non-null stack trace, while jfr summary counts all jdk.CPUTimeSample events. The JFR summary has no ExecutionSample / NativeMethodSample, so this output should be coming from CPUTime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The build/test tool ProfileResults that aggregates JFRs and prints the hot stacks doesn't work with new CPUTime profiler in Java 25

3 participants