@@ -101,10 +101,16 @@ public static void main(String[] args) throws IOException {
}

/** true if we care about this event */
static boolean isInteresting(String mode, RecordedEvent event) {
static boolean isInteresting(String mode, RecordedEvent event, boolean hasCPUTimeSamples) {
String name = event.getEventType().getName();
switch (mode) {
case "cpu":
if (hasCPUTimeSamples) {
Member

I don't think we need this boolean parameter or the if block.

Just something like:

return name.equals("jdk.CPUTimeSample") || 
       name.equals("jdk.ExecutionSample") ||
       name.equals("jdk.NativeMethodSample");

Contributor Author

Sure, updated. Thanks!

// Prefer jdk.CPUTimeSample (Java 25+, JEP 509): it samples by CPU time, not wall-clock
// time, so idle threads (like the Gradle epoll thread) are inherently excluded and need no
// filtering. When it is available, we skip legacy execution samples to avoid double-counting.
return name.equals("jdk.CPUTimeSample");
}
return (name.equals("jdk.ExecutionSample") || name.equals("jdk.NativeMethodSample"))
Member

I think NativeMethodSample is also a dup, if CPUTime and wall-clock sampling are both enabled?

Contributor Author

Yes, I expanded the comment to say that explicitly so it matches the code. Thanks!

&& !isGradlePollThread(event.getThread("sampledThread"));
case "heap":
@@ -121,6 +127,24 @@ static boolean isGradlePollThread(RecordedThread thread) {
return (thread != null && thread.getJavaName().startsWith("/127.0.0.1"));
}
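The first review thread suggests matching on event names alone and relying on the JFC to enable only one sampler. A minimal sketch of that rule, using a hypothetical `CpuEventFilter.isCpuSample(String)` helper (not part of ProfileResults) so it can be tried without a `.jfr` file. In ProfileResults the argument would be `event.getEventType().getName()`; the existing `isGradlePollThread` filter is not modeled here, and this rule would double-count if both samplers were ever enabled together.

```java
import java.util.Set;

/** Hypothetical helper: the name-only "cpu" filter suggested in review. */
final class CpuEventFilter {
  // Any of these counts as a CPU sample. The JFC is assumed to enable only one
  // sampler, so no cross-event deduplication is attempted here.
  private static final Set<String> CPU_SAMPLE_EVENTS =
      Set.of("jdk.CPUTimeSample", "jdk.ExecutionSample", "jdk.NativeMethodSample");

  private CpuEventFilter() {}

  static boolean isCpuSample(String eventName) {
    return CPU_SAMPLE_EVENTS.contains(eventName);
  }

  public static void main(String[] args) {
    System.out.println(isCpuSample("jdk.CPUTimeSample")); // true
    System.out.println(isCpuSample("jdk.GarbageCollection")); // false
  }
}
```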

/**
* Pre-scan recording files to detect if any jdk.CPUTimeSample events are present. When they are,
* we prefer them over legacy jdk.ExecutionSample/jdk.NativeMethodSample to avoid double-counting.
*/
static boolean detectCPUTimeSamples(List<String> files) throws IOException {
Member

Instead of this logic, should we just have profiling.linux.jfc that is loaded for Linux, and profiling.jfc used for others?

Contributor Author

Yes, done. Linux uses gradle/testing/profiling.linux.jfc, everyone else gradle/testing/profiling.jfc, selected in CodeProfilingPlugin from the host OS. Thanks!

for (String file : files) {
try (RecordingFile recording = new RecordingFile(Paths.get(file))) {
while (recording.hasMoreEvents()) {
RecordedEvent event = recording.readEvent();
if (event.getEventType().getName().equals("jdk.CPUTimeSample")) {
return true;
}
}
}
}
return false;
}
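The review thread on `detectCPUTimeSamples` replaced the pre-scan with a per-OS choice of JFC file made in CodeProfilingPlugin. The plugin's actual code is not shown here; this is a minimal sketch of the selection rule in plain Java, where `JfcSelector` and `selectJfc` are hypothetical names and the two paths are the ones named in the author's reply.

```java
import java.util.Locale;

/** Hypothetical sketch of picking a JFC file from the host OS name. */
final class JfcSelector {
  static final String LINUX_JFC = "gradle/testing/profiling.linux.jfc";
  static final String DEFAULT_JFC = "gradle/testing/profiling.jfc";

  private JfcSelector() {}

  /** Linux gets the CPU-time-only config; every other OS keeps the legacy samplers. */
  static String selectJfc(String osName) {
    // Locale.ROOT keeps lowercasing stable regardless of the default locale.
    return osName.toLowerCase(Locale.ROOT).contains("linux") ? LINUX_JFC : DEFAULT_JFC;
  }

  public static void main(String[] args) {
    System.out.println(selectJfc(System.getProperty("os.name")));
  }
}
```

Choosing the file up front keeps ProfileResults free of sampler-detection logic: each recording contains only the events its JFC enabled.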

/** value we accumulate for this event */
static long getValue(RecordedEvent event) {
switch (event.getEventType().getName()) {
@@ -132,6 +156,8 @@ static long getValue(RecordedEvent event) {
return 1L;
case "jdk.NativeMethodSample":
return 1L;
case "jdk.CPUTimeSample":
return 1L;
default:
throw new UnsupportedOperationException(event.toString());
}
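Because `getValue` weights every sample as 1, a recording with both the CPU-time sampler and the legacy samplers enabled would count the same hot code roughly twice. The two-step rule in this revision (pre-scan with `detectCPUTimeSamples`, then filter in `isInteresting`) can be sketched over plain event names. `SamplePreference.countCpuSamples` is a hypothetical helper, not the PR's code, and the Gradle poll-thread filter is omitted.

```java
import java.util.List;

/** Hypothetical sketch of the "prefer jdk.CPUTimeSample when present" rule. */
final class SamplePreference {
  private SamplePreference() {}

  static long countCpuSamples(List<String> eventNames) {
    // Pass 1 (like detectCPUTimeSamples): is any CPU-time sample present?
    boolean hasCpuTime = eventNames.contains("jdk.CPUTimeSample");
    // Pass 2 (like isInteresting with hasCPUTimeSamples): count one sampler's events only.
    long count = 0;
    for (String name : eventNames) {
      boolean interesting =
          hasCpuTime
              ? name.equals("jdk.CPUTimeSample")
              : name.equals("jdk.ExecutionSample") || name.equals("jdk.NativeMethodSample");
      if (interesting) {
        count += 1; // getValue returns 1L for each of these event types
      }
    }
    return count;
  }

  public static void main(String[] args) {
    List<String> mixed =
        List.of("jdk.ExecutionSample", "jdk.CPUTimeSample", "jdk.NativeMethodSample", "jdk.CPUTimeSample");
    System.out.println(countCpuSamples(mixed)); // 2: legacy samples are skipped
    List<String> legacyOnly =
        List.of("jdk.ExecutionSample", "jdk.NativeMethodSample", "jdk.GarbageCollection");
    System.out.println(countCpuSamples(legacyOnly)); // 2: falls back to the legacy samplers
  }
}
```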
@@ -173,6 +199,11 @@ public static void printReport(
if (count < 1) {
throw new IllegalArgumentException("tests.profile.count must be positive");
}

// Pre-scan to detect if CPU-time samples (Java 25+, JEP 509) are available.
// If so, prefer them over legacy execution samples to avoid double-counting.
boolean hasCPUTimeSamples = "cpu".equals(mode) && detectCPUTimeSamples(files);
Member

Could we make this stricter? Throw an exception if we see a mix of CPUTime and legacy events? Lucene (and in general anyone profiling, hmm except maybe profiler debuggers heh) should only have one sampler enabled at a time?

Contributor Author

I initially added the strict mixed-sampler check here, but removed it after @rmuir's follow-up review since the two Lucene JFC files are now mutually exclusive.

So the exclusivity now lives in the selected JFC config instead of a pre-scan in ProfileResults. Thanks!
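For reference, the stricter check the reviewer proposed (and the author later dropped in favor of mutually exclusive JFC files) could look roughly like this. It is a hypothetical sketch, not the code that was briefly in the PR; `SamplerExclusivity.checkExclusive` works on event names so it can run without a recording.

```java
import java.util.List;

/** Hypothetical sketch: fail fast if CPU-time and legacy samples are mixed. */
final class SamplerExclusivity {
  private SamplerExclusivity() {}

  static void checkExclusive(Iterable<String> eventNames) {
    boolean sawCpuTime = false;
    boolean sawLegacy = false;
    for (String name : eventNames) {
      if (name.equals("jdk.CPUTimeSample")) {
        sawCpuTime = true;
      } else if (name.equals("jdk.ExecutionSample") || name.equals("jdk.NativeMethodSample")) {
        sawLegacy = true;
      }
      if (sawCpuTime && sawLegacy) {
        throw new IllegalStateException(
            "recording mixes jdk.CPUTimeSample with legacy execution/native samples;"
                + " enable only one sampler in the JFC");
      }
    }
  }

  public static void main(String[] args) {
    checkExclusive(List.of("jdk.CPUTimeSample", "jdk.CPUTimeSample")); // passes
    try {
      checkExclusive(List.of("jdk.ExecutionSample", "jdk.CPUTimeSample"));
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

In ProfileResults the names would come from `event.getEventType().getName()` while iterating a `RecordingFile`.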


Map<String, SimpleEntry<String, Long>> histogram = new HashMap<>();
int totalEvents = 0;
long sumValues = 0;
@@ -181,7 +212,7 @@ public static void printReport(
try (RecordingFile recording = new RecordingFile(Paths.get(file))) {
while (recording.hasMoreEvents()) {
RecordedEvent event = recording.readEvent();
if (!isInteresting(mode, event)) {
if (!isInteresting(mode, event, hasCPUTimeSamples)) {
continue;
}
RecordedStackTrace trace = event.getStackTrace();
5 changes: 5 additions & 0 deletions gradle/testing/profiling.jfc
@@ -29,6 +29,11 @@ Collects only execution and method samples at a low interval
<setting name="period">1 ms</setting>
</event>

<event name="jdk.CPUTimeSample">
Member

Could we turn off the old (wall-clock) sampling (jdk.ExecutionSample, jdk.NativeMethodSample) and just fully commit to CPUTime? In nightly benchmarks we've found the 1 msec sampling to add non-trivial overhead (I think ~5-7% slower).

Member

The 1 msec sampling was necessary for tests in order to find places wasting CPU. Most tests just don't run that long.

Contributor Author

Done:

  1. Added gradle/testing/profiling.linux.jfc: legacy samplers off, jdk.CPUTimeSample on with a 1 ms throttle so short chunks still get dense enough samples.
  2. Left gradle/testing/profiling.jfc for non-Linux with 1 ms ExecutionSample / NativeMethodSample.
  3. CodeProfilingPlugin loads the Linux file when os.name contains linux, otherwise profiling.jfc.

Thanks both!

<setting name="enabled">true</setting>
<setting name="throttle">10 ms</setting>
</event>

<event name="jdk.ObjectAllocationInNewTLAB">
<setting name="enabled">true</setting>
<setting name="stackTrace">true</setting>
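The Linux configuration described in the author's reply (legacy samplers off, `jdk.CPUTimeSample` on with a 1 ms throttle) can also be expressed as a JFR settings map, which may help when experimenting outside Gradle. A minimal sketch assuming JFR's `eventName#settingName` key convention and the `jdk.jfr.Recording(Map)` constructor (JDK 11+); `LinuxSamplerSettings` is a hypothetical name, and `jdk.CPUTimeSample` only produces events on JDK 25+ on Linux (JEP 509).

```java
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.Recording;

/** Hypothetical sketch: the Linux sampler settings as a JFR settings map. */
final class LinuxSamplerSettings {
  private LinuxSamplerSettings() {}

  static Map<String, String> settings(String throttle) {
    Map<String, String> s = new HashMap<>();
    s.put("jdk.CPUTimeSample#enabled", "true");
    s.put("jdk.CPUTimeSample#throttle", throttle); // "1 ms" per the Linux JFC description
    s.put("jdk.ExecutionSample#enabled", "false"); // legacy wall-clock sampler off
    s.put("jdk.NativeMethodSample#enabled", "false"); // legacy native sampler off
    return s;
  }

  public static void main(String[] args) throws Exception {
    try (Recording recording = new Recording(settings("1 ms"))) {
      recording.start();
      long sum = 0;
      for (int i = 0; i < 50_000_000; i++) {
        sum += i % 7; // some CPU work for the sampler to observe
      }
      recording.stop();
      System.out.println("sum=" + sum + ", settings=" + recording.getSettings());
    }
  }
}
```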
5 changes: 5 additions & 0 deletions lucene/CHANGES.txt
@@ -185,6 +185,11 @@ Changes in Runtime Behavior

Build
---------------------
* GITHUB#15926: Support jdk.CPUTimeSample event (Java 25+, JEP 509) in ProfileResults.
Member

Can we word this a bit more user-friendly? Maybe "Use the new (Java 25+, JEP 509) CPU time sampling profiler when available (currently just on Linux), and fall back to the legacy wall-clock..."?

Contributor Author

Sure, done. Thanks!

CPU-time profiling produces more accurate profiles by sampling threads according to the CPU time
they consume instead of wall-clock time. When CPUTimeSample events are present, they are preferred
over legacy ExecutionSample/NativeMethodSample events to avoid double-counting. (Prithvi S)

* GITHUB#15327: New low-level build options to detect abuse of LuceneTestCase.random():
tests.random.maxacquires and tests.random.maxcalls (Robert Muir, Dawid Weiss)
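The changelog's point about CPU time versus wall-clock time is easy to see for a thread that mostly waits, like the Gradle epoll thread mentioned in the ProfileResults comment: its wall-clock time keeps growing while its CPU time barely moves, so a CPU-time sampler has almost nothing to attribute to it. A small illustration with the standard `ThreadMXBean` API; it measures per-thread CPU time and is not how JFR's sampler is implemented. `CpuVsWallClock` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Illustration: a sleeping thread accrues wall-clock time but almost no CPU time. */
final class CpuVsWallClock {
  private CpuVsWallClock() {}

  /** Returns {wallNanos, cpuNanos} while the current thread sleeps; cpuNanos is -1 if unsupported. */
  static long[] measureSleep(long millis) throws InterruptedException {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported();
    long wallStart = System.nanoTime();
    long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : -1;
    Thread.sleep(millis); // blocked, not running on a CPU
    long wall = System.nanoTime() - wallStart;
    long cpu = cpuSupported ? threads.getCurrentThreadCpuTime() - cpuStart : -1;
    return new long[] {wall, cpu};
  }

  public static void main(String[] args) throws InterruptedException {
    long[] t = measureSleep(200);
    System.out.printf("wall=%d ms, cpu=%d ms%n", t[0] / 1_000_000, t[1] / 1_000_000);
  }
}
```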
