Add receiver-side IPC latency nanomark (Image::poll / Subscription::poll) #110

Open
benkier0 wants to merge 1 commit into aeron-io:master from benkier0:receiver-ipc-nanomark

@benkier0

Background

All four existing C++ IPC benchmarks (AeronIpcBenchmark, AeronExclusiveIpcBenchmark, and their Nanomark variants) measure the publisher side only: they time the round trip sendBurst() + awaitConfirm() from the sender's perspective. There is no benchmark for the receiver hot path, that is, how long Image::poll() or Subscription::poll() takes to deliver a message, measured from the moment that message was intended to be sent.

This was raised in discussion #2027 with @pveentjer, who approved the approach described here.

The gap

Without an intended-start-time measurement, a naive receiver benchmark suffers from coordinated omission: if the subscriber falls behind, it eventually drains the backlog rapidly and records low latencies, hiding the real delay experienced by each message. p99 and p999 become meaningless.
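To make the effect concrete, here is a minimal simulation sketch. It does not use Aeron at all: `simulateStall`, `LatencySample`, and all parameters are hypothetical names chosen for illustration, and the model (a fixed transit time plus a single window during which nothing can be sent) is a simplifying assumption. It computes each message's latency twice, once from the actual send time and once from the intended send time.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Latency of one message, computed two ways (illustrative type, not from the PR).
struct LatencySample
{
    std::int64_t naiveNs;    // receive time minus ACTUAL send time
    std::int64_t intendedNs; // receive time minus INTENDED send time
};

// Simulates a sender scheduled at a fixed interval. Nothing can be sent during
// [stallStartNs, stallEndNs); blocked messages go out back-to-back afterwards.
inline std::vector<LatencySample> simulateStall(
    int messageCount,
    std::int64_t intervalNs,
    std::int64_t transitNs,
    std::int64_t stallStartNs,
    std::int64_t stallEndNs)
{
    std::vector<LatencySample> samples;
    std::int64_t lastSendNs = 0;

    for (int i = 0; i < messageCount; i++)
    {
        const std::int64_t intendedSendNs = static_cast<std::int64_t>(i) * intervalNs;

        // A message cannot go out before its schedule or before its predecessor.
        std::int64_t actualSendNs = std::max(lastSendNs, intendedSendNs);
        if (actualSendNs >= stallStartNs && actualSendNs < stallEndNs)
        {
            actualSendNs = stallEndNs; // blocked until the stall ends
        }

        const std::int64_t receiveNs = actualSendNs + transitNs;
        samples.push_back({receiveNs - actualSendNs, receiveNs - intendedSendNs});
        lastSendNs = actualSendNs;
    }

    return samples;
}
```

With 10 messages at a 100 ns interval, 50 ns transit, and a stall over [250, 650), the naive column reads 50 ns for every message, while the intended-time column peaks at 400 ns for the message scheduled at 300 ns. The stall is invisible to the naive measurement.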

Approach

Fixed-rate publishing with intended-start-time latency measurement, following the same pattern as LoadTestRig.send:

  • A background publisher thread maintains an intendedSendNs counter that advances by exactly NANOS_PER_SECOND / messageRatePerSec per message regardless of actual send time.
  • It busy-waits until nanoClock() >= intendedSendNs, then writes intendedSendNs (not the actual clock at send time) into the 8-byte message payload via tryClaim + putInt64.
  • The fragment handler on the polling side records latencyNs = nanoClock() - intendedSendNs.

If the publisher falls behind (e.g. back-pressure), the intended time still advances at the target rate. The receiver therefore records the full queuing delay, not just transit time. This makes p99/p999 meaningful.

Implementation details

File added: benchmarks-aeron/src/main/cpp/AeronIpcReceiverNanomark.cpp

Two Nanomark registrations share a single SharedState (one embedded media driver, one publication, one subscription on STREAM_ID 12):

  • AeronIpcReceiverNanomark::imagePoll measures Image::poll(handler, FRAGMENT_LIMIT)
  • AeronIpcReceiverNanomark::subscriptionPoll measures Subscription::poll(handler, FRAGMENT_LIMIT)

recordRun() is overridden as a no-op so the Nanomark framework's own wall-clock timing does not pollute the HDR histogram; all hdr_record_value calls happen inside the fragment handler.

Warmup: perThreadSetUp calls hdr_reset(m_histograms[id]) at repetition == 1, discarding rep 0 (warmup) data. main() runs 6 repetitions (1 warmup + 5 measurement). Rep 1/6 is visibly labeled in the teardown output so the warmup boundary is clear.
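The warmup-discard rule can be sketched as follows. This is only an illustration: `LatencyRecorder` is a hypothetical stand-in for an HDR histogram (its `reset()` plays the role of hdr_reset), and `setUpForRepetition` is an assumed name that mirrors the check described above rather than the actual Nanomark perThreadSetUp signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an HDR histogram; only what the sketch needs.
struct LatencyRecorder
{
    std::vector<std::int64_t> values;

    void record(std::int64_t latencyNs) { values.push_back(latencyNs); }
    void reset() { values.clear(); }
};

// Called before each repetition. Repetition 0 is warmup, so its samples are
// dropped once, at the start of repetition 1; later repetitions accumulate.
inline void setUpForRepetition(LatencyRecorder &recorder, std::size_t repetition)
{
    if (repetition == 1)
    {
        recorder.reset();
    }
}
```

Resetting at the start of repetition 1, rather than skipping recording during repetition 0, keeps the fragment handler free of any warmup branch.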

Configurable rate: argv[1] sets the message rate (messages/sec); defaults to 1,000,000 if omitted.
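A minimal sketch of how such an argument could be parsed. `parseMessageRate` and `DEFAULT_MESSAGE_RATE` are hypothetical names, and falling back to the default on malformed or non-positive input is an assumption made here, not behaviour confirmed by the PR.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>

constexpr std::int64_t DEFAULT_MESSAGE_RATE = 1000000; // messages per second

// Returns the rate from argv[1], or the default if it is absent or unusable.
inline std::int64_t parseMessageRate(int argc, char **argv)
{
    if (argc < 2)
    {
        return DEFAULT_MESSAGE_RATE;
    }

    char *end = nullptr;
    errno = 0;
    const long long rate = std::strtoll(argv[1], &end, 10);

    // Assumed policy: reject overflow, empty input, trailing junk, and rates <= 0.
    if (errno != 0 || end == argv[1] || *end != '\0' || rate <= 0)
    {
        return DEFAULT_MESSAGE_RATE;
    }

    return static_cast<std::int64_t>(rate);
}
```

At the default rate the per-message interval NANOS_PER_SECOND / messageRatePerSec works out to 1,000 ns. Since that is an integer division, a rate that does not divide one billion evenly yields a slightly shorter interval than the exact one.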

File modified: benchmarks-aeron/src/main/cpp/CMakeLists.txt, with one line added: benchmark(aeronIpcReceiverNanomark AeronIpcReceiverNanomark.cpp).

Sample output (macOS dev machine, 1M msg/sec, untuned)

imagePoll
Summary: min/mean/max = 42 ns/406 ns/394495 ns
p50=292 ns p90=333 ns p99=2793 ns p99.9≈7959 ns
subscriptionPoll
Summary: min/mean/max = 42 ns/407 ns/534527 ns
p50=209 ns p90=334 ns p99=1709 ns p99.9≈10751 ns

The large max/tail values reflect OS scheduling jitter on an untuned dev machine (no CPU affinity). On an isolated server these would be significantly tighter. Happy for the maintainers to run on tuned hardware if baseline numbers for the repo are wanted.

Out of scope

Image::controlledPoll() is intentionally excluded to keep this PR focused on the basic poll path. It can be added as a follow-on.

@mikeb01 mikeb01 self-assigned this Apr 30, 2026