Add receiver-side IPC latency nanomark (Image::poll / Subscription::poll) #110
Open
benkier0 wants to merge 1 commit into
Background
All four existing C++ IPC benchmarks (AeronIpcBenchmark, AeronExclusiveIpcBenchmark, and their Nanomark variants) measure publisher throughput only; they time the round-trip sendBurst() + awaitConfirm() from the sender's perspective. There was no benchmark for the receiver hot-path: how long a message takes to be delivered through Image::poll() or Subscription::poll(), measured from the moment it was intended to be sent.
This was raised in discussion #2027 with @pveentjer, who approved the approach described here.
The gap
Without an intended-start-time measurement, a naive receiver benchmark suffers from coordinated omission: if the subscriber falls behind, it eventually drains the backlog rapidly and records low latencies, hiding the real delay experienced by each message. p99 and p999 become meaningless.
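To make the effect concrete, here is a minimal, deterministic simulation (no Aeron involved). It is only a sketch under stated assumptions: messages are intended at a fixed interval, the receiver handles one message at a time, and it pauses once while handling a single message. The function name simulateStall and all parameters are hypothetical and are not part of this PR.

```cpp
#include <algorithm>
#include <cstdint>

struct SlowCounts
{
    int naive;     // slow messages seen when timing from actual pickup
    int intended;  // slow messages seen when timing from intended send time
};

// Simulates a receiver that pauses once (for stallNs) and then drains its backlog.
// A message counts as "slow" when its measured latency exceeds slowThresholdNs.
SlowCounts simulateStall(
    int messages,
    std::int64_t intervalNs,
    std::int64_t serviceNs,
    int stallAtMessage,
    std::int64_t stallNs,
    std::int64_t slowThresholdNs)
{
    SlowCounts counts{0, 0};
    std::int64_t receiverFreeAtNs = 0;

    for (int i = 0; i < messages; i++)
    {
        // Time at which message i was supposed to be sent.
        const std::int64_t intendedNs = i * intervalNs;
        // The receiver cannot pick the message up before it has finished the previous one.
        const std::int64_t pickupNs = std::max(intendedNs, receiverFreeAtNs);
        const std::int64_t pauseNs = (i == stallAtMessage) ? stallNs : 0;
        const std::int64_t doneNs = pickupNs + pauseNs + serviceNs;

        // Naive timing starts at pickup, so queued messages look fast.
        if (doneNs - pickupNs > slowThresholdNs)
        {
            counts.naive++;
        }
        // Intended-time measurement also charges the time spent waiting in the backlog.
        if (doneNs - intendedNs > slowThresholdNs)
        {
            counts.intended++;
        }

        receiverFreeAtNs = doneNs;
    }

    return counts;
}
```

With a 1 µs interval, 100 ns service time and a single 100 µs pause, the naive count registers only the one stalled message, while the intended-time count also includes every message that queued up behind it.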
Approach
Fixed-rate publishing with intended-start-time latency measurement, following the same pattern as LoadTestRig.send:
- The publisher keeps an intendedSendNs counter that advances by exactly NANOS_PER_SECOND / messageRatePerSec per message regardless of actual send time.
- It waits until nanoClock() >= intendedSendNs, then writes intendedSendNs (not the actual clock at send time) into the 8-byte message payload via tryClaim + putInt64.
- The receiver computes latencyNs = nanoClock() - intendedSendNs.

If the publisher falls behind (e.g. back-pressure), the intended time still advances at the target rate. The receiver therefore records the full queuing delay, not just transit time. This makes p99/p999 meaningful.
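The pacing and timestamping steps above can be sketched roughly as follows. This is not the code from the PR: the Aeron publication is replaced by a caller-supplied send callback, and encodeTimestamp / decodeTimestamp are hypothetical stand-ins for the tryClaim + putInt64 step and the read in the fragment handler. nanoClock here is assumed to be a monotonic clock built on std::chrono::steady_clock.

```cpp
#include <chrono>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::int64_t NANOS_PER_SECOND = 1'000'000'000LL;

// Assumed monotonic nanosecond clock; the PR's own nanoClock may differ.
inline std::int64_t nanoClock()
{
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now().time_since_epoch()).count();
}

// Hypothetical stand-in for writing the timestamp into the claimed buffer.
inline void encodeTimestamp(std::uint8_t *payload, std::int64_t intendedSendNs)
{
    std::memcpy(payload, &intendedSendNs, sizeof(intendedSendNs));
}

// Hypothetical stand-in for reading the timestamp in the fragment handler.
inline std::int64_t decodeTimestamp(const std::uint8_t *payload)
{
    std::int64_t intendedSendNs = 0;
    std::memcpy(&intendedSendNs, payload, sizeof(intendedSendNs));
    return intendedSendNs;
}

// Sends `count` 8-byte messages at a fixed rate. `send` stands in for the publication.
template <typename Send>
void publishAtFixedRate(std::int64_t messageRatePerSec, int count, Send &&send)
{
    const std::int64_t intervalNs = NANOS_PER_SECOND / messageRatePerSec;
    std::int64_t intendedSendNs = nanoClock();

    for (int i = 0; i < count; i++)
    {
        // Busy-spin until the intended send time has been reached.
        while (nanoClock() < intendedSendNs)
        {
        }

        std::uint8_t payload[8];
        // Stamp the intended time, not the actual clock at send time.
        encodeTimestamp(payload, intendedSendNs);
        send(payload);

        // Advance by a fixed step even if this send was late.
        intendedSendNs += intervalNs;
    }
}

// Receiver side: latency relative to the intended send time.
inline std::int64_t latencyFromPayload(const std::uint8_t *payload)
{
    return nanoClock() - decodeTimestamp(payload);
}
```

Because intendedSendNs advances by a fixed step, any delay in one send shows up as added latency on that message and on those behind it, rather than silently shifting the schedule.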
Implementation details
File added:
- benchmarks-aeron/src/main/cpp/AeronIpcReceiverNanomark.cpp

Two Nanomark registrations share a single SharedState (one embedded media driver, one publication, one subscription on STREAM_ID 12):
- AeronIpcReceiverNanomark::imagePoll: measures Image::poll(handler, FRAGMENT_LIMIT)
- AeronIpcReceiverNanomark::subscriptionPoll: measures Subscription::poll(handler, FRAGMENT_LIMIT)

recordRun() is overridden as a no-op so the Nanomark framework's own wall-clock timing does not pollute the HDR histogram; all hdr_record_value calls happen inside the fragment handler.

Warmup:
- perThreadSetUp calls hdr_reset(m_histograms[id]) at repetition == 1, discarding rep 0 (warmup) data.
- main() runs 6 repetitions (1 warmup + 5 measurement). Rep 1/6 is visibly labeled in the teardown output so the warmup boundary is clear.

Configurable rate:
- argv[1] sets the message rate (messages/sec); defaults to 1,000,000 if omitted.

File modified:
- benchmarks-aeron/src/main/cpp/CMakeLists.txt: one line adding benchmark(aeronIpcReceiverNanomark AeronIpcReceiverNanomark.cpp).

Sample output (macOS dev machine, 1M msg/sec, untuned)
imagePoll
Summary: min/mean/max = 42/406 ns/394495 ns
p50=292 ns p90=333 ns p99=2793 ns p99.9≈7959 ns
subscriptionPoll
Summary: min/mean/max = 42/407 ns/534527 ns
p50=209 ns p90=334 ns p99=1709 ns p99.9≈10751 ns
The large max/tail values reflect OS scheduling jitter on an untuned dev machine (no CPU affinity). On an isolated server these would be significantly tighter. Happy for the maintainers to run on tuned hardware if baseline numbers for the repo are wanted.
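For readers unfamiliar with the warmup discard described under Implementation details, the idea can be sketched as below. This is an illustration only: the PR records into an HDR histogram via hdr_record_value and clears it with hdr_reset, whereas this sketch uses a plain vector with nearest-rank percentiles as a stand-in. LatencyRecorder and onRepetitionStart are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for an HDR histogram: stores every sample.
class LatencyRecorder
{
public:
    void record(std::int64_t latencyNs) { m_values.push_back(latencyNs); }
    void reset() { m_values.clear(); }
    std::size_t count() const { return m_values.size(); }

    // Nearest-rank percentile over the recorded samples; returns 0 when empty.
    std::int64_t valueAtPercentile(double percentile) const
    {
        if (m_values.empty())
        {
            return 0;
        }

        std::vector<std::int64_t> sorted(m_values);
        std::sort(sorted.begin(), sorted.end());

        auto rank = static_cast<std::size_t>(
            std::ceil(percentile / 100.0 * static_cast<double>(sorted.size())));
        // Clamp the rank into [1, size] before converting to an index.
        rank = std::min(std::max<std::size_t>(rank, 1), sorted.size());
        return sorted[rank - 1];
    }

private:
    std::vector<std::int64_t> m_values;
};

// Mirrors the described behaviour: repetition 0 is warmup, so its samples
// are dropped when repetition 1 begins.
inline void onRepetitionStart(LatencyRecorder &recorder, int repetition)
{
    if (repetition == 1)
    {
        recorder.reset();
    }
}
```

Resetting at the start of repetition 1 rather than skipping recording during warmup keeps the fragment handler free of extra branches on the hot path, at the cost of recording samples that are later discarded.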
Out of scope
Image::controlledPoll() is intentionally excluded to keep this PR focused on the basic poll path. It can be added as a follow-on.