[wip] BM_TCPEchoServerLatencyNQDRSubprocess benchmark #326
Looking at this, it seems to me that adding a router should, in the ideal case, add 0.014 ms of latency. That is the time a round trip to the echo server takes with no routers in between. Adding a router to the chain adds two hops to the packet's path, which should amount to +0.014 ms of latency. The latency actually added is 0.07 ms on average, which means the router causes 0.056 ms of overhead. Is that a little or a lot? Where is this time spent? Is it spent usefully?

In these latency tests, there is only ever a single TCP send in flight at a time, so the routers are as lightly loaded as they can possibly be. The latency measured should therefore be the lowest achievable. Edit: there should be TLS in this.

On the whole, there is absolutely no reason to orchestrate the router subprocesses from a C++ test. It is much nicer to do this in Python and to use existing tooling, such as an echo server, some TCP ping utilities, or iperf3, like a normal perf test would. The results are much more trustworthy that way as well. When the thing stops being a microbenchmark, there is no point in trying to treat it as a microbenchmark.
The first few benchmarks are already in `main`; the new one is the `BM_TCPEchoServerLatencyNQDRSubprocess` benchmark.

This shows what adding a router to a long chain does to latency when sending a small TCP message through. C is a client that measures timing, S is an echo server.

(use arguments such as `--benchmark_filter=.*BM_TCPEchoServerLatencyN.*` to run only chosen benchmarks, or to run multiple times and compute stats)

What would be interesting are latency percentiles/distributions, which are not readily available now, but the benchmark can be updated to report them, of course.
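As a rough illustration of the percentile idea, here is a minimal sketch that computes a nearest-rank percentile from individual round-trip times. It assumes the benchmark were changed to record every sample in a vector, which it does not do today. The function name `LatencyPercentile` and the millisecond unit are hypothetical and not part of the existing code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the existing benchmark.
// Nearest-rank percentile: returns the smallest sample such that at least
// p percent of all samples are less than or equal to it.
// The samples are taken by value because the function sorts its own copy.
double LatencyPercentile(std::vector<double> samples_ms, double p) {
    assert(!samples_ms.empty());
    assert(p > 0.0 && p <= 100.0);

    std::sort(samples_ms.begin(), samples_ms.end());

    const std::size_t n = samples_ms.size();
    // 1-based rank of the requested percentile within the sorted samples.
    std::size_t rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(n)));
    if (rank < 1) rank = 1;
    if (rank > n) rank = n;

    return samples_ms[rank - 1];
}
```

Calling `LatencyPercentile(samples, 50.0)` and `LatencyPercentile(samples, 99.0)` on the same vector would give the median and the tail, which says more about the distribution than the average alone.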
Looks like adding routers to the chain increases the average (yes, I am ashamed of using the average) latency linearly. Hopefully this could be used to measure where the latency is coming from, and to track improvements if improvements are called for.
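One way to put a number on the linear trend is an ordinary least-squares fit of mean latency against the number of routers in the chain. The sketch below is only an illustration: `LinearFit` and `FitLatencyLine` are hypothetical names, and the inputs are assumed to be collected by hand from the benchmark output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type, not part of the existing benchmark.
struct LinearFit {
    double slope;      // added latency per router, in ms
    double intercept;  // estimated latency with zero routers, in ms
};

// Ordinary least-squares fit of latency = intercept + slope * router_count.
LinearFit FitLatencyLine(const std::vector<double>& router_counts,
                         const std::vector<double>& mean_latency_ms) {
    const std::size_t n = router_counts.size();
    assert(n >= 2 && n == mean_latency_ms.size());

    // Means of both series; centering keeps the sums well conditioned.
    double mean_x = 0.0;
    double mean_y = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_x += router_counts[i];
        mean_y += mean_latency_ms[i];
    }
    mean_x /= static_cast<double>(n);
    mean_y /= static_cast<double>(n);

    double sxx = 0.0;  // sum of squared deviations of x
    double sxy = 0.0;  // sum of products of deviations of x and y
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = router_counts[i] - mean_x;
        sxx += dx * dx;
        sxy += dx * (mean_latency_ms[i] - mean_y);
    }
    assert(sxx > 0.0);  // needs at least two distinct router counts

    const double slope = sxy / sxx;
    return {slope, mean_y - slope * mean_x};
}
```

If the trend really is linear, the slope should land near the 0.07 ms per router mentioned earlier in this conversation, and a change in the slope between two builds would show whether per-router overhead improved.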