[http_server] Simplify and improve perf of DRT node and table #132
cong1920 wants to merge 1 commit into matt-42:master from
Conversation
Please ignore this; see my latest reply below.

=================

With the benchmark test added in commit ac446b1, I found that this refactoring actually makes route inserts into the DRT slower, while URL hit/miss is almost unchanged (thanks to LLM-assisted coding nowadays). Since route inserts are rare once the server is set up and running, maybe this perf "regression" is acceptable? @matt-42

So sorry for this long-delayed PR. I always wanted to add some perf tests with it but kept putting that off. Nowadays every developer is driving an LLM to code, so I decided to give it a try tonight :) If the insert "regression" is not acceptable, I can keep driving my AI agent to improve it.
Introduce a hybrid_children_map that uses a flat vector<> with linear search for nodes with < 8 children, auto-upgrading to an unordered_map<> when a node gains more children, replacing the linear search with fast hashmap access. This works because most TRIE nodes have 1-5 children, so they benefit from contiguous, cache-friendly linear search.

Benchmark results (median, 25 iterations, WSL2 g++ -O2):

Standard scenario (152 routes) vs master:
insert: 274.3 -> 211.7 ns/op (-22.8%)
hit: 80.9 -> 56.0 ns/op (-30.8%)
miss: 45.2 -> 26.7 ns/op (-40.9%)

Wide scenario (112 routes, 20-24 children/node) vs master:
insert: 268.1 -> 212.9 ns/op (-20.6%)
hit: 83.2 -> 62.1 ns/op (-25.4%)
miss: 59.6 -> 33.8 ns/op (-43.3%)

Besides the perf improvement, this PR also improves code readability by using wrapped data structures instead of `std::vector<std::shared_ptr<T>>` and other multi-layer standard containers. These wrapper structures also enable optimizations such as bulk allocation.
I decided to let LLMs help me again with improving it, and the outcome is impressive. No more regressions, all improvements.
[http_server] Perf and readability improvements of DRT

Introduce a hybrid_children_map that uses a flat vector<> with linear search
for nodes with < 8 children, auto-upgrading to an unordered_map<> when a node
gains more children, replacing the linear search with fast hashmap access.

This works because most TRIE nodes have 1-5 children, so they benefit from
contiguous, cache-friendly linear search.

Besides the perf improvement, this PR also improves code readability by using
wrapped data structures instead of `std::vector<std::shared_ptr<T>>` and other
multi-layer standard containers. These wrapper structures also enable
optimizations such as bulk allocation.
=================
It started when I saw `std::vector<std::shared_ptr<drt_node>>` and `std::vector<std::shared_ptr<std::string>>` as members of `drt_node` and `dynamic_routing_table` respectively, and I couldn't help wondering why the `shared_ptr<T>` were needed there. Later I realized that instances of the DRT table are allowed to be copied, so its objects need to stay shared across all instances. Still, it felt like a waste to have a `shared_ptr<T>` manage each small object. So I made my first attempt years ago, on New Year's Eve during a family trip, which introduced some new structures and simplified others. However, I wasn't very confident without a perf measurement. I planned to write one but just could not sit down for the few hours needed to design and code some perf tests.

Thanks to LLMs and OpenClaw and all the fancy things nowadays. One day this project came back to mind, and I drove all the AI tools at hand to quickly come up with `benchmarks/drt.cc`, which measured mixed perf results: routing node hits and misses are slightly faster, but inserting got much slower. That might be okay, because it is rare to add new nodes once the server is up. And with AI assistance, it is not hard to diagnose and try different approaches either. After a few iterations, we get this simplified `dynamic_routing_table.hh` with significant perf improvements across insert, hit, and miss. @matt-42