Optimize binary search methods by IvoDD · Pull Request #3110 · man-group/ArcticDB

IvoDD · 2026-05-14T09:11:51Z

Reference Issues/PRs

Optimizations on top of #3091
Used in #3062

What does this implement or fix?

Some micro optimizations on binary search methods:

- Don't keep TypedBlockData in ColumnDataIterator. Instead only keep block_data_ and block_size_
- Don't recalculate block pointer and size when we already know them during gallop
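
As a rough illustration of the first point, here is a minimal sketch of an iterator that caches only a raw pointer and an element count for the current block. It is not the ArcticDB `ColumnDataIterator`; `Block`, `BlockIterator` and `sum_all` are hypothetical names, and only the `block_data_` / `block_size_` member names and the `nullptr` end sentinel are taken from this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one contiguous block of a column.
struct Block {
    std::vector<std::int64_t> values;
};

// Sketch: the iterator stores a raw pointer and a size for the current block
// instead of a richer per-block wrapper object.
class BlockIterator {
  public:
    explicit BlockIterator(const std::vector<Block>& blocks) : blocks_(&blocks) { load_current_block(); }

    // A null block pointer doubles as the end sentinel.
    bool at_end() const { return block_data_ == nullptr; }
    std::int64_t operator*() const { return block_data_[offset_]; }

    BlockIterator& operator++() {
        if (++offset_ == block_size_) {
            ++block_idx_;
            load_current_block();
        }
        return *this;
    }

  private:
    // Refreshes the cached pointer and size; called only when crossing a block boundary.
    void load_current_block() {
        offset_ = 0;
        // Skip empty blocks so that dereferencing is always valid.
        while (block_idx_ < blocks_->size() && (*blocks_)[block_idx_].values.empty()) {
            ++block_idx_;
        }
        if (block_idx_ < blocks_->size()) {
            block_data_ = (*blocks_)[block_idx_].values.data();
            block_size_ = (*blocks_)[block_idx_].values.size();
        } else {
            block_data_ = nullptr;
            block_size_ = 0;
        }
    }

    const std::vector<Block>* blocks_;
    std::size_t block_idx_ = 0;
    const std::int64_t* block_data_ = nullptr;
    std::size_t block_size_ = 0;
    std::size_t offset_ = 0;
};

// Usage: sum every value across all blocks.
std::int64_t sum_all(const std::vector<Block>& blocks) {
    std::int64_t total = 0;
    for (BlockIterator it(blocks); !it.at_end(); ++it) {
        total += *it;
    }
    return total;
}
```

In this sketch the per-element work is one pointer dereference and one comparison against a cached size, which is the kind of saving the first bullet describes.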

Any other comments?

Benchmarks for all search and iteration methods:

Benchmark	Before (ns)	After (ns)	Delta
iterate_irregular_blocks_1 (one row per block)	478,496	311,163	−35.0%
iterate_with_iterator (100 rows)	798	719	−9.9%
exponential_lb_single_block (in first 100)	356	323	−9.2%
exponential_lb_single_block (full gallop)	458	424	−7.4%
exponential_lb_regular (in first 100)	364	339	−6.7%
exponential_lb_irregular_1000 (in first 100)	360	335	−6.7%
exponential_lb_irregular_1000 (full gallop)	496	476	−3.9%
exponential_lb_regular (full gallop)	504	489	−2.9%
exponential_lb_irregular_1 (in first 100)	464	455	−2.0%
exponential_lb_irregular_1 (full gallop)	687	679	−1.3%
lower_bound_single_block	411	394	−4.1%
lower_bound_irregular_1000	444	431	−3.0%
lower_bound_irregular_1	595	579	−2.8%
lower_bound_regular_blocks	443	436	−1.4%
iterate_single_block	27,305	27,247	−0.2%
iterate_regular_blocks	29,051	28,734	−1.1%
iterate_irregular_blocks_1000	28,136	27,893	−0.9%
iterate_with_scalar_at (100 rows)	182,183,122	182,088,026	−0.1%

Checklist

Checklist for code changes...

Have you updated the relevant docstrings, documentation and copyright notice?
Is this contribution tested against all ArcticDB's features?
Do all exceptions introduced raise appropriate error messages?
Are API changes highlighted in the PR description?
Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

claude · 2026-05-14T09:15:35Z

ArcticDB Code Review Summary

No items requiring attention. The optimization is correct, well-scoped, and the benchmark deltas in the PR description validate it.

Verified:

- gallop_bracket first-block lambdas are safe: prev_block/cur_block remain first_block_idx throughout the first-block probing phase, so the optimized variants do not need to track the block field.
- The raw-pointer block_begin_ replacing std::optional<TypedBlockData<TDT>> is consistently propagated (copy constructor, dereference, end-sentinel). All callers in column_algorithms.hpp and test_column.cpp updated to current_block_data() == nullptr.
- New load_current_block computes block->logical_size() / sizeof(RawType), equivalent to the previous TypedBlockData::row_count() for Dim0 (which is static_assert-enforced by the search code paths).
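
The row-count equivalence in the last point can be sketched as follows. This is an illustration under assumptions, not the ArcticDB code: `ByteBlock` and `row_count` are hypothetical, and the sketch only models scalar (Dim0) columns, where each row is exactly one `RawType` value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical byte-oriented block that reports its logical size in bytes.
struct ByteBlock {
    std::vector<std::uint8_t> bytes;
    std::size_t logical_size() const { return bytes.size(); }
};

// For scalar columns the number of rows is the byte size divided by the element size.
template <typename RawType>
std::size_t row_count(const ByteBlock& block) {
    return block.logical_size() / sizeof(RawType);
}
```

For example, an 80-byte block holds 10 rows of `std::int64_t` or 20 rows of `std::int32_t`.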

vasil-pashov · 2026-05-18T10:45:55Z

+    auto record_probe_in_first_block = [&](size_t next_offset, RawType probe_value) {
+        prev_offset = cur_offset;
+        cur_offset = next_offset;
+        return is_before(probe_value, value);
+    };


Do these two extra assignments that are omitted really make a difference?

IvoDD · 2026-05-21

Most probably they do not.

Most of the benefit comes from reusing the already calculated first_block_row_count and first_block_data in make_iter_in_first_block.

It also made sense to add a first-block variant of record_probe, to make the invariant clearer.
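
For readers unfamiliar with the gallop being discussed, here is a minimal single-array sketch of an exponential lower bound that uses a `record_probe`-style lambda to track the bracket. It works on a plain `std::vector<int>` and is not the multi-block ArcticDB implementation; `exponential_lower_bound` is a hypothetical name, and the lambda only mirrors the shape of the one in the diff above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the index of the first element that is not less than `value`
// in the sorted vector `data` (same contract as std::lower_bound).
std::size_t exponential_lower_bound(const std::vector<int>& data, int value) {
    const std::size_t n = data.size();
    if (n == 0 || !(data[0] < value)) {
        return 0;
    }
    // Invariant: data[cur_offset] < value until a probe fails.
    std::size_t prev_offset = 0;
    std::size_t cur_offset = 0;
    auto record_probe = [&](std::size_t next_offset, int probe_value) {
        prev_offset = cur_offset;
        cur_offset = next_offset;
        return probe_value < value;
    };

    // Gallop: double the step until a probe is not less than value or we run out of data.
    bool bracketed = false;
    for (std::size_t step = 1; cur_offset + step < n; step *= 2) {
        const std::size_t next = cur_offset + step;
        if (!record_probe(next, data[next])) {
            bracketed = true;
            break;
        }
    }

    // The answer lies in (prev_offset, cur_offset] if a probe failed, else in (cur_offset, n].
    const std::size_t lo = bracketed ? prev_offset + 1 : cur_offset + 1;
    const std::size_t hi = bracketed ? cur_offset : n;
    const auto first = data.begin();
    return static_cast<std::size_t>(std::lower_bound(first + lo, first + hi, value) - first);
}
```

When the target is close to the start, the gallop touches only a handful of elements before the final binary search, which is the "in first 100" case in the benchmark table.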

Additional micro optimizations on binary search methods:
- Don't keep `TypedBlockData` in `ColumnDataIterator`
- Don't recalculate block pointer and size when we already know them during gallop

…port (#3062)

#### Reference Issues/PRs
Monday ref: 11679866800
Depends on PRs #3091 and #3110

### Issues
- There is complicated bucket hopping logic in three places: `generate_output_index_column`, `generate_resampling_output_column`, `SortedAggregator::aggregate`
- The bucket hopping logic involves many branches with loads of checks

### Changes (split per commit for easier review)
0. Adds C++ benchmarks that measure the CPU-intensive part of resampling.
1. Pure move of `generate_output_index_column` to `sorted_aggregation.cpp`.
   - This way all bucket hopping logic is in one place.
2. Construct a `ResampleMapping` in `generate_output_index_column` and use it directly in other methods.
   - `ResampleMapping` just has a mapping from `output_row` to `(start_column_index, start_column_offset), (end_column_index, end_column_offset)`.
   - Resolves the 3 places with similar logic.
   - Makes the implementation of sparse aggregation easier.
3. Use [galloping search](https://en.wikipedia.org/wiki/Exponential_search) in `generate_output_index_column` to skip past all rows in a single bucket at once.
   - Index column construction was the bottleneck: aggregation vectorises well but index iteration does not.
   - Changes complexity from `O(num_input_rows + num_buckets)` to `O(num_buckets × log(rows_per_bucket))`.
   - Always ≤ `O(num_input_rows + num_buckets)` even when `num_buckets ≥ num_input_rows`.
4. Preallocate the output index column to `min(num_buckets, num_input_rows)` instead of `num_buckets`.
   - Galloping search has a higher constant than linear scan and regresses at low rows per bucket.
   - Slightly improves the case where most buckets are empty due to smaller allocation.
5. Use a runtime heuristic to choose between linear scan and galloping search.
   - Linear scan is faster below ~32 rows/bucket (because of smaller constant and better branch prediction); galloping search is faster above.
   - Threshold determined empirically from benchmarks at intermediate bucket counts. Extra benchmarking was done with more parametrization of the existing benchmark. Not kept in PR to avoid a huge amount of benchmarking code.
   - Recovers the Dense-100k and Empty regressions from commit 3 while retaining all gains elsewhere.
6. Implement sparse resampling.
   - Small change made straightforward by the `ResampleMapping` from commit 2.
   - Minimal overhead for the dense case.

### Resample benchmark timings

`BM_resample/<rows_per_seg>/<num_segs>/<num_buckets>/<num_cols>`. Total rows ~1M. Source: `cpp/arcticdb/processing/test/benchmark_resample.cpp`. Times in **ms**, `--benchmark_min_time=2s`.

| Regime | Args | rows/bucket | Description |
|---|---|---|---|
| Dense-1k | `100k × 10, 1k buckets` | ~1000 | Many rows/bucket, single row-slice |
| Dense-100 | `100k × 10, 10k buckets` | ~100 | Medium rows/bucket, single row-slice |
| Dense-10 | `100k × 10, 100k buckets` | ~10 | Few rows/bucket, single row-slice |
| Spanning | `2k × 500, 100 buckets` | ~10k | Buckets span multiple row-slices |
| Empty | `100k × 10, 10M buckets` | <1 | Bucket smaller than row spacing; most empty |

**1 aggregation column**

| # | Change | D-1k | D-100 | D-10 | Spanning | Empty |
|---|---|---|---|---|---|---|
| 0 | Baseline | 1.27 | 1.34 | 1.47 | 1.65 | 11.1 |
| 1 | Code move | 1.02 (−20%) | 1.12 (−16%) | 1.27 (−14%) | 1.40 (−15%) | 11.1 (0%) |
| 2 | ResampleMapping | 1.02 (−20%) | 1.12 (−16%) | 1.32 (−10%) | 1.40 (−15%) | 11.8 (+6%) |
| 3 | Galloping search | 0.059 (−95%) | 0.385 (−71%) | 2.94 (+100%) | 0.285 (−83%) | 21.9 (+97%) |
| 4 | Bounded allocation | 0.058 (−95%) | 0.396 (−70%) | 2.91 (+98%) | 0.291 (−82%) | 21.5 (+94%) |
| 5 | Heuristic (lin/EUB) | 0.059 (−95%) | 0.383 (−71%) | 1.27 (−14%) | 0.293 (−82%) | 11.5 (+4%) |
| 6 | Sparse-input support | 0.068 (−95%) | 0.449 (−66%) | 1.28 (−13%) | 0.296 (−82%) | 11.5 (+4%) |

**100 aggregation columns**

| # | Change | D-1k | D-100 | D-10 | Spanning | Empty |
|---|---|---|---|---|---|---|
| 0 | Baseline | 1.37 | 1.43 | 1.56 | 6.22 | 48.0 |
| 1 | Code move | 1.11 (−19%) | 1.18 (−17%) | 1.34 (−14%) | 5.92 (−5%) | 46.2 (−4%) |
| 2 | ResampleMapping | 1.11 (−19%) | 1.19 (−17%) | 1.39 (−11%) | 5.87 (−6%) | 50.4 (+5%) |
| 3 | Galloping search | 0.148 (−89%) | 0.471 (−67%) | 2.96 (+90%) | 4.65 (−25%) | 63.1 (+31%) |
| 4 | Bounded allocation | 0.148 (−89%) | 0.480 (−66%) | 2.95 (+89%) | 4.67 (−25%) | 44.1 (−8%) |
| 5 | Heuristic (lin/EUB) | 0.149 (−89%) | 0.477 (−67%) | 1.33 (−15%) | 4.70 (−24%) | 35.9 (−25%) |
| 6 | Sparse-input support | 0.158 (−88%) | 0.537 (−62%) | 1.35 (−13%) | 4.94 (−21%) | 36.0 (−25%) |

Deltas vs baseline (row 0).

#### Notes on benchmark results
- Load average varied across runs, so there are some artifacts in the results, such as the "Code move" improvements.
- Galloping search significantly improves speed when there are many rows in a single bucket. Thorough benchmarking showed exponential upper bound (EUB) becomes faster than linear search at ~32 rows per bucket. Hence we see some performance regressions in the 10 rows per bucket and in the mostly empty bucket cases.
- Bounded allocation mostly helps the empty case, as expected.
- Using the heuristic to choose between EUB and linear search helps when rows_per_bucket < 32. It is even more efficient than the baseline due to slightly better branch prediction (improved use of `ARCTICDB_LIKELY` and `ARCTICDB_UNLIKELY`).
- Final state: every regime at or faster than baseline; Dense 1000 rows per bucket is the biggest winner with 20x improvement; Mostly empty bucket is the only use case with no improvement and remains around baseline (+4%).

---------

Co-authored-by: Ivo <ivo.dilov@man.com>
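
The choice between linear scan and galloping search described in changes 3 and 5 can be sketched roughly as below. This is an illustration under assumptions, not the code from #3062: the function names, the left-closed bucket layout `[origin + b * width, origin + (b + 1) * width)`, the precondition that every timestamp is at or after `origin`, and the exact form of the heuristic (average rows per bucket compared against 32) are all hypothetical. Only the ~32 rows/bucket threshold comes from the text above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed threshold, taken from the "~32 rows/bucket" figure quoted above.
constexpr std::size_t kGallopThreshold = 32;

// Linear scan: first position at or after `start` whose timestamp is >= bucket_end.
std::size_t linear_bucket_end(const std::vector<std::int64_t>& index, std::size_t start, std::int64_t bucket_end) {
    while (start < index.size() && index[start] < bucket_end) {
        ++start;
    }
    return start;
}

// Galloping search with the same contract as linear_bucket_end.
std::size_t gallop_bucket_end(const std::vector<std::int64_t>& index, std::size_t start, std::int64_t bucket_end) {
    const std::size_t n = index.size();
    if (start >= n || index[start] >= bucket_end) {
        return start;
    }
    std::size_t lo = start;  // index[lo] < bucket_end is known
    std::size_t hi = n;      // the answer lies in (lo, hi]
    for (std::size_t step = 1; lo + step < n; step *= 2) {
        if (index[lo + step] >= bucket_end) {
            hi = lo + step;
            break;
        }
        lo += step;
    }
    const auto first = index.begin();
    return static_cast<std::size_t>(std::lower_bound(first + lo + 1, first + hi, bucket_end) - first);
}

// Counts rows per bucket over a sorted index, picking the search strategy once up front.
std::vector<std::size_t> rows_per_bucket(
        const std::vector<std::int64_t>& index, std::int64_t origin, std::int64_t width, std::size_t num_buckets
) {
    const bool use_gallop = num_buckets > 0 && index.size() / num_buckets >= kGallopThreshold;
    std::vector<std::size_t> counts(num_buckets, 0);
    std::size_t pos = 0;
    for (std::size_t b = 0; b < num_buckets; ++b) {
        const std::int64_t bucket_end = origin + static_cast<std::int64_t>(b + 1) * width;
        const std::size_t next =
                use_gallop ? gallop_bucket_end(index, pos, bucket_end) : linear_bucket_end(index, pos, bucket_end);
        counts[b] = next - pos;
        pos = next;
    }
    return counts;
}
```

Both strategies return the same positions; only the number of index elements touched per bucket differs, which is why the choice can be made purely on expected bucket density.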

IvoDD requested review from alexowens90 and poodlewars as code owners May 14, 2026 09:11

IvoDD added the no-release-notes (This PR shouldn't be added to release notes) and patch (Small change, should increase patch version) labels May 14, 2026

IvoDD mentioned this pull request May 14, 2026

Resampling performance improvement and sparse aggregation columns support #3062

Merged

alexowens90 approved these changes May 18, 2026

vasil-pashov approved these changes May 18, 2026

Base automatically changed from binary-search-utils to master May 21, 2026 09:15

Optimize binary search methods

6120021

IvoDD force-pushed the binary-search-utils-optimization branch from 0c2d98c to 6120021 Compare May 21, 2026 09:18

IvoDD merged commit f7767c2 into master May 21, 2026
226 checks passed

IvoDD deleted the binary-search-utils-optimization branch May 21, 2026 12:06

IvoDD merged 1 commit into master from binary-search-utils-optimization