Skip to content

Hybrid eager/deferred accumulation for string_agg GroupsAccumulator to reduce copying and memory usage#21469

Draft
kosiew wants to merge 9 commits intoapache:mainfrom
kosiew:deferredcopying-02-21156
Draft

Hybrid eager/deferred accumulation for string_agg GroupsAccumulator to reduce copying and memory usage#21469
kosiew wants to merge 9 commits intoapache:mainfrom
kosiew:deferredcopying-02-21156

Conversation

@kosiew
Copy link
Copy Markdown
Contributor

@kosiew kosiew commented Apr 8, 2026

Which issue does this PR close?


Rationale for this change

The current StringAggGroupsAccumulator eagerly copies every input string into per-group String buffers during update_batch. This approach is simple but can lead to significant memory overhead and unnecessary copying, especially for large payloads or high-cardinality groupings.

This PR introduces a hybrid strategy that defers copying when it is likely to be beneficial. Instead of immediately materializing strings, the accumulator can retain references to input batches and store lightweight (group_idx, row_idx) entries. Actual string concatenation is deferred until evaluate().

This approach aims to:

  • Reduce memory duplication for large strings
  • Improve performance for workloads with many groups and large payloads
  • Maintain efficiency for small inputs via an eager fast path

What changes are included in this PR?

  • Introduced a hybrid accumulation model:

    • Eager path: existing behavior for small inputs
    • Deferred path: stores references to input batches and row indices
  • Added new internal structures:

    • batches: Vec<ArrayRef> to retain input arrays
    • batch_entries: Vec<Vec<(u32, u32)>> to track (group_idx, row_idx) pairs
    • num_groups to track total group count
  • Introduced StringInputArray abstraction to unify handling of:

    • Utf8
    • LargeUtf8
    • Utf8View
  • Implemented heuristics to decide when to defer:

    • DEFER_GROUP_THRESHOLD
    • DEFER_PAYLOAD_LEN_THRESHOLD
  • Refactored append logic:

    • append_rows_typed for deferred indexing
    • append_batch_typed for eager materialization
    • append_batch_values_typed for reconstruction during evaluation
  • Updated evaluate() to:

    • Materialize deferred data into output
    • Support partial emits (EmitTo::First)
    • Retain un-emitted state correctly
  • Added state management improvements:

    • clear_state() to fully reset buffers
    • retain_after_emit() to compact deferred state after partial emits
  • Extended memory accounting in size() to include:

    • retained batches
    • batch entry metadata
  • Added test:

    • groups_mixed_eager_and_deferred_batches to validate correctness of hybrid behavior

Are these changes tested?

Yes.

  • Added a new test covering mixed eager and deferred batches

  • Existing tests continue to pass

  • The new test verifies:

    • Correct concatenation across eager + deferred inputs
    • Correct handling of partial emits
    • Proper retention of remaining groups after emission

Are there any user-facing changes?

No.

  • This change is internal to the execution engine
  • No API or SQL behavior changes
  • Expected improvements are limited to performance and memory usage characteristics

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew added 6 commits April 7, 2026 23:17
Optimize StringAggGroupsAccumulator to retain input and state
batches with metadata instead of building a Vec<Option<String>>
on every update. Assemble concatenated strings lazily in
evaluate() and state(). Adjust size() to reflect retained
arrays and metadata. Support EmitTo::First(n) by
emitting the required prefix and renumbering retained groups.
Include note for future mixed-batch compaction work.
Remove unnecessary &mut self from append_rows. Consolidate
repeated string-append loop into a typed private helper using
ArrayAccessor. Eliminate redundant runtime null checks in favor
of non-null entry invariant with debug_assert!. Simplify
retain_after_emit into a single filter-and-renumber pass. Trim
local ceremony in evaluate() and state() for clarity.
Consolidate string-like array routing through a single
StringInputArray abstraction to improve maintainability.
Rename the slot appender to append_group_value for
better readability of the lazy-assembly path.
Update append_rows_typed and append_batch_values_typed to
accept array references instead of values. Modify call sites
in StringInputArray to pass references, improving memory
efficiency and consistency across function calls.
Adjust string_agg to implement a hybrid accumulator, offering
eager updates for lightweight workloads and switching to
deferred row tracking for larger batches. This change
enhances performance while maintaining efficiency.
Included mixed-mode regression tests to cover various
batch scenarios and ensure correctness.
@github-actions github-actions bot added the functions Changes to functions implementation label Apr 8, 2026
@kosiew
Copy link
Copy Markdown
Contributor Author

kosiew commented Apr 8, 2026

Benchmark (#21437)

              Criterion Benchmark Summary (Statistically Significant Changes)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark                                                                                     ┃ Mean Change ┃  P-value ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ aggregate_query_approx_percentile_cont_on_f32                                                 │      -4.93% │ 0.000000 │
│ aggregate_query_approx_percentile_cont_on_u64                                                 │      -4.68% │ 0.000000 │
│ aggregate_query_distinct_median                                                               │      -1.59% │ 0.000000 │
│ aggregate_query_group_by                                                                      │      -4.48% │ 0.000000 │
│ aggregate_query_group_by_u64 15 12                                                            │      -1.34% │ 0.000000 │
│ aggregate_query_group_by_u64_multiple_keys                                                    │      -3.66% │ 0.000000 │
│ aggregate_query_group_by_wide_u64_and_f32_without_aggregate_expressions                       │      -5.07% │ 0.000000 │
│ (aggregate_query_group_by_wide_u64_and_f32_without_aggregate_expr)                            │             │          │
│ aggregate_query_group_by_wide_u64_and_string_without_aggregate_expressions                    │      -4.94% │ 0.000000 │
│ (aggregate_query_group_by_wide_u64_and_string_without_aggregate_e)                            │             │          │
│ aggregate_query_group_by_with_filter                                                          │      -3.93% │ 0.000000 │
│ aggregate_query_no_group_by_count_distinct_wide                                               │      -4.73% │ 0.000000 │
│ aggregate_query_no_group_by_min_max_f64                                                       │      -3.28% │ 0.000000 │
│ array_agg_query_group_by_few_groups                                                           │      -3.85% │ 0.000000 │
│ array_agg_query_group_by_many_groups                                                          │      -3.06% │ 0.000000 │
│ array_agg_query_group_by_mid_groups                                                           │      -5.56% │ 0.000000 │
│ array_agg_struct_query_group_by_mid_groups                                                    │      -3.55% │ 0.000000 │
│ first_last_ignore_nulls                                                                       │      -1.65% │ 0.000000 │
│ first_last_many_columns                                                                       │      -1.76% │ 0.000000 │
│ first_last_one_column                                                                         │      -4.00% │ 0.000000 │
│ string_agg_payloads/few_groups/large_1024b (large_1024b)                                      │      -4.48% │ 0.000000 │
│ string_agg_payloads/few_groups/medium_64b (medium_64b)                                        │      -2.78% │ 0.000000 │
│ string_agg_payloads/few_groups/small_3b (small_3b)                                            │      -4.02% │ 0.000000 │
│ string_agg_payloads/many_groups/large_1024b (large_1024b)                                     │      -2.12% │ 0.000000 │
│ string_agg_payloads/many_groups/medium_64b (medium_64b)                                       │      -5.19% │ 0.000000 │
│ string_agg_payloads/many_groups/small_3b (small_3b)                                           │      -4.37% │ 0.000000 │
│ string_agg_payloads/mid_groups/large_1024b (large_1024b)                                      │      -4.01% │ 0.000000 │
│ string_agg_payloads/mid_groups/medium_64b (medium_64b)                                        │     -12.10% │ 0.000000 │
└───────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┴──────────┘

Summary: 26 improvements, 0 regressions (p < 0.05)

@kosiew kosiew force-pushed the deferredcopying-02-21156 branch from 4b3ed6e to 51ac58a Compare April 8, 2026 14:18
kosiew added 3 commits April 8, 2026 22:20
Eliminate repeated match arms in string_agg.rs by introducing a local
dispatch macro. This enhances clarity and readability, allowing each
method to focus on intent while simplifying maintenance for future
changes. The refactor preserves existing static dispatch behavior,
ensuring that all targeted tests continue to pass.
Remove redundant num_groups field and derive emission size from
values. Collapse deferred-state retention into a tighter
iterator/unzip flow. Eliminate the extra append_batch_values
forwarding helper. Split evaluate() into smaller private steps
with replay_deferred_batches and finish_emit. Simplify
should_defer and refine the deferred replay loop for clarity
and efficiency.
@kosiew kosiew force-pushed the deferredcopying-02-21156 branch from 51ac58a to baa8054 Compare April 8, 2026 14:20
@github-actions github-actions bot added the core Core DataFusion crate label Apr 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant