[ntuple] Support multiple column representations in the merger by silverweed · Pull Request #22017 · root-project/root

silverweed · 2026-04-22T15:07:43Z

This Pull request:

Significantly reworks the internals of the RNTupleMerger to support fast merging of fields with different but compatible column representations.
It does two things:

- turns all L3 merging cases into L2/L1;
- no longer rejects merging fields with different column representations (previously this was only supported for representations that were the split/unsplit versions of each other, and only via L3 merging).

One potentially negative consequence, which we might want to revisit, is that the merger no longer adapts the columns' splitness to the output compression (e.g. if merging changes the source compression from 0 to 505, the columns will still be encoded as unsplit, and vice versa). This will probably be re-added in a future PR.

In order to achieve this, some new internal functionality had to be added, most notably RPagePersistentSink::AddColumnRepresentation.

TODO

- check if we need a feature flag for the changes in AddExtendedColumnRanges
- add a test for merging of Real32Trunc/Quant columns with different bit width/value range
- properly split the big merger commit
- update Merging.md

Checklist:

- tested changes locally
- updated the docs (if necessary)
We are currently serializing columns per-field, but in case of late column extension this might result in inconsistent sorting of the columns in the serialized footer. For example, assume you have fields "A" and "B", both late model extended, both with a single column:

- col 0 -> field A, repr 0
- col 1 -> field B, repr 0

Now you add a new column representation to field "A"; this new column has id 2:

- col 2 -> field A, repr 1

When serializing this RNTuple, all columns are written in the footer by RNTupleSerialize::SerializeColumnsForFields(). Before this change, they would end up on disk in the order [0, 2, 1]. This would corrupt the data by swapping the pages for columns 2 and 1. After this change, they get written as [0, 1, 2], which is the correct order. Note that this exact case is tested in ntuple_merger in the unit test MergeDeferredAdvanced.
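The ordering problem described in this commit message can be illustrated with a minimal, self-contained sketch. It does not use any ROOT API: `ColumnInfo`, `OrderPerField` and `OrderByColumnId` are hypothetical names invented for illustration, and the real serializer is certainly more involved. The sketch only shows why a per-field traversal yields [0, 2, 1] for the example above, while ordering by column id yields [0, 1, 2].

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a column descriptor (not a ROOT type).
struct ColumnInfo {
   std::uint64_t fColumnId;  // id assigned when the column was created
   std::string fFieldName;   // field the column belongs to
   std::uint16_t fReprIndex; // index of the column representation
};

// Per-field traversal: emits every column of the first field, then every
// column of the next field, and so on. A column added late to an early field
// therefore jumps ahead of columns with smaller ids.
std::vector<std::uint64_t> OrderPerField(const std::vector<ColumnInfo> &columns,
                                         const std::vector<std::string> &fieldOrder)
{
   std::vector<std::uint64_t> ids;
   for (const auto &field : fieldOrder) {
      for (const auto &col : columns) {
         if (col.fFieldName == field)
            ids.push_back(col.fColumnId);
      }
   }
   return ids;
}

// Ordering by column id: the emitted sequence always matches the ids,
// regardless of which field a late-added column belongs to.
std::vector<std::uint64_t> OrderByColumnId(std::vector<ColumnInfo> columns)
{
   std::sort(columns.begin(), columns.end(),
             [](const ColumnInfo &a, const ColumnInfo &b) { return a.fColumnId < b.fColumnId; });
   std::vector<std::uint64_t> ids;
   for (const auto &col : columns)
      ids.push_back(col.fColumnId);
   return ids;
}
```

With the three columns from the example (col 0 and col 2 on field "A", col 1 on field "B"), `OrderPerField` visits field "A" first and so emits column 2 before column 1, whereas `OrderByColumnId` keeps the ids in ascending order.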

github-actions · 2026-04-22T17:16:23Z

Test Results

16 files 16 suites 2d 10h 53m 8s ⏱️
3 843 tests 3 841 ✅ 0 💤 2 ❌
54 767 runs 54 751 ✅ 0 💤 16 ❌

For more details on these failures, see this check.

Results for commit 1db6b5e.

silverweed added 7 commits April 22, 2026 15:00

[ntuple] Small improvement in RNTupleMerger

950932c

Instead of calling continue multiple times in the AddColumnFromField loop, just early return in case of projected fields.

[ntuple] update merger test to make sure we test nRepetitions

46a7e6f

[ntuple] Clarify a bit RFieldBase::EntryToColumnElementIndex

5a5272a

Also fix the type of result

[ntuple] Add RClusterDescriptor::TryGetColumnRange

d89a682

[ntuple] Add RPagePersistentSink::AddColumnRepresentation

4aa1007

Internal functionality to be used by the Merger

[ntuple] Fix AddExtendedColumnRanges to account for ExtendColumns

4ba4e83

silverweed requested a review from jblomer as a code owner April 22, 2026 15:07

silverweed marked this pull request as draft April 22, 2026 15:07

silverweed changed the title ~~Ntuple merge colrep2~~ [ntuple] Support multiple column representations in the merger Apr 22, 2026

silverweed added the in:RNTuple label Apr 22, 2026

silverweed self-assigned this Apr 22, 2026

silverweed force-pushed the ntuple_merge_colrep2 branch 2 times, most recently from 2accf31 to b2ae5fc Compare April 22, 2026 15:22

WIP: merger column ext

1db6b5e

silverweed force-pushed the ntuple_merge_colrep2 branch from b2ae5fc to 1db6b5e Compare April 22, 2026 15:22

silverweed wants to merge 8 commits into root-project:master from silverweed:ntuple_merge_colrep2
