HIVE-25948: Iceberg: Enable cost-based selection between Fanout and Clustered writers using column stats NDV #6389

Merged
deniskuzZ merged 5 commits into apache:master from deniskuzZ:HIVE-25948 on Apr 9, 2026

@deniskuzZ
Member

@deniskuzZ deniskuzZ commented Mar 24, 2026

What changes were proposed in this pull request?

Cost-based selection between Fanout and Clustered writers

Why are the changes needed?

Perf optimization

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn test -Dtest=TestIcebergCliDriver -Dqfile=dynamic_partition_writes.q

┌───────────────────────┬───────────────────────────┐
│       Scenario        │         Expected          │
├───────────────────────┼───────────────────────────┤
│ threshold=0 (default) │ no sort (NDV<MAX_WRITERS) │
├───────────────────────┼───────────────────────────┤
│ threshold=-1          │ no sort                   │
├───────────────────────┼───────────────────────────┤
│ threshold=1           │ sort                      │
├───────────────────────┼───────────────────────────┤
│ threshold=2           │ sort (NDV>2)              │
├───────────────────────┼───────────────────────────┤
│ threshold=100         │ no sort (NDV<=100)        │
├───────────────────────┼───────────────────────────┤
│ fanout=false          │ sort                      │
└───────────────────────┴───────────────────────────┘
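
The scenarios in the table can be read as a small decision function. The following is a minimal sketch of that reading, not the code merged in this PR: the class name, method name, and parameters are hypothetical, the default writer limit is passed in because its value is not shown here, and both the `ndv > maxWriters` boundary and the precedence of `fanout=false` over the threshold are assumptions inferred from the table and the reviewed `switch` snippet.

```java
// Hypothetical sketch: all names here are illustrative and not taken from the Hive source.
final class WriterChoiceSketch {

  private WriterChoiceSketch() {
  }

  /**
   * Returns true when rows should be sorted by partition (clustered writer),
   * false when the fanout writer can be used without a sort.
   *
   * @param fanoutEnabled     whether the fanout writer is allowed at all
   * @param threshold         -1 = never sort, 0 = use the default limit, 1 = always sort,
   *                          any other value = explicit limit on open writers
   * @param partitionNdv      estimated number of distinct partition values (from column stats)
   * @param defaultMaxWriters assumed default limit; the real default is not stated in this PR page
   */
  static boolean shouldSort(boolean fanoutEnabled, int threshold, long partitionNdv, int defaultMaxWriters) {
    if (!fanoutEnabled) {
      return true; // table row "fanout=false": sort (assumed to take precedence over the threshold)
    }
    int maxWriters = defaultMaxWriters;
    switch (threshold) {
      case -1:
        return false;            // table row "threshold=-1": no sort
      case 0:
        break;                   // table row "threshold=0": keep the default limit
      case 1:
        return true;             // table row "threshold=1": sort
      default:
        maxWriters = threshold;  // table rows "threshold=2" and "threshold=100"
        break;
    }
    // Assumed boundary: sort only when the NDV exceeds the writer limit.
    return partitionNdv > maxWriters;
  }

  public static void main(String[] args) {
    // Mirrors the "threshold=2" row with an NDV of 3 and an assumed default limit of 100.
    System.out.println(shouldSort(true, 2, 3, 100));
  }
}
```

In this sketch the local `maxWriters` variable stands in for the `MAX_WRITERS` assignment seen in the reviewed snippet, so the method stays free of shared mutable state.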

@deniskuzZ deniskuzZ changed the title from "HIVE-25948: Enable cost-based selection between FanoutWriter and ClusteredWriter for Iceberg tables based on column stats NDV" to "HIVE-25948: Iceberg: Enable cost-based selection between FanoutWriter and ClusteredWriter based on column stats NDV" on Mar 24, 2026
@deniskuzZ deniskuzZ force-pushed the HIVE-25948 branch 2 times, most recently from 01fef8e to f661d63 on March 24, 2026 20:11
@deniskuzZ deniskuzZ changed the title from "HIVE-25948: Iceberg: Enable cost-based selection between FanoutWriter and ClusteredWriter based on column stats NDV" to "HIVE-25948: Iceberg: Enable cost-based selection between Fanout and Clustered writers using column stats NDV" on Mar 24, 2026
@deniskuzZ deniskuzZ requested a review from okumin April 2, 2026 20:10
@sonarqubecloud

sonarqubecloud bot commented Apr 8, 2026

@deniskuzZ
Member Author

Hi @okumin, thanks for the review! I've addressed the comments; please take a final look before we merge.

Contributor

@okumin okumin left a comment

+1. Good optimization

Comment on lines +875 to +883
case -1:
return false;
case 0:
break;
case 1:
return true;
default:
MAX_WRITERS = threshold;
break;
Contributor

We could update checkstyle.xml if we prefer this style, but that is out of scope for this PR; we can reconsider it later.

@deniskuzZ deniskuzZ merged commit 1b57b6a into apache:master Apr 9, 2026
4 checks passed
3 participants