feat: add BuildHasher variants for hash_utils #21429
xudong963 wants to merge 6 commits into apache:main
(cherry picked from commit 02b972d)
run benchmark with_hashes
🤖 Criterion benchmark running (GKE). Comparing xudong963/upstream-pr46 (6a1cc02) to cdfade5 (merge-base).
🤖 Criterion benchmark completed (GKE).
run benchmark with_hashes
These might be regressions, I am rerunning once more.
🤖 Criterion benchmark running (GKE). Comparing xudong963/upstream-pr46 (6a1cc02) to cdfade5 (merge-base).
🤖 Criterion benchmark completed (GKE).
run benchmark with_hashes
🤖 Criterion benchmark running (GKE). Comparing xudong963/upstream-pr46 (1dec571) to cdfade5 (merge-base).
🤖 Criterion benchmark completed (GKE).
This reverts commit 1dec571.
Keep the default RandomState path concrete while adding dedicated BuildHasher entry points and tests. Also wire the parquet feature through object_store/tokio so datafusion-common all-features clippy builds cleanly.
run benchmark with_hashes
🤖 Criterion benchmark running (GKE). Comparing xudong963/upstream-pr46 (c022fb4) to cdfade5 (merge-base).
🤖 Criterion benchmark completed (GKE).
run benchmark with_hashes
🤖 Criterion benchmark running (GKE). Comparing xudong963/upstream-pr46 (501b223) to cdfade5 (merge-base).
🤖 Criterion benchmark completed (GKE).
Which issue does this PR close?
Rationale for this change
datafusion_common::hash_utils currently exposes hashing helpers only for DataFusion's RandomState. This makes the code harder to reuse from callers that need a custom BuildHasher for integration or test scenarios.
What changes are included in this PR?
- Add with_hashes_with_hasher and create_hashes_with_hasher public APIs.
- Keep the default RandomState path for the current with_hashes and create_hashes APIs.
- [...] hash_utils tests to cover the new custom-hasher entry points.
- [...] force_hash_collisions so the feature-specific test configuration keeps passing.
Are these changes tested?
Yes.
- cargo test -p datafusion-common hash_utils
- cargo test -p datafusion-common --features force_hash_collisions hash_utils
Are there any user-facing changes?
Yes.
datafusion-common now exposes with_hashes_with_hasher and create_hashes_with_hasher for callers that need a custom BuildHasher.
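To illustrate the idea of a hasher-generic entry point, here is a minimal standard-library sketch. It is an assumption-laden stand-in, not the API added by this PR: the real functions in datafusion_common::hash_utils work on Arrow arrays, and their exact signatures are not shown on this page. The function below only borrows the name create_hashes_with_hasher, and it uses std's RandomState rather than the one DataFusion uses.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, BuildHasherDefault, Hash};

/// Hypothetical stand-in for a `*_with_hasher` entry point: hashes every value
/// with the caller-supplied `BuildHasher` and stores one hash per input row.
/// This is NOT DataFusion's signature; it only illustrates the generic bound.
fn create_hashes_with_hasher<T: Hash, S: BuildHasher>(
    values: &[T],
    build_hasher: &S,
    hashes: &mut Vec<u64>,
) {
    hashes.clear();
    hashes.extend(values.iter().map(|v| build_hasher.hash_one(v)));
}

fn main() {
    let rows = ["a", "b", "a"];
    let mut hashes = Vec::new();

    // A fixed-key hasher gives reproducible hashes, which is handy in tests.
    let fixed = BuildHasherDefault::<DefaultHasher>::default();
    create_hashes_with_hasher(&rows, &fixed, &mut hashes);
    assert_eq!(hashes.len(), rows.len());
    assert_eq!(hashes[0], hashes[2]); // equal inputs produce equal hashes

    // The same function also accepts a randomly seeded state.
    let random = RandomState::new();
    create_hashes_with_hasher(&rows, &random, &mut hashes);
    assert_eq!(hashes[0], hashes[2]);
}
```

Taking the hasher as a generic parameter lets the compiler monomorphize each call site, so a caller can swap in a deterministic hasher for tests without changing the hashing code.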