feat: implement native empty2null spark inner function #4683
Open
kazantsev-maksim wants to merge 57 commits into
This reverts commit 768b3e9.
comphead (Contributor) reviewed on Jun 18, 2026 and left a comment:
Thanks @kazantsev-maksim. Can we investigate whether this function can be implemented through codegen functions rather than natively?
It doesn't seem to involve intensive computation, so a codegen implementation should be fine, I suppose. For a codegen example, see #4636.
Which issue does this PR close?
Part of: #4670
Rationale for this change
Empty2Null is an internal Spark expression that converts an empty string "" into null. The logic is trivial: if the value is null or a zero-length string, it returns null; otherwise it returns the string unchanged.
Purpose
The function is applied during partitioned file writes (Parquet, ORC, etc.), specifically to the partition columns. The reason is the correctness of Hive-style partitioning.
In Hive-style directory naming, an empty string and null are indistinguishable: both would produce a path like col1=, which is ambiguous and breaks reading the data back. To avoid this, Spark runs partition columns through Empty2Null before writing, so empty strings end up in the same default partition as null: col1=__HIVE_DEFAULT_PARTITION__
What changes are included in this PR?
How are these changes tested?
Added Rust unit tests.
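For illustration only, the behaviour described in the rationale can be modelled with plain Rust `Option<&str>` values. This is a sketch under stated assumptions, not the code from this PR: the names `empty2null` and `partition_dir` are hypothetical, a native implementation would presumably operate on Arrow string arrays rather than scalars, and the path helper ignores the value escaping that real partition paths need.

```rust
/// Hypothetical scalar model of Empty2Null: `None` (null) or a
/// zero-length string maps to `None`; any other string is returned unchanged.
fn empty2null(value: Option<&str>) -> Option<&str> {
    value.filter(|s| !s.is_empty())
}

/// Hypothetical helper that builds a Hive-style partition directory name.
/// Values that are null after `empty2null` go to the default partition.
/// Path escaping of the value is deliberately omitted in this sketch.
fn partition_dir(column: &str, value: Option<&str>) -> String {
    match empty2null(value) {
        Some(v) => format!("{column}={v}"),
        None => format!("{column}=__HIVE_DEFAULT_PARTITION__"),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn null_and_empty_become_null() {
        assert_eq!(empty2null(None), None);
        assert_eq!(empty2null(Some("")), None);
    }

    #[test]
    fn non_empty_is_unchanged() {
        assert_eq!(empty2null(Some("a")), Some("a"));
        // Whitespace is not zero-length, so it is kept as-is.
        assert_eq!(empty2null(Some(" ")), Some(" "));
    }

    #[test]
    fn empty_and_null_share_default_partition() {
        let default = "col1=__HIVE_DEFAULT_PARTITION__";
        assert_eq!(partition_dir("col1", Some("")), default);
        assert_eq!(partition_dir("col1", None), default);
        assert_eq!(partition_dir("col1", Some("x")), "col1=x");
    }
}
```

The point of the last test is that, without the `empty2null` step, an empty string would yield the ambiguous directory name `col1=`.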