feat: implement native empty2null spark inner function #4683

Open
kazantsev-maksim wants to merge 57 commits into apache:main from kazantsev-maksim:empty2null

@kazantsev-maksim kazantsev-maksim commented Jun 18, 2026

Contributor

Which issue does this PR close?

Part of: #4670

Rationale for this change

Empty2Null is an internal Spark expression that converts an empty string "" into null. The logic is trivial: if the value is null or a zero-length string, it returns null; otherwise it returns the string unchanged.
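The semantics described above can be sketched in a few lines of plain Rust. This is only an illustration of the null/empty rule, not the code in this PR: the names `empty2null` and `empty2null_column` are hypothetical, and a column is modeled as a slice of `Option<&str>` instead of an Arrow array.

```rust
/// Hypothetical scalar helper: None and Some("") both become None,
/// any other string is returned unchanged.
fn empty2null(value: Option<&str>) -> Option<&str> {
    value.filter(|s| !s.is_empty())
}

/// Hypothetical column-level helper: applies the scalar rule to every
/// element of a nullable string column (modeled here as a slice).
fn empty2null_column<'a>(column: &[Option<&'a str>]) -> Vec<Option<&'a str>> {
    column.iter().map(|v| empty2null(*v)).collect()
}
```

Note that a whitespace-only string such as `" "` has non-zero length, so under the rule as stated it passes through unchanged.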

Purpose
The function is applied to partition columns during partitioned file writes (Parquet, ORC, etc.). It exists to keep Hive-style partitioning correct.

In Hive-style directory naming, an empty string and null are indistinguishable: both would produce a path like col1=, which is ambiguous and breaks reading the data back. To avoid this, Spark runs partition columns through Empty2Null before writing, so empty strings end up in the same default partition as null: col1=__HIVE_DEFAULT_PARTITION__
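To make the path ambiguity concrete, here is a minimal sketch of how a partition directory name could be derived once empty strings are normalized to null. The function `partition_dir` is hypothetical and only models the naming rule described above; it omits the escaping of special characters that a real writer would need.

```rust
/// Directory value used for null partition values, as quoted above.
const HIVE_DEFAULT_PARTITION: &str = "__HIVE_DEFAULT_PARTITION__";

/// Hypothetical helper: builds a `column=value` directory name.
/// Empty strings are first normalized to null, so they share the
/// default partition with real nulls instead of producing `column=`.
fn partition_dir(column: &str, value: Option<&str>) -> String {
    let normalized = value.filter(|s| !s.is_empty());
    format!("{}={}", column, normalized.unwrap_or(HIVE_DEFAULT_PARTITION))
}
```

With this rule, `partition_dir("col1", Some(""))` and `partition_dir("col1", None)` yield the same directory name, so the bare `col1=` path is never written.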

What changes are included in this PR?

How are these changes tested?

Added Rust unit tests.

@kazantsev-maksim kazantsev-maksim changed the title from "Empty2null" to "feat: implement native empty2null spark inner function" on Jun 18, 2026

@comphead comphead left a comment

Contributor

Thanks @kazantsev-maksim, can we investigate whether this function can be implemented through codegen functions rather than natively?

It doesn't seem to involve intensive computation, so a codegen implementation should be fine, I suppose. For a codegen example, see #4636.

2 participants