[Bug] map_from_arrays / map_from_entries do not enforce null-key rejection or spark.sql.mapKeyDedupPolicy #4680

@andygrove

Describe the bug

MapFromArrays and MapFromEntries both route through Spark's ArrayBasedMapBuilder, which:

  1. Throws RuntimeException("Cannot use null as map key") when a key is NULL.
  2. Applies spark.sql.mapKeyDedupPolicy (EXCEPTION vs LAST_WIN) for duplicate keys.
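
The two checks above can be modeled in a few lines. This is a minimal sketch in plain Python, not Spark's actual code: `build_map` and `DedupPolicy` are hypothetical names, the duplicate-key message is paraphrased, and the length guard is only there to keep the sketch honest about mismatched inputs.

```python
from enum import Enum


class DedupPolicy(Enum):
    """Hypothetical stand-in for the values of spark.sql.mapKeyDedupPolicy."""
    EXCEPTION = "EXCEPTION"
    LAST_WIN = "LAST_WIN"


def build_map(keys, values, policy=DedupPolicy.EXCEPTION):
    """Toy model of null-key rejection and duplicate-key handling."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    result = {}
    for key, value in zip(keys, values):
        # Check 1: a NULL key is always rejected, regardless of policy.
        if key is None:
            raise RuntimeError("Cannot use null as map key")
        # Check 2: a repeated key either fails or overwrites the earlier value.
        if key in result and policy is DedupPolicy.EXCEPTION:
            raise RuntimeError(f"Duplicate map key {key} was found")
        result[key] = value
    return result
```

Under this model, `build_map(["a", "b", "a"], [1, 2, 3], DedupPolicy.LAST_WIN)` returns `{"a": 3, "b": 2}`, while the same call with the default policy raises. Any input with a `None` key raises under both policies.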

Comet does not enforce either behavior:

  • CometMapFromArrays (spark/src/main/scala/org/apache/comet/serde/maps.scala) only wraps the call in a CASE WHEN that handles whole-array NULLs. It does not detect a NULL element inside the keys array, and it does not implement either dedup policy.
  • CometMapFromEntries (maps.scala) only gates on BinaryType keys or values. The null-key and duplicate-key cases are not flagged as incompatible.

For datasets containing a NULL key or duplicate keys, Comet will silently produce a map where Spark would throw, or apply different dedup semantics.

Steps to reproduce

Build a map via map_from_arrays or map_from_entries where the keys array contains a NULL element, or contains duplicate keys, and compare Comet against Spark with spark.sql.mapKeyDedupPolicy set to both EXCEPTION and LAST_WIN.
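
A possible shape for such a comparison is sketched below in PySpark. Everything here is illustrative rather than taken from the Comet test suite: `run_case`, `CASES`, `QUERIES`, and the view name are hypothetical, and the caller supplies the `spark` session (once with Comet enabled, once without). Column inputs are used instead of literals so that the expression is not evaluated ahead of time by Spark's optimizer. Whether Comet actually executes the projection depends on the session configuration and the data source; the in-memory view only keeps the sketch short.

```python
# Hypothetical repro helper; names and inputs are illustrative only.
POLICIES = ("EXCEPTION", "LAST_WIN")

# One row per case: (keys array, values array).
CASES = {
    "null_key": ([1, None], ["a", "b"]),
    "duplicate_key": ([1, 1], ["a", "b"]),
}

QUERIES = (
    "SELECT map_from_arrays(ks, vs) AS m FROM {view}",
    "SELECT map_from_entries(arrays_zip(ks, vs)) AS m FROM {view}",
)


def run_case(spark, name, policy, view="map_repro"):
    """Run both expressions for one case and record a result or an error."""
    ks, vs = CASES[name]
    spark.conf.set("spark.sql.mapKeyDedupPolicy", policy)
    df = spark.createDataFrame([(ks, vs)], "ks array<int>, vs array<string>")
    df.createOrReplaceTempView(view)
    outcomes = []
    for template in QUERIES:
        query = template.format(view=view)
        try:
            outcomes.append((query, spark.sql(query).collect()))
        except Exception as exc:  # record the failure instead of aborting
            outcomes.append((query, f"{type(exc).__name__}: {exc}"))
    return outcomes
```

Calling `run_case(spark, name, policy)` for every name in `CASES` and every policy in `POLICIES`, under both engines, gives outcome lists that can be compared directly. Per the description above, Spark is expected to fail on `null_key` under both policies and on `duplicate_key` under EXCEPTION; any case where Comet returns rows instead would confirm the divergence.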

Expected behavior

CometMapFromArrays and CometMapFromEntries should declare an Incompatible(Some(...)) branch (or a tighter input check) covering null-key rejection and dedup-policy semantics, with matching entries in getIncompatibleReasons(), so the cases fall back to Spark rather than diverging silently.

Additional context

Split out from #4505 (items 2 and 3), surfaced by the audit-comet-expression skill run in #4478. The two expressions share the ArrayBasedMapBuilder semantics so they are tracked together here. Distinct from #3327 (closed; native crash on whole-array NULL inputs).
