Describe the bug
MapFromArrays and MapFromEntries both route through Spark's ArrayBasedMapBuilder, which:
- Throws RuntimeException("Cannot use null as map key") when a key is NULL.
- Applies spark.sql.mapKeyDedupPolicy (EXCEPTION vs LAST_WIN) for duplicate keys.
Comet does not enforce either behavior:
CometMapFromArrays (spark/src/main/scala/org/apache/comet/serde/maps.scala) only wraps the call in a CASE WHEN that handles whole-array NULLs. It does not detect a NULL element inside the keys array, and it does not implement either dedup policy.
CometMapFromEntries (maps.scala) only gates on BinaryType keys / values. The null-key and duplicate-key cases are not flagged as incompatible.
For datasets containing a NULL key or duplicate keys, Comet will silently produce a map where Spark would throw, or apply different dedup semantics.
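For reference, the two behaviors Comet would need to match can be modelled in a few lines. This is only a toy sketch of the semantics described above, not Spark's ArrayBasedMapBuilder code: the names build_map and DedupPolicy are hypothetical, and the duplicate-key error text is illustrative rather than Spark's exact message.

```python
from enum import Enum


class DedupPolicy(Enum):
    """Hypothetical stand-in for the values of spark.sql.mapKeyDedupPolicy."""
    EXCEPTION = "EXCEPTION"
    LAST_WIN = "LAST_WIN"


def build_map(keys, values, policy=DedupPolicy.EXCEPTION):
    """Toy model of the null-key and duplicate-key rules described in this issue."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    result = {}
    for key, value in zip(keys, values):
        # A NULL key is always rejected, regardless of the dedup policy.
        if key is None:
            raise RuntimeError("Cannot use null as map key")
        # Under EXCEPTION a repeated key is an error (message text is illustrative).
        if key in result and policy is DedupPolicy.EXCEPTION:
            raise RuntimeError(f"Duplicate map key {key!r} was found")
        # Under LAST_WIN the later value overwrites the earlier one.
        result[key] = value
    return result
```

Any native path that returns a map for inputs where this model raises, or that keeps the first value under LAST_WIN, would show the divergence reported here.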
Steps to reproduce
Build a map via map_from_arrays or map_from_entries where the keys array contains a NULL element, or contains duplicate keys, and compare Comet against Spark with spark.sql.mapKeyDedupPolicy set to both EXCEPTION and LAST_WIN.
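A possible set of statements for that comparison is sketched below. The helper repro_statements and the specific literal inputs are assumptions chosen for illustration; they were not taken from an existing test. Note also that Spark may constant-fold literal-only expressions before Comet sees them, so the same arrays may need to be read from a table to exercise the native path.

```python
# Both dedup policies named in this issue.
POLICIES = ("EXCEPTION", "LAST_WIN")

# Illustrative inputs: one NULL-key case and one duplicate-key case per function.
CASES = {
    "map_from_arrays, NULL key":
        "SELECT map_from_arrays(array(1, NULL), array('a', 'b'))",
    "map_from_arrays, duplicate key":
        "SELECT map_from_arrays(array(1, 1), array('a', 'b'))",
    "map_from_entries, NULL key":
        "SELECT map_from_entries(array(struct(CAST(NULL AS INT), 'a')))",
    "map_from_entries, duplicate key":
        "SELECT map_from_entries(array(struct(1, 'a'), struct(1, 'b')))",
}


def repro_statements():
    """Return the SQL statements to run, grouped under each dedup policy."""
    statements = []
    for policy in POLICIES:
        statements.append(f"SET spark.sql.mapKeyDedupPolicy={policy}")
        statements.extend(CASES.values())
    return statements


if __name__ == "__main__":
    for statement in repro_statements():
        print(statement + ";")
```

Running the printed statements once with Comet enabled and once with plain Spark, then diffing results and errors, gives the comparison described above.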
Expected behavior
CometMapFromArrays and CometMapFromEntries should declare an Incompatible(Some(...)) branch (or a tighter input check) covering null-key rejection and dedup-policy semantics, with matching entries in getIncompatibleReasons(), so the cases fall back to Spark rather than diverging silently.
Additional context
Split out from #4505 (items 2 and 3), surfaced by the audit-comet-expression skill run in #4478. The two expressions share the ArrayBasedMapBuilder semantics so they are tracked together here. Distinct from #3327 (closed; native crash on whole-array NULL inputs).