Merged
@@ -50,6 +50,17 @@ public SqlReturnTypeInference getReturnTypeInference() {
RelDataType originalType =
SqlLibraryOperators.ARRAY.getReturnTypeInference().inferReturnType(sqlOperatorBinding);
RelDataType innerType = originalType.getComponentType();
// For empty `array()` Calcite infers element type as NULL, which downstream
Collaborator

nit: If possible, can we make the comment more concise? I think it's okay to leave most of the context/details in the PR description.

Collaborator Author

Trimmed in 22d0cce — comment now just points to the PR description.

// serializers (notably the analytics-engine route's substrait converter)
// reject with "Unable to convert the type UNKNOWN". Default to VARCHAR — the
// result is empty either way, so the chosen scalar element type doesn't
// affect any value computation, but it gives the call a substrait-serializable
// type. Existing v2-engine tests (which feed Object lists straight through to
// ExprCollectionValue) are unaffected because the empty list contains no
// elements that need to be cast.
if (innerType == null || isUnknownLikeType(innerType.getSqlTypeName())) {
innerType = typeFactory.createSqlType(SqlTypeName.VARCHAR);
}
Comment on lines +54 to +56
Collaborator

So the idea is CAST(ARRAY[] AS VARCHAR[])? Please add a UT to cover this assumption.
Question: is this a limitation of Substrait, because it does not support UNKNOWN?

Collaborator

For reference, PostgreSQL does not allow constructing a bare, untyped empty ARRAY[]:

https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS

You can construct an empty array, but since it's impossible to have an array with no type, you must explicitly cast your empty array to the desired type. For example:

SELECT ARRAY[]::integer[];
 array
-------
 {}
(1 row)

But in PPL, we allow creating an empty array(). So that means we will implicitly convert it to varchar[] going forward, right?

Collaborator Author

Yes — semantically equivalent to CAST(ARRAY[] AS VARCHAR[]). The fallback path adds an empty-typed-array case Calcite/Postgres handle via explicit cast, but PPL surface allows bare array() so we do it implicitly.

Yes, Substrait limitation — Substrait's Type proto has no encoding for UNKNOWN (or Calcite's NULL SqlTypeName as an element type). When isthmus' TypeConverter.toSubstrait walks an ARRAY<UNKNOWN> it can't produce a valid wire type and throws UnsupportedOperationException: Unable to convert the type UNKNOWN. The Calcite-engine local executor is more lenient because it never serializes the type — it just iterates the empty List<Object> and never reads the element type.
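
To make the difference between the two paths concrete, here is a minimal, dependency-free sketch. It is not the isthmus or Calcite code: `WireTypeSketch`, `ElementType`, `toWireType`, `countLocally`, and the `list<...>` labels are all hypothetical stand-ins chosen for illustration. It only models the idea that a serializing path must map every element type to an encoding, while a local path over an empty list never looks at the element type.

```java
import java.util.List;

// Hypothetical stand-ins for illustration; these are not Calcite or Substrait types.
final class WireTypeSketch {
  enum ElementType { INTEGER, VARCHAR, NULL, UNKNOWN }

  // Serializing path: every element type must map to some wire encoding.
  // The returned labels are illustrative strings, not real wire formats.
  static String toWireType(ElementType elementType) {
    switch (elementType) {
      case INTEGER:
        return "list<i32>";
      case VARCHAR:
        return "list<string>";
      default:
        // The NULL and UNKNOWN markers have no encoding, so conversion fails.
        throw new UnsupportedOperationException(
            "Unable to convert the type " + elementType);
    }
  }

  // Local path: the list is iterated directly and the element type is never read,
  // so an empty list with an UNKNOWN element type causes no problem here.
  static int countLocally(List<Object> values, ElementType elementType) {
    int count = 0;
    for (Object ignored : values) {
      count++;
    }
    return count;
  }
}
```

In this model, `countLocally(List.of(), ElementType.UNKNOWN)` completes normally, whereas `toWireType(ElementType.UNKNOWN)` throws, which mirrors the behavior described above.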

UT added in aa82704: testReturnTypeForEmptyCallIsVarcharArray and testReturnTypeForAllNullOperandsIsVarcharArray cover both fallback paths (empty operand list and typeless-NULL operand). testReturnTypeForIntegerOperandPreservesType is the regression guard that confirms concrete element types pass through unchanged.
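
The three cases those tests cover can be sketched without Calcite on the classpath. This is a simplified model of the fallback, not the code from the commit: `ArrayFallbackSketch`, `TypeName`, and `resolveElementType` are hypothetical names, and the enum merely stands in for Calcite's `SqlTypeName`.

```java
// Hypothetical model of the fallback; the enum stands in for Calcite's SqlTypeName.
final class ArrayFallbackSketch {
  enum TypeName { INTEGER, VARCHAR, NULL, UNKNOWN }

  // Both markers mean "no concrete element type could be inferred".
  static boolean isUnknownLike(TypeName typeName) {
    return typeName == TypeName.NULL || typeName == TypeName.UNKNOWN;
  }

  // Falls back to VARCHAR only when no concrete element type is available;
  // any concrete type is returned unchanged.
  static TypeName resolveElementType(TypeName inferred) {
    if (inferred == null || isUnknownLike(inferred)) {
      return TypeName.VARCHAR;
    }
    return inferred;
  }
}
```

Under this model, a `NULL` element type (empty call) and an `UNKNOWN` element type (all operands are typeless nulls) both resolve to `VARCHAR`, while `INTEGER` stays `INTEGER`.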

Collaborator Author

Right — that's the trade-off. Postgres requires the explicit cast (SELECT ARRAY[]::integer[]); PPL doesn't have that surface today, so any caller writing bare array() would otherwise hit an analytics-engine-side error. The implicit VARCHAR default keeps the bare form working, and concrete-element calls (array(1, 2), array('a')) keep their original element type — testReturnTypeForIntegerOperandPreservesType in aa82704 is the regression guard for that.

Long-term we could expose a PPL cast(array() as <T>[]) syntax that mirrors Postgres and remove the implicit default, but that's a separate language-surface change.

return createArrayType(
typeFactory, typeFactory.createTypeWithNullability(innerType, true), true);
} catch (Exception e) {
@@ -63,6 +74,17 @@ public UDFOperandMetadata getOperandMetadata() {
return null;
}

/**
* Calcite's {@link SqlLibraryOperators#ARRAY} infers a {@code NULL}-element array for an empty
* call list and an {@code UNKNOWN}-element array when type inference can't pick one (e.g. all
* operands are typeless nulls). Either of those bubbles up to the analytics-engine route's
* substrait converter as "Unable to convert the type UNKNOWN" — substrait has no encoding for
* either marker. Treat both as needing a concrete fallback.
*/
private static boolean isUnknownLikeType(SqlTypeName sqlTypeName) {
return sqlTypeName == SqlTypeName.NULL || sqlTypeName == SqlTypeName.UNKNOWN;
}

public static class ArrayImplementor implements NotNullImplementor {
@Override
public Expression implement(