Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,10 @@ public SqlReturnTypeInference getReturnTypeInference() {
RelDataType originalType =
SqlLibraryOperators.ARRAY.getReturnTypeInference().inferReturnType(sqlOperatorBinding);
RelDataType innerType = originalType.getComponentType();
// Default empty/unknown element type to VARCHAR — see PR description for why.
if (innerType == null || isUnknownLikeType(innerType.getSqlTypeName())) {
innerType = typeFactory.createSqlType(SqlTypeName.VARCHAR);
}
Comment on lines +54 to +56
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is CAST(ARRAY[] to VARCHAR[])? Add an UT to cover this assumption.
Question, Is it limitation of Substrait, becuase it does not supoort UNKNOW?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer PostgreSQL, it does not allow construct empty ARRAY[].

https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS

You can construct an empty array, but since it's impossible to have an array with no type, you must explicitly cast your empty array to the desired type. For example:

SELECT ARRAY[]::integer[];
 array
-------
 {}
(1 row)

But in PPL, we allow create emtpy array(). So which means we implict covert to varchar[] in future, right?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — semantically equivalent to CAST(ARRAY[] AS VARCHAR[]). The fallback path adds an empty-typed-array case Calcite/Postgres handle via explicit cast, but PPL surface allows bare array() so we do it implicitly.

Yes, Substrait limitation — Substrait's Type proto has no encoding for UNKNOWN (or Calcite's NULL SqlTypeName as an element type). When isthmus' TypeConverter.toSubstrait walks an ARRAY<UNKNOWN> it can't produce a valid wire type and throws UnsupportedOperationException: Unable to convert the type UNKNOWN. The Calcite-engine local executor is more lenient because it never serializes the type — it just iterates the empty List<Object> and never reads the element type.

UT added in aa82704testReturnTypeForEmptyCallIsVarcharArray and testReturnTypeForAllNullOperandsIsVarcharArray cover both fallback paths (empty operand list and typeless-NULL operand). testReturnTypeForIntegerOperandPreservesType is the regression guard that confirms concrete element types pass through unchanged.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right — that's the trade-off. Postgres requires the explicit cast (SELECT ARRAY[]::integer[]); PPL doesn't have that surface today, so any caller writing bare array() would otherwise hit an analytics-engine-side error. The implicit VARCHAR default keeps the bare form working, and concrete-element calls (array(1, 2), array('a')) keep their original element type — testReturnTypeForIntegerOperandPreservesType in aa82704 is the regression guard for that.

Long-term we could expose a PPL cast(array() as <T>[]) syntax that mirrors Postgres and remove the implicit default, but that's a separate language-surface change.

return createArrayType(
typeFactory, typeFactory.createTypeWithNullability(innerType, true), true);
} catch (Exception e) {
Expand All @@ -63,6 +67,17 @@ public UDFOperandMetadata getOperandMetadata() {
return null;
}

/**
* Calcite's {@link SqlLibraryOperators#ARRAY} infers a {@code NULL}-element array for an empty
* call list and an {@code UNKNOWN}-element array when type inference can't pick one (e.g. all
* operands are typeless nulls). Either of those bubbles up to the analytics-engine route's
* substrait converter as "Unable to convert the type UNKNOWN" — substrait has no encoding for
* either marker. Treat both as needing a concrete fallback.
*/
private static boolean isUnknownLikeType(SqlTypeName sqlTypeName) {
return sqlTypeName == SqlTypeName.NULL || sqlTypeName == SqlTypeName.UNKNOWN;
}

public static class ArrayImplementor implements NotNullImplementor {
@Override
public Expression implement(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,12 @@
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rel.type.RelDataTypeSystem;
import org.apache.calcite.sql.ExplicitOperatorBinding;
import org.apache.calcite.sql.fun.SqlLibraryOperators;
import org.apache.calcite.sql.type.SqlTypeFactoryImpl;
import org.apache.calcite.sql.type.SqlTypeName;
import org.junit.jupiter.api.Test;

Expand Down Expand Up @@ -302,4 +308,79 @@ public void testArrayWithCharTypePreservesNulls() {
assertNull(list.get(1), "Null should be preserved during CHAR type conversion");
assertEquals("y", list.get(2));
}

// ==================== RETURN-TYPE INFERENCE TESTS ====================
// These tests cover the return-type fallback the analytics-engine route depends on:
// when Calcite can't infer a concrete element type (no operands, or all-null operands),
// we substitute VARCHAR so the call's return type is substrait-serializable. Without the
// fallback Calcite emits ARRAY<NULL> / ARRAY<UNKNOWN>, which fails substrait conversion
// with "Unable to convert the type UNKNOWN" downstream.

/** array() — empty operand list — returns ARRAY<VARCHAR>. */
@Test
public void testReturnTypeForEmptyCallIsVarcharArray() {
RelDataType returnType = inferReturnType();
assertEquals(SqlTypeName.ARRAY, returnType.getSqlTypeName());
RelDataType element = returnType.getComponentType();
assertNotNull(element);
assertEquals(SqlTypeName.VARCHAR, element.getSqlTypeName());
assertTrue(element.isNullable(), "Element type should be nullable per existing semantics");
}

/** array(NULL) — single typeless-null operand — also falls back to ARRAY<VARCHAR>. */
@Test
public void testReturnTypeForAllNullOperandsIsVarcharArray() {
RelDataTypeFactory typeFactory = newTypeFactory();
RelDataType nullType = typeFactory.createSqlType(SqlTypeName.NULL);
RelDataType returnType = inferReturnType(nullType);
assertEquals(SqlTypeName.ARRAY, returnType.getSqlTypeName());
RelDataType element = returnType.getComponentType();
assertNotNull(element);
assertEquals(SqlTypeName.VARCHAR, element.getSqlTypeName());
}

/** array(1) — INTEGER operand — preserves the inferred element type (no fallback). */
@Test
public void testReturnTypeForIntegerOperandPreservesType() {
RelDataTypeFactory typeFactory = newTypeFactory();
RelDataType intType = typeFactory.createSqlType(SqlTypeName.INTEGER);
RelDataType returnType = inferReturnType(intType);
assertEquals(SqlTypeName.ARRAY, returnType.getSqlTypeName());
RelDataType element = returnType.getComponentType();
assertNotNull(element);
assertEquals(
SqlTypeName.INTEGER,
element.getSqlTypeName(),
"Concrete element types must not be affected by the VARCHAR fallback");
}

/** array('a', 'b') — VARCHAR operands — already VARCHAR, fallback path doesn't fire. */
@Test
public void testReturnTypeForVarcharOperandPreservesType() {
RelDataTypeFactory typeFactory = newTypeFactory();
RelDataType varcharType = typeFactory.createSqlType(SqlTypeName.VARCHAR);
RelDataType returnType = inferReturnType(varcharType, varcharType);
assertEquals(SqlTypeName.ARRAY, returnType.getSqlTypeName());
assertEquals(SqlTypeName.VARCHAR, returnType.getComponentType().getSqlTypeName());
}

/**
* Helper — invokes {@code new ArrayFunctionImpl().getReturnTypeInference().inferReturnType(...)}
* via Calcite's {@link ExplicitOperatorBinding}, which is the public test harness for exercising
* a return-type inference against a specific operand-type list. We bind it to {@link
* SqlLibraryOperators#ARRAY} so the inference's internal call to {@code
* SqlLibraryOperators.ARRAY.getReturnTypeInference().inferReturnType(...)} resolves the same
* operator the lambda delegates to.
*/
private static RelDataType inferReturnType(RelDataType... operandTypes) {
RelDataTypeFactory typeFactory = newTypeFactory();
ExplicitOperatorBinding binding =
new ExplicitOperatorBinding(
typeFactory, SqlLibraryOperators.ARRAY, Arrays.asList(operandTypes));
return new ArrayFunctionImpl().getReturnTypeInference().inferReturnType(binding);
}

private static RelDataTypeFactory newTypeFactory() {
return new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT);
}
}
Loading