[SPARK-52709][SQL] Fix parsing of STRUCT<>#55285
Open
yadavay-amzn wants to merge 1 commit intoapache:masterfrom
Open
[SPARK-52709][SQL] Fix parsing of STRUCT<>#55285yadavay-amzn wants to merge 1 commit intoapache:masterfrom
yadavay-amzn wants to merge 1 commit intoapache:masterfrom
Conversation
Member
|
Hi @yadavay-amzn, thank you for your contribution! |
495989d to
ef02927
Compare
Member
|
The GA failure seems not related to this change. Could you rebase to |
When the lexer sees STRUCT<>, it tokenizes it as STRUCT + NEQ because NEQ (<>) is matched before LT (<) in the lexer. This increments complex_type_level_counter for STRUCT but never decrements it (no GT token), corrupting the counter for subsequent tokens. As a result, `SELECT CAST(null AS STRUCT<>), 2 >> 1` fails because >> is not recognized as shift-right. The previous fix (apache#51480) modified the NEQ lexer rule but was reverted because it broke ARRAY(col1 <> col2) where <> is the not-equal operator. This fix follows cloud-fan's suggestion to handle it at the parser level. When the parser matches STRUCT followed by NEQ in the dataType rule, it decrements the counter via an inline action. This is safe because the parser has confirmed that NEQ is being used as empty angle brackets in a type context, not as a comparison operator. Closes SPARK-52709
ef02927 to
d8dea9e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Fix the
STRUCT<>parsing bug by adding a parser-level action to decrementcomplex_type_level_counterwhenSTRUCT NEQis matched in thedataTyperule.Root cause: When the lexer sees
STRUCT<>, it tokenizes it asSTRUCT+NEQ(becauseNEQis defined beforeLT). This increments the counter forSTRUCTbut never decrements it (noGTtoken), corrupting the counter. Subsequent>>then fails to be recognized as shift-right.Previous attempt: PR #51480 was merged then reverted because it modified the
NEQlexer rule, which brokeARRAY(col1 <> col2)where<>is the not-equal operator.This fix: Follows @cloud-fan's suggestion to handle it at the parser level. When the parser matches
STRUCT NEQin thedataTyperule, we knowNEQis being used as empty angle brackets (not as a comparison), so we decrement the counter via an inline action.Why are the changes needed?
SELECT CAST(null AS STRUCT<>), 2 >> 1fails with a syntax error because the corrupted counter prevents>>from being recognized as shift-right.Does this PR introduce any user-facing change?
No. This is a correctness fix for SQL parsing.
How was this patch tested?
Added test in
SparkSqlParserSuitecovering 6 cases:STRUCT<>followed by>>(the original bug)STRUCT<>followed by>>STRUCT<>followed by>>>(unsigned shift right)ARRAY(1 <> 2)still works (the regression case from the reverted PR)MAP<STRING, ARRAY<INT>>still workTest results:
SparkSqlParserSuite: 46/46 passedDataTypeParserSuite: 59/59 passedPlanParserSuite: 80/80 passedWas this patch authored or co-authored using generative AI tooling?
Yes