[improvement](fe) Keep generated IS NOT NULL for variable-length types in EliminateNotNull#62519
[improvement](fe) Keep generated IS NOT NULL for variable-length types in EliminateNotNull#62519
Conversation
…s in EliminateNotNull ### What problem does this PR solve? Issue Number: close #xxx Problem Summary: The EliminateNotNull rule removes all generated IS NOT NULL predicates when they are implied by other predicates (e.g., `a > 'abc'` implies `a IS NOT NULL`). However, for variable-length types (VARCHAR, STRING, JSON, VARIANT, ARRAY, MAP, STRUCT), the IS NOT NULL check reads only the null bitmap which is very cheap and vectorizable, while the actual predicate (`a > 'abc'`) requires reading variable-length data which is expensive. Keeping the IS NOT NULL as a pre-filter can skip null rows before reading the variable-length data, improving scan performance. For fixed-length types (INT, BIGINT, DATE, etc.), the IS NOT NULL check offers no benefit because reading the column data is already cheap, so it is still removed. ### Release note Optimizer now preserves generated IS NOT NULL predicates for variable-length type columns (STRING, VARCHAR, JSON, VARIANT, complex types) to enable cheaper null bitmap pre-filtering at scan level. ### Check List (For Author) - Test: Unit Test (EliminateNotNullTest - 6 test cases) - Behavior changed: Yes - generated IS NOT NULL for variable-length types is no longer removed - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
Found 1 blocking issue and 1 follow-up test gap.
-
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/EliminateNotNull.java
The rule now keeps generatedIS NOT NULLfor variable-length slots, but it rebuilds the kept predicate as plainnew Not(new IsNull(slot))and drops theisGeneratedIsNotNullmarker. That marker is still consumed later byConstantPropagationandRangeInference, so a query likename = 'abc'can still lose the intended scan pre-filter later in the rewrite pipeline. -
fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/EliminateNotNullTest.java
The new tests only check conjunct counts after a local rewrite sequence. They do not assert that the retained predicate is still marked as generated, and they do not cover the later rewrite stages where the regression above appears.
Critical checkpoints:
- Goal of current task: Partially met. The rule is trying to preserve generated
IS NOT NULLfor variable-length types, but that behavior does not hold end-to-end because the generated marker is lost. - Small, clear, focused change: Yes. The code change is localized to one rewrite rule plus FE unit tests.
- Concurrency: Not applicable. No shared-state or locking changes are introduced.
- Lifecycle / static initialization: Not applicable.
- Configuration changes: None.
- Compatibility / storage / FE-BE protocol: No incompatible protocol or storage change is involved.
- Parallel code paths: Needs follow-up. Downstream optimizer paths still distinguish generated vs user-written
IS NOT NULL, so this rule has to preserve that contract. - Special conditional checks: The new type-based branch is documented, but its interaction with later rewrites is incomplete.
- Test coverage: Insufficient for the new behavior. A regression should assert the generated marker is preserved, or run the full rewrite pipeline /
ConstantPropagationpath. - Observability: Not applicable.
- Transaction / persistence / data writes: Not applicable.
- FE-BE variable passing: Not applicable.
- Performance: The intended optimization is reasonable, but it is currently undermined because the retained pre-filter can still be simplified away.
- Other issues: None beyond the above.
Validation note: I could not run ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.EliminateNotNullTest to completion in this runner because thirdparty/installed/bin/protoc is missing during gensrc.
| ImmutableSet.Builder<Expression> keepIsNotNull | ||
| = ImmutableSet.builderWithExpectedSize(slotsFromIsNotNull.size()); | ||
| for (Slot slot : slotsFromIsNotNull) { | ||
| if (slot.nullable() && !inferNonNotSlots.contains(slot)) { |
There was a problem hiding this comment.
This rebuilds a kept generated varlen IS NOT NULL as plain new Not(new IsNull(slot)), so the isGeneratedIsNotNull marker is lost. Downstream rewrites still branch on that bit (ConstantPropagation.canReplaceExpression() and RangeInference), so a case like name = 'abc' can still fold the retained pre-filter away later in the pipeline. Can we preserve the generated marker for the predicates we intentionally keep here?
| CascadesContext ctx = checker.getCascadesContext(); | ||
| ctx.topDownRewrite(new InferFilterNotNull()); | ||
| ctx.bottomUpRewrite(new EliminateNotNull()); | ||
| checker.matches( |
There was a problem hiding this comment.
These assertions only check conjunct counts after a local InferFilterNotNull + EliminateNotNull sequence. They would still pass even if the kept predicate loses its isGeneratedIsNotNull marker, which is exactly what downstream passes use. Please add a regression that either asserts the retained predicate is still marked as generated, or runs the full rewriter / ConstantPropagation path so we know the pre-filter survives end-to-end.
What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
The EliminateNotNull rule removes all generated IS NOT NULL predicates when they are implied by other predicates (e.g.,
a > 'abc'impliesa IS NOT NULL). However, for variable-length types (VARCHAR, STRING, JSON, VARIANT, ARRAY, MAP, STRUCT), the IS NOT NULL check reads only the null bitmap which is very cheap and vectorizable, while the actual predicate (a > 'abc') requires reading variable-length data which is expensive. Keeping the IS NOT NULL as a pre-filter can skip null rows before reading the variable-length data, improving scan performance.For fixed-length types (INT, BIGINT, DATE, etc.), the IS NOT NULL check offers no benefit because reading the column data is already cheap, so it is still removed.
Release note
Optimizer now preserves generated IS NOT NULL predicates for variable-length type columns (STRING, VARCHAR, JSON, VARIANT, complex types) to enable cheaper null bitmap pre-filtering at scan level.
Check List (For Author)
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)