Skip to content

[improvement](fe) Keep generated IS NOT NULL for variable-length types in EliminateNotNull#62519

Closed
englefly wants to merge 1 commit intomasterfrom
preserve-is-null
Closed

[improvement](fe) Keep generated IS NOT NULL for variable-length types in EliminateNotNull#62519
englefly wants to merge 1 commit intomasterfrom
preserve-is-null

Conversation

@englefly
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
The EliminateNotNull rule removes all generated IS NOT NULL predicates when they are implied by other predicates (e.g., a > 'abc' implies a IS NOT NULL). However, for variable-length types (VARCHAR, STRING, JSON, VARIANT, ARRAY, MAP, STRUCT), the IS NOT NULL check reads only the null bitmap which is very cheap and vectorizable, while the actual predicate (a > 'abc') requires reading variable-length data which is expensive. Keeping the IS NOT NULL as a pre-filter can skip null rows before reading the variable-length data, improving scan performance.

For fixed-length types (INT, BIGINT, DATE, etc.), the IS NOT NULL check offers no benefit because reading the column data is already cheap, so it is still removed.

Release note

Optimizer now preserves generated IS NOT NULL predicates for variable-length type columns (STRING, VARCHAR, JSON, VARIANT, complex types) to enable cheaper null bitmap pre-filtering at scan level.

Check List (For Author)

  • Test: Unit Test (EliminateNotNullTest - 6 test cases)
  • Behavior changed: Yes - generated IS NOT NULL for variable-length types is no longer removed
  • Does this need documentation: No

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…s in EliminateNotNull

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
The EliminateNotNull rule removes all generated IS NOT NULL predicates when they
are implied by other predicates (e.g., `a > 'abc'` implies `a IS NOT NULL`).
However, for variable-length types (VARCHAR, STRING, JSON, VARIANT, ARRAY, MAP,
STRUCT), the IS NOT NULL check reads only the null bitmap which is very cheap and
vectorizable, while the actual predicate (`a > 'abc'`) requires reading
variable-length data which is expensive. Keeping the IS NOT NULL as a pre-filter
can skip null rows before reading the variable-length data, improving scan performance.

For fixed-length types (INT, BIGINT, DATE, etc.), the IS NOT NULL check offers no
benefit because reading the column data is already cheap, so it is still removed.

### Release note

Optimizer now preserves generated IS NOT NULL predicates for variable-length type
columns (STRING, VARCHAR, JSON, VARIANT, complex types) to enable cheaper null
bitmap pre-filtering at scan level.

### Check List (For Author)

- Test: Unit Test (EliminateNotNullTest - 6 test cases)
- Behavior changed: Yes - generated IS NOT NULL for variable-length types is no longer removed
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@englefly
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 76.92% (10/13) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 92.31% (12/13) 🎉
Increment coverage report
Complete coverage report

@englefly
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 1 blocking issue and 1 follow-up test gap.

  1. fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/EliminateNotNull.java
    The rule now keeps generated IS NOT NULL for variable-length slots, but it rebuilds the kept predicate as plain new Not(new IsNull(slot)) and drops the isGeneratedIsNotNull marker. That marker is still consumed later by ConstantPropagation and RangeInference, so a query like name = 'abc' can still lose the intended scan pre-filter later in the rewrite pipeline.

  2. fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/EliminateNotNullTest.java
    The new tests only check conjunct counts after a local rewrite sequence. They do not assert that the retained predicate is still marked as generated, and they do not cover the later rewrite stages where the regression above appears.

Critical checkpoints:

  • Goal of current task: Partially met. The rule is trying to preserve generated IS NOT NULL for variable-length types, but that behavior does not hold end-to-end because the generated marker is lost.
  • Small, clear, focused change: Yes. The code change is localized to one rewrite rule plus FE unit tests.
  • Concurrency: Not applicable. No shared-state or locking changes are introduced.
  • Lifecycle / static initialization: Not applicable.
  • Configuration changes: None.
  • Compatibility / storage / FE-BE protocol: No incompatible protocol or storage change is involved.
  • Parallel code paths: Needs follow-up. Downstream optimizer paths still distinguish generated vs user-written IS NOT NULL, so this rule has to preserve that contract.
  • Special conditional checks: The new type-based branch is documented, but its interaction with later rewrites is incomplete.
  • Test coverage: Insufficient for the new behavior. A regression should assert the generated marker is preserved, or run the full rewrite pipeline / ConstantPropagation path.
  • Observability: Not applicable.
  • Transaction / persistence / data writes: Not applicable.
  • FE-BE variable passing: Not applicable.
  • Performance: The intended optimization is reasonable, but it is currently undermined because the retained pre-filter can still be simplified away.
  • Other issues: None beyond the above.

Validation note: I could not run ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.EliminateNotNullTest to completion in this runner because thirdparty/installed/bin/protoc is missing during gensrc.

ImmutableSet.Builder<Expression> keepIsNotNull
= ImmutableSet.builderWithExpectedSize(slotsFromIsNotNull.size());
for (Slot slot : slotsFromIsNotNull) {
if (slot.nullable() && !inferNonNotSlots.contains(slot)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rebuilds a kept generated varlen IS NOT NULL as plain new Not(new IsNull(slot)), so the isGeneratedIsNotNull marker is lost. Downstream rewrites still branch on that bit (ConstantPropagation.canReplaceExpression() and RangeInference), so a case like name = 'abc' can still fold the retained pre-filter away later in the pipeline. Can we preserve the generated marker for the predicates we intentionally keep here?

CascadesContext ctx = checker.getCascadesContext();
ctx.topDownRewrite(new InferFilterNotNull());
ctx.bottomUpRewrite(new EliminateNotNull());
checker.matches(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assertions only check conjunct counts after a local InferFilterNotNull + EliminateNotNull sequence. They would still pass even if the kept predicate loses its isGeneratedIsNotNull marker, which is exactly what downstream passes use. Please add a regression that either asserts the retained predicate is still marked as generated, or runs the full rewriter / ConstantPropagation path so we know the pre-filter survives end-to-end.

@englefly englefly closed this Apr 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants