HBASE-29974: Persist filter hints across scan circuit breaks #7882
Open
shubham-roy wants to merge 5 commits into apache:master
added 2 commits March 7, 2026 18:47
…rcuit breaks in scan pipeline, causing unnecessary cell-level iteration.
93eb1f0 to bc24248
Contributor
@shubham-roy could you rebase with latest master?
Contributor
FYI @tkhurana
Contributor, Author
@virajjasani, I have resolved the merge conflicts and applied the spotless command. For the test failures: the above passes locally for me. Are we sure it is not a flaky test? Also, I believe the 3 reported flakes can be ignored, right?

JIRA
HBASE-29974
Problem / Motivation
HBase's scan pipeline applies several structural short-circuit checks in
UserScanQueryMatcher.matchColumn (time-range gate, column-set exclusion, and
version-limit exhaustion) before Filter.filterCell is ever called. When any of
these gates fires, the pipeline returns a plain SKIP or SEEK_NEXT_COL code and
the filter is bypassed entirely.
Similarly, RegionScannerImpl iterates through every cell in a rejected row
one by one via nextRow() after filterRowKey returns true, even though a filter
can deterministically compute the next valid row boundary.
The consequence is that filter implementations that could provide meaningful
seek hints are never consulted at these decision points, forcing the scanner to
advance cell by cell rather than issuing a single requestSeek. For
range-oriented or sparse-column filters over large tables, this results in
significant, avoidable read amplification.
Root Cause
The existing getNextCellHint / SEEK_NEXT_USING_HINT mechanism requires
filterCell to have been called first; there was no API contract or call site
for filters to provide a forward hint when the cell was structurally excluded
before reaching filterCell. The three structural gates in matchColumn and the
filterRowKey rejection path all lacked hook points.
Solution
New public API (Filter)
Two new optional methods are added to the abstract Filter class, both with
backward-compatible null defaults in FilterBase and FilterWrapper:
- getSkipHint(Cell skippedCell): consulted when a cell is skipped by a structural gate before filterCell is reached.
- getHintForRejectedRow(Cell firstRowCell): consulted after filterRowKey rejects a row, before any cell-level iteration.
UserScanQueryMatcher (server-side)
- Adds a pendingSkipHint field and a resolveSkipHint() helper.
- Each structural gate in matchColumn (time-range tsCmp > 0, time-range tsCmp < 0, column-set exclusion, version SKIP/SEEK_NEXT_COL) calls resolveSkipHint() and promotes the result to SEEK_NEXT_USING_HINT when a non-null hint is returned.
- getNextKeyHint drains pendingSkipHint first, before delegating to filter.getNextCellHint, ensuring the correct hint is returned for the cell that triggered the structural gate.
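The interplay between resolveSkipHint() and getNextKeyHint described above can be sketched roughly as follows. This is a simplified illustration, not the code in the PR: cells are modelled as plain strings, the two Function fields stand in for Filter.getSkipHint and Filter.getNextCellHint, and the class name, enum, and method signatures are assumptions made for the example.

```java
import java.util.function.Function;

/** Toy stand-in for the match codes named in the description. */
enum SketchMatchCode { INCLUDE, SKIP, SEEK_NEXT_COL, SEEK_NEXT_USING_HINT }

/** Hypothetical, simplified matcher; cells are modelled as plain strings. */
final class SkipHintMatcherSketch {
  // Stands in for Filter.getSkipHint (returns null when the filter has no hint).
  private final Function<String, String> skipHintFn;
  // Stands in for Filter.getNextCellHint.
  private final Function<String, String> cellHintFn;
  // Set only when a structural gate obtained a non-null hint.
  private String pendingSkipHint;

  SkipHintMatcherSketch(Function<String, String> skipHintFn,
      Function<String, String> cellHintFn) {
    this.skipHintFn = skipHintFn;
    this.cellHintFn = cellHintFn;
  }

  /**
   * Called where a structural gate would return SKIP or SEEK_NEXT_COL.
   * Keeps the original code when the filter offers no hint; otherwise stores
   * the hint and promotes the code to SEEK_NEXT_USING_HINT.
   */
  SketchMatchCode resolveSkipHint(String skippedCell, SketchMatchCode structuralCode) {
    String hint = skipHintFn.apply(skippedCell);
    if (hint == null) {
      return structuralCode;
    }
    pendingSkipHint = hint;
    return SketchMatchCode.SEEK_NEXT_USING_HINT;
  }

  /** Drains the pending structural hint first, then falls back to the regular filter hint. */
  String getNextKeyHint(String cell) {
    if (pendingSkipHint != null) {
      String hint = pendingSkipHint;
      pendingSkipHint = null; // one-shot: the hint belongs to the cell that triggered the gate
      return hint;
    }
    return cellHintFn.apply(cell);
  }
}
```

Clearing the field on read is what keeps a structural hint from leaking into a later call that should be answered by the filter's regular hint.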
RegionScannerImpl (server-side)
- getHintForRejectedRow(Cell) validates and retrieves the filter hint after a filterRowKey rejection.
- nextRowViaHint(ScannerContext, Cell, ExtendedCell) replaces the nextRow() call with a single storeHeap.requestSeek(hint), wrapped with skippingRow mode to keep block-size accounting consistent.
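To make the difference between the nextRow() path and a single hinted seek concrete, here is a toy model. It is only a sketch under stated assumptions: ToyStoreHeap is a hypothetical stand-in for the store heap (a sorted list of "row/qualifier" keys with a cursor), its requestSeek takes a plain string rather than the real cell type, and the skippingRow block-size accounting is left out entirely.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the store heap: sorted "row/qualifier" keys plus a cursor. */
final class ToyStoreHeap {
  private final List<String> sortedKeys; // must be sorted in natural String order
  private int pos = 0;
  int cellsTouched = 0; // cells visited one by one
  int seeks = 0;        // number of seek requests issued

  ToyStoreHeap(List<String> sortedKeys) {
    this.sortedKeys = sortedKeys;
  }

  /** Returns the key under the cursor, or null when the heap is exhausted. */
  String peek() {
    return pos < sortedKeys.size() ? sortedKeys.get(pos) : null;
  }

  /** Cell-by-cell advance past a rejected row, modelled on the nextRow() path. */
  void skipRowCellByCell(String row) {
    while (peek() != null && peek().startsWith(row + "/")) {
      pos++;
      cellsTouched++;
    }
  }

  /** Single forward seek to the first key >= hint, modelled on storeHeap.requestSeek(hint). */
  void requestSeek(String hint) {
    seeks++;
    int idx = Collections.binarySearch(sortedKeys, hint);
    int target = idx >= 0 ? idx : -idx - 1; // insertion point when the hint is absent
    pos = Math.max(pos, target);            // never move the cursor backwards
  }
}
```

In this model, skipping a rejected row with three cells costs three cell visits, whereas a hint pointing at a later row costs one seek regardless of how many cells lie in between.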
Backward Compatibility
- Both new Filter methods return null by default (defined on Filter directly, with matching no-ops in FilterBase and FilterWrapper), so all existing filter implementations continue to behave identically.
- DoNotRetryIOException is thrown (consistent with existing getNextCellHint validation) if a filter returns a non-ExtendedCell from either new method.
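The two compatibility rules above might look roughly like this in isolation. The sketch deliberately avoids the real HBase classes: SketchCell, SketchExtendedCell, SketchFilter, SketchDoNotRetryIOException, and HintValidation are all hypothetical stand-ins, and the validate helper is an assumption about the shape of the check, not the PR's actual code.

```java
import java.io.IOException;

/** Stand-in for the HBase Cell interface. */
interface SketchCell {
  String key();
}

/** Stand-in for the HBase ExtendedCell interface. */
interface SketchExtendedCell extends SketchCell {
}

/** Stand-in for DoNotRetryIOException. */
class SketchDoNotRetryIOException extends IOException {
  SketchDoNotRetryIOException(String message) {
    super(message);
  }
}

/** Stand-in for the abstract Filter class with the two new optional methods. */
abstract class SketchFilter {
  /** Default: no hint, so existing filters keep their current behaviour. */
  SketchCell getSkipHint(SketchCell skippedCell) {
    return null;
  }

  /** Default: no hint, so a rejected row is still skipped via the old path. */
  SketchCell getHintForRejectedRow(SketchCell firstRowCell) {
    return null;
  }
}

final class HintValidation {
  private HintValidation() {
  }

  /** Passes null through, rejects hints that are not extended cells. */
  static SketchExtendedCell validate(SketchCell hint) throws SketchDoNotRetryIOException {
    if (hint == null) {
      return null; // no hint: caller falls back to the structural code or nextRow()
    }
    if (!(hint instanceof SketchExtendedCell)) {
      throw new SketchDoNotRetryIOException(
          "Filter hint must be an extended cell: " + hint.getClass().getName());
    }
    return (SketchExtendedCell) hint;
  }
}
```

A filter that overrides neither method therefore yields null from both, and the scanner-side code never reaches the type check.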