feat(scanner): add external row-address mask prefilter #7288

Draft
JulianYG wants to merge 1 commit into lance-format:main from JulianYG:feat/row-addr-mask-prefilter

@JulianYG

Addresses #6852.

What

Adds Scanner::with_row_addr_prefilter(RowAddrMask), letting callers pass a
precomputed row-address allow/block mask as a prefilter into vector and plain
scans, reusing the scanner's existing retrieval plan rather than re-deriving it.
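To make the allow/block semantics concrete, here is a minimal, self-contained sketch. It does not use Lance itself: `ToyRowAddrMask`, `row_addr`, and `prefilter` are hypothetical names invented for illustration, and the real `RowAddrMask` type and `Scanner` method may differ. It assumes the usual row-address packing of fragment id in the upper 32 bits and row offset in the lower 32 bits.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for a row-address mask (NOT Lance's actual `RowAddrMask`).
#[derive(Debug, Clone)]
enum ToyRowAddrMask {
    /// Only these addresses may be returned.
    Allow(BTreeSet<u64>),
    /// Every address except these may be returned.
    Block(BTreeSet<u64>),
}

impl ToyRowAddrMask {
    /// Returns true if the mask lets this row address through.
    fn selected(&self, addr: u64) -> bool {
        match self {
            ToyRowAddrMask::Allow(set) => set.contains(&addr),
            ToyRowAddrMask::Block(set) => !set.contains(&addr),
        }
    }
}

/// Pack a fragment id (upper 32 bits) and a row offset (lower 32 bits)
/// into one 64-bit row address. The layout is an assumption of this sketch.
fn row_addr(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | offset as u64
}

/// Keep only the candidate rows that the mask selects, preserving order.
fn prefilter(rows: &[u64], mask: &ToyRowAddrMask) -> Vec<u64> {
    rows.iter().copied().filter(|a| mask.selected(*a)).collect()
}
```

With four rows spread over two fragments, an allow mask of `{(0,1), (1,0)}` keeps exactly those two rows, while a block mask of `{(0,1)}` keeps the other three. The caller hands over the set directly instead of encoding it as a filter expression.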

Motivation

Some pipelines precompute a set of eligible rows out-of-band (e.g. a stored
bitmap of rows belonging to a logical subset / dataset) and want to run KNN or a
plain scan restricted to that set -- without expressing it as a SQL filter. A
multi-hundred-thousand-element IN (...) is impractical to build and parse;
passing the row set directly is far cheaper.

How

The mask threads into the existing prefilter machinery at three points:

  • ANN branch: fed through PreFilterSource into new_knn_exec, ANDed with
    any deletion/SQL prefilter via a MaskAndLoader.
  • Flat / unindexed-fragment branch: a new RowAddrMaskFilterExec filters
    scan output by _rowid, so rows appended after the index build are honored.
  • Plain (non-vector) scan: the mask is supplied as the FilteredReadExec
    index input, so only masked rows are read; a SQL filter becomes a refine on top.

Deletions are still applied by DatasetPreFilter; illegal addresses are ignored.
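The AND-combination and the "illegal addresses are ignored" behavior can be sketched with a toy model. This is not the PR's code: `ToyMask`, `Allow`, `Block`, `And`, and `filter_scan` are hypothetical names, and the real `MaskAndLoader` and `RowAddrMaskFilterExec` operate on Lance's own types rather than plain sets.

```rust
use std::collections::BTreeSet;

/// Hypothetical predicate-style mask; Lance's real types differ.
trait ToyMask {
    fn selected(&self, addr: u64) -> bool;
}

/// Allow-list: only listed addresses pass.
struct Allow(BTreeSet<u64>);
/// Block-list: listed addresses are rejected (e.g. deleted rows).
struct Block(BTreeSet<u64>);

impl ToyMask for Allow {
    fn selected(&self, addr: u64) -> bool {
        self.0.contains(&addr)
    }
}

impl ToyMask for Block {
    fn selected(&self, addr: u64) -> bool {
        !self.0.contains(&addr)
    }
}

/// Logical AND of two masks: a row passes only if both masks accept it.
struct And<A, B>(A, B);

impl<A: ToyMask, B: ToyMask> ToyMask for And<A, B> {
    fn selected(&self, addr: u64) -> bool {
        self.0.selected(addr) && self.1.selected(addr)
    }
}

/// Filter scanned row addresses through a mask. Mask entries that match
/// no scanned row never produce output, so they are effectively ignored.
fn filter_scan<M: ToyMask>(scanned: &[u64], mask: &M) -> Vec<u64> {
    scanned.iter().copied().filter(|a| mask.selected(*a)).collect()
}
```

For example, combining an external allow mask `{1, 2, 3, 999}` with a deletion block mask `{2}` over scanned rows `0..5` yields rows 1 and 3: row 2 is removed by the deletion mask, and address 999 matches no row and so has no effect.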

Status

Draft, pending API agreement on #6852. Behavior is exercised by the test suite of an
out-of-tree PyO3 binding (that binding is based on v7.0.0; this PR is rebased); cargo check -p lance and cargo fmt are clean. I'd
appreciate guidance on the public API shape before finalizing in-tree tests.

Lets callers pass a serialized RowAddrMask as an allow/block prefilter
into vector and plain scans, reusing the scanner's retrieval plan. The
mask feeds the KNN prefilter source on the ANN branch and FilteredReadExec
as the row source for non-vector scans; a new RowAddrMaskFilterExec honors
the mask on the flat/unindexed-fragment branch. Addresses lance-format#6852.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions (bot) added the enhancement (New feature or request) label Jun 16, 2026
@JulianYG JulianYG marked this pull request as draft June 16, 2026 07:54