Skip to content

feat(ivfflat): add INCLUDE column support for IVF indexes#24168

Open
iamlinjunhong wants to merge 8 commits intomatrixorigin:mainfrom
iamlinjunhong:m-ivf-incl-col
Open

feat(ivfflat): add INCLUDE column support for IVF indexes#24168
iamlinjunhong wants to merge 8 commits intomatrixorigin:mainfrom
iamlinjunhong:m-ivf-incl-col

Conversation

@iamlinjunhong
Copy link
Copy Markdown
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #24167

What this PR does / why we need it:

Add INCLUDE column support across IVF DDL, entries-table maintenance, planner rewrites, and ivf_search so covered vector queries can push include predicates and avoid base-table joins when mode=include.

Keep mode=post and mode=pre on their original single-round path, switch mode=include fallback from cumulative bucket expansion to non-overlapping bucket slices, reset per-input search round state, and collapse EXPLAIN ANALYZE background ivf_search plans by default while keeping capped verbose expansion.

Add parser, planner, ALTER, explain, unit, and distributed SQL coverage for the new INCLUDE behavior.

@matrix-meow matrix-meow added the size/XXL Denotes a PR that changes 2000+ lines label Apr 21, 2026
Copy link
Copy Markdown
Contributor

@cpegeric cpegeric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add make hnsw index work with INCLUDE and share the code as much as you can. I

Comment thread pkg/sql/plan/ivfflat_include_alter.go Outdated
Copy link
Copy Markdown
Contributor

@XuPeng-SH XuPeng-SH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the large IVF INCLUDE-column slice — the direction makes sense, but I found a few correctness issues that should be fixed before merge:

  1. ivf_search now depends on RuntimeFilterSpecs and IndexReaderParam, but the vm.TableFunction clone / remote encode-decode paths do not preserve them (pkg/sql/compile/operator.go, pkg/sql/compile/remoterun.go). That can break Bloom-filter probe state and candidate budgeting on parallel / remote execution.

  2. ALTER TABLE ... CHANGE COLUMN only records the new column name in affectedCols, while IVF INCLUDE dependency detection still checks the old names in the original index defs (pkg/sql/plan/build_alter_table.go, pkg/sql/plan/ivfflat_include_alter.go). That can skip rebuilding an affected INCLUDE index on rename/change.

  3. In mode=include, multi-round fallback now moves across non-overlapping centroid slices even when residual filters remain outside ivf_search (pkg/sql/plan/apply_indices_ivfflat.go, pkg/vectorindex/ivfflat/search.go). If the outer residual filters reject rows from an early slice, closer valid rows from that same slice are never revisited, so results can be wrong.

  4. IVF indexes are now rewritten on every UPDATE because ivfNeedsRewrite := catalog.IsIvfIndexAlgo(indexdef.IndexAlgo) makes the skip guard impossible to hit (pkg/sql/plan/build_dml_util.go). That looks like a write-amplification regression for updates that do not touch IVF-relevant columns.

@iamlinjunhong
Copy link
Copy Markdown
Contributor Author

Thanks for the large IVF INCLUDE-column slice — the direction makes sense, but I found a few correctness issues that should be fixed before merge:

  1. ivf_search now depends on RuntimeFilterSpecs and IndexReaderParam, but the vm.TableFunction clone / remote encode-decode paths do not preserve them (pkg/sql/compile/operator.go, pkg/sql/compile/remoterun.go). That can break Bloom-filter probe state and candidate budgeting on parallel / remote execution.
  2. ALTER TABLE ... CHANGE COLUMN only records the new column name in affectedCols, while IVF INCLUDE dependency detection still checks the old names in the original index defs (pkg/sql/plan/build_alter_table.go, pkg/sql/plan/ivfflat_include_alter.go). That can skip rebuilding an affected INCLUDE index on rename/change.
  3. In mode=include, multi-round fallback now moves across non-overlapping centroid slices even when residual filters remain outside ivf_search (pkg/sql/plan/apply_indices_ivfflat.go, pkg/vectorindex/ivfflat/search.go). If the outer residual filters reject rows from an early slice, closer valid rows from that same slice are never revisited, so results can be wrong.
  4. IVF indexes are now rewritten on every UPDATE because ivfNeedsRewrite := catalog.IsIvfIndexAlgo(indexdef.IndexAlgo) makes the skip guard impossible to hit (pkg/sql/plan/build_dml_util.go). That looks like a write-amplification regression for updates that do not touch IVF-relevant columns.

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XXL Denotes a PR that changes 2000+ lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants