feat(ivfflat): add INCLUDE column support for IVF indexes #24168
iamlinjunhong wants to merge 8 commits into matrixorigin:main
cpegeric left a comment
Please also make the HNSW index work with INCLUDE, and share as much of the code as you can.
XuPeng-SH left a comment
Thanks for the large IVF INCLUDE-column slice — the direction makes sense, but I found a few correctness issues that should be fixed before merge:
- `ivf_search` now depends on `RuntimeFilterSpecs` and `IndexReaderParam`, but the `vm.TableFunction` clone / remote encode-decode paths do not preserve them (`pkg/sql/compile/operator.go`, `pkg/sql/compile/remoterun.go`). That can break Bloom-filter probe state and candidate budgeting on parallel / remote execution.
- `ALTER TABLE ... CHANGE COLUMN` only records the new column name in `affectedCols`, while IVF INCLUDE dependency detection still checks the old names in the original index defs (`pkg/sql/plan/build_alter_table.go`, `pkg/sql/plan/ivfflat_include_alter.go`). That can skip rebuilding an affected INCLUDE index on rename/change.
- In `mode=include`, multi-round fallback now moves across non-overlapping centroid slices even when residual filters remain outside `ivf_search` (`pkg/sql/plan/apply_indices_ivfflat.go`, `pkg/vectorindex/ivfflat/search.go`). If the outer residual filters reject rows from an early slice, closer valid rows from that same slice are never revisited, so results can be wrong.
- IVF indexes are now rewritten on every UPDATE because `ivfNeedsRewrite := catalog.IsIvfIndexAlgo(indexdef.IndexAlgo)` makes the skip guard impossible to hit (`pkg/sql/plan/build_dml_util.go`). That looks like a write-amplification regression for updates that do not touch IVF-relevant columns.
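For the last point, one possible shape for the skip guard is a column-overlap check. The following is only a minimal sketch under stated assumptions: `ivfIndexDef` and `needsRewrite` are hypothetical names, not the code in `pkg/sql/plan/build_dml_util.go`, and the real guard may need further conditions (for example, updates that touch the primary key).

```go
package main

import "fmt"

// ivfIndexDef is a hypothetical, simplified stand-in for an IVF index
// definition; it is not the catalog type used by the PR.
type ivfIndexDef struct {
	Name        string
	KeyCols     []string // vector column(s) the index is built on
	IncludeCols []string // INCLUDE columns stored alongside the entries
}

// needsRewrite reports whether an UPDATE touching updatedCols overlaps the
// index's key or INCLUDE columns. Column names are compared exactly.
func needsRewrite(def ivfIndexDef, updatedCols []string) bool {
	touched := make(map[string]struct{}, len(updatedCols))
	for _, c := range updatedCols {
		touched[c] = struct{}{}
	}
	for _, c := range def.KeyCols {
		if _, ok := touched[c]; ok {
			return true
		}
	}
	for _, c := range def.IncludeCols {
		if _, ok := touched[c]; ok {
			return true
		}
	}
	return false
}

func main() {
	def := ivfIndexDef{
		Name:        "idx_emb",
		KeyCols:     []string{"embedding"},
		IncludeCols: []string{"category"},
	}
	// An update to an unrelated column should not force a rewrite.
	fmt.Println(needsRewrite(def, []string{"title"}))
	// An update to an INCLUDE column should.
	fmt.Println(needsRewrite(def, []string{"category"}))
}
```

With a guard of this kind, only updates that change a vector or INCLUDE column pay the index-maintenance cost.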
done
What type of PR is this?
Which issue(s) this PR fixes:
issue #24167
What this PR does / why we need it:
Add INCLUDE column support across IVF DDL, entries-table maintenance, planner rewrites, and ivf_search so covered vector queries can push include predicates and avoid base-table joins when mode=include.
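To illustrate what pushing include predicates could look like, here is a rough sketch that splits filters into those answerable from INCLUDE columns alone and residual filters that still need the base table. The `predicate` type and `splitByInclude` function are hypothetical and heavily simplified; the planner rewrite in this PR works on real expression trees and may apply different rules.

```go
package main

import "fmt"

// predicate is a hypothetical, simplified filter: an expression string plus
// the columns it references.
type predicate struct {
	Expr string
	Cols []string
}

// splitByInclude partitions filters into those that reference only INCLUDE
// columns (candidates for pushdown into the index search) and residual
// filters that still need other columns. Filters that reference no columns
// are kept as residual in this sketch.
func splitByInclude(preds []predicate, includeCols []string) (pushed, residual []predicate) {
	inc := make(map[string]bool, len(includeCols))
	for _, c := range includeCols {
		inc[c] = true
	}
	for _, p := range preds {
		covered := len(p.Cols) > 0
		for _, c := range p.Cols {
			if !inc[c] {
				covered = false
				break
			}
		}
		if covered {
			pushed = append(pushed, p)
		} else {
			residual = append(residual, p)
		}
	}
	return pushed, residual
}

func main() {
	preds := []predicate{
		{Expr: "category = 'book'", Cols: []string{"category"}},
		{Expr: "price < stock", Cols: []string{"price", "stock"}},
	}
	pushed, residual := splitByInclude(preds, []string{"category", "price"})
	fmt.Println("pushed:", len(pushed), "residual:", len(residual))
}
```

In this simplified model, a query can skip the base-table join only when the residual list is empty and every projected column is also available from the index.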
Keep mode=post and mode=pre on their original single-round path, switch mode=include fallback from cumulative bucket expansion to non-overlapping bucket slices, reset per-input search round state, and collapse EXPLAIN ANALYZE background ivf_search plans by default while keeping capped verbose expansion.
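The difference between cumulative bucket expansion and non-overlapping bucket slices can be sketched as follows. This assumes centroids are ranked by distance to the query and that each round adds a fixed `nprobe` buckets; the function names and the fixed-width schedule are assumptions for illustration, not the PR's actual search code.

```go
package main

import "fmt"

// minInt returns the smaller of two ints.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// cumulativeRange returns the half-open range of centroid ranks probed in a
// given round when each round re-scans everything from rank 0.
func cumulativeRange(round, nprobe, nlist int) (start, end int) {
	return 0, minInt((round+1)*nprobe, nlist)
}

// sliceRange returns the half-open range of centroid ranks probed in a given
// round when each round scans only buckets not visited before.
func sliceRange(round, nprobe, nlist int) (start, end int) {
	return minInt(round*nprobe, nlist), minInt((round+1)*nprobe, nlist)
}

func main() {
	const nprobe, nlist = 4, 10
	for round := 0; round < 3; round++ {
		cs, ce := cumulativeRange(round, nprobe, nlist)
		ss, se := sliceRange(round, nprobe, nlist)
		fmt.Printf("round %d: cumulative [%d,%d) slice [%d,%d)\n", round, cs, ce, ss, se)
	}
}
```

Under these assumptions each bucket is scanned at most once with slices, which avoids repeated work but also means rows from an earlier slice are not seen again in later rounds.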
Add parser, planner, ALTER, explain, unit, and distributed SQL coverage for the new INCLUDE behavior.