feat(mem_wal): warm flushed generations into shared caches before query #7284
Open
hamersaw wants to merge 3 commits into
Codecov Report: ❌ Patch coverage is …
… seam

Decompose flushed-generation caching into two roles so warming can be plugged in by the consumer without lance owning it.

- `DatasetCache` trait (impl'd by the by-path `FlushedMemTableCache`): `get_or_open` + a now self-contained `get_or_build_pk_hashes` (opens + scans internally, so an out-of-crate warmer needs none of lance's PK-hashing internals) + `retain_paths`.
- `GenerationWarmer` trait: the seam lance fires; the consumer (e.g. the WAL pod) implements it. No lance impl ships.
- Two warm seams wired to `Option<Arc<dyn GenerationWarmer>>`, `None` by default: pre-commit in `MemTableFlusher` (via `ShardWriterConfig.warmer`) and warm-on-open in `open_flushed_dataset` (threaded through the LSM scanner/planners). Both no-op without a warmer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
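A minimal sketch of the pre-commit seam, assuming simplified synchronous signatures. The real `GenerationWarmer` method name, signature, and error type are not shown in this PR text, so `warm(&self, &str) -> Result<(), String>`, `WriterConfig`, `flush_and_commit`, and `RecordingWarmer` are all hypothetical stand-ins. It illustrates two properties: with the default `None` the seam does nothing, and (per the Summary, which describes the pre-commit warm as best-effort) a failing warmer is surfaced for logging but never prevents the commit.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, synchronous stand-in for the PR's `GenerationWarmer` trait.
/// The real method name, signature, and error type are assumptions here.
pub trait GenerationWarmer: Send + Sync {
    fn warm(&self, generation_path: &str) -> Result<(), String>;
}

/// Hypothetical mirror of the `warmer` field on `ShardWriterConfig`.
/// `Default` yields `warmer: None`, i.e. no warming.
#[derive(Default, Clone)]
pub struct WriterConfig {
    pub warmer: Option<Arc<dyn GenerationWarmer>>,
}

/// Hypothetical flush step: fire the warm seam pre-commit, then always commit.
/// Returns the warm error (if any) so the caller can log it.
pub fn flush_and_commit(
    config: &WriterConfig,
    generation_path: &str,
    manifest: &mut Vec<String>,
) -> Option<String> {
    // No-op when no warmer is configured (the default).
    let warm_err = config
        .warmer
        .as_ref()
        .and_then(|w| w.warm(generation_path).err());
    // Best-effort: a failed warm never blocks publishing the generation.
    manifest.push(generation_path.to_string());
    warm_err
}

/// Example consumer-side warmer that records each path it is asked to warm.
#[derive(Default)]
pub struct RecordingWarmer {
    pub seen: Mutex<Vec<String>>,
    pub fail: bool,
}

impl GenerationWarmer for RecordingWarmer {
    fn warm(&self, generation_path: &str) -> Result<(), String> {
        self.seen.lock().unwrap().push(generation_path.to_string());
        if self.fail {
            Err("simulated warm failure".to_string())
        } else {
            Ok(())
        }
    }
}
```

Keeping the warmer behind `Option<Arc<dyn ...>>` means lance itself never has to depend on the consumer's cache stack; the consumer injects behavior only through the config.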
…tCache

Change the LSM scanner/planners, block-list, `open_flushed_dataset`, and `prewarm_mem_wal` to take `Arc<dyn DatasetCache>` instead of the concrete `Arc<FlushedMemTableCache>`, and re-export `scan_pk_hashes`. This lets a consumer (the sophon WAL node) supply its own `DatasetCache` implementation, e.g. one that injects a read-through object-store byte-cache wrapper at open, instead of being locked to the built-in cache. `FlushedMemTableCache` remains the default impl; callers pass it by value and it coerces.

No behavior change: the default path still uses `FlushedMemTableCache`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
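The "callers pass it by value and it coerces" point can be illustrated with a minimal sketch. `ByPathCache`, `CountingCache`, and `plan_scan` are hypothetical, and `DatasetCache` is reduced to a synchronous `get_or_open` returning an `Arc<String>` in place of a real dataset handle. Passing an `Arc<ByPathCache>` to a parameter of type `Arc<dyn DatasetCache>` compiles through Rust's unsized coercion, so existing call sites need no explicit cast.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical, synchronous stand-in for the PR's `DatasetCache` trait.
/// Only `get_or_open` is modeled; an `Arc<String>` stands in for a dataset handle.
pub trait DatasetCache: Send + Sync {
    fn get_or_open(&self, path: &str) -> Arc<String>;
}

/// Default by-path cache, playing the role of `FlushedMemTableCache`.
#[derive(Default)]
pub struct ByPathCache {
    opened: Mutex<HashMap<String, Arc<String>>>,
    pub opens: AtomicUsize,
}

impl DatasetCache for ByPathCache {
    fn get_or_open(&self, path: &str) -> Arc<String> {
        let mut opened = self.opened.lock().unwrap();
        opened
            .entry(path.to_string())
            .or_insert_with(|| {
                // Simulated open: runs only on a cache miss.
                self.opens.fetch_add(1, Ordering::SeqCst);
                Arc::new(format!("dataset@{path}"))
            })
            .clone()
    }
}

/// Consumer-supplied cache (hypothetical). A real one might inject a
/// read-through object-store wrapper at open; this one just counts lookups.
pub struct CountingCache {
    pub inner: ByPathCache,
    pub lookups: AtomicUsize,
}

impl DatasetCache for CountingCache {
    fn get_or_open(&self, path: &str) -> Arc<String> {
        self.lookups.fetch_add(1, Ordering::SeqCst);
        self.inner.get_or_open(path)
    }
}

/// Hypothetical planner entry point: depends on the trait object only.
pub fn plan_scan(cache: Arc<dyn DatasetCache>, generations: &[&str]) -> Vec<Arc<String>> {
    generations.iter().map(|p| cache.get_or_open(p)).collect()
}
```

Taking `Arc<dyn DatasetCache>` adds one dynamic dispatch per lookup, likely negligible next to opening a dataset, and keeps the scanner/planner signatures free of a generic cache parameter.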
# Conflicts:
#   rust/lance/src/dataset/mem_wal/scanner/builder.rs
#   rust/lance/src/dataset/mem_wal/scanner/planner.rs
Summary
Evolve `FlushedMemTableCache` into the unified warm/open interface for mem_wal flushed generations, and populate the caches before a generation is queryable so the first query sees zero cold reads.

- `FlushedMemTableCache` now owns a required `Session` (the index `CacheBackend` seam) and an optional read-through `WrappingObjectStore` (page cache), threading both into every open. `get_or_open(path)` drops its per-call session arg.
- `warm(path, pk_columns)`: open + `prewarm_all_indexes` (FTS) + `get_or_build_pk_hashes` (vector block-list), bounded by a semaphore and idempotent via a `warmed` gate.
- `open_flushed_dataset` fires a warm-on-open backstop.
- `retain_paths` is now async and actively evicts retired generations' index objects via the new `Session::invalidate_index_prefix`; the byte cache is left to LRU.
- `MemTableFlusher` warms each generation pre-commit, best-effort (logged on error, never blocks `update_manifest`), threaded via `ShardWriterConfig.flushed_cache`.

This is the Lance-side building block for WAL-pod flushed-generation caching (consumed by sophon, which supplies the backed `Session` + read-through pool).

Test plan
- `cargo test -p lance --lib mem_wal::scanner::flushed_cache` (7 tests, incl. warm/idempotency/pk-hash/retain): pass
- `cargo test -p lance --lib mem_wal::memtable::flush` (8 tests): pass
- `cargo clippy -p lance --tests --benches`: clean
- `cargo fmt --all`

🤖 Generated with Claude Code
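The `warm(path, pk_columns)` contract from the Summary (bounded by a semaphore, idempotent via a `warmed` gate) and the prefix-based eviction in `retain_paths` can be sketched in plain `std` Rust. This is a synchronous model under stated assumptions, not the PR's async implementation: `WarmCache`, `Semaphore`, `Permit`, `warm_runs`, and the `"{path}/..."` key layout are hypothetical, and the open/prewarm/PK-hash steps are simulated by inserting placeholder entries into an in-memory index map. The byte cache, which the PR leaves to LRU, is not modeled.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Condvar, Mutex};

/// Minimal counting semaphore built on std (std ships none).
pub struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

/// RAII permit: returns its slot to the semaphore when dropped.
pub struct Permit<'a>(&'a Semaphore);

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), cv: Condvar::new() }
    }

    pub fn acquire(&self) -> Permit<'_> {
        let mut available = self.permits.lock().unwrap();
        while *available == 0 {
            available = self.cv.wait(available).unwrap();
        }
        *available -= 1;
        Permit(self)
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.0.permits.lock().unwrap() += 1;
        self.0.cv.notify_one();
    }
}

/// Hypothetical warm cache; `index` maps "{path}/..." object keys to cached bytes.
pub struct WarmCache {
    limit: Semaphore,
    warmed: Mutex<HashSet<String>>,
    pub index: Mutex<HashMap<String, Vec<u8>>>,
    pub warm_runs: Mutex<usize>,
}

impl WarmCache {
    pub fn new(max_concurrent_warms: usize) -> Self {
        Self {
            limit: Semaphore::new(max_concurrent_warms),
            warmed: Mutex::new(HashSet::new()),
            index: Mutex::new(HashMap::new()),
            warm_runs: Mutex::new(0),
        }
    }

    /// Simulated open + index prewarm + PK-hash build, at most once per path.
    pub fn warm(&self, path: &str, pk_columns: &[&str]) {
        // Idempotency gate: only the first caller for a path does the work.
        let first = self.warmed.lock().unwrap().insert(path.to_string());
        if !first {
            return;
        }
        // Bound how many warms run at once; the permit is released on drop.
        let _permit = self.limit.acquire();
        // Simulated expensive work, done without holding the index lock.
        let mut entries = vec![(format!("{path}/fts"), vec![1u8])];
        for col in pk_columns {
            entries.push((format!("{path}/pk_hashes/{col}"), vec![2u8]));
        }
        self.index.lock().unwrap().extend(entries);
        *self.warm_runs.lock().unwrap() += 1;
    }

    /// Forget retired generations and evict their index entries by key prefix.
    pub fn retain_paths(&self, live: &[&str]) {
        let live: HashSet<&str> = live.iter().copied().collect();
        let mut warmed = self.warmed.lock().unwrap();
        let retired: Vec<String> = warmed
            .iter()
            .filter(|p| !live.contains(p.as_str()))
            .cloned()
            .collect();
        let mut index = self.index.lock().unwrap();
        for path in retired {
            warmed.remove(&path);
            // Trailing slash so retiring "gen-1" does not evict "gen-10".
            let prefix = format!("{path}/");
            index.retain(|key, _| !key.starts_with(&prefix));
        }
    }
}
```

In this sketch the gate claims a path before the work starts, so a concurrent second caller returns immediately instead of waiting for the in-flight warm; whether the PR's `warmed` gate waits is not stated. Eviction matches on `"{path}/"` with a trailing slash so that one generation's prefix cannot match another generation whose name merely starts with it.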