feat(mem_wal): warm flushed generations into shared caches before query #7284

Open
hamersaw wants to merge 3 commits into lance-format:main from hamersaw:perf/wal-cache

@hamersaw

Contributor

Summary

Evolve `FlushedMemTableCache` into the unified warm/open interface for mem_wal flushed generations, and populate the caches before a generation is queryable so the first query sees zero cold reads.

  • `FlushedMemTableCache` now owns a required `Session` (the index `CacheBackend` seam) and an optional read-through `WrappingObjectStore` (page cache), threading both into every open. `get_or_open(path)` drops its per-call session arg.
  • New `warm(path, pk_columns)`: open + `prewarm_all_indexes` (FTS) + `get_or_build_pk_hashes` (vector block-list), bounded by a semaphore and idempotent via a warmed gate. `open_flushed_dataset` fires a warm-on-open backstop.
  • `retain_paths` is now async and actively evicts retired generations' index objects via the new `Session::invalidate_index_prefix`; the byte cache is left to LRU.
  • `MemTableFlusher` warms each generation pre-commit, best-effort (logged on error, never blocks `update_manifest`), threaded via `ShardWriterConfig.flushed_cache`.

This is the Lance-side building block for WAL-pod flushed-generation caching (consumed by sophon, which supplies the backed Session + read-through pool).
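
The warm-once behavior described in the summary can be sketched with a small gate. This is a minimal, synchronous illustration under stated assumptions, not the code in this PR: `WarmGate`, its closure-based `warm`, and the list returned by `retain_paths` are hypothetical. The real `warm(path, pk_columns)` is presumably async and bounded by a semaphore, which is omitted here, and the real eviction goes through `Session::invalidate_index_prefix`.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical stand-in for the "warmed gate"; not lance's `FlushedMemTableCache`.
pub struct WarmGate {
    warmed: Mutex<HashSet<String>>,
}

impl WarmGate {
    pub fn new() -> Self {
        Self { warmed: Mutex::new(HashSet::new()) }
    }

    /// Runs `warm_fn` for `path` unless the path is already marked warmed.
    /// Returns `Ok(true)` if this call did the warming, `Ok(false)` if it was skipped.
    /// Note: this sketch does not deduplicate concurrent in-flight warms of the same path.
    pub fn warm<F>(&self, path: &str, warm_fn: F) -> Result<bool, String>
    where
        F: FnOnce(&str) -> Result<(), String>,
    {
        if self.warmed.lock().unwrap().contains(path) {
            return Ok(false);
        }
        warm_fn(path)?;
        // Mark as warmed only after success, so a failed warm can be retried later.
        self.warmed.lock().unwrap().insert(path.to_string());
        Ok(true)
    }

    /// Keeps only the gate entries for `live` paths and returns the retired ones
    /// (sorted), which a caller could then evict from its index cache.
    pub fn retain_paths(&self, live: &[&str]) -> Vec<String> {
        let mut warmed = self.warmed.lock().unwrap();
        let mut evicted: Vec<String> = warmed
            .iter()
            .filter(|p| !live.contains(&p.as_str()))
            .cloned()
            .collect();
        warmed.retain(|p| live.contains(&p.as_str()));
        evicted.sort();
        evicted
    }
}
```

Marking a path as warmed only after the warm succeeds keeps the gate compatible with best-effort callers: an error is reported, and a later warm-on-open can still try again.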

Test plan

  • cargo test -p lance --lib mem_wal::scanner::flushed_cache (7 tests, incl. warm/idempotency/pk-hash/retain) — pass
  • cargo test -p lance --lib mem_wal::memtable::flush (8 tests) — pass
  • cargo clippy -p lance --tests --benches — clean
  • cargo fmt --all

🤖 Generated with Claude Code

@github-actions github-actions Bot added the enhancement New feature or request label Jun 15, 2026
… seam

Decompose flushed-generation caching into two roles so warming can be
plugged in by the consumer without lance owning it.

- `DatasetCache` trait (impl'd by the by-path `FlushedMemTableCache`):
  `get_or_open` + a now self-contained `get_or_build_pk_hashes`
  (opens + scans internally, so an out-of-crate warmer needs none of
  lance's PK-hashing internals) + `retain_paths`.
- `GenerationWarmer` trait: the seam lance fires; the consumer (e.g. the
  WAL pod) implements it. No lance impl ships.
- Two warm seams wired to `Option<Arc<dyn GenerationWarmer>>`, `None` by
  default: pre-commit in `MemTableFlusher` (via `ShardWriterConfig.warmer`)
  and warm-on-open in `open_flushed_dataset` (threaded through the LSM
  scanner/planners). Both no-op without a warmer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
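
The optional warmer seam from this commit can be illustrated roughly as follows. It is a hedged sketch, not lance's definitions: the trait here is synchronous and takes only a path, while the real `GenerationWarmer` signature is not shown in this PR text. `WriterConfig`, `flush_generation`, and `CountingWarmer` are hypothetical names standing in for `ShardWriterConfig`, the flusher's pre-commit step, and a consumer-side implementation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical, synchronous stand-in for the `GenerationWarmer` seam.
pub trait GenerationWarmer: Send + Sync {
    fn warm(&self, path: &str) -> Result<(), String>;
}

/// Hypothetical config mirroring `ShardWriterConfig.warmer`: `None` by default.
#[derive(Default)]
pub struct WriterConfig {
    pub warmer: Option<Arc<dyn GenerationWarmer>>,
}

/// Pre-commit step: warm best-effort, then commit regardless of the outcome.
/// Returns `true` to stand in for a successful manifest update.
pub fn flush_generation(config: &WriterConfig, path: &str) -> bool {
    if let Some(warmer) = &config.warmer {
        if let Err(err) = warmer.warm(path) {
            // A warm failure is logged and never blocks the commit.
            eprintln!("warm failed for {path}: {err}");
        }
    }
    true
}

/// Hypothetical consumer-side warmer that counts calls and can be made to fail.
pub struct CountingWarmer {
    pub calls: AtomicUsize,
    pub fail: bool,
}

impl GenerationWarmer for CountingWarmer {
    fn warm(&self, _path: &str) -> Result<(), String> {
        self.calls.fetch_add(1, Ordering::SeqCst);
        if self.fail {
            Err("simulated warm failure".to_string())
        } else {
            Ok(())
        }
    }
}
```

With the default config the seam is a no-op, and a failing warmer still lets the flush proceed, which matches the "both no-op without a warmer" and best-effort behavior described above.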
…tCache

Change the LSM scanner/planners, block-list, open_flushed_dataset, and
prewarm_mem_wal to take `Arc<dyn DatasetCache>` instead of the concrete
`Arc<FlushedMemTableCache>`, and re-export `scan_pk_hashes`. This lets a
consumer (the sophon WAL node) supply its own DatasetCache implementation
— e.g. one that injects a read-through object-store byte-cache wrapper at
open — instead of being locked to the built-in cache. `FlushedMemTableCache`
remains the default impl; callers pass it by value and it coerces.

No behavior change: the default path still uses FlushedMemTableCache.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
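
A rough sketch of what taking `Arc<dyn DatasetCache>` enables, under loud assumptions: the trait below has a single synchronous method returning a `String`, which is not the real `DatasetCache` surface, and `BuiltInCache`, `ReadThroughCache`, and `plan_scan` are hypothetical names for the default cache, a consumer-supplied wrapper, and a planner entry point.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for the `DatasetCache` trait.
pub trait DatasetCache: Send + Sync {
    fn get_or_open(&self, path: &str) -> String;
}

/// Stand-in for the built-in default implementation.
pub struct BuiltInCache;

impl DatasetCache for BuiltInCache {
    fn get_or_open(&self, path: &str) -> String {
        format!("built-in:{path}")
    }
}

/// Stand-in for a consumer-supplied cache that wraps another cache,
/// e.g. to inject a read-through layer at open time.
pub struct ReadThroughCache {
    pub inner: Arc<dyn DatasetCache>,
}

impl DatasetCache for ReadThroughCache {
    fn get_or_open(&self, path: &str) -> String {
        format!("read-through({})", self.inner.get_or_open(path))
    }
}

/// Stand-in for a planner that accepts any cache implementation.
pub fn plan_scan(cache: Arc<dyn DatasetCache>, path: &str) -> String {
    cache.get_or_open(path)
}
```

Because `Arc<BuiltInCache>` coerces to `Arc<dyn DatasetCache>` at the call site, existing callers can keep passing the concrete cache unchanged, while a consumer can pass its own wrapper to the same function.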
@hamersaw hamersaw marked this pull request as ready for review June 16, 2026 17:36
# Conflicts:
#	rust/lance/src/dataset/mem_wal/scanner/builder.rs
#	rust/lance/src/dataset/mem_wal/scanner/planner.rs