Skip to content

feat: add SpillStore trait with local-disk implementation#7311

Open
wjones127 wants to merge 1 commit into
lance-format:mainfrom
wjones127:worktree-piped-wiggling-metcalfe
Open

feat: add SpillStore trait with local-disk implementation#7311
wjones127 wants to merge 1 commit into
lance-format:mainfrom
wjones127:worktree-piped-wiggling-metcalfe

Conversation

@wjones127

@wjones127 wjones127 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a generic SpillStore — reclaimable RAII scratch storage for intermediate state that overflows memory (e.g. index-build posting lists, shuffle runs, BTree pages). Mechanism only; consumer migration (IVF shuffler) is a follow-up.

  • SpillStore / SpillFile (lance-io): create_spill_file() vends a write-once RAII handle whose writer() / reader() hand back Box<dyn Writer> / Box<dyn Reader>, so callers feed spill files straight into FileWriter::try_new and a v2 FileReader without leaking an ObjectStore + path. Dropping the handle deletes the file and releases its bytes back to the store's budget.
  • LocalSpillStore: writes to an OS temp directory; with_cap enforces an optional byte budget shared across all handles, returning a typed Error::DiskCapExceeded instead of silently filling the disk. Enforcement lives entirely in the spill store — the spill file decorates the writer with a QuotaWriter (reserve-on-write, release-on-drop-by-stat) rather than threading a field through ObjectStore and every provider, so it works for any backend the store opens.
  • From<io::Error> recovers a wrapped lance Error, so typed errors such as DiskCapExceeded survive the AsyncWrite boundary.
  • ScanScheduler::open_reader builds a FileScheduler over an already-open Reader (no path/size lookup), bridging a bare spill reader into the v2 reader path.
  • Session gains a spill_store field (default: uncapped LocalSpillStore), a with_spill_store() builder for injection, and a spill_store() accessor.

Closes #7300

🤖 Generated with Claude Code

@github-actions github-actions Bot added the enhancement New feature or request label Jun 17, 2026
@github-actions github-actions Bot added the A-encoding Encoding, IO, file reader/writer label Jun 17, 2026
@wjones127 wjones127 force-pushed the worktree-piped-wiggling-metcalfe branch from 2a5a2b4 to ad19afe Compare June 17, 2026 20:56
Adds a `SpillStore` trait on `Session` providing uniform, reclaimable
scratch space for intermediate state that overflows memory (e.g. index
build posting lists, shuffle runs, BTree pages).

- `SpillStore` / `SpillFile` (lance-io): `create_spill_file()` vends a
  RAII handle; `writer()` / `reader()` hand back `Box<dyn Writer>` /
  `Box<dyn Reader>` so callers feed spill files directly into
  `FileWriter::try_new` and a v2 `FileReader` without leaking an
  `ObjectStore` + path. The file is deleted on drop and its bytes are
  released back to the store's usage counter.
- `LocalSpillStore`: writes to an OS temp directory; optionally enforces
  a byte cap. Enforcement lives entirely in the spill store: the spill
  file decorates the writer with a quota-enforcing `QuotaWriter`
  (reserve-on-write, release-on-drop-by-stat) rather than threading a
  field through `ObjectStore` and every provider, so it works for any
  backend the store opens.
- `From<io::Error>` recovers a wrapped lance `Error`, so typed errors
  such as `DiskCapExceeded` survive the `AsyncWrite` boundary.
- `ScanScheduler::open_reader` builds a `FileScheduler` over an
  already-open `Reader` (no path/size lookup).
- `Session` gains a `spill_store` field (defaults to uncapped
  `LocalSpillStore`), a `with_spill_store()` builder, and a
  `spill_store()` accessor so callers and tests can inject alternatives.

Mechanism only; consumer migration (IVF shuffler) is a follow-up.

Closes lance-format#7300

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@wjones127 wjones127 force-pushed the worktree-piped-wiggling-metcalfe branch from ad19afe to b2c0c08 Compare June 17, 2026 21:10
@wjones127 wjones127 marked this pull request as ready for review June 17, 2026 22:01
@wjones127 wjones127 requested a review from westonpace June 17, 2026 22:01
@codecov

codecov Bot commented Jun 17, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 88.92508% with 34 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-io/src/spill.rs 85.77% 23 Missing and 8 partials ⚠️
rust/lance-core/src/error.rs 93.10% 1 Missing and 1 partial ⚠️
rust/lance/src/session.rs 96.96% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@westonpace westonpace left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nits

//! called only once, so the bytes reserved while writing match the file size
//! released on drop. Two minor inexactnesses are not engineered around: a write
//! aborted at the cap leaks its reservation until the store is dropped, and a
//! file whose size cannot be stat-ed on drop is not released.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "a file whose size cannot be stat-ed on drop" mean? I guess it's not super important.

Comment on lines +64 to +77
pub trait SpillFile: Send + Sync {
/// Open a writer over this spill file.
///
/// For a capped store, writes that would exceed the cap fail with
/// [`lance_core::Error::DiskCapExceeded`]. Spill files are write-once:
/// implementations may reject a second call (the [`LocalSpillStore`] impl
/// returns [`lance_core::Error::invalid_input`]).
async fn writer(&self) -> Result<Box<dyn Writer>>;

/// Open a reader over this spill file.
///
/// The data must have been fully written (the writer shut down) first.
async fn reader(&self) -> Result<Box<dyn Reader>>;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not super important but it seems there is an implicit lifecycle here that could be captured at compile time. The file must be fully written before it can be opened for reads and the writer can only be obtained once.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I might refactor based on that.

/// [`SpillStore::create_spill_file`], allowing implementations not backed by a
/// local file (e.g. in-memory buffers, remote object stores).
#[async_trait]
pub trait SpillFile: Send + Sync {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a trait and not a struct? Are you expecting multiple impls?

Comment on lines +116 to +130
loop {
let current = self.used.load(Ordering::Relaxed);
let next = current.saturating_add(n);
if next > self.cap_bytes {
return Err(Error::disk_cap_exceeded(self.cap_bytes, current));
}
if self
.used
.compare_exchange(current, next, Ordering::Relaxed, Ordering::Relaxed)
.is_ok()
{
return Ok(());
}
// Another thread won the CAS — retry.
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal nit but anytime I start writing CAS loops I give up and just use a mutex 😆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-encoding Encoding, IO, file reader/writer enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add SpillStore trait with local-disk implementation

2 participants