Minor: improve PageStore docs with a temp-file spilling example #10074
alamb wants to merge 1 commit into
FYI @adriangb
adriangb left a comment
Thanks @alamb !
Do you think it'd be better to export this as public code instead of a docstring? Then the docstring could show using the code and the code would be a good example that we can add tests to, etc.
The only other note I'd add for the example is to write to a file instead of an in-memory buffer. The destination being memory makes the argument for spilling buffers to disk weaker IMO.
(btw I can't approve this, I'm not a committer)
I am torn about this -- the example implementation is really quite bad (does many small I/Os, no buffering or prefetching, etc.) so if we included it with arrow-rs I think people would have a crappy experience. With an example at least the crappy experience would be their own code (that they copy pasted...). But now that I write that it seems a poor justification 🤔
I don't quite follow this question. The example does write to a file 🤔
I am referring to this part:
/// let mut writer =
///     ArrowWriter::try_new_with_options(&mut buffer, to_write.schema(), options).unwrap();
(the actual bytes are still written to memory)
I agree that if a correct implementation using files is complex (because of APIs, buffering, etc.) then providing a bad one is not helpful. But if it were easy to make a correct implementation, maybe providing it is better than providing an incomplete example?
Which issue does this PR close?
Rationale for this change
The original with_page_store_factory example used a HashMap-backed store with deliberately sparse keys to demonstrate that the writer treats PageKeys as opaque. I think that was more of a unit test -- but it made the example longer and harder to follow. A temp-file-backed store is both simpler to read and much closer to what I think users would actually build, since spilling pages off the heap is the whole point of the API.
What changes are included in this PR?
Replaces the doctest with a TempFilePageStore (one temp file per column chunk: put appends the page, take seeks it back) and reflows the surrounding prose. Also tidies the PageStore trait docs to link out to the example. This is documentation only: no code or behavior changes.
Are these changes tested?
Yes — the example is a runnable doctest that writes a record batch through the spilling store and asserts the round-tripped data matches.
Are there any user-facing changes?
Documentation only; no public API or behavior changes.
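
For readers skimming this PR, the append-then-seek idea described under "What changes are included in this PR?" can be sketched with the standard library alone. This is only an illustration under stated assumptions: the SpillStore trait, SpillKey struct, and TempFileSpillStore type below are hypothetical stand-ins invented for this sketch and are not the parquet crate's PageStore, PageKey, or the doctest added by this PR. Like the example discussed above, it does one unbuffered I/O per page and makes no attempt at prefetching.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::PathBuf;

/// Hypothetical opaque handle returned by `put` (not the parquet crate's `PageKey`).
#[derive(Debug, Clone, Copy)]
pub struct SpillKey {
    offset: u64,
    len: usize,
}

/// Hypothetical stand-in for a page store; one instance per column chunk.
pub trait SpillStore {
    /// Store the page bytes and return a key that can later retrieve them.
    fn put(&mut self, page: &[u8]) -> io::Result<SpillKey>;
    /// Read back the bytes previously stored under `key`.
    fn take(&mut self, key: SpillKey) -> io::Result<Vec<u8>>;
}

/// Spills pages to a single file in the system temp directory.
pub struct TempFileSpillStore {
    file: File,
    path: PathBuf,
    end: u64, // offset at which the next page will be appended
}

impl TempFileSpillStore {
    /// `name` is an illustrative label used only to build the file name.
    pub fn new(name: &str) -> io::Result<Self> {
        let path = std::env::temp_dir()
            .join(format!("{}-{}.spill", name, std::process::id()));
        let file = OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .truncate(true)
            .open(&path)?;
        Ok(Self { file, path, end: 0 })
    }
}

impl SpillStore for TempFileSpillStore {
    fn put(&mut self, page: &[u8]) -> io::Result<SpillKey> {
        // A previous `take` may have moved the cursor, so seek to the end first.
        self.file.seek(SeekFrom::Start(self.end))?;
        self.file.write_all(page)?;
        let key = SpillKey { offset: self.end, len: page.len() };
        self.end += page.len() as u64;
        Ok(key)
    }

    fn take(&mut self, key: SpillKey) -> io::Result<Vec<u8>> {
        // Seek back to where the page was appended and read exactly its length.
        self.file.seek(SeekFrom::Start(key.offset))?;
        let mut buf = vec![0u8; key.len];
        self.file.read_exact(&mut buf)?;
        Ok(buf)
    }
}

impl Drop for TempFileSpillStore {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored in this sketch.
        let _ = fs::remove_file(&self.path);
    }
}
```

Pages can be taken back in any order because each key records its own offset and length; the caller never needs to interpret the key. A production version would likely want buffered writes and a safer temp-file strategy than a predictable name in the temp directory.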