Skip to content

Minor: improve PageStore docs with a temp-file spilling example#10074

Open
alamb wants to merge 1 commit into
apache:mainfrom
alamb:page-store-tempfile-doc-example
Open

Minor: improve PageStore docs with a temp-file spilling example#10074
alamb wants to merge 1 commit into
apache:mainfrom
alamb:page-store-tempfile-doc-example

Conversation

@alamb

@alamb alamb commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

The original with_page_store_factory example used a HashMap-backed store with deliberately sparse keys to demonstrate that the writer treats PageKeys as opaque

I think that was more of a unit test -- but it made the example longer and harder to follow. A temp-file-backed store is both simpler to read and much closer to what I think users would actually build, since spilling pages off the heap is the whole point of the API.

What changes are included in this PR?

Replaces the doctest with a TempFilePageStore (one temp file per column chunk: put appends the page, take seeks it back) and reflows the surrounding prose. Also tidies the PageStore trait docs to link out to the example. This is documentation only — no code or behavior changes.

Are these changes tested?

Yes — the example is a runnable doctest that writes a record batch through the spilling store and asserts the round-tripped data matches.

Are there any user-facing changes?

Documentation only; no public API or behavior changes.

@github-actions github-actions Bot added the parquet Changes to the parquet crate label Jun 4, 2026
@alamb alamb added the documentation Improvements or additions to documentation label Jun 4, 2026
@alamb

alamb commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

FYI @adriangb

@adriangb adriangb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb !

Do you think it'd be better to export this as public code instead of a docstring? Then the docstring could show using the code and the code would be a good example that we can add tests to, etc.

The only other note I'd add for the example is to write to a file instead of an in-memory buffer. The destination being memory makes the argument for spilling buffers to disk weaker IMO.

@adriangb

adriangb commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

(btw I can't approve this, i'm not a committer)

@alamb

alamb commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @alamb !

Do you think it'd be better to export this as public code instead of a docstring? Then the docstring could show using the code and the code would be a good example that we can add tests to, etc.

I am torn about this -- the example implementation is really quite bad (does many small I/Os, no buffering or prefetching, etc) so if we included it with arrow-rs I think people would have a crappy experience. WIth an example at least the crappy experience would be their own code (that they copy pasted...)

But now that I write that it seems a poor justification 🤔

The only other note I'd add for the example is to write to a file instead of an in-memory buffer. The destination being memory makes the argument for spilling buffers to disk weaker IMO.

I don't quite follow this question. The example does write to a file

    ///     file: File,

🤔

@adriangb

adriangb commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

I am referring to this part:

    /// let mut writer =
    ///     ArrowWriter::try_new_with_options(&mut buffer, to_write.schema(), options).unwrap();

(the actual bytes are still written to memory)

@adriangb

adriangb commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

I agree if a correct implementation using files is complex (either because of APIs, buffering, etc.) then providing a bad one is not helpful. But if it were easy to make a correct implementation maybe providing it is better than providing an incomplete example?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants