feat(asyncread): new module for windowed reproject over AsyncGeoData #2
Closed
jejjohnson wants to merge 5 commits into
Carve out the non-I/O code in read.py so the upcoming asyncread module can share it. Three new private helpers; no behavior change for existing callers.

- `_window_intersects_data` + `_build_no_intersect_result`: pure-CPU fall-back path used at the head of `read_from_window`. The async sibling will call them first and skip the `await` when the window misses the data.
- `_ReprojectPlan` dataclass + `_reproject_setup` + `_reproject_finalize`: split `read_reproject` into "compute destination grid / allocate / check intersection / detect fast path" (setup, no I/O) and "iterate non-spatial dims + rasterio.warp.reproject + pack GeoTensor" (finalize, no I/O).

`read_reproject` itself becomes a thin orchestrator: call setup, handle the fast-path / non-intersecting early exits, do the single windowed read via `read_from_polygon(..., trigger_load=True)`, then call finalize.

Public surface (`read_from_window`, `read_reproject`, and every wrapper above them) is unchanged. Full suite still green: 926 passed, 26 warnings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address the second round of review on PR spaceml-org#54: the reprojection family (`read_reproject`, `read_reproject_like`, `read_to_crs`, `resize`, `read_from_tile`) and the read-by-window/bounds/polygon/center family now work over an `AsyncGeoData` input without pre-loading the full raster.

`georeader/asyncread.py` exposes async siblings of nine `read.*` functions with identical signatures. Each is `async def` with a single `await` at the I/O boundary; the non-I/O setup, intersection check, dtype handling, and warp loop are shared with the sync path via the private helpers extracted in the previous commit (`_window_intersects_data`, `_build_no_intersect_result`, `_reproject_setup`, `_reproject_finalize`).

Key semantics:

- `read_reproject` and its wrappers stream **only the input window required for the destination grid**: they call `read_from_polygon(..., trigger_load=True)` exactly the same way the sync path does. No full-raster load.
- `read_from_window` honours the same return contract as the sync version: ndarray when `return_only_data=True`, GeoTensor when `trigger_load=True`, and otherwise the unmaterialised view (for callers who want to compose further before awaiting).
- `read_from_tile` checks `inspect.iscoroutinefunction` before delegating to a reader-provided `read_from_tile` fast path, supporting both sync and async overrides.
- The no-intersection branch of `read_from_window` never awaits: it synthesises the padded array / GeoTensor in memory and returns directly.

Also updates the `AsyncGeoTIFFReader` module docstring to point users at `georeader.asyncread` instead of "load-then-post-warp via read.read_reproject_like". The reader itself is unchanged.

Smoke-tested against a local COG fixture: `read_to_crs`, `read_reproject_like`, `read_from_bounds`, and `read_from_tile` all return values matching the sync `RasterioReader` path within 1 DN.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
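The "single `await`, and none at all on a miss" behaviour can be sketched with a toy 1-D reader. Everything here is an assumption made for illustration: `ToyAsyncReader`, its `fetch` method, and the list-based signatures are hypothetical and do not reflect the `AsyncGeoTIFFReader` or `georeader.asyncread` API. Returning the clipped data for a non-boundless intersecting read is likewise a guess at reasonable behaviour.

```python
import asyncio
from typing import List, Optional


class ToyAsyncReader:
    """Hypothetical async reader over an in-memory list; counts fetches."""

    def __init__(self, data: List[int]) -> None:
        self.data = data
        self.fetch_count = 0

    async def fetch(self, start: int, stop: int) -> List[int]:
        self.fetch_count += 1
        await asyncio.sleep(0)  # stand-in for a network round trip
        return self.data[start:stop]


def window_intersects_data(start: int, stop: int, n: int) -> bool:
    """Pure check: does the window [start, stop) overlap the data [0, n)?"""
    return start < stop and start < n and stop > 0


def build_no_intersect_result(start, stop, fill, boundless) -> Optional[List[int]]:
    """Pure fallback: synthetic fill when boundless, otherwise None."""
    return [fill] * (stop - start) if boundless else None


async def read_from_window(reader, start, stop, fill=0, boundless=True):
    n = len(reader.data)
    if not window_intersects_data(start, stop, n):
        # Miss: answered from memory, the reader is never awaited.
        return build_no_intersect_result(start, stop, fill, boundless)
    lo, hi = max(start, 0), min(stop, n)
    core = await reader.fetch(lo, hi)  # the single await at the I/O boundary
    if not boundless:
        return core
    # Pad the parts of the window that fall outside the data.
    return [fill] * (lo - start) + core + [fill] * (stop - hi)
```

Counting calls to `fetch` makes the property checkable: an intersecting read increments `fetch_count` by exactly one, and a non-intersecting read leaves it untouched.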
Add `tests/test_asyncread.py` with 25 tests covering all nine async functions in `georeader.asyncread`. Each test class mirrors its sync sibling in `test_read_windows.py`; each test asserts that the async path (`AsyncGeoTIFFReader` → `asyncread.*`) returns shape + values matching the sync path (`RasterioReader` → `read.*`) on the same on-disk COG fixture.

Coverage by function:

- `read_from_window`: trigger_load → GeoTensor, lazy view, return_only_data → ndarray, boundless no-intersection → synthetic fill, non-boundless no-intersection → None (5 tests).
- `read_from_bounds`: basic, pad_add grows window, cross-CRS bounds (3).
- `read_from_polygon`: basic + cross-CRS polygon (2).
- `read_from_center_coords`: basic parity (1).
- `read_reproject`: full warp parity at EPSG:4326, same-CRS aligned fast path (no warp), non-intersecting → fill (3).
- `read_reproject_like`: template parity + return_only_data (2).
- `read_to_crs`: WGS84 parity, same-CRS short-circuit, Web Mercator (3).
- `resize`: half + double, anti_aliasing=False to keep parity exact (2).
- `read_from_tile`: basic, non-intersecting → None, assert_if_not_intersects, out_shape parity (4).

Tests are `pytest.importorskip`-gated on `async_geotiff` + `obstore`, so they're skipped when the optional `[async]` extra isn't installed. The async fixture uses `@pytest_asyncio.fixture` to satisfy strict mode.

Full suite: 951 passed (926 baseline + 25 new), 26 warnings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
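A parity assertion in the style this PR describes (exact equality on windowed reads, a 1 DN absolute tolerance on reprojected output) might look like the helper below. `assert_parity` is a hypothetical name, not a function from `tests/test_asyncread.py`, and setting `rtol=0.0` is a choice made here so that only the absolute tolerance applies.

```python
import numpy as np


def assert_parity(sync_out, async_out, atol: float = 0.0) -> None:
    """Compare two read results: exact when atol == 0, else within atol."""
    # Cast to float so unsigned integer inputs cannot wrap around on subtraction.
    a = np.asarray(sync_out, dtype=np.float64)
    b = np.asarray(async_out, dtype=np.float64)
    assert a.shape == b.shape, f"shape mismatch: {a.shape} vs {b.shape}"
    if atol == 0.0:
        assert np.array_equal(a, b), "values differ"
    else:
        assert np.allclose(a, b, rtol=0.0, atol=atol), (
            f"values differ by more than {atol}"
        )
```

A windowed-read test would call `assert_parity(sync_result, async_result)`, and a reproject test would pass `atol=1.0`.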
Reflect the new `georeader.asyncread` module in the intro notebook. What changed in the narrative:

- "Using the `read` module with this reader" section (cell 25) becomes "Reading via the `asyncread` module": a one-to-one side-by-side table of `read.*` vs `asyncread.*`, with no caveat (⚠️) or unsupported rows. Every function in the read/reproject family now has an async sibling.
- Patterns 1-3 (cells 26-28) switch from `read.* + view.load()` / `await reader.load() + sync warp` to direct `await asyncread.*` calls. Pattern 3 in particular drops the pre-load step: `asyncread.read_to_crs` and `asyncread.read_reproject_like` stream only the input window required for the destination grid.
- "Bandwidth-conscious tile reads" section (cells 29-30) collapses the manual `mercantile.xy_bounds` → `transform_bounds` → `read_from_bounds` → `read_to_crs` composition into a single `await asyncread.read_from_tile` call. The compositional bytes-over-the-wire argument is preserved in prose (it's still the same plumbing, just hidden behind the async API).
- "What this reader does NOT do" (cell 31) drops the "no on-the-fly CRS warp" and "no on-the-fly resampling" bullets: both are now provided by `asyncread` (the warp loop is shared with the sync path; only the byte-fetch is async).
- The "Mini-solution: warp / reproject after loading" section (former cells 32-34) is removed entirely. The fetch-native-then-warp workaround was the previous review's pain point; the new `asyncread` path is now the canonical pattern and lives in the body above.

Notebook re-executed in-place with the optional `[async]` extra installed, so outputs reflect the new code. Cell count: 39 → 36.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address PR review feedback that the initial docstrings + comments were too
thin. No behaviour change. Full suite remains 951 passed.
### `georeader/asyncread.py`: all 9 functions now have substantive docstrings
Each public function gets:
- A clear one-line summary describing what it does (not "async sibling of X").
- A short prose explanation of the I/O semantics (where the single `await`
lives, when the no-intersection branch skips I/O entirely).
- An `Args:` block annotating only the arguments where the async semantics
differ from the sync sibling, with cross-references to `read_from_window`
for the shared kwargs (`return_only_data`, `trigger_load`, `boundless`).
- A `Returns:` block spelling out the actual return-type matrix (the
view-vs-GeoTensor-vs-ndarray distinction is the most error-prone part
of the API).
- A runnable `Example:` showing the actual `await` pattern.
- A `See Also:` link back to the sync sibling for the full parameter and
example treatment, plus pointers to thinner-wrapper siblings.
`resize` additionally gets a `.. warning::` block flagging that
`anti_aliasing=True` on a lazy async reader causes an eager full-extent
read inside `apply_anti_aliasing`, a real footgun the original docstring
didn't mention.
### `georeader/read.py`: new helpers documented thoroughly
- `_window_intersects_data`, `_build_no_intersect_result`: explain the
pure-CPU no-I/O fallback semantics and why they're shared with asyncread.
- `_ReprojectPlan`: per-attribute docstrings, plus an upfront explanation
of the three-branch decision tree (fast path / non-intersecting / normal).
- `_reproject_setup`: docstring + inline comments on every numbered step.
  The trickiest bits get extra explanation:
  * The aligned-grid detection (why integer offsets mean no warp is needed).
  * The dtype/bool casting decision (why `cast` tracks an `.astype()`
    requirement instead of just being recomputed).
  * The `dst_nodata or fill_value_default` fallback (historical API:
    `dst_nodata=0` means "use the source default", preserved verbatim).
- `_reproject_finalize`: docstring + per-step comments on the warp loop.
  The bool round-trip (`bool → float32 → warp → threshold(0.5) → bool`)
  is the most non-obvious piece and is now fully documented.
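Two of the subtleties listed above can be shown in a few lines of plain Python. This is a sketch under assumptions: `resolve_nodata`, `resample_linear`, and `warp_bool` are hypothetical names, the linear resampler is a toy standing in for `rasterio.warp.reproject`, and the choice of `>=` rather than `>` at the 0.5 threshold is a guess.

```python
from typing import List, Optional


def resolve_nodata(dst_nodata: Optional[float], fill_value_default: float) -> float:
    """`or` treats 0 like None, so dst_nodata=0 falls back to the default."""
    return dst_nodata or fill_value_default


def resample_linear(values: List[float], n_out: int) -> List[float]:
    """Toy 1-D linear resampler standing in for the real warp (n_out >= 2)."""
    n_in = len(values)
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)  # output sample position in input units
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out


def warp_bool(mask: List[bool], n_out: int) -> List[bool]:
    """bool -> float -> resample -> threshold(0.5) -> bool."""
    as_float = [float(v) for v in mask]        # interpolation needs numbers
    warped = resample_linear(as_float, n_out)  # yields fractional values
    return [v >= 0.5 for v in warped]          # snap back to a clean mask
```

Interpolating a boolean mask produces values strictly between 0 and 1 near edges, which is why the threshold step is needed before casting back. The `or` idiom cannot distinguish an explicit `0` from "not given", so a caller who genuinely wants 0 as nodata gets the default instead.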
### Rationale comments on the orchestrators
Both `read.read_reproject` and `asyncread.read_reproject` now have block
comments before each of the three branches explaining what triggers it
and what the orchestrator does. The structures mirror each other and the
comments call that out, making the sync ↔ async correspondence obvious
without diff-hunting.
### Specific complex bits flagged
- `asyncread.read_from_window`: the view-vs-GeoTensor dispatch block now
walks through all three caller intents (eager-shim return, explicit
materialisation request, lazy view for fan-out).
- `asyncread.read_from_tile`: the `inspect.iscoroutinefunction` check is
documented as the dynamic-dispatch mechanism that supports both sync
(RasterioReader's WarpedVRT path) and async (future server-side tile
APIs) reader overrides without separate code paths.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
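The `inspect.iscoroutinefunction` dispatch described for `read_from_tile` can be sketched as follows. The reader classes, the `(x, y, z)` signature, and `generic_tile_read` are hypothetical placeholders; only `inspect.iscoroutinefunction` and `asyncio` are real library calls.

```python
import asyncio
import inspect


class SyncTileReader:
    """Hypothetical reader with a synchronous tile fast path."""

    def read_from_tile(self, x, y, z):
        return ("sync", x, y, z)


class AsyncTileReader:
    """Hypothetical reader with an async tile fast path."""

    async def read_from_tile(self, x, y, z):
        await asyncio.sleep(0)  # stand-in for a remote call
        return ("async", x, y, z)


class PlainReader:
    """Hypothetical reader with no tile override at all."""


async def generic_tile_read(reader, x, y, z):
    """Placeholder for the generic bounds + reproject path."""
    return ("generic", x, y, z)


async def read_from_tile(reader, x, y, z):
    override = getattr(reader, "read_from_tile", None)
    if override is None:
        return await generic_tile_read(reader, x, y, z)
    if inspect.iscoroutinefunction(override):
        return await override(x, y, z)  # async override: must be awaited
    return override(x, y, z)            # sync override: call directly
```

Checking the function rather than the returned value means a synchronous override is never wrapped or awaited by mistake, and one code path serves all three kinds of reader.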
Closing: moved to upstream as a draft PR-into-PR on spaceml-org/georeader so reviewers can find it alongside PR spaceml-org#54.
This is a PR-into-a-PR: targets `feat/async-readers` (the branch behind upstream PR spaceml-org#54), not `main`. Once merged here, the four commits flow into PR spaceml-org#54 naturally.

### What this addresses
Second review comment on upstream PR spaceml-org#54 (gonzmg88, 2026-05-26): the reprojection family (`read_reproject`, `read_to_crs`, `read_reproject_like`, `resize`, `read_from_tile`) raised `NotImplementedError` on `AsyncGeoTIFFReader`, forcing users to pre-load the whole raster before warping. The comment's observation was correct: `read_reproject` already does only a single windowed read on the input (`read.py:1606` pre-refactor); the rest is non-I/O setup + warp. Carving along that seam exposes the reproject family to async readers without pre-loading.

### What's in here: 4 commits, independently reviewable
1. `refactor(read): extract reproject + window-intersect helpers`: adds `_window_intersects_data`, `_build_no_intersect_result`, `_ReprojectPlan`, `_reproject_setup`, `_reproject_finalize`. `read_from_window` and `read_reproject` are reshaped to call them. Public API and behavior unchanged. Sync suite: 926 still green.
2. `feat(asyncread): new module mirroring read for AsyncGeoData`: adds `georeader/asyncread.py` with nine `async def` siblings: `read_from_window`, `read_from_bounds`, `read_from_polygon`, `read_from_center_coords`, `read_reproject`, `read_reproject_like`, `read_to_crs`, `resize`, `read_from_tile`. Each is a thin orchestrator: one `await` at the I/O boundary, shared helpers from commit 1 do the rest. `AsyncGeoTIFFReader` docstring updated to point here instead of "post-load warp."
3. `test(asyncread): parity tests against RasterioReader`: adds `tests/test_asyncread.py` with 25 tests across 9 classes. Each test asserts numerical parity between the async path (`AsyncGeoTIFFReader` → `asyncread.*`) and the sync path (`RasterioReader` → `read.*`) on the same on-disk COG fixture. `pytest.importorskip`-gated on `async_geotiff` + `obstore`. Full suite: 951 passed.
4. `docs(async_geotiff_reader): demonstrate asyncread end-to-end`: a `read.*` ↔ `asyncread.*` table replaces the earlier `read`-module section; Patterns 1-3 switch to direct `await asyncread.*` calls; cells 29-30 collapse the manual mercantile composition into one `asyncread.read_from_tile` line; the "Mini-solution: warp after loading" section is removed (the canonical pattern is now `asyncread`). Re-executed in-place.

### Highlights
- `asyncread.read_reproject` runs `_reproject_setup` (sync, no I/O), then `await asyncread.read_from_polygon(..., trigger_load=True)` for the input window, then `_reproject_finalize` (sync warp loop). Same as the sync path; same bytes-over-the-wire.
- `read.*` signatures and behavior preserved. Internal refactor only. New `asyncread` module is purely additive.
- `_reproject_setup` and `_reproject_finalize` are shared between `read.read_reproject` and `asyncread.read_reproject`, so a bug fixed in either helper is fixed for both paths.
- `asyncread.read_from_tile` supports both sync and async reader fast paths. Uses `inspect.iscoroutinefunction` to detect whether to `await` a reader-provided override.
- Parity tests use `np.array_equal` (zero tolerance) on the windowed-read paths, and `np.allclose(..., atol=1.0)` on the reproject paths (the 1 DN tolerance is for the boolean → float → threshold branch; reproject of int data is exact in practice).

### Testing
Breakdown:

- `tests/test_asyncread.py`

Notebook re-executed end-to-end with `[async]` extra; outputs embedded.

### Reviewing this PR
Suggested order matches the commits:

- `read_from_window` / `read_reproject` body. Sync tests gate behaviour preservation.
- `read.*` signatures.

🤖 Generated with Claude Code