feat(asyncread): new module for windowed reproject over AsyncGeoData #2

Closed
jejjohnson wants to merge 5 commits into feat/async-readers from feat/asyncread-module

@jejjohnson

Owner

This is a PR-into-a-PR: targets feat/async-readers (the branch behind upstream PR spaceml-org#54), not main. Once merged here, the four commits flow into PR spaceml-org#54 naturally.

What this addresses

Second review comment on upstream PR spaceml-org#54 (gonzmg88, 2026-05-26): the reprojection family (read_reproject, read_to_crs, read_reproject_like, resize, read_from_tile) raised NotImplementedError on AsyncGeoTIFFReader, forcing users to pre-load the whole raster before warping. The comment's observation was correct: read_reproject already does only a single windowed read on the input (read.py:1606 pre-refactor); the rest is non-I/O setup + warp. Carving along that seam exposes the reproject family to async readers without pre-loading.

What's in here: 4 commits, independently reviewable

| # | Commit | What it does |
|---|--------|--------------|
| 1 | refactor(read): extract reproject + window-intersect helpers | Pure refactor. Adds _window_intersects_data, _build_no_intersect_result, _ReprojectPlan, _reproject_setup, _reproject_finalize. read_from_window and read_reproject reshape to call them. Public API and behavior unchanged. Sync suite: 926 still green. |
| 2 | feat(asyncread): new module mirroring read for AsyncGeoData | New georeader/asyncread.py with nine async def siblings: read_from_window, read_from_bounds, read_from_polygon, read_from_center_coords, read_reproject, read_reproject_like, read_to_crs, resize, read_from_tile. Each is a thin orchestrator: one await at the I/O boundary, shared helpers from commit 1 do the rest. AsyncGeoTIFFReader docstring updated to point here instead of "post-load warp." |
| 3 | test(asyncread): parity tests against RasterioReader | New tests/test_asyncread.py with 25 tests across 9 classes. Each test asserts numerical parity between the async path (AsyncGeoTIFFReader → asyncread.*) and the sync path (RasterioReader → read.*) on the same on-disk COG fixture. pytest.importorskip-gated on async_geotiff + obstore. Full suite: 951 passed. |
| 4 | docs(async_geotiff_reader): demonstrate asyncread end-to-end | Reworks the intro notebook: side-by-side read.* ↔ asyncread.* table replaces the ⚠️/❌ matrix; cells 26-30 switch from "pre-load then warp" to direct await asyncread.* calls; cells 29-30 collapse the manual mercantile composition into one asyncread.read_from_tile line; the "Mini-solution: warp after loading" section is removed (now the canonical pattern is asyncread). Re-executed in-place. |

Highlights

  • Reproject family streams only the needed window. asyncread.read_reproject runs _reproject_setup (sync, no I/O), then await asyncread.read_from_polygon(..., trigger_load=True) for the input window, then _reproject_finalize (sync warp loop). Same as the sync path; same bytes-over-the-wire.
  • No API breakage. Existing read.* signatures and behavior preserved. Internal refactor only. New asyncread module is purely additive.
  • One source of truth for the warp loop. _reproject_setup and _reproject_finalize are shared between read.read_reproject and asyncread.read_reproject. If a bug is fixed in one, both benefit.
  • asyncread.read_from_tile supports both sync and async reader fast paths. Uses inspect.iscoroutinefunction to detect whether to await a reader-provided override.
  • Bit-identical to sync on windowed reads. Tests assert np.array_equal (zero tolerance) there, and np.allclose(..., atol=1.0) on the reproject paths (the 1 DN tolerance is for the boolean→float→threshold branch; reproject of int data is exact in practice).

Testing

$ pytest tests/ -q
951 passed, 26 warnings in 5.92s

Breakdown:

  • 926 sync tests (baseline), all preserved by the refactor
  • 25 new async parity tests in tests/test_asyncread.py

Notebook re-executed end-to-end with [async] extra; outputs embedded.

Reviewing this PR

Suggested order matches the commits:

  1. Commit 1 (refactor): verify the new helpers are faithful extractions; the diff is large but every line maps to the original read_from_window / read_reproject body. Sync tests gate behaviour preservation.
  2. Commit 2 (asyncread): the new module is ~400 LOC, mostly mechanical mirroring of read.* signatures.
  3. Commit 3 (tests): 25 parity assertions across 9 classes.
  4. Commit 4 (notebook): narrative + executable demo.

🤖 Generated with Claude Code

jejjohnson and others added 5 commits May 29, 2026 12:52
Carve out the non-I/O code in read.py so the upcoming asyncread module can
share it. Three new private helpers; no behavior change for existing callers.

- `_window_intersects_data` + `_build_no_intersect_result`: pure-CPU
  fall-back path used at the head of `read_from_window`. The async sibling
  will call them first and skip the `await` when the window misses the data.

- `_ReprojectPlan` dataclass + `_reproject_setup` + `_reproject_finalize`:
  split `read_reproject` into "compute destination grid / allocate / check
  intersection / detect fast path" (setup, no I/O) and "iterate non-spatial
  dims + rasterio.warp.reproject + pack GeoTensor" (finalize, no I/O).
  `read_reproject` itself becomes a thin orchestrator: call setup, handle the
  fast-path / non-intersecting early exits, do the single windowed read via
  `read_from_polygon(..., trigger_load=True)`, then call finalize.
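
As a rough illustration of the first bullet, the sketch below shows a pure-CPU intersection check guarding the read, so that a window that misses the data never reaches the I/O call. The `Window` class, the function names without leading underscores, and the nested-list "array" are all hypothetical stand-ins; the real helpers operate on rasterio windows and GeoTensors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical pixel window: offsets plus a positive width and height."""
    col_off: int
    row_off: int
    width: int
    height: int


def window_intersects_data(window, data_width, data_height):
    # True when the window overlaps the raster extent by at least one pixel.
    return (
        window.col_off < data_width
        and window.row_off < data_height
        and window.col_off + window.width > 0
        and window.row_off + window.height > 0
    )


def build_no_intersect_result(window, fill_value, boundless):
    # Boundless reads get a window-shaped block of fill values; otherwise None.
    if not boundless:
        return None
    return [[fill_value] * window.width for _ in range(window.height)]


def read_from_window(window, data_width, data_height, read_pixels,
                     fill_value=0, boundless=True):
    if not window_intersects_data(window, data_width, data_height):
        # No I/O on this branch: the result is synthesised in memory.
        return build_no_intersect_result(window, fill_value, boundless)
    return read_pixels(window)


calls = []  # records every simulated I/O call


def fake_read(window):
    calls.append(window)
    return [[1] * window.width for _ in range(window.height)]


outside = Window(col_off=100, row_off=100, width=2, height=3)
inside = Window(col_off=0, row_off=0, width=2, height=2)

filled = read_from_window(outside, 10, 10, fake_read, fill_value=-1)
missing = read_from_window(outside, 10, 10, fake_read, boundless=False)
hit = read_from_window(inside, 10, 10, fake_read)
```

In an async sibling the same two helpers would run before the `await`, which is what lets the no-intersection branch return without touching the network.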

Public surface (`read_from_window`, `read_reproject`, and every wrapper
above them) is unchanged. Full suite still green: 926 passed, 26 warnings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address the second round of review on PR spaceml-org#54: the reprojection family
(`read_reproject`, `read_reproject_like`, `read_to_crs`, `resize`,
`read_from_tile`) and the read-by-window/bounds/polygon/center family now
work over an `AsyncGeoData` input without pre-loading the full raster.

`georeader/asyncread.py` exposes async siblings of nine `read.*` functions
with identical signatures. Each is `async def` with a single `await` at the
I/O boundary; the non-I/O setup, intersection check, dtype handling, and
warp loop are shared with the sync path via the private helpers extracted in
the previous commit (`_window_intersects_data`,
`_build_no_intersect_result`, `_reproject_setup`, `_reproject_finalize`).

Key semantics:

- `read_reproject` and its wrappers stream **only the input window required
  for the destination grid**: they call `read_from_polygon(..., trigger_load=True)`
  exactly the same way the sync path does. No full-raster load.
- `read_from_window` honours the same return contract as the sync version:
  ndarray when `return_only_data=True`, GeoTensor when `trigger_load=True`,
  and otherwise the unmaterialised view (for callers who want to compose
  further before awaiting).
- `read_from_tile` checks `inspect.iscoroutinefunction` before delegating to
  a reader-provided `read_from_tile` fast path, supporting both sync and
  async overrides.
- The no-intersection branch of `read_from_window` never awaits: it
  synthesises the padded array / GeoTensor in memory and returns directly.
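
The `inspect.iscoroutinefunction` dispatch mentioned above could look roughly like the following. The three reader classes and the string return values are hypothetical placeholders; only the dispatch mechanism is the point, and the real function's fallback path does an actual windowed read rather than returning a label.

```python
import asyncio
import inspect


class SyncReader:
    """Hypothetical reader with a synchronous tile fast path."""

    def read_from_tile(self, x, y, z):
        return f"sync tile {z}/{x}/{y}"


class AsyncReader:
    """Hypothetical reader with an asynchronous tile fast path."""

    async def read_from_tile(self, x, y, z):
        await asyncio.sleep(0)  # simulated remote fetch
        return f"async tile {z}/{x}/{y}"


class PlainReader:
    """Hypothetical reader with no tile fast path at all."""


async def read_from_tile(reader, x, y, z):
    override = getattr(reader, "read_from_tile", None)
    if override is None:
        # No reader-provided fast path: fall back to the generic route.
        return f"generic tile {z}/{x}/{y}"
    if inspect.iscoroutinefunction(override):
        return await override(x, y, z)  # async override must be awaited
    return override(x, y, z)  # sync override is called directly


async def main():
    readers = (SyncReader(), AsyncReader(), PlainReader())
    return [await read_from_tile(reader, 1, 2, 3) for reader in readers]


results = asyncio.run(main())
```

`inspect.iscoroutinefunction` also recognises bound methods defined with `async def`, which is why the check can be applied to the attribute fetched from the reader instance.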

Also updates the `AsyncGeoTIFFReader` module docstring to point users at
`georeader.asyncread` instead of "load-then-post-warp via read.read_reproject_like".
The reader itself is unchanged.

Smoke-tested against a local COG fixture: `read_to_crs`, `read_reproject_like`,
`read_from_bounds`, and `read_from_tile` all return values matching the sync
`RasterioReader` path within 1 DN.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add `tests/test_asyncread.py` with 25 tests covering all nine async
functions in `georeader.asyncread`. Each test class mirrors its sync
sibling in `test_read_windows.py`; each test asserts that the async
path (`AsyncGeoTIFFReader` → `asyncread.*`) returns shape + values
matching the sync path (`RasterioReader` → `read.*`) on the same
on-disk COG fixture.

Coverage by function:

- `read_from_window`: trigger_load → GeoTensor, lazy view, return_only_data
  → ndarray, boundless no-intersection → synthetic fill, non-boundless
  no-intersection → None (5 tests).
- `read_from_bounds`: basic, pad_add grows window, cross-CRS bounds (3).
- `read_from_polygon`: basic + cross-CRS polygon (2).
- `read_from_center_coords`: basic parity (1).
- `read_reproject`: full warp parity at EPSG:4326, same-CRS aligned fast
  path (no warp), non-intersecting → fill (3).
- `read_reproject_like`: template parity + return_only_data (2).
- `read_to_crs`: WGS84 parity, same-CRS short-circuit, Web Mercator (3).
- `resize`: half + double, anti_aliasing=False to keep parity exact (2).
- `read_from_tile`: basic, non-intersecting → None, assert_if_not_intersects,
  out_shape parity (4).

Tests are `pytest.importorskip`-gated on `async_geotiff` + `obstore`, so
they're skipped when the optional `[async]` extra isn't installed.
The async fixture uses `@pytest_asyncio.fixture` to satisfy strict mode.
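
The parity pattern these tests follow can be sketched without any of the real dependencies. The snippet below is an assumption-laden stand-in: the nested-list fixture, `sync_read`, `async_read`, and `assert_parity` are all hypothetical, and the pytest fixtures and `importorskip` gating of the real suite are not reproduced.

```python
import asyncio

# 4x4 stand-in for the on-disk COG fixture: value = row * 4 + col.
FIXTURE = [[row * 4 + col for col in range(4)] for row in range(4)]


def sync_read(row_off, col_off, height, width):
    rows = FIXTURE[row_off:row_off + height]
    return [row[col_off:col_off + width] for row in rows]


async def async_read(row_off, col_off, height, width):
    await asyncio.sleep(0)  # simulated remote fetch
    rows = FIXTURE[row_off:row_off + height]
    return [row[col_off:col_off + width] for row in rows]


def assert_parity(expected, actual, atol=0):
    # Same shape, and every value within atol (atol=0 means exact equality).
    assert len(expected) == len(actual)
    for expected_row, actual_row in zip(expected, actual):
        assert len(expected_row) == len(actual_row)
        for exp, act in zip(expected_row, actual_row):
            assert abs(exp - act) <= atol


window = (1, 1, 2, 2)  # row_off, col_off, height, width
expected = sync_read(*window)
actual = asyncio.run(async_read(*window))
assert_parity(expected, actual)  # zero tolerance, as on the windowed-read paths
```

A non-zero `atol` plays the role of the 1 DN tolerance used on the reproject paths.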

Full suite: 951 passed (926 baseline + 25 new), 26 warnings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reflect the new `georeader.asyncread` module in the intro notebook.

What changed in the narrative:

- "Using the `read` module with this reader" section (cell 25) becomes
  "Reading via the `asyncread` module": a one-to-one side-by-side table
  of `read.*` vs `asyncread.*`, with no ⚠️/❌ rows. Every function in the
  read/reproject family now has an async sibling.

- Patterns 1–3 (cells 26–28) switch from `read.* + view.load()` /
  `await reader.load() + sync warp` to direct `await asyncread.*` calls.
  Pattern 3 in particular drops the pre-load step: `asyncread.read_to_crs`
  and `asyncread.read_reproject_like` stream only the input window required
  for the destination grid.

- "Bandwidth-conscious tile reads" section (cells 29–30) collapses the
  manual `mercantile.xy_bounds` → `transform_bounds` → `read_from_bounds`
  → `read_to_crs` composition into a single `await asyncread.read_from_tile`
  call. The compositional bytes-over-the-wire argument is preserved in
  prose (it's still the same plumbing, just hidden behind the async API).

- "What this reader does NOT do" (cell 31) drops the "no on-the-fly CRS
  warp" and "no on-the-fly resampling" bullets: both are now provided by
  `asyncread` (the warp loop is shared with the sync path; only the
  byte-fetch is async).

- The "Mini-solution: warp / reproject after loading" section (former
  cells 32–34) is removed entirely. The fetch-native-then-warp workaround
  was the previous review's pain point; the new `asyncread` path is now
  the canonical pattern and lives in the body above.
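
For readers unfamiliar with the tile composition mentioned above, the first step (tile index to projected bounds) can be sketched without dependencies. This assumes the standard XYZ tiling scheme in Web Mercator (EPSG:3857) with the origin at the top-left; `tile_bounds_3857` is a hypothetical helper, not mercantile's implementation, and the later steps of the chain are not shown.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis used by Web Mercator
HALF_EXTENT = math.pi * EARTH_RADIUS_M  # about 20037508.34 m


def tile_bounds_3857(x, y, z):
    """Return (left, bottom, right, top) in metres for XYZ tile (x, y) at zoom z."""
    tile_size = 2 * HALF_EXTENT / (2 ** z)  # each zoom level halves the tile size
    left = -HALF_EXTENT + x * tile_size
    top = HALF_EXTENT - y * tile_size  # y counts downward from the top edge
    return (left, top - tile_size, left + tile_size, top)


world = tile_bounds_3857(0, 0, 0)
north_east = tile_bounds_3857(1, 0, 1)
```

At zoom 0 the single tile spans the whole projected square; at zoom 1 tile (1, 0) covers the north-east quadrant.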

Notebook re-executed in-place with the optional `[async]` extra installed,
so outputs reflect the new code. Cell count: 39 → 36.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address PR review feedback that the initial docstrings + comments were too
thin. No behaviour change. Full suite remains 951 passed.

### `georeader/asyncread.py`: all 9 functions now have substantive docstrings

Each public function gets:

- A clear one-line summary describing what it does (not "async sibling of X").
- A short prose explanation of the I/O semantics (where the single `await`
  lives, when the no-intersection branch skips I/O entirely).
- An `Args:` block annotating only the arguments where the async semantics
  differ from the sync sibling, with cross-references to `read_from_window`
  for the shared kwargs (`return_only_data`, `trigger_load`, `boundless`).
- A `Returns:` block spelling out the actual return-type matrix (the
  view-vs-GeoTensor-vs-ndarray distinction is the most error-prone part
  of the API).
- A runnable `Example:` showing the actual `await` pattern.
- A `See Also:` link back to the sync sibling for the full parameter and
  example treatment, plus pointers to thinner-wrapper siblings.

`resize` additionally gets a `.. warning::` block flagging that
`anti_aliasing=True` on a lazy async reader causes an eager full-extent
read inside `apply_anti_aliasing`, a real footgun the original docstring
didn't mention.

### `georeader/read.py`: new helpers documented thoroughly

- `_window_intersects_data`, `_build_no_intersect_result`: explain the
  pure-CPU no-I/O fallback semantics and why they're shared with asyncread.
- `_ReprojectPlan`: per-attribute docstrings, plus an upfront explanation
  of the three-branch decision tree (fast path / non-intersecting / normal).
- `_reproject_setup`: docstring + inline comments on every numbered step.
  The trickiest bits get extra explanation:
  * The aligned-grid detection (why integer offsets ⇒ no warp needed).
  * The dtype/bool casting decision (why `cast` tracks an `.astype()`
    requirement instead of just being recomputed).
  * The `dst_nodata or fill_value_default` fallback (historical API:
    `dst_nodata=0` means "use the source default", preserved verbatim).
- `_reproject_finalize`: docstring + per-step comments on the warp loop.
  The bool round-trip (`bool → float32 → warp → threshold(0.5) → bool`)
  is the most non-obvious piece and is now fully documented.
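
The bool round-trip can be illustrated in one dimension. This sketch swaps in plain linear interpolation for `rasterio.warp.reproject`, uses Python floats rather than float32, and assumes a strict `> 0.5` comparison; how the real code treats values exactly at the threshold is not stated here. `resample_linear` and `resample_bool` are hypothetical names.

```python
def resample_linear(values, out_len):
    """Linearly interpolate a 1-D list to out_len samples (stand-in for the warp)."""
    if out_len == 1:
        return [float(values[0])]
    step = (len(values) - 1) / (out_len - 1)
    out = []
    for i in range(out_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def resample_bool(mask, out_len, threshold=0.5):
    as_float = [float(v) for v in mask]          # bool -> float
    warped = resample_linear(as_float, out_len)  # interpolation yields fractions
    return [v > threshold for v in warped]       # threshold -> bool


mask = [False, True]
intermediate = resample_linear([float(v) for v in mask], 4)
upsampled = resample_bool(mask, 4)
```

Interpolating a mask produces fractional values between 0 and 1, so the cast to float and the final threshold are what keep the output a clean boolean mask.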

### Rationale comments on the orchestrators

Both `read.read_reproject` and `asyncread.read_reproject` now have block
comments before each of the three branches explaining what triggers it
and what the orchestrator does. The structures mirror each other and the
comments call that out, making the sync ↔ async correspondence obvious
without diff-hunting.

### Specific complex bits flagged

- `asyncread.read_from_window`: the view-vs-GeoTensor dispatch block now
  walks through all three caller intents (eager-shim return, explicit
  materialisation request, lazy view for fan-out).
- `asyncread.read_from_tile`: the `inspect.iscoroutinefunction` check is
  documented as the dynamic-dispatch mechanism that supports both sync
  (RasterioReader's WarpedVRT path) and async (future server-side tile
  APIs) reader overrides without separate code paths.
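
The three caller intents for `read_from_window` map onto a small dispatch, sketched below under stated assumptions: `LoadedTensor` and `LazyView` are hypothetical stand-ins for `GeoTensor` and the unmaterialised view, and the list slice stands in for the remote read. The branch order follows the return contract described in the commit message (ndarray, then GeoTensor, then view).

```python
import asyncio
from dataclasses import dataclass


@dataclass
class LoadedTensor:
    """Hypothetical stand-in for GeoTensor: pixel values plus georeferencing."""
    values: list
    transform: tuple


class LazyView:
    """Hypothetical stand-in for an unmaterialised window view."""

    def __init__(self, source, start, stop):
        self.source, self.start, self.stop = source, start, stop

    async def load(self):
        await asyncio.sleep(0)  # simulated remote fetch
        return LoadedTensor(self.source[self.start:self.stop],
                            transform=(self.start, 1))


async def read_from_window(source, start, stop,
                           return_only_data=False, trigger_load=False):
    view = LazyView(source, start, stop)  # building the view does no I/O
    if return_only_data:
        return (await view.load()).values  # bare array, georeferencing dropped
    if trigger_load:
        return await view.load()  # materialised tensor
    return view  # lazy view: the caller decides when to await


async def main():
    src = list(range(10))
    data = await read_from_window(src, 2, 5, return_only_data=True)
    tensor = await read_from_window(src, 2, 5, trigger_load=True)
    views = [await read_from_window(src, i, i + 2) for i in (0, 4)]
    # Fan-out: load several lazy views concurrently.
    loaded = await asyncio.gather(*(view.load() for view in views))
    return data, tensor, views, loaded


data, tensor, views, loaded = asyncio.run(main())
```

Returning the view in the default case is what makes the fan-out with `asyncio.gather` possible, since nothing is fetched until `load()` is awaited.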

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jejjohnson

Owner Author

Closing: moved to upstream as a draft PR-into-PR on spaceml-org/georeader so reviewers can find it alongside PR spaceml-org#54.

@jejjohnson closed this May 29, 2026