
feat: new AsyncGeoData protocol, RasterioReader bytes-path knobs, AsyncGeoTIFFReader#54

Draft
jejjohnson wants to merge 7 commits into spaceml-org:main from jejjohnson:feat/async-readers


jejjohnson commented May 14, 2026

Member

User story

"I'm building a tile server / async ML inference service in Python and I want to fan out hundreds of concurrent COG reads from one process. Today I either roll my own async reader or pull in something with a different API. I want the same georeader shape — but async."

This PR adds that. After it lands, the user does:

import asyncio

from obstore.store import S3Store
from georeader.async_geotiff_reader import AsyncGeoTIFFReader

store = S3Store(bucket="my-bucket", region="us-east-1")
reader = await AsyncGeoTIFFReader.open("scene.tif", store=store)

# Fan out 100 reads concurrently from one event loop.
# (Top-level await works in Jupyter; `windows` is a list of rasterio Windows.)
chips = await asyncio.gather(*[reader.read_from_window(w) for w in windows])

Same crs / transform / shape / read_from_window / load surface as RasterioReader. Just await-able.

Motivation

Three pressures, none new:

  1. Cloud is the default substrate. Modern RS workflows assume reads from S3 / GCS / Azure. RasterioReader ships one path (GDAL VSI) and has no seam to plug in alternatives — fsspec for niche backends, obstore for parallel ranges.
  2. Async I/O is now first-class. Tile servers, async APIs, and any fan-out-to-many-windows workload want coroutines. RasterioReader is sync-only; users currently roll their own.
  3. COG-only readers can be substantially faster. A pure-Rust COG reader (via async-tiff) skips per-call GDAL state, batches parallel range requests through obstore, and coalesces close-by ranges. Specialising for the dominant cloud-native format pays for itself once.

The full design is in research_journal_v2/notes/geotoolz/plans/georeader/. Critical decision: we do not reimplement COG plumbing. developmentseed/async-geotiff already ships IFD walk, tile-fetch math, decompression, request coalescing — we depend on it via an ~80-LOC adapter.

What's included

Four commits, each independently reviewable:

| # | Commit | What it does |
|---|--------|--------------|
| 1 | 7fde631 | Adds AsyncGeoData abstract class. Deduplicates derived-property defaults (bounds, res, footprint) up to GeoDataBase; GeoData and AsyncGeoData keep only what genuinely differs per tier. No behaviour change for existing callers. |
| 2 | 06fa1a4 | Adds opener= / fs= / rio_open_kwargs= keyword-only knobs to RasterioReader. Threaded through all 7 rasterio.open(...) sites and all 4 recursive RasterioReader(...) constructions. Defaults reproduce today's GDAL VSI behaviour exactly. Bumps the rasterio floor to >=1.4 (needed for opener=). |
| 3 | 99da95b | Adds AsyncGeoTIFFReader, a ~80-LOC adapter over async-geotiff. Conforms to AsyncGeoData. New optional [async] extra. pytest-asyncio added to the dev group. |
| 4 | ea7ed4d | Docs: new intro notebook for AsyncGeoTIFFReader, async sidebars in three existing notebooks (read_from_tileserver, tiling_and_stitching, read_S2_SAFE_from_bucket), mkdocs.yml registration. |

Highlights

  • Same metadata surface across sync and async. crs, transform, shape, bounds, dtype, fill_value_default, dims, footprint(crs) are all identical on RasterioReader and AsyncGeoTIFFReader. The only difference is await-ability on the read methods.
  • AsyncGeoTIFFReader is two-phase lazy. __init__ is free; await open() fetches just the COG header (cheap); pixel bytes are fetched on each read_* call. Same model as RasterioReader semantically — different in that the header is parsed once per reader (vs RasterioReader's fresh open per read(), which is what keeps it pickleable across multiprocessing / joblib / Dask).
  • Honest scope boundaries. AsyncGeoTIFFReader is TIFF/COG only. read_from_bounds(target_crs=...) raises NotImplementedError with a clear pointer at georeader.read.read_to_crs for the post-load warp pattern (demonstrated in the intro notebook). async-geotiff explicitly disclaims warp; we follow suit rather than pulling GDAL back into the async cone.
  • No behaviour change for existing RasterioReader callers. Defaults reproduce GDAL VSI exactly. The three new knobs are purely additive.
  • async-geotiff API surface verified. Before writing the adapter I spiked the actual async-geotiff 0.5.0 API and adjusted the design (e.g. geotiff.count instead of ifd.samples_per_pixel, GeoTIFF.dtype already returns np.dtype, store= is required not optional).
  • The intro notebook actually demonstrates. Two HTML/CSS box-flow diagrams that render in both Jupyter and mkdocs (no extensions), overview_level walkthrough with byte-size comparison, real asyncio.gather fan-out, the NotImplementedError boundary, and a mini-solution for warp-after-load.

Testing plan

  • Full suite: 793 passed, 26 warnings. No regressions against the baseline, plus 17 new tests.
  • TestAsyncGeoData (4 tests) — inherited defaults from GeoDataBase, NotImplementedError on the abstract async methods.
  • TestBytesPathKnobs (5 tests) — default GDAL VSI unchanged, opener round-trip, fsspec round-trip, mutually-exclusive validation, kwargs survive recursive construction.
  • test_async_geotiff_reader.py (8 tests, pytest.importorskip-gated) — metadata-after-open, RuntimeError-before-open, numerical parity with RasterioReader.read_from_window, full-load parity, warp/resample NotImplementedError boundary, asyncio.gather fan-out, async with, __repr__.
  • All updated notebooks re-execute end-to-end and embed real outputs:
    • docs/advanced/bytes_path_knobs.ipynb — local fixture, all three paths
    • docs/advanced/async_geotiff_reader.ipynb — local fixture with overviews
    • notebooks/read_from_tileserver.ipynb: real public Sentinel-2 COG via Element 84's sentinel-cogs bucket, 16 concurrent reads
  • Reviewer follow-up: poetry.lock is not regenerated by these commits since all new deps are optional ([async] extra) or dev-only — the base resolution is unchanged. Run poetry lock --no-update before merging to refresh the lockfile metadata.

🤖 Generated with Claude Code

jejjohnson and others added 4 commits May 14, 2026 17:27
… on GeoDataBase

Adds a new ``AsyncGeoData`` abstract class to ``georeader/abstract_reader.py``
that mirrors ``GeoData`` with ``async`` read methods. Concrete async readers
(e.g. the upcoming ``AsyncGeoTIFFReader``) satisfy this interface so user code
can branch on sync-vs-async without isinstance checks.

While here, deduplicate the derived metadata properties that were previously
copy-pasted across ``GeoData`` and would have been copy-pasted again on
``AsyncGeoData``:

- ``bounds``, ``res``, ``footprint`` move up to ``GeoDataBase`` (they only need
  ``transform``, ``crs``, ``shape`` — all already on ``GeoDataBase``).
- ``GeoData`` and ``AsyncGeoData`` keep only the surface that genuinely differs
  per tier: sync vs async ``load`` / ``read_from_window``, plus the read-tier
  metadata stubs (``dtype``, ``dims``, ``fill_value_default``).

No behaviour change for existing ``GeoData`` consumers — ``GeoData.bounds`` etc.
still resolve, just one inheritance level higher. ``GeoTensor`` is unaffected
(no inheritance from these classes).

Tests: adds ``TestAsyncGeoData`` covering inherited defaults (``bounds``,
``res``, ``footprint``) and verifying ``load`` / ``read_from_window`` raise
``NotImplementedError`` on a bare subclass. Full suite: 780 passed.
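A minimal sketch of the hierarchy this commit describes, assuming a plain 6-tuple `(a, b, c, d, e, f)` in place of `affine.Affine` and a north-up (non-rotated) transform. The class bodies and the `BareAsync` subclass are illustrative, not the code in `abstract_reader.py`, and `footprint` is omitted:

```python
import asyncio
from typing import Tuple


class GeoDataBase:
    """Shared metadata tier: derived properties need only transform and shape."""

    # Subclasses set these; a 6-tuple (a, b, c, d, e, f) stands in for affine.Affine.
    transform: Tuple[float, float, float, float, float, float]
    shape: Tuple[int, ...]

    @property
    def res(self) -> Tuple[float, float]:
        a, _, _, _, e, _ = self.transform
        return abs(a), abs(e)

    @property
    def bounds(self) -> Tuple[float, float, float, float]:
        # (xmin, ymin, xmax, ymax) for a north-up transform (b == d == 0).
        a, _, c, _, e, f = self.transform
        height, width = self.shape[-2:]
        x0, x1 = c, c + a * width
        y0, y1 = f, f + e * height
        return min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)


class GeoData(GeoDataBase):
    """Sync read tier."""

    def load(self):
        raise NotImplementedError


class AsyncGeoData(GeoDataBase):
    """Async read tier: same inherited metadata, awaitable load."""

    async def load(self):
        raise NotImplementedError


class BareAsync(AsyncGeoData):
    """Hypothetical bare subclass with illustrative metadata only."""

    transform = (10.0, 0.0, 500000.0, 0.0, -10.0, 4200000.0)
    shape = (3, 100, 200)


reader = BareAsync()
print(reader.bounds)  # (500000.0, 4199000.0, 502000.0, 4200000.0)
try:
    asyncio.run(reader.load())
except NotImplementedError:
    print("load() is abstract on a bare subclass")
```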

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`RasterioReader` previously routed all reads through GDAL VSI (libcurl in C)
with no seam to plug in an alternative byte transport. That's fine for the
common case but offers no way to opt into fsspec for niche backends (FTP,
SFTP, GitHub, MinIO with custom auth) or a user-supplied callback for custom
HTTP clients / refreshable tokens.

Add three keyword-only constructor knobs that translate into the rasterio
`opener=` parameter, plus an escape hatch for arbitrary extra kwargs:

- `opener=callable` — passed straight to `rasterio.open(opener=...)`. The
  callable must accept `(path, mode="rb")` — rasterio 1.4 calls it as
  `opener(path)` so the mode default is load-bearing.
- `fs=fsspec.AbstractFileSystem` — shortcut equivalent to `opener=fs.open`.
- `rio_open_kwargs=dict` — escape hatch for arbitrary additional kwargs
  forwarded to every `rasterio.open(...)` call.
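Why the `mode="rb"` default matters: per the note above, rasterio 1.4 calls the opener as `opener(path)`, so a callable whose `mode` argument is required would fail. A minimal sketch of a conforming opener over the local filesystem; `logging_opener`, `strict_opener`, and the call log are illustrative names, and the rasterio call itself is simulated rather than executed:

```python
import os
import tempfile

opened_paths = []  # illustrative call log


def logging_opener(path, mode="rb"):
    """Opener with the (path, mode="rb") shape; the default allows opener(path)."""
    opened_paths.append(path)
    return open(path, mode)


def strict_opener(path, mode):
    """Same idea without a default: calling strict_opener(path) raises TypeError."""
    return open(path, mode)


# Sample file; the little-endian TIFF magic bytes are just placeholder content.
with tempfile.NamedTemporaryFile(delete=False, suffix=".tif") as tmp:
    tmp.write(b"II*\x00")
    path = tmp.name

# Simulate the single-argument call the commit describes: opener(path).
with logging_opener(path) as fh:
    header = fh.read(4)

try:
    strict_opener(path)
    strict_failed = False
except TypeError:
    strict_failed = True

os.remove(path)

# Usage with the new knob (not executed here; needs georeader and a real GeoTIFF):
#   RasterioReader("scene.tif", opener=logging_opener)
```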

`opener=` and `fs=` are mutually exclusive — passing both raises `ValueError`
at construction.

Implementation:
- New private helper `_resolve_open_kwargs()` returns the kwargs dict to
  splat at every `rasterio.open(path, ...)` call site.
- Threaded through all 7 `rasterio.open(...)` call sites in the file.
- All 4 recursive `RasterioReader(...)` constructions (in `read_from_window`,
  `isel`, `__copy__`, `reader_overview`) forward the three knobs so they
  survive across spawned sub-readers.
- Bump `rasterio` floor from `>=1` to `>=1.4` (the version that introduced
  `opener=`).

Docs:
- New `docs/advanced/bytes_path_knobs.ipynb` — fully executable end-to-end
  demo against a local fixture, exercising all three paths plus the
  mutually-exclusive validation.
- Sidebar in `docs/read_S2_SAFE_from_bucket.ipynb` flagging the knobs exist
  for cloud reads (pseudocode only — the executable demo is the new
  advanced notebook).
- Register the advanced notebook in `mkdocs.yml`.

Tests: adds `TestBytesPathKnobs` covering the default path (no knobs),
opener callback round-trip, fsspec shortcut round-trip, mutually-exclusive
validation, and kwargs surviving the recursive construction in
`read_from_window`. Full suite: 785 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…geotiff

`AsyncGeoTIFFReader` provides async, COG-only reads for high-concurrency
fan-out workloads (tile servers, async ML inference services). It is an
~80-LOC adapter on top of `async-geotiff` (DevSeed) — we don't re-implement
IFD walk, tile-fetch math, decompression dispatch, or request coalescing.
That all lives upstream and we depend on it.

The reader conforms to `AsyncGeoData` (added in the previous commit) so user
code typed against the protocol can swap sync ↔ async readers without
isinstance checks. Same metadata property names as `RasterioReader`
(`crs`, `transform`, `shape`, `dtype`, `bounds`, `fill_value_default`,
`dims`); same method names (`read_from_window`, `read_from_bounds`, `load`)
but each is a coroutine.

Construction is two-phase:
- `AsyncGeoTIFFReader(path, store=...)` — cheap, no I/O
- `await AsyncGeoTIFFReader.open(...)` — fetches the COG header (IFD chain)

After `open()`, sync metadata properties work instantly (just reads off the
already-fetched header). The first pixel-byte fetch happens on the first
`await reader.read_from_window(...)` / `load()`. The `_geotiff` handle is
kept alive between reads, so the header is parsed exactly once per reader
(unlike `RasterioReader`, which opens fresh per call for multi-process
safety). Trade-off: faster repeated reads, not pickleable across processes.
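The two-phase construction and the parse-once behaviour can be modelled with a toy class. `ToyAsyncReader`, `_fetch_header`, and the fetch counter are hypothetical and do not touch async-geotiff; they only show when I/O happens:

```python
import asyncio


class ToyAsyncReader:
    """Models the lifecycle described above, not the real AsyncGeoTIFFReader."""

    header_fetches = 0  # counts simulated header round-trips

    def __init__(self, path: str, *, store: object) -> None:
        # Phase 1: cheap, no I/O.
        self.path = path
        self.store = store
        self._header = None

    @classmethod
    async def open(cls, path: str, *, store: object) -> "ToyAsyncReader":
        # Phase 2: fetch and parse the header once, then keep it on the instance.
        reader = cls(path, store=store)
        reader._header = await reader._fetch_header()
        return reader

    async def _fetch_header(self) -> dict:
        await asyncio.sleep(0)  # stands in for the header range request
        ToyAsyncReader.header_fetches += 1
        return {"shape": (1, 512, 512), "dtype": "uint16"}

    @property
    def shape(self):
        # Sync metadata: served from the cached header, no I/O.
        if self._header is None:
            raise RuntimeError("call `await ToyAsyncReader.open(...)` first")
        return self._header["shape"]


async def demo():
    lazy = ToyAsyncReader("scene.tif", store=None)
    try:
        lazy.shape
        raised = False
    except RuntimeError:
        raised = True
    reader = await ToyAsyncReader.open("scene.tif", store=None)
    shapes = [reader.shape for _ in range(3)]  # three lookups, one header fetch
    return raised, shapes


raised, shapes = asyncio.run(demo())
```

Because the parsed header stays on the instance, repeated metadata lookups and reads skip the header fetch; that same long-lived state is the trade-off the commit names against pickling across processes.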

Anti-goals (raise `NotImplementedError`):
- `read_from_bounds(target_crs=...)` — async-geotiff explicitly disclaims
  warp; users either post-warp via `georeader.read.read_reproject_like` or
  fall back to `RasterioReader` with WarpedVRT.
- `read_from_bounds(target_resolution=...)` — same reasoning, no resample.

Dependencies:
- New optional `[async]` extra pulls in `async-geotiff>=0.5,<0.6` (and its
  transitive `async-tiff` + `obspec` chain). Pinned to the 0.5.x line
  because the upstream API is pre-1.0 and may shift between minors.
- Users still pick an `obstore` backend themselves (`S3Store` / `GCSStore` /
  `AzureStore` / `LocalStore`); the right one depends on their cloud.
- Dev group adds `pytest-asyncio`, `obstore`, and `async-geotiff` so the
  tests run in CI.

Tests: `pytest.importorskip("async_geotiff")` gates the whole module so
lean environments skip cleanly. Eight tests cover metadata-after-open,
RuntimeError-before-open, parity with `RasterioReader.read_from_window`
(numerical equality, not just shape), full-load parity, the warp/resample
NotImplementedError boundary, `asyncio.gather` concurrent fan-out across
16 windows, `async with` context manager, and `__repr__` status.

Full suite: 793 passed (8 new + 785 existing).

Note: `poetry.lock` is not regenerated by this commit — all new deps are
optional or dev-only, so the base resolution is unchanged. Run
`poetry lock --no-update` pre-merge to refresh the lockfile metadata.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…notebooks

Adds a full tutorial notebook for `AsyncGeoTIFFReader` and threads short
"async alternative" sidebars into three existing notebooks so users coming
from the sync path discover the async sibling at the relevant entry points.

New notebook `docs/advanced/async_geotiff_reader.ipynb`:
- Two HTML/CSS box-flow diagrams that render natively in both Jupyter and
  mkdocs without any extension — one for `RasterioReader` (three-path
  branch: GDAL VSI / opener / fsspec) and one for `AsyncGeoTIFFReader`
  (linear chain through async-geotiff → async-tiff → obspec → storage).
  Color-coded by layer responsibility (user code / our package / external
  dep / storage).
- A "which reader should I use" decision table.
- End-to-end demo against a local fixture (with a built overview pyramid)
  showing the two-phase laziness model, sync metadata properties,
  numerical parity with `RasterioReader`, the `overview_level` knob with
  side-by-side shape / resolution / byte-size comparisons, concurrent
  fan-out via `asyncio.gather`, and the `async with` context manager.
- A "what this reader does NOT do" section showing
  `NotImplementedError` on `target_crs=` / `target_resolution=`, followed
  by a mini-solution for the common case: load native then warp
  post-step via `read.read_to_crs` / `read.read_reproject_like`.
- A tips/gotchas section covering the two-phase laziness, the explicit
  (not auto-picked) overview-level semantics, multi-process pickleability
  caveats, the `store=` requirement, the inverted mask convention, and
  format scope (TIFF/COG only).

Sidebars added to existing notebooks:
- `notebooks/read_from_tileserver.ipynb` — full executable cell sequence
  against Element 84's public `sentinel-cogs` S3 bucket. Opens a real
  Sentinel-2 L2A COG (10980x10980 uint16 with 4 overviews), then issues
  16 concurrent window reads via `asyncio.gather`. Markdown is explicit
  that XYZ tiles and COG windows are different protocols — not a 1:1
  swap on the same input.
- `docs/advanced/tiling_and_stitching.ipynb` — markdown sidebar with an
  `asyncio.gather` sketch for the per-tile read loop (model inference
  stays sync; only the reads parallelise).
- `docs/read_S2_SAFE_from_bucket.ipynb` — markdown sidebar with a
  `GCSStore` pseudocode block, cross-linking to the intro notebook.

`mkdocs.yml` is updated to register the new tutorial under
"Tutorials → Advanced".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 14, 2026 16:15
jejjohnson changed the title from "feat: AsyncGeoData protocol, RasterioReader bytes-path knobs, AsyncGeoTIFFReader" to "feat: new AsyncGeoData protocol, RasterioReader bytes-path knobs, AsyncGeoTIFFReader" May 14, 2026
jejjohnson self-assigned this May 14, 2026

Copilot AI left a comment

Contributor


Pull request overview

Adds an async reader stack to georeader: introduces AsyncGeoData abstract class (with shared metadata properties hoisted to GeoDataBase), AsyncGeoTIFFReader as a thin adapter over developmentseed/async-geotiff, and adds opener= / fs= / rio_open_kwargs= bytes-path knobs to RasterioReader. Includes new tests, two new advanced notebooks, and sidebars in existing notebooks.

Changes:

  • New AsyncGeoData protocol + refactor of derived metadata (bounds, res, footprint) up to GeoDataBase.
  • New AsyncGeoTIFFReader (~80 LOC adapter) under new optional [async] extra; bumps rasterio floor to >=1.4.
  • New opener=/fs=/rio_open_kwargs= keyword-only constructor knobs on RasterioReader threaded through all 7 rasterio.open sites and all 4 recursive constructions.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated no comments.

| File | Description |
|------|-------------|
| georeader/abstract_reader.py | Adds AsyncGeoData; moves bounds/res/footprint to GeoDataBase. |
| georeader/async_geotiff_reader.py | New thin async COG reader satisfying AsyncGeoData. |
| georeader/rasterio_reader.py | Adds keyword-only opener/fs/rio_open_kwargs and threads them through all open sites and sub-reader constructions. |
| pyproject.toml | Bumps rasterio floor; adds [async] extra; adds pytest-asyncio and asyncio_mode=strict. |
| mkdocs.yml | Registers the two new advanced tutorial notebooks. |
| tests/test_abstract_reader.py | Adds TestAsyncGeoData covering inherited defaults + NotImplementedError. |
| tests/test_rasterio_reader.py | Adds TestBytesPathKnobs covering default/opener/fs/exclusivity/recursive forwarding. |
| tests/test_async_geotiff_reader.py | New test file (importorskip-gated) for AsyncGeoTIFFReader. |
| docs/advanced/bytes_path_knobs.ipynb | New executable tutorial for the three bytes-path knobs. |
| docs/advanced/async_geotiff_reader.ipynb | New executable tutorial for AsyncGeoTIFFReader. |
| docs/advanced/tiling_and_stitching.ipynb | Adds async fan-out sidebar; re-encoded unicode glyphs in author/citation. |
| docs/read_S2_SAFE_from_bucket.ipynb | Adds bytes-path-knobs + async sidebars; re-encoded unicode glyphs in author/citation. |


jejjohnson requested a review from gonzmg88 May 14, 2026 16:18
jejjohnson added the enhancement New feature or request label May 14, 2026
CI's `poetry install --with dev,tutorial` fails because pyproject.toml gained
new entries (`async-geotiff` optional dep, `[async]` extra, `pytest-asyncio`
+ `obstore` + `async-geotiff` in dev) that are not reflected in poetry.lock.

Regenerated with Poetry 2.4.1 (matching CI's `version: latest`). Lockfile
format is preserved; only the new dep entries and their transitive closure
are added. Existing pins are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jejjohnson requested a review from gonzmg88 May 16, 2026 07:55

gonzmg88 left a comment

Collaborator


PR overview

This new reader is a great addition; I'm very excited to use it. I think the following comments would improve the class and align it more closely with the rest of the package.

AsyncGeoTIFFReader class

Remove read_from_bounds from AsyncGeoTIFFReader: it's not part of the RasterioReader API, nor of the other readers. For reading from bounds there's the read_from_bounds function in the read module instead (which, by the way, should work with this reader as input regardless of the crs passed, because it reprojects the bounds, not the data).

It may be worth having the same repr as RasterioReader when the reader is open, for consistency (basically showing bounds, crs, etc.).

The boundless option in read_from_window should be honored: it should produce a smaller GeoTensor if the window lies on the edges of the raster (so if AsyncGeoTIFFReader always pads, an easy fix would be to intersect the window with the raster window before the read when boundless is False). This behavior should also be tested explicitly.
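A minimal sketch of the suggested fix for boundless=False: intersect the requested window with the full-raster window before reading, so edge windows come back smaller instead of padded. The `Window` dataclass and `clip_to_raster` are hypothetical stand-ins (rasterio.windows provides its own Window class and an intersection helper):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Window:
    col_off: int
    row_off: int
    width: int
    height: int


def clip_to_raster(window: Window, raster_width: int, raster_height: int) -> Optional[Window]:
    """Intersect `window` with Window(0, 0, raster_width, raster_height).

    Returns None when the window lies entirely outside the raster.
    """
    col0 = max(window.col_off, 0)
    row0 = max(window.row_off, 0)
    col1 = min(window.col_off + window.width, raster_width)
    row1 = min(window.row_off + window.height, raster_height)
    if col1 <= col0 or row1 <= row0:
        return None
    return Window(col0, row0, col1 - col0, row1 - row0)
```

For example, a 64x64 window at offset (200, 200) on a 256x256 raster clips to 56x56, which is the smaller GeoTensor shape an explicit edge-window test would assert.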

Test the read methods in the read module against an AsyncGeoTIFFReader reader

Could you test the following methods in the read module with an AsyncGeoTIFFReader object as input:
• read_from_bounds
• read_from_polygon
• read_from_center_coords
• read_from_tile
• read_reproject(_like)
These tests already exist for RasterioReader objects. I'd use the same configuration but with AsyncGeoTIFFReader. Instead of copy-pasting, please consider refactoring the tests to systematically cover both readers.

notebooks and docs

Are you sure AsyncGeoTIFFReader works with jp2 data? Can you add an explicit example to the read_S2_SAFE_from_bucket notebook? Otherwise I would remove it from this example. Also, the SAFE reader currently works only with RasterioReader as implemented. I guess it could be configured with an option for which reader to use... This tutorial is runnable, so this behavior can be tested (jp2 files in GCP can be read anonymously).

The read_from_tileserver notebook actually demonstrates a completely different thing (reading from a tile server and stitching), so it has nothing to do with this reader. Please move the example to a separate notebook. There's an old example that reads S2 files from Element 84's public Amazon bucket; maybe this example could be moved there (and that notebook updated)? It's in the notebooks/Sentinel-2 folder.

In the AsyncGeoTIFFReader notebook example, to create the COG fixture, you can use from georeader.save import save_cog which is less verbose than using rasterio primitives.

It would also be nice for that notebook to show that all read methods work with the reader.

jejjohnson and others added 2 commits May 18, 2026 07:42
…/block_windows parity

Restructure AsyncGeoTIFFReader to mirror RasterioReader's laziness pattern:
read_from_window is now sync and returns a windowed AsyncGeoTIFFReader view
(no I/O); load() is async and performs the actual fetch. This makes the
reader work polymorphically with the entire read.* module — read_from_window,
read_from_bounds, read_from_polygon, read_from_center_coords, read_from_tile,
and read_reproject(_like) — the latter via the pre-load pattern
(`await reader.load()` then pass the GeoTensor; isinstance(data_in, GeoTensor)
short-circuit at read.py:1605 skips internal sync materialisation).
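The view+load split can be modelled without any I/O. In this toy (`ToyView` and its fields are hypothetical), `read_from_window` is sync and only composes offsets into `window_focus`, while `load` is the single awaitable step:

```python
import asyncio
from dataclasses import dataclass
from typing import Tuple

# (col_off, row_off, width, height)
WindowTuple = Tuple[int, int, int, int]


@dataclass(frozen=True)
class ToyView:
    raster_size: Tuple[int, int]  # (width, height) of the full raster
    window_focus: WindowTuple     # absolute window this view points at

    def read_from_window(self, window: WindowTuple) -> "ToyView":
        """Sync, no I/O: `window` is relative to this view; offsets compose."""
        col, row, w, h = window
        base_col, base_row, _, _ = self.window_focus
        return ToyView(self.raster_size, (base_col + col, base_row + row, w, h))

    async def load(self) -> dict:
        """Async: the only place a real reader would fetch pixel bytes."""
        await asyncio.sleep(0)  # stands in for tile range requests
        return {"window": self.window_focus}


full = ToyView((1024, 1024), (0, 0, 1024, 1024))
nested = full.read_from_window((100, 200, 256, 256)).read_from_window((10, 20, 64, 64))
loaded = asyncio.run(nested.load())
```

Because views are plain sync objects, code that only narrows windows can treat both readers alike; only the final materialisation needs `await`.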

New methods matching RasterioReader:
- overviews() / reader_overview(level): introspect and pin overview level
- block_windows(): tile-aligned iteration for fan-out reads aligned with
  the COG's internal tile grid
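`block_windows()` is described only as tile-aligned iteration. A minimal sketch of tile-aligned windows over a raster, assuming a fixed internal tile size and clipping the last row and column of tiles at the raster edge; `iter_block_windows` is a hypothetical name, and the real method may yield different types:

```python
from typing import Iterator, Tuple

# (col_off, row_off, width, height)
WindowTuple = Tuple[int, int, int, int]


def iter_block_windows(width: int, height: int, block_w: int, block_h: int) -> Iterator[WindowTuple]:
    """Yield windows aligned to the internal tile grid, row-major."""
    for row_off in range(0, height, block_h):
        for col_off in range(0, width, block_w):
            yield (
                col_off,
                row_off,
                min(block_w, width - col_off),   # last tile column may be narrower
                min(block_h, height - row_off),  # last tile row may be shorter
            )


windows = list(iter_block_windows(300, 200, 128, 128))
```

Each window covers exactly one internal tile and the windows do not overlap, so concurrent reads over them never request the same tile twice.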

Other reader changes:
- window_focus attribute tracks the current view's window
- _raster_window property replaces three inline Window(0, 0, w, h)
  constructions (parallels RasterioReader.real_window)
- fill_value_default falls back to 0 when the COG has no nodata tag
  (matches RasterioReader's nodata-if-not-none-else-0 default)
- Boundless padding routed through window_utils.get_slice_pad +
  GeoTensor.pad() (same pattern as GeoTensor.read_from_window) — replaces
  the previous bespoke np.full + offset-placement code
- Rich __repr__ when opened; aligned multi-line Affine formatting (also
  fixed in RasterioReader.__repr__)
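The commit routes boundless padding through `window_utils.get_slice_pad` and `GeoTensor.pad()`, whose exact signatures are not shown here. This is a numpy-only sketch of the same idea with hypothetical names: split each axis of the requested window into an in-bounds slice plus (before, after) pad widths, read the slice, then pad with the fill value:

```python
from typing import Tuple

import numpy as np


def axis_slice_pad(start: int, size: int, limit: int) -> Tuple[slice, Tuple[int, int]]:
    """For one axis: in-bounds slice of [start, start + size) and its padding."""
    stop = start + size
    pad_before = min(max(0, -start), size)
    in_start = min(max(start, 0), limit)
    in_stop = max(min(stop, limit), in_start)
    pad_after = size - pad_before - (in_stop - in_start)
    return slice(in_start, in_stop), (pad_before, pad_after)


def read_boundless(array: np.ndarray, row_off: int, col_off: int,
                   height: int, width: int, fill_value) -> np.ndarray:
    """Read a (..., height, width) window that may extend past the raster edges."""
    rows, row_pad = axis_slice_pad(row_off, height, array.shape[-2])
    cols, col_pad = axis_slice_pad(col_off, width, array.shape[-1])
    chunk = array[..., rows, cols]  # stands in for the real, clipped fetch
    pad_width = [(0, 0)] * (array.ndim - 2) + [row_pad, col_pad]
    return np.pad(chunk, pad_width, mode="constant", constant_values=fill_value)
```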

Protocol:
- AsyncGeoData.read_from_window is sync now (returns view), aligning with
  GeoData.read_from_window. Only load() remains async.

Tests (suite: 926 passed, 0 skipped, 0 failed; was 793):
- AsyncGeoTIFFReader: 19 reader-specific tests — overviews, reader_overview,
  block_windows, nested views, explicit nodata, focused bounds,
  read-before-open, boundless edge windows, concurrent fan-out
- New cog_with_nodata_and_overviews fixture for paths the default fixture
  doesn't reach
- test_read_dataarray.py parametrized across both readers via a
  reader_and_materialize fixture — covers all read.* functions including
  read_reproject(_like) via the pre-load pattern (no aread_* siblings
  needed)
- New polymorphic coverage: cross-CRS bounds, cross-CRS polygon, pad_add,
  return_only_data (sync-only by design)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Notebook changes (responding to gonzmg88's review on PR spaceml-org#54):

- docs/advanced/async_geotiff_reader.ipynb — rebuilt from 29 → 39 cells:
  - Added a 60-second async/await primer + "when is async worth it?"
    section for users new to async Python
  - Replaced raw rasterio fixture construction with `save_cog` (with
    `BLOCKSIZE=64` so a 256×256 raster still gets overviews)
  - Documented the view+load pattern explicitly with a quick-reference
    table; cleaned stale references to the removed
    `reader.read_from_bounds(target_crs=...)` method
  - Added a `read.*` compatibility matrix with three categories
    (✅ direct / ⚠️ pre-load required / ❌ not supported) and three
    runnable demos covering the cases
  - Added a `block_windows` tile-aligned fan-out demo (the actual
    recommended pattern for tile servers)
  - Added a bandwidth-conscious tile-reads section showing how to
    compose `read_from_bounds` + `read_to_crs` to fetch only the
    tile-region instead of pre-loading the whole COG
  - Replaced the hand-rolled "print each property" cell with
    `print(reader)` (rich __repr__) + programmatic-access assertions
  - Dropped the misleading "~80 LOC adapter" claim from the diagram
    and the module docstring; described scope honestly

- docs/read_S2_SAFE_from_bucket.ipynb — replaced the JP2-implying
  pseudocode sidebar with an upfront limitation note ("AsyncTiffException:
  unexpected magic bytes" — async-geotiff is TIFF-only, JP2 is not
  supported) plus a real runnable example against the Element 84 L2A
  COG bucket

- notebooks/read_from_tileserver.ipynb — reverted: dropped the 4-cell
  AsyncGeoTIFFReader sidebar that didn't belong (notebook is about
  XYZ tile stitching, an unrelated protocol)

- notebooks/Sentinel-2/read_s2_safe_element84_cloud.ipynb — appended
  the Element 84 async fan-out demo in the right place (alongside the
  existing pystac + S2_SAFE_reader content)

Bug fix uncovered while writing the compatibility-matrix proof:

- georeader/read.py:1832-1835 — `read.read_from_tile` had an inverted
  intersection check (`else: return` was returning None for *intersecting*
  tiles, falling through to the rest of the function only for
  non-intersecting tiles). The existing parametrized test passed by
  accident because of an `if chip_out is None: return` early-out.
  Swapped the control flow to match the docstring's promise; tightened
  the test to assert non-None for a center-of-raster tile so the
  regression can't recur silently. With the fix, `read.read_from_tile`
  joins `read_reproject` / `read_reproject_like` / `read_to_crs` in the
  ⚠️ pre-load column for async readers (the function falls through to
  `read_reproject` when the reader has no native `read_from_tile`
  method).
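The control-flow bug can be reproduced in miniature. `buggy_read_tile`, `fixed_read_tile`, and the `intersects` flag are hypothetical simplifications of `read.read_from_tile`, which in the real code also handles the tile CRS and reprojection:

```python
from typing import Optional


def buggy_read_tile(intersects: bool) -> Optional[str]:
    if not intersects:
        pass  # meant to bail out here...
    else:
        return None  # ...but the early return sits on the intersecting branch
    return "chip"


def fixed_read_tile(intersects: bool) -> Optional[str]:
    # Early-out only when the tile does not touch the raster.
    if not intersects:
        return None
    return "chip"
```

A test that stops at `if chip_out is None: return` passes for both versions; asserting a non-None result for a center-of-raster tile, as the commit now does, tells them apart.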

Suite still: 926 passed, 0 skipped, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jejjohnson requested a review from gonzmg88 May 18, 2026 06:31

gonzmg88 commented May 26, 2026

Collaborator

Okay, now I have a better understanding of the problem. Thanks for this notebook and for the async-for-newbies introduction!

I still find it disappointing that the reprojecting functions do not work without pre-loading. read_reproject was implemented to load only the input data needed for the reprojection: for instance, read_reproject_like first does a windowed read of the input raster over the area covered by the output template and then reprojects. In fact, data is only loaded in read_reproject here: https://github.com/jejjohnson/georeader/blob/main/georeader/read.py#L1606 (which is a windowed read).
The other functions (read_from_tile, read_to_crs, resize and read_reproject_like) are just wrappers around read_reproject; they do not load any input data themselves (the only place data is loaded is the line linked above).

One potential solution would be to create async siblings of all the reproject functions, but this should be done in a smart, refactored way (so that the wrapping code of these functions is reused in both the sync and the async versions). But maybe there are other smarter/cleaner approaches? (I think this must be a recurring problem for async functions in general...) The next section describes the proposal for the async sibling module of read.py.

Proposal

Create an asyncread.py module with the reproject function family (read_reproject, read_from_tile, read_to_crs, resize and read_reproject_like). This would involve refactoring the reproject functions in the read module to extract internal helpers (prefixed with _) so that code is reused between the read and asyncread modules. We should extend the tests so that every reproject-function test runs through both pathways (an AsyncGeoTIFFReader with asyncread and a RasterioReader with read).
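One way the proposed refactor could avoid duplicating wrapper logic, sketched under assumptions: all function names are hypothetical, and the "plan" and "warp" steps are placeholders. Planning and post-processing live in private sync helpers, and the public sync and async functions differ only in the single I/O step, which the comment above places at the windowed read inside read_reproject:

```python
import asyncio
from typing import Any, Dict, List, Tuple

Bounds = Tuple[float, float, float, float]


def _plan_reproject(dst_bounds: Bounds) -> Dict[str, Any]:
    """Shared, no I/O: decide which input window the output needs (placeholder math)."""
    xmin, ymin, xmax, ymax = dst_bounds
    return {"window": (int(xmin), int(ymin), int(xmax - xmin), int(ymax - ymin))}


def _finish_reproject(plan: Dict[str, Any], data: List[int]) -> Dict[str, Any]:
    """Shared, no I/O: turn the loaded chunk into the output (placeholder for the warp)."""
    return {"window": plan["window"], "n_values": len(data)}


def read_reproject_sketch(reader, dst_bounds: Bounds) -> Dict[str, Any]:
    plan = _plan_reproject(dst_bounds)
    data = reader.read_from_window(plan["window"]).load()  # the only sync I/O
    return _finish_reproject(plan, data)


async def aread_reproject_sketch(reader, dst_bounds: Bounds) -> Dict[str, Any]:
    plan = _plan_reproject(dst_bounds)
    data = await reader.read_from_window(plan["window"]).load()  # the only async I/O
    return _finish_reproject(plan, data)


# Toy readers whose views load synchronously or asynchronously.
class _SyncView:
    def __init__(self, window):
        self.window = window

    def load(self) -> List[int]:
        _, _, w, h = self.window
        return [0] * (w * h)


class _AsyncView:
    def __init__(self, window):
        self.window = window

    async def load(self) -> List[int]:
        await asyncio.sleep(0)  # stands in for range requests
        _, _, w, h = self.window
        return [0] * (w * h)


class ToySyncSource:
    def read_from_window(self, window):
        return _SyncView(window)


class ToyAsyncSource:
    def read_from_window(self, window):
        return _AsyncView(window)


sync_out = read_reproject_sketch(ToySyncSource(), (0, 0, 4, 3))
async_out = asyncio.run(aread_reproject_sketch(ToyAsyncSource(), (0, 0, 4, 3)))
```

This is only one option; a generator-based "sans-I/O" design, where the shared code yields the window it needs and receives the data back, is another common way to drive a single code path from both sync and async callers.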


Labels

enhancement New feature or request


3 participants