Draft
1,286 changes: 1,286 additions & 0 deletions docs/advanced/async_geotiff_reader.ipynb

Large diffs are not rendered by default.

402 changes: 402 additions & 0 deletions docs/advanced/bytes_path_knobs.ipynb

Large diffs are not rendered by default.

37 changes: 33 additions & 4 deletions docs/advanced/tiling_and_stitching.ipynb
Expand Up @@ -7,7 +7,7 @@
"source": [
"# Tiling and stitching segmentation outputs\n",
"\n",
"* Author: Gonzalo Mateo-García\n",
"* Author: Gonzalo Mateo-Garc\u00eda\n",
"\n",
"This tutorial shows how to run an AI model on fixed-size tiles following the recommendations of *Huang et al. 2018*:\n",
"\n",
Expand Down Expand Up @@ -76,7 +76,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 12409.18it/s]"
"100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 13/13 [00:00<00:00, 12409.18it/s]"
]
},
{
Expand Down Expand Up @@ -215,7 +215,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:05<00:00, 2.76it/s]\n"
"100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 15/15 [00:05<00:00, 2.76it/s]\n"
]
}
],
Expand Down Expand Up @@ -299,6 +299,35 @@
"cloudsen12.plot_cloudSEN12mask(output_tensor,ax=ax[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Async fan-out \u2014 swap the read loop\n",
"\n",
"The loop above issues one read per tile via `RasterioReader` (sync, GDAL VSI). For workloads where many tiles come from cloud storage and reads are network-bound (a tile server, an async ML inference service), swap the per-tile read for `AsyncGeoTIFFReader` + `asyncio.gather`. The model inference itself stays sync \u2014 only the reads parallelise.\n",
"\n",
"Sketch (replace the read step in the tiling loop):\n",
"\n",
"```python\n",
"import asyncio\n",
"from obstore.store import S3Store\n",
"from georeader.async_geotiff_reader import AsyncGeoTIFFReader\n",
"\n",
"store = S3Store(bucket=\"my-bucket\", region=\"us-east-1\")\n",
"reader = await AsyncGeoTIFFReader.open(\"scene.tif\", store=store)\n",
"\n",
"# Fan out all tile reads concurrently from one process.\n",
"chips = await asyncio.gather(*[reader.read_from_window(w).load() for w in windows])\n",
"\n",
"# Then run the (sync) model on each chip and stitch as before.\n",
"predictions = [model(chip.values) for chip in chips]\n",
"stitched = stitch(predictions, windows, ...)\n",
"```\n",
"\n",
"See [`async_geotiff_reader.ipynb`](async_geotiff_reader.ipynb) for the full tutorial \u2014 when to use which reader, the two-phase laziness model, gotchas, and a mini-solution for post-load warp/reproject.\n"
]
},
{
"cell_type": "markdown",
"id": "9a1d807c-be46-401e-adc4-d135e69ea17f",
Expand All @@ -319,7 +348,7 @@
"\turl = {https://www.sciencedirect.com/science/article/pii/S2352340924008163},\n",
"\tdoi = {10.1016/j.dib.2024.110852},\n",
"\tjournal = {Data in Brief},\n",
"\tauthor = {Aybar, Cesar and Bautista, Lesly and Montero, David and Contreras, Julio and Ayala, Daryl and Prudencio, Fernando and Loja, Jhomira and Ysuhuaylas, Luis and Herrera, Fernando and Gonzales, Karen and Valladares, Jeanett and Flores, Lucy A. and Mamani, Evelin and Quiñonez, Maria and Fajardo, Rai and Espinoza, Wendy and Limas, Antonio and Yali, Roy and Alcántara, Alejandro and Leyva, Martin and Loayza-Muro, Rau´l and Willems, Bram and Mateo-García, Gonzalo and Gómez-Chova, Luis},\n",
"\tauthor = {Aybar, Cesar and Bautista, Lesly and Montero, David and Contreras, Julio and Ayala, Daryl and Prudencio, Fernando and Loja, Jhomira and Ysuhuaylas, Luis and Herrera, Fernando and Gonzales, Karen and Valladares, Jeanett and Flores, Lucy A. and Mamani, Evelin and Qui\u00f1onez, Maria and Fajardo, Rai and Espinoza, Wendy and Limas, Antonio and Yali, Roy and Alc\u00e1ntara, Alejandro and Leyva, Martin and Loayza-Muro, Rau\u00b4l and Willems, Bram and Mateo-Garc\u00eda, Gonzalo and G\u00f3mez-Chova, Luis},\n",
"\tmonth = aug,\n",
"\tyear = {2024},\n",
"\tpages = {110852},\n",
Expand Down
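The fan-out pattern in the new markdown cell relies on `read_from_window` being sync (it returns a view) and `load` being the awaitable step. A minimal sketch of that pattern, runnable without network access, is shown here. `Window`, `StubAsyncReader`, `read_tiles` and `max_concurrency` are hypothetical names used only for illustration; they are not part of georeader, and the semaphore cap is an assumption about what a caller might want, not something the PR implements.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical stand-in for rasterio.windows.Window."""
    col_off: int
    row_off: int
    width: int
    height: int


class StubAsyncReader:
    """Hypothetical reader: sync read_from_window, async load."""

    def __init__(self, window=None):
        self.window = window

    def read_from_window(self, window):
        # No I/O here: only build a windowed view of the reader.
        return StubAsyncReader(window)

    async def load(self):
        # Simulate a network-bound read; a real reader would fetch bytes here.
        await asyncio.sleep(0.01)
        return (self.window.height, self.window.width)


async def read_tiles(reader, windows, max_concurrency=8):
    """Read all windows concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def read_one(window):
        async with semaphore:
            return await reader.read_from_window(window).load()

    # gather preserves input order, so results line up with windows.
    return await asyncio.gather(*(read_one(w) for w in windows))


windows = [Window(col_off=256 * i, row_off=0, width=256, height=256) for i in range(16)]
shapes = asyncio.run(read_tiles(StubAsyncReader(), windows))
```

Because `asyncio.gather` returns results in the order the awaitables were passed, the stitched output can index predictions by the same position as `windows`. In a notebook, replace `asyncio.run(...)` with a top-level `await`.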
112 changes: 112 additions & 0 deletions docs/read_S2_SAFE_from_bucket.ipynb
Expand Up @@ -306,6 +306,118 @@
"# shutil.rmtree(\"deleteme\")"
]
},
{
"cell_type": "markdown",
"id": "27671b69",
"metadata": {},
"source": [
"## Alternative bytes paths via `RasterioReader`\n",
"\n",
"The high-level `S2_SAFE_reader` above routes bytes through GDAL VSI (libcurl in C) by default — the fastest sync path for public cloud buckets. For workloads that need a different bytes transport (custom auth, niche backends, a Python-side adapter), `RasterioReader` exposes three keyword-only knobs:\n",
"\n",
"- `opener=callable` — passed straight to `rasterio.open(opener=...)`. Signature: `opener(path, mode='rb') -> file-like`.\n",
"- `fs=fsspec.AbstractFileSystem` — shortcut for `opener=fs.open`. Useful for FTP / SFTP / GitHub / MinIO with custom auth.\n",
"- `rio_open_kwargs=dict` — escape hatch for arbitrary additional kwargs.\n",
"\n",
"Sketch (pseudocode — replace with a path and credentials you have access to):\n",
"\n",
"```python\n",
"from georeader.rasterio_reader import RasterioReader\n",
"import fsspec\n",
"\n",
"granule_jp2 = \"gs://my-bucket/path/to/B04.jp2\"\n",
"\n",
"# Default: GDAL VSI, what S2_SAFE_reader uses internally\n",
"reader_default = RasterioReader(granule_jp2)\n",
"\n",
"# Alternative: route through fsspec / gcsfs\n",
"fs = fsspec.filesystem(\"gcs\", token=\"anon\")\n",
"reader_fsspec = RasterioReader(granule_jp2, fs=fs)\n",
"\n",
"# Or a fully custom opener (refresh-aware HTTP clients, sync facade over async readers, ...)\n",
"def my_opener(path, mode=\"rb\"):\n",
" return some_binary_file_like(path)\n",
"reader_custom = RasterioReader(granule_jp2, opener=my_opener)\n",
"```\n",
"\n",
"See [`advanced/bytes_path_knobs.ipynb`](advanced/bytes_path_knobs.ipynb) for a fully executable end-to-end demo against a local fixture.\n"
]
},
{
"cell_type": "markdown",
"id": "9bbf0752",
"metadata": {},
"source": [
"## Async alternative — `AsyncGeoTIFFReader` for high-concurrency reads\n",
"\n",
"`S2_SAFE_reader` and the `opener=` / `fs=` knobs above are **sync** — one\n",
"read at a time. For workloads that fan out many concurrent reads from one\n",
"process (a tile server serving S2 chips, an async ML inference service),\n",
"`AsyncGeoTIFFReader` + `asyncio.gather` is the right shape.\n",
"\n",
"**Important limitation: this reader does not work with the JP2 bands in\n",
"the SAFE archive.** `AsyncGeoTIFFReader` is a thin adapter over\n",
"[`developmentseed/async-geotiff`](https://github.com/developmentseed/async-geotiff),\n",
"which parses files as **TIFF only**. Opening a `.jp2` from the SAFE\n",
"archive raises `AsyncTiffException: unexpected magic bytes`. Sentinel-2\n",
"L1C is published as JP2 in the SAFE archive — for that format you must\n",
"use `RasterioReader` (which routes through GDAL's JP2 driver).\n",
"\n",
"If you want **async** Sentinel-2 access, switch buckets: the\n",
"[Element 84 `sentinel-cogs` bucket on AWS](https://registry.opendata.aws/sentinel-2-l2a-cogs/)\n",
"hosts L2A scenes as **per-band Cloud-Optimized GeoTIFFs**, which\n",
"`AsyncGeoTIFFReader` reads natively. The cell below demonstrates the\n",
"actual working flow against a public L2A scene — anonymous read, no\n",
"credentials needed.\n",
"\n",
"For the full `AsyncGeoTIFFReader` tutorial (two-phase laziness,\n",
"overviews, fan-out patterns, `read.*` polymorphism, warp-after-load),\n",
"see [`advanced/async_geotiff_reader.ipynb`](advanced/async_geotiff_reader.ipynb).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "954fe0d2",
"metadata": {},
"outputs": [],
"source": [
"# Runnable example: async reads of an Element 84 L2A scene (per-band COGs).\n",
"# The bucket is anonymously readable, so no credentials are needed — but\n",
"# you need network access for this cell to execute.\n",
"import asyncio\n",
"\n",
"import rasterio.windows\n",
"from obstore.store import S3Store\n",
"\n",
"from georeader.async_geotiff_reader import AsyncGeoTIFFReader\n",
"\n",
"# A stable public scene from the L2A COG bucket (MGRS T49SGV, May 2022).\n",
"store = S3Store(bucket=\"sentinel-cogs\", region=\"us-west-2\", skip_signature=True)\n",
"scene_prefix = \"sentinel-s2-l2a-cogs/49/S/GV/2022/5/S2B_49SGV_20220527_0_L2A\"\n",
"band_paths = [f\"{scene_prefix}/B04.tif\", f\"{scene_prefix}/B03.tif\", f\"{scene_prefix}/B02.tif\"]\n",
"\n",
"# Open one reader per band concurrently (each open() = one HEAD-ish IFD fetch)\n",
"readers = await asyncio.gather(\n",
" *[AsyncGeoTIFFReader.open(p, store=store) for p in band_paths]\n",
")\n",
"print(f\"Opened {len(readers)} band readers\")\n",
"print(f\" B04 shape: {readers[0].shape}, crs: {readers[0].crs}, res: {readers[0].res}\")\n",
"\n",
"# Fan out 16 concurrent 256x256 window reads, cycling the windows across the three bands\n",
"windows = [\n",
" rasterio.windows.Window(col_off=5000 + (i % 4) * 256,\n",
" row_off=5000 + (i // 4) * 256,\n",
" width=256, height=256)\n",
" for i in range(16)\n",
"]\n",
"chips = await asyncio.gather(\n",
" *[readers[i % 3].read_from_window(w).load() for i, w in enumerate(windows)]\n",
")\n",
"print(f\"Fetched {len(chips)} chips across 3 bands concurrently from one event loop\")\n",
"print(f\" first chip shape: {chips[0].values.shape}\")\n"
]
},
{
"cell_type": "markdown",
"id": "d45f3f30-150e-487e-a8e6-93df89f542c8",
Expand Down
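The notebook text describes the `opener(path, mode='rb') -> file-like` contract and notes that the async reader accepts TIFF only. One way to act on both points is to sniff the leading bytes through an opener before choosing a reader. The sketch below uses an in-memory opener so it runs offline. `sniff_format`, `memory_opener` and `FAKE_BUCKET` are hypothetical helpers, not georeader APIs; the magic numbers are the standard TIFF/BigTIFF byte-order marks and the JPEG 2000 signature box.

```python
import io

# Classic TIFF and BigTIFF headers, little- and big-endian.
TIFF_MAGICS = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")
# JPEG 2000 (JP2) signature box: 12 bytes.
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"


def sniff_format(opener, path):
    """Return 'tiff', 'jp2' or 'unknown' from the first 12 bytes of path."""
    with opener(path, "rb") as f:
        head = f.read(12)
    if head[:4] in TIFF_MAGICS:
        return "tiff"
    if head == JP2_SIGNATURE:
        return "jp2"
    return "unknown"


# Hypothetical in-memory "bucket" holding only file headers.
FAKE_BUCKET = {
    "scene/B04.tif": b"II*\x00" + b"\x08\x00\x00\x00" + b"\x00" * 4,
    "scene/B04.jp2": JP2_SIGNATURE + b"\x00" * 4,
}


def memory_opener(path, mode="rb"):
    # Same shape as the opener contract: (path, mode) -> binary file-like.
    return io.BytesIO(FAKE_BUCKET[path])


tif_kind = sniff_format(memory_opener, "scene/B04.tif")
jp2_kind = sniff_format(memory_opener, "scene/B04.jp2")
```

A caller could route `"tiff"` paths to the async reader and everything else to `RasterioReader`; that routing is a design suggestion, not behaviour implemented in this PR.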
115 changes: 89 additions & 26 deletions georeader/abstract_reader.py
Expand Up @@ -165,6 +165,31 @@ def width(self) -> int:
def height(self) -> int:
return self.shape[-2]

@property
def res(self) -> Tuple[float, float]:
return window_utils.res(self.transform)

@property
def bounds(self) -> Tuple[float, float, float, float]:
return window_utils.window_bounds(
rasterio.windows.Window(
row_off=0, col_off=0, height=self.shape[-2], width=self.shape[-1]
),
self.transform,
)

def footprint(self, crs: Optional[str] = None) -> Polygon:
pol = window_utils.window_polygon(
rasterio.windows.Window(
row_off=0, col_off=0, height=self.shape[-2], width=self.shape[-1]
),
self.transform,
)
if (crs is None) or window_utils.compare_crs(self.crs, crs):
return pol

return window_utils.polygon_to_crs(pol, self.crs, crs)


@dataclass
class FakeGeoData:
Expand Down Expand Up @@ -197,56 +222,94 @@ def read_from_window(
raise NotImplementedError(
"read_from_window method must be implemented in the subclass"
)

@property
def values(self) -> np.ndarray:
# return np.zeros(self.shape, dtype=self.dtype)
return self.load(boundless=True).values

@property
def res(self) -> Tuple[float, float]:
return window_utils.res(self.transform)


@property
def dtype(self) -> Any:
raise NotImplementedError(
"dtype property must be implemented in the subclass"
)

@property
def dims(self) -> list[str]:
raise NotImplementedError(
"dims property must be implemented in the subclass"
)

@property
def fill_value_default(self) -> Any:
raise NotImplementedError(
"fill_value_default property must be implemented in the subclass"
)

@property
def bounds(self) -> Tuple[float, float, float, float]:
return window_utils.window_bounds(
rasterio.windows.Window(
row_off=0, col_off=0, height=self.shape[-2], width=self.shape[-1]
),
self.transform,

AbstractGeoData = GeoData


class AsyncGeoData(GeoDataBase):
"""Async mirror of :class:`GeoData`.

Concrete async readers (e.g. ``AsyncGeoTIFFReader``) satisfy this
interface. User code typed against ``AsyncGeoData`` accepts any
conforming async reader without isinstance checks.

Inherits the metadata surface and derived properties (``transform``,
``crs``, ``shape``, ``width``, ``height``, ``bounds``, ``res``,
``footprint``) from :class:`GeoDataBase`. Adds an ``async`` ``load``
method, a **sync** ``read_from_window`` that returns a windowed view
(mirroring :class:`~georeader.rasterio_reader.RasterioReader`), and
the read-tier metadata properties (``dtype``, ``dims``,
``fill_value_default``).

Notes
-----
There is no ``values`` property here (unlike :class:`GeoData`, where it
materialises via a sync ``self.load()``). Properties cannot be ``async``,
so callers materialise via ``await reader.load()`` and read
``.values`` on the returned :class:`~georeader.geotensor.GeoTensor`.

``read_from_window`` is **sync** by design: like
:meth:`RasterioReader.read_from_window`, it only constructs a windowed
view of the reader and performs no I/O. This means
:func:`georeader.read.read_from_window` (and other ``read.*``
functions) work polymorphically with both sync and async readers —
the only difference is that the returned async view must be
materialised via ``await view.load()``.
"""

async def load(self, boundless: bool = True) -> GeoTensor:
raise NotImplementedError(
"load method must be implemented in the subclass"
)

def footprint(self, crs: Optional[str] = None) -> Polygon:
pol = window_utils.window_polygon(
rasterio.windows.Window(
row_off=0, col_off=0, height=self.shape[-2], width=self.shape[-1]
),
self.transform,

def read_from_window(
self, window: rasterio.windows.Window, boundless: bool = True
) -> Union["AsyncGeoData", GeoTensor]:
raise NotImplementedError(
"read_from_window method must be implemented in the subclass"
)
if (crs is None) or window_utils.compare_crs(self.crs, crs):
return pol

return window_utils.polygon_to_crs(pol, self.crs, crs)
@property
def dtype(self) -> Any:
raise NotImplementedError(
"dtype property must be implemented in the subclass"
)

AbstractGeoData = GeoData
@property
def dims(self) -> list[str]:
raise NotImplementedError(
"dims property must be implemented in the subclass"
)

@property
def fill_value_default(self) -> Any:
raise NotImplementedError(
"fill_value_default property must be implemented in the subclass"
)


def same_extent(geo1: GeoData, geo2: GeoData, precision: float = 1e-3) -> bool:
Expand Down
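The refactor above moves `res`, `bounds` and `footprint` into the shared base so that sync and async readers derive them identically from `transform` and `shape`. A simplified sketch of that split follows. `GeoMetaBase`, `SyncStub` and `AsyncStub` are hypothetical stand-ins, and the arithmetic assumes a north-up, unrotated affine transform given as the tuple `(a, b, c, d, e, f)`; the real code delegates to `window_utils`, which may handle more cases.

```python
import asyncio


class GeoMetaBase:
    """Hypothetical stand-in for the shared metadata base class."""

    def __init__(self, transform, shape):
        # transform is (a, b, c, d, e, f); shape is (bands, height, width).
        self.transform = transform
        self.shape = shape

    @property
    def res(self):
        a, _, _, _, e, _ = self.transform
        return (abs(a), abs(e))

    @property
    def bounds(self):
        # (left, bottom, right, top), assuming no rotation terms.
        a, _, c, _, e, f = self.transform
        height, width = self.shape[-2], self.shape[-1]
        xs = (c, c + a * width)
        ys = (f, f + e * height)
        return (min(xs), min(ys), max(xs), max(ys))


class SyncStub(GeoMetaBase):
    def load(self):
        return "loaded synchronously"


class AsyncStub(GeoMetaBase):
    async def load(self):
        await asyncio.sleep(0)  # placeholder for real async I/O
        return "loaded asynchronously"


transform = (10.0, 0.0, 500000.0, 0.0, -10.0, 4000000.0)
sync_reader = SyncStub(transform, (3, 100, 200))
async_reader = AsyncStub(transform, (3, 100, 200))
sync_result = sync_reader.load()
async_result = asyncio.run(async_reader.load())
```

Only `load` differs between the two stubs; the metadata properties need no `await`, which is why code that inspects `bounds` or `res` can stay agnostic of the reader flavour.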