Draft
858 changes: 858 additions & 0 deletions docs/advanced/async_geotiff_reader.ipynb

Large diffs are not rendered by default.

402 changes: 402 additions & 0 deletions docs/advanced/bytes_path_knobs.ipynb

Large diffs are not rendered by default.

37 changes: 33 additions & 4 deletions docs/advanced/tiling_and_stitching.ipynb
@@ -7,7 +7,7 @@
"source": [
"# Tiling and stitching segmentation outputs\n",
"\n",
"* Author: Gonzalo Mateo-García\n",
"* Author: Gonzalo Mateo-Garc\u00eda\n",
"\n",
"This tutorial shows how to run an AI model by fix-size tiles following the recommendations of *Huang et al. 2018*:\n",
"\n",
@@ -76,7 +76,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 12409.18it/s]"
"100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 13/13 [00:00<00:00, 12409.18it/s]"
]
},
{
@@ -215,7 +215,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:05<00:00, 2.76it/s]\n"
"100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 15/15 [00:05<00:00, 2.76it/s]\n"
]
}
],
@@ -299,6 +299,35 @@
"cloudsen12.plot_cloudSEN12mask(output_tensor,ax=ax[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Async fan-out \u2014 swap the read loop\n",
"\n",
"The loop above issues one blocking read per tile via `RasterioReader` (sync, GDAL VSI), so tiles are fetched one after another. When many tiles come from cloud storage and reads are network-bound (a tile server, an async ML inference service), swap the per-tile read for `AsyncGeoTIFFReader` + `asyncio.gather` so the requests overlap. Model inference itself stays sync; only the reads run concurrently.\n",
"\n",
"Sketch (replace the read step in the tiling loop; top-level `await` works in Jupyter, elsewhere wrap it in an `async def`):\n",
"\n",
"```python\n",
"import asyncio\n",
"from obstore.store import S3Store\n",
"from georeader.async_geotiff_reader import AsyncGeoTIFFReader\n",
"\n",
"store = S3Store(bucket=\"my-bucket\", region=\"us-east-1\")\n",
"reader = await AsyncGeoTIFFReader.open(\"scene.tif\", store=store)\n",
"\n",
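"# `windows`, `model` and `stitch` stand for the tile windows, model and stitching step of the loop above.\n",
"# Fan out all tile reads concurrently from one process.\n",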
"chips = await asyncio.gather(*[reader.read_from_window(w) for w in windows])\n",
"\n",
"# Then run the (sync) model on each chip and stitch as before.\n",
"predictions = [model(chip.values) for chip in chips]\n",
"stitched = stitch(predictions, windows, ...)\n",
"```\n",
"\n",
"See [`async_geotiff_reader.ipynb`](async_geotiff_reader.ipynb) for the full tutorial \u2014 when to use which reader, the two-phase laziness model, gotchas, and a mini-solution for post-load warp/reproject.\n"
]
},
{
"cell_type": "markdown",
"id": "9a1d807c-be46-401e-adc4-d135e69ea17f",
@@ -319,7 +348,7 @@
"\turl = {https://www.sciencedirect.com/science/article/pii/S2352340924008163},\n",
"\tdoi = {10.1016/j.dib.2024.110852},\n",
"\tjournal = {Data in Brief},\n",
"\tauthor = {Aybar, Cesar and Bautista, Lesly and Montero, David and Contreras, Julio and Ayala, Daryl and Prudencio, Fernando and Loja, Jhomira and Ysuhuaylas, Luis and Herrera, Fernando and Gonzales, Karen and Valladares, Jeanett and Flores, Lucy A. and Mamani, Evelin and Quiñonez, Maria and Fajardo, Rai and Espinoza, Wendy and Limas, Antonio and Yali, Roy and Alcántara, Alejandro and Leyva, Martin and Loayza-Muro, Rau´l and Willems, Bram and Mateo-García, Gonzalo and Gómez-Chova, Luis},\n",
"\tauthor = {Aybar, Cesar and Bautista, Lesly and Montero, David and Contreras, Julio and Ayala, Daryl and Prudencio, Fernando and Loja, Jhomira and Ysuhuaylas, Luis and Herrera, Fernando and Gonzales, Karen and Valladares, Jeanett and Flores, Lucy A. and Mamani, Evelin and Qui\u00f1onez, Maria and Fajardo, Rai and Espinoza, Wendy and Limas, Antonio and Yali, Roy and Alc\u00e1ntara, Alejandro and Leyva, Martin and Loayza-Muro, Rau\u00b4l and Willems, Bram and Mateo-Garc\u00eda, Gonzalo and G\u00f3mez-Chova, Luis},\n",
"\tmonth = aug,\n",
"\tyear = {2024},\n",
"\tpages = {110852},\n",
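The fan-out pattern in the new tiling cell can be tried without any cloud access. The minimal sketch below assumes nothing about georeader: `fake_read` is a hypothetical stand-in for `reader.read_from_window(w)` that only sleeps to mimic network latency, and the timings depend on the machine. It runs as a script; inside Jupyter, replace `asyncio.run(main())` with `await main()`.

```python
import asyncio
import time
from typing import List


async def fake_read(window_id: int, latency_s: float = 0.1) -> str:
    """Hypothetical stand-in for `reader.read_from_window(w)`."""
    await asyncio.sleep(latency_s)  # mimics one network round trip
    return f"chip-{window_id}"


async def read_sequential(window_ids: List[int]) -> List[str]:
    # One awaited read at a time: total time grows with n * latency.
    return [await fake_read(w) for w in window_ids]


async def read_concurrent(window_ids: List[int]) -> List[str]:
    # All reads in flight at once: total time is roughly one latency.
    return list(await asyncio.gather(*[fake_read(w) for w in window_ids]))


async def main(n: int = 10):
    ids = list(range(n))
    t0 = time.perf_counter()
    seq = await read_sequential(ids)
    t_seq = time.perf_counter() - t0

    t0 = time.perf_counter()
    con = await read_concurrent(ids)
    t_con = time.perf_counter() - t0
    return seq, con, t_seq, t_con


seq, con, t_seq, t_con = asyncio.run(main())
print(f"sequential: {t_seq:.2f} s, gather: {t_con:.2f} s")
```

`asyncio.gather` returns results in argument order, so each chip still lines up with its window when stitching.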
78 changes: 73 additions & 5 deletions docs/read_S2_SAFE_from_bucket.ipynb
@@ -7,7 +7,7 @@
"source": [
"## Read Sentinel-2 files from public bucket\n",
"\n",
"* Author: Gonzalo Mateo-García\n",
"* Author: Gonzalo Mateo-Garc\u00eda\n",
"\n",
"This notebook shows how to read a Sentinel-2 SAFE file from the public Google bucket and reading a subset of it."
]
@@ -62,7 +62,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 13/13 [00:00<00:00, 26341.04it/s]"
"100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 13/13 [00:00<00:00, 26341.04it/s]"
]
},
{
@@ -193,8 +193,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 10 µs, sys: 0 ns, total: 10 µs\n",
"Wall time: 18.6 µs\n"
"CPU times: user 10 \u00b5s, sys: 0 ns, total: 10 \u00b5s\n",
"Wall time: 18.6 \u00b5s\n"
]
},
{
@@ -306,6 +306,74 @@
"# shutil.rmtree(\"deleteme\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Alternative bytes paths via `RasterioReader`\n",
"\n",
"By default, the high-level `S2_SAFE_reader` above reads bytes through GDAL's virtual file systems (VSI, implemented in C on top of libcurl), which is usually the fastest synchronous path for public cloud buckets. For workloads that need a different bytes transport (custom auth, niche backends, a Python-side adapter), `RasterioReader` exposes three keyword-only knobs:\n",
"\n",
"- `opener=callable`: passed straight to `rasterio.open(opener=...)` (available in rasterio 1.4 and later). Signature: `opener(path, mode='rb') -> file-like`.\n",
"- `fs=fsspec.AbstractFileSystem`: shortcut for `opener=fs.open`. Useful for FTP / SFTP / GitHub / MinIO with custom auth.\n",
"- `rio_open_kwargs=dict`: escape hatch for arbitrary additional `rasterio.open` kwargs.\n",
"\n",
"Sketch (pseudocode \u2014 replace with a path and credentials you have access to):\n",
"\n",
"```python\n",
"from georeader.rasterio_reader import RasterioReader\n",
"import fsspec\n",
"\n",
"granule_jp2 = \"gs://my-bucket/path/to/B04.jp2\"\n",
"\n",
"# Default: GDAL VSI, what S2_SAFE_reader uses internally\n",
"reader_default = RasterioReader(granule_jp2)\n",
"\n",
"# Alternative: route through fsspec / gcsfs\n",
"fs = fsspec.filesystem(\"gcs\", token=\"anon\")\n",
"reader_fsspec = RasterioReader(granule_jp2, fs=fs)\n",
"\n",
"# Or a fully custom opener (refresh-aware HTTP clients, sync facade over async readers, ...)\n",
"def my_opener(path, mode=\"rb\"):\n",
"    # `some_binary_file_like` is a placeholder for any readable, seekable binary file object\n",
"    return some_binary_file_like(path)\n",
"\n",
"reader_custom = RasterioReader(granule_jp2, opener=my_opener)\n",
"```\n",
"\n",
"See [`advanced/bytes_path_knobs.ipynb`](advanced/bytes_path_knobs.ipynb) for a fully executable end-to-end demo against a local fixture.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Async alternative \u2014 `AsyncGeoTIFFReader` for high-concurrency reads\n",
"\n",
"`S2_SAFE_reader` and the `opener=` / `fs=` knobs above are **sync**: each read blocks until it completes. For workloads that fan out many concurrent reads from one process (a tile server serving S2 chips, an async ML inference service), `AsyncGeoTIFFReader` + `asyncio.gather` is the right shape. It is COG-only, so it cannot open the per-band JPEG2000 (`.jp2`) granules of a SAFE product or its XML metadata; point it at COG copies of the bands instead. It takes any `obspec.AsyncStore` (`obstore.store.GCSStore` here) and skips GDAL entirely on the read path.\n",
"\n",
"Sketch (pseudocode \u2014 needs real bucket coordinates and credentials):\n",
"\n",
"```python\n",
"import asyncio\n",
"from obstore.store import GCSStore\n",
"from georeader.async_geotiff_reader import AsyncGeoTIFFReader\n",
"\n",
"# Hypothetical bucket holding COG copies of the S2 bands: the public\n",
"# gcp-public-data-sentinel-2 bucket serves JP2 granules, which this reader cannot open.\n",
"store = GCSStore(bucket=\"my-s2-cogs-bucket\")\n",
"\n",
"# One reader per granule; in tile-server use these are cached in an LRU.\n",
"reader = await AsyncGeoTIFFReader.open(\"path/to/B04.tif\", store=store)\n",
"\n",
"# Fan out across N windows of one granule:\n",
"chips = await asyncio.gather(*[reader.read_from_window(w) for w in windows])\n",
"\n",
"# Or across N granules concurrently:\n",
"readers = await asyncio.gather(*[AsyncGeoTIFFReader.open(p, store=store) for p in paths])\n",
"scenes = await asyncio.gather(*[r.load() for r in readers])\n",
"```\n",
"\n",
"See [`advanced/async_geotiff_reader.ipynb`](advanced/async_geotiff_reader.ipynb) for the full tutorial, diagrams, and a mini-solution for warp-after-load.\n"
]
},
{
"cell_type": "markdown",
"id": "d45f3f30-150e-487e-a8e6-93df89f542c8",
@@ -327,7 +395,7 @@
"\tnumber = {1},\n",
"\turldate = {2023-11-30},\n",
"\tjournal = {Scientific Reports},\n",
"\tauthor = {Portalés-Julià, Enrique and Mateo-García, Gonzalo and Purcell, Cormac and Gómez-Chova, Luis},\n",
"\tauthor = {Portal\u00e9s-Juli\u00e0, Enrique and Mateo-Garc\u00eda, Gonzalo and Purcell, Cormac and G\u00f3mez-Chova, Luis},\n",
"\tmonth = nov,\n",
"\tyear = {2023},\n",
"\tpages = {20316},\n",
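The opener contract described in the `RasterioReader` cell (`opener(path, mode='rb') -> file-like`) can be illustrated with a pure-Python opener. This is a minimal sketch: `make_memory_opener` is a hypothetical helper, not a georeader or rasterio function, and it assumes the bytes are already in memory. It would be passed as `RasterioReader(path, opener=...)`, which the snippet does not call.

```python
import io
from typing import Callable, Dict


def make_memory_opener(blobs: Dict[str, bytes]) -> Callable[..., io.BytesIO]:
    """Hypothetical helper: an opener serving bytes already held in memory."""

    def opener(path: str, mode: str = "rb") -> io.BytesIO:
        # Reading only: refuse write, append and update modes.
        if any(flag in mode for flag in ("w", "a", "+")):
            raise ValueError(f"read-only opener, got mode={mode!r}")
        try:
            data = blobs[path]
        except KeyError:
            raise FileNotFoundError(path) from None
        # A fresh seekable buffer per call, so separate opens don't share a cursor.
        return io.BytesIO(data)

    return opener


opener = make_memory_opener({"scene.tif": b"II*\x00" + bytes(12)})
f = opener("scene.tif")
print(f.read(4))  # little-endian TIFF magic bytes
# Would be passed as: RasterioReader("scene.tif", opener=opener)  (not run here)
```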
105 changes: 79 additions & 26 deletions georeader/abstract_reader.py
@@ -165,6 +165,31 @@ def width(self) -> int:
     def height(self) -> int:
         return self.shape[-2]
 
+    @property
+    def res(self) -> Tuple[float, float]:
+        return window_utils.res(self.transform)
+
+    @property
+    def bounds(self) -> Tuple[float, float, float, float]:
+        return window_utils.window_bounds(
+            rasterio.windows.Window(
+                row_off=0, col_off=0, height=self.shape[-2], width=self.shape[-1]
+            ),
+            self.transform,
+        )
+
+    def footprint(self, crs: Optional[str] = None) -> Polygon:
+        pol = window_utils.window_polygon(
+            rasterio.windows.Window(
+                row_off=0, col_off=0, height=self.shape[-2], width=self.shape[-1]
+            ),
+            self.transform,
+        )
+        if (crs is None) or window_utils.compare_crs(self.crs, crs):
+            return pol
+
+        return window_utils.polygon_to_crs(pol, self.crs, crs)
+
 
 @dataclass
 class FakeGeoData:
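The `bounds` and `res` properties added in this hunk delegate to `window_utils`. The arithmetic behind them can be sketched in plain Python as below; this is an illustration under the usual affine convention, not georeader's implementation, whose handling of signs or rotation may differ. The transform values are a hypothetical 10 m north-up grid.

```python
from typing import Tuple

Affine6 = Tuple[float, float, float, float, float, float]


def full_bounds(transform: Affine6, height: int, width: int) -> Tuple[float, float, float, float]:
    """(minx, miny, maxx, maxy) of the full raster window.

    `transform` is (a, b, c, d, e, f) in affine/rasterio order:
    x = a*col + b*row + c and y = d*col + e*row + f.
    """
    a, b, c, d, e, f = transform
    # Outer pixel-edge corners of the window, as (col, row).
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    xs = [a * col + b * row + c for col, row in corners]
    ys = [d * col + e * row + f for col, row in corners]
    return min(xs), min(ys), max(xs), max(ys)


def pixel_res(transform: Affine6) -> Tuple[float, float]:
    """Pixel size along columns and rows (length of each pixel edge vector)."""
    a, b, _, d, e, _ = transform
    return (a * a + d * d) ** 0.5, (b * b + e * e) ** 0.5


# Hypothetical 10 m north-up UTM grid: 100 rows x 200 columns.
t = (10.0, 0.0, 500000.0, 0.0, -10.0, 4000000.0)
print(full_bounds(t, height=100, width=200))  # (500000.0, 3999000.0, 502000.0, 4000000.0)
print(pixel_res(t))  # (10.0, 10.0)
```

Taking the min/max over all four corners keeps the result valid for rotated transforms, where the top-left corner is not necessarily the minimum.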
@@ -197,56 +222,84 @@ def read_from_window(
         raise NotImplementedError(
             "read_from_window method must be implemented in the subclass"
         )
+
     @property
     def values(self) -> np.ndarray:
         # return np.zeros(self.shape, dtype=self.dtype)
         return self.load(boundless=True).values
 
-    @property
-    def res(self) -> Tuple[float, float]:
-        return window_utils.res(self.transform)
-
-
     @property
     def dtype(self) -> Any:
         raise NotImplementedError(
             "dtype property must be implemented in the subclass"
         )
 
     @property
     def dims(self) -> list[str]:
         raise NotImplementedError(
             "dims property must be implemented in the subclass"
         )
 
     @property
     def fill_value_default(self) -> Any:
         raise NotImplementedError(
             "fill_value_default property must be implemented in the subclass"
         )
 
-    @property
-    def bounds(self) -> Tuple[float, float, float, float]:
-        return window_utils.window_bounds(
-            rasterio.windows.Window(
-                row_off=0, col_off=0, height=self.shape[-2], width=self.shape[-1]
-            ),
-            self.transform,
+
+AbstractGeoData = GeoData
+
+
+class AsyncGeoData(GeoDataBase):
+    """Async mirror of :class:`GeoData`.
+
+    Concrete async readers (e.g. ``AsyncGeoTIFFReader``) satisfy this
+    interface. User code typed against ``AsyncGeoData`` accepts any
+    conforming async reader without isinstance checks.
+
+    Inherits the metadata surface and derived properties (``transform``,
+    ``crs``, ``shape``, ``width``, ``height``, ``bounds``, ``res``,
+    ``footprint``) from :class:`GeoDataBase`. Adds ``async`` read methods
+    (``load``, ``read_from_window``) and the read-tier metadata
+    properties (``dtype``, ``dims``, ``fill_value_default``).
+
+    Notes
+    -----
+    There is no ``values`` property here (unlike :class:`GeoData`, where it
+    materialises via a sync ``self.load()``). Properties cannot be ``async``,
+    so callers materialise via ``await reader.load()`` and read
+    ``.values`` on the returned :class:`~georeader.geotensor.GeoTensor`.
+    """
+
+    async def load(self, boundless: bool = True) -> GeoTensor:
+        raise NotImplementedError(
+            "load method must be implemented in the subclass"
         )
-
-    def footprint(self, crs: Optional[str] = None) -> Polygon:
-        pol = window_utils.window_polygon(
-            rasterio.windows.Window(
-                row_off=0, col_off=0, height=self.shape[-2], width=self.shape[-1]
-            ),
-            self.transform,
+
+    async def read_from_window(
+        self, window: rasterio.windows.Window, boundless: bool = True
+    ) -> Union["AsyncGeoData", GeoTensor]:
+        raise NotImplementedError(
+            "read_from_window method must be implemented in the subclass"
         )
-        if (crs is None) or window_utils.compare_crs(self.crs, crs):
-            return pol
 
-        return window_utils.polygon_to_crs(pol, self.crs, crs)
+    @property
+    def dtype(self) -> Any:
+        raise NotImplementedError(
+            "dtype property must be implemented in the subclass"
+        )
 
-AbstractGeoData = GeoData
+    @property
+    def dims(self) -> list[str]:
+        raise NotImplementedError(
+            "dims property must be implemented in the subclass"
+        )
+
+    @property
+    def fill_value_default(self) -> Any:
+        raise NotImplementedError(
+            "fill_value_default property must be implemented in the subclass"
+        )
 
 
 def same_extent(geo1: GeoData, geo2: GeoData, precision: float = 1e-3) -> bool:
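The `AsyncGeoData` note that callers `await reader.load()` and then read `.values` on the result can be illustrated with a toy class. This is a hypothetical sketch, not georeader code: `Chip` stands in for `GeoTensor`, and `read_from_window` takes integer offsets instead of a `rasterio.windows.Window`. It only mirrors the split between sync metadata and awaited reads.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Tuple

Grid = List[List[int]]


@dataclass
class Chip:
    """Hypothetical stand-in for GeoTensor: materialised pixel values."""
    values: Grid


class InMemoryAsyncReader:
    """Toy reader shaped like AsyncGeoData: sync metadata, awaited reads."""

    def __init__(self, values: Grid):
        self._values = values

    @property
    def shape(self) -> Tuple[int, int]:
        # Metadata is cheap, so it stays a plain (sync) property.
        return len(self._values), len(self._values[0])

    async def load(self, boundless: bool = True) -> Chip:
        # `boundless` only mirrors the signature; the toy ignores it.
        await asyncio.sleep(0)  # a real reader would await I/O here
        return Chip([row[:] for row in self._values])

    async def read_from_window(
        self, row_off: int, col_off: int, height: int, width: int
    ) -> Chip:
        await asyncio.sleep(0)
        rows = self._values[row_off:row_off + height]
        return Chip([row[col_off:col_off + width] for row in rows])


async def demo():
    reader = InMemoryAsyncReader([[1, 2, 3], [4, 5, 6]])
    # No `reader.values`: await a read, then use `.values` on the result.
    full = (await reader.load()).values
    chips = await asyncio.gather(
        reader.read_from_window(0, 0, 1, 2),
        reader.read_from_window(1, 1, 1, 2),
    )
    return reader.shape, full, [chip.values for chip in chips]


shape, full, windows = asyncio.run(demo())
print(shape, full, windows)  # (2, 3) [[1, 2, 3], [4, 5, 6]] [[[1, 2]], [[5, 6]]]
```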