Waterbodies extraction pipeline by LucaRom · Pull Request #605 · NRCan/geo-deep-learning

LucaRom · 2025-11-03T22:28:15Z

Draft for now

Will follow soon...

- Add comprehensive module and function docstrings
- Replace print() with logging (log.info, log.warning, log.exception)
- Replace assert statements with proper ValueError/RuntimeError exceptions
- Fix exception handling: pre-assign messages, use specific exception types
- Add timeout and SSL verification to requests.get()
- Use tempfile.gettempdir() instead of hardcoded /tmp
- Fix line length issues (E501)
- Fix boolean argument positioning with * separator (FBT001/FBT002)
- Rename generic 'df' variables to more descriptive names
- Add noqa comments for acceptable complexity (C901) and argument count (PLR0913)
- Fix blank line spacing in docstrings (D204, D205)
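Several of the items above (keyword-only booleans via a `*` separator, pre-assigned exception messages instead of `assert`, `tempfile.gettempdir()` instead of `/tmp`, and `logging` instead of `print()`) can be seen together in a small sketch. `write_tile_report` and its parameters are hypothetical and are not a function from this PR.

```python
from __future__ import annotations

import logging
import tempfile
from pathlib import Path

log = logging.getLogger(__name__)


def write_tile_report(
    tile_ids: list[str],
    output_dir: Path | None = None,
    *,  # everything after this is keyword-only (addresses FBT001/FBT002)
    overwrite: bool = False,
) -> Path:
    """Write tile IDs to a text file (hypothetical helper for illustration)."""
    # An explicit exception still fires under `python -O`, unlike assert.
    if not tile_ids:
        msg = "tile_ids must not be empty"  # pre-assigned message
        raise ValueError(msg)

    # Resolve the platform temp directory instead of hardcoding /tmp.
    base = output_dir if output_dir is not None else Path(tempfile.gettempdir())
    report = base / "tile_report.txt"

    if report.exists() and not overwrite:
        msg = f"{report} already exists; pass overwrite=True to replace it"
        raise RuntimeError(msg)

    report.write_text("\n".join(tile_ids) + "\n", encoding="utf-8")
    log.info("Wrote %d tile IDs to %s", len(tile_ids), report)
    return report
```

With the `*` separator, a call such as `write_tile_report(ids, None, True)` raises `TypeError`, so the boolean flag has to be named at the call site.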

- Overview of water extraction module as extension of geo-deep-learning
- Three usage patterns: fully automated, manual preprocessing, inference
- Detailed pipeline documentation for each preprocessing step
- Configuration parameters reference table
- Output directory structure explanation
- Troubleshooting section for common issues
- Examples for all usage methods

Merge main with water_extraction

…g and inference
- Overhauled TWI computation with tiled processing for large AOIs (>100 GB)
- Implemented intensity alignment and AOI cropping with BIGTIFF support
- Fixed raster stacking to preserve per-band NoData and dtype
- Added stratified train/val/test splitting with water-ratio stratification
- Enhanced tile filtering with authoritative statistics export
- Created standalone inference.py with end-to-end pipeline
- Added timing tracking (start/end timestamps, duration)
- Improved LiDAR mask rasterization with compression/tiling
- Extended download_elevation.py with retry logic and error handling
- Updated dependencies (whitebox_workflows, pandas)
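One way to split tiles into train/val/test while stratifying on water ratio is to bin each tile's water fraction into quantile bins and split within each bin, so every split sees dry, mixed, and water-dominated tiles. This is a sketch under assumptions: `stratified_split`, the bin count, and the default fractions are illustrative, not the PR's implementation.

```python
import numpy as np


def stratified_split(
    water_ratios: np.ndarray,
    fractions: tuple[float, float, float] = (0.7, 0.15, 0.15),
    n_bins: int = 5,
    seed: int = 0,
) -> dict[str, np.ndarray]:
    """Return tile indices for train/val/test, stratified by water ratio."""
    if not np.isclose(sum(fractions), 1.0):
        msg = "fractions must sum to 1"
        raise ValueError(msg)
    rng = np.random.default_rng(seed)

    # Quantile edges give bins with roughly equal tile counts; np.unique
    # drops duplicate edges when many tiles share a ratio (e.g. all-dry tiles).
    edges = np.unique(np.quantile(water_ratios, np.linspace(0, 1, n_bins + 1)))
    # Assign each tile to a bin using the interior edges only.
    bins = np.digitize(water_ratios, edges[1:-1], right=True)

    splits: dict[str, list[int]] = {"train": [], "val": [], "test": []}
    for b in np.unique(bins):
        # Shuffle the tiles of this stratum, then cut it by the fractions.
        idx = rng.permutation(np.flatnonzero(bins == b))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        splits["train"].extend(idx[:n_train])
        splits["val"].extend(idx[n_train:n_train + n_val])
        splits["test"].extend(idx[n_train + n_val:])
    return {k: np.sort(np.array(v, dtype=int)) for k, v in splits.items()}
```

Slicing each stratum into contiguous chunks guarantees the three splits are disjoint and together cover every tile, even when rounding leaves a stratum's test share at zero.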

Add seam_correction.py to water extraction tools. Implement correct_seams() to suppress linear boundary artifacts in lidar-derived rasters where adjacent acquisition projects meet. Extract shared edges from project extent polygons, build a feathered distance weight mask, and blend a nodata-safe Gaussian-smoothed pass with the original within the buffer zone only.
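The approach described above (a feathered distance weight around the seam, blended with a nodata-safe Gaussian pass only inside the buffer) can be sketched with NumPy and SciPy. Assumptions: the shared edges are already rasterized into a boolean `seam_mask`, nodata is encoded as NaN in a float raster, and `smooth_nodata_safe` and `blend_seam` are hypothetical names; the PR's `correct_seams()` works from project extent polygons and may differ in detail.

```python
import numpy as np
from scipy import ndimage


def smooth_nodata_safe(arr: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian smoothing that ignores NaN cells (normalized convolution)."""
    valid = np.isfinite(arr)
    # Smooth the zero-filled data and the validity mask with the same kernel.
    num = ndimage.gaussian_filter(np.where(valid, arr, 0.0), sigma)
    den = ndimage.gaussian_filter(valid.astype(float), sigma)
    out = np.full(arr.shape, np.nan)
    # Divide only where valid data contributed, avoiding 0/0 warnings.
    ok = den > 1e-6
    out[ok] = num[ok] / den[ok]
    return out


def blend_seam(
    arr: np.ndarray, seam_mask: np.ndarray, buffer_px: float, sigma: float
) -> np.ndarray:
    """Blend a smoothed copy into ``arr`` within ``buffer_px`` of the seam."""
    # Distance (in pixels) from every cell to the nearest seam cell.
    dist = ndimage.distance_transform_edt(~seam_mask)
    # Linear feather: weight 1 on the seam, falling to 0 at the buffer edge.
    weight = np.clip(1.0 - dist / buffer_px, 0.0, 1.0)
    smoothed = smooth_nodata_safe(arr, sigma)
    # Fall back to the original value where the smoothed pass has no data.
    smoothed = np.where(np.isfinite(smoothed), smoothed, arr)
    # Outside the buffer the weight is 0, so the original is returned as-is.
    # NaN cells in arr stay NaN because (1 - weight) * NaN is NaN.
    return weight * smoothed + (1.0 - weight) * arr
```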

- Reformat code with proper line breaks and indentation
- Reorder imports according to style guide
- Add proper spacing around operators and function arguments
- Remove unused imports and commented-out code
- Add strict=False to zip() calls (the strict keyword requires Python 3.10+)

Files: download_elevation.py, preprocess_inference_data.py, prepare_inputs.py, segmentation_task.py

Add the ability to train and run inference with or without the intensity channel, allowing models to work with 2-channel (TWI, nDSM) or 3-channel (TWI, nDSM, Intensity) inputs.

Changes:
- elevation_stack_datamodule.py:
  * Add include_intensity parameter with validation
  * Slice user-provided normalization stats to match channel count
  * Validate stats configuration before dataset creation
- elevation_stack_dataset.py:
  * Add include_intensity parameter to control channel loading
  * Implement selective channel loading (2 or 3 bands)
  * Add channel count validation against normalization stats
  * Improve error messages with detailed diagnostics
- inference.py:
  * Add model_in_channels parameter for inference
  * Validate that the input raster has sufficient bands
  * Implement selective channel loading during inference
  * Slice mean/std statistics to match model channels
  * Add detailed logging for channel configuration

This enables training models on different input configurations and ensures proper channel handling during inference.
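Selecting 2 or 3 bands and slicing the per-channel normalization statistics to match can be sketched as follows. Assumptions: bands are ordered TWI, nDSM, Intensity as in the description above, the stack is a NumPy array shaped (bands, H, W), and `select_channels` is a hypothetical helper rather than the PR's code.

```python
import numpy as np


def select_channels(
    stack: np.ndarray,
    mean: list[float],
    std: list[float],
    *,
    include_intensity: bool,
) -> np.ndarray:
    """Keep 2 or 3 bands and normalize each with matching statistics.

    Assumed band order: 0 = TWI, 1 = nDSM, 2 = Intensity.
    """
    n_channels = 3 if include_intensity else 2
    if stack.shape[0] < n_channels:
        msg = f"raster has {stack.shape[0]} bands, model needs {n_channels}"
        raise ValueError(msg)
    if len(mean) < n_channels or len(std) < n_channels:
        msg = f"need {n_channels} mean/std values, got {len(mean)}/{len(std)}"
        raise ValueError(msg)
    # Slice the stats so a 3-band config can be reused for a 2-band model,
    # and reshape them to (C, 1, 1) so they broadcast over H and W.
    mean_arr = np.asarray(mean[:n_channels], dtype=np.float32)[:, None, None]
    std_arr = np.asarray(std[:n_channels], dtype=np.float32)[:, None, None]
    bands = stack[:n_channels].astype(np.float32)
    return (bands - mean_arr) / std_arr
```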

Extend visualize_prediction() to properly handle images with different channel counts:
- 1 channel (grayscale): Convert to RGB by repeating channel
- 2 channels: Pad with zeros to create 3-channel RGB
- 3+ channels: Use first 3 channels (existing behavior)

This fixes visualization errors when working with 2-channel inputs (TWI + nDSM without intensity) or single-channel inputs.
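The channel-count rules above can be expressed as a small conversion helper. This is a sketch assuming channel-first NumPy arrays shaped (C, H, W); `to_rgb` is a hypothetical name, and the PR's `visualize_prediction()` may be structured differently.

```python
import numpy as np


def to_rgb(image: np.ndarray) -> np.ndarray:
    """Convert a (C, H, W) array to a (3, H, W) array for display."""
    channels = image.shape[0]
    if channels < 1:
        msg = "image must have at least one channel"
        raise ValueError(msg)
    if channels == 1:
        # Grayscale: repeat the single channel into R, G and B.
        return np.repeat(image, 3, axis=0)
    if channels == 2:
        # Two channels (e.g. TWI + nDSM): append a zero-filled third channel.
        zeros = np.zeros((1, *image.shape[1:]), dtype=image.dtype)
        return np.concatenate([image, zeros], axis=0)
    # Three or more channels: keep the first three.
    return image[:3]
```

For plotting with matplotlib's `imshow`, the result would still need transposing to (H, W, 3) and, for float data, rescaling to [0, 1].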

…puts
- Fix seam correction to use polygon boundaries as correction centerlines
- Optimize seam correction with strip-based windowed processing
- Fix RuntimeWarning on division by zero in _gaussian_smooth_nodata_safe
- Switch from Gaussian-blur-and-blend to an inpainting approach
- Add nodata filling along seam centerline in correct_seams
- Add blend_sigma parameter (default 0.0) used only for the taper zone outside the inpainting zone
- Switch blend taper to cosine and tighten default parameters
- Replace Gaussian inpainting with bilateral filter
- Add scikit-image to dependencies
- Add optional dependencies for the water extraction pipeline in GDL
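The inpainting zone plus cosine taper described above can be pictured as a weight profile over distance from the seam centerline: full correction inside the inpainting zone, then a raised-cosine roll-off to the untouched raster. The sketch assumes a distance array in pixels and two hypothetical widths, `inpaint_px` and `taper_px`; it illustrates a raised-cosine taper and is not the PR's `correct_seams()` code.

```python
import numpy as np


def cosine_taper_weight(
    dist: np.ndarray, inpaint_px: float, taper_px: float
) -> np.ndarray:
    """Weight 1 inside the inpainting zone, cosine roll-off to 0 across the taper."""
    w = np.zeros_like(dist, dtype=float)
    w[dist <= inpaint_px] = 1.0
    if taper_px > 0:
        in_taper = (dist > inpaint_px) & (dist < inpaint_px + taper_px)
        # Normalized position across the taper: 0 at the inner edge, 1 at the outer.
        t = (dist[in_taper] - inpaint_px) / taper_px
        # Raised cosine: 1 at t = 0, 0 at t = 1, zero slope at both ends.
        w[in_taper] = 0.5 * (1.0 + np.cos(np.pi * t))
    return w
```

The corrected raster would then be `w * corrected + (1 - w) * original`. Compared with a linear feather, the raised cosine has zero slope at both ends of the taper, so it adds no new kink where the corrected zone meets untouched pixels.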

…ding dsm and dtm

Revert seam_correction.py from bilateral filter back to Gaussian inpainting. Rename seam_sigma_color → seam_gaussian_sigma (default 1.5 px) in elevation_stack_datamodule.py, inference.py, and preprocess_inference_data.py. Replace sigma_color= kwarg with gaussian_sigma= in all correct_seams() calls. Rename --seam_sigma_color CLI flag to --seam_sigma and update help text. Remove scikit-image from pyproject.toml as it is no longer used.

… address edge artefacts

The intermediate file also acts as an input for the seam correction. An sbatch file for Slurm was created to run the script.

LucaRom and others added 12 commits November 3, 2025 22:26

Add base files for waterbodies extraction pipeline

06594a0

Add valid LiDAR mask support to tiling workflow

cf954be

Merge remote-tracking branch 'origin/main'

713040d

Merge pull request #611 from NRCan/main

4dc1003


LucaRom force-pushed the water_extraction_new branch from 9dbd7f4 to efcdf43 March 5, 2026 21:13

LucaRom added 17 commits March 5, 2026 16:20

Add seam_correction logic into the pipeline to correct DTM and DSM

9af83d6

Add slurm folder for jobs and logs. Add an example script for downloading dsm and dtm

e898b72

Restore Gaussian seam correction (revert bilateral filter)

acd6560

Add seam correction parameter config

3d532e0

Add command line overriding abilities to prepare data

30f6c0e

Add sh file for prepare data on hpc and housekeeping

15200d7

Fix overriding extent path when config is none and add logs to debug

3700954

Fix change project_extents_path to null instead of None

7dcdc62

Merge stdout and stderr for prepare data job script

1583d68

Revert to non override approach for prepare data script

840e08a

Add more ram to prepare data job

67ce5a2

Add script to create a buffered version of the AOI as a preprocess to address edge artefacts

75b6531

Fix copying features format for dictionary, not object

3f83e55

Fix AOI type not matching after buff

d242b33

Add script to create the valid lidar mask

1f0f955


fix: split datamodule prepare_data flow for training vs inference

19b0797

Fix valid lidar mask handling during inference

cb27cce

LucaRom wants to merge 30 commits into main from water_extraction_new
