Waterbodies extraction pipeline#605

Draft
LucaRom wants to merge 30 commits into main from water_extraction_new

Conversation

Collaborator

LucaRom commented Nov 3, 2025

Draft for now

Will follow soon...

LucaRom and others added 12 commits November 3, 2025 22:26
- Add comprehensive module and function docstrings
- Replace print() with logging (log.info, log.warning, log.exception)
- Replace assert statements with proper ValueError/RuntimeError exceptions
- Fix exception handling: pre-assign messages, use specific exception types
- Add timeout and SSL verification to requests.get()
- Use tempfile.gettempdir() instead of hardcoded /tmp
- Fix line length issues (E501)
- Fix boolean argument positioning with * separator (FBT001/FBT002)
- Rename generic 'df' variables to more descriptive names
- Add noqa comments for acceptable complexity (C901) and argument count (PLR0913)
- Fix blank line spacing in docstrings (D204, D205)
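
Several of the cleanup items above (keyword-only booleans, exceptions instead of `assert`, pre-assigned messages, `tempfile.gettempdir()`, logging instead of `print()`) can be illustrated together. The following is a minimal sketch using only the standard library; `resolve_cache_dir` and its parameters are hypothetical names chosen for illustration and are not taken from the PR.

```python
import logging
import tempfile
from pathlib import Path

log = logging.getLogger(__name__)


def resolve_cache_dir(name: str, *, create: bool = False) -> Path:
    """Return a cache directory located under the platform temp directory.

    `create` is keyword-only (note the bare `*`), the pattern that the
    FBT001/FBT002 lint rules ask for with boolean arguments.
    """
    if not name:
        # Pre-assign the message, then raise a specific exception type
        # instead of using `assert`, which is stripped under `python -O`.
        msg = "name must be a non-empty string"
        raise ValueError(msg)

    # Portable replacement for a hardcoded "/tmp".
    cache_dir = Path(tempfile.gettempdir()) / name
    if create:
        cache_dir.mkdir(parents=True, exist_ok=True)
        log.info("Created cache directory %s", cache_dir)
    return cache_dir
```

Calling `resolve_cache_dir("demo", True)` raises `TypeError`, because the boolean can only be passed as `create=True`.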
- Overview of water extraction module as extension of geo-deep-learning
- Three usage patterns: fully automated, manual preprocessing, inference
- Detailed pipeline documentation for each preprocessing step
- Configuration parameters reference table
- Output directory structure explanation
- Troubleshooting section for common issues
- Examples for all usage methods
Merge main with water_extraction
…g and inference

- Overhauled TWI computation with tiled processing for large AOIs (>100GB)
- Implemented intensity alignment and AOI cropping with BIGTIFF support
- Fixed raster stacking to preserve per-band NoData and dtype
- Added stratified train/val/test splitting with water-ratio stratification
- Enhanced tile filtering with authoritative statistics export
- Created standalone inference.py with end-to-end pipeline
- Added timing tracking (start/end timestamps, duration)
- Improved LiDAR mask rasterization with compression/tiling
- Extended download_elevation.py with retry logic and error handling
- Updated dependencies (whitebox_workflows, pandas)
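
The commit message does not say how the water-ratio stratification is implemented. One common approach, sketched here as an assumption, is to bin tiles by their water ratio and split each bin separately so that train, val, and test see a similar distribution. The function name, the split fractions, and the number of bins are all hypothetical.

```python
import numpy as np


def stratified_split(water_ratio, fractions=(0.7, 0.15, 0.15), n_bins=4, seed=0):
    """Assign each tile to 'train', 'val', or 'test', stratified by water ratio.

    water_ratio: 1-D sequence with the fraction of water pixels per tile.
    fractions:   (train, val, test) proportions; test takes the remainder.
    """
    water_ratio = np.asarray(water_ratio, dtype=float)
    rng = np.random.default_rng(seed)

    # Interior quantile edges give bins with roughly equal tile counts.
    edges = np.quantile(water_ratio, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bins = np.digitize(water_ratio, edges)

    labels = np.empty(water_ratio.shape[0], dtype=object)
    for b in np.unique(bins):
        # Shuffle the tiles of this bin, then cut the shuffled order.
        idx = rng.permutation(np.flatnonzero(bins == b))
        n_train = int(round(fractions[0] * idx.size))
        n_val = int(round(fractions[1] * idx.size))
        labels[idx[:n_train]] = "train"
        labels[idx[n_train:n_train + n_val]] = "val"
        labels[idx[n_train + n_val:]] = "test"
    return labels
```

Splitting per bin keeps rare high-water tiles from landing in a single subset by chance, which is the usual motivation for stratifying on a continuous target.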
Add seam_correction.py to water extraction tools.
Implement correct_seams() to suppress linear boundary artifacts in
LiDAR-derived rasters where adjacent acquisition projects meet.
Extract shared edges from project extent polygons, build a feathered
distance weight mask, and blend a nodata-safe Gaussian-smoothed pass
with the original within the buffer zone only.
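
The blending step described above can be sketched as follows. This is a minimal illustration, not the code in seam_correction.py: it assumes the per-pixel distance to the nearest seam (in pixels) and the smoothed raster have already been computed, and it assumes a linear feather. `feather_blend` and its parameters are hypothetical names.

```python
import numpy as np


def feather_blend(original, smoothed, distance_px, buffer_px):
    """Blend `smoothed` into `original` near a seam.

    The weight is 1 on the seam and falls linearly to 0 at `buffer_px`,
    so pixels outside the buffer zone are returned unchanged.
    """
    weight = np.clip(1.0 - distance_px / float(buffer_px), 0.0, 1.0)
    return weight * smoothed + (1.0 - weight) * original


# Toy example: a vertical seam along column 5 of an 11-column raster.
original = np.zeros((3, 11))
smoothed = np.ones((3, 11))
distance = np.abs(np.arange(11) - 5)[None, :] * np.ones((3, 1))
blended = feather_blend(original, smoothed, distance, buffer_px=3)
```

Because the weight reaches zero at the buffer edge, the correction cannot alter data away from project boundaries.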
- Reformat code with proper line breaks and indentation
- Reorder imports according to style guide
- Add proper spacing around operators and function arguments
- Remove unused imports and commented code
- Add strict=False to zip() for Python 3.10+ compatibility

Files: download_elevation.py, preprocess_inference_data.py,
prepare_inputs.py, segmentation_task.py
Add ability to train and run inference with or without intensity channel,
allowing models to work with 2-channel (TWI, nDSM) or 3-channel
(TWI, nDSM, Intensity) inputs.

Changes:
- elevation_stack_datamodule.py:
  * Add include_intensity parameter with validation
  * Slice user-provided normalization stats to match channel count
  * Validate stats configuration before dataset creation

- elevation_stack_dataset.py:
  * Add include_intensity parameter to control channel loading
  * Implement selective channel loading (2 or 3 bands)
  * Add channel count validation against normalization stats
  * Improve error messages with detailed diagnostics

- inference.py:
  * Add model_in_channels parameter for inference
  * Validate input raster has sufficient bands
  * Implement selective channel loading during inference
  * Slice mean/std statistics to match model channels
  * Add detailed logging for channel configuration

This enables training models on different input configurations and
ensures proper channel handling during inference.
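
A minimal sketch of the channel selection and statistics slicing described above, under the assumption that the stack is channel-first and ordered TWI, nDSM, Intensity. `select_channels` is a hypothetical helper; the actual dataset and inference code may be organized differently.

```python
import numpy as np


def select_channels(stack, mean, std, *, include_intensity):
    """Keep 2 or 3 bands of a (bands, H, W) stack and normalize them.

    Assumed band order: TWI, nDSM, Intensity. `mean` and `std` may hold
    three values; they are sliced to match the selected channel count.
    """
    n_channels = 3 if include_intensity else 2
    if stack.shape[0] < n_channels:
        msg = f"Expected at least {n_channels} bands, got {stack.shape[0]}"
        raise ValueError(msg)
    if len(mean) < n_channels or len(std) < n_channels:
        msg = (
            f"Normalization stats have {len(mean)} mean / {len(std)} std "
            f"values, but {n_channels} channels were requested"
        )
        raise ValueError(msg)

    # Slice the stats, then reshape so they broadcast over H and W.
    mean_arr = np.asarray(mean[:n_channels], dtype=np.float32).reshape(-1, 1, 1)
    std_arr = np.asarray(std[:n_channels], dtype=np.float32).reshape(-1, 1, 1)
    return (stack[:n_channels].astype(np.float32) - mean_arr) / std_arr
```

Validating the band count and the stats length before normalizing means a misconfiguration fails with a clear message rather than a broadcasting error later in the pipeline.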
Extend visualize_prediction() to properly handle images with different
channel counts:
- 1 channel (grayscale): Convert to RGB by repeating channel
- 2 channels: Pad with zeros to create 3-channel RGB
- 3+ channels: Use first 3 channels (existing behavior)

This fixes visualization errors when working with 2-channel inputs
(TWI + nDSM without intensity) or single-channel inputs.
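
The three cases listed above can be sketched as a small helper. This is an illustration only: it assumes channel-first `(C, H, W)` NumPy arrays, and `to_rgb` is a hypothetical name rather than part of visualize_prediction().

```python
import numpy as np


def to_rgb(image):
    """Return a 3-channel (3, H, W) array from a (C, H, W) image."""
    channels = image.shape[0]
    if channels == 1:
        # Grayscale: repeat the single channel three times.
        return np.repeat(image, 3, axis=0)
    if channels == 2:
        # Two channels: pad a zero-filled third channel.
        pad = np.zeros((1, *image.shape[1:]), dtype=image.dtype)
        return np.concatenate([image, pad], axis=0)
    # Three or more channels: keep the first three.
    return image[:3]
```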
…puts

Fix seam correction to use polygon boundaries as correction centerlines
Optimize seam correction with strip-based windowed processing
Fix RuntimeWarning on division by zero in _gaussian_smooth_nodata_safe
Switch from Gaussian-blur-and-blend to an inpainting approach.
Add nodata filling along seam centerline in correct_seams
Add blend_sigma parameter (default 0.0) used only for the taper zone outside the inpainting zone.
Switch blend taper to cosine and tighten default parameters
Replace Gaussian inpainting with bilateral filter
Add scikit-image to dependencies
Add optional dependencies for the water extraction pipeline in GDL
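
Two ideas from the commits above lend themselves to a short sketch: a nodata-safe Gaussian smooth that avoids the division-by-zero warning, and a cosine blend taper. The code below is a NumPy-only illustration under stated assumptions (nodata encoded as NaN, normalized convolution, a separable kernel truncated at three sigma). It is not the PR's `_gaussian_smooth_nodata_safe`, and all names are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def _smooth2d(arr, kernel):
    # Separable convolution: rows first, then columns. With mode="same"
    # the output keeps the input shape only if each axis is at least as
    # long as the kernel, which this sketch assumes.
    rows = np.apply_along_axis(np.convolve, 1, arr, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")


def smooth_nodata_safe(arr, sigma):
    """Gaussian-smooth `arr`, ignoring NaN (nodata) pixels."""
    valid = np.isfinite(arr)
    kernel = gaussian_kernel(sigma)
    num = _smooth2d(np.where(valid, arr, 0.0), kernel)
    den = _smooth2d(valid.astype(float), kernel)
    out = np.full(arr.shape, np.nan)
    # Divide only where some valid data contributed; elsewhere keep NaN.
    # This avoids the RuntimeWarning a plain num / den would emit.
    np.divide(num, den, out=out, where=den > 1e-6)
    return out


def cosine_taper(distance, width):
    """Weight of 1 at distance 0, easing to 0 at `width` along a half cosine."""
    t = np.clip(np.asarray(distance, dtype=float) / float(width), 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * t))
```

Compared with a linear ramp, the cosine taper has zero slope at both ends, so the blended result joins the untouched data without a visible kink.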
LucaRom force-pushed the water_extraction_new branch from 9dbd7f4 to efcdf43 on March 5, 2026 21:13
LucaRom added 17 commits March 5, 2026 16:20
  Revert seam_correction.py from bilateral filter back to Gaussian inpainting.
  Rename seam_sigma_color → seam_gaussian_sigma (default 1.5 px) in
  elevation_stack_datamodule.py, inference.py, and preprocess_inference_data.py.
  Replace sigma_color= kwarg with gaussian_sigma= in all correct_seams() calls.
  Rename --seam_sigma_color CLI flag to --seam_sigma and update help text.
  Remove scikit-image from pyproject.toml as it is no longer used.
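
For illustration, the renamed flag could be declared with `argparse` roughly as follows. Only the flag name `--seam_sigma`, the parameter name `seam_gaussian_sigma`, and the 1.5 px default come from the commit message. The parser layout, the `dest` mapping, and the help text are assumptions.

```python
import argparse


def build_parser():
    """Build a parser exposing the seam-correction sigma (illustrative only)."""
    parser = argparse.ArgumentParser(description="Seam correction options")
    parser.add_argument(
        "--seam_sigma",
        dest="seam_gaussian_sigma",  # assumed mapping to the renamed parameter
        type=float,
        default=1.5,
        help="Gaussian sigma, in pixels, used for seam inpainting",
    )
    return parser


args = build_parser().parse_args(["--seam_sigma", "2.0"])
```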
The intermediate file also acts as an input for the seam correction
An sbatch file for Slurm was created to run the script