R2r multiscale #208

Draft
seanmcculloch wants to merge 43 commits into main from r2r_multiscale

@seanmcculloch (Collaborator)

Description of the Changes You Made

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Instructions for Testing

Additional Info

seanmcculloch and others added 30 commits April 22, 2026 04:36
* feat: split xml IP detection

- Add split tile shape support to overlap_detection.py
- Add split path construction and crop passthrough to metadata_builder.py
- Add crop slicing to image_reader.py
- Add fetch_local_xml utility function to pipelines/utils.py
- Update xml_to_dataframe.py for split XML support
- Add uv.lock to gitignore

Tests for this feature are in a separate PR.

* test: add split XML IP detection tests

- Add test_xml_to_dataframe tests for split XML parsing
- Add test_image_reader tests for crop slicing
- Add test_metadata_builder tests for split metadata handling
- Add dataset_split.xml test fixture

These tests verify the split XML support added in feat/split-xml-ipd.

* Initial plan

* Update Rhapso/data_prep/xml_to_dataframe.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update Rhapso/data_prep/xml_to_dataframe.py

Descriptive error for bad split XMLs.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fix channel parsing in parse_image_loader_split_zarr to use .ome.zarr suffix

Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>

* Update Rhapso/detection/metadata_builder.py

ceil crop_max when downsampling

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update Rhapso/data_prep/xml_to_dataframe.py

Descriptive error for a bad split XML.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Initial plan

* Fix channel extraction, crop validation, path handling, and test signatures

Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>

* Move import to top level and use calculated level instead of hardcoded '0'

Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>

* fix: handling of multiscale levels

* fix: prevent skipping IP detection when there is only one split tile

* feat: add overlapping_only flag (default true) to IP detection

* Fix non-split zarr path construction in overlap detection (#164)

* Initial plan

* Fix dim_other path to use proper path joining and include level

Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Initial plan

* Use os.path.join for path construction in metadata_builder

Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>

* Use os.path.join consistently for all path construction

Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>

* Initial plan

* Add crop bounds validation with comprehensive tests

Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>

* Enhance test to verify complete error message with shape

Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>

…ysis

- Add evaluation/__init__.py to make evaluation a proper package
- Add evaluation/detection_qc/ with view_metrics, sweep_analyzer, and plotting
- ViewIPMetrics computes per-view spatial, density, and intensity stats from N5
- SweepAnalyzer aggregates trial results with labeled metric dicts (name/value/description)
- Plotting generates 4 diagnostic PNGs (IP counts, success rates, box plots, heatmap)
- Fix test directory typo: test_evaluaton -> test_evaluation
- Add 23 tests covering metrics, analysis, serialization, and plotting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add print statements to trace multiscale level selection and zarr path
construction during interest point detection. This helps diagnose cases
where dsxy/dsz values other than 1.0 result in zero detections.

- [interest_point_detection.py] log selected level and dsxy/dsz values
- [overlap_detection.py] log root zarr opening and available multiscales
- [metadata_builder.py] log constructed zarr paths for each view
- [image_reader.py] log zarr open attempts with error handling

All prints go to stdout and will be captured in /results/output.log

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Print file_path for every chunk loaded to see which multiscale level
is actually being used.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

OME-Zarr stores already have the multiscale level in the path
(e.g., SPIM.ome.zarr/0), so don't append it again. Check if the
path already ends with the level before appending.

This fixes 0 IPs detected when dsxy/dsz > 1, which was trying to
open non-existent paths like SPIM.ome.zarr/0/0.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
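
A minimal sketch of the "don't append the level twice" check, using a hypothetical helper `level_path` (not a Rhapso function). It compares the last path component rather than a raw string suffix, so a path ending in `/10` is not mistaken for level `0`. `posixpath` is used here so the separator is always `/` regardless of OS; the PR itself uses `os.path.join`.

```python
import posixpath


def level_path(zarr_path: str, level: int) -> str:
    """Return the path to multiscale `level`, appending it only if absent.

    Hypothetical helper for illustration; not part of Rhapso.
    """
    stripped = zarr_path.rstrip("/")
    # OME-Zarr paths may already point at a level, e.g. "SPIM.ome.zarr/0".
    # Compare the final component so "SPIM.ome.zarr/10" is not taken for level 0.
    if posixpath.basename(stripped) == str(level):
        return stripped
    return posixpath.join(stripped, str(level))


print(level_path("SPIM.ome.zarr", 0))    # SPIM.ome.zarr/0
print(level_path("SPIM.ome.zarr/0", 0))  # SPIM.ome.zarr/0 (not SPIM.ome.zarr/0/0)
```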

OME-Zarr multiscale levels are arrays, not groups. Check that the opened
node has .keys() before calling it; if it does not, treat it as an array.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
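
A duck-typed version of this check might look like the sketch below. `describe_node` and `FakeArray` are hypothetical stand-ins, used so the example runs without zarr installed; with real zarr objects, `isinstance(node, zarr.Group)` is an alternative to probing for `.keys()`.

```python
from typing import Any


def describe_node(node: Any) -> str:
    """Classify an opened zarr node by duck typing (hypothetical helper).

    Groups expose .keys(); multiscale-level arrays do not, so probe for it
    before calling it.
    """
    if callable(getattr(node, "keys", None)):
        return "group: " + ", ".join(sorted(node.keys()))
    # No .keys(): treat the node as an array and report its shape instead.
    return f"array: shape={tuple(node.shape)}"


class FakeArray:
    """Stand-in for an array-like level (hypothetical; only has .shape)."""
    shape = (64, 512, 512)


print(describe_node({"0": None, "1": None}))  # group: 0, 1
print(describe_node(FakeArray()))              # array: shape=(64, 512, 512)
```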

When a specific multiscale level (e.g., /4) fails to open, attempt to
inspect the root zarr to show which levels are actually available.
This helps distinguish a missing level from a path or access problem.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

In split tile mode, zarr_base_path only has the root (SPIM.ome.zarr/)
but multiscale levels live inside each per-tile zarr
(e.g., Tile_X_..._ch_405.zarr/4). Join zarr_base_path + file_path
before appending the multiscale level.

This was causing "nothing found at path ''" errors for all non-1.0
dsxy/dsz values because SPIM.ome.zarr/4 doesn't exist — only
SPIM.ome.zarr/Tile_X_.../4 does.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
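
The corrected join order can be sketched as below, assuming a hypothetical helper `split_tile_level_path` and an illustrative tile name `tile_0.zarr` (not a real tile from this dataset). Leading slashes are stripped from `file_path` because `posixpath.join` discards everything before an absolute component.

```python
import posixpath


def split_tile_level_path(zarr_base_path: str, file_path: str, level: int) -> str:
    """Path to one multiscale level inside a per-tile zarr (hypothetical helper).

    In split tile mode zarr_base_path is only the root store, so the per-tile
    file_path must be joined in before the level is appended.
    """
    # posixpath.join drops everything before an absolute component, so make
    # file_path relative first.
    tile_root = posixpath.join(zarr_base_path, file_path.lstrip("/"))
    return posixpath.join(tile_root, str(level))


# "tile_0.zarr" is an illustrative name only.
print(split_tile_level_path("SPIM.ome.zarr/", "tile_0.zarr", 4))  # SPIM.ome.zarr/tile_0.zarr/4
```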

Add prints to trace:
- Image shape and data stats after zarr open
- Dask array shape after transpose
- Shape before/after downsampling
- Input image stats and peak counts in DoG.run

This helps diagnose why levels 1-4 find 0 points despite data existing in S3.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Bounds in interval_key are always in full-resolution (level 0) coordinates.
When loading from a higher multiscale level (e.g., level 4), the data is
downsampled by 2^level. We must divide the bounds by 2^level before slicing.

Without this, bounds calculated for level 0 would index far outside
a level-4 array, resulting in tiny or empty slices and 0 detected
points.

This fixes the "0 points detected at dsxy/dsz != 1.0" bug by properly
scaling bounds for multiscale-level zarr loads.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
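
A minimal sketch of the bounds scaling, using a hypothetical `scale_bounds_to_level` helper and treating `bounds_max` as an exclusive end (the commit message does not say which convention `interval_key` uses). The minimum is rounded down and the maximum rounded up, so the scaled slice still covers the full-resolution region.

```python
from typing import Sequence, Tuple


def scale_bounds_to_level(
    bounds_min: Sequence[int], bounds_max: Sequence[int], level: int
) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Convert level-0 bounds to multiscale-level coordinates (hypothetical helper).

    Assumes bounds_max is an exclusive end, so it is rounded up while
    bounds_min is rounded down; data[lo:hi] then covers the whole region.
    """
    factor = 2 ** level
    lo = tuple(b // factor for b in bounds_min)
    hi = tuple(-(-b // factor) for b in bounds_max)  # integer ceil division
    return lo, hi


# Level 4 is downsampled 16x relative to level 0.
print(scale_bounds_to_level((0, 100, 200), (64, 612, 712), 4))
# ((0, 6, 12), (4, 39, 45))
```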

Remove debugging prints that were only needed during initial troubleshooting:
- Image shape/stats prints from ImageReader
- DoG peak count and stats prints
- MetadataBuilder level/path prints

The core fix (bounds scaling for multiscale levels) is working correctly
and the verbose output is no longer needed. Logs are now much cleaner and
more token-efficient for capsule runs.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

…ion debug

Add strategic print statements to trace:
- Multiscale level selection and leftovers in overlap_detection
- Path construction for split tile zarr stores in metadata_builder
- Bounds scaling for multiscale levels in image_reader
- Peak detection counts in difference_of_gaussian

This enables diagnosis of the 0 IP detection bug when dsxy/dsz != 1.0

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Add explicit prints with flush=True in the Ray remote task to diagnose
whether tasks are executing and where errors occur. This helps identify
Ray worker serialization issues.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

The crop bounds validation was too strict. Since we slice as
data[crop_min:crop_max+1], crop_max is an inclusive index (only the
slice's upper bound, crop_max+1, is exclusive), so crop_max can
legitimately equal array_shape[i] - 1.

This was causing "crop_max exceeds array dimension" errors when
loading multiscale levels where scaled crop bounds reached the edge
of the downsampled array.

Changed validation from:
  if crop_max[i] >= array_shape[i]
to:
  if crop_max[i] > array_shape[i] - 1

Fixes 0 IP detection at multiscale levels > 0 for split tile workflows.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
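
The inclusive-`crop_max` convention can be sketched as follows; `validate_crop` and `crop_slices` are hypothetical helpers, not Rhapso's code. For integer indices, `crop_max[i] > array_shape[i] - 1` and `crop_max[i] >= array_shape[i]` accept and reject the same values; the sketch uses the `> n - 1` form to mirror the inclusive index.

```python
from typing import Sequence, Tuple


def validate_crop(
    crop_min: Sequence[int], crop_max: Sequence[int], array_shape: Sequence[int]
) -> None:
    """Reject crops outside the array; crop_max is inclusive (hypothetical helper)."""
    for axis, (lo, hi, n) in enumerate(zip(crop_min, crop_max, array_shape)):
        if lo < 0 or lo > hi:
            raise ValueError(f"axis {axis}: invalid crop range [{lo}, {hi}]")
        # The slice is data[lo:hi + 1], so hi == n - 1 (the last element) is allowed.
        if hi > n - 1:
            raise ValueError(f"axis {axis}: crop_max {hi} exceeds array dimension {n}")


def crop_slices(crop_min: Sequence[int], crop_max: Sequence[int]) -> Tuple[slice, ...]:
    """Turn inclusive crop bounds into exclusive-end Python slices."""
    return tuple(slice(lo, hi + 1) for lo, hi in zip(crop_min, crop_max))


data = list(range(10))
validate_crop([7], [9], [10])          # crop_max == shape - 1 is accepted
print(data[crop_slices([7], [9])[0]])  # [7, 8, 9]
```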

The scaling formula in metadata_builder.py can legitimately produce
crop_max values that exceed array dimensions due to rounding when
scaling from full-resolution to multiscale coordinates. Rather than
reject these, clamp crop_max to the valid range (0 to shape-1) since
the intent is to crop to available data.

This fixes interest point detection at multiscale levels when split
tile crop bounds are provided.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
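
Clamping instead of rejecting might look like this sketch, assuming integer bounds and a hypothetical `clamp_crop_max` helper. Values below 0 are also pulled up to 0, matching the stated valid range of 0 to shape-1.

```python
from typing import Sequence, Tuple


def clamp_crop_max(crop_max: Sequence[int], array_shape: Sequence[int]) -> Tuple[int, ...]:
    """Clamp inclusive crop_max into [0, shape - 1] per axis (hypothetical helper).

    Assumes integer bounds; rounding during level scaling can overshoot the
    last index, and the intent is to crop to whatever data is available.
    """
    return tuple(min(max(hi, 0), n - 1) for hi, n in zip(crop_max, array_shape))


print(clamp_crop_max((64, 513, 520), (64, 512, 512)))  # (63, 511, 511)
```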

Crop bounds come from metadata_builder in level-0 (full-resolution) coordinates.
The previous code applied the crop BEFORE downsampling and scaling, which breaks
the coordinate system for split tiles at multiscale levels.

Correct order:
1. Load array at selected multiscale level
2. Downsample by dsxy/dsz parameters
3. Scale interval bounds by 2^level to current-level coordinates
4. Scale crop bounds by same factor
5. Apply crop at final downsampled space

This ensures pixel-perfect accuracy when combining multiscale loading with split
tile cropping.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Add comment explaining that the scaling formula in metadata_builder can
produce crop_max values exceeding array dimensions due to rounding.
These are clamped to valid bounds in image_reader.py after crop is
applied in the correct coordinate space.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Initialize Ray with num_cpus=os.cpu_count() before submitting remote
tasks, and shutdown after detection completes. This prevents Ray from
silently auto-initializing with default settings and ensures resources
are released for other frameworks (e.g. Dask) running in the same
process.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Crop bounds were double-scaled by 2^level (once in MetadataBuilder, once
in ImageReader) instead of being scaled by the total downsampling factor
(2^level * dsxy for XY, 2^level * dsz for Z). This caused ValueErrors
when crop bounds exceeded the downsampled array shape at most multiscale
levels (only worked by coincidence when 2^level == dsxy).

Now crop bounds are kept in level-0 coordinates through MetadataBuilder
and scaled once in ImageReader by the full factor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
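
The single-scaling rule can be sketched as follows. `scale_crop_to_output` is a hypothetical helper, and it assumes `(z, y, x)` axis order, which the commit message does not specify. Following the earlier "ceil crop_max when downsampling" change, the minimum is floored and the maximum ceiled, which can leave the result one past the last index and in need of the clamping described above.

```python
import math
from typing import Sequence, Tuple


def scale_crop_to_output(
    crop_min: Sequence[float],
    crop_max: Sequence[float],
    level: int,
    dsxy: float,
    dsz: float,
) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Scale level-0 crop bounds once by the total downsampling factor.

    Hypothetical helper; assumes (z, y, x) axis order. Z uses 2**level * dsz
    and Y/X use 2**level * dsxy, instead of applying 2**level twice.
    """
    factors = (2 ** level * dsz, 2 ** level * dsxy, 2 ** level * dsxy)
    lo = tuple(math.floor(c / f) for c, f in zip(crop_min, factors))
    # Ceil crop_max, as in the earlier downsampling change; this can overshoot
    # the last valid index by one.
    hi = tuple(math.ceil(c / f) for c, f in zip(crop_max, factors))
    return lo, hi


print(scale_crop_to_output((0, 1024, 0), (255, 2047, 1023), level=2, dsxy=2, dsz=1))
# ((0, 128, 0), (64, 256, 128))
```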

Labels

None yet

Projects

None yet

2 participants