R2r multiscale #208 (Draft)
seanmcculloch wants to merge 43 commits into main
Conversation
Commits:
* feat: split xml IP detection
  - Add split tile shape support to overlap_detection.py
  - Add split path construction and crop passthrough to metadata_builder.py
  - Add crop slicing to image_reader.py
  - Add fetch_local_xml utility function to pipelines/utils.py
  - Update xml_to_dataframe.py for split XML support
  - Add uv.lock to .gitignore
  Tests for this feature are in a separate PR.
* test: add split XML IP detection tests
  - Add test_xml_to_dataframe tests for split XML parsing
  - Add test_image_reader tests for crop slicing
  - Add test_metadata_builder tests for split metadata handling
  - Add dataset_split.xml test fixture
  These tests verify the split XML support added in feat/split-xml-ipd.
* Initial plan
* Update Rhapso/data_prep/xml_to_dataframe.py
* Update Rhapso/data_prep/xml_to_dataframe.py: descriptive error with bad split XMLs
* Fix channel parsing in parse_image_loader_split_zarr to use the .ome.zarr suffix
* Update Rhapso/detection/metadata_builder.py: ceil crop_max when downsampling
* Update Rhapso/data_prep/xml_to_dataframe.py: descriptive error upon bad split XML
* Initial plan
* Fix channel extraction, crop validation, path handling, and test signatures
* Move import to top level and use the calculated level instead of a hardcoded '0'
* fix: handling of multiscale levels
* fix: prevent skipping IP detection when there is only one split tile
* feat: add overlapping_only flag (default true) to IPD
* Fix non-split zarr path construction in overlap detection (#164)
* Initial plan
* Fix dim_other path to use proper path joining and include the level

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>
* Initial plan
* Use os.path.join for path construction in metadata_builder
* Use os.path.join consistently for all path construction

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>
* Initial plan
* Add crop bounds validation with comprehensive tests
* Enhance test to verify complete error message with shape

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>
…ysis
- Add evaluation/__init__.py to make evaluation a proper package
- Add evaluation/detection_qc/ with view_metrics, sweep_analyzer, and plotting
- ViewIPMetrics computes per-view spatial, density, and intensity stats from N5
- SweepAnalyzer aggregates trial results with labeled metric dicts (name/value/description)
- Plotting generates 4 diagnostic PNGs (IP counts, success rates, box plots, heatmap)
- Fix test directory typo: test_evaluaton -> test_evaluation
- Add 23 tests covering metrics, analysis, serialization, and plotting
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add print statements to trace multiscale level selection and zarr path construction during interest point detection. Helps diagnose cases where non-1.0 dsxy/dsz values result in zero detections.
- [interest_point_detection.py] log selected level and dsxy/dsz values
- [overlap_detection.py] log root zarr opening and available multiscales
- [metadata_builder.py] log constructed zarr paths for each view
- [image_reader.py] log zarr open attempts with error handling
All prints go to stdout and will be captured in /results/output.log
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Print file_path for every chunk loaded to see what multiscale level is being used in practice. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
OME-Zarr stores already have the multiscale level in the path (e.g., SPIM.ome.zarr/0), so don't append it again. Check if the path already ends with the level before appending. This fixes 0 IPs detected when dsxy/dsz > 1, which was trying to open non-existent paths like SPIM.ome.zarr/0/0. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
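The suffix check described above can be sketched as a small path helper. The function name is hypothetical (the real logic lives in the path construction code, likely metadata_builder.py); only the "don't append the level twice" rule comes from the commit:

```python
import posixpath

def zarr_level_path(store_path: str, level: int) -> str:
    """Append a multiscale level to a zarr store path, unless the path
    already ends with that level (e.g. SPIM.ome.zarr/0). Hypothetical
    helper illustrating the fix, not the actual Rhapso API."""
    base = store_path.rstrip("/")
    if base.endswith("/" + str(level)):
        return base  # level already present; avoid SPIM.ome.zarr/0/0
    return posixpath.join(base, str(level))
```

With this guard, `zarr_level_path("SPIM.ome.zarr/0", 0)` leaves the path unchanged instead of producing the non-existent `SPIM.ome.zarr/0/0`.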
OME-Zarr multiscale levels are arrays, not groups. Check for .keys() before calling it; if not present, it's an array. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
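The duck-typing check the commit describes might look like the following sketch (illustrative function name, not the real Rhapso code): zarr group nodes expose `.keys()`, while level arrays do not.

```python
def is_zarr_group(node) -> bool:
    """Duck-typed check: zarr groups expose a callable .keys();
    multiscale level arrays do not. Sketch of the commit's fix."""
    return callable(getattr(node, "keys", None))
```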
When a specific multiscale level (e.g., /4) fails to open, attempt to inspect the root zarr to show which levels are actually available. This helps identify if the issue is a missing level vs. a path/access problem. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
In split tile mode, zarr_base_path only has the root (SPIM.ome.zarr/) but multiscale levels live inside each per-tile zarr (e.g., Tile_X_..._ch_405.zarr/4). Join zarr_base_path + file_path before appending the multiscale level. This was causing "nothing found at path ''" errors for all non-1.0 dsxy/dsz values because SPIM.ome.zarr/4 doesn't exist — only SPIM.ome.zarr/Tile_X_.../4 does. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add prints to trace:
- Image shape and data stats after zarr open
- Dask array shape after transpose
- Shape before/after downsampling
- Input image stats and peak counts in DoG.run
This helps diagnose why levels 1-4 find 0 points despite data existing in S3.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Bounds in interval_key are always in full-resolution (level 0) coordinates. When loading from a higher multiscale level (e.g., level 4), the data is downsampled by 2^level. We must divide the bounds by 2^level before slicing. Without this, bounds calculated for level 0 would try to access far out of bounds in a level 4 array, resulting in tiny or empty slices and 0 detected points. This fixes the "0 points detected at dsxy/dsz != 1.0" bug by properly scaling bounds for multiscale-level zarr loads. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
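The bounds scaling described above amounts to an integer division by 2**level. A minimal sketch (hypothetical names; the real code lives in image_reader.py):

```python
def scale_bounds_to_level(bounds_min, bounds_max, level):
    """Map level-0 (full-resolution) bounds to multiscale `level`,
    where each level halves the resolution, i.e. divide by 2**level.
    Sketch of the fix, not the actual Rhapso implementation."""
    factor = 2 ** level
    return ([b // factor for b in bounds_min],
            [b // factor for b in bounds_max])
```

Without this step, a level-0 bound like 512 would be used directly to slice a level-4 array (which is 16x smaller), yielding an empty or tiny slice and hence 0 detections.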
Remove debugging prints that were only needed during initial troubleshooting:
- Image shape/stats prints from ImageReader
- DoG peak count and stats prints
- MetadataBuilder level/path prints
The core fix (bounds scaling for multiscale levels) is working correctly, and the verbose output is no longer needed. Logs are now much cleaner and more token-efficient for capsule runs.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…ion debug
Add strategic print statements to trace:
- Multiscale level selection and leftovers in overlap_detection
- Path construction for split tile zarr stores in metadata_builder
- Bounds scaling for multiscale levels in image_reader
- Peak detection counts in difference_of_gaussian
This enables diagnosis of the 0 IP detection bug when dsxy/dsz != 1.0.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add explicit prints with flush=True in Ray remote task to diagnose if tasks are executing and where errors occur. Helps identify Ray worker serialization issues. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
The crop bounds validation was too strict. Since we slice as data[crop_min:crop_max+1] (exclusive upper bound), crop_max can legitimately equal array_shape[i] - 1. This was causing "crop_max exceeds array dimension" errors when loading multiscale levels where scaled crop bounds reached the edge of the downsampled array. Changed validation from: if crop_max[i] >= array_shape[i] to: if crop_max[i] > array_shape[i] - 1 Fixes 0 IP detection at multiscale levels > 0 for split tile workflows. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
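The invariant being validated is: with inclusive bounds sliced as `data[crop_min:crop_max + 1]`, the largest legal `crop_max` on axis i is `array_shape[i] - 1`. A sketch under assumed names (the real validation lives in image_reader.py):

```python
def validate_crop(crop_min, crop_max, array_shape):
    """Reject crops that fall outside the array, while accepting
    crop_max == array_shape[i] - 1 (the last valid index) since the
    slice data[lo:hi + 1] has an exclusive upper bound. Illustrative
    sketch, not the actual Rhapso code."""
    for i, (lo, hi, n) in enumerate(zip(crop_min, crop_max, array_shape)):
        if lo < 0 or hi > n - 1 or lo > hi:
            raise ValueError(
                f"invalid crop [{lo}, {hi}] for axis {i} of size {n}")
```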
The scaling formula in metadata_builder.py can legitimately produce crop_max values that exceed array dimensions due to rounding when scaling from full-resolution to multiscale coordinates. Rather than reject these, clamp crop_max to the valid range (0 to shape-1) since the intent is to crop to available data. This fixes interest point detection at multiscale levels when split tile crop bounds are provided. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
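Clamping instead of rejecting can be sketched as below (hypothetical helper name; per the commit, the real clamping happens in image_reader.py after the crop is mapped to the right coordinate space):

```python
def clamp_crop(crop_min, crop_max, array_shape):
    """Clamp inclusive crop bounds into [0, shape - 1] per axis, since
    rounding while scaling level-0 bounds to a multiscale level can
    push crop_max just past the array edge. Illustrative sketch."""
    lo = [max(0, c) for c in crop_min]
    hi = [min(n - 1, c) for c, n in zip(crop_max, array_shape)]
    return lo, hi
```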
Crop bounds come from metadata_builder in level-0 (full-resolution) coordinates. The previous code was applying crop BEFORE downsampling and scaling, which breaks the coordinate system for split tiles at multiscale levels. Correct order: 1. Load array at selected multiscale level 2. Downsample by dsxy/dsz parameters 3. Scale interval bounds by 2^level to current-level coordinates 4. Scale crop bounds by same factor 5. Apply crop at final downsampled space This ensures pixel-perfect accuracy when combining multiscale loading with split tile cropping. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add comment explaining that the scaling formula in metadata_builder can produce crop_max values exceeding array dimensions due to rounding. These are clamped to valid bounds in image_reader.py after crop is applied in the correct coordinate space. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Initialize Ray with num_cpus=os.cpu_count() before submitting remote tasks, and shutdown after detection completes. This prevents Ray from silently auto-initializing with default settings and ensures resources are released for other frameworks (e.g. Dask) running in the same process. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Crop bounds were double-scaled by 2^level (once in MetadataBuilder, once in ImageReader) instead of being scaled by the total downsampling factor (2^level * dsxy for XY, 2^level * dsz for Z). This caused ValueErrors when crop bounds exceeded the downsampled array shape at most multiscale levels (only worked by coincidence when 2^level == dsxy). Now crop bounds are kept in level-0 coordinates through MetadataBuilder and scaled once in ImageReader by the full factor. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
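The single-scaling fix amounts to dividing the level-0 crop bounds once by the total factor. A sketch assuming (z, y, x) axis order and illustrative names, not the real Rhapso API:

```python
def scale_crop_once(crop_min, crop_max, level, dsz, dsxy):
    """Scale level-0 crop bounds ONCE by the total downsampling
    factor: 2**level * dsz for Z and 2**level * dsxy for X/Y.
    Sketch of the fix described in the commit."""
    factors = ((2 ** level) * dsz,
               (2 ** level) * dsxy,
               (2 ** level) * dsxy)
    lo = tuple(c // f for c, f in zip(crop_min, factors))
    hi = tuple(c // f for c, f in zip(crop_max, factors))
    return lo, hi
```

Applying 2**level in both MetadataBuilder and ImageReader would shrink the bounds by 2**(2*level) while ignoring dsxy/dsz, which only matched the array shape in the coincidental case 2**level == dsxy.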
Both interest_point_detection and split_dataset now explicitly init Ray with _temp_dir="/scratch/ray/" and os.makedirs(exist_ok=True). This prevents the "No space left on device" error when /tmp fills up during parameter sweeps on Code Ocean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Description of the Changes You Made
Type of Change
Instructions for Testing
Additional Info