R2r multiscale #208 (Draft)
seanmcculloch wants to merge 43 commits into main
Conversation
Commits:
* feat: split xml IP detection
  - Add split tile shape support to overlap_detection.py
  - Add split path construction and crop passthrough to metadata_builder.py
  - Add crop slicing to image_reader.py
  - Add fetch_local_xml utility function to pipelines/utils.py
  - Update xml_to_dataframe.py for split XML support
  - Add uv.lock to .gitignore
  Tests for this feature are in a separate PR.
* test: add split XML IP detection tests
  - Add test_xml_to_dataframe tests for split XML parsing
  - Add test_image_reader tests for crop slicing
  - Add test_metadata_builder tests for split metadata handling
  - Add dataset_split.xml test fixture
  These tests verify the split XML support added in feat/split-xml-ipd.
* Initial plan
* Update Rhapso/data_prep/xml_to_dataframe.py
* Update Rhapso/data_prep/xml_to_dataframe.py: descriptive error with bad split XMLs
* Fix channel parsing in parse_image_loader_split_zarr to use the .ome.zarr suffix
* Update Rhapso/detection/metadata_builder.py: ceil crop_max when downsampling
* Update Rhapso/data_prep/xml_to_dataframe.py: descriptive error upon bad split XML
* Initial plan
* Fix channel extraction, crop validation, path handling, and test signatures
* Move import to top level and use the calculated level instead of a hardcoded '0'
* fix: handling of multiscale levels
* fix: prevent skipping IP detection when there is only one split tile
* feat: add overlapping_only flag (default true) to IPD
* Fix non-split zarr path construction in overlap detection (#164)
* Initial plan
* Fix dim_other path to use proper path joining and include the level

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>
* Initial plan
* Use os.path.join for path construction in metadata_builder
* Use os.path.join consistently for all path construction

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>
* Initial plan
* Add crop bounds validation with comprehensive tests
* Enhance test to verify complete error message with shape

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanmcculloch <86432671+seanmcculloch@users.noreply.github.com>
…ysis
- Add evaluation/__init__.py to make evaluation a proper package
- Add evaluation/detection_qc/ with view_metrics, sweep_analyzer, and plotting
- ViewIPMetrics computes per-view spatial, density, and intensity stats from N5
- SweepAnalyzer aggregates trial results with labeled metric dicts (name/value/description)
- Plotting generates 4 diagnostic PNGs (IP counts, success rates, box plots, heatmap)
- Fix test directory typo: test_evaluaton -> test_evaluation
- Add 23 tests covering metrics, analysis, serialization, and plotting
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add print statements to trace multiscale level selection and zarr path construction during interest point detection. Helps diagnose cases where non-1.0 dsxy/dsz values result in zero detections.
- [interest_point_detection.py] log selected level and dsxy/dsz values
- [overlap_detection.py] log root zarr opening and available multiscales
- [metadata_builder.py] log constructed zarr paths for each view
- [image_reader.py] log zarr open attempts with error handling
All prints go to stdout and will be captured in /results/output.log
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Print file_path for every chunk loaded to see what multiscale level is being used in practice. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
OME-Zarr stores already have the multiscale level in the path (e.g., SPIM.ome.zarr/0), so don't append it again. Check if the path already ends with the level before appending. This fixes 0 IPs detected when dsxy/dsz > 1, which was trying to open non-existent paths like SPIM.ome.zarr/0/0. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
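The suffix check described above can be sketched as a small path helper. The function name is hypothetical (the real logic lives in the path construction code, likely metadata_builder.py); only the "don't append the level twice" rule comes from the commit:

```python
import posixpath

def zarr_level_path(store_path: str, level: int) -> str:
    """Append a multiscale level to a zarr store path, unless the path
    already ends with that level (e.g. SPIM.ome.zarr/0). Hypothetical
    helper illustrating the fix, not the actual Rhapso API."""
    base = store_path.rstrip("/")
    if base.endswith("/" + str(level)):
        return base  # level already present; avoid SPIM.ome.zarr/0/0
    return posixpath.join(base, str(level))
```

With this guard, `zarr_level_path("SPIM.ome.zarr/0", 0)` leaves the path unchanged instead of producing the non-existent `SPIM.ome.zarr/0/0`.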
OME-Zarr multiscale levels are arrays, not groups. Check for .keys() before calling it; if not present, it's an array. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
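The duck-typing check the commit describes might look like the following sketch (illustrative function name, not the real Rhapso code): zarr group nodes expose `.keys()`, while level arrays do not.

```python
def is_zarr_group(node) -> bool:
    """Duck-typed check: zarr groups expose a callable .keys();
    multiscale level arrays do not. Sketch of the commit's fix."""
    return callable(getattr(node, "keys", None))
```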
When a specific multiscale level (e.g., /4) fails to open, attempt to inspect the root zarr to show which levels are actually available. This helps identify if the issue is a missing level vs. a path/access problem. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
In split tile mode, zarr_base_path only has the root (SPIM.ome.zarr/) but multiscale levels live inside each per-tile zarr (e.g., Tile_X_..._ch_405.zarr/4). Join zarr_base_path + file_path before appending the multiscale level. This was causing "nothing found at path ''" errors for all non-1.0 dsxy/dsz values because SPIM.ome.zarr/4 doesn't exist — only SPIM.ome.zarr/Tile_X_.../4 does. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add prints to trace:
- Image shape and data stats after zarr open
- Dask array shape after transpose
- Shape before/after downsampling
- Input image stats and peak counts in DoG.run
This helps diagnose why levels 1-4 find 0 points despite data existing in S3.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Bounds in interval_key are always in full-resolution (level 0) coordinates. When loading from a higher multiscale level (e.g., level 4), the data is downsampled by 2^level. We must divide the bounds by 2^level before slicing. Without this, bounds calculated for level 0 would try to access far out of bounds in a level 4 array, resulting in tiny or empty slices and 0 detected points. This fixes the "0 points detected at dsxy/dsz != 1.0" bug by properly scaling bounds for multiscale-level zarr loads. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
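The bounds scaling described above amounts to an integer division by 2**level. A minimal sketch (hypothetical names; the real code lives in image_reader.py):

```python
def scale_bounds_to_level(bounds_min, bounds_max, level):
    """Map level-0 (full-resolution) bounds to multiscale `level`,
    where each level halves the resolution, i.e. divide by 2**level.
    Sketch of the fix, not the actual Rhapso implementation."""
    factor = 2 ** level
    return ([b // factor for b in bounds_min],
            [b // factor for b in bounds_max])
```

Without this step, a level-0 bound like 512 would be used directly to slice a level-4 array (which is 16x smaller), yielding an empty or tiny slice and hence 0 detections.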
Remove debugging prints that were only needed during initial troubleshooting:
- Image shape/stats prints from ImageReader
- DoG peak count and stats prints
- MetadataBuilder level/path prints
The core fix (bounds scaling for multiscale levels) is working correctly, and the verbose output is no longer needed. Logs are now much cleaner and more token-efficient for capsule runs.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…ion debug
Add strategic print statements to trace:
- Multiscale level selection and leftovers in overlap_detection
- Path construction for split tile zarr stores in metadata_builder
- Bounds scaling for multiscale levels in image_reader
- Peak detection counts in difference_of_gaussian
This enables diagnosis of the 0 IP detection bug when dsxy/dsz != 1.0.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add explicit prints with flush=True in Ray remote task to diagnose if tasks are executing and where errors occur. Helps identify Ray worker serialization issues. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
The crop bounds validation was too strict. Since we slice as data[crop_min:crop_max+1] (exclusive upper bound), crop_max can legitimately equal array_shape[i] - 1. This was causing "crop_max exceeds array dimension" errors when loading multiscale levels where scaled crop bounds reached the edge of the downsampled array. Changed validation from: if crop_max[i] >= array_shape[i] to: if crop_max[i] > array_shape[i] - 1 Fixes 0 IP detection at multiscale levels > 0 for split tile workflows. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
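The invariant being validated is: with inclusive bounds sliced as `data[crop_min:crop_max + 1]`, the largest legal `crop_max` on axis i is `array_shape[i] - 1`. A sketch under assumed names (the real validation lives in image_reader.py):

```python
def validate_crop(crop_min, crop_max, array_shape):
    """Reject crops that fall outside the array, while accepting
    crop_max == array_shape[i] - 1 (the last valid index) since the
    slice data[lo:hi + 1] has an exclusive upper bound. Illustrative
    sketch, not the actual Rhapso code."""
    for i, (lo, hi, n) in enumerate(zip(crop_min, crop_max, array_shape)):
        if lo < 0 or hi > n - 1 or lo > hi:
            raise ValueError(
                f"invalid crop [{lo}, {hi}] for axis {i} of size {n}")
```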
The scaling formula in metadata_builder.py can legitimately produce crop_max values that exceed array dimensions due to rounding when scaling from full-resolution to multiscale coordinates. Rather than reject these, clamp crop_max to the valid range (0 to shape-1) since the intent is to crop to available data. This fixes interest point detection at multiscale levels when split tile crop bounds are provided. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
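Clamping instead of rejecting can be sketched as below (hypothetical helper name; per the commit, the real clamping happens in image_reader.py after the crop is mapped to the right coordinate space):

```python
def clamp_crop(crop_min, crop_max, array_shape):
    """Clamp inclusive crop bounds into [0, shape - 1] per axis, since
    rounding while scaling level-0 bounds to a multiscale level can
    push crop_max just past the array edge. Illustrative sketch."""
    lo = [max(0, c) for c in crop_min]
    hi = [min(n - 1, c) for c, n in zip(crop_max, array_shape)]
    return lo, hi
```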
Crop bounds come from metadata_builder in level-0 (full-resolution) coordinates. The previous code was applying crop BEFORE downsampling and scaling, which breaks the coordinate system for split tiles at multiscale levels. Correct order: 1. Load array at selected multiscale level 2. Downsample by dsxy/dsz parameters 3. Scale interval bounds by 2^level to current-level coordinates 4. Scale crop bounds by same factor 5. Apply crop at final downsampled space This ensures pixel-perfect accuracy when combining multiscale loading with split tile cropping. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add comment explaining that the scaling formula in metadata_builder can produce crop_max values exceeding array dimensions due to rounding. These are clamped to valid bounds in image_reader.py after crop is applied in the correct coordinate space. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Initialize Ray with num_cpus=os.cpu_count() before submitting remote tasks, and shutdown after detection completes. This prevents Ray from silently auto-initializing with default settings and ensures resources are released for other frameworks (e.g. Dask) running in the same process. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Crop bounds were double-scaled by 2^level (once in MetadataBuilder, once in ImageReader) instead of being scaled by the total downsampling factor (2^level * dsxy for XY, 2^level * dsz for Z). This caused ValueErrors when crop bounds exceeded the downsampled array shape at most multiscale levels (only worked by coincidence when 2^level == dsxy). Now crop bounds are kept in level-0 coordinates through MetadataBuilder and scaled once in ImageReader by the full factor. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
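The single-scaling fix amounts to dividing the level-0 crop bounds once by the total factor. A sketch assuming (z, y, x) axis order and illustrative names, not the real Rhapso API:

```python
def scale_crop_once(crop_min, crop_max, level, dsz, dsxy):
    """Scale level-0 crop bounds ONCE by the total downsampling
    factor: 2**level * dsz for Z and 2**level * dsxy for X/Y.
    Sketch of the fix described in the commit."""
    factors = ((2 ** level) * dsz,
               (2 ** level) * dsxy,
               (2 ** level) * dsxy)
    lo = tuple(c // f for c, f in zip(crop_min, factors))
    hi = tuple(c // f for c, f in zip(crop_max, factors))
    return lo, hi
```

Applying 2**level in both MetadataBuilder and ImageReader would shrink the bounds by 2**(2*level) while ignoring dsxy/dsz, which only matched the array shape in the coincidental case 2**level == dsxy.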
Both interest_point_detection and split_dataset now explicitly init Ray with _temp_dir="/scratch/ray/" and os.makedirs(exist_ok=True). This prevents the "No space left on device" error when /tmp fills up during parameter sweeps on Code Ocean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Description of the Changes You Made
Type of Change
Instructions for Testing
Additional Info