perf(images): materialize rasterized image once to avoid per-channel re-warp#708
Merged
Conversation
…re-warp `spatialdata.rasterize` returns a lazy dask array; `_render_images` then reads it once per channel (NaN check, compositing, draw), so the affine warp re-runs N_passes x N_channels times per render. Materializing the rasterized result once collapses that to a single warp. Pixel-identical output (no baseline change); the rasterized array is always display-sized so the eager compute is memory-bounded. Measured ~15-20x on multi-channel images (8ch 6000^2: 10.4s -> 0.5s); neutral for single-channel. Covers images and labels. Closes #707
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #708 +/- ##
==========================================
- Coverage 76.40% 76.36% -0.04%
==========================================
Files 14 14
Lines 4314 4316 +2
Branches 1003 1004 +1
==========================================
Hits 3296 3296
Misses 663 663
- Partials 355 357 +2
🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
render_imagesre-materializes the lazy rasterized image once per channel, on every pass, so the affine warp runsN_passes × N_channelstimes per render. This materializes it once.Root cause
spatialdata.rasterizereturns a lazy dask array (~5 ms)._render_imagesthen reads it per channel in several passes — the NaN check, the per-channel compositing (img.sel(c=ch).values), and the draw — each of which recomputes the entire dask graph (theorder=0affine warp +concatenate3). So an 8-channel render pays for the warp roughlypasses × 8times.Measured (synthetic 8ch float32, chunked like the real pipeline):
_rasterize_if_necessary(lazy)pl.show(8ch 6000²).sel(c=ch).values.compute()of the same arrayFix
Materialize the rasterized array once before returning:
Results
End-to-end: 10 457 ms → 508 ms (20.6×) at 8ch 6000²; 15.4× at 4000². Single-channel is neutral. The datashader path already
.compute()s, so it is unaffected.Why this is safe
np.array_equal == True) — transform, coords, and channel labels preserved — so no visual baselines change (that's the correctness proof: existing image/label tests stay green unchanged)._rasterize_if_necessaryis always display-sized (rasterized to ~target, or returned only when ≤ target+100 per axis), so the eager compute is ~15 MB at 700²×8._rasterize_if_necessary).Closes #707.