38 commits
d67563e
feat: add CompactMask for memory-efficient crop-RLE mask storage
Borda Mar 2, 2026
77a0117
fix: resolve unresolved PR 2159 review suggestions
Borda Mar 2, 2026
ad6ceb7
fix: correct bounding box coordinates in CompactMask doctests
Borda Mar 2, 2026
a417abc
Merge branch 'develop' into debug/oom
Borda Mar 2, 2026
969e002
feat: implement memory-efficient IoU and NMS with CompactMask integra…
Borda Mar 2, 2026
d41fcf7
Merge branch 'debug/oom' of https://github.com/roboflow/supervision i…
Borda Mar 2, 2026
91742a2
Merge branch 'develop' into debug/oom
Borda Mar 10, 2026
13d0156
test: add extensive tests for CompactMask IoU, NMS, and InferenceSlic…
Borda Mar 10, 2026
490cc0a
fix(pre_commit): 🎨 auto format pre-commit hooks
pre-commit-ci[bot] Mar 10, 2026
75ed494
Merge branch 'develop' into debug/oom
Borda Mar 10, 2026
1f74014
fix(pre_commit): 🎨 auto format pre-commit hooks
pre-commit-ci[bot] Mar 10, 2026
eff23f6
feat(examples): add CompactMask demo and benchmark
Borda Mar 11, 2026
66917c2
Merge branch 'develop' into debug/oom
Borda Mar 11, 2026
b42bc46
feat(examples): expand CompactMask benchmark with new stages and metrics
Borda Mar 11, 2026
c1b2f26
feat(tests): add detailed CompactMask tests for NMM, centroids, holes…
Borda Mar 11, 2026
e0f4979
feat(compact_mask): add repack(), fix merge perf, and add parity tests
Borda Mar 11, 2026
b058368
fix(pre_commit): 🎨 auto format pre-commit hooks
pre-commit-ci[bot] Mar 11, 2026
3784110
Apply suggestions from code review
Borda Mar 11, 2026
9ff6096
fix(masks): handle empty crops by defaulting centroid to (0, 0)
Borda Mar 11, 2026
3b76858
Merge branch 'debug/oom' of https://github.com/roboflow/supervision i…
Borda Mar 11, 2026
33f1dcc
feat(compact_mask): add `bbox_xyxy` property and improve type annotat…
Borda Mar 11, 2026
2968028
Apply suggestions from code review
Borda Mar 11, 2026
e1a5df8
feat(compact_mask): enhance `with_offset` for clipping and add tests
Borda Mar 11, 2026
5b1c639
docs(compact_mask): unwrap prose and add per-operation speedup analysis
Borda Mar 11, 2026
9ee7fd0
perf(compact_mask): fast path in with_offset avoids decode/re-encode
Borda Mar 11, 2026
8c28649
fix(benchmark): count NMS mismatches and explain exact-vs-resize diff…
Borda Mar 11, 2026
b63130b
test(compact_mask): add 121 parametrised random-scenario parity tests
Borda Mar 11, 2026
2930e1b
refactor: rename single-char variables to descriptive names
Borda Mar 11, 2026
b747e24
fix(nms): remove resize-to-640 approximation from mask_non_max_suppre…
Borda Mar 12, 2026
b2234da
docs(compact_mask): update README with fresh benchmark results
Borda Mar 12, 2026
a0783de
refactor(benchmark): improve summary table logic, add CSV export
Borda Mar 12, 2026
3b88b6e
fix(pre_commit): 🎨 auto format pre-commit hooks
pre-commit-ci[bot] Mar 12, 2026
c52adeb
docs(compact_mask): update README with revised benchmark speedups and…
Borda Mar 12, 2026
03e3c77
Merge branch 'debug/oom' of https://github.com/roboflow/supervision i…
Borda Mar 12, 2026
437e7ee
fix(pre_commit): 🎨 auto format pre-commit hooks
pre-commit-ci[bot] Mar 12, 2026
12ccf4f
Merge branch 'develop' into debug/oom
Borda Mar 13, 2026
4241ff9
Merge branch 'develop' into debug/oom
Borda Mar 18, 2026
12ae35b
Merge branch 'develop' into debug/oom
Borda Apr 1, 2026
243 changes: 243 additions & 0 deletions examples/compact_mask/README.md
@@ -0,0 +1,243 @@
# CompactMask — Memory-Efficient Mask Storage

This example benchmarks `CompactMask`, a new mask representation introduced in
`supervision` that replaces dense `(N, H, W)` boolean arrays with a crop-scoped
Run-Length Encoding (RLE). The benchmark checks that results match the dense
path (pixel-perfect annotation, exact area, lossless round-trip) and, in the
sample results below, shows memory savings of 3 100x to 49 000x and annotation
speedups of 27x to 439x, with no change to your existing `Detections` code.

---

## The Problem

Instance segmentation models return one boolean mask per detected object.
`supervision` stores these as a stacked `(N, H, W)` numpy array.

For a 4K image with 1 000 detected objects:

```
1 000 x 3840 x 2160 x 1 byte = 8.3 GB
```

At this scale, typical pipelines crash with `MemoryError` before a single frame
is annotated. Aerial imagery, satellite tiles, and high-density crowd scenes all
hit this wall.
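
The arithmetic above generalises to any resolution and object count. A minimal sketch, assuming one byte per boolean pixel (NumPy's `bool` dtype); the helper name `dense_mask_bytes` is illustrative and not part of `supervision`:

```python
import numpy as np


def dense_mask_bytes(num_objects: int, height: int, width: int) -> int:
    """Bytes needed for a dense (N, H, W) boolean mask stack."""
    return num_objects * height * width * np.dtype(bool).itemsize


# 4K frame (3840x2160) at increasing object counts
for n in (10, 100, 1_000):
    print(f"N={n:>5}: {dense_mask_bytes(n, 2160, 3840) / 1e9:.2f} GB")
```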

---

## The Solution — Crop-RLE Storage

`CompactMask` stores each mask as a run-length encoding of its **bounding-box
crop** rather than the full image canvas.

```
dense (N,H,W) mask → N x crop_RLE + N x (x1,y1) offset
8.3 GB → ~280 KB (measured, 4K-1000-10% benchmark scenario)
```

The bounding boxes are already present in `Detections.xyxy`, so no extra
metadata is required from the caller.
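
To make the idea concrete, here is a minimal NumPy sketch of crop-scoped, row-major run-length encoding. It is not the library's implementation: the function names, the exclusive `x2`/`y2` convention, and the "first run counts False pixels" convention are all assumptions made for illustration.

```python
import numpy as np


def encode_crop_rle(mask, box):
    """Hypothetical encoder: run lengths of the row-major (C-order) crop.

    Runs alternate False, True, False, ...; the first run counts False pixels
    and is 0 when the crop starts with a True pixel.
    """
    x1, y1, x2, y2 = box  # assumption: x2 / y2 are exclusive
    crop = mask[y1:y2, x1:x2]
    flat = crop.ravel()  # C-order flatten of the crop only
    # Indices where the pixel value flips mark the run boundaries.
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1
    runs = np.diff(np.concatenate(([0], change, [flat.size])))
    if flat.size and flat[0]:
        runs = np.concatenate(([0], runs))  # keep the False-first convention
    return runs, crop.shape


def decode_crop_rle(runs, crop_shape):
    """Expand run lengths back into the boolean crop."""
    values = np.arange(len(runs)) % 2 == 1  # odd-indexed runs are True
    return np.repeat(values, runs).reshape(crop_shape)


mask = np.zeros((8, 10), dtype=bool)
mask[2:5, 3:6] = True  # 3x3 object
box = (2, 1, 7, 6)  # 5x5 crop around it
runs, crop_shape = encode_crop_rle(mask, box)
restored = decode_crop_rle(runs, crop_shape)
```

Only `runs`, `crop_shape`, and the crop origin `(x1, y1)` need to be kept per object; the full `(H, W)` canvas is never stored.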

### Theoretical analysis (4K scene, 80x80 px objects, ~65% fill per bbox)

Assumptions used throughout the PR design analysis:

| Parameter | Value |
| ---------------------- | ------------------------ |
| Image size | 4K — 3840x2160 = 8.29 MP |
| Avg bounding box | 80x80 px = 6 400 px² |
| Fill ratio within bbox | ~65% |
| Avg contour vertices | ~400 pts |
| Avg RLE runs / mask | ~240 (3 runs x 80 rows) |

#### Space comparison

| Format | Per object | N=100 | N=1 000 | vs Dense |
| ------------------- | -------------- | ------ | ---------- | --------- |
| **Dense** (current) | 8.29 MB | 829 MB | **8.3 GB** | 1x |
| Local Crop + Offset | 6.4 KB | 640 KB | 6.4 MB | 1 300x |
| **Crop-RLE** ✓ | ~2 KB | 200 KB | **2 MB** | 4 000x |
| Polygon ⚠ lossy | ~3.2 KB | 320 KB | 3.2 MB | 2 600x |
| memmap | 8.29 MB (disk) | 829 MB | 8.3 GB | 1x (disk) |

Crop-RLE beats Local Crop because it stores run lengths rather than one byte
per pixel: roughly 240 runs per mask instead of 6 400 crop pixels, about 35% of
which are background.
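
The per-object column can be reproduced from the assumptions table with a few lines of arithmetic. The bytes-per-run and bytes-per-vertex figures below are assumptions picked because they match the ~2 KB and ~3.2 KB entries; the README does not state the actual storage widths.

```python
IMAGE_PX = 3840 * 2160  # dense: one byte per pixel of the full frame
CROP_PX = 80 * 80  # local crop: one byte per pixel of the bbox
RUNS_PER_MASK = 240  # from the assumptions table
VERTICES = 400  # from the assumptions table
BYTES_PER_RUN = 8  # assumption, not stated in the README
BYTES_PER_VERTEX = 8  # assumption: two 32-bit coordinates per point

per_object = {
    "dense": IMAGE_PX,
    "local_crop": CROP_PX,
    "crop_rle": RUNS_PER_MASK * BYTES_PER_RUN,
    "polygon": VERTICES * BYTES_PER_VERTEX,
}

for name, nbytes in per_object.items():
    ratio = per_object["dense"] / nbytes
    print(f"{name:<11} {nbytes:>10,} B  {ratio:>8,.0f}x vs dense")
```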

#### Encode time: dense array → format

| Format | Complexity | N=10 | N=100 | N=1 000 |
| ------------------- | --------------------------------- | ------- | ------- | --------- |
| Local Crop + Offset | O(A) — strided slice from xyxy | ~0.1 ms | ~1 ms | ~10 ms |
| **Crop RLE** | O(A) — scan crop rows for runs | ~0.2 ms | ~2 ms | ~20 ms |
| Polygon | O(P) — `cv2.findContours` on crop | ~2 ms | ~20 ms | ~200 ms |
| memmap | O(I) — write 8.29 MB to disk | ~80 ms | ~800 ms | ~8 000 ms |

#### Decode time: format → full (H, W) mask

Required by `MaskAnnotator`, `mask_iou_batch`, `merge()`, etc.
The dominant cost at 4K is **allocating and zeroing an 8.29 MB array**, which is
identical across all in-memory formats once full materialisation is needed.

| Format | N=10 | N=100 | N=1 000 |
| --------------------- | ------ | ------- | --------- |
| Local Crop / Crop RLE | ~3 ms | ~30 ms | ~300 ms |
| Polygon | ~5 ms | ~50 ms | ~500 ms |
| memmap | ~80 ms | ~800 ms | ~8 000 ms |

#### Decode time: crop-only path (optimised)

When callers need only the bounding-box region — `MaskAnnotator` crop-paint
path, `.area`, `contains_holes`, `filter_segments_by_distance`:

| Format | Complexity | N=10 | N=100 | N=1 000 |
| ------------------- | -------------------------------- | -------- | ------- | --------- |
| Local Crop + Offset | O(1) — already stored | ~0 ms | ~0 ms | ~0 ms |
| **Crop RLE** ✓ | O(A) — expand ~240 runs | ~0.02 ms | ~0.2 ms | ~2 ms |
| Polygon | O(A) — `fillPoly` on crop canvas | ~2 ms | ~20 ms | ~200 ms |
| memmap | N/A — always full-size | ~80 ms | ~800 ms | ~8 000 ms |

Crop RLE's `.crop()` method powers the `MaskAnnotator` optimisation — it never
allocates the full image canvas, which is the entire source of the annotation
speedup.
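
The crop-paint idea can be sketched as follows. This is an illustration of the technique, not the actual `MaskAnnotator` code; `paint_crop` and its parameters are hypothetical, and the crop is assumed to lie fully inside the frame.

```python
import numpy as np


def paint_crop(frame, crop_mask, origin_xy, color, opacity=0.5):
    """Blend `color` into `frame` in place, touching only the bbox region."""
    x1, y1 = origin_xy
    crop_h, crop_w = crop_mask.shape
    region = frame[y1:y1 + crop_h, x1:x1 + crop_w]  # a view, no copy of the frame
    blended = (1 - opacity) * region[crop_mask] + opacity * np.asarray(color)
    region[crop_mask] = blended.astype(frame.dtype)
    return frame


frame = np.zeros((10, 10, 3), dtype=np.uint8)
crop_mask = np.ones((2, 2), dtype=bool)
crop_mask[0, 0] = False  # one background pixel inside the bbox
paint_crop(frame, crop_mask, origin_xy=(3, 4), color=(200, 100, 0))
```

The temporary arrays are proportional to the crop area, so the cost per object does not depend on the image resolution.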

#### IoU / NMS at 1 % bbox overlap rate (sparse aerial scene)

| Format | Strategy | N=1 000 |
| ------------------- | ------------------------------------- | ---------- |
| Dense (current) | All pairs, 640² pixel AND | ~10 000 ms |
| Local Crop + Offset | Bbox pre-filter → pixel IoU | **~5 ms** |
| Crop RLE | Bbox pre-filter → expand intersection | **~15 ms** |

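At N=1 000 with 1 % overlap, the bbox pre-filter reduces 499 500 candidate pairs
to ~5 000 overlapping pairs, a ~100x reduction in the number of pairs that need
pixel-level work. Each surviving pair is also compared on crop-sized regions
rather than 640² canvases, which is consistent with the ~2 000x gap in the table
(~10 000 ms vs ~5 ms).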
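
A bbox pre-filter of this kind can be sketched in a few lines of NumPy. The function name is hypothetical and the library's actual IoU/NMS code may be organised differently; this version builds an `N x N` boolean matrix, which is fine at N=1 000.

```python
import numpy as np


def overlapping_pairs(xyxy):
    """Return index pairs (i, j), i < j, whose boxes intersect with positive area."""
    x1, y1, x2, y2 = xyxy.T
    inter_w = np.minimum(x2[:, None], x2[None, :]) - np.maximum(x1[:, None], x1[None, :])
    inter_h = np.minimum(y2[:, None], y2[None, :]) - np.maximum(y1[:, None], y1[None, :])
    overlap = (inter_w > 0) & (inter_h > 0)
    # Keep the strict upper triangle so each unordered pair appears once.
    i, j = np.nonzero(np.triu(overlap, k=1))
    return np.stack([i, j], axis=1)


boxes = np.array(
    [[0, 0, 10, 10], [5, 5, 15, 15], [20, 20, 30, 30]], dtype=np.float32
)
pairs = overlapping_pairs(boxes)
all_pairs = len(boxes) * (len(boxes) - 1) // 2
print(f"{len(pairs)} of {all_pairs} pairs need pixel-level IoU")
```

Only the pairs returned here would go on to the more expensive mask intersection step.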

---

## Why Crop-RLE Was Chosen over Local Crop

Both formats compress extremely well; the deciding factors for Crop-RLE are:

1. **~3x smaller** for masks that are themselves sparse within their bounding box.
2. **COCO RLE interop path** — row-major crop RLE can be re-encoded to
column-major full-image RLE for `pycocotools` if needed.
3. `.area` computed directly from run lengths — no materialisation, no allocation.

The main trade-off: crop-only decode is O(A) rather than O(1). For the common
solid-fill segmentation mask this is negligible (\<0.1 ms per mask).
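
Point 3 above follows directly from the encoding: the area is the sum of the True runs. A minimal sketch, assuming runs alternate False, True, False, ... with the first run counting False pixels (an assumption; the library's internal layout is not shown in this README):

```python
import numpy as np


def area_from_runs(runs):
    """Pixel area of a mask from its run lengths, without decoding it."""
    return int(np.asarray(runs)[1::2].sum())  # odd-indexed runs are True


print(area_from_runs([6, 3, 2, 3, 2, 3, 6]))  # 3x3 object inside a 5x5 crop
```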

---

## Drop-In Compatibility

`CompactMask` implements the subset of the `np.ndarray` interface that `Detections` and `MaskAnnotator` rely on, so it can be passed wherever a dense mask is expected:

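```python
import numpy as np
import supervision as sv
from supervision.detection.compact_mask import CompactMask

# Synthetic inputs so the snippet runs on its own (two 60x60 objects on FHD):
H, W = 1080, 1920
xyxy = np.array([[100, 100, 180, 180], [400, 300, 480, 380]], dtype=np.float32)
masks_dense = np.zeros((len(xyxy), H, W), dtype=bool)
masks_dense[0, 110:170, 110:170] = True
masks_dense[1, 310:370, 410:470] = True
class_ids = np.array([0, 1])
confidence = np.array([0.9, 0.4], dtype=np.float32)
frame = np.zeros((H, W, 3), dtype=np.uint8)

# Build from an existing dense (N, H, W) bool array:
compact = CompactMask.from_dense(masks_dense, xyxy, image_shape=(H, W))

# Use exactly like a dense mask; no other code changes needed:
detections = sv.Detections(
    xyxy=xyxy, mask=compact, class_id=class_ids, confidence=confidence
)

# Filtering, merging, area all work transparently:
filtered = detections[detections.confidence > 0.5]
areas = detections.area  # RLE sum, no materialisation
merged = sv.Detections.merge([detections, filtered])

# MaskAnnotator works without any change:
annotated = sv.MaskAnnotator().annotate(frame, detections)

# Materialise back to dense when you need raw numpy:
dense_again = compact.to_dense()  # (N, H, W) bool
```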

Supported indexing patterns:

| Expression | Returns |
| ------------------ | ---------------------------- |
| `mask[i]` (int) | Dense `(H, W)` bool array |
| `mask[bool_array]` | New `CompactMask` (filtered) |
| `mask[slice]` | New `CompactMask` |
| `np.asarray(mask)` | Dense `(N, H, W)` bool array |

---

## Benchmark

Run on any machine — no GPU or real model required:

```bash
uv run python examples/compact_mask/benchmark.py
```

Three image tiers x three fill fractions (5 / 10 / 20 %):

| Tier | Resolution | Typical use-case |
| ---- | ---------- | ----------------------------------- |
| FHD | 1920x1080 | Video surveillance, robotics |
| 4K | 3840x2160 | Drone footage, cinema |
| SAT | 8192x8192 | Sentinel-2 / GeoTIFF benchmark tile |

Dense timing is skipped automatically when the array would exceed 12 GB
(`DENSE_SKIP_GB`), preventing swap thrashing on SAT scenarios. Memory is still
reported as theoretical `NxHxW` bytes.
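
The skip rule amounts to a size check before allocating. A minimal sketch, assuming decimal gigabytes and one byte per pixel; `should_skip_dense` is a hypothetical name, and the real logic in `benchmark.py` is not reproduced here:

```python
DENSE_SKIP_GB = 12  # threshold named in this README; decimal GB assumed


def should_skip_dense(num_objects, height, width, limit_gb=DENSE_SKIP_GB):
    """True when the dense (N, H, W) bool stack would exceed the limit."""
    dense_bytes = num_objects * height * width  # 1 byte per bool pixel
    return dense_bytes > limit_gb * 1e9


print(should_skip_dense(200, 8192, 8192))  # SAT-200: about 13.4 GB
print(should_skip_dense(1000, 2160, 3840))  # 4K-1000: about 8.3 GB
```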

### Sample results (macOS, Apple M-series, REPS=5)

| Scenario | Dense mem | Compact theor. | Compact actual | Mem x | Area x | Annot x |
| ----------- | --------- | -------------- | -------------- | ------- | ------ | ------- |
| FHD-100-5% | 207 MB | 33 KB | 62 KB | 6 300x | 280x | 70x |
| FHD-100-20% | 207 MB | 67 KB | 137 KB | 3 100x | 267x | 27x |
| 4K-500-5% | 4 147 MB | 139 KB | 250 KB | 30 000x | 1 087x | 383x |
| 4K-1000-10% | 8 294 MB | 277 KB | 498 KB | 30 000x | 1 120x | 439x |
| SAT-200-5% | 13 422 MB | 271 KB | 485 KB | 49 000x | N/A | N/A |

- **Compact theor.** — sum of internal numpy buffer `nbytes`
- **Compact actual** — `tracemalloc` peak during `CompactMask.from_dense()`, including Python object overhead (~2x theoretical for small object counts)
- **Mem x** — dense / compact theoretical ratio
- **Area x** — `.area` speedup; RLE sums True-pixel counts with no materialisation
- **Annot x** — `MaskAnnotator` speedup; crop-paint avoids full-frame allocation
- **N/A** — dense timing skipped (array > 12 GB)
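
The "Compact actual" column relies on `tracemalloc` peak tracking. A minimal sketch of that measurement pattern using only the standard library; `peak_bytes` is a hypothetical helper, not a function from `benchmark.py`:

```python
import tracemalloc


def peak_bytes(func, *args, **kwargs):
    """Run `func` and return (result, peak traced allocation in bytes)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return result, peak


# Stand-in workload; in the benchmark this would be the CompactMask build step.
buffer, peak = peak_bytes(bytearray, 1_000_000)
print(f"peak: {peak / 1e6:.2f} MB")
```

Because the peak includes Python object overhead as well as raw buffers, it is expected to sit above the summed `nbytes` figure.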

All non-skipped scenarios pass: pixel-perfect annotation, exact area,
lossless `to_dense()` roundtrip.

---

## Use-Cases

- **Aerial / satellite imagery** — thousands of small objects on large tiles;
dense masks exhaust RAM before inference completes.
- **High-density crowd / cell segmentation** — N > 500 on FHD already requires
several GB of mask storage per batch.
- **Real-time annotation pipelines** — crop-paint cuts annotation from seconds
to milliseconds at 4K resolution.
- **Long-running tracking** — accumulated `Detections` across many frames stay
in kilobytes rather than gigabytes.
- **`InferenceSlicer`** — `with_offset()` adjusts crop origins directly when
stitching tile results; no dense materialisation needed.
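
The `InferenceSlicer` point can be illustrated with plain NumPy: moving a tile's masks into full-image coordinates only requires translating each crop origin. This sketch uses a hypothetical `shift_origins` helper and omits the clipping that the commit log mentions for `with_offset`; the real method's signature is not shown in this README.

```python
import numpy as np


def shift_origins(origins_xy, tile_offset_xy):
    """Translate per-mask crop origins from tile to full-image coordinates."""
    return np.asarray(origins_xy) + np.asarray(tile_offset_xy)


tile_origins = [[10, 20], [0, 5]]  # (x1, y1) of each crop inside the tile
shifted = shift_origins(tile_origins, tile_offset_xy=(512, 1024))
print(shifted.tolist())
```

The run lengths themselves are untouched, which is why no decode or re-encode is needed in the common case.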

---

## Limitations

- `CompactMask` is **not** a full `np.ndarray`. Call `.to_dense()` before
passing to code that requires arbitrary ndarray methods (`astype`, `reshape`,
`ravel`, `any`, `all`, …).
- RLE format is **row-major (C-order), crop-scoped** — incompatible with
pycocotools / COCO API RLEs (column-major, full-image-scoped). Use
`.to_dense()` first if you need pycocotools interop.
- `from_dense()` requires the input `(N, H, W)` array to fit in memory.
For truly OOM-scale data, build `CompactMask` per-detection directly from
model output crops rather than from a pre-allocated dense stack.
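
For the pycocotools limitation above, the conversion after densifying looks roughly like this. The sketch builds an uncompressed COCO-style RLE dict (column-major counts, starting with the zero run) from one crop; the function name is hypothetical, it materialises the full `(H, W)` canvas, and it does not call pycocotools itself.

```python
import numpy as np


def to_coco_uncompressed_rle(crop_mask, origin_xy, image_shape):
    """Full-image, column-major run lengths in COCO's uncompressed layout."""
    height, width = image_shape
    x1, y1 = origin_xy
    crop_h, crop_w = crop_mask.shape
    full = np.zeros((height, width), dtype=bool)  # full-canvas allocation
    full[y1:y1 + crop_h, x1:x1 + crop_w] = crop_mask
    flat = full.ravel(order="F")  # column-major, as COCO expects
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1
    counts = np.diff(np.concatenate(([0], change, [flat.size])))
    if flat[0]:
        counts = np.concatenate(([0], counts))  # counts must start with zeros
    return {"size": [height, width], "counts": counts.tolist()}


rle = to_coco_uncompressed_rle(
    np.array([[True, True]]), origin_xy=(1, 1), image_shape=(3, 4)
)
print(rle)
```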

---

## Files

| File | Description |
| -------------- | ------------------------------------------------ |
| `benchmark.py` | Full benchmark across FHD / 4K / satellite tiers |
| `README.md` | This file |