fix: is_empty() returns False for empty tracker arrays (issue #2195) #2203

Zeesejo wants to merge 1 commit into roboflow:develop from
Conversation
Previously, `is_empty()` used an equality comparison against `Detections.empty()`, which sets `tracker_id=None`. When `tracker_id` was `np.array([])` instead of `None` (e.g., after filtering a `Detections` object that had a `tracker_id`), the `__eq__` check failed even though the detection set was genuinely empty. Fix: check `len(self) == 0` directly, preserving data/metadata neutrality. Fixes roboflow#2195
Pull request overview

Fixes `sv.Detections.is_empty()` returning `False` for empty detections when optional fields (e.g. `tracker_id`) are present as zero-length arrays instead of `None`.

Changes:
- Reimplemented `Detections.is_empty()` to return `len(self) == 0` (based on `xyxy` length only).
- Expanded the `is_empty()` docstring to clarify the new behavior.
- Removed substantial docstring example blocks from `from_lmm()`/`from_vlm()` and adjusted the `__getitem__` docstring example.
```diff
 def is_empty(self) -> bool:
     """
-    Returns `True` if the `Detections` object is considered empty.
+    Returns `True` if the `Detections` object is considered empty,
+    i.e. contains no detections. This check is based solely on the
+    number of bounding boxes, making it robust to optional fields
+    (such as `tracker_id`) being empty arrays rather than `None`.
     """
-    empty_detections = Detections.empty()
-    empty_detections.data = self.data
-    empty_detections.metadata = self.metadata
-    return bool(self == empty_detections)
+    return len(self) == 0
```
Add regression tests for the updated `Detections.is_empty()` behavior (e.g., `xyxy` empty with `tracker_id=np.array([])` and/or other optional fields as empty arrays) to ensure the original issue (#2195) is covered and doesn't regress.
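A runnable sketch of the requested regression cases. `FakeDetections` is a hypothetical stand-in that mirrors only the relevant fields and the fixed length-based `is_empty()` logic, so the sketch runs without supervision installed; in the real test suite these cases would construct `sv.Detections` directly.

```python
from __future__ import annotations

from dataclasses import dataclass
import numpy as np


# Hypothetical stand-in for sv.Detections, for illustration only.
@dataclass
class FakeDetections:
    xyxy: np.ndarray
    tracker_id: np.ndarray | None = None
    confidence: np.ndarray | None = None

    def __len__(self) -> int:
        return len(self.xyxy)

    def is_empty(self) -> bool:
        # The fixed implementation: based solely on the number of boxes.
        return len(self) == 0


def test_is_empty_with_empty_tracker_id():
    # The exact shape from issue #2195: zero boxes, tracker_id=np.array([]).
    d = FakeDetections(xyxy=np.empty((0, 4)), tracker_id=np.array([]))
    assert d.is_empty()


def test_is_empty_with_all_empty_optional_fields():
    d = FakeDetections(
        xyxy=np.empty((0, 4)),
        tracker_id=np.array([]),
        confidence=np.array([]),
    )
    assert d.is_empty()


def test_not_empty_with_boxes():
    d = FakeDetections(xyxy=np.array([[0.0, 0.0, 10.0, 10.0]]))
    assert not d.is_empty()
```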
````diff
 Example:
 ```python
 import supervision as sv

-detections = sv.Detections()
+detections = sv.Detections(...)
````
The `__getitem__` docstring example uses `sv.Detections(...)`, which isn't runnable and is inconsistent with other docstring examples in this module that provide concrete NumPy inputs. Consider replacing it with a minimal valid construction (e.g., a small `xyxy` array) so the example can be executed as documentation.
````python
def from_lmm(
    cls, lmm: LMM | str, result: str | dict[str, Any], **kwargs: Any
) -> Detections:
    """
    !!! deprecated "Deprecated"
        `Detections.from_lmm` is **deprecated** and will be removed in
        `supervision-0.31.0`. Please use `Detections.from_vlm` instead.

    Creates a Detections object from the given result string based on the specified
    Large Multimodal Model (LMM).

    | Name              | Enum (sv.LMM)       | Tasks                   | Required parameters         | Optional parameters |
    |-------------------|---------------------|-------------------------|-----------------------------|---------------------|
    | PaliGemma         | `PALIGEMMA`         | detection               | `resolution_wh`             | `classes`           |
    | PaliGemma 2       | `PALIGEMMA`         | detection               | `resolution_wh`             | `classes`           |
    | Qwen2.5-VL        | `QWEN_2_5_VL`       | detection               | `resolution_wh`, `input_wh` | `classes`           |
    | Google Gemini 2.0 | `GOOGLE_GEMINI_2_0` | detection               | `resolution_wh`             | `classes`           |
    | Google Gemini 2.5 | `GOOGLE_GEMINI_2_5` | detection, segmentation | `resolution_wh`             | `classes`           |
    | Moondream         | `MOONDREAM`         | detection               | `resolution_wh`             |                     |
    | DeepSeek-VL2      | `DEEPSEEK_VL_2`     | detection               | `resolution_wh`             | `classes`           |

    Args:
        lmm: The type of LMM (Large Multimodal Model) to use.
        result: The result string containing the detection data.
        **kwargs: Additional keyword arguments required by the specified LMM.

    Returns:
        A new Detections object.

    Raises:
        ValueError: If the LMM is invalid, required arguments are missing, or
            disallowed arguments are provided.
        ValueError: If the specified LMM is not supported.

    !!! example "PaliGemma"
        ```python
        import supervision as sv

        paligemma_result = "<loc0256><loc0256><loc0768><loc0768> cat"
        detections = sv.Detections.from_lmm(
            sv.LMM.PALIGEMMA,
            paligemma_result,
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog']
        )
        detections.xyxy
        # array([[250., 250., 750., 750.]])

        detections.class_id
        # array([0])

        detections.data
        # {'class_name': array(['cat'], dtype='<U10')}
        ```

    !!! example "Qwen2.5-VL"

        ??? tip "Prompt engineering"

            To get the best results from Qwen2.5-VL, use clear and descriptive prompts
            that specify exactly what you want to detect.

            **For general object detection, use this comprehensive prompt:**

            ```
            Detect all objects in the image and return their locations and labels.
            ```

            **For specific object detection with detailed descriptions:**

            ```
            Detect the red object that is leading in this image and return its location and label.
            ```

            **For simple, targeted detection:**

            ```
            leading blue truck
            ```

            **Additional effective prompts:**

            ```
            Find all people and vehicles in this scene
            ```

            ```
            Locate all animals in the image
            ```

            ```
            Identify traffic signs and their positions
            ```

            **Tips for better results:**

            - Use descriptive language that clearly specifies what to look for
            - Include color, size, or position descriptors when targeting specific objects
            - Be specific about the type of objects you want to detect
            - The model responds well to both detailed instructions and concise phrases
            - Results are returned in JSON format with `bbox_2d` coordinates and `label` fields

        ```python
        import supervision as sv

        qwen_2_5_vl_result = \"\"\"```json
        [
            {"bbox_2d": [139, 768, 315, 954], "label": "cat"},
            {"bbox_2d": [366, 679, 536, 849], "label": "dog"}
        ]
        ```\"\"\"
        detections = sv.Detections.from_lmm(
            sv.LMM.QWEN_2_5_VL,
            qwen_2_5_vl_result,
            input_wh=(1000, 1000),
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog'],
        )
        detections.xyxy
        # array([[139., 768., 315., 954.], [366., 679., 536., 849.]])

        detections.class_id
        # array([0, 1])

        detections.data
        # {'class_name': array(['cat', 'dog'], dtype='<U10')}
        ```

    !!! example "Qwen3-VL"

        ```python
        import supervision as sv

        qwen_3_vl_result = \"\"\"```json
        [
            {"bbox_2d": [139, 768, 315, 954], "label": "cat"},
            {"bbox_2d": [366, 679, 536, 849], "label": "dog"}
        ]
        ```\"\"\"
        detections = sv.Detections.from_lmm(
            sv.LMM.QWEN_3_VL,
            qwen_3_vl_result,
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog'],
        )
        detections.xyxy
        # array([[139., 768., 315., 954.], [366., 679., 536., 849.]])

        detections.class_id
        # array([0, 1])

        detections.data
        # {'class_name': array(['cat', 'dog'], dtype='<U10')}
        ```

    !!! example "Gemini 2.0"

        ??? tip "Prompt engineering"

            From Gemini 2.0 onwards, models are further trained to detect objects in
            an image and get their bounding box coordinates. The coordinates,
            relative to image dimensions, scale to [0, 1000]. You need to convert
            these normalized coordinates back to pixel coordinates using your
            original image size.

            According to the Gemini API documentation on image prompts (see
            https://ai.google.dev/gemini-api/docs/vision#image-input), when using a
            single image with text, the recommended approach is to place the text
            prompt after the image part in the contents array. This ordering has
            been shown to produce significantly better results in practice.

            For example, when calling the Gemini API directly, you can structure
            the request like this, with the image part first and the text prompt
            second in the `parts` list:

            ```json
            {
                "model": "models/gemini-2.0-flash",
                "contents": [
                    {
                        "role": "user",
                        "parts": [
                            {
                                "inline_data": {
                                    "mime_type": "image/png",
                                    "data": "<BASE64_IMAGE_BYTES>"
                                }
                            },
                            {
                                "text": "Detect all the cats and dogs in the image..."
                            }
                        ]
                    }
                ]
            }
            ```

            To get the best results from Google Gemini 2.0, use the following prompt.

            ```
            Detect all the cats and dogs in the image. The box_2d should be
            [ymin, xmin, ymax, xmax] normalized to 0-1000.
            ```

        ```python
        import supervision as sv

        gemini_response_text = \"\"\"```json
        [
            {"box_2d": [543, 40, 728, 200], "label": "cat", "id": 1},
            {"box_2d": [653, 352, 820, 522], "label": "dog", "id": 2}
        ]
        ```\"\"\"

        detections = sv.Detections.from_lmm(
            sv.LMM.GOOGLE_GEMINI_2_0,
            gemini_response_text,
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog'],
        )

        detections.xyxy
        # array([[ 40., 543., 200., 728.], [352., 653., 522., 820.]])

        detections.data
        # {'class_name': array(['cat', 'dog'], dtype='<U26')}

        detections.class_id
        # array([0, 1])
        ```

    !!! example "Gemini 2.5"

        ??? tip "Prompt engineering"

            To get the best results from Google Gemini 2.5, use the following prompt.

            This prompt is designed to detect all visible objects in the image,
            including small, distant, or partially visible ones, and to return
            tight bounding boxes.

            According to the Gemini API documentation on image prompts, when using
            a single image with text, the recommended approach is to place the text
            prompt after the image part in the `contents` array. See the official
            Gemini vision docs for details:
            https://ai.google.dev/gemini-api/docs/vision#multi-part-input

            For example, using the `google-generativeai` client:

            ```python
            from google.generativeai import types

            response = model.generate_content(
                contents=[
                    types.Part.from_image(image_bytes),
                    "Carefully examine this image and detect ALL visible objects, including "
                    "small, distant, or partially visible ones.",
                ],
                generation_config=generation_config,
                safety_settings=safety_settings,
            )
            ```

            This ordering (image first, then text) has been shown to produce
            significantly better results in practice.

            ```
            Carefully examine this image and detect ALL visible objects, including
            small, distant, or partially visible ones.

            IMPORTANT: Focus on finding as many objects as possible, even if you are
            only moderately confident.

            Make sure each bounding box is as tight as possible.

            Valid object classes: {class_list}

            For each detected object, provide:
            - "label": the exact class name from the list above
            - "confidence": your certainty (between 0.0 and 1.0)
            - "box_2d": the bounding box [ymin, xmin, ymax, xmax] normalized to 0-1000
            - "mask": the binary mask of the object as a base64-encoded string

            Detect everything that matches the valid classes. Do not be
            conservative; include objects even with moderate confidence.

            Return a JSON array, for example:
            [
                {
                    "label": "person",
                    "confidence": 0.95,
                    "box_2d": [100, 200, 300, 400],
                    "mask": "..."
                },
                {
                    "label": "kite",
                    "confidence": 0.80,
                    "box_2d": [50, 150, 250, 350],
                    "mask": "..."
                }
            ]
            ```

            When using the google-genai library, it is recommended to set
            thinking_budget=0 in thinking_config for more direct and faster responses.

            ```python
            from google.generativeai import types

            model.generate_content(
                ...,
                generation_config=generation_config,
                safety_settings=safety_settings,
                thinking_config=types.ThinkingConfig(
                    thinking_budget=0
                )
            )
            ```

            For a shorter prompt focused only on segmentation masks, you can use:

            ```
            Return a JSON list of segmentation masks. Each entry should include the
            2D bounding box in the "box_2d" key, the segmentation mask in the "mask"
            key, and the text label in the "label" key. Use descriptive labels.
            ```

        ```python
        import supervision as sv

        gemini_response_text = \"\"\"```json
        [
            {"box_2d": [543, 40, 728, 200], "label": "cat", "id": 1},
            {"box_2d": [653, 352, 820, 522], "label": "dog", "id": 2}
        ]
        ```\"\"\"

        detections = sv.Detections.from_lmm(
            sv.LMM.GOOGLE_GEMINI_2_5,
            gemini_response_text,
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog'],
        )

        detections.xyxy
        # array([[ 40., 543., 200., 728.], [352., 653., 522., 820.]])

        detections.data
        # {'class_name': array(['cat', 'dog'], dtype='<U26')}

        detections.class_id
        # array([0, 1])
        ```

    !!! example "Moondream"

        ??? tip "Prompt engineering"

            To get the best results from Moondream, use optimized prompts that leverage
            its object detection capabilities effectively.

            **For general object detection, use this simple prompt:**

            ```
            objects
            ```

            This single-word prompt instructs Moondream to detect all visible objects
            and return them in the proper JSON format with normalized coordinates.

        ```python
        import supervision as sv

        moondream_result = {
            'objects': [
                {
                    'x_min': 0.5704046934843063,
                    'y_min': 0.20069346576929092,
                    'x_max': 0.7049859315156937,
                    'y_max': 0.3012596592307091
                },
                {
                    'x_min': 0.6210969910025597,
                    'y_min': 0.3300672620534897,
                    'x_max': 0.8417936339974403,
                    'y_max': 0.4961046129465103
                }
            ]
        }

        detections = sv.Detections.from_lmm(
            sv.LMM.MOONDREAM,
            moondream_result,
            resolution_wh=(3072, 4080),
        )

        detections.xyxy
        # array([[1752.28,  818.82, 2165.72, 1229.14],
        #        [1908.01, 1346.67, 2585.99, 2024.11]])
        ```

    !!! example "DeepSeek-VL2"

        ??? tip "Prompt engineering"

            To get the best results from DeepSeek-VL2, use optimized prompts that leverage
            its object detection and visual grounding capabilities effectively.

            **For general object detection, use the following user prompt:**

            ```
            <image>\\n<|ref|>The giraffe at the front<|/ref|>
            ```

            **For visual grounding, use the following user prompt:**

            ```
            <image>\\n<|grounding|>Detect the giraffes
            ```

        ```python
        from PIL import Image
        import supervision as sv

        deepseek_vl2_result = "<|ref|>The giraffe at the back<|/ref|><|det|>[[580, 270, 999, 904]]<|/det|><|ref|>The giraffe at the front<|/ref|><|det|>[[26, 31, 632, 998]]<|/det|><|end▁of▁sentence|>"

        detections = sv.Detections.from_vlm(
            vlm=sv.VLM.DEEPSEEK_VL_2, result=deepseek_vl2_result, resolution_wh=image.size
        )

        detections.xyxy
        # array([[ 420,  293,  724,  982],
        #        [  18,   33,  458, 1084]])

        detections.class_id
        # array([0, 1])

        detections.data
        # {'class_name': array(['The giraffe at the back', 'The giraffe at the front'], dtype='<U24')}
        ```
    """  # noqa: E501
````
This PR removes large docstring example blocks from from_lmm/from_vlm, but the PR description only discusses the is_empty() behavior change. If the documentation removal is intentional, it should be mentioned in the PR description (or split into a separate docs-focused PR) to avoid surprising downstream docs consumers.
Problem
`sv.Detections.is_empty()` returned `False` when `tracker_id` was set to an empty array `np.array([])` instead of `None`. This happened because the previous implementation compared `self == Detections.empty()`, and `Detections.empty()` sets `tracker_id=None`, so the equality check failed for any instance where `tracker_id=[]`.

Minimal repro (before fix):
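The repro code block did not survive formatting here. The following self-contained sketch illustrates the same failure mode using a plain dict plus a hypothetical `fields_equal` helper as a stand-in for `Detections.__eq__`, rather than the real supervision classes:

```python
import numpy as np


def fields_equal(a, b):
    # Stand-in for the spirit of Detections.__eq__: every field must match,
    # and None vs. an empty array counts as a mismatch.
    if (a is None) != (b is None):
        return False
    if a is None:
        return True
    return np.array_equal(a, b)


# An "empty after filtering" instance: zero boxes, tracker_id=np.array([]).
instance = {"xyxy": np.empty((0, 4)), "tracker_id": np.array([])}
# What Detections.empty() produces: tracker_id=None.
template = {"xyxy": np.empty((0, 4)), "tracker_id": None}

# Old check: equality against the empty template -> False (the bug).
old_is_empty = all(fields_equal(instance[k], template[k]) for k in instance)
print(old_is_empty)  # False

# New check: based solely on the number of bounding boxes -> True.
new_is_empty = len(instance["xyxy"]) == 0
print(new_is_empty)  # True
```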
Fix
Replaced the equality-based check with a direct length check: `return len(self) == 0`.
This is robust to any optional field (`tracker_id`, `confidence`, `class_id`, etc.) being an empty array rather than `None`, since `__len__` is based solely on the number of bounding boxes (`len(self.xyxy)`).

Fixes #2195