fix: is_empty() returns False for empty tracker arrays (issue #2195) #2203

Zeesejo wants to merge 1 commit into roboflow:develop from
Conversation
Previously, `is_empty()` used an equality comparison against `Detections.empty()`, which sets `tracker_id=None`. When `tracker_id` was `np.array([])` instead of `None` (e.g., after filtering a `Detections` object that had a `tracker_id`), the `__eq__` check failed even though the detection set was genuinely empty. Fix: check `len(self) == 0` directly, preserving data/metadata neutrality. Fixes roboflow#2195
Pull request overview

Fixes `sv.Detections.is_empty()` returning `False` for empty detections when optional fields (e.g. `tracker_id`) are present as zero-length arrays instead of `None`.

Changes:
- Reimplemented `Detections.is_empty()` to return `len(self) == 0` (based on `xyxy` length only).
- Expanded the `is_empty()` docstring to clarify the new behavior.
- Removed substantial docstring example blocks from `from_lmm()`/`from_vlm()` and adjusted the `__getitem__` docstring example.
```diff
 def is_empty(self) -> bool:
     """
-    Returns `True` if the `Detections` object is considered empty.
+    Returns `True` if the `Detections` object is considered empty,
+    i.e. contains no detections. This check is based solely on the
+    number of bounding boxes, making it robust to optional fields
+    (such as `tracker_id`) being empty arrays rather than `None`.
     """
-    empty_detections = Detections.empty()
-    empty_detections.data = self.data
-    empty_detections.metadata = self.metadata
-    return bool(self == empty_detections)
+    return len(self) == 0
```
Add regression tests for the updated `Detections.is_empty()` behavior (e.g., `xyxy` empty with `tracker_id=np.array([])` and/or other optional fields as empty arrays) to ensure the original issue (#2195) is covered and doesn't regress.
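A runnable sketch of the requested regression cases. `FakeDetections` is a hypothetical stand-in that mirrors only the relevant fields and the fixed length-based `is_empty()` logic, so the sketch runs without supervision installed; in the real test suite these cases would construct `sv.Detections` directly.

```python
from __future__ import annotations

from dataclasses import dataclass
import numpy as np


# Hypothetical stand-in for sv.Detections, for illustration only.
@dataclass
class FakeDetections:
    xyxy: np.ndarray
    tracker_id: np.ndarray | None = None
    confidence: np.ndarray | None = None

    def __len__(self) -> int:
        return len(self.xyxy)

    def is_empty(self) -> bool:
        # The fixed implementation: based solely on the number of boxes.
        return len(self) == 0


def test_is_empty_with_empty_tracker_id():
    # The exact shape from issue #2195: zero boxes, tracker_id=np.array([]).
    d = FakeDetections(xyxy=np.empty((0, 4)), tracker_id=np.array([]))
    assert d.is_empty()


def test_is_empty_with_all_empty_optional_fields():
    d = FakeDetections(
        xyxy=np.empty((0, 4)),
        tracker_id=np.array([]),
        confidence=np.array([]),
    )
    assert d.is_empty()


def test_not_empty_with_boxes():
    d = FakeDetections(xyxy=np.array([[0.0, 0.0, 10.0, 10.0]]))
    assert not d.is_empty()
```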
````diff
 Example:
 ```python
 import supervision as sv

-detections = sv.Detections()
+detections = sv.Detections(...)
````
The `__getitem__` docstring example uses `sv.Detections(...)`, which isn't runnable and is inconsistent with other docstring examples in this module that provide concrete NumPy inputs. Consider replacing it with a minimal valid construction (e.g., a small `xyxy` array) so the example can be executed as documentation.
````python
def from_lmm(
    cls, lmm: LMM | str, result: str | dict[str, Any], **kwargs: Any
) -> Detections:
    """
    !!! deprecated "Deprecated"
        `Detections.from_lmm` is **deprecated** and will be removed in
        `supervision-0.31.0`. Please use `Detections.from_vlm` instead.

    Creates a Detections object from the given result string based on the specified
    Large Multimodal Model (LMM).

    | Name              | Enum (sv.LMM)       | Tasks                   | Required parameters         | Optional parameters |
    |-------------------|---------------------|-------------------------|-----------------------------|---------------------|
    | PaliGemma         | `PALIGEMMA`         | detection               | `resolution_wh`             | `classes`           |
    | PaliGemma 2       | `PALIGEMMA`         | detection               | `resolution_wh`             | `classes`           |
    | Qwen2.5-VL        | `QWEN_2_5_VL`       | detection               | `resolution_wh`, `input_wh` | `classes`           |
    | Google Gemini 2.0 | `GOOGLE_GEMINI_2_0` | detection               | `resolution_wh`             | `classes`           |
    | Google Gemini 2.5 | `GOOGLE_GEMINI_2_5` | detection, segmentation | `resolution_wh`             | `classes`           |
    | Moondream         | `MOONDREAM`         | detection               | `resolution_wh`             |                     |
    | DeepSeek-VL2      | `DEEPSEEK_VL_2`     | detection               | `resolution_wh`             | `classes`           |

    Args:
        lmm: The type of LMM (Large Multimodal Model) to use.
        result: The result string containing the detection data.
        **kwargs: Additional keyword arguments required by the specified LMM.

    Returns:
        A new Detections object.

    Raises:
        ValueError: If the LMM is invalid, required arguments are missing, or
            disallowed arguments are provided.
        ValueError: If the specified LMM is not supported.

    !!! example "PaliGemma"
        ```python
        import supervision as sv

        paligemma_result = "<loc0256><loc0256><loc0768><loc0768> cat"
        detections = sv.Detections.from_lmm(
            sv.LMM.PALIGEMMA,
            paligemma_result,
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog']
        )
        detections.xyxy
        # array([[250., 250., 750., 750.]])

        detections.class_id
        # array([0])

        detections.data
        # {'class_name': array(['cat'], dtype='<U10')}
        ```

    !!! example "Qwen2.5-VL"

        ??? tip "Prompt engineering"

            To get the best results from Qwen2.5-VL, use clear and descriptive prompts
            that specify exactly what you want to detect.

            **For general object detection, use this comprehensive prompt:**

            ```
            Detect all objects in the image and return their locations and labels.
            ```

            **For specific object detection with detailed descriptions:**

            ```
            Detect the red object that is leading in this image and return its location and label.
            ```

            **For simple, targeted detection:**

            ```
            leading blue truck
            ```

            **Additional effective prompts:**

            ```
            Find all people and vehicles in this scene
            ```

            ```
            Locate all animals in the image
            ```

            ```
            Identify traffic signs and their positions
            ```

            **Tips for better results:**

            - Use descriptive language that clearly specifies what to look for
            - Include color, size, or position descriptors when targeting specific objects
            - Be specific about the type of objects you want to detect
            - The model responds well to both detailed instructions and concise phrases
            - Results are returned in JSON format with `bbox_2d` coordinates and `label` fields

        ```python
        import supervision as sv

        qwen_2_5_vl_result = \"\"\"```json
        [
            {"bbox_2d": [139, 768, 315, 954], "label": "cat"},
            {"bbox_2d": [366, 679, 536, 849], "label": "dog"}
        ]
        ```\"\"\"
        detections = sv.Detections.from_lmm(
            sv.LMM.QWEN_2_5_VL,
            qwen_2_5_vl_result,
            input_wh=(1000, 1000),
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog'],
        )
        detections.xyxy
        # array([[139., 768., 315., 954.], [366., 679., 536., 849.]])

        detections.class_id
        # array([0, 1])

        detections.data
        # {'class_name': array(['cat', 'dog'], dtype='<U10')}
        ```

    !!! example "Qwen3-VL"

        ```python
        import supervision as sv

        qwen_3_vl_result = \"\"\"```json
        [
            {"bbox_2d": [139, 768, 315, 954], "label": "cat"},
            {"bbox_2d": [366, 679, 536, 849], "label": "dog"}
        ]
        ```\"\"\"
        detections = sv.Detections.from_lmm(
            sv.LMM.QWEN_3_VL,
            qwen_3_vl_result,
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog'],
        )
        detections.xyxy
        # array([[139., 768., 315., 954.], [366., 679., 536., 849.]])

        detections.class_id
        # array([0, 1])

        detections.data
        # {'class_name': array(['cat', 'dog'], dtype='<U10')}
        ```

    !!! example "Gemini 2.0"

        ??? tip "Prompt engineering"

            From Gemini 2.0 onwards, models are further trained to detect objects in
            an image and get their bounding box coordinates. The coordinates,
            relative to image dimensions, scale to [0, 1000]. You need to convert
            these normalized coordinates back to pixel coordinates using your
            original image size.

            According to the Gemini API documentation on image prompts (see
            https://ai.google.dev/gemini-api/docs/vision#image-input), when using a
            single image with text, the recommended approach is to place the text
            prompt after the image part in the contents array. This ordering has
            been shown to produce significantly better results in practice.

            For example, when calling the Gemini API directly, you can structure
            the request like this, with the image part first and the text prompt
            second in the `parts` list:

            ```json
            {
                "model": "models/gemini-2.0-flash",
                "contents": [
                    {
                        "role": "user",
                        "parts": [
                            {
                                "inline_data": {
                                    "mime_type": "image/png",
                                    "data": "<BASE64_IMAGE_BYTES>"
                                }
                            },
                            {
                                "text": "Detect all the cats and dogs in the image..."
                            }
                        ]
                    }
                ]
            }
            ```

            To get the best results from Google Gemini 2.0, use the following prompt.

            ```
            Detect all the cats and dogs in the image. The box_2d should be
            [ymin, xmin, ymax, xmax] normalized to 0-1000.
            ```

        ```python
        import supervision as sv

        gemini_response_text = \"\"\"```json
        [
            {"box_2d": [543, 40, 728, 200], "label": "cat", "id": 1},
            {"box_2d": [653, 352, 820, 522], "label": "dog", "id": 2}
        ]
        ```\"\"\"

        detections = sv.Detections.from_lmm(
            sv.LMM.GOOGLE_GEMINI_2_0,
            gemini_response_text,
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog'],
        )

        detections.xyxy
        # array([[ 40., 543., 200., 728.], [352., 653., 522., 820.]])

        detections.data
        # {'class_name': array(['cat', 'dog'], dtype='<U26')}

        detections.class_id
        # array([0, 1])
        ```

    !!! example "Gemini 2.5"

        ??? tip "Prompt engineering"

            To get the best results from Google Gemini 2.5, use the following prompt.

            This prompt is designed to detect all visible objects in the image,
            including small, distant, or partially visible ones, and to return
            tight bounding boxes.

            According to the Gemini API documentation on image prompts, when using
            a single image with text, the recommended approach is to place the text
            prompt after the image part in the `contents` array. See the official
            Gemini vision docs for details:
            https://ai.google.dev/gemini-api/docs/vision#multi-part-input

            For example, using the `google-generativeai` client:

            ```python
            from google.generativeai import types

            response = model.generate_content(
                contents=[
                    types.Part.from_image(image_bytes),
                    "Carefully examine this image and detect ALL visible objects, including "
                    "small, distant, or partially visible ones.",
                ],
                generation_config=generation_config,
                safety_settings=safety_settings,
            )
            ```

            This ordering (image first, then text) has been shown to produce
            significantly better results in practice.

            ```
            Carefully examine this image and detect ALL visible objects, including
            small, distant, or partially visible ones.

            IMPORTANT: Focus on finding as many objects as possible, even if you are
            only moderately confident.

            Make sure each bounding box is as tight as possible.

            Valid object classes: {class_list}

            For each detected object, provide:
            - "label": the exact class name from the list above
            - "confidence": your certainty (between 0.0 and 1.0)
            - "box_2d": the bounding box [ymin, xmin, ymax, xmax] normalized to 0-1000
            - "mask": the binary mask of the object as a base64-encoded string

            Detect everything that matches the valid classes. Do not be
            conservative; include objects even with moderate confidence.

            Return a JSON array, for example:
            [
                {
                    "label": "person",
                    "confidence": 0.95,
                    "box_2d": [100, 200, 300, 400],
                    "mask": "..."
                },
                {
                    "label": "kite",
                    "confidence": 0.80,
                    "box_2d": [50, 150, 250, 350],
                    "mask": "..."
                }
            ]
            ```

            When using the google-genai library, it is recommended to set
            thinking_budget=0 in thinking_config for more direct and faster responses.

            ```python
            from google.generativeai import types

            model.generate_content(
                ...,
                generation_config=generation_config,
                safety_settings=safety_settings,
                thinking_config=types.ThinkingConfig(
                    thinking_budget=0
                )
            )
            ```

            For a shorter prompt focused only on segmentation masks, you can use:

            ```
            Return a JSON list of segmentation masks. Each entry should include the
            2D bounding box in the "box_2d" key, the segmentation mask in the "mask"
            key, and the text label in the "label" key. Use descriptive labels.
            ```

        ```python
        import supervision as sv

        gemini_response_text = \"\"\"```json
        [
            {"box_2d": [543, 40, 728, 200], "label": "cat", "id": 1},
            {"box_2d": [653, 352, 820, 522], "label": "dog", "id": 2}
        ]
        ```\"\"\"

        detections = sv.Detections.from_lmm(
            sv.LMM.GOOGLE_GEMINI_2_5,
            gemini_response_text,
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog'],
        )

        detections.xyxy
        # array([[ 40., 543., 200., 728.], [352., 653., 522., 820.]])

        detections.data
        # {'class_name': array(['cat', 'dog'], dtype='<U26')}

        detections.class_id
        # array([0, 1])
        ```

    !!! example "Moondream"

        ??? tip "Prompt engineering"

            To get the best results from Moondream, use optimized prompts that leverage
            its object detection capabilities effectively.

            **For general object detection, use this simple prompt:**

            ```
            objects
            ```

            This single-word prompt instructs Moondream to detect all visible objects
            and return them in the proper JSON format with normalized coordinates.

        ```python
        import supervision as sv

        moondream_result = {
            'objects': [
                {
                    'x_min': 0.5704046934843063,
                    'y_min': 0.20069346576929092,
                    'x_max': 0.7049859315156937,
                    'y_max': 0.3012596592307091
                },
                {
                    'x_min': 0.6210969910025597,
                    'y_min': 0.3300672620534897,
                    'x_max': 0.8417936339974403,
                    'y_max': 0.4961046129465103
                }
            ]
        }

        detections = sv.Detections.from_lmm(
            sv.LMM.MOONDREAM,
            moondream_result,
            resolution_wh=(3072, 4080),
        )

        detections.xyxy
        # array([[1752.28,  818.82, 2165.72, 1229.14],
        #        [1908.01, 1346.67, 2585.99, 2024.11]])
        ```

    !!! example "DeepSeek-VL2"

        ??? tip "Prompt engineering"

            To get the best results from DeepSeek-VL2, use optimized prompts that leverage
            its object detection and visual grounding capabilities effectively.

            **For general object detection, use the following user prompt:**

            ```
            <image>\\n<|ref|>The giraffe at the front<|/ref|>
            ```

            **For visual grounding, use the following user prompt:**

            ```
            <image>\\n<|grounding|>Detect the giraffes
            ```

        ```python
        from PIL import Image
        import supervision as sv

        deepseek_vl2_result = "<|ref|>The giraffe at the back<|/ref|><|det|>[[580, 270, 999, 904]]<|/det|><|ref|>The giraffe at the front<|/ref|><|det|>[[26, 31, 632, 998]]<|/det|><|end▁of▁sentence|>"

        detections = sv.Detections.from_vlm(
            vlm=sv.VLM.DEEPSEEK_VL_2, result=deepseek_vl2_result, resolution_wh=image.size
        )

        detections.xyxy
        # array([[ 420,  293,  724,  982],
        #        [  18,   33,  458, 1084]])

        detections.class_id
        # array([0, 1])

        detections.data
        # {'class_name': array(['The giraffe at the back', 'The giraffe at the front'], dtype='<U24')}
        ```
    """  # noqa: E501
````
This PR removes large docstring example blocks from from_lmm/from_vlm, but the PR description only discusses the is_empty() behavior change. If the documentation removal is intentional, it should be mentioned in the PR description (or split into a separate docs-focused PR) to avoid surprising downstream docs consumers.
Problem
`sv.Detections.is_empty()` returned `False` when `tracker_id` was set to an empty array `np.array([])` instead of `None`. This happened because the previous implementation compared `self == Detections.empty()`, and `Detections.empty()` sets `tracker_id=None`, so the equality check failed for any instance where `tracker_id=[]`.

Minimal repro (before fix):
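The repro code block did not survive formatting here. The following self-contained sketch illustrates the same failure mode using a plain dict plus a hypothetical `fields_equal` helper as a stand-in for `Detections.__eq__`, rather than the real supervision classes:

```python
import numpy as np


def fields_equal(a, b):
    # Stand-in for the spirit of Detections.__eq__: every field must match,
    # and None vs. an empty array counts as a mismatch.
    if (a is None) != (b is None):
        return False
    if a is None:
        return True
    return np.array_equal(a, b)


# An "empty after filtering" instance: zero boxes, tracker_id=np.array([]).
instance = {"xyxy": np.empty((0, 4)), "tracker_id": np.array([])}
# What Detections.empty() produces: tracker_id=None.
template = {"xyxy": np.empty((0, 4)), "tracker_id": None}

# Old check: equality against the empty template -> False (the bug).
old_is_empty = all(fields_equal(instance[k], template[k]) for k in instance)
print(old_is_empty)  # False

# New check: based solely on the number of bounding boxes -> True.
new_is_empty = len(instance["xyxy"]) == 0
print(new_is_empty)  # True
```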
Fix
Replaced the equality-based check with a direct length check: `return len(self) == 0`.
This is robust to any optional field (`tracker_id`, `confidence`, `class_id`, etc.) being an empty array rather than `None`, since `__len__` is based solely on the number of bounding boxes (`len(self.xyxy)`).

Fixes #2195