IndexError in model.inference when input contains empty strings #316

@tannonk

Description

When using knowledgator/gliner-x-base, inference fails with an IndexError on inputs containing an empty string.

Reproduction Steps

The following script demonstrates the problem:

from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-x-base")

# The presence of the empty string in this list triggers the error
texts = ["Email CEO to approve budget", ""]
labels = ["person", "organization", "action"]

print("Running inference...")
predictions = model.inference(texts, labels, batch_size=16)
print(f"Results: {predictions}")

Traceback

Traceback (most recent call last):
  File "issue_repro.py", line 10, in <module>
    predictions = model.inference(texts, labels, batch_size=16)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../gliner/model.py", line 1290, in inference
    start_text_idx = start_token_idx_to_text_idx[start_token_idx]
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range

Expected Behavior

The model should handle empty strings gracefully by returning an empty list of entities at that index, e.g.:
[[{'start': 6, 'end': 9, 'text': 'CEO', 'label': 'person'}], []], as other GLiNER models do during standard inference.

Environment

  • GLiNER: v0.2.24
  • flash_attn: v2.7.4.post1+25.11
  • Model: knowledgator/gliner-x-base

Workaround

A quick fix is to skip output post-processing for empty-string inputs by modifying this section of model.py as follows:

all_entities = []
for i, output in enumerate(outputs):
    if not tokens[i]:  # FIX: empty-input case for models like knowledgator/gliner-x-base
        all_entities.append([])
        continue
    start_token_idx_to_text_idx = all_start_token_idx_to_text_idx[i]
    end_token_idx_to_text_idx = all_end_token_idx_to_text_idx[i]
    entities = []

But it would be better to handle this in the forward pass, so the model never produces ghost predictions on empty strings in the first place. Perhaps a single fix could address this and also resolve Issue #315?
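In the meantime, a caller-side workaround that doesn't touch the library code is to filter out empty inputs before calling inference and splice empty results back in afterwards. This is just a sketch: `inference_skip_empty` is a hypothetical helper, and `infer_fn` stands in for the real `model.inference` call.

```python
def inference_skip_empty(infer_fn, texts, labels, **kwargs):
    """Run infer_fn only on non-empty texts and return [] for the rest.

    infer_fn: any callable with model.inference's signature
    (this wrapper is hypothetical, not part of the GLiNER API).
    Empty and whitespace-only strings are both skipped.
    """
    # Keep (original index, text) pairs for inputs that have content
    keep = [(i, t) for i, t in enumerate(texts) if t.strip()]
    # Pre-fill the output with an empty entity list per input
    results = [[] for _ in texts]
    if keep:
        indices, subset = zip(*keep)
        preds = infer_fn(list(subset), labels, **kwargs)
        # Splice predictions back into their original positions
        for i, pred in zip(indices, preds):
            results[i] = pred
    return results
```

Called as `inference_skip_empty(model.inference, texts, labels, batch_size=16)`, this keeps the output aligned one-to-one with the input list.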
