
Fix server XTC: accept int params and flatten special_tokens list#1301

Open: realyxl wants to merge 1 commit into ml-explore:main from realyxl:fix/server-xtc-int-and-special-tokens



realyxl commented May 22, 2026

mlx_lm.server rejects integer XTC parameters and crashes whenever xtc_probability > 0. This PR fixes two unrelated server-side bugs, both in server.py.

1. Strict float rejects integer 0 / 1

validate_model_parameters() validates xtc_probability and xtc_threshold against float only, while every other sampling parameter in the same block (temperature, top_p, min_p, repetition_penalty, presence_penalty, frequency_penalty) already accepts (float, int). JSON clients such as SillyTavern that send 0 or 1 as integers therefore get a 4xx error.

Downstream, apply_xtc and make_sampler perform no isinstance checks; the values are only used in comparisons and MLX tensor ops, which treat int and float identically.

-        self._validate("xtc_probability", float, min_val=0, max_val=1)
-        self._validate("xtc_threshold", float, min_val=0, max_val=1)
+        self._validate("xtc_probability", (float, int), min_val=0, max_val=1)
+        self._validate("xtc_threshold", (float, int), min_val=0, max_val=1)
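To make the failure concrete, here is a minimal sketch of a type-then-range validator. validate_param is a hypothetical stand-in for the server's _validate helper, not its actual code; it only assumes that _validate does an isinstance check against the given type(s) followed by a bounds check. Python's json module decodes 1 and 0 as int, so a float-only check rejects them while (float, int) accepts them.

```python
import json
from typing import Any, Optional, Tuple, Type, Union

TypeSpec = Union[Type, Tuple[Type, ...]]


def validate_param(
    name: str,
    value: Any,
    types: TypeSpec,
    min_val: Optional[float] = None,
    max_val: Optional[float] = None,
) -> None:
    """Hypothetical stand-in for _validate: type check first, then range check."""
    if not isinstance(value, types):
        raise ValueError(f"{name} must be of type {types}, got {type(value).__name__}")
    if min_val is not None and value < min_val:
        raise ValueError(f"{name} must be >= {min_val}")
    if max_val is not None and value > max_val:
        raise ValueError(f"{name} must be <= {max_val}")


# json.loads keeps 1 and 0 as int, exactly as the client sent them.
body = json.loads('{"xtc_probability": 1, "xtc_threshold": 0}')

for key in ("xtc_probability", "xtc_threshold"):
    try:
        # Old behaviour: float-only check rejects the integer.
        validate_param(key, body[key], float, min_val=0, max_val=1)
    except ValueError as err:
        print("before:", err)
    # New behaviour: (float, int) accepts it; the range check still applies.
    validate_param(key, body[key], (float, int), min_val=0, max_val=1)
    print("after:", key, "accepted")
```

One side effect worth knowing: bool subclasses int in Python, so a (float, int) check also lets JSON true/false through. The other sampling parameters in the same block already behave this way.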

2. Nested xtc_special_tokens crashes apply_xtc

_make_sampler builds xtc_special_tokens as [int, list[int]] because tokenizer.encode("\n") returns a list. That ragged list cannot be turned into a flat index array, so with xtc_probability > 0, the assignment mask[..., xtc_special_tokens] = False in apply_xtc raises ValueError: Initialization encountered extra dimension (#1257).
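The role of xtc_special_tokens is easiest to see in a pure-Python sketch of the XTC step. This is an illustration under assumptions, not mlx_lm's apply_xtc: xtc_exclusion_mask and apply_xtc_sketch are hypothetical names, they work on probabilities rather than logits, and boundary handling (> vs >=) may differ from the real code. It assumes XTC's usual definition: among tokens above the threshold, all but the least probable are excluded, and special tokens are never excluded. Each special token is used directly as an index, which is why every entry must be a plain int.

```python
import random
from typing import List, Sequence


def xtc_exclusion_mask(
    probs: Sequence[float],
    threshold: float,
    special_tokens: Sequence[int],
) -> List[bool]:
    """Mark tokens XTC would remove: every token above `threshold`
    except the least probable one, never touching special tokens."""
    candidates = [p for p in probs if p > threshold]
    # With no candidates the cutoff is +inf, so nothing is excluded.
    cutoff = min(candidates) if candidates else float("inf")
    mask = [p > cutoff for p in probs]
    for t in special_tokens:
        # Each entry is used as an index, so it must be a plain int.
        # A nested entry like [13] raises TypeError on a Python list;
        # MLX raises the ValueError reported in #1257 instead.
        mask[t] = False
    return mask


def apply_xtc_sketch(
    probs: Sequence[float],
    xtc_probability: float,
    xtc_threshold: float,
    special_tokens: Sequence[int],
    rng: random.Random,
) -> List[float]:
    """Apply the XTC exclusion to a probability vector and renormalize."""
    # rng.random() is in [0, 1): probability 0 never triggers and 1 always
    # does, whether the caller passed an int or a float.
    if rng.random() >= xtc_probability:
        return list(probs)
    mask = xtc_exclusion_mask(probs, xtc_threshold, special_tokens)
    kept = [0.0 if excluded else p for p, excluded in zip(probs, mask)]
    total = sum(kept)
    return [p / total for p in kept]


probs = [0.5, 0.3, 0.15, 0.05]
print(xtc_exclusion_mask(probs, 0.1, [0]))  # [False, True, False, False]
```

Token 0 would be excluded as a top choice, but listing it as a special token keeps it available, which is why EOS and newline ids are passed in.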

generate.py:2070 and chat.py:155-157 already use the correct flat construction; this aligns server.py with them.

-        xtc_special_tokens=[
-            tokenizer.eos_token_id,
-            tokenizer.encode("\n"),
-        ],
+        xtc_special_tokens=tokenizer.encode("\n") + list(tokenizer.eos_token_ids),

This also picks up additional EOS tokens on multi-EOS tokenizers (Gemma, Llama 3), which the singular eos_token_id missed.
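The before/after shapes of the list can be checked without loading a model. FakeTokenizer below is a hypothetical stub with made-up token ids; it only assumes what the PR relies on: encode returns a list even for a single token, and eos_token_ids is a collection (modelled here as a set) that can hold more than one id.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FakeTokenizer:
    """Hypothetical tokenizer stub; all ids are made up for illustration."""
    vocab: Dict[str, List[int]] = field(default_factory=lambda: {"\n": [13]})
    eos_token_id: int = 2
    # Assumed multi-EOS setup: a primary EOS plus an end-of-turn token.
    eos_token_ids: Set[int] = field(default_factory=lambda: {2, 7})

    def encode(self, text: str) -> List[int]:
        # Like real tokenizers, encode returns a list even for one token.
        return list(self.vocab[text])


def old_special_tokens(tok: FakeTokenizer) -> list:
    # Pre-fix server.py shape: [int, list[int]], and only one EOS id.
    return [tok.eos_token_id, tok.encode("\n")]


def new_special_tokens(tok: FakeTokenizer) -> List[int]:
    # Post-fix shape: one flat list of ints covering every EOS id.
    return tok.encode("\n") + list(tok.eos_token_ids)


tok = FakeTokenizer()
print(old_special_tokens(tok))          # [2, [13]]
print(sorted(new_special_tokens(tok)))  # [2, 7, 13]
```

The output is sorted only because set iteration order is not something to rely on; the sampler does not care about the order of the ids.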

Related

Fixes #1257.


Merging this pull request may close: XTC sampling crashes on mlx-community/gemma-4-e4b-it-4bit ValueError: Initialization encountered extra dimension