Add logits processor arguments to mlx_lm.generate #1273

Open
realyxl wants to merge 1 commit into ml-explore:main from realyxl:cli-logits-processors

realyxl commented May 13, 2026

Background

mlx_lm.generate does not currently expose seven logits-processor
parameters that are available in the Python API: logit_bias,
{repetition,presence,frequency}_penalty, and their three _context_size
companions.

These parameters are accepted by
mlx_lm.sample_utils.make_logits_processors(), which produces the list that
generate() / stream_generate() already accept via the
logits_processors keyword argument. mlx_lm.server already exposes the
same seven parameters over HTTP.
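
To make the processor contract concrete, here is a minimal sketch in plain Python of what a repetition penalty with a context window does. It is not the mlx_lm implementation: the real processors operate on mx.array values, and the names sketch_repetition_penalty and apply_processors are hypothetical. It assumes that a processor is a callable taking the token history and the current logits and returning adjusted logits, and that the penalty divides positive logits and multiplies negative ones for recently seen tokens.

```python
from typing import Callable, List, Sequence

# Hypothetical alias for illustration; mlx_lm passes mx.array values instead.
LogitsProcessor = Callable[[Sequence[int], List[float]], List[float]]


def sketch_repetition_penalty(penalty: float, context_size: int = 20) -> LogitsProcessor:
    """Return a processor that penalizes tokens seen in the last `context_size` tokens."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")

    def processor(tokens: Sequence[int], logits: List[float]) -> List[float]:
        out = list(logits)  # copy so the caller's logits are not mutated
        recent = tokens[-context_size:] if context_size > 0 else []
        for tok in set(recent):
            value = out[tok]
            # Lower the score either way: shrink positive logits,
            # push negative logits further below zero.
            out[tok] = value * penalty if value < 0 else value / penalty
        return out

    return processor


def apply_processors(
    tokens: Sequence[int], logits: List[float], processors: List[LogitsProcessor]
) -> List[float]:
    """Apply each processor in order, as a generation loop might before sampling."""
    for proc in processors:
        logits = proc(tokens, logits)
    return logits
```

With a penalty of 2.0 and a context size of 2, only the two most recent tokens are affected; tokens that fell out of the window keep their original logits.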

Goal

Add the same seven parameters to the mlx_lm.generate CLI.

Motivation

  1. Common need in terminal testing. mlx_lm.generate is convenient for
    quickly testing local checkpoints, LoRA adapters, and quantization levels
    from the terminal. These parameters are routinely useful, and sometimes
    necessary, for suppressing the repetition loops that small or
    low-bit-quantized local models often fall into.

  2. Consistency.

    • CLI ↔ Python API: generate() already supports
      logits_processors, and mlx_lm.sample_utils.make_logits_processors is
      the standard factory; the mlx_lm.generate CLI does not yet expose the
      factory's parameters.
    • Sampler ↔ logits-processor: mlx_lm.generate already exposes the
      parallel make_sampler parameters (--temp, --top-p, --top-k,
      --min-p, --xtc-*), but has no equivalent flags for the
      logits-processor side.

  3. Minimal and additive. No algorithmic changes. This only wires the
    existing factory into argparse.

  4. Why only generate, not chat. mlx_lm.chat may be intentionally
    minimal; it does not even expose --top-k. This PR keeps that scope
    unchanged and focuses on mlx_lm.generate, the primary terminal generation
    tool and the one where these knobs are most useful.

Implementation

A single file changed (mlx_lm/generate.py).

  • Import make_logits_processors.
  • Seven DEFAULT_* constants. None for logit_bias, 0.0 for the three
    penalties, 20 for the three context sizes. This matches the
    0.0 == disabled convention used by the sampler defaults in this file and
    by mlx_lm/server.py.
  • A small parse_logit_bias argparse type that accepts a JSON object and
    converts keys to int / values to float.
  • Seven add_argument calls in setup_arg_parser, ordered to mirror the
    make_logits_processors signature.
  • In main(), build logits_processors via make_logits_processors(...)
    (kwarg form) and pass it to generate(...) as the existing
    logits_processors= kwarg.
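
The wiring described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in the PR: the flag spellings (--logit-bias, --repetition-penalty, and so on), the constant names, and the helpers build_parser and processor_kwargs are hypothetical, and the keyword names are inferred from the parameter list in the Background section.

```python
import argparse
import json
from typing import Dict

# Defaults as described in the PR text; the constant names are assumptions.
DEFAULT_LOGIT_BIAS = None
DEFAULT_PENALTY = 0.0        # 0.0 is treated as "disabled"
DEFAULT_CONTEXT_SIZE = 20


def parse_logit_bias(text: str) -> Dict[int, float]:
    """argparse type: parse a JSON object into {token_id: bias}."""
    try:
        raw = json.loads(text)
        if not isinstance(raw, dict):
            raise ValueError("expected a JSON object")
        # JSON keys are always strings, so convert them to int token ids.
        return {int(k): float(v) for k, v in raw.items()}
    except (ValueError, TypeError) as exc:
        # argparse turns this into a clean usage error for the user.
        raise argparse.ArgumentTypeError(f"invalid logit bias {text!r}: {exc}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the seven add_argument calls."""
    parser = argparse.ArgumentParser(description="logits-processor flag sketch")
    parser.add_argument("--logit-bias", type=parse_logit_bias, default=DEFAULT_LOGIT_BIAS)
    for name in ("repetition", "presence", "frequency"):
        parser.add_argument(f"--{name}-penalty", type=float, default=DEFAULT_PENALTY)
        parser.add_argument(f"--{name}-context-size", type=int, default=DEFAULT_CONTEXT_SIZE)
    return parser


def processor_kwargs(args: argparse.Namespace) -> dict:
    """Collect the seven values in keyword form for the factory."""
    names = ["logit_bias"]
    for name in ("repetition", "presence", "frequency"):
        names += [f"{name}_penalty", f"{name}_context_size"]
    return {n: getattr(args, n) for n in names}
```

In main(), the resulting dictionary would then be unpacked into make_logits_processors(...) and the returned list passed to generate(...) through the logits_processors= keyword, as the last bullet describes.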

Tests

No new tests added. pre-commit and the full existing suite
(tests.test_sample_utils + tests.test_generate, 31 tests) pass locally.
