feat(minimax): port MiniMax TTS plugin from Python agents #1287

Open
toubatbrian wants to merge 13 commits into main from claude/jolly-lovelace-blf3Z

Conversation

@toubatbrian (Contributor)

Summary

Ports the MiniMax TTS plugin from livekit/agents (Python) into agents-js.

Triggered by an automated routine observing the merged PR livekit/agents#5518 ("(minimax): add new TTS models") — the Python change simply added speech-2.8-hd and speech-2.8-turbo to the already-existing MiniMax plugin. Because the MiniMax plugin did not yet exist in agents-js, this PR creates the full plugin scaffold (including the new 2.8 models) rather than attempting to land the two model literals alone.

Closes the JS-side gap with livekit-plugins-minimax (Python).

What's included

New workspace package @livekit/agents-plugin-minimax at plugins/minimax/:

  • package.json, tsconfig.json, tsup.config.ts, api-extractor.json, README.md — matches the conventions of existing plugins (e.g. rime, cartesia, neuphonic).
  • src/index.ts — plugin registration.
  • src/models.ts — literal types (TTSModel, TTSVoice, TTSEmotion, TTSLanguageBoost, TTSSampleRate), defaults (DEFAULT_MODEL, DEFAULT_VOICE_ID, DEFAULT_BASE_URL). Includes the new speech-2.8-hd and speech-2.8-turbo model strings from (minimax): add new TTS models agents#5518.
  • src/tts.ts:
    • TTS class with the same capability surface as the Python version (streaming: true, alignedTranscript: false).
    • ChunkedStream: one-shot synthesis via HTTP SSE (POST /v1/t2a_v2, stream: true, exclude_aggregated_audio: true), hex-decoding the audio chunks and pushing PCM frames into an AudioByteStream.
    • SynthesizeStream: real-time WebSocket synthesis via /ws/v1/t2a_v2 with the task_start / task_continue / task_finish event protocol. Uses a sentence tokenizer to chunk incoming text.
    • updateOptions() parity with the Python update_options.
    • Input validation mirrors Python: speed ∈ [0.5, 2.0], intensity ∈ [-100, 100], timbre ∈ [-100, 100], and the fluent emotion is only accepted for speech-2.6-* models.
    • Error surfacing via APIConnectionError / APIStatusError / APITimeoutError / APIError with MiniMax trace_id propagation on both HTTP and WS paths.
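
The ChunkedStream decode step above can be sketched in isolation. This is an illustrative fragment, not the plugin's actual internals; decodeAudioChunk is a hypothetical name:

```typescript
// Minimal sketch of the SSE audio decode: MiniMax returns each audio
// chunk as a hex string in the event payload, which must become raw
// PCM bytes before being framed by AudioByteStream.
function decodeAudioChunk(audioHex: string): Buffer {
  // Buffer.from(str, 'hex') parses two hex characters per output byte.
  return Buffer.from(audioHex, 'hex');
}

// 'ff7f0080' encodes two little-endian 16-bit samples (32767 and -32768).
const chunk = decodeAudioChunk('ff7f0080');
console.log(chunk.length); // 4
```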

Implementation notes where JS differs from Python

Code-level parity is mostly 1:1, except:

  1. Audio format is restricted to PCM. The Python plugin exposes an audio_format option with pcm | mp3 | flac | wav. @livekit/agents's AudioByteStream is designed around raw PCM samples — decoding MP3/FLAC/WAV on the fly would require pulling in an external decoder and wiring it through the TTS pipeline, which is out of scope for the initial port. format: "pcm" is always sent on the wire, and the incoming hex audio is fed straight into AudioByteStream. The public bitrate option is still accepted for API parity, but is effectively ignored by MiniMax when format=pcm.

  2. No SentenceStreamPacer. Python's optional text_pacing (a sentence-level pacer that coordinates with the audio emitter) does not have a counterpart in @livekit/agents at the moment, so the option is omitted. Users can still pass a custom SentenceTokenizer via tokenizer.

  3. HTTP client. Python uses aiohttp; the JS port uses fetch for the chunked HTTP path (matches elevenlabs) and ws for the streaming path (matches cartesia / neuphonic).

  4. Request ID / trace ID. Python extracts Trace-Id / X-Trace-Id from response headers and trace_id from the body (both root.trace_id and base_resp.trace_id). JS does the same, preferring header, falling back to body.

  5. py.typed marker is not applicable in the JS world; types ship via the generated dist/index.d.ts.
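
The resolution order in (4), header first, then body root, then base_resp, can be sketched as follows. Types are simplified stand-ins (a Map for headers, a plain object for the body); the real code reads fetch/ws response objects:

```typescript
// Illustrative sketch of trace-id resolution: prefer the Trace-Id /
// X-Trace-Id response headers, then fall back to trace_id at the body
// root or under base_resp.
interface MiniMaxBody {
  trace_id?: string;
  base_resp?: { trace_id?: string };
}

function resolveTraceId(
  headers: Map<string, string>,
  body: MiniMaxBody,
): string | undefined {
  return (
    headers.get('trace-id') ??
    headers.get('x-trace-id') ??
    body.trace_id ??
    body.base_resp?.trace_id
  );
}
```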

Test plan

  • pnpm install at repo root picks up the new workspace.
  • pnpm --filter @livekit/agents-plugin-minimax build succeeds.
  • pnpm --filter @livekit/agents-plugin-minimax lint is clean.
  • Manual smoke test: new TTS({ apiKey }).synthesize("hello world") returns PCM audio.
  • Manual smoke test: new TTS({ apiKey }).stream() with a pushed sentence emits framed PCM chunks end-to-end.
  • Verify that passing emotion: 'fluent' with a non-speech-2.6-* model throws, matching Python behavior.

This is an automated port from livekit/agents#5518 by the Claude Code automation routine (experimental).

cc @toubatbrian @livekit/agent-devs

@changeset-bot

changeset-bot Bot commented Apr 22, 2026

🦋 Changeset detected

Latest commit: c34b65d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 27 packages
Name Type
@livekit/agents-plugin-minimax Major
@livekit/agents Major
@livekit/agents-plugin-anam Major
@livekit/agents-plugin-assemblyai Major
@livekit/agents-plugin-baseten Major
@livekit/agents-plugin-bey Major
@livekit/agents-plugin-cartesia Major
@livekit/agents-plugin-cerebras Major
@livekit/agents-plugin-deepgram Major
@livekit/agents-plugin-elevenlabs Major
@livekit/agents-plugin-google Major
@livekit/agents-plugin-hedra Major
@livekit/agents-plugin-inworld Major
@livekit/agents-plugin-lemonslice Major
@livekit/agents-plugin-livekit Major
@livekit/agents-plugin-mistral Major
@livekit/agents-plugin-neuphonic Major
@livekit/agents-plugin-openai Major
@livekit/agents-plugin-phonic Major
@livekit/agents-plugin-resemble Major
@livekit/agents-plugin-rime Major
@livekit/agents-plugin-runway Major
@livekit/agents-plugin-sarvam Major
@livekit/agents-plugin-silero Major
@livekit/agents-plugin-trugen Major
@livekit/agents-plugin-xai Major
@livekit/agents-plugins-test Major


devin-ai-integration[bot]

This comment was marked as resolved.


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05c9ccdba6


Comment thread: plugins/minimax/src/tts.ts (outdated)

    const audioHex = data?.data?.audio as string | undefined;
    if (audioHex) {
      const audio = hexToBuffer(audioHex);
      for (const frame of bstream.write(audio.buffer)) {


P1 Badge Preserve byte offsets when writing decoded audio

Both synthesis paths decode hex with Buffer.from(...) and then pass audio.buffer to AudioByteStream. In Node this Buffer is typically a slice of a pooled ArrayBuffer (byteOffset is non-zero), so using .buffer feeds unrelated bytes before/after the actual chunk, which corrupts PCM output and frame boundaries. Pass the Buffer/view directly (or slice the backing buffer with byteOffset and byteLength) here and in the matching WebSocket path.

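The pooled-Buffer pitfall can be reproduced standalone; this demonstration is not plugin code:

```typescript
// Small Buffers in Node are carved out of a shared ~8 KiB pool, so
// buf.buffer is the whole pool and buf.byteOffset is usually non-zero.
const audio = Buffer.from('ff7f0080', 'hex'); // a 4-byte PCM chunk

// Wrong: exposes the entire backing pool, not just this chunk.
const wrong: ArrayBufferLike = audio.buffer;

// Right: slice the backing buffer down to exactly the chunk's bytes
// (or simply pass the Buffer view itself, which carries its own
// byteOffset/byteLength).
const right = audio.buffer.slice(
  audio.byteOffset,
  audio.byteOffset + audio.byteLength,
);

console.log(audio.byteLength); // 4
console.log(right.byteLength); // 4
// wrong.byteLength is typically 8192 here: the size of the shared pool.
```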

Comment thread: plugins/minimax/package.json (new file, @@ -0,0 +1,52 @@)

    {
      "name": "@livekit/agents-plugin-minimax",


P1 Badge Update lockfile when adding the new workspace package

This commit introduces plugins/minimax/package.json as a new workspace package, but pnpm-lock.yaml was not regenerated to include a plugins/minimax importer. Because CI runs pnpm install --frozen-lockfile in build/test workflows, installs will fail when the lockfile is out of sync with workspace manifests. Regenerating and committing the lockfile with this package is required to keep CI green.


- Regenerate pnpm-lock.yaml so frozen-lockfile install finds the new
  minimax workspace package (otherwise Build/Test/Formatting all fail
  at install time).
- Add MINIMAX_API_KEY and MINIMAX_BASE_URL to turbo.json globalEnv so
  eslint-config-turbo stops rejecting the env var references.
- Pass the Node.js Buffer directly to AudioByteStream.write instead of
  unwrapping to Buffer.buffer. Node pools small Buffers inside a larger
  ArrayBuffer, so .buffer exposed ~8KB of unrelated pool memory and
  corrupted PCM output (flagged by Devin and Codex review).
- Fix three TS2322 errors: APIStatusError.options.body must be an
  object or null, not a JSON string.
- Add tts.test.ts with the standard 'skip when API key missing'
  pattern used by rime/cartesia/neuphonic.
- Add changeset entry.
Contributor Author

Thanks for the reviews @devin-ai-integration @chatgpt-codex-connector — addressed in 105f587:

  • Buffer pool bug (both HTTP + WebSocket paths): switched bstream.write(audio.buffer) → bstream.write(audio) so the ArrayBufferView branch in AudioByteStream.write correctly uses byteOffset/byteLength and avoids feeding unrelated pooled memory.
  • Lockfile: regenerated pnpm-lock.yaml with the new plugins/minimax importer so the CI pnpm install --frozen-lockfile step succeeds.

Also fixed while in the area:

  • Added MINIMAX_API_KEY / MINIMAX_BASE_URL to turbo.json globalEnv (turbo lint rule).
  • Fixed three TS2322 errors: APIStatusError.options.body must be an object | null, not a JSON string.
  • Added plugins/minimax/src/tts.test.ts matching the skip-when-no-API-key pattern used by rime/cartesia/neuphonic.
  • Added a changeset.

Generated by Claude Code

@chatgpt-codex-connector

Summary

  • Added a MiniMax regression test that validates buffer-view byte-offset handling by passing a sliced Buffer into AudioByteStream.write() and asserting only the intended PCM bytes are decoded; this protects against reintroducing the .buffer pooled-memory corruption issue noted in review. plugins/minimax/src/tts.test.tsL22-L34
  • Updated test imports to include expect and AudioByteStream needed for the new regression case. plugins/minimax/src/tts.test.tsL4-L6

Testing

  • ⚠️ pnpm install --frozen-lockfile (failed in this environment: onnxruntime-node postinstall download hit ENETUNREACH, so install did not complete).
  • ⚠️ pnpm test -- plugins/minimax/src/tts.test.ts (could not run because vitest was unavailable after the failed install).
  • git commit -m "test(minimax): add regression for buffer view byte offsets"


devin-ai-integration[bot]

This comment was marked as resolved.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ toubatbrian
❌ claude
You have signed the CLA already but the status is still pending? Let us recheck it.

- updateOptions now re-validates merged opts via an extracted
  validateOptions helper, so speed/intensity/timbre/emotion+model
  constraints surface locally rather than as a server-side error.
  (Python's update_options silently assigns; the JS port tightens this.)
- Add the CLAUDE.md-required // Ref: python comments to every ported
  symbol in src/tts.ts and src/models.ts so reviewers can cross-check
  each function/type against the Python source.
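
The extracted validator described above might look like the following sketch. The option names (speed, intensity, timbre, emotion, model) mirror the PR text, but the shape and signature are hypothetical, not the plugin's real validateOptions:

```typescript
// Hedged sketch of the local validation: speed ∈ [0.5, 2.0],
// intensity ∈ [-100, 100], timbre ∈ [-100, 100], and the 'fluent'
// emotion only for speech-2.6-* models.
interface TTSOptionsSketch {
  model: string;
  speed?: number;
  intensity?: number;
  timbre?: number;
  emotion?: string;
}

function validateOptions(opts: TTSOptionsSketch): void {
  if (opts.speed !== undefined && (opts.speed < 0.5 || opts.speed > 2.0)) {
    throw new Error(`speed must be in [0.5, 2.0], got ${opts.speed}`);
  }
  if (opts.intensity !== undefined && Math.abs(opts.intensity) > 100) {
    throw new Error(`intensity must be in [-100, 100], got ${opts.intensity}`);
  }
  if (opts.timbre !== undefined && Math.abs(opts.timbre) > 100) {
    throw new Error(`timbre must be in [-100, 100], got ${opts.timbre}`);
  }
  if (opts.emotion === 'fluent' && !opts.model.startsWith('speech-2.6')) {
    throw new Error(`emotion 'fluent' requires a speech-2.6-* model, got ${opts.model}`);
  }
}
```

Running this on every updateOptions call surfaces bad merged options immediately instead of as a server-side error.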
devin-ai-integration[bot]

This comment was marked as resolved.

Promise.race between taskStarted.await and the setTimeout-based
timeout promise leaked the timer. On successful task_start, the
timer still fired and called reject() on an already-settled race,
producing an unhandled promise rejection. Capture the handle and
clearTimeout in a finally block. Matches the pattern used by
waitForWebSocketOpen later in the same file and by cartesia.
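The timer fix described in that commit message follows a common pattern; here is a minimal sketch (awaitWithTimeout is an illustrative name, not the plugin's API):

```typescript
// Capture the setTimeout handle so the losing branch of Promise.race
// is cleaned up, instead of firing a late reject() on an already-settled
// race and producing an unhandled promise rejection.
async function awaitWithTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T> {
  let handle: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    handle = setTimeout(
      () => reject(new Error('timed out waiting for task_start')),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    // Without this, a successful resolve still leaves the timer armed.
    clearTimeout(handle);
  }
}
```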
devin-ai-integration[bot]

This comment was marked as resolved.

#tokenStream is created once in the constructor; the TTS base class
retries run() on retryable errors (timeout, 5xx, 429), so closing
the stream here made every retry push into a closed stream and
silently drop user input. Only close() closes it now, matching
how cartesia handles its tokenizer stream.
devin-ai-integration[bot]

This comment was marked as resolved.

…tryable

- All new files now use SPDX-FileCopyrightText: 2026 per CLAUDE.md.
- When the server returns a non-zero base_resp.status_code, pass
  retryable: false to APIStatusError. These are MiniMax app-level
  codes (e.g. 1002 invalid param, 1004 auth), not HTTP status codes,
  so the default retryability heuristic would incorrectly retry
  permanent errors. Applies to both the HTTP SSE and WebSocket paths.
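  The retryability decision described above can be sketched as a small predicate. This is a simplified illustration, not the real APIStatusError wiring:

```typescript
// MiniMax app-level codes (base_resp.status_code, e.g. 1002 invalid
// param, 1004 auth) are permanent failures, so they map to
// retryable: false; only transport-level HTTP 5xx/429 remain retryable.
function isRetryable(
  httpStatus: number | undefined,
  baseRespStatusCode: number,
): boolean {
  if (baseRespStatusCode !== 0) {
    // Non-zero app-level code: retrying cannot help.
    return false;
  }
  // Usual HTTP heuristic for transient transport failures.
  return httpStatus !== undefined && (httpStatus === 429 || httpStatus >= 500);
}
```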
