feat(_kokoro_tts): add audio format config with remote API support by Draco-Lunaris · Pull Request #1692 · agent0ai/agent-zero

Draco-Lunaris · 2026-06-03T22:23:56Z

Feature Request: Add audio format configuration to Kokoro TTS plugin

Problem

The _kokoro_tts plugin currently hardcodes audio/wav as the output format. WAV files are uncompressed and large — a typical 3-second phrase produces ~65KB of WAV vs ~10KB of MP3 at comparable speech quality. For remote TTS services like Kokoro-FastAPI that support multiple output formats (mp3, wav, opus, flac), users have no way to configure the format without modifying plugin source code.

Additionally, the plugin currently assumes local inference via `from kokoro import KPipeline`, which requires the kokoro Python package to be installed in the framework's venv. When using a remote Kokoro-FastAPI service, the local import is unnecessary and the plugin should call the remote API instead.

Proposed Solution

Add a response_format config option to the Kokoro TTS plugin, allowing users to choose between wav, mp3, opus, and flac output formats.

Implementation Details

`default_config.yaml`

voice: af_bella
speed: 1.1
response_format: mp3

`helpers/runtime.py` — `normalize_config()`

VALID_FORMATS = {"wav", "mp3", "opus", "flac"}
MIME_TYPES = {
    "wav": "audio/wav",
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "flac": "audio/flac",
}

# In normalize_config:
response_format = str(config.get("response_format", normalized["response_format"]) or "").strip().lower()
if response_format in VALID_FORMATS:
    normalized["response_format"] = response_format
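A minimal, self-contained sketch of how this validation might behave end to end. The `DEFAULTS` dict, the standalone `normalize_config` signature, and the `mime_type_for` helper are assumptions made for illustration; the plugin's real function handles more keys than shown here.

```python
from typing import Optional

VALID_FORMATS = {"wav", "mp3", "opus", "flac"}
MIME_TYPES = {
    "wav": "audio/wav",
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "flac": "audio/flac",
}

# Hypothetical defaults, mirroring the default_config.yaml proposed above.
DEFAULTS = {"voice": "af_bella", "speed": 1.1, "response_format": "mp3"}


def normalize_config(config: Optional[dict]) -> dict:
    """Return a copy of DEFAULTS with a validated response_format."""
    normalized = dict(DEFAULTS)
    config = config or {}
    raw = config.get("response_format", normalized["response_format"])
    response_format = str(raw or "").strip().lower()
    # Unknown or empty values are ignored, so the default stays in place.
    if response_format in VALID_FORMATS:
        normalized["response_format"] = response_format
    return normalized


def mime_type_for(config: Optional[dict]) -> str:
    """Hypothetical helper: map the normalized format to its MIME type."""
    return MIME_TYPES[normalize_config(config)["response_format"]]
```

Because the format is validated before lookup, the MIME type always matches a known key, and a typo such as `aac` quietly falls back to the default rather than raising.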

`helpers/runtime.py` — `synthesize_sentences()`

Pass response_format through to the backend and return the corresponding MIME type.

For local inference (current path):

# soundfile.write() supports WAV and FLAC natively.
# For MP3/Opus, convert after WAV generation.
format_map = {"wav": "WAV", "flac": "FLAC"}  # soundfile format names
sf.write(buffer, combined_audio, 24000, format=format_map.get(response_format, "WAV"))

For remote API (Kokoro-FastAPI):

json={
    "model": "kokoro",
    "input": text,
    "voice": voice,
    "response_format": response_format,
    "speed": speed,
}
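The payload above could be wrapped into a request roughly as follows. This is a sketch under stated assumptions: the helper name `build_speech_request` is hypothetical, and the `/v1/audio/speech` route assumes the remote service exposes an OpenAI-compatible speech endpoint, which should be checked against the deployed Kokoro-FastAPI version. Only the standard library is used, and the request is built without being sent.

```python
import json
import urllib.request


def build_speech_request(remote_url, text, voice, response_format, speed):
    """Build (but do not send) a POST request for a remote TTS service."""
    payload = {
        "model": "kokoro",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    # Assumption: OpenAI-compatible speech route on the remote service.
    endpoint = remote_url.rstrip("/") + "/v1/audio/speech"
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen(request, timeout=...)`, and the response body read as raw audio bytes in the requested format.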

`api/synthesize.py`

# Instead of hardcoded mime_type:
mime_type = MIME_TYPES.get(cfg["response_format"], "audio/mpeg")
return {
    "success": True,
    "audio": audio,
    "mime_type": mime_type,
}

`webui/config.html`

Add a format selector:

<div class="field">
  <div class="field-label">
    <div class="field-title">Audio Format</div>
    <div class="field-description">Output format for synthesized audio.</div>
  </div>
  <div class="field-control">
    <select x-model="config.response_format">
      <option value="mp3">MP3 (recommended)</option>
      <option value="wav">WAV (uncompressed)</option>
      <option value="opus">Opus (low bitrate)</option>
      <option value="flac">FLAC (lossless)</option>
    </select>
  </div>
</div>

Benefits

~85% file size reduction for MP3 vs WAV at comparable speech quality
Faster network transfer between remote TTS service and browser
Browser compatibility: all modern browsers can play MP3 through <audio> elements
User choice: users who want lossless or low-latency output can still use WAV; bandwidth-constrained users can use MP3/Opus
Forward-compatible: a new format needs only an entry in VALID_FORMATS and MIME_TYPES, plus an option in the format selector

Environment

Agent Zero version: v1.14+
Plugin: _kokoro_tts
Tested with: Kokoro-FastAPI v0.9.4 (remote), local kokoro 0.9.4

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9cd5be8bf4


chatgpt-codex-connector · 2026-06-03T22:26:31Z

-voice: am_puck,am_onyx
+voice: am_onyx+am_echo
 speed: 1.1
+remote_url: http://ares.moon-dragon.us:18890


Require explicit opt-in before using a third-party TTS host

With the shipped default config, any user who enables Kokoro TTS without creating their own config (and any migrated config missing remote_url) will have every synthesized text POSTed to ares.moon-dragon.us. That is a privacy and availability regression from the previous local-only path, because normal spoken chat content leaves the user's deployment by default; make the remote URL empty/localhost and require users to explicitly configure a remote service before sending audio requests off-box.


Draco-Lunaris · 2026-06-03T22:49:25Z

Found an issue with local / remote TTS usage. Cleaning it up and fixing it now. Should have a proper PR shortly.

- Add response_format config (mp3/wav/opus/flac) with MIME type mapping
- Add remote_url config for optional remote Kokoro-FastAPI server
- If remote_url is set, use remote API for synthesis; otherwise use local model
- If remote_url is set, use remote health check; otherwise use local model status
- Status endpoint reports both local model and remote health (if configured)
- Synthesize endpoint returns (audio, mime_type) tuple for proper content-type
- WebUI config page adds format dropdown and remote URL field
- WebUI main page shows remote health alongside local model status
- Preserves all local synthesis functionality (soundfile, KPipeline, etc.)
- Preserves upstream defaults (voice: am_puck,am_onyx, speed: 1.1)
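The opt-in rule described here (remote synthesis only when remote_url is explicitly set, local otherwise) might be sketched as below. The function name `select_backend` and the choice to raise on a malformed URL are assumptions for illustration, not the PR's actual code.

```python
from urllib.parse import urlparse


def select_backend(config: dict) -> str:
    """Return "remote" only for an explicit http(s) remote_url, else "local"."""
    remote_url = str(config.get("remote_url") or "").strip()
    if not remote_url:
        # No URL configured: text never leaves the deployment.
        return "local"
    parsed = urlparse(remote_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        # Assumption: fail loudly instead of guessing what the user meant.
        raise ValueError(f"remote_url is not a valid http(s) URL: {remote_url!r}")
    return "remote"
```

Keeping the default empty means a fresh install or a migrated config without remote_url stays on the local path, which addresses the privacy concern raised in the review.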

Draco-Lunaris · 2026-06-03T23:06:31Z

Cleaned up the PR. It now includes all the original local TTS functionality. Both local and remote TTS should be unaffected; only the new config options are added to the plugin config. I am running the changes locally myself.

chatgpt-codex-connector Bot reviewed Jun 3, 2026


Draco-Lunaris-Echo force-pushed the feature/kokoro-tts-audio-format-config branch from e3e440c to 5eaa508 · June 3, 2026 22:53

Draco-Lunaris wants to merge 1 commit into agent0ai:main from Draco-Lunaris:feature/kokoro-tts-audio-format-config
