feat(_kokoro_tts): add audio format config with remote API support #1692
Draco-Lunaris wants to merge 1 commit into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9cd5be8bf4
voice: am_puck,am_onyx
voice: am_onyx+am_echo
speed: 1.1
remote_url: http://ares.moon-dragon.us:18890
Require explicit opt-in before using a third-party TTS host
With the shipped default config, any user who enables Kokoro TTS without creating their own config (and any migrated config missing remote_url) will have every synthesized text POSTed to ares.moon-dragon.us. That is a privacy and availability regression from the previous local-only path, because normal spoken chat content leaves the user's deployment by default; make the remote URL empty/localhost and require users to explicitly configure a remote service before sending audio requests off-box.
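One way to read this suggestion, sketched under assumptions: treat a missing, blank, or malformed `remote_url` as "stay local", so nothing is sent off-box unless the user has set a URL themselves. The helper name `resolve_remote_url` and the dict-shaped config are hypothetical and are not taken from the plugin.

```python
from typing import Optional
from urllib.parse import urlparse


def resolve_remote_url(config: dict) -> Optional[str]:
    """Return a usable remote base URL, or None to keep synthesis local.

    Hypothetical helper: the plugin's real config handling may differ.
    """
    raw = config.get("remote_url")
    if not isinstance(raw, str):
        return None  # missing key (e.g. a migrated config) means local-only
    url = raw.strip().rstrip("/")
    if not url:
        return None  # empty default: no text leaves the deployment
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return None  # malformed values are ignored rather than guessed at
    return url
```

With a helper like this, the shipped default could be `remote_url: ""`, and the remote path would only activate once a user sets the field explicitly.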
Found an issue with local / remote TTS usage. Cleaning it up and fixing it now. Should have a proper PR shortly.
- Add response_format config (mp3/wav/opus/flac) with MIME type mapping
- Add remote_url config for optional remote Kokoro-FastAPI server
- If remote_url is set, use remote API for synthesis; otherwise use local model
- If remote_url is set, use remote health check; otherwise use local model status
- Status endpoint reports both local model and remote health (if configured)
- Synthesize endpoint returns (audio, mime_type) tuple for proper content-type
- WebUI config page adds format dropdown and remote URL field
- WebUI main page shows remote health alongside local model status
- Preserves all local synthesis functionality (soundfile, KPipeline, etc.)
- Preserves upstream defaults (voice: am_puck,am_onyx, speed: 1.1)
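The format handling and backend selection listed in this commit message could be sketched roughly as follows. The names `MIME_TYPES`, `normalize_config`, and `synthesize`, the `wav` fallback for unknown formats, and the injected `local_backend` / `remote_backend` callables are assumptions for illustration. The MIME strings are common choices; the PR's actual mapping may differ.

```python
from typing import Callable, Dict, Tuple

# Assumed mapping; the PR's actual table is not shown in this thread.
MIME_TYPES: Dict[str, str] = {
    "wav": "audio/wav",
    "mp3": "audio/mpeg",
    "opus": "audio/ogg",
    "flac": "audio/flac",
}

# A backend takes (text, normalized_config) and returns encoded audio bytes.
Backend = Callable[[str, dict], bytes]


def normalize_config(raw: dict) -> dict:
    """Fill in defaults and fall back to wav for unknown formats."""
    fmt = str(raw.get("response_format") or "wav").strip().lower()
    if fmt not in MIME_TYPES:
        fmt = "wav"
    return {
        "voice": raw.get("voice", "am_puck,am_onyx"),
        "speed": float(raw.get("speed", 1.1)),
        "response_format": fmt,
        "remote_url": str(raw.get("remote_url") or "").strip(),
    }


def synthesize(
    text: str,
    config: dict,
    local_backend: Backend,
    remote_backend: Backend,
) -> Tuple[bytes, str]:
    """Pick the backend from remote_url and return (audio, mime_type)."""
    cfg = normalize_config(config)
    backend = remote_backend if cfg["remote_url"] else local_backend
    audio = backend(text, cfg)
    return audio, MIME_TYPES[cfg["response_format"]]
```

Passing the backends in as callables keeps the sketch independent of `KPipeline` and of any HTTP client, so the selection logic can be exercised without either being installed.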
e3e440c to 5eaa508
Cleaned up the PR; it now includes all the original local functionality for TTS. Both local and remote TTS should be unaffected, with only the new config options added to the plugin config. I am running the changes locally myself.
Feature Request: Add audio format configuration to Kokoro TTS plugin
Problem
The `_kokoro_tts` plugin currently hardcodes `audio/wav` as the output format. WAV files are uncompressed and large: a typical 3-second phrase produces ~65KB of WAV vs ~10KB of MP3 at comparable speech quality. For remote TTS services like Kokoro-FastAPI that support multiple output formats (mp3, wav, opus, flac), users have no way to configure the format without modifying plugin source code.
Additionally, the plugin currently assumes local inference via `from kokoro import KPipeline`, which requires the kokoro Python package to be installed in the framework's venv. When using a remote Kokoro-FastAPI service, the local import is unnecessary and the plugin should call the remote API instead.
Proposed Solution
Add a `response_format` config option to the Kokoro TTS plugin, allowing users to choose between `wav`, `mp3`, `opus`, and `flac` output formats.
Implementation Details
`default_config.yaml`
`helpers/runtime.py`: `normalize_config()`
`helpers/runtime.py`: `synthesize_sentences()`
Pass `response_format` through to the backend and return the corresponding MIME type.
For local inference (current path):
For remote API (Kokoro-FastAPI):
`api/synthesize.py`
`webui/config.html`
Add a format selector:
Benefits
`speechSynthesis` and `<audio>` elements
Environment
_kokoro_tts