
feat: add multilingual TTS support via ephone (eSpeak-NG WASM) #313

Open
Dongshan-git wants to merge 1 commit into hexgrad:main from Dongshan-git:feat/multilingual-ephone

@Dongshan-git

Summary

  • Replace the English-only phonemizer package with ephone v1.0.2, an eSpeak-NG WASM wrapper with built-in language packs for 9 languages (en-US, en-GB, es, fr, it, pt-BR, ja, zh, hi)
  • Fix a hang in KokoroTTS.stream() where splitter.close() was never called, leaving the async iterator blocked forever after all chunks were pushed
  • Uncomment all non-English voices in voices.js (Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese)
  • Redesign the browser demo with a grouped voice selector, waveform visualizer, and per-language example texts
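The stream() hang follows from the async-iterator contract: once a consumer has drained the buffer, it keeps waiting until the producer signals that no more items are coming. A minimal sketch of that contract, assuming a push/close API shaped like TextSplitterStream's; `PushStream` and `collect` are hypothetical names, not the library's implementation:

```javascript
// Hypothetical push-based async iterable standing in for TextSplitterStream.
// A consumer that has drained the buffer waits until close() marks the end.
class PushStream {
  constructor() {
    this.buffer = [];    // items pushed but not yet consumed
    this.closed = false; // set by close(); ends iteration once the buffer drains
    this.waiter = null;  // resolver for a consumer parked on an empty buffer
  }

  push(...items) {
    if (this.closed) throw new Error("push() after close()");
    this.buffer.push(...items);
    this._wake();
  }

  close() {
    this.closed = true;
    this._wake(); // without this, a parked consumer never resumes
  }

  _wake() {
    if (this.waiter) {
      const resolve = this.waiter;
      this.waiter = null;
      resolve();
    }
  }

  async *[Symbol.asyncIterator]() {
    while (true) {
      if (this.buffer.length > 0) {
        yield this.buffer.shift();
      } else if (this.closed) {
        return; // buffer drained and producer finished
      } else {
        // Park until push() or close() wakes us up.
        await new Promise((resolve) => (this.waiter = resolve));
      }
    }
  }
}

// Drain the stream with for await...of, as a stream() consumer would.
async function collect(stream) {
  const out = [];
  for await (const item of stream) out.push(item);
  return out;
}

const s = new PushStream();
s.push("Hello.", "World.");
s.close(); // remove this call and collect(s) never settles
collect(s).then((items) => console.log(items));
```

The design mirrors the fix in the PR: pushing chunks alone is not enough, because "buffer empty" and "stream finished" are different states that only close() can distinguish.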

Changes

src/phonemize.js

  • Rewrite around createEphone from the ephone package
  • Language packs (en_us, en_all, roa, jpx, sit) are imported as static top-level imports; Hindi's large 'all' pack is lazy-loaded on demand
  • normalize_text now takes an english flag; number, currency, and abbreviation normalization is skipped for non-English input
  • English post-processing (r → ɹ, kokoro pronunciation fix) is gated behind isEnglish so it doesn't corrupt Romance-language phonemes
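The three phonemize.js changes boil down to two patterns: load a heavy pack at most once, on first use, and apply English-only rewrites only when the language is English. A minimal sketch under those assumptions; `lazyOnce`, `normalizeText`, `postProcess`, the language codes, and the normalization rules shown are hypothetical illustrations, not the PR's actual code:

```javascript
// Hypothetical sketch: names and language codes are illustrative only.
const ENGLISH_CODES = new Set(["en-us", "en-gb"]);

// Memoize a dynamic load (e.g. a large language pack) so it runs at most once;
// concurrent callers share the same pending Promise.
function lazyOnce(loader) {
  let pending = null;
  return () => {
    if (pending === null) {
      pending = loader().catch((err) => {
        pending = null; // a failed load may be retried on the next call
        throw err;
      });
    }
    return pending;
  };
}

// English-only normalization: "1,500" means fifteen hundred in English,
// but "1,5" is a decimal in French or Spanish, so other text is left as-is.
function normalizeText(text, english) {
  if (!english) return text;
  return text
    .replace(/(\d),(?=\d{3}\b)/g, "$1")  // drop thousands separators
    .replace(/\bDr\.(?=\s)/g, "Doctor"); // expand a common abbreviation
}

// English-only phoneme post-processing (the r -> ɹ rewrite); in languages
// such as Spanish or Italian an "r" can be a genuine trill and must survive.
function postProcess(phonemes, lang) {
  if (!ENGLISH_CODES.has(lang)) return phonemes;
  return phonemes.replace(/r/g, "ɹ");
}

// Usage: a hypothetical Hindi pack loader, created once at module scope.
const loadHindiPack = lazyOnce(async () => ({ name: "hi-pack" }));
loadHindiPack().then((pack) => console.log(pack.name));
console.log(normalizeText("1,500,000", true), postProcess("perro", "es"));
```

Caching the Promise rather than the resolved value is what keeps two concurrent Hindi requests from triggering two downloads.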

src/kokoro.js

  • Add missing splitter.close() call after splitter.push(...chunks) — without this the TextSplitterStream async iterator never resolves
  • Await this.tokenizer(...) (was missing await, causing silent failures)
  • Extend the _validate_voice language type annotation to cover all 9 language codes
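Why a missing await fails silently: destructuring a pending Promise does not throw, it just yields undefined, so the error surfaces later or not at all. A minimal sketch, assuming (as the PR implies) that this.tokenizer(...) returns a Promise; `fakeTokenizer`, `withoutAwait`, and `withAwait` are hypothetical stand-ins:

```javascript
// Hypothetical async tokenizer standing in for this.tokenizer(...).
async function fakeTokenizer(text) {
  return { input_ids: Array.from(text, (ch) => ch.codePointAt(0)) };
}

function withoutAwait(text) {
  // A Promise has no input_ids property: no exception, just undefined.
  const { input_ids } = fakeTokenizer(text);
  return input_ids;
}

async function withAwait(text) {
  // Awaiting first unwraps the resolved object, so input_ids is populated.
  const { input_ids } = await fakeTokenizer(text);
  return input_ids;
}

console.log(withoutAwait("hi")); // undefined
withAwait("hi").then((ids) => console.log(ids));
```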

src/voices.js

  • Enable all previously-commented-out non-English voices

rollup.config.js

  • Switch the web build from file: "kokoro.web.js" to dir + entryFileNames/chunkFileNames to support dynamic imports (ephone language packs are code-split chunks, and Rollup rejects output.file when a build emits more than one chunk)
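A minimal sketch of a web output that can hold code-split chunks. The entry path, output directory, and chunk naming pattern are assumptions, and plugins and the other build targets are omitted, so this is not the PR's actual config:

```javascript
// rollup.config.js (sketch): paths and names below are assumptions.
export default {
  input: "src/kokoro.js",
  output: {
    // output.file only works for a single chunk; any dynamic import()
    // that Rollup splits out requires output.dir instead.
    dir: "dist",
    format: "es",
    entryFileNames: "kokoro.web.js",   // keeps the familiar entry file name
    chunkFileNames: "[name]-[hash].js", // lazily loaded chunks, e.g. a language pack
  },
};
```

Pinning entryFileNames to the old name means consumers that import the entry file by name keep working, while the lazily loaded chunks get hashed names that cache well.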

demo/

  • Grouped voice selector with language flags and quality grades
  • WaveformPlayer component with canvas waveform visualization
  • Per-language example texts
  • AnimatePresence loading/result transitions

Test plan

  • English (af_heart, bf_emma) — verify existing behaviour unchanged
  • French (ff_siwis) — was completely broken before; should now synthesize correctly
  • Japanese (jf_alpha) — hiragana/katakana input; kanji not supported by eSpeak-NG
  • Chinese (zf_xiaoxiao), Spanish (ef_dora), Hindi (hf_alpha), Italian (if_sara), Portuguese (pf_dora)
  • Run vitest — all phonemize tests should pass

🤖 Generated with Claude Code

Replace the English-only `phonemizer` package with `ephone` v1.0.2,
an eSpeak-NG WASM wrapper that ships language packs for 9 languages.
Fix a hang in `KokoroTTS.stream()` where `splitter.close()` was never
called, leaving the async iterator blocked forever.

- phonemize.js: rewrite around `createEphone`; lazy-load the large
  Hindi 'all' pack on demand; skip English-specific r→ɹ for Romance langs
- kokoro.js: call `splitter.close()` after pushing chunks; await tokenizer
- voices.js: uncomment all non-English voices (ja/zh/es/fr/hi/it/pt-br)
- rollup.config.js: switch web build to dir+chunkFileNames for dynamic imports
- demo: redesign UI with grouped voice selector, waveform player,
  per-language example texts, and AnimatePresence transitions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Dongshan-git
Author

Also bumps @huggingface/transformers from ^3.5.1 to ^4.0.1 and dev dependencies (rollup, vitest, typescript, prettier) to their latest major versions.
