
feat: add in-process OpenAI-compatible STT API service#245

Draft
krystophny wants to merge 5 commits into main from feature/single-daemon-openai-stt-api

Conversation


krystophny (Collaborator) commented Mar 1, 2026

Summary

This PR adds an in-process OpenAI-compatible STT HTTP service to Voxtype. The service runs alongside the daemon and reuses the daemon's transcriber, so Voxtype does not load the Whisper model twice when both push-to-talk STT and an API endpoint are needed.

Scope

  • Adds /healthz, /v1/audio/transcriptions, and /v1/audio/translations.
  • Adds service config, CLI flags, and environment overrides for bind host/port, request timeout, upload limits, and allowed languages.
  • Adds request-level language and prompt overrides for the Whisper transcriber.
  • Adds json, text, and verbose_json response handling, including segment timestamps for long-form chunking.
  • Keeps a bounded Whisper state pool for concurrent service requests.
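The bounded state pool in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual types: `WhisperState`, the pool API, and the blocking strategy are assumptions. The idea is that at most `cap` Whisper states ever exist, and concurrent API requests block until one is free.

```rust
use std::sync::{Condvar, Mutex};

// Hypothetical stand-in for a whisper-rs inference state.
struct WhisperState {
    id: usize,
}

// A bounded pool: at most `cap` states exist; requests block until one is free.
struct StatePool {
    inner: Mutex<Vec<WhisperState>>,
    available: Condvar,
}

impl StatePool {
    fn new(cap: usize) -> Self {
        let states = (0..cap).map(|id| WhisperState { id }).collect();
        StatePool {
            inner: Mutex::new(states),
            available: Condvar::new(),
        }
    }

    // Block until a state is free, then hand it out.
    fn acquire(&self) -> WhisperState {
        let mut guard = self.inner.lock().unwrap();
        while guard.is_empty() {
            guard = self.available.wait(guard).unwrap();
        }
        guard.pop().unwrap()
    }

    // Return a state so a waiting request can proceed.
    fn release(&self, state: WhisperState) {
        self.inner.lock().unwrap().push(state);
        self.available.notify_one();
    }
}
```

Bounding the pool keeps GPU memory flat under load instead of allocating one state per in-flight request.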

This branch is intentionally independent from macOS support. It is based on main and does not include the macOS files from #129. For local Mac testing with both feature sets stacked, use feature/macos-openai-stt-stack.

Design Notes

  • No auth is implemented in the local Voxtype service; the intended deployment is loopback/private LAN first.
  • The service can constrain accepted request languages through config.
  • Audio upload decode/downmix/resample to 16 kHz mono is handled server-side.
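The server-side downmix/resample step can be illustrated with a minimal sketch. The function names and the linear-interpolation resampler here are assumptions for illustration; Voxtype's actual decode path may use a proper windowed-sinc resampler.

```rust
// Interleaved multi-channel samples -> mono by averaging each frame's channels.
fn downmix_to_mono(samples: &[f32], channels: usize) -> Vec<f32> {
    samples
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

// Naive linear-interpolation resample from `src_rate` to the 16 kHz Whisper expects.
fn resample_to_16k(mono: &[f32], src_rate: u32) -> Vec<f32> {
    const TARGET: u32 = 16_000;
    if src_rate == TARGET || mono.is_empty() {
        return mono.to_vec();
    }
    let ratio = src_rate as f64 / TARGET as f64;
    let out_len = (mono.len() as f64 / ratio).floor() as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * ratio;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = mono[idx];
            let b = mono[(idx + 1).min(mono.len() - 1)];
            a + (b - a) * frac // interpolate between neighbouring samples
        })
        .collect()
}
```

Handling this server-side means clients can upload whatever their recorder produces (stereo 44.1/48 kHz) without preprocessing.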

Verification

Executed on mailuefterl in /tmp/voxtype-openai.J5NRzM/repo after applying this branch on top of main:

$ cargo check
Finished `dev` profile
$ cargo test
549 unit tests passed; 25 integration tests passed; 0 failed
$ cargo build
Finished `dev` profile

Stacked Mac branch verification on feature/macos-openai-stt-stack:

$ cargo build --release --features gpu-metal
Finished `release` profile
$ curl -fsS http://127.0.0.1:8427/healthz
{"status":"ok"}
$ curl -fsS -F file=@tests/fixtures/vad/speech_hello.wav -F model=large-v3-turbo -F response_format=json -F language=en http://127.0.0.1:8427/v1/audio/transcriptions
{"text":"Hello world"}

Closes #244

The service previously created its own transcriber instance, loading
the same model into GPU memory a second time. Now the daemon passes
its existing transcriber via Arc to the service, halving VRAM usage.
Falls back to creating a separate instance when no shared transcriber
is available (on-demand loading, gpu_isolation).
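The sharing-with-fallback logic described in that commit can be sketched like this. The `Transcriber` trait and names below are simplified stand-ins, not Voxtype's actual API; the point is that the service clones the daemon's `Arc` instead of loading the model again.

```rust
use std::sync::Arc;

// Hypothetical simplified transcriber trait.
trait Transcriber: Send + Sync {
    fn name(&self) -> &str;
}

struct DummyTranscriber(&'static str);
impl Transcriber for DummyTranscriber {
    fn name(&self) -> &str {
        self.0
    }
}

// Reuse the daemon's transcriber when one is shared; otherwise the service
// loads its own (e.g. under on-demand loading or gpu_isolation).
fn service_transcriber(shared: Option<Arc<dyn Transcriber>>) -> Arc<dyn Transcriber> {
    match shared {
        Some(t) => t, // no second model load: same Arc, same GPU memory
        None => Arc::new(DummyTranscriber("service-owned")),
    }
}
```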

Return per-segment start/end timestamps when response_format is
verbose_json. Adds transcribe_segments method to Transcriber trait
with default fallback and WhisperTranscriber override that extracts
real timestamps from whisper-rs segment iterator.
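A default-fallback trait method of that shape might look like the sketch below. The signature and `Segment` type are hypothetical; the real trait in the PR may differ. The default implementation returns one segment spanning the whole clip, so backends without real timestamps still satisfy verbose_json, while a Whisper-backed override would replace it with per-segment times.

```rust
// A segment with start/end timestamps in milliseconds, as in verbose_json.
#[derive(Debug, PartialEq)]
struct Segment {
    start_ms: i64,
    end_ms: i64,
    text: String,
}

trait Transcriber {
    fn transcribe(&self, audio: &[f32]) -> String;

    // Default fallback: a single segment covering the full clip duration.
    fn transcribe_segments(&self, audio: &[f32], sample_rate: u32) -> Vec<Segment> {
        let end_ms = (audio.len() as i64 * 1000) / sample_rate as i64;
        vec![Segment {
            start_ms: 0,
            end_ms,
            text: self.transcribe(audio),
        }]
    }
}
```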

Bumps default max_upload_bytes to 200MB and request_timeout_ms to
600s to support long audio files.
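As a rough sketch, those defaults could be expressed as a config struct like the following. The struct and field names are assumptions matching the values in the commit message, not the PR's actual config code.

```rust
// Hypothetical service config mirroring the defaults described above.
struct ServiceConfig {
    max_upload_bytes: u64,
    request_timeout_ms: u64,
}

impl Default for ServiceConfig {
    fn default() -> Self {
        ServiceConfig {
            max_upload_bytes: 200 * 1024 * 1024, // 200 MB uploads
            request_timeout_ms: 600_000,         // 600 s for long audio
        }
    }
}
```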
  • krystophny force-pushed the feature/single-daemon-openai-stt-api branch from a7cae79 to ec9a74d on April 23, 2026 05:49
  • krystophny changed the title from "feat: single daemon with local OpenAI-compatible STT service" to "feat: add in-process OpenAI-compatible STT API service" on Apr 23, 2026
  • krystophny marked this pull request as ready for review on April 23, 2026 06:22
  • krystophny requested a review from peteonrails as a code owner on April 23, 2026 06:22
  • krystophny marked this pull request as draft on April 23, 2026 06:23


Development

Successfully merging this pull request may close these issues.

Feature: single daemon for hotkey dictation + OpenAI-compatible local STT API
