feat: add in-process OpenAI-compatible STT API service #245
Draft
krystophny wants to merge 5 commits into main from
Previously the service created its own transcriber instance, loading the same model into GPU memory a second time. The daemon now passes its existing transcriber to the service via `Arc`, halving VRAM usage when both are running. The service falls back to creating a separate instance when no shared transcriber is available (on-demand loading, `gpu_isolation`).
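The sharing-with-fallback pattern described above can be sketched as follows. This is a minimal illustration, not the PR's code: the one-method `Transcriber` trait, `DummyTranscriber`, and `SttService::new` taking an `Option` plus a factory closure are all hypothetical names chosen for the example.

```rust
use std::sync::Arc;

/// Hypothetical, reduced trait; the real Voxtype trait has more methods.
/// `Send + Sync` lets one instance be shared across daemon and HTTP threads.
pub trait Transcriber: Send + Sync {
    fn transcribe(&self, samples: &[f32]) -> Result<String, String>;
}

/// Stand-in for a model-backed transcriber (hypothetical).
pub struct DummyTranscriber {
    pub name: &'static str,
}

impl Transcriber for DummyTranscriber {
    fn transcribe(&self, samples: &[f32]) -> Result<String, String> {
        Ok(format!("{}: {} samples", self.name, samples.len()))
    }
}

/// Hypothetical service wrapper holding a reference-counted transcriber.
pub struct SttService {
    transcriber: Arc<dyn Transcriber>,
}

impl SttService {
    /// Reuses the daemon's transcriber when one is passed in (one model in VRAM);
    /// otherwise calls `create` to build a separate instance, e.g. when the
    /// daemon loads models on demand or runs with GPU isolation.
    pub fn new<F>(shared: Option<Arc<dyn Transcriber>>, create: F) -> Self
    where
        F: FnOnce() -> Arc<dyn Transcriber>,
    {
        let transcriber = shared.unwrap_or_else(create);
        SttService { transcriber }
    }

    pub fn transcriber(&self) -> &Arc<dyn Transcriber> {
        &self.transcriber
    }
}
```

Cloning the `Arc` only bumps a reference count, so the daemon and the service point at the same loaded model; the factory closure runs only on the fallback path.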
Return per-segment start/end timestamps when `response_format` is `verbose_json`. Adds a `transcribe_segments` method to the `Transcriber` trait with a default fallback, plus a `WhisperTranscriber` override that extracts real timestamps from the whisper-rs segment iterator. Raises the default `max_upload_bytes` to 200 MB and `request_timeout_ms` to 600 s to support long audio files.
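A default trait method with a backend-specific override might look like the sketch below. The `Segment` type, the `sample_rate` parameter, and both backends are assumptions for illustration; `TickBackend` mimics whisper.cpp, which reports segment boundaries in 10 ms units, but does not call whisper-rs.

```rust
/// One timed piece of a transcript, in seconds (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start: f64,
    pub end: f64,
    pub text: String,
}

pub trait Transcriber {
    fn transcribe(&self, samples: &[f32], sample_rate: u32) -> Result<String, String>;

    /// Default fallback: a single segment spanning the whole clip, so backends
    /// without timing data still produce a valid `verbose_json` response.
    fn transcribe_segments(&self, samples: &[f32], sample_rate: u32) -> Result<Vec<Segment>, String> {
        let text = self.transcribe(samples, sample_rate)?;
        let end = samples.len() as f64 / f64::from(sample_rate);
        Ok(vec![Segment { start: 0.0, end, text }])
    }
}

/// A backend with no timing information; it relies on the default method.
pub struct PlainBackend;

impl Transcriber for PlainBackend {
    fn transcribe(&self, _samples: &[f32], _sample_rate: u32) -> Result<String, String> {
        Ok("plain text".to_string())
    }
}

/// A backend that already has (t0, t1, text) triples in 10 ms ticks.
pub struct TickBackend {
    pub ticks: Vec<(i64, i64, &'static str)>,
}

impl Transcriber for TickBackend {
    fn transcribe(&self, samples: &[f32], sample_rate: u32) -> Result<String, String> {
        // Plain transcription is just the segment texts joined together.
        let segs = self.transcribe_segments(samples, sample_rate)?;
        Ok(segs.iter().map(|s| s.text.as_str()).collect::<Vec<_>>().join(" "))
    }

    /// Override: convert 10 ms ticks to seconds instead of using the fallback.
    fn transcribe_segments(&self, _samples: &[f32], _sample_rate: u32) -> Result<Vec<Segment>, String> {
        Ok(self
            .ticks
            .iter()
            .map(|&(t0, t1, text)| Segment {
                start: t0 as f64 / 100.0,
                end: t1 as f64 / 100.0,
                text: text.to_string(),
            })
            .collect())
    }
}
```

Putting the fallback on the trait keeps the HTTP layer backend-agnostic: it always calls `transcribe_segments` and gets real timestamps only where a backend provides them.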
Force-pushed from a7cae79 to ec9a74d.
Summary
This PR adds an in-process OpenAI-compatible STT HTTP service to Voxtype. The service runs alongside the daemon and reuses the daemon's transcriber, so Voxtype does not load the Whisper model twice when both push-to-talk STT and an API endpoint are needed.
Scope
- `/healthz`, `/v1/audio/transcriptions`, and `/v1/audio/translations`.
- `json`, `text`, and `verbose_json` response handling, including segment timestamps for long-form chunking.

This branch is intentionally independent from macOS support. It is based on `main` and does not include the macOS files from #129. For local Mac testing with both feature sets stacked, use `feature/macos-openai-stt-stack`.

Design Notes
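Handling the three response formats listed in the scope could be sketched as below. This is an illustrative assumption, not the PR's implementation: `ResponseFormat`, `render`, and the hand-rolled JSON escaping (used to keep the example dependency-free; a real service would likely use serde) are hypothetical, and the `verbose_json` body carries only a subset of the fields OpenAI's API returns.

```rust
use std::fmt::Write;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ResponseFormat {
    Json,
    Text,
    VerboseJson,
}

impl ResponseFormat {
    /// Maps the `response_format` form field; a missing field means `json`.
    pub fn parse(value: Option<&str>) -> Result<Self, String> {
        match value.unwrap_or("json") {
            "json" => Ok(Self::Json),
            "text" => Ok(Self::Text),
            "verbose_json" => Ok(Self::VerboseJson),
            other => Err(format!("unsupported response_format: {other}")),
        }
    }
}

/// Hypothetical timed segment, in seconds.
pub struct Segment {
    pub start: f64,
    pub end: f64,
    pub text: String,
}

/// Minimal JSON string escaping: quotes, backslashes, and control characters.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32).unwrap(),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Returns (Content-Type, body) for the requested format.
pub fn render(format: ResponseFormat, segments: &[Segment]) -> (&'static str, String) {
    // Full transcript: trimmed segment texts joined by single spaces.
    let text = segments.iter().map(|s| s.text.trim()).collect::<Vec<_>>().join(" ");
    match format {
        ResponseFormat::Text => ("text/plain; charset=utf-8", text),
        ResponseFormat::Json => ("application/json", format!("{{\"text\":{}}}", json_string(&text))),
        ResponseFormat::VerboseJson => {
            let mut segs = String::new();
            for (i, s) in segments.iter().enumerate() {
                if i > 0 {
                    segs.push(',');
                }
                write!(
                    segs,
                    "{{\"id\":{},\"start\":{},\"end\":{},\"text\":{}}}",
                    i, s.start, s.end, json_string(&s.text)
                )
                .unwrap();
            }
            let body = format!("{{\"text\":{},\"segments\":[{}]}}", json_string(&text), segs);
            ("application/json", body)
        }
    }
}
```

With per-segment `start`/`end` values in the `verbose_json` body, a client that splits long audio into chunks can offset and stitch the timestamps itself.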
Verification
Executed on `mailuefterlin` (`/tmp/voxtype-openai.J5NRzM/repo`) after applying this branch on top of `main`:

Stacked Mac branch verification on `feature/macos-openai-stt-stack`:

Closes #244