feat: add Deepgram streaming transcription mode by sjawhar · Pull Request #280 · peteonrails/voxtype

sjawhar · 2026-03-23T01:56:36Z

Summary

Adds real-time streaming transcription via Deepgram's WebSocket API as a new whisper mode (streaming). Audio is sent to Deepgram and transcribed while you speak; the transcript is output once recording stops.

What's included

Core streaming client (src/transcribe/deepgram.rs):

WebSocket client using the deepgram crate
PCM audio encoding and streaming
Final transcript extraction from Deepgram responses
Drop impl that aborts the background WS task on cancel
Error tracking for mid-stream disconnects
5-second timeout on stream finish to prevent daemon hangs
UTF-8-safe transcript preview truncation
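
Two items in the list above, PCM encoding and UTF-8-safe preview truncation, can be sketched with the standard library alone. This is a minimal illustration, not the code in src/transcribe/deepgram.rs: the function names encode_pcm16_le and truncate_preview are hypothetical, and the sketch assumes f32 input samples in [-1.0, 1.0] and 16-bit little-endian PCM output.

```rust
/// Hypothetical helper: converts f32 samples (assumed to lie in [-1.0, 1.0])
/// into 16-bit signed little-endian PCM bytes.
fn encode_pcm16_le(samples: &[f32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(samples.len() * 2);
    for &sample in samples {
        // Clamp first so out-of-range input cannot wrap around.
        let clamped = sample.clamp(-1.0, 1.0);
        let value = (clamped * i16::MAX as f32) as i16;
        bytes.extend_from_slice(&value.to_le_bytes());
    }
    bytes
}

/// Hypothetical helper: returns a prefix of `text` that is at most
/// `max_bytes` long and ends on a UTF-8 character boundary, so the slice
/// never panics in the middle of a multi-byte character.
fn truncate_preview(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    // Walk back from the byte limit to the nearest character boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}
```

Slicing a &str at an arbitrary byte offset panics when the offset falls inside a multi-byte character, which is why the truncation walks back to a boundary instead of slicing at max_bytes directly.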

Daemon integration (src/daemon.rs):

build_deepgram_config() with configurable sample rate passthrough
Streaming recording paths in all 3 recording-start handlers (PTT, toggle, SIGUSR1)
Model loading guard (skips local Whisper model prep in streaming mode)
Audio capture cleanup on stream setup failure
Rustls crypto provider init gated on streaming mode

Configuration (src/config.rs, src/cli.rs, src/main.rs):

mode = "streaming" in config file
--whisper-mode streaming CLI flag
VOXTYPE_WHISPER_MODE environment variable
--streaming-model, --streaming-endpoint, --streaming-api-key flags
VOXTYPE_DEEPGRAM_API_KEY environment variable
Streaming fields visible in voxtype config output (API key masked)
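
As an illustration of the masked API key in voxtype config output, the sketch below hides all but the last four characters. The function name mask_api_key and the exact masking format are assumptions for this example; the PR does not specify how the key is rendered.

```rust
/// Hypothetical helper: replaces every character of `key` with '*' except
/// the last four. Keys of four characters or fewer are masked entirely.
fn mask_api_key(key: &str) -> String {
    const VISIBLE: usize = 4;
    // Count characters, not bytes, so non-ASCII keys are handled safely.
    let total = key.chars().count();
    if total <= VISIBLE {
        return "*".repeat(total);
    }
    let tail: String = key.chars().skip(total - VISIBLE).collect();
    format!("{}{}", "*".repeat(total - VISIBLE), tail)
}
```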

Documentation:

docs/CONFIGURATION.md - streaming config reference with examples
docs/USER_MANUAL.md - setup guide, usage, comparison with local mode
docs/TROUBLESHOOTING.md - common streaming errors and fixes

Usage

# ~/.config/voxtype/config.toml
[whisper]
mode = "streaming"
streaming_api_key = "your-deepgram-api-key"

Or via environment:

export VOXTYPE_WHISPER_MODE=streaming
export VOXTYPE_DEEPGRAM_API_KEY="your-key"
voxtype daemon

Testing

538 tests pass (cargo test --lib)
No changes to existing eager processing, local, remote, or CLI transcription modes
New tests: UTF-8 truncation, sample rate propagation, config parsing

Also fixes audio clipping at recording start and end:

send_notification() changed from async/await to fire-and-forget via tokio::spawn, eliminating 50-200 ms of D-Bus latency before audio capture
finish_streaming_recording() now sends final audio samples to Deepgram before closing, instead of discarding them
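
The end-of-recording fix amounts to flushing any buffered samples before the stream is closed. The sketch below shows that ordering with a std::sync::mpsc channel standing in for the WebSocket connection. StreamingSession and its methods are hypothetical names for illustration and do not reflect the actual finish_streaming_recording() implementation.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for a streaming session. `tx` plays the role of
/// the WebSocket sink; `pending` holds samples captured but not yet sent.
struct StreamingSession {
    tx: Sender<Vec<i16>>,
    pending: Vec<i16>,
}

impl StreamingSession {
    /// Creates a session plus the receiving end that models the remote side.
    fn new() -> (Self, Receiver<Vec<i16>>) {
        let (tx, rx) = channel();
        (Self { tx, pending: Vec::new() }, rx)
    }

    /// Buffers newly captured samples.
    fn push_samples(&mut self, samples: &[i16]) {
        self.pending.extend_from_slice(samples);
    }

    /// Sends whatever is buffered as one chunk, leaving the buffer empty.
    fn flush(&mut self) {
        if !self.pending.is_empty() {
            let chunk = std::mem::take(&mut self.pending);
            // A send error means the receiver is gone; ignored in this sketch.
            let _ = self.tx.send(chunk);
        }
    }

    /// Flushes the tail of the recording, then closes the stream. Taking
    /// `self` by value drops the sender on return, which closes the channel.
    fn finish(mut self) {
        self.flush();
    }
}
```

Consuming self in finish() makes the ordering hard to get wrong: the sender can only be dropped after the final flush has run.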

sjawhar requested a review from peteonrails as a code owner on March 23, 2026 01:56

sjawhar force-pushed the feat/deepgram-streaming branch 3 times, most recently from 1b8a5af to edc7e0e on March 29, 2026 04:47

sjawhar force-pushed the feat/deepgram-streaming branch from edc7e0e to e884cca on April 13, 2026 01:20

sjawhar wants to merge 1 commit into peteonrails:main from sjawhar:feat/deepgram-streaming