
feat: add Deepgram streaming transcription mode #280

Open
sjawhar wants to merge 1 commit into peteonrails:main from sjawhar:feat/deepgram-streaming


sjawhar (contributor) commented Mar 23, 2026

Summary

Adds real-time streaming transcription through Deepgram's WebSocket API as a new whisper mode, "streaming". Audio is sent to Deepgram and transcribed while you speak; the transcript is output after recording stops.

What's included

Core streaming client (src/transcribe/deepgram.rs):

  • WebSocket client using the deepgram crate
  • PCM audio encoding and streaming
  • Final transcript extraction from Deepgram responses
  • Drop impl that aborts the background WebSocket task when recording is cancelled
  • Error tracking for mid-stream disconnects
  • 5-second timeout on stream finish to prevent daemon hangs
  • UTF-8-safe transcript preview truncation
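Two items in the list above, PCM encoding and UTF-8-safe preview truncation, are small enough to illustrate in isolation. The following is a sketch in plain Rust, not the PR's code: it assumes capture produces `f32` samples in [-1.0, 1.0] and that audio is sent to Deepgram as `linear16` (16-bit signed little-endian PCM). The names `encode_linear16` and `truncate_preview` are hypothetical.

```rust
/// Hypothetical helper: convert f32 samples in [-1.0, 1.0] to 16-bit signed
/// little-endian PCM bytes. Out-of-range samples are clamped, not wrapped.
fn encode_linear16(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        let clamped = s.clamp(-1.0, 1.0);
        let value = (clamped * i16::MAX as f32) as i16;
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}

/// Hypothetical helper: truncate `text` to at most `max_chars` characters
/// without splitting a multi-byte UTF-8 sequence, appending "..." if cut.
fn truncate_preview(text: &str, max_chars: usize) -> String {
    match text.char_indices().nth(max_chars) {
        // `idx` is the byte offset of the first character past the limit,
        // which is always a valid char boundary.
        Some((idx, _)) => format!("{}...", &text[..idx]),
        None => text.to_string(),
    }
}

fn main() {
    let pcm = encode_linear16(&[0.0, 1.0, -1.0]);
    println!("{} bytes: {:?}", pcm.len(), pcm);
    println!("{}", truncate_preview("héllo wörld", 5)); // prints "héllo..."
}
```

Slicing at the byte offset returned by `char_indices()` is what keeps the truncation safe: a naive `&text[..n]` panics if byte `n` falls inside a multi-byte character.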

Daemon integration (src/daemon.rs):

  • build_deepgram_config() with configurable sample rate passthrough
  • Streaming recording paths in all three recording-start handlers (push-to-talk, toggle, SIGUSR1)
  • Model loading guard (skips local Whisper model prep in streaming mode)
  • Audio capture cleanup on stream setup failure
  • Rustls crypto provider initialized only in streaming mode
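As a rough sketch of the daemon-side pieces, the code below shows a config builder that copies the capture sample rate into the streaming config, plus the check that skips local Whisper model preparation. All types here (`WhisperMode`, `AudioSettings`, `StreamingSettings`, `DeepgramStreamConfig`) are hypothetical, and this `build_deepgram_config` borrows only the name of the PR's function, not its signature. Passing the rate through matters because raw PCM carries no header, so Deepgram can only interpret it at the rate it is told.

```rust
// Hypothetical mode enum; only the variants needed for this sketch are shown.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WhisperMode {
    Local,
    Streaming,
}

// Hypothetical capture settings.
struct AudioSettings {
    sample_rate: u32,
}

// Hypothetical streaming settings as read from config/CLI/env.
struct StreamingSettings {
    api_key: String,
    model: String,
}

// Hypothetical connection parameters handed to the streaming client.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
struct DeepgramStreamConfig {
    api_key: String,
    model: String,
    sample_rate: u32,
}

/// Copy the capture sample rate through unchanged instead of hardcoding it.
fn build_deepgram_config(audio: &AudioSettings, streaming: &StreamingSettings) -> DeepgramStreamConfig {
    DeepgramStreamConfig {
        api_key: streaming.api_key.clone(),
        model: streaming.model.clone(),
        sample_rate: audio.sample_rate,
    }
}

/// Model loading guard: streaming mode never needs a local Whisper model.
fn should_prepare_local_model(mode: WhisperMode) -> bool {
    !matches!(mode, WhisperMode::Streaming)
}

fn main() {
    let audio = AudioSettings { sample_rate: 48_000 };
    let streaming = StreamingSettings {
        api_key: "your-deepgram-api-key".to_string(),
        model: "example-model".to_string(),
    };
    let cfg = build_deepgram_config(&audio, &streaming);
    println!("model={} sample_rate={}", cfg.model, cfg.sample_rate);

    let mode = WhisperMode::Streaming;
    if should_prepare_local_model(mode) {
        println!("preparing local Whisper model");
    } else {
        println!("streaming mode: skipping local model prep");
    }
}
```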

Configuration (src/config.rs, src/cli.rs, src/main.rs):

  • mode = "streaming" in config file
  • --whisper-mode streaming CLI flag
  • VOXTYPE_WHISPER_MODE environment variable
  • --streaming-model, --streaming-endpoint, --streaming-api-key flags
  • VOXTYPE_DEEPGRAM_API_KEY environment variable
  • Streaming fields visible in voxtype config output (API key masked)
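Because the API key can come from a flag, an environment variable, or the config file, the daemon needs a precedence rule, and `voxtype config` needs to print the key without exposing it. The PR does not spell out either rule, so the sketch below assumes the common convention (CLI flag over `VOXTYPE_DEEPGRAM_API_KEY` over the config file) and an illustrative masking format that keeps only the last four characters. `resolve_api_key` and `mask_api_key` are hypothetical helpers.

```rust
use std::env;

/// Assumed precedence: CLI flag, then environment variable, then config file.
/// Blank values are skipped so an empty flag does not shadow a real key.
fn resolve_api_key(cli: Option<&str>, env_value: Option<&str>, file: Option<&str>) -> Option<String> {
    [cli, env_value, file]
        .iter()
        .flatten()
        .map(|k| k.trim())
        .find(|k| !k.is_empty())
        .map(|k| k.to_string())
}

/// Illustrative masking: reveal only the last four characters.
fn mask_api_key(key: &str) -> String {
    let len = key.chars().count();
    if len <= 4 {
        return "****".to_string();
    }
    let tail: String = key.chars().skip(len - 4).collect();
    format!("****{}", tail)
}

fn main() {
    let env_key = env::var("VOXTYPE_DEEPGRAM_API_KEY").ok();
    let key = resolve_api_key(None, env_key.as_deref(), Some("key-from-config-file"));
    match key {
        Some(k) => println!("API key: {}", mask_api_key(&k)),
        None => println!("API key: (not set)"),
    }
}
```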

Documentation:

  • docs/CONFIGURATION.md - streaming config reference with examples
  • docs/USER_MANUAL.md - setup guide, usage, comparison with local mode
  • docs/TROUBLESHOOTING.md - common streaming errors and fixes

Usage

```toml
# ~/.config/voxtype/config.toml
[whisper]
mode = "streaming"
streaming_api_key = "your-deepgram-api-key"
```

Or via environment:

```bash
export VOXTYPE_WHISPER_MODE=streaming
export VOXTYPE_DEEPGRAM_API_KEY="your-key"
voxtype daemon
```

Testing

  • 538 tests pass (cargo test --lib)
  • No changes to existing eager processing, local, remote, or CLI transcription modes
  • New tests: UTF-8 truncation, sample rate propagation, config parsing

sjawhar requested a review from peteonrails (code owner) on March 23, 2026 01:56
sjawhar force-pushed the feat/deepgram-streaming branch 3 times, most recently from 1b8a5af to edc7e0e on March 29, 2026 04:47
Also fixes audio clipping at the start and end of recording:
- send_notification() is now fire-and-forget via tokio::spawn instead of being awaited, removing 50-200 ms of D-Bus latency before audio capture starts
- finish_streaming_recording() now sends the final audio samples to Deepgram before closing the stream, instead of discarding them
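The second fix comes down to ordering: the tail of the capture buffer has to reach the stream before the sending side is closed, and the daemon should not wait forever for the final transcript. The sketch below models this with `std::thread` and `std::sync::mpsc` instead of tokio and the deepgram crate; `StreamingSession`, `finish`, and the worker that merely counts samples are hypothetical stand-ins for the real WebSocket task.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a streaming session: audio chunks go to a
/// background worker, which reports a result once its input channel closes.
struct StreamingSession {
    audio_tx: Sender<Vec<f32>>,
    result_rx: Receiver<usize>,
}

impl StreamingSession {
    fn start() -> Self {
        let (audio_tx, audio_rx) = mpsc::channel::<Vec<f32>>();
        let (result_tx, result_rx) = mpsc::channel();
        thread::spawn(move || {
            // Stand-in for the WebSocket task: count every sample received.
            // `iter()` ends only after all senders are dropped.
            let total: usize = audio_rx.iter().map(|chunk| chunk.len()).sum();
            // The receiver may be gone if `finish` already timed out.
            let _ = result_tx.send(total);
        });
        StreamingSession { audio_tx, result_rx }
    }

    fn send(&self, chunk: Vec<f32>) {
        let _ = self.audio_tx.send(chunk);
    }

    /// Queue the remaining buffered samples *before* closing the stream,
    /// then wait a bounded time for the result instead of blocking forever.
    fn finish(self, tail: Vec<f32>, timeout: Duration) -> Result<usize, RecvTimeoutError> {
        if !tail.is_empty() {
            let _ = self.audio_tx.send(tail);
        }
        drop(self.audio_tx); // closing the channel lets the worker finish
        self.result_rx.recv_timeout(timeout)
    }
}

fn main() {
    let session = StreamingSession::start();
    session.send(vec![0.0; 1600]);
    session.send(vec![0.0; 1600]);
    // Samples captured after the last full chunk, e.g. just before key release.
    let tail = vec![0.0; 400];
    match session.finish(tail, Duration::from_secs(5)) {
        Ok(n) => println!("stream received {} samples", n),
        Err(e) => eprintln!("stream finish failed: {:?}", e),
    }
}
```

Dropping the sender only after the tail has been queued ensures the worker sees every sample, and `recv_timeout` gives a bounded wait comparable to the PR's 5-second finish timeout, although the PR's daemon is async (tokio) rather than thread-based.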
sjawhar force-pushed the feat/deepgram-streaming branch from edc7e0e to e884cca on April 13, 2026 01:20