WIP: VAD with Whisper and Silero backends #147

Draft
peteonrails wants to merge 1 commit into main from feature/vad-silero-compat

@peteonrails
Owner

Summary

Implements Voice Activity Detection with engine-specific backends:

  • WhisperVad: Uses whisper-rs built-in VAD (GGML Silero model) - works now
  • SileroVad: Uses voice_activity_detector crate for Parakeet builds - blocked on ort compatibility
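One way to picture the engine-specific split is a shared trait with a compile-time choice of backend. The sketch below is an assumption, not the code in this branch: the trait `Vad`, its methods, and `default_backend` are hypothetical names, and both backends are stubs that do not call whisper-rs or voice_activity_detector. Only the type names `WhisperVad` and `SileroVad` and the `parakeet` feature name come from this PR.

```rust
/// Hypothetical common interface for VAD backends (not taken from the branch).
pub trait Vad {
    /// Short backend identifier, useful for logging.
    fn name(&self) -> &'static str;
    /// Returns true if the samples are judged to contain speech.
    fn contains_speech(&mut self, samples: &[f32]) -> bool;
}

/// Stand-in for the backend built on whisper-rs's built-in VAD.
pub struct WhisperVad;

/// Stand-in for the ONNX Silero backend used with Parakeet builds.
pub struct SileroVad;

impl Vad for WhisperVad {
    fn name(&self) -> &'static str {
        "whisper"
    }
    fn contains_speech(&mut self, _samples: &[f32]) -> bool {
        // Stub: pass everything through. A real backend would run the model here.
        true
    }
}

impl Vad for SileroVad {
    fn name(&self) -> &'static str {
        "silero"
    }
    fn contains_speech(&mut self, _samples: &[f32]) -> bool {
        // Stub: pass everything through. A real backend would run the model here.
        true
    }
}

/// Pick the backend at compile time based on the `parakeet` Cargo feature.
#[cfg(feature = "parakeet")]
pub fn default_backend() -> Box<dyn Vad> {
    Box::new(SileroVad)
}

/// Without the `parakeet` feature, fall back to the whisper-rs based backend.
#[cfg(not(feature = "parakeet"))]
pub fn default_backend() -> Box<dyn Vad> {
    Box::new(WhisperVad)
}
```

With this shape, callers only hold a `Box<dyn Vad>`, so the ort-dependent backend can be swapped or left out without touching the call sites.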

Status: Blocked

The voice_activity_detector 0.2.0 crate uses an older ort API that conflicts with the ort version required by parakeet-rs. When the parakeet feature is enabled, the build fails with API mismatch errors.

Unblocking options

  1. Wait for voice_activity_detector to update for newer ort versions
  2. Find an alternative Silero ONNX implementation compatible with our ort version
  3. Contribute ort compatibility fix upstream to voice_activity_detector

Related

  • See feature/vad-silence-filter for the working energy-based VAD (no external dependencies)
  • WhisperVad from this branch can be cherry-picked independently since it has no ort dependency
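For context on what "energy-based" means here, the following is a minimal sketch of a dependency-free silence filter. It is not the code from feature/vad-silence-filter: the function names, the frame length, and the thresholds are all illustrative assumptions. It computes the RMS energy of fixed-size frames and treats the clip as silence when too few frames exceed a threshold.

```rust
/// Root-mean-square energy of one frame; an empty frame has zero energy.
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Fraction of frames whose RMS energy is at or above `threshold`.
pub fn speech_ratio(samples: &[f32], frame_len: usize, threshold: f32) -> f32 {
    // Guard against empty input and a zero frame length (chunks(0) would panic).
    if samples.is_empty() || frame_len == 0 {
        return 0.0;
    }
    let mut total = 0usize;
    let mut active = 0usize;
    for frame in samples.chunks(frame_len) {
        total += 1;
        if rms(frame) >= threshold {
            active += 1;
        }
    }
    active as f32 / total as f32
}

/// Treat the clip as silence when fewer than `min_ratio` of its frames are active.
pub fn is_silence(samples: &[f32], frame_len: usize, threshold: f32, min_ratio: f32) -> bool {
    speech_ratio(samples, frame_len, threshold) < min_ratio
}
```

A fixed threshold like this is cheap but sensitive to microphone gain and background noise, which is the usual motivation for a model-based VAD such as Silero.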

Implements Voice Activity Detection with two backends:
- WhisperVad: Uses whisper-rs built-in VAD (GGML Silero model)
- SileroVad: Uses voice_activity_detector crate (ONNX Silero model)

WhisperVad works, but SileroVad is blocked on ort API compatibility.
The voice_activity_detector 0.2.0 crate uses an older ort API that
conflicts with parakeet-rs's ort version.

This branch preserves the work for when:
- voice_activity_detector updates to support newer ort
- Or we find an alternative Silero ONNX implementation

See feature/vad-silence-filter for the working energy-based VAD.
@peteonrails
Owner Author

Update: ort compatibility may be unblocked

The silero-vad-rust crate (v6.2.1) uses ort 2.0.0-rc.10, which appears to be compatible with our ort 2.0.0-rc.11. It is a different crate from voice_activity_detector and should not have the same API conflict.

This could replace the blocked voice_activity_detector dependency and give us Silero VAD in ONNX binaries. Worth spiking as a replacement.
