voice: add opt-in audio turn detector hook #5514

Draft
piyush-gambhir wants to merge 1 commit into livekit:main from piyush-gambhir:feat/audio-turn-detector-hook
Conversation

@piyush-gambhir
Contributor

Summary

This PR adds an opt-in extension point for audio-native turn detectors in livekit-agents.

Today, custom turn detectors operate on chat or transcript context only. This change adds:

  • AudioTurnContext, a public container for buffered current-turn audio plus transcript and chat context
  • AudioTurnDetector, a public protocol for audio-native end-of-turn detectors
  • current-turn PCM buffering in AudioRecognition
  • audio-detector invocation in the existing end-of-utterance path before endpointing and commit
  • a focused example agent and unit coverage for the new hook
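
To make the shape of the hook concrete, here is a minimal sketch of what a context container and detector protocol along these lines could look like. Only the names AudioTurnContext and AudioTurnDetector come from this PR; the fields, the `predict_end_of_turn` method name, the `PcmFrame` stand-in, and the toy `MinDurationDetector` are assumptions made for illustration and may not match the actual diff.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Protocol, runtime_checkable


@dataclass
class PcmFrame:
    """Hypothetical stand-in for an audio frame holding 16-bit PCM bytes."""

    data: bytes
    sample_rate: int
    num_channels: int = 1


@dataclass
class AudioTurnContext:
    """Illustrative container; the PR's real field names may differ."""

    frames: List[PcmFrame] = field(default_factory=list)  # buffered current-turn audio
    transcript: str = ""  # transcript of the current turn so far
    chat_messages: List[str] = field(default_factory=list)  # simplified chat context

    def duration_seconds(self) -> float:
        """Total buffered audio length, assuming 2 bytes per sample per channel."""
        total = 0.0
        for frame in self.frames:
            samples_per_channel = len(frame.data) // (2 * frame.num_channels)
            total += samples_per_channel / frame.sample_rate
        return total


@runtime_checkable
class AudioTurnDetector(Protocol):
    """Assumed protocol shape: return an end-of-turn probability in [0, 1]."""

    async def predict_end_of_turn(self, ctx: AudioTurnContext) -> float: ...


class MinDurationDetector:
    """Toy detector: reports end of turn once enough audio has been buffered.

    A real detector would run a model over the audio instead.
    """

    def __init__(self, min_seconds: float = 0.5) -> None:
        self.min_seconds = min_seconds

    async def predict_end_of_turn(self, ctx: AudioTurnContext) -> float:
        return 1.0 if ctx.duration_seconds() >= self.min_seconds else 0.0
```

Because the protocol is structural, a detector in this sketch does not need to inherit from anything; it only has to provide the expected coroutine method.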

Motivation

Some turn detectors operate directly on audio rather than text. Those detectors cannot currently plug into the existing custom turn-detection API because only ChatContext is exposed to custom detectors.

This PR adds the minimal generic hook needed to support that class of detector without introducing any provider-specific dependency.

Scope

  • no default behavior change
  • existing text/chat-context turn detectors continue to work unchanged
  • no built-in vendor or model integration is added in this PR
  • audio is exposed as buffered current-turn frames; model-specific preprocessing remains the responsibility of the detector implementation
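
As an example of the preprocessing a detector implementation might own, the sketch below converts buffered 16-bit little-endian PCM chunks into mono float samples in [-1, 1). The helper name `pcm16_to_mono_floats` and the assumption that frames arrive as interleaved 16-bit PCM bytes are illustrative and not taken from this PR; a real detector would apply whatever resampling and normalization its model expects.

```python
import sys
from array import array
from typing import Iterable, List


def pcm16_to_mono_floats(chunks: Iterable[bytes], num_channels: int = 1) -> List[float]:
    """Join raw PCM chunks, downmix interleaved channels, and scale to [-1, 1).

    Assumes signed 16-bit little-endian samples (hypothetical input format).
    """
    raw = b"".join(chunks)
    raw = raw[: len(raw) - (len(raw) % 2)]  # drop a dangling odd byte, if any

    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian; fix up on big-endian hosts

    mono: List[float] = []
    # Walk the buffer one interleaved sample group at a time and average the channels.
    for start in range(0, len(samples) - num_channels + 1, num_channels):
        group = samples[start : start + num_channels]
        mono.append(sum(group) / (num_channels * 32768.0))
    return mono
```

Keeping this step inside the detector means the hook itself stays model-agnostic, which matches the scope described above.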

Validation

  • make check
  • uv run pytest tests/test_agent_session.py tests/test_audio_recognition_handoff.py tests/test_audio_turn_detection.py -q
  • uv run python -m py_compile examples/voice_agents/audio_turn_detector.py
  • uv run examples/voice_agents/audio_turn_detector.py --help

Notes

This draft intentionally keeps the change generic and provider-agnostic. A follow-up plugin PR can implement a concrete detector on top of this hook.

This is also intended as a narrower, lower-risk framing of audio-based turn detection than the earlier feature request in #3094.
