feat(voice): port dynamic endpointing to Node.js #1297
Port the dynamic endpointing state machine from the Python SDK so Node.js voice sessions can learn endpointing delays from pauses, interruptions, and agent overlap.
Restore the existing AudioRecognition delay fields and no-arg agent-speech entrypoint as compatibility shims so the dynamic endpointing port does not break the public TypeScript surface.
Fix the generated lint error from the additive AudioRecognition type changes so the pre-push hook accepts the dynamic endpointing port.
Keep the existing ExpFilter, AgentActivity, and AudioRecognition TypeScript signatures additive while still routing the new dynamic endpointing runtime through the Node.js voice stack.
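The description names an ExpFilter but does not show how it smooths learned delays. As a rough illustration only, the sketch below shows generic exponential smoothing of observed pause durations. The class name `ExpFilterSketch`, its constructor, and the weighting convention (`alpha` weights the newest sample) are assumptions made for this example and are not taken from the SDK.

```typescript
// Hypothetical sketch of exponential smoothing; not the SDK's ExpFilter.
class ExpFilterSketch {
  private readonly alpha: number;
  private filtered: number | undefined = undefined;

  constructor(alpha: number) {
    // alpha is the weight given to the newest sample (assumed convention).
    if (!(alpha > 0 && alpha <= 1)) {
      throw new RangeError('alpha must be in (0, 1]');
    }
    this.alpha = alpha;
  }

  // Fold one observation (for example, a pause length in ms) into the estimate.
  apply(sample: number): number {
    this.filtered =
      this.filtered === undefined
        ? sample
        : this.alpha * sample + (1 - this.alpha) * this.filtered;
    return this.filtered;
  }
}

// Usage: the first sample seeds the estimate, later samples pull it gradually.
const pauseFilter = new ExpFilterSketch(0.5);
const first = pauseFilter.apply(1.0);
const second = pauseFilter.apply(0.0);
console.log(first, second);
```

A smaller `alpha` makes the estimate react more slowly to any single pause, which is the usual trade-off when learning a delay from noisy observations.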
🦋 Changeset detected. Latest commit: 66b66eb. The changes in this PR will be included in the next version bump. This PR includes changesets to release 26 packages.
this.onStartOfSpeech(startTime, ev.speechDuration);
const span = this.ensureUserTurnSpan(startTime);
const ctx = this.userTurnContext(span);
otelContext.with(ctx, () => this.hooks.onStartOfSpeech(ev));
🔴 Double overlap speech sentinels sent to interruption detection pipeline
When VAD fires START_OF_SPEECH, the new code in createVadTask calls this.onStartOfSpeech(startTime, ev.speechDuration) at agents/src/voice/audio_recognition.ts:1085, which (when interruption is enabled and agent is speaking) calls this.onStartOfOverlapSpeech(...) at line 389. Then immediately after, this.hooks.onStartOfSpeech(ev) at line 1088 fires the hook into AgentActivity.onStartOfSpeech(ev) (agents/src/voice/agent_activity.ts:1033), which at line 1045 also calls this.audioRecognition.onStartOfOverlapSpeech(...). This sends the overlapSpeechStarted sentinel twice to the interruption detection stream.
The same double-call pattern occurs for END_OF_SPEECH: AudioRecognition.onEndOfSpeech() at line 1119 calls this.onEndOfOverlapSpeech(), and then the hook AgentActivity.onEndOfSpeech() at line 1061 also calls this.audioRecognition.onEndOfOverlapSpeech(). Both calls pass their guards since isAgentSpeaking remains true.
The STT-driven START_OF_SPEECH/END_OF_SPEECH paths (lines 766 and 812) have the same issue.
Duplicate sentinels can corrupt the interruption detection model's state machine, leading to incorrect interruption/non-interruption decisions.
Prompt for agents
The new AudioRecognition.onStartOfSpeech() and AudioRecognition.onEndOfSpeech() methods now handle overlap speech detection internally (calling onStartOfOverlapSpeech / onEndOfOverlapSpeech). However, the existing RecognitionHooks implementations in AgentActivity (onStartOfSpeech at agent_activity.ts:1033 and onEndOfSpeech at agent_activity.ts:1053) still directly call audioRecognition.onStartOfOverlapSpeech / onEndOfOverlapSpeech when interruption detection is enabled. This causes duplicate sentinels to be sent to the interruption detection pipeline for every VAD or STT speech event.
To fix: Remove the direct onStartOfOverlapSpeech / onEndOfOverlapSpeech calls from AgentActivity.onStartOfSpeech (lines 1043-1050 in agent_activity.ts) and AgentActivity.onEndOfSpeech (lines 1059-1064 in agent_activity.ts), since AudioRecognition now handles this internally via its new onStartOfSpeech/onEndOfSpeech methods.
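The duplication and the proposed fix can be sketched in isolation. Everything below is a simplified stand-in: `AudioRecognitionSketch`, `handleVadStart`, `handleVadEnd`, `buggyHooks`, and `fixedHooks` are hypothetical names for this example, and the guard is reduced to a single `isAgentSpeaking` flag. It is meant to show why both call sites pass the guard, not to reproduce the real classes.

```typescript
type Sentinel = 'overlapSpeechStarted' | 'overlapSpeechEnded';

interface RecognitionHooksSketch {
  onStartOfSpeech(): void;
  onEndOfSpeech(): void;
}

// Hypothetical stand-in for AudioRecognition; records every sentinel it emits.
class AudioRecognitionSketch {
  readonly sentinels: Sentinel[] = [];
  isAgentSpeaking = true;
  hooks: RecognitionHooksSketch | undefined = undefined;

  onStartOfOverlapSpeech(): void {
    // The guard stays true for the whole event, so a second call also passes.
    if (this.isAgentSpeaking) this.sentinels.push('overlapSpeechStarted');
  }

  onEndOfOverlapSpeech(): void {
    if (this.isAgentSpeaking) this.sentinels.push('overlapSpeechEnded');
  }

  // Mirrors the reviewed flow: internal overlap handling first, then the hook.
  handleVadStart(): void {
    this.onStartOfOverlapSpeech();
    this.hooks?.onStartOfSpeech();
  }

  handleVadEnd(): void {
    this.onEndOfOverlapSpeech();
    this.hooks?.onEndOfSpeech();
  }
}

// Hooks that still call the overlap methods directly (the reported problem).
function buggyHooks(r: AudioRecognitionSketch): RecognitionHooksSketch {
  return {
    onStartOfSpeech: () => r.onStartOfOverlapSpeech(),
    onEndOfSpeech: () => r.onEndOfOverlapSpeech(),
  };
}

// Hooks with the direct calls removed (the suggested fix).
function fixedHooks(): RecognitionHooksSketch {
  return { onStartOfSpeech: () => {}, onEndOfSpeech: () => {} };
}

function run(
  makeHooks: (r: AudioRecognitionSketch) => RecognitionHooksSketch,
): Sentinel[] {
  const recognition = new AudioRecognitionSketch();
  recognition.hooks = makeHooks(recognition);
  recognition.handleVadStart();
  recognition.handleVadEnd();
  return recognition.sentinels;
}

console.log(run(buggyHooks).length); // 4 sentinels for one speech event
console.log(run(fixedHooks).length); // 2 sentinels for one speech event
```

Keeping the overlap calls in a single owner avoids having to make the sentinel methods idempotent, which would otherwise need extra state to tell a duplicate call from a genuine second overlap.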
const logger = log();

// Ref: source livekit-agents/livekit/agents/voice/endpointing.py - 7-7
🔴 Ref comments use wrong format: 'source' instead of 'python' and missing 'lines' suffix
CLAUDE.md mandates the format // Ref: python <relative-file-path> - <line-range> lines for Python reference comments. All 19 Ref comments added in this PR use the wrong format // Ref: source <path> - <line-range> — using source instead of python and omitting the lines suffix. Compare with the existing correct example at agents/src/stt/stt.ts:77: // Ref: python livekit-agents/livekit/agents/stt/stt.py - 62-68 lines. This instance is representative of all Ref comments in the PR.
- // Ref: source livekit-agents/livekit/agents/voice/endpointing.py - 7-7
+ // Ref: python livekit-agents/livekit/agents/voice/endpointing.py - 7-7 lines
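Because the same correction applies to every Ref comment in the PR, it could be applied mechanically. The helper below is a minimal sketch under that assumption: `fixRefComment` and `WRONG_REF` are hypothetical names, and the pattern only covers the single-line `// Ref: source <path> - <start>-<end>` shape quoted in this review.

```typescript
// Matches the non-conforming form: "// Ref: source <path> - <start>-<end>".
const WRONG_REF = /^(\s*)\/\/ Ref: source (\S+) - (\d+-\d+)\s*$/;

// Rewrites one line to "// Ref: python <path> - <range> lines".
// Lines that do not match the wrong form are returned unchanged.
function fixRefComment(line: string): string {
  const match = WRONG_REF.exec(line);
  if (match === null) return line;
  const [, indent, path, range] = match;
  return `${indent}// Ref: python ${path} - ${range} lines`;
}

const before =
  '// Ref: source livekit-agents/livekit/agents/voice/endpointing.py - 7-7';
console.log(fixRefComment(before));
```

The function is idempotent: a comment that already uses `python` and the `lines` suffix does not match the pattern and passes through untouched.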
}

async onStartOfAgentSpeech() {
  // Ref: source livekit-agents/livekit/agents/voice/audio_recognition.py - 238-243
🔴 Ref comments use wrong format in audio_recognition.ts
Same CLAUDE.md format violation as in endpointing.ts: the comments use // Ref: source instead of // Ref: python and omit the lines suffix. This affects all 4 Ref comments added to this file (lines 338, 347, 380, 392).
- // Ref: source livekit-agents/livekit/agents/voice/audio_recognition.py - 238-243
+ // Ref: python livekit-agents/livekit/agents/voice/audio_recognition.py - 238-243 lines
  this.vad.on('metrics_collected', this.onMetricsCollected);
}

// Ref: source livekit-agents/livekit/agents/voice/agent_activity.py - 768-784
🔴 Ref comments use wrong format in agent_activity.ts
Same CLAUDE.md format violation: the comments use // Ref: source instead of // Ref: python and omit the lines suffix. This affects all 6 Ref comments added to this file (lines 473, 753, 936, 959, 1845, 2127).
- // Ref: source livekit-agents/livekit/agents/voice/agent_activity.py - 768-784
+ // Ref: python livekit-agents/livekit/agents/voice/agent_activity.py - 768-784 lines
};

/** @internal */
// Ref: source livekit-agents/livekit/agents/utils/exp_filter.py - 5-64
🔴 Ref comment uses wrong format in utils.ts
Same CLAUDE.md format violation: the comment uses // Ref: source instead of // Ref: python and omits the lines suffix.
- // Ref: source livekit-agents/livekit/agents/utils/exp_filter.py - 5-64
+ // Ref: python livekit-agents/livekit/agents/utils/exp_filter.py - 5-64 lines
This PR was created by Rosetta.
Tracking issue: https://github.com/livekit/rosetta/issues/99
Summary
Source