feat(voice): port dynamic endpointing to Node.js #1297

Closed
u9g wants to merge 4 commits into main from rosetta/issue-99

Conversation

@u9g
Contributor

@u9g u9g commented Apr 22, 2026

This PR was created by Rosetta.

Tracking issue: https://github.com/livekit/rosetta/issues/99

Summary

  • port the Python dynamic endpointing state machine and endpointing factory into the Node.js voice turn-config runtime
  • wire AudioRecognition and AgentActivity into the new runtime so pauses, interruptions, and agent overlap update endpointing delays in live sessions
  • add direct parity tests for the endpointing contract plus Node-specific AudioRecognition integration coverage
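
As a rough illustration of what learning an endpointing delay from observed pauses can look like, here is a minimal sketch based on exponential smoothing. It is not the code from this PR: the `SmoothedDelay` class, its parameters, and the clamping bounds are all hypothetical, and the real state machine also reacts to interruptions and agent overlap, which this sketch leaves out.

```typescript
// Hypothetical sketch only: names and parameters are assumptions, not the PR's API.
class SmoothedDelay {
  private value: number | undefined;

  constructor(
    private readonly alpha: number, // weight kept on the previous estimate, between 0 and 1
    private readonly minDelayMs: number,
    private readonly maxDelayMs: number,
  ) {}

  /** Fold one observed pause (in ms) into the estimate and return the clamped delay. */
  update(observedPauseMs: number): number {
    const blended =
      this.value === undefined
        ? observedPauseMs // first sample seeds the estimate
        : this.alpha * this.value + (1 - this.alpha) * observedPauseMs;
    // Clamp so a single outlier cannot push the delay outside the allowed window.
    this.value = Math.min(this.maxDelayMs, Math.max(this.minDelayMs, blended));
    return this.value;
  }
}

const exampleDelay = new SmoothedDelay(0.5, 200, 1000);
const afterFirstPause = exampleDelay.update(400);
const afterSecondPause = exampleDelay.update(800);
console.log(afterFirstPause, afterSecondPause);
```

A higher `alpha` makes the delay adapt more slowly to each new pause, and the clamp keeps the learned value within a fixed range.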

Source

Rosetta Bot added 4 commits April 22, 2026 20:05
  • Port the dynamic endpointing state machine from the Python SDK so Node.js voice sessions can learn endpointing delays from pauses, interruptions, and agent overlap.
  • Restore the existing AudioRecognition delay fields and no-arg agent-speech entrypoint as compatibility shims so the dynamic endpointing port does not break the public TypeScript surface.
  • Fix the generated lint error from the additive AudioRecognition type changes so the pre-push hook accepts the dynamic endpointing port.
  • Keep the existing ExpFilter, AgentActivity, and AudioRecognition TypeScript signatures additive while still routing the new dynamic endpointing runtime through the Node.js voice stack.
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Rosetta Bot does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@changeset-bot

changeset-bot Bot commented Apr 22, 2026

🦋 Changeset detected

Latest commit: 66b66eb

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 26 packages
Name Type
@livekit/agents Major
@livekit/agents-plugin-anam Major
@livekit/agents-plugin-assemblyai Major
@livekit/agents-plugin-baseten Major
@livekit/agents-plugin-bey Major
@livekit/agents-plugin-cartesia Major
@livekit/agents-plugin-cerebras Major
@livekit/agents-plugin-deepgram Major
@livekit/agents-plugin-elevenlabs Major
@livekit/agents-plugin-google Major
@livekit/agents-plugin-hedra Major
@livekit/agents-plugin-inworld Major
@livekit/agents-plugin-lemonslice Major
@livekit/agents-plugin-livekit Major
@livekit/agents-plugin-mistral Major
@livekit/agents-plugin-neuphonic Major
@livekit/agents-plugin-openai Major
@livekit/agents-plugin-phonic Major
@livekit/agents-plugin-resemble Major
@livekit/agents-plugin-rime Major
@livekit/agents-plugin-runway Major
@livekit/agents-plugin-sarvam Major
@livekit/agents-plugin-silero Major
@livekit/agents-plugins-test Major
@livekit/agents-plugin-trugen Major
@livekit/agents-plugin-xai Major

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 5 potential issues.

View 6 additional findings in Devin Review.

Comment on lines +1085 to 1088
this.onStartOfSpeech(startTime, ev.speechDuration);
const span = this.ensureUserTurnSpan(startTime);
const ctx = this.userTurnContext(span);
otelContext.with(ctx, () => this.hooks.onStartOfSpeech(ev));
🔴 Double overlap speech sentinels sent to interruption detection pipeline

When VAD fires START_OF_SPEECH, the new code in createVadTask calls this.onStartOfSpeech(startTime, ev.speechDuration) at agents/src/voice/audio_recognition.ts:1085. When interruption is enabled and the agent is speaking, that method calls this.onStartOfOverlapSpeech(...) at line 389. Immediately afterwards, this.hooks.onStartOfSpeech(ev) at line 1088 fires the hook into AgentActivity.onStartOfSpeech(ev) (agents/src/voice/agent_activity.ts:1033), which at line 1045 also calls this.audioRecognition.onStartOfOverlapSpeech(...). As a result, the overlapSpeechStarted sentinel is sent to the interruption detection stream twice.

The same double-call pattern occurs for END_OF_SPEECH: AudioRecognition.onEndOfSpeech() at line 1119 calls this.onEndOfOverlapSpeech(), and then the hook AgentActivity.onEndOfSpeech() at line 1061 also calls this.audioRecognition.onEndOfOverlapSpeech(). Both calls pass their guards since isAgentSpeaking remains true.

The STT-driven START_OF_SPEECH/END_OF_SPEECH paths (lines 766 and 812) have the same issue.

Duplicate sentinels can corrupt the interruption detection model's state machine, leading to incorrect interruption/non-interruption decisions.

Prompt for agents
The new AudioRecognition.onStartOfSpeech() and AudioRecognition.onEndOfSpeech() methods now handle overlap speech detection internally (calling onStartOfOverlapSpeech / onEndOfOverlapSpeech). However, the existing RecognitionHooks implementations in AgentActivity (onStartOfSpeech at agent_activity.ts:1033 and onEndOfSpeech at agent_activity.ts:1053) still directly call audioRecognition.onStartOfOverlapSpeech / onEndOfOverlapSpeech when interruption detection is enabled. This causes duplicate sentinels to be sent to the interruption detection pipeline for every VAD or STT speech event.

To fix: Remove the direct onStartOfOverlapSpeech / onEndOfOverlapSpeech calls from AgentActivity.onStartOfSpeech (lines 1043-1050 in agent_activity.ts) and AgentActivity.onEndOfSpeech (lines 1059-1064 in agent_activity.ts), since AudioRecognition now handles this internally via its new onStartOfSpeech/onEndOfSpeech methods.
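
The double call and the proposed fix can be reproduced in isolation with a minimal sketch. The `Recognition` class below is a hypothetical stand-in, not the real AudioRecognition; only the method names mirror the ones discussed above, and the hook is reduced to a plain callback.

```typescript
// Hypothetical stand-in for AudioRecognition; not the real class from the PR.
class Recognition {
  readonly sent: string[] = [];
  isAgentSpeaking = true;

  constructor(private readonly hook: (self: Recognition) => void) {}

  onStartOfOverlapSpeech(): void {
    // The guard passes on every call while the agent keeps speaking.
    if (this.isAgentSpeaking) this.sent.push('overlapSpeechStarted');
  }

  // Handles overlap internally, then fires the hook (as in the new code path).
  onStartOfSpeech(): void {
    this.onStartOfOverlapSpeech();
    this.hook(this);
  }
}

// Before the fix: the hook repeats the overlap call, so the sentinel is sent twice.
const withDuplicateCall = new Recognition((r) => r.onStartOfOverlapSpeech());
withDuplicateCall.onStartOfSpeech();

// After the fix: the hook leaves overlap handling to Recognition.
const withSingleOwner = new Recognition(() => {});
withSingleOwner.onStartOfSpeech();

console.log(withDuplicateCall.sent.length, withSingleOwner.sent.length);
```

Giving one component sole ownership of the sentinel avoids relying on the guard, which cannot tell a repeated call from a genuine new overlap.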

const logger = log();

// Ref: source livekit-agents/livekit/agents/voice/endpointing.py - 7-7
🔴 Ref comments use wrong format: 'source' instead of 'python' and missing 'lines' suffix

CLAUDE.md mandates the format // Ref: python <relative-file-path> - <line-range> lines for Python reference comments. All 19 Ref comments added in this PR use the wrong format // Ref: source <path> - <line-range> — using source instead of python and omitting the lines suffix. Compare with the existing correct example at agents/src/stt/stt.ts:77: // Ref: python livekit-agents/livekit/agents/stt/stt.py - 62-68 lines. This instance is representative of all Ref comments in the PR.

Suggested change
- // Ref: source livekit-agents/livekit/agents/voice/endpointing.py - 7-7
+ // Ref: python livekit-agents/livekit/agents/voice/endpointing.py - 7-7 lines
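
Since the same correction applies to every Ref comment in the PR, a small checker could catch the pattern mechanically. The sketch below is an assumption-laden illustration: `isValidRef` and `fixRef` are hypothetical helpers, and the regular expressions are inferred only from the two examples quoted above (a path without spaces and a `start-end` line range), not from the full CLAUDE.md text.

```typescript
// Hypothetical helpers; the patterns are inferred from the examples in this review.
const REF_PATTERN = /^\/\/ Ref: python \S+ - \d+-\d+ lines$/;

/** True when the comment matches "// Ref: python <path> - <start>-<end> lines". */
function isValidRef(comment: string): boolean {
  return REF_PATTERN.test(comment.trim());
}

/** Rewrite "// Ref: source <path> - <range>" into the python form; leave anything else as-is. */
function fixRef(comment: string): string {
  const match = /^\/\/ Ref: source (\S+) - (\d+-\d+)(?: lines)?$/.exec(comment.trim());
  return match ? `// Ref: python ${match[1]} - ${match[2]} lines` : comment;
}

const flagged = '// Ref: source livekit-agents/livekit/agents/voice/endpointing.py - 7-7';
console.log(isValidRef(flagged), fixRef(flagged));
```

Running such a check over the changed files would cover all of the affected comments in one pass rather than one suggestion at a time.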

}

async onStartOfAgentSpeech() {
// Ref: source livekit-agents/livekit/agents/voice/audio_recognition.py - 238-243
🔴 Ref comments use wrong format in audio_recognition.ts

Same CLAUDE.md format violation as in endpointing.ts — uses // Ref: source instead of // Ref: python and missing lines suffix. This affects all 4 Ref comments added to this file (lines 338, 347, 380, 392).

Suggested change
- // Ref: source livekit-agents/livekit/agents/voice/audio_recognition.py - 238-243
+ // Ref: python livekit-agents/livekit/agents/voice/audio_recognition.py - 238-243 lines

this.vad.on('metrics_collected', this.onMetricsCollected);
}

// Ref: source livekit-agents/livekit/agents/voice/agent_activity.py - 768-784
🔴 Ref comments use wrong format in agent_activity.ts

Same CLAUDE.md format violation — uses // Ref: source instead of // Ref: python and missing lines suffix. This affects all 6 Ref comments added to this file (lines 473, 753, 936, 959, 1845, 2127).

Suggested change
- // Ref: source livekit-agents/livekit/agents/voice/agent_activity.py - 768-784
+ // Ref: python livekit-agents/livekit/agents/voice/agent_activity.py - 768-784 lines

Comment thread agents/src/utils.ts
};

/** @internal */
// Ref: source livekit-agents/livekit/agents/utils/exp_filter.py - 5-64
🔴 Ref comment uses wrong format in utils.ts

Same CLAUDE.md format violation — uses // Ref: source instead of // Ref: python and missing lines suffix.

Suggested change
- // Ref: source livekit-agents/livekit/agents/utils/exp_filter.py - 5-64
+ // Ref: python livekit-agents/livekit/agents/utils/exp_filter.py - 5-64 lines

@u9g u9g closed this Apr 22, 2026
@u9g u9g deleted the rosetta/issue-99 branch April 22, 2026 23:46
2 participants