feat(voice): port dynamic endpointing to Node.js by u9g · Pull Request #1297 · livekit/agents-js

u9g · 2026-04-22T20:14:17Z

This PR was created by Rosetta.

Tracking issue: https://github.com/livekit/rosetta/issues/99

Summary

- Port the Python dynamic endpointing state machine and endpointing factory into the Node.js voice turn-config runtime.
- Wire AudioRecognition and AgentActivity into the new runtime so pauses, interruptions, and agent overlap update endpointing delays in live sessions.
- Add direct parity tests for the endpointing contract plus Node-specific AudioRecognition integration coverage.

Source

Port the dynamic endpointing state machine from the Python SDK so Node.js voice sessions can learn endpointing delays from pauses, interruptions, and agent overlap.

Restore the existing AudioRecognition delay fields and no-arg agent-speech entrypoint as compatibility shims so the dynamic endpointing port does not break the public TypeScript surface.

Fix the generated lint error from the additive AudioRecognition type changes so the pre-push hook accepts the dynamic endpointing port.

Keep the existing ExpFilter, AgentActivity, and AudioRecognition TypeScript signatures additive while still routing the new dynamic endpointing runtime through the Node.js voice stack.
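
For readers unfamiliar with the idea of learning a delay from observed pauses, the following is an illustration only and not the code in this PR. It assumes a plain exponential moving average with clamping; the class name `ExpFilterSketch`, the constants, and the millisecond units are all hypothetical, and the ported ExpFilter may use a different formula and signature.

```typescript
// Hypothetical sketch: names, constants, and units are illustrative, not the PR's code.
class ExpFilterSketch {
  private readonly alpha: number; // weight given to the newest sample, 0 < alpha <= 1
  private readonly min: number; // lower clamp for the smoothed value
  private readonly max: number; // upper clamp for the smoothed value
  private value: number | undefined;

  constructor(alpha: number, min: number, max: number, initial?: number) {
    if (!(alpha > 0 && alpha <= 1)) {
      throw new RangeError('alpha must be in (0, 1]');
    }
    this.alpha = alpha;
    this.min = min;
    this.max = max;
    this.value = initial;
  }

  // Blend a new observation into the running estimate, then clamp it.
  apply(sample: number): number {
    const next =
      this.value === undefined
        ? sample
        : this.alpha * sample + (1 - this.alpha) * this.value;
    this.value = Math.min(this.max, Math.max(this.min, next));
    return this.value;
  }
}

// Assumed usage: feed observed pause lengths (ms) to adapt an endpointing delay.
const delayFilter = new ExpFilterSketch(0.5, 200, 2000, 500);
const afterShortPause = delayFilter.apply(300); // 0.5 * 300 + 0.5 * 500 = 400
const afterLongPause = delayFilter.apply(5000); // 2700 before clamping, 2000 after
```

The clamp keeps a single outlier (for example a very long pause) from pushing the delay outside a sane range, which is one plausible reason to pair a filter with min and max bounds.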

CLAassistant · 2026-04-22T20:14:25Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Rosetta Bot does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

changeset-bot · 2026-04-22T20:14:25Z

🦋 Changeset detected

Latest commit: 66b66eb

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 26 packages

Name	Type
@livekit/agents	Major
@livekit/agents-plugin-anam	Major
@livekit/agents-plugin-assemblyai	Major
@livekit/agents-plugin-baseten	Major
@livekit/agents-plugin-bey	Major
@livekit/agents-plugin-cartesia	Major
@livekit/agents-plugin-cerebras	Major
@livekit/agents-plugin-deepgram	Major
@livekit/agents-plugin-elevenlabs	Major
@livekit/agents-plugin-google	Major
@livekit/agents-plugin-hedra	Major
@livekit/agents-plugin-inworld	Major
@livekit/agents-plugin-lemonslice	Major
@livekit/agents-plugin-livekit	Major
@livekit/agents-plugin-mistral	Major
@livekit/agents-plugin-neuphonic	Major
@livekit/agents-plugin-openai	Major
@livekit/agents-plugin-phonic	Major
@livekit/agents-plugin-resemble	Major
@livekit/agents-plugin-rime	Major
@livekit/agents-plugin-runway	Major
@livekit/agents-plugin-sarvam	Major
@livekit/agents-plugin-silero	Major
@livekit/agents-plugins-test	Major
@livekit/agents-plugin-trugen	Major
@livekit/agents-plugin-xai	Major


devin-ai-integration

Devin Review found 5 potential issues.

View 6 additional findings in Devin Review.

devin-ai-integration · 2026-04-22T20:20:18Z

+              this.onStartOfSpeech(startTime, ev.speechDuration);
              const span = this.ensureUserTurnSpan(startTime);
              const ctx = this.userTurnContext(span);
              otelContext.with(ctx, () => this.hooks.onStartOfSpeech(ev));


🔴 Double overlap speech sentinels sent to interruption detection pipeline

When VAD fires START_OF_SPEECH, the new code in createVadTask calls this.onStartOfSpeech(startTime, ev.speechDuration) at agents/src/voice/audio_recognition.ts:1085, which (when interruption is enabled and agent is speaking) calls this.onStartOfOverlapSpeech(...) at line 389. Then immediately after, this.hooks.onStartOfSpeech(ev) at line 1088 fires the hook into AgentActivity.onStartOfSpeech(ev) (agents/src/voice/agent_activity.ts:1033), which at line 1045 also calls this.audioRecognition.onStartOfOverlapSpeech(...). This sends the overlapSpeechStarted sentinel twice to the interruption detection stream.

The same double-call pattern occurs for END_OF_SPEECH: AudioRecognition.onEndOfSpeech() at line 1119 calls this.onEndOfOverlapSpeech(), and then the hook AgentActivity.onEndOfSpeech() at line 1061 also calls this.audioRecognition.onEndOfOverlapSpeech(). Both calls pass their guards since isAgentSpeaking remains true.

The STT-driven START_OF_SPEECH/END_OF_SPEECH paths (lines 766 and 812) have the same issue.

Duplicate sentinels can corrupt the interruption detection model's state machine, leading to incorrect interruption/non-interruption decisions.

Prompt for agents

The new AudioRecognition.onStartOfSpeech() and AudioRecognition.onEndOfSpeech() methods now handle overlap speech detection internally (calling onStartOfOverlapSpeech / onEndOfOverlapSpeech). However, the existing RecognitionHooks implementations in AgentActivity (onStartOfSpeech at agent_activity.ts:1033 and onEndOfSpeech at agent_activity.ts:1053) still directly call audioRecognition.onStartOfOverlapSpeech / onEndOfOverlapSpeech when interruption detection is enabled. This causes duplicate sentinels to be sent to the interruption detection pipeline for every VAD or STT speech event. To fix: Remove the direct onStartOfOverlapSpeech / onEndOfOverlapSpeech calls from AgentActivity.onStartOfSpeech (lines 1043-1050 in agent_activity.ts) and AgentActivity.onEndOfSpeech (lines 1059-1064 in agent_activity.ts), since AudioRecognition now handles this internally via its new onStartOfSpeech/onEndOfSpeech methods.
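
The call flow described above can be modelled in a few lines. This is a hypothetical sketch, not the actual AudioRecognition or AgentActivity code: `RecognitionSketch`, `handleVadStartOfSpeech`, and the `sent` array are invented for illustration, and only the sentinel name `overlapSpeechStarted` and the `isAgentSpeaking` guard come from the finding.

```typescript
// Hypothetical model of the call flow; class and member names are illustrative only.
type Sentinel = 'overlapSpeechStarted' | 'overlapSpeechEnded';
type StartHook = (rec: RecognitionSketch) => void;

class RecognitionSketch {
  readonly sent: Sentinel[] = []; // stands in for the interruption detection stream
  isAgentSpeaking = true;
  private readonly onStartHook: StartHook;

  constructor(onStartHook: StartHook) {
    this.onStartHook = onStartHook;
  }

  onStartOfOverlapSpeech(): void {
    // This guard stays true across both calls, so it does not deduplicate.
    if (!this.isAgentSpeaking) return;
    this.sent.push('overlapSpeechStarted');
  }

  // Models the new internal handler: emit the sentinel, then fire the hook.
  handleVadStartOfSpeech(): void {
    this.onStartOfOverlapSpeech();
    this.onStartHook(this);
  }
}

// Before the fix: the hook repeats the overlap call, so two sentinels are sent.
const buggy = new RecognitionSketch((rec) => rec.onStartOfOverlapSpeech());
buggy.handleVadStartOfSpeech();

// After the fix: the hook leaves overlap handling to the recognition layer.
const fixed = new RecognitionSketch(() => {});
fixed.handleVadStartOfSpeech();

const buggyCount = buggy.sent.length; // 2
const fixedCount = fixed.sent.length; // 1
```

Keeping the overlap call in exactly one layer matches the fix suggested in the prompt; an idempotency flag inside `onStartOfOverlapSpeech` would be an alternative, but that is not what the finding proposes.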

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-22T20:20:20Z

+
+const logger = log();
+
+// Ref: source livekit-agents/livekit/agents/voice/endpointing.py - 7-7


🔴 Ref comments use wrong format: 'source' instead of 'python' and missing 'lines' suffix

CLAUDE.md mandates the format // Ref: python <relative-file-path> - <line-range> lines for Python reference comments. All 19 Ref comments added in this PR use the wrong format // Ref: source <path> - <line-range> — using source instead of python and omitting the lines suffix. Compare with the existing correct example at agents/src/stt/stt.ts:77: // Ref: python livekit-agents/livekit/agents/stt/stt.py - 62-68 lines. This instance is representative of all Ref comments in the PR.

Suggested change

- // Ref: source livekit-agents/livekit/agents/voice/endpointing.py - 7-7
+ // Ref: python livekit-agents/livekit/agents/voice/endpointing.py - 7-7 lines
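
Since the same rewrite applies to every Ref comment in the PR, it could be scripted. The sketch below is a minimal, hypothetical helper and not part of the repository: the function names and both regular expressions are assumptions, including the assumption that a line range always has the form `<start>-<end>`.

```typescript
// Hypothetical helper; not part of agents-js. Regexes encode assumptions about the format.

// Expected format per the review: "// Ref: python <path> - <start>-<end> lines".
const VALID_REF = /^\/\/ Ref: python \S+ - \d+-\d+ lines$/;
// Format used in this PR: "// Ref: source <path> - <start>-<end>".
const SOURCE_REF = /^(\s*)\/\/ Ref: source (\S+) - (\d+-\d+)\s*$/;

// Rewrite a "source" Ref comment into the "python ... lines" form, keeping indentation.
// Lines that do not match are returned unchanged.
function fixRefComment(line: string): string {
  const m = SOURCE_REF.exec(line);
  return m ? `${m[1]}// Ref: python ${m[2]} - ${m[3]} lines` : line;
}

// Check whether a line (ignoring surrounding whitespace) already uses the expected format.
function isValidRefComment(line: string): boolean {
  return VALID_REF.test(line.trim());
}

const before = '// Ref: source livekit-agents/livekit/agents/voice/endpointing.py - 7-7';
const after = fixRefComment(before);
const afterIsValid = isValidRefComment(after);
```

Running `fixRefComment` over each line of the touched files would cover the Ref comments flagged in this review, assuming none of them wrap onto a second line.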


devin-ai-integration · 2026-04-22T20:20:22Z

  }

-  async onStartOfAgentSpeech() {
+  // Ref: source livekit-agents/livekit/agents/voice/audio_recognition.py - 238-243


🔴 Ref comments use wrong format in audio_recognition.ts

Same CLAUDE.md format violation as in endpointing.ts — uses // Ref: source instead of // Ref: python and missing lines suffix. This affects all 4 Ref comments added to this file (lines 338, 347, 380, 392).

Suggested change

- // Ref: source livekit-agents/livekit/agents/voice/audio_recognition.py - 238-243
+ // Ref: python livekit-agents/livekit/agents/voice/audio_recognition.py - 238-243 lines


devin-ai-integration · 2026-04-22T20:20:23Z

      this.vad.on('metrics_collected', this.onMetricsCollected);
    }

+    // Ref: source livekit-agents/livekit/agents/voice/agent_activity.py - 768-784


🔴 Ref comments use wrong format in agent_activity.ts

Same CLAUDE.md format violation — uses // Ref: source instead of // Ref: python and missing lines suffix. This affects all 6 Ref comments added to this file (lines 473, 753, 936, 959, 1845, 2127).

Suggested change

- // Ref: source livekit-agents/livekit/agents/voice/agent_activity.py - 768-784
+ // Ref: python livekit-agents/livekit/agents/voice/agent_activity.py - 768-784 lines


devin-ai-integration · 2026-04-22T20:20:25Z

+};
+
 /** @internal */
+// Ref: source livekit-agents/livekit/agents/utils/exp_filter.py - 5-64


🔴 Ref comment uses wrong format in utils.ts

Same CLAUDE.md format violation — uses // Ref: source instead of // Ref: python and missing lines suffix.

Suggested change

- // Ref: source livekit-agents/livekit/agents/utils/exp_filter.py - 5-64
+ // Ref: python livekit-agents/livekit/agents/utils/exp_filter.py - 5-64 lines


Rosetta Bot added 4 commits April 22, 2026 20:05

feat(voice): add dynamic endpointing runtime

9f51db4

Port the dynamic endpointing state machine from the Python SDK so Node.js voice sessions can learn endpointing delays from pauses, interruptions, and agent overlap.

fix(voice): keep audio recognition endpointing additive

f413897

Restore the existing AudioRecognition delay fields and no-arg agent-speech entrypoint as compatibility shims so the dynamic endpointing port does not break the public TypeScript surface.

style(voice): format audio recognition endpointing types

fe54380

Fix the generated lint error from the additive AudioRecognition type changes so the pre-push hook accepts the dynamic endpointing port.

fix(voice): preserve additive endpointing APIs

66b66eb

Keep the existing ExpFilter, AgentActivity, and AudioRecognition TypeScript signatures additive while still routing the new dynamic endpointing runtime through the Node.js voice stack.

devin-ai-integration Bot reviewed Apr 22, 2026


u9g closed this Apr 22, 2026

u9g deleted the rosetta/issue-99 branch April 22, 2026 23:46

u9g wants to merge 4 commits into main from rosetta/issue-99
