feat(liveavatar): port plugin from python with video_quality param #1324
toubatbrian wants to merge 1 commit into main
Ports the Python `livekit-plugins-liveavatar` plugin into agents-js as `@livekit/agents-plugin-liveavatar`, including the new `videoQuality` parameter from livekit/agents#5552. The plugin mirrors the Python implementation: it brings up a LiveAvatar streaming session, opens the realtime websocket, captures the agent's audio output through a queue-based AudioOutput, resamples to 24 kHz mono, and forwards base64-encoded chunks (~600 ms first chunk, ~1 s subsequent) to the LiveAvatar service. Inbound websocket events drive playback start/finish notifications back into the AgentSession. Also exports `voice.AudioOutput` (and its companion types) from `@livekit/agents` so plugin authors can subclass the abstract audio sink. Refs: livekit/agents#5552 https://claude.ai/code/session_01DE5pBrf3y1bFgLTK8NDTkB
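The chunk sizing described above (about 600 ms for the first chunk, about 1 s for later ones, at 24 kHz mono) can be sketched roughly as follows. This is an illustration only, not the plugin's code: `ChunkBuffer`, `samplesFor`, and the constants are hypothetical names, and the 16-bit PCM sample format, the platform byte order, and the reset of the first-chunk threshold on `flush()` are assumptions.

```typescript
// Illustrative sketch only. Assumes 16-bit PCM samples and Node's Buffer for base64.
const SAMPLE_RATE = 24_000; // 24 kHz mono, per the description
const FIRST_CHUNK_MS = 600; // approximate first-chunk target
const NEXT_CHUNK_MS = 1_000; // approximate target for later chunks

function samplesFor(ms: number): number {
  return Math.floor((SAMPLE_RATE * ms) / 1000);
}

class ChunkBuffer {
  private pending: number[] = [];
  private sentFirst = false;

  /** Adds samples and returns every chunk that is ready, base64-encoded. */
  push(samples: Int16Array): string[] {
    this.pending = this.pending.concat(Array.from(samples));
    const ready: string[] = [];
    while (this.pending.length >= this.threshold()) {
      ready.push(this.encode(this.pending.splice(0, this.threshold())));
      this.sentFirst = true;
    }
    return ready;
  }

  /** Returns whatever is left (end of segment) and resets the first-chunk state (assumption). */
  flush(): string | undefined {
    const rest = this.pending.splice(0);
    this.sentFirst = false;
    return rest.length > 0 ? this.encode(rest) : undefined;
  }

  private threshold(): number {
    return samplesFor(this.sentFirst ? NEXT_CHUNK_MS : FIRST_CHUNK_MS);
  }

  private encode(samples: number[]): string {
    const pcm = Int16Array.from(samples);
    return Buffer.from(pcm.buffer, pcm.byteOffset, pcm.byteLength).toString('base64');
  }
}
```

Under these assumptions, pushing 2 s of audio yields one 600 ms chunk and one 1 s chunk, and the remaining 400 ms is emitted by `flush()` at the end of the segment.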
🦋 Changeset detected
Latest commit: df624a8
The changes in this PR will be included in the next version bump. This PR includes changesets to release 27 packages.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: df624a84b7
    await new Promise<void>((resolve, reject) => {
      ws.once('open', resolve);
      ws.once('error', reject);
    });
Handle WebSocket open failures inside main task
start() launches this.mainTask() without awaiting it, but the initial websocket handshake await is outside the method’s try/catch. When the ws_url is invalid or the connection fails (DNS/TLS/network outage), this path rejects before the internal error handling runs, which can surface as an unhandled promise rejection and leave the session partially initialized. Move this connect await into the guarded section (or attach a catch at spawn time) so startup failures are routed through normal cleanup.
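One way to read this suggestion is sketched below. It is not the plugin's code: a plain Node `EventEmitter` stands in for the websocket, and `waitForOpen`, the `onError`/`onClose` callbacks, and the shape of `mainTask` are hypothetical. The point is only that the handshake wait sits inside the same `try` as the rest of the task, so a failed connect reaches the normal cleanup path instead of becoming an unhandled rejection.

```typescript
import { EventEmitter } from 'node:events';

// Resolves on 'open' and rejects on 'error', like the snippet under review.
function waitForOpen(ws: EventEmitter): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    ws.once('open', () => resolve());
    ws.once('error', reject);
  });
}

// Hypothetical stand-in for mainTask(): the handshake is awaited inside the guarded section.
async function mainTask(
  ws: EventEmitter,
  onError: (err: unknown) => void,
  onClose: () => void,
): Promise<void> {
  try {
    await waitForOpen(ws);
    // ... the forwarding loop would run here ...
  } catch (err) {
    onError(err); // connection failures are handled like any other task error
  } finally {
    onClose(); // cleanup runs whether or not the handshake succeeded
  }
}
```

Because `mainTask` never rejects in this sketch, a caller that spawns it without awaiting does not need an extra `.catch` at spawn time.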
    ): Promise<unknown> {
      const url = this.apiUrl + endpoint;
      const maxRetry = this.connOptions.maxRetry;
      for (let i = 0; i < maxRetry; i++) {
Execute at least one API attempt when retries are zero
The retry loop uses i < maxRetry, so connOptions.maxRetry = 0 results in zero HTTP calls and an immediate APIConnectionError. Since maxRetry represents retries, callers setting zero expect one initial request with no retries; this implementation skips the initial attempt entirely. Use maxRetry + 1 total attempts (or equivalent logic) to preserve expected connection semantics.
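A minimal sketch of the "one initial attempt plus `maxRetry` retries" semantics the comment asks for is shown below. `withRetries` is a hypothetical helper, not part of the plugin; it throws a plain `Error` where the plugin raises `APIConnectionError`, and it omits any backoff delay between attempts.

```typescript
// Hypothetical helper: always makes one initial attempt, then up to maxRetry retries.
async function withRetries<T>(
  attempt: (attemptIndex: number) => Promise<T>,
  maxRetry: number,
): Promise<T> {
  const totalAttempts = Math.max(0, maxRetry) + 1;
  let lastError: unknown;
  for (let i = 0; i < totalAttempts; i++) {
    try {
      return await attempt(i);
    } catch (err) {
      lastError = err; // remember the failure and try again if attempts remain
    }
  }
  throw new Error(`failed after ${totalAttempts} attempt(s): ${String(lastError)}`);
}
```

With `maxRetry = 0` the callback runs exactly once; with `maxRetry = 2` it runs at most three times before the final error is thrown.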
    try {
      if (this.sessionId && this.sessionToken) {
        const data = await this.api.stopStreamingSession(this.sessionId, this.sessionToken);
        if (data.code <= 200) {
🟡 Suspicious data.code <= 200 success check logs success for unexpected response codes
The condition data.code <= 200 at plugins/liveavatar/src/avatar.ts:607 is used to decide whether to log a success message for the stop-session API call. This would treat any code ≤ 200 (including 0, negative numbers, or other non-200 codes) as successful. A more precise check like data.code === 200 (or >= 200 && < 300 if the API uses HTTP-like semantics) would be appropriate. While this only affects logging (the session stop proceeds regardless), it could mask failures by logging a misleading "session stopped" message when the API actually returned an error code.
    - if (data.code <= 200) {
    + if (data.code === 200) {
Summary
Automated port of livekit/agents#5552 (`(liveavatar): add video_quality param`) into agents-js, plus the first-time port of the entire `liveavatar` plugin, because the Python diff lands on a plugin that previously had no JS counterpart (it was explicitly skipped in #1280 alongside the other Python-only avatar plugins).

cc @toubatbrian @livekit/agent-devs: please review.
Ported Features
1. New `@livekit/agents-plugin-liveavatar` package

Layout matches existing avatar plugins (`bey`, `lemonslice`, `trugen`, `anam`):

- `plugins/liveavatar/package.json`
- `plugins/liveavatar/tsconfig.json`
- `plugins/liveavatar/tsup.config.ts`
- `plugins/liveavatar/README.md`
- `plugins/liveavatar/src/index.ts`: `Plugin` registration + re-exports
- `plugins/liveavatar/src/log.ts`
- `plugins/liveavatar/src/api.ts`: `LiveAvatarAPI` HTTP client (mirrors `livekit-plugins/livekit-plugins-liveavatar/livekit/plugins/liveavatar/api.py`)
- `plugins/liveavatar/src/avatar.ts`: `AvatarSession` + queue-based audio forwarder (mirrors `livekit-plugins/livekit-plugins-liveavatar/livekit/plugins/liveavatar/avatar.py`)

Each ported method/field carries an inline `// Ref: python <path> - <line-range> lines` comment pointing back to the Python source per the `CLAUDE.md` porting convention.

2. `videoQuality` parameter (livekit/agents#5552)

- `VideoQuality = 'very_high' | 'high' | 'medium' | 'low'`, mirroring the Python `Literal` (`api.ts`).
- `AvatarSession` constructor accepts `videoQuality?: VideoQuality`; when set, `LiveAvatarAPI.createStreamingSession` includes it on the `/token` payload (omitted when unset, matching the Python `if video_quality is not None` guard).

3. Queue-based audio forwarding
Python uses `livekit.agents.voice.avatar.QueueAudioOutput` (a built-in core helper) plus an `AudioSegmentEnd` sentinel pushed onto the queue when `flush()` is called. agents-js doesn't expose `QueueAudioOutput` today, so the plugin defines a small private subclass of `voice.AudioOutput` that:

- Pushes `AudioFrame` items into a `streamNs.StreamChannel<AudioFrame | AudioSegmentEnd>`.
- Pushes an `AudioSegmentEnd` sentinel value when `flush()` is called (instead of a typed-queue tag).
- Emits a `'clear_buffer'` event when `clearBuffer()` is invoked, which the `AvatarSession` driver wires into the same interrupt handling Python uses.

The forwarding loop reads from this stream, resamples to 24 kHz mono via `AudioResampler` (`@livekit/rtc-node`), buffers up to ~600 ms (first chunk) / ~1 s (subsequent) of audio, base64-encodes it, and ships an `agent.speak` event over the LiveAvatar websocket. On `AudioSegmentEnd` it flushes the buffer and emits `agent.speak_end` + `agent.start_listening`. Inbound `session.state_updated` / `agent.speak_started` / `agent.speak_ended` / `agent.speak_interrupted` events drive `onPlaybackStarted` / `onPlaybackFinished` notifications and an `agent.interrupt` send when the user barges in.

4. Core export: `voice.AudioOutput`
Adds `AudioOutput`, `AudioOutputCapabilities`, `PlaybackFinishedEvent`, `PlaybackStartedEvent` to the public surface of `@livekit/agents` (`agents/src/voice/index.ts`). The abstract class already existed in `agents/src/voice/io.ts` but was not exported through `voice/index.ts`, so plugin authors had no supported way to write a custom audio sink. This is the minimum change required to enable the queue-based subclass above without reaching into deep paths.

5. `turbo.json` env vars

Adds `LIVEAVATAR_API_KEY`, `LIVEAVATAR_API_URL`, `LIVEAVATAR_AVATAR_ID` to `globalEnv` so `eslint-plugin-turbo`'s `no-undeclared-env-vars` is satisfied (mirrors how `LEMONSLICE_*`, `BEY_*`, `TRUGEN_*` are tracked).

Implementation Notes (language-level differences)
- No `QueueAudioOutput` in JS core. Python imports `from livekit.agents.voice.avatar import QueueAudioOutput, AudioSegmentEnd`, both first-party utilities. agents-js doesn't ship them, so the plugin inlines a minimal queue-backed `voice.AudioOutput` subclass + `AudioSegmentEnd` sentinel locally rather than landing them in core for a single consumer. If a second plugin ever wants the same primitive, lifting these into `agents/src/voice/avatar/` is the natural follow-up.
- `AudioResampler` overload. Python's `rtc.AudioResampler(input_rate=..., output_rate=..., num_channels=1)` accepts a `num_channels` kwarg. The JS binding from `@livekit/rtc-node` exposes the same parameter as the third positional arg (`new AudioResampler(inputRate, outputRate, 1)`), so the resample step is line-for-line equivalent.
- `asyncio.Event` → `Future`. The `_session_connected` / `_chunk_interrupted` Python `asyncio.Event` instances become a `Future<void>` (one-shot connect signal) and a simple boolean flag (the chunk-interrupt signal is read once per loop iteration), respectively. Closing semantics are unchanged.
- `utils.aio.Chan` → `createStreamChannel`. The websocket send queue is a `streamNs.StreamChannel<Record<string, unknown>>`. `closed` and `close()` semantics map 1-to-1; the `CloseAgentSessionEvent` listener closes it the same way the Python `on_agent_session_close` handler does.
- `utils.aio.interval(60).tick()` → `setTimeout` loop. A `setTimeout`-based loop replaces Python's tickable interval. The forward path additionally resets the timer whenever a real event is sent (matches Python's `ping_interval.reset()` after a successful `ws_conn.send_json`).
- `get_job_context()` shutdown callback. Python's `await super().start(...)` (`AvatarSession` base) registers `aclose` on the job context's shutdown callback list. The JS base class isn't merged yet (#1280, "feat(voice): port AvatarSession base class and transcript sync asymmetric detach warning", is still open), so this plugin registers its own `jobCtx.addShutdownCallback(() => this.aclose())` directly inside `start()`. Once #1280 lands, this can be folded into a `super.start(...)` call.
- `AccessToken` shape. Python uses `livekit.api.AccessToken` with `with_kind('agent')`, `with_grants(VideoGrants(room_join=True, room=...))`, `with_attributes({ATTRIBUTE_PUBLISH_ON_BEHALF: ...})`. JS uses `livekit-server-sdk`'s `AccessToken` with the equivalent property accessors (`at.kind = 'agent'`, `at.addGrant(...)`, `at.attributes = ...`), which produce the same wire format.
- `ATTRIBUTE_PUBLISH_ON_BEHALF` is hardcoded. `agents/src/constants.ts` exports it but `@livekit/agents`'s public barrel does not, so this plugin hardcodes the literal `'lk.publish_on_behalf'` string the same way `lemonslice`, `trugen`, and `runway` do.
- `videoQuality` typing. Python's `VideoQuality` lives in `avatar.py` and is imported into `api.py` via `if TYPE_CHECKING:` to avoid a circular import. TypeScript has no equivalent friction, so `VideoQuality` is defined and exported from `api.ts` and re-imported by `avatar.ts` directly.

Tests
- `pnpm build` passes (28/28 packages including the new plugin).
- `pnpm --filter @livekit/agents-plugin-liveavatar lint` is clean.
- `pnpm format:check` is clean across the repo.

Test plan

- With an `AgentSession`, attach `AvatarSession`, verify the avatar joins the room as `liveavatar-avatar-agent` and speaks the agent's audio.
- Confirm each `videoQuality` value (`very_high` / `high` / `medium` / `low`) is honored on the `/token` payload and reflected in the avatar's video.
- Verify that `agent.interrupt` is sent on `clearBuffer` while the avatar is mid-speech, and that `playbackPosition` is reported with `interrupted: true` to the agent session.
- `aclose()` (or `CloseAgentSessionEvent`) closes the message channel, drains audio, calls `/stop`, and tears down the websocket without leaking timers.
- With `isSandbox: true`, confirm the 1-minute disconnect is logged as a warning instead of raising `APIConnectionError`.
- `AudioResampler` swap path.

https://claude.ai/code/session_01DE5pBrf3y1bFgLTK8NDTkB
Generated by Claude Code