feat(liveavatar): port plugin from python with video_quality param #1324

Open
toubatbrian wants to merge 1 commit into main from claude/jolly-lovelace-5PHvh

Conversation

@toubatbrian
Contributor

Summary

Automated port of livekit/agents#5552 ((liveavatar): add video_quality param) into agents-js. It also includes the first-time port of the entire liveavatar plugin, because the Python diff lands on a plugin that previously had no JS counterpart (it was explicitly skipped in #1280 alongside the other Python-only avatar plugins).

cc @toubatbrian @livekit/agent-devs — please review.

This PR was created by an automated Claude Code Routine (currently in experimentation stage). It mirrors the Python livekit-plugins-liveavatar plugin into a new @livekit/agents-plugin-liveavatar package and pulls in the videoQuality parameter introduced by livekit/agents#5552.

Ported Features

1. New @livekit/agents-plugin-liveavatar package

Layout matches existing avatar plugins (bey, lemonslice, trugen, anam):

  • plugins/liveavatar/package.json
  • plugins/liveavatar/tsconfig.json
  • plugins/liveavatar/tsup.config.ts
  • plugins/liveavatar/README.md
  • plugins/liveavatar/src/index.ts: Plugin registration + re-exports
  • plugins/liveavatar/src/log.ts
  • plugins/liveavatar/src/api.ts: LiveAvatarAPI HTTP client (mirrors livekit-plugins/livekit-plugins-liveavatar/livekit/plugins/liveavatar/api.py)
  • plugins/liveavatar/src/avatar.ts: AvatarSession + queue-based audio forwarder (mirrors livekit-plugins/livekit-plugins-liveavatar/livekit/plugins/liveavatar/avatar.py)

Each ported method/field carries an inline // Ref: python <path> - <line-range> lines comment pointing back to the Python source per the CLAUDE.md porting convention.

2. videoQuality parameter (livekit/agents#5552)

  • New TS type alias VideoQuality = 'very_high' | 'high' | 'medium' | 'low' mirroring the Python Literal (api.ts).
  • AvatarSession constructor accepts videoQuality?: VideoQuality; when set, LiveAvatarAPI.createStreamingSession includes it on the /token payload (omitted when unset, matching the Python if video_quality is not None guard).
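
The "omitted when unset" behavior can be sketched as follows. This is a minimal illustration, not the plugin's code: the `TokenPayload` shape, the `avatar_id` key, and the `buildTokenPayload` helper are assumptions, since the description only names the `VideoQuality` alias and the `/token` endpoint.

```typescript
// Type alias quoted from the PR description.
type VideoQuality = 'very_high' | 'high' | 'medium' | 'low';

// Hypothetical payload shape; the real /token body and its key names
// are not shown in this PR description.
interface TokenPayload {
  avatar_id: string;
  video_quality?: VideoQuality;
}

// Builds the payload, adding video_quality only when the caller set it
// (the TypeScript counterpart of Python's `if video_quality is not None`).
function buildTokenPayload(avatarId: string, videoQuality?: VideoQuality): TokenPayload {
  const payload: TokenPayload = { avatar_id: avatarId };
  if (videoQuality !== undefined) {
    payload.video_quality = videoQuality;
  }
  return payload;
}
```

Calling `buildTokenPayload('avatar-123')` yields an object with no `video_quality` key at all, rather than a key set to `undefined` or `null`, so the server sees the same payload as before the parameter existed.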

3. Queue-based audio forwarding

Python uses livekit.agents.voice.avatar.QueueAudioOutput (a built-in core helper) plus an AudioSegmentEnd sentinel pushed onto the queue when flush() is called. agents-js doesn't expose QueueAudioOutput today, so the plugin defines a small private subclass of voice.AudioOutput that:

  • Buffers AudioFrame items into a streamNs.StreamChannel<AudioFrame | AudioSegmentEnd>.
  • Pushes a private AudioSegmentEnd sentinel value when flush() is called (instead of a typed-queue tag).
  • Re-emits an internal 'clear_buffer' event when clearBuffer() is invoked, which the AvatarSession driver wires into the same interrupt handling Python uses.
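
A minimal sketch of the queue-plus-sentinel idea in the bullets above, under simplifying assumptions: it uses a plain in-memory array instead of `streamNs.StreamChannel`, does not extend `voice.AudioOutput`, and the names `Frame`, `QueueAudioSink`, `AUDIO_SEGMENT_END`, `onClearBuffer`, and `drain` are hypothetical.

```typescript
// Hypothetical stand-in for AudioFrame from @livekit/rtc-node.
interface Frame {
  data: Int16Array;
  sampleRate: number;
}

// Private sentinel value marking the end of one audio segment.
const AUDIO_SEGMENT_END = Symbol('AudioSegmentEnd');
type QueueItem = Frame | typeof AUDIO_SEGMENT_END;

class QueueAudioSink {
  private items: QueueItem[] = [];
  private clearListeners: Array<() => void> = [];

  // Buffers one audio frame for the forwarding loop.
  captureFrame(frame: Frame): void {
    this.items.push(frame);
  }

  // Marks the end of the current segment with the sentinel.
  flush(): void {
    this.items.push(AUDIO_SEGMENT_END);
  }

  // Notifies listeners so the driver can run its interrupt handling.
  clearBuffer(): void {
    for (const listener of this.clearListeners) listener();
  }

  onClearBuffer(listener: () => void): void {
    this.clearListeners.push(listener);
  }

  // Hands all queued items to the consumer and empties the queue.
  drain(): QueueItem[] {
    const out = this.items;
    this.items = [];
    return out;
  }
}
```

A consumer can tell frames from segment boundaries with a plain identity check (`item === AUDIO_SEGMENT_END`), which TypeScript narrows correctly because the sentinel is a unique symbol.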

The forwarding loop reads from this stream, resamples to 24 kHz mono via AudioResampler (@livekit/rtc-node), buffers up to ~600 ms (first chunk) / ~1 s (subsequent) of audio, base64-encodes it, and ships an agent.speak event over the LiveAvatar websocket. On AudioSegmentEnd it flushes the buffer and emits agent.speak_end + agent.start_listening. Inbound session.state_updated / agent.speak_started / agent.speak_ended / agent.speak_interrupted events drive onPlaybackStarted / onPlaybackFinished notifications and an agent.interrupt send when the user barges in.
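
The chunking thresholds above can be sketched as a small buffer class. This is an illustration under stated assumptions: `ChunkBuffer` is a hypothetical name, the input is assumed to be already resampled to 24 kHz mono 16-bit samples, base64 encoding and the websocket send are left out, and resetting to the shorter first-chunk threshold after each segment is a guess about the intended behavior rather than something the description confirms.

```typescript
const SAMPLE_RATE = 24000; // 24 kHz mono, as stated in the description
const FIRST_CHUNK_MS = 600;
const NEXT_CHUNK_MS = 1000;

class ChunkBuffer {
  private pending: Int16Array[] = [];
  private pendingSamples = 0;
  private sentFirstChunk = false;

  // Adds samples; returns a chunk once the current threshold is reached, else null.
  push(samples: Int16Array): Int16Array | null {
    this.pending.push(samples);
    this.pendingSamples += samples.length;
    const thresholdMs = this.sentFirstChunk ? NEXT_CHUNK_MS : FIRST_CHUNK_MS;
    const thresholdSamples = (SAMPLE_RATE * thresholdMs) / 1000;
    return this.pendingSamples >= thresholdSamples ? this.take() : null;
  }

  // Called on segment end: emits whatever remains and (by assumption)
  // restores the shorter first-chunk threshold for the next segment.
  flush(): Int16Array | null {
    const chunk = this.pendingSamples > 0 ? this.take() : null;
    this.sentFirstChunk = false;
    return chunk;
  }

  // Concatenates the pending parts into one contiguous chunk.
  private take(): Int16Array {
    const out = new Int16Array(this.pendingSamples);
    let offset = 0;
    for (const part of this.pending) {
      out.set(part, offset);
      offset += part.length;
    }
    this.pending = [];
    this.pendingSamples = 0;
    this.sentFirstChunk = true;
    return out;
  }
}
```

With 10 ms frames (240 samples each), the first chunk is emitted after 60 frames (14,400 samples) and later chunks after 100 frames (24,000 samples). A shorter first chunk lets the avatar start speaking sooner, while larger later chunks reduce message overhead.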

4. Core export: voice.AudioOutput

Adds AudioOutput, AudioOutputCapabilities, PlaybackFinishedEvent, PlaybackStartedEvent to the public surface of @livekit/agents (agents/src/voice/index.ts). The abstract class already existed in agents/src/voice/io.ts but was not exported through voice/index.ts, so plugin authors had no supported way to write a custom audio sink. This is the minimum change required to enable the queue-based subclass above without reaching into deep paths.

5. turbo.json env vars

Adds LIVEAVATAR_API_KEY, LIVEAVATAR_API_URL, LIVEAVATAR_AVATAR_ID to globalEnv so eslint-plugin-turbo's no-undeclared-env-vars is satisfied (mirrors how LEMONSLICE_*, BEY_*, TRUGEN_* are tracked).

Implementation Notes (language-level differences)

  • No QueueAudioOutput in JS core. Python imports from livekit.agents.voice.avatar import QueueAudioOutput, AudioSegmentEnd — both first-party utilities. agents-js doesn't ship them, so the plugin inlines a minimal queue-backed voice.AudioOutput subclass + AudioSegmentEnd sentinel locally rather than landing them in core for a single consumer. If a second plugin ever wants the same primitive, lifting these into agents/src/voice/avatar/ is the natural follow-up.
  • Sample-rate-only AudioResampler overload. Python's rtc.AudioResampler(input_rate=..., output_rate=..., num_channels=1) accepts a num_channels kwarg. The JS binding from @livekit/rtc-node exposes the same parameter as the third positional arg (new AudioResampler(inputRate, outputRate, 1)), so the resample step is line-for-line equivalent.
  • asyncio.Event → Future. The _session_connected / _chunk_interrupted Python asyncio.Event instances become a Future<void> (one-shot connect signal) and a simple boolean flag (the chunk-interrupt signal is read once per loop iteration), respectively. Closing semantics are unchanged.
  • utils.aio.Chan → createStreamChannel. The websocket send queue is a streamNs.StreamChannel<Record<string, unknown>>. closed and close() semantics map 1-to-1; the Close AgentSessionEvent listener closes it the same way the Python on_agent_session_close handler does.
  • utils.aio.interval(60).tick() → setTimeout loop. A setTimeout-based loop replaces Python's tickable interval. The forward path additionally resets the timer whenever a real event is sent (matches Python's ping_interval.reset() after a successful ws_conn.send_json).
  • get_job_context() shutdown callback. Python's await super().start(...) (AvatarSession base) registers aclose on the job context's shutdown callback list. The JS base class isn't merged yet (feat(voice): port AvatarSession base class and transcript sync asymmetric detach warning #1280 is still open), so this plugin registers its own jobCtx.addShutdownCallback(() => this.aclose()) directly inside start(). Once feat(voice): port AvatarSession base class and transcript sync asymmetric detach warning #1280 lands, this can be folded into a super.start(...) call.
  • AccessToken shape. Python uses livekit.api.AccessToken with with_kind('agent'), with_grants(VideoGrants(room_join=True, room=...)), with_attributes({ATTRIBUTE_PUBLISH_ON_BEHALF: ...}). JS uses livekit-server-sdk's AccessToken with the equivalent property accessors (at.kind = 'agent', at.addGrant(...), at.attributes = ...) — same wire format.
  • ATTRIBUTE_PUBLISH_ON_BEHALF is hardcoded. agents/src/constants.ts exports it but @livekit/agents's public barrel does not, so this plugin hardcodes the literal 'lk.publish_on_behalf' string the same way lemonslice, trugen, and runway do.
  • videoQuality typing. Python's VideoQuality lives in avatar.py and is imported into api.py via if TYPE_CHECKING: to avoid a circular import. TypeScript has no equivalent friction, so VideoQuality is defined and exported from api.ts and re-imported by avatar.ts directly.
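
The setTimeout-based replacement for Python's resettable interval, mentioned in the notes above, might look roughly like this. It is a sketch only: `KeepAliveTimer` and its method names are hypothetical, and the callback would be whatever sends the keep-alive message in the real plugin.

```typescript
// Hypothetical helper, not part of the plugin's public API.
class KeepAliveTimer {
  private handle: ReturnType<typeof setTimeout> | undefined;
  private readonly intervalMs: number;
  private readonly onTick: () => void;

  constructor(intervalMs: number, onTick: () => void) {
    this.intervalMs = intervalMs;
    this.onTick = onTick;
  }

  // (Re)arms the timer. Calling this after every successful send means
  // the tick only fires when the connection has been idle for intervalMs.
  reset(): void {
    this.stop();
    this.handle = setTimeout(() => {
      this.onTick();
      this.reset();
    }, this.intervalMs);
  }

  // Cancels the pending tick so no timer outlives the session.
  stop(): void {
    if (this.handle !== undefined) {
      clearTimeout(this.handle);
      this.handle = undefined;
    }
  }

  get active(): boolean {
    return this.handle !== undefined;
  }
}
```

Usage would be along the lines of `const ping = new KeepAliveTimer(60_000, sendPing); ping.reset();`, with `ping.reset()` after each real event and `ping.stop()` during shutdown, where `sendPing` is an assumed function.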

Tests

  • pnpm build passes (28/28 packages including the new plugin).
  • pnpm --filter @livekit/agents-plugin-liveavatar lint is clean.
  • pnpm format:check is clean across the repo.
  • No new unit tests added in this port (Python source has none either; lemonslice's test pattern is API-key-gated and not portable without a live LiveAvatar credential).

Test plan

  • Run the new plugin against a real LiveAvatar account end-to-end: start an AgentSession, attach AvatarSession, verify the avatar joins the room as liveavatar-avatar-agent and speaks the agent's audio.
  • Verify each videoQuality value (very_high / high / medium / low) is honored on the /token payload and reflected in the avatar's video.
  • Confirm interrupt → agent.interrupt is sent on clearBuffer while the avatar is mid-speech, and that playbackPosition is reported with interrupted: true to the agent session.
  • Confirm clean shutdown: aclose() (or Close AgentSessionEvent) closes the message channel, drains audio, calls /stop, and tears down the websocket without leaking timers.
  • Sandbox mode: with isSandbox: true, confirm the 1-minute disconnect is logged as a warning instead of raising APIConnectionError.
  • Smoke test against an avatar using a non-24 kHz / non-mono TTS source to exercise the AudioResampler swap path.

https://claude.ai/code/session_01DE5pBrf3y1bFgLTK8NDTkB


Generated by Claude Code

Ports the Python `livekit-plugins-liveavatar` plugin into agents-js as
`@livekit/agents-plugin-liveavatar`, including the new `videoQuality`
parameter from livekit/agents#5552.

The plugin mirrors the Python implementation: it brings up a LiveAvatar
streaming session, opens the realtime websocket, captures the agent's
audio output through a queue-based AudioOutput, resamples to 24 kHz mono,
and forwards base64-encoded chunks (~600 ms first chunk, ~1 s subsequent)
to the LiveAvatar service. Inbound websocket events drive playback
start/finish notifications back into the AgentSession.

Also exports `voice.AudioOutput` (and its companion types) from
`@livekit/agents` so plugin authors can subclass the abstract audio sink.

Refs: livekit/agents#5552

https://claude.ai/code/session_01DE5pBrf3y1bFgLTK8NDTkB
@changeset-bot

changeset-bot Bot commented Apr 27, 2026

🦋 Changeset detected

Latest commit: df624a8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 27 packages
Name Type
@livekit/agents-plugin-liveavatar Major
@livekit/agents Major
@livekit/agents-plugin-anam Major
@livekit/agents-plugin-assemblyai Major
@livekit/agents-plugin-baseten Major
@livekit/agents-plugin-bey Major
@livekit/agents-plugin-cartesia Major
@livekit/agents-plugin-cerebras Major
@livekit/agents-plugin-deepgram Major
@livekit/agents-plugin-elevenlabs Major
@livekit/agents-plugin-google Major
@livekit/agents-plugin-hedra Major
@livekit/agents-plugin-inworld Major
@livekit/agents-plugin-lemonslice Major
@livekit/agents-plugin-livekit Major
@livekit/agents-plugin-mistral Major
@livekit/agents-plugin-neuphonic Major
@livekit/agents-plugin-openai Major
@livekit/agents-plugin-phonic Major
@livekit/agents-plugin-resemble Major
@livekit/agents-plugin-rime Major
@livekit/agents-plugin-runway Major
@livekit/agents-plugin-sarvam Major
@livekit/agents-plugin-silero Major
@livekit/agents-plugins-test Major
@livekit/agents-plugin-trugen Major
@livekit/agents-plugin-xai Major

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: df624a84b7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +398 to +401
    await new Promise<void>((resolve, reject) => {
      ws.once('open', resolve);
      ws.once('error', reject);
    });

P1: Handle WebSocket open failures inside main task

start() launches this.mainTask() without awaiting it, but the initial websocket handshake await is outside the method’s try/catch. When the ws_url is invalid or the connection fails (DNS/TLS/network outage), this path rejects before the internal error handling runs, which can surface as an unhandled promise rejection and leave the session partially initialized. Move this connect await into the guarded section (or attach a catch at spawn time) so startup failures are routed through normal cleanup.
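
One way to "attach a catch at spawn time", sketched under assumptions: `spawnGuarded` is a hypothetical helper, and the cleanup callback stands in for whatever teardown the session performs on failure.

```typescript
// Hypothetical helper: runs a background task and routes any rejection,
// including one raised before the task's own try/catch, into cleanup.
function spawnGuarded(
  task: () => Promise<void>,
  cleanup: (err: unknown) => Promise<void> | void,
): Promise<void> {
  return Promise.resolve()
    .then(task)
    .catch(async (err: unknown) => {
      await cleanup(err);
    });
}
```

With this shape, a failed websocket handshake inside the task is delivered to `cleanup` instead of surfacing as an unhandled promise rejection. Moving the connect await inside the existing try/catch, the other option the review mentions, achieves the same result without a wrapper.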

    ): Promise<unknown> {
      const url = this.apiUrl + endpoint;
      const maxRetry = this.connOptions.maxRetry;
      for (let i = 0; i < maxRetry; i++) {

P2: Execute at least one API attempt when retries are zero

The retry loop uses i < maxRetry, so connOptions.maxRetry = 0 results in zero HTTP calls and an immediate APIConnectionError. Since maxRetry represents retries, callers setting zero expect one initial request with no retries; this implementation skips the initial attempt entirely. Use maxRetry + 1 total attempts (or equivalent logic) to preserve expected connection semantics.
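
The "maxRetry + 1 total attempts" semantics can be sketched as below. This is not the plugin's code: `requestWithRetries` is a hypothetical name, a plain `Error` stands in for `APIConnectionError`, and any delay between attempts is omitted.

```typescript
// Runs `attempt` once, then retries up to `maxRetry` more times on failure.
async function requestWithRetries<T>(
  attempt: () => Promise<T>,
  maxRetry: number,
): Promise<T> {
  let lastError: unknown;
  // `i <= maxRetry` gives one initial attempt plus maxRetry retries,
  // so maxRetry = 0 still performs a single request.
  for (let i = 0; i <= maxRetry; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
    }
  }
  throw new Error(`request failed after ${maxRetry + 1} attempt(s): ${String(lastError)}`);
}
```

The only change relative to the reviewed loop is the bound (`<=` instead of `<`), which keeps "retries" meaning additional attempts after the first one.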

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 potential issue.

View 6 additional findings in Devin Review.

    try {
      if (this.sessionId && this.sessionToken) {
        const data = await this.api.stopStreamingSession(this.sessionId, this.sessionToken);
        if (data.code <= 200) {

🟡 Suspicious data.code <= 200 success check logs success for unexpected response codes

The condition data.code <= 200 at plugins/liveavatar/src/avatar.ts:607 is used to decide whether to log a success message for the stop-session API call. This would treat any code ≤ 200 (including 0, negative numbers, or other non-200 codes) as successful. A more precise check like data.code === 200 (or >= 200 && < 300 if the API uses HTTP-like semantics) would be appropriate. While this only affects logging (the session stop proceeds regardless), it could mask failures by logging a misleading "session stopped" message when the API actually returned an error code.

Suggested change
- if (data.code <= 200) {
+ if (data.code === 200) {