feat(inference): add inferenceClass option to inference.LLM #1282
Open
adrian-cowham wants to merge 1 commit into main
Adds an optional `inferenceClass?: 'priority' | 'standard'` option on both the `inference.LLM` constructor and `LLM.chat()`. When set, the outbound gateway request includes an `X-LiveKit-Inference-Priority` header, letting callers opt into the LiveKit Agent Gateway's priority/standard routing tiers.

Precedence follows the existing pattern for `parallelToolCalls`/`toolChoice`: the per-call value beats the constructor default; when neither is set, no header is emitted (behavior unchanged).

Header name literals are consolidated in `inference/utils.ts` as `INFERENCE_PROVIDER_HEADER` and `INFERENCE_PRIORITY_HEADER`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Description

Adds an optional `inferenceClass?: 'priority' | 'standard'` option to `inference.LLM`, letting callers opt into the LiveKit Agent Gateway's priority/standard routing tiers. When set, the outbound request includes an `X-LiveKit-Inference-Priority` header; when unset, nothing is emitted (behavior unchanged).

The option is available in two places, mirroring the existing precedence pattern for `parallelToolCalls`/`toolChoice` in the same file:

- `LLM` constructor: applies to every `chat()` call made by that `LLM` instance.
- `LLM.chat()`: overrides the constructor value for a single request.

Changes Made

- `InferenceLLMOptions` gets a new optional `inferenceClass?: InferenceClass` field; `InferenceClass` is a new exported type alias (`'priority' | 'standard'`).
- `LLM` constructor accepts `inferenceClass` and stores it on `this.opts` without a default.
- `LLM.chat()` accepts `inferenceClass`, resolves it as `callArg !== undefined ? callArg : this.opts.inferenceClass`, and forwards the resolved value to `LLMStream`.
- `LLMStream.run()` sets `extraHeaders[INFERENCE_PRIORITY_HEADER] = inferenceClass` immediately after the existing provider-header block, only when the resolved value is defined.
- Header name literals (`X-LiveKit-Inference-Provider`, `X-LiveKit-Inference-Priority`) are consolidated as `INFERENCE_PROVIDER_HEADER` / `INFERENCE_PRIORITY_HEADER` constants in `agents/src/inference/utils.ts`, co-located with the other gateway-concerned helpers.
- `type InferenceClass` is re-exported from `agents/src/inference/index.ts`.
- `agents/src/inference/llm.test.ts` with 7 explicit `it(...)` tests covering every observable precedence state (no-value, constructor-only × 2, per-call-only × 2, per-call-overrides-constructor × 2), per the project's "explicit tests, no loop-over-cases" convention.

Usage
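A minimal, self-contained sketch of the precedence described above. `SketchLLM` and `SketchOptions` are hypothetical stand-ins, not the real `inference.LLM` API; only the `inferenceClass` option, the resolution expression, and the header name are taken from this PR's description. It models the headers that one `chat()` call would carry.

```typescript
// Sketch only: a stand-in that models the described behavior, not the PR's source.
type InferenceClass = 'priority' | 'standard';

// Header name as given in the PR description.
const INFERENCE_PRIORITY_HEADER = 'X-LiveKit-Inference-Priority';

// Hypothetical options shape carrying just the new field.
interface SketchOptions {
  inferenceClass?: InferenceClass;
}

// Hypothetical stand-in for inference.LLM.
class SketchLLM {
  private opts: SketchOptions;

  constructor(opts: SketchOptions = {}) {
    // Stored as-is, without a default.
    this.opts = opts;
  }

  // Returns the extra headers this call would add to the outbound request.
  chat(call: SketchOptions = {}): Record<string, string> {
    // The per-call value beats the constructor default.
    const resolved =
      call.inferenceClass !== undefined ? call.inferenceClass : this.opts.inferenceClass;

    const extraHeaders: Record<string, string> = {};
    // The header is set only when the resolved value is defined.
    if (resolved !== undefined) {
      extraHeaders[INFERENCE_PRIORITY_HEADER] = resolved;
    }
    return extraHeaders;
  }
}

// Constructor default applies to every call made by this instance...
const llm = new SketchLLM({ inferenceClass: 'standard' });
const defaultHeaders = llm.chat();

// ...unless a single request overrides it.
const overrideHeaders = llm.chat({ inferenceClass: 'priority' });

// With neither value set, no header is emitted.
const plainHeaders = new SketchLLM().chat();
```

In this sketch `defaultHeaders` carries `standard`, `overrideHeaders` carries `priority`, and `plainHeaders` is an empty object.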
Pre-Review Checklist

- `pnpm build`, `pnpm format:check`, `pnpm lint` (no new warnings) pass locally.
- Only the `inference.LLM` path is touched; `inference.STT`, `inference.TTS`, and `FallbackAdapter` are intentionally out of scope for this PR.

Testing

- `agents/src/inference/llm.test.ts` (7/7 pass).
- (… `main`, e.g. cerebras/mistral, are not affected by this change.)
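For reference, the seven precedence states named in the description can be written out explicitly. This is an illustration, not the contents of `llm.test.ts`: `emittedPriority` is a hypothetical pure function, and reading each "× 2" as one case for `'priority'` and one for `'standard'` is an assumption.

```typescript
// Sketch only: enumerates the seven states, assuming each "x 2" means priority and standard.
type InferenceClass = 'priority' | 'standard';

// Hypothetical helper: the header value that would be sent, or undefined for "no header".
function emittedPriority(
  ctor: InferenceClass | undefined,
  call: InferenceClass | undefined,
): InferenceClass | undefined {
  return call !== undefined ? call : ctor;
}

const states = {
  // 1. Neither value set: no header.
  noValue: emittedPriority(undefined, undefined),
  // 2-3. Constructor only.
  ctorPriority: emittedPriority('priority', undefined),
  ctorStandard: emittedPriority('standard', undefined),
  // 4-5. Per-call only.
  callPriority: emittedPriority(undefined, 'priority'),
  callStandard: emittedPriority(undefined, 'standard'),
  // 6-7. Per-call overrides constructor, in both directions.
  callPriorityOverCtorStandard: emittedPriority('standard', 'priority'),
  callStandardOverCtorPriority: emittedPriority('priority', 'standard'),
};
```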