feat(providers): add DeepInfra as a built-in inference provider #1902

Open
mmilutinovic371 wants to merge 4 commits into NVIDIA:main from mmilutinovic371:feat/providers-deepinfra-v2-only

@mmilutinovic371 mmilutinovic371 commented Jun 14, 2026

Summary

DeepInfra is one of the top providers of open-source LLM inference, and its low cost and high performance make it a good fit for agent frameworks. This PR promotes it from a documented workaround to a core built-in provider in OpenShell using Providers v2.

  • Adds deepinfra as a built-in Providers v2 profile with DEEPINFRA_API_KEY discovery
  • Adds deepinfra as a built-in inference provider alongside nvidia, openai, and anthropic
  • DEEPINFRA_API_KEY is now discovered automatically via --from-existing (through the v2 profile discovery section)
  • openshell provider list-profiles shows DeepInfra in the INFERENCE section
  • Fixes build_backend_url to correctly strip /v1 from request paths when the provider base URL contains /v1/ as an internal path segment (e.g. https://api.deepinfra.com/v1/openai) — without this fix, requests were routed to .../v1/openai/v1/chat/completions (404) instead of .../v1/openai/chat/completions

Related Issue

N/A

Changes

  • providers/deepinfra.yaml — new built-in Providers v2 profile (inference category, api.deepinfra.com:443, Bearer auth, DEEPINFRA_API_KEY)
  • crates/openshell-core/src/inference.rs: DEEPINFRA_PROFILE, normalization, profile_for entries + tests; deepinfra added to openai_compatible_profiles_include_embeddings
  • crates/openshell-router/src/backend.rs — URL construction fix narrowed to path-rooted /v1 check; regression test for nested proxy path added
  • crates/openshell-providers/src/profiles.rs — registration of deepinfra.yaml in built-in profile catalog
  • crates/openshell-providers/src/providers/deepinfra.rs — new provider discovery plugin (DEEPINFRA_API_KEY, discovery test)
  • crates/openshell-providers/src/providers/mod.rs — module declaration for deepinfra
  • crates/openshell-providers/src/lib.rs — registers deepinfra in ProviderRegistry so known_types() and TUI include it
  • crates/openshell-server/src/inference.rs — adds deepinfra to unsupported-type error message
  • docs/sandboxes/providers-v2.mdx — DeepInfra row in built-in profiles table
  • docs/sandboxes/manage-providers.mdx — DeepInfra rows (provider types + inference providers); removes old v1 workaround row that used openai type with OPENAI_API_KEY

Testing

  • mise run pre-commit passes (rust, helm, markdown, license; python:proto is a pre-existing failure unrelated to this PR)
  • 294 Rust unit tests pass across openshell-core, openshell-providers, openshell-router (cargo test -p openshell-core -p openshell-providers -p openshell-router)
  • openshell provider list-profiles shows deepinfra in INFERENCE section
  • openshell provider create --name di --type deepinfra --from-existing discovers DEEPINFRA_API_KEY
  • openshell inference set --provider di --model <model> --no-verify configures route
  • curl https://inference.local/v1/chat/completions from inside sandbox returns a valid completion from DeepInfra

Unit test results

test result: ok. 176 passed; 0 failed; 0 ignored  (openshell-core)
test result: ok. 47 passed;  0 failed; 0 ignored  (openshell-providers)
test result: ok. 54 passed;  0 failed; 0 ignored  (openshell-router)
test result: ok. 17 passed;  0 failed; 0 ignored  (openshell-router integration)

Includes inference::tests::profile_for_deepinfra, inference::tests::openai_compatible_profiles_include_embeddings (covers deepinfra), backend::tests::build_backend_url_dedupes_v1_for_base_with_v1_subpath, backend::tests::build_backend_url_preserves_v1_for_nested_proxy_path, and providers::deepinfra::tests::discovers_deepinfra_env_credentials.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (docs/sandboxes/providers-v2.mdx, docs/sandboxes/manage-providers.mdx)

…nly)

- Adds `deepinfra` as a built-in Providers v2 profile (`providers/deepinfra.yaml`)
  with inference category, Bearer auth, and `DEEPINFRA_API_KEY` discovery
- Adds `DEEPINFRA_PROFILE` to inference routing so `inference.local` works
  with the `deepinfra` provider type
- Fixes `build_backend_url` to strip `/v1` from request paths when the base
  URL contains `/v1/` as an internal segment (e.g. `api.deepinfra.com/v1/openai`),
  preventing double-versioned paths like `.../v1/openai/v1/chat/completions`
- Updates `docs/sandboxes/providers-v2.mdx` and `docs/sandboxes/manage-providers.mdx`
  with DeepInfra entries; removes the old v1 workaround row that used `openai`
  type with `OPENAI_API_KEY`

Signed-off-by: Milos Milutinovic <codemastermilos@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@johntmyers johntmyers self-assigned this Jun 15, 2026
@johntmyers johntmyers added the gator:in-review ("Gator is reviewing or awaiting PR review feedback") label Jun 15, 2026
@johntmyers

Collaborator

gator-agent

PR Review Status

Validation: This PR is project-valid for OpenShell because it is concentrated Providers v2/inference work with a clear user path, dedicated DeepInfra credentials, provider policy metadata, tests, and docs updates.
Head SHA: b9a714f4685e3d062f3b8d31acb45e7b22cffb00

Review findings:

  • crates/openshell-router/src/backend.rs: the new /v1 dedupe check now matches any base URL containing /v1/. That can regress custom/proxy endpoints such as https://proxy.example/api/v1/openai, where /v1/chat/completions may intentionally need to remain appended. Please narrow the check to the intended cases, such as a base path that starts with v1 or ends with v1, and add a regression test that preserves /v1/chat/completions for a nested proxy path.
  • crates/openshell-providers/src/lib.rs / crates/openshell-providers/src/providers/mod.rs: DeepInfra is added to the built-in profile catalog, but not to the provider discovery plugin registry. ProviderRegistry::known_types() will omit deepinfra, so the TUI create-provider modal will not list it, and legacy --from-existing fallback discovery will report it as unsupported. Please add a deepinfra provider plugin with DEEPINFRA_API_KEY, register it, and include the standard discovery test.
  • crates/openshell-server/src/inference.rs: the unsupported-inference-provider error still lists openai, anthropic, nvidia, google-vertex-ai; please add deepinfra so users get accurate debugging guidance.
  • crates/openshell-core/src/inference.rs: please include deepinfra in the openai_compatible_profiles_include_embeddings test so the OpenAI-compatible protocol contract is locked in.

Docs: Updated on existing Fern pages under docs/; no docs/index.yml navigation change is needed.

Next state: gator:in-review

- Narrow build_backend_url /v1 dedupe to URLs whose path component is
  exactly /v1 or starts with /v1/ — prevents regression on proxy
  endpoints where /v1 is buried deeper (e.g. /api/v1/openai); add
  regression test for the nested proxy path case
- Add deepinfra provider plugin with DEEPINFRA_API_KEY discovery,
  registered in ProviderRegistry so known_types() and TUI include it
- Add deepinfra to unsupported-inference-provider error message in
  openshell-server for accurate user-facing debugging guidance
- Add deepinfra to openai_compatible_profiles_include_embeddings test
  to lock in the OpenAI-compatible protocol contract

Signed-off-by: Milos Milutinovic <codemastermilos@gmail.com>
@mmilutinovic371

Author

All four gator review findings addressed in 1c3fc5b:

  1. build_backend_url /v1 dedupe narrowed — changed from base.contains("/v1/") to extracting the URL's path component and checking path == "/v1" || path.starts_with("/v1/"). This preserves the full path for proxy endpoints like https://proxy.example/api/v1/openai (path is /api/v1/..., not rooted at /v1) while still deduplicating for DeepInfra (/v1/openai) and OpenAI/Nvidia (/v1). Regression test build_backend_url_preserves_v1_for_nested_proxy_path added.

  2. DeepInfra provider plugin added — new providers/deepinfra.rs with DEEPINFRA_API_KEY discovery spec, registered in ProviderRegistry::new() and providers/mod.rs. known_types() now returns deepinfra; discovery test included.

  3. Error message updated: deepinfra added to the supported providers list in openshell-server/src/inference.rs.

  4. Embeddings test extended: deepinfra added to openai_compatible_profiles_include_embeddings in openshell-core/src/inference.rs.
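
For readers unfamiliar with the discovery step in item 2, here is a minimal sketch of environment-based credential discovery for `DEEPINFRA_API_KEY`. It is not the code in providers/deepinfra.rs: the `DiscoveredCredentials` type, the `discover_from_env` function, and the injected `lookup` closure are all hypothetical names chosen for illustration, and treating a blank value as "not set" is an assumption.

```rust
use std::collections::HashMap;

/// Environment variables checked for this provider (the variable name is
/// from the PR; the constant itself is hypothetical).
const DEEPINFRA_ENV_VARS: &[&str] = &["DEEPINFRA_API_KEY"];

/// Hypothetical result type: the provider type plus the credentials found.
#[derive(Debug, PartialEq)]
struct DiscoveredCredentials {
    provider_type: &'static str,
    values: HashMap<String, String>,
}

/// Looks up each known variable through `lookup` and returns the non-blank
/// ones. Taking the lookup as a closure keeps the function testable without
/// touching the real process environment.
fn discover_from_env<F>(lookup: F) -> Option<DiscoveredCredentials>
where
    F: Fn(&str) -> Option<String>,
{
    let values: HashMap<String, String> = DEEPINFRA_ENV_VARS
        .iter()
        .filter_map(|&name| {
            lookup(name)
                // Assumption: a whitespace-only value counts as unset.
                .filter(|value| !value.trim().is_empty())
                .map(|value| (name.to_string(), value))
        })
        .collect();

    if values.is_empty() {
        None
    } else {
        Some(DiscoveredCredentials {
            provider_type: "deepinfra",
            values,
        })
    }
}
```

Against the real environment this would be called as `discover_from_env(|name| std::env::var(name).ok())`; a discovery test can pass a closure that returns a fixed key instead.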

@johntmyers

Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head 1c3fc5bd47f91f22038eddafb49026ae675f9295 after @mmilutinovic371's 2026-06-15 18:47 UTC comment saying the prior gator findings were addressed.

Disposition: partially resolved.

Remaining items:

  • crates/openshell-router/src/backend.rs: the updated /v1 dedupe logic now preserves nested proxy paths such as https://proxy.example/api/v1/openai, but it regresses documented OpenAI-compatible endpoints whose base URL ends in /v1 below a non-root prefix, for example https://api.groq.com/openai/v1. Those would route to .../openai/v1/v1/chat/completions. Please dedupe when the base URL path has v1 as the first or final path segment, while still preserving the new nested proxy case, and add a regression test for https://api.groq.com/openai/v1 plus /v1/chat/completions.

Resolved from the prior review:

  • DeepInfra provider discovery is registered with DEEPINFRA_API_KEY.
  • The unsupported inference provider error includes deepinfra.
  • The OpenAI-compatible embeddings protocol test includes deepinfra.
  • Docs were updated on the existing Fern provider pages; no navigation change is needed.

Next state: gator:in-review

Extends the /v1 deduplication logic to also strip /v1 from request paths
when the base URL's path ends with /v1 (e.g. https://api.groq.com/openai/v1).
The previous fix only matched paths starting with /v1/, which regressed
providers like Groq whose base path has /v1 as the last segment rather than
the first. The nested-proxy exclusion (e.g. /api/v1/openai) is preserved
since /v1 appears in the middle — neither first nor last segment. Adds a
regression test for the Groq-style base URL.

Signed-off-by: Milos Milutinovic <codemastermilos@gmail.com>
@mmilutinovic371

Author

Addressed in ce0ccda.

The root cause: the path-rooted check (starts_with("/v1/")) only matched providers where /v1 is the first path segment. Groq-style base URLs (/openai/v1) have /v1 as the last segment, so dedup was skipped and /v1 doubled.

Fix: dedup when the base URL's path starts with /v1/ (deepinfra: /v1/openai) or ends with /v1 (openai/nvidia: /v1, groq: /openai/v1). The nested-proxy exclusion (/api/v1/openai) still holds — /v1 there is neither first nor last segment so neither condition fires.

Regression test build_backend_url_dedupes_v1_for_base_ending_with_v1 added for https://api.groq.com/openai/v1 + /v1/chat/completions → https://api.groq.com/openai/v1/chat/completions.
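
The rule described above can be sketched roughly as follows. This is an illustration of the first-or-last-segment check only, not the code in crates/openshell-router/src/backend.rs: `url_path`, `base_is_versioned`, and `build_backend_url_sketch` are hypothetical names, and the hand-rolled path extraction assumes simple absolute URLs rather than full URL parsing.

```rust
/// Returns the path component of an absolute URL, without query, fragment,
/// or trailing slash. Hypothetical helper; assumes a `scheme://host/...` shape.
fn url_path(base: &str) -> &str {
    // Skip the scheme ("https://") if present.
    let after_scheme = base.split_once("://").map_or(base, |(_, rest)| rest);
    // The path starts at the first '/' after the authority.
    let path = after_scheme.find('/').map_or("", |i| &after_scheme[i..]);
    // Drop any query string or fragment.
    let end = path
        .find(|c: char| c == '?' || c == '#')
        .unwrap_or(path.len());
    path[..end].trim_end_matches('/')
}

/// True when `v1` is the first or the final segment of the base path.
fn base_is_versioned(base: &str) -> bool {
    let path = url_path(base);
    path.starts_with("/v1/") || path.ends_with("/v1")
}

/// Joins base URL and request path, dropping the request's leading `/v1`
/// when the base path already carries the version segment.
fn build_backend_url_sketch(base: &str, request_path: &str) -> String {
    let base = base.trim_end_matches('/');
    let suffix = if base_is_versioned(base) {
        match request_path.strip_prefix("/v1") {
            // Strip only a whole `/v1` segment, not a prefix like `/v1beta`.
            Some(rest) if rest.is_empty() || rest.starts_with('/') => rest,
            _ => request_path,
        }
    } else {
        request_path
    };
    format!("{base}{suffix}")
}
```

With this sketch, `https://api.deepinfra.com/v1/openai` and `https://api.groq.com/openai/v1` both lose the duplicate `/v1` from `/v1/chat/completions`, while `https://proxy.example/api/v1/openai` keeps it because `v1` sits in the middle of the base path.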

@johntmyers

Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head ce0ccda6aa14b8ee33e1fa0c9709e7695bbe4df1 after @mmilutinovic371's 2026-06-15 18:59 UTC comment saying the remaining /v1 routing feedback was addressed.

Disposition: partially resolved.

Resolved items:

  • The DeepInfra and Groq-style /v1 dedupe cases are now covered while preserving the nested proxy-path case.
  • DeepInfra provider discovery is registered with DEEPINFRA_API_KEY.
  • The unsupported inference provider error includes deepinfra.
  • The OpenAI-compatible embeddings protocol test includes deepinfra.
  • Fern docs under docs/ were updated; no docs/index.yml navigation change is needed.

Remaining items:

  • crates/openshell-server/src/grpc/provider.rs: list_provider_profiles_returns_built_in_profile_categories still expects the old built-in profile inventory. Because providers/deepinfra.yaml is now included in the default profile catalog, this assertion should include deepinfra between cursor and github; otherwise CI should fail when that test runs.
  • crates/openshell-server/src/grpc/provider.rs / crates/openshell-core/src/telemetry.rs: DeepInfra now normalizes as a built-in provider, but provider lifecycle telemetry still falls through to Custom. Please either add a deepinfra telemetry bucket and mapping, or add a test/comment that documents why this built-in provider intentionally remains in the custom bucket.

Non-blocking docs note:

  • architecture/gateway.md still lists supported cluster inference providers as openai, anthropic, nvidia, and google-vertex-ai; please update that overview if architecture docs are expected to stay current with this provider addition.

Next state: gator:in-review

…t test

- Add DeepInfra variant to ProviderProfile telemetry enum and from_raw()
  mapping so deepinfra providers are tracked in their own bucket rather
  than falling through to Custom
- Map deepinfra in telemetry_provider_profile() in openshell-server
- Add deepinfra to list_provider_profiles_returns_built_in_profile_categories
  test (sorted between cursor and github)
- Update architecture/gateway.md inference provider list to include deepinfra

Signed-off-by: Milos Milutinovic <codemastermilos@gmail.com>
@mmilutinovic371

Author

Addressed in d4e5e98:

  • Telemetry: Added Deepinfra variant to ProviderProfile enum in openshell-core/src/telemetry.rs with as_str() → "deepinfra" and from_raw("deepinfra") → Deepinfra. Mapped it in telemetry_provider_profile() in openshell-server so deepinfra providers get their own bucket rather than falling through to Custom.
  • Profile list test: Added "deepinfra" between "cursor" and "github" in list_provider_profiles_returns_built_in_profile_categories.
  • Architecture docs: Updated architecture/gateway.md to include deepinfra in the supported cluster inference provider types.
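
To make the telemetry change concrete, here is a reduced sketch of an enum with `as_str()` and `from_raw()` where `deepinfra` has its own bucket and unknown types fall through to `Custom`. The variant set shown is an assumption trimmed for illustration and does not mirror the full enum in openshell-core/src/telemetry.rs; exact-match string handling is also an assumption.

```rust
/// Reduced, hypothetical version of a provider telemetry bucket enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProviderProfile {
    Openai,
    Anthropic,
    Nvidia,
    Deepinfra,
    Custom,
}

impl ProviderProfile {
    /// Stable label reported for this bucket.
    fn as_str(self) -> &'static str {
        match self {
            ProviderProfile::Openai => "openai",
            ProviderProfile::Anthropic => "anthropic",
            ProviderProfile::Nvidia => "nvidia",
            ProviderProfile::Deepinfra => "deepinfra",
            ProviderProfile::Custom => "custom",
        }
    }

    /// Maps a raw provider type to its bucket; anything unrecognized is
    /// reported as `Custom`.
    fn from_raw(raw: &str) -> Self {
        match raw {
            "openai" => ProviderProfile::Openai,
            "anthropic" => ProviderProfile::Anthropic,
            "nvidia" => ProviderProfile::Nvidia,
            "deepinfra" => ProviderProfile::Deepinfra,
            _ => ProviderProfile::Custom,
        }
    }
}
```

A focused assertion such as `from_raw("deepinfra") == Deepinfra` is the kind of check that would catch the provider silently falling back into the custom bucket.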

@johntmyers

Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head d4e5e980c697a45ae1023af98c7bf058fc717464 after @mmilutinovic371's 2026-06-15 19:33 UTC comment saying the remaining telemetry/profile-list feedback was addressed.

Disposition: resolved.

Remaining items:

  • No blocking items remain.

Review: The independent reviewer found no blocking findings at this head. The only follow-up was a non-blocking suggestion to add a focused DeepInfra telemetry assertion.

Docs: Fern docs under docs/sandboxes/ and the architecture overview were updated; no docs/index.yml navigation change is needed.

E2E: I applied test:e2e because this touches provider credential discovery and inference routing behavior.

Next state: gator:watch-pipeline

@johntmyers johntmyers added the test:e2e ("Requires end-to-end coverage") and gator:watch-pipeline ("Gator is monitoring PR CI/CD status") labels and removed the gator:in-review ("Gator is reviewing or awaiting PR review feedback") label Jun 15, 2026
@johntmyers

Collaborator

/ok to test d4e5e98

@github-actions

Label test:e2e applied for d4e5e98. Open Branch E2E Checks, find the run for commit d4e5e98, and click Re-run all jobs to execute with the label set. The run will execute the standard E2E suite after building the required gateway and supervisor images once. The matching required CI gate status on this PR will flip green automatically once the run finishes.

@johntmyers

Collaborator

gator-agent

Pipeline Failure

Head SHA: d4e5e980c697a45ae1023af98c7bf058fc717464

OpenShell / Branch Checks failed because both Rust jobs failed cargo fmt --all -- --check in crates/openshell-router/src/backend.rs around the new Groq-style /v1 regression test assertion.

Next action: @mmilutinovic371, please run cargo fmt --all or otherwise apply the formatter output, push the formatting-only fix, and the pipeline can be rechecked. The test:e2e label remains appropriate, but gator is moving this back to gator:in-review until the required branch check is green.

@johntmyers johntmyers added the gator:in-review ("Gator is reviewing or awaiting PR review feedback") label and removed the gator:watch-pipeline ("Gator is monitoring PR CI/CD status") label Jun 15, 2026
Labels

gator:in-review ("Gator is reviewing or awaiting PR review feedback"), test:e2e ("Requires end-to-end coverage")
