Resolve authz ConfigMap for VirtualMCPServer#5290

Open
blkt wants to merge 2 commits into main from feat/vmcp-resolve-authz-configmap

Conversation

@blkt
Contributor

@blkt blkt commented May 15, 2026

A VirtualMCPServer with spec.incomingAuth.authzConfig.type: configMap
silently produced a vmcp config.yaml that referenced the unresolved
configMap type token. The vmcp binary's AuthzConfig validator only
accepts cedar or none, so the pod crashed in CrashLoopBackOff at
startup. Inline authz also silently dropped GroupClaimName,
RoleClaimName, GroupEntityType, and EntitiesJSON, so any enterprise
Cedar policy that walked a Client → ClaimGroup → PlatformRole hierarchy
denied every request because the runtime Cedar authorizer built
THVGroup:: parents while the entity store contained ClaimGroup::
entities.
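
The blanket denial comes down to a mismatch in entity identifiers. The following is a minimal sketch of that mismatch, not the actual authorizer code: `groupParents` is a hypothetical helper, and the fallback to `THVGroup` when no `GroupEntityType` is forwarded is an assumption inferred from the description above.

```go
package main

import "fmt"

// defaultGroupEntityType is assumed to be the fallback entity type used
// when GroupEntityType never reaches the authorizer.
const defaultGroupEntityType = "THVGroup"

// groupParents is a hypothetical helper that turns JWT group claims into
// Cedar-style parent entity identifiers of the form Type::"name".
func groupParents(groups []string, groupEntityType string) []string {
	if groupEntityType == "" {
		groupEntityType = defaultGroupEntityType
	}
	parents := make([]string, 0, len(groups))
	for _, g := range groups {
		parents = append(parents, fmt.Sprintf("%s::%q", groupEntityType, g))
	}
	return parents
}

func main() {
	// Entity store as an enterprise policy set might define it.
	store := map[string]bool{`ClaimGroup::"platform-admins"`: true}

	// Field dropped: the generated parent is absent from the store.
	dropped := groupParents([]string{"platform-admins"}, "")
	fmt.Println(dropped[0], store[dropped[0]])

	// Field forwarded: the generated parent matches a stored entity.
	forwarded := groupParents([]string{"platform-admins"}, "ClaimGroup")
	fmt.Println(forwarded[0], store[forwarded[0]])
}
```

Because the parent identifier never appears in the entity store when the field is dropped, any policy that walks the group hierarchy has nothing to match.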

Wire the configMap path end-to-end, plumb the four missing fields
through both source paths, and move PrimaryUpstreamProvider onto the
auth server config where it belongs:

  • Extract LoadAuthzConfigFromConfigMap as the shared fetch/parse/
    validate helper in controllerutil; AddAuthzConfigOptions now
    delegates to it. The vMCP converter calls the same helper so the
    failure modes match the MCPServer/MCPRemoteProxy runner path.

  • Extend pkg/vmcp/config.AuthzConfig with EntitiesJSON,
    GroupClaimName, RoleClaimName, GroupEntityType, and forward
    all four into cedar.ConfigOptions in the Cedar middleware factory.
    EntitiesJSON defaults to "[]" when unset to preserve the
    historical Cedar contract.

  • Lift the source-agnostic Cedar JWT-claim mapping fields
    (GroupClaimName, RoleClaimName, GroupEntityType) onto
    AuthzConfigRef so they work identically for inline and configMap
    users. For configMap users the parsed payload provides the default
    and the spec-level field overrides when set.

  • Move PrimaryUpstreamProvider onto EmbeddedAuthServerConfig
    (spec.authServerConfig.primaryUpstreamProvider on
    VirtualMCPServer). The field describes which upstream IDP token
    Cedar reads claims from, which is a property of the embedded auth
    server, not of the authz policies. Placing it next to
    upstreamProviders co-locates the choice with the set it selects
    from. Reachable from MCPExternalAuthConfig of type
    embeddedAuthServer as well; on MCPServer/MCPRemoteProxy
    (single-upstream consumers) the only validated values are empty
    (auto-select) or the name of that single upstream. The legacy
    spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider
    location is read as a backward-compatibility fallback for one
    release with a Warning event of reason
    AuthzPrimaryUpstreamProviderDeprecated.

  • Pre-validate the referenced authz ConfigMap in the controller
    and distinguish NotFound from other parse/validation failures
    via two condition reasons on AuthConfigured: the existing shared
    AuthzConfigMapNotFound and a new AuthzConfigMapInvalid. This
    mirrors the diagnostic MCPRemoteProxy emits today and gives
    users a status-level error before the converter fails opaquely.

  • Validate the resolved PrimaryUpstreamProvider (from either
    source) against the declared embedded auth server upstreams so an
    unresolvable provider is rejected at convert time.
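
The override and validation rules in the list above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the converter's real code: `payloadConfig`, `claimMapping`, `resolve`, and `validatePrimaryUpstream` are hypothetical names, and the upstream names in `main` are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// payloadConfig is a hypothetical stand-in for the parsed ConfigMap payload.
type payloadConfig struct {
	EntitiesJSON    string
	GroupClaimName  string
	RoleClaimName   string
	GroupEntityType string
}

// claimMapping is a hypothetical stand-in for the spec-level fields on
// AuthzConfigRef.
type claimMapping struct {
	GroupClaimName  string
	RoleClaimName   string
	GroupEntityType string
}

// firstNonEmpty returns override when set, otherwise fallback.
func firstNonEmpty(override, fallback string) string {
	if override != "" {
		return override
	}
	return fallback
}

// resolve applies the described precedence: the payload supplies defaults,
// spec-level fields override when set, and EntitiesJSON falls back to "[]".
func resolve(payload payloadConfig, spec claimMapping) payloadConfig {
	return payloadConfig{
		EntitiesJSON:    firstNonEmpty(payload.EntitiesJSON, "[]"),
		GroupClaimName:  firstNonEmpty(spec.GroupClaimName, payload.GroupClaimName),
		RoleClaimName:   firstNonEmpty(spec.RoleClaimName, payload.RoleClaimName),
		GroupEntityType: firstNonEmpty(spec.GroupEntityType, payload.GroupEntityType),
	}
}

var errUnknownUpstream = errors.New("primary upstream provider not declared")

// validatePrimaryUpstream accepts an empty name (auto-select) or a name
// that appears among the declared upstream providers.
func validatePrimaryUpstream(name string, declared []string) error {
	if name == "" {
		return nil
	}
	for _, d := range declared {
		if d == name {
			return nil
		}
	}
	return fmt.Errorf("%w: %q", errUnknownUpstream, name)
}

func main() {
	payload := payloadConfig{GroupClaimName: "groups", GroupEntityType: "ClaimGroup"}
	spec := claimMapping{GroupClaimName: "team_groups"}
	fmt.Printf("%+v\n", resolve(payload, spec))
	fmt.Println(validatePrimaryUpstream("okta", []string{"github"}))
}
```

Treating the empty string as "unset" keeps the sketch short; a real implementation may need to distinguish an explicitly empty value from an omitted one.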

CRD compatibility: not a breaking change. The legacy
spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider
location continues to work, with a Warning event emitted whenever it
is read; its removal is planned one release after the deprecation cycle.
The new fields on AuthzConfigRef (GroupClaimName, RoleClaimName,
GroupEntityType) and on EmbeddedAuthServerConfig
(PrimaryUpstreamProvider) are additive.

Closes #4919
Closes #5208
Closes #5277

Large PR Justification

~950 lines of code are relevant; everything else is auto-generated.

@blkt blkt self-assigned this May 15, 2026
@github-actions github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label May 15, 2026
Contributor

@github-actions github-actions Bot left a comment


Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 15, 2026
@github-actions github-actions Bot dismissed their stale review May 15, 2026 08:58

Large PR justification has been provided. Thank you!

@github-actions
Contributor

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@blkt
Contributor Author

blkt commented May 15, 2026

Heads up on dropping InlineAuthzConfig.PrimaryUpstreamProvider. Flagging because it's a breaking change.

The Cedar fields touched here split into two buckets:

  • Policy content (Policies, EntitiesJSON): lives wherever the policies live.
  • JWT-claim mapping (PrimaryUpstreamProvider, GroupClaimName, RoleClaimName, GroupEntityType): describes how Cedar reads the token, a separate concern from the policies themselves.

So the second bucket went onto AuthzConfigRef directly. Same field, same semantics, regardless of whether you're inline or configMap.

The non-breaking alternative was just leaving PrimaryUpstreamProvider on InlineAuthzConfig. Two reasons I didn't:

  • configMap users would still have no spec-level knob; they'd have to mutate the ConfigMap, and in our enterprise flow that's owned by an external controller.
  • Two YAML paths for the same value, plus ExplicitPrimaryUpstreamProvider() keeps an inline-only branch that's easy to miss.

The cost is the schema break: anyone setting spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider today needs to move it up one level. If that feels too aggressive, happy to do a deprecation window instead (keep the inline field for a release, fall through in the helper, remove next minor). Small isolated change.

@blkt blkt force-pushed the feat/vmcp-resolve-authz-configmap branch from bf2fafb to 7895464 Compare May 15, 2026 09:37
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 15, 2026
@codecov

codecov Bot commented May 15, 2026

Codecov Report

❌ Patch coverage is 92.94872% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.40%. Comparing base (0a741f7) to head (0925161).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| cmd/thv-operator/pkg/controllerutil/authz.go | 85.29% | 3 Missing and 2 partials ⚠️ |
| ...perator/controllers/virtualmcpserver_controller.go | 90.90% | 2 Missing and 1 partial ⚠️ |
| cmd/thv-operator/pkg/vmcpconfig/converter.go | 95.23% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5290      +/-   ##
==========================================
- Coverage   68.41%   68.40%   -0.02%     
==========================================
  Files         620      620              
  Lines       63316    63401      +85     
==========================================
+ Hits        43317    43367      +50     
- Misses      16772    16798      +26     
- Partials     3227     3236       +9     


@blkt blkt force-pushed the feat/vmcp-resolve-authz-configmap branch from 7895464 to f8c1696 Compare May 15, 2026 11:40
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 15, 2026
@blkt blkt force-pushed the feat/vmcp-resolve-authz-configmap branch from f8c1696 to 12defec Compare May 19, 2026 10:45
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 19, 2026
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 19, 2026
The relocation of `PrimaryUpstreamProvider` onto `EmbeddedAuthServerConfig`
in the parent commit introduced two new behaviours and surfaced one
pre-existing coverage gap. None of the three were captured by a test. Add
focused unit tests that pin each behaviour so a future refactor cannot
silently drop it.

* `*VirtualMCPServer.ExplicitPrimaryUpstreamProvider()` is the central
  precedence helper: canonical `spec.authServerConfig.primaryUpstreamProvider`
  wins, deprecated `spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider`
  is the fallback, otherwise empty. A unit test covers all four
  combinations (canonical only, deprecated only, both, neither) and locks
  the `fromDeprecated` flag the caller uses to gate the deprecation event.

* `AuthzConfigRef.DeprecatedInlinePrimaryUpstreamProvider()` is the helper
  the advisory paths on `MCPServer` and `MCPRemoteProxy` read. A unit
  test covers the nil receiver, nil `Inline` subtree, inline without the
  field, and inline with the field set.

* `MCPServer.validateAuthzPrimaryUpstreamProviderIgnored` and the
  equivalent on `MCPRemoteProxy` set the `AuthzPrimaryUpstreamProviderIgnored`
  advisory condition when the deprecated field is set on a CR that has
  no embedded auth server to act on it. Both functions existed before
  this PR but had no test coverage at all. The relocation of the field
  did not change their behaviour, but the absence of coverage means a
  regression there would go unnoticed. Add table-driven tests that
  exercise the True-on-set, absent-when-unset, and stale-cleared cases.

* `VirtualMCPServerReconciler.validateAuthzUpstreamAvailable` emits a
  `Warning` event with reason `AuthzPrimaryUpstreamProviderDeprecated`
  whenever it resolves the primary from the deprecated location, so
  operators see a kubectl-visible signal even when the deprecated value
  still validates. A test using `events.NewFakeRecorder` confirms the
  event fires for the deprecated path, does not fire for the canonical
  path, and does not fire when neither location is set.

No production code changes; only tests.
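
For readers without the diff open, the precedence these tests pin down might look roughly like the sketch below. The struct shapes are trimmed-down, hypothetical versions of the CRD types, and the `(name, fromDeprecated)` return signature is an assumption based on the description above, not a copy of the real helper.

```go
package main

import "fmt"

// Hypothetical, simplified shapes of the CRD types; only the fields
// needed to show the precedence are included.
type InlineAuthzConfig struct {
	PrimaryUpstreamProvider string // deprecated location
}

type AuthzConfigRef struct {
	Inline *InlineAuthzConfig
}

type IncomingAuth struct {
	AuthzConfig *AuthzConfigRef
}

type EmbeddedAuthServerConfig struct {
	PrimaryUpstreamProvider string // canonical location
}

type VirtualMCPServerSpec struct {
	IncomingAuth     *IncomingAuth
	AuthServerConfig *EmbeddedAuthServerConfig
}

type VirtualMCPServer struct {
	Spec VirtualMCPServerSpec
}

// DeprecatedInlinePrimaryUpstreamProvider is nil-safe: a nil receiver or a
// nil Inline subtree yields the empty string.
func (r *AuthzConfigRef) DeprecatedInlinePrimaryUpstreamProvider() string {
	if r == nil || r.Inline == nil {
		return ""
	}
	return r.Inline.PrimaryUpstreamProvider
}

// ExplicitPrimaryUpstreamProvider prefers the canonical field, falls back
// to the deprecated one, and reports whether the fallback was used.
func (v *VirtualMCPServer) ExplicitPrimaryUpstreamProvider() (name string, fromDeprecated bool) {
	if v == nil {
		return "", false
	}
	if c := v.Spec.AuthServerConfig; c != nil && c.PrimaryUpstreamProvider != "" {
		return c.PrimaryUpstreamProvider, false
	}
	var ref *AuthzConfigRef
	if v.Spec.IncomingAuth != nil {
		ref = v.Spec.IncomingAuth.AuthzConfig
	}
	if legacy := ref.DeprecatedInlinePrimaryUpstreamProvider(); legacy != "" {
		return legacy, true
	}
	return "", false
}

func main() {
	legacyOnly := &VirtualMCPServer{Spec: VirtualMCPServerSpec{
		IncomingAuth: &IncomingAuth{AuthzConfig: &AuthzConfigRef{
			Inline: &InlineAuthzConfig{PrimaryUpstreamProvider: "legacy-idp"},
		}},
	}}
	name, deprecated := legacyOnly.ExplicitPrimaryUpstreamProvider()
	fmt.Println(name, deprecated)
}
```

Returning the flag alongside the value lets the caller decide whether to emit the deprecation event without re-inspecting the spec.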
@blkt blkt force-pushed the feat/vmcp-resolve-authz-configmap branch from 12defec to 0925161 Compare May 19, 2026 12:54
@github-actions github-actions Bot removed the size/XL Extra large PR: 1000+ lines changed label May 19, 2026
@github-actions github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label May 19, 2026

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

1 participant