Cache Azure CLI delegated-auth tokens to fix memory exhaustion and slow model catalog loads #8458
Copilot wants to merge 4 commits
… fan-out Co-authored-by: JeffreyCA <9157833+JeffreyCA@users.noreply.github.com>
/azp run azure-dev - cli
Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This PR adds per-tenant caching for legacy Azure CLI authentication credentials so concurrent token requests share one AzureCLICredential instance and avoid spawning many az subprocesses.
Changes:
- Adds a mutex-protected `azCliCredentials` cache to `auth.Manager`.
- Routes the `auth.useAzCliAuth` path through a cached credential helper.
- Adds a test verifying credential reuse for the same tenant and separation across tenants.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| cli/azd/pkg/auth/manager.go | Adds the per-tenant Azure CLI credential cache and helper used by CredentialForCurrentUser. |
| cli/azd/pkg/auth/manager_test.go | Adds coverage for cached credential reuse and tenant-specific cache entries. |
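As a rough illustration of the per-tenant cache summarized above, here is a minimal sketch. It is not the code from `manager.go`: the `manager` and `cliCredential` types and the `credentialForTenant` helper are hypothetical stand-ins, and only the `azCliCredentials` field name is taken from the overview.

```go
package main

import (
	"fmt"
	"sync"
)

// cliCredential is a hypothetical stand-in for the real Azure CLI credential.
type cliCredential struct {
	tenantID string
}

// manager is a hypothetical, trimmed-down analogue of auth.Manager.
type manager struct {
	mu               sync.Mutex
	azCliCredentials map[string]*cliCredential // keyed by tenant ID
}

// credentialForTenant returns the cached credential for tenantID,
// creating and storing one on first use.
func (m *manager) credentialForTenant(tenantID string) *cliCredential {
	m.mu.Lock()
	defer m.mu.Unlock()

	if cred, ok := m.azCliCredentials[tenantID]; ok {
		return cred
	}
	if m.azCliCredentials == nil {
		m.azCliCredentials = make(map[string]*cliCredential)
	}
	cred := &cliCredential{tenantID: tenantID}
	m.azCliCredentials[tenantID] = cred
	return cred
}

func main() {
	m := &manager{}
	a1 := m.credentialForTenant("tenant-a")
	a2 := m.credentialForTenant("tenant-a")
	b := m.credentialForTenant("tenant-b")
	fmt.Println(a1 == a2, a1 == b) // prints "true false"
}
```

Holding the mutex across the lookup and the insert keeps two concurrent callers for the same tenant from each creating their own credential.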
Wrap AzureCLICredential in a cachingCredential that reuses tokens per scope/tenant and single-flights concurrent acquisitions, so fan-out flows like loading the AI model catalog spawn a single `az` subprocess instead of one per request. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
hemarina left a comment
Nice fix — the layering (one cachingCredential per tenant inside the manager, with singleflight + in-memory cache inside the credential) is clean, and scoping the cache to a single azd-command lifetime sidesteps the persistence/invalidation rabbit hole entirely. singleflight was already pulled into azd by pkg/azapi and pkg/project, so no new dependency surface. Tests use the right Go 1.26 patterns (wg.Go, t.Context()) per cli/azd/AGENTS.md, and the unbuffered gateReady channel in TestCachingCredentialSingleFlight correctly deadlocks if singleflight ever fails to coalesce — that's the right failure mode for race-sensitive tests.
Approving. Two small things below, both non-blocking — one test-coverage nit and one behavior-change note worth surfacing in the PR description.
vhvb1989 left a comment
Nice work! The caching + singleflight approach is clean and well-tested.
One note for a potential follow-up: the same subprocess-per-call problem exists on the extension side. azdext.TokenProvider uses AzureDeveloperCLICredential, which — like AzureCLICredential — has no built-in caching and spawns an azd auth token subprocess on every GetToken call. So extensions that fan out concurrent Azure SDK clients (e.g. the model catalog load in azure.ai.agents) could hit the same memory/latency issue even when using azd credentials instead of az credentials.
It would be worth promoting cachingCredential (or a similar wrapper) so it can be reused in azdext.TokenProvider as well. This could speed up the model catalog load for the default azd auth path too, not just the delegated az CLI path. No need to block this PR on it — just flagging it as a follow-up if we observe the same bottleneck there.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📋 Milestone: June 2026
This work is tracked for June 2026. The team will review it soon!
Fixes #8455
This PR fixes a memory-exhaustion failure and a follow-on performance regression that occur when `azd` delegates authentication to the Azure CLI (`auth.useAzCliAuth = true`), most visibly during the `azd ai agent init` flow when it loads the full model catalog.
Motivation
With delegated auth, `azd` previously created a new `AzureCLICredential` per request. That credential caches nothing and spawns an `az account get-access-token` subprocess on every `GetToken` call, so flows that fan out many concurrent Azure SDK clients (like the catalog load, which queries every region in parallel with a fresh client per region) spawned one `az` subprocess per request, all at once. In constrained environments like Cloud Shell this could exhaust memory and fail the command. Sharing one credential per tenant resolved the crash by serializing the requests, but because the credential still cached nothing, each request kept spawning its own `az` subprocess, which made the catalog load slow.
What changed
This PR wraps the Azure CLI credential in an in-memory caching credential, shared per tenant. Tokens are cached by scope, tenant, CAE flag, and claims, reused until near expiry, and concurrent acquisitions for the same key are de-duplicated with single-flight so a burst of parallel requests collapses into a single `az` subprocess. The cache lives only for the command invocation and never persists to disk. A catalog load that previously spawned one `az` subprocess per region now spawns a single subprocess and serves the rest from memory, removing both the memory pressure and the serialized latency.
Results
Time to load the full model catalog during `azd ai agent init` with delegated auth in Cloud Shell:
Behavior change
Because tokens are now cached for the duration of a command, there is a subtle change around running `az logout` in a parallel shell mid-command. Previously every request shelled out to `az`, so an `az logout` elsewhere surfaced as an auth error on the next request within the same `azd` command. Now an in-flight `azd` command keeps using its cached token until shortly before expiry (token TTL minus a five-minute refresh offset), so it will not be interrupted by a concurrent `az logout`. This is the desired behavior: a long-running `azd up` should not fail because another shell logged out. The cache is in-memory only, so subsequent `azd` invocations are unaffected.
Testing
Added unit tests covering token reuse, per-scope caching, refresh near expiry, and single-flight de-duplication under concurrency. The new tests and the existing auth suite pass under the race detector.