feat: auto-clear auth cache on re-login to fix stale token issues #8414
vhvb1989 wants to merge 7 commits
When users encounter AADSTS700082 (expired refresh token) errors after re-authenticating with azd auth login, the stale MSAL cache and credential files prevent the new login from taking effect. The --reset flag clears all cached authentication data before performing the login flow:
- MSAL token cache (auth/msal/)
- Credential cache files
- auth.json (current user config)
- auth.claims
- Subscription cache

Fixes #7541
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Documents all supported authentication methods including interactive browser, device code, service principal (secret/certificate), federated credentials (GitHub Actions, Azure Pipelines, generic OIDC), managed identity, delegated auth via Azure CLI, and external authentication. Also documents the new --reset flag, when to use it, what files it clears, and how authentication state is stored on disk. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a --reset flag to azd auth login to clear cached authentication artifacts (MSAL cache, credential files, auth.json/claims, and subscriptions cache) before running the login flow, addressing cases where stale local state prevents re-auth from taking effect (e.g., AADSTS700082).
Changes:
- Added `auth.Manager.CleanAllAuthCache()` to remove auth-related cache files/directories and recreate the MSAL cache directory.
- Wired new `--reset` flag into `azd auth login` to clear auth + subscriptions cache prior to login, and updated help/completions snapshots.
- Added unit tests covering `CleanAllAuthCache` behavior; added new authentication documentation covering reset behavior and auth modes.
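
The cleanup summarized above can be sketched roughly as follows. This is an illustration only, not the PR's `CleanAllAuthCache()` implementation: the function name `cleanAuthCache`, the `cfgRoot` parameter, the `cache.bin` file name, and the `0o700`/`0o600` permissions are assumptions, while the `auth/`, `auth/msal/`, `auth.json`, and `auth.claims` names come from the PR description.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// cleanAuthCache is a hypothetical stand-in for the PR's CleanAllAuthCache().
// It removes the auth artifacts under cfgRoot and recreates auth/msal/.
func cleanAuthCache(cfgRoot string) error {
	// os.RemoveAll returns nil when the path does not exist.
	if err := os.RemoveAll(filepath.Join(cfgRoot, "auth")); err != nil {
		return fmt.Errorf("removing auth directory: %w", err)
	}
	for _, name := range []string{"auth.json", "auth.claims"} {
		err := os.Remove(filepath.Join(cfgRoot, name))
		if err != nil && !errors.Is(err, fs.ErrNotExist) {
			return fmt.Errorf("removing %s: %w", name, err)
		}
	}
	// 0o700 is an assumed permission, not taken from the PR.
	if err := os.MkdirAll(filepath.Join(cfgRoot, "auth", "msal"), 0o700); err != nil {
		return fmt.Errorf("recreating MSAL cache directory: %w", err)
	}
	return nil
}

// run seeds a throwaway config root with stale files, cleans it, and checks the result.
func run() error {
	root, err := os.MkdirTemp("", "azd-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)

	msal := filepath.Join(root, "auth", "msal")
	if err := os.MkdirAll(msal, 0o700); err != nil {
		return err
	}
	// "cache.bin" is a made-up file name used only for this demonstration.
	stale := []string{filepath.Join(msal, "cache.bin"), filepath.Join(root, "auth.json")}
	for _, p := range stale {
		if err := os.WriteFile(p, []byte("stale"), 0o600); err != nil {
			return err
		}
	}
	if err := cleanAuthCache(root); err != nil {
		return err
	}
	for _, p := range stale {
		if _, err := os.Stat(p); !errors.Is(err, fs.ErrNotExist) {
			return fmt.Errorf("%s still exists", p)
		}
	}
	if _, err := os.Stat(msal); err != nil {
		return fmt.Errorf("MSAL directory not recreated: %w", err)
	}
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("auth cache cleared")
}
```

Treating a missing file as success keeps the cleanup idempotent, so running it against a partially populated config root does not fail.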
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| cli/azd/pkg/auth/manager.go | Introduces CleanAllAuthCache() to delete auth cache artifacts and recreate required directories. |
| cli/azd/cmd/auth_login.go | Adds --reset flag and executes cache cleanup before login. |
| cli/azd/pkg/auth/manager_coverage_test.go | Adds unit coverage for CleanAllAuthCache() behavior. |
| cli/azd/docs/authentication.md | New doc describing auth methods and the --reset workflow/paths. |
| cli/azd/cmd/testdata/TestUsage-azd-auth-login.snap | Updates command usage snapshot to include --reset. |
| cli/azd/cmd/testdata/TestFigSpec.ts | Updates Fig completion spec to include --reset. |
- Reject --reset + --check-status flag combination
- Use cfgRoot directly for claims path instead of ignoring claimsFilePath errors
- Use filepath.Join and osutil permission constants in tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📋 Prioritization Note
Thanks for the contribution! The linked issue isn't in the current milestone yet.
Replace bare errors.New with fmt.Errorf wrapping internal.ErrInvalidFlagCombination to satisfy Test_RunMethodsNoBareErrors CI check. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
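
As a rough sketch of the wrapping pattern this commit describes: the sentinel below is a local stand-in for `internal.ErrInvalidFlagCombination`, and `validateFlags`, `isInvalidFlagCombination`, and the error text are hypothetical names chosen for illustration, not code from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidFlagCombination is a local stand-in for internal.ErrInvalidFlagCombination.
var errInvalidFlagCombination = errors.New("invalid flag combination")

// validateFlags is a hypothetical helper that rejects --reset together with --check-status.
// Wrapping with %w keeps the sentinel discoverable through errors.Is.
func validateFlags(reset, checkStatus bool) error {
	if reset && checkStatus {
		return fmt.Errorf(
			"--reset cannot be combined with --check-status: %w",
			errInvalidFlagCombination,
		)
	}
	return nil
}

// isInvalidFlagCombination reports whether err wraps the sentinel.
func isInvalidFlagCombination(err error) bool {
	return errors.Is(err, errInvalidFlagCombination)
}

func main() {
	err := validateFlags(true, true)
	fmt.Println(err)
	fmt.Println(isInvalidFlagCombination(err))
}
```

Callers can then branch on the sentinel instead of matching on message text, which is presumably what a no-bare-errors check is meant to encourage.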
### When to use `--reset`

Use `--reset` when:

- You see `AADSTS700082` or similar stale-token errors right after logging in successfully
- `azd` commands fail with authentication errors that persist across multiple `azd auth login` attempts
- You want to ensure a completely fresh authentication state (e.g. after switching tenants or accounts)
Should we also update our error suggestions accordingly? I think we already have some logic to detect AADSTS700082
Great suggestion! Done — I've added a specific AADSTS700082 rule in error_suggestions.yaml and updated the generic AADSTS rule to mention that re-running azd auth login now automatically clears stale cached tokens. Also added end-to-end pipeline tests for both rules.
Additionally, the approach has changed: instead of the --reset flag, azd auth login now automatically clears all cached auth data when it detects you're already logged in, making the fix fully transparent.
This PR adds
🔴 Must Fix: None.
🟡 Should Fix
🟢 Nitpick: None.
Overall recommendation: Comment.
The core cache deletion path looks reasonable, but the new user recovery path is still under-covered and not fully discoverable from existing stale-token errors. Couple things I noticed:
Replace the explicit --reset flag with automatic detection: when the user is already logged in, azd auth login now clears all cached authentication data (MSAL tokens, credentials, auth.json, claims, subscriptions cache) before re-authenticating. This makes the fix for stale token issues transparent — no extra flags needed. If the user is not logged in, the normal login flow proceeds without clearing anything. Closes #7541 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add specific AADSTS700082 rule in error_suggestions.yaml with guidance about automatic cache clearing on re-login - Update generic AADSTS rule to mention auto-clearing behavior - Add end-to-end pipeline tests for both the specific and generic rules Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
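
Because a specific and a generic rule can both match the same error text, rule order matters. Here is a minimal sketch, assuming first-match-wins evaluation and plain substring patterns. The `rule` type, the `firstMatch` helper, and the rule names are illustrative and are not azd's actual pipeline.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is a hypothetical, simplified form of an error_suggestions.yaml entry.
type rule struct {
	name     string
	patterns []string
}

// rules lists the specific rule before the generic one.
var rules = []rule{
	{name: "AADSTS700082 (specific)", patterns: []string{"AADSTS700082"}},
	{name: "AADSTS (generic)", patterns: []string{"AADSTS"}},
}

// firstMatch returns the first rule that has a pattern contained in errText.
func firstMatch(rs []rule, errText string) (rule, bool) {
	for _, r := range rs {
		for _, p := range r.patterns {
			if strings.Contains(errText, p) {
				return r, true
			}
		}
	}
	return rule{}, false
}

func main() {
	messages := []string{
		"AADSTS700082: The refresh token has expired",
		"AADSTS50076: multi-factor authentication required",
		"connection reset",
	}
	for _, msg := range messages {
		if r, ok := firstMatch(rules, msg); ok {
			fmt.Printf("%q -> %s\n", msg, r.name)
		} else {
			fmt.Printf("%q -> no rule\n", msg)
		}
	}
}
```

With this ordering the specific rule shadows the generic one for `AADSTS700082`, so anything the generic rule offers (such as a reference link) has to be repeated on the specific rule to reach the user.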
Thanks for the review @RickWinter! Both items have been addressed:
hemarina
left a comment
Thanks for the v2 refactor — auto-detecting an existing login is the right UX call, and the LogInDetails() gate is a clean way to avoid wiping state on a true first-time login. A few things worth considering before merge:
🔴 High
1. Wipe-before-login can leave the user in a strictly worse state.
cli/azd/cmd/auth_login.go:370-384 deletes auth.json, auth.claims, subscriptions.cache, the entire MSAL cache, and all cred*.bin files, then calls la.login(ctx). Any failure in la.login — interactive cancel, MFA timeout, CA-policy block, network blip, missing SP credential mode (auth_login.go:504-557), unreadable certificate path — leaves no rollback path. The PR description and cli/azd/docs/authentication.md:177-180 pitch this as zero-risk, but the previous behavior (overwrite-on-success) preserved the prior session if the new login failed.
Suggested approaches:
- Stage cleanup after `la.login(ctx)` succeeds, or
- Back up `~/.azd/auth/` to a temp dir, attempt login, restore on failure.
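
The backup-and-restore option could look roughly like the sketch below. It rests on several assumptions: `loginWithBackup` is a hypothetical helper, the login step is passed in as a plain `func() error`, and the backup is a sibling `.bak` directory rather than a system temp dir so that `os.Rename` stays on one filesystem. It covers only the auth directory and does nothing about concurrent processes.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// loginWithBackup moves authDir aside, runs login against a fresh directory,
// and restores the previous contents if login fails.
func loginWithBackup(authDir string, login func() error) error {
	backup := authDir + ".bak"
	if err := os.Rename(authDir, backup); err != nil {
		return fmt.Errorf("backing up auth state: %w", err)
	}
	if err := os.MkdirAll(authDir, 0o700); err != nil {
		// Put the old state back; errors.Join drops nil errors.
		return errors.Join(
			fmt.Errorf("creating fresh auth directory: %w", err),
			os.Rename(backup, authDir),
		)
	}
	loginErr := login()
	if loginErr == nil {
		return os.RemoveAll(backup)
	}
	// Login failed: discard the partial state and restore the backup.
	if err := os.RemoveAll(authDir); err != nil {
		return errors.Join(loginErr, err)
	}
	if err := os.Rename(backup, authDir); err != nil {
		return errors.Join(loginErr, err)
	}
	return loginErr
}

// run exercises both the failure and the success path in a temp directory.
func run() error {
	root, err := os.MkdirTemp("", "azd-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)

	authDir := filepath.Join(root, "auth")
	token := filepath.Join(authDir, "token") // made-up file name
	if err := os.MkdirAll(authDir, 0o700); err != nil {
		return err
	}
	if err := os.WriteFile(token, []byte("old"), 0o600); err != nil {
		return err
	}

	// A failed login must leave the previous session in place.
	failed := loginWithBackup(authDir, func() error { return errors.New("MFA timeout") })
	if failed == nil {
		return errors.New("expected the login failure to be returned")
	}
	if got, err := os.ReadFile(token); err != nil || string(got) != "old" {
		return fmt.Errorf("previous session not restored: %q, %v", got, err)
	}

	// A successful login replaces the old state and removes the backup.
	err = loginWithBackup(authDir, func() error {
		return os.WriteFile(token, []byte("new"), 0o600)
	})
	if err != nil {
		return err
	}
	if got, err := os.ReadFile(token); err != nil || string(got) != "new" {
		return fmt.Errorf("new session not written: %q, %v", got, err)
	}
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("backup/restore behaved as expected")
}
```

Files that live outside the auth directory would need the same treatment for a restore to be complete.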
2. New AADSTS700082 rule is missing a links: field.
cli/azd/resources/error_suggestions.yaml:607-613 adds the rule but omits links:. Pipeline matching is first-match-wins (pipeline_test.go → TestPipeline_FirstMatchWins), so this rule pre-empts the generic AADSTS rule below (lines 615-624) which does include the azd auth login reference link. Per cli/azd/AGENTS.md: "populate all relevant fields (Err, Suggestion, Message, Links)." Please mirror the link from the generic entry:
```yaml
- patterns:
    - "AADSTS700082"
  message: "The refresh token has expired or been revoked."
  suggestion: >-
    Run 'azd auth login' to sign in again. ...
  links:
    - title: "azd auth login reference"
      url: "https://learn.microsoft.com/azure/developer/azure-developer-cli/reference#azd-auth-login"
```

🟡 Medium
3. The LogInDetails == nil gate is narrower than the help text implies.
auth_login.go:373 gates the wipe on LogInDetails(ctx) == nil. LogInDetails returns an error if m.publicClient.Accounts(ctx) fails (manager.go:1505-1508) — which can happen with the corrupted MSAL cache files this PR is specifically intended to recover from. For those users the gate evaluates to err != nil and the wipe silently skips. Either broaden the gate to "any cached auth state exists" (e.g., check auth.json presence) or document the limitation. The help text "automatically clears cached authentication data" currently reads broader than the actual behavior.
4. Subs-cache cleanup failure blocks login.
At auth_login.go:378-380, if ClearSubscriptions returns a non-ErrNotExist filesystem error (read-only mount, disk full, transient permission), we return "clearing subscriptions cache: <err>" and never call la.login(ctx) — leaving the user fully logged out after the auth wipe with a confusing error that does not hint at the actual state. Either swap the order (clear subscriptions first), or log-and-continue on subs-cache failure:
```go
if err := la.subManager.ClearSubscriptions(ctx); err != nil {
	log.Printf("warning: clearing subscriptions cache: %v", err)
}
```

🟢 Low
5. Blast-radius doc nuance.
cli/azd/docs/authentication.md:178 documents this at the directory level ("clears all locally cached authentication data"), but the multi-account / multi-SP nuance isn't called out: a user with multiple cached MSAL accounts (common in multi-tenant guest scenarios) loses all of them as a side effect of refreshing one, and any cached service-principal credentials under other (tenant, client) pairs (cred*.bin) get wiped too. A one-line callout would help users with multiple identities.
For context, I also looked at whether the external-auth path (AZD_AUTH_ENDPOINT/AZD_AUTH_KEY) could hit this code in a weird way. It can't: Mode() returns ExternalRequest when those env vars are set, and SetBuiltInAuthMode() rejects mode changes while external mode is active (manager.go:1564-1567), so loginAction.Run returns before reaching the new block. No action needed there.
Overall recommendation: Comment.
JeffreyCA
left a comment
To @hemarina's comment, staging CleanAllAuthCache() after la.login succeeds would delete the fresh token too, since the login writes it back into the same auth/ dir and auth.json we just wiped.
The backup/restore approach would preserve the old session, but there may be concurrency considerations and we'd need to also consider the subscriptions.cache which lives at the config root (outside auth/).
We might be able to avoid the rollback problem entirely by narrowing the scope and evicting only the stale refresh token for the current account (the AADSTS700082 cause), which Logout() already does via getSignedInAccount + publicClient.RemoveAccount (manager.go:1047-1056). Would that work better?
Also consider skipping the clearing logic for MI/SP re-logins (the LogInDetails == nil guard currently fires for those too, where there's no refresh token to fix)
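
To make the narrower option concrete, here is a toy sketch of evicting a single account instead of wiping the whole cache. The `accountCache` type, its `evict` method, and the account names are hypothetical. It is an in-memory stand-in and does not call MSAL or azd's `Logout()`.

```go
package main

import "fmt"

// accountCache is a hypothetical in-memory stand-in for a token cache,
// mapping an account ID to its refresh token.
type accountCache map[string]string

// evict removes only the given account's entry and reports whether it existed.
func (c accountCache) evict(accountID string) bool {
	if _, ok := c[accountID]; !ok {
		return false
	}
	delete(c, accountID)
	return true
}

func main() {
	cache := accountCache{
		"alice@contoso":  "stale-refresh-token",
		"alice@fabrikam": "valid-refresh-token",
	}
	removed := cache.evict("alice@contoso")
	fmt.Println("removed:", removed, "remaining entries:", len(cache))
}
```

Entries for other accounts are left alone, so a failed re-login costs at most the one session that was already broken.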
- Add links field to AADSTS700082 error suggestion rule
- Broaden login-state gate: only skip cleanup on ErrNoCurrentUser, proceed with cleanup for any other result (including corrupted cache)
- Add telemetry attribute for cache-clear failures
- Update test to verify links in AADSTS700082 rule

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
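
The broadened gate in this commit can be sketched as a single predicate. This is an assumption-laden illustration: `errNoCurrentUser` is a local stand-in for the `ErrNoCurrentUser` sentinel, and `shouldClearCache` and `errCorruptCache` are hypothetical names, not the code in `auth_login.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoCurrentUser is a local stand-in for the ErrNoCurrentUser sentinel.
var errNoCurrentUser = errors.New("not logged in")

// errCorruptCache is an illustrative error for an unreadable token cache.
var errCorruptCache = errors.New("reading token cache: unexpected end of input")

// shouldClearCache is a hypothetical helper. Given the error from a login-state
// lookup, it reports whether cached auth data should be cleared before login.
// Only a definite "no current user" result skips the cleanup.
func shouldClearCache(lookupErr error) bool {
	return !errors.Is(lookupErr, errNoCurrentUser)
}

func main() {
	wrapped := fmt.Errorf("checking login state: %w", errNoCurrentUser)
	fmt.Println("logged in:      ", shouldClearCache(nil))
	fmt.Println("not logged in:  ", shouldClearCache(wrapped))
	fmt.Println("corrupted cache:", shouldClearCache(errCorruptCache))
}
```

Using `errors.Is` rather than `==` means the first-time-login case is still recognized when the sentinel arrives wrapped in additional context.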
Thanks for the thorough review @hemarina! Here's how we addressed each point:

- 🔴 #1, wipe-before-login leaving user in worse state: Rejected. When the cleanup runs, it's because the user is already in a bad auth state (stale tokens causing failures like AADSTS700082). The purpose is specifically to clear that bad state. If login fails after cleanup, the user gets a clear error, which is better than silently staying in a broken state where tokens appear valid but are expired.
- 🔴 #2, Missing ...
- 🟡 #3, ...
- 🟡 #4, subs-cache failure blocks login: Rejected. If we can't clear the files, we can't be confident the login/token state will work for subsequent commands. Kept as a hard error, but added a telemetry attribute ( ...
- 🟢 #5, multi-account blast radius: Rejected. azd only supports a single active account at a time, so multi-account blast radius is not a concern.

All changes in 80f64c7.
Summary

When `azd auth login` is run while the user is already logged in, azd now automatically clears all cached authentication data before re-authenticating. This transparently fixes the stale token issue (#7541) where `AADSTS700082` expired refresh token errors persist even after a successful login.

What changed

- `azd auth login` detects if the user is already logged in (via `LogInDetails()`). If so, it calls `CleanAllAuthCache()` and `ClearSubscriptions()` before proceeding with login. If not logged in, it proceeds normally.
- `CleanAllAuthCache()` method (new on `auth.Manager`): Removes all auth-related files (the `auth/` directory with the MSAL and credential caches, `auth.json`, `auth.claims`), then recreates the `auth/msal/` directory structure.
- `cli/azd/docs/authentication.md` covering all supported auth methods, automatic cleanup behavior, auth state storage, and troubleshooting.

What is cleared on re-login

- `~/.azd/auth/msal/`
- `~/.azd/auth/`
- `~/.azd/auth.json`
- `~/.azd/auth.claims`
- `~/.azd/subscriptions.cache`

Design decisions

- ... `--reset`, the cleanup happens automatically when re-logging in. First-time logins (not logged in) skip the cleanup entirely.
- ... `--check-status` early exit and after the delegated/external auth mode check, so it only applies to the normal `AzdBuiltIn` login path.
- ... `--client-id --tenant-id`) work correctly: login methods create credentials from scratch and don't depend on existing cache state.

Closes #7541