Genericize OAuth profiles#13477
Conversation
Users who haven't logged in since the oauth_github table landed (2026-01-20) don't have a row in it. Tier 1 switches identity reads to oauth_github, so those users need to be backfilled first or they appear to not exist. One-off migration copies users.gh_* into oauth_github for every user with gh_id > 0, using ON CONFLICT DO NOTHING so it's idempotent. Also adds a preflight check on app startup (non-test) that queries for users with gh_id > 0 missing an oauth_github row. If any exist, startup fails with a clear log message -- safety net in case the migration silently skipped rows.
Identity handling is hardwired to GitHub via app.github and app.github_oauth. This adds a provider abstraction so that adding a new provider is just a new impl + a registry entry. Three new types: - OAuthProvider trait: async methods for the OAuth dance (authorize_url, exchange_code, fetch_user_info) and a stable name() used for registry lookup and oauth_<name> table routing. - UserInfo: provider-agnostic user profile returned by fetch_user_info. account_id is String because providers disagree on ID shape (GitHub: i64, Bitbucket: UUID, GitLab: numeric but can exceed i64 on some instances). - ProviderRegistry: HashMap<name, Arc<dyn OAuthProvider>> attached to App. GitHubProvider wraps the existing BasicClient and GitHubClient so no identity plumbing below it changes. Team-membership code still calls GitHubClient directly. App.github changes from Box to Arc so GitHubProvider and App can share the same client instance. build_github_oauth_client() is extracted to deduplicate the OAuth endpoint URLs. Nothing calls the registry yet -- next commit routes session::begin and session::authorize through it.
session::begin and session::authorize now dispatch through app.oauth_providers instead of hardcoded app.github_oauth and app.github. The ?provider= query param selects a registered provider (defaults to "github" so the frontend keeps working). Provider name is carried in server-side session state, not as a callback query param, so it can't be spoofed. User lookups on the identity path now read from oauth_github instead of users.gh_*. Writes stay dual because the gh_* columns are NOT NULL; stopping writes is a follow-up after a schema change makes them nullable. Config rename: gh_token_encryption -> oauth_token_encryption. The old env var (GITHUB_TOKEN_ENCRYPTION_KEY) is accepted as a deprecated alias with a WARN log. Team membership still goes through GitHubClient directly -- stays GitHub-bound until Tier 4. Adds regression test confirming sync_admins still matches users via oauth_github.account_id after the identity read cutover.
Add Dockerfile.integration and a docker-compose `integration` service that builds the server in debug mode and runs it against the cargo_registry_test database on port 9888. RUST_LOG=debug is on by default so container logs are useful for debugging. src/tests/docker_integration.rs has 9 tests that hit the live container: session begin/authorize flows, provider selection (default and explicit github, unknown provider 404), CSRF rejection, preflight verification (implicit via server boot), and seeded-user identity checks via /api/v1/users/:login and /api/v1/me with a forged session cookie. Tests skip automatically when the container isn't up. Also tightens a few doc comments and contractions caught during review.
The genericize-profiles work needs a way to identify which OAuth
provider a user or credential row is associated with. A native
Postgres ENUM gives readable values in raw SQL ('github' vs. an
opaque integer) and lets the database enforce the closed set of
supported providers.
Starts with a single variant 'github'; future providers add
variants via ALTER TYPE ... ADD VALUE.
Each user needs a way to declare which OAuth provider they consider their primary identity — the one whose username/avatar appear in public profile views and whose credentials are used for API token encryption. With only GitHub today every user defaults to 'github', but the column is the prerequisite for multi-provider support. Migration: NOT NULL DEFAULT 'github', which PG 11+ applies as a metadata-only operation (no table rewrite). Rust side adds the OAuthProviderId enum with FromSql/ToSql for the native PG enum, FromStr for string-based lookups, and a round-trip integration test that exercises both the DB default path and explicit ToSql.
|
☔ The latest upstream changes made this pull request unmergeable. Please resolve the merge conflicts. |
|
This is a draft PR for work toward resolving two RFCs:
This is primarily here for asynchronous review of what I'm proposing as the resolution of the RFCs. The work is still in progress. |
| -- was created. Batched by id range, idempotent via ON CONFLICT DO NOTHING. | ||
|
|
||
| SET LOCAL lock_timeout = '10s'; | ||
| SET LOCAL statement_timeout = '120s'; |
There was a problem hiding this comment.
Could you elaborate on why this commit is needed? After we merged #12792, we ran the SQL in https://github.com/rust-lang/crates.io/blob/150d165e4d5ecda24b60f146589db4def7fd34bf/migrations/data_oauth_github.sql outside of the migration framework that runs on deploy, so the backfill has already been handled.
| //! `users.gh_*` to `oauth_github`. That requires every user with | ||
| //! `gh_id > 0` to have a matching `oauth_github` row. The backfill migration | ||
| //! copies the data, and this module verifies the invariant on startup -- | ||
| //! if any user is missing a row the app refuses to start. |
There was a problem hiding this comment.
Yeah this is not the kind of thing we'd want to fail to start the app for
| let github = RealGitHubClient::new(client); | ||
| let github = Box::new(github); | ||
| let github: std::sync::Arc<dyn crates_io_github::GitHubClient> = | ||
| std::sync::Arc::new(RealGitHubClient::new(client.clone())); |
There was a problem hiding this comment.
IMO this commit should be pulled out into a separate PR from anything having to do with usernames-- we want to introduce the crates.io username first, separately from adding any other oauth provider.
| -- it back would require distinguishing backfilled rows from rows inserted | ||
| -- by login, which we cannot do. The `oauth_github` table itself is dropped | ||
| -- by login, which we can't do. The `oauth_github` table itself is dropped | ||
| -- by the earlier migration 2026-01-20-162913_oauth_github_table/down.sql. |
There was a problem hiding this comment.
Yeah any comment corrections should be made in a separate commit, possibly a separate PR
| //! | ||
| //! Then run just these tests: | ||
| //! | ||
| //! cargo test --test integration docker_integration |
There was a problem hiding this comment.
I don't use docker; could you elaborate on why separate tests are needed?
| # Image for running the crates.io server in integration tests. | ||
| # Debug build for fast compilation and full debug output. | ||
|
|
||
| ARG RUST_VERSION=1.94.1 |
There was a problem hiding this comment.
Again I don't use docker, but this looks significantly different than the existing docker file at https://github.com/rust-lang/crates.io/blob/main/backend.Dockerfile, could you explain why this one is different?
The backfill migration is no longer needed as part of the OAuth provider genericization effort. The schema changes (login → username rename and avatar type widening) are still required, but the data backfill itself will be handled differently in the new architecture. Keeping this migration would complicate the migration history and introduce unnecessary coupling to GitHub-specific table structure.
No description provided.