Skip to content

Genericize OAuth profiles#13477

Draft
quinnjr wants to merge 7 commits intorust-lang:mainfrom
quinnjr:feature/genericize-profiles
Draft

Genericize OAuth profiles#13477
quinnjr wants to merge 7 commits intorust-lang:mainfrom
quinnjr:feature/genericize-profiles

Conversation

@quinnjr
Copy link
Copy Markdown

@quinnjr quinnjr commented Apr 22, 2026

No description provided.

quinnjr added 6 commits April 16, 2026 17:21
Users who haven't logged in since the oauth_github table landed
(2026-01-20) don't have a row in it. Tier 1 switches identity
reads to oauth_github, so those users need to be backfilled first
or they appear to not exist.

One-off migration copies users.gh_* into oauth_github for every
user with gh_id > 0, using ON CONFLICT DO NOTHING so it's
idempotent.

Also adds a preflight check on app startup (non-test) that queries
for users with gh_id > 0 missing an oauth_github row. If any
exist, startup fails with a clear log message -- safety net in
case the migration silently skipped rows.
Identity handling is hardwired to GitHub via app.github and
app.github_oauth. This adds a provider abstraction so that adding
a new provider is just a new impl + a registry entry.

Three new types:

- OAuthProvider trait: async methods for the OAuth dance
  (authorize_url, exchange_code, fetch_user_info) and a stable
  name() used for registry lookup and oauth_<name> table routing.

- UserInfo: provider-agnostic user profile returned by
  fetch_user_info. account_id is String because providers disagree
  on ID shape (GitHub: i64, Bitbucket: UUID, GitLab: numeric but
  can exceed i64 on some instances).

- ProviderRegistry: HashMap<name, Arc<dyn OAuthProvider>> attached
  to App.

GitHubProvider wraps the existing BasicClient and GitHubClient so
no identity plumbing below it changes. Team-membership code still
calls GitHubClient directly.

App.github changes from Box to Arc so GitHubProvider and App can
share the same client instance. build_github_oauth_client() is
extracted to deduplicate the OAuth endpoint URLs.

Nothing calls the registry yet -- next commit routes session::begin
and session::authorize through it.
session::begin and session::authorize now dispatch through
app.oauth_providers instead of hardcoded app.github_oauth and
app.github. The ?provider= query param selects a registered
provider (defaults to "github" so the frontend keeps working).
Provider name is carried in server-side session state, not as a
callback query param, so it can't be spoofed.

User lookups on the identity path now read from oauth_github
instead of users.gh_*. Writes stay dual because the gh_* columns
are NOT NULL; stopping writes is a follow-up after a schema change
makes them nullable.

Config rename: gh_token_encryption -> oauth_token_encryption.
The old env var (GITHUB_TOKEN_ENCRYPTION_KEY) is accepted as a
deprecated alias with a WARN log.

Team membership still goes through GitHubClient directly -- stays
GitHub-bound until Tier 4.

Adds regression test confirming sync_admins still matches users
via oauth_github.account_id after the identity read cutover.
Add Dockerfile.integration and a docker-compose `integration` service
that builds the server in debug mode and runs it against the
cargo_registry_test database on port 9888. RUST_LOG=debug is on by
default so container logs are useful for debugging.

src/tests/docker_integration.rs has 9 tests that hit the live
container: session begin/authorize flows, provider selection (default
and explicit github, unknown provider 404), CSRF rejection, preflight
verification (implicit via server boot), and seeded-user identity
checks via /api/v1/users/:login and /api/v1/me with a forged session
cookie. Tests skip automatically when the container isn't up.

Also tightens a few doc comments and contractions caught during review.
The genericize-profiles work needs a way to identify which OAuth
provider a user or credential row is associated with. A native
Postgres ENUM gives readable values in raw SQL ('github' vs. an
opaque integer) and lets the database enforce the closed set of
supported providers.

Starts with a single variant 'github'; future providers add
variants via ALTER TYPE ... ADD VALUE.
Each user needs a way to declare which OAuth provider they consider
their primary identity — the one whose username/avatar appear in
public profile views and whose credentials are used for API token
encryption. With only GitHub today every user defaults to 'github',
but the column is the prerequisite for multi-provider support.

Migration: NOT NULL DEFAULT 'github', which PG 11+ applies as a
metadata-only operation (no table rewrite). Rust side adds the
OAuthProviderId enum with FromSql/ToSql for the native PG enum,
FromStr for string-based lookups, and a round-trip integration test
that exercises both the DB default path and explicit ToSql.
@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented Apr 22, 2026

☔ The latest upstream changes made this pull request unmergeable. Please resolve the merge conflicts.

@quinnjr
Copy link
Copy Markdown
Author

quinnjr commented Apr 22, 2026

This is a draft PR for work toward resolving two RFCs:

This is primarily here for asynchronous review of what I'm proposing as the resolution of the RFCs. The work is still in progress.

@quinnjr
Copy link
Copy Markdown
Author

quinnjr commented Apr 22, 2026

@carols10cents

-- was created. Batched by id range, idempotent via ON CONFLICT DO NOTHING.

SET LOCAL lock_timeout = '10s';
SET LOCAL statement_timeout = '120s';
Copy link
Copy Markdown
Member

@carols10cents carols10cents Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on why this commit is needed? After we merged #12792, we ran the SQL in https://github.com/rust-lang/crates.io/blob/150d165e4d5ecda24b60f146589db4def7fd34bf/migrations/data_oauth_github.sql outside of the migration framework that runs on deploy, so the backfill has already been handled.

View changes since the review

Comment thread src/oauth/preflight.rs
//! `users.gh_*` to `oauth_github`. That requires every user with
//! `gh_id > 0` to have a matching `oauth_github` row. The backfill migration
//! copies the data, and this module verifies the invariant on startup --
//! if any user is missing a row the app refuses to start.
Copy link
Copy Markdown
Member

@carols10cents carols10cents Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is not the kind of thing we'd want to fail to start the app for

View changes since the review

Comment thread src/bin/server.rs
let github = RealGitHubClient::new(client);
let github = Box::new(github);
let github: std::sync::Arc<dyn crates_io_github::GitHubClient> =
std::sync::Arc::new(RealGitHubClient::new(client.clone()));
Copy link
Copy Markdown
Member

@carols10cents carols10cents Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this commit should be pulled out into a separate PR from anything having to do with usernames-- we want to introduce the crates.io username first, separately from adding any other oauth provider.

View changes since the review

-- it back would require distinguishing backfilled rows from rows inserted
-- by login, which we cannot do. The `oauth_github` table itself is dropped
-- by login, which we can't do. The `oauth_github` table itself is dropped
-- by the earlier migration 2026-01-20-162913_oauth_github_table/down.sql.
Copy link
Copy Markdown
Member

@carols10cents carols10cents Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah any comment corrections should be made in a separate commit, possibly a separate PR

View changes since the review

//!
//! Then run just these tests:
//!
//! cargo test --test integration docker_integration
Copy link
Copy Markdown
Member

@carols10cents carols10cents Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't use docker; could you elaborate on why separate tests are needed?

View changes since the review

Comment thread Dockerfile.integration
# Image for running the crates.io server in integration tests.
# Debug build for fast compilation and full debug output.

ARG RUST_VERSION=1.94.1
Copy link
Copy Markdown
Member

@carols10cents carols10cents Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I don't use docker, but this looks significantly different than the existing docker file at https://github.com/rust-lang/crates.io/blob/main/backend.Dockerfile, could you explain why this one is different?

View changes since the review

The backfill migration is no longer needed as part of the OAuth
provider genericization effort. The schema changes (login → username
rename and avatar type widening) are still required, but the data
backfill itself will be handled differently in the new architecture.

Keeping this migration would complicate the migration history and
introduce unnecessary coupling to GitHub-specific table structure.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants