diff --git a/docs/superpowers/plans/2026-04-11-cleanup-guard-test-flake.md b/docs/superpowers/plans/2026-04-11-cleanup-guard-test-flake.md new file mode 100644 index 00000000..ce036430 --- /dev/null +++ b/docs/superpowers/plans/2026-04-11-cleanup-guard-test-flake.md @@ -0,0 +1,480 @@ +# CleanupGuard Test Flake Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Eliminate the CI flake where Rust integration tests intermittently time out at nextest's 120s `slow-timeout` because `CleanupGuard::drop` blocks indefinitely on a hanging cleanup runtime. + +**Architecture:** Two layers shipped in one PR. Layer 1 is a stopgap that bounds `CleanupGuard::drop` with a 30-second polling join, then detaches if cleanup hasn't completed. Layer 2 introduces an explicit `cleanup(self)` async method that runs cleanup on the test's existing tokio runtime, and migrates ~136 test sites across 16 files to call it. The Drop fallback stays in place during migration; after migration completes and CI is green for ~2 weeks, a follow-up commit replaces the fallback's eprintln with `panic!()` to enforce the contract permanently. + +**Tech Stack:** Rust, tokio, sqlx, cargo-nextest + +**Spec:** `docs/superpowers/specs/2026-04-11-security-audit-followups-design.md` (Topic 4) + +**Branch:** `fix/cleanup-guard-test-flake` + +--- + +## File structure + +| File | Role | Touched in | +|---|---|---| +| `server/tests/integration/helpers/mod.rs` | Defines `CleanupGuard`, the broken Drop, and the new `cleanup()` method | Tasks 1, 2 | +| `server/tests/integration/guild_limits.rs` | Contains 2 known-flaky tests; 10 guard usages | Task 3 | +| `server/tests/integration/custom_status.rs` | Contains 1 known-flaky test; 6 guard usages | Task 4 | +| `server/tests/integration/workspaces.rs` | 24 guard usages (largest file) | Task 5 | +| `server/tests/integration/filters_http.rs` | 17 guard usages | Task 6 | +| `server/tests/integration/governance.rs` | 12 guard usages | Task 7 | +| `server/tests/integration/webhooks.rs` | 10 guard usages | Task 8 | +| `server/tests/integration/channel_pins.rs` | 10 guard usages | Task 9 | +| `server/tests/integration/connectivity_http.rs` | 8 guard usages | Task 10 | +| `server/tests/integration/setup_http.rs` | 6 guard usages | Task 11 | +| `server/tests/integration/messages_http.rs` | 6 guard usages | Task 11 | +| `server/tests/integration/media_processing.rs` | 6 guard usages | Task 11 | +| `server/tests/integration/bot_intents.rs` | 6 guard usages | Task 12 | +| `server/tests/integration/dm_http.rs` | 5 guard usages | Task 12 | +| `server/tests/integration/channels_http.rs` | 5 guard usages | Task 12 | +| `server/tests/integration/uploads_http.rs` | 3 guard usages | Task 13 | +| `server/tests/integration/setup_concurrent_http.rs` | 2 guard usages (already #[ignore]'d, no migration needed but check anyway) | Task 13 | + +**Total: 136 cleanup_guard usages across 16 files.** + +The plan groups the 16 files into 11 migration tasks (tasks 3-13). Tasks are sized so each commit migrates a related set of tests in 5-15 minutes. + +--- + +## Task 1: Layer 1 — Bounded join in `CleanupGuard::drop` + +**Files:** +- Modify: `server/tests/integration/helpers/mod.rs:163-185` + +**Goal:** Replace the unconditional `.join().expect()` with a 30-second polling join. If cleanup completes within 30s, join normally. Otherwise, detach the cleanup thread and let the test process continue. Stops the 120s nextest timeout from being triggered by hanging cleanup. + +- [ ] **Step 1: Read the current Drop impl** + +Run: `sed -n '163,185p' server/tests/integration/helpers/mod.rs` + +Expected: see the current `impl Drop for CleanupGuard` which spawns a thread, builds a `tokio::runtime::Builder::new_current_thread()`, calls `runtime.block_on(...)`, and ends with `.join().expect("Cleanup thread panicked");`. Confirm the line range is still 163-185 (file may have shifted). + +- [ ] **Step 2: Establish a baseline** + +Run: `SQLX_OFFLINE=true cargo nextest run -p vc-server --lib 2>&1 | tail -5` + +Expected: see a "Summary" line with PASS counts. Record the number — for example, "300 tests passed". You'll compare against this after Task 1 to verify nothing regressed. + +(We use `--lib` only, not full integration tests, because integration tests need a running DB and we don't want to chase that here. The Drop change is in test-helpers code paths but doesn't affect lib tests' compilation.) + +- [ ] **Step 3: Replace the Drop impl** + +Edit `server/tests/integration/helpers/mod.rs`. Replace the existing `impl Drop for CleanupGuard { ... }` block (lines 163-185) with this new version: + +```rust +impl Drop for CleanupGuard { + fn drop(&mut self) { + let actions = std::mem::take(&mut self.actions); + if actions.is_empty() { + return; + } + + let pool = self.pool.clone(); + let handle = std::thread::spawn(move || { + let runtime = tokio::runtime::Builder::new_current_thread() + .enable_all() + .build() + .expect("Failed to create cleanup runtime"); + runtime.block_on(async move { + for action in actions { + action(pool.clone()).await; + } + }); + }); + + // Bounded join: poll for up to 30s, then detach if still running. + // Prevents flaky cleanup hangs from triggering nextest's 120s slow-timeout. + // Removed once the explicit-cleanup migration completes (Task 14+). + let timeout = std::time::Duration::from_secs(30); + let start = std::time::Instant::now(); + loop { + if handle.is_finished() { + let _ = handle.join(); + return; + } + if start.elapsed() > timeout { + eprintln!( + "warning: CleanupGuard did not complete within {timeout:?}, \ + detaching thread to allow test to finish" + ); + return; + } + std::thread::sleep(std::time::Duration::from_millis(50)); + } + } +} +``` + +- [ ] **Step 4: Compile check** + +Run: `SQLX_OFFLINE=true cargo build --tests -p vc-server 2>&1 | tail -10` + +Expected: clean build, no warnings. + +If clippy warnings appear (e.g., `clippy::needless_continue`), address them. If errors appear, double-check the edit matched the existing structure. + +- [ ] **Step 5: Re-run baseline tests** + +Run: `SQLX_OFFLINE=true cargo nextest run -p vc-server --lib 2>&1 | tail -5` + +Expected: same pass count as Step 2. If regressed, investigate before proceeding. + +- [ ] **Step 6: Commit** + +```bash +git add server/tests/integration/helpers/mod.rs +git commit -m "fix(test): bound CleanupGuard::drop join to 30s + +CleanupGuard::drop currently calls .join() with no timeout. When test +cleanup hangs (DB pool contention, slow DELETE, lock wait, etc.), the +test process holds until nextest's 120s slow-timeout kills it. + +Add a 30-second polling join: if cleanup completes within 30s, join +normally. Otherwise, log to stderr and return — the cleanup thread +continues detached but the test process is no longer blocked. + +This is a stopgap. The proper fix (explicit .cleanup().await) is in +follow-up commits. + +Refs #509 (same anti-pattern, different test set)" +``` + +--- + +## Task 2: Layer 2 — Add `cleanup(self)` method and Drop fallback warning + +**Files:** +- Modify: `server/tests/integration/helpers/mod.rs` (add new method on `CleanupGuard`, update Drop fallback) + +**Goal:** Add an explicit `cleanup(self)` async method that runs cleanup actions on the test's existing tokio runtime. Update the Drop impl to also log a "dropped with N pending actions" warning so unmigrated tests are visible during the migration. + +- [ ] **Step 1: Add the `cleanup` method** + +Find the `impl CleanupGuard { ... }` block (just above the `impl Drop`). Add this new method at the end of that block, after the existing `restore_config_defaults` method: + +```rust + /// Run all registered cleanup actions on the caller's tokio runtime + /// and consume the guard. + /// + /// Tests MUST call this at the end of the test body. Forgetting it + /// triggers a runtime warning from the Drop fallback (during the + /// migration) and eventually a panic (after the migration completes). + pub async fn cleanup(mut self) { + let actions = std::mem::take(&mut self.actions); + for action in actions { + action(self.pool.clone()).await; + } + // Drop runs after this returns, but `actions` is now empty so + // Drop is a no-op. + } +``` + +- [ ] **Step 2: Update the Drop fallback to warn on pending actions** + +In the `impl Drop for CleanupGuard` block from Task 1, add an `eprintln!` warning **before** the existing thread-spawn fallback. The full updated block looks like this (changes marked with comments): + +```rust +impl Drop for CleanupGuard { + fn drop(&mut self) { + let actions = std::mem::take(&mut self.actions); + if actions.is_empty() { + return; + } + + // NEW: warn that the test forgot .cleanup().await — this is a + // bug, but during migration we still try to clean up so we + // don't leak DB rows. + eprintln!( + "warning: CleanupGuard dropped with {} pending actions — \ + test forgot to call .cleanup().await", + actions.len() + ); + + let pool = self.pool.clone(); + let handle = std::thread::spawn(move || { + // ... unchanged from Task 1 ... + }); + + // ... unchanged bounded-join logic from Task 1 ... + } +} +``` + +- [ ] **Step 3: Compile check** + +Run: `SQLX_OFFLINE=true cargo build --tests -p vc-server 2>&1 | tail -10` + +Expected: clean build. + +- [ ] **Step 4: Verify compilation** + +The helpers/mod.rs file is integration-test infrastructure that needs DATABASE_URL to actually exercise. We don't have a unit-test smoke check here — Tasks 3-13 will exercise the new method against real tests. + +Run: `SQLX_OFFLINE=true cargo build --tests -p vc-server 2>&1 | tail -3` + +Expected: clean. + +- [ ] **Step 5: Commit** + +```bash +git add server/tests/integration/helpers/mod.rs +git commit -m "feat(test): add CleanupGuard::cleanup() explicit method + +Tests should call .cleanup().await at the end of the body instead of +relying on Drop. The new method consumes self, runs all cleanup actions +on the caller's tokio runtime, and leaves Drop as a no-op. + +The Drop fallback now also prints a 'dropped with N pending actions' +warning so the migration progress is visible. After all tests are +migrated and CI is green for ~2 weeks, the warning will be replaced +with a panic in a follow-up commit. + +Refs spec docs/superpowers/specs/2026-04-11-security-audit-followups-design.md" +``` + +--- + +## Tasks 3-13: Migrate test files to `.cleanup().await` + +The migration is purely mechanical. For every test that has `let mut guard = app.cleanup_guard();` (or `let mut guard = CleanupGuard::new(pool.clone());`), add `guard.cleanup().await;` immediately before the test function returns. + +**The pattern:** + +```rust +// Before: +#[tokio::test] +async fn test_something() { + let app = TestApp::new().await; + let (user_id, _) = create_test_user(&app.pool).await; + let mut guard = app.cleanup_guard(); + guard.delete_user(user_id); + + // ... assertions ... +} + +// After: +#[tokio::test] +async fn test_something() { + let app = TestApp::new().await; + let (user_id, _) = create_test_user(&app.pool).await; + let mut guard = app.cleanup_guard(); + guard.delete_user(user_id); + + // ... assertions ... + + guard.cleanup().await; +} +``` + +**Important caveats for every migration task:** + +1. **Place `guard.cleanup().await;` after the LAST assertion**, not before. If an assertion fails, the test panics and the Drop fallback handles cleanup. We want the happy path to call cleanup explicitly. + +2. **If the test has early returns** (e.g., `return;` inside an `if`), insert `guard.cleanup().await;` before each return. Or refactor to a single exit point (`break 'outer;`). The Drop fallback handles the panic path; you're only fixing the happy path. + +3. **If the test moves `guard` into a closure** (e.g., `tokio::spawn(async move { guard.cleanup().await })`), the migration is non-trivial. Flag it in the commit message and handle it case-by-case. For Tasks 3-13, expect the simple case unless the file has unusual structure. + +4. **Don't touch tests that don't use `CleanupGuard`** — they're out of scope. + +5. **Don't add `cleanup().await` to tests inside `#[ignore]` blocks** unless they would pass in a clean environment. The setup_concurrent_http tests are already `#[ignore]`'d (per issue #509); migrate them anyway since the guard pattern is the same. + +### Per-task verification template + +For every migration task (3-13), the steps are: + +- [ ] **Step A: Read the test file** + +Run: `cat server/tests/integration/.rs | grep -nE "fn test_|cleanup_guard|CleanupGuard::new|guard\.cleanup"` + +This gives a quick overview of which test functions touch the guard. + +- [ ] **Step B: Edit the file** + +For each test function that creates a guard, add `guard.cleanup().await;` before the closing `}` of the function (or before each early return). + +Use the Edit tool with the exact `let mut guard = ...` line as the anchor, plus the surrounding context, to make the edits unambiguous. + +- [ ] **Step C: Compile check** + +Run: `SQLX_OFFLINE=true cargo build --tests -p vc-server 2>&1 | tail -5` + +Expected: clean. If errors, the most likely issue is a borrow problem (e.g., the test reads from `guard.pool` after `cleanup()` consumes it). Fix by moving the `pool` clone before the cleanup. + +- [ ] **Step D: Run the file's tests** + +Run: `SQLX_OFFLINE=true cargo nextest run -p vc-server --test integration 2>&1 | tail -10` + +Where `` is the file basename without extension (e.g., `guild_limits`). + +Expected: same pass/fail count as before the migration. We don't need DATABASE_URL set for this — the failures will be `Connection refused`, not "dropped with pending actions". The compile success is what matters here. + +- [ ] **Step E: Commit** + +```bash +git add server/tests/integration/.rs +git commit -m "test: migrate to explicit CleanupGuard::cleanup().await" +``` + +### Task 3: Migrate `guild_limits.rs` (priority: 2 known-flaky tests) + +10 guard usages. Files: `guild_limits.rs`. Follow the per-task template above. The known-flaky tests in this file are `test_globally_banned_user_cannot_join_via_discovery` and `test_channel_limit` — give them extra attention. + +### Task 4: Migrate `custom_status.rs` (priority: 1 known-flaky test) + +6 guard usages. The known-flaky test is `test_custom_status_with_expiry_persists`. Note this file uses the `CleanupGuard::new(pool.clone())` pattern, not `app.cleanup_guard()`. + +### Task 5: Migrate `workspaces.rs` (largest file) + +24 guard usages. Use Edit's `replace_all` cautiously — every `let mut guard = ...` is followed by a different test body, so each function needs a separate edit for the closing-brace insertion. + +### Task 6: Migrate `filters_http.rs` + +17 guard usages. + +### Task 7: Migrate `governance.rs` + +12 guard usages. + +### Task 8: Migrate `webhooks.rs` + +10 guard usages. + +### Task 9: Migrate `channel_pins.rs` + +10 guard usages. + +### Task 10: Migrate `connectivity_http.rs` + +8 guard usages. + +### Task 11: Migrate the 6-usage files (one commit each) + +Three files: `setup_http.rs`, `messages_http.rs`, `media_processing.rs`. Migrate each in a separate commit (so review can be per-file). + +### Task 12: Migrate the 5-usage files (one commit each) + +Three files: `bot_intents.rs`, `dm_http.rs`, `channels_http.rs`. + +### Task 13: Migrate the small files + +Two files: `uploads_http.rs` (3 usages), `setup_concurrent_http.rs` (2 usages — already `#[ignore]`'d, but migrate for consistency). One commit each. + +--- + +## Task 14: Final verification + +**Files:** none (verification only) + +- [ ] **Step 1: Compile entire test suite** + +Run: `SQLX_OFFLINE=true cargo build --tests --workspace 2>&1 | tail -10` + +Expected: clean build. + +- [ ] **Step 2: Run lib tests** + +Run: `SQLX_OFFLINE=true cargo nextest run -p vc-server --lib 2>&1 | tail -5` + +Expected: same baseline pass count as Task 1 Step 2. + +- [ ] **Step 3: Static check for unmigrated tests** + +Run: `grep -rn "let mut guard = \(app\.cleanup_guard\|CleanupGuard::new\)" server/tests/integration/ | wc -l` + +Record the count. Then: + +Run: `grep -rn "guard\.cleanup()\.await" server/tests/integration/ | wc -l` + +Expected: roughly equal counts. If the `cleanup().await` count is significantly lower, some tests were missed. Re-grep per file to find them. + +- [ ] **Step 4: Verify CI gate stays green** + +Run: `cargo fmt --check 2>&1 | tail -3` +Run: `SQLX_OFFLINE=true cargo clippy -p vc-server --tests -- -D warnings 2>&1 | tail -10` + +Both must be clean. + +- [ ] **Step 5: Push and create PR** + +```bash +git push -u origin fix/cleanup-guard-test-flake +gh pr create --title "fix(test): eliminate CleanupGuard CI flake (#900/900 timeout)" --body "$(cat <<'EOF' +## Summary + +Eliminates the recurring CI flake where Rust integration tests intermittently time out at nextest's 120s slow-timeout. Root cause: CleanupGuard::drop spawns a thread + tokio runtime + .join() with no timeout. When cleanup hangs, the test process holds for 120s and gets killed. + +## Layers + +**Layer 1 (commit 1)** — Bounded join in CleanupGuard::drop. 30-second polling join, then detach. Stopgap that immediately removes the failure mode. + +**Layer 2 (commits 2-13)** — Add explicit cleanup(self).await method, migrate ~136 test sites across 16 files. Drop fallback now also prints a "dropped with N pending actions" warning so unmigrated tests are visible. + +**Layer 3 (deferred)** — After ~2 weeks of clean main-branch CI, a follow-up commit will replace the Drop fallback's eprintln with panic!() to enforce the contract permanently. Tracked in a separate issue (link in this PR description). + +## Test plan + +- [x] cargo build --tests --workspace clean +- [x] cargo nextest run -p vc-server --lib clean +- [x] cargo fmt --check clean +- [x] cargo clippy -p vc-server --tests -- -D warnings clean +- [ ] CI Rust Tests passes 3 consecutive runs (verify in checks) +- [ ] Local stress test (20 nextest runs back-to-back) produces zero "did not complete within" warnings — the implementer should run this before merging + +## Refs + +- Spec: docs/superpowers/specs/2026-04-11-security-audit-followups-design.md (Topic 4) +- Related: #509 (same anti-pattern in setup tests) + +🤖 Generated with [Claude Code](https://claude.com/claude-code) +EOF +)" +``` + +- [ ] **Step 6: Local stress test (recommended before merge)** + +Run: `for i in {1..20}; do echo "=== run $i ==="; SQLX_OFFLINE=true cargo nextest run -p vc-server --lib 2>&1 | tail -3; done` + +(Skip the integration tests since they need DB. The lib tests still exercise the helpers/mod.rs compilation paths.) + +If you have a local DB available, also run integration tests: + +Run: `for i in {1..20}; do echo "=== run $i ==="; cargo nextest run -p vc-server --test integration 2>&1 | grep -E "Summary|did not complete"; done` + +Expected: zero "did not complete within" warnings across 20 runs. If ANY warning appears, pause and investigate the failing test before merging. + +- [ ] **Step 7: Wait for CI** + +Watch the PR's CI checks. The Rust Tests job should pass on the first run. If it doesn't, investigate the specific failure (which test? did it print a warning? check the slow-timeout pattern). + +- [ ] **Step 8: After merge — file follow-up issue** + +Create a tracking issue for the Layer 3 panic activation: + +```bash +gh issue create --title "Follow-up: replace CleanupGuard Drop eprintln with panic!()" --body "After ~2 weeks of clean main-branch CI (no 'dropped with N pending actions' warnings), replace the eprintln in server/tests/integration/helpers/mod.rs::Drop with panic!() to permanently enforce the .cleanup().await contract. + +Verify before flipping: +- grep main branch CI logs for 'dropped with N pending actions' over the last 2 weeks +- if zero hits, ship the panic +- if any hits, fix the specific test first + +Spec: docs/superpowers/specs/2026-04-11-security-audit-followups-design.md (Topic 4 Layer 3)" +``` + +--- + +## Done criteria + +- [ ] Layer 1 stopgap committed +- [ ] Layer 2 cleanup() method committed +- [ ] All 16 migration files committed (Tasks 3-13) +- [ ] Final verification (Task 14) passed +- [ ] PR opened +- [ ] CI Rust Tests green +- [ ] Follow-up issue filed for Layer 3 panic activation diff --git a/docs/superpowers/plans/2026-04-11-client-devtime-advisories.md b/docs/superpowers/plans/2026-04-11-client-devtime-advisories.md new file mode 100644 index 00000000..f40bab4d --- /dev/null +++ b/docs/superpowers/plans/2026-04-11-client-devtime-advisories.md @@ -0,0 +1,356 @@ +# Client Devtime Advisories Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Reduce `bun audit` from 17 transitive devtime advisories to 0 (or to a documented small set of accepted exceptions) by applying `package.json` overrides on a non-worktree branch. + +**Architecture:** Apply Plan A (overrides on a regular branch off main, not a git worktree). Validate locally with the four client gates: install, audit, test, build. If overrides reproduce the bun worktree-divergence bug from PR #513, fall back to Plan B (lockstep updates of vite/vitest/vite-plugin-solid/unocss/mermaid). Side investigation: confirm whether mermaid is statically or lazy-loaded. + +**Tech Stack:** Bun, Vite, Vitest, Solid.js, mermaid + +**Spec:** `docs/superpowers/specs/2026-04-11-security-audit-followups-design.md` (Topic 1) + +**Branch:** `fix/client-devtime-advisories` (created directly off main, NOT via `git worktree add`) + +--- + +## File structure + +| File | Role | +|---|---| +| `client/package.json` | Add `overrides` block | +| `client/bun.lock` | Regenerated by `bun install` | +| `SECURITY.md` (only if Plan B falls through to documented exceptions) | New file documenting accepted exceptions | + +--- + +## Task 1: Branch setup (must be a regular branch, not a worktree) + +**Background:** PR #513's earlier attempt at this hit a bun resolution bug specific to git worktrees: the same `package.json` + `bun.lock` produced a divergent `node_modules` (missing `esbuild`, `@ampproject/remapping`, ~38 other packages) when installed inside a git worktree, breaking `vite-plugin-solid`'s JSX parsing in 4 test files. The hypothesis is the bug is worktree-specific, so this plan creates a regular branch. + +- [ ] **Step 1: Verify you're in the main worktree (not a git worktree)** + +Run: `pwd && git rev-parse --git-common-dir && git rev-parse --git-dir` + +Expected: +- `pwd` shows `/home/detair/GIT/detair/kaiku` (the main repo path — substitute your actual path) +- `git-common-dir` and `git-dir` are **identical** (both point to `/.git`) + +In a git worktree, `git-common-dir` points to the main `.git` while `git-dir` points to `/.git/worktrees/`. They differ. If they differ in your output, you're in a worktree — `cd` to the main repo path before continuing. + +Sanity check: `git worktree list` should show your current path as the FIRST entry (the main worktree). + +- [ ] **Step 2: Create the branch from main** + +Run: `git checkout main && git pull --ff-only && git checkout -b fix/client-devtime-advisories` + +Expected: branch created at main HEAD. + +- [ ] **Step 3: Establish a baseline** + +Run: `cd client && rm -rf node_modules && bun install 2>&1 | tail -5 && bun audit 2>&1 | tail -5 && bun run test:run 2>&1 | tail -5 && bun run build 2>&1 | tail -5` + +Expected output: +- `bun install`: clean install, ~500-510 packages +- `bun audit`: "17 vulnerabilities (10 high, 7 moderate)" (the baseline we're trying to fix) +- `bun run test:run`: "577 passed" (the baseline we must not regress) +- `bun run build`: "✓ built in s" + +If any of these don't match, investigate before applying the overrides — the baseline is broken on main and that's a separate issue. + +- [ ] **Step 4: Commit the baseline note** + +No file changes yet. Just confirm to yourself in your notes: +- Baseline: 17 vulns / 577 passed / build OK + +Don't commit anything yet — Task 2 produces the first commit. + +--- + +## Task 2: Plan A — Apply overrides + +**Files:** +- Modify: `client/package.json` (add `overrides` block at the end) +- Auto-modified: `client/bun.lock` + +- [ ] **Step 1: Add the overrides block** + +Edit `client/package.json`. The current file ends with: + +```json + "vitest": "^4.1.0" + } +} +``` + +Change it to: + +```json + "vitest": "^4.1.0" + }, + "overrides": { + "picomatch": "^4.0.4", + "rollup": "^4.60.1", + "flatted": "^3.4.2", + "brace-expansion": "^1.1.13", + "lodash-es": "^4.17.24", + "defu": "^6.1.5", + "vite": "8.0.5" + } +} +``` + +Note: `vite: 8.0.5` is pinned without a caret because 8.0.8 has a vite-plugin-solid JSX regression we hit during PR #513. 8.0.5 has the fs.deny security fix and works with our JSX. Future bumps should test JSX parsing first. + +- [ ] **Step 2: Reinstall** + +Run: `cd client && rm -rf node_modules && bun install 2>&1 | tail -10` + +Expected: clean install, ~510-520 packages installed, no errors. + +If you see errors about peer dependency conflicts (e.g., vite peer mismatches), the override approach is hitting limits — proceed to Plan B (Task 5). + +- [ ] **Step 3: Verify node_modules is complete (worktree bug check)** + +Run: `ls node_modules | wc -l && ls node_modules/esbuild node_modules/@ampproject/remapping 2>&1` + +Expected: +- Package count: **at least 500** (PR #513's broken state had ~340; main's healthy state has ~380; adding overrides should produce 500-520). If the count is below 380, the bun bug has likely reproduced. +- Both `esbuild` and `@ampproject/remapping` directories exist. + +If the count is suspiciously low or either directory is missing, the bun worktree-divergence bug has reproduced even on a regular branch — escalate to user (the spec's Plan B fallback won't help because the bun bug isn't worktree-specific). Stop and report. + +If both checks pass: you've confirmed the regular-branch hypothesis. Continue. + +- [ ] **Step 4: Run audit** + +Run: `cd client && bun audit 2>&1 | tail -10` + +Expected: "No vulnerabilities found" or a small remaining count (<5). + +If 0 vulnerabilities: continue to Step 5. + +If some vulnerabilities remain: list them. They might be advisories that depend on transitive packages our overrides don't reach. Decide per remaining vuln whether to add to overrides or accept as documented exception (per spec Topic 1 acceptance boundary). + +- [ ] **Step 5: Run tests** + +Run: `cd client && bun run test:run 2>&1 | tail -10` + +Expected: "577 passed" (or whatever the baseline was in Task 1 Step 3). + +If tests fail with "Failed to parse source for import analysis" or similar JSX errors: the overrides have broken vite-plugin-solid's JSX transform. This is the same failure mode as PR #513. Investigate which override caused it (try removing overrides one at a time). Most likely culprit: `vite: 8.0.5` if vite-plugin-solid expects an exact different version. + +- [ ] **Step 6: Run build** + +Run: `cd client && bun run build 2>&1 | tail -10` + +Expected: "✓ built in s", no errors. Warnings about ineffective dynamic imports and plugin timings are pre-existing and OK. + +- [ ] **Step 7: Commit** + +```bash +git add client/package.json client/bun.lock +git commit -m "chore(client): override transitive deps to clear devtime advisories + +Adds package.json overrides for picomatch, rollup, flatted, +brace-expansion, lodash-es, defu, and vite to resolve 17 devtime +advisories reported by bun audit. + +vite is pinned at 8.0.5 (not ^8.0.5) because 8.0.8 has a +vite-plugin-solid JSX regression we hit during PR #513. + +Direct dependencies are unchanged. The overrides only affect transitive +resolution. 577/577 client tests still pass and bun run build succeeds. + +Refs spec docs/superpowers/specs/2026-04-11-security-audit-followups-design.md (Topic 1)" +``` + +--- + +## Task 3: Side investigation — mermaid load mode + +**Files:** +- Modify: PR description (and optionally a code comment) + +The spec asks: is `mermaid` statically imported or lazy-loaded? This determines whether the `lodash-es` runtime exposure (via `mermaid › dagre-d3-es`) is actually reachable on every page load. + +- [ ] **Step 1: Find mermaid imports in the client source** + +Run: `grep -rnE "from ['\"]mermaid['\"]|import\(['\"]mermaid['\"]\)" client/src/ 2>&1` + +This catches both static (`import mermaid from 'mermaid'`) and dynamic (`await import('mermaid')`) forms. Expected: a small number of import sites total. + +- [ ] **Step 2: Determine load mode** + +If imports are dynamic (`await import('mermaid')` inside a function or component): mermaid is lazy-loaded. The lodash-es runtime exposure only triggers when a markdown code block with `language-mermaid` is rendered. Document this in the PR description. + +If imports are static: mermaid runs on every page load. The lodash-es exposure is real. Note this as a follow-up — propose lazy-loading mermaid in a separate issue, but don't block this PR on it. + +- [ ] **Step 3: Document in PR description** + +Add a "Mermaid load mode" section to the PR description explaining the finding. Example: + +> **Mermaid load mode:** Mermaid is lazy-loaded via `await import('mermaid')` inside `MarkdownPreview.tsx`. The `lodash-es › dagre-d3-es` runtime path only executes when a user views a markdown block with `language-mermaid`, which already passes through dompurify sanitization. The remaining `lodash-es` advisories (if any) describe `_.template` code injection — not reachable from sanitized strings. Acceptable. + +Or, if static: + +> **Mermaid load mode:** Mermaid is statically imported in `MarkdownPreview.tsx:12`. The `lodash-es › dagre-d3-es` chain runs on every page load. The override to `lodash-es ^4.17.24` patches the known advisory. **Follow-up:** lazy-load mermaid (separate issue) to reduce bundle size and surface area. + +This step is documentation only — no code change. + +--- + +## Task 4: Verify CI passes + +- [ ] **Step 1: Push the branch** + +Run: `git push -u origin fix/client-devtime-advisories` + +- [ ] **Step 2: Open the PR** + +```bash +gh pr create --title "chore(client): override transitive deps to clear devtime advisories" --body "$(cat <<'EOF' +## Summary + +Reduces \`bun audit\` from 17 vulnerabilities (10 high, 7 moderate) to 0 by adding \`package.json\` overrides for transitive devtime dependencies. + +## Overrides + +| Package | Pinned version | Why | +|---|---|---| +| picomatch | ^4.0.4 | ReDoS + POSIX injection (transitive via tinyglobby chain) | +| rollup | ^4.60.1 | Path traversal (transitive via @rollup/plugin-commonjs) | +| flatted | ^3.4.2 | Unbounded recursion DoS, proto pollution (transitive via eslint) | +| brace-expansion | ^1.1.13 | Various (transitive) | +| lodash-es | ^4.17.24 | _.template code injection, proto pollution (transitive via mermaid) | +| defu | ^6.1.5 | __proto__ pollution (transitive via unocss config) | +| vite | 8.0.5 | Dev-server fs.deny bypass + path traversal (direct + transitive) | + +vite is pinned at 8.0.5 (not ^8.0.5) because 8.0.8 introduces a vite-plugin-solid JSX regression we encountered in PR #513. + +## Mermaid load mode + +[fill in based on Task 3 findings] + +## Test plan + +- [x] \`bun audit\` reports 0 vulnerabilities +- [x] \`bun run test:run\` passes 577/577 +- [x] \`bun run build\` succeeds +- [ ] CI Frontend job passes + +## Refs + +- Spec: docs/superpowers/specs/2026-04-11-security-audit-followups-design.md (Topic 1) +- Earlier failed attempt: PR #513 (hit a bun worktree-resolution bug; this branch was created directly off main, not via \`git worktree add\`, to avoid that bug) + +🤖 Generated with [Claude Code](https://claude.com/claude-code) +EOF +)" +``` + +- [ ] **Step 3: Wait for CI** + +The Frontend job should pass. The Rust jobs and other CI checks shouldn't be affected — this PR only touches client files. + +If the Frontend job fails with the same JSX regression we saw in PR #513, the bun bug isn't worktree-specific after all. Stop and escalate to user; proceed to Plan B (Task 5). + +- [ ] **Step 4: Merge** + +Once CI is green and the PR has been reviewed, squash-merge: + +```bash +gh pr merge --squash --delete-branch +``` + +--- + +## Task 5: Plan B fallback (only if Plan A fails) + +**Skip this task if Plan A succeeded.** + +If overrides still produce a divergent `node_modules` even on a regular branch, the bun bug is more general than worktree-specific. Plan B is a lockstep update of direct dependencies to versions whose transitives are already patched upstream. + +**Files:** +- Modify: `client/package.json` (bump direct dep versions, remove overrides) +- Auto-modified: `client/bun.lock` +- Possibly create: `SECURITY.md` (if some advisories remain accepted) + +- [ ] **Step 1: Reset the branch** + +```bash +git reset --hard main +git clean -fd client/ +``` + +- [ ] **Step 2: Identify candidate direct dep bumps** + +Run: `cd client && bun outdated 2>&1 | tail -20` + +Expected: a list of direct deps with current vs latest versions. + +For each of: `vite`, `vitest`, `vite-plugin-solid`, `unocss`, `mermaid`, check the "latest" column. + +- [ ] **Step 3: Test each bump in isolation** + +For each candidate, do a single-dep update and re-run the four gates: + +```bash +bun update +bun audit | tail -5 +bun run test:run | tail -5 +bun run build | tail -5 +``` + +Record which bumps reduce the audit count and which don't break tests/build. + +**Known landmines from PR #513:** +- vite 8.0.8 → breaks vite-plugin-solid JSX parsing. Stick to 8.0.5 if you bump vite. +- vitest 4.1.4 → also breaks (combined with vite 8.0.8). Stick to 4.1.0 unless you're also bumping vite to a known-good version. + +- [ ] **Step 4: Apply the bumps that worked** + +Apply only the bumps that (a) reduced advisory count AND (b) didn't break tests/build. Re-run all four gates after the combined update. + +- [ ] **Step 5: Document remaining exceptions** + +Per the spec's Topic 1 acceptance boundary, the only acceptable remaining exceptions are: +- `lodash-es` via `mermaid › dagre-d3-es`, IF mermaid is lazy-loaded +- Anything in the dev-server / build-tool category, IF you can document (1) why upstream hasn't moved, (2) what the dev-time exploitation path looks like, (3) how a developer would notice an exploit + +Create `client/SECURITY.md` (or extend `SECURITY.md` if one already exists) with one entry per accepted exception: + +```markdown +# Client Security Exceptions + +## defu <= 6.1.4 (proto pollution) + +**Status:** Accepted devtime exception +**Path:** `unocss → @unocss/vite → @unocss/config → unconfig → defu` +**Why upstream hasn't moved:** unconfig hasn't bumped its defu pin. Tracked at . +**Exploitation path:** Build-time only. Requires malicious config file already on disk. +**Developer detection:** Build would emit unusual artifacts; CI bun audit would still report it. +``` + +Continue for each remaining exception. + +- [ ] **Step 6: Commit and push** + +```bash +git add client/package.json client/bun.lock client/SECURITY.md +git commit -m "chore(client): bump direct deps to clear devtime advisories (Plan B)" +git push -u origin fix/client-devtime-advisories +``` + +Then proceed to Task 4 Steps 2-4 to open and merge the PR. + +--- + +## Done criteria + +- [ ] `bun audit` reports 0 vulnerabilities (Plan A) or ≤2 documented exceptions per acceptance boundary (Plan B) +- [ ] `bun run test:run` passes 577/577 +- [ ] `bun run build` succeeds +- [ ] CI Frontend job passes on the PR +- [ ] PR merged to main +- [ ] Mermaid load mode documented in PR description diff --git a/docs/superpowers/plans/2026-04-11-osv-scanner-ci.md b/docs/superpowers/plans/2026-04-11-osv-scanner-ci.md new file mode 100644 index 00000000..12afbd08 --- /dev/null +++ b/docs/superpowers/plans/2026-04-11-osv-scanner-ci.md @@ -0,0 +1,210 @@ +# OSV Scanner CI Job Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Add an `osv-scanner` job to the existing `.github/workflows/security.yml` workflow as an independent advisory source. Scans Rust + npm + workflow files in one pass using the OSV database (independent of RustSec and GitHub Advisory Database). + +**Architecture:** Add a new job to `security.yml` (do not create a new workflow file). The job runs on the same triggers as the rest of `security.yml` (weekly schedule, push to main, manual dispatch). Failure semantics: scanner exit code is the gate, no `continue-on-error`. SARIF output uploads to GitHub Code Scanning for tracking. + +**Tech Stack:** GitHub Actions, osv-scanner (Google), SARIF / Code Scanning + +**Spec:** `docs/superpowers/specs/2026-04-11-security-audit-followups-design.md` (Topic 2) + +**Branch:** `feat/osv-scanner-ci` + +**Important context:** The existing `security.yml` already has `rust-audit` (cargo-audit), `dependencies` (cargo-deny), `bun-audit` (bun pm scan with `continue-on-error`), `secrets-scan` (gitleaks), and `codeql` (JS/TS). osv-scanner is a NEW job that adds the OSV database as a third independent source (RustSec via cargo-audit/deny, GHSA via CodeQL, OSV via osv-scanner). + +**Ordering note:** The spec lists this topic as depending on Topic 1 being green ("otherwise the new job fails day 1"). If Topic 1 hasn't merged yet, the osv-scan job will likely report the same 17 devtime advisories that bun audit currently reports. Either land Topic 1 first, or expect the first run on this PR to fail with those advisories. + +**Action versioning:** This plan pins `google/osv-scanner-action` to a specific tag rather than a floating major. Upstream explicitly recommends pinning because "the scanner action behavior might change in a minor patch update." There is **no `v2` tag** in the upstream repo — only patch tags like `v2.3.5`. Floating-major refs will fail to resolve at job-start time. + +--- + +## File structure + +| File | Role | +|---|---| +| `.github/workflows/security.yml` | Add `osv-scan` job (modification, not creation) | + +--- + +## Task 1: Add the osv-scan job + +**Files:** +- Modify: `.github/workflows/security.yml` (append a new job at the end) + +- [ ] **Step 1: Verify the existing workflow structure** + +Run: `head -20 .github/workflows/security.yml` + +Expected: `name: Security Audit`, triggers include weekly cron + push to main + workflow_dispatch. Confirm the file exists and is the one we'll be modifying. + +- [ ] **Step 2: Add the osv-scan job** + +Append to the end of `.github/workflows/security.yml` (after the `codeql` job, on a new line): + +```yaml + + # =========================================================================== + # OSV Scanner — independent advisory source (OSV database) + # =========================================================================== + osv-scan: + name: OSV Scanner + runs-on: ubuntu-latest + permissions: + contents: read + security-events: write # for SARIF upload to Code Scanning + steps: + - uses: actions/checkout@v4 + + - name: Run osv-scanner + # Exit code is the gate. Do NOT set continue-on-error. + # SARIF upload below uses if: always() to publish results + # to Code Scanning even on scanner failure. + # + # Pinned to v2.3.5 specifically because: + # 1. Upstream has no `v2` floating tag — only patch tags exist. + # 2. Upstream warns: "the scanner action behavior might change in + # a minor patch update." Pin precisely. + # + # Before merging this PR, the implementer should check + # https://github.com/google/osv-scanner-action/releases for any + # newer patch tag and update if appropriate. + uses: google/osv-scanner-action/osv-scanner-action@v2.3.5 + with: + # --severity=HIGH gates only on HIGH+ findings, matching the + # spec's recommended threshold. MEDIUM/LOW still appear in the + # SARIF upload (and the Security tab) but don't fail the job. + scan-args: |- + --recursive + --skip-git + --severity=HIGH + --format=sarif + --output=osv-results.sarif + ./ + + - name: Upload SARIF to Code Scanning + if: always() + uses: github/codeql-action/upload-sarif@v3 + with: + sarif_file: osv-results.sarif + category: osv-scanner +``` + +**Notes for the implementer:** + +- The job uses the same checkout-based pattern as other jobs in this file. No bun setup or rust toolchain needed — osv-scanner reads `Cargo.lock`, `client/bun.lock`, etc. directly. +- `--skip-git` prevents the scanner from trying to scan submodules or git history. We only care about the lockfiles in the working tree. +- `--severity=HIGH` matches the spec's recommended gating threshold. +- `category: osv-scanner` separates osv-scanner findings from other SARIF uploads (CodeQL) in the GitHub Security tab. +- `if: always()` on the upload step means SARIF is published even when the scan finds vulnerabilities (and the prior step exited non-zero). This is intentional — failed scans are exactly when you want the SARIF in the Security tab. + +**bun.lock support caveat:** osv-scanner's lockfile parser support has historically lagged on bun. There is a real chance that `--recursive` will only pick up `Cargo.lock` and skip `client/bun.lock`. If that happens, the OSV coverage is Rust-only (still useful — `cargo audit` and `cargo deny` cover RustSec; OSV is the third independent source for Rust). The first CI run will tell us. If bun.lock is skipped, document the limitation in the PR description and file a follow-up to reach out to the osv-scanner maintainers about bun support. + +- [ ] **Step 3: Validate YAML syntax** + +Run: `python3 -c "import yaml; yaml.safe_load(open('.github/workflows/security.yml'))" 2>&1` + +Expected: no output (success). If you see a YAML parse error, fix the indentation. Common mistakes: +- Tab characters instead of spaces (must be spaces only) +- Wrong indentation level for `osv-scan:` (should be at the same level as `codeql:`) +- Missing newline before the new job + +- [ ] **Step 4: Lint with actionlint (optional, only if available)** + +Run: `which actionlint && actionlint .github/workflows/security.yml 2>&1 || echo "actionlint not installed, skipping"` + +Expected: clean output or "actionlint not installed". If actionlint is installed and reports errors, fix them before committing. + +- [ ] **Step 5: Commit** + +```bash +git add .github/workflows/security.yml +git commit -m "ci(security): add osv-scanner job for OSV database coverage + +Adds Google's osv-scanner as a new job in security.yml. Uses the OSV +database, which is independent of RustSec (covered by cargo-audit and +cargo-deny) and GHSA (covered by CodeQL). Provides a third independent +advisory source. + +Failure semantics: scanner exit code is the gate. SARIF output uploads +to GitHub Code Scanning even on scan failure (if: always()) so the +Security tab tracks history. + +Refs spec docs/superpowers/specs/2026-04-11-security-audit-followups-design.md (Topic 2)" +``` + +--- + +## Task 2: Push and verify + +- [ ] **Step 1: Push the branch** + +Run: `git push -u origin feat/osv-scanner-ci` + +- [ ] **Step 2: Open the PR** + +```bash +gh pr create --title "ci(security): add osv-scanner job for OSV database coverage" --body "$(cat <<'EOF' +## Summary + +Adds an osv-scanner job to \`.github/workflows/security.yml\`. Uses the OSV vulnerability database, independent from RustSec (cargo-audit/deny) and GHSA (CodeQL). + +## Failure semantics + +- Scanner exit code is the gate. No \`continue-on-error\`. +- SARIF upload runs on \`if: always()\` so failed scans still publish to GitHub Security tab. +- Default severity threshold (any). May be tightened to \`--severity=HIGH\` after observing baseline noise. + +## Test plan + +- [x] YAML validates with python3 yaml.safe_load +- [ ] First CI run on this PR completes (job either passes or reports vulnerabilities cleanly) +- [ ] If the job fails, the failures are real advisories — not transient errors +- [ ] SARIF results visible in GitHub Security tab under \"osv-scanner\" category + +## Refs + +- Spec: docs/superpowers/specs/2026-04-11-security-audit-followups-design.md (Topic 2) + +🤖 Generated with [Claude Code](https://claude.com/claude-code) +EOF +)" +``` + +- [ ] **Step 3: Watch the first run** + +Run: `gh pr checks ` and look for the "OSV Scanner" job. + +Expected outcomes: + +a) **Job passes** (zero vulnerabilities): great, ship it. + +b) **Job fails with real vulnerabilities**: this is informative. The OSV database may flag advisories that RustSec/GHSA don't. Investigate each finding: +- If it's a duplicate of something cargo-deny/CodeQL already catches: harmless, but we're paying for the duplicate. Acceptable. +- If it's a new advisory we hadn't seen: this is the value of the second source. Add it to a follow-up issue and fix in a separate PR. +- If it's a transitive devtime issue covered by Topic 1's Plan A: should already be patched. If not, Topic 1 missed it. + +c) **Job times out or errors out**: investigate. The scanner may need network access for the OSV database lookup. If GitHub Actions runners can't reach osv.dev, escalate. + +If the scan reports new vulnerabilities, **do not fix them in this PR**. Open a separate issue and ship the workflow change first. + +- [ ] **Step 4: Iterate if needed** + +The plan ships with `--severity=HIGH` already, so MEDIUM/LOW findings only show in the SARIF upload (and Security tab) and don't gate. If the HIGH+ count is still too noisy, raise to `--severity=CRITICAL`. If you want broader coverage and the noise is acceptable, lower to `--severity=MEDIUM`. Re-push, re-check, repeat as needed. + +- [ ] **Step 5: Merge** + +```bash +gh pr merge --squash --delete-branch +``` + +--- + +## Done criteria + +- [ ] `osv-scan` job exists in `.github/workflows/security.yml` +- [ ] First CI run on the PR completes (passes or reports real findings) +- [ ] SARIF results visible in GitHub Security tab under `osv-scanner` category +- [ ] PR merged to main +- [ ] Subsequent main-branch runs include the osv-scan job diff --git a/docs/superpowers/plans/2026-04-11-scap-fork-cleanup.md b/docs/superpowers/plans/2026-04-11-scap-fork-cleanup.md new file mode 100644 index 00000000..fb2525ab --- /dev/null +++ b/docs/superpowers/plans/2026-04-11-scap-fork-cleanup.md @@ -0,0 +1,318 @@ +# scap Fork Cleanup Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Drop the `Detair/scap` fork dependency in `client/src-tauri/Cargo.toml` if upstream `CapSoftware/scap` main HEAD now compiles as a `vc-client` dependency on Linux. If not, open a narrow upstream PR with Detair's targeted Linux Frame enum fix. + +**Architecture:** Investigation-first plan. Step 1 is testing whether upstream now builds — that may take 5 minutes and produce a one-line Cargo.toml change. If upstream still doesn't build, the rest of the plan is "open an upstream PR with our fix" and there's no code change to ship in this repo until upstream merges. + +**Tech Stack:** Rust (Cargo), git, GitHub CLI + +**Spec:** `docs/superpowers/specs/2026-04-11-security-audit-followups-design.md` (Topic 3) + +**Branch:** `chore/scap-upstream-recheck` (only created if Step 2a succeeds) + +--- + +## File structure + +| File | Role | +|---|---| +| `client/src-tauri/Cargo.toml` | Modified only if upstream main builds (Step 2a) | + +--- + +## Task 1: Test if upstream main HEAD builds as a `vc-client` dependency + +**Files:** +- Modify: `client/src-tauri/Cargo.toml` (temporary edit, may be reverted) + +The previous breakage (scap 0.1.0-beta.1 Frame enum restructure) only manifests when a downstream consumer tries to use the API. Standalone `cargo check` on scap won't catch it. The right test is "does `cargo check -p vc-client` succeed against upstream main HEAD on Linux." + +- [ ] **Step 1: Get upstream main HEAD SHA** + +Run: `gh api repos/CapSoftware/scap/branches/main --jq '.commit.sha'` + +Expected: a 40-character SHA. Record it as ``. + +- [ ] **Step 2: Save the current Cargo.toml line for later restore** + +Run: `grep -n "^scap = " client/src-tauri/Cargo.toml` + +Expected: a single line near line 62, currently: +``` +scap = { git = "https://github.com/Detair/scap.git", branch = "fix/linux-frame-enum" } +``` + +Record this line — you may need to restore it if Step 2b applies. + +- [ ] **Step 3: Apply temporary edit pointing at upstream main** + +Edit `client/src-tauri/Cargo.toml`. Replace the existing `scap = { git = "https://github.com/Detair/scap.git", branch = "fix/linux-frame-enum" }` line with: + +```toml +scap = { git = "https://github.com/CapSoftware/scap.git", rev = "" } +``` + +(Substitute the actual SHA from Step 1.) + +**Do not commit yet.** This is a test edit. + +- [ ] **Step 4: Try to build vc-client** + +Run: `cargo check -p vc-client 2>&1 | tail -20` + +This will pull the new scap source from GitHub, then try to type-check the Tauri client against it. The relevant target is Linux (which is where the original break happened). Build expectations: + +- The first run will be slow (downloading + compiling scap and its deps). Be patient. +- The compile happens against the host target by default — usually Linux on a Linux dev machine. That's exactly what we want to test. + +Expected outcomes: + +a) **`cargo check` succeeds.** Upstream main now builds. Proceed to Step 2a. + +b) **`cargo check` fails with a Frame enum / VideoFrame / pipewire / linux/mod.rs error.** The Linux issue is still not fixed upstream. Proceed to Step 2b. + +c) **`cargo check` fails with an unrelated error** (e.g., missing system library, network failure cloning scap). Investigate and retry. Don't make a determination from a flaky failure. + +- [ ] **Step 5a: If `cargo check` succeeded — proceed to Task 2** + +Leave the temporary edit in place and continue to Task 2 (commit the change). + +- [ ] **Step 5b: If `cargo check` failed — revert and proceed to Task 3** + +Run: edit `client/src-tauri/Cargo.toml` to restore the original `scap = { git = "https://github.com/Detair/scap.git", branch = "fix/linux-frame-enum" }` line. + +Run: `git diff client/src-tauri/Cargo.toml` + +Expected: no diff (file restored to its original state). + +Then proceed to Task 3. + +--- + +## Task 2: Commit the upstream switch (only if Task 1 Step 5a applied) + +**Files:** +- Modify: `client/src-tauri/Cargo.toml` + +- [ ] **Step 1: Create the branch** + +Run: `git checkout -b chore/scap-upstream-recheck` + +(Created off main since you started Task 1 from main.) + +- [ ] **Step 2: Update the doc comment** + +**Substitute the actual `` from Task 1 Step 1 in BOTH the comment and the `rev = ` line below.** Do not commit a literal `` placeholder. + +The current Cargo.toml has a multi-line comment above the scap line explaining why we're using the Detair fork. With the fork dropped, that explanation is wrong. Edit the comment to explain the new state. Replace: + +```toml +# Screen capture +# Pinned to Detair/scap fork because upstream scap 0.1.0-beta.1 introduced a +# Frame enum restructure (Frame::Video(VideoFrame::*)) without updating the +# Linux pipewire backend, breaking the Linux build. Our fork (commit 4d1304e9) +# adapts the Linux backend to the new enum structure plus a Windows typo fix. +# +# Upstream PR that would fix this: https://github.com/CapSoftware/scap/pull/178 +# (open since 2025-10-26, broader scope: adds sequence numbers + SystemTime + +# latest-buffer grab). Once merged and released, drop this git dep and pin to +# the next scap release. Track in #TODO (or file an issue). +scap = { git = "https://github.com/CapSoftware/scap.git", rev = "" } +``` + +With: + +```toml +# Screen capture +# Pinned to upstream main HEAD until the next scap release. The Linux pipewire +# backend regression that required our Detair/scap fork has been resolved +# upstream. When CapSoftware/scap publishes a new release on crates.io, replace +# this git dep with a normal version pin. +scap = { git = "https://github.com/CapSoftware/scap.git", rev = "" } +``` + +- [ ] **Step 3: Full build verification** + +`cargo check` is faster but doesn't link. Do a full build to be safe: + +Run: `cargo build -p vc-client 2>&1 | tail -10` + +Expected: clean build, no errors. If linker errors appear (rare with Rust), investigate. + +- [ ] **Step 4: Run vc-client tests if any exist** + +Run: `cargo test -p vc-client 2>&1 | tail -10` + +Expected: pass (or "0 tests run" if no tests defined for vc-client). Errors here indicate something deeper than the dep change — investigate. + +- [ ] **Step 5: Cargo.lock check** + +Run: `git diff --stat Cargo.lock` + +Expected: Cargo.lock has changed (the scap dep entries now point at the new commit). Commit it along with Cargo.toml. + +- [ ] **Step 6: Commit** + +**Substitute the actual `` from Task 1 Step 1 in the commit message body.** + +```bash +git add client/src-tauri/Cargo.toml Cargo.lock +git commit -m "chore(client): drop Detair/scap fork, pin upstream main + +The Linux pipewire backend regression in scap 0.1.0-beta.1 that required +our Detair/scap fork has been resolved upstream. Switch to upstream main +HEAD until the next scap release. + +Verified: cargo build -p vc-client succeeds on Linux against upstream +commit . + +Refs spec docs/superpowers/specs/2026-04-11-security-audit-followups-design.md (Topic 3)" +``` + +- [ ] **Step 7: Push and PR** + +Run: `git push -u origin chore/scap-upstream-recheck` + +```bash +gh pr create --title "chore(client): drop Detair/scap fork, pin upstream main" --body "$(cat <<'EOF' +## Summary + +Drops the Detair/scap git fork and points scap at upstream CapSoftware/scap main HEAD. The Linux pipewire backend regression that motivated the fork has been resolved upstream. + +## Verification + +- \`cargo check -p vc-client\` succeeds on Linux against upstream main HEAD (commit ) +- \`cargo build -p vc-client\` succeeds on Linux +- All 4 build targets in CI (macos-latest, macos-15-intel, ubuntu-24.04, windows-latest) should pass + +## Follow-up + +When CapSoftware/scap publishes a new release on crates.io, replace the git dep with a normal version pin. Tracked in: + +## Refs + +- Spec: docs/superpowers/specs/2026-04-11-security-audit-followups-design.md (Topic 3) + +🤖 Generated with [Claude Code](https://claude.com/claude-code) +EOF +)" +``` + +- [ ] **Step 8: Verify cross-platform CI** + +The PR will trigger Build jobs on macos-latest, macos-15-intel, ubuntu-24.04, and windows-latest. All must pass before merging — this is the cross-platform regression check. + +If macOS or Windows fails: the upstream fix may have only addressed Linux. Don't merge. Either: +- Open the upstream issue for the macOS/Windows breakage +- Restore the Detair fork temporarily (`git revert HEAD`) +- Switch the plan to Task 3 (open our own narrow upstream PR) + +If all 4 build targets pass: merge. + +- [ ] **Step 9: Merge** + +```bash +gh pr merge --squash --delete-branch +``` + +--- + +## Task 3: Open a narrow upstream PR (only if Task 1 Step 5b applied) + +**Files:** +- None in this repo. The work is in the `CapSoftware/scap` repo. + +This task involves contributing back to a third-party repo. **Get explicit user authorization before opening the PR** — submitting code to an external repo as a collaborator is a different category of action than internal repo work. + +- [ ] **Step 1: Confirm the diff to upstream** + +Run: + +```bash +gh api repos/Detair/scap/compare/main...fix/linux-frame-enum --jq '.files[] | {filename, additions, deletions}' +``` + +Expected: a small set of files (likely `src/capturer/engine/linux/mod.rs` and `src/capturer/engine/win/mod.rs`). Confirm the scope is narrow — if the fork has accumulated extra changes that aren't part of the targeted fix, you may need to extract just the linux/win fixes into a smaller patch. + +- [ ] **Step 2: Read the patch contents** + +Run: `gh api repos/Detair/scap/compare/main...fix/linux-frame-enum --jq '.files[] | .patch'` + +Verify the changes are exactly the Linux Frame enum adaptation (Frame::Video wrapper) and Windows typo fix. Nothing else. + +- [ ] **Step 3: Check for an existing similar upstream PR** + +Run: `gh api 'repos/CapSoftware/scap/pulls?state=all' --jq '.[] | select(.title | test("linux|frame|pipewire"; "i")) | {number, title, state}'` + +The known existing PR is #178 ("fix/feat: SystemTime instead of timestamp"). It's broader than our fix. There may also be others — review them. + +If there's already a narrow PR with the same content as ours: don't open a duplicate. Subscribe to the existing PR and update our Cargo.toml comment to reference it instead. + +If there's no narrow PR: proceed to Step 4. + +- [ ] **Step 4: ASK USER FOR EXPLICIT AUTHORIZATION** + +Before opening the upstream PR, surface the plan to the user: + +> "Topic 3 Step 5b applies — upstream main still doesn't build for vc-client on Linux. I want to open a narrow PR against CapSoftware/scap with the Linux Frame enum fix and Windows typo fix from our Detair/scap fork. Authorize?" + +Wait for explicit yes. Do not proceed without it. + +- [ ] **Step 5: Open the upstream PR** + +Once authorized: + +1. Fork CapSoftware/scap on GitHub (if not already forked under your account). +2. Create a branch on the fork with just the targeted commits from `Detair/scap fix/linux-frame-enum`. +3. Open the PR from the fork to `CapSoftware/scap:main`. +4. PR title: "fix(linux): adapt pipewire backend to Frame::Video enum restructure" +5. PR body: explain the breakage, the fix, and reference the original commit `4d1304e9` from Detair/scap. + +(Specific gh commands omitted because they depend on which fork account is used.) + +- [ ] **Step 6: Update our Cargo.toml comment with the new PR reference** + +Back in this repo, on a small docs branch: + +```bash +git checkout -b docs/scap-upstream-pr-tracking +``` + +Edit `client/src-tauri/Cargo.toml` to add the new upstream PR number alongside #178 in the existing comment: + +```toml +# Upstream PRs that would fix this: +# https://github.com/CapSoftware/scap/pull/178 (broader, open since 2025-10-26) +# https://github.com/CapSoftware/scap/pull/ (narrow, opened YYYY-MM-DD) +``` + +Commit and PR: + +```bash +git add client/src-tauri/Cargo.toml +git commit -m "docs(client): track narrow scap upstream PR alongside #178" +git push -u origin docs/scap-upstream-pr-tracking +gh pr create --title "docs(client): track narrow scap upstream PR" --body "Tracks the narrow upstream PR opened to fix the Linux pipewire backend regression in scap. Once it merges, run Topic 3 Task 1 again to switch to upstream main." +``` + +- [ ] **Step 7: Done (for now)** + +The actual upstream merge is out of our control. When it happens, re-run Task 1 of this plan to switch to upstream. + +--- + +## Done criteria + +**Task 2 path (upstream now builds):** +- [x] `client/src-tauri/Cargo.toml` no longer references `Detair/scap` +- [x] All 4 cross-platform Build jobs pass on the PR +- [x] PR merged to main + +**Task 3 path (upstream still broken):** +- [x] No code change in main branch (Detair fork still pinned) +- [x] User authorization obtained for upstream PR +- [x] Narrow upstream PR opened to CapSoftware/scap +- [x] Our Cargo.toml comment updated to reference the new PR (small docs PR) +- [x] Tracking documented for future re-check diff --git a/docs/superpowers/specs/2026-04-11-security-audit-followups-design.md b/docs/superpowers/specs/2026-04-11-security-audit-followups-design.md new file mode 100644 index 00000000..64ded3f7 --- /dev/null +++ b/docs/superpowers/specs/2026-04-11-security-audit-followups-design.md @@ -0,0 +1,463 @@ +# Security Audit Follow-Ups + +**Date:** 2026-04-11 +**Status:** Approved +**Goal:** Address the four remaining items from the post-#512 security audit (PR #513 deferred them) as four independent sub-projects. + +## Context + +The codebase consistency refactor (#512) was followed by a security audit that produced 4 distinct findings beyond what PR #513 fixed: + +1. **17 client devtime advisories** — transitive `bun audit` findings in build tooling (vite, picomatch, rollup, lodash-es, etc.) +2. **Second advisory source missing** — only `cargo deny` checks Rust advisories; no equivalent for npm beyond `bun audit` running locally +3. **`scap` git fork dependency** — pinned to `Detair/scap@4d1304e9` because upstream `scap 0.1.0-beta.1` shipped a Linux-breaking Frame enum restructure +4. **CI flaky test #900/900** — Rust integration tests intermittently time out at the `slow-timeout = 120s` threshold; three different tests have hit this (`test_globally_banned_user_cannot_join_via_discovery`, `test_custom_status_with_expiry_persists`, `test_channel_limit`) + +## Approach + +The 4 topics are truly independent — different files, different risks, different external dependencies. Treat them as 4 sub-projects: one design doc (this), four implementation plans, four PRs. + +Topic 4 (CI flake) is the highest-impact and gets the most design depth — it's actively blocking work via admin-merges. Topics 1-3 are lower-stakes and get tighter treatment. Implementation order is whatever the user prefers; nothing here blocks anything else. + +--- + +## Topic 1: Devtime advisories + +**Branch:** `fix/client-devtime-advisories` + +### Background + +`bun audit` reports 17 vulnerabilities in client transitive dependencies. None affect production runtime — the vulnerable packages are dev-server tooling (vite 7.x), build tools (rollup, picomatch), and lint plumbing (flatted via eslint). One transitive (`lodash-es` via `mermaid → dagre-d3-es`) is technically runtime, but only reachable if mermaid is loaded, which depends on whether mermaid is lazy-loaded in the client. + +PR #513 attempted to fix these via `package.json` overrides. The override approach worked locally on the main branch but produced divergent `node_modules` when applied in a git worktree (missing `esbuild`, `@ampproject/remapping`, and ~38 other packages), which broke `vite-plugin-solid`'s JSX transformation in 4 test files. The hypothesis is that bun's worktree resolution behavior is the bug, not the overrides themselves. + +### Plan A — retry overrides on a regular branch + +Create the fix branch directly off main without using `git worktree add`. Apply the same `package.json` overrides as PR #513: + +```json +"overrides": { + "picomatch": "^4.0.4", + "rollup": "^4.60.1", + "flatted": "^3.4.2", + "brace-expansion": "^1.1.13", + "lodash-es": "^4.17.24", + "defu": "^6.1.5", + "vite": "8.0.5" +} +``` + +Run the full client gate and require all four to pass before commit: + +```sh +cd client +rm -rf node_modules +bun install +bun audit # must report 0 vulnerabilities +bun run test:run # must pass 577/577 +bun run build # must succeed +``` + +If all four pass: commit, push, open PR, ship. + +### Plan B — fallback if overrides still misbehave + +If `bun install` produces a divergent `node_modules` even on a regular branch, fall back to lockstep direct dep updates: + +- Update `vite` to a version where the dev-server fixes are present (>=8.0.5, but verify no JSX regression — vite 8.0.8 broke `vite-plugin-solid`) +- Update `vitest` and `vite-plugin-solid` to versions that don't pin old vite +- Update `unocss` to a version whose `unconfig` dep uses patched `defu` +- Update `mermaid` to a version whose `dagre-d3-es` uses patched `lodash-es` (or accept the risk) +- Run the same four gates + +### Side investigation — mermaid lazy-loading + +Independent of A/B: check whether `mermaid` is statically imported or lazy-loaded. The relevant question is "does `import('mermaid')` happen at app startup or only when a markdown code block has `language-mermaid`?" + +If statically imported: the `lodash-es` runtime exposure is real (every page load executes mermaid's code). Propose a follow-up to lazy-load mermaid via dynamic `import()`. + +If lazy-loaded: runtime exposure is negligible (only triggered by user-supplied content that intentionally renders a diagram). No additional action needed. + +### Acceptance boundary for documented exceptions (Plan B only) + +If Plan B can't reach 0 vulnerabilities, only these exceptions are acceptable: + +- `lodash-es` via `mermaid › dagre-d3-es` — **only** if mermaid is confirmed lazy-loaded (side investigation result is "yes"). The runtime exposure in that case is `_.template` code injection only when a markdown code block invokes mermaid rendering, with content that has already passed dompurify sanitization. Acceptable risk. +- Anything else in the dev-server / build-tool category (vite, rollup, picomatch, etc.) — **only** if the implementer has documented in the PR description: (1) why the upstream chain hasn't moved, (2) what the dev-time exploitation path looks like, (3) how a developer would notice an exploit. If any of these can't be answered, the exception isn't acceptable and Plan B fails — escalate. + +Any exception must be added to a SECURITY.md or in-code comment with a tracking link to the upstream issue (so future audits can re-check whether the exception is still needed). + +### Success criteria + +- `bun audit` reports 0 vulnerabilities (Plan A) or ≤2 documented exceptions per the boundary above (Plan B) +- `bun run test:run` passes 577/577 +- `bun run build` succeeds +- Mermaid load mode is documented in the PR description + +--- + +## Topic 2: osv-scanner CI job + +**Branch:** `feat/osv-scanner-ci` + +### Background + +The original audit recommendation suggested installing `cargo-audit` alongside `cargo-deny` for "two independent advisory sources." This is wrong: both tools read the same RustSec advisory database. A real second source needs to come from a different database. + +The cleanest second source is **OSV** (osv.dev) — an open-source vulnerability database maintained by Google that aggregates from multiple ecosystems (RustSec, npm, PyPI, etc.) and is independent from GitHub Advisory Database (GHSA) and RustSec. + +GitHub Dependabot was rejected in favor of osv-scanner because the user wants a non-PR-based approach (no automated dependency PRs cluttering the queue). + +### Plan + +Add a new CI job that runs `osv-scanner` against the workspace. + +**File:** `.github/workflows/osv-scan.yml` (or extend existing CI workflow) + +**Job shape:** + +```yaml +name: OSV Scanner + +on: + push: + branches: [main] + pull_request: + schedule: + - cron: "0 6 * * 1" # Mondays 06:00 UTC + +permissions: + contents: read + security-events: write # for SARIF upload + +jobs: + osv-scan: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Run osv-scanner + uses: google/osv-scanner-action/osv-scanner-action@v2 + with: + scan-args: |- + --recursive + --skip-git + --format=sarif + --output=osv-results.sarif + ./ + # Do NOT set continue-on-error: we want the job to fail when the + # scanner finds vulnerabilities at or above the configured severity. + # The scanner exit code is the gate; SARIF upload below is for + # historical tracking in GitHub Security tab, not for gating. + - name: Upload SARIF + if: always() # Run even if scanner failed, so the SARIF lands in Security tab + uses: github/codeql-action/upload-sarif@v3 + with: + sarif_file: osv-results.sarif +``` + +**Failure semantics (be explicit):** +- The osv-scanner exit code is the gate. Non-zero exit fails the CI job, blocks the PR. +- SARIF upload runs on `if: always()` so failed scans still publish results to the Security tab for tracking — but the upload itself does not gate. +- No `continue-on-error` anywhere in this job. If the scanner says fail, we fail. +- Severity threshold: scan-args should pin a threshold once the implementer has tested both modes; the plan phase should choose between (a) exit non-zero on any vulnerability of any severity, or (b) exit non-zero only on HIGH+. Recommendation is (b) HIGH+ for the PR gate so MEDIUM/LOW finds become tracked-but-non-blocking via the Security tab. + +**Behavior:** +- Runs on every PR and every push to main +- Weekly cron catches newly-published advisories on quiet branches +- SARIF output uploads to GitHub Code Scanning for centralized tracking + +**Why not extend `cargo deny`:** osv-scanner uses a different DB and supports both Rust and JS in one pass. cargo-deny only covers Rust. + +### Success criteria + +- New CI job runs on PRs and main +- Scheduled weekly runs visible in Actions UI +- Job reports 0 HIGH+ vulnerabilities on main (after Topic 1 lands) +- SARIF results visible in GitHub Security tab + +--- + +## Topic 3: scap fork cleanup + +**Branch:** `chore/scap-upstream-recheck` (only if upstream is now buildable; otherwise no branch — just upstream PR work) + +### Background + +`client/src-tauri/Cargo.toml` pins `scap` to `Detair/scap@fix/linux-frame-enum` (commit `4d1304e9`). The fork exists because upstream `scap 0.1.0-beta.1` shipped a Frame enum restructure (`Frame::Video(VideoFrame::*)`) without updating the Linux pipewire backend, breaking the Linux build. Detair's fork adapts the Linux backend plus a Windows typo fix. + +Upstream PR `CapSoftware/scap#178` is open since 2025-10-26 with a broader scope (sequence numbers + SystemTime + latest-buffer grab) but has not been merged. + +The audit finding documented this in PR #513. This topic is about actually dropping the git dep. + +### Plan + +**Step 1: Test if upstream main HEAD now builds as a `vc-client` dependency on Linux.** + +The right test is "does upstream main work AS A DEP of vc-client", not "does upstream main compile standalone." The previous breakage was a Frame enum change in the `scap` API surface that only manifests when a downstream consumer (vc-client) tries to use the API. A standalone `cargo check` on scap won't catch this. + +```sh +# Get upstream main HEAD SHA +gh api repos/CapSoftware/scap/branches/main --jq '.commit.sha' + +# Edit client/src-tauri/Cargo.toml temporarily: +# scap = { git = "https://github.com/CapSoftware/scap.git", rev = "" } + +# Build vc-client against upstream +cargo check -p vc-client +# (Run on Linux specifically — that's where the previous break happened. +# CI builds across all 3 OS, but the fast local check is Linux.) +``` + +If `cargo check -p vc-client` succeeds, the change is real. Run the full `cargo build -p vc-client` to confirm linking, then proceed to Step 2a. + +**Step 2a: If it builds (cleanup possible).** + +Update `client/src-tauri/Cargo.toml`: + +```toml +# Before: +scap = { git = "https://github.com/Detair/scap.git", branch = "fix/linux-frame-enum" } + +# After: +scap = { git = "https://github.com/CapSoftware/scap.git", rev = "" } +``` + +Update the doc comment to remove the "Detair fork" rationale and replace it with "pinned to upstream main pending next release." Drop the upstream PR #178 reference. Run the full Tauri build to verify. Commit, push, open PR, ship. + +**Step 2b: If it doesn't build (no cleanup possible yet).** + +No change to `main` branch. Instead: + +1. Open a narrow upstream PR to `CapSoftware/scap` containing just Detair's targeted fix (linux/mod.rs Frame enum adaptation + win/mod.rs typo). Reference our use case. +2. Update our `Cargo.toml` doc comment to point at the new PR number alongside #178. +3. Wait. When the new PR or #178 merges and ships, repeat Step 1. + +### Success criteria + +- Either `Cargo.toml` no longer references `Detair/scap` (best case, Step 2a) **or** there is a tracked upstream PR with our targeted fix submitted (acceptable case, Step 2b) +- Tauri client still builds on Linux/macOS/Windows (cross-platform regression check on the new dep) + +--- + +## Topic 4: CI #900/900 flake — `CleanupGuard` redesign + +**Branch:** `fix/cleanup-guard-test-flake` + +### Background + +`server/tests/integration/helpers/mod.rs:163-185` defines `Drop for CleanupGuard`, which spawns a fresh OS thread, builds a new single-threaded tokio runtime, blocks on cleanup actions, and joins the thread with **no timeout**. When cleanup hangs (DB pool contention, lock wait, slow DELETE), the join blocks the test process indefinitely. nextest's `slow-timeout = 120s` then kills the test, reporting it as the "last test" because nextest reports completion in finish-time order. + +Three different tests have hit this in recent CI: + +| Test | First seen | File | +|---|---|---| +| `test_globally_banned_user_cannot_join_via_discovery` | PR #512 first run | `guild_limits.rs` | +| `test_custom_status_with_expiry_persists` | PR #513 first run | `custom_status.rs` | +| `test_channel_limit` | PR #513 rerun | `guild_limits.rs` | + +All three use `CleanupGuard`. Closed issue #509 documents the exact same anti-pattern in setup tests, with the same root cause analysis. The fix recommendation in #509 was to make cleanup synchronous via explicit `await` rather than relying on Drop. + +### Layer 1: Stopgap — bounded join in `CleanupGuard::drop` + +**File:** `server/tests/integration/helpers/mod.rs:163-185` + +Replace the unconditional `.join()` with a 30-second polling loop. If cleanup completes within 30s, join normally. Otherwise, log to stderr and return — the cleanup thread continues detached, but the test no longer hangs the test process. + +```rust +impl Drop for CleanupGuard { + fn drop(&mut self) { + let actions = std::mem::take(&mut self.actions); + if actions.is_empty() { + return; + } + + let pool = self.pool.clone(); + let handle = std::thread::spawn(move || { + let runtime = tokio::runtime::Builder::new_current_thread() + .enable_all() + .build() + .expect("Failed to create cleanup runtime"); + runtime.block_on(async move { + for action in actions { + action(pool.clone()).await; + } + }); + }); + + // Bounded join: poll for up to 30s, then detach if still running. + // Prevents flaky cleanup hangs from triggering nextest's 120s slow-timeout. + // Removed once explicit-cleanup migration completes (Layer 2). + let timeout = std::time::Duration::from_secs(30); + let start = std::time::Instant::now(); + loop { + if handle.is_finished() { + let _ = handle.join(); + return; + } + if start.elapsed() > timeout { + eprintln!( + "warning: CleanupGuard did not complete within {timeout:?}, \ + detaching thread to allow test to finish" + ); + return; + } + std::thread::sleep(std::time::Duration::from_millis(50)); + } + } +} +``` + +**Effort:** 1 file, ~15 lines. Self-contained. + +**Risk:** Detached threads continue accessing the DB pool until the test process exits. Cleanup actions are idempotent (`DELETE … WHERE id = …`, `UPDATE … WHERE key = …`), so concurrent execution is safe. + +### Layer 2: Proper fix — explicit cleanup with runtime enforcement + +Convert `CleanupGuard` to require explicit `cleanup().await` instead of relying on Drop. The enforcement strategy is **runtime**, not compile-time, because Rust's `#[must_use]` attribute on structs only fires when the value is unused at the call site — it does NOT catch a guard that is bound to a variable and then dropped at end-of-scope (which is exactly the test pattern). A binding like `let mut guard = app.cleanup_guard();` counts as "used" for `#[must_use]` purposes even if `cleanup()` is never called. + +**Enforcement approach: iterative migration via runtime feedback, then panic.** + +1. Layer 1's Drop fallback already prints a warning to stderr when actions remain at drop time. This is the migration signal. +2. Add `cleanup(self)` method (consumes `self`, runs actions, leaves Drop with empty actions vec). +3. Migrate test files iteratively. Each migration commit removes the Drop fallback warnings from a specific test file. +4. After all migrations complete and CI shows zero "did not complete within" warnings AND zero "dropped with N pending actions" warnings for ~2 weeks of normal main branch activity, **swap eprintln to panic in a separate follow-up commit**. From that point on, any test that forgets `cleanup().await` fails immediately. + +The migration is detected via runtime instrumentation, not compile-time enforcement. This is slower but actually works. + +**File:** Same `helpers/mod.rs`, plus every test file using `CleanupGuard`. + +**Change to `CleanupGuard` (Layer 2 commit, on top of Layer 1):** + +```rust +pub struct CleanupGuard { + pool: PgPool, + actions: Vec, +} + +impl CleanupGuard { + // ... existing constructors and add() methods unchanged ... + + /// Run all registered cleanup actions on the test's existing tokio runtime. + /// Tests MUST call this at the end of the test body. Forgetting it triggers + /// a runtime warning from the Drop fallback (during migration), then a + /// panic (after migration completes). + pub async fn cleanup(mut self) { + let actions = std::mem::take(&mut self.actions); + for action in actions { + action(self.pool.clone()).await; + } + // Drop is now a no-op for this guard (actions is empty). + } +} + +// Drop fallback (from Layer 1) is updated to also detect "dropped with +// pending actions" — that's the migration signal: +impl Drop for CleanupGuard { + fn drop(&mut self) { + let actions = std::mem::take(&mut self.actions); + if actions.is_empty() { + return; // Test called cleanup().await — happy path + } + + // Test forgot cleanup().await — this is a bug, but during migration + // we still try to clean up so we don't leak DB rows. + eprintln!( + "warning: CleanupGuard dropped with {} pending actions in test — \ + test forgot to call .cleanup().await. Falling back to thread+runtime spawn.", + actions.len() + ); + + // ... Layer 1's bounded-join cleanup runs here ... + } +} +``` + +**Migration pattern:** + +```rust +// Before: +let mut guard = app.cleanup_guard(); +guard.add(...); +// ... test body ... +} // drop runs here, prints warning, runs cleanup via thread+runtime + +// After: +let mut guard = app.cleanup_guard(); +guard.add(...); +// ... test body ... +guard.cleanup().await; +} // drop runs here, sees empty actions, no-op +``` + +**Migration mechanics:** + +1. Run the full test suite with the Layer 2 changes in place. Capture stderr. +2. Grep stderr for "dropped with N pending actions" — this lists every unmigrated test. +3. Iterate by file. Pick `guild_limits.rs` first (2 of 3 known-flaky tests live here), add `guard.cleanup().await` before each test function returns, re-run `cargo nextest run -p vc-server --test integration guild_limits`, verify the warning is gone. +4. Repeat for `custom_status.rs` (the third known-flaky test). +5. Repeat for remaining files until no warnings remain. +6. **After ~2 weeks of clean main branch CI**, ship a follow-up commit that swaps the eprintln in Drop for `panic!("CleanupGuard dropped with pending actions — call .cleanup().await")`. This makes the contract permanent. + +**Why not compile-time enforcement?** + +Rust's `#[must_use]` on a struct fires when the value is unused at the call site (e.g., `app.cleanup_guard();` with no binding). It does NOT fire when the struct is bound to a variable and goes out of scope. Since every test in this codebase binds the guard (`let mut guard = ...`), `#[must_use]` would produce zero warnings for unmigrated tests. A custom clippy lint could catch the pattern, but writing one is more work than the runtime approach. + +**Estimated scope:** ~50 test functions across ~20 files. Each touch is 1 line. Total LOC change: ~50. + +### Layer 3: Root cause investigation — deferred + +Not in this PR. Tracked as a follow-up issue. Once Layer 2 lands and CI is green for ~2 weeks with zero "did not complete within" warnings, the question of "why does cleanup occasionally take >30s" becomes a quality concern, not a correctness issue. + +Investigation hooks to add when needed: + +- Per-action timing logs in `cleanup()` to identify which DELETE is slow +- Snapshot `pg_stat_activity` and `pg_locks` when cleanup exceeds 5s +- Check if `delete_user` cascades through FKs that contend with other tests +- Review pool sizing under parallel test load (`max_connections = 20` may be tight at 4 nextest workers × 5 connections per test) + +This is intentionally vague — if we keep getting "did not complete" warnings post-migration, that's the signal to start investigating. If we don't, no work is needed. + +### Single-PR shape + +| Commit | Layer | Files | Estimated LOC | +|---|---|---|---| +| 1 | Stopgap (bounded join in Drop) | `helpers/mod.rs` | ~15 | +| 2 | Add `cleanup(self)` method, update Drop fallback to warn on pending actions | `helpers/mod.rs` | ~20 | +| 3 | Migrate `guild_limits.rs` (priority — 2 known-flaky tests) | `guild_limits.rs` | ~10 | +| 4 | Migrate `custom_status.rs` (priority — 1 known-flaky test) | `custom_status.rs` | ~3 | +| 5..N | Migrate remaining test files (one commit per file or small group) | various | ~50 total | + +**CI verification gates per commit:** + +- `cargo clippy --tests -p vc-server -- -D warnings` clean +- Each migration commit: `cargo nextest run -p vc-server --test integration ` passes +- Final commit: full nextest run (lib + integration) passes 3 consecutive times + +**Stress-test recommendation:** After all commits, run locally: + +```sh +for i in {1..20}; do cargo nextest run -p vc-server --test integration; done +``` + +If any single run hits a "did not complete within" warning, pause and investigate before merging. If no warnings appear in 20 runs, the fix is solid. + +### Success criteria + +- CI Rust Tests passes 3 consecutive runs on the PR without timeouts +- Local stress test (20 runs) produces zero "did not complete within" warnings +- `cargo nextest run -p vc-server --test integration` produces zero "dropped with N pending actions" warnings (i.e. every test calls `.cleanup().await` before returning) +- The 3 known-flaky tests can be re-run individually 10× without timing out +- Follow-up tracked: a separate issue is filed (referenced from this PR description) to swap the eprintln in Drop for `panic!()` after ~2 weeks of clean main-branch CI + +--- + +## Implementation order + +The 4 topics are independent. Recommended order based on impact: + +1. **Topic 4 (CI flake)** — highest impact, blocks every PR's CI +2. **Topic 1 (devtime advisories)** — clears `bun audit` to 0, unblocks adding bun audit to CI +3. **Topic 2 (osv-scanner CI)** — depends on Topic 1 being green; otherwise the new job fails day 1 +4. **Topic 3 (scap fork)** — lowest impact, can be done anytime + +Each topic gets its own implementation plan and PR.