fix: PR-review findings cleanup (security, cost, UX guards)#50
fix: PR-review findings cleanup (security, cost, UX guards)#50
Conversation
… tier prices (#46 review)
There was a problem hiding this comment.
Pull request overview
This PR addresses previously reported review findings across the CLI, dashboard, docs, and CI by hardening localhost/CSRF protections for Deep Audit, correcting cost estimation and pricing metadata, reducing Gemini request cost risk, and adding regression coverage (unit + E2E) plus a CI lane to keep these guarantees enforced.
Changes:
- Enforce loopback + CSRF guard on the dashboard Deep Audit POST route and add unit/E2E regressions.
- Fix cost-related issues (premium tier sync estimator behavior, Gemini
maxOutputTokenscap logic, provider pricing metadata alignment). - Improve robustness/UX around invalid dates, URL-encode Gemini identifiers, refresh docs, and add browser E2E lane to CI.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/e2e/browser/ui-deep-audit-csrf.test.ts | Browser E2E regressions ensuring Deep Audit POST rejects missing/wrong CSRF. |
| src/utils/interactions-api.ts | URL-encode Gemini interaction IDs and API keys in requests. |
| src/utils/interactions-api.test.ts | Adds regression tests for URL-encoding behavior. |
| src/skills/llm.ts | URL-encode Gemini API key and cap maxOutputTokens correctly. |
| src/skills/llm.test.ts | Adds/updates tests for token budgeting behavior. |
| src/skills/factcheck-grounded.ts | Align per-claim cost default and URL-encode Gemini API key. |
| src/skills/factcheck-grounded.test.ts | Updates expected telemetry cost to match new pricing. |
| src/providers/registry.ts | Adjusts Gemini grounded/deep-research pricing metadata to match tier totals. |
| src/cost/estimator.ts | Prevent premium tier from being estimated as synchronous Deep Audit cost. |
| src/cost/estimator.test.ts | Adds regression coverage for premium tier estimator behavior. |
| docs/features.md | Updates skills table costs/default flags to match shipped constants/config. |
| dashboard/src/lib/providers.ts | Mirrors provider pricing metadata changes in the dashboard registry. |
| dashboard/src/lib/format.ts | Prevents invalid date inputs from throwing by returning empty string. |
| dashboard/src/lib/estimator.ts | Mirrors premium-tier sync estimator logic in dashboard. |
| dashboard/src/components/ClaimDrillDown.tsx | Omits published-date badge when date formatting falls back to empty string. |
| dashboard/src/app/api/reports/[id]/deep-audit/route.ts | Adds guardLocalMutation to Deep Audit POST to enforce loopback+CSRF. |
| dashboard/src/tests/format.test.ts | Regression tests for invalid date handling. |
| dashboard/src/tests/api/deep-audit-guard.test.ts | Unit tests asserting Deep Audit POST is guarded (host spoof + CSRF cases). |
| README.md | Updates skills table costs/default-enabled flags to match behavior/constants. |
| CHANGELOG.md | Documents the security/cost/robustness fixes and CI addition. |
| .github/workflows/ci.yml | Adds browser E2E lane to CI (including agent-browser install). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const startedAt = Date.now(); | ||
| const emitAttempt = (payload: Record<string, unknown>) => | ||
| emitGroundedCallEvent({ | ||
| provider: "gemini-grounded", | ||
| model, | ||
| claimPreview: claim.slice(0, 160), | ||
| retriesLeft, | ||
| costUsd: 0.01, | ||
| costUsd: 0.04, | ||
| ...payload, |
There was a problem hiding this comment.
costUsd in emitGroundedCallEvent is still hard-coded (0.04). This can drift from resolved.metadata.costPerCheckUsd if registry pricing changes again; consider passing the per-claim cost into fetchGroundedAssessment and emitting that value instead.
| } finally { | ||
| await handle.stop(); | ||
| await new Promise((r) => setTimeout(r, 200)); | ||
| temp.cleanup(); |
There was a problem hiding this comment.
The post-stop delay here is 200ms, but other dashboard E2E tests use 500ms explicitly to allow Next’s dev lock to release before the next boot. Using a shorter delay risks flaky E2E runs; consider matching the 500ms delay (or centralizing this in a shared helper).
| } finally { | ||
| await handle.stop(); | ||
| await new Promise((r) => setTimeout(r, 200)); | ||
| temp.cleanup(); |
There was a problem hiding this comment.
Same as above: using a 200ms post-stop delay may be too short to reliably release Next’s dev lock between tests. Align with the 500ms delay used elsewhere (or centralize teardown in a helper) to avoid intermittent CI flakes.
| - name: Install agent-browser | ||
| run: npm install -g agent-browser && agent-browser install | ||
|
|
||
| - name: Run browser E2E lane | ||
| run: bun run test:e2e:browser | ||
| timeout-minutes: 5 |
There was a problem hiding this comment.
CI installs agent-browser globally without a version pin. This can make CI non-reproducible (sudden upstream releases breaking the lane) and increases supply-chain risk. Prefer pinning the version (e.g. install agent-browser@<known-good>), or add it as a devDependency and run it via a lockfile-managed runner (bunx/npx).
| {s.publishedDate && formatShortDate(s.publishedDate) && ( | ||
| <Badge variant="secondary" className="ml-2"> | ||
| {formatShortDate(s.publishedDate)} | ||
| </Badge> |
There was a problem hiding this comment.
formatShortDate(s.publishedDate) is invoked twice (once in the conditional and again for rendering). Store the formatted value in a local variable so it’s computed once and used consistently.
…Copilot PR #50 review)
Resolve README skills table conflict: keep 'disabled by default' framing from #50 for Grammar/Academic/Self-Plagiarism (matches DEFAULT_SKILLS) but update Academic engine label to OpenAlex (default) / Semantic Scholar (legacy) from #49. Bump version 1.3.0 → 1.3.1 and move CHANGELOG entry to a dated 1.3.1 section.
Summary
Addresses 8 findings from GitHub PR reviews (Copilot + Codex) on PRs #43, #46, #47, validated against
mainat commit92c605aand nowe7a1fb7.6285ee336e3d27maxOutputTokensforces 8192 for small requests (up to 16× cost)b503314formatShortDate("unknown")throwsRangeErrorb77fb40bb572e2costPerCheckUsdmismatches tier pricing656e37264a76245d06536Each fix is preceded by a failing regression test commit (TDD), plus a code-reviewer sub-agent sweep ran at both Gate 1 (after Critical+High) and Gate 2 (after Medium) — both came back clean.
Finding 1 — Deep Audit POST guard (Codex P1 on #46)
Added
guardLocalMutation(request)as the first line of the POST handler indashboard/src/app/api/reports/[id]/deep-audit/route.ts. Mirrors the pattern used inproviders/route.ts,skills/route.ts,config/route.ts, etc. Unit test uses a spoofed-HostNextRequest; integration test fires a real POST and asserts 403 when CSRF is missing or wrong.Finding 2 — Premium tier estimator (Copilot + Codex P2 on #46)
estimateRunCost()treatedfactCheckTier="premium"as a $1.50 synchronous cost, but runtime routes premium → basic sync + async Deep Audit. CoercedconfiguredTier === "premium" → "basic"for the sync estimate;estimateFactCheckCost("premium")is preserved as a pure helper for Deep Audit pricing display (--estimate-costCLI still shows the $1.50 row). Mirrored indashboard/src/lib/estimator.ts.Finding 3 — Gemini
maxOutputTokenscap (Copilot on #46)Changed to
Math.min(maxTokens, 8192). No 1024 floor —factcheck.ts:171intentionally calls with 512.Finding 4 — Invalid date handling (Codex P2 on #47)
Guard on
Number.isNaN(date.getTime())informatDate; returns"". Badge inClaimDrillDownis omitted when the fallback triggers (no empty badge).Finding 5 — URL encoding
All Gemini interaction + generateContent call sites now use
encodeURIComponenton api key and interaction id:src/utils/interactions-api.ts,src/skills/llm.ts,src/skills/factcheck-grounded.ts.Finding 6 — Provider registry cost alignment
gemini-grounded:costPerCheckUsd: 0.01 → 0.04, label$0.04/claim(≈ $0.16 per 4-claim article; matchesstandardtier)gemini-deep-research:costPerCheckUsd: 0.05 → 0.375, label$0.375/claim(≈ $1.50 per 4-claim article; matchespremiumtier)Sanity-checked all 4 estimator configs produce the expected tier prices.
Finding 7 — README drift
~$0.09→~$0.03(matchesAI_DETECTION_COST = 0.03)DEFAULT_SKILLS).docs/features.md.CI
Added browser E2E lane (
bun run test:e2e:browser) to.github/workflows/ci.ymlwith a 5-min timeout.Test plan
bun run test— 343 pass, 0 fail (was 336 baseline; +7 new regression cases)bun run test:dashboard— 92 pass, 0 fail (was 88 baseline; +4 new regression cases)bun run test:e2e:browser— 18 pass, 0 fail (was 16 baseline; +2 CSRF regression tests)cd dashboard && bun run build— green🤖 Generated with Claude Code