fix: PR-review findings cleanup (security, cost, UX guards) by sharonds · Pull Request #50 · sharonds/checkapp

sharonds · 2026-04-22T18:02:03Z

Summary

Addresses 8 findings from GitHub PR reviews (Copilot + Codex) on PRs #43, #46, #47, validated against main at commit 92c605a and now e7a1fb7.

#	Severity	Finding	Commit
1	🔴 Critical	Deep Audit POST has no CSRF/loopback guard	`6285ee3`
2	🟠 High	Estimator overcharges premium tier by $1.46/check	`36e3d27`
3	🟠 High	Gemini `maxOutputTokens` forces 8192 for small requests (up to 16× cost)	`b503314`
4	🟠 High	`formatShortDate("unknown")` throws `RangeError`	`b77fb40`
5	🟡 Medium	API key + interaction id not URL-encoded in Gemini endpoints	`bb572e2`
6	🟡 Medium	Provider registry `costPerCheckUsd` mismatches tier pricing	`656e372`
7	🟡 Medium	README AI Detection cost + default-enabled flags wrong	`64a7624`
—	CI	Browser E2E lane not in CI	`5d06536`

Each fix is preceded by a failing regression test commit (TDD). A code-reviewer sub-agent sweep ran at both Gate 1 (after Critical+High) and Gate 2 (after Medium); both came back clean.

Finding 1 — Deep Audit POST guard (Codex P1 on #46)

This mutation route starts paid Deep Research jobs but never calls guardLocalMutation, unlike other write endpoints.

Added guardLocalMutation(request) as the first line of the POST handler in dashboard/src/app/api/reports/[id]/deep-audit/route.ts. Mirrors the pattern used in providers/route.ts, skills/route.ts, config/route.ts, etc. Unit test uses a spoofed-Host NextRequest; integration test fires a real POST and asserts 403 when CSRF is missing or wrong.
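For readers unfamiliar with the pattern, here is a minimal sketch of what a loopback + CSRF check of this kind might look like. It is not the repository's guardLocalMutation: the function name, the `x-csrf-token` header name, the plain-object headers argument, and the result shape are all assumptions made for illustration.

```typescript
// Hypothetical sketch only; the real guardLocalMutation may differ in every detail.
type GuardResult = { ok: true } | { ok: false; status: 403; reason: string };

// Host values treated as loopback (assumed list).
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

// Assumed header name for the CSRF token.
const CSRF_HEADER = "x-csrf-token";

function isLoopbackHost(hostHeader: string | undefined): boolean {
  if (!hostHeader) return false;
  // Strip an optional ":port" suffix; bracketed IPv6 literals stay intact.
  const host = hostHeader.replace(/:\d+$/, "").toLowerCase();
  return LOOPBACK_HOSTS.has(host);
}

function guardLocalMutationSketch(
  headers: Record<string, string | undefined>,
  expectedCsrfToken: string,
): GuardResult {
  // Reject requests whose Host header is not a loopback address (spoofed-Host case).
  if (!isLoopbackHost(headers["host"])) {
    return { ok: false, status: 403, reason: "non-loopback host" };
  }
  // Reject requests with a missing or wrong CSRF token.
  if (headers[CSRF_HEADER] !== expectedCsrfToken) {
    return { ok: false, status: 403, reason: "missing or wrong CSRF token" };
  }
  return { ok: true };
}
```

A handler following this pattern would call the guard before doing any paid work and return the 403 immediately when `ok` is false. A production guard would likely also use a constant-time token comparison, which this sketch omits.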

Finding 2 — Premium tier estimator (Copilot + Codex P2 on #46)

estimateRunCost() treated factCheckTier="premium" as a $1.50 synchronous cost, but at runtime premium routes to basic sync plus an async Deep Audit. The sync estimate now coerces configuredTier === "premium" to "basic". estimateFactCheckCost("premium") is preserved as a pure helper for Deep Audit pricing display (the --estimate-cost CLI still shows the $1.50 row). Mirrored in dashboard/src/lib/estimator.ts.
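The coercion can be sketched as follows. This is an illustration under stated assumptions, not the code in src/cost/estimator.ts: the tier union, the price table, and both function names are hypothetical. The $0.16 and $1.50 prices come from this PR; the $0.04 basic price is inferred from the $1.46 overcharge figure ($1.50 minus $1.46) rather than stated directly.

```typescript
// Hypothetical tier names and price table, for illustration only.
type FactCheckTier = "basic" | "standard" | "premium";

const TIER_COST_USD: Record<FactCheckTier, number> = {
  basic: 0.04, // inferred from the $1.46 overcharge, not stated in the PR
  standard: 0.16,
  premium: 1.5,
};

// Pure helper: still reports the full premium price, e.g. for Deep Audit display.
function estimateFactCheckCostSketch(tier: FactCheckTier): number {
  return TIER_COST_USD[tier];
}

// Synchronous estimate: premium runs the basic sync path, so it is priced as basic.
// The async Deep Audit is priced separately.
function estimateSyncFactCheckCostSketch(configuredTier: FactCheckTier): number {
  const syncTier: FactCheckTier = configuredTier === "premium" ? "basic" : configuredTier;
  return estimateFactCheckCostSketch(syncTier);
}
```

Keeping the premium price in the pure helper while coercing only in the sync path is what lets the display row and the run estimate disagree on purpose.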

Finding 3 — Gemini `maxOutputTokens` cap (Copilot on #46)

maxOutputTokens is set to Math.max(maxTokens, 8192), which ignores the caller's requested maxTokens for any value < 8192.

Changed to Math.min(maxTokens, 8192). No 1024 floor — factcheck.ts:171 intentionally calls with 512.
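The difference between the two expressions is easiest to see with the 512-token call mentioned above. A small sketch, where the function names and the constant name are illustrative and not taken from src/skills/llm.ts:

```typescript
// Cap value from the PR; the constant and function names are hypothetical.
const GEMINI_OUTPUT_TOKEN_CAP = 8192;

// Before: Math.max turns the cap into a floor, so small requests are inflated.
function buggyOutputBudget(maxTokens: number): number {
  return Math.max(maxTokens, GEMINI_OUTPUT_TOKEN_CAP);
}

// After: Math.min respects the caller's budget and only clamps large requests.
function fixedOutputBudget(maxTokens: number): number {
  return Math.min(maxTokens, GEMINI_OUTPUT_TOKEN_CAP);
}

console.log(buggyOutputBudget(512)); // 8192, i.e. 16x the requested budget
console.log(fixedOutputBudget(512)); // 512
console.log(fixedOutputBudget(20000)); // 8192
```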

Finding 4 — Invalid date handling (Codex P2 on #47)

Added a Number.isNaN(date.getTime()) guard in formatShortDate, which now returns "" for unparseable input instead of throwing. The badge in ClaimDrillDown is omitted when the fallback triggers, so no empty badge renders.
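A minimal sketch of such a guard is shown below. The function name and the Intl.DateTimeFormat options are assumptions; the actual formatter in dashboard/src/lib/format.ts may produce a different output format.

```typescript
// Hypothetical sketch; the real formatter's options and output may differ.
function formatShortDateSketch(input: string): string {
  const date = new Date(input);
  // new Date("unknown") yields an Invalid Date whose getTime() is NaN.
  // Passing it to Intl.DateTimeFormat.format would throw a RangeError.
  if (Number.isNaN(date.getTime())) return "";
  return new Intl.DateTimeFormat("en-US", {
    month: "short",
    day: "numeric",
    year: "numeric",
    timeZone: "UTC", // fixed zone so the result does not depend on the host
  }).format(date);
}
```

Because the fallback is an empty string, a caller can treat the result as falsy and skip rendering entirely.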

Finding 5 — URL encoding

All Gemini interaction + generateContent call sites now use encodeURIComponent on api key and interaction id: src/utils/interactions-api.ts, src/skills/llm.ts, src/skills/factcheck-grounded.ts.
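To illustrate why the encoding matters, the sketch below builds a URL from an interaction id and an API key. The base URL, the path layout, and the function name are placeholders chosen for the example, not the project's actual endpoints.

```typescript
// Hypothetical URL builder; base URL and path shape are placeholders.
function buildInteractionUrlSketch(
  baseUrl: string,
  interactionId: string,
  apiKey: string,
): string {
  // Encode both values so characters like "/", "?", "&" and "=" cannot
  // change the path or inject extra query parameters.
  const id = encodeURIComponent(interactionId);
  const key = encodeURIComponent(apiKey);
  return `${baseUrl}/interactions/${id}?key=${key}`;
}

console.log(buildInteractionUrlSketch("https://example.invalid/v1beta", "a/b?c", "k&x=1"));
// https://example.invalid/v1beta/interactions/a%2Fb%3Fc?key=k%26x%3D1
```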

Finding 6 — Provider registry cost alignment

gemini-grounded: costPerCheckUsd: 0.01 → 0.04, label $0.04/claim (≈ $0.16 per 4-claim article; matches standard tier)
gemini-deep-research: costPerCheckUsd: 0.05 → 0.375, label $0.375/claim (≈ $1.50 per 4-claim article; matches premium tier)

Sanity-checked all 4 estimator configs produce the expected tier prices.
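The per-claim figures above can be checked against the tier prices with a few lines of arithmetic. This sketch assumes the 4-claim article used in the labels; the constant and function names are illustrative only.

```typescript
// Claims per article assumed by the labels in this PR.
const CLAIMS_PER_ARTICLE = 4;

// Per-claim costs after the fix (values from the PR; object shape is hypothetical).
const COST_PER_CLAIM_USD = {
  "gemini-grounded": 0.04,
  "gemini-deep-research": 0.375,
};

function articleCostUsd(costPerClaimUsd: number): number {
  return costPerClaimUsd * CLAIMS_PER_ARTICLE;
}

console.log(articleCostUsd(COST_PER_CLAIM_USD["gemini-grounded"]).toFixed(2)); // 0.16 (standard tier)
console.log(articleCostUsd(COST_PER_CLAIM_USD["gemini-deep-research"]).toFixed(2)); // 1.50 (premium tier)
```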

Finding 7 — README drift

AI Detection cost: ~$0.09 → ~$0.03 (matches AI_DETECTION_COST = 0.03)
Grammar / Academic / Self-Plagiarism correctly marked as disabled by default (matches DEFAULT_SKILLS).
Mirrored in docs/features.md.

CI

Added browser E2E lane (bun run test:e2e:browser) to .github/workflows/ci.yml with a 5-min timeout.

Test plan

bun run test — 343 pass, 0 fail (was 336 baseline; +7 new regression cases)
bun run test:dashboard — 92 pass, 0 fail (was 88 baseline; +4 new regression cases)
bun run test:e2e:browser — 18 pass, 0 fail (was 16 baseline; +2 CSRF regression tests)
cd dashboard && bun run build — green
Gate A: all 5 failing-oracle tests fail for the right reasons before fix commits
Gate 1: pr-review-toolkit:code-reviewer sub-agent — clean (Tasks 1-4)
Gate 2: pr-review-toolkit:code-reviewer sub-agent — clean (full branch)

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR addresses previously reported review findings across the CLI, dashboard, docs, and CI by hardening localhost/CSRF protections for Deep Audit, correcting cost estimation and pricing metadata, reducing Gemini request cost risk, and adding regression coverage (unit + E2E) plus a CI lane to keep these guarantees enforced.

Changes:

Enforce loopback + CSRF guard on the dashboard Deep Audit POST route and add unit/E2E regressions.
Fix cost-related issues (premium tier sync estimator behavior, Gemini maxOutputTokens cap logic, provider pricing metadata alignment).
Improve robustness/UX around invalid dates, URL-encode Gemini identifiers, refresh docs, and add browser E2E lane to CI.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

File	Description
tests/e2e/browser/ui-deep-audit-csrf.test.ts	Browser E2E regressions ensuring Deep Audit POST rejects missing/wrong CSRF.
src/utils/interactions-api.ts	URL-encode Gemini interaction IDs and API keys in requests.
src/utils/interactions-api.test.ts	Adds regression tests for URL-encoding behavior.
src/skills/llm.ts	URL-encode Gemini API key and cap `maxOutputTokens` correctly.
src/skills/llm.test.ts	Adds/updates tests for token budgeting behavior.
src/skills/factcheck-grounded.ts	Align per-claim cost default and URL-encode Gemini API key.
src/skills/factcheck-grounded.test.ts	Updates expected telemetry cost to match new pricing.
src/providers/registry.ts	Adjusts Gemini grounded/deep-research pricing metadata to match tier totals.
src/cost/estimator.ts	Prevent premium tier from being estimated as synchronous Deep Audit cost.
src/cost/estimator.test.ts	Adds regression coverage for premium tier estimator behavior.
docs/features.md	Updates skills table costs/default flags to match shipped constants/config.
dashboard/src/lib/providers.ts	Mirrors provider pricing metadata changes in the dashboard registry.
dashboard/src/lib/format.ts	Prevents invalid date inputs from throwing by returning empty string.
dashboard/src/lib/estimator.ts	Mirrors premium-tier sync estimator logic in dashboard.
dashboard/src/components/ClaimDrillDown.tsx	Omits published-date badge when date formatting falls back to empty string.
dashboard/src/app/api/reports/[id]/deep-audit/route.ts	Adds `guardLocalMutation` to Deep Audit POST to enforce loopback+CSRF.
dashboard/src/tests/format.test.ts	Regression tests for invalid date handling.
dashboard/src/tests/api/deep-audit-guard.test.ts	Unit tests asserting Deep Audit POST is guarded (host spoof + CSRF cases).
README.md	Updates skills table costs/default-enabled flags to match behavior/constants.
CHANGELOG.md	Documents the security/cost/robustness fixes and CI addition.
.github/workflows/ci.yml	Adds browser E2E lane to CI (including agent-browser install).

Copilot · 2026-04-22T18:07:34Z

  const startedAt = Date.now();
  const emitAttempt = (payload: Record<string, unknown>) =>
    emitGroundedCallEvent({
      provider: "gemini-grounded",
      model,
      claimPreview: claim.slice(0, 160),
      retriesLeft,
-      costUsd: 0.01,
+      costUsd: 0.04,
      ...payload,


costUsd in emitGroundedCallEvent is still hard-coded (0.04). This can drift from resolved.metadata.costPerCheckUsd if registry pricing changes again; consider passing the per-claim cost into fetchGroundedAssessment and emitting that value instead.

Copilot · 2026-04-22T18:07:35Z

+    } finally {
+      await handle.stop();
+      await new Promise((r) => setTimeout(r, 200));
+      temp.cleanup();


The post-stop delay here is 200ms, but other dashboard E2E tests use 500ms explicitly to allow Next’s dev lock to release before the next boot. Using a shorter delay risks flaky E2E runs; consider matching the 500ms delay (or centralizing this in a shared helper).

Copilot · 2026-04-22T18:07:35Z

+    } finally {
+      await handle.stop();
+      await new Promise((r) => setTimeout(r, 200));
+      temp.cleanup();


Same as above: using a 200ms post-stop delay may be too short to reliably release Next’s dev lock between tests. Align with the 500ms delay used elsewhere (or centralize teardown in a helper) to avoid intermittent CI flakes.

Copilot · 2026-04-22T18:07:35Z

+      - name: Install agent-browser
+        run: npm install -g agent-browser && agent-browser install
+
+      - name: Run browser E2E lane
+        run: bun run test:e2e:browser
+        timeout-minutes: 5


CI installs agent-browser globally without a version pin. This can make CI non-reproducible (sudden upstream releases breaking the lane) and increases supply-chain risk. Prefer pinning the version (e.g. install agent-browser@<known-good>), or add it as a devDependency and run it via a lockfile-managed runner (bunx/npx).

Copilot · 2026-04-22T18:07:35Z

+              {s.publishedDate && formatShortDate(s.publishedDate) && (
                <Badge variant="secondary" className="ml-2">
                  {formatShortDate(s.publishedDate)}
                </Badge>


formatShortDate(s.publishedDate) is invoked twice (once in the conditional and again for rendering). Store the formatted value in a local variable so it’s computed once and used consistently.

Resolve README skills table conflict: keep 'disabled by default' framing from #50 for Grammar/Academic/Self-Plagiarism (matches DEFAULT_SKILLS) but update Academic engine label to OpenAlex (default) / Semantic Scholar (legacy) from #49. Bump version 1.3.0 → 1.3.1 and move CHANGELOG entry to a dated 1.3.1 section.

sharonds added 14 commits April 22, 2026 19:40

test(e2e+unit): failing CSRF + loopback regression for Deep Audit POST (#46 review)

4023d3a

test(cost): failing regression for premium-tier sync estimate (#46 review)

40c9d7b

test(llm): failing regression for Gemini maxOutputTokens clamp (#46 review)

99843e0

test(utils): failing regression for URL encoding in Gemini interactions API (#46 review)

c107668

test(dashboard-format): failing regression for invalid date input (#47 review)

4b48433

fix(dashboard): add CSRF + loopback guard to Deep Audit POST (#44 review-#46)

6285ee3

ci: run browser E2E lane so new UI regressions are caught (#46 review)

5d06536

fix(cost): premium tier uses basic sync cost; Deep Audit stays separate (#46 review)

36e3d27

fix(llm): Gemini maxOutputTokens respects caller budget, capped at 8192 (#46 review)

b503314

fix(dashboard): formatShortDate fallback on invalid date input (#47 review)

b77fb40

fix(gemini): URL-encode api key and interaction id in all Gemini endpoints (#46 review)

bb572e2

fix(providers): gemini tier costs are per-claim, aligned with 4-claim tier prices (#46 review)

656e372

docs(readme): align skills table with shipped defaults and costs (#43 review)

64a7624

docs(changelog): document PR-review-findings cleanup under [Unreleased]

c1488e9

Copilot AI review requested due to automatic review settings April 22, 2026 18:02

Copilot started reviewing on behalf of sharonds April 22, 2026 18:02 View session

Copilot AI reviewed Apr 22, 2026

sharonds added 6 commits April 22, 2026 20:13

fix(grounded): emit per-claim cost from registry (Copilot PR #50 review)

779c5b1

test(e2e): bump CSRF-test post-stop delay to 500ms (Copilot PR #50 review)

e319a30

ci: pin agent-browser to 0.26.0 (Copilot PR #50 review)

0fe7c44

refactor(dashboard): hoist formatShortDate result in ClaimDrillDown (Copilot PR #50 review)

19a9f89

test(e2e): poll for skills list before asserting body text (fixes CI flake)

792337f

sharonds merged commit 33c709a into main Apr 22, 2026
4 checks passed

sharonds deleted the fix/pr-review-cleanup branch May 6, 2026 15:05

sharonds merged 20 commits into main from fix/pr-review-cleanup
