fix: PR-review findings cleanup (security, cost, UX guards) #50

Merged
sharonds merged 20 commits into main from fix/pr-review-cleanup on Apr 22, 2026

@sharonds
Owner

Summary

Addresses 8 findings from GitHub PR reviews (Copilot + Codex) on PRs #43, #46, #47, validated against main at commit 92c605a and now e7a1fb7.

| # | Severity | Finding | Commit |
|---|----------|---------|--------|
| 1 | 🔴 Critical | Deep Audit POST has no CSRF/loopback guard | 6285ee3 |
| 2 | 🟠 High | Estimator overcharges premium tier by $1.46/check | 36e3d27 |
| 3 | 🟠 High | Gemini maxOutputTokens forces 8192 for small requests (up to 16× cost) | b503314 |
| 4 | 🟠 High | formatShortDate("unknown") throws RangeError | b77fb40 |
| 5 | 🟡 Medium | API key + interaction id not URL-encoded in Gemini endpoints | bb572e2 |
| 6 | 🟡 Medium | Provider registry costPerCheckUsd mismatches tier pricing | 656e372 |
| 7 | 🟡 Medium | README AI Detection cost + default-enabled flags wrong | 64a7624 |
|   | CI | Browser E2E lane not in CI | 5d06536 |

Each fix is preceded by a failing regression test commit (TDD). A code-reviewer sub-agent sweep ran at both Gate 1 (after Critical+High) and Gate 2 (after Medium); both came back clean.

Finding 1 — Deep Audit POST guard (Codex P1 on #46)

This mutation route starts paid Deep Research jobs but never calls guardLocalMutation, unlike other write endpoints.

Added guardLocalMutation(request) as the first line of the POST handler in dashboard/src/app/api/reports/[id]/deep-audit/route.ts. Mirrors the pattern used in providers/route.ts, skills/route.ts, config/route.ts, etc. Unit test uses a spoofed-Host NextRequest; integration test fires a real POST and asserts 403 when CSRF is missing or wrong.
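
As an illustration of the kind of check such a guard performs, here is a minimal sketch that rejects non-loopback Host headers and missing or wrong CSRF tokens. The helper name `checkLocalMutation`, the `x-csrf-token` header name, and the loopback host list are all hypothetical; the real guardLocalMutation may use different headers and rules.

```typescript
// Hypothetical stand-in for a guard like guardLocalMutation. Header names and
// the allowed-host list are assumptions, not the repository's actual values.
type HeaderBag = Record<string, string | undefined>;

const LOOPBACK_HOSTS: string[] = ["localhost", "127.0.0.1", "[::1]"];

// Strip an optional ":port" suffix, keeping bracketed IPv6 literals intact.
function hostnameOf(hostHeader: string): string {
  if (hostHeader.charAt(0) === "[") {
    const end = hostHeader.indexOf("]");
    return end === -1 ? hostHeader : hostHeader.slice(0, end + 1);
  }
  const colon = hostHeader.lastIndexOf(":");
  return colon === -1 ? hostHeader : hostHeader.slice(0, colon);
}

// Returns an HTTP status: 200 when the mutation may proceed, 403 otherwise.
function checkLocalMutation(headers: HeaderBag, expectedToken: string): number {
  const host = headers["host"];
  if (!host || LOOPBACK_HOSTS.indexOf(hostnameOf(host).toLowerCase()) === -1) {
    return 403; // spoofed or non-loopback Host
  }
  const token = headers["x-csrf-token"]; // header name is an assumption
  if (!token || token !== expectedToken) {
    return 403; // CSRF token missing or wrong
  }
  return 200;
}
```

Calling the check as the first statement of a handler means a rejected request never reaches the code that starts a paid job.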

Finding 2 — Premium tier estimator (Copilot + Codex P2 on #46)

estimateRunCost() treated factCheckTier="premium" as a $1.50 synchronous cost, but runtime routes premium → basic sync + async Deep Audit. Coerced configuredTier === "premium" → "basic" for the sync estimate; estimateFactCheckCost("premium") is preserved as a pure helper for Deep Audit pricing display (--estimate-cost CLI still shows the $1.50 row). Mirrored in dashboard/src/lib/estimator.ts.
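
The coercion can be sketched roughly as follows. The price table is illustrative: the standard and premium figures are the tier totals quoted in this PR, while the basic figure is only inferred from the $1.46 overcharge ($1.50 minus $1.46). `estimateSyncFactCheckCost` is a hypothetical name, and the signature of estimateFactCheckCost is assumed.

```typescript
type FactCheckTier = "basic" | "standard" | "premium";

// Illustrative prices. The basic value is an inference, not a confirmed constant.
const TIER_PRICE_USD: Record<FactCheckTier, number> = {
  basic: 0.04,
  standard: 0.16,
  premium: 1.5,
};

// Pure helper: still reports the premium price, e.g. for Deep Audit display.
function estimateFactCheckCost(tier: FactCheckTier): number {
  return TIER_PRICE_USD[tier];
}

// Synchronous estimate: premium runs as basic sync plus an async Deep Audit,
// so only the basic price counts toward the up-front cost.
function estimateSyncFactCheckCost(configuredTier: FactCheckTier): number {
  const syncTier: FactCheckTier =
    configuredTier === "premium" ? "basic" : configuredTier;
  return estimateFactCheckCost(syncTier);
}
```

Keeping the pure helper unchanged lets pricing displays show the premium row while the run estimate reflects what the synchronous path actually spends.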

Finding 3 — Gemini maxOutputTokens cap (Copilot on #46)

maxOutputTokens is set to Math.max(maxTokens, 8192), which ignores the caller's requested maxTokens for any value < 8192.

Changed to Math.min(maxTokens, 8192). No 1024 floor — factcheck.ts:171 intentionally calls with 512.
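
A small sketch of the difference between the two expressions. The function and constant names are hypothetical; only the 8192 limit and the 512-token call come from this PR.

```typescript
const GEMINI_OUTPUT_CAP = 8192; // hypothetical constant name

// Old behavior: Math.max acts as a floor, so small requests are inflated.
function outputTokensBefore(maxTokens: number): number {
  return Math.max(maxTokens, GEMINI_OUTPUT_CAP);
}

// New behavior: Math.min acts as a ceiling, so small requests pass through.
function outputTokensAfter(maxTokens: number): number {
  return Math.min(maxTokens, GEMINI_OUTPUT_CAP);
}

const before512 = outputTokensBefore(512); // 8192
const after512 = outputTokensAfter(512); // 512
const afterLarge = outputTokensAfter(20000); // 8192
const inflation = before512 / after512; // 16
```

The 16× figure in the findings table is the ratio for the 512-token caller: 8192 / 512.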

Finding 4 — Invalid date handling (Codex P2 on #47)

Guard on Number.isNaN(date.getTime()) in formatDate; returns "". Badge in ClaimDrillDown is omitted when the fallback triggers (no empty badge).
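
A minimal sketch of this guard, assuming the formatter is built on Intl.DateTimeFormat (whose format method throws RangeError for an invalid date). The format options and the `badgeLabel` helper are assumptions for illustration, not the dashboard's actual code.

```typescript
// Sketch of the guard; the "short" format options below are assumptions.
function formatShortDate(input: string): string {
  const date = new Date(input);
  // An unparseable input yields an Invalid Date whose time value is NaN.
  // Formatting such a date would throw, so return the empty fallback first.
  if (Number.isNaN(date.getTime())) return "";
  return new Intl.DateTimeFormat("en-US", {
    year: "numeric",
    month: "short",
    day: "numeric",
    timeZone: "UTC",
  }).format(date);
}

// Hypothetical caller-side helper: null means "render no badge at all".
function badgeLabel(publishedDate: string | undefined): string | null {
  const formatted = publishedDate ? formatShortDate(publishedDate) : "";
  return formatted ? formatted : null;
}
```

Returning an empty string rather than a placeholder lets the caller treat the result as falsy and omit the badge entirely.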

Finding 5 — URL encoding

All Gemini interaction + generateContent call sites now use encodeURIComponent on api key and interaction id: src/utils/interactions-api.ts, src/skills/llm.ts, src/skills/factcheck-grounded.ts.
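
To show why the encoding matters, here is a sketch with placeholder URLs. `BASE_URL`, the `/interactions/` path, and both builder function names are hypothetical; only the use of encodeURIComponent on the key and the interaction id reflects the change described above.

```typescript
// Placeholder base URL and paths, not the project's real endpoints.
const BASE_URL = "https://api.example.invalid/v1";

function buildGenerateContentUrl(model: string, apiKey: string): string {
  return `${BASE_URL}/models/${model}:generateContent?key=${encodeURIComponent(apiKey)}`;
}

function buildInteractionUrl(interactionId: string, apiKey: string): string {
  // Encoding the id keeps "/" and "?" from changing the path or query.
  // Encoding the key keeps "&" and "=" from injecting extra parameters.
  const id = encodeURIComponent(interactionId);
  const key = encodeURIComponent(apiKey);
  return `${BASE_URL}/interactions/${id}?key=${key}`;
}
```

With these helpers, an id such as `abc/../x?y` stays a single path segment instead of rewriting the request path.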

Finding 6 — Provider registry cost alignment

  • gemini-grounded: costPerCheckUsd: 0.01 → 0.04, label $0.04/claim (≈ $0.16 per 4-claim article; matches standard tier)
  • gemini-deep-research: costPerCheckUsd: 0.05 → 0.375, label $0.375/claim (≈ $1.50 per 4-claim article; matches premium tier)

Sanity-checked all 4 estimator configs produce the expected tier prices.
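
The per-article arithmetic behind the two labels can be checked with a short sketch. The table and function names are hypothetical; the per-claim prices and the 4-claim article size are the ones stated above.

```typescript
const CLAIMS_PER_ARTICLE = 4; // article size used in the labels above

// Per-claim prices after this change, keyed by provider id.
const COST_PER_CHECK_USD: Record<string, number> = {
  "gemini-grounded": 0.04,
  "gemini-deep-research": 0.375,
};

// Round to cents so floating-point noise cannot hide a real mismatch.
function perArticleCostUsd(providerId: string): number {
  const perClaim = COST_PER_CHECK_USD[providerId];
  if (perClaim === undefined) {
    throw new Error(`unknown provider: ${providerId}`);
  }
  return Math.round(perClaim * CLAIMS_PER_ARTICLE * 100) / 100;
}

const standardTotal = perArticleCostUsd("gemini-grounded"); // 0.16
const premiumTotal = perArticleCostUsd("gemini-deep-research"); // 1.5
```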

Finding 7 — README drift

  • AI Detection cost: $0.09 → $0.03 (matches AI_DETECTION_COST = 0.03)
  • Grammar / Academic / Self-Plagiarism correctly marked as disabled by default (matches DEFAULT_SKILLS).
  • Mirrored in docs/features.md.

CI

Added browser E2E lane (bun run test:e2e:browser) to .github/workflows/ci.yml with a 5-min timeout.

Test plan

  • bun run test — 343 pass, 0 fail (was 336 baseline; +7 new regression cases)
  • bun run test:dashboard — 92 pass, 0 fail (was 88 baseline; +4 new regression cases)
  • bun run test:e2e:browser — 18 pass, 0 fail (was 16 baseline; +2 CSRF regression tests)
  • cd dashboard && bun run build — green
  • Gate A: all 5 failing-oracle tests fail for the right reasons before fix commits
  • Gate 1: pr-review-toolkit:code-reviewer sub-agent — clean (Tasks 1-4)
  • Gate 2: pr-review-toolkit:code-reviewer sub-agent — clean (full branch)

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 22, 2026 18:02

Copilot AI left a comment

Pull request overview

This PR addresses previously reported review findings across the CLI, dashboard, docs, and CI by hardening localhost/CSRF protections for Deep Audit, correcting cost estimation and pricing metadata, reducing Gemini request cost risk, and adding regression coverage (unit + E2E) plus a CI lane to keep these guarantees enforced.

Changes:

  • Enforce loopback + CSRF guard on the dashboard Deep Audit POST route and add unit/E2E regressions.
  • Fix cost-related issues (premium tier sync estimator behavior, Gemini maxOutputTokens cap logic, provider pricing metadata alignment).
  • Improve robustness/UX around invalid dates, URL-encode Gemini identifiers, refresh docs, and add browser E2E lane to CI.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

| File | Description |
|------|-------------|
| tests/e2e/browser/ui-deep-audit-csrf.test.ts | Browser E2E regressions ensuring Deep Audit POST rejects missing/wrong CSRF. |
| src/utils/interactions-api.ts | URL-encode Gemini interaction IDs and API keys in requests. |
| src/utils/interactions-api.test.ts | Adds regression tests for URL-encoding behavior. |
| src/skills/llm.ts | URL-encode Gemini API key and cap maxOutputTokens correctly. |
| src/skills/llm.test.ts | Adds/updates tests for token budgeting behavior. |
| src/skills/factcheck-grounded.ts | Align per-claim cost default and URL-encode Gemini API key. |
| src/skills/factcheck-grounded.test.ts | Updates expected telemetry cost to match new pricing. |
| src/providers/registry.ts | Adjusts Gemini grounded/deep-research pricing metadata to match tier totals. |
| src/cost/estimator.ts | Prevent premium tier from being estimated as synchronous Deep Audit cost. |
| src/cost/estimator.test.ts | Adds regression coverage for premium tier estimator behavior. |
| docs/features.md | Updates skills table costs/default flags to match shipped constants/config. |
| dashboard/src/lib/providers.ts | Mirrors provider pricing metadata changes in the dashboard registry. |
| dashboard/src/lib/format.ts | Prevents invalid date inputs from throwing by returning empty string. |
| dashboard/src/lib/estimator.ts | Mirrors premium-tier sync estimator logic in dashboard. |
| dashboard/src/components/ClaimDrillDown.tsx | Omits published-date badge when date formatting falls back to empty string. |
| dashboard/src/app/api/reports/[id]/deep-audit/route.ts | Adds guardLocalMutation to Deep Audit POST to enforce loopback+CSRF. |
| dashboard/src/tests/format.test.ts | Regression tests for invalid date handling. |
| dashboard/src/tests/api/deep-audit-guard.test.ts | Unit tests asserting Deep Audit POST is guarded (host spoof + CSRF cases). |
| README.md | Updates skills table costs/default-enabled flags to match behavior/constants. |
| CHANGELOG.md | Documents the security/cost/robustness fixes and CI addition. |
| .github/workflows/ci.yml | Adds browser E2E lane to CI (including agent-browser install). |

Comment on lines 229 to 237
  const startedAt = Date.now();
  const emitAttempt = (payload: Record<string, unknown>) =>
    emitGroundedCallEvent({
      provider: "gemini-grounded",
      model,
      claimPreview: claim.slice(0, 160),
      retriesLeft,
-     costUsd: 0.01,
+     costUsd: 0.04,
      ...payload,

Copilot AI Apr 22, 2026

costUsd in emitGroundedCallEvent is still hard-coded (0.04). This can drift from resolved.metadata.costPerCheckUsd if registry pricing changes again; consider passing the per-claim cost into fetchGroundedAssessment and emitting that value instead.

Comment on lines +46 to +49
    } finally {
      await handle.stop();
      await new Promise((r) => setTimeout(r, 200));
      temp.cleanup();

Copilot AI Apr 22, 2026

The post-stop delay here is 200ms, but other dashboard E2E tests use 500ms explicitly to allow Next’s dev lock to release before the next boot. Using a shorter delay risks flaky E2E runs; consider matching the 500ms delay (or centralizing this in a shared helper).

Comment on lines +71 to +74
    } finally {
      await handle.stop();
      await new Promise((r) => setTimeout(r, 200));
      temp.cleanup();

Copilot AI Apr 22, 2026

Same as above: using a 200ms post-stop delay may be too short to reliably release Next’s dev lock between tests. Align with the 500ms delay used elsewhere (or centralize teardown in a helper) to avoid intermittent CI flakes.

Comment thread .github/workflows/ci.yml
Comment on lines +38 to +43
    - name: Install agent-browser
      run: npm install -g agent-browser && agent-browser install

    - name: Run browser E2E lane
      run: bun run test:e2e:browser
      timeout-minutes: 5

Copilot AI Apr 22, 2026

CI installs agent-browser globally without a version pin. This can make CI non-reproducible (sudden upstream releases breaking the lane) and increases supply-chain risk. Prefer pinning the version (e.g. install agent-browser@<known-good>), or add it as a devDependency and run it via a lockfile-managed runner (bunx/npx).

Comment on lines 41 to 44
    {s.publishedDate && formatShortDate(s.publishedDate) && (
      <Badge variant="secondary" className="ml-2">
        {formatShortDate(s.publishedDate)}
      </Badge>

Copilot AI Apr 22, 2026

formatShortDate(s.publishedDate) is invoked twice (once in the conditional and again for rendering). Store the formatted value in a local variable so it’s computed once and used consistently.

Resolve README skills table conflict: keep 'disabled by default' framing
from #50 for Grammar/Academic/Self-Plagiarism (matches DEFAULT_SKILLS) but
update Academic engine label to OpenAlex (default) / Semantic Scholar
(legacy) from #49. Bump version 1.3.0 → 1.3.1 and move CHANGELOG entry
to a dated 1.3.1 section.
@sharonds merged commit 33c709a into main on Apr 22, 2026
4 checks passed
@sharonds deleted the fix/pr-review-cleanup branch on May 6, 2026 15:05