
Analyze and optimize excessive API calls #327

Closed
gsonntag wants to merge 1 commit into hackutd:master from gsonntag:claude/optimize-api-performance-Yt1t0

@gsonntag

… judge load

Performance problems (p99 latency of 30,000 ms) were traced to three root causes, all fixed:

  1. Audit log DB write on every API call (REMOVED): Every endpoint called Logger.Logf(), which executed a synchronous majority-write MongoDB transaction (FindOneAndUpdate inside WithTransaction) before the HTTP response was sent. With 50+ judges judging concurrently, this serialised all requests through MongoDB. Logs are now memory-only: they are still visible via GET /admin/log for the current session, and historical entries are still loaded from the DB on startup.

  2. GetOptions DB read inside every judge transaction (CACHED): GetNextJudgeProject, JudgeFinish, JudgeRank, JudgeStar, JudgeSkip, GetDeliberationStatus, GetJudgingTimer, GetGroupInfo, CheckQRCode, and ~10 more handlers each called database.GetOptions(), a full MongoDB FindOne, on every request, many of them inside transactions. Options are now served from a server-side in-memory cache (sync.RWMutex + pointer) that is loaded at startup and invalidated on every admin write, so the hot judge path performs zero DB reads for options.

  3. Admin panel polling (6 API calls every 15 s replaced by 1): The admin dashboard polled /admin/stats, /admin/clock, /project/list, /judge/list, /admin/options, and /admin/flags separately every 15 seconds per browser session. A new GET /admin/dashboard endpoint serves all six payloads in one response, running the four DB-bound queries concurrently in goroutines; the clock and options come from memory.
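A minimal sketch of the memory-only logging described in item 1, assuming a simple mutex-guarded slice. The names MemLogger, LogEntry, NewMemLogger, and Entries are hypothetical and not taken from the codebase; the history argument stands in for the entries loaded from the DB at startup. Logf only appends in memory, so it adds no MongoDB round trip to the request path.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// LogEntry is a hypothetical audit-log record; the real schema is not shown in the PR.
type LogEntry struct {
	Time    time.Time
	Message string
}

// MemLogger keeps audit entries in memory only. Logf never touches the
// database, so it cannot add MongoDB latency to a request.
type MemLogger struct {
	mu      sync.Mutex
	entries []LogEntry
}

// NewMemLogger seeds the logger with historical entries (assumed to be
// loaded from the database once at startup; loading is not shown).
func NewMemLogger(history []LogEntry) *MemLogger {
	return &MemLogger{entries: append([]LogEntry(nil), history...)}
}

// Logf formats a message and appends it under the mutex.
func (l *MemLogger) Logf(format string, args ...any) {
	e := LogEntry{Time: time.Now(), Message: fmt.Sprintf(format, args...)}
	l.mu.Lock()
	l.entries = append(l.entries, e)
	l.mu.Unlock()
}

// Entries returns a copy, so a handler such as GET /admin/log can
// serialise it without holding the lock.
func (l *MemLogger) Entries() []LogEntry {
	l.mu.Lock()
	defer l.mu.Unlock()
	return append([]LogEntry(nil), l.entries...)
}

func main() {
	logger := NewMemLogger([]LogEntry{{Time: time.Now(), Message: "loaded from DB"}})
	logger.Logf("judge %d skipped project %s", 7, "A12")
	for _, e := range logger.Entries() {
		fmt.Println(e.Message)
	}
}
```

In this sketch, entries logged after startup are never written back to the database, which matches the PR's description of memory-only logs.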
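The options cache from item 2 can be sketched as a pointer guarded by a sync.RWMutex. Options, OptionsCache, and the load callback (standing in for database.GetOptions) are assumptions for illustration; the real handlers and option fields are not shown in the PR. Readers share the read lock, and only a cache miss or an invalidation takes the write lock.

```go
package main

import (
	"fmt"
	"sync"
)

// Options stands in for the app's options document; these fields are hypothetical.
type Options struct {
	Deliberation bool
	JudgingTimer int // seconds
}

// OptionsCache holds a pointer to the current options behind an RWMutex.
type OptionsCache struct {
	mu   sync.RWMutex
	opts *Options
	load func() (*Options, error) // e.g. a wrapper around database.GetOptions (assumed)
}

func NewOptionsCache(load func() (*Options, error)) *OptionsCache {
	return &OptionsCache{load: load}
}

// Get returns the cached options, hitting the loader only on a miss.
// The returned pointer is shared, so callers must treat it as read-only.
func (c *OptionsCache) Get() (*Options, error) {
	c.mu.RLock()
	o := c.opts
	c.mu.RUnlock()
	if o != nil {
		return o, nil
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.opts == nil { // another goroutine may have filled it meanwhile
		loaded, err := c.load()
		if err != nil {
			return nil, err
		}
		c.opts = loaded
	}
	return c.opts, nil
}

// Invalidate drops the cached pointer; call it after every admin write commits.
func (c *OptionsCache) Invalidate() {
	c.mu.Lock()
	c.opts = nil
	c.mu.Unlock()
}

func main() {
	loads := 0
	cache := NewOptionsCache(func() (*Options, error) {
		loads++ // runs under the write lock, so this counter is safe
		return &Options{JudgingTimer: 300}, nil
	})
	for i := 0; i < 3; i++ {
		cache.Get()
	}
	cache.Invalidate()
	cache.Get()
	fmt.Println("DB loads:", loads) // DB loads: 2
}
```

Invalidating (rather than eagerly reloading) keeps admin writes cheap; the next judge request pays for one FindOne, and every request after that is served from memory.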

Additional improvements:

  • 10-second TTL in-memory cache for the AggregateStats and AggregateScores aggregation pipelines, so concurrent admin sessions share one result.
  • Fixed a clock mutex deadlock in GetNextJudgeProject: the original code acquired Clock.Mutex and then, when judging was paused, returned early without releasing it, permanently blocking all subsequent clock operations.
  • Options reads that previously ran inside transactions now happen before the transaction, so deliberation/group checks can abort before a session is opened.

https://claude.ai/code/session_01W7yeH1ccUKkrr8SbKrVhmT

Description

[Describe the issue that this fixes or the feature that this adds]

Fixes #[Issue]

Type of Change

Delete options that do not apply:

  • Bug fix (change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor (code changes that don't affect functionality)
  • DevOps (changes to the build system/pipeline)
  • Documentation (works on updating the documentation)
  • Test (write test cases (huge w if u do this voluntarily))
  • Revert (oopsie, undo a commit or change)

Is this a breaking change?

  • Yes
  • No

@MichaelZhao21 (Contributor) left a comment

Can you fill in the PR template and split the changes into multiple commits, one for each distinct change made? The PR in this form is extremely hard to review.

@hackutd deleted two comments from vercel bot on Apr 14, 2026
@gsonntag closed this on Apr 14, 2026
@gsonntag (Author)

mb gng, didn't mean to create a PR into this repo

@MichaelZhao21 (Contributor)

LOL ur good, though if you want to create a PR with some of these changes, I think the performance improvements would definitely be helpful.
