feat: add resilient background job retry & monitoring (#130) #914

Open
tarai-dl wants to merge 2 commits into rohitdash08:main from tarai-dl:rn/job-retry-monitoring

Summary

Implements resilient background job retry and monitoring as described in #130.

Changes

API Client (app/src/api/client.ts)

  • Added configurable retry with exponential backoff (default 3 retries, 500ms base, 10s max)
  • Added ±25% jitter to prevent thundering herd
  • Retryable HTTP statuses: 408, 429, 500, 502, 503, 504
  • Network errors (connection failures, timeouts) are automatically retried
  • Built-in onApiMetric() listener for real-time monitoring
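The backoff rules listed above can be sketched as follows. This is a minimal illustration, not the code from `app/src/api/client.ts`: the names `RetryConfig`, `DEFAULT_RETRY`, `isRetryableStatus`, and `computeBackoffMs` are hypothetical, and clamping to the maximum after applying jitter is an assumption. Only the defaults (3 retries, 500ms base, 10s max, ±25% jitter) and the status list come from the description.

```typescript
// Hypothetical shape; the PR's actual option names are not shown in the description.
interface RetryConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  jitterRatio: number; // 0.25 means ±25%
}

// Defaults taken from the PR summary.
const DEFAULT_RETRY: RetryConfig = {
  maxRetries: 3,
  baseDelayMs: 500,
  maxDelayMs: 10000,
  jitterRatio: 0.25,
};

// Statuses the summary lists as retryable.
const RETRYABLE_STATUSES: ReadonlySet<number> = new Set([408, 429, 500, 502, 503, 504]);

function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

// attempt is 0-based: 0 -> base, 1 -> 2 * base, 2 -> 4 * base, ...
// `random` is injectable so the jitter can be pinned in tests.
function computeBackoffMs(
  attempt: number,
  config: RetryConfig = DEFAULT_RETRY,
  random: () => number = Math.random,
): number {
  const exponential = config.baseDelayMs * Math.pow(2, attempt);
  // Map random() in [0, 1) to a factor in [1 - jitter, 1 + jitter).
  const factor = 1 + (random() * 2 - 1) * config.jitterRatio;
  // Assumption: clamp after jitter so the delay never exceeds maxDelayMs.
  return Math.min(config.maxDelayMs, Math.round(exponential * factor));
}
```

Jitter spreads out clients that failed at the same moment, so they do not all retry on the same tick. Passing a fixed `random` function makes the delay deterministic, which is one way a test such as `apiClient.test.ts` could check the computation.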

useRetry Hook (app/src/hooks/useRetry.ts)

  • React hook wrapping any async function with retry logic
  • Callbacks for onRetry, onFailure, onSuccess
  • Tracks loading, error, attempts state
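The retry loop that a hook like this wraps might look roughly like the framework-free sketch below. It is not the `useRetry` implementation: `withRetry`, `RetryCallbacks`, and the parameter order are hypothetical, and the React state (loading, error, attempts) is left out so the example runs without React. Only the three callback names come from the description.

```typescript
// Callback names follow the PR description; their signatures are assumptions.
interface RetryCallbacks<T> {
  onRetry?: (attempt: number, error: unknown) => void;
  onFailure?: (error: unknown) => void;
  onSuccess?: (result: T) => void;
}

// Runs fn up to maxRetries + 1 times, waiting delayMs(attempt) between tries.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  delayMs: (attempt: number) => number = () => 0,
  callbacks: RetryCallbacks<T> = {},
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    let result: T;
    try {
      result = await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        // Report the upcoming retry (1-based), then wait before trying again.
        callbacks.onRetry?.(attempt + 1, error);
        const wait = delayMs(attempt);
        await new Promise<void>((resolve) => setTimeout(resolve, wait));
      }
      continue;
    }
    // onSuccess runs outside the try block so an error thrown by the
    // callback is not mistaken for a failed attempt.
    callbacks.onSuccess?.(result);
    return result;
  }
  callbacks.onFailure?.(lastError);
  throw lastError;
}
```

A hook built on this loop would set its loading flag before the call, update its attempts counter from `onRetry`, and store the thrown error once the loop gives up.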

Job Monitor (app/src/components/jobs/JobMonitor.tsx)

  • Real-time dashboard showing job status summary (Pending/Processing/Sent/Failed/Retrying)
  • Click-to-filter status cards
  • Job list with status badges, attempt counts, error messages, next retry time
  • Manual retry button for failed jobs
  • Live API metrics panel showing request status, duration, and retry indicators
  • Auto-refreshes every 30 seconds
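A metrics panel like the one described needs a way to subscribe to events from the API client. The sketch below shows one common listener-registry pattern for that. The description names `onApiMetric()` but not its signature, so the `ApiMetric` fields, the unsubscribe return value, and the `emitApiMetric` helper are all assumptions made for illustration.

```typescript
// Hypothetical event shape covering what the panel displays:
// request status, duration, and whether the request was a retry.
interface ApiMetric {
  url: string;
  status: number | null; // null for network errors with no HTTP response
  durationMs: number;
  attempt: number;
  retried: boolean;
}

type ApiMetricListener = (metric: ApiMetric) => void;

const metricListeners = new Set<ApiMetricListener>();

// Registers a listener and returns a function that removes it again.
function onApiMetric(listener: ApiMetricListener): () => void {
  metricListeners.add(listener);
  return () => {
    metricListeners.delete(listener);
  };
}

// Hypothetical helper the client would call after each request attempt.
function emitApiMetric(metric: ApiMetric): void {
  metricListeners.forEach((listener) => {
    try {
      listener(metric);
    } catch (err) {
      // A faulty listener should not break the request path.
      console.error("metric listener failed", err);
    }
  });
}
```

Returning the unsubscribe function fits React's effect cleanup, so a component could subscribe on mount and stop listening on unmount.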

Job Tracking API (app/src/api/jobs.ts)

  • listJobs(), retryJob(), computeJobSummary()
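As an illustration of what a summary computation like `computeJobSummary()` could do, the sketch below counts jobs per status. The `Job` and `JobSummary` types and the lowercase status strings are assumptions derived from the status names shown in the Job Monitor section. The actual types in `app/src/api/jobs.ts` may differ.

```typescript
// Status names mirror the dashboard cards; the string values are assumed.
type JobStatus = "pending" | "processing" | "sent" | "failed" | "retrying";

// Hypothetical job record with the fields the job list displays.
interface Job {
  id: string;
  status: JobStatus;
  attempts: number;
  lastError?: string;
  nextRetryAt?: string; // ISO timestamp, if a retry is scheduled
}

type JobSummary = Record<JobStatus, number> & { total: number };

// Counts jobs by status in a single pass.
function computeJobSummary(jobs: readonly Job[]): JobSummary {
  const summary: JobSummary = {
    pending: 0,
    processing: 0,
    sent: 0,
    failed: 0,
    retrying: 0,
    total: 0,
  };
  for (const job of jobs) {
    summary[job.status] += 1;
    summary.total += 1;
  }
  return summary;
}
```

Starting every counter at zero means the status cards can render a count for each status even when no job currently has it.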

Tests

  • apiClient.test.ts — Tests for backoff computation and metric listeners
  • useRetry.test.ts — Tests for retry hook (success, retry, failure, reset)
  • jobs.test.ts — Tests for job summary computation

Documentation

  • Updated README with comprehensive section on the new retry & monitoring system

Acceptance Criteria

  • Production-ready implementation
  • Includes tests
  • Documentation updated

Closes #130

Implement a production-ready background job system:

- Redis-backed job queue with priority support
- Exponential backoff retry (configurable max retries, default 5)
- Dead-letter queue for permanently failed jobs
- Job status tracking in PostgreSQL
- Distributed locking to prevent duplicate execution
- APScheduler integration for automatic job processing
- REST API for job monitoring and management
- Prometheus metrics integration
- Tests for queue, worker, and retry logic

Closes rohitdash08#130

- Enhanced API client with exponential backoff retry logic and jitter
- Added useRetry hook for wrapping async functions with retry
- Created JobMonitor dashboard component with live API metrics
- Added jobs API for tracking background job delivery status
- Integrated JobMonitor into the Reminders page
- Added comprehensive tests for retry logic and job monitoring
- Updated README with documentation

Closes rohitdash08#130

tarai-dl requested a review from rohitdash08 as a code owner on April 18, 2026, 04:12

Development

Successfully merging this pull request may close this issue: Resilient background job retry & monitoring