feat: Production readiness — monitoring, alerting, SLOs, runbooks, data retention, pentest, UAT by devin-ai-integration[bot] · Pull Request #3 · munisp/remitflow

devin-ai-integration · 2026-06-16T13:21:46Z

Summary

Adds the remaining 8 production readiness items: monitoring stack, alerting rules, SLO definitions, incident response runbooks, data retention policy, authenticated penetration testing, and UAT scenarios. 2,506 lines across 21 files.

New structure:

ops/
├── monitoring/
│   ├── docker-compose.monitoring.yml     # Prometheus + Grafana + Alertmanager stack
│   ├── prometheus/
│   │   ├── prometheus.yml                # Scrape config (API + Go + Rust + Python services)
│   │   └── alerts.yml                    # 20 alert rules in 5 groups
│   ├── alertmanager/alertmanager.yml     # PagerDuty/Opsgenie/Slack routing
│   └── grafana/
│       ├── dashboards/                   # Transfer Ops (14 panels) + Infra (11 panels)
│       └── provisioning/                 # Auto-load datasources + dashboards
├── runbooks/
│   ├── incident-response.md             # SEV1-4 classification + lifecycle
│   ├── ledger-imbalance.md              # Zero-tolerance TigerBeetle balance check
│   ├── stuck-transfers.md               # Decision tree + retry/compensate/failover
│   ├── rail-provider-down.md            # Backup rail matrix + failover commands
│   ├── low-success-rate.md              # Error code → action mapping
│   ├── slow-delivery.md                 # p95 latency investigation
│   └── sanctions-screening-down.md      # Regulatory hold procedure (mandatory)
├── slo/service-level-objectives.yml     # 12 SLOs with error budget policy
└── data-retention/data-retention-policy.yml  # GDPR/NDPR/POPIA/PDPA compliance
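
The zero-tolerance balance check that ledger-imbalance.md refers to is not shown in this PR description. As a rough illustration only, a double-entry ledger can be checked by confirming that posted debits equal posted credits across all accounts. The sketch below assumes a hypothetical `AccountBalance` snapshot with integer minor-unit amounts; it does not call the TigerBeetle client, and none of the names are taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountBalance:
    """Hypothetical snapshot of one ledger account, in minor currency units."""
    account_id: str
    debits_posted: int
    credits_posted: int


def ledger_imbalance(accounts: list[AccountBalance]) -> int:
    """Return total debits minus total credits; 0 means the ledger balances."""
    total_debits = sum(a.debits_posted for a in accounts)
    total_credits = sum(a.credits_posted for a in accounts)
    return total_debits - total_credits


def is_balanced(accounts: list[AccountBalance]) -> bool:
    # Zero tolerance: any non-zero difference counts as an imbalance.
    return ledger_imbalance(accounts) == 0


# Illustrative data: one 5,000-unit transfer debited and credited once.
accounts = [
    AccountBalance("customer-wallet", debits_posted=5_000, credits_posted=0),
    AccountBalance("payout-float", debits_posted=0, credits_posted=5_000),
]
balanced = is_balanced(accounts)
```

Integer amounts are used so that the comparison with zero is exact; a floating-point sum could report a spurious imbalance.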

Key design decisions:

- Alerting routes critical financial alerts (ledger imbalance) to a separate PagerDuty escalation with 0s group wait and 15min repeat, because money integrity is treated differently from latency and error alerts
- SLOs include an error budget policy: as the remaining budget falls from 75% to 25%, more work is allocated to reliability, and an exhausted budget means a production freeze
- Data retention distinguishes 8 categories with different periods, for example: transactions 7yr (CBN), SARs 10yr (FATF), sessions 24hr, analytics 3yr (anonymized)
- Sanctions screening runbook mandates a transfer hold (not a bypass), which is a regulatory requirement
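
The retention periods named in the design decisions above could drive an expiry check along these lines. This is a minimal sketch, not the contents of data-retention-policy.yml: the `RETENTION` table and `is_expired` helper are hypothetical, only the four categories quoted in this description are included, and a year is approximated as 365 days.

```python
from datetime import datetime, timedelta, timezone

# Periods quoted in the PR description; the other four categories are not listed there.
# Assumption: one year is treated as 365 days.
RETENTION = {
    "transactions": timedelta(days=7 * 365),   # 7yr (CBN)
    "sars": timedelta(days=10 * 365),          # 10yr (FATF)
    "sessions": timedelta(hours=24),           # 24hr
    "analytics": timedelta(days=3 * 365),      # 3yr (anonymized)
}


def is_expired(category: str, created_at: datetime, now: datetime) -> bool:
    """True once a record has outlived its category's retention period.

    An unknown category raises KeyError rather than guessing a period.
    """
    return now - created_at > RETENTION[category]


now = datetime(2026, 6, 16, tzinfo=timezone.utc)
old_session = is_expired("sessions", now - timedelta(hours=25), now)
recent_transfer = is_expired("transactions", now - timedelta(days=6 * 365), now)
```

Raising on an unknown category is a deliberate choice in this sketch: silently applying a default period to regulated data would be harder to notice than a failed job.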

QA additions:

- qa/security/pentest-authenticated.sh: BOLA, privilege escalation, rate limiting, SSRF, XSS, and financial limit bypass tests
- qa/uat/uat-scenarios.sh: 5 stakeholder journeys (diaspora worker, merchant, employer, DeFi user, agent)

Link to Devin session: https://app.devin.ai/sessions/64d054ae77da41e9a2b74d8593fa635c
Requested by: @munisp

…ta retention, pentest, UAT

Monitoring & Alerting:
- Grafana dashboards: Transfer Operations (14 panels) + Infrastructure (11 panels)
- Prometheus alerting: 20 rules across 5 groups (financial, SLA, infra, compliance, settlement)
- Alertmanager config: PagerDuty (critical), Opsgenie (warning), Slack (info)
- Docker Compose monitoring stack (Prometheus + Grafana + Alertmanager)

SLO Definitions:
- 12 SLOs: fund delivery 99.9%, API availability 99.95%, ledger integrity 100%
- Settlement latency targets per rail (M-Pesa 10s, NIBSS 30s, SEPA 4h, SWIFT 48h)
- Error budget policy with escalation levels (25%/50%/75%/100% consumed)

Incident Response:
- 6 runbooks: ledger imbalance, stuck transfers, rail provider down, slow delivery, low success rate, sanctions screening down
- Incident response procedure with severity classification (SEV1-4)
- On-call schedule template and communication templates

Data Retention:
- GDPR/NDPR/POPIA/PDPA compliant retention policy
- 8 data categories with specific retention periods and deletion procedures
- DSAR implementation (right to access, erasure, portability)
- Automated retention jobs (weekly anonymization, monthly archival)

QA Additions:
- Authenticated penetration test runner (BOLA, privilege escalation, rate limiting)
- UAT scenarios for 5 stakeholder journeys (diaspora worker, merchant, employer, DeFi, agent)

Co-Authored-By: Patrick Munis <pmunis@gmail.com>
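
The error budget escalation levels in the commit message (25%/50%/75%/100% consumed) can be made concrete with a small calculation. The following is a sketch under stated assumptions: the function names are hypothetical, the SLO target is passed as a percentage string so the arithmetic stays exact, and the actions attached to each level in service-level-objectives.yml are not reproduced because the description does not spell them out.

```python
from fractions import Fraction

# Escalation thresholds from the commit message, as percent of budget consumed.
THRESHOLDS = (100, 75, 50, 25)


def budget_consumed(slo_target_percent: str, total: int, failed: int) -> Fraction:
    """Fraction of the error budget consumed (1 means exhausted).

    slo_target_percent is a string such as "99.9" so it converts exactly.
    """
    allowed = (100 - Fraction(slo_target_percent)) / 100 * total
    if allowed == 0:
        # No budget at all (a 100% target, or no traffic): any failure exhausts it.
        return Fraction(1) if failed else Fraction(0)
    return failed / allowed


def escalation_level(consumed: Fraction) -> int:
    """Highest threshold crossed, in percent consumed (0 if below 25%)."""
    for threshold in THRESHOLDS:
        if consumed * 100 >= threshold:
            return threshold
    return 0


def freeze_required(consumed: Fraction) -> bool:
    # The PR description equates an exhausted budget with a production freeze.
    return escalation_level(consumed) == 100


# Illustrative numbers: a 99.9% target over 1,000,000 requests allows 1,000 failures.
consumed = budget_consumed("99.9", total=1_000_000, failed=600)
level = escalation_level(consumed)
```

With these illustrative numbers, 600 failures consume 60% of the budget, which crosses the 50% level but not the 75% one. For a 100% target such as ledger integrity, the sketch treats a single failure as an exhausted budget.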

devin-ai-integration · 2026-06-16T13:21:51Z

Original prompt from Patrick

https://drive.google.com/file/d/14K-94cZoOVgiYCUA-VympU-4_8IBqv2d/view?usp=sharing
extract the contents of the archive. List all the features of the platform

devin-ai-integration · 2026-06-16T13:21:52Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment, CI, and merge conflict monitoring

devin-ai-integration Bot assigned munisp Jun 16, 2026

munisp merged commit c7e8a7e into base Jun 16, 2026

munisp merged 1 commit into base from devin/1781614437-production-readiness
