Skip to content

fix(check-collections): report runner scan FAILED with logs on abort re-raise (SAS-13001)#2779

Open
m1n0 wants to merge 1 commit into
mainfrom
SAS-13001-scan-failure-logs
Open

fix(check-collections): report runner scan FAILED with logs on abort re-raise (SAS-13001)#2779
m1n0 wants to merge 1 commit into
mainfrom
SAS-13001-scan-failure-logs

Conversation

@m1n0

@m1n0 m1n0 commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

What & why

SAS-13001: a post-migration contract scan failed but produced no logs in Cloud, making it undiagnosable.

Root cause is a diagnosability hole in the engine. A single-contract (runner) scan runs execute_check_collections with abort_on_first_error=True, so the first exception during contract construction or verify re-raises immediately — before phase 3's combined upload, which is the only place that ships the engine's captured logs to Cloud and calls mark_scan_as_failed. The exception then reaches the CLI (exit 3) and the launcher only writes it to pod logs, so the Cloud scan record shows FAILED with an empty log payload. This turns any pre-upload crash into a silent, undiagnosable scan.

Fix

Before the abort_on_first_error re-raise, best-effort mark the still-PENDING runner scan FAILED with the captured logs (_report_runner_scan_failed_before_reraise). The shared Logs gatherer holds every construction/verify record (each impl is built with logs=logs), so its records are exactly what should reach Cloud. The historical re-raise contract is preserved and reporting never masks the original exception. No-op for ad-hoc runs (no scan id).

Tests

  • New test_uncaught_exception_during_verify_marks_scan_failed_with_logs: a scan that raises during verify now marks the scan FAILED with a non-empty log payload (fails on main, passes here).
  • Full soda-core unit suite: 978 passed, 2 skipped; pre-commit clean.

Note

This is the diagnosability fix. The underlying crash for M&S (most likely a migrated contract with an unsupported check type) is tracked separately — this change makes that (and any future pre-upload crash) self-diagnosing in Cloud.

🤖 Generated with Claude Code

…re-raise

A single-contract (runner) scan runs with abort_on_first_error=True, so the
first construction/verify exception re-raises out of execute_check_collections
before phase 3's combined upload — the only place that otherwise ships the
engine logs and marks the scan FAILED in Cloud. The exception then reaches the
CLI (exit 3) and the launcher only writes it to pod logs, leaving the Cloud
scan record FAILED with no logs and the failure undiagnosable.

Before the abort re-raise, best-effort mark the still-PENDING runner scan
FAILED with the captured logs (the shared Logs gatherer holds every
construction/verify record). The re-raise contract is preserved and reporting
never masks the original exception. No-op for ad-hoc runs (no scan id).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sonarqubecloud

sonarqubecloud Bot commented Jul 2, 2026

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant