
skills(running-in-ci): forbid ScheduleWakeup and fire-and-forget background bash in CI#593

Closed
tend-agent wants to merge 1 commit into main from hourly/review-26355490223

@tend-agent

Collaborator

Problem

A tend-mention handle job on PRQL/prql (run 26347739838) ran for the full GitHub Actions 6-hour cap (00:43:51Z → 06:44:16Z, cancelled) after completing its actual work in ~20 minutes. Session log 7ffd00a0-8c0c-4282-b99b-d8ade07ad34b.jsonl ends at 01:04:44Z — the runner sat idle for ~5h40m doing no further work before GitHub Actions killed it.

The bot's last logged action was a ScheduleWakeup call:

{
  "delaySeconds": 270,
  "reason": "Fallback in case task notification is missed; primary signal is background completion notification.",
  "prompt": "Continue: check if `task prqlc:pull-request` finished, finalize merge cleanup commit, and push to PR #5741."
}

The bot had backgrounded task prqlc:pull-request with run_in_background: true and called ScheduleWakeup as a fallback for the bash-completion notification. In CI:

  • ScheduleWakeup is a /loop dynamic-mode tool — its description says so explicitly ("Schedule when to resume work in /loop dynamic mode"). Single-shot CI runs are not in /loop mode, so the wakeup never fires.
  • The completion notification for the backgrounded Bash command can only be delivered into an active turn. Once the bot ended its turn, no further turn ever started, so the notification was never received.

Net effect: the bot's final comment (PR #5741, comment 4526992996) promised follow-up work it never delivered, and the job burned ~5h40m of GitHub Actions runtime doing nothing.
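
The timeline figures above can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the three timestamps are the ones quoted in this description, and it assumes all of them fall on the same UTC day.

```python
from datetime import datetime

def parse_utc_time(ts: str) -> datetime:
    """Parse an HH:MM:SSZ timestamp (same-day assumption, date ignored)."""
    return datetime.strptime(ts, "%H:%M:%SZ")

job_start = parse_utc_time("00:43:51Z")   # job start, from the run
last_log = parse_utc_time("01:04:44Z")    # last entry in the session log
cancelled = parse_utc_time("06:44:16Z")   # cancelled at the Actions cap

active = last_log - job_start   # time spent doing real work
idle = cancelled - last_log     # time spent waiting for a turn that never came
total = cancelled - job_start   # wall-clock duration of the job

print(f"active={active} idle={idle} total={total}")
# prints "active=0:20:53 idle=5:39:32 total=6:00:25"
```

That matches the rounded figures used here: about 20 minutes of work, about 5h40m idle, and a total just over the 6-hour cap.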

Root cause

ScheduleWakeup and "background-bash-then-exit" both assume a future turn will fire. In single-shot CI, no such turn exists.

Fix

One bullet added to Restrictions in plugins/tend-ci-runner/skills/running-in-ci/SKILL.md. It tells the bot to do one of three things: block on long-running work synchronously, finish what it can without waiting, or hand off in its final response (e.g. a comment listing what's left) and exit.
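
As an illustration of the "block synchronously, otherwise hand off" pattern, here is a minimal Python sketch. It is not part of the skill or of this change; `run_blocking` and its return shape are hypothetical, and the only real API used is the standard library's `subprocess.run` with a timeout.

```python
import subprocess

def run_blocking(cmd: list[str], timeout_s: float) -> tuple[bool, str]:
    """Run cmd in the foreground and wait for it (hypothetical helper).

    Returns (finished, summary). Nothing is left running in the
    background, so no later turn is needed to collect the result.
    """
    label = " ".join(cmd)
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; report and hand off
        # in the final response instead of promising to come back later.
        return False, f"`{label}` did not finish within {timeout_s}s; handing off."
    if result.returncode == 0:
        return True, f"`{label}` succeeded."
    return True, f"`{label}` failed with exit code {result.returncode}."
```

A caller would pass something like `["task", "prqlc:pull-request"]` with a timeout comfortably below the job limit, then put the returned summary in its final comment whether or not the command finished.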

Gate assessment

  • Evidence level: High — 1 clear occurrence with a fully traceable decision chain (ScheduleWakeup call → session-log cutoff → 6h cap). The lost wall-clock time and broken promise to the maintainer are unambiguous.
  • Structural vs. stochastic: Structural. ScheduleWakeup is documented as /loop-only; in CI it has no fire mechanism. The same conditions (backgrounded long task + bot ends turn) will produce the same outcome every time.
  • Change type: Targeted fix (one bullet added to an existing list).
  • Gate 1: Pass — structural failures need 1 occurrence for a targeted fix.
  • Gate 2: Pass — targeted-fix evidence bar is "normal" (Gate 1 thresholds), and the change is small and proportionate.

Distinct from #572, which covers a deadlock in an unbounded `until` polling loop: same outcome (6h cap), different root cause and different fix.

Evidence log: https://gist.github.com/5686b1fca82c08d385d9031a79be4a3e

…ground bash in CI

ScheduleWakeup is a /loop dynamic-mode tool with no fire mechanism in
single-shot CI; calling it after backgrounding work leaves the action
idle until the 6h Actions cap. Same hazard for backgrounding a Bash
command then ending the turn — no follow-up turn arrives to receive the
completion notification.

Observed on PRQL/prql tend-mention run 26347739838: bot posted its plan
comment at 01:04:39Z, called ScheduleWakeup(270s) expecting to come back
after a `task prqlc:pull-request` background bash finished, then sat
idle until cancelled at 06:44:16Z.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@max-sixty

Owner

no! we need to change our criteria; this is structural and it's not critical

@tend-agent

Collaborator Author

Understood — closing this. The gate let it through because "structural" currently grants a 1-occurrence bypass regardless of evidence level. Filed #602 to tighten the criteria so structural classification raises recurrence confidence but doesn't override Gate 1 — only Critical findings act on a single occurrence.
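
A rough sketch of the difference between the two rules, as I understand them. The names (`Finding`, `gate1_current`, `gate1_proposed`) and the `MIN_OCCURRENCES` value are hypothetical and only illustrate the logic; the real thresholds live in the gate criteria, not here.

```python
from dataclasses import dataclass

# Hypothetical recurrence threshold for findings that get no bypass.
MIN_OCCURRENCES = 2

@dataclass(frozen=True)
class Finding:
    occurrences: int
    structural: bool
    critical: bool

def gate1_current(f: Finding) -> bool:
    """Current rule (assumed shape): structural findings pass on 1 occurrence."""
    required = 1 if f.structural else MIN_OCCURRENCES
    return f.occurrences >= required

def gate1_proposed(f: Finding) -> bool:
    """Proposed rule: only Critical findings may act on a single occurrence."""
    required = 1 if f.critical else MIN_OCCURRENCES
    return f.occurrences >= required

# This PR's finding: one occurrence, structural, not critical.
this_pr = Finding(occurrences=1, structural=True, critical=False)
print(gate1_current(this_pr), gate1_proposed(this_pr))  # prints "True False"
```

Under the proposed rule, "structural" would still be recorded and would raise recurrence confidence, but it would no longer lower the occurrence threshold on its own.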


Labels

claude-behavior Behavioral issues found by review-reviewers
