skills(triage): drop "run full suite" — defer to PR CI per "ship before end_turn" by tend-agent · Pull Request #671 · max-sixty/tend

tend-agent · 2026-06-08T16:26:34Z

Problem

tend-triage's Step 6 "If fixing" tells the bot to "Run the full test suite and lints" before commit/push. That directly conflicts with running-in-ci's "End the turn only when work is shipped" (added in #661), which forbids backgrounding pre-push work the deliverable depends on.

Today on PRQL/prql, run 27149177431 (tend-triage on issue #5988, "Compiler crash when joining a zero-row relation"):

Bot reproduced the panic, applied the fix, added a snapshot test, ran it green — all locally.
Following Step 6.3 ("Run the full test suite and lints"), kicked off task prqlc:pull-request (~7 min suite) with run_in_background: true.
Used Monitor (timeout_ms: 420000) and ScheduleWakeup (delaySeconds: 180) to wait for the suite, then emitted end_turn.
Session terminated — no commit, no push, no PR, no comment on the issue. 14 minutes of agent time on a real user bug, zero maintainer-visible output.

Last assistant text: "I've scheduled a fallback check. Waiting for the test suite to finish."

This is the second occurrence of "agent calls ScheduleWakeup/Monitor in CI expecting resumption and the deliverable never ships" — the first was tend-mention run 26347739838 (May 24, fix attempt closed as #593, gate-tightening followed in #602, and the root-cause guidance shipped in #661 — but the triage skill never picked up the alignment). PRQL is still pinned to max-sixty/tend@0.1.2, predating #661, so the conflicting Step 6.3 won. Even once consumers bump past #661, an agent reading both skills will see Step 6.3's explicit instruction and a generic principle in running-in-ci; the explicit instruction tends to win.

Solution

Rewrite triage Step 6.3 so the two skills agree: the targeted reproduction test plus a clean compile is sufficient confidence; the comprehensive suite is PR CI's job. Renumber the following two steps (4 → 3, 5 → 4) since "Confirm the test passes" merges naturally into Step 2.

Gate assessment

Evidence level: High — 2 occurrences of "agent uses async-resume tool in CI then end_turn leaves deliverable unshipped" across tend-mention and tend-triage. Both fully diagnosed from session logs.
Structural vs stochastic: Structural. Two skills' instructions conflict on whether to run the comprehensive suite before push; the more specific (triage Step 6.3) wins. Same conditions reproduce reliably.
Change type: Removal / simplification — drop the conflicting "Run the full test suite and lints" line, fold "Confirm the test passes" into the same step, renumber. Per Gate 2, Low evidence bar (1 occurrence enough).
Both gates: Pass.

Evidence gist (full per-run history): https://gist.github.com/a017dd295de05dc44af4479748ffe636

Sources and notes

Today's triage session: PRQL/prql Run 27149177431, issue PRQL/prql#5988 (reporter kgutwin, non-maintainer).
Session JSONL tool sequence: Bash(task prqlc:pull-request, timeout 420000) → Bash(until... done, run_in_background: true, timeout 420000) → Monitor(timeout_ms: 420000, persistent: false) → Write(/tmp/pr-body.md) → Read(.../tasks/...output) (×3) → ScheduleWakeup(delaySeconds: 180, prompt: "Check the prqlc:pull-request test output ... and continue triage of issue 5988"). Stop reason: end_turn. No git commit / git push / gh pr create / gh issue comment in the session.
Prior tend-mention occurrence: PRQL/prql Run 26347739838, terminated at 6h cap; documented in the 2026-05 evidence gist under "Acted on (High evidence, structural): tend-mention 6h-cap idle after ScheduleWakeup in CI".
Earlier closed PR: skills(running-in-ci): forbid ScheduleWakeup and fire-and-forget background bash in CI #593 (forbid ScheduleWakeup and fire-and-forget background bash in CI) — closed at 1 occurrence pre-gate-tightening. Comment: "no! we need to change our criteria; this is structural and it's not critical". Gates were tightened in skills(review-gates): structural classification doesn't override Gate 1 #602, then skills(running-in-ci): end the turn only when work is shipped #661 added the more nuanced framing — this PR completes the picture by removing the triage-side conflict.
running-in-ci's rule (post-skills(running-in-ci): end the turn only when work is shipped #661): "A targeted compile plus the tests directly exercising the change is enough local confidence to ship — leave the comprehensive matrix to CI."

…re end_turn" Triage Step 6.3's "Run the full test suite and lints" conflicts with running-in-ci's "End the turn only when work is shipped" (#661): a long suite kicked off before push pushes the agent into mid-wait end_turn and the deliverable never ships. Fold the targeted-test check into Step 2 and renumber. Closes a 2-occurrence pattern (May tend-mention #5741, June tend-triage on PRQL/prql#5988) where the agent finished real work, called ScheduleWakeup expecting resumption, and exited without commit/push. Co-Authored-By: Claude <noreply@anthropic.com>

tend-agent

Same pattern elsewhere: plugins/tend-ci-runner/skills/nightly/SKILL.md line 281 still says branch, fix, run full test suite, commit, push, create PR, poll CI. By this PR's own argument — an explicit step-list instruction tends to override the generic running-in-ci rule — nightly remains vulnerable to the same mid-wait-end_turn failure mode, just one of the project's longer suites away from reproducing it. Happy to push a follow-up that drops run full test suite from that line; say the word.

Bumps the generator version to 0.1.4, syncs the lockfile, and adds the 0.1.4 CHANGELOG section (published as the GitHub Release notes on tag push). Notable changes since 0.1.3: - Claude harnesses switch to `bypassPermissions` (#677) - GitHub Releases publish from `CHANGELOG.md` on tag push (#678) — this release is the first live exercise of that workflow - Interactive sandbox receives GitHub Actions context env vars (#664) - install-tend bot-token step restructured around scope-audit remediation (#680) - Review skill checks the rollup before APPROVE (#667); running-in-ci/triage/ci-fix skill refinements (#661, #666, #669, #670, #671) > _This was written by Claude Code on behalf of max_ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

tend-agent added the claude-behavior Behavioral issues found by review-reviewers label Jun 8, 2026

tend-agent commented Jun 8, 2026

View reviewed changes

max-sixty merged commit e77bda8 into main Jun 9, 2026
7 checks passed

max-sixty deleted the hourly/review-27150920923 branch June 9, 2026 05:01

This was referenced Jun 9, 2026

skills(triage): commit/push/PR before the full-suite gate — a session that ends mid-wait loses the entire fix #674

Open

review-runs-tracking: 2026-06 #646

Open

max-sixty mentioned this pull request Jun 11, 2026

chore: release 0.1.4 #681

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

skills(triage): drop "run full suite" — defer to PR CI per "ship before end_turn"#671

skills(triage): drop "run full suite" — defer to PR CI per "ship before end_turn"#671
max-sixty merged 1 commit into
mainfrom
hourly/review-27150920923

tend-agent commented Jun 8, 2026

Uh oh!

tend-agent left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tend-agent commented Jun 8, 2026

Problem

Solution

Gate assessment

Uh oh!

tend-agent left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants