Skip to content

skills(triage): drop "run full suite" — defer to PR CI per "ship before end_turn"#671

Merged
max-sixty merged 1 commit into
mainfrom
hourly/review-27150920923
Jun 9, 2026
Merged

skills(triage): drop "run full suite" — defer to PR CI per "ship before end_turn"#671
max-sixty merged 1 commit into
mainfrom
hourly/review-27150920923

Conversation

@tend-agent

Copy link
Copy Markdown
Collaborator

Problem

tend-triage's Step 6 "If fixing" tells the bot to "Run the full test suite and lints" before commit/push. That directly conflicts with running-in-ci's "End the turn only when work is shipped" (added in #661), which forbids backgrounding pre-push work the deliverable depends on.

Today on PRQL/prql, run 27149177431 (tend-triage on issue #5988, "Compiler crash when joining a zero-row relation"):

  • Bot reproduced the panic, applied the fix, added a snapshot test, ran it green — all locally.
  • Following Step 6.3 ("Run the full test suite and lints"), kicked off task prqlc:pull-request (~7 min suite) with run_in_background: true.
  • Used Monitor (timeout_ms: 420000) and ScheduleWakeup (delaySeconds: 180) to wait for the suite, then emitted end_turn.
  • Session terminated — no commit, no push, no PR, no comment on the issue. 14 minutes of agent time on a real user bug, zero maintainer-visible output.

Last assistant text: "I've scheduled a fallback check. Waiting for the test suite to finish."

This is the second occurrence of "agent calls ScheduleWakeup/Monitor in CI expecting resumption and the deliverable never ships" — the first was tend-mention run 26347739838 (May 24, fix attempt closed as #593, gate-tightening followed in #602, and the root-cause guidance shipped in #661 — but the triage skill never picked up the alignment). PRQL is still pinned to max-sixty/tend@0.1.2, predating #661, so the conflicting Step 6.3 won. Even once consumers bump past #661, an agent reading both skills will see Step 6.3's explicit instruction and a generic principle in running-in-ci; the explicit instruction tends to win.

Solution

Rewrite triage Step 6.3 so the two skills agree: the targeted reproduction test plus a clean compile is sufficient confidence; the comprehensive suite is PR CI's job. Renumber the following two steps (4 → 3, 5 → 4) since "Confirm the test passes" merges naturally into Step 2.

Gate assessment

  • Evidence level: High — 2 occurrences of "agent uses async-resume tool in CI then end_turn leaves deliverable unshipped" across tend-mention and tend-triage. Both fully diagnosed from session logs.
  • Structural vs stochastic: Structural. Two skills' instructions conflict on whether to run the comprehensive suite before push; the more specific (triage Step 6.3) wins. Same conditions reproduce reliably.
  • Change type: Removal / simplification — drop the conflicting "Run the full test suite and lints" line, fold "Confirm the test passes" into the same step, renumber. Per Gate 2, Low evidence bar (1 occurrence enough).
  • Both gates: Pass.

Evidence gist (full per-run history): https://gist.github.com/a017dd295de05dc44af4479748ffe636

Sources and notes

…re end_turn"

Triage Step 6.3's "Run the full test suite and lints" conflicts with
running-in-ci's "End the turn only when work is shipped" (#661): a long
suite kicked off before push pushes the agent into mid-wait end_turn
and the deliverable never ships. Fold the targeted-test check into
Step 2 and renumber.

Closes a 2-occurrence pattern (May tend-mention #5741, June tend-triage
on PRQL/prql#5988) where the agent finished real work, called
ScheduleWakeup expecting resumption, and exited without commit/push.

Co-Authored-By: Claude <noreply@anthropic.com>
@tend-agent tend-agent added the claude-behavior Behavioral issues found by review-reviewers label Jun 8, 2026

@tend-agent tend-agent left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same pattern elsewhere: plugins/tend-ci-runner/skills/nightly/SKILL.md line 281 still says branch, fix, run full test suite, commit, push, create PR, poll CI. By this PR's own argument — an explicit step-list instruction tends to override the generic running-in-ci rule — nightly remains vulnerable to the same mid-wait-end_turn failure mode, just one of the project's longer suites away from reproducing it. Happy to push a follow-up that drops run full test suite from that line; say the word.

@max-sixty max-sixty merged commit e77bda8 into main Jun 9, 2026
7 checks passed
@max-sixty max-sixty deleted the hourly/review-27150920923 branch June 9, 2026 05:01
@max-sixty max-sixty mentioned this pull request Jun 11, 2026
max-sixty added a commit that referenced this pull request Jun 11, 2026
Bumps the generator version to 0.1.4, syncs the lockfile, and adds the
0.1.4 CHANGELOG section (published as the GitHub Release notes on tag
push).

Notable changes since 0.1.3:

- Claude harnesses switch to `bypassPermissions` (#677)
- GitHub Releases publish from `CHANGELOG.md` on tag push (#678) — this
release is the first live exercise of that workflow
- Interactive sandbox receives GitHub Actions context env vars (#664)
- install-tend bot-token step restructured around scope-audit
remediation (#680)
- Review skill checks the rollup before APPROVE (#667);
running-in-ci/triage/ci-fix skill refinements (#661, #666, #669, #670,
#671)

> _This was written by Claude Code on behalf of max_

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-behavior Behavioral issues found by review-reviewers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants