chore(tests): codex-desc budget is per-skill-average, not a hard aggregate (ag-vzbt) #612
Merged
…egate (ag-vzbt)

The skills-codex description catalog budget was a hard aggregate (2800 chars, raised 2600→2700→2800 as skills landed) with ~17 chars headroom. Adding the Nth+ skill walls out, and /burndown (#600) was forced into a 17-char stub to fit.

Replace it with a PER-SKILL AVERAGE cap (CODEX_DESC_AVG_FAIL_CHARS=45) that scales with the catalog: each terse description keeps the avg low; the gate fails only if descriptions are bloated on average. Current avg ~35 (77% of 45) is real headroom, and it never becomes a wall as the catalog grows. The per-skill hard cap (DESC_FAIL_CHARS=180) still bounds any single bloated description. BUDGET_REPO_ROOT override added so the rule is fixture-testable.

(NOTE: the local fast gate also surfaces a PRE-EXISTING mkdocs failure on rfcs/0001-finding-generator-parallelism.md. It reproduces on clean origin/main and is unrelated to this test-script change.)

Closes-scenario: ag-vzbt#per-skill-average-budget
Bounded-context: BC2-validation
Evidence: tests/scripts/codex-desc-avg-budget.bats
…612)

The prior commit 31dec88 committed the bats spec but the tests/skills/test-token-budgets.sh edit (BUDGET_REPO_ROOT + per-skill CODEX_DESC_AVG_FAIL_CHARS=45) was never staged, so CI ran the OLD aggregate-2800 gate against the real repo and the fixture-driven FAIL assertion could not pass.

Evidence: bats tests/scripts/codex-desc-avg-budget.bats -> 3/3 green; real-repo gate -> avg 35/45 (77%), Failed: 0.

ag-vzbt
boshu2 added a commit that referenced this pull request on May 30, 2026
…ag-j4l1 #evolve-discipline-references) (#615)

## What
Promotes this session's hard-won fix-and-repush discipline, which lived only in the **session-only cron prompt** (expires in 7 days), into the **durable `/evolve` skill**, so every future autonomous run inherits it.

## How
- **NEW** `skills/evolve/references/new-skill-landing.md`: the **six derived surfaces** a new/modified skill must regenerate in one shot (registry.json being the most-missed; it trips `contracts-sync` + `correctness(ubuntu)` together), with the `regen-all.sh` shortcut and the manual codex-twin / `SKILL-TIERS.md` steps.
- `gate-hygiene.md` **+2 subsections**: pre-push **diff-scope check** (catches half-staged commits like #612 + conflict-resolution collateral deletions like #600) and **pre-existing-vs-mine red triage** (mkdocs-strict + the 7 codex `.agentops-generated.json` drifts).
- `SKILL.md` links both from Step 4.5 + the References list.

## Dog-fooding
Built one-shot through its own documented discipline: codex twin hash refreshed for `evolve`, the 7 pre-existing codex drifts reverted, manifest surgically rebuilt to the `evolve` delta only. No inventory/count files touched (correct: adds references, not a skill).

## Evidence
- `tests/scripts/evolve-discipline-references.bats` → **4/4** (locks both reference links + the two new gate-hygiene subsections)
- `heal.sh --strict skills/evolve` → clean
- `validate-codex-generated-artifacts.sh --scope worktree` → pass

(Companion: ag-cw2y wired the skill-builder *scaffold*; ag-ekyq added the registry surface; this bakes the operator-side *runtime guidance* into /evolve.)

Closes-scenario: ag-j4l1#evolve-discipline-references
Bounded-context: BC2-Validation
Evidence: skills/evolve/references/new-skill-landing.md
What
ag-vzbt: the last ag-cw2y item. The skills-codex description catalog budget was a hard aggregate (2800 chars, bumped 2600→2700→2800 as skills landed) with ~17 chars of headroom. Adding any further skill hits that wall: /burndown #600 was forced into a 17-char stub ("Bounded epic loop") to fit.

How
Replace the hard aggregate with a per-skill average cap (CODEX_DESC_AVG_FAIL_CHARS=45) that scales with the catalog:
- Each terse description keeps the average low; the gate fails only if descriptions are bloated on average.
- Current average is ~35 (77% of 45), which is real headroom, and it never becomes a wall as the catalog grows.
- The per-skill hard cap (DESC_FAIL_CHARS=180) still bounds any single bloated description.
- BUDGET_REPO_ROOT override added so the rule is fixture-testable.

Tests (TDD, red→green)
tests/scripts/codex-desc-avg-budget.bats: PASS under budget / FAIL over budget / 100 skills (total ~3800 > old 2800 wall) still pass because the average stays low. 3/3 green; shellcheck clean; real-repo gate passes (avg 35/45).

The local fast gate also surfaces a pre-existing mkdocs strict failure on rfcs/0001-finding-generator-parallelism.md. It reproduces on clean origin/main with my changes stashed; my diff touches only the test script + bats (zero docs). Flagging separately; not in scope here.

ag-cw2y status
All four items now done (items 1, 2, 4 merged in #609/#610/#611; item 3 here). A brand-new skill is now one-shot-green → this unblocks ag-hdqu0.8 (last Outcomes bead) and /burndown #600.

Closes-scenario: ag-vzbt#per-skill-average-budget
Bounded-context: BC2-validation
Evidence: tests/scripts/codex-desc-avg-budget.bats
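
For readers who want to see the shape of the rule, the per-skill average cap and the per-skill hard cap described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in tests/skills/test-token-budgets.sh: the function name `check_desc_budget` and passing descriptions as plain arguments are hypothetical, and how the real script collects descriptions (including via BUDGET_REPO_ROOT) is not shown in this PR page. Only the two constant names and their values come from the PR.

```shell
#!/usr/bin/env bash
# Sketch of a per-skill average description budget (assumed shape, not the real script).
# CODEX_DESC_AVG_FAIL_CHARS and DESC_FAIL_CHARS are the names and values from the PR;
# check_desc_budget and its argument-based input are hypothetical.
CODEX_DESC_AVG_FAIL_CHARS=45
DESC_FAIL_CHARS=180

check_desc_budget() {
  local total=0 count=0 desc
  for desc in "$@"; do
    # Per-skill hard cap: a single bloated description fails on its own.
    if (( ${#desc} > DESC_FAIL_CHARS )); then
      echo "FAIL: description is ${#desc} chars (cap ${DESC_FAIL_CHARS})"
      return 1
    fi
    total=$(( total + ${#desc} ))
    count=$(( count + 1 ))
  done

  if (( count == 0 )); then
    echo "PASS: no descriptions"
    return 0
  fi

  # Floor average, used for reporting only.
  local avg=$(( total / count ))

  # Compare total against cap * count so integer division cannot hide an overage.
  if (( total > CODEX_DESC_AVG_FAIL_CHARS * count )); then
    echo "FAIL: avg ${avg}/${CODEX_DESC_AVG_FAIL_CHARS} over ${count} skills"
    return 1
  fi
  echo "PASS: avg ${avg}/${CODEX_DESC_AVG_FAIL_CHARS} over ${count} skills"
}
```

Under this sketch, 100 descriptions of 38 characters each total 3800 characters, which is over the old 2800 aggregate, yet the average of 38 stays under 45 and the check passes. That is the scaling property the third bats case is described as locking in.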