
chore(tests): codex-desc budget is per-skill-average, not a hard aggregate (ag-vzbt) #612

Merged
boshu2 merged 2 commits into main from chore/ag-vzbt-codex-desc-avg-budget
May 30, 2026

Conversation

boshu2 (Owner) commented May 30, 2026

What

ag-vzbt is the last ag-cw2y item. The skills-codex description catalog budget was a hard aggregate (2800 chars, bumped 2600→2700→2800 as skills landed) with only ~17 chars of headroom. Every additional skill ran into that wall: /burndown (#600) was forced into a 17-char stub ("Bounded epic loop") to fit.

How

Replace the hard aggregate with a per-skill average cap (CODEX_DESC_AVG_FAIL_CHARS=45) that scales with the catalog:

  • Each terse description keeps the average low; the gate fails only if descriptions are bloated on average — not because the catalog grew.
  • Current state: avg 35 chars/skill (77% of 45) — real headroom, and it never becomes a wall again.
  • The per-skill hard cap (DESC_FAIL_CHARS=180) still bounds any single bloated description.
  • BUDGET_REPO_ROOT override added so the rule is fixture-testable.
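
A minimal sketch of the two-level rule in the list above, assuming descriptions arrive one per line on stdin and that the gate fails strictly above each cap. The helper name `check_desc_budget` and the input format are hypothetical; only the two thresholds come from this PR, and the real logic lives in tests/skills/test-token-budgets.sh.

```shell
# Thresholds named in this PR; overridable from the environment.
CODEX_DESC_AVG_FAIL_CHARS="${CODEX_DESC_AVG_FAIL_CHARS:-45}"
DESC_FAIL_CHARS="${DESC_FAIL_CHARS:-180}"

# check_desc_budget (hypothetical helper): reads one description per line
# on stdin. Returns 1 if any single description exceeds the per-skill hard
# cap, or if the average length exceeds the average cap.
check_desc_budget() {
  local total=0 count=0 line len avg
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$line" ]; then
      continue                      # ignore blank lines
    fi
    len=${#line}
    if [ "$len" -gt "$DESC_FAIL_CHARS" ]; then
      echo "FAIL: description is $len chars (cap $DESC_FAIL_CHARS)"
      return 1
    fi
    total=$((total + len))
    count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "FAIL: no descriptions found"
    return 1
  fi
  avg=$((total / count))            # integer division, rounds down
  if [ "$avg" -gt "$CODEX_DESC_AVG_FAIL_CHARS" ]; then
    echo "FAIL: avg $avg chars/skill (cap $CODEX_DESC_AVG_FAIL_CHARS)"
    return 1
  fi
  echo "PASS: avg $avg chars/skill over $count skills (cap $CODEX_DESC_AVG_FAIL_CHARS)"
}
```

Because the average is divided by the skill count, adding another terse description cannot push the catalog over the cap by itself; only longer descriptions can.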

Tests (TDD, red→green)

tests/scripts/codex-desc-avg-budget.bats covers three cases: PASS under budget, FAIL over budget, and 100 skills (total ~3800 > old 2800 wall) still passing because the avg stays low. 3/3 green; shellcheck clean; real-repo gate passes (avg 35/45).
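
The arithmetic behind the 100-skill case can be sketched as follows. This is an illustration only: `compare_gates` is a hypothetical name, the catalog is assumed to be uniform (every description the same length), and both gates are assumed to fail strictly above their cap.

```shell
# Caps taken from this PR; the function name is hypothetical.
OLD_AGGREGATE_CAP=2800
AVG_CAP=45

# compare_gates COUNT CHARS_PER_SKILL
# Prints how a uniform catalog fares under the old aggregate rule and the
# new per-skill-average rule.
compare_gates() {
  local count=$1 per_skill=$2
  local total=$((count * per_skill))
  local avg=$((total / count))
  local old=pass new=pass
  if [ "$total" -gt "$OLD_AGGREGATE_CAP" ]; then old=fail; fi
  if [ "$avg" -gt "$AVG_CAP" ]; then new=fail; fi
  echo "skills=$count total=$total avg=$avg old=$old new=$new"
}

compare_gates 100 38   # skills=100 total=3800 avg=38 old=fail new=pass
compare_gates 100 50   # skills=100 total=5000 avg=50 old=fail new=fail
```

The first call mirrors the third bats case: the total is well past the old 2800 wall, yet the average stays under 45, so only the old rule rejects it.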

⚠️ Pre-existing unrelated red

The local fast gate also surfaces a pre-existing mkdocs strict failure on rfcs/0001-finding-generator-parallelism.md — reproduces on clean origin/main with my changes stashed; my diff touches only the test script + bats (zero docs). Flagging separately; not in scope here.

ag-cw2y status

All four items now done (items 1,2,4 merged #609/#610/#611; item 3 here). A brand-new skill is now one-shot-green → this unblocks ag-hdqu0.8 (last Outcomes bead) and /burndown #600.

Closes-scenario: ag-vzbt#per-skill-average-budget
Bounded-context: BC2-validation
Evidence: tests/scripts/codex-desc-avg-budget.bats

…egate (ag-vzbt)

The skills-codex description catalog budget was a hard aggregate (2800 chars,
raised 2600→2700→2800 as skills landed) with ~17 chars of headroom, so every
new skill hit the wall; /burndown (#600) was forced into a 17-char stub to fit.

Replace it with a PER-SKILL AVERAGE cap (CODEX_DESC_AVG_FAIL_CHARS=45) that scales
with the catalog: each terse description keeps the avg low; the gate fails only if
descriptions are bloated on average. Current avg ~35 (77% of 45) — real headroom,
and it never becomes a wall as the catalog grows. The per-skill hard cap
(DESC_FAIL_CHARS=180) still bounds any single bloated description.

BUDGET_REPO_ROOT override added so the rule is fixture-testable.
(NOTE: the local fast gate also surfaces a PRE-EXISTING mkdocs failure on
rfcs/0001-finding-generator-parallelism.md — reproduces on clean origin/main,
unrelated to this test-script change.)

Closes-scenario: ag-vzbt#per-skill-average-budget
Bounded-context: BC2-validation
Evidence: tests/scripts/codex-desc-avg-budget.bats
github-actions bot added the tests label May 30, 2026
…612)

The prior commit 31dec88 committed the bats spec but the
tests/skills/test-token-budgets.sh edit (BUDGET_REPO_ROOT + per-skill
CODEX_DESC_AVG_FAIL_CHARS=45) was never staged, so CI ran the OLD
aggregate-2800 gate against the real repo and the fixture-driven FAIL
assertion could not pass.

Evidence: bats tests/scripts/codex-desc-avg-budget.bats -> 3/3 green;
real-repo gate -> avg 35/45 (77%), Failed: 0.

ag-vzbt
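
The half-staged commit described above (spec committed, script edit left in the working tree) is the kind of slip a check before committing or pushing can surface. A minimal sketch, assuming the input is the output of `git status --porcelain` (v1 format, two status columns followed by a space and the path); the helper name `unstaged_paths` is hypothetical and is not part of this repository.

```shell
# unstaged_paths (hypothetical helper): reads `git status --porcelain`
# output on stdin and prints tracked paths that still have working-tree
# changes not added to the index.
unstaged_paths() {
  local line
  while IFS= read -r line; do
    case "$line" in
      '')           continue ;;  # skip empty lines
      '??'*|'!!'*)  continue ;;  # untracked or ignored, not a tracked edit
      ?' '*)        continue ;;  # second status column blank: nothing unstaged
    esac
    printf '%s\n' "${line#???}"  # drop the two status columns and the space
  done
}

# Example use before committing or pushing:
#   git status --porcelain | unstaged_paths
```

Any path this prints is an edit that the next commit would leave behind, which is worth a look when a test and the script it exercises are meant to land together.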
boshu2 merged commit 4d18300 into main May 30, 2026
14 checks passed
boshu2 deleted the chore/ag-vzbt-codex-desc-avg-budget branch May 30, 2026 00:30
boshu2 added a commit that referenced this pull request May 30, 2026
…ag-j4l1 #evolve-discipline-references) (#615)

## What

Promotes this session's hard-won fix-and-repush discipline — which lived
only in the **session-only cron prompt** (expires in 7 days) — into the
**durable `/evolve` skill**, so every future autonomous run inherits it.

## How

- **NEW** `skills/evolve/references/new-skill-landing.md` — the **six
derived surfaces** a new/modified skill must regenerate in one shot
(registry.json being the most-missed; it trips `contracts-sync` +
`correctness(ubuntu)` together), with the `regen-all.sh` shortcut and
the manual codex-twin / `SKILL-TIERS.md` steps.
- `gate-hygiene.md` **+2 subsections**: pre-push **diff-scope check**
(catches half-staged commits like #612 + conflict-resolution collateral
deletions like #600) and **pre-existing-vs-mine red triage**
(mkdocs-strict + the 7 codex `.agentops-generated.json` drifts).
- `SKILL.md` links both from Step 4.5 + the References list.

## Dog-fooding

Built one-shot through its own documented discipline: codex twin hash
refreshed for `evolve`, the 7 pre-existing codex drifts reverted,
manifest surgically rebuilt to the `evolve` delta only. No
inventory/count files touched (correct — adds references, not a skill).

## Evidence

- `tests/scripts/evolve-discipline-references.bats` → **4/4** (locks
both reference links + the two new gate-hygiene subsections)
- `heal.sh --strict skills/evolve` → clean
- `validate-codex-generated-artifacts.sh --scope worktree` → pass

(Companion: ag-cw2y wired the skill-builder *scaffold*; ag-ekyq added
the registry surface; this bakes the operator-side *runtime guidance*
into /evolve.)

Closes-scenario: ag-j4l1#evolve-discipline-references
Bounded-context: BC2-Validation
Evidence: skills/evolve/references/new-skill-landing.md