boshu2 · May 29, 2026
@@ -67,7 +67,7 @@ The April 2026 Claude Code source analysis confirmed that Anthropic's internal t
 | Anthropic Concept | AgentOps Equivalent | Status |
 |---|---|---|
 | **Learning Loop** — memory extraction, dream cycle consolidation, future session context | Knowledge Flywheel — `/retro` → `/forge` → `/harvest` → `ao lookup` / `ao context assemble`, tiered promotion (learning → pattern → rule), plus bounded Dream via `/dream` | Live with bounds. On-demand capture/promotion works, and Dream provides an operator-started compounding lane. GitHub nightly is the public proof harness for the contracts, not the user's private runtime. |
-| **Skillify** — AI watches patterns, packages them as reusable skills, compound growth | Skills system — 76 skills, `/heal-skill` audit, `/converter` cross-runtime export, SKILL-TIERS classification | Prototype built. `ao flywheel close-loop` now drafts review-only skills from repeated patterns; promotion polish is the remaining gap. |
+| **Skillify** — AI watches patterns, packages them as reusable skills, compound growth | Skills system — 78 skills, `/heal-skill` audit, `/converter` cross-runtime export, SKILL-TIERS classification | Prototype built. `ao flywheel close-loop` now drafts review-only skills from repeated patterns; promotion polish is the remaining gap. |
 | **Verification Agent** — adversarial AI auditing AI, VERDICT system for human review | Council architecture — `/council`, `/pre-mortem`, `/vibe`, `/post-mortem` with multi-model consensus, prediction tracking. Stage 4 behavioral validation adds holdout scenarios + satisfaction scoring in STEP 1.8. | Live on demand. STEP 1.8 fires automatically inside `/validation` when that skill is invoked. |
 | **Managed Agents Dreaming** (May 2026) — scheduled session review, pattern extraction, memory curation between sessions | `/dream` + `.github/workflows/nightly.yml` proof jobs + substrate-driven scheduling when needed | Live with operator setup. The bounded private Dream lane runs harvest → forge → close-loop → defrag when the operator or substrate starts it. AgentOps itself no longer ships the daemon executor. |
 | **Managed Agents Outcomes** (May 2026) — rubric-driven separate-context grader with iterate-until-pass | Live at three scopes: project — `GOALS.md` (rubric) + `ao goals measure` (each gate runs as separate subprocess; `cli/internal/goals/measure.go:132-164`) + `/evolve` (can iterate a worst-failing gate under operator limits; `skills/evolve/SKILL.md:379-388`); plan — `/pre-mortem` council judges as separate-context graders; code — `/vibe` council judges. An internal council review (2026-05-06) found these capabilities present across rubric authoring, separate-context grading, iterate-until-pass, and pinpoint-what-changed; this is an internal finding, not an audited external-parity claim. | Live at the capability layer. Empirical workbench A/B (2026-05-06): Δ=+0.0000 across 12 cases at v1 difficulty (both legs 12/12) — task difficulty floor exhausted; v2 substrate (realistic agent tasks where the hook layer differentiates) is roadmap. Counter-stat artifact: `evals/workbench/results/2026-05-06-yjzp9-counterstat.json`. |
@@ -176,7 +176,7 @@ The same model used in the README: bookkeeping records the work, the context com
 - `ao lookup` — decay-ranked retrieval for on-demand knowledge
 - `ao context assemble` — phase-scoped context packets
 - `ao compile` — rebuild the knowledge wiki (mine, grow, defrag, lint)
-- 76 skills — reusable context packages across Claude Code, Codex, and OpenCode
+- 78 skills — reusable context packages across Claude Code, Codex, and OpenCode
 - `bash <(curl -fsSL .../install.sh)` — 30 seconds, zero config
 
 #### Layer 3: Validation Gates
@@ -261,7 +261,7 @@ As of 2026-05-10:
 
 - GitHub repo: 341 stars, 33 forks, 2 open issues, last pushed 2026-05-10T03:24:01Z
 - Public surface: GitHub Pages mkdocs site live at boshu2.github.io/agentops/; doctrine site live at 12factoragentops.com
-- Distribution/runtime reach: 76 shared skills, 76 checked-in Codex artifacts, and 32 Codex overrides. `/validate` and `/curate` are additive in this train; legacy validation and mining skills remain until their shim/retirement gates are resolved.
+- Distribution/runtime reach: 78 shared skills, 78 checked-in Codex artifacts, and 32 Codex overrides. `/validate` and `/curate` are additive in this train; legacy validation and mining skills remain until their shim/retirement gates are resolved.
 
 **Measured operational proof:**
 

@@ -370,7 +370,7 @@ func TestCobraCommandTreeRegistration(t *testing.T) {
 		"defrag", "demo", "doctor", "eval", "evolve", "extract", "feedback", "feedback-loop",
 		"findings", "flywheel", "forge", "gate", "goals", "handoff", "harness", "harvest",
 		"index", "init", "inject", "knowledge", "lookup", "loop", "maturity",
-		"memory", "metrics", "migrate", "mind", "mine", "notebook", "operator", "patterns",
+		"memory", "metrics", "migrate", "mind", "mine", "notebook", "operator", "orchestrate", "patterns",
 		"pool", "quick-start", "ratchet", "reconcile", "retrieval-bench", "robot-docs", "rpi",
 		"registry", "scenario", "scope", "search", "seed", "session", "session-outcome", "sessions", "skills", "status",
 		"store", "task-feedback", "task-status", "task-sync", "temper",
@@ -429,7 +429,7 @@ func TestCobraExpectedCmdsMatchRegistration(t *testing.T) {
 		"defrag", "demo", "doctor", "eval", "evolve", "extract", "feedback", "feedback-loop",
 		"findings", "flywheel", "forge", "gate", "goals", "handoff", "harness", "harvest",
 		"index", "init", "inject", "knowledge", "lookup", "loop", "maturity",
-		"memory", "metrics", "migrate", "mind", "mine", "notebook", "operator", "patterns",
+		"memory", "metrics", "migrate", "mind", "mine", "notebook", "operator", "orchestrate", "patterns",
 		"pool", "quick-start", "ratchet", "reconcile", "retrieval-bench", "robot-docs", "rpi",
 		"registry", "scenario", "scope", "search", "seed", "session", "session-outcome", "sessions", "skills", "status",
 		"store", "task-feedback", "task-status", "task-sync", "temper",

@@ -0,0 +1,126 @@
+// practices: [hexagonal-architecture, safe-degradation]
+package main
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"os/exec"
+	"strings"
+
+	"github.com/spf13/cobra"
+
+	"github.com/boshu2/agentops/cli/internal/orchestration"
+	"github.com/boshu2/agentops/cli/internal/ports"
+)
+
+// orchestrateCmd is the parent for orchestration-backend tooling. It wires
+// the library-only OrchestrationPort (internal/orchestration + the typed
+// port in internal/ports) into the live `ao` command surface.
+var orchestrateCmd = &cobra.Command{
+	Use:   "orchestrate",
+	Short: "Resolve and inspect the orchestration backend ladder",
+	Long: `Tooling for the orchestration safe-degradation ladder
+(NTM -> Claude-native -> beads floor). Subcommands resolve which backend
+a unit of work would run on, honoring an explicit pin, the
+AGENTOPS_ORCHESTRATION env override, and an opt-out to the beads floor.`,
+}
+
+var (
+	orchestrateSelectJSON   bool
+	orchestrateSelectPin    string
+	orchestrateSelectOptOut bool
+)
+
+var orchestrateSelectCmd = &cobra.Command{
+	Use:   "select",
+	Short: "Select the orchestration backend for a unit of work",
+	Long: `Resolve the orchestration backend via the safe-degradation ladder
+NTM -> Claude-native -> beads floor.
+
+NTM availability is detected by capability — this shells out to
+` + "`ntm --robot-capabilities`" + ` and degrades gracefully when ntm is
+absent. Resolution order (first match wins):
+
+  1. --pin <ntm|claude|codex|beads>  forces that backend.
+  2. AGENTOPS_ORCHESTRATION env       acts as an explicit pin / opt-out.
+  3. --opt-out                        routes to the beads floor.
+  4. NTM probe reports available      -> ntm.
+  5. otherwise                        -> claude (beads floor remains).`,
+	RunE: runOrchestrateSelect,
+}
+
+func init() {
+	orchestrateCmd.GroupID = "workflow"
+	rootCmd.AddCommand(orchestrateCmd)
+	orchestrateCmd.AddCommand(orchestrateSelectCmd)
+	orchestrateSelectCmd.Flags().BoolVar(&orchestrateSelectJSON, "json", false,
+		"Emit the selection trace as JSON")
+	orchestrateSelectCmd.Flags().StringVar(&orchestrateSelectPin, "pin", "",
+		"Force a backend: ntm|claude|codex|beads (overrides --opt-out and availability)")
+	orchestrateSelectCmd.Flags().BoolVar(&orchestrateSelectOptOut, "opt-out", false,
+		"Bypass swarm engines and run on the beads floor")
+	_ = orchestrateSelectCmd.RegisterFlagCompletionFunc("pin",
+		staticCompletionFunc("ntm", "claude", "codex", "beads"))
+}
+
+// execCommandRunner is the production CommandRunner adapter: it shells out
+// via os/exec so ProbeNTM actually invokes `ntm --robot-capabilities`. It
+// is a thin consumer of the orchestration package and adds no behavior of
+// its own.
+type execCommandRunner struct{}
+
+// Run executes name with args and returns the combined output. A non-zero
+// exit (or a missing binary) surfaces as an error, which ProbeNTM reads as
+// the canonical "tool absent or unusable" degradation signal.
+func (execCommandRunner) Run(ctx context.Context, name string, args ...string) ([]byte, error) {
+	return exec.CommandContext(ctx, name, args...).CombinedOutput()
+}
+
+// Compile-time assertion that the adapter satisfies the probe's contract.
+var _ orchestration.CommandRunner = execCommandRunner{}
+
+// workSpecFromFlags maps the command's flag values onto a port WorkSpec.
+// It is split out from the cobra plumbing so the flag->intent mapping can
+// be unit-tested without constructing a command.
+func workSpecFromFlags(pin string, optOut bool) ports.WorkSpec {
+	return ports.WorkSpec{
+		OptOut: optOut,
+		Pin:    ports.Backend(strings.TrimSpace(pin)),
+	}
+}
+
+// runOrchestrateSelect builds the production Selector over an exec-backed
+// runner and resolves the backend for the flag-derived WorkSpec.
+func runOrchestrateSelect(cmd *cobra.Command, _ []string) error {
+	selector := orchestration.NewSelector(execCommandRunner{})
+	work := workSpecFromFlags(orchestrateSelectPin, orchestrateSelectOptOut)
+
+	trace, err := selector.Select(cmd.Context(), work)
+	if err != nil {
+		return fmt.Errorf("selecting orchestration backend: %w", err)
+	}
+
+	return emitSelectionTrace(cmd, trace, orchestrateSelectJSON)
+}
+
+// emitSelectionTrace renders a SelectionTrace as JSON (when jsonOut) or as
+// a human-readable summary. Kept separate so both branches are testable
+// against an injected writer.
+func emitSelectionTrace(cmd *cobra.Command, trace ports.SelectionTrace, jsonOut bool) error {
+	out := cmd.OutOrStdout()
+	if jsonOut {
+		enc := json.NewEncoder(out)
+		enc.SetIndent("", "  ")
+		return enc.Encode(trace)
+	}
+
+	fmt.Fprintf(out, "Backend: %s\n", trace.Chosen)
+	fmt.Fprintf(out, "Reason:  %s\n", trace.Reason)
+	considered := make([]string, 0, len(trace.Considered))
+	for _, b := range trace.Considered {
+		considered = append(considered, string(b))
+	}
+	fmt.Fprintf(out, "Ladder:  %s\n", strings.Join(considered, " -> "))
+	return nil
+}
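Reviewer sketch of the resolution order described in the `select` help text (pin, then env, then opt-out, then NTM probe, then claude). This is a minimal illustration, not the code in `internal/orchestration`: the `Inputs` struct, the `resolve` function, and the local `Backend` type are hypothetical, and treating the `AGENTOPS_ORCHESTRATION` value as a plain backend name is an assumption, since the diff does not show how the env override is parsed.

```go
package main

import (
	"fmt"
	"strings"
)

// Backend is a local stand-in for ports.Backend; the values mirror the
// names accepted by --pin in the help text.
type Backend string

const (
	BackendNTM    Backend = "ntm"
	BackendClaude Backend = "claude"
	BackendCodex  Backend = "codex"
	BackendBeads  Backend = "beads"
)

// Inputs gathers everything the ladder inspects (hypothetical shape).
type Inputs struct {
	Pin          string // value of --pin, may be empty
	Env          string // value of AGENTOPS_ORCHESTRATION (assumed: a backend name)
	OptOut       bool   // value of --opt-out
	NTMAvailable bool   // outcome of the capability probe
}

// resolve walks the ladder top to bottom; the first rung that matches wins.
// It does not validate backend names, which the real selector may well do.
func resolve(in Inputs) (Backend, string) {
	if pin := strings.TrimSpace(in.Pin); pin != "" {
		return Backend(pin), "explicit pin"
	}
	if env := strings.TrimSpace(in.Env); env != "" {
		return Backend(env), "env override"
	}
	if in.OptOut {
		return BackendBeads, "opt-out -> beads floor"
	}
	if in.NTMAvailable {
		return BackendNTM, "ntm probe reports available"
	}
	return BackendClaude, "ntm absent -> claude-native fallback"
}

func main() {
	cases := []Inputs{
		{Pin: "codex", OptOut: true, NTMAvailable: true},
		{Env: "beads", NTMAvailable: true},
		{OptOut: true},
		{NTMAvailable: true},
		{},
	}
	for _, c := range cases {
		backend, reason := resolve(c)
		fmt.Printf("%-6s (%s)\n", backend, reason)
	}
}
```

Keeping the ladder as a flat sequence of early returns makes the "first match wins" rule easy to audit against the numbered list in the help text.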
@@ -0,0 +1,162 @@
+package main
+
+import (
+	"bytes"
+	"context"
+	"encoding/json"
+	"errors"
+	"strings"
+	"testing"
+
+	"github.com/spf13/cobra"
+
+	"github.com/boshu2/agentops/cli/internal/orchestration"
+	"github.com/boshu2/agentops/cli/internal/ports"
+)
+
+// fakeRunner is an in-memory CommandRunner so the Select path can be
+// exercised without shelling out to a real `ntm` binary.
+type fakeRunner struct {
+	out []byte
+	err error
+}
+
+func (f fakeRunner) Run(_ context.Context, _ string, _ ...string) ([]byte, error) {
+	return f.out, f.err
+}
+
+func TestOrchestrate_WorkSpecFromFlags(t *testing.T) {
+	tests := []struct {
+		name    string
+		pin     string
+		optOut  bool
+		wantPin ports.Backend
+		wantOpt bool
+	}{
+		{name: "empty", pin: "", optOut: false, wantPin: "", wantOpt: false},
+		{name: "pin trimmed", pin: "  claude  ", optOut: false, wantPin: ports.BackendClaude, wantOpt: false},
+		{name: "opt-out", pin: "", optOut: true, wantPin: "", wantOpt: true},
+		{name: "pin and opt-out both preserved", pin: "codex", optOut: true, wantPin: ports.BackendCodex, wantOpt: true},
+	}
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			got := workSpecFromFlags(tc.pin, tc.optOut)
+			if got.Pin != tc.wantPin {
+				t.Fatalf("Pin: got %q, want %q", got.Pin, tc.wantPin)
+			}
+			if got.OptOut != tc.wantOpt {
+				t.Fatalf("OptOut: got %v, want %v", got.OptOut, tc.wantOpt)
+			}
+		})
+	}
+}
+
+// TestOrchestrate_SelectResolvesBackends drives a real Selector with an
+// injected fake runner across the ladder branches, asserting the chosen
+// backend for each flag combination.
+func TestOrchestrate_SelectResolvesBackends(t *testing.T) {
+	t.Setenv("AGENTOPS_ORCHESTRATION", "") // neutralize any operator override
+
+	tests := []struct {
+		name   string
+		runner orchestration.CommandRunner
+		pin    string
+		optOut bool
+		want   ports.Backend
+	}{
+		{
+			name:   "ntm absent degrades to claude",
+			runner: fakeRunner{err: errors.New("ntm: not found")},
+			want:   ports.BackendClaude,
+		},
+		{
+			name:   "ntm available selects ntm",
+			runner: fakeRunner{out: []byte(`{"capabilities":["tmux","git"]}`)},
+			want:   ports.BackendNTM,
+		},
+		{
+			name:   "opt-out routes to beads floor",
+			runner: fakeRunner{err: errors.New("ntm: not found")},
+			optOut: true,
+			want:   ports.BackendBeads,
+		},
+		{
+			name:   "pin wins over availability",
+			runner: fakeRunner{out: []byte(`{"capabilities":["tmux","git"]}`)},
+			pin:    "claude",
+			want:   ports.BackendClaude,
+		},
+		{
+			name:   "pin codex (never auto-selected) honored",
+			runner: fakeRunner{err: errors.New("ntm: not found")},
+			pin:    "codex",
+			want:   ports.BackendCodex,
+		},
+	}
+
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			selector := orchestration.NewSelector(tc.runner)
+			work := workSpecFromFlags(tc.pin, tc.optOut)
+			trace, err := selector.Select(context.Background(), work)
+			if err != nil {
+				t.Fatalf("Select returned error: %v", err)
+			}
+			if trace.Chosen != tc.want {
+				t.Fatalf("Chosen: got %q, want %q", trace.Chosen, tc.want)
+			}
+			if len(trace.Considered) == 0 {
+				t.Fatal("Considered ladder must be recorded")
+			}
+		})
+	}
+}
+
+// TestOrchestrate_EmitSelectionTraceJSON asserts the JSON branch emits
+// valid JSON that parses back into the port shape with Chosen intact.
+func TestOrchestrate_EmitSelectionTraceJSON(t *testing.T) {
+	trace := ports.SelectionTrace{
+		Chosen:     ports.BackendBeads,
+		Reason:     "WorkSpec.OptOut -> beads floor",
+		Considered: []ports.Backend{"pin", "env", "optout"},
+	}
+	cmd := &cobra.Command{}
+	var buf bytes.Buffer
+	cmd.SetOut(&buf)
+
+	if err := emitSelectionTrace(cmd, trace, true); err != nil {
+		t.Fatalf("emitSelectionTrace: %v", err)
+	}
+
+	var got ports.SelectionTrace
+	if err := json.Unmarshal(buf.Bytes(), &got); err != nil {
+		t.Fatalf("output is not valid JSON: %v", err)
+	}
+	if got.Chosen != ports.BackendBeads {
+		t.Fatalf("Chosen: got %q, want %q", got.Chosen, ports.BackendBeads)
+	}
+}
+
+// TestOrchestrate_EmitSelectionTraceHuman asserts the human-readable branch
+// renders the backend, reason, and ladder.
+func TestOrchestrate_EmitSelectionTraceHuman(t *testing.T) {
+	trace := ports.SelectionTrace{
+		Chosen:     ports.BackendClaude,
+		Reason:     "NTM absent -> claude-native fallback",
+		Considered: []ports.Backend{"pin", "env", "optout", "ntm", "claude", "beads"},
+	}
+	cmd := &cobra.Command{}
+	var buf bytes.Buffer
+	cmd.SetOut(&buf)
+
+	if err := emitSelectionTrace(cmd, trace, false); err != nil {
+		t.Fatalf("emitSelectionTrace: %v", err)
+	}
+
+	got := buf.String()
+	for _, want := range []string{"Backend: claude", "Reason:  NTM absent", "pin -> env -> optout -> ntm -> claude -> beads"} {
+		if !strings.Contains(got, want) {
+			t.Fatalf("output missing %q\nfull output:\n%s", want, got)
+		}
+	}
+}
@@ -226,7 +226,7 @@ func selectExecutorFromCaps(caps backendCapabilities, statusPath string, allPhas
 // The selection policy, chosen backend, and reason are logged to logPath for
 // observability. Pass an empty logPath to skip log writing (e.g., in tests).
 //
-// Selection order: runtime override (stream/direct) > auto (live-status=>stream, else direct).
+// Selection order: runtime override (stream/direct/tmux) > auto (always resolves to stream).
 func selectExecutor(statusPath string, allPhases []PhaseProgress) PhaseExecutor {
 	return selectExecutorWithLog(statusPath, allPhases, "", "", false, defaultPhasedEngineOptions())
 }

@@ -2069,6 +2069,35 @@ ao handoff [summary] [flags]
 
 ---
 
+### `ao orchestrate`
+
+Tooling for the orchestration safe-degradation ladder (NTM -> Claude-native -> beads floor).
+
+```
+ao orchestrate [command]
+```
+
+**Subcommands:**
+
+#### `ao orchestrate select`
+
+Resolve the orchestration backend via the safe-degradation ladder
+
+```
+ao orchestrate select [flags]
+```
+
+**Flags:**
+
+```
+  -h, --help         help for select
+      --json         Emit the selection trace as JSON
+      --opt-out      Bypass swarm engines and run on the beads floor
+      --pin string   Force a backend: ntm|claude|codex|beads (overrides --opt-out and availability)
+```
+
+---
+
 ### `ao ratchet`
 
 Track progress through the phased RPI workflow.

@@ -167,6 +167,8 @@ These are the skills every user needs first. Everything else is available when y
 | `/scenario` | Author and manage holdout scenarios for behavioral validation |
 | `/skill-auditor` | Two-pass audit of an existing SKILL.md against the unified template (15 checks) |
 | `/skill-builder` | Scaffold or absorb new SKILL.md files against the unified template |
+| `/automation-shape-routing` | Front door for building agent automation — decide the SHAPE (Workflow vs NTM swarm vs plain skill), then hand off to the right builder |
+| `/workflow-builder` | Scaffold a new Claude Workflow script (`.claude/workflows/*.js`) — deterministic multi-agent orchestration |
 
 ## Expert Skills (specialized workflows)