6 changes: 3 additions & 3 deletions PRODUCT.md
@@ -67,7 +67,7 @@ The April 2026 Claude Code source analysis confirmed that Anthropic's internal t
| Anthropic Concept | AgentOps Equivalent | Status |
|---|---|---|
| **Learning Loop** — memory extraction, dream cycle consolidation, future session context | Knowledge Flywheel — `/retro` → `/forge` → `/harvest` → `ao lookup` / `ao context assemble`, tiered promotion (learning → pattern → rule), plus bounded Dream via `/dream` | Live with bounds. On-demand capture/promotion works, and Dream provides an operator-started compounding lane. GitHub nightly is the public proof harness for the contracts, not the user's private runtime. |
-| **Skillify** — AI watches patterns, packages them as reusable skills, compound growth | Skills system — 76 skills, `/heal-skill` audit, `/converter` cross-runtime export, SKILL-TIERS classification | Prototype built. `ao flywheel close-loop` now drafts review-only skills from repeated patterns; promotion polish is the remaining gap. |
+| **Skillify** — AI watches patterns, packages them as reusable skills, compound growth | Skills system — 78 skills, `/heal-skill` audit, `/converter` cross-runtime export, SKILL-TIERS classification | Prototype built. `ao flywheel close-loop` now drafts review-only skills from repeated patterns; promotion polish is the remaining gap. |
| **Verification Agent** — adversarial AI auditing AI, VERDICT system for human review | Council architecture — `/council`, `/pre-mortem`, `/vibe`, `/post-mortem` with multi-model consensus, prediction tracking. Stage 4 behavioral validation adds holdout scenarios + satisfaction scoring in STEP 1.8. | Live on demand. STEP 1.8 fires automatically inside `/validation` when that skill is invoked. |
| **Managed Agents Dreaming** (May 2026) — scheduled session review, pattern extraction, memory curation between sessions | `/dream` + `.github/workflows/nightly.yml` proof jobs + substrate-driven scheduling when needed | Live with operator setup. The bounded private Dream lane runs harvest → forge → close-loop → defrag when the operator or substrate starts it. AgentOps itself no longer ships the daemon executor. |
| **Managed Agents Outcomes** (May 2026) — rubric-driven separate-context grader with iterate-until-pass | Live at three scopes: project — `GOALS.md` (rubric) + `ao goals measure` (each gate runs as separate subprocess; `cli/internal/goals/measure.go:132-164`) + `/evolve` (can iterate a worst-failing gate under operator limits; `skills/evolve/SKILL.md:379-388`); plan — `/pre-mortem` council judges as separate-context graders; code — `/vibe` council judges. An internal council review (2026-05-06) found these capabilities present across rubric authoring, separate-context grading, iterate-until-pass, and pinpoint-what-changed; this is an internal finding, not an audited external-parity claim. | Live at the capability layer. Empirical workbench A/B (2026-05-06): Δ=+0.0000 across 12 cases at v1 difficulty (both legs 12/12) — task difficulty floor exhausted; v2 substrate (realistic agent tasks where the hook layer differentiates) is roadmap. Counter-stat artifact: `evals/workbench/results/2026-05-06-yjzp9-counterstat.json`. |
@@ -176,7 +176,7 @@ The same model used in the README: bookkeeping records the work, the context com
- `ao lookup` — decay-ranked retrieval for on-demand knowledge
- `ao context assemble` — phase-scoped context packets
- `ao compile` — rebuild the knowledge wiki (mine, grow, defrag, lint)
-- 76 skills — reusable context packages across Claude Code, Codex, and OpenCode
+- 78 skills — reusable context packages across Claude Code, Codex, and OpenCode
- `bash <(curl -fsSL .../install.sh)` — 30 seconds, zero config

#### Layer 3: Validation Gates
@@ -261,7 +261,7 @@ As of 2026-05-10:

- GitHub repo: 341 stars, 33 forks, 2 open issues, last pushed 2026-05-10T03:24:01Z
- Public surface: GitHub Pages mkdocs site live at boshu2.github.io/agentops/; doctrine site live at 12factoragentops.com
-- Distribution/runtime reach: 76 shared skills, 76 checked-in Codex artifacts, and 32 Codex overrides. `/validate` and `/curate` are additive in this train; legacy validation and mining skills remain until their shim/retirement gates are resolved.
+- Distribution/runtime reach: 78 shared skills, 78 checked-in Codex artifacts, and 32 Codex overrides. `/validate` and `/curate` are additive in this train; legacy validation and mining skills remain until their shim/retirement gates are resolved.

**Measured operational proof:**

4 changes: 2 additions & 2 deletions cli/cmd/ao/cobra_commands_test.go
@@ -370,7 +370,7 @@ func TestCobraCommandTreeRegistration(t *testing.T) {
"defrag", "demo", "doctor", "eval", "evolve", "extract", "feedback", "feedback-loop",
"findings", "flywheel", "forge", "gate", "goals", "handoff", "harness", "harvest",
"index", "init", "inject", "knowledge", "lookup", "loop", "maturity",
-"memory", "metrics", "migrate", "mind", "mine", "notebook", "operator", "patterns",
+"memory", "metrics", "migrate", "mind", "mine", "notebook", "operator", "orchestrate", "patterns",
"pool", "quick-start", "ratchet", "reconcile", "retrieval-bench", "robot-docs", "rpi",
"registry", "scenario", "scope", "search", "seed", "session", "session-outcome", "sessions", "skills", "status",
"store", "task-feedback", "task-status", "task-sync", "temper",
@@ -429,7 +429,7 @@ func TestCobraExpectedCmdsMatchRegistration(t *testing.T) {
"defrag", "demo", "doctor", "eval", "evolve", "extract", "feedback", "feedback-loop",
"findings", "flywheel", "forge", "gate", "goals", "handoff", "harness", "harvest",
"index", "init", "inject", "knowledge", "lookup", "loop", "maturity",
-"memory", "metrics", "migrate", "mind", "mine", "notebook", "operator", "patterns",
+"memory", "metrics", "migrate", "mind", "mine", "notebook", "operator", "orchestrate", "patterns",
"pool", "quick-start", "ratchet", "reconcile", "retrieval-bench", "robot-docs", "rpi",
"registry", "scenario", "scope", "search", "seed", "session", "session-outcome", "sessions", "skills", "status",
"store", "task-feedback", "task-status", "task-sync", "temper",
126 changes: 126 additions & 0 deletions cli/cmd/ao/orchestrate.go
@@ -0,0 +1,126 @@
// practices: [hexagonal-architecture, safe-degradation]
package main

import (
"context"
"encoding/json"
"fmt"
"os/exec"
"strings"

"github.com/spf13/cobra"

"github.com/boshu2/agentops/cli/internal/orchestration"
"github.com/boshu2/agentops/cli/internal/ports"
)

// orchestrateCmd is the parent for orchestration-backend tooling. It wires
// the library-only OrchestrationPort (internal/orchestration + the typed
// port in internal/ports) into the live `ao` command surface.
var orchestrateCmd = &cobra.Command{
Use: "orchestrate",
Short: "Resolve and inspect the orchestration backend ladder",
Long: `Tooling for the orchestration safe-degradation ladder
(NTM -> Claude-native -> beads floor). Subcommands resolve which backend
a unit of work would run on, honoring an explicit pin, the
AGENTOPS_ORCHESTRATION env override, and an opt-out to the beads floor.`,
}

var (
orchestrateSelectJSON bool
orchestrateSelectPin string
orchestrateSelectOptOut bool
)

var orchestrateSelectCmd = &cobra.Command{
Use: "select",
Short: "Select the orchestration backend for a unit of work",
Long: `Resolve the orchestration backend via the safe-degradation ladder
NTM -> Claude-native -> beads floor.

NTM availability is detected by capability — this shells out to
` + "`ntm --robot-capabilities`" + ` and degrades gracefully when ntm is
absent. Resolution order (first match wins):

1. --pin <ntm|claude|codex|beads> forces that backend.
2. AGENTOPS_ORCHESTRATION env acts as an explicit pin / opt-out.
3. --opt-out routes to the beads floor.
4. NTM probe reports available -> ntm.
5. otherwise -> claude (beads floor remains).`,
RunE: runOrchestrateSelect,
}

func init() {
orchestrateCmd.GroupID = "workflow"
rootCmd.AddCommand(orchestrateCmd)
orchestrateCmd.AddCommand(orchestrateSelectCmd)
orchestrateSelectCmd.Flags().BoolVar(&orchestrateSelectJSON, "json", false,
"Emit the selection trace as JSON")
orchestrateSelectCmd.Flags().StringVar(&orchestrateSelectPin, "pin", "",
"Force a backend: ntm|claude|codex|beads (overrides --opt-out and availability)")
orchestrateSelectCmd.Flags().BoolVar(&orchestrateSelectOptOut, "opt-out", false,
"Bypass swarm engines and run on the beads floor")
_ = orchestrateSelectCmd.RegisterFlagCompletionFunc("pin",
staticCompletionFunc("ntm", "claude", "codex", "beads"))
}

// execCommandRunner is the production CommandRunner adapter: it shells out
// via os/exec so ProbeNTM actually invokes `ntm --robot-capabilities`. It
// is a thin consumer of the orchestration package and adds no behavior of
// its own.
type execCommandRunner struct{}

// Run executes name with args and returns the combined output. A non-zero
// exit (or a missing binary) surfaces as an error, which ProbeNTM reads as
// the canonical "tool absent or unusable" degradation signal.
func (execCommandRunner) Run(ctx context.Context, name string, args ...string) ([]byte, error) {
return exec.CommandContext(ctx, name, args...).CombinedOutput()
}

// compile-time assertion that the adapter satisfies the probe's contract.
var _ orchestration.CommandRunner = execCommandRunner{}

// workSpecFromFlags maps the command's flag values onto a port WorkSpec.
// It is split out from the cobra plumbing so the flag->intent mapping can
// be unit-tested without constructing a command.
func workSpecFromFlags(pin string, optOut bool) ports.WorkSpec {
return ports.WorkSpec{
OptOut: optOut,
Pin: ports.Backend(strings.TrimSpace(pin)),
}
}

// runOrchestrateSelect builds the production Selector over an exec-backed
// runner and resolves the backend for the flag-derived WorkSpec.
func runOrchestrateSelect(cmd *cobra.Command, _ []string) error {
selector := orchestration.NewSelector(execCommandRunner{})
work := workSpecFromFlags(orchestrateSelectPin, orchestrateSelectOptOut)

trace, err := selector.Select(cmd.Context(), work)
if err != nil {
return fmt.Errorf("selecting orchestration backend: %w", err)
}

return emitSelectionTrace(cmd, trace, orchestrateSelectJSON)
}

// emitSelectionTrace renders a SelectionTrace as JSON (when jsonOut) or as
// a human-readable summary. Kept separate so both branches are testable
// against an injected writer.
func emitSelectionTrace(cmd *cobra.Command, trace ports.SelectionTrace, jsonOut bool) error {
out := cmd.OutOrStdout()
if jsonOut {
enc := json.NewEncoder(out)
enc.SetIndent("", " ")
return enc.Encode(trace)
}

fmt.Fprintf(out, "Backend: %s\n", trace.Chosen)
fmt.Fprintf(out, "Reason: %s\n", trace.Reason)
considered := make([]string, 0, len(trace.Considered))
for _, b := range trace.Considered {
considered = append(considered, string(b))
}
fmt.Fprintf(out, "Ladder: %s\n", strings.Join(considered, " -> "))
return nil
}
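
Reviewer sketch: the "first match wins" order described in the `select` help text can be illustrated in isolation. This is a minimal, hypothetical stand-in, not the actual `orchestration.Selector`: the function name `resolveBackend`, its parameters, and the reason strings are assumptions, and it further assumes the `AGENTOPS_ORCHESTRATION` value is simply a backend name treated like a pin.

```go
package main

import "fmt"

// Backend names mirror the values accepted by --pin in the diff above.
type Backend string

const (
	BackendNTM    Backend = "ntm"
	BackendClaude Backend = "claude"
	BackendCodex  Backend = "codex"
	BackendBeads  Backend = "beads"
)

// resolveBackend is a hypothetical, simplified stand-in for the real
// Selector. It applies the documented resolution order; the first
// matching case wins.
// Assumption: the env override holds a backend name and acts as a pin.
func resolveBackend(pin, env Backend, optOut, ntmAvailable bool) (Backend, string) {
	switch {
	case pin != "":
		return pin, "explicit --pin"
	case env != "":
		return env, "AGENTOPS_ORCHESTRATION override"
	case optOut:
		return BackendBeads, "--opt-out -> beads floor"
	case ntmAvailable:
		return BackendNTM, "NTM probe reports available"
	default:
		return BackendClaude, "NTM absent -> claude fallback"
	}
}

func main() {
	// No pin, no env override, no opt-out, NTM not available.
	chosen, reason := resolveBackend("", "", false, false)
	fmt.Printf("Backend: %s\nReason:  %s\n", chosen, reason)
}
```

Keeping the ladder as a single ordered `switch` makes the precedence visible at a glance, which is the property the help text and the `--pin` flag description both rely on.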
162 changes: 162 additions & 0 deletions cli/cmd/ao/orchestrate_test.go
@@ -0,0 +1,162 @@
package main

import (
"bytes"
"context"
"encoding/json"
"errors"
"strings"
"testing"

"github.com/spf13/cobra"

"github.com/boshu2/agentops/cli/internal/orchestration"
"github.com/boshu2/agentops/cli/internal/ports"
)

// fakeRunner is an in-memory CommandRunner so the Select path can be
// exercised without shelling out to a real `ntm` binary.
type fakeRunner struct {
out []byte
err error
}

func (f fakeRunner) Run(_ context.Context, _ string, _ ...string) ([]byte, error) {
return f.out, f.err
}

func TestOrchestrate_WorkSpecFromFlags(t *testing.T) {
tests := []struct {
name string
pin string
optOut bool
wantPin ports.Backend
wantOpt bool
}{
{name: "empty", pin: "", optOut: false, wantPin: "", wantOpt: false},
{name: "pin trimmed", pin: " claude ", optOut: false, wantPin: ports.BackendClaude, wantOpt: false},
{name: "opt-out", pin: "", optOut: true, wantPin: "", wantOpt: true},
{name: "pin and opt-out both carried through", pin: "codex", optOut: true, wantPin: ports.BackendCodex, wantOpt: true},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
got := workSpecFromFlags(tc.pin, tc.optOut)
if got.Pin != tc.wantPin {
t.Fatalf("Pin: got %q, want %q", got.Pin, tc.wantPin)
}
if got.OptOut != tc.wantOpt {
t.Fatalf("OptOut: got %v, want %v", got.OptOut, tc.wantOpt)
}
})
}
}

// TestOrchestrate_SelectResolvesBackends drives a real Selector with an
// injected fake runner across the ladder branches, asserting the chosen
// backend for each flag combination.
func TestOrchestrate_SelectResolvesBackends(t *testing.T) {
t.Setenv("AGENTOPS_ORCHESTRATION", "") // neutralize any operator override

tests := []struct {
name string
runner orchestration.CommandRunner
pin string
optOut bool
want ports.Backend
}{
{
name: "ntm absent degrades to claude",
runner: fakeRunner{err: errors.New("ntm: not found")},
want: ports.BackendClaude,
},
{
name: "ntm available selects ntm",
runner: fakeRunner{out: []byte(`{"capabilities":["tmux","git"]}`)},
want: ports.BackendNTM,
},
{
name: "opt-out routes to beads floor",
runner: fakeRunner{err: errors.New("ntm: not found")},
optOut: true,
want: ports.BackendBeads,
},
{
name: "pin wins over availability",
runner: fakeRunner{out: []byte(`{"capabilities":["tmux","git"]}`)},
pin: "claude",
want: ports.BackendClaude,
},
{
name: "pin codex (never auto-selected) honored",
runner: fakeRunner{err: errors.New("ntm: not found")},
pin: "codex",
want: ports.BackendCodex,
},
}

for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
selector := orchestration.NewSelector(tc.runner)
work := workSpecFromFlags(tc.pin, tc.optOut)
trace, err := selector.Select(context.Background(), work)
if err != nil {
t.Fatalf("Select returned error: %v", err)
}
if trace.Chosen != tc.want {
t.Fatalf("Chosen: got %q, want %q", trace.Chosen, tc.want)
}
if len(trace.Considered) == 0 {
t.Fatal("Considered ladder must be recorded")
}
})
}
}

// TestOrchestrate_EmitSelectionTraceJSON asserts the JSON branch emits the
// trace verbatim and parses back into the port shape.
func TestOrchestrate_EmitSelectionTraceJSON(t *testing.T) {
trace := ports.SelectionTrace{
Chosen: ports.BackendBeads,
Reason: "WorkSpec.OptOut -> beads floor",
Considered: []ports.Backend{"pin", "env", "optout"},
}
cmd := &cobra.Command{}
var buf bytes.Buffer
cmd.SetOut(&buf)

if err := emitSelectionTrace(cmd, trace, true); err != nil {
t.Fatalf("emitSelectionTrace: %v", err)
}

var got ports.SelectionTrace
if err := json.Unmarshal(buf.Bytes(), &got); err != nil {
t.Fatalf("output is not valid JSON: %v", err)
}
if got.Chosen != ports.BackendBeads {
t.Fatalf("Chosen: got %q, want %q", got.Chosen, ports.BackendBeads)
}
}

// TestOrchestrate_EmitSelectionTraceHuman asserts the human-readable branch
// renders the backend, reason, and ladder.
func TestOrchestrate_EmitSelectionTraceHuman(t *testing.T) {
trace := ports.SelectionTrace{
Chosen: ports.BackendClaude,
Reason: "NTM absent -> claude-native fallback",
Considered: []ports.Backend{"pin", "env", "optout", "ntm", "claude", "beads"},
}
cmd := &cobra.Command{}
var buf bytes.Buffer
cmd.SetOut(&buf)

if err := emitSelectionTrace(cmd, trace, false); err != nil {
t.Fatalf("emitSelectionTrace: %v", err)
}

got := buf.String()
for _, want := range []string{"Backend: claude", "Reason: NTM absent", "pin -> env -> optout -> ntm -> claude -> beads"} {
if !strings.Contains(got, want) {
t.Fatalf("output missing %q\nfull output:\n%s", want, got)
}
}
}
2 changes: 1 addition & 1 deletion cli/cmd/ao/rpi_phased_stream.go
@@ -226,7 +226,7 @@ func selectExecutorFromCaps(caps backendCapabilities, statusPath string, allPhas
// The selection policy, chosen backend, and reason are logged to logPath for
// observability. Pass an empty logPath to skip log writing (e.g., in tests).
//
-// Selection order: runtime override (stream/direct) > auto (live-status=>stream, else direct).
+// Selection order: runtime override (stream/direct/tmux) > auto (always resolves to stream).
func selectExecutor(statusPath string, allPhases []PhaseProgress) PhaseExecutor {
return selectExecutorWithLog(statusPath, allPhases, "", "", false, defaultPhasedEngineOptions())
}
29 changes: 29 additions & 0 deletions cli/docs/COMMANDS.md
@@ -2069,6 +2069,35 @@ ao handoff [summary] [flags]

---

### `ao orchestrate`

Tooling for the orchestration safe-degradation ladder

```
ao orchestrate [command]
```

**Subcommands:**

#### `ao orchestrate select`

Resolve the orchestration backend via the safe-degradation ladder

```
ao orchestrate select [flags]
```

**Flags:**

```
-h, --help help for select
--json Emit the selection trace as JSON
--opt-out Bypass swarm engines and run on the beads floor
--pin string Force a backend: ntm|claude|codex|beads (overrides --opt-out and availability)
```

---

### `ao ratchet`

Track progress through the phased RPI workflow.
2 changes: 2 additions & 0 deletions cli/embedded/skills/using-agentops/SKILL.md
@@ -167,6 +167,8 @@ These are the skills every user needs first. Everything else is available when y
| `/scenario` | Author and manage holdout scenarios for behavioral validation |
| `/skill-auditor` | Two-pass audit of an existing SKILL.md against the unified template (15 checks) |
| `/skill-builder` | Scaffold or absorb new SKILL.md files against the unified template |
| `/automation-shape-routing` | Front door for building agent automation — decide the SHAPE (Workflow vs NTM swarm vs plain skill), then hand off to the right builder |
| `/workflow-builder` | Scaffold a new Claude Workflow script (`.claude/workflows/*.js`) — deterministic multi-agent orchestration |

## Expert Skills (specialized workflows)
