Run coding-agent CLIs — Claude Code, Codex, Antigravity, Gemini — from .NET: hardened spawning, streamed events, lifecycle, quota, metrics & optional rendering.
CodingAgentRunner gives a .NET application an LLM on your local machine, using the coding-agent CLI you already sign in to — with no API keys. Run a single prompt, or a full multi-turn session. It launches and supervises terminal-native coding agents (Claude Code, OpenAI Codex, Google Antigravity / agentapi, and the legacy Gemini CLI) as child processes — reliably, especially on Windows.
It is the process-and-protocol layer for those CLIs: it spawns the agent CLI with the right binary, environment, and isolation; normalizes its stream-json output — a different frame dialect per CLI — into one structured event vocabulary; classifies the run's outcome; enforces a platform-owns-git boundary; tracks remaining quota with a cache that polls more often as usage approaches the limit; records run metrics; and can render agent Markdown through an optional package. Unlike a general process wrapper such as CliWrap, it is specialized to coding-agent CLIs — it parses their stream-json output and classifies the run's outcome.
Status: core complete, pre-1.0. Extracted and generalized from Agent Studio, a production multi-agent orchestrator that has processed hundreds of millions of tokens through these CLIs. The spawn engine, descriptor-driven CLI catalog, event contract, outcome model, quota module, metrics recorder and optional rendering package are implemented and tested (366 tests, CI on Windows + Linux). BenchmarkDotNet micro-benchmarks are available as an optional manual run. The public API may still shift before 1.0 — pin a version and watch releases.
Running these CLIs from another process on Windows is full of footguns that each cost real debugging time. CodingAgentRunner encodes the fixes so you don't have to rediscover them:
- The `.cmd`-shim prompt truncation. On Windows, `claude`/`codex` often resolve to a `.cmd` shim; spawning it routes through `cmd.exe`, which silently truncates a multi-line prompt at the first newline, so the agent receives only the first line and no task. The runner resolves and launches the real `.exe`.
- stdin default-deny + handle scrubbing. A Node CLI that inherits a live stdin pipe (or an unrelated parent handle) can wedge during init on Windows. Runs get an immediate-EOF stdin by default, and a Win32 spawner hands the child only its three std pipes.
- Platform-owns-git boundary (defence-in-depth, not a sandbox). The reliable layer is a soft rule in the agent's instructions (`AGENTS.md`/`CLAUDE.md`: "don't run git; the host owns commit/push"). On top, an optional PATH-front `git` wrapper blocks mutating commands, but it is not a hard guarantee: a CLI that controls its own shell's PATH (e.g. claude-code's bash) can resolve the real `git` and bypass it. Treat it as an extra safety net, not a jail.
- Clean-context isolation. Each run gets an isolated CLI home so concurrent runs (and your own interactive session) don't collide.
- Completion you can trust. The library uses the CLI's own completion signal (the `stream-json` result frame + process exit), not a fragile `[[TASK_DONE]]`-scraping heuristic.
- Honest outcomes. A deliberate stop (user pause, watchdog) is reported as stopped, never as a `-1` crash; that is the distinction Windows' `Process.Kill` throws away.
- Quota awareness. Probe and cache the remaining rate-limit window per CLI, with an escalation policy that polls more often as you approach the limit.
See docs/why-windows-hardening.md for the full stories behind each.
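To make the "honest outcomes" point concrete, here is a minimal sketch of a three-valued classification from exit code plus stop reason. The names `RunOutcome`, `StopReason` and `OutcomeClassifier` are hypothetical and chosen for illustration; the library's own types and rules may differ.

```csharp
using System;

// A recorded, deliberate stop wins over whatever exit code the killed process reports.
Console.WriteLine(OutcomeClassifier.Classify(exitCode: 0, StopReason.None));      // Completed
Console.WriteLine(OutcomeClassifier.Classify(exitCode: -1, StopReason.Watchdog)); // Stopped
Console.WriteLine(OutcomeClassifier.Classify(exitCode: -1, StopReason.None));     // Failed

// Hypothetical types, for illustration only.
public enum RunOutcome { Completed, Stopped, Failed }

public enum StopReason { None, UserPause, Watchdog }

public static class OutcomeClassifier
{
    public static RunOutcome Classify(int exitCode, StopReason stopReason)
    {
        // The host asked for the stop, so a non-zero exit code is expected, not a crash.
        if (stopReason != StopReason.None) return RunOutcome.Stopped;

        // No stop was requested: the exit code alone decides.
        return exitCode == 0 ? RunOutcome.Completed : RunOutcome.Failed;
    }
}
```

The key design choice is that the host remembers why it killed the process, because the exit code alone cannot distinguish a deliberate stop from a crash.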
The library targets known coding-agent CLIs in their specific versions — it is purpose-built for them, not a generic "wrap any CLI" framework, and there is no "add your own CLI" extension point.
| Agent | Type id | Status | Context | Adapter / stream | Notes |
|---|---|---|---|---|---|
| Claude Code | `claude` | supported | clean or shared | Claude stream-json adapter | First-class driver. |
| OpenAI Codex | `codex` | supported | clean or shared | Codex stream-json adapter | First-class driver, including reasoning-model liveness metadata. |
| Google Gemini | `gemini` | deprecated | shared only | Gemini adapter | Legacy Google driver kept for compatibility; no new feature work. |
| Google Antigravity (agentapi) | `antigravity` | driver shipped | shared only | reuses Gemini adapter | Maintained Google path; kept out of the default selectable set (`CliTypes.All`) until a consumer migrates. |
| GitHub Copilot | `copilot` | removed | n/a | no supported adapter | Removed because the headless path was PTY/TUI-dependent and did not fit the hardened structured stream engine. |
Context means the CLI's persistent home/state, not the repository or prompt size.
`clean` creates a temporary per-run CLI home and seeds only the minimum auth/base config
(Claude via `CLAUDE_CONFIG_DIR`, Codex via `CODEX_HOME`). `shared` uses the operator's
normal signed-in CLI home, including settings, cache, memory and prior CLI state. Repo
files and repo instruction files such as `AGENTS.md` / `CLAUDE.md` remain visible in
both modes because they come from the checkout, not from the CLI home.
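As a rough illustration of the `clean` mode, the sketch below redirects a CLI's home to a fresh temporary directory through an environment variable on the child's start info. The helper `WithCleanHome` is hypothetical, `claude` being on `PATH` is an assumption, and the seeding of auth/base config that the library performs is deliberately left out.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not the library's API: builds start info for a CLI whose
// home directory is redirected to a fresh temp folder via an environment variable.
static ProcessStartInfo WithCleanHome(string fileName, string homeVariable, out string cleanHome)
{
    cleanHome = Path.Combine(Path.GetTempPath(), "cli-home-" + Guid.NewGuid().ToString("N"));
    Directory.CreateDirectory(cleanHome);

    var startInfo = new ProcessStartInfo(fileName) { UseShellExecute = false };
    startInfo.Environment[homeVariable] = cleanHome; // only this child sees the override
    return startInfo;
}

// Prepare (but do not start) a run; "claude" being on PATH is an assumption.
var psi = WithCleanHome("claude", "CLAUDE_CONFIG_DIR", out var home);
Console.WriteLine(psi.Environment["CLAUDE_CONFIG_DIR"] == home); // True

// Remove the per-run home once the run is over.
Directory.Delete(home, recursive: true);
```

Because the override lives on the child's environment block only, the operator's own signed-in session keeps using its normal home.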
Internally each CLI is a CliDescriptor — data plus a few pure delegates — in a fixed catalog, and one internal sealed engine is parameterized by it; covering a CLI is a descriptor in the library, not a subclass and not a consumer plug-in. The adapters fold each CLI's own dialect onto one event vocabulary, so your run logic is written once. See Architecture and Cross-CLI normalization.
Google status: Gemini is deprecated and superseded by Antigravity, Google's `agentapi` CLI. The Gemini driver stays in place for existing consumers; Antigravity is where new Google integration work belongs.
- ✅ Hardened binary resolution (`.cmd` → `.exe`, the prompt-truncation fix).
- ✅ Process spawning: environment hardening, clean-context isolation, git-guard, stdin default-deny, Win32 handle-scrub spawner.
- ✅ One event vocabulary across CLIs: each CLI's `stream-json` dialect normalized to the same closed `CliRunEvent` sum type (incl. the CLI's real completion signal); write run logic once, never branch on the CLI or model.
- ✅ One terminal event: `RunEnded` with a 3-valued outcome (completed / stopped / failed), classified from exit code + stop reason.
- ✅ Typed interrupt classification (opt-in): an `IInterruptClassifier` raises a `CliRunEvent.Interrupt(InterruptReason, …, IsFatal)` for stop-worthy conditions (environment blocker, quota exhausted, silent completion, …); the library emits the event, your code decides whether to `Stop`.
- ✅ Per-CLI capabilities (`CliCapabilities`: clean-context, resume, heartbeat-during-thinking, thinking levels) and overridable defaults resolved by specificity (`CliScope`/`CliDefault<T>`: CLI ▸ model ▸ thinking level); ask the capability, don't switch on the CLI type.
- ✅ Built-in silence watchdog (`RunWatchdog`/`WatchdogPolicy`): one-line attach, phase-aware budgets.
- ✅ Durable per-stream output logs (crash-tolerant, fsync per line).
- ✅ Platform-owns-git guard (brand-neutral, configurable).
- ✅ Quota cache · escalation · cap/gate · free event-harvest (poll harder near the limit; skip a run before it hits the wall).
- ✅ Pluggable process spawner (`CliOptions.Spawner`/`ICliProcessSpawner`): inject a custom launcher (e.g. a Windows PTY); null uses redirected pipes.
- ✅ Run metrics from the event stream (`RunMetricsRecorder`) plus an optional `CodingAgentRunner.Rendering` package for Markdown/HTML UI output.
- ✅ Optional BenchmarkDotNet micro-benchmarks for adapter parsing, usage parsing and rendering hot paths.
- 🚧 Concrete PTY-based quota probes (the `IQuotaProbe` contract + cache are done; plug your own probe today).
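The quota escalation idea in the list above ("poll harder near the limit") can be sketched as picking a cache lifetime from the highest tier the current usage has reached. This is an illustrative model only: `Tier`, `Escalation.TtlFor` and the threshold values are assumptions, not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative tiers: the closer to the limit, the shorter the cache lifetime.
var tiers = new List<Tier>
{
    new(90, TimeSpan.FromMinutes(2)),
    new(97, TimeSpan.FromSeconds(30)),
};
var defaultTtl = TimeSpan.FromMinutes(10);

Console.WriteLine(Escalation.TtlFor(50, defaultTtl, tiers)); // 00:10:00
Console.WriteLine(Escalation.TtlFor(93, defaultTtl, tiers)); // 00:02:00
Console.WriteLine(Escalation.TtlFor(99, defaultTtl, tiers)); // 00:00:30

// Hypothetical types, for illustration only.
public sealed record Tier(double UsedPercentAtLeast, TimeSpan Ttl);

public static class Escalation
{
    // Returns the TTL of the highest tier whose threshold the usage has reached,
    // or the default TTL when no tier applies.
    public static TimeSpan TtlFor(double usedPercent, TimeSpan defaultTtl, IEnumerable<Tier> tiers)
    {
        var match = tiers
            .Where(t => usedPercent >= t.UsedPercentAtLeast)
            .OrderByDescending(t => t.UsedPercentAtLeast)
            .FirstOrDefault();
        return match?.Ttl ?? defaultTtl;
    }
}
```

Sorting by threshold inside the lookup means the tiers can be declared in any order without changing the result.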
```csharp
using CodingAgentRunner;
using CodingAgentRunner.Abstractions;
using CodingAgentRunner.Events;
using CodingAgentRunner.Execution;

// Wire the library once; resolve a driver per CLI.
var runner = new CliRunner(new CliOptions());
var driver = runner.Get("claude");

// Drive a watchdog / UI from the typed event stream.
driver.OnRunEvent += (runId, evt) =>
{
    switch (evt)
    {
        case CliRunEvent.OutputDelta d: Console.Write(d.Text); break;
        case CliRunEvent.ToolStarted t: Console.WriteLine($"\n[tool] {t.ToolName} {t.Argument}"); break;
        case CliRunEvent.TurnCompleted c: Console.WriteLine($"\n[done] {c.UsageSummary}"); break;
    }
};

driver.OnFinished += (runId, run) =>
    Console.WriteLine($"\nRun {runId}: {run.Status} (exit {run.ExitCode}, {run.DurationSeconds:F1}s)");

var (run, error) = await driver.StartAsync(new CliRunRequest
{
    RunId = "task-1",
    Prompt = "Add a build-status badge to the README.",
    WorkingDirectory = @"C:\repo",
    Model = "claude-opus-4-8",
    ThinkingLevel = "high",
});

// ... later, to stop a run on purpose (reported as 'stopped', not a crash):
driver.Stop("task-1");
```

```csharp
using CodingAgentRunner.Quota;

// Plug your own probe (HTTP call, CLI scrape, …) behind the IQuotaProbe contract.
var quota = new QuotaService(
    probes: new[] { myClaudeQuotaProbe },
    options: new QuotaCacheOptions
    {
        DefaultTtl = TimeSpan.FromMinutes(10),
        EscalationTiers =
        [
            new QuotaEscalationTier(90, TimeSpan.FromMinutes(2)),  // ≥90% used → poll every 2 min
            new QuotaEscalationTier(97, TimeSpan.FromSeconds(30)), // ≥97% used → every 30 s
        ],
    });

QuotaReport report = quota.GetWithBackgroundRefresh(); // cached now; refreshes stale entries in the background
```

```csharp
using CodingAgentRunner.Metrics;

var metrics = new RunMetricsRecorder();
driver.OnRunEvent += (_, evt) => metrics.Observe(evt);
driver.OnFinished += (runId, _) =>
{
    RunMetrics snapshot = metrics.Build();
    Console.WriteLine($"first output: {snapshot.TimeToFirstOutputMs / 1000.0:F1}s");
    Console.WriteLine($"output tokens/sec: {snapshot.AverageOutputTokensPerSec:F1}");
};
```

Metrics are reconstructed from the same `CliRunEvent` stream your UI or watchdog
already consumes. The recorder does not poll the process or parse raw logs.
```csharp
using CodingAgentRunner.Rendering;

IReadOnlyList<RenderedLine> lines = MarkdownRenderer.ToLines(agentMarkdown);
string html = string.Concat(lines.Select(HtmlRenderer.SpansToHtml));
```

`CodingAgentRunner.Rendering` is opt-in and one-way (Rendering depends on core;
core never references Rendering). It maps agent Markdown onto a presentation-neutral
span/line model, injects links through a pluggable `LinkResolver`, and can materialize
Markdown or HTML for UI consumers. The default resolver (`LinkExtractor.WebDefault`)
enforces an http/https/mailto allowlist that rejects `javascript:` / `data:` targets,
so HTML output is XSS-safe by default. Event-stream consumers never pay this dependency.
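For readers who want to see what a scheme allowlist of this kind looks like, here is a small sketch built on `System.Uri`. `LinkPolicy.IsAllowed` is a hypothetical name and this is not the code behind `LinkExtractor.WebDefault`; it only illustrates the http/https/mailto rule described above.

```csharp
using System;

Console.WriteLine(LinkPolicy.IsAllowed("https://example.com/docs")); // True
Console.WriteLine(LinkPolicy.IsAllowed("mailto:dev@example.com"));   // True
Console.WriteLine(LinkPolicy.IsAllowed("javascript:alert(1)"));      // False
Console.WriteLine(LinkPolicy.IsAllowed("data:text/html,<b>x</b>"));  // False

// Hypothetical helper, for illustration only.
public static class LinkPolicy
{
    // Accepts a link target only if it parses as an absolute URI whose scheme
    // is http, https or mailto; everything else is rejected.
    public static bool IsAllowed(string target) =>
        Uri.TryCreate(target, UriKind.Absolute, out var uri)
        && (uri.Scheme == Uri.UriSchemeHttp
            || uri.Scheme == Uri.UriSchemeHttps
            || uri.Scheme == Uri.UriSchemeMailto);
}
```

An allowlist fails closed: an unknown or malformed target is rejected without needing a rule of its own. Note that this sketch also rejects relative links, which a real resolver may want to handle separately.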
```text
src/CodingAgentRunner/
  CliRunner.cs     entry point: resolves one ICliDriver per CLI from the catalog
  Abstractions/    consumer options + IUserHome/IRunLogPath providers
  Model/           value types, the run-outcome classifier, model catalog, CliCapabilities
  Events/          CliRunEvent contract, phase machine, Interrupt + InterruptReason, watchdog
  Adapters/        stream-json → CliRunEvent (Claude / Codex / Gemini; Antigravity reuses Gemini)
  Execution/       CliRunEngine (one engine parameterized by CliDescriptor), CliCatalog,
                   BuiltInDescriptors, LaunchSpec, hardening, clean-context, log stores, Win32 spawner
  Metrics/         RunMetrics / TurnMetrics / RunMetricsRecorder / UsageSummaryParser
  Quota/           quota model, escalation cache, probe contract
src/CodingAgentRunner.Rendering/          optional, opt-in: span/line model, link injection,
                                          Markdown/HTML rendering (depends on core; core does not depend on it)
tests/CodingAgentRunner.Tests/            xUnit tests
benchmarks/CodingAgentRunner.Benchmarks/  BenchmarkDotNet micro-bench: parse / metrics / render throughput
docs/                                     developer wiki (architecture, the "why")
website/                                  project website (static, English)
website/data/cli-performance-observations.json
                                          measured end-to-end CLI performance scenario data with source-test references
```
```bash
dotnet build
dotnet test
```

Requires the .NET 10 SDK.
Micro-benchmarks of the library's own parsing, metrics and rendering overhead — the
per-line cost the host pays while reading agent output (not end-to-end model
benchmarks, which would mean spawning a real CLI). These are optional manual runs;
dotnet test and release validation do not execute BenchmarkDotNet:
```bash
dotnet run -c Release --project benchmarks/CodingAgentRunner.Benchmarks -- --filter '*'
dotnet run -c Release --project benchmarks/CodingAgentRunner.Benchmarks -- --filter '*' --job dry
```

See `benchmarks/CodingAgentRunner.Benchmarks` for the benchmark classes and the fast smoke command.
End-to-end CLI performance observations live separately in
website/data/cli-performance-observations.json.
Those rows are measured local CLI executions, and each scenario links to source-level
tests through sourceTests. The JSON also records contextTokens,
cached/cache-creation input tokens, output tokens, reasoning tokens and
totalTokensUsed so first-output latency can be compared against real prompt size.
Releases are published to nuget.org by the release GitHub workflow when a
v*.*.* tag is pushed; the package version is derived from the tag.
```bash
scripts/release.sh 0.1.0   # validates, tests, tags v0.1.0, pushes the tag
# scripts/pack.sh          # local pack into ./artifacts (no publish)
```

Auth uses nuget.org Trusted Publishing (OIDC), so there is no API key to
create, store, or rotate. GitHub Actions mints a short-lived OIDC token that
nuget.org validates against a Trusted Publishing policy (package owner + this
repo + release.yml). Nothing secret lives in the repo, the scripts, or the
workflow. While the API is pre-1.0, publish 0.x versions.
See CONTRIBUTING.md. Issues and PRs welcome.
Apache-2.0 © 2026 Robert Mischke.