MUL-3414: hint custom-runtime-profile compatibility, name failure mode by multica-eve · Pull Request #4301 · multica-ai/multica

multica-eve · 2026-06-18T08:09:54Z

Closes MUL-3414

Background

GitHub bug #4293:
admins created a custom runtime profile, kept the built-in
protocol_family (e.g. cursor, claude), and pointed command_name
at grok / droid. The runtime registered, came online, kept emitting
heartbeats, and then failed every claimed task — with a generic
"agent backend failed" error that gave no hint the profile itself was
the cause. Triage on the issue (GPT-Boy) confirmed: the daemon launches
the custom command with the family's hard-coded launch arguments and
parses its stdout against the family's protocol; nothing dynamically
adapts to a different CLI.

This PR ships the agreed-on "hints + clear errors"
fix. It does NOT add Grok/Droid support — that lives separately as
#4111.

Core changes

UI hint — runtime-profiles-dialog.tsx
- Family-pick step: amber callout naming the failure mode
  (registers, comes online, fails every task with empty output)
  so admins see the boundary before they pick claude intending to
  run grok.
- Command field: a per-family hint
  (Must accept <family>'s launch arguments and produce
  <family>-compatible output. … grok or droid don't and need a
  first-class provider) so the boundary is repeated next to the
  input where they are typing the binary name.
- Locale strings added to en / zh-Hans / ja / ko runtimes.json;
  parity test stays green.
CLI hint — cmd_runtime_profile.go
- multica runtime profile create gains a Long help block
  enumerating the supported families and explaining that
  non-compatible CLIs come online but fail every task.
- --protocol-family / --command-name flag descriptions repeat
  the boundary so admins reading --help see it inline.
Daemon clear error — daemon.go
- runTask now retains isCustomProfile and customCommandPath
  after the existing customCommandPathForRuntime lookup.
- Both error paths (backend.Execute returning an error and the
  default: Result-status branch) call a new
  wrapCustomProfileExecError(provider, command, raw) and pin
  failure_reason = agent_error.runtime_version_unsupported. The
  poisoned-API 400 classifier still wins, so genuine upstream
  rejections keep their existing reason.
- The wrapped comment names the protocol family, the actual
  command path, the contract (must accept family-compatible
  arguments and output), and includes the original error so daemon
  log forensics still work.
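
The comment-wrapping step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the return type, the fallback values for empty inputs, and the exact message wording after "protocol family" are assumptions (the source elides the rest of the message). Only the function name, its three parameters, and the failure_reason value come from the description. A later commit in this PR reduces the command to its base name before it reaches a user-visible comment; this sketch shows the first-cut behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// Value quoted in the PR description; how the daemon stores it is not shown here.
const reasonRuntimeVersionUnsupported = "agent_error.runtime_version_unsupported"

// wrapCustomProfileExecError builds a comment that names the protocol family,
// the command, the compatibility contract, and the original error.
// Hypothetical reconstruction: wording and defaults are assumptions.
func wrapCustomProfileExecError(provider, command string, raw error) string {
	if provider == "" {
		provider = "unknown" // assumed default
	}
	if command == "" {
		command = "(unset)" // assumed default
	}
	detail := "no error detail" // assumed default when there is no raw error
	if raw != nil {
		detail = raw.Error()
	}
	return fmt.Sprintf(
		"Custom runtime profile is incompatible with the selected %s protocol family: "+
			"command %q must accept %s-compatible launch arguments and produce %s-compatible output. "+
			"Original error: %s",
		provider, command, provider, provider, detail)
}

func main() {
	msg := wrapCustomProfileExecError("claude", "/usr/local/bin/grok", errors.New("agent backend failed"))
	fmt.Println(msg)
	fmt.Println("failure_reason =", reasonRuntimeVersionUnsupported)
}
```

Keeping the original error at the end of the comment is what preserves the daemon-log forensics mentioned above: the generic backend message stays searchable while the prefix tells the admin where to look.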

Out of scope (intentionally)

- No server-side strict validation of command_name at create/update
  time. The server doesn't know each host's PATH and command_name
  is allowed to be a wrapper, so a strong validator would mis-fail.
- fixed_args is still not exposed (the daemon's existing TODO under
  MUL-3284 still applies). Exposing it now would offer admins a
  workaround that doesn't actually take effect.
- No first-class Grok / Droid backend.

Tests

- packages/views/runtimes/components/runtime-profiles-dialog.test.tsx:
  2 new cases, covering the family-callout copy on the family step
  and the per-family command hint after picking cursor.
- server/internal/daemon/runtime_profile_runtask_test.go (new):
  shape + defaults of wrapCustomProfileExecError, plus a
  behavioural runTask case that proves a custom-profile exec
  failure becomes Status=blocked / FailureReason=runtime_version_unsupported
  with the refined comment, and a guard that built-in-runtime
  failures are NOT rewritten (so the taxonomy used by failure
  analytics stays stable).

Verification

- pnpm --filter @multica/views typecheck → ok
- pnpm --filter @multica/views test → 1419 passed (incl. locale
  parity)
- go test ./internal/daemon/... ./pkg/taskfailure/... ./pkg/agent/... ./cmd/multica/... → all green
- go test -race ./internal/daemon/... → clean
- Pre-existing handler test failures referencing a missing
  source_task_id column reproduce on main and are unrelated to
  this change.

Custom runtime profiles silently failed when admins reused a built-in protocol family (e.g. cursor, claude) but pointed command_name at a non-compatible CLI (grok, droid). The runtime registered, came online, and emitted heartbeats; every task then failed with a generic backend error and no clue that the profile itself was the cause.

This change makes the boundary visible at create time and named at fail time, without trying to support arbitrary third-party CLIs:

- UI: dialog renders a family-compatibility callout on the family-pick step and a per-family compatibility line under the command input, with locale strings for en / zh-Hans / ja / ko.
- CLI: `multica runtime profile create` gains a Long help block and per-flag help that document the same boundary so non-UI admins see it in `--help`.
- Daemon: when a custom-profile runtime's backend exec fails (raw error or non-completed Result.Status), runTask rewrites the comment to "Custom runtime profile is incompatible with the selected <family> protocol family …" and pins failure_reason to agent_error.runtime_version_unsupported. The poisoned-API 400 path still wins so genuine upstream rejections keep their existing classification.

Tests:

- runtime-profiles-dialog.test.tsx: 2 new cases for the family callout and the command hint (full file: 5 tests).
- runtime_profile_runtask_test.go: wrapCustomProfileExecError unit shape + defaults, and a behavioural runTask test confirming the custom path returns a blocked TaskResult with the refined failure_reason. A built-in-runtime regression guard ensures the rewrite stays gated on isCustomProfile.

Verification:

- pnpm --filter @multica/views typecheck → ok
- pnpm --filter @multica/views test → 1419 passed
- go test ./internal/daemon/... ./pkg/taskfailure/... ./pkg/agent/... ./cmd/multica/... → all green; race-detector run on ./internal/daemon/... also clean.

Co-authored-by: multica-agent <github@multica.ai>

vercel · 2026-06-18T08:10:42Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
multica-docs	Ready	Preview, Comment	Jun 18, 2026 9:12am

…over timeout/idle (MUL-3414)

GPT-Boy's review on the first cut surfaced three regressions that the original implementation either caused or missed. This commit fixes all three:

1. Absolute command path leaked into user-visible comments. The `set-path` CLI (cmd_runtime_profile.go) explicitly documents the per-machine override as state that "never leaves the machine", but `wrapCustomProfileExecError` was echoing the full path back into the issue/chat comment, which is both a privacy regression and a contract break. New `safeProfileCommandLabel` strips to filepath.Base; the full path stays in the structured daemon log only via taskLog fields.

2. The rewrite was over-broad: every custom-profile failure except poisoned-API 400 was retagged as runtime_version_unsupported, which hid real auth / quota / network / context_overflow / model_not_found errors that same-protocol wrappers can hit just like the upstream CLI. New `shouldRewriteAsCustomProfileIncompatible` predicate gates the rewrite to genuine protocol-shape failures only: process_failure, empty_or_unparseable_output, unknown. Other reasons pass through with the classifier's verdict intact, both in the executeAndDrain-error path (returns the raw error so handleTask runs the canonical FailTask classifier) and in the default-status path (TaskResult keeps the classifier's failure_reason and the original error string). An exhaustive predicate test enumerates every taskfailure.AllReasons() entry to catch future taxonomy drift.

3. The droid-shape case (binary launches, sits silent because it doesn't speak the protocol, gets killed by timeout/idle_watchdog) was uncovered. When isCustomProfile && result.SessionID == "", the timeout / idle_watchdog branches now append a compatibility hint via `appendCustomProfileSilenceHint`. failure_reason stays timeout / idle_watchdog so runtime sweepers and operator dashboards keep their existing semantics (a real long-running tool call still belongs in the timeout bucket); the user-visible comment is the right place for the hint, the analytics taxonomy is not.

Side support: added a small package-level `agentNew = agent.New` hook mirroring the existing `detectAgentVersion` / `lookPath` pattern so runTask integration tests can drive the post-executeAndDrain switch end-to-end with a stub backend.

Test coverage:

- `TestSafeProfileCommandLabel`: 8 sub-cases pinning the redaction.
- `TestWrapCustomProfileExecError_RedactsAbsolutePath`: privacy regression guard.
- `TestShouldRewriteAsCustomProfileIncompatible`: exhaustive over taskfailure.AllReasons().
- `TestAppendCustomProfileSilenceHint`: hint-shape unit.
- runTask integration: ExecError-rewrite, ExecError-auth-passthrough, FailedResult-auth-preserves-reason, FailedResult-protocol-shape-rewrites, Timeout-no-session-hint, Timeout-with-session-no-hint, IdleWatchdog-no-session-hint, BuiltIn-exec-error-stays-legacy.

Verification:

- go test ./internal/daemon/... → green
- go test -race ./internal/daemon/... → clean
- go vet ./... → no issues

Co-authored-by: multica-agent <github@multica.ai>
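
The three helpers named in this commit message could look roughly like the sketch below. The function names come from the commit message; the signatures, the plain-string reason values (the real code uses the pkg/taskfailure taxonomy, whose types are not shown here), the empty-input placeholder, and the hint wording are all assumptions made for illustration. In particular, the isCustomProfile / SessionID check is folded into the hint helper here for self-containment, whereas the commit message describes that check as living in the runTask branches.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeProfileCommandLabel reduces a command path to its base name so the
// per-machine absolute path never reaches a user-visible comment.
func safeProfileCommandLabel(command string) string {
	command = strings.TrimSpace(command)
	if command == "" {
		return "custom command" // assumed placeholder for an empty value
	}
	return filepath.Base(command)
}

// shouldRewriteAsCustomProfileIncompatible reports whether a classified
// failure reason looks like a protocol-shape failure. Only the three reasons
// listed in the commit message qualify; everything else (auth, quota,
// network, ...) keeps the classifier's verdict.
func shouldRewriteAsCustomProfileIncompatible(reason string) bool {
	switch reason {
	case "process_failure", "empty_or_unparseable_output", "unknown":
		return true
	default:
		return false
	}
}

// appendCustomProfileSilenceHint adds a compatibility hint to a timeout or
// idle-watchdog comment when a custom-profile run never opened a session.
// The failure reason itself is left untouched by the caller.
func appendCustomProfileSilenceHint(comment string, isCustomProfile bool, sessionID, label string) string {
	if !isCustomProfile || sessionID != "" {
		return comment
	}
	// Hint wording is illustrative only.
	return comment + " Hint: " + label + " never started a session; it may not speak the selected protocol family."
}

func main() {
	label := safeProfileCommandLabel("/home/admin/bin/droid")
	fmt.Println(label)
	fmt.Println(shouldRewriteAsCustomProfileIncompatible("process_failure"))
	fmt.Println(shouldRewriteAsCustomProfileIncompatible("auth"))
	fmt.Println(appendCustomProfileSilenceHint("task timed out", true, "", label))
}
```

An allow-list predicate, as opposed to a deny-list, means a newly added failure reason defaults to passing through unchanged, which matches the commit's stated goal of not hiding genuine upstream errors.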

vercel Bot deployed to Preview – multica-docs June 18, 2026 08:10 View deployment

vercel Bot deployed to Preview – multica-docs June 18, 2026 09:12 View deployment

multica-eve wants to merge 2 commits into main from fix/MUL-3414-custom-profile-incompatible-hint
