MUL-3414: hint custom-runtime-profile compatibility, name failure mode #4301

Open
multica-eve wants to merge 2 commits into main from fix/MUL-3414-custom-profile-incompatible-hint

@multica-eve

Collaborator

Closes MUL-3414

Background

GitHub bug #4293:
admins created a custom runtime profile, kept the built-in
protocol_family (e.g. cursor, claude), and pointed command_name
at grok / droid. The runtime registered, came online, kept emitting
heartbeats, and then failed every claimed task — with a generic
"agent backend failed" error that gave no hint the profile itself was
the cause. Triage on the issue (GPT-Boy) confirmed: the daemon launches
the custom command with the family's hard-coded launch arguments and
parses its stdout against the family's protocol; nothing dynamically
adapts to a different CLI.

This PR ships the agreed-on "hints + clear errors" fix. It does NOT
add Grok/Droid support; that lives separately as #4111.

Core changes

  1. UI hint: runtime-profiles-dialog.tsx

    • Family-pick step: amber callout naming the failure mode
      (registers, comes online, fails every task with empty output)
      so admins see the boundary before they pick claude intending to
      run grok.
    • Command field: a per-family hint
      (Must accept <family>'s launch arguments and produce
      <family>-compatible output. … grok or droid don't and need a
      first-class provider) so the boundary is repeated next to the
      input where they are typing the binary name.
    • Locale strings added to en / zh-Hans / ja / ko runtimes.json;
      parity test stays green.
  2. CLI hint: cmd_runtime_profile.go

    • multica runtime profile create gains a Long help block
      enumerating the supported families and explaining that
      non-compatible CLIs come online but fail every task.
    • --protocol-family / --command-name flag descriptions repeat
      the boundary so admins reading --help see it inline.
  3. Daemon clear error: daemon.go

    • runTask now retains isCustomProfile and customCommandPath
      after the existing customCommandPathForRuntime lookup.
    • Both error paths (backend.Execute returning an error and the
      default: Result-status branch) call a new
      wrapCustomProfileExecError(provider, command, raw) and pin
      failure_reason = agent_error.runtime_version_unsupported. The
      poisoned-API 400 classifier still wins, so genuine upstream
      rejections keep their existing reason.
    • The wrapped comment names the protocol family, the actual
      command path, the contract (the command must accept the family's
      launch arguments and produce family-compatible output), and
      includes the original error so daemon log forensics still work.

Out of scope (intentionally)

  • No server-side strict validation of command_name at create/update
    time. The server doesn't know each host's PATH and command_name
    is allowed to be a wrapper, so a strong validator would mis-fail.
  • fixed_args is still not exposed (the daemon's existing TODO under
    MUL-3284 still applies). Exposing it now would offer admins a
    workaround that doesn't actually take effect.
  • No first-class Grok / Droid backend.

Tests

  • packages/views/runtimes/components/runtime-profiles-dialog.test.tsx
    — 2 new cases: family-callout copy on the family step, and the
    per-family command hint after picking cursor.
  • server/internal/daemon/runtime_profile_runtask_test.go (new) —
    shape + defaults of wrapCustomProfileExecError, plus a
    behavioural runTask case that proves a custom-profile exec
    failure becomes Status=blocked / FailureReason=runtime_version_unsupported
    with the refined comment, and a guard that built-in-runtime
    failures are NOT rewritten (so the taxonomy used by failure
    analytics stays stable).

Verification

  • pnpm --filter @multica/views typecheck → ok
  • pnpm --filter @multica/views test → 1419 passed (incl. locale
    parity)
  • go test ./internal/daemon/... ./pkg/taskfailure/... ./pkg/agent/... ./cmd/multica/... → all green
  • go test -race ./internal/daemon/... → clean
  • Pre-existing handler test failures referencing a missing
    source_task_id column reproduce on main and are unrelated to
    this change.

Custom runtime profiles silently failed when admins reused a built-in
protocol family (e.g. cursor, claude) but pointed command_name at a
non-compatible CLI (grok, droid). The runtime registered, came online,
and emitted heartbeats — every task then failed with a generic backend
error and no clue that the profile itself was the cause.

This change makes the boundary visible at create time and named at
fail time, without trying to support arbitrary third-party CLIs:

- UI: dialog renders a family-compatibility callout on the family-pick
  step and a per-family compatibility line under the command input,
  with locale strings for en / zh-Hans / ja / ko.
- CLI: `multica runtime profile create` gains a Long help block and
  per-flag help that document the same boundary so non-UI admins see
  it in `--help`.
- Daemon: when a custom-profile runtime's backend exec fails (raw
  error or non-completed Result.Status), runTask rewrites the comment
  to "Custom runtime profile is incompatible with the selected
  <family> protocol family …" and pins failure_reason to
  agent_error.runtime_version_unsupported. The poisoned-API 400 path
  still wins so genuine upstream rejections keep their existing
  classification.

Tests:
- runtime-profiles-dialog.test.tsx: 2 new cases for the family
  callout and the command hint (full file: 5 tests).
- runtime_profile_runtask_test.go: wrapCustomProfileExecError unit
  shape + defaults, and a behavioural runTask test confirming the
  custom path returns a blocked TaskResult with the refined
  failure_reason. A built-in-runtime regression guard ensures the
  rewrite stays gated on isCustomProfile.

Verification:
- pnpm --filter @multica/views typecheck → ok
- pnpm --filter @multica/views test → 1419 passed
- go test ./internal/daemon/... ./pkg/taskfailure/... ./pkg/agent/...
  ./cmd/multica/... → all green; race-detector run on ./internal/daemon/...
  also clean.

Co-authored-by: multica-agent <github@multica.ai>
@vercel

vercel Bot commented Jun 18, 2026

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| multica-docs | Ready | Preview, Comment | Jun 18, 2026 9:12am |


…over timeout/idle (MUL-3414)

GPT-Boy's review on the first cut surfaced three regressions that the
original implementation either caused or missed. This commit fixes all
three:

1. Absolute command path leaked into user-visible comments. The
   `set-path` CLI (cmd_runtime_profile.go) explicitly documents the
   per-machine override as state that "never leaves the machine", but
   `wrapCustomProfileExecError` was echoing the full path back into the
   issue/chat comment — a privacy regression and a contract break.
   New `safeProfileCommandLabel` strips to filepath.Base; the full path
   stays in the structured daemon log only via taskLog fields.

2. The rewrite was over-broad: every custom-profile failure except
   poisoned-API 400 was retagged as runtime_version_unsupported, which
   hid real auth / quota / network / context_overflow / model_not_found
   errors that same-protocol wrappers can hit just like the upstream
   CLI. New `shouldRewriteAsCustomProfileIncompatible` predicate gates
   the rewrite to genuine protocol-shape failures only:
   process_failure, empty_or_unparseable_output, unknown. Other reasons
   pass through with the classifier's verdict intact, both in the
   executeAndDrain-error path (returns the raw error so handleTask runs
   the canonical FailTask classifier) and in the default-status path
   (TaskResult keeps the classifier's failure_reason and the original
   error string). An exhaustive predicate test enumerates every
   taskfailure.AllReasons() entry to catch future taxonomy drift.

3. The droid-shape case (binary launches, sits silent because it doesn't
   speak the protocol, gets killed by timeout/idle_watchdog) was
   uncovered. When isCustomProfile && result.SessionID == "", the
   timeout / idle_watchdog branches now append a compatibility hint via
   `appendCustomProfileSilenceHint`. failure_reason stays
   timeout / idle_watchdog so runtime sweepers and operator dashboards
   keep their existing semantics (a real long-running tool call still
   belongs in the timeout bucket); the user-visible comment is the
   right place for the hint, the analytics taxonomy is not.

Side support: added a small package-level `agentNew = agent.New` hook
mirroring the existing `detectAgentVersion` / `lookPath` pattern so
runTask integration tests can drive the post-executeAndDrain switch
end-to-end with a stub backend.

Test coverage:
- `TestSafeProfileCommandLabel` — 8 sub-cases pinning the redaction.
- `TestWrapCustomProfileExecError_RedactsAbsolutePath` — privacy
  regression guard.
- `TestShouldRewriteAsCustomProfileIncompatible` — exhaustive over
  taskfailure.AllReasons().
- `TestAppendCustomProfileSilenceHint` — hint-shape unit.
- runTask integration: ExecError-rewrite, ExecError-auth-passthrough,
  FailedResult-auth-preserves-reason, FailedResult-protocol-shape-
  rewrites, Timeout-no-session-hint, Timeout-with-session-no-hint,
  IdleWatchdog-no-session-hint, BuiltIn-exec-error-stays-legacy.
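
The idea behind an exhaustive predicate test can be sketched as below: every reason in the taxonomy must have an explicit expectation, so a newly added reason fails the check until someone decides how it should be treated. This is a hypothetical illustration written as a plain program rather than with the testing package. The reason strings appear in this PR, but the allReasons list is only a partial stand-in for taskfailure.AllReasons(), and checkExhaustive and wantRewrite are invented names.

```go
package main

import "fmt"

// Reason stands in for the taskfailure taxonomy type (assumed shape).
type Reason string

// allReasons is a partial stand-in for taskfailure.AllReasons().
var allReasons = []Reason{
	"process_failure", "empty_or_unparseable_output", "unknown",
	"timeout", "idle_watchdog", "context_overflow", "model_not_found",
}

// shouldRewrite mirrors the gating rule: only protocol-shape reasons
// are rewritten as an incompatible custom profile.
func shouldRewrite(r Reason) bool {
	switch r {
	case "process_failure", "empty_or_unparseable_output", "unknown":
		return true
	default:
		return false
	}
}

// wantRewrite lists the expected verdict for every known reason.
var wantRewrite = map[Reason]bool{
	"process_failure":             true,
	"empty_or_unparseable_output": true,
	"unknown":                     true,
	"timeout":                     false,
	"idle_watchdog":               false,
	"context_overflow":            false,
	"model_not_found":             false,
}

// checkExhaustive fails when a reason has no expectation or when the
// predicate disagrees with the recorded expectation.
func checkExhaustive() error {
	for _, r := range allReasons {
		want, ok := wantRewrite[r]
		if !ok {
			return fmt.Errorf("reason %q has no expectation; decide whether it is rewritten", r)
		}
		if got := shouldRewrite(r); got != want {
			return fmt.Errorf("reason %q: got %v, want %v", r, got, want)
		}
	}
	return nil
}

func main() {
	if err := checkExhaustive(); err != nil {
		fmt.Println("taxonomy drift:", err)
		return
	}
	fmt.Println("all reasons covered")
}
```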

Verification:
- go test ./internal/daemon/... — green
- go test -race ./internal/daemon/... — clean
- go vet ./... — no issues

Co-authored-by: multica-agent <github@multica.ai>