MUL-3414: hint custom-runtime-profile compatibility, name failure mode#4301
Open
multica-eve wants to merge 2 commits into
Custom runtime profiles silently failed when admins reused a built-in protocol family (e.g. cursor, claude) but pointed command_name at a non-compatible CLI (grok, droid). The runtime registered, came online, and emitted heartbeats; every task then failed with a generic backend error and no clue that the profile itself was the cause.

This change makes the boundary visible at create time and named at fail time, without trying to support arbitrary third-party CLIs:

- UI: dialog renders a family-compatibility callout on the family-pick step and a per-family compatibility line under the command input, with locale strings for en / zh-Hans / ja / ko.
- CLI: `multica runtime profile create` gains a Long help block and per-flag help that document the same boundary so non-UI admins see it in `--help`.
- Daemon: when a custom-profile runtime's backend exec fails (raw error or non-completed Result.Status), runTask rewrites the comment to "Custom runtime profile is incompatible with the selected <family> protocol family …" and pins failure_reason to agent_error.runtime_version_unsupported. The poisoned-API 400 path still wins so genuine upstream rejections keep their existing classification.

Tests:

- runtime-profiles-dialog.test.tsx: 2 new cases for the family callout and the command hint (full file: 5 tests).
- runtime_profile_runtask_test.go: wrapCustomProfileExecError unit shape + defaults, and a behavioural runTask test confirming the custom path returns a blocked TaskResult with the refined failure_reason. A built-in-runtime regression guard ensures the rewrite stays gated on isCustomProfile.

Verification:

- pnpm --filter @multica/views typecheck → ok
- pnpm --filter @multica/views test → 1419 passed
- go test ./internal/daemon/... ./pkg/taskfailure/... ./pkg/agent/... ./cmd/multica/... → all green; race-detector run on ./internal/daemon/... also clean.

Co-authored-by: multica-agent <github@multica.ai>
…over timeout/idle (MUL-3414)

GPT-Boy's review on the first cut surfaced three regressions that the original implementation either caused or missed. This commit fixes all three:

1. Absolute command path leaked into user-visible comments. The `set-path` CLI (cmd_runtime_profile.go) explicitly documents the per-machine override as state that "never leaves the machine", but `wrapCustomProfileExecError` was echoing the full path back into the issue/chat comment: a privacy regression and a contract break. New `safeProfileCommandLabel` strips to filepath.Base; the full path stays in the structured daemon log only via taskLog fields.

2. The rewrite was over-broad: every custom-profile failure except poisoned-API 400 was retagged as runtime_version_unsupported, which hid real auth / quota / network / context_overflow / model_not_found errors that same-protocol wrappers can hit just like the upstream CLI. New `shouldRewriteAsCustomProfileIncompatible` predicate gates the rewrite to genuine protocol-shape failures only: process_failure, empty_or_unparseable_output, unknown. Other reasons pass through with the classifier's verdict intact, both in the executeAndDrain-error path (returns the raw error so handleTask runs the canonical FailTask classifier) and in the default-status path (TaskResult keeps the classifier's failure_reason and the original error string). An exhaustive predicate test enumerates every taskfailure.AllReasons() entry to catch future taxonomy drift.

3. The droid-shape case (binary launches, sits silent because it doesn't speak the protocol, gets killed by timeout/idle_watchdog) was uncovered. When isCustomProfile && result.SessionID == "", the timeout / idle_watchdog branches now append a compatibility hint via `appendCustomProfileSilenceHint`. failure_reason stays timeout / idle_watchdog so runtime sweepers and operator dashboards keep their existing semantics (a real long-running tool call still belongs in the timeout bucket); the user-visible comment is the right place for the hint, the analytics taxonomy is not.

Side support: added a small package-level `agentNew = agent.New` hook mirroring the existing `detectAgentVersion` / `lookPath` pattern so runTask integration tests can drive the post-executeAndDrain switch end-to-end with a stub backend.

Test coverage:

- `TestSafeProfileCommandLabel`: 8 sub-cases pinning the redaction.
- `TestWrapCustomProfileExecError_RedactsAbsolutePath`: privacy regression guard.
- `TestShouldRewriteAsCustomProfileIncompatible`: exhaustive over taskfailure.AllReasons().
- `TestAppendCustomProfileSilenceHint`: hint-shape unit.
- runTask integration: ExecError-rewrite, ExecError-auth-passthrough, FailedResult-auth-preserves-reason, FailedResult-protocol-shape-rewrites, Timeout-no-session-hint, Timeout-with-session-no-hint, IdleWatchdog-no-session-hint, BuiltIn-exec-error-stays-legacy.

Verification:

- go test ./internal/daemon/... → green
- go test -race ./internal/daemon/... → clean
- go vet ./... → no issues

Co-authored-by: multica-agent <github@multica.ai>
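The three fixes in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the `Reason` type, the function names (`commandLabel`, `rewriteAsIncompatible`, `silenceHint`), the empty-command placeholder, and the hint wording are all assumptions made for the example. Only the base-name redaction, the three rewritable reasons, and the "custom profile with no session" condition come from the commit message.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Reason stands in for the project's failure-reason type (assumed shape).
type Reason string

const (
	ReasonProcessFailure  Reason = "process_failure"
	ReasonEmptyOutput     Reason = "empty_or_unparseable_output"
	ReasonUnknown         Reason = "unknown"
	ReasonContextOverflow Reason = "context_overflow"
	ReasonModelNotFound   Reason = "model_not_found"
)

// commandLabel reduces a command path to its base name so a per-machine
// absolute path is not echoed into user-visible text.
func commandLabel(command string) string {
	command = strings.TrimSpace(command)
	if command == "" {
		return "(unknown command)" // placeholder invented for this sketch
	}
	return filepath.Base(command)
}

// rewriteAsIncompatible reports whether a classified failure looks like a
// protocol-shape failure. Every other reason keeps the classifier's verdict.
func rewriteAsIncompatible(r Reason) bool {
	switch r {
	case ReasonProcessFailure, ReasonEmptyOutput, ReasonUnknown:
		return true
	default:
		return false
	}
}

// silenceHint appends a compatibility hint only when a custom-profile run
// never established a session. The hint wording is hypothetical.
func silenceHint(comment string, isCustomProfile bool, sessionID string) string {
	if !isCustomProfile || sessionID != "" {
		return comment
	}
	return comment + " (hint: the custom command may not speak the selected protocol family)"
}

func main() {
	fmt.Println(commandLabel("/opt/tools/grok"))
	fmt.Println(rewriteAsIncompatible(ReasonProcessFailure), rewriteAsIncompatible(ReasonModelNotFound))
	fmt.Println(silenceHint("task timed out", true, ""))
}
```

Keeping the predicate as a closed `switch` with a `false` default means a newly added reason passes through unchanged until someone opts it in, which is the conservative direction for a taxonomy that analytics depend on.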
Closes MUL-3414
Background
GitHub bug #4293:
admins created a custom runtime profile, kept the built-in
`protocol_family` (e.g. `cursor`, `claude`), and pointed `command_name` at
`grok`/`droid`. The runtime registered, came online, kept emitting
heartbeats, and then failed every claimed task with a generic
"agent backend failed" error that gave no hint the profile itself was
the cause. Triage on the issue (GPT-Boy) confirmed: the daemon launches
the custom command with the family's hard-coded launch arguments and
parses its stdout against the family's protocol; nothing dynamically
adapts to a different CLI.
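To illustrate why a foreign CLI comes online but never produces a usable result, here is a small sketch of a parser that expects one wire format. The line-delimited JSON `event` shape is purely hypothetical; the real per-family protocols are not shown in this PR. The point is only that output from a CLI that does not speak the expected format yields zero recognised events.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// event is a hypothetical line-delimited JSON message, not a real family protocol.
type event struct {
	Type string `json:"type"`
}

// countProtocolEvents scans stdout line by line and counts the lines that
// decode as an event with a non-empty type. Anything else is ignored.
func countProtocolEvents(stdout string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(stdout))
	for sc.Scan() {
		var ev event
		if err := json.Unmarshal(sc.Bytes(), &ev); err == nil && ev.Type != "" {
			n++
		}
	}
	return n
}

func main() {
	compatible := `{"type":"session"}
{"type":"result"}
`
	foreign := `Welcome to some other CLI
usage: tool [flags]
`
	fmt.Println(countProtocolEvents(compatible)) // 2
	fmt.Println(countProtocolEvents(foreign))    // 0
}
```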
This PR ships the agreed-on "hints + clear errors" fix. It does NOT add
Grok/Droid support; that lives separately as #4111.
Core changes
UI hint: `runtime-profiles-dialog.tsx`
- The family-pick step renders a compatibility callout (… registers, comes online, fails every task with empty output) so admins see the boundary before they pick `claude` intending to run `grok`.
- The command input shows a per-family compatibility line (Must accept <family>'s launch arguments and produce <family>-compatible output. … grok or droid don't and need a first-class provider) so the boundary is repeated next to the input where they are typing the binary name.
- Locale strings for en / zh-Hans / ja / ko in `runtimes.json`; parity test stays green.
CLI hint: `cmd_runtime_profile.go`
- `multica runtime profile create` gains a `Long` help block enumerating the supported families and explaining that non-compatible CLIs come online but fail every task.
- `--protocol-family` / `--command-name` flag descriptions repeat the boundary so admins reading `--help` see it inline.

Daemon clear error: `daemon.go`
- `runTask` now retains `isCustomProfile` and `customCommandPath` after the existing `customCommandPathForRuntime` lookup.
- Both failure paths (`backend.Execute` returning an error and the `default:` Result-status branch) call a new `wrapCustomProfileExecError(provider, command, raw)` and pin `failure_reason = agent_error.runtime_version_unsupported`. The poisoned-API 400 classifier still wins, so genuine upstream rejections keep their existing reason.
- The rewritten comment names the command path, the contract (must accept family-compatible arguments and output), and includes the original error so daemon log forensics still work.
Out of scope (intentionally)
- No validation of `command_name` at create/update time. The server doesn't know each host's PATH and `command_name` is allowed to be a wrapper, so a strong validator would mis-fail.
- `fixed_args` is still not exposed (the daemon's existing TODO under MUL-3284 still applies). Exposing it now would offer admins a workaround that doesn't actually take effect.
Tests
- `packages/views/runtimes/components/runtime-profiles-dialog.test.tsx`: 2 new cases: family-callout copy on the family step, and the per-family command hint after picking `cursor`.
- `server/internal/daemon/runtime_profile_runtask_test.go` (new): shape + defaults of `wrapCustomProfileExecError`, plus a behavioural `runTask` case that proves a custom-profile exec failure becomes `Status=blocked / FailureReason=runtime_version_unsupported` with the refined comment, and a guard that built-in-runtime failures are NOT rewritten (so the taxonomy used by failure analytics stays stable).
Verification
- `pnpm --filter @multica/views typecheck` → ok
- `pnpm --filter @multica/views test` → 1419 passed (incl. locale parity)
- `go test ./internal/daemon/... ./pkg/taskfailure/... ./pkg/agent/... ./cmd/multica/...` → all green
- `go test -race ./internal/daemon/...` → clean
- … `source_task_id` column reproduce on `main` and are unrelated to this change.