fix: recover stuck node progress and scope mirror to active workflow #11866
christian-byrne wants to merge 7 commits into main from glary/fix-stuck-progress-eviction
Conversation
When WebSocket terminal messages (execution_success / execution_error / execution_interrupted) are dropped, node-level progress UI gets stuck on the affected node even though the job has reached a terminal state on the backend. Concurrently, progress messages from a job initiated in another open workflow can leak into the active one, because the global nodeProgressStates mirror is overwritten unconditionally on every progress_state event.
Changes:
- Add reconcileTerminalJobs(active, terminal) to executionStore. Used by polling to detect jobs that moved Running -> History via the backend snapshot and clear their per-job state without disturbing a different running job (evictTerminalJob clears the global mirror only when the evicted job actually owns it, identified via prompt_id on NodeProgressState).
- Wire reconcileTerminalJobs into queueStore.update() alongside the existing reconcileInitializingJobs call, so the polling fallback now recovers stuck progress without requiring backend retries or WebSocket sequence numbers.
- Broaden useQueuePolling activation from activeJobsCount === 1 to activeJobsCount > 0 so the recovery path also covers the case where one job's terminal message is dropped while another job is still pending or running.
- Gate writes to the global nodeProgressStates mirror in handleProgressState by active workflow. Resolution order: workflow_id on the WS message, then the jobIdToWorkflowId mapping, then the jobIdToSessionWorkflowPath fallback, defaulting to the legacy behaviour when nothing is resolvable (sketched below).
- Add an optional workflow_id field to all execution-related WS message schemas so the gating starts using the backend-provided value as soon as it ships.
Tests:
- Active-workflow gating: per-job state always updates; the mirror only updates on a workflow_id match (or via the mapping fallbacks); legacy behaviour is preserved when no resolution is available.
- reconcileTerminalJobs: evicts non-active terminal jobs without disturbing the active job, evicts the active job and clears the mirror, clears a stale mirror still owned by a job that just became terminal, skips still-active jobs, skips jobs absent from the terminal set, and is idempotent.

- queueStore.update(): always reconcile terminal jobs when history is non-empty, not just when the queue has data. Previously, the common case of the last active job finishing (queue empty, job in history) was skipped, leaving the very stuck-progress bug we are trying to fix.
- handleProgress: gate _executingNodeProgress by the same active-workflow check as handleProgressState, so per-node current progress no longer leaks across workflows.
- Add tests for both fixes.
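A minimal standalone sketch of the resolution order described above; the helper, types, and context shape are illustrative stand-ins rather than the store's actual code:

```ts
// Sketch of the active-workflow ownership check (not the real executionStore code).
type JobId = string
type WorkflowId = string

interface OwnershipContext {
  activeWorkflowId: WorkflowId | null
  activeWorkflowPath: string | null
  jobIdToWorkflowId: Map<JobId, WorkflowId>
  jobIdToSessionWorkflowPath: Map<JobId, string>
}

/**
 * Resolve whether a WS message belongs to the active workflow.
 * Order: workflow_id on the message -> jobIdToWorkflowId -> jobIdToSessionWorkflowPath.
 * Returns null when nothing is resolvable, so the caller keeps legacy behaviour.
 */
function messageMatchesActiveWorkflow(
  ctx: OwnershipContext,
  jobId: JobId,
  messageWorkflowId?: WorkflowId
): boolean | null {
  if (messageWorkflowId !== undefined) {
    return messageWorkflowId === ctx.activeWorkflowId
  }
  const mappedWorkflowId = ctx.jobIdToWorkflowId.get(jobId)
  if (mappedWorkflowId !== undefined) {
    return mappedWorkflowId === ctx.activeWorkflowId
  }
  const mappedPath = ctx.jobIdToSessionWorkflowPath.get(jobId)
  if (mappedPath !== undefined) {
    return mappedPath === ctx.activeWorkflowPath
  }
  return null // unresolved: fall through to the legacy (ungated) behaviour
}

// Per-job state is always written; only the global mirror is gated:
// const owns = messageMatchesActiveWorkflow(ctx, msg.prompt_id, msg.workflow_id)
// if (owns !== false) { /* update the global nodeProgressStates mirror */ }
```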
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior is configurable; review commands and quick-action checkboxes are available to manage reviews.
📝 Walkthrough
This PR implements workflow-aware execution state management for multi-job polling and cross-tab synchronization. Schema changes add an optional workflow_id field to the execution-related WS messages.
Changes: Workflow-Aware Job Execution and Multi-Job Polling
Sequence Diagram
sequenceDiagram
participant WS as WebSocket
participant Handler as Progress Handler
participant Resolver as messageMatchesActiveWorkflow
participant Mirror as Global nodeProgressStates
participant Store as workflowStore
participant Output as nodeOutputStore
WS->>Handler: progress_state (workflow_id, job_id)
Handler->>Resolver: Check workflow ownership
Resolver->>Store: Is workflow_id active workflow?
Store-->>Resolver: activeWorkflow match result
Resolver-->>Handler: isActiveWorkflowMessage
Handler->>Handler: Update nodeProgressStatesByJob[job_id]
alt Workflow matches active
Handler->>Output: Revoke previews (running transition)
Handler->>Mirror: Update nodeProgressStates
Handler->>Handler: Update _executingNodeProgress
else Workflow mismatch or unresolved
Handler->>Handler: Skip global updates
end
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~90 minutes
Caution: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error, 1 warning)
✅ Passed checks (5 passed)
🎭 Playwright: ✅ 1465 passed, 0 failed · 1 flaky · 📊 Browser Reports
🎨 Storybook: ✅ Built — View Storybook
Acknowledged — waiting for the full CodeRabbit review pass to come through. Will address concrete feedback once it arrives.
📦 Bundle: 5.26 MB gzip 🔴 +1.74 kB
Category Glance:
- App Entry Points — 22.6 kB (baseline 22.6 kB) • ⚪ 0 B · Main entry bundles and manifests · Status: 1 added / 1 removed
- Graph Workspace — 1.24 MB (baseline 1.24 MB) • 🟢 -3 B · Graph editor runtime, canvas, workflow orchestration · Status: 1 added / 1 removed
- Views & Navigation — 81.8 kB (baseline 81.8 kB) • ⚪ 0 B · Top-level views, pages, and routed surfaces · Status: 9 added / 9 removed / 2 unchanged
- Panels & Settings — 489 kB (baseline 489 kB) • ⚪ 0 B · Configuration panels, inspectors, and settings screens · Status: 10 added / 10 removed / 11 unchanged
- User & Accounts — 17.5 kB (baseline 17.5 kB) • ⚪ 0 B · Authentication, profile, and account management bundles · Status: 5 added / 5 removed / 2 unchanged
- Editors & Dialogs — 112 kB (baseline 112 kB) • ⚪ 0 B · Modals, dialogs, drawers, and in-app editors · Status: 4 added / 4 removed
- UI Components — 62.9 kB (baseline 62.9 kB) • ⚪ 0 B · Reusable component library chunks · Status: 5 added / 5 removed / 9 unchanged
- Data & Services — 3.05 MB (baseline 3.04 MB) • 🔴 +7.71 kB · Stores, services, APIs, and repositories · Status: 13 added / 13 removed / 4 unchanged
- Utilities & Hooks — 365 kB (baseline 365 kB) • ⚪ 0 B · Helpers, composables, and utility bundles · Status: 13 added / 13 removed / 18 unchanged
- Vendor & Third-Party — 9.94 MB (baseline 9.94 MB) • ⚪ 0 B · External libraries and shared vendor chunks · Status: 16 unchanged
- Other — 8.84 MB (baseline 8.84 MB) • ⚪ 0 B · Bundles that do not match a named category · Status: 57 added / 57 removed / 78 unchanged
⚡ Performance Report
All metrics
Historical variance (last 15 runs)
Trend (last 15 commits on main)
Raw data: 45 measurements across 15 scenarios (canvas-idle, canvas-mouse-sweep, canvas-zoom-sweep, dom-widget-clipping, large-graph-idle, large-graph-pan, large-graph-zoom, minimap-idle, subgraph-dom-widget-clipping, subgraph-idle, subgraph-mouse-sweep, viewport-pan-sweep, vue-large-graph-idle, vue-large-graph-pan, workflow-execution), captured 2026-05-03 at commit 592cac8 on branch glary/fix-stuck-progress-eviction.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/schemas/apiSchema.ts`:
- Around line 121-125: handleProgressText in src/stores/executionStore.ts
currently only gates progress_text by prompt_id, allowing execution_start from
another tab to leak previews when nodeIds overlap; update handleProgressText to
also check workflow_id from the incoming zProgressTextWsMessage against the
store's active workflow (the same gating used for progress and progress_state),
i.e., read workflow_id from the payload (zProgressTextWsMessage) and
short-circuit if it doesn't match the active workflow before applying text
updates, keeping existing prompt_id checks intact.
In `@src/stores/executionStore.ts`:
- Around line 608-619: reconcileTerminalJobs currently builds tracked from
nodeProgressStatesByJob and activeJobId, so jobs only present in
initializingJobIds never get checked; update reconcileTerminalJobs to also
include entries from initializingJobIds into the tracked Set before iterating
(use initializingJobIds.forEach to add each JobId), then proceed with the same
active/terminal checks and call evictTerminalJob(jobId) for matches; reference
reconcileTerminalJobs, initializingJobIds, nodeProgressStatesByJob, activeJobId,
and evictTerminalJob.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 68623f2b-0192-4fe2-a994-d5171333ad72
📒 Files selected for processing (7)
src/platform/remote/comfyui/useQueuePolling.test.ts, src/platform/remote/comfyui/useQueuePolling.ts, src/schemas/apiSchema.ts, src/stores/executionStore.test.ts, src/stores/executionStore.ts, src/stores/queueStore.test.ts, src/stores/queueStore.ts
- handleProgressText: gate by active workflow using workflow_id from the message, matching the gating already on progress and progress_state. Prevents text previews from a different tab leaking onto the active workflow when nodeIds overlap.
- reconcileTerminalJobs: union initializingJobIds into the tracked set so a cloud job that lands in history while still in the 'Waiting for a machine' state gets evicted via evictTerminalJob, clearing the stuck initializing indicator (see the sketch below).
- Add tests for both: workflow_id mismatch/match on progress_text, and initializing-only job eviction.
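A hedged sketch of that reconciliation, with assumed types standing in for the executionStore internals named in this commit:

```ts
// Sketch only: evict every tracked job that the backend snapshot reports as terminal.
type JobId = string

interface ReconcileState {
  nodeProgressStatesByJob: Map<JobId, unknown>
  initializingJobIds: Set<JobId>
  activeJobId: JobId | null
  evictTerminalJob: (jobId: JobId) => void
}

function reconcileTerminalJobs(
  state: ReconcileState,
  activeJobIds: Set<JobId>,
  terminalJobIds: Set<JobId>
): void {
  const tracked = new Set<JobId>(state.nodeProgressStatesByJob.keys())
  if (state.activeJobId) tracked.add(state.activeJobId)
  // Union in jobs that are still only "initializing" (e.g. waiting for a machine),
  // so a job that lands in history before producing progress is still evicted.
  state.initializingJobIds.forEach((jobId) => tracked.add(jobId))

  for (const jobId of tracked) {
    if (activeJobIds.has(jobId)) continue // race guard: still running/pending
    if (!terminalJobIds.has(jobId)) continue // not yet seen in history
    state.evictTerminalJob(jobId)
  }
}
```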
Codecov Report: ❌ Patch coverage is
@@ Coverage Diff @@
## main #11866 +/- ##
===========================================
- Coverage 71.21% 56.00% -15.22%
===========================================
Files 1483 1383 -100
Lines 85606 70728 -14878
Branches 23055 19712 -3343
===========================================
- Hits 60965 39610 -21355
- Misses 23731 30591 +6860
+ Partials 910 527 -383
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 1012 files with indirect coverage changes
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/stores/executionStore.ts`:
- Around line 355-367: When workflowStore.activeWorkflow changes we must
reconcile the mirror so stale progress from a previous workflow doesn't persist;
update the logic around messageMatchesActiveWorkflow to, on active workflow
change, retrieve the canonical state from nodeProgressStatesByJob for the
current job/workflow and either clear nodeProgressStates and
_executingNodeProgress or rebuild them from that map. Specifically: in the
blocks using messageMatchesActiveWorkflow, add handling that looks up
nodeProgressStatesByJob[jobId] (or its workflow-specific entry) and sets
nodeProgressStates.value and _executingNodeProgress.value to the corresponding
canonical entries (or null/empty if none) whenever workflowStore.activeWorkflow
changes; apply the same reconciliation fix to the other block referenced (around
the 387-410 logic) so both incoming-event gating and manual tab switches produce
a correct mirror.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6bb3bd03-a8ff-4ca7-94b8-891266cc817d
📒 Files selected for processing (2)
src/stores/executionStore.test.ts, src/stores/executionStore.ts
The incoming-event gate prevents new progress messages from leaking into the wrong workflow, but if workflow A had already populated the mirror and the user switches to B, the stale A progress would still render until another event arrived. Add a watcher on workflowStore.activeWorkflow that rebuilds the mirror from nodeProgressStatesByJob using the existing workflow-id / session-path mappings, picking the most recently inserted job for the new active workflow, or clears the mirror entirely when no job belongs to it. Also clears _executingNodeProgress when its job no longer matches the active workflow.
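A rough sketch of that watcher, assuming Vue reactivity and simplified stand-ins for the store's refs and mappings (the real implementation would use the existing workflow-id / session-path mappings):

```ts
import { ref, watch, type Ref } from 'vue'

type JobId = string
type NodeProgressStates = Record<string, { prompt_id: JobId }>

const activeWorkflowId: Ref<string | null> = ref(null)
const nodeProgressStates: Ref<NodeProgressStates> = ref({})
const nodeProgressStatesByJob = new Map<JobId, NodeProgressStates>()
const jobIdToWorkflowId = new Map<JobId, string>()
const jobInsertionOrder: JobId[] = [] // newest last

watch(activeWorkflowId, (workflowId) => {
  // Pick the most recently inserted job that belongs to the new active workflow.
  const owningJob = [...jobInsertionOrder]
    .reverse()
    .find((jobId) => jobIdToWorkflowId.get(jobId) === workflowId)

  // Rebuild the mirror from canonical per-job state, or clear it entirely.
  nodeProgressStates.value = owningJob
    ? { ...(nodeProgressStatesByJob.get(owningJob) ?? {}) }
    : {}
})
```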
Actionable comments posted: 2
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/scripts/app.ts (1)
748-763: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift: Don't suppress node.onExecuted for background workflows without a replay path.
This early return prevents callback-driven UI state from ever being rebuilt for jobs started in another workflow tab. src/extensions/core/load3d.ts:477-510 is one concrete example: it wraps node.onExecuted to update the load3d preview, so those previews stay stale once the callback is skipped here. Persisting raw outputs is good, but this needs an activation-time replay/projector instead of dropping the callback entirely.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/app.ts` around lines 748 - 763, The early return using useExecutionStore.messageMatchesActiveWorkflow(...) causes node.onExecuted callbacks to be skipped for background workflows; instead, always invoke getNodeByExecutionId(this.rootGraph, executionId) and, if node.onExecuted exists, call node.onExecuted(detail.output) so callback-driven UI (e.g., load3d preview wrappers) still update; if you need to avoid immediate UI updates for non-active workflows, push the output and callback into a replay/projector queue keyed by the workflow (implement a replay mechanism that replays queued node.onExecuted calls on activation) rather than suppressing the callback in this function.
🧹 Nitpick comments (1)
src/stores/executionStore.ts (1)
630-632: ⚡ Quick win: Clean job-to-workflow mappings during terminal eviction.
evictTerminalJob() is now the central terminal cleanup path, but it leaves jobIdToWorkflowId (and optionally the session path mapping) behind. Over long sessions, this causes unnecessary map growth.
Proposed fix:
 function evictTerminalJob(jobId: JobId) {
   if (!jobId) return
   ...
   useJobPreviewStore().clearPreview(jobId)
   clearInitializationByJobId(jobId)
+  jobIdToWorkflowId.value.delete(jobId)
+  if (jobIdToSessionWorkflowPath.value.has(jobId)) {
+    const next = new Map(jobIdToSessionWorkflowPath.value)
+    next.delete(jobId)
+    jobIdToSessionWorkflowPath.value = next
+  }
   const isActive = activeJobId.value === jobId
   const mirrorBelongsToEvicted = mirrorOwnerJobId() === jobId
   ...
 }
Based on learnings: prompt IDs are UUIDs and never reused, so pruning terminal-job mappings is safe and prevents monotonic growth.
Also applies to: 648-652
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/stores/executionStore.ts` around lines 630 - 632, evictTerminalJob() currently calls useJobPreviewStore().clearPreview(jobId) and clearInitializationByJobId(jobId) but does not remove the jobId from the jobIdToWorkflowId map (and related session-path mappings), causing unbounded map growth; update evictTerminalJob() to also remove the job from jobIdToWorkflowId (jobIdToWorkflowId.delete(jobId)) and remove any associated session/path mapping (e.g., delete from jobId->sessionPath or sessionPath->jobId maps) in the same cleanup path (also apply the same deletion logic in the other cleanup site referenced around the other eviction block) so terminal evictions fully prune job-related mappings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/scripts/app.ts`:
- Around line 801-819: When a workflow becomes active we need to rehydrate
cached previews from the job preview store into the node-output store; update
reconcileMirrorForActiveWorkflow() (or the activation path that checks
useExecutionStore().messageMatchesActiveWorkflow(jobId, workflowId)) to, after
confirming the workflow is active, read cached previews from
useJobPreviewStore().previewsByPromptId (or the job preview lookup used by
setPreviewUrl(jobId, blobUrl, displayNodeId)) and call
setNodePreviewsByExecutionId(...) for the appropriate displayNodeId/node parent
chain (and call revokePreviewsByExecutionId if needed) so the canvas-visible
previews are restored when switching back to a workflow that previously received
previews.
In `@src/stores/executionStore.ts`:
- Around line 351-356: The loop that calls revokePreviewsByExecutionId for nodes
in 'running' state can revoke previews from executions that belong to a
different active workflow; fetch the current active workflow id from the
workflow store (e.g. via useWorkflowStore()/getActiveWorkflowId() or the
existing activeWorkflow variable) and add a guard before calling
revokePreviewsByExecutionId(nodeId) so it only runs when nodeState.workflowId
(or the appropriate ownership field on nodeState) matches the active workflow
id; apply the same ownership check to the other similar block (the one covering
lines ~365-377) to prevent cross-workflow revocations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2f44898b-c399-4d51-b904-778d003b8690
📒 Files selected for processing (4)
src/scripts/api.ts, src/scripts/app.ts, src/stores/executionStore.test.ts, src/stores/executionStore.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/stores/executionStore.test.ts
handleProgressState was calling revokePreviewsByExecutionId before the active-workflow gate, so a progress_state from a non-active workflow could still revoke previews on the active canvas when nodeIds collided. Hoist the active-workflow check above the revocation loop so previews are only touched when the message belongs to the active tab.
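As a rough illustration of that ordering (handler shape and helper names here are assumptions, not the real executionStore code):

```ts
// Sketch only: resolve workflow ownership first, then touch canvas-visible state.
type ProgressStateMsg = {
  prompt_id: string
  workflow_id?: string
  nodes: Record<string, { state: string }>
}

declare function matchesActiveWorkflow(msg: ProgressStateMsg): boolean // assumed gate
declare function revokePreviewsByExecutionId(nodeId: string): void // assumed store action
declare function updatePerJobState(msg: ProgressStateMsg): void // always runs

function handleProgressState(msg: ProgressStateMsg): void {
  updatePerJobState(msg) // per-job bookkeeping is never gated
  if (!matchesActiveWorkflow(msg)) return // gate hoisted above any preview revocation
  for (const nodeId in msg.nodes) {
    if (msg.nodes[nodeId].state === 'running') revokePreviewsByExecutionId(nodeId)
  }
}
```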
Force-pushed 9ebab86 to 7f48b33
Thanks for the thorough write-up and the test coverage here. I think the overall direction is right: scoping execution UI state by the active workflow and adding polling-based terminal reconciliation should address the stuck-progress class of bugs.
That said, a few execution event paths still appear to bypass the new workflow ownership checks or still use the older global reset behavior. Those gaps seem likely to preserve some of the cross-workflow leakage this PR is trying to remove, especially for executed, terminal WS events, executing, and progress_text.
One adjacent path worth checking as well: src/scripts/app.ts:766 still calls onExecutionStart on the currently visible graph without workflow gating. It may still allow background workflow starts to trigger callbacks on the wrong canvas.
const nodeOutputStore = useNodeOutputStore()
const executionId = String(detail.display_node || detail.node)

// Always persist outputs into the (job-scoped) node-output store so
Issue: this write still happens before the active-workflow gate, but nodeOutputStore.setNodeOutputsByExecutionId() resolves the execution id against the current app.rootGraph. If a background workflow emits executed for a colliding node id, it can still overwrite the visible workflow's outputs even though node.onExecuted is gated below. Could this write either be gated as well, or routed through a genuinely job/workflow-scoped output store?
Agreed this is real. The executed write into nodeOutputStore resolving against app.rootGraph is a parallel cross-workflow leak, alongside other consumers of the global mirror (group nodes, minimap, GraphCanvas, useNodeExecutionState, browser tab title, litegraph stroke styles).
Out of scope for this PR — app.ts is not modified here, and properly fixing it requires either (a) gating writes in app.ts and rehydrating on tab switch, or (b) routing all node-output access through a job-keyed store and changing all 8 consumers. Both are sizable refactors that I deliberately scoped out per the original design discussion (where we picked the focused gate-in-executionStore approach over the UI-wide refactor).
I'll file a follow-up issue covering: executed output gating, canvas preview rehydration on tab switch, and consumer migration to per-job state. Happy to take that as a separate PR.
}

/**
 * Safely evict per-job execution artifacts for a job that has reached a
Issue: this new safe eviction primitive is exactly the behavior needed for non-active terminal jobs, but the WS terminal handlers still call resetExecutionState(prompt_id). That older path clears the global mirror, _executingNodeProgress, activeJobId, and queuedJobs[activeJobId] even when the terminal prompt_id is not the active job. If job A finishes after job B has become active, A's success event can wipe B's execution state.
Excellent catch — this was a real cross-workflow clobbering bug. Fixed in 452ef58.
Added a small terminateJob(jobId) helper that routes to evictTerminalJob for non-active jobs (designed to be safe for any jobId, never touches another running job's mirror) and to resetExecutionState only when jobId === activeJobId. Applied at every WS terminal call site: handleExecutionSuccess, handleExecutionInterrupted, handleExecutionError, handleServiceLevelError, handleCloudValidationError.
Three regression tests added under "terminal WS handlers do not clobber active job": success on a non-active job preserves the active job's mirror/activeJobId; interrupted on a non-active job, same; success on the active job clears as before.
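For reference, a minimal sketch of the routing the helper performs, with assumed signatures (the actual terminateJob sits inside executionStore next to evictTerminalJob and resetExecutionState):

```ts
type JobId = string

interface TerminalRoutingDeps {
  activeJobId: JobId | null
  evictTerminalJob: (jobId: JobId) => void
  resetExecutionState: (jobId: JobId) => void
}

/** Entry point for every WS terminal handler (success / error / interrupted). */
function terminateJob(deps: TerminalRoutingDeps, jobId: JobId): void {
  if (jobId === deps.activeJobId) {
    // Only the active job may clear the global mirror, activeJobId, queuedJobs, etc.
    deps.resetExecutionState(jobId)
  } else {
    // Non-active jobs get the scoped eviction that never touches another job's state.
    deps.evictTerminalJob(jobId)
  }
}
```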
  executionStore.reconcileInitializingJobs(activeJobIds)
}

// Reconcile terminal jobs whenever history is non-empty. The last
Issue: fetching queue and history concurrently can still miss the recovery case this PR is targeting. If getHistory() returns before the finishing job appears in history while getQueue() returns after it disappeared from Running/Pending, reconciliation sees the job in neither set; then activeJobsCount becomes 0 and polling stops, leaving dropped-terminal progress stuck. A follow-up history fetch or continued polling when tracked progress exists but no terminal id was seen would close this race.
Real edge case — appreciated. The race window is bounded (backend writes ordering between queue and history are short) but not zero. The robust fix is what you suggested: keep polling while there's tracked progress that hasn't been seen in either set, regardless of activeJobsCount.
I'll address this in a small follow-up rather than expanding this PR further: change useQueuePolling activation to activeJobsCount > 0 || executionStore.nodeProgressStatesByJob has untracked entries, plus an explicit "saw queue empty + history empty + still tracked" continued-poll path. That keeps this PR focused on the WS schema + eviction + workflow gating bundle that's already grown larger than originally planned.
In the meantime, the practical mitigation is that the next real WS event (heartbeat, status, etc.) re-triggers polling via GraphView.vue, so the recovery latency in the worst case is bounded by the time to the next heartbeat rather than indefinite.
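A sketch of that follow-up activation condition, with placeholder inputs standing in for the store state named above:

```ts
// Assumed shape: trackedJobIds mirrors nodeProgressStatesByJob keys,
// terminalJobIds is the set of jobs already seen in history.
interface PollingInputs {
  activeJobsCount: number
  trackedJobIds: Set<string>
  terminalJobIds: Set<string>
}

/** Keep polling while anything is active, or while tracked progress has no terminal sighting yet. */
function shouldPoll(inputs: PollingInputs): boolean {
  if (inputs.activeJobsCount > 0) return true
  for (const jobId of inputs.trackedJobIds) {
    if (!inputs.terminalJobIds.has(jobId)) return true
  }
  return false
}
```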
const zExecutingWsMessage = z.object({
  node: zNodeId,
  display_node: zNodeId,
  prompt_id: zJobId
Suggestion: since executing now carries workflow_id in the schema, the API dispatch path should preserve that metadata too. It currently dispatches only display_node || node, so downstream listeners cannot apply the same active-workflow gate as the other execution events and background executions can still clear active node progress or trigger redraws for the visible canvas.
Right — the schema now describes a richer executing payload but api.ts still strips it down to display_node || node at line 731-734, and the type system at line 218-220 hard-codes executing to NodeId. Plumbing the full payload through requires changes in api.ts (dispatch + ApiToEventType), the handleExecuting handler signature, and existing test mocks. That's a wider structural change than I want to bundle into this PR (which is already covering schema + eviction + active-workflow gating).
Out of scope here, but I'll include this in the follow-up issue alongside the executed write gating (3178588877) and the queue/history race (3178588879). The current handleExecuting is partially safe today because it only mutates activeJob.value.nodes (which is gated through activeJobId indirection) — but you're correct that listeners can't apply the new workflow gate downstream until the dispatch carries the metadata. Tracked for the follow-up.
const { nodeId, text, prompt_id, workflow_id } = e.detail
if (!text || !nodeId) return

// Filter: only accept progress for the active prompt
Issue: this workflow ownership check runs after the legacy activeJobId prompt guard. Since activeJobId is still global and can point at another workflow's latest job, progress_text for the currently visible workflow can be dropped even when workflow_id matches the active workflow. Suggestion: prefer the workflow gate when metadata is available, and only fall back to the legacy activeJobId guard when ownership cannot be resolved.
Fixed in 452ef58. handleProgressText now prefers the workflow-ownership gate when ownership can be resolved (any of: workflow_id on the message, jobIdToWorkflowId mapping, jobIdToSessionWorkflowPath mapping). Only falls back to the legacy activeJobId !== prompt_id guard when ownership is genuinely unresolvable, so progress_text for the visible workflow won't be dropped just because activeJobId happens to point at another workflow's job.
…jobs
The WS terminal handlers (execution_success, execution_error, execution_interrupted, plus the cloud validation and service-level error paths) all called resetExecutionState(prompt_id) directly. That function unconditionally clears the global mirror, _executingNodeProgress, activeJobId, and queuedJobs[activeJobId.value], which means a terminal event for a non-active job (e.g. job A finishing after job B becomes active) wipes the active job's UI state.
Add a small terminateJob(jobId) helper that routes to evictTerminalJob for non-active jobs (already designed to be safe for any jobId) and to resetExecutionState only when jobId === activeJobId. Apply at every WS terminal call site.
Also reorder the gating in handleProgressText: prefer the active-workflow gate when ownership can be resolved (workflow_id on the message or a registered mapping), and only fall back to the legacy activeJobId guard when ownership is unresolvable. The previous order would drop legitimate text previews for the visible workflow when activeJobId pointed at a different workflow's job.
Force-pushed 17c6807 to 452ef58
🧹 Nitpick comments (2)
src/stores/executionStore.ts (1)
353-360: 💤 Low value: Consider hoisting useNodeOutputStore() outside the loop.
While functionally correct (Pinia stores are singletons), calling useNodeOutputStore() and destructuring inside the loop on every iteration is slightly wasteful.
♻️ Suggested refactor:
 if (isActiveWorkflowMessage) {
+  const { revokePreviewsByExecutionId } = useNodeOutputStore()
   for (const nodeId in nodes) {
     const nodeState = nodes[nodeId]
     if (nodeState.state === 'running' && !previousForJob[nodeId]) {
-      const { revokePreviewsByExecutionId } = useNodeOutputStore()
       revokePreviewsByExecutionId(nodeId)
     }
   }
 }
src/stores/executionStore.test.ts (1)
86-96: 💤 Low value: Remove dead mock for @/stores/imagePreviewStore.
Lines 86-90 mock @/stores/imagePreviewStore exporting useNodeOutputStore, but executionStore.ts imports useNodeOutputStore from @/stores/nodeOutputStore (line 32). The correct mock at lines 92-96 supersedes this one. The dead mock is confusing and could mask future import changes.
♻️ Suggested removal:
-vi.mock('@/stores/imagePreviewStore', () => ({
-  useNodeOutputStore: () => ({
-    revokePreviewsByExecutionId: vi.fn()
-  })
-}))
-
 vi.mock('@/stores/nodeOutputStore', () => ({
   useNodeOutputStore: () => ({
     revokePreviewsByExecutionId: mockRevokePreviewsByExecutionId
   })
 }))
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/stores/executionStore.test.ts`:
- Around line 86-96: Remove the dead mock block that defines useNodeOutputStore
returning a revokePreviewsByExecutionId vi.fn(); it is superseded by the working
mock that uses mockRevokePreviewsByExecutionId and is confusing. Locate the mock
that returns { useNodeOutputStore: () => ({ revokePreviewsByExecutionId: vi.fn()
}) } and delete that entire mock declaration so only the correct mock
referencing mockRevokePreviewsByExecutionId remains.
In `@src/stores/executionStore.ts`:
- Around line 353-360: The loop calls useNodeOutputStore() and destructures
revokePreviewsByExecutionId on every iteration which is wasteful; hoist the
store lookup and destructuring outside the for-loop: obtain const {
revokePreviewsByExecutionId } = useNodeOutputStore() once before iterating over
nodes (while keeping the same guard isActiveWorkflowMessage and checks for
nodeState.state === 'running' && !previousForJob[nodeId]) and then call
revokePreviewsByExecutionId(nodeId) inside the loop.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5cc5a8d3-b84c-4f2b-946f-4d756c1bf8c0
📒 Files selected for processing (2)
src/stores/executionStore.test.ts, src/stores/executionStore.ts
- Hoist the useNodeOutputStore() destructure outside the per-node loop in handleProgressState so the store lookup happens once per message instead of once per running node.
- Remove the dead vi.mock for @/stores/imagePreviewStore (executionStore imports useNodeOutputStore from @/stores/nodeOutputStore; the second mock is the live one).
Both nitpicks addressed in 4d9011b.
christian-byrne left a comment:
executionStore.reconcileTerminalJobs(active, terminal) — new polling-driven reconciliation. Walks tracked jobs, evicts any job that is in the terminal set and not in the active set. Internally uses evictTerminalJob, which always removes per-job state but only clears the global mirror / activeJobId / _executingNodeProgress when the evicted job actually owns them (mirror ownership identified via prompt_id on NodeProgressState since the mirror is replaced wholesale per message). Idempotent with the WS terminal handlers.
There is 0% chance that a polling based approach is the move here. We need to have singular state and have the eviction be a watcher or watchEffect.
 * {@link activeJobId}) only when those still belong to the evicted job.
 * - Idempotent: calling for an already-cleared job is a no-op.
 */
function evictTerminalJob(jobId: JobId) {
How about this for design:
- when we store progress state, keep the job id (promptId) in the ref/local data.
- expose a finished jobs ref from the queue or history store, or wherever
- watch the finished jobs ref in the execution store or in respective useNodeProgressState
This current setup obviously is not good.
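A rough sketch of that reactive shape, using Vue reactivity with placeholder names (historyTaskIds, finishedJobIds, and the progress map are stand-ins for whatever the queue/history and execution stores actually expose):

```ts
import { computed, ref, watch, type Ref } from 'vue'

type JobId = string

const historyTaskIds: Ref<JobId[]> = ref([]) // stand-in for queue/history store state
const nodeProgressStatesByJob = new Map<JobId, unknown>() // progress keyed by promptId

// Terminal jobs, derived from the history snapshot instead of polled imperatively.
const finishedJobIds = computed(() => new Set(historyTaskIds.value))

watch(finishedJobIds, (finished) => {
  for (const jobId of nodeProgressStatesByJob.keys()) {
    if (finished.has(jobId)) {
      // Same per-job cleanup the eviction helper performs today, reactive trigger.
      nodeProgressStatesByJob.delete(jobId)
    }
  }
})
```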
Fair — the imperative evictTerminalJob / reconcileTerminalJobs path I added is fundamentally an external "go scrub stale state" pattern, which is the opposite of how the rest of executionStore works (events come in, state mutates locally). Reactive design is cleaner.
Outline of the redesign as I read your comment:
-
Tag progress data with promptId at write time.
nodeProgressStatesByJob already does this implicitly via the outer key, and NodeProgressState.prompt_id is on every entry. So the data is already promptId-aware; we don't need to add storage.
Expose
finishedJobsreactively fromqueueStore(or a new derived store). Computed fromhistoryTasks(the source of truth for terminal state), keyed by jobId, so consumers can subscribe to "this jobId just became terminal" without writing imperative reconciliation loops. -
Watch from
executionStore(oruseNodeProgressState). A single watcher onfinishedJobs— when a tracked promptId enters that set, run the same per-job cleanup thatevictTerminalJobdoes today (delete fromnodeProgressStatesByJob, conditionally clear mirror, clear preview, clear initializing). Same eviction primitive, reactive trigger.
This collapses two of the things people complained about in the current PR:
- No more "polling fires
reconcileTerminalJobswhich iterates all tracked jobs" — just a watch on the derived diff - No more
queueStore.update()knowing aboutexecutionStore's internals
I'll move this work to a new PR with that shape (per your split suggestion). Will keep the evictTerminalJob logic in spirit but expose it as the watcher's reaction rather than as a polling-callable export. Tests will cover: terminal arrival via WS (current happy path), terminal arrival only via history (the dropped-WS recovery case), and the multi-job-cross-workflow permutations.
I sort of want to separate this into two PRs:
- First PR just for adding the workflow_id in the schema and adding the logic to filter progress events and UI side effects by workflow_id. Add tests e2e and unit and permutations
- Second PR for the eventual consistency fallback handling (basically the queue polling eventual consistency backup but applied to progress state messages). Also needs e2e, unit, and permutation tests. Of course, apply my review comments when you move that code to a new PR.
Agreed — splitting cleanly along that line makes sense, and would have been the right call from the start. Plan:
PR 1 (this branch, rescope): schema-level workflow_id addition + active-workflow gating in executionStore + tab-switch mirror reconciliation. Drop the polling/eviction commits. Tests stay focused on gating semantics.
PR 2 (new branch): queue-polling eventual-consistency fallback for stuck progress, redesigned per your other comment (#3183431959 — store promptId with progress state, expose finishedJobs reactively from queue/history store, watch from execution store). E2E + permutation tests included.
Going to wait for confirmation from the team before rewriting branch history (this PR has been through several review rounds and has people referencing specific commits). If you'd prefer a clean redo, happy to close this and open the two new PRs from scratch.
PR Created by the Glary-Bot Agent
Summary
Fixes the bug where node-level progress UI stays stuck on a node after the job has finished (Partner Nodes, observed multiple times in #bug-dump), and prevents cross-workflow progress leakage where a job from one open workflow tab updates the progress UI of another.
Problem
Two independent issues both contribute to the user-visible stuck-progress bug:
- nodeProgressStates and nodeProgressStatesByJob[jobId] are only cleared from handleExecutionSuccess / handleExecutionError / handleExecutionInterrupted. The Cloud backend broadcasts these once and does not retry, so any drop leaves the per-node progress bar / time-elapsed counter pinned. queueStore.update() polling sees the job move Running → History (Job Queue and Assets panel update correctly) but never reconciles node-level state.
- handleProgressState and handleProgress overwrite the global nodeProgressStates mirror and _executingNodeProgress unconditionally, regardless of which open workflow initiated the job. Switching tabs while a job is in flight bleeds progress into the wrong workflow.
Changes
- executionStore.reconcileTerminalJobs(active, terminal) — new polling-driven reconciliation. Walks tracked jobs, evicts any job that is in the terminal set and not in the active set. Internally uses evictTerminalJob, which always removes per-job state but only clears the global mirror / activeJobId / _executingNodeProgress when the evicted job actually owns them (mirror ownership identified via prompt_id on NodeProgressState, since the mirror is replaced wholesale per message). Idempotent with the WS terminal handlers.
- queueStore.update() — calls reconcileTerminalJobs whenever history is non-empty, regardless of whether the queue is empty. The "last active job finishes" case legitimately produces empty Running/Pending plus the job in history, and that is the most common path we need to recover from.
- useQueuePolling — activates while activeJobsCount > 0 instead of === 1, so recovery still happens when one job's terminal is dropped while another is pending/running.
- handleProgressState and handleProgress — gate writes to the global mirror / _executingNodeProgress by active workflow. Resolution order: workflow_id on the WS message → jobIdToWorkflowId → jobIdToSessionWorkflowPath → fall through to legacy behaviour. Per-job state (nodeProgressStatesByJob) is unconditionally written either way.
- apiSchema.ts — adds optional workflow_id to progress, progress_state, executing, executed, progress_text, execution_start, execution_success, execution_cached, execution_interrupted, execution_error, and to NodeProgressState. Optional so the schema is correct both before and after the Cloud deploy that adds the field; gating uses it the moment it arrives.
Why this approach
- resetExecutionState is unsafe to call for non-active jobs — it unconditionally clears the global mirror and activeJobId, which would clobber a different running job. evictTerminalJob is the polling-safe variant.
- Active-workflow gating is implemented with the existing jobIdToWorkflowId / jobIdToSessionWorkflowPath mappings rather than a UI-wide refactor of all 8 consumers of the global mirror. The gate keeps the mirror surface stable while eliminating the leakage at the only places it's written.
Edge cases
- Eviction is idempotent (activeJobId already null, delete of missing key, etc.).
- A job seen in both Running and History in the same snapshot (race window between getQueue and getHistory) → guarded by if (activeJobIds.has(jobId)) continue.
- Reconciliation reads historyTasks from the same fetch, so jobs rotated out of the bounded window still get LRU-evicted from nodeProgressStatesByJob at 1000 entries.
Tests
- useExecutionStore - reconcileTerminalJobs: 6 cases covering active vs non-active eviction, stale-mirror cleanup, race guard, missing-from-terminal skip, idempotency.
- useExecutionStore - active workflow gating of progress mirror: 9 cases covering workflow_id match/mismatch on both progress_state and progress events, both fallback resolution paths, default-to-legacy when unresolvable, no-active-workflow short-circuit, and per-job state always updating.
- useQueueStore - history reconciliation: 2 cases asserting reconcileTerminalJobs runs when queue is empty + history non-empty, and is skipped when history is empty.
- useQueuePolling: existing > 1 test inverted to assert polling now runs.
- pnpm test:unit on the changed stores → 139 passed, 1 skipped.
- pnpm typecheck, pnpm lint, pnpm format, pnpm knip → clean.
Manual verification
The bug is a race in store reconciliation triggered by dropped WebSocket terminal messages from the Cloud backend. There is no UI control to trigger this locally — reproducing it requires intercepting execution_success / execution_error / execution_interrupted at the network/realtime-broadcast layer, which is exactly the failure mode the unit tests simulate directly via mocked progress_state events without a corresponding terminal event. The 17 new unit tests directly exercise the dropped-message scenario, the multi-workflow leakage scenario, and the recovery path.
Out of scope
- workflow_id propagation in core ComfyUI — tracked in BE-672.
- Migrating the mirror's consumers to read nodeProgressStatesByJob[activeJobId]. The gate already prevents leakage; that refactor can land separately if desired.
workflow_idpropagation in core ComfyUI — tracked in BE-672.nodeProgressStatesByJob[activeJobId]. The gate already prevents leakage; that refactor can land separately if desired.┆Issue is synchronized with this Notion page by Unito