fix(ui): stabilize workflow node execution state updates#9029
fix(ui): stabilize workflow node execution state updates#9029JPPhoto wants to merge 4 commits intoinvoke-ai:mainfrom
Conversation
23fae17 to
894df8a
Compare
057032b to
780663a
Compare
36894df to
f2068cf
Compare
There was a problem hiding this comment.
It's a funny coincidence that I noticed the doubled output just a few days ago on my own, and was puzzled by it, not knowing whether it was a feature or a bug.
In any case, I'm having difficulty testing this PR. I set up the very simple integer operation workflow shown below. When I run it, the nodes go into permanent pending state and continue showing pending even after cancelling. On the other hand, when using an image generation workflow that previously doubled the output, the doubled output is gone. However, the nodes at the very beginning and end of the workflow get stuck in PENDING. Is there something I'm doing wrong?
(Yes, I rebuilt the front end)
I think there are multiple issues here and this PR addresses double results while #9043 addresses status issues. Can you locally merge #9043 into your checkout and see if everything is better with both applied? |
e0b1e7d to
4afe259
Compare
Will do. I'm traveling for a bunch of business meetings this week so it may be a couple days before I get back to this, but I'm anxious to get it pushed through. |
|
I've created a branch that contains both #9029 and #9043 . However the problem of stuck workflows persists. When I create the simplest workflow of them all, a single Add Integers node, and press the invoke button, about 90% of the time it gets stuck in pending state. If I create a slightly more complex workflow, such as feeding the Add Integers output into Integer Range of Size, the workflow completes about 80% of the time and get stuck in PENDING the rest of the time. This suggests to me that there is still a race condition of some sort. Let me know if testing this the wrong way. |
4afe259 to
e65a67f
Compare
|
I think I have the issue isolated. The node's initial execution state has not been put into |
0af9f1f to
46cc494
Compare
46cc494 to
814a95b
Compare
lstein
left a comment
There was a problem hiding this comment.
I tested it out with several workflows that were previously returning doubled output, and this PR fixed the issue. I performed a Claude code review and it flagged a couple of minor bugs that I thought were worth your attention:
- completedInvocationKeys grows without bound (likely memory leak) — setEventListeners.tsx:73
The replaced cache was LRUCache<number, boolean>({ max: 100 }). The new Set has no bound and lives for the full lifetime of
setEventListeners (the whole socket session). Every completed invocation in the session adds a string that's never removed. In a long-running
session with many runs, this accumulates forever.
Suggested fixes (any one):
- Use LRUCache<string, boolean>({ max: <e.g.> 1000 }) to match the prior pattern.
- Clear keys in the queue_item_status_changed handler on the completed | failed | canceled branch — you already know all invocations belonging to
item_id are done. A Map<number, Set> keyed by item_id makes this cleanup O(1). - At minimum, clear the Set alongside the existing $nodeExecutionStates reset on status === 'in_progress' (defensible if you believe the concern
is only within-run ordering).
- Asymmetric handling between invocation_complete and the other three invocation events — setEventListeners.tsx:111, 128, 161, 178
invocation_started, invocation_progress, and invocation_error still early-return on finishedQueueItemIds.has(data.item_id). invocation_complete no
longer does. This is intentional but subtle: if queue_item_status_changed(failed) arrives before a late invocation_error, the error event is now
silently dropped and the node may be left stuck in IN_PROGRESS. Since this PR's whole theme is hardening against out-of-order events, consider
also removing the finishedQueueItemIds early-return from invocation_error so the error helper can still populate the node's terminal state.
- Finally, there is a cosmetic issue: the test file for
nodeExecutionState.tsis namednodeExecutionStateHelpers.test.ts., but the pattern elsewhere would be to name itnodeExecutionState.test.ts. I assume at one point the code file was named...Helpers.tsand was renamed.
|
@lstein Thanks, I think it's hardened now and also won't grow without bounds. The test has been renamed to match the repo convention. |
Summary
Fixes workflow node execution state updates in the frontend event layer.
This change fixes nodes getting stuck in
IN_PROGRESSor showing duplicate outputs when socket events arrive out of order or are repeated. The fix moves the event-ordering logic into shared helpers and uses a listener-local completed-invocation key set so lateinvocation_started/invocation_progressevents cannot overwrite a completed node state.Related Issues / Discussions
QA Instructions
main, run a workflow in the Workflow Editor and examine the Outputs pane for a node that executes. You should see two outputs even when the node is executed once.pnpm vitest runto run regression tests.Merge Plan
Checklist
What's Newcopy (if doing a release after this PR)