fix(ui): stabilize workflow node execution state updates#9029

Open
JPPhoto wants to merge 4 commits into invoke-ai:main from JPPhoto:workflow-node-execution-event-ordering

JPPhoto (Collaborator) commented Apr 8, 2026

Summary

Fixes workflow node execution state updates in the frontend event layer.

This change fixes nodes getting stuck in IN_PROGRESS or showing duplicate outputs when socket events arrive out of order or are repeated. The fix moves the event-ordering logic into shared helpers and uses a listener-local completed-invocation key set so late invocation_started / invocation_progress events cannot overwrite a completed node state.
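The guard described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the type shapes, the `createInvocationListeners` factory, and the `item_id:invocation_source_id` key format are all assumptions made for the example.

```typescript
// Hypothetical, simplified shapes; the real event payloads and state carry more fields.
type NodeStatus = 'PENDING' | 'IN_PROGRESS' | 'COMPLETED';

interface InvocationEvent {
  item_id: number; // queue item the invocation belongs to
  invocation_source_id: string; // id of the workflow node
}

interface NodeExecutionState {
  status: NodeStatus;
  outputs: unknown[];
}

// Assumed key format: one key per (queue item, node) pair.
const toInvocationKey = (e: InvocationEvent): string => `${e.item_id}:${e.invocation_source_id}`;

function createInvocationListeners(nodeIds: string[]) {
  const states = new Map<string, NodeExecutionState>();
  for (const id of nodeIds) {
    states.set(id, { status: 'PENDING', outputs: [] });
  }
  // Lives as long as the listeners do, like the listener-local set in the summary.
  const completedKeys = new Set<string>();

  return {
    states,
    onStarted(e: InvocationEvent): void {
      const state = states.get(e.invocation_source_id);
      // A late or repeated "started" must not reopen a completed node.
      if (!state || completedKeys.has(toInvocationKey(e))) {
        return;
      }
      state.status = 'IN_PROGRESS';
    },
    onComplete(e: InvocationEvent, output: unknown): void {
      const state = states.get(e.invocation_source_id);
      const key = toInvocationKey(e);
      // A repeated "complete" must not append a second output.
      if (!state || completedKeys.has(key)) {
        return;
      }
      completedKeys.add(key);
      state.status = 'COMPLETED';
      state.outputs.push(output);
    },
  };
}
```

With this shape, replaying `invocation_complete` or delivering `invocation_started` after completion leaves the node in `COMPLETED` with a single output.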

Related Issues / Discussions

QA Instructions

  1. On main, run a workflow in the Workflow Editor and examine the Outputs pane for a node that executes. You should see two outputs even when the node is executed once.
  2. After pulling and building (or running in dev mode), open the Workflow Editor and run a workflow with visible node progress.
    • Confirm nodes transition from pending to in progress to completed.
    • Confirm completed nodes do not revert back to in progress during or after the run.
    • Confirm the Outputs pane does not show duplicate outputs for a single node execution.
  3. Execute pnpm vitest run to run regression tests.

Merge Plan

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

JPPhoto added the frontend label on Apr 8, 2026
JPPhoto force-pushed the workflow-node-execution-event-ordering branch 9 times, most recently from 23fae17 to 894df8a, on April 14, 2026 00:39
lstein self-assigned this on Apr 14, 2026
lstein added the v6.13.x label on Apr 14, 2026
lstein moved this to 6.13.x Theme: MODELS in Invoke - Community Roadmap on Apr 14, 2026
JPPhoto force-pushed the workflow-node-execution-event-ordering branch 12 times, most recently from 057032b to 780663a, on April 20, 2026 21:02
JPPhoto force-pushed the workflow-node-execution-event-ordering branch 2 times, most recently from 36894df to f2068cf, on April 21, 2026 00:40

lstein (Collaborator) left a comment

It's a funny coincidence that I noticed the doubled output just a few days ago on my own, and was puzzled by it, not knowing whether it was a feature or a bug.

In any case, I'm having difficulty testing this PR. I set up the very simple integer operation workflow shown below. When I run it, the nodes go into permanent pending state and continue showing pending even after cancelling. On the other hand, when using an image generation workflow that previously doubled the output, the doubled output is gone. However, the nodes at the very beginning and end of the workflow get stuck in PENDING. Is there something I'm doing wrong?

(Yes, I rebuilt the front end)

Image

JPPhoto (Collaborator, Author) commented Apr 21, 2026

I think there are multiple issues here and this PR addresses double results while #9043 addresses status issues. Can you locally merge #9043 into your checkout and see if everything is better with both applied?

JPPhoto force-pushed the workflow-node-execution-event-ordering branch from e0b1e7d to 4afe259 on April 21, 2026 02:00

lstein (Collaborator) commented Apr 21, 2026

Will do. I'm traveling for a bunch of business meetings this week so it may be a couple days before I get back to this, but I'm anxious to get it pushed through.

lstein (Collaborator) commented Apr 21, 2026

I've created a branch that contains both #9029 and #9043. However, the problem of stuck workflows persists.

When I create the simplest workflow of them all, a single Add Integers node, and press the invoke button, about 90% of the time it gets stuck in the PENDING state. If I create a slightly more complex workflow, such as feeding the Add Integers output into Integer Range of Size, the workflow completes about 80% of the time and gets stuck in PENDING the rest of the time.

This suggests to me that there is still a race condition of some sort. Let me know if I'm testing this the wrong way.

lstein (Collaborator) commented Apr 21, 2026

Oh, wait, I merged #9042, not #9043. Trying that as well.

No, the problem persists. This is with all three PRs (#9029, #9042 and #9043) merged into a local branch. I also note that the single-node Add Integers workflow sometimes appears to run to completion but produces no output.

JPPhoto force-pushed the workflow-node-execution-event-ordering branch from 4afe259 to e65a67f on April 22, 2026 00:56

JPPhoto (Collaborator, Author) commented Apr 22, 2026

I think I have the issue isolated. The node's initial execution state has not been put into $nodeExecutionStates before the first socket event for that node arrives. For very fast workflows, invocation_started, invocation_progress, or even invocation_complete can win that race, and the handler was previously dropping the event because there was no existing execution-state entry to update. Try this PR again now and see if that resolves the problem.
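A minimal sketch of that race, assuming a simplified state shape: the first function mirrors the old behavior of dropping an event when no entry exists yet, and the second falls back to a fresh initial state. The function names and the `NodeState` type are hypothetical and only illustrate the idea.

```typescript
// Hypothetical, simplified state; the real $nodeExecutionStates entries carry more fields.
type NodeStatus = 'PENDING' | 'IN_PROGRESS' | 'COMPLETED';

interface NodeState {
  nodeId: string;
  status: NodeStatus;
  outputs: unknown[];
}

type NodeStates = Readonly<Record<string, NodeState>>;

const createInitialState = (nodeId: string): NodeState => ({
  nodeId,
  status: 'PENDING',
  outputs: [],
});

// Old behavior (as described): no entry yet means the event is silently lost.
function applyStartedDropping(states: NodeStates, nodeId: string): NodeStates {
  const existing = states[nodeId];
  if (!existing) {
    return states;
  }
  return { ...states, [nodeId]: { ...existing, status: 'IN_PROGRESS' } };
}

// Hardened behavior: create the entry on demand so an early event still lands.
function applyStartedUpserting(states: NodeStates, nodeId: string): NodeStates {
  const existing = states[nodeId] ?? createInitialState(nodeId);
  return { ...states, [nodeId]: { ...existing, status: 'IN_PROGRESS' } };
}
```

The same fallback would apply to the progress and complete handlers, so a very fast node whose first event beats state initialization still ends up with an entry.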

JPPhoto force-pushed the workflow-node-execution-event-ordering branch from 0af9f1f to 46cc494 on April 22, 2026 01:32
JPPhoto force-pushed the workflow-node-execution-event-ordering branch from 46cc494 to 814a95b on April 23, 2026 15:58

lstein (Collaborator) left a comment

I tested it out with several workflows that were previously returning doubled output, and this PR fixed the issue. I performed a Claude code review and it flagged a couple of minor bugs that I thought were worth your attention:

  1. completedInvocationKeys grows without bound (likely memory leak) — setEventListeners.tsx:73

The replaced cache was LRUCache<number, boolean>({ max: 100 }). The new Set has no bound and lives for the full lifetime of setEventListeners (the whole socket session). Every completed invocation in the session adds a string that's never removed. In a long-running session with many runs, this accumulates forever.

Suggested fixes (any one):

  • Use LRUCache<string, boolean>({ max: <e.g.> 1000 }) to match the prior pattern.
  • Clear keys in the queue_item_status_changed handler on the completed | failed | canceled branch: you already know all invocations belonging to item_id are done. A Map<number, Set> keyed by item_id makes this cleanup O(1).
  • At minimum, clear the Set alongside the existing $nodeExecutionStates reset on status === 'in_progress' (defensible if you believe the concern is only within-run ordering).
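The second option above might look something like this. It is a sketch only: the `CompletedInvocationTracker` class and its method names are hypothetical, and the inner set is assumed to hold node ids as strings.

```typescript
// Hypothetical helper: completed-invocation keys grouped by queue item,
// so everything for one item can be dropped in a single Map.delete call.
class CompletedInvocationTracker {
  private readonly nodeIdsByItem = new Map<number, Set<string>>();

  // Returns true the first time a (queue item, node) pair completes, false on repeats.
  markCompleted(itemId: number, nodeId: string): boolean {
    let nodeIds = this.nodeIdsByItem.get(itemId);
    if (!nodeIds) {
      nodeIds = new Set<string>();
      this.nodeIdsByItem.set(itemId, nodeIds);
    }
    if (nodeIds.has(nodeId)) {
      return false;
    }
    nodeIds.add(nodeId);
    return true;
  }

  isCompleted(itemId: number, nodeId: string): boolean {
    return this.nodeIdsByItem.get(itemId)?.has(nodeId) ?? false;
  }

  // Intended to be called when the queue item reaches a terminal status
  // (completed, failed, or canceled).
  releaseItem(itemId: number): void {
    this.nodeIdsByItem.delete(itemId);
  }

  get trackedItemCount(): number {
    return this.nodeIdsByItem.size;
  }
}
```

One trade-off to keep in mind: once an item is released, a very late duplicate event for it is no longer recognized, so this approach presumably needs to be paired with a separate finished-item check.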
  2. Asymmetric handling between invocation_complete and the other three invocation events (setEventListeners.tsx:111, 128, 161, 178)

invocation_started, invocation_progress, and invocation_error still early-return on finishedQueueItemIds.has(data.item_id). invocation_complete no longer does. This is intentional but subtle: if queue_item_status_changed(failed) arrives before a late invocation_error, the error event is now silently dropped and the node may be left stuck in IN_PROGRESS. Since this PR's whole theme is hardening against out-of-order events, consider also removing the finishedQueueItemIds early-return from invocation_error so the error helper can still populate the node's terminal state.
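To make the ordering problem concrete, here is a small sketch under assumed, simplified types. The `skipFinishedItems` flag exists only to contrast the two behaviors side by side; it is not something the PR is known to contain, and the payload and state shapes are hypothetical.

```typescript
// Hypothetical, simplified shapes for illustration only.
type NodeStatus = 'PENDING' | 'IN_PROGRESS' | 'COMPLETED' | 'FAILED';

interface InvocationErrorEvent {
  item_id: number;
  invocation_source_id: string;
  error_message: string;
}

interface NodeState {
  status: NodeStatus;
  error: string | null;
}

function handleInvocationError(
  states: Map<string, NodeState>,
  finishedQueueItemIds: Set<number>,
  e: InvocationErrorEvent,
  skipFinishedItems: boolean
): void {
  // With the early return, an error that arrives after the queue item was
  // marked finished never reaches the node.
  if (skipFinishedItems && finishedQueueItemIds.has(e.item_id)) {
    return;
  }
  const state = states.get(e.invocation_source_id);
  // Never downgrade a node that already completed.
  if (!state || state.status === 'COMPLETED') {
    return;
  }
  state.status = 'FAILED';
  state.error = e.error_message;
}
```

With `skipFinishedItems` set to `true`, a node that was `IN_PROGRESS` when the queue item failed stays `IN_PROGRESS`; with `false`, the late error still moves it to `FAILED`.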

  3. Finally, there is a cosmetic issue: the test file for nodeExecutionState.ts is named nodeExecutionStateHelpers.test.ts, but the pattern elsewhere would be to name it nodeExecutionState.test.ts. I assume at one point the code file was named ...Helpers.ts and was renamed.

JPPhoto requested a review from lstein on April 25, 2026 00:21

JPPhoto (Collaborator, Author) commented Apr 25, 2026

@lstein Thanks, I think it's hardened now and also won't grow without bounds. The test has been renamed to match the repo convention.

Labels

frontend (PRs that change frontend files), v6.13.x

Projects

Status: 6.13.x Theme: MODELS

2 participants