Skip to content

ci(windows): propagate cmake/ninja build failures out of the Build step#5947

Merged
Fedr merged 2 commits intomasterfrom
ci/fix-windows-build-step-exit
Apr 21, 2026
Merged

ci(windows): propagate cmake/ninja build failures out of the Build step#5947
Fedr merged 2 commits intomasterfrom
ci/fix-windows-build-step-exit

Conversation

@Fedr
Copy link
Copy Markdown
Contributor

@Fedr Fedr commented Apr 21, 2026

Problem

On run 24691107383, msvc-2019 Release CMake job 72213000966, the `Build` step reported a green `success` despite ninja saying:

[775/895] Linking CXX shared library bin\MRViewer.dll
FAILED: [code=4294967295] bin/MRViewer.dll bin/MRViewer.lib
cpr.lib(util.cpp.obj) : error LNK2019: unresolved external symbol __std_find_trivial_1 ...
cpr.lib(session.cpp.obj) : error LNK2001: unresolved external symbol __std_find_trivial_1
cpr.lib(util.cpp.obj) : error LNK2019: unresolved external symbol __std_find_first_of_trivial_pos_1 ...
cpr.lib(threadpool.cpp.obj) : error LNK2019: unresolved external symbol _Cnd_timedwait_for_unchecked ...
bin\MRViewer.dll : fatal error LNK1120: 3 unresolved externals
...
ninja: build stopped: subcommand failed.

After that, the script ran `xcopy` and exited 0. GitHub Actions marked the step `success`. The next step, `Generate and build Python bindings`, then tried to link `mrcudapy.pyd` against `MRCuda.lib` — which was never produced because MRCuda's link target was downstream of the failed MRViewer target — and failed with a confusing

lld-link: error: could not open 'MRCuda.lib': No such file or directory

…67 seconds after the real failure the Build step should have reported. Debugging that error points at CUDA / MRCuda / mrbind, none of which is the actual problem.

Root cause

The `Build` step used an `if errorlevel 1 exit 1` check after each command inside a parenthesized IF block in a `shell: cmd` batch script:

if 1==1 (
    cmake --build source\TempOutput -j
    if errorlevel 1 exit 1
    ...
)

`if errorlevel 1` inside a parenthesized IF block in cmd.exe has well-known quirks where it can fail to catch exit codes from commands earlier in the same block (related to cmd.exe's parse-time expansion / single-pass parsing of parenthesized constructs). The check happened not to trigger for cmake/ninja's non-zero exit in this run; the script fell through to the next command.

The pattern had been in place for a while but was likely never exercised by a ninja-level build failure before this PR's unusual combination (#5932's overlay cascaded an ABI change through cpr → freetype → MRViewer, exposing a latent MSVC STL/cpr compatibility issue in the msvc-2019/vcpkg-2024.10.21 matrix entry, which triggered a link failure that the error-propagation pattern couldn't catch).

Fix

Replace each `if errorlevel 1 exit 1` with `|| exit /b 1` chained directly to the preceding command. `||` is evaluated at process-exit time rather than via cmd.exe's errorlevel-in-parens parsing, and is the documented idiom for robust error propagation in batch files.

-  cmake --build source\TempOutput -j
-  if errorlevel 1 exit 1
+  cmake --build source\TempOutput -j || exit /b 1

Applied to all six commands inside the CMake branch of the IF (VsDevCmd, cmake --version, cmake -B, cmake --build, mkdir, xcopy) and to the msbuild command in the ELSE branch.

Functional change

  • When the build succeeds: no behavior change. `|| exit /b 1` evaluates to no-op when the left side returns 0.
  • When any build command fails: the step now exits with code 1 at the point of failure, and GitHub Actions reports the step as `failure` with the log pointing at the real cause. Previously, most failures would still be caught (`if errorlevel 1` is not universally broken), but the subset that the parse-time-expansion quirks miss — like the one observed here — will now be caught reliably.

Not in this PR

  • Other steps in the same workflow use the same `if errorlevel 1 exit 1` pattern inside `shell: cmd` blocks (`Build MRBind`, `Generate C bindings`, `Generate and build Python bindings`, `Install MSYS2 for MRBind`, etc.). They may have the same latent bug. Left alone here to keep the PR scoped to the one step with a concrete reproducer. A follow-up PR could audit all of them.
  • The MSVC STL vs cpr incompatibility that caused the MRViewer link failure in the first place is a different issue; this PR doesn't address it, only the error-propagation problem it exposed. The underlying MRViewer link failure is a separate discussion (likely: bump msvc-2019's vcpkg-version off `2024.10.21`).

Rollback

Single-file revert of `.github/workflows/build-test-windows.yml`.

The Build step used an `if errorlevel 1 exit 1` check after each
command inside a parenthesized IF block in a cmd.exe batch script.
On windows-2025 runners, that check was failing to catch
`cmake --build` / ninja failures — observed in run 24691107383,
msvc-2019 Release CMake: ninja reported

    ninja: build stopped: subcommand failed.

(MRViewer.dll link failed with LNK1120 unresolved externals from
cpr.lib), but the `if errorlevel 1 exit 1` after `cmake --build`
didn't fire, so the script went on to `xcopy` and exited 0. GitHub
Actions marked the step as `success` even though MRCuda.lib and
other targets were never produced. The very next step,
`Generate and build Python bindings`, tried to link `mrcudapy.pyd`
against `MRCuda.lib` and failed with `lld-link: error: could not
open 'MRCuda.lib': No such file or directory` — a confusing error
67 seconds after the real failure the step should have reported.

`if errorlevel 1` inside parenthesized IF blocks in cmd.exe has
well-known parse-time expansion quirks that can cause it to miss
exit codes that were set by commands earlier in the same block.
Replace the pattern with `|| exit /b 1` chained directly onto each
command. `||` is evaluated at process-exit time and doesn't depend
on errorlevel-in-parens semantics. This is the documented idiom
for robust error propagation in cmd.exe batch files.

Functional change: the step now fails fast with the first
command's non-zero exit, instead of running to completion and
reporting success regardless. Applies to all three Windows matrix
entries (msvc-2019, msvc-2022 Debug, msvc-2022 Release). Does not
change behavior when the build succeeds.

Other steps in the same workflow (`Build MRBind`, `Generate C
bindings`, `Generate and build Python bindings`) use similar cmd
batch patterns and may have the same latent bug; left alone here
to keep the PR scoped to the Build step that has a concrete
reproducer. Worth auditing separately.
@Fedr Fedr requested a review from Grantim April 21, 2026 12:03
Comment on lines +191 to +200
# Error handling: use `|| exit /b 1` chained to each command instead
# of the `if errorlevel 1 exit 1` pattern. The latter was failing to
# propagate `cmake --build` / ninja failures out of the step on
# windows-2025 runners; in a run where ninja reported
# `ninja: build stopped: subcommand failed` the step still exited 0
# and went on to `xcopy`, which hid the real failure and caused the
# next step's linker error for a nonexistent .lib. `|| exit /b 1`
# is evaluated at process-exit time rather than via the
# errorlevel-in-parens parsing quirks of cmd.exe IF blocks, and is
# the documented idiom for robust error propagation in batch files.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment seems excessively large, can it be simplified?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shortened to three lines in fb6f4ca.

@Fedr Fedr merged commit eb56fb9 into master Apr 21, 2026
11 of 27 checks passed
@Fedr Fedr deleted the ci/fix-windows-build-step-exit branch April 21, 2026 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants