ci(windows): add DLL-not-found diagnostic steps #5966

Merged
Fedr merged 2 commits into master from ci/windows-dll-diagnostics
Apr 24, 2026

Fedr commented Apr 23, 2026

Summary

Add two unconditional diagnostic steps to build-test-windows.yml so that STATUS_DLL_NOT_FOUND failures at MRTest / MeshViewer launch become self-diagnosing from the CI log.

Motivation

PR #5959 (zlib-ng) run 24802707812 failed on the msvc-2019 Debug CMake x64-windows-meshlib-iterator-debug leg with

##[error]Process completed with exit code -1073741515.

Read as an unsigned 32-bit value, that is 0xC0000135 (STATUS_DLL_NOT_FOUND): MeshViewer.exe couldn't find a DLL at launch, and the Windows loader aborted before main(). The log contained no information about which DLL was missing. The same run succeeded on three other Windows legs, implicating the iterator-debug triplet's DLL-copy chain, but diagnosis meant recalling vcpkg install logs from memory and guessing at DLL-name conventions. This PR makes that data explicit on every Windows build, so the next occurrence of this failure mode identifies the root cause without re-running CI or SSHing into a runner.
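The signed exit code in the log and the hexadecimal status are the same 32-bit pattern. A minimal Python sketch of the conversion follows; the helper name `to_ntstatus` is hypothetical and not part of this PR.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit process exit code as an unsigned hex status string."""
    # Masking with 0xFFFFFFFF maps a negative value onto its 32-bit two's-complement form.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073741515))  # prints 0xC0000135
```

The same masking works for any other negative exit code a Windows job reports, which makes it quick to look the value up in the NTSTATUS tables.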

The steps

Both diagnostic bodies live in scripts/diagnostics/*.ps1; the workflow just invokes them.

1. After Vcpkg integrate install — vcpkg install tree inventory

- name: Diagnostic — vcpkg installed tree
  if: ${{ always() }}
  shell: pwsh
  continue-on-error: true
  run: ./scripts/diagnostics/windows-vcpkg-tree.ps1 -Triplet "${{ matrix.vcpkg_triplet || 'x64-windows-meshlib' }}"

scripts/diagnostics/windows-vcpkg-tree.ps1 lists every file under C:\vcpkg\installed\<triplet>\{bin,debug\bin,lib,debug\lib}. Answers: did vcpkg install the package, and under what filenames? (e.g. distinguishing libz-ng.dll vs libz-ngd.dll, or libz-ng.lib vs zlib-ng.lib.)
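For readers who want to reproduce the inventory locally, here is a rough Python sketch of the same idea. It is not the PR's PowerShell script: the function name `list_vcpkg_tree` is hypothetical, and it assumes that only the direct children of the four subdirectories matter.

```python
from pathlib import Path

# The four subdirectories named in the PR description.
SUBDIRS = ("bin", "debug/bin", "lib", "debug/lib")


def list_vcpkg_tree(installed_root: str, triplet: str) -> dict[str, list[str]]:
    """Return sorted file names per subdirectory; an absent subdirectory maps to []."""
    base = Path(installed_root) / triplet
    inventory = {}
    for sub in SUBDIRS:
        folder = base / sub
        if folder.is_dir():
            # Assumption: only direct children are listed, not nested folders.
            inventory[sub] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            inventory[sub] = []
    return inventory
```

Calling `list_vcpkg_tree(r"C:\vcpkg\installed", "x64-windows-meshlib")` on a runner would return one list per subdirectory, which is enough to tell libz-ng.dll from libz-ngd.dll at a glance.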

2. Right before Run Start-and-Exit Tests — output bin inventory + import tables + PATH

- name: Diagnostic — output bin DLL inventory + MRMesh imports
  if: ${{ always() }}
  shell: pwsh
  continue-on-error: true
  working-directory: source\x64\${{ matrix.config }}
  run: ${{ github.workspace }}\scripts\diagnostics\windows-bin-imports.ps1 -VcPath "${{ matrix.vc-path }}"

scripts/diagnostics/windows-bin-imports.ps1 does three things in the current working directory:

  • lists all .dll / .exe next to MeshViewer.exe (did applocal copy succeed?)
  • runs dumpbin /dependents on MeshViewer.exe, MRMesh.dll, MRTest.exe (which DLL names does the loader want?)
  • prints $env:PATH entry-by-entry (is the vcpkg install dir in the loader's search path?)
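The first and third checks combine into one question: where, if anywhere, would a given DLL be found? The Python sketch below illustrates that lookup under a deliberate simplification. It checks only the application directory and then each PATH entry, whereas the real Windows loader also consults system directories and other locations. The function name `locate_dll` is hypothetical, and the DLL names are assumed to have been read from the `dumpbin /dependents` output beforehand.

```python
import os
from pathlib import Path


def locate_dll(name, app_dir, path_value, sep=os.pathsep):
    """Return the first directory holding `name`, checking app_dir before PATH entries."""
    # Empty PATH entries (e.g. from a doubled separator) are skipped.
    candidates = [app_dir] + [entry for entry in path_value.split(sep) if entry]
    for directory in candidates:
        if (Path(directory) / name).is_file():
            return directory
    # Not found in the simplified search order.
    return None
```

A result of `None` for a name that the import table lists is the situation that ends in STATUS_DLL_NOT_FOUND. On Windows the file-system lookup is case-insensitive, so name casing does not matter there.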

Cross-referencing these with step 1's vcpkg tree makes the diagnosis unambiguous:

  • If the DLL exists in vcpkg's tree and is named what the loader wants, but isn't in the output dir and vcpkg's bin isn't on PATH, it's an applocal-copy problem.
  • If the DLL name in the import table doesn't match what vcpkg installed, it's a naming-convention issue.
  • If the DLL is entirely missing from vcpkg's tree, it's an install-time problem.
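The three outcomes described here can be written as a small decision function. The following Python sketch is illustrative only. The function name `classify` and the "loadable" result are hypothetical, and the near-match test via `difflib` is an assumed heuristic for spotting a naming mismatch such as libz-ng.dll versus libz-ngd.dll; the PR leaves that judgment to the person reading the log.

```python
import difflib


def classify(dll, vcpkg_files, output_files, vcpkg_bin_on_path):
    """Map the diagnostic observations onto the three failure classes from the PR text."""
    # Windows file names are case-insensitive, so compare lowercased names.
    wanted = dll.lower()
    vcpkg = {f.lower() for f in vcpkg_files}
    output = {f.lower() for f in output_files}

    if wanted in output or (wanted in vcpkg and vcpkg_bin_on_path):
        return "loadable"
    if wanted in vcpkg:
        return "applocal-copy problem"
    # Assumption: a near-identical installed name signals a naming-convention mismatch.
    if difflib.get_close_matches(wanted, sorted(vcpkg), n=1, cutoff=0.8):
        return "naming-convention issue"
    return "install-time problem"
```

For example, `classify("libz-ng.dll", ["libz-ng.dll"], [], False)` falls into the applocal-copy case, which matches the failure observed on the iterator-debug leg.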

Safety

  • if: always() — runs even when preceding steps failed. That's when you need it most; a Build step that crashed pre-link still has a partial output dir we'd want to inspect.
  • continue-on-error: true — a bug in the diagnostic step itself (e.g. dumpbin not found, pwsh syntax regression) can't turn a green run red.
  • Unconditional on all Windows legs — no if: matrix.foo == gating. Cheap enough to always run (~30 lines when healthy), and having the healthy-leg output makes diff-diagnosis trivial when one leg regresses.

Cost

~30 lines of pwsh output per Windows job when everything works; ~100 lines when a DLL is missing. Negligible relative to the multi-GB Windows CI logs already emitted.

Scope

Three files:

  • .github/workflows/build-test-windows.yml — two new steps, one run: line each.
  • scripts/diagnostics/windows-vcpkg-tree.ps1 — new, vcpkg install-tree inventory.
  • scripts/diagnostics/windows-bin-imports.ps1 — new, output-dir DLL inventory + dumpbin imports + PATH dump.

No source changes, no behavioural change to the actual build/test pipeline.

Labels

This is a Windows-only change, so the other platforms could reasonably be skipped with disable-build-* labels. However, the hook that triggers image rebuilds on thirdparty/ path changes doesn't fire here (nothing under thirdparty/ is touched), so the other platforms run without any image-rebuild cost. No labels are applied.

🤖 Generated with Claude Code

Fedr and others added 2 commits April 23, 2026 14:59
STATUS_DLL_NOT_FOUND (exit code -1073741515 / 0xC0000135) at
MeshViewer.exe launch kills the Run Start-and-Exit Tests step with
zero useful output — the Windows loader aborts before main() and
CI sees only a bare ##[error]Process completed with exit code ...

Observed on the zlib-ng branch run 24802707812, job 72596857457
(msvc-2019 Debug CMake with the x64-windows-meshlib-iterator-debug
triplet): vcpkg installed zlib-ng successfully but libz-ng.dll
didn't end up next to MeshViewer.exe. Three other Windows legs in
the same run with the default triplet succeeded, so the gap is
specifically in the iterator-debug triplet's DLL-copy chain.

Add two unconditional diagnostic steps so the next run of any
Windows job self-documents the state the loader will see:

  1. After vcpkg-integrate-install, "Diagnostic — vcpkg installed
     tree": lists the contents of
     C:\vcpkg\installed\<TRIPLET>\{bin,debug\bin,lib,debug\lib}
     so we see which DLLs vcpkg actually installed and under what
     names (e.g. libz-ng.dll vs libz-ngd.dll). Confirms or rules out
     "vcpkg didn't install the package" as the cause.

  2. Right before "Run Start-and-Exit Tests", "Diagnostic — output
     bin DLL inventory + MRMesh imports":
       - Lists all .dll/.exe under source\x64\<CONFIG>\ so we see
         which DLLs were copied next to MeshViewer.exe
       - Runs dumpbin /dependents on MeshViewer.exe, MRMesh.dll,
         MRTest.exe so we see which DLL names the loader is actually
         looking for at process start
       - Dumps the PATH so we can tell whether the vcpkg install
         dir would be a fallback lookup location
     Together, these three data points are enough to turn any
     DLL-not-found failure into a one-line diagnosis.

Both steps have `if: always()` and `continue-on-error: true`, so
they run even when the main Build step failed (often when you need
the diagnostic most) and never mask a real failure themselves.

Net cost: ~30 lines of pwsh output per Windows job when everything
works; ~100 lines when a DLL is missing. Trivial relative to the
multi-gigabyte Windows CI logs already emitted.

Moves the two inline diagnostic blocks added to build-test-windows.yml
into dedicated .ps1 files under scripts/diagnostics/, shrinking the
workflow by ~60 lines while keeping the exact same behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fedr marked this pull request as ready for review April 24, 2026 10:41
Fedr merged commit a347491 into master Apr 24, 2026
34 checks passed
Fedr deleted the ci/windows-dll-diagnostics branch April 24, 2026 11:35