
Building on Windows, a Complete Troubleshooting Guide #156

@BBC-Esq


Building pywhispercpp with CUDA on Windows: A Complete Troubleshooting Guide

This guide documents every step, failure, and fix encountered while building
pywhispercpp from source with CUDA
support on Windows 10 with an NVIDIA RTX 4090.


Environment

| Component | Version / Path |
| --- | --- |
| OS | Windows 10 Pro 10.0.19045 |
| GPU | NVIDIA RTX 4090 |
| Python | 3.12 (virtual environment at D:\Python_Programs\bench_STT_whispercpp\) |
| CUDA Toolkit | 12.8, installed at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8 |
| Visual Studio | 2022 Build Tools at C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools |
| CMake | 4.2 |
| PyTorch | 2.9.0+cu128 |

Why Build from Source?

pywhispercpp does not publish prebuilt CUDA wheels. The PyPI wheels are
CPU-only. To get GPU-accelerated inference via whisper.cpp's CUDA backend, you
must build the C++ extension from source with GGML_CUDA=1.


Step-by-Step Build Process (What Finally Worked)

1. Clone and patch setup.py

git clone https://github.com/absadiki/pywhispercpp.git _build_pywhispercpp
cd _build_pywhispercpp

Critical patch: pywhispercpp's setup.py (around lines 153-154) contains
code that dumps every environment variable as a CMake -D flag:

# REMOVE THESE LINES from setup.py:
for key, value in os.environ.items():
    cmake_args.append(f'-D{key}={value}')

This causes CMake to choke on Windows environment variables that contain spaces,
semicolons, parentheses, and other special characters (e.g., ProgramFiles(x86),
PATH with hundreds of entries, etc.). Delete or comment out these two lines.

2. Set environment variables and build

Create a batch file (e.g., _build.bat) with:

@echo off
set CMAKE_GENERATOR=Visual Studio 17 2022
set CMAKE_ARGS=-DGGML_CUDA=on
set GGML_CUDA=1
set FORCE_CMAKE=1
set NO_REPAIR=1

pip install . --no-build-isolation --no-cache-dir

Key variables explained:

| Variable | Why It's Needed |
| --- | --- |
| CMAKE_GENERATOR=Visual Studio 17 2022 | CMake's default generator cannot find the CUDA VS integration. The VS 2022 generator has proper CUDA toolkit integration via the BuildTools installation. |
| CMAKE_ARGS=-DGGML_CUDA=on | Tells the whisper.cpp CMake build to enable the CUDA backend. |
| GGML_CUDA=1 | Some code paths in setup.py also check this variable. |
| FORCE_CMAKE=1 | Forces the CMake-based build instead of any fallback. |
| NO_REPAIR=1 | Skips the repairwheel step, which fails on Windows (see below). |

Run the batch file from a regular command prompt (not from inside vcvarsall
— the VS 2022 generator handles MSVC detection on its own).

3. Copy the dependent DLLs

After the wheel installs, the .pyd extension module (_pywhispercpp.pyd) is
placed in Lib/site-packages/, but the shared libraries it depends on are left
inside the build tree. You must manually copy them to site-packages/:

ggml.dll
ggml-base.dll
ggml-cpu.dll
ggml-cuda.dll
whisper.dll

Find them inside the build artifacts (typically under
_build_pywhispercpp/build/ or the pip temp build directory) and copy to:

<venv>/Lib/site-packages/
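The hunt-and-copy step can be automated with a short script. This is a sketch under the assumptions above (the five DLL names, and that the first match found under the build tree is the right one); `copy_whisper_dlls` is an illustrative helper, not part of pywhispercpp:

```python
import shutil
from pathlib import Path

# The shared libraries the .pyd depends on, per the list above.
DLLS = ["ggml.dll", "ggml-base.dll", "ggml-cpu.dll", "ggml-cuda.dll", "whisper.dll"]

def copy_whisper_dlls(build_root, site_packages):
    """Recursively search build_root for each DLL and copy the first match
    into site_packages. Returns the names that were actually found."""
    build_root, site_packages = Path(build_root), Path(site_packages)
    copied = []
    for name in DLLS:
        for hit in build_root.rglob(name):
            shutil.copy2(hit, site_packages / name)
            copied.append(name)
            break  # first match wins
    return copied
```

Run it once after `pip install` finishes, pointing it at the clone's `build/` directory and your venv's `site-packages`, and check that all five names come back.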

4. Configure DLL search paths at runtime

Even with the DLLs in site-packages, Python on Windows won't find them unless
you both prepend the directories to PATH and register them via
os.add_dll_directory(). This must happen before import _pywhispercpp.

The function below handles this (place it at the top of any script that uses
pywhispercpp, and call it before any imports):

import sys
import os
import platform
from pathlib import Path

def set_cuda_paths():
    if platform.system() != "Windows":
        return
    venv_base = Path(sys.executable).parent.parent
    nvidia_base = venv_base / "Lib" / "site-packages" / "nvidia"
    site_packages = venv_base / "Lib" / "site-packages"

    paths_to_add = [
        site_packages,  # whisper.cpp DLLs (ggml-cuda.dll, whisper.dll, etc.)
    ]

    if nvidia_base.exists():
        paths_to_add += [
            nvidia_base / "cuda_runtime" / "bin",
            nvidia_base / "cuda_runtime" / "lib" / "x64",
            nvidia_base / "cuda_runtime" / "include",
            nvidia_base / "cublas" / "bin",
            nvidia_base / "cudnn" / "bin",
            nvidia_base / "cuda_nvrtc" / "bin",
            nvidia_base / "cuda_nvcc" / "bin",
        ]

    # System CUDA toolkit
    cuda_path = os.environ.get(
        "CUDA_PATH",
        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8"
    )
    cuda_bin = Path(cuda_path) / "bin"
    if cuda_bin.exists():
        paths_to_add.append(cuda_bin)

    current_value = os.environ.get("PATH", "")
    new_value = os.pathsep.join(
        [str(p) for p in paths_to_add]
        + ([current_value] if current_value else [])
    )
    os.environ["PATH"] = new_value

    if nvidia_base.exists():
        triton_cuda_path = nvidia_base / "cuda_runtime"
        current_cuda_path = os.environ.get("CUDA_PATH", "")
        new_cuda_path = os.pathsep.join(
            [str(triton_cuda_path)]
            + ([current_cuda_path] if current_cuda_path else [])
        )
        os.environ["CUDA_PATH"] = new_cuda_path

    if hasattr(os, "add_dll_directory"):
        for path in paths_to_add:
            if Path(path).exists():
                try:
                    os.add_dll_directory(str(path))
                except OSError:
                    pass

set_cuda_paths()

Why both PATH and add_dll_directory?

  • os.add_dll_directory() (Python 3.8+) is required because Python 3.8 changed
    DLL search behavior on Windows — it no longer searches PATH by default for
    extension module dependencies.
  • Prepending to PATH is still needed because some DLLs loaded by the CUDA
    runtime itself (e.g., cublas64_*.dll) use the legacy LoadLibrary search
    which does check PATH.
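The two mechanisms boil down to a small helper. This is a minimal sketch (`register_dll_dir` is a hypothetical name, and `os.add_dll_directory` only exists on Windows, hence the `hasattr` guard):

```python
import os
from pathlib import Path

def register_dll_dir(path):
    """Make DLLs in `path` findable both ways:
    - prepend to PATH, for legacy LoadLibrary searches (e.g. DLLs the CUDA
      runtime loads itself, like cublas64_*.dll)
    - register via os.add_dll_directory, for extension-module dependencies
      (Python 3.8+ on Windows no longer consults PATH for these)
    """
    p = str(Path(path))
    os.environ["PATH"] = p + os.pathsep + os.environ.get("PATH", "")
    if hasattr(os, "add_dll_directory") and os.path.isdir(p):
        os.add_dll_directory(p)  # Windows-only API, Python 3.8+
```

Calling this once per DLL directory, before the first `import _pywhispercpp`, is exactly what the larger `set_cuda_paths()` above does in bulk.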

Every Error Encountered (and How It Was Fixed)

Error 1: CMake dumps all env vars as -D flags

Symptom:

CMake Error: cmake -DProgramFiles(x86)=C:\Program Files (x86) ...

CMake crashes with parse errors on Windows environment variables containing
spaces, parentheses, and special characters.

Cause: setup.py lines 153-154 iterate over os.environ.items() and pass
every single variable as a -D flag to CMake.

Fix: Delete or comment out these two lines in setup.py:

for key, value in os.environ.items():
    cmake_args.append(f'-D{key}={value}')

Error 2: No CUDA toolset found (default CMake generator)

Symptom:

CMake Error: No CUDA toolset found.

Cause: CMake 4.2's default generator (Ninja or NMake) doesn't know where the
CUDA VS integration files are located. The CUDA Toolkit installs VS integration
files specifically for the Visual Studio generators.

Fix: Explicitly set the generator:

set CMAKE_GENERATOR=Visual Studio 17 2022

The Visual Studio 2022 generator has built-in support for finding the CUDA
toolkit's VS integration (installed at
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\extras\visual_studio_integration).


Error 3: Ninja generator + vcvarsall.bat fails

Symptom (when trying Ninja instead of VS generator):

vcvarsall.bat: The input line is too long.

or various path-escaping errors when trying to source vcvarsall.bat from a bash
shell via cmd //c.

Cause: The BuildTools vcvarsall.bat has trouble when invoked from certain
shell environments, and the extremely long PATH on the system exceeds cmd.exe's
line length limits.

Fix: Don't use Ninja. Use Visual Studio 17 2022 generator instead — it
doesn't need vcvarsall.bat because it invokes MSBuild directly, which handles
the compiler environment internally.


Error 4: repairwheel "[WinError 2] The system cannot find the file specified"

Symptom:

[WinError 2] The system cannot find the file specified

during the repairwheel post-build step.

Cause: repairwheel (or delvewheel) is not installed or not on PATH in
the build environment. pywhispercpp's build tries to run it to bundle DLLs into
the wheel, but the tool is missing.

Fix: Skip the repair step entirely:

set NO_REPAIR=1

Then manually copy the DLLs yourself (Step 3 above). This is actually more
reliable on Windows because repairwheel sometimes misses CUDA-specific DLLs
anyway.


Error 5: ImportError: DLL load failed while importing _pywhispercpp

Symptom:

ImportError: DLL load failed while importing _pywhispercpp:
The specified module could not be found.

Cause: _pywhispercpp.pyd depends on whisper.dll, ggml.dll,
ggml-cuda.dll, etc. Even though these files exist, Python 3.8+ on Windows
does not search PATH or the current directory for DLL dependencies of extension
modules. You must explicitly register DLL directories.

Fix: Call os.add_dll_directory() for every directory containing required
DLLs before importing _pywhispercpp. The three critical directories are:

  1. <venv>/Lib/site-packages/ — where whisper.dll, ggml-*.dll live
  2. <nvidia packages>/cuda_runtime/bin/ — CUDA runtime DLLs from pip-installed nvidia packages
  3. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\ — system CUDA toolkit DLLs

See the set_cuda_paths() function in Step 4 above.


Error 6: from pywhispercpp import _pywhispercpp fails

Symptom:

ImportError: cannot import name '_pywhispercpp' from 'pywhispercpp'

Cause: _pywhispercpp is a top-level Python extension module (compiled
as _pywhispercpp.pyd in site-packages), not a submodule of the pywhispercpp
package.

Fix: Import it as a top-level module:

# Wrong:
from pywhispercpp import _pywhispercpp

# Correct:
import _pywhispercpp as pw

In practice you rarely need to import it directly — the pywhispercpp.model.Model
class wraps it.


Error 7: 'whisper_full_params' object has no attribute 'use_gpu'

Symptom:

AttributeError: '_pywhispercpp.whisper_full_params' object has no attribute 'use_gpu'

Cause: Unlike some other Python Whisper bindings, this binding's
whisper_full_params struct has no use_gpu field. GPU usage is determined
entirely at compile time: if the library was built with GGML_CUDA=on, it uses
the GPU automatically, and the Python API exposes no runtime switch to disable it.

Fix: Remove any use_gpu=True arguments from Model() or
model.transcribe() calls. The --device flag in the benchmark script only
controls whether VRAM monitoring is enabled, not whether the GPU is used for
inference.


Error 8: 'whisper_full_params' object has no attribute 'beam_size'

Symptom:

AttributeError: '_pywhispercpp.whisper_full_params' object has no attribute 'beam_size'

Cause: beam_size is not a top-level attribute of whisper_full_params.
It's nested inside the beam_search sub-dictionary. The params struct has two
nested strategy configs:

  • params.greedy: {'best_of': -1}
  • params.beam_search: {'beam_size': 5, 'patience': -1.0}

Fix: Set it via the nested dict after constructing the model:

# First, tell the Model to use beam search sampling strategy
model = Model(
    model_path,
    params_sampling_strategy=1,  # 0 = greedy, 1 = beam search
    n_threads=4,
)

# Then set beam_size on the nested dict
model._params.beam_search['beam_size'] = beam_size

Error 9: Audio file rejected (wrong sample rate)

Symptom:

Exception: WAV file must be 16000 Hz

Cause: whisper.cpp (and the original Whisper model) requires 16 kHz mono
audio. pywhispercpp's built-in WAV loader strictly enforces this — it does not
resample. Non-WAV files (FLAC, MP3, etc.) are converted via ffmpeg, which
handles resampling automatically.

Fix: Either:

  • Use a non-WAV format and ensure ffmpeg is installed (pywhispercpp will convert
    it automatically to 16kHz WAV via ffmpeg)
  • Pre-convert your WAV files:
    ffmpeg -i input.wav -ac 1 -ar 16000 output_16k.wav -y
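To check a file before handing it to pywhispercpp, the stdlib `wave` module is enough. A small sketch; `wav_is_whisper_ready` is an illustrative helper, not part of the library:

```python
import wave

def wav_is_whisper_ready(path):
    """Return True if the WAV file is 16 kHz, mono, 16-bit PCM --
    the format whisper.cpp's strict built-in loader accepts."""
    with wave.open(str(path), "rb") as wf:
        return (wf.getframerate() == 16000
                and wf.getnchannels() == 1
                and wf.getsampwidth() == 2)   # sample width in bytes: 2 = 16-bit
```

Files that fail the check can be run through the ffmpeg command above before transcription.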

Summary: The Minimal Working Recipe

# 1. Clone
git clone https://github.com/absadiki/pywhispercpp.git _build_pywhispercpp
cd _build_pywhispercpp

# 2. Patch setup.py — remove the env-var-dumping lines (around line 153-154):
#    for key, value in os.environ.items():
#        cmake_args.append(f'-D{key}={value}')

# 3. Build (from a regular cmd.exe prompt)
set CMAKE_GENERATOR=Visual Studio 17 2022
set CMAKE_ARGS=-DGGML_CUDA=on
set GGML_CUDA=1
set FORCE_CMAKE=1
set NO_REPAIR=1
pip install . --no-build-isolation --no-cache-dir

# 4. Copy DLLs from the build tree to site-packages:
#    ggml.dll, ggml-base.dll, ggml-cpu.dll, ggml-cuda.dll, whisper.dll

# 5. In your Python scripts, call set_cuda_paths() BEFORE importing pywhispercpp

Verifying the Build

Run these checks to confirm everything works:

# Smoke test — imports and system info
python -c "
import os, sys, platform
from pathlib import Path
# (call set_cuda_paths() here)
import _pywhispercpp as pw
print(pw.whisper_print_system_info())
"

# Should print system info including CUDA-related flags like:
#   CUDA = 1
#   COREML = 0
#   OPENVINO = 0
#   ...

# Inference test with tiny model
python test_inference.py --model tiny.en

# Full benchmark
python bench_whispercpp.py --model tiny.en --audio your_file.wav

API Gotchas Reference

| Pitfall | Details |
| --- | --- |
| No use_gpu parameter | GPU is always on if built with CUDA. No runtime toggle. |
| beam_size is nested | Access via model._params.beam_search['beam_size'], not as a flat param. |
| _pywhispercpp is top-level | import _pywhispercpp, not from pywhispercpp import _pywhispercpp. |
| WAV must be 16 kHz 16-bit | Use ffmpeg to convert, or use non-WAV formats (auto-converted). |
| DLL paths on Windows | Must call os.add_dll_directory() before importing the extension. |
| params_sampling_strategy | 0 = greedy, anything else = beam search. Set in the Model() constructor. |
