Building pywhispercpp with CUDA on Windows: A Complete Troubleshooting Guide
This guide documents every step, failure, and fix encountered while building
pywhispercpp from source with CUDA
support on Windows 10 with an NVIDIA RTX 4090.
Environment
| Component | Version / Path |
| --- | --- |
| OS | Windows 10 Pro 10.0.19045 |
| GPU | NVIDIA RTX 4090 |
| Python | 3.12 (virtual environment at D:\Python_Programs\bench_STT_whispercpp\) |
| CUDA Toolkit | 12.8, installed at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8 |
| Visual Studio | 2022 Build Tools at C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools |
| CMake | 4.2 |
| PyTorch | 2.9.0+cu128 |
Why Build from Source?
pywhispercpp does not publish prebuilt CUDA wheels. The PyPI wheels are
CPU-only. To get GPU-accelerated inference via whisper.cpp's CUDA backend, you
must build the C++ extension from source with GGML_CUDA=1.
Step-by-Step Build Process (What Finally Worked)
1. Clone and patch setup.py
git clone https://github.com/absadiki/pywhispercpp.git _build_pywhispercpp
cd _build_pywhispercpp
Critical patch: pywhispercpp's setup.py (around lines 153-154) contains
code that dumps every environment variable as a CMake -D flag:
# REMOVE THESE LINES from setup.py:
for key, value in os.environ.items():
    cmake_args.append(f'-D{key}={value}')
This causes CMake to choke on Windows environment variables that contain spaces,
semicolons, parentheses, and other special characters (e.g., ProgramFiles(x86),
PATH with hundreds of entries, etc.). Delete or comment out these two lines.
2. Set environment variables and build
Create a batch file (e.g., _build.bat) with:
@echo off
set CMAKE_GENERATOR=Visual Studio 17 2022
set CMAKE_ARGS=-DGGML_CUDA=on
set GGML_CUDA=1
set FORCE_CMAKE=1
set NO_REPAIR=1
pip install . --no-build-isolation --no-cache-dir
Key variables explained:
| Variable | Why It's Needed |
| --- | --- |
| CMAKE_GENERATOR=Visual Studio 17 2022 | CMake's default generator cannot find the CUDA VS integration. The VS 2022 generator has proper CUDA toolkit integration via the BuildTools installation. |
| CMAKE_ARGS=-DGGML_CUDA=on | Tells the whisper.cpp CMake build to enable the CUDA backend. |
| GGML_CUDA=1 | Some code paths in setup.py also check this variable. |
| FORCE_CMAKE=1 | Forces a CMake-based build instead of any fallback. |
| NO_REPAIR=1 | Skips the repairwheel step, which fails on Windows (see below). |
Run the batch file from a regular command prompt (not from inside vcvarsall
— the VS 2022 generator handles MSVC detection on its own).
3. Copy the dependent DLLs
After the wheel installs, the .pyd extension module (_pywhispercpp.pyd) is
placed in Lib/site-packages/, but the shared libraries it depends on are left
inside the build tree. You must manually copy them to site-packages/:
- ggml.dll
- ggml-base.dll
- ggml-cpu.dll
- ggml-cuda.dll
- whisper.dll
Find them inside the build artifacts (typically under
_build_pywhispercpp/build/ or the pip temp build directory) and copy to:
<venv>/Lib/site-packages/
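The hunt-and-copy can be scripted. The sketch below is illustrative, not part of pywhispercpp; it assumes the DLLs sit somewhere under the given build root and that overwriting existing copies is fine:

```python
import shutil
from pathlib import Path

# The five runtime DLLs listed in Step 3.
WANTED = {"ggml.dll", "ggml-base.dll", "ggml-cpu.dll",
          "ggml-cuda.dll", "whisper.dll"}

def copy_runtime_dlls(build_root, site_packages):
    """Search the build tree for whisper.cpp runtime DLLs and copy them
    next to _pywhispercpp.pyd in site-packages. Returns the names copied."""
    copied = set()
    for dll in Path(build_root).rglob("*.dll"):
        if dll.name in WANTED:
            shutil.copy2(dll, Path(site_packages) / dll.name)
            copied.add(dll.name)
    return sorted(copied)
```

Run it once after the pip install, pointing it at the build tree and your venv's site-packages directory; if it returns fewer than five names, the build did not produce all the DLLs.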
4. Configure DLL search paths at runtime
Even with the DLLs in site-packages, Python on Windows won't find them unless
you explicitly add the directories to both PATH and os.add_dll_directory().
This must happen before import _pywhispercpp.
The function below handles this (place it at the top of any script that uses
pywhispercpp, and call it before any imports):
import sys
import os
import platform
from pathlib import Path
def set_cuda_paths():
    if platform.system() != "Windows":
        return

    venv_base = Path(sys.executable).parent.parent
    nvidia_base = venv_base / "Lib" / "site-packages" / "nvidia"
    site_packages = venv_base / "Lib" / "site-packages"

    paths_to_add = [
        site_packages,  # whisper.cpp DLLs (ggml-cuda.dll, whisper.dll, etc.)
    ]
    if nvidia_base.exists():
        paths_to_add += [
            nvidia_base / "cuda_runtime" / "bin",
            nvidia_base / "cuda_runtime" / "lib" / "x64",
            nvidia_base / "cuda_runtime" / "include",
            nvidia_base / "cublas" / "bin",
            nvidia_base / "cudnn" / "bin",
            nvidia_base / "cuda_nvrtc" / "bin",
            nvidia_base / "cuda_nvcc" / "bin",
        ]

    # System CUDA toolkit
    cuda_path = os.environ.get(
        "CUDA_PATH",
        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8",
    )
    cuda_bin = Path(cuda_path) / "bin"
    if cuda_bin.exists():
        paths_to_add.append(cuda_bin)

    current_value = os.environ.get("PATH", "")
    new_value = os.pathsep.join(
        [str(p) for p in paths_to_add]
        + ([current_value] if current_value else [])
    )
    os.environ["PATH"] = new_value

    if nvidia_base.exists():
        triton_cuda_path = nvidia_base / "cuda_runtime"
        current_cuda_path = os.environ.get("CUDA_PATH", "")
        new_cuda_path = os.pathsep.join(
            [str(triton_cuda_path)]
            + ([current_cuda_path] if current_cuda_path else [])
        )
        os.environ["CUDA_PATH"] = new_cuda_path

    if hasattr(os, "add_dll_directory"):
        for path in paths_to_add:
            if Path(path).exists():
                try:
                    os.add_dll_directory(str(path))
                except OSError:
                    pass

set_cuda_paths()
Why both PATH and add_dll_directory?
- os.add_dll_directory() (Python 3.8+) is required because Python 3.8 changed
DLL search behavior on Windows — it no longer searches PATH by default for
extension module dependencies.
- Prepending to PATH is still needed because some DLLs loaded by the CUDA
runtime itself (e.g., cublas64_*.dll) use the legacy LoadLibrary search, which
does check PATH.
Every Error Encountered (and How It Was Fixed)
Error 1: CMake dumps all env vars as -D flags
Symptom:
CMake Error: cmake -DProgramFiles(x86)=C:\Program Files (x86) ...
CMake crashes with parse errors on Windows environment variables containing
spaces, parentheses, and special characters.
Cause: setup.py lines 153-154 iterate over os.environ.items() and pass
every single variable as a -D flag to CMake.
Fix: Delete or comment out these two lines in setup.py:
for key, value in os.environ.items():
    cmake_args.append(f'-D{key}={value}')
Error 2: No CUDA toolset found (default CMake generator)
Symptom:
CMake Error: No CUDA toolset found.
Cause: CMake 4.2's default generator (Ninja or NMake) doesn't know where the
CUDA VS integration files are located. The CUDA Toolkit installs VS integration
files specifically for the Visual Studio generators.
Fix: Explicitly set the generator:
set CMAKE_GENERATOR=Visual Studio 17 2022
The Visual Studio 2022 generator has built-in support for finding the CUDA
toolkit's VS integration (installed at
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\extras\visual_studio_integration).
Error 3: Ninja generator + vcvarsall.bat fails
Symptom (when trying Ninja instead of VS generator):
vcvarsall.bat: The input line is too long.
or various path-escaping errors when trying to source vcvarsall.bat from a bash
shell via cmd //c.
Cause: The BuildTools vcvarsall.bat has trouble when invoked from certain
shell environments, and the extremely long PATH on the system exceeds cmd.exe's
line length limits.
Fix: Don't use Ninja. Use Visual Studio 17 2022 generator instead — it
doesn't need vcvarsall.bat because it invokes MSBuild directly, which handles
the compiler environment internally.
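cmd.exe caps a command line at 8191 characters, so you can check in advance whether your PATH alone is likely to trip vcvarsall.bat. A rough diagnostic sketch (the halving heuristic is an assumption, since vcvarsall prepends entries of its own):

```python
import os

CMD_LINE_LIMIT = 8191  # documented cmd.exe maximum command-line length

def path_overflow_risk(path_value, limit=CMD_LINE_LIMIT):
    """Heuristic: vcvarsall.bat re-expands PATH with its own additions,
    so a PATH already past half the limit risks
    'The input line is too long'."""
    return len(path_value) > limit // 2

print(path_overflow_risk(os.environ.get("PATH", "")))
```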
Error 4: repairwheel "[WinError 2] The system cannot find the file specified"
Symptom:
[WinError 2] The system cannot find the file specified
during the repairwheel post-build step.
Cause: repairwheel (or delvewheel) is not installed or not on PATH in
the build environment. pywhispercpp's build tries to run it to bundle DLLs into
the wheel, but the tool is missing.
Fix: Skip the repair step entirely by setting NO_REPAIR=1 before running pip install (as in the batch file above).
Then manually copy the DLLs yourself (Step 3 above). This is actually more
reliable on Windows because repairwheel sometimes misses CUDA-specific DLLs
anyway.
Error 5: ImportError: DLL load failed while importing _pywhispercpp
Symptom:
ImportError: DLL load failed while importing _pywhispercpp:
The specified module could not be found.
Cause: _pywhispercpp.pyd depends on whisper.dll, ggml.dll,
ggml-cuda.dll, etc. Even though these files exist, Python 3.8+ on Windows
does not search PATH or the current directory for DLL dependencies of extension
modules. You must explicitly register DLL directories.
Fix: Call os.add_dll_directory() for every directory containing required
DLLs before importing _pywhispercpp. The three critical directories are:
- <venv>/Lib/site-packages/ — where whisper.dll and ggml-*.dll live
- <nvidia packages>/cuda_runtime/bin/ — CUDA runtime DLLs from pip-installed nvidia packages
- C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\ — system CUDA toolkit DLLs
See the set_cuda_paths() function in Step 4 above.
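A quick pre-import check saves guesswork when this error appears. A sketch using the DLL list from Step 3:

```python
from pathlib import Path

REQUIRED_DLLS = ("ggml.dll", "ggml-base.dll", "ggml-cpu.dll",
                 "ggml-cuda.dll", "whisper.dll")

def missing_dlls(directory):
    """Return the whisper.cpp runtime DLLs absent from `directory`."""
    d = Path(directory)
    return [name for name in REQUIRED_DLLS if not (d / name).is_file()]
```

Call it on your site-packages directory before importing; a non-empty result means Step 3 is incomplete.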
Error 6: from pywhispercpp import _pywhispercpp fails
Symptom:
ImportError: cannot import name '_pywhispercpp' from 'pywhispercpp'
Cause: _pywhispercpp is a top-level Python extension module (compiled
as _pywhispercpp.pyd in site-packages), not a submodule of the pywhispercpp
package.
Fix: Import it as a top-level module:
# Wrong:
from pywhispercpp import _pywhispercpp
# Correct:
import _pywhispercpp as pw
In practice you rarely need to import it directly — the pywhispercpp.model.Model
class wraps it.
Error 7: 'whisper_full_params' object has no attribute 'use_gpu'
Symptom:
AttributeError: '_pywhispercpp.whisper_full_params' object has no attribute 'use_gpu'
Cause: Unlike some other Python Whisper bindings, whisper.cpp does not
have a runtime use_gpu toggle. GPU usage is determined entirely at compile
time. If the library was built with GGML_CUDA=on, it uses the GPU
automatically. There is no way to disable it at runtime via the Python API.
Fix: Remove any use_gpu=True arguments from Model() or
model.transcribe() calls. The --device flag in the benchmark script only
controls whether VRAM monitoring is enabled, not whether the GPU is used for
inference.
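To confirm at runtime that the binary really was compiled with CUDA, you can parse the output of whisper_print_system_info() for the CUDA flag. A sketch, assuming the NAME = VALUE pairs shown in the verification section:

```python
import re

def parse_system_info(info):
    """Turn 'NAME = VALUE' pairs (pipe- or newline-separated) into a dict."""
    return {m.group(1): m.group(2)
            for m in re.finditer(r"(\w+)\s*=\s*([^\s|]+)", info)}

def built_with_cuda(info):
    return parse_system_info(info).get("CUDA") == "1"
```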
Error 8: 'whisper_full_params' object has no attribute 'beam_size'
Symptom:
AttributeError: '_pywhispercpp.whisper_full_params' object has no attribute 'beam_size'
Cause: beam_size is not a top-level attribute of whisper_full_params.
It's nested inside the beam_search sub-dictionary. The params struct has two
nested strategy configs:
params.greedy → {'best_of': -1}
params.beam_search → {'beam_size': 5, 'patience': -1.0}
Fix: Set it via the nested dict after constructing the model:
# First, tell the Model to use the beam search sampling strategy
model = Model(
    model_path,
    params_sampling_strategy=1,  # 0 = greedy, 1 = beam search
    n_threads=4,
)
# Then set beam_size on the nested dict
model._params.beam_search['beam_size'] = beam_size
Error 9: Audio file rejected (wrong sample rate)
Symptom:
Exception: WAV file must be 16000 Hz
Cause: whisper.cpp (and the original Whisper model) requires 16 kHz mono
audio. pywhispercpp's built-in WAV loader strictly enforces this — it does not
resample. Non-WAV files (FLAC, MP3, etc.) are converted via ffmpeg, which
handles resampling automatically.
Fix: Either:
- Use a non-WAV format and ensure ffmpeg is installed (pywhispercpp will convert
it automatically to 16kHz WAV via ffmpeg)
- Pre-convert your WAV files:
ffmpeg -i input.wav -ac 1 -ar 16000 output_16k.wav -y
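You can verify a file against the 16 kHz / mono / 16-bit constraint up front with the standard-library wave module. A sketch that mirrors the requirement described above:

```python
import wave

def check_wav_for_whisper(path):
    """Return (ok, reason) for whisper.cpp's 16 kHz mono 16-bit WAV rule."""
    with wave.open(path, "rb") as w:
        if w.getframerate() != 16000:
            return False, f"sample rate {w.getframerate()} Hz, need 16000"
        if w.getnchannels() != 1:
            return False, f"{w.getnchannels()} channels, need mono"
        if w.getsampwidth() != 2:
            return False, f"{8 * w.getsampwidth()}-bit samples, need 16-bit"
    return True, "ok"
```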
Summary: The Minimal Working Recipe
# 1. Clone
git clone https://github.com/absadiki/pywhispercpp.git _build_pywhispercpp
cd _build_pywhispercpp
# 2. Patch setup.py — remove the env-var-dumping lines (around line 153-154):
# for key, value in os.environ.items():
# cmake_args.append(f'-D{key}={value}')
# 3. Build (from a regular cmd.exe prompt)
set CMAKE_GENERATOR=Visual Studio 17 2022
set CMAKE_ARGS=-DGGML_CUDA=on
set GGML_CUDA=1
set FORCE_CMAKE=1
set NO_REPAIR=1
pip install . --no-build-isolation --no-cache-dir
# 4. Copy DLLs from the build tree to site-packages:
# ggml.dll, ggml-base.dll, ggml-cpu.dll, ggml-cuda.dll, whisper.dll
# 5. In your Python scripts, call set_cuda_paths() BEFORE importing pywhispercpp
Verifying the Build
Run these checks to confirm everything works:
# Smoke test — imports and system info
python -c "
import os, sys, platform
from pathlib import Path
# (call set_cuda_paths() here)
import _pywhispercpp as pw
print(pw.whisper_print_system_info())
"
# Should print system info including CUDA-related flags like:
# CUDA = 1
# COREML = 0
# OPENVINO = 0
# ...
# Inference test with tiny model
python test_inference.py --model tiny.en
# Full benchmark
python bench_whispercpp.py --model tiny.en --audio your_file.wav
API Gotchas Reference
| Pitfall | Details |
| --- | --- |
| No use_gpu parameter | GPU is always on if built with CUDA. No runtime toggle. |
| beam_size is nested | Access via model._params.beam_search['beam_size'], not as a flat param. |
| _pywhispercpp is top-level | import _pywhispercpp, not from pywhispercpp import _pywhispercpp. |
| WAV must be 16kHz 16-bit | Use ffmpeg to convert, or use non-WAV formats (auto-converted). |
| DLL paths on Windows | Must call os.add_dll_directory() before importing the extension. |
| params_sampling_strategy | 0 = greedy, anything else = beam search. Set in the Model() constructor. |