
feat: CUDA 10.2 / C++14 compatibility for Jetson TX2 (compute 6.2) #21968

Draft
sourceupdev wants to merge 1 commit into ggml-org:master from sourceupcode:feat/jetson-tx2-v2

sourceupdev commented Apr 15, 2026

NOTE: This change only preserves a working build for the Jetson TX2. It is not intended to be merged into mainline llama.cpp.

Minimal-diff approach to support GCC 9 + nvcc 10.2 with --expt-relaxed-constexpr:

  • Add compat-cuda10.cuh: bf16->fp16 polyfills (no hw bf16 on compute 6.2)
  • Guard cuda_bf16.h include with CUDART_VERSION >= 11000
  • CMake: C++14 std, arch 62, --expt-relaxed-constexpr for CUDA < 11.0
  • Replace std::is_same_v with std::is_same<>::value (C++14)
  • Convert fold expressions to C++14 equivalents
  • Convert structured bindings to explicit .first/.second
  • Guard cooperative_groups (cg::this_grid) behind CUDART_VERSION >= 11000
  • Fix cudaStreamWaitEvent 2-arg calls (CUDA 10.2 requires 3rd flags param)
  • Replace __builtin_assume with GGML_CUDA_ASSUME macro
  • Fix static inline const/auto to constexpr with explicit types
  • Fix if-init statements to C++14 style
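
The C++17-to-C++14 rewrites listed above (`std::is_same_v`, fold expressions, structured bindings, if-init statements) follow well-known patterns. The sketch below illustrates each one in plain host C++; every function name in it is hypothetical and chosen for illustration, and none of it is taken from the actual diff.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

// C++17: std::is_same_v<T, float>
// C++14: spell out the ::value member.
template <typename T>
constexpr bool is_float() {
    return std::is_same<T, float>::value;
}

// C++17: return (args + ...);
// C++14: a base case plus a recursive variadic overload.
constexpr int sum_all() { return 0; }

template <typename... Rest>
constexpr int sum_all(int first, Rest... rest) {
    return first + sum_all(rest...);
}

// C++14 static_assert needs the message argument.
static_assert(sum_all(1, 2, 3) == 6, "recursive sum replaces the fold");

// C++17: for (const auto & [name, count] : counts) total += count;
// C++14: bind the pair and read .first / .second explicitly.
inline int total_count(const std::map<std::string, int> & counts) {
    int total = 0;
    for (const auto & kv : counts) {
        total += kv.second;
    }
    return total;
}

// C++17: if (auto it = m.find(key); it != m.end()) return it->second;
// C++14: hoist the declaration into an enclosing block to keep its scope tight.
inline int lookup_or(const std::map<std::string, int> & m,
                     const std::string & key, int fallback) {
    {
        const auto it = m.find(key);
        if (it != m.end()) {
            return it->second;
        }
    }
    return fallback;
}

// Small usage check that exercises the helpers above.
inline void self_check() {
    const std::map<std::string, int> counts{{"a", 1}, {"b", 2}};
    assert(is_float<float>() && !is_float<int>());
    assert(total_count(counts) == 3);
    assert(lookup_or(counts, "b", -1) == 2);
    assert(lookup_or(counts, "missing", -1) == -1);
}
```

Each rewrite preserves behavior. The recursive overload instantiates one function per argument count, which is usually acceptable for the short parameter packs found in kernel dispatch code.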

14 files changed (13 modified + 1 new), 160 insertions, 98 deletions
Tested: build OK, CPU inference 0.7 t/s, GPU inference 7.3 t/s (gemma-4-E2B Q4_0)
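
The `CUDART_VERSION` guards and the assume macro mentioned in the list above can be sketched as follows. This is a host-only illustration under stated assumptions: the `DEMO_` macros and both functions are hypothetical names, the fallback definition of `CUDART_VERSION` exists only so the snippet compiles without the CUDA toolkit, and the version threshold chosen for the assume hint is a guess rather than the value the PR uses.

```cpp
#include <cassert>

// Assumption: in a real CUDA build CUDART_VERSION comes from the CUDA runtime
// headers. The fallback below is only for this sketch; 10020 is the value
// that CUDA 10.2 reports.
#ifndef CUDART_VERSION
#define CUDART_VERSION 10020
#endif

// One switch for features gated on CUDA 11, such as including cuda_bf16.h.
#if CUDART_VERSION >= 11000
#define DEMO_HAS_CUDA11_FEATURES 1
#else
#define DEMO_HAS_CUDA11_FEATURES 0
#endif

// Optimizer hint that degrades to a no-op where the builtin is not usable.
// Hypothetical threshold; the real macro may use a different condition.
#if defined(__CUDACC__) && CUDART_VERSION >= 11000
#define DEMO_ASSUME(x) __builtin_assume(x)
#else
#define DEMO_ASSUME(x) ((void) 0)
#endif

// True when the bf16 -> fp16 fallback path would be selected.
constexpr bool demo_uses_fp16_fallback() {
    return DEMO_HAS_CUDA11_FEATURES == 0;
}

// Example call site: the hint never changes the result, only what the
// compiler is allowed to assume about n.
inline int clamp_lanes(int n) {
    DEMO_ASSUME(n >= 0);
    return n > 32 ? 32 : n;
}

// The cudaStreamWaitEvent fix is the same idea applied at the call site:
// pass the flags argument explicitly, e.g. cudaStreamWaitEvent(stream, event, 0),
// which is valid on both CUDA 10.2 and later toolkits.
```

Keeping the version test in one macro means call sites stay free of scattered `#if` blocks, which helps keep the diff against upstream small.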

github-actions bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels Apr 15, 2026
