feat: CUDA 10.2 / C++14 compatibility for Jetson TX2 (compute 6.2) by sourceupdev · Pull Request #21968 · ggml-org/llama.cpp

sourceupdev · 2026-04-15T20:35:45Z

Note: this change only preserves a working version for the Jetson TX2; it is not intended to be merged into mainline llama.cpp.

Minimal-diff approach to support GCC 9 + nvcc 10.2 with --expt-relaxed-constexpr:

- Add compat-cuda10.cuh: bf16->fp16 polyfills (no hw bf16 on compute 6.2)
- Guard cuda_bf16.h include with CUDART_VERSION >= 11000
- CMake: C++14 std, arch 62, --expt-relaxed-constexpr for CUDA < 11.0
- Replace std::is_same_v with std::is_same<>::value (C++14)
- Convert fold expressions to C++14 equivalents
- Convert structured bindings to explicit .first/.second
- Guard cooperative_groups (cg::this_grid) behind CUDART_VERSION >= 11000
- Fix cudaStreamWaitEvent 2-arg calls (CUDA 10.2 requires 3rd flags param)
- Replace __builtin_assume with GGML_CUDA_ASSUME macro
- Fix static inline const/auto to constexpr with explicit types
- Fix if-init statements to C++14 style
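
For readers unfamiliar with the C++17-to-C++14 rewrites listed above, the following is a minimal sketch of what each language-level conversion might look like. It is not taken from the PR diff: every name here (`is_float`, `sum_all`, `kWarpSize`, `insert_once`, `lookup_or`) is hypothetical and chosen only for illustration, and the actual changes in the 14 touched files may be shaped differently. The CUDA-specific items (header guards, `cudaStreamWaitEvent`, `GGML_CUDA_ASSUME`) are deliberately left out so the sketch compiles with a plain C++14 host compiler.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <utility>

// C++17: std::is_same_v<T, float>
// C++14: spell out the ::value member instead.
template <typename T>
constexpr bool is_float() {
    return std::is_same<T, float>::value;
}

// C++17: return (args + ...);   (fold expression)
// C++14: recurse over the parameter pack with a zero-argument base case.
constexpr int sum_all() { return 0; }

template <typename T, typename... Rest>
constexpr int sum_all(T first, Rest... rest) {
    return static_cast<int>(first) + sum_all(rest...);
}

// C++17: static inline const auto kWarpSize = 32;   (inline variable)
// C++14: no inline variables, so use constexpr with an explicit type.
constexpr int kWarpSize = 32;

// C++17: auto [it, inserted] = m.insert({key, value});   (structured binding)
// C++14: keep the returned pair and read .first / .second explicitly.
bool insert_once(std::map<std::string, int>& m, const std::string& key, int value) {
    std::pair<std::map<std::string, int>::iterator, bool> res =
        m.insert(std::make_pair(key, value));
    return res.second;  // true only when the key was newly inserted
}

// C++17: if (auto it = m.find(key); it != m.end()) { ... }   (if with initializer)
// C++14: declare the variable first; an extra block keeps its scope narrow.
int lookup_or(const std::map<std::string, int>& m, const std::string& key, int fallback) {
    {
        auto it = m.find(key);
        if (it != m.end()) {
            return it->second;
        }
    }
    return fallback;
}

// Compile-time checks that the C++14 forms behave like their C++17 originals.
static_assert(is_float<float>() && !is_float<int>(), "is_same<>::value check");
static_assert(sum_all(1, 2, 3) == 6, "recursive pack expansion check");
```

Each rewrite is mechanical and keeps behavior unchanged, which is consistent with the minimal-diff goal stated above; the cost is slightly more verbose code.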

14 files changed (13 modified + 1 new), 160 insertions, 98 deletions.

Tested: build OK; CPU inference 0.7 t/s, GPU inference 7.3 t/s (gemma-4-E2B Q4_0).

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure:


github-actions bot added the labels "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Apr 15, 2026.

sourceupdev wants to merge 1 commit into ggml-org:master from sourceupcode:feat/jetson-tx2-v2


2 participants
