
model : support NVFP4 tensors for Gemma4 #21971

Merged
CISC merged 4 commits into master from cisc/gemma4-nvfp4 on Apr 16, 2026

Conversation

@CISC (Member) commented Apr 15, 2026

Overview

Add support for NVFP4 Gemma4.
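For context, NVFP4 stores weights as 4-bit e2m1 values grouped into small blocks that share a scale. A minimal decode sketch follows; this is an illustration of the format's arithmetic, not the actual ggml dequantization kernel, and the block-scale handling (FP8 scales in the real format) is simplified to a plain float:

```cpp
#include <cassert>
#include <cstdint>

// Decode one signed FP4 (e2m1) code to float.
// The 8 e2m1 magnitudes are {0, 0.5, 1, 1.5, 2, 3, 4, 6}; the top bit is sign.
float decode_e2m1(uint8_t code) {
    static const float mag[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};
    float v = mag[code & 0x7];
    return (code & 0x8) ? -v : v;
}

// Dequantize a block of n codes that share one block scale.
void dequant_block(const uint8_t *codes, float scale, float *out, int n) {
    for (int i = 0; i < n; ++i) {
        out[i] = decode_e2m1(codes[i]) * scale;
    }
}
```

The limited e2m1 range is why the format leans on per-block scaling: the scale restores dynamic range that four bits alone cannot represent.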

Additional information

Also adds wo_s to build_attn so that it can be passed on to build_lora_mm.
GGUF: CISCai/gemma-4-31B-it-NVFP4-turbo-GGUF
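A toy stand-in for what passing wo_s through build_attn enables: after the attention output projection (wo), an optional per-channel scale tensor (wo_s) is multiplied into the result. The names mirror the PR, but the matmul here is a plain rows-by-vector product, not ggml:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply the output projection wo to cur, then (optionally) a per-row
// scale wo_s -- analogous to build_lora_mm(wo, cur, wo_s).
std::vector<float> project_out(const std::vector<std::vector<float>> &wo,
                               const std::vector<float> &cur,
                               const std::vector<float> *wo_s = nullptr) {
    std::vector<float> out(wo.size(), 0.0f);
    for (size_t r = 0; r < wo.size(); ++r) {
        for (size_t c = 0; c < cur.size(); ++c) {
            out[r] += wo[r][c] * cur[c];
        }
        if (wo_s) {             // scale tensor is optional, as in the PR
            out[r] *= (*wo_s)[r];
        }
    }
    return out;
}
```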

Requirements

@CISC CISC requested a review from ngxson April 15, 2026 23:29
@github-actions github-actions bot added the model Model specific label Apr 15, 2026
@ggerganov (Member) left a comment:

For consistency, these should be removed and wq_b, wk_b, wv_b and wqkv_b used instead:

llama.cpp/src/llama-model.h, lines 256 to 263 in b1be68e:

// attention bias
struct ggml_tensor * bq = nullptr;
struct ggml_tensor * bk = nullptr;
struct ggml_tensor * bv = nullptr;
struct ggml_tensor * bo = nullptr;
struct ggml_tensor * bqkv = nullptr;

In a follow-up PR

Comment thread on src/llama-graph.cpp, lines 2238 to 2246:
if (arch == LLM_ARCH_GLM4 || arch == LLM_ARCH_GLM4_MOE) {
// GLM4 and GLM4_MOE seem to have numerical issues with half-precision accumulators
cur = build_lora_mm(wo, cur);
ggml_mul_mat_set_prec(cur, GGML_PREC_F32);
if (wo_s) {
cur = ggml_mul(ctx0, cur, wo_s);
}
} else {
cur = build_lora_mm(wo, cur, wo_s);
Member:

Maybe a follow-up PR to fix the order of the build_lora_mm arguments (e.g. cur, wo, wo_s) and add an optional precision argument to avoid this branching.
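A hypothetical sketch of that suggestion, with the arguments reordered to (cur, w, w_s) and the precision request folded into a parameter so the GLM4 special case collapses into one call. Everything here (the mm_prec enum, the function name, the toy matmul) is a stand-in, not the real ggml/llama.cpp API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an optional precision argument; in ggml this would map to
// ggml_mul_mat_set_prec(cur, GGML_PREC_F32) inside the helper.
enum class mm_prec { default_prec, f32 };

std::vector<float> build_lora_mm_sketch(const std::vector<float> &cur,
                                        const std::vector<std::vector<float>> &w,
                                        const std::vector<float> *w_s = nullptr,
                                        mm_prec p = mm_prec::default_prec) {
    (void)p;  // a real kernel would select the accumulator precision here
    std::vector<float> out(w.size(), 0.0f);
    for (size_t r = 0; r < w.size(); ++r) {
        for (size_t c = 0; c < cur.size(); ++c) {
            out[r] += w[r][c] * cur[c];
        }
        if (w_s) {
            out[r] *= (*w_s)[r];   // optional scale, applied uniformly
        }
    }
    return out;
}
```

With this shape, the GLM4 call site would become a single call with mm_prec::f32 instead of an if/else around build_lora_mm.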

Member Author:

Yes, will be more manageable after merging #21245

@CISC (Member Author) commented Apr 16, 2026

@ngxson gentle ping, merge this, then build_qkv, then yours?

@ngxson (Contributor) left a comment:

yes, sounds good to me

@CISC CISC merged commit f772f6e into master Apr 16, 2026
50 checks passed
@CISC CISC deleted the cisc/gemma4-nvfp4 branch April 16, 2026 14:51
cnsiva pushed a commit to saas-home/llama.cpp that referenced this pull request Apr 17, 2026
* support nvfp4 tensors for Gemma4

* add wo_s to build_attn

* add wo_s to build_attn

* fix glm4

Labels

model Model specific

3 participants