model: move load_hparams and load_tensors to per-model definition #22004

ngxson wants to merge 33 commits into ggml-org:master

Conversation
Alright, first pass of the migration (fully generated from the script --> 69104f1). Asking @ggerganov @CISC @pwilkin for a pre-review
```cpp
switch (arch) {
    case LLM_ARCH_LLAMA:
        {
            model = new llama_model_llama(params);
        } break;
    case LLM_ARCH_LLAMA4:
```
Surely there must be some other way to do this without this switch?
Yes and no; technically we can hide the case behind a macro like:

```cpp
ARCH_DEF(LLAMA)
```

That will expand to the same `case LLM_ARCH_LLAMA: ... break;` (though I'm not 100% sure whether a preprocessor equivalent of `tolower` or `toupper` exists)

Another way is to bake the arch into the model class definition, but we would still need to initialize a model instance just to look at its arch

In any case, it's technically hard to escape from some sort of `switch..case` here in C/C++. After all, a struct definition in C++ is no more than a bunch of function pointers, unlike higher-level languages like Python or Java where the notion of a "class" exists at runtime.
> struct definition in c++ is no more than bunch of function pointers

Another extremely hacky way is to arrange the vtable to follow exactly the same order as the enum, then compute the correct function pointer from a given enum value. But this is kinda science fiction IMO
> struct definition in c++ is no more than bunch of function pointers
>
> Another extremely hacky way is to arrange the vtable to follow exactly the same order as the enum, then compute the correct function pointer from a given enum value. But this is kinda science fiction IMO

My thoughts were along those lines: it should be possible to make a link-time table of the classes in enum order.
Anyway, I improved this a bit to make it shorter:

```cpp
case LLM_ARCH_LLAMA:
    return new llama_model_llama(params);
case LLM_ARCH_LLAMA4:
    return new llama_model_llama4(params);
case LLM_ARCH_LLAMA_EMBED:
    return new llama_model_llama_embed(params);
```
IMO that's not much different from the current `switch..case` implementation. The `std::unordered_map` will be slightly worse because:

- The map is constructed at runtime (versus the switch being compiled statically into a jump table)
- Each lookup on the map needs to call a hash function (versus a simple jump for the switch)
- The lambdas need to be compiled, which produces some overhead; while not very significant, it does technically cost compilation time

Besides, I can't think of any case where the `std::map` can do something `switch..case` cannot (except being modifiable at runtime, which we won't support anyway)
BTW it's `static const`, so the compiler will probably optimize it anyway; it won't be constructed at runtime FWIW.
> More elegant ;)

I don't get it... how can this:

```cpp
{ A, []{ return std::make_unique<AClass>(); } },
```

be more elegant than this:

```cpp
case A: return new AClass(); // no lambda is needed
```

> BTW it's static const, so the compiler will probably optimize it anyway, won't be constructed at runtime FWIW.

IIRC it only does that with -O2 or -O3. A switch statement is always compiled to a static jump table no matter what, so why not?

Also, the map always requires calculating a hash to look up elements.
Force-pushed 69104f1 to 31011e6
```cpp
struct llama_model_llama_embed : public llama_model_llama {
    llama_model_llama_embed(const struct llama_model_params & params) : llama_model_llama(params) {}
    // reuse load_hparams and load_tensors from llama_model_llama

    template <bool embed>
    using graph = llama_model_llama::graph<embed>;

    std::unique_ptr<llm_graph_context> build_graph_context(const llm_graph_params & params) const override;
};
```
The auto-migration script can't be intelligent enough to point out that we can use the `<true>` specialization of the graph here, but things like this can be improved via a follow-up PR (just noting here for visibility)
```cpp
// helper function to facilitate migration
// TODO: remove this in the future
auto create_tensor = [&](const LLM_TN_IMPL & tn, const std::initializer_list<int64_t> & ne, int flags) -> ggml_tensor * {
```
Would rename the lambda; having both the method and the lambda named `create_tensor` is a bit confusing.
I think the better way is to completely move the current function into `llm_arch_model_i` and simply reuse `create_tensor` there. Will push a fix for this.

I try not to rename things, to reduce the number of line changes in this PR
This lambda is removed in the latest version
Force-pushed 31011e6 to b5809e2
Not very urgent, but pinging @ggerganov if you can have a quick look at the direction
Overview
Fix #21966
The migration will be done via a script `0migrate.py` included in this PR (it will be removed right before this is merged)

Important note: the goal of this PR is to make the migration as deterministic as possible; we do that via a completely heuristic script as mentioned above. Any improvements (deduplication, clean up, etc.) will be done via follow-up PRs
Depends on:
Checklist before merging:

- `#if 0` pre-migration places

Additional information
Migration rules:

- Model class names: prefix `llama_model_`, with `-` replaced by `_`

Given 2 example archs A and B:

- If B has the same `load_tensors` AND `load_hparams` as A, B inherits from A
- If `load_tensors` OR `load_hparams` is different, there is no inheritance; one of the 2 functions will be duplicated
- `using graph = llama_model_a::graph`

Side effects:

- Some duplicated code (in `load_tensors` or `load_hparams`)
- `-iswa` suffix in file name or in model class name

Requirements