
model: move load_hparams and load_tensors to per-model definition#22004

Open
ngxson wants to merge 33 commits into ggml-org:master from ngxson:xsn/model_def_self_contained

Conversation

@ngxson
Contributor

@ngxson ngxson commented Apr 16, 2026

Overview

Fix #21966

The migration will be done via a script 0migrate.py included in this PR (it will be removed right before this is merged)

Important note: the goal of this PR is to make the migration as deterministic as possible, which we do via the fully heuristic script mentioned above. Any improvements (deduplication, clean-up, etc.) will be done via follow-up PRs

Depends on:

Checklist before merging:

  • Remove migration script
  • Remove #if 0 pre-migration places

Additional information

Migration rules:

  • Create one file per arch; the file name is the arch enum name in lower case, with _ replaced by -
  • The model class name is the arch name in lower case, prefixed by llama_model_

Given 2 example archs A and B:

  • If archs A and B BOTH use the same load_tensors AND load_hparams, B inherits from A
  • If either load_tensors OR load_hparams differs, there is no inheritance; one of the 2 functions will be duplicated
  • If B reuses the same graph as A, the graph definition in B will use graph = llama_model_a::graph

Side effects:

  • Some parts of the code will be duplicated (e.g. some load_tensors or load_hparams)
  • No more -iswa suffix in file names or model class names

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: partially, mostly for the migration script

@github-actions github-actions bot added the python python script changes label Apr 16, 2026
@ngxson ngxson marked this pull request as ready for review April 16, 2026 21:20
@ngxson ngxson requested review from CISC and ggerganov as code owners April 16, 2026 21:20
@ngxson
Contributor Author

ngxson commented Apr 16, 2026

Alright, first pass of the migration (fully generated from the script --> 69104f1)

Asking @ggerganov @CISC @pwilkin for a pre-review

Comment thread src/llama-model.cpp
Comment on lines +39 to +44
switch (arch) {
    case LLM_ARCH_LLAMA:
        {
            model = new llama_model_llama(params);
        } break;
    case LLM_ARCH_LLAMA4:
Member

Surely there must be some other way to do this without this switch?

Contributor Author

@ngxson ngxson Apr 16, 2026

Yes and no. Technically we can hide the case behind a macro like:

ARCH_DEF(LLAMA)

That will expand to the same case LLM_ARCH_LLAMA: ... break; (though I'm not 100% sure a preprocessor equivalent of tolower or toupper exists)

Another way is to bake the arch into the model class definition, but we would still need to initialize a model instance just to look at its arch

In any case, it's technically hard to escape from any sort of switch..case here in C/C++. After all, a struct definition in C++ is no more than a bunch of function pointers, unlike higher-level languages like Python or Java where the notion of "class" exists at runtime.

Contributor Author

struct definition in c++ is no more than bunch of function pointers

Another extremely hacky way is to arrange the vtable to follow exactly the same order as the enum, then calculate the correct function pointer from a given enum value. But this is kind of science fiction IMO

Member

struct definition in c++ is no more than bunch of function pointers

Another extremely hacky way is to arrange the vtable to follow exactly the same order as the enum, then calculate the correct function pointer from a given enum value. But this is kind of science fiction IMO

My thoughts were along those lines; that it should be possible to make a link-time table of the classes in enum order.

Contributor Author

Anyway, I improved this a bit to make it shorter:

        case LLM_ARCH_LLAMA:
            return new llama_model_llama(params);
        case LLM_ARCH_LLAMA4:
            return new llama_model_llama4(params);
        case LLM_ARCH_LLAMA_EMBED:
            return new llama_model_llama_embed(params);

Contributor Author

@ngxson ngxson Apr 16, 2026

IMO that's not much different from the current switch..case implementation. The std::unordered_map will be slightly worse because:

  • The map is constructed at runtime (versus the switch being compiled statically into a jump table)
  • Each lookup on the map needs to call a hash function (versus the switch, which is just a simple jump)
  • The lambda functions need to be compiled, which produces some overhead. While it's not very significant, it does technically cost compilation time

Besides, I can't think of any case where std::unordered_map can do something that switch...case cannot (except being modifiable at runtime, which we won't support anyway)

Member

More elegant ;)

Member

BTW it's static const, so the compiler will probably optimize it anyway, won't be constructed at runtime FWIW.

Contributor Author

More elegant ;)

I don't get it... how can this:

{ A, []{ return std::make_unique<AClass>(); } },

be more elegant than this:

case A: return new AClass(); // no lambda is needed

BTW it's static const, so the compiler will probably optimize it anyway, won't be constructed at runtime FWIW.

IIRC it only does that with -O2 or -O3, whereas a switch statement is always compiled to a static jump table no matter what. So, why not?

Also, the map always requires calculating a hash to look up elements.

Member

Personal preference :)

@ngxson ngxson force-pushed the xsn/model_def_self_contained branch from 69104f1 to 31011e6 Compare April 16, 2026 22:06
Comment thread src/models/models.h
Comment on lines +142 to 150
struct llama_model_llama_embed : public llama_model_llama {
llama_model_llama_embed(const struct llama_model_params & params) : llama_model_llama(params) {}
// reuse load_hparams and load_tensors from llama_model_llama

template <bool embed>
using graph = llama_model_llama::graph<embed>;

std::unique_ptr<llm_graph_context> build_graph_context(const llm_graph_params & params) const override;
};
Contributor Author

The auto-migration script isn't intelligent enough to notice that we can use the <true> specialization of the graph here, but things like this can be improved via a follow-up PR (just noting here for visibility)

@github-actions github-actions bot added the model Model specific label Apr 16, 2026
Comment thread src/llama-model.cpp Outdated

// helper function to facilitate migration
// TODO: remove this in the future
auto create_tensor = [&](const LLM_TN_IMPL & tn, const std::initializer_list<int64_t> & ne, int flags) -> ggml_tensor * {
Member

Would rename the lambda, having both the method and the lambda named create_tensor is a bit confusing.

Contributor Author

I think the better way is to completely move the current function to llm_arch_model_i and simply reuse the create_tensor there. Will push a fix for this.

I'm trying to avoid renaming things, to reduce the number of changed lines in this PR

Contributor Author

This lambda is removed in the latest version

@ngxson ngxson force-pushed the xsn/model_def_self_contained branch from 31011e6 to b5809e2 Compare April 17, 2026 14:49
@ngxson
Contributor Author

ngxson commented Apr 17, 2026

Not very urgent, but pinging @ggerganov in case you can have a quick look at the direction

@ggerganov ggerganov self-assigned this Apr 17, 2026

Labels

model Model specific python python script changes

Development

Successfully merging this pull request may close these issues.

Refactor: move load_hparams and load_tensors to per-model definition

4 participants