model: move load_hparams and load_tensors to per-model definition #22004

ngxson wants to merge 33 commits into ggml-org:master

Conversation
Alright, first pass of the migration (fully generated from the script --> 69104f1). Asking @ggerganov @CISC @pwilkin for a pre-review
```cpp
switch (arch) {
    case LLM_ARCH_LLAMA:
        {
            model = new llama_model_llama(params);
        } break;
    case LLM_ARCH_LLAMA4:
```
Surely there must be some other way to do this without this switch?
Yes and no; technically we can hide the case behind a macro like:

```cpp
ARCH_DEF(LLAMA)
```

That will expand to the same `case LLM_ARCH_LLAMA: ... break;` (though I'm not 100% sure whether a preprocessor equivalent of `tolower` or `toupper` exists)

Another way is to bake the arch into the model class definition, but we would still need to initialize a model instance just to look at its arch

In any case, it's technically hard to escape from some sort of `switch..case` here in C/C++. After all, a struct definition in C++ is no more than a bunch of function pointers, unlike higher-level languages like Python or Java where the notion of a "class" exists at runtime.
> struct definition in c++ is no more than bunch of function pointers

Another extremely hacky way is to arrange the vtable to follow exactly the same order as the enum, then compute the correct function pointer from a given enum value. But this is kinda science fiction IMO
> struct definition in c++ is no more than bunch of function pointers
>
> Another extremely hacky way is to arrange the vtable to follow exactly the same order as the enum, then compute the correct function pointer from a given enum value. But this is kinda science fiction IMO

My thoughts were along those lines: it should be possible to make a link-time table of the classes in enum order.
Anyway, I improved this a bit to make it shorter:

```cpp
case LLM_ARCH_LLAMA:
    return new llama_model_llama(params);
case LLM_ARCH_LLAMA4:
    return new llama_model_llama4(params);
case LLM_ARCH_LLAMA_EMBED:
    return new llama_model_llama_embed(params);
```
IMO that's not much different from the current `switch..case` implementation. The `std::unordered_map` will be slightly worse because:

- The map is constructed at runtime (versus the switch being compiled statically into a jump table)
- Each lookup on the map needs to call a hash function (versus a simple jump for the switch)
- The lambdas need to be compiled, which produces some overhead; while not very significant, it does technically cost compilation time

Besides, I can't think of any case where the `std::map` can do something `switch..case` cannot (except being modifiable at runtime, which we won't support anyway)
BTW it's `static const`, so the compiler will probably optimize it anyway; it won't be constructed at runtime FWIW.
> More elegant ;)

I don't get it... how can this:

```cpp
{ A, []{ return std::make_unique<AClass>(); } },
```

be more elegant than this:

```cpp
case A: return new AClass(); // no lambda is needed
```

> BTW it's static const, so the compiler will probably optimize it anyway, won't be constructed at runtime FWIW.

IIRC it only does that with -O2 or -O3. A switch statement is always compiled to a static jump table no matter what, so why not?

Also, the map always requires calculating a hash to look up elements.
Force-pushed 69104f1 to 31011e6
```cpp
struct llama_model_llama_embed : public llama_model_llama {
    llama_model_llama_embed(const struct llama_model_params & params) : llama_model_llama(params) {}
    // reuse load_hparams and load_tensors from llama_model_llama

    template <bool embed>
    using graph = llama_model_llama::graph<embed>;

    std::unique_ptr<llm_graph_context> build_graph_context(const llm_graph_params & params) const override;
};
```
The auto-migration script can't be intelligent enough to point out that we can use the `<true>` specialization of the graph here, but things like this can be improved via a follow-up PR (just noting here for visibility)
```cpp
// helper function to facilitate migration
// TODO: remove this in the future
auto create_tensor = [&](const LLM_TN_IMPL & tn, const std::initializer_list<int64_t> & ne, int flags) -> ggml_tensor * {
```
Would rename the lambda; having both the method and the lambda named `create_tensor` is a bit confusing.
I think the better way is to completely move the current function into `llm_arch_model_i` and simply reuse `create_tensor` there. Will push a fix for this.

I try not to rename things, to reduce the number of line changes in this PR
This lambda is removed in the latest version
Force-pushed 31011e6 to b5809e2
Not very urgent, but pinging @ggerganov if you can have a quick look at the direction
Overview
Fix #21966
The migration will be done via a script `0migrate.py` included in this PR (it will be removed right before this is merged)

Important note: the goal of this PR is to make the migration as deterministic as possible; we do that via a completely heuristic script as mentioned above. Any improvements (deduplication, clean up, etc.) will be done via follow-up PRs
Depends on:
Checklist before merging:

- `#if 0` pre-migration places

Additional information
Migration rules:

- Model class names: prefix `llama_model_`, with `-` replaced by `_`

Given 2 example archs A and B:

- If B has the same `load_tensors` AND `load_hparams` as A, B inherits from A
- If `load_tensors` OR `load_hparams` is different, there is no inheritance; one of the 2 functions will be duplicated
- `using graph = llama_model_a::graph`

Side effects:

- Some duplicated code (in `load_tensors` or `load_hparams`)
- `-iswa` suffix in file name or in model class name

Requirements