Avoid redundant memory_breakdown() computation in device memory probe #21988
Open
Jessen-Li wants to merge 1 commit into ggml-org:master
Add llama_memory_breakdown_print_impl that accepts a precomputed memory_breakdown object, and reuse it in llama_get_device_memory_data to avoid duplicate computation during fit/probe runs.
Overview
This PR removes a redundant computation of ctx->memory_breakdown() during device memory probing in llama_get_device_memory_data().
Additional information
Changes
llama_memory_breakdown_print()
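As a rough illustration of the pattern described in this PR (not the actual diff), the sketch below separates a print routine that takes a precomputed breakdown from a thin wrapper that computes one. All names here (`fake_context`, `breakdown_entry`, `breakdown_print_impl`, `probe_device_memory`) are hypothetical stand-ins and do not match the real llama.cpp types or signatures; the call counter exists only to make the saved computation visible.

```cpp
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical stand-in for one per-buffer-type breakdown record.
struct breakdown_entry {
    size_t model;
    size_t context;
    size_t compute;
};

using breakdown_map = std::map<std::string, breakdown_entry>;

// Hypothetical stand-in for the context; counts how often the breakdown is computed.
struct fake_context {
    mutable int n_breakdown_calls = 0;

    breakdown_map memory_breakdown() const {
        n_breakdown_calls++;
        breakdown_map mb;
        mb["GPU0"] = breakdown_entry{4096, 512, 128};
        mb["CPU"]  = breakdown_entry{1024, 0, 64};
        return mb;
    }
};

// Prints from a breakdown the caller has already computed.
static void breakdown_print_impl(const breakdown_map & mb) {
    for (const auto & kv : mb) {
        std::printf("%s: model=%zu context=%zu compute=%zu\n",
                    kv.first.c_str(), kv.second.model, kv.second.context, kv.second.compute);
    }
}

// Convenience wrapper for callers that have no breakdown at hand.
static void breakdown_print(const fake_context & ctx) {
    breakdown_print_impl(ctx.memory_breakdown());
}

// Probe path: compute the breakdown once, use it for accounting and for printing.
static size_t probe_device_memory(const fake_context & ctx) {
    const breakdown_map mb = ctx.memory_breakdown();
    size_t total = 0;
    for (const auto & kv : mb) {
        total += kv.second.model + kv.second.context + kv.second.compute;
    }
    breakdown_print_impl(mb); // reuse instead of recomputing
    return total;
}
```

In this sketch, one call to `probe_device_memory` triggers a single `memory_breakdown()` computation, whereas calling `breakdown_print(ctx)` from inside the probe would have triggered a second one.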
Motivation
llama_get_device_memory_data() may be called multiple times during fit/probe workflows (e.g. baseline, context probing, repeated resource checks). Each call currently recomputes memory_breakdown, leading to unnecessary duplicated work.
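This change ensures that the breakdown computed inside llama_get_device_memory_data() is passed to the print routine and reused, rather than being computed a second time.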
Behavior
No functional change is intended.
Output and memory accounting remain identical.
Requirements
No new requirements introduced by this change.
NO.