Avoid redundant memory_breakdown() computation in device memory probe #21988

Open
Jessen-Li wants to merge 1 commit into ggml-org:master from Jessen-Li:pr-memory-breakdown

Conversation

@Jessen-Li

Overview

This PR removes a redundant computation of ctx->memory_breakdown() during device memory probing in llama_get_device_memory_data().

Additional information

Changes

  • Introduce llama_memory_breakdown_print_impl, which accepts a precomputed memory_breakdown object.
  • Reuse the existing memory_breakdown result from llama_get_device_memory_data() instead of recomputing it in
    llama_memory_breakdown_print().

Motivation

llama_get_device_memory_data() may be called multiple times during fit/probe workflows (e.g. baseline, context probing, repeated resource checks). Each call currently recomputes memory_breakdown, duplicating work unnecessarily.

This change ensures:

  • No redundant computation within a single probe
  • Consistent snapshot used for both computation and logging
  • Slight reduction in overhead in repeated probe scenarios

Behavior

No functional change is intended.
Output and memory accounting remain identical.

Requirements

No new requirements introduced by this change.

Add llama_memory_breakdown_print_impl that accepts a precomputed
memory_breakdown object, and reuse it in llama_get_device_memory_data
to avoid duplicate computation during fit/probe runs.
@Jessen-Li Jessen-Li requested a review from ggerganov as a code owner April 16, 2026 11:16
