[PAL/vm-common] Use per-CPU L1d,L1i,L2 caches and one shared L3#32

Merged
dimakuv merged 1 commit into
intel_tdxfrom
dimakuv/vm-common-fix-caches-to-cpus
Jul 12, 2024
@dimakuv dimakuv commented Jul 12, 2024

Description of the changes

Previously, the synthesized VM CPU/NUMA/cache topology had a bug: every CPU pointed to the same L1d cache, the same L1i cache, and the same L2 cache. The LibOS layer interpreted this as, e.g., a single L1d cache on the platform shared by all CPUs. This bogus CPU-cache topology confused some programs, in particular the GEMM Rust crate. The crate calculates the number of CPUs sharing a particular cache, uses this number to calculate the "effective" number of bytes in the cache reserved for a single CPU, and then uses that size to optimize matrix multiplication:
- https://github.com/sarah-ek/gemm/blob/8cdc1de4/gemm-common/src/cache.rs#L118
- https://github.com/sarah-ek/gemm/blob/8cdc1de4/gemm-common/src/cache.rs#L214
- https://github.com/sarah-ek/gemm/blob/8cdc1de4/gemm-common/src/gemm.rs#L356

This PR creates a correct CPU-cache topology: each CPU has a dedicated L1d cache, L1i cache, and L2 cache. The L3 cache is shared by all CPUs (as before). This satisfies the GEMM Rust crate and allows running, e.g., the Candle ML framework.
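To illustrate why the old topology hurt, here is a minimal sketch of the calculation described above. It is a simplified model, not the GEMM crate's actual code; the function names and the `cpu_to_cache` array are hypothetical and introduced only for this example.

```c
#include <stddef.h>

/* Hypothetical, simplified model: cpu_to_cache[i] is the ID of the cache
 * (at one level, e.g. L1d) that CPU i points to. */
size_t cpus_sharing_cache(const size_t* cpu_to_cache, size_t num_cpus,
                          size_t cache_id) {
    size_t count = 0;
    for (size_t i = 0; i < num_cpus; i++) {
        if (cpu_to_cache[i] == cache_id)
            count++;
    }
    return count;
}

/* "Effective" number of bytes of a cache reserved for a single CPU:
 * the cache size divided by the number of CPUs that share it. */
size_t effective_bytes_per_cpu(size_t cache_bytes, size_t sharing_cpus) {
    return sharing_cpus ? cache_bytes / sharing_cpus : cache_bytes;
}
```

With four CPUs that all point to cache 0 (the old topology), a 32 KiB L1d cache yields an effective 8 KiB per CPU. With one dedicated cache per CPU (the new topology), each CPU gets the full 32 KiB.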

How to test this PR?

Run #31.


Previously, synthesized VM CPU/NUMA/caches topology had a bug: each CPU
was pointing to the same L1d cache, same L1i cache, same L2 cache. This
was interpreted by the LibOS layer as e.g. a single L1d cache on the
platform shared by all CPUs. This bogus CPUs-caches topology confused
some programs, in particular the GEMM Rust crate: the crate calculates
the number of CPUs sharing a particular cache, then uses this number to
calculate the "effective" number of bytes in the cache reserved for a
single CPU, and then uses this number to optimize matrix multiplication:
- https://github.com/sarah-ek/gemm/blob/8cdc1de4/gemm-common/src/cache.rs#L118
- https://github.com/sarah-ek/gemm/blob/8cdc1de4/gemm-common/src/cache.rs#L214
- https://github.com/sarah-ek/gemm/blob/8cdc1de4/gemm-common/src/gemm.rs#L356

This commit creates a correct CPU-cache topology: each CPU has a
dedicated L1d cache, L1i cache, and L2 cache. The L3 cache is shared by
all CPUs (as before). This satisfies the GEMM Rust crate and allows
running, e.g., the Candle ML framework.

Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
@dimakuv dimakuv left a comment

Reviewable status: 0 of 1 files reviewed, all discussions resolved

a discussion (no related file):
FYI: This PR increases the data segment of the PAL binary: instead of 4 statically allocated cache-infos, we now have MAX_NUM_CPUS * MAX_CACHES = 1024 statically allocated cache-infos.

I checked the final binary sizes before and after this PR: no change at all.
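For readers unfamiliar with the layout, the static allocation could look roughly like the sketch below. This is an assumption-laden illustration, not the code from this PR: the values `MAX_NUM_CPUS = 256` and `MAX_CACHES = 4` are guesses consistent with the product 1024, and `struct cache_info`, `g_caches`, and `cache_info_index` are hypothetical names.

```c
#include <stddef.h>

#define MAX_NUM_CPUS 256 /* assumption: chosen so that the product is 1024 */
#define MAX_CACHES   4   /* assumption: L1d, L1i, L2, L3 */

/* Hypothetical cache kinds, in the order used for indexing. */
enum cache_kind { CACHE_L1D = 0, CACHE_L1I = 1, CACHE_L2 = 2, CACHE_L3 = 3 };

/* Hypothetical cache-info record; the real structure may differ. */
struct cache_info {
    size_t level;
    size_t size_bytes;
};

/* One slot per (CPU, cache kind) pair, statically allocated. */
static struct cache_info g_caches[MAX_NUM_CPUS * MAX_CACHES];

/* Index of the cache-info a CPU should point to: L1d, L1i, and L2 are
 * per-CPU, while every CPU points to the single L3 slot owned by CPU 0. */
size_t cache_info_index(size_t cpu, enum cache_kind kind) {
    size_t owner = (kind == CACHE_L3) ? 0 : cpu;
    return owner * MAX_CACHES + (size_t)kind;
}
```

In this sketch, two different CPUs get different indices for L1d, L1i, and L2, but the same index for L3, which matches the topology described in the PR.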


@dimakuv dimakuv merged commit c4f0437 into intel_tdx Jul 12, 2024
@dimakuv dimakuv deleted the dimakuv/vm-common-fix-caches-to-cpus branch July 12, 2024 08:43