
common : add --hf-prune-old-files (-hfp) parameter to automatically delete outdated HF files#21923

Open
Cr4xy wants to merge 1 commit into ggml-org:master from Cr4xy:hf-prune-old-files

Conversation


@Cr4xy Cr4xy commented Apr 14, 2026

Overview

This PR adds a new CLI argument for Hugging Face cache housekeeping that tells llama.cpp to delete older versions of downloaded HF models.
Since models are now stored in the HF cache layout, disk usage can grow significantly when the same model is downloaded multiple times.
For example, my models--unsloth--gemma-4-26B-A4B-it-GGUF folder currently takes up around 100GB instead of 27GB.
With this flag, llama.cpp checks for old versions on startup and deletes any it finds. This essentially restores the old behavior, where the on-disk size stayed (almost) constant when a new version of a model was downloaded.

Additional information

This works by checking all symlinks in the snapshots directory that share a file name with the current (latest) model file(s) and mmproj. If symlinks with the same name exist under a different commit, they are deleted; if they do not point to the same blob as the current files, those blob files are deleted as well. If a commit folder ends up empty, it is cleaned up too.
Unreferenced blobs are left untouched: we have no information about what they are used for, so deleting them could break other tools.
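The pruning steps above can be sketched in shell (a minimal illustration of the logic only, not the actual C++ implementation; the repo path, commit names, and blob names below are made-up fixtures):

```shell
#!/bin/sh
# Sketch of the pruning logic described above, assuming the standard
# HF cache layout: $repo/{blobs,refs,snapshots/<commit>/}.
# First build a tiny fixture with two snapshots of the same file.
repo=$(mktemp -d)/models--example--model-GGUF
keep=aaaa   # hypothetical commit of the snapshot currently in use
old=bbbb    # hypothetical outdated commit

mkdir -p "$repo/blobs" "$repo/snapshots/$keep" "$repo/snapshots/$old"
echo new > "$repo/blobs/blob_new"
echo old > "$repo/blobs/blob_old"
ln -s ../../blobs/blob_new "$repo/snapshots/$keep/model.gguf"
ln -s ../../blobs/blob_old "$repo/snapshots/$old/model.gguf"

# Prune: for each file in the current snapshot, delete same-named symlinks
# under other commits, plus their blobs when they differ from the current
# one, then clean up emptied commit directories.
for f in "$repo/snapshots/$keep"/*.gguf; do
    name=$(basename "$f")
    current_blob=$(readlink -f "$f")
    for link in "$repo"/snapshots/*/"$name"; do
        dir=$(dirname "$link")
        [ "$dir" = "$repo/snapshots/$keep" ] && continue
        blob=$(readlink -f "$link")
        rm -- "$link"
        [ "$blob" != "$current_blob" ] && rm -f -- "$blob"
        rmdir "$dir" 2>/dev/null || true
    done
done

ls "$repo/blobs" "$repo/snapshots"   # inspect what's left
```

After running this, only the current snapshot and its blob remain; a blob shared between snapshots (like mmproj in the example above) would survive because it resolves to the same target as the current file.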

Tests

For testing, I recreated a directory structure for unsloth/gemma-4-26B-A4B-it using the commands below, then verified that only the latest version was kept:

mkdir ~/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/

cd ~/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/

mkdir -p blobs refs snapshots

touch blobs/04970385cb9761b18d11dd9088a98b2d55dd81f90861227295d563f15dec6052
touch blobs/3cb067adc7c8326f62c0bea8adba7b12419294622dd8318285be54a960305e98
touch blobs/8196e6dda1446b547d69cf30c0bf7a12a69eaf399513ab41ff065796aa975a61
touch blobs/9c961effefe478b33e4f4f63ba225aa51b3ab11aa3d68fe2ad34ce4f082a241b
touch blobs/9edda7d119912936bb9a1b52ba3f014eb57bfe390cac15dfe607fe6bedc151a7
touch blobs/fc2ebf4c44528daa2cea7b39891712847ca5e4f87dcf578054a06c46bfe6da27

touch refs/main

mkdir -p snapshots/033f1ad73e950e049fd521ae599b3e5573a1517d
mkdir -p snapshots/8bacec5c8e829a25502cdfe3c3f5b6aabee3218c
mkdir -p snapshots/9c718328e1620e7280a93e1a809e805e0f3e4839
mkdir -p snapshots/bd1a2329b14654bebfdf4b3346cd3b8e123fd81b

ln -s ../../blobs/9edda7d119912936bb9a1b52ba3f014eb57bfe390cac15dfe607fe6bedc151a7 snapshots/033f1ad73e950e049fd521ae599b3e5573a1517d/gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf
ln -s ../../blobs/fc2ebf4c44528daa2cea7b39891712847ca5e4f87dcf578054a06c46bfe6da27 snapshots/033f1ad73e950e049fd521ae599b3e5573a1517d/mmproj-BF16.gguf

ln -s ../../blobs/9c961effefe478b33e4f4f63ba225aa51b3ab11aa3d68fe2ad34ce4f082a241b snapshots/8bacec5c8e829a25502cdfe3c3f5b6aabee3218c/gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf
ln -s ../../blobs/fc2ebf4c44528daa2cea7b39891712847ca5e4f87dcf578054a06c46bfe6da27 snapshots/8bacec5c8e829a25502cdfe3c3f5b6aabee3218c/mmproj-BF16.gguf

ln -s ../../blobs/3cb067adc7c8326f62c0bea8adba7b12419294622dd8318285be54a960305e98 snapshots/9c718328e1620e7280a93e1a809e805e0f3e4839/gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf
ln -s ../../blobs/fc2ebf4c44528daa2cea7b39891712847ca5e4f87dcf578054a06c46bfe6da27 snapshots/9c718328e1620e7280a93e1a809e805e0f3e4839/mmproj-BF16.gguf

ln -s ../../blobs/3cb067adc7c8326f62c0bea8adba7b12419294622dd8318285be54a960305e98 snapshots/bd1a2329b14654bebfdf4b3346cd3b8e123fd81b/gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf
ln -s ../../blobs/fc2ebf4c44528daa2cea7b39891712847ca5e4f87dcf578054a06c46bfe6da27 snapshots/bd1a2329b14654bebfdf4b3346cd3b8e123fd81b/mmproj-BF16.gguf

Here's the current output:

$ tree ~/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/
/home/cr4xy/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/
├── blobs
│   ├── 04970385cb9761b18d11dd9088a98b2d55dd81f90861227295d563f15dec6052
│   ├── 3cb067adc7c8326f62c0bea8adba7b12419294622dd8318285be54a960305e98
│   ├── 8196e6dda1446b547d69cf30c0bf7a12a69eaf399513ab41ff065796aa975a61
│   ├── 9c961effefe478b33e4f4f63ba225aa51b3ab11aa3d68fe2ad34ce4f082a241b
│   ├── 9edda7d119912936bb9a1b52ba3f014eb57bfe390cac15dfe607fe6bedc151a7
│   └── fc2ebf4c44528daa2cea7b39891712847ca5e4f87dcf578054a06c46bfe6da27
├── refs
│   └── main
└── snapshots
    ├── 033f1ad73e950e049fd521ae599b3e5573a1517d
    │   ├── gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf -> ../../blobs/9edda7d119912936bb9a1b52ba3f014eb57bfe390cac15dfe607fe6bedc151a7
    │   └── mmproj-BF16.gguf -> ../../blobs/fc2ebf4c44528daa2cea7b39891712847ca5e4f87dcf578054a06c46bfe6da27
    ├── 8bacec5c8e829a25502cdfe3c3f5b6aabee3218c
    │   ├── gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf -> ../../blobs/9c961effefe478b33e4f4f63ba225aa51b3ab11aa3d68fe2ad34ce4f082a241b
    │   └── mmproj-BF16.gguf -> ../../blobs/fc2ebf4c44528daa2cea7b39891712847ca5e4f87dcf578054a06c46bfe6da27
    ├── 9c718328e1620e7280a93e1a809e805e0f3e4839
    │   ├── gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf -> ../../blobs/3cb067adc7c8326f62c0bea8adba7b12419294622dd8318285be54a960305e98
    │   └── mmproj-BF16.gguf -> ../../blobs/fc2ebf4c44528daa2cea7b39891712847ca5e4f87dcf578054a06c46bfe6da27
    └── bd1a2329b14654bebfdf4b3346cd3b8e123fd81b
        ├── gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf -> ../../blobs/3cb067adc7c8326f62c0bea8adba7b12419294622dd8318285be54a960305e98
        └── mmproj-BF16.gguf -> ../../blobs/fc2ebf4c44528daa2cea7b39891712847ca5e4f87dcf578054a06c46bfe6da27

8 directories, 15 files
$ ./build/bin/llama-cli -hf unsloth/gemma-4-26B-A4B-it-GGUF:Q8_K_XL --ctx-size 8192 -v -hfp
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 24060 MiB):
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes, VRAM: 24060 MiB
common_download_file_single_online: no previous model file found /home/cr4xy/.cache/llama.cpp/unsloth_gemma-4-26B-A4B-it-GGUF_preset.ini
common_download_file_single_online: HEAD failed, status: 404
no remote preset found, skipping
common_download_file_single_online: using cached file: /home/cr4xy/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/snapshots/8bacec5c8e829a25502cdfe3c3f5b6aabee3218c/gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf
common_download_file_single_online: using cached file: /home/cr4xy/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/snapshots/8bacec5c8e829a25502cdfe3c3f5b6aabee3218c/mmproj-BF16.gguf
deleting old blob file from hf cache: /home/cr4xy/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/snapshots/9c718328e1620e7280a93e1a809e805e0f3e4839/gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf -> /home/cr4xy/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/blobs/3cb067adc7c8326f62c0bea8adba7b12419294622dd8318285be54a960305e98
deleting old blob file from hf cache: /home/cr4xy/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/snapshots/033f1ad73e950e049fd521ae599b3e5573a1517d/gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf -> /home/cr4xy/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/blobs/9edda7d119912936bb9a1b52ba3f014eb57bfe390cac15dfe607fe6bedc151a7
deleting old blob file from hf cache: /home/cr4xy/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/snapshots/bd1a2329b14654bebfdf4b3346cd3b8e123fd81b/gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf -> /home/cr4xy/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/blobs/3cb067adc7c8326f62c0bea8adba7b12419294622dd8318285be54a960305e98
[...]
$ tree ~/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/
/home/cr4xy/.cache/huggingface/hub/models--unsloth--gemma-4-26B-A4B-it-GGUF/
├── blobs
│   ├── 04970385cb9761b18d11dd9088a98b2d55dd81f90861227295d563f15dec6052 # unreferenced
│   ├── 8196e6dda1446b547d69cf30c0bf7a12a69eaf399513ab41ff065796aa975a61 # unreferenced
│   ├── 9c961effefe478b33e4f4f63ba225aa51b3ab11aa3d68fe2ad34ce4f082a241b
│   └── fc2ebf4c44528daa2cea7b39891712847ca5e4f87dcf578054a06c46bfe6da27
├── refs
│   └── main
└── snapshots
    └── 8bacec5c8e829a25502cdfe3c3f5b6aabee3218c
        ├── gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf -> ../../blobs/9c961effefe478b33e4f4f63ba225aa51b3ab11aa3d68fe2ad34ce4f082a241b
        └── mmproj-BF16.gguf -> ../../blobs/fc2ebf4c44528daa2cea7b39891712847ca5e4f87dcf578054a06c46bfe6da27

5 directories, 7 files

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: Yes, FIM autocomplete with Qwen2.5-Coder-7B:Q8_0 and Qwen3.5-35B-A3B:UD_Q4_K_XL to convert the existing model directory tree into the above commands for testing

@Cr4xy Cr4xy requested a review from a team as a code owner April 14, 2026 21:28
@ngxson ngxson requested a review from angt April 15, 2026 13:25
@angt
Member

angt commented Apr 16, 2026

I’m not convinced we should incorporate features from the hf cli tool into llama.cpp.
When downloading models with -hf, I believe we can expect users to use the CLI to manage the cache entirely.

@Cr4xy
Author

Cr4xy commented Apr 16, 2026

I’m not convinced we should incorporate features from the hf cli tool into llama.cpp. When downloading models with -hf, I believe we can expect users to use the CLI to manage the cache entirely.

I understand that, I just think this would make sense because otherwise users are forced to always clean up disk space manually with the CLI. Personally, I don't even have huggingface_hub installed on the computer that runs llama-server, since llama.cpp already downloads everything automatically with -hf. It would just be nice if it could also take care of this part.

@shipped-it

+1

As llama.cpp can download models from Hugging Face, users should also be able to clean up those downloaded files quickly.


3 participants