
[docs]: refresh KT install commands #1958

Merged
JimmyPeilinLi merged 1 commit into main from docs-v061-refresh on Apr 26, 2026

Conversation

@JimmyPeilinLi (Collaborator)

Summary: update the README, SFT, and kt-kernel install commands to the current KT inference and SFT paths while keeping docs formatting minimal. Validation: ran `git diff --check`; confirmed that the sglang-kt 0.6.1 wheel metadata does not declare a dependency on kt-kernel.
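
That metadata check can be reproduced from the shell; a minimal sketch, assuming the package is published on PyPI under the name sglang-kt with the standard wheel layout:

```shell
# Fetch only the wheel (no install, no dependencies), then look for a
# kt-kernel requirement in its METADATA file.
pip download sglang-kt==0.6.1 --no-deps --only-binary=:all: -d /tmp/sglang-kt
unzip -p /tmp/sglang-kt/sglang_kt-0.6.1-*.whl '*.dist-info/METADATA' \
  | grep -i 'Requires-Dist:.*kt-kernel' \
  || echo "no kt-kernel dependency declared"
```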

@gemini-code-assist (Bot) left a comment

Code Review

This pull request updates the documentation and installation guides to reflect the transition to PyPI-based package distribution and improved integration with LLaMA-Factory. Key changes include simplified installation commands, updated tutorial links, and more detailed training/inference instructions using accelerate. Feedback focuses on maintaining necessary dependency extras for LLaMA-Factory, correcting Python version compatibility notes, and ensuring consistency between model examples used in training and inference sections.

Comment thread: README.md

```shell
# Install environment following kt-sft/README.md
USE_KT=1 llamafactory-cli train examples/train_lora/deepseek3_lora_sft_kt.yaml
cd /path/to/LLaMA-Factory
pip install -e .
```

Contributor

Severity: medium

Including the [torch,metrics] extras when installing LLaMA-Factory is recommended to ensure that all necessary dependencies for training and evaluation are installed, especially since the quick start does not explicitly install them earlier.

Suggested change:

```diff
-pip install -e .
+pip install -e ".[torch,metrics]"
```

Comment thread: README_ZH.md

```shell
# Install environment following kt-sft/README.md
USE_KT=1 llamafactory-cli train examples/train_lora/deepseek3_lora_sft_kt.yaml
cd /path/to/LLaMA-Factory
pip install -e .
```

Contributor

Severity: medium

It is recommended to include the [torch,metrics] extras when installing LLaMA-Factory, to ensure that all dependencies required for training and evaluation are installed.

Suggested change:

```diff
-pip install -e .
+pip install -e ".[torch,metrics]"
```

```diff
 # 1. Create a conda environment
-conda create -n Kllama python=3.12 # choose from : [3.10, 3.11, 3.12, 3.13]
+conda create -n Kllama python=3.12 # choose from : [3.11, 3.12, 3.13]
```

Contributor

Severity: medium

The suggested Python versions [3.11, 3.12, 3.13] include 3.13, but kt-kernel/README.md (lines 63 and 68) indicates that pre-built wheels are currently only provided for Python 3.10, 3.11, and 3.12. Suggesting 3.13 may lead to a source build, which contradicts the goal of avoiding local compilation stated in line 98. Additionally, Python 3.10 is missing from the list despite being supported.

Suggested change:

```diff
-conda create -n Kllama python=3.12 # choose from : [3.11, 3.12, 3.13]
+conda create -n Kllama python=3.12 # choose from : [3.10, 3.11, 3.12]
```
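
To check up front whether a pre-built wheel matches the chosen interpreter, pip can be asked for a binary-only download; a minimal sketch, assuming kt-kernel is distributed on PyPI:

```shell
# Succeeds only if a pre-built wheel exists for the active Python;
# otherwise pip errors out instead of silently building from source.
pip download kt-kernel --no-deps --only-binary=:all: -d /tmp/kt-wheels
```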

```diff
 git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
 cd LLaMA-Factory
-pip install -e ".[torch,metrics]" --no-build-isolation
+pip install -e .
```

Contributor

Severity: medium

Removing the [torch,metrics] extras from the LLaMA-Factory installation may cause the training and evaluation steps to fail if these dependencies are not already present in the environment. It is safer to include them to ensure the tutorial works as expected.

Suggested change:

```diff
-pip install -e .
+pip install -e ".[torch,metrics]"
```


```diff
 # 4. Install flash-attention, download the corresponding file based on your Python and Torch versions from: https://github.com/Dao-AILab/flash-attention/releases
-pip install flash_attn-2.8.3+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
+pip install flash-attn --no-build-isolation
```

Contributor

Severity: medium

The command has been updated to use PyPI, but the preceding comment (line 116) still instructs the user to download a wheel file from GitHub. Please update the comment to be consistent with the new installation method.
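
For readers who keep the GitHub-releases route as a fallback, the tags needed to match a wheel filename can be printed directly; a minimal sketch using standard Python and torch introspection (nothing flash-attention-specific):

```shell
# Prints the pieces of a wheel filename such as
# flash_attn-2.8.3+cu12torch2.7...-cp312-cp312-linux_x86_64.whl
python - <<'EOF'
import sys, torch
print(f"python tag: cp{sys.version_info.major}{sys.version_info.minor}")
print(f"torch: {torch.__version__}, cuda: {torch.version.cuda}")
EOF
```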

```diff
 ### Core Feature 2: Chat with the fine-tuned model (base + LoRA adapter)

-Run the command: `llamafactory-cli chat examples/inference/deepseek3_lora_sft_kt.yaml`.
+Run the command: `llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml`.
```

Contributor

Severity: medium

The command uses qwen3_lora_sft.yaml, but the tutorial is focused on DeepSeek-V3 (as seen in the training step at line 131 and the YAML example at line 221). This inconsistency will confuse users who just trained a DeepSeek-V3 model in the previous step. Please update the command to use the appropriate DeepSeek-V3 inference configuration.

Suggested change:

```diff
-Run the command: `llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml`.
+Run the command: `llamafactory-cli chat examples/inference/deepseek_v3_lora_sft.yaml`.
```

```diff
 ### Core Feature 3: Batch inference + metrics (base + LoRA adapter)

-Run the command: `API_PORT=8000 llamafactory-cli api examples/inference/deepseek3_lora_sft_kt.yaml`.
+Run the command: `API_PORT=8000 llamafactory-cli api examples/inference/qwen3_lora_sft.yaml`.
```

Contributor

Severity: medium

This API command uses qwen3_lora_sft.yaml, which is inconsistent with the DeepSeek-V3 model used throughout the rest of the guide.

Suggested change:

```diff
-Run the command: `API_PORT=8000 llamafactory-cli api examples/inference/qwen3_lora_sft.yaml`.
+Run the command: `API_PORT=8000 llamafactory-cli api examples/inference/deepseek_v3_lora_sft.yaml`.
```
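
Once the API server is running, the endpoint can be smoke-tested with a plain OpenAI-style request; a minimal sketch, assuming the OpenAI-compatible route that `llamafactory-cli api` exposes and an illustrative model name:

```shell
# Send a single chat completion to the locally served model; the model
# field and port are illustrative and should match your YAML config.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-v3-lora", "messages": [{"role": "user", "content": "Hello!"}]}'
```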

@JimmyPeilinLi merged commit 0656e01 into main on Apr 26, 2026
6 checks passed