[docs]: refresh KT install commands #1958
````diff
@@ -13,13 +13,13 @@

 ## 🎯 Overview

-KTransformers is a research project focused on efficient inference and fine-tuning of large language models through CPU-GPU heterogeneous computing. The project has evolved into **two core modules**: [kt-kernel](./kt-kernel/) and [kt-sft](./kt-sft/).
+KTransformers is a research project focused on efficient inference and fine-tuning of large language models through CPU-GPU heterogeneous computing. The project has evolved into **two core modules**: [kt-kernel](./kt-kernel/) and [kt-sft](./doc/en/SFT/KTransformers-Fine-Tuning_User-Guide.md).

 ## 🔥 Updates

-* **Dec 5, 2025**: Native Kimi-K2-Thinking inference supported ([tutorial](./doc/en/Kimi-K2-Thinking-Native.md))
+* **Dec 5, 2025**: Native Kimi-K2-Thinking inference supported ([tutorial](./doc/en/kt-kernel/Kimi-K2-Thinking-Native.md))
 * **Nov 6, 2025**: Kimi-K2-Thinking inference ([tutorial](./doc/en/Kimi-K2-Thinking.md)) and fine-tuning ([tutorial](./doc/en/SFT_Installation_Guide_KimiK2.md)) supported
-* **Nov 4, 2025**: KTransformers fine-tuning × LLaMA-Factory integration ([tutorial](./doc/en/KTransformers-Fine-Tuning_User-Guide.md))
+* **Nov 4, 2025**: KTransformers fine-tuning × LLaMA-Factory integration ([tutorial](./doc/en/SFT/KTransformers-Fine-Tuning_User-Guide.md))
 * **Oct 27, 2025**: Ascend NPU supported ([tutorial](./doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md))
 * **Oct 10, 2025**: Integrated into SGLang ([roadmap](https://github.com/sgl-project/sglang/issues/11425), [blog](https://lmsys.org/blog/2025-10-22-KTransformers/))
 * **Sep 11, 2025**: Qwen3-Next supported ([tutorial](./doc/en/Qwen3-Next.md))
@@ -79,7 +79,7 @@ pip install .

 ---

-### 🎓 [kt-sft](./kt-sft/) - Fine-tuning framework
+### 🎓 [kt-sft](./doc/en/SFT/KTransformers-Fine-Tuning_User-Guide.md) - Fine-tuning framework

 KTransformers × LLaMA-Factory integration for fine-tuning ultra-large MoE models.

@@ -101,12 +101,15 @@ KTransformers × LLaMA-Factory integration for fine-tuning ultra-large MoE models.

 **Quick start:**
 ```bash
-cd kt-sft
-# Set up the environment following kt-sft/README.md
-USE_KT=1 llamafactory-cli train examples/train_lora/deepseek3_lora_sft_kt.yaml
+cd /path/to/LLaMA-Factory
+pip install -e .
+pip install "ktransformers[sft]"
+USE_KT=1 ACCELERATE_USE_KT=true \
+  accelerate launch --config_file examples/ktransformers/accelerate/fsdp2_kt_bf16.yaml \
+  -m llamafactory.cli train examples/ktransformers/train_lora/deepseek_v3_lora_sft_kt.yaml
 ```

-👉 **[Full documentation →](./kt-sft/README.md)**
+👉 **[Full documentation →](./doc/en/SFT/KTransformers-Fine-Tuning_User-Guide.md)**

 ---
````
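The quick-start command above gates the KT backend behind environment variables. Reading such flags follows the usual truthy-string convention; a minimal sketch (variable names come from the command above, while the parsing logic is an assumption about the typical pattern, not KTransformers' actual implementation):

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Interpret flags like USE_KT=1 or ACCELERATE_USE_KT=true."""
    value = os.environ.get(name)
    if value is None:
        return default
    # Accept the common truthy spellings; anything else is falsy.
    return value.strip().lower() in {"1", "true", "yes", "on"}

# Simulate the environment set by the quick-start command.
os.environ["USE_KT"] = "1"
os.environ["ACCELERATE_USE_KT"] = "true"
use_kt = env_flag("USE_KT") and env_flag("ACCELERATE_USE_KT")
```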
````diff
@@ -95,26 +95,26 @@ This section shows how to install and use **LLaMA-Factory + KTransformers** for

 ### Environment Setup

 According to the following example, install both the **KTransformers** and **LLaMA-Factory** environments simultaneously.
-This time, to simplify the installation process of KTransformers, we have specially packaged a wheel file to avoid local compilation.
+This time, to simplify the installation process of KTransformers, use the PyPI packages to avoid local compilation.
 The detailed installation steps are as follows:
-(Note: Make sure your local **Python version**, **Torch version**, **CUDA version**, and the **KTransformers wheel filename** correspond correctly.)
+(Note: Make sure your local **Python version**, **Torch version**, and **CUDA version** are compatible with the installed packages.)

 ```shell
 # 1. Create a conda environment
-conda create -n Kllama python=3.12  # choose from : [3.10, 3.11, 3.12, 3.13]
+conda create -n Kllama python=3.12  # choose from : [3.11, 3.12, 3.13]
 conda install -y -c conda-forge libstdcxx-ng gcc_impl_linux-64
 conda install -y -c nvidia/label/cuda-11.8.0 cuda-runtime

 # 2. Install the LLaMA-Factory environment
 git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
 cd LLaMA-Factory
-pip install -e ".[torch,metrics]" --no-build-isolation
+pip install -e .

-# 3. Install the KTransformers wheel that matches your Torch and Python versions, from https://github.com/kvcache-ai/ktransformers/releases/tag/v0.4.1 (Note: The CUDA version can differ from that in the wheel filename.)
-pip install ktransformers-0.4.1+cu128torch27fancy-cp312-cp312-linux_x86_64.whl
+# 3. Install the KTransformers SFT packages
+pip install "ktransformers[sft]"

 # 4. Install flash-attention, download the corresponding file based on your Python and Torch versions from: https://github.com/Dao-AILab/flash-attention/releases
-pip install flash_attn-2.8.3+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
+pip install flash-attn --no-build-isolation

 # abi=True/False can find from below
 # import torch
 # print(torch._C._GLIBCXX_USE_CXX11_ABI)
````
````diff
@@ -128,7 +128,7 @@ pip install custom_flashinfer/

 ### Core Feature 1: Use KTransformers backend to fine-tune ultra-large MoE models

-Run the command: `USE_KT=1 llamafactory-cli train examples/train_lora/deepseek3_lora_sft_kt.yaml`.
+Run the command: `USE_KT=1 ACCELERATE_USE_KT=true accelerate launch --config_file examples/ktransformers/accelerate/fsdp2_kt_bf16.yaml -m llamafactory.cli train examples/ktransformers/train_lora/deepseek_v3_lora_sft_kt.yaml`.

 Note: You **must** provide a **BF16** model. DeepSeek-V3-671B is released in FP8 by default; convert with [DeepSeek-V3/inference/fp8_cast_bf16.py](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py).
````
````diff
@@ -213,7 +213,7 @@ Outputs go to `output_dir` in safetensors format plus adapter metadata for later

 ### Core Feature 2: Chat with the fine-tuned model (base + LoRA adapter)

-Run the command: `llamafactory-cli chat examples/inference/deepseek3_lora_sft_kt.yaml`.
+Run the command: `llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml`.

 Use the safetensors adapter trained with KT for inference.
````
````diff
@@ -238,7 +238,7 @@ During loading, LLaMA-Factory maps layer names to KT’s naming. You’ll see lo

 ### Core Feature 3: Batch inference + metrics (base + LoRA adapter)

-Run the command: `API_PORT=8000 llamafactory-cli api examples/inference/deepseek3_lora_sft_kt.yaml`.
+Run the command: `API_PORT=8000 llamafactory-cli api examples/inference/qwen3_lora_sft.yaml`.

 Invoke the KT fine-tuned adapter to provide the API; the usage logic of other APIs is consistent with the native LLaMA-Factory approach.

 ```yaml
````

> **Contributor comment:** Including the `[torch,metrics]` extras when installing LLaMA-Factory is recommended to ensure that all necessary dependencies for training and evaluation are installed, especially since the quick start does not explicitly install them earlier.
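The server started by `llamafactory-cli api` speaks an OpenAI-compatible protocol, so a client only needs to POST a chat-completions payload to the port chosen via `API_PORT`. A minimal sketch of building such a request (the model name `kt-lora` is a hypothetical placeholder, and the `/v1/chat/completions` route is the usual OpenAI-style default, not something this diff specifies):

```python
import json

def build_chat_request(prompt: str, model: str = "kt-lora") -> bytes:
    """Serialize an OpenAI-style chat-completions body.

    "kt-lora" is a placeholder model name for illustration only.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload).encode("utf-8")

# POST this body to http://localhost:8000/v1/chat/completions
# (8000 matching API_PORT in the command above).
body = build_chat_request("Summarize the fine-tuning steps.")
```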