xc-llm-ascend/docs/source/community/user_stories/llamafactory.md

# LLaMA-Factory

**Introduction**

[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) is an easy-to-use and efficient platform for training and fine-tuning large language models. With LLaMA-Factory, you can fine-tune hundreds of pre-trained models locally without writing any code.

LLaMA-Facotory users need to evaluate and inference the model after fine-tuning.

**Business challenge**

LLaMA-Factory uses Transformers to perform inference on Ascend NPUs, but the speed is slow.

**Benefits with vLLM Ascend**

With the joint efforts of LLaMA-Factory and vLLM Ascend ([LLaMA-Factory#7739](https://github.com/hiyouga/LLaMA-Factory/pull/7739)), LLaMA-Factory has achieved significant performance gains during model inference. Benchmark results show that its inference speed is now up to 2× faster compared to the Transformers implementation.

**Learn more**

See more details about LLaMA-Factory and how it uses vLLM Ascend for inference on Ascend NPUs in [LLaMA-Factory Ascend NPU Inference](https://llamafactory.readthedocs.io/en/latest/advanced/npu_inference.html).
-												[Doc] Refactor and init user story page  (#1224)

### What this PR does / why we need it?
This PR refactor the user stories page:
- Move it to community
- Add initial info of LLaMA-Factory, Huggingface/trl, MindIE Turbo,
GPUStack, verl
- Add a new page for LLaMA-Factory

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview locally

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
											
										
										
											2025-06-17 09:36:35 +08:00
+								# LLaMA-Factory
-												[v0.11.0][Doc] Update doc (#3852)

### What this PR does / why we need it?
Update doc


Signed-off-by: hfadzxy <starmoon_zhang@163.com>
											
										
										
											2025-10-29 11:32:12 +08:00
+								**Introduction**
-												[Doc] Refactor and init user story page  (#1224)

### What this PR does / why we need it?
This PR refactor the user stories page:
- Move it to community
- Add initial info of LLaMA-Factory, Huggingface/trl, MindIE Turbo,
GPUStack, verl
- Add a new page for LLaMA-Factory

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview locally

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
											
										
										
											2025-06-17 09:36:35 +08:00
 								[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) is an easy-to-use and efficient platform for training and fine-tuning large language models. With LLaMA-Factory, you can fine-tune hundreds of pre-trained models locally without writing any code.
-												[v0.11.0][Doc] Update doc (#3852)

### What this PR does / why we need it?
Update doc


Signed-off-by: hfadzxy <starmoon_zhang@163.com>
											
										
										
											2025-10-29 11:32:12 +08:00
+								LLaMA-Facotory users need to evaluate and inference the model after fine-tuning.
-												[Doc] Refactor and init user story page  (#1224)

### What this PR does / why we need it?
This PR refactor the user stories page:
- Move it to community
- Add initial info of LLaMA-Factory, Huggingface/trl, MindIE Turbo,
GPUStack, verl
- Add a new page for LLaMA-Factory

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview locally

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
											
										
										
											2025-06-17 09:36:35 +08:00
-												[v0.11.0][Doc] Update doc (#3852)

### What this PR does / why we need it?
Update doc


Signed-off-by: hfadzxy <starmoon_zhang@163.com>
											
										
										
											2025-10-29 11:32:12 +08:00
+								**Business challenge**
-												[Doc] Refactor and init user story page  (#1224)

### What this PR does / why we need it?
This PR refactor the user stories page:
- Move it to community
- Add initial info of LLaMA-Factory, Huggingface/trl, MindIE Turbo,
GPUStack, verl
- Add a new page for LLaMA-Factory

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview locally

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
											
										
										
											2025-06-17 09:36:35 +08:00
-												[v0.11.0][Doc] Update doc (#3852)

### What this PR does / why we need it?
Update doc


Signed-off-by: hfadzxy <starmoon_zhang@163.com>
											
										
										
											2025-10-29 11:32:12 +08:00
+								LLaMA-Factory uses Transformers to perform inference on Ascend NPUs, but the speed is slow.
-												[Doc] Refactor and init user story page  (#1224)

### What this PR does / why we need it?
This PR refactor the user stories page:
- Move it to community
- Add initial info of LLaMA-Factory, Huggingface/trl, MindIE Turbo,
GPUStack, verl
- Add a new page for LLaMA-Factory

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview locally

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
											
										
										
											2025-06-17 09:36:35 +08:00
-												[v0.11.0][Doc] Update doc (#3852)

### What this PR does / why we need it?
Update doc


Signed-off-by: hfadzxy <starmoon_zhang@163.com>
											
										
										
											2025-10-29 11:32:12 +08:00
+								**Benefits with vLLM Ascend**
-												[Doc] Refactor and init user story page  (#1224)

### What this PR does / why we need it?
This PR refactor the user stories page:
- Move it to community
- Add initial info of LLaMA-Factory, Huggingface/trl, MindIE Turbo,
GPUStack, verl
- Add a new page for LLaMA-Factory

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview locally

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
											
										
										
											2025-06-17 09:36:35 +08:00
-												[v0.11.0][Doc] Update doc (#3852)

### What this PR does / why we need it?
Update doc


Signed-off-by: hfadzxy <starmoon_zhang@163.com>
											
										
										
											2025-10-29 11:32:12 +08:00
+								With the joint efforts of LLaMA-Factory and vLLM Ascend ([LLaMA-Factory#7739](https://github.com/hiyouga/LLaMA-Factory/pull/7739)), LLaMA-Factory has achieved significant performance gains during model inference. Benchmark results show that its inference speed is now up to 2× faster compared to the Transformers implementation.
-												[Doc] Refactor and init user story page  (#1224)

### What this PR does / why we need it?
This PR refactor the user stories page:
- Move it to community
- Add initial info of LLaMA-Factory, Huggingface/trl, MindIE Turbo,
GPUStack, verl
- Add a new page for LLaMA-Factory

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview locally

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
											
										
										
											2025-06-17 09:36:35 +08:00
 								**Learn more**
-												[v0.11.0][Doc] Update doc (#3852)

### What this PR does / why we need it?
Update doc


Signed-off-by: hfadzxy <starmoon_zhang@163.com>
											
										
										
											2025-10-29 11:32:12 +08:00
+								See more details about LLaMA-Factory and how it uses vLLM Ascend for inference on Ascend NPUs in [LLaMA-Factory Ascend NPU Inference](https://llamafactory.readthedocs.io/en/latest/advanced/npu_inference.html).