LLaMA-Factory
Introduction
LLaMA-Factory is an easy-to-use and efficient platform for training and fine-tuning large language models. With LLaMA-Factory, you can fine-tune hundreds of pre-trained models locally without writing any code.
After fine-tuning, LLaMA-Factory users need to evaluate the model and run inference with it.
Business challenge
By default, LLaMA-Factory relies on Transformers to perform inference on Ascend NPUs, which is slow.
Benefits with vLLM Ascend
Through the joint efforts of the LLaMA-Factory and vLLM Ascend communities (LLaMA-Factory#7739), LLaMA-Factory has achieved significant performance gains during model inference. Benchmark results show that its inference speed is now up to 2× faster than the Transformers implementation.
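As a rough sketch of what this looks like in practice: in recent LLaMA-Factory releases, the inference backend is selected via the `infer_backend` option in the inference config, so switching from Transformers to vLLM is a one-line change. The model path and template below are illustrative placeholders, not values from this page; adjust them for your own fine-tuned model.

```yaml
# chat_vllm.yaml — hypothetical LLaMA-Factory inference config.
# model_name_or_path and template are example values; point them at your model.
model_name_or_path: Qwen/Qwen2.5-7B-Instruct
template: qwen
infer_backend: vllm   # use vLLM (vLLM Ascend on NPUs) instead of the default Transformers backend
```

```bash
# Start an interactive chat session with the config above.
llamafactory-cli chat chat_vllm.yaml
```

With vLLM Ascend installed, the same config runs the vLLM engine on the NPU; no code changes to LLaMA-Factory itself are required.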
Learn more
For more details about LLaMA-Factory and how it uses vLLM Ascend for inference on Ascend NPUs, see LLaMA-Factory Ascend NPU Inference.