[Doc] Refactor and init user story page (#1224)
### What this PR does / why we need it?

This PR refactors the user stories page:
- Move it to the community section
- Add initial info for LLaMA-Factory, Huggingface/trl, MindIE Turbo, GPUStack and verl
- Add a new page for LLaMA-Factory

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Previewed locally.

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
docs/source/community/user_stories/index.md (new file, 19 lines)
@@ -0,0 +1,19 @@
# User Stories

Read case studies on how users and developers solve real, everyday problems with vLLM Ascend.

- [LLaMA-Factory](./llamafactory.md) is an easy-to-use and efficient platform for training and fine-tuning large language models. It has supported vLLM Ascend to speed up inference since [LLaMA-Factory#7739](https://github.com/hiyouga/LLaMA-Factory/pull/7739), delivering a 2x inference performance improvement.
- [Huggingface/trl](https://github.com/huggingface/trl) is a cutting-edge library for post-training foundation models using advanced techniques like SFT, PPO and DPO. It has used vLLM Ascend since [v0.17.0](https://github.com/huggingface/trl/releases/tag/v0.17.0) to support RLHF on Ascend NPU.
- [MindIE Turbo](https://pypi.org/project/mindie-turbo) is an LLM inference engine acceleration plug-in library developed by Huawei for Ascend hardware. It includes self-developed large language model optimization algorithms and optimizations related to the inference engine framework, and has supported vLLM Ascend since [2.0rc1](https://www.hiascend.com/document/detail/zh/mindie/20RC1/AcceleratePlugin/turbodev/mindie-turbo-0001.html).
- [GPUStack](https://github.com/gpustack/gpustack) is an open-source GPU cluster manager for running AI models. It has supported vLLM Ascend since [v0.6.2](https://github.com/gpustack/gpustack/releases/tag/v0.6.2); see the [GPUStack performance evaluation](https://mp.weixin.qq.com/s/pkytJVjcH9_OnffnsFGaew) for more details.
- [verl](https://github.com/volcengine/verl) is a flexible, efficient and production-ready RL training library for large language models (LLMs). It has used vLLM Ascend since [v0.4.0](https://github.com/volcengine/verl/releases/tag/v0.4.0); see the [verl x Ascend Quickstart](https://verl.readthedocs.io/en/latest/ascend_tutorial/ascend_quick_start.html) for more details.

:::{toctree}
:caption: More details
:maxdepth: 1
llamafactory
:::
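As background for readers of this PR: the integrations listed above all drive vLLM's standard offline-inference API, which the vllm-ascend plugin makes work on NPU. A minimal sketch (not part of this PR; it assumes vllm and vllm-ascend are installed, and the model name is only a placeholder):

```python
# Minimal offline-inference sketch with vLLM on Ascend NPU.
# Assumes the vllm-ascend plugin is installed; the model name is a placeholder.
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# vllm-ascend registers itself as a vLLM platform plugin, so the
# standard vLLM API below runs unchanged on Ascend NPU.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"{output.prompt!r} -> {output.outputs[0].text!r}")
```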
docs/source/community/user_stories/llamafactory.md (new file, 19 lines)
@@ -0,0 +1,19 @@
# LLaMA-Factory

**About / Introduction**

[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) is an easy-to-use and efficient platform for training and fine-tuning large language models. With LLaMA-Factory, you can fine-tune hundreds of pre-trained models locally without writing any code.

LLaMA-Factory users need to evaluate and run inference on a model after fine-tuning it.

**The Business Challenge**

LLaMA-Factory used Transformers to perform inference on Ascend NPU, but the speed was slow.

**Solving Challenges and Benefits with vLLM Ascend**

With the joint efforts of LLaMA-Factory and vLLM Ascend ([LLaMA-Factory#7739](https://github.com/hiyouga/LLaMA-Factory/pull/7739)), the performance of LLaMA-Factory in the model inference stage has been significantly improved. According to the test results, the inference speed of LLaMA-Factory is now 2x that of the Transformers-based implementation.

**Learn more**

See more about LLaMA-Factory and how it uses vLLM Ascend for inference on Ascend NPU in the following documentation: [LLaMA-Factory Ascend NPU Inference](https://llamafactory.readthedocs.io/en/latest/advanced/npu_inference.html).
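The linked documentation covers how LLaMA-Factory switches to this backend (its `infer_backend: vllm` setting). As a standalone illustration of the kind of comparison behind the 2x figure, here is a rough timing sketch; it is not the benchmark used for the PR, the model name is a placeholder, and running both backends back-to-back in one process needs enough device memory for two copies of the model:

```python
# Rough timing sketch contrasting Transformers generate() with vLLM.
# NOT the benchmark behind the 2x figure above; model name is a placeholder.
import time

from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder
PROMPT = "Explain why fast inference matters for model evaluation."

# Baseline: plain Transformers generation.
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")
inputs = tokenizer(PROMPT, return_tensors="pt").to(model.device)
start = time.perf_counter()
model.generate(**inputs, max_new_tokens=128)
hf_time = time.perf_counter() - start

# vLLM backend (runs on NPU when the vllm-ascend plugin is installed).
llm = LLM(model=MODEL)
start = time.perf_counter()
llm.generate([PROMPT], SamplingParams(max_tokens=128))
vllm_time = time.perf_counter() - start

print(f"transformers: {hf_time:.2f}s, vllm: {vllm_time:.2f}s")
```

vLLM's gains come mainly from continuous batching and paged attention, so the gap over a plain `generate()` loop typically widens as batch sizes grow.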
docs/source/index.md (modified)
@@ -66,11 +66,5 @@ developer_guide/evaluation/index
 :maxdepth: 1
 community/governance
 community/contributors
+community/user_stories/index
 :::
-
-% User stories about vLLM Ascend project
-:::{toctree}
-:caption: User Story
-:maxdepth: 1
-user_stories/index
-:::
docs/source/user_stories/example.md (deleted, 15 lines)
@@ -1,15 +0,0 @@
-# xxx project uses Ascend vLLM, gain 200% performance enhancement of inference.
-
-## About / Introduction
-Draft content
-
-## The Business Challenge
-Our goal is to ...
-
-## Solving challenges with vLLM Ascend
-vLLM Ascend helped us ...
-
-## Benefits using vLLM Ascend
-
-## Learn more
-more info about this case
docs/source/user_stories/index.md (deleted, 22 lines)
@@ -1,22 +0,0 @@
-# vLLM Ascend User Stories
-
-Read case studies on how users and developers solves real, everyday problems with vLLM Ascend
-
-:::{card} Example user story
-:link: ./example
-:link-type: doc
-
-xxx project uses Ascend vLLM, gain 200% performance enhancement of inference.
-
-+++
-
-**Tags**: vLLM, Ascend, Inference
-
-:::
-
-:::{toctree}
-:caption: Deployment
-:maxdepth: 1
-:hidden:
-example
-:::