diff --git a/docs/source/community/user_stories/index.md b/docs/source/community/user_stories/index.md
new file mode 100644
index 0000000..1dc1e56
--- /dev/null
+++ b/docs/source/community/user_stories/index.md
@@ -0,0 +1,19 @@
+# User Stories
+
+Read case studies on how users and developers solve real, everyday problems with vLLM Ascend.
+
+- [LLaMA-Factory](./llamafactory.md) is an easy-to-use and efficient platform for training and fine-tuning large language models. It has supported vLLM Ascend to speed up inference since [LLaMA-Factory#7739](https://github.com/hiyouga/LLaMA-Factory/pull/7739), achieving a 2x inference performance improvement.
+
+- [Huggingface/trl](https://github.com/huggingface/trl) is a cutting-edge library designed for post-training foundation models using advanced techniques like SFT, PPO and DPO. It has used vLLM Ascend since [v0.17.0](https://github.com/huggingface/trl/releases/tag/v0.17.0) to support RLHF on Ascend NPU.
+
+- [MindIE Turbo](https://pypi.org/project/mindie-turbo) is an LLM inference engine acceleration plug-in library developed by Huawei for Ascend hardware. It includes self-developed large language model optimization algorithms and optimizations related to the inference engine framework, and has supported vLLM Ascend since [2.0rc1](https://www.hiascend.com/document/detail/zh/mindie/20RC1/AcceleratePlugin/turbodev/mindie-turbo-0001.html).
+
+- [GPUStack](https://github.com/gpustack/gpustack) is an open-source GPU cluster manager for running AI models. It has supported vLLM Ascend since [v0.6.2](https://github.com/gpustack/gpustack/releases/tag/v0.6.2); see the [GPUStack performance evaluation](https://mp.weixin.qq.com/s/pkytJVjcH9_OnffnsFGaew) for more details.
+
+- [verl](https://github.com/volcengine/verl) is a flexible, efficient and production-ready RL training library for large language models (LLMs). It has supported vLLM Ascend since [v0.4.0](https://github.com/volcengine/verl/releases/tag/v0.4.0); see [verl x Ascend Quickstart](https://verl.readthedocs.io/en/latest/ascend_tutorial/ascend_quick_start.html) for more info.
+
+:::{toctree}
+:caption: More details
+:maxdepth: 1
+llamafactory
+:::
diff --git a/docs/source/community/user_stories/llamafactory.md b/docs/source/community/user_stories/llamafactory.md
new file mode 100644
index 0000000..bb95990
--- /dev/null
+++ b/docs/source/community/user_stories/llamafactory.md
@@ -0,0 +1,19 @@
+# LLaMA-Factory
+
+**About / Introduction**
+
+[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) is an easy-to-use and efficient platform for training and fine-tuning large language models. With LLaMA-Factory, you can fine-tune hundreds of pre-trained models locally without writing any code.
+
+LLaMA-Factory users need to evaluate and run inference on the model after fine-tuning it.
+
+**The Business Challenge**
+
+LLaMA-Factory used Transformers to perform inference on Ascend NPU, but the speed was slow.
+
+**Solving Challenges and Benefits with vLLM Ascend**
+
+With the joint efforts of LLaMA-Factory and vLLM Ascend ([LLaMA-Factory#7739](https://github.com/hiyouga/LLaMA-Factory/pull/7739)), the performance of LLaMA-Factory in the model inference stage has been significantly improved. According to the test results, the inference speed of LLaMA-Factory is now 2x that of the Transformers backend.
+
+**Learn more**
+
+See more about LLaMA-Factory and how it uses vLLM Ascend for inference on the Ascend NPU in the following documentation: [LLaMA-Factory Ascend NPU Inference](https://llamafactory.readthedocs.io/en/latest/advanced/npu_inference.html).
diff --git a/docs/source/index.md b/docs/source/index.md
index 5f421b4..e9b103b 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -66,11 +66,5 @@ developer_guide/evaluation/index
 :maxdepth: 1
 community/governance
 community/contributors
-:::
-
-% User stories about vLLM Ascend project
-:::{toctree}
-:caption: User Story
-:maxdepth: 1
-user_stories/index
+community/user_stories/index
 :::
diff --git a/docs/source/user_stories/example.md b/docs/source/user_stories/example.md
deleted file mode 100644
index ebbcf56..0000000
--- a/docs/source/user_stories/example.md
+++ /dev/null
@@ -1,15 +0,0 @@
-# xxx project uses Ascend vLLM, gain 200% performance enhancement of inference.
-
-## About / Introduction
-Draft content
-
-## The Business Challenge
-Our goal is to ...
-
-## Solving challenges with vLLM Ascend
-vLLM Ascend helped us ...
-
-## Benefits using vLLM Ascend
-
-## Learn more
-more info about this case
diff --git a/docs/source/user_stories/index.md b/docs/source/user_stories/index.md
deleted file mode 100644
index 6de8b21..0000000
--- a/docs/source/user_stories/index.md
+++ /dev/null
@@ -1,22 +0,0 @@
-# vLLM Ascend User Stories
-
-Read case studies on how users and developers solves real, everyday problems with vLLM Ascend
-
-:::{card} Example user story
-:link: ./example
-:link-type: doc
-
-xxx project uses Ascend vLLM, gain 200% performance enhancement of inference.
-
-+++
-
-**Tags**: vLLM, Ascend, Inference
-
-:::
-
-:::{toctree}
-:caption: Deployment
-:maxdepth: 1
-:hidden:
-example
-:::