xc-llm-ascend

Author	SHA1	Message	Date
wangxiyuan	b5b7e0ecc7	[Doc] Add qwen3 embedding 8b guide (#1734) 1. Add the tutorials for qwen3-embedding-8b 2. Remove VLLM_USE_V1=1 from the docs; it is no longer needed as of 0.9.2 - vLLM version: v0.9.2 - vLLM main: `5923ab9524` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-11 17:40:17 +08:00
wangxiyuan	3d1e6a5929	[Doc] Update user doc index (#1581) Add a user doc index to make the user guide clearer - vLLM version: v0.9.1 - vLLM main: `49e8c7ea25` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-10 14:26:59 +08:00
Li Wang	0c4aa2b4f1	[Doc] Add multi node data parallel doc (#1685) ### What this PR does / why we need it? Add multi-node data parallel doc ### Does this PR introduce _any_ user-facing change? Add multi-node data parallel doc ### How was this patch tested? - vLLM version: v0.9.1 - vLLM main: `805d62ca88` Signed-off-by: wangli <wangli858794774@gmail.com>	2025-07-10 09:36:37 +08:00
leo-pony	b4b19ea588	[Doc] Add multi-npu qwen3-MoE-32B Tutorials (#1419) Signed-off-by: leo-pony <nengjunma@outlook.com> ### What this PR does / why we need it? Add multi-npu qwen3-MoE-32B Tutorials Related RFC: https://github.com/vllm-project/vllm-ascend/issues/1248 - vLLM version: v0.9.1 - vLLM main: `5358cce5ff` --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-07-10 09:06:51 +08:00
leo-pony	53ec583bbb	[Docs] Update Atlas 300I series doc and fix CI lint (#1537) ### What this PR does / why we need it? - Update Atlas 300I series doc: clean up unused parameters and enable optimized ops - Fix code spell CI ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed --------- Signed-off-by: leo-pony <nengjunma@outlook.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-30 23:34:00 +08:00
Yikun Jiang	e4df0a4395	Add Pangu MoE Pro for 300I series docs (#1516) ### What this PR does / why we need it? Add Pangu MoE Pro for 300I series docs ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-30 13:37:22 +08:00
Yikun Jiang	cad4c693c6	Add Pangu MoE Pro docs (#1512) ### What this PR does / why we need it? This PR adds Pangu MoE Pro 72B docs [1] https://gitcode.com/ascend-tribe/pangu-pro-moe-model ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-30 12:15:33 +08:00
Shanshan Shen	99e685532d	[Doc] Add Qwen2.5-VL eager mode doc (#1394) ### What this PR does / why we need it? Add Qwen2.5-VL eager mode doc. --------- Signed-off-by: shen-shanshan <467638484@qq.com>	2025-06-28 09:08:51 +08:00
Shanshan Shen	4e2daf5ab7	[Doc] Add qwen2-audio eager mode tutorial (#1371) ### What this PR does / why we need it? Add qwen2-audio eager mode tutorial. Signed-off-by: shen-shanshan <467638484@qq.com>	2025-06-26 16:56:05 +08:00
leo-pony	1025344912	Doc Enhancement: Single NPU (Qwen3-8B) aclgraph mode + eager mode (#1374) ### What this PR does / why we need it? Doc Enhancement: Single NPU (Qwen3-8B) aclgraph mode + eager mode. Related RFC: https://github.com/vllm-project/vllm-ascend/issues/1248 ### Does this PR introduce _any_ user-facing change? No changes. ### How was this patch tested? Preview Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-06-26 16:52:54 +08:00
Yikun Jiang	2e5f312530	Cleanup unused doc (#1352) ### What this PR does / why we need it? Clean up unused doc for the MoGE model; we will add it back when the MoGE model is ready. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-22 15:05:30 +08:00
Yikun Jiang	c30ddb8331	Bump v0.9.1rc1 release (#1349) ### What this PR does / why we need it? Bump v0.9.1rc1 release Closes: https://github.com/vllm-project/vllm-ascend/pull/1341 Closes: https://github.com/vllm-project/vllm-ascend/pull/1334 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed --------- Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: shen-shanshan <467638484@qq.com>	2025-06-22 13:15:36 +08:00
22dimensions	c464c32b81	add doc for offline quantization inference (#1009) add example for offline inference with quantized model Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-05-29 17:32:42 +08:00
22dimensions	d5401a08be	[DOC] update modelslim version (#908) 1. update modelslim version to fix deepseek related issues 2. add note for "--quantization ascend" Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-05-21 09:12:02 +08:00
22dimensions	a8730e7a3c	[Doc] update quantization docs with QwQ-32B-W8A8 example (#835) 1. replace the deepseek-v2-lite model with the more practical QwQ 32B model 2. fix some incorrect commands 3. replace the modelslim version with a more formal tag Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-05-17 15:25:17 +08:00
wangxiyuan	6193ba679b	[CI] add codespell CI and fix format.sh (#827) 1. Fix format check error to make format.sh work 2. Add codespell check CI 3. Add the missing required package for vllm-ascend. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-05-12 22:04:48 +08:00
Yikun Jiang	d39855b075	Update installation and tutorial doc (#711) ### What this PR does / why we need it? Update installation and tutorial doc ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? preview Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-04-28 21:52:17 +08:00
Li Wang	d0a0c81ced	[Doc] Add deepseek-v2-lite w8a8 quantization tutorial (#630) ### What this PR does / why we need it? Add deepseek-v2-lite w8a8 quantization tutorial --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-04-28 17:14:26 +08:00
wangxiyuan	9c7428b3d5	[CI] enable custom ops build (#466) ### What this PR does / why we need it? This PR enables the custom ops build by default. ### Does this PR introduce _any_ user-facing change? Yes, installing vllm-ascend from source now triggers the custom ops build step. ### How was this patch tested? By image build and e2e CI --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-12 10:24:53 +08:00
jinyuxin	5d6239306b	[DOC] Update multi_node.md (#468) ### What this PR does / why we need it? - Added instructions for verifying multi-node communication environment. - Included explanations of Ray-related environment variables for configuration. - Provided detailed steps for launching services in a multi-node environment. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? manually tested. Signed-off-by: jinyuxin <jinyuxin2@huawei.com>	2025-04-08 14:19:57 +08:00
Shanshan Shen	c06af8b2e0	[V1][Core] Add support for V1 Engine (#295) ### What this PR does / why we need it? Add support for V1 Engine. Please note that this is just the initial version and some places may need to be fixed or optimized in the future; feel free to leave comments. ### Does this PR introduce _any_ user-facing change? To use V1 Engine on NPU device, you need to set the env variables shown below: ```bash export VLLM_USE_V1=1 export VLLM_WORKER_MULTIPROC_METHOD=spawn ``` If you are using vllm for offline inferencing, you must add a `__main__` guard like: ```python if __name__ == '__main__': llm = vllm.LLM(...) ``` Find more details [here](https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html#python-multiprocessing). ### How was this patch tested? I have tested online serving with `Qwen2.5-7B-Instruct` using this command: ```bash vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240 ``` Query the model with input prompts: ```bash curl http://localhost:8000/v1/completions \ -H "Content-Type: application/json" \ -d '{ "model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "The future of AI is", "max_tokens": 7, "temperature": 0 }' ``` --------- Signed-off-by: shen-shanshan <467638484@qq.com> Co-authored-by: didongli182 <didongli@huawei.com>	2025-03-20 19:34:44 +08:00
wangxiyuan	c25631ec7b	[Doc] Add the release note for 0.7.3rc1 (#285) Add the release note for 0.7.3rc1 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-03-13 17:57:06 +08:00
Li Wang	41aba1cfc1	[Doc]Fix tutorial doc expression (#319) Fix tutorial doc expression Signed-off-by: wangli <wangli858794774@gmail.com>	2025-03-13 15:24:05 +08:00
xiemingda	59ea23d0d3	[Doc] Add Single NPU (Qwen2.5-VL-7B) tutorial (#311) Run vllm-ascend on Single NPU What this PR does / why we need it? Add vllm-ascend tutorial doc for Qwen/Qwen2.5-VL-7B-Instruct model Inference/Serving doc Does this PR introduce any user-facing change? no How was this patch tested? no Signed-off-by: xiemingda <xiemingda1002@gmail.com>	2025-03-12 20:37:12 +08:00
Yikun Jiang	007aeaa48b	[Doc] Change distributed_executor_backend to mp (#287) ### What this PR does / why we need it? Fix `ValueError: Unrecognized distributed executor backend tp. Supported values are 'ray', 'mp' 'uni', 'external_launcher' or custom ExecutorBase subclass.` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Test on my local node Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-10 11:27:26 +08:00
Yikun Jiang	38334f5daa	[Docs] Re-arch on doc and make QwQ doc work (#271) ### What this PR does / why we need it? Re-arch the tutorials: move single npu / multi npu / multi node to the index. - Unify docker run cmd - Use a dropdown to hide the build-from-source installation doc - Re-arch tutorials to include Qwen/QwQ/DeepSeek - Make QwQ doc work ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI test Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-10 09:27:48 +08:00

26 Commits