xc-llm-ascend

Author	SHA1	Message	Date
zyz111222	98c788a65a	[Doc] add PaddleOCR-VL tutorials guide (#5556 ) ### What this PR does / why we need it? 1. add PaddleOCR-VL.md in the `docs/source/tutorials/` 2. add PaddleOCR-VL index in `docs/source/tutorials/index.md` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by CI - vLLM version: v0.13.0 - vLLM main: `7157596103` Signed-off-by: zouyizhou <zouyizhou@huawei.com>	2026-01-09 11:01:25 +08:00
meihanc	503822c56c	[Doc] Add Qwen3-Omni-30B-A3B-Thinking Tutorials (#3991 ) ### What this PR does / why we need it? Add Qwen3-Omni-30B-A3B-Thinking Tutorials ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `5326c89803` --------- Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>	2026-01-08 16:57:20 +08:00
zhangmuzhi_yuwan	6c1a685b30	[Doc] add new doc for mooncake: PD-Colocated cross-node multi-instance validation of Mooncake's KV Cache reuse and performance. (#5415 ) ### What this PR does / why we need it? This documentation provides a comprehensive technical guide for deploying vLLM-Ascend using a Prefill-Decode (PD) colocated architecture integrated with Mooncake, a high-performance distributed KV Cache transfer engine. As Large Language Model (LLM) serving scales, managing KV Cache efficiently across distributed nodes is essential for reducing latency and optimizing hardware utilization. The tutorial focuses on a multi-instance setup using Huawei Atlas 800T A2 nodes. By leveraging Mooncake’s distributed memory pooling, vLLM instances can achieve seamless cross-node KV Cache reuse. This capability allows an instance to retrieve precomputed cache from a remote node's DRAM via high-speed RoCE networks, effectively bypassing redundant prefill computations. ### Does this PR introduce _any_ user-facing change? No - vLLM version: release/v0.13.0 - vLLM main: `0bfd7484fd` --------- Signed-off-by: zhangmuzhibangde <1037640609@qq.com> Signed-off-by: zhangmuzhi_yuwan <1037640609@qq.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2026-01-05 14:19:57 +08:00
LookAround0301	ca31d6823e	[Doc] add long_sequence feature user guide (#5343 ) ### What this PR does / why we need it? add long_sequence feature user guide - vLLM version: release/v0.13.0 - vLLM main: `bc0a5a0c08` --------- Signed-off-by: LookAround <lixushi@huawei.com>	2025-12-27 10:44:43 +08:00
weiguihua2	ce52e17bf3	[Doc]add long sequence tutorials (#5364 ) ### What this PR does / why we need it? Provide sample guidance for running long-sequence DeepSeek across multiple nodes To guide users on using the context parallel feature, a practical example is provided. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: release/v0.13.0 - vLLM main: `bc0a5a0c08` Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2025-12-27 09:52:11 +08:00
luluxiu520	bc05a81bf2	Add Qwen3-VL-235B-A22B-Instruct tutorials (#5167 ) ### What this PR does / why we need it? This PR provides an introduction to the Qwen3-VL-235B-A22B-Instruct model, details on the features supported by the model in the current version, the model deployment process, as well as methods for performance testing and accuracy testing. With this document, the deployment and testing of the Qwen3-VL-235B-A22B-Instruct model can be implemented more easily. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: luluxiu520 <l2625793@outlook.com>	2025-12-19 14:56:17 +08:00
TingW09	879ec2d1c4	[Doc] add qwen3 reranker (#5086 ) ### What this PR does / why we need it? add qwen3 reranker tutorials ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.12.0 --------- Signed-off-by: TingW09 <944713709@qq.com>	2025-12-18 10:54:07 +08:00
lilinsiman	31c94b7e7b	[doc][main] Correct more doc mistakes (#4958 ) ### What this PR does / why we need it? Correct more doc mistakes - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-12-13 18:36:58 +08:00
wangxiyuan	e538fa6f9c	[Doc] Update tutorial index (#4920 ) Update tutorial index and remove useless doc - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-11 20:53:13 +08:00
yangxiaoman8	e1bb6f47ec	[doc] Add Qwen2.5 tutorials (#4636 ) ### What this PR does / why we need it? Add qwen2.5 turorial - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: yangshihao6 <yangshihao6@huawei.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-11 17:30:05 +08:00
wangxiyuan	bb76f7962c	cleanup useless torchair logic (#4856 ) This PR clean up useless torchair logic in model runner. The moge doc is only for torchair, it can be removed as well. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-12-11 11:21:13 +08:00
SILONG ZENG	ff7d703192	[Doc]Add tutorial document for qwen-VL-Dense (#3516 ) ### What this PR does / why we need it? This document employs the qwen3-vl-8b model and qwen2.5-vl-32b to demonstrate the primary verification steps for the Qwen-VL series dense models, including supported features, feature configuration, environment preparation, NPU deployment, and accuracy and performance evaluation. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: MrZ20 <2609716663@qq.com>	2025-12-11 08:55:23 +08:00
Leaf	89a8607b30	add DeepSeek-R1 tutorial. (#4666 ) ### What this PR does / why we need it? This PR adds tutorials for the DeepSeeK-R1 series models, including the A2 and A3 series, and provides accuracy validation results. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: Gongdayao <gongdayao@foxmail.com>	2025-12-11 08:52:27 +08:00
wind-all	1a443f2772	add multi_npu_qwen3_dense tutorials (#4543 ) ### What this PR does / why we need it? This PR adds tutorials for the Qwen3-Dense series models, including the A2 and A3 series, and provides accuracy validation results. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wind-all <anyuting@h-partners.com>	2025-12-10 16:09:56 +08:00
Ruri	ce5872705e	[Feat] Support native Kimi-K2-Thinking native W4A16 quantized experts weights (#4516 ) ### What this PR does / why we need it? Adds W4A16 quantization method for the Kimi-K2-Thinking model and updates relevant modules to support the new quantization method. - Implements complete W4A16 quantization method including weight packing/unpacking, per-group quantization parameter generation, post-processing logic and MoE method application. - Adds parameters `use_int4_w4a16`, `w1_offset` and `w2_offset`, adjusts `with_quant` conditional logic to support W4A16 matrix multiplication. - Adds `packed_modules_model_mapping` for Kimi-K2-Thinking model and processing logic for `weight_packed` field. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: zhoux77899 <zhouxiang100@huawei.com> Signed-off-by: Ruri <33858552+zhoux77899@users.noreply.github.com> Signed-off-by: Ruri <zhouxiang100@huawei.com>	2025-12-10 15:58:52 +08:00
wangxiaoteng888	a77045f355	[P/D][main]Offline the llmdatadist connector related parts of the code and files. (#4780 ) ### What this PR does / why we need it? As support for the mooncake connector is now available, the llmdatadist connector is no longer being maintained, so the llmdatadist-related files need to be retired. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By ci - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Signed-off-by: liziyu <liziyu16@huawei.com> Co-authored-by: liziyu <liziyu16@huawei.com>	2025-12-09 22:36:43 +08:00
xuyexiong	193dc1703f	[Doc] Add Qwen3-235B tutorial (#4358 ) ### What this PR does / why we need it? Add Qwen3-235B tutorial including the following examples - Single-node Online Deployment for 128k context inference - Multi-node Deployment with MP - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: xuyexiong <xuyexiong@huawei.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-08 20:06:46 +08:00
1092626063	eabedf43aa	[Doc] Refactor the DeepSeek-V3.1 tutorial. (#4399 ) ### What this PR does / why we need it? Refactor the DeepSeek-V3.1 tutorial. - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: 1092626063 <1092626063@qq.com>	2025-12-02 18:46:30 +08:00
yeyifan	8907010815	[Doc] Add tutorial for Qwen3-Coder-30B-A3B (#4391 ) ### What this PR does / why we need it? Add tutorial for Qwen3-Coder-30B-A3B - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: nsdie <yeyifan@huawei.com> Signed-off-by: herizhen <you@example.com> Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com> Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Signed-off-by: weijinqian_v1 <weijinqian@huawei.com> Signed-off-by: weijinqian0 <1184188277@qq.com> Co-authored-by: Li Wang <wangli858794774@gmail.com> Co-authored-by: herizhen <59841270+herizhen@users.noreply.github.com> Co-authored-by: herizhen <you@example.com> Co-authored-by: Yizhou <136800916+yiz-liu@users.noreply.github.com> Co-authored-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: XiaoxinWang <963372609@qq.com> Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Co-authored-by: weijinqian0 <1184188277@qq.com> Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>	2025-12-02 16:03:37 +08:00
Ting FU	b747c95cfa	[Doc] Add single NPU tutorial for Qwen2.5-Omni-7B (#4446 ) ### What this PR does / why we need it? Add single NPU tutorial for Qwen2.5-Omni-7B - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: Ting FU <futing10@huawei.com>	2025-11-29 11:57:29 +08:00
wangxiaoteng888	366d2d95e8	[P/D] Add readme for PD separation (#4182 ) ### What this PR does / why we need it? Add readme for PD separation ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By ci - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Signed-off-by: liziyu <liziyu16@huawei.com> Co-authored-by: liziyu <liziyu16@huawei.com>	2025-11-28 15:17:59 +08:00
mazhixin000	ab51fcea4c	[Doc]Add single node PD disaggregation instructions (#4337 ) ### What this PR does / why we need it? add single node PD disaggregation instructions for Qwen 2.5VL model. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: mazhixin <mazhixin7@huawei.com> Signed-off-by: mazhixin000 <mazhixinkorea@163.com> Co-authored-by: mazhixin <mazhixin7@huawei.com>	2025-11-22 23:33:07 +08:00
22dimensions	e6625bb582	[Doc] add qwen3 w4a4 tutorial (#4076 ) ### What this PR does / why we need it? v0.11.0rc1 will introduce w4a4 quantization feature, so add this tutorial. ### Does this PR introduce _any_ user-facing change? No - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-11-10 20:30:07 +08:00
zhangyiming	5f08e07208	[Doc] Refactor the DeepSeek-V3.2-Exp tutorial. (#3871 ) ### What this PR does / why we need it? Refactor the DeepSeek-V3.2-Exp tutorial. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: menogrey <1299267905@qq.com>	2025-11-04 18:58:33 +08:00
Shanshan Shen	3e5ae49160	[MM][Doc] Update online serving tutorials for `Qwen2-Audio` (#3606 ) ### What this PR does / why we need it? Update online serving tutorials for `Qwen2-Audio`. Part of https://github.com/vllm-project/vllm-ascend/issues/3508. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: shen-shanshan <467638484@qq.com>	2025-10-27 16:58:03 +08:00
wangxiaoteng888	ca05f7d632	[Bugfix] TP size larger than KV cache head causes accuracy issues (#3366 ) ### What this PR does / why we need it? Resolve the issue where, in the case of unequal TP (Tensor Parallelism), the TP size is larger than the number of model attention kvcache heads, causing the KV cache to generate duplicates, which leads to transmission errors in the original code. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By ci - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com> Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Co-authored-by: nwpu-zxr <zhouxuerong2@huawei.com>	2025-10-11 11:22:23 +08:00
wangxiyuan	b8c58d68e1	[Doc] Add deepseek v3.2 tutorial (#3275 ) Add deepseek v3.2 tutorial - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: wangli <wangli858794774@gmail.com> Co-authored-by: MengqingCao <cmq0113@163.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-09-30 17:54:31 +08:00
Peipei	cf445c41f9	[Doc]Add qwen3_vl series guide (#3227 ) ### What this PR does / why we need it? This PR provides user guide documents for Qwen3-VL 4B and Qwen3-VL-235B-A22B. ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? - vLLM version: v0.10.2 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0 --------- Signed-off-by: booker123456 <945658361@qq.com>	2025-09-28 21:35:52 +08:00
Li Wang	4267f5d55f	[Doc] Add multi-node ray backend tutorial (#2376 ) ### What this PR does / why we need it? Add multi-node ray backend tutorial for Qwen235B-A3B ### How was this patch tested? - vLLM version: v0.10.2 - vLLM main: `f4cd80f944` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-09-18 15:30:18 +08:00
Yikun Jiang	b5ccef6115	[Doc] Add doc for Qwen3 Next (#2916 ) ### What this PR does / why we need it? Add doc for Qwen3 Next ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Doc CI passed Related: https://github.com/vllm-project/vllm-ascend/issues/2884 - vLLM version: v0.10.2 - vLLM main: `01413e0cf5` Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-09-16 01:16:06 +08:00
yupeng	a746f8274f	[DOC] Qwen3 PD disaggregation user guide (#2751 ) ### What this PR does / why we need it? The PR is for the document of the prefiller&decoder disaggregation deloyment guide. The scenario of the guide is: - Use 3 nodes totally and 2 NPUs on each node - Qwen3-30B-A3B - 1P2D - Expert Parallel The deployment can be used to verify PD Disggregation / Expert Parallel features with a slightly less resources. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? No. - vLLM version: v0.10.1.1 - vLLM main: `e599e2c65e` --------- Signed-off-by: paulyu12 <507435917@qq.com>	2025-09-07 10:35:37 +08:00
22dimensions	440d28a138	[Tutorial] Add qwen3 8b w4a8 tutorial (#2249 ) ### What this PR does / why we need it? Add a new single npu quantization tutorial, and using the latest qwen3 model. - vLLM version: v0.10.0 - vLLM main: `8e8e0b6af1` Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-08-07 14:39:38 +08:00
Li Wang	bf84f2dbfa	[Doc] Support kimi-k2-w8a8 (#2162 ) ### What this PR does / why we need it? In fact, the kimi-k2 model is similar to the deepseek model, and we only need to make a few changes to support it. what does this pr do: 1. Add kimi-k2-w8a8 deployment doc 2. Update quantization doc 3. Upgrade torchair support list ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.10.0 - vLLM main: `9edd1db02b` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-08-06 19:28:47 +08:00
wangxiyuan	b5b7e0ecc7	[Doc] Add qwen3 embedding 8b guide (#1734 ) 1. Add the tutorials for qwen3-embedding-8b 2. Remove VLLM_USE_V1=1 in docs, it's useless any more from 0.9.2 - vLLM version: v0.9.2 - vLLM main: `5923ab9524` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-11 17:40:17 +08:00
leo-pony	b4b19ea588	[Doc] Add multi-npu qwen3-MoE-32B Tutorials (#1419 ) Signed-off-by: leo-pony <nengjunma@outlook.com> ### What this PR does / why we need it? Add multi-npu qwen3-MoE-32B Tutorials Relate RFC: https://github.com/vllm-project/vllm-ascend/issues/1248 - vLLM version: v0.9.1 - vLLM main: `5358cce5ff` --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-07-10 09:06:51 +08:00
Yikun Jiang	cad4c693c6	Add Pangu MoE Pro docs (#1512 ) ### What this PR does / why we need it? This PR add Pangu MoE Pro 72B docs [1] https://gitcode.com/ascend-tribe/pangu-pro-moe-model ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-30 12:15:33 +08:00
Shanshan Shen	4e2daf5ab7	[Doc] Add qwen2-audio eager mode tutorial (#1371 ) ### What this PR does / why we need it? Add qwen2-audio eager mode tutorial. Signed-off-by: shen-shanshan <467638484@qq.com>	2025-06-26 16:56:05 +08:00
Yikun Jiang	2e5f312530	Cleanup ununsed doc (#1352 ) ### What this PR does / why we need it? Cleanup ununsed doc for MoGE model, we will add back this when MoGE model ready. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-22 15:05:30 +08:00
Yikun Jiang	c30ddb8331	Bump v0.9.1rc1 release (#1349 ) ### What this PR does / why we need it? Bump v0.9.1rc1 release Closes: https://github.com/vllm-project/vllm-ascend/pull/1341 Closes: https://github.com/vllm-project/vllm-ascend/pull/1334 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed --------- Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: shen-shanshan <467638484@qq.com>	2025-06-22 13:15:36 +08:00
Li Wang	d0a0c81ced	[Doc] Add deepsee-v2-lite w8a8 quantization turorial (#630 ) ### What this PR does / why we need it? Add deepsee-v2-lite w8a8 quantization turorial --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-04-28 17:14:26 +08:00
xiemingda	59ea23d0d3	[Doc] Add Single NPU (Qwen2.5-VL-7B) tutorial (#311 ) Run vllm-ascend on Single NPU What this PR does / why we need it? Add vllm-ascend tutorial doc for Qwen/Qwen2.5-VL-7B-Instruct model Inference/Serving doc Does this PR introduce any user-facing change? no How was this patch tested? no Signed-off-by: xiemingda <xiemingda1002@gmail.com>	2025-03-12 20:37:12 +08:00
Yikun Jiang	38334f5daa	[Docs] Re-arch on doc and make QwQ doc work (#271 ) ### What this PR does / why we need it? Re-arch on tutorials, move singe npu / multi npu / multi node to index. - Unifiy docker run cmd - Use dropdown to hide build from source installation doc - Re-arch tutorials to include Qwen/QwQ/DeepSeek - Make QwQ doc works ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI test Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-10 09:27:48 +08:00

42 Commits