xc-llm-ascend

Author	SHA1	Message	Date
wind-all	1a443f2772	add multi_npu_qwen3_dense tutorials (#4543 ) ### What this PR does / why we need it? This PR adds tutorials for the Qwen3-Dense series models, including the A2 and A3 series, and provides accuracy validation results. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wind-all <anyuting@h-partners.com>	2025-12-10 16:09:56 +08:00
Ruri	ce5872705e	[Feat] Support native Kimi-K2-Thinking native W4A16 quantized experts weights (#4516 ) ### What this PR does / why we need it? Adds W4A16 quantization method for the Kimi-K2-Thinking model and updates relevant modules to support the new quantization method. - Implements complete W4A16 quantization method including weight packing/unpacking, per-group quantization parameter generation, post-processing logic and MoE method application. - Adds parameters `use_int4_w4a16`, `w1_offset` and `w2_offset`, adjusts `with_quant` conditional logic to support W4A16 matrix multiplication. - Adds `packed_modules_model_mapping` for Kimi-K2-Thinking model and processing logic for `weight_packed` field. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: zhoux77899 <zhouxiang100@huawei.com> Signed-off-by: Ruri <33858552+zhoux77899@users.noreply.github.com> Signed-off-by: Ruri <zhouxiang100@huawei.com>	2025-12-10 15:58:52 +08:00
wangxiyuan	835b4c8f1d	Drop torchair (#4814 ) aclgraph is stable and fast now. Let's drop torchair graph mode now. TODO: some logic to adapt torchair should be cleaned up as well. We'll do it in the following PR. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-12-10 09:20:40 +08:00
wangxiaoteng888	a77045f355	[P/D][main]Offline the llmdatadist connector related parts of the code and files. (#4780 ) ### What this PR does / why we need it? As support for the mooncake connector is now available, the llmdatadist connector is no longer being maintained, so the llmdatadist-related files need to be retired. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By ci - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Signed-off-by: liziyu <liziyu16@huawei.com> Co-authored-by: liziyu <liziyu16@huawei.com>	2025-12-09 22:36:43 +08:00
linfeng-yuan	56f01820e8	[Docs]fix the configuration conflicts in documentation (#4823 ) ### What this PR does / why we need it? Fix configuration errors in our documentation. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? NA. Signed-off-by: linfeng-yuan <1102311262@qq.com>	2025-12-09 15:37:38 +08:00
xuyexiong	193dc1703f	[Doc] Add Qwen3-235B tutorial (#4358 ) ### What this PR does / why we need it? Add Qwen3-235B tutorial including the following examples - Single-node Online Deployment for 128k context inference - Multi-node Deployment with MP - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: xuyexiong <xuyexiong@huawei.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-08 20:06:46 +08:00
liziyu	688b1332da	[P/D] check kv extra config and del hccl backend (#4547 ) ### What this PR does / why we need it? check kv extra config & del hccl backend - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: liziyu <liziyu16@huawei.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-07 15:19:42 +08:00
mazhixin000	3740b3edfc	[main][Doc]add 2P1D instruction for single node (#4716 ) ### What this PR does / why we need it? Add the description for 2P1D, keeping it consistent with the content in the dev branch. ### Does this PR introduce _any_ user-facing change? no - vLLM version: v0.12.0 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.12.0 Signed-off-by: mazhixin000 <mazhixinkorea@163.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-05 18:35:18 +08:00
1092626063	b84c9afbf5	[doc fix]doc fix: deepseekv3.1 (#4645 ) ### What this PR does / why we need it? Fix the deepseekv3.1 doc to recommend that developers use Mooncake instead of LLMDatadist. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Signed-off-by: AiChiMomo <1092626063@qq.com>	2025-12-02 21:49:13 +08:00
1092626063	eabedf43aa	[Doc] Refactor the DeepSeek-V3.1 tutorial. (#4399 ) ### What this PR does / why we need it? Refactor the DeepSeek-V3.1 tutorial. - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: 1092626063 <1092626063@qq.com>	2025-12-02 18:46:30 +08:00
yeyifan	8907010815	[Doc] Add tutorial for Qwen3-Coder-30B-A3B (#4391 ) ### What this PR does / why we need it? Add tutorial for Qwen3-Coder-30B-A3B - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: nsdie <yeyifan@huawei.com> Signed-off-by: herizhen <you@example.com> Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com> Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Signed-off-by: weijinqian_v1 <weijinqian@huawei.com> Signed-off-by: weijinqian0 <1184188277@qq.com> Co-authored-by: Li Wang <wangli858794774@gmail.com> Co-authored-by: herizhen <59841270+herizhen@users.noreply.github.com> Co-authored-by: herizhen <you@example.com> Co-authored-by: Yizhou <136800916+yiz-liu@users.noreply.github.com> Co-authored-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: XiaoxinWang <963372609@qq.com> Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Co-authored-by: weijinqian0 <1184188277@qq.com> Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>	2025-12-02 16:03:37 +08:00
wangxiyuan	cb33b09179	[Doc]clean up ascend scheduler config from doc (#4612 ) clean up ascend scheduler config from doc - vLLM version: v0.11.2 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-02 14:22:56 +08:00
zhangyiming	c097790370	[Doc] Fix DeepSeek-V3.2-Exp doc, add docker command. (#4479 ) ### What this PR does / why we need it? Fix DeepSeek-V3.2-Exp doc, add docker command. - vLLM version: v0.11.2 Signed-off-by: menogrey <1299267905@qq.com>	2025-12-01 22:29:21 +08:00
Mengqing Cao	517fd9272d	Revert "drop ascend scheduler" (#4580 ) Reverts vllm-project/vllm-ascend#4498 - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2	2025-11-29 22:20:48 +08:00
wangxiyuan	f10acddb78	drop ascend scheduler (#4498 ) The Ascend scheduler was originally added for the non-chunked-prefill case, because the NPU ops didn't work well with chunked prefill. Now that the ops work better with chunked prefill, it's time to remove the Ascend scheduler and use the vLLM default scheduler. - vLLM version: v0.11.2 --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-29 16:18:34 +08:00
wangxiyuan	8ebbf13c1a	Update triton package name (#4563 ) Add `aarch64` suffix to make sure the package name is OK - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-29 15:00:40 +08:00
Ting FU	b747c95cfa	[Doc] Add single NPU tutorial for Qwen2.5-Omni-7B (#4446 ) ### What this PR does / why we need it? Add single NPU tutorial for Qwen2.5-Omni-7B - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: Ting FU <futing10@huawei.com>	2025-11-29 11:57:29 +08:00
wangxiyuan	048d350f9e	update triton package url (#4552 ) The Triton package URL is not correct. This PR fixes it. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-28 21:00:49 +08:00
wangxiaoteng888	366d2d95e8	[P/D] Add readme for PD separation (#4182 ) ### What this PR does / why we need it? Add readme for PD separation ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By ci - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Signed-off-by: liziyu <liziyu16@huawei.com> Co-authored-by: liziyu <liziyu16@huawei.com>	2025-11-28 15:17:59 +08:00
SILONG ZENG	ab37a7d5ae	[main]Upgrade cann to 8.3rc2 (#4350 ) ### What this PR does / why we need it? Upgrade cann to 8.3rc2 ### Does this PR introduce _any_ user-facing change? Yes, docker image will use 8.3.RC2 - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: MrZ20 <2609716663@qq.com>	2025-11-28 14:06:01 +08:00
herizhen	d252e36ae8	Change comment location (#4432 ) ### What this PR does / why we need it? When running 'python example.py', connection issues often occur. The solution is to comment out the first line of the code. Complete the specific names of machines A2 and A3. Standardize the document format: a space should be added after the colon. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.2 --------- Signed-off-by: herizhen <you@example.com> Co-authored-by: herizhen <you@example.com>	2025-11-26 16:13:31 +08:00
Li Wang	b5f7a83927	[Doc] Upgrade multi-node doc (#4365 ) ### What this PR does / why we need it? When the `Ascend scheduler` is used, the param `max_num_batched_tokens` must not be smaller than `max_model_len`; otherwise, the following error is raised: ```shell Value error, Ascend scheduler is enabled without chunked prefill feature. Argument max_num_batched_tokens (4096) is smaller than max_model_len (32768). This effectively limits the maximum sequence length to max_num_batched_tokens and makes vLLM reject longer sequences. Please increase max_num_batched_tokens or decrease max_model_len. [type=value_error, input_value=ArgsKwargs((), {'model_co...g': {'enabled': True}}}), input_type=ArgsKwargs] ``` ### Does this PR introduce _any_ user-facing change? Users and developers who run the model according to the [tutorial](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/multi_node.html) can now specify the parameters correctly. ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-11-24 10:57:50 +08:00
mazhixin000	ab51fcea4c	[Doc]Add single node PD disaggregation instructions (#4337 ) ### What this PR does / why we need it? add single node PD disaggregation instructions for Qwen 2.5VL model. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: mazhixin <mazhixin7@huawei.com> Signed-off-by: mazhixin000 <mazhixinkorea@163.com> Co-authored-by: mazhixin <mazhixin7@huawei.com>	2025-11-22 23:33:07 +08:00
liziyu	a30261f779	[P/D] pd proxy support ipv6 (#4161 ) ### What this PR does / why we need it? pd proxy support ipv6, mooncake connector check whether the IPv6 address is used and notify the user. - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: liziyu <liziyu16@huawei.com>	2025-11-18 11:01:13 +08:00
lilinsiman	adee9dd3b1	[Info][main] Correct the mistake in information documents (#4157 ) ### What this PR does / why we need it? Correct the mistake in information documents ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-11-13 15:53:58 +08:00
zhangyiming	c9e5b90f53	[Doc] Fix DeepSeek-3.2-Exp doc, remove v0.11.0rc0 outdated infos. (#4095 ) ### What this PR does / why we need it? Fix DeepSeek-3.2-Exp doc, remove v0.11.0rc0 outdated infos. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: menogrey <1299267905@qq.com>	2025-11-12 09:11:31 +08:00
wangxiyuan	f811a24bf0	Remove VLLM_USE_V1 (#4086 ) Drop VLLM_USE_V1 usage. This env has been removed from vLLM already. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-11 15:43:39 +08:00
22dimensions	e6625bb582	[Doc] add qwen3 w4a4 tutorial (#4076 ) ### What this PR does / why we need it? v0.11.0rc1 will introduce w4a4 quantization feature, so add this tutorial. ### Does this PR introduce _any_ user-facing change? No - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-11-10 20:30:07 +08:00
zhangyiming	a74e76b02d	[Doc] Remove extra MLAPO installation step for DeepSeek-V3.2. (#4024 ) ### What this PR does / why we need it? Remove extra MLAPO installation step for DeepSeek-V3.2. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: menogrey <1299267905@qq.com>	2025-11-10 09:09:59 +08:00
lilinsiman	a3ff765c65	[Info][main] Corrected the errors in the information (#4055 ) ### What this PR does / why we need it? Corrected the errors in the information ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-11-08 18:48:59 +08:00
Li Wang	259eb25f88	[CI] Quick fix mooncake for nightly-ci (#4028 ) ### What this PR does / why we need it? Since we have upgraded to CANN 8.3rc1, we will no longer use the privately maintained Mooncake repository, but instead use the official release released by Mooncake: https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.7.post2 . Next step: this is only a temporary solution. We will integrate mooncake into the vllm-ascend base image later for easier use. see https://github.com/vllm-project/vllm-ascend/pull/3989 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-11-06 18:46:00 +08:00
zhangyiming	5f08e07208	[Doc] Refactor the DeepSeek-V3.2-Exp tutorial. (#3871 ) ### What this PR does / why we need it? Refactor the DeepSeek-V3.2-Exp tutorial. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: menogrey <1299267905@qq.com>	2025-11-04 18:58:33 +08:00
zxr2333	15bb5098ad	[PD Disaggregation]Set adxl engine as default backend and update README (#3761 ) ### What this PR does / why we need it? Set adxl engine as the default Mooncake backend, because Ascend Transport is no longer maintained. Update README to include instructions for installing the adxl backend Mooncake. ### Does this PR introduce _any_ user-facing change? Users need to compile and install the mooncake backend for adxl according to the revised README instructions. ### How was this patch tested? By CI. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>	2025-11-04 16:06:39 +08:00
zhangxinyuehfad	789ba4c5c2	[Doc] Update doc (#3836 ) ### What this PR does / why we need it? Update doc ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.1 Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-10-29 11:03:39 +08:00
Shanshan Shen	3e5ae49160	[MM][Doc] Update online serving tutorials for `Qwen2-Audio` (#3606 ) ### What this PR does / why we need it? Update online serving tutorials for `Qwen2-Audio`. Part of https://github.com/vllm-project/vllm-ascend/issues/3508. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: shen-shanshan <467638484@qq.com>	2025-10-27 16:58:03 +08:00
zhangyiming	ebfd09a075	[Doc] Update the Pangu Pro MoE tutorials. (#3651 ) ### What this PR does / why we need it? Update the Pangu Pro MoE tutorials. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: menogrey <1299267905@qq.com>	2025-10-23 20:41:47 +08:00
Crazyang	f06a6cad1b	[Doc] Update the modelslim website from gitee to gitcode. (#3615 ) ### What this PR does / why we need it? Because the ModelSlim code repository has migrated from gitee to gitcode, all relevant links in the repository have been updated. [migration notice](https://gitee.com/ascend/msit/tree/master/.%E6%9C%AC%E9%A1%B9%E7%9B%AE%E5%B7%B2%E7%BB%8F%E6%AD%A3%E5%BC%8F%E8%BF%81%E7%A7%BB%E8%87%B3%20Gitcode%20%E5%B9%B3%E5%8F%B0) ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: Crazyang <im.crazyang@gmail.com> Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com> Co-authored-by: weichen <calvin_zhu0210@outlook.com>	2025-10-23 15:38:16 +08:00
Li Wang	ca104ce6f0	[Doc] Upgrade docker run command (#3645 ) ### What this PR does / why we need it? Update the docker run command, specifically: add --shm-size=1g ### Does this PR introduce _any_ user-facing change? users/developers using docker to pull vllm-ascend, the shared memory of the container will be increased from the default 64MB to 1G ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: wangli <wangli858794774@gmail.com>	2025-10-23 11:17:26 +08:00
liziyu	3164cb663c	[Bugfix] mooncake connector support external dp & update readme (#3579 ) ### What this PR does / why we need it? mooncake connector support external dp & update readme ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: liziyu <liziyu16@huawei.com>	2025-10-21 20:15:24 +08:00
likeful	6b6857929d	[Doc] Add --shm-size option to Docker command for qwen3 vl 235B (#3519 ) ### What this PR does / why we need it? Added the shared memory size option to the Docker run command. If shm-size is not specified, Docker uses 64MB by default. In this case, the vllm:EngineCore process may core dump under high workload. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Done Closes: https://github.com/vllm-project/vllm-ascend/issues/3513 - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: likeful <irayki@gmail.com> Signed-off-by: leijie2015 <irayki@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-20 23:37:35 +08:00
Li Wang	4c4a8458a5	[CI] Refactor multi-node CI (#3487 ) ### What this PR does / why we need it? Refactor the multi-machine CI use case. The purpose of this PR is to make it easier to add multi-machine CI use cases, allowing developers to add multi-machine cluster model testing use cases (including PD separation) by simply adding a new YAML configuration file. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-10-17 09:04:31 +08:00
leo-pony	291c00a224	[Doc] Pin the version that runs stably on 310I Duo to vllm-ascend v0.10.0rc1 (#3455 ) Pin the version that runs stably on 310I Duo to vllm-ascend v0.10.0rc1. ### What this PR does / why we need it? 310I Duo has been broken since PR #2614. Although we are working on a fix, there is no confirmed timeline for it in the short term. To let users quickly find a working version instead of resorting to trial and error, this PR pins the version in the 310I Duo guide. ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? NA - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-10-16 08:54:09 +08:00
leo-pony	ff91904ee2	[Doc] Clearer corresponding relationship between configurations for multi-node guides (#3441 ) Optimize the multi-node guide: make the correspondence between configuration items and nodes clearer. ### What this PR does / why we need it? Unclear guidance content has led to misunderstandings and issues, for example: #3367 ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? NA - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-10-16 08:54:03 +08:00
zxr2333	c2c1db78a7	[Bugfix] fix ZeroDivisionError when prefill_tp_size > num_kv_head and fix tp_resharding README (#3437 ) ### What this PR does / why we need it? Fix ZeroDivisionError when prefill_tp_size > num_kv_head. In this situation, num_head_replica can be 0 and is then used as a divisor; this PR restricts its minimum value to 1. This PR also fixes the tp_resharding README. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By CI. - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: liziyu <liziyu16@huawei.com> Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com> Co-authored-by: liziyu <liziyu16@huawei.com>	2025-10-15 08:45:44 +08:00
wangxiaoteng888	19b85ef1bc	[Bugfix] multi_node_pd_disaggregation_mooncake.md update (#3400 ) ### What this PR does / why we need it? multi_node_pd_disaggregation_mooncake.md update. Fix issues encountered during service startup. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By ci - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: wangxiaoteng@huawei.com <wangxiaoteng@huawei.com>	2025-10-14 09:29:35 +08:00
wangxiaoteng888	ca05f7d632	[Bugfix] TP size larger than KV cache head causes accuracy issues (#3366 ) ### What this PR does / why we need it? Resolve the issue where, in the case of unequal TP (Tensor Parallelism), the TP size is larger than the number of model attention kvcache heads, causing the KV cache to generate duplicates, which leads to transmission errors in the original code. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By ci - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com> Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Co-authored-by: nwpu-zxr <zhouxuerong2@huawei.com>	2025-10-11 11:22:23 +08:00
Li Wang	60b7c936c5	[Doc] Update deepseek-v3.2 doc (#3319 ) ### What this PR does / why we need it? Upgrade deepseek-v3.2 doc for A2 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-10-10 08:55:39 +08:00
Yikun Jiang	2dde1268c7	Fix doc for A2 series and cleanup note (#3307 ) ### What this PR does / why we need it? Fix doc for A2 series and cleanup note ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0 Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-10-01 14:39:48 +08:00
wangxiyuan	b8c58d68e1	[Doc] Add deepseek v3.2 tutorial (#3275 ) Add deepseek v3.2 tutorial - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: wangli <wangli858794774@gmail.com> Co-authored-by: MengqingCao <cmq0113@163.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-09-30 17:54:31 +08:00
Peipei	cf445c41f9	[Doc]Add qwen3_vl series guide (#3227 ) ### What this PR does / why we need it? This PR provides user guide documents for Qwen3-VL 4B and Qwen3-VL-235B-A22B. ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? - vLLM version: v0.10.2 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0 --------- Signed-off-by: booker123456 <945658361@qq.com>	2025-09-28 21:35:52 +08:00
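
Note on commit b5f7a83927 (#4365): the error quoted in that message comes from a check that, without chunked prefill, `max_num_batched_tokens` is not smaller than `max_model_len`. The snippet below is only a minimal sketch of that rule, not the actual vLLM or vllm-ascend validation code. The function name `check_batched_tokens` and the `chunked_prefill_enabled` flag are hypothetical and exist for illustration only; the values 4096 and 32768 are taken from the quoted error.

```python
def check_batched_tokens(
    max_num_batched_tokens: int,
    max_model_len: int,
    chunked_prefill_enabled: bool = False,
) -> None:
    """Hypothetical helper mirroring the rule stated in the quoted error.

    Without chunked prefill, a whole prompt must fit into one batch, so a
    token budget smaller than the model length would reject long sequences.
    """
    if chunked_prefill_enabled:
        # Assumption: with chunked prefill, long prompts are split across
        # steps, so the budget may be smaller than the model length.
        return
    if max_num_batched_tokens < max_model_len:
        raise ValueError(
            f"max_num_batched_tokens ({max_num_batched_tokens}) is smaller "
            f"than max_model_len ({max_model_len}). Increase "
            "max_num_batched_tokens or decrease max_model_len."
        )


# The configuration from the quoted error fails the check.
try:
    check_batched_tokens(max_num_batched_tokens=4096, max_model_len=32768)
except ValueError as err:
    print(f"rejected: {err}")

# Raising the token budget to the model length passes.
check_batched_tokens(max_num_batched_tokens=32768, max_model_len=32768)
```

In practice this means either raising `max_num_batched_tokens` to at least `max_model_len` or lowering `max_model_len`, as the quoted error suggests.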

195 Commits