### What this PR does / why we need it?
1. In short, we renamed the existing MooncakeStoreConnector to
AscendStoreConnector and extracted the storage-engine interaction logic
into a new Backend class (see the sketch below).
Associated RFC: https://github.com/vllm-project/vllm-ascend/issues/4329
2. Fixed the incorrect number of input parameters for the connector,
introduced in vLLM 0.11.2.
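For illustration, a minimal sketch of the split described above; all
names and method signatures here are hypothetical, not the actual
vllm-ascend interfaces:
```python
# Hedged sketch of the connector/backend split (hypothetical names).
from abc import ABC, abstractmethod
from typing import Optional

import torch


class Backend(ABC):
    """Encapsulates all interaction with the external KV storage engine."""

    @abstractmethod
    def put(self, key: str, tensor: torch.Tensor) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[torch.Tensor]: ...


class AscendStoreConnector:
    """Keeps the connector-side bookkeeping; storage I/O is delegated."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def save_kv_layer(self, key: str, kv: torch.Tensor) -> None:
        self.backend.put(key, kv)

    def load_kv_layer(self, key: str) -> Optional[torch.Tensor]:
        return self.backend.get(key)
```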
### Does this PR introduce _any_ user-facing change?
Yes: `MooncakeStoreConnector` is renamed to `AscendStoreConnector`.
### How was this patch tested?
- vLLM version: v0.11.2
---------
Signed-off-by: fems14 <1804143737@qq.com>
### What this PR does / why we need it?
This PR introduces support for adding custom CANN `aclnn` ops to
`vllm-ascend`, allowing users to define and use their own custom
operators.
Key changes include:
- Building and installing custom ops into the `vllm-ascend`-specified
directory
- Binding the `aclnn` op interface to the `torch.ops._C_ascend` module
- Enabling invocation of these ops within `vllm-ascend`
This PR includes a sample custom op:
`aclnnGroupedMatmulSwigluQuantWeightNzTensorList`, which is adapted from
the CANN operator
[`aclnnGroupedMatmulSwigluQuantWeightNZ`](https://www.hiascend.com/document/detail/zh/canncommercial/83RC1/API/aolapi/context/aclnnGroupedMatmulSwigluQuantWeightNZ.md).
Its input parameters `weight` and `weight_scale` now accept
`list[torch.Tensor]` (i.e., `at::TensorList`).
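As a hedged illustration of how such an op is invoked once built and
installed (the Python-side op name, argument order, and shapes below are
assumptions for illustration; consult the op's registered schema for the
authoritative signature):
```python
import torch
import torch_npu  # noqa: F401  # provides the .npu() device methods

# Importing vllm_ascend loads the custom-op library and binds the aclnn
# ops onto the torch.ops._C_ascend namespace.
import vllm_ascend  # noqa: F401

# Illustrative inputs; shapes and group layout are placeholders.
x = torch.randn(16, 128).npu()
weights = [torch.randn(128, 256).npu() for _ in range(4)]      # at::TensorList
weight_scales = [torch.randn(256).npu() for _ in range(4)]     # at::TensorList

# Hypothetical op name and argument order, for illustration only.
out = torch.ops._C_ascend.grouped_matmul_swiglu_quant_weight_nz_tensor_list(
    x, weights, weight_scales
)
```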
### Does this PR introduce _any_ user-facing change?
No.
- vLLM version: v0.11.2
---------
Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>
### What this PR does / why we need it?
Delete an extra equals sign in the doc.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2
---------
Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
### What this PR does / why we need it?
Add a README for PD separation.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By CI.
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
### What this PR does / why we need it?
While using the LLM Compressor quantization tool from the vLLM community
to generate quantized weights, the vLLM Ascend engine needs to be
adapted to support the compressed-tensors quantization format.
1. Add AscendCompressedTensorsConfig to replace CompressedTensorsConfig
in vLLM.
2. Support CompressedTensorsW8A8 static weights (see the quantization
sketch after this list).
- weight: per-channel, int8, symmetric; activation: per-tensor, int8,
symmetric.
3. Support CompressedTensorsW8A8Dynamic weights.
- weight: per-channel, int8, symmetric; activation: per-token, int8,
symmetric, dynamic.
4. Modify the override_quantization_method in AscendQuantConfig.
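For reference, a minimal sketch of the symmetric int8 quantization math
these schemes imply; this is illustrative only, not the vllm-ascend
kernels:
```python
import torch


def quant_weight_per_channel_sym_int8(w: torch.Tensor):
    """Weight path of both schemes: per-output-channel symmetric int8."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale


def quant_act_per_token_sym_int8(x: torch.Tensor):
    """Activation path of the dynamic scheme: per-token symmetric int8.

    The static scheme instead uses one calibrated per-tensor scale.
    """
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale
```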
Co-authored-by: taoqun110 <taoqun@huawei.com>
Co-authored-by: chenxi-hh <chen464822955@163.com>
- vLLM version: v0.11.2
---------
Signed-off-by: LHXuuu <scut_xlh@163.com>
Signed-off-by: chenxi-hh <chen464822955@163.com>
Signed-off-by: chenxi-hh <32731611+chenxi-hh@users.noreply.github.com>
Co-authored-by: chenxi-hh <chen464822955@163.com>
Co-authored-by: chenxi-hh <32731611+chenxi-hh@users.noreply.github.com>
### What this PR does / why we need it?
Upgrade CANN to 8.3.RC2.
### Does this PR introduce _any_ user-facing change?
Yes, docker image will use 8.3.RC2
- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
### What this PR does / why we need it?
When running `python example.py`, connection issues often occur. The
fix is to comment out the first line of the code.
Complete the specific names of the A2 and A3 machines.
Standardize the document format: a space should be added after each
colon.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.2
---------
Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
### What this PR does / why we need it?
The "g" at the beginning of the current sentence is redundant and needs
to be deleted
"MindIE Turbo" is no longer required to be displayed.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM main:
2918c1b49c
---------
Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
### What this PR does / why we need it?
vllm-ascend needs to dump data during model execution to debug precision
problems. msprobe provides the corresponding abilities, so msprobe will
join vllm-ascend to make debugging easier.
### Does this PR introduce _any_ user-facing change?
```
'dump_config': '/path/to/config.json'
```
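A hedged sketch of how this option might be supplied, assuming it goes
through vLLM's generic `additional_config` passthrough (the key handling
itself lives in vllm-ascend, and `config.json` follows msprobe's
format):
```python
from vllm import LLM

# Assumption: the msprobe dump option is passed via additional_config;
# /path/to/config.json is a placeholder for an msprobe dump config file.
llm = LLM(
    model="path/to/model",  # placeholder
    additional_config={"dump_config": "/path/to/config.json"},
)
```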
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: Tjh-UKN <2559659915@qq.com>
### What this PR does / why we need it?
The first letter of the English title should be capitalized
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
### What this PR does / why we need it?
When using the `Ascend scheduler`, the param `max_num_batched_tokens`
should be no smaller than `max_model_len`; otherwise, the following
error is encountered:
```shell
Value error, Ascend scheduler is enabled without chunked prefill feature. Argument max_num_batched_tokens (4096) is smaller than max_model_len (32768). This effectively limits the maximum sequence length to max_num_batched_tokens and makes vLLM reject longer sequences. Please increase max_num_batched_tokens or decrease max_model_len. [type=value_error, input_value=ArgsKwargs((), {'model_co...g': {'enabled': True}}}), input_type=ArgsKwargs]
```
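For example, a configuration consistent with this constraint (a sketch;
the model path is a placeholder):
```python
from vllm import LLM

# With the Ascend scheduler enabled and chunked prefill disabled,
# max_num_batched_tokens must be at least max_model_len.
llm = LLM(
    model="path/to/model",  # placeholder
    max_model_len=32768,
    max_num_batched_tokens=32768,
)
```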
### Does this PR introduce _any_ user-facing change?
Users/developers who run the model according to the
[tutorial](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/multi_node.html)
can now specify the parameters correctly.
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
Add single-node PD disaggregation instructions for the Qwen2.5-VL model.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: mazhixin <mazhixin7@huawei.com>
Signed-off-by: mazhixin000 <mazhixinkorea@163.com>
Co-authored-by: mazhixin <mazhixin7@huawei.com>
### What this PR does / why we need it?
This PR adds a load-balance DP proxy server that can be used in the
external DP scenario without Disaggregated Prefill enabled. It also adds
a doc covering external DP and the load-balance DP proxy server.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
See the new doc.
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: whx-sjtu <2952154980@qq.com>
### What this PR does / why we need it?
Redundant experts bugfix
### Does this PR introduce _any_ user-facing change?
After configuring the path for `experts_map`, users do not need to
configure `init_redundancy_expert`.
### How was this patch tested?
The accuracy of EPLB was tested with and without the use of redundant
experts.
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
### What this PR does / why we need it?
Add some of the pitfalls I ran into when using AISBench to test
multi-modal models.
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
### What this PR does / why we need it?
Add the parameter `register_buffer` for the PD aggregated scenario in
the given example.
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>
### What this PR does / why we need it?
The PD proxy now supports IPv6; the Mooncake connector checks whether an
IPv6 address is used and notifies the user.
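A minimal sketch of the kind of check involved (illustrative, not the
exact connector code):
```python
import ipaddress


def is_ipv6(host: str) -> bool:
    """Return True when host parses as an IPv6 literal (e.g. '::1')."""
    try:
        return isinstance(ipaddress.ip_address(host), ipaddress.IPv6Address)
    except ValueError:
        return False  # hostname or IPv4 literal


# IPv6 literals must be bracketed when embedded in URLs.
host = "fe80::1"
url = f"http://[{host}]:8000" if is_ipv6(host) else f"http://{host}:8000"
```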
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: liziyu <liziyu16@huawei.com>
### What this PR does / why we need it?
Correct the mistake in the information documents.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: lilinsiman <lilinsiman@gmail.com>
### What this PR does / why we need it?
To support the data collection capabilities of msServiceProfiler on the
vllm-ascend framework and enable customization of data collection points
via a configuration file, a default profiling configuration has been
added to vllm-ascend, facilitating debugging and optimization for
developers and users.
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: minghangc <29514143@qq.com>
Drop `VLLM_USE_V1` usage. This environment variable has already been
removed from vLLM.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Use pip installation in the installation doc and update the related
doctest to validate it.
### Does this PR introduce _any_ user-facing change?
No, doc only
### How was this patch tested?
Doctest related CI passed
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
v0.11.0rc1 will introduce the W4A4 quantization feature, so this
tutorial is added.
### Does this PR introduce _any_ user-facing change?
No
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: 22dimensions <waitingwind@foxmail.com>
### What this PR does / why we need it?
The first-generation model uses "LLama", while subsequent models use
"Llama". The second "L" here should be lowercase. Other instances of
"LLama" on this page should be corrected accordingly.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
### What this PR does / why we need it?
Closes #3728, #3657.
The main branch is now aligned with the vllm `releases/v0.11.1` branch,
which no longer supports `Python 3.9`. Check it
[here](https://github.com/vllm-project/vllm/blob/releases/v0.11.1/pyproject.toml).
### Does this PR introduce _any_ user-facing change?
The newest version of vllm-ascend doesn't support Python 3.9.
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
### What this PR does / why we need it?
Remove extra MLAPO installation step for DeepSeek-V3.2.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: menogrey <1299267905@qq.com>
### What this PR does / why we need it?
Corrected the errors in the information documents.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: lilinsiman <lilinsiman@gmail.com>
### What this PR does / why we need it?
Add model feature matrix table.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: menogrey <1299267905@qq.com>
### What this PR does / why we need it?
Since we have upgraded to CANN 8.3rc1, we will no longer use the
privately maintained Mooncake repository, but instead use the official
Mooncake release:
https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.7.post2 .
Next step: this is only a temporary solution. We will integrate Mooncake
into the vllm-ascend base image later for easier use; see
https://github.com/vllm-project/vllm-ascend/pull/3989
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
- `global_segment_size` and `local_buffer_size` use constants for
unified management.
- Added support for input values ending with GB, MB, KB, and B, while
remaining compatible with the existing input format (see the sketch
after this list).
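A minimal sketch of the suffix parsing described above (illustrative,
not the exact vllm-ascend implementation):
```python
_UNITS = {"GB": 1024**3, "MB": 1024**2, "KB": 1024, "B": 1}


def parse_size(value: str | int) -> int:
    """Accept raw byte counts plus strings ending in GB/MB/KB/B."""
    if isinstance(value, int):  # existing input method: raw bytes
        return value
    s = value.strip().upper()
    for unit in ("GB", "MB", "KB", "B"):  # check longer suffixes first
        if s.endswith(unit):
            return int(float(s[: -len(unit)]) * _UNITS[unit])
    return int(s)  # bare number: raw bytes
```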
### Does this PR introduce _any_ user-facing change?
- Users can use the new input formats
- The documentation has also been updated
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
---------
Signed-off-by: 李子琦 <liziqi_ing@163.com>
### What this PR does / why we need it?
Add the adxl timeout parameter to the KV pool user guide, avoiding
timeout errors when initializing connections between devices.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>
### What this PR does / why we need it?
Add a developer guide for EPLB.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
---------
Signed-off-by: offline0806 <3337230449@qq.com>
Co-authored-by: offline0806 <3337230449@qq.com>
Add a version policy for the main branch to clarify how vllm-ascend
works with vLLM.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Add aclgraph developer guide.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: zzzzwwjj <1183291235@qq.com>
### What this PR does / why we need it?
Refactor the DeepSeek-V3.2-Exp tutorial.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
---------
Signed-off-by: menogrey <1299267905@qq.com>
### What this PR does / why we need it?
Set the adxl engine as the default Mooncake backend, because Ascend
Transport is no longer maintained.
Update the README to include instructions for installing Mooncake with
the adxl backend.
### Does this PR introduce _any_ user-facing change?
Users need to compile and install Mooncake with the adxl backend
according to the revised README instructions.
### How was this patch tested?
By CI.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
### What this PR does / why we need it?
This PR upgrades CANN from 8.2rc1 to 8.3rc1 and removes the CANN version
check logic.
TODO: we noticed that UT runs fail with the CANN 8.3 image, so the base
image for UT is still 8.2. We'll fix this later.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Upgrade torch-npu to the official release version 2.7.1
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
We noticed that users sometimes build vllm-ascend with an incorrect
torch version. In this case, the build passes, but running the code
raises the error `AttributeError: '_OpNamespace' '_C_ascend' object has
no attribute 'weak_ref_tensor'`. To fix this, force the torch version to
2.7.1 and check the torch version when building from source.
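A minimal sketch of such a build-time guard (illustrative; the real
check lives in the build scripts):
```python
import torch

REQUIRED_TORCH = "2.7.1"

# Fail the source build early instead of surfacing a missing-attribute
# error at runtime.
if not torch.__version__.startswith(REQUIRED_TORCH):
    raise RuntimeError(
        f"vllm-ascend must be built against torch=={REQUIRED_TORCH}, "
        f"but found torch=={torch.__version__}"
    )
```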
closes: #3342
- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Added instructions for resolving the 'invalid tar header' error during
`docker pull` on Kylin OS with an ARM64 architecture on Atlas 300I
hardware, including steps for offline loading of Docker images.
---
### What this PR does / why we need it?
The primary motivation for this PR is to address a critical `docker
pull` failure that occurs in specific, yet important, enterprise
environments. Specifically, when operating on **Kylin OS (麒麟操作系统) with
an ARM64 architecture on Atlas 300I hardware**, users frequently
encounter an `archive/tar: invalid tar header` error, which completely
blocks the setup process. This issue has been consistently reproduced,
with multiple retries failing with the same error, confirming that it is
a persistent environmental problem rather than a transient network
issue.
![Screenshot of the reproduced invalid tar header error](https://github.com/user-attachments/assets/6c1c5728-de27-476f-8df4-723564fc290b)
This guide provides a robust, step-by-step workaround using an
offline-loading method (`docker save` on a host machine and `docker
load` on the target machine). This solution is crucial for enabling
users on this platform to use vLLM.
This contribution does not directly fix an existing issue number, but it
proactively solves a significant environmental and usability problem for
a growing user base.
### Does this PR introduce _any_ user-facing change?
No. It does not alter any code, APIs, interfaces, or existing behavior
of the vLLM project.
### How was this patch tested?
The instructions and troubleshooting steps in this guide were validated
through a real-world, end-to-end test case on my own hardware and OS.
The testing process was as follows:
1. **Problem Reproduction**: An attempt was made to directly `docker
pull` the `vllm-ascend:v0.10.0rc1-310p` image on a target machine
running Kylin OS (ARM64). The `invalid tar header` failure was
successfully and consistently reproduced, confirming the existence of
the problem.
2. **Solution Implementation**: The workaround detailed in the guide was
executed:
* On a separate host machine (Ubuntu x86_64), the image was successfully
pulled using the `--platform linux/arm64` flag.
* The image was then saved to a `.tar` archive using `docker save`.
* The `.tar` archive was transferred to the target Kylin OS machine.
* The image was successfully loaded from the archive using `docker load
-i ...`.
3. **End-to-End Validation**: After loading the image, the vLLM
container was launched on the target machine following the instructions
in the guide. Both online inference (via `curl` to the API server) and
offline inference (via the Python script) were executed successfully,
confirming that the entire workflow described in the document is
accurate and effective.
Since this is a documentation-only change based on a validated workflow,
no new unit or integration tests were added to the codebase.
- vLLM version: v0.11.0rc3
- vLLM main:
83f478bb19
---------
Signed-off-by: Liwx <liweixuan1014@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>