xc-llm-ascend

Author	SHA1	Message	Date
wangxiyuan	0d14f635b4	upgrade torch npu version (#4433 ) vLLM graph mode now relies on torch >= 2.8, so torch must be upgraded for graph mode to work. Upgrading torch to a newer version is also worthwhile for long-term support. Related vLLM change: https://github.com/vllm-project/vllm/pull/25110 - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2	2025-12-01 19:01:55 +08:00
dependabot[bot]	8c65009d62	Bump actions/setup-python from 6.0.0 to 6.1.0 (#4591 ) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 6.0.0 to 6.1.0. - vLLM version: v0.11.2 Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-12-01 14:32:08 +08:00
Mengqing Cao	517fd9272d	Revert "drop ascend scheduler" (#4580 ) Reverts vllm-project/vllm-ascend#4498 - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2	2025-11-29 22:20:48 +08:00
wangxiyuan	1eb5295a1b	remove qwen3-next model file (#4573 ) Remove the qwen3-next model file for now. We'll support it later using vLLM's original model file. - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-29 18:37:26 +08:00
Nengjun Ma	a3041cd78c	[Bugfix] fix dp parallel + tp > 1 offline inference port conflict (#4539 ) ### What this PR does / why we need it? Fix a port conflict in offline inference when data parallelism is combined with TP > 1. Imported from PR: https://github.com/vllm-project/vllm-ascend/pull/429 - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-11-29 18:37:11 +08:00
wangxiyuan	6664a4e5ce	improve soc version (#4522 ) Make SOC_VERSION more readable for users. Users can now simply set "910b", "910c", or "310p". - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-29 17:42:16 +08:00
wangxiyuan	f10acddb78	drop ascend scheduler (#4498 ) The Ascend scheduler was originally added for the non-chunked-prefill case because the NPU ops did not work well with chunked prefill. Now that the ops handle chunked prefill better, it's time to remove the Ascend scheduler and use vLLM's default scheduler. - vLLM version: v0.11.2 --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-29 16:18:34 +08:00
wangxiyuan	8ebbf13c1a	Update triton package name (#4563 ) Add the `aarch64` suffix to make sure the package name is correct. - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-29 15:00:40 +08:00
wangxiyuan	048d350f9e	update triton package url (#4552 ) The Triton package URL was incorrect. This PR fixes it. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-28 21:00:49 +08:00
shiyuan680	1c4a0468ee	[OPS] qwen3-next support triton chunk_gated_delta_rule ops (#4070 ) ### What this PR does / why we need it? Add support for the Triton chunk_gated_delta_rule ops in qwen3-next. ### co-owners @OsirisDuan - vLLM version: v0.11.2 Signed-off-by: shiyuan680 <917935075@qq.com>	2025-11-28 20:55:43 +08:00
Chenxi Qian	554f16ae1f	[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804 ) ### What this PR does / why we need it? This PR introduces support for adding custom CANN `aclnn` ops to `vllm-ascend`, allowing users to define and use their own custom operators. Key changes include: - Building and installing custom ops into the `vllm-ascend`-specified directory - Binding the `aclnn` op interface to the `torch.ops._C_ascend` module - Enabling invocation of these ops within `vllm-ascend` This PR includes a sample custom op: `aclnnGroupedMatmulSwigluQuantWeightNzTensorList`, which is adapted from the CANN operator [`aclnnGroupedMatmulSwigluQuantWeightNZ`](https://www.hiascend.com/document/detail/zh/canncommercial/83RC1/API/aolapi/context/aclnnGroupedMatmulSwigluQuantWeightNZ.md). Its input parameters `weight` and `weight_scale` now accept `list[torch.Tensor]` (i.e., `at::TensorList`). ### Does this PR introduce _any_ user-facing change? No. - vLLM version: v0.11.2 --------- Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>	2025-11-28 18:06:39 +08:00
LHXuuu	bdc66972db	[Quantization] Support compressed tensors w8a8 static and w8a8 dynamic weight (#4036 ) ### What this PR does / why we need it? To run quantized weights generated with the vLLM community's LLM Compressor tool, the vLLM Ascend engine must support the compressed-tensors quantization format. 1. Add AscendCompressedTensorsConfig to replace CompressedTensorsConfig in vLLM. 2. Support CompressedTensorsW8A8 static weights. - weight: per-channel, int8, symmetric; activation: per-tensor, int8, symmetric. 3. Support CompressedTensorsW8A8Dynamic weights. - weight: per-channel, int8, symmetric; activation: per-token, int8, symmetric, dynamic. 4. Modify override_quantization_method in AscendQuantConfig. Co-authored-by: taoqun110 taoqun@huawei.com Co-authored-by: chenxi-hh chen464822955@163.com - vLLM version: v0.11.2 --------- Signed-off-by: LHXuuu <scut_xlh@163.com> Signed-off-by: chenxi-hh <chen464822955@163.com> Signed-off-by: chenxi-hh <32731611+chenxi-hh@users.noreply.github.com> Co-authored-by: chenxi-hh <chen464822955@163.com> Co-authored-by: chenxi-hh <32731611+chenxi-hh@users.noreply.github.com>	2025-11-28 14:09:39 +08:00
SILONG ZENG	ab37a7d5ae	[main]Upgrade cann to 8.3rc2 (#4350 ) ### What this PR does / why we need it? Upgrade CANN to 8.3.RC2. ### Does this PR introduce _any_ user-facing change? Yes, the Docker image will use CANN 8.3.RC2. - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: MrZ20 <2609716663@qq.com>	2025-11-28 14:06:01 +08:00
Slightwind	9fdabb7b60	[feature] Add Custom Op grouped_matmul_swiglu_quant (#4431 ) This PR introduces the `EXEC_NPU_CMD` macro, serving as an adapter layer to simplify the invocation of `aclnn` operators on Ascend NPUs. Key Changes: * Adapter Layer: Added `EXEC_NPU_CMD` macro and related dependencies to standardize `aclnn` calls. * Operator Support: Integrated `grouped_matmul_swiglu_quant` as a reference implementation to demonstrate the usage of the new macro. --- - vLLM version: v0.11.2 --------- Signed-off-by: SlightwindSec <slightwindsec@gmail.com>	2025-11-27 21:56:18 +08:00
zzzzwwjj	1fd56b1106	chip type judgement code optimization (#4485 ) ### What this PR does / why we need it? Chip type detection now behaves as follows. With `SOC_VERSION` set: in a CPU environment, check whether `SOC_VERSION` is in the `soc_to_device` dict and, if not, raise an error that the current chip type is not supported; in an NPU environment, log a warning when `SOC_VERSION` differs from the chip type reported by `npu-smi`, and otherwise behave as in the CPU environment. Without `SOC_VERSION` set: in a CPU environment, raise an error because `SOC_VERSION` is required when compiling there; in an NPU environment, compile vllm-ascend using the chip type reported by `npu-smi`. ### Does this PR introduce _any_ user-facing change? `SOC_VERSION` must now be set when compiling in a CPU environment. ### How was this patch tested? - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: zzzzwwjj <1183291235@qq.com>	2025-11-27 17:18:49 +08:00
wangxiyuan	a91e76cd84	[CI] clean up ci (#4452 ) 1. Run the 4-card tests only after the single-card and 2-card tests pass. 2. Rename files for clarity. 3. Remove the redundant PD workflow, which is already covered by the nightly tests. - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-26 14:07:56 +08:00
wangxiyuan	bc69d7cfe1	upgrade to vllm 0.11.2 (#4400 ) Bump vLLM version to v0.11.2 What's broken and changed by vLLM: 1. structured_output is broken by https://github.com/vllm-project/vllm/pull/26866 2. get_mrope_input_positions is broken by https://github.com/vllm-project/vllm/pull/28399 3. graph mode is broken by https://github.com/vllm-project/vllm/pull/25110; we'll upgrade torch to 2.8 to fix the problem later 4. embedding is broken by https://github.com/vllm-project/vllm/pull/27583 5. `get_attn_backend_cls` and the attention backend are broken by https://github.com/vllm-project/vllm/pull/28534 6. spec decode is broken by https://github.com/vllm-project/vllm/pull/28771 7. sp feature is broken by https://github.com/vllm-project/vllm/pull/27126 8. mtp is broken by https://github.com/vllm-project/vllm/pull/27922 9. lora is broken by https://github.com/vllm-project/vllm/pull/21068 10. execute_model is broken by https://github.com/vllm-project/vllm/pull/26866 11. `VLLM_DISABLE_SHARED_EXPERTS_STREAM` env is broken by https://github.com/vllm-project/vllm/pull/28159 12. kv cache is broken by https://github.com/vllm-project/vllm/pull/27753 13. dp is broken by https://github.com/vllm-project/vllm/pull/25110 What's broken and changed by ourselves: 1. qwen vl is broken by https://github.com/vllm-project/vllm/pull/28455 We'll remove model files in the future to avoid this kind of error 2. Engine core is broken by https://github.com/vllm-project/vllm/pull/23691 We'll remove the patch file in the future. 3. Ascend scheduler is broken by https://github.com/vllm-project/vllm/pull/28733 We'll remove the Ascend scheduler later. 4. qwen3-next is broken by https://github.com/vllm-project/vllm/pull/28083 We'll remove model files in the future to avoid this kind of error 5. qwen vl is broken by https://github.com/vllm-project/vllm/pull/27764. We'll remove model files in the future Known issues: 1. ray doesn't work 2. the accuracy of qwen3-next is not correct 3. qwen3-vl is broken 4. prefix cache + ascend scheduler + deepseek v2 lite is broken. Co-authored-by: MengqingCao <cmq0113@163.com> Co-authored-by: hfadzxy <starmoon_zhang@163.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: 22dimensions <waitingwind@foxmail.com> Co-authored-by: shen-shanshan <467638484@qq.com> - vLLM version: v0.11.2 --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: hfadzxy <starmoon_zhang@163.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: MengqingCao <cmq0113@163.com> Co-authored-by: hfadzxy <starmoon_zhang@163.com> Co-authored-by: leo-pony <nengjunma@outlook.com>	2025-11-26 11:48:58 +08:00
dependabot[bot]	84eae97f27	Bump actions/checkout from 4 to 6 (#4380 ) Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6. - vLLM main: `2918c1b49c` Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-11-25 09:05:11 +08:00
wangxiyuan	a1f142b7ad	Drop 0.11.0 support (#4377 ) There is a lot of hack code for v0.11.0, which makes it hard to upgrade to newer vLLM versions. Since v0.11.0 will be released soon, let's drop v0.11.0 support first. We'll then upgrade to v0.11.2. - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-24 17:08:20 +08:00
Li Wang	b5f7a83927	[Doc] Upgrade multi-node doc (#4365 ) ### What this PR does / why we need it? When the `Ascend scheduler` is used, `max_num_batched_tokens` must not be smaller than `max_model_len`; otherwise, the following error is raised: ```shell Value error, Ascend scheduler is enabled without chunked prefill feature. Argument max_num_batched_tokens (4096) is smaller than max_model_len (32768). This effectively limits the maximum sequence length to max_num_batched_tokens and makes vLLM reject longer sequences. Please increase max_num_batched_tokens or decrease max_model_len. [type=value_error, input_value=ArgsKwargs((), {'model_co...g': {'enabled': True}}}), input_type=ArgsKwargs] ``` ### Does this PR introduce _any_ user-facing change? Users and developers running the model according to the [tutorial](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/multi_node.html) can now specify the parameters correctly. ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-11-24 10:57:50 +08:00
Li Wang	b34f195cc8	[CI] Fix nightly CI for A2 series (#3825 ) ### What this PR does / why we need it? For multi-node CI system, we need to ensure that cluster resources meet the expected specifications before conducting multi-node interoperability tests. Otherwise, unexpected errors may occur (for example, we might mistakenly assume all nodes are ready and perform a global cluster IP acquisition, which would cause an exception to be thrown in Python because some nodes might not actually be ready at that point). Therefore, we need to wait at the workflow level until all resources meet the expected specifications. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-11-23 23:05:33 +08:00
Yizhou	cbb27feaf2	[Test] Add ACL graph capture/replay DP test (#4259 ) ### What this PR does / why we need it? Add an ACL graph capture/replay DP test; this is an improved version of #3886. Restructures the multi-card ACL graph test for improved clarity, robustness, and accuracy. Key improvements include: - Replaces fragile `sys.settrace` and manual patching with a clean, reusable spy installer using `unittest.mock.patch`. - Introduces more precise metrics by tracking `NPUModelRunner.execute_model` and `_dummy_run` calls directly. - Rewrites assertions to be more accurate and provides clear explanations for the expected counts of graph captures, replays, model executions, and dummy runs. - Simplifies the overall test structure by separating the worker logic into a dedicated function. - Removes a long, unnecessary sleep at the end of the test. - Expands test coverage by adding a larger `max_tokens` parameter. ### Does this PR introduce _any_ user-facing change? None. ### How was this patch tested? None. - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: lilinsiman <lilinsiman@gmail.com> Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com> Co-authored-by: lilinsiman <lilinsiman@gmail.com>	2025-11-21 08:50:46 +08:00
CodeCat	470fe05df6	[Test] Add tests for the multi-node DeepSeek-V2-Lite network in GE Graph (#4039 ) ### What this PR does / why we need it? Add tests for the multi-node DeepSeek-V2-Lite network in GE Graph mode, and supplement the end-to-end (e2e) tests for the MLA and NZ features of this network. - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: CodeNine-CJ <chenjian343@huawei.com>	2025-11-20 17:28:32 +08:00
XiaoxinWang	e38ef2c434	support FULL graph mode for GQA (#3970 ) ### What this PR does / why we need it? The current library only supports the FullDecodeOnly graph mode, which enables full graph execution during decode. This PR extends support to full graph execution in both prefill and decode, referred to as FULL graph mode. - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>	2025-11-17 10:50:35 +08:00
zhangyiming	c334114f69	[CI] Fix no space left in build wheel CI. (#4215 ) ### What this PR does / why we need it? [CI] Fix no space left in build wheel CI. - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` Signed-off-by: menogrey <1299267905@qq.com>	2025-11-17 10:45:58 +08:00
zhangxinyuehfad	67f2b3a031	[Test] Add deepseek v3.2 exp nightly test (#4191 ) ### What this PR does / why we need it? - skip the nightly image build when the GitHub event is pull_request - set imagePullPolicy to Always for the multi_node test - move the multi_node tests earlier so that some resources are cleaned up first - decouple the nightly image build from the nightly tests for fault tolerance - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: hfadzxy <starmoon_zhang@163.com> Signed-off-by: wangli <wangli858794774@gmail.com> Co-authored-by: wangli <wangli858794774@gmail.com>	2025-11-14 15:46:10 +08:00
欧派果奶我还要	f90ed95578	[CI] Add multi-nodes EPLB configs of DeepSeek-R1-W8A8 & Qwen3-235B-W8A8 (#4144 ) ### What this PR does / why we need it? Add DeepSeek-R1-W8A8 and Qwen3-235B-W8A8 configs for the multi-node EPLB scenario. ### Does this PR introduce _any_ user-facing change? no - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: 白永斌 <baiyongbin3@h-partners.com> Co-authored-by: 白永斌 <baiyongbin3@h-partners.com>	2025-11-14 08:50:29 +08:00
Li Wang	7294f89e43	[CI] Add daily images build for nightly ci (#3989 ) ### What this PR does / why we need it? Given the excessively long build time of our nightly CI, I recommend installing confirmed versions of the necessary packages, including Mooncake and vLLM at fixed tags, in the Docker image to reduce the time required for integration testing. This is expected to reduce nightly CI duration by 2 hours. - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-11-13 20:10:12 +08:00
22dimensions	c272747d13	Upgrade to 0.11.1 newest vllm commit (#3982 ) ### What this PR does / why we need it? adapt vllm-ascend main branch with vllm releases/v0.11.1 fix `forward context not set` in test_vlm.py caused by: https://github.com/vllm-project/vllm/pull/23207 fix import `cdiv round` failed caused by: https://github.com/vllm-project/vllm/pull/27188 fix import `init_cached_hf_modules` failed caused by: https://github.com/vllm-project/vllm/pull/27567 adapt triton kernel `fused_recurrent_gated_delta_rule_fwd_kernel` caused by: https://github.com/vllm-project/vllm/pull/27654 - remove unused code in sigmoid_gating.py - `class FusedRecurrentFunction` , `fused_recurrent_gated_delta_rule`, `fused_recurrent_gated_delta_rule_fwd` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-11-12 23:01:19 +08:00
jiangyunfan1	0e6e08e939	[TEST]Update nightly cases and add mtpx (#4111 ) ### What this PR does / why we need it? This PR updates some nightly test cases and adds mtpx cases, we need to test them daily ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-11-11 17:39:58 +08:00
wangxiyuan	f811a24bf0	Remove VLLM_USE_V1 (#4086 ) Drop VLLM_USE_V1 usage. This env has been removed from vLLM already. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-11 15:43:39 +08:00
zhangxinyuehfad	fae1c59a79	[Fix] Refactor and fix dist test to e2e full test (#3808 ) ### What this PR does / why we need it? Fix CI tests on A3: 1. delete labels 2. fix the filter YAML file name 3. refactor the dist test into the e2e full test 4. skip test_models_distributed_Qwen3_MOE_TP2_WITH_EP & test_models_distributed_Qwen3_MOE_W8A8_WITH_EP because of https://github.com/vllm-project/vllm-ascend/issues/3895 - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-11 10:36:05 +08:00
zhangxinyuehfad	b77b4f1abf	[Test] Add nightly test for DeepSeek-V3.2-Exp (#3908 ) ### What this PR does / why we need it? Add nightly test for DeepSeek-V3.2-Exp ### How was this patch tested? test action: https://github.com/vllm-project/vllm-ascend/actions/runs/19156153634/job/54757008557?pr=3908 - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-11 10:29:57 +08:00
zhaomingyu13	7ffbe73d54	[main][Bugfix] Fix ngram precision issue and open e2e ngram test (#4090 ) ### What this PR does / why we need it? Fix ngram precision issue and open e2e ngram test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: Icey <1790571317@qq.com> Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com> Co-authored-by: Icey <1790571317@qq.com>	2025-11-11 09:06:24 +08:00
zhangxinyuehfad	d40ba52454	[Fix] fix Qwen2-Audio-7B-Instruct accuracy test (#4017 ) ### What this PR does / why we need it? fix Qwen2-Audio-7B-Instruct accuracy test ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-10 11:54:18 +08:00
Levi	0a62e671fb	[Feat] flashcomm_v2 optim solution (#3232 ) ### What this PR does / why we need it? Supports generalized FlashComm2 optimization, which reduces communication overhead, decreases RmsNorm computation, and saves one AllGather step by replacing Allreduce operations in the Attention module with pre-AlltoAll and post-AllGather operations (used in combination with FlashComm1). This feature is enabled during the Prefill phase and is recommended to be used together with FlashComm1, delivering broad performance improvements, especially in long sequence scenarios with large tensor parallelism (TP) configurations. Benchmark tests show that under TP16DP1 configuration, it can improve the prefill performance of the DeepSeek model by 8% on top of FlashComm1. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: zzhxx <2783294813@qq.com> Signed-off-by: Levi-JQ <yujinqi2@huawei.com> Co-authored-by: Levi-JQ <yujinqi2@huawei.com> Co-authored-by: zzhxx <2783294813@qq.com>	2025-11-10 11:01:45 +08:00
jiangyunfan1	c116524379	[TEST]Add qwen3-235b-w8a8 and qwen3-30b-w8a8 nightly test (#3973 ) ### What this PR does / why we need it? This PR adds qwen3-235b-w8a8 and qwen3-30b-w8a8 cases, which we need to test daily. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by running the test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-11-08 18:49:28 +08:00
hucong	48094148f8	[BugFix] Improve the performance of prefixcache features (#4022 ) ### What this PR does / why we need it? A bug caused an idle bubble: calling the npu_paged_cache_load operator forcibly transferred seq_len2 to the device, which triggered a synchronization and interrupted the CPU's operator launch stream. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: underfituu <hzhucong@163.com>	2025-11-08 18:45:31 +08:00
wangx700	24d6314718	[Bugfix] fix sleepmode level2 e2e test (#4019 ) ### What this PR does / why we need it? Enable the sleep mode level 2 e2e test and add a check to ensure NZ is not enabled. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? use e2e tests - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangx700 <wangxin700@huawei.com>	2025-11-08 14:11:55 +08:00
zhangxinyuehfad	737cad2b6b	[Test] Refactor accuracy test to nightly test (#3814 ) ### What this PR does / why we need it? Refactor accuracy test to nightly test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-06 09:06:59 +08:00
zhangxinyuehfad	49e6983b3b	[Test] Add accuracy test for qwen3-30b-a3b-w8a8 (#3807 ) ### What this PR does / why we need it? Add accuracy test for qwen3-30b-a3b-w8a8 This PR depends on https://github.com/vllm-project/vllm-ascend/pull/3799 ### How was this patch tested? qwen3-30b-a3b-w8a8 accuracy test passed: https://github.com/vllm-project/vllm-ascend/actions/runs/19062045267/job/54443732877?pr=3807 - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-04 18:56:31 +08:00
ZengSilong	dc1a6cb503	[Test]Add accuracy test for multiple models (#3823 ) ### What this PR does / why we need it? Add accuracy test for multiple models: - Meta_Llama_3.1_8B_Instruct - Qwen2.5-Omni-7B - Qwen3-VL-8B-Instruct - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: MrZ20 <2609716663@qq.com>	2025-11-04 14:46:39 +08:00
zhangxinyuehfad	646fbac7a9	[Test] Add accuracy test for qwen3-8b-w8a8 (#3799 ) ### What this PR does / why we need it? Add accuracy test for qwen3-8b-w8a8 - vLLM version: v0.11.0rc3 - vLLM main: `c9461e05a4` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-04 09:23:11 +08:00
wangxiyuan	cc2cd42ad3	Upgrade CANN to 8.3.rc1 (#3945 ) ### What this PR does / why we need it? This PR upgrades CANN from 8.2rc1 to 8.3rc1 and removes the CANN version check logic. TODO: we noticed that UTs fail with the CANN 8.3 image, so the base image for UT is still 8.2. We'll fix it later. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-03 20:21:07 +08:00
CodeCat	49d74785c4	[Test] Add new e2e test use deepseek-v2-lite in ge graph mode (#3937 ) ### What this PR does / why we need it? The current test cases lack end-to-end (e2e) testing for the deepseek-v2-lite network in ge graph mode. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: CodeNine-CJ <chenjian343@huawei.com>	2025-11-03 20:10:01 +08:00
Canlin Guo	f99762eb25	[E2E][MM] Add e2e tests for InternVL model (#3796 ) ### What this PR does / why we need it? As a validation for #3664, add end-to-end tests to monitor the InternVL model and ensure it keeps working properly. This PR covers single-card only; models with more than 8B parameters, such as 78B, need to be tested on multiple cards. ### Does this PR introduce _any_ user-facing change? None. ### How was this patch tested? `pytest -sv tests/e2e/singlecard/multi-modal/test_internvl.py` - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-10-31 15:42:47 +08:00
lilinsiman	35a913cf1e	add new e2e tests case for aclgraph memory (#3879 ) ### What this PR does / why we need it? Add a new e2e test case for aclgraph memory ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0rc3 - vLLM main: `83f478bb19` Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-10-31 09:16:52 +08:00
Li Wang	eb0a2ee2d0	[CI] Optimize nightly CI (#3898 ) ### What this PR does / why we need it? This patch mainly fixes the problem of not being able to determine the exit status of the pod's entrypoint script, plus some other small optimizations: 1. Shorten the wait-for-server timeout 2. fix typo 3. fix ais_bench failing to correctly access the proxy URL in a PD separation scenario. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-10-30 23:42:20 +08:00
Song Zhixin	216fc0e8e4	[feature] Prompt Embeddings Support for v1 Engine (#3026 ) ### What this PR does / why we need it? This PR, based on [19746](https://github.com/vllm-project/vllm/issues/19746), adds Prompt Embeddings support for the v1 engine on NPU. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ```shell python examples/prompt_embed_inference.py ``` - vLLM version: v0.11.0 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.1 --------- Signed-off-by: jesse <szxfml@gmail.com>	2025-10-30 17:15:57 +08:00
Meihan-chen	67dd3a4581	[UT] fix skip ut test for test_utils (#3803 ) ### What this PR does / why we need it? [UT] Fix the test_utils unit tests that https://github.com/vllm-project/vllm-ascend/pull/3612 skipped. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? vLLM version: v0.11.0rc3 vLLM main: `17c540a993` - vLLM version: v0.11.0rc3 - vLLM main: `83f478bb19` --------- Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>	2025-10-30 15:52:53 +08:00

628 Commits