xc-llm-ascend

Author	SHA1	Message	Date
Li Wang	3ca11d5a7c	[CI] Fix nightly-ci (#4159 ) ### What this PR does / why we need it? Explicit specification `NUMEXPR_MAX_THREADS` to avoid `Error. nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (64)` Signed-off-by: wangli <wangli858794774@gmail.com>	2025-11-12 22:06:49 +08:00
XiaoxinWang	1b4ce63ec9	fix fullgraph in ds. (#4016 ) ### What this PR does / why we need it? DS doesn't have an 'AscendAttentionMetadataBuilder' class, so it fails in fullgraph mode. We resolved the issue by checking only for 'GDNAttentionMetadataBuilder'; all other attention cases follow the default branch. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>	2025-11-12 10:11:43 +08:00
Canlin Guo	1c677c3b87	[Test][Accuracy] Add accuracy evaluation config for InternVL3_5-8B (#3964 ) ### What this PR does / why we need it? To continuously monitor the accuracy of the InternVL3_5-8B model, this PR adds the corresponding configuration file to the CI. We need to add the `-hf` suffix to avoid incompatibility with the `lm-eval` preprocessor. ### How was this patch tested? `pytest -sv ./tests/e2e/models/test_lm_eval_correctness.py --config ./tests/e2e/models/configs/InternVL3_5-8B.yaml` - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-11-12 09:05:55 +08:00
jiangyunfan1	0e6e08e939	[TEST]Update nightly cases and add mtpx (#4111 ) ### What this PR does / why we need it? This PR updates some nightly test cases and adds mtpx cases, we need to test them daily ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-11-11 17:39:58 +08:00
wangxiyuan	f811a24bf0	Remove VLLM_USE_V1 (#4086 ) Drop VLLM_USE_V1 usage. This env has been removed from vLLM already. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-11 15:43:39 +08:00
zhangxinyuehfad	d5567680a2	[Fixbug] Fix ut test (#4116 ) ### What this PR does / why we need it? Fix ut test：pytest<9.0.0 test_models_distributed_Qwen3_NEXT_MTP_TP4_SIMILARITY failed by https://github.com/vllm-project/vllm-ascend/pull/3967, skip it now, and fix it later. test ok :https://github.com/vllm-project/vllm-ascend/actions/runs/19255274573/job/55048851066?pr=4116 - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-11 15:31:00 +08:00
zhangxinyuehfad	b77b4f1abf	[Test] Add nightly test for DeepSeek-V3.2-Exp (#3908 ) ### What this PR does / why we need it? Add nightly test for DeepSeek-V3.2-Exp ### How was this patch tested? test action： https://github.com/vllm-project/vllm-ascend/actions/runs/19156153634/job/54757008557?pr=3908 - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-11 10:29:57 +08:00
Yikun Jiang	e384755ce1	[Doc] Recover installation doc to use pip install (#4109 ) ### What this PR does / why we need it? Use pip installation in installation doc and change related doctest to validate. ### Does this PR introduce _any_ user-facing change? No, doc only ### How was this patch tested? Doctest related CI passed - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-11-11 09:25:44 +08:00
zhaomingyu13	7ffbe73d54	[main][Bugfix] Fix ngram precision issue and open e2e ngram test (#4090 ) ### What this PR does / why we need it? Fix ngram precision issue and open e2e ngram test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: Icey <1790571317@qq.com> Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com> Co-authored-by: Icey <1790571317@qq.com>	2025-11-11 09:06:24 +08:00
Icey	e04a87f4be	[BugFix] Fixes Qwen3-Next enable nz accuracy problem (#4058 ) ### What this PR does / why we need it? - Fixes Qwen3-Next enable nz accuracy problem ### Does this PR introduce _any_ user-facing change? N/A - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: Icey <1790571317@qq.com> Signed-off-by: wxsIcey <1790571317@qq.com>	2025-11-10 20:54:57 +08:00
zhangxinyuehfad	d40ba52454	[Fix] fix Qwen2-Audio-7B-Instruct accuracy test (#4017 ) ### What this PR does / why we need it? fix Qwen2-Audio-7B-Instruct accuracy test ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-10 11:54:18 +08:00
Levi	0a62e671fb	[Feat] flashcomm_v2 optim solution (#3232 ) ### What this PR does / why we need it? Supports generalized FlashComm2 optimization, which reduces communication overhead, decreases RmsNorm computation, and saves one AllGather step by replacing Allreduce operations in the Attention module with pre-AlltoAll and post-AllGather operations (used in combination with FlashComm1). This feature is enabled during the Prefill phase and is recommended to be used together with FlashComm1, delivering broad performance improvements, especially in long sequence scenarios with large tensor parallelism (TP) configurations. Benchmark tests show that under TP16DP1 configuration, it can improve the prefill performance of the DeepSeek model by 8% on top of FlashComm1. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: zzhxx <2783294813@qq.com> Signed-off-by: Levi-JQ <yujinqi2@huawei.com> Co-authored-by: Levi-JQ <yujinqi2@huawei.com> Co-authored-by: zzhxx <2783294813@qq.com>	2025-11-10 11:01:45 +08:00
jiangyunfan1	c116524379	[TEST]Add qwen3-235b-w8a8 and qwen3-30b-w8a8 nightly test (#3973 ) ### What this PR does / why we need it? This PR adds some qwen3-235b-w8a8 cases qwen3-30b-w8a8 cases, we need test them daily ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by running the test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-11-08 18:49:28 +08:00
wangx700	24d6314718	[Bugfix] fix sleepmode level2 e2e test (#4019 ) ### What this PR does / why we need it? enable sleepmode level2 e2e test and add the check logic to ensure the nz is not enabled. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? use e2e tests - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangx700 <wangxin700@huawei.com>	2025-11-08 14:11:55 +08:00
offline893	f7ca3bc0fa	[CI]Fix eplb ci. (#4052 ) ### What this PR does / why we need it? This pr fixes ci on eplb - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: offline0806 <3337230449@qq.com> Co-authored-by: offline0806 <3337230449@qq.com>	2025-11-07 23:53:35 +08:00
drslark	23b785fdfb	[Feat] Adapted mtp function to Qwen3-next (#3918 ) ### What this PR does / why we need it? Adapts mtp function to Qwen3-next. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: drslark <slarksblood@qq.com>	2025-11-07 16:39:03 +08:00
Li Wang	259eb25f88	[CI] Quick fix mooncake for nightly-ci (#4028 ) ### What this PR does / why we need it? Since we have upgraded to CANN 8.3rc1, we will no longer use the privately maintained Mooncake repository, but instead use the official release released by Mooncake: https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.7.post2 . Next step: this is only a temporary solution. We will integrate mooncake into the vllm-ascend base image later for easier use. see https://github.com/vllm-project/vllm-ascend/pull/3989 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-11-06 18:46:00 +08:00
jiangyunfan1	34b278a339	[TEST]Update nightly acc test standard (#4032 ) ### What this PR does / why we need it? This PR updates the acc test standard for some cases, we need it to better maintain acc ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by running the test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-11-06 16:58:38 +08:00
XiaoxinWang	738bf2b720	support qwen3-next full_decode_only mode. (#3949 ) ### What this PR does / why we need it? Support qwen3-next full_decode_only mode. Results with bs=1, max_token=1024 (branch: tps, e2e time): piecewise: 3.06, 8.15; fulldecodeonly: 7.2, 3.47. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>	2025-11-05 08:46:05 +08:00
zhangxinyuehfad	49e6983b3b	[Test] Add accuracy test for qwen3-30b-a3b-w8a8 (#3807 ) ### What this PR does / why we need it? Add accuracy test for qwen3-30b-a3b-w8a8. This PR depends on https://github.com/vllm-project/vllm-ascend/pull/3799 ### How was this patch tested? qwen3-30b-a3b-w8a8 accuracy test ok: https://github.com/vllm-project/vllm-ascend/actions/runs/19062045267/job/54443732877?pr=3807 - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-04 18:56:31 +08:00
realliujiaxu	bedf223771	[Perf] move quant before allgather in Allgather EP (#3420 ) ### What this PR does / why we need it? Move quant before allgather in Allgather EP. This relies on https://github.com/vllm-project/vllm-ascend/pull/3334 Deepseek R1 W8A8 performance on A2 with `HCCL_ALGO="level0:NA;level1:pipeline"`, mean TTFT (ms) on main vs. this PR by sequence length: 4k: 375.21 vs. 364.99; 16k: 1465.23 vs. 1421.75. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: realliujiaxu <realliujiaxu@163.com>	2025-11-04 16:49:58 +08:00
jiangyunfan1	44b58b8665	[TEST]Add full graph for multimodal nightly tests (#3968 ) ### What this PR does / why we need it? This PR adds full graph for the multimodal nightly test; we need to maintain this scenario ### How was this patch tested? by running the test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-11-04 16:47:48 +08:00
ZengSilong	dc1a6cb503	[Test]Add accuracy test for multiple models (#3823 ) ### What this PR does / why we need it? Add accuracy test for multiple models： - Meta_Llama_3.1_8B_Instruct - Qwen2.5-Omni-7B - Qwen3-VL-8B-Instruct - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: MrZ20 <2609716663@qq.com>	2025-11-04 14:46:39 +08:00
zhangxinyuehfad	646fbac7a9	[Test] Add accuracy test for qwen3-8b-w8a8 (#3799 ) ### What this PR does / why we need it? Add accuracy test for qwen3-8b-w8a8 - vLLM version: v0.11.0rc3 - vLLM main: `c9461e05a4` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-04 09:23:11 +08:00
wangxiyuan	cc2cd42ad3	Upgrade CANN to 8.3.rc1 (#3945 ) ### What this PR does / why we need it? This PR upgrades CANN from 8.2rc1 to 8.3rc1 and removes the CANN version check logic. TODO: we noticed that UT runs fail with the CANN 8.3 image, so the base image for UT is still 8.2. We'll fix it later. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-03 20:21:07 +08:00
CodeCat	49d74785c4	[Test] Add new e2e test use deepseek-v2-lite in ge graph mode (#3937 ) ### What this PR does / why we need it? The current test cases lack end-to-end (e2e) testing for the deepseek-v2-lite network in ge graph mode. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: CodeNine-CJ <chenjian343@huawei.com>	2025-11-03 20:10:01 +08:00
Li Wang	8f222f21f1	[CI][Nightly] Fix mooncake build (#3958 ) ### What this PR does / why we need it? Fix https://github.com/vllm-project/vllm-ascend/pull/3943 - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-11-03 20:07:47 +08:00
Li Wang	d0cc9c1203	[CI][Nightly] Correct the commit hash available for mooncake (#3943 ) ### What this PR does / why we need it? The previous commit hash was accidentally deleted or overwritten. This patch corrects the commit hash available for https://github.com/AscendTransport/Mooncake to make nightly CI happy ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangli <wangli858794774@gmail.com>	2025-11-01 21:52:16 +08:00
Canlin Guo	f99762eb25	[E2E][MM] Add e2e tests for InternVL model (#3796 ) ### What this PR does / why we need it? As a validation for #3664, add end-to-end tests to monitor the InternVL model and ensure its continuous proper operation. This PR covers single-card only, so models with more than 8B parameters, such as 78B, need to be tested on multiple cards. ### Does this PR introduce _any_ user-facing change? None. ### How was this patch tested? `pytest -sv tests/e2e/singlecard/multi-modal/test_internvl.py` - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-10-31 15:42:47 +08:00
lilinsiman	1f486b2dd1	[Test] Add new test model for aclgraph single_request (#3888 ) ### What this PR does / why we need it? add new test model for aclgraph single_request ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-10-31 11:23:13 +08:00
lilinsiman	35a913cf1e	add new e2e tests case for aclgraph memory (#3879 ) ### What this PR does / why we need it? add new e2e tests case for aclgraph memory ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0rc3 - vLLM main: `83f478bb19` Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-10-31 09:16:52 +08:00
Li Wang	eb0a2ee2d0	[CI] Optimize nightly CI (#3898 ) ### What this PR does / why we need it? This patch mainly fixes the problem of not being able to determine the exit status of the pod's entrypoint script, plus some other small optimizations: 1. Shorten the wait-for-server timeout 2. Fix a typo 3. Fix the issue of ais_bench failing to correctly access the proxy URL in a PD separation scenario. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-10-30 23:42:20 +08:00
jiangyunfan1	655a229455	[TEST]Add MALPO for aclgraph in nightly test (#3894 ) ### What this PR does / why we need it? This PR adds MALPO for deepseek aclgraph, we need to test it nightly ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-10-30 18:25:54 +08:00
Song Zhixin	216fc0e8e4	[feature] Prompt Embeddings Support for v1 Engine (#3026 ) ### What this PR does / why we need it? this PR based on [19746](https://github.com/vllm-project/vllm/issues/19746), support Prompt Embeddings for v1 engine on NPU ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ```python python examples/prompt_embed_inference.py ``` - vLLM version: v0.11.0 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.1 --------- Signed-off-by: jesse <szxfml@gmail.com>	2025-10-30 17:15:57 +08:00
xuyexiong	eff3e5fc6f	[FEAT] Refactor spec decode to support efficient padded speculation (#3528 ) ### What this PR does / why we need it? 1. Refactor the file `mtp_proposer.py` and split the torchair-related code into `mtp_torchair_proposer.py` 2. Following https://github.com/vllm-project/vllm/pull/24539, implement padded speculative decoding as described in https://github.com/vllm-project/vllm/issues/21984. ### Does this PR introduce _any_ user-facing change? Users can use `disable_padded_drafter_batch` to disable/enable padded speculation; the default is `False`. offline example: ``` speculative_config={"method": "deepseek_mtp", "num_speculative_tokens": 1, "disable_padded_drafter_batch": False} ``` ### How was this patch tested? - [x] eager with pad/unpad - [x] aclgraph with pad/unpad - [x] torchair with pad/unpad Performance test of deepseek-r1 with tp16, dp1: aclgraph with pad ITL: 168ms; aclgraph with unpad ITL: 169ms; original: 178ms - vLLM version: v0.11.0rc3 - vLLM main: `83f478bb19` --------- Signed-off-by: xuyexiong <xuyexiong@huawei.com>	2025-10-30 16:53:05 +08:00
offline893	14ca1e5cb2	[CI]Fix oom of deepseek-eplb nightly test. (#3884 ) ### What this PR does / why we need it? Fix oom of deepseek-eplb nightly test - vLLM version: v0.11.0rc3 - vLLM main: `83f478bb19` --------- Signed-off-by: offline0806 <3337230449@qq.com> Co-authored-by: offline0806 <3337230449@qq.com>	2025-10-30 10:18:07 +08:00
offline893	5f176ca992	[CI]Fix eplb nightly tests. (#3863 ) ### What this PR does / why we need it? Fix eplb nightly tests. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: `83f478bb19` --------- Signed-off-by: offline0806 <3337230449@qq.com> Co-authored-by: offline0806 <3337230449@qq.com>	2025-10-29 23:06:05 +08:00
Li Wang	4a2ab13743	[CI] Optimize nightly CI (#3858 ) ### What this PR does / why we need it? This patch optimizes nightly CI: 1. Fix the ais_bench `None` repo_type error 2. Fix the kubectl install error on A2 with arm arch 3. Fix the multi_node CI being unable to determine whether the job was successful ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: `83f478bb19` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-10-29 22:30:19 +08:00
Mengqing Cao	900086fdc6	[HybridKV][Bugfix] Fix Hybrid kvcache sharing bug in same attention type (#3760 ) ### What this PR does / why we need it? Part of https://github.com/vllm-project/vllm-ascend/pull/3106 Fix Hybrid kvcache sharing bug in same attention type Change the `shared_by` logic so that the same attention spec could share the same buffer instead of allocating more hbm. After this pr, kvcache memory saved 50% in qwen3-next compared with before (`self_attn:linear_attn=1:3` in an `attn_group`), and `gpu_memory_utilization` could increase to `0.8` on Qwen3-Next when running on A2 64G/card with tp4 <img width="2833" height="1540" alt="image" src="https://github.com/user-attachments/assets/2a91fa99-fb0f-447c-9e8b-acd587890fbe" /> ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Test pass with the latest e2e test case on qwen3-next - vLLM version: v0.11.0rc3 - vLLM main: `c9461e05a4` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-10-29 14:18:52 +08:00
jiangyunfan1	e56b0017a3	[TEST]Add aisbench log and A2 cases (#3841 ) ### What this PR does / why we need it? This PR adds 2 more A2 cases which we need to test daily. It also enhances the logging for aisbench test failures to improve issue identification ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.1 --------- Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-10-28 23:33:15 +08:00
Li Wang	90ae114569	[CI] Fix nightly CI (#3821 ) ### What this PR does / why we need it? This patch fix the nightly CI runs [failure](https://github.com/vllm-project/vllm-ascend/actions/runs/18848144365) ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.1 --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-10-28 20:40:03 +08:00
Li Wang	f846bd20e4	[CI] Add multi-node test case for a2 (#3805 ) ### What this PR does / why we need it? This patch add multi-node test case for a2 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: `c9461e05a4` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-10-27 23:10:17 +08:00
jiangyunfan1	9030106a14	[TEST]Add 2P1D multi node cases for nightly test (#3764 ) ### What this PR does / why we need it? This PR adds the 2P1D multi node func/acc/perf test cases, we need test them daily ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by running the test - vLLM version: v0.11.0rc3 - vLLM main: `c9461e05a4` --------- Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Signed-off-by: wangli <wangli858794774@gmail.com> Co-authored-by: wangli <wangli858794774@gmail.com>	2025-10-27 23:09:15 +08:00
Li Wang	60ee4af6d0	[CI] Add custom op to nightly (#3765 ) ### What this PR does / why we need it? 1. Add custom op to nightly tests, fix https://github.com/vllm-project/vllm-ascend/pull/3665 2. Correctly pass github secrets when using workflow_call, see https://docs.github.com/en/actions/how-tos/reuse-automations/reuse-workflows 3. Fix the single node mutual cancellation issue - vLLM version: v0.11.0rc3 - vLLM main: `c9461e05a4` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-10-27 14:07:03 +08:00
ck-hw-1018	7572939b94	add qwq testcase (#3757 ) ### What this PR does / why we need it? This PR adds a qwq case for nightly test for qwen-qwq on A3 ,we need test them daily ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? by running the test - vLLM version: v0.11.0rc3 - vLLM main: `c9461e05a4` --------- Signed-off-by: ckhw <cuikai1@huawei.com>	2025-10-25 17:11:35 +08:00
Icey	d9cdc65854	Upgrade to new vllm commit (#3719 ) ### What this PR does / why we need it? Upgrade to new vllm commit: `c9461e05a4` - Fix many imports, caused by https://github.com/vllm-project/vllm/pull/26908 - Fix import ```sha256```, caused by https://github.com/vllm-project/vllm/pull/27169 - Remove ```SchedulerConfig.send_delta_data```, caused by https://github.com/vllm-project/vllm/pull/27142 - Fix ```FusedMoE``` because of dual stream execution, caused by https://github.com/vllm-project/vllm/pull/26440 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with new added/existing test. - vLLM version: v0.11.0rc3 - vLLM main: `17c540a993` --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Icey <1790571317@qq.com> Co-authored-by: MengqingCao <cmq0113@163.com>	2025-10-25 15:36:32 +08:00
HuaJiaHeng	11f75883be	[Test] add test for prefix cache feature of deepseek (#3733 ) ### What this PR does / why we need it? This PR adds a prefix cache case for nightly test for DeepSeek-r1-0528-W8A8 on A3, we need test them daily. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0rc3 - vLLM main: `17c540a993` --------- Signed-off-by: root <root@hostname-2pbfv.foreman.pxe> Co-authored-by: root <root@hostname-2pbfv.foreman.pxe>	2025-10-25 14:08:15 +08:00
weichen	63c363d3de	[Refactor] [MoE] Rename moe-related classes & files (#3646 ) ### What this PR does / why we need it? 1. Rename common_fused_moe.py to fused_moe.py. 2. Rename fused_moe_prepare_and_finalize.py / FusedMoEPrepareAndFinalize to prepare_finalize.py / PrepareAndFinalize. 3. Rename vllm_ascend/ops/moe to vllm_ascend/ops/fused_moe. 4. Move vllm_ascend/ops/fused_moe.py to vllm_ascend/ops/fused_moe/fused_moe.py ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? e2e & ut - vLLM version: v0.11.0rc3 - vLLM main: `17c540a993` Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>	2025-10-25 11:22:03 +08:00
zhangxinyuehfad	8f6f967028	[Test] Add e2e test and accuracy test for Qwen3-Next-80B-A3B-Instruct (#3450 ) ### What this PR does / why we need it? Add e2e test and accuracy test for Qwen3-Next-80B-A3B-Instruct ### How was this patch tested? accuracy test: https://github.com/vllm-project/vllm-ascend/actions/runs/18771221544/job/53556027634?pr=3450 ci test: https://github.com/vllm-project/vllm-ascend/actions/runs/18771221530/job/53556027614?pr=3450 <img width="1703" height="562" alt="image" src="https://github.com/user-attachments/assets/973b6cfa-8240-41e3-893a-5024ff8d0693" /> - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-10-25 10:57:56 +08:00
whx	d5609e2c48	[BugFix] Comment out newly added vlm e2e. (#3736 ) This PR comments out the newly added vlm e2e test of the ascend scheduler scenario because it gets stuck when running in multi-batch. It needs to be added back once this issue is resolved. - vLLM version: v0.11.0rc3 - vLLM main: `17c540a993` Signed-off-by: whx-sjtu <2952154980@qq.com>	2025-10-25 10:34:59 +08:00
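
Several commits in this log report before/after figures: #3949 (tps and e2e time), #3420 (mean TTFT) and #3528 (ITL). As a reading aid only, the relative changes can be worked out with a short sketch like the one below. The helper names `speedup` and `pct_reduction` are hypothetical and not part of vllm-ascend. The numbers are copied from the commit messages; the unit of the e2e time in #3949 is not stated there, so it is left unitless here.

```python
def speedup(before: float, after: float) -> float:
    """Ratio after/before for a higher-is-better metric (e.g. tps)."""
    return after / before


def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from before to after for a lower-is-better metric."""
    return (before - after) / before * 100.0


# Figures as reported in the commit messages (not re-measured here).
throughput = {"#3949 tps, piecewise -> fulldecodeonly": (3.06, 7.2)}
latency = {
    "#3949 e2e time, piecewise -> fulldecodeonly": (8.15, 3.47),
    "#3420 mean TTFT (ms) at 4k, main -> PR": (375.21, 364.99),
    "#3420 mean TTFT (ms) at 16k, main -> PR": (1465.23, 1421.75),
    "#3528 ITL (ms), original -> aclgraph with pad": (178.0, 168.0),
    "#3528 ITL (ms), original -> aclgraph with unpad": (178.0, 169.0),
}

for label, (before, after) in throughput.items():
    print(f"{label}: {speedup(before, after):.2f}x")

for label, (before, after) in latency.items():
    print(f"{label}: {pct_reduction(before, after):.2f}% lower")
```

The two helpers are kept separate because throughput and latency improve in opposite directions; mixing them in one ratio makes the sign of a change easy to misread.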

624 Commits