liuhy1213-cell
fb283b5820
[CI] Add nightly CI test cases for GLM-5 (#7429)
### What this PR does / why we need it?
Add nightly CI test cases for GLM-5.
Add model download for GLM-5.
CI run: https://github.com/vllm-project/vllm-ascend/actions/runs/23286178651/job/67710409642#logs
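For context, a nightly model case of this kind can be as small as a greedy-decoding smoke check against vLLM's offline API. The sketch below is illustrative only: the model path, TP size, prompt, and expected substring are assumptions, not the contents of the actual GLM-5 case.

```python
# Minimal sketch of a nightly smoke test via vLLM's offline API.
# All specifics (model path, TP size, prompt, expected substring) are
# illustrative assumptions; the real GLM-5 case lives in the test suite.
from vllm import LLM, SamplingParams


def test_glm5_smoke():
    llm = LLM(model="zai-org/GLM-5", tensor_parallel_size=8)  # hypothetical
    params = SamplingParams(temperature=0.0, max_tokens=32)   # greedy decode
    outputs = llm.generate(["The capital of France is"], params)
    text = outputs[0].outputs[0].text
    assert "Paris" in text  # basic sanity check on deterministic output
```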
- vLLM version: v0.17.0
- vLLM main:
b31e9326a7
---------
Signed-off-by: liuhaiyang27 <liuhaiyang27@huawei.com>
Signed-off-by: liuhy1213-cell <liuhy1213@gmail.com>
Co-authored-by: liuhaiyang27 <liuhaiyang27@huawei.com>
2026-03-23 19:14:19 +08:00
aipaes
87d6424b2e
[CI] Add nightly CI test cases for the GLM-4.7 model. (#7391)
### What this PR does / why we need it?
Add accuracy nightly CI test cases for the GLM-4.7 model.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Through CI.
- vLLM version: v0.17.0
- vLLM main:
4034c3d32e
---------
Signed-off-by: zjks98 <zhangjiakang4@huawei.com>
Co-authored-by: zjks98 <zhangjiakang4@huawei.com>
2026-03-19 16:43:29 +08:00
LoganJane
270c5cb8cd
[CI] Add nightly CI test cases for Kimi-K2.5 (#7416)
### What this PR does / why we need it?
Add nightly CI test cases for Kimi-K2.5.
- vLLM version: v0.17.0
- vLLM main:
4497431df6
---------
Signed-off-by: LoganJane <loganJane73@hotmail.com>
Signed-off-by: LoganJane <42287016+LoganJane@users.noreply.github.com>
2026-03-19 11:02:29 +08:00
SparrowMu
fb8e22ec00
[DOC] MiniMax-M2.5 model intro (#7296)
### What this PR does / why we need it?
1. Add a nightly test for MiniMax-M2.5 using the A3 deployment method.
2. Add a MiniMax-M2.5 deployment introduction to the vllm-ascend docs.
- vLLM version: v0.17.0
- vLLM main:
4034c3d32e
---------
Signed-off-by: limuyuan <limuyuan3@huawei.com>
Signed-off-by: SparrowMu <52023119+SparrowMu@users.noreply.github.com>
Co-authored-by: limuyuan <limuyuan3@huawei.com>
2026-03-18 20:14:36 +08:00
liuhy1213-cell
58725b8b24
[doc] Add Prefill-Decode Disaggregation doc for GLM5.md (#7300)
### What this PR does / why we need it?
Add a Prefill-Decode Disaggregation deployment doc for GLM5.md.
Benchmark summary (W8A8, 65k input / 1.5k output):
- Concurrency: 80
- Prefix-cache hit rate: 90%
- TPS: 2054
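As background, Prefill-Decode disaggregation in vLLM runs one instance as the prefill node (KV producer) and another as the decode node (KV consumer), linked by a KV-transfer config. The sketch below uses vLLM's generic API with a placeholder connector name; the Ascend-specific connector and launch steps are what GLM5.md covers, and all values here are assumptions.

```python
# Sketch of the prefill side of a Prefill-Decode disaggregated deployment
# using vLLM's KV-transfer API. The connector name, ranks, and model path
# are placeholders; the Ascend-specific setup is described in GLM5.md.
from vllm import LLM
from vllm.config import KVTransferConfig

kv_cfg = KVTransferConfig(
    kv_connector="PyNcclConnector",  # placeholder; Ascend uses its own connector
    kv_role="kv_producer",           # this instance handles prefill only
    kv_rank=0,
    kv_parallel_size=2,              # one producer + one consumer
)
llm = LLM(model="zai-org/GLM-5", kv_transfer_config=kv_cfg)  # hypothetical path
```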
- vLLM version: v0.17.0
- vLLM main:
4034c3d32e
---------
Signed-off-by: liuhaiyang27 <liuhaiyang27@huawei.com>
Co-authored-by: liuhaiyang27 <liuhaiyang27@huawei.com>
2026-03-18 17:00:31 +08:00
LeeWenquan
65eae6de7b
Add Ascend Ops recurrent_gated_delta_rule (#6725)
### What this PR does / why we need it?
Port the recurrent_gated_delta_rule op from the Triton implementation to an
Ascend C version for better performance.
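For reference, the recurrence this op computes (the gated delta rule used by linear-attention models such as Qwen3-Next) can be written in a few lines of PyTorch. This is a single-head readability sketch of the math, not the Ascend C kernel, and the exact gating/normalization conventions of the production op may differ.

```python
import torch


def recurrent_gated_delta_rule_ref(q, k, v, alpha, beta):
    """Single-head reference recurrence (sketch only; conventions may
    differ from the production kernel).

    q, k: (T, d_k); v: (T, d_v); alpha, beta: (T,) decay/write strengths.
    """
    d_v, d_k = v.shape[-1], k.shape[-1]
    S = k.new_zeros(d_v, d_k)  # recurrent state mapping keys -> values
    outs = []
    for t in range(len(k)):
        S = alpha[t] * S                                   # decay old associations
        pred = S @ k[t]                                    # current prediction for k_t
        S = S + beta[t] * torch.outer(v[t] - pred, k[t])   # delta-rule write
        outs.append(S @ q[t])                              # read out at query q_t
    return torch.stack(outs)                               # (T, d_v)
```

The per-timestep dependence on the state `S` forces a sequential scan, which is presumably why a fused native kernel beats the generic Triton lowering on Ascend NPUs: the state stays resident on-device instead of round-tripping through many small launches.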
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main:
9562912cea
---------
Signed-off-by: SunnyLee219 <3294305115@qq.com>
2026-03-09 14:14:14 +08:00
SILONG ZENG
859f2c25b9
[Nightly][Refactor] Migrate nightly single-node model tests from .py to .yaml (#6503)
### What this PR does / why we need it?
This PR refactors the nightly single-node model tests by migrating test
configurations from Python scripts to a more maintainable YAML-based
format; a minimal sketch of the new format follows the table.
| Original PR | Python (`.py`) | YAML (`.yaml`) |
| :--- | :--- | :--- |
| [#3568](https://github.com/vllm-project/vllm-ascend/pull/3568) | `test_deepseek_r1_0528_w8a8_eplb.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#3631](https://github.com/vllm-project/vllm-ascend/pull/3631) | `test_deepseek_r1_0528_w8a8.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#5874](https://github.com/vllm-project/vllm-ascend/pull/5874) | `test_deepseek_r1_w8a8_hbm.py` | `DeepSeek-R1-W8A8-HBM.yaml` |
| [#3908](https://github.com/vllm-project/vllm-ascend/pull/3908) | `test_deepseek_v3_2_w8a8.py` | `DeepSeek-V3.2-W8A8.yaml` |
| [#5682](https://github.com/vllm-project/vllm-ascend/pull/5682) | `test_kimi_k2_thinking.py` | `Kimi-K2-Thinking.yaml` |
| [#4111](https://github.com/vllm-project/vllm-ascend/pull/4111) | `test_mtpx_deepseek_r1_0528_w8a8.py` | `MTPX-DeepSeek-R1-0528-W8A8.yaml` |
| [#3733](https://github.com/vllm-project/vllm-ascend/pull/3733) | `test_prefix_cache_deepseek_r1_0528_w8a8.py` | `Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml` |
| [#6543](https://github.com/vllm-project/vllm-ascend/pull/6543) | `test_qwen3_235b_w8a8.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#6543](https://github.com/vllm-project/vllm-ascend/pull/6543) | `test_qwen3_235b_a22b_w8a8_eplb.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#3973](https://github.com/vllm-project/vllm-ascend/pull/3973) | `test_qwen3_30b_w8a8.py` | `Qwen3-30B-A3B-W8A8.yaml` |
| [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541) | `test_qwen3_32b_int8.py` | `Qwen3-32B-Int8.yaml` |
| [#3757](https://github.com/vllm-project/vllm-ascend/pull/3757) | `test_qwq_32b.py` | `QwQ-32B.yaml` |
| [#5616](https://github.com/vllm-project/vllm-ascend/pull/5616) | `test_qwen3_next_w8a8.py` | `Qwen3-Next-80B-A3B-Instruct-W8A8.yaml` |
| [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541) | `test_qwen2_5_vl_7b.py` | `Qwen2.5-VL-7B-Instruct.yaml` |
| [#5301](https://github.com/vllm-project/vllm-ascend/pull/5301) | `test_qwen2_5_vl_7b_epd.py` | `Qwen2.5-VL-7B-Instruct-EPD.yaml` |
| [#3707](https://github.com/vllm-project/vllm-ascend/pull/3707) | `test_qwen2_5_vl_32b.py` | `Qwen2.5-VL-32B-Instruct.yaml` |
| [#3676](https://github.com/vllm-project/vllm-ascend/pull/3676) | `test_qwen3_32b_int8_a3_feature_stack3.py` | `Qwen3-32B-Int8-A3-Feature-Stack3.yaml` |
| [#3709](https://github.com/vllm-project/vllm-ascend/pull/3709) | `test_prefix_cache_qwen3_32b_int8.py` | `Prefix-Cache-Qwen3-32B-Int8.yaml` |
| [#5395](https://github.com/vllm-project/vllm-ascend/pull/5395) | `test_qwen3_next.py` | `Qwen3-Next-80B-A3B-Instruct-A2.yaml` |
| [#3474](https://github.com/vllm-project/vllm-ascend/pull/3474) | `test_qwen3_32b.py` | `Qwen3-32B.yaml` |
| [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541) | `test_qwen3_32b_int8.py` | `Qwen3-32B-Int8-A2.yaml` |
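To make the change concrete, here is a sketch of what one migrated config might contain and how a harness could consume it. The field names below are assumptions for illustration; the actual schema is whatever the vllm-ascend nightly harness defines.

```python
# Illustrative only: the YAML schema (model, server parameters, accuracy
# baseline) is an assumption, not the actual vllm-ascend nightly schema.
import yaml

EXAMPLE_CONFIG = """
model_name: Qwen3-32B
server_parameters:
  tensor_parallel_size: 4
  max_model_len: 32768
accuracy:
  dataset: gsm8k
  baseline: 0.85
"""

config = yaml.safe_load(EXAMPLE_CONFIG)
assert 0.0 < config["accuracy"]["baseline"] <= 1.0  # sanity-check the target
print(config["model_name"], config["server_parameters"]["tensor_parallel_size"])
```

The design benefit is that adding or tuning a model case becomes a data-only change reviewed as a small YAML diff, instead of a new Python script duplicating harness logic.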
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
2026-03-03 20:13:43 +08:00