xc-llm-ascend

Author	SHA1	Message	Date
Nengjun Ma	3b59f20a28	update to vllm 12-19 (#5223) ### Does this PR introduce _any_ user-facing change? Fix vLLM breakages: 1. [Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement](https://github.com/vllm-project/vllm/pull/29558) Fix: Add the now-required `all2all_backend` parameter. Its only effect on the original `set_splitting_ops_for_v1` implementation is that `vllm` disables graph mode when `deepep_high_throughput` is enabled; it does not affect `vllm-ascend` logic. 2. [Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention interface](https://github.com/vllm-project/vllm/pull/30684) Fix: The GPU path needs no qkv conversion because the GPU flash_attention operator accepts both 4D (b s h d) and 3D (s b (h d)) inputs, whereas the NPU flash_attention_unpad operator accepts only 3D (s b (h d)) input. We therefore introduce a reshape_qkv_to_3d operation. 3. Skip the Tencent-Hunyuan/HunyuanOCR test case, which fails on the upgraded vLLM code: https://github.com/vllm-project/vllm-ascend/issues/5297 - vLLM version: release/v0.13.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: leo-pony <nengjunma@outlook.com> Signed-off-by: zxwang <1476209578@qq.com> Co-authored-by: zxwang <1476209578@qq.com>	2025-12-23 23:52:11 +08:00
meihanc	592cfb6a6f	[CI] Add Triton Ascend in CI (#4921) Add triton-ascend to the UT and e2e tests. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>	2025-12-23 12:47:35 +08:00
wangxiyuan	758d81dcb1	Drop 0.12.0 support (#5146) We plan to release v0.13.0 soon, so 0.12.0 no longer needs to be supported. This PR drops it. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-20 09:38:53 +08:00
wangxiyuan	636265be6d	[CI] Improve CI (#5078) Rename workflows for clarity. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-19 15:34:35 +08:00
ZixuanWang	b1a853b0f6	Upgrade vllm commit hash to 1216 (#5053) ### What this PR does / why we need it? Upstream vLLM PR #30212 (https://github.com/vllm-project/vllm/pull/30212) refactored the attention backend selection interface. This PR adapts vllm-ascend's get_attn_backend_cls to the new upstream interface, keeping the two compatible and reducing maintenance overhead. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: zxwang <1476209578@qq.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: leo-pony <nengjunma@outlook.com>	2025-12-17 08:48:36 +08:00
dependabot[bot]	5f840696c1	Bump actions/checkout from 4 to 6 (#5015) Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-12-16 11:30:41 +08:00
Li Wang	6063853ead	[Misc] Upgrade vllm commit hash to 1215 (#5029) ### What this PR does / why we need it? Upgrade the vllm commit hash to `4429d934de3c5cc327b0d7aec8e473aeba38db90`. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-12-16 09:23:02 +08:00
Li Wang	8d2998d0e4	[Misc] Upgrade vllm hash to 12_14 (#5000) ### Does this PR introduce _any_ user-facing change? 1. Fix https://github.com/vllm-project/vllm/pull/27938 2. Fix https://github.com/vllm-project/vllm/pull/27145 (pooling models now support chunked prefill and prefix caching) 3. Fix https://github.com/vllm-project/vllm/pull/30181 (define the CPU fields in the field config where they really belong) 4. Fix https://github.com/vllm-project/vllm/pull/28168 (define the CPU fields in the field config where they really belong) 5. Fix https://github.com/vllm-project/vllm/pull/30201 (module renames) 6. Fix https://github.com/vllm-project/vllm/pull/29067 (FusedMoE module refactor) 7. Fix https://github.com/vllm-project/vllm/pull/29066 (FusedMoE module refactor) 8. Fix https://github.com/vllm-project/vllm/pull/29624 - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-12-15 19:54:23 +08:00
wangxiyuan	8090914d69	[CI] CI refactor (#4928) 1. Rename workflows to clearer names 2. Fix lint error 3. Remove the accuracy report doc and test - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-14 11:09:56 +08:00
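
Commit 3b59f20a28 adds a reshape_qkv_to_3d step because, per its message, the NPU flash_attention_unpad operator accepts only 3D `s b (h d)` input while the GPU operator also accepts 4D `b s h d`. The sketch below illustrates that layout conversion in plain Python, under stated assumptions: the function name `to_s_b_hd` is hypothetical, the input is assumed to be laid out as `b s h d` exactly as the commit message writes it, and the real vllm-ascend helper may order axes differently or operate directly on torch tensors.

```python
from typing import List

# Nested-list stand-ins for tensors (assumption: plain lists, not torch tensors).
Tensor4D = List[List[List[List[float]]]]
Tensor3D = List[List[List[float]]]


def to_s_b_hd(x: Tensor4D) -> Tensor3D:
    """Hypothetical sketch: convert a (b, s, h, d) layout into (s, b, h*d).

    Swaps the batch and sequence axes and flattens each token's
    (h, d) block of heads into a single vector of length h*d.
    """
    batch = len(x)
    seq = len(x[0]) if batch else 0
    return [
        [
            # Concatenate the h head vectors of token (bi, si): (h, d) -> (h*d,)
            [value for head in x[bi][si] for value in head]
            for bi in range(batch)
        ]
        for si in range(seq)
    ]


# One batch, one token, two heads of size two.
print(to_s_b_hd([[[[1, 2], [3, 4]]]]))  # [[[1, 2, 3, 4]]]
```

With PyTorch tensors the same layout change would be `q.permute(1, 0, 2, 3).reshape(s, b, h * d)`; nested lists are used here only so the sketch runs without extra dependencies.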


59 Commits