xc-llm-ascend

Author	SHA1	Message	Date
huangxialu	dceef080b1	[main] remove torch.cat and replace it by List[0] (#2153) ### What this PR does / why we need it? torch_npu.npu_grouped_matmul: https://www.hiascend.com/document/detail/zh/Pytorch/710/apiref/torchnpuCustomsapi/context/torch_npu-npu_grouped_matmul.md According to the document, when `split_item` is 2 or 3, `torch_npu.npu_grouped_matmul` will return a list which has one element. Therefore, the `torch.cat` after `torch_npu.npu_grouped_matmul` is unnecessary. ### Does this PR introduce _any_ user-facing change? not involved ### How was this patch tested? ut and e2e covered: `tests/ut/ops/test_fused_ops.py`, `tests/e2e/singlecard/ops/test_fused_moe.py` performance: (qwen3 30B, 2k->20k) base: Total Token throughput (tok/s): 667.76 remove cat: Total Token throughput (tok/s): 680.82 - vLLM version: v0.10.0 - vLLM main: `fa00c5d75b` Signed-off-by: huangxialu <huangxialu1@huawei.com>	2025-08-07 17:20:19 +08:00
wangxiyuan	36e450eb0f	[Misc] Nit fix for disaggregated_prefill and ascend_forward_context (#2097) we recently added disaggregated_prefill and ascend_forward_context feature by `ba3dfbd59e` and `df0ec55162`. This PR fix some nit introduced by them to make the code clear. 1. drop `current_platform` usage. It'll lead unknown circular import error in some case 2. update `set_ascend_forward_context` function to make the logic clear. for example, remove V0 support in this function. 3. Remove useless `self.local_rank_across_dp` in worker 4. Remove `soc_info.py` to use `get_ascend_soc_version` instead. - vLLM version: v0.10.0 - vLLM main: `02f82fe438` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-08-05 08:39:02 +08:00
huangxialu	9c9a7cd90b	[main] adapt usage of npu_moe_gating_top_k_softmax and remove envs.SELECT_GATING_TOPK_SOTFMAX_EXPERTS (#2112) backport of v0.9.1-dev: https://github.com/vllm-project/vllm-ascend/pull/1902 origin main npu_moe_gating_top_k_softmax: https://github.com/vllm-project/vllm-ascend/pull/1355 - vLLM version: v0.10.0 - vLLM main: `055bd3978e` Signed-off-by: huangxialu <huangxialu1@huawei.com>	2025-07-31 21:05:56 +08:00
zzzzwwjj	ba3dfbd59e	[main][refactor] Refactoring forward_context and model_runner_v1 (#1979) ### What this PR does / why we need it? A refactoring of forward_context and model_runner_v1, add some context which is necessary in model inference into forward_context, and refactor dummy_run logic, make it more reasonable. Some details for this PR: Add `ascend_forward_context`; Update mc2_v2 op, and support `active_mask` param; Update scripts in examples dir; refactor `dummy_run` logic; Add soc_version for A2 and A3; ### Does this PR introduce _any_ user-facing change? No change at user-facing. ### How was this patch tested? - vLLM version: v0.10.0 - vLLM main: `57c22e57f9` Signed-off-by: zzzzwwjj <1183291235@qq.com>	2025-07-28 14:06:20 +08:00
Zac	2ffe051859	[Test]add ut for deepseek_v2. (#1964) What this PR does / why we need it? Add uts for deepseek_v2 Does this PR introduce any user-facing change? No How was this patch tested? - vLLM version: v0.9.2 - vLLM main: `f3137cdd81` --------- Signed-off-by: 张帮政 <zhangbangzheng@huawei.com>	2025-07-24 10:27:50 +08:00
shiyuan680	ac0bf133f4	add ut of fused_moe.py (#1930) ### What this PR does / why we need it? add unit test for fused_moe.py - vLLM version: v0.9.2 - vLLM main: `2dec7c1a5d` Signed-off-by: yangcheng <yangcheng104@huawei.com> Co-authored-by: yangcheng <yangcheng104@huawei.com>	2025-07-23 16:24:09 +08:00
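
For context on commit `dceef080b1` (#2153), the two throughput figures it reports for Qwen3 30B (2k->20k) can be turned into a relative gain. This is only arithmetic on the numbers quoted in that commit message, rounded to two decimal places; it is not an independent benchmark:

$$
\frac{680.82 - 667.76}{667.76} = \frac{13.06}{667.76} \approx 0.0196 \quad (\approx 1.96\%)
$$

So, under that single reported configuration, dropping the `torch.cat` after `torch_npu.npu_grouped_matmul` corresponds to roughly a 2% increase in total token throughput.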

6 Commits