xc-llm-ascend

Author	SHA1	Message	Date
whx	14d4ed5f0c	[BugFix] Fix aclgraph accu problem in A2. (#3163) This PR fixes an accuracy problem of aclgraph on A2. The problem was introduced by PR #2980, which exposed the `all_reduce` of shared_experts to torch dynamo. This PR moves all the code into forward_impl to shield it from torch dynamo. - vLLM version: v0.10.2 - vLLM main: `17b4c6685c` --------- Signed-off-by: whx-sjtu <2952154980@qq.com>	2025-09-28 21:31:55 +08:00
weijinqian0	6aa4253798	[Refactor] [SP] The sequence parallelism characteristics in the MoE and Dense models are integrated into a single solution. (#3085) What this PR does / why we need it? There are two sets of SP implementations for MoE and dense models: one is called sequence_parallelism, and the other is flashcomm_v1. We did the following: merged the two sets of code, which share the same implementation, into one; removed the sequence_parallelism implementation, as that solution cannot support aclgraph. Does this PR introduce any user-facing change? No. How was this patch tested? e2e & UT. - vLLM version: v0.10.2 - vLLM main: `f225ea7dd9` --------- Signed-off-by: weijinqian_v1 <weijinqian@huawei.com> Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>	2025-09-24 11:29:59 +08:00
rjg-lyh	bb1f0d5a62	[main] remove the redundant log prints in register_custom_ops.py (#3094) ### What this PR does / why we need it? This PR removes the redundant log prints in register_custom_ops.py to keep the output clean. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with new added/existing test. - vLLM version: v0.10.2 - vLLM main: `9607d5eb44` Signed-off-by: rjg-lyh <1318825571@qq.com>	2025-09-22 17:17:31 +08:00
rjg-lyh	fc2bcbe21c	[Ops] Fix bug in register_custom_ops without forward_context (#2883) ### What this PR does / why we need it? This PR fixes a bug in register_custom_ops when no forward_context is available. A try-except block is added to handle this case. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with new added/existing test. - vLLM version: main - vLLM main: `7920de0a2a` Signed-off-by: rjg-lyh <1318825571@qq.com>	2025-09-12 16:58:08 +08:00
rjg-lyh	0005479b9c	[main] mlp weight prefetch in Qwen Dense Models (#2816) ### What this PR does / why we need it? This PR prefetches the weights of the MLP layers in Qwen Dense Models, mainly to optimize performance in the decode phase. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with new added/existing test. - vLLM version: main - vLLM main: `a1213fae5f` Signed-off-by: rjg-lyh <1318825571@qq.com> Co-authored-by: Shuming19 <313093131@qq.com>	2025-09-11 21:20:09 +08:00
rjg-lyh	1bbb20ea13	[main] flashcomm_v1 optim in Qwen Dense Models (#2802) ### What this PR does / why we need it? Flashcomm_v1 optimization in Qwen Dense Models. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with new added/existing test. - vLLM version: v0.10.1.1 - vLLM main: `5e537f45b4` Co-authored-by: 1024daniel <xxltju324@gmail.com>	2025-09-08 22:52:24 +08:00

6 Commits