luomin2005
f41eeeb11e
Refactor the ops PyTorch adapter, cleanup for csrc/torch_binding.cpp ( #6732 )
...
### What this PR does / why we need it?
Refactor the ops PyTorch adapter and clean up csrc/torch_binding.cpp.
For more details, see
https://github.com/vllm-project/vllm-ascend/issues/6486
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Installed the new package to test the modification; here is the
result:
- vLLM version: v0.15.0
- vLLM main:
9562912cea
---------
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: luomin2005 <luomin2005@huawei.com>
Co-authored-by: liziyu <56102866+liziyu179@users.noreply.github.com>
Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>
2026-02-24 09:12:43 +08:00
Trunrain
91bf524364
[BugFix][kernel] fix matmul_allreduce_add_rmsnorm_kernel ( #5335 )
...
### What this PR does / why we need it?
Fix matmul_allreduce_add_rmsnorm_kernel; add the HCCL Init and
SetCcTiling interfaces.
The test case uses 4 cards (multicard-4).
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?
pytest -sv tests/e2e/nightly/ops/test_matmul_allreduce_add_rmsnorm.py
passes on 4 cards (multicard-4):
https://github.com/vllm-project/vllm-ascend/actions/runs/20502630658/job/58914474652?pr=5335
- vLLM version: release/v0.13.0
- vLLM main:
bc0a5a0c08
Signed-off-by: tongrunze <t00574058@china.huawei.com>
Co-authored-by: tongrunze <t00574058@china.huawei.com>
2026-01-05 15:19:54 +08:00
Trunrain
141bd913e1
restore matmul_allreduce_add_rmsnorm aclnn interface ( #5119 )
...
**What this PR does / why we need it?**
Restore the A2 matmul_allreduce_add_rmsnorm kernel aclnn interface.
**How was this patch tested?**
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
Signed-off-by: tongrunze <t00574058@china.huawei.com>
Co-authored-by: tongrunze <t00574058@china.huawei.com>
2025-12-19 17:06:59 +08:00
zhenwenqi2024
eb4c08f05d
[bugfix] fix mtp accept rate ( #5093 )
...
### What this PR does / why we need it?
1. npu_model_runner now reuses gpu_model_runner, so this PR deletes some
attributes already defined in gpu_model_runner
2. fix the MTP accept rate by disabling in_profile_run
3. remove redundant MoE method-selection logic
4. revert vllm-project/vllm-ascend#5082 , which broke CI in
https://github.com/vllm-project/vllm-ascend/actions/runs/20266314048/job/58190426832?pr=5088
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-12-17 01:35:26 +08:00
Trunrain
af64087732
[bugfix] matmul_allreduce_add_rmsnorm aclnn interface ( #5082 )
...
What this PR does / why we need it?
Fix the A2 kernel aclnn interface by declaring it with extern "C" linkage.
Does this PR introduce any user-facing change?
No
How was this patch tested?
vLLM version: v0.12.0
Signed-off-by: tongrunze <t00574058@china.huawei.com>
Co-authored-by: tongrunze <t00574058@china.huawei.com>
2025-12-16 17:36:40 +08:00
Trunrain
ba9cda9dfd
[Kernel] add custom op MatmulAllreduceAddRmsnorm ( #4606 )
...
What this PR does / why we need it?
Add a fused operator (Matmul + AllReduce + Add + RMSNorm) to optimize
Qwen3 32B.
Does this PR introduce _any_ user-facing change?
No
How was this patch tested?
vLLM version: v0.11.2
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2
Signed-off-by: tongrunze <t00574058@china.huawei.com>
Co-authored-by: tongrunze <t00574058@china.huawei.com>
2025-12-10 09:05:33 +08:00