xc-llm-ascend

Files

ice_rain 09682e0751 [Bugfix] Fix matmul allreduce precision issue by using original weight (#4939)

### What this PR does / why we need it?

This PR fixes a precision issue caused by improper tensor maintenance in
`vllm_ascend/ops/linear_op.py` in the Verl reinforcement learning
(RL) scenario. Issue:
https://github.com/vllm-project/vllm-ascend/issues/5747

Key changes:
1. Remove the custom class member `self.weight_t` in
`vllm_ascend/ops/linear_op.py`.
2. Change the inputs of the `npu_mm_all_reduce_base` operator so that it
fetches the weight directly from the model's `nn.Parameter` objects
instead of using pre-created Tensors.

> In a vLLM model, avoid creating additional
parameter copies (such as `self.weight_t`) for computation; if such
copies already exist, they must be kept synchronized with the model's
original parameters. In the Verl reinforcement learning (RL) scenario,
parameter synchronization between training and inference can change the
memory address of an `nn.Parameter`. An extra Tensor that is not
synchronized keeps referencing the old memory and does not pick up the
updated values, which ultimately leads to precision issues.
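
The stale-copy problem described in the note above can be illustrated with a plain-Python analogy. This is only a sketch: `Param`, `CachedLinear`, and `LiveLinear` are hypothetical names, nested lists stand in for tensors, and neither `torch` nor `npu_mm_all_reduce_base` is used. It assumes that weight synchronization rebinds the parameter's storage, as the note describes for the Verl RL scenario.

```python
# Plain-Python analogy of a stale pre-created weight copy.
# All class and helper names are hypothetical, not vllm-ascend code.

def transpose(m):
    """Return a new, independent transposed copy of a 2-D list."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Naive matrix product of two 2-D lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

class Param:
    """Stand-in for a model parameter whose storage can be rebound."""
    def __init__(self, data):
        self.data = data

class CachedLinear:
    """Creates a transposed copy once, like a `self.weight_t` member."""
    def __init__(self, weight):
        self.weight = weight
        self.weight_t = transpose(weight.data)  # snapshot of the old storage

    def forward(self, x):
        return matmul(x, self.weight_t)

class LiveLinear:
    """Reads the weight from the parameter on every call."""
    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        return matmul(x, transpose(self.weight.data))

weight = Param([[1.0, 2.0], [3.0, 4.0]])
cached, live = CachedLinear(weight), LiveLinear(weight)
x = [[1.0, 1.0]]

# Simulated weight sync: the parameter now points at new storage.
weight.data = [[10.0, 20.0], [30.0, 40.0]]

stale_out = cached.forward(x)  # still computed from the old weights
fresh_out = live.forward(x)    # computed from the updated weights
print(stale_out, fresh_out)
```

In this sketch the cached layer keeps returning results from the pre-sync weights, while the layer that reads the parameter at call time follows the update, which is the behavior the PR aims for by dropping `self.weight_t`.
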
### Does this PR introduce _any_ user-facing change?
No.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: icerain-alt <450125138@qq.com>
Co-authored-by: Shangwei-Li <lishangwei@mail.ustc.edu.cn>

2026-01-09 16:05:32 +08:00

fused_moe

[CustomOp] support TensorList for dispatchFFNCombine (#5665)

2026-01-09 15:56:29 +08:00

triton

[Feature] add the magicmtp speculative decoding acceleration algorithm (#5542)

2026-01-08 09:15:55 +08:00

__init__.py

[Fusion] [Graph] Add qknorm rope fusion operator (#4711)

2025-12-17 08:53:44 +08:00

activation.py

[refact] unified soc_version code (#4359)