xc-llm-ascend/tests
weijinqian0 bdd90c0088 [model_runner_v2]optimize the performance of the post_update. (#7496)
### What this PR does / why we need it?
- This PR optimizes the operators in the `post_update` phase of
`model_runner_v2` on NPUs. This should improve the overall efficiency
and speed of model execution on NPU hardware, which matters in scenarios
that require high-performance inference.
- With a batch size (bs) of 256, the time cost drops from 26 µs to
11 µs (roughly a 58% reduction).

### Does this PR introduce _any_ user-facing change?
No. There are no changes to the API, interfaces, or other high-level
behavior that would affect users' code or their interaction with the
system. The only effect is the performance improvement.

### How was this patch tested?
CI passed with newly added and existing tests. In addition to the
regular CI tests, dedicated benchmark tests were run on NPU hardware to
measure the performance improvement of the `post_update` operators.

---------

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>
2026-03-23 20:29:55 +08:00