Files

weijinqian0 bdd90c0088 [model_runner_v2] optimize the performance of the post_update. (#7496)

### What this PR does / why we need it?
- This PR improves operator performance in the `post_update` phase of
`model_runner_v2` on NPUs. Faster operators in this phase reduce
per-step overhead, which matters for scenarios that require
high-performance inference on NPU hardware.
- At batch size (bs) 256, the time cost drops from 26 us to 11 us
(roughly a 2.4x speedup).

### Does this PR introduce _any_ user-facing change?
No, there are no changes to the API, interface, or other high-level
behaviors that would directly affect the user's code or interaction with
the system beyond the performance improvement.

### How was this patch tested?
CI passed with newly added and existing tests. In addition to the regular
CI tests, dedicated benchmarks were run on NPU hardware to measure the
performance improvement of the `post_update` operators.
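
The PR does not include its benchmark script, so the following is only a minimal sketch of how a per-call latency figure in microseconds could be collected. `measure_latency_us` and `post_update_stub` are hypothetical names introduced here for illustration; the stub is a plain-Python stand-in and not the actual `post_update` implementation. On an accelerator, the timed callable would also need to synchronize with the device before the timer is read, because kernel launches are typically asynchronous.

```python
import statistics
import time


def measure_latency_us(fn, *, warmup=10, iters=100):
    """Return the median wall-clock latency of fn() in microseconds."""
    # Warm-up calls keep one-time costs (caches, lazy init) out of the samples.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter_ns()
        fn()
        # Convert nanoseconds to microseconds.
        samples.append((time.perf_counter_ns() - start) / 1_000)
    return statistics.median(samples)


def post_update_stub(batch_size=256):
    # Hypothetical stand-in for the real post_update step:
    # it just bumps one counter per request in the batch.
    counts = [0] * batch_size
    for i in range(batch_size):
        counts[i] += 1
    return counts


latency_us = measure_latency_us(post_update_stub)
print(f"median latency: {latency_us:.1f} us")
```

Reporting the median rather than the mean keeps occasional scheduling hiccups from skewing a measurement at the tens-of-microseconds scale.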

---------

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>

2026-03-23 20:29:55 +08:00

model_states

[Feature] support aclgraph for model runner v2 (#7110)

2026-03-13 09:11:46 +08:00

sample

[model_runner_v2] optimize the performance of the _topk_log_softmax_kernel (#7221)

2026-03-16 16:49:10 +08:00

spec_decode

[Feature] support aclgraph for model runner v2 (#7110)

2026-03-13 09:11:46 +08:00

__init__.py

implement model runner v2 basic framework (#5051)