xc-llm-ascend/vllm_ascend
ttanzhiqiang 4270682383 Waiting for BMM NZ support (improve TPOT by 2 ms) (#1131)
### What this PR does / why we need it?
W_UV and W_UK_T must not be converted to the NZ format, because the matmul
at this position is fused into transposebatchmatmul, which does not support
NZ. If the weights are stored as NZ, they are converted back to ND on every
run.
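
The rule described above can be sketched as a small format-selection helper. This is only an illustration under assumptions: `NZ_UNSUPPORTED_OPS`, `choose_weight_format`, the `consumers` mapping, and `other_weight` are hypothetical names, not vllm-ascend APIs. Only W_UV, W_UK_T, and transposebatchmatmul come from the description.

```python
# Hypothetical sketch: none of these names are vllm-ascend APIs.
# Ops that, per the description above, cannot consume NZ-format weights.
NZ_UNSUPPORTED_OPS = frozenset({"transposebatchmatmul"})


def choose_weight_format(consumer_op: str) -> str:
    """Return "nz" only when the op that reads the weight accepts NZ."""
    return "nd" if consumer_op in NZ_UNSUPPORTED_OPS else "nz"


# Assumed mapping from weight name to the (fused) op that consumes it.
consumers = {
    "W_UV": "transposebatchmatmul",
    "W_UK_T": "transposebatchmatmul",
    "other_weight": "matmul",  # hypothetical NZ-capable consumer
}

# Keep ND for weights whose consumer would otherwise convert them back
# from NZ on every run.
formats = {name: choose_weight_format(op) for name, op in consumers.items()}
print(formats)
```

Deciding the format once at load time avoids paying for an NZ-to-ND conversion on each forward pass.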

### Does this PR introduce _any_ user-facing change?
With #1098 as the baseline, p90 TPOT improves from 90.79 ms to 88.58 ms, a reduction of about 2.2 ms.
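
As a quick check of the figures quoted above, the absolute and relative savings can be computed directly. The variable names are illustrative, and the two latencies are the only inputs taken from the description.

```python
# p90 TPOT figures quoted in the PR description (milliseconds).
baseline_ms = 90.79  # baseline from #1098
patched_ms = 88.58   # with this change

saving_ms = baseline_ms - patched_ms
saving_pct = 100.0 * saving_ms / baseline_ms

print(f"saving: {saving_ms:.2f} ms ({saving_pct:.2f}% of baseline)")
```

The relative gain is roughly 2.4% of the baseline p90 TPOT.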

### How was this patch tested?
Tested using #1101.

---------

Signed-off-by: ttanzhiqiang <389825161@qq.com>
2025-06-15 19:57:02 +08:00