xc-llm-ascend

Files

ttanzhiqiang dc6172efd3 update attention nz and mla nz (Improve TPOP 6ms performance) (#909)

### What this PR does / why we need it?
- Update the attention nz and mla nz modules to improve TPOP performance by 6 ms.
- Convert W_UV and W_UK_T to NPU format in mla_v1.py.
- Convert layer.weight to NPU format in w8a8.py.

Signed-off-by: ttanzhiqiang <389825161@qq.com>

2025-05-23 10:18:10 +08:00

__init__.py

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127)

2025-02-21 17:07:37 +08:00

func_wrapper.py

[quantization] Support w8a8 quantization (#580)

2025-04-20 18:14:05 +08:00

quant_config.py

Enable online serving quantization (#877)

2025-05-17 17:36:04 +08:00

quantizer.py

[quantization] Support w8a8 quantization (#580)