xc-llm-ascend

Files

ApsarasX 324f819b92 [Perf] Optimize fused_experts quantization code to save npu memory (#784)

### What this PR does / why we need it?
In the w8a8 quantization code of `fused_experts`, the output of almost
every operator is assigned to a new variable name. To save NPU memory,
we then have to manually `del` these variables to end their lifetime,
which fills the code with `del` statements and looks inelegant.
Therefore, this PR names the output of most operators `hidden_states`,
so that each reassignment ends the lifetime of the previous
`hidden_states`.
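
The rebinding idea can be illustrated with a minimal sketch. This is not the code from this PR: `FakeTensor` and `fake_op` are hypothetical stand-ins for NPU tensors and operators, and the stage names are made up. The sketch relies on CPython reference counting, where an object is released as soon as its last reference goes away.

```python
import weakref


class FakeTensor:
    """Hypothetical stand-in for an NPU tensor; it only carries a label."""

    def __init__(self, name):
        self.name = name


def fake_op(x, name):
    # Hypothetical operator: always returns a freshly allocated output.
    return FakeTensor(name)


def pipeline_with_new_names(x):
    # Every output gets its own name, so every intermediate stays alive
    # until the function returns (unless it is explicitly `del`-ed).
    gate_up_out = fake_op(x, "gate_up")
    act_out = fake_op(gate_up_out, "act")
    down_out = fake_op(act_out, "down")
    refs = [weakref.ref(gate_up_out), weakref.ref(act_out)]
    alive = [r() is not None for r in refs]
    return down_out, alive


def pipeline_with_rebinding(x):
    # Every output is bound to the same name; rebinding drops the last
    # reference to the previous output, so it can be freed immediately.
    refs = []
    hidden_states = fake_op(x, "gate_up")
    refs.append(weakref.ref(hidden_states))
    hidden_states = fake_op(hidden_states, "act")
    refs.append(weakref.ref(hidden_states))
    hidden_states = fake_op(hidden_states, "down")
    alive = [r() is not None for r in refs]
    return hidden_states, alive


if __name__ == "__main__":
    inp = FakeTensor("input")
    print(pipeline_with_new_names(inp)[1])
    print(pipeline_with_rebinding(inp)[1])
```

In the first function both intermediates are still referenced at the end, while in the second they are already unreachable without any `del` statement. With real tensors, the freed memory would typically return to the framework's allocator rather than to the device directly.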

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Signed-off-by: ApsarasX <apsarax@outlook.com>

2025-05-09 15:09:37 +08:00

__init__.py

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127)

2025-02-21 17:07:37 +08:00

func_wrapper.py

[quantization] Support w8a8 quantization (#580)

2025-04-20 18:14:05 +08:00

quant_config.py

[Feature] Add quant description file for new quant model generated by modelslim (#719)

2025-04-30 16:51:56 +08:00

quantizer.py

[quantization] Support w8a8 quantization (#580)