xc-llm-ascend

Files

linfeng-yuan 2cd036ee8e [Bugfix] fix accuracy problem for quantized deepseek models (#768)

### What this PR does / why we need it?

The root cause of the bug is that NaNs propagate through numerical
computations, so arithmetic alone cannot eliminate them. We addressed it
by using the in-place `masked_fill_` to eliminate NaNs, which avoids the
memory overhead of a `torch.where` approach.
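
A minimal sketch of the idea, not the code from this PR: the helper name `zero_out_nans_`, the sample tensors, and the fill value `0.0` are all assumptions made for illustration. It shows that multiplying by a 0/1 mask leaves a NaN in place, while `masked_fill_` overwrites it in the existing buffer.

```python
import torch


def zero_out_nans_(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: replace NaNs in `x` with 0.0, in place."""
    # masked_fill_ writes into x's existing storage. torch.where would
    # instead allocate a second full-size tensor for its result.
    return x.masked_fill_(torch.isnan(x), 0.0)


x = torch.tensor([1.0, float("nan"), 3.0])
mask = torch.tensor([1.0, 0.0, 1.0])

# Arithmetic masking does not help: NaN * 0 is still NaN.
masked_by_mul = x * mask

# In-place fill on a copy, so the original x stays available for comparison.
y = x.clone()
cleaned = zero_out_nans_(y)
```

Here `cleaned` and `y` refer to the same storage, which is the point of preferring the in-place call when the tensors are large.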

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
This patch was tested with vllm v0.8.5 and vllm-ascend master. I ran
the deepseek_v3 model with the offline inference scripts
(examples/dp_offline/run_dp.sh & data_parallel.py).

Signed-off-by: linfeng-yuan <1102311262@qq.com>

2025-05-06 22:09:56 +08:00

__init__.py

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127)

2025-02-21 17:07:37 +08:00

func_wrapper.py

[quantization] Support w8a8 quantization (#580)

2025-04-20 18:14:05 +08:00

quant_config.py

[Feature] Add quant description file for new quant model generated by modelslim (#719)

2025-04-30 16:51:56 +08:00

quantizer.py

[quantization] Support w8a8 quantization (#580)