xc-llm-ascend

Files

pu-zhe 5df450bca4 [Feat] [310p] Support w8a8sc quantization method (#7075)

### What this PR does / why we need it?
- New quantization method: introduced support for the W8A8SC static linear quantization scheme on 310P hardware, enabling more efficient model compression.
- Refactored `save_sharded_state_310.py` to avoid a multi-process issue.

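The PR text does not describe how W8A8SC works internally. As a rough illustration of the "W8A8 static" part of the name only, the sketch below quantizes weights to int8 with one scale per output channel and activations to int8 with a single scale calibrated offline. It then accumulates the products in integers and rescales the result to floating point. All function names are hypothetical. The sketch does not model the "SC" part of W8A8SC, nor any 310P kernel or weight layout.

```python
from typing import List, Optional, Tuple

# Hypothetical illustration only: not the vllm-ascend W8A8SC implementation.
Matrix = List[List[float]]
IntMatrix = List[List[int]]


def clamp_int8(v: int) -> int:
    # Symmetric int8 range; -128 is left unused so both sides match.
    return max(-127, min(127, v))


def quantize_weight_per_channel(w: Matrix) -> Tuple[IntMatrix, List[float]]:
    """Quantize each output row of W (shape [out, in]) with its own scale."""
    q, scales = [], []
    for row in w:
        max_abs = max(abs(v) for v in row) or 1.0  # avoid a zero scale
        s = max_abs / 127.0
        scales.append(s)
        q.append([clamp_int8(round(v / s)) for v in row])
    return q, scales


def calibrate_act_scale(samples: Matrix) -> float:
    """Static activation scale: computed once, offline, from calibration data."""
    max_abs = max(abs(v) for row in samples for v in row) or 1.0
    return max_abs / 127.0


def quantize_act(x: Matrix, scale: float) -> IntMatrix:
    # Values beyond the calibrated range are clipped to +/-127.
    return [[clamp_int8(round(v / scale)) for v in row] for row in x]


def w8a8_linear(x: Matrix, w_q: IntMatrix, w_scales: List[float],
                act_scale: float, bias: Optional[List[float]] = None) -> Matrix:
    """y = x @ W^T (+ bias), computed with int8 inputs and integer accumulation."""
    x_q = quantize_act(x, act_scale)
    out = []
    for xr in x_q:
        row = []
        for j, wr in enumerate(w_q):
            acc = sum(a * b for a, b in zip(xr, wr))  # integer accumulation
            y = acc * act_scale * w_scales[j]         # dequantize
            if bias is not None:
                y += bias[j]
            row.append(y)
        out.append(row)
    return out


if __name__ == "__main__":
    w = [[0.5, -1.0], [0.25, 0.75]]
    x = [[1.0, 2.0]]
    w_q, w_s = quantize_weight_per_channel(w)
    a_s = calibrate_act_scale(x)  # calibration set = x, for brevity
    print(w8a8_linear(x, w_q, w_s, a_s))  # close to the float result [[-1.5, 1.75]]
```

Because the activation scale is fixed at calibration time ("static"), inference needs no per-batch max reduction. The trade-off is that inputs outside the calibrated range are clipped.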
### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested with a W8A8SC quantization end-to-end (E2E) test.

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

---------

Signed-off-by: pu-zhe <zpuaa@outlook.com>

2026-03-10 16:13:20 +08:00

| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | [Feat] [310p] Support w8a8sc quantization method (#7075) | 2026-03-10 16:13:20 +08:00 |
| `registry.py` | [Feat.]: support 310p w8a8 (#6454) | 2026-02-03 14:13:06 +08:00 |
| `w8a8_dynamic.py` | [Feat] 310p support MoE W8A8 quantization (#6641) | 2026-02-10 17:17:44 +08:00 |
| `w8a8_static.py` | [300I][Bugfix] fix unquant model weight nd2nz error (#6851) | |