### What this PR does / why we need it?

Adds W8A8 quantization support for the GLM4.7-flash model.

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: dsxsteven <dsxsteven@sina.com>
Co-authored-by: SlightwindSec <slightwindsec@gmail.com>
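For reviewers, a usage sketch of how a W8A8-quantized GLM4.7-flash model might be served. The checkpoint path is a placeholder and the `--quantization ascend` value is an assumption based on the quantization interface documented for vllm-ascend, not something verified against this PR:

```shell
# Hypothetical invocation: serve a W8A8-quantized GLM4.7-flash checkpoint.
# The model path is a placeholder; the --quantization value is assumed
# from vllm-ascend's documented interface and may differ in this PR.
vllm serve /path/to/GLM4.7-flash-w8a8 \
    --quantization ascend \
    --tensor-parallel-size 4
```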