[Feat] [310p] Support w8a8sc quantization method (#7075)

### What this PR does / why we need it?
New quantization method: adds support for the W8A8SC static linear quantization scheme, specifically for 310P hardware, enabling more efficient model compression.
Refactors `save_sharded_state_310.py` to avoid a multi-process issue.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
W8A8SC quant E2E test.

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

---------

Signed-off-by: pu-zhe <zpuaa@outlook.com>
2026-03-10 16:13:20 +08:00
parent 14c71b19e1
commit 5df450bca4
4 changed files with 258 additions and 14 deletions
--- a/vllm_ascend/_310p/quantization/methods/__init__.py
+++ b/vllm_ascend/_310p/quantization/methods/__init__.py
@@ -19,4 +19,5 @@ from . import (
    w8a8_dynamic,  # noqa: F401
    w8a8_static,  # noqa: F401
    w8a8s,  # noqa: F401
+    w8a8sc,  # noqa: F401
 )