[Feat] [310p] Support w8a8sc quantization method (#7075)

### What this PR does / why we need it?
New quantization method: introduces support for the W8A8SC static linear
quantization scheme, specifically for 310P hardware, enabling more
efficient model compression.
Also refactors `save_sharded_state_310.py` to avoid a multi-process issue.
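The W8A8SC scheme above is a static int8 quantization of linear layers (8-bit weights and activations with precomputed scales). As background, a minimal NumPy sketch of the general static W8A8 idea follows; all names are hypothetical and this is not the PR's actual kernel, which runs sparse-compressed on 310P NPUs:

```python
import numpy as np

def static_int8_scale(calib_max: float) -> float:
    # "Static" quantization: the scale is fixed ahead of time from
    # calibration data, not recomputed per inference step.
    return calib_max / 127.0

def quantize(x: np.ndarray, scale: float) -> np.ndarray:
    # Symmetric rounding to int8 with saturation.
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def w8a8_linear(a_q: np.ndarray, w_q: np.ndarray,
                a_scale: float, w_scale: float) -> np.ndarray:
    # int8 x int8 matmul accumulated in int32, dequantized once at the end.
    acc = a_q.astype(np.int32) @ w_q.astype(np.int32)
    return acc.astype(np.float32) * (a_scale * w_scale)

# Toy end-to-end check against the fp32 reference.
rng = np.random.default_rng(0)
w_fp = rng.standard_normal((4, 3)).astype(np.float32)
x_fp = rng.standard_normal((2, 4)).astype(np.float32)
w_scale = static_int8_scale(float(np.abs(w_fp).max()))
a_scale = static_int8_scale(float(np.abs(x_fp).max()))
y_q = w8a8_linear(quantize(x_fp, a_scale), quantize(w_fp, w_scale),
                  a_scale, w_scale)
y_fp = x_fp @ w_fp
err = float(np.abs(y_q - y_fp).max())
```

The int32 accumulator mirrors what int8 matmul hardware does; the two scales factor out of the sum, so a single multiply recovers the fp32 result up to rounding error.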
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
W8A8SC quant E2E test.

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

---------

Signed-off-by: pu-zhe <zpuaa@outlook.com>
Author: pu-zhe
Date: 2026-03-10 16:13:20 +08:00
Committed by: GitHub
Parent: 14c71b19e1
Commit: 5df450bca4
4 changed files with 258 additions and 14 deletions

@@ -19,4 +19,5 @@ from . import (
     w8a8_dynamic,  # noqa: F401
     w8a8_static,  # noqa: F401
     w8a8s,  # noqa: F401
+    w8a8sc,  # noqa: F401
 )