[Feat] [310p] Support w8a8sc quantization method (#7075)
### What this PR does / why we need it?
New quantization method: adds support for the W8A8SC static linear
quantization scheme on 310P hardware, enabling more efficient model
compression.
Also refactors `save_sharded_state_310.py` to avoid a multi-process issue.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
W8A8SC quant E2E test.
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
---------
Signed-off-by: pu-zhe <zpuaa@outlook.com>
@@ -19,4 +19,5 @@ from . import (
     w8a8_dynamic,  # noqa: F401
     w8a8_static,  # noqa: F401
     w8a8s,  # noqa: F401
+    w8a8sc,  # noqa: F401
 )
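The hunk above only adds a side-effect import. The `# noqa: F401` markers suggest that each module is imported so that it can register its scheme at import time, but the diff does not confirm this. As a minimal sketch of that pattern, assuming a decorator-based registry (the names `_QUANT_SCHEMES`, `register_scheme`, `get_scheme`, and `W8A8SCLinearScheme` are hypothetical and not taken from this repository):

```python
from typing import Callable, Dict

# Hypothetical registry mapping a quantization type name to its scheme class.
# The real project may organize registration differently.
_QUANT_SCHEMES: Dict[str, type] = {}


def register_scheme(name: str) -> Callable[[type], type]:
    """Return a class decorator that records the class under `name`."""

    def decorator(cls: type) -> type:
        if name in _QUANT_SCHEMES:
            raise ValueError(f"scheme {name!r} is already registered")
        _QUANT_SCHEMES[name] = cls
        return cls

    return decorator


def get_scheme(name: str) -> type:
    """Look up a registered scheme class, failing loudly if it is unknown."""
    try:
        return _QUANT_SCHEMES[name]
    except KeyError:
        known = ", ".join(sorted(_QUANT_SCHEMES)) or "<none>"
        raise KeyError(f"unknown scheme {name!r}; registered: {known}") from None


# In a layout like the one in the diff, this class would live in its own
# module (e.g. w8a8sc.py), and the package __init__ would import that module
# purely for the side effect of running the decorator below.
@register_scheme("W8A8SC")
class W8A8SCLinearScheme:
    """Placeholder for a static W8A8SC linear scheme (illustrative only)."""
```

Under this assumption, leaving a module out of the `from . import (...)` list would mean its decorator never runs, so a lookup such as `get_scheme("W8A8SC")` would fail even though the module exists on disk.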