[Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method (#5143)
Introduce W4A4 LAOS Quantization for better model compression and
inference efficiency on Ascend devices.
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
This commit is contained in:
1
.github/workflows/_e2e_test.yaml
vendored
1
.github/workflows/_e2e_test.yaml
vendored
@@ -217,6 +217,7 @@ jobs:
|
||||
pytest -sv --durations=0 tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_qwen3_dense_fc1_tp2
|
||||
pytest -sv --durations=0 tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_qwen3_dense_prefetch_mlp_weight_tp2
|
||||
pytest -sv --durations=0 tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_deepseek3_2_w8a8_pruning_mtp_tp2_ep
|
||||
pytest -sv --durations=0 tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_qwen3_w4a4_distributed_tp2
|
||||
|
||||
pytest -sv --durations=0 tests/e2e/multicard/2-cards/test_offline_weight_load.py
|
||||
pytest -sv --durations=0 tests/e2e/multicard/2-cards/test_pipeline_parallel.py
|
||||
|
||||
Reference in New Issue
Block a user