[Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method (#5143)

Introduce W4A4 LAOS Quantization for better model compression and inference efficiency on Ascend devices.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2026-01-22 05:34:58 +03:00
parent dd8571860d
commit ef9d8367f5
4 changed files with 134 additions and 0 deletions
--- a/.github/workflows/_e2e_test.yaml
+++ b/.github/workflows/_e2e_test.yaml
@@ -217,6 +217,7 @@ jobs:
          pytest -sv --durations=0 tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_qwen3_dense_fc1_tp2
          pytest -sv --durations=0 tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_qwen3_dense_prefetch_mlp_weight_tp2
          pytest -sv --durations=0 tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_deepseek3_2_w8a8_pruning_mtp_tp2_ep
+          pytest -sv --durations=0 tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_qwen3_w4a4_distributed_tp2

          pytest -sv --durations=0 tests/e2e/multicard/2-cards/test_offline_weight_load.py
          pytest -sv --durations=0 tests/e2e/multicard/2-cards/test_pipeline_parallel.py