xc-llm-ascend

Files

anon189Ty 7b2ecc1e9a [Feat] Unquantized linear nz support (#2619)

### What this PR does / why we need it?
Currently, when vLLM-Ascend executes a model's Linear layer, the weights
are passed in ND format in both the unquantized case and the skipped
ascend case, which is slower than FRACTAL_NZ.
This PR supplements the execution logic for the Linear layer. When
VLLM_ASCEND_ENABLE_MLP_OPTIMIZE=1 and the CANN version is 8.3, the
weights of the Linear layer are converted to FRACTAL_NZ in both the
unquantized case and the skipped ascend case.
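
The gating condition described above can be sketched as follows. This is only an illustration, not the code from the PR: the helper name `should_convert_to_nz`, the way the CANN version string is passed in, and the major/minor comparison are all assumptions. The actual format conversion runs on NPU hardware and is omitted.

```python
import os
from typing import Mapping, Optional


def should_convert_to_nz(cann_version: str,
                         env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: decide whether Linear weights should be
    converted from ND to FRACTAL_NZ.

    Assumes the flag must be exactly "1" and that "CANN version is 8.3"
    means the major and minor components are 8 and 3.
    """
    env = os.environ if env is None else env
    flag_enabled = env.get("VLLM_ASCEND_ENABLE_MLP_OPTIMIZE", "0") == "1"
    # Compare only the first two dot-separated components of the version.
    major_minor = tuple(cann_version.split(".")[:2])
    return flag_enabled and major_minor == ("8", "3")


if __name__ == "__main__":
    enabled = {"VLLM_ASCEND_ENABLE_MLP_OPTIMIZE": "1"}
    print(should_convert_to_nz("8.3.RC1", enabled))
    print(should_convert_to_nz("8.2.RC1", enabled))
```

Passing the environment as a mapping keeps the check testable without touching the real process environment.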

- vLLM version: main
- vLLM main: 267c80d31f

Signed-off-by: anon189Ty <Stari_Falcon@outlook.com>

2025-09-11 11:40:00 +08:00

| File | Last commit | Date |
| --- | --- | --- |
| test_quant_config.py | [Feat] Unquantized linear nz support (#2619) | 2025-09-11 11:40:00 +08:00 |
| test_utils.py | [1/N][Refactor][Quantization] remove redundant quantizer class (#2680) | 2025-09-04 11:35:14 +08:00 |
| test_w4a8_dynamic.py | [feat]: oproj tensor parallelism in pure DP and graph-mode scenarios. (#2167) | 2025-09-07 10:31:32 +08:00 |
| test_w8a8_dynamic.py | [feat]: oproj tensor parallelism in pure DP and graph-mode scenarios. (#2167) | 2025-09-07 10:31:32 +08:00 |
| test_w8a8.py | [main] [refactor] refactor common_fused_moe.py (#2706) | 2025-09-08 20:09:50 +08:00 |