Files
xc-llm-ascend/vllm_ascend/quantization
Feng-xiaosuo abe72d7cb9 Refactor quantization layer name mapping to leverage vLLM built-in mappers (#7050)
…the quantization layer name

### What this PR does / why we need it?
This PR modifies the loading logic for layer name prefixes in quantized
models. The goal is to reduce or eliminate the need for point-to-point
(hardcoded) modifications by leveraging the built-in mapper mechanism
already provided in vLLM's model code. For models that do not yet have a
corresponding mapper, the original point-to-point modification approach
has been retained to ensure backward compatibility.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
The changes were validated using an offline deployment script to launch
and verify multiple multimodal models. Testing confirmed that the
updated loading logic correctly handles layer name prefixes across
different model architectures, with no regression in model
initialization or inference behavior.
- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

---------

Signed-off-by: Matrix_K <zhangke144@huawei.com>
Signed-off-by: Feng-xiaosuo <tengchang1@huawei.com>
Co-authored-by: Matrix_K <zhangke144@huawei.com>
2026-03-12 15:48:14 +08:00
..