xc-llm-ascend

Files

Feng-xiaosuo abe72d7cb9 Refactor quantization layer name mapping to leverage vLLM built-in mappers (#7050)

…the quantization layer name

### What this PR does / why we need it?
This PR modifies the loading logic for layer name prefixes in quantized
models. The goal is to reduce or eliminate the need for point-to-point
(hardcoded) modifications by leveraging the built-in mapper mechanism
already provided in vLLM's model code. For models that do not yet have a
corresponding mapper, the original point-to-point modification approach
has been retained to ensure backward compatibility.
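
The "mapper first, hardcoded fallback second" idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `PrefixMapper`, `LEGACY_PREFIX_FIXES`, `resolve_layer_name`, and every prefix string shown are hypothetical names introduced for the example, and the real mapper objects live in vLLM's model code.

```python
from dataclasses import dataclass, field


@dataclass
class PrefixMapper:
    """Hypothetical stand-in for a model-provided name mapper (not vLLM's class)."""

    orig_to_new_prefix: dict[str, str] = field(default_factory=dict)

    def map_name(self, name: str) -> str:
        # Rewrite the first matching prefix; names with no match pass through.
        for old, new in self.orig_to_new_prefix.items():
            if name.startswith(old):
                return new + name[len(old):]
        return name


# Hypothetical point-to-point fixes, keyed by model type, kept only for
# models that do not ship a mapper of their own.
LEGACY_PREFIX_FIXES = {
    "example_vl": {"visual.": "model.visual."},
}


def resolve_layer_name(name, model_type, mapper=None):
    """Prefer the model's own mapper; otherwise fall back to hardcoded fixes."""
    if mapper is not None:
        return mapper.map_name(name)
    legacy = PrefixMapper(LEGACY_PREFIX_FIXES.get(model_type, {}))
    return legacy.map_name(name)
```

In this sketch, a model that supplies a mapper never touches the hardcoded table, so adding a new architecture would only require that the model define its own prefix rules. Models without a mapper keep their previous behavior through the fallback table.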

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
The changes were validated using an offline deployment script to launch
and verify multiple multimodal models. Testing confirmed that the
updated loading logic correctly handles layer name prefixes across
different model architectures, with no regression in model
initialization or inference behavior.
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

---------

Signed-off-by: Matrix_K <zhangke144@huawei.com>
Signed-off-by: Feng-xiaosuo <tengchang1@huawei.com>
Co-authored-by: Matrix_K <zhangke144@huawei.com>

2026-03-12 15:48:14 +08:00

methods

[EPLB][bugfix] Bugfix for fused mc2 (#6794)

2026-03-09 11:26:57 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/ to ruff format (Batch #7) (#6023)

2026-02-06 14:56:53 +08:00

compressed_tensors_config.py

[Lint]Style: Convert vllm-ascend/ to ruff format (Batch #7) (#6023)

2026-02-06 14:56:53 +08:00

method_adapters.py

[bugfix] Resolve the failure to load w4a8 weights when EP is not enabled. (#7090)