xc-llm-ascend

Files

tanhaoan333 15f6564976 [Model]Add Qwen3-Omni quantization Ascend NPU adaptation and optimization (#6828)

### What this PR does / why we need it?
This pull request adapts Qwen3-Omni quantization to Ascend NPUs. Through
patch-based modifications, it adds operator-level optimizations and
optimizes the AUT (Auto-Quantization Tuning) component.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: 83b47f67b1

---------

Signed-off-by: tanhaoan333 <tanhaoan@huawei.com>

2026-03-03 00:07:23 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #7) (#6023)

2026-02-06 14:56:53 +08:00

base.py

add mxfp8 moe quantization (#6670)

2026-03-02 11:04:06 +08:00

registry.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #7) (#6023)

2026-02-06 14:56:53 +08:00

w4a4_flatquant.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #7) (#6023)