Files

LHXuuu 0415e694cd [Quantization] Support compressed tensors moe w8a8 int8 dynamic weight (#5718)

### What this PR does / why we need it?
The LLM Compressor quantization tool from the vLLM community generates
quantized weights in the compressed tensors format, so the vLLM Ascend
engine needs to be adapted to support that format.

1. Support W8A8 INT8 dynamic weights for MoE models.
2. Specify the W4A16 quantization configuration.

Co-authored-by: menogrey <1299267905@qq.com>
Co-authored-by: kunpengW-code <1289706727@qq.com>

### Does this PR introduce _any_ user-facing change?
No

- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

---------

Signed-off-by: LHXuuu <scut_xlh@163.com>
Signed-off-by: menogrey <1299267905@qq.com>
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
Co-authored-by: menogrey <1299267905@qq.com>
Co-authored-by: Wang Kunpeng <1289706727@qq.com>

2026-01-14 09:17:26 +08:00

| Name | Last commit | Date |
| --- | --- | --- |
| source | [Quantization] Support compressed tensors moe w8a8 int8 dynamic weight (#5718) | 2026-01-14 09:17:26 +08:00 |
| Makefile | [Doc]Add Chinese translation for documentation (#1870) | 2025-07-21 11:26:27 +08:00 |
| README.md | [Doc] Update doc url link (#5781) | 2026-01-12 11:21:31 +08:00 |
| requirements-docs.txt | [Doc]Add Chinese translation for documentation (#1870) | 2025-07-21 11:26:27 +08:00 |
| requirements-test.txt | static EPLB fix bug, add unit test (#1186) | 2025-06-18 19:46:56 +08:00 |

README.md

vLLM Ascend Plugin documents

Live doc: https://docs.vllm.ai/projects/ascend

Build the docs

# Install dependencies.
pip install -r requirements-docs.txt

# Build the docs.
make clean
make html

# Build the docs with translation.
make intl

# Serve the built docs locally so you can open them in your browser.
python -m http.server -d _build/html/

Launch your browser and open:

English version: http://localhost:8000
Chinese version: http://localhost:8000/zh_CN
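
If port 8000 is already in use, the server can be started on another port and the URLs change accordingly. The following is a minimal sketch, not part of the repository: the `PORT` and `BUILD_DIR` variables and the `docs_url` helper are hypothetical names introduced here for illustration. It relies only on the `python -m http.server` options for a positional port and `-d` directory.

```shell
#!/usr/bin/env sh
# Sketch: print the local doc URLs for a configurable port.
# PORT, BUILD_DIR and docs_url are illustrative names, not repository settings.
PORT="${PORT:-8000}"
BUILD_DIR="${BUILD_DIR:-_build/html}"

docs_url() {
    # $1: optional language subdirectory, for example zh_CN
    if [ -n "${1:-}" ]; then
        printf 'http://localhost:%s/%s\n' "$PORT" "$1"
    else
        printf 'http://localhost:%s\n' "$PORT"
    fi
}

echo "English version: $(docs_url)"
echo "Chinese version: $(docs_url zh_CN)"

# Uncomment to start the server (it blocks until interrupted):
# python -m http.server "$PORT" -d "$BUILD_DIR"
```

For example, running the script as `PORT=9000 sh serve-docs.sh` (the file name is arbitrary) would print URLs on port 9000 instead of 8000.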