xc-llm-ascend/quantization at 3b997fdd32a2c1f9c53867495ff9630de7ce56d5

EngineX/xc-llm-ascend

wangyao-i 3b997fdd32 support mxfp8 quantization (qwen dense) (#5723)

### What this PR does / why we need it?
Support mxfp8 quantization (Qwen linear layer).

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef


Signed-off-by: wangyao <iwangyao@outlook.com>

2026-01-09 16:26:31 +08:00


| Name | Last commit | Date |
| --- | --- | --- |
| `compressed_tensors` | [Feat] Support native Kimi-K2-Thinking native W4A16 quantized experts weights (#4516) | 2025-12-10 15:58:52 +08:00 |
| `__init__.py` | [Core] Cherry pick from 0.7.1 to keep the main code newest (#127) | 2025-02-21 17:07:37 +08:00 |
| `quant_config.py` | support mxfp8 quantization (qwen dense) (#5723) | 2026-01-09 16:26:31 +08:00 |
| `utils.py` | support mxfp8 quantization (qwen dense) (#5723) | 2026-01-09 16:26:31 +08:00 |
| `w4a4_flatquant_dynamic.py` | [refactor] refactor weight trans nz and transpose (#4878) | 2025-12-19 14:27:24 +08:00 |
| `w4a8_dynamic.py` | Bugfix: Align expert map shapes with redundant experts in EPLB adjustment (#5285) | 2026-01-06 17:22:36 +08:00 |
| `w4a16.py` | Bugfix: Align expert map shapes with redundant experts in EPLB adjustment (#5285) | 2026-01-06 17:22:36 +08:00 |
| `w8a8_dynamic.py` | [Feature]EPLB:Adapt DispatchGmmCombineDecode operator to eplb tensor list and expert token numbers (#5552) | 2026-01-07 11:23:42 +08:00 |
| `w8a8_pdmix.py` | [feature] Support W8A8 PD-Mix Quantization (#4235) | 2025-11-30 11:57:26 +08:00 |
| `w8a8.py` | [refactor] Remove unnecessary attributes from set_ascend_forward_context (#5204) | 2025-12-23 08:49:52 +08:00 |
| `w8a8mxfp8.py` | support mxfp8 quantization (qwen dense) (#5723) | 2026-01-09 16:26:31 +08:00 |
| `w8a16.py` | [quantization] Add w8a16 quantization support (#4541) | 2025-12-24 19:49:32 +08:00 |