xc-llm-ascend

Files

Mengqing Cao 8abe517870 [Refactor] Adapt deepseek-v3.2 to vllm 0.11.0 (#3432)

### What this PR does / why we need it?
Adapt deepseek-v3.2 to vLLM 0.11.0 and remove the patch that is no longer needed.

The end goal is to remove all patches and align the code architecture with
vLLM, so the following work remains for subsequent PRs.
TODO:
- [x] remove patch on attention spec
- [ ] refactor the kvcache creation logic

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
1. CI passed with the existing tests.
2. Tests pass with deepseek-v3.2-exp.


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: MengqingCao <cmq0113@163.com>

2025-10-15 17:48:58 +08:00

__init__.py

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127)

2025-02-21 17:07:37 +08:00

quant_config.py

[Refactor] Adapt deepseek-v3.2 to vllm 0.11.0 (#3432)

2025-10-15 17:48:58 +08:00

utils.py

[Feature] Add W4A4 Flat Quantization support (#3427)

2025-10-13 23:20:16 +08:00

w4a4_flatquant_dynamic.py

[Feature] Add W4A4 Flat Quantization support (#3427)