xc-llm-ascend/vllm_ascend
curryliu ca8007f584 [Feature] Enable inference support for Deepseekr1-w8a8-MTP (#1994)
Support inference of the Deepseekr1-w8a8-mtp model with
a statically quantized shared_head in the MTP layers.

- vLLM version: v0.9.2
- vLLM main: 6eca337ce0

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>
2025-07-29 18:51:57 +08:00