Support the inference of the Deepseekr1-w8a8-mtp model with
statically-quantized shared_head in MTP layers.
- vLLM version: v0.9.2
- vLLM main:
6eca337ce0
Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>
Support the inference of the Deepseekr1-w8a8-mtp model with
statically-quantized shared_head in MTP layers.
- vLLM version: v0.9.2
- vLLM main:
6eca337ce0
Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>