Support inference of the Deepseekr1-w8a8-mtp model with a statically quantized shared_head in MTP layers.

- vLLM version: v0.9.2
- vLLM main: 6eca337ce0

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>