xc-llm-ascend/vllm_ascend
wangqiankun13 350b95efcf [BugFix]Disable dispatch_gmm_combine_decode operator when mtp drafter model uses non-w8a8 while main model uses w8a8, or drafter model is eagle series (#5293)

### What this PR does / why we need it?

Disable dispatch_gmm_combine_decode operator when mtp drafter model uses
non-w8a8 while main model uses w8a8, or drafter model is eagle series.
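
The rule above can be sketched as a small predicate. This is only an illustration of the stated condition, not the code from this PR: `DrafterInfo`, `drafter_disables_fused_decode`, the method strings (`"mtp"`, `"eagle"`, `"eagle3"`), and the quantization label `"w8a8"` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed labels, for illustration only; the real configuration may differ.
W8A8 = "w8a8"
MTP_METHOD = "mtp"
EAGLE_METHODS = frozenset({"eagle", "eagle3"})


@dataclass(frozen=True)
class DrafterInfo:
    """Hypothetical summary of a speculative-decoding drafter model."""
    method: str                 # e.g. "mtp" or "eagle3"
    quant_type: Optional[str]   # e.g. "w8a8", or None if unquantized


def drafter_disables_fused_decode(main_quant_type: Optional[str],
                                  drafter: Optional[DrafterInfo]) -> bool:
    """Return True when the drafter setup rules out dispatch_gmm_combine_decode."""
    if drafter is None:
        # No drafter model: this rule does not apply.
        return False
    if drafter.method in EAGLE_METHODS:
        # Case 2: the drafter belongs to the eagle series.
        return True
    # Case 1: MTP drafter is non-w8a8 while the main model is w8a8.
    return (drafter.method == MTP_METHOD
            and main_quant_type == W8A8
            and drafter.quant_type != W8A8)
```

A caller would combine this check with whatever other conditions already gate the operator; the predicate only covers the two cases named in this PR.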

For more information about this operator, see the RFC issue:
https://github.com/vllm-project/vllm-ascend/issues/5476


- vLLM version: release/v0.13.0
- vLLM main: ad32e3e19c

Signed-off-by: wangqiankun <wangqiankun13@huawei.com>
2026-01-04 17:51:28 +08:00