xc-llm-ascend

Files

Chen Chen 848419d1ba [Bugfix] Disable the dispatch_ffn_combine kernel in MTP path (#4751)

### What this PR does / why we need it?

This PR fixes a smoke test failure. It adjusts mtp_proposer and
model_runner_v1 to route MTP decoding through the non-fused MoE
implementation while keeping the overall inference flow unchanged.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: mojave2 <chenchen145@huawei.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>

2025-12-09 22:14:05 +08:00

__init__.py

remove useless patch (#4699)

2025-12-08 11:02:42 +08:00

eagle_proposer.py

[Refactor] 2/N Unify all mask generation methods and cache mask (#4779)

2025-12-09 18:51:00 +08:00

interface.py

upgrade vLLM to main (#4608)

2025-12-02 22:10:52 +08:00

mtp_proposer.py

[Bugfix] Disable the dispatch_ffn_combine kernel in MTP path (#4751)