xc-llm-ascend

Files

Chen Chen 848419d1ba [Bugfix] Disable the dispatch_ffn_combine kernel in MTP path (#4751)

### What this PR does / why we need it?

This PR fixes a smoke test failure. It adjusts mtp_proposer and
model_runner_v1 to route MTP decoding through the non-fused MoE
implementation while keeping the overall inference flow unchanged.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: mojave2 <chenchen145@huawei.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>

2025-12-09 22:14:05 +08:00

__init__.py

[Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878)

2025-07-19 09:42:32 +08:00

block_table.py

[long_seq] remove long_seq env (#4660)

2025-12-05 10:31:49 +08:00

model_runner_v1.py

[Bugfix] Disable the dispatch_ffn_combine kernel in MTP path (#4751)

2025-12-09 22:14:05 +08:00

npu_input_batch.py

support async mtp (#4511)

2025-12-06 17:15:57 +08:00

worker_v1.py

[MOE] move weight transpose to wakeup for RL scenarios (#4626)

2025-12-08 20:34:52 +08:00