xc-llm-ascend/vllm_ascend/worker
Cao Yi cb4c7de856 [Perf] Optimize MTP execution by reordering state update operation (#6844)
## Summary
- Move the `_update_states_after_model_execute` call from after main model
sampling to after draft model execution
- This reordering reduces pipeline bubbles between main model and draft
model execution
- No accuracy impact - the state update operation is independent of
draft token proposal
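
The reordering described above can be pictured with a toy sketch. This is only an illustration under assumptions: `ToyRunner`, `_run_main_model_and_sample`, `_propose_draft_tokens`, `step_before`, and `step_after` are hypothetical names, and the argument passed to `_update_states_after_model_execute` is assumed. None of it mirrors the actual worker code.

```python
# Toy illustration only. Every name except _update_states_after_model_execute
# is hypothetical; the real vllm_ascend worker code is not reproduced here.
from typing import List


class ToyRunner:
    def __init__(self) -> None:
        # Records the order in which the stages run.
        self.trace: List[str] = []

    def _run_main_model_and_sample(self) -> List[int]:
        self.trace.append("main_model_sample")
        return [101, 102]  # stand-in sampled token ids

    def _propose_draft_tokens(self, sampled: List[int]) -> List[int]:
        # Uses only the sampled tokens, not the result of the state update.
        self.trace.append("draft_model")
        return [t + 1 for t in sampled]

    def _update_states_after_model_execute(self, sampled: List[int]) -> None:
        # Assumed signature; stands in for the post-execution bookkeeping.
        self.trace.append("update_states")

    def step_before(self) -> List[int]:
        # Old order: the state update sits between the two model stages.
        sampled = self._run_main_model_and_sample()
        self._update_states_after_model_execute(sampled)
        return self._propose_draft_tokens(sampled)

    def step_after(self) -> List[int]:
        # New order: the draft model runs right after sampling,
        # and the state update is deferred until afterwards.
        sampled = self._run_main_model_and_sample()
        drafts = self._propose_draft_tokens(sampled)
        self._update_states_after_model_execute(sampled)
        return drafts
```

In this sketch both orderings return the same draft tokens and only the position of the state update in `trace` changes, which is the property the summary relies on when it says the update is independent of draft token proposal.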

## Performance Impact
Reduces idle time between main model and draft model execution stages,
improving overall MTP (Multi-Token Prediction) performance.
- vLLM version: v0.15.0
- vLLM main: 83b47f67b1

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
Co-authored-by: wanghuanjun2113 <wanghuanjun2113@gmail.com>
2026-03-09 15:55:27 +08:00