[bugfix] fix mtp accept rate (#5093)

### What this PR does / why we need it?
1. npu_model_runner now reuses gpu_model_runner, so this PR deletes some
attributes that are already defined in gpu_model_runner.
2. Fix the MTP accept rate by disabling `in_profile_run`.
3. Remove redundant MoE method selection logic.
4. Revert vllm-project/vllm-ascend#5082, which broke CI in
https://github.com/vllm-project/vllm-ascend/actions/runs/20266314048/job/58190426832?pr=5088

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Author: zhenwenqi2024
Committed by GitHub on 2025-12-17 01:35:26 +08:00
Commit eb4c08f05d (parent 5b1da4e914)
5 changed files with 10 additions and 36 deletions

@@ -145,7 +145,6 @@ class EagleProposer(Proposer):
dummy_compute_logits=lambda hidden_states: None):
with set_ascend_forward_context(None,
self.vllm_config,
-in_profile_run=True,
num_tokens=num_tokens):
self.model(
input_ids=self.input_ids[:num_tokens],