[Main2Main] Upgrade vllm commit to 0105 (#5595)
### What this PR does / why we need it?
Upgrade vllm commit to 0105 (8be6432bdaf6275664d857b1e5e9bf8ed1ce299e)
1. Remove `maybe_padded_num_tokens` arg in `model_runner_v1.py` since
https://github.com/vllm-project/vllm/pull/31517 deleted unused arg
2. Remove dense `Qwen/Qwen3-0.6B` in
`tests/e2e/multicard/test_aclgraph_capture_replay.py` and
`tests/e2e/multicard/test_data_parallel.py` due to
https://github.com/vllm-project/vllm/pull/30739
where offline data parallel mode will not be supported/useful for dense
models
3. Adapt `vllm_ascend/worker/worker.py` due to
https://github.com/vllm-project/vllm/pull/31584
4. Adapt `self.block_size` calling due to
https://github.com/vllm-project/vllm/pull/31540
5. Modify `test_mla_v1.py` due to
https://github.com/vllm-project/vllm/pull/28454 , which refactorred
`get_head_size()`
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main:
7157596103
Signed-off-by: wjunLu <wjunlu217@gmail.com>
This commit is contained in:
@@ -299,7 +299,7 @@ class NPUWorker(WorkerBase):
|
||||
def execute_model(
|
||||
self,
|
||||
scheduler_output: "SchedulerOutput",
|
||||
) -> ModelRunnerOutput | None:
|
||||
) -> ModelRunnerOutput | AsyncModelRunnerOutput | None:
|
||||
# enable msMonitor to monitor the performance of vllm-ascend
|
||||
if envs_ascend.MSMONITOR_USE_DAEMON:
|
||||
dp.step()
|
||||
@@ -318,7 +318,8 @@ class NPUWorker(WorkerBase):
|
||||
|
||||
output = self.model_runner.execute_model(scheduler_output,
|
||||
intermediate_tensors)
|
||||
if isinstance(output, (ModelRunnerOutput, NoneType)):
|
||||
if isinstance(output,
|
||||
(ModelRunnerOutput, AsyncModelRunnerOutput, NoneType)):
|
||||
return output
|
||||
|
||||
assert isinstance(output, IntermediateTensors)
|
||||
|
||||
Reference in New Issue
Block a user