[Bugfix] fix logging and d2h bug for flash comm1 (#3505)
### What this PR does / why we need it?
Fix 3 bugs in flash comm1 of Allgather EP (https://github.com/vllm-project/vllm-ascend/pull/3334):
1. Calling `enable_sp()` with the `vllm_config` argument triggers a lot of warning logs; this PR caches its return value.
2. `num_tokens_after_padding` should be a CPU tensor, as it will be used as `num_tokens_across_dp_cpu` in `DPMetadata`. Otherwise it causes many d2h copies when running the model.
3. In PD, the model runner will execute `kv_connector_no_forward`, where `num_tokens` is None, so the `sp_enabled` check must guard against it.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
@@ -114,7 +114,7 @@ def set_ascend_forward_context(
         # the performance may degrade due to the switching of communication methods.
         if is_moe_model(vllm_config):
             sp_enabled = enable_sp(vllm_config) and \
-                tp_world_size > 1
+                tp_world_size > 1 and num_tokens is not None
         else:
             sp_enabled = enable_sp(vllm_config) and \
                 tp_world_size > 1 and \
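
The effect of the added `num_tokens is not None` condition in the MoE branch can be sketched in isolation. This is a simplified stand-in, not the function from the repository: `sp_enabled_for_moe` and the `sp_requested` parameter (representing the result of `enable_sp(vllm_config)`) are hypothetical names.

```python
from typing import Optional


def sp_enabled_for_moe(sp_requested: bool, tp_world_size: int,
                       num_tokens: Optional[int]) -> bool:
    """Mirror the MoE-branch condition: SP is on only when it is requested,
    tensor parallelism is in use, and a token count is available."""
    return sp_requested and tp_world_size > 1 and num_tokens is not None


# Normal forward pass with a known token count.
normal = sp_enabled_for_moe(True, 2, 16)
# The no-forward path described in the commit message passes num_tokens=None.
no_forward = sp_enabled_for_moe(True, 2, None)
```

With the guard in place, the no-forward path simply leaves sequence parallelism disabled, so code that depends on `num_tokens` is not reached with a None value.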