[BugFix] Fix ascend config check (#1092)

Fix the ascend config check logic:
1. refactor check_ascend_config to make the rules explicit:
    1. torchair graph mode must not be used with enforce_eager=True
    2. aclgraph must not be used together with torchair graph mode
2. add config refresh for the RLHF case
3. fix a typo in the model runner
4. change the expert_tensor_parallel_size default to 0 to keep the same behavior as before
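The two graph-mode rules in item 1 can be sketched as a standalone check. This is a hypothetical illustration only: the real check_ascend_config in vllm-ascend operates on vLLM's config objects, and the function and parameter names below are stand-ins.

```python
def check_graph_config(torchair_graph_enabled: bool,
                       aclgraph_enabled: bool,
                       enforce_eager: bool) -> None:
    """Reject invalid Ascend graph-mode combinations (illustrative sketch)."""
    # Rule 1: torchair graph mode compiles the model ahead of time,
    # so it cannot run together with enforce_eager=True.
    if torchair_graph_enabled and enforce_eager:
        raise RuntimeError(
            "Torchair graph mode cannot be enabled with enforce_eager=True.")
    # Rule 2: only one graph backend may be active at a time.
    if torchair_graph_enabled and aclgraph_enabled:
        raise RuntimeError(
            "ACL graph and torchair graph cannot be enabled together.")
```

Running the check once at engine initialization fails fast on a bad config instead of surfacing an obscure runtime error later.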

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Author: wangxiyuan
Date: 2025-06-06 18:54:37 +08:00 (committed by GitHub)
Commit: dab19d5dca (parent 973f993a13)
5 changed files with 136 additions and 42 deletions


@@ -133,7 +133,7 @@ class NPUPlatform(Platform):
         # NOTE: When enable_expert_parallel is True, we follow vLLM convention:
         # ep_size = world_size, which means expert_tensor_parallel_size must be 1
-        if ascend_config.expert_tensor_parallel_size > 1 and not parallel_config.enable_expert_parallel:
+        if ascend_config.expert_tensor_parallel_size > 0 and not parallel_config.enable_expert_parallel:
             parallel_config.expert_tensor_parallel_size = ascend_config.expert_tensor_parallel_size
             # Calculate expert parallel size based on world size
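The effect of loosening the condition from `> 1` to `> 0` can be sketched as follows. With the default changed to 0, an explicit user setting of 1 was previously indistinguishable from "unset" and was silently ignored; `> 0` forwards any explicit positive value. The helper below is a stand-in, not the real NPUPlatform code.

```python
def resolve_etp_size(ascend_etp_size: int,
                     enable_expert_parallel: bool) -> int:
    """Return the expert tensor parallel size to apply (illustrative sketch).

    ascend_etp_size mirrors ascend_config.expert_tensor_parallel_size,
    whose default is now 0 (meaning "unset").
    """
    # New check: any explicit positive value is honored, including 1,
    # as long as expert parallelism is not enabled.
    if ascend_etp_size > 0 and not enable_expert_parallel:
        return ascend_etp_size
    # 0 stands for "unset": leave the parallel config's own value in place
    # (modeled here as returning 0).
    return 0
```

Under the old `> 1` check, `resolve_etp_size(1, False)` would have returned 0; the new condition returns 1.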