[CI/UT][Graph] Add ut for torchair graph mode (#1103)

### What this PR does / why we need it?
Add UT (unit test) coverage for torchair graph mode on DeepSeekV3.

### How was this patch tested?
CI passed with the newly added test.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Mengqing Cao
2025-06-14 16:59:00 +08:00
committed by GitHub
parent 94a52cf577
commit a3b5af8307
4 changed files with 100 additions and 12 deletions

@@ -53,7 +53,7 @@ The details of each config option are as follows:
| ---- | ---- | ------- | ----------- |
| `enabled` | bool | `False` | Whether to enable ascend scheduler for V1 engine|
ascend_scheduler_config also supports the options from [vllm scheduler config](https://docs.vllm.ai/en/stable/api/vllm/config.html#vllm.config.SchedulerConfig). For example, you can add `chunked_prefill_enabled: true` to ascend_scheduler_config as well.
ascend_scheduler_config also supports the options from [vllm scheduler config](https://docs.vllm.ai/en/stable/api/vllm/config.html#vllm.config.SchedulerConfig). For example, you can add `enable_chunked_prefill: True` to ascend_scheduler_config as well.
### Example
@@ -62,18 +62,18 @@ A full example of additional configuration is as follows:
```
{
"torchair_graph_config": {
"enabled": true,
"use_cached_graph": true,
"enabled": True,
"use_cached_graph": True,
"graph_batch_sizes": [1, 2, 4, 8],
"graph_batch_sizes_init": false,
"enable_multistream_moe": false,
"enable_kv_nz": false
"graph_batch_sizes_init": False,
"enable_multistream_moe": False,
"enable_kv_nz": False
},
"ascend_scheduler_config": {
"enabled": true,
"chunked_prefill_enabled": true,
"enabled": True,
"enable_chunked_prefill": True,
},
"expert_tensor_parallel_size": 1,
"refresh": false,
"refresh": False,
}
```
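The example block uses Python-style literals (`True`, `False`) and trailing commas, so it reads as a Python dict rather than strict JSON. As a minimal sketch, the same keys and values can be written as a dict and serialized with the standard `json` module when a JSON string is needed. The variable names `additional_config` and `additional_config_json` are chosen here for illustration only.

```python
import json

# Same keys and values as the documented example, written as a Python dict.
additional_config = {
    "torchair_graph_config": {
        "enabled": True,
        "use_cached_graph": True,
        "graph_batch_sizes": [1, 2, 4, 8],
        "graph_batch_sizes_init": False,
        "enable_multistream_moe": False,
        "enable_kv_nz": False,
    },
    "ascend_scheduler_config": {
        "enabled": True,
        "enable_chunked_prefill": True,
    },
    "expert_tensor_parallel_size": 1,
    "refresh": False,
}

# json.dumps emits lowercase true/false and never writes trailing commas,
# so the result is strict JSON that round-trips back to the same dict.
additional_config_json = json.dumps(additional_config)
print(additional_config_json)
```

The dict form is what the offline `LLM(..., additional_config=...)` call in these docs takes; the serialized string is only relevant where a JSON text is expected.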

@@ -47,14 +47,15 @@ from vllm import LLM
os.environ["VLLM_USE_V1"] = "1"  # environment values must be strings
model = LLM(model="deepseek-ai/DeepSeek-R1-0528", additional_config={"torchair_graph_config": {"enabled": True}})
# TorchAirGraph currently only works without chunked prefill
model = LLM(model="deepseek-ai/DeepSeek-R1-0528", additional_config={"torchair_graph_config": {"enabled": True},"ascend_scheduler_config": {"enabled": True,}})
outputs = model.generate("Hello, how are you?")
```
online example:
```shell
vllm serve Qwen/Qwen2-7B-Instruct --additional-config='{"torchair_graph_config": {"enabled": true}}'
vllm serve Qwen/Qwen2-7B-Instruct --additional-config='{"torchair_graph_config": {"enabled": True},"ascend_scheduler_config": {"enabled": True,}}'
```
You can find more detail about additional config [here](./additional_config.md)
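If the value passed to `--additional-config` is parsed as JSON (an assumption, not confirmed by this diff), then Python-style `True` and a trailing comma would be rejected by a strict parser. A hedged sketch of one way to avoid hand-writing the string is to build the command from a dict. The helper name `build_serve_command` is hypothetical and not part of vLLM.

```python
import json
import shlex


def build_serve_command(model: str, additional_config: dict) -> str:
    """Return a `vllm serve` command line with the config encoded as JSON.

    Hypothetical helper for illustration; it only formats a string.
    """
    config_json = json.dumps(additional_config)
    # shlex.quote wraps the JSON in single quotes so the shell passes it
    # through as one argument.
    return (
        f"vllm serve {shlex.quote(model)} "
        f"--additional-config={shlex.quote(config_json)}"
    )


config = {
    "torchair_graph_config": {"enabled": True},
    "ascend_scheduler_config": {"enabled": True},
}
command = build_serve_command("Qwen/Qwen2-7B-Instruct", config)
print(command)
```

Generating the argument this way keeps the offline and online examples in sync, since both start from the same Python dict.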