[CI/UT][Graph] Add ut for torchair graph mode (#1103)
### What this PR does / why we need it?
Add ut for torchair graph mode on DeepSeekV3

### How was this patch tested?
CI passed with newly added test.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
The details of each config option are as follows:

| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
| `enabled` | bool | `False` | Whether to enable ascend scheduler for V1 engine |
`ascend_scheduler_config` also supports the options from [vllm scheduler config](https://docs.vllm.ai/en/stable/api/vllm/config.html#vllm.config.SchedulerConfig). For example, you can add `enable_chunked_prefill: True` to `ascend_scheduler_config` as well.
### Example
A full example of additional configuration is as follows:

```
{
    "torchair_graph_config": {
        "enabled": True,
        "use_cached_graph": True,
        "graph_batch_sizes": [1, 2, 4, 8],
        "graph_batch_sizes_init": False,
        "enable_multistream_moe": False,
        "enable_kv_nz": False
    },
    "ascend_scheduler_config": {
        "enabled": True,
        "enable_chunked_prefill": True
    },
    "expert_tensor_parallel_size": 1,
    "refresh": False
}
```
offline example:

```python
import os

from vllm import LLM

# Environment variable values must be strings
os.environ["VLLM_USE_V1"] = "1"

# TorchAirGraph only works without chunked-prefill now
model = LLM(model="deepseek-ai/DeepSeek-R1-0528", additional_config={"torchair_graph_config": {"enabled": True}, "ascend_scheduler_config": {"enabled": True}})
outputs = model.generate("Hello, how are you?")
```
online example:
```shell
vllm serve Qwen/Qwen2-7B-Instruct --additional-config='{"torchair_graph_config": {"enabled": true},"ascend_scheduler_config": {"enabled": true}}'
```
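Note that the `--additional-config` value on the command line is parsed as JSON, where booleans must be the lowercase literals `true`/`false` (unlike the Python `True`/`False` used when passing a dict to `LLM` directly). A minimal sketch using only the standard `json` module to check that such a string parses:

```python
import json

# The same config string as passed to --additional-config above;
# JSON requires lowercase true/false for booleans.
raw = '{"torchair_graph_config": {"enabled": true},"ascend_scheduler_config": {"enabled": true}}'
cfg = json.loads(raw)

# Once parsed, the values are ordinary Python bools.
print(cfg["torchair_graph_config"]["enabled"])
print(cfg["ascend_scheduler_config"]["enabled"])
```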
You can find more details about additional config [here](./additional_config.md).