xc-llm-ascend

Files

chris668899 6c020883a8 [WIP]Add Func: aclgraph_batch_size auto-adjust to different model (#771)

### What this PR does / why we need it?
This PR adds a new feature: aclgraph_batch_sizes are now adjusted
automatically for each model. Before this PR, the aclgraph_batch_sizes
passed from vLLM to vllm-ascend were always too large, which could cause
an error when running different models, with the message: "The resources
are insufficient".
With this PR, the code dynamically adjusts aclgraph_batch_sizes based on
the model's number of hidden layers (hidden_layer_nums) and the parallel
config. For example:
a. for Qwen2.5-7B, aclgraph_batch_sizes has 33 entries in total;
b. for Qwen2.5-72B, aclgraph_batch_sizes has 11 entries in total.
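
The adjustment described above can be illustrated with a minimal Python sketch. It is not the code from this PR: the budget constant `HYPOTHETICAL_CAPTURE_BUDGET`, the assumed per-graph cost of `num_hidden_layers + 1` units, the parallel multiplier, and the helper names are all hypothetical. The idea is to cap the number of captured batch sizes by a resource budget that shrinks as the model gets deeper or more parallel dimensions are enabled, then sample the original sizes evenly while keeping the smallest and largest.

```python
import math

# Hypothetical resource budget for graph capture on one device. The real
# limit, and how the PR accounts for it, are assumptions here.
HYPOTHETICAL_CAPTURE_BUDGET = 1920


def hypothetical_parallel_factor(tp_size=1, dp_size=1, ep_size=1):
    """Assumed cost multiplier: one extra unit per enabled parallel dimension."""
    return 1 + sum(size > 1 for size in (tp_size, dp_size, ep_size))


def adjust_graph_batch_sizes(candidate_sizes, num_hidden_layers,
                             parallel_factor,
                             budget=HYPOTHETICAL_CAPTURE_BUDGET):
    """Shrink a sorted list of graph batch sizes so it fits the budget.

    Assumes each captured graph costs about (num_hidden_layers + 1) units,
    scaled by parallel_factor.
    """
    max_num_sizes = math.floor(budget / (num_hidden_layers + 1) / parallel_factor)
    if max_num_sizes >= len(candidate_sizes):
        return list(candidate_sizes)  # Everything fits: keep all sizes.
    if max_num_sizes < 2:
        # Too small to span a range; keep at most the smallest size.
        return list(candidate_sizes[:max_num_sizes])
    # Sample evenly. step > 1 here, so the rounded indices are distinct, and
    # the first and last indices map to the smallest and largest sizes.
    step = (len(candidate_sizes) - 1) / (max_num_sizes - 1)
    indices = [round(i * step) for i in range(max_num_sizes)]
    return [candidate_sizes[i] for i in indices]


# Example input shaped like vLLM's default capture sizes (assumed).
sizes = [1, 2, 4] + list(range(8, 513, 8))
factor = hypothetical_parallel_factor(tp_size=2)
shallow = adjust_graph_batch_sizes(sizes, num_hidden_layers=28, parallel_factor=factor)
deep = adjust_graph_batch_sizes(sizes, num_hidden_layers=80, parallel_factor=factor)
print(len(sizes), len(shallow), len(deep))  # → 67 33 11
```

Sampling evenly instead of truncating the list keeps both small and large batch sizes available, so a batch can presumably still be padded up to a nearby captured size rather than falling back to eager execution for every large batch.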

Signed-off-by: chris668899 <15105191595@126.com>

2025-05-08 16:23:33 +08:00

compile

support aclgraph (#426)

2025-04-23 20:56:24 +08:00

multicard

[WIP]Add Func: aclgraph_batch_size auto-adjust to different model (#771)

2025-05-08 16:23:33 +08:00

ops

[MISC] Clean up torch_npu (#688)

2025-04-29 18:03:38 +08:00

scheduler

[BugFix] Fix scheduler problems in last PR. (#558)