NeverRaR
c7f1c59911
feat: support compile multiple batch graph (#1085)
### What this PR does / why we need it?
Support compiling multiple batch graphs, each with a different code object, to
avoid cache invalidation.
### How was this patch tested?
```
export VLLM_ENABLE_MC2=0
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
--quantization ascend \
--served-model-name auto \
--trust-remote-code \
--distributed-executor-backend=mp \
--port 8006 \
-tp=8 \
-dp=2 \
--no-enforce-eager \
--max-num-seqs 24 \
--max-model-len 32768 \
--max-num-batched-tokens 32768 \
--block-size 128 \
--no-enable-prefix-caching \
--additional-config '{"torchair_graph_config": {"enabled": true,"use_cached_graph": true,"graph_batch_sizes": [8,16,24]},"ascend_scheduler_config": {"enabled":true,"chunked_prefill_enabled":false},"expert_tensor_parallel_size":16}' \
--gpu-memory-utilization 0.95 &> run.log &
disown
```
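
The `--additional-config` value in the command above is a single-line JSON string, which is easy to break when editing by hand. As a convenience sketch (not part of this PR), the same value can be generated from a Python dict and shell-quoted. The keys and values are copied from the command above; the variable names are illustrative.

```python
import json
import shlex

# Keys and values copied from the --additional-config argument above.
additional_config = {
    "torchair_graph_config": {
        "enabled": True,
        "use_cached_graph": True,
        "graph_batch_sizes": [8, 16, 24],
    },
    "ascend_scheduler_config": {
        "enabled": True,
        "chunked_prefill_enabled": False,
    },
    "expert_tensor_parallel_size": 16,
}

# Compact JSON, then quote it so the shell passes it as one argument.
config_json = json.dumps(additional_config, separators=(",", ":"))
cli_arg = "--additional-config " + shlex.quote(config_json)

if __name__ == "__main__":
    print(cli_arg)
```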
Signed-off-by: boying <897013703@qq.com>
2025-06-06 20:17:51 +08:00