[Feat] Dynamic Batch Feature (#3490)

See the [RFC](https://github.com/vllm-project/vllm-ascend/issues/3328)
for more details.
Add a dynamic batch feature to the chunked prefill strategy: the token
budget is refined dynamically to achieve better effective throughput
and TPOT.

!!! NOTE: Only the 910B3 is supported so far; further improvements are
in progress.
An additional lookup-table file is required.
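
As a rough illustration of the idea only (not the code added in this commit), the sketch below picks a per-step token budget from a lookup table and then splits it between decodes and prefill chunks. The table contents, its keying by the number of running decode requests, and the names `BUDGET_TABLE`, `DEFAULT_BUDGET`, `pick_token_budget`, and `schedule_step` are all hypothetical; the real lookup-table file may be shaped quite differently.

```python
# Hypothetical lookup table: (max running decode requests, token budget).
# The values and the keying are assumptions made for illustration only.
BUDGET_TABLE = [(8, 4096), (32, 2048), (128, 1024)]
DEFAULT_BUDGET = 512  # hypothetical fallback when the table has no match


def pick_token_budget(num_decodes: int) -> int:
    """Return the token budget for this step from the hypothetical table."""
    for max_decodes, budget in BUDGET_TABLE:
        if num_decodes <= max_decodes:
            return budget
    return DEFAULT_BUDGET


def schedule_step(num_decodes: int, pending_prefill: list[int]) -> list[int]:
    """Chunked prefill: each decode takes one token, prefills share the rest.

    Returns the number of prefill tokens scheduled for each pending request.
    """
    budget = pick_token_budget(num_decodes)
    remaining = max(budget - num_decodes, 0)
    chunks = []
    for tokens_left in pending_prefill:
        chunk = min(tokens_left, remaining)  # never exceed what is left
        chunks.append(chunk)
        remaining -= chunk
    return chunks
```

With these made-up numbers, a step with few running decodes gets a large budget and can absorb long prefill chunks, while a step with many decodes gets a smaller budget, which is the kind of trade-off between throughput and TPOT that the description refers to.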

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: Cheng Wang <wangchengkyrie@outlook.com>
KyrieWang
2025-10-22 14:13:32 +08:00
committed by GitHub
parent c18ca62a17
commit 60e2be1b36
10 changed files with 1368 additions and 1 deletions


@@ -11,4 +11,5 @@ sleep_mode
structured_output
lora
eplb_swift_balancer
dynamic_batch
:::