[Feat] Dynamic Batch Feature (#3490)
See the [RFC](https://github.com/vllm-project/vllm-ascend/issues/3328) for more details.

Add a dynamic batch feature to the chunked prefill strategy: the token budget can be adjusted to achieve better effective throughput and TPOT.

NOTE: only the 910B3 is supported so far; further improvements are in progress. An additional lookup-table file is required.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: Cheng Wang <wangchengkyrie@outlook.com>
@@ -314,6 +314,14 @@ class NPUPlatform(Platform):
                 vllm_config.scheduler_config)
             vllm_config.scheduler_config = recompute_scheduler_config

+        # Extend original scheduler_config to use SchedulerDynamicBatch.
+        if ascend_config.SLO_limits_for_dynamic_batch != -1:
+            vllm_config.scheduler_config.scheduler_cls = (
+                "vllm_ascend.core.scheduler_dynamic_batch.SchedulerDynamicBatch"
+            )
+            vllm_config.scheduler_config.chunked_prefill_enabled = True
+            vllm_config.scheduler_config.SLO_limits_for_dynamic_batch = ascend_config.SLO_limits_for_dynamic_batch
+
     @classmethod
     def get_attn_backend_cls(
         cls,
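
The hunk above gates the scheduler swap on a sentinel value: `SLO_limits_for_dynamic_batch == -1` leaves the scheduler untouched, and any other value selects `SchedulerDynamicBatch` and forces chunked prefill on. A minimal, self-contained sketch of that gating logic follows. The `StubSchedulerConfig` and `StubAscendConfig` classes, the `apply_dynamic_batch` helper, and the example value `50` are hypothetical stand-ins for illustration only; they are not part of vLLM or vllm-ascend, and the unit of the SLO limit is not stated in this commit.

```python
from dataclasses import dataclass

# Class path taken verbatim from the diff above.
DYNAMIC_BATCH_SCHEDULER = (
    "vllm_ascend.core.scheduler_dynamic_batch.SchedulerDynamicBatch"
)


@dataclass
class StubSchedulerConfig:
    """Hypothetical stand-in for vllm_config.scheduler_config."""
    scheduler_cls: str = "default"
    chunked_prefill_enabled: bool = False
    SLO_limits_for_dynamic_batch: int = -1


@dataclass
class StubAscendConfig:
    """Hypothetical stand-in for ascend_config; -1 means 'feature off'."""
    SLO_limits_for_dynamic_batch: int = -1


def apply_dynamic_batch(ascend_config, scheduler_config):
    """Mirror the gating in the hunk: swap schedulers only when a limit is set."""
    if ascend_config.SLO_limits_for_dynamic_batch != -1:
        scheduler_config.scheduler_cls = DYNAMIC_BATCH_SCHEDULER
        # The dynamic batch scheduler builds on chunked prefill, so force it on.
        scheduler_config.chunked_prefill_enabled = True
        scheduler_config.SLO_limits_for_dynamic_batch = (
            ascend_config.SLO_limits_for_dynamic_batch
        )
    return scheduler_config


# Sentinel value: nothing changes.
disabled = apply_dynamic_batch(StubAscendConfig(), StubSchedulerConfig())

# Any other value (50 is arbitrary here) enables the dynamic batch scheduler.
enabled = apply_dynamic_batch(
    StubAscendConfig(SLO_limits_for_dynamic_batch=50), StubSchedulerConfig()
)
```

Under these assumptions, `disabled` keeps its default scheduler class and chunked-prefill setting, while `enabled` carries the dynamic batch scheduler path, `chunked_prefill_enabled = True`, and the copied limit.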