[Feat] Dynamic Batch Feature (#3490)
See the [RFC](https://github.com/vllm-project/vllm-ascend/issues/3328) for more details.

Add a dynamic batch feature to the chunked prefill strategy: the token budget can be refined to achieve better effective throughput and TPOT.

!!! NOTE: only 910B3 is supported so far; we are working on further improvements. An additional lookup-table file is required.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: Cheng Wang <wangchengkyrie@outlook.com>
@@ -8,11 +8,13 @@ pip
pybind11
pyyaml
scipy
pandas
setuptools>=64
setuptools-scm>=8
torch>=2.7.1
torchvision
wheel
pandas-stubs

# requirements for disaggregated prefill
msgpack