See the [RFC](https://github.com/vllm-project/vllm-ascend/issues/3328) for more details.

Adds a dynamic batch feature to the chunked prefill strategy: the token budget can be refined to achieve better effective throughput and TPOT.

NOTE: Only 910B3 is supported so far; further improvements are in progress. An additional file for the lookup table is required.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: Cheng Wang <wangchengkyrie@outlook.com>
# Feature Guide

This section provides a detailed usage guide for vLLM Ascend features.

:::{toctree}
:caption: Feature Guide
:maxdepth: 1
graph_mode
quantization
sleep_mode
structured_output
lora
eplb_swift_balancer
dynamic_batch
:::