Simplify batch update (#2154)

Lianmin Zheng
2024-11-24 04:47:10 -08:00
committed by GitHub
parent d90c3d6b8b
commit c211e7b669
7 changed files with 47 additions and 46 deletions


@@ -31,7 +31,6 @@ If you see out of memory (OOM) errors, you can try to tune the following paramet
- You can also try to decrease `--mem-fraction-static`, which shrinks the KV cache memory pool and helps both prefill and decoding.
### Try Advanced Options
- To enable the experimental overlapped scheduler, add `--enable-overlap-schedule`. It overlaps the CPU scheduler with GPU computation and can accelerate almost all workloads. It does not currently work with constrained decoding.
- To enable torch.compile acceleration, add `--enable-torch-compile`. It accelerates small models at small batch sizes. It does not currently work with FP8.
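The options above can be combined into a single launch command. The sketch below is a hypothetical helper (`build_launch_args` is not part of SGLang) that assembles an argument list for `python -m sglang.launch_server` and refuses the two combinations the notes above call unsupported. It assumes FP8 is selected via `--quantization fp8`, and because constrained decoding is a per-request feature rather than a server flag, the caller must declare it through the hypothetical `uses_constrained_decoding` parameter. The model path is a placeholder.

```python
import shlex


def build_launch_args(
    model_path,
    mem_fraction_static=None,
    enable_overlap_schedule=False,
    enable_torch_compile=False,
    quantization=None,
    uses_constrained_decoding=False,  # hypothetical: declared by the caller
):
    """Hypothetical helper: assemble an sglang.launch_server command line."""
    # Reject the combinations the tuning notes describe as unsupported.
    if enable_overlap_schedule and uses_constrained_decoding:
        raise ValueError("--enable-overlap-schedule does not work with constrained decoding")
    if enable_torch_compile and quantization == "fp8":
        raise ValueError("--enable-torch-compile does not work with FP8")

    args = ["python", "-m", "sglang.launch_server", "--model-path", model_path]
    if mem_fraction_static is not None:
        # A smaller fraction shrinks the KV cache pool, which can help with OOM.
        args += ["--mem-fraction-static", str(mem_fraction_static)]
    if quantization is not None:
        args += ["--quantization", quantization]
    if enable_overlap_schedule:
        args.append("--enable-overlap-schedule")
    if enable_torch_compile:
        args.append("--enable-torch-compile")
    return args


if __name__ == "__main__":
    cmd = build_launch_args(
        "path/to/model",
        mem_fraction_static=0.8,
        enable_overlap_schedule=True,
        enable_torch_compile=True,
    )
    # Prints a shell-ready command line.
    print(shlex.join(cmd))
```

Validating flag combinations before launch means an unsupported pairing fails fast with a clear message instead of surfacing later at request time.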
### Tune `--schedule-policy`