fix small typos in docs (#2047)

2024-11-16 03:09:10 +08:00
parent 32c9a7ec11
commit 023d0a73df
4 changed files with 6 additions and 6 deletions
--- a/docs/references/hyperparameter_tuning.md
+++ b/docs/references/hyperparameter_tuning.md
@@ -31,8 +31,8 @@ If you see out of memory (OOM) errors, you can try to tune the following paramet
 - You can also try to decrease `--mem-fraction-static`, which reduces the memory usage of the KV cache memory pool and helps both prefill and decoding.

 ### Try Advanced Options
- To enable the experimental overlapped scheduler, add `--enable-overlap-schedule`. It overlaps CPU scheduler with GPU computation and can accelerate almost all workloads. This does not work for constrained decoding currenly.
- To enable torch.compile acceleration, add `--enable-torch-compile`. It accelerates small models on small batch sizes. This does not work for FP8 currenly.
+- To enable the experimental overlapped scheduler, add `--enable-overlap-schedule`. It overlaps CPU scheduler with GPU computation and can accelerate almost all workloads. This does not work for constrained decoding currently.
+- To enable torch.compile acceleration, add `--enable-torch-compile`. It accelerates small models on small batch sizes. This does not work for FP8 currently.

 ### Tune `--schedule-policy`
 If the workload has many shared prefixes, use the default `--schedule-policy lpm`. `lpm` stands for longest prefix match.