From 1ae270c5d0873c0bcd02b9078e3a6bd0f12fbc1d Mon Sep 17 00:00:00 2001
From: Lianmin Zheng <lianminzheng@gmail.com>
Date: Thu, 7 Nov 2024 18:20:41 -0800
Subject: [PATCH] [Doc] fix docs (#1949)

---
 docs/{references => frontend}/choices_methods.md | 0
 docs/index.rst                                   | 4 ++--
 docs/references/hyperparameter_tuning.md         | 6 +++---
 docs/references/troubleshooting.md               | 6 +++---
 4 files changed, 8 insertions(+), 8 deletions(-)
 rename docs/{references => frontend}/choices_methods.md (100%)

diff --git a/docs/references/choices_methods.md b/docs/frontend/choices_methods.md
similarity index 100%
rename from docs/references/choices_methods.md
rename to docs/frontend/choices_methods.md
diff --git a/docs/index.rst b/docs/index.rst
index 130b29811..e81cdd149 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -36,17 +36,17 @@ The core features include:
    :caption: Frontend Tutorial
 
    frontend/frontend.md
+   frontend/choices_methods.md
 
 
 .. toctree::
    :maxdepth: 1
    :caption: References
 
+   references/supported_models.md
    references/sampling_params.md
    references/hyperparameter_tuning.md
-   references/supported_models.md
    references/benchmark_and_profiling.md
-   references/choices_methods.md
    references/custom_chat_template.md
    references/contributor_guide.md
    references/troubleshooting.md
diff --git a/docs/references/hyperparameter_tuning.md b/docs/references/hyperparameter_tuning.md
index 89faa479b..499b81bc0 100644
--- a/docs/references/hyperparameter_tuning.md
+++ b/docs/references/hyperparameter_tuning.md
@@ -26,9 +26,9 @@ Data parallelism is better for throughput. When there is enough GPU memory, alwa
 
 ### Avoid out-of-memory by Tuning `--chunked-prefill-size`, `--mem-fraction-static`, `--max-running-requests`
 If you see out of memory (OOM) errors, you can try to tune the following parameters.
-If OOM happens during prefill, try to decrease `--chunked-prefill-size` to `4096` or `2048`.
-If OOM happens during decoding, try to decrease `--max-running-requests`.
-You can also try to decrease `--mem-fraction-static`, which reduces the memory usage of the KV cache memory pool and helps both prefill and decoding.
+- If OOM happens during prefill, try to decrease `--chunked-prefill-size` to `4096` or `2048`.
+- If OOM happens during decoding, try to decrease `--max-running-requests`.
+- You can also try to decrease `--mem-fraction-static`, which reduces the memory usage of the KV cache memory pool and helps both prefill and decoding.
 
 ### Try Advanced Options
 - To enable the experimental overlapped scheduler, add `--enable-overlap-scheduler`. It overlaps CPU scheduler with GPU computation and can accelerate almost all workloads. This does not work for constrained decoding currenly.
diff --git a/docs/references/troubleshooting.md b/docs/references/troubleshooting.md
index becb186df..8442bb205 100644
--- a/docs/references/troubleshooting.md
+++ b/docs/references/troubleshooting.md
@@ -4,9 +4,9 @@ This page lists some common errors and tips for fixing them.
 
 ## CUDA out of memory
 If you see out of memory (OOM) errors, you can try to tune the following parameters.
-If OOM happens during prefill, try to decrease `--chunked-prefill-size` to `4096` or `2048`.
-If OOM happens during decoding, try to decrease `--max-running-requests`.
-You can also try to decrease `--mem-fraction-static`, which reduces the memory usage of the KV cache memory pool and helps both prefill and decoding.
+- If OOM happens during prefill, try to decrease `--chunked-prefill-size` to `4096` or `2048`.
+- If OOM happens during decoding, try to decrease `--max-running-requests`.
+- You can also try to decrease `--mem-fraction-static`, which reduces the memory usage of the KV cache memory pool and helps both prefill and decoding.
 
 ## CUDA error: an illegal memory access was encountered
 This error may be due to kernel errors or out-of-memory issues.