diff --git a/docs/backend/backend.md b/docs/backend/backend.md
index 3692d7217..546695178 100644
--- a/docs/backend/backend.md
+++ b/docs/backend/backend.md
@@ -20,7 +20,7 @@ curl http://localhost:30000/generate \
   }'
 ```
 
-Learn more about the argument specification, streaming, and multi-modal support [here](https://sgl-project.github.io/sampling_params.html).
+Learn more about the argument specification, streaming, and multi-modal support [here](https://sgl-project.github.io/references/sampling_params.html).
 
 ## OpenAI Compatible API
 In addition, the server supports OpenAI-compatible APIs.
@@ -74,7 +74,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 ```
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --mem-fraction-static 0.7
 ```
-- See [hyperparameter tuning](https://sgl-project.github.io/hyperparameter_tuning.html) on tuning hyperparameters for better performance.
+- See [hyperparameter tuning](https://sgl-project.github.io/references/hyperparameter_tuning.html) on tuning hyperparameters for better performance.
 - If you see out-of-memory errors during prefill for long prompts, try to set a smaller chunked prefill size.
 ```
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --chunked-prefill-size 4096
 ```
@@ -161,7 +161,7 @@ You can view the full example [here](https://github.com/sgl-project/sglang/tree/
 - gte-Qwen2
   - `python -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-7B-instruct --is-embedding`
 
-Instructions for supporting a new model are [here](https://sgl-project.github.io/model_support.html).
+Instructions for supporting a new model are [here](https://sgl-project.github.io/references/model_support.html).
 
 ### Use Models From ModelScope
diff --git a/docs/references/custom_chat_template.md b/docs/references/custom_chat_template.md
index 0a5225da2..64b33a0a4 100644
--- a/docs/references/custom_chat_template.md
+++ b/docs/references/custom_chat_template.md
@@ -1,5 +1,3 @@
-.. _custom-chat-template:
-
 # Custom Chat Template in SGLang Runtime
 
 **NOTE**: There are two chat template systems in SGLang project. This document is about setting a custom chat template for the OpenAI-compatible API server (defined at [conversation.py](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/conversation.py)). It is NOT related to the chat template used in the SGLang language frontend (defined at [chat_template.py](https://github.com/sgl-project/sglang/blob/main/python/sglang/lang/chat_template.py)).
diff --git a/docs/references/faq.md b/docs/references/faq.md
index 5a87ba3d8..eb43ee662 100644
--- a/docs/references/faq.md
+++ b/docs/references/faq.md
@@ -1,5 +1,3 @@
-Here’s the text with corrected grammar and refined phrasing in U.S. English:
-
 # Frequently Asked Questions
 
 ## The results are not deterministic, even with a temperature of 0
@@ -14,4 +12,4 @@ We are still investigating the root causes and potential solutions. In the short
 We have two issues to track our progress:
 
 - The deterministic mode is tracked at [https://github.com/sgl-project/sglang/issues/1729](https://github.com/sgl-project/sglang/issues/1729).
-- The per-request random seed is tracked at [https://github.com/sgl-project/sglang/issues/1335](https://github.com/sgl-project/sglang/issues/1335).
\ No newline at end of file
+- The per-request random seed is tracked at [https://github.com/sgl-project/sglang/issues/1335](https://github.com/sgl-project/sglang/issues/1335).
diff --git a/docs/references/sampling_params.md b/docs/references/sampling_params.md
index 062e0c99b..78d5193c2 100644
--- a/docs/references/sampling_params.md
+++ b/docs/references/sampling_params.md
@@ -1,5 +1,3 @@
-.. _sampling-parameters:
-
 # Sampling Parameters in SGLang Runtime
 
 This doc describes the sampling parameters of the SGLang Runtime. It is the low-level endpoint of the runtime.
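For readers following the backend.md hunk above, here is a minimal sketch of the native `/generate` endpoint that the relocated sampling-parameters page documents. It assumes a server already running locally on the default port 30000; the prompt text and parameter values are illustrative, not part of the diff.

```python
# Minimal sketch of the native /generate endpoint shown in backend.md.
# Assumes a local server launched beforehand, e.g.:
#   python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
import requests

response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {
            "temperature": 0,    # greedy decoding; see the faq.md hunk on determinism
            "max_new_tokens": 32,
        },
    },
)
print(response.json()["text"])
```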
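The same backend.md section advertises an OpenAI-compatible API. A sketch of calling it with the official `openai` client follows; the `base_url`, the placeholder `api_key`, and the message content are assumptions for a local deployment with no auth configured.

```python
# Sketch of the OpenAI-compatible chat endpoint, assuming a local server
# on port 30000; "EMPTY" is a placeholder key, not a real credential.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    temperature=0,
    max_tokens=64,
)
print(response.choices[0].message.content)
```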
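Finally, for the embedding launch command in the third backend.md hunk (`--is-embedding` with gte-Qwen2), a sketch of retrieving embeddings through the same OpenAI-compatible surface; that the server exposes `/v1/embeddings` for this model is an assumption to verify against the linked docs.

```python
# Sketch: embeddings via the OpenAI-compatible endpoint, assuming the server
# was launched with --is-embedding as in the backend.md hunk above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
result = client.embeddings.create(
    model="Alibaba-NLP/gte-Qwen2-7B-instruct",
    input="SGLang is a fast serving framework.",
)
print(len(result.data[0].embedding))  # dimensionality of the returned vector
```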