Update Readme (#660)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
@@ -1,4 +1,4 @@
-## Benchmark Results
+# Benchmark Results
 
 We tested our system on the following common LLM workloads and reported the achieved throughput:
 - **[MMLU](https://arxiv.org/abs/2009.03300)**: A 5-shot, multi-choice, multi-task benchmark.
docs/custom_chat_template.md (new file, 28 lines)
@@ -0,0 +1,28 @@
+# Custom Chat Template in SGLang Runtime
+
+By default, the server uses the chat template specified in the model tokenizer from Hugging Face. It should just work for most official models such as Llama-2/Llama-3.
+
+If needed, you can also override the chat template when launching the server:
+
+```
+python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template llama-2
+```
+
+If the chat template you are looking for is missing, you are welcome to contribute it.
+Meanwhile, you can also temporarily register your chat template as follows:
+
+```json
+{
+  "name": "my_model",
+  "system": "<|im_start|>system",
+  "user": "<|im_start|>user",
+  "assistant": "<|im_start|>assistant",
+  "sep_style": "CHATML",
+  "sep": "<|im_end|>",
+  "stop_str": ["<|im_end|>", "<|im_start|>"]
+}
+```
+
+```
+python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template ./my_model_template.json
+```
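As an aside on the JSON template above: the `sep_style: "CHATML"` entry means each message is wrapped in the role prefix and the `sep` token. The sketch below is only an illustration of that ChatML-style rendering, assuming a hypothetical `render` helper; SGLang's actual template-expansion code may differ in details such as newline placement.

```python
# Illustrative ChatML-style rendering, mirroring the JSON template fields above.
# `render` is a hypothetical helper for illustration, not an SGLang API.
template = {
    "system": "<|im_start|>system",
    "user": "<|im_start|>user",
    "assistant": "<|im_start|>assistant",
    "sep": "<|im_end|>",
}

def render(messages):
    # Each (role, content) pair becomes: <role prefix>\n<content><sep>\n
    parts = []
    for role, content in messages:
        parts.append(f"{template[role]}\n{content}{template['sep']}\n")
    # End with an open assistant prefix for the model to complete.
    parts.append(f"{template['assistant']}\n")
    return "".join(parts)

prompt = render([("system", "You are helpful."), ("user", "Hi!")])
print(prompt)
```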
@@ -1,4 +1,4 @@
-## How to Support a New Model
+# How to Support a New Model
 
 To support a new model in SGLang, you only need to add a single file under the [SGLang Models Directory](https://github.com/sgl-project/sglang/tree/main/python/sglang/srt/models). You can learn from existing model implementations and create a new file for the new model. Most models are based on the transformer architecture, making them very similar.
@@ -1,4 +1,4 @@
-## Sampling Parameters of SGLang Runtime
+# Sampling Parameters in SGLang Runtime
 
 This doc describes the sampling parameters of the SGLang Runtime.
 
 The `/generate` endpoint accepts the following arguments in the JSON format.
@@ -6,11 +6,11 @@ The `/generate` endpoint accepts the following arguments in the JSON format.
 ```python
 @dataclass
 class GenerateReqInput:
-    # The input prompt
+    # The input prompt. It can be a single prompt or a batch of prompts.
     text: Union[List[str], str]
     # The token ids for text; one can either specify text or input_ids
    input_ids: Optional[Union[List[List[int]], List[int]]] = None
-    # The image input
+    # The image input. It can be a file name.
     image_data: Optional[Union[List[str], str]] = None
     # The sampling_params
     sampling_params: Union[List[Dict], Dict] = None
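Each field of `GenerateReqInput` maps directly onto a key of the JSON body sent to `/generate`. A minimal sketch of such a request body follows; the prompt text and the particular sampling-parameter values are hypothetical examples, and the localhost URL in the comment assumes a server launched on port 30000 as in the commands above.

```python
# Sketch of a /generate request body matching the GenerateReqInput fields above.
# The prompt and sampling-parameter values are illustrative, not prescribed.
payload = {
    "text": "The capital of France is",  # a single prompt (use a list for a batch)
    "sampling_params": {
        "max_new_tokens": 16,
        "temperature": 0.7,
    },
}

# With a server running, this dict could be sent as JSON, e.g.:
#   import requests
#   resp = requests.post("http://localhost:30000/generate", json=payload)
#   print(resp.json())
print(payload["text"])
```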
@@ -1,4 +1,4 @@
-## SRT Unit Tests
+# SRT Unit Tests
 
 ### Latency Alignment
 Make sure your changes do not slow down the following benchmarks