Update Readme (#660)

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Ying Sheng
2024-07-19 09:54:01 -07:00
committed by GitHub
parent dc4e4a6acc
commit 51fda1439f
25 changed files with 200 additions and 185 deletions

@@ -1,4 +1,4 @@
-## Benchmark Results
+# Benchmark Results
We tested our system on the following common LLM workloads and report the achieved throughput:
- **[MMLU](https://arxiv.org/abs/2009.03300)**: A 5-shot, multiple-choice, multi-task benchmark.

@@ -0,0 +1,28 @@
# Custom Chat Template in SGLang Runtime
By default, the server uses the chat template specified in the model's Hugging Face tokenizer. This works out of the box for most official models, such as Llama-2 and Llama-3.
If needed, you can also override the chat template when launching the server:
```
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template llama-2
```
If the chat template you are looking for is missing, you are welcome to contribute it.
In the meantime, you can register your own chat template by saving it as a JSON file and passing the file path to `--chat-template`:
```json
{
"name": "my_model",
"system": "<|im_start|>system",
"user": "<|im_start|>user",
"assistant": "<|im_start|>assistant",
"sep_style": "CHATML",
"sep": "<|im_end|>",
"stop_str": ["<|im_end|>", "<|im_start|>"]
}
```
```
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template ./my_model_template.json
```
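The two steps above (save the template as JSON, then pass its path to the server) can be sketched in Python. This is a minimal illustration, not part of SGLang: the helper names `write_template` and `launch_command` are hypothetical, and treating every key from the example as required is an assumption. The command-line flags are the ones shown above.

```python
import json
from pathlib import Path
from typing import List

# Template fields copied from the JSON example above.
TEMPLATE = {
    "name": "my_model",
    "system": "<|im_start|>system",
    "user": "<|im_start|>user",
    "assistant": "<|im_start|>assistant",
    "sep_style": "CHATML",
    "sep": "<|im_end|>",
    "stop_str": ["<|im_end|>", "<|im_start|>"],
}

# Assumption: every key that appears in the example is required.
EXPECTED_KEYS = set(TEMPLATE)


def write_template(template: dict, path: str) -> Path:
    """Check the template against the example's keys and write it as JSON."""
    missing = EXPECTED_KEYS - template.keys()
    if missing:
        raise ValueError(f"template is missing keys: {sorted(missing)}")
    out = Path(path)
    out.write_text(json.dumps(template, indent=2), encoding="utf-8")
    return out


def launch_command(model_path: str, template_path: str, port: int = 30000) -> List[str]:
    """Build the launch command shown above as an argument list."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--port", str(port),
        "--chat-template", template_path,
    ]
```

For example, `launch_command("meta-llama/Llama-2-7b-chat-hf", "./my_model_template.json")` yields an argument list that could be handed to `subprocess.run` once the file has been written with `write_template`.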

@@ -1,4 +1,4 @@
-## How to Support a New Model
+# How to Support a New Model
To support a new model in SGLang, you only need to add a single file under the [SGLang Models Directory](https://github.com/sgl-project/sglang/tree/main/python/sglang/srt/models). You can use the existing model implementations as a starting point when creating the file for a new model. Most models are based on the transformer architecture, so their implementations are very similar.

@@ -1,4 +1,4 @@
-## Sampling Parameters of SGLang Runtime
+# Sampling Parameters in SGLang Runtime
This doc describes the sampling parameters of the SGLang Runtime.
The `/generate` endpoint accepts the following arguments in JSON format.
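As a rough sketch of what a call to this endpoint might look like, the snippet below builds a JSON `POST` request with Python's standard library. The `text` and `sampling_params` fields are those of `GenerateReqInput`; the base URL `http://localhost:30000`, the helper name `build_generate_request`, and the sampling keys in the usage note (`temperature`, `max_new_tokens`) are assumptions for illustration and are not defined in this section.

```python
import json
import urllib.request
from typing import Any, Dict, List, Union


def build_generate_request(
    text: Union[str, List[str]],
    sampling_params: Union[Dict[str, Any], List[Dict[str, Any]]],
    base_url: str = "http://localhost:30000",  # assumed server address
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /generate endpoint."""
    payload = {"text": text, "sampling_params": sampling_params}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> Any:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With a server running, `send(build_generate_request("Once upon a time,", {"temperature": 0, "max_new_tokens": 16}))` would submit a single prompt; passing a list of strings as `text` would submit a batch.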
@@ -6,11 +6,11 @@ The `/generate` endpoint accepts the following arguments in the JSON format.
```python
 @dataclass
 class GenerateReqInput:
-    # The input prompt
+    # The input prompt. It can be a single prompt or a batch of prompts.
     text: Union[List[str], str]
     # The token ids for text; one can either specify text or input_ids
     input_ids: Optional[Union[List[List[int]], List[int]]] = None
-    # The image input
+    # The image input. It can be a file name.
     image_data: Optional[Union[List[str], str]] = None
     # The sampling_params
     sampling_params: Union[List[Dict], Dict] = None

@@ -1,4 +1,4 @@
-## SRT Unit Tests
+# SRT Unit Tests
### Latency Alignment
Make sure your changes do not slow down the following benchmarks: