Update Readme (#660)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
@@ -1,4 +1,4 @@
-## Benchmark Results
+# Benchmark Results
 
 We tested our system on the following common LLM workloads and reported the achieved throughput:
 - **[MMLU](https://arxiv.org/abs/2009.03300)**: A 5-shot, multi-choice, multi-task benchmark.
docs/custom_chat_template.md (new file, 28 lines)
@@ -0,0 +1,28 @@
+# Custom Chat Template in SGLang Runtime
+
+By default, the server uses the chat template specified in the model tokenizer from Hugging Face. It should just work for most official models such as Llama-2/Llama-3.
+
+If needed, you can also override the chat template when launching the server:
+
+```
+python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template llama-2
+```
+
+If the chat template you are looking for is missing, you are welcome to contribute it.
+Meanwhile, you can also temporarily register your chat template as follows:
+
+```json
+{
+  "name": "my_model",
+  "system": "<|im_start|>system",
+  "user": "<|im_start|>user",
+  "assistant": "<|im_start|>assistant",
+  "sep_style": "CHATML",
+  "sep": "<|im_end|>",
+  "stop_str": ["<|im_end|>", "<|im_start|>"]
+}
+```
+
+```
+python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template ./my_model_template.json
+```
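As an aside on the JSON template above: the `sep_style: "CHATML"` entry means each message is wrapped in the role prefix and the `sep` token. The sketch below is only an illustration of that ChatML-style rendering, assuming a hypothetical `render` helper; SGLang's actual template-expansion code may differ in details such as newline placement.

```python
# Illustrative ChatML-style rendering, mirroring the JSON template fields above.
# `render` is a hypothetical helper for illustration, not an SGLang API.
template = {
    "system": "<|im_start|>system",
    "user": "<|im_start|>user",
    "assistant": "<|im_start|>assistant",
    "sep": "<|im_end|>",
}

def render(messages):
    # Each (role, content) pair becomes: <role prefix>\n<content><sep>\n
    parts = []
    for role, content in messages:
        parts.append(f"{template[role]}\n{content}{template['sep']}\n")
    # End with an open assistant prefix for the model to complete.
    parts.append(f"{template['assistant']}\n")
    return "".join(parts)

prompt = render([("system", "You are helpful."), ("user", "Hi!")])
print(prompt)
```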
@@ -1,4 +1,4 @@
-## How to Support a New Model
+# How to Support a New Model
 
 To support a new model in SGLang, you only need to add a single file under the [SGLang Models Directory](https://github.com/sgl-project/sglang/tree/main/python/sglang/srt/models). You can learn from existing model implementations and create a new file for the new model. Most models are based on the transformer architecture, making them very similar.
@@ -1,4 +1,4 @@
-## Sampling Parameters of SGLang Runtime
+# Sampling Parameters in SGLang Runtime
 
 This doc describes the sampling parameters of the SGLang Runtime.
 
 The `/generate` endpoint accepts the following arguments in the JSON format.
@@ -6,11 +6,11 @@ The `/generate` endpoint accepts the following arguments in the JSON format.
 ```python
 @dataclass
 class GenerateReqInput:
-    # The input prompt
+    # The input prompt. It can be a single prompt or a batch of prompts.
     text: Union[List[str], str]
     # The token ids for text; one can either specify text or input_ids
    input_ids: Optional[Union[List[List[int]], List[int]]] = None
-    # The image input
+    # The image input. It can be a file name.
     image_data: Optional[Union[List[str], str]] = None
     # The sampling_params
     sampling_params: Union[List[Dict], Dict] = None
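Each field of `GenerateReqInput` maps directly onto a key of the JSON body sent to `/generate`. A minimal sketch of such a request body follows; the prompt text and the particular sampling-parameter values are hypothetical examples, and the localhost URL in the comment assumes a server launched on port 30000 as in the commands above.

```python
# Sketch of a /generate request body matching the GenerateReqInput fields above.
# The prompt and sampling-parameter values are illustrative, not prescribed.
payload = {
    "text": "The capital of France is",  # a single prompt (use a list for a batch)
    "sampling_params": {
        "max_new_tokens": 16,
        "temperature": 0.7,
    },
}

# With a server running, this dict could be sent as JSON, e.g.:
#   import requests
#   resp = requests.post("http://localhost:30000/generate", json=payload)
#   print(resp.json())
print(payload["text"])
```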
@@ -1,4 +1,4 @@
-## SRT Unit Tests
+# SRT Unit Tests
 
 ### Latency Alignment
 Make sure your changes do not slow down the following benchmarks