support non-streaming benchmark (#682)

This commit is contained in:
Lianmin Zheng
2024-07-20 18:36:42 -07:00
committed by GitHub
parent caaad53b52
commit 77e592e8e0
3 changed files with 16 additions and 4 deletions


@@ -154,6 +154,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-1:50000 --nnodes 2 --node-rank 1
```
- If the model does not have a template in the Hugging Face tokenizer, you can specify a [custom chat template](docs/custom_chat_template.md).
- To enable fp8 quantization, you can add `--quantization fp8` on an fp16 checkpoint or directly load an fp8 checkpoint without specifying any arguments.
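As a sketch, the fp8 flag mentioned above combines with the launch command shown earlier (model path reused from the examples; adjust to your checkpoint):

```shell
# Enable fp8 quantization when serving an fp16 checkpoint
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --quantization fp8
```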
### Supported Models