Fix formatting in long code blocks (#10528)
committed by GitHub
parent 0abb41c70d
commit 7f028b07c4
@@ -8,7 +8,10 @@ It should just work for most official models such as Llama-2/Llama-3.
 If needed, you can also override the chat template when launching the server:

 ```bash
-python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template llama-2
+python -m sglang.launch_server \
+    --model-path meta-llama/Llama-2-7b-chat-hf \
+    --port 30000 \
+    --chat-template llama-2
 ```

 If the chat template you are looking for is missing, you are welcome to contribute it or load it from a file.
@@ -30,7 +33,10 @@ You can load the JSON format, which is defined by `conversation.py`.
 ```

 ```bash
-python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template ./my_model_template.json
+python -m sglang.launch_server \
+    --model-path meta-llama/Llama-2-7b-chat-hf \
+    --port 30000 \
+    --chat-template ./my_model_template.json
 ```

 ## Jinja Format
@@ -38,5 +44,8 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
 You can also use the [Jinja template format](https://huggingface.co/docs/transformers/main/en/chat_templating) as defined by Hugging Face Transformers.

 ```bash
-python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template ./my_model_template.jinja
+python -m sglang.launch_server \
+    --model-path meta-llama/Llama-2-7b-chat-hf \
+    --port 30000 \
+    --chat-template ./my_model_template.jinja
 ```
@@ -7,9 +7,19 @@
 ```bash
 # replace 172.16.4.52:20000 with your own node ip address and port of the first node

-python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct --tp 16 --dist-init-addr 172.16.4.52:20000 --nnodes 2 --node-rank 0
+python3 -m sglang.launch_server \
+    --model-path meta-llama/Meta-Llama-3.1-405B-Instruct \
+    --tp 16 \
+    --dist-init-addr 172.16.4.52:20000 \
+    --nnodes 2 \
+    --node-rank 0

-python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct --tp 16 --dist-init-addr 172.16.4.52:20000 --nnodes 2 --node-rank 1
+python3 -m sglang.launch_server \
+    --model-path meta-llama/Meta-Llama-3.1-405B-Instruct \
+    --tp 16 \
+    --dist-init-addr 172.16.4.52:20000 \
+    --nnodes 2 \
+    --node-rank 1
 ```

 Note that Llama 405B (fp8) can also be launched on a single node.
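Once both nodes are up, it can help to sanity-check the server before sending real traffic. A minimal sketch, assuming the first node serves on its default port 30000 and exposes a `/health` endpoint (both assumptions; adjust the host, port, and path to your deployment):

```shell
# Assumption: the server on the first node listens on port 30000 (the default)
# and exposes a /health endpoint; replace 172.16.4.52 with your node's address.
curl http://172.16.4.52:30000/health
```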
@@ -139,7 +139,10 @@ This section describes how to set up the monitoring stack (Prometheus + Grafana)
 1. **Start your SGLang server with metrics enabled:**

 ```bash
-python -m sglang.launch_server --model-path <your_model_path> --port 30000 --enable-metrics
+python -m sglang.launch_server \
+    --model-path <your_model_path> \
+    --port 30000 \
+    --enable-metrics
 ```

 Replace `<your_model_path>` with the actual path to your model (e.g., `meta-llama/Meta-Llama-3.1-8B-Instruct`). Ensure the server is accessible from the monitoring stack (you might need `--host 0.0.0.0` if running in Docker). By default, the metrics endpoint will be available at `http://<sglang_server_host>:30000/metrics`.
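Before wiring up Prometheus, you can confirm the metrics endpoint mentioned above actually responds. A minimal check, assuming the server runs locally on port 30000:

```shell
# Fetch the Prometheus-format metrics and show the first few lines.
# Assumption: the server from the step above is reachable at localhost:30000.
curl -s http://localhost:30000/metrics | head -n 20
```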
@@ -212,6 +215,17 @@ You can customize the setup by modifying these files. For instance, you might ne

 #### Check if the metrics are being collected

-Run `python3 -m sglang.bench_serving --backend sglang --dataset-name random --num-prompts 3000 --random-input 1024 --random-output 1024 --random-range-ratio 0.5` to generate some requests.
+Run:
+```
+python3 -m sglang.bench_serving \
+    --backend sglang \
+    --dataset-name random \
+    --num-prompts 3000 \
+    --random-input 1024 \
+    --random-output 1024 \
+    --random-range-ratio 0.5
+```
+
+to generate some requests.

 Then you should be able to see the metrics in the Grafana dashboard.
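If the dashboard stays empty after the benchmark run, you can query Prometheus directly to confirm it is scraping the SGLang target. A sketch using Prometheus's standard HTTP API, assuming Prometheus runs on its default port 9090:

```shell
# List active scrape targets and their health (standard Prometheus API).
curl -s http://localhost:9090/api/v1/targets
# Check the `up` metric for each target; 1 means the scrape is succeeding.
curl -s 'http://localhost:9090/api/v1/query?query=up'
```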