Docs: Rewrite docs for LLama 405B and ModelSpace (#2773)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
This commit is contained in:
@@ -32,46 +32,3 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
|
|||||||
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
|
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
|
||||||
```
|
```
|
||||||
|
|
||||||
## Use Models From ModelScope
|
|
||||||
<details>
|
|
||||||
<summary>More</summary>
|
|
||||||
|
|
||||||
To use a model from [ModelScope](https://www.modelscope.cn), set the environment variable SGLANG_USE_MODELSCOPE.
|
|
||||||
```
|
|
||||||
export SGLANG_USE_MODELSCOPE=true
|
|
||||||
```
|
|
||||||
Launch [Qwen2-7B-Instruct](https://www.modelscope.cn/models/qwen/qwen2-7b-instruct) Server
|
|
||||||
```
|
|
||||||
SGLANG_USE_MODELSCOPE=true python -m sglang.launch_server --model-path qwen/Qwen2-7B-Instruct --port 30000
|
|
||||||
```
|
|
||||||
|
|
||||||
Or start it by docker.
|
|
||||||
```bash
|
|
||||||
docker run --gpus all \
|
|
||||||
-p 30000:30000 \
|
|
||||||
-v ~/.cache/modelscope:/root/.cache/modelscope \
|
|
||||||
--env "SGLANG_USE_MODELSCOPE=true" \
|
|
||||||
--ipc=host \
|
|
||||||
lmsysorg/sglang:latest \
|
|
||||||
python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --host 0.0.0.0 --port 30000
|
|
||||||
```
|
|
||||||
|
|
||||||
</details>
|
|
||||||
|
|
||||||
## Example: Run Llama 3.1 405B
|
|
||||||
<details>
|
|
||||||
<summary>More</summary>
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Run 405B (fp8) on a single node
|
|
||||||
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 --tp 8
|
|
||||||
|
|
||||||
# Run 405B (fp16) on two nodes
|
|
||||||
## on the first node, replace the `172.16.4.52:20000` with your own first node ip address and port
|
|
||||||
python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct --tp 16 --nccl-init-addr 172.16.4.52:20000 --nnodes 2 --node-rank 0
|
|
||||||
|
|
||||||
## on the first node, replace the `172.16.4.52:20000` with your own first node ip address and port
|
|
||||||
python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct --tp 16 --nccl-init-addr 172.16.4.52:20000 --nnodes 2 --node-rank 1
|
|
||||||
```
|
|
||||||
|
|
||||||
</details>
|
|
||||||
|
|||||||
@@ -60,3 +60,5 @@ The core features include:
|
|||||||
references/troubleshooting.md
|
references/troubleshooting.md
|
||||||
references/faq.md
|
references/faq.md
|
||||||
references/learn_more.md
|
references/learn_more.md
|
||||||
|
references/llama_405B.md
|
||||||
|
references/modelscope.md
|
||||||
|
|||||||
16
docs/references/llama_405B.md
Normal file
16
docs/references/llama_405B.md
Normal file
@@ -0,0 +1,16 @@
|
|||||||
|
# Example: Run Llama 3.1 405B
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run 405B (fp8) on a single node
|
||||||
|
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 --tp 8
|
||||||
|
```
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run 405B (fp16) on two nodes
|
||||||
|
## on the first node, replace the `172.16.4.52:20000` with your own first node ip address and port
|
||||||
|
python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct --tp 16 --nccl-init-addr 172.16.4.52:20000 --nnodes 2 --node-rank 0
|
||||||
|
|
||||||
|
## on the first node, replace the `172.16.4.52:20000` with your own first node ip address and port
|
||||||
|
python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct --tp 16 --nccl-init-addr 172.16.4.52:20000 --nnodes 2 --node-rank 1
|
||||||
|
```
|
||||||
|
|
||||||
28
docs/references/modelscope.md
Normal file
28
docs/references/modelscope.md
Normal file
@@ -0,0 +1,28 @@
|
|||||||
|
# Use Models From ModelScope
|
||||||
|
|
||||||
|
To use a model from [ModelScope](https://www.modelscope.cn), set the environment variable `SGLANG_USE_MODELSCOPE`.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
export SGLANG_USE_MODELSCOPE=true
|
||||||
|
```
|
||||||
|
|
||||||
|
We take [Qwen2-7B-Instruct](https://www.modelscope.cn/models/qwen/qwen2-7b-instruct) as an example. Launch the Server:
|
||||||
|
---
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m sglang.launch_server --model-path qwen/Qwen2-7B-Instruct --port 30000
|
||||||
|
```
|
||||||
|
|
||||||
|
Or start it by docker:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker run --gpus all \
|
||||||
|
-p 30000:30000 \
|
||||||
|
-v ~/.cache/modelscope:/root/.cache/modelscope \
|
||||||
|
--env "SGLANG_USE_MODELSCOPE=true" \
|
||||||
|
--ipc=host \
|
||||||
|
lmsysorg/sglang:latest \
|
||||||
|
python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --host 0.0.0.0 --port 30000
|
||||||
|
```
|
||||||
|
|
||||||
|
Note that modelscope uses a different cache directory than huggingface. You may need to set it manually to avoid running out of disk space.
|
||||||
Reference in New Issue
Block a user