support models from www.modelscope.cn (#994)

Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
2024-08-09 17:52:14 +08:00
parent 95a28019ba
commit b91a4cb1b1
5 changed files with 84 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -168,6 +168,16 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 - To enable fp8 quantization, you can add `--quantization fp8` on a fp16 checkpoint or directly load a fp8 checkpoint without specifying any arguments.
 - To enable experimental torch.compile support, you can add `--enable-torch-compile`. It accelerates small models on small batch sizes.

+### Use Models From ModelScope
+To use models from [ModelScope](https://www.modelscope.cn), set the environment variable `SGLANG_USE_MODELSCOPE`.
+```
+export SGLANG_USE_MODELSCOPE=true
+```
+For example, launch a [Qwen2-7B-Instruct](https://www.modelscope.cn/models/qwen/qwen2-7b-instruct) server:
+```
+SGLANG_USE_MODELSCOPE=true python -m sglang.launch_server --model-path qwen/Qwen2-7B-Instruct --port 30000
+```
+
 ### Supported Models

 - Llama / Llama 2 / Llama 3 / Llama 3.1
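The README change only documents the switch. A minimal sketch of how such a switch could work is shown below, assuming the server resolves a ModelScope model ID to a local directory with `modelscope.snapshot_download` before loading it. The helper names `use_modelscope` and `resolve_model_path` are hypothetical, and the accepted values `"true"`/`"1"` are an assumption; the actual sglang logic lives in the other changed files, which are not shown here.

```python
import os
from typing import Callable, Optional


def use_modelscope() -> bool:
    # Hypothetical helper: treat "true" or "1" (case-insensitive) as enabled.
    value = os.environ.get("SGLANG_USE_MODELSCOPE", "false").strip().lower()
    return value in ("true", "1")


def resolve_model_path(
    model_path: str,
    downloader: Optional[Callable[[str], str]] = None,
) -> str:
    """Return a local directory for model_path when ModelScope is enabled.

    Local directories are returned unchanged. Otherwise, if the switch is on,
    the model ID is fetched (or found in the local cache) via `downloader`,
    which defaults to modelscope.snapshot_download (assumed, not confirmed
    by this diff).
    """
    if os.path.isdir(model_path) or not use_modelscope():
        # Switch off or already local: pass the path through untouched.
        return model_path
    if downloader is None:
        # Imported lazily so the modelscope package is only required
        # when SGLANG_USE_MODELSCOPE is enabled.
        from modelscope import snapshot_download

        downloader = snapshot_download
    return downloader(model_path)
```

The `downloader` parameter exists only so the sketch can be exercised without network access; for instance, passing `lambda m: "/cache/" + m` with the switch on maps `qwen/Qwen2-7B-Instruct` to `/cache/qwen/Qwen2-7B-Instruct`.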