Refactor the docs (#9031)

This commit is contained in:
Lianmin Zheng
2025-08-10 19:49:45 -07:00
committed by GitHub
parent 0f229c07f1
commit 2449a0afe2
80 changed files with 619 additions and 750 deletions


@@ -0,0 +1,28 @@
# Use Models From ModelScope
To use a model from [ModelScope](https://www.modelscope.cn), set the environment variable `SGLANG_USE_MODELSCOPE` to `true`:
```bash
export SGLANG_USE_MODELSCOPE=true
```
We take [Qwen2-7B-Instruct](https://www.modelscope.cn/models/qwen/qwen2-7b-instruct) as an example.
Launch the server:
```bash
python -m sglang.launch_server --model-path qwen/Qwen2-7B-Instruct --port 30000
```
Or start it with Docker:
```bash
docker run --gpus all \
-p 30000:30000 \
-v ~/.cache/modelscope:/root/.cache/modelscope \
--env "SGLANG_USE_MODELSCOPE=true" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server --model-path qwen/Qwen2-7B-Instruct --host 0.0.0.0 --port 30000
```
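Once the server is up (via either method), you can verify it with a quick request to SGLang's native `/generate` endpoint. A minimal sketch, assuming the default host and the port chosen above:
```bash
# Send a short completion request to the running server.
curl http://localhost:30000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The capital of France is",
    "sampling_params": {"max_new_tokens": 32, "temperature": 0}
  }'
```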
Note that ModelScope uses a different cache directory than Hugging Face (`~/.cache/modelscope` rather than `~/.cache/huggingface`). You may need to set it manually to avoid running out of disk space.
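For example, to relocate the cache to a larger disk, a minimal sketch assuming ModelScope honors the `MODELSCOPE_CACHE` environment variable (the path below is illustrative):
```bash
# Point the ModelScope cache at a larger volume.
export MODELSCOPE_CACHE=/data/modelscope
# When using Docker, mount the same path instead of ~/.cache/modelscope:
#   -v /data/modelscope:/root/.cache/modelscope
```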


@@ -4,23 +4,23 @@
## Example Launch Command
By default, SGLang uses its own model implementation when one is available and falls back to the Transformers implementation otherwise. You can force the Transformers implementation by setting `--model-impl` to `transformers`.
```shell
python3 -m sglang.launch_server \
--model-path meta-llama/Llama-3.2-1B-Instruct \
--host 0.0.0.0 \
--port 30000 \
--model-impl transformers
```
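After launch, the server can be exercised through its OpenAI-compatible API regardless of which implementation backs the model. A minimal sketch, assuming the server from the command above is listening on port 30000:
```bash
# Query the OpenAI-compatible chat endpoint of the running server.
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.2-1B-Instruct",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 32
  }'
```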
## Supported Features
### Quantization
The Transformers fallback supports most of the quantization methods available in SGLang (except GGUF). See the [Quantization page](https://docs.sglang.ai/backend/quantization.html) for more information about supported quantization in SGLang.
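For instance, an online quantization method can be combined with the fallback via the `--quantization` flag. A hypothetical sketch, assuming `fp8` online quantization is supported for this model by your build (check the Quantization page for the methods available to you):
```bash
# Launch with the Transformers fallback plus fp8 online quantization (illustrative).
python3 -m sglang.launch_server \
  --model-path meta-llama/Llama-3.2-1B-Instruct \
  --model-impl transformers \
  --quantization fp8 \
  --host 0.0.0.0 \
  --port 30000
```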
### Remote Code
This fallback also means that any model on the Hub that works with `transformers` under `trust_remote_code=True` and correctly implements attention can be used in production!
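A sketch of launching such a model with the usual `--trust-remote-code` flag (the repository name below is hypothetical):
```bash
# The model path is a hypothetical Hub repository that ships custom modeling code.
python3 -m sglang.launch_server \
  --model-path some-org/custom-model-with-remote-code \
  --model-impl transformers \
  --trust-remote-code \
  --port 30000
```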