Refactor the docs (#9031)
docs/supported_models/modelscope.md (new file, 28 lines)
@@ -0,0 +1,28 @@
# Use Models From ModelScope

To use a model from [ModelScope](https://www.modelscope.cn), set the environment variable `SGLANG_USE_MODELSCOPE`.

```bash
export SGLANG_USE_MODELSCOPE=true
```
We take [Qwen2-7B-Instruct](https://www.modelscope.cn/models/qwen/qwen2-7b-instruct) as an example.
Launch the Server:

```bash
python -m sglang.launch_server --model-path qwen/Qwen2-7B-Instruct --port 30000
```
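Once launched, the server speaks an OpenAI-compatible HTTP API. A minimal sketch of a chat request against it (the port follows the launch command above; actually sending it needs the server to be running, so only the request body is built and printed here):

```shell
# Sketch: query the server's OpenAI-compatible chat endpoint.
# The JSON payload is built here; the curl call (commented out)
# assumes the server above is running on localhost:30000.
PAYLOAD='{"model": "qwen/Qwen2-7B-Instruct", "messages": [{"role": "user", "content": "Say hello."}], "max_tokens": 32}'
# curl http://localhost:30000/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
echo "$PAYLOAD"
```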
Or start it with Docker:

```bash
docker run --gpus all \
    -p 30000:30000 \
    -v ~/.cache/modelscope:/root/.cache/modelscope \
    --env "SGLANG_USE_MODELSCOPE=true" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --host 0.0.0.0 --port 30000
```
Note that ModelScope uses a different cache directory than Hugging Face. You may need to set it manually to avoid running out of disk space.
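For example, a sketch under the assumption that `MODELSCOPE_CACHE` is the environment variable your installed `modelscope` version reads for its cache location (verify this against its documentation):

```shell
# Point ModelScope's download cache at a disk with enough space.
# MODELSCOPE_CACHE is assumed to be the variable modelscope reads;
# /data/modelscope-cache is a placeholder path.
export MODELSCOPE_CACHE=/data/modelscope-cache
export SGLANG_USE_MODELSCOPE=true
echo "$MODELSCOPE_CACHE"
```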
@@ -4,23 +4,23 @@
## Example Launch Command
-By default, we will use the SGLang implementation if it is available. Otherwise, we fall back to the Transformers one. However, you can switch the implementation by setting `impl` to `transformers`.
+By default, we will use the SGLang implementation if it is available. Otherwise, we fall back to the Transformers one. However, you can switch the implementation by setting `--model-impl` to `transformers`.
```shell
python3 -m sglang.launch_server \
  --model-path meta-llama/Llama-3.2-1B-Instruct \
  --host 0.0.0.0 \
  --port 30000 \
-  --impl transformers
+  --model-impl transformers
```
-#### Supported features
+## Supported features

-##### Quantization
+### Quantization
The Transformers fallback supports most of the quantization methods available in SGLang (except GGUF). See the [Quantization page](https://docs.sglang.ai/backend/quantization.html) for more information about supported quantization in SGLang.
-##### Remote code
+### Remote code
This fallback also means that any model on the hub that can be used in `transformers` with `trust_remote_code=True`, and that correctly implements attention, can be used in production!
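A hedged sketch of what that looks like in practice, combining the Transformers fallback with SGLang's `--trust-remote-code` flag (the model id below is a placeholder, not a real repository; only enable remote code for repositories you trust):

```shell
# Sketch: serving a hub model that ships custom modeling code via the
# Transformers fallback. "some-org/custom-model" is a hypothetical id.
# The command is assembled into a variable rather than executed here,
# since launching needs a GPU and a model download.
LAUNCH_CMD="python3 -m sglang.launch_server \
  --model-path some-org/custom-model \
  --model-impl transformers \
  --trust-remote-code \
  --host 0.0.0.0 --port 30000"
echo "$LAUNCH_CMD"
```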