release v0.1.10
@@ -351,6 +351,7 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
```
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --mem-fraction-static 0.7
```
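The `--mem-fraction-static` flag sets the fraction of GPU memory reserved up front for model weights and the KV cache pool; lowering it (here to 0.7) leaves headroom for activations and can avoid out-of-memory errors. A minimal sketch of the partitioning arithmetic, assuming a hypothetical 24 GiB GPU (the helper below is illustrative, not sglang internals):

```python
# Sketch of how --mem-fraction-static partitions GPU memory.
# The 24 GiB total and the helper function are illustrative assumptions,
# not part of sglang itself.

GIB = 1024 ** 3

def static_pool_bytes(total_gpu_bytes: int, mem_fraction_static: float) -> int:
    """Bytes reserved for model weights + KV cache pool."""
    return int(total_gpu_bytes * mem_fraction_static)

total = 24 * GIB
reserved = static_pool_bytes(total, 0.7)
headroom = total - reserved
print(f"reserved: {reserved / GIB:.1f} GiB, headroom: {headroom / GIB:.1f} GiB")
```

With the default (higher) fraction, the remaining headroom shrinks, which is why dropping to 0.7 is a common first fix for CUDA out-of-memory failures at launch.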
- You can turn on [flashinfer](docs/flashinfer.md) to accelerate inference by using highly optimized CUDA kernels.
### Supported Models
- Llama