release v0.1.10

2024-01-30 15:37:43 +00:00
parent 873d0e8537
commit a49dc52bfa
3 changed files with 3 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -351,6 +351,7 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
 ```
 python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --mem-fraction-static 0.7
 ```
+- You can turn on [flashinfer](docs/flashinfer.md) to accelerate inference using highly optimized CUDA kernels.

 ### Supported Models
 - Llama