Organize public APIs (#809)

Liangsheng Yin
2024-07-29 15:34:16 -07:00
committed by GitHub
parent 084fa54d37
commit c8e9fed87a
10 changed files with 74 additions and 66 deletions

@@ -208,11 +208,11 @@ Instructions for supporting a new model are [here](https://github.com/sgl-projec
- Benchmark a single static batch by running the following command without launching a server. The arguments are the same as those for `launch_server.py`. Note that this is not a dynamic batching server, so it may run out of memory at a batch size that a real server handles successfully: a real server splits the prefill into several batches/chunks, while this benchmark does not.
```
-python -m sglang.bench_latency --model-path meta-llama/Meta-Llama-3-8B-Instruct --batch 32 --input-len 256 --output-len 32
+python -m sglang.benchmarks.bench_latency --model-path meta-llama/Meta-Llama-3-8B-Instruct --batch 32 --input-len 256 --output-len 32
```
- Benchmark online serving. Launch a server first and run the following command.
```
-python3 -m sglang.bench_serving --backend sglang --num-prompt 10
+python3 -m sglang.benchmarks.bench_serving --backend sglang --num-prompt 10
```
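
The two benchmark commands above could also be assembled from a script, for example to sweep batch sizes. The following is only a minimal sketch: the helper names `bench_latency_cmd` and `bench_serving_cmd` are hypothetical and not part of sglang, the default module paths are an assumption taken from the `sglang.benchmarks.*` form shown in this diff, and only the flags that appear in the commands above are used.

```python
import shlex
import sys


def bench_latency_cmd(model_path, batch=32, input_len=256, output_len=32,
                      module="sglang.benchmarks.bench_latency"):
    """Build the argv list for the static-batch latency benchmark.

    Hypothetical helper; `module` is an assumption and can be overridden
    if the benchmark lives at a different import path.
    """
    return [
        sys.executable, "-m", module,
        "--model-path", model_path,
        "--batch", str(batch),
        "--input-len", str(input_len),
        "--output-len", str(output_len),
    ]


def bench_serving_cmd(backend="sglang", num_prompt=10,
                      module="sglang.benchmarks.bench_serving"):
    """Build the argv list for the online serving benchmark.

    Hypothetical helper; a server must already be running before
    the resulting command is executed.
    """
    return [
        sys.executable, "-m", module,
        "--backend", backend,
        "--num-prompt", str(num_prompt),
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no
    # side effects; pass the lists to subprocess.run to execute them.
    print(shlex.join(bench_latency_cmd("meta-llama/Meta-Llama-3-8B-Instruct")))
    print(shlex.join(bench_serving_cmd()))
```

Passing `module="sglang.bench_latency"` (or `module="sglang.bench_serving"`) targets the other module path that appears in this diff without changing any of the flags.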
## Frontend: Structured Generation Language (SGLang)