Revert "Organize public APIs" (#815)

Ying Sheng
2024-07-29 19:40:28 -07:00
committed by GitHub
parent 3520f75fb1
commit db6089e6f3
10 changed files with 66 additions and 74 deletions

@@ -208,11 +208,11 @@ Instructions for supporting a new model are [here](https://github.com/sgl-projec
- Benchmark a single static batch by running the following command without launching a server. The arguments are the same as those for `launch_server.py`. This is not a dynamic batching server, so it may run out of memory for a batch size that can run successfully with a real server. This is because a real server will truncate the prefill into several batches/chunks, while this unit test does not do this.
```
-python -m sglang.benchmarks.bench_latency --model-path meta-llama/Meta-Llama-3-8B-Instruct --batch 32 --input-len 256 --output-len 32
+python -m sglang.bench_latency --model-path meta-llama/Meta-Llama-3-8B-Instruct --batch 32 --input-len 256 --output-len 32
```
- Benchmark online serving. Launch a server first and run the following command.
```
-python3 -m sglang.benchmarks.bench_serving --backend sglang --num-prompt 10
+python3 -m sglang.bench_serving --backend sglang --num-prompt 10
```
## Frontend: Structured Generation Language (SGLang)