Update run_batch interface and max_prefill_tokens (#574)
## SRT Unit Tests
### Low-level API

### Latency Alignment

```
cd sglang/test/srt/model

python3 test_llama_low_api.py
python3 test_llama_extend.py
python3 test_llava_low_api.py
python3 bench_llama_low_api.py
python -m sglang.bench_latency --model-path meta-llama/Llama-2-7b-chat-hf --mem-fraction-static 0.8 --batch 32 --input-len 512 --output-len 256
```
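To compare latency across several batch sizes, the `bench_latency` invocation above can be wrapped in a loop. A minimal sketch, assuming the same model and flags as the command above; the batch sizes chosen here are illustrative, and the actual launch is left commented out so the commands can be inspected first:

```shell
# Sweep hypothetical batch sizes for the latency benchmark.
# Each command is printed; uncomment the eval line to actually run it.
for bs in 1 8 32; do
  cmd="python -m sglang.bench_latency --model-path meta-llama/Llama-2-7b-chat-hf --mem-fraction-static 0.8 --batch $bs --input-len 512 --output-len 256"
  echo "$cmd"
  # eval "$cmd"   # uncomment to launch each benchmark run
done
```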
### High-level API