Update run_batch interface and max_prefill_tokens (#574)

Ying Sheng
2024-06-30 18:26:04 -07:00
committed by GitHub
parent 11616fc6bd
commit 75b31a2a88
3 changed files with 19 additions and 14 deletions

@@ -1,13 +1,8 @@
## SRT Unit Tests
### Low-level API
### Latency Alignment
```
cd sglang/test/srt/model
python3 test_llama_low_api.py
python3 test_llama_extend.py
python3 test_llava_low_api.py
python3 bench_llama_low_api.py
python -m sglang.bench_latency --model-path meta-llama/Llama-2-7b-chat-hf --mem-fraction-static 0.8 --batch 32 --input-len 512 --output-len 256
```
### High-level API