Improve tensor parallel performance (#625)

Co-authored-by: Mingyi <wisclmy0611@gmail.com>
2024-07-15 07:10:51 -07:00
parent 5ac8b80677
commit 6a2941f4d0
10 changed files with 171 additions and 81 deletions
--- a/python/sglang/README.md
+++ b/python/sglang/README.md
@@ -2,11 +2,10 @@

 - `backend`: Various backends for the language interpreter.
 - `lang`: The frontend language.
- `srt`: The runtime for running local models.
+- `srt`: The serving engine for running local models (SRT = SGLang Runtime).
 - `test`: Test utilities.
 - `api.py`: Public API.
 - `bench_latency.py`: Benchmark utilities.
 - `global_config.py`: The global configs and constants.
 - `launch_server.py`: The entry point for launching a local server.
 - `utils.py`: Common utilities.
-