From b6cd903604a7439fcd082290ade83a02b164eca0 Mon Sep 17 00:00:00 2001
From: Lianmin Zheng
Date: Sat, 19 Oct 2024 12:58:55 -0700
Subject: [PATCH] Update readme and workflow (#1716)

---
 .github/workflows/pr-test-amd.yml | 5 +++--
 README.md                         | 2 +-
 python/sglang/README.md           | 3 ++-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/pr-test-amd.yml b/.github/workflows/pr-test-amd.yml
index 27a964d29..83210fe52 100644
--- a/.github/workflows/pr-test-amd.yml
+++ b/.github/workflows/pr-test-amd.yml
@@ -14,7 +14,7 @@ on:
   workflow_dispatch:
 
 concurrency:
-  group: pr-test-${{ github.ref }}
+  group: pr-test-amd-${{ github.ref }}
   cancel-in-progress: true
 
 jobs:
@@ -28,7 +28,8 @@ jobs:
     - name: Install dependencies
       run: |
        pip install --upgrade pip
-        pip install -e "python[all]" --no-deps
+        pip install -e "python[runtime_common, test]"
+        pip install -e "python" --no-deps
 
        git clone https://github.com/merrymercy/human-eval.git
        cd human-eval
diff --git a/README.md b/README.md
index b5827f792..d4b0f140d 100644
--- a/README.md
+++ b/README.md
@@ -18,11 +18,11 @@
 - [2024/10] 🔥 The First SGLang Online Meetup ([slides](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#the-first-sglang-online-meetup)).
 - [2024/09] SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision ([blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/)).
 - [2024/07] Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) ([blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/)).
-- [2024/02] SGLang enables **3x faster JSON decoding** with compressed finite state machine ([blog](https://lmsys.org/blog/2024-02-05-compressed-fsm/)).
 
 <details>
 <summary>More</summary>
 
+- [2024/02] SGLang enables **3x faster JSON decoding** with compressed finite state machine ([blog](https://lmsys.org/blog/2024-02-05-compressed-fsm/)).
 - [2024/04] SGLang is used by the official **LLaVA-NeXT (video)** release ([blog](https://llava-vl.github.io/blog/2024-04-30-llava-next-video/)).
 - [2024/01] SGLang provides up to **5x faster inference** with RadixAttention ([blog](https://lmsys.org/blog/2024-01-17-sglang/)).
 - [2024/01] SGLang powers the serving of the official **LLaVA v1.6** release demo ([usage](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#demo)).
diff --git a/python/sglang/README.md b/python/sglang/README.md
index 78f469c74..8b59fc106 100644
--- a/python/sglang/README.md
+++ b/python/sglang/README.md
@@ -4,7 +4,8 @@
 - `srt`: The backend engine for running local models. (SRT = SGLang Runtime).
 - `test`: The test utilities.
 - `api.py`: The public APIs.
-- `bench_latency.py`: Benchmark a single static batch.
+- `bench_latency.py`: Benchmark the latency of running a single static batch.
+- `bench_server_latency.py`: Benchmark the latency of serving a single batch with a real server.
 - `bench_serving.py`: Benchmark online serving with dynamic requests.
 - `global_config.py`: The global configs and constants.
 - `launch_server.py`: The entry point for launching the local server.
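For context, the workflow change renames the AMD workflow's concurrency group so it no longer shares a cancellation group with the main `pr-test` workflow. The resulting block (a sketch reassembled from the diff context; only the `group` value changes) would look like:

```yaml
# .github/workflows/pr-test-amd.yml, after this patch
concurrency:
  group: pr-test-amd-${{ github.ref }}  # was: pr-test-${{ github.ref }}
  cancel-in-progress: true
```

With the old shared group name, a push that triggered both workflows could cancel the other workflow's in-progress run, since `cancel-in-progress: true` applies per group; a distinct group name scopes cancellation to the AMD jobs only.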