1630 Commits

Author SHA1 Message Date
Shi Shuai
dd2e2d275f Docs: Update documentation workflow and contribution guide (#2704)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-01-02 09:18:31 -08:00
Rodrigo Garcia
a990daff9c Included multi-node DeepSeekv3 example (#2707) 2025-01-02 22:17:03 +08:00
Yineng Zhang
ba5112ff69 feat: support moe_align_block_size_triton (#2712)
Co-authored-by: WANDY666 <1060304770@qq.com>
2025-01-02 21:47:44 +08:00
yukavio
815dce0554 Eagle speculative decoding part 4: Add EAGLE2 worker (#2150)
Co-authored-by: kavioyu <kavioyu@tencent.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-01-02 03:22:34 -08:00
Lianmin Zheng
ad20b7957e Eagle speculative decoding part 3: small modifications to the general scheduler (#2709)
Co-authored-by: kavioyu <kavioyu@tencent.com>
2025-01-02 02:09:08 -08:00
fzyzcjy
9183c23eca Speed up update_weights_from_tensor (#2695) 2025-01-02 02:05:19 -08:00
kk
148254d4db Improve moe reduce sum kernel performance (#2705)
Co-authored-by: wunhuang <wunhuang@amd.com>
2025-01-02 01:11:06 -08:00
Xiaotong Jiang
a4d6d6f1dd [feat]: Add math eval to CI nightly run (#2663)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-01-01 15:29:35 -08:00
Shi Shuai
062c48d2bd [Docs] Add Support for Pydantic Structured Output Format (#2697) 2025-01-01 15:08:43 -08:00
kk
b6e0cfb5e1 ROCm base image update (#2692)
Co-authored-by: wunhuang <wunhuang@amd.com>
2025-01-01 12:12:19 +08:00
Chayenne
0d8d97b8e6 Doc: Rename contribution_guide.md (#2691) 2024-12-31 14:35:48 -08:00
Shi Shuai
0a765bbccc Docs: Refactor Contribution Guide (#2690) 2024-12-31 14:11:00 -08:00
Xiaoyu Zhang
286cad3ee3 h200 tuning fused_moe_triton config for Mixtral 8x7B/8x22B and Qwen2 57BA14B (#2689) 2024-12-31 23:17:36 +08:00
Ying Sheng
dc7eb01f19 [Fix] fix openai adapter (#2685) 2024-12-31 10:48:19 +00:00
Lianmin Zheng
b0524c3789 Eagle speculative decoding part 2: Fix cuda graph + DP attention hanging (#2684)
Co-authored-by: yukavio <kavioyu@gmail.com>
2024-12-31 02:25:05 -08:00
Lianmin Zheng
6c42fa229d Update README.md (#2683) 2024-12-31 00:13:10 -08:00
Yineng Zhang
d49b13c6f8 feat: use CUDA 12.4 by default (for FA3) (#2682) 2024-12-31 15:52:09 +08:00
Yineng Zhang
bedc4c7a50 misc: update CODEOWNERS (#2680) 2024-12-31 15:04:50 +08:00
Lianmin Zheng
f44d143949 Support target model verification in the attention backend (#2678)
Co-authored-by: yukavio <kavioyu@gmail.com>
2024-12-30 22:58:55 -08:00
Yineng Zhang
b6b57fc200 minor: cleanup sgl-kernel (#2679) 2024-12-31 14:52:00 +08:00
Ke Bao
b4403985d0 Add cutlass submodule for sgl-kernel (#2676) 2024-12-31 14:28:29 +08:00
Lianmin Zheng
339c69a243 Improve the computation for time_per_output_token Prometheus metrics (#2674) 2024-12-30 21:40:14 -08:00
fzyzcjy
f707470019 CI: Update scripts to fail fast (#2672) 2024-12-30 19:04:01 -08:00
Lianmin Zheng
21ec66e59e Minor follow-up fixes for the logprob refactor (#2670) 2024-12-30 05:42:08 -08:00
HAI
c5210dfa38 AMD DeepSeek_V3 FP8 Numerical fix (#2667) 2024-12-30 21:31:12 +08:00
mobicham
a29dd9501d Add GemLite caching after each capture (#2669) 2024-12-30 05:27:29 -08:00
Lianmin Zheng
9c6ba2484f Refactor logprob computation to return the real logprob used in sampling (#2664) 2024-12-30 04:51:38 -08:00
Ke Bao
b02da24a5b Refactor sgl-kernel build (#2642) 2024-12-30 18:07:01 +08:00
Lianmin Zheng
bdd2827a80 Update structured_outputs.ipynb (#2666) 2024-12-30 00:46:41 -08:00
Lianmin Zheng
8c3b420eec [Docs] clean up structured outputs docs (#2654) 2024-12-29 23:57:16 -08:00
HAI
e6f523b5f2 fix typo in python/sglang/srt/layers/quantization/fp8.py (#2655) 2024-12-29 23:45:02 -08:00
Lianmin Zheng
3231817861 Revert "[feat] Add math eval to CI" (#2656) 2024-12-30 15:05:50 +08:00
Xiaotong Jiang
a11f8d5f6a [feat] Add math eval to CI (#2652) 2024-12-30 14:49:41 +08:00
Yineng Zhang
098d659c0e docs: update README (#2651) 2024-12-30 13:33:29 +08:00
Lzhang-hub
76d14f8cb9 add 2*h20 node serving example for deepseek v3 (#2650)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-12-30 13:04:38 +08:00
Lianmin Zheng
b08c308ebc Update the timeout in nightly-test.yml (#2649) 2024-12-29 14:51:07 -08:00
Lianmin Zheng
03d5fbfd44 Release 0.4.1.post3 - upload the config.json to PyPI (#2647) 2024-12-29 14:25:53 -08:00
Chayenne
1703d766d8 CI: skip special token for engine token ids unit test (#2648) 2024-12-29 13:52:50 -08:00
zhaochenyang20
09e6e2aa33 Merge branch 'main' of github.com:sgl-project/sglang 2024-12-29 21:48:21 +00:00
Shi Shuai
fad29f7f52 CI: Fix unittest for engine input token ids and output token ids (#2646) 2024-12-29 13:28:59 -08:00
Shi Shuai
35bdb48557 [Feature] Get Token IDs with Engine.generate() (#2636)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2024-12-29 12:28:27 -08:00
Yineng Zhang
b085e06b01 docs: add development guide using docker (#2645) 2024-12-30 02:22:54 +08:00
Yineng Zhang
763dd55d17 docs: update README (#2644) 2024-12-30 01:24:06 +08:00
Yineng Zhang
3ccf566b0d chore: bump v0.4.1.post2 (#2643) 2024-12-30 00:11:46 +08:00
HandH1998
afa0341e57 Update Triton configs for block fp8 kernels (#2641) 2024-12-29 22:53:47 +08:00
HAI
30828e7192 AMD: set weights and scaling numbers properly for block FP8 (#2637) 2024-12-29 03:23:39 -08:00
Ying Sheng
e0e09fceeb [Session] Update session control interface (#2635) 2024-12-29 02:10:27 -08:00
Lianmin Zheng
9c05c6898e Add llama_eagle.py (#2640)
Co-authored-by: kavioyu <kavioyu@tencent.com>
2024-12-29 01:45:35 -08:00
Yineng Zhang
3464e57b62 minor: add nsys cli for docker dev (#2639) 2024-12-29 17:28:11 +08:00
Lianmin Zheng
3815b23ccb Clean up wrapper in flashinfer backend (#2638) 2024-12-29 00:45:57 -08:00