74 Commits

Author SHA1 Message Date
Lianmin Zheng
bc1534ff32 Fix a draft model accuracy bug in eagle; support step=1; return logprob in eagle (#4134)
Co-authored-by: Sehoon Kim <kssteven418@gmail.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Sehoon Kim <sehoon@x.ai>
2025-03-06 06:13:59 -08:00
Ke Bao
d3fe9bae56 Add accuracy test for TP torch compile (#3994) 2025-03-02 13:18:18 -08:00
fzyzcjy
e3e0bc50a9 [Feature] SPMD for SGLang + Verl (#3852) 2025-02-28 09:53:10 -08:00
Lianmin Zheng
d7934cde45 Fix CI and install docs (#3821) 2025-02-24 16:17:38 -08:00
Yineng Zhang
f983213a1f update pr-test (#3663) 2025-02-18 17:23:43 +08:00
Yineng Zhang
e319153be8 update unit test (#3636) 2025-02-17 21:06:10 +08:00
Shi Shuai
7443197a63 [CI] Improve Docs CI Efficiency (#3587)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-14 19:57:00 -08:00
Yineng Zhang
70f894b810 feat: support flashinfer mla attention for deepseek v3 (#3550) 2025-02-14 08:50:14 +08:00
Jackmin801
5f0e7de339 [Feat] Return hidden states (experimental) (#3364)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-10 15:54:37 -08:00
Yineng Zhang
5da3d21c8b update pr-test ci (#3376) 2025-02-07 21:08:35 +08:00
Chayenne
76ca91dff2 Docs/CI: Enable Fake Finish for Docs Only PR (#3350) 2025-02-06 19:33:31 -08:00
Yineng Zhang
d39899e85c upgrade flashinfer v0.2.0.post2 (#3288)
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
2025-02-04 21:41:40 +08:00
Yineng Zhang
827aa8730b cleanup sgl-kernel kernels (#3175) 2025-01-27 19:11:01 +08:00
Lianmin Zheng
4a61253123 Do not load OPENAI_KEY from secrets (#3147) 2025-01-26 01:54:03 -08:00
Lianmin Zheng
4f118a39d7 Fix repetition penalty (#3139) 2025-01-25 21:48:58 -08:00
Lianmin Zheng
da6f8081f6 Fix CI tests (#3132) 2025-01-25 17:43:39 -08:00
Lianmin Zheng
a4331cd260 Add accuracy and latency tests of eagle into CI (#3027) 2025-01-21 02:55:14 -08:00
Lianmin Zheng
09bcbe0123 Update TypeBasedDispatcher and balance CI tests (#3001) 2025-01-19 23:37:27 -08:00
Lianmin Zheng
cd493b5afc Improve metrics, logging, and importing orders (#2992) 2025-01-19 18:36:59 -08:00
Yineng Zhang
58f42b1dd8 minor: update pr test (#2908) 2025-01-16 05:51:49 +08:00
Lianmin Zheng
67008f4b32 Use only one GPU for MLA CI tests (#2858) 2025-01-13 03:55:33 -08:00
Lianmin Zheng
bdc1acf6cd Misc fix for min_p_sampling, --cuda-graph-bs (#2761) 2025-01-07 02:52:53 -08:00
Lianmin Zheng
b0524c3789 Eagle speculative decoding part 2: Fix cuda graph + DP attention hanging (#2684)
Co-authored-by: yukavio <kavioyu@gmail.com>
2024-12-31 02:25:05 -08:00
Yineng Zhang
d49b13c6f8 feat: use CUDA 12.4 by default (for FA3) (#2682) 2024-12-31 15:52:09 +08:00
Lianmin Zheng
8c3b420eec [Docs] clean up structured outputs docs (#2654) 2024-12-29 23:57:16 -08:00
Lianmin Zheng
dc3bee4815 Fix test and benchmark scripts (#2598) 2024-12-26 07:56:26 -08:00
Yineng Zhang
7154b4b1df minor: update flashinfer nightly (#2490) 2024-12-16 23:02:49 +08:00
xiaobochen
3d32e4a32c Resubmit MoE-EP (#2371) 2024-12-06 15:05:21 +08:00
Chayenne
983bfcf386 Online weight updates from torch.distributed (#2279) 2024-12-01 23:23:18 -08:00
Lianmin Zheng
5c18a03733 Fix logprob for completions (#2301) 2024-12-01 05:17:05 -08:00
Yineng Zhang
fc78640e00 minor: support flashinfer nightly (#2295) 2024-12-01 18:55:26 +08:00
Lianmin Zheng
9449a95431 [CI] Balance CI tests (#2293) 2024-12-01 01:47:30 -08:00
Chayenne
7d1485d376 Add get weights by parameter name for llama (#2266) 2024-11-29 23:36:38 -08:00
Lianmin Zheng
b2ccf36d4d Fix memory leak during abort (#2238) 2024-11-28 02:22:15 -08:00
Lianmin Zheng
ea34350d88 Rename double sparsity config file (#2188) 2024-11-25 17:12:08 -08:00
Lianmin Zheng
fe5d3e818f Balance CI tests (#2162) 2024-11-24 07:38:52 -08:00
Lianmin Zheng
731146f6cb Fix mixed chunked prefill in overlap mode (#2158) 2024-11-24 07:17:37 -08:00
Lianmin Zheng
5652c56535 Update CI threshold & Improve code style (#2159) 2024-11-24 06:29:38 -08:00
Lianmin Zheng
dfec7fca06 Rename sglang.bench_latency to sglang.bench_one_batch (#2118) 2024-11-21 20:07:48 -08:00
Lianmin Zheng
c1f401fc58 Revert "chore: update torch v2.5.1" (#2063) 2024-11-17 15:29:38 -08:00
Yineng Zhang
3b878863f7 chore: update torch v2.5.1 (#1849) 2024-11-18 00:06:00 +08:00
Ke Bao
976bc302e5 Support DP MLA (#1970) 2024-11-16 09:01:43 +00:00
Lianmin Zheng
befc6beb86 Fix a typo in io_struct.py (#2008) 2024-11-11 16:34:10 -08:00
Lianmin Zheng
3d043319aa [CI] Balance unit tests (#1988) 2024-11-10 11:45:01 -08:00
Lianmin Zheng
520f0094e4 [CI] balance unit tests (#1977) 2024-11-09 16:46:14 -08:00
Xuehai Pan
a5e0defb5a minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) 2024-11-06 13:46:04 +00:00
Liangsheng Yin
b9fd178f1b Fix retraction + overlap (#1860)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2024-10-31 18:27:42 -07:00
Lianmin Zheng
a2e0424abf Fix memory leak for chunked prefill 2 (#1858)
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2024-10-31 14:51:51 -07:00
Lianmin Zheng
6aa94b967c Update ci workflows (#1804) 2024-10-26 04:32:36 -07:00
Lianmin Zheng
1701b0db31 Enhance the test case for chunked prefill (#1785) 2024-10-24 21:23:09 -07:00