sglang

Author	SHA1	Message	Date
Yineng Zhang	3289c1207d	Update the retry count (#5051)	2025-04-03 17:07:38 -07:00
Lianmin Zheng	f842853a40	Fix the timeout for unit-test-2-gpu in pr-test.yml (#4927)	2025-03-30 12:15:40 -07:00
Lianmin Zheng	4ede6770cd	Fix retract for page size > 1 (#4914)	2025-03-30 02:57:15 -07:00
Lianmin Zheng	74e0ac1dbd	Clean up `import vllm` in quantization/__init__.py (#4834)	2025-03-28 10:34:10 -07:00
fzyzcjy	0d3e3072ee	Fix CI of test_patch_torch (#4844)	2025-03-27 21:22:45 -07:00
fzyzcjy	e45ae444db	Revert "Add DeepEP tests into CI (#4737)" (#4751)	2025-03-25 00:44:01 -07:00
fzyzcjy	64129fa632	Add DeepEP tests into CI (#4737)	2025-03-24 19:54:31 -07:00
aoshen524	588865f0e0	[Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274) Co-authored-by: ShenAo1111 <1377693092@qq.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-03-18 20:33:07 -07:00
Yineng Zhang	c787298547	use sgl custom all reduce (#4441)	2025-03-18 00:46:41 -07:00
Lianmin Zheng	5493c3343e	Fix data parallel + tensor parallel (#4499)	2025-03-17 05:13:16 -07:00
Lianmin Zheng	06d12b39d3	Remove filter for pr-tests (#4468)	2025-03-16 00:57:26 -07:00
Lianmin Zheng	c30976fb41	Fix finish step for pr tests and notebook tests (#4467)	2025-03-16 00:52:06 -07:00
Yineng Zhang	ad1ae7f7cd	use topk_softmax with sgl-kernel (#4439)	2025-03-14 15:59:06 -07:00
Lianmin Zheng	a5a892ffd3	Fix auto merge & add back get_flat_data_by_layer (#4393)	2025-03-13 08:46:25 -07:00
Lianmin Zheng	5a6400eec5	Test no vllm custom allreduce (#4256)	2025-03-10 10:08:25 -07:00
Lianmin Zheng	aa957102a9	Simplify tests & Fix trtllm custom allreduce registration (#4252)	2025-03-10 01:24:22 -07:00
Lianmin Zheng	fbd560028a	Auto balance CI tests (#4238)	2025-03-09 21:05:55 -07:00
Lianmin Zheng	2cadd51d11	Test no vllm custom allreduce (#4210)	2025-03-08 05:23:06 -08:00
Lianmin Zheng	bc1534ff32	Fix a draft model accuracy bug in eagle; support step=1; return logprob in eagle (#4134) Co-authored-by: Sehoon Kim <kssteven418@gmail.com> Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-06 06:13:59 -08:00
Ke Bao	d3fe9bae56	Add accuracy test for TP torch compile (#3994)	2025-03-02 13:18:18 -08:00
fzyzcjy	e3e0bc50a9	[Feature] SPMD for SGLang + Verl (#3852)	2025-02-28 09:53:10 -08:00
Lianmin Zheng	d7934cde45	Fix CI and install docs (#3821)	2025-02-24 16:17:38 -08:00
Yineng Zhang	f983213a1f	update pr-test (#3663)	2025-02-18 17:23:43 +08:00
Yineng Zhang	e319153be8	update unit test (#3636)	2025-02-17 21:06:10 +08:00
Shi Shuai	7443197a63	[CI] Improve Docs CI Efficiency (#3587) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-14 19:57:00 -08:00
Yineng Zhang	70f894b810	feat: support flashinfer mla attention for deepseek v3 (#3550)	2025-02-14 08:50:14 +08:00
Jackmin801	5f0e7de339	[Feat] Return hidden states (experimental) (#3364) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-10 15:54:37 -08:00
Yineng Zhang	5da3d21c8b	update pr-test ci (#3376)	2025-02-07 21:08:35 +08:00
Chayenne	76ca91dff2	Docs/CI: Enable Fake Finish for Docs Only PR (#3350)	2025-02-06 19:33:31 -08:00
Yineng Zhang	d39899e85c	upgrade flashinfer v0.2.0.post2 (#3288) Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>	2025-02-04 21:41:40 +08:00
Yineng Zhang	827aa8730b	cleanup sgl-kernel kernels (#3175)	2025-01-27 19:11:01 +08:00
Lianmin Zheng	4a61253123	Do not load OPENAI_KEY from secrets (#3147)	2025-01-26 01:54:03 -08:00
Lianmin Zheng	4f118a39d7	Fix repetition penalty (#3139)	2025-01-25 21:48:58 -08:00
Lianmin Zheng	da6f8081f6	Fix CI tests (#3132)	2025-01-25 17:43:39 -08:00
Lianmin Zheng	a4331cd260	Add accuracy and latency tests of eagle into CI (#3027)	2025-01-21 02:55:14 -08:00
Lianmin Zheng	09bcbe0123	Update TypeBasedDispatcher and balance CI tests (#3001)	2025-01-19 23:37:27 -08:00
Lianmin Zheng	cd493b5afc	Improve metrics, logging, and importing orders (#2992)	2025-01-19 18:36:59 -08:00
Yineng Zhang	58f42b1dd8	minor: update pr test (#2908)	2025-01-16 05:51:49 +08:00
Lianmin Zheng	67008f4b32	Use only one GPU for MLA CI tests (#2858)	2025-01-13 03:55:33 -08:00
Lianmin Zheng	bdc1acf6cd	Misc fix for min_p_sampling, --cuda-graph-bs (#2761)	2025-01-07 02:52:53 -08:00
Lianmin Zheng	b0524c3789	Eagle speculative decoding part 2: Fix cuda graph + DP attention hanging (#2684) Co-authored-by: yukavio <kavioyu@gmail.com>	2024-12-31 02:25:05 -08:00
Yineng Zhang	d49b13c6f8	feat: use CUDA 12.4 by default (for FA3) (#2682)	2024-12-31 15:52:09 +08:00
Lianmin Zheng	8c3b420eec	[Docs] clean up structured outputs docs (#2654)	2024-12-29 23:57:16 -08:00
Lianmin Zheng	dc3bee4815	Fix test and benchmark scripts (#2598)	2024-12-26 07:56:26 -08:00
Yineng Zhang	7154b4b1df	minor: update flashinfer nightly (#2490)	2024-12-16 23:02:49 +08:00
xiaobochen	3d32e4a32c	Resubmit MoE-EP (#2371)	2024-12-06 15:05:21 +08:00
Chayenne	983bfcf386	Online weight updates from torch.distributed (#2279)	2024-12-01 23:23:18 -08:00
Lianmin Zheng	5c18a03733	Fix logprob for completions (#2301)	2024-12-01 05:17:05 -08:00
Yineng Zhang	fc78640e00	minor: support flashinfer nightly (#2295)	2024-12-01 18:55:26 +08:00
Lianmin Zheng	9449a95431	[CI] Balance CI tests (#2293)	2024-12-01 01:47:30 -08:00

92 Commits