sglang

Author	SHA1	Message	Date
Liangsheng Yin	a3e4e9bf9e	Better PD initialization (#5751)	2025-05-07 01:12:57 +08:00
Liangsheng Yin	6d4d3bc81d	Fix not "import os" (#6057)	2025-05-06 22:06:41 +08:00
Yineng Zhang	5f300141b7	docs: add new blog (#6048)	2025-05-06 00:48:10 -07:00
Yineng Zhang	1c05425bcb	docs: add Google Cloud Vertex AI in Adoption and Sponsorship (#6047)	2025-05-06 00:31:48 -07:00
Zhiqiang Xie	b26cb1c55a	Fix problem of large page size with chunked prefill (#6046)	2025-05-06 15:19:47 +08:00
Zhiqiang Xie	f8e460930a	Fix prefill OOM error in the case of large page size (#5081)	2025-05-05 16:02:55 -07:00
Adarsh Shirawalmath	683707c314	[Security][Bug] Prevent binding to all TCP interfaces (#5752)	2025-05-06 03:21:45 +08:00
mlmz	a68ed76682	feat: append more comprehensive fields in messages instead of merely role and content (#5996)	2025-05-05 11:43:34 -07:00
DefTruth	82653f6622	feat: Add a unified merge_state API (#5428)	2025-05-05 10:32:33 -07:00
Wenxuan Tan	22da3d978f	Fix "Avoid computing lse in Ragged Prefill when there's no prefix match" (#5555)	2025-05-05 10:32:17 -07:00
Huapeng Zhou	b8559764f6	[Test] Add flashmla attention backend test (#5587)	2025-05-05 10:32:02 -07:00
shangmingc	56f6589ecb	[PD] Optimize disaggregation ib device help info (#5781)	2025-05-05 13:47:37 +08:00
Lifu Huang	1232f7e8b7	Update dev container config to support live code sync and improve docker setup guide (#6018) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-04 22:33:46 -07:00
fzyzcjy	3008db9c1a	[PD] Allow customizing reserved tokens to avoid KV cache waste (#6002)	2025-05-05 11:23:15 +08:00
Junrong Lin	357fb2dba5	fix: fix broadcast_pyobj breaking VerlEngine (#5997)	2025-05-04 13:15:53 -07:00
vzed	95c231e50d	Tool Call: Add `chat_template_kwargs` documentation (#5679)	2025-05-04 13:12:40 -07:00
Qiaolin Yu	3042f1da61	Fix flaky issues of lora and add multi batch tests (#5957)	2025-05-04 13:11:40 -07:00
Lifu Huang	2b63798c7d	[Minor] Fix duplicate method definitions in conversation.py (#6012) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-04 13:02:53 -07:00
Baizhou Zhang	bf203cb7a2	[Fix] Suppress dynamo logging when using flashinfer backend with torch compile (#5992)	2025-05-04 09:49:13 -07:00
JieXin Liang	8ebde73f7d	[perf] H100 DeepSeek-V3 fused moe tuned config (#5998)	2025-05-03 14:02:26 -07:00
Stefan He	6b0fae797a	Fix Phi3 serving which was broke by earlier change (#5991) Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-03 00:28:47 -07:00
Yineng Zhang	141a459644	fix: only upgrade nccl for cu128 (#5986)	2025-05-02 11:31:29 -07:00
Ke Bao	d8ab60117f	Overlap qk norm with two streams (#5977)	2025-05-02 09:26:30 -07:00
Ke Bao	6579cd7daf	Fix set kv cache multi-stream (#5975)	2025-05-02 09:26:00 -07:00
Yongtong Wu	97ac42b634	[PD] NIXL backend Prefill TP & Decode TP+DP (#5681)	2025-05-02 22:14:03 +08:00
Lifu Huang	1acca3a2c6	FA3 speed up: skip len operation and get batch size directly from forward batch (#5969) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-02 00:26:12 -07:00
XinyuanTong	6ea1e6ac6e	Support MMMU benchmark for InternVL (#5968)	2025-05-02 00:17:21 -07:00
xm:D	3409aaab32	Support InternVL3 (#5350) Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-05-01 22:38:59 -07:00
Chayenne	73dcf2b326	Remove token in token out in Native API (#5967)	2025-05-01 21:59:43 -07:00
Chang Su	170d1f218a	feat: Refactor DeepSeekV3 function call (#5908)	2025-05-01 21:28:57 -07:00
Sai Enduri	73bc1d00fc	Add 1 gpu perf and 2 gpu accuracy tests for AMD MI300x CI. (#5960)	2025-05-01 20:56:59 -07:00
XinyuanTong	c5645e928f	feat: add concurrency evaluation logic in mmmu benchmark (#5782)	2025-05-01 18:20:08 -07:00
KCFindstr	d33955d28a	Properly return error response in vertex_generate HTTP endpoint (#5956)	2025-05-01 11:48:58 -07:00
Stefan He	6fc175968c	Optimize a pad operation to accelerate 25us (#5945)	2025-05-01 10:48:55 -07:00
江家瑋	ad506a4e6b	docs: Fix Qwen model typo (#5944) Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>	2025-05-01 10:23:00 -07:00
Ke Bao	ebaba85655	Update ci test and doc for MTP api change (#5952)	2025-05-01 09:30:27 -07:00
Ke Bao	de2faef97e	Remove extra contiguous (#5953)	2025-05-01 09:28:46 -07:00
Yuan Luo	67b7d5b1df	[PD] Vectorise group_concurrent_contiguous in NumPy (#5834) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-05-01 22:42:37 +08:00
ryang	4322c31e24	Support XiaomiMiMo/MiMo model inference (#5921)	2025-05-01 07:41:13 -07:00
Yineng Zhang	9858113c33	chore: bump v0.4.6.post2 (#5939)	2025-04-30 22:04:40 -07:00
Yineng Zhang	8441baad6e	fix: update model runner (#5934)	2025-04-30 19:49:26 -07:00
mlmz	256c4c2519	fix: correct stream response when enable_thinking is set to false (#5881)	2025-04-30 19:44:37 -07:00
Johnny	9f21e75453	add Thor & Spark (#5915)	2025-04-30 19:43:40 -07:00
Qiaolin Yu	7bcd8b1cb2	Fix lora batch processing when input lora_path contains None (#5930)	2025-04-30 19:42:42 -07:00
Ying Sheng	11383cec3c	[PP] Add pipeline parallelism (#5724)	2025-04-30 18:18:07 -07:00
XinyuanTong	e97e57e699	Remove unused method `calculate_num_image_tokens` from qwen2_vl.py (#5783)	2025-04-30 17:46:59 -07:00
Yineng Zhang	9a6ad8916d	chore: upgrade sgl-kernel 0.1.1 (#5933)	2025-04-30 16:13:30 -07:00
Yineng Zhang	d353d08b4e	chore: bump sgl-kernel 0.1.1 (#5932)	2025-04-30 14:01:49 -07:00
PGFLMG	08acdb5c3d	[Feat] Scale up fa3 kernel to sm8x arch (#5912) Co-authored-by: zhyncs <me@zhyncs.com>	2025-04-30 13:59:36 -07:00
Sai Enduri	2afba1b1c1	Add TP2 MOE benchmarks for AMD. (#5909)	2025-04-30 11:38:20 -07:00

3455 Commits