sglang

Author	SHA1	Message	Date
fzyzcjy	4c7b42424c	Hint users DeepEP normal mode is incompatible with CUDA Graph (#5014)	2025-05-07 22:40:59 +08:00
Song Zhang	00c2c1f08b	[Feature] Support for Ascend NPU backend (#3853) Signed-off-by: Song Zhang <gepin.zs@antgroup.com> Co-authored-by: 22dimensions <waitingwind@foxmail.com>	2025-05-06 20:32:53 -07:00
Jinyan Chen	8a828666a3	Add DeepEP to CI PR Test (#5655) Co-authored-by: Jinyan Chen <jinyanc@nvidia.com>	2025-05-06 17:36:03 -07:00
Liangsheng Yin	a3e4e9bf9e	Better PD initialization (#5751)	2025-05-07 01:12:57 +08:00
Liangsheng Yin	6d4d3bc81d	Fix not "import os" (#6057)	2025-05-06 22:06:41 +08:00
Zhiqiang Xie	b26cb1c55a	Fix problem of large page size with chunked prefill (#6046)	2025-05-06 15:19:47 +08:00
Zhiqiang Xie	f8e460930a	Fix prefill OOM error in the case of large page size (#5081)	2025-05-05 16:02:55 -07:00
Adarsh Shirawalmath	683707c314	[Security][Bug] Prevent binding to all TCP interfaces (#5752)	2025-05-06 03:21:45 +08:00
mlmz	a68ed76682	feat: append more comprehensive fields in messages instead of merely role and content (#5996)	2025-05-05 11:43:34 -07:00
DefTruth	82653f6622	feat: Add a unified merge_state API (#5428)	2025-05-05 10:32:33 -07:00
Wenxuan Tan	22da3d978f	Fix "Avoid computing lse in Ragged Prefill when there's no prefix match" (#5555)	2025-05-05 10:32:17 -07:00
shangmingc	56f6589ecb	[PD] Optimize disaggregation ib device help info (#5781)	2025-05-05 13:47:37 +08:00
fzyzcjy	3008db9c1a	[PD] Allow customizing reserved tokens to avoid KV cache waste (#6002)	2025-05-05 11:23:15 +08:00
Junrong Lin	357fb2dba5	fix: fix broadcast_pyobj breaking VerlEngine (#5997)	2025-05-04 13:15:53 -07:00
Qiaolin Yu	3042f1da61	Fix flaky issues of lora and add multi batch tests (#5957)	2025-05-04 13:11:40 -07:00
Lifu Huang	2b63798c7d	[Minor] Fix duplicate method definitions in conversation.py (#6012) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-04 13:02:53 -07:00
Baizhou Zhang	bf203cb7a2	[Fix] Suppress dynamo logging when using flashinfer backend with torch compile (#5992)	2025-05-04 09:49:13 -07:00
JieXin Liang	8ebde73f7d	[perf] H100 DeepSeek-V3 fused moe tuned config (#5998)	2025-05-03 14:02:26 -07:00
Stefan He	6b0fae797a	Fix Phi3 serving which was broke by earlier change (#5991) Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-03 00:28:47 -07:00
Ke Bao	d8ab60117f	Overlap qk norm with two streams (#5977)	2025-05-02 09:26:30 -07:00
Ke Bao	6579cd7daf	Fix set kv cache multi-stream (#5975)	2025-05-02 09:26:00 -07:00
Yongtong Wu	97ac42b634	[PD] NIXL backend Prefill TP & Decode TP+DP (#5681)	2025-05-02 22:14:03 +08:00
Lifu Huang	1acca3a2c6	FA3 speed up: skip len operation and get batch size directly from forward batch (#5969) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-02 00:26:12 -07:00
xm:D	3409aaab32	Support InternVL3 (#5350) Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-05-01 22:38:59 -07:00
Chang Su	170d1f218a	feat: Refactor DeepSeekV3 function call (#5908)	2025-05-01 21:28:57 -07:00
KCFindstr	d33955d28a	Properly return error response in vertex_generate HTTP endpoint (#5956)	2025-05-01 11:48:58 -07:00
Stefan He	6fc175968c	Optimize a pad operation to accelerate 25us (#5945)	2025-05-01 10:48:55 -07:00
Ke Bao	ebaba85655	Update ci test and doc for MTP api change (#5952)	2025-05-01 09:30:27 -07:00
Ke Bao	de2faef97e	Remove extra contiguous (#5953)	2025-05-01 09:28:46 -07:00
Yuan Luo	67b7d5b1df	[PD] Vectorise group_concurrent_contiguous in NumPy (#5834) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-05-01 22:42:37 +08:00
ryang	4322c31e24	Support XiaomiMiMo/MiMo model inference (#5921)	2025-05-01 07:41:13 -07:00
Yineng Zhang	9858113c33	chore: bump v0.4.6.post2 (#5939)	2025-04-30 22:04:40 -07:00
Yineng Zhang	8441baad6e	fix: update model runner (#5934)	2025-04-30 19:49:26 -07:00
mlmz	256c4c2519	fix: correct stream response when enable_thinking is set to false (#5881)	2025-04-30 19:44:37 -07:00
Qiaolin Yu	7bcd8b1cb2	Fix lora batch processing when input lora_path contains None (#5930)	2025-04-30 19:42:42 -07:00
Ying Sheng	11383cec3c	[PP] Add pipeline parallelism (#5724)	2025-04-30 18:18:07 -07:00
XinyuanTong	e97e57e699	Remove unused method `calculate_num_image_tokens` from qwen2_vl.py (#5783)	2025-04-30 17:46:59 -07:00
Yineng Zhang	9a6ad8916d	chore: upgrade sgl-kernel 0.1.1 (#5933)	2025-04-30 16:13:30 -07:00
laixin	e330f2b86c	[qwen3] support qwen3 ep moe (#5917) Co-authored-by: sleepcoo <sleepcoo@gmail.com>	2025-04-30 09:15:21 -07:00
liwenju0	8fefdd32c7	[Feature] add support kimi vl model (#5383) Co-authored-by: wenju.li <wenju.li@deepctr.cn>	2025-04-29 21:31:19 -07:00
lambert0312	1698e94e67	Add A800 fused moe config for qwen3 235b (#5900)	2025-04-29 20:18:11 -07:00
Qiaolin Yu	58195dd588	[Fix] Unload lora in HF_Runner if needed (#5899)	2025-04-29 20:17:42 -07:00
Baizhou Zhang	799789afed	Bump Flashinfer to 0.2.5 (#5870) Co-authored-by: Yuhao Chen <yxckeis8@gmail.com>	2025-04-29 19:50:57 -07:00
ybyang	cc4a80caf6	[PD] Fix Assertion failed: /DeepEP/csrc/kernels/internode.cu:483, condition: ibgda_get_state()->num_rc_per_pe >= num_channels #134 (#5830)	2025-04-29 19:38:54 -07:00
lambert0312	3c8a52311a	Fix check_env script (#5901)	2025-04-29 18:54:54 -07:00
Chang Su	28b26dbf48	[Bugfix]: fix missing queue_time_start for requests from grammar_queue (#5696)	2025-04-29 17:31:44 -07:00
Chang Su	2b06484bd1	feat: support pythonic tool call and index in tool call streaming (#5725)	2025-04-29 17:30:44 -07:00
JieXin Liang	e4b6133b78	[fix] relax mem_fraction_static for h200 (#5893) Co-authored-by: alcanerian <alcanerian@gmail.com>	2025-04-29 17:01:12 -07:00
Ke Bao	dd408ee481	Auto set draft model path for MTP (#5793)	2025-04-29 16:25:40 -07:00
lambert0312	91dda4cd06	Add A800 fused moe config for qwen3 30b (#5880)	2025-04-29 02:02:24 -07:00

2106 Commits