sglang

Author	SHA1	Message	Date
yilian49	f64b8e3e4e	Support the internvl3.5 family models in sglang (#9705)	2025-09-02 22:06:48 +08:00
chenxj	d4a938417d	[feat] Support tp mode for DeepSeek-R1-W4AFP8 (#8118) Co-authored-by: yuhyao <827623970@qq.com>	2025-09-01 22:17:26 -07:00
Guoyuan Lin	5e194b2143	[Model] Support Meituan LongCat-Flash && LongCat-Flash-MTP (#9824)	2025-08-30 23:29:21 -07:00
Liangsheng Yin	eb19ccadae	[bug] fix errors related to context length in SD (#9388)	2025-08-21 10:32:34 +08:00
blzheng	ebbb75e917	[CPU] Fix TP padding issue on Phi-4 (#8289)	2025-08-17 16:25:26 -07:00
Netanel Haber	845d12a979	model: support nvidia/Llama-3_3-Nemotron-Super-49B-v1 (#9067) Co-authored-by: Kyle Huang <kylhuang@nvidia.com>	2025-08-17 01:48:15 -07:00
Zhihao Liu	65736dc524	[Model] Support Qwen3ForSequenceClassification for Qwen3-Embed Model (#7957)	2025-08-13 11:14:54 -07:00
Binyao Jiang	f29aba8c6e	Support glm4.1v and glm4.5v (#8798) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com> Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com> Co-authored-by: Chang Su <csu272@usc.edu>	2025-08-09 00:59:13 -07:00
Lianmin Zheng	706bd69cc5	Clean up server_args.py to have a dedicated function for model specific adjustments (#8983)	2025-08-08 19:56:50 -07:00
Wenbo Yang	1132547496	Add ernie4.py for ERNIE-4.5 (#7657)	2025-08-08 00:55:48 -07:00
PGFLMG	b7cd743038	[Feat] QWen-1M context support[2/2]: Update block sparse attention backend (#5949)	2025-08-06 23:49:36 -07:00
kk	d4bf5a8524	Support OCP MXFP4 quantization on AMD GPUs (#8255) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>	2025-08-04 18:14:52 -07:00
Ke Bao	8fbcfd0723	Update step3v default config (#8626)	2025-08-01 00:49:26 +08:00
Chang Su	51c38163c1	model: support Step3V (#8583) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: nnnobody-code <nnnobody@foxmail.com> Co-authored-by: ispobock <ispobaoke@gmail.com> Co-authored-by: Qiaolin-Yu <qy254@cornell.edu> Co-authored-by: Qiaolin-Yu <liin1211@outlook.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-07-31 02:41:00 -07:00
Lifu Huang	fb16fbaf52	Fix incorrect KV cache allocation for MTP models. (#8482) Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-07-28 22:54:50 -07:00
Yuxuan Zhang	6d6a8bc278	GLM-4.5 Model Support (#8224) Co-authored-by: Lifu Huang <lifu.hlf@gmail.com> Co-authored-by: Binyao Jiang <byjiang1996@gmail.com> Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-07-27 22:54:07 -07:00
RunningLeon	b7094a5ef1	model: support intern-s1 (#8350) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: zxy <zhou0493@e.ntu.edu.sg> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-07-26 13:48:51 -07:00
Minho Ryu	bfb118c01e	fix bug when eos_ids==0 (#8315)	2025-07-23 23:18:47 -07:00
Xinyuan Tong	8430bfe3e9	[Refactor] simplify multimodal data processing (#8107) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-07-20 21:43:09 -07:00
GuoYipin	750838adc4	fix: fix the bug of loading Internvl3 (#8067) Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-07-20 22:22:54 +08:00
Lianmin Zheng	bb0e8a32b5	Clean up server args (#8161)	2025-07-19 11:32:52 -07:00
Haohui Mai	d918ab7985	Support NVFP4 quantized dense models on AMD CDNA2/CDNA3 GPUs (#7302) Co-authored-by: HAI <hixiao@gmail.com> Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>	2025-07-18 19:59:39 -07:00
jianan-gu	48c1fa7bb6	[CPU][Llama4] Fix Llama4 MoE inputs with "apply_router_weight_on_input" (#7889)	2025-07-17 21:43:25 -07:00
Hanming Lu	9379da77de	SWA Prefix Cache (#7367) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-07-13 12:31:07 -07:00
Atream	615553079d	Support Kimi K2 (#7940)	2025-07-11 00:02:21 -07:00
ronnie_zheng	766392c6bd	[feature]Ascend quantization support (#7791) Co-authored-by: ichernob <ichernobnn@gmail.com> Co-authored-by: liupeng <liupeng374@huawei.com>	2025-07-10 09:17:37 -07:00
SijiaYang	cb9d91ea8a	feat: support DeepSeek-R1-W4AFP8 model with ep-moe mode (#7762) Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>	2025-07-07 14:47:21 -07:00
Leng Yue	8364608930	add model: qwen2-audio (#7596)	2025-07-04 21:13:10 -07:00
Chunyuan WU	1dce6c480f	[CPU] support the case where num_attention_heads or intermediate_size is not divisible by the TP size (#6771)	2025-07-03 09:51:38 -07:00
Lianmin Zheng	22352d47a9	Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632) Co-authored-by: Kan Wu <wukanustc@gmail.com>	2025-06-29 23:16:19 -07:00
JieXin Liang	b691dcc490	[misc] reduce weird rope_scaling_factor warning (#7176)	2025-06-29 15:42:45 -07:00
tarinkk	eb6c2c1663	Hybrid kv cache for LLaMA4 (#6563) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> Co-authored-by: tarinkk <rt572@physics.rutger.edu> Co-authored-by: tarinkk <rt572@rutgers.physics.edu> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-06-27 18:58:55 -07:00
Lianmin Zheng	ce3a3e8783	Move multimodal processors into a separate folder (#7581)	2025-06-27 11:58:24 -07:00
Xinyuan Tong	9b00990bea	chore: remove vlm unnecessary import (#7541) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: yhyang201 <yhyang201@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2025-06-26 01:38:15 -07:00
woodx	e30ef368ab	Feat/support rerank (#6058)	2025-06-16 10:50:01 -07:00
Zijian	31d6dee5c4	Support VILA models (#6106)	2025-06-11 11:47:25 -07:00
Marc Sun	37f1547587	[FEAT] Add transformers backend support (#5929)	2025-06-03 21:05:29 -07:00
Mick	ce9d690ef4	fix: fix nightly test from updating transformers (#6658)	2025-05-27 00:28:11 -07:00
Yineng Zhang	7eb9d8e594	chore: upgrade transformers 4.52.3 (#6575) Co-authored-by: Mick <mickjagger19@icloud.com>	2025-05-25 22:49:58 -07:00
Lifu Huang	022012aae8	Support Phi-4 Multi-Modal (text + vision only) (#6494)	2025-05-24 21:43:38 -07:00
HandH1998	1b2e8f76d9	[2/2] Support Qserve (#6521)	2025-05-23 12:39:18 -07:00
Chang Su	4685fbb888	[VLM] Support chunk prefill for VLM (#6355) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-22 20:32:41 -07:00
ryang	a6ae3af15e	Support XiaomiMiMo inference with mtp (#6059)	2025-05-22 14:14:49 -07:00
Mick	01dd39bac1	refactor: minor refactors regarding multimodal processing (#6187)	2025-05-17 22:53:20 -07:00
Kiv Chen	64825b8395	model(vlm): mistral 3.1 (#5099) Co-authored-by: KivenChen <sleigh-queue-0y@icloud.com>	2025-05-16 18:36:18 -07:00
Kiv Chen	5380cd7ea3	model(vlm): pixtral (#5084)	2025-05-13 00:16:10 -07:00
Lianmin Zheng	e8e18dcdcc	Revert "fix some typos" (#6244)	2025-05-12 12:53:26 -07:00
applesaucethebun	d738ab52f8	fix some typos (#6209) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-13 01:42:38 +08:00
applesaucethebun	2ce8793519	Add typo checker in pre-commit (#6179) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 12:55:00 +08:00
yhyang201	cec98f1034	[Fix] Incorrect Memory Allocation on CUDA:0 by Non-Zero CUDA Processes in TP/DP (#5745)	2025-05-08 17:52:26 -07:00

117 Commits