sglang

Author	SHA1	Message	Date
Xiaoze Fan	570d33437b	[Feature] Layer-wise Prefill (#7634 ) Signed-off-by: jason-fxz <jason341132@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-17 01:57:46 +08:00
ronnie_zheng	766392c6bd	[feature]Ascend quantization support (#7791 ) Co-authored-by: ichernob <ichernobnn@gmail.com> Co-authored-by: liupeng <liupeng374@huawei.com>	2025-07-10 09:17:37 -07:00
Leng Yue	8364608930	add model: qwen2-audio (#7596 )	2025-07-04 21:13:10 -07:00
Ximingwang-09	1964c325de	[feat] Support EAGLE3 for Qwen (#7745 ) Co-authored-by: 纬杭 <ximing.wxm@antgroup.com> Co-authored-by: zyksir <zyksir@outlook.com>	2025-07-04 19:50:28 -07:00
Yi Zhang	264dc6e744	[optimize] add two stream norm for qwen3 (#7740 ) Co-authored-by: ispobock <ispobaoke@gmail.com>	2025-07-03 09:59:17 -07:00
Yi Zhang	646cef2e2e	support qwen3 dense model dp attention (#7681 )	2025-07-03 09:58:20 -07:00
Chunyuan WU	1dce6c480f	[CPU] support the case where num_attention_heads or intermediate_size is not divisible by the TP size (#6771 )	2025-07-03 09:51:38 -07:00
Shenggui Li	3f23d8cdf1	added support for tied weights in qwen pipeline parallelism (#6546 )	2025-05-25 00:00:56 -07:00
libra	11553c1a37	Add pipeline parallelism for Qwen2 and Qwen3 Model (#6250 )	2025-05-18 00:42:55 -07:00
yhyang201	4db463b1ad	[Model] Adding Qwen3 and Qwen3MoE (#4693 )	2025-04-18 09:51:29 -07:00
Yun Dai	2695ab0537	Fix loading KV quantization scale; Enable modelopt kv cache (#4686 ) Co-authored-by: qingquansong <ustcsqq@gmail.com>	2025-04-08 09:11:35 -07:00
Mick	5cb552b1d4	refactor: multimodal data (#4754 )	2025-03-31 09:57:51 -07:00
Mick	11577cedb7	refactor: bug fixes and refactor for vlm (#4661 )	2025-03-22 22:48:49 -07:00
yych0745	6f43a9b9f4	remove the unused readline dependency from the Qwen2 model implementa… (#4340 )	2025-03-12 02:47:27 -07:00
Qubitium-ModelCloud	56a724eba3	[QUANT] Add GPTQModel Dynamic Quantization + `lm_head` Quantization (#3790 ) Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>	2025-03-05 01:11:00 -08:00
fzyzcjy	a3339d8cac	Bug: Fix weight loader error when LM head weights are tied (#3766 )	2025-02-21 17:53:12 -08:00
Chayenne	14d90617b0	Bug: fix lm head weights in Qwen models (#3777 )	2025-02-21 16:49:31 -08:00
simveit	20b765a26e	Model: Support Qwen 72B RM model. (#3772 )	2025-02-21 14:38:21 -08:00
Mick	9f635ea50d	[Fix] Address remaining issues of supporting MiniCPMV (#2977 )	2025-01-28 00:22:13 -08:00
Mick	3d93f84a00	[Feature] Support minicpmv v2.6 (#2785 ) Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-01-18 14:14:19 -08:00
Yineng Zhang	2add697d7a	feat: remove vllm get_rope (#2964 )	2025-01-18 19:38:01 +08:00
bjmsong	d3024f4fc8	support e4m3 kvcache in qwen2 & add kv scaling facotr json (#2894 ) Co-authored-by: bjmsong <bjmsong@126.com>	2025-01-18 11:43:22 +08:00
Ke Bao	53e6552fed	Fix qwen accuracy issue (#2945 )	2025-01-17 22:35:26 +08:00
Yineng Zhang	5dc54f1a62	feat: remove vllm distributed (#2907 ) Co-authored-by: Zhangyi <1109276519@qq.com>	2025-01-17 22:31:51 +08:00
Rin Intachuen	a2f602b541	fixed lm_head.weight error for quantized qwen (#2910 )	2025-01-16 06:51:43 -08:00
Lzhang-hub	6ec75e626d	add qwen2 eagle model (#2863 )	2025-01-13 05:29:33 -08:00
Lianmin Zheng	959735fc9e	Fix model loader for more quantization formats (#2448 )	2024-12-11 05:21:23 -08:00
Yineng Zhang	85e1a6f3aa	Update model_loader deps and qqq quantization deps (#2220 ) (#2318 ) Co-authored-by: HandH1998 <1335248067@qq.com>	2024-12-02 23:22:13 +08:00
Lianmin Zheng	4936be8acc	Revert "Revert "[FEAT] Support GGUF format"" (#2287 )	2024-11-30 22:14:48 -08:00
Lianmin Zheng	7e4c6dd8da	Revert "[FEAT] Support GGUF format" (#2285 )	2024-11-30 19:03:26 -08:00
Yang Zheng	883c955489	[FEAT] Support GGUF format (#2215 ) Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>	2024-11-30 00:44:48 -08:00
Ying Sheng	8b48496aaf	Revert "Revert "Add simple CPU offloading support"" (#2253 ) Co-authored-by: Jani Monoses <jani.monoses@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-28 23:58:54 -08:00
Ying Sheng	4057ea82c9	Revert "Add simple CPU offloading support" (#2252 ) We'll re-add the commit to correctly ack Kaichao's authorship	2024-11-28 23:36:55 -08:00
Jani Monoses	d98fa1e93d	Add simple CPU offloading support. (#2081 )	2024-11-23 06:23:53 +00:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077 )	2024-11-22 22:16:53 +08:00
Ke Bao	16eb33ffe2	Update vocab embedding deps and add TP switch (#1856 )	2024-10-31 20:13:07 -07:00
Byron Hsu	56503d9bc9	[1/N] Remove `CacheConfig` import in all model files (#1658 )	2024-10-14 09:06:34 -07:00
Lianmin Zheng	36d5acfca5	Rename InputMetadata -> ForwardBatch (#1543 )	2024-09-30 02:41:11 -07:00
Yineng Zhang	b4408b0d16	feat: update linear deps 1/N (#1305 )	2024-09-19 20:53:11 +08:00
Liangsheng Yin	70b6802982	Optimize conflicts between CUDA graph and vocab mask tensors (#1392 )	2024-09-13 20:27:53 -07:00
Liangsheng Yin	381dd57bd6	Sampler cudagraph (#1253 )	2024-08-28 18:58:52 -07:00
Lianmin Zheng	bf53bf5142	[Fix] Fix llava on multi images (#1247 )	2024-08-28 06:33:05 -07:00
Yineng Zhang	f25f4dfde5	hotfix: revert sampler CUDA Graph (#1242 )	2024-08-28 21:16:47 +10:00
Liangsheng Yin	75ce37f401	Move sampler into CUDA graph (#1201 ) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-26 07:02:50 -07:00
Chayenne	30b4f771b0	Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-25 10:29:12 -07:00
Yineng Zhang	6a38efa834	feat: replace all rmsnorm and silu (#1057 )	2024-08-13 02:15:59 +10:00
Liangsheng Yin	87e8c090e9	Organize code (rename, movement) (#953 )	2024-08-06 20:50:32 -07:00
Liangsheng Yin	cdcbde5fc3	Code structure refactor (#807 )	2024-07-29 23:04:48 -07:00
Yineng Zhang	dd7e8b9421	chore: add copyright for srt (#790 )	2024-07-28 23:07:12 +10:00
zhyncs	2e341cd493	misc: add pre-commit config (#637 )	2024-07-17 11:55:39 -07:00

65 Commits