sglang

Author	SHA1	Message	Date
Yineng Zhang	d8a136a113	upgrade sgl-kernel 0.0.5.post4 (#4873 )	2025-03-28 19:48:56 -07:00
Lianmin Zheng	74e0ac1dbd	Clean up `import vllm` in quantization/__init__.py (#4834 )	2025-03-28 10:34:10 -07:00
Xiaoyu Zhang	04e3ff6975	Support compressed tensors fp8w8a8 (#4743 )	2025-03-26 13:21:25 -07:00
Yineng Zhang	9b7cf9ee6c	support cu128 sgl-kernel (#4744 )	2025-03-24 20:53:23 -07:00
Adarsh Shirawalmath	f8f9244a61	[Bug Fix] Add partial rotary factor support for Phi-4 and upgrade to transformers v4.50.0 (#3984 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-03-22 14:27:39 -07:00
Xiaoyu Zhang	dd865befde	[Hotfix] solve fp8 w8a8 ci test fail (#4531 )	2025-03-17 23:17:04 -07:00
萝卜菜	d6d21640d3	[Feature] Support Deepseek-VL2 (#2798 ) Co-authored-by: Edenzzzz <wtan45@wisc.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Yi Zhang <1109276519@qq.com>	2025-03-16 23:07:59 -07:00
lukec	a53fe428f9	Support FlashMLA backend (#4472 ) Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-16 09:07:06 -07:00
Ying Sheng	1b859295f4	[Eagle] Remove the greedy branch and some redundant code (#4363 ) Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-16 02:48:55 -07:00
Mick	035ac2ab74	ci: update transformers==4.48.3 (#4451 )	2025-03-15 13:27:26 -07:00
Yineng Zhang	ad1ae7f7cd	use topk_softmax with sgl-kernel (#4439 )	2025-03-14 15:59:06 -07:00
Lianmin Zheng	f141298a3c	Update ci_install_dependency.sh to use accelerate 1.4.0 (#4392 ) Co-authored-by: wangyu <wangyu.steph@bytedance.com> Co-authored-by: wangyu <yuwangauto@foxmail.com>	2025-03-13 07:16:11 -07:00
Yineng Zhang	3623b6a7f5	upgrade sgl-kernel 0.0.5 (#4381 )	2025-03-13 02:37:56 -07:00
Yineng Zhang	ed91561f79	upgrade sgl-kernel 0.0.4.post3 (#4334 )	2025-03-12 01:36:41 -07:00
Yineng Zhang	1cf63485c1	upgrade flashinfer 0.2.3 (#4317 ) Co-authored-by: qingquansong <qsong@linkedin.com>	2025-03-11 15:37:17 -07:00
Yineng Zhang	4d27eb9ad1	update sgl-kernel 0.0.4.post2 (#4291 )	2025-03-11 00:34:33 -07:00
Lianmin Zheng	5a6400eec5	Test no vllm custom allreduce (#4256 )	2025-03-10 10:08:25 -07:00
Ke Bao	f1d09a6541	Update bench speculative script (#4235 )	2025-03-09 12:19:01 -07:00
Yineng Zhang	89ccb533ad	use sgl-kernel 0.0.4 (#4224 )	2025-03-08 23:43:09 -08:00
Yineng Zhang	70866b6f4f	use same version for ci and pyproject (#4187 )	2025-03-07 10:39:55 -08:00
Adarsh Shirawalmath	19fd57bcd7	[docs] fix HF reference script command (#4148 )	2025-03-06 13:21:54 -08:00
Ke Bao	9fafa62db7	Share target model embed and head weights for nextn (#4033 )	2025-03-03 13:30:04 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032 )	2025-03-03 07:02:14 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Yineng Zhang	564bdf29f7	upgrade flashinfer v0.2.2.post1 (#3934 )	2025-02-27 09:53:48 -08:00
Lianmin Zheng	c9745ee082	Fix pandas dependency in CI (#3818 )	2025-02-24 05:56:57 -08:00
Yineng Zhang	75d171a9c5	chore: update flashinfer v0.2.1.post2 (#3644 )	2025-02-18 02:47:42 +08:00
Shi Shuai	7443197a63	[CI] Improve Docs CI Efficiency (#3587 ) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-14 19:57:00 -08:00
Ke Bao	862dd76c76	Support NextN (MTP) speculative decoding for DeepSeek-V3/R1 (#3582 )	2025-02-15 05:28:34 +08:00
Yineng Zhang	70f894b810	feat: support flashinfer mla attention for deepseek v3 (#3550 )	2025-02-14 08:50:14 +08:00
Yineng Zhang	4d2dbeaca7	remove cutex dependency (#3422 )	2025-02-09 18:33:20 +08:00
Yineng Zhang	d39899e85c	upgrade flashinfer v0.2.0.post2 (#3288 ) Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>	2025-02-04 21:41:40 +08:00
Ke Bao	c23d5706f4	Update whl index path (#3128 )	2025-01-25 23:57:09 +08:00
Ke Bao	665e5e85f6	Add step to update sgl-kernel whl index (#3110 )	2025-01-25 02:03:01 +08:00
Byron Hsu	9a0cc2e90e	[router] Forward all request headers from router to workers (#3070 )	2025-01-23 20:30:31 -08:00
Lianmin Zheng	61f42b5732	Move sgl.Runtime under sglang/lang (#2990 )	2025-01-19 17:10:29 -08:00
Byron Hsu	ef18b0eda2	[router] Allow empty worker list for sglang.launch_router (#2979 )	2025-01-19 01:05:23 -08:00
Yineng Zhang	d06c1ab587	update ci install dependency (#2949 )	2025-01-17 23:42:23 +08:00
Lianmin Zheng	f65c13b559	Remove normalized_prompt_logprobs from the engine to make code easier to maintain (#2902 )	2025-01-15 04:54:14 -08:00
fzyzcjy	923f518337	CUDA-graph-compatible releasing and resuming KV cache and model weight memory (#2630 )	2025-01-13 11:38:51 -08:00
Lianmin Zheng	8a6906127a	Improve linear.py to load sharded weights & remove the dependency of Parameters from vllm (#2784 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-01-07 23:29:10 -08:00
Yineng Zhang	bc6ad367c2	fix lint (#2733 )	2025-01-05 14:45:42 +08:00
Ce Gao	f5d0865b25	feat: Support VLM in reference_hf (#2726 ) Signed-off-by: Ce Gao <gaocegege@hotmail.com>	2025-01-03 22:32:30 +08:00
Yineng Zhang	d49b13c6f8	feat: use CUDA 12.4 by default (for FA3) (#2682 )	2024-12-31 15:52:09 +08:00
fzyzcjy	f707470019	CI: Update scripts to fail fast (#2672 )	2024-12-30 19:04:01 -08:00
Yineng Zhang	d95a5f5bf5	fix followup #2517 (#2524 )	2024-12-19 23:24:30 +08:00
Ata Fatahi	ce094a5d79	Clean up GPU memory after killing sglang processes (#2457 ) Signed-off-by: Ata Fatahi <immrata@gmail.com>	2024-12-17 03:42:40 -08:00
Yineng Zhang	7154b4b1df	minor: update flashinfer nightly (#2490 )	2024-12-16 23:02:49 +08:00
Lianmin Zheng	835f8afc77	Migrate llama_classification to use the /classify interface (#2417 )	2024-12-08 23:30:51 -08:00
Lianmin Zheng	96db0f666d	Update killall_sglang.sh (#2397 )	2024-12-08 01:56:26 -08:00

94 Commits