sglang

Author	SHA1	Message	Date
Mick	035ac2ab74	ci: update transformers==4.48.3 (#4451)	2025-03-15 13:27:26 -07:00
Yineng Zhang	ad1ae7f7cd	use topk_softmax with sgl-kernel (#4439)	2025-03-14 15:59:06 -07:00
Lianmin Zheng	e73167ade3	Fix maximum recursion depth triggered on exception exit (#4438)	2025-03-14 15:12:26 -07:00
Baoyuan Qi	642ab418f3	[bug] fix duplicate variable MAX_PIXELS in qwen_vl.py (#4419)	2025-03-14 01:28:25 -07:00
wangyu	1ce4878d31	feat(remote_model): support variable remote backend for model loader (#3964) Signed-off-by: wangyu <wangyu.steph@bytedance.com>	2025-03-14 00:40:44 -07:00
Yineng Zhang	977d7cd26a	cleanup deps 1/n (#4400) Co-authored-by: sleepcoo <sleepcoo@gmail.com>	2025-03-14 00:00:33 -07:00
Lu Changqi	0e0ec70200	Hierarchical Caching supports MLA (#4009) Signed-off-by: Changqi Lu <luchangqi.123@bytedance.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-03-13 20:42:14 -07:00
Yineng Zhang	ba80c102f9	bump v0.4.4.post1 (#4402)	2025-03-13 17:53:46 -07:00
Zhiqiang Xie	fbdb50501f	Hot fix for hicache with new page aligned radixtree (#4397)	2025-03-13 15:50:49 -07:00
Qiaolin Yu	85d2365d33	Fix the output of hidden states after HTTP requests (#4269)	2025-03-13 14:54:06 -07:00
Chang Su	5fe79605a8	Fix Llama3.3 tool call support (#4320)	2025-03-13 14:01:41 -07:00
Lianmin Zheng	c6d7f8d370	Add some fused elementwise kernels for grok-1 (#4398) Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-03-13 13:39:10 -07:00
Lianmin Zheng	a5a892ffd3	Fix auto merge & add back get_flat_data_by_layer (#4393)	2025-03-13 08:46:25 -07:00
Lianmin Zheng	8e66fbecee	Improve DP attention (#4390) Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-03-13 08:23:56 -07:00
Lianmin Zheng	4fea040ca1	Fix a regression introduced by overlapping KV cache writing (#4375)	2025-03-13 03:49:05 -07:00
Yineng Zhang	6aaeb84872	chore: bump v0.4.4 (#4041)	2025-03-13 02:49:58 -07:00
Yineng Zhang	3623b6a7f5	upgrade sgl-kernel 0.0.5 (#4381)	2025-03-13 02:37:56 -07:00
Lianmin Zheng	45de89719c	Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367)	2025-03-12 23:45:52 -07:00
Meng, Hengyu	71046fcd71	[XPU][CPU] Enable the native path of DeepSeek (#4086) Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>	2025-03-12 22:26:29 -07:00
Lianmin Zheng	c76040e31b	Support page size > 1 (#4356)	2025-03-12 22:22:39 -07:00
Cheng Wan	2f6bacee03	[moe] fix: correct the cache size in the last chunk (#3679) Co-authored-by: Abatom <abzhonghua@gmail.com>	2025-03-12 22:22:13 -07:00
Wen Sun	4014804157	Ensure Usage Data in Streaming Responses Aligns with vLLM’s Implementation (#3814)	2025-03-12 22:12:55 -07:00
David Carreto Fidalgo	f7f88b706c	HotFix: json serialization error when using OAI v1/batches endpoint with logprobs (#3896)	2025-03-12 22:04:29 -07:00
yiakwy-xpu-ml-framework-team	18c27131f5	[tools] add fp8 max/min constant in utils (#3959)	2025-03-12 21:44:55 -07:00
YR Chen	ccdd10c84b	Move `aiohttp` into public dependencies (#3980)	2025-03-12 21:42:57 -07:00
vikram singh shekhawat	76f6c0ebf9	Add device detection and count functions to utils. (#3962)	2025-03-12 21:41:50 -07:00
Conghui Tan	6412c5e493	Avoid duplicated request ids in batch APIs (#4026) Co-authored-by: conghuitan <conghuitan@tencent.com>	2025-03-12 21:38:17 -07:00
AniZpZ	85ef7f64e4	[FIX] fix incorrect output when enable both deepgemm and torch compile (#4359) Co-authored-by: xuyongfei.xyf <xuyongfei.xyf@antgroup.com>	2025-03-12 21:34:09 -07:00
Chen Shengzhi	f1cf6eefbe	[Fix] Check the device backend before calling empty_cache function (#4212)	2025-03-12 21:28:48 -07:00
Wang Ran (汪然)	aff79f101f	simple bugfix (#4342)	2025-03-12 21:20:18 -07:00
William	56c39a05a2	Remove the choices in --speculative-eagle-topk argument (#4329)	2025-03-12 21:19:16 -07:00
文峰	c550e52f8b	Fix scheduler proctitle suffix is None (#4326) Co-authored-by: wenfeng.wf <wenfeng.wf@alibaba-inc.com>	2025-03-12 19:29:35 -07:00
Lianmin Zheng	e35a93fa8a	Move output processing logic from scheduler.py into a separate file (#4354)	2025-03-12 16:21:49 -07:00
Lianmin Zheng	d40ee62b5d	Update nightly tests (#4352)	2025-03-12 15:36:13 -07:00
Wang Ran (汪然)	91b19949d7	typo: Update http_server.py (#4350)	2025-03-12 15:05:30 -07:00
Zhiqiang Xie	10b544ae9b	Hierarchical Caching Refactoring and Fixing TP issue (#4082)	2025-03-12 11:22:35 -07:00
Mick	01090e8ac3	model: Support Janus-pro (#3203)	2025-03-12 11:02:11 -07:00
yych0745	6f43a9b9f4	remove the unused readline dependency from the Qwen2 model implementa… (#4340)	2025-03-12 02:47:27 -07:00
JieXin Liang	0540fef7a1	[Fix] fix _yarn_linear_ramp_mask with device parameter (#4337)	2025-03-12 02:28:19 -07:00
lambert0312	481f608b8e	Add INT8 support MTP NextN function (#3911)	2025-03-12 01:37:16 -07:00
Yineng Zhang	ed91561f79	upgrade sgl-kernel 0.0.4.post3 (#4334)	2025-03-12 01:36:41 -07:00
Stefan He	e0917e6bd0	Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215) Co-authored-by: Stefan He <bhe@linkedin.com>	2025-03-12 00:08:03 -07:00
lambert0312	7140ba3573	Add A800 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4323)	2025-03-11 18:25:56 -07:00
Yineng Zhang	d1da58e275	unify is_cuda and is_hip (#4321)	2025-03-11 18:12:56 -07:00
Yineng Zhang	1cf63485c1	upgrade flashinfer 0.2.3 (#4317) Co-authored-by: qingquansong <qsong@linkedin.com>	2025-03-11 15:37:17 -07:00
Mick	ff2ce0b86f	refactor: move image processors to separate files (#4229)	2025-03-11 12:35:35 -07:00
Ximingwang-09	0f2a2e3c19	Add H20 tuning configs support DeepSeek V3/R1 INT8(block-wise) (#4220) Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-03-11 12:32:33 -07:00
yigex	690e1f2371	[AMD] Fix rocm sgl-kernel missing modules error (#4311) Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com>	2025-03-11 10:35:28 -07:00
yych0745	6a02b32d07	Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287) Co-authored-by: HandH1998 <1335248067@qq.com>	2025-03-11 00:49:06 -07:00
lukec	dce303e279	linear support deepgemm (#4199) Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-11 00:38:37 -07:00

1608 Commits