sglang

Author	SHA1	Message	Date
Mick	ff2ce0b86f	refactor: move image processors to separate files (#4229)	2025-03-11 12:35:35 -07:00
Ximingwang-09	0f2a2e3c19	Add H20 tuning configs support DeepSeek V3/R1 INT8(block-wise) (#4220) Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-03-11 12:32:33 -07:00
yigex	690e1f2371	[AMD] Fix rocm sgl-kernel missing modules error (#4311) Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com>	2025-03-11 10:35:28 -07:00
yych0745	6a02b32d07	Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287) Co-authored-by: HandH1998 <1335248067@qq.com>	2025-03-11 00:49:06 -07:00
lukec	dce303e279	linear support deepgemm (#4199) Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-11 00:38:37 -07:00
Yineng Zhang	4d27eb9ad1	update sgl-kernel 0.0.4.post2 (#4291)	2025-03-11 00:34:33 -07:00
lambert0312	d3ecd63204	Add A800 tuning configs support DeepSeek V3/R1 BF16 and INT8(block-wise) (#4136)	2025-03-11 00:32:25 -07:00
Yineng Zhang	e187a3d595	upgrade xgrammar 0.1.15 (#4275)	2025-03-10 14:53:24 -07:00
HandH1998	2ac189edc8	Amd test fp8 (#4261)	2025-03-10 10:12:09 -07:00
Lianmin Zheng	5a6400eec5	Test no vllm custom allreduce (#4256)	2025-03-10 10:08:25 -07:00
Lianmin Zheng	00d25a7f5e	Fix quantization and nightly tests (#4258)	2025-03-10 03:06:21 -07:00
shimin	ac69885056	fix the input_ids is None error (#4144)	2025-03-10 01:38:37 -07:00
Lianmin Zheng	aa957102a9	Simplify tests & Fix trtllm custom allreduce registration (#4252)	2025-03-10 01:24:22 -07:00
DavidChan	4455b26e76	[Bug fixed] fixed the crash when enable the dp-attention on the single card (#3958)	2025-03-10 00:50:34 -07:00
Lianmin Zheng	e8a69e4d0c	Clean up fp8 support (#4230)	2025-03-09 21:46:35 -07:00
Lianmin Zheng	fbd560028a	Auto balance CI tests (#4238)	2025-03-09 21:05:55 -07:00
Lianmin Zheng	730d084f2a	Minor style fix for sgl-kernel (#4243)	2025-03-09 20:15:13 -07:00
Lianmin Zheng	4a05bdfa86	Revert "Check eagle server args" (#4242)	2025-03-09 18:53:33 -07:00
Ying Sheng	34c8898755	Check eagle server args (#4217)	2025-03-09 01:10:43 -08:00
HandH1998	0dd6cda288	Apply sgl w8a8 fp8 kernel (#3148)	2025-03-09 00:03:32 -08:00
Baizhou Zhang	9fb48f951f	Support nextn for flashinfer mla attention backend (#4218)	2025-03-09 00:01:54 -08:00
Yineng Zhang	89ccb533ad	use sgl-kernel 0.0.4 (#4224)	2025-03-08 23:43:09 -08:00
Lianmin Zheng	1361ab9e03	Lazily import lora backends (#4225)	2025-03-08 23:39:26 -08:00
Lianmin Zheng	8abf74e3c9	Rename files in sgl kernel to avoid nested folder structure (#4213) Co-authored-by: zhyncs <me@zhyncs.com>	2025-03-08 22:54:51 -08:00
Mingshan	0fe7c13be1	Fix bench_serving flush cache not recognizing OPENAI_API_KEY (#4181) Signed-off-by: Mingshan <git@brighill.com>	2025-03-08 01:03:38 -08:00
Lianmin Zheng	08c4d764a5	lazy import attn backends (#4200)	2025-03-08 00:41:35 -08:00
Lianmin Zheng	d4017a6b63	[EAGLE] many fixes for eagle (#4195) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-07 22:12:13 -08:00
Lianmin Zheng	d052f4c8a9	New clang format for sgl kernel (#4194)	2025-03-07 20:21:08 -08:00
Ke Bao	20c8119915	Fix eagle hang issue for max_new_tokens=1 (#4185)	2025-03-07 12:11:18 -08:00
Yineng Zhang	eb61f5c9af	Revert "ROCm: Flex Attention Enablement with custom backends (#4178)" (#4186)	2025-03-07 10:27:52 -08:00
HAI	0beea4503f	ROCm: Flex Attention Enablement with custom backends (#4178) Co-authored-by: linsun12 <linsun12@amd.com>	2025-03-07 04:38:53 -08:00
Zhiqiang Xie	9376ac361d	Memory pool fix for upstream change about eagle (#4170)	2025-03-07 00:58:20 -08:00
HandH1998	c7f254468f	[Feature] DeepSeek V3/R1 INT8 Quantization (channel-wise) (#3888) Co-authored-by: yych0745 <1398089567@qq.com> Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: b0urnee <2769086541@qq.com>	2025-03-06 20:54:52 -08:00
Pan Lyu	361971b859	Add Support for Qwen2-VL Multi-modal Embedding Models (#3694)	2025-03-06 16:46:20 -08:00
HAI	13bc39c5d6	ROCm: enable trillion-parameter MoE models with INT4-FP8 single node (#4152)	2025-03-06 15:33:02 -08:00
Lianmin Zheng	9c58e68b4c	Release v0.4.3.post4 (#4140)	2025-03-06 12:50:28 -08:00
Oliver Stanley	d03b3467b8	Fix constrained generation errors by adding datasets dependency (#4142)	2025-03-06 12:07:51 -08:00
yinfan98	ab7fba0ece	Fix nightly ci Gsm8k & Fix flashinfer backend kvcache quant (#4147)	2025-03-06 11:50:07 -08:00
Lianmin Zheng	bc1534ff32	Fix a draft model accuracy bug in eagle; support step=1; return logprob in eagle (#4134) Co-authored-by: Sehoon Kim <kssteven418@gmail.com> Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-06 06:13:59 -08:00
Lzhang-hub	3a3918121f	fix bench serving bug (#4135)	2025-03-06 05:34:02 -08:00
Lianmin Zheng	98c73d71cb	[Minor] make the `__init__` function of model_runner.py shorter (#4132)	2025-03-06 01:51:12 -08:00
Lianmin Zheng	fcc2e37f69	Split the __init__ of scheduler as smaller functions. Improve the eagle tests (#4128)	2025-03-06 00:13:20 -08:00
Zhiqiang Xie	aee30630d8	Add a pointer to the real KV cache pool (#4113)	2025-03-05 21:39:07 -08:00
Lianmin Zheng	286e6540a6	Remove prefill-only-one-req (#4117)	2025-03-05 20:58:48 -08:00
Wenxuan Tan	718c391fd7	[Hoxfix] Fix incomplete token_to_kv_pool refactor (#4121)	2025-03-05 19:32:42 -08:00
Yineng Zhang	fc671f66c1	chore: bump v0.4.3.post3 (#4114)	2025-03-05 17:26:10 -08:00
Yueyang Pan	25482edb5c	Online serving benchmarks of real datasets for hierarchical KV caching (#3211) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-03-05 16:16:43 -08:00
luzengxiangcn	62b362b1f1	Debug radixcache: refactor recursive helper methods (#3029) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-03-05 16:11:42 -08:00
Jhin	70b3c6eeb1	Add update_weights_from_disk endpoint to Engine (#4102) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-05 12:25:18 -08:00
Ke Bao	ef9d3b3c2c	Fix triton kernel illegal memory issue for eagle (#4100)	2025-03-05 11:23:53 -08:00

1563 Commits