sglang

Author	SHA1	Message	Date
Shenggui Li	c9565e49e7	[docker] added rdma support (#3619)	2025-02-17 15:36:16 +08:00
Shi Shuai	d03c4c25a7	[docs] Update sampling_params.md (#3617)	2025-02-16 18:52:30 -08:00
simveit	8f13377dea	Draft of updated doc for sampling params. (#3260) Co-authored-by: shuaills <shishuaicareer@gmail.com>	2025-02-16 14:28:22 -08:00
simveit	3d4a8f9bc0	Benchmark for reasoning models (#3532) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-17 03:07:30 +08:00
saienduri	7474bed883	Update to latest amd image. (#3597)	2025-02-17 00:29:47 +08:00
Wen-Heng (Jack) Chung	03caefeb51	[ROCm] Use `tl.range()` in block GEMM kernels with `num_stages` set by host. (#3535) Co-authored-by: HAI <hixiao@gmail.com>	2025-02-16 01:40:38 -08:00
Mick	bcc213df61	Model: Support Qwen 2.5 vl (#3258)	2025-02-16 00:58:53 -08:00
Jiada Li	39416e394a	fix lockfile and port_registry file permission error (#3598) Co-authored-by: jiada li <jiada@lmsys.us-northcentral1-a.compute.internal> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-15 19:14:45 -08:00
Shenggui Li	231c40d859	[docs] added favicon to sphinx html (#3564)	2025-02-15 10:21:21 -08:00
Yineng Zhang	bbc47c348f	fix apply_token_bitmask_inplace_cuda (#3594)	2025-02-15 23:55:08 +08:00
Yineng Zhang	dfce926921	fix high qps crash when enable mtp (#3592) Co-authored-by: ispobock <ispobaoke@hotmail.com>	2025-02-15 23:11:28 +08:00
Yineng Zhang	6718b10996	fix eagle unit test (#3591)	2025-02-15 23:10:48 +08:00
Mick	7711ac6ed0	doc: emphasize and notify the usage of chat_template (#3589) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-15 00:10:32 -08:00
Shi Shuai	7443197a63	[CI] Improve Docs CI Efficiency (#3587) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-14 19:57:00 -08:00
Ke Bao	862dd76c76	Support NextN (MTP) speculative decoding for DeepSeek-V3/R1 (#3582)	2025-02-15 05:28:34 +08:00
Shenggui Li	fb4c9c3a30	[fix] added support for vlm in offline inference (#3548)	2025-02-15 05:27:29 +08:00
HAI	d973c78e79	ROCm docker: triton update (#3584)	2025-02-14 10:26:32 -08:00
Jesse Lopez	6ce6eabbcc	Copy config files for MI300X to support in virtualized environments (#3505)	2025-02-15 01:23:32 +08:00
Yineng Zhang	4e23c961e8	docs: update install (#3581)	2025-02-14 18:54:50 +08:00
Xiaoyu Zhang	3efbdf68b9	fix sgl-kernel codestyle (#3563)	2025-02-14 18:05:52 +08:00
Chuyue Sun	6cc309557a	Add support for OpenAI API o1 model (#3363) Co-authored-by: Shan Yu <shanyu1@g.ucla.edu>	2025-02-14 11:43:00 +08:00
Yineng Zhang	31eec35ba8	fix doc (#3558)	2025-02-14 10:11:31 +08:00
Yineng Zhang	ac963be234	update flashinfer-python (#3557)	2025-02-14 09:52:56 +08:00
Yineng Zhang	e0b9a423c8	chore: bump v0.4.3 (#3556)	2025-02-14 09:43:14 +08:00
Yineng Zhang	e082142519	chore: bump 0.0.3.post6 sgl-kernel (#3555)	2025-02-14 08:55:15 +08:00
Yineng Zhang	70f894b810	feat: support flashinfer mla attention for deepseek v3 (#3550)	2025-02-14 08:50:14 +08:00
simveit	368de3661e	Update install docs (#3553) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-13 13:42:51 -08:00
Yineng Zhang	20de05a753	update README (#3543)	2025-02-13 17:22:11 +08:00
Xiaoyu Zhang	f076328bb7	fix moe_align_kernel shm init not sync bug (#3534)	2025-02-13 16:47:00 +08:00
Jhin	bf2a70872e	Update DeepSeek V3 Doc (#3541)	2025-02-12 23:15:37 -08:00
Wen-Heng (Jack) Chung	871a4aa1bf	[ROCm] Add ROCm tuning configs for AMD Instinct MI325X. (#3536)	2025-02-12 20:09:36 -08:00
yizhang2077	98eecbda54	integrate blockwise fp8 kernel (#3529)	2025-02-13 04:39:33 +08:00
Yineng Zhang	4430c0a513	chore: bump 0.0.3.post5 sgl-kernel (#3530)	2025-02-13 01:51:46 +08:00
yizhang2077	640363ad20	support blockwise fp8 matmul kernel (#3267)	2025-02-13 01:49:33 +08:00
Liangsheng Yin	8616357a97	Fix deepseek awq v3 (#3450)	2025-02-12 22:09:52 +08:00
Zachary Streeter	8adbc78b30	added llama and cleaned up (#3503)	2025-02-12 18:48:30 +08:00
Xiaoyu Zhang	45e3a7bc41	use sgl_per_token_group_quant_fp8 kernel (#3493)	2025-02-12 18:40:42 +08:00
Yineng Zhang	b96e92e6e6	chore: bump 0.0.3.post4 sgl-kernel (#3523)	2025-02-12 17:28:36 +08:00
Xiaoyu Zhang	693c2600e0	refine deepseek_v3 launch server doc (#3522)	2025-02-12 17:27:07 +08:00
Mick	ced680663c	doc: Support a new vLM (#3405) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-12 00:43:14 -08:00
Ata Fatahi	b8318aec48	Make NCCL NVLS configurable (#3502)	2025-02-12 03:25:06 +08:00
Yineng Zhang	2f48221033	docs: update install	2025-02-12 03:13:31 +08:00
HAI	d81ac4434e	MI30x: More graph captures for larger batch sizes and concurrencies (#3420)	2025-02-12 03:04:38 +08:00
Zachary Streeter	2491cc928d	add deepseek-v3 amd docker command (#3495)	2025-02-12 03:03:08 +08:00
Didier Durand	67c5de9286	fix router typo (#3496)	2025-02-12 03:00:57 +08:00
Didier Durand	1e2cf2b541	fix server_arguments typo (#3499)	2025-02-12 02:59:53 +08:00
Didier Durand	9490d15772	fix supported_models Qwen typo (#3498)	2025-02-12 02:59:18 +08:00
Didier Durand	eefcbdd353	fix deepseek_v3 typo (#3497)	2025-02-12 02:58:36 +08:00
Ke Bao	7e6d5fc694	Support Eagle cuda graph for Triton backend (#3500)	2025-02-12 02:27:45 +08:00
Wen-Heng (Jack) Chung	cadd5dbe6a	Tune MI300X fused MoE Triton kernel JSON config. (#3492)	2025-02-11 10:27:25 -08:00

2072 Commits