| Author | Commit | Message | Date |
|---|---|---|---|
| Lianmin Zheng | bdf946bf81 | Support loading pre-sharded moe weights (#2716) | 2025-01-02 15:07:37 -08:00 |
| yukavio | 8c8779cd05 | [Fix] fix retract error in eagle speculative decoding (#2711) (Co-authored-by: kavioyu) | 2025-01-02 10:28:39 -08:00 |
| Mick | 1775b963db | [Fix] fix incorrectly overwriting the port specified in ServerArgs (#2714) | 2025-01-02 10:28:22 -08:00 |
| Yineng Zhang | ba5112ff69 | feat: support moe_align_block_size_triton (#2712) (Co-authored-by: WANDY666) | 2025-01-02 21:47:44 +08:00 |
| yukavio | 815dce0554 | Eagle speculative decoding part 4: Add EAGLE2 worker (#2150) (Co-authored-by: kavioyu, Lianmin Zheng) | 2025-01-02 03:22:34 -08:00 |
| Lianmin Zheng | ad20b7957e | Eagle speculative decoding part 3: small modifications to the general scheduler (#2709) (Co-authored-by: kavioyu) | 2025-01-02 02:09:08 -08:00 |
| fzyzcjy | 9183c23eca | Speed up update_weights_from_tensor (#2695) | 2025-01-02 02:05:19 -08:00 |
| kk | 148254d4db | Improve moe reduce sum kernel performance (#2705) (Co-authored-by: wunhuang) | 2025-01-02 01:11:06 -08:00 |
| kk | b6e0cfb5e1 | ROCm base image update (#2692) (Co-authored-by: wunhuang) | 2025-01-01 12:12:19 +08:00 |
| Xiaoyu Zhang | 286cad3ee3 | h200 tuning fused_moe_triton config for Mixtral 8x7B/8x22B and Qwen2 57BA14B (#2689) | 2024-12-31 23:17:36 +08:00 |
| Ying Sheng | dc7eb01f19 | [Fix] fix openai adapter (#2685) | 2024-12-31 10:48:19 +00:00 |
| Lianmin Zheng | b0524c3789 | Eagle speculative decoding part 2: Fix cuda graph + DP attention hanging (#2684) (Co-authored-by: yukavio) | 2024-12-31 02:25:05 -08:00 |
| Yineng Zhang | d49b13c6f8 | feat: use CUDA 12.4 by default (for FA3) (#2682) | 2024-12-31 15:52:09 +08:00 |
| Lianmin Zheng | f44d143949 | Support target model verification in the attention backend (#2678) (Co-authored-by: yukavio) | 2024-12-30 22:58:55 -08:00 |
| Lianmin Zheng | 339c69a243 | Improve the computation for time_per_output_token Prometheus metrics (#2674) | 2024-12-30 21:40:14 -08:00 |
| Lianmin Zheng | 21ec66e59e | Minor follow-up fixes for the logprob refactor (#2670) | 2024-12-30 05:42:08 -08:00 |
| HAI | c5210dfa38 | AMD DeepSeek_V3 FP8 Numerical fix (#2667) | 2024-12-30 21:31:12 +08:00 |
| mobicham | a29dd9501d | Add GemLite caching after each capture (#2669) | 2024-12-30 05:27:29 -08:00 |
| Lianmin Zheng | 9c6ba2484f | Refactor logprob computation to return the real logprob used in sampling (#2664) | 2024-12-30 04:51:38 -08:00 |
| Lianmin Zheng | 8c3b420eec | [Docs] clean up structured outputs docs (#2654) | 2024-12-29 23:57:16 -08:00 |
| HAI | e6f523b5f2 | fix typo in python/sglang/srt/layers/quantization/fp8.py (#2655) | 2024-12-29 23:45:02 -08:00 |
| Lianmin Zheng | 03d5fbfd44 | Release 0.4.1.post3 - upload the config.json to PyPI (#2647) | 2024-12-29 14:25:53 -08:00 |
| Shi Shuai | fad29f7f52 | CI: Fix unittest for engine input token ids and output token ids (#2646) | 2024-12-29 13:28:59 -08:00 |
| Shi Shuai | 35bdb48557 | [Feature] Get Token IDs with Engine.generate() (#2636) (Co-authored-by: Chayenne) | 2024-12-29 12:28:27 -08:00 |
| Yineng Zhang | 3ccf566b0d | chore: bump v0.4.1.post2 (#2643) | 2024-12-30 00:11:46 +08:00 |
| HandH1998 | afa0341e57 | Update Triton configs for block fp8 kernels (#2641) | 2024-12-29 22:53:47 +08:00 |
| HAI | 30828e7192 | AMD: set weights and scaling numbers properly for block FP8 (#2637) | 2024-12-29 03:23:39 -08:00 |
| Ying Sheng | e0e09fceeb | [Session] Update session control interface (#2635) | 2024-12-29 02:10:27 -08:00 |
| Lianmin Zheng | 9c05c6898e | Add llama_eagle.py (#2640) (Co-authored-by: kavioyu) | 2024-12-29 01:45:35 -08:00 |
| Lianmin Zheng | 3815b23ccb | Clean up wrapper in flashinfer backend (#2638) | 2024-12-29 00:45:57 -08:00 |
| Tanjiro | 8ee9a8501a | [Feature] Function Calling (#2544) (Co-authored-by: Haoyu Wang) | 2024-12-28 21:58:52 -08:00 |
| fzyzcjy | fd28640dc5 | Add update_weights_from_tensor (#2631) | 2024-12-28 13:30:27 -08:00 |
| Yineng Zhang | 7863e4368a | add configs for block fp8 related kernels (#2628) (Co-authored-by: HandH1998) | 2024-12-28 23:12:04 +08:00 |
| Lianmin Zheng | 855d0ba381 | [CI] Fix nightly test and raise better error message (#2626) (Co-authored-by: Sangbin) | 2024-12-27 22:16:39 -08:00 |
| Xiaoyu Zhang | 9254a33ad4 | avoid fused_moe_triton padding circular import (#2624) | 2024-12-28 14:01:35 +08:00 |
| Lianmin Zheng | 751e5ca273 | [minor] clean up docs and eos id (#2622) | 2024-12-27 11:23:46 -08:00 |
| Yang Zheng | 7a7ac6bea1 | [FIX] Update EOS from config (#2475) | 2024-12-27 10:59:56 -08:00 |
| Yineng Zhang | ef5b0ff90b | chore: bump v0.4.1.post1 (#2616) | 2024-12-28 00:11:06 +08:00 |
| HandH1998 | 6e5305158c | update sgl_moe_align_block_size usage (#2617) | 2024-12-28 00:01:13 +08:00 |
| kk | 70dc2fbe2d | Change extend attention kernel launch parameter for ROCm platform to … (#2610) (Co-authored-by: wunhuang, HAI) | 2024-12-27 00:32:17 -08:00 |
| kk | 7ca751ff7d | Fused moe triton cfg opt for rocm (#2612) (Co-authored-by: wunhuang) | 2024-12-26 23:38:22 -08:00 |
| HAI | 7722c11c1d | Regression fix to AMD/ROCm from recent change (#2606) | 2024-12-26 20:22:14 -08:00 |
| fzyzcjy | b2ed5c8ea7 | Tiny code cleanup in tokenizer_manager.py (#2586) | 2024-12-26 17:53:09 -08:00 |
| fzyzcjy | 44f011d224 | Super tiny typo fix (#2564) | 2024-12-26 08:28:01 -08:00 |
| yudian0504 | 531d6ea968 | fix: package data missing (#2521) | 2024-12-26 08:16:48 -08:00 |
| Lianmin Zheng | dc3bee4815 | Fix test and benchmark scripts (#2598) | 2024-12-26 07:56:26 -08:00 |
| fzyzcjy | 3169e66c23 | Fix duplicated handling of GetWeightsByNameReqInput (#2565) | 2024-12-26 06:49:32 -08:00 |
| Lianmin Zheng | 773951548d | Fix logprob_start_len for multi modal models (#2597) (Co-authored-by: libra, fzyzcjy, Haoyu Wang) | 2024-12-26 06:27:45 -08:00 |
| Adarsh Shirawalmath | acb340728c | [Feature] Support new parameter - EBNF in xgrammar (#2526) | 2024-12-26 05:12:41 -08:00 |
| Sangchun Ha (Patrick) | 08effbff35 | Error occurs when loading the gemma model in bitsandbytes format. (#2557) | 2024-12-26 05:10:37 -08:00 |