Author | Commit | Message | Date
Ying Sheng | e0e09fceeb | [Session] Update session control interface (#2635) | 2024-12-29 02:10:27 -08:00
Lianmin Zheng | 9c05c6898e | Add llama_eagle.py (#2640) (Co-authored-by: kavioyu <kavioyu@tencent.com>) | 2024-12-29 01:45:35 -08:00
Lianmin Zheng | 3815b23ccb | Clean up wrapper in flashinfer backend (#2638) | 2024-12-29 00:45:57 -08:00
Tanjiro | 8ee9a8501a | [Feature] Function Calling (#2544) (Co-authored-by: Haoyu Wang <120358163+HaoyuWang4188@users.noreply.github.com>) | 2024-12-28 21:58:52 -08:00
fzyzcjy | fd28640dc5 | Add update_weights_from_tensor (#2631) | 2024-12-28 13:30:27 -08:00
Yineng Zhang | 7863e4368a | add configs for block fp8 related kernels (#2628) (Co-authored-by: HandH1998 <1335248067@qq.com>) | 2024-12-28 23:12:04 +08:00
Lianmin Zheng | 855d0ba381 | [CI] Fix nightly test and raise better error message (#2626) (Co-authored-by: Sangbin <rkooo567@gmail.com>) | 2024-12-27 22:16:39 -08:00
Xiaoyu Zhang | 9254a33ad4 | avoid fused_moe_triton padding circular import (#2624) | 2024-12-28 14:01:35 +08:00
Lianmin Zheng | 751e5ca273 | [minor] clean up docs and eos id (#2622) | 2024-12-27 11:23:46 -08:00
Yang Zheng | 7a7ac6bea1 | [FIX] Update EOS from config (#2475) | 2024-12-27 10:59:56 -08:00
Yineng Zhang | ef5b0ff90b | chore: bump v0.4.1.post1 (#2616) | 2024-12-28 00:11:06 +08:00
HandH1998 | 6e5305158c | update sgl_moe_align_block_size usage (#2617) | 2024-12-28 00:01:13 +08:00
kk | 70dc2fbe2d | Change extend attention kernel launch parameter for ROCm platform to … (#2610) (Co-authored-by: wunhuang <wunhuang@amd.com>, HAI <hixiao@gmail.com>) | 2024-12-27 00:32:17 -08:00
kk | 7ca751ff7d | Fused moe triton cfg opt for rocm (#2612) (Co-authored-by: wunhuang <wunhuang@amd.com>) | 2024-12-26 23:38:22 -08:00
HAI | 7722c11c1d | Regression fix to AMD/ROCm from recent change (#2606) | 2024-12-26 20:22:14 -08:00
fzyzcjy | b2ed5c8ea7 | Tiny code cleanup in tokenizer_manager.py (#2586) | 2024-12-26 17:53:09 -08:00
fzyzcjy | 44f011d224 | Super tiny typo fix (#2564) | 2024-12-26 08:28:01 -08:00
yudian0504 | 531d6ea968 | fix: package data missing (#2521) | 2024-12-26 08:16:48 -08:00
Lianmin Zheng | dc3bee4815 | Fix test and benchmark scripts (#2598) | 2024-12-26 07:56:26 -08:00
fzyzcjy | 3169e66c23 | Fix duplicated handling of GetWeightsByNameReqInput (#2565) | 2024-12-26 06:49:32 -08:00
Lianmin Zheng | 773951548d | Fix logprob_start_len for multi modal models (#2597) (Co-authored-by: libra <lihu723@gmail.com>, fzyzcjy <ch271828n@outlook.com>, Wang, Haoyu <haoyu.wang@intel.com>) | 2024-12-26 06:27:45 -08:00
Adarsh Shirawalmath | acb340728c | [Feature] Support new parameter - EBNF in xgrammar (#2526) | 2024-12-26 05:12:41 -08:00
Sangchun Ha (Patrick) | 08effbff35 | Error occurs when loading the gemma model in bitsandbytes format. (#2557) | 2024-12-26 05:10:37 -08:00
Liangsheng Yin | e7ebecf82e | Fix cache hit rate when chunked prefill (#2555) | 2024-12-26 03:14:28 -08:00
Xiaoyu Zhang | 9a23c48456 | h100 tuning fused_moe_triton for qwen2 moe (#2560) | 2024-12-26 03:13:31 -08:00
Yineng Zhang | 635a042623 | docs: update deepseek v3 example (#2592) | 2024-12-26 17:43:37 +08:00
Yineng Zhang | efc52f85e2 | chore: bump v0.4.1 (#2582) | 2024-12-26 07:14:51 +08:00
Yineng Zhang | 60e2fdcf4f | use sgl-kernel moe_align_block_size (#2581) (Co-authored-by: ispobock <ispobaoke@163.com>, HandH1998 <1335248067@qq.com>) | 2024-12-26 06:29:08 +08:00
HandH1998 | 53aed988cb | Refactor MoE (#2575) (Co-authored-by: zhyncs <me@zhyncs.com>) | 2024-12-26 00:02:14 +08:00
Ying Sheng | 8a56b43175 | [Bench] Flush cache before benchmarking (#2566) | 2024-12-24 11:21:21 +08:00
Ke Bao | e835a50021 | Reorg moe code (#2563) | 2024-12-24 01:10:22 +08:00
Lianmin Zheng | 23e5e50fd5 | Fix gemlite import (#2553) | 2024-12-22 20:21:17 -08:00
Lianmin Zheng | 41b1db69b8 | A better aio rwlock that guarantees the order (#2547) | 2024-12-22 15:44:32 -08:00
Lianmin Zheng | 8496701934 | [Misc] Fix metrics, weight update lock, request logging (#2543) | 2024-12-22 06:27:22 -08:00
Lei | 19ba2b0ea9 | Add lora_paths to v1_chat_generate_request (#2529) | 2024-12-22 02:23:33 -08:00
Yineng Zhang | 8f4d04e540 | chore: bump v0.4.0.post2 (#2525) | 2024-12-21 21:16:34 +08:00
Jerry Zhang | feb2b768ba | Add integration with gemlite weight only quant (#2528) | 2024-12-21 00:25:25 +08:00
Yineng Zhang | 4b83db24f1 | fix: continue to use flashinfer 0.1.6 temporarily (#2517) | 2024-12-19 14:03:24 +08:00
Yineng Zhang | 64456cf023 | docs: update README (#2516) | 2024-12-19 13:44:02 +08:00
Yineng Zhang | bb4a922023 | feat: add llama3 eval (#2515) | 2024-12-19 13:37:09 +08:00
Lianmin Zheng | 21e9e63ad5 | Print progress bar during cuda graph capture (#2502) | 2024-12-17 06:33:46 -08:00
Lianmin Zheng | 361ea8d912 | Fix openai protocols and pass top_k, min_p (#2499) | 2024-12-17 04:14:14 -08:00
Lei | 33c5ff2845 | Add lora_path to chat completion (#2438) | 2024-12-17 03:47:49 -08:00
Hui Liu | 5ce9daea59 | ROCm support for sglang.check_env (#2426) | 2024-12-17 03:45:14 -08:00
Lianmin Zheng | bd6196163e | Small fix for the order of apply_torchao_config (#2495) | 2024-12-16 19:21:11 -08:00
Lianmin Zheng | 56198b45d9 | Add a benchmark script for in-batch prefix caching (#2494) | 2024-12-16 18:49:02 -08:00
Lianmin Zheng | ba36b5520a | Revert "Small fixes for torchao quant" (#2493) | 2024-12-16 15:04:16 -08:00
Lianmin Zheng | 7a1aecb938 | Simplify pytorch sampling kernel and logit processor (#2491) | 2024-12-16 14:11:09 -08:00
Jerry Zhang | 82699474fd | Small fixes for torchao quant (#2476) | 2024-12-16 14:08:12 -08:00
xiaobochen | b532a5fd16 | fix moe-ep accuracy issue for fp8 (#2489) | 2024-12-16 20:54:02 +08:00