sglang

Author	SHA1	Message	Date
yiakwy-xpu-ml-framework-team	10bfce71b3	fix moe align blocks benchmark (#3003)	2025-01-20 19:33:29 +08:00
Xiaoyu Zhang	83452dbb4a	fix file name spelling mistake and useless variable in minmax-text-01-lightning_attention (#2971)	2025-01-18 18:56:13 -08:00
Xiaoyu Zhang	c2f212d672	optimize MiniMax-Text-01 lightning_attn_decode triton (#2966)	2025-01-18 23:41:01 +08:00
Zhiqiang Xie	13387e6b7a	Multi-turn benchmark for hierarchical caching (#2942)	2025-01-17 16:17:24 -08:00
Xiaoyu Zhang	78e974b2a5	[kernel] MiniMax-Text-01 decode lightning_attn with triton (#2920)	2025-01-16 12:51:38 -08:00
Lianmin Zheng	bc6915e3b9	Improve type annotation and styles (#2926)	2025-01-16 12:51:11 -08:00
Xiaoyu Zhang	ab31793661	[kernel] MiniMax-Text-01 prefill lightning_attn with triton (#2911)	2025-01-16 14:18:29 +08:00
Yineng Zhang	80002562a8	docs: update README (#2878)	2025-01-14 12:48:17 +08:00
Lianmin Zheng	46d4431889	Add a new api configure_logging to allow dumping the requests (#2875)	2025-01-13 14:24:00 -08:00
Xiaoyu Zhang	d08c77c434	Sampling penalties memory interface (#2870)	2025-01-13 23:09:00 +08:00
Yineng Zhang	41d7e5b7e6	docs: update link (#2857)	2025-01-13 18:40:48 +08:00
Lianmin Zheng	72c7776355	Fix linear.py and improve weight loading (#2851) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-01-13 01:39:14 -08:00
Ke Bao	85b2e05770	Add int8 quant kernel (#2848)	2025-01-13 13:16:58 +08:00
Yineng Zhang	197cbf9bab	docs: update README (#2841)	2025-01-11 23:11:38 +08:00
Yineng Zhang	f624901cdd	chore: bump v0.4.1.post5 (#2840)	2025-01-11 23:10:02 +08:00
sleepcoo	4f077c01b8	minor: support specifying local dataset path for gsm8k and hellaswag (#2816)	2025-01-09 22:24:42 +08:00
Lianmin Zheng	bdc1acf6cd	Misc fix for min_p_sampling, --cuda-graph-bs (#2761)	2025-01-07 02:52:53 -08:00
Xiaoyu Zhang	380930a959	add benchmark_moe_align_blocks (#2767)	2025-01-07 14:20:50 +08:00
Rodrigo Garcia	a990daff9c	Included multi-node DeepSeekv3 example (#2707)	2025-01-02 22:17:03 +08:00
Lianmin Zheng	ad20b7957e	Eagle speculative decoding part 3: small modifications to the general scheduler (#2709) Co-authored-by: kavioyu <kavioyu@tencent.com>	2025-01-02 02:09:08 -08:00
Lianmin Zheng	8c3b420eec	[Docs] clean up structured outputs docs (#2654)	2024-12-29 23:57:16 -08:00
Yineng Zhang	098d659c0e	docs: update README (#2651)	2024-12-30 13:33:29 +08:00
Lzhang-hub	76d14f8cb9	add 2*h20 node serving example for deepseek v3 (#2650) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-12-30 13:04:38 +08:00
Lianmin Zheng	03d5fbfd44	Release 0.4.1.post3 - upload the config.json to PyPI (#2647)	2024-12-29 14:25:53 -08:00
Yineng Zhang	763dd55d17	docs: update README (#2644)	2024-12-30 01:24:06 +08:00
HandH1998	afa0341e57	Update Triton configs for block fp8 kernels (#2641)	2024-12-29 22:53:47 +08:00
Yineng Zhang	7863e4368a	add configs for block fp8 related kernels (#2628) Co-authored-by: HandH1998 <1335248067@qq.com>	2024-12-28 23:12:04 +08:00
Ke Bao	8a2681e26a	Update readme (#2625)	2024-12-28 13:39:56 +08:00
Yineng Zhang	d9e6ee382b	docs: update README (#2618)	2024-12-28 00:21:53 +08:00
Lianmin Zheng	f46f394f4d	Update README.md (#2605)	2024-12-26 10:58:49 -08:00
Lianmin Zheng	773951548d	Fix logprob_start_len for multi modal models (#2597) Co-authored-by: libra <lihu723@gmail.com> Co-authored-by: fzyzcjy <ch271828n@outlook.com> Co-authored-by: Wang, Haoyu <haoyu.wang@intel.com>	2024-12-26 06:27:45 -08:00
fsygd	637de9e8ce	update readme of DeepSeek V3 (#2596)	2024-12-26 21:31:56 +08:00
Xiaoyu Zhang	9a23c48456	h100 tuning fused_moe_triton for qwen2 moe (#2560)	2024-12-26 03:13:31 -08:00
Yineng Zhang	635a042623	docs: update deepseek v3 example (#2592)	2024-12-26 17:43:37 +08:00
Yineng Zhang	75ad0a143f	docs: add deepseek v3 launch instructions (#2589)	2024-12-25 23:26:54 -08:00
Ke Bao	e835a50021	Reorg moe code (#2563)	2024-12-24 01:10:22 +08:00
Xiaoyu Zhang	7d672d277b	[kernel optimize] benchmark write_req_to_token_pool_triton and optimize kernel (#2509)	2024-12-22 02:31:02 -08:00
bjmsong	e21026690d	benchmark decoding attention kernel with cudnn (#2467) Co-authored-by: root <bjmsong@126.com>	2024-12-17 03:31:57 -08:00
Lianmin Zheng	56198b45d9	Add a benchmark script for in-batch prefix caching (#2494)	2024-12-16 18:49:02 -08:00
Xiaoyu Zhang	a0592c059f	[Benchmark] add a benchmark for hf/vllm/sglang rmsnorm (#2486)	2024-12-15 13:52:08 +08:00
bjmsong	f67723940d	decoding attention kernel benchmark (#2425) Co-authored-by: root <bjmsong@126.com>	2024-12-11 04:46:59 -08:00
Xiaoyu Zhang	3844feb9bb	Add a unittest for fused_moe (#2416)	2024-12-08 22:46:10 -08:00
Lianmin Zheng	07ec07ad1f	Improve torch compile for fused moe (#2327)	2024-12-03 01:58:25 -08:00
Lianmin Zheng	33deca81b5	Add more fused moe benchmark utilities (#2314)	2024-12-02 04:26:55 -08:00
Xiaoyu Zhang	262e370f78	[benchmark] Add fused_moe_triton benchmark and tuning tools (#2225) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: HAI <hixiao@gmail.com>	2024-11-29 13:36:45 -08:00
Henry Hyeonmok Ko	dbe1729395	Merged three native APIs into one: get_server_info (#2152)	2024-11-24 01:37:58 -08:00
Byron Hsu	cbedd1db1d	[router] cache-aware load-balancing router v1 (#2114)	2024-11-23 08:34:48 -08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Lianmin Zheng	c29b98e043	Fix json benchmark (#2043)	2024-11-15 05:33:43 -08:00
DarkSharpness	954f4e6bd6	benchmark json schema (#2030)	2024-11-15 05:06:19 -08:00

127 Commits