1767 Commits

Author SHA1 Message Date
Lianmin Zheng
93b77c8e8a Fix the request loggings to make it fully able to be easily replayed (#2973) 2025-01-18 21:45:00 -08:00
Lianmin Zheng
7906d1d298 Remove the unused write_with_records (#2972) 2025-01-18 20:20:23 -08:00
fzyzcjy
81d27c8e31 Refactor to add TypeBasedDispatcher to simplify dispatching (#2958) 2025-01-18 20:13:27 -08:00
Chang Su
4d4cdb3fe7 Frontend: better error message handling for FINISH_ABORT in scheduler.py (#2956) 2025-01-18 19:37:30 -08:00
Yang Zheng
2bd18e2d76 Memory pool: Minor optimize to avoid to (#2901) 2025-01-18 19:35:12 -08:00
Xiaoyu Zhang
83452dbb4a fix file name spelling mistake and useless variable in minmax-text-01-lightning_attention (#2971) 2025-01-18 18:56:13 -08:00
Mick
3d93f84a00 [Feature] Support minicpmv v2.6 (#2785)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-01-18 14:14:19 -08:00
Xiaoyu Zhang
c2f212d672 optimize MiniMax-Text-01 lightning_attn_decode triton (#2966) 2025-01-18 23:41:01 +08:00
Yineng Zhang
e2cdc8a5b5 upgrade cutlass v3.7.0 (#2967) 2025-01-18 23:37:42 +08:00
Yineng Zhang
2add697d7a feat: remove vllm get_rope (#2964) 2025-01-18 19:38:01 +08:00
lukec
6f98c586bd fix sgl-kernel setup.py (#2963) 2025-01-18 18:50:37 +08:00
Ke Bao
656dcc1a99 Remove fp8 monkey patch (#2960) 2025-01-18 15:00:29 +08:00
Zhiqiang Xie
8af7048dcf Query remaining memory dynamically for PrefillAdder (#2941) 2025-01-17 20:20:26 -08:00
bjmsong
d3024f4fc8 support e4m3 kvcache in qwen2 & add kv scaling facotr json (#2894)
Co-authored-by: bjmsong <bjmsong@126.com>
2025-01-18 11:43:22 +08:00
Zhiqiang Xie
13387e6b7a Multi-turn benchmark for hierarchical caching (#2942) 2025-01-17 16:17:24 -08:00
Wen Sun
120c3634ef Fix Llama-3.1-405B References Docs (#2944) 2025-01-17 14:46:38 -08:00
Yineng Zhang
78e5b22f29 feat: use get_rope for gemma2 (#2954) 2025-01-18 02:57:18 +08:00
Yineng Zhang
7a15e9ad36 cleanup models unused import 2/n (#2952) 2025-01-18 01:09:19 +08:00
Ke Bao
dc2ac0cbdb Update pr template (#2951) 2025-01-18 00:44:16 +08:00
Ke Bao
d47c5101f1 Add ut for qwen model (#2947) 2025-01-18 00:03:54 +08:00
Yineng Zhang
033c715b46 cleanup models dependencies 1/n (#2948) 2025-01-17 23:46:48 +08:00
Yineng Zhang
d06c1ab587 update ci install dependency (#2949) 2025-01-17 23:42:23 +08:00
Yineng Zhang
c5644cace9 docs: add Cursor for adoption and sponsorship (#2950) 2025-01-17 23:41:57 +08:00
Ke Bao
53e6552fed Fix qwen accuracy issue (#2945) 2025-01-17 22:35:26 +08:00
Yineng Zhang
5dc54f1a62 feat: remove vllm distributed (#2907)
Co-authored-by: Zhangyi <1109276519@qq.com>
2025-01-17 22:31:51 +08:00
Ke Bao
f3e9b4894b Fix sgl-kernel ci (#2938) 2025-01-17 17:26:21 +08:00
Lianmin Zheng
6a7973add8 Update release-docs.yml (#2937) 2025-01-17 00:36:40 -08:00
Chunyuan WU
63051738a9 Enable CPU device on SGLang (#2806) 2025-01-16 21:22:53 -08:00
Chang Su
a8ccacc8b8 [Frontend] Fix request length check and add option to disallow auto truncation in scheduler (#2876) 2025-01-16 14:51:19 -08:00
Lianmin Zheng
0427416b59 Fix zmq binding (#2930)
Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>
2025-01-16 14:36:07 -08:00
Chayenne
bf3edc2c60 Docs: Update pull_request_template.md (#2928) 2025-01-16 13:04:11 -08:00
Xiaoyu Zhang
78e974b2a5 [kernel] MiniMax-Text-01 decode lightning_attn with triton (#2920) 2025-01-16 12:51:38 -08:00
Lianmin Zheng
bc6915e3b9 Improve type annotation and styles (#2926) 2025-01-16 12:51:11 -08:00
saienduri
a883f0790d Update release-docker-amd.yml to run on amd docker runner. (#2927) 2025-01-16 12:42:29 -08:00
Lianmin Zheng
8b6ce52e92 Support multi-node DP attention (#2925)
Co-authored-by: dhou-xai <dhou@x.ai>
2025-01-16 11:15:00 -08:00
Ke Bao
58f3f2b840 Add CI for sgl-kernel (#2924) 2025-01-17 01:26:51 +08:00
Lianmin Zheng
93d690617e Simplify the process launch code in server.py (#2923) 2025-01-16 07:52:17 -08:00
Yun Dai
e00e5385e0 add profiling to bench_one_batch script (#2821) 2025-01-16 07:24:24 -08:00
Rin Intachuen
a2f602b541 fixed lm_head.weight error for quantized qwen (#2910) 2025-01-16 06:51:43 -08:00
Lianmin Zheng
8f2c522aba Improve benchmark scripts and error message printing (#2922) 2025-01-16 06:24:31 -08:00
Yineng Zhang
7596417732 minor: use bear for compilation database (#2919) 2025-01-16 18:39:11 +08:00
Yineng Zhang
2dc957d421 fix setup for sgl kernel (#2917) 2025-01-16 18:17:34 +08:00
Yineng Zhang
bf8d07a6f9 feat: patch linear base (#2915) 2025-01-16 18:00:03 +08:00
Xiaoyu Zhang
ab31793661 [kernel] MiniMax-Text-01 prefill lightning_attn with triton (#2911) 2025-01-16 14:18:29 +08:00
Yineng Zhang
b7f3fec13c minor: rename bench for sgl kernel (#2909) 2025-01-16 05:55:43 +08:00
Yineng Zhang
58f42b1dd8 minor: update pr test (#2908) 2025-01-16 05:51:49 +08:00
yizhang2077
767c9dec03 adapt custom allreduce for tensorrt llm (#2511) 2025-01-16 04:57:35 +08:00
Yineng Zhang
a53454c55e fix: sgl-kernel link cuda (#2906) 2025-01-16 04:53:23 +08:00
yizhang2077
6cb3974e77 optimize custom allreduce kernel (#2904) 2025-01-16 03:04:25 +08:00
Lianmin Zheng
f65c13b559 Remove normalized_prompt_logprobs from the engine to make code easier to maintain (#2902) 2025-01-15 04:54:14 -08:00