1354 Commits

Author SHA1 Message Date
HAI
b79fffdcb5 Update Install Method 2. From source (#2232) 2024-11-27 22:46:55 -08:00
HAI
cd51758fad Rename tuned MI300X config files for fused_moe_triton (#2228) 2024-11-27 21:18:51 -08:00
bjmsong
91e5dbf554 add profile in offline benchmark & update doc (#2123)
Co-authored-by: root <bjmsong@126.com>
2024-11-27 14:57:13 -08:00
Lianmin Zheng
dd5eba4c88 Remove fused_moe_grok (#2223) 2024-11-27 14:28:55 -08:00
Baoyuan Qi
a4fd2f9b46 fix typo prompts (#2224) 2024-11-27 12:07:00 -08:00
Byron Hsu
92d1253e58 Bump sglang-router to 0.0.10 for env name change (#2226) 2024-11-27 11:23:32 -08:00
kk
a9ca297d76 [3rdparty, document] Updated Documentation that for triton fused_moe kernel tuning for AMD Instinct GPUs (#2191)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
2024-11-28 02:23:10 +08:00
Lianmin Zheng
2a02185c5f Rename DP_RANK to SGLANG_DP_RANK (#2218) 2024-11-27 09:36:36 -08:00
Lianmin Zheng
fed4c6946a Release v0.3.6.post2 (#2214)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-11-27 03:35:30 -08:00
Lianmin Zheng
fb6e04a0c2 Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default (#2222) 2024-11-27 02:52:46 -08:00
Lianmin Zheng
6997e28f6e Revert "Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default" (#2221) 2024-11-27 02:02:01 -08:00
Lianmin Zheng
a0e58740a8 Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default (#2217) 2024-11-27 01:13:41 -08:00
Ying Sheng
37c8a5761f [feat] Support session control for vision language models (#2210) 2024-11-27 00:03:29 -08:00
Lianmin Zheng
c754652fcd Fix flasky tests (#2212) 2024-11-26 23:06:20 -08:00
Byron Hsu
0b46b951ae Fix rust warning (#2208) 2024-11-26 15:00:41 -08:00
Byron Hsu
2763c0a73a Bump router to 0.0.9 with better logging (#2207) 2024-11-26 13:30:28 -08:00
Yineng Zhang
de3b67b77d docs: update adoption (#2204) 2024-11-26 12:57:16 -08:00
Yudi Xue
19f33b3237 add sglang version to get_server_info (#2206) 2024-11-26 12:10:23 -08:00
Yineng Zhang
30ce5b599e minor: update check_env (#2201) 2024-11-26 18:22:55 +08:00
Yineng Zhang
bc1f6fda0d fix: add cuda-python for xgrammar (#2199) 2024-11-26 17:24:18 +08:00
Wang Ran (汪然)
867e092f82 using is not not != to test None (#2196) 2024-11-26 01:00:38 -08:00
Andrew Lyu
88c7763f53 Remove unresolved reference 'self' (#2198) 2024-11-26 00:59:58 -08:00
Wang Ran (汪然)
e4118b15b3 remove unused imports (#2195) 2024-11-26 00:59:36 -08:00
Lianmin Zheng
ba4ee37fa4 Update sampler.py to skip the success check (#2197) 2024-11-26 00:58:57 -08:00
Lianmin Zheng
ac5a0f0488 Release v0.3.6.post1 (#2189) 2024-11-25 17:31:37 -08:00
Lianmin Zheng
ea34350d88 Rename double sparsity config file (#2188) 2024-11-25 17:12:08 -08:00
Lianmin Zheng
1605ae121e [CI] Minor fix for CI (#2187) 2024-11-25 16:38:43 -08:00
Rin Intachuen
1aea19f64b Input_embeds support (#2052) 2024-11-25 16:35:04 -08:00
Byron Hsu
1f76fc6e3f [router] Rust e2e test (#2184) 2024-11-25 16:02:03 -08:00
Yixin Dong
7f076c2ce6 Update XGrammar to the latest API (#2176)
Co-authored-by: Ben Gitter <gitterbd@gmail.com>
2024-11-25 15:58:30 -08:00
Lianmin Zheng
3c5538f781 Update CI threshold (#2186) 2024-11-25 15:24:17 -08:00
HAI
10189d08dd [Performance]: Process affinity to CPU cores with multiple sockets support (#2171) 2024-11-25 14:57:32 -08:00
Lianmin Zheng
c4336b2b60 Use custom allreduce w/ torch.compile (#2185) 2024-11-25 14:55:01 -08:00
Byron Hsu
4d62bca542 [router] Replace print with logger (#2183) 2024-11-25 13:36:02 -08:00
Ying Sheng
e1e595d702 [feat] Refactor session control interface and add CI (#2173) 2024-11-25 12:32:51 -08:00
dependabot[bot]
5ada33ffa0 Bump rustls from 0.23.16 to 0.23.18 in /rust (#2182)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-26 03:22:33 +08:00
Lianmin Zheng
254fd130e2 [CI] Split test cases in CI for better load balancing (#2180) 2024-11-25 04:58:16 -08:00
Yixin Dong
538fa0ae13 [Fix] Avoid calling fill_vocab_mask for terminated requests (#2175) 2024-11-25 17:31:25 +08:00
Yineng Zhang
55842eb81a feat: fused_moe fp8 monkey patch (#2174) 2024-11-25 17:06:36 +08:00
Byron Hsu
a866b65e1d Bump rust router to 0.0.8 2024-11-24 23:17:38 -08:00
Byron Hsu
4b0a1c9365 Replace prob based with threshold based load balancing (#2170) 2024-11-24 23:17:11 -08:00
Lianmin Zheng
8e1adb8441 Allow overwrite flashinfer use_tensorcore (#2169) 2024-11-24 20:58:17 -08:00
Xiaoyu Zhang
dd44173dad [Fused moe] add tuning fused configs for qwen2 57b and mixtral 8x7b (#2167) 2024-11-25 10:37:50 +08:00
Lianmin Zheng
8912b7637f Fix docs (#2164) 2024-11-24 08:25:56 -08:00
Lianmin Zheng
be0124bda0 Rename triton_fused_moe -> fused_moe_triton (#2163) 2024-11-24 08:12:35 -08:00
Lianmin Zheng
fe5d3e818f Balance CI tests (#2162) 2024-11-24 07:38:52 -08:00
Lianmin Zheng
731146f6cb Fix mixed chunked prefill in overlap mode (#2158) 2024-11-24 07:17:37 -08:00
Yineng Zhang
fa27161380 fix: use torch.sum for compatible (#2161) 2024-11-24 22:37:04 +08:00
Lianmin Zheng
5652c56535 Update CI threshold & Improve code style (#2159) 2024-11-24 06:29:38 -08:00
Yineng Zhang
e3938b2f9c feat: update other MoE models deps (#2156) 2024-11-24 21:36:34 +08:00