Commit Graph

  • eb0c1f5373 docs: add SGLang v0.4 blog (#2341) Yineng Zhang 2024-12-05 01:24:51 +08:00
  • b2986d7aa5 Adding SGLang FP8 Utils (#2348) HAI 2024-12-04 03:01:33 -08:00
  • f8b0326934 chore: bump v0.4.0 (#2338) Yineng Zhang 2024-12-04 03:55:41 +08:00
  • 0495796517 [router] Copy license when publishing & bump version (#2339) Byron Hsu 2024-12-03 10:27:43 -08:00
  • 1228f7ca69 Fix gptq for moe layers (#2300) Lianmin Zheng 2024-12-03 07:12:33 -08:00
  • fda628d8f2 fix: resolve cmake url for Dockerfile.dev (#2335) Yineng Zhang 2024-12-03 21:22:19 +08:00
  • 07ec07ad1f Improve torch compile for fused moe (#2327) Lianmin Zheng 2024-12-03 01:58:25 -08:00
  • 83b340e371 Add missing license for router wheel (#2324) Ata Fatahi 2024-12-03 00:06:25 -08:00
  • 0639bf15d1 ROCm Container: set SGLANG_SET_CPU_AFFINITY=1 (#2328) HAI 2024-12-02 23:20:33 -08:00
  • aa47f64223 Revert "[feat] Enable chunked prefill for llava-onevision" (#2329) Ying Sheng 2024-12-02 23:11:13 -08:00
  • 3ddb1c4679 [Minor] Fix logger and style (#2325) Lianmin Zheng 2024-12-02 20:45:53 -08:00
  • 480e38a733 [feat] Enable chunked prefill for llava-onevision (#2281) Ying Sheng 2024-12-02 20:19:02 -08:00
  • 69e2d4fb66 Relax to include more AMD GPUs (#2319) HAI 2024-12-02 19:05:58 -08:00
  • 85e1a6f3aa Update model_loader deps and qqq quantization deps (#2220) (#2318) Yineng Zhang 2024-12-02 23:22:13 +08:00
  • 33deca81b5 Add more fused moe benchmark utilities (#2314) Lianmin Zheng 2024-12-02 04:26:55 -08:00
  • 18108abe5d [Minor] Fix code style (#2311) Lianmin Zheng 2024-12-02 02:27:36 -08:00
  • c54bda300a Use rocminfo instead of rocm-smi for more OS/WSL support (#2310) HAI 2024-12-02 00:15:45 -08:00
  • 3c79ad35ca [Fix] Fix the padded hash value for image tokens (#2309) Lianmin Zheng 2024-12-01 23:36:28 -08:00
  • 983bfcf386 Online weight updates from torch.distributed (#2279) Chayenne 2024-12-01 23:23:18 -08:00
  • 28bc60dcab misc: update build setup (#2306) Yineng Zhang 2024-12-02 02:03:49 +08:00
  • 7301a39b13 fix: resolve CodeQL cpp issue (#2305) Yineng Zhang 2024-12-01 23:55:19 +08:00
  • 47eb139f81 feat: use warp reduce as a simple example (#2304) Yineng Zhang 2024-12-01 22:43:50 +08:00
  • 5c18a03733 Fix logprob for completions (#2301) Lianmin Zheng 2024-12-01 05:17:05 -08:00
  • 5c91a315d7 feat: support sgl-kernel pypi (#2302) Yineng Zhang 2024-12-01 20:11:21 +08:00
  • 3dbd73d319 minor: rm unused _grouped_size_compiled_for_decode_kernels (#2299) Yineng Zhang 2024-12-01 19:24:12 +08:00
  • e9a6203dee feat: skip good first issue (#2298) Yineng Zhang 2024-12-01 19:18:57 +08:00
  • 62c516ac45 Add a simple torch native attention backend (#2241) Qun Yang 2024-12-01 19:01:25 +08:00
  • fc78640e00 minor: support flashinfer nightly (#2295) Yineng Zhang 2024-12-01 18:55:26 +08:00
  • 906d795f15 Feat: upgrade outlines & support compatibility with the old version (#2292) gobraves 2024-12-01 18:07:27 +08:00
  • 118b6af35e feat: add should_use_tensor_core (#2179) Yineng Zhang 2024-12-01 18:01:16 +08:00
  • 9449a95431 [CI] Balance CI tests (#2293) Lianmin Zheng 2024-12-01 01:47:30 -08:00
  • 5f12f0e7af Fix chunked prefill when ignore eos (#2290) Liangsheng Yin 2024-12-01 00:37:53 -08:00
  • d5b95cbb53 adapt vllm distributed module to sglang (#2244) yizhang2077 2024-12-01 15:54:52 +08:00
  • 0303ca918f [CI] Fix missing files in run_suite.py (#2288) Lianmin Zheng 2024-11-30 23:53:34 -08:00
  • 00181098dd feat: add Dockerfile for development (#2289) Yineng Zhang 2024-12-01 15:27:52 +08:00
  • 4936be8acc Revert "Revert "[FEAT] Support GGUF format"" (#2287) Lianmin Zheng 2024-11-30 22:14:48 -08:00
  • 1bfa511b95 [CI] Fix ci tests (#2284) Lianmin Zheng 2024-11-30 21:12:03 -08:00
  • f5b5f2bff9 Revert "[Fix] fix assertion error for chunked prefill when disabling cache" (#2286) Lianmin Zheng 2024-11-30 19:03:42 -08:00
  • 7e4c6dd8da Revert "[FEAT] Support GGUF format" (#2285) Lianmin Zheng 2024-11-30 19:03:26 -08:00
  • d622851dc9 [Fix] fix assertion error for chunked prefill when disabling cache (#2282) Rui Wang 2024-12-01 09:53:43 +08:00
  • 883c955489 [FEAT] Support GGUF format (#2215) Yang Zheng 2024-11-30 16:44:48 +08:00
  • 0d6a49bd7d [CI] Kill zombie processes (#2280) Lianmin Zheng 2024-11-30 00:24:30 -08:00
  • ccaf1f997c [CI] Print summary on github actions (#2274) Lianmin Zheng 2024-11-29 23:48:54 -08:00
  • 7d1485d376 Add get weights by parameter name for llama (#2266) Chayenne 2024-11-29 23:36:38 -08:00
  • 7d5d1d3d29 update weights from disk (#2265) Chayenne 2024-11-29 17:17:00 -08:00
  • b53d6cbda3 Add new contributors so they can trigger CI automatically (#2269) Lianmin Zheng 2024-11-29 16:37:52 -08:00
  • 01017d4c20 Support LoRA in Completion API (#2243) bjmsong 2024-11-30 08:13:38 +08:00
  • 94e167ea5a Fix the default chunked prefill size (#2268) Lianmin Zheng 2024-11-29 16:03:32 -08:00
  • 262e370f78 [benchmark] Add fused_moe_triton benchmark and tuning tools (#2225) Xiaoyu Zhang 2024-11-30 05:36:45 +08:00
  • 419a57e771 minor: add sgl-kernel dir (#2261) Yineng Zhang 2024-11-30 02:27:35 +08:00
  • fae4e5e99a chore: bump v0.3.6.post3 (#2259) Yineng Zhang 2024-11-30 01:41:16 +08:00
  • afe1e46586 [Minor] fix the style for multimodal models (#2257) Lianmin Zheng 2024-11-29 04:24:20 -08:00
  • f50a6cf443 Fix hash collision for multi modal models (#2256) Lianmin Zheng 2024-11-29 03:15:58 -08:00
  • fe97a2d40f Simplify tokenizer manager (#2254) Lianmin Zheng 2024-11-29 02:18:51 -08:00
  • 8b48496aaf Revert "Revert "Add simple CPU offloading support"" (#2253) Ying Sheng 2024-11-28 23:58:54 -08:00
  • 4057ea82c9 Revert "Add simple CPU offloading support" (#2252) Ying Sheng 2024-11-28 23:36:55 -08:00
  • 4f2ee48ed1 Update backend.md (#2251) Lianmin Zheng 2024-11-28 23:18:07 -08:00
  • 71ff2728a1 Update backend.md (#2250) Lianmin Zheng 2024-11-28 23:14:36 -08:00
  • b7038fec9b [fix] Fix prefix caching for multi-image/video (#2239) Ying Sheng 2024-11-28 12:08:13 -08:00
  • 65fdb28929 fix missing launch server import (#2242) Enrique Shockwave 2024-11-28 13:24:47 +00:00
  • b2ccf36d4d Fix memory leak during abort (#2238) Lianmin Zheng 2024-11-28 02:22:15 -08:00
  • d4fc1a70e3 Crash the server correctly during error (#2231) Lianmin Zheng 2024-11-28 00:22:39 -08:00
  • db674e3d24 Add OLMo2 model. (#2233) Jani Monoses 2024-11-28 10:15:20 +02:00
  • fb915bd1a2 Disable overlap scheduler for multimodal models (#2235) Lianmin Zheng 2024-11-27 23:44:33 -08:00
  • 09798b36cd Fix chunked prefill size for bench_offline_throughput (#2234) Lianmin Zheng 2024-11-27 23:37:20 -08:00
  • b79fffdcb5 Update Install Method 2. From source (#2232) HAI 2024-11-27 22:46:55 -08:00
  • cd51758fad Rename tuned MI300X config files for fused_moe_triton (#2228) HAI 2024-11-27 21:18:51 -08:00
  • 91e5dbf554 add profile in offline benchmark & update doc (#2123) bjmsong 2024-11-28 06:57:13 +08:00
  • dd5eba4c88 Remove fused_moe_grok (#2223) Lianmin Zheng 2024-11-27 14:28:55 -08:00
  • a4fd2f9b46 fix typo prompts (#2224) Baoyuan Qi 2024-11-28 04:07:00 +08:00
  • 92d1253e58 Bump sglang-router to 0.0.10 for env name change (#2226) Byron Hsu 2024-11-27 11:23:32 -08:00
  • a9ca297d76 [3rdparty, document] Updated Documentation that for triton fused_moe kernel tuning for AMD Instinct GPUs (#2191) kk 2024-11-28 02:23:10 +08:00
  • 2a02185c5f Rename DP_RANK to SGLANG_DP_RANK (#2218) Lianmin Zheng 2024-11-27 09:36:36 -08:00
  • fed4c6946a Release v0.3.6.post2 (#2214) Lianmin Zheng 2024-11-27 03:35:30 -08:00
  • fb6e04a0c2 Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default (#2222) Lianmin Zheng 2024-11-27 02:52:46 -08:00
  • 6997e28f6e Revert "Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default" (#2221) Lianmin Zheng 2024-11-27 02:02:01 -08:00
  • a0e58740a8 Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default (#2217) Lianmin Zheng 2024-11-27 01:13:41 -08:00
  • 37c8a5761f [feat] Support session control for vision language models (#2210) Ying Sheng 2024-11-27 00:03:29 -08:00
  • c754652fcd Fix flaky tests (#2212) Lianmin Zheng 2024-11-26 23:06:20 -08:00
  • 0b46b951ae Fix rust warning (#2208) Byron Hsu 2024-11-26 15:00:41 -08:00
  • 2763c0a73a Bump router to 0.0.9 with better logging (#2207) Byron Hsu 2024-11-26 13:30:28 -08:00
  • de3b67b77d docs: update adoption (#2204) Yineng Zhang 2024-11-27 04:57:16 +08:00
  • 19f33b3237 add sglang version to get_server_info (#2206) Yudi Xue 2024-11-26 12:10:23 -08:00
  • 30ce5b599e minor: update check_env (#2201) Yineng Zhang 2024-11-26 18:22:55 +08:00
  • bc1f6fda0d fix: add cuda-python for xgrammar (#2199) Yineng Zhang 2024-11-26 17:24:18 +08:00
  • 867e092f82 using is not not != to test None (#2196) Wang Ran (汪然) 2024-11-26 17:00:38 +08:00
  • 88c7763f53 Remove unresolved reference 'self' (#2198) Andrew Lyu 2024-11-26 16:59:58 +08:00
  • e4118b15b3 remove unused imports (#2195) Wang Ran (汪然) 2024-11-26 16:59:36 +08:00
  • ba4ee37fa4 Update sampler.py to skip the success check (#2197) Lianmin Zheng 2024-11-26 00:58:57 -08:00
  • ac5a0f0488 Release v0.3.6.post1 (#2189) Lianmin Zheng 2024-11-25 17:31:37 -08:00
  • ea34350d88 Rename double sparsity config file (#2188) Lianmin Zheng 2024-11-25 17:12:08 -08:00
  • 1605ae121e [CI] Minor fix for CI (#2187) Lianmin Zheng 2024-11-25 16:38:43 -08:00
  • 1aea19f64b Input_embeds support (#2052) Rin Intachuen 2024-11-25 19:35:04 -05:00
  • 1f76fc6e3f [router] Rust e2e test (#2184) Byron Hsu 2024-11-25 16:02:03 -08:00
  • 7f076c2ce6 Update XGrammar to the latest API (#2176) Yixin Dong 2024-11-25 18:58:30 -05:00
  • 3c5538f781 Update CI threshold (#2186) Lianmin Zheng 2024-11-25 15:24:17 -08:00
  • 10189d08dd [Performance]: Process affinity to CPU cores with multiple sockets support (#2171) HAI 2024-11-25 14:57:32 -08:00
  • c4336b2b60 Use custom allreduce w/ torch.compile (#2185) Lianmin Zheng 2024-11-25 14:55:01 -08:00
  • 4d62bca542 [router] Replace print with logger (#2183) Byron Hsu 2024-11-25 13:36:02 -08:00
  • e1e595d702 [feat] Refactor session control interface and add CI (#2173) Ying Sheng 2024-11-25 12:32:51 -08:00