Commit Graph

  • a10d530943 Fix outlines version (#2036) Lianmin Zheng 2024-11-14 12:52:40 -08:00
  • aae5434bdf Fix unit tests (#2034) Lianmin Zheng 2024-11-14 11:08:37 -08:00
  • c3eac1b010 Fix torch.compile for MoE (#2033) Lianmin Zheng 2024-11-14 01:30:24 -08:00
  • b275ce0043 Github runner instructions for AMD (#2031) HAI 2024-11-13 23:57:18 -08:00
  • 13ce3e4b5d Add download_dir ServerArgs property (#2027) Patrick Yi 2024-11-14 02:26:56 -05:00
  • df246e699d chore: open lto and optimization in release profile (#2028) Tzu Gwo 2024-11-14 15:02:39 +08:00
  • fb9fb3518b set content to empty string (#2026) chottolabs 2024-11-13 20:06:02 -05:00
  • c722d9bdc3 Fix dependency and error message for xgrammar (#2024) Lianmin Zheng 2024-11-13 14:04:25 -08:00
  • 218ab3611d Do not let invalid grammar crash the server (#2023) Lianmin Zheng 2024-11-13 11:39:16 -08:00
  • f407fcf9ef Release v0.3.5.post1 (#2022) Lianmin Zheng 2024-11-13 10:27:12 -08:00
  • 54479d6f30 Fix grammar backend for tensor parallelism (#2020) Lianmin Zheng 2024-11-13 01:49:45 -08:00
  • ba069a24d3 Fix grammar backend (#2018) Lianmin Zheng 2024-11-12 21:17:38 -08:00
  • 125b1199c5 support parallel grammar preprocessing (#1996) DarkSharpness 2024-11-13 01:45:28 +09:00
  • eff468dd5a fix test_embedding_models prompt length too long's bug (#2015) Xiaoyu Zhang 2024-11-12 23:21:16 +08:00
  • a1bd719031 fix a bug in v1_embeeding_request (#2014) Xiaoyu Zhang 2024-11-12 16:49:45 +08:00
  • 78c1d6445f Fix finish reason (#2013) Lianmin Zheng 2024-11-11 23:24:41 -08:00
  • 027e65248f support echo=true and logprobs in openai api when logprobs=1 in lm-evaluation-harness (#1998) Xiaoyu Zhang 2024-11-12 15:21:20 +08:00
  • b808a38365 Filter empty prompt in random bench serving (#2011) Ke Bao 2024-11-12 14:53:41 +08:00
  • 602ebc661d remove sglang folder in rust (#2010) Byron Hsu 2024-11-11 20:45:52 -08:00
  • 530ae1bdc8 Fix weight loading for tied word embedding when TP > 1 (#2009) Lianmin Zheng 2024-11-11 17:52:42 -08:00
  • befc6beb86 Fix a typo in io_struct.py (#2008) Lianmin Zheng 2024-11-11 16:34:10 -08:00
  • 59a5ba9be0 [Minor] Remove unused imports (#2006) Lianmin Zheng 2024-11-11 15:36:14 -08:00
  • 86c37d010a fix sglang_router not found (#2005) Byron Hsu 2024-11-11 15:20:14 -08:00
  • f18b9c7252 support internlm2-reward (#1994) RangiLyu 2024-11-12 07:09:58 +08:00
  • 3e33574374 run rust test on ubuntu instead of 1-gpu-runner (#2003) Byron Hsu 2024-11-11 14:46:08 -08:00
  • 0d94f1dd03 Bump router to 0.0.3 (#2004) Byron Hsu 2024-11-11 14:42:22 -08:00
  • e728258d34 release router from py38 to py312 (#2002) Byron Hsu 2024-11-11 14:30:25 -08:00
  • 239eafbd2e Fix rust unit test and pypi token (#2001) Byron Hsu 2024-11-11 14:18:21 -08:00
  • 9d427265fd Add Engine::encode example (#2000) James Xu 2024-11-11 16:43:35 -05:00
  • 00ffde206f setup router python binding ci (#1999) Byron Hsu 2024-11-11 12:19:32 -08:00
  • ddeb9d42de Add engine encode (#1995) James Xu 2024-11-11 14:48:17 -05:00
  • aaf0a3156e docs: add slides link in README (#1997) Yineng Zhang 2024-11-11 21:03:16 +08:00
  • f9633fa9b9 [rust] cache-aware DP - approx tree (#1934) Byron Hsu 2024-11-10 21:57:32 -08:00
  • 087ab83223 [Performance, Triton] Optimize over mask compute to tl.load in fused_moe_kernel (#1980) HAI 2024-11-10 18:54:43 -08:00
  • 8169c6f4ef Add gen-shared-prefix dataset in bench_serving (#1990) Byron Hsu 2024-11-10 16:39:56 -08:00
  • 3d043319aa [CI] Balance unit tests (#1988) Lianmin Zheng 2024-11-10 11:45:01 -08:00
  • a8aad9357d qwen2vl fix bug for #1971 #1897 (#1984) yizhang2077 2024-11-11 00:10:45 +08:00
  • 47ffe7af81 docs: add shm size for docker run (#1986) Yineng Zhang 2024-11-10 22:14:48 +08:00
  • b3523af8eb fix: update pyzmq version (#1983) Yineng Zhang 2024-11-10 21:33:23 +08:00
  • 1929c06762 Simplify prometheus metrics (#1981) Lianmin Zheng 2024-11-10 04:39:32 -08:00
  • ed53ac84b4 Specify zmq Version Requirement (#1982) Huanzhi (Hans) Mao 2024-11-10 01:32:07 -08:00
  • 520f0094e4 [CI] balance unit tests (#1977) Lianmin Zheng 2024-11-09 16:46:14 -08:00
  • 9c939a3d8b Clean up metrics code (#1972) Lianmin Zheng 2024-11-09 15:43:20 -08:00
  • 549e8b8366 [Minor] Fix a typo in test_torchao.py (#1976) Lianmin Zheng 2024-11-09 15:07:27 -08:00
  • a1f32867ca Update pr-test-rust.yml to add a "finish" step (#1975) Lianmin Zheng 2024-11-09 13:53:35 -08:00
  • 760552e068 Update README.md (#1974) Lianmin Zheng 2024-11-09 11:32:13 -08:00
  • d9aada9db1 Introducing SGLang Guru on Gurubase.io (#1745) Kursat Aktas 2024-11-09 22:29:26 +03:00
  • f11eb90fe4 Initialize model_worker_batch variable (#1973) Enrique Shockwave 2024-11-09 19:28:02 +00:00
  • 95a4ed129a Fix metrics (#1963) Yudi Xue 2024-11-08 23:21:11 -08:00
  • d1150e9a00 Updated Instructions on Profiling SGLang Infer System with AMD GPUs (#1966) leishaoSC 2024-11-08 23:19:03 -08:00
  • e3126e3c5f Update README.md's Slack invitation link (#1962) Chayenne 2024-11-08 11:46:25 -08:00
  • a509552087 [minor] Improve code style and compatibility (#1961) Lianmin Zheng 2024-11-08 02:19:41 -08:00
  • 7ef0084b0d Add sentence_transformers to CI dependency (#1958) Lianmin Zheng 2024-11-08 01:21:29 -08:00
  • f9a377f650 [Release, ROCm] release ROCm docker build for AMD MI GPUs (#1957) HAI 2024-11-08 00:14:15 -08:00
  • 4ade15dd32 Adjust reward model's score module and pooler module order for reducing computation (#1956) aqweteddy 2024-11-08 16:10:54 +08:00
  • 8dc84da084 Remove the useless to_srt_kwargs (#1955) Lianmin Zheng 2024-11-07 23:15:08 -08:00
  • f16eb15d0d Gemma2 reward model support (#1954) aqweteddy 2024-11-08 14:42:27 +08:00
  • 5bc2508b80 Monitoring documentation (#1933) Yudi Xue 2024-11-07 22:14:16 -08:00
  • a71a44f203 Update setup_github_runner.md (#1952) Lianmin Zheng 2024-11-07 19:20:47 -08:00
  • 691808d587 Add a timeout for execute-notebook.yml (#1951) Lianmin Zheng 2024-11-07 18:28:29 -08:00
  • d32fba2a4d [ENV, ROCm] update environment settings (#1939) HAI 2024-11-07 18:24:36 -08:00
  • 67c424cce3 [Performance, Triton Kernel Args] extend_attention, optimize kern args to _fwd_kernel (#1941) HAI 2024-11-07 18:24:02 -08:00
  • 1ae270c5d0 [Doc] fix docs (#1949) Lianmin Zheng 2024-11-07 18:20:41 -08:00
  • c77c1e05ba fix black in pre-commit (#1940) Chayenne 2024-11-07 15:42:47 -08:00
  • dca87ec348 [Docs] fix 404 - Contributor Guide (#1942) HAI 2024-11-07 00:50:45 -08:00
  • 4b1d7a2583 Add Rust Router Python Binding (#1891) Austin Liu 2024-11-07 10:08:30 +08:00
  • a5e0defb5a minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) Xuehai Pan 2024-11-06 21:46:04 +08:00
  • 96766101b4 [rust] refactor server and router (#1922) Byron Hsu 2024-11-06 00:02:02 -08:00
  • a146d9990e support prometheus metrics (#1853) Lzhang-hub 2024-11-06 12:42:53 +08:00
  • f5113e50ae [Doc] improve relative links and structure (#1924) Lianmin Zheng 2024-11-05 01:12:10 -08:00
  • 02755768d3 Change judge to classify & Modify make file (#1920) Chayenne 2024-11-04 23:53:44 -08:00
  • 463d56bf44 Update CODEOWNERS (#1916) Byron Hsu 2024-11-04 17:13:41 -08:00
  • 530ff541cf [router] Impl radix tree and set up CI (#1893) Byron Hsu 2024-11-04 10:56:52 -08:00
  • 3cd2809277 [Docs, ROCm] update install to cover ROCm with MI GPUs (#1915) HAI 2024-11-04 01:40:57 -08:00
  • 704f8e8ed1 Add Reward API Docs etc (#1910) Chayenne 2024-11-03 22:33:03 -08:00
  • 1853c3523b Fix regex docs (#1909) Lianmin Zheng 2024-11-03 14:18:16 -08:00
  • 65859754f1 Release v0.3.5 (#1908) Lianmin Zheng 2024-11-03 13:48:11 -08:00
  • 2ce32db6fb Let reward model take text inputs instead of message lists (#1907) Lianmin Zheng 2024-11-03 13:27:12 -08:00
  • 793b79dbe9 feat: support truss endpoint for benchmark serving (#1906) Yineng Zhang 2024-11-03 12:56:10 -08:00
  • 1363b51983 Escape backwards slash (#1902) Iñaki Arango 2024-11-03 12:27:11 -08:00
  • 0abbf289a8 Unify the model type checking (#1905) Lianmin Zheng 2024-11-03 12:25:39 -08:00
  • c17c578108 Simplify tokenizer manager (#1904) Lianmin Zheng 2024-11-03 08:38:26 -08:00
  • 916b3cdddc Allow passing dtype and max_new_tokens to HF reference script (#1903) Jani Monoses 2024-11-03 18:24:37 +02:00
  • 838dcda162 Simplify tokenizer manager (#1899) Lianmin Zheng 2024-11-03 03:52:38 -08:00
  • efbc116a0f Do not use longest prefix matching when #queue-req is large (#1896) Lianmin Zheng 2024-11-03 01:45:20 -07:00
  • 6aed0445ed turn off log (#1895) Chayenne 2024-11-03 00:19:12 -07:00
  • 908dd7f9aa Add engine api (#1894) Chayenne 2024-11-02 22:03:38 -07:00
  • f4cd804073 Fix ci and link error (#1892) Chayenne 2024-11-02 19:08:49 -07:00
  • be7986e005 Fix docs (#1890) Lianmin Zheng 2024-11-02 13:26:32 -07:00
  • 5a5f18432f Fix docs ci (#1888) Chayenne 2024-11-02 11:57:22 -07:00
  • 7b394e5f2b Fix docs (#1889) Lianmin Zheng 2024-11-02 11:46:00 -07:00
  • 3b60558dd7 Native api (#1886) Chayenne 2024-11-02 01:02:17 -07:00
  • 5a9a4f41c6 Update index.rst (#1885) Lianmin Zheng 2024-11-02 00:20:33 -07:00
  • 72e979bfb5 add native api docs (#1883) Chayenne 2024-11-02 00:17:30 -07:00
  • 146f613405 Fix incorrect context length for llama3.2-11b (#1873) Ran Chen 2024-11-02 00:04:50 -07:00
  • 660ecb731f Fix doc links (#1882) Lianmin Zheng 2024-11-01 20:42:30 -07:00
  • 2565cb0f40 Update docs and workflow (#1881) Lianmin Zheng 2024-11-01 20:29:41 -07:00
  • 066e8a4ef0 Update docs title (#1879) Lianmin Zheng 2024-11-01 20:00:41 -07:00
  • 2134f0898c Fix links in the docs (#1878) Lianmin Zheng 2024-11-01 18:21:14 -07:00
  • a54f278d44 Add a FAQ documentation (#1877) Lianmin Zheng 2024-11-01 18:16:29 -07:00