Commit Graph

110 Commits

Author SHA1 Message Date
Brayden Zhong
b149b39353 [CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969) 2025-03-27 19:45:02 -07:00
Chuyue Sun
fad86a6863 Support n in OpenAI API completions (#3446)
Co-authored-by: Shan Yu <shanyu1@g.ucla.edu>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: chuyue sun <chuyue@lmsys.us-northcentral1-a.compute.internal>
2025-03-20 13:46:46 +08:00
wangyu
1ce4878d31 feat(remote_model): support variable remote backend for model loader (#3964)
Signed-off-by: wangyu <wangyu.steph@bytedance.com>
2025-03-14 00:40:44 -07:00
Qiaolin Yu
85d2365d33 Fix the output of hidden states after HTTP requests (#4269) 2025-03-13 14:54:06 -07:00
yuhui
cf721fdece Update grafana.json (#4374) 2025-03-13 01:31:33 -07:00
Chitsing KUI
959a3143fc example: add async offline inference demo (#3961)
Signed-off-by: joeshikui <joeshikui@tencent.com>
Co-authored-by: joeshikui <joeshikui@tencent.com>
2025-03-12 21:41:21 -07:00
simveit
007f8b3dc2 Added example for multimodal embedding (#4206)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-03-10 00:53:56 -07:00
Qiaolin Yu
357671e216 Add examples for server token-in-token-out (#4103)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-05 13:16:31 -08:00
Mick
583d6af71b example: add vlm to token in & out example (#3941)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-04 22:18:26 -08:00
Qiaolin Yu
4725e3f652 Add examples for returning hidden states when using the server (#4074) 2025-03-04 19:31:50 -08:00
Kebe
2415ec3896 Remove grafana dashboard's datasource uid (#4051) 2025-03-04 03:44:51 -08:00
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032) 2025-03-03 07:02:14 -08:00
Yudi Xue
a7000a7650 Update metrics documentation (#3264) 2025-03-03 05:03:58 -08:00
Chayenne
728e175fc4 Add examples to token-in-token-out for LLM (#4010) 2025-03-02 21:03:49 -08:00
Qiaolin Yu
40782f05d7 Refactor: Move return_hidden_states to the generate input (#3985)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
2025-03-01 17:51:29 -08:00
fzyzcjy
e3e0bc50a9 [Feature] SPMD for SGLang + Verl (#3852) 2025-02-28 09:53:10 -08:00
KCFindstr
bc20e93f2d [feat] Add Vertex AI compatible prediction route for /generate (#3866) 2025-02-27 19:42:15 -08:00
Qiaolin Yu
d6898dd253 Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 22:06:54 -08:00
simveit
acd1a15921 Docs: Implemented frontend docs (#3791)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 15:30:05 -08:00
Shi Shuai
e074e76b31 docs: Add offline engine launch example and documentation (#3771) 2025-02-21 11:25:52 -08:00
Shenggui Li
fb4c9c3a30 [fix] added support for vlm in offline inference (#3548) 2025-02-15 05:27:29 +08:00
Chuyue Sun
6cc309557a Add support for OpenAI API o1 model (#3363)
Co-authored-by: Shan Yu <shanyu1@g.ucla.edu>
2025-02-14 11:43:00 +08:00
Yineng Zhang
013021b6a1 refactor EAGLE 2 (#3269)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: merrymercy <lianminzheng@gmail.com>
Co-authored-by: Ying1123 <sqy1415@gmail.com>
2025-02-03 20:52:30 +08:00
Lianmin Zheng
cd493b5afc Improve metrics, logging, and importing orders (#2992) 2025-01-19 18:36:59 -08:00
Lianmin Zheng
61f42b5732 Move sgl.Runtime under sglang/lang (#2990) 2025-01-19 17:10:29 -08:00
yukavio
815dce0554 Eagle speculative decoding part 4: Add EAGLE2 worker (#2150)
Co-authored-by: kavioyu <kavioyu@tencent.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-01-02 03:22:34 -08:00
Qun Yang
37ee906f61 Add more support for intel Gaudi accelerators (#2357) 2024-12-06 01:16:33 -08:00
Chayenne
18ea841f40 Add Docs For SGLang Native Router (#2308) 2024-12-04 15:41:22 -08:00
Xuehai Pan
72f87b723b feat(pre-commit): trim unnecessary notebook metadata from git history (#2127) 2024-11-22 13:04:51 -08:00
James Xu
9d427265fd Add Engine::encode example (#2000) 2024-11-11 13:43:35 -08:00
Yudi Xue
5bc2508b80 Monitoring documentation (#1933) 2024-11-07 22:14:16 -08:00
Chayenne
c77c1e05ba fix black in pre-commit (#1940) 2024-11-08 07:42:47 +08:00
Xuehai Pan
a5e0defb5a minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) 2024-11-06 13:46:04 +00:00
Chayenne
02755768d3 Change judge to classify & Modify make file (#1920) 2024-11-04 23:53:44 -08:00
Lianmin Zheng
0abbf289a8 Unify the model type checking (#1905) 2024-11-03 12:25:39 -08:00
Byron Hsu
6fcd6d7d6d Support token ids in engine.generate (#1820) 2024-10-27 14:02:34 -07:00
Yineng Zhang
8bee20f80b Update vllm to 0.6.3 (#1711) (#1720)
Co-authored-by: Ke Bao <ISPObaoke@163.com>
2024-10-19 20:45:41 -07:00
Byron Hsu
862cd265e5 [engine] support async and streaming (#1614) 2024-10-11 15:26:25 -07:00
Byron Hsu
551a3a9d38 Provide an offline engine API (#1567) 2024-10-06 20:27:03 -07:00
Byron Hsu
2422de5193 Support min_tokens in sgl.gen (#1573) 2024-10-05 21:51:12 -07:00
FredericOdermatt
2432ad40c6 [Minifix] Remove extra space in cot example (#1569) 2024-10-04 01:16:53 -07:00
Lianmin Zheng
114bbc8651 Use ipc instead of tcp in zmq (#1566) 2024-10-04 00:45:52 -07:00
Theresa Barton
2c7d0a5b8b [Fix] Fix all the Huggingface paths (#1553) 2024-10-02 10:12:07 -07:00
Ying Sheng
0f4fb19bc8 [Fix, LoRA] fix LoRA with updates in main (#1545) 2024-09-30 10:06:08 -07:00
Lianmin Zheng
4e4459b91f Multiple minor fixes (#1530) 2024-09-28 14:43:35 -07:00
Ying Sheng
9aa6553d2a [Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525) 2024-09-27 23:32:11 -07:00
Ying Sheng
e4780cf839 [API, Feature] Support response prefill for openai API (#1490) 2024-09-22 06:46:17 -07:00
Li Bo
446ea33277 fix: creat new dict everytime for putting new frame (#1464) 2024-09-19 01:31:48 -07:00
Ying Sheng
37963394aa [Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433) 2024-09-15 12:46:04 -07:00
Lianmin Zheng
e4d68afcf0 [Minor] Many cleanup (#1357) 2024-09-09 04:14:11 -07:00