133 Commits

Author SHA1 Message Date
PGFLMG
b7cd743038 [Feat] QWen-1M context support[2/2]: Update block sparse attention backend (#5949) 2025-08-06 23:49:36 -07:00
yi wang
5963e50503 [bugfix] Fix 2 minor bugs in the hicache storage layer (#8404) 2025-07-31 05:47:14 +00:00
Jinn
ab74f8f09d Remove batches api in docs & example (#7400) 2025-06-20 19:46:31 -07:00
Ata Fatahi
1ab6be1b26 Purge VerlEngine (#7326)
Signed-off-by: Ata Fatahi <immrata@gmail.com>
2025-06-19 23:47:21 -07:00
kyle-pena-kuzco
b56de8f943 Open AI API hidden states (#6716) 2025-06-10 14:37:29 -07:00
Chao Yang
4fac524b14 update llama4 chat template and pythonic parser (#6679)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-05-30 17:01:22 -07:00
Xu Wenqing
62cac2c43a Update DeepSeek-R1-0528 function call chat template (#6765)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-05-30 00:42:57 -07:00
Xu Wenqing
f4d4f93928 Add DeepSeek-R1-0528 function call chat template (#6725)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-05-29 00:05:07 -07:00
Lifu Huang
3cf1473a09 Use monotonic clock for interval measurement (#6211)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-17 16:49:18 -07:00
Kiv Chen
5380cd7ea3 model(vlm): pixtral (#5084) 2025-05-13 00:16:10 -07:00
Lianmin Zheng
e8e18dcdcc Revert "fix some typos" (#6244) 2025-05-12 12:53:26 -07:00
applesaucethebun
d738ab52f8 fix some typos (#6209)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-13 01:42:38 +08:00
mlmz
69276f619a doc: fix the erroneous documents and example codes about Alibaba-NLP/gme-Qwen2-VL-2B-Instruct (#6199) 2025-05-11 08:22:11 -07:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 12:55:00 +08:00
XinyuanTong
9d8ec2e67e Fix and Clean up chat-template requirement for VLM (#6114)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-05-11 00:14:09 +08:00
Chang Su
170d1f218a feat: Refactor DeepSeekV3 function call (#5908) 2025-05-01 21:28:57 -07:00
Chang Su
2b06484bd1 feat: support pythonic tool call and index in tool call streaming (#5725) 2025-04-29 17:30:44 -07:00
Michael Yao
269c457e05 [Docs] Update runtime/engine/readme.md (#5737)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-25 16:39:29 -07:00
Ravi Theja
d2b8d0b8d8 Add example to use sgl engine with fastapi (#5648)
Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
2025-04-24 23:57:05 +08:00
Huapeng Zhou
57131dd955 [Feat.] Enable grafana to show metrics (#4718)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-04-21 00:43:42 -07:00
mac0ne
2b3bdc938e Correct grafana heatmap. (#5019) 2025-04-20 17:58:56 -07:00
Adarsh Shirawalmath
8b39274e34 [Feature] Prefill assistant response - add continue_final_message parameter (#4226)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-04-20 17:37:18 -07:00
XinyuanTong
e7beff8a13 fix: examples for token_in_token_out_vlm (#5193) 2025-04-11 01:38:23 -07:00
Brayden Zhong
b149b39353 [CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969) 2025-03-27 19:45:02 -07:00
Chuyue Sun
fad86a6863 Support n in OpenAI API completions (#3446)
Co-authored-by: Shan Yu <shanyu1@g.ucla.edu>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: chuyue sun <chuyue@lmsys.us-northcentral1-a.compute.internal>
2025-03-20 13:46:46 +08:00
wangyu
1ce4878d31 feat(remote_model): support variable remote backend for model loader (#3964)
Signed-off-by: wangyu <wangyu.steph@bytedance.com>
2025-03-14 00:40:44 -07:00
Qiaolin Yu
85d2365d33 Fix the output of hidden states after HTTP requests (#4269) 2025-03-13 14:54:06 -07:00
yuhui
cf721fdece Update grafana.json (#4374) 2025-03-13 01:31:33 -07:00
Chitsing KUI
959a3143fc example: add async offline inference demo (#3961)
Signed-off-by: joeshikui <joeshikui@tencent.com>
Co-authored-by: joeshikui <joeshikui@tencent.com>
2025-03-12 21:41:21 -07:00
simveit
007f8b3dc2 Added example for multimodal embedding (#4206)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-03-10 00:53:56 -07:00
Qiaolin Yu
357671e216 Add examples for server token-in-token-out (#4103)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-05 13:16:31 -08:00
Mick
583d6af71b example: add vlm to token in & out example (#3941)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-04 22:18:26 -08:00
Qiaolin Yu
4725e3f652 Add examples for returning hidden states when using the server (#4074) 2025-03-04 19:31:50 -08:00
Kebe
2415ec3896 Remove grafana dashboard's datasource uid (#4051) 2025-03-04 03:44:51 -08:00
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032) 2025-03-03 07:02:14 -08:00
Yudi Xue
a7000a7650 Update metrics documentation (#3264) 2025-03-03 05:03:58 -08:00
Chayenne
728e175fc4 Add examples to token-in-token-out for LLM (#4010) 2025-03-02 21:03:49 -08:00
Qiaolin Yu
40782f05d7 Refactor: Move return_hidden_states to the generate input (#3985)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
2025-03-01 17:51:29 -08:00
fzyzcjy
e3e0bc50a9 [Feature] SPMD for SGLang + Verl (#3852) 2025-02-28 09:53:10 -08:00
KCFindstr
bc20e93f2d [feat] Add Vertex AI compatible prediction route for /generate (#3866) 2025-02-27 19:42:15 -08:00
Qiaolin Yu
d6898dd253 Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 22:06:54 -08:00
simveit
acd1a15921 Docs: Implemented frontend docs (#3791)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 15:30:05 -08:00
Shi Shuai
e074e76b31 docs: Add offline engine launch example and documentation (#3771) 2025-02-21 11:25:52 -08:00
Shenggui Li
fb4c9c3a30 [fix] added support for vlm in offline inference (#3548) 2025-02-15 05:27:29 +08:00
Chuyue Sun
6cc309557a Add support for OpenAI API o1 model (#3363)
Co-authored-by: Shan Yu <shanyu1@g.ucla.edu>
2025-02-14 11:43:00 +08:00
Yineng Zhang
013021b6a1 refactor EAGLE 2 (#3269)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: merrymercy <lianminzheng@gmail.com>
Co-authored-by: Ying1123 <sqy1415@gmail.com>
2025-02-03 20:52:30 +08:00
Lianmin Zheng
cd493b5afc Improve metrics, logging, and importing orders (#2992) 2025-01-19 18:36:59 -08:00
Lianmin Zheng
61f42b5732 Move sgl.Runtime under sglang/lang (#2990) 2025-01-19 17:10:29 -08:00
yukavio
815dce0554 Eagle speculative decoding part 4: Add EAGLE2 worker (#2150)
Co-authored-by: kavioyu <kavioyu@tencent.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-01-02 03:22:34 -08:00
Qun Yang
37ee906f61 Add more support for intel Gaudi accelerators (#2357) 2024-12-06 01:16:33 -08:00