31 Commits

Author SHA1 Message Date
wangyu
9f81d741a2 fix: fix MLA for ShardedModelLoader/RemoteModelLoader (#6287)
Signed-off-by: wangyu <wangyu.steph@bytedance.com>
2025-08-28 16:10:09 -07:00
wangyu
a38c149758 feat(draft_model): support draft_model for RemoteModelLoader (#6407)
Signed-off-by: wangyu <wangyu.steph@bytedance.com>
2025-08-28 16:09:52 -07:00
PGFLMG
b7cd743038 [Feat] QWen-1M context support[2/2]: Update block sparse attention backend (#5949)
2025-08-06 23:49:36 -07:00
Ata Fatahi
1ab6be1b26 Purge VerlEngine (#7326)
Signed-off-by: Ata Fatahi <immrata@gmail.com>
2025-06-19 23:47:21 -07:00
Lifu Huang
3cf1473a09 Use monotonic clock for interval measurement (#6211)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-17 16:49:18 -07:00
XinyuanTong
9d8ec2e67e Fix and Clean up chat-template requirement for VLM (#6114)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-05-11 00:14:09 +08:00
Michael Yao
269c457e05 [Docs] Update runtime/engine/readme.md (#5737)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-25 16:39:29 -07:00
Ravi Theja
d2b8d0b8d8 Add example to use sgl engine with fastapi (#5648)
Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
2025-04-24 23:57:05 +08:00
Brayden Zhong
b149b39353 [CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969)
2025-03-27 19:45:02 -07:00
wangyu
1ce4878d31 feat(remote_model): support variable remote backend for model loader (#3964)
Signed-off-by: wangyu <wangyu.steph@bytedance.com>
2025-03-14 00:40:44 -07:00
Chitsing KUI
959a3143fc example: add async offline inference demo (#3961)
Signed-off-by: joeshikui <joeshikui@tencent.com>
Co-authored-by: joeshikui <joeshikui@tencent.com>
2025-03-12 21:41:21 -07:00
Qiaolin Yu
357671e216 Add examples for server token-in-token-out (#4103)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-05 13:16:31 -08:00
Mick
583d6af71b example: add vlm to token in & out example (#3941)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-04 22:18:26 -08:00
Qiaolin Yu
4725e3f652 Add examples for returning hidden states when using the server (#4074)
2025-03-04 19:31:50 -08:00
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032)
2025-03-03 07:02:14 -08:00
Chayenne
728e175fc4 Add examples to token-in-token-out for LLM (#4010)
2025-03-02 21:03:49 -08:00
Qiaolin Yu
40782f05d7 Refactor: Move return_hidden_states to the generate input (#3985)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
2025-03-01 17:51:29 -08:00
fzyzcjy
e3e0bc50a9 [Feature] SPMD for SGLang + Verl (#3852)
2025-02-28 09:53:10 -08:00
Qiaolin Yu
d6898dd253 Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 22:06:54 -08:00
simveit
acd1a15921 Docs: Implemented frontend docs (#3791)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 15:30:05 -08:00
Shi Shuai
e074e76b31 docs: Add offline engine launch example and documentation (#3771)
2025-02-21 11:25:52 -08:00
Shenggui Li
fb4c9c3a30 [fix] added support for vlm in offline inference (#3548)
2025-02-15 05:27:29 +08:00
Yineng Zhang
013021b6a1 refactor EAGLE 2 (#3269)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: merrymercy <lianminzheng@gmail.com>
Co-authored-by: Ying1123 <sqy1415@gmail.com>
2025-02-03 20:52:30 +08:00
Lianmin Zheng
cd493b5afc Improve metrics, logging, and importing orders (#2992)
2025-01-19 18:36:59 -08:00
yukavio
815dce0554 Eagle speculative decoding part 4: Add EAGLE2 worker (#2150)
Co-authored-by: kavioyu <kavioyu@tencent.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-01-02 03:22:34 -08:00
Qun Yang
37ee906f61 Add more support for intel Gaudi accelerators (#2357)
2024-12-06 01:16:33 -08:00
James Xu
9d427265fd Add Engine::encode example (#2000)
2024-11-11 13:43:35 -08:00
Chayenne
c77c1e05ba fix black in pre-commit (#1940)
2024-11-08 07:42:47 +08:00
Xuehai Pan
a5e0defb5a minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926)
2024-11-06 13:46:04 +00:00
Byron Hsu
6fcd6d7d6d Support token ids in engine.generate (#1820)
2024-10-27 14:02:34 -07:00
Byron Hsu
862cd265e5 [engine] support async and streaming (#1614)
2024-10-11 15:26:25 -07:00