sglang

Author	SHA1	Message	Date
wangyu	9f81d741a2	fix: fix MLA for ShardedModelLoader/RemoteModelLoader (#6287 ) Signed-off-by: wangyu <wangyu.steph@bytedance.com>	2025-08-28 16:10:09 -07:00
wangyu	a38c149758	feat(draft_model): support draft_model for RemoteModelLoader (#6407 ) Signed-off-by: wangyu <wangyu.steph@bytedance.com>	2025-08-28 16:09:52 -07:00
Xu Wenqing	b9683be653	Support DeepSeek-V3.1 tool call (#9446 ) Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-26 20:22:19 -07:00
Chang Su	c9dd70fbde	tool-call(dsv3): Improve deepseek-v3 chat template and tool_choice = `required` (#9525 )	2025-08-23 01:46:56 -07:00
PGFLMG	b7cd743038	[Feat] QWen-1M context support[2/2]: Update block sparse attention backend (#5949 )	2025-08-06 23:49:36 -07:00
yi wang	5963e50503	[bugfix] Fix 2 minor bugs in the hicache storage layer (#8404 )	2025-07-31 05:47:14 +00:00
Jinn	ab74f8f09d	Remove batches api in docs & example (#7400 )	2025-06-20 19:46:31 -07:00
Ata Fatahi	1ab6be1b26	Purge VerlEngine (#7326 ) Signed-off-by: Ata Fatahi <immrata@gmail.com>	2025-06-19 23:47:21 -07:00
kyle-pena-kuzco	b56de8f943	Open AI API hidden states (#6716 )	2025-06-10 14:37:29 -07:00
Chao Yang	4fac524b14	update llama4 chat template and pythonic parser (#6679 ) Co-authored-by: Chang Su <chang.s.su@oracle.com>	2025-05-30 17:01:22 -07:00
Xu Wenqing	62cac2c43a	Update DeepSeek-R1-0528 function call chat template (#6765 ) Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>	2025-05-30 00:42:57 -07:00
Xu Wenqing	f4d4f93928	Add DeepSeek-R1-0528 function call chat template (#6725 ) Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>	2025-05-29 00:05:07 -07:00
Lifu Huang	3cf1473a09	Use monotonic clock for interval measurement (#6211 ) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-17 16:49:18 -07:00
Kiv Chen	5380cd7ea3	model(vlm): pixtral (#5084 )	2025-05-13 00:16:10 -07:00
Lianmin Zheng	e8e18dcdcc	Revert "fix some typos" (#6244 )	2025-05-12 12:53:26 -07:00
applesaucethebun	d738ab52f8	fix some typos (#6209 ) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-13 01:42:38 +08:00
mlmz	69276f619a	doc: fix the erroneous documents and example codes about Alibaba-NLP/gme-Qwen2-VL-2B-Instruct (#6199 )	2025-05-11 08:22:11 -07:00
applesaucethebun	2ce8793519	Add typo checker in pre-commit (#6179 ) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 12:55:00 +08:00
XinyuanTong	9d8ec2e67e	Fix and Clean up chat-template requirement for VLM (#6114 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-05-11 00:14:09 +08:00
Chang Su	170d1f218a	feat: Refactor DeepSeekV3 function call (#5908 )	2025-05-01 21:28:57 -07:00
Chang Su	2b06484bd1	feat: support pythonic tool call and index in tool call streaming (#5725 )	2025-04-29 17:30:44 -07:00
Michael Yao	269c457e05	[Docs] Update runtime/engine/readme.md (#5737 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-25 16:39:29 -07:00
Ravi Theja	d2b8d0b8d8	Add example to use sgl engine with fastapi (#5648 ) Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>	2025-04-24 23:57:05 +08:00
Huapeng Zhou	57131dd955	[Feat.] Enable grafana to show metrics (#4718 ) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-04-21 00:43:42 -07:00
mac0ne	2b3bdc938e	Correct grafana heatmap. (#5019 )	2025-04-20 17:58:56 -07:00
Adarsh Shirawalmath	8b39274e34	[Feature] Prefill assistant response - add continue_final_message parameter (#4226 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-04-20 17:37:18 -07:00
XinyuanTong	e7beff8a13	fix: examples for token_in_token_out_vlm (#5193 )	2025-04-11 01:38:23 -07:00
Brayden Zhong	b149b39353	[CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969 )	2025-03-27 19:45:02 -07:00
Chuyue Sun	fad86a6863	Support `n` in OpenAI API completions (#3446 ) Co-authored-by: Shan Yu <shanyu1@g.ucla.edu> Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: chuyue sun <chuyue@lmsys.us-northcentral1-a.compute.internal>	2025-03-20 13:46:46 +08:00
wangyu	1ce4878d31	feat(remote_model): support variable remote backend for model loader (#3964 ) Signed-off-by: wangyu <wangyu.steph@bytedance.com>	2025-03-14 00:40:44 -07:00
Qiaolin Yu	85d2365d33	Fix the output of hidden states after HTTP requests (#4269 )	2025-03-13 14:54:06 -07:00
yuhui	cf721fdece	Update grafana.json (#4374 )	2025-03-13 01:31:33 -07:00
Chitsing KUI	959a3143fc	example: add async offline inference demo (#3961 ) Signed-off-by: joeshikui <joeshikui@tencent.com> Co-authored-by: joeshikui <joeshikui@tencent.com>	2025-03-12 21:41:21 -07:00
simveit	007f8b3dc2	Added example for multimodal embedding (#4206 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-03-10 00:53:56 -07:00
Qiaolin Yu	357671e216	Add examples for server token-in-token-out (#4103 ) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-05 13:16:31 -08:00
Mick	583d6af71b	example: add vlm to token in & out example (#3941 ) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-04 22:18:26 -08:00
Qiaolin Yu	4725e3f652	Add examples for returning hidden states when using the server (#4074 )	2025-03-04 19:31:50 -08:00
Kebe	2415ec3896	Remove grafana dashboard's datasource uid (#4051 )	2025-03-04 03:44:51 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032 )	2025-03-03 07:02:14 -08:00
Yudi Xue	a7000a7650	Update metrics documentation (#3264 )	2025-03-03 05:03:58 -08:00
Chayenne	728e175fc4	Add examples to token-in-token-out for LLM (#4010 )	2025-03-02 21:03:49 -08:00
Qiaolin Yu	40782f05d7	Refactor: Move return_hidden_states to the generate input (#3985 ) Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>	2025-03-01 17:51:29 -08:00
fzyzcjy	e3e0bc50a9	[Feature] SPMD for SGLang + Verl (#3852 )	2025-02-28 09:53:10 -08:00
KCFindstr	bc20e93f2d	[feat] Add Vertex AI compatible prediction route for /generate (#3866 )	2025-02-27 19:42:15 -08:00
Qiaolin Yu	d6898dd253	Add return hidden state in the native API (#3897 ) Co-authored-by: Beichen-Ma <mabeichen12@gmail.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-26 22:06:54 -08:00
simveit	acd1a15921	Docs: Implemented frontend docs (#3791 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-26 15:30:05 -08:00
Shi Shuai	e074e76b31	docs: Add offline engine launch example and documentation (#3771 )	2025-02-21 11:25:52 -08:00
Shenggui Li	fb4c9c3a30	[fix] added support for vlm in offline inference (#3548 )	2025-02-15 05:27:29 +08:00
Chuyue Sun	6cc309557a	Add support for OpenAI API o1 model (#3363 ) Co-authored-by: Shan Yu <shanyu1@g.ucla.edu>	2025-02-14 11:43:00 +08:00
Yineng Zhang	013021b6a1	refactor EAGLE 2 (#3269 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com> Co-authored-by: merrymercy <lianminzheng@gmail.com> Co-authored-by: Ying1123 <sqy1415@gmail.com>	2025-02-03 20:52:30 +08:00

137 Commits