479 Commits

Author SHA1 Message Date
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 12:55:00 +08:00
Yineng Zhang
66fc63d6b1 Revert "feat: add thinking_budget (#6089)" (#6181) 2025-05-10 16:07:45 -07:00
Ximingwang-09
921e4a8185 [Docs]Delete duplicate content (#6146)
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
2025-05-10 15:02:15 -07:00
XinyuanTong
9d8ec2e67e Fix and Clean up chat-template requirement for VLM (#6114)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-05-11 00:14:09 +08:00
Yineng Zhang
678d8cc987 chore: bump v0.4.6.post3 (#6165) 2025-05-09 15:38:47 -07:00
thyecust
63484f9fd6 feat: add thinking_budget (#6089) 2025-05-09 08:22:09 -07:00
Zhu Chen
fa7d7fd9e5 [Feature] Add FlashAttention3 as a backend for VisionAttention (#5764)
Co-authored-by: othame <chenzhu_912@zju.edu.cn>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
2025-05-08 10:01:19 -07:00
Baizhou Zhang
8f508cc77f Update doc for MLA attention backends (#6034) 2025-05-07 18:51:05 -07:00
Baizhou Zhang
fee37d9e8d [Doc]Fix description for dp_size argument (#6063) 2025-05-08 00:04:22 +08:00
mlmz
a68ed76682 feat: append more comprehensive fields in messages instead of merely role and content (#5996) 2025-05-05 11:43:34 -07:00
Wenxuan Tan
22da3d978f Fix "Avoid computing lse in Ragged Prefill when there's no prefix match" (#5555) 2025-05-05 10:32:17 -07:00
Lifu Huang
1232f7e8b7 Update dev container config to support live code sync and improve docker setup guide (#6018)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-04 22:33:46 -07:00
vzed
95c231e50d Tool Call: Add chat_template_kwargs documentation (#5679) 2025-05-04 13:12:40 -07:00
Chayenne
73dcf2b326 Remove token in token out in Native API (#5967) 2025-05-01 21:59:43 -07:00
Chang Su
170d1f218a feat: Refactor DeepSeekV3 function call (#5908) 2025-05-01 21:28:57 -07:00
江家瑋
ad506a4e6b docs: Fix Qwen model typo (#5944)
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
2025-05-01 10:23:00 -07:00
Ke Bao
ebaba85655 Update ci test and doc for MTP api change (#5952) 2025-05-01 09:30:27 -07:00
Yineng Zhang
9858113c33 chore: bump v0.4.6.post2 (#5939) 2025-04-30 22:04:40 -07:00
liwenju0
8fefdd32c7 [Feature] add support kimi vl model (#5383)
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
2025-04-29 21:31:19 -07:00
Baizhou Zhang
799789afed Bump Flashinfer to 0.2.5 (#5870)
Co-authored-by: Yuhao Chen <yxckeis8@gmail.com>
2025-04-29 19:50:57 -07:00
Chang Su
2b06484bd1 feat: support pythonic tool call and index in tool call streaming (#5725) 2025-04-29 17:30:44 -07:00
simveit
ae523675e5 [Doc] Tables instead of bulletpoints for sampling doc (#5841) 2025-04-29 13:49:39 -07:00
Adarsh Shirawalmath
5c08aa4958 [Docs] Update docs for Qwen3 and Qwen3MoE (#5836) 2025-04-29 13:48:30 -07:00
Qiaolin Yu
8c0cfca87d Feat: support cuda graph for LoRA (#4115)
Co-authored-by: Beichen Ma <mabeichen12@gmail.com>
2025-04-28 23:30:44 -07:00
Yineng Zhang
dcae1fb2cd chore: bump v0.4.6.post1 (#5845) 2025-04-28 12:57:08 -07:00
Lianmin Zheng
849c83a0c0 [CI] test chunked prefill more (#5798) 2025-04-28 10:57:17 -07:00
Baizhou Zhang
f48b007c1d [Doc] Recover history of server_arguments.md (#5851) 2025-04-28 10:48:21 -07:00
Michael Yao
966eb90865 [Docs] Replace lists with tables for cleanup and readability in server_arguments (#5276) 2025-04-28 00:36:10 -07:00
Trevor Morris
84810da4ae Add Cutlass MLA attention backend (#5390) 2025-04-27 20:58:53 -07:00
Huapeng Zhou
86317c09e9 [Docs] update grafana setup guide in production metrics (#5643)
Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com>
2025-04-27 15:36:33 -07:00
Baizhou Zhang
84022c0e56 Release v0.4.6 (#5795) 2025-04-27 14:07:05 -07:00
Frankey_8080
a21ef36352 support for the DeepSeek model by enabling streaming response parsing (#5592) 2025-04-26 18:59:31 -07:00
Lianmin Zheng
155890e4d1 [Minor] fix documentations (#5756) 2025-04-26 17:48:43 -07:00
Lianmin Zheng
5641a09458 Revert "[Model] Support ArcticForCausalLM architecture (Snowflake/snowflake-arctic-instruct)" (#5754) 2025-04-25 15:50:28 -07:00
Brayden Zhong
43fb95c2fa [Model] Support ArcticForCausalLM architecture (Snowflake/snowflake-arctic-instruct) (#5078)
Co-authored-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>
2025-04-25 15:24:09 +08:00
Michael Yao
b5be56944b [Doc] Fix a link to Weilin Zhao (#5706)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-25 02:02:27 +08:00
Michael Yao
7c99103f4c [Doc] Fix two 404 links caused by sglang typo (#5667)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-23 23:21:55 +08:00
Baizhou Zhang
ce5412b62e Turn on DeepGemm By Default and Update Doc (#5628) 2025-04-22 16:10:08 -07:00
Michael Yao
92bb64bc86 [Doc] Fix a 404 link to llama-405b (#5615)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-21 20:39:37 -07:00
Yineng Zhang
b9c87e781d chore: bump v0.4.5.post3 (#5611) 2025-04-21 18:16:20 -07:00
Huapeng Zhou
57131dd955 [Feat.] Enable grafana to show metrics (#4718)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-04-21 00:43:42 -07:00
simveit
8de53da989 smaller and non gated models for docs (#5378) 2025-04-20 17:38:25 -07:00
Yi Zhou
fac17acf08 add function call parser for DeepSeek V3 (#5224) 2025-04-20 17:38:08 -07:00
Adarsh Shirawalmath
8b39274e34 [Feature] Prefill assistant response - add continue_final_message parameter (#4226)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-04-20 17:37:18 -07:00
lukec
417b44eba8 [Feat] upgrade pytorch2.6 (#5417) 2025-04-20 16:06:34 -07:00
Baizhou Zhang
072b4d0398 Add document for LoRA serving (#5521) 2025-04-20 14:37:57 -07:00
fzyzcjy
9c43477710 Super tiny fix typo (#5559) 2025-04-20 14:21:18 -07:00
Lianmin Zheng
fbdc94ba59 Release v0.4.5.post2 (#5582) 2025-04-20 14:12:37 -07:00
Baizhou Zhang
b54b5a96e4 [Doc]Add instruction for profiling with bench_one_batch (#5581) 2025-04-20 14:05:36 -07:00
Yineng Zhang
0961feefca feat: use flashinfer jit package (#5547) 2025-04-19 00:28:39 -07:00