Author | Commit | Message | Date
Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00
applesaucethebun | d738ab52f8 | fix some typos (#6209); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-13 01:42:38 +08:00
Cheng Wan | 25c83fff6a | Performing Vocabulary Parallelism for LM Head across Attention TP Groups (#5558); Co-authored-by: liusy58 <liusy58@linux.alibaba.com> | 2025-05-11 23:36:29 -07:00
Lianmin Zheng | 01bdbf7f80 | Improve structured outputs: fix race condition, server crash, metrics and style (#6188) | 2025-05-11 08:36:16 -07:00
Adarsh Shirawalmath | 94d42b6794 | [Docs] minor Qwen3 and reasoning parser docs fix (#6032) | 2025-05-11 08:22:46 -07:00
mlmz | 69276f619a | doc: fix the erroneous documents and example codes about Alibaba-NLP/gme-Qwen2-VL-2B-Instruct (#6199) | 2025-05-11 08:22:11 -07:00
applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-11 12:55:00 +08:00
Yineng Zhang | 66fc63d6b1 | Revert "feat: add thinking_budget (#6089)" (#6181) | 2025-05-10 16:07:45 -07:00
Ximingwang-09 | 921e4a8185 | [Docs]Delete duplicate content (#6146); Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com> | 2025-05-10 15:02:15 -07:00
XinyuanTong | 9d8ec2e67e | Fix and Clean up chat-template requirement for VLM (#6114); Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> | 2025-05-11 00:14:09 +08:00
thyecust | 63484f9fd6 | feat: add thinking_budget (#6089) | 2025-05-09 08:22:09 -07:00
Zhu Chen | fa7d7fd9e5 | [Feature] Add FlashAttention3 as a backend for VisionAttention (#5764); Co-authored-by: othame <chenzhu_912@zju.edu.cn>; Co-authored-by: Mick <mickjagger19@icloud.com>; Co-authored-by: Yi Zhang <1109276519@qq.com> | 2025-05-08 10:01:19 -07:00
Baizhou Zhang | 8f508cc77f | Update doc for MLA attention backends (#6034) | 2025-05-07 18:51:05 -07:00
Baizhou Zhang | fee37d9e8d | [Doc]Fix description for dp_size argument (#6063) | 2025-05-08 00:04:22 +08:00
mlmz | a68ed76682 | feat: append more comprehensive fields in messages instead of merely role and content (#5996) | 2025-05-05 11:43:34 -07:00
Wenxuan Tan | 22da3d978f | Fix "Avoid computing lse in Ragged Prefill when there's no prefix match" (#5555) | 2025-05-05 10:32:17 -07:00
vzed | 95c231e50d | Tool Call: Add chat_template_kwargs documentation (#5679) | 2025-05-04 13:12:40 -07:00
Chayenne | 73dcf2b326 | Remove token in token out in Native API (#5967) | 2025-05-01 21:59:43 -07:00
Chang Su | 2b06484bd1 | feat: support pythonic tool call and index in tool call streaming (#5725) | 2025-04-29 17:30:44 -07:00
simveit | ae523675e5 | [Doc] Tables instead of bulletpoints for sampling doc (#5841) | 2025-04-29 13:49:39 -07:00
Qiaolin Yu | 8c0cfca87d | Feat: support cuda graph for LoRA (#4115); Co-authored-by: Beichen Ma <mabeichen12@gmail.com> | 2025-04-28 23:30:44 -07:00
Lianmin Zheng | 849c83a0c0 | [CI] test chunked prefill more (#5798) | 2025-04-28 10:57:17 -07:00
Baizhou Zhang | f48b007c1d | [Doc] Recover history of server_arguments.md (#5851) | 2025-04-28 10:48:21 -07:00
Michael Yao | 966eb90865 | [Docs] Replace lists with tables for cleanup and readability in server_arguments (#5276) | 2025-04-28 00:36:10 -07:00
Trevor Morris | 84810da4ae | Add Cutlass MLA attention backend (#5390) | 2025-04-27 20:58:53 -07:00
Lianmin Zheng | 155890e4d1 | [Minor] fix documentations (#5756) | 2025-04-26 17:48:43 -07:00
Michael Yao | b5be56944b | [Doc] Fix a link to Weilin Zhao (#5706); Signed-off-by: windsonsea <haifeng.yao@daocloud.io> | 2025-04-25 02:02:27 +08:00
Michael Yao | 92bb64bc86 | [Doc] Fix a 404 link to llama-405b (#5615); Signed-off-by: windsonsea <haifeng.yao@daocloud.io> | 2025-04-21 20:39:37 -07:00
simveit | 8de53da989 | smaller and non gated models for docs (#5378) | 2025-04-20 17:38:25 -07:00
Adarsh Shirawalmath | 8b39274e34 | [Feature] Prefill assistant response - add continue_final_message parameter (#4226); Co-authored-by: Chayenne <zhaochen20@outlook.com> | 2025-04-20 17:37:18 -07:00
Baizhou Zhang | 072b4d0398 | Add document for LoRA serving (#5521) | 2025-04-20 14:37:57 -07:00
Yineng Zhang | a6f892e5d0 | Revert "Avoid computing lse in Ragged Prefill when there's no prefix.… (#5544) | 2025-04-18 16:50:21 -07:00
Wenxuan Tan | bfa3922451 | Avoid computing lse in Ragged Prefill when there's no prefix. (#5476); Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> | 2025-04-18 01:13:57 -07:00
Michael Yao | a0fc5bc144 | [docs] Fix several consistency issues in sampling_params.md (#5373); Signed-off-by: windsonsea <haifeng.yao@daocloud.io>; Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> | 2025-04-18 10:54:40 +08:00
mlmz | f13d65a7ea | Doc: fix problems of the 'Execute Notebooks / run-all-notebooks' ci caused by the unstability of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (#5503) | 2025-04-17 11:37:43 -07:00
Baizhou Zhang | 6fb29ffd9e | Deprecate enable-flashinfer-mla and enable-flashmla (#5480) | 2025-04-17 01:43:33 -07:00
Baizhou Zhang | 4fb05583ef | Deprecate disable-mla (#5481) | 2025-04-17 01:43:14 -07:00
Didier Durand | 92d1561b70 | Update attention_backend.md: plural form (#5489) | 2025-04-17 01:42:40 -07:00
Baizhou Zhang | a42736bbb8 | Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113) | 2025-04-15 22:01:22 -07:00
mRSun15 | 3efc8e2d2a | add attention backend supporting matrix in the doc (#5211); Co-authored-by: Stefan He <hebiaobuaa@gmail.com> | 2025-04-15 17:16:34 -07:00
thyecust | 2074a2e6b6 | Fix: docs/backend/structured_outputs.ipynb (#4884) | 2025-04-12 02:18:55 -07:00
Mick | 34ef6c8135 | [VLM] Adopt fast image processor by default (#5065) | 2025-04-11 21:46:58 -07:00
Adarsh Shirawalmath | 4aa6bab0b0 | [Docs] Supported Model Docs - Major restructuring (#5290); Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> | 2025-04-11 09:17:47 -07:00
Michael Yao | fc14cca088 | Fix a 404 link in send_request.ipynb (#5280); Signed-off-by: windsonsea <haifeng.yao@daocloud.io> | 2025-04-11 01:38:45 -07:00
mlmz | 4d2e305149 | doc: nested loop code for offline engine (#5244) | 2025-04-11 01:36:30 -07:00
simveit | f8194b267c | Small improvement of native api docs (#5139); Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> | 2025-04-08 12:09:26 -07:00
mlmz | 7c5658c189 | feat: disable grammar restrictions within reasoning sections (#4984); Co-authored-by: tianhaoyu <thy@mail.ecust.edu.cn>; Co-authored-by: DarkSharpness <2040703891@qq.com> | 2025-04-07 21:46:47 -07:00
Baizhou Zhang | efbae697b3 | [Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052) | 2025-04-05 01:23:02 -07:00
simveit | 98f768d194 | update eagle-3 docs (#4796); Co-authored-by: Yifan Zhang <zhangyif21@mails.tsinghua.edu.cn> | 2025-04-03 15:24:41 -07:00
Lianmin Zheng | 74885a848b | Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048) | 2025-04-03 13:30:56 -07:00