455 Commits

Author SHA1 Message Date
Yineng Zhang
dcae1fb2cd chore: bump v0.4.6.post1 (#5845) 2025-04-28 12:57:08 -07:00
Lianmin Zheng
849c83a0c0 [CI] test chunked prefill more (#5798) 2025-04-28 10:57:17 -07:00
Baizhou Zhang
f48b007c1d [Doc] Recover history of server_arguments.md (#5851) 2025-04-28 10:48:21 -07:00
Michael Yao
966eb90865 [Docs] Replace lists with tables for cleanup and readability in server_arguments (#5276) 2025-04-28 00:36:10 -07:00
Trevor Morris
84810da4ae Add Cutlass MLA attention backend (#5390) 2025-04-27 20:58:53 -07:00
Huapeng Zhou
86317c09e9 [Docs] update grafana setup guide in production metrics (#5643)
Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com>
2025-04-27 15:36:33 -07:00
Baizhou Zhang
84022c0e56 Release v0.4.6 (#5795) 2025-04-27 14:07:05 -07:00
Frankey_8080
a21ef36352 support for the DeepSeek model by enabling streaming response parsing (#5592) 2025-04-26 18:59:31 -07:00
Lianmin Zheng
155890e4d1 [Minor] fix documentations (#5756) 2025-04-26 17:48:43 -07:00
Lianmin Zheng
5641a09458 Revert "[Model] Support ArcticForCausalLM architecture (Snowflake/snowflake-arctic-instruct)" (#5754) 2025-04-25 15:50:28 -07:00
Brayden Zhong
43fb95c2fa [Model] Support ArcticForCausalLM architecture (Snowflake/snowflake-arctic-instruct) (#5078)
Co-authored-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>
2025-04-25 15:24:09 +08:00
Michael Yao
b5be56944b [Doc] Fix a link to Weilin Zhao (#5706)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-25 02:02:27 +08:00
Michael Yao
7c99103f4c [Doc] Fix two 404 links caused by sglang typo (#5667)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-23 23:21:55 +08:00
Baizhou Zhang
ce5412b62e Turn on DeepGemm By Default and Update Doc (#5628) 2025-04-22 16:10:08 -07:00
Michael Yao
92bb64bc86 [Doc] Fix a 404 link to llama-405b (#5615)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-21 20:39:37 -07:00
Yineng Zhang
b9c87e781d chore: bump v0.4.5.post3 (#5611) 2025-04-21 18:16:20 -07:00
Huapeng Zhou
57131dd955 [Feat.] Enable grafana to show metrics (#4718)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-04-21 00:43:42 -07:00
simveit
8de53da989 smaller and non gated models for docs (#5378) 2025-04-20 17:38:25 -07:00
Yi Zhou
fac17acf08 add function call parser for DeepSeek V3 (#5224) 2025-04-20 17:38:08 -07:00
Adarsh Shirawalmath
8b39274e34 [Feature] Prefill assistant response - add continue_final_message parameter (#4226)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-04-20 17:37:18 -07:00
lukec
417b44eba8 [Feat] upgrade pytorch2.6 (#5417) 2025-04-20 16:06:34 -07:00
Baizhou Zhang
072b4d0398 Add document for LoRA serving (#5521) 2025-04-20 14:37:57 -07:00
fzyzcjy
9c43477710 Super tiny fix typo (#5559) 2025-04-20 14:21:18 -07:00
Lianmin Zheng
fbdc94ba59 Release v0.4.5.post2 (#5582) 2025-04-20 14:12:37 -07:00
Baizhou Zhang
b54b5a96e4 [Doc]Add instruction for profiling with bench_one_batch (#5581) 2025-04-20 14:05:36 -07:00
Yineng Zhang
0961feefca feat: use flashinfer jit package (#5547) 2025-04-19 00:28:39 -07:00
Yineng Zhang
a6f892e5d0 Revert "Avoid computing lse in Ragged Prefill when there's no prefix.… (#5544) 2025-04-18 16:50:21 -07:00
Wenxuan Tan
bfa3922451 Avoid computing lse in Ragged Prefill when there's no prefix. (#5476)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-04-18 01:13:57 -07:00
Michael Yao
a0fc5bc144 [docs] Fix several consistency issues in sampling_params.md (#5373)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-04-18 10:54:40 +08:00
mlmz
f13d65a7ea Doc: fix problems of the 'Execute Notebooks / run-all-notebooks' ci caused by the unstability of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (#5503) 2025-04-17 11:37:43 -07:00
Baizhou Zhang
6fb29ffd9e Deprecate enable-flashinfer-mla and enable-flashmla (#5480) 2025-04-17 01:43:33 -07:00
Baizhou Zhang
4fb05583ef Deprecate disable-mla (#5481) 2025-04-17 01:43:14 -07:00
Didier Durand
92d1561b70 Update attention_backend.md: plural form (#5489) 2025-04-17 01:42:40 -07:00
Ying Sheng
d7bc19a46a add multi-lora feature in README.md (#5463) 2025-04-16 03:25:25 -07:00
Xiaoyu Zhang
06a1656e02 [doc] Update benchmark_and_profiling.md (#5449) 2025-04-15 23:27:34 -07:00
Yineng Zhang
5b5c7237c8 chore: bump v0.4.5.post1 (#5445) 2025-04-15 23:00:07 -07:00
Baizhou Zhang
a42736bbb8 Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113) 2025-04-15 22:01:22 -07:00
Michael Yao
b64b88e738 [Docs] Update start/install.md (#5398) 2025-04-15 18:12:26 -07:00
mRSun15
3efc8e2d2a add attention backend supporting matrix in the doc (#5211)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
2025-04-15 17:16:34 -07:00
Baizhou Zhang
f6772f1497 [Fix] Turn off DeepGEMM by default (#5263) 2025-04-14 17:45:44 -07:00
thyecust
2074a2e6b6 Fix: docs/backend/structured_outputs.ipynb (#4884) 2025-04-12 02:18:55 -07:00
Mick
34ef6c8135 [VLM] Adopt fast image processor by default (#5065) 2025-04-11 21:46:58 -07:00
Adarsh Shirawalmath
a0a9f6d64f [Docs] Remove the older supported docs section (#5301) 2025-04-11 11:30:18 -07:00
Adarsh Shirawalmath
4aa6bab0b0 [Docs] Supported Model Docs - Major restructuring (#5290)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-04-11 09:17:47 -07:00
Michael Yao
fc14cca088 Fix a 404 link in send_request.ipynb (#5280)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-11 01:38:45 -07:00
mlmz
4d2e305149 doc: nested loop code for offline engine (#5244) 2025-04-11 01:36:30 -07:00
Kay Yan
f2b70afde0 docs: remove the use of Downward API for LWS_WORKER_INDEX (#5110)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2025-04-08 20:46:11 -07:00
simveit
f8194b267c Small improvement of native api docs (#5139)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-04-08 12:09:26 -07:00
mlmz
7c5658c189 feat: disable grammar restrictions within reasoning sections (#4984)
Co-authored-by: tianhaoyu <thy@mail.ecust.edu.cn>
Co-authored-by: DarkSharpness <2040703891@qq.com>
2025-04-07 21:46:47 -07:00
Ke Bao
ade714a67f Add Llama4 user guide (#5133)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
2025-04-07 19:09:34 -07:00