Brayden Zhong
|
43fb95c2fa
|
[Model] Support ArcticForCausalLM architecture (Snowflake/snowflake-arctic-instruct) (#5078)
Co-authored-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>
|
2025-04-25 15:24:09 +08:00 |
|
Michael Yao
|
b5be56944b
|
[Doc] Fix a link to Weilin Zhao (#5706)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-25 02:02:27 +08:00 |
|
Michael Yao
|
7c99103f4c
|
[Doc] Fix two 404 links caused by sglang typo (#5667)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-23 23:21:55 +08:00 |
|
Baizhou Zhang
|
ce5412b62e
|
Turn on DeepGemm By Default and Update Doc (#5628)
|
2025-04-22 16:10:08 -07:00 |
|
Michael Yao
|
92bb64bc86
|
[Doc] Fix a 404 link to llama-405b (#5615)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-21 20:39:37 -07:00 |
|
Yineng Zhang
|
b9c87e781d
|
chore: bump v0.4.5.post3 (#5611)
|
2025-04-21 18:16:20 -07:00 |
|
Huapeng Zhou
|
57131dd955
|
[Feat.] Enable grafana to show metrics (#4718)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-04-21 00:43:42 -07:00 |
|
simveit
|
8de53da989
|
smaller and non gated models for docs (#5378)
|
2025-04-20 17:38:25 -07:00 |
|
Yi Zhou
|
fac17acf08
|
add function call parser for DeepSeek V3 (#5224)
|
2025-04-20 17:38:08 -07:00 |
|
Adarsh Shirawalmath
|
8b39274e34
|
[Feature] Prefill assistant response - add continue_final_message parameter (#4226)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-04-20 17:37:18 -07:00 |
|
lukec
|
417b44eba8
|
[Feat] upgrade pytorch2.6 (#5417)
|
2025-04-20 16:06:34 -07:00 |
|
Baizhou Zhang
|
072b4d0398
|
Add document for LoRA serving (#5521)
|
2025-04-20 14:37:57 -07:00 |
|
fzyzcjy
|
9c43477710
|
Super tiny fix typo (#5559)
|
2025-04-20 14:21:18 -07:00 |
|
Lianmin Zheng
|
fbdc94ba59
|
Release v0.4.5.post2 (#5582)
|
2025-04-20 14:12:37 -07:00 |
|
Baizhou Zhang
|
b54b5a96e4
|
[Doc]Add instruction for profiling with bench_one_batch (#5581)
|
2025-04-20 14:05:36 -07:00 |
|
Yineng Zhang
|
0961feefca
|
feat: use flashinfer jit package (#5547)
|
2025-04-19 00:28:39 -07:00 |
|
Yineng Zhang
|
a6f892e5d0
|
Revert "Avoid computing lse in Ragged Prefill when there's no prefix.… (#5544)
|
2025-04-18 16:50:21 -07:00 |
|
Wenxuan Tan
|
bfa3922451
|
Avoid computing lse in Ragged Prefill when there's no prefix. (#5476)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-04-18 01:13:57 -07:00 |
|
Michael Yao
|
a0fc5bc144
|
[docs] Fix several consistency issues in sampling_params.md (#5373)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-04-18 10:54:40 +08:00 |
|
mlmz
|
f13d65a7ea
|
Doc: fix problems of the 'Execute Notebooks / run-all-notebooks' ci caused by the unstability of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (#5503)
|
2025-04-17 11:37:43 -07:00 |
|
Baizhou Zhang
|
6fb29ffd9e
|
Deprecate enable-flashinfer-mla and enable-flashmla (#5480)
|
2025-04-17 01:43:33 -07:00 |
|
Baizhou Zhang
|
4fb05583ef
|
Deprecate disable-mla (#5481)
|
2025-04-17 01:43:14 -07:00 |
|
Didier Durand
|
92d1561b70
|
Update attention_backend.md: plural form (#5489)
|
2025-04-17 01:42:40 -07:00 |
|
Ying Sheng
|
d7bc19a46a
|
add multi-lora feature in README.md (#5463)
|
2025-04-16 03:25:25 -07:00 |
|
Xiaoyu Zhang
|
06a1656e02
|
[doc] Update benchmark_and_profiling.md (#5449)
|
2025-04-15 23:27:34 -07:00 |
|
Yineng Zhang
|
5b5c7237c8
|
chore: bump v0.4.5.post1 (#5445)
|
2025-04-15 23:00:07 -07:00 |
|
Baizhou Zhang
|
a42736bbb8
|
Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113)
|
2025-04-15 22:01:22 -07:00 |
|
Michael Yao
|
b64b88e738
|
[Docs] Update start/install.md (#5398)
|
2025-04-15 18:12:26 -07:00 |
|
mRSun15
|
3efc8e2d2a
|
add attention backend supporting matrix in the doc (#5211)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
|
2025-04-15 17:16:34 -07:00 |
|
Baizhou Zhang
|
f6772f1497
|
[Fix] Turn off DeepGEMM by default (#5263)
|
2025-04-14 17:45:44 -07:00 |
|
thyecust
|
2074a2e6b6
|
Fix: docs/backend/structured_outputs.ipynb (#4884)
|
2025-04-12 02:18:55 -07:00 |
|
Mick
|
34ef6c8135
|
[VLM] Adopt fast image processor by default (#5065)
|
2025-04-11 21:46:58 -07:00 |
|
Adarsh Shirawalmath
|
a0a9f6d64f
|
[Docs] Remove the older supported docs section (#5301)
|
2025-04-11 11:30:18 -07:00 |
|
Adarsh Shirawalmath
|
4aa6bab0b0
|
[Docs] Supported Model Docs - Major restructuring (#5290)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-04-11 09:17:47 -07:00 |
|
Michael Yao
|
fc14cca088
|
Fix a 404 link in send_request.ipynb (#5280)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-11 01:38:45 -07:00 |
|
mlmz
|
4d2e305149
|
doc: nested loop code for offline engine (#5244)
|
2025-04-11 01:36:30 -07:00 |
|
Kay Yan
|
f2b70afde0
|
docs: remove the use of Downward API for LWS_WORKER_INDEX (#5110)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
|
2025-04-08 20:46:11 -07:00 |
|
simveit
|
f8194b267c
|
Small improvement of native api docs (#5139)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-04-08 12:09:26 -07:00 |
|
mlmz
|
7c5658c189
|
feat: disable grammar restrictions within reasoning sections (#4984)
Co-authored-by: tianhaoyu <thy@mail.ecust.edu.cn>
Co-authored-by: DarkSharpness <2040703891@qq.com>
|
2025-04-07 21:46:47 -07:00 |
|
Ke Bao
|
ade714a67f
|
Add Llama4 user guide (#5133)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
|
2025-04-07 19:09:34 -07:00 |
|
Yineng Zhang
|
57f99608f4
|
bump v0.4.5 (#5117)
|
2025-04-07 00:35:00 -07:00 |
|
Chang Su
|
f04c80dc42
|
Add Llama4 support (#5092)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: ispobock <ispobaoke@163.com>
|
2025-04-07 00:29:36 -07:00 |
|
mlmz
|
d1bb171180
|
Fix: Reduce the number of document ci attempts to avoid long ci running (#5097)
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
|
2025-04-06 00:43:48 -07:00 |
|
Yineng Zhang
|
35e0856b90
|
bump v0.4.4.post4 (#5091)
|
2025-04-05 15:36:17 -07:00 |
|
Baizhou Zhang
|
efbae697b3
|
[Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052)
|
2025-04-05 01:23:02 -07:00 |
|
renxin
|
913e38dffa
|
Feature/revise docs ci (#5056)
|
2025-04-03 21:20:21 -07:00 |
|
simveit
|
98f768d194
|
update eagle-3 docs (#4796)
Co-authored-by: Yifan Zhang <zhangyif21@mails.tsinghua.edu.cn>
|
2025-04-03 15:24:41 -07:00 |
|
Lianmin Zheng
|
74885a848b
|
Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048)
|
2025-04-03 13:30:56 -07:00 |
|
Baizhou Zhang
|
e8999b13b7
|
Replace enable_flashinfer_mla argument with attention_backend (#5005)
|
2025-04-03 02:53:58 -07:00 |
|
renxin
|
cccfc10e9c
|
Feature/revise docs ci (#5009)
|
2025-04-02 20:08:56 -07:00 |
|