| Author | Commit | Message | Date |
|---|---|---|---|
| Michael Yao | 92bb64bc86 | [Doc] Fix a 404 link to llama-405b (#5615); Signed-off-by: windsonsea <haifeng.yao@daocloud.io> | 2025-04-21 20:39:37 -07:00 |
| simveit | 8de53da989 | smaller and non gated models for docs (#5378) | 2025-04-20 17:38:25 -07:00 |
| Adarsh Shirawalmath | 8b39274e34 | [Feature] Prefill assistant response - add continue_final_message parameter (#4226); Co-authored-by: Chayenne <zhaochen20@outlook.com> | 2025-04-20 17:37:18 -07:00 |
| Baizhou Zhang | 072b4d0398 | Add document for LoRA serving (#5521) | 2025-04-20 14:37:57 -07:00 |
| Yineng Zhang | a6f892e5d0 | Revert "Avoid computing lse in Ragged Prefill when there's no prefix.… (#5544) | 2025-04-18 16:50:21 -07:00 |
| Wenxuan Tan | bfa3922451 | Avoid computing lse in Ragged Prefill when there's no prefix. (#5476); Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> | 2025-04-18 01:13:57 -07:00 |
| Michael Yao | a0fc5bc144 | [docs] Fix several consistency issues in sampling_params.md (#5373); Signed-off-by: windsonsea <haifeng.yao@daocloud.io>; Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> | 2025-04-18 10:54:40 +08:00 |
| mlmz | f13d65a7ea | Doc: fix problems of the 'Execute Notebooks / run-all-notebooks' ci caused by the unstability of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (#5503) | 2025-04-17 11:37:43 -07:00 |
| Baizhou Zhang | 6fb29ffd9e | Deprecate enable-flashinfer-mla and enable-flashmla (#5480) | 2025-04-17 01:43:33 -07:00 |
| Baizhou Zhang | 4fb05583ef | Deprecate disable-mla (#5481) | 2025-04-17 01:43:14 -07:00 |
| Didier Durand | 92d1561b70 | Update attention_backend.md: plural form (#5489) | 2025-04-17 01:42:40 -07:00 |
| Baizhou Zhang | a42736bbb8 | Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113) | 2025-04-15 22:01:22 -07:00 |
| mRSun15 | 3efc8e2d2a | add attention backend supporting matrix in the doc (#5211); Co-authored-by: Stefan He <hebiaobuaa@gmail.com> | 2025-04-15 17:16:34 -07:00 |
| thyecust | 2074a2e6b6 | Fix: docs/backend/structured_outputs.ipynb (#4884) | 2025-04-12 02:18:55 -07:00 |
| Mick | 34ef6c8135 | [VLM] Adopt fast image processor by default (#5065) | 2025-04-11 21:46:58 -07:00 |
| Adarsh Shirawalmath | 4aa6bab0b0 | [Docs] Supported Model Docs - Major restructuring (#5290); Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> | 2025-04-11 09:17:47 -07:00 |
| Michael Yao | fc14cca088 | Fix a 404 link in send_request.ipynb (#5280); Signed-off-by: windsonsea <haifeng.yao@daocloud.io> | 2025-04-11 01:38:45 -07:00 |
| mlmz | 4d2e305149 | doc: nested loop code for offline engine (#5244) | 2025-04-11 01:36:30 -07:00 |
| simveit | f8194b267c | Small improvement of native api docs (#5139); Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> | 2025-04-08 12:09:26 -07:00 |
| mlmz | 7c5658c189 | feat: disable grammar restrictions within reasoning sections (#4984); Co-authored-by: tianhaoyu <thy@mail.ecust.edu.cn>; Co-authored-by: DarkSharpness <2040703891@qq.com> | 2025-04-07 21:46:47 -07:00 |
| Baizhou Zhang | efbae697b3 | [Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052) | 2025-04-05 01:23:02 -07:00 |
| simveit | 98f768d194 | update eagle-3 docs (#4796); Co-authored-by: Yifan Zhang <zhangyif21@mails.tsinghua.edu.cn> | 2025-04-03 15:24:41 -07:00 |
| Lianmin Zheng | 74885a848b | Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048) | 2025-04-03 13:30:56 -07:00 |
| Baizhou Zhang | e8999b13b7 | Replace enable_flashinfer_mla argument with attention_backend (#5005) | 2025-04-03 02:53:58 -07:00 |
| Jinyan Chen | 23c764b18a | [Feature] Support DeepEP Low Latency (#4767); Co-authored-by: sleepcoo <sleepcoo@gmail.com>; Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>; Co-authored-by: ch-wan <cwan39@gatech.edu> | 2025-04-01 09:23:25 -07:00 |
| Ke Bao | aa08aeacf4 | update torch compile doc (#4874) | 2025-03-28 19:49:30 -07:00 |
| Brayden Zhong | b149b39353 | [CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969) | 2025-03-27 19:45:02 -07:00 |
| tarinkk | 7f19e083c1 | Support (1 <= dp < tp) in the dp attention in DeepEP (#4770); Co-authored-by: Cheng Wan <cwan39@gatech.edu> | 2025-03-27 17:09:35 -07:00 |
| Jiří Suchomel | f60f293195 | [k8s] Clarified the usage of shared memory. (#4341) | 2025-03-27 08:53:19 -07:00 |
| yuhsaun-t | 199bb01d00 | Add endpoints to dump selected expert ids (#4435); Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> | 2025-03-24 21:34:19 -07:00 |
| BroadbentJim | 8796cebb2c | fix typo SGLang supports three grammar backends (#4679) | 2025-03-22 14:33:48 -07:00 |
| Adarsh Shirawalmath | fb8886037c | [Docs] Update docs for gemma3 and VLM chat templates (#4674) | 2025-03-22 08:02:19 -07:00 |
| mlmz | f6ab4ca6bc | fix: fix ipython running error for Engine due to outlines nest_asyncio (#4582); Co-authored-by: shuaills <shishuaiuoe@gmail.com> | 2025-03-21 19:11:15 -07:00 |
| Jinyan Chen | f44db16c8e | [Feature] Integrate DeepEP into SGLang (#4232); Co-authored-by: Cheng Wan <cwan39@gatech.edu>; Co-authored-by: Xuting Zhou <xutingz@nvidia.com> | 2025-03-19 08:16:31 -07:00 |
| James Liu | 9e0186f352 | [Feature] Support EAGLE 3 (#4247) | 2025-03-18 07:35:23 -07:00 |
| HandH1998 | f2ab37e500 | [Doc] add doc for quantization w8a8_fp8 or w8a8_int8 (#4495) | 2025-03-17 02:25:00 -07:00 |
| Xihuai Wang | 927ca935a7 | Constraint Decoding: Tool call with text (#4067) | 2025-03-17 01:06:46 -07:00 |
| mlmz | 452db50808 | Constraint Decoding: Set xgrammar as the default grammar backend (#4386) | 2025-03-16 18:53:43 -07:00 |
| Wang Ran (汪然) | 22c96f78a6 | typos: Update sampling_params.md (#4391); Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> | 2025-03-15 16:40:18 -07:00 |
| Chayenne | e1a5e7e47d | docs: hot fix torch compile cache (#4442) | 2025-03-14 19:05:59 -07:00 |
| yang_zcybb | ad46550d25 | [Doc] Fix typo in backend/sampling_params (#3835); Co-authored-by: yangzhice.124 <yangzhice.124@bytedance.com> | 2025-03-12 22:12:14 -07:00 |
| Jun Liu | 14344caa38 | [docs] Update outdated description about torch.compile (#3844) | 2025-03-12 22:09:38 -07:00 |
| William | 0a59a4657a | Fix the doc of FR-Spec (#4295) | 2025-03-12 21:22:50 -07:00 |
| Peter Pan | 016033188c | docs: add parameter --log-requests-level (#4335) | 2025-03-12 21:19:37 -07:00 |
| Ke Bao | 3a08f54638 | Update MTP doc (#4290) | 2025-03-11 00:46:55 -07:00 |
| Xihuai Wang | 6eec3cdce6 | docs(reasoning content): 📝 deepseek-r1 parser support qwq (#4124) | 2025-03-09 04:14:50 +00:00 |
| Michael Yao | d557319a8b | [Docs] Fix links and grammar issues (#4162) | 2025-03-06 23:14:18 -08:00 |
| Chayenne | 9854a18a51 | Hot fix small vocal eagle in docs (#4154); Co-authored-by: ybyang <ybyang7@iflytek.com> | 2025-03-06 15:13:26 -08:00 |
| Chayenne | ebddb65aed | Docs: add torch compile cache (#4151); Co-authored-by: ybyang <ybyang7@iflytek.com> | 2025-03-06 14:27:09 -08:00 |
| simveit | 8f0b63139e | Docs: improve EAGLE docs (#4038) | 2025-03-05 22:40:21 -08:00 |