401 Commits

Author SHA1 Message Date
Baizhou Zhang
efbae697b3 [Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052) 2025-04-05 01:23:02 -07:00
renxin
913e38dffa Feature/revise docs ci (#5056) 2025-04-03 21:20:21 -07:00
simveit
98f768d194 update eagle-3 docs (#4796)
Co-authored-by: Yifan Zhang <zhangyif21@mails.tsinghua.edu.cn>
2025-04-03 15:24:41 -07:00
Lianmin Zheng
74885a848b Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048) 2025-04-03 13:30:56 -07:00
Baizhou Zhang
e8999b13b7 Replace enable_flashinfer_mla argument with attention_backend (#5005) 2025-04-03 02:53:58 -07:00
renxin
cccfc10e9c Feature/revise docs ci (#5009) 2025-04-02 20:08:56 -07:00
Jinyan Chen
23c764b18a [Feature] Support DeepEP Low Latency (#4767)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>
Co-authored-by: ch-wan <cwan39@gatech.edu>
2025-04-01 09:23:25 -07:00
fzyzcjy
736502d4fd Tiny fix doc error (#4795) 2025-03-29 08:22:17 -07:00
Yineng Zhang
19e96e5923 bump v0.4.4.post3 (#4878) 2025-03-28 23:21:24 -07:00
Ke Bao
aa08aeacf4 update torch compile doc (#4874) 2025-03-28 19:49:30 -07:00
Brayden Zhong
b149b39353 [CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969) 2025-03-27 19:45:02 -07:00
tarinkk
7f19e083c1 Support (1 <= dp < tp) in the dp attention in DeepEP (#4770)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
2025-03-27 17:09:35 -07:00
Ke Bao
b39532587b Update doc for DeepSeek-V3-0324 (#4825) 2025-03-27 13:30:40 -07:00
Jiří Suchomel
f60f293195 [k8s] Clarified the usage of shared memory. (#4341) 2025-03-27 08:53:19 -07:00
Pan Lyu
c913ed4046 support clip embedding model (#4506) 2025-03-27 00:18:15 -07:00
Didier Durand
44f47d3ee1 Update supported_models.md: adding open-r1 Olympic Code 32B by HuggingFace (#4628) 2025-03-27 00:16:16 -07:00
Yineng Zhang
1099f6c974 bump v0.4.4.post2 (#4669) 2025-03-26 19:58:00 -07:00
fzyzcjy
15ddd84322 Add retry for flaky tests in CI (#4755) 2025-03-25 16:53:12 -07:00
yuhsaun-t
199bb01d00 Add endpoints to dump selected expert ids (#4435)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
2025-03-24 21:34:19 -07:00
Mick
1e86457c90 model: Minicpmo (#3023) 2025-03-24 20:08:40 -07:00
Ximingwang-09
22c3702e1e [Model] Support Qwen2ForSequenceClassification (#4609)
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
2025-03-24 19:13:44 -07:00
BroadbentJim
8796cebb2c fix typo SGLang supports three grammar backends (#4679) 2025-03-22 14:33:48 -07:00
Adarsh Shirawalmath
fb8886037c [Docs] Update docs for gemma3 and VLM chat templates (#4674) 2025-03-22 08:02:19 -07:00
mlmz
f6ab4ca6bc fix: fix ipython running error for Engine due to outlines nest_asyncio (#4582)
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
2025-03-21 19:11:15 -07:00
Michael Yao
c6ec70290f [docs] Add links and fix grammars in deploy_on_k8s.md (#4641)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-03-20 22:55:23 -07:00
Ke Bao
bfb03c6182 Update doc for MTP and DP attention (#4622) 2025-03-20 11:31:48 -07:00
Jinyan Chen
f44db16c8e [Feature] Integrate DeepEP into SGLang (#4232)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
Co-authored-by: Xuting Zhou <xutingz@nvidia.com>
2025-03-19 08:16:31 -07:00
James Liu
9e0186f352 [Feature] Support EAGLE 3 (#4247) 2025-03-18 07:35:23 -07:00
Albert
2d0045125f Fix the incorrect args in benchmark_and_profiling.md (#4542)
Signed-off-by: Tianyu Zhou <albert.zty@antgroup.com>
2025-03-18 00:07:06 -07:00
Lianmin Zheng
c38ca4fc8e Update readme (#4517) 2025-03-17 08:22:42 -07:00
HandH1998
f2ab37e500 [Doc] add doc for quantization w8a8_fp8 or w8a8_int8 (#4495) 2025-03-17 02:25:00 -07:00
Xihuai Wang
927ca935a7 Constraint Decoding: Tool call with text (#4067) 2025-03-17 01:06:46 -07:00
Wenbo Yang
75b656488a Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418) 2025-03-17 00:03:43 -07:00
萝卜菜
d6d21640d3 [Feature] Support Deepseek-VL2 (#2798)
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
2025-03-16 23:07:59 -07:00
mlmz
452db50808 Constraint Decoding: Set xgrammar as the default grammar backend (#4386) 2025-03-16 18:53:43 -07:00
Mick
9d02bb3e2a Urgent model support: support gemma-3-it (#4424) 2025-03-16 17:37:32 -07:00
Wang Ran (汪然)
22c96f78a6 typos: Update sampling_params.md (#4391)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-03-15 16:40:18 -07:00
江家瑋
26c372c13c docs: Add Llama 3.3 to supported models (#4453)
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
2025-03-15 16:33:43 -07:00
Chayenne
e1a5e7e47d docs: hot fix torch compile cache (#4442) 2025-03-14 19:05:59 -07:00
Zhan Lu
660305c38a [Doc] fix wrong flag in deepseek documentation (#4427) 2025-03-14 11:30:55 -07:00
Yineng Zhang
ba80c102f9 bump v0.4.4.post1 (#4402) 2025-03-13 17:53:46 -07:00
Yineng Zhang
6aaeb84872 chore: bump v0.4.4 (#4041) 2025-03-13 02:49:58 -07:00
Lianmin Zheng
45de89719c Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367) 2025-03-12 23:45:52 -07:00
Meng, Hengyu
71046fcd71 [XPU][CPU] Enable the native path of DeepSeek (#4086)
Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>
2025-03-12 22:26:29 -07:00
yang_zcybb
ad46550d25 [Doc] Fix typo in backend/sampling_params (#3835)
Co-authored-by: yangzhice.124 <yangzhice.124@bytedance.com>
2025-03-12 22:12:14 -07:00
Jun Liu
14344caa38 [docs] Update outdated description about torch.compile (#3844) 2025-03-12 22:09:38 -07:00
William
0a59a4657a Fix the doc of FR-Spec (#4295) 2025-03-12 21:22:50 -07:00
Peter Pan
016033188c docs: add parameter --log-requests-level (#4335) 2025-03-12 21:19:37 -07:00
shizhediao
2c3656f276 [Fix Doc.] Enable internal forwarding when starting the router (#4355) 2025-03-12 15:53:26 -07:00
Mick
01090e8ac3 model: Support Janus-pro (#3203) 2025-03-12 11:02:11 -07:00