simveit
|
f8194b267c
|
Small improvement of native api docs (#5139)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-04-08 12:09:26 -07:00 |
|
mlmz
|
7c5658c189
|
feat: disable grammar restrictions within reasoning sections (#4984)
Co-authored-by: tianhaoyu <thy@mail.ecust.edu.cn>
Co-authored-by: DarkSharpness <2040703891@qq.com>
|
2025-04-07 21:46:47 -07:00 |
|
Ke Bao
|
ade714a67f
|
Add Llama4 user guide (#5133)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
|
2025-04-07 19:09:34 -07:00 |
|
Yineng Zhang
|
57f99608f4
|
bump v0.4.5 (#5117)
|
2025-04-07 00:35:00 -07:00 |
|
Chang Su
|
f04c80dc42
|
Add Llama4 support (#5092)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: ispobock <ispobaoke@163.com>
|
2025-04-07 00:29:36 -07:00 |
|
mlmz
|
d1bb171180
|
Fix: Reduce the number of document ci attempts to avoid long ci running (#5097)
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
|
2025-04-06 00:43:48 -07:00 |
|
Yineng Zhang
|
35e0856b90
|
bump v0.4.4.post4 (#5091)
|
2025-04-05 15:36:17 -07:00 |
|
Baizhou Zhang
|
efbae697b3
|
[Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052)
|
2025-04-05 01:23:02 -07:00 |
|
renxin
|
913e38dffa
|
Feature/revise docs ci (#5056)
|
2025-04-03 21:20:21 -07:00 |
|
simveit
|
98f768d194
|
update eagle-3 docs (#4796)
Co-authored-by: Yifan Zhang <zhangyif21@mails.tsinghua.edu.cn>
|
2025-04-03 15:24:41 -07:00 |
|
Lianmin Zheng
|
74885a848b
|
Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048)
|
2025-04-03 13:30:56 -07:00 |
|
Baizhou Zhang
|
e8999b13b7
|
Replace enable_flashinfer_mla argument with attention_backend (#5005)
|
2025-04-03 02:53:58 -07:00 |
|
renxin
|
cccfc10e9c
|
Feature/revise docs ci (#5009)
|
2025-04-02 20:08:56 -07:00 |
|
Jinyan Chen
|
23c764b18a
|
[Feature] Support DeepEP Low Latency (#4767)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>
Co-authored-by: ch-wan <cwan39@gatech.edu>
|
2025-04-01 09:23:25 -07:00 |
|
fzyzcjy
|
736502d4fd
|
Tiny fix doc error (#4795)
|
2025-03-29 08:22:17 -07:00 |
|
Yineng Zhang
|
19e96e5923
|
bump v0.4.4.post3 (#4878)
|
2025-03-28 23:21:24 -07:00 |
|
Ke Bao
|
aa08aeacf4
|
update torch compile doc (#4874)
|
2025-03-28 19:49:30 -07:00 |
|
Brayden Zhong
|
b149b39353
|
[CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969)
|
2025-03-27 19:45:02 -07:00 |
|
tarinkk
|
7f19e083c1
|
Support (1 <= dp < tp) in the dp attention in DeepEP (#4770)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
|
2025-03-27 17:09:35 -07:00 |
|
Ke Bao
|
b39532587b
|
Update doc for DeepSeek-V3-0324 (#4825)
|
2025-03-27 13:30:40 -07:00 |
|
Jiří Suchomel
|
f60f293195
|
[k8s] Clarified the usage of shared memory. (#4341)
|
2025-03-27 08:53:19 -07:00 |
|
Pan Lyu
|
c913ed4046
|
support clip embedding model (#4506)
|
2025-03-27 00:18:15 -07:00 |
|
Didier Durand
|
44f47d3ee1
|
Update supported_models.md: adding open-r1 Olympic Code 32B by HuggingFace (#4628)
|
2025-03-27 00:16:16 -07:00 |
|
Yineng Zhang
|
1099f6c974
|
bump v0.4.4.post2 (#4669)
|
2025-03-26 19:58:00 -07:00 |
|
fzyzcjy
|
15ddd84322
|
Add retry for flaky tests in CI (#4755)
|
2025-03-25 16:53:12 -07:00 |
|
yuhsaun-t
|
199bb01d00
|
Add endpoints to dump selected expert ids (#4435)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
|
2025-03-24 21:34:19 -07:00 |
|
Mick
|
1e86457c90
|
model: Minicpmo (#3023)
|
2025-03-24 20:08:40 -07:00 |
|
Ximingwang-09
|
22c3702e1e
|
[Model] Support Qwen2ForSequenceClassification (#4609)
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
|
2025-03-24 19:13:44 -07:00 |
|
BroadbentJim
|
8796cebb2c
|
fix typo SGLang supports three grammar backends (#4679)
|
2025-03-22 14:33:48 -07:00 |
|
Adarsh Shirawalmath
|
fb8886037c
|
[Docs] Update docs for gemma3 and VLM chat templates (#4674)
|
2025-03-22 08:02:19 -07:00 |
|
mlmz
|
f6ab4ca6bc
|
fix: fix ipython running error for Engine due to outlines nest_asyncio (#4582)
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
|
2025-03-21 19:11:15 -07:00 |
|
Michael Yao
|
c6ec70290f
|
[docs] Add links and fix grammars in deploy_on_k8s.md (#4641)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-03-20 22:55:23 -07:00 |
|
Ke Bao
|
bfb03c6182
|
Update doc for MTP and DP attention (#4622)
|
2025-03-20 11:31:48 -07:00 |
|
Jinyan Chen
|
f44db16c8e
|
[Feature] Integrate DeepEP into SGLang (#4232)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
Co-authored-by: Xuting Zhou <xutingz@nvidia.com>
|
2025-03-19 08:16:31 -07:00 |
|
James Liu
|
9e0186f352
|
[Feature] Support EAGLE 3 (#4247)
|
2025-03-18 07:35:23 -07:00 |
|
Albert
|
2d0045125f
|
Fix the incorrect args in benchmark_and_profiling.md (#4542)
Signed-off-by: Tianyu Zhou <albert.zty@antgroup.com>
|
2025-03-18 00:07:06 -07:00 |
|
Lianmin Zheng
|
c38ca4fc8e
|
Update readme (#4517)
|
2025-03-17 08:22:42 -07:00 |
|
HandH1998
|
f2ab37e500
|
[Doc] add doc for quantization w8a8_fp8 or w8a8_int8 (#4495)
|
2025-03-17 02:25:00 -07:00 |
|
Xihuai Wang
|
927ca935a7
|
Constraint Decoding: Tool call with text (#4067)
|
2025-03-17 01:06:46 -07:00 |
|
Wenbo Yang
|
75b656488a
|
Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418)
|
2025-03-17 00:03:43 -07:00 |
|
萝卜菜
|
d6d21640d3
|
[Feature] Support Deepseek-VL2 (#2798)
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
|
2025-03-16 23:07:59 -07:00 |
|
mlmz
|
452db50808
|
Constraint Decoding: Set xgrammar as the default grammar backend (#4386)
|
2025-03-16 18:53:43 -07:00 |
|
Mick
|
9d02bb3e2a
|
Urgent model support: support gemma-3-it (#4424)
|
2025-03-16 17:37:32 -07:00 |
|
Wang Ran (汪然)
|
22c96f78a6
|
typos: Update sampling_params.md (#4391)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-03-15 16:40:18 -07:00 |
|
江家瑋
|
26c372c13c
|
docs: Add Llama 3.3 to supported models (#4453)
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
|
2025-03-15 16:33:43 -07:00 |
|
Chayenne
|
e1a5e7e47d
|
docs: hot fix torch compile cache (#4442)
|
2025-03-14 19:05:59 -07:00 |
|
Zhan Lu
|
660305c38a
|
[Doc] fix wrong flag in deepseek documentation (#4427)
|
2025-03-14 11:30:55 -07:00 |
|
Yineng Zhang
|
ba80c102f9
|
bump v0.4.4.post1 (#4402)
|
2025-03-13 17:53:46 -07:00 |
|
Yineng Zhang
|
6aaeb84872
|
chore: bump v0.4.4 (#4041)
|
2025-03-13 02:49:58 -07:00 |
|
Lianmin Zheng
|
45de89719c
|
Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367)
|
2025-03-12 23:45:52 -07:00 |
|