Lianmin Zheng
74885a848b
Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048)
2025-04-03 13:30:56 -07:00

Baizhou Zhang
e8999b13b7
Replace enable_flashinfer_mla argument with attention_backend (#5005)
2025-04-03 02:53:58 -07:00

fzyzcjy
736502d4fd
Tiny fix doc error (#4795)
2025-03-29 08:22:17 -07:00

Ke Bao
b39532587b
Update doc for DeepSeek-V3-0324 (#4825)
2025-03-27 13:30:40 -07:00

Pan Lyu
c913ed4046
support clip embedding model (#4506)
2025-03-27 00:18:15 -07:00

Didier Durand
44f47d3ee1
Update supported_models.md: adding open-r1 Olympic Code 32B by HuggingFace (#4628)
2025-03-27 00:16:16 -07:00

Mick
1e86457c90
model: Minicpmo (#3023)
2025-03-24 20:08:40 -07:00

Ximingwang-09
22c3702e1e
[Model] Support Qwen2ForSequenceClassification (#4609)
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
2025-03-24 19:13:44 -07:00

Adarsh Shirawalmath
fb8886037c
[Docs] Update docs for gemma3 and VLM chat templates (#4674)
2025-03-22 08:02:19 -07:00

Michael Yao
c6ec70290f
[docs] Add links and fix grammars in deploy_on_k8s.md (#4641)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-03-20 22:55:23 -07:00

Ke Bao
bfb03c6182
Update doc for MTP and DP attention (#4622)
2025-03-20 11:31:48 -07:00

Albert
2d0045125f
Fix the incorrect args in benchmark_and_profiling.md (#4542)
Signed-off-by: Tianyu Zhou <albert.zty@antgroup.com>
2025-03-18 00:07:06 -07:00

Wenbo Yang
75b656488a
Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418)
2025-03-17 00:03:43 -07:00

萝卜菜
d6d21640d3
[Feature] Support Deepseek-VL2 (#2798)
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
2025-03-16 23:07:59 -07:00

Mick
9d02bb3e2a
Urgent model support: support gemma-3-it (#4424)
2025-03-16 17:37:32 -07:00

江家瑋
26c372c13c
docs: Add Llama 3.3 to supported models (#4453)
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
2025-03-15 16:33:43 -07:00

Zhan Lu
660305c38a
[Doc] fix wrong flag in deepseek documentation (#4427)
2025-03-14 11:30:55 -07:00

Mick
01090e8ac3
model: Support Janus-pro (#3203)
2025-03-12 11:02:11 -07:00

Michael Yao
8f1f614ee2
[Docs] Clean up benchmark_and_profiling.md (#4297)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-03-11 21:48:21 -07:00

Ke Bao
3a08f54638
Update MTP doc (#4290)
2025-03-11 00:46:55 -07:00

Baizhou Zhang
9fb48f951f
Support nextn for flashinfer mla attention backend (#4218)
2025-03-09 00:01:54 -08:00

Stefan He
dceb256f1b
[docs] Unhide production metrics page (#4193)
2025-03-08 23:41:40 -08:00

Michael Yao
c827c671f7
[Docs] Improve bullets appearance and grammar (#4174)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-03-07 03:16:25 -08:00

Yineng Zhang
b55a621ffb
fix int8 doc link (#4179)
2025-03-07 02:49:19 -08:00

lukec
ffa1b3e318
Add an example of using deepseekv3 int8 sglang. (#4177)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-07 01:56:09 -08:00

Pan Lyu
361971b859
Add Support for Qwen2-VL Multi-modal Embedding Models (#3694)
2025-03-06 16:46:20 -08:00

Chayenne
ebddb65aed
Docs: add torch compile cache (#4151)
Co-authored-by: ybyang <ybyang7@iflytek.com>
2025-03-06 14:27:09 -08:00

Adarsh Shirawalmath
19fd57bcd7
[docs] fix HF reference script command (#4148)
2025-03-06 13:21:54 -08:00

samzong
d2d0d061d9
fix cross-reference error and spelling mistakes (#4101)
Signed-off-by: samzong <samzong.lu@gmail.com>
2025-03-05 16:39:02 -08:00

Yineng Zhang
0aaccbbfec
revert deepseek docs (#4109)
2025-03-05 13:23:11 -08:00

Chayenne
e70fa279bc
Docs: reorganize dpsk docs (#4108)
2025-03-05 13:01:03 -08:00

Tommy Yang
abe74b7b59
Docs: Add DeepSeek optimization ablations documentation (#4107)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-05 12:25:51 -08:00

Baizhou Zhang
fc91d08a8f
[Revision] Add fast decode plan for flashinfer mla (#4012)
2025-03-05 11:20:41 -08:00

Xihuai Wang
95575aa76a
Reasoning parser (#4000)
Co-authored-by: Lucas Pickup <lupickup@microsoft.com>
2025-03-03 21:16:36 -08:00

Chayenne
146ac8df07
Add examples in sampling parameters (#4039)
2025-03-03 13:04:32 -08:00

Lianmin Zheng
935cda944b
Misc clean up; Remove the support of jump forward (#4032)
2025-03-03 07:02:14 -08:00

Yudi Xue
a7000a7650
Update metrics documentation (#3264)
2025-03-03 05:03:58 -08:00

Lianmin Zheng
9e1014cf99
Revert "Add fast decode plan for flashinfer mla" (#4008)
2025-03-02 19:29:10 -08:00

Baizhou Zhang
fa56106731
Add fast decode plan for flashinfer mla (#3987)
2025-03-02 19:16:37 -08:00

Yineng Zhang
5d86016855
revert "Docs: Reorngaize dpsk links #3900" (#3933)
2025-02-27 08:57:13 -08:00

Baizhou Zhang
3e02526b1f
[Doc] Add experimental tag for flashinfer mla (#3925)
2025-02-27 01:55:36 -08:00

Stefan He
d8a98a2cad
[Docs] Improve DPSK docs in dark mode (#3914)
2025-02-27 00:13:04 -08:00

Baizhou Zhang
71ed01833d
[doc] Update document for flashinfer mla (#3907)
2025-02-26 20:40:45 -08:00

Chayenne
7c1692aa90
Docs: Reorngaize dpsk links (#3900)
2025-02-26 15:16:31 -08:00

Chayenne
8f019c7d1a
Docs: Move dpsk docs forward a step (#3894)
2025-02-26 11:43:20 -08:00

Shenggui Li
3dc9ff3ce8
[doc] fixed dpsk quant faq (#3865)
2025-02-25 19:40:47 -08:00

Shenggui Li
06427dfab1
[doc] added quantization doc for dpsk (#3843)
2025-02-25 09:43:28 -08:00

Shenggui Li
c0bb9eb3b3
[improve] made timeout configurable (#3803)
2025-02-25 00:26:08 -08:00

Baizhou Zhang
4d2a88bdff
[Docs]Add instruction for manually stopping nsys profiler (#3795)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-23 13:21:48 -08:00

Mick
45205d88a0
bench: Add MMMU benchmark for vLM (#3562)
2025-02-22 08:10:59 -08:00