Commit Graph

372 Commits

Author SHA1 Message Date
Lianmin Zheng
c38ca4fc8e Update readme (#4517) 2025-03-17 08:22:42 -07:00
HandH1998
f2ab37e500 [Doc] add doc for quantization w8a8_fp8 or w8a8_int8 (#4495) 2025-03-17 02:25:00 -07:00
Xihuai Wang
927ca935a7 Constraint Decoding: Tool call with text (#4067) 2025-03-17 01:06:46 -07:00
Wenbo Yang
75b656488a Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418) 2025-03-17 00:03:43 -07:00
萝卜菜
d6d21640d3 [Feature] Support Deepseek-VL2 (#2798)
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
2025-03-16 23:07:59 -07:00
mlmz
452db50808 Constraint Decoding: Set xgrammar as the default grammar backend (#4386) 2025-03-16 18:53:43 -07:00
Mick
9d02bb3e2a Urgent model support: support gemma-3-it (#4424) 2025-03-16 17:37:32 -07:00
Wang Ran (汪然)
22c96f78a6 typos: Update sampling_params.md (#4391)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-03-15 16:40:18 -07:00
江家瑋
26c372c13c docs: Add Llama 3.3 to supported models (#4453)
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
2025-03-15 16:33:43 -07:00
Chayenne
e1a5e7e47d docs: hot fix torch compile cache (#4442) 2025-03-14 19:05:59 -07:00
Zhan Lu
660305c38a [Doc] fix wrong flag in deepseek documentation (#4427) 2025-03-14 11:30:55 -07:00
Yineng Zhang
ba80c102f9 bump v0.4.4.post1 (#4402) 2025-03-13 17:53:46 -07:00
Yineng Zhang
6aaeb84872 chore: bump v0.4.4 (#4041) 2025-03-13 02:49:58 -07:00
Lianmin Zheng
45de89719c Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367) 2025-03-12 23:45:52 -07:00
Meng, Hengyu
71046fcd71 [XPU][CPU] Enable the native path of DeepSeek (#4086)
Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>
2025-03-12 22:26:29 -07:00
yang_zcybb
ad46550d25 [Doc] Fix typo in backend/sampling_params (#3835)
Co-authored-by: yangzhice.124 <yangzhice.124@bytedance.com>
2025-03-12 22:12:14 -07:00
Jun Liu
14344caa38 [docs] Update outdated description about torch.compile (#3844) 2025-03-12 22:09:38 -07:00
William
0a59a4657a Fix the doc of FR-Spec (#4295) 2025-03-12 21:22:50 -07:00
Peter Pan
016033188c docs: add parameter --log-requests-level (#4335) 2025-03-12 21:19:37 -07:00
shizhediao
2c3656f276 [Fix Doc.] Enable internal forwarding when starting the router (#4355) 2025-03-12 15:53:26 -07:00
Mick
01090e8ac3 model: Support Janus-pro (#3203) 2025-03-12 11:02:11 -07:00
Michael Yao
8f1f614ee2 [Docs] Clean up benchmark_and_profiling.md (#4297)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-03-11 21:48:21 -07:00
Yineng Zhang
1cf63485c1 upgrade flashinfer 0.2.3 (#4317)
Co-authored-by: qingquansong <qsong@linkedin.com>
2025-03-11 15:37:17 -07:00
Yineng Zhang
00f42707ea update doc (#4299) 2025-03-11 01:14:16 -07:00
Ke Bao
3a08f54638 Update MTP doc (#4290) 2025-03-11 00:46:55 -07:00
Baizhou Zhang
9fb48f951f Support nextn for flashinfer mla attention backend (#4218) 2025-03-09 00:01:54 -08:00
Stefan He
dceb256f1b [docs] Unhide production metrics page (#4193) 2025-03-08 23:41:40 -08:00
Peter Pan
0e90ae628a [docker] Distributed Serving with k8s Statefulset ( good example for DeepSeek-R1) (#3631)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Co-authored-by: Kebe <kebe.liu@daocloud.io>
2025-03-08 23:41:20 -08:00
Xihuai Wang
6eec3cdce6 docs(reasoning content): 📝 deepseek-r1 parser support qwq (#4124) 2025-03-09 04:14:50 +00:00
Michael Yao
c827c671f7 [Docs] Improve bullets appearance and grammar (#4174)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-03-07 03:16:25 -08:00
Yineng Zhang
b55a621ffb fix int8 doc link (#4179) 2025-03-07 02:49:19 -08:00
lukec
ffa1b3e318 Add an example of using deepseekv3 int8 sglang. (#4177)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-07 01:56:09 -08:00
Michael Yao
d557319a8b [Docs] Fix links and grammar issues (#4162) 2025-03-06 23:14:18 -08:00
Pan Lyu
361971b859 Add Support for Qwen2-VL Multi-modal Embedding Models (#3694) 2025-03-06 16:46:20 -08:00
Chayenne
9854a18a51 Hot fix small vocal eagle in docs (#4154)
Co-authored-by: ybyang <ybyang7@iflytek.com>
2025-03-06 15:13:26 -08:00
Chayenne
ebddb65aed Docs: add torch compile cache (#4151)
Co-authored-by: ybyang <ybyang7@iflytek.com>
2025-03-06 14:27:09 -08:00
Adarsh Shirawalmath
19fd57bcd7 [docs] fix HF reference script command (#4148) 2025-03-06 13:21:54 -08:00
Lianmin Zheng
9c58e68b4c Release v0.4.3.post4 (#4140) 2025-03-06 12:50:28 -08:00
simveit
8f0b63139e Docs: improve EAGLE docs (#4038) 2025-03-05 22:40:21 -08:00
samzong
b9b3b098b9 feat: support docs auto live-reload with sphinx-autobuild (#4111)
Signed-off-by: samzong <samzong.lu@gmail.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-05 22:39:34 -08:00
Yineng Zhang
fc671f66c1 chore: bump v0.4.3.post3 (#4114) 2025-03-05 17:26:10 -08:00
samzong
197751e9a1 fix Non-consecutive header level increase in docs/router/router.md (#4099)
Signed-off-by: samzong <samzong.lu@gmail.com>
2025-03-05 17:02:32 -08:00
samzong
d2d0d061d9 fix cross-reference error and spelling mistakes (#4101)
Signed-off-by: samzong <samzong.lu@gmail.com>
2025-03-05 16:39:02 -08:00
Yineng Zhang
0aaccbbfec revert deepseek docs (#4109) 2025-03-05 13:23:11 -08:00
Qiaolin Yu
357671e216 Add examples for server token-in-token-out (#4103)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-05 13:16:31 -08:00
Chayenne
e70fa279bc Docs: reorganize dpsk docs (#4108) 2025-03-05 13:01:03 -08:00
Tommy Yang
abe74b7b59 Docs: Add DeepSeek optimization ablations documentation (#4107)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-05 12:25:51 -08:00
Baizhou Zhang
fc91d08a8f [Revision] Add fast decode plan for flashinfer mla (#4012) 2025-03-05 11:20:41 -08:00
Qubitium-ModelCloud
56a724eba3 [QUANT] Add GPTQModel Dynamic Quantization + lm_head Quantization (#3790)
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
2025-03-05 01:11:00 -08:00
Mick
583d6af71b example: add vlm to token in & out example (#3941)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-04 22:18:26 -08:00