Yineng Zhang
|
6aaeb84872
|
chore: bump v0.4.4 (#4041)
|
2025-03-13 02:49:58 -07:00 |
|
Lianmin Zheng
|
45de89719c
|
Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367)
|
2025-03-12 23:45:52 -07:00 |
|
Meng, Hengyu
|
71046fcd71
|
[XPU][CPU] Enable the native path of DeepSeek (#4086)
Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>
|
2025-03-12 22:26:29 -07:00 |
|
yang_zcybb
|
ad46550d25
|
[Doc] Fix typo in backend/sampling_params (#3835)
Co-authored-by: yangzhice.124 <yangzhice.124@bytedance.com>
|
2025-03-12 22:12:14 -07:00 |
|
Jun Liu
|
14344caa38
|
[docs] Update outdated description about torch.compile (#3844)
|
2025-03-12 22:09:38 -07:00 |
|
William
|
0a59a4657a
|
Fix the doc of FR-Spec (#4295)
|
2025-03-12 21:22:50 -07:00 |
|
Peter Pan
|
016033188c
|
docs: add parameter --log-requests-level (#4335)
|
2025-03-12 21:19:37 -07:00 |
|
shizhediao
|
2c3656f276
|
[Fix Doc.] Enable internal forwarding when starting the router (#4355)
|
2025-03-12 15:53:26 -07:00 |
|
Mick
|
01090e8ac3
|
model: Support Janus-pro (#3203)
|
2025-03-12 11:02:11 -07:00 |
|
Michael Yao
|
8f1f614ee2
|
[Docs] Clean up benchmark_and_profiling.md (#4297)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-03-11 21:48:21 -07:00 |
|
Yineng Zhang
|
1cf63485c1
|
upgrade flashinfer 0.2.3 (#4317)
Co-authored-by: qingquansong <qsong@linkedin.com>
|
2025-03-11 15:37:17 -07:00 |
|
Yineng Zhang
|
00f42707ea
|
update doc (#4299)
|
2025-03-11 01:14:16 -07:00 |
|
Ke Bao
|
3a08f54638
|
Update MTP doc (#4290)
|
2025-03-11 00:46:55 -07:00 |
|
Baizhou Zhang
|
9fb48f951f
|
Support nextn for flashinfer mla attention backend (#4218)
|
2025-03-09 00:01:54 -08:00 |
|
Stefan He
|
dceb256f1b
|
[docs] Unhide production metrics page (#4193)
|
2025-03-08 23:41:40 -08:00 |
|
Peter Pan
|
0e90ae628a
|
[docker] Distributed Serving with k8s Statefulset (good example for DeepSeek-R1) (#3631)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Co-authored-by: Kebe <kebe.liu@daocloud.io>
|
2025-03-08 23:41:20 -08:00 |
|
Xihuai Wang
|
6eec3cdce6
|
docs(reasoning content): 📝 deepseek-r1 parser support qwq (#4124)
|
2025-03-09 04:14:50 +00:00 |
|
Michael Yao
|
c827c671f7
|
[Docs] Improve bullets appearance and grammar (#4174)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-03-07 03:16:25 -08:00 |
|
Yineng Zhang
|
b55a621ffb
|
fix int8 doc link (#4179)
|
2025-03-07 02:49:19 -08:00 |
|
lukec
|
ffa1b3e318
|
Add an example of using deepseekv3 int8 sglang. (#4177)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-03-07 01:56:09 -08:00 |
|
Michael Yao
|
d557319a8b
|
[Docs] Fix links and grammar issues (#4162)
|
2025-03-06 23:14:18 -08:00 |
|
Pan Lyu
|
361971b859
|
Add Support for Qwen2-VL Multi-modal Embedding Models (#3694)
|
2025-03-06 16:46:20 -08:00 |
|
Chayenne
|
9854a18a51
|
Hot fix small vocab eagle in docs (#4154)
Co-authored-by: ybyang <ybyang7@iflytek.com>
|
2025-03-06 15:13:26 -08:00 |
|
Chayenne
|
ebddb65aed
|
Docs: add torch compile cache (#4151)
Co-authored-by: ybyang <ybyang7@iflytek.com>
|
2025-03-06 14:27:09 -08:00 |
|
Adarsh Shirawalmath
|
19fd57bcd7
|
[docs] fix HF reference script command (#4148)
|
2025-03-06 13:21:54 -08:00 |
|
Lianmin Zheng
|
9c58e68b4c
|
Release v0.4.3.post4 (#4140)
|
2025-03-06 12:50:28 -08:00 |
|
simveit
|
8f0b63139e
|
Docs: improve EAGLE docs (#4038)
|
2025-03-05 22:40:21 -08:00 |
|
samzong
|
b9b3b098b9
|
feat: support docs auto live-reload with sphinx-autobuild (#4111)
Signed-off-by: samzong <samzong.lu@gmail.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-03-05 22:39:34 -08:00 |
|
Yineng Zhang
|
fc671f66c1
|
chore: bump v0.4.3.post3 (#4114)
|
2025-03-05 17:26:10 -08:00 |
|
samzong
|
197751e9a1
|
fix Non-consecutive header level increase in docs/router/router.md (#4099)
Signed-off-by: samzong <samzong.lu@gmail.com>
|
2025-03-05 17:02:32 -08:00 |
|
samzong
|
d2d0d061d9
|
fix cross-reference error and spelling mistakes (#4101)
Signed-off-by: samzong <samzong.lu@gmail.com>
|
2025-03-05 16:39:02 -08:00 |
|
Yineng Zhang
|
0aaccbbfec
|
revert deepseek docs (#4109)
|
2025-03-05 13:23:11 -08:00 |
|
Qiaolin Yu
|
357671e216
|
Add examples for server token-in-token-out (#4103)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-03-05 13:16:31 -08:00 |
|
Chayenne
|
e70fa279bc
|
Docs: reorganize dpsk docs (#4108)
|
2025-03-05 13:01:03 -08:00 |
|
Tommy Yang
|
abe74b7b59
|
Docs: Add DeepSeek optimization ablations documentation (#4107)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-03-05 12:25:51 -08:00 |
|
Baizhou Zhang
|
fc91d08a8f
|
[Revision] Add fast decode plan for flashinfer mla (#4012)
|
2025-03-05 11:20:41 -08:00 |
|
Qubitium-ModelCloud
|
56a724eba3
|
[QUANT] Add GPTQModel Dynamic Quantization + lm_head Quantization (#3790)
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
|
2025-03-05 01:11:00 -08:00 |
|
Mick
|
583d6af71b
|
example: add vlm to token in & out example (#3941)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-03-04 22:18:26 -08:00 |
|
Qiaolin Yu
|
4725e3f652
|
Add examples for returning hidden states when using the server (#4074)
|
2025-03-04 19:31:50 -08:00 |
|
Xihuai Wang
|
95575aa76a
|
Reasoning parser (#4000)
Co-authored-by: Lucas Pickup <lupickup@microsoft.com>
|
2025-03-03 21:16:36 -08:00 |
|
Chayenne
|
146ac8df07
|
Add examples in sampling parameters (#4039)
|
2025-03-03 13:04:32 -08:00 |
|
Chayenne
|
2796fbb53d
|
Docs: Fix sampling parameter (#4034)
|
2025-03-03 09:32:36 -08:00 |
|
Lianmin Zheng
|
935cda944b
|
Misc clean up; Remove the support of jump forward (#4032)
|
2025-03-03 07:02:14 -08:00 |
|
Yudi Xue
|
a7000a7650
|
Update metrics documentation (#3264)
|
2025-03-03 05:03:58 -08:00 |
|
Lianmin Zheng
|
ac2387279e
|
Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
|
2025-03-03 00:12:04 -08:00 |
|
Chayenne
|
728e175fc4
|
Add examples to token-in-token-out for LLM (#4010)
|
2025-03-02 21:03:49 -08:00 |
|
Lianmin Zheng
|
9e1014cf99
|
Revert "Add fast decode plan for flashinfer mla" (#4008)
|
2025-03-02 19:29:10 -08:00 |
|
Baizhou Zhang
|
fa56106731
|
Add fast decode plan for flashinfer mla (#3987)
|
2025-03-02 19:16:37 -08:00 |
|
Zhousx
|
7fbab730bd
|
[feat] add small vocab table for eagle's draft model[1]. (#3822)
Co-authored-by: Achazwl <323163497@qq.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-03-02 18:58:45 -08:00 |
|
Qiaolin Yu
|
40782f05d7
|
Refactor: Move return_hidden_states to the generate input (#3985)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
|
2025-03-01 17:51:29 -08:00 |
|