lukec
|
ffa1b3e318
|
Add an example of using deepseekv3 int8 sglang. (#4177)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-03-07 01:56:09 -08:00 |
|
Michael Yao
|
d557319a8b
|
[Docs] Fix links and grammar issues (#4162)
|
2025-03-06 23:14:18 -08:00 |
|
Pan Lyu
|
361971b859
|
Add Support for Qwen2-VL Multi-modal Embedding Models (#3694)
|
2025-03-06 16:46:20 -08:00 |
|
Chayenne
|
9854a18a51
|
Hot fix small vocal eagle in docs (#4154)
Co-authored-by: ybyang <ybyang7@iflytek.com>
|
2025-03-06 15:13:26 -08:00 |
|
Chayenne
|
ebddb65aed
|
Docs: add torch compile cache (#4151)
Co-authored-by: ybyang <ybyang7@iflytek.com>
|
2025-03-06 14:27:09 -08:00 |
|
Adarsh Shirawalmath
|
19fd57bcd7
|
[docs] fix HF reference script command (#4148)
|
2025-03-06 13:21:54 -08:00 |
|
Lianmin Zheng
|
9c58e68b4c
|
Release v0.4.3.post4 (#4140)
|
2025-03-06 12:50:28 -08:00 |
|
simveit
|
8f0b63139e
|
Docs: improve EAGLE docs (#4038)
|
2025-03-05 22:40:21 -08:00 |
|
samzong
|
b9b3b098b9
|
feat: support docs auto live-reload with sphinx-autobuild (#4111)
Signed-off-by: samzong <samzong.lu@gmail.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-03-05 22:39:34 -08:00 |
|
Yineng Zhang
|
fc671f66c1
|
chore: bump v0.4.3.post3 (#4114)
|
2025-03-05 17:26:10 -08:00 |
|
samzong
|
197751e9a1
|
fix Non-consecutive header level increase in docs/router/router.md (#4099)
Signed-off-by: samzong <samzong.lu@gmail.com>
|
2025-03-05 17:02:32 -08:00 |
|
samzong
|
d2d0d061d9
|
fix cross-reference error and spelling mistakes (#4101)
Signed-off-by: samzong <samzong.lu@gmail.com>
|
2025-03-05 16:39:02 -08:00 |
|
Yineng Zhang
|
0aaccbbfec
|
revert deepseek docs (#4109)
|
2025-03-05 13:23:11 -08:00 |
|
Qiaolin Yu
|
357671e216
|
Add examples for server token-in-token-out (#4103)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-03-05 13:16:31 -08:00 |
|
Chayenne
|
e70fa279bc
|
Docs: reorganize dpsk docs (#4108)
|
2025-03-05 13:01:03 -08:00 |
|
Tommy Yang
|
abe74b7b59
|
Docs: Add DeepSeek optimization ablations documentation (#4107)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-03-05 12:25:51 -08:00 |
|
Baizhou Zhang
|
fc91d08a8f
|
[Revision] Add fast decode plan for flashinfer mla (#4012)
|
2025-03-05 11:20:41 -08:00 |
|
Qubitium-ModelCloud
|
56a724eba3
|
[QUANT] Add GPTQModel Dynamic Quantization + lm_head Quantization (#3790)
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
|
2025-03-05 01:11:00 -08:00 |
|
Mick
|
583d6af71b
|
example: add vlm to token in & out example (#3941)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-03-04 22:18:26 -08:00 |
|
Qiaolin Yu
|
4725e3f652
|
Add examples for returning hidden states when using the server (#4074)
|
2025-03-04 19:31:50 -08:00 |
|
Xihuai Wang
|
95575aa76a
|
Reasoning parser (#4000)
Co-authored-by: Lucas Pickup <lupickup@microsoft.com>
|
2025-03-03 21:16:36 -08:00 |
|
Chayenne
|
146ac8df07
|
Add examples in sampling parameters (#4039)
|
2025-03-03 13:04:32 -08:00 |
|
Chayenne
|
2796fbb53d
|
Docs: Fix sampling parameter (#4034)
|
2025-03-03 09:32:36 -08:00 |
|
Lianmin Zheng
|
935cda944b
|
Misc clean up; Remove the support of jump forward (#4032)
|
2025-03-03 07:02:14 -08:00 |
|
Yudi Xue
|
a7000a7650
|
Update metrics documentation (#3264)
|
2025-03-03 05:03:58 -08:00 |
|
Lianmin Zheng
|
ac2387279e
|
Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
|
2025-03-03 00:12:04 -08:00 |
|
Chayenne
|
728e175fc4
|
Add examples to token-in-token-out for LLM (#4010)
|
2025-03-02 21:03:49 -08:00 |
|
Lianmin Zheng
|
9e1014cf99
|
Revert "Add fast decode plan for flashinfer mla" (#4008)
|
2025-03-02 19:29:10 -08:00 |
|
Baizhou Zhang
|
fa56106731
|
Add fast decode plan for flashinfer mla (#3987)
|
2025-03-02 19:16:37 -08:00 |
|
Zhousx
|
7fbab730bd
|
[feat] add small vocab table for eagle's draft model[1]. (#3822)
Co-authored-by: Achazwl <323163497@qq.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-03-02 18:58:45 -08:00 |
|
Qiaolin Yu
|
40782f05d7
|
Refactor: Move return_hidden_states to the generate input (#3985)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
|
2025-03-01 17:51:29 -08:00 |
|
Chayenne
|
6b859e7ddd
|
Docs: add special warning to engine docs (#3979)
|
2025-02-28 21:59:20 -08:00 |
|
Chayenne
|
3f8a441437
|
Docs: Add redline to highlight main process (#3977)
|
2025-02-28 18:37:15 -08:00 |
|
Chayenne
|
aceb420179
|
Docs: add type hint to smapling parameters (#3975)
|
2025-02-28 18:21:20 -08:00 |
|
Baizhou Zhang
|
90a4b7d98a
|
[Feature]Support ragged prefill in flashinfer mla backend (#3967)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
|
2025-02-28 18:13:56 -08:00 |
|
Yineng Zhang
|
564bdf29f7
|
upgrade flashinfer v0.2.2.post1 (#3934)
|
2025-02-27 09:53:48 -08:00 |
|
Yineng Zhang
|
5d86016855
|
revert "Docs: Reorngaize dpsk links #3900" (#3933)
|
2025-02-27 08:57:13 -08:00 |
|
Baizhou Zhang
|
3e02526b1f
|
[Doc] Add experimental tag for flashinfer mla (#3925)
|
2025-02-27 01:55:36 -08:00 |
|
Stefan He
|
d8a98a2cad
|
[Docs] Improve DPSK docs in dark mode (#3914)
|
2025-02-27 00:13:04 -08:00 |
|
Qiaolin Yu
|
d6898dd253
|
Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 22:06:54 -08:00 |
|
Baizhou Zhang
|
71ed01833d
|
[doc] Update document for flashinfer mla (#3907)
|
2025-02-26 20:40:45 -08:00 |
|
simveit
|
acd1a15921
|
Docs: Implemented frontend docs (#3791)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 15:30:05 -08:00 |
|
Chayenne
|
7c1692aa90
|
Docs: Reorngaize dpsk links (#3900)
|
2025-02-26 15:16:31 -08:00 |
|
Chayenne
|
8f019c7d1a
|
Docs: Move dpsk docs forward a step (#3894)
|
2025-02-26 11:43:20 -08:00 |
|
JC1DA
|
7551498a69
|
[Feature] Support llguidance for constrained decoding (#3298)
|
2025-02-26 10:41:49 -08:00 |
|
simveit
|
44a2c4bd56
|
Docs: improve link to docs (#3860)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 10:29:25 -08:00 |
|
simveit
|
c9fc4a9d26
|
Docs: delete sgl-kernel install in docs (#3845)
|
2025-02-26 09:25:43 -08:00 |
|
Shenggui Li
|
3dc9ff3ce8
|
[doc] fixed dpsk quant faq (#3865)
|
2025-02-25 19:40:47 -08:00 |
|
Shenggui Li
|
06427dfab1
|
[doc] added quantization doc for dpsk (#3843)
|
2025-02-25 09:43:28 -08:00 |
|
Shenggui Li
|
c0bb9eb3b3
|
[improve] made timeout configurable (#3803)
|
2025-02-25 00:26:08 -08:00 |
|