Michael Yao
|
d557319a8b
|
[Docs] Fix links and grammar issues (#4162)
|
2025-03-06 23:14:18 -08:00 |
|
Chayenne
|
9854a18a51
|
Hot fix small vocab eagle in docs (#4154)
Co-authored-by: ybyang <ybyang7@iflytek.com>
|
2025-03-06 15:13:26 -08:00 |
|
Chayenne
|
ebddb65aed
|
Docs: add torch compile cache (#4151)
Co-authored-by: ybyang <ybyang7@iflytek.com>
|
2025-03-06 14:27:09 -08:00 |
|
simveit
|
8f0b63139e
|
Docs: improve EAGLE docs (#4038)
|
2025-03-05 22:40:21 -08:00 |
|
samzong
|
d2d0d061d9
|
fix cross-reference error and spelling mistakes (#4101)
Signed-off-by: samzong <samzong.lu@gmail.com>
|
2025-03-05 16:39:02 -08:00 |
|
Qiaolin Yu
|
357671e216
|
Add examples for server token-in-token-out (#4103)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-03-05 13:16:31 -08:00 |
|
Baizhou Zhang
|
fc91d08a8f
|
[Revision] Add fast decode plan for flashinfer mla (#4012)
|
2025-03-05 11:20:41 -08:00 |
|
Qubitium-ModelCloud
|
56a724eba3
|
[QUANT] Add GPTQModel Dynamic Quantization + lm_head Quantization (#3790)
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
|
2025-03-05 01:11:00 -08:00 |
|
Mick
|
583d6af71b
|
example: add vlm to token in & out example (#3941)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-03-04 22:18:26 -08:00 |
|
Qiaolin Yu
|
4725e3f652
|
Add examples for returning hidden states when using the server (#4074)
|
2025-03-04 19:31:50 -08:00 |
|
Xihuai Wang
|
95575aa76a
|
Reasoning parser (#4000)
Co-authored-by: Lucas Pickup <lupickup@microsoft.com>
|
2025-03-03 21:16:36 -08:00 |
|
Chayenne
|
146ac8df07
|
Add examples in sampling parameters (#4039)
|
2025-03-03 13:04:32 -08:00 |
|
Chayenne
|
2796fbb53d
|
Docs: Fix sampling parameter (#4034)
|
2025-03-03 09:32:36 -08:00 |
|
Lianmin Zheng
|
935cda944b
|
Misc clean up; Remove the support of jump forward (#4032)
|
2025-03-03 07:02:14 -08:00 |
|
Lianmin Zheng
|
ac2387279e
|
Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
|
2025-03-03 00:12:04 -08:00 |
|
Chayenne
|
728e175fc4
|
Add examples to token-in-token-out for LLM (#4010)
|
2025-03-02 21:03:49 -08:00 |
|
Lianmin Zheng
|
9e1014cf99
|
Revert "Add fast decode plan for flashinfer mla" (#4008)
|
2025-03-02 19:29:10 -08:00 |
|
Baizhou Zhang
|
fa56106731
|
Add fast decode plan for flashinfer mla (#3987)
|
2025-03-02 19:16:37 -08:00 |
|
Zhousx
|
7fbab730bd
|
[feat] add small vocab table for eagle's draft model[1]. (#3822)
Co-authored-by: Achazwl <323163497@qq.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-03-02 18:58:45 -08:00 |
|
Qiaolin Yu
|
40782f05d7
|
Refactor: Move return_hidden_states to the generate input (#3985)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
|
2025-03-01 17:51:29 -08:00 |
|
Chayenne
|
6b859e7ddd
|
Docs: add special warning to engine docs (#3979)
|
2025-02-28 21:59:20 -08:00 |
|
Chayenne
|
3f8a441437
|
Docs: Add redline to highlight main process (#3977)
|
2025-02-28 18:37:15 -08:00 |
|
Chayenne
|
aceb420179
|
Docs: add type hint to sampling parameters (#3975)
|
2025-02-28 18:21:20 -08:00 |
|
Baizhou Zhang
|
90a4b7d98a
|
[Feature] Support ragged prefill in flashinfer mla backend (#3967)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
|
2025-02-28 18:13:56 -08:00 |
|
Baizhou Zhang
|
3e02526b1f
|
[Doc] Add experimental tag for flashinfer mla (#3925)
|
2025-02-27 01:55:36 -08:00 |
|
Qiaolin Yu
|
d6898dd253
|
Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 22:06:54 -08:00 |
|
Baizhou Zhang
|
71ed01833d
|
[doc] Update document for flashinfer mla (#3907)
|
2025-02-26 20:40:45 -08:00 |
|
simveit
|
acd1a15921
|
Docs: Implemented frontend docs (#3791)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 15:30:05 -08:00 |
|
JC1DA
|
7551498a69
|
[Feature] Support llguidance for constrained decoding (#3298)
|
2025-02-26 10:41:49 -08:00 |
|
simveit
|
44a2c4bd56
|
Docs: improve link to docs (#3860)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 10:29:25 -08:00 |
|
Yuanheng Zhao
|
3758d209a0
|
[Doc] Fix typo in server-argument description (#3641)
|
2025-02-24 16:57:13 -08:00 |
|
Lianmin Zheng
|
f2388f6b95
|
Revert "Rename TokenizerManager to StdOrchestrator" (#3828)
|
2025-02-24 14:47:59 -08:00 |
|
fzyzcjy
|
45360b2fa9
|
Improve: Rename TokenizerManager to StdOrchestrator (#3116)
|
2025-02-23 00:30:58 -08:00 |
|
Chayenne
|
e310722266
|
Docs: Update offline_engine_api and add links (#3773)
|
2025-02-21 14:15:52 -08:00 |
|
Shi Shuai
|
e074e76b31
|
docs: Add offline engine launch example and documentation (#3771)
|
2025-02-21 11:25:52 -08:00 |
|
simveit
|
4592afc27d
|
Docs: Fix layout to docs (#3733)
|
2025-02-21 11:24:13 -08:00 |
|
Baizhou Zhang
|
ac05310098
|
[Docs] Modify ep related server args and remove cublas part of deepseek (#3732)
|
2025-02-21 03:37:56 +08:00 |
|
Chayenne
|
3c7bfd7eab
|
Docs: Fix layout with sub-section (#3710)
|
2025-02-19 15:44:30 -08:00 |
|
Shi Shuai
|
55de40f782
|
[Docs]: Fix Multi-User Port Allocation Conflicts (#3601)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: simveit <simp.veitner@gmail.com>
|
2025-02-19 11:15:44 -08:00 |
|
Mick
|
7711ac6ed0
|
doc: emphasize and notify the usage of chat_template (#3589)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-15 00:10:32 -08:00 |
|
Shi Shuai
|
7443197a63
|
[CI] Improve Docs CI Efficiency (#3587)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-02-14 19:57:00 -08:00 |
|
Yineng Zhang
|
31eec35ba8
|
fix doc (#3558)
|
2025-02-14 10:11:31 +08:00 |
|
Didier Durand
|
1e2cf2b541
|
fix server_arguments typo (#3499)
|
2025-02-12 02:59:53 +08:00 |
|
Jackmin801
|
5f0e7de339
|
[Feat] Return hidden states (experimental) (#3364)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-10 15:54:37 -08:00 |
|
Yineng Zhang
|
4d2dbeaca7
|
remove cutex dependency (#3422)
|
2025-02-09 18:33:20 +08:00 |
|
Baizhou Zhang
|
70817a7eae
|
[Feature] Define backends and add Triton backend for Lora (#3161)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2025-02-03 22:09:13 -08:00 |
|
simveit
|
7b5a374114
|
Update server args doc (#3273)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
|
2025-02-03 23:39:41 +00:00 |
|
Jhin
|
656f7fc1bc
|
Docs: Quick fix for Speculative_decoding doc (#3228)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-01-31 08:30:40 -08:00 |
|
Jhin
|
7b9b4f4426
|
Docs fix about EAGLE and streaming output (#3166)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jhin <jhinpan@umich.edu>
|
2025-01-27 18:10:45 -08:00 |
|
Jhin
|
9472e69963
|
Doc: Add Docs about EAGLE speculative decoding (#3144)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-01-26 17:49:13 -08:00 |
|