Chayenne
|
6b859e7ddd
|
Docs: add special warning to engine docs (#3979)
|
2025-02-28 21:59:20 -08:00 |
|
Chayenne
|
3f8a441437
|
Docs: Add redline to highlight main process (#3977)
|
2025-02-28 18:37:15 -08:00 |
|
Chayenne
|
aceb420179
|
Docs: add type hint to sampling parameters (#3975)
|
2025-02-28 18:21:20 -08:00 |
|
Baizhou Zhang
|
90a4b7d98a
|
[Feature]Support ragged prefill in flashinfer mla backend (#3967)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
|
2025-02-28 18:13:56 -08:00 |
|
Baizhou Zhang
|
3e02526b1f
|
[Doc] Add experimental tag for flashinfer mla (#3925)
|
2025-02-27 01:55:36 -08:00 |
|
Qiaolin Yu
|
d6898dd253
|
Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 22:06:54 -08:00 |
|
Baizhou Zhang
|
71ed01833d
|
[doc] Update document for flashinfer mla (#3907)
|
2025-02-26 20:40:45 -08:00 |
|
simveit
|
acd1a15921
|
Docs: Implemented frontend docs (#3791)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 15:30:05 -08:00 |
|
JC1DA
|
7551498a69
|
[Feature] Support llguidance for constrained decoding (#3298)
|
2025-02-26 10:41:49 -08:00 |
|
simveit
|
44a2c4bd56
|
Docs: improve link to docs (#3860)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 10:29:25 -08:00 |
|
Yuanheng Zhao
|
3758d209a0
|
[Doc] Fix typo in server-argument description (#3641)
|
2025-02-24 16:57:13 -08:00 |
|
Lianmin Zheng
|
f2388f6b95
|
Revert "Rename TokenizerManager to StdOrchestrator" (#3828)
|
2025-02-24 14:47:59 -08:00 |
|
fzyzcjy
|
45360b2fa9
|
Improve: Rename TokenizerManager to StdOrchestrator (#3116)
|
2025-02-23 00:30:58 -08:00 |
|
Chayenne
|
e310722266
|
Docs: Update offline_engine_api and add links (#3773)
|
2025-02-21 14:15:52 -08:00 |
|
Shi Shuai
|
e074e76b31
|
docs: Add offline engine launch example and documentation (#3771)
|
2025-02-21 11:25:52 -08:00 |
|
simveit
|
4592afc27d
|
Docs: Fix layout to docs (#3733)
|
2025-02-21 11:24:13 -08:00 |
|
Baizhou Zhang
|
ac05310098
|
[Docs] Modify ep related server args and remove cublas part of deepseek (#3732)
|
2025-02-21 03:37:56 +08:00 |
|
Chayenne
|
3c7bfd7eab
|
Docs: Fix layout with sub-section (#3710)
|
2025-02-19 15:44:30 -08:00 |
|
Shi Shuai
|
55de40f782
|
[Docs]: Fix Multi-User Port Allocation Conflicts (#3601)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: simveit <simp.veitner@gmail.com>
|
2025-02-19 11:15:44 -08:00 |
|
Mick
|
7711ac6ed0
|
doc: emphasize and notify the usage of chat_template (#3589)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-15 00:10:32 -08:00 |
|
Shi Shuai
|
7443197a63
|
[CI] Improve Docs CI Efficiency (#3587)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-02-14 19:57:00 -08:00 |
|
Yineng Zhang
|
31eec35ba8
|
fix doc (#3558)
|
2025-02-14 10:11:31 +08:00 |
|
Didier Durand
|
1e2cf2b541
|
fix server_arguments typo (#3499)
|
2025-02-12 02:59:53 +08:00 |
|
Jackmin801
|
5f0e7de339
|
[Feat] Return hidden states (experimental) (#3364)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-10 15:54:37 -08:00 |
|
Yineng Zhang
|
4d2dbeaca7
|
remove cutex dependency (#3422)
|
2025-02-09 18:33:20 +08:00 |
|
Baizhou Zhang
|
70817a7eae
|
[Feature] Define backends and add Triton backend for Lora (#3161)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2025-02-03 22:09:13 -08:00 |
|
simveit
|
7b5a374114
|
Update server args doc (#3273)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
|
2025-02-03 23:39:41 +00:00 |
|
Jhin
|
656f7fc1bc
|
Docs: Quick fix for Speculative_decoding doc (#3228)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-01-31 08:30:40 -08:00 |
|
Jhin
|
7b9b4f4426
|
Docs fix about EAGLE and streaming output (#3166)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jhin <jhinpan@umich.edu>
|
2025-01-27 18:10:45 -08:00 |
|
Jhin
|
9472e69963
|
Doc: Add Docs about EAGLE speculative decoding (#3144)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-01-26 17:49:13 -08:00 |
|
YAMY
|
b045841bae
|
Feature/function calling update (#2700)
Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
|
2025-01-26 09:57:51 -08:00 |
|
simveit
|
1c4e0d2445
|
Docs: Update doc for server arguments (#2742)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-01-23 11:32:05 -08:00 |
|
Chayenne
|
0ffcfdf474
|
Docs: Only use X-Grammar in structured output (#2991)
|
2025-01-19 20:22:47 -08:00 |
|
Enrique Shockwave
|
3bcf5ecea7
|
support regex in xgrammar backend (#2983)
|
2025-01-20 04:34:41 +08:00 |
|
Lianmin Zheng
|
8b6ce52e92
|
Support multi-node DP attention (#2925)
Co-authored-by: dhou-xai <dhou@x.ai>
|
2025-01-16 11:15:00 -08:00 |
|
Shi Shuai
|
c4f9707e16
|
Improve: Token-In Token-Out Usage for RLHF (#2843)
|
2025-01-11 15:14:26 -08:00 |
|
Chayenne
|
2e6346fc2e
|
Docs: Update the style of Llama 3.1 405B docs (#2789)
|
2025-01-08 01:07:54 -08:00 |
|
mlmz
|
977f785dad
|
Docs: Rewrite docs for LLama 405B and ModelSpace (#2773)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-01-08 00:02:59 -08:00 |
|
Shi Shuai
|
062c48d2bd
|
[Docs] Add Support for Pydantic Structured Output Format (#2697)
|
2025-01-01 15:08:43 -08:00 |
|
Shi Shuai
|
0a765bbccc
|
Docs: Refactor Contribution Guide (#2690)
|
2024-12-31 14:11:00 -08:00 |
|
Lianmin Zheng
|
bdd2827a80
|
Update structured_outputs.ipynb (#2666)
|
2024-12-30 00:46:41 -08:00 |
|
Lianmin Zheng
|
8c3b420eec
|
[Docs] clean up structured outputs docs (#2654)
|
2024-12-29 23:57:16 -08:00 |
|
Adarsh Shirawalmath
|
fd34f2da35
|
[Docs] Add EBNF to sampling params docs (#2609)
|
2024-12-29 00:05:00 -08:00 |
|
Tanjiro
|
8ee9a8501a
|
[Feature] Function Calling (#2544)
Co-authored-by: Haoyu Wang <120358163+HaoyuWang4188@users.noreply.github.com>
|
2024-12-28 21:58:52 -08:00 |
|
Shi Shuai
|
333e3bfde5
|
[docs]Refactor constrained decoding tutorial (#2633)
|
2024-12-28 07:00:38 -08:00 |
|
Shi Shuai
|
239c9d4d3a
|
Docs: Add constrained decoding tutorial (#2614)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2024-12-27 23:54:28 -08:00 |
|
Lianmin Zheng
|
773951548d
|
Fix logprob_start_len for multi modal models (#2597)
Co-authored-by: libra <lihu723@gmail.com>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: Wang, Haoyu <haoyu.wang@intel.com>
|
2024-12-26 06:27:45 -08:00 |
|
Shi Shuai
|
25e5d589e3
|
Doc: Update Grammar Backend (#2545)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2024-12-22 17:14:40 -08:00 |
|
Chayenne
|
786be44da5
|
Fix Docs CI When Compile Error (#2323)
|
2024-12-04 11:19:46 -08:00 |
|
Chayenne
|
7d5d1d3d29
|
update weights from disk (#2265)
|
2024-11-30 01:17:00 +00:00 |
|