78 Commits

Author SHA1 Message Date
Chayenne
6b859e7ddd Docs: add special warning to engine docs (#3979) 2025-02-28 21:59:20 -08:00
Chayenne
3f8a441437 Docs: Add redline to highlight main process (#3977) 2025-02-28 18:37:15 -08:00
Chayenne
aceb420179 Docs: add type hint to smapling parameters (#3975) 2025-02-28 18:21:20 -08:00
Baizhou Zhang
90a4b7d98a [Feature]Support ragged prefill in flashinfer mla backend (#3967)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
2025-02-28 18:13:56 -08:00
Baizhou Zhang
3e02526b1f [Doc] Add experimental tag for flashinfer mla (#3925) 2025-02-27 01:55:36 -08:00
Qiaolin Yu
d6898dd253 Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 22:06:54 -08:00
Baizhou Zhang
71ed01833d [doc] Update document for flashinfer mla (#3907) 2025-02-26 20:40:45 -08:00
simveit
acd1a15921 Docs: Implemented frontend docs (#3791)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 15:30:05 -08:00
JC1DA
7551498a69 [Feature] Support llguidance for constrained decoding (#3298) 2025-02-26 10:41:49 -08:00
simveit
44a2c4bd56 Docs: improve link to docs (#3860)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 10:29:25 -08:00
Yuanheng Zhao
3758d209a0 [Doc] Fix typo in server-argument description (#3641) 2025-02-24 16:57:13 -08:00
Lianmin Zheng
f2388f6b95 Revert "Rename TokenizerManager to StdOrchestrator" (#3828) 2025-02-24 14:47:59 -08:00
fzyzcjy
45360b2fa9 Improve: Rename TokenizerManager to StdOrchestrator (#3116) 2025-02-23 00:30:58 -08:00
Chayenne
e310722266 Docs: Update offline_engine_api and add links (#3773) 2025-02-21 14:15:52 -08:00
Shi Shuai
e074e76b31 docs: Add offline engine launch example and documentation (#3771) 2025-02-21 11:25:52 -08:00
simveit
4592afc27d Docs: Fix layout to docs (#3733) 2025-02-21 11:24:13 -08:00
Baizhou Zhang
ac05310098 [Docs] Modify ep related server args and remove cublas part of deepseek (#3732) 2025-02-21 03:37:56 +08:00
Chayenne
3c7bfd7eab Docs: Fix layout with sub-section (#3710) 2025-02-19 15:44:30 -08:00
Shi Shuai
55de40f782 [Docs]: Fix Multi-User Port Allocation Conflicts (#3601)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: simveit <simp.veitner@gmail.com>
2025-02-19 11:15:44 -08:00
Mick
7711ac6ed0 doc: emphasize and notify the usage of chat_template (#3589)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-15 00:10:32 -08:00
Shi Shuai
7443197a63 [CI] Improve Docs CI Efficiency (#3587)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-14 19:57:00 -08:00
Yineng Zhang
31eec35ba8 fix doc (#3558) 2025-02-14 10:11:31 +08:00
Didier Durand
1e2cf2b541 fix server_arguments typo (#3499) 2025-02-12 02:59:53 +08:00
Jackmin801
5f0e7de339 [Feat] Return hidden states (experimental) (#3364)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-10 15:54:37 -08:00
Yineng Zhang
4d2dbeaca7 remove cutex dependency (#3422) 2025-02-09 18:33:20 +08:00
Baizhou Zhang
70817a7eae [Feature] Define backends and add Triton backend for Lora (#3161)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2025-02-03 22:09:13 -08:00
simveit
7b5a374114 Update server args doc (#3273)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
2025-02-03 23:39:41 +00:00
Jhin
656f7fc1bc Docs: Quick fix for Speculative_decoding doc (#3228)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-01-31 08:30:40 -08:00
Jhin
7b9b4f4426 Docs fix about EAGLE and streaming output (#3166)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jhin <jhinpan@umich.edu>
2025-01-27 18:10:45 -08:00
Jhin
9472e69963 Doc: Add Docs about EAGLE speculative decoding (#3144)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-01-26 17:49:13 -08:00
YAMY
b045841bae Feature/function calling update (#2700)
Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
2025-01-26 09:57:51 -08:00
simveit
1c4e0d2445 Docs: Update doc for server arguments (#2742)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-01-23 11:32:05 -08:00
Chayenne
0ffcfdf474 Docs: Only use X-Grammar in structed output (#2991) 2025-01-19 20:22:47 -08:00
Enrique Shockwave
3bcf5ecea7 support regex in xgrammar backend (#2983) 2025-01-20 04:34:41 +08:00
Lianmin Zheng
8b6ce52e92 Support multi-node DP attention (#2925)
Co-authored-by: dhou-xai <dhou@x.ai>
2025-01-16 11:15:00 -08:00
Shi Shuai
c4f9707e16 Improve: Token-In Token-Out Usage for RLHF (#2843) 2025-01-11 15:14:26 -08:00
Chayenne
2e6346fc2e Docs:Update the style of llma 3.1 405B docs (#2789) 2025-01-08 01:07:54 -08:00
mlmz
977f785dad Docs: Rewrite docs for LLama 405B and ModelSpace (#2773)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-01-08 00:02:59 -08:00
Shi Shuai
062c48d2bd [Docs] Add Support for Pydantic Structured Output Format (#2697) 2025-01-01 15:08:43 -08:00
Shi Shuai
0a765bbccc Docs: Refactor Contribution Guide (#2690) 2024-12-31 14:11:00 -08:00
Lianmin Zheng
bdd2827a80 Update structured_outputs.ipynb (#2666) 2024-12-30 00:46:41 -08:00
Lianmin Zheng
8c3b420eec [Docs] clean up structured outputs docs (#2654) 2024-12-29 23:57:16 -08:00
Adarsh Shirawalmath
fd34f2da35 [Docs] Add EBNF to sampling params docs (#2609) 2024-12-29 00:05:00 -08:00
Tanjiro
8ee9a8501a [Feature] Function Calling (#2544)
Co-authored-by: Haoyu Wang <120358163+HaoyuWang4188@users.noreply.github.com>
2024-12-28 21:58:52 -08:00
Shi Shuai
333e3bfde5 [docs]Refactor constrained decoding tutorial (#2633) 2024-12-28 07:00:38 -08:00
Shi Shuai
239c9d4d3a Docs: Add constrained decoding tutorial (#2614)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2024-12-27 23:54:28 -08:00
Lianmin Zheng
773951548d Fix logprob_start_len for multi modal models (#2597)
Co-authored-by: libra <lihu723@gmail.com>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: Wang, Haoyu <haoyu.wang@intel.com>
2024-12-26 06:27:45 -08:00
Shi Shuai
25e5d589e3 Doc: Update Grammar Backend (#2545)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2024-12-22 17:14:40 -08:00
Chayenne
786be44da5 Fix Docs CI When Compile Error (#2323) 2024-12-04 11:19:46 -08:00
Chayenne
7d5d1d3d29 udate weights from disk (#2265) 2024-11-30 01:17:00 +00:00