Commit Graph

68 Commits

Author SHA1 Message Date
yhyang201
a85363c199 [docs] Instructions for bench_serving.py (#9071)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-26 18:30:57 -07:00
Lianmin Zheng
2e8e7e353b Improve docs and developer guide (#9044) 2025-08-10 21:05:18 -07:00
Lianmin Zheng
2449a0afe2 Refactor the docs (#9031) 2025-08-10 19:49:45 -07:00
Lianmin Zheng
706bd69cc5 Clean up server_args.py to have a dedicated function for model specific adjustments (#8983) 2025-08-08 19:56:50 -07:00
Kevin Xiang Li
44d600cd67 Support precomputed_embeddings for Llama 4 (#8156)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-07-27 01:14:49 -07:00
Lianmin Zheng
0f218731e3 Do not run frontend_reasoning.ipynb to reduce the CI load (#7073) 2025-06-10 17:15:31 -07:00
Yudi Xue
14c18d25df Frontend language separate reasoning support (#6031) 2025-06-10 17:11:29 -07:00
Lianmin Zheng
bb185b0e92 Update README.md (#7040) 2025-06-10 01:59:14 -07:00
Marc Sun
37f1547587 [FEAT] Add transformers backend support (#5929) 2025-06-03 21:05:29 -07:00
linzhuo
7a0bbe6a64 update toc for doc and dockerfile code style format (#6450)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-05-27 13:05:11 +08:00
simveit
e235be16fe Fix some issues with current docs. (#6588) 2025-05-26 01:04:34 +08:00
Byron Hsu
7513558074 [PD] Add doc and simplify sender.send (#6019) 2025-05-21 21:22:21 -07:00
Mick
cd7c8a8de6 doc: update developer guide regarding mllms (#6138)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: XinyuanTong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
2025-05-14 23:13:13 +08:00
Lianmin Zheng
e8e18dcdcc Revert "fix some typos" (#6244) 2025-05-12 12:53:26 -07:00
applesaucethebun
d738ab52f8 fix some typos (#6209)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-13 01:42:38 +08:00
江家瑋
ad506a4e6b docs: Fix Qwen model typo (#5944)
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
2025-05-01 10:23:00 -07:00
Lianmin Zheng
155890e4d1 [Minor] fix documentations (#5756) 2025-04-26 17:48:43 -07:00
Baizhou Zhang
072b4d0398 Add document for LoRA serving (#5521) 2025-04-20 14:37:57 -07:00
mlmz
f13d65a7ea Doc: fix problems of the 'Execute Notebooks / run-all-notebooks' ci caused by the unstability of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (#5503) 2025-04-17 11:37:43 -07:00
Ying Sheng
d7bc19a46a add multi-lora feature in README.md (#5463) 2025-04-16 03:25:25 -07:00
mRSun15
3efc8e2d2a add attention backend supporting matrix in the doc (#5211)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
2025-04-15 17:16:34 -07:00
Adarsh Shirawalmath
4aa6bab0b0 [Docs] Supported Model Docs - Major restructuring (#5290)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-04-11 09:17:47 -07:00
mlmz
7c5658c189 feat: disable grammar restrictions within reasoning sections (#4984)
Co-authored-by: tianhaoyu <thy@mail.ecust.edu.cn>
Co-authored-by: DarkSharpness <2040703891@qq.com>
2025-04-07 21:46:47 -07:00
Ke Bao
ade714a67f Add Llama4 user guide (#5133)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
2025-04-07 19:09:34 -07:00
Lianmin Zheng
c38ca4fc8e Update readme (#4517) 2025-03-17 08:22:42 -07:00
Yineng Zhang
00f42707ea update doc (#4299) 2025-03-11 01:14:16 -07:00
Chayenne
e70fa279bc Docs: reorganize dpsk docs (#4108) 2025-03-05 13:01:03 -08:00
Tommy Yang
abe74b7b59 Docs: Add DeepSeek optimization ablations documentation (#4107)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-05 12:25:51 -08:00
Xihuai Wang
95575aa76a Reasoning parser (#4000)
Co-authored-by: Lucas Pickup <lupickup@microsoft.com>
2025-03-03 21:16:36 -08:00
simveit
acd1a15921 Docs: Implemented frontend docs (#3791)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 15:30:05 -08:00
Chayenne
8f019c7d1a Docs: Move dpsk docs forward a step (#3894) 2025-02-26 11:43:20 -08:00
Chayenne
3c7bfd7eab Docs: Fix layout with sub-section (#3710) 2025-02-19 15:44:30 -08:00
Shi Shuai
55de40f782 [Docs]: Fix Multi-User Port Allocation Conflicts (#3601)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: simveit <simp.veitner@gmail.com>
2025-02-19 11:15:44 -08:00
ybyang
c51dc2cc8d Docs: Deploy multi-node inference (LWS method) using sglang in a K8s cluster (#3624) 2025-02-17 18:14:20 -08:00
Shi Shuai
7443197a63 [CI] Improve Docs CI Efficiency (#3587)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-14 19:57:00 -08:00
Wenxuan Tan
0af1d239cb [Docs] Add quantization docs (#3410)
Co-authored-by: yinfan98 <1106310035@qq.com>
2025-02-10 02:16:21 +08:00
Zachary Streeter
0a6f18f068 added amd_configure.md to references (#3275)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-07 08:50:49 -08:00
Chayenne
76ca91dff2 Docs/CI: Enable Fake Finish for Docs Only PR (#3350) 2025-02-06 19:33:31 -08:00
Liangjun Song
455bfe8dd3 Add a Doc about guide on nvidia jetson #3182 (#3205)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-02 20:29:10 -08:00
simveit
c27c378a19 docs/accuracy evaluation (#3114)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-02 11:01:39 -08:00
Jhin
9472e69963 Doc: Add Docs about EAGLE speculative decoding (#3144)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-01-26 17:49:13 -08:00
Chayenne
1acc1f561a [Docs]: Add function calling in index.rst (#3155) 2025-01-26 11:11:27 -08:00
Shi Shuai
c4f9707e16 Improve: Token-In Token-Out Usage for RLHF (#2843) 2025-01-11 15:14:26 -08:00
Xiaotong Jiang
11fffbc95a [Doc]: Deepseek reference docs (#2787) 2025-01-09 13:43:12 -08:00
Chayenne
2e6346fc2e Docs:Update the style of llma 3.1 405B docs (#2789) 2025-01-08 01:07:54 -08:00
mlmz
977f785dad Docs: Rewrite docs for LLama 405B and ModelSpace (#2773)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-01-08 00:02:59 -08:00
Shi Shuai
062c48d2bd [Docs] Add Support for Pydantic Structured Output Format (#2697) 2025-01-01 15:08:43 -08:00
Chayenne
0d8d97b8e6 Doc: Rename contribution_guide.md (#2691) 2024-12-31 14:35:48 -08:00
Lianmin Zheng
bdd2827a80 Update structured_outputs.ipynb (#2666) 2024-12-30 00:46:41 -08:00
Shi Shuai
239c9d4d3a Docs: Add constrained decoding tutorial (#2614)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2024-12-27 23:54:28 -08:00