yhyang201
|
a85363c199
|
[docs] Instructions for bench_serving.py (#9071)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-08-26 18:30:57 -07:00 |
|
Lianmin Zheng
|
2e8e7e353b
|
Improve docs and developer guide (#9044)
|
2025-08-10 21:05:18 -07:00 |
|
Lianmin Zheng
|
2449a0afe2
|
Refactor the docs (#9031)
|
2025-08-10 19:49:45 -07:00 |
|
Lianmin Zheng
|
706bd69cc5
|
Clean up server_args.py to have a dedicated function for model specific adjustments (#8983)
|
2025-08-08 19:56:50 -07:00 |
|
Kevin Xiang Li
|
44d600cd67
|
Support precomputed_embeddings for Llama 4 (#8156)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-07-27 01:14:49 -07:00 |
|
Lianmin Zheng
|
0f218731e3
|
Do not run frontend_reasoning.ipynb to reduce the CI load (#7073)
|
2025-06-10 17:15:31 -07:00 |
|
Yudi Xue
|
14c18d25df
|
Frontend language separate reasoning support (#6031)
|
2025-06-10 17:11:29 -07:00 |
|
Lianmin Zheng
|
bb185b0e92
|
Update README.md (#7040)
|
2025-06-10 01:59:14 -07:00 |
|
Marc Sun
|
37f1547587
|
[FEAT] Add transformers backend support (#5929)
|
2025-06-03 21:05:29 -07:00 |
|
linzhuo
|
7a0bbe6a64
|
update toc for doc and dockerfile code style format (#6450)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-05-27 13:05:11 +08:00 |
|
simveit
|
e235be16fe
|
Fix some issues with current docs. (#6588)
|
2025-05-26 01:04:34 +08:00 |
|
Byron Hsu
|
7513558074
|
[PD] Add doc and simplify sender.send (#6019)
|
2025-05-21 21:22:21 -07:00 |
|
Mick
|
cd7c8a8de6
|
doc: update developer guide regarding mllms (#6138)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: XinyuanTong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
|
2025-05-14 23:13:13 +08:00 |
|
Lianmin Zheng
|
e8e18dcdcc
|
Revert "fix some typos" (#6244)
|
2025-05-12 12:53:26 -07:00 |
|
applesaucethebun
|
d738ab52f8
|
fix some typos (#6209)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-05-13 01:42:38 +08:00 |
|
江家瑋
|
ad506a4e6b
|
docs: Fix Qwen model typo (#5944)
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
|
2025-05-01 10:23:00 -07:00 |
|
Lianmin Zheng
|
155890e4d1
|
[Minor] fix documentations (#5756)
|
2025-04-26 17:48:43 -07:00 |
|
Baizhou Zhang
|
072b4d0398
|
Add document for LoRA serving (#5521)
|
2025-04-20 14:37:57 -07:00 |
|
mlmz
|
f13d65a7ea
|
Doc: fix problems of the 'Execute Notebooks / run-all-notebooks' ci caused by the unstability of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (#5503)
|
2025-04-17 11:37:43 -07:00 |
|
Ying Sheng
|
d7bc19a46a
|
add multi-lora feature in README.md (#5463)
|
2025-04-16 03:25:25 -07:00 |
|
mRSun15
|
3efc8e2d2a
|
add attention backend supporting matrix in the doc (#5211)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
|
2025-04-15 17:16:34 -07:00 |
|
Adarsh Shirawalmath
|
4aa6bab0b0
|
[Docs] Supported Model Docs - Major restructuring (#5290)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-04-11 09:17:47 -07:00 |
|
mlmz
|
7c5658c189
|
feat: disable grammar restrictions within reasoning sections (#4984)
Co-authored-by: tianhaoyu <thy@mail.ecust.edu.cn>
Co-authored-by: DarkSharpness <2040703891@qq.com>
|
2025-04-07 21:46:47 -07:00 |
|
Ke Bao
|
ade714a67f
|
Add Llama4 user guide (#5133)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
|
2025-04-07 19:09:34 -07:00 |
|
Lianmin Zheng
|
c38ca4fc8e
|
Update readme (#4517)
|
2025-03-17 08:22:42 -07:00 |
|
Yineng Zhang
|
00f42707ea
|
update doc (#4299)
|
2025-03-11 01:14:16 -07:00 |
|
Chayenne
|
e70fa279bc
|
Docs: reorganize dpsk docs (#4108)
|
2025-03-05 13:01:03 -08:00 |
|
Tommy Yang
|
abe74b7b59
|
Docs: Add DeepSeek optimization ablations documentation (#4107)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-03-05 12:25:51 -08:00 |
|
Xihuai Wang
|
95575aa76a
|
Reasoning parser (#4000)
Co-authored-by: Lucas Pickup <lupickup@microsoft.com>
|
2025-03-03 21:16:36 -08:00 |
|
simveit
|
acd1a15921
|
Docs: Implemented frontend docs (#3791)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 15:30:05 -08:00 |
|
Chayenne
|
8f019c7d1a
|
Docs: Move dpsk docs forward a step (#3894)
|
2025-02-26 11:43:20 -08:00 |
|
Chayenne
|
3c7bfd7eab
|
Docs: Fix layout with sub-section (#3710)
|
2025-02-19 15:44:30 -08:00 |
|
Shi Shuai
|
55de40f782
|
[Docs]: Fix Multi-User Port Allocation Conflicts (#3601)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: simveit <simp.veitner@gmail.com>
|
2025-02-19 11:15:44 -08:00 |
|
ybyang
|
c51dc2cc8d
|
Docs: Deploy multi-node inference (LWS method) using sglang in a K8s cluster (#3624)
|
2025-02-17 18:14:20 -08:00 |
|
Shi Shuai
|
7443197a63
|
[CI] Improve Docs CI Efficiency (#3587)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-02-14 19:57:00 -08:00 |
|
Wenxuan Tan
|
0af1d239cb
|
[Docs] Add quantization docs (#3410)
Co-authored-by: yinfan98 <1106310035@qq.com>
|
2025-02-10 02:16:21 +08:00 |
|
Zachary Streeter
|
0a6f18f068
|
added amd_configure.md to references (#3275)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-02-07 08:50:49 -08:00 |
|
Chayenne
|
76ca91dff2
|
Docs/CI: Enable Fake Finish for Docs Only PR (#3350)
|
2025-02-06 19:33:31 -08:00 |
|
Liangjun Song
|
455bfe8dd3
|
Add a Doc about guide on nvidia jetson #3182 (#3205)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-02-02 20:29:10 -08:00 |
|
simveit
|
c27c378a19
|
docs/accuracy evaluation (#3114)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-02-02 11:01:39 -08:00 |
|
Jhin
|
9472e69963
|
Doc: Add Docs about EAGLE speculative decoding (#3144)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-01-26 17:49:13 -08:00 |
|
Chayenne
|
1acc1f561a
|
[Docs]: Add function calling in index.rst (#3155)
|
2025-01-26 11:11:27 -08:00 |
|
Shi Shuai
|
c4f9707e16
|
Improve: Token-In Token-Out Usage for RLHF (#2843)
|
2025-01-11 15:14:26 -08:00 |
|
Xiaotong Jiang
|
11fffbc95a
|
[Doc]: Deepseek reference docs (#2787)
|
2025-01-09 13:43:12 -08:00 |
|
Chayenne
|
2e6346fc2e
|
Docs:Update the style of llma 3.1 405B docs (#2789)
|
2025-01-08 01:07:54 -08:00 |
|
mlmz
|
977f785dad
|
Docs: Rewrite docs for LLama 405B and ModelSpace (#2773)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-01-08 00:02:59 -08:00 |
|
Shi Shuai
|
062c48d2bd
|
[Docs] Add Support for Pydantic Structured Output Format (#2697)
|
2025-01-01 15:08:43 -08:00 |
|
Chayenne
|
0d8d97b8e6
|
Doc: Rename contribution_guide.md (#2691)
|
2024-12-31 14:35:48 -08:00 |
|
Lianmin Zheng
|
bdd2827a80
|
Update structured_outputs.ipynb (#2666)
|
2024-12-30 00:46:41 -08:00 |
|
Shi Shuai
|
239c9d4d3a
|
Docs: Add constrained decoding tutorial (#2614)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2024-12-27 23:54:28 -08:00 |
|