278 Commits

Author SHA1 Message Date
Shenggui Li
c6a4852136 [docs] added torch.compile cache to dpsk manual (#3737) 2025-02-21 00:11:40 -08:00
Baizhou Zhang
ac05310098 [Docs] Modify ep related server args and remove cublas part of deepseek (#3732) 2025-02-21 03:37:56 +08:00
Chayenne
3c7bfd7eab Docs: Fix layout with sub-section (#3710) 2025-02-19 15:44:30 -08:00
Shi Shuai
55de40f782 [Docs]: Fix Multi-User Port Allocation Conflicts (#3601)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: simveit <simp.veitner@gmail.com>
2025-02-19 11:15:44 -08:00
Baizhou Zhang
67fc595bb8 [Feature] Apply Cublas Grouped Gemm kernel (#3629) 2025-02-18 15:18:31 +08:00
ybyang
c51dc2cc8d Docs: Deploy multi-node inference (LWS method) using sglang in a K8s cluster (#3624) 2025-02-17 18:14:20 -08:00
Yineng Zhang
a5375adc3a chore: bump v0.4.3.post2 (#3645)
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
2025-02-18 02:48:30 +08:00
Yineng Zhang
75d171a9c5 chore: update flashinfer v0.2.1.post2 (#3644) 2025-02-18 02:47:42 +08:00
Yineng Zhang
e782eb7e6a chore: bump v0.4.3.post1 (#3638) 2025-02-17 21:58:19 +08:00
Shenggui Li
c9565e49e7 [docker] added rdma support (#3619) 2025-02-17 15:36:16 +08:00
Shi Shuai
d03c4c25a7 [docs] Update sampling_params.md (#3617) 2025-02-16 18:52:30 -08:00
simveit
8f13377dea Draft of updated doc for sampling params. (#3260)
Co-authored-by: shuaills <shishuaicareer@gmail.com>
2025-02-16 14:28:22 -08:00
Mick
bcc213df61 Model: Support Qwen 2.5 vl (#3258) 2025-02-16 00:58:53 -08:00
Shenggui Li
231c40d859 [docs] added favicon to sphinx html (#3564) 2025-02-15 10:21:21 -08:00
Mick
7711ac6ed0 doc: emphasize and notify the usage of chat_template (#3589)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-15 00:10:32 -08:00
Shi Shuai
7443197a63 [CI] Improve Docs CI Efficiency (#3587)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-14 19:57:00 -08:00
Yineng Zhang
4e23c961e8 docs: update install (#3581) 2025-02-14 18:54:50 +08:00
Yineng Zhang
31eec35ba8 fix doc (#3558) 2025-02-14 10:11:31 +08:00
Yineng Zhang
ac963be234 update flashinfer-python (#3557) 2025-02-14 09:52:56 +08:00
Yineng Zhang
e0b9a423c8 chore: bump v0.4.3 (#3556) 2025-02-14 09:43:14 +08:00
simveit
368de3661e Update install docs (#3553)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-13 13:42:51 -08:00
Jhin
bf2a70872e Update DeepSeek V3 Doc (#3541) 2025-02-12 23:15:37 -08:00
Zachary Streeter
8adbc78b30 added llama and cleaned up (#3503) 2025-02-12 18:48:30 +08:00
Mick
ced680663c doc: Support a new vLM (#3405)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-12 00:43:14 -08:00
Yineng Zhang
2f48221033 docs: update install 2025-02-12 03:13:31 +08:00
Zachary Streeter
2491cc928d add deepseek-v3 amd docker command (#3495) 2025-02-12 03:03:08 +08:00
Didier Durand
67c5de9286 fix router typo (#3496) 2025-02-12 03:00:57 +08:00
Didier Durand
1e2cf2b541 fix server_arguments typo (#3499) 2025-02-12 02:59:53 +08:00
Didier Durand
9490d15772 fix supported_models Qwen typo (#3498) 2025-02-12 02:59:18 +08:00
Didier Durand
eefcbdd353 fix deepseek_v3 typo (#3497) 2025-02-12 02:58:36 +08:00
Jackmin801
5f0e7de339 [Feat] Return hidden states (experimental) (#3364)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-10 15:54:37 -08:00
Yineng Zhang
cddb1cdf8f chore: bump v0.4.2.post4 (#3459) 2025-02-10 14:12:16 +08:00
Yineng Zhang
27c4c9cf52 remove _grouped_size_compiled_for_decode_kernels (#3453) 2025-02-10 13:01:21 +08:00
Ying Sheng
52a492a16e Update contribution_guide.md (#3452) 2025-02-10 12:53:47 +08:00
Shi Shuai
20cf910d8f [docs] Update quantization documentation (#3437)
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: jamessand <shazhizhou0@gmail.com>
2025-02-09 10:39:49 -08:00
Wenxuan Tan
0af1d239cb [Docs] Add quantization docs (#3410)
Co-authored-by: yinfan98 <1106310035@qq.com>
2025-02-10 02:16:21 +08:00
Yineng Zhang
4d2dbeaca7 remove cutex dependency (#3422) 2025-02-09 18:33:20 +08:00
Shi Shuai
6702592d0e [docs] Add multi-node inference example for SLURM in documentation (#3408)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: aflah02 <aflah20082@iiitd.ac.in>
2025-02-08 21:45:14 -08:00
Zachary Streeter
0a6f18f068 added amd_configure.md to references (#3275)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-07 08:50:49 -08:00
Yineng Zhang
c1f5f99f60 chore: bump v0.4.2.post3 (#3369) 2025-02-07 08:20:03 -08:00
Shi Shuai
591e751e07 Fix: Runtime error for function calling (#3300) 2025-02-06 20:52:01 -08:00
Chayenne
76ca91dff2 Docs/CI: Enable Fake Finish for Docs Only PR (#3350) 2025-02-06 19:33:31 -08:00
Yineng Zhang
7aad8d1854 chore: bump v0.4.2.post2 (#3313) 2025-02-05 17:35:02 +08:00
Yineng Zhang
6186a8f889 update flashinfer install index url (#3293) 2025-02-05 00:44:35 +08:00
HAI
2c1a695ff1 ROCm: sgl-kernel enablement starting with sgl_moe_align_block (#3287) 2025-02-04 21:44:44 +08:00
Baizhou Zhang
70817a7eae [Feature] Define backends and add Triton backend for Lora (#3161)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2025-02-03 22:09:13 -08:00
simveit
7b5a374114 Update server args doc (#3273)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
2025-02-03 23:39:41 +00:00
Liangjun Song
455bfe8dd3 Add a Doc about guide on nvidia jetson #3182 (#3205)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-02 20:29:10 -08:00
HAI
566d61d90f ROCm: bump 6.3.0 (#3259) 2025-02-03 04:13:40 +08:00
Chayenne
55f5fc68ac Docs: Update accuracy evaluation (#3261) 2025-02-02 11:14:59 -08:00