96 Commits

Author SHA1 Message Date
Yineng Zhang
5d86016855 revert "Docs: Reorngaize dpsk links #3900" (#3933) 2025-02-27 08:57:13 -08:00
Baizhou Zhang
3e02526b1f [Doc] Add experimental tag for flashinfer mla (#3925) 2025-02-27 01:55:36 -08:00
Stefan He
d8a98a2cad [Docs] Improve DPSK docs in dark mode (#3914) 2025-02-27 00:13:04 -08:00
Baizhou Zhang
71ed01833d [doc] Update document for flashinfer mla (#3907) 2025-02-26 20:40:45 -08:00
Chayenne
7c1692aa90 Docs: Reorngaize dpsk links (#3900) 2025-02-26 15:16:31 -08:00
Chayenne
8f019c7d1a Docs: Move dpsk docs forward a step (#3894) 2025-02-26 11:43:20 -08:00
Shenggui Li
3dc9ff3ce8 [doc] fixed dpsk quant faq (#3865) 2025-02-25 19:40:47 -08:00
Shenggui Li
06427dfab1 [doc] added quantization doc for dpsk (#3843) 2025-02-25 09:43:28 -08:00
Shenggui Li
c0bb9eb3b3 [improve] made timeout configurable (#3803) 2025-02-25 00:26:08 -08:00
Baizhou Zhang
4d2a88bdff [Docs]Add instruction for manually stopping nsys profiler (#3795)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-23 13:21:48 -08:00
Mick
45205d88a0 bench: Add MMMU benchmark for vLM (#3562) 2025-02-22 08:10:59 -08:00
simveit
20b765a26e Model: Support Qwen 72B RM model. (#3772) 2025-02-21 14:38:21 -08:00
simveit
4592afc27d Docs: Fix layout to docs (#3733) 2025-02-21 11:24:13 -08:00
Shakhizat Nurgaliyev
d8d75d256a Change description of nvidia jetson docs (#3761) 2025-02-21 20:44:22 +08:00
Shenggui Li
c6a4852136 [docs] added torch.compile cache to dpsk manual (#3737) 2025-02-21 00:11:40 -08:00
Baizhou Zhang
ac05310098 [Docs] Modify ep related server args and remove cublas part of deepseek (#3732) 2025-02-21 03:37:56 +08:00
Chayenne
3c7bfd7eab Docs: Fix layout with sub-section (#3710) 2025-02-19 15:44:30 -08:00
Baizhou Zhang
67fc595bb8 [Feature] Apply Cublas Grouped Gemm kernel (#3629) 2025-02-18 15:18:31 +08:00
ybyang
c51dc2cc8d Docs: Deploy multi-node inference (LWS method) using sglang in a K8s cluster (#3624) 2025-02-17 18:14:20 -08:00
Shenggui Li
c9565e49e7 [docker] added rdma support (#3619) 2025-02-17 15:36:16 +08:00
Shi Shuai
d03c4c25a7 [docs] Update sampling_params.md (#3617) 2025-02-16 18:52:30 -08:00
simveit
8f13377dea Draft of updated doc for sampling params. (#3260)
Co-authored-by: shuaills <shishuaicareer@gmail.com>
2025-02-16 14:28:22 -08:00
Mick
bcc213df61 Model: Support Qwen 2.5 vl (#3258) 2025-02-16 00:58:53 -08:00
Jhin
bf2a70872e Update DeepSeek V3 Doc (#3541) 2025-02-12 23:15:37 -08:00
Zachary Streeter
8adbc78b30 added llama and cleaned up (#3503) 2025-02-12 18:48:30 +08:00
Mick
ced680663c doc: Support a new vLM (#3405)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-12 00:43:14 -08:00
Zachary Streeter
2491cc928d add deepseek-v3 amd docker command (#3495) 2025-02-12 03:03:08 +08:00
Didier Durand
9490d15772 fix supported_models Qwen typo (#3498) 2025-02-12 02:59:18 +08:00
Didier Durand
eefcbdd353 fix deepseek_v3 typo (#3497) 2025-02-12 02:58:36 +08:00
Ying Sheng
52a492a16e Update contribution_guide.md (#3452) 2025-02-10 12:53:47 +08:00
Shi Shuai
20cf910d8f [docs] Update quantization documentation (#3437)
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: jamessand <shazhizhou0@gmail.com>
2025-02-09 10:39:49 -08:00
Wenxuan Tan
0af1d239cb [Docs] Add quantization docs (#3410)
Co-authored-by: yinfan98 <1106310035@qq.com>
2025-02-10 02:16:21 +08:00
Shi Shuai
6702592d0e [docs] Add multi-node inference example for SLURM in documentation (#3408)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: aflah02 <aflah20082@iiitd.ac.in>
2025-02-08 21:45:14 -08:00
Zachary Streeter
0a6f18f068 added amd_configure.md to references (#3275)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-07 08:50:49 -08:00
Shi Shuai
591e751e07 Fix: Runtime error for function calling (#3300) 2025-02-06 20:52:01 -08:00
Chayenne
76ca91dff2 Docs/CI: Enable Fake Finish for Docs Only PR (#3350) 2025-02-06 19:33:31 -08:00
Liangjun Song
455bfe8dd3 Add a Doc about guide on nvidia jetson #3182 (#3205)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-02 20:29:10 -08:00
Chayenne
55f5fc68ac Docs: Update accuracy evaluation (#3261) 2025-02-02 11:14:59 -08:00
simveit
c27c378a19 docs/accuracy evaluation (#3114)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-02 11:01:39 -08:00
Wenxuan Tan
d7c0b32f4d [Docs] Add more details to profiling docs (#3221) 2025-01-31 15:59:28 -08:00
Ravi Theja
9829e77e3f Docs: Update supported models with Mistral 3 (#3229)
Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
2025-01-31 00:01:46 -08:00
Mick
9f635ea50d [Fix] Address remaining issues of supporting MiniCPMV (#2977) 2025-01-28 00:22:13 -08:00
Adarsh Shirawalmath
4505a43614 [Docs] minor update for phi-3 and phi-4 (#3096) 2025-01-24 04:00:20 -08:00
Baizhou Zhang
b3393e941f [Doc] Update doc of profiling with PyTorch Profiler (#3038) 2025-01-22 14:17:26 -08:00
Hongpeng Guo
949b3fbfce [Doc] Update doc of custom logit processor (#3021)
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
2025-01-20 16:50:25 -08:00
Chayenne
2584f6d944 Docs: Add Performance Demonstaration for DPA (#3005) 2025-01-20 01:00:52 -08:00
Lianmin Zheng
03464890e0 Separate two entry points: Engine and HTTP server (#2996)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
2025-01-19 22:09:24 -08:00
Enrique Shockwave
3bcf5ecea7 support regex in xgrammar backend (#2983) 2025-01-20 04:34:41 +08:00
Yineng Zhang
def5c31873 docs: update supported_models (#2987) 2025-01-20 00:44:30 +08:00
Mick
3d93f84a00 [Feature] Support minicpmv v2.6 (#2785)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-01-18 14:14:19 -08:00