lukec | ffa1b3e318 | Add an example of using deepseekv3 int8 sglang. (#4177); Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> | 2025-03-07 01:56:09 -08:00
Pan Lyu | 361971b859 | Add Support for Qwen2-VL Multi-modal Embedding Models (#3694) | 2025-03-06 16:46:20 -08:00
Chayenne | ebddb65aed | Docs: add torch compile cache (#4151); Co-authored-by: ybyang <ybyang7@iflytek.com> | 2025-03-06 14:27:09 -08:00
Adarsh Shirawalmath | 19fd57bcd7 | [docs] fix HF reference script command (#4148) | 2025-03-06 13:21:54 -08:00
samzong | d2d0d061d9 | fix cross-reference error and spelling mistakes (#4101); Signed-off-by: samzong <samzong.lu@gmail.com> | 2025-03-05 16:39:02 -08:00
Yineng Zhang | 0aaccbbfec | revert deepseek docs (#4109) | 2025-03-05 13:23:11 -08:00
Chayenne | e70fa279bc | Docs: reorganize dpsk docs (#4108) | 2025-03-05 13:01:03 -08:00
Tommy Yang | abe74b7b59 | Docs: Add DeepSeek optimization ablations documentation (#4107); Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> | 2025-03-05 12:25:51 -08:00
Baizhou Zhang | fc91d08a8f | [Revision] Add fast decode plan for flashinfer mla (#4012) | 2025-03-05 11:20:41 -08:00
Xihuai Wang | 95575aa76a | Reasoning parser (#4000); Co-authored-by: Lucas Pickup <lupickup@microsoft.com> | 2025-03-03 21:16:36 -08:00
Chayenne | 146ac8df07 | Add examples in sampling parameters (#4039) | 2025-03-03 13:04:32 -08:00
Lianmin Zheng | 935cda944b | Misc clean up; Remove the support of jump forward (#4032) | 2025-03-03 07:02:14 -08:00
Yudi Xue | a7000a7650 | Update metrics documentation (#3264) | 2025-03-03 05:03:58 -08:00
Lianmin Zheng | 9e1014cf99 | Revert "Add fast decode plan for flashinfer mla" (#4008) | 2025-03-02 19:29:10 -08:00
Baizhou Zhang | fa56106731 | Add fast decode plan for flashinfer mla (#3987) | 2025-03-02 19:16:37 -08:00
Yineng Zhang | 5d86016855 | revert "Docs: Reorngaize dpsk links #3900" (#3933) | 2025-02-27 08:57:13 -08:00
Baizhou Zhang | 3e02526b1f | [Doc] Add experimental tag for flashinfer mla (#3925) | 2025-02-27 01:55:36 -08:00
Stefan He | d8a98a2cad | [Docs] Improve DPSK docs in dark mode (#3914) | 2025-02-27 00:13:04 -08:00
Baizhou Zhang | 71ed01833d | [doc] Update document for flashinfer mla (#3907) | 2025-02-26 20:40:45 -08:00
Chayenne | 7c1692aa90 | Docs: Reorngaize dpsk links (#3900) | 2025-02-26 15:16:31 -08:00
Chayenne | 8f019c7d1a | Docs: Move dpsk docs forward a step (#3894) | 2025-02-26 11:43:20 -08:00
Shenggui Li | 3dc9ff3ce8 | [doc] fixed dpsk quant faq (#3865) | 2025-02-25 19:40:47 -08:00
Shenggui Li | 06427dfab1 | [doc] added quantization doc for dpsk (#3843) | 2025-02-25 09:43:28 -08:00
Shenggui Li | c0bb9eb3b3 | [improve] made timeout configurable (#3803) | 2025-02-25 00:26:08 -08:00
Baizhou Zhang | 4d2a88bdff | [Docs]Add instruction for manually stopping nsys profiler (#3795); Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> | 2025-02-23 13:21:48 -08:00
Mick | 45205d88a0 | bench: Add MMMU benchmark for vLM (#3562) | 2025-02-22 08:10:59 -08:00
simveit | 20b765a26e | Model: Support Qwen 72B RM model. (#3772) | 2025-02-21 14:38:21 -08:00
simveit | 4592afc27d | Docs: Fix layout to docs (#3733) | 2025-02-21 11:24:13 -08:00
Shakhizat Nurgaliyev | d8d75d256a | Change description of nvidia jetson docs (#3761) | 2025-02-21 20:44:22 +08:00
Shenggui Li | c6a4852136 | [docs] added torch.compile cache to dpsk manual (#3737) | 2025-02-21 00:11:40 -08:00
Baizhou Zhang | ac05310098 | [Docs] Modify ep related server args and remove cublas part of deepseek (#3732) | 2025-02-21 03:37:56 +08:00
Chayenne | 3c7bfd7eab | Docs: Fix layout with sub-section (#3710) | 2025-02-19 15:44:30 -08:00
Baizhou Zhang | 67fc595bb8 | [Feature] Apply Cublas Grouped Gemm kernel (#3629) | 2025-02-18 15:18:31 +08:00
ybyang | c51dc2cc8d | Docs: Deploy multi-node inference (LWS method) using sglang in a K8s cluster (#3624) | 2025-02-17 18:14:20 -08:00
Shenggui Li | c9565e49e7 | [docker] added rdma support (#3619) | 2025-02-17 15:36:16 +08:00
Shi Shuai | d03c4c25a7 | [docs] Update sampling_params.md (#3617) | 2025-02-16 18:52:30 -08:00
simveit | 8f13377dea | Draft of updated doc for sampling params. (#3260); Co-authored-by: shuaills <shishuaicareer@gmail.com> | 2025-02-16 14:28:22 -08:00
Mick | bcc213df61 | Model: Support Qwen 2.5 vl (#3258) | 2025-02-16 00:58:53 -08:00
Jhin | bf2a70872e | Update DeepSeek V3 Doc (#3541) | 2025-02-12 23:15:37 -08:00
Zachary Streeter | 8adbc78b30 | added llama and cleaned up (#3503) | 2025-02-12 18:48:30 +08:00
Mick | ced680663c | doc: Support a new vLM (#3405); Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> | 2025-02-12 00:43:14 -08:00
Zachary Streeter | 2491cc928d | add deepseek-v3 amd docker command (#3495) | 2025-02-12 03:03:08 +08:00
Didier Durand | 9490d15772 | fix supported_models Qwen typo (#3498) | 2025-02-12 02:59:18 +08:00
Didier Durand | eefcbdd353 | fix deepseek_v3 typo (#3497) | 2025-02-12 02:58:36 +08:00
Ying Sheng | 52a492a16e | Update contribution_guide.md (#3452) | 2025-02-10 12:53:47 +08:00
Shi Shuai | 20cf910d8f | [docs] Update quantization documentation (#3437); Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>; Co-authored-by: jamessand <shazhizhou0@gmail.com> | 2025-02-09 10:39:49 -08:00
Wenxuan Tan | 0af1d239cb | [Docs] Add quantization docs (#3410); Co-authored-by: yinfan98 <1106310035@qq.com> | 2025-02-10 02:16:21 +08:00
Shi Shuai | 6702592d0e | [docs] Add multi-node inference example for SLURM in documentation (#3408); Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>; Co-authored-by: aflah02 <aflah20082@iiitd.ac.in> | 2025-02-08 21:45:14 -08:00
Zachary Streeter | 0a6f18f068 | added amd_configure.md to references (#3275); Co-authored-by: HAI <hixiao@gmail.com>; Co-authored-by: Yineng Zhang <me@zhyncs.com>; Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> | 2025-02-07 08:50:49 -08:00
Shi Shuai | 591e751e07 | Fix: Runtime error for function calling (#3300) | 2025-02-06 20:52:01 -08:00