2284 Commits

Author SHA1 Message Date
lukec
ffa1b3e318 Add an example of using deepseekv3 int8 sglang. (#4177)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-07 01:56:09 -08:00
Yineng Zhang
7e3bb52705 update release-pypi-kernel 2025-03-07 01:48:47 -08:00
Yineng Zhang
96263f275c chore: bump v0.0.3.post7 for sgl-kernel (#4176) 2025-03-07 01:15:34 -08:00
Zhiqiang Xie
9376ac361d Memory pool fix for upstream change about eagle (#4170) 2025-03-07 00:58:20 -08:00
Yineng Zhang
94a2b9d33e Put utils in ifndef USE_ROCM to fix CI (#4167) (#4168)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
2025-03-07 00:01:17 -08:00
Stefan He
3c3eb374b2 Remove non-existent AMD header include (#4166) 2025-03-06 23:29:30 -08:00
Michael Yao
d557319a8b [Docs] Fix links and grammar issues (#4162) 2025-03-06 23:14:18 -08:00
Stefan He
95085d65e9 [Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163) 2025-03-06 22:58:52 -08:00
HandH1998
c7f254468f [Feature] DeepSeek V3/R1 INT8 Quantization (channel-wise) (#3888)
Co-authored-by: yych0745 <1398089567@qq.com>
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: b0urnee <2769086541@qq.com>
2025-03-06 20:54:52 -08:00
Stefan He
63ee26d162 Add sgl_per_token_quant_fp8 (#4089) 2025-03-06 20:53:05 -08:00
Xiaoyu Zhang
ad55f17182 [quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786) 2025-03-06 18:05:43 -08:00
Pan Lyu
361971b859 Add Support for Qwen2-VL Multi-modal Embedding Models (#3694) 2025-03-06 16:46:20 -08:00
HAI
13bc39c5d6 ROCm: enable trillion-parameter MoE models with INT4-FP8 single node (#4152) 2025-03-06 15:33:02 -08:00
Chayenne
9854a18a51 Hot fix small vocal eagle in docs (#4154)
Co-authored-by: ybyang <ybyang7@iflytek.com>
2025-03-06 15:13:26 -08:00
Chayenne
ebddb65aed Docs: add torch compile cache (#4151)
Co-authored-by: ybyang <ybyang7@iflytek.com>
2025-03-06 14:27:09 -08:00
Adarsh Shirawalmath
19fd57bcd7 [docs] fix HF reference script command (#4148) 2025-03-06 13:21:54 -08:00
Lianmin Zheng
9c58e68b4c Release v0.4.3.post4 (#4140) 2025-03-06 12:50:28 -08:00
Oliver Stanley
d03b3467b8 Fix constrained generation errors by adding datasets dependency (#4142) 2025-03-06 12:07:51 -08:00
yinfan98
ab7fba0ece Fix nightly ci Gsm8k & Fix flashinfer backend kvcache quant (#4147) 2025-03-06 11:50:07 -08:00
Lianmin Zheng
bc1534ff32 Fix a draft model accuracy bug in eagle; support step=1; return logprob in eagle (#4134)
Co-authored-by: Sehoon Kim <kssteven418@gmail.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Sehoon Kim <sehoon@x.ai>
2025-03-06 06:13:59 -08:00
Lzhang-hub
3a3918121f fix bench serving bug (#4135) 2025-03-06 05:34:02 -08:00
Lianmin Zheng
800bf018fb Update CODEOWNER (#4138) 2025-03-06 03:42:10 -08:00
kk
b16af90bc3 AMD/ROCm: update base image string (#4137)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: yichiche <yichiche@amd.com>
2025-03-06 03:38:54 -08:00
Lianmin Zheng
98c73d71cb [Minor] make the __init__ function of model_runner.py shorter (#4132) 2025-03-06 01:51:12 -08:00
Lianmin Zheng
fcc2e37f69 Split the __init__ of scheduler as smaller functions. Improve the eagle tests (#4128) 2025-03-06 00:13:20 -08:00
Liu Jinjie
0804dd11a0 remove unused max_jobs in setup_rocm.py (#4126)
Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>
2025-03-06 00:12:19 -08:00
saienduri
55dc8e4d52 Add tag suffix to nightly docker builds. (#4129) 2025-03-05 23:22:36 -08:00
Ying Sheng
02e9e9f1cf Add codeowners for eagle implementations (#4131)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Sehoon Kim <kssteven418@gmail.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-03-05 23:16:49 -08:00
simveit
8f0b63139e Docs: improve EAGLE docs (#4038) 2025-03-05 22:40:21 -08:00
samzong
b9b3b098b9 feat: support docs auto live-reload with sphinx-autobuild (#4111)
Signed-off-by: samzong <samzong.lu@gmail.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-05 22:39:34 -08:00
Zhiqiang Xie
aee30630d8 Add a pointer to the real KV cache pool (#4113) 2025-03-05 21:39:07 -08:00
Lianmin Zheng
286e6540a6 Remove prefill-only-one-req (#4117) 2025-03-05 20:58:48 -08:00
Wenxuan Tan
718c391fd7 [Hotfix] Fix incomplete token_to_kv_pool refactor (#4121) 2025-03-05 19:32:42 -08:00
Yineng Zhang
fc671f66c1 chore: bump v0.4.3.post3 (#4114) 2025-03-05 17:26:10 -08:00
samzong
197751e9a1 fix Non-consecutive header level increase in docs/router/router.md (#4099)
Signed-off-by: samzong <samzong.lu@gmail.com>
2025-03-05 17:02:32 -08:00
samzong
d2d0d061d9 fix cross-reference error and spelling mistakes (#4101)
Signed-off-by: samzong <samzong.lu@gmail.com>
2025-03-05 16:39:02 -08:00
Yueyang Pan
25482edb5c Online serving benchmarks of real datasets for hierarchical KV caching (#3211)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-03-05 16:16:43 -08:00
luzengxiangcn
62b362b1f1 Debug radixcache: refactor recursive helper methods (#3029)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-03-05 16:11:42 -08:00
saienduri
44d7646371 remove testing on PR workflow change (#4110) 2025-03-05 16:03:18 -08:00
saienduri
cd85b78f94 Create release-docker-amd-nightly.yml (#4105) 2025-03-05 14:46:26 -08:00
Yineng Zhang
0aaccbbfec revert deepseek docs (#4109) 2025-03-05 13:23:11 -08:00
Qiaolin Yu
357671e216 Add examples for server token-in-token-out (#4103)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-05 13:16:31 -08:00
Chayenne
e70fa279bc Docs: reorganize dpsk docs (#4108) 2025-03-05 13:01:03 -08:00
Tommy Yang
abe74b7b59 Docs: Add DeepSeek optimization ablations documentation (#4107)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-05 12:25:51 -08:00
Jhin
70b3c6eeb1 Add update_weights_from_disk endpoint to Engine (#4102)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-05 12:25:18 -08:00
Ke Bao
ef9d3b3c2c Fix triton kernel illegal memory issue for eagle (#4100) 2025-03-05 11:23:53 -08:00
Baizhou Zhang
fc91d08a8f [Revision] Add fast decode plan for flashinfer mla (#4012) 2025-03-05 11:20:41 -08:00
HAI
71ab0dabe0 Fix the moe padding conditional logic (#4081) 2025-03-05 10:56:51 -08:00
Ying Sheng
d3d4d76758 [Eagle] Refactor eagle speculative decoding (#3986)
Co-authored-by: Ke Bao <ISPObaoke@163.com>
2025-03-05 08:06:07 -08:00
yigex
5be8f1ed98 ROCM: AITER BLOCK GEMM (#4075) 2025-03-05 03:10:49 -08:00