lukec | ffa1b3e318 | 2025-03-07 01:56:09 -08:00
Add an example of using deepseekv3 int8 sglang. (#4177)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>

Yineng Zhang | 7e3bb52705 | 2025-03-07 01:48:47 -08:00
update release-pypi-kernel

Yineng Zhang | 96263f275c | 2025-03-07 01:15:34 -08:00
chore: bump v0.0.3.post7 for sgl-kernel (#4176)

Zhiqiang Xie | 9376ac361d | 2025-03-07 00:58:20 -08:00
Memory pool fix for upstream change about eagle (#4170)

Yineng Zhang | 94a2b9d33e | 2025-03-07 00:01:17 -08:00
Put utils in ifndef USE_ROCM to fix CI (#4167) (#4168)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>

Stefan He | 3c3eb374b2 | 2025-03-06 23:29:30 -08:00
Remove non-existent AMD header include (#4166)

Michael Yao | d557319a8b | 2025-03-06 23:14:18 -08:00
[Docs] Fix links and grammar issues (#4162)

Stefan He | 95085d65e9 | 2025-03-06 22:58:52 -08:00
[Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163)

HandH1998 | c7f254468f | 2025-03-06 20:54:52 -08:00
[Feature] DeepSeek V3/R1 INT8 Quantization (channel-wise) (#3888)
Co-authored-by: yych0745 <1398089567@qq.com>
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: b0urnee <2769086541@qq.com>

Stefan He | 63ee26d162 | 2025-03-06 20:53:05 -08:00
Add sgl_per_token_quant_fp8 (#4089)

Xiaoyu Zhang | ad55f17182 | 2025-03-06 18:05:43 -08:00
[quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786)

Pan Lyu | 361971b859 | 2025-03-06 16:46:20 -08:00
Add Support for Qwen2-VL Multi-modal Embedding Models (#3694)

HAI | 13bc39c5d6 | 2025-03-06 15:33:02 -08:00
ROCm: enable trillion-parameter MoE models with INT4-FP8 single node (#4152)

Chayenne | 9854a18a51 | 2025-03-06 15:13:26 -08:00
Hot fix small vocal eagle in docs (#4154)
Co-authored-by: ybyang <ybyang7@iflytek.com>

Chayenne | ebddb65aed | 2025-03-06 14:27:09 -08:00
Docs: add torch compile cache (#4151)
Co-authored-by: ybyang <ybyang7@iflytek.com>

Adarsh Shirawalmath | 19fd57bcd7 | 2025-03-06 13:21:54 -08:00
[docs] fix HF reference script command (#4148)

Lianmin Zheng | 9c58e68b4c | 2025-03-06 12:50:28 -08:00
Release v0.4.3.post4 (#4140)

Oliver Stanley | d03b3467b8 | 2025-03-06 12:07:51 -08:00
Fix constrained generation errors by adding datasets dependency (#4142)

yinfan98 | ab7fba0ece | 2025-03-06 11:50:07 -08:00
Fix nightly ci Gsm8k & Fix flashinfer backend kvcache quant (#4147)

Lianmin Zheng | bc1534ff32 | 2025-03-06 06:13:59 -08:00
Fix a draft model accuracy bug in eagle; support step=1; return logprob in eagle (#4134)
Co-authored-by: Sehoon Kim <kssteven418@gmail.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Sehoon Kim <sehoon@x.ai>

Lzhang-hub | 3a3918121f | 2025-03-06 05:34:02 -08:00
fix bench serving bug (#4135)

Lianmin Zheng | 800bf018fb | 2025-03-06 03:42:10 -08:00
Update CODEOWNER (#4138)

kk | b16af90bc3 | 2025-03-06 03:38:54 -08:00
AMD/ROCm: update base image string (#4137)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: yichiche <yichiche@amd.com>

Lianmin Zheng | 98c73d71cb | 2025-03-06 01:51:12 -08:00
[Minor] make the __init__ function of model_runner.py shorter (#4132)

Lianmin Zheng | fcc2e37f69 | 2025-03-06 00:13:20 -08:00
Split the __init__ of scheduler as smaller functions. Improve the eagle tests (#4128)

Liu Jinjie | 0804dd11a0 | 2025-03-06 00:12:19 -08:00
remove unused max_jobs in setup_rocm.py (#4126)
Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>

saienduri | 55dc8e4d52 | 2025-03-05 23:22:36 -08:00
Add tag suffix to nightly docker builds. (#4129)

Ying Sheng | 02e9e9f1cf | 2025-03-05 23:16:49 -08:00
Add codeowners for eagle implementations (#4131)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Sehoon Kim <kssteven418@gmail.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

simveit | 8f0b63139e | 2025-03-05 22:40:21 -08:00
Docs: improve EAGLE docs (#4038)

samzong | b9b3b098b9 | 2025-03-05 22:39:34 -08:00
feat: support docs auto live-reload with sphinx-autobuild (#4111)
Signed-off-by: samzong <samzong.lu@gmail.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>

Zhiqiang Xie | aee30630d8 | 2025-03-05 21:39:07 -08:00
Add a pointer to the real KV cache pool (#4113)

Lianmin Zheng | 286e6540a6 | 2025-03-05 20:58:48 -08:00
Remove prefill-only-one-req (#4117)

Wenxuan Tan | 718c391fd7 | 2025-03-05 19:32:42 -08:00
[Hoxfix] Fix incomplete token_to_kv_pool refactor (#4121)

Yineng Zhang | fc671f66c1 | 2025-03-05 17:26:10 -08:00
chore: bump v0.4.3.post3 (#4114)

samzong | 197751e9a1 | 2025-03-05 17:02:32 -08:00
fix Non-consecutive header level increase in docs/router/router.md (#4099)
Signed-off-by: samzong <samzong.lu@gmail.com>

samzong | d2d0d061d9 | 2025-03-05 16:39:02 -08:00
fix cross-reference error and spelling mistakes (#4101)
Signed-off-by: samzong <samzong.lu@gmail.com>

Yueyang Pan | 25482edb5c | 2025-03-05 16:16:43 -08:00
Online serving benchmarks of real datasets for hierarchical KV caching (#3211)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>

luzengxiangcn | 62b362b1f1 | 2025-03-05 16:11:42 -08:00
Debug radixcache: refactor recursive helper methods (#3029)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>

saienduri | 44d7646371 | 2025-03-05 16:03:18 -08:00
remove testing on PR workflow change (#4110)

saienduri | cd85b78f94 | 2025-03-05 14:46:26 -08:00
Create release-docker-amd-nightly.yml (#4105)

Yineng Zhang | 0aaccbbfec | 2025-03-05 13:23:11 -08:00
revert deepseek docs (#4109)

Qiaolin Yu | 357671e216 | 2025-03-05 13:16:31 -08:00
Add examples for server token-in-token-out (#4103)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>

Chayenne | e70fa279bc | 2025-03-05 13:01:03 -08:00
Docs: reorganize dpsk docs (#4108)

Tommy Yang | abe74b7b59 | 2025-03-05 12:25:51 -08:00
Docs: Add DeepSeek optimization ablations documentation (#4107)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>

Jhin | 70b3c6eeb1 | 2025-03-05 12:25:18 -08:00
Add update_weights_from_disk endpoint to Engine (#4102)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>

Ke Bao | ef9d3b3c2c | 2025-03-05 11:23:53 -08:00
Fix triton kernel illegal memory issue for eagle (#4100)

Baizhou Zhang | fc91d08a8f | 2025-03-05 11:20:41 -08:00
[Revision] Add fast decode plan for flashinfer mla (#4012)

HAI | 71ab0dabe0 | 2025-03-05 10:56:51 -08:00
Fix the moe padding conditional logic (#4081)

Ying Sheng | d3d4d76758 | 2025-03-05 08:06:07 -08:00
[Eagle] Refactor eagle speculative decoding (#3986)
Co-authored-by: Ke Bao <ISPObaoke@163.com>

yigex | 5be8f1ed98 | 2025-03-05 03:10:49 -08:00
ROCM: AITER BLOCK GEMM (#4075)