Yineng Zhang
|
2c11f9c2eb
|
chore: upgrade sgl-kernel 0.0.9.post2 (#5540)
|
2025-04-18 21:17:23 -07:00 |
|
Yineng Zhang
|
a6f892e5d0
|
Revert "Avoid computing lse in Ragged Prefill when there's no prefix.… (#5544)
|
2025-04-18 16:50:21 -07:00 |
|
Yineng Zhang
|
08b518d51f
|
fix util import (#5542)
|
2025-04-18 15:06:46 -07:00 |
|
yhyang201
|
4db463b1ad
|
[Model] Adding Qwen3 and Qwen3MoE (#4693)
|
2025-04-18 09:51:29 -07:00 |
|
Wenxuan Tan
|
bfa3922451
|
Avoid computing lse in Ragged Prefill when there's no prefix. (#5476)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-04-18 01:13:57 -07:00 |
|
liwenju0
|
e465b08ddb
|
fix bug of VLLM_AVAILABLE not defined (#5497)
|
2025-04-18 00:59:03 -07:00 |
|
Xiaoyu Zhang
|
bed05878f6
|
fix kimi vl running bug after rebase main (#5461)
|
2025-04-18 00:17:34 -07:00 |
|
strgrb
|
b2a189dd11
|
use sglang_per_token_group_quant_fp8 from sgl-kernel instead of triton kernel (#5473)
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
|
2025-04-18 00:05:24 -07:00 |
|
fzyzcjy
|
53dcf38876
|
Introduce moe_dense_tp_size to fix dense layer errors in DeepSeek V3 + 4x8xH100 (#4836)
|
2025-04-17 21:38:26 -07:00 |
|
Michael Feil
|
1effba4c70
|
Configuration qwen2_moe.py - qkv_bias now in transformers (#5512)
|
2025-04-17 21:23:22 -07:00 |
|
mlmz
|
27e9538a7e
|
Fix: fix the exception 'the memory capacity is unbalanced. Some GPUs … (#5426)
Co-authored-by: ocss884 <ocss.lin@gmail.com>
|
2025-04-18 10:51:39 +08:00 |
|
u4lr451
|
211c7b31b8
|
Fix: Incorrect parameters passed to forward_batch_generation (#5506) (#5511)
|
2025-04-17 18:49:59 -07:00 |
|
Xuchun Shang
|
06d0a3d92b
|
[Bug fix] use correct func path in deepseek (#5496)
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
|
2025-04-17 02:41:41 -07:00 |
|
fzyzcjy
|
8beb356f0d
|
Refactor DeepSeek decoder layer branches (#5205)
|
2025-04-17 02:11:11 -07:00 |
|
Chang Su
|
c776234b45
|
Enable local attention during decode (#5479)
|
2025-04-17 02:07:43 -07:00 |
|
woodx
|
3bface15e6
|
Feat/support encoder model (like bert) (#4887)
|
2025-04-17 01:50:48 -07:00 |
|
Baizhou Zhang
|
6fb29ffd9e
|
Deprecate enable-flashinfer-mla and enable-flashmla (#5480)
|
2025-04-17 01:43:33 -07:00 |
|
Baizhou Zhang
|
4fb05583ef
|
Deprecate disable-mla (#5481)
|
2025-04-17 01:43:14 -07:00 |
|
eigen
|
8f783c1943
|
[Model Support] unsloth/Phi-4-mini bnb model (#4982)
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-04-16 19:58:20 -07:00 |
|
BearBiscuit
|
90faf9018e
|
[verl] Modify the update_weights func to align with verl's resharding (#5345)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-04-16 19:56:57 -07:00 |
|
Lianmin Zheng
|
177320a582
|
Clean up imports (#5467)
|
2025-04-16 15:26:49 -07:00 |
|
Cheng Wan
|
6aca583420
|
Fix several minor issues in PD disaggregation (#5444)
|
2025-04-15 23:04:41 -07:00 |
|
Yineng Zhang
|
5b5c7237c8
|
chore: bump v0.4.5.post1 (#5445)
|
2025-04-15 23:00:07 -07:00 |
|
Baizhou Zhang
|
a42736bbb8
|
Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113)
|
2025-04-15 22:01:22 -07:00 |
|
ybyang
|
dd83e7e9c3
|
[Bug fix] need record start time in pd mode (#5425)
|
2025-04-16 10:11:16 +08:00 |
|
Lianmin Zheng
|
0769b14bf9
|
[Minor] Move torch.compile patch to a better place (#5397)
|
2025-04-15 18:37:07 -07:00 |
|
ryang
|
bc24205b32
|
Support BNB quantization for llama/mllama (#5038)
Co-authored-by: Yuhao Yang <yyh073@foxmail.com>
|
2025-04-15 18:00:31 -07:00 |
|
Chang Su
|
27a009bb00
|
Fix ignore_eos parameter when loading a chat template (#5264)
|
2025-04-15 17:09:45 -07:00 |
|
Yineng Zhang
|
8ec0bb7d55
|
chore: upgrade sgl-kernel 0.0.9.post1 (#5436)
|
2025-04-15 15:45:51 -07:00 |
|
Yineng Zhang
|
fa909dc3c4
|
feat: update model_specific_adjustment (#5344)
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
|
2025-04-15 14:45:15 -07:00 |
|
shangmingc
|
f1b3b75fc6
|
[PD] Remove unused bootstrap param and fix port table type (#5423)
|
2025-04-15 21:21:20 +08:00 |
|
Liangsheng Yin
|
33b16ad178
|
Distinguish bootstrap key only in decode server (#5422)
|
2025-04-15 20:59:28 +08:00 |
|
shangmingc
|
ffde65a094
|
[PD] Fix dynamic port support and MLA buffer for Mooncake (#5415)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: ybyang <ybyang7@iflytek.com>
|
2025-04-15 19:29:31 +08:00 |
|
lambert0312
|
471650dee0
|
Fix broadcast use cuda device lead to memory capacity unbalanced (#5416)
|
2025-04-15 02:47:26 -07:00 |
|
Yuan Luo
|
d06a83fb01
|
Support dynamic connection and TP 16 (#5351)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-04-15 17:08:07 +08:00 |
|
Zhaoyang Hao
|
5d13440162
|
[FIX] Fix concatenation error in capture_bs when open --disable-cuda-graph-padding and without MTP (#5412)
|
2025-04-15 01:42:27 -07:00 |
|
Yuhong Guo
|
3dfc6023ce
|
Fix bench_serving with random-ids (#5214)
|
2025-04-15 01:34:35 -07:00 |
|
fzyzcjy
|
15e91d721b
|
Tiny fix DeepseekScalingRotaryEmbedding always use forward_native (#5406)
|
2025-04-15 01:33:47 -07:00 |
|
Yineng Zhang
|
8aab7fdb21
|
chore: upgrade sgl-kernel 0.0.9 (#5401)
|
2025-04-14 22:37:59 -07:00 |
|
Yangcheng Li
|
ee9d6ca677
|
[fix/misc] remove duplicate row in deepseek v2 model (#5279)
|
2025-04-14 18:41:24 -07:00 |
|
Ximingwang-09
|
2dd6489468
|
Add H20 dtype fp8_w8a8 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5291)
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
|
2025-04-14 18:40:31 -07:00 |
|
lambert0312
|
61e7c4dd21
|
Add A800 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5368)
|
2025-04-14 18:39:44 -07:00 |
|
Baizhou Zhang
|
f6772f1497
|
[Fix] Turn off DeepGEMM by default (#5263)
|
2025-04-14 17:45:44 -07:00 |
|
Xiaoyu Zhang
|
38076dea84
|
apply fused moe gate in ds v3/r1 (#5371)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-04-14 16:24:26 -07:00 |
|
Ke Bao
|
5e0a9b0981
|
Apply deepseek cuda rope (#5385)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-04-14 15:22:43 -07:00 |
|
JieXin Liang
|
bdde237562
|
[perf] experimental enhance fp8 per-tensor quant (#5370)
|
2025-04-14 12:35:43 -07:00 |
|
ybyang
|
e9fc2ac7b6
|
[PD Bug] fix MLA get_contiguous_buf_infos error (#5384)
|
2025-04-14 22:56:39 +08:00 |
|
Liangsheng Yin
|
44afde82d7
|
Fix PD disaggregation bugs (#5326)
|
2025-04-14 19:27:30 +08:00 |
|
yhyang201
|
072df75354
|
Support for Qwen2.5-VL Model in bitsandbytes Format (#5003)
|
2025-04-14 02:03:40 -07:00 |
|
fzyzcjy
|
defede5073
|
Fix DeepSeek DP Attention + torch compile (#5367)
Co-authored-by: ispobock <ispobaoke@163.com>
|
2025-04-14 01:07:58 -07:00 |
|