Baizhou Zhang
|
b54b5a96e4
|
[Doc]Add instruction for profiling with bench_one_batch (#5581)
|
2025-04-20 14:05:36 -07:00 |
|
JieXin Liang
|
bca832c7c6
|
[Fix] fix outlines and xgrammar (#4947)
|
2025-04-20 13:31:25 -07:00 |
|
Xiaoyu Zhang
|
d9dd529854
|
enable DeepSeek V3 shared_experts_fusion in sm90 (#5571)
|
2025-04-20 12:46:42 -07:00 |
|
fzyzcjy
|
0a0dd34e6a
|
Fix BumpAllocator error when no input_ids (#5564)
|
2025-04-20 02:20:53 -07:00 |
|
fzyzcjy
|
80ac527d22
|
[PD] Fix DeepSeek cannot be run on latest master (#5568)
|
2025-04-20 02:19:48 -07:00 |
|
JieXin Liang
|
99456bcacb
|
[perf] introduce deep gemm group_gemm_masked as bmm (#5432)
|
2025-04-20 00:38:27 -07:00 |
|
fzyzcjy
|
d07e797ace
|
Fix bench_one_batch producing unnatural results for expert parallel (#5149)
|
2025-04-20 00:38:04 -07:00 |
|
Zhaoyi Li
|
c555d794f7
|
Minor update for ROCm variable style (#5562)
|
2025-04-19 23:45:27 -07:00 |
|
Zhiqiang Xie
|
e2574ee986
|
fix hicache write back (#5543)
|
2025-04-19 21:56:22 -07:00 |
|
Byron Hsu
|
ab4b5606e4
|
[PD] Support page size > 1 (#5561)
|
2025-04-19 21:54:27 -07:00 |
|
Yubo Wang
|
20f1c8e374
|
Fix sampler nan check when calling top_k_top_p_sampling_from_probs (#5546)
|
2025-04-19 21:47:23 -07:00 |
|
fzyzcjy
|
613b197e57
|
Remove one kernel in per_tensor_quant_mla_fp8 (#5549)
|
2025-04-19 15:08:15 -07:00 |
|
Xiaoyu Zhang
|
d58e354472
|
simplify the control logic for using shared experts fusion (#5504)
|
2025-04-19 13:17:35 -07:00 |
|
Xiaoyu Zhang
|
bf86c5e990
|
restruct compressed_tensors_w8a8_fp8 (#5475)
|
2025-04-19 04:52:15 -07:00 |
|
shangmingc
|
dca90f1db8
|
[PD] Remove the requirement of config file for mooncake backend (#5460)
|
2025-04-19 19:31:00 +08:00 |
|
Yineng Zhang
|
0961feefca
|
feat: use flashinfer jit package (#5547)
|
2025-04-19 00:28:39 -07:00 |
|
ybyang
|
59dd090f1c
|
[PD] Fix no cache connect for receiver (#5534)
|
2025-04-19 14:55:28 +08:00 |
|
fzyzcjy
|
569b032c58
|
[PD] Tiny fix timeout error when generate (#5545)
|
2025-04-19 14:42:57 +08:00 |
|
fzyzcjy
|
f6a71139a8
|
Make profiler output file names consistent (#5548)
|
2025-04-18 22:57:11 -07:00 |
|
fzyzcjy
|
1e0806f30b
|
Fix DeepGEMM masked cannot be run on groups not being multiple of 4 (#5340)
|
2025-04-18 22:38:07 -07:00 |
|
Yineng Zhang
|
2c11f9c2eb
|
chore: upgrade sgl-kernel 0.0.9.post2 (#5540)
|
2025-04-18 21:17:23 -07:00 |
|
Yineng Zhang
|
a6f892e5d0
|
Revert "Avoid computing lse in Ragged Prefill when there's no prefix.… (#5544)
|
2025-04-18 16:50:21 -07:00 |
|
Yineng Zhang
|
08b518d51f
|
fix util import (#5542)
|
2025-04-18 15:06:46 -07:00 |
|
yhyang201
|
4db463b1ad
|
[Model] Adding Qwen3 and Qwen3MoE (#4693)
|
2025-04-18 09:51:29 -07:00 |
|
Wenxuan Tan
|
bfa3922451
|
Avoid computing lse in Ragged Prefill when there's no prefix. (#5476)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-04-18 01:13:57 -07:00 |
|
liwenju0
|
e465b08ddb
|
fix bug of VLLM_AVAILABLE not defined (#5497)
|
2025-04-18 00:59:03 -07:00 |
|
Xiaoyu Zhang
|
bed05878f6
|
fix kimi vl running bug after rebase main (#5461)
|
2025-04-18 00:17:34 -07:00 |
|
strgrb
|
b2a189dd11
|
use sglang_per_token_group_quant_fp8 from sgl-kernel instead of triton kernel (#5473)
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
|
2025-04-18 00:05:24 -07:00 |
|
Yineng Zhang
|
f28d82997a
|
chore: bump sgl-kernel 0.0.9.post2 (#5518)
|
2025-04-17 23:42:39 -07:00 |
|
Xiaoyu Zhang
|
8e09b37077
|
Sgl kernel fused_moe_gate support n_shared_experts (#5440)
|
2025-04-17 23:05:15 -07:00 |
|
fzyzcjy
|
53dcf38876
|
Introduce moe_dense_tp_size to fix dense layer errors in DeepSeek V3 + 4x8xH100 (#4836)
|
2025-04-17 21:38:26 -07:00 |
|
Michael Feil
|
1effba4c70
|
Configuration qwen2_moe.py - qkv_bias now in transformers (#5512)
|
2025-04-17 21:23:22 -07:00 |
|
Michael Yao
|
a0fc5bc144
|
[docs] Fix several consistency issues in sampling_params.md (#5373)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-04-18 10:54:40 +08:00 |
|
mlmz
|
27e9538a7e
|
Fix: fix the exception 'the memory capacity is unbalanced. Some GPUs … (#5426)
Co-authored-by: ocss884 <ocss.lin@gmail.com>
|
2025-04-18 10:51:39 +08:00 |
|
u4lr451
|
211c7b31b8
|
Fix: Incorrect parameters passed to forward_batch_generation (#5506) (#5511)
|
2025-04-17 18:49:59 -07:00 |
|
PGFLMG
|
c08a717c77
|
[Feat] Update sgl-kernel flashinfer to latest main version (#5500)
Co-authored-by: zhyncs <me@zhyncs.com>
|
2025-04-17 12:43:23 -07:00 |
|
mlmz
|
f13d65a7ea
|
Doc: fix problems of the 'Execute Notebooks / run-all-notebooks' ci caused by the unstability of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (#5503)
|
2025-04-17 11:37:43 -07:00 |
|
Xuchun Shang
|
06d0a3d92b
|
[Bug fix] use correct func path in deepseek (#5496)
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
|
2025-04-17 02:41:41 -07:00 |
|
Michael Yao
|
22c2a79dc5
|
Fix a link in sgl-kernel/README.md (#5493)
|
2025-04-17 02:25:28 -07:00 |
|
fzyzcjy
|
8beb356f0d
|
Refactor DeepSeek decoder layer branches (#5205)
|
2025-04-17 02:11:11 -07:00 |
|
Chang Su
|
c776234b45
|
Enable local attention during decode (#5479)
|
2025-04-17 02:07:43 -07:00 |
|
woodx
|
3bface15e6
|
Feat/support encoder model (like bert) (#4887)
|
2025-04-17 01:50:48 -07:00 |
|
Baizhou Zhang
|
6fb29ffd9e
|
Deprecate enable-flashinfer-mla and enable-flashmla (#5480)
|
2025-04-17 01:43:33 -07:00 |
|
Baizhou Zhang
|
4fb05583ef
|
Deprecate disable-mla (#5481)
|
2025-04-17 01:43:14 -07:00 |
|
Baizhou Zhang
|
81c891111f
|
Add test for flash_attn_varlen_func kernel (#5484)
|
2025-04-17 01:42:56 -07:00 |
|
Didier Durand
|
92d1561b70
|
Update attention_backend.md: plural form (#5489)
|
2025-04-17 01:42:40 -07:00 |
|
eigen
|
8f783c1943
|
[Model Support] unsloth/Phi-4-mini bnb model (#4982)
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-04-16 19:58:20 -07:00 |
|
BearBiscuit
|
90faf9018e
|
[verl] Modify the update_weights func to align with verl's resharding (#5345)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-04-16 19:56:57 -07:00 |
|
Lianmin Zheng
|
177320a582
|
Clean up imports (#5467)
|
2025-04-16 15:26:49 -07:00 |
|
Ying Sheng
|
d7bc19a46a
|
add multi-lora feature in README.md (#5463)
|
2025-04-16 03:25:25 -07:00 |
|