| Author | Commit | Message | Date |
|---|---|---|---|
| DiweiSun | 8a10c4c3d9 | update ci node for xeon (#7265) | 2025-06-16 23:44:08 -07:00 |
| kk | 405780bcf0 | [amd] Opt dsv3 moe (#7160) Co-authored-by: wunhuang <wunhuang@amd.com> | 2025-06-16 22:26:51 -07:00 |
| yhyang201 | 1dffee31ac | OAI Server Skeleton & Core Utility Endpoints (#7179) | 2025-06-16 20:45:55 -07:00 |
| Xinyuan Tong | 70c471a868 | [Refactor] OAI Server components (#7167) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> | 2025-06-16 20:45:20 -07:00 |
| Lianmin Zheng | 1a9c2c9214 | Fix AMD speculative decoding (#7252) | 2025-06-16 17:01:33 -07:00 |
| KavioYu | 873ae12cee | support custom weight loader for model runner (#7122) Co-authored-by: kavioyu <kavioyu@tencent.com> | 2025-06-16 16:28:15 -07:00 |
| Lianmin Zheng | c64290dcb5 | Use seq_len_fill_value in the cuda graph runners (#7233) | 2025-06-16 15:57:07 -07:00 |
| Alex Sun | 8e2363dc15 | fix amd EP MoE FP8 issue (#7125) | 2025-06-16 15:29:45 -07:00 |
| Yineng Zhang | f9dc9dd28b | chore: bump v0.4.7.post1 (#7248) | 2025-06-16 15:20:29 -07:00 |
| Sai Enduri | 62a7aa2efc | Update CI flakes. (#7244) | 2025-06-16 15:19:32 -07:00 |
| JieXin Liang | 5ca07eed90 | [fix] fix DeepGEMM blackwell input quant & ut & fix style and log (#7247) | 2025-06-16 11:45:54 -07:00 |
| woodx | e30ef368ab | Feat/support rerank (#6058) | 2025-06-16 10:50:01 -07:00 |
| fzyzcjy | 91a066ec6a | Tiny remove comments about DeepEP on H20 (#7234) | 2025-06-16 09:13:57 -07:00 |
| Liangsheng Yin | c494386728 | minor fix (#7245) | 2025-06-16 23:30:26 +08:00 |
| Lianmin Zheng | 53a525bf33 | [Eagle] Fix kernel call after updating speculative sampling kernels (#7231) | 2025-06-16 07:25:59 -07:00 |
| Lianmin Zheng | 7ddf8e83d2 | [EAGLE] Fix draft kv cache layout for fa3 and topk > 1 (#7239) | 2025-06-16 05:47:51 -07:00 |
| Lianmin Zheng | 8321f8e45e | Release sgl-kernel 0.1.9 (#7232) | 2025-06-16 03:37:40 -07:00 |
| Lianmin Zheng | cfceb83d05 | Fix sampling for speculative decoding & simplify kernels (#7207) | 2025-06-16 03:28:30 -07:00 |
| Lianmin Zheng | b1286a116a | [EAGLE] Refactor code for page size > 1 & more simplifications (#7213) | 2025-06-16 03:04:29 -07:00 |
| Lianmin Zheng | 21615cc3fe | Minor style and doc fix (#7228) | 2025-06-16 01:03:13 -07:00 |
| Xiaoyu Zhang | 0ae1e9a755 | refine fused_moe benchmark (#7221) | 2025-06-15 21:21:32 -07:00 |
| Lifu Huang | e07d064729 | Support LoRA in MMMU benchmark script. (#7218) | 2025-06-15 21:17:57 -07:00 |
| Cheng Wan | 3c2274fbee | Implement gather before attn (#6378) | 2025-06-15 21:08:56 -07:00 |
| Baizhou Zhang | d2679f5109 | Fix ChunkCache object has no attribute 'disable' (#7217) | 2025-06-15 20:55:15 -07:00 |
| Byron Hsu | 96be97bfff | Minor PD style fix (#7215) | 2025-06-15 16:12:12 -07:00 |
| Byron Hsu | 88f9c347b2 | [PD] use int32 for kv indices & get num_reserved_decode_tokens from server_args (#7214) | 2025-06-15 11:51:03 -07:00 |
| Lianmin Zheng | fff10809bf | Revert "[EAGLE] Refactor code for page size > 1 & more simplifications" (#7210) | 2025-06-15 02:48:00 -07:00 |
| Lianmin Zheng | 5f1ab32717 | [EAGLE] Refactor code for page size > 1 & more simplifications (#7163) | 2025-06-14 23:16:23 -07:00 |
| Yineng Zhang | 7df7c679b6 | feat: use zstd for docker (#7205) | 2025-06-14 23:13:29 -07:00 |
| Lianmin Zheng | 38af4f68a9 | Fix grammar abort & Minor style fixes (#7204) | 2025-06-14 22:49:41 -07:00 |
| Lianmin Zheng | a6305c7d50 | Lianmin/simplify memory pool (#7202) | 2025-06-14 22:25:37 -07:00 |
| Lianmin Zheng | a023856b12 | Move host memory pools into a separate file (#7200) | 2025-06-14 21:31:42 -07:00 |
| Byron Hsu | db0cc57e75 | [PD] Support decode retract and update decode.py (#7196) | 2025-06-14 19:48:05 -07:00 |
| fzyzcjy | 349bb2c92a | Fix error when disabling new DeepGEMM (#7198) | 2025-06-14 19:24:54 -07:00 |
| fzyzcjy | 0b8939bc2c | Fix NCCL 2.27.3 not in docker image (#7195) | 2025-06-14 18:36:11 -07:00 |
| JieXin Liang | ed89837cf4 | chore: upgrade sgl-kernel v0.1.8.post2 (#7186) Co-authored-by: zhyncs <me@zhyncs.com> | 2025-06-14 18:26:18 -07:00 |
| JieXin Liang | 55561e2553 | [fix] fix determine_num_fused_shared_experts (#7180) | 2025-06-14 17:41:22 -07:00 |
| Yineng Zhang | 4473320380 | chore: bump v0.1.8.post2 (#7189) | 2025-06-14 17:01:48 -07:00 |
| Zhijian Liu | 0bd67ba2bd | Fix a minor bug related to DeepGEMM upgrade (#7191) | 2025-06-14 16:54:40 -07:00 |
| Byron Hsu | 7d316991b2 | [PD] Update prefill.py (#7190) | 2025-06-14 15:59:54 -07:00 |
| JieXin Liang | ab1a4fa5cb | [fix] fix cutlass_mla_backend with cuda_graph and add sm_scale for sgl-kernel cutlass_mla (#7184) | 2025-06-14 12:45:41 -07:00 |
| JieXin Liang | ed54bf9d19 | [fix] fix dsv3 weight loader tqdm and simplify shared experts fusion (#7181) | 2025-06-14 11:56:29 -07:00 |
| fzyzcjy | b57d87c297 | Fix shared experts fusion + weight requant (#7177) | 2025-06-14 02:35:18 -07:00 |
| Lifu Huang | 98538822d5 | Add Phi-4-mm to supported VLM supported model list. (#7178) | 2025-06-13 23:17:40 -07:00 |
| Lianmin Zheng | f47a1b1d0f | Increase timeout in test/srt/test_disaggregation.py (#7175) | 2025-06-13 23:12:14 -07:00 |
| fzyzcjy | 93cec4335f | Support new DeepGEMM (#7172) | 2025-06-13 23:00:17 -07:00 |
| Lianmin Zheng | ba589b88fc | Improve test cases for eagle infer (#7173) | 2025-06-13 22:25:13 -07:00 |
| Jinn | 50876abc47 | Add test for refactored openai server (#7161) | 2025-06-13 20:42:57 -07:00 |
| fzyzcjy | b4c41f7276 | Refactor DeepGEMM integration (#7150) | 2025-06-13 20:41:03 -07:00 |
| fzyzcjy | 8b8f2e7463 | Support new DeepGEMM input format in silu_and_mul_masked_post_quant_fwd (#7153) | 2025-06-13 20:40:24 -07:00 |