Mick | 01dd39bac1 | refactor: minor refactors regarding multimodal processing (#6187) | 2025-05-17 22:53:20 -07:00
Lianmin Zheng | b3f3d610fd | Do not use FA3 for mistral (#6379) | 2025-05-17 19:47:34 -07:00
Yineng Zhang | f07c6a009b | chore: upgrade sgl-kernel v0.1.3 (#6377) | 2025-05-17 19:47:05 -07:00
Lianmin Zheng | 4bb816d444 | Fix CI tests (#6362) | 2025-05-17 19:16:45 -07:00
ybyang | c250939ecb | [Fix Chat API] add request id for chat/completion for tracing (#6364) | 2025-05-17 18:58:22 -07:00
ishandhanani | b6909aa223 | fix: allow launch_dummy_health_check_server to start inside of running asyncio loop (#6330) | 2025-05-17 18:32:41 -07:00
fzyzcjy | f87283573e | Add expert distribution APIs for engine (#6290) | 2025-05-17 18:31:51 -07:00
fzyzcjy | 73187152a4 | Reland tiny refactor DefaultModelLoader.Source (#6041) | 2025-05-17 17:11:20 -07:00
fzyzcjy | 4086566516 | Fix expert distribution recorder and profiler command stuck forever (#6284) | 2025-05-17 17:10:44 -07:00
fzyzcjy | fd08c04821 | Support custom DeepEP tuning config (#6257) | 2025-05-17 17:09:42 -07:00
fzyzcjy | 26ebb849eb | Tiny refactor bench_serving to extract RequestFuncOutput.init_new (#6108) | 2025-05-17 17:08:52 -07:00
fzyzcjy | 02973cd9a4 | Tiny refactor bench_serving to improve extensibility (#6134) | 2025-05-17 17:07:58 -07:00
fzyzcjy | 6d95a35abf | Support outputing details for bench_serving (#6107) | 2025-05-17 17:06:52 -07:00
fzyzcjy | 01d2838c0f | Fix stop_profile does not wait for finishing (#4741) | 2025-05-17 17:06:15 -07:00
xutizhou | e3b8a72291 | [fix] illegal memory in _fwd_kernel_ep_scatter_2 and _fwd_kernel_ep_gather (#6348) | 2025-05-17 17:01:42 -07:00
Lifu Huang | 3cf1473a09 | Use monotonic clock for interval measurement (#6211); Signed-off-by: Lifu Huang <lifu.hlf@gmail.com> | 2025-05-17 16:49:18 -07:00
fzyzcjy | 2716830802 | Speed up when having padding tokens in DeepEP (#6175) | 2025-05-17 16:44:05 -07:00
Zilin Zhu | e3bed74afb | [router] Add /list_workers endpoint to router (#6366) | 2025-05-17 09:49:02 -07:00
Vincent Zhong | e9ef39d2e9 | docs: Update the MD files (#6373); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-17 09:23:16 -07:00
Chang Su | 205d5cb407 | perf: Optimize local attention memory allocation in FlashAttentionBackend (#6356) | 2025-05-17 01:45:46 -07:00
Yineng Zhang | 3d7f7a43c8 | chore: bump sgl-kernel v0.1.3 (#6368) | 2025-05-17 00:15:55 -07:00
fzyzcjy | 2df9d40aa6 | Minor code cleanup refactor for DeepSeek models (#6324) | 2025-05-16 19:06:03 -07:00
fzyzcjy | 8dc191f237 | Fix one wasted kernel in DeepSeek and minor refactor (#6316) | 2025-05-16 19:05:33 -07:00
Kiv Chen | 64825b8395 | model(vlm): mistral 3.1 (#5099); Co-authored-by: KivenChen <sleigh-queue-0y@icloud.com> | 2025-05-16 18:36:18 -07:00
Yineng Zhang | 69748d088d | docs: update readme (#6361) | 2025-05-16 16:16:23 -07:00
Lianmin Zheng | dcc0a45618 | Fix amd ci (#6360) | 2025-05-16 15:33:10 -07:00
Lianmin Zheng | c2b7ddca49 | [Minor] cleanup unused imports (#6358) | 2025-05-16 14:52:38 -07:00
Lianmin Zheng | abebd9399c | Update CODEOWNERS (#6359) | 2025-05-16 14:51:36 -07:00
Fr4nk1in | 4bd2952a37 | feat: add dp attention support for Qwen 2/3 MoE models, fixes #6088 (#6121); Co-authored-by: King.Zevin <zevin@mail.ustc.edu.cn>; Co-authored-by: Yi Zhang <1109276519@qq.com> | 2025-05-16 14:44:10 -07:00
Elfie Guo | 6fc9357503 | [2/2] Add python wrapper for CUTLASS FP8 Blockscale MoE Kernel. (#5694) | 2025-05-16 13:14:07 -07:00
Baizhou Zhang | 839fb31e5f | [Fix] Improve dependencies for Blackwell image (#6334) | 2025-05-16 12:38:22 -07:00
Yury Sulsky | f19a9204cd | Support precomputed multimodal features for Qwen-VL and Gemma3 models. (#6136); Co-authored-by: Yury Sulsky <ysulsky@tesla.com> | 2025-05-16 12:26:15 -07:00
Elfie Guo | c23a7072b6 | Upgrade CUTLASS 4.0 (#6336); Co-authored-by: zhyncs <me@zhyncs.com> | 2025-05-15 17:42:23 -07:00
Lianmin Zheng | e07a6977e7 | Minor improvements of TokenizerManager / health check (#6327) | 2025-05-15 15:29:25 -07:00
Qiaolin Yu | cd8d4b9dfc | Fix lora bench (#6302) | 2025-05-15 10:09:55 -07:00
fzyzcjy | f194e14fb7 | Reduce MoE memory usage (#6147) | 2025-05-15 09:38:28 -07:00
Yi Liu | cfc9f9ab8d | Fix gpu mem check on CPU (#6317); Signed-off-by: yiliu30 <yi4.liu@intel.com> | 2025-05-15 09:37:45 -07:00
Chunyuan WU | fb4959b2c5 | Add fp8 gemm kernel for CPU in sgl-kernel and add gemm UT (#6216); Co-authored-by: YanbingJiang <yanbing.jiang@intel.com>; Co-authored-by: mingfeima <mingfei.ma@intel.com> | 2025-05-15 09:10:40 -07:00
JieXin Liang | 9a405274e2 | [misc] remove redundant platform codes (#6298) | 2025-05-15 00:51:30 -07:00
quinnrong94 | 2e4babdb0a | [Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109); Co-authored-by: Yingyi <yingyihuang2000@outlook.com>; Co-authored-by: neiltian <neiltian@tencent.com>; Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com>; Co-authored-by: kexueyu <kexueyu@tencent.com>; Co-authored-by: vincentmeng <vincentmeng@tencent.com>; Co-authored-by: pengmeng <pengmeng@tencent.com> | 2025-05-15 00:48:09 -07:00
Zilin Zhu | 44a3783d13 | [fix][RL] Remove the incorrect barrier in init_weights_update_group (#5914) | 2025-05-14 19:15:21 -07:00
Junrong Lin | f3bf611054 | feat: add flush cache to EngineBase and HttpServerEngineAdapter (#6009) | 2025-05-14 19:15:02 -07:00
Hubert Lu | 198b9056d1 | [AMD] Fix Llama 4 Scout and Maverick accuracy issues on MI300X (#6274) | 2025-05-14 22:07:29 +00:00
Sai Enduri | 73eb67c087 | Enable unit tests for AMD CI. (#6283) | 2025-05-14 12:55:36 -07:00
Brayden Zhong | 9a91fa0ed1 | docs: fix a bad redirect (#6300) | 2025-05-14 10:27:19 -07:00
Mick | cd7c8a8de6 | doc: update developer guide regarding mllms (#6138); Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>; Co-authored-by: XinyuanTong <115166877+JustinTong0323@users.noreply.github.com>; Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> | 2025-05-14 23:13:13 +08:00
Lifu Huang | 3e350a931e | [Bug] Fix accidental logger override caused by internVL. (#6282) | 2025-05-13 23:29:25 -07:00
Ying Sheng | fb71725c98 | Fix a bug in schedule_policy (#6276) | 2025-05-13 18:04:00 -07:00
Chang Su | 912788c095 | perf: optimize local_block_table memory allocation (#6273) | 2025-05-13 17:18:38 -07:00
blzheng | 0f75b907c6 | [CPU] Add CMakeLists.txt for sgl-kernel (#6115) | 2025-05-13 15:30:37 -07:00