Commit Graph

6193 Commits

Author SHA1 Message Date
maxiao
fa3882d218 delete print info 2025-11-04 16:30:38 +08:00
maxiao
6e2b63d4c4 Use vllm custom allreduce 2025-11-04 15:28:54 +08:00
maxiao
d2fdeac22f Call the custom all reduce in vllm 2025-11-03 16:28:21 +08:00
maxiao1
75cd34d172 change sgl_kernel WARP_SIZE to 64 2025-11-03 10:17:53 +08:00
maxiao1
8fc552638f Merge branch 'v0.5.4_dev_maxiao' into 'v0.5.4_dev'
Adapt to w8a8 models

See merge request OpenDAS/sglang!1
2025-10-29 02:09:59 +00:00
maxiao1
eb4ba1c295 update UNBALANCED_MODEL_LOADING_TIMEOUT_S=3600 2025-10-29 10:06:23 +08:00
maxiao1
4b9b337b39 Adapt to w8a8 models 2025-10-29 09:06:22 +08:00
lizhigong
f6528b74be Add hipprof support; fix a synchronization issue in asynchronous scheduling 2025-10-28 16:25:06 +08:00
maxiao1
a5718531b7 Disable custom_allreduce to preserve correctness 2025-10-28 10:57:25 +08:00
guobj
c333f12547 Add tpot and other metrics to bench_serving.py 2025-10-28 02:11:36 +00:00
maxiao
f9a026ad2b fix fused_add_rms_norm bug 2025-10-27 10:27:57 +08:00
maxiao1
b80ae5e9ff adaptation w4a8 tp 2025-10-25 16:33:07 +08:00
lizhigong
b091a7a5c9 adapt w4a8 marlin deepep dp ep
(cherry picked from commit a0fb70e9c1)
2025-10-25 15:07:57 +08:00
lizhigong
143ec5f36c adaptation w4A8 quantization
(cherry picked from commit 848c5b8290)
2025-10-25 15:07:04 +08:00
lizhigong
67510e0172 adaptation part w4A8 quantization
(cherry picked from commit 68277eac30)
2025-10-25 15:06:27 +08:00
maxiao1
32b1ccaf62 Modify setup_hip.py under sgl-kernel 2025-10-25 13:11:02 +08:00
maxiao
251235c229 Adapt to v0.5.4 2025-10-25 12:16:25 +08:00
sglang-bot
1053e1be17 chore: bump SGLang version to 0.5.4 (#12027)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2025-10-23 18:01:40 -07:00
nvjullin
9a71500cfb Fixed aarch64 flash-mla (#12009) 2025-10-23 17:47:04 -07:00
Simo Lin
6d6e24bcc4 [router] Add builder pattern for RouterConfig with zero duplication (#12030) 2025-10-23 16:46:10 -07:00
Kangyan-Zhou
2c057fbfa8 Update Github action title for kernel build (#12029) 2025-10-23 13:39:40 -07:00
Roger Young
dbd9435dc1 Fix mamba radix cache eviction logic in alloc_req_slots (#11616)
Signed-off-by: rogeryoungh <rogeryoungh@foxmail.com>
2025-10-23 13:07:43 -07:00
b8zhong
8ae9d4bb41 Revert "[ROCm] Remove vLLM rope dependency & use AITER impl" (#12028) 2025-10-23 12:42:59 -07:00
Nicolas Castet
1c304aa9bc Log iteration # for prefill and decode (#9366) 2025-10-23 12:28:03 -07:00
Mick
770529a731 model: support deepseek-ocr (#11891)
Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-10-24 03:15:17 +08:00
ErvinXie
39c237f02c Add AWQ quantization support for NPU. (#10158)
Co-authored-by: Alisehen <814073252@qq.com>
Co-authored-by: Yaochen Han <48639761+Alisehen@users.noreply.github.com>
Co-authored-by: Zhengda Qin <zhengdqin@gmail.com>
2025-10-23 12:08:05 -07:00
Chang Su
28b8a4064d [router][CI] Clean up imports and prints statements in sgl-router/py_test (#12024) 2025-10-23 11:56:57 -07:00
Mick
8bd26dd4e6 ci: fix night-ci with push retry mechanism (#11765) 2025-10-23 11:31:05 -07:00
Lianmin Zheng
ab07cd3e5a [Auto Sync] Update test_deterministic_utils.py (20251023) (#12022)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
2025-10-23 11:20:45 -07:00
Netanel Haber
a98496834b Feature/nano v2 offline modelopt fp8 and nvfp4 (#12018)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-10-23 11:16:46 -07:00
Simo Lin
a4b637d87a [router] change ci names and update log level in ci (#12021) 2025-10-23 10:36:19 -07:00
Teng Ma
96a5e4dd79 [Feature] Support loading weights from ckpt engine worker (#11755)
Signed-off-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com>
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
Co-authored-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com>
Co-authored-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Co-authored-by: Xuchun Shang <xuchun.shang@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-10-23 09:23:30 -07:00
cctry
b0b4f71679 [Fix] memory leak by overlap + retract (#11981)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2025-10-23 22:59:23 +08:00
Liangsheng Yin
6c18addb6f Revert "Support nvidia/NVIDIA-Nemotron-Nano-9B-v2-FP8/NVFP4" (#12015) 2025-10-23 21:27:58 +08:00
Liangsheng Yin
32852fe9e9 Move memory runtime checker to mixin class (#12014) 2025-10-23 20:53:26 +08:00
Arthur Cheng
53c2934dce [Router] Consolidate ConnectionMode enum to core module (#11937) 2025-10-23 05:15:49 -07:00
Keyang Ru
e321c97113 [router] Add comprehensive E2E tests for Response API (#11988) 2025-10-23 05:13:51 -07:00
Netanel Haber
d6fee73d1f Support nvidia/NVIDIA-Nemotron-Nano-9B-v2-FP8/NVFP4 (#11866) 2025-10-23 17:29:02 +08:00
Qiaolin Yu
36a4cad7b0 Support overlap-spec-v2 with trtllm_mla attention backend (#11821) 2025-10-23 16:55:35 +08:00
HAI
65d376b491 aiter update to v0.1.6.post1 (#12004) 2025-10-22 23:53:05 -07:00
yinghui
c23eda8589 Fix incorrect KV indices creation when page_size=32 in TRTLLM MLA backend (#11985) 2025-10-22 22:44:45 -07:00
Jue WANG
138ff23187 Allow to disable batch decoding. (#11944) 2025-10-22 21:57:12 -07:00
blzheng
13fb8b5489 [CPU] Optimize FP16 decode_attention_cpu (#10652) 2025-10-22 21:39:51 -07:00
Zhengyi Lai
81fd2b0ee0 fix(deepep): resolve benchmark failure on 4×IB-card setup by aligning tuning config with DeepEP commit bdd119f8 (#11965) 2025-10-22 21:20:54 -07:00
Zaili Wang
007b849b0e [CPU] misc updates (#11906) 2025-10-22 21:10:05 -07:00
fzyzcjy
8612811d85 Bump grace blackwell DeepEP version (#11990) 2025-10-22 21:08:12 -07:00
Johnny
e7aa4664b3 [NVIDIA] Build CUDA 13 (#11299)
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-10-22 20:03:12 -07:00
b8zhong
4d4feccbb2 [ROCm] Remove vLLM rope dependency & use AITER impl (#11322) 2025-10-22 19:17:34 -07:00
jacky.cheng
99c92ff24b [AMD] Support a new flag to disable quant on parallelLinear layer if required (#11811) 2025-10-22 19:16:15 -07:00
Chang Su
6ade6a02d4 [grpc] Support gRPC standard health check (#11955) 2025-10-22 16:59:09 -07:00