4612 Commits

Author SHA1 Message Date
ichernob
83123f481e [Quantization] Supported w8a8 int8 quantized Gemma3 and Qwen-VL models (#8619)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2025-08-12 13:31:18 -07:00
ronnie_zheng
48afa8f14f [feat] Enable Ascend profiling on SGLang (#8610)
Co-authored-by: liyou_b <2953090824@qq.com>
2025-08-12 13:28:31 -07:00
li chaoran
2ecbd8b8bf [feat] add ascend readme and docker release (#8700)
Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>
Signed-off-by: lichaoran <pkwarcraft@gmail.com>
Co-authored-by: Even Zhou <even.y.zhou@outlook.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2025-08-12 13:25:42 -07:00
Yineng Zhang
305b27c124 fix: update Dockerfile (#9125) 2025-08-12 13:23:10 -07:00
Simo Lin
1ce30dd13e [router] update router documentation (#9121) 2025-08-12 13:16:34 -07:00
Jiaqi Gu
c9ee738515 Fuse writing KV buffer into rope kernel (part 2: srt) (#9014)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
2025-08-12 13:15:30 -07:00
ishandhanani
1f9ec65374 fix(docker): update sgl_kernel version to 0.3.4 in Dockerfile.gb200 (#9118) 2025-08-12 13:12:33 -07:00
Chang Su
ad359d1c71 router: Fix user guide link README.md (#9122) 2025-08-12 12:29:10 -07:00
Cheng Wan
5f5b3b2449 [5/n] DP Enhancement: Correct num_token_non_padded (#9107) 2025-08-12 12:23:46 -07:00
Shangming Cai
4caca4f6b4 Fix typo in REVIEWERS (#9113)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-08-12 11:55:49 -07:00
Chang Su
f2a5de284b [Bugfix] Fix accuracy-test-1-gpu failure caused by builtin_tools (#9114) 2025-08-12 09:56:13 -07:00
Liangsheng Yin
445f9dca6e Runtime check CUDA driver version to avoid unresolved green context symbols (#9021) 2025-08-12 09:26:10 -07:00
Yineng Zhang
3a9afe2a42 chore: bump sgl-kernel v0.3.4 (#9103) 2025-08-12 01:48:47 -07:00
fzyzcjy
9aea255522 Fuse writing KV buffer into rope kernel (part 1: sgl-kernel) (#9077) 2025-08-12 01:46:40 -07:00
Yichao Cheng
fcc11e5ed5 update support new models doc (#9096)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-12 01:21:02 -07:00
fzyzcjy
5190ba7f42 Fuse two kernels of hidden states padding into quantization kernel (#9005)
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2025-08-12 01:20:13 -07:00
Hsiang-Yu Tsou
5438886c87 docs: fix broken links in README.md (#9075) 2025-08-12 00:03:35 -07:00
Chang Su
9c83d74da3 bugfix: Fix the commentary msg extraction in GptOssDetector (#9097) 2025-08-11 23:53:10 -07:00
DarkSharpness
b4ac2b9c0c [Fix] Fix dual chunk model default behavior (#9032) 2025-08-11 23:50:23 -07:00
Jianwei Dong
83262dcb29 Fix mismatch between padded_scales shape and reshape dimensions in modelopt quantization (#8766) 2025-08-11 23:44:40 -07:00
zixuanzhang226
c46c75f8c0 feat: add fused moe config for Qwen3-30B-A3B on B200 (#9087) 2025-08-11 23:25:36 -07:00
Makcum888e
2aaf22c46c Optimization for AscendPagedTokenToKVPoolAllocator (#8293)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: VDV1985 <vladdv85@mail.ru>
2025-08-11 23:06:39 -07:00
Lifu Huang
29a610b4d9 Fix broken CI TestRequestLengthValidation (#9095) 2025-08-11 22:59:56 -07:00
Lifu Huang
5ded39cab2 Fix race condition in async lora unload (#9084) 2025-08-11 22:59:29 -07:00
Keyang Ru
4093d460ce [CI] migrate router to BM.A10.4 runner (#8992)
Co-authored-by: key4ng <rukeyang@gamil.com>
2025-08-11 22:41:18 -07:00
Simo Lin
9d68bdb240 [router] Add Rust Binary Entrypoint for SGLang Router (#9089) 2025-08-11 21:37:36 -07:00
Chang Su
a218490136 (gpt-oss, oai, chat): Remove Harmony Integration and Implement Native GPT-OSS Tool Call Support (#9043) 2025-08-11 18:59:18 -07:00
Zhiqiang Xie
0eec4cb6cc HiCache, add bench long context plus minor fixs (#9086)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-11 16:54:52 -07:00
Faradawn Yang
ff1f68252c [fix] Set Radix tree root node hash to None - Nvidia Dynamo Integration (#9030)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-11 14:20:39 -07:00
Zhiqiang Xie
9f78f391ae HiCache Storage: generate hash when inserting new nodes (#9053) 2025-08-11 14:18:59 -07:00
Faraz
f508cd3cb7 TRTLLM-MLA FP8 path (#8638)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-08-11 14:02:13 -07:00
Xiaoyu Zhang
44e86480e8 fuse allreduce and residual_rmsnorm (#8731) 2025-08-11 13:50:53 -07:00
Lianmin Zheng
8c07fabda7 Update hyperparameter_tuning.md (#9083) 2025-08-11 13:44:11 -07:00
SijiaYang
90f44b74e6 fix: w4afp8 accuracy problem and rebase (#8752)
Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>
Co-authored-by: Jinwu <ayrnb@users.noreply.github.com>
2025-08-11 13:41:19 -07:00
Simo Lin
38907fe639 refactor(pd-router): extract common patterns to reduce code duplication (#9081) 2025-08-11 13:32:31 -07:00
Liangsheng Yin
f9afa7dceb Fix docs for clip max new tokens (#9082) 2025-08-11 13:15:21 -07:00
Jimmy
0d9e89ec69 [PD]decode: add CLIP_MAX_NEW_TOKEN for pop_preallocated (#8866) 2025-08-11 13:08:11 -07:00
Hangzhi
3d64fda376 Fix broken Kimi models HuggingFace link (#9080) 2025-08-11 12:15:00 -07:00
633WHU
3bffe11279 Fix chunked prefill size validation for disabled state (#8973)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-11 11:05:29 -07:00
HAI
44426e54be Update REVIEWERS (#9063) 2025-08-11 11:04:39 -07:00
ishandhanani
9f24dfefd1 chore(gb200): remove ToT flashinfer installation (#9079) 2025-08-11 11:02:15 -07:00
Yi Zhang
89f1d4f536 update deepep commit to support qwen3-coder (#9066) 2025-08-11 10:42:33 -07:00
Baizhou Zhang
75e6a7cde1 Support radix cache for Lora feature (#7216) 2025-08-11 10:14:11 -07:00
Simo Lin
6f81a710f7 [pd-router] add retry and circuit breakfor for pd router (#9051) 2025-08-11 05:53:26 -07:00
Chang Su
a6452b7188 bugfix: Fix output_ids extraction in detokenizer_manager (#9047) 2025-08-11 03:17:32 -07:00
zhyncs
f4ae50e97c fix: use flashinfer v0.2.11.post1 2025-08-11 02:49:25 -07:00
Yineng Zhang
84cb449eec Revert "chore: upgrade flashinfer 0.2.11 (#9036)" (#9057) 2025-08-11 00:16:39 -07:00
Cheng Wan
f003cd3548 [CI] Fix CI tests (#9050)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-10 23:52:05 -07:00
Yineng Zhang
9d834fdcc1 Revert "feat: update flashinfer ar oneshot params (#8687)" (#9054) 2025-08-10 23:24:42 -07:00
Zhiqiang Xie
b32792516a REVIEWERS.md typo fix (#9048) 2025-08-10 22:33:37 -07:00