Commit history for quantization/mxfp4.py (newest first):

516738b096  Liangsheng Yin  2025-10-13 19:34:43 +08:00
    Depreate global_server_args_dict (#11528)

1bdd010291  Cheng Wan  2025-10-12 17:40:40 -07:00
    Revert "Deprecate global_server_args_dict" (#11520)

1083e7e3df  Liangsheng Yin  2025-10-13 01:20:47 +08:00
    Deprecate global_server_args_dict (#11331)

8ebf72fef3  kk  2025-09-26 22:12:22 -07:00
    [Fix] RuntimeError: get_cfg Unsupported input_type:Float4_e2m1fn_x2 in using aiter-mxfp4-moe (#10981)
    Co-authored-by: wunhuang <wunhuang@amd.com>

388c05d544  yhyang201  2025-09-18 11:44:43 -07:00
    Fix bias handling in TritonMoeQuantInfo within quantization/mxfp4.py (#10579)

3fa62da78c  Cheng Wan  2025-09-05 21:09:09 -07:00
    [7/N] MoE Refactor: the implementation of new framework (#9269)

2c562fd2d0  Hubert Lu  2025-09-04 00:48:58 -07:00
    Fix Llama 4 with MXFP4 dynamic quant on MI35x (#9993)

4aeba40d7b  Lianmin Zheng  2025-08-27 17:00:09 -07:00
    [Sync] Update mxfp4.py (20250827) (#9724)
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
    Co-authored-by: Shiyang Chen <shiyang@x.ai>

fd71b11b1d  Lianmin Zheng  2025-08-27 03:34:29 -07:00
    move is_sm90_supported/is_sm100_supported to python/sglang/srt/utils.py (#9679)

a530b3ffdc  Stefan He  2025-08-26 16:24:44 -07:00
    [RL] fix register the same ops multiple times (#9564)

ccd3fb946e  hlu1  2025-08-23 01:48:40 -07:00
    [fix] Fix mxfp4 triton MoE tp bug (#9473)
    Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>

0374304a2c  fzyzcjy  2025-08-23 15:38:40 +08:00
    Add enable_flashinfer_mxfp4_bf16_moe for higher precision and slower moe backend (#9004)

1c1f8a118e  kk  2025-08-16 19:01:54 -07:00
    Combine fp4.py and mxfp4.py into one file and support dynamic mxfp4 quantization in mxfp4.py (#9049)
    Co-authored-by: wunhuang <wunhuang@amd.com>

84b006b278  Cheng Wan  2025-08-15 02:28:33 -07:00
    Cleanup MoE Refactor (#9223)

295895120d  Cheng Wan  2025-08-14 21:14:53 -07:00
    [6/N] MoE Refactor: Cleanup MoE-related configs (#8849)

63d82a776a  Xiaoyu Zhang  2025-08-14 10:57:29 -07:00
    refine mxfp4 shuffling log (#9194)

5190ba7f42  fzyzcjy  2025-08-12 01:20:13 -07:00
    Fuse two kernels of hidden states padding into quantization kernel (#9005)
    Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>

1d24db8348  Cheng Wan  2025-08-08 00:46:42 -07:00
    Expert Parallelism for GPT-OSS (#8944)

0d1e27a0c5  Xiaoyu Zhang  2025-08-08 00:11:48 -07:00
    Better optimization log for gpt-oss model (#8953)

3ae33fcd0a  Xiaoyu Zhang  2025-08-07 10:02:40 -07:00
    Fix hopper launch gpt-oss model illegal memory (#8908)

47824c1488  Xiaoyu Zhang  2025-08-07 01:08:41 -07:00
    [Perf] Auto enable best flashinfer mxfp4 kernel in b200 (#8898)

4373df5525  Xiaoyu Zhang  2025-08-06 16:23:41 -07:00
    add flashinfer mxfp4 (#8847)

168033d5fb  Ying Sheng  2025-08-06 00:05:25 -07:00
    Support mxfp4 for GPT-OSS (#8843)
    Co-authored-by: fzyzcjy <ch271828n@outlook.com>
    Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
    Co-authored-by: zhuofan1123 <zhuofanl@nvidia.com>
    Co-authored-by: liz-badada <jinyanc@nvidia.com>
    Co-authored-by: xutizhou <xutingz@nvidia.com>
    Co-authored-by: linhu-nv <linhu@nvidia.com>