14 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Xiaoyu Zhang | a167fd0bcb | [code style] Clean dead triton kernel code in fused_moe and useless vllm_ops import (#8310) | 2025-07-24 14:38:30 +08:00 |
| Hubert Lu | e50109f2ed | [AMD] Remove vllm's scaled_fp8_quant and moe_sum when SGLANG_USE_AITER=1 (#7484) | 2025-07-21 17:33:19 -07:00 |
| Hongbo Xu | 1f76fc8747 | [3/n] chore: decouple AWQ implementation from vLLM dependency (#8113); Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com> | 2025-07-18 11:45:22 -07:00 |
| Cheng Wan | 49b8777460 | Refactor: move all quantization-related code to srt/layer/quantization (#7989) | 2025-07-17 00:47:07 -07:00 |
| Peng Zhang | c28ad1990d | [1/n] chore: decouple quantization implementation from vLLM dependency (#7992) | 2025-07-16 15:56:26 -07:00 |
| YanbingJiang | 094c116f7d | Update python API of activation, topk, norm and rope and remove vllm dependency (#6614); Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>; Co-authored-by: jianan-gu <jianan.gu@intel.com>; Co-authored-by: sdp <sdp@gnr799219.jf.intel.com> | 2025-06-17 22:11:50 -07:00 |
| Yijie Zhu | a39d928782 | support qwen2 running on ascend npu device (#7022); Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com> | 2025-06-17 11:24:10 -07:00 |
| JieXin Liang | b70957fcf8 | [refactor] slightly tidy fp8 module (#5993) | 2025-05-07 17:28:24 -07:00 |
| Lianmin Zheng | 177320a582 | Clean up imports (#5467) | 2025-04-16 15:26:49 -07:00 |
| AniZpZ | d95269f9b3 | [2/3] fix dsv3 awq issue (#4625); Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>; Co-authored-by: laixinn <xielx@shanghaitech.edu.cn> | 2025-04-03 17:36:39 -07:00 |
| Xiaoyu Zhang | 04e3ff6975 | Support compressed tensors fp8w8a8 (#4743) | 2025-03-26 13:21:25 -07:00 |
| Yun Dai | 8cd4250401 | [quantization] fix channelwise conversion with scalar weight scale (#4596) | 2025-03-22 00:47:52 -07:00 |
| Xiaoyu Zhang | dd865befde | [Hotfix] solve fp8 w8a8 ci test fail (#4531) | 2025-03-17 23:17:04 -07:00 |
| Xiaoyu Zhang | 9b81f9bd34 | sglang quant module remove vllm dependency (#4507) | 2025-03-17 15:51:59 -07:00 |