11 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Cheng Wan | 3fa62da78c | [7/N] MoE Refactor: the implementation of new framework (#9269) | 2025-09-05 21:09:09 -07:00 |
| Cheng Wan | 295895120d | [6/N] MoE Refactor: Cleanup MoE-related configs (#8849) | 2025-08-14 21:14:53 -07:00 |
| Hongbo Xu | 2cc9eeab01 | [4/n]decouple quantization implementation from vLLM dependency (#9191); Co-authored-by: AniZpZ <aniz1905@gmail.com>; Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-08-14 12:05:46 -07:00 |
| Hongbo Xu | a669bc2f74 | Replace sglang.srt.layers.quantization.scalar_types with sgl_kernel.scalar_type (#8951) | 2025-08-13 19:41:41 -07:00 |
| Cheng Wan | 15ad6c9086 | [1/N] MoE Refactor: refactor select_experts (#7966) | 2025-07-19 00:51:15 -07:00 |
| Hubert Lu | 7750b91ca8 | [AMD] Add triton awq_dequantize kernel to support AWQ on ROCm (#7661) | 2025-07-18 14:27:25 -07:00 |
| Hongbo Xu | 1f76fc8747 | [3/n] chore: decouple AWQ implementation from vLLM dependency (#8113); Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com> | 2025-07-18 11:45:22 -07:00 |
| Cheng Wan | 49b8777460 | Refactor: move all quantization-related code to srt/layer/quantization (#7989) | 2025-07-17 00:47:07 -07:00 |
| Yineng Zhang | fbb5f229d4 | fix awq_dequantize import (#5669) | 2025-04-23 01:36:26 -07:00 |
| Lianmin Zheng | 74e0ac1dbd | Clean up import vllm in quantization/__init__.py (#4834) | 2025-03-28 10:34:10 -07:00 |
| laixin | ae25d36dc6 | [3/3] fix dsv3 awq issue (#4719); Co-authored-by: AniZpZ <aniz1905@gmail.com> | 2025-03-26 23:13:43 -07:00 |