Commit Graph

26 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Hongbo Xu | a669bc2f74 | Replace sglang.srt.layers.quantization.scalar_types with sgl_kernel.scalar_type (#8951) | 2025-08-13 19:41:41 -07:00 |
| Cheng Wan | a437aa9987 | [hotfix] fix mixtral with tensor-level compressed-tensor quantization (#8721) | 2025-08-02 22:59:25 -07:00 |
| Xiaoyu Zhang | a167fd0bcb | [code style] Clean dead triton kernel code in fused_moe and useless vllm_ops import (#8310) | 2025-07-24 14:38:30 +08:00 |
| Hubert Lu | e50109f2ed | [AMD] Remove vllm's scaled_fp8_quant and moe_sum when SGLANG_USE_AITER=1 (#7484) | 2025-07-21 17:33:19 -07:00 |
| Cheng Wan | 15ad6c9086 | [1/N] MoE Refactor: refactor select_experts (#7966) | 2025-07-19 00:51:15 -07:00 |
| Enrique Shockwave | fd63b62eaa | fix compressed tensors WNA16 imports (#8142) | 2025-07-18 11:34:14 -07:00 |
| Cheng Wan | 49b8777460 | Refactor: move all quantization-related code to srt/layer/quantization (#7989) | 2025-07-17 00:47:07 -07:00 |
| narutolhy | 3e34e9004f | Fix: sync prepare_fp8_layer_for_marlin with latest vllm changes (#7648) | 2025-06-30 21:51:01 -07:00 |
| YanbingJiang | 094c116f7d | Update python API of activation, topk, norm and rope and remove vllm dependency (#6614) (Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>, jianan-gu <jianan.gu@intel.com>, sdp <sdp@gnr799219.jf.intel.com>) | 2025-06-17 22:11:50 -07:00 |
| Yijie Zhu | a39d928782 | support qwen2 running on ascend npu device (#7022) (Co-authored-by: Diao Yingyu <diaoyingyu1@hisilicon.com>) | 2025-06-17 11:24:10 -07:00 |
| Xiaoyu Zhang | 3712abfaf9 | Fuse routed scaling factor in deepseek (#6970) | 2025-06-08 15:24:24 -07:00 |
| Yineng Zhang | 1fb76ebb93 | Revert "Fuse routed scaling factor in topk_reduce kernel (#6220)" (#6968) | 2025-06-07 21:02:49 -07:00 |
| Xiaoyu Zhang | 515ef4facb | Fuse routed scaling factor in topk_reduce kernel (#6220) | 2025-06-07 11:06:50 -07:00 |
| Cheng Wan | 81964328b7 | Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled (#6736) | 2025-06-04 15:53:22 -07:00 |
| Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00 |
| applesaucethebun | d738ab52f8 | fix some typos (#6209) (Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>) | 2025-05-13 01:42:38 +08:00 |
| JieXin Liang | b70957fcf8 | [refactor] slightly tidy fp8 module (#5993) | 2025-05-07 17:28:24 -07:00 |
| Juwan Yoo | 502524e2da | compressed_tensors: port w8a16 fp8 from vllm (#4852) | 2025-04-20 17:48:31 -07:00 |
| Xiaoyu Zhang | d58e354472 | simplify the control logic for using shared experts fusion (#5504) | 2025-04-19 13:17:35 -07:00 |
| Xiaoyu Zhang | bf86c5e990 | restruct compressed_tensors_w8a8_fp8 (#5475) | 2025-04-19 04:52:15 -07:00 |
| liwenju0 | e465b08ddb | fix bug of VLLM_AVAILABLE not defined (#5497) | 2025-04-18 00:59:03 -07:00 |
| Lianmin Zheng | 177320a582 | Clean up imports (#5467) | 2025-04-16 15:26:49 -07:00 |
| HandH1998 | 4065248214 | Support Llama4 fp8 inference (#5194) (Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>, sleepcoo <sleepcoo@gmail.com>, zhyncs <me@zhyncs.com>) | 2025-04-09 20:14:34 +08:00 |
| Xiaoyu Zhang | db452760e5 | [ci] fix llama4 ci error (#5126) | 2025-04-07 21:15:46 +08:00 |
| Xiaoyu Zhang | 924ca7c92c | Add DeepSeek V3/R1 shared experts fusion (#4918) | 2025-04-04 01:59:29 -07:00 |
| Xiaoyu Zhang | 04e3ff6975 | Support compressed tensors fp8w8a8 (#4743) | 2025-03-26 13:21:25 -07:00 |