10 Commits

Author SHA1 Message Date
Shiwen Tang
0711c1abfa [Feature] Support AWQ MoE W4A16 Quantization (#142)
Signed-off-by: tangshiwen <tangshiwen@baidu.com>
Co-authored-by: Li Wei <liwei.109@outlook.com>
2026-01-26 18:56:05 +08:00
Li Wei
8f56cbf3ed [refactor]update Kunlun classes with monkey patch (#122)
Signed-off-by: Li Wei <liwei.109@outlook.com>
2026-01-19 20:24:19 +08:00
Shiwen Tang
8988ad08b2 [Feature] Support Mixed-Precision Quantization for MoE (#112) 2026-01-14 18:42:18 +08:00
Li Wei
9533f68e99 [fix]matmul not support cuda graph 2026-01-06 17:32:45 +08:00
baoqian426
ee0f50e68f [Feature] support deepseek v3/r1/v3.2 (#78)
* [Feature] support deepseek v3/r1/v3.2

* fix gpt_oss

* update readme

* update readme

---------

Co-authored-by: hanhaowen <hanhaowen@baidu.com>
2026-01-05 22:55:35 +08:00
Xinyu Dong
07bc24a555 [Bugs] Fix moe when without bias (#76) 2026-01-05 10:51:23 +08:00
Xinyu Dong
fe666fb24f [Feature] Support gpt-oss and update model list (#71)
* [Docs] Update Support Models

* [Feature] Support gpt-oss

* [Docs] fix model support list

* Fix Moe

* Fix

* Fix moe_ep

* remove gpt oss graph support , not yet

---------

Co-authored-by: hanhaowen <hanhaowen@baidu.com>
2026-01-04 21:19:49 +08:00
Xinyu Dong
b3c30a3cb9 [Feature] Support XiaoMi MIMO Flash V2 (#62)
* [Feature] Support MIMO Flash V2
2025-12-31 10:16:33 +08:00
chenyili
7c22d621fb Commit the vllm0.11.0 development branch 2025-12-10 17:51:24 +08:00
dongxinyu03
c728e52505 Initial commit for vLLM-Kunlun Plugin 2025-12-10 12:05:39 +08:00