commit 6f30bc439d
Author: WANG HAO
Date:   2026-02-02 15:23:33 +08:00

    clean pr for ds.2 mtp support (#164)

    * Add MTP support in eagle.py
    * new pr for mtp
    * Revert formatting changes in deepseek_v2.py

    Signed-off-by: wanghao129 <wanghao129@baidu.com>
    Co-authored-by: wanghao129 <wanghao129@baidu.com>

commit 71bd70ad6c
Author: Li Wei
Date:   2026-01-27 19:56:22 +08:00

    [Feature] support compressed-tensors w4a16 quantization (#154)

    - native int4 kimi model inference is supported

    Signed-off-by: Li Wei <liwei.109@outlook.com>

commit 0711c1abfa
Author: Shiwen Tang
Date:   2026-01-26 18:56:05 +08:00

    [Feature] Support AWQ MoE W4A16 Quantization (#142)

    Signed-off-by: tangshiwen <tangshiwen@baidu.com>
    Co-authored-by: Li Wei <liwei.109@outlook.com>

commit 1c1b84d78c
Author: Li Wei
Date:   2026-01-06 22:30:27 +08:00

    [fix] update compressed-tensors scheme

    DeepSeek v3.2 is now supported.

    Signed-off-by: Li Wei <liwei.109@outlook.com>

commit 515a4eeda9
Author: Li Wei
Date:   2026-01-06 13:51:53 +08:00

    [dev] support compressed-tensors w8a8 quantization (#75)

    * [dev] support compressed-tensors w8a8 quantization
    * [refactor] update KunlunScaleMMKernel impl
    * [rebase] resolve conflicts and remove redundant code

    Co-authored-by: Li Wei <liwei.109@outlook.com>
    Co-authored-by: tangshiwen <tangshiwen@baidu.com>

commit ee0f50e68f
Author: baoqian426
Date:   2026-01-05 22:55:35 +08:00

    [Feature] support deepseek v3/r1/v3.2 (#78)

    * [Feature] support deepseek v3/r1/v3.2
    * fix gpt_oss
    * update readme

    Co-authored-by: hanhaowen <hanhaowen@baidu.com>

commit b015bb76fd
Author: hanhaowen
Date:   2025-12-31 11:39:37 +08:00

    remove qwen2.py and llama.py; fix llama output

commit b3c30a3cb9
Author: Xinyu Dong
Date:   2025-12-31 10:16:33 +08:00

    [Feature] Support XiaoMi MIMO Flash V2 (#62)

commit 9cee025f41
Author: Li Wei
Date:   2025-12-29 19:56:24 +08:00

    Merge pull request #59 from liwei109/aicapx-quant

    [fix] remove weight_loader_v2 to support cuda graph

commit 6546323c71
Author: Li Wei
Date:   2025-12-24 13:46:06 +08:00

    [dev] support AWQ/GPTQ quantization for dense models

commit 7c22d621fb
Author: chenyili
Date:   2025-12-10 17:51:24 +08:00

    Commit the vllm 0.11.0 development branch

commit c728e52505
Author: dongxinyu03
Date:   2025-12-10 12:05:39 +08:00

    Initial commit for vLLM-Kunlun Plugin