Xinyu Dong
|
bf9369f733
|
Migrate XTorch operations to Kunlun operations (accelerating iteration) (#177)
Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>
|
2026-02-12 18:13:00 +08:00 |
|
Li Wei
|
744719587e
|
[Feature] Support glmx (#194)
Signed-off-by: Li Wei <liwei.109@outlook.com>
Co-authored-by: tangshiwen <tangshiwen@baidu.com>
Co-authored-by: Xinyu Dong <dongxinyu03@baidu.com>
|
2026-02-12 15:40:42 +08:00 |
|
WANG HAO
|
6f30bc439d
|
clean pr for ds.2 mtp support (#164)
* Add MTP support in eagle.py
Signed-off-by: wanghao129 <wanghao129@baidu.com>
* new pr for mtp
Signed-off-by: wanghao129 <wanghao129@baidu.com>
* Revert formatting changes in deepseek_v2.py
Signed-off-by: wanghao129 <wanghao129@baidu.com>
---------
Signed-off-by: wanghao129 <wanghao129@baidu.com>
Co-authored-by: wanghao129 <wanghao129@baidu.com>
|
2026-02-02 15:23:33 +08:00 |
|
Li Wei
|
71bd70ad6c
|
[Feature] support compressed-tensors w4a16 quantization (#154)
- native int4 kimi model inference is supported
Signed-off-by: Li Wei <liwei.109@outlook.com>
|
2026-01-27 19:56:22 +08:00 |
|
Shiwen Tang
|
0711c1abfa
|
[Feature] Support AWQ MoE W4A16 Quantization (#142)
Signed-off-by: tangshiwen <tangshiwen@baidu.com>
Co-authored-by: Li Wei <liwei.109@outlook.com>
|
2026-01-26 18:56:05 +08:00 |
|
baoqian426
|
1eaa1336ac
|
[Bugfix] remove mla patch; server args no longer need --compilation-config for ds v3.1 (#145)
Signed-off-by: baoqian426 <1354987947@qq.com>
|
2026-01-23 15:59:43 +08:00 |
|
fromck
|
0ce5f1a3f7
|
Add kernels to optimize RoPE and the decoding stage (#143)
Co-authored-by: chengxiaokang <chengxiaokang@baidu.com>
|
2026-01-23 10:29:52 +08:00 |
|
fromck
|
74d4f804e8
|
add 2 kernels and optimize the calculation of topk_indices (#134)
Co-authored-by: chengxiaokang <chengxiaokang@baidu.com>
|
2026-01-22 10:29:28 +08:00 |
|
Li Wei
|
2a2d773ad0
|
[fix]bias bug in kunlun_scale_mm (#126)
|
2026-01-20 13:24:52 +08:00 |
|
Li Wei
|
8f56cbf3ed
|
[refactor]update Kunlun classes with monkey patch (#122)
Signed-off-by: Li Wei <liwei.109@outlook.com>
|
2026-01-19 20:24:19 +08:00 |
|
baoqian426
|
2512259944
|
Fix attention crash caused by long-context chunking (#117)
Co-authored-by: root <root@rdtest-node1150.bcc-zwlt.baidu.com>
|
2026-01-17 18:38:23 +08:00 |
|
fromck
|
71a5a04e0a
|
[Misc]Specify that DS32 only supports --kv-cache-dtype bfloat16 (#119)
* [Kernel] add kernels to torch.ops
* [Misc]Specify that DS only supports --kv-cache-dtype bfloat16
---------
Co-authored-by: chengxiaokang <chengxiaokang@baidu.com>
|
2026-01-17 16:52:02 +08:00 |
|
Shiwen Tang
|
8988ad08b2
|
[Feature] Support Mixed-Precision Quantization for MoE (#112)
|
2026-01-14 18:42:18 +08:00 |
|
baoqian426
|
eb40e8a07a
|
[Bugfix] fix failure to import compressed_tensors (#87)
Co-authored-by: root <root@rdtest-node1150.bcc-zwlt.baidu.com>
|
2026-01-07 11:32:10 +08:00 |
|
Li Wei
|
1c1b84d78c
|
[fix]update compressed-tensors scheme
Deepseek v3.2 is supported now
Signed-off-by: Li Wei <liwei.109@outlook.com>
|
2026-01-06 22:30:27 +08:00 |
|
Li Wei
|
9533f68e99
|
[fix]matmul not support cuda graph
|
2026-01-06 17:32:45 +08:00 |
|
Li Wei
|
515a4eeda9
|
[dev] support compressed-tensors w8a8 quantization (#75)
* [dev] support compressed-tensors w8a8 quantization
Co-authored-by: Li Wei <liwei.109@outlook.com>
* [refact]update KunlunScaleMMKernel impl
* [rebase]resolve conflicts and remove redundant code
---------
Co-authored-by: tangshiwen <tangshiwen@baidu.com>
|
2026-01-06 13:51:53 +08:00 |
|
baoqian426
|
ee0f50e68f
|
[Feature] support deepseek v3/r1/v3.2 (#78)
* [Feature] support deepseek v3/r1/v3.2
* fix gpt_oss
* update readme
* update readme
---------
Co-authored-by: hanhaowen <hanhaowen@baidu.com>
|
2026-01-05 22:55:35 +08:00 |
|
Xinyu Dong
|
07bc24a555
|
[Bugs] Fix moe when without bias (#76)
|
2026-01-05 10:51:23 +08:00 |
|
callmelaoyi
|
b86953acf9
|
[Kernel] Qwen3-next: optimize recompute_w_u_fwd & chunk_fwd_o (#74)
Co-authored-by: yuanjizhong <yuanjizhong@baidu.com>
|
2026-01-05 10:24:51 +08:00 |
|
Xinyu Dong
|
fe666fb24f
|
[Feature] Support gpt-oss and update model list (#71)
* [Docs] Update Support Models
* [Feature] Support gpt-oss
* [Docs] fix model support list
* Fix Moe
* Fix
* Fix moe_ep
* remove gpt oss graph support , not yet
---------
Co-authored-by: hanhaowen <hanhaowen@baidu.com>
|
2026-01-04 21:19:49 +08:00 |
|
hanhaowen
|
b015bb76fd
|
remove qwen2.py and llama.py; fix llama output
|
2025-12-31 11:39:37 +08:00 |
|
Xinyu Dong
|
b3c30a3cb9
|
[Feature] Support XiaoMi MIMO Flash V2 (#62)
* [Feature] Support MIMO Flash V2
|
2025-12-31 10:16:33 +08:00 |
|
Li Wei
|
9cee025f41
|
Merge pull request #59 from liwei109/aicapx-quant
[fix]remove weight_loader_v2 to support cuda graph
|
2025-12-29 19:56:24 +08:00 |
|
baoqian426
|
45c6b8e927
|
Merge pull request #52 from liwei109/awq_gptq
[dev] support AWQ/GPTQ quantization for dense models
|
2025-12-24 17:05:26 +08:00 |
|
Li Wei
|
6546323c71
|
[dev] support AWQ/GPTQ quantization for dense models
|
2025-12-24 13:46:06 +08:00 |
|
Li Wei
|
383eb5459a
|
[refactor] remove redundant code in linear
|
2025-12-24 12:02:09 +08:00 |
|
Xinyu Dong
|
75d0bdae2f
|
Merge pull request #40 from ldh2020/v0.11.0dev
[Kernel] Optimize the performance of Qwen3-Next
|
2025-12-22 21:50:27 +08:00 |
|
hanhaowen
|
a4b9e92ca1
|
[Kernel] Replace native torch solve_tril by solve_tril_fwd kernel op
|
2025-12-22 17:37:19 +08:00 |
|
ldh2020
|
004e164bdb
|
[Kernel] Optimize the recurrent op
|
2025-12-21 11:18:00 +08:00 |
|
ldh2020
|
fce97df908
|
[Kernel] Use l2norm kernel op instead of triton op.
|
2025-12-16 16:24:47 +08:00 |
|
ldh2020
|
cff4727fbb
|
[Kernel] Optimize the performance of causal_conv1d.
|
2025-12-12 17:22:35 +08:00 |
|
ldh2020
|
9bb2ee06a4
|
[Bugfix] fix the bug of torch_solve_tril
|
2025-12-12 17:01:50 +08:00 |
|
chenyili
|
7c22d621fb
|
Commit the vLLM 0.11.0 development branch
|
2025-12-10 17:51:24 +08:00 |
|
zhaoyingzhuo
|
b614823125
|
[chore] Remove obsolete comments
|
2025-12-10 15:52:23 +08:00 |
|
dongxinyu03
|
c728e52505
|
Initial commit for vLLM-Kunlun Plugin
|
2025-12-10 12:05:39 +08:00 |
|