070bfa4a73  Xinyu Dong  2026-02-11 18:52:18 +08:00
    [Bugfix] Fix Kunlun graph failure (#193)
    Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>

bd8c999335  WANG HAO  2026-02-11 12:04:14 +08:00
    Further optimize multi-LoRA inference; LoRA-enabled performance reaches 80%+ of non-LoRA performance (#190)
    * optimize lora inference
    * further optimize multi-LoRA inference; LoRA-enabled performance reaches 80%+ of non-LoRA performance
    Signed-off-by: wanghao <wanghao@example.com>
    Co-authored-by: wanghao <wanghao@example.com>

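For context on what "LoRA-enabled" means here, a minimal multi-LoRA request through vLLM's offline API might look like the sketch below. The model name and adapter path are placeholders, and the 80%+ figure above refers to the optimized kernels in this PR, not anything this snippet measures.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder model and adapter path; enable_lora / max_loras are standard
# vLLM engine arguments -- the Kunlun-specific speedups happen underneath.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_lora=True, max_loras=4)

outputs = llm.generate(
    ["Summarize LoRA in one sentence."],
    SamplingParams(max_tokens=64),
    # (name, unique int id, path) selects which adapter serves this request.
    lora_request=LoRARequest("demo_adapter", 1, "/path/to/adapter"),
)
print(outputs[0].outputs[0].text)
```
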
71bd70ad6c  Li Wei  2026-01-27 19:56:22 +08:00
    [Feature] support compressed-tensors w4a16 quantization (#154)
    - native INT4 Kimi model inference is supported
    Signed-off-by: Li Wei <liwei.109@outlook.com>

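A sketch of how such a checkpoint is typically consumed; the model id below is hypothetical, and vLLM normally auto-detects the scheme from the checkpoint config, so the explicit quantization argument is optional.

```python
from vllm import LLM, SamplingParams

# "some-org/kimi-w4a16" is a hypothetical checkpoint id; any model whose
# config carries a compressed-tensors W4A16 scheme would load the same way.
llm = LLM(model="some-org/kimi-w4a16", quantization="compressed-tensors")
out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```
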
0711c1abfa  Shiwen Tang  2026-01-26 18:56:05 +08:00
    [Feature] Support AWQ MoE W4A16 Quantization (#142)
    Signed-off-by: tangshiwen <tangshiwen@baidu.com>
    Co-authored-by: Li Wei <liwei.109@outlook.com>

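"W4A16" in this and the previous entry means 4-bit weights dequantized against 16-bit activations at matmul time. A reference sketch of the group-wise dequantization follows; the layout and zero-point convention are assumptions, and real kernels operate on packed int4 rather than unpacked integers.

```python
import torch

def dequant_w4a16(q: torch.Tensor, scales: torch.Tensor, zeros: torch.Tensor,
                  group_size: int = 128) -> torch.Tensor:
    # Group-wise INT4 -> fp16 dequantization, the "W4" half of W4A16:
    # w = (q - zero) * scale, one scale/zero per group of input rows.
    k, n = q.shape
    g = q.reshape(k // group_size, group_size, n).float()
    w = (g - zeros[:, None, :]) * scales[:, None, :]
    return w.reshape(k, n).half()

q = torch.randint(0, 16, (256, 8))            # unpacked 4-bit weights
scales = torch.rand(2, 8)                     # one per 128-row group
zeros = torch.full((2, 8), 8.0)
print(dequant_w4a16(q, scales, zeros).shape)  # torch.Size([256, 8])
```
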
0ce5f1a3f7  fromck  2026-01-23 10:29:52 +08:00
    Add kernels to optimize RoPE and the decoding stage (#143)
    Co-authored-by: chengxiaokang <chengxiaokang@baidu.com>

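As background for the RoPE kernel above, here is a plain eager reference of the rotation a fused kernel computes in one pass over Q/K. The interleaved-pair convention is an assumption; implementations also use the half-split (NeoX) layout.

```python
import torch

def rope_ref(x: torch.Tensor, pos: torch.Tensor, base: float = 10000.0):
    # Reference rotary position embedding: rotate each (even, odd) pair of
    # features by an angle proportional to the token position.
    d = x.shape[-1]
    inv_freq = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    ang = pos[:, None].float() * inv_freq[None, :]          # [seq, d/2]
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(4, 64)                       # [seq_len, head_dim]
print(rope_ref(q, torch.arange(4)).shape)    # torch.Size([4, 64])
```
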
71a5a04e0a  fromck  2026-01-17 16:52:02 +08:00
    [Misc] Specify that DS32 only supports --kv-cache-dtype bfloat16 (#119)
    * [Kernel] add kernels to torch.ops
    * [Misc] Specify that DS only supports --kv-cache-dtype bfloat16
    Co-authored-by: chengxiaokang <chengxiaokang@baidu.com>

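In offline-engine terms, the constraint documented above corresponds to something like the following sketch. The model path is a placeholder, and bfloat16 as a kv_cache_dtype value is specific to this plugin; upstream vLLM exposes the same --kv-cache-dtype flag but with a different accepted value set.

```python
from vllm import LLM

# "/models/ds32" is a placeholder path; per the commit above, this model
# on Kunlun must pin the KV cache to bfloat16 (a plugin-specific value).
llm = LLM(model="/models/ds32", kv_cache_dtype="bfloat16")
```
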
37cc307322  roger-lcc  2026-01-13 20:22:14 +08:00
    register apply_repetition_penalties_ in custom_op (#110)
    * fix qwen2_vl for 0.11.0
    * register apply_repetition_penalties_ in custom_op
    Co-authored-by: luochencheng <luochencheng@baidu.com>

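Registering a mutating op through PyTorch's custom-op machinery keeps it visible to torch.compile and graph capture. Below is a minimal sketch under a demo:: namespace with the usual repetition-penalty semantics written out in eager PyTorch; the actual registration in this PR may differ in schema and dispatch details.

```python
import torch

# Requires PyTorch >= 2.4; mutates_args declares the in-place write so the
# compiler does not reorder or eliminate it.
@torch.library.custom_op("demo::apply_repetition_penalties_",
                         mutates_args=("logits",))
def apply_repetition_penalties_(logits: torch.Tensor, prompt_mask: torch.Tensor,
                                output_mask: torch.Tensor,
                                penalties: torch.Tensor) -> None:
    # Standard repetition penalty: for already-seen tokens, divide positive
    # logits by the penalty and multiply negative logits by it.
    seen = prompt_mask | output_mask
    scaled = torch.where(logits > 0,
                         logits / penalties[:, None],
                         logits * penalties[:, None])
    logits[:] = torch.where(seen, scaled, logits)

logits = torch.randn(2, 10)
mask = torch.zeros(2, 10, dtype=torch.bool)
mask[:, :3] = True  # pretend the first 3 vocab ids were seen
apply_repetition_penalties_(logits, mask, torch.zeros_like(mask),
                            torch.full((2,), 1.2))
```
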
2c9b176e6e  baoqian426  2026-01-08 11:05:48 +08:00
    [Feature] use for dp (#90)

f811ae968a  tangshiwen  2026-01-06 20:52:12 +08:00
    [fix] resolve cutlass_scaled_mm inference error

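For reference, cutlass_scaled_mm implements an INT8 GEMM followed by per-token and per-channel rescaling back to floating point. An eager equivalent of the math, where float32 matmul stands in for the int32 tensor-core accumulation (exact for int8 operand ranges):

```python
import torch

def scaled_mm_ref(a_q, b_q, a_scale, b_scale):
    # INT8 x INT8 -> accumulate, then scale back to float; the CUTLASS
    # kernel fuses the scaling into the GEMM epilogue.
    return (a_q.float() @ b_q.float()) * a_scale * b_scale

a_q = torch.randint(-128, 128, (4, 32), dtype=torch.int8)
b_q = torch.randint(-128, 128, (32, 8), dtype=torch.int8)
out = scaled_mm_ref(a_q, b_q, torch.rand(4, 1), torch.rand(1, 8))
print(out.shape)  # torch.Size([4, 8]) -- per-token x per-channel scales
```
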
9533f68e99  Li Wei  2026-01-06 17:32:45 +08:00
    [fix] matmul does not support CUDA graph

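The failure mode here is an op that is not graph-safe. A minimal capture/replay sketch showing the contract a kernel must satisfy (static buffers, no graph-breaking calls); it needs a CUDA device and elides the warm-up-on-a-side-stream detail the PyTorch docs recommend.

```python
import torch

if torch.cuda.is_available():
    a = torch.randn(64, 64, device="cuda")
    b = torch.randn(64, 64, device="cuda")
    out = torch.empty(64, 64, device="cuda")

    _ = a @ b  # warm-up so lazy CUDA init happens outside capture

    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        out.copy_(a @ b)        # captured work writes into static buffers

    a.copy_(torch.randn(64, 64, device="cuda"))
    g.replay()                  # re-runs the captured work on the new input
```
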
515a4eeda9  Li Wei  2026-01-06 13:51:53 +08:00
    [dev] support compressed-tensors w8a8 quantization (#75)
    * [dev] support compressed-tensors w8a8 quantization
    * [refactor] update KunlunScaleMMKernel impl
    * [rebase] resolve conflicts and remove redundant code
    Co-authored-by: Li Wei <liwei.109@outlook.com>
    Co-authored-by: tangshiwen <tangshiwen@baidu.com>

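"W8A8" quantizes both weights and activations to INT8. A reference for the symmetric per-channel weight half of the scheme (per-token activation scales follow the same pattern):

```python
import torch

def quantize_per_channel_int8(w: torch.Tensor):
    # Symmetric per-output-channel INT8 quantization: w ~= w_q * scale.
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    w_q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return w_q, scale

w = torch.randn(8, 16)
w_q, scale = quantize_per_channel_int8(w)
w_hat = w_q.float() * scale
print((w - w_hat).abs().max())  # small round-trip quantization error
```
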
ee0f50e68f  baoqian426  2026-01-05 22:55:35 +08:00
    [Feature] support DeepSeek V3/R1/V3.2 (#78)
    * [Feature] support deepseek v3/r1/v3.2
    * fix gpt_oss
    * update readme
    Co-authored-by: hanhaowen <hanhaowen@baidu.com>

b015bb76fd  hanhaowen  2025-12-31 11:39:37 +08:00
    remove qwen2.py and llama.py; fix llama output

b3c30a3cb9  Xinyu Dong  2025-12-31 10:16:33 +08:00
    [Feature] Support XiaoMi MIMO Flash V2 (#62)

6546323c71  Li Wei  2025-12-24 13:46:06 +08:00
    [dev] support AWQ/GPTQ quantization for dense models

7c22d621fb  chenyili  2025-12-10 17:51:24 +08:00
    Commit the vllm 0.11.0 development branch

c728e52505  dongxinyu03  2025-12-10 12:05:39 +08:00
    Initial commit for vLLM-Kunlun Plugin