Xinyu Dong
|
b3c30a3cb9
|
[Feature] Support XiaoMi MIMO Flash V2 (#62)
* [Feature] Support MIMO Flash V2
|
2025-12-31 10:16:33 +08:00 |
|
Li Wei
|
9cee025f41
|
Merge pull request #59 from liwei109/aicapx-quant
[fix] remove weight_loader_v2 to support CUDA graph
|
2025-12-29 19:56:24 +08:00 |
|
baoqian426
|
45c6b8e927
|
Merge pull request #52 from liwei109/awq_gptq
[dev] support AWQ/GPTQ quantization for dense models
|
2025-12-24 17:05:26 +08:00 |
|
Li Wei
|
6546323c71
|
[dev] support AWQ/GPTQ quantization for dense models
|
2025-12-24 13:46:06 +08:00 |
|
Li Wei
|
383eb5459a
|
[refactor] remove redundant code in linear
|
2025-12-24 12:02:09 +08:00 |
|
Xinyu Dong
|
75d0bdae2f
|
Merge pull request #40 from ldh2020/v0.11.0dev
[Kernel] Optimize the performance of Qwen3-Next
|
2025-12-22 21:50:27 +08:00 |
|
hanhaowen
|
a4b9e92ca1
|
[Kernel] Replace native torch solve_tril by solve_tril_fwd kernel op
|
2025-12-22 17:37:19 +08:00 |
|
ldh2020
|
8261a09e2a
|
[Kernel] Optimize the selection and update OP of ssm state
|
2025-12-21 15:45:32 +08:00 |
|
ldh2020
|
b97c781300
|
[Kernel] Optimize the recurrent op
|
2025-12-21 11:22:06 +08:00 |
|
ldh2020
|
004e164bdb
|
[Kernel] Optimize the recurrent op
|
2025-12-21 11:18:00 +08:00 |
|
ldh2020
|
58c1db5073
|
[Bugfix] fix the bug of the flash_attention in Qwen3-Next
|
2025-12-21 10:34:43 +08:00 |
|
Xinyu Dong
|
6f96615ee3
|
Merge pull request #23 from ldh2020/v0.11.0dev
[Kernel] Use l2norm kernel op instead of triton op.
|
2025-12-19 15:26:18 +08:00 |
|
chenyili0619
|
2e2933d217
|
[Bug] Fixed the issue where an error occurred when the request included a seed.
|
2025-12-18 13:03:34 +08:00 |
|
ldh2020
|
fce97df908
|
[Kernel] Use l2norm kernel op instead of triton op.
|
2025-12-16 16:24:47 +08:00 |
|
Xinyu Dong
|
5a75795ade
|
[Model] Update llama.py
Remove redundancy
|
2025-12-15 21:28:56 +08:00 |
|
Xinyu Dong
|
7c7d0326c5
|
[Model] Register llama.py with vLLM
|
2025-12-15 21:21:28 +08:00 |
|
Xinyu Dong
|
ca059110b3
|
[Model] Support llama3 on v0.11.0
FULL AND PIECEWISE GRAPH ENABLED
|
2025-12-15 21:20:44 +08:00 |
|
ldh2020
|
cff4727fbb
|
[Kernel] Optimize the performance of causal_conv1d.
|
2025-12-12 17:22:35 +08:00 |
|
ldh2020
|
9bb2ee06a4
|
[Bugfix] fix the bug of torch_solve_tril
|
2025-12-12 17:01:50 +08:00 |
|
baoqian426
|
fae22c2e62
|
Merge pull request #3 from xyDong0223/main
[Kernel] Enable fast random sample on Kunlun3 Platform
|
2025-12-11 11:47:30 +08:00 |
|
xyDong0223
|
af2cd6097f
|
[Kernel] Fix missing import os
|
2025-12-11 11:17:28 +08:00 |
|
xyDong0223
|
670c2397b8
|
[Kernel] Enable fast random sample on Kunlun P
|
2025-12-10 21:52:48 +08:00 |
|
chenyili
|
7c22d621fb
|
Commit the vLLM 0.11.0 development branch
|
2025-12-10 17:51:24 +08:00 |
|
zhaoyingzhuo
|
b614823125
|
[chore] Remove obsolete comments
|
2025-12-10 15:52:23 +08:00 |
|
dongxinyu03
|
c728e52505
|
Initial commit for vLLM-Kunlun Plugin
|
2025-12-10 12:05:39 +08:00 |
|