starkwj
34e04c5569
update base image
2026-03-02 18:46:04 +08:00
starkwj
a15754c3ba
add readme
2026-03-02 18:40:49 +08:00
starkwj
4d8575115a
add vxpu
2026-03-02 18:38:10 +08:00
lishaobing448
dc63e81a7f
fix: use cuda visible (#244)
Signed-off-by: lishaobing448 <shaobingli2024@163.com>
2026-03-02 17:33:13 +08:00
Li Wei
e4c9b9f988
[Bugfix] cocopod ops can't be finded (#242)
Signed-off-by: Li Wei <liwei.109@outlook.com>
2026-03-02 15:49:24 +08:00
Joeegin
171f664a0f
[Doc] Update dependencies (#225)
Signed-off-by: Joeegin <3318329726@qq.com>
2026-03-02 10:50:12 +08:00
chanzhennan
82544aa0cc
[Feature] Merge branch 'Qwen3-Next' into main && Support Qwen-next (#222)
Signed-off-by: xyDong0223 <dongxinyu03@baidu.com>
Co-authored-by: xyDong0223 <dongxinyu03@baidu.com>
2026-02-28 11:15:50 +08:00
Lidang Jiang
153093d3b3
[Misc] add collect_env feat (#218)
Signed-off-by: Lidang-Jiang <lidangjiang@gmail.com>
2026-02-27 12:19:58 +08:00
Xinyu Dong
d425a0d0e9
[Docs] Add vLLM-Kunlun New Model Adaptation Manual and Update Model Support (#211)
* [Docs] Fix app.readthedocs buliding
Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>
* [Docs] Add vLLM-Kunlun New Model Adaptation Manual and Update Model Support
Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>
2026-02-26 10:06:58 +08:00
Shiwen Tang
b82b6026d6
[BugFix] Adapt GLM5 config for transformers 4.57 (#207)
Signed-off-by: tangshiwen <tangshiwen@baidu.com>
2026-02-25 18:47:26 +08:00
Xinyu Dong
a470452871
[Docs] Fix app.readthedocs buliding (#210)
Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>
2026-02-17 16:17:25 +08:00
Xinyu Dong
d9ad42a174
[Docs] Fix quantization support description in README (#208)
Updated quantization support description from FP8 to INT8.
2026-02-15 13:12:17 +08:00
Xinyu Dong
77dbc2ddeb
[Docs] Update README (#206)
Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>
2026-02-15 11:05:54 +08:00
Xinyu Dong
76ec220b43
[Bugsfix] Fix run failed (#198)
Signed-off-by: xyDong0223 <dongxinyu03@baidu.com>
2026-02-13 14:07:10 +08:00
Xinyu Dong
bf9369f733
Migrate XTorch operations to Kunlun operations (accelerating iteration) (#177)
Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>
2026-02-12 18:13:00 +08:00
Li Wei
744719587e
[Feature] Support glmx (#194)
Signed-off-by: Li Wei <liwei.109@outlook.com>
Co-authored-by: tangshiwen <tangshiwen@baidu.com>
Co-authored-by: Xinyu Dong <dongxinyu03@baidu.com>
2026-02-12 15:40:42 +08:00
Xinyu Dong
070bfa4a73
[Bugfix] Fixed Kunlun Graph Failed (#193)
Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>
2026-02-11 18:52:18 +08:00
fromck
fc48b79ae9
support glm4.7 mtp (#187)
Signed-off-by: chengxiaokang <chengxiaokang@baidu.com>
Co-authored-by: chengxiaokang <chengxiaokang@baidu.com>
2026-02-11 18:32:30 +08:00
WANG HAO
bd8c999335
Further optimize multi-lora inference,LoRA-enabled performance achieves 80%+ of non-LoRA performance (#190)
* optimize lora inference
Signed-off-by: wanghao <wanghao@example.com>
* further optimize multi-lora inference,LoRA-enabled performance achieves 80%+ of non-LoRA performance
Signed-off-by: wanghao <wanghao@example.com>
---------
Signed-off-by: wanghao <wanghao@example.com>
Co-authored-by: wanghao <wanghao@example.com>
2026-02-11 12:04:14 +08:00
WeiJie_Hong
9b1f25fbe3
[Doc] update xspeedgate_ops (20260130) (#188)
Signed-off-by: WeiJie_Hong <1462519292@qq.com>
2026-02-10 18:05:20 +08:00
WeiJie_Hong
42c7ef2f27
[Doc] add DeepSeek-V3.2-Exp-w8a8 to installation.md and tutorials (#186)
Signed-off-by: WeiJie_Hong <1462519292@qq.com>
2026-02-10 17:18:32 +08:00
WANG HAO
6f30bc439d
clean pr for ds.2 mtp support (#164)
* Add MTP support in eagle.py
Signed-off-by: wanghao129 <wanghao129@baidu.com>
* new pr for mtp
Signed-off-by: wanghao129 <wanghao129@baidu.com>
* Revert formatting changes in deepseek_v2.py
Signed-off-by: wanghao129 <wanghao129@baidu.com>
---------
Signed-off-by: wanghao129 <wanghao129@baidu.com>
Co-authored-by: wanghao129 <wanghao129@baidu.com>
2026-02-02 15:23:33 +08:00
WeiJie_Hong
42a2d38f47
[CI/Build] Fixed bug related to conflicts in the code inspection tool (#169)
Signed-off-by: WeiJie_Hong <1462519292@qq.com>
2026-02-02 12:03:02 +08:00
fromck
6f12830839
[Kernel] add topk_per_row to optimize the calculation of topk_indexes (#168)
Signed-off-by: chengxiaokang <chengxiaokang@baidu.com>
Co-authored-by: chengxiaokang <chengxiaokang@baidu.com>
2026-02-02 11:07:49 +08:00
astrophel0
726cefb7a3
[dev]add glm4.7 tool-parser (#151)
Signed-off-by: zhangzhenyi <zhangzhenyi@baidu.com>
Co-authored-by: Li Wei <liwei.109@outlook.com>
2026-02-01 13:53:47 +08:00
1916hcc
e28b697458
[CI/Build] Refactor E2E CI: split monolithic workflow into modular scripts (#162)
Signed-off-by: Chenchao Hu <huchenchao@example.com>
Co-authored-by: Chenchao Hu <huchenchao@example.com>
2026-01-29 18:57:09 +08:00
tanjunchen
1e1e870a71
update ci workflow (#159)
Signed-off-by: tanjunchen <tanjunchen20@gmail.com>
2026-01-28 20:28:38 +08:00
1916hcc
7c2966a98c
[CI/Build] Add CI end-to-end (E2E) tests (#139)
* [CI/Build] Add CI end-to-end (E2E) tests
Signed-off-by: Chenchao Hu <huchenchao@example.com>
2026-01-28 19:30:55 +08:00
Joeegin
c37ee19e3d
[CI] Add UT CI (#157)
Signed-off-by: Joeegin <3318329726@qq.com>
2026-01-28 18:00:16 +08:00
WeiJie_Hong
d18df18499
[CI/Build] update .pre-commit-config.yaml && add _pylint.yml && update installation.md (#155)
Signed-off-by: WeiJie_Hong <1462519292@qq.com>
2026-01-28 17:58:46 +08:00
Li Wei
71bd70ad6c
[Feature] support compressed-tensors w4a16 quantization (#154)
- native int4 kimi model inference is supported
Signed-off-by: Li Wei <liwei.109@outlook.com>
2026-01-27 19:56:22 +08:00
Shiwen Tang
0711c1abfa
[Feature] Support AWQ MoE W4A16 Quantization (#142)
Signed-off-by: tangshiwen <tangshiwen@baidu.com>
Co-authored-by: Li Wei <liwei.109@outlook.com>
2026-01-26 18:56:05 +08:00
WeiJie_Hong
2a998286c0
[Doc] update base image url (1. Replace conda with uv; 2. Integrate xpytorch and ops into the image.) (#146)
Signed-off-by: WeiJie_Hong <1462519292@qq.com>
2026-01-23 18:55:56 +08:00
1916hcc
c0f06d04b1
[Doc] docs: remove internal pip index from requirements (#147)
Signed-off-by: Chenchao Hu <huchenchao@example.com>
Co-authored-by: Chenchao Hu <huchenchao@example.com>
2026-01-23 18:55:34 +08:00
baoqian426
1eaa1336ac
[Bugfix]remove mla patch, server args no need --compilation-config for ds v3.1 (#145)
Signed-off-by: baoqian426 <1354987947@qq.com>
2026-01-23 15:59:43 +08:00
fromck
0ce5f1a3f7
Add kernels to optimize RoPE and the decoding stage (#143)
Co-authored-by: chengxiaokang <chengxiaokang@baidu.com>
2026-01-23 10:29:52 +08:00
Lidang Jiang
9e13f23661
[Doc] Optimize the document (#136)
2026-01-22 14:12:44 +08:00
Joeegin
58f570ddea
[Docs] Add XPU tutorials for Qwen / InternVL (#140)
Signed-off-by: Joeegin <3318329726@qq.com>
2026-01-22 13:50:49 +08:00
fromck
74d4f804e8
add 2 kernels and optimize the calculation of topk_indices (#134)
Co-authored-by: chengxiaokang <chengxiaokang@baidu.com>
2026-01-22 10:29:28 +08:00
yuqilinaa
c9f00c132c
[Kernel] Enable fast random sample on Kunlun3 Platform with generators (#73)
Co-authored-by: Xinyu Dong <dongxinyu03@baidu.com>
2026-01-20 21:49:33 +08:00
WANG HAO
c404af3a41
[Feature] totaly support multi-lora support,latest xspeedgate needed (#133)
Co-authored-by: wanghao <wanghao@example.com>
2026-01-20 21:27:02 +08:00
youzeyu
92b40628cd
delete glmGlmForCausalLM register (#132)
Co-authored-by: hanhaowen <hanhaowen@baidu.com>
2026-01-20 19:22:33 +08:00
haoli5009-debug
561a235a3f
[CI/Build] Modify biweekly report readme files (#131)
Co-authored-by: v_lihao66 <v_lihao66@baidu.com>
2026-01-20 16:58:36 +08:00
Li Wei
2a2d773ad0
[fix]bias bug in kunlun_scale_mm (#126)
2026-01-20 13:24:52 +08:00
Li Wei
f2019b145f
Revert "support glm47 in 0.11.0 version (#116)" (#123)
This reverts commit 9006e37979.
2026-01-20 10:46:11 +08:00
roger-lcc
9006e37979
support glm47 in 0.11.0 version (#116)
* support glm47 in 0.11.0 version
* support glm47 in 0.11.0 version
---------
Co-authored-by: luochencheng <luochencheng@baidu.com>
2026-01-19 20:26:26 +08:00
Li Wei
8f56cbf3ed
[refactor]update Kunlun classes with monkey patch (#122)
Signed-off-by: Li Wei <liwei.109@outlook.com>
2026-01-19 20:24:19 +08:00
baoqian426
2512259944
longcontext chunk make attention crash, fix it (#117)
Co-authored-by: root <root@rdtest-node1150.bcc-zwlt.baidu.com>
2026-01-17 18:38:23 +08:00
fromck
71a5a04e0a
[Misc]Specify that DS32 only supports --kv-cache-dtype bfloat16 (#119)
* [Kernel] add kernels to torch.ops
* [Misc]Specify that DS only supports --kv-cache-dtype bfloat16
---------
Co-authored-by: chengxiaokang <chengxiaokang@baidu.com>
2026-01-17 16:52:02 +08:00
Shiwen Tang
8988ad08b2
[Feature] Support Mixed-Precision Quantization for MoE (#112)
2026-01-14 18:42:18 +08:00