Commit Graph

  • 34e04c5569 update base image main starkwj 2026-03-02 18:46:04 +08:00
  • a15754c3ba add readme starkwj 2026-02-12 11:13:26 +08:00
  • 4d8575115a add vxpu starkwj 2026-02-05 19:36:06 +08:00
  • dc63e81a7f fix: use cuda visible (#244) lishaobing448 2026-03-02 17:33:13 +08:00
  • e4c9b9f988 [Bugfix] cocopod ops can't be found (#242) Li Wei 2026-03-02 15:49:24 +08:00
  • 171f664a0f [Doc] Update dependencies (#225) Joeegin 2026-03-02 10:50:12 +08:00
  • 82544aa0cc [Feature] Merge branch 'Qwen3-Next' into main && Support Qwen-next (#222) chanzhennan 2026-02-28 11:15:50 +08:00
  • 153093d3b3 [Misc] add collect_env feat (#218) Lidang Jiang 2026-02-27 12:19:58 +08:00
  • d425a0d0e9 [Docs] Add vLLM-Kunlun New Model Adaptation Manual and Update Model Support (#211) Xinyu Dong 2026-02-26 10:06:58 +08:00
  • b82b6026d6 [BugFix] Adapt GLM5 config for transformers 4.57 (#207) Shiwen Tang 2026-02-25 18:47:26 +08:00
  • a470452871 [Docs] Fix app.readthedocs building (#210) Xinyu Dong 2026-02-17 16:17:25 +08:00
  • d9ad42a174 [Docs] Fix quantization support description in README (#208) Xinyu Dong 2026-02-15 13:12:17 +08:00
  • 77dbc2ddeb [Docs] Update README (#206) Xinyu Dong 2026-02-15 11:05:54 +08:00
  • 76ec220b43 [Bugsfix] Fix run failed (#198) Xinyu Dong 2026-02-13 14:07:10 +08:00
  • bf9369f733 Migrate XTorch operations to Kunlun operations (accelerating iteration) (#177) Xinyu Dong 2026-02-12 18:13:00 +08:00
  • 744719587e [Feature] Support glmx (#194) Li Wei 2026-02-12 15:40:42 +08:00
  • cea31d16fb add readme v0.11.0-v0.0.1 starkwj 2026-02-12 11:13:26 +08:00
  • 01bafad6d0 add vxpu starkwj 2026-02-05 19:36:06 +08:00
  • 070bfa4a73 [Bugfix] Fixed Kunlun Graph Failed (#193) Xinyu Dong 2026-02-11 18:52:18 +08:00
  • fc48b79ae9 support glm4.7 mtp (#187) fromck 2026-02-11 18:32:30 +08:00
  • bd8c999335 Further optimize multi-lora inference,LoRA-enabled performance achieves 80%+ of non-LoRA performance (#190) WANG HAO 2026-02-11 12:04:14 +08:00
  • 9b1f25fbe3 [Doc] update xspeedgate_ops (20260130) (#188) WeiJie_Hong 2026-02-10 18:05:20 +08:00
  • 42c7ef2f27 [Doc] add DeepSeek-V3.2-Exp-w8a8 to installation.md and tutorials (#186) WeiJie_Hong 2026-02-10 17:18:32 +08:00
  • 6f30bc439d clean pr for ds.2 mtp support (#164) WANG HAO 2026-02-02 15:23:33 +08:00
  • 42a2d38f47 [CI/Build] Fixed bug related to conflicts in the code inspection tool (#169) WeiJie_Hong 2026-02-02 12:03:02 +08:00
  • 6f12830839 [Kernel] add topk_per_row to optimize the calculation of topk_indexes (#168) fromck 2026-02-02 11:07:49 +08:00
  • 726cefb7a3 [dev]add glm4.7 tool-parser (#151) astrophel0 2026-01-30 15:24:14 +08:00
  • e28b697458 [CI/Build] Refactor E2E CI: split monolithic workflow into modular scripts (#162) 1916hcc 2026-01-29 18:57:09 +08:00
  • 1e1e870a71 update ci workflow (#159) tanjunchen 2026-01-28 20:28:38 +08:00
  • 7c2966a98c [CI/Build] Add CI end-to-end (E2E) tests (#139) 1916hcc 2026-01-28 19:30:55 +08:00
  • c37ee19e3d [CI] Add UT CI (#157) Joeegin 2026-01-28 18:00:16 +08:00
  • d18df18499 [CI/Build] update .pre-commit-config.yaml && add _pylint.yml && update installation.md (#155) WeiJie_Hong 2026-01-28 17:58:46 +08:00
  • 71bd70ad6c [Feature] support compressed-tensors w4a16 quantization (#154) Li Wei 2026-01-27 19:56:22 +08:00
  • 0711c1abfa [Feature] Support AWQ MoE W4A16 Quantization (#142) Shiwen Tang 2026-01-26 18:56:05 +08:00
  • 2a998286c0 [Doc] update base image url(1.Replace conda with uv; 2.Integrate xpytorch and ops into the image.) (#146) WeiJie_Hong 2026-01-23 18:55:56 +08:00
  • c0f06d04b1 [Doc] docs: remove internal pip index from requirements (#147) 1916hcc 2026-01-23 18:55:34 +08:00
  • 1eaa1336ac [Bugfix]remove mla patch, server args no need --compilation-config for ds v3.1 (#145) baoqian426 2026-01-23 15:59:43 +08:00
  • 0ce5f1a3f7 Add kernels to optimize RoPE and the decoding stage (#143) fromck 2026-01-23 10:29:52 +08:00
  • 9e13f23661 [Doc] Optimize the document (#136) Lidang Jiang 2026-01-22 14:12:44 +08:00
  • 58f570ddea [Docs] Add XPU tutorials for Qwen / InternVL (#140) Joeegin 2026-01-22 13:50:49 +08:00
  • 74d4f804e8 add 2 kernels and optimize the calculation of topk_indices (#134) fromck 2026-01-22 10:29:28 +08:00
  • c9f00c132c [Kernel] Enable fast random sample on Kunlun3 Platform with generators (#73) yuqilinaa 2026-01-20 21:49:33 +08:00
  • c404af3a41 [Feature] totally support multi-lora, latest xspeedgate needed (#133) WANG HAO 2026-01-20 21:27:02 +08:00
  • 92b40628cd delete glmGlmForCausalLM register (#132) youzeyu 2026-01-20 19:22:33 +08:00
  • 561a235a3f [CI/Build] Modify biweekly report readme files (#131) haoli5009-debug 2026-01-20 16:58:36 +08:00
  • 2a2d773ad0 [fix]bias bug in kunlun_scale_mm (#126) Li Wei 2026-01-20 13:24:52 +08:00
  • f2019b145f Revert "support glm47 in 0.11.0 version (#116)" (#123) Li Wei 2026-01-20 10:46:11 +08:00
  • 9006e37979 support glm47 in 0.11.0 version (#116) roger-lcc 2026-01-19 20:26:26 +08:00
  • 8f56cbf3ed [refactor]update Kunlun classes with monkey patch (#122) Li Wei 2026-01-19 20:24:19 +08:00
  • 2512259944 longcontext chunk make attention crash, fix it (#117) baoqian426 2026-01-17 18:38:23 +08:00
  • 71a5a04e0a [Misc]Specify that DS32 only supports --kv-cache-dtype bfloat16 (#119) fromck 2026-01-17 16:52:02 +08:00
  • 8988ad08b2 [Feature] Support Mixed-Precision Quantization for MoE (#112) Shiwen Tang 2026-01-14 18:42:18 +08:00
  • 6706651646 Merge pull request #91 from zhihui96/dsv31 baoqian426 2026-01-14 15:12:30 +08:00
  • 115eb32068 enable int8 bmm wzh 2026-01-14 14:30:59 +08:00
  • f0bf384e2e Merge branch 'baidu:main' into dsv31 zhihui96 2026-01-14 14:21:57 +08:00
  • 7ed71432ca [Bug] Fix InternVL KeyError: ((1, 1, 3), '<i8') (#108) Lidang Jiang 2026-01-13 22:36:03 +08:00
  • 37cc307322 register apply_repetition_penalties_ in custom_op (#110) roger-lcc 2026-01-13 20:22:14 +08:00
  • fb424acca7 Merge pull request #106 from baoqian426/enable-full-cudagraph-deepseek baoqian426 2026-01-13 09:57:56 +08:00
  • bd90350968 [Bug] Fix no apply_top_k_top_p issue. (#101) Jin Hanyu 2026-01-12 16:38:03 +08:00
  • 18fc1c006e update maintainer for vllm-kunlun (#100) tanjunchen 2026-01-12 16:37:22 +08:00
  • ff8ebfa208 enable full cudagraph for deepseek hanhaowen 2026-01-12 15:18:12 +08:00
  • 87a57e43ca [Docs] Update URL (#98) Xinyu Dong 2026-01-10 06:02:10 +08:00
  • 7be26ca617 [Bugs] Fix Docs Build Problem (#97) Xinyu Dong 2026-01-10 05:55:40 +08:00
  • 8c9cabd760 Merge pull request #96 from xyDong0223/main baoqian426 2026-01-09 17:20:17 +08:00
  • 462c44e2ac [Docs] Fix v0.11.0 Docs config Xinyu Dong 2026-01-09 17:07:18 +08:00
  • 0455b49519 [Bugs] fix qwen2_vl for 0.11.0 (#94) roger-lcc 2026-01-09 15:05:40 +08:00
  • df436a47f6 test wzh 2026-01-08 16:02:15 +08:00
  • 2c9b176e6e [Feature] use for dp (#90) baoqian426 2026-01-08 11:05:48 +08:00
  • c403d921ff [doc] update quantization guide doc (#88) Li Wei 2026-01-07 15:39:51 +08:00
  • eb40e8a07a [Bugfix] fix can not import compressed_tensors (#87) baoqian426 2026-01-07 11:32:10 +08:00
  • 62a97db6ed Merge pull request #85 from liwei109/liwei-dev baoqian426 2026-01-07 09:27:23 +08:00
  • 1c1b84d78c [fix]update compressed-tensors scheme Li Wei 2026-01-06 22:30:27 +08:00
  • 9c2b908908 Merge pull request #84 from xyDong0223/main baoqian426 2026-01-06 21:56:31 +08:00
  • c5e4d23e3e Merge pull request #82 from liwei109/quant baoqian426 2026-01-06 21:42:55 +08:00
  • 26b311ccf5 [Feature] DeepSeek Support MTP dongxinyu03 2026-01-06 21:37:21 +08:00
  • f811ae968a [fix] resolve cutlass_scaled_mm inference error tangshiwen 2026-01-06 20:52:12 +08:00
  • c54b2d2a2d Merge pull request #80 from liwei109/aicapx-quant baoqian426 2026-01-06 17:49:09 +08:00
  • 9533f68e99 [fix]matmul not support cuda graph Li Wei 2026-01-06 16:07:29 +08:00
  • 515a4eeda9 [dev] support compressed-tensors w8a8 quantization (#75) Li Wei 2026-01-06 13:51:53 +08:00
  • ee0f50e68f [Feature] support deepseek v3/r1/v3.2 (#78) baoqian426 2026-01-05 22:55:35 +08:00
  • 07bc24a555 [Bugs] Fix moe when without bias (#76) Xinyu Dong 2026-01-05 10:51:23 +08:00
  • b86953acf9 [Kernel] Qwen3-next: optimize recompute_w_u_fwd & chunk_fwd_o (#74) callmelaoyi 2026-01-05 10:24:51 +08:00
  • fe666fb24f [Feature] Support gpt-oss and update model list (#71) Xinyu Dong 2026-01-04 21:19:49 +08:00
  • ded24f5026 [Model] Support InternVL2_5 on v0.11.0 (#72) Joeegin 2026-01-04 16:38:05 +08:00
  • 684ce2761e Merge pull request #69 from chanzhennan/main baoqian426 2025-12-31 16:44:58 +08:00
  • e48e4330e5 Merge pull request #67 from xyDong0223/main baoqian426 2025-12-31 16:44:42 +08:00
  • 6bc61d0dfe [Docs] : update readme.md chanzhennan 2025-12-31 16:41:12 +08:00
  • 3290c30ec1 Merge pull request #68 from tanjunchen/main baoqian426 2025-12-31 15:01:49 +08:00
  • e8f4e1337c update readme.md tanjunchen 2025-12-31 14:55:15 +08:00
  • c46c46ef77 [Docs] Update torch and ops for mimo v2 Xinyu Dong 2025-12-31 13:17:06 +08:00
  • cdef33dbb0 Merge pull request #66 from baoqian426/model/remove-llama-qwne2 baoqian426 2025-12-31 11:57:22 +08:00
  • b015bb76fd remove qwen2.py llama.py fix llama output hanhaowen 2025-12-31 11:31:26 +08:00
  • b3c30a3cb9 [Feature] Support XiaoMi MIMO Flash V2 (#62) Xinyu Dong 2025-12-31 10:16:33 +08:00
  • 341dc7f296 [Docs] Update base image path in Installation.md (#63) WeiJie_Hong 2025-12-30 19:10:41 +08:00
  • 6382deb32b Merge pull request #60 from tanjunchen/main-1 baoqian426 2025-12-29 21:24:26 +08:00
  • 8c23a955a4 update readme.md tanjunchen 2025-12-29 21:21:10 +08:00
  • 9cee025f41 Merge pull request #59 from liwei109/aicapx-quant Li Wei 2025-12-29 19:56:24 +08:00
  • 7fb627c34e Merge pull request #57 from tanjunchen/main-github-action Xinyu Dong 2025-12-29 13:18:31 +08:00
  • 6d7d7c347f Add foundational configuration tanjunchen 2025-12-28 20:28:58 +08:00
  • d17ee45d4c Merge pull request #55 from tanjunchen/main-dev-01 Xinyu Dong 2025-12-28 17:48:14 +08:00