xc-llm-ascend

EngineX/xc-llm-ascend

afc58184ec [Installation] limit opencv-python-headless version to resolve numpy version conflict (#3713) 22dimensions 2025-10-25 18:07:54 +08:00
bb5f16d926 [BugFix] Fix Qwen3-next break (#3428) Icey 2025-10-25 18:03:36 +08:00
7572939b94 add qwq testcase (#3757) ck-hw-1018 2025-10-25 17:11:35 +08:00
e5676fc36e [main] remove dbo code (#3712) zzzzwwjj 2025-10-25 15:53:01 +08:00
a58ff9e92f [Cherry-pick] Port MoE multi-stream fix to v0.11.0-dev (#3753) whx 2025-10-25 15:51:43 +08:00
1bc61031e5 [v0.11.0][Fix] Cap max tokens to prevent potential OOM (#3720) (#3744) Yizhou 2025-10-25 15:46:56 +08:00
d9cdc65854 Upgrade to new vllm commit (#3719) Icey 2025-10-25 15:36:32 +08:00
99e154dc84 [0.11.0] cherry-pick from #3747 (#3746) fems14 2025-10-25 14:21:30 +08:00
226f832c0b [bugfixfix] correct _register function place for mooncacke (#3747) fems14 2025-10-25 14:20:09 +08:00
11f75883be [Test] add test for prefix cache feature of deepseek (#3733) HuaJiaHeng 2025-10-25 14:08:15 +08:00
fed8145aea [cherry-pick][Feat] Add mrope fusion op#3708 (#3735) shaopeng-666 2025-10-25 11:41:23 +08:00
1f25d60870 [Fix] Cap max tokens to prevent potential OOM (#3720) Yizhou 2025-10-25 11:23:21 +08:00
63c363d3de [Refactor] [MoE] Rename moe-related classes & files (#3646) weichen 2025-10-25 11:22:03 +08:00
0637e8f021 [Doc] Update supported models (#3481) zhangxinyuehfad 2025-10-25 11:13:46 +08:00
8f6f967028 [Test] Add e2e test and accuracy test for Qwen3-Next-80B-A3B-Instruct (#3450) zhangxinyuehfad 2025-10-25 10:57:56 +08:00
0644113c35 [BugFix] cherry-pick PR 3736 to v0.11.0-dev (#3737) whx 2025-10-25 10:35:14 +08:00
d5609e2c48 [BugFix] Comment out newly added vlm e2e. (#3736) whx 2025-10-25 10:34:59 +08:00
9e150e5009 [Refactor] optimize _prepare_inputs method in eagle_proposer (#3296) lio 2025-10-25 09:49:42 +08:00
d30bb95b90 [Bugfix] Fix zero attention output in qwen3-next (#3572) QilaiZhang 2025-10-25 09:47:03 +08:00
5a2c5be229 [BugFix][Cherry-pick] Cherry-pick PR 3675 to v0.11.0-dev (#3732) whx 2025-10-25 09:41:51 +08:00
e33751ef8b [BugFix][Core] Fix a bug running multi-modal with ascend_scheduler (#3675) whx 2025-10-25 09:41:33 +08:00
1a9feb3ba5 Update version doc (#3599) wangxiyuan 2025-10-25 09:37:56 +08:00
07c8d4547c [CI] Skip ops test for e2e (#3665) wangxiyuan 2025-10-25 09:37:30 +08:00
6922947033 [Misc] Limit ray version (#3660) wangxiyuan 2025-10-25 09:36:44 +08:00
8295136575 [UT][fix] Add missing get_ascend_config mock to NPUWorker initialization tests (#3729) Canlin Guo 2025-10-25 09:33:16 +08:00
7f73c28a24 [CI][Doc] Optimize multi-node CI (#3565) Li Wang 2025-10-25 09:23:47 +08:00
12bc78d252 [v0.11.0][BugFix][P/D] Modify the recalculation logic to prevent waiting requests from filling up the D node KVCache (#3686) hucong 2025-10-25 09:15:42 +08:00
292cf339c3 [BugFix][P/D] Modify the recalculation logic to prevent waiting requests from filling up the D node KVCache (#3641) hucong 2025-10-25 09:14:20 +08:00
39b994a987 [Feat] Add mrope fusion op (#3708) shaopeng-666 2025-10-25 09:12:18 +08:00
3158742a97 [Refactor] Refactor Ascend attention implementation forward (#3714) Yizhou 2025-10-25 08:58:35 +08:00
5c0a23f98b [0.11.0][Perf] Add fused matmul/reduce-scatter kernel for performance optimization. (#3725) ZYang6263 2025-10-25 08:20:43 +08:00
17dd9ae42c [0.11.0][bugfix]look up multi_tp key (#3699) (#3723) fems14 2025-10-24 18:22:45 +08:00
f0eb3e1d97 [v0.11.0][bugfix]kvpool sync load (#3698) (#3722) fems14 2025-10-24 18:21:46 +08:00
0b1da24742 [Main][Perf] Add fused matmul/reduce-scatter kernel for performance optimization. (#3693) ZYang6263 2025-10-24 18:19:58 +08:00
33514a4cc2 [Bugfix] The server fails to locate the request, leading to the server hanging. (#3721) 何必问 2025-10-24 17:41:29 +08:00
82a4970fe9 look up multi_tp key (#3699) fems14 2025-10-24 17:23:36 +08:00
c83efcb9e4 kvpool sync load (#3698) fems14 2025-10-24 17:22:53 +08:00
59bb16b75c [Bugfix] The server fails to locate the request, leading to the server hanging. (#3703) 何必问 2025-10-24 17:18:03 +08:00
d301c56d1a [TEST]Add initial multi modal cases of Qwen2.5-VL-32B-Instruct for nightly test (#3707) wangyu 2025-10-24 17:12:06 +08:00
4e21b1537e [BugFix] Check all expert maps when using muilty instance. (#3662) offline893 2025-10-24 17:10:31 +08:00
9b0baa1182 [BugFix] Check all expert maps when using muilty instance. (#3576) offline893 2025-10-24 17:10:14 +08:00
cea0755b07 [1/N][Refactor] Refactor code to adapt with vllm main (#3612) Mengqing Cao 2025-10-24 16:55:08 +08:00
ec9ec78b53 [TEST]Add initial prefix cache case for nightly test (#3709) jiangyunfan1 2025-10-24 16:33:18 +08:00
6be321b95e remove useless code (#3685) zzzzwwjj 2025-10-24 16:29:08 +08:00
cd58a643c5 [UT] Fix test_sample_recovered_tokens_pytorch_autoregressive (#3434) lio 2025-10-24 11:20:57 +08:00
802c574532 [Benchmark] Upgrade benchmark args for new vllm version (#3218) Li Wang 2025-10-24 11:18:19 +08:00
1b270a64bd [MoE][Multistream] Avoid performing communication in extra stream. (#3582) whx 2025-10-24 10:44:38 +08:00
b54d44e664 support cp&dcp (#3260) LookAround0301 2025-10-24 10:32:01 +08:00
b321e3846a [cherry-pick]【main】patch sched_yield (#3648) (#3687) wangxiyuan 2025-10-24 00:24:58 +08:00
2bcadcb9d5 【main】patch sched_yield (#3648) fems14 2025-10-24 00:06:45 +08:00
d0086d432a fix deepseek torchair recompile (#3679) Wang Yixuan 2025-10-23 22:53:13 +08:00
a7b40b09eb [BugFix]fix deepseek torchair recompile (#3678) Wang Yixuan 2025-10-23 22:53:01 +08:00
d2d19a4c3c [v0.11.0][bugfix] Add 'layer_type' param to get_pergroup_param() for compatibility (#3684) Slightwind 2025-10-23 21:26:50 +08:00
3366d47694 [main][bugfix] Add 'layer_type' param to get_pergroup_param() for compatibility (#3682) Slightwind 2025-10-23 21:26:33 +08:00
062257f624 [Test] add a new Qwen3-32b-int8 test case with feature_stack3 (#3676) HuaJiaHeng 2025-10-23 20:43:14 +08:00
ebfd09a075 [Doc] Update the Pangu Pro MoE tutorials. (#3651) zhangyiming 2025-10-23 20:41:47 +08:00
f3ea657e93 [0.11.0][Bugfix] fix delay free prefill req & D node support prefix cache (#3609) liziyu 2025-10-23 20:39:35 +08:00
aeddf4261a [Bugfix] fix delay free prefill req & D node support prefix cache (#3607) liziyu 2025-10-23 20:39:14 +08:00
e3c1ac89e5 [Structured Output] Replace apply_grammar_bitmask() method with that in vllm to avoid maintenance (#2524) Shanshan Shen 2025-10-23 17:26:27 +08:00
9434f24ded [TEST]Add initial multi modal cases for nightly test and deepseek-r1 tests (#3631) jiangyunfan1 2025-10-23 17:18:49 +08:00
427b17e2da [Misc] Add a model loader that utilizes HCCL for weight loading (#2888) Rui Kang 2025-10-23 15:56:07 +08:00
807686dec9 perf : optimize memory for deepseek mtp (#2713) NeverRaR 2025-10-23 15:52:17 +08:00
2584f97217 [BugFix] fix deepseek torchair precision (#3624) Wang Yixuan 2025-10-23 15:41:50 +08:00
f06a6cad1b [Doc] Update the modelslim website from gitee to gitcode. (#3615) Crazyang 2025-10-23 15:38:16 +08:00
6975d46627 [v0.11.0][Perf] Eliminating the zerolike operator through patch (#3632) ZYang6263 2025-10-23 14:49:28 +08:00
74903af460 [v0.11.0][refactor] refactor SequenceRowParallelOp forward (#3654) rjg-lyh 2025-10-23 14:45:49 +08:00
292e213dd2 [main][refactor] refactor SequenceRowParallelOp forward (#3616) rjg-lyh 2025-10-23 14:41:15 +08:00
ca104ce6f0 [Doc] Upgrade docker run command (#3645) Li Wang 2025-10-23 11:17:26 +08:00
54bd531db8 [v0.11.0][Fix] Fix attention metadata handling for profiling and MLA (#3636) (#3643) Yizhou 2025-10-23 10:29:30 +08:00
dd7a25063c [Feat] Prefetching Attention QKV Linear Weight With AddRmsNormQuant Custom Op (#3517) Ruri 2025-10-23 10:07:37 +08:00
72695c97d0 [BugFix][main] Fix quantization related mtp bug with patch (#3620) whx 2025-10-23 09:54:31 +08:00
4381d296e5 [Fix] Fix attention metadata handling for profiling and MLA (#3636) Yizhou 2025-10-23 09:35:18 +08:00
b13d22bf5a [Fix] Fixes attribute error in MLA implementation (#3618) Yizhou 2025-10-23 09:12:50 +08:00
6464c97ff9 [BugFix][v0.11.0] Fix quantization related mtp bug with patch (#3619) whx 2025-10-22 23:06:09 +08:00
6e72bfdc50 [v0.11.0] cherry-pick Fix performance degradation when mtp>1 (#3597) (#3630) Zetong Li 2025-10-22 22:07:39 +08:00
179b897b52 [Bugfix][MTP] Fix performance degradation when mtp>1 (#3597) Zetong Li 2025-10-22 22:04:43 +08:00
a989fef5de unify logic between aclgraph and torchair (#3602) zouyida2052 2025-10-22 21:55:06 +08:00
55a4b5ac40 unify logic between aclgraph and torchair (#3560) zouyida2052 2025-10-22 21:52:57 +08:00
edccd46d74 fix deepseek torchair precision (#3635) Wang Yixuan 2025-10-22 20:20:32 +08:00
984efdc0d0 [v0.11.0][Fix] Fixes attribute error in MLA implementation (#3617) Yizhou 2025-10-22 15:49:18 +08:00
1ad7ffd647 clean up uesless ut test (#3622) wangxiyuan 2025-10-22 15:00:08 +08:00
286ae9003d [CI] Multi-Node CI scalable (#3611) Li Wang 2025-10-22 14:18:43 +08:00
bc30874f8b [Feat] add native kvcache offload (#3433) kx 2025-10-22 14:15:49 +08:00
a0c3b8dd2d [v0.11.0]cherry-pick fix ut (#3608) (#3614) wangxiyuan 2025-10-22 14:14:15 +08:00
60e2be1b36 [Feat] Dynamic Batch Feature (#3490) KyrieWang 2025-10-22 14:13:32 +08:00
c18ca62a17 [Misc] clean up useless function (#3348) wangxiyuan 2025-10-22 11:53:40 +08:00
f2dd5f8d08 fix : support chunked_prefill with deepseek_mtp (#2711) NeverRaR 2025-10-22 11:52:27 +08:00
2f1b9a7a64 Reapply "[MoE] [Refactor] Remove manual memory cleanup (#3365)" (#3483) (#3512) weichen 2025-10-22 11:41:30 +08:00
6ef62cb427 fix ut (#3608) wangxiyuan 2025-10-22 11:30:12 +08:00
5f3b798e56 [CI]Fix test nightly workflow. (#3603) offline893 2025-10-22 10:45:42 +08:00
726bc8aa2a [CI]fix test nightly workflow. (#3604) offline893 2025-10-22 10:34:03 +08:00
e916265b2b [CI]Add EPLB CI. (#3568) offline893 2025-10-21 22:58:02 +08:00
4c9af353ee Revert "[Feat] Shared expert dp for deepseek and deepseek_mtp (#3495)" (#3586) linfeng-yuan 2025-10-21 22:24:30 +08:00
bd11c0054f [BugFix] Fix torchair+mtp bug after deleting deepseek_mtp. (#3590) whx 2025-10-21 22:23:52 +08:00
0c83eee9b1 fix vl float model not support NZ format weight error (#3533) shaopeng-666 2025-10-21 22:23:17 +08:00
6f04b467de [CI] Upgrade manylinux image (#3587) Icey 2025-10-21 22:22:45 +08:00
79821106e6 [BugFix]Fix mtp torchair bug caused by #2719 (#3566) xuyexiong 2025-10-21 22:21:44 +08:00
534f32d27c [BugFix][mian] Fixed a triton kernel bug of layer_norm_fwd_kernel for Qwen3-next (#3549) drslark 2025-10-21 20:20:57 +08:00
13e8e75143 [Refactor] refactor patch module (#3555) wangxiyuan 2025-10-21 20:19:46 +08:00
0c6349610e [Feature] Reduce host memory usage for attention mask generation (#3048) Jade Zheng 2025-10-21 20:19:04 +08:00
