xc-llm-ascend/vllm_ascend at e56b0017a3f580a6d35a879e3eafc2c1717caa49

EngineX/xc-llm-ascend

ZYang6263 d08401d1e7 [Main][Bugfix]Avoid using the fusion operator in the MOE model (#3834)

### What this PR does / why we need it?
The MatmulReduceScatter operator currently suffers performance
degradation in small-shape scenarios, so this PR decides whether to use
the operator based on the size of the input shape.
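A minimal sketch of the kind of shape-based gating described above. It is not the code from this PR: the function name, the `is_moe_model` flag, and the threshold value are all hypothetical and chosen only for illustration.

```python
# Hypothetical cutoff: the actual threshold used by the PR is not stated
# in this commit message.
MIN_TOKENS_FOR_FUSED_OP = 1024


def use_matmul_reduce_scatter(num_tokens: int,
                              is_moe_model: bool = False,
                              min_tokens: int = MIN_TOKENS_FOR_FUSED_OP) -> bool:
    """Return True when the fused operator is assumed to pay off.

    Assumption: MoE models always take the unfused path (per the PR title),
    and dense models use the fused path only for sufficiently large shapes.
    """
    if is_moe_model:
        return False
    return num_tokens >= min_tokens


# Show which path each example shape would take under these assumptions.
for n in (16, 1024, 8192):
    path = "fused" if use_matmul_reduce_scatter(n) else "unfused"
    print(f"num_tokens={n}: {path}")
```

With a check like this, small shapes fall back to a separate matmul followed by a reduce-scatter, and only larger shapes use the fused operator.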

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

---------

Signed-off-by: ZYang6263 <zy626375@gmail.com>

2025-10-28 23:30:27 +08:00

..

support prefill cache mode use fia op (#3696)

2025-10-27 19:41:07 +08:00

[feat]dcp pcp support aclgraph (#3731)

2025-10-27 09:58:23 +08:00

Upgrade to new vllm commit (#3719)

2025-10-25 15:36:32 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

Upgrade to new vllm commit (#3719)

2025-10-25 15:36:32 +08:00

[CI]Add EPLB CI. (#3568)

2025-10-21 22:58:02 +08:00

Upgrade to 0.11.1 newest vllm commit (#3762)

2025-10-28 14:55:03 +08:00

[1/N][Refactor] Refactor code to adapt with vllm main (#3612)

2025-10-24 16:55:08 +08:00

Upgrade to new vllm commit (#3719)

2025-10-25 15:36:32 +08:00

Upgrade to 0.11.1 newest vllm commit (#3762)

2025-10-28 14:55:03 +08:00

[Main][Bugfix]Avoid using the fusion operator in the MOE model (#3834)

2025-10-28 23:30:27 +08:00

Upgrade to new vllm commit (#3719)

2025-10-25 15:36:32 +08:00

【Bugfix】bugfix for weight load of kimi-k2 (#3798)

2025-10-27 21:18:35 +08:00

Upgrade to 0.11.1 newest vllm commit (#3762)

2025-10-28 14:55:03 +08:00

Upgrade to 0.11.1 newest vllm commit (#3762)

2025-10-28 14:55:03 +08:00

[main] remove dbo code (#3712)

2025-10-25 15:53:01 +08:00

Upgrade to 0.11.1 newest vllm commit (#3762)

2025-10-28 14:55:03 +08:00

__init__.py

[1/N][Refactor] Refactor code to adapt with vllm main (#3612)

2025-10-24 16:55:08 +08:00

ascend_config.py

[1/N][Refactor] Refactor code to adapt with vllm main (#3612)

2025-10-24 16:55:08 +08:00

ascend_forward_context.py

[Main][Bugfix]Avoid using the fusion operator in the MOE model (#3834)

2025-10-28 23:30:27 +08:00

cpu_binding.py

[main] support cpu binding (#3546)

2025-10-21 09:17:03 +08:00

envs.py

[main] remove dbo code (#3712)

2025-10-25 15:53:01 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

Upgrade to 0.11.1 newest vllm commit (#3762)

2025-10-28 14:55:03 +08:00

utils.py

Upgrade to 0.11.1 newest vllm commit (#3762)

2025-10-28 14:55:03 +08:00