xc-llm-ascend

Files

sherie 3fb80ee356 add mlp tp optimze (#2120)

### What this PR does / why we need it?
For dense models, applying tensor parallelism (TP) only to the MLP
module, and not to the attention module, eliminates the all-reduce
operations in the attention module and so reduces communication
overhead. However, this approach increases memory usage, so the
optimization is off unless the environment variable
VLLM_ASCEND_ENABLE_MLP_OPTIMZE enables it.
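
As a rough illustration of the gating described above, the sketch below reads the flag and picks a TP degree for each module. It is not the code from this PR: the helper names `mlp_optimize_enabled` and `plan_parallelism`, the returned dictionary layout, and the rule that only the value "1" enables the flag are all assumptions made for the example. The variable name is copied as spelled in this commit message.

```python
import os

# Name copied verbatim from the commit message above.
ENV_FLAG = "VLLM_ASCEND_ENABLE_MLP_OPTIMZE"


def mlp_optimize_enabled(environ=os.environ) -> bool:
    """Return True when the flag is set to "1".

    Assumption: the real parsing rule may differ (e.g. int/bool casting).
    """
    return environ.get(ENV_FLAG, "0") == "1"


def plan_parallelism(world_size: int, environ=os.environ) -> dict:
    """Hypothetical helper: choose a TP degree per module.

    With the optimization on, attention is not sharded (TP degree 1), so it
    needs no all-reduce, while the MLP is still sharded across all ranks.
    With it off, both modules use the full TP degree.
    """
    if mlp_optimize_enabled(environ):
        return {"attention_tp": 1, "mlp_tp": world_size}
    return {"attention_tp": world_size, "mlp_tp": world_size}


if __name__ == "__main__":
    # Passing an explicit mapping keeps the example independent of the shell.
    print(plan_parallelism(4, {ENV_FLAG: "1"}))
    print(plan_parallelism(4, {}))
```

Because the flag defaults to off in this sketch, the memory cost of the unsharded attention path is only paid when a user opts in.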

- vLLM main: b17109beea

Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>

2025-08-21 09:22:07 +08:00

expert_map.json

Add unit test local cpu guide and enable base testcase (#1566)

2025-07-06 10:42:27 +08:00

test_activation.py

[1/N][CustomOp] Register activation customop instead of overwrite forward_oot (#1841)

2025-07-18 23:07:14 +08:00

test_expert_load_balancer.py

Add unit test local cpu guide and enable base testcase (#1566)

2025-07-06 10:42:27 +08:00

test_fused_ops.py

Fix some ci issue and refactor modelrunner (#2445)