### What this PR does / why we need it?

View optimization in torchair (enabled by default for any `Transpose` with an axis of size 1) prevents the weight `Transpose` from being fused with the subsequent `GroupedMatmul`, which degrades the performance of the MoE layer when expert parallelism equals the total number of experts (e.g. EP256 for DSKv3). This PR adds an option to solve the problem by disabling the optimization.

### Does this PR introduce _any_ user-facing change?

Controlled by `additional_config.torchair_graph_config.enable_view_optimize`, which defaults to `True`.

### How was this patch tested?

Tested on a 1x16 910 node, with a tailored 2-layer DSKv2.

Signed-off-by: sdmyzlp <lrwei2@petalmail.com>
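As a sketch of how the new knob might be used, assuming the `--additional-config` JSON flag accepted by `vllm serve` (the flag name and model path here are illustrative, not taken from this PR):

```shell
# Disable torchair view optimization so the weight Transpose can fuse
# with the following GroupedMatmul (model path is a placeholder).
vllm serve /path/to/deepseek-v3 \
  --additional-config '{"torchair_graph_config": {"enable_view_optimize": false}}'
```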
# vLLM Ascend Plugin documents

Live doc: https://vllm-ascend.readthedocs.io

## Build the docs

```shell
# Install dependencies.
pip install -r requirements-docs.txt

# Build the docs.
make clean
make html
```

## Open the docs with your browser

```shell
python -m http.server -d _build/html/
```

Launch your browser and open http://localhost:8000/.
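If port 8000 is already taken, Python's built-in `http.server` module accepts an alternative port as a positional argument:

```shell
# Serve the built docs on port 8080 instead of the default 8000.
python -m http.server 8080 -d _build/html/
```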