
Mengqing Cao 8dd686dfa2 [MLA][Graph] Improve assertion on Graph mode with MLA (#933)

### What this PR does / why we need it?
Improve assertion on Graph mode with MLA.

When running DeepSeek in graph mode, the fused MLA op only supports
`numHeads / numKvHeads ∈ {32, 64, 128}`, so this PR improves the assertion
message to keep users from being confused when the check fails.

### Does this PR introduce _any_ user-facing change?
Adjusting the TP size is required when running deepseek-v3/r1 in graph
mode. deepseek-v2-lite is not supported in graph mode.
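
The constraint above can be illustrated with a minimal sketch. This is not the code from the PR: the function name `check_mla_graph_mode`, the assumption that attention heads are split evenly across TP ranks, and the default `num_kv_heads=1` are all hypothetical choices made for illustration. The head counts used in the usage note (128 for deepseek-v3/r1, 16 for deepseek-v2-lite) are likewise assumptions taken from the published model configs, not from this PR.

```python
# Hypothetical sketch; names are illustrative, not the vllm-ascend API.
SUPPORTED_RATIOS = (32, 64, 128)  # allowed numHeads / numKvHeads, per the PR text


def check_mla_graph_mode(total_num_heads: int, tp_size: int,
                         num_kv_heads: int = 1) -> int:
    """Return the per-rank numHeads / numKvHeads ratio, or raise AssertionError.

    Assumes heads are divided evenly across tensor-parallel ranks and that
    num_kv_heads is the per-rank KV head count (1 by default, an assumption).
    """
    assert total_num_heads % tp_size == 0, (
        f"total_num_heads={total_num_heads} is not divisible by tp_size={tp_size}"
    )
    num_heads = total_num_heads // tp_size  # heads handled by each rank
    assert num_heads % num_kv_heads == 0, (
        f"numHeads={num_heads} is not divisible by numKvHeads={num_kv_heads}"
    )
    ratio = num_heads // num_kv_heads
    assert ratio in SUPPORTED_RATIOS, (
        f"Graph mode with MLA requires numHeads / numKvHeads in "
        f"{SUPPORTED_RATIOS}, got {ratio}; try adjusting the TP size."
    )
    return ratio
```

Under these assumptions, `check_mla_graph_mode(128, 4)` passes with a ratio of 32, while `check_mla_graph_mode(128, 8)` fails because each rank would hold only 16 heads. A model with 16 heads in total fails for every TP size, which would be consistent with deepseek-v2-lite being unsupported in graph mode.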

### How was this patch tested?
Tested locally, because the CI machine cannot run V3 due to HBM limits.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>

2025-06-10 22:26:53 +08:00

source

[MLA][Graph] Improve assertion on Graph mode with MLA (#933)

2025-06-10 22:26:53 +08:00

Makefile

[Doc] Add sphinx build for vllm-ascend (#55)

2025-02-13 18:44:17 +08:00

README.md

Add an example for user stories (#399)

2025-03-26 16:25:57 +08:00

requirements-docs.txt

[Docs] Add dynamic version in docs (#90)

2025-02-19 08:57:27 +08:00

requirements-test.txt

[Doc] Add sphinx build for vllm-ascend (#55)

2025-02-13 18:44:17 +08:00

README.md

vLLM Ascend Plugin documentation

Live doc: https://vllm-ascend.readthedocs.io

Build the docs

# Install dependencies.
pip install -r requirements-docs.txt

# Build the docs.
make clean
make html

Open the docs in your browser

python -m http.server -d _build/html/

Launch your browser and open http://localhost:8000/.