Default enable MLAPO (#5952)

### What this PR does / why we need it?
1) Enable MLAPO by default for DeepSeek MLA attention W8A8 models on the
D (decode) instance in PD disaggregation deployments, for example
DeepSeekV3-W8A8 and DeepSeek-R1-W8A8.
2) Enable MLAPO by default for DeepSeek SFA attention W8A8 models,
currently DeepSeek-V3.2-W8A8.

### Does this PR introduce _any_ user-facing change?
Users no longer need to set VLLM_ASCEND_ENABLE_MLAPO=1 manually to enable
the MLAPO feature for DeepSeek W8A8 models.
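
A minimal sketch of what a default-on environment flag looks like, for illustration only. The helper `read_flag` and its parsing rule are hypothetical and are not taken from the vllm-ascend source; only the variable name `VLLM_ASCEND_ENABLE_MLAPO` comes from this PR. That setting the variable to `0` opts out is an assumption.

```python
import os
from typing import Mapping


def read_flag(name: str, default: str, env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: parse an integer-valued env var as a boolean."""
    return bool(int(env.get(name, default)))


# Variable not set: the default ("1") applies, so the feature is on.
unset_env: dict = {}
enabled_by_default = read_flag("VLLM_ASCEND_ENABLE_MLAPO", "1", unset_env)

# Assumed opt-out: an explicit "0" overrides the default.
opt_out_env = {"VLLM_ASCEND_ENABLE_MLAPO": "0"}
enabled_after_opt_out = read_flag("VLLM_ASCEND_ENABLE_MLAPO", "1", opt_out_env)
```

Passing the environment as a mapping keeps the sketch free of side effects on the real process environment.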

Effect of enabling MLAPO for an SFA model deployed on a single A3 node.
Test: tests/e2e/nightly/single_node/models/test_deepseek_v3_2_exp_w8a8.py
Dataset: gsm8k-lite, with MTP, FULL GRAPH not set. Output token throughput
improves by about 19%:

| Metric | MLAPO not enabled by default | MLAPO enabled by default |
| --- | --- | --- |
| TTFT | 14055.8836 ms | 3753.1547 ms |
| ITL | 66.8171 ms | 61.4236 ms |
| Output Token Throughput | 104.9105 token/s | 125.2075 token/s |
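
The relative changes implied by the two runs above can be checked with a few lines of arithmetic. This is only a sketch over the reported numbers; the function and variable names are illustrative.

```python
# Reported metrics without and with MLAPO enabled by default.
before = {"TTFT_ms": 14055.8836, "ITL_ms": 66.8171, "throughput_tok_s": 104.9105}
after = {"TTFT_ms": 3753.1547, "ITL_ms": 61.4236, "throughput_tok_s": 125.2075}


def relative_change_pct(old: float, new: float) -> float:
    """Signed percentage change from old to new."""
    return (new - old) / old * 100.0


changes = {
    name: round(relative_change_pct(before[name], after[name]), 1)
    for name in before
}
print(changes)
# {'TTFT_ms': -73.3, 'ITL_ms': -8.1, 'throughput_tok_s': 19.3}
```

The roughly 19% figure quoted in the description corresponds to the output token throughput row.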

- vLLM version: v0.13.0
- vLLM main: 2c24bc6996

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>

Author: Nengjun Ma
Date: 2026-01-22 09:26:39 +08:00
Committed by: GitHub
Parent: a15a5f6aa5
Commit: ab676413e6

13 changed files with 17 additions and 29 deletions

									
vllm_ascend/attention/sfa_v1.py

@@ -375,6 +375,9 @@ class AscendSFAImpl(MLAAttentionImpl):
        ascend_config = get_ascend_config()
        self.enable_shared_expert_dp = ascend_config.enable_shared_expert_dp
        self.enable_prefetch = ascend_config.weight_prefetch_config.enabled
        # In sfa, prefill and decode have the same calculation formula,
        # so do not distinguish between prefill and decode here.
        self.enable_mlapo = envs.VLLM_ASCEND_ENABLE_MLAPO
        assert self.indexer is not None, "Indexer is required for DSA."
