xc-llm-ascend

Files

Chen Chen a2daacbd71 [perf] Fix MLAPO weight disposal for KV-consumer MLA in PD-mix deploy... (#5192)

### What this PR does / why we need it?

- Problem: With MLA and MLAPO enabled, KV-consumer deployments keep the
`fused_qkv_a_proj`/`q_proj` weights and quantization parameters even though
MLAPO reads only the prepacked buffers, which increases the memory footprint
on decode nodes.
- Fix: Drop those tensors, but only when
`kv_transfer_config.is_kv_consumer` is set, to reclaim that memory
(consistent with the SFA behavior in #4774).
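
The conditional disposal described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `drop_prepacked_sources` and `make_demo_attn`, and the parameter attribute names `weight`, `weight_scale`, and `weight_offset`, are assumptions made for the example. Only `fused_qkv_a_proj`, `q_proj`, and `kv_transfer_config.is_kv_consumer` come from the description. Plain Python objects stand in for tensors so the sketch runs without an NPU.

```python
from types import SimpleNamespace

# Hypothetical parameter names; the real quantized layers may use others.
DISPOSABLE_ATTRS = ("weight", "weight_scale", "weight_offset")


def drop_prepacked_sources(attn, kv_transfer_config,
                           proj_names=("fused_qkv_a_proj", "q_proj")):
    """Release source weights that MLAPO no longer needs.

    Nothing is released unless the node is a KV consumer. Returns the list
    of "<projection>.<attribute>" paths that were set to None.
    """
    is_consumer = getattr(kv_transfer_config, "is_kv_consumer", False)
    if kv_transfer_config is None or not is_consumer:
        return []

    released = []
    for proj_name in proj_names:
        proj = getattr(attn, proj_name, None)
        if proj is None:
            continue
        for attr in DISPOSABLE_ATTRS:
            if getattr(proj, attr, None) is not None:
                # Dropping the last reference lets the allocator reclaim it.
                setattr(proj, attr, None)
                released.append(f"{proj_name}.{attr}")
    return released


def make_demo_attn():
    """Build a stand-in attention module with fake 'tensors' (bytearrays)."""
    def make_proj():
        return SimpleNamespace(weight=bytearray(8),
                               weight_scale=bytearray(2),
                               weight_offset=bytearray(2))
    return SimpleNamespace(fused_qkv_a_proj=make_proj(),
                           q_proj=make_proj(),
                           kv_b_proj=make_proj())  # not touched by the sketch
```

Under these assumptions, calling `drop_prepacked_sources(make_demo_attn(), SimpleNamespace(is_kv_consumer=True))` clears the six source attributes on the two projections. A config with `is_kv_consumer=False`, or no config at all, leaves everything in place.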

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: Chen Chen <0109chenchen@gmail.com>

2026-01-05 21:29:45 +08:00

context_parallel

[Refactor] 7/N Extract common code to common_cp (#5490)

2026-01-05 17:41:12 +08:00

__init__.py

[Core] Make V1 work and enable V1 engine test (#389)

2025-03-28 19:34:23 +08:00

attention_mask.py

[feature] FIA supports sliding windows (#5239)

2025-12-29 14:56:25 +08:00

attention_v1.py

[Refactor] 7/N Extract common code to common_cp (#5490)