xc-llm-ascend

Files

Qiu 7c210225a2 [Perf][PCP][DCP] add multi-stream for GQA to enable computation-communication overlap (#5382)

### What this PR does / why we need it?
This PR adds multi-stream execution for GQA so that computation overlaps
with communication. For chunked prefill, this reduces TTFT by approximately 4%.

### Does this PR introduce _any_ user-facing change?
No

- vLLM version: release/v0.13.0
- vLLM main: bc0a5a0c08

---------

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>

2026-01-04 16:33:18 +08:00

__init__.py

[Core] Make V1 work and enable V1 engine test (#389)

2025-03-28 19:34:23 +08:00

attention_cp.py

[Perf][PCP][DCP] add multi-stream for GQA to enable computation-communication overlap (#5382)

2026-01-04 16:33:18 +08:00

attention_mask.py

[feature] fia support sliding windows (#5239)

2025-12-29 14:56:25 +08:00

attention_v1.py

[Feat][main] Supported to use full-graph with Qwen3-Next-MTP (#5477)