Files
xc-llm-ascend/vllm_ascend/attention
LICO67373 caa71e50ca [Perf] Simplify FIA prefill context merge path (#7293)
### What this PR does / why we need it?
This PR simplifies and hardens MLA prefill context merging in
`vllm_ascend/attention/mla_v1.py` after the FIA migration: it builds
`out_list`/`lse_list` directly (no temporary chunk buffers, no
`cat`/`stack`/`split` round-trips) and uses `reshape` instead of `view`
so that non-contiguous tensors are flattened safely.
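The refactor described above can be sketched as follows. This is a minimal illustration, not the actual `mla_v1.py` code: the chunk structure, helper names, and shapes are assumptions; only the `out_list`/`lse_list` names and the `reshape`-over-`view` choice come from the PR description.

```python
import torch

def merge_context_chunks(chunks):
    """Build out_list/lse_list directly from per-chunk results.

    Hypothetical helper for illustration: `chunks` is assumed to be a
    list of dicts holding an attention output tensor ("out") and its
    log-sum-exp tensor ("lse"). The old path (per the PR description)
    gathered these into temporary buffers and ran cat/split; here we
    simply append each chunk's tensors to the output lists.
    """
    out_list, lse_list = [], []
    for chunk in chunks:
        out = chunk["out"]
        # reshape() works on non-contiguous tensors (e.g. transposed
        # views), whereas view() would raise a RuntimeError; it copies
        # only when the layout requires it.
        out_list.append(out.reshape(out.shape[0], -1))
        lse_list.append(chunk["lse"])
    return out_list, lse_list
```

The key point is that `view` requires a contiguous memory layout compatible with the target shape, while `reshape` falls back to a copy when needed, which is what makes the flattening safe for non-contiguous inputs.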

### Does this PR introduce _any_ user-facing change?
No. This is an internal refactor/stability improvement only; no
API/interface behavior changes.

### How was this patch tested?
- Verified the tensor shape and data flow of the `npu_attention_update`
inputs (`out_list`/`lse_list`) after the refactor.
- Confirmed there are no lint errors in the modified file.
- Relied on existing CI unit-test coverage of the attention/MLA paths
for validation.

vLLM version: `v0.17.0`  
vLLM main: `vllm-project/vllm@4034c3d`

---------

Signed-off-by: lico67373 <918688502@qq.com>
2026-03-23 07:47:42 +00:00