Files
xc-llm-ascend/tests/ut/_310p/attention
pu-zhe a8e951e6f5 [Feat] 310p supports PrefillCacheHit State (#6756)
### What this PR does / why we need it?
This PR extends the Ascend 310P attention backend to support the
`PrefillCacheHit` state. Previously, only `PrefillNoCache`,
`DecodeOnly`, and `ChunkedPrefill` were supported.
This PR handles this state by routing it to the existing
`forward_chunked_prefill_310` implementation, which is suitable for this
scenario.
The changes also include refactoring the main `forward_impl` dispatch
method for better clarity and updating unit tests to cover the new state
and ensure correctness.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Accuracy test when chunked prefill is disabled.
- vLLM version: v0.15.0
- vLLM main:
9562912cea

---------

Signed-off-by: pu-zhe <zpuaa@outlook.com>
2026-02-24 16:48:05 +08:00
..