xc-llm-ascend

pu-zhe a8e951e6f5 [Feat] 310p supports PrefillCacheHit State (#6756)

### What this PR does / why we need it?
This PR extends the Ascend 310P attention backend to support the
`PrefillCacheHit` state. Previously, only `PrefillNoCache`,
`DecodeOnly`, and `ChunkedPrefill` were supported.
The new state is routed to the existing
`forward_chunked_prefill_310` implementation, which is suitable for this
scenario.
The PR also refactors the main `forward_impl` dispatch method for clarity
and updates the unit tests to cover the new state and verify correctness.
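
The routing described above can be illustrated with a minimal sketch. It is not the code from this PR: the state names, `forward_impl`, and `forward_chunked_prefill_310` come from the description, while the `AttentionState` enum, the `Attention310Sketch` class, and the `forward_prefill_310` / `forward_decode_310` handlers are hypothetical stand-ins that return labels instead of running attention kernels.

```python
from enum import Enum, auto


class AttentionState(Enum):
    # State names are taken from the PR description; this enum is a stand-in.
    PrefillNoCache = auto()
    PrefillCacheHit = auto()
    DecodeOnly = auto()
    ChunkedPrefill = auto()


class Attention310Sketch:
    """Hypothetical stand-in for the 310P attention backend."""

    def forward_prefill_310(self, query):  # hypothetical handler name
        return ("prefill", query)

    def forward_decode_310(self, query):  # hypothetical handler name
        return ("decode", query)

    def forward_chunked_prefill_310(self, query):  # named in the PR description
        return ("chunked_prefill", query)

    def forward_impl(self, state, query):
        """Dispatch on the attention state, one branch per handler."""
        if state is AttentionState.PrefillNoCache:
            return self.forward_prefill_310(query)
        if state is AttentionState.DecodeOnly:
            return self.forward_decode_310(query)
        # PrefillCacheHit shares the chunked-prefill path, as the PR describes.
        if state in (AttentionState.ChunkedPrefill, AttentionState.PrefillCacheHit):
            return self.forward_chunked_prefill_310(query)
        raise NotImplementedError(f"Unsupported attention state on 310P: {state}")
```

In this sketch, `Attention310Sketch().forward_impl(AttentionState.PrefillCacheHit, q)` takes the same branch as `ChunkedPrefill`, and any state without a handler fails loudly rather than falling through silently.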
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Ran an accuracy test with chunked prefill disabled.
- vLLM version: v0.15.0
- vLLM main: 9562912cea

---------

Signed-off-by: pu-zhe <zpuaa@outlook.com>

2026-02-24 16:48:05 +08:00

attention

[Feat] 310p supports PrefillCacheHit State (#6756)

2026-02-24 16:48:05 +08:00

fused_moe

[Feat] 310p support MoE W8A8 quantization (#6641)

2026-02-10 17:17:44 +08:00

ops

[Feat.][310P]: weightNZ feature with quant or unquant. (#6705)

2026-02-13 15:41:02 +08:00

quantization

[Feat.][310P]: weightNZ feature with quant or unquant. (#6705)