xc-llm-ascend

Files

cookieyyds 51415aaa2f [bugfix] support dsv3.2 enable both mtp and full_decode_only (#5849)

### What this PR does / why we need it?
Support enabling both mtp and full_decode_only on dsv3.2.

PR #5626 modified the branch logic to align with the community.
Previously, dsv3.2 never entered this branch. Now that it does, an
additional unpad step is required, which changes positions and
num_input_tokens and, in turn, the cos and sin dimensions in
sfa_v1.py. Passing those tensors to the operator then raises an
illegal shape error.
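
The shape dependency can be shown with a toy sketch. This is not code from sfa_v1.py: `rotary_cos_sin`, `rot_dim`, and `base` are hypothetical names, and the sketch assumes cos and sin are computed one row per entry in positions, so that cropping positions shrinks their leading dimension.

```python
import math

def rotary_cos_sin(positions, rot_dim=4, base=10000.0):
    """Hypothetical helper: one cos/sin row per position (illustration only)."""
    inv_freq = [base ** (-2 * i / rot_dim) for i in range(rot_dim // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in positions]
    sin = [[math.sin(p * f) for f in inv_freq] for p in positions]
    return cos, sin

# Assumed example: 3 real tokens padded to 8 slots.
padded_positions = [5, 6, 7, 0, 0, 0, 0, 0]
cropped_positions = padded_positions[:3]

cos_padded, sin_padded = rotary_cos_sin(padded_positions)
cos_cropped, sin_cropped = rotary_cos_sin(cropped_positions)

# The leading dimension follows len(positions), so a consumer that still
# expects the padded length sees a different shape after cropping.
print(len(cos_padded), len(cos_cropped))
```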

1. The unpad function was introduced to align with the community,
where it does not take the parameters num_input_tokens and positions.
2. To match the function name unpad, positions is sliced and
num_input_tokens is set to num_actual_tokens, so that the padded
positions and num_input_tokens are not output.
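
The cropping described in the two points above can be sketched as follows. This is a minimal illustration under stated assumptions, not the function from this PR: `ToyMetadata` is a hypothetical stand-in for the real attention metadata, and only the three fields named in the description are modeled.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class ToyMetadata:
    """Hypothetical stand-in for the attention metadata (illustration only)."""
    num_actual_tokens: int
    num_input_tokens: int  # padded token count before unpad
    positions: List[int]   # padded positions before unpad

def unpad(meta: ToyMetadata) -> ToyMetadata:
    """Return a copy whose positions and num_input_tokens carry no padding."""
    n = meta.num_actual_tokens
    return replace(meta, num_input_tokens=n, positions=meta.positions[:n])

# Assumed example: 3 real tokens padded to 8 slots.
padded = ToyMetadata(num_actual_tokens=3,
                     num_input_tokens=8,
                     positions=[5, 6, 7, 0, 0, 0, 0, 0])
unpadded = unpad(padded)
print(unpadded.num_input_tokens, unpadded.positions)
```

Returning a new object leaves the padded metadata intact for any caller that still needs the padded shapes.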

In practice, attention_v1 does not use these two parameters. The
cropping was added as a precaution: someone might use them later and
run into shape mismatches without being aware of the padding.

Where positions is originally obtained, it is not cropped, so there is
actually no need to apply unpad in this case.

- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

Signed-off-by: cookieyyds <126683903+cookieyyds@users.noreply.github.com>

2026-01-14 22:57:38 +08:00

context_parallel

[bugfix](cp) replace None with zeros/inf tensor to avoid TypeError (#5837)

2026-01-14 20:57:48 +08:00

__init__.py

[Core] Make V1 work and enable V1 engine test (#389)

2025-03-28 19:34:23 +08:00

attention_mask.py

[Refactor] Fix AttentionMaskBuilder singleton and remove redundant pcp_prefill_mask (#4870)

2026-01-07 17:09:52 +08:00

attention_v1.py

[Refactor] Provide a framework to accommodate operators for different hardware devices (#5735)