[bugfix] Support dsv3.2 enable both mtp and full_decode_only (#5679)
### What this PR does / why we need it?
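PR #5230 introduced a regression: when both MTP and full_decode_only
are enabled for the DSV32 model, the operators cannot be compiled into
the graph. This PR fixes that issue by having the SFA metadata builder
report `AttentionCGSupport.UNIFORM_BATCH` instead of
`AttentionCGSupport.UNIFORM_SINGLE_TOKEN_DECODE`.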
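
The effect of the support level can be illustrated with a minimal sketch. It assumes that `UNIFORM_SINGLE_TOKEN_DECODE` only permits graph capture when every request contributes exactly one query token, while `UNIFORM_BATCH` permits any batch whose requests share the same query length. The enum below is a local stand-in rather than an import from vLLM, and `can_capture_decode` is a hypothetical helper written for illustration; it is not the dispatch logic used by vLLM or vllm-ascend.

```python
import enum


class AttentionCGSupport(enum.Enum):
    """Local stand-in for the vLLM enum of the same name (values assumed)."""
    ALWAYS = 3
    UNIFORM_BATCH = 2
    UNIFORM_SINGLE_TOKEN_DECODE = 1
    NEVER = 0


def can_capture_decode(support: AttentionCGSupport,
                       query_lens: list[int]) -> bool:
    """Hypothetical check: may this decode batch run inside a full graph?"""
    if not query_lens:
        return False
    # A batch is uniform when every request has the same query length.
    uniform = len(set(query_lens)) == 1
    if support is AttentionCGSupport.ALWAYS:
        return True
    if support is AttentionCGSupport.UNIFORM_BATCH:
        return uniform
    if support is AttentionCGSupport.UNIFORM_SINGLE_TOKEN_DECODE:
        return uniform and query_lens[0] == 1
    return False  # NEVER


# Assumption: with MTP and one speculative token, each decode request
# carries 2 query tokens (1 verified token + 1 draft token).
mtp_batch = [2, 2, 2, 2]
before = can_capture_decode(
    AttentionCGSupport.UNIFORM_SINGLE_TOKEN_DECODE, mtp_batch)
after = can_capture_decode(AttentionCGSupport.UNIFORM_BATCH, mtp_batch)
print(before, after)
```

Under these assumptions, the old support level rejects the MTP decode batch while the new one accepts it, which matches the symptom described above.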
- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef
Signed-off-by: cookieyyds <126683903+cookieyyds@users.noreply.github.com>
@@ -167,7 +167,7 @@ class AscendSFAMetadataBuilder(MLACommonMetadataBuilder[AscendSFAMetadata]):
     ) -> AttentionCGSupport:
         # Explicit override in case the underlying builder specialized this getter.
         # @override omitted only because of mypy limitation due to type variable.
-        return AttentionCGSupport.UNIFORM_SINGLE_TOKEN_DECODE
+        return AttentionCGSupport.UNIFORM_BATCH
 
     def reorder_batch(self, input_batch: "NPUInputBatch",
                       scheduler_output: "SchedulerOutput") -> bool: