[Bugfix] Add the missing parentheses to @torch.inference_mode (#6757)

### What this PR does / why we need it?
This PR fixes a bug in `vllm_ascend/worker/model_runner_v1.py` where the
`@torch.inference_mode` decorator was used without parentheses. Using
the decorator without instantiation is deprecated and may not correctly
disable gradient calculations, leading to performance degradation and
increased memory usage during inference. This change adds the required
parentheses to ensure `torch.inference_mode` is applied correctly.
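
For context, here is a minimal sketch of the corrected usage (the `generate` function is hypothetical, not taken from the patched file): the instantiated decorator is the documented form, and the assert verifies that inference mode is actually in effect at runtime.

```python
import torch

@torch.inference_mode()  # instantiated decorator, as in this fix
def generate(x: torch.Tensor) -> torch.Tensor:
    # Confirm autograd is disabled inside the decorated function.
    assert torch.is_inference_mode_enabled()
    return x * 2

out = generate(torch.ones(3, requires_grad=True))
print(out.requires_grad)   # False: no autograd tracking under inference mode
print(out.is_inference())  # True: the output is an inference tensor
```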

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
The change is a minor syntax correction. Existing CI tests should cover
this.

- vLLM version: v0.15.0
- vLLM main: 9562912cea

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Author: Canlin Guo
Date: 2026-02-25 14:37:53 +08:00
Committed by: GitHub
Parent: 957804df56
Commit: ad9d9569ea


```diff
@@ -1325,7 +1325,7 @@ class NPUModelRunner(GPUModelRunner):
         self.kv_connector_output = kv_connector_output
         return None

-    @torch.inference_mode
+    @torch.inference_mode()
     def sample_tokens(
         self, grammar_output: "GrammarOutput | None"
     ) -> ModelRunnerOutput | AsyncModelRunnerOutput | IntermediateTensors:
```