[Bugfix] Add the missing parentheses to @torch.inference_mode (#6757)

### What this PR does / why we need it?
This PR fixes a bug in `vllm_ascend/worker/model_runner_v1.py` where the
`@torch.inference_mode` decorator was used without parentheses. Using
the decorator without instantiation is deprecated and may not correctly
disable gradient calculations, leading to performance degradation and
increased memory usage during inference. This change adds the required
parentheses to ensure `torch.inference_mode` is applied correctly.
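
For context, here is a minimal sketch of the corrected usage (the `generate` function is hypothetical, not taken from the patched file): the instantiated decorator is the documented form, and the assert verifies that inference mode is actually in effect at runtime.

```python
import torch

@torch.inference_mode()  # instantiated decorator, as in this fix
def generate(x: torch.Tensor) -> torch.Tensor:
    # Confirm autograd is disabled inside the decorated function.
    assert torch.is_inference_mode_enabled()
    return x * 2

out = generate(torch.ones(3, requires_grad=True))
print(out.requires_grad)   # False: no autograd tracking under inference mode
print(out.is_inference())  # True: the output is an inference tensor
```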

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
The change is a minor syntax correction. Existing CI tests should cover
this.

- vLLM version: v0.15.0
- vLLM main: 9562912cea

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Author: Canlin Guo
Date: 2026-02-25 14:37:53 +08:00
Committed by: GitHub
Parent: 957804df56
Commit: ad9d9569ea


```diff
@@ -1325,7 +1325,7 @@ class NPUModelRunner(GPUModelRunner):
         self.kv_connector_output = kv_connector_output
         return None

-    @torch.inference_mode
+    @torch.inference_mode()
     def sample_tokens(
         self, grammar_output: "GrammarOutput | None"
     ) -> ModelRunnerOutput | AsyncModelRunnerOutput | IntermediateTensors:
```