[releases/v0.18.0][Platform][BugFix] Guard forced tool choice with empty content (#8400)

### What this PR does / why we need it?

This backports the forced-tool-choice `content=None` guard to the
`releases/v0.18.0` compatibility layer.

Upstream vLLM still has forced named tool-choice branches that assert
`content is not None` after reasoning extraction. Some reasoning parsers
can legally consume the full output and return `(reasoning, None)`,
which makes the assert reachable and can surface as a server-side
failure.

This PR follows the same compatibility-patch pattern used by:
- `7314bbe2` fix(platform): reimplement MiniMax usage accounting patch
(#7835)
- `f83cb0e6` [Bugfix][Platform] Fix GLM47 tool-call finish backfill
(#7710)

The patch is intentionally narrow:
- normalize `content=None` to `""` only for forced named tool choice
- patch both chat-completions and responses parser entry points
- keep the rest of upstream behavior unchanged

Upstream tracking:
- issue: vllm-project/vllm#40147
- PR: vllm-project/vllm#40148

### Does this PR introduce _any_ user-facing change?

Yes.

Forced named tool choice no longer fails with an internal assertion when
the reasoning parser returns no post-reasoning content; the server emits
a function call with empty arguments instead.
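For illustration, this is roughly the shape of the tool call a client would see in that case; the field names follow the OpenAI chat-completions format, and the `id` value is made up.

```python
import json

tool_call = {
    "id": "call_0",  # hypothetical id
    "type": "function",
    # The forced function is still called, just with empty arguments:
    "function": {"name": "get_weather", "arguments": ""},
}
payload = json.dumps(tool_call)
```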

### How was this patch tested?

Unit tests:
```bash
pytest -sv tests/ut/patch/platform/test_patch_tool_choice_none_content.py \
  tests/ut/patch/platform/test_patch_glm_tool_call_parser.py \
  tests/ut/patch/platform/test_patch_minimax_usage_accounting.py
```

Result: 22 passed.

---------

Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Commit `d81101acdd` (parent `ff76c6780e`) by jack, committed via GitHub
on 2026-04-23 16:46:10 +08:00. 4 changed files with 202 additions and
0 deletions.


```diff
@@ -30,6 +30,7 @@ import vllm_ascend.patch.platform.patch_sched_yield # noqa
 import vllm_ascend.patch.platform.patch_torch_accelerator # noqa
 import vllm_ascend.patch.platform.patch_minimax_usage_accounting # noqa
 import vllm_ascend.patch.platform.patch_glm_tool_call_parser # noqa
+import vllm_ascend.patch.platform.patch_tool_choice_none_content # noqa
 if envs.VLLM_ASCEND_BALANCE_SCHEDULING:
     import vllm_ascend.patch.platform.patch_balance_schedule # noqa
```


@@ -0,0 +1,86 @@
```python
#
# Copyright (c) 2026 Huawei Technologies Co., Ltd. All Rights Reserved.
# This file is a part of the vllm-ascend project.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# OpenAI forced tool choice: tolerate None content after reasoning extraction.
#
from __future__ import annotations

from openai.types.responses import ToolChoiceFunction
from vllm.entrypoints.openai.chat_completion.protocol import (
    ChatCompletionNamedToolChoiceParam,
)
from vllm.entrypoints.openai.engine.serving import OpenAIServing
from vllm.parser.abstract_parser import DelegatingParser


def _normalize_tool_choice_content(
    request,
    content: str | None,
) -> str | None:
    if content is not None:
        return content
    tool_choice = getattr(request, "tool_choice", None)
    if isinstance(
        tool_choice,
        (ToolChoiceFunction, ChatCompletionNamedToolChoiceParam),
    ):
        return ""
    return content


_original_parse_tool_calls_from_content = (
    OpenAIServing._parse_tool_calls_from_content
)


def _patched_parse_tool_calls_from_content(
    request,
    tokenizer,
    enable_auto_tools: bool,
    tool_parser_cls,
    content: str | None = None,
):
    content = _normalize_tool_choice_content(request, content)
    return _original_parse_tool_calls_from_content(
        request=request,
        tokenizer=tokenizer,
        enable_auto_tools=enable_auto_tools,
        tool_parser_cls=tool_parser_cls,
        content=content,
    )


OpenAIServing._parse_tool_calls_from_content = staticmethod(
    _patched_parse_tool_calls_from_content
)


_original_delegating_parse_tool_calls = DelegatingParser._parse_tool_calls


def _patched_delegating_parse_tool_calls(
    self,
    request,
    content: str | None,
    enable_auto_tools: bool,
):
    content = _normalize_tool_choice_content(request, content)
    return _original_delegating_parse_tool_calls(
        self,
        request,
        content,
        enable_auto_tools,
    )


DelegatingParser._parse_tool_calls = _patched_delegating_parse_tool_calls
```