xc-llm-ascend/vllm_ascend at 1fc7bc056d684f82a452ec2421ce6096b90804c1

EngineX/xc-llm-ascend

jack d81101acdd [releases/v0.18.0][Platform][BugFix] Guard forced tool choice with empty content (#8400)

### What this PR does / why we need it?

This backports the forced-tool-choice `content=None` guard to the
`releases/v0.18.0` compatibility layer.

Upstream vLLM still has forced named tool-choice branches that assert
`content is not None` after reasoning extraction. Some reasoning parsers
can legally consume the full output and return `(reasoning, None)`,
which makes the assert reachable and can surface as a server-side
failure.
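The failure mode described above can be illustrated with a minimal sketch. It does not use vLLM code: `toy_extract_reasoning` and `forced_tool_call_unguarded` are hypothetical stand-ins, and the `<think>` tags are an assumed reasoning format chosen only for the example.

```python
from typing import Optional, Tuple


def toy_extract_reasoning(output: str) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical parser: split output into (reasoning, content)."""
    start, end = "<think>", "</think>"
    if not output.startswith(start):
        # No reasoning section: everything is content.
        return None, output
    body = output[len(start):]
    if end not in body:
        # Unterminated reasoning: the parser consumes the full output.
        return body, None
    reasoning, _, rest = body.partition(end)
    # An empty remainder is also reported as "no content".
    return reasoning, (rest or None)


def forced_tool_call_unguarded(output: str, tool_name: str) -> dict:
    """Hypothetical forced named tool-choice branch without a guard."""
    _reasoning, content = toy_extract_reasoning(output)
    # Stands in for the upstream assert described above.
    assert content is not None
    return {"name": tool_name, "arguments": content}
```

Under these assumptions, calling `forced_tool_call_unguarded("<think>still thinking", "get_weather")` raises `AssertionError`, which is the kind of server-side failure the PR aims to avoid.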

This PR follows the same compatibility-patch pattern used by:
- `7314bbe2` fix(platform): reimplement MiniMax usage accounting patch
(#7835)
- `f83cb0e6` [Bugfix][Platform] Fix GLM47 tool-call finish backfill
(#7710)

The patch is intentionally narrow:
- normalize `content=None` to `""` only for forced named tool choice
- patch both chat-completions and responses parser entry points
- keep the rest of upstream behavior unchanged
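
As a rough sketch of the narrow normalization listed above (not the actual patch), the guard could look like the following. The names `normalize_forced_tool_content` and `build_forced_tool_call`, and the shape of the returned dictionary, are assumptions made for illustration.

```python
from typing import Optional


def normalize_forced_tool_content(
    content: Optional[str], forced_named_tool: bool
) -> Optional[str]:
    """Map content=None to "" only when a named tool choice is forced."""
    if forced_named_tool and content is None:
        return ""
    # All other cases keep the original value untouched.
    return content


def build_forced_tool_call(content: Optional[str], tool_name: str) -> dict:
    """Hypothetical forced named tool-choice branch with the guard applied."""
    content = normalize_forced_tool_content(content, forced_named_tool=True)
    # The assert can no longer fire on this path.
    assert content is not None
    return {
        "type": "function",
        "function": {"name": tool_name, "arguments": content},
    }
```

With this guard, `build_forced_tool_call(None, "get_weather")` yields a function call whose `arguments` is the empty string, while requests without a forced named tool still see `None` unchanged.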

Upstream tracking:
- issue: vllm-project/vllm#40147
- PR: vllm-project/vllm#40148

### Does this PR introduce _any_ user-facing change?

Yes.

Forced named tool choice becomes robust when the reasoning parser
returns no post-reasoning content, avoiding an internal assertion
failure and emitting an empty-argument function call instead.

### How was this patch tested?

Unit tests:
```bash
pytest -sv tests/ut/patch/platform/test_patch_tool_choice_none_content.py \
  tests/ut/patch/platform/test_patch_glm_tool_call_parser.py \
  tests/ut/patch/platform/test_patch_minimax_usage_accounting.py
```

Result: 22 passed.

---------

Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>

2026-04-23 16:46:10 +08:00

[Performance] Use forward_native for Conv3dLayer and add UT (#8375)

2026-04-20 17:20:40 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

[BugFix][0.18.0] Fix quant_bias missing in w8a8_static when flashcomm1 is enabled for GLM-5 (#8304)

2026-04-17 22:46:36 +08:00

[v0.18.0][BugFix] PIECEWISE mode also requires synchronization (#8469)

2026-04-21 16:22:32 +08:00

[v0.18.0][Misc] Recompute scheduler upgrade to vLLM 0.18.0 (#7720)

2026-03-27 18:24:53 +08:00

[A5][bugfix] Fix fused MoE A5 MXFP8 scale normalization, load-balance routing and gating_topk ops (#7573)

2026-03-25 17:20:28 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

[releases/v0.18.0][Doc][Misc] Modifying Configuration Parameters (#8618)

2026-04-23 16:23:31 +08:00

[V0.18.0][EPLB][BugFix] Fix moe_load precision in allgather (#7890)

2026-04-02 09:20:31 +08:00

upgrade to 0.18.0 (#7502)

2026-03-21 16:05:38 +08:00

[Bugfix][LoRA] Fix the bug when runs Qwen3-Reranker-0.6B with LoRA. (#7156)

2026-03-15 17:55:42 +08:00

[ModelLoader][Feature] Add rfork support for fast model loading (#7392)

2026-03-25 16:40:30 +08:00

[BugFix] fix tl.extract_slice and tl.insert_slice. (#8567)

2026-04-23 09:50:29 +08:00

[releases/v0.18.0][Platform][BugFix] Guard forced tool choice with empty content (#8400)

2026-04-23 16:46:10 +08:00

[BugFix][0.18.0] Fix quant_bias missing in w8a8_static when flashcomm1 is enabled for GLM-5 (#8304)

2026-04-17 22:46:36 +08:00

[releases/v0.18.0][Triton][Sampler] Add penalty-related Triton kernel for better performance of penalties (#7794)

2026-03-31 19:01:51 +08:00

[BugFix] fix hang in async scheduling while open ENPU (#8354)

2026-04-18 00:07:15 +08:00

[v0.18.0][BugFix] Fix Qwen3.5 MoE FC1 error under high concurrency when dp>1 (#8395)

2026-04-20 10:26:19 +08:00

Main2main upgrade to vllm 0317 afternoon (#7409)

2026-03-18 23:24:27 +08:00

__init__.py

[ModelLoader][Feature] Add rfork support for fast model loading (#7392)

2026-03-25 16:40:30 +08:00

ascend_config.py

[BugFix][v0.18.0] Gate recompute/balance/fused_mc2 by PD mode (#8374)

2026-04-18 18:06:42 +08:00

ascend_forward_context.py

[Bugfix][eager][oom] fix rank0 load imbalance by no padding when multi dp (#7297)

2026-03-23 17:05:02 +08:00

batch_invariant.py

[CI] Add pre-commit check for patch logger (#7446)

2026-03-19 16:53:20 +08:00

cpu_binding.py

[BugFix] Enforce C locale for CPU binding subprocess parsing (#8261)

2026-04-16 16:17:10 +08:00

envs.py

[BugFix][v0.18.0] Gate recompute/balance/fused_mc2 by PD mode (#8374)

2026-04-18 18:06:42 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

meta_registration.py

[Ops][Refactor] Remove custom rotary_embedding operator (#6523)

2026-02-07 09:24:05 +08:00

platform.py

[BugFix] Require kv producer for layer sharding (#8563)

2026-04-23 16:06:53 +08:00

profiling_config.py

[Core][Misc] Clean up ProfileExecuteDuration (#6461)

2026-02-01 20:06:01 +08:00

utils.py

[Performance] Use forward_native for Conv3dLayer and add UT (#8375)

2026-04-20 17:20:40 +08:00