|
|
|
import torch
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# torch_npu.argsort does not sipport bool now, it will support it in the future.
|
|
|
|
|
# TODO When the operator of argsort is ready, this patch must be removed.
|
|
|
|
|
def _argsort(tensor, *args, **kwargs):
|
|
|
|
|
if tensor.dtype == torch.bool:
|
2025-12-15 13:22:30 +08:00
|
|
|
# If it is not stable, it will have redundant outputs.
|
|
|
|
|
kwargs["stable"] = True
|
2025-12-10 22:54:24 +08:00
|
|
|
return torch.argsort(tensor.to(torch.int32), *args, **kwargs)
|
|
|
|
|
else:
|
|
|
|
|
return torch.argsort(tensor, *args, **kwargs)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class _TorchWrapper:
|
|
|
|
|
def __init__(self):
|
|
|
|
|
self._raw_torch = torch
|
|
|
|
|
|
|
|
|
|
def __getattr__(self, name):
|
|
|
|
|
if name == "argsort":
|
|
|
|
|
return _argsort
|
|
|
|
|
else:
|
|
|
|
|
return getattr(self._raw_torch, name)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# Module-level guard so the gdn_attn patch is applied at most once.
_is_patched = False
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# patch argsort only for torch in gdn_attn
def patch_torch_npu_argsort():
    """Replace the ``torch`` reference inside gdn_attn with the proxy.

    Only the gdn_attn module's view of ``torch`` is swapped; the global
    ``torch`` module itself is untouched.  Idempotent: once applied,
    subsequent calls are no-ops.
    """
    global _is_patched
    if _is_patched:
        return

    import vllm.v1.attention.backends.gdn_attn as gdn_attn

    gdn_attn.torch = _TorchWrapper()
    _is_patched = True
|