xc-llm-ascend

Author SHA1 Message Date


yupeng

cf96366a39

[Bugfix][LoRA][Patch] Fix the LoRA inference bug after upstream vLLM codebase changed (#2560)

### What this PR does / why we need it?
Merging the upstream PR
https://github.com/vllm-project/vllm/pull/22592 introduced a LoRA
inference bug in vllm-ascend. Details follow:

According to
[torch_npu/npu/_stream_check.py](863b9071cb/torch_npu/npu/_stream_check.py (L74)),
tensors on an NPU device report both is_cuda=True and is_npu=True.
As a result, vLLM's apply_repetition_penalties function takes the
"if logits.is_cuda and logits.is_contiguous()" branch and calls the
custom op implemented in CUDA, which is not compatible with NPU.
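
The dispatch problem can be illustrated with a minimal sketch. It does not import torch or vLLM: `FakeTensor`, `upstream_style_path`, `npu_aware_path`, and the returned path labels are all hypothetical names made up for illustration, and the `is_npu` guard is only one possible way to avoid the CUDA branch, not necessarily what this patch does.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; carries only the dispatch flags."""
    is_cuda: bool
    is_npu: bool = False
    contiguous: bool = True

    def is_contiguous(self) -> bool:
        return self.contiguous


def upstream_style_path(logits) -> str:
    # Mirrors the condition quoted above: is_cuda alone selects the CUDA op.
    if logits.is_cuda and logits.is_contiguous():
        return "cuda_custom_op"
    return "torch_fallback"


def npu_aware_path(logits) -> str:
    # Assumed guard: a tensor that also reports is_npu is not treated as CUDA.
    on_real_cuda = logits.is_cuda and not getattr(logits, "is_npu", False)
    if on_real_cuda and logits.is_contiguous():
        return "cuda_custom_op"
    return "torch_fallback"


# An NPU tensor reports is_cuda=True and is_npu=True, as described above.
npu_logits = FakeTensor(is_cuda=True, is_npu=True)
cuda_logits = FakeTensor(is_cuda=True)

print(upstream_style_path(npu_logits))  # the incompatible CUDA branch is chosen
print(npu_aware_path(npu_logits))       # the guard routes NPU tensors elsewhere
print(npu_aware_path(cuda_logits))      # real CUDA tensors keep the custom op
```

Checking `is_npu` before `is_cuda` keeps the behavior for genuine CUDA tensors unchanged while steering NPU tensors away from the CUDA-only op.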

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
pytest -sv tests/e2e/singlecard/test_ilama_lora.py
pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py

- vLLM version: v0.10.1.1
- vLLM main: fe8d7b6f03

---------

Signed-off-by: paulyu12 <paulyu0307@gmail.com>
Signed-off-by: paulyu12 <507435917@qq.com>
Co-authored-by: paulyu12 <paulyu0307@gmail.com>

2025-08-28 10:40:51 +08:00

1 Commits