For nz unset in bf16&fp16 (#4495)

### What this PR does / why we need it?
Disable NZ for the float weight case (bf16 and fp16). This is only a quick
fix for the dev branch.

For the main branch, we'll consider more cases to make the fix more general.


### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Tested with Qwen2.5 32B:
<img width="441" height="221" alt="image"
src="https://github.com/user-attachments/assets/7ae18ffd-1ce2-43d9-9960-be45250ad0da"
/>

---------

Signed-off-by: 刘哲续 <liuzhexu1@huawei.com>
Co-authored-by: 刘哲续 <liuzhexu1@huawei.com>
henryxuxu0716
2025-11-28 17:32:25 +08:00
committed by GitHub
parent 96c362361e
commit 71acc8ddeb
10 changed files with 16 additions and 14 deletions


@@ -81,7 +81,7 @@ class NPUWorker(WorkerBase):
 # register patch for vllm
 from vllm_ascend.utils import adapt_patch
 adapt_patch()
-is_enable_nz(vllm_config)
+is_enable_nz(vllm_config=vllm_config)
 # Register ops when worker init.
 from vllm_ascend import ops
 ops.register_dummy_fusion_op()
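
The hunk above changes a positional argument into a keyword argument. The signature of `is_enable_nz` is not shown in this diff, so the following is only a sketch of why such a change can matter. It assumes a hypothetical function whose first parameter is a dtype and whose second is the config. The name `is_enable_nz_sketch`, the string dtypes, the dict-based config, and the `enable_nz` key are all illustrative assumptions, not the vllm_ascend API.

```python
from typing import Optional

# Hypothetical stand-ins; none of these names come from vllm_ascend.
FLOAT_DTYPES = {"bfloat16", "float16"}

_ENABLE_NZ: Optional[bool] = None  # cached global decision, set on first call


def reset_cache() -> None:
    """Clear the cached decision (helper for the demo below)."""
    global _ENABLE_NZ
    _ENABLE_NZ = None


def is_enable_nz_sketch(dtype: str = "int8",
                        vllm_config: Optional[dict] = None) -> bool:
    """Return whether NZ should be used for a weight of the given dtype."""
    global _ENABLE_NZ
    if _ENABLE_NZ is None:
        # The first call must receive the config to initialise the cache.
        if vllm_config is None:
            raise ValueError("vllm_config is required on the first call")
        _ENABLE_NZ = bool(vllm_config.get("enable_nz", True))
    # Assumed behaviour: float weights (bf16/fp16) never use NZ.
    if dtype in FLOAT_DTYPES:
        return False
    return _ENABLE_NZ


if __name__ == "__main__":
    config = {"enable_nz": True}
    try:
        # Positional call: config binds to `dtype`, vllm_config stays None.
        is_enable_nz_sketch(config)
    except ValueError as exc:
        print("positional call failed:", exc)
    # Keyword call: config reaches the intended parameter.
    print(is_enable_nz_sketch(vllm_config=config))
    print(is_enable_nz_sketch("bfloat16"))
```

Under these assumptions, once a dtype parameter precedes the config in the signature, a positional call silently binds the config to the wrong parameter, while the keyword form keeps working.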