[BugFix] Fix AddRMSNormQuant not taking effect (#6620)
### What this PR does / why we need it?
Fix the issue where, in graph mode, the fused `AddRMSNormQuant` operator
does not take effect when there is no bias.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main:
d7e17aaacd
---------
Signed-off-by: ZYang6263 <zy626375@gmail.com>
This commit is contained in:
@@ -26,8 +26,8 @@ def test_qwen3_w8a8_quant():
|
||||
]
|
||||
vllm_target_outputs = [([
|
||||
85, 4086, 44, 374, 264, 1550, 42747, 628, 323, 4938, 72816, 44378, 323,
|
||||
13480, 4712, 369, 444, 10994, 82, 13, 1084, 374, 6188, 369, 3460
|
||||
], 'vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed for large'
|
||||
13480, 4712, 369, 444, 10994, 82, 13, 1084, 374, 6188, 311, 387
|
||||
], 'vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be'
|
||||
)]
|
||||
|
||||
with VllmRunner(
|
||||
|
||||
Reference in New Issue
Block a user