[300I][Bugfix] fix unquant model weight nd2nz error (#6851)
### What this PR does / why we need it?
- This PR fixes an issue with weight format conversion for unquantized
models running on Ascend 310P devices.
- The changes refactor the logic for converting weights to the
FRACTAL_NZ format. Previously, this was handled in a 310P-specific
linear layer implementation (`AscendUnquantizedLinearMethod310`). This
implementation has been removed, and the logic is now centralized in the
`maybe_trans_nz` utility function. This function now checks if the
device is a 310P and applies the NZ format cast accordingly for
`float16`/`bfloat16` weights.
- This refactoring simplifies the code by removing platform-specific
duplication and ensures correct weight handling for unquantized models
on 310P.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
ut and local test
- vLLM version: v0.15.0
- vLLM main:
83b47f67b1
---------
Signed-off-by: Tflowers-0129 <2906339855@qq.com>
This commit is contained in:
@@ -21,7 +21,7 @@ from vllm.model_executor.layers.linear import LinearBase
|
||||
|
||||
from tests.ut.base import TestBase
|
||||
from vllm_ascend._310p.fused_moe.fused_moe import AscendUnquantizedFusedMoEMethod310
|
||||
from vllm_ascend._310p.ops.linear import AscendUnquantizedLinearMethod310
|
||||
from vllm_ascend.ops.linear import AscendUnquantizedLinearMethod
|
||||
from vllm_ascend._310p.quantization.modelslim_config import AscendModelSlimConfig310
|
||||
|
||||
|
||||
@@ -50,7 +50,7 @@ class TestAscendModelSlimConfig310(TestBase):
|
||||
patch.object(self.ascend_config, "is_layer_skipped_ascend", return_value=True),
|
||||
):
|
||||
method = self.ascend_config.get_quant_method(linear_layer, ".attn")
|
||||
self.assertIsInstance(method, AscendUnquantizedLinearMethod310)
|
||||
self.assertIsInstance(method, AscendUnquantizedLinearMethod)
|
||||
|
||||
# Test quantized layer
|
||||
mock_scheme = MagicMock()
|
||||
|
||||
Reference in New Issue
Block a user