[misc] move mxfp_compat into device to decouple from quantization init chain (#6918)
### What this PR does / why we need it?
`mxfp_compat` only provides dtype/symbol compatibility helpers for
different `torch_npu` versions, but it was placed under
`vllm_ascend.quantization`. Importing it from device/ops paths executes
`quantization/__init__.py` first, which can pull in heavy quantization
method dependencies, couple startup to the quantization package, and
create import-cycle risk (especially on 310P paths). This PR moves the
module to `vllm_ascend.device` and updates the import accordingly.
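
The coupling described above follows from how Python imports submodules: importing `pkg.sub.mod` always runs `pkg/sub/__init__.py` before `mod` itself. The sketch below reproduces this with a throwaway package. The package name `demo_pkg`, the placeholder constant value, and the helper function names are all hypothetical and exist only for this illustration; they are not taken from `vllm_ascend`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Placeholder helper module body; the real mxfp_compat contents are not shown here.
HELPER_SOURCE = "FLOAT8_E8M0FNU_DTYPE = None  # placeholder value for the demo\n"


def build_demo_package(root: Path) -> None:
    """Create demo_pkg/{quantization,device}/mxfp_compat.py under root."""
    pkg = root / "demo_pkg"
    for sub in ("quantization", "device"):
        (pkg / sub).mkdir(parents=True)
        (pkg / sub / "__init__.py").write_text("")
        (pkg / sub / "mxfp_compat.py").write_text(HELPER_SOURCE)
    (pkg / "__init__.py").write_text("")


def loaded_after_import(module_name: str) -> set:
    """Import module_name from a clean state; return the demo_pkg modules loaded."""
    for name in [n for n in sys.modules if n.split(".")[0] == "demo_pkg"]:
        del sys.modules[name]
    importlib.invalidate_caches()
    importlib.import_module(module_name)
    return {n for n in sys.modules if n.split(".")[0] == "demo_pkg"}


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    try:
        # Old location: the quantization package __init__ runs as a side effect.
        via_quantization = loaded_after_import("demo_pkg.quantization.mxfp_compat")
        # New location: the quantization package is never touched.
        via_device = loaded_after_import("demo_pkg.device.mxfp_compat")
    finally:
        sys.path.remove(tmp)

print(sorted(via_quantization))
print(sorted(via_device))
```

In the demo the `quantization/__init__.py` file is empty, so the side effect is harmless; in a real package that file may import every quantization method, which is the cost the move avoids.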
### Does this PR introduce _any_ user-facing change?
No functional behavior change intended.
### How was this patch tested?
CI passed.
- vLLM version: v0.16.0
- vLLM main: 15d76f74e2
---------
Signed-off-by: linfeng-yuan <1102311262@qq.com>
@@ -18,7 +18,7 @@
 import torch
 import torch_npu
 
-from vllm_ascend.quantization.mxfp_compat import (
+from vllm_ascend.device.mxfp_compat import (
     FLOAT4_E2M1FN_X2_DTYPE,
     FLOAT8_E8M0FNU_DTYPE,
     HIFLOAT8_DTYPE,
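
The names imported in the hunk above are described as dtype/symbol compatibility helpers for different `torch_npu` versions. The contents of `mxfp_compat` are not shown in this commit, so the following is only a minimal sketch of one common way such a helper can be written: look the symbol up with `getattr` and fall back to `None` when the installed build does not expose it. The stand-in namespaces, the attribute name `float8_e8m0fnu`, and the function `resolve_optional_symbol` are assumptions made for this illustration, not the actual `torch_npu` or `vllm_ascend` API.

```python
from types import SimpleNamespace


def resolve_optional_symbol(module, name, default=None):
    """Return module.<name> if this build exposes it, otherwise default."""
    return getattr(module, name, default)


# Hypothetical stand-ins for an older and a newer build of a backend module.
older_backend = SimpleNamespace(float16="float16")
newer_backend = SimpleNamespace(float16="float16", float8_e8m0fnu="float8_e8m0fnu")

# Resolved once at import time, the way a module-level constant would be.
DTYPE_ON_OLDER = resolve_optional_symbol(older_backend, "float8_e8m0fnu")
DTYPE_ON_NEWER = resolve_optional_symbol(newer_backend, "float8_e8m0fnu")

print(DTYPE_ON_OLDER)  # None: callers must check before using the dtype
print(DTYPE_ON_NEWER)  # float8_e8m0fnu
```

A helper of this shape has no dependency on quantization methods, which is consistent with the commit's argument that it belongs in a lightweight package.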