[CI] Patch torch.library.infer_schema for fused moe ops to fix CI (#854)

Make sure the PyTorch infer_schema check is patched before any case that uses fused moe ops:
1. model registration
2. quantization loading
3. fused moe unit tests
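
The idea behind the fix can be sketched as a monkey patch installed before any op registration runs. The sketch below is illustrative only: the stand-in `torch_library` namespace and the wrapper body are assumptions for demonstration, not vllm-ascend's actual `patch_utils` implementation.

```python
# Hypothetical sketch of the "patch before use" pattern from this commit:
# swap in a wrapper for infer_schema before fused moe ops are registered.
import types

# Stand-in namespace so the sketch is self-contained (not the real torch.library).
torch_library = types.SimpleNamespace()

def infer_schema(fn, mutates_args=()):
    # Pretend the stock schema inference rejects some op signature.
    raise TypeError("unsupported annotation")

torch_library.infer_schema = infer_schema
_original = torch_library.infer_schema

def patched_infer_schema(fn, mutates_args=()):
    # Fall back to a permissive schema when the original raises.
    # The fallback schema string here is purely illustrative.
    try:
        return _original(fn, mutates_args=mutates_args)
    except TypeError:
        return "(Tensor x) -> Tensor"

# The patch must be installed before any module registers fused moe ops,
# which is why patch_utils is imported first in the diff below.
torch_library.infer_schema = patched_infer_schema
```

After the swap, code that registers ops through `torch_library.infer_schema` gets the patched behavior transparently, which is why import order matters.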

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
This commit is contained in:
wangxiyuan
2025-05-14 19:49:09 +08:00
committed by GitHub
parent 508242425c
commit 68fb63428b
3 changed files with 11 additions and 0 deletions


@@ -15,6 +15,10 @@
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
# In quantization cases, this file is imported before the worker patches are
# applied, so we import patch_utils here first to make sure the patch is in effect.
import vllm_ascend.patch.worker.patch_common.patch_utils # type: ignore[import] # isort: skip # noqa
from types import MappingProxyType
from typing import Any, Callable, Dict, List, Mapping, Optional