xc-llm-ascend/tests/e2e/singlecard/pooling/test_classification.py

import torch
from modelscope import snapshot_download  # type: ignore[import-untyped]
from transformers import AutoModelForSequenceClassification
import huggingface_hub
from tests.e2e.conftest import HfRunner, VllmRunner
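# Compare classification outputs from vLLM's pooling runner against the
# HuggingFace reference implementation for the same model and prompts.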
def test_qwen_pooling_classify_correctness() -> None:
    # Resolve the model from the local cache when HF_HUB_OFFLINE is set;
    # otherwise download it from ModelScope.
    model_name = snapshot_download(
        "Howeee/Qwen2.5-1.5B-apeach",
        local_files_only=huggingface_hub.constants.HF_HUB_OFFLINE,
    )
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is what",
    ]
    # Classify the prompts with vLLM's pooling runner.
    with VllmRunner(
            model_name,
            runner="pooling",
            max_model_len=None,
            cudagraph_capture_sizes=[4],
    ) as vllm_runner:
        vllm_outputs = vllm_runner.classify(prompts)
    # Classify the same prompts with the HuggingFace reference model.
    with HfRunner(model_name,
                  dtype="float32",
                  auto_cls=AutoModelForSequenceClassification) as hf_runner:
        hf_outputs = hf_runner.classify(prompts)
    # Outputs from both backends should agree within a 1e-2 relative tolerance.
    for hf_output, vllm_output in zip(hf_outputs, vllm_outputs):
        hf_output = torch.tensor(hf_output)
        vllm_output = torch.tensor(vllm_output)
        assert torch.allclose(hf_output, vllm_output, rtol=1e-2)