update for ascend

2025-09-09 14:45:30 +08:00
parent 16a28b4778
commit c512f7d43d
11 changed files with 43467 additions and 0 deletions
--- a/.DS_Store
+++ b/.DS_Store
--- a/026_0010.jpg
+++ b/026_0010.jpg
--- a/17
+++ b/17
@@ -0,0 +1,17 @@
+FROM quay.io/ascend/vllm-ascend:v0.10.0rc1
+
+WORKDIR /workspace/
+COPY ./model_test_caltech_http_ascend.py /workspace/
+COPY ./microsoft_beit_base_patch16_224_pt22k_ft22k /model
+
+
+# Install transformers 4.46.3
+RUN python3 -m pip install --no-cache-dir transformers==4.46.3
+
+
+RUN python3 -m pip install flask==3.1.1
+
+EXPOSE 80
+
+
+ENTRYPOINT ["python3", "model_test_caltech_http_ascend.py"]
--- a/microsoft_beit_base_patch16_224_pt22k_ft22k.zip
+++ b/microsoft_beit_base_patch16_224_pt22k_ft22k.zip
--- a/microsoft_beit_base_patch16_224_pt22k_ft22k/.gitattributes
+++ b/microsoft_beit_base_patch16_224_pt22k_ft22k/.gitattributes
@@ -0,0 +1,18 @@
+*.bin.* filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tar.gz filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
--- a/microsoft_beit_base_patch16_224_pt22k_ft22k/README.md
+++ b/microsoft_beit_base_patch16_224_pt22k_ft22k/README.md
@@ -0,0 +1,104 @@
+---
+license: apache-2.0
+tags:
+- image-classification
+- vision
+datasets:
+- imagenet
+- imagenet-21k
+---
+
+# BEiT (base-sized model, fine-tuned on ImageNet-22k) 
+
+BEiT model pre-trained in a self-supervised fashion on ImageNet-22k - also called ImageNet-21k (14 million images, 21,841 classes) at resolution 224x224, and fine-tuned on the same dataset at resolution 224x224. It was introduced in the paper [BEIT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong and Furu Wei and first released in [this repository](https://github.com/microsoft/unilm/tree/master/beit). 
+
+Disclaimer: The team releasing BEiT did not write a model card for this model so this model card has been written by the Hugging Face team.
+
+## Model description
+
+The BEiT model is a Vision Transformer (ViT), which is a transformer encoder model (BERT-like). In contrast to the original ViT model, BEiT is pretrained on a large collection of images in a self-supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. The pre-training objective for the model is to predict visual tokens from the encoder of OpenAI's DALL-E's VQ-VAE, based on masked patches.
+Next, the model was fine-tuned in a supervised fashion on the same dataset, ImageNet-22k (14 million images, 21,841 classes), also at resolution 224x224.
+
+Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. Contrary to the original ViT models, BEiT models do use relative position embeddings (similar to T5) instead of absolute position embeddings, and perform classification of images by mean-pooling the final hidden states of the patches, instead of placing a linear layer on top of the final hidden state of the [CLS] token.
+
+By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. Alternatively, one can mean-pool the final hidden states of the patch embeddings, and place a linear layer on top of that.
+
+## Intended uses & limitations
+
+You can use the raw model for image classification. See the [model hub](https://huggingface.co/models?search=microsoft/beit) to look for
+fine-tuned versions on a task that interests you.
+
+### How to use
+
+Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 21,841 ImageNet-22k classes:
+
+```python
+from transformers import BeitImageProcessor, BeitForImageClassification
+from PIL import Image
+import requests
+
+url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
+image = Image.open(requests.get(url, stream=True).raw)
+
+processor = BeitImageProcessor.from_pretrained('microsoft/beit-base-patch16-224-pt22k-ft22k')
+model = BeitForImageClassification.from_pretrained('microsoft/beit-base-patch16-224-pt22k-ft22k')
+
+inputs = processor(images=image, return_tensors="pt")
+outputs = model(**inputs)
+logits = outputs.logits
+# model predicts one of the 21,841 ImageNet-22k classes
+predicted_class_idx = logits.argmax(-1).item()
+print("Predicted class:", model.config.id2label[predicted_class_idx])
+```
+
+Currently, both the feature extractor and model support PyTorch.
+
+## Training data
+
+The BEiT model was pretrained on [ImageNet-21k](http://www.image-net.org/), a dataset consisting of 14 million images and 21k classes, and fine-tuned on the same dataset.
+
+## Training procedure
+
+### Preprocessing
+
+The exact details of preprocessing of images during training/validation can be found [here](https://github.com/microsoft/unilm/blob/master/beit/datasets.py). 
+
+Images are resized/rescaled to the same resolution (224x224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5).
+
+### Pretraining
+
+For all pre-training related hyperparameters, we refer to page 15 of the [original paper](https://arxiv.org/abs/2106.08254).
+
+## Evaluation results
+
+For evaluation results on several image classification benchmarks, we refer to tables 1 and 2 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution. Larger model sizes also generally yield better performance.
+
+### BibTeX entry and citation info
+
+```@article{DBLP:journals/corr/abs-2106-08254,
+  author    = {Hangbo Bao and
+               Li Dong and
+               Furu Wei},
+  title     = {BEiT: {BERT} Pre-Training of Image Transformers},
+  journal   = {CoRR},
+  volume    = {abs/2106.08254},
+  year      = {2021},
+  url       = {https://arxiv.org/abs/2106.08254},
+  archivePrefix = {arXiv},
+  eprint    = {2106.08254},
+  timestamp = {Tue, 29 Jun 2021 16:55:04 +0200},
+  biburl    = {https://dblp.org/rec/journals/corr/abs-2106-08254.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+```
+
+```bibtex
+@inproceedings{deng2009imagenet,
+  title={Imagenet: A large-scale hierarchical image database},
+  author={Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li},
+  booktitle={2009 IEEE conference on computer vision and pattern recognition},
+  pages={248--255},
+  year={2009},
+  organization={Ieee}
+}
+```
--- a/microsoft_beit_base_patch16_224_pt22k_ft22k/config.json
+++ b/microsoft_beit_base_patch16_224_pt22k_ft22k/config.json
--- a/microsoft_beit_base_patch16_224_pt22k_ft22k/flax_model.msgpack
+++ b/microsoft_beit_base_patch16_224_pt22k_ft22k/flax_model.msgpack
--- a/microsoft_beit_base_patch16_224_pt22k_ft22k/preprocessor_config.json
+++ b/microsoft_beit_base_patch16_224_pt22k_ft22k/preprocessor_config.json
@@ -0,0 +1,19 @@
+{
+  "crop_size": 224,
+  "do_center_crop": false,
+  "do_normalize": true,
+  "do_resize": true,
+  "feature_extractor_type": "BeitFeatureExtractor",
+  "image_mean": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "image_std": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "resample": 2,
+  "size": 224
+}
--- a/microsoft_beit_base_patch16_224_pt22k_ft22k/pytorch_model.bin
+++ b/microsoft_beit_base_patch16_224_pt22k_ft22k/pytorch_model.bin
--- a/model_test_caltech_http_ascend.py
+++ b/model_test_caltech_http_ascend.py
@@ -0,0 +1,198 @@
+import requests
+import json
+import torch
+from PIL import Image
+from io import BytesIO
+from transformers import AutoImageProcessor, AutoModelForImageClassification
+from tqdm import tqdm
+import os
+import random
+import time
+from flask import Flask, request, jsonify
+
+class ImageClassifier:
+    def __init__(self, model_path: str, device: torch.device):
+        """Initialize the image classifier on the specified device."""
+        # Validate the model path in detail
+        if not os.path.exists(model_path):
+            raise ValueError(f"Model path does not exist: {model_path}")
+        if not os.path.isdir(model_path):
+            raise ValueError(f"Model path is not a directory: {model_path}")
+        
+        # Check that the required model files are present
+        required_files = ["config.json", "pytorch_model.bin"]  # basic model files
+        missing_files = [f for f in required_files if not os.path.exists(os.path.join(model_path, f))]
+        if missing_files:
+            raise ValueError(f"Model path is missing required files: {missing_files}")
+            
+        self.processor = AutoImageProcessor.from_pretrained(model_path)
+        self.model = AutoModelForImageClassification.from_pretrained(model_path)
+
+        # Move the model to the specified device
+        self.model = self.model.to(device)
+        self.device = device
+        
+        # Print device-specific placement information
+        if device.type == "cuda":
+            print(f"Model on GPU: {next(self.model.parameters()).is_cuda}")
+        elif device.type == "npu":
+            print(f"Model on NPU: {next(self.model.parameters()).device.type == 'npu'}")
+        else:
+            print(f"Model running on {device.type.upper()}")
+
+        # Use DataParallel only when multiple CUDA GPUs are present (not applied on NPU)
+        if device.type == "cuda" and torch.cuda.device_count() > 1:
+            self.model = torch.nn.DataParallel(self.model)
+
+        self.id2label = self.model.module.config.id2label if hasattr(self.model, 'module') else self.model.config.id2label
+
+    def predict_single_image(self, image: Image.Image) -> dict:
+        """Predict the class of a single PIL image."""
+        try:
+            # Preprocess
+            inputs = self.processor(images=image, return_tensors="pt")
+
+            # Move the inputs to the device
+            inputs = inputs.to(self.device)
+
+            # Run inference
+            start_time = time.time()
+
+            # T1 times the first forward pass; T2 times 1000 repeated passes
+            with torch.no_grad():
+                ts = time.time()
+                outputs = self.model(**inputs)
+                print('ascend T1', time.time() - ts, flush=True)
+
+                ts = time.time()
+                for _ in range(1000):
+                    outputs = self.model(**inputs)
+                print('ascend T2', time.time() - ts, flush=True)
+
+            # Note: processing_time includes the 1000-pass benchmark loop above
+            processing_time = time.time() - start_time
+
+            # Get the prediction (top-1 only)
+            logits = outputs.logits
+            probs = torch.nn.functional.softmax(logits, dim=1)
+            top_probs, top_indices = probs.topk(1, dim=1)
+
+            # Assemble the result
+            class_idx = top_indices[0, 0].item()
+            confidence = top_probs[0, 0].item()
+
+            # device_used below reports the actual device (NPU, CUDA, or CPU)
+
+            return {
+                "class_id": class_idx,
+                "class_name": self.id2label[class_idx],
+                "confidence": confidence,
+                "device_used": str(self.device),
+                "processing_time": processing_time
+            }
+
+        except Exception as e:
+            print(f"Error while processing image: {e}")
+            return {
+                "class_id": -1,
+                "class_name": "error",
+                "confidence": 0.0,
+                "device_used": str(self.device),
+                "processing_time": 0.0,
+                "error": str(e)
+            }
+
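+def check_ascend_available():
+    """Check whether an Ascend NPU is available."""
+    try:
+        # Check that the Ascend Python package (torch_npu) is installed
+        import torch_npu
+        # Check whether an NPU device is available
+        if hasattr(torch, 'npu') and torch.npu.is_available():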
+            return True
+    except ImportError:
+        pass
+    return False
+
+def get_device():
+    """Return the best available device."""
+    # Check for an Ascend NPU first
+    if check_ascend_available():
+        print("Ascend NPU detected")
+        return torch.device("npu:0")
+    
+    # Then check for an NVIDIA GPU
+    elif torch.cuda.is_available():
+        print("NVIDIA GPU detected")
+        return torch.device("cuda:0")
+    
+    # Fall back to the CPU
+    else:
+        print("No accelerator detected; using CPU")
+        return torch.device("cpu")
+
+# Initialize the service
+app = Flask(__name__)
+MODEL_PATH = os.environ.get("MODEL_PATH", "/model")  # model path (environment variable or default)
+
+# Select the device and initialize the classifier
+device = get_device()
+classifier = ImageClassifier(MODEL_PATH, device)
+
+@app.route('/v1/private/s782b4996', methods=['POST'])
+def predict_single():
+    """Accept a single uploaded image and return the prediction result."""
+    if 'image' not in request.files:
+        return jsonify({
+            "prediction": {
+                "class_id": -1,
+                "class_name": "error",
+                "confidence": 0.0,
+                "device_used": str(device),
+                "processing_time": 0.0,
+                "error": "No image was included in the request"
+            },
+            "status": "error"
+        }), 400
+
+    image_file = request.files['image']
+    try:
+        image = Image.open(BytesIO(image_file.read())).convert("RGB")
+        
+        # Get the prediction result
+        prediction_result = classifier.predict_single_image(image)
+        
+        # Build the response
+        response = {
+            "prediction": prediction_result,
+            "status": "success"
+        }
+        
+        return jsonify(response)
+        
+    except Exception as e:
+        return jsonify({
+            "prediction": {
+                "class_id": -1,
+                "class_name": "error",
+                "confidence": 0.0,
+                "device_used": str(device),
+                "processing_time": 0.0,
+                "error": str(e)
+            },
+            "status": "error"
+        }), 500
+
+@app.route('/health', methods=['GET'])
+def health_check():
+    """Report service health and the device in use."""
+        
+    return jsonify({
+        "status": "healthy",
+        "npu_available": device.type == "npu",
+        "device_used": str(device),
+        "cpu_threads": torch.get_num_threads()
+    }), 200
+
+if __name__ == "__main__":
+    app.run(host='0.0.0.0', port=80, debug=False)