update for ascend

This commit is contained in:
zhousha
2025-09-09 14:45:30 +08:00
parent 16a28b4778
commit c512f7d43d
11 changed files with 43467 additions and 0 deletions

BIN
.DS_Store vendored Normal file

Binary file not shown.

BIN
026_0010.jpg Executable file

Binary file not shown.

Size: 26 KiB

17
Dockerfile_ascend Normal file

@@ -0,0 +1,17 @@
FROM quay.io/ascend/vllm-ascend:v0.10.0rc1
WORKDIR /workspace/
COPY ./model_test_caltech_http_ascend.py /workspace/
COPY ./microsoft_beit_base_patch16_224_pt22k_ft22k /model
# Install transformers 4.46.3
RUN python3 -m pip install --no-cache-dir transformers==4.46.3
RUN python3 -m pip install flask==3.1.1
EXPOSE 80
ENTRYPOINT ["python3", "model_test_caltech_http_ascend.py"]

Binary file not shown.


@@ -0,0 +1,18 @@
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text


@@ -0,0 +1,104 @@
---
license: apache-2.0
tags:
- image-classification
- vision
datasets:
- imagenet
- imagenet-21k
---
# BEiT (base-sized model, fine-tuned on ImageNet-22k)
BEiT model pre-trained in a self-supervised fashion on ImageNet-22k - also called ImageNet-21k (14 million images, 21,841 classes) at resolution 224x224, and fine-tuned on the same dataset at resolution 224x224. It was introduced in the paper [BEIT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong and Furu Wei and first released in [this repository](https://github.com/microsoft/unilm/tree/master/beit).
Disclaimer: The team releasing BEiT did not write a model card for this model so this model card has been written by the Hugging Face team.
## Model description
The BEiT model is a Vision Transformer (ViT), which is a transformer encoder model (BERT-like). In contrast to the original ViT model, BEiT is pretrained on a large collection of images in a self-supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. The pre-training objective for the model is to predict visual tokens from the encoder of OpenAI's DALL-E's VQ-VAE, based on masked patches.
Next, the model was fine-tuned in a supervised fashion on the same ImageNet-22k dataset (14 million images, 21,841 classes), also at resolution 224x224.
Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. Contrary to the original ViT models, BEiT models do use relative position embeddings (similar to T5) instead of absolute position embeddings, and perform classification of images by mean-pooling the final hidden states of the patches, instead of placing a linear layer on top of the final hidden state of the [CLS] token.
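The two pooling strategies described above can be illustrated with a minimal, framework-free sketch. It assumes the [CLS] token sits at index 0 of the final hidden states, followed by the patch tokens; the toy vectors and the helper names `mean_pool_patches` and `cls_token` are hypothetical and exist only for illustration.

```python
from typing import List

# 224x224 images split into 16x16 patches give a 14x14 grid of patches.
IMAGE_SIZE = 224
PATCH_SIZE = 16
num_patches = (IMAGE_SIZE // PATCH_SIZE) ** 2  # 196 patch tokens per image


def mean_pool_patches(hidden_states: List[List[float]]) -> List[float]:
    """Average the patch token vectors, skipping the [CLS] token at index 0."""
    patch_tokens = hidden_states[1:]
    dim = len(patch_tokens[0])
    return [sum(tok[d] for tok in patch_tokens) / len(patch_tokens) for d in range(dim)]


def cls_token(hidden_states: List[List[float]]) -> List[float]:
    """Return the [CLS] token vector (the ViT-style alternative)."""
    return hidden_states[0]


# Toy final hidden states: 1 [CLS] token + 4 patch tokens, hidden size 2.
toy_hidden_states = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pooled = mean_pool_patches(toy_hidden_states)
print(num_patches)  # 196
print(pooled)       # [4.0, 5.0]
```

In either case, the classification head is a linear layer applied to the resulting vector; only the choice of input vector differs.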
By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. Alternatively, one can mean-pool the final hidden states of the patch embeddings, and place a linear layer on top of that.
## Intended uses & limitations
You can use the raw model for image classification. See the [model hub](https://huggingface.co/models?search=microsoft/beit) to look for
fine-tuned versions on a task that interests you.
### How to use
Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 21,841 ImageNet-22k classes:
```python
from transformers import BeitImageProcessor, BeitForImageClassification
from PIL import Image
import requests
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
processor = BeitImageProcessor.from_pretrained('microsoft/beit-base-patch16-224-pt22k-ft22k')
model = BeitForImageClassification.from_pretrained('microsoft/beit-base-patch16-224-pt22k-ft22k')
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 21,841 ImageNet-22k classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
Currently, both the feature extractor and model support PyTorch.
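The example above reports only the single highest-scoring class. As a rough sketch of how the logits could be turned into ranked probabilities, the following pure-Python helpers apply a softmax and pick the top `k` entries. The function names and the toy label map are hypothetical; with PyTorch tensors the same result would normally come from `logits.softmax(-1).topk(k)`.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: List[float], id2label: Dict[int, str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


# Toy example with three made-up classes.
toy_labels = {0: "cat", 1: "dog", 2: "bird"}
for label, prob in top_k([0.1, 2.0, 1.0], toy_labels, k=2):
    print(f"{label}: {prob:.3f}")
```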
## Training data
The BEiT model was pretrained on [ImageNet-21k](http://www.image-net.org/), a dataset consisting of 14 million images and 21k classes, and fine-tuned on the same dataset.
## Training procedure
### Preprocessing
The exact details of preprocessing of images during training/validation can be found [here](https://github.com/microsoft/unilm/blob/master/beit/datasets.py).
Images are resized/rescaled to the same resolution (224x224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5).
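To make the effect of the normalization concrete, here is a small sketch of the per-pixel arithmetic. It assumes that 8-bit pixel values are first rescaled by 1/255 into [0, 1] before the mean and standard deviation are applied; the function name is hypothetical.

```python
def rescale_and_normalize(pixel: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Map an 8-bit pixel value to the normalized range used by the model."""
    scaled = pixel / 255.0          # assumed rescaling step: [0, 255] -> [0, 1]
    return (scaled - mean) / std    # normalization: [0, 1] -> [-1, 1]


print(rescale_and_normalize(0))      # -1.0
print(rescale_and_normalize(127.5))  # 0.0
print(rescale_and_normalize(255))    # 1.0
```

With mean and standard deviation both equal to 0.5, every channel ends up in the range [-1, 1].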
### Pretraining
For all pre-training related hyperparameters, we refer to page 15 of the [original paper](https://arxiv.org/abs/2106.08254).
## Evaluation results
For evaluation results on several image classification benchmarks, we refer to tables 1 and 2 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution. Increasing the model size also tends to improve performance.
### BibTeX entry and citation info
```bibtex
@article{DBLP:journals/corr/abs-2106-08254,
author = {Hangbo Bao and
Li Dong and
Furu Wei},
title = {BEiT: {BERT} Pre-Training of Image Transformers},
journal = {CoRR},
volume = {abs/2106.08254},
year = {2021},
url = {https://arxiv.org/abs/2106.08254},
archivePrefix = {arXiv},
eprint = {2106.08254},
timestamp = {Tue, 29 Jun 2021 16:55:04 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2106-08254.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
```bibtex
@inproceedings{deng2009imagenet,
title={Imagenet: A large-scale hierarchical image database},
author={Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li},
booktitle={2009 IEEE conference on computer vision and pattern recognition},
pages={248--255},
year={2009},
organization={Ieee}
}
```

File diff suppressed because it is too large Load Diff


@@ -0,0 +1,19 @@
{
"crop_size": 224,
"do_center_crop": false,
"do_normalize": true,
"do_resize": true,
"feature_extractor_type": "BeitFeatureExtractor",
"image_mean": [
0.5,
0.5,
0.5
],
"image_std": [
0.5,
0.5,
0.5
],
"resample": 2,
"size": 224
}


@@ -0,0 +1,198 @@
import os
import time
from io import BytesIO

import torch
from flask import Flask, request, jsonify
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification


class ImageClassifier:
    def __init__(self, model_path: str, device: torch.device):
        """Initialize the image classifier on the given device."""
        # Validate the model path in detail
        if not os.path.exists(model_path):
            raise ValueError(f"Model path does not exist: {model_path}")
        if not os.path.isdir(model_path):
            raise ValueError(f"Model path is not a directory: {model_path}")
        # Check that the files the model needs are present
        required_files = ["config.json", "pytorch_model.bin"]  # basic model files
        missing_files = [f for f in required_files if not os.path.exists(os.path.join(model_path, f))]
        if missing_files:
            raise ValueError(f"Model path is missing required files: {missing_files}")
        self.processor = AutoImageProcessor.from_pretrained(model_path)
        self.model = AutoModelForImageClassification.from_pretrained(model_path)
        # Move the model to the target device
        self.model = self.model.to(device)
        self.device = device
        # Report where the model ended up, depending on the device type
        if device.type == "cuda":
            print(f"Model on GPU: {next(self.model.parameters()).is_cuda}")
        elif device.type == "npu":
            print(f"Model on NPU: {next(self.model.parameters()).device.type == 'npu'}")
        else:
            print(f"Model running on {device.type.upper()}")
        # Wrap in DataParallel only when several CUDA GPUs are present;
        # on NPU the model runs on a single device
        if device.type == "cuda" and torch.cuda.device_count() > 1:
            self.model = torch.nn.DataParallel(self.model)
        self.id2label = self.model.module.config.id2label if hasattr(self.model, 'module') else self.model.config.id2label

    def predict_single_image(self, image: Image.Image) -> dict:
        """Predict the class of a single PIL image."""
        try:
            # Preprocess
            inputs = self.processor(images=image, return_tensors="pt")
            # Move the inputs to the device
            inputs = inputs.to(self.device)
            # Run inference
            start_time = time.time()
            with torch.no_grad():
                # T1: latency of the first forward pass
                ts = time.time()
                outputs = self.model(**inputs)
                print('ascend T1', time.time() - ts, flush=True)
                # T2: total time for 1000 repeated forward passes (benchmark loop)
                ts = time.time()
                for _ in range(1000):
                    outputs = self.model(**inputs)
                print('ascend T2', time.time() - ts, flush=True)
            # Note: processing_time covers all 1001 forward passes above
            processing_time = time.time() - start_time
            # Extract the prediction (only the highest-confidence class)
            logits = outputs.logits
            probs = torch.nn.functional.softmax(logits, dim=1)
            top_probs, top_indices = probs.topk(1, dim=1)
            # Assemble the result
            class_idx = top_indices[0, 0].item()
            confidence = top_probs[0, 0].item()
            return {
                "class_id": class_idx,
                "class_name": self.id2label[class_idx],
                "confidence": confidence,
                "device_used": str(self.device),
                "processing_time": processing_time
            }
        except Exception as e:
            print(f"Error while processing image: {e}")
            return {
                "class_id": -1,
                "class_name": "error",
                "confidence": 0.0,
                "device_used": str(self.device),
                "processing_time": 0.0,
                "error": str(e)
            }


def check_ascend_available():
    """Check whether an Ascend NPU is available."""
    try:
        # Check that the Ascend Python package is installed
        import torch_npu  # noqa: F401
        # Check that an NPU device is usable
        if hasattr(torch, 'npu') and torch.npu.is_available():
            return True
    except ImportError:
        pass
    return False


def get_device():
    """Return the best available device."""
    # Check for an Ascend NPU first
    if check_ascend_available():
        print("Ascend NPU detected")
        return torch.device("npu:0")
    # Then check for an NVIDIA GPU
    elif torch.cuda.is_available():
        print("NVIDIA GPU detected")
        return torch.device("cuda:0")
    # Fall back to the CPU
    else:
        print("No accelerator detected, using CPU")
        return torch.device("cpu")


# Initialize the service
app = Flask(__name__)
MODEL_PATH = os.environ.get("MODEL_PATH", "/model")  # model path (environment variable or default)
# Pick the device and initialize the classifier
device = get_device()
classifier = ImageClassifier(MODEL_PATH, device)


@app.route('/v1/private/s782b4996', methods=['POST'])
def predict_single():
    """Accept a single image and return the prediction from the selected device."""
    if 'image' not in request.files:
        return jsonify({
            "prediction": {
                "class_id": -1,
                "class_name": "error",
                "confidence": 0.0,
                "device_used": str(device),
                "processing_time": 0.0,
                "error": "No image included in the request"
            },
            "status": "error"
        }), 400
    image_file = request.files['image']
    try:
        image = Image.open(BytesIO(image_file.read())).convert("RGB")
        # Run the prediction
        prediction_result = classifier.predict_single_image(image)
        # Build the response
        response = {
            "prediction": prediction_result,
            "status": "success"
        }
        return jsonify(response)
    except Exception as e:
        return jsonify({
            "prediction": {
                "class_id": -1,
                "class_name": "error",
                "confidence": 0.0,
                "device_used": str(device),
                "processing_time": 0.0,
                "error": str(e)
            },
            "status": "error"
        }), 500


@app.route('/health', methods=['GET'])
def health_check():
    return jsonify({
        "status": "healthy",
        "npu_available": device.type == "npu",
        "device_used": str(device),
        "cpu_threads": torch.get_num_threads()
    }), 200


if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80, debug=False)
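
For trying out the service, a client could look roughly like the sketch below. It uses only the standard library and posts an image as the `image` form field to the `/v1/private/s782b4996` route defined above. The helper names, the `http://localhost:80` base URL, and the generous timeout (chosen because each request runs the 1000-iteration benchmark loop) are assumptions, not part of the commit.

```python
import json
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field_name: str, filename: str, payload: bytes,
                    content_type: str = "image/jpeg") -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def classify_image(image_path: str, base_url: str = "http://localhost:80",
                   timeout: float = 600.0) -> dict:
    """POST an image file to the prediction route and return the parsed JSON response."""
    with open(image_path, "rb") as f:
        body, content_type = build_multipart("image", "upload.jpg", f.read())
    req = urllib.request.Request(
        base_url + "/v1/private/s782b4996",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for the 400/500 error responses
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `classify_image("026_0010.jpg")` would be expected to return a dictionary with `prediction` and `status` keys, matching the response built in `predict_single`.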