commit 17f7551cd93480b7c22cf40707642cfd7912d436
Author: ModelHub XC
Date:   Wed Jun 3 07:14:19 2026 +0800

    Initialize project; model provided by the ModelHub XC community

    Model: PhantomAjusshi/phi3-auditor-merged
    Source: Original Platform

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..a6344aa
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..2356048
--- /dev/null
+++ b/README.md
@@ -0,0 +1,278 @@
+---
+license: mit
+language:
+- en
+base_model:
+- microsoft/Phi-3-mini-4k-instruct
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- phi3
+- lora
+- peft
+- clinical-ai
+- model-audit
+- text-generation
+- fine-tuned
+- healthcare
+- safetensors
+---
+
+# 🏥 phi3-auditor-merged
+
+**Phi-3-mini fine-tuned for clinical AI model auditing.**
+
+This model takes a JSON object of ML performance metrics (AUC, ECE, drift, label shift, etc.) and returns a structured health classification label plus a detailed explanation, helping teams audit deployed clinical models for drift, calibration failure, class imbalance, and other issues.
+
+---
+
+## Model Details
+
+| Property | Value |
+|---|---|
+| **Base Model** | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
+| **Fine-tuning Method** | LoRA (Low-Rank Adaptation) via PEFT |
+| **Training Precision** | 8-bit quantized (BitsAndBytesConfig) |
+| **Merged Precision** | FP16 (float16 safetensors) |
+| **Parameters** | ~3.8B |
+| **Model Size** | 7.65 GB (2 safetensors shards) |
+| **LoRA Rank (r)** | 16 |
+| **LoRA Alpha** | 32 |
+| **LoRA Dropout** | 0.05 |
+| **Target Modules** | `q_proj`, `k_proj`, `v_proj`, `o_proj` |
+| **Task Type** | Causal Language Modeling |
+| **PEFT Version** | 0.18.0 |
+| **Training Epochs** | 3 |
+| **Final Loss** | ~0.41 |
+
+---
+
+## Intended Use
+
+### What this model does
+
+Given a JSON report of clinical ML model performance metrics, the model:
+
+1. Assigns a **Category** label (e.g. `Calibration Failure`, `Major Drift`, `Class Imbalance Problem`, `Healthy`)
+2. Generates a concise **Explanation** with observations and recommendations
+
+### Intended users
+
+- ML engineers monitoring deployed clinical models
+- Healthcare data science teams running periodic model audits
+- Researchers studying automated model health assessment
+
+### Out-of-scope use
+
+- Not suitable for direct clinical decision-making or patient diagnosis
+- Not a replacement for domain expert review of model performance
+- Not designed for non-clinical ML tasks
+- Should not be used on data types outside its training distribution (non-tabular metrics, images, etc.)
+
+---
+
+## How to Use
+
+### Requirements
+
+```bash
+pip install transformers torch accelerate
+```
+
+### Basic inference
+
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+model_id = "PhantomAjusshi/phi3-auditor-merged"
+
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+    device_map="auto",
+    trust_remote_code=True,  # Required for custom Phi-3 modeling files
+)
+
+report = """{
+  "auc": 0.863,
+  "accuracy": 0.83,
+  "precision": 0.79,
+  "recall": 0.69,
+  "f1": 0.79,
+  "ece": 0.278,
+  "brier": 0.263,
+  "drift": 0.03,
+  "missing_rate": 0.003,
+  "label_shift": 0.06,
+  "pos_rate": 0.10,
+  "data_integrity_issues": 0
+}"""
+
+prompt = (
+    f"<|system|>\nYou are a clinical AI auditor model.\n"
+    f"<|user|>\nInstruction: Analyze the clinical model report and classify its health.\n\nReport:\n{report}\n"
+    f"<|assistant|>\n"
+)
+
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+with torch.inference_mode():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=400,
+        temperature=0.7,
+        top_p=0.9,
+        repetition_penalty=1.2,
+        do_sample=True,
+    )
+
+# Decode only the new tokens: skip_special_tokens=True strips "<|assistant|>", so splitting on it fails
+prompt_len = inputs["input_ids"].shape[-1]
+reply = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()
+print(reply)
+```
+
+### Expected output format
+
+```
+Category: Calibration Failure
+Explanation: High calibration error (ECE 0.278) despite reasonable discrimination (AUC 0.863).
+The model's probability outputs are poorly aligned with actual outcomes. Recommend
+recalibration using Platt scaling or isotonic regression, and threshold review.
+```
+
+### Input metrics reference
+
+| Metric | Description |
+|---|---|
+| `auc` | Area Under the ROC Curve |
+| `accuracy` | Overall classification accuracy |
+| `precision` | Positive predictive value |
+| `recall` | Sensitivity / true positive rate |
+| `f1` | Harmonic mean of precision and recall |
+| `ece` | Expected Calibration Error |
+| `brier` | Brier score (probabilistic accuracy) |
+| `drift` | Feature distribution drift score |
+| `missing_rate` | Rate of missing input features |
+| `label_shift` | Output label distribution shift |
+| `pos_rate` | Positive prediction rate |
+| `data_integrity_issues` | Count of detected data quality issues |
+
+---
+
+## Training Details
+
+### Dataset
+
+- **Name:** Custom synthetic clinical audit dataset (`audit_dataset_v2_5000.json`)
+- **Size:** 5,000 labeled samples
+- **Split:** 80% train (4,000) / 20% test (1,000)
+- **Format:** JSONL; each record has `instruction`, `input` (metrics JSON), `output` (category + explanation)
+- **Generation date:** November 17, 2025
+
+Each sample pairs a set of synthetic model performance metrics with a human-written audit label and explanation covering categories such as:
+- Healthy / Passing
+- Calibration Failure
+- Major Drift / Potential Drift
+- Class Imbalance Problem
+- Data Integrity Issue
+- Needs Review / Critical Failure
+
+### Training procedure
+
+The base model was loaded in 8-bit using `BitsAndBytesConfig` and adapted with LoRA targeting the attention projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`). After training, the LoRA adapter was merged into the base model weights using the PEFT model's `merge_and_unload()` method and saved as full FP16 safetensors.
+
+**Prompt format used during training:**
+
+```
+<|system|>
+You are an AI auditor analyzing clinical model performance reports.
+<|user|>
+Instruction: Analyze the clinical model report and classify its health.
+
+Report:
+{ ...metrics JSON... }
+<|assistant|>
+Category: