Initialize project; model provided by the ModelHub XC community

Model: PhantomAjusshi/phi3-auditor-merged Source: Original Platform
2026-06-03 07:14:19 +08:00
commit 17f7551cd9
15 changed files with 279754 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,278 @@
+---
+license: mit
+language:
+- en
+base_model:
+- microsoft/Phi-3-mini-4k-instruct
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- phi3
+- lora
+- peft
+- clinical-ai
+- model-audit
+- text-generation
+- fine-tuned
+- healthcare
+- safetensors
+---
+
+# 🏥 phi3-auditor-merged
+
+**Phi-3-mini fine-tuned for clinical AI model auditing.**
+
+This model takes a JSON object of ML performance metrics (AUC, ECE, drift, label shift, etc.) and returns a structured health classification label plus a detailed explanation — helping teams audit deployed clinical models for drift, calibration failure, class imbalance, and other issues.
+
+---
+
+## Model Details
+
+| Property | Value |
+|---|---|
+| **Base Model** | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
+| **Fine-tuning Method** | LoRA (Low-Rank Adaptation) via PEFT |
+| **Training Precision** | 8-bit quantized (BitsAndBytesConfig) |
+| **Merged Precision** | FP16 (float16 safetensors) |
+| **Parameters** | ~3.8B |
+| **Model Size** | 7.65 GB (2 safetensors shards) |
+| **LoRA Rank (r)** | 16 |
+| **LoRA Alpha** | 32 |
+| **LoRA Dropout** | 0.05 |
+| **Target Modules** | `q_proj`, `k_proj`, `v_proj`, `o_proj` |
+| **Task Type** | Causal Language Modeling |
+| **PEFT Version** | 0.18.0 |
+| **Training Epochs** | 3 |
+| **Final Loss** | ~0.41 |
+
+---
+
+## Intended Use
+
+### What this model does
+
+Given a JSON report of clinical ML model performance metrics, the model:
+
+1. Assigns a **Category** label (e.g. `Calibration Failure`, `Major Drift`, `Class Imbalance Problem`, `Healthy`)
+2. Generates a concise **Explanation** with observations and recommendations
+
+### Intended users
+
+- ML engineers monitoring deployed clinical models
+- Healthcare data science teams running periodic model audits
+- Researchers studying automated model health assessment
+
+### Out-of-scope use
+
+- Not suitable for direct clinical decision-making or patient diagnosis
+- Not a replacement for domain expert review of model performance
+- Not designed for non-clinical ML tasks
+- Should not be used on data types outside its training distribution (non-tabular metrics, images, etc.)
+
+---
+
+## How to Use
+
+### Requirements
+
+```bash
+pip install transformers torch accelerate
+```
+
+### Basic inference
+
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+model_id = "PhantomAjusshi/phi3-auditor-merged"
+
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+    device_map="auto",
+    trust_remote_code=True,  # Required for custom Phi-3 modeling files
+)
+
+report = """{
+  "auc": 0.863,
+  "accuracy": 0.83,
+  "precision": 0.79,
+  "recall": 0.69,
+  "f1": 0.74,
+  "ece": 0.278,
+  "brier": 0.263,
+  "drift": 0.03,
+  "missing_rate": 0.003,
+  "label_shift": 0.06,
+  "pos_rate": 0.10,
+  "data_integrity_issues": 0
+}"""
+
+prompt = (
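+    # System prompt matches the one used during training (see Training Details).
+    f"<|system|>\nYou are an AI auditor analyzing clinical model performance reports.\n"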
+    f"<|user|>\nInstruction: Analyze the clinical model report and classify its health.\n\nReport:\n{report}\n"
+    f"<|assistant|>\n"
+)
+
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+with torch.inference_mode():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=400,
+        temperature=0.7,
+        top_p=0.9,
+        repetition_penalty=1.2,
+        do_sample=True,
+    )
+
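+# Decode only the newly generated tokens. skip_special_tokens=True can remove
+# the <|assistant|> marker, so splitting the full decoded text on it is unreliable.
+prompt_length = inputs["input_ids"].shape[1]
+reply = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
+print(reply)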
+```
+
+### Expected output format
+
+```
+Category: Calibration Failure
+Explanation: High calibration error (ECE 0.278) despite reasonable discrimination (AUC 0.863).
+The model's probability outputs are poorly aligned with actual outcomes. Recommend
+recalibration using Platt scaling or isotonic regression, and threshold review.
+```
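+
+Because the reply follows a fixed `Category:` / `Explanation:` layout, it can be split into fields with a small parser. The sketch below is illustrative only: `parse_reply` is a hypothetical helper that is not part of this repository, and it assumes the model keeps to the layout shown above.
+
+```python
+import re
+
+
+def parse_reply(reply: str) -> dict:
+    """Split a reply into its category label and explanation text.
+
+    Hypothetical helper: returns None for a field that is not found.
+    """
+    category = re.search(r"^Category:\s*(.+)$", reply, flags=re.MULTILINE)
+    explanation = re.search(
+        r"^Explanation:\s*(.+)", reply, flags=re.MULTILINE | re.DOTALL
+    )
+    return {
+        "category": category.group(1).strip() if category else None,
+        # Collapse line breaks so a wrapped explanation becomes one string.
+        "explanation": " ".join(explanation.group(1).split()) if explanation else None,
+    }
+
+
+sample = (
+    "Category: Calibration Failure\n"
+    "Explanation: High calibration error (ECE 0.278) despite reasonable "
+    "discrimination (AUC 0.863).\n"
+    "Recommend recalibration."
+)
+parsed = parse_reply(sample)
+print(parsed["category"])
+print(parsed["explanation"])
+```
+
+Returning `None` for a missing field lets a caller flag malformed generations without guessing a label.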
+
+### Input metrics reference
+
+| Metric | Description |
+|---|---|
+| `auc` | Area Under the ROC Curve |
+| `accuracy` | Overall classification accuracy |
+| `precision` | Positive predictive value |
+| `recall` | Sensitivity / true positive rate |
+| `f1` | Harmonic mean of precision and recall |
+| `ece` | Expected Calibration Error |
+| `brier` | Brier score (probabilistic accuracy) |
+| `drift` | Feature distribution drift score |
+| `missing_rate` | Rate of missing input features |
+| `label_shift` | Output label distribution shift |
+| `pos_rate` | Positive prediction rate |
+| `data_integrity_issues` | Count of detected data quality issues |
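+
+A report can be assembled from a plain dictionary before it is placed in the prompt. This is a minimal sketch: `REQUIRED_METRICS` and `build_report` are hypothetical names, and treating all twelve keys as mandatory is an assumption, since this card does not say how the model behaves when a metric is absent.
+
+```python
+import json
+
+# Keys taken from the metrics reference table, in the same order.
+REQUIRED_METRICS = (
+    "auc", "accuracy", "precision", "recall", "f1", "ece", "brier",
+    "drift", "missing_rate", "label_shift", "pos_rate", "data_integrity_issues",
+)
+
+
+def build_report(metrics: dict) -> str:
+    """Serialize metrics to JSON, failing early if a key is missing (assumed policy)."""
+    missing = [key for key in REQUIRED_METRICS if key not in metrics]
+    if missing:
+        raise ValueError(f"missing metrics: {missing}")
+    ordered = {key: metrics[key] for key in REQUIRED_METRICS}
+    return json.dumps(ordered, indent=2)
+
+
+example = {
+    "auc": 0.863, "accuracy": 0.83, "precision": 0.79, "recall": 0.69,
+    "f1": 0.74, "ece": 0.278, "brier": 0.263, "drift": 0.03,
+    "missing_rate": 0.003, "label_shift": 0.06, "pos_rate": 0.10,
+    "data_integrity_issues": 0,
+}
+report = build_report(example)
+print(report)
+```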
+
+---
+
+## Training Details
+
+### Dataset
+
+- **Name:** Custom synthetic clinical audit dataset (`audit_dataset_v2_5000.json`)
+- **Size:** 5,000 labeled samples
+- **Split:** 80% train (4,000) / 20% test (1,000)
+- **Format:** JSONL — each record has `instruction`, `input` (metrics JSON), `output` (category + explanation)
+- **Generation date:** November 17, 2025
+
+Each sample pairs a set of synthetic model performance metrics with a human-written audit label and explanation covering categories such as:
+- Healthy / Passing
+- Calibration Failure
+- Major Drift / Potential Drift
+- Class Imbalance Problem
+- Data Integrity Issue
+- Needs Review / Critical Failure
+
+### Training procedure
+
+The base model was loaded in 8-bit using `BitsAndBytesConfig` and adapted with LoRA targeting the attention projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`). After training, the LoRA adapter was merged into the base model weights using PEFT's `merge_and_unload()` method and saved as full FP16 safetensors.
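+
+The adapter settings described above could be expressed roughly as follows. This is a sketch under stated assumptions, not the repository's training script: the values are copied from the Model Details table, while `LORA_KWARGS`, `make_lora_config`, and `make_quant_config` are hypothetical names. The `peft` and `transformers` imports are deferred so that the constants can be inspected without those packages installed.
+
+```python
+# Values copied from the Model Details table.
+LORA_KWARGS = {
+    "r": 16,
+    "lora_alpha": 32,
+    "lora_dropout": 0.05,
+    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
+    "task_type": "CAUSAL_LM",
+}
+
+
+def make_lora_config():
+    """Build a peft.LoraConfig from the documented values (requires peft)."""
+    from peft import LoraConfig
+
+    return LoraConfig(**LORA_KWARGS)
+
+
+def make_quant_config():
+    """8-bit loading config for the base model (requires transformers and bitsandbytes)."""
+    from transformers import BitsAndBytesConfig
+
+    return BitsAndBytesConfig(load_in_8bit=True)
+
+
+# Standard LoRA scales the low-rank update by lora_alpha / r.
+scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]
+print(scaling)
+```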
+
+**Prompt format used during training:**
+
+```
+<|system|>
+You are an AI auditor analyzing clinical model performance reports.
+<|user|>
+Instruction: Analyze the clinical model report and classify its health.
+
+Report:
+{ ...metrics JSON... }
+<|assistant|>
+Category: <label>
+Explanation: <explanation>
+```
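+
+At inference time the same template can be filled programmatically so that prompts stay aligned with the training format. The helper below is a hypothetical sketch: `build_prompt` is not part of this repository, and the JSON indentation is an assumption because the exact whitespace used in the training records is not documented here.
+
+```python
+import json
+
+SYSTEM_PROMPT = "You are an AI auditor analyzing clinical model performance reports."
+INSTRUCTION = "Analyze the clinical model report and classify its health."
+
+
+def build_prompt(metrics: dict) -> str:
+    """Render the training-time template up to the assistant turn."""
+    report = json.dumps(metrics, indent=2)  # indentation is an assumption
+    return (
+        f"<|system|>\n{SYSTEM_PROMPT}\n"
+        f"<|user|>\nInstruction: {INSTRUCTION}\n\nReport:\n{report}\n"
+        f"<|assistant|>\n"
+    )
+
+
+prompt = build_prompt({"auc": 0.863, "ece": 0.278, "drift": 0.03})
+print(prompt)
+```
+
+The prompt ends right after `<|assistant|>` so that generation starts at the `Category:` line.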
+
+### Hyperparameters
+
+| Parameter | Value |
+|---|---|
+| Epochs | 3 |
+| Batch size | 4 |
+| Gradient accumulation steps | 4 |
+| Effective batch size | 16 |
+| Learning rate | 1e-4 |
+| Warmup ratio | 0.1 |
+| Max sequence length | 512 |
+| Optimizer | AdamW (default) |
+| Precision | FP16 (mixed) |
+
+### Training loss
+
+| Step | Epoch | Loss |
+|---|---|---|
+| 50 | 0.22 | 1.623 |
+| 100 | 0.44 | 0.657 |
+| 150 | 0.67 | 0.444 |
+| 200 | 0.89 | 0.420 |
+| 300 | 1.33 | 0.413 |
+| 450 | 2.00 | 0.412 |
+| 600 | 2.67 | 0.408 |
+| 675 | 3.00 | ~0.410 |
+
+Loss fell sharply over the first 150 steps (from 1.623 to 0.444) and then plateaued near 0.41 for the remainder of training.
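+
+The step and epoch columns can be cross-checked against the hyperparameters with a little arithmetic. The sketch below assumes single-device training, which this card does not state. The logged schedule (675 steps over 3 epochs) works out to 225 optimizer steps per epoch, or 3,600 samples at an effective batch size of 16, whereas the full 4,000-sample training split would give 250 steps per epoch. One possible explanation, offered only as an assumption, is that part of the training split was held out for validation.
+
+```python
+import math
+
+# Values from the Dataset and Hyperparameters sections and the loss table.
+batch_size = 4
+grad_accum_steps = 4
+epochs = 3
+train_samples = 4000
+logged_final_step = 675
+
+effective_batch = batch_size * grad_accum_steps
+expected_steps_per_epoch = math.ceil(train_samples / effective_batch)
+logged_steps_per_epoch = logged_final_step // epochs
+logged_samples_per_epoch = logged_steps_per_epoch * effective_batch
+
+print(effective_batch)           # 16
+print(expected_steps_per_epoch)  # 250
+print(logged_steps_per_epoch)    # 225
+print(logged_samples_per_epoch)  # 3600
+```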
+
+---
+
+## Evaluation
+
+The model was evaluated on a held-out test set of 1,000 samples. The `Category:` field was extracted from each generated output and compared with the ground-truth label to compute weighted precision, recall, F1, and accuracy.
+
+> Formal evaluation metrics will be added here once a full benchmark run is completed.
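+
+The scoring step described above can be sketched in plain Python. This is an illustrative implementation of support-weighted precision, recall, and F1, not the repository's evaluation script; `weighted_scores` is a hypothetical name, and the labels in the example are made up for demonstration and are not results.
+
+```python
+from collections import Counter
+
+
+def weighted_scores(y_true, y_pred):
+    """Return accuracy plus precision, recall, and F1 weighted by class support."""
+    support = Counter(y_true)
+    total = len(y_true)
+    precision = recall = f1 = 0.0
+    for label, count in support.items():
+        true_pos = sum(t == label and p == label for t, p in zip(y_true, y_pred))
+        predicted = sum(p == label for p in y_pred)
+        p_label = true_pos / predicted if predicted else 0.0
+        r_label = true_pos / count
+        denom = p_label + r_label
+        f_label = 2 * p_label * r_label / denom if denom else 0.0
+        weight = count / total  # share of test samples with this true label
+        precision += weight * p_label
+        recall += weight * r_label
+        f1 += weight * f_label
+    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
+    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
+
+
+# Toy labels for demonstration only.
+y_true = ["Healthy", "Healthy", "Major Drift", "Calibration Failure"]
+y_pred = ["Healthy", "Major Drift", "Major Drift", "Calibration Failure"]
+scores = weighted_scores(y_true, y_pred)
+print(scores)
+```
+
+Generations from which no `Category:` field can be extracted would need a policy of their own, for example counting them as incorrect; this card does not specify one.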
+
+---
+
+## Limitations & Bias
+
+- **Synthetic training data:** The model was trained entirely on synthetically generated audit reports. Real-world clinical model metrics may follow different distributions or contain edge cases not represented in training.
+- **Label sensitivity:** The model may be sensitive to metric combinations near decision boundaries between categories.
+- **No temporal reasoning:** The model does not reason about metric trends over time — each inference is based on a single snapshot of metrics.
+- **English only:** All training data is in English.
+- **Not a substitute for expert review:** Outputs should be treated as decision-support, not a final audit verdict.
+
+---
+
+## Repository & Related Work
+
+- **Training code:** [Hospital-Audit-Trained-Model (GitHub)](https://github.com/PhantomAjusshi/Hospital-Audit-Trained-Model)
+- **Web application:** [Hospital-Model-Audit-Website (GitHub)](https://github.com/PhantomAjusshi/Hospital-Model-Audit-Website) — a full-stack Next.js + FastAPI interface that uses this model via llama.cpp
+
+---
+
+## Citation
+
+If you use this model in your work, please cite:
+
+```bibtex
+@misc{phi3-auditor-merged,
+  author       = {PhantomAjusshi},
+  title        = {phi3-auditor-merged: Phi-3-mini fine-tuned for clinical AI model auditing},
+  year         = {2025},
+  publisher    = {Hugging Face},
+  url          = {https://huggingface.co/PhantomAjusshi/phi3-auditor-merged}
+}
+```
+
+---
+
+## License
+
+This model is released under the **MIT License**.
+
+The base model ([microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)) is distributed by Microsoft under the MIT License. Review its model card and license terms before use in commercial or production settings.