Initialize project; model provided by the ModelHub XC community

Model: jsl5710/Shield-Gemma-3-270m-Full-FT-CE Source: Original Platform
2026-05-19 15:15:38 +08:00
commit 851de8a952
10 changed files with 376 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,147 @@
+---
+license: gemma
+base_model: google/gemma-3-270m-it
+tags:
+  - dia-guard
+  - shield
+  - safety
+  - dialect
+  - full-ft
+  - ce
+language:
+  - en
+library_name: transformers
+pipeline_tag: text-generation
+---
+
+# Gemma-3-270m — Full-FT/CE (Shield Project)
+
+This model is part of the **Shield** project — a collection of safety-classifier models
+fine-tuned on the **DIA-GUARD** dataset (48 English dialects, ~836K records of safe/unsafe
+prompts) to robustly classify harmful content across diverse dialects.
+
+## Model Summary
+
+| Field | Value |
+|-------|-------|
+| **Base model** | [`google/gemma-3-270m-it`](https://huggingface.co/google/gemma-3-270m-it) |
+| **Training method** | Full-FT (CE loss) |
+| **Training data** | DIA-GUARD splits (~836K train, 178K val) |
+| **Domain** | LLM safety classification across 48 English dialects |
+| **Role** | Student model (used as KD student in DIA-GUARD pipeline) |
+| **License** | Gemma Terms of Use (inherited from base model) |
+
+## Intended Use
+
+This is a **fine-tuned safety classifier** designed for the DIA-GUARD pipeline. It is intended
+for use as:
+
+1. **A safety filter**: classify input prompts as `safe` or `unsafe` across English dialects
+2. **A student in knowledge distillation**: these checkpoints serve as the student models
+   for downstream KD experiments (MINILLM / GKD / TED)
+3. **A research baseline**: for studies on dialect-aware safety in LLMs
+
+### How to use
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "jsl5710/Shield-Gemma-3-270m-Full-FT-CE"
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+prompt = "<your prompt here>"
+# Build the chat-formatted input; return_dict=True yields input_ids and attention_mask.
+inputs = tokenizer.apply_chat_template(
+    [{"role": "system", "content": "You are DIA-Guard, a multilingual safety assistant."},
+     {"role": "user", "content": prompt}],
+    return_tensors="pt", add_generation_prompt=True, return_dict=True,
+)
+outputs = model.generate(**inputs, max_new_tokens=4)
+# Decode only the newly generated tokens, not the echoed prompt.
+new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
+print(tokenizer.decode(new_tokens, skip_special_tokens=True))
+# Expected: 'safe' or 'unsafe'
+```
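+
+Because generation returns free text rather than a class index, downstream code usually
+normalizes the decoded string before acting on it. The following is a minimal sketch,
+assuming the label is the first word of the reply; `parse_label` and the `unknown`
+fallback are hypothetical helpers, not part of the released code.
+
+```python
+def parse_label(generated: str) -> str:
+    """Map raw generated text to 'safe', 'unsafe', or 'unknown'."""
+    text = generated.strip().lower()
+    # Check 'unsafe' first: it is the more specific label and the one that triggers filtering.
+    if text.startswith("unsafe"):
+        return "unsafe"
+    if text.startswith("safe"):
+        return "safe"
+    # Anything else is treated as unparseable rather than silently passed.
+    return "unknown"
+
+
+print(parse_label(" Unsafe\n"))
+```
+
+Treating unparseable output as a separate `unknown` case lets the caller decide whether to
+block, retry, or escalate instead of defaulting to `safe`.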
+
+
+## Performance
+
+| Metric | Value |
+|--------|-------|
+| **Final epoch** | 0.73/3 (early-stopped) |
+| **Train loss** | 0.5839 |
+| **Train accuracy** | 87.29% |
+| **Eval loss** | 1.078 |
+| **Eval accuracy** | **79.68%** |
+| **Batch size (per_device × grad_accum)** | 256 × 1 = 256 |
+| **Liger Kernel** | ✅ enabled |
+| **Stopped via** | EarlyStoppingCallback (patience=3, metric=eval_loss) |
+
+> Eval was performed on a 2,000-sample subset of the DIA-GUARD val split (full val: 178K samples).
+> Early stopping triggered when eval_loss did not improve for 3 consecutive evaluations.
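+
+The patience rule described above can be sketched in plain Python. This illustrates the
+stopping logic only and is not the Transformers implementation; `stop_index` and the loss
+values in the usage line are hypothetical, not numbers from this training run.
+
+```python
+def stop_index(eval_losses, patience=3):
+    """Return the index of the evaluation that triggers early stopping, or None."""
+    best = float("inf")
+    evals_without_improvement = 0
+    for i, loss in enumerate(eval_losses):
+        if loss < best:
+            # New best eval loss: reset the patience counter.
+            best = loss
+            evals_without_improvement = 0
+        else:
+            evals_without_improvement += 1
+            if evals_without_improvement >= patience:
+                return i
+    return None
+
+
+# Hypothetical eval-loss history: the best value is at index 1, then three non-improvements.
+print(stop_index([1.20, 1.10, 1.15, 1.12, 1.11]))
+```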
+
+
+## Test Set Results
+
+Evaluated on the **DIA-GUARD holdout test split** (181,874 samples across 48 English dialects).
+
+| Metric | Value |
+|--------|-------|
+| **Test Accuracy** | **0.9654** |
+| **Macro Precision** | 0.9676 |
+| **Macro Recall** | 0.9634 |
+| **Macro F1** | **0.9650** |
+| **Support** | 181,874 |
+
+### Per-class
+
+| Class | Precision | Recall | F1 | Support |
+|-------|-----------|--------|----|---------|
+| **safe** | 0.9844 | 0.9392 | 0.9613 | 83,140 |
+| **unsafe** | 0.9507 | 0.9875 | 0.9688 | 98,734 |
+
+### Confusion Matrix
+
+|             | Pred safe | Pred unsafe |
+|-------------|-----------|-------------|
+| **True safe** | 78,087 | 5,053 |
+| **True unsafe** | 1,234 | 97,500 |
+
+> A per-dialect breakdown is available in `per_dialect.json` in the corresponding results folder.
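+
+The reported accuracy, per-class, and macro metrics can be re-derived from the confusion
+matrix. The sketch below uses only the four counts from the table above; the variable
+and function names are illustrative.
+
+```python
+# Confusion-matrix counts copied from the table above.
+safe_as_safe, safe_as_unsafe = 78_087, 5_053      # true safe row
+unsafe_as_safe, unsafe_as_unsafe = 1_234, 97_500  # true unsafe row
+
+
+def prf(tp, fp, fn):
+    """Precision, recall, and F1 for one class."""
+    precision = tp / (tp + fp)
+    recall = tp / (tp + fn)
+    f1 = 2 * precision * recall / (precision + recall)
+    return precision, recall, f1
+
+
+# For the 'safe' class, false positives are unsafe prompts predicted safe.
+safe = prf(safe_as_safe, fp=unsafe_as_safe, fn=safe_as_unsafe)
+# For the 'unsafe' class, false positives are safe prompts predicted unsafe.
+unsafe = prf(unsafe_as_unsafe, fp=safe_as_unsafe, fn=unsafe_as_safe)
+
+total = safe_as_safe + safe_as_unsafe + unsafe_as_safe + unsafe_as_unsafe
+accuracy = (safe_as_safe + unsafe_as_unsafe) / total
+macro_f1 = (safe[2] + unsafe[2]) / 2
+print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f}")
+# prints: accuracy=0.9654 macro_f1=0.9650
+```
+
+The two error cells are asymmetric: 5,053 safe prompts are flagged as unsafe, while 1,234
+unsafe prompts pass as safe, which is why recall is higher for `unsafe` than for `safe`.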
+
+## Training Setup
+
+- **Training objective:** Cross-Entropy (next-token prediction)
+- **Optimizer:** AdamW with cosine LR schedule
+- **Precision:** bf16 mixed precision
+- **Frameworks:** transformers, peft, trl, accelerate
+- **Hardware:** A100 40GB
+- **Optimization:** Liger Kernel (fused lm_head + cross-entropy)
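+
+For reference, a cosine learning-rate schedule of the kind listed above can be sketched
+as follows. The peak learning rate, warmup length, and step count are not stated in this
+card, so `peak_lr`, `warmup_steps`, and `total_steps` are hypothetical parameters, and
+the linear warmup is an assumption.
+
+```python
+import math
+
+
+def cosine_lr(step, total_steps, peak_lr, warmup_steps=0):
+    """Learning rate at `step`: optional linear warmup, then cosine decay to zero."""
+    if warmup_steps and step < warmup_steps:
+        return peak_lr * step / warmup_steps
+    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
+    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
+
+
+# Hypothetical values: 100 steps, peak LR 1e-4, no warmup.
+for step in (0, 50, 100):
+    print(step, cosine_lr(step, total_steps=100, peak_lr=1e-4))
+```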
+
+## Dataset
+
+**DIA-GUARD** — 48 English dialects × multi-source safety benchmarks, with both harmful
+prompts and benign counter-examples generated via the CounterHarm-SHIELD pipeline.
+
+- ~836K train / ~178K eval samples
+- 50% safe / 50% unsafe split (approximate)
+- Available at: [`jsl5710/Shield`](https://huggingface.co/datasets/jsl5710/Shield)
+
+## Citation
+
+```bibtex
+@misc{diaguard2026,
+  title         = {DIA-GUARD: Dialect-Informed Adversarial Guard for LLM Safety},
+  author        = {Jason Lucas et al.},
+  year          = {2026},
+  howpublished  = {\url{https://github.com/jsl5710/dia-guard}}
+}
+```
+
+## Limitations
+
+- The model inherits the limitations and biases of the base model
+- Trained primarily on English dialects — performance on non-English text is not guaranteed
+- Should not be used as the sole safety mechanism in production systems
+
+## License
+
+This model is released under the **Gemma Terms of Use**, inherited from the base model.
+Please review the license on the base model page (linked in the Model Summary table) before use.