Initialize project; model provided by the ModelHub XC community

Model: jsl5710/Shield-Gemma-3-270m-Full-FT-CE
Source: Original Platform
2026-05-19 15:15:38 +08:00
commit 851de8a952
10 changed files with 376 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,36 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,147 @@
+---
+license: gemma
+base_model: google/gemma-3-270m-it
+tags:
+  - dia-guard
+  - shield
+  - safety
+  - dialect
+  - full-ft
+  - ce
+language:
+  - en
+library_name: transformers
+pipeline_tag: text-generation
+---
+
+# Gemma-3-270m — Full-FT/CE (Shield Project)
+
+This model is part of the **Shield** project — a collection of safety-classifier models
+fine-tuned on the **DIA-GUARD** dataset (48 English dialects, ~836K training records of safe/unsafe
+prompts) to robustly classify harmful content across diverse dialects.
+
+## Model Summary
+
+| Field | Value |
+|-------|-------|
+| **Base model** | [`google/gemma-3-270m-it`](https://huggingface.co/google/gemma-3-270m-it) |
+| **Training method** | Full-FT (CE loss) |
+| **Training data** | DIA-GUARD splits (~836K train, 178K val) |
+| **Domain** | LLM safety classification across 48 English dialects |
+| **Role** | Student model (used as KD student in DIA-GUARD pipeline) |
+| **License** | Gemma Terms of Use (inherited from base model) |
+
+## Intended Use
+
+This is a **fine-tuned safety classifier** designed for the DIA-GUARD pipeline. It is intended
+for use as:
+
+1. **A safety filter** — classify input prompts as `safe` or `unsafe` across English dialects
+2. **A student in knowledge distillation** — these checkpoints serve as the
+   student models for downstream KD experiments (MINILLM / GKD / TED)
+3. **A research baseline** — for studies on dialect-aware safety in LLMs
+
+### How to use
+
+```python
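+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "jsl5710/Shield-Gemma-3-270m-Full-FT-CE"
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+prompt = "<your prompt here>"
+# return_dict=True yields input_ids plus attention_mask, ready to unpack into generate().
+inputs = tokenizer.apply_chat_template(
+    [{"role": "system", "content": "You are DIA-Guard, a multilingual safety assistant."},
+     {"role": "user", "content": prompt}],
+    return_tensors="pt", return_dict=True, add_generation_prompt=True,
+)
+# Greedy decoding keeps the label deterministic (generation_config.json enables sampling).
+outputs = model.generate(**inputs, max_new_tokens=4, do_sample=False)
+# Decode only the newly generated tokens, not the echoed prompt.
+new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
+print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
+# Expected: 'safe' or 'unsafe'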
+```
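Reviewer note: the decoded text still has to be mapped to a label before it can drive a filter. A minimal sketch of that step, assuming the label is the first word on the last non-empty line of the decoded output. `parse_label` is a hypothetical helper, not part of this repository or of `transformers`.

```python
from typing import Optional


def parse_label(decoded: str) -> Optional[str]:
    """Map decoded model output to 'safe', 'unsafe', or None if neither is found."""
    lines = [ln.strip().lower() for ln in decoded.splitlines() if ln.strip()]
    if not lines:
        return None
    last = lines[-1]  # the generated label comes after any echoed prompt text
    # Prefix match tolerates trailing punctuation or extra tokens after the label.
    if last.startswith("unsafe"):
        return "unsafe"
    if last.startswith("safe"):
        return "safe"
    return None  # caller decides how to treat an unparseable answer
```

Returning `None` instead of defaulting to `safe` leaves the fail-open or fail-closed decision to the calling code.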
+
+
+## Performance
+
+| Metric | Value |
+|--------|-------|
+| **Final epoch** | 0.73/3 (early-stopped) |
+| **Train loss** | 0.5839 |
+| **Train accuracy** | 87.29% |
+| **Eval loss** | 1.078 |
+| **Eval accuracy** | **79.68%** |
+| **Batch size (per_device × grad_accum)** | 256 × 1 = 256 |
+| **Liger Kernel** | ✅ enabled |
+| **Stopped via** | EarlyStoppingCallback (patience=3, metric=eval_loss) |
+
+> Eval was performed on a 2,000-sample subset of the DIA-GUARD val split (full val: 178K samples).
+> Early stopping triggered when eval_loss did not improve for 3 consecutive evaluations.
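Reviewer note: the stopping rule described above (patience of 3 on `eval_loss`, threshold 0.0) can be illustrated with a small standalone sketch. This is a simplified re-implementation for illustration only, not the `transformers` `EarlyStoppingCallback` itself, and the loss values are made up.

```python
def early_stop_index(eval_losses, patience=3, threshold=0.0):
    """Return the index of the evaluation at which training would stop, or None."""
    best = None
    bad_evals = 0
    for i, loss in enumerate(eval_losses):
        if best is None or loss < best - threshold:
            best = loss          # new best eval loss: reset the counter
            bad_evals = 0
        else:
            bad_evals += 1       # no improvement at this evaluation
            if bad_evals >= patience:
                return i
    return None


# Hypothetical eval_loss values, one per evaluation (every 200 steps in this run's config).
losses = [1.20, 1.10, 1.05, 1.08, 1.07, 1.09, 1.00]
print(early_stop_index(losses))  # prints 5
```

In this example the later improvement to 1.00 is never seen, because the run halts after the third consecutive evaluation without a new best.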
+
+
+## Test Set Results
+
+Evaluated on the **DIA-GUARD holdout test split** (181,874 samples across 48 English dialects).
+
+| Metric | Value |
+|--------|-------|
+| **Test Accuracy** | **0.9654** |
+| **Macro Precision** | 0.9676 |
+| **Macro Recall** | 0.9634 |
+| **Macro F1** | **0.9650** |
+| **Support** | 181,874 |
+
+### Per-class
+
+| Class | Precision | Recall | F1 | Support |
+|-------|-----------|--------|----|---------|
+| **safe** | 0.9844 | 0.9392 | 0.9613 | 83,140 |
+| **unsafe** | 0.9507 | 0.9875 | 0.9688 | 98,734 |
+
+### Confusion Matrix
+
+|             | Pred safe | Pred unsafe |
+|-------------|-----------|-------------|
+| **True safe** | 78,087 | 5,053 |
+| **True unsafe** | 1,234 | 97,500 |
+
+> Per-dialect breakdown available in `per_dialect.json` in the corresponding results folder.
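Reviewer note: the accuracy, per-class, and macro figures in this section can be re-derived from the confusion matrix alone. The sketch below does that arithmetic using only the four counts from the table. The variable names are illustrative.

```python
# Confusion-matrix counts copied from the README table.
safe_as_safe, safe_as_unsafe = 78_087, 5_053      # row: true safe
unsafe_as_safe, unsafe_as_unsafe = 1_234, 97_500  # row: true unsafe


def prf(tp, fp, fn):
    """Precision, recall, and F1 for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


safe = prf(tp=safe_as_safe, fp=unsafe_as_safe, fn=safe_as_unsafe)
unsafe = prf(tp=unsafe_as_unsafe, fp=safe_as_unsafe, fn=unsafe_as_safe)

total = safe_as_safe + safe_as_unsafe + unsafe_as_safe + unsafe_as_unsafe
accuracy = (safe_as_safe + unsafe_as_unsafe) / total
macro = tuple((s + u) / 2 for s, u in zip(safe, unsafe))  # unweighted class mean

print(total, round(accuracy, 4))
print([round(x, 4) for x in safe], [round(x, 4) for x in unsafe])
print([round(x, 4) for x in macro])
```

Rounded to four decimals, these values agree with the tables above. The asymmetry is also visible: the model flags safe prompts as unsafe (5,053) about four times as often as it misses unsafe ones (1,234).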
+
+## Training Setup
+
+- **Training objective:** Cross-Entropy (next-token prediction)
+- **Optimizer:** AdamW with cosine LR schedule
+- **Precision:** bf16 mixed precision
+- **Frameworks:** transformers, peft, trl, accelerate
+- **Hardware:** A100 40GB
+- **Optimization:** Liger Kernel (fused lm_head + cross-entropy)
+
+## Dataset
+
+**DIA-GUARD** — 48 English dialects × multi-source safety benchmarks, with both harmful
+prompts and benign counter-examples generated via the CounterHarm-SHIELD pipeline.
+
+- ~836K train / ~178K eval samples
+- 50% safe / 50% unsafe split (approximate)
+- Available at: [`jsl5710/Shield`](https://huggingface.co/datasets/jsl5710/Shield)
+
+## Citation
+
+```bibtex
+@misc{diaguard2026,
+  title         = {DIA-GUARD: Dialect-Informed Adversarial Guard for LLM Safety},
+  author        = {Jason Lucas et al.},
+  year          = {2026},
+  howpublished  = {\url{https://github.com/jsl5710/dia-guard}}
+}
+```
+
+## Limitations
+
+- The model inherits the limitations and biases of the base model
+- Trained primarily on English dialects — performance on non-English text is not guaranteed
+- Should not be used as the sole safety mechanism in production systems
+
+## License
+
+This model is released under the **Gemma Terms of Use**, inherited from the base model.
+Please review the base model's license at the link above before use.
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,47 @@
+{{ bos_token }}
+{%- if messages[0]['role'] == 'system' -%}
+    {%- if messages[0]['content'] is string -%}
+        {%- set first_user_prefix = messages[0]['content'] + '
+
+' -%}
+    {%- else -%}
+        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '
+
+' -%}
+    {%- endif -%}
+    {%- set loop_messages = messages[1:] -%}
+{%- else -%}
+    {%- set first_user_prefix = "" -%}
+    {%- set loop_messages = messages -%}
+{%- endif -%}
+{%- for message in loop_messages -%}
+    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
+        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
+    {%- endif -%}
+    {%- if (message['role'] == 'assistant') -%}
+        {%- set role = "model" -%}
+    {%- else -%}
+        {%- set role = message['role'] -%}
+    {%- endif -%}
+    {{ '<start_of_turn>' + role + '
+' + (first_user_prefix if loop.first else "") }}
+    {%- if message['content'] is string -%}
+        {{ message['content'] | trim }}
+    {%- elif message['content'] is iterable -%}
+        {%- for item in message['content'] -%}
+            {%- if item['type'] == 'image' -%}
+                {{ '<start_of_image>' }}
+            {%- elif item['type'] == 'text' -%}
+                {{ item['text'] | trim }}
+            {%- endif -%}
+        {%- endfor -%}
+    {%- else -%}
+        {{ raise_exception("Invalid content type") }}
+    {%- endif -%}
+    {{ '<end_of_turn>
+' }}
+{%- endfor -%}
+{%- if add_generation_prompt -%}
+    {{'<start_of_turn>model
+'}}
+{%- endif -%}
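Reviewer note: for the single-turn case used in the README (one optional system message plus one user message, both plain strings), the template above reduces to a fixed string layout. The sketch below mirrors that layout by hand. `render_single_turn` is a hypothetical helper covering only this case; it skips role alternation checks, multi-turn conversations, and image content.

```python
from typing import Optional


def render_single_turn(system: Optional[str], user: str, bos_token: str = "<bos>") -> str:
    """Hand-written mirror of chat_template.jinja for one system + one user message."""
    # The template has no separate system turn: the system text is prepended
    # to the first user turn, followed by a blank line.
    first_user_prefix = system + "\n\n" if system is not None else ""
    return (
        f"{bos_token}<start_of_turn>user\n"
        f"{first_user_prefix}{user.strip()}<end_of_turn>\n"  # user content is trimmed
        "<start_of_turn>model\n"  # emitted when add_generation_prompt is true
    )


print(render_single_turn("You are DIA-Guard, a multilingual safety assistant.", "hello"))
```

For real inference, `tokenizer.apply_chat_template` remains the safer route, since it runs the actual template.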
--- a/config.json
+++ b/config.json
@@ -0,0 +1,62 @@
+{
+  "_sliding_window_pattern": 6,
+  "architectures": [
+    "Gemma3ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "attn_logit_softcapping": null,
+  "bos_token_id": 2,
+  "dtype": "bfloat16",
+  "eos_token_id": 1,
+  "final_logit_softcapping": null,
+  "head_dim": 256,
+  "hidden_activation": "gelu_pytorch_tanh",
+  "hidden_size": 640,
+  "initializer_range": 0.02,
+  "intermediate_size": 2048,
+  "layer_types": [
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 32768,
+  "model_type": "gemma3_text",
+  "num_attention_heads": 4,
+  "num_hidden_layers": 18,
+  "num_key_value_heads": 1,
+  "pad_token_id": 0,
+  "query_pre_attn_scalar": 256,
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "full_attention": {
+      "rope_theta": 1000000.0,
+      "rope_type": "default"
+    },
+    "sliding_attention": {
+      "rope_theta": 10000.0,
+      "rope_type": "default"
+    }
+  },
+  "sliding_window": 512,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.5.0",
+  "use_bidirectional_attention": false,
+  "use_cache": false,
+  "vocab_size": 262144
+}
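Reviewer note: the values in this config are enough for a rough parameter-count estimate. The sketch below assumes the usual Gemma 3 decoder layout (bias-free q/k/v/o projections, a gated MLP with three weight matrices, four RMSNorms per layer, and per-head q/k norms). That layout comes from the architecture, not from `config.json`, so treat the result as an estimate.

```python
# Values copied from config.json.
hidden, inter, vocab = 640, 2048, 262_144
layers, heads, kv_heads, head_dim = 18, 4, 1, 256

embed = vocab * hidden                      # shared with lm_head (tie_word_embeddings: true)
attn = 2 * hidden * heads * head_dim        # q_proj and o_proj
attn += 2 * hidden * kv_heads * head_dim    # k_proj and v_proj
mlp = 3 * hidden * inter                    # gate, up, and down projections
norms = 4 * hidden + 2 * head_dim           # ASSUMPTION: 4 RMSNorms + q/k norms per layer

per_layer = attn + mlp + norms
total = embed + layers * per_layer + hidden  # final norm adds `hidden` weights

print(total)      # 268098176
print(total * 2)  # 536196352 bytes at 2 bytes per bf16 weight
```

Under these assumptions the bf16 weights come to 536,196,352 bytes, about 26.7 KB below the 536,223,056-byte `model.safetensors` pointer in this commit. A gap of that size would be consistent with the file's metadata header, though that is not verified here. Roughly 63% of the parameters sit in the embedding table.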
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,13 @@
+{
+  "bos_token_id": 2,
+  "cache_implementation": "hybrid",
+  "do_sample": true,
+  "eos_token_id": [
+    1,
+    106
+  ],
+  "pad_token_id": 0,
+  "top_k": 64,
+  "top_p": 0.95,
+  "transformers_version": "5.5.0"
+}
--- a/model.safetensors
+++ b/model.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5bac13c6a481253e1b3a4da29bd37d9e62f706a0736e09c1a2b0e17820957765
+size 536223056
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:daab2354f8a74e70d70b4d1f804939b68a8c9624dd06cb7858e52dd8970e9726
+size 33384567
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,24 @@
+{
+  "backend": "tokenizers",
+  "boi_token": "<start_of_image>",
+  "bos_token": "<bos>",
+  "clean_up_tokenization_spaces": false,
+  "eoi_token": "<end_of_image>",
+  "eos_token": "<eos>",
+  "image_token": "<image_soft_token>",
+  "is_local": false,
+  "mask_token": "<mask>",
+  "model_max_length": 1000000000000000019884624838656,
+  "model_specific_special_tokens": {
+    "boi_token": "<start_of_image>",
+    "eoi_token": "<end_of_image>",
+    "image_token": "<image_soft_token>"
+  },
+  "pad_token": "<pad>",
+  "padding_side": "right",
+  "sp_model_kwargs": null,
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "GemmaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
--- a/training_args.bin
+++ b/training_args.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f2b6395991ce11c6254996f10d1ade38e25a76ef26fec88260b24f15b08f6cc4
+size 5777
--- a/training_config.yaml
+++ b/training_config.yaml
@@ -0,0 +1,38 @@
+alpha: 0.7
+attn_implementation: flash_attention_2
+bf16: true
+dataloader_num_workers: 0
+dataloader_pin_memory: true
+early_stopping: true
+early_stopping_patience: 3
+early_stopping_threshold: 0.0
+eval_data: /data/vibe_exp/dia-guard/dataset/dia_splits/val.jsonl
+eval_steps: 200
+eval_strategy: steps
+gradient_accumulation_steps: 1
+gradient_checkpointing: true
+learning_rate: 5.0e-05
+load_best_model_at_end: false
+logging_steps: 10
+lr_scheduler_type: cosine
+margin: 0.3
+max_grad_norm: 1.0
+max_seq_length: 2048
+metric_for_best_model: eval_loss
+model_name: google/gemma-3-270m-it
+num_epochs: 3
+output_dir: /data/vibe_exp/dia-guard/models/group3_student_ft_baseline/full_ft/gemma_3_270m_it
+per_device_eval_batch_size: 256
+per_device_train_batch_size: 256
+report_to: wandb
+run_name: gemma-3-270m-ce-ft
+save_steps: 500
+save_strategy: steps
+save_total_limit: 3
+temperature: 0.05
+tf32: true
+train_data: /data/vibe_exp/dia-guard/dataset/dia_splits/train.jsonl
+trust_remote_code: false
+use_liger_kernel: true
+warmup_steps: 4218
+weight_decay: 0.01
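Reviewer note: `lr_scheduler_type: cosine` with `warmup_steps: 4218` and `learning_rate: 5.0e-05` describes a linear warmup followed by a half-cosine decay. The sketch below writes out that formula in plain Python. `TOTAL_STEPS` is an assumption (about 836K samples / 256 per step × 3 epochs) and is not stated anywhere in this commit.

```python
import math

BASE_LR = 5.0e-05      # learning_rate from training_config.yaml
WARMUP_STEPS = 4218    # warmup_steps from training_config.yaml
TOTAL_STEPS = 9_798    # ASSUMPTION: ceil(836_000 / 256) * 3, not from the source


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 2109, 4218, 7008, 9798):
    print(s, lr_at(s))
```

The rate rises linearly to 5e-05 at step 4218 and then falls along a half cosine to zero at the final step. The exact curve depends on the true total step count.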