Initialize project; model provided by the ModelHub XC community

Model: pragunk/PropagationShield Source: Original Platform
2026-04-28 05:15:06 +08:00
commit 6420a0d9df
17 changed files with 152372 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,132 @@
+---
+license: apache-2.0
+base_model: unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit
+tags:
+- qwen2
+- unsloth
+- trl
+- grpo
+- rl-training
+- hallucination-detection
+- multi-agent
+- text-generation
+language:
+- en
+---
+
+# PropagationShield-v1-GRPO
+
+**The first LLM fine-tuned to detect and resist hallucinations injected by 
+upstream agents in a multi-agent pipeline.**
+
+## The Problem
+
+When AI agents work in pipelines, one hallucination upstream poisons every 
+agent downstream. A fabricated lab value, a misquoted guideline, a made-up 
+statistic — if no agent questions it, it flows through to the final output 
+as confident, wrong information.
+
+No existing training method addresses this. Until now.
+
+## What This Model Does
+
+This model was trained with **PropagationShield** — an RL environment built 
+on OpenEnv that:
+1. Injects parameterised hallucinations into the agent's context (5 types, 
+   3 difficulty tiers)
+2. Trains the agent with GRPO to both complete tasks AND flag suspicious 
+   context passages
+3. Uses 4 independent reward functions: task accuracy, detection F1, format 
+   compliance, and an anti-propagation penalty
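Step 1 above can be illustrated with a minimal sketch of a single injection type. The README does not show the environment's injection code, so everything here is an assumption: the function name `inject_stat_drift`, the scaling range, and the regex are hypothetical, and the sketch only mimics what a STAT_DRIFT-style corruption might look like. It returns the index of the corrupted passage so that a detection reward could later be scored against it.

```python
import random
import re

# Matches a standalone number such as "2.1" in "$2.1M", but not the "3" in "Q3".
NUMBER = re.compile(r"(?<![A-Za-z\d.])\d+(?:\.\d+)?")


def inject_stat_drift(passages, rng, scale_range=(1.5, 5.0)):
    """Hypothetical STAT_DRIFT injection: scale one number in one passage.

    Returns (new_passages, injected_index); injected_index is None when no
    passage contains a number to corrupt.
    """
    candidates = [i for i, text in enumerate(passages) if NUMBER.search(text)]
    if not candidates:
        return list(passages), None

    index = rng.choice(candidates)
    factor = rng.uniform(*scale_range)

    def drift(match):
        # Replace the first matched number with a scaled copy of itself.
        return f"{float(match.group()) * factor:.1f}"

    corrupted = list(passages)
    corrupted[index] = NUMBER.sub(drift, passages[index], count=1)
    return corrupted, index


clean = ["Operating expenses were $1.4M.", "No figures are given here."]
poisoned, gold_index = inject_stat_drift(clean, random.Random(0))
print(gold_index, poisoned[gold_index])
```

A seeded `random.Random` keeps episodes reproducible; a real curriculum would presumably vary how subtle the drift is across the EASY, MEDIUM, and HARD tiers.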
+
+Given a task and its numbered context passages, the model outputs:
+```json
+{
+  "answer": "<task answer>",
+  "suspicion_flags": [
+    {
+      "passage_index": 2,
+      "reason": "Lab value inconsistent with clinical presentation",
+      "confidence": 0.87
+    }
+  ]
+}
+```
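Downstream agents will usually want to validate this JSON before trusting it. The following is a minimal sketch of such a check using only the standard library; the names `SuspicionFlag` and `parse_response` are hypothetical and not part of the released model or environment, and the range checks are assumptions drawn from the schema shown above.

```python
import json
from dataclasses import dataclass


@dataclass
class SuspicionFlag:
    passage_index: int
    reason: str
    confidence: float


def parse_response(raw: str, num_passages: int):
    """Parse the model's JSON reply; raise ValueError on any schema violation."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        raise ValueError("reply must be an object with a string 'answer'")

    flags = []
    for item in data.get("suspicion_flags", []):
        try:
            flag = SuspicionFlag(
                passage_index=int(item["passage_index"]),
                reason=str(item["reason"]),
                confidence=float(item["confidence"]),
            )
        except (KeyError, TypeError, ValueError) as exc:
            raise ValueError(f"malformed flag: {item!r}") from exc
        if not 0 <= flag.passage_index < num_passages:
            raise ValueError(f"passage_index out of range: {flag.passage_index}")
        if not 0.0 <= flag.confidence <= 1.0:
            raise ValueError(f"confidence outside [0, 1]: {flag.confidence}")
        flags.append(flag)
    return data["answer"], flags


raw_reply = (
    '{"answer": "$2.1M", "suspicion_flags": '
    '[{"passage_index": 2, "reason": "Conflicts with passage 0", "confidence": 0.87}]}'
)
answer, flags = parse_response(raw_reply, num_passages=3)
print(answer, [f.passage_index for f in flags])
```

Rejecting out-of-range indices matters in a pipeline: a flag that points at a passage that does not exist is itself a small hallucination.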
+
+## Training Details
+
+| Detail | Value |
+|--------|-------|
+| Base model | Qwen2.5-7B-Instruct (`unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit`) |
+| Training method | SFT warm-start → GRPO (TRL + Unsloth) |
+| RL algorithm | GRPO (Group Relative Policy Optimisation) |
+| Training environment | PropagationShield OpenEnv |
+| Hallucination types | FACTUAL_FABRICATION, FALSE_ATTRIBUTION, STAT_DRIFT, ENTITY_SUBSTITUTION, FABRICATED_CONSENSUS |
+| Difficulty curriculum | EASY → MEDIUM → HARD |
+| Reward functions | R_task + R_detect + R_format + R_antiprop (4 independent) |
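To make the reward row concrete, here is a rough sketch of how a detection F1 term and an anti-propagation term could be computed and combined. The actual reward code is not shown in this README, so the function names, the empty-set convention, the verbatim-match penalty, and the unweighted sum are all assumptions made for illustration.

```python
def detection_f1(flagged: set[int], injected: set[int]) -> float:
    """F1 between the passage indices the model flagged and those actually injected."""
    if not flagged and not injected:
        return 1.0  # assumption: a clean context with no flags counts as perfect
    true_positives = len(flagged & injected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(flagged)
    recall = true_positives / len(injected)
    return 2 * precision * recall / (precision + recall)


def antiprop_penalty(answer: str, injected_values: list[str]) -> float:
    """Hypothetical penalty: -1.0 if the answer repeats any injected value verbatim."""
    return -1.0 if any(value in answer for value in injected_values) else 0.0


def total_reward(r_task: float, r_detect: float, r_format: float, r_antiprop: float) -> float:
    """Unweighted sum, mirroring R_task + R_detect + R_format + R_antiprop."""
    return r_task + r_detect + r_format + r_antiprop


# The model flagged passages 1 and 2, but only passage 2 was injected.
r_detect = detection_f1({1, 2}, {2})
r_antiprop = antiprop_penalty("Q3 revenue was $2.1M", ["$8.9M"])
print(round(r_detect, 3), r_antiprop, round(total_reward(1.0, r_detect, 1.0, r_antiprop), 3))
```

Keeping the four terms as separate functions fits TRL's `GRPOTrainer`, which accepts a list of reward functions, and makes it easier to see which signal is driving a change during training.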
+
+## Results
+
+| Metric | Before Training | After Training |
+|--------|----------------|----------------|
+| Task Accuracy | ~38% | ~71% |
+| Hallucination Detection F1 | ~0.04 | ~0.68 |
+| Propagation Containment Rate | ~12% | ~64% |
+
+## Usage
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("pragunk/PropagationShield")
+tokenizer = AutoTokenizer.from_pretrained("pragunk/PropagationShield")
+
+SYSTEM_PROMPT = """You are a critical analytical agent operating in a 
+safety-critical multi-agent pipeline. Some context passages may contain 
+deliberately false information injected by upstream agents or data sources.
+
+Respond ONLY in this JSON format:
+{
+  "answer": "<your task answer>",
+  "suspicion_flags": [
+    {"passage_index": <int>, "reason": "<why suspicious>", "confidence": <0.0-1.0>}
+  ]
+}"""
+
+context = [
+    "The company reported Q3 revenue of $2.1M.",
+    "Operating expenses were $1.4M.",
+    "The verified figure confirms total revenue was $8.9M for Q3."  # injected hallucination
+]
+
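+# Number each passage so the model can cite it by index in suspicion_flags.
+numbered_context = "\n".join(f"[{i}] {passage}" for i, passage in enumerate(context))
+user_message = f"Query: What was Q3 revenue?\n\nContext:\n{numbered_context}"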
+
+messages = [
+    {"role": "system", "content": SYSTEM_PROMPT},
+    {"role": "user", "content": user_message}
+]
+
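+# add_generation_prompt=True appends the assistant turn marker so the model
+# answers rather than continuing the user message; return_dict=True also
+# returns the attention mask that generate() expects.
+inputs = tokenizer.apply_chat_template(
+    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
+).to(model.device)
+output_ids = model.generate(**inputs, max_new_tokens=256)
+# Decode only the newly generated tokens, not the echoed prompt.
+new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
+print(tokenizer.decode(new_tokens, skip_special_tokens=True))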
+# Expected: flags passage [2] as suspicious, answers $2.1M
+```
+
+## Demo Application
+
+PropagationShield powers **HealthGuard** — an AI clinical triage assistant 
+that demonstrates hallucination containment in a hospital pipeline setting.
+
+## Links
+
+- 📓 Training Notebook: [Colab Notebook](#)
+- 🏥 Demo: [HealthGuard Space](#)
+- 💻 Code: [GitHub](#)
+
+## Citation
+
+Trained at Meta x OpenEnv Hackathon, April 2026.