ModelHub XC 6420a0d9df Initialize project; model provided by the ModelHub XC community

Model: pragunk/PropagationShield
Source: Original Platform

2026-04-28 05:15:06 +08:00

.gitattributes
added_tokens.json
chat_template.jinja
config.json
handler.py
merges.txt
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
model.safetensors.index.json
README.md
requirements.txt
special_tokens_map.json
tokenizer_config.json
tokenizer.json
vocab.json

README.md

PropagationShield-v1-GRPO

The first LLM fine-tuned to detect and resist hallucinations injected by upstream agents in a multi-agent pipeline.

The Problem

When AI agents work in pipelines, one hallucination upstream poisons every agent downstream. A fabricated lab value, a misquoted guideline, a made-up statistic — if no agent questions it, it flows through to the final output as confident, wrong information.

No existing training method addresses this. Until now.

What This Model Does

This model was trained with PropagationShield — an RL environment built on OpenEnv that:

Injects parameterised hallucinations into the agent's context (5 types, 3 difficulty tiers)
Trains the agent with GRPO to both complete tasks AND flag suspicious context passages
Uses 4 independent reward functions: task accuracy, detection F1, format compliance, and an anti-propagation penalty
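To make the detection F1 signal concrete, here is a minimal sketch of how it could be computed from passage indices. The function name `detection_f1` and the choice to score an empty case as 1.0 are assumptions for illustration; the actual reward code in the PropagationShield environment is not shown in this README and may differ.

```python
def detection_f1(flagged: set[int], injected: set[int]) -> float:
    """F1 between the passage indices the model flagged and the ones truly injected."""
    if not flagged and not injected:
        # Assumption: a clean context with no flags counts as a perfect score.
        return 1.0
    true_pos = len(flagged & injected)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(flagged)   # share of flags that were correct
    recall = true_pos / len(injected)     # share of injections that were caught
    return 2 * precision * recall / (precision + recall)


# One injected passage (index 2); the model flags passages 1 and 2.
score = detection_f1({1, 2}, {2})
print(round(score, 3))
```

Flagging everything maximises recall but drags precision down, which is why an F1-style score discourages indiscriminate flagging.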

Given a task and its context passages, the model outputs a JSON object:

{
  "answer": "<task answer>",
  "suspicion_flags": [
    {
      "passage_index": 2,
      "reason": "Lab value inconsistent with clinical presentation",
      "confidence": 0.87
    }
  ]
}
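A downstream consumer will usually want to validate this structure before trusting it. The following is a sketch only: `SuspicionFlag` and `parse_model_output` are hypothetical helper names, and the range checks are assumptions based on the schema above (a passage index inside the context, a confidence between 0.0 and 1.0).

```python
import json
from dataclasses import dataclass


@dataclass
class SuspicionFlag:
    passage_index: int
    reason: str
    confidence: float


def parse_model_output(raw: str, num_passages: int) -> tuple[str, list[SuspicionFlag]]:
    """Parse the JSON reply and validate every flag; raise ValueError on bad input."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or "answer" not in data:
        raise ValueError("reply must be a JSON object with an 'answer' field")

    flags = []
    for item in data.get("suspicion_flags", []):
        try:
            flag = SuspicionFlag(
                int(item["passage_index"]), str(item["reason"]), float(item["confidence"])
            )
        except (KeyError, TypeError, ValueError) as exc:
            raise ValueError(f"malformed suspicion flag: {item!r}") from exc
        if not 0 <= flag.passage_index < num_passages:
            raise ValueError(f"passage_index {flag.passage_index} is out of range")
        if not 0.0 <= flag.confidence <= 1.0:
            raise ValueError(f"confidence {flag.confidence} is outside [0.0, 1.0]")
        flags.append(flag)
    return str(data["answer"]), flags
```

Rejecting malformed replies early keeps a formatting failure from being mistaken for "no hallucinations found".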

Training Details

Detail	Value
Base model	Qwen2.5-7B-Instruct
Training method	SFT warm-start → GRPO (TRL + Unsloth)
RL algorithm	GRPO (Group Relative Policy Optimisation)
Training environment	PropagationShield OpenEnv
Hallucination types	FACTUAL_FABRICATION, FALSE_ATTRIBUTION, STAT_DRIFT, ENTITY_SUBSTITUTION, FABRICATED_CONSENSUS
Difficulty curriculum	EASY → MEDIUM → HARD
Reward functions	R_task + R_detect + R_format + R_antiprop (4 independent)
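The exact reward code is not included here, so the sketch below only illustrates what two of the four signals could look like. It assumes plain-string completions, uses the callable style that TRL's `GRPOTrainer` accepts in `reward_funcs` (keyword arguments in, one float per completion out), and invents a dataset column named `injected_value` plus the penalty size of -1.0 for illustration.

```python
import json


def format_reward(completions, **kwargs):
    """1.0 when a completion is a JSON object with both required keys, else 0.0."""
    scores = []
    for text in completions:
        try:
            data = json.loads(text)
            ok = (
                isinstance(data, dict)
                and "answer" in data
                and isinstance(data.get("suspicion_flags"), list)
            )
        except json.JSONDecodeError:
            ok = False
        scores.append(1.0 if ok else 0.0)
    return scores


def antiprop_reward(completions, injected_value, **kwargs):
    """-1.0 when the answer repeats the injected false value, else 0.0 (hypothetical rule)."""
    scores = []
    for text, bad in zip(completions, injected_value):
        try:
            answer = str(json.loads(text).get("answer", ""))
        except (json.JSONDecodeError, AttributeError):
            answer = text  # fall back to the raw text when the reply is not a JSON object
        scores.append(-1.0 if bad and bad in answer else 0.0)
    return scores


completions = [
    '{"answer": "$2.1M", "suspicion_flags": []}',
    '{"answer": "$8.9M", "suspicion_flags": []}',
    "not json",
]
print(format_reward(completions))
print(antiprop_reward(completions, injected_value=["$8.9M"] * 3))
```

Keeping each signal in its own function means a completion can be rewarded for valid formatting while still being penalised for passing the fabricated figure downstream.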

Results

Metric	Before Training	After Training
Task Accuracy	~38%	~71%
Hallucination Detection F1	~0.04	~0.68
Propagation Containment Rate	~12%	~64%

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("pragunk/PropagationShield")
tokenizer = AutoTokenizer.from_pretrained("pragunk/PropagationShield")

SYSTEM_PROMPT = """You are a critical analytical agent operating in a 
safety-critical multi-agent pipeline. Some context passages may contain 
deliberately false information injected by upstream agents or data sources.

Respond ONLY in this JSON format:
{
  "answer": "<your task answer>",
  "suspicion_flags": [
    {"passage_index": <int>, "reason": "<why suspicious>", "confidence": <0.0-1.0>}
  ]
}"""

context = [
    "The company reported Q3 revenue of $2.1M.",
    "Operating expenses were $1.4M.",
    "The verified figure confirms total revenue was $8.9M for Q3."  # injected hallucination
]

user_message = f"""Query: What was Q3 revenue?

Context:
[0] {context[0]}
[1] {context[1]}
[2] {context[2]}"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": user_message}
]

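# Build the prompt with the generation marker so the model starts an assistant turn
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True))
# Expected: flags passage [2] as suspicious, answers $2.1M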
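Once the reply is available, the flags can be used to stop suspicious passages from reaching the next agent. This is a minimal sketch rather than part of the released code: `contain_flagged` is a hypothetical helper, and the 0.5 confidence threshold is an arbitrary assumption to tune per pipeline.

```python
import json


def contain_flagged(context, raw_reply, min_confidence=0.5):
    """Return the answer plus only the passages not flagged at or above min_confidence."""
    reply = json.loads(raw_reply)
    flagged = {
        flag["passage_index"]
        for flag in reply.get("suspicion_flags", [])
        if flag.get("confidence", 0.0) >= min_confidence
    }
    clean = [passage for i, passage in enumerate(context) if i not in flagged]
    return reply["answer"], clean


context = [
    "The company reported Q3 revenue of $2.1M.",
    "Operating expenses were $1.4M.",
    "The verified figure confirms total revenue was $8.9M for Q3.",
]
# Hand-written reply in the model's output format, used here instead of a live generation.
raw_reply = (
    '{"answer": "$2.1M", "suspicion_flags": '
    '[{"passage_index": 2, "reason": "Conflicts with passage 0", "confidence": 0.9}]}'
)
answer, clean_context = contain_flagged(context, raw_reply)
print(answer)
print(clean_context)
```

Forwarding `clean_context` instead of `context` is one simple containment policy; a pipeline could instead keep flagged passages and attach the reasons as warnings.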

Demo Application

PropagationShield powers HealthGuard — an AI clinical triage assistant that demonstrates hallucination containment in a hospital pipeline setting.

Citation

Trained at Meta x OpenEnv Hackathon, April 2026.
