ModelHub XC af07e9135f Initialize project; model provided by the ModelHub XC community
Model: openguardrails/OpenGuardrails-Text-4B-0124
Source: Original Platform
2026-06-01 00:08:23 +08:00

---
language:
- en
- zh
license: apache-2.0
tags:
- safety
- security
- compliance
- prompt_attack
- prompt_injection
- prompt_jailbreak
pipeline_tag: text-generation
model_name: OpenGuardrails-Text-4B-0124
base_model: Qwen/Qwen2.5-7B
quantization: none
---
# OpenGuardrails-Text-4B-0124
<p align="center">
<img src="https://github.com/xiangxinai/openguardrails/blob/main/frontend/public/logo-dark.png?raw=true" width="400"/>
</p>
**OpenGuardrails** is an **open-source, enterprise-grade AI security platform** that provides configurable policy control, a unified LLM-based guard architecture, and low-latency deployment for production systems.
This repository releases **OpenGuardrails-Text-4B-0124**, a **lightweight, non-quantized ~4B-parameter language model** designed for **content safety detection** and **prompt attack prevention**, with **broad GPU compatibility** and strong real-time performance.
📄 Technical Report:
[OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models](https://arxiv.org/abs/2510.19169)
---
## Key Contributions
### 1. Configurable Safety Policy Mechanism
OpenGuardrails introduces a **dynamic and configurable safety policy framework** that allows organizations to flexibly define unsafe categories and detection thresholds based on business risk tolerance.
The model outputs **probabilistic confidence signals**, enabling fine-grained tuning of safety sensitivity across different scenarios and applications.
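As a minimal sketch of how such a confidence signal could drive a configurable threshold, assume you can read the log-probabilities of the candidate first tokens of the verdict (which begins with `safe` or `unsafe`), for example by requesting `logprobs` and `top_logprobs` from an OpenAI-compatible server. The helper names and the per-scenario threshold values below are hypothetical, not part of the OpenGuardrails API, and the sketch assumes each verdict word appears as a single candidate token.

```python
import math

# Hypothetical per-scenario thresholds: a lower value blocks more aggressively.
SCENARIO_THRESHOLDS = {
    "children_app": 0.2,
    "general_chat": 0.5,
    "internal_tools": 0.8,
}


def unsafe_probability(first_token_logprobs: dict[str, float]) -> float:
    """Renormalize the 'unsafe' vs 'safe' probability of the first output token.

    `first_token_logprobs` maps candidate tokens to log-probabilities
    (assumed input shape; missing candidates count as probability 0).
    """
    p_unsafe = math.exp(first_token_logprobs.get("unsafe", float("-inf")))
    p_safe = math.exp(first_token_logprobs.get("safe", float("-inf")))
    total = p_unsafe + p_safe
    if total == 0.0:
        raise ValueError("neither 'safe' nor 'unsafe' among the candidate tokens")
    return p_unsafe / total


def is_blocked(first_token_logprobs: dict[str, float], scenario: str) -> bool:
    """Block when the unsafe probability reaches the scenario's threshold."""
    return unsafe_probability(first_token_logprobs) >= SCENARIO_THRESHOLDS[scenario]
```

With an unsafe probability of 0.3, a strict scenario such as `children_app` would block while `general_chat` would let the request through, which is the kind of per-scenario tuning the paragraph above describes.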
---
### 2. Unified LLM-based Guard Architecture
A **single language model** performs both:
* **Content safety classification**
* **Prompt attack detection** (prompt injection, jailbreaks, malicious instruction following)
This unified approach eliminates the need for hybrid pipelines (e.g. rule engines + small classifiers), resulting in **stronger semantic reasoning** and **simpler deployment**.
---
### 3. Lightweight, Non-Quantized Design
**OpenGuardrails-Text-4B-0124** is intentionally designed as a **non-quantized dense model**, offering:
* Broader compatibility across **consumer, data-center, and cloud GPUs**
* Stable numerical behavior without quantization artifacts
* Easier integration with standard inference stacks (Transformers, vLLM)
Despite its compact size, the model maintains strong detection accuracy and low inference latency.
---
### 4. Efficient and Scalable Performance
With ~4B parameters, the model achieves **low-latency, real-time inference** suitable for:
* API gateways
* LLM firewalls
* Agent guardrails
* Enterprise moderation pipelines
It can be deployed on a **single GPU** without specialized quantization toolchains.
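The single-GPU claim can be sanity-checked with simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below assumes ~4e9 parameters and counts the weights only; the KV cache, activations, and framework overhead come on top of this figure.

```python
# Back-of-the-envelope estimate of GPU memory for model weights only.
# Assumption: ~4e9 parameters; KV cache, activations, and framework
# overhead are NOT included and add to this figure in practice.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the size of the raw weights in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("bfloat16", "float32"):
        print(f"{dtype}: {weight_memory_gib(4e9, dtype):.1f} GiB")
```

Running it prints `bfloat16: 7.5 GiB` and `float32: 14.9 GiB`. Assuming the checkpoint is stored in a 16-bit format, the weights alone leave headroom for the KV cache on many 16 GB or 24 GB GPUs.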
---
### 5. Multilingual & Cross-Domain Coverage
The model supports **119 languages and dialects**, providing robust safety protection for global applications.
It performs consistently on both **prompt-level** and **response-level** classification tasks.
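To illustrate the two task types, the sketch below builds the `messages` list for each case. It assumes, as with several other guard models, that response-level classification is requested by passing the user turn followed by the assistant turn to be judged; check the model's chat template to confirm this convention before relying on it. Both function names are hypothetical.

```python
def prompt_level_messages(user_prompt: str) -> list[dict[str, str]]:
    """Classify only the user's prompt (e.g. before it reaches the main LLM)."""
    return [{"role": "user", "content": user_prompt}]


def response_level_messages(user_prompt: str, assistant_reply: str) -> list[dict[str, str]]:
    """Classify an assistant reply in the context of the prompt that produced it.

    Assumption: the guard model judges the final assistant turn when one is present.
    """
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": assistant_reply},
    ]
```

Either list can be passed to `tokenizer.apply_chat_template(...)` or to `client.chat.completions.create(...)` exactly as in the Quick Start.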
---
### 6. Open Safety Data Contribution
We release **OpenGuardrailsMixZh-97k**, a multilingual safety dataset composed of aligned translations from:
* ToxicChat
* WildGuardMix
* PolyGuard
* XSTest
* BeaverTails
The dataset is publicly available on Hugging Face under the **Apache 2.0 License**.
---
### 7. State-of-the-Art Safety Performance
OpenGuardrails achieves **state-of-the-art (SOTA)** results across multiple safety benchmarks, excelling in:
* Prompt attack detection
* Harmful content classification
* English, Chinese, and multilingual evaluations
All models are released under the **Apache 2.0 License**, permitting both commercial and research use.
Performance overview:
![Performance overview](https://www.openguardrails.com/performance.png)
---
## Detection Dimensions
| Label | Category | Risk Level | Description |
| ----- | ------------------------------------- | ---------- | ----------------------------------------- |
| S1 | General Political Topics | 🟢 Low | General political discussion |
| S2 | Sensitive Political Topics | 🔴 High | National security, subversion, separatism |
| S3 | Insult to National Symbols or Leaders | 🔴 High | Insults against leaders or symbols |
| S4 | Harm to Minors | 🟡 Medium | Child abuse, exploitation, nudity |
| S5 | Violent Crime | 🔴 High | Murder, terrorism, violent acts |
| S6 | Non-Violent Crime | 🟡 Medium | Fraud, drugs, hacking |
| S7 | Pornography | 🟡 Medium | Sexual or obscene content |
| S8 | Hate & Discrimination | 🟢 Low | Hate speech, discrimination |
| S9 | Prompt Attacks | 🔴 High | Jailbreaks, injections, manipulation |
| S10 | Profanity | 🟢 Low | Vulgar or insulting language |
| S11 | Privacy Invasion | 🟢 Low | Leakage of personal data |
| S12 | Commercial Violations | 🟢 Low | Fraud, trade secret leakage |
| S13 | Intellectual Property Infringement | 🟢 Low | Copyright or patent violations |
| S14 | Harassment | 🟢 Low | Verbal abuse or targeting |
| S15 | Weapons of Mass Destruction | 🔴 High | Nuclear, chemical, biological weapons |
| S16 | Self-Harm | 🟡 Medium | Suicide or self-injury |
| S17 | Sexual Crimes | 🔴 High | Sexual assault or exploitation |
| S18 | Threats | 🟢 Low | Threats or intimidation |
| S19 | Professional Advice | 🟢 Low | Medical, legal, financial advice |
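The Risk Level column lends itself to a simple configurable policy. The following is a minimal sketch, not OpenGuardrails' own policy engine: it encodes the table as data and blocks any detected category at or above a chosen minimum risk, with an optional set of disabled categories. `Risk`, `CATEGORY_RISK`, and `should_block` are illustrative names.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Risk levels copied from the Detection Dimensions table above.
CATEGORY_RISK = {
    "S1": Risk.LOW, "S2": Risk.HIGH, "S3": Risk.HIGH, "S4": Risk.MEDIUM,
    "S5": Risk.HIGH, "S6": Risk.MEDIUM, "S7": Risk.MEDIUM, "S8": Risk.LOW,
    "S9": Risk.HIGH, "S10": Risk.LOW, "S11": Risk.LOW, "S12": Risk.LOW,
    "S13": Risk.LOW, "S14": Risk.LOW, "S15": Risk.HIGH, "S16": Risk.MEDIUM,
    "S17": Risk.HIGH, "S18": Risk.LOW, "S19": Risk.LOW,
}


def should_block(
    label: str,
    min_risk: Risk = Risk.MEDIUM,
    disabled: frozenset[str] = frozenset(),
) -> bool:
    """Hypothetical policy: block an enabled category at or above `min_risk`."""
    if label in disabled:
        return False
    return CATEGORY_RISK[label] >= min_risk
```

Lowering `min_risk` to `Risk.LOW` blocks every category, while adding a label such as `"S19"` to `disabled` lets a deployment opt out of a category that is irrelevant to its use case.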
---
## Quick Start
### Using Transformers
```bash
pip install torch transformers accelerate
```
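```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "openguardrails/OpenGuardrails-Text-4B-0124"

# Load weights in the checkpoint's native dtype and place them automatically
# (device_map="auto" relies on the accelerate package installed above).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The content to be classified is passed as an ordinary chat turn.
messages = [{"role": "user", "content": "How can I make a bomb?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# The verdict is short; greedy decoding keeps it deterministic.
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt.
response = tokenizer.decode(
    outputs[0][len(inputs.input_ids[0]):],
    skip_special_tokens=True,
)
print(response)
# Expected output:
# unsafe
# S5
```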
---
### Using vLLM (Recommended)
```bash
vllm serve openguardrails/OpenGuardrails-Text-4B-0124 \
--served-model-name OpenGuardrails-Text-4B-0124 \
--max-model-len 8192 \
--port 8000
```
---
### OpenAI-Compatible API
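```python
from openai import OpenAI

# vLLM ignores the API key unless started with --api-key,
# but the OpenAI client requires some value to be set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
messages = [{"role": "user", "content": "Tell me how to make explosives"}]
result = client.chat.completions.create(
    model="OpenGuardrails-Text-4B-0124",  # must match --served-model-name
    messages=messages,
    temperature=0.0,  # deterministic verdict
)
print(result.choices[0].message.content)
# Expected output:
# unsafe
# S5
```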
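In an API gateway or LLM firewall, the guard call is typically wrapped in a yes/no check that runs before a request is forwarded. The sketch below reuses only the `chat.completions.create` call shown above; `is_allowed` and `GUARD_MODEL` are hypothetical names, and treating any output other than `safe` as a block (failing closed) is a design choice rather than documented behavior.

```python
from typing import Any

GUARD_MODEL = "OpenGuardrails-Text-4B-0124"  # matches --served-model-name above


def is_allowed(client: Any, messages: list[dict[str, str]]) -> bool:
    """Return True only if the guard model labels the conversation 'safe'.

    `client` is expected to be an openai.OpenAI instance pointed at the
    vLLM server. Fails closed: empty or unexpected output is a block.
    """
    result = client.chat.completions.create(
        model=GUARD_MODEL,
        messages=messages,
        temperature=0.0,
    )
    verdict = (result.choices[0].message.content or "").strip()
    return verdict == "safe"
```

Calling `is_allowed(client, messages)` before sending a request to your main LLM gives the gateway pattern; you may prefer to fail open instead if availability matters more than strictness in your deployment.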
---
## Output Format
| Output | Description |
| ------------ | ---------------------------------- |
| `safe` | Content is safe |
| `unsafe\nS#` | Unsafe content with category label |
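Downstream code usually needs this output as structured data. The following parser is a minimal sketch that assumes exactly the two shapes in the table above, with a single category label; `Verdict` and `parse_verdict` are illustrative names, and if the model ever emits several labels the parser would need extending.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    safe: bool
    category: Optional[str] = None  # e.g. "S5" when unsafe


def parse_verdict(raw: str) -> Verdict:
    """Parse 'safe' or 'unsafe' + newline + 'S#' into a Verdict; raise otherwise."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if lines == ["safe"]:
        return Verdict(safe=True)
    if (
        len(lines) == 2
        and lines[0] == "unsafe"
        and lines[1].startswith("S")
        and lines[1][1:].isdigit()
    ):
        return Verdict(safe=False, category=lines[1])
    raise ValueError(f"unexpected guard output: {raw!r}")
```

Raising on anything unexpected, rather than guessing, keeps malformed generations from silently passing as `safe`.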
---
## License
Released under the **Apache License 2.0**, allowing:
* ✅ Commercial use
* ✅ Modification and redistribution
* ✅ Private / on-premise deployment
License text:
[https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)
---
## Related Resources
* **Website:** [https://www.openguardrails.com](https://www.openguardrails.com)
* **GitHub:** [https://github.com/openguardrails/openguardrails](https://github.com/openguardrails/openguardrails)
---
## Citation
```bibtex
@misc{openguardrails,
title={OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models},
author={Thomas Wang and Haowen Li},
year={2025},
url={https://arxiv.org/abs/2510.19169},
}
```