---
language:
- en
- zh
license: apache-2.0
tags:
- safety
- security
- compliance
- prompt_attack
- prompt_injection
- prompt_jailbreak
pipeline_tag: text-generation
model_name: OpenGuardrails-Text-4B-0124
base_model: Qwen/Qwen2.5-7B
quantization: none
---

# OpenGuardrails-Text-4B-0124

<p align="center">
    <img src="https://github.com/xiangxinai/openguardrails/blob/main/frontend/public/logo-dark.png?raw=true" width="400"/>
</p>

**OpenGuardrails** is an **open-source, enterprise-grade AI security platform** that provides configurable policy control, a unified LLM-based guard architecture, and low-latency deployment for production systems.

This repository releases **OpenGuardrails-Text-4B-0124**, a **lightweight, non-quantized language model with roughly 4B parameters** designed for **content safety detection** and **prompt attack prevention**, with **broad GPU compatibility** and strong real-time performance.

📄 Technical Report:
[OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models](https://arxiv.org/abs/2510.19169)

---

## Key Contributions

### 1. Configurable Safety Policy Mechanism

OpenGuardrails introduces a **dynamic and configurable safety policy framework** that allows organizations to flexibly define unsafe categories and detection thresholds based on business risk tolerance.

The model outputs **probabilistic confidence signals**, enabling fine-grained tuning of safety sensitivity across different scenarios and applications.
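
As a minimal sketch of how such a threshold could be applied, the snippet below assumes the confidence signal is the probability that the first generated token is `unsafe`, recovered from per-token log-probabilities (for example, the `top_logprobs` field an OpenAI-compatible server can return). The function names, the input dictionary, and the threshold values are illustrative assumptions, not part of the OpenGuardrails API.

```python
import math


def unsafe_probability(first_token_logprobs):
    """Return P(unsafe) normalized against P(safe).

    Assumption: the dict maps candidate first tokens to log-probabilities,
    and the verdict tokens are the literal strings "safe" and "unsafe".
    """
    p_safe = math.exp(first_token_logprobs.get("safe", float("-inf")))
    p_unsafe = math.exp(first_token_logprobs.get("unsafe", float("-inf")))
    total = p_safe + p_unsafe
    # If neither verdict token is present, report no evidence of risk.
    return p_unsafe / total if total > 0 else 0.0


def is_blocked(first_token_logprobs, threshold=0.5):
    """Hypothetical policy: block when P(unsafe) reaches the threshold."""
    return unsafe_probability(first_token_logprobs) >= threshold
```

A stricter deployment would lower `threshold` to catch more borderline content, while a more permissive one would raise it.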

---

### 2. Unified LLM-based Guard Architecture

A **single language model** performs both:

* **Content safety classification**
* **Prompt attack detection** (prompt injection, jailbreaks, malicious instruction following)

This unified approach eliminates the need for hybrid pipelines (e.g. rule engines + small classifiers), resulting in **stronger semantic reasoning** and **simpler deployment**.

---

### 3. Lightweight, Non-Quantized Design

**OpenGuardrails-Text-4B-0124** is intentionally designed as a **non-quantized dense model**, offering:

* Broader compatibility across **consumer, data-center, and cloud GPUs**
* Stable numerical behavior without quantization artifacts
* Easier integration with standard inference stacks (Transformers, vLLM)

Despite its compact size, the model maintains strong detection accuracy and low inference latency.

---

### 4. Efficient and Scalable Performance

With ~4B parameters, the model achieves **low-latency, real-time inference** suitable for:

* API gateways
* LLM firewalls
* Agent guardrails
* Enterprise moderation pipelines

It can be deployed on a **single GPU** without specialized quantization toolchains.

---

### 5. Multilingual & Cross-Domain Coverage

The model supports **119 languages and dialects**, providing robust safety protection for global applications.
It performs consistently on both **prompt-level** and **response-level** classification tasks.
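
The two modes can be sketched as different chat inputs. This is an assumption about the input convention rather than something this README documents: prompt-level checks send only the user turn, and response-level checks append the assistant turn to be judged. The helper name `build_messages` is hypothetical.

```python
def build_messages(prompt, response=None):
    """Build a chat message list for prompt-level or response-level checks.

    Assumption: response-level classification is requested by appending the
    assistant turn after the user turn; confirm this against the official docs.
    """
    messages = [{"role": "user", "content": prompt}]
    if response is not None:
        messages.append({"role": "assistant", "content": response})
    return messages
```

The resulting list can be passed to `tokenizer.apply_chat_template` or to a chat completions request in the same way as the Quick Start examples.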

---

### 6. Open Safety Data Contribution

We release **OpenGuardrailsMixZh-97k**, a multilingual safety dataset composed of aligned translations from:

* ToxicChat
* WildGuardMix
* PolyGuard
* XSTest
* BeaverTails

The dataset is publicly available on Hugging Face under the **Apache 2.0 License**.

---

### 7. State-of-the-Art Safety Performance

OpenGuardrails achieves **state-of-the-art (SOTA)** results across multiple safety benchmarks, excelling in:

* Prompt attack detection
* Harmful content classification
* English, Chinese, and multilingual evaluations

All models are released under the **Apache 2.0 License**, which permits commercial and research use subject to the license terms.

Performance overview:
![Performance overview](https://www.openguardrails.com/performance.png)

---

## Detection Dimensions

| Label | Category                              | Risk Level | Description                               |
| ----- | ------------------------------------- | ---------- | ----------------------------------------- |
| S1    | General Political Topics              | 🟢 Low     | General political discussion              |
| S2    | Sensitive Political Topics            | 🔴 High    | National security, subversion, separatism |
| S3    | Insult to National Symbols or Leaders | 🔴 High    | Insults against leaders or symbols        |
| S4    | Harm to Minors                        | 🟡 Medium  | Child abuse, exploitation, nudity         |
| S5    | Violent Crime                         | 🔴 High    | Murder, terrorism, violent acts           |
| S6    | Non-Violent Crime                     | 🟡 Medium  | Fraud, drugs, hacking                     |
| S7    | Pornography                           | 🟡 Medium  | Sexual or obscene content                 |
| S8    | Hate & Discrimination                 | 🟢 Low     | Hate speech, discrimination               |
| S9    | Prompt Attacks                        | 🔴 High    | Jailbreaks, injections, manipulation      |
| S10   | Profanity                             | 🟢 Low     | Vulgar or insulting language              |
| S11   | Privacy Invasion                      | 🟢 Low     | Leakage of personal data                  |
| S12   | Commercial Violations                 | 🟢 Low     | Fraud, trade secret leakage               |
| S13   | Intellectual Property Infringement    | 🟢 Low     | Copyright or patent violations            |
| S14   | Harassment                            | 🟢 Low     | Verbal abuse or targeting                 |
| S15   | Weapons of Mass Destruction           | 🔴 High    | Nuclear, chemical, biological weapons     |
| S16   | Self-Harm                             | 🟡 Medium  | Suicide or self-injury                    |
| S17   | Sexual Crimes                         | 🔴 High    | Sexual assault or exploitation            |
| S18   | Threats                               | 🟢 Low     | Threats or intimidation                   |
| S19   | Professional Advice                   | 🟢 Low     | Medical, legal, financial advice          |
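
For downstream policy code, the table can be mirrored as a lookup. The sketch below copies the risk levels from the table above; the `should_block` helper and its default minimum level are hypothetical and only illustrate one way to act on a label.

```python
# Risk level per category label, transcribed from the table above.
RISK_LEVEL = {
    "S1": "low", "S2": "high", "S3": "high", "S4": "medium",
    "S5": "high", "S6": "medium", "S7": "medium", "S8": "low",
    "S9": "high", "S10": "low", "S11": "low", "S12": "low",
    "S13": "low", "S14": "low", "S15": "high", "S16": "medium",
    "S17": "high", "S18": "low", "S19": "low",
}

# Ordering used to compare risk levels.
_SEVERITY = {"low": 0, "medium": 1, "high": 2}


def should_block(label, min_level="medium"):
    """Hypothetical policy: block when the label's risk is at least min_level."""
    return _SEVERITY[RISK_LEVEL[label]] >= _SEVERITY[min_level]
```

With the default `min_level="medium"`, `S5` (Violent Crime) is blocked and `S10` (Profanity) is allowed through.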

---

## Quick Start

### Using Transformers

```bash
pip install torch transformers accelerate
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "openguardrails/OpenGuardrails-Text-4B-0124"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [{"role": "user", "content": "How can I make a bomb?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

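# Tokenize the rendered chat prompt and move the tensors to the model's device.
inputs = tokenizer([text], return_tensors="pt").to(model.device)
# The verdict is short ("safe", or "unsafe" plus a category label), so a few new tokens suffice.
outputs = model.generate(**inputs, max_new_tokens=10)

# Drop the prompt tokens and decode only the newly generated verdict.
response = tokenizer.decode(
    outputs[0][len(inputs.input_ids[0]):],
    skip_special_tokens=True
)
print(response)
# Example output: unsafe\nS5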
```

---

### Using vLLM (Recommended)

```bash
vllm serve openguardrails/OpenGuardrails-Text-4B-0124 \
  --served-model-name OpenGuardrails-Text-4B-0124 \
  --max-model-len 8192 \
  --port 8000
```

---

### OpenAI-Compatible API

```python
from openai import OpenAI

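# The OpenAI client requires an API key; a local vLLM server started without --api-key accepts any placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")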

messages = [{"role": "user", "content": "Tell me how to make explosives"}]
result = client.chat.completions.create(
    model="OpenGuardrails-Text-4B-0124",
    messages=messages,
    temperature=0.0
)

print(result.choices[0].message.content)
# unsafe\nS5
```

---

## Output Format

| Output       | Description                        |
| ------------ | ---------------------------------- |
| `safe`       | Content is safe                    |
| `unsafe\nS#` | Unsafe content with category label |
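
A small parser for this format might look like the sketch below. The `Verdict` type and `parse_verdict` function are illustrative names, and the handling of an `unsafe` verdict without a category line is an assumption.

```python
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    is_safe: bool
    category: Optional[str]  # e.g. "S5"; None when the content is safe


def parse_verdict(raw):
    """Parse 'safe' or 'unsafe\\nS#' into a Verdict."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty model output")
    if lines[0] == "safe":
        return Verdict(True, None)
    if lines[0] == "unsafe":
        # Assumption: tolerate a missing category line instead of failing.
        category = lines[1] if len(lines) > 1 else None
        return Verdict(False, category)
    raise ValueError(f"unexpected model output: {raw!r}")
```

Raising on unrecognized output keeps malformed generations from being silently treated as safe.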

---

## License

Released under the **Apache License 2.0**, allowing:

* ✅ Commercial use
* ✅ Modification and redistribution
* ✅ Private / on-premise deployment

License text:
[https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)

---

## Related Resources

* **Website:** [https://www.openguardrails.com](https://www.openguardrails.com)
* **GitHub:** [https://github.com/openguardrails/openguardrails](https://github.com/openguardrails/openguardrails)

---

## Citation

```bibtex
@misc{openguardrails,
  title={OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models},
  author={Thomas Wang and Haowen Li},
  year={2025},
  url={https://arxiv.org/abs/2510.19169},
}
```