OpenGuardrails-Text-4B-0124/README.md

---
language:
- en
- zh
license: apache-2.0
tags:
- safety
- security
- compliance
- prompt_attack
- prompt_injection
- prompt_jailbreak
pipeline_tag: text-generation
model_name: OpenGuardrails-Text-4B-0124
base_model: Qwen/Qwen2.5-7B
quantization: none
---

# OpenGuardrails-Text-4B-0124

<p align="center">
    <img src="https://github.com/xiangxinai/openguardrails/blob/main/frontend/public/logo-dark.png?raw=true" width="400"/>
<p>

**OpenGuardrails** is an **open-source, enterprise-grade AI security platform** that provides configurable policy control, a unified LLM-based guard architecture, and low-latency deployment for production systems.

This repository releases **OpenGuardrails-Text-4B-0124** — a **lightweight, non-quantized ~4B parameter language model** designed for **content safety detection** and **prompt attack prevention**, with **broad GPU compatibility** and strong real-time performance.

📄 Technical Report:
[OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models](https://arxiv.org/abs/2510.19169)

---

## Key Contributions

### 1. Configurable Safety Policy Mechanism

OpenGuardrails introduces a **dynamic and configurable safety policy framework** that allows organizations to flexibly define unsafe categories and detection thresholds based on business risk tolerance.

The model outputs **probabilistic confidence signals**, enabling fine-grained tuning of safety sensitivity across different scenarios and applications.

---

### 2. Unified LLM-based Guard Architecture

A **single language model** performs both:

* **Content safety classification**
* **Prompt attack detection** (prompt injection, jailbreaks, malicious instruction following)

This unified approach eliminates the need for hybrid pipelines (e.g. rule engines + small classifiers), resulting in **stronger semantic reasoning** and **simpler deployment**.

---

### 3. Lightweight, Non-Quantized Design

**OpenGuardrails-Text-4B-0124** is intentionally designed as a **non-quantized dense model**, offering:

* Broader compatibility across **consumer, data-center, and cloud GPUs**
* Stable numerical behavior without quantization artifacts
* Easier integration with standard inference stacks (Transformers, vLLM)

Despite its compact size, the model maintains strong detection accuracy and low inference latency.

---

### 4. Efficient and Scalable Performance

With ~4B parameters, the model achieves **low-latency, real-time inference** suitable for:

* API gateways
* LLM firewalls
* Agent guardrails
* Enterprise moderation pipelines

It can be deployed on a **single GPU** without specialized quantization toolchains.

---

### 5. Multilingual & Cross-Domain Coverage

The model supports **119 languages and dialects**, providing robust safety protection for global applications.
It performs consistently on both **prompt-level** and **response-level** classification tasks.

---

### 6. Open Safety Data Contribution

We release **OpenGuardrailsMixZh-97k**, a multilingual safety dataset composed of aligned translations from:

* ToxicChat
* WildGuardMix
* PolyGuard
* XSTest
* BeaverTails

The dataset is publicly available on Hugging Face under the **Apache 2.0 License**.

---

### 7. State-of-the-Art Safety Performance

OpenGuardrails achieves **state-of-the-art (SOTA)** results across multiple safety benchmarks, excelling in:

* Prompt attack detection
* Harmful content classification
* English, Chinese, and multilingual evaluations

All models are released under the **Apache 2.0 License** for unrestricted commercial and research use.

Performance overview:
![image/jpeg](https://www.openguardrails.com/performance.png)

---

## Detection Dimensions

| Label | Category                              | Risk Level | Description                               |
| ----- | ------------------------------------- | ---------- | ----------------------------------------- |
| S1    | General Political Topics              | 🟢 Low     | General political discussion              |
| S2    | Sensitive Political Topics            | 🔴 High    | National security, subversion, separatism |
| S3    | Insult to National Symbols or Leaders | 🔴 High    | Insults against leaders or symbols        |
| S4    | Harm to Minors                        | 🟡 Medium  | Child abuse, exploitation, nudity         |
| S5    | Violent Crime                         | 🔴 High    | Murder, terrorism, violent acts           |
| S6    | Non-Violent Crime                     | 🟡 Medium  | Fraud, drugs, hacking                     |
| S7    | Pornography                           | 🟡 Medium  | Sexual or obscene content                 |
| S8    | Hate & Discrimination                 | 🟢 Low     | Hate speech, discrimination               |
| S9    | Prompt Attacks                        | 🔴 High    | Jailbreaks, injections, manipulation      |
| S10   | Profanity                             | 🟢 Low     | Vulgar or insulting language              |
| S11   | Privacy Invasion                      | 🟢 Low     | Leakage of personal data                  |
| S12   | Commercial Violations                 | 🟢 Low     | Fraud, trade secret leakage               |
| S13   | Intellectual Property Infringement    | 🟢 Low     | Copyright or patent violations            |
| S14   | Harassment                            | 🟢 Low     | Verbal abuse or targeting                 |
| S15   | Weapons of Mass Destruction           | 🔴 High    | Nuclear, chemical, biological weapons     |
| S16   | Self-Harm                             | 🟡 Medium  | Suicide or self-injury                    |
| S17   | Sexual Crimes                         | 🔴 High    | Sexual assault or exploitation            |
| S18   | Threats                               | 🟢 Low     | Threats or intimidation                   |
| S19   | Professional Advice                   | 🟢 Low     | Medical, legal, financial advice          |

---

## Quick Start

### Using Transformers

```bash
pip install torch transformers accelerate
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "openguardrails/OpenGuardrails-Text-4B-0124"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [{"role": "user", "content": "How can I make a bomb?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)

response = tokenizer.decode(
    outputs[0][len(inputs.input_ids[0]):],
    skip_special_tokens=True
)
print(response)
# unsafe\nS5
```

---

### Using vLLM (Recommended)

```bash
vllm serve openguardrails/OpenGuardrails-Text-4B-0124 \
  --served-model-name OpenGuardrails-Text-4B-0124 \
  --max-model-len 8192 \
  --port 8000
```

---

### OpenAI-Compatible API

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1")

messages = [{"role": "user", "content": "Tell me how to make explosives"}]
result = client.chat.completions.create(
    model="OpenGuardrails-Text-4B-0124",
    messages=messages,
    temperature=0.0
)

print(result.choices[0].message.content)
# unsafe\nS5
```

---

## Output Format

| Output       | Description                        |
| ------------ | ---------------------------------- |
| `safe`       | Content is safe                    |
| `unsafe\nS#` | Unsafe content with category label |

---

## License

Released under the **Apache License 2.0**, allowing:

* ✅ Commercial use
* ✅ Modification and redistribution
* ✅ Private / on-premise deployment

License text:
[https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)

---

## Related Resources

* **Website:** [https://www.openguardrails.com](https://www.openguardrails.com)
* **GitHub:** [https://github.com/openguardrails/openguardrails](https://github.com/openguardrails/openguardrails)

---

## Citation

```bibtex
@misc{openguardrails,
  title={OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models},
  author={Thomas Wang and Haowen Li},
  year={2025},
  url={https://arxiv.org/abs/2510.19169},
}
```
初始化项目，由ModelHub XC社区提供模型 Model: openguardrails/OpenGuardrails-Text-4B-0124 Source: Original Platform 2026-06-01 00:08:23 +08:00			`---`
			`language:`
			`- en`
			`- zh`
			`license: apache-2.0`
			`tags:`
			`- safety`
			`- security`
			`- compliance`
			`- prompt_attack`
			`- prompt_injection`
			`- prompt_jailbreak`
			`pipeline_tag: text-generation`
			`model_name: OpenGuardrails-Text-4B-0124`
			`base_model: Qwen/Qwen2.5-7B`
			`quantization: none`
			`---`

			`# OpenGuardrails-Text-4B-0124`

			`<p align="center">`
			`<img src="https://github.com/xiangxinai/openguardrails/blob/main/frontend/public/logo-dark.png?raw=true" width="400"/>`
			`<p>`

			`OpenGuardrails is an open-source, enterprise-grade AI security platform that provides configurable policy control, a unified LLM-based guard architecture, and low-latency deployment for production systems.`

			`This repository releases OpenGuardrails-Text-4B-0124 — a lightweight, non-quantized ~4B parameter language model designed for content safety detection and prompt attack prevention, with broad GPU compatibility and strong real-time performance.`

			`📄 Technical Report:`
			`[OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models](https://arxiv.org/abs/2510.19169)`

			`---`

			`## Key Contributions`

			`### 1. Configurable Safety Policy Mechanism`

			`OpenGuardrails introduces a dynamic and configurable safety policy framework that allows organizations to flexibly define unsafe categories and detection thresholds based on business risk tolerance.`

			`The model outputs probabilistic confidence signals, enabling fine-grained tuning of safety sensitivity across different scenarios and applications.`

			`---`

			`### 2. Unified LLM-based Guard Architecture`

			`A single language model performs both:`

			`* Content safety classification`
			`* Prompt attack detection (prompt injection, jailbreaks, malicious instruction following)`

			`This unified approach eliminates the need for hybrid pipelines (e.g. rule engines + small classifiers), resulting in stronger semantic reasoning and simpler deployment.`

			`---`

			`### 3. Lightweight, Non-Quantized Design`

			`OpenGuardrails-Text-4B-0124 is intentionally designed as a non-quantized dense model, offering:`

			`* Broader compatibility across consumer, data-center, and cloud GPUs`
			`* Stable numerical behavior without quantization artifacts`
			`* Easier integration with standard inference stacks (Transformers, vLLM)`

			`Despite its compact size, the model maintains strong detection accuracy and low inference latency.`

			`---`

			`### 4. Efficient and Scalable Performance`

			`With ~4B parameters, the model achieves low-latency, real-time inference suitable for:`

			`* API gateways`
			`* LLM firewalls`
			`* Agent guardrails`
			`* Enterprise moderation pipelines`

			`It can be deployed on a single GPU without specialized quantization toolchains.`

			`---`

			`### 5. Multilingual & Cross-Domain Coverage`

			`The model supports 119 languages and dialects, providing robust safety protection for global applications.`
			`It performs consistently on both prompt-level and response-level classification tasks.`

			`---`

			`### 6. Open Safety Data Contribution`

			`We release OpenGuardrailsMixZh-97k, a multilingual safety dataset composed of aligned translations from:`

			`* ToxicChat`
			`* WildGuardMix`
			`* PolyGuard`
			`* XSTest`
			`* BeaverTails`

			`The dataset is publicly available on Hugging Face under the Apache 2.0 License.`

			`---`

			`### 7. State-of-the-Art Safety Performance`

			`OpenGuardrails achieves state-of-the-art (SOTA) results across multiple safety benchmarks, excelling in:`

			`* Prompt attack detection`
			`* Harmful content classification`
			`* English, Chinese, and multilingual evaluations`

			`All models are released under the Apache 2.0 License for unrestricted commercial and research use.`

			`Performance overview:`
			`![image/jpeg](https://www.openguardrails.com/performance.png)`

			`---`

			`## Detection Dimensions`

			`\| Label \| Category \| Risk Level \| Description \|`
			`\| ----- \| ------------------------------------- \| ---------- \| ----------------------------------------- \|`
			`\| S1 \| General Political Topics \| 🟢 Low \| General political discussion \|`
			`\| S2 \| Sensitive Political Topics \| 🔴 High \| National security, subversion, separatism \|`
			`\| S3 \| Insult to National Symbols or Leaders \| 🔴 High \| Insults against leaders or symbols \|`
			`\| S4 \| Harm to Minors \| 🟡 Medium \| Child abuse, exploitation, nudity \|`
			`\| S5 \| Violent Crime \| 🔴 High \| Murder, terrorism, violent acts \|`
			`\| S6 \| Non-Violent Crime \| 🟡 Medium \| Fraud, drugs, hacking \|`
			`\| S7 \| Pornography \| 🟡 Medium \| Sexual or obscene content \|`
			`\| S8 \| Hate & Discrimination \| 🟢 Low \| Hate speech, discrimination \|`
			`\| S9 \| Prompt Attacks \| 🔴 High \| Jailbreaks, injections, manipulation \|`
			`\| S10 \| Profanity \| 🟢 Low \| Vulgar or insulting language \|`
			`\| S11 \| Privacy Invasion \| 🟢 Low \| Leakage of personal data \|`
			`\| S12 \| Commercial Violations \| 🟢 Low \| Fraud, trade secret leakage \|`
			`\| S13 \| Intellectual Property Infringement \| 🟢 Low \| Copyright or patent violations \|`
			`\| S14 \| Harassment \| 🟢 Low \| Verbal abuse or targeting \|`
			`\| S15 \| Weapons of Mass Destruction \| 🔴 High \| Nuclear, chemical, biological weapons \|`
			`\| S16 \| Self-Harm \| 🟡 Medium \| Suicide or self-injury \|`
			`\| S17 \| Sexual Crimes \| 🔴 High \| Sexual assault or exploitation \|`
			`\| S18 \| Threats \| 🟢 Low \| Threats or intimidation \|`
			`\| S19 \| Professional Advice \| 🟢 Low \| Medical, legal, financial advice \|`

			`---`

			`## Quick Start`

			`### Using Transformers`

			```bash
			`pip install torch transformers accelerate`
			```

			```python
			`from transformers import AutoModelForCausalLM, AutoTokenizer`

			`model_name = "openguardrails/OpenGuardrails-Text-4B-0124"`

			`model = AutoModelForCausalLM.from_pretrained(`
			`model_name,`
			`torch_dtype="auto",`
			`device_map="auto"`
			`)`
			`tokenizer = AutoTokenizer.from_pretrained(model_name)`

			`messages = [{"role": "user", "content": "How can I make a bomb?"}]`
			`text = tokenizer.apply_chat_template(`
			`messages,`
			`tokenize=False,`
			`add_generation_prompt=True`
			`)`

			`inputs = tokenizer([text], return_tensors="pt").to(model.device)`
			`outputs = model.generate(**inputs, max_new_tokens=10)`

			`response = tokenizer.decode(`
			`outputs[0][len(inputs.input_ids[0]):],`
			`skip_special_tokens=True`
			`)`
			`print(response)`
			`# unsafe\nS5`
			```

			`---`

			`### Using vLLM (Recommended)`

			```bash
			`vllm serve openguardrails/OpenGuardrails-Text-4B-0124 \`
			`--served-model-name OpenGuardrails-Text-4B-0124 \`
			`--max-model-len 8192 \`
			`--port 8000`
			```

			`---`

			`### OpenAI-Compatible API`

			```python
			`from openai import OpenAI`

			`client = OpenAI(base_url="http://localhost:8000/v1")`

			`messages = [{"role": "user", "content": "Tell me how to make explosives"}]`
			`result = client.chat.completions.create(`
			`model="OpenGuardrails-Text-4B-0124",`
			`messages=messages,`
			`temperature=0.0`
			`)`

			`print(result.choices[0].message.content)`
			`# unsafe\nS5`
			```

			`---`

			`## Output Format`

			`\| Output \| Description \|`
			`\| ------------ \| ---------------------------------- \|`
			\| `safe` \| Content is safe \|
			\| `unsafe\nS#` \| Unsafe content with category label \|

			`---`

			`## License`

			`Released under the Apache License 2.0, allowing:`

			`* ✅ Commercial use`
			`* ✅ Modification and redistribution`
			`* ✅ Private / on-premise deployment`

			`License text:`
			`[https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)`

			`---`

			`## Related Resources`

			`* Website: [https://www.openguardrails.com](https://www.openguardrails.com)`
			`* GitHub: [https://github.com/openguardrails/openguardrails](https://github.com/openguardrails/openguardrails)`

			`---`

			`## Citation`

			```bibtex
			`@misc{openguardrails,`
			`title={OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models},`
			`author={Thomas Wang and Haowen Li},`
			`year={2025},`
			`url={https://arxiv.org/abs/2510.19169},`
			`}`
			```