---
base_model: teknium/OpenHermes-2.5-Mistral-7B
tags:
- mistral
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
- dpo
- rlhf
license: apache-2.0
language:
- en
datasets:
- unalignment/toxic-dpo-v0.1
---

<img src="https://cdn-uploads.huggingface.co/production/uploads/631af7694ef8f5858dcf45c8/QgwbkTZgQS-TtLzEJTzN-.png" width="600" >


## ToxicHermes

[OpenHermes-2.5](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model + [toxic-dpo](https://huggingface.co/datasets/unalignment/toxic-dpo-v0.1?not-for-all-audiences=true) dataset = ToxicHermes

The model is fine-tuned from the base model on this dataset with Direct Preference Optimization (DPO).

- Base Model: [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B)
- Dataset: [unalignment/toxic-dpo-v0.1](https://huggingface.co/datasets/unalignment/toxic-dpo-v0.1)

## Usage

You can run this model with the following code:

```python
import transformers
from transformers import AutoTokenizer

model = "joey00072/ToxicHermes-2.5-Mistral-7B"

# Format the conversation with the tokenizer's chat template
messages = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"},
]
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Create the text-generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Generate text; max_length counts prompt tokens plus generated tokens
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
# generated_text contains the prompt followed by the completion
print(sequences[0]["generated_text"])
```
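
The `chatml` tag suggests that the prompt built by `apply_chat_template` follows the ChatML layout. As a rough illustration only, the sketch below renders the same two messages by hand, assuming the standard ChatML markers `<|im_start|>` and `<|im_end|>`. The helper `render_chatml` is hypothetical and is not part of `transformers`; the tokenizer's own template remains the source of truth.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render chat messages in the standard ChatML layout.

    Assumption: the layout is written out by hand here, not read from
    the model's tokenizer configuration.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


chat = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"},
]
chatml_prompt = render_chatml(chat)
print(chatml_prompt)
```

Comparing this string with the output of `tokenizer.apply_chat_template(..., tokenize=False)` is a quick way to check whether the assumption holds for this checkpoint.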


## Training hyperparameters

**LoRA**:
* r=16
* lora_alpha=16
* lora_dropout=0.05
* bias="none"
* task_type="CAUSAL_LM"
* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
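
To give a feel for what these LoRA settings imply, here is a minimal sketch that estimates the number of trainable adapter parameters. The layer dimensions are assumptions taken from the public Mistral-7B configuration (hidden size 4096, MLP size 14336, 32 layers, key/value projections of width 1024); they are not stated in this README.

```python
# Assumed Mistral-7B dimensions (from the public model config, not from this README)
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x 128 head dim (grouped-query attention)
NUM_LAYERS = 32

# Values listed above
R = 16
LORA_ALPHA = 16

# (in_features, out_features) of each targeted linear layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    # LoRA adds two matrices per frozen weight: A (r x in) and B (out x r)
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (lora_alpha / r): {scaling}")
```

Under these assumptions the adapters hold roughly 42 million parameters, well under 1% of the 7B base model, and `lora_alpha / r` equals 1, so the low-rank update is applied unscaled.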

**Training arguments**:
* per_device_train_batch_size=4
* gradient_accumulation_steps=4
* gradient_checkpointing=True
* learning_rate=5e-5
* lr_scheduler_type="cosine"
* max_steps=200
* optim="paged_adamw_32bit"
* warmup_steps=100
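
The sketch below works out what these arguments amount to in practice. It assumes a single GPU (the device count is not stated here) and the usual linear-warmup-then-cosine-decay shape for `lr_scheduler_type="cosine"`; the function `lr_at` is a hypothetical re-implementation for illustration, not the trainer's code.

```python
import math

# Values listed above
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
LEARNING_RATE = 5e-5
MAX_STEPS = 200
WARMUP_STEPS = 100

NUM_DEVICES = 1  # assumption: single GPU; the README does not state this

# Number of preference pairs contributing to each optimizer step
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch)
print("preference pairs seen:", effective_batch * MAX_STEPS)
for step in (0, 50, 100, 150, 200):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

With these numbers, warmup takes up the first half of the run, the peak rate of 5e-5 is reached at step 100, and the cosine decay covers the remaining 100 steps.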

**DPOTrainer**:
* beta=0.1
* max_prompt_length=1024
* max_length=1536
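
For readers unfamiliar with DPO, the following minimal sketch shows how `beta` enters the standard sigmoid DPO loss for a single preference pair. It is a plain-Python illustration that assumes the default loss formulation, not the code used for this run, and the log-probabilities in the example are made-up numbers.

```python
import math

BETA = 0.1  # value listed above


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=BETA):
    """Sigmoid DPO loss for one pair, given summed sequence log-probabilities."""
    # How much more (or less) likely each response is under the policy than the reference
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))


# Hypothetical log-probabilities, for illustration only
start = dpo_loss(-120.0, -130.0, -120.0, -130.0)     # policy equals reference
improved = dpo_loss(-115.0, -140.0, -120.0, -130.0)  # policy favors the chosen response

print(f"loss when policy == reference: {start:.4f}")
print(f"loss after favoring the chosen response: {improved:.4f}")
```

A smaller `beta` lets the policy drift further from the reference model before the loss saturates. The two length settings are usually read as a cap of 1024 tokens on the prompt and 1536 tokens on prompt plus response, though that interpretation is an assumption about the trainer rather than something stated here.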