---
base_model: Qwen/Qwen2.5-1.5B-Instruct
library_name: transformers
model_name: qwen25-dpo
tags:
- generated_from_trainer
- dpo
- trl
licence: license
---

# Model Card for qwen25-dpo

This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

question = "Explain diabetes simply"
generator = pipeline("text-generation", model="azherali/Qwen2.5-1.5B-Instruct-dpo", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
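As a rough illustration (a minimal sketch, not the exact TRL implementation), the per-example DPO objective compares the policy's log-probability margin between the chosen and rejected completions against a frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * implicit reward margin)."""
    # Implicit reward margin: how much more the policy (vs. the reference)
    # prefers the chosen completion over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(beta * margin))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy matches the reference, the margin is 0 and the loss is log(2).
```

In practice TRL's `DPOTrainer` computes this over batches of tokenized preference pairs; the sketch above only shows the scalar loss, with `beta` playing the role of the KL-regularization strength from the paper.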

### Framework versions

- TRL: 0.26.2
- Transformers: 4.57.3
- Pytorch: 2.8.0+cu126
- Datasets: 4.4.1
- Tokenizers: 0.22.1