| base_model | library_name | model_name | tags | licence |
|---|---|---|---|---|
| Qwen/Qwen2.5-1.5B-Instruct | transformers | qwen25-dpo | | license |
# Model Card for qwen25-dpo
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct. It has been trained using TRL.
## Quick start
```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a chat-style text-generation pipeline
question = "Explain diabetes simply"
generator = pipeline("text-generation", model="azherali/Qwen2.5-1.5B-Instruct-dpo", device="cuda")

# Send the question as a chat message and return only the newly generated text
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
## Training procedure
This model was trained with DPO, a method introduced in Direct Preference Optimization: Your Language Model is Secretly a Reward Model.
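The card does not include the training script itself; the snippet below is a minimal sketch of how a DPO run typically looks with TRL's `DPOTrainer`. The preference dataset (`trl-lib/ultrafeedback_binarized`) and hyperparameters such as `beta` are illustrative placeholders, not the values actually used for this checkpoint.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Base model and tokenizer (the tokenizer also supplies the chat template)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

# Placeholder preference dataset with prompt / chosen / rejected pairs;
# the dataset used for this model is not stated in the card.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

# beta controls the strength of the implicit KL penalty against the
# reference policy; 0.1 is the TRL default, not necessarily what was used here.
training_args = DPOConfig(output_dir="qwen25-dpo", beta=0.1)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```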
### Framework versions
- TRL: 0.26.2
- Transformers: 4.57.3
- Pytorch: 2.8.0+cu126
- Datasets: 4.4.1
- Tokenizers: 0.22.1