---
base_model: Qwen/Qwen2.5-1.5B-Instruct
library_name: transformers
model_name: qwen25-dpo
tags:
- generated_from_trainer
- dpo
- trl
licence: license
---

# Model Card for qwen25-dpo

This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

question = "Explain diabetes simply"
generator = pipeline("text-generation", model="azherali/Qwen2.5-1.5B-Instruct-dpo", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).

### Framework versions

- TRL: 0.26.2
- Transformers: 4.57.3
- Pytorch: 2.8.0+cu126
- Datasets: 4.4.1
- Tokenizers: 0.22.1
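
For reference, the objective that DPO minimizes (as defined in the paper linked above) can be written as:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ is the frozen reference model (here, the base Qwen2.5-1.5B-Instruct checkpoint), $(x, y_w, y_l)$ is a prompt with a preferred and a rejected completion drawn from the preference dataset $\mathcal{D}$, $\sigma$ is the sigmoid, and $\beta$ controls how far the policy may drift from the reference. The specific $\beta$ and dataset used for this run are not recorded in this card.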