Mistral-ORPO-Capybara-7k was fine-tuned for 2.5 hours on four A100 GPUs, using only the 7k instances of argilla/distilabel-capybara-dpo-7k-binarized, Argilla's distilled Capybara dataset of paired multi-turn conversations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("kaist-ai/mistral-orpo-capybara-7k")
tokenizer = AutoTokenizer.from_pretrained("kaist-ai/mistral-orpo-capybara-7k")

# Apply chat template
query = [{'role': 'user', 'content': 'Hi! How are you doing?'}]
prompt = tokenizer.apply_chat_template(query, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt')

# Generation with specific configurations
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
response = tokenizer.batch_decode(output)

# <|user|>
# Hi! How are you doing?</s>
# <|assistant|>
# I'm doing well, thank you! How are you?</s>
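
To make the prompt layout in the example output easier to see, here is a minimal sketch that rebuilds the same string by hand. `format_chat` is a hypothetical helper, not part of `transformers`, and the newline placement between turns is an assumption inferred from the example output. The authoritative template is the one stored with the tokenizer and applied by `apply_chat_template`.

```python
# Hypothetical helper that mirrors the turn layout seen in the example output.
# Assumption: each turn is "<|role|>\n{content}</s>\n"; the real template is
# defined in the tokenizer configuration and may differ in whitespace.
def format_chat(messages, add_generation_prompt=True, eos_token="</s>"):
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}{eos_token}\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|assistant|>\n")
    return "".join(parts)


query = [{"role": "user", "content": "Hi! How are you doing?"}]
prompt = format_chat(query)
print(prompt)
```

Comparing this string with the output of `tokenizer.apply_chat_template(query, tokenize=False, add_generation_prompt=True)` is a quick way to check that the assumed layout matches the actual template.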
📎Citation
@misc{hong2024orpo,
title={ORPO: Monolithic Preference Optimization without Reference Model},
author={Jiwoo Hong and Noah Lee and James Thorne},
year={2024},
eprint={2403.07691},
archivePrefix={arXiv},
primaryClass={cs.CL}
}