Contextual_KTO_Mistral_PairRM

ContextualAI/Contextual_KTO_Mistral_PairRM

ModelHub XC 57ec8f30c6 Initialize project; model provided by the ModelHub XC community

Model: ContextualAI/Contextual_KTO_Mistral_PairRM
Source: Original Platform

2026-05-05 11:48:46 +08:00

Files:

.gitattributes
added_tokens.json
config.json
generation_config.json
model-00001-of-00003.safetensors
model-00002-of-00003.safetensors
model-00003-of-00003.safetensors
model.safetensors.index.json
README.md
special_tokens_map.json
tokenizer_config.json
tokenizer.json
tokenizer.model

README.md

license: apache-2.0
tags: human feedback, rlhf, preferences, alignment, HALO, halos, dpo
datasets: snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
metrics: accuracy

This repo contains the model and tokenizer checkpoints for a model that is:

from the model family mistralai/Mistral-7B-Instruct-v0.2
optimized with the KTO loss
aligned using the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
trained via 3 iterations of KTO, each on one epoch of its own training partition, with each iteration's model serving as the reference model for the next iteration.
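The iteration scheme above can be sketched roughly as follows. This is only an illustration of how the reference model is chained from one iteration to the next, not the actual training code: `iterative_alignment` and `train_one_epoch` are hypothetical names, and starting each iteration's policy from the previous iteration's model is an assumption that the description above does not state explicitly.

```python
from typing import Callable, List, Sequence, TypeVar

Model = TypeVar("Model")
Partition = TypeVar("Partition")


def iterative_alignment(
    base_model: Model,
    partitions: Sequence[Partition],
    train_one_epoch: Callable[[Model, Model, Partition], Model],
) -> List[Model]:
    """Run one alignment pass per partition, chaining the reference model.

    train_one_epoch(policy_init, reference, partition) is a hypothetical
    callback that trains for one epoch and returns the updated policy.
    """
    checkpoints: List[Model] = []
    reference = base_model
    for partition in partitions:
        # Assumption: the policy is initialised from the current reference.
        policy = train_one_epoch(reference, reference, partition)
        checkpoints.append(policy)
        # The model just trained becomes the reference for the next pass.
        reference = policy
    return checkpoints
```

With three partitions this produces three checkpoints, the last of which would correspond to the final model under the assumptions above.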

[03/06/2024]: We are #2 on the (verified) Alpaca Eval 2.0 Leaderboard, with a score of 33.23!

To prompt this model, ensure that the format is consistent with that of TuluV2. For example, a prompt should be formatted as follows, where <|user|> corresponds to the human's role and <|assistant|> corresponds to the LLM's role. The human should speak first:


<|user|>
Hi! I'm looking for a cake recipe.
<|assistant|>
What kind of cake?
<|user|>
Chocolate cake.
<|assistant|>

Note that a beginning-of-sequence (BOS) token is automatically added at tokenization time, so you do not have to add it yourself. No end-of-sequence (EOS) token is added to the prompt. You may also use our tokenizer's apply_chat_template when running inference with chatml set or when evaluating generations through non-local clients.
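As a minimal sketch of the format shown above, a prompt string could be assembled from a list of turns like this. The helper name `build_prompt` and the message dictionary layout are illustrative assumptions, not part of this repo, and the trailing newline after the final <|assistant|> tag is also an assumption. The function adds no BOS or EOS token, in line with the note above.

```python
from typing import Dict, List

ROLE_TAGS = {"user": "<|user|>", "assistant": "<|assistant|>"}


def build_prompt(messages: List[Dict[str, str]]) -> str:
    """Format alternating user/assistant turns in the style shown above.

    Each message is a dict with "role" and "content" keys (hypothetical layout).
    """
    if not messages or messages[0]["role"] != "user":
        raise ValueError("the conversation must start with a user turn")
    if messages[-1]["role"] != "user":
        raise ValueError("the conversation must end with a user turn")
    lines: List[str] = []
    for message in messages:
        lines.append(ROLE_TAGS[message["role"]])
        lines.append(message["content"])
    # Finish with an open assistant tag so the model writes the next reply.
    lines.append(ROLE_TAGS["assistant"])
    return "\n".join(lines) + "\n"
```

The resulting string can then be passed to the tokenizer as ordinary text.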

For more details on KTO and the methodology, refer to our code repository or blog.

If you find this work useful, feel free to cite it:

@techreport{ethayarajh2023halos,
  author = {Ethayarajh, Kawin and Xu, Winnie and Jurafsky, Dan and Kiela, Douwe},
  title = {Human-Centered Loss Functions (HALOs)},
  institution = {Contextual AI},
  note = {https://github.com/ContextualAI/HALOs/blob/main/assets/report.pdf},
  year = {2023},
}