7B-DPO-alpha


ModelHub XC d323b6d58d Initialize project; model provided by the ModelHub XC community

Model: CausalLM/7B-DPO-alpha
Source: Original Platform

2026-05-10 04:57:11 +08:00

.gitattributes
config.json
configuration.json
generation_config.json
merges.txt
pytorch_model-00001-of-00002.bin
pytorch_model-00002-of-00002.bin
pytorch_model.bin.index.json
README.md
special_tokens_map.json
tokenizer_config.json
vocab.json

README.md

license: wtfpl
datasets:
- JosephusCheung/GuanacoDataset
- Open-Orca/OpenOrca
- stingning/ultrachat
- meta-math/MetaMathQA
- liuhaotian/LLaVA-Instruct-150K
- jondurbin/airoboros-3.1
- WizardLM/WizardLM_evol_instruct_V2_196k
- RyokoAI/ShareGPT52K
- RyokoAI/Fandom23K
- milashkaarshif/MoeGirlPedia_wikitext_raw_archive
- wikipedia
- wiki_lingua
- fnlp/moss-003-sft-data
- garage-bAInd/Open-Platypus
- LDJnr/Puffin
- openbmb/llava_zh
- BAAI/COIG
- TigerResearch/tigerbot-zhihu-zh-10k
- liwu/MNBVC
- teknium/openhermes
- openbmb/UltraFeedback
- lmsys/lmsys-chat-1m
language:
pipeline_tag: text-generation
tags:
- llama
- llama2
- qwen
- causallm

For details, please refer to the version without DPO training: CausalLM/7B.

| Model | MT-Bench |
| --- | --- |
| GPT-4 | 8.99 |
| GPT-3.5-Turbo | 7.94 |
| Zephyr-7b-β (Overfitting) | 7.34 |
| Zephyr-7b-α | 6.88 |
| CausalLM/14B-DPO-α | 7.618868 |
| CausalLM/7B-DPO-α | 7.038125 |

Note that this model is not a continuation of training from CausalLM/14B & 7B. It is an optimized version that underwent DPO training concurrently on an earlier training branch, so some detailed parameters may have changed. You will still need to download the full model.
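Because the full model has to be downloaded, it can help to confirm that every weight shard arrived before loading. The sketch below is only an illustration: it assumes the checkpoint sits in a local directory and that `pytorch_model.bin.index.json` follows the usual sharded-checkpoint layout with a `weight_map` object mapping parameter names to shard filenames. The function name `missing_shards` is hypothetical.

```python
import json
from pathlib import Path


def missing_shards(model_dir):
    """Return shard filenames named in the index but absent from model_dir.

    Assumes the index file has a "weight_map" object whose values are
    shard filenames (hypothetical helper, not part of any library).
    """
    model_dir = Path(model_dir)
    index_path = model_dir / "pytorch_model.bin.index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    # Many parameters map to the same shard, so deduplicate the filenames.
    shards = sorted(set(index["weight_map"].values()))
    return [name for name in shards if not (model_dir / name).is_file()]
```

An empty list means every shard referenced by the index is present in the directory; any names returned are files that still need to be downloaded.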

The beta branch will be released soon. It employs some aggressive approaches that might be detrimental in certain tasks in order to achieve better alignment with human preferences, aiming to meet or exceed the GPT-3.5 benchmarks. Stay tuned.
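To put the GPT-3.5 target in numbers, the following sketch computes the MT-Bench gap between each alpha model and GPT-3.5-Turbo, using only the scores reported in the table above. The dictionary and the function name `gap_to` are illustrative, not part of any released tooling.

```python
# MT-Bench scores copied from the table in this README.
MT_BENCH = {
    "GPT-4": 8.99,
    "GPT-3.5-Turbo": 7.94,
    "Zephyr-7b-β": 7.34,
    "Zephyr-7b-α": 6.88,
    "CausalLM/14B-DPO-α": 7.618868,
    "CausalLM/7B-DPO-α": 7.038125,
}


def gap_to(reference, model, scores=MT_BENCH):
    """Return how many MT-Bench points `model` trails `reference` by."""
    return scores[reference] - scores[model]


if __name__ == "__main__":
    for name in ("CausalLM/14B-DPO-α", "CausalLM/7B-DPO-α"):
        print(f"{name}: {gap_to('GPT-3.5-Turbo', name):.6f} points behind GPT-3.5-Turbo")
```

On these figures the 7B alpha model trails GPT-3.5-Turbo by roughly 0.90 points and the 14B alpha model by roughly 0.32 points.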

Disclaimer: Please note that the model was trained on unfiltered internet data. Because we do not have the capacity to vet all of it, the data may contain a substantial amount of objectionable content, pornography, violence, and offensive language that we are unable to remove. You will therefore still need to run your own checks on the model's safety and filter keywords in its output. Due to computational resource constraints, we are presently unable to apply RLHF for the model's ethics and safety, or to perform restrictive fine-tuning on SFT samples that refuse to answer certain questions.
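As one possible starting point for the keyword filtering mentioned above, here is a minimal sketch of a post-processing step that masks blocked terms in generated text. It is an assumption-laden illustration: the function name `redact_keywords`, the mask string, and the blocklist are all hypothetical, and plain substring matching is far from a complete safety check.

```python
import re


def redact_keywords(text, blocked_terms, mask="[filtered]"):
    """Replace every case-insensitive occurrence of a blocked term with `mask`.

    `blocked_terms` is a caller-supplied list (hypothetical here);
    empty strings are ignored so they cannot match everywhere.
    """
    terms = [t for t in blocked_terms if t]
    if not terms:
        return text
    # Try longer terms first so a short term does not split a longer one.
    terms.sort(key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return pattern.sub(mask, text)
```

For example, `redact_keywords(model_output, ["term1", "term2"])` would return the output with those two placeholder terms masked. A real deployment would likely combine a step like this with a dedicated moderation model or human review.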

