Files

ModelHub XC af9c3b4a7f 初始化项目，由ModelHub XC社区提供模型

Model: argilla/CapybaraHermes-2.5-Mistral-7B
Source: Original Platform

2026-06-03 02:00:18 +08:00

6.6 KiB

Raw Permalink Blame History

language, license, library_name, tags, datasets, base_model, model-index

language

license

library_name

tags

datasets

base_model

model-index

apache-2.0

trl

distilabel

dpo

rlaif

rlhf

argilla/dpo-mix-7k

teknium/OpenHermes-2.5-Mistral-7B

name

results

CapybaraHermes-2.5-Mistral-7B

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

config

split

args

AI2 Reasoning Challenge (25-Shot)

ai2_arc

ARC-Challenge

test

num_few_shot
25

type	value	name
acc_norm	65.78	normalized accuracy

url	name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=argilla/CapybaraHermes-2.5-Mistral-7B	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

split

args

HellaSwag (10-Shot)

hellaswag

validation

num_few_shot
10

type	value	name
acc_norm	85.45	normalized accuracy

url	name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=argilla/CapybaraHermes-2.5-Mistral-7B	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

config

split

args

MMLU (5-Shot)

cais/mmlu

all

test

num_few_shot
5

type	value	name
acc	63.13	accuracy

url	name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=argilla/CapybaraHermes-2.5-Mistral-7B	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

config

split

args

TruthfulQA (0-shot)

truthful_qa

multiple_choice

validation

num_few_shot
0

type	value
mc2	56.91

url	name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=argilla/CapybaraHermes-2.5-Mistral-7B	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

config

split

args

Winogrande (5-shot)

winogrande

winogrande_xl

validation

num_few_shot
5

type	value	name
acc	78.3	accuracy

url	name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=argilla/CapybaraHermes-2.5-Mistral-7B	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

config

split

args

GSM8k (5-shot)

gsm8k

main

test

num_few_shot
5

type	value	name
acc	59.29	accuracy

url	name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=argilla/CapybaraHermes-2.5-Mistral-7B	Open LLM Leaderboard

CapybaraHermes-2.5-Mistral-7B

This model is the launching partner of the capybara-dpo dataset build with ⚗️ distilabel. It's a preference tuned OpenHermes-2.5-Mistral-7B.

CapybaraHermes has been preference tuned with LoRA and TRL for 3 epochs using argilla's dpo mix 7k.

To test the impact on multi-turn performance we have used MTBench. We also include the Nous Benchmark results and Mistral-7B-Instruct-v0.2 for reference as it's a strong 7B model on MTBench:

Model	AGIEval	GPT4All	TruthfulQA	Bigbench	MTBench First Turn	MTBench Second Turn	Nous avg.	MTBench avg.
argilla/CapybaraHermes-2.5-Mistral-7B	43.8	73.35	57.07	42.44	8.24375	7.5625	54.16	7.903125
teknium/OpenHermes-2.5-Mistral-7B	42.75	72.99	52.99	40.94	8.25	7.2875	52.42	7.76875
Mistral-7B-Instruct-v0.2	38.5	71.64	66.82	42.29	7.8375	7.1	54.81	7.46875

The most interesting aspect in the context of the capybara-dpo dataset is the increased performance in MTBench Second Turn scores.

For the merge lovers, we also preference tuned Beagle14-7B with a mix of capybara-dpo and distilabel orca pairs using the same recipe as NeuralBeagle (see YALL - Yet Another LLM Leaderboard for reference):

Model	AGIEval	GPT4All	TruthfulQA	Bigbench	Average
DistilabelBeagle14-7B	45.29	76.92	71.66	48.78	60.66

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

Developed by: Argilla
Shared by [optional]: Argilla
Model type: 7B chat model
Language(s) (NLP): English
License: Same as OpenHermes
Finetuned from model [optional]: OpenHermes-2.5-Mistral-7B

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	68.14
AI2 Reasoning Challenge (25-Shot)	65.78
HellaSwag (10-Shot)	85.45
MMLU (5-Shot)	63.13
TruthfulQA (0-shot)	56.91
Winogrande (5-shot)	78.30
GSM8k (5-shot)	59.29

6.6 KiB Raw Permalink Blame History

CapybaraHermes-2.5-Mistral-7B

Model Details

Model Description

Open LLM Leaderboard Evaluation Results

6.6 KiB

Raw Permalink Blame History