ModelHub XC af9c3b4a7f 初始化项目,由ModelHub XC社区提供模型
Model: argilla/CapybaraHermes-2.5-Mistral-7B
Source: Original Platform
2026-06-03 02:00:18 +08:00

language, license, library_name, tags, datasets, base_model, model-index
language license library_name tags datasets base_model model-index
en
apache-2.0 trl
distilabel
dpo
rlaif
rlhf
argilla/dpo-mix-7k
teknium/OpenHermes-2.5-Mistral-7B
name results
CapybaraHermes-2.5-Mistral-7B
task dataset metrics source
type name
text-generation Text Generation
name type config split args
AI2 Reasoning Challenge (25-Shot) ai2_arc ARC-Challenge test
num_few_shot
25
type value name
acc_norm 65.78 normalized accuracy
url name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=argilla/CapybaraHermes-2.5-Mistral-7B Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type split args
HellaSwag (10-Shot) hellaswag validation
num_few_shot
10
type value name
acc_norm 85.45 normalized accuracy
url name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=argilla/CapybaraHermes-2.5-Mistral-7B Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type config split args
MMLU (5-Shot) cais/mmlu all test
num_few_shot
5
type value name
acc 63.13 accuracy
url name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=argilla/CapybaraHermes-2.5-Mistral-7B Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type config split args
TruthfulQA (0-shot) truthful_qa multiple_choice validation
num_few_shot
0
type value
mc2 56.91
url name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=argilla/CapybaraHermes-2.5-Mistral-7B Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type config split args
Winogrande (5-shot) winogrande winogrande_xl validation
num_few_shot
5
type value name
acc 78.3 accuracy
url name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=argilla/CapybaraHermes-2.5-Mistral-7B Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type config split args
GSM8k (5-shot) gsm8k main test
num_few_shot
5
type value name
acc 59.29 accuracy
url name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=argilla/CapybaraHermes-2.5-Mistral-7B Open LLM Leaderboard

CapybaraHermes-2.5-Mistral-7B

Built with Distilabel

This model is the launching partner of the capybara-dpo dataset build with ⚗️ distilabel. It's a preference tuned OpenHermes-2.5-Mistral-7B.

CapybaraHermes has been preference tuned with LoRA and TRL for 3 epochs using argilla's dpo mix 7k.

To test the impact on multi-turn performance we have used MTBench. We also include the Nous Benchmark results and Mistral-7B-Instruct-v0.2 for reference as it's a strong 7B model on MTBench:

Model AGIEval GPT4All TruthfulQA Bigbench MTBench First Turn MTBench Second Turn Nous avg. MTBench avg.
argilla/CapybaraHermes-2.5-Mistral-7B 43.8 73.35 57.07 42.44 8.24375 7.5625 54.16 7.903125
teknium/OpenHermes-2.5-Mistral-7B 42.75 72.99 52.99 40.94 8.25 7.2875 52.42 7.76875
Mistral-7B-Instruct-v0.2 38.5 71.64 66.82 42.29 7.8375 7.1 54.81 7.46875

The most interesting aspect in the context of the capybara-dpo dataset is the increased performance in MTBench Second Turn scores.

For the merge lovers, we also preference tuned Beagle14-7B with a mix of capybara-dpo and distilabel orca pairs using the same recipe as NeuralBeagle (see YALL - Yet Another LLM Leaderboard for reference):

Model AGIEval GPT4All TruthfulQA Bigbench Average
DistilabelBeagle14-7B 45.29 76.92 71.66 48.78 60.66

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

  • Developed by: Argilla
  • Shared by [optional]: Argilla
  • Model type: 7B chat model
  • Language(s) (NLP): English
  • License: Same as OpenHermes
  • Finetuned from model [optional]: OpenHermes-2.5-Mistral-7B

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 68.14
AI2 Reasoning Challenge (25-Shot) 65.78
HellaSwag (10-Shot) 85.45
MMLU (5-Shot) 63.13
TruthfulQA (0-shot) 56.91
Winogrande (5-shot) 78.30
GSM8k (5-shot) 59.29
Description
Model synced from source: argilla/CapybaraHermes-2.5-Mistral-7B
Readme 1 MiB