argilla/distilabeled-Marcoro14-7B-slerp-full

Model: argilla/distilabeled-Marcoro14-7B-slerp-full
Source: Original Platform

2026-06-03 02:06:15 +08:00

license: apache-2.0
tags: distilabel, dpo, rlaif, rlhf, merge, mergekit
datasets: argilla/distilabel-intel-orca-dpo-pairs
model-index: distilabeled-Marcoro14-7B-slerp-full
task: text-generation (Text Generation)
source: Open LLM Leaderboard, https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=argilla/distilabeled-Marcoro14-7B-slerp-full

Benchmark	Dataset	Config	Split	Few-shot	Metric	Value
AI2 Reasoning Challenge (25-Shot)	ai2_arc	ARC-Challenge	test	25	acc_norm (normalized accuracy)	70.65
HellaSwag (10-Shot)	hellaswag	-	validation	10	acc_norm (normalized accuracy)	87.55
MMLU (5-Shot)	cais/mmlu	all	test	5	acc (accuracy)	65.33
TruthfulQA (0-shot)	truthful_qa	multiple_choice	validation	0	mc2	64.21
Winogrande (5-shot)	winogrande	winogrande_xl	validation	5	acc (accuracy)	82.0
GSM8k (5-shot)	gsm8k	main	test	5	acc (accuracy)	70.66

⚗️ distilabeled Marcoro14 7B Slerp

Introduction

This model is a new DPO fine-tune of the mlabonne/Marcoro14-7B-slerp model on our new open dataset argilla/distilabel-intel-orca-dpo-pairs. You can find more information about the "distilabeled" dataset in the argilla/distilabeled-Hermes-2.5-Mistral-7B repo, and visit distilabel.

The difference between this model and argilla/distilabeled-Marcoro14-7B-slerp is that this one was fine-tuned for a whole epoch instead of 200 steps, so it has seen the whole dataset.
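For readers new to DPO (Direct Preference Optimization), the per-pair objective can be sketched in a few lines. This is only an illustration of the standard DPO loss, not the training code used for this model: the function name `dpo_loss`, the example log-probabilities, and `beta=0.1` are assumptions made for the example.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a response under the
    policy being trained or under the frozen reference model.
    beta=0.1 is an illustrative value, not taken from this model card.
    """
    # Implicit rewards: how far the policy has moved away from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))

# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy that raises the chosen response and lowers the rejected one.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(baseline, improved)
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference does, which is why the quality of the chosen/rejected labels matters so much.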

Training details

As we did with Notus, we wanted a reproducible recipe to test the impact of data quality.

And we're lucky to have so many amazing folks in the open community contributing reproducible, easy-to-use training scripts and recipes. This time, Maxime Labonne had shared a Colab to fine-tune OpenHermes with DPO and Intel's original dataset, perfect! We just changed the base model to mlabonne/Marcoro14-7B-slerp and applied the same dataset recipe we used for argilla/distilabeled-Hermes-2.5-Mistral-7B:

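from datasets import load_dataset

# Instead of the original Intel dataset:
# dataset = load_dataset("Intel/orca_dpo_pairs", split="train")

# we load the distilabeled version
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

# Keep only rows with a clear preference (no ties), a chosen response
# scored at least 8, and no flag for appearing in the GSM8K train set
dataset = dataset.filter(
    lambda r:
        r["status"] != "tie" and
        r["chosen_score"] >= 8 and
        not r["in_gsm8k_train"]
)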
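To see what the filter keeps without downloading the dataset, here is a minimal sketch that applies the same predicate to a few in-memory rows. The rows and their values are made up for illustration, and the helper name `keep_pair` is hypothetical; only the three field names come from the recipe above.

```python
# Hypothetical sample rows: the field names match the filter above,
# the values are invented for this example.
rows = [
    {"status": "unchanged", "chosen_score": 9.0, "in_gsm8k_train": False},
    {"status": "tie",       "chosen_score": 9.0, "in_gsm8k_train": False},
    {"status": "swapped",   "chosen_score": 7.5, "in_gsm8k_train": False},
    {"status": "unchanged", "chosen_score": 8.0, "in_gsm8k_train": True},
    {"status": "swapped",   "chosen_score": 8.0, "in_gsm8k_train": False},
]

def keep_pair(r):
    """Same predicate as the lambda passed to dataset.filter."""
    return (
        r["status"] != "tie"          # drop pairs with no clear preference
        and r["chosen_score"] >= 8    # keep only high-scoring chosen responses
        and not r["in_gsm8k_train"]   # drop rows flagged as GSM8K train overlap
    )

kept = [i for i, r in enumerate(rows) if keep_pair(r)]
print(kept)  # [0, 4]
```

Row 1 is a tie, row 2 scores below 8, and row 3 is flagged as GSM8K train data, so only rows 0 and 4 survive.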

Benchmark results

For benchmarking we used the famous "Nous" or "Teknium" benchmark. Below is an overview, including our first experiment with a less aggressive dataset filter (removing ties and filtering on score > 5).

For running the benchmark we used another awesome contribution from Maxime: LLM AutoEval, check it out!

Model	AGIEval	GPT4ALL	TruthfulQA	Bigbench	Average
argilla/distilabeled-Marcoro14-7B-slerp-full	45.17	76.59	64.68	48.15	58.65
argilla/distilabeled-Marcoro14-7B-slerp	45.4	76.47	65.46	47.19	58.63
Marcoro14-7B-slerp	44.66	76.24	64.15	45.64	57.67
argilla/distilabeled-Hermes-2.5-Mistral-7B	44.64	73.35	55.96	42.21	54.04
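The Average column is consistent with a plain arithmetic mean of the four suite scores. The check below is a small sketch using the values from the table; the half-up rounding to two decimals is an assumption, since the card does not state the rounding rule, and `mean_2dp` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table above, in column order:
# AGIEval, GPT4ALL, TruthfulQA, Bigbench.
scores = {
    "argilla/distilabeled-Marcoro14-7B-slerp-full": ["45.17", "76.59", "64.68", "48.15"],
    "argilla/distilabeled-Marcoro14-7B-slerp": ["45.4", "76.47", "65.46", "47.19"],
    "Marcoro14-7B-slerp": ["44.66", "76.24", "64.15", "45.64"],
    "argilla/distilabeled-Hermes-2.5-Mistral-7B": ["44.64", "73.35", "55.96", "42.21"],
}

def mean_2dp(values):
    """Arithmetic mean rounded half-up to two decimals (assumed rounding rule)."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

averages = {name: mean_2dp(vals) for name, vals in scores.items()}

# Gain of the full-epoch model over the base merge it was tuned from.
gain = averages["argilla/distilabeled-Marcoro14-7B-slerp-full"] - averages["Marcoro14-7B-slerp"]

for name, avg in averages.items():
    print(f"{name}\t{avg}")
print(f"gain over Marcoro14-7B-slerp\t{gain}")
```

Decimal arithmetic is used instead of floats because two of the means fall exactly on a rounding boundary, where binary floating point could round either way.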

Training Hardware

We used 1 x A100 80GB on RunPod for less than 2 hours.

Acknowledgements

We'd like to thank the amazing open community and in particular:

The Intel team for publishing a great open dataset and showing how well it worked in the first place.
Teknium and NousResearch for their awesome work and models.
Maxime for sharing such great resources.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	73.40
AI2 Reasoning Challenge (25-Shot)	70.65
HellaSwag (10-Shot)	87.55
MMLU (5-Shot)	65.33
TruthfulQA (0-shot)	64.21
Winogrande (5-shot)	82.00
GSM8k (5-shot)	70.66