mlabonne/NeuralBeagle14-7B

Cherrytest 6711db4edc Update README.md

2025-03-18 04:06:05 +00:00

license: cc-by-nc-4.0
tags: merge, mergekit, lazymergekit, dpo, rlhf
base_model: mlabonne/Beagle14-7B
model-index: NeuralBeagle14-7B (task: text-generation, Text Generation)

Benchmark	Dataset	Config	Split	Few-shot	Metric	Value
AI2 Reasoning Challenge (25-Shot)	ai2_arc	ARC-Challenge	test	25	acc_norm (normalized accuracy)	72.95
HellaSwag (10-Shot)	hellaswag	(none)	validation	10	acc_norm (normalized accuracy)	88.34
MMLU (5-Shot)	cais/mmlu	all	test	5	acc (accuracy)	64.55
TruthfulQA (0-shot)	truthful_qa	multiple_choice	validation	0	mc2	69.93
Winogrande (5-shot)	winogrande	winogrande_xl	validation	5	acc (accuracy)	82.4
GSM8k (5-shot)	gsm8k	main	test	5	acc (accuracy)	70.28

Source for all results: Open LLM Leaderboard, https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralBeagle14-7B
🐶 NeuralBeagle14-7B

Update 01/16/24: NeuralBeagle14-7B is (probably) the best 7B model you can find! 🎉

NeuralBeagle14-7B is a DPO fine-tune of mlabonne/Beagle14-7B using the argilla/distilabel-intel-orca-dpo-pairs preference dataset and my DPO notebook from this article.

It is based on a merge of the following models using LazyMergekit:

fblgit/UNA-TheBeagle-7b-v1, based on jondurbin's repo and jondurbin/bagel-v0.3
argilla/distilabeled-Marcoro14-7B-slerp, based on mlabonne/Marcoro14-7B-slerp

Thanks Argilla for providing the dataset and the training recipe here. 💪

You can try it out in this Space (GGUF Q4_K_M).

🔍 Applications

This model uses a context window of 8k tokens. It is compatible with different chat templates, such as ChatML and Llama's chat template.
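To illustrate what a ChatML prompt looks like, here is a minimal sketch that formats a message list by hand. The helper name `to_chatml` is hypothetical, and the layout follows the general ChatML convention (`<|im_start|>` and `<|im_end|>` markers); the template bundled with this model's tokenizer may differ, so `tokenizer.apply_chat_template` remains the safer route in practice.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Format a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration only; it is not part of transformers.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>{role} ... <|im_end|>
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = to_chatml(messages)
print(prompt)
```

The resulting string can be passed to a text-generation pipeline in place of the output of `apply_chat_template`, assuming the model handles the ChatML markers as expected.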

Compared to other 7B models, it displays good performance in instruction following and reasoning tasks. It can also be used for role-play (RP) and storytelling.

⚡ Quantized models

GGUF: https://huggingface.co/mlabonne/NeuralBeagle14-7B-GGUF
GPTQ: https://huggingface.co/TheBloke/NeuralBeagle14-7B-GPTQ
AWQ: https://huggingface.co/TheBloke/NeuralBeagle14-7B-AWQ
EXL2: https://huggingface.co/LoneStriker/NeuralBeagle14-7B-8.0bpw-h8-exl2

🏆 Evaluation

Open LLM Leaderboard

NeuralBeagle14-7B ranks first on the Open LLM Leaderboard in the ~7B category.

It has the same average score as Beagle14-7B ("Show merges"), which might be due to an unlucky run. I think I might be overexploiting argilla/distilabel-intel-orca-dpo-pairs at this point, since this dataset or its original version is present in multiple models. I need to find more high-quality preference data for the next DPO merge.
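For reference, the average can be recomputed from the six scores listed in the model-index metadata of this card. This is a minimal sketch that assumes the leaderboard average is the unweighted mean of those six benchmarks; the variable names are illustrative.

```python
# Scores copied from the model-index metadata of this card
leaderboard_scores = {
    "ARC (25-shot)": 72.95,
    "HellaSwag (10-shot)": 88.34,
    "MMLU (5-shot)": 64.55,
    "TruthfulQA (0-shot)": 69.93,
    "Winogrande (5-shot)": 82.4,
    "GSM8k (5-shot)": 70.28,
}

# Assumption: the leaderboard average is the plain mean of the six benchmarks
average = sum(leaderboard_scores.values()) / len(leaderboard_scores)
print(f"Average: {average:.2f}")
```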

Note that some models like udkai/Turdus and nfaheem/Marcoroni-7b-DPO-Merge are unfortunately contaminated on purpose (see the very high Winogrande score).

Nous

The evaluation was performed using LLM AutoEval on the Nous suite. On this suite, it is the best-scoring 7B model to date.

Model	Average	AGIEval	GPT4All	TruthfulQA	Bigbench
mlabonne/NeuralBeagle14-7B 📄	60.25	46.06	76.77	70.32	47.86
mlabonne/Beagle14-7B 📄	59.4	44.38	76.53	69.44	47.25
mlabonne/NeuralDaredevil-7B 📄	59.39	45.23	76.2	67.61	48.52
argilla/distilabeled-Marcoro14-7B-slerp 📄	58.93	45.38	76.48	65.68	48.18
mlabonne/NeuralMarcoro14-7B 📄	58.4	44.59	76.17	65.94	46.9
openchat/openchat-3.5-0106 📄	53.71	44.17	73.72	52.53	44.4
teknium/OpenHermes-2.5-Mistral-7B 📄	52.42	42.75	72.99	52.99	40.94
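The Average column appears to be the plain mean of the four benchmark columns. The sketch below checks that reading against the rows of the table; the 0.01 tolerance and the variable names are assumptions made for this illustration, not part of LLM AutoEval.

```python
# Rows copied from the table: (model, reported average, AGIEval, GPT4All, TruthfulQA, Bigbench)
rows = [
    ("mlabonne/NeuralBeagle14-7B", 60.25, 46.06, 76.77, 70.32, 47.86),
    ("mlabonne/Beagle14-7B", 59.4, 44.38, 76.53, 69.44, 47.25),
    ("mlabonne/NeuralDaredevil-7B", 59.39, 45.23, 76.2, 67.61, 48.52),
    ("argilla/distilabeled-Marcoro14-7B-slerp", 58.93, 45.38, 76.48, 65.68, 48.18),
    ("mlabonne/NeuralMarcoro14-7B", 58.4, 44.59, 76.17, 65.94, 46.9),
    ("openchat/openchat-3.5-0106", 53.71, 44.17, 73.72, 52.53, 44.4),
    ("teknium/OpenHermes-2.5-Mistral-7B", 52.42, 42.75, 72.99, 52.99, 40.94),
]

# Collect models whose reported average deviates from the mean of the four scores
mismatches = []
for name, reported, *scores in rows:
    recomputed = sum(scores) / len(scores)
    if abs(recomputed - reported) >= 0.01:  # tolerance covers two-decimal rounding
        mismatches.append(name)

# Model with the highest reported average
best = max(rows, key=lambda row: row[1])[0]

print("Rows inconsistent with a plain mean:", mismatches)
print("Highest average:", best)
```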

You can find the complete benchmark on YALL - Yet Another LLM Leaderboard.

💻 Usage

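# Install dependencies (notebook syntax; drop the leading "!" in a shell)
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/NeuralBeagle14-7B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the conversation with the chat template bundled with the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and let accelerate place it on the available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens and print the prompt followed by the completion
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])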