---
language:
- en
license: cc-by-nc-4.0
tags:
- merge
- lazymergekit
- dpo
- rlhf
dataset:
- mlabonne/truthy-dpo-v0.1
- mlabonne/distilabel-intel-orca-dpo-pairs
- mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha
base_model: mlabonne/NeuralMonarch-7B
model-index:
- name: AlphaMonarch-7B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 73.04
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/AlphaMonarch-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 89.18
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/AlphaMonarch-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.4
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/AlphaMonarch-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 77.91
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/AlphaMonarch-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 84.69
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/AlphaMonarch-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 66.72
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/AlphaMonarch-7B
      name: Open LLM Leaderboard
---

👑 AlphaMonarch-7B

tl;dr: AlphaMonarch-7B is a new DPO merge that retains the reasoning abilities of the very best merges while significantly improving their conversational abilities. Kind of the best of both worlds in a 7B model. 🎉

AlphaMonarch-7B is a DPO fine-tune of mlabonne/NeuralMonarch-7B using the argilla/OpenHermes2.5-dpo-binarized-alpha preference dataset.
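
For reference, a DPO fine-tune of this kind is typically run with Hugging Face TRL. The sketch below is only an illustration under assumed hyperparameters and the newer TRL API (DPOConfig, trl >= 0.9); it is not the actual AlphaMonarch training script, and the preference dataset may need preprocessing into prompt/chosen/rejected columns.

# Hedged sketch of a DPO fine-tune with TRL -- not the actual AlphaMonarch recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "mlabonne/NeuralMonarch-7B"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs (chosen vs. rejected answer per prompt).
dataset = load_dataset("argilla/OpenHermes2.5-dpo-binarized-alpha", split="train")

args = DPOConfig(
    output_dir="alphamonarch-dpo",
    beta=0.1,                        # assumed strength of the KL penalty
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # TRL builds a frozen reference copy when None
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,   # called processing_class in newer TRL releases
)
trainer.train()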

It is based on a merge of several models performed with LazyMergekit.

Special thanks to Jon Durbin, Intel, Argilla, and Teknium for the preference datasets.

Try the demo: https://huggingface.co/spaces/mlabonne/AlphaMonarch-7B

🔍 Applications

This model uses a context window of 8k. I recommend using it with the Mistral Instruct chat template (works perfectly with LM Studio).
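
For orientation, the Mistral Instruct format wraps each user turn in [INST] ... [/INST] tags. The hand-written string below is only a sketch of that shape; the authoritative template ships with the tokenizer, so prefer tokenizer.apply_chat_template (shown in the Usage section) over building prompts by hand.

# Hedged sketch of a Mistral-Instruct-style prompt for a two-turn exchange.
prompt = (
    "<s>[INST] What is a large language model? [/INST]"
    " A large language model is a neural network trained on large text corpora...</s>"
    "[INST] Can you give a concrete example? [/INST]"
)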

If you use SillyTavern, you might want to tweak the inference parameters. Here's what LM Studio uses as a reference: temp 0.8, top_k 40, top_p 0.95, min_p 0.05, repeat_penalty 1.1.
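
If you generate with transformers instead of LM Studio or SillyTavern, those reference values map roughly onto the usual sampling arguments. This mapping is an assumption: repeat_penalty corresponds to repetition_penalty, and min_p requires a recent transformers release that supports min-p sampling.

# Hedged mapping of the LM Studio reference parameters onto transformers generate kwargs.
generation_kwargs = dict(
    do_sample=True,
    temperature=0.8,
    top_k=40,
    top_p=0.95,
    min_p=0.05,              # only available in newer transformers versions
    repetition_penalty=1.1,  # LM Studio calls this repeat_penalty
    max_new_tokens=512,
)
# outputs = pipeline(prompt, **generation_kwargs)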

It is one of the very best 7B models in terms of instruction-following and reasoning abilities and can be used for conversations, RP, and storytelling. Note that it tends to have a quite formal and sophisticated style, which can be changed by modifying the prompt.

Quantized models

Thanks to LoneStriker for the GPTQ, AWQ, and EXL2 quants.
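
As an example of how such a quant is usually loaded, the snippet below assumes a GPTQ repository; the repository id is a placeholder rather than a verified link, and GPTQ loading through transformers additionally requires optimum and auto-gptq to be installed.

# Hedged sketch: loading a GPTQ quant through transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

quant_repo = "LoneStriker/AlphaMonarch-7B-GPTQ"  # placeholder id -- check the actual repo name
model = AutoModelForCausalLM.from_pretrained(quant_repo, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quant_repo)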

🏆 Evaluation

Nous

AlphaMonarch-7B is the best-performing 7B model on Nous' benchmark suite (evaluation performed using LLM AutoEval). See the entire leaderboard here.

Model                                     Average   AGIEval   GPT4All   TruthfulQA   Bigbench
AlphaMonarch-7B 📄                         62.74     45.37     77.01     78.39        50.2
NeuralMonarch-7B 📄                        62.73     45.31     76.99     78.35        50.28
Monarch-7B 📄                              62.68     45.48     77.07     78.04        50.14
teknium/OpenHermes-2.5-Mistral-7B 📄       52.42     42.75     72.99     52.99        40.94
mlabonne/NeuralHermes-2.5-Mistral-7B 📄    53.51     43.67     73.24     55.37        41.76
mlabonne/NeuralBeagle14-7B 📄              60.25     46.06     76.77     70.32        47.86
mlabonne/NeuralOmniBeagle-7B 📄            62.3      45.85     77.26     76.06        50.03
eren23/dpo-binarized-NeuralTrix-7B 📄      62.5      44.57     76.34     79.81        49.27
CultriX/NeuralTrix-7B-dpo 📄               62.5      44.61     76.33     79.8         49.24

EQ-bench

AlphaMonarch-7B also outperforms 70B and 120B parameter models on EQ-bench by Samuel J. Paech, who kindly ran the evaluations.

MT-Bench

########## First turn ##########
                                    score
model                       turn         
gpt-4                       1     8.95625
OmniBeagle-7B               1     8.31250
AlphaMonarch-7B             1     8.23750
claude-v1                   1     8.15000
NeuralMonarch-7B            1     8.09375
gpt-3.5-turbo               1     8.07500
claude-instant-v1           1     7.80000

########## Second turn ##########
                                     score
model                       turn          
gpt-4                       2     9.025000
claude-instant-v1           2     8.012658
OmniBeagle-7B               2     7.837500
gpt-3.5-turbo               2     7.812500
claude-v1                   2     7.650000
AlphaMonarch-7B             2     7.618750
NeuralMonarch-7B            2     7.375000

########## Average ##########
                                score
model                                
gpt-4                        8.990625
OmniBeagle-7B                8.075000
gpt-3.5-turbo                7.943750
AlphaMonarch-7B              7.928125
claude-instant-v1            7.905660
claude-v1                    7.900000
NeuralMonarch-7B             7.734375
NeuralBeagle14-7B            7.628125

Open LLM Leaderboard

AlphaMonarch-7B is one of the best-performing non-merge 7B models on the Open LLM Leaderboard:
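
The leaderboard numbers listed in the metadata above come from EleutherAI's lm-evaluation-harness. A minimal sketch for reproducing one of them locally, assuming lm-eval >= 0.4; the leaderboard pins a specific harness version, so scores may not match exactly:

# Hedged sketch: re-running the ARC-Challenge (25-shot) metric with lm-eval.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mlabonne/AlphaMonarch-7B,dtype=float16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])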

🌳 Model Family Tree

[Model family tree diagram]

💻 Usage

!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/AlphaMonarch-7B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build the prompt with the tokenizer's built-in chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in float16 and spread it across available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a completion; generated_text includes the prompt followed by the answer.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])