language, license, tags, base_model, pipeline_tag, model-index
language license tags base_model pipeline_tag model-index
en
fr
de
es
it
pt
ru
zh
ja
apache-2.0
chat
mistralai/Mistral-Nemo-Base-2407 text-generation
name results
magnum-v2-12b
task dataset metrics source
type name
text-generation Text Generation
name type args
IFEval (0-Shot) HuggingFaceH4/ifeval
num_few_shot
0
type value name
inst_level_strict_acc and prompt_level_strict_acc 37.62 strict accuracy
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=anthracite-org/magnum-v2-12b Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type args
BBH (3-Shot) BBH
num_few_shot
3
type value name
acc_norm 28.79 normalized accuracy
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=anthracite-org/magnum-v2-12b Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type args
MATH Lvl 5 (4-Shot) hendrycks/competition_math
num_few_shot
4
type value name
exact_match 4.76 exact match
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=anthracite-org/magnum-v2-12b Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type args
GPQA (0-shot) Idavidrein/gpqa
num_few_shot
0
type value name
acc_norm 5.48 acc_norm
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=anthracite-org/magnum-v2-12b Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type args
MuSR (0-shot) TAUR-Lab/MuSR
num_few_shot
0
type value name
acc_norm 11.37 acc_norm
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=anthracite-org/magnum-v2-12b Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type config split args
MMLU-PRO (5-shot) TIGER-Lab/MMLU-Pro main test
num_few_shot
5
type value name
acc 24.08 accuracy
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=anthracite-org/magnum-v2-12b Open LLM Leaderboard

image/png

This is the fourth in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. This model is fine-tuned on top of Mistral-Nemo-Base-2407.

Prompting

Model has been Instruct tuned with the ChatML formatting. A typical input would look like this:

"""<|im_start|>system
system prompt<|im_end|>
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""

Credits

This model has been a team effort, and the credits goes to all members of Anthracite.

Training

The training was done for 2 epochs. We used 8x NVIDIA H100 Tensor Core GPUs for the full-parameter fine-tuning of the model.

Built with Axolotl

Safety

...

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 18.68
IFEval (0-Shot) 37.62
BBH (3-Shot) 28.79
MATH Lvl 5 (4-Shot) 4.76
GPQA (0-shot) 5.48
MuSR (0-shot) 11.37
MMLU-PRO (5-shot) 24.08
Description
Model synced from source: anthracite-org/magnum-v2-12b
Readme 2.8 MiB