ModelHub XC d37ebb4df0 Initialize project; model provided by the ModelHub XC community

Model: anthracite-org/magnum-v2-12b
Source: Original Platform

2026-06-09 10:50:26 +08:00

.gitattributes
config.json
generation_config.json
model-00001-of-00005.safetensors
model-00002-of-00005.safetensors
model-00003-of-00005.safetensors
model-00004-of-00005.safetensors
model-00005-of-00005.safetensors
model.safetensors.index.json
README.md
special_tokens_map.json
tokenizer_config.json
tokenizer.json

README.md

language, license, tags, base_model, pipeline_tag, model-index

license: apache-2.0
tags: chat
base_model: mistralai/Mistral-Nemo-Base-2407
pipeline_tag: text-generation
model-index name: magnum-v2-12b

Results (task type: text-generation, task name: Text Generation)

Benchmark	Dataset	num_few_shot	Metric type	Metric name	Value
IFEval (0-Shot)	HuggingFaceH4/ifeval	0	inst_level_strict_acc and prompt_level_strict_acc	strict accuracy	37.62
BBH (3-Shot)	BBH	3	acc_norm	normalized accuracy	28.79
MATH Lvl 5 (4-Shot)	hendrycks/competition_math	4	exact_match	exact match	4.76
GPQA (0-shot)	Idavidrein/gpqa	0	acc_norm	acc_norm	5.48
MuSR (0-shot)	TAUR-Lab/MuSR	0	acc_norm	acc_norm	11.37
MMLU-PRO (5-shot)	TIGER-Lab/MMLU-Pro (config: main, split: test)	5	acc	accuracy	24.08

Source for all results: Open LLM Leaderboard
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=anthracite-org/magnum-v2-12b

This is the fourth in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. This model is fine-tuned on top of Mistral-Nemo-Base-2407.

Prompting

The model has been instruct-tuned with ChatML formatting. A typical input looks like this:

"""<|im_start|>system
system prompt<|im_end|>
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""
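
As a minimal sketch of how such a prompt could be assembled in code, the helper below joins a list of role/content messages into the ChatML layout shown above. The function name `format_chatml` and its `add_generation_prompt` parameter are hypothetical and not part of this repository; the tokenizer shipped with the model may provide its own chat template that should be preferred when available.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Join role/content messages into a ChatML prompt string (illustrative helper)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn so the model writes the next reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# The same conversation as in the example above.
messages = [
    {"role": "system", "content": "system prompt"},
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]

prompt = format_chatml(messages)
print(prompt)
```

Ending the prompt with an open assistant turn is what cues the model to generate the next reply; pass `add_generation_prompt=False` to format a finished conversation instead.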

Credits

Stheno dataset (filtered)
kalomaze/Opus_Instruct_25k
Nopm/Opus_WritingStruct
Gryphe/Sonnet3.5-SlimOrcaDedupCleaned (a ~16k-row subset)
kalomaze/Opus_Instruct_3k

This model has been a team effort, and credit goes to all members of Anthracite.

Training

The model was trained for 2 epochs. We used 8x NVIDIA H100 Tensor Core GPUs for full-parameter fine-tuning.

Safety

...

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	18.68
IFEval (0-Shot)	37.62
BBH (3-Shot)	28.79
MATH Lvl 5 (4-Shot)	4.76
GPQA (0-shot)	5.48
MuSR (0-shot)	11.37
MMLU-PRO (5-shot)	24.08
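
The Avg. row is consistent with an unweighted mean of the six benchmark scores. That the leaderboard computes it this way is an assumption here, not something the card states; the short check below only reproduces the arithmetic from the values in the table.

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 37.62,
    "BBH (3-Shot)": 28.79,
    "MATH Lvl 5 (4-Shot)": 4.76,
    "GPQA (0-shot)": 5.48,
    "MuSR (0-shot)": 11.37,
    "MMLU-PRO (5-shot)": 24.08,
}

# Assumption: Avg. is the plain arithmetic mean across benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```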