Go to file

ModelHub XC 2fb5c2af8a 初始化项目，由ModelHub XC社区提供模型

Model: agentlans/Llama3.1-SuperDeepFuse
Source: Original Platform

2026-06-23 23:45:27 +08:00

.gitattributes

初始化项目，由ModelHub XC社区提供模型

2026-06-23 23:45:27 +08:00

config.json

初始化项目，由ModelHub XC社区提供模型

2026-06-23 23:45:27 +08:00

mergekit_config.yml

初始化项目，由ModelHub XC社区提供模型

2026-06-23 23:45:27 +08:00

model-00001-of-00004.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-06-23 23:45:27 +08:00

model-00002-of-00004.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-06-23 23:45:27 +08:00

model-00003-of-00004.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-06-23 23:45:27 +08:00

model-00004-of-00004.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-06-23 23:45:27 +08:00

model.safetensors.index.json

初始化项目，由ModelHub XC社区提供模型

2026-06-23 23:45:27 +08:00

README.md

初始化项目，由ModelHub XC社区提供模型

2026-06-23 23:45:27 +08:00

special_tokens_map.json

初始化项目，由ModelHub XC社区提供模型

2026-06-23 23:45:27 +08:00

tokenizer_config.json

初始化项目，由ModelHub XC社区提供模型

2026-06-23 23:45:27 +08:00

tokenizer.json

初始化项目，由ModelHub XC社区提供模型

2026-06-23 23:45:27 +08:00

README.md

base_model, library_name, tags, license, language, model-index

base_model

library_name

tags

license

language

model-index

arcee-ai/Llama-3.1-SuperNova-Lite

deepseek-ai/DeepSeek-R1-Distill-Llama-8B

FuseAI/FuseChat-Llama-3.1-8B-Instruct

transformers

mergekit

merge

llama3.1

name

results

Llama3.1-SuperDeepFuse

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

split

args

IFEval (0-Shot)

wis-k/instruction-following-eval

train

num_few_shot
0

type	value	name
inst_level_strict_acc and prompt_level_strict_acc	77.62	averaged accuracy

url	name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

split

args

BBH (3-Shot)

SaylorTwift/bbh

test

num_few_shot
3

type	value	name
acc_norm	29.22	normalized accuracy

url	name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

split

args

MATH Lvl 5 (4-Shot)

lighteval/MATH-Hard

test

num_few_shot
4

type	value	name
exact_match	17.75	exact match

url	name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

split

args

GPQA (0-shot)

Idavidrein/gpqa

train

num_few_shot
0

type	value	name
acc_norm	3.24	acc_norm

url	name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

args

MuSR (0-shot)

TAUR-Lab/MuSR

num_few_shot
0

type	value	name
acc_norm	5.13	acc_norm

url	name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

config

split

args

MMLU-PRO (5-shot)

TIGER-Lab/MMLU-Pro

main

test

num_few_shot
5

type	value	name
acc	30.83	accuracy

url	name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse	Open LLM Leaderboard

Llama3.1-SuperDeepFuse

An 8B parameter language model that merges three high-performance distilled models to boost reasoning, instruction-following, and performance in mathematics and coding.

Model Highlights

Size: 8 billion parameters
Base: meta-llama/Llama-3.1-8B-Instruct
Merged Sources:
Merge Method: model_stock

Key Capabilities

Enhanced multi-task reasoning
Improved mathematical and coding performance
Multilingual support

Performance Notes

Maintains Llama 3.1 safety standards
Suitable for consumer GPU deployment
Balanced performance across diverse tasks

Considerations

Still being benchmarked
Capabilities limited compared to larger model variants
Can give misleading output like all other language models
Outputs should be independently verified

Licensing

Follows standard Llama 3.1 usage terms.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here! Summarized results can be found here!

Metric	Value (%)
Average	27.30
IFEval (0-Shot)	77.62
BBH (3-Shot)	29.22
MATH Lvl 5 (4-Shot)	17.75
GPQA (0-shot)	3.24
MuSR (0-shot)	5.13
MMLU-PRO (5-shot)	30.83