ModelHub XC 78ca08a5df Initialize project; model provided by the ModelHub XC community

Model: AbacusResearch/haLLAwa2
Source: Original Platform

2026-06-14 13:18:40 +08:00

Files (each last changed by the initial commit, 2026-06-14 13:18:40 +08:00):

.gitattributes
config.json
mergekit_config.yml
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
model.safetensors.index.json
README.md
special_tokens_map.json
tokenizer_config.json
tokenizer.json
tokenizer.model

README.md

---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- OpenPipe/mistral-ft-optimized-1227
- machinists/Mistral-7B-SQL
model-index:
- name: haLLAwa2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 63.31
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AbacusResearch/haLLAwa2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 84.51
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AbacusResearch/haLLAwa2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 63.52
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AbacusResearch/haLLAwa2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 47.38
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AbacusResearch/haLLAwa2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 75.85
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AbacusResearch/haLLAwa2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 52.08
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AbacusResearch/haLLAwa2
      name: Open LLM Leaderboard
---
# haLLAwa2

haLLAwa2 is a merge of the following models using mergekit:

- OpenPipe/mistral-ft-optimized-1227
- machinists/Mistral-7B-SQL

## 🧩 Configuration

```yaml
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1227
        layer_range: [0, 32]
      - model: machinists/Mistral-7B-SQL
        layer_range: [0, 32]

merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1227
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
dtype: bfloat16
```

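To make the `slerp` merge method and the per-filter `t` lists in the configuration more concrete, here is a minimal pure-Python sketch. It is not mergekit's code. The helper names `slerp` and `gradient_value` are hypothetical, and the way the five anchor values are spread linearly over the 32 layers is an assumption about the schedule, so mergekit's exact mapping may differ. With `t = 0` the result equals the first vector (standing in for the base model's weights) and with `t = 1` it equals the second.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly parallel vectors: linear interpolation is numerically safer.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

def gradient_value(anchors, layer_idx, num_layers):
    """Assumed schedule: interpolate anchor values linearly across layers."""
    if num_layers == 1 or len(anchors) == 1:
        return anchors[0]
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]

# Anchor lists copied from the configuration.
SELF_ATTN_T = [0, 0.5, 0.3, 0.7, 1]
MLP_T = [1, 0.5, 0.7, 0.3, 0]
NUM_LAYERS = 32  # from layer_range: [0, 32]

if __name__ == "__main__":
    for layer in (0, 8, 16, 24, 31):
        t_attn = gradient_value(SELF_ATTN_T, layer, NUM_LAYERS)
        t_mlp = gradient_value(MLP_T, layer, NUM_LAYERS)
        print(f"layer {layer:2d}: self_attn t={t_attn:.3f}, mlp t={t_mlp:.3f}")
```

Under this reading, the self-attention tensors start at the base model in the first layer and end at the second model in the last layer, while the MLP tensors run the opposite way; every other tensor uses the fixed fallback `t = 0.5`.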
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_AbacusResearch__haLLAwa2)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |64.44|
|AI2 Reasoning Challenge (25-Shot)|63.31|
|HellaSwag (10-Shot)              |84.51|
|MMLU (5-Shot)                    |63.52|
|TruthfulQA (0-shot)              |47.38|
|Winogrande (5-shot)              |75.85|
|GSM8k (5-shot)                   |52.08|
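
As a quick sanity check on the table, the sketch below recomputes the `Avg.` row, assuming it is the unweighted mean of the six benchmark scores. The dictionary name `scores` is only illustrative; the values are copied from the table.

```python
# Benchmark scores copied from the results table.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 63.31,
    "HellaSwag (10-Shot)": 84.51,
    "MMLU (5-Shot)": 63.52,
    "TruthfulQA (0-shot)": 47.38,
    "Winogrande (5-shot)": 75.85,
    "GSM8k (5-shot)": 52.08,
}

# Unweighted mean over the six benchmarks (assumed definition of "Avg.").
average = sum(scores.values()) / len(scores)

if __name__ == "__main__":
    print(f"Avg. = {average:.2f}")
```

Rounded to two decimals, the mean matches the reported 64.44.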