ModelHub XC e93fd63f90 Initialize project; model provided by the ModelHub XC community

Model: Stopwolf/Tito-7B-slerp
Source: Original Platform

2026-05-12 23:40:26 +08:00

license: apache-2.0
tags:
  - merge
  - mergekit
  - lazymergekit
  - gordicaleksa/YugoGPT
  - mlabonne/AlphaMonarch-7B
model-index:
  - name: Tito-7B-slerp
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 68.09
            name: normalized accuracy
        source:
          url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Stopwolf/Tito-7B-slerp
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 86.38
            name: normalized accuracy
        source:
          url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Stopwolf/Tito-7B-slerp
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 64.01
            name: accuracy
        source:
          url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Stopwolf/Tito-7B-slerp
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 57.01
        source:
          url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Stopwolf/Tito-7B-slerp
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 81.69
            name: accuracy
        source:
          url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Stopwolf/Tito-7B-slerp
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 63.61
            name: accuracy
        source:
          url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Stopwolf/Tito-7B-slerp
          name: Open LLM Leaderboard

Tito-7B-slerp

Tito-7B-slerp is a merge of gordicaleksa/YugoGPT and mlabonne/AlphaMonarch-7B, made with mergekit.

🧩 Configuration

slices:
  - sources:
      - model: gordicaleksa/YugoGPT
        layer_range: [0, 32]
      - model: mlabonne/AlphaMonarch-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: mlabonne/AlphaMonarch-7B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.6
dtype: bfloat16
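
To make the configuration above more concrete, here is a minimal sketch of the two ideas it relies on: spherical linear interpolation (slerp) between two weight vectors, and a per-layer interpolation factor `t` derived from a short list of anchor values. It is not mergekit's implementation. The helper names `slerp` and `layer_t` are hypothetical, the vectors are plain Python lists rather than tensors, and the assumption that the anchors are spaced evenly over the 32 layers and linearly interpolated is an illustration of how such a gradient can be read, not a confirmed detail of mergekit.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length vectors (lists of floats)."""
    lerp = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction, so fall back to linear interpolation.
        return lerp
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly parallel vectors: slerp degenerates, so use linear interpolation.
        return lerp
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def layer_t(anchors, layer, num_layers):
    """Assumed reading of a gradient: anchors spaced evenly across the layers,
    with linear interpolation between neighbouring anchors."""
    if len(anchors) == 1 or num_layers == 1:
        return anchors[0]
    pos = layer / (num_layers - 1) * (len(anchors) - 1)
    lo = min(int(pos), len(anchors) - 2)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[lo + 1]


# Anchor lists copied from the configuration above.
self_attn_anchors = [0, 0.5, 0.3, 0.7, 1]
mlp_anchors = [1, 0.5, 0.7, 0.3, 0]

# Per-layer t for the 32 layers named in layer_range.
self_attn_t = [layer_t(self_attn_anchors, i, 32) for i in range(32)]
mlp_t = [layer_t(mlp_anchors, i, 32) for i in range(32)]

# Toy example: halfway between two orthogonal unit vectors.
halfway = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

Under this reading, the self-attention weights move from one parent model at the first layer to the other at the last layer, the MLP weights move in the opposite direction, and every other tensor uses the constant `t` of 0.6. Which parent corresponds to `t = 0` depends on mergekit's convention for `base_model`.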

Results

Evaluations on the Serbian LLM eval suite (that is, performance in and knowledge of Serbian):

Model	ARC-E	ARC-C	Hellaswag	BoolQ	Winogrande	OpenbookQA	PiQA	NQ Open	TriviaQA	Avg.
Zamfir-7B	51.85	32.25	46.03	75.59	62.59	26.00	66.81	16.09	36.11	45.92
Mustra-7B	52.95	33.70	45.89	77.55	64.17	30.60	67.25	15.40	34.84	46.93
Tito-7B	55.43	34.73	48.19	77.37	65.27	30.00	67.30	16.7	35.38	47.82
YugoGPT	57.79	34.73	49.89	69.45	64.56	28.20	72.03	15.82	36.14	47.62

All benchmarks were run 0-shot, with the exception of NQ Open and TriviaQA, which were run 5-shot to be comparable with the Mistral paper.

If we try to replicate the Open LLM Leaderboard results on the available Serbian datasets (running the appropriate number of shots for each benchmark instead of 0), we get:

Model	ARC	Hellaswag	Winogrande	TruthfulQA	Avg.
Tito-7B	47.27	-	69.93	57.48	58.23
Perucac-7B	49.74	-	71.98	56.03	59.25
YugoGPT	44.03	-	70.64	48.06	54.24
Llama3-8B	42.24	-	61.25	51.08	51.52
SambaLingo	37.88	-	61.48	47.23	48.86

Note that YugoGPT, Llama3 and SambaLingo are all base models, unlike Tito and Perucac.
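
Since the Hellaswag column is empty, the Avg. column in this table appears to be the mean of the three remaining scores. The short check below, a sketch with a hypothetical helper name `mean_available`, recomputes the averages from the values in the table under that assumption.

```python
# Scores copied from the table above, in the order
# ARC, Hellaswag, Winogrande, TruthfulQA. None marks the missing Hellaswag score.
scores = {
    "Tito-7B": [47.27, None, 69.93, 57.48],
    "Perucac-7B": [49.74, None, 71.98, 56.03],
    "YugoGPT": [44.03, None, 70.64, 48.06],
    "Llama3-8B": [42.24, None, 61.25, 51.08],
    "SambaLingo": [37.88, None, 61.48, 47.23],
}


def mean_available(values):
    """Average only the scores that are present, rounded to two decimals."""
    present = [v for v in values if v is not None]
    return round(sum(present) / len(present), 2)


averages = {model: mean_available(vals) for model, vals in scores.items()}

for model, avg in averages.items():
    print(f"{model}\t{avg:.2f}")
```

The recomputed values match the Avg. column, which supports reading it as a three-benchmark mean rather than a four-benchmark one.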

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Tito	YugoGPT
Avg.	70.13	57.34
AI2 Reasoning Challenge (25-Shot)	68.09	58.10
HellaSwag (10-Shot)	86.38	81.44
MMLU (5-Shot)	64.01	60.68
TruthfulQA (0-shot)	57.01	36.60
Winogrande (5-shot)	81.69	76.56
GSM8k (5-shot)	63.61	30.70
