---
language:
- en
- code
license: apache-2.0
tags:
- merge
- computer science
datasets:
- open-phi/programming_books_llama
- open-phi/textbooks
inference:
  parameters:
    do_sample: true
    temperature: 0.2
    top_p: 0.14
    top_k: 12
    max_new_tokens: 250
    repetition_penalty: 1.15
widget:
- text: 'To calculate the factorial of n, we can use the following function:'
model-index:
- name: TinyMistral-248M-v2.5
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 24.57
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 27.49
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 23.15
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 46.72
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 47.83
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 0.0
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 13.36
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 3.18
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.0
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 0.11
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.07
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 1.5
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
      name: Open LLM Leaderboard
---

# TinyMistral-248M-v2.5

This model was created by merging TinyMistral-248M-v1 and v2, then further pretraining the merge on synthetic textbooks. In my own evaluation, the resulting model outperforms both of its parents.

During training, this model reached an average perplexity of 4, nearly 7x better than V1 and 4x better than V2.
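For reference, perplexity here means the exponential of the mean per-token negative log-likelihood (cross-entropy loss). A minimal sketch of that computation, using illustrative loss values rather than actual training logs:

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Illustrative losses only: a mean NLL near ln(4) ~= 1.386 gives perplexity ~4.
losses = [1.2, 1.5, 1.4, 1.45]
print(perplexity(losses))
```

A lower mean loss compounds exponentially, which is why a drop from perplexity ~28 (V1) or ~16 (V2) down to 4 reflects a comparatively small change in raw loss.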

You can use the following config to reproduce the merged model:

```yaml
base_model: Locutusque/TinyMistral-248M-v2
dtype: float16
merge_method: ties
parameters:
  int8_mask: 1.0
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 12]
    model: Locutusque/TinyMistral-248M
    parameters:
      density: [1.0, 0.7, 0.1]
      weight: 1.0
  - layer_range: [0, 12]
    model: Locutusque/TinyMistral-248M-v2
    parameters:
      density: 0.5
      weight: [0.0, 0.3, 0.7, 1.0]
```
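In this config, list-valued parameters such as `weight: [0.0, 0.3, 0.7, 1.0]` are gradients: mergekit interpolates the anchor values linearly across the 12 layers, so V2's contribution ramps up from 0 at the bottom of the stack to 1 at the top. The helper below is my own illustrative approximation of that expansion, not mergekit's internal code:

```python
def expand_gradient(anchors, num_layers):
    """Linearly interpolate anchor values into one value per layer."""
    if num_layers == 1:
        return [anchors[0]]
    out = []
    for i in range(num_layers):
        # Map this layer's position in [0, 1] onto the anchor segments.
        pos = i / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        out.append(anchors[lo] * (1 - frac) + anchors[lo + 1] * frac)
    return out

# Per-layer weights for V2 under the config's gradient, layers 0..11.
print(expand_gradient([0.0, 0.3, 0.7, 1.0], 12))
```

The `density` gradient on the V1 side works the same way, so V1 contributes densely in early layers and sparsely in later ones.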

This model can also answer basic questions without any fine-tuning.

This model was also created to address an issue with V2: its weights were prone to exploding gradients, which made it difficult to fine-tune. This model is easier to fine-tune.

To get the best out of this model, I recommend installing it and trying it yourself, as its performance seems to degrade in the Inference API.
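A sketch of local usage with `transformers`; the sampling settings mirror the inference parameters in this card's metadata, and the prompt is the card's widget example (adjust both to taste):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/TinyMistral-248M-v2.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generation settings taken from the card's inference parameters.
gen_kwargs = dict(
    do_sample=True,
    temperature=0.2,
    top_p=0.14,
    top_k=12,
    max_new_tokens=250,
    repetition_penalty=1.15,
)

prompt = "To calculate the factorial of n, we can use the following function:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, **gen_kwargs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```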

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 28.29 |
| AI2 Reasoning Challenge (25-Shot) | 24.57 |
| HellaSwag (10-Shot)               | 27.49 |
| MMLU (5-Shot)                     | 23.15 |
| TruthfulQA (0-shot)               | 46.72 |
| Winogrande (5-shot)               | 47.83 |
| GSM8k (5-shot)                    |  0.00 |

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                |  3.87 |
| IFEval (0-Shot)     | 13.36 |
| BBH (3-Shot)        |  3.18 |
| MATH Lvl 5 (4-Shot) |  0.00 |
| GPQA (0-shot)       |  0.11 |
| MuSR (0-shot)       |  5.07 |
| MMLU-PRO (5-shot)   |  1.50 |