---
language:
license: apache-2.0
tags:
datasets:
- open-phi/programming_books_llama
- open-phi/textbooks
inference:
  parameters:
    do_sample: true
    temperature: 0.2
    top_p: 0.14
    top_k: 12
    max_new_tokens: 250
    repetition_penalty: 1.15
widget:
- text: 'To calculate the factorial of n, we can use the following function:'
model-index:
- name: TinyMistral-248M-v2.5
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 24.57
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 27.49
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 23.15
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 46.72
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 47.83
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 0.0
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 13.36
      name: strict accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 3.18
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.0
      name: exact match
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 0.11
      name: acc_norm
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.07
      name: acc_norm
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 1.5
      name: accuracy
---
# TinyMistral-248M-v2.5
This model was created by merging TinyMistral-248M-v1 and v2, then continuing pretraining on synthetic textbooks. In my own evaluation, the merged model outperforms both of its parents.

During training, this model reached an average perplexity score of 4, outperforming V1 by nearly 7x and V2 by 4x.
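For context on what that figure means, perplexity is the exponential of the mean per-token negative log-likelihood, so by the stated ratios V1 and V2 would sit at roughly 28 and 16 respectively. A minimal sketch of the computation (illustrative numbers only, not the model's actual losses):

```python
import math

def perplexity(token_nlls):
    """Perplexity: exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Illustrative: tokens with an average NLL of ln(4) give perplexity 4.
losses = [math.log(4)] * 100
print(round(perplexity(losses), 6))  # → 4.0
```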
You can use the following config to reproduce the merged model:
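(The actual config does not survive in this copy of the card. Purely as an illustrative sketch of what such a merge looks like, an equal-weight mergekit config might resemble the following; the parent repo ids, merge method, and weights are assumptions, not the author's actual settings.)

```yaml
# Hypothetical mergekit config -- repo ids, method, and weights are assumptions.
models:
  - model: Locutusque/TinyMistral-248M
    parameters:
      weight: 0.5
  - model: Locutusque/TinyMistral-248M-v2
    parameters:
      weight: 0.5
merge_method: linear
dtype: float16
```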
This model can also answer basic questions without any fine-tuning.
This model was also created to address an issue with V2: its weights were prone to exploding gradients, which made fine-tuning difficult. This version is easier to fine-tune.
To get the best out of this model, I recommend installing it and trying it yourself, as its performance seems to degrade in the hosted inference API.
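A minimal local sanity check with `transformers` might look like the sketch below. The repo id is an assumption (only the model name appears in this card); the sampling settings mirror the card's own inference parameters.

```python
# Sampling settings taken from this card's inference parameters.
GENERATION_KWARGS = dict(
    do_sample=True,
    temperature=0.2,
    top_p=0.14,
    top_k=12,
    max_new_tokens=250,
    repetition_penalty=1.15,
)

def generate(prompt: str, model_id: str = "Locutusque/TinyMistral-248M-v2.5") -> str:
    # model_id is an assumed repo id -- adjust to wherever the model is hosted.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `generate("To calculate the factorial of n, we can use the following function:")` should continue the prompt with a code completion.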
Detailed results can be found here

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 28.29 |
| AI2 Reasoning Challenge (25-Shot) | 24.57 |
| HellaSwag (10-Shot)               | 27.49 |
| MMLU (5-Shot)                     | 23.15 |
| TruthfulQA (0-shot)               | 46.72 |
| Winogrande (5-shot)               | 47.83 |
| GSM8k (5-shot)                    |  0.00 |
Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                |  3.87 |
| IFEval (0-Shot)     | 13.36 |
| BBH (3-Shot)        |  3.18 |
| MATH Lvl 5 (4-Shot) |  0.00 |
| GPQA (0-shot)       |  0.11 |
| MuSR (0-shot)       |  5.07 |
| MMLU-PRO (5-shot)   |  1.50 |