license, tags, model-index
license
tags
model-index
llama2
name
results
ereb-test
task
dataset
metrics
source
type
name
text-generation
Text Generation
name
type
config
split
args
AI2 Reasoning Challenge (25-Shot)
ai2_arc
ARC-Challenge
test
type
value
name
acc_norm
40.7
normalized accuracy
task
dataset
metrics
source
type
name
text-generation
Text Generation
name
type
split
args
HellaSwag (10-Shot)
hellaswag
validation
type
value
name
acc_norm
71.04
normalized accuracy
task
dataset
metrics
source
type
name
text-generation
Text Generation
name
type
config
split
args
MMLU (5-Shot)
cais/mmlu
all
test
type
value
name
acc
28.06
accuracy
task
dataset
metrics
source
type
name
text-generation
Text Generation
name
type
config
split
args
TruthfulQA (0-shot)
truthful_qa
multiple_choice
validation
task
dataset
metrics
source
type
name
text-generation
Text Generation
name
type
config
split
args
Winogrande (5-shot)
winogrande
winogrande_xl
validation
type
value
name
acc
63.93
accuracy
task
dataset
metrics
source
type
name
text-generation
Text Generation
name
type
config
split
args
GSM8k (5-shot)
gsm8k
main
test
type
value
name
acc
0.0
accuracy
Another trial of merging models with different sizes, still under testing, should be more stable, but I have no ideia if it's improving or degrading the base model.
Recipe:
Detailed results can be found here
Metric
Value
Avg.
41.85
AI2 Reasoning Challenge (25-Shot)
40.70
HellaSwag (10-Shot)
71.04
MMLU (5-Shot)
28.06
TruthfulQA (0-shot)
47.40
Winogrande (5-shot)
63.93
GSM8k (5-shot)
0.00