---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- merge
base_model:
- uukuguy/speechless-code-mistral-7b-v1.0
- upaya07/Arithmo2-Mistral-7B
pipeline_tag: text-generation
model-index:
- name: sethuiyer/CodeCalc-Mistral-7B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 61.95
      name: normalized accuracy
    source:
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 83.64
      name: normalized accuracy
    source:
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 62.78
      name: accuracy
    source:
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 47.79
    source:
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 78.3
      name: accuracy
    source:
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 63.53
      name: accuracy
    source:
      name: Open LLM Leaderboard
---
# CodeCalc-Mistral-7B

CodeCalc-Mistral-7B is a merge of [uukuguy/speechless-code-mistral-7b-v1.0](https://huggingface.co/uukuguy/speechless-code-mistral-7b-v1.0) and [upaya07/Arithmo2-Mistral-7B](https://huggingface.co/upaya07/Arithmo2-Mistral-7B), aiming to combine strong code generation with arithmetic reasoning.
## Configuration

A YAML merge configuration was used to produce this model.
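The exact configuration is not reproduced here; the sketch below is a plausible reconstruction assuming a [mergekit](https://github.com/arcee-ai/mergekit) `ties` merge of the two base models, with illustrative (not recovered) weights, densities, and base model:

```yml
# Hypothetical mergekit config -- merge method, weights, and densities
# are illustrative assumptions, not the recovered settings.
models:
  - model: uukuguy/speechless-code-mistral-7b-v1.0
    parameters:
      density: 0.5
      weight: 0.5
  - model: upaya07/Arithmo2-Mistral-7B
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: true
dtype: bfloat16
```

A configuration like this is consumed by `mergekit-yaml config.yml ./output-dir`, which writes the merged weights to the output directory.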
## Evaluation

| T | Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|-------|---------|-----|-----------|------|------------|------------|-------|
| 🔍 | sethuiyer/CodeCalc-Mistral-7B | 66.33 | 61.95 | 83.64 | 62.78 | 47.79 | 78.3 | 63.53 |
| 📉 | uukuguy/speechless-code-mistral-7b-v1.0 | 63.6 | 61.18 | 83.77 | 63.4 | 47.9 | 78.37 | 47.01 |
The merge appears to be successful: GSM8K improves substantially over the code model (47.01 → 63.53) while every other benchmark stays within roughly a point of its previous value.
## Usage

This model works best with the Alpaca instruction format and the "Divine Intellect" generation preset, as in the sketch below.
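A minimal sketch using 🤗 Transformers; the sampling values mirror those commonly attributed to the text-generation-webui "Divine Intellect" preset (temperature 1.31, top_p 0.14, top_k 49, repetition penalty 1.17) and should be treated as a starting point rather than canonical settings:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sethuiyer/CodeCalc-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

# Alpaca instruction format
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "Write a Python function that returns the nth Fibonacci number.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=1.31,         # "Divine Intellect" sampling values
    top_p=0.14,               # (assumed from the text-generation-webui preset)
    top_k=49,
    repetition_penalty=1.17,
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```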
## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric | Value |
|--------|-------|
| Avg. | 66.33 |
| AI2 Reasoning Challenge (25-Shot) | 61.95 |
| HellaSwag (10-Shot) | 83.64 |
| MMLU (5-Shot) | 62.78 |
| TruthfulQA (0-shot) | 47.79 |
| Winogrande (5-shot) | 78.30 |
| GSM8k (5-shot) | 63.53 |