---
license: apache-2.0
library_name: transformers
base_model: mistralai/Mistral-Nemo-Instruct-2407
datasets:
- tasksource/ScienceQA_text_only
model-index:
- name: mistral-nemo-wissenschaft-12B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 65.2
      name: strict accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 29.57
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 6.57
      name: exact match
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.7
      name: acc_norm
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 12.29
      name: acc_norm
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 28.14
      name: accuracy
---
# mistral-nemo-wissenschaft-12B

mistralai/Mistral-Nemo-Instruct-2407 fine-tuned on tasksource/ScienceQA_text_only.
## Method

Fine-tuned on a single A100 in Google Colab for 1 epoch, following the guide "Fine-tune Llama 3 with ORPO". For each question, the correct answer was used as the "chosen" response and a randomly selected wrong answer as the "rejected" response.
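Building the preference pairs described above can be sketched as follows. This is a minimal illustration, not the exact training script; the field names (`question`, `choices`, `answer`) are an assumption about the ScienceQA_text_only schema.

```python
import random


def build_preference_pair(example, rng=random.Random(0)):
    """Turn one multiple-choice item into an ORPO-style preference pair.

    Assumes `example` has 'question', 'choices' (list of answer strings),
    and 'answer' (index of the correct choice) -- these field names are an
    assumption about the tasksource/ScienceQA_text_only schema.
    """
    correct_idx = example["answer"]
    # The correct answer becomes the "chosen" response.
    chosen = example["choices"][correct_idx]
    # One random wrong answer becomes the "rejected" response.
    wrong = [c for i, c in enumerate(example["choices"]) if i != correct_idx]
    rejected = rng.choice(wrong)
    return {"prompt": example["question"], "chosen": chosen, "rejected": rejected}


example = {
    "question": "Which gas do plants absorb during photosynthesis?",
    "choices": ["Oxygen", "Carbon dioxide", "Nitrogen"],
    "answer": 1,
}
pair = build_preference_pair(example)
```

A dataset of such pairs is what ORPO trainers (e.g. TRL's `ORPOTrainer`) expect as `prompt`/`chosen`/`rejected` columns.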
Detailed results can be found here.

| Metric              | Value |
|---------------------|------:|
| Avg.                | 24.58 |
| IFEval (0-Shot)     | 65.20 |
| BBH (3-Shot)        | 29.57 |
| MATH Lvl 5 (4-Shot) |  6.57 |
| GPQA (0-shot)       |  5.70 |
| MuSR (0-shot)       | 12.29 |
| MMLU-PRO (5-shot)   | 28.14 |