---
license: apache-2.0
library_name: transformers
base_model:
- mistralai/Mistral-Nemo-Instruct-2407
datasets:
- jondurbin/gutenberg-dpo-v0.1
model-index:
- name: mistral-nemo-gutenberg-12B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 35.04
      name: strict accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
    metrics:
    - type: acc_norm
      value: 32.43
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
    metrics:
    - type: exact_match
      value: 10.42
      name: exact match
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
    metrics:
    - type: acc_norm
      value: 7.61
      name: acc_norm
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
    metrics:
    - type: acc_norm
      value: 10.97
      name: acc_norm
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
    metrics:
    - type: acc
      value: 28.47
      name: accuracy
---
mistral-nemo-gutenberg-12B
mistralai/Mistral-Nemo-Instruct-2407 fine-tuned on the jondurbin/gutenberg-dpo-v0.1 dataset.
Method
Fine-tuned on an A100 in Google Colab for 1 epoch.
Reference guide: "Fine-tune Llama 3 with ORPO".
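For readers unfamiliar with ORPO, the objective named in the referenced guide can be sketched in plain Python. This is a minimal illustration of the general ORPO loss (a supervised term plus a weighted odds-ratio penalty), not the training code used for this model. The function names, the `lam` weight, and the sample log-probabilities are hypothetical, and the sketch assumes each input is a length-normalized (average per-token) log-probability below zero.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp), with avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_logp: float, rejected_logp: float, lam: float = 0.1) -> float:
    """Illustrative ORPO loss for one preference pair.

    chosen_logp / rejected_logp: average per-token log-probabilities of the
    preferred and rejected responses (hypothetical inputs).
    lam: weight of the odds-ratio term (hypothetical default).
    """
    # Supervised fine-tuning term: negative log-likelihood of the chosen response.
    sft_term = -chosen_logp
    # Log of the odds ratio between the chosen and rejected responses.
    log_odds_ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    # Odds-ratio penalty: -log(sigmoid(x)) written as log(1 + exp(-x)).
    or_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * or_term


# The penalty grows when the rejected response becomes more likely than the chosen one.
print(orpo_loss(-0.5, -1.5))
print(orpo_loss(-0.5, -0.2))
```

In practice a training library would compute these log-probabilities from model logits over batches; the scalar version here only shows how the two terms combine.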
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 20.82 |
| IFEval (0-Shot) | 35.04 |
| BBH (3-Shot) | 32.43 |
| MATH Lvl 5 (4-Shot) | 10.42 |
| GPQA (0-shot) | 7.61 |
| MuSR (0-shot) | 10.97 |
| MMLU-PRO (5-shot) | 28.47 |
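As a quick sanity check, the reported average appears to be the unweighted mean of the six benchmark scores. The short sketch below recomputes it; the variable names are illustrative, and the equal weighting is an assumption inferred from the numbers rather than something the card states.

```python
# Benchmark scores as reported in the results table.
scores = {
    "IFEval (0-Shot)": 35.04,
    "BBH (3-Shot)": 32.43,
    "MATH Lvl 5 (4-Shot)": 10.42,
    "GPQA (0-shot)": 7.61,
    "MuSR (0-shot)": 10.97,
    "MMLU-PRO (5-shot)": 28.47,
}

# Unweighted mean across the six benchmarks (assumed weighting).
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 20.82
```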