language, license, library_name, tags, base_model, pipeline_tag, model_type, model-index
| language |
license |
library_name |
tags |
base_model |
pipeline_tag |
model_type |
model-index |
|
|
llama3.1 |
transformers |
| mergekit |
| merge |
| shining-valiant |
| shining-valiant-2 |
| cobalt |
| plum |
| valiant |
| valiant-labs |
| llama |
| llama-3.1 |
| llama-3.1-instruct |
| llama-3.1-instruct-8b |
| llama-3 |
| llama-3-instruct |
| llama-3-instruct-8b |
| 8b |
| math |
| math-instruct |
| science |
| physics |
| biology |
| chemistry |
| compsci |
| computer-science |
| engineering |
| technical |
| conversational |
| chat |
| instruct |
|
| meta-llama/Llama-3.1-8B-Instruct |
| ValiantLabs/Llama3.1-8B-ShiningValiant2 |
| ValiantLabs/Llama3.1-8B-Cobalt |
|
text-generation |
llama |
| name |
results |
| sequelbox/Llama3.1-8B-PlumMath |
| task |
dataset |
metrics |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
args |
| Winogrande (5-Shot) |
Winogrande |
|
|
| type |
value |
name |
| acc |
72.38 |
acc |
|
|
|
| task |
dataset |
metrics |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
args |
| MathQA (5-Shot) |
MathQA |
|
|
| type |
value |
name |
| acc |
40.27 |
acc |
|
|
|
| task |
dataset |
metrics |
source |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
args |
| IFEval (0-Shot) |
HuggingFaceH4/ifeval |
|
|
| type |
value |
name |
| inst_level_strict_acc and prompt_level_strict_acc |
22.42 |
strict accuracy |
|
|
|
|
| task |
dataset |
metrics |
source |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
args |
| BBH (3-Shot) |
BBH |
|
|
| type |
value |
name |
| acc_norm |
16.45 |
normalized accuracy |
|
|
|
|
| task |
dataset |
metrics |
source |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
args |
| MATH Lvl 5 (4-Shot) |
hendrycks/competition_math |
|
|
| type |
value |
name |
| exact_match |
3.93 |
exact match |
|
|
|
|
| task |
dataset |
metrics |
source |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
args |
| GPQA (0-shot) |
Idavidrein/gpqa |
|
|
| type |
value |
name |
| acc_norm |
9.06 |
acc_norm |
|
|
|
|
| task |
dataset |
metrics |
source |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
args |
| MuSR (0-shot) |
TAUR-Lab/MuSR |
|
|
| type |
value |
name |
| acc_norm |
8.98 |
acc_norm |
|
|
|
|
| task |
dataset |
metrics |
source |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
config |
split |
args |
| MMLU-PRO (5-shot) |
TIGER-Lab/MMLU-Pro |
main |
test |
|
|
| type |
value |
name |
| acc |
21.95 |
accuracy |
|
|
|
|
|
|
|
PlumMath
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the della merge method using meta-llama/Llama-3.1-8B-Instruct as a base.
Models Merged
The following models were included in the merge:
Configuration
The following YAML configuration was used to produce this model:
Detailed results can be found here
| Metric |
Value |
| Avg. |
13.80 |
| IFEval (0-Shot) |
22.42 |
| BBH (3-Shot) |
16.45 |
| MATH Lvl 5 (4-Shot) |
3.93 |
| GPQA (0-shot) |
9.06 |
| MuSR (0-shot) |
8.98 |
| MMLU-PRO (5-shot) |
21.95 |