---
library_name: transformers
license: llama3.1
tags:
- mergekit
- merge
- shining-valiant
- shining-valiant-2
- enigma
- plum
- plumcode
- code
- valiant
- valiant-labs
- llama
- llama-3.1
- llama-3.1-instruct
- llama-3.1-instruct-8b
- llama-3
- llama-3-instruct
- llama-3-instruct-8b
- 8b
- code-instruct
- python
- science
- physics
- biology
- chemistry
- compsci
- computer-science
- engineering
- technical
- conversational
- chat
- instruct
base_model:
- meta-llama/Llama-3.1-8B-Instruct
- ValiantLabs/Llama3.1-8B-Enigma
- ValiantLabs/Llama3.1-8B-ShiningValiant2
model-index:
- name: Llama3.1-8B-PlumCode
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-Shot)
      type: Winogrande
    metrics:
    - type: acc
      value: 73.16
      name: acc
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 20.45
      name: strict accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
    metrics:
    - type: acc_norm
      value: 8.5
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
    metrics:
    - type: exact_match
      value: 2.42
      name: exact match
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
    metrics:
    - type: acc_norm
      value: 3.47
      name: acc_norm
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
    metrics:
    - type: acc_norm
      value: 8.97
      name: acc_norm
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
    metrics:
    - type: acc
      value: 14.84
      name: accuracy
---
PlumCode
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged with the della merge method, using meta-llama/Llama-3.1-8B-Instruct as the base.
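For readers unfamiliar with della, the general idea can be sketched in a few lines of plain Python. This is a simplified illustration of the published technique (magnitude-ranked random dropping of task deltas, rescaling of survivors, sign election, then averaging), not mergekit's implementation and not the configuration used for this model. The function names and the parameters `p`, `eps`, and `lam` are hypothetical and chosen only for this example; weights are flat lists rather than tensors.

```python
import random


def magprune(delta, p, eps, rng):
    """Randomly drop entries of one delta vector, then rescale the survivors.

    Entries with smaller magnitude get a higher drop probability.
    Assumes p + eps / 2 < 1 so the rescale factor stays finite.
    """
    n = len(delta)
    # Rank indices by magnitude: rank 0 is the smallest entry.
    order = sorted(range(n), key=lambda i: abs(delta[i]))
    out = [0.0] * n
    for rank, i in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.5
        # Drop probability falls linearly from p + eps/2 to p - eps/2.
        p_i = p + eps / 2 - eps * frac
        if rng.random() >= p_i:
            # Kept entries are rescaled by 1 / (1 - p_i).
            out[i] = delta[i] / (1.0 - p_i)
    return out


def della_merge(base, finetuned, p=0.5, eps=0.1, lam=1.0, seed=0):
    """Merge several fine-tuned weight vectors into one (illustrative only)."""
    rng = random.Random(seed)
    # Task deltas: what each fine-tuned model changed relative to the base.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    pruned = [magprune(d, p, eps, rng) for d in deltas]
    merged = []
    for j, b in enumerate(base):
        column = [d[j] for d in pruned]
        # Elect the dominant sign for this parameter position.
        sign = 1 if sum(column) >= 0 else -1
        # Average only the deltas that agree with the elected sign.
        agreeing = [v for v in column if v * sign > 0]
        fused = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + lam * fused)
    return merged


# With p = 0 and eps = 0 nothing is dropped, so the result is deterministic.
print(della_merge([0.0, 1.0], [[1.0, 1.0], [3.0, 1.0]], p=0.0, eps=0.0))
```

With nonzero `p`, the result depends on the random seed; the sketch fixes a seed so that runs are repeatable.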
Models Merged
The following models were included in the merge:
- ValiantLabs/Llama3.1-8B-Enigma
- ValiantLabs/Llama3.1-8B-ShiningValiant2
Configuration
The following YAML configuration was used to produce this model:
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 9.77 |
| IFEval (0-Shot) | 20.45 |
| BBH (3-Shot) | 8.50 |
| MATH Lvl 5 (4-Shot) | 2.42 |
| GPQA (0-shot) | 3.47 |
| MuSR (0-shot) | 8.97 |
| MMLU-PRO (5-shot) | 14.84 |
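As a quick sanity check, the Avg. row appears to be the unweighted mean of the six benchmark scores. The short script below recomputes it from the values listed in the table; the assumption that the leaderboard uses a plain arithmetic mean is ours, and the variable names are only illustrative.

```python
# Scores copied from the results table above.
scores = {
    "IFEval (0-Shot)": 20.45,
    "BBH (3-Shot)": 8.50,
    "MATH Lvl 5 (4-Shot)": 2.42,
    "GPQA (0-shot)": 3.47,
    "MuSR (0-shot)": 8.97,
    "MMLU-PRO (5-shot)": 14.84,
}

# Unweighted mean across the six tasks.
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.3f}")
```

The mean of the rounded per-task values comes to about 9.775, which is consistent with the reported 9.77 if the leaderboard averages unrounded scores.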