| Field | Value |
| --- | --- |
| language | |
| license | cc-by-nc-sa-4.0 |
| pipeline_tag | text-generation |
| base_model | upstage/SOLAR-10.7B-v1.0, Yhyu13/LMCocktail-10.7B-v1 |
| model-index name | SOLAR-tail-10.7B-Merge-v1.0 |

model-index results (task type: text-generation, task name: Text Generation)

| Dataset name | Dataset type | Config | Split | Metric type | Metric name | Value |
| --- | --- | --- | --- | --- | --- | --- |
| AI2 Reasoning Challenge (25-Shot) | ai2_arc | ARC-Challenge | test | acc_norm | normalized accuracy | 66.13 |
| HellaSwag (10-Shot) | hellaswag | | validation | acc_norm | normalized accuracy | 86.54 |
| MMLU (5-Shot) | cais/mmlu | all | test | acc | accuracy | 66.52 |
| TruthfulQA (0-shot) | truthful_qa | multiple_choice | validation | | | |
| Winogrande (5-shot) | winogrande | winogrande_xl | validation | acc | accuracy | 84.77 |
| GSM8k (5-shot) | gsm8k | main | test | acc | accuracy | 65.58 |

SOLAR-tail-10.7B-Merge-v1.0
Model Details
Model Developers: Kyujin Han (kyujinpy)
Method
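Merged using Mergekit. The card metadata lists upstage/SOLAR-10.7B-v1.0 and Yhyu13/LMCocktail-10.7B-v1 as the base models.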
Merge config
Model Benchmark
Open Ko leaderboard
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Ko-CommonGenV2 |
| --- | --- | --- | --- | --- | --- | --- |
| PracticeLLM/SOLAR-tail-10.7B-Merge-v1.0 | 48.32 | 45.73 | 56.97 | 38.77 | 38.75 | 61.16 |
| jjourney1125/M-SOLAR-10.7B-v1.0 | 55.15 | 49.57 | 60.12 | 54.60 | 49.23 | 62.22 |

- Results on the English leaderboard follow.
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PracticeLLM/SOLAR-tail-10.7B-Merge-v1.0 | 71.68 | 66.13 | 86.54 | 66.52 | 60.57 | 84.77 | 65.58 |
| kyujinpy/Sakura-SOLAR-Instruct | 74.40 | 70.99 | 88.42 | 66.33 | 71.79 | 83.66 | 65.20 |

lm-evaluation-harness
Implementation Code
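A minimal sketch of loading the merged checkpoint with the Hugging Face `transformers` library, assuming `transformers`, `torch`, and `accelerate` are installed and the hardware has enough memory for a 10.7B-parameter model in half precision. The repository id is taken from the tables in this card; the helper names `load_model` and `generate`, the dtype, and the generation settings are illustrative assumptions, not the author's original code.

```python
REPO_ID = "PracticeLLM/SOLAR-tail-10.7B-Merge-v1.0"  # repo id as listed in this card


def load_model(repo_id: str = REPO_ID):
    """Load tokenizer and model (hypothetical helper, not from the original card)."""
    # Imported lazily so that defining this helper does not need the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.float16,  # assumption: half precision to reduce memory use
        device_map="auto",          # requires the `accelerate` package
    )
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for `prompt` (illustrative settings only)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("...")` downloads the weights on first use, so it is best tried on a machine with a suitable GPU and enough disk space.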
Detailed results can be found here
| Metric | Value |
| --- | --- |
| Avg. | 71.68 |
| AI2 Reasoning Challenge (25-Shot) | 66.13 |
| HellaSwag (10-Shot) | 86.54 |
| MMLU (5-Shot) | 66.52 |
| TruthfulQA (0-shot) | 60.57 |
| Winogrande (5-shot) | 84.77 |
| GSM8k (5-shot) | 65.58 |
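
The "Avg." row can be checked against the six benchmark scores. The sketch below assumes the average is the unweighted mean of those scores; the card does not state the formula, and the names `scores` and `unweighted_mean` are illustrative.

```python
# Benchmark scores as reported in the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 66.13,
    "HellaSwag (10-Shot)": 86.54,
    "MMLU (5-Shot)": 66.52,
    "TruthfulQA (0-shot)": 60.57,
    "Winogrande (5-shot)": 84.77,
    "GSM8k (5-shot)": 65.58,
}


def unweighted_mean(values):
    """Return the arithmetic mean, giving every benchmark equal weight."""
    values = list(values)
    return sum(values) / len(values)


average = unweighted_mean(scores.values())
print(f"Unweighted mean over {len(scores)} benchmarks: {average:.3f}")
```

The six scores sum to 430.11, so the mean is about 71.685, which agrees with the reported 71.68 to within rounding.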