language, license, datasets, model-index
| language |
license |
datasets |
model-index |
|
|
mit |
| Sao10K/Claude-3-Opus-Instruct-15K |
| abacusai/SystemChat-1.1 |
| Ba2han/DollyLlama-5k |
|
| name |
results |
| Llama-Phi-3_DoRA |
| task |
dataset |
metrics |
source |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
config |
split |
args |
| AI2 Reasoning Challenge (25-Shot) |
ai2_arc |
ARC-Challenge |
test |
|
|
| type |
value |
name |
| acc_norm |
62.29 |
normalized accuracy |
|
|
|
|
| task |
dataset |
metrics |
source |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
split |
args |
| HellaSwag (10-Shot) |
hellaswag |
validation |
|
|
| type |
value |
name |
| acc_norm |
79.08 |
normalized accuracy |
|
|
|
|
| task |
dataset |
metrics |
source |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
config |
split |
args |
| MMLU (5-Shot) |
cais/mmlu |
all |
test |
|
|
| type |
value |
name |
| acc |
69.44 |
accuracy |
|
|
|
|
| task |
dataset |
metrics |
source |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
config |
split |
args |
| TruthfulQA (0-shot) |
truthful_qa |
multiple_choice |
validation |
|
|
|
|
|
| task |
dataset |
metrics |
source |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
config |
split |
args |
| Winogrande (5-shot) |
winogrande |
winogrande_xl |
validation |
|
|
| type |
value |
name |
| acc |
73.4 |
accuracy |
|
|
|
|
| task |
dataset |
metrics |
source |
| type |
name |
| text-generation |
Text Generation |
|
| name |
type |
config |
split |
args |
| GSM8k (5-shot) |
gsm8k |
main |
test |
|
|
| type |
value |
name |
| acc |
68.01 |
accuracy |
|
|
|
|
|
|
|
We have Llama-3 at home!
Highest PHI-3-Mini MMLU and Winogrande on the board!
The model has been trained on filtered versions of tagged datasets, as well as a few thousand more examples generated with llama-3-70B.
Use Zephyr template with any system message. Default system message should be:
You are a smart, friendly and helpful assistant.

Detailed results can be found here
| Metric |
Value |
| Avg. |
67.72 |
| AI2 Reasoning Challenge (25-Shot) |
62.29 |
| HellaSwag (10-Shot) |
79.08 |
| MMLU (5-Shot) |
69.44 |
| TruthfulQA (0-shot) |
54.08 |
| Winogrande (5-shot) |
73.40 |
| GSM8k (5-shot) |
68.01 |