language, license, datasets, model-index
language license datasets model-index
en
mit
Sao10K/Claude-3-Opus-Instruct-15K
abacusai/SystemChat-1.1
Ba2han/DollyLlama-5k
name results
Llama-Phi-3_DoRA
task dataset metrics source
type name
text-generation Text Generation
name type config split args
AI2 Reasoning Challenge (25-Shot) ai2_arc ARC-Challenge test
num_few_shot
25
type value name
acc_norm 62.29 normalized accuracy
url name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Ba2han/Llama-Phi-3_DoRA Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type split args
HellaSwag (10-Shot) hellaswag validation
num_few_shot
10
type value name
acc_norm 79.08 normalized accuracy
url name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Ba2han/Llama-Phi-3_DoRA Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type config split args
MMLU (5-Shot) cais/mmlu all test
num_few_shot
5
type value name
acc 69.44 accuracy
url name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Ba2han/Llama-Phi-3_DoRA Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type config split args
TruthfulQA (0-shot) truthful_qa multiple_choice validation
num_few_shot
0
type value
mc2 54.08
url name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Ba2han/Llama-Phi-3_DoRA Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type config split args
Winogrande (5-shot) winogrande winogrande_xl validation
num_few_shot
5
type value name
acc 73.4 accuracy
url name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Ba2han/Llama-Phi-3_DoRA Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type config split args
GSM8k (5-shot) gsm8k main test
num_few_shot
5
type value name
acc 68.01 accuracy
url name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Ba2han/Llama-Phi-3_DoRA Open LLM Leaderboard

We have Llama-3 at home!

Highest PHI-3-Mini MMLU and Winogrande on the board!

The model has been trained on filtered versions of tagged datasets, as well as a few thousand more examples generated with llama-3-70B.

Use Zephyr template with any system message. Default system message should be:

You are a smart, friendly and helpful assistant.

image/png

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 67.72
AI2 Reasoning Challenge (25-Shot) 62.29
HellaSwag (10-Shot) 79.08
MMLU (5-Shot) 69.44
TruthfulQA (0-shot) 54.08
Winogrande (5-shot) 73.40
GSM8k (5-shot) 68.01
Description
Model synced from source: Ba2han/Llama-Phi-3_DoRA
Readme 582 KiB