---
language:
- en
license: apache-2.0
datasets:
- Open-Orca/SlimOrca
model-index:
- name: NEBULA-XB-v1.0
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 56.66
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/NEBULA-XB-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 81.78
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/NEBULA-XB-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 60.98
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/NEBULA-XB-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 44.03
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/NEBULA-XB-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 77.66
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/NEBULA-XB-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 0.0
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/NEBULA-XB-v1.0
      name: Open LLM Leaderboard
---

# TeeZee/NEBULA-XB-v1.03

Experiment: can DUS (depth up-scaling) be taken one or more steps further?

## Technical notes

- pretrained model v03 finetuned on 50k entries from the SlimOrca dataset
- 18 layers removed from both copies of the finetuned GALAXY-XB-v03
- resulting model has 108 layers: (((48-12)*2)-18)*2 = 108
- second step in the scaled-up DUS procedure
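The layer arithmetic above can be sketched as a small helper (an illustrative sketch; the function name is ours, not part of the released model):

```python
def dus_step(num_layers: int, trim: int) -> int:
    """One depth up-scaling (DUS) step: trim `trim` layers from each of
    two copies of the model, then stack the trimmed copies."""
    return (num_layers - trim) * 2

# First step:  48-layer base, 12 layers trimmed per copy -> 72 layers
step1 = dus_step(48, 12)
# Second step: 18 layers trimmed per copy of the 72-layer model -> 108 layers
step2 = dus_step(step1, 18)
print(step1, step2)  # 72 108
```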

## To evaluate

- model performance after the merge; it should be slightly lower than GALAXY finetuned on 50k entries of SlimOrca

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 53.52 |
| AI2 Reasoning Challenge (25-Shot) | 56.66 |
| HellaSwag (10-Shot)               | 81.78 |
| MMLU (5-Shot)                     | 60.98 |
| TruthfulQA (0-shot)               | 44.03 |
| Winogrande (5-shot)               | 77.66 |
| GSM8k (5-shot)                    |  0.00 |
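The leaderboard average is the unweighted mean of the six benchmark scores, which can be verified directly:

```python
# Benchmark scores from the leaderboard table above
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 56.66,
    "HellaSwag (10-Shot)": 81.78,
    "MMLU (5-Shot)": 60.98,
    "TruthfulQA (0-shot)": 44.03,
    "Winogrande (5-shot)": 77.66,
    "GSM8k (5-shot)": 0.00,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 53.52
```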