Jim Lai 1d6818846e Adding Evaluation Results (#1)
- Adding Evaluation Results (f20589f67a5e977b1c0b891c93514f84a8eb2a6c)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>
2024-09-17 13:36:39 +00:00

---
license: llama3
library_name: transformers
tags:
- mergekit
- merge
base_model:
- princeton-nlp/Llama-3-Instruct-8B-SimPO
- UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3
pipeline_tag: text-generation
model-index:
- name: Llama-3-Instruct-8B-SPPO-Iter3-SimPO-merge
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 42.71
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=grimjim/Llama-3-Instruct-8B-SPPO-Iter3-SimPO-merge
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 28.26
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=grimjim/Llama-3-Instruct-8B-SPPO-Iter3-SimPO-merge
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 9.37
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=grimjim/Llama-3-Instruct-8B-SPPO-Iter3-SimPO-merge
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.37
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=grimjim/Llama-3-Instruct-8B-SPPO-Iter3-SimPO-merge
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 9.54
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=grimjim/Llama-3-Instruct-8B-SPPO-Iter3-SimPO-merge
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 29.17
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=grimjim/Llama-3-Instruct-8B-SPPO-Iter3-SimPO-merge
      name: Open LLM Leaderboard
---

Llama-3-Instruct-8B-SPPO-Iter3-SimPO-merge

This is a merge of pre-trained language models created using mergekit.

Built with Meta Llama 3.

Merge Details

Merge Method

This model was merged using the SLERP (spherical linear interpolation) merge method, with UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3 as the base model.
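As a rough illustration of what SLERP does to each pair of matching weight tensors, here is a minimal NumPy sketch. It treats each tensor as one flattened vector, measures the angle between the two normalized vectors, and blends the original tensors along the arc between them, falling back to plain linear interpolation when the two are nearly colinear. The function name `slerp`, the `eps` value and the 0.9995 threshold are illustrative choices, not mergekit's exact code.

```python
import numpy as np


def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherically interpolate between two same-shaped weight tensors.

    t=0 returns v0 (treated here as the base model's tensor), t=1 returns v1.
    Illustrative sketch only; mergekit's own implementation may differ in details.
    """
    a = v0.ravel().astype(np.float64)
    b = v1.ravel().astype(np.float64)

    # Angle between the tensors is measured on normalized copies.
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = float(np.clip(np.dot(a_n, b_n), -1.0, 1.0))

    if abs(dot) > 0.9995:
        # Nearly colinear: the arc is numerically unstable, so use a linear blend.
        out = (1.0 - t) * a + t * b
    else:
        theta = np.arccos(dot)
        sin_theta = np.sin(theta)
        # Weights for the original (unnormalized) tensors along the arc.
        w0 = np.sin((1.0 - t) * theta) / sin_theta
        w1 = np.sin(t * theta) / sin_theta
        out = w0 * a + w1 * b

    return out.reshape(v0.shape).astype(v0.dtype)


if __name__ == "__main__":
    a = np.array([1.0, 0.0])
    b = np.array([0.0, 1.0])
    print(slerp(0.5, a, b))  # approximately [0.7071 0.7071]
```

Unlike a plain weighted average, this keeps the blend on the arc between the two directions: for two orthogonal unit vectors at t=0.5 it returns a unit-length result, whereas linear interpolation would give [0.5, 0.5], of length about 0.707.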

Models Merged

The following models were included in the merge:

- princeton-nlp/Llama-3-Instruct-8B-SimPO
- UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3

Configuration

The following YAML configuration was used to produce this model:

slices:
- sources:
  - model: princeton-nlp/Llama-3-Instruct-8B-SimPO
    layer_range:
    - 0
    - 32
  - model: UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3
    layer_range:
    - 0
    - 32
merge_method: slerp
base_model: UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3
parameters:
  t:
  - filter: self_attn
    value:
    - 0
    - 0.5
    - 0.3
    - 0.7
    - 1
  - filter: mlp
    value:
    - 1
    - 0.5
    - 0.7
    - 0.3
    - 0
  - value: 0.5
dtype: bfloat16
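In the configuration above, each `t` list gives anchor values rather than one value per layer. The sketch below expands them into per-layer values, assuming the anchors are spaced evenly from the first layer to the last and linearly interpolated in between; this is our reading of how mergekit treats list-valued parameters, so check its source before relying on exact numbers. Per mergekit's SLERP documentation, t=0 returns the base model (here UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3) and t=1 returns the other model. The helper name `layer_schedule` is hypothetical.

```python
import numpy as np

NUM_LAYERS = 32  # layer_range [0, 32] in the config covers layers 0..31

# Anchor values copied from the `t` parameter in the config above.
SELF_ATTN_T = [0, 0.5, 0.3, 0.7, 1]
MLP_T = [1, 0.5, 0.7, 0.3, 0]
DEFAULT_T = 0.5  # used for tensors matched by neither filter


def layer_schedule(anchors, num_layers=NUM_LAYERS):
    """Hypothetical helper: expand anchor values into one t per layer.

    Assumption: anchors are spread evenly over [0, 1], the first layer sits
    at 0 and the last at 1, and values in between are linearly interpolated.
    """
    layer_pos = np.linspace(0.0, 1.0, num_layers)
    anchor_pos = np.linspace(0.0, 1.0, len(anchors))
    return np.interp(layer_pos, anchor_pos, anchors)


if __name__ == "__main__":
    sa = layer_schedule(SELF_ATTN_T)
    mlp = layer_schedule(MLP_T)
    for layer in (0, 8, 16, 24, 31):
        print(f"layer {layer:2d}: self_attn t={sa[layer]:.3f}  mlp t={mlp[layer]:.3f}")
```

Because each self_attn anchor and the matching mlp anchor sum to 1, the two schedules mirror each other: under this reading, the first layers take attention weights from the SPPO-Iter3 base and MLP weights from SimPO, and the last layers do the reverse. Tensors matched by neither filter use the fixed `value: 0.5`.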

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                | 20.74 |
| IFEval (0-Shot)     | 42.71 |
| BBH (3-Shot)        | 28.26 |
| MATH Lvl 5 (4-Shot) |  9.37 |
| GPQA (0-shot)       |  5.37 |
| MuSR (0-shot)       |  9.54 |
| MMLU-PRO (5-shot)   | 29.17 |
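The Avg. row appears to be the unweighted mean of the six benchmark scores. A quick check, assuming a plain arithmetic mean over the rounded values shown in the table (the leaderboard itself may average unrounded scores):

```python
# Benchmark scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 42.71,
    "BBH (3-Shot)": 28.26,
    "MATH Lvl 5 (4-Shot)": 9.37,
    "GPQA (0-shot)": 5.37,
    "MuSR (0-shot)": 9.54,
    "MMLU-PRO (5-shot)": 29.17,
}

# Unweighted arithmetic mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints "Avg. 20.74"
```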
Description
Model synced from source: grimjim/Llama-3-Instruct-8B-SPPO-Iter3-SimPO-merge