ModelHub XC 28b84ee215 Initialize project; model provided by the ModelHub XC community
Model: TeeZee/GALAXY_v03_slimorca_1_epoch_50k_DPO_1_epoch_30k
Source: Original Platform
2026-05-23 14:50:48 +08:00

---
language:
- en
license: apache-2.0
datasets:
- Open-Orca/SlimOrca
- allenai/ultrafeedback_binarized_cleaned
model-index:
- name: GALAXY_v03_slimorca_1_epoch_50k_DPO_1_epoch_30k
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 65.27
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/GALAXY_v03_slimorca_1_epoch_50k_DPO_1_epoch_30k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 85.62
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/GALAXY_v03_slimorca_1_epoch_50k_DPO_1_epoch_30k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 65.61
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/GALAXY_v03_slimorca_1_epoch_50k_DPO_1_epoch_30k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 53.46
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/GALAXY_v03_slimorca_1_epoch_50k_DPO_1_epoch_30k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 82.72
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/GALAXY_v03_slimorca_1_epoch_50k_DPO_1_epoch_30k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 0.08
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/GALAXY_v03_slimorca_1_epoch_50k_DPO_1_epoch_30k
      name: Open LLM Leaderboard
---

TeeZee/GALAXY-XB-v1.03-SFT-DPO

Experiment: can DUS (depth up-scaling) be taken one or more steps further?

Technical notes:

  • Model v03 was fine-tuned on 50k entries from the SlimOrca dataset, then trained with DPO on 30k entries from ultrachat.
  • 12 layers were removed from each of the two models. That is 4 more than in the original paper, but it is still 1/4 of all layers (48), the same fraction the original paper uses.
  • The base version of upstage/SOLAR-10.7B-v1.0 was used for the merge.
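
The layer arithmetic in the notes above can be sketched as follows. This is a minimal illustration, not the author's merge script: the function name `dus_layer_plan` is hypothetical, and the choice of which layers to drop (the last 12 of the first copy, the first 12 of the second) is an assumption carried over from the DUS recipe in the SOLAR paper rather than something this card states.

```python
def dus_layer_plan(n_layers: int, n_removed: int) -> list[tuple[str, int]]:
    """Return the (copy, layer_index) order of a depth up-scaled stack.

    Assumption (taken from the SOLAR paper's DUS recipe, not confirmed for
    this model): copy "A" drops its last `n_removed` layers, copy "B" drops
    its first `n_removed` layers, and the trimmed stacks are concatenated.
    """
    if not 0 <= n_removed <= n_layers:
        raise ValueError("n_removed must be between 0 and n_layers")
    first = [("A", i) for i in range(n_layers - n_removed)]
    second = [("B", i) for i in range(n_removed, n_layers)]
    return first + second


# Values from the notes above: a 48-layer base, 12 layers removed per copy.
plan = dus_layer_plan(n_layers=48, n_removed=12)
print(len(plan))            # 72 layers in the merged stack
print(plan[35], plan[36])   # seam between the two copies: ('A', 35) ('B', 12)
```

With the original paper's figures (32 layers, 8 removed per copy), the same function yields 48 layers, which matches the 48-layer base referred to in the notes.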

To evaluate:

  • Model performance after DPO: did it recover all of the initial performance loss caused by the merge?

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 58.79 |
| AI2 Reasoning Challenge (25-Shot) | 65.27 |
| HellaSwag (10-Shot) | 85.62 |
| MMLU (5-Shot) | 65.61 |
| TruthfulQA (0-shot) | 53.46 |
| Winogrande (5-shot) | 82.72 |
| GSM8k (5-shot) | 0.08 |
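
As a quick check, the reported average appears to be the unweighted mean of the six benchmark scores. A minimal sketch, with the scores copied from the results above and the variable names chosen only for illustration:

```python
# Benchmark scores as reported in the results above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 65.27,
    "HellaSwag (10-Shot)": 85.62,
    "MMLU (5-Shot)": 65.61,
    "TruthfulQA (0-shot)": 53.46,
    "Winogrande (5-shot)": 82.72,
    "GSM8k (5-shot)": 0.08,
}

# Unweighted mean over all six tasks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 58.79
```

The near-zero GSM8k score pulls the mean down noticeably; the other five tasks alone average about 70.5.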