
---
license: cc
library_name: transformers
model-index:
- name: SOLAR-10.7b-Instruct-truthy-dpo
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 72.1
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 88.44
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 65.45
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 76.75
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 82.72
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 59.21
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo
      name: Open LLM Leaderboard
---

# SOLAR-10.7b-Instruct-truthy-dpo

*(model image: orca-bagel)*

This model is a DPO finetune of upstageai/Solar-10.7b-Instruct-v0.1 (see the process below).

## Process

1. I finetuned upstageai/Solar-10.7b-Instruct-v0.1 for 1 epoch on Intel/orca_dpo_pairs (12.4k samples).
2. I further finetuned that model for 3 epochs on jondurbin/truthy-dpo-v0.1 (1.04k samples).
3. This process is experimental, and the base model linked above is more thoroughly tested at this time.
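
Both finetuning stages above use Direct Preference Optimization (DPO). As a rough illustration of the objective being optimized (this is not the author's training script, and the `beta` value and log-probabilities are made-up numbers), the per-pair DPO loss can be sketched as:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).

    Minimizing this pushes the policy to prefer the 'chosen' answer over the
    'rejected' one more strongly than the frozen reference model does.
    """
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_logratio - ref_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))

# Hypothetical sequence log-probabilities for one preference pair:
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
```

In practice this loss is computed batch-wise over (prompt, chosen, rejected) triples such as those in the two DPO datasets listed above.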

## GGUF

Available here

## Evaluations

```
----Benchmark Complete----
2024-01-26 20:57:38

- Time taken: 25.4 mins
- Prompt Format: ChatML
- Model: macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo-GGUF
- Score (v2): 74.11
- Parseable: 171.0

Batch completed
Time taken: 25.5 mins
```
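
The benchmark above was run with the ChatML prompt format. As a minimal sketch of that template (the system prompt and message contents below are placeholders, not values from this card):

```python
def format_chatml(messages: list[dict[str, str]]) -> str:
    """Render a message list in ChatML: each turn is wrapped in
    <|im_start|>{role}\n{content}<|im_end|>, then a generation cue is appended."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return prompt + "<|im_start|>assistant\n"

example = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},  # placeholder
    {"role": "user", "content": "Hello!"},                          # placeholder
])
```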

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| SOLAR-10.7b-Instruct-truthy-dpo | 48.69 | 73.82 | 76.81 | 45.71 | 61.26 |

### AGIEval

| Task | Version | Metric | Value |  | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 27.95 | ± | 2.82 |
|  |  | acc_norm | 27.95 | ± | 2.82 |
| agieval_logiqa_en | 0 | acc | 42.40 | ± | 1.94 |
|  |  | acc_norm | 42.24 | ± | 1.94 |
| agieval_lsat_ar | 0 | acc | 25.65 | ± | 2.89 |
|  |  | acc_norm | 23.91 | ± | 2.82 |
| agieval_lsat_lr | 0 | acc | 54.12 | ± | 2.21 |
|  |  | acc_norm | 54.51 | ± | 2.21 |
| agieval_lsat_rc | 0 | acc | 69.89 | ± | 2.80 |
|  |  | acc_norm | 69.89 | ± | 2.80 |
| agieval_sat_en | 0 | acc | 80.10 | ± | 2.79 |
|  |  | acc_norm | 80.10 | ± | 2.79 |
| agieval_sat_en_without_passage | 0 | acc | 50.00 | ± | 3.49 |
|  |  | acc_norm | 49.51 | ± | 3.49 |
| agieval_sat_math | 0 | acc | 42.27 | ± | 3.34 |
|  |  | acc_norm | 41.36 | ± | 3.33 |

Average: 48.69%

### GPT4All

| Task | Version | Metric | Value |  | Stderr |
|---|---:|---|---:|---|---:|
| arc_challenge | 0 | acc | 59.90 | ± | 1.43 |
|  |  | acc_norm | 63.91 | ± | 1.40 |
| arc_easy | 0 | acc | 80.85 | ± | 0.81 |
|  |  | acc_norm | 78.16 | ± | 0.85 |
| boolq | 1 | acc | 88.20 | ± | 0.56 |
| hellaswag | 0 | acc | 68.34 | ± | 0.46 |
|  |  | acc_norm | 86.39 | ± | 0.34 |
| openbookqa | 0 | acc | 37.60 | ± | 2.17 |
|  |  | acc_norm | 46.80 | ± | 2.23 |
| piqa | 0 | acc | 78.84 | ± | 0.95 |
|  |  | acc_norm | 78.78 | ± | 0.95 |
| winogrande | 0 | acc | 74.51 | ± | 1.22 |

Average: 73.82%

### TruthfulQA

| Task | Version | Metric | Value |  | Stderr |
|---|---:|---|---:|---|---:|
| truthfulqa_mc | 1 | mc1 | 61.81 | ± | 1.70 |
|  |  | mc2 | 76.81 | ± | 1.42 |

Average: 76.81%

### Bigbench

| Task | Version | Metric | Value |  | Stderr |
|---|---:|---|---:|---|---:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 50.53 | ± | 3.64 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 63.14 | ± | 2.51 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 47.67 | ± | 3.12 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 26.18 | ± | 2.32 |
|  |  | exact_str_match | 0.00 | ± | 0.00 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 28.60 | ± | 2.02 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 21.29 | ± | 1.55 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 47.33 | ± | 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 39.80 | ± | 2.19 |
| bigbench_navigate | 0 | multiple_choice_grade | 63.80 | ± | 1.52 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 59.05 | ± | 1.10 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 40.18 | ± | 2.32 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 46.69 | ± | 1.58 |
| bigbench_snarks | 0 | multiple_choice_grade | 65.19 | ± | 3.55 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 72.41 | ± | 1.42 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 60.30 | ± | 1.55 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 25.76 | ± | 1.24 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.43 | ± | 0.91 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 47.33 | ± | 2.89 |

Average: 45.71%

Average score: 61.26%
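
As a quick sanity check on the reported numbers (my own arithmetic, not part of the original benchmark output): each suite average appears to be a plain mean over per-task scores, taking `acc_norm` where both `acc` and `acc_norm` are reported, and the overall score is the mean of the four suite averages:

```python
# Per-task GPT4All scores, copied from the table above
# (acc_norm where reported, otherwise the sole metric).
gpt4all = [63.91, 78.16, 88.20, 86.39, 46.80, 78.78, 74.51]
gpt4all_avg = sum(gpt4all) / len(gpt4all)   # ≈ 73.82, matching the table

# Suite averages from the summary table above.
suites = [48.69, 73.82, 76.81, 45.71]
overall = sum(suites) / len(suites)         # ≈ 61.26, matching the reported score
```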

Elapsed time: 02:16:03

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---:|
| Avg. | 74.11 |
| AI2 Reasoning Challenge (25-Shot) | 72.10 |
| HellaSwag (10-Shot) | 88.44 |
| MMLU (5-Shot) | 65.45 |
| TruthfulQA (0-shot) | 76.75 |
| Winogrande (5-shot) | 82.72 |
| GSM8k (5-shot) | 59.21 |