
---
license: cc
library_name: transformers
model-index:
- name: SOLAR-10.7b-Instruct-truthy-dpo
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 72.1
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 88.44
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 65.45
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 76.75
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 82.72
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 59.21
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo
      name: Open LLM Leaderboard
---

# SOLAR-10.7b-Instruct-truthy-dpo

*(model image: orca-bagel)*

This model is a DPO finetune of upstageai/Solar-10.7b-Instruct-v0.1 (see the process below).

## Process

1. I finetuned upstageai/Solar-10.7b-Instruct-v0.1 for 1 epoch on Intel/orca_dpo_pairs (12.4k samples).
2. I further finetuned that model for 3 epochs on jondurbin/truthy-dpo-v0.1 (1.04k samples).
3. This process is experimental, and the base model linked above is more thoroughly tested at this time.
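
Both finetuning stages above use Direct Preference Optimization (DPO). As a rough illustration of the objective being optimized (this is not the author's training script, and the `beta` value and log-probabilities are made-up numbers), the per-pair DPO loss can be sketched as:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).

    Minimizing this pushes the policy to prefer the 'chosen' answer over the
    'rejected' one more strongly than the frozen reference model does.
    """
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_logratio - ref_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))

# Hypothetical sequence log-probabilities for one preference pair:
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
```

In practice this loss is computed batch-wise over (prompt, chosen, rejected) triples such as those in the two DPO datasets listed above.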

## GGUF

Available here

## Evaluations

```
----Benchmark Complete----
2024-01-26 20:57:38

- Time taken: 25.4 mins
- Prompt Format: ChatML
- Model: macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo-GGUF
- Score (v2): 74.11
- Parseable: 171.0

Batch completed
Time taken: 25.5 mins
```
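
The benchmark above was run with the ChatML prompt format. As a minimal sketch of that template (the system prompt and message contents below are placeholders, not values from this card):

```python
def format_chatml(messages: list[dict[str, str]]) -> str:
    """Render a message list in ChatML: each turn is wrapped in
    <|im_start|>{role}\n{content}<|im_end|>, then a generation cue is appended."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return prompt + "<|im_start|>assistant\n"

example = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},  # placeholder
    {"role": "user", "content": "Hello!"},                          # placeholder
])
```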

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| SOLAR-10.7b-Instruct-truthy-dpo | 48.69 | 73.82 | 76.81 | 45.71 | 61.26 |

### AGIEval

| Task | Version | Metric | Value |  | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 27.95 | ± | 2.82 |
|  |  | acc_norm | 27.95 | ± | 2.82 |
| agieval_logiqa_en | 0 | acc | 42.40 | ± | 1.94 |
|  |  | acc_norm | 42.24 | ± | 1.94 |
| agieval_lsat_ar | 0 | acc | 25.65 | ± | 2.89 |
|  |  | acc_norm | 23.91 | ± | 2.82 |
| agieval_lsat_lr | 0 | acc | 54.12 | ± | 2.21 |
|  |  | acc_norm | 54.51 | ± | 2.21 |
| agieval_lsat_rc | 0 | acc | 69.89 | ± | 2.80 |
|  |  | acc_norm | 69.89 | ± | 2.80 |
| agieval_sat_en | 0 | acc | 80.10 | ± | 2.79 |
|  |  | acc_norm | 80.10 | ± | 2.79 |
| agieval_sat_en_without_passage | 0 | acc | 50.00 | ± | 3.49 |
|  |  | acc_norm | 49.51 | ± | 3.49 |
| agieval_sat_math | 0 | acc | 42.27 | ± | 3.34 |
|  |  | acc_norm | 41.36 | ± | 3.33 |

Average: 48.69%

### GPT4All

| Task | Version | Metric | Value |  | Stderr |
|---|---:|---|---:|---|---:|
| arc_challenge | 0 | acc | 59.90 | ± | 1.43 |
|  |  | acc_norm | 63.91 | ± | 1.40 |
| arc_easy | 0 | acc | 80.85 | ± | 0.81 |
|  |  | acc_norm | 78.16 | ± | 0.85 |
| boolq | 1 | acc | 88.20 | ± | 0.56 |
| hellaswag | 0 | acc | 68.34 | ± | 0.46 |
|  |  | acc_norm | 86.39 | ± | 0.34 |
| openbookqa | 0 | acc | 37.60 | ± | 2.17 |
|  |  | acc_norm | 46.80 | ± | 2.23 |
| piqa | 0 | acc | 78.84 | ± | 0.95 |
|  |  | acc_norm | 78.78 | ± | 0.95 |
| winogrande | 0 | acc | 74.51 | ± | 1.22 |

Average: 73.82%

### TruthfulQA

| Task | Version | Metric | Value |  | Stderr |
|---|---:|---|---:|---|---:|
| truthfulqa_mc | 1 | mc1 | 61.81 | ± | 1.70 |
|  |  | mc2 | 76.81 | ± | 1.42 |

Average: 76.81%

### Bigbench

| Task | Version | Metric | Value |  | Stderr |
|---|---:|---|---:|---|---:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 50.53 | ± | 3.64 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 63.14 | ± | 2.51 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 47.67 | ± | 3.12 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 26.18 | ± | 2.32 |
|  |  | exact_str_match | 0.00 | ± | 0.00 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 28.60 | ± | 2.02 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 21.29 | ± | 1.55 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 47.33 | ± | 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 39.80 | ± | 2.19 |
| bigbench_navigate | 0 | multiple_choice_grade | 63.80 | ± | 1.52 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 59.05 | ± | 1.10 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 40.18 | ± | 2.32 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 46.69 | ± | 1.58 |
| bigbench_snarks | 0 | multiple_choice_grade | 65.19 | ± | 3.55 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 72.41 | ± | 1.42 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 60.30 | ± | 1.55 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 25.76 | ± | 1.24 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.43 | ± | 0.91 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 47.33 | ± | 2.89 |

Average: 45.71%

Average score: 61.26%
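
As a quick sanity check on the reported numbers (my own arithmetic, not part of the original benchmark output): each suite average appears to be a plain mean over per-task scores, taking `acc_norm` where both `acc` and `acc_norm` are reported, and the overall score is the mean of the four suite averages:

```python
# Per-task GPT4All scores, copied from the table above
# (acc_norm where reported, otherwise the sole metric).
gpt4all = [63.91, 78.16, 88.20, 86.39, 46.80, 78.78, 74.51]
gpt4all_avg = sum(gpt4all) / len(gpt4all)   # ≈ 73.82, matching the table

# Suite averages from the summary table above.
suites = [48.69, 73.82, 76.81, 45.71]
overall = sum(suites) / len(suites)         # ≈ 61.26, matching the reported score
```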

Elapsed time: 02:16:03

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---:|
| Avg. | 74.11 |
| AI2 Reasoning Challenge (25-Shot) | 72.10 |
| HellaSwag (10-Shot) | 88.44 |
| MMLU (5-Shot) | 65.45 |
| TruthfulQA (0-shot) | 76.75 |
| Winogrande (5-shot) | 82.72 |
| GSM8k (5-shot) | 59.21 |