---
license: apache-2.0
tags:
- miqu
datasets:
- jondurbin/truthy-dpo-v0.1
model-index:
- name: Miqu-6B-truthy
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 27.65
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/Miqu-6B-truthy
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 26.71
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/Miqu-6B-truthy
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 27.04
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/Miqu-6B-truthy
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 50.63
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/Miqu-6B-truthy
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 49.64
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/Miqu-6B-truthy
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 0.0
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/Miqu-6B-truthy
      name: Open LLM Leaderboard
---

Miqu-6B-truthy

A truthful Miqu of 6B parameters, built as an experiment.

{
  "results": {
    "truthfulqa_mc": {
      "mc1": 0.2521419828641371,
      "mc1_stderr": 0.01520152224629995,
      "mc2": 0.5051887026752994,
      "mc2_stderr": 0.016738600540275827
    }
  }
}
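The scores in this excerpt are fractions with standard errors, while the leaderboard reports percentages. As a reading aid, here is a minimal sketch that converts them to percentages with an approximate 95% interval. The `summarize` helper is a hypothetical name, and the interval of 1.96 standard errors is a normal-approximation assumption, not something the evaluation output itself states.

```python
import json

# Scores copied verbatim from the TruthfulQA results excerpt above.
RAW = """
{
  "results": {
    "truthfulqa_mc": {
      "mc1": 0.2521419828641371,
      "mc1_stderr": 0.01520152224629995,
      "mc2": 0.5051887026752994,
      "mc2_stderr": 0.016738600540275827
    }
  }
}
"""


def summarize(raw: str) -> dict:
    """Map each metric to (percent, half-width of an approximate 95% interval)."""
    scores = json.loads(raw)["results"]["truthfulqa_mc"]
    summary = {}
    for name, value in scores.items():
        if name.endswith("_stderr"):
            continue  # standard errors are consumed alongside their metric
        stderr = scores[f"{name}_stderr"]
        # Assumption: normal approximation, so 95% is about 1.96 standard errors.
        summary[name] = (round(100 * value, 2), round(100 * 1.96 * stderr, 2))
    return summary


if __name__ == "__main__":
    for metric, (percent, half_width) in summarize(RAW).items():
        print(f"{metric}: {percent:.2f}% +/- {half_width:.2f}")
```

Under that assumption the interval on mc2 is roughly three percentage points wide on each side, which is worth keeping in mind when comparing TruthfulQA scores between runs.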

Open LLM Leaderboard Evaluation Results

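Detailed results can be found on the Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/Miqu-6B-truthy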

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 30.28 |
| AI2 Reasoning Challenge (25-Shot) | 27.65 |
| HellaSwag (10-Shot)               | 26.71 |
| MMLU (5-Shot)                     | 27.04 |
| TruthfulQA (0-shot)               | 50.63 |
| Winogrande (5-shot)               | 49.64 |
| GSM8k (5-shot)                    |  0.00 |
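
The `Avg.` row can be reproduced from the six benchmark rows. The sketch below assumes it is the unweighted mean of those scores rounded to two decimals, which is consistent with the figures listed here; the `leaderboard_average` helper is a hypothetical name used only for illustration.

```python
from statistics import mean

# Benchmark scores copied from the results listed above.
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": 27.65,
    "HellaSwag (10-Shot)": 26.71,
    "MMLU (5-Shot)": 27.04,
    "TruthfulQA (0-shot)": 50.63,
    "Winogrande (5-shot)": 49.64,
    "GSM8k (5-shot)": 0.00,
}


def leaderboard_average(scores: dict) -> float:
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(mean(list(scores.values())), 2)


if __name__ == "__main__":
    print(f"Avg. {leaderboard_average(SCORES):.2f}")
```

Note that the GSM8k score of 0.00 counts as one sixth of the average, so it pulls the overall figure down noticeably.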
Description
Model synced from source: vicgalle/Miqu-6B-truthy