---
license: other
license_name: microsoft-research-license
tags:
- merge
- mergekit
- lazymergekit
- microsoft/Orca-2-13b
- KoboldAI/LLaMA2-13B-Psyfighter2
base_model:
- KoboldAI/LLaMA2-13B-Psyfighter2
- microsoft/Orca-2-13b
model-index:
- name: Psyfighter2-Orca2-13B-ties
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
    metrics:
    - type: acc_norm
      value: 62.46
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
    metrics:
    - type: acc_norm
      value: 81.74
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
    metrics:
    - type: acc
      value: 60.31
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
    metrics:
    - type: acc
      value: 77.27
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
    metrics:
    - type: acc
      value: 43.67
      name: accuracy
---
Psyfighter2-Orca2-ties
Psyfighter2-Orca2-ties is a merge of the following models using mergekit:

- KoboldAI/LLaMA2-13B-Psyfighter2
- microsoft/Orca-2-13b

This is the very first merge I have ever attempted. The motivation is to try to create a 13B version of jebcarter/psyonic-cetacean-20B. I don't have a good GPU (GTX 1660 6GB), so although I can merge the model, I cannot actually run it. However, the Open LLM Leaderboard gives this merge an average score of 63.48, which is higher than both KoboldAI/LLaMA2-13B-Psyfighter2 and jebcarter/psyonic-cetacean-20B, so I must have done something right. The next step is to quantize this merge into GGUF so I can actually run it with KoboldCpp.
🧩 Configuration
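The `ties` suffix in the model name refers to TIES merging, which works in three steps: trim each model's difference from the base to its largest-magnitude entries, elect a sign per parameter, then average only the entries that agree with that sign. Below is a minimal pure-Python sketch of that idea on toy lists. It is not mergekit's implementation and does not reproduce the settings used for this merge; the function names `trim` and `ties_merge`, the `density` and `scale` parameters, and all numbers are illustrative assumptions.

```python
def trim(task_vector, density):
    """Keep the `density` fraction of entries with the largest magnitude; zero the rest."""
    k = max(1, round(len(task_vector) * density))
    ranked = sorted(range(len(task_vector)),
                    key=lambda i: abs(task_vector[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(task_vector)]


def ties_merge(base, finetuned_models, density=0.5, scale=1.0):
    """Toy TIES merge over flat lists of weights (hypothetical helper)."""
    # Task vector = fine-tuned weights minus base weights.
    task_vectors = [[f - b for f, b in zip(model, base)]
                    for model in finetuned_models]
    trimmed = [trim(tv, density) for tv in task_vectors]

    merged = []
    for i, b in enumerate(base):
        column = [tv[i] for tv in trimmed]
        # Elect the sign carrying the larger total magnitude.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # Average only the non-zero entries that agree with the elected sign.
        agreeing = [v for v in column if v != 0.0 and v * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + scale * delta)
    return merged


# Toy weights, chosen to be exact in binary floating point.
base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -0.25, 0.5, 0.125]
model_b = [0.5, 0.25, -0.75, 0.0]

merged = ties_merge(base, [model_a, model_b], density=0.5)
print(merged)  # [0.75, 0.0, -0.75, 0.0]
```

In the third position the two toy models disagree in sign, so only the larger-magnitude side (`-0.75`) survives instead of the two values cancelling out.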
Detailed results can be found here
| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 63.48 |
| AI2 Reasoning Challenge (25-Shot) | 62.46 |
| HellaSwag (10-Shot)               | 81.74 |
| MMLU (5-Shot)                     | 60.31 |
| TruthfulQA (0-shot)               | 55.40 |
| Winogrande (5-shot)               | 77.27 |
| GSM8k (5-shot)                    | 43.67 |
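
The reported average can be cross-checked against the six benchmark scores. The sketch below assumes the average is a plain unweighted mean rounded half up to two decimals; that rounding rule is an assumption, not something the leaderboard states here. `Decimal` is used so the rounding is not affected by binary floating-point error.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the results above, kept as strings for exact decimals.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": "62.46",
    "HellaSwag (10-Shot)": "81.74",
    "MMLU (5-Shot)": "60.31",
    "TruthfulQA (0-shot)": "55.40",
    "Winogrande (5-shot)": "77.27",
    "GSM8k (5-shot)": "43.67",
}

total = sum(Decimal(value) for value in scores.values())
# Unweighted mean, rounded half up to two decimal places (assumed rule).
average = (total / len(scores)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(average)  # 63.48
```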