ModelHub XC 7a55af502c Initialize project; model provided by the ModelHub XC community

Model: sethuiyer/OpenDolphinHermes_Llama2_7B
Source: Original Platform

2026-06-15 07:05:17 +08:00

language, license, library_name, tags, datasets, base_model, pipeline_tag, model-index

Field	Value
license	llama2
library_name	transformers
tags	merge, mergekit, lazymergekit
datasets	teknium/openhermes, cognitivecomputations/dolphin
base_model	cognitivecomputations/dolphin-llama2-7b, Tensoic/Llama-2-openhermes
pipeline_tag	text-generation
model-index name	OpenDolphinHermes_Llama2_7B

model-index results (task type: text-generation, task name: Text Generation)

Benchmark	Dataset type	Config	Split	num_few_shot	Metric	Value
AI2 Reasoning Challenge (25-Shot)	ai2_arc	ARC-Challenge	test	25	acc_norm (normalized accuracy)	55.03
HellaSwag (10-Shot)	hellaswag	-	validation	10	acc_norm (normalized accuracy)	78.74
MMLU (5-Shot)	cais/mmlu	all	test	5	acc (accuracy)	52.25
TruthfulQA (0-shot)	truthful_qa	multiple_choice	validation	0	mc2	46.1
Winogrande (5-shot)	winogrande	winogrande_xl	validation	5	acc (accuracy)	73.16
GSM8k (5-shot)	gsm8k	main	test	5	acc (accuracy)	20.17

Source for all results: Open LLM Leaderboard, https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/OpenDolphinHermes_Llama2_7B

OpenDolphinHermes_Llama2_7B

A mergekit SLERP merge of these two models: cognitivecomputations/dolphin-llama2-7b and Tensoic/Llama-2-openhermes.

🧩 Configuration

slices:
  - sources:
      - model: cognitivecomputations/dolphin-llama2-7b
        layer_range: [0, 32]
      - model: Tensoic/Llama-2-openhermes
        layer_range: [0, 32]
merge_method: slerp
base_model: Tensoic/Llama-2-openhermes
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
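
To make the configuration above more concrete, here is a minimal pure-Python sketch of spherical linear interpolation (SLERP) between two weight vectors, plus a helper that spreads a list of `t` anchor values across 32 layers. This is an illustration under stated assumptions, not mergekit's actual code: the function names `slerp` and `gradient_value` are hypothetical, the convention that `t = 0` returns the first vector and `t = 1` the second is assumed, and the linear interpolation of the anchor list over layer indices reflects my understanding of how mergekit treats such lists rather than anything stated in this README.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat vectors (lists of floats).

    Assumed convention: t=0 returns v0 and t=1 returns v1.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the vectors, clamped so acos stays in its domain.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Vectors are (anti)parallel: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def gradient_value(anchors, layer_idx, num_layers):
    """Linearly interpolate a list of anchor values across layer indices (assumed behavior)."""
    if len(anchors) == 1 or num_layers == 1:
        return anchors[0]
    # Map the layer index onto the anchor axis [0, len(anchors) - 1].
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]


# Anchor lists copied from the configuration above.
SELF_ATTN_T = [0, 0.5, 0.3, 0.7, 1]
MLP_T = [1, 0.5, 0.7, 0.3, 0]

for layer in (0, 8, 16, 24, 31):
    t_attn = gradient_value(SELF_ATTN_T, layer, 32)
    t_mlp = gradient_value(MLP_T, layer, 32)
    print(f"layer {layer:2d}: self_attn t={t_attn:.3f}, mlp t={t_mlp:.3f}")
```

Under these assumptions, the two anchor lists mirror each other, so layers where the attention weights lean toward one model have MLP weights leaning toward the other, while all remaining tensors use the fixed `t = 0.5`.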

Prompt Template (ChatML)

<|im_start|>system
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.
Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer to a question, please don't share false information.
<|im_end|>
<|im_start|>user
{ .Prompt}
<|im_end|>
<|im_start|>assistant
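
If you prefer to assemble the prompt by hand, the template above can be filled in with a small helper. This is a sketch only: `build_chatml_prompt` is a hypothetical name, and the exact placement of newlines around `<|im_end|>` simply follows the template as printed here, which may differ from what the tokenizer's own chat template produces.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Fill the ChatML template with a system message and a user message."""
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{user}\n<|im_end|>\n"
        # Leave the assistant turn open so the model writes the reply.
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt(
    "You are a helpful, respectful and honest assistant.",
    "What is a large language model?",
)
print(prompt)
```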

OpenLLM Leaderboard

T	Model	Average	ARC	HellaSwag	MMLU	TruthfulQA	Winogrande	GSM8K
0	meta-llama/llama-2-13b-hf	55.69	59.39	82.13	55.77	37.38	76.64	22.82
1	sethuiyer/OpenDolphinHermes_Llama2_7B	54.24	55.03	78.74	52.25	46.1	73.16	20.17
2	togethercomputer/Llama-2-7B-32K-Instruct	50.02	51.11	78.51	46.11	44.86	73.88	5.69
3	togethercomputer/LLaMa-2-7B-32K	47.07	47.53	76.14	43.33	39.23	71.9	4.32

Why?

I wanted a Llama 2 7B model that performs as well as the base Llama 2 13B model.

💻 Usage

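# Notebook (Colab/Jupyter) shell command; in a terminal, drop the leading "!".
!pip install -qU transformers accelerate

import torch
import transformers
from transformers import AutoTokenizer

model = "sethuiyer/OpenDolphinHermes_Llama2_7B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the chat messages into a single prompt string with the tokenizer's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and let accelerate place it on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens; by default the returned text includes the prompt.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])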

Output:

A large language model is a type of artificial intelligence system that has been trained on a massive amount of data, often millions or even billions of words, to learn the patterns and relationships between words and phrases.
These models can then be used to generate new text, understand and translate languages, and perform various natural language processing tasks.
They have become increasingly popular in recent years due to advances in machine learning technology and their ability to achieve high levels of accuracy and performance on natural language processing tasks.
Examples of large language models include GPT-2, BERT, and T5.

Thanks

Thanks to Google Colab for the compute.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	54.24
AI2 Reasoning Challenge (25-Shot)	55.03
HellaSwag (10-Shot)	78.74
MMLU (5-Shot)	52.25
TruthfulQA (0-shot)	46.10
Winogrande (5-shot)	73.16
GSM8k (5-shot)	20.17
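
As a quick sanity check on the table above, the `Avg.` row appears to be the unweighted mean of the six benchmark scores. The snippet below reproduces that arithmetic; the variable names are illustrative, and the assumption that the leaderboard uses a plain mean is mine, although it is consistent with the reported numbers.

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 55.03,
    "HellaSwag (10-Shot)": 78.74,
    "MMLU (5-Shot)": 52.25,
    "TruthfulQA (0-shot)": 46.10,
    "Winogrande (5-shot)": 73.16,
    "GSM8k (5-shot)": 20.17,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 54.24
```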