
Model: okwinds/Human-Like-Qwen2.5-7B-Instruct
Source: Original Platform

2026-05-19 12:23:13 +08:00


license: apache-2.0
tags:
  - axolotl
  - dpo
  - trl
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: transformers
model-index:
  - name: Humanish-Qwen2.5-7B-Instruct
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 72.84
            name: strict accuracy
        source:
          url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 34.48
            name: normalized accuracy
        source:
          url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 0
            name: exact match
        source:
          url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 6.49
            name: acc_norm
        source:
          url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 8.42
            name: acc_norm
        source:
          url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 37.76
            name: accuracy
        source:
          url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
          name: Open LLM Leaderboard
datasets:
  - okwinds/Human-Like-DPO-Dataset
language:

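For a walkthrough of this model's paper, see the WeChat official-account article 👇🏻

觉察流 - Where does an AI's "human touch" come from? DPO and LoRA for a more human-like AI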

Download

SDK download

# Install ModelScope
pip install modelscope

# Download the model with the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('okwinds/Human-Like-Qwen2.5-7B-Instruct')

Git download

# Clone the model repository with Git
git clone https://www.modelscope.cn/okwinds/Human-Like-Qwen2.5-7B-Instruct.git

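Note: this model is mirrored in full from HumanLLMs/Human-Like-Qwen2.5-7B-Instruct on Hugging Face.
More model information follows below 👇🏻; it is based on the original model repository's description.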


Enhancing Human-Like Responses in Large Language Models

| 🤖 Models | 📊 Dataset | Paper walkthrough | 📄 Paper |

🚀 Human-Like-Qwen2.5-7B-Instruct

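This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct, optimized specifically to generate more human-like, conversational responses.

Fine-tuning combined Low-Rank Adaptation (LoRA) with Direct Preference Optimization (DPO) to improve natural language understanding, conversational coherence, and emotional intelligence in interactions.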
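To make the DPO half of that recipe concrete, here is a minimal sketch of the per-example DPO loss in plain Python. It follows the standard formulation and is not code from this model's training run; the function names, the example log-probabilities, and `beta=0.1` are illustrative assumptions (this card does not state the beta that was used).

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed token log-probabilities.

    The policy is the model being trained (here, base model plus LoRA
    adapter); the reference is the frozen base model.
    beta=0.1 is an assumed value, not taken from this card.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


# Made-up log-probabilities: the policy prefers the chosen answer
# slightly more than the reference does, so the loss drops below log(2).
example_loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
print(f"{example_loss:.4f}")
```

When policy and reference agree exactly, the margin is zero and the loss equals log(2); training pushes the margin up so that the preferred (human-like) answer gains probability relative to the rejected one.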

How the model was created is described in detail in the research paper "Enhancing Human-Like Responses in Large Language Models".

🛠️ Training Configuration

Base model: Qwen2.5-7B-Instruct
Framework: Axolotl v0.4.1
Hardware: 2x NVIDIA A100 (80 GB) GPUs
Training time: ~2 hours 15 minutes
Dataset: synthetic dataset of about 11,000 samples covering 256 distinct topics

See axolotl config

axolotl version: 0.4.1

base_model: Qwen/Qwen2.5-7B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code: true

load_in_8bit: true
load_in_4bit: false
strict: false

chat_template: chatml
rl: dpo
datasets:
  - path: HumanLLMs/humanish-dpo-project
    type: chatml.prompt_pairs
    chat_template: chatml

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./humanish-qwen2.5-7b-instruct

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 4
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: Humanish-DPO
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

hub_model_id: HumanLLMs/Humanish-Qwen2.5-7B-Instruct

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

save_safetensors: true
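
A few quantities follow directly from the config values above, and the sketch below works them out. It assumes plain data parallelism across the two A100s listed under hardware (the config leaves `deepspeed` and `fsdp` empty, so this is an assumption), takes the 10,884-sample dataset size from the dataset section of this card, and approximates the training split as 95% of it. The function name and the dictionary keys are hypothetical.

```python
import math


def derive_training_numbers(
    micro_batch_size=2,
    gradient_accumulation_steps=8,
    num_gpus=2,            # assumption: both GPUs train data-parallel
    lora_r=8,
    lora_alpha=4,
    dataset_size=10_884,   # from the dataset section of this card
    val_set_size=0.05,
):
    """Derive batch size, LoRA scaling, and a rough step count."""
    effective_batch = micro_batch_size * gradient_accumulation_steps * num_gpus
    # Standard LoRA scales the low-rank update by alpha / r.
    lora_scaling = lora_alpha / lora_r
    # Approximation: the exact split depends on how the trainer rounds.
    train_examples = int(dataset_size * (1 - val_set_size))
    approx_steps = math.ceil(train_examples / effective_batch)
    return {
        "effective_batch_size": effective_batch,
        "lora_scaling": lora_scaling,
        "approx_train_examples": train_examples,
        "approx_optimizer_steps": approx_steps,
    }


numbers = derive_training_numbers()
print(numbers)
```

Under these assumptions each optimizer step sees 32 preference pairs and the LoRA update is scaled by 0.5, so one epoch amounts to a few hundred optimizer steps.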

💬 Prompt Template

You can use the ChatML prompt template with this model:

ChatML

<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}<|im_end|>
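
As an illustration of the template above, the following sketch renders a message list into a ChatML string by hand. The helper name `to_chatml` is hypothetical, and the chat template shipped with the tokenizer may differ in details (for example, by inserting a default system prompt), so treat this as a way to understand the format rather than a replacement for the tokenizer.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render [{"role": ..., "content": ...}, ...] as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```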

This prompt template is available as a chat template, which means you can format messages with the tokenizer.apply_chat_template() method:

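# Assumes `tokenizer` and `model` have already been loaded with transformers.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"}
]
# return_dict=True returns input_ids and attention_mask, so ** unpacking works.
gen_input = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
)
model.generate(**gen_input)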

🤖 Model Collection

Model	Download
Human-Like-Llama-3-8B-Instruct	🤖 Modelscope
Human-Like-Qwen-2.5-7B-Instruct	🤖 Modelscope
Human-Like-Mistral-Nemo-Instruct	🤖 Modelscope

🎯 Benchmark Results

Group	Model	Average	IFEval	BBH	MATH Lvl 5	GPQA	MuSR	MMLU-PRO
Llama Models	Human-Like-Llama-3-8B-Instruct	22.37	64.97	28.01	8.45	0.78	2.00	30.01
	Llama-3-8B-Instruct	23.57	74.08	28.24	8.68	1.23	1.60	29.60
	Difference (Human-Like)	-1.20	-9.11	-0.23	-0.23	-0.45	+0.40	+0.41
Qwen Models	Human-Like-Qwen-2.5-7B-Instruct	26.66	72.84	34.48	0.00	6.49	8.42	37.76
	Qwen-2.5-7B-Instruct	26.86	75.85	34.89	0.00	5.48	8.45	36.52
	Difference (Human-Like)	-0.20	-3.01	-0.41	0.00	+1.01	-0.03	+1.24
Mistral Models	Human-Like-Mistral-Nemo-Instruct	22.88	54.51	32.70	7.62	5.03	9.39	28.00
	Mistral-Nemo-Instruct	23.53	63.80	29.68	5.89	5.37	8.48	27.97
	Difference (Human-Like)	-0.65	-9.29	+3.02	+1.73	-0.34	+0.91	+0.03
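
The "Difference (Human-Like)" rows are the fine-tuned score minus the base score, and the Average column is the mean of the six benchmarks. The sketch below re-derives both for the Qwen rows from the numbers in the table; the variable and function names are hypothetical, and because the table shows rounded scores, a recomputed average can differ from the printed one in the last digit.

```python
BENCHMARKS = ["IFEval", "BBH", "MATH Lvl 5", "GPQA", "MuSR", "MMLU-PRO"]

# Scores copied from the Qwen rows of the table above.
human_like_qwen = [72.84, 34.48, 0.00, 6.49, 8.42, 37.76]
base_qwen = [75.85, 34.89, 0.00, 5.48, 8.45, 36.52]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def differences(tuned, base):
    """Per-benchmark (tuned - base), rounded to two decimals."""
    return [round(t - b, 2) for t, b in zip(tuned, base)]


qwen_diffs = dict(zip(BENCHMARKS, differences(human_like_qwen, base_qwen)))
print(qwen_diffs)
print(f"averages: {mean(human_like_qwen):.3f} vs {mean(base_qwen):.3f}")
```

Read this way, the Qwen variant gives up about three points on IFEval while gaining roughly a point each on GPQA and MMLU-PRO, which is why the averages end up within a fraction of a point of each other.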

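📊 Dataset

The dataset used for fine-tuning was generated with LLaMA 3 models. It contains 10,884 samples covering 256 distinct topics such as technology, daily life, science, history, and the arts. Each sample includes:

Human-like response: a natural, conversational answer that mimics human dialogue.
Formal response: a structured, precise answer in a more formal tone.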
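
For DPO, each sample has to become a preference pair. The sketch below shows one plausible mapping, assuming the human-like response is the preferred ("chosen") answer and the formal response is the "rejected" one, since the goal is more human-like output. The field names `prompt`, `chosen`, and `rejected` follow a common DPO convention and are an assumption about this dataset; the helper name and the example texts are invented for illustration.

```python
def to_dpo_record(question, human_like_response, formal_response):
    """Build one preference pair (field names are assumed, not confirmed)."""
    return {
        "prompt": question,
        "chosen": human_like_response,   # assumed preferred: conversational
        "rejected": formal_response,     # assumed dispreferred: formal
    }


# Invented example texts, not taken from the dataset.
record = to_dpo_record(
    "What's a good way to start learning to cook?",
    "Honestly? Pick one dish you love and just make it a few times!",
    "It is advisable to begin with fundamental techniques and simple recipes.",
)
print(record["chosen"])
```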

The dataset is open source and available here:

👉 Human-Like-DPO-Dataset
