ModelHub XC 38cb072ce0 Initialize project; model provided by the ModelHub XC community
Model: okwinds/Human-Like-Qwen2.5-7B-Instruct
Source: Original Platform
2026-05-19 12:23:13 +08:00


---
license: apache-2.0
tags:
- axolotl
- dpo
- trl
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: Humanish-Qwen2.5-7B-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 72.84
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 34.48
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.49
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.42
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.76
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
datasets:
- okwinds/Human-Like-DPO-Dataset
language:
- en
---

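For a walkthrough of the paper behind this model, see the WeChat official account article 👇🏻

觉察流 - Where Does AI's "Human Touch" Come From? Building More Human-Like AI with DPO and LoRA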


Download

SDK download

# Install ModelScope (run in a shell)
pip install modelscope
# Download the model with the SDK (run in Python)
from modelscope import snapshot_download
model_dir = snapshot_download('okwinds/Human-Like-Qwen2.5-7B-Instruct')

Git download

# Download the model with Git
git clone https://www.modelscope.cn/okwinds/Human-Like-Qwen2.5-7B-Instruct.git

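Notice: this model is reposted in full from HumanLLMs/Human-Like-Qwen2.5-7B-Instruct on Hugging Face.
More model information follows below 👇🏻; it is a translation of the original model repository's description.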






Enhancing Human-Like Responses in Large Language Models

   | 🤖 Models   |    📊 Dataset   |    Paper walkthrough   |    📄 Paper   |

🚀 Human-Like-Qwen2.5-7B-Instruct

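This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct, optimized specifically to generate more human-like, conversational responses.

Fine-tuning combined Low-Rank Adaptation (LoRA) with Direct Preference Optimization (DPO) to improve natural language understanding, conversational coherence, and emotional intelligence in interactions.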
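As a rough illustration of the DPO objective mentioned above, the sketch below computes the standard DPO loss for a single preference pair in plain Python. The function name `dpo_loss`, the log-probability values, and `beta=0.1` are illustrative assumptions, not numbers or code from this model's training run, which was done with Axolotl.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed token log-probability of a response under
    either the trainable policy or the frozen reference model.
    beta is an illustrative value, not the one used for this model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# Hypothetical log-probabilities, for illustration only.
loss_at_start = dpo_loss(-42.0, -40.0, -42.0, -40.0)  # policy equals reference
loss_improved = dpo_loss(-38.0, -45.0, -42.0, -40.0)  # policy favors the chosen reply

print(loss_at_start, loss_improved)
```

When the policy equals the reference model the margin is zero and the loss is log 2; the loss drops as the policy raises the chosen response's probability relative to the rejected one.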

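The process used to create this model is described in detail in the research paper "Enhancing Human-Like Responses in Large Language Models".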

🛠️ Training Configuration

  • Base model: Qwen2.5-7B-Instruct
  • Framework: Axolotl v0.4.1
  • Hardware: 2x NVIDIA A100 (80 GB) GPUs
  • Training time: ~2 hours 15 minutes
  • Dataset: a synthetic dataset of about 11,000 samples covering 256 distinct topics
See axolotl config

axolotl version: 0.4.1

base_model: Qwen/Qwen2.5-7B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code: true

load_in_8bit: true
load_in_4bit: false
strict: false

chat_template: chatml
rl: dpo
datasets:
  - path: HumanLLMs/humanish-dpo-project
    type: chatml.prompt_pairs
    chat_template: chatml

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./humanish-qwen2.5-7b-instruct

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 4
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: Humanish-DPO
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

hub_model_id: HumanLLMs/Humanish-Qwen2.5-7B-Instruct

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

save_safetensors: true
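
A few quantities implied by this config can be worked out by hand. The sketch below is a back-of-the-envelope calculation, assuming plain data parallelism across the two A100s listed in the hardware notes (the config itself sets no `deepspeed` or `fsdp` options, so this is an assumption) and the 10,884-sample dataset size stated in this card. The helper `lora_params` and the 4096 layer width are hypothetical and only show how LoRA's parameter count scales.

```python
import math

# Values copied from the config and the hardware/dataset notes in this card.
micro_batch_size = 2
gradient_accumulation_steps = 8
lora_r = 8
lora_alpha = 4
val_set_size = 0.05
num_samples = 10_884   # dataset size stated in this card
num_gpus = 2           # ASSUMPTION: plain data parallelism over both A100s

# Preference pairs consumed per optimizer step.
effective_batch = micro_batch_size * gradient_accumulation_steps * num_gpus

# LoRA scales its low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_r

# Rough optimizer-step count for one epoch (ignores dataloader details).
train_samples = int(num_samples * (1 - val_set_size))
approx_steps = math.ceil(train_samples / effective_batch)

def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer."""
    return r * (d_in + d_out)

print(effective_batch, lora_scaling, approx_steps)
print(lora_params(4096, 4096, lora_r))  # 4096 is a hypothetical layer width
```

Under these assumptions the effective batch size is 32 pairs per step, the LoRA update is scaled by 0.5, and one epoch takes roughly 324 optimizer steps, so the 10 warmup steps cover only about the first 3% of training.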

💬 Prompt Template

You can use the ChatML prompt template when working with this model.

ChatML

<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}<|im_end|>
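
To make the layout above concrete, here is a minimal sketch that renders a message list into a ChatML string by hand. `render_chatml` is a hypothetical helper written for this illustration, not part of transformers; the tokenizer's own chat template should be preferred in practice, since it may differ in details such as default system prompts.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

The trailing open `<|im_start|>assistant` turn is what cues the model to generate the assistant reply rather than continue the user's message.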

This prompt template is available as a chat template, which means you can format messages with the tokenizer.apply_chat_template() method:

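# Assumes `tokenizer` and `model` are already loaded with transformers.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"}
]
# return_dict=True yields a mapping that can be unpacked into generate();
# add_generation_prompt=True opens the assistant turn for the reply.
gen_input = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
)
model.generate(**gen_input)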

🤖 Models

| Model | Download |
| --- | --- |
| Human-Like-Llama-3-8B-Instruct | 🤖 Modelscope |
| Human-Like-Qwen-2.5-7B-Instruct | 🤖 Modelscope |
| Human-Like-Mistral-Nemo-Instruct | 🤖 Modelscope |

🎯 Benchmark Results

| Group | Model | Average | IFEval | BBH | MATH Lvl 5 | GPQA | MuSR | MMLU-PRO |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama Models | Human-Like-Llama-3-8B-Instruct | 22.37 | 64.97 | 28.01 | 8.45 | 0.78 | 2.00 | 30.01 |
| | Llama-3-8B-Instruct | 23.57 | 74.08 | 28.24 | 8.68 | 1.23 | 1.60 | 29.60 |
| | Difference (Human-Like) | -1.20 | -9.11 | -0.23 | -0.23 | -0.45 | +0.40 | +0.41 |
| Qwen Models | Human-Like-Qwen-2.5-7B-Instruct | 26.66 | 72.84 | 34.48 | 0.00 | 6.49 | 8.42 | 37.76 |
| | Qwen-2.5-7B-Instruct | 26.86 | 75.85 | 34.89 | 0.00 | 5.48 | 8.45 | 36.52 |
| | Difference (Human-Like) | -0.20 | -3.01 | -0.41 | 0.00 | +1.01 | -0.03 | +1.24 |
| Mistral Models | Human-Like-Mistral-Nemo-Instruct | 22.88 | 54.51 | 32.70 | 7.62 | 5.03 | 9.39 | 28.00 |
| | Mistral-Nemo-Instruct | 23.53 | 63.80 | 29.68 | 5.89 | 5.37 | 8.48 | 27.97 |
| | Difference (Human-Like) | -0.65 | -9.29 | +3.02 | +1.73 | -0.34 | +0.91 | +0.03 |
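
Each "Difference (Human-Like)" row is the fine-tuned model's score minus the base model's score. A small sketch that reproduces the Qwen row from the reported numbers (the variable names are ours, chosen for illustration):

```python
BENCHMARKS = ["Average", "IFEval", "BBH", "MATH Lvl 5", "GPQA", "MuSR", "MMLU-PRO"]

# Scores copied from the Qwen rows of the table above.
human_like = [26.66, 72.84, 34.48, 0.00, 6.49, 8.42, 37.76]  # Human-Like-Qwen-2.5-7B-Instruct
base = [26.86, 75.85, 34.89, 0.00, 5.48, 8.45, 36.52]        # Qwen-2.5-7B-Instruct

# Fine-tuned minus base, rounded to the table's two decimals.
diff = {
    name: round(h - b, 2)
    for name, h, b in zip(BENCHMARKS, human_like, base)
}

for name, d in diff.items():
    print(f"{name:<11} {d:+.2f}")
```

For this model the largest drop is on IFEval (about 3 points), while GPQA and MMLU-PRO improve by roughly a point each.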

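📊 Dataset

The dataset used for fine-tuning was generated with LLaMA 3 models. It contains 10,884 samples covering 256 distinct topics, such as technology, daily life, science, history, and the arts. Each sample includes:

  • Human-like response: a natural, conversational answer that mimics human dialogue.
  • Formal response: a structured, precise answer with a more formal tone.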
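
The two response types above map naturally onto a DPO preference pair. The sketch below shows one plausible in-memory shape for a sample; the class and field names (`PreferencePair`, `prompt`, `chosen`, `rejected`) are hypothetical, the example text is made up rather than taken from the dataset, and treating the human-like reply as the preferred one is our assumption based on the model's stated goal.

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # human-like response (ASSUMED to be the preferred one)
    rejected: str  # formal response (ASSUMED to be the dispreferred one)

# Made-up sample for illustration; not taken from the actual dataset.
example = PreferencePair(
    prompt="What's your favorite way to spend a weekend?",
    chosen="Honestly? A slow morning with coffee, then a long walk somewhere green.",
    rejected="Common weekend activities include reading, exercise, and socializing.",
)

print(example.prompt)
```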

The dataset is open source and available at: