ModelHub XC 38cb072ce0 Initialize project; model provided by the ModelHub XC community
Model: okwinds/Human-Like-Qwen2.5-7B-Instruct
Source: Original Platform
2026-05-19 12:23:13 +08:00

---
license: apache-2.0
tags:
- axolotl
- dpo
- trl
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: Humanish-Qwen2.5-7B-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 72.84
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 34.48
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.49
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.42
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.76
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
datasets:
- okwinds/Human-Like-DPO-Dataset
language:
- en
---
# For a walkthrough of this model's paper, see the WeChat official account article 👇🏻
### <img src="https://www.modelscope.cn/datasets/okwinds/Human-Like-DPO-Dataset/resolve/master/wechat.png" width="30" height="30" align="absmiddle"> 觉察流 - [Where does AI's "human touch" come from? Building more human-like AI with DPO and LoRA](https://mp.weixin.qq.com/s/59WEBKi0uGYCwOXsd5FgCw)
<br/>
# Download
SDK download
```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model with the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('okwinds/Human-Like-Qwen2.5-7B-Instruct')
```
Git download
```bash
# Download the model with Git
git clone https://www.modelscope.cn/okwinds/Human-Like-Qwen2.5-7B-Instruct.git
```
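> <span style="color:red;font-size:16px"> Notice: this model is reposted in full from [HumanLLMs/Human-Like-Qwen2.5-7B-Instruct](https://huggingface.co/HumanLLMs/Human-Like-Qwen2.5-7B-Instruct) on Hugging Face. <br/>For more model information, see below 👇🏻; the description that follows is a translation of the original model repository's README.</span>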
<br/>
#### _Repository author is here 👇🏻 scan the QR code_
<img src="https://www.modelscope.cn/models/okwinds/GPT-2/resolve/master/qrcode_for_jcl_258.jpg" />
_______________________________
<br/>
<br/>
<div align="center">
<img src="https://www.modelscope.cn/models/okwinds/Human-Like-Qwen2.5-7B-Instruct/resolve/master/avatar.jpeg" width="320" height="320" />
<h1>Enhancing Human-Like Responses in Large Language Models</h1>
</div>
<p align="center">
&nbsp;&nbsp; | 🤖 <a href="https://www.modelscope.cn/collections/Human-Like-nirenyingda-38b077cf6d0a44">Models</a>&nbsp;&nbsp; |
&nbsp;&nbsp; 📊 <a href="https://www.modelscope.cn/datasets/okwinds/Human-Like-DPO-Dataset">Dataset</a>&nbsp;&nbsp; |
&nbsp;&nbsp; <img src="https://www.modelscope.cn/models/okwinds/Human-Like-Qwen2.5-7B-Instruct/resolve/master/wechat.png" width="22" height="22" align="absmiddle"> <a href="https://mp.weixin.qq.com/s/59WEBKi0uGYCwOXsd5FgCw">Paper walkthrough</a>&nbsp;&nbsp; |
&nbsp;&nbsp; 📄<a href="https://arxiv.org/abs/2501.05032">Paper</a>&nbsp;&nbsp; |
</p>
# 🚀 Human-Like-Qwen2.5-7B-Instruct
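This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct, optimized specifically to produce more human-like, conversational responses.
Fine-tuning combined Low-Rank Adaptation (LoRA) with Direct Preference Optimization (DPO) to improve natural language understanding, conversational coherence, and emotional intelligence in interactions.
How the model was built is described in detail in the research paper ["Enhancing Human-Like Responses in Large Language Models"](https://mp.weixin.qq.com/s/59WEBKi0uGYCwOXsd5FgCw).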
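
As a rough illustration of what DPO optimizes, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It is a minimal sketch, not the training code used for this model: the function name `dpo_loss` and the value `beta=0.1` are assumptions chosen for illustration, and the README does not state which `beta` was used.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    beta is a hypothetical value; the README does not specify it.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)) written in a numerically friendlier form.
    return math.log1p(math.exp(-margin))

# With identical policy and reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Raising the chosen response's likelihood relative to the rejected one lowers the loss.
improved = dpo_loss(-0.5, -2.0, -1.0, -1.0)
```

The loss falls as the policy assigns relatively more probability to the preferred (here, human-like) response than the reference model does.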
# 🛠️ Training Configuration
- **Base model:** Qwen2.5-7B-Instruct
- **Framework:** Axolotl v0.4.1
- **Hardware:** 2x NVIDIA A100 (80 GB) GPUs
- **Training time:** ~2 hours 15 minutes
- **Dataset:** Synthetic dataset of about 11,000 samples covering 256 distinct topics
<details><summary>See axolotl config</summary>
axolotl version: `0.4.1`
```yaml
base_model: Qwen/Qwen2.5-7B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
load_in_8bit: true
load_in_4bit: false
strict: false
chat_template: chatml
rl: dpo
datasets:
  - path: HumanLLMs/humanish-dpo-project
    type: chatml.prompt_pairs
    chat_template: chatml
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./humanish-qwen2.5-7b-instruct
sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true
adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 4
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project: Humanish-DPO
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
hub_model_id: HumanLLMs/Humanish-Qwen2.5-7B-Instruct
gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:
warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
save_safetensors: true
```
</details><br>
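
Two quantities follow directly from the configuration values above: the effective batch size per optimizer step and the LoRA scaling factor. The sketch below works them out. It assumes plain data parallelism across the two GPUs listed under hardware and the conventional LoRA scaling of `lora_alpha / lora_r`; neither assumption is stated in the config itself.

```python
# Values copied from the axolotl config and the hardware line above.
micro_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 2  # assumption: both A100s are used as data-parallel workers

# Number of preference pairs contributing to each optimizer step.
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus

lora_r = 8
lora_alpha = 4
# Conventional LoRA scaling applied to the low-rank update (assumes no rank-stabilized variant).
lora_scaling = lora_alpha / lora_r

print(f"effective batch size: {effective_batch_size}")
print(f"LoRA scaling factor: {lora_scaling}")
```

With `lora_alpha` below `lora_r`, the adapter update is scaled down rather than up, which is a comparatively conservative setting.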
# 💬 Prompt Template
When using the model, you can format prompts with the ChatML prompt template.
### ChatML
```
<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}<|im_end|>
```
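
To make the layout above concrete, here is a minimal sketch that renders a list of messages into a ChatML string by hand. The helper `render_chatml` is hypothetical and written only for illustration; the tokenizer's own chat template may differ in details (for example, it may insert a default system message), so prefer the tokenizer method for real use.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render [{'role': ..., 'content': ...}, ...] as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```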
This prompt template is available as a chat template, which means you can format messages with the `tokenizer.apply_chat_template()` method:
```python
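from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HumanLLMs/Human-Like-Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
]
# add_generation_prompt opens the assistant turn; return_dict yields a mapping that can be unpacked.
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt")
output = model.generate(**gen_input)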
```
# 🤖 Models
| Model | Download |
|:---------------------:|:-----------------------------------------------------------------------:|
| Human-Like-Llama-3-8B-Instruct | 🤖 [Modelscope](https://www.modelscope.cn/models/okwinds/Human-Like-LLama3-8B-Instruct) |
| Human-Like-Qwen-2.5-7B-Instruct | 🤖 [Modelscope](https://www.modelscope.cn/models/okwinds/Human-Like-Qwen2.5-7B-Instruct) |
| Human-Like-Mistral-Nemo-Instruct | 🤖 [Modelscope](https://www.modelscope.cn/models/okwinds/Human-Like-Mistral-Nemo-Instruct-2407) |
<!--# 🔄 Quantizationed versions
## GGUF [@bartowski](https://huggingface.co/bartowski)
- https://huggingface.co/bartowski/Human-Like-LLama3-8B-Instruct-GGUF
- https://huggingface.co/bartowski/Human-Like-Qwen2.5-7B-Instruct-GGUF
- https://huggingface.co/bartowski/Human-Like-Mistral-Nemo-Instruct-2407-GGUF
-->
# 🎯 Benchmark Results
| **Group** | **Model** | **Average** | **IFEval** | **BBH** | **MATH Lvl 5** | **GPQA** | **MuSR** | **MMLU-PRO** |
|--------------------------------|--------------------------------|-------------|------------|---------|----------------|----------|----------|--------------|
| **Llama Models** | Human-Like-Llama-3-8B-Instruct | 22.37 | **64.97** | 28.01 | 8.45 | 0.78 | **2.00** | 30.01 |
| | Llama-3-8B-Instruct | 23.57 | 74.08 | 28.24 | 8.68 | 1.23 | 1.60 | 29.60 |
| | *Difference (Human-Like)* | -1.20 | **-9.11** | -0.23 | -0.23 | -0.45 | +0.40 | +0.41 |
| **Qwen Models** | Human-Like-Qwen-2.5-7B-Instruct | 26.66 | 72.84 | 34.48 | 0.00 | 6.49 | 8.42 | 37.76 |
| | Qwen-2.5-7B-Instruct | 26.86 | 75.85 | 34.89 | 0.00 | 5.48 | 8.45 | 36.52 |
| | *Difference (Human-Like)* | -0.20 | -3.01 | -0.41 | 0.00 | **+1.01**| -0.03 | **+1.24** |
| **Mistral Models** | Human-Like-Mistral-Nemo-Instruct | 22.88 | **54.51** | 32.70 | 7.62 | 5.03 | 9.39 | 28.00 |
| | Mistral-Nemo-Instruct | 23.53 | 63.80 | 29.68 | 5.89 | 5.37 | 8.48 | 27.97 |
| | *Difference (Human-Like)* | -0.65 | **-9.29** | **+3.02**| **+1.73** | -0.34 | +0.91 | +0.03 |
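
The *Difference (Human-Like)* rows are the fine-tuned score minus the base-model score for each benchmark. As a small sketch, the snippet below recomputes the Qwen differences from the two Qwen rows in the table; the variable names are chosen here for illustration.

```python
benchmarks = ["IFEval", "BBH", "MATH Lvl 5", "GPQA", "MuSR", "MMLU-PRO"]
# Scores copied from the Qwen rows of the table above.
human_like = [72.84, 34.48, 0.00, 6.49, 8.42, 37.76]
base = [75.85, 34.89, 0.00, 5.48, 8.45, 36.52]

# Positive values mean the Human-Like model scored higher than the base model.
diffs = {
    name: round(h - b, 2)
    for name, h, b in zip(benchmarks, human_like, base)
}

for name, d in diffs.items():
    print(f"{name}: {d:+.2f}")
```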
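# 📊 Dataset
The dataset used for fine-tuning was generated with Llama 3 models. It contains 10,884 samples across 256 distinct topics, such as technology, daily life, science, history, and the arts. Each sample includes:
- **Human-like response:** A natural, conversational answer that mimics human dialogue.
- **Formal response:** A structured, precise answer with a more formal tone.
The dataset is open source and available at: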
- 👉 [Human-Like-DPO-Dataset](https://www.modelscope.cn/datasets/okwinds/Human-Like-DPO-Dataset)
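
For DPO training, each sample is typically arranged as a preference pair. The sketch below shows one plausible shape for such a record. The field names `prompt`, `chosen`, and `rejected`, the mapping of the human-like response to `chosen`, and the example texts are all assumptions made for illustration; check the dataset itself for the actual schema.

```python
from typing import TypedDict

class PreferencePair(TypedDict):
    prompt: str    # the question or instruction
    chosen: str    # assumed: the human-like response, preferred during DPO
    rejected: str  # assumed: the formal response, dispreferred during DPO

def make_pair(prompt: str, human_like: str, formal: str) -> PreferencePair:
    """Arrange one sample's two responses as a DPO preference pair."""
    return {"prompt": prompt, "chosen": human_like, "rejected": formal}

# Made-up texts, not taken from the dataset.
example = make_pair(
    "What's your favorite season?",
    "Oh, autumn for sure! I love the crisp air and the colors.",
    "Seasonal preferences vary by individual and by climate.",
)
print(example["chosen"])
```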