---
license: apache-2.0
tags:
- axolotl
- dpo
- trl
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: Humanish-Qwen2.5-7B-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 72.84
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 34.48
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.49
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.42
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.76
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
datasets:
- okwinds/Human-Like-DPO-Dataset
language:
- en
---

# For a walkthrough of this model's paper, see the WeChat article 👇🏻

### <img src="https://www.modelscope.cn/datasets/okwinds/Human-Like-DPO-Dataset/resolve/master/wechat.png" width="30" height="30" align="absmiddle"> 觉察流 - [Where does AI's "human touch" come from? Building more human-like AI with DPO and LoRA](https://mp.weixin.qq.com/s/59WEBKi0uGYCwOXsd5FgCw)

<br/>

# Download

SDK download
```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model with the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('okwinds/Human-Like-Qwen2.5-7B-Instruct')
```
Git download
```bash
# Download the model with Git
git clone https://www.modelscope.cn/okwinds/Human-Like-Qwen2.5-7B-Instruct.git
```

> <span style="color:red;font-size:16px"> Notice: this model is mirrored in full from [HumanLLMs/Human-Like-Qwen2.5-7B-Instruct](https://huggingface.co/HumanLLMs/Human-Like-Qwen2.5-7B-Instruct) on Hugging Face. <br/>For more model information, see below 👇🏻, which reproduces the description from the original model repository.</span>

<br/>

#### _Repository author is here 👇🏻 scan the QR code_

<img src="https://www.modelscope.cn/models/okwinds/GPT-2/resolve/master/qrcode_for_jcl_258.jpg" />

_______________________________

<br/>
<br/>

<div align="center">
<img src="https://www.modelscope.cn/models/okwinds/Human-Like-Qwen2.5-7B-Instruct/resolve/master/avatar.jpeg" width="320" height="320" />
<h1>Enhancing Human-Like Responses in Large Language Models</h1>
</div>

<p align="center">
   | 🤖 <a href="https://www.modelscope.cn/collections/Human-Like-nirenyingda-38b077cf6d0a44">Models</a>   |
   📊 <a href="https://www.modelscope.cn/datasets/okwinds/Human-Like-DPO-Dataset">Dataset</a>   |
   <img src="https://www.modelscope.cn/models/okwinds/Human-Like-Qwen2.5-7B-Instruct/resolve/master/wechat.png" width="22" height="22" align="absmiddle"> <a href="https://mp.weixin.qq.com/s/59WEBKi0uGYCwOXsd5FgCw">Paper walkthrough</a>   |
   📄<a href="https://arxiv.org/abs/2501.05032">Paper</a>   |
</p>

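# 🚀 Human-Like-Qwen2.5-7B-Instruct

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct, specifically optimized to generate more human-like, conversational responses.

Fine-tuning combined Low-Rank Adaptation (LoRA) with Direct Preference Optimization (DPO) to improve natural language understanding, conversational coherence, and emotional intelligence in interactions.

The process of creating this model is described in detail in the research paper [“Enhancing Human-Like Responses in Large Language Models”](https://mp.weixin.qq.com/s/59WEBKi0uGYCwOXsd5FgCw).
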
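To make the DPO objective mentioned above concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, computed from sequence log-probabilities. It is an illustration, not the training code used for this model: the function name `dpo_loss` and the value `beta=0.1` are assumptions, since this card does not state the beta that was used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (beta is an assumed value)."""
    # How much more the policy likes each response than the frozen reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) = log(1 + exp(-x)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# The loss drops when the policy favours the chosen response more than the reference does.
print(dpo_loss(-10.0, -20.0, -12.0, -12.0))
print(dpo_loss(-20.0, -10.0, -12.0, -12.0))
```

With LoRA, only the low-rank adapter weights would receive gradients from this loss, while the reference log-probabilities come from the unmodified base model.
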
# 🛠️ Training Configuration

- **Base model:** Qwen2.5-7B-Instruct
- **Framework:** Axolotl v0.4.1
- **Hardware:** 2x NVIDIA A100 (80 GB) GPUs
- **Training time:** ~2 hours 15 minutes
- **Dataset:** Synthetic dataset of about 11,000 samples covering 256 different topics

<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: Qwen/Qwen2.5-7B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code: true

load_in_8bit: true
load_in_4bit: false
strict: false

chat_template: chatml
rl: dpo
datasets:
  - path: HumanLLMs/humanish-dpo-project
    type: chatml.prompt_pairs
    chat_template: chatml

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./humanish-qwen2.5-7b-instruct

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 4
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: Humanish-DPO
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

hub_model_id: HumanLLMs/Humanish-Qwen2.5-7B-Instruct

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

save_safetensors: true
```

</details><br>

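As a rough sanity check on the configuration above, the sketch below derives the effective batch size, the LoRA scaling factor, and an approximate optimizer step count. Two inputs are assumptions rather than config values: one training process per GPU on the 2x A100 setup, and the 10,884-sample count reported for the dataset. The actual step count logged by Axolotl may differ slightly.

```python
import math

# Values copied from the axolotl config.
micro_batch_size = 2
gradient_accumulation_steps = 8
lora_r, lora_alpha = 8, 4
val_set_size = 0.05

# Assumptions (not part of the config): one process per GPU, and the
# sample count reported for the dataset.
num_gpus = 2
num_samples = 10_884

effective_batch = micro_batch_size * gradient_accumulation_steps * num_gpus
lora_scale = lora_alpha / lora_r  # standard LoRA scaling factor alpha / r
train_samples = round(num_samples * (1 - val_set_size))
steps_per_epoch = math.ceil(train_samples / effective_batch)

print(effective_batch, lora_scale, train_samples, steps_per_epoch)
```

Under these assumptions a single epoch is a few hundred optimizer steps, so the 10 warmup steps cover only a small fraction of the run.
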
# 💬 Prompt Template

You can use the ChatML prompt template with this model:

### ChatML

```
<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}<|im_end|>
```

This prompt template is available as a chat template, which means you can format messages with the `tokenizer.apply_chat_template()` method:

```python
# Assumes `tokenizer` and `model` are already loaded with transformers.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
]
# return_dict=True yields input_ids and attention_mask, so ** unpacking works.
gen_input = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
)
model.generate(**gen_input)
```

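To show what the ChatML layout looks like as a plain string, here is a minimal sketch that renders messages by hand without a tokenizer. The helper `render_chatml` is hypothetical and only mirrors the template shown in this section; the chat template bundled with the tokenizer may differ in details such as default system prompts, so prefer `apply_chat_template()` for real use.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render role/content messages in the ChatML layout (illustrative helper)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```
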
# 🤖 Models

| Model | Download |
|:---------------------:|:-----------------------------------------------------------------------:|
| Human-Like-Llama-3-8B-Instruct | 🤖 [Modelscope](https://www.modelscope.cn/models/okwinds/Human-Like-LLama3-8B-Instruct) |
| Human-Like-Qwen-2.5-7B-Instruct | 🤖 [Modelscope](https://www.modelscope.cn/models/okwinds/Human-Like-Qwen2.5-7B-Instruct) |
| Human-Like-Mistral-Nemo-Instruct | 🤖 [Modelscope](https://www.modelscope.cn/models/okwinds/Human-Like-Mistral-Nemo-Instruct-2407) |

<!--# 🔄 Quantized versions

## GGUF [@bartowski](https://huggingface.co/bartowski)

- https://huggingface.co/bartowski/Human-Like-LLama3-8B-Instruct-GGUF

- https://huggingface.co/bartowski/Human-Like-Qwen2.5-7B-Instruct-GGUF

- https://huggingface.co/bartowski/Human-Like-Mistral-Nemo-Instruct-2407-GGUF
-->

# 🎯 Benchmark Results

| **Group** | **Model** | **Average** | **IFEval** | **BBH** | **MATH Lvl 5** | **GPQA** | **MuSR** | **MMLU-PRO** |
|--------------------------------|--------------------------------|-------------|------------|---------|----------------|----------|----------|--------------|
| **Llama Models** | Human-Like-Llama-3-8B-Instruct | 22.37 | **64.97** | 28.01 | 8.45 | 0.78 | **2.00** | 30.01 |
| | Llama-3-8B-Instruct | 23.57 | 74.08 | 28.24 | 8.68 | 1.23 | 1.60 | 29.60 |
| | *Difference (Human-Like)* | -1.20 | **-9.11** | -0.23 | -0.23 | -0.45 | +0.40 | +0.41 |
| **Qwen Models** | Human-Like-Qwen-2.5-7B-Instruct | 26.66 | 72.84 | 34.48 | 0.00 | 6.49 | 8.42 | 37.76 |
| | Qwen-2.5-7B-Instruct | 26.86 | 75.85 | 34.89 | 0.00 | 5.48 | 8.45 | 36.52 |
| | *Difference (Human-Like)* | -0.20 | -3.01 | -0.41 | 0.00 | **+1.01**| -0.03 | **+1.24** |
| **Mistral Models** | Human-Like-Mistral-Nemo-Instruct | 22.88 | **54.51** | 32.70 | 7.62 | 5.03 | 9.39 | 28.00 |
| | Mistral-Nemo-Instruct | 23.53 | 63.80 | 29.68 | 5.89 | 5.37 | 8.48 | 27.97 |
| | *Difference (Human-Like)* | -0.65 | **-9.29** | **+3.02**| **+1.73** | -0.34 | +0.91 | +0.03 |

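The "Difference (Human-Like)" rows are the fine-tuned score minus the base score for each benchmark. As a small worked check, the sketch below recomputes the Qwen row from the scores in the table; the variable names are illustrative only.

```python
# Scores copied from the Qwen rows of the table, in column order:
# IFEval, BBH, MATH Lvl 5, GPQA, MuSR, MMLU-PRO.
human_like = [72.84, 34.48, 0.00, 6.49, 8.42, 37.76]
base = [75.85, 34.89, 0.00, 5.48, 8.45, 36.52]

# Per-benchmark difference: fine-tuned score minus base score.
differences = [round(h - b, 2) for h, b in zip(human_like, base)]
# Gap between the two unweighted six-benchmark averages.
average_gap = round(sum(human_like) / len(human_like) - sum(base) / len(base), 2)

print(differences)
print(average_gap)
```

The recomputed values match the table: the largest change is the IFEval drop, while GPQA and MMLU-PRO improve slightly.
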
# 📊 Dataset

The dataset used for fine-tuning was generated with LLaMA 3 models. It contains 10,884 samples covering 256 different topics, such as technology, daily life, science, history, and the arts. Each sample includes:
- **Human-like response:** A natural, conversational answer that mimics human dialogue.
- **Formal response:** A structured, precise answer with a more formal tone.

The dataset is open source and available at:

- 👉 [Human-Like-DPO-Dataset](https://www.modelscope.cn/datasets/okwinds/Human-Like-DPO-Dataset)
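
As an illustration of how such a sample could feed DPO training, the sketch below maps one sample to a preference pair, treating the human-like response as preferred and the formal response as dispreferred. This pairing direction is an assumption based on the model's stated goal, and the dictionary keys (`question`, `human_like`, `formal`) and the sample text are hypothetical; check the dataset's actual column names before adapting it.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response the model should prefer (assumed: human-like)
    rejected: str  # response the model should move away from (assumed: formal)


def to_preference_pair(sample):
    """Map one sample to a DPO pair; the dict keys here are hypothetical."""
    return PreferencePair(
        prompt=sample["question"],
        chosen=sample["human_like"],
        rejected=sample["formal"],
    )


# Made-up sample for illustration only; not taken from the dataset.
pair = to_preference_pair({
    "question": "What's your favorite way to spend a rainy day?",
    "human_like": "Honestly? A blanket, a good book, and way too much tea.",
    "formal": "A rainy day may be spent productively by reading indoors.",
})
print(pair.chosen)
```
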