Initialize project; model provided by the ModelHub XC community

Model: okwinds/Human-Like-Qwen2.5-7B-Instruct Source: Original Platform
2026-05-19 12:23:13 +08:00
commit 38cb072ce0
19 changed files with 909881 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,334 @@
+---
+license: apache-2.0
+tags:
+- axolotl
+- dpo
+- trl
+base_model: Qwen/Qwen2.5-7B-Instruct
+pipeline_tag: text-generation
+library_name: transformers
+model-index:
+- name: Humanish-Qwen2.5-7B-Instruct
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 72.84
+      name: strict accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 34.48
+      name: normalized accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 0
+      name: exact match
+    source:
+      url: >-
+        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 6.49
+      name: acc_norm
+    source:
+      url: >-
+        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 8.42
+      name: acc_norm
+    source:
+      url: >-
+        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 37.76
+      name: accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
+      name: Open LLM Leaderboard
+datasets:
+- okwinds/Human-Like-DPO-Dataset
+language:
+- en
+---
+
+# For a walkthrough of this model's paper, see the WeChat official account article 👇🏻
+
+### <img src="https://www.modelscope.cn/datasets/okwinds/Human-Like-DPO-Dataset/resolve/master/wechat.png" width="30" height="30" align="absmiddle">  觉察流 - [Where does AI's "human touch" come from? Building more human-like AI with DPO and LoRA](https://mp.weixin.qq.com/s/59WEBKi0uGYCwOXsd5FgCw)
+
+<br/>
+
+# Download
+
+SDK download
+```bash
+# Install ModelScope
+pip install modelscope
+```
+```python
+# Download the model with the SDK
+from modelscope import snapshot_download
+model_dir = snapshot_download('okwinds/Human-Like-Qwen2.5-7B-Instruct')
+```
+Git download
+```bash
+# Download the model with Git
+git clone https://www.modelscope.cn/okwinds/Human-Like-Qwen2.5-7B-Instruct.git
+```
+
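+> <span style="color:red;font-size:16px"> Notice: this model is reposted in full from [HumanLLMs/Human-Like-Qwen2.5-7B-Instruct](https://huggingface.co/HumanLLMs/Human-Like-Qwen2.5-7B-Instruct) on Hugging Face. <br/>For more information about the model, see the description below 👇🏻, which is a translation of the original model repository's README.</span>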
+
+<br/>
+
+#### _Find the repository author here 👇🏻 scan the QR code_
+
+<img src="https://www.modelscope.cn/models/okwinds/GPT-2/resolve/master/qrcode_for_jcl_258.jpg" />
+
+_______________________________
+
+<br/>
+<br/>
+
+<div align="center">
+  <img src="https://www.modelscope.cn/models/okwinds/Human-Like-Qwen2.5-7B-Instruct/resolve/master/avatar.jpeg" width="320" height="320" />
+  <h1>Enhancing Human-Like Responses in Large Language Models</h1>
+</div>
+
+<p align="center">
+  &nbsp;&nbsp; | 🤖 <a href="https://www.modelscope.cn/collections/Human-Like-nirenyingda-38b077cf6d0a44">Models</a>&nbsp;&nbsp; | 
+  &nbsp;&nbsp; 📊 <a href="https://www.modelscope.cn/datasets/okwinds/Human-Like-DPO-Dataset">Dataset</a>&nbsp;&nbsp; | 
+  &nbsp;&nbsp; <img src="https://www.modelscope.cn/models/okwinds/Human-Like-Qwen2.5-7B-Instruct/resolve/master/wechat.png" width="22" height="22" align="absmiddle"> <a href="https://mp.weixin.qq.com/s/59WEBKi0uGYCwOXsd5FgCw">Paper walkthrough</a>&nbsp;&nbsp; | 
+  &nbsp;&nbsp; 📄<a href="https://arxiv.org/abs/2501.05032">Paper</a>&nbsp;&nbsp; |
+</p>
+
+# 🚀 Human-Like-Qwen2.5-7B-Instruct
+
+This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct, optimized specifically to generate more human-like, conversational responses.
+
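+Fine-tuning combined Low-Rank Adaptation (LoRA) with Direct Preference Optimization (DPO) to improve natural language understanding, conversational coherence, and emotional intelligence in interactions.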
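
For readers unfamiliar with DPO, the per-pair objective can be sketched as below. This is a generic illustration of the published DPO loss, not the training code used for this model. The function name `dpo_loss`, the log-probability values in the usage lines, and `beta=0.1` are assumptions; the training config in this README does not state a beta.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities.

    beta is a hypothetical value chosen for illustration.
    """
    # How much more (in log space) the policy likes each response
    # than the frozen reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)) = softplus(-margin), computed stably.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical log-probabilities: the policy has shifted toward "chosen".
untrained = dpo_loss(-20.0, -20.0, -20.0, -20.0)
improved = dpo_loss(-15.0, -25.0, -20.0, -20.0)
print(untrained, improved)
```

The loss starts at log 2 when the policy equals the reference model and falls as the policy assigns relatively more probability to the preferred response.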
+
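+The process of creating this model is described in detail in the research paper [Enhancing Human-Like Responses in Large Language Models](https://mp.weixin.qq.com/s/59WEBKi0uGYCwOXsd5FgCw).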
+
+# 🛠️ Training Configuration
+
+- **Base model:** Qwen2.5-7B-Instruct  
+- **Framework:** Axolotl v0.4.1  
+- **Hardware:** 2x NVIDIA A100 (80 GB) GPUs  
+- **Training time:** ~2 hours 15 minutes
+- **Dataset:** Synthetic dataset of about 11,000 samples covering 256 distinct topics
+
+<details><summary>See axolotl config</summary>
+
+axolotl version: `0.4.1`
+```yaml
+base_model: Qwen/Qwen2.5-7B-Instruct
+model_type: AutoModelForCausalLM
+tokenizer_type: AutoTokenizer
+
+trust_remote_code: true
+
+load_in_8bit: true
+load_in_4bit: false
+strict: false
+
+chat_template: chatml
+rl: dpo
+datasets:
+  - path: HumanLLMs/humanish-dpo-project
+    type: chatml.prompt_pairs
+    chat_template: chatml
+
+dataset_prepared_path:
+val_set_size: 0.05
+output_dir: ./humanish-qwen2.5-7b-instruct
+
+sequence_len: 8192
+sample_packing: false
+pad_to_sequence_len: true
+
+adapter: lora
+lora_model_dir:
+lora_r: 8
+lora_alpha: 4
+lora_dropout: 0.05
+lora_target_linear: true
+lora_fan_in_fan_out:
+
+wandb_project: Humanish-DPO
+wandb_entity:
+wandb_watch:
+wandb_name:
+wandb_log_model:
+
+hub_model_id: HumanLLMs/Humanish-Qwen2.5-7B-Instruct
+
+gradient_accumulation_steps: 8
+micro_batch_size: 2
+num_epochs: 1
+optimizer: adamw_bnb_8bit
+lr_scheduler: cosine
+learning_rate: 0.0002
+
+train_on_inputs: false
+group_by_length: false
+bf16: auto
+fp16:
+tf32: false
+
+gradient_checkpointing: true
+early_stopping_patience:
+resume_from_checkpoint:
+local_rank:
+logging_steps: 1
+xformers_attention:
+flash_attention: true
+s2_attention:
+
+warmup_steps: 10
+evals_per_epoch: 2
+eval_table_size:
+eval_max_new_tokens: 128
+saves_per_epoch: 1
+debug:
+deepspeed:
+weight_decay: 0.0
+fsdp:
+fsdp_config:
+
+save_safetensors: true
+```
+
+</details><br>
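
As a rough reading of the config above, the effective batch size and the LoRA scaling factor can be worked out from the listed values. This is a back-of-the-envelope sketch: it assumes one training process per GPU, that the validation split is taken from the 10,884 samples mentioned in the dataset section, and the conventional LoRA scaling of `lora_alpha / lora_r`. The exact step count depends on how the framework rounds the split.

```python
import math

# Values copied from the axolotl config above.
micro_batch_size = 2
gradient_accumulation_steps = 8
lora_r = 8
lora_alpha = 4
val_set_size = 0.05

# From the training summary and the dataset section. One process per GPU
# is an assumption about how the run was launched.
num_gpus = 2
num_samples = 10_884

# Samples consumed per optimizer step across all GPUs.
effective_batch = micro_batch_size * gradient_accumulation_steps * num_gpus

# Approximate training split size and optimizer steps for the single epoch.
train_samples = round(num_samples * (1 - val_set_size))
steps_per_epoch = math.ceil(train_samples / effective_batch)

# Conventional LoRA scaling applied to the low-rank update.
lora_scaling = lora_alpha / lora_r

print(effective_batch, steps_per_epoch, lora_scaling)
```

With `lora_alpha` smaller than `lora_r`, the adapter update is scaled down by half, which is a fairly conservative setting.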
+
+# 💬 Prompt Template
+
+You can use the ChatML prompt template when working with the model:
+
+### ChatML
+
+```
+<|im_start|>system
+{system}<|im_end|>
+<|im_start|>user
+{user}<|im_end|>
+<|im_start|>assistant
+{assistant}<|im_end|>
+```
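
To make the layout above concrete, here is a minimal sketch that fills the ChatML template by hand. The helper `format_chatml` is hypothetical and exists only for illustration; in practice the tokenizer's own chat template is authoritative and may differ in details such as a default system prompt.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

Leaving the assistant turn open at the end is what prompts the model to write its reply rather than another user message.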
+
+This prompt template is available as a chat template, which means you can format messages with the `tokenizer.apply_chat_template()` method:
+
+```python
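+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# model_dir is the local path returned by snapshot_download in the download section
+tokenizer = AutoTokenizer.from_pretrained(model_dir)
+model = AutoModelForCausalLM.from_pretrained(model_dir)
+
+messages = [
+    {"role": "system", "content": "You are a helpful AI assistant."},
+    {"role": "user", "content": "Hello!"}
+]
+# return_dict=True yields input_ids and attention_mask, so ** unpacking works
+gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt", return_dict=True)
+output = model.generate(**gen_input, max_new_tokens=128)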
+```
+
+# 🤖 Models
+
+|         Model         |                               Download                                 |
+|:---------------------:|:-----------------------------------------------------------------------:|
+| Human-Like-Llama-3-8B-Instruct  |  🤖 [Modelscope](https://www.modelscope.cn/models/okwinds/Human-Like-LLama3-8B-Instruct)  |
+| Human-Like-Qwen-2.5-7B-Instruct  | 🤖 [Modelscope](https://www.modelscope.cn/models/okwinds/Human-Like-Qwen2.5-7B-Instruct)  |
+| Human-Like-Mistral-Nemo-Instruct  | 🤖 [Modelscope](https://www.modelscope.cn/models/okwinds/Human-Like-Mistral-Nemo-Instruct-2407) |
+
+<!--# 🔄 Quantized versions
+
+## GGUF [@bartowski](https://huggingface.co/bartowski)
+
+- https://huggingface.co/bartowski/Human-Like-LLama3-8B-Instruct-GGUF
+
+- https://huggingface.co/bartowski/Human-Like-Qwen2.5-7B-Instruct-GGUF
+
+- https://huggingface.co/bartowski/Human-Like-Mistral-Nemo-Instruct-2407-GGUF
+-->
+ 
+# 🎯 Benchmark Results
+
+| **Group**                      | **Model**                      | **Average** | **IFEval** | **BBH** | **MATH Lvl 5** | **GPQA** | **MuSR** | **MMLU-PRO** |
+|--------------------------------|--------------------------------|-------------|------------|---------|----------------|----------|----------|--------------|
+| **Llama Models**               | Human-Like-Llama-3-8B-Instruct | 22.37       | **64.97**  | 28.01   | 8.45           | 0.78     | **2.00** | 30.01        |
+|                                | Llama-3-8B-Instruct            | 23.57       | 74.08      | 28.24   | 8.68           | 1.23     | 1.60     | 29.60        |
+|                                | *Difference (Human-Like)*      | -1.20       | **-9.11**  | -0.23   | -0.23          | -0.45    | +0.40    | +0.41        |
+| **Qwen Models**                | Human-Like-Qwen-2.5-7B-Instruct | 26.66      | 72.84      | 34.48   | 0.00           | 6.49     | 8.42     | 37.76        |
+|                                | Qwen-2.5-7B-Instruct           | 26.86       | 75.85      | 34.89   | 0.00           | 5.48     | 8.45     | 36.52        |
+|                                | *Difference (Human-Like)*      | -0.20       | -3.01      | -0.41   | 0.00           | **+1.01**| -0.03    | **+1.24**    |
+| **Mistral Models**             | Human-Like-Mistral-Nemo-Instruct | 22.88     | **54.51**  | 32.70   | 7.62           | 5.03     | 9.39     | 28.00        |
+|                                | Mistral-Nemo-Instruct          | 23.53       | 63.80      | 29.68   | 5.89           | 5.37     | 8.48     | 27.97        |
+|                                | *Difference (Human-Like)*      | -0.65       | **-9.29**  | **+3.02**| **+1.73**      | -0.34    | +0.91    | +0.03        |
+
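The *Difference (Human-Like)* rows are simply the fine-tuned score minus the base score. As a quick check, the sketch below recomputes the Qwen row from the values in the table. The averages are recomputed from the rounded per-benchmark scores, so they can differ from the table's Average column in the last digit.

```python
# Scores copied from the benchmark table: (fine-tuned, base) per benchmark.
qwen_scores = {
    "IFEval": (72.84, 75.85),
    "BBH": (34.48, 34.89),
    "MATH Lvl 5": (0.00, 0.00),
    "GPQA": (6.49, 5.48),
    "MuSR": (8.42, 8.45),
    "MMLU-PRO": (37.76, 36.52),
}

# Difference = fine-tuned minus base, rounded to two decimals like the table.
differences = {
    name: round(tuned - base, 2) for name, (tuned, base) in qwen_scores.items()
}

# Unweighted mean over the six benchmarks.
tuned_average = sum(tuned for tuned, _ in qwen_scores.values()) / len(qwen_scores)
base_average = sum(base for _, base in qwen_scores.values()) / len(qwen_scores)

for name, diff in differences.items():
    print(f"{name}: {diff:+.2f}")
print(f"Average: {tuned_average:.3f} vs {base_average:.3f}")
```
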
+
+# 📊 Dataset
+
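+The dataset used for fine-tuning was generated with LLaMA 3 models. It contains 10,884 samples covering 256 distinct topics, such as technology, daily life, science, history, and the arts. Each sample includes:
+- **Human-like response:** A natural, conversational answer that mimics human dialogue.
+- **Formal response:** A structured, precise answer with a more formal tone.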
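
One way to picture a sample is as a DPO preference pair. The sketch below is illustrative only: the field names `prompt`, `chosen`, and `rejected`, the mapping of the human-like response to `chosen`, and the example text are all assumptions rather than content taken from the dataset.

```python
from dataclasses import dataclass, asdict


@dataclass
class PreferencePair:
    """One DPO training sample (field names are assumptions)."""
    prompt: str
    chosen: str    # assumed: the human-like response is the preferred one
    rejected: str  # assumed: the formal response is the dispreferred one


# Invented example text, not taken from the dataset.
sample = PreferencePair(
    prompt="What's your favorite way to spend a weekend?",
    chosen="Honestly? A slow morning with coffee, then a long walk somewhere green.",
    rejected="As an AI, I do not have personal preferences or weekends.",
)

# Plain dict form, convenient for writing out as JSON lines.
record = asdict(sample)
print(record)
```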
+
+The dataset is open source and available at:
+
+- 👉 [Human-Like-DPO-Dataset](https://www.modelscope.cn/datasets/okwinds/Human-Like-DPO-Dataset)
+