---
license: apache-2.0
tags:
- axolotl
- dpo
- trl
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: Humanish-Qwen2.5-7B-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 72.84
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 34.48
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.49
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.42
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.76
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
      name: Open LLM Leaderboard
datasets:
- okwinds/Human-Like-DPO-Dataset
language:
- en
---

# For a walkthrough of this model's paper, see the WeChat official account article 👇🏻

### 觉察流 - [Where does an AI's "human touch" come from? Building more human-like AI with DPO and LoRA](https://mp.weixin.qq.com/s/59WEBKi0uGYCwOXsd5FgCw)
# Download

SDK download

```bash
# Install ModelScope
pip install modelscope
```

```python
# Download the model with the ModelScope SDK
from modelscope import snapshot_download
model_dir = snapshot_download('okwinds/Human-Like-Qwen2.5-7B-Instruct')
```

Git download

```
# Download the model with Git
git clone https://www.modelscope.cn/okwinds/Human-Like-Qwen2.5-7B-Instruct.git
```

> Note: this model is reposted in full from [HumanLLMs/Human-Like-Qwen2.5-7B-Instruct](https://huggingface.co/HumanLLMs/Human-Like-Qwen2.5-7B-Instruct) on Hugging Face.

More information about the model follows 👇🏻. It is a translation of the original model repository's description.

_______________________________

Enhancing Human-Like Responses in Large Language Models

| 🤖 Models | 📊 Dataset | Paper walkthrough | 📄 Paper |

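# 🚀 Human-Like-Qwen2.5-7B-Instruct

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct, optimized to generate more human-like, conversational responses. Fine-tuning combined Low-Rank Adaptation (LoRA) with Direct Preference Optimization (DPO) to improve natural language understanding, conversational coherence, and emotional intelligence in interactions. The process of creating the model is described in detail in the research paper ["Enhancing Human-Like Responses in Large Language Models"](https://mp.weixin.qq.com/s/59WEBKi0uGYCwOXsd5FgCw).

# 🛠️ Training Configuration

- **Base model:** Qwen2.5-7B-Instruct
- **Framework:** Axolotl v0.4.1
- **Hardware:** 2x NVIDIA A100 (80 GB) GPUs
- **Training time:** ~2 hours 15 minutes
- **Dataset:** Synthetic dataset of approximately 11,000 samples covering 256 distinct topics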
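The model description names Direct Preference Optimization as one of the two fine-tuning techniques. As a rough illustration of what that objective computes for a single preference pair, here is a minimal pure-Python sketch of the standard DPO loss. The function name, the argument names, and the value `beta=0.1` are assumptions made for this example. This card does not state the beta used in training, and the actual run went through Axolotl rather than code like this.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the policy
    being trained and under the frozen reference model. beta=0.1 is an
    assumed value for illustration only.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)

# Raising the chosen response's likelihood and lowering the rejected one's
# reduces the loss below that baseline.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss falls as the policy assigns relatively more probability to the preferred response than the reference model does, which is the pressure that moves the model toward the preferred style.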

See axolotl config

axolotl version: `0.4.1`

```yaml
base_model: Qwen/Qwen2.5-7B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true

load_in_8bit: true
load_in_4bit: false
strict: false

chat_template: chatml
rl: dpo
datasets:
  - path: HumanLLMs/humanish-dpo-project
    type: chatml.prompt_pairs
    chat_template: chatml

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./humanish-qwen2.5-7b-instruct

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 4
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: Humanish-DPO
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

hub_model_id: HumanLLMs/Humanish-Qwen2.5-7B-Instruct

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

save_safetensors: true
```
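Two quantities follow from the config values above and are easy to work out by hand. The sketch below computes the effective batch size and the LoRA scaling factor. It assumes plain data parallelism across the two GPUs listed in the training setup, and the conventional LoRA scaling of `lora_alpha / lora_r`. Neither assumption is stated in this card, and the variable names are chosen for this example only.

```python
# Values copied from the axolotl config and the training setup.
micro_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 2  # "2x NVIDIA A100" from the training configuration
lora_r = 8
lora_alpha = 4

# Assumption: data parallelism, so each GPU processes its own micro-batches
# and gradients are accumulated before every optimizer step.
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus

# Assumption: conventional LoRA scaling, where the low-rank update is
# multiplied by alpha / r before being added to the frozen weights.
lora_scale = lora_alpha / lora_r

print(effective_batch_size)  # 32
print(lora_scale)            # 0.5
```

Under these assumptions each optimizer step sees 32 preference pairs, and the adapter update is applied at half strength relative to an `alpha == r` setup.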

# 💬 Prompt Template

You can use the ChatML prompt template with this model:

### ChatML

```
<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}<|im_end|>
```

This prompt template is available as a chat template, which means you can format messages with the `tokenizer.apply_chat_template()` method:

```python
# Assumes `tokenizer` and `model` have already been loaded for this model.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"}
]
# add_generation_prompt appends the assistant header so the model replies;
# return_dict=True yields input_ids and attention_mask for generate().
gen_input = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
)
model.generate(**gen_input)
```

# 🤖 Models

| Model | Download |
|:---------------------:|:-----------------------------------------------------------------------:|
| Human-Like-Llama-3-8B-Instruct | 🤖 [Modelscope](https://www.modelscope.cn/models/okwinds/Human-Like-LLama3-8B-Instruct) |
| Human-Like-Qwen-2.5-7B-Instruct | 🤖 [Modelscope](https://www.modelscope.cn/models/okwinds/Human-Like-Qwen2.5-7B-Instruct) |
| Human-Like-Mistral-Nemo-Instruct | 🤖 [Modelscope](https://www.modelscope.cn/models/okwinds/Human-Like-Mistral-Nemo-Instruct-2407) |

# 🎯 Benchmark Results

| **Group** | **Model** | **Average** | **IFEval** | **BBH** | **MATH Lvl 5** | **GPQA** | **MuSR** | **MMLU-PRO** |
|--------------------------------|--------------------------------|-------------|------------|---------|----------------|----------|----------|--------------|
| **Llama Models** | Human-Like-Llama-3-8B-Instruct | 22.37 | **64.97** | 28.01 | 8.45 | 0.78 | **2.00** | 30.01 |
| | Llama-3-8B-Instruct | 23.57 | 74.08 | 28.24 | 8.68 | 1.23 | 1.60 | 29.60 |
| | *Difference (Human-Like)* | -1.20 | **-9.11** | -0.23 | -0.23 | -0.45 | +0.40 | +0.41 |
| **Qwen Models** | Human-Like-Qwen-2.5-7B-Instruct | 26.66 | 72.84 | 34.48 | 0.00 | 6.49 | 8.42 | 37.76 |
| | Qwen-2.5-7B-Instruct | 26.86 | 75.85 | 34.89 | 0.00 | 5.48 | 8.45 | 36.52 |
| | *Difference (Human-Like)* | -0.20 | -3.01 | -0.41 | 0.00 | **+1.01** | -0.03 | **+1.24** |
| **Mistral Models** | Human-Like-Mistral-Nemo-Instruct | 22.88 | **54.51** | 32.70 | 7.62 | 5.03 | 9.39 | 28.00 |
| | Mistral-Nemo-Instruct | 23.53 | 63.80 | 29.68 | 5.89 | 5.37 | 8.48 | 27.97 |
| | *Difference (Human-Like)* | -0.65 | **-9.29** | **+3.02** | **+1.73** | -0.34 | +0.91 | +0.03 |

# 📊 Dataset

The dataset used for fine-tuning was generated with LLaMA 3 models. It contains 10,884 samples across 256 distinct topics, such as technology, daily life, science, history, and the arts. Each sample includes:

- **Human-like response:** A natural, conversational answer that mimics human dialogue.
- **Formal response:** A structured, precise answer in a more formal tone.

The dataset is open source and available here:

- 👉 [Human-Like-DPO-Dataset](https://www.modelscope.cn/datasets/okwinds/Human-Like-DPO-Dataset)
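
To make the sample structure described above more concrete, here is a minimal sketch of how one sample could be represented as a DPO preference pair and rendered in ChatML. The `PreferencePair` class, its field names (`prompt`, `chosen`, `rejected`), the `to_chatml` helper, and the example texts are all hypothetical and written for this illustration. The sketch also assumes the human-like response is the preferred one and the formal response the dispreferred one, which matches the model's goal but should be checked against the dataset itself.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # assumed: the human-like response (preferred)
    rejected: str  # assumed: the formal response (dispreferred)


def to_chatml(pair, system="You are a helpful AI assistant."):
    """Render the prompt with each response in ChatML.

    Returns (chosen_text, rejected_text); both share the same prefix and
    differ only in the assistant turn.
    """
    prefix = (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{pair.prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    return (prefix + pair.chosen + "<|im_end|>",
            prefix + pair.rejected + "<|im_end|>")


# Invented example texts, not taken from the dataset.
pair = PreferencePair(
    prompt="What is your favorite season?",
    chosen="Oh, autumn for sure! Crisp air and warm drinks are hard to beat.",
    rejected="As an AI, I do not have preferences. Each season has distinct characteristics.",
)
chosen_text, rejected_text = to_chatml(pair)
print(chosen_text)
```

Keeping both responses under an identical prompt prefix is what lets preference training compare them directly.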