---
license: llama3
tags:
- axolotl
- dpo
- trl
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- HumanLLMs/Human-Like-DPO-Dataset
model-index:
- name: Humanish-LLama3.1-8B-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 64.98
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 28.01
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 8.46
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 0.78
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 2
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 30.02
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
pipeline_tag: text-generation
library_name: transformers
---

<div align="center">
<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/63da3d7ae697e5898cb86854/H-vpXOX6KZu01HnV87Jk5.jpeg" width="320" height="320" />
<h1>Enhancing Human-Like Responses in Large Language Models</h1>
</div>

<p align="center">
   | 🤗 <a href="https://huggingface.co/collections/HumanLLMs/human-like-humanish-llms-6759fa68f22e11eb1a10967e">Models</a>   |
   📊 <a href="https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset">Dataset</a>   |
   📄 <a href="https://arxiv.org/abs/2501.05032">Paper</a>   |
</p>

<p align="center">
<b>📢 The paper associated with this model has been accepted to the <a href="https://personalizedllm.github.io/events/perfm-aaai26/">AAAI-26 Workshop on Personalization in the Era of Large Foundation Models (PerFM)</a>.</b>
</p>

# 🚀 Human-Like-Llama3-8B-Instruct

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), specifically optimized to generate more human-like and conversational responses.

The fine-tuning process employed both [Low-Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685) and [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) to enhance natural language understanding, conversational coherence, and emotional intelligence in interactions.
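
As a rough illustration of the LoRA side, the sketch below applies a LoRA-adapted linear layer in plain Python: the frozen weight `W` is left untouched, and only the low-rank factors `A` (r × d_in) and `B` (d_out × r) would be trained, with their product scaled by `alpha / r`. It is a minimal sketch of the update rule from the LoRA paper, not axolotl's or PEFT's implementation. The defaults `r=8` and `alpha=4` are taken from `lora_r` and `lora_alpha` in the axolotl config on this card, and the helper names `matvec` and `lora_forward` are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r=8, alpha=4):
    """Apply a LoRA-adapted linear layer: W x + (alpha / r) * B (A x).

    W: d_out x d_in frozen base weight
    A: r x d_in trainable down-projection
    B: d_out x r trainable up-projection
    """
    scale = alpha / r  # 4 / 8 = 0.5 with the config values
    base = matvec(W, x)               # frozen path
    update = matvec(B, matvec(A, x))  # low-rank path through rank r
    return [b + scale * u for b, u in zip(base, update)]


# Toy layer with d_in=3, d_out=2 and rank r=8.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[0.1] * 3 for _ in range(8)]
B = [[0.0] * 8 for _ in range(2)]  # LoRA initializes B to zeros
print(lora_forward(W, A, B, [1.0, 2.0, 3.0]))  # -> [1.0, 2.0], identical to W x
```

Because `B` starts at zero, the adapted layer initially reproduces the base layer exactly, so fine-tuning begins from the unmodified Llama 3 model and only the small `A` and `B` matrices need gradients.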

The process of creating these models is detailed in the research paper [“Enhancing Human-Like Responses in Large Language Models”](https://arxiv.org/abs/2501.05032).

# 🛠️ Training Configuration

- **Base Model:** Llama3-8B-Instruct
- **Framework:** Axolotl v0.4.1
- **Hardware:** 2x NVIDIA A100 (80 GB) GPUs
- **Training Time:** ~2 hours 20 minutes
- **Dataset:** Synthetic dataset with ≈11,000 samples across 256 diverse topics

<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

chat_template: llama3
rl: dpo
datasets:
  - path: HumanLLMs/humanish-dpo-project
    type: llama3.prompt_pairs
    chat_template: llama3

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./humanish-llama3-8b-instruct

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 4
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: Humanish-DPO
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

hub_model_id: HumanLLMs/Humanish-LLama3.1-8B-Instruct

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

save_safetensors: true
```

</details><br>
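
The batch-related settings in the axolotl config can be turned into a rough per-step budget. The sketch below assumes plain data parallelism over the two A100s listed under Training Configuration (the config sets neither DeepSpeed nor FSDP) and uses the 10,884-sample count from the Dataset section; the helper names are hypothetical, and the real step count may differ slightly, for example if the trainer drops an incomplete final batch.

```python
import math


def effective_batch_size(micro_batch_size, grad_accum_steps, num_gpus):
    """Preference pairs consumed per optimizer step under plain data parallelism."""
    return micro_batch_size * grad_accum_steps * num_gpus


def approx_steps_per_epoch(num_samples, val_set_size, batch_size):
    """Rough optimizer steps per epoch after holding out a validation fraction."""
    # Assumes the validation split is rounded up, leaving the rest for training.
    train_samples = num_samples - math.ceil(num_samples * val_set_size)
    return math.ceil(train_samples / batch_size)


# micro_batch_size: 2, gradient_accumulation_steps: 8, 2x A100 (assumed data parallel)
batch = effective_batch_size(micro_batch_size=2, grad_accum_steps=8, num_gpus=2)
# 10,884 samples from the Dataset section, val_set_size: 0.05
steps = approx_steps_per_epoch(num_samples=10_884, val_set_size=0.05, batch_size=batch)
print(batch, steps)
```

Under these assumptions each optimizer step sees 32 preference pairs and the single epoch (`num_epochs: 1`) is roughly 324 steps, of which the first 10 are learning-rate warmup (`warmup_steps: 10`).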

# 💬 Prompt Template

Use the Llama 3 prompt template with this model:

### Llama3

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{assistant}<|eot_id|>
```
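
If your runtime expects a raw prompt string rather than a list of messages (some GGUF runtimes, for example), the sketch below renders messages the way Meta's Llama 3 instruct chat template does: one `<|begin_of_text|>` token, then for each turn a role header, a blank line, the stripped content, and `<|eot_id|>`. It assumes the fine-tune kept the base model's template, which the `chat_template: llama3` setting in the axolotl config suggests; `build_llama3_prompt` is a hypothetical helper, not part of any library.

```python
def build_llama3_prompt(messages, add_generation_prompt=True):
    """Render chat messages in the Llama 3 instruct format as a plain string.

    Each message is a dict with "role" and "content" keys. The layout follows
    Meta's Llama 3 chat template: a single <|begin_of_text|> token, then per
    message a role header, a blank line, the stripped content and <|eot_id|>.
    """
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


print(build_llama3_prompt([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
]))
```

If you tokenize this string with the Hugging Face tokenizer, pass `add_special_tokens=False` so that a second `<|begin_of_text|>` token is not prepended.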
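
This prompt template is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating), which means you can format messages with the `tokenizer.apply_chat_template()` method:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HumanLLMs/Human-Like-LLama3-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("HumanLLMs/Human-Like-LLama3-8B-Instruct")
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"}
]
# add_generation_prompt opens the assistant turn; return_dict gives input_ids + attention_mask
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt")
output = model.generate(**gen_input, max_new_tokens=256)
print(tokenizer.decode(output[0][gen_input["input_ids"].shape[-1]:], skip_special_tokens=True))
```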

# 🤖 Models

| Model | Download |
|:---------------------:|:-----------------------------------------------------------------------:|
| Human-Like-Llama-3-8B-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-LLama3-8B-Instruct) |
| Human-Like-Qwen-2.5-7B-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Qwen2.5-7B-Instruct) |
| Human-Like-Mistral-Nemo-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407) |

# 🔄 Quantized Versions

## GGUF [@bartowski](https://huggingface.co/bartowski)

- https://huggingface.co/bartowski/Human-Like-LLama3-8B-Instruct-GGUF
- https://huggingface.co/bartowski/Human-Like-Qwen2.5-7B-Instruct-GGUF
- https://huggingface.co/bartowski/Human-Like-Mistral-Nemo-Instruct-2407-GGUF

# 🎯 Benchmark Results

| **Group** | **Model** | **Average** | **IFEval** | **BBH** | **MATH Lvl 5** | **GPQA** | **MuSR** | **MMLU-PRO** |
|--------------------------------|--------------------------------|-------------|------------|---------|----------------|----------|----------|--------------|
| **Llama Models** | Human-Like-Llama-3-8B-Instruct | 22.37 | **64.97** | 28.01 | 8.45 | 0.78 | **2.00** | 30.01 |
| | Llama-3-8B-Instruct | 23.57 | 74.08 | 28.24 | 8.68 | 1.23 | 1.60 | 29.60 |
| | *Difference (Human-Like)* | -1.20 | **-9.11** | -0.23 | -0.23 | -0.45 | +0.40 | +0.41 |
| **Qwen Models** | Human-Like-Qwen-2.5-7B-Instruct | 26.66 | 72.84 | 34.48 | 0.00 | 6.49 | 8.42 | 37.76 |
| | Qwen-2.5-7B-Instruct | 26.86 | 75.85 | 34.89 | 0.00 | 5.48 | 8.45 | 36.52 |
| | *Difference (Human-Like)* | -0.20 | -3.01 | -0.41 | 0.00 | **+1.01**| -0.03 | **+1.24** |
| **Mistral Models** | Human-Like-Mistral-Nemo-Instruct | 22.88 | **54.51** | 32.70 | 7.62 | 5.03 | 9.39 | 28.00 |
| | Mistral-Nemo-Instruct | 23.53 | 63.80 | 29.68 | 5.89 | 5.37 | 8.48 | 27.97 |
| | *Difference (Human-Like)* | -0.65 | **-9.29** | **+3.02**| **+1.73** | -0.34 | +0.91 | +0.03 |
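
Each *Difference (Human-Like)* row is the Human-Like model's score minus its base model's score in the same column, so a negative value means the fine-tuned model scored lower on that benchmark. The minimal sketch below recomputes the Llama difference row from the two Llama rows of the table; `difference_row` is a hypothetical helper, and the scores are copied from the table rather than re-measured.

```python
COLUMNS = ["Average", "IFEval", "BBH", "MATH Lvl 5", "GPQA", "MuSR", "MMLU-PRO"]


def difference_row(fine_tuned, base):
    """Per-column fine-tuned score minus base score, rounded to two decimals."""
    return [round(ft - b, 2) for ft, b in zip(fine_tuned, base)]


# Scores copied from the Llama rows of the benchmark table.
human_like_llama = [22.37, 64.97, 28.01, 8.45, 0.78, 2.00, 30.01]
llama_3_instruct = [23.57, 74.08, 28.24, 8.68, 1.23, 1.60, 29.60]

for column, delta in zip(COLUMNS, difference_row(human_like_llama, llama_3_instruct)):
    print(f"{column}: {delta:+.2f}")
```

The same function applied to the Qwen or Mistral rows reproduces their difference rows as well.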

# 📊 Dataset

The dataset used for fine-tuning was generated using LLaMA 3 models. It includes 10,884 samples across 256 distinct topics, such as technology, daily life, science, history, and the arts. Each sample consists of:

- **Human-like responses:** Natural, conversational answers mimicking human dialogue.
- **Formal responses:** Structured and precise answers with a more formal tone.

The dataset has been open-sourced and is available at:

- 👉 [Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)

More details on the dataset creation process can be found in the accompanying research paper.
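
To show how such pairs can drive training, the sketch below computes the standard DPO loss from Rafailov et al. for a single pair: −log σ(β · [(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]). It assumes the human-like response is the preferred completion y_w and the formal response is the rejected completion y_l. It also takes summed token log-probabilities from the policy being trained and from a frozen reference model as plain numbers. `beta=0.1` is an assumed value (TRL's default), since this card does not state the β that was used, and `dpo_loss` is a hypothetical helper rather than the trainer's code.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed token log-probabilities.

    chosen   -> human-like response (assumed preferred)
    rejected -> formal response (assumed dispreferred)
    beta=0.1 is an assumption; the card does not state the value used.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference model) favors the chosen response over the rejected one.
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # policy identical to the reference
print(dpo_loss(-5.0, -20.0, -10.0, -10.0))   # policy now prefers the human-like reply
```

When the policy and reference assign the same log-probabilities, the loss is log 2 ≈ 0.693. It falls as the policy raises the likelihood of the human-like response relative to the formal one by more than the reference model does.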

# 📝 Citation

```
@misc{çalık2025enhancinghumanlikeresponseslarge,
      title={Enhancing Human-Like Responses in Large Language Models},
      author={Ethem Yağız Çalık and Talha Rüzgar Akkuş},
      year={2025},
      eprint={2501.05032},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.05032},
}
```