---
license: llama3
tags:
- axolotl
- dpo
- trl
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- HumanLLMs/Human-Like-DPO-Dataset
model-index:
- name: Humanish-LLama3.1-8B-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 64.98
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 28.01
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 8.46
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 0.78
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 2
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 30.02
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
pipeline_tag: text-generation
library_name: transformers
---
<div align="center">
  <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/63da3d7ae697e5898cb86854/H-vpXOX6KZu01HnV87Jk5.jpeg" width="320" height="320" />
  <h1>Enhancing Human-Like Responses in Large Language Models</h1>
</div>

<p align="center">
  &nbsp;&nbsp; | 🤗 <a href="https://huggingface.co/collections/HumanLLMs/human-like-humanish-llms-6759fa68f22e11eb1a10967e">Models</a>&nbsp;&nbsp; |
  &nbsp;&nbsp; 📊 <a href="https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset">Dataset</a>&nbsp;&nbsp; |
  &nbsp;&nbsp; 📄 <a href="https://arxiv.org/abs/2501.05032">Paper</a>&nbsp;&nbsp; |
</p>

<p align="center">
  <b>📢 The paper associated with this model has been accepted to the <a href="https://personalizedllm.github.io/events/perfm-aaai26/">AAAI-26 Workshop on Personalization in the Era of Large Foundation Models (PerFM)</a>.</b>
</p>

# 🚀 Human-Like-Llama3-8B-Instruct

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), specifically optimized to generate more human-like and conversational responses.

The fine-tuning process employed both [Low-Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685) and [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) to enhance natural language understanding, conversational coherence, and emotional intelligence in interactions.

The process of creating this model is detailed in the research paper [“Enhancing Human-Like Responses in Large Language Models”](https://arxiv.org/abs/2501.05032).
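To make the DPO objective mentioned above concrete, here is a minimal sketch of the per-pair loss from the DPO paper, written in plain Python. The function name `dpo_loss` and the value `beta=0.1` are assumptions for illustration; the training config does not state the beta that was used, and the actual training relied on the TRL/Axolotl implementation rather than this code.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probabilities.

    beta=0.1 is an assumed value, not taken from the model card.
    """
    # Implicit rewards: how much more likely each response is under the
    # policy than under the frozen reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# When the policy favors the chosen response more than the reference does,
# the loss drops below log(2).
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

In this setup the chosen response would be the human-like answer and the rejected response the formal one, so lowering the loss pushes the model toward the conversational style.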

# 🛠️ Training Configuration

- **Base Model:** Llama3-8B-Instruct
- **Framework:** Axolotl v0.4.1
- **Hardware:** 2x NVIDIA A100 (80 GB) GPUs
- **Training Time:** ~2 hours 20 minutes
- **Dataset:** Synthetic dataset with ≈11,000 samples across 256 diverse topics

<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

chat_template: llama3
rl: dpo
datasets:
  - path: HumanLLMs/humanish-dpo-project
    type: llama3.prompt_pairs
    chat_template: llama3

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./humanish-llama3-8b-instruct

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 4
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: Humanish-DPO
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

hub_model_id: HumanLLMs/Humanish-LLama3.1-8B-Instruct

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

save_safetensors: true

```

</details><br>
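A few quantities implied by the config can be worked out with simple arithmetic. The sketch below assumes plain data parallelism across the two GPUs, the standard LoRA scaling factor `lora_alpha / lora_r`, and a simple rounded train/validation split; Axolotl's exact split and step count may differ slightly.

```python
import math

# Values copied from the config and hardware list above.
micro_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 2
lora_r, lora_alpha = 8, 4
dataset_size = 10_884   # sample count reported in the Dataset section
val_set_size = 0.05

# Sequences contributing to each optimizer step (assumes data parallelism).
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
# Standard LoRA scaling applied to the low-rank update.
lora_scaling = lora_alpha / lora_r
# Approximate training split and optimizer steps for the single epoch.
train_samples = round(dataset_size * (1 - val_set_size))
steps_per_epoch = math.ceil(train_samples / effective_batch_size)

print(effective_batch_size, lora_scaling, train_samples, steps_per_epoch)
```

Under these assumptions each optimizer step sees 32 preference pairs, the LoRA update is scaled by 0.5, and one epoch takes roughly 324 optimizer steps.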

# 💬 Prompt Template

Use the Llama 3 prompt template with this model:

### Llama3

```
<|start_header_id|>system<|end_header_id|>
{system}<|eot_id|>

<|start_header_id|>user<|end_header_id|>
{user}<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>
{assistant}<|eot_id|>
```
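For cases where the tokenizer is not available, the template can be rendered by hand. The helper below is a hypothetical sketch (the name `build_llama3_prompt` is not part of any library). It assumes the upstream Llama 3 layout, which adds a leading `<|begin_of_text|>` token and a blank line after each header; neither is shown in the simplified template above, so treat the tokenizer's own chat template as authoritative.

```python
def build_llama3_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into a Llama 3 prompt string."""
    prompt = "<|begin_of_text|>"  # assumed from the upstream Llama 3 format
    for message in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        prompt += f"{message['content'].strip()}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
]
print(build_llama3_prompt(messages))
```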

This prompt template is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating), which means you can format messages using the
`tokenizer.apply_chat_template()` method:

```python
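from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HumanLLMs/Human-Like-LLama3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"}
]
# add_generation_prompt opens the assistant turn; return_dict yields input_ids and attention_mask
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt")
output = model.generate(**gen_input)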
```

# 🤖 Models

|         Model         |                               Download                                 |
|:---------------------:|:-----------------------------------------------------------------------:|
| Human-Like-Llama-3-8B-Instruct  |  🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-LLama3-8B-Instruct)  |
| Human-Like-Qwen-2.5-7B-Instruct  | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Qwen2.5-7B-Instruct)  |
| Human-Like-Mistral-Nemo-Instruct  | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407) |

# 🔄 Quantized versions

## GGUF [@bartowski](https://huggingface.co/bartowski)

- https://huggingface.co/bartowski/Human-Like-LLama3-8B-Instruct-GGUF

- https://huggingface.co/bartowski/Human-Like-Qwen2.5-7B-Instruct-GGUF

- https://huggingface.co/bartowski/Human-Like-Mistral-Nemo-Instruct-2407-GGUF


# 🎯 Benchmark Results

| **Group**                      | **Model**                      | **Average** | **IFEval** | **BBH** | **MATH Lvl 5** | **GPQA** | **MuSR** | **MMLU-PRO** |
|--------------------------------|--------------------------------|-------------|------------|---------|----------------|----------|----------|--------------|
| **Llama Models**               | Human-Like-Llama-3-8B-Instruct | 22.37       | **64.97**  | 28.01   | 8.45           | 0.78     | **2.00** | 30.01        |
|                                | Llama-3-8B-Instruct            | 23.57       | 74.08      | 28.24   | 8.68           | 1.23     | 1.60     | 29.60        |
|                                | *Difference (Human-Like)*      | -1.20       | **-9.11**  | -0.23   | -0.23          | -0.45    | +0.40    | +0.41        |
| **Qwen Models**                | Human-Like-Qwen-2.5-7B-Instruct | 26.66      | 72.84      | 34.48   | 0.00           | 6.49     | 8.42     | 37.76        |
|                                | Qwen-2.5-7B-Instruct           | 26.86       | 75.85      | 34.89   | 0.00           | 5.48     | 8.45     | 36.52        |
|                                | *Difference (Human-Like)*      | -0.20       | -3.01      | -0.41   | 0.00           | **+1.01**| -0.03    | **+1.24**    |
| **Mistral Models**             | Human-Like-Mistral-Nemo-Instruct | 22.88     | **54.51**  | 32.70   | 7.62           | 5.03     | 9.39     | 28.00        |
|                                | Mistral-Nemo-Instruct          | 23.53       | 63.80      | 29.68   | 5.89           | 5.37     | 8.48     | 27.97        |
|                                | *Difference (Human-Like)*      | -0.65       | **-9.29**  | **+3.02**| **+1.73**      | -0.34    | +0.91    | +0.03        |
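Each *Difference (Human-Like)* row is the fine-tuned score minus the base model score for that benchmark. As a small worked check, the sketch below recomputes the per-benchmark differences for the Llama pair from the values in the table; the variable names are illustrative only.

```python
benchmarks = ["IFEval", "BBH", "MATH Lvl 5", "GPQA", "MuSR", "MMLU-PRO"]
# Scores copied from the Llama rows of the table above.
human_like = [64.97, 28.01, 8.45, 0.78, 2.00, 30.01]
base = [74.08, 28.24, 8.68, 1.23, 1.60, 29.60]

# Fine-tuned minus base, rounded to the table's two decimal places.
differences = {
    name: round(h - b, 2)
    for name, h, b in zip(benchmarks, human_like, base)
}
for name, delta in differences.items():
    print(f"{name}: {delta:+.2f}")
```

A negative value means the human-like model scored lower than its base model on that benchmark; the largest gap for the Llama pair is on IFEval.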


# 📊 Dataset

The dataset used for fine-tuning was generated using Llama 3 models. It includes 10,884 samples across 256 distinct topics, such as technology, daily life, science, history, and arts. Each sample consists of:

- **Human-like responses:** Natural, conversational answers mimicking human dialogue.
- **Formal responses:** Structured and precise answers with a more formal tone.

The dataset has been open-sourced and is available at:

- 👉 [Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)

More details on the dataset creation process can be found in the accompanying research paper.
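As a rough picture of how one sample could be arranged for DPO training, the sketch below pairs a question with its two responses. The field names `prompt`, `chosen`, and `rejected` and the mapping of the human-like answer to the preferred side are assumptions based on common DPO conventions and the model's stated goal; the example text is invented for illustration and is not taken from the dataset.

```python
from dataclasses import asdict, dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # assumed: the human-like response is the preferred one
    rejected: str  # assumed: the formal response is the dispreferred one

def to_dpo_record(question, human_like, formal):
    """Arrange one sample as a preference pair (hypothetical helper)."""
    return PreferencePair(prompt=question, chosen=human_like, rejected=formal)

# Invented example text, shown only to illustrate the structure.
record = to_dpo_record(
    question="What's your favorite way to spend a rainy day?",
    human_like="Honestly? A big mug of tea and a good book. How about you?",
    formal="I do not have personal preferences, but common activities include reading.",
)
print(asdict(record))
```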

# 📝 Citation

```bibtex
@misc{çalık2025enhancinghumanlikeresponseslarge,
      title={Enhancing Human-Like Responses in Large Language Models},
      author={Ethem Yağız Çalık and Talha Rüzgar Akkuş},
      year={2025},
      eprint={2501.05032},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.05032},
}
```