---
language:
- en
license: apache-2.0
model-index:
- name: Mistral7B-PairRM-SPPO-ExPO
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 36.73
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 13.68
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.91
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 3.58
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.66
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 17.24
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
---

# Mistral7B-PairRM-SPPO-ExPO

This is the extrapolated (ExPO) model based on [`UCLA-AGI/Mistral7B-PairRM-SPPO`](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO) and [`mistralai/Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), as described in the "[Weak-to-Strong Extrapolation Expedites Alignment](https://arxiv.org/abs/2404.16792)" paper.

Specifically, we obtain this model by extrapolating **(alpha = 0.3)** from the weights of the SFT checkpoint (`mistralai/Mistral-7B-Instruct-v0.2`) and the DPO/RLHF checkpoint (`UCLA-AGI/Mistral7B-PairRM-SPPO`), which improves alignment with human preference.
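To illustrate the idea, here is a minimal sketch of the extrapolation step, assuming the update rule `theta_expo = theta_aligned + alpha * (theta_aligned - theta_sft)` applied parameter by parameter. The function name `extrapolate` and the toy dictionaries are hypothetical and only stand in for real state dicts; this is not the code used to produce this checkpoint.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # toy stand-in for a model state dict


def extrapolate(sft: Weights, aligned: Weights, alpha: float) -> Weights:
    """Move past the aligned weights along the (aligned - sft) direction."""
    if sft.keys() != aligned.keys():
        raise ValueError("both checkpoints must have the same parameter names")
    result: Weights = {}
    for name, w_aligned in aligned.items():
        w_sft = sft[name]
        # theta_expo = theta_aligned + alpha * (theta_aligned - theta_sft)
        result[name] = [a + alpha * (a - s) for a, s in zip(w_aligned, w_sft)]
    return result


# Toy example with a single two-element parameter (values are made up).
sft_weights = {"layer.weight": [1.0, 2.0]}
aligned_weights = {"layer.weight": [2.0, 2.0]}
expo_weights = extrapolate(sft_weights, aligned_weights, alpha=0.3)
print(expo_weights)
```

With `alpha = 0`, the result equals the aligned checkpoint; larger values push further in the direction the alignment training already moved the weights.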

This extrapolated model achieves a **35.4%** win rate and a **31.8%** LC win rate on **AlpacaEval 2.0**, outperforming the original `Mistral7B-PairRM-SPPO`, which scores 32.2% and 30.5%, respectively.

## Evaluation Results

Evaluation results on the **AlpacaEval 2.0** benchmark (you can find the evaluation outputs on the [official GitHub repo](https://github.com/chujiezheng/LLM-Extrapolation/tree/main/results_alpaca)):

|                                      | Win Rate (Ori) | LC Win Rate (Ori) | Win Rate (+ ExPO) | LC Win Rate (+ ExPO) |
| ------------------------------------ | -------------- | ----------------- | ----------------- | -------------------- |
| `HuggingFaceH4/zephyr-7b-alpha`      | 6.7%           | 10.0%             | **10.6%**         | **13.6%**            |
| `HuggingFaceH4/zephyr-7b-beta`       | 10.2%          | 13.2%             | **11.1%**         | **14.0%**            |
| `berkeley-nest/Starling-LM-7B-alpha` | 15.0%          | 18.3%             | **18.2%**         | **19.5%**            |
| `Nexusflow/Starling-LM-7B-beta`      | 26.6%          | 25.8%             | **29.6%**         | **26.4%**            |
| `snorkelai/Snorkel-Mistral-PairRM`   | 24.7%          | 24.0%             | **28.8%**         | **26.4%**            |
| `RLHFlow/LLaMA3-iterative-DPO-final` | 29.2%          | 36.0%             | **32.7%**         | **37.8%**            |
| `internlm/internlm2-chat-1.8b`       | 3.8%           | 4.0%              | **5.2%**          | **4.3%**             |
| `internlm/internlm2-chat-7b`         | 20.5%          | 18.3%             | **28.1%**         | **22.7%**            |
| `internlm/internlm2-chat-20b`        | 36.1%          | 24.9%             | **46.2%**         | **27.2%**            |
| `allenai/tulu-2-dpo-7b`              | 8.5%           | 10.2%             | **11.5%**         | **11.7%**            |
| `allenai/tulu-2-dpo-13b`             | 11.2%          | 15.5%             | **15.6%**         | **17.6%**            |
| `allenai/tulu-2-dpo-70b`             | 15.4%          | 21.2%             | **23.0%**         | **25.7%**            |

Evaluation results on the **MT-Bench** benchmark (you can find the evaluation outputs on the [official GitHub repo](https://github.com/chujiezheng/LLM-Extrapolation/tree/main/results_mtbench)):

|                                      | Original | + ExPO   |
| ------------------------------------ | -------- | -------- |
| `HuggingFaceH4/zephyr-7b-alpha`      | 6.85     | **6.87** |
| `HuggingFaceH4/zephyr-7b-beta`       | 7.02     | **7.06** |
| `berkeley-nest/Starling-LM-7B-alpha` | 7.82     | **7.91** |
| `Nexusflow/Starling-LM-7B-beta`      | 8.10     | **8.18** |
| `snorkelai/Snorkel-Mistral-PairRM`   | 7.63     | **7.69** |
| `RLHFlow/LLaMA3-iterative-DPO-final` | 8.08     | **8.45** |
| `internlm/internlm2-chat-1.8b`       | 5.17     | **5.26** |
| `internlm/internlm2-chat-7b`         | 7.72     | **7.80** |
| `internlm/internlm2-chat-20b`        | 8.13     | **8.26** |
| `allenai/tulu-2-dpo-7b`              | 6.35     | **6.38** |
| `allenai/tulu-2-dpo-13b`             | 7.00     | **7.26** |
| `allenai/tulu-2-dpo-70b`             | 7.79     | **8.03** |


# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_chujiezheng__Mistral7B-PairRM-SPPO-ExPO).

|      Metric       |Value|
|-------------------|----:|
|Avg.               |13.47|
|IFEval (0-Shot)    |36.73|
|BBH (3-Shot)       |13.68|
|MATH Lvl 5 (4-Shot)| 0.91|
|GPQA (0-shot)      | 3.58|
|MuSR (0-shot)      | 8.66|
|MMLU-PRO (5-shot)  |17.24|
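
As a quick sanity check, the `Avg.` row appears consistent with an unweighted mean of the six benchmark scores in the table. The sketch below assumes that interpretation; the variable names are illustrative only.

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 36.73,
    "BBH (3-Shot)": 13.68,
    "MATH Lvl 5 (4-Shot)": 0.91,
    "GPQA (0-shot)": 3.58,
    "MuSR (0-shot)": 8.66,
    "MMLU-PRO (5-shot)": 17.24,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 13.47
```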