---
license: apache-2.0
datasets:
- meta-math/MetaMathQA
language:
- en
metrics:
- accuracy
---

See our paper at https://arxiv.org/abs/2401.02415

View the project page:
https://github.com/TencentARC/LLaMA-Pro


## Model Details

MetaMath-Mistral-Pro is a full fine-tune of the Mistral-Pro base model on the MetaMathQA dataset.


## Model Usage

The model is trained to use the following format (note the newlines):
```
<|user|>
Your message here!
<|assistant|>
```

For best results, format all inputs in this manner. **Make sure to include a newline after `<|assistant|>`; omitting it can noticeably affect generation quality.**
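
The format above can be produced with a small helper. This is a minimal sketch, not code from the project: the function name `build_prompt` and the constants `USER_TAG` and `ASSISTANT_TAG` are hypothetical, and the choice to strip whitespace from the message is an assumption. It only builds the prompt string; passing it to a tokenizer or generation call is left to whichever inference library you use.

```python
# Hypothetical helper names; only the tag strings and newline layout come from the format above.
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(message: str) -> str:
    """Wrap a single user message in the <|user|> / <|assistant|> format."""
    # Assumption: strip surrounding whitespace so the only newlines are the required ones.
    body = message.strip()
    # The final "\n" is the newline after <|assistant|> that the model expects.
    return f"{USER_TAG}\n{body}\n{ASSISTANT_TAG}\n"


if __name__ == "__main__":
    # repr() makes the newlines visible when inspecting the prompt.
    print(repr(build_prompt("What is 12 * 7?")))
```

Building the string in one place makes it harder to drop the trailing newline by accident when prompts are assembled in several parts of a codebase.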


## Experiments

| Model               | GSM8k Pass@1 | MATH Pass@1 |
|---------------------|--------------|-------------|
| MPT-7B              | 6.8          | 3.0         |
| Falcon-7B           | 6.8          | 2.3         |
| LLaMA-1-7B          | 11.0         | 2.9         |
| LLaMA-2-7B          | 14.6         | 2.5         |
| MPT-30B             | 15.2         | 3.1         |
| LLaMA-1-13B         | 17.8         | 3.9         |
| GPT-Neo-2.7B        | 19.5         | --          |
| Falcon-40B          | 19.6         | 2.5         |
| Baichuan-chat-13B   | 23.9         | --          |
| Vicuna-v1.3-13B     | 27.6         | --          |
| LLaMA-2-13B         | 28.7         | 3.9         |
| InternLM-7B         | 31.2         | --          |
| ChatGLM-2-6B        | 32.4         | --          |
| GPT-J-6B            | 34.9         | --          |
| LLaMA-1-33B         | 35.6         | 3.9         |
| LLaMA-2-34B         | 42.2         | 6.24        |
| RFT-7B              | 50.3         | --          |
| LLaMA-1-65B         | 50.9         | 10.6        |
| Qwen-7B             | 51.6         | --          |
| WizardMath-7B       | 54.9         | 10.7        |
| LLaMA-2-70B         | 56.8         | 13.5        |
| WizardMath-13B      | 63.9         | 14.0        |
| MAmmoTH-7B (COT)    | 50.5         | 10.4        |
| MAmmoTH-7B (POT+COT)| 53.6         | 31.5        |
| Arithmo-Mistral-7B  | 74.7         | 25.3        |
| MetaMath-7B         | 66.5         | 19.8        |
| MetaMath-13B        | 72.3         | 22.4        |
| MetaMath-Mistral-7B | 77.7         | 28.2        |
| MetaMath-Llemma-7B  | 69.2         | 30.0        |
| 🔥 **MetaMath-Mistral-Pro** | **78.4** | **30.3** |


## Citation

```bibtex
@article{wu2024llama,
  title={Llama pro: Progressive llama with block expansion},
  author={Wu, Chengyue and Gan, Yukang and Ge, Yixiao and Lu, Zeyu and Wang, Jiahao and Feng, Ye and Luo, Ping and Shan, Ying},
  journal={arXiv preprint arXiv:2401.02415},
  year={2024}
}
```