118 lines
3.4 KiB
Markdown
118 lines
3.4 KiB
Markdown
---
|
|
license: llama3
|
|
library_name: transformers
|
|
base_model:
|
|
- NousResearch/Hermes-2-Pro-Llama-3-8B
|
|
- aaditya/Llama3-OpenBioLLM-8B
|
|
- meta-llama/Meta-Llama-3-8B-Instruct
|
|
base_model_relation: merge
|
|
tags:
|
|
- mindnlp
|
|
- wizard
|
|
- merge
|
|
- dare_ties
|
|
- llama
|
|
- text-generation
|
|
- arxiv:2311.03099
|
|
- arxiv:2306.01708
|
|
---
|
|
|
|
# Llama3-8B-merge-biomed-wizard (MindNLP Wizard Reproduction)
|
|
|
|

|
|
|
|
This is a DARE-TIES merge reproduction of Llama3-8B-Instruct + NousResearch/Hermes-2-Pro-Llama-3-8B + aaditya/Llama3-OpenBioLLM-8B.
|
|
|
|
The overall merge recipe and benchmark setup follow [lighteternal/Llama3-merge-biomed-8b](https://huggingface.co/lighteternal/Llama3-merge-biomed-8b), while the actual merge implementation is performed with **MindNLP Wizard** on MindSpore/Ascend.
|
|
|
|
## Implementation Statement
|
|
|
|
- Merge engine: **MindNLP Wizard**
|
|
- Runtime stack: MindSpore + Ascend
|
|
- Output dtype: `bfloat16`
|
|
|
|
## Usage
|
|
|
|
Prompt template recommendation remains the Llama3 format:
|
|
<https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/>
|
|
|
|
## Leaderboard Metrics (Open LLM Leaderboard style)
|
|
|
|
| Task | Metric | Ours (Wizard, %) | Llama3-8B-Instruct (%) | OpenBioLLM-8B (%) |
|
|
| --- | --- | ---: | ---: | ---: |
|
|
| **ARC Challenge** | Accuracy | **59.73** | 57.17 | 55.38 |
|
|
| | Normalized Accuracy | **64.59** | 60.75 | 58.62 |
|
|
| **HellaSwag** | Accuracy | 62.26 | **62.59** | 61.83 |
|
|
| | Normalized Accuracy | 81.35 | **81.53** | 80.76 |
|
|
| **Winogrande** | Accuracy | **76.01** | 74.51 | 70.88 |
|
|
| **GSM8K** | Accuracy | **70.81** | 68.69 | 10.15 |
|
|
| **MMLU-Anatomy** | Accuracy | 71.11 | **72.59** | 69.62 |
|
|
| **MMLU-Clinical Knowledge** | Accuracy | 77.74 | **77.83** | 60.38 |
|
|
| **MMLU-College Biology** | Accuracy | 80.56 | **81.94** | 79.86 |
|
|
| **MMLU-College Medicine** | Accuracy | 68.21 | 63.58 | **70.52** |
|
|
| **MMLU-Medical Genetics** | Accuracy | 82.00 | 80.00 | 80.00 |
|
|
| **MMLU-Professional Medicine** | Accuracy | 77.57 | 71.69 | **77.94** |
|
|
|
|
## Merge Details
|
|
|
|
### Merge Method
|
|
|
|
This model is merged using the **DARE-TIES** method with `meta-llama/Meta-Llama-3-8B-Instruct` as base.
|
|
|
|
### Models Merged
|
|
|
|
The following donor models are included in the merge:
|
|
|
|
- [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
|
|
- [aaditya/Llama3-OpenBioLLM-8B](https://huggingface.co/aaditya/Llama3-OpenBioLLM-8B)
|
|
|
|
### Configuration
|
|
|
|
The following YAML configuration is used:
|
|
|
|
```yaml
|
|
models:
|
|
- model: meta-llama/Meta-Llama-3-8B-Instruct
|
|
# Base model providing a general foundation without specific parameters
|
|
|
|
- model: meta-llama/Meta-Llama-3-8B-Instruct
|
|
parameters:
|
|
density: 0.60
|
|
weight: 0.5
|
|
|
|
- model: NousResearch/Hermes-2-Pro-Llama-3-8B
|
|
parameters:
|
|
density: 0.55
|
|
weight: 0.1
|
|
|
|
- model: aaditya/Llama3-OpenBioLLM-8B
|
|
parameters:
|
|
density: 0.55
|
|
weight: 0.4
|
|
|
|
merge_method: dare_ties
|
|
base_model: meta-llama/Meta-Llama-3-8B-Instruct
|
|
parameters:
|
|
int8_mask: true
|
|
dtype: bfloat16
|
|
```
|
|
|
|
## Reproducibility Notes
|
|
|
|
- Few-shot settings:
|
|
- ARC Challenge: 25-shot
|
|
- HellaSwag: 10-shot
|
|
- Winogrande: 5-shot
|
|
- GSM8K: 5-shot
|
|
- MMLU-* subsets: 5-shot
|
|
|
|
## Environment (Inference / Evaluation)
|
|
|
|
- Accelerator: Ascend 910B2
|
|
- MindSpore: 2.7.1
|
|
|
|
## References
|
|
|
|
- [Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch](https://arxiv.org/abs/2311.03099)
|
|
- [Resolving Interference When Merging Models](https://arxiv.org/abs/2306.01708)
|