初始化项目,由ModelHub XC社区提供模型
Model: chenjingshen/Llama3-8B-merge-biomed-wizard Source: Original Platform
This commit is contained in:
117
README.md
Normal file
117
README.md
Normal file
@@ -0,0 +1,117 @@
|
||||
---
|
||||
license: llama3
|
||||
library_name: transformers
|
||||
base_model:
|
||||
- NousResearch/Hermes-2-Pro-Llama-3-8B
|
||||
- aaditya/Llama3-OpenBioLLM-8B
|
||||
- meta-llama/Meta-Llama-3-8B-Instruct
|
||||
base_model_relation: merge
|
||||
tags:
|
||||
- mindnlp
|
||||
- wizard
|
||||
- merge
|
||||
- dare_ties
|
||||
- llama
|
||||
- text-generation
|
||||
- arxiv:2311.03099
|
||||
- arxiv:2306.01708
|
||||
---
|
||||
|
||||
# Llama3-8B-merge-biomed-wizard (MindNLP Wizard Reproduction)
|
||||
|
||||

|
||||
|
||||
This is a DARE-TIES merge reproduction of Llama3-8B-Instruct + NousResearch/Hermes-2-Pro-Llama-3-8B + aaditya/Llama3-OpenBioLLM-8B.
|
||||
|
||||
The overall merge recipe and benchmark setup follow [lighteternal/Llama3-merge-biomed-8b](https://huggingface.co/lighteternal/Llama3-merge-biomed-8b), while the actual merge implementation is performed with **MindNLP Wizard** on MindSpore/Ascend.
|
||||
|
||||
## Implementation Statement
|
||||
|
||||
- Merge engine: **MindNLP Wizard**
|
||||
- Runtime stack: MindSpore + Ascend
|
||||
- Output dtype: `bfloat16`
|
||||
|
||||
## Usage
|
||||
|
||||
Prompt template recommendation remains the Llama3 format:
|
||||
<https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/>
|
||||
|
||||
## Leaderboard Metrics (Open LLM Leaderboard style)
|
||||
|
||||
| Task | Metric | Ours (Wizard, %) | Llama3-8B-Instruct (%) | OpenBioLLM-8B (%) |
|
||||
| --- | --- | ---: | ---: | ---: |
|
||||
| **ARC Challenge** | Accuracy | **59.73** | 57.17 | 55.38 |
|
||||
| | Normalized Accuracy | **64.59** | 60.75 | 58.62 |
|
||||
| **HellaSwag** | Accuracy | 62.26 | **62.59** | 61.83 |
|
||||
| | Normalized Accuracy | 81.35 | **81.53** | 80.76 |
|
||||
| **Winogrande** | Accuracy | **76.01** | 74.51 | 70.88 |
|
||||
| **GSM8K** | Accuracy | **70.81** | 68.69 | 10.15 |
|
||||
| **MMLU-Anatomy** | Accuracy | 71.11 | **72.59** | 69.62 |
|
||||
| **MMLU-Clinical Knowledge** | Accuracy | 77.74 | **77.83** | 60.38 |
|
||||
| **MMLU-College Biology** | Accuracy | 80.56 | **81.94** | 79.86 |
|
||||
| **MMLU-College Medicine** | Accuracy | 68.21 | 63.58 | **70.52** |
|
||||
| **MMLU-Medical Genetics** | Accuracy | 82.00 | 80.00 | 80.00 |
|
||||
| **MMLU-Professional Medicine** | Accuracy | 77.57 | 71.69 | **77.94** |
|
||||
|
||||
## Merge Details
|
||||
|
||||
### Merge Method
|
||||
|
||||
This model is merged using the **DARE-TIES** method with `meta-llama/Meta-Llama-3-8B-Instruct` as base.
|
||||
|
||||
### Models Merged
|
||||
|
||||
The following donor models are included in the merge:
|
||||
|
||||
- [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
|
||||
- [aaditya/Llama3-OpenBioLLM-8B](https://huggingface.co/aaditya/Llama3-OpenBioLLM-8B)
|
||||
|
||||
### Configuration
|
||||
|
||||
The following YAML configuration is used:
|
||||
|
||||
```yaml
|
||||
models:
|
||||
- model: meta-llama/Meta-Llama-3-8B-Instruct
|
||||
# Base model providing a general foundation without specific parameters
|
||||
|
||||
- model: meta-llama/Meta-Llama-3-8B-Instruct
|
||||
parameters:
|
||||
density: 0.60
|
||||
weight: 0.5
|
||||
|
||||
- model: NousResearch/Hermes-2-Pro-Llama-3-8B
|
||||
parameters:
|
||||
density: 0.55
|
||||
weight: 0.1
|
||||
|
||||
- model: aaditya/Llama3-OpenBioLLM-8B
|
||||
parameters:
|
||||
density: 0.55
|
||||
weight: 0.4
|
||||
|
||||
merge_method: dare_ties
|
||||
base_model: meta-llama/Meta-Llama-3-8B-Instruct
|
||||
parameters:
|
||||
int8_mask: true
|
||||
dtype: bfloat16
|
||||
```
|
||||
|
||||
## Reproducibility Notes
|
||||
|
||||
- Few-shot settings:
|
||||
- ARC Challenge: 25-shot
|
||||
- HellaSwag: 10-shot
|
||||
- Winogrande: 5-shot
|
||||
- GSM8K: 5-shot
|
||||
- MMLU-* subsets: 5-shot
|
||||
|
||||
## Environment (Inference / Evaluation)
|
||||
|
||||
- Accelerator: Ascend 910B2
|
||||
- MindSpore: 2.7.1
|
||||
|
||||
## References
|
||||
|
||||
- [Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch](https://arxiv.org/abs/2311.03099)
|
||||
- [Resolving Interference When Merging Models](https://arxiv.org/abs/2306.01708)
|
||||
Reference in New Issue
Block a user