---
base_model:
- Fizzarolli/L3-8b-Rosier-v1
- NousResearch/Meta-Llama-3-8B
- Sao10K/L3-8B-Stheno-v3.2
library_name: transformers
tags:
- mergekit
- merge
---

`"Helide" (say HE-lied) is an ion of helium -- famously a very unreactive element, which doesn't form ions in most conditions.`

# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
An experimental merge of the legendary L3-8B-Stheno with Fizzarolli's Rosier. The aim is to improve Stheno's "ball-rolling" capabilities and reduce its awkwardness with more niche content. For a first go, I'm surprised at how well it's doing so far, but given that this is literally my first LLM project ever, you should probably temper your expectations.

Since R1: Changed to task arithmetic. Snazzy new model card image.

Since R2: Fixed an unnecessary conversion.

Since R3: Tweaked ratios; Rosier's influence cut in half.

Since R4: Scrubbin' it down. +0.08 to Rosier (pre-normalization). Closing in on a good ratio.

Since R5: Doubled both ratios. Since normalization is enabled, this *should* be essentially the same as R5, but it makes the numbers nicer to work with, as they can now be read as a ratio against 1.
(Edit: They have the same SHA-256 sums, so they're literally identical.)
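
To see why doubling both ratios changes nothing once normalization is on, here's a minimal sketch. It assumes normalization amounts to dividing each weight by the sum of all weights, and that R5 used 0.5 and 0.33 (simply half of the current 1 and 0.66). The helper name `normalized` is made up for illustration and isn't part of mergekit.

```python
import math

def normalized(weights):
    """Scale weights so they sum to 1 (assumed effect of `normalize: true`)."""
    total = sum(weights)
    return [w / total for w in weights]

# Assumed R5 weights: half of the current config (Stheno, Rosier).
r5 = normalized([0.5, 0.33])
# Current weights after doubling.
r6 = normalized([1.0, 0.66])

# The effective shares agree (checked here up to floating-point tolerance).
same = all(math.isclose(a, b, rel_tol=1e-12) for a, b in zip(r5, r6))
print(r5, r6, same)
```

Doubling is a power-of-two scaling, which binary floating point handles without rounding error, so matching checksums are at least plausible from the arithmetic alone.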
### Merge Method

This model was merged using the [task arithmetic](https://arxiv.org/abs/2212.04089) merge method, with [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B) as the base.
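
For anyone curious what task arithmetic boils down to, here's a rough sketch on toy numbers: each fine-tuned model contributes its difference from the base (its "task vector"), those differences are weighted and summed, and the result is added back onto the base. The function `task_arithmetic` and the three-element "models" are hypothetical stand-ins, and dividing by the sum of the weights is my assumption about what normalization does; this is not mergekit's actual code.

```python
def task_arithmetic(base, models, weights, normalize=True):
    """Merge flat parameter lists: base + weighted sum of (model - base)."""
    # With normalization (assumed), the summed deltas are divided by the total weight.
    total = sum(weights) if normalize else 1.0
    merged = []
    for i, b in enumerate(base):
        # Each model's task-vector entry is its difference from the base.
        delta = sum(w * (m[i] - b) for m, w in zip(models, weights))
        merged.append(b + delta / total)
    return merged

# Toy 3-parameter "models" (made-up numbers, not real checkpoints).
base = [0.0, 1.0, 2.0]
stheno = [1.0, 1.0, 4.0]
rosier = [0.0, 3.0, 2.0]

merged = task_arithmetic(base, [stheno, rosier], [1.0, 0.66])
print(merged)
```

Where only one model differs from the base, the merged value moves toward that model in proportion to its share of the total weight.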
### Models Merged

The following models were included in the merge:

* [Fizzarolli/L3-8b-Rosier-v1](https://huggingface.co/Fizzarolli/L3-8b-Rosier-v1)
* [Sao10K/L3-8B-Stheno-v3.2](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2)

### Configuration
The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Sao10K/L3-8B-Stheno-v3.2
    parameters:
      weight: 1
  - model: Fizzarolli/L3-8b-Rosier-v1
    parameters:
      weight: 0.66

merge_method: task_arithmetic
base_model: NousResearch/Meta-Llama-3-8B
parameters:
  normalize: true
dtype: bfloat16
```