---
tags:
- llama
- alpaca
- vicuna
- uncensored
- merge
- mix
- airoboros
- openorca
- orcamini
- orca
- instruct
- mixtune
datasets:
- Open-Orca/OpenOrca
- anon8231489123/ShareGPT_Vicuna_unfiltered
- jondurbin/airoboros-uncensored
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
---

## 13B-Ouroboros

Ouroboros is an experimental model based on Meta's LLaMA [v1] 13B base model, built with a custom merging technique that
tunes each layer's merge percentage based on internal tests against the PTB dataset. It scored a perplexity of ~26.31 in internal evaluation
(6 samples, sequence length 1024; this testing is not rigorous, just a quick way to find near-optimum values). Testing,
evaluating, and remixing this model is absolutely permissible and even encouraged (within the bounds of Meta's LLaMAv1
license agreement); the more feedback we get, the better we can tune our process! 😊
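
For readers unfamiliar with the metric, here is a minimal sketch of how a perplexity score is derived from per-token log-probabilities. It is not the evaluation script used for this model: the function name `perplexity` and the sample values are hypothetical, and it assumes natural-log probabilities that a model has already produced for each token.

```python
import math
from typing import List


def perplexity(token_log_probs: List[float]) -> float:
    """Return exp of the mean negative log-likelihood (natural log) per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical input: a model that assigns probability 1/4 to every token.
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))
```

Lower is better: a model that is, on average, as uncertain as a uniform choice among four tokens scores about 4.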

## Composition:

Ouroboros is composed of 40 layers [LLaMAv1 13B standard], each mixed at a ratio optimized
against the PTB dataset for the lowest perplexity score. Listed below are the
paired models and the ratios merged per layer.

Tier One Merge:

13B-airoboros-gpt4-1.4 > 13B-orca_mini_v2

[0.22, 0.85, 0.89, 0.98, 0.3, 0.41, 0.71, 0.83, 0.32, 0.1, 0.44, 0.6, 0.53, 0.15, 0.86, 0.79, 0.93, 0.02, 0.19, 0.82, 0.01, 0.52, 0.07, 0.27, 0.73, 0.86, 0.08, 0.67, 0.42, 0.28, 0.37, 0.08, 0.95, 0.68, 0.45, 0.08, 0.7, 0.93, 0.96, 0.43]

13B-gpt4-x-alpaca > 13B-Vicuna-cocktail

[0.65, 0.94, 0.98, 0.87, 0.28, 0.64, 0.73, 0.7, 0.95, 0.89, 0.84, 0.9, 0.59, 0.92, 0.28, 0.61, 0.88, 0.73, 0.34, 0.85, 0.98, 0.05, 0.74, 0.92, 0.5, 0.78, 0.26, 0.4, 0.27, 0.65, 0.71, 0.7, 0.8, 0.93, 0.36, 0.03, 0.45, 0.39, 0.77, 0.06]

Tier Two Merge:

[13B-airoboros-gpt4-1.4 + 13B-orca_mini_v2] offspring > [13B-gpt4-x-alpaca + 13B-Vicuna-cocktail] offspring

[0.2, 0.83, 0.24, 0.03, 0.37, 0.62, 0.02, 0.82, 0.65, 0.63, 0.45, 0.65, 0.48, 0.45, 0.24, 0.76, 0.06, 0.31, 0.45, 0.86, 0.23, 0.99, 0.93, 0.84, 0.96, 0.53, 0.95, 0.32, 0.19, 0.06, 0.4, 0.08, 0.62, 0.4, 0.26, 0.12, 0.16, 0.91, 0.14, 0.0]
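
As a rough illustration of what a per-layer ratio list could mean, the sketch below linearly interpolates two toy "models" layer by layer. It is not the actual merge script: `merge_layers` and the `Weights` type are hypothetical names, the models are tiny stand-ins, and it assumes each ratio is the share taken from the first model, which this card does not state.

```python
from typing import Dict, List

# Layer index -> flattened toy parameters (a stand-in for real tensors).
Weights = Dict[int, List[float]]


def merge_layers(model_a: Weights, model_b: Weights, ratios: List[float]) -> Weights:
    """Blend two models layer by layer via linear interpolation.

    ASSUMPTION: ratios[i] is the share taken from model_a at layer i and
    (1 - ratios[i]) the share from model_b. The direction is a guess.
    """
    if not (len(model_a) == len(model_b) == len(ratios)):
        raise ValueError("both models and the ratio list must cover the same layers")
    merged: Weights = {}
    for layer, ratio in enumerate(ratios):
        a, b = model_a[layer], model_b[layer]
        merged[layer] = [ratio * x + (1.0 - ratio) * y for x, y in zip(a, b)]
    return merged


# Toy parents with 3 layers of 2 parameters each (the real models have 40 layers).
parent_a = {0: [1.0, 1.0], 1: [1.0, 1.0], 2: [1.0, 1.0]}
parent_b = {0: [0.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.0]}
child = merge_layers(parent_a, parent_b, [0.22, 0.85, 0.89])
print(child)
```

Under the same assumption, a tier two merge would call `merge_layers` again on the two tier one offspring with the third ratio list.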

Result:

13B-Ouroboros, a model that seems uncensored and highly competent. So far, only Alpaca instruction prompting has been tested, and it seems to work well.

## Use:

Alpaca's instruct format can be used to do many things, including controlling the terms of the interaction
between a user and an agent's responses in chat. Below is an example of a command injected into
memory.

```
### Instruction:
Make Narrator function as a text based adventure game that responds with verbose, detailed, and creative descriptions of what happens next after Player's response.
Make Player function as the player input for Narrator's text based adventure game, controlling a character named (insert character name here, their short bio, and
whatever quest or other information to keep consistent in the interaction).

### Response:
{an empty new line here}
```
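
If you are assembling prompts in code, a small helper along these lines may be convenient. It is only a sketch: `build_alpaca_prompt` is a hypothetical name, and it assumes the layout shown above (an `### Instruction:` block, a blank line, then `### Response:` followed by a newline).

```python
def build_alpaca_prompt(instruction: str) -> str:
    """Wrap an instruction in the Alpaca-style layout shown above."""
    # The trailing newline after "### Response:" leaves the empty line
    # where the model's reply is expected to begin.
    return f"### Instruction:\n{instruction.strip()}\n\n### Response:\n"


prompt = build_alpaca_prompt(
    "Make Narrator function as a text based adventure game that responds "
    "with verbose, detailed, and creative descriptions of what happens next "
    "after Player's response."
)
print(prompt)
```

The resulting string can be passed to whichever text-generation front end you use.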

## Language Models Used Credits:

13B-airoboros-gpt4-1.4 by jondurbin

https://huggingface.co/jondurbin/airoboros-13b-gpt4-1.4

13B-orca_mini_v2 by psmathur

https://huggingface.co/psmathur/orca_mini_v2_13b

13B-gpt4-x-alpaca by chavinlo

https://huggingface.co/chavinlo/gpt4-x-alpaca

13B-Vicuna-cocktail by reeducator

https://huggingface.co/reeducator/vicuna-13b-cocktail

Also thanks to Meta for LLaMA.

Each model was hand picked and considered for what it could contribute to this ensemble.
Thanks to each and every one of you for your incredible work developing some of the best things
to come out of this community.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_CalderaAI__13B-Ouroboros)

| Metric                | Value                     |
|-----------------------|---------------------------|
| Avg.                  | 44.66                     |
| ARC (25-shot)         | 57.42                     |
| HellaSwag (10-shot)   | 82.11                     |
| MMLU (5-shot)         | 51.43                     |
| TruthfulQA (0-shot)   | 47.99                     |
| Winogrande (5-shot)   | 57.85                     |
| GSM8K (5-shot)        | 0.45                      |
| DROP (3-shot)         | 15.36                     |
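
As a quick sanity check, the snippet below recomputes the `Avg.` row, assuming it is the unweighted mean of the seven benchmark scores listed in the table. The variable names are illustrative only.

```python
# Scores copied from the leaderboard table in this card.
scores = {
    "ARC (25-shot)": 57.42,
    "HellaSwag (10-shot)": 82.11,
    "MMLU (5-shot)": 51.43,
    "TruthfulQA (0-shot)": 47.99,
    "Winogrande (5-shot)": 57.85,
    "GSM8K (5-shot)": 0.45,
    "DROP (3-shot)": 15.36,
}

# Unweighted mean, rounded to two decimals like the table.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 44.66
```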