---
base_model: upstage/SOLAR-10.7B-Instruct-v1.0
tags:
- alignment-handbook
- generated_from_trainer
- UNA
- single-turn
model-index:
- name: UNA-SOLAR-10.7B-Instruct-v1.0
  results: []
license: cc-by-nc-nd-4.0
language:
- en
library_name: transformers
---

# UNA: Uniform Neural Alignment

SFT Further:
- Linear
- 2e-5

Merges:
- Fan in: `0:2`
- Fan out: `-4:`
- Intermediary layers: `1/1/1/0/1/1/0/1/0/1/1/0/1/1/0`, using the on/off pattern as a form of regularisation.
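
The fan-in and fan-out values above use Python slice notation, and the intermediary pattern reads as a list of on/off flags. The sketch below only illustrates that notation. The 48-layer depth is an assumption taken from the SOLAR-10.7B architecture, `parse_slice` and `parse_mask` are hypothetical helpers, and this card does not say how UNA maps the 15-entry pattern onto layers, so the code only parses and counts it.

```python
# Illustrative only: this card does not describe how UNA applies these values.
NUM_LAYERS = 48  # assumption: decoder depth of SOLAR-10.7B


def parse_slice(spec: str) -> slice:
    """Turn a string such as '0:2' or '-4:' into a Python slice."""
    start, _, stop = spec.partition(":")
    return slice(int(start) if start else None, int(stop) if stop else None)


def parse_mask(spec: str) -> list[bool]:
    """Turn '1/0/1' into [True, False, True]."""
    return [flag == "1" for flag in spec.split("/")]


layers = list(range(NUM_LAYERS))
fan_in = layers[parse_slice("0:2")]    # the first two layers
fan_out = layers[parse_slice("-4:")]   # the last four layers
mask = parse_mask("1/1/1/0/1/1/0/1/0/1/1/0/1/1/0")

print("fan in :", fan_in)
print("fan out:", fan_out)
print("mask   :", sum(mask), "on,", len(mask) - sum(mask), "off")
```
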

## Quants

* [ggml-model-q5_k_m.gguf](https://huggingface.co/fblgit/UNA-SOLAR-10.7B-Instruct-v1.0/resolve/main/ggml-model-q5_k_m.gguf?download=true)
* [ggml-model-q6_k.gguf](https://huggingface.co/fblgit/UNA-SOLAR-10.7B-Instruct-v1.0/resolve/main/ggml-model-q6_k.gguf?download=true)
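
The quantised files listed above can also be fetched programmatically. This is a minimal sketch, assuming the `huggingface_hub` package is installed; `quant_url` and `fetch_quant` are hypothetical helper names, and only the repository ID and file names come from the links above.

```python
REPO_ID = "fblgit/UNA-SOLAR-10.7B-Instruct-v1.0"
QUANT_FILES = ("ggml-model-q5_k_m.gguf", "ggml-model-q6_k.gguf")


def quant_url(filename: str) -> str:
    """Build the direct-download URL in the same form as the links above."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{filename}?download=true"


def fetch_quant(filename: str) -> str:
    """Download one quant into the local Hugging Face cache and return its path."""
    # Imported lazily so quant_url() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=filename)


for name in QUANT_FILES:
    print(quant_url(name))
```

Calling `fetch_quant("ggml-model-q5_k_m.gguf")` downloads several gigabytes, so it is left out of the loop above.
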

## Libraries:

- Transformers 4.35.0-UNA
- PyTorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1
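
To see how a local environment compares with the versions listed above, a small check along these lines may help. It is a sketch using only the standard library. The PyPI package names in `PINNED` are assumptions, and `4.35.0-UNA` appears to be a custom Transformers build, so only the `4.35.0` prefix is compared.

```python
from importlib import metadata

# Versions from the list above; package names are assumed PyPI names.
PINNED = {
    "transformers": "4.35.0",  # the card lists a custom "4.35.0-UNA" build
    "torch": "2.1.0",
    "datasets": "2.14.6",
    "tokenizers": "0.14.1",
}


def compare_versions(pinned, lookup=metadata.version):
    """Return {package: (wanted, installed_or_None, matches)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Prefix match tolerates local suffixes such as "2.1.0+cu121".
        matches = installed is not None and installed.startswith(wanted)
        report[name] = (wanted, installed, matches)
    return report


for name, (wanted, installed, ok) in compare_versions(PINNED).items():
    status = "OK" if ok else "differs"
    print(f"{name:<13} wanted {wanted:<8} installed {installed} ({status})")
```
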

## Evals LM-Evaluation Harness

`mt-bench`:
```
Mode: single
Input file: data/mt_bench/model_judgment/gpt-4_single.jsonl

########## First turn ##########
                                      score
model                         turn
gpt-4                         1     8.95625
claude-v1                     1     8.15000
gpt-3.5-turbo                 1     8.07500
LUNA-SOLARkrautLM-Instruct    1     7.93750
UNA-SOLAR-10.7B-Instruct-v1.0 1     7.80625
vicuna-33b-v1.3               1     7.45625
wizardlm-30b                  1     7.13125
tulu-30b                      1     7.01875
vicuna-13b-v1.3               1     6.81250
guanaco-65b                   1     6.78125
nous-hermes-13b               1     6.43125
alpaca-13b                    1     4.97500
rwkv-4-raven-14b              1     4.74375
llama-13b                     1     3.26250

########## Second turn ##########
                                       score
model                         turn
gpt-4                         2     9.025000
gpt-3.5-turbo                 2     7.812500
claude-v1                     2     7.650000
UNA-SOLAR-10.7B-Instruct-v1.0 2     7.237500
LUNA-SOLARkrautLM-Instruct    2     6.987500
wizardlm-30b                  2     6.887500
vicuna-33b-v1.3               2     6.787500
guanaco-65b                   2     6.037500
vicuna-13b-v1.3               2     5.962500
tulu-30b                      2     5.850000
nous-hermes-13b               2     4.664557
alpaca-13b                    2     4.087500
rwkv-4-raven-14b              2     3.225000
llama-13b                     2     1.950000

########## Average ##########
                                  score
model
gpt-4                          8.990625
gpt-3.5-turbo                  7.943750
claude-instant-v1              7.905660
claude-v1                      7.900000
UNA-SOLAR-10.7B-Instruct-v1.0  7.521875
LUNA-SOLARkrautLM-Instruct     7.462500
vicuna-33b-v1.3                7.121875
wizardlm-30b                   7.009375
Llama-2-70b-chat               6.856250
Llama-2-13b-chat               6.650000
guanaco-33b                    6.528125
tulu-30b                       6.434375
guanaco-65b                    6.409375
oasst-sft-7-llama-30b          6.409375
palm-2-chat-bison-001          6.400000
mpt-30b-chat                   6.393750
vicuna-13b-v1.3                6.387500
wizardlm-13b                   6.353125
Llama-2-7b-chat                6.268750
vicuna-7b-v1.3                 5.996875
baize-v2-13b                   5.750000
nous-hermes-13b                5.553459
mpt-7b-chat                    5.459119
gpt4all-13b-snoozy             5.452830
koala-13b                      5.350000
mpt-30b-instruct               5.218750
falcon-40b-instruct            5.168750
h2ogpt-oasst-open-llama-13b    4.625000
alpaca-13b                     4.531250
chatglm-6b                     4.500000
oasst-sft-4-pythia-12b         4.318750
rwkv-4-raven-14b               3.984375
dolly-v2-12b                   3.275000
fastchat-t5-3b                 3.040625
stablelm-tuned-alpha-7b        2.753125
llama-13b                      2.606250
```
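
For the models shown here, the reported average is consistent with the plain mean of the two turn scores. The sketch below checks that arithmetic on three rows copied from the output above. It assumes both turns were judged on the same number of questions, which may not hold for every model in the table.

```python
# Scores copied from the MT-Bench output above:
# (first turn, second turn, reported average).
scores = {
    "gpt-4": (8.95625, 9.025, 8.990625),
    "UNA-SOLAR-10.7B-Instruct-v1.0": (7.80625, 7.2375, 7.521875),
    "LUNA-SOLARkrautLM-Instruct": (7.9375, 6.9875, 7.4625),
}


def mean_of_turns(first: float, second: float) -> float:
    """Unweighted mean of the two turn scores."""
    return (first + second) / 2


for model, (first, second, reported) in scores.items():
    computed = mean_of_turns(first, second)
    print(f"{model}: computed {computed:.6f}, reported {reported:.6f}")
```
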

`big-refactor` branch:

```
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 25, batch_size: auto (32)
|    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
|-------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge|Yaml   |none  |    25|acc     |0.6954|±  |0.0134|
|             |       |none  |    25|acc_norm|0.7167|±  |0.0132|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|  Filter  |n-shot|  Metric   |Value|   |Stderr|
|-----|-------|----------|-----:|-----------|----:|---|-----:|
|gsm8k|Yaml   |get-answer|     5|exact_match|0.671|±  |0.0129|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (64)
|    Tasks     |Version|Filter|n-shot|Metric|Value |   |Stderr|
|--------------|-------|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2|Yaml   |none  |     0|acc   |0.7297|±  |0.0149|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 10, batch_size: auto (32)
|  Tasks  |Version|Filter|n-shot| Metric |Value |   |Stderr|
|---------|-------|------|-----:|--------|-----:|---|-----:|
|hellaswag|Yaml   |none  |    10|acc     |0.7091|±  |0.0045|
|         |       |none  |    10|acc_norm|0.8821|±  |0.0032|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (32)
|    Tasks     |Version|Filter|n-shot|  Metric  |Value |   |Stderr|
|--------------|-------|------|-----:|----------|-----:|---|-----:|
|boolq         |Yaml   |none  |     0|acc       |0.8807|±  |0.0057|
|lambada_openai|Yaml   |none  |     0|perplexity|3.2452|±  |0.0778|
|              |       |none  |     0|acc       |0.7207|±  |0.0063|
|piqa          |Yaml   |none  |     0|acc       |0.8020|±  |0.0093|
|              |       |none  |     0|acc_norm  |0.8009|±  |0.0093|
|sciq          |Yaml   |none  |     0|acc       |0.9730|±  |0.0051|
|              |       |none  |     0|acc_norm  |0.9630|±  |0.0060|
|winogrande    |Yaml   |none  |     0|acc       |0.7577|±  |0.0120|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (64)
| Tasks  |Version|Filter|n-shot| Metric |Value |   |Stderr|
|--------|-------|------|-----:|--------|-----:|---|-----:|
|mathqa  |Yaml   |none  |     0|acc     |0.3474|±  |0.0087|
|        |       |none  |     0|acc_norm|0.3568|±  |0.0088|
|pubmedqa|Yaml   |none  |     0|acc     |0.5400|±  |0.0223|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto
|                        Tasks                         |Version|Filter|n-shot|  Metric   |Value |   |Stderr|
|------------------------------------------------------|-------|------|-----:|-----------|-----:|---|-----:|
|bbh_fewshot                                           |N/A    |none  |     0|exact_match|0.4660|±  |0.1771|
| - bbh_fewshot_boolean_expressions                    |Yaml   |none  |     0|exact_match|0.8160|±  |0.0246|
| - bbh_fewshot_causal_judgement                       |Yaml   |none  |     0|exact_match|0.4973|±  |0.0367|
| - bbh_fewshot_date_understanding                     |Yaml   |none  |     0|exact_match|0.4840|±  |0.0317|
| - bbh_fewshot_disambiguation_qa                      |Yaml   |none  |     0|exact_match|0.6520|±  |0.0302|
| - bbh_fewshot_dyck_languages                         |Yaml   |none  |     0|exact_match|0.2040|±  |0.0255|
| - bbh_fewshot_formal_fallacies                       |Yaml   |none  |     0|exact_match|0.5280|±  |0.0316|
| - bbh_fewshot_geometric_shapes                       |Yaml   |none  |     0|exact_match|0.3360|±  |0.0299|
| - bbh_fewshot_hyperbaton                             |Yaml   |none  |     0|exact_match|0.5520|±  |0.0315|
| - bbh_fewshot_logical_deduction_five_objects         |Yaml   |none  |     0|exact_match|0.4520|±  |0.0315|
| - bbh_fewshot_logical_deduction_seven_objects        |Yaml   |none  |     0|exact_match|0.3920|±  |0.0309|
| - bbh_fewshot_logical_deduction_three_objects        |Yaml   |none  |     0|exact_match|0.6200|±  |0.0308|
| - bbh_fewshot_movie_recommendation                   |Yaml   |none  |     0|exact_match|0.6640|±  |0.0299|
| - bbh_fewshot_multistep_arithmetic_two               |Yaml   |none  |     0|exact_match|0.0080|±  |0.0056|
| - bbh_fewshot_navigate                               |Yaml   |none  |     0|exact_match|0.6280|±  |0.0306|
| - bbh_fewshot_object_counting                        |Yaml   |none  |     0|exact_match|0.3960|±  |0.0310|
| - bbh_fewshot_penguins_in_a_table                    |Yaml   |none  |     0|exact_match|0.4726|±  |0.0415|
| - bbh_fewshot_reasoning_about_colored_objects        |Yaml   |none  |     0|exact_match|0.5320|±  |0.0316|
| - bbh_fewshot_ruin_names                             |Yaml   |none  |     0|exact_match|0.5680|±  |0.0314|
| - bbh_fewshot_salient_translation_error_detection    |Yaml   |none  |     0|exact_match|0.5480|±  |0.0315|
| - bbh_fewshot_snarks                                 |Yaml   |none  |     0|exact_match|0.5169|±  |0.0376|
| - bbh_fewshot_sports_understanding                   |Yaml   |none  |     0|exact_match|0.8320|±  |0.0237|
| - bbh_fewshot_temporal_sequences                     |Yaml   |none  |     0|exact_match|0.5520|±  |0.0315|
| - bbh_fewshot_tracking_shuffled_objects_five_objects |Yaml   |none  |     0|exact_match|0.1480|±  |0.0225|
| - bbh_fewshot_tracking_shuffled_objects_seven_objects|Yaml   |none  |     0|exact_match|0.1720|±  |0.0239|
| - bbh_fewshot_tracking_shuffled_objects_three_objects|Yaml   |none  |     0|exact_match|0.2760|±  |0.0283|
| - bbh_fewshot_web_of_lies                            |Yaml   |none  |     0|exact_match|0.4760|±  |0.0316|
| - bbh_fewshot_word_sorting                           |Yaml   |none  |     0|exact_match|0.2840|±  |0.0286|

|  Groups   |Version|Filter|n-shot|  Metric   |Value|   |Stderr|
|-----------|-------|------|-----:|-----------|----:|---|-----:|
|bbh_fewshot|N/A    |none  |     0|exact_match|0.466|±  |0.1771|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto (16)
|                 Tasks                 |Version|Filter|n-shot|Metric|Value |   |Stderr|
|---------------------------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu                                   |N/A    |none  |     0|acc   |0.6513|±  |0.1221|
| - humanities                          |N/A    |none  |     5|acc   |0.6077|±  |0.1185|
|  - formal_logic                       |Yaml   |none  |     5|acc   |0.4444|±  |0.0444|
|  - high_school_european_history       |Yaml   |none  |     5|acc   |0.8121|±  |0.0305|
|  - high_school_us_history             |Yaml   |none  |     5|acc   |0.8431|±  |0.0255|
|  - high_school_world_history          |Yaml   |none  |     5|acc   |0.8523|±  |0.0231|
|  - international_law                  |Yaml   |none  |     5|acc   |0.7851|±  |0.0375|
|  - jurisprudence                      |Yaml   |none  |     5|acc   |0.7870|±  |0.0396|
|  - logical_fallacies                  |Yaml   |none  |     5|acc   |0.7546|±  |0.0338|
|  - moral_disputes                     |Yaml   |none  |     5|acc   |0.7370|±  |0.0237|
|  - moral_scenarios                    |Yaml   |none  |     5|acc   |0.4101|±  |0.0164|
|  - philosophy                         |Yaml   |none  |     5|acc   |0.7170|±  |0.0256|
|  - prehistory                         |Yaml   |none  |     5|acc   |0.7840|±  |0.0229|
|  - professional_law                   |Yaml   |none  |     5|acc   |0.4941|±  |0.0128|
|  - world_religions                    |Yaml   |none  |     5|acc   |0.7895|±  |0.0313|
| - other                               |N/A    |none  |     5|acc   |0.7116|±  |0.0939|
|  - business_ethics                    |Yaml   |none  |     5|acc   |0.7600|±  |0.0429|
|  - clinical_knowledge                 |Yaml   |none  |     5|acc   |0.6792|±  |0.0287|
|  - college_medicine                   |Yaml   |none  |     5|acc   |0.6590|±  |0.0361|
|  - global_facts                       |Yaml   |none  |     5|acc   |0.3400|±  |0.0476|
|  - human_aging                        |Yaml   |none  |     5|acc   |0.6816|±  |0.0313|
|  - management                         |Yaml   |none  |     5|acc   |0.8350|±  |0.0368|
|  - marketing                          |Yaml   |none  |     5|acc   |0.8547|±  |0.0231|
|  - medical_genetics                   |Yaml   |none  |     5|acc   |0.7000|±  |0.0461|
|  - miscellaneous                      |Yaml   |none  |     5|acc   |0.8020|±  |0.0142|
|  - nutrition                          |Yaml   |none  |     5|acc   |0.7418|±  |0.0251|
|  - professional_accounting            |Yaml   |none  |     5|acc   |0.5071|±  |0.0298|
|  - professional_medicine              |Yaml   |none  |     5|acc   |0.7500|±  |0.0263|
|  - virology                           |Yaml   |none  |     5|acc   |0.5843|±  |0.0384|
| - social_sciences                     |N/A    |none  |     5|acc   |0.7537|±  |0.0681|
|  - econometrics                       |Yaml   |none  |     5|acc   |0.5000|±  |0.0470|
|  - high_school_geography              |Yaml   |none  |     5|acc   |0.8586|±  |0.0248|
|  - high_school_government_and_politics|Yaml   |none  |     5|acc   |0.9016|±  |0.0215|
|  - high_school_macroeconomics         |Yaml   |none  |     5|acc   |0.6615|±  |0.0240|
|  - high_school_microeconomics         |Yaml   |none  |     5|acc   |0.7311|±  |0.0288|
|  - high_school_psychology             |Yaml   |none  |     5|acc   |0.8404|±  |0.0157|
|  - human_sexuality                    |Yaml   |none  |     5|acc   |0.7328|±  |0.0388|
|  - professional_psychology            |Yaml   |none  |     5|acc   |0.6814|±  |0.0189|
|  - public_relations                   |Yaml   |none  |     5|acc   |0.6909|±  |0.0443|
|  - security_studies                   |Yaml   |none  |     5|acc   |0.7469|±  |0.0278|
|  - sociology                          |Yaml   |none  |     5|acc   |0.8308|±  |0.0265|
|  - us_foreign_policy                  |Yaml   |none  |     5|acc   |0.8900|±  |0.0314|
| - stem                                |N/A    |none  |     5|acc   |0.5569|±  |0.1380|
|  - abstract_algebra                   |Yaml   |none  |     5|acc   |0.4100|±  |0.0494|
|  - anatomy                            |Yaml   |none  |     5|acc   |0.6222|±  |0.0419|
|  - astronomy                          |Yaml   |none  |     5|acc   |0.7368|±  |0.0358|
|  - college_biology                    |Yaml   |none  |     5|acc   |0.8056|±  |0.0331|
|  - college_chemistry                  |Yaml   |none  |     5|acc   |0.4700|±  |0.0502|
|  - college_computer_science           |Yaml   |none  |     5|acc   |0.5100|±  |0.0502|
|  - college_mathematics                |Yaml   |none  |     5|acc   |0.2800|±  |0.0451|
|  - college_physics                    |Yaml   |none  |     5|acc   |0.3431|±  |0.0472|
|  - computer_security                  |Yaml   |none  |     5|acc   |0.7400|±  |0.0441|
|  - conceptual_physics                 |Yaml   |none  |     5|acc   |0.6340|±  |0.0315|
|  - electrical_engineering             |Yaml   |none  |     5|acc   |0.6000|±  |0.0408|
|  - elementary_mathematics             |Yaml   |none  |     5|acc   |0.4815|±  |0.0257|
|  - high_school_biology                |Yaml   |none  |     5|acc   |0.8032|±  |0.0226|
|  - high_school_chemistry              |Yaml   |none  |     5|acc   |0.4877|±  |0.0352|
|  - high_school_computer_science       |Yaml   |none  |     5|acc   |0.7200|±  |0.0451|
|  - high_school_mathematics            |Yaml   |none  |     5|acc   |0.3815|±  |0.0296|
|  - high_school_physics                |Yaml   |none  |     5|acc   |0.3576|±  |0.0391|
|  - high_school_statistics             |Yaml   |none  |     5|acc   |0.5602|±  |0.0339|
|  - machine_learning                   |Yaml   |none  |     5|acc   |0.4643|±  |0.0473|

|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu              |N/A    |none  |     0|acc   |0.6513|±  |0.1221|
| - humanities     |N/A    |none  |     5|acc   |0.6077|±  |0.1185|
| - other          |N/A    |none  |     5|acc   |0.7116|±  |0.0939|
| - social_sciences|N/A    |none  |     5|acc   |0.7537|±  |0.0681|
| - stem           |N/A    |none  |     5|acc   |0.5569|±  |0.1380|
```
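
Each run header above records the model, task, few-shot count, and batch size, which is enough to sketch how such a run might be launched again. The snippet below only builds and prints command lines and does not execute them. It assumes the `lm_eval` command-line entry point of a recent lm-evaluation-harness release; the `big-refactor` branch used for these results may have been invoked differently, and `build_command` is a hypothetical helper.

```python
import shlex

MODEL = "fblgit/UNA-SOLAR-10.7B-Instruct-v1.0"

# (task, few-shot count) pairs read from the run headers above.
RUNS = [
    ("arc_challenge", 25),
    ("gsm8k", 5),
    ("truthfulqa_mc2", 0),
    ("hellaswag", 10),
    ("mmlu", 5),
]


def build_command(task, num_fewshot, dtype=None):
    """Return an argument list for one evaluation run (assumed `lm_eval` CLI)."""
    model_args = f"pretrained={MODEL}"
    if dtype:
        model_args += f",dtype={dtype}"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", task,
        "--num_fewshot", str(num_fewshot),
        "--batch_size", "auto",
    ]


for task, shots in RUNS:
    print(shlex.join(build_command(task, shots)))
```

Passing `dtype="float16"` mirrors the headers of the zero-shot runs that list `dtype=float16`.
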

## Citations

Thanks to [Upstage.AI](https://huggingface.co/upstage) for its awesome base model; this is merely a UNA of it. It can only refine what is already in there :)

If you find UNA-SOLAR useful, cite and support the authors.