Initialize project; model provided by the ModelHub XC community
Model: Undi95/MLewd-ReMM-L2-Chat-20B Source: Original Platform
35
.gitattributes
vendored
Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
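As a rough illustration of what these rules do, the sketch below checks a few file names against a subset of the patterns using Python's `fnmatch`. It is a simplification, not Git's own matching logic: directory patterns such as `saved_model/**/*` are not handled, and the names `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the basename patterns from the .gitattributes file.
LFS_PATTERNS = ["*.bin", "*.model", "*.safetensors", "*.zip", "*tfevents*"]


def is_lfs_tracked(filename: str) -> bool:
    """Return True if the file's basename matches any pattern (simplified)."""
    basename = filename.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in LFS_PATTERNS)


for name in ("model-00001-of-00005.safetensors", "tokenizer.model", "config.json"):
    print(name, is_lfs_tracked(name))
```

Under these assumptions the weight shards and `tokenizer.model` match an LFS rule while `config.json` does not, which is consistent with how those files appear in this commit.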
130
README.md
Normal file
@@ -0,0 +1,130 @@
---
license: cc-by-nc-4.0
tags:
- not-for-all-audiences
- nsfw
---

First:
```yaml
layer_slices:
  - model: Undi95/MLewd-L2-Chat-13B
    start: 0
    end: 16
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 8
    end: 20
  - model: Undi95/MLewd-L2-Chat-13B
    start: 17
    end: 32
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 21
    end: 40
```

Inverted:
```yaml
layer_slices:
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 0
    end: 16
  - model: Undi95/MLewd-L2-Chat-13B
    start: 8
    end: 20
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 17
    end: 32
  - model: Undi95/MLewd-L2-Chat-13B
    start: 21
    end: 40
```

Precise:
```yaml
layer_slices:
  - model: Undi95/MLewd-L2-Chat-13B
    start: 0
    end: 8
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 4
    end: 12
  - model: Undi95/MLewd-L2-Chat-13B
    start: 9
    end: 16
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 13
    end: 22
  - model: Undi95/MLewd-L2-Chat-13B
    start: 17
    end: 24
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 23
    end: 32
  - model: Undi95/MLewd-L2-Chat-13B
    start: 25
    end: 32
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 33
    end: 40
```

PreciseInverted:
```yaml
layer_slices:
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 0
    end: 8
  - model: Undi95/MLewd-L2-Chat-13B
    start: 4
    end: 12
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 9
    end: 16
  - model: Undi95/MLewd-L2-Chat-13B
    start: 13
    end: 22
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 17
    end: 24
  - model: Undi95/MLewd-L2-Chat-13B
    start: 23
    end: 32
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 25
    end: 32
  - model: Undi95/MLewd-L2-Chat-13B
    start: 33
    end: 40
```

Part1 = ReMM v2.1 merged with MLewd at a low weight to keep consistency. I call this "dilution", and the results show consistency and coherency without repetition or looping, despite the small amount of duplicated data.

The goal is to find the best way to interlace layers and hit a sweet spot between 13B and 30B+.

Normal/Inverted interlace chunks of 16 layers, and Precise/PreciseInverted interlace chunks of 8 layers.

All the models are made of 64(+1) layers. They still need testing.

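To make the chunk arithmetic concrete, here is a minimal sketch that adds up the layers contributed by each slice. It assumes `end` is exclusive (a half-open range), which the README does not state; `total_layers` and the constant names are hypothetical.

```python
# (start, end) pairs copied from the slice configurations in this README.
# Assumption: `end` is exclusive, so a slice contributes end - start layers.
FIRST = [(0, 16), (8, 20), (17, 32), (21, 40)]
PRECISE = [
    (0, 8), (4, 12), (9, 16), (13, 22),
    (17, 24), (23, 32), (25, 32), (33, 40),
]


def total_layers(slices):
    """Sum the number of layers contributed by every slice."""
    return sum(end - start for start, end in slices)


print("First:", total_layers(FIRST))
print("Precise:", total_layers(PRECISE))
```

Under that assumption both layouts come to 62 layers, which matches `num_hidden_layers` in the `config.json` of this commit. The inverted variants use the same ranges with the two models swapped, so they give the same total. The 64(+1) figure quoted above may reflect a different counting convention.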
## Prompt template: Alpaca

```
Below is an instruction that describes a task. Write a response that completes the request.

### Instruction:
{prompt}

### Response:
```
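A minimal sketch of filling in this template from Python, assuming the model receives the text as a plain string; `ALPACA_TEMPLATE` and `build_prompt` are hypothetical names, not part of any library.

```python
# The Alpaca template from this README, with {prompt} as the placeholder.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that completes the request.\n"
    "\n"
    "### Instruction:\n"
    "{prompt}\n"
    "\n"
    "### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Insert the user's instruction into the Alpaca template."""
    return ALPACA_TEMPLATE.format(prompt=instruction.strip())


print(build_prompt("Summarize the plot of Hamlet in two sentences."))
```

The trailing newline after `### Response:` leaves the model to continue from the start of a fresh line. Whether that trailing newline helps is an assumption worth testing per backend.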
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Undi95__MLewd-ReMM-L2-Chat-20B)

| Metric                | Value                     |
|-----------------------|---------------------------|
| Avg.                  | 53.33                     |
| ARC (25-shot)         | 62.46                     |
| HellaSwag (10-shot)   | 85.62                     |
| MMLU (5-shot)         | 59.13                     |
| TruthfulQA (0-shot)   | 55.63                     |
| Winogrande (5-shot)   | 77.19                     |
| GSM8K (5-shot)        | 10.92                     |
| DROP (3-shot)         | 22.33                     |
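The `Avg.` row appears to be the unweighted mean of the seven benchmark scores. The short check below reproduces it under that assumption; `SCORES` is just a hypothetical container for the values copied from the table.

```python
# Benchmark scores copied from the leaderboard table.
SCORES = {
    "ARC (25-shot)": 62.46,
    "HellaSwag (10-shot)": 85.62,
    "MMLU (5-shot)": 59.13,
    "TruthfulQA (0-shot)": 55.63,
    "Winogrande (5-shot)": 77.19,
    "GSM8K (5-shot)": 10.92,
    "DROP (3-shot)": 22.33,
}

# Unweighted mean, rounded to two decimals like the table.
average = round(sum(SCORES.values()) / len(SCORES), 2)
print(average)
```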
26
config.json
Normal file
@@ -0,0 +1,26 @@
{
  "_name_or_path": "Undi95/MLewd-L2-Chat-13B",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 40,
  "num_hidden_layers": 62,
  "num_key_value_heads": 40,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.33.2",
  "use_cache": true,
  "vocab_size": 32000
}
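As a sanity check on the "20B" in the model name, the sketch below estimates the parameter count from the values in this `config.json`. It assumes the usual Llama layout: four square attention projections per layer (valid here because `num_key_value_heads` equals `num_attention_heads`), three MLP projections, two RMSNorm weight vectors per layer, no biases, and untied input and output embeddings. `estimate_llama_params` is a hypothetical helper.

```python
# Values copied from config.json.
CONFIG = {
    "hidden_size": 5120,
    "intermediate_size": 13824,
    "num_hidden_layers": 62,
    "vocab_size": 32000,
}


def estimate_llama_params(cfg):
    """Estimate the parameter count of a Llama-style model (no biases)."""
    h = cfg["hidden_size"]
    inter = cfg["intermediate_size"]
    attention = 4 * h * h             # q, k, v and o projections
    mlp = 3 * h * inter               # gate, up and down projections
    norms = 2 * h                     # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * cfg["vocab_size"] * h  # input embedding + untied lm_head
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + final_norm


print(f"{estimate_llama_params(CONFIG):,}")
```

With these assumptions the estimate is 19,994,362,880 parameters, or roughly 20 billion.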
3
model-00001-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0eed3a372e11f225e959ed6f98e41fb5d3f539ae6cd0afc29849b3e87627ff48
size 9985398280
3
model-00002-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8248722ec84c0889fb239917a6e3cee380a002cac71504f4f8da416fdbab0aa8
size 9956563000
3
model-00003-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7d59662aed960916a584e0d077286d5ba513d92698eaf6ab8885f173018fda4a
size 9993273312
3
model-00004-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6a67c9735625d38115bfebc80be8af32654aa478729de9d0ddfaf4aec38ffd94
size 9725876168
3
model-00005-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:63c585b4b9e15ed700bf6b26188072ab269c3ce98dd193a080b69a4ca689cdcb
size 655360128
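Each shard above is committed as a three-line Git LFS pointer rather than the weights themselves. The sketch below parses one pointer and totals the sizes recorded in the five pointers; `parse_lfs_pointer` is a hypothetical helper that assumes every line is a `key value` pair separated by a single space.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer for model-00005-of-00005.safetensors, copied from this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:63c585b4b9e15ed700bf6b26188072ab269c3ce98dd193a080b69a4ca689cdcb
size 655360128
"""

# Sizes in bytes from the five shard pointers in this commit.
SHARD_SIZES = [9985398280, 9956563000, 9993273312, 9725876168, 655360128]

info = parse_lfs_pointer(POINTER)
total_bytes = sum(SHARD_SIZES)
print(info["algorithm"], info["size"])
print(f"{total_bytes:,} bytes, about {total_bytes / 2**30:.1f} GiB")
```

The total is 40,316,470,888 bytes, so a full checkout needs a little under 38 GiB of disk space for the weights alone.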
1
model.safetensors.index.json
Normal file
File diff suppressed because one or more lines are too long
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
93391
tokenizer.json
Normal file
File diff suppressed because it is too large
BIN
tokenizer.model
(Stored with Git LFS)
Normal file
Binary file not shown.
34
tokenizer_config.json
Normal file
@@ -0,0 +1,34 @@
{
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "padding_side": "right",
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "use_default_system_prompt": true
}