Initialize project; model provided by the ModelHub XC community
Model: SanjiWatsuki/Silicon-Maid-7B Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
assets/cybermaid.png filter=lfs diff=lfs merge=lfs -text
118
README.md
Normal file
@@ -0,0 +1,118 @@
---
license: cc-by-4.0
language:
- en
tags:
- merge
- not-for-all-audiences
- nsfw
---

<div style="display: flex; justify-content: center; align-items: center">
  <img src="https://huggingface.co/SanjiWatsuki/Silicon-Maid-7B/resolve/main/assets/cybermaid.png">
</div>

<p align="center">
  <big><b>Top 1 RP Performer on MT-bench 🤪</b></big>
</p>

<p align="center">
  <strong>Next Gen Silicon-Based RP Maid</strong>
</p>

## WTF is This?

Silicon-Maid-7B is another model targeted at being both strong at RP **and** a smart cookie that can follow character cards very well. As of right now, Silicon-Maid-7B outscores both of my previous 7B RP models in my RP benchmark, and I have been impressed by this model's creativity. It is suitable for RP/ERP and general use. Quants can be found [here](https://huggingface.co/collections/SanjiWatsuki/silicon-maid-7b-658d1669292816fe4992daa4).

It's built on [xDAN-AI/xDAN-L1-Chat-RL-v1](https://huggingface.co/xDAN-AI/xDAN-L1-Chat-RL-v1), a 7B model which scores unusually high on MT-Bench, and chargoddard/loyal-piano-m7, an Alpaca-format 7B model with surprisingly creative outputs. I was excited to see xDAN-L1-Chat-RL-v1 for two main reasons:

* MT-Bench normally correlates well with real-world model quality.
* It was an Alpaca-prompt model with high benchmark scores, which meant I could try swapping out the Marcoroni frankenmerge used in my previous model for it.

**MT-Bench Average Turn**

| model | score | size |
|--------------------|-----------|--------|
| gpt-4 | 8.99 | - |
| *xDAN-L1-Chat-RL-v1* | 8.24^1 | 7b |
| Starling-7B | 8.09 | 7b |
| Claude-2 | 8.06 | - |
| **Silicon-Maid** | **7.96** | **7b** |
| *Loyal-Macaroni-Maid* | 7.95 | 7b |
| gpt-3.5-turbo | 7.94 | 20b? |
| Claude-1 | 7.90 | - |
| OpenChat-3.5 | 7.81 | - |
| vicuna-33b-v1.3 | 7.12 | 33b |
| wizardlm-30b | 7.01 | 30b |
| Llama-2-70b-chat | 6.86 | 70b |

^1 xDAN's testing placed it at 8.35; this number is from my independent MT-Bench run.

<img src="https://huggingface.co/SanjiWatsuki/Silicon-Maid-7B/resolve/main/assets/fig-silicon-loyal.png">

It's unclear to me whether xDAN-L1-Chat-RL-v1 is overtly benchmaxxing, but it seemed like a solid 7B from my limited testing (although nothing that screams 2nd best model behind GPT-4). Amusingly, the model lost a lot of its Reasoning and Coding skills in the merger. This was a much greater MT-Bench dropoff than I expected, perhaps suggesting that the Math/Reasoning ability in the original model was rather dense and susceptible to being lost in a DARE TIES merger?

Besides that, the merger is almost identical to the Loyal-Macaroni-Maid merger, with a new base "smart cookie" model. If you liked any of my previous RP models, give this one a shot and let me know in the Community tab what you think!

### The Sauce

```
models: # Top-Loyal-Bruins-Maid-DARE-7B
  - model: mistralai/Mistral-7B-v0.1
    # no parameters necessary for base model
  - model: xDAN-AI/xDAN-L1-Chat-RL-v1
    parameters:
      weight: 0.4
      density: 0.8
  - model: chargoddard/loyal-piano-m7
    parameters:
      weight: 0.3
      density: 0.8
  - model: Undi95/Toppy-M-7B
    parameters:
      weight: 0.2
      density: 0.4
  - model: NeverSleep/Noromaid-7b-v0.2
    parameters:
      weight: 0.2
      density: 0.4
  - model: athirdpath/NSFW_DPO_vmgb-7b
    parameters:
      weight: 0.2
      density: 0.4
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  int8_mask: true
dtype: bfloat16
```

For more information about why I use this merger, see the [Loyal-Macaroni-Maid repo](https://huggingface.co/SanjiWatsuki/Loyal-Macaroni-Maid-7B#the-sauce-all-you-need-is-dare).
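
As a rough intuition for the `weight` and `density` values in the merge config, DARE takes each fine-tuned model's delta from the base weights, randomly keeps each entry with probability `density`, rescales the survivors by `1/density`, and adds the weighted deltas back onto the base. The sketch below is a toy illustration on plain Python lists under those assumptions. It is not mergekit's implementation, it skips the TIES sign-election step that `dare_ties` also applies, it ignores any weight normalization, and the names `dare_delta` and `merge` are hypothetical.

```python
import random


def dare_delta(base, tuned, density, rng):
    """Drop-and-rescale: keep each delta entry with probability `density`,
    then divide survivors by `density` so the expected delta is unchanged."""
    out = []
    for b, t in zip(base, tuned):
        delta = t - b
        keep = rng.random() < density
        out.append(delta / density if keep else 0.0)
    return out


def merge(base, tuned_models, seed=0):
    """Add weighted, sparsified deltas to the base weights.

    `tuned_models` is a list of (weights, weight, density) tuples.
    Simplification: no TIES sign election, so conflicting deltas just add up.
    """
    rng = random.Random(seed)
    merged = list(base)
    for tuned, weight, density in tuned_models:
        delta = dare_delta(base, tuned, density, rng)
        merged = [m + weight * d for m, d in zip(merged, delta)]
    return merged


if __name__ == "__main__":
    base = [0.0, 1.0, -1.0, 0.5]
    model_a = [0.2, 1.1, -1.0, 0.9]  # toy stand-ins for fine-tuned weights
    model_b = [0.0, 0.7, -1.4, 0.5]
    print(merge(base, [(model_a, 0.4, 0.8), (model_b, 0.3, 0.8)]))
```

A lower `density` (0.4 for the three RP models in the config) means more of that model's delta is dropped before merging, while a `density` of 1.0 in this sketch keeps every delta entry unchanged.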

### Prompt Template (Alpaca)
I found the best SillyTavern results from using the Noromaid template, but please try other templates! Let me know if you find anything good.

SillyTavern config files: [Context](https://files.catbox.moe/ifmhai.json), [Instruct](https://files.catbox.moe/ttw1l9.json).

Additionally, here is my highly recommended [Text Completion preset](https://huggingface.co/SanjiWatsuki/Loyal-Macaroni-Maid-7B/blob/main/Characters/MinP.json). You can tweak this by raising the temperature or lowering min p to boost creativity, or by raising min p to increase stability. You shouldn't need to touch anything else!
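
For anyone unfamiliar with the min p setting, here is a minimal sketch of the usual definition: tokens whose probability falls below `min_p` times the top token's probability are discarded, and the rest are renormalized. Temperature is applied before the filter in this sketch, though backends may order samplers differently. The function names are hypothetical and this is not SillyTavern's or koboldcpp's code.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, scaling by temperature first."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Zero out tokens below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


if __name__ == "__main__":
    logits = [4.0, 3.0, 2.0, -2.0]  # toy logits for a 4-token vocabulary
    for min_p in (0.05, 0.3):
        probs = min_p_filter(softmax(logits, temperature=1.0), min_p)
        survivors = sum(1 for p in probs if p > 0)
        print(f"min_p={min_p}: {survivors} tokens kept")
```

Raising `min_p` prunes more of the tail (more stable output), while lowering it or raising the temperature leaves more low-probability tokens in play (more varied output).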

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```
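
If you are calling the model from your own script rather than SillyTavern, the template can be filled in with a few lines of Python. This is only a sketch: `ALPACA_TEMPLATE` and `build_prompt` are hypothetical names, and the trailing newline after `### Response:` is an assumption rather than something the template specifies.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "{prompt}\n\n"
    "### Response:\n"  # assumption: end with a newline so the reply starts on its own line
)


def build_prompt(instruction: str) -> str:
    """Fill the Alpaca template with a single instruction."""
    return ALPACA_TEMPLATE.format(prompt=instruction.strip())


if __name__ == "__main__":
    print(build_prompt("Introduce yourself in character as a ship's engineer."))
```

The returned string is what you would pass to the text-completion endpoint of whatever backend you use.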

### Other Benchmarks

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|
| [OpenPipe/mistral-ft-optimized-1218](https://huggingface.co/OpenPipe/mistral-ft-optimized-1218) [📄](https://gist.github.com/mlabonne/36c412889c4acfad7061f269a31f9055) | 56.85 | 44.74 | 75.6 | 59.89 | 47.17 |
| [**Silicon-Maid-7B**](https://huggingface.co/SanjiWatsuki/Silicon-Maid-7B) [📄](https://gist.github.com/DHNishi/315ba1abba27af930f5f546af3515735) | **56.45** | 44.74 | 74.26 | 61.5 | 45.32 |
| [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) [📄](https://gist.github.com/mlabonne/14687f1eb3425b166db511f31f8e66f6) | 53.51 | 43.67 | 73.24 | 55.37 | 41.76 |
| [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) [📄](https://gist.github.com/mlabonne/88b21dd9698ffed75d6163ebdc2f6cc8) | 52.42 | 42.75 | 72.99 | 52.99 | 40.94 |
| [openchat/openchat_3.5](https://huggingface.co/openchat/openchat_3.5) [📄](https://gist.github.com/mlabonne/e23d7d8418619cf5b1ca10da391ac629) | 51.34 | 42.67 | 72.92 | 47.27 | 42.51 |
| [berkeley-nest/Starling-LM-7B-alpha](https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha) [📄](https://gist.github.com/mlabonne/c31cc46169ef3004c0df250017d5cac9) | 51.16 | 42.06 | 72.72 | 47.33 | 42.53 |
| [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) [📄](https://gist.github.com/mlabonne/32a36f448fd36a3100c325d51d01c0a1) | 50.99 | 37.33 | 71.83 | 55.1 | 39.7 |
53
assets/MinP.json
Normal file
@@ -0,0 +1,53 @@
{
    "temp": 1,
    "temperature_last": false,
    "top_p": 1,
    "top_k": 0,
    "top_a": 0,
    "tfs": 1,
    "epsilon_cutoff": 0,
    "eta_cutoff": 0,
    "typical_p": 1,
    "min_p": 0.05,
    "rep_pen": 1.1,
    "rep_pen_range": 2048,
    "no_repeat_ngram_size": 0,
    "penalty_alpha": 0,
    "num_beams": 1,
    "length_penalty": 1,
    "min_length": 0,
    "encoder_rep_pen": 1,
    "freq_pen": 0,
    "presence_pen": 0,
    "do_sample": true,
    "early_stopping": false,
    "add_bos_token": true,
    "truncation_length": 2048,
    "ban_eos_token": false,
    "skip_special_tokens": true,
    "streaming": true,
    "mirostat_mode": 0,
    "mirostat_tau": 5,
    "mirostat_eta": 0.1,
    "guidance_scale": 1,
    "negative_prompt": "",
    "grammar_string": "",
    "banned_tokens": "",
    "ignore_eos_token_aphrodite": false,
    "spaces_between_special_tokens_aphrodite": true,
    "type": "koboldcpp",
    "legacy_api": false,
    "sampler_order": [
        6,
        0,
        1,
        3,
        4,
        2,
        5
    ],
    "n": 1,
    "rep_pen_size": 0,
    "genamt": 250,
    "max_length": 8192
}
3
assets/cybermaid.png
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d7c89905448cab47c99811a90cee343209894a85c89a968917cdc1750cacd2a3
size 1483296
BIN
assets/fig-silicon-loyal.png
Normal file
Binary file not shown.
Size: 309 KiB
26
config.json
Normal file
@@ -0,0 +1,26 @@
{
  "_name_or_path": "mistralai/Mistral-7B-v0.1",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.36.2",
  "use_cache": true,
  "vocab_size": 32000
}
3
model-00001-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a2259cefdd973f030eb6f8e9bbbceb722da78caf70dde9929b14011ce4479a80
size 9984924496
3
model-00002-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f6d8a712ed6699976831074a4aacff4f25725915d92fc1c5e92238c94ce8c16e
size 4498573536
1
model.safetensors.index.json
Normal file
File diff suppressed because one or more lines are too long
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
91122
tokenizer.json
Normal file
File diff suppressed because it is too large
BIN
tokenizer.model
(Stored with Git LFS)
Normal file
Binary file not shown.
42
tokenizer_config.json
Normal file
@@ -0,0 +1,42 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}