Initialize project; model provided by the ModelHub XC community

Model: cstr/Spaetzle-v60-7b
Source: Original Platform
ModelHub XC
2026-05-19 02:52:29 +08:00
commit 3124f06162
17 changed files with 91407 additions and 0 deletions

35
.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
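As a rough illustration of what these attribute rules do, the sketch below checks whether a file name would be routed through Git LFS. It is an approximation only: it uses Python's `fnmatch` on the base name, which mirrors gitattributes matching for slash-free patterns but does not handle the path-based `saved_model/**/*` rule. The helper name `tracked_by_lfs`, the pattern subset, and the `weights.safetensors` file name are assumptions made for this example.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the slash-free patterns listed in .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.model", "*.safetensors", "*.tar.*", "*.zip", "*tfevents*"]


def tracked_by_lfs(path, patterns=LFS_PATTERNS):
    """Approximate check: does the file's base name match any LFS pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# tokenizer.model and tokenizer.json appear in this commit;
# weights.safetensors is a hypothetical name.
for candidate in ["tokenizer.model", "tokenizer.json", "weights.safetensors"]:
    print(candidate, tracked_by_lfs(candidate))
```

This agrees with the commit listing, where `tokenizer.model` is shown as stored with Git LFS while `tokenizer.json` is a regular file.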

103
README.md Normal file

@@ -0,0 +1,103 @@
---
tags:
- merge
- mergekit
- lazymergekit
- abideen/AlphaMonarch-dora
base_model:
- abideen/AlphaMonarch-dora
license: cc-by-nc-4.0
language:
- de
- en
---
# Spaetzle-v60-7b
This is a progressive merge (mostly dare-ties, but also slerp, among others) intended as a suitable compromise for local English and German tasks.
Spaetzle-v60-7b is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [abideen/AlphaMonarch-dora](https://huggingface.co/abideen/AlphaMonarch-dora)
* [cstr/Spaetzle-v58-7b](https://huggingface.co/cstr/Spaetzle-v58-7b)
## Benchmarks
Performance looks okay so far. For example, on EQ-Bench (v2_de) the model scores 65.08 (parseable: 171.0).
From the [Occiglot Euro LLM Leaderboard](https://huggingface.co/spaces/occiglot/euro-llm-leaderboard):
| Model | DE | EN | ARC EN | TruthfulQA EN | Belebele EN | HellaSwag EN | MMLU EN | ARC DE | TruthfulQA DE | Belebele DE | HellaSwag DE | MMLU DE |
|--------------------------------------------------------|-------|-------|--------|---------------|-------------|--------------|---------|--------|---------------|-------------|--------------|---------|
| mistral-community/Mixtral-8x22B-v0.1 | 66.81 | 72.87 | 70.56 | 52.29 | 93.89 | 70.41 | 77.17 | 63.9 | 29.31 | 92.44 | 77.9 | 70.49 |
| **cstr/Spaetzle-v60-7b** | 60.95 | 71.65 | 69.88 | 66.24 | 90.11 | 68.43 | 63.59 | 58 | 37.31 | 84.22 | 70.09 | 55.11 |
| VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct | 60.07 | 74.71 | 74.49 | 66.19 | 91.67 | 74.55 | 66.65 | 59.37 | 29.57 | 88.56 | 66.43 | 56.44 |
| occiglot/occiglot-7b-de-en-instruct | 56.65 | 61.7 | 60.41 | 49.38 | 81.22 | 60.43 | 57.06 | 54.49 | 31.09 | 77.22 | 68.84 | 51.59 |
| occiglot/occiglot-7b-de-en | 54.01 | 58.78 | 55.63 | 42.33 | 79.11 | 59.99 | 56.84 | 50.56 | 26.27 | 74.33 | 67.42 | 51.46 |
| meta-llama/Meta-Llama-3-8B | 53.89 | 63.08 | 58.02 | 43.87 | 86.44 | 61.75 | 65.3 | 46.45 | 24.24 | 81.11 | 62.48 | 55.18 |
| mistralai/Mistral-7B-Instruct-v0.2 | 53.52 | 67.63 | 63.74 | 66.81 | 82.44 | 65.96 | 59.2 | 48.59 | 37.69 | 68.89 | 62.24 | 50.2 |
| occiglot/occiglot-7b-eu5-instruct | 53.15 | 57.78 | 55.89 | 44.9 | 74.67 | 59.92 | 53.51 | 52.95 | 28.68 | 66.78 | 68.52 | 48.82 |
| clibrain/lince-mistral-7b-it-es | 52.98 | 62.43 | 62.46 | 43.32 | 82.44 | 63.86 | 60.06 | 49.44 | 28.17 | 75 | 61.64 | 50.64 |
| mistralai/Mistral-7B-v0.1 | 52.8 | 62.73 | 61.26 | 42.62 | 84.44 | 62.89 | 62.46 | 47.65 | 28.43 | 73.89 | 61.06 | 52.96 |
| LeoLM/leo-mistral-hessianai-7b | 51.78 | 56.11 | 52.22 | 42.92 | 73.67 | 57.86 | 53.88 | 47.48 | 25.25 | 69.11 | 68.21 | 48.83 |
For the int4-inc quantized version, from the [Low-bit Quantized Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard):
| Type | Model | Average ⬆️ | ARC-c | ARC-e | Boolq | HellaSwag | Lambada | MMLU | Openbookqa | Piqa | Truthfulqa | Winogrande | #Params (B) | #Size (G) |
|------|-------------------------------------------|------------|-------|-------|-------|-----------|---------|-------|------------|-------|------------|------------|-------------|-----------|
| 🍒 | Intel/SOLAR-10.7B-Instruct-v1.0-int4-inc | 68.49 | 60.49 | 82.66 | 88.29 | 68.29 | 73.36 | 62.43 | 35.6 | 80.74 | 56.06 | 76.95 | 10.57 | 5.98 |
| 🍒 | **cstr/Spaetzle-v60-7b-int4-inc** | **68.01** | **62.12** | **85.27** | **87.34** | **66.43** | **70.58** | **61.39** | **37** | **82.26** | **50.18** | **77.51** | **7.04** | **4.16** |
| 🔷 | TheBloke/SOLAR-10.7B-Instruct-v1.0-GGUF | 66.6 | 60.41 | 83.38 | 88.29 | 67.73 | 52.42 | 62.04 | 37.2 | 82.32 | 56.3 | 75.93 | 10.73 | 6.07 |
| 🔷 | cstr/Spaetzle-v60-7b-Q4_0-GGUF | 66.44 | 61.35 | 85.19 | 87.98 | 66.54 | 52.78 | 62.05 | 40.6 | 81.72 | 47 | 79.16 | 7.24 | 4.11 |
| 🍒 | Intel/Mistral-7B-Instruct-v0.2-int4-inc | 65.73 | 55.38 | 81.44 | 85.26 | 65.67 | 70.89 | 58.66 | 34.2 | 80.74 | 51.16 | 73.95 | 7.04 | 4.16 |
| 🍒 | Intel/Phi-3-mini-4k-instruct-int4-inc | 65.09 | 57.08 | 83.33 | 86.18 | 59.45 | 68.14 | 66.62 | 38.6 | 79.33 | 38.68 | 73.48 | 3.66 | 2.28 |
| 🔷 | TheBloke/Mistral-7B-Instruct-v0.2-GGUF | 63.52 | 53.5 | 77.9 | 85.44 | 66.9 | 50.11 | 58.45 | 38.8 | 77.58 | 53.12 | 73.4 | 7.24 | 4.11 |
| 🍒 | Intel/Meta-Llama-3-8B-Instruct-int4-inc | 62.93 | 51.88 | 81.1 | 83.21 | 57.09 | 71.32 | 62.41 | 35.2 | 78.62 | 36.35 | 72.14 | 7.2 | 5.4 |
Contamination check results (reference model: Mistral instruct 7b v0.1):
- MMLU: result < 0.1, %: 0.19
- TruthfulQA: result < 0.1, %: 0.34
- GSM8k: result < 0.1, %: 0.39
## 🧩 Configuration
```yaml
models:
  - model: cstr/Spaetzle-v58-7b
    # no parameters necessary for base model
  - model: abideen/AlphaMonarch-dora
    parameters:
      density: 0.60
      weight: 0.30
merge_method: dare_ties
base_model: cstr/Spaetzle-v58-7b
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```
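To give an intuition for what `density` and `weight` mean in the configuration above, here is a minimal sketch of the DARE idea on a toy weight vector: the difference between the fine-tuned and base weights is randomly sparsified, the surviving entries are rescaled by `1 / density`, and the result is added back to the base scaled by `weight`. This is not mergekit's implementation. It leaves out the TIES sign-consensus step and the `int8_mask` option, and the function name `dare_merge` and the toy vectors are assumptions made for illustration.

```python
import random


def dare_merge(base, tuned, density, weight, seed=0):
    """Toy DARE-style merge of one fine-tuned vector into a base vector."""
    rng = random.Random(seed)
    merged = []
    for b, t in zip(base, tuned):
        delta = t - b  # task-vector entry: fine-tuned minus base
        keep = rng.random() < density  # keep each entry with probability `density`
        # Rescale kept entries by 1/density so the expected delta is unchanged.
        sparse_delta = delta / density if keep else 0.0
        merged.append(b + weight * sparse_delta)
    return merged


# Toy stand-ins for the base (Spaetzle-v58-7b) and AlphaMonarch-dora weights.
base = [0.0, 1.0, -2.0, 0.5]
tuned = [0.2, 0.8, -1.0, 0.5]
print(dare_merge(base, tuned, density=0.60, weight=0.30))
```

With `density=1.0` nothing is dropped and the sketch reduces to plain task arithmetic, `base + weight * (tuned - base)`.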
## 💻 Usage
```python
# Install dependencies first:
#   pip install -qU transformers accelerate
# (in a notebook, prefix the command with "!")

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/Spaetzle-v60-7b"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build the prompt string from the tokenizer's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and let accelerate place it on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a completion and print the prompt plus the generated text.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

26
config.json Normal file

@@ -0,0 +1,26 @@
{
"_name_or_path": "cstr/Spaetzle-v58-7b",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 10000.0,
"sliding_window": 4096,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.37.0",
"use_cache": true,
"vocab_size": 32000
}
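The values in `config.json` are enough to estimate the model's parameter count. The sketch below assumes the usual Mistral layout (bias-free attention and MLP projections, two RMSNorm weights per layer plus a final norm, and `head_dim = hidden_size / num_attention_heads`); none of this is stated in the commit itself, and the helper name `mistral_param_count` is hypothetical.

```python
def mistral_param_count(cfg):
    """Estimate parameters from config values, assuming a standard Mistral layout."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]  # assumed, not stored in the config
    kv_dim = cfg["num_key_value_heads"] * head_dim  # grouped-query attention

    attention = 2 * h * h + 2 * h * kv_dim  # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * h * cfg["intermediate_size"]  # gate, up and down projections
    norms = 2 * h  # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


config = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
    "tie_word_embeddings": False,
    "vocab_size": 32000,
}

params = mistral_param_count(config)
print(params)      # 7241732096
print(params * 2)  # bytes at 2 bytes per bfloat16 weight: 14483464192
```

Under these assumptions the model has about 7.24 billion parameters, or roughly 14.48 GB in `bfloat16`. The eight unnamed Git LFS pointers in this commit add up to 14,483,497,968 bytes, which is close to that estimate.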

16
mergekit_config.yml Normal file

@@ -0,0 +1,16 @@
models:
  - model: cstr/Spaetzle-v58-7b
    # no parameters necessary for base model
  - model: abideen/AlphaMonarch-dora
    parameters:
      density: 0.60
      weight: 0.30
merge_method: dare_ties
base_model: cstr/Spaetzle-v58-7b
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:00bb09a59f705fa488a665dbae72d782d0da4431c1274a21bddd6685ec62c3db
size 1946210264


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a9fb9c12f1377e9a1c1b83975db54cbe4f73d1ba9c0415d167a51aa00a8d1711
size 1979798496


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:82a36c9399983a379aff1c564b38ebb6e862fa82896cc572be942cb6d8c8c597
size 1889587048


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f169919682dbdb2a158a64855d39515556bc3bc021a306c6f7b188bea552309c
size 1904283640


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:32a79f305c23a60a8ad08e6a0e10bc343112fdb91458a348a2769991f795e79c
size 1946260600


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:68dbc08af77feea70c3b63e264c0f2e24d25b9be244a197c425c4cbd9d1deb50
size 1965118080


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:db240239ae2bbbeec5f8afcdeccfc408e14acaf6e0c2c18db7274a2de1c77466
size 1979789752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d27092507427a550e5b63d70def236ca18836de055d1d30d2dac4ef4488cad77
size 872450088
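Each of the three-line blocks above is a Git LFS pointer: a small text file that stands in for a large binary and records the spec version, the SHA-256 object ID, and the size in bytes. A minimal sketch of reading such a pointer and totalling the sizes listed in this commit follows; the function name `parse_lfs_pointer` is an assumption made for this example.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# First pointer from this commit, copied verbatim.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:00bb09a59f705fa488a665dbae72d782d0da4431c1274a21bddd6685ec62c3db
size 1946210264
"""
info = parse_lfs_pointer(pointer)
print(info["algorithm"], info["size"])

# Sizes of all eight pointers shown above, in order.
sizes = [
    1946210264, 1979798496, 1889587048, 1904283640,
    1946260600, 1965118080, 1979789752, 872450088,
]
total_bytes = sum(sizes)
print(total_bytes)  # 14483497968
```

The total is about 14.48 GB; the actual file contents are fetched by Git LFS when the repository is cloned with LFS enabled.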

File diff suppressed because one or more lines are too long

28
special_tokens_map.json Normal file

@@ -0,0 +1,28 @@
{
"additional_special_tokens": [
"<unk>",
"<s>",
"</s>"
],
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

91122
tokenizer.json Normal file

File diff suppressed because it is too large

BIN
tokenizer.model (Stored with Git LFS) Normal file

Binary file not shown.

49
tokenizer_config.json Normal file

@@ -0,0 +1,49 @@
{
"add_bos_token": true,
"add_eos_token": false,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [
"<unk>",
"<s>",
"</s>"
],
"bos_token": "<s>",
"chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content']}}{% if (loop.last and add_generation_prompt) or not loop.last %}{{ '<|im_end|>' + '\n'}}{% endif %}{% endfor %}\n{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{ '<|im_start|>assistant\n' }}{% endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"legacy": true,
"model_max_length": 1000000000000000019884624838656,
"pad_token": null,
"padding_side": "left",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"split_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": true
}
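The `chat_template` in `tokenizer_config.json` is a ChatML-style Jinja template. To make its behaviour easier to follow, the sketch below re-implements it by hand in plain Python. It assumes the template is rendered with Jinja's `trim_blocks` enabled, so that the literal newline after `{% endfor %}` is dropped; if it is rendered without that option, an extra newline would appear at that point. The function name `render_chatml` is hypothetical, and `tokenizer.apply_chat_template` remains the authoritative way to build prompts.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Hand-written equivalent of the chat_template above (see stated assumption)."""
    parts = []
    last_index = len(messages) - 1
    for index, message in enumerate(messages):
        parts.append("<|im_start|>" + message["role"] + "\n" + message["content"])
        # The template closes every turn except a final one that is left open
        # when no generation prompt is requested.
        if index != last_index or add_generation_prompt:
            parts.append("<|im_end|>" + "\n")
    # Assumption: the newline after {% endfor %} is removed by trim_blocks.
    if add_generation_prompt and messages[-1]["role"] != "assistant":
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = render_chatml(messages, add_generation_prompt=True)
print(prompt)
```

With `add_generation_prompt=True` the prompt ends in an open assistant turn, which is what the usage snippet in the README relies on.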