Initialize project; model provided by the ModelHub XC community

Model: cstr/Spaetzle-v8-7b
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-01 05:00:19 +08:00
commit 12b844504b
17 changed files with 91514 additions and 0 deletions

35
.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
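
As a rough illustration of what these attribute rules do, the sketch below checks whether a file name would be routed through Git LFS. It is an approximation under stated assumptions: it uses Python's `fnmatch` on the base name only, which agrees with Git's matching for simple patterns such as `*.safetensors` but does not handle path patterns like `saved_model/**/*`. The helper name `tracked_by_lfs` and the pattern subset are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns listed in .gitattributes (basename patterns only).
LFS_PATTERNS = ["*.bin", "*.model", "*.safetensors", "*.gz", "*.tar.*", "*tfevents*"]


def tracked_by_lfs(path, patterns=LFS_PATTERNS):
    """Return True if the file's base name matches one of the LFS patterns."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for candidate in ["tokenizer.model", "tokenizer.json", "config.json"]:
    print(candidate, tracked_by_lfs(candidate))
```

This is consistent with the commit itself, where `tokenizer.model` is stored with Git LFS while `tokenizer.json` is committed as a regular file.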

203
README.md Normal file

@@ -0,0 +1,203 @@
---
tags:
- merge
- mergekit
- lazymergekit
- flemmingmiguel/NeuDist-Ro-7B
- johannhartmann/Brezn3
- ResplendentAI/Flora_DPO_7B
base_model:
- flemmingmiguel/NeuDist-Ro-7B
- johannhartmann/Brezn3
- ResplendentAI/Flora_DPO_7B
language:
- de
- en
---
# Spaetzle-v8-7b
This model aims for adequate performance in German and English across a range of tasks while mostly behaving well: it should not ramble or intermix tokens from the different chat templates seen during training and adaptation.
It is mostly a quick test and is considerably weaker in German grammar and orthography than, e.g., DiscoLM. For use cases where that matters less than instruction following, reasoning, and similar skills, it may be slightly preferable.
It is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [flemmingmiguel/NeuDist-Ro-7B](https://huggingface.co/flemmingmiguel/NeuDist-Ro-7B)
* [johannhartmann/Brezn3](https://huggingface.co/johannhartmann/Brezn3)
* [ResplendentAI/Flora_DPO_7B](https://huggingface.co/ResplendentAI/Flora_DPO_7B)
* on the basis of [mayflowergmbh/Wiedervereinigung-7b-dpo-laser](https://huggingface.co/mayflowergmbh/Wiedervereinigung-7b-dpo-laser)
All credit is due to the creators of those original models and of the training datasets involved.
For a suitable quantized version, try [cstr/Spaetzle-v8-7b-GGUF](https://huggingface.co/cstr/Spaetzle-v8-7b-GGUF).
## Evaluation
[Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_cstr__Spaetzle-v8-7b)
| Metric |Value|
|---------------------------------|----:|
|Avg. |72.27|
|AI2 Reasoning Challenge (25-Shot)|68.69|
|HellaSwag (10-Shot) |86.68|
|MMLU (5-Shot) |64.60|
|TruthfulQA (0-shot) |64.05|
|Winogrande (5-shot) |81.45|
|GSM8k (5-shot) |68.16|
EQ-Bench (v2_de): 61.04 / English (v2): 78.3
[ScandEval](https://scandeval.com/german-nlg/) 12.5.2 scores
| Benchmark | Spaetzle-v8-7b Value |
|-----------------------|----------------------------------------------------|
| Model ID | cstr/Spaetzle-v8-7b (few-shot, val) |
| Parameters            | 7242 (millions)                                    |
| Vocabulary Size       | 32 (thousands)                                     |
| Context | 32768 |
| Commercial | False |
| Speed | 5,980 ± 1,031 / 1,714 ± 552 |
| Rank | 1.85 |
| GermEval | 58.90 ± 2.30 / 45.55 ± 3.30 |
| SB10k | 61.34 ± 1.90 / 72.98 ± 1.30 |
| ScaLA-De | 31.58 ± 4.39 / 65.51 ± 2.23 |
| GermanQuAD | 24.91 ± 3.98 / 60.88 ± 3.31 |
| MLSum | 67.25 ± 1.06 / 22.95 ± 2.64 |
| MMLU-De | 34.62 ± 2.20 / 50.43 ± 1.52 |
| HellaSwag-De | 48.70 ± 2.47 / 61.05 ± 1.79 |
| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[Spaetzle-v8-7b](https://huggingface.co/cstr/Spaetzle-v8-7b)| 45.31| 75.69| 63.94| 45.57| 57.63|
### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |25.59|± | 2.74|
| | |acc_norm|24.80|± | 2.72|
|agieval_logiqa_en | 0|acc |39.63|± | 1.92|
| | |acc_norm|39.78|± | 1.92|
|agieval_lsat_ar | 0|acc |23.48|± | 2.80|
| | |acc_norm|24.35|± | 2.84|
|agieval_lsat_lr | 0|acc |50.98|± | 2.22|
| | |acc_norm|51.96|± | 2.21|
|agieval_lsat_rc | 0|acc |62.08|± | 2.96|
| | |acc_norm|62.83|± | 2.95|
|agieval_sat_en | 0|acc |78.64|± | 2.86|
| | |acc_norm|79.13|± | 2.84|
|agieval_sat_en_without_passage| 0|acc |44.66|± | 3.47|
| | |acc_norm|44.66|± | 3.47|
|agieval_sat_math | 0|acc |37.27|± | 3.27|
| | |acc_norm|35.00|± | 3.22|
Average: 45.31%
### GPT4All
| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |63.14|± | 1.41|
| | |acc_norm|64.51|± | 1.40|
|arc_easy | 0|acc |85.98|± | 0.71|
| | |acc_norm|82.49|± | 0.78|
|boolq | 1|acc |88.10|± | 0.57|
|hellaswag | 0|acc |66.31|± | 0.47|
| | |acc_norm|85.17|± | 0.35|
|openbookqa | 0|acc |38.00|± | 2.17|
| | |acc_norm|47.20|± | 2.23|
|piqa | 0|acc |83.35|± | 0.87|
| | |acc_norm|84.17|± | 0.85|
|winogrande | 0|acc |78.22|± | 1.16|
Average: 75.69%
### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |47.74|± | 1.75|
| | |mc2 |63.94|± | 1.53|
Average: 63.94%
### Bigbench
| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|56.84|± | 3.60|
|bigbench_date_understanding | 0|multiple_choice_grade|66.12|± | 2.47|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|41.47|± | 3.07|
|bigbench_geometric_shapes | 0|multiple_choice_grade|22.01|± | 2.19|
| | |exact_str_match | 0.00|± | 0.00|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|31.40|± | 2.08|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|23.14|± | 1.60|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|56.00|± | 2.87|
|bigbench_movie_recommendation | 0|multiple_choice_grade|45.00|± | 2.23|
|bigbench_navigate | 0|multiple_choice_grade|50.70|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|70.05|± | 1.02|
|bigbench_ruin_names | 0|multiple_choice_grade|45.54|± | 2.36|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|26.05|± | 1.39|
|bigbench_snarks | 0|multiple_choice_grade|71.82|± | 3.35|
|bigbench_sports_understanding | 0|multiple_choice_grade|72.92|± | 1.42|
|bigbench_temporal_sequences | 0|multiple_choice_grade|44.20|± | 1.57|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.80|± | 1.19|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|18.23|± | 0.92|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|56.00|± | 2.87|
Average: 45.57%
Average score: 57.63%
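The reported averages appear to be plain arithmetic means. The sketch below recomputes three of them from the values in the tables above. That the AGIEval figure averages the `acc_norm` rows is an inference from the numbers, not something the card states; the variable names are illustrative only.

```python
from statistics import mean

# acc_norm values copied from the AGIEval table.
agieval_acc_norm = [24.80, 39.78, 24.35, 51.96, 62.83, 79.13, 44.66, 35.00]
agieval_avg = round(mean(agieval_acc_norm), 2)

# Per-suite averages as reported under each table.
suite_averages = {"AGIEval": 45.31, "GPT4All": 75.69, "TruthfulQA": 63.94, "Bigbench": 45.57}
overall_avg = round(mean(suite_averages.values()), 2)

# Open LLM Leaderboard scores: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8k.
leaderboard = [68.69, 86.68, 64.60, 64.05, 81.45, 68.16]
leaderboard_avg = round(mean(leaderboard), 2)

print(agieval_avg, overall_avg, leaderboard_avg)  # prints 45.31 57.63 72.27
```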
## 💻 Usage
```python
# In a notebook, install the dependencies first:
#   !pip install -qU transformers accelerate
from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/Spaetzle-v8-7b"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the conversation with the tokenizer's ChatML chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and let accelerate place it on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
## 🧩 Configuration
The model uses the ChatML prompt format and should work well with it, since most of the merged models saw ChatML templates during training.
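To make the prompt format concrete, here is a minimal pure-Python sketch that mirrors the `chat_template` shipped in `tokenizer_config.json`. The function name `render_chatml` is hypothetical, and the sketch assumes the newline between the two template blocks is trimmed during rendering, as is my understanding of how `transformers` configures Jinja. In practice, `tokenizer.apply_chat_template` should be preferred.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Build a ChatML prompt string from a list of {"role", "content"} dicts."""
    parts = []
    last = len(messages) - 1
    for i, message in enumerate(messages):
        parts.append("<|im_start|>" + message["role"] + "\n" + message["content"])
        # The template closes every turn except a final one that is left open.
        if i != last or add_generation_prompt:
            parts.append("<|im_end|>\n")
    # Open an assistant turn so the model continues from there.
    if add_generation_prompt and messages[-1]["role"] != "assistant":
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "What is a large language model?"}])
print(prompt)
```

With `add_generation_prompt=False`, the last message is left without a closing `<|im_end|>`, which matches the template's condition on `loop.last`.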
```yaml
models:
  - model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
    # no parameters necessary for base model
  - model: flemmingmiguel/NeuDist-Ro-7B
    parameters:
      density: 0.60
      weight: 0.30
  - model: johannhartmann/Brezn3
    parameters:
      density: 0.65
      weight: 0.40
  - model: ResplendentAI/Flora_DPO_7B
    parameters:
      density: 0.6
      weight: 0.3
merge_method: dare_ties
base_model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```
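For intuition about what `density` and `weight` control under `merge_method: dare_ties`, the following toy sketch applies the idea to flat lists of floats. It is a simplified illustration, not mergekit's implementation: it works on plain Python lists instead of tensors, skips weight normalization and `int8_mask`, and the function names and toy values are hypothetical. Each fine-tuned model's difference from the base is randomly sparsified (each entry kept with probability `density` and rescaled by `1/density`), scaled by `weight`, and then only contributions agreeing with the elected sign are added back to the base.

```python
import random


def dare_prune(delta, density, rng):
    """DARE step: keep each entry with probability `density`, rescale survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties_merge(base, finetuned, densities, weights, seed=0):
    """Merge flat parameter lists; a toy stand-in for real tensors."""
    rng = random.Random(seed)
    weighted = []
    for model, density, weight in zip(finetuned, densities, weights):
        delta = [f - b for f, b in zip(model, base)]  # task vector relative to the base
        weighted.append([weight * d for d in dare_prune(delta, density, rng)])

    merged = []
    for i, b in enumerate(base):
        column = [w[i] for w in weighted]
        sign = 1.0 if sum(column) >= 0 else -1.0  # TIES-style sign election
        merged.append(b + sum(c for c in column if c * sign > 0))
    return merged


# Toy parameter values (hypothetical); densities and weights are taken from the config above.
base = [0.0, 1.0, -1.0, 0.5]
finetuned = [
    [0.2, 1.1, -1.3, 0.5],
    [0.1, 0.8, -1.2, 0.9],
    [-0.3, 1.2, -0.9, 0.6],
]
merged = dare_ties_merge(base, finetuned, densities=[0.60, 0.65, 0.6], weights=[0.30, 0.40, 0.3])
print(merged)
```

Fixing the seed makes the random drop mask reproducible, which is the role `random_seed: 0` plays in the config.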

26
config.json Normal file

@@ -0,0 +1,26 @@
{
  "_name_or_path": "mayflowergmbh/Wiedervereinigung-7b-dpo-laser",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.37.0",
  "use_cache": true,
  "vocab_size": 32000
}
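
As a sanity check on these values, the sketch below estimates the parameter count implied by the config. It assumes the usual Mistral layout (bias-free linear layers, grouped-query attention, a gated MLP with three projections, and RMSNorm weight vectors); the function name `count_parameters` is hypothetical.

```python
config = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
    "vocab_size": 32000,
    "tie_word_embeddings": False,
}


def count_parameters(cfg):
    """Estimate the parameter count of a Mistral-style decoder from its config."""
    hidden = cfg["hidden_size"]
    head_dim = hidden // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q/o plus k/v projections
    mlp = 3 * hidden * cfg["intermediate_size"]            # gate, up and down projections
    norms = 2 * hidden                                     # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * hidden
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings
    final_norm = hidden
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


total = count_parameters(config)
print(total)  # prints 7241732096
```

Rounded to millions, this gives 7242, in line with the `Parameters` row of the ScandEval table in the README.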

23
mergekit_config.yml Normal file
View File

@@ -0,0 +1,23 @@
models:
  - model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
    # no parameters necessary for base model
  - model: flemmingmiguel/NeuDist-Ro-7B
    parameters:
      density: 0.60
      weight: 0.30
  - model: johannhartmann/Brezn3
    parameters:
      density: 0.65
      weight: 0.40
  - model: ResplendentAI/Flora_DPO_7B
    parameters:
      density: 0.6
      weight: 0.3
merge_method: dare_ties
base_model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c5d4d3937231c342edde1710c7315acdc1a9d1b12c9a72eb4bd45bdc168678dc
size 1889595352


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0466ba67e16488e292d54947230406640bd4719b31ec4835a9080455450e7873
size 1979781416


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e3102cd15ad08b1f7b4dd136c5ad7ed0cb5a227fc5cfa9cf5428b026c421d06e
size 1988195080


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0f49c98170544f1b848b18ce1da1d42f9d514197a72262a84bb2ddf80a1cdcc9
size 1937846944


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1e78e4430ce967ddecbaf3807b501d47f201e6da4a13cd806445557a5d38aa98
size 1988178496


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1c5b18508beeeb5fc059a43a671ee5d1bde0b4ebddcf829afc3a133e5e4cefca
size 1998655576


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:53d6b7238441f21cfa95a165e53240fc4b78ee8f67c9e96b9625101b41187d44
size 1946243944


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7b1f801d8d47c409468bce6788d12eda08fe40f6b5b695b39b162b3cc7be426f
size 755001176
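
The eight entries above are Git LFS pointer files rather than the weights themselves. As a small sketch of how such a pointer can be read, the snippet below parses one and sums the `size` fields of all eight. The helper name `parse_lfs_pointer` is hypothetical; the sizes are copied from the pointers above.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:7b1f801d8d47c409468bce6788d12eda08fe40f6b5b695b39b162b3cc7be426f
size 755001176
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])

# Sizes in bytes of the eight pointer files, in the order they appear in the commit.
shard_sizes = [
    1889595352, 1979781416, 1988195080, 1937846944,
    1988178496, 1998655576, 1946243944, 755001176,
]
total_bytes = sum(shard_sizes)
print(total_bytes)  # prints 14483497984
```

The total of roughly 14.48 GB is about two bytes per parameter for the 7242 million parameters listed in the README, which fits the `bfloat16` dtype in `config.json`.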

File diff suppressed because one or more lines are too long

28
special_tokens_map.json Normal file

@@ -0,0 +1,28 @@
{
  "additional_special_tokens": [
    "<unk>",
    "<s>",
    "</s>"
  ],
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}

91122
tokenizer.json Normal file

File diff suppressed because it is too large

BIN
tokenizer.model (Stored with Git LFS) Normal file

Binary file not shown.

49
tokenizer_config.json Normal file

@@ -0,0 +1,49 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [
    "<unk>",
    "<s>",
    "</s>"
  ],
  "bos_token": "<s>",
  "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content']}}{% if (loop.last and add_generation_prompt) or not loop.last %}{{ '<|im_end|>' + '\n'}}{% endif %}{% endfor %}\n{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{ '<|im_start|>assistant\n' }}{% endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "padding_side": "left",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "split_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": true
}