Initialize project; model provided by the ModelHub XC community

Model: VAGOsolutions/SauerkrautLM-14b-MoE-LaserChat Source: Original Platform
2026-05-27 17:56:19 +08:00
commit 46161b3957
12 changed files with 91511 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
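As a rough illustration of what the `.gitattributes` rules above imply for this commit, the sketch below checks a few of the committed file names against a handful of the listed patterns. It uses Python's `fnmatch` on the base name, which is only an approximation of Git's own pattern matching (it does not handle path patterns such as `saved_model/**/*`); the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase

# A few of the patterns from the .gitattributes hunk above.
LFS_PATTERNS = ["*.bin", "*.model", "*.safetensors", "*.zip"]

# File names taken from this commit's file list.
FILES = [
    "config.json",
    "model-00001-of-00003.safetensors",
    "tokenizer.json",
    "tokenizer.model",
]


def is_lfs_tracked(name: str) -> bool:
    """Approximate check: does any LFS glob match the file's base name?"""
    base = name.rsplit("/", 1)[-1]
    return any(fnmatchcase(base, pattern) for pattern in LFS_PATTERNS)


lfs_files = [name for name in FILES if is_lfs_tracked(name)]
print(lfs_files)
```

Under this approximation the weight shards and `tokenizer.model` are stored through LFS, while the JSON files are committed as ordinary text.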
--- a/README.md
+++ b/README.md
@@ -0,0 +1,190 @@
+---
+license: apache-2.0
+language:
+- en
+- de
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- finetune
+- sft
+- dpo
+- laser
+- augmentation
+- german
+- english
+- moe
+---
+![SauerkrautLM](https://vago-solutions.ai/wp-content/uploads/2024/03/laserchatmoe.png "SauerkrautLM-14b-MoE-LaserChat")
+## VAGO solutions SauerkrautLM-14b-MoE-LaserChat
+Introducing **SauerkrautLM-14b-MoE-LaserChat** – our Sauerkraut (2x7b) 14b MoE version of the powerful [SauerkrautLM-7b-LaserChat](https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-LaserChat) and [yam-peleg/Experiment26-7B](https://huggingface.co/yam-peleg/Experiment26-7B)!
+
+By combining the two models, we were able to significantly increase both the German and English language skills.
+In addition, SauerkrautLM-7b-LaserChat acts as an adapter for Experiment26-7B, so Experiment26-7B benefits from the chat capabilities of SauerkrautLM-7b-LaserChat.
+At the same time, SauerkrautLM-7b-LaserChat benefits from the knowledge and creativity of Experiment26-7B.
+
+The model **SauerkrautLM-14b-MoE-LaserChat** is a **joint effort** between **VAGO solutions** and **Hyperspace.ai.** 
+Much appreciation goes to the tremendous research effort of **Fernando Fernandes Neto, David Golchinfar and Eric Hartford on their laserRMT approach.** 
+Without their independent research collaboration this model release would not have been possible. 
+
+
+# Table of Contents
+1. [Overview of all SauerkrautLM-14b-MoE-LaserChat models](#all-sauerkrautlm-14b-moe-laserchat-models)
+2. [Model Details](#model-details)
+   - [Prompt template](#prompt-template)
+3. [Evaluation](#evaluation)
+4. [Disclaimer](#disclaimer)
+5. [Contact](#contact)
+6. [Collaborations](#collaborations)
+7. [Acknowledgement](#acknowledgement)
+
+
+## All SauerkrautLM-14b-MoE-LaserChat Models
+
+| Model | HF    | GPTQ  | GGUF  | AWQ  |
+|-------|-------|-------|-------|-------|
+| SauerkrautLM-14b-MoE-LaserChat  | [Link](https://huggingface.co/VAGOsolutions/SauerkrautLM-14b-MoE-LaserChat) | coming soon | coming soon | coming soon |
+
+## Model Details
+**SauerkrautLM-14b-MoE-LaserChat**
+- **Model Type:** SauerkrautLM-14b-MoE-LaserChat is a MoE Model based on [SauerkrautLM-7b-LaserChat](https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-LaserChat) and [yam-peleg/Experiment26-7B](https://huggingface.co/yam-peleg/Experiment26-7B) 
+- **Language(s):** German, English
+- **License:** Apache 2.0
+- **Contact:** [VAGO solutions](https://vago-solutions.ai), [Hyperspace.computer](https://hyperspace.computer/)
+
+
+We further improved this model's German language skills. Nevertheless, it may occasionally produce formulations that are not entirely correct.
+
+
+### Prompt Template:
+```
+GPT4 Correct User: Hallo, wie geht es dir?<|end_of_turn|>GPT4 Correct Assistant: Hallo! Ich bin ein künstliches Intelligenzsystem und habe keine persönlichen Gefühle oder körperliche Zustände. Wie kann ich Ihnen helfen?<|end_of_turn|>GPT4 Correct User: Ich benötige nur einen kurzen Satz, den ich in das Prompt Template veröffentlichen kann.<|end_of_turn|>GPT4 Correct Assistant:
+
+
+```
+
+
+```
+GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hello! How can I help you today? If you have any questions or need assistance, feel free to ask.<|end_of_turn|>GPT4 Correct User: I just need a short sentence to post in the prompt template.<|end_of_turn|>GPT4 Correct Assistant:
+
+```
+
+
+## Evaluation
+
+**Open LLM Leaderboard:**
+
+Benchmarked with lm-evaluation-harness 0.4.1.
+
+| Metric                | Value                     |
+|-----------------------|---------------------------|
+| Avg.                  | 71.65 |
+| ARC (25-shot)         | 68.09         |
+| HellaSwag (10-shot)   | 84.78  |
+| MMLU (5-shot)         | 63.59|
+| TruthfulQA (0-shot)   | 58.57 |
+| Winogrande (5-shot)   | 80.74  |
+| GSM8K (5-shot)        | 74.15        |
+
+**Performance**
+
+|                                 Model                                 |AGIEval|GPT4All|TruthfulQA|BigBench|Average ⬇️|
+|-----------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
+|[VAGOsolutions/SauerkrautLM-14b-MoE-LaserChat](https://huggingface.co/VAGOsolutions/SauerkrautLM-14b-MoE-LaserChat)  |  44.38|  74.76|     58.57|   47.98|  56.42|
+|[VAGOsolutions/SauerkrautLM-Gemma-7b](https://huggingface.co/VAGOsolutions/SauerkrautLM-Gemma-7b)  |  37.50|  72.46|     61.24|   45.33|  54.13|
+|[zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)  |  37.52|  71.77|     55.26|   39.77|  51.08|
+|[zephyr-7b-gemma-v0.1](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1)|  34.22|  66.37|     52.19|   37.10|  47.47|
+|[google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it)        |  21.33|  40.84|     41.70|   30.25|  33.53|
+
+
+<details><summary>Details of AGIEval, GPT4All, TruthfulQA, BigBench </summary>
+
+**AGIEval** 
+|            Tasks             |Version|Filter|n-shot| Metric |Value |   |Stderr|
+|------------------------------|------:|------|------|--------|-----:|---|-----:|
+|agieval_sat_math              |      1|none  |None  |acc     |0.3727|±  |0.0327|
+|                              |       |none  |None  |acc_norm|0.3045|±  |0.0311|
+|agieval_sat_en_without_passage|      1|none  |None  |acc     |0.4806|±  |0.0349|
+|                              |       |none  |None  |acc_norm|0.4612|±  |0.0348|
+|agieval_sat_en                |      1|none  |None  |acc     |0.7816|±  |0.0289|
+|                              |       |none  |None  |acc_norm|0.7621|±  |0.0297|
+|agieval_lsat_rc               |      1|none  |None  |acc     |0.6134|±  |0.0297|
+|                              |       |none  |None  |acc_norm|0.6059|±  |0.0298|
+|agieval_lsat_lr               |      1|none  |None  |acc     |0.5431|±  |0.0221|
+|                              |       |none  |None  |acc_norm|0.5216|±  |0.0221|
+|agieval_lsat_ar               |      1|none  |None  |acc     |0.2435|±  |0.0284|
+|                              |       |none  |None  |acc_norm|0.2174|±  |0.0273|
+|agieval_logiqa_en             |      1|none  |None  |acc     |0.3871|±  |0.0191|
+|                              |       |none  |None  |acc_norm|0.4101|±  |0.0193|
+|agieval_aqua_rat              |      1|none  |None  |acc     |0.3031|±  |0.0289|
+|                              |       |none  |None  |acc_norm|0.2677|±  |0.0278|
+
+Average: 44.38%
+
+**GPT4All**
+|  Tasks  |Version|Filter|n-shot| Metric |Value |   |Stderr|
+|---------|------:|------|------|--------|-----:|---|-----:|
+|arc_challenge|      1|none  |None  |acc     |0.5947|±  |0.0143|
+|             |       |none  |None  |acc_norm|0.6280|±  |0.0141|
+|arc_easy     |      1|none  |None  |acc     |0.8506|±  |0.0073|
+|             |       |none  |None  |acc_norm|0.8468|±  |0.0074|
+|boolq        |      2|none  |None  |acc     |0.8761|±  |0.0058|
+|hellaswag    |      1|none  |None  |acc     |0.6309|±  |0.0048|
+|             |       |none  |None  |acc_norm|0.8323|±  |0.0037|
+|openbookqa   |      1|none  |None  |acc     |0.3260|±  |0.0210|
+|             |       |none  |None  |acc_norm|0.4700|±  |0.0223|
+|piqa         |      1|none  |None  |acc     |0.8237|±  |0.0089|
+|             |       |none  |None  |acc_norm|0.8335|±  |0.0087|
+|winogrande   |      1|none  |None  |acc     |0.7466|±  |0.0122|
+
+Average: 74.76%
+
+**TruthfulQA**
+|    Tasks     |Version|Filter|n-shot|Metric|Value |   |Stderr|
+|--------------|------:|------|-----:|------|-----:|---|-----:|
+|truthfulqa_mc2|      2|none  |     0|acc   |0.5857|±  |0.0141|
+
+
+Average: 58.57%
+
+**Bigbench**
+|                       Tasks                        |Version|     Filter     |n-shot|  Metric   |Value |   |Stderr|
+|----------------------------------------------------|------:|----------------|-----:|-----------|-----:|---|-----:|
+|bbh_zeroshot_tracking_shuffled_objects_three_objects|      2|flexible-extract|     0|exact_match|0.3120|±  |0.0294|
+|bbh_zeroshot_tracking_shuffled_objects_seven_objects|      2|flexible-extract|     0|exact_match|0.1560|±  |0.0230|
+|bbh_zeroshot_tracking_shuffled_objects_five_objects |      2|flexible-extract|     0|exact_match|0.1720|±  |0.0239|
+|bbh_zeroshot_temporal_sequences                     |      2|flexible-extract|     0|exact_match|0.3960|±  |0.0310|
+|bbh_zeroshot_sports_understanding                   |      2|flexible-extract|     0|exact_match|0.8120|±  |0.0248|
+|bbh_zeroshot_snarks                                 |      2|flexible-extract|     0|exact_match|0.5843|±  |0.0370|
+|bbh_zeroshot_salient_translation_error_detection    |      2|flexible-extract|     0|exact_match|0.4640|±  |0.0316|
+|bbh_zeroshot_ruin_names                             |      2|flexible-extract|     0|exact_match|0.4360|±  |0.0314|
+|bbh_zeroshot_reasoning_about_colored_objects        |      2|flexible-extract|     0|exact_match|0.5520|±  |0.0315|
+|bbh_zeroshot_navigate                               |      2|flexible-extract|     0|exact_match|0.5800|±  |0.0313|
+|bbh_zeroshot_movie_recommendation                   |      2|flexible-extract|     0|exact_match|0.7320|±  |0.0281|
+|bbh_zeroshot_logical_deduction_three_objects        |      2|flexible-extract|     0|exact_match|0.5680|±  |0.0314|
+|bbh_zeroshot_logical_deduction_seven_objects        |      2|flexible-extract|     0|exact_match|0.3920|±  |0.0309|
+|bbh_zeroshot_logical_deduction_five_objects         |      2|flexible-extract|     0|exact_match|0.3960|±  |0.0310|
+|bbh_zeroshot_geometric_shapes                       |      2|flexible-extract|     0|exact_match|0.3800|±  |0.0308|
+|bbh_zeroshot_disambiguation_qa                      |      2|flexible-extract|     0|exact_match|0.6760|±  |0.0297|
+|bbh_zeroshot_date_understanding                     |      2|flexible-extract|     0|exact_match|0.4400|±  |0.0315|
+|bbh_zeroshot_causal_judgement                       |      2|flexible-extract|     0|exact_match|0.5882|±  |0.0361|
+
+Average: 47.98%
+
+</details>
+
+
+
+## Disclaimer
+We must inform users that, despite our best efforts in data cleansing, the possibility of uncensored content slipping through cannot be entirely ruled out.
+We therefore cannot guarantee consistently appropriate behavior. If you encounter any issues or come across inappropriate content, we kindly request that you inform us through the contact information provided.
+Additionally, it is essential to understand that the licensing of these models does not constitute legal advice. We are not responsible for the actions of third parties who use our models.
+ 
+## Contact
+If you are interested in customized LLMs for business applications, please get in contact with us via our websites. We are also grateful for your feedback and suggestions.
+ 
+## Collaborations
+We are also keenly seeking support and investment for our startups, VAGO solutions and Hyperspace, where we continuously advance the development of robust language models designed to address a diverse range of purposes and requirements. If the prospect of collaboratively navigating future challenges excites you, we warmly invite you to reach out to us at [VAGO solutions](https://vago-solutions.de/#Kontakt) or [Hyperspace.computer](https://hyperspace.computer/).
+
+## Acknowledgement
+Many thanks to [yam-peleg](https://huggingface.co/yam-peleg) for providing such a valuable model to the open-source community.
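The averages reported in the Evaluation section of the README above can be cross-checked with a few lines of Python. The sketch below copies the published scores verbatim; the assumption that the AGIEval figure is the mean of the `acc_norm` rows is an inference from the numbers, not something the README states.

```python
from statistics import mean

# Open LLM Leaderboard scores as listed in the README.
leaderboard = {
    "ARC": 68.09,
    "HellaSwag": 84.78,
    "MMLU": 63.59,
    "TruthfulQA": 58.57,
    "Winogrande": 80.74,
    "GSM8K": 74.15,
}

# Per-suite scores from the "Performance" table.
suites = {"AGIEval": 44.38, "GPT4All": 74.76, "TruthfulQA": 58.57, "BigBench": 47.98}

# acc_norm rows of the AGIEval details table (assumed basis of its average).
agieval_acc_norm = [0.3045, 0.4612, 0.7621, 0.6059, 0.5216, 0.2174, 0.4101, 0.2677]

leaderboard_avg = round(mean(leaderboard.values()), 2)
suite_avg = round(mean(suites.values()), 2)
agieval_avg = round(100 * mean(agieval_acc_norm), 2)

print(leaderboard_avg, suite_avg, agieval_avg)
```

All three values agree with the figures in the tables (71.65, 56.42 and 44.38).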
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,4 @@
+{
+  "<|end_of_turn|>": 32000,
+  "<|pad_0|>": 32001
+}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,30 @@
+{
+  "_name_or_path": "VAGOsolutions/SauerkrautLM-14b-MoE-LaserChat",
+  "architectures": [
+    "MixtralForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 1,
+  "eos_token_id": 32000,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 14336,
+  "max_position_embeddings": 8192,
+  "model_type": "mixtral",
+  "num_attention_heads": 32,
+  "num_experts_per_tok": 2,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 8,
+  "num_local_experts": 2,
+  "output_router_logits": false,
+  "rms_norm_eps": 1e-05,
+  "rope_theta": 10000.0,
+  "router_aux_loss_coef": 0.001,
+  "sliding_window": null,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.38.2",
+  "use_cache": true,
+  "vocab_size": 32002
+}
--- a/model-00001-of-00003.safetensors
+++ b/model-00001-of-00003.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:19d3e427d999271a6d51723757532a86b3698e62503f0c0998929ec8fbd7f074
+size 9919846472
--- a/model-00002-of-00003.safetensors
+++ b/model-00002-of-00003.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d753c6451fb872cd990cd4115bb0f988a10ec6a69c93e3ecd6d6b014333c4056
+size 9982454736
--- a/model-00003-of-00003.safetensors
+++ b/model-00003-of-00003.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:21cc5b30900920363a0d5e2474abd6b686f774c4b0f8cc3ba74a55f751f69926
+size 5856061008
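The `config.json` values and the three LFS pointer sizes above can be checked against each other. The following is a back-of-the-envelope sketch that assumes the usual Mixtral weight layout (bias-free attention and expert projections, one router matrix and two RMSNorm weights per layer, one final norm, untied embedding and output matrices); the function name `estimate_params` is hypothetical.

```python
# Values copied from config.json.
HIDDEN = 4096
INTERMEDIATE = 14336
LAYERS = 32
HEADS = 32
KV_HEADS = 8
EXPERTS = 2
VOCAB = 32002

# Shard sizes in bytes, copied from the three LFS pointer files.
SHARD_BYTES = [9919846472, 9982454736, 5856061008]


def estimate_params() -> int:
    """Estimate the parameter count under the assumed Mixtral layout."""
    head_dim = HIDDEN // HEADS
    kv_dim = KV_HEADS * head_dim  # grouped-query attention: fewer K/V heads
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q, o and k, v
    experts = EXPERTS * 3 * HIDDEN * INTERMEDIATE  # three matrices per expert
    router = HIDDEN * EXPERTS
    norms = 2 * HIDDEN
    per_layer = attention + experts + router + norms
    embeddings = 2 * VOCAB * HIDDEN  # input embedding plus untied output head
    return LAYERS * per_layer + embeddings + HIDDEN  # plus final norm


total_params = estimate_params()
tensor_bytes = 2 * total_params  # bfloat16 stores two bytes per parameter
overhead_bytes = sum(SHARD_BYTES) - tensor_bytes

print(f"{total_params / 1e9:.2f}B parameters, {overhead_bytes} bytes of overhead")
```

Under these assumptions the estimate comes to roughly 12.9 billion parameters, and the shards exceed the raw tensor bytes by only a few tens of kilobytes, which is consistent with per-file metadata headers. The "14b" in the model name is therefore a nominal 2x7b figure, since the two experts share the attention and embedding weights.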
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,28 @@
+{
+  "additional_special_tokens": [
+    "<|end_of_turn|>",
+    "<|pad_0|>"
+  ],
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|end_of_turn|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<s>",
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer.model
+++ b/tokenizer.model
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,64 @@
+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32000": {
+      "content": "<|end_of_turn|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32001": {
+      "content": "<|pad_0|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|end_of_turn|>",
+    "<|pad_0|>"
+  ],
+  "bos_token": "<s>",
+  "chat_template": "{{ bos_token }}{% for message in messages %}{{ 'GPT4 Correct ' + message['role'].title() + ': ' + message['content'] + '<|end_of_turn|>'}}{% endfor %}{% if add_generation_prompt %}{{ 'GPT4 Correct Assistant:' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|end_of_turn|>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<s>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "trust_remote_code": false,
+  "unk_token": "<unk>",
+  "use_default_system_prompt": true,
+  "use_fast": true
+}
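The `chat_template` in `tokenizer_config.json` above is a Jinja expression. As a minimal sketch of what it produces, the plain-Python function below mirrors it step by step; the function name `render_chat` and the constants are illustrative and not part of the repository.

```python
BOS_TOKEN = "<s>"
END_OF_TURN = "<|end_of_turn|>"


def render_chat(messages, add_generation_prompt=True):
    """Mirror the Jinja chat_template: BOS, then one tagged segment per turn."""
    parts = [BOS_TOKEN]
    for message in messages:
        # The template title-cases the role: "user" becomes "User".
        role = message["role"].title()
        parts.append(f"GPT4 Correct {role}: {message['content']}{END_OF_TURN}")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("GPT4 Correct Assistant:")
    return "".join(parts)


prompt = render_chat([{"role": "user", "content": "Hello"}])
print(prompt)
```

Apart from the leading `<s>`, the result has the same shape as the prompt examples in the README. Generation is expected to stop at `<|end_of_turn|>`, which both `config.json` and the tokenizer files declare as the end-of-sequence token (id 32000).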