Initialize project; model provided by the ModelHub XC community

Model: Kquant03/Buttercup-4x7B-bf16 Source: Original Platform
2026-05-29 19:07:37 +08:00
commit 0b2f1de027
14 changed files with 91428 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,61 @@
+---
+license: apache-2.0
+language:
+- en
+tags:
+- moe
+- merge
+---
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6589d7e6586088fd2784a12c/qj_lv87mPw8t7KsNU07Uu.png)
+# "[We] are joined by the bonds of love. And you cannot track that, not with a thousand bloodhounds, and you cannot break it, not with a thousand swords."
+[GGUF FILES HERE](https://huggingface.co/Kquant03/Buttercup-4x7B-GGUF)
+
+[EXL2 QUANT (Thank you royallab!!!)](https://huggingface.co/royallab/Buttercup-4x7B-exl2)
+
+[Join our Discord!](https://discord.gg/ZgU79QDnE2)
+
+A frankenMoE that not only uses a far better methodology and a more fundamental understanding of SMoE, but is also completely focused on intellectual roleplay. It may have a bit of a repetition issue (although I have actually been playing with it on q8_k while the GGUF files upload, and it has nice variety). Just in case, keep things fresh by introducing new concepts often, or try [drμgs](https://github.com/EGjoni/DRUGS) (no, not that kind).
+
+The config looks like this (the detailed version is in Files and versions):
+- [mlabonne/Beagle14-7B](https://huggingface.co/mlabonne/Beagle14-7B) - base
+- [fblgit/una-cybertron-7b-v3-OMA](https://huggingface.co/fblgit/una-cybertron-7b-v3-OMA) - expert #1
+- [rwitz/go-bruins-v2](https://huggingface.co/rwitz/go-bruins-v2) - expert #2
+- [mlabonne/Beagle14-7B](https://huggingface.co/mlabonne/Beagle14-7B) - expert #3
+- [mlabonne/Beagle14-7B](https://huggingface.co/mlabonne/Beagle14-7B) - expert #4
+
+# Completely mogs Mixtral 8x7B Instruct v0.1 across multiple benchmarks at half the size
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6589d7e6586088fd2784a12c/GlhMcDiRhmUOsITmBplVT.png)
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6589d7e6586088fd2784a12c/cK0isGt1Nm2lEXZ9INrfu.png)
+# "[What is a Mixture of Experts (MoE)?](https://huggingface.co/blog/moe)"
+### (mostly from the Hugging Face MoE blog post; click the quoted question above to read it in full.)
+
+The scale of a model is one of the most important axes for better model quality. Given a fixed computing budget, training a larger model for fewer steps is better than training a smaller model for more steps.
+
+Mixture of Experts enables models to be pretrained with far less compute, which means you can dramatically scale up the model or dataset size with the same compute budget as a dense model. In particular, a MoE model should reach the same quality as its dense counterpart much faster during pretraining.
+
+So, what exactly is a MoE? In the context of transformer models, a MoE consists of two main elements:
+
+- **Sparse MoE layers** are used instead of dense feed-forward network (FFN) layers. MoE layers have a certain number of “experts” (e.g. 4 in this frankenMoE), where each expert is a neural network. In practice, the experts are FFNs, but they can also be more complex networks or even a MoE itself, leading to hierarchical MoEs!
+
+- **A gate network or router** that determines which tokens are sent to which expert. For example, in the image below, the token “More” is sent to the second expert, and the token “Parameters” is sent to the first network. A token can also be sent to more than one expert. How to route a token to an expert is one of the big decisions when working with MoEs: the router is composed of learned parameters and is pretrained at the same time as the rest of the network.
+
+At every layer, for every token, a router network chooses two of these groups (the “experts”) to process the token and combines their outputs additively.
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6589d7e6586088fd2784a12c/up_I0R2TQGjqTShZp_1Sz.png)
+
+*Switch Layer: the MoE layer from the [Switch Transformers paper](https://arxiv.org/abs/2101.03961)*
+
+
+So, to recap, in MoEs we replace every FFN layer of the transformer model with an MoE layer, which is composed of a gate network and a certain number of experts.
+
+Although MoEs provide benefits like efficient pretraining and faster inference compared to dense models, they also come with challenges:
+
+- **Training:** MoEs enable significantly more compute-efficient pretraining, but they’ve historically struggled to generalize during fine-tuning, leading to overfitting.
+- **Inference:** Although a MoE might have many parameters, only some of them are used during inference. This leads to much faster inference compared to a dense model with the same number of parameters. However, all parameters need to be loaded in RAM, so memory requirements are high. For example, [given a MoE like Mixtral 8x7B](https://huggingface.co/blog/moe), we’ll need enough VRAM to hold a dense 47B-parameter model. Why 47B parameters and not 8 x 7B = 56B? Because in MoE models only the FFN layers are treated as individual experts, and the rest of the model parameters are shared. At the same time, assuming just two experts are used per token, the inference speed (FLOPs) is like using a 12B model (as opposed to a 14B model), because it computes 2x7B matrix multiplications, but with some layers shared.
+
+If all tokens are sent to just a few popular experts, training becomes inefficient. In normal MoE training, the gating network tends to converge to mostly activating the same few experts. This self-reinforces, as favored experts are trained faster and hence selected more. To mitigate this, an auxiliary loss is added to encourage giving all experts equal importance, so that all experts receive a roughly equal number of training examples. A related concept, expert capacity, sets a threshold on how many tokens an expert can process. In transformers, Mixtral's auxiliary loss is returned as `aux_loss` when `output_router_logits` is enabled, and it is added to the training loss scaled by `router_aux_loss_coef` (both settings appear in this repo's config.json).
+
+
+## "Wait...but you called this a frankenMoE?"
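+The difference between a MoE and a "frankenMoE" is that in a model like the one in this repo, the router is not trained together with the experts. Instead, existing 7B models are combined as experts, and the router weights are initialized from hidden-state representations of the prompts in mergekit_moe_config.yml (`gate_mode: hidden`) rather than learned during pretraining.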
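The README's description of routing (a router picks two experts per token and adds their outputs) can be sketched in plain Python. This is a toy illustration, not the transformers implementation. `top2_moe_layer`, the 2-dimensional vectors and the "experts" that only scale their input are all hypothetical.

The two selected experts are weighted by a softmax over their router logits. That is mathematically identical to taking a softmax over all experts and then renormalizing the top two. As far as we know, the latter is how the Hugging Face Mixtral code does it.

```python
import math
from typing import Callable, Sequence

Vector = list[float]
Expert = Callable[[Vector], Vector]


def top2_moe_layer(
    x: Vector,
    router_weights: Sequence[Vector],
    experts: Sequence[Expert],
    k: int = 2,
) -> tuple[Vector, list[int]]:
    """Route one token's hidden state x through the k best-scoring experts."""
    # Router: a bias-free linear layer producing one logit per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the k highest-scoring experts for this token.
    chosen = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    # Softmax over the chosen logits (same as softmax over all, then renormalizing).
    top = max(logits[e] for e in chosen)
    scores = {e: math.exp(logits[e] - top) for e in chosen}
    total = sum(scores.values())
    # Only the chosen experts run; their weighted outputs are summed.
    out = [0.0] * len(x)
    for e in chosen:
        y = experts[e](x)
        out = [o + (scores[e] / total) * yi for o, yi in zip(out, y)]
    return out, chosen


def make_scaling_expert(scale: float) -> Expert:
    # Hypothetical stand-in for an FFN expert: multiplies its input by a constant.
    return lambda v: [scale * vi for vi in v]


experts = [make_scaling_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.5, 0.0], [2.0, 0.0], [-1.0, 0.0]]
output, chosen = top2_moe_layer([1.0, 0.0], router, experts)
print(chosen, output)
```

With these toy numbers the router logits are 1.0, 0.5, 2.0 and -1.0. The router therefore picks experts 2 and 0, and the other two experts are never evaluated. That skipped work is where the inference savings come from.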
--- a/config.json
+++ b/config.json
@@ -0,0 +1,30 @@
+{
+  "_name_or_path": "mlabonne/Beagle14-7B",
+  "architectures": [
+    "MixtralForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 14336,
+  "max_position_embeddings": 32768,
+  "model_type": "mixtral",
+  "num_attention_heads": 32,
+  "num_experts_per_tok": 2,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 8,
+  "num_local_experts": 4,
+  "output_router_logits": false,
+  "rms_norm_eps": 1e-05,
+  "rope_theta": 10000.0,
+  "router_aux_loss_coef": 0.001,
+  "sliding_window": null,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.37.0.dev0",
+  "use_cache": false,
+  "vocab_size": 32000
+}
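The README's arithmetic for Mixtral 8x7B (about 47B stored parameters, about 12B used per token) can be repeated for this repo from the config.json values above. The sketch assumes the standard Hugging Face Mixtral layout:

- bias-free q/k/v/o projections
- three weight matrices per expert
- a bias-free linear router in every layer
- two RMSNorms per layer plus a final one
- an untied LM head (`tie_word_embeddings: false`)

`mixtral_param_counts` is our own helper, not a library function.

```python
def mixtral_param_counts(cfg: dict) -> tuple[int, int]:
    """Return (stored, used_per_token) parameter counts for a Mixtral-style config.

    Assumes the standard Mixtral layout: bias-free attention projections,
    three matrices per expert, a bias-free router, RMSNorms, untied LM head.
    """
    h = cfg["hidden_size"]
    ffn = cfg["intermediate_size"]
    n_layers = cfg["num_hidden_layers"]
    n_experts = cfg["num_local_experts"]
    top_k = cfg["num_experts_per_tok"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim  # grouped-query attention
    vocab = cfg["vocab_size"]

    attn = 2 * h * h + 2 * h * kv_dim  # q_proj + o_proj, k_proj + v_proj
    expert = 3 * h * ffn               # w1, w2, w3 of one expert FFN
    router = h * n_experts             # per-layer gate
    norms = 2 * h                      # two RMSNorm weight vectors per layer
    shared_per_layer = attn + router + norms

    embed = vocab * h
    lm_head = 0 if cfg.get("tie_word_embeddings", False) else vocab * h
    outside_layers = embed + lm_head + h  # + final RMSNorm

    stored = n_layers * (shared_per_layer + n_experts * expert) + outside_layers
    used = n_layers * (shared_per_layer + top_k * expert) + outside_layers
    return stored, used


# Values copied from this repo's config.json.
CONFIG = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "num_local_experts": 4,
    "num_experts_per_tok": 2,
    "vocab_size": 32000,
    "tie_word_embeddings": False,
}

stored, used = mixtral_param_counts(CONFIG)
print(f"{stored:,} stored, {used:,} used per token")
# 24,153,690,112 stored, 12,879,400,960 used per token
```

Swapping in `"num_local_experts": 8` gives 46,702,792,704 stored parameters, in line with the ~47B the README quotes for Mixtral 8x7B. That supports the "half the size" claim: roughly 24B stored here versus 47B.

The "used per token" figure counts the embedding table, which is a lookup rather than a matrix multiply. It therefore slightly overstates the per-token compute.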
--- a/mergekit_moe_config.yml
+++ b/mergekit_moe_config.yml
@@ -0,0 +1,77 @@
+base_model: mlabonne/Beagle14-7B
+gate_mode: hidden
+dtype: bfloat16
+experts:
+  - source_model: fblgit/una-cybertron-7b-v3-OMA
+    positive_prompts:
+    - "Fantasy"
+    - "Exciting"
+    - "Interesting"
+    - "Setting"
+    - "Landscape"
+    - "Fantastic"
+    - "Magical"
+    - "Storywriting"
+    - "Roleplay"
+    - "Fictional"
+    negative_prompts:
+    - "Realistic"
+    - "Nonfiction"
+    - "Historical"
+    - "Fact"
+    - "Factual"
+  - source_model: rwitz/go-bruins-v2
+    positive_prompts:
+    - "Cock"
+    - "mouth"
+    - "lips"
+    - "softly"
+    - "orgasm"
+    - "cum"
+    - "anal"
+    - "gently"
+    - "intimate"
+    - "pussy"
+    - "closer"
+    negative_prompts:
+    - "reserved"
+    - "SFW"
+    - "Professional"
+    - "vague"
+    - "off-putting"
+    - "SFW"
+    - "appropriate"
+  - source_model: mlabonne/Beagle14-7B
+    positive_prompts:
+    - "Discuss"
+    - "Chat"
+    - "engaging"
+    - "stimulating"
+    - "intense"
+    - "information"
+    negative_prompts:
+    - "Sorry"
+    - "As an AI"
+    - "I cannot"
+    - "I am not capable"
+    - "this request"
+  - source_model: mlabonne/Beagle14-7B
+    positive_prompts:
+    - "accurate"
+    - "logical"
+    - "helpful"
+    - "descriptive"
+    - "intelligent"
+    - "precise"
+    - "answer"
+    - "science"
+    - "math"
+    - "calculate"
+    - "compute"
+    - "solve"
+    negative_prompts:
+    - "unhelpful"
+    - "inaccurate"
+    - "vague"
+    - "nondescript"
+    - "improper"
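With `gate_mode: hidden`, mergekit builds each layer's router from the base model's hidden states for each expert's positive and negative prompts, rather than training it. The sketch below is a simplified, hypothetical illustration of that idea on made-up 3-dimensional vectors. For each expert it:

1. averages the positive-prompt hidden states,
2. subtracts the average negative-prompt hidden state,
3. scales the result to unit length.

mergekit's exact averaging, normalization and per-layer handling may differ. `gate_row` and `route` are our names, not mergekit APIs.

```python
import math

Vector = list[float]


def mean(vectors: list[Vector]) -> Vector:
    # Element-wise average of equally sized vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def gate_row(positive: list[Vector], negative: list[Vector]) -> Vector:
    """Hypothetical router row for one expert, built from prompt hidden states."""
    row = mean(positive)
    if negative:
        # Move the row away from what the negative prompts look like.
        row = [p - n for p, n in zip(row, mean(negative))]
    # Assumed unit-length normalization so no expert wins purely by magnitude.
    norm = math.sqrt(sum(v * v for v in row)) or 1.0
    return [v / norm for v in row]


def route(hidden: Vector, rows: list[Vector]) -> list[float]:
    # Router logits: dot product of a token's hidden state with each expert's row.
    return [sum(h * r for h, r in zip(hidden, row)) for row in rows]


# Made-up hidden states standing in for prompts such as "Fantasy"/"Magical"
# (positive) vs. "Factual" (negative) for expert #1, and "Chat" for another expert.
fiction_row = gate_row(positive=[[1.0, 0.0, 0.0], [1.0, 0.2, 0.0]], negative=[[0.0, 1.0, 0.0]])
chat_row = gate_row(positive=[[0.0, 1.0, 0.0]], negative=[])
logits = route([0.9, 0.1, 0.0], [fiction_row, chat_row])
print(logits)
```

A token whose hidden state resembles the "fiction" prompts scores higher with that expert's row. This is the sense in which the prompt lists in this YAML steer routing without any router training.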
--- a/model-00001-of-00005.safetensors
+++ b/model-00001-of-00005.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e4150267fa29e35d7a82d2e96a30f979338c77e0695709365dedf7586e2adc57
+size 9919813704
--- a/model-00002-of-00005.safetensors
+++ b/model-00002-of-00005.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d4f5c35b18150b73b7f46c78c285a8530ae6e5632217908c1a16624eec1d4d82
+size 9982454720
--- a/model-00003-of-00005.safetensors
+++ b/model-00003-of-00005.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c837259049d83ed48ed6f4c3b4b0aa32fcf01e2077e06171723e7c1cd1bdc6a7
+size 9982454752
--- a/model-00004-of-00005.safetensors
+++ b/model-00004-of-00005.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:33bb9a366af5de043289356c9fd1d3d0b784ea990a67718729dd4b736897a0cb
+size 9982454720
--- a/model-00005-of-00005.safetensors
+++ b/model-00005-of-00005.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:75c1b0d4bb53c2e8b25a35f0fadea99914d711dab95bf3e9d6513a4ac90bcd88
+size 8440279464
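The Git LFS pointers above record each shard's size in bytes. config.json sets `torch_dtype: bfloat16`, which stores 2 bytes per parameter, so dividing the total by 2 gives a rough parameter count. The estimate runs slightly high because each safetensors file also carries a small JSON header.

```python
# Sizes copied from the five Git LFS pointer files in this commit.
SHARD_SIZES = {
    "model-00001-of-00005.safetensors": 9_919_813_704,
    "model-00002-of-00005.safetensors": 9_982_454_720,
    "model-00003-of-00005.safetensors": 9_982_454_752,
    "model-00004-of-00005.safetensors": 9_982_454_720,
    "model-00005-of-00005.safetensors": 8_440_279_464,
}
BYTES_PER_PARAM = 2  # bfloat16

total_bytes = sum(SHARD_SIZES.values())
approx_params = total_bytes // BYTES_PER_PARAM  # slight overestimate: headers included
print(f"{total_bytes:,} bytes ~ {total_bytes / 1e9:.1f} GB ~ {approx_params / 1e9:.2f}B parameters")
# 48,307,457,360 bytes ~ 48.3 GB ~ 24.15B parameters
```

Roughly 24B parameters is about half of the ~47B the README quotes for Mixtral 8x7B. That fits a 4-expert model whose attention layers are shared rather than duplicated.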
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,29 @@
+{
+  "additional_special_tokens": [
+    "<unk>",
+    "<s>",
+    "</s>"
+  ],
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<s>",
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer.model
+++ b/tokenizer.model
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,48 @@
+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<unk>",
+    "<s>",
+    "</s>"
+  ],
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<s>",
+  "padding_side": "left",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "split_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": true
+}
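tokenizer_config.json makes three relevant choices:

- `padding_side: left`
- `pad_token` reuses `<s>`, which is id 1 and also the BOS token
- `add_bos_token: true`

Because padding and BOS share an id, only the attention mask tells them apart in a batch. The plain-Python sketch below mimics left padding. `left_pad` is a hypothetical helper, and every token id other than 1 is made up rather than taken from the real vocabulary.

```python
PAD_ID = 1  # "<s>" per tokenizer_config.json; the same id is used for BOS


def left_pad(batch: list[list[int]], pad_id: int = PAD_ID) -> tuple[list[list[int]], list[list[int]]]:
    """Left-pad token id sequences to equal length and build attention masks."""
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = width - len(seq)
        # Padding goes on the left so the last real token sits in the final column.
        input_ids.append([pad_id] * n_pad + seq)
        # 0 marks padding, 1 marks real tokens (including the real BOS).
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# Each sequence starts with BOS (id 1); ids 101-103 and 201 are made up.
ids, mask = left_pad([[1, 101, 102, 103], [1, 201]])
print(ids)   # [[1, 101, 102, 103], [1, 1, 1, 201]]
print(mask)  # [[1, 1, 1, 1], [0, 0, 1, 1]]
```

In the second row, the three leading 1s are indistinguishable by id alone. The mask is what marks the first two as padding and the third as the real BOS. Left padding is the usual choice for generation because new tokens are appended right after each sequence's last real token.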