Initialize the project; model provided by the ModelHub XC community

Model: cloudyu/Mixtral_13B_Chat
Source: Original Platform
ModelHub XC
2026-04-13 03:40:57 +08:00
commit e25ffc6657
14 changed files with 91808 additions and 0 deletions

.gitattributes (vendored, new file, 35 lines)

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
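The attribute list above routes large binary formats through Git LFS. As a rough illustration only (this is not the git client's actual matcher — `fnmatch`'s `*` also crosses `/`, unlike git's pattern rules), a Python sketch that checks whether a filename would fall under a few of these patterns:

```
import fnmatch

# A subset of the LFS patterns from the .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.tar.*", "*tfevents*"]

def is_lfs_tracked(path: str) -> bool:
    """Return True if `path` matches any of the LFS-tracked patterns."""
    return any(fnmatch.fnmatch(path, pat) for pat in LFS_PATTERNS)

print(is_lfs_tracked("model-00001-of-00006.safetensors"))  # True
print(is_lfs_tracked("config.json"))                       # False
```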

README.md (new file, 77 lines)

@@ -0,0 +1,77 @@
---
license: mit
---
This is a fine-tuned 13B chat model (a two-expert Mixtral-style MoE).

Code example:
```
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "cloudyu/Mixtral_13B_Chat"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_default_system_prompt=False)

# load_in_4bit requires the bitsandbytes package and a CUDA GPU
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    local_files_only=False,
    load_in_4bit=True,
)
print(model)

prompt = input("please input prompt:")
while len(prompt) > 0:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
    generation_output = model.generate(
        input_ids=input_ids,
        max_new_tokens=800,
        repetition_penalty=1.2,
    )
    print(tokenizer.decode(generation_output[0]))
    prompt = input("please input prompt:")
```
Output examples:
```
<s> 12+13-24=?
To solve this equation, we need to follow the order of operations (PEMDAS/BODMAS): Parentheses, Exponents, Multiplication and Division (from left to right), Addition and Subtraction (from left to right).
Given expression: 12 + 13 - 24
First, let's add 12 and 13 together:
12 + 13 = 25
Now our expression looks like this: 25 - 24
Next, subtract 24 from 25:
25 - 24 = 1
So, the final answer is 1.</s>
```
```
<s> write a story about yosemite national park in 1000 words.The sun had just begun to rise over the horizon, casting its golden light upon Yosemite National Park's majestic granite cliffs and towering sequoias. As the first rays of sunlight touched the earth, they illuminated the misty waterfalls that cascaded down from high above, creating an ethereal scene straight out of a fairytale.
In this enchanting land, where nature reigned supreme, there lived a young girl named Ava who had come here seeking solace after losing her parents in a tragic accident. She found herself drawn to the beauty and serenity of Yosemite, which seemed to offer a sense of peace she couldnt find anywhere else.
Ava spent most days exploring the vast expanse of the park, hiking through meadows filled with wildflowers or scaling rock formations that offered breathtaking views of the valley below. At night, she would sit by the campfire, gazing up at the star-studded sky while listening to the soothing sounds of the river flowing nearby. It was during these moments that she felt closest to her parents, as if their spirits were guiding her on this journey towards healing.
One day, while wandering along one of the many trails leading into the heart of the park, Ava stumbled upon something unexpected a small wooden box nestled among the roots of an ancient tree. Curious, she picked it up and opened it to discover a handwritten letter inside. The paper was yellowed with age but still legible, bearing the name "John Muir" at the top.
As she read aloud his words, Ava learned that John Muir had been instrumental in preserving Yosemite National Park back when it was threatened by development projects. He believed deeply in protecting our natural resources for future generations, and he poured his passion into writing letters to politicians and influential figures, urging them to take action. His efforts eventually paid off, resulting in the establishment of what is now known as America's first national park.
Feeling inspired by John Muir's dedication to conservation, Ava decided to follow in his footsteps by penning her own plea for environmental protection. With renewed purpose, she returned home and began researching ways to make a difference. After months of hard work, she drafted a proposal detailing various initiatives aimed at reducing carbon emissions, promoting sustainable practices, and raising awareness about climate change.
Her plan caught the attention of several prominent environmental organizations, who agreed to support her cause wholeheartedly. Together, they launched a campaign called "Save Our Earth: One Step at a Time," encouraging people worldwide to adopt eco-friendly habits such as recycling, using public transportation whenever possible, and planting trees.
Over time, the movement gained momentum, attracting supporters from all walks of life. Celebrities lent their voices to raise awareness, businesses pledged to reduce their carbon footprint, and governments around the globe started implementing policies designed to protect the environment. Slowly but surely, progress was being made.
Years passed since Ava's discovery in Yosemite National Park, yet the memory of finding John Muir's letter remained etched in her mind like a cherished treasure. Now married with two children of her own, she continued advocating for environmental justice alongside her husband, who shared her passion for preservation. Their family often visited Yosemite together, passing on stories about John Muir and his legacy to their kids.
On one particular trip, while hiking through a dense forest, Ava noticed something peculiar a group of loggers cutting down trees without any regard for the surrounding ecosystem. Angered by this blatant disregard for nature, she approached the men and demanded an explanation. They replied dismissively, claiming they needed the wood for construction purposes.
Refusing to accept defeat, Ava took matters into her own hands. She gathered her family and friends, forming a human chain around the area designated for logging. Determined not to let anyone harm the precious forest, they stood firm against the loggers until authorities arrived on site. Eventually, the situation escalated into a standoff between both parties, drawing media attention from across the country.
During this tense standoff, Ava recounted her experience with John Muir's letter and how it led her to create the "Save Our Earth" initiative. Her words resonated strongly with those present, sparking conversations about sustainability and responsible resource management. Ultimately, the loggers relented under public pressure, agreeing to cease operations within the protected area.
This incident marked a turning point in Ava's crusade for environmental protection. From then onwards, she dedicated herself fully to spreading awareness about the importance of conserving our planet's natural wonders. Through her tireless efforts, more people became aware of the need for sustainable living practices, ultimately contributing to positive changes in government policies and corporate behavior.
Today, Yosemite National Park remains a testament to the power of individual actions combined with collective effort. Its pristine landscapes continue inspiring countless visitors each year, reminding us all that we have a responsibility towards safeguarding our planet for future generations. And amidst these stunning vistas stands Ava, proudly carrying forth John Muir's legacy, ensuring that his dream of preserving nature lives on forever.</s>
```

config.json (new file, 30 lines)

@@ -0,0 +1,30 @@
{
"_name_or_path": "cloudyu/Mixtral_7Bx2_MoE",
"architectures": [
"MixtralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mixtral",
"num_attention_heads": 32,
"num_experts_per_tok": 2,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"num_local_experts": 2,
"output_router_logits": false,
"rms_norm_eps": 1e-05,
"rope_theta": 10000.0,
"router_aux_loss_coef": 0.001,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.37.2",
"use_cache": true,
"vocab_size": 32000
}
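The shapes in config.json are enough to estimate the model's parameter count. A sketch (an estimate derived from the config values above, not an official figure):

```
# Values from config.json
hidden = 4096      # hidden_size
inter = 14336      # intermediate_size
vocab = 32000      # vocab_size
layers = 32        # num_hidden_layers
heads = 32         # num_attention_heads
kv_heads = 8       # num_key_value_heads
experts = 2        # num_local_experts

head_dim = hidden // heads        # 128
kv_dim = kv_heads * head_dim      # 1024 (grouped-query attention)

embed = vocab * hidden                                  # input embeddings
attn = 2 * hidden * hidden + 2 * kv_dim * hidden        # q/o + k/v projections
moe = experts * 3 * hidden * inter + experts * hidden   # w1/w2/w3 per expert + gate
norms = 2 * hidden                                      # input + post-attention norms
per_layer = attn + moe + norms

# + final norm + untied lm_head
total = embed + layers * per_layer + hidden + vocab * hidden
print(total)       # 12879138816  (~12.9B, rounding to the "13B" in the name)
print(total * 2)   # 25758277632  bytes at bfloat16
```

At 2 bytes per bfloat16 parameter this matches the `total_size` of 25758277632 recorded in the safetensors index below.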

generation_config.json (new file, 6 lines)

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"transformers_version": "4.37.2"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bee37c70a85dd8a62c0272de960a5ab17a286db5977aea56e3546bed40e769f1
size 4993525264


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:079d0c9dba590b20732c88a758d4c399459c99d55415b32c420a15cbaa4f7969
size 4932724880


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b6b545b0db1b56f7815f262afa3a977e446c97e7117b49b40e516fe8aaeb95d2
size 4966262528


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6725d4b1f4435344371e9b0338ba2b9d8b324e0b1ee201faec58a9d2437d0be5
size 4966262528


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:949f180044905a5198cf0175affe5fd49cca6e63415fe0c0b0f8b55b167bbe87
size 4932741544


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1360cfdb604aa4b1b2671a3d57765d54afc8394f9751010fde570036ab90c482
size 966812872
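Each of the six entries above is a git-lfs pointer file: the actual weight shard lives in LFS storage, identified by its sha256 oid and byte size. A minimal sketch (not the git-lfs client itself) that parses a pointer and verifies a downloaded blob against it:

```
import hashlib
import os

def parse_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file (key value per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def verify_blob(pointer: dict, path: str) -> bool:
    """Return True if the local file matches the pointer's size and sha256."""
    algo, _, digest = pointer["oid"].partition(":")
    assert algo == "sha256"
    if os.path.getsize(path) != int(pointer["size"]):
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == digest

pointer = parse_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:1360cfdb604aa4b1b2671a3d57765d54afc8394f9751010fde570036ab90c482\n"
    "size 966812872"
)
print(pointer["size"])  # 966812872
```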


@@ -0,0 +1,426 @@
{
"metadata": {
"total_size": 25758277632
},
"weight_map": {
"lm_head.weight": "model-00006-of-00006.safetensors",
"model.embed_tokens.weight": "model-00001-of-00006.safetensors",
"model.layers.0.block_sparse_moe.experts.0.w1.weight": "model-00001-of-00006.safetensors",
"model.layers.0.block_sparse_moe.experts.0.w2.weight": "model-00001-of-00006.safetensors",
"model.layers.0.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00006.safetensors",
"model.layers.0.block_sparse_moe.experts.1.w1.weight": "model-00001-of-00006.safetensors",
"model.layers.0.block_sparse_moe.experts.1.w2.weight": "model-00001-of-00006.safetensors",
"model.layers.0.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00006.safetensors",
"model.layers.0.block_sparse_moe.gate.weight": "model-00001-of-00006.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.block_sparse_moe.experts.0.w1.weight": "model-00001-of-00006.safetensors",
"model.layers.1.block_sparse_moe.experts.0.w2.weight": "model-00001-of-00006.safetensors",
"model.layers.1.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00006.safetensors",
"model.layers.1.block_sparse_moe.experts.1.w1.weight": "model-00001-of-00006.safetensors",
"model.layers.1.block_sparse_moe.experts.1.w2.weight": "model-00001-of-00006.safetensors",
"model.layers.1.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00006.safetensors",
"model.layers.1.block_sparse_moe.gate.weight": "model-00001-of-00006.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.10.block_sparse_moe.experts.0.w1.weight": "model-00002-of-00006.safetensors",
"model.layers.10.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00006.safetensors",
"model.layers.10.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00006.safetensors",
"model.layers.10.block_sparse_moe.experts.1.w1.weight": "model-00002-of-00006.safetensors",
"model.layers.10.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00006.safetensors",
"model.layers.10.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00006.safetensors",
"model.layers.10.block_sparse_moe.gate.weight": "model-00002-of-00006.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.block_sparse_moe.experts.0.w1.weight": "model-00002-of-00006.safetensors",
"model.layers.11.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00006.safetensors",
"model.layers.11.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00006.safetensors",
"model.layers.11.block_sparse_moe.experts.1.w1.weight": "model-00002-of-00006.safetensors",
"model.layers.11.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00006.safetensors",
"model.layers.11.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00006.safetensors",
"model.layers.11.block_sparse_moe.gate.weight": "model-00002-of-00006.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.block_sparse_moe.experts.0.w1.weight": "model-00002-of-00006.safetensors",
"model.layers.12.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00006.safetensors",
"model.layers.12.block_sparse_moe.experts.0.w3.weight": "model-00003-of-00006.safetensors",
"model.layers.12.block_sparse_moe.experts.1.w1.weight": "model-00003-of-00006.safetensors",
"model.layers.12.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00006.safetensors",
"model.layers.12.block_sparse_moe.experts.1.w3.weight": "model-00003-of-00006.safetensors",
"model.layers.12.block_sparse_moe.gate.weight": "model-00002-of-00006.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.13.block_sparse_moe.experts.0.w1.weight": "model-00003-of-00006.safetensors",
"model.layers.13.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00006.safetensors",
"model.layers.13.block_sparse_moe.experts.0.w3.weight": "model-00003-of-00006.safetensors",
"model.layers.13.block_sparse_moe.experts.1.w1.weight": "model-00003-of-00006.safetensors",
"model.layers.13.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00006.safetensors",
"model.layers.13.block_sparse_moe.experts.1.w3.weight": "model-00003-of-00006.safetensors",
"model.layers.13.block_sparse_moe.gate.weight": "model-00003-of-00006.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.block_sparse_moe.experts.0.w1.weight": "model-00003-of-00006.safetensors",
"model.layers.14.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00006.safetensors",
"model.layers.14.block_sparse_moe.experts.0.w3.weight": "model-00003-of-00006.safetensors",
"model.layers.14.block_sparse_moe.experts.1.w1.weight": "model-00003-of-00006.safetensors",
"model.layers.14.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00006.safetensors",
"model.layers.14.block_sparse_moe.experts.1.w3.weight": "model-00003-of-00006.safetensors",
"model.layers.14.block_sparse_moe.gate.weight": "model-00003-of-00006.safetensors",
"model.layers.14.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.block_sparse_moe.experts.0.w1.weight": "model-00003-of-00006.safetensors",
"model.layers.15.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00006.safetensors",
"model.layers.15.block_sparse_moe.experts.0.w3.weight": "model-00003-of-00006.safetensors",
"model.layers.15.block_sparse_moe.experts.1.w1.weight": "model-00003-of-00006.safetensors",
"model.layers.15.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00006.safetensors",
"model.layers.15.block_sparse_moe.experts.1.w3.weight": "model-00003-of-00006.safetensors",
"model.layers.15.block_sparse_moe.gate.weight": "model-00003-of-00006.safetensors",
"model.layers.15.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.block_sparse_moe.experts.0.w1.weight": "model-00003-of-00006.safetensors",
"model.layers.16.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00006.safetensors",
"model.layers.16.block_sparse_moe.experts.0.w3.weight": "model-00003-of-00006.safetensors",
"model.layers.16.block_sparse_moe.experts.1.w1.weight": "model-00003-of-00006.safetensors",
"model.layers.16.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00006.safetensors",
"model.layers.16.block_sparse_moe.experts.1.w3.weight": "model-00003-of-00006.safetensors",
"model.layers.16.block_sparse_moe.gate.weight": "model-00003-of-00006.safetensors",
"model.layers.16.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.block_sparse_moe.experts.0.w1.weight": "model-00003-of-00006.safetensors",
"model.layers.17.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00006.safetensors",
"model.layers.17.block_sparse_moe.experts.0.w3.weight": "model-00003-of-00006.safetensors",
"model.layers.17.block_sparse_moe.experts.1.w1.weight": "model-00003-of-00006.safetensors",
"model.layers.17.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00006.safetensors",
"model.layers.17.block_sparse_moe.experts.1.w3.weight": "model-00003-of-00006.safetensors",
"model.layers.17.block_sparse_moe.gate.weight": "model-00003-of-00006.safetensors",
"model.layers.17.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.block_sparse_moe.experts.0.w1.weight": "model-00003-of-00006.safetensors",
"model.layers.18.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00006.safetensors",
"model.layers.18.block_sparse_moe.experts.0.w3.weight": "model-00003-of-00006.safetensors",
"model.layers.18.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00006.safetensors",
"model.layers.18.block_sparse_moe.experts.1.w2.weight": "model-00004-of-00006.safetensors",
"model.layers.18.block_sparse_moe.experts.1.w3.weight": "model-00004-of-00006.safetensors",
"model.layers.18.block_sparse_moe.gate.weight": "model-00003-of-00006.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.19.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00006.safetensors",
"model.layers.19.block_sparse_moe.experts.0.w2.weight": "model-00004-of-00006.safetensors",
"model.layers.19.block_sparse_moe.experts.0.w3.weight": "model-00004-of-00006.safetensors",
"model.layers.19.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00006.safetensors",
"model.layers.19.block_sparse_moe.experts.1.w2.weight": "model-00004-of-00006.safetensors",
"model.layers.19.block_sparse_moe.experts.1.w3.weight": "model-00004-of-00006.safetensors",
"model.layers.19.block_sparse_moe.gate.weight": "model-00004-of-00006.safetensors",
"model.layers.19.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.2.block_sparse_moe.experts.0.w1.weight": "model-00001-of-00006.safetensors",
"model.layers.2.block_sparse_moe.experts.0.w2.weight": "model-00001-of-00006.safetensors",
"model.layers.2.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00006.safetensors",
"model.layers.2.block_sparse_moe.experts.1.w1.weight": "model-00001-of-00006.safetensors",
"model.layers.2.block_sparse_moe.experts.1.w2.weight": "model-00001-of-00006.safetensors",
"model.layers.2.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00006.safetensors",
"model.layers.2.block_sparse_moe.gate.weight": "model-00001-of-00006.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.20.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00006.safetensors",
"model.layers.20.block_sparse_moe.experts.0.w2.weight": "model-00004-of-00006.safetensors",
"model.layers.20.block_sparse_moe.experts.0.w3.weight": "model-00004-of-00006.safetensors",
"model.layers.20.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00006.safetensors",
"model.layers.20.block_sparse_moe.experts.1.w2.weight": "model-00004-of-00006.safetensors",
"model.layers.20.block_sparse_moe.experts.1.w3.weight": "model-00004-of-00006.safetensors",
"model.layers.20.block_sparse_moe.gate.weight": "model-00004-of-00006.safetensors",
"model.layers.20.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00006.safetensors",
"model.layers.21.block_sparse_moe.experts.0.w2.weight": "model-00004-of-00006.safetensors",
"model.layers.21.block_sparse_moe.experts.0.w3.weight": "model-00004-of-00006.safetensors",
"model.layers.21.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00006.safetensors",
"model.layers.21.block_sparse_moe.experts.1.w2.weight": "model-00004-of-00006.safetensors",
"model.layers.21.block_sparse_moe.experts.1.w3.weight": "model-00004-of-00006.safetensors",
"model.layers.21.block_sparse_moe.gate.weight": "model-00004-of-00006.safetensors",
"model.layers.21.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00006.safetensors",
"model.layers.22.block_sparse_moe.experts.0.w2.weight": "model-00004-of-00006.safetensors",
"model.layers.22.block_sparse_moe.experts.0.w3.weight": "model-00004-of-00006.safetensors",
"model.layers.22.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00006.safetensors",
"model.layers.22.block_sparse_moe.experts.1.w2.weight": "model-00004-of-00006.safetensors",
"model.layers.22.block_sparse_moe.experts.1.w3.weight": "model-00004-of-00006.safetensors",
"model.layers.22.block_sparse_moe.gate.weight": "model-00004-of-00006.safetensors",
"model.layers.22.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.23.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00006.safetensors",
"model.layers.23.block_sparse_moe.experts.0.w2.weight": "model-00004-of-00006.safetensors",
"model.layers.23.block_sparse_moe.experts.0.w3.weight": "model-00004-of-00006.safetensors",
"model.layers.23.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00006.safetensors",
"model.layers.23.block_sparse_moe.experts.1.w2.weight": "model-00004-of-00006.safetensors",
"model.layers.23.block_sparse_moe.experts.1.w3.weight": "model-00004-of-00006.safetensors",
"model.layers.23.block_sparse_moe.gate.weight": "model-00004-of-00006.safetensors",
"model.layers.23.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.24.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00006.safetensors",
"model.layers.24.block_sparse_moe.experts.0.w2.weight": "model-00004-of-00006.safetensors",
"model.layers.24.block_sparse_moe.experts.0.w3.weight": "model-00004-of-00006.safetensors",
"model.layers.24.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00006.safetensors",
"model.layers.24.block_sparse_moe.experts.1.w2.weight": "model-00004-of-00006.safetensors",
"model.layers.24.block_sparse_moe.experts.1.w3.weight": "model-00005-of-00006.safetensors",
"model.layers.24.block_sparse_moe.gate.weight": "model-00004-of-00006.safetensors",
"model.layers.24.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.25.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00006.safetensors",
"model.layers.25.block_sparse_moe.experts.0.w2.weight": "model-00005-of-00006.safetensors",
"model.layers.25.block_sparse_moe.experts.0.w3.weight": "model-00005-of-00006.safetensors",
"model.layers.25.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00006.safetensors",
"model.layers.25.block_sparse_moe.experts.1.w2.weight": "model-00005-of-00006.safetensors",
"model.layers.25.block_sparse_moe.experts.1.w3.weight": "model-00005-of-00006.safetensors",
"model.layers.25.block_sparse_moe.gate.weight": "model-00005-of-00006.safetensors",
"model.layers.25.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.26.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00006.safetensors",
"model.layers.26.block_sparse_moe.experts.0.w2.weight": "model-00005-of-00006.safetensors",
"model.layers.26.block_sparse_moe.experts.0.w3.weight": "model-00005-of-00006.safetensors",
"model.layers.26.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00006.safetensors",
"model.layers.26.block_sparse_moe.experts.1.w2.weight": "model-00005-of-00006.safetensors",
"model.layers.26.block_sparse_moe.experts.1.w3.weight": "model-00005-of-00006.safetensors",
"model.layers.26.block_sparse_moe.gate.weight": "model-00005-of-00006.safetensors",
"model.layers.26.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.27.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00006.safetensors",
"model.layers.27.block_sparse_moe.experts.0.w2.weight": "model-00005-of-00006.safetensors",
"model.layers.27.block_sparse_moe.experts.0.w3.weight": "model-00005-of-00006.safetensors",
"model.layers.27.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00006.safetensors",
"model.layers.27.block_sparse_moe.experts.1.w2.weight": "model-00005-of-00006.safetensors",
"model.layers.27.block_sparse_moe.experts.1.w3.weight": "model-00005-of-00006.safetensors",
"model.layers.27.block_sparse_moe.gate.weight": "model-00005-of-00006.safetensors",
"model.layers.27.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00006.safetensors",
"model.layers.28.block_sparse_moe.experts.0.w2.weight": "model-00005-of-00006.safetensors",
"model.layers.28.block_sparse_moe.experts.0.w3.weight": "model-00005-of-00006.safetensors",
"model.layers.28.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00006.safetensors",
"model.layers.28.block_sparse_moe.experts.1.w2.weight": "model-00005-of-00006.safetensors",
"model.layers.28.block_sparse_moe.experts.1.w3.weight": "model-00005-of-00006.safetensors",
"model.layers.28.block_sparse_moe.gate.weight": "model-00005-of-00006.safetensors",
"model.layers.28.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.29.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00006.safetensors",
"model.layers.29.block_sparse_moe.experts.0.w2.weight": "model-00005-of-00006.safetensors",
"model.layers.29.block_sparse_moe.experts.0.w3.weight": "model-00005-of-00006.safetensors",
"model.layers.29.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00006.safetensors",
"model.layers.29.block_sparse_moe.experts.1.w2.weight": "model-00005-of-00006.safetensors",
"model.layers.29.block_sparse_moe.experts.1.w3.weight": "model-00005-of-00006.safetensors",
"model.layers.29.block_sparse_moe.gate.weight": "model-00005-of-00006.safetensors",
"model.layers.29.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.3.block_sparse_moe.experts.0.w1.weight": "model-00001-of-00006.safetensors",
"model.layers.3.block_sparse_moe.experts.0.w2.weight": "model-00001-of-00006.safetensors",
"model.layers.3.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00006.safetensors",
"model.layers.3.block_sparse_moe.experts.1.w1.weight": "model-00001-of-00006.safetensors",
"model.layers.3.block_sparse_moe.experts.1.w2.weight": "model-00001-of-00006.safetensors",
"model.layers.3.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00006.safetensors",
"model.layers.3.block_sparse_moe.gate.weight": "model-00001-of-00006.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.30.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00006.safetensors",
"model.layers.30.block_sparse_moe.experts.0.w2.weight": "model-00005-of-00006.safetensors",
"model.layers.30.block_sparse_moe.experts.0.w3.weight": "model-00005-of-00006.safetensors",
"model.layers.30.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00006.safetensors",
"model.layers.30.block_sparse_moe.experts.1.w2.weight": "model-00005-of-00006.safetensors",
"model.layers.30.block_sparse_moe.experts.1.w3.weight": "model-00005-of-00006.safetensors",
"model.layers.30.block_sparse_moe.gate.weight": "model-00005-of-00006.safetensors",
"model.layers.30.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.31.block_sparse_moe.experts.0.w1.weight": "model-00006-of-00006.safetensors",
"model.layers.31.block_sparse_moe.experts.0.w2.weight": "model-00006-of-00006.safetensors",
"model.layers.31.block_sparse_moe.experts.0.w3.weight": "model-00006-of-00006.safetensors",
"model.layers.31.block_sparse_moe.experts.1.w1.weight": "model-00006-of-00006.safetensors",
"model.layers.31.block_sparse_moe.experts.1.w2.weight": "model-00006-of-00006.safetensors",
"model.layers.31.block_sparse_moe.experts.1.w3.weight": "model-00006-of-00006.safetensors",
"model.layers.31.block_sparse_moe.gate.weight": "model-00005-of-00006.safetensors",
"model.layers.31.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.4.block_sparse_moe.experts.0.w1.weight": "model-00001-of-00006.safetensors",
"model.layers.4.block_sparse_moe.experts.0.w2.weight": "model-00001-of-00006.safetensors",
"model.layers.4.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00006.safetensors",
"model.layers.4.block_sparse_moe.experts.1.w1.weight": "model-00001-of-00006.safetensors",
"model.layers.4.block_sparse_moe.experts.1.w2.weight": "model-00001-of-00006.safetensors",
"model.layers.4.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00006.safetensors",
"model.layers.4.block_sparse_moe.gate.weight": "model-00001-of-00006.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.block_sparse_moe.experts.0.w1.weight": "model-00001-of-00006.safetensors",
"model.layers.5.block_sparse_moe.experts.0.w2.weight": "model-00001-of-00006.safetensors",
"model.layers.5.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00006.safetensors",
"model.layers.5.block_sparse_moe.experts.1.w1.weight": "model-00001-of-00006.safetensors",
"model.layers.5.block_sparse_moe.experts.1.w2.weight": "model-00001-of-00006.safetensors",
"model.layers.5.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00006.safetensors",
"model.layers.5.block_sparse_moe.gate.weight": "model-00001-of-00006.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.6.block_sparse_moe.experts.0.w1.weight": "model-00002-of-00006.safetensors",
"model.layers.6.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00006.safetensors",
"model.layers.6.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00006.safetensors",
"model.layers.6.block_sparse_moe.experts.1.w1.weight": "model-00002-of-00006.safetensors",
"model.layers.6.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00006.safetensors",
"model.layers.6.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00006.safetensors",
"model.layers.6.block_sparse_moe.gate.weight": "model-00002-of-00006.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.block_sparse_moe.experts.0.w1.weight": "model-00002-of-00006.safetensors",
"model.layers.7.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00006.safetensors",
"model.layers.7.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00006.safetensors",
"model.layers.7.block_sparse_moe.experts.1.w1.weight": "model-00002-of-00006.safetensors",
"model.layers.7.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00006.safetensors",
"model.layers.7.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00006.safetensors",
"model.layers.7.block_sparse_moe.gate.weight": "model-00002-of-00006.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.block_sparse_moe.experts.0.w1.weight": "model-00002-of-00006.safetensors",
"model.layers.8.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00006.safetensors",
"model.layers.8.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00006.safetensors",
"model.layers.8.block_sparse_moe.experts.1.w1.weight": "model-00002-of-00006.safetensors",
"model.layers.8.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00006.safetensors",
"model.layers.8.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00006.safetensors",
"model.layers.8.block_sparse_moe.gate.weight": "model-00002-of-00006.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.block_sparse_moe.experts.0.w1.weight": "model-00002-of-00006.safetensors",
"model.layers.9.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00006.safetensors",
"model.layers.9.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00006.safetensors",
"model.layers.9.block_sparse_moe.experts.1.w1.weight": "model-00002-of-00006.safetensors",
"model.layers.9.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00006.safetensors",
"model.layers.9.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00006.safetensors",
"model.layers.9.block_sparse_moe.gate.weight": "model-00002-of-00006.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.norm.weight": "model-00006-of-00006.safetensors"
}
}
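The `weight_map` above tells a loader which of the six shard files holds each tensor, so only the needed shards have to be opened. A minimal sketch of querying it, using a small excerpt of the map rather than the full index file (the helper names `shard_for` and `tensors_in` are illustrative, not part of any library):

```python
import json

# Hypothetical excerpt mirroring model.safetensors.index.json's structure;
# a real loader would json.load() the full index file instead.
index = {
    "weight_map": {
        "model.norm.weight": "model-00006-of-00006.safetensors",
        "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
    }
}

def shard_for(tensor_name: str) -> str:
    """Return the shard file that stores the given tensor."""
    return index["weight_map"][tensor_name]

def tensors_in(shard_file: str) -> list[str]:
    """List every tensor mapped to one shard file."""
    return sorted(
        name for name, shard in index["weight_map"].items() if shard == shard_file
    )

print(shard_for("model.norm.weight"))  # model-00006-of-00006.safetensors
```

Note that a few layer-31 tensors in the index map back to shard 5 while the rest of the layer sits in shard 6; the map, not the layer number, is authoritative for where a tensor lives.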

35
special_tokens_map.json Normal file

@@ -0,0 +1,35 @@
{
"additional_special_tokens": [
"<unk>",
"<s>",
"</s>"
],
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
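As the file above shows, this model reuses `<s>` as both the BOS and the pad token, with `</s>` as EOS. A minimal sketch of the resulting mapping (AddedToken-style entries reduced to their `content` strings for brevity):

```python
# String view of special_tokens_map.json: note pad_token == bos_token ("<s>").
special_tokens = {
    "bos_token": "<s>",
    "eos_token": "</s>",
    "pad_token": "<s>",
    "unk_token": "<unk>",
}

# Because BOS doubles as padding, pad positions must be excluded via the
# attention mask rather than by token value alone.
assert special_tokens["pad_token"] == special_tokens["bos_token"]
```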

91127
tokenizer.json Normal file

File diff suppressed because it is too large

54
tokenizer_config.json Normal file

@@ -0,0 +1,54 @@
{
"add_bos_token": true,
"add_eos_token": false,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [
"<unk>",
"<s>",
"</s>"
],
"bos_token": "<s>",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"legacy": true,
"max_length": null,
"model_max_length": 1000000000000000019884624838656,
"pad_to_multiple_of": null,
"pad_token": "<s>",
"pad_token_type_id": 0,
"padding_side": "left",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"split_special_tokens": false,
"stride": 0,
"tokenizer_class": "LlamaTokenizer",
"truncation_side": "right",
"truncation_strategy": "longest_first",
"unk_token": "<unk>",
"use_default_system_prompt": false
}
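The config sets `"padding_side": "left"` and pads with `<s>` (token id 1 per `added_tokens_decoder`). Left padding matters for decoder-only generation because the model continues from the last position of each row, so the last real token of every sequence must sit in the final column. A pure-Python sketch of the behavior (the `pad_batch` helper is illustrative; `transformers` handles this internally):

```python
def pad_batch(sequences, pad_id, side="left"):
    """Pad token-id sequences to equal length, mimicking padding_side."""
    width = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        pad = [pad_id] * (width - len(s))
        # Left padding keeps each row's last real token in the final column,
        # which is the position a decoder-only model generates from.
        padded.append(pad + s if side == "left" else s + pad)
    return padded

batch = pad_batch([[1, 5, 6], [1, 9]], pad_id=1, side="left")
print(batch)  # [[1, 5, 6], [1, 1, 9]]
```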