Initialize project; model provided by the ModelHub XC community

Model: shisa-ai/shisa-v2-mistral-small-24b
Source: Original Platform
ModelHub XC
2026-04-22 20:10:56 +08:00
commit f82b9ca042
19 changed files with 10699 additions and 0 deletions

.gitattributes (vendored, Normal file, 49 lines)

@@ -0,0 +1,49 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md (Normal file, 160 lines)

@@ -0,0 +1,160 @@
---
license: apache-2.0
datasets:
- shisa-ai/shisa-v2-sharegpt
- shisa-ai/deepseekv3-ultrafeedback-armorm-dpo
language:
- ja
- en
base_model:
- mistralai/Mistral-Small-24B-Instruct-2501
pipeline_tag: text-generation
library_name: transformers
---
# Shisa V2
Shisa V2 is a family of bilingual Japanese and English (JA/EN) general-purpose chat models trained by [Shisa.AI](https://shisa.ai). These models aim to excel in Japanese language tasks while retaining robust English capabilities.
Since our initial [Shisa 7B](https://huggingface.co/augmxnt/shisa-7b-v1) releases, the baseline Japanese capabilities of open-weight language models have significantly improved. New models have more Japanese pre-training tokens, higher [JA tokenizer efficiency](https://github.com/shisa-ai/shisa-v2/blob/main/eval/tokenizer-efficiency/tokenizer-eval-ja.md), and better-quality Japanese outputs overall. As such, for Shisa V2 we've eschewed both tokenizer extension and costly continued pre-training and have focused entirely on optimizing post-training. We've significantly expanded and refined the synthetic-data-driven approach pioneered with our original [Shisa 7B](https://huggingface.co/augmxnt/shisa-7b-v1) models, and have achieved substantial performance gains.
## Model Family Overview
The Shisa V2 family comprises a range of models from 7B to 70B parameters in size:
| License | Model Name | Parameters | Context Length | JA AVG | EN AVG |
|----|----|----|----|----|----|
| Apache 2.0 | [shisa-v2-qwen2.5-7b](https://huggingface.co/shisa-ai/shisa-v2-qwen2.5-7b) | 7B | 128K/8K | 71.06 | 54.86 |
| Llama 3.1 | [shisa-v2-llama3.1-8b](https://huggingface.co/shisa-ai/shisa-v2-llama3.1-8b)<sup>1</sup> | 8B | 128K | 70.83 | 54.75 |
| Apache 2.0 | [shisa-v2-mistral-nemo-12b](https://huggingface.co/shisa-ai/shisa-v2-mistral-nemo-12b) | 12B | 128K | 72.83 | 53.33 |
| MIT | [shisa-v2-unphi4-14b](https://huggingface.co/shisa-ai/shisa-v2-unphi4-14b) | 14B | 16K | 75.89 | 60.10 |
| Apache 2.0 | [shisa-v2-qwen2.5-32b](https://huggingface.co/shisa-ai/shisa-v2-qwen2.5-32b) | 32B | 128K/8K | 76.97 | 67.41 |
| Llama 3.3 | [shisa-v2-llama3.3-70b](https://huggingface.co/shisa-ai/shisa-v2-llama3.3-70b)<sup>1</sup> | 70B | 128K | 79.72 | 67.71 |
These Shisa V2 models were all trained using the same datasets and training recipes, except for scaling the learning rate based on model size and modifying the global batch size for the 70B model.
While most of our development and tuning was done on the Llama 3.1 8B model, we did some cross-validation during this process, and we're pleased that our final recipe has shown robust scaling, improving Japanese language performance across all model sizes evaluated. We've prioritized releasing the highest-quality openly-licensed (Apache 2.0 and MIT) models in each size class.
## Performance
*The `shisa-v2-mistral-small-24b` model was a late addition, so its benchmarks are still a work in progress.*
All Shisa V2 models demonstrate improved Japanese output quality compared to their respective base models:
**WIP**
The Shisa V2 models perform well against other models in their respective class sizes.
**WIP**
| License | Model Name | JA AVG | EN AVG | Shaberi AVG | ELYZA 100 | JA MT Bench | Rakuda | Tengu | llm-jp-eval | shisa-jp-ifeval | shisa-jp-rp-bench | shisa-jp-tl-bench | MixEval | LiveBench | IFEval | EvalPlus |
|-----------|------------------------------------------------------------------------------------------------|-----------|-----------|-------------|-----------|-------------|---------|-------|-------------|-----------------|-------------------|-------------------|---------|-----------|--------|----------|
| Gemma | google/gemma-3-27b-it | **80.05** | 65.95 | 8.53 | 8.53 | 8.71 | 8.85 | 8.03 | 0.65 | **0.57** | 4.67 | **8.38** | 0.48 | **51.8** | **0.85** | 0.79 |
| Apache 2.0| [shisa-ai/shisa-v2-qwen2.5-32b](https://huggingface.co/shisa-ai/shisa-v2-qwen2.5-32b) | 76.97 | **67.41** | 8.63 | 8.59 | **8.88** | 9.04 | 8.00 | **0.66** | 0.40 | **4.71** | 7.07 | 0.53 | 49.9 | 0.82 | **0.84** |
| Apache 2.0| abeja/ABEJA-Qwen2.5-32b-Japanese-v0.1 | 74.14 | 65.70 | **8.66** | **8.65** | 8.78 | 9.07 | **8.14**| **0.66** | 0.37 | 4.12 | 6.20 | **0.55**| 46.20 | 0.83 | 0.79 |
| Apache 2.0| Qwen/Qwen2.5-32B-Instruct | 73.35 | 66.67 | 8.41 | 8.59 | 8.64 | 8.61 | 7.80 | **0.66** | 0.43 | 4.58 | 5.01 | 0.53 | 49.7 | 0.82 | 0.82 |
| Apache 2.0| mistralai/Mistral-Small-3.1-24B-Instruct-2503 | 72.03 | 65.15 | 8.51 | 8.56 | 8.63 | **9.12**| 7.74 | 0.63 | 0.36 | 4.59 | 4.49 | **0.55**| 44.90 | 0.81 | 0.80 |
### Testing Notes
Japanese functional tests were conducted using the **[shisa-ai/shaberi](https://github.com/shisa-ai/shaberi/)** fork of the [LightBlue Shaberi](https://github.com/lightblue-tech/japanese_llm_eval) evaluation harness. Shaberi ratings were performed with a **[PoLL](https://arxiv.org/abs/2404.18796)** (LLM Jury) consisting of:
- [Athene-V2](https://huggingface.co/Nexusflow/Athene-V2-Chat)
- [Llama 3.3 70B](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)
- [Tulu 3 405B FP8](https://huggingface.co/shisa-ai/Llama-3.1-Tulu-3-405B-FP8-Dynamic)
The results were statistically validated to be comparable to both `gpt-4-1106-preview` and human-reviewed "gold standard" ratings.
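As a rough illustration of how a PoLL (LLM jury) aggregate can be computed, the sketch below averages per-judge ratings. This is a simplified sketch only, not the actual Shaberi harness code; the judge names and ratings shown are placeholders.

```python
# Simplified sketch: a PoLL score taken as the mean of each jury model's mean rating.
# The real Shaberi fork may aggregate or weight judges differently.
from statistics import mean

def poll_score(ratings_by_judge: dict[str, list[float]]) -> float:
    """ratings_by_judge maps judge name -> per-question ratings (e.g., on a 1-10 scale)."""
    per_judge_avg = [mean(scores) for scores in ratings_by_judge.values()]
    return mean(per_judge_avg)

# Placeholder ratings for a single model under three jury judges:
print(poll_score({
    "Athene-V2": [8.0, 7.5, 9.0],
    "Llama-3.3-70B": [7.5, 8.0, 8.5],
    "Tulu-3-405B": [8.5, 7.0, 9.0],
}))
```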
Dynamic RoPE extension was utilized when necessary for testing models with context windows smaller than 8K tokens. All tests were performed using recent versions of [vLLM](https://github.com/vllm-project/vllm) or [SGLang](https://github.com/sgl-project/sglang).
We developed a custom "multieval" harness to automate our model evaluations. Standard benchmarks include:
- [ELYZA Tasks 100](https://huggingface.co/datasets/elyza/ELYZA-tasks-100)
- [JA MT-Bench](https://github.com/Stability-AI/FastChat/tree/jp-stable/fastchat/llm_judge) ([dataset](https://huggingface.co/datasets/shisa-ai/ja-mt-bench-1shot))
- [Rakuda](https://huggingface.co/datasets/yuzuai/rakuda-questions)
- [Tengu Bench](https://huggingface.co/datasets/lightblue/tengu_bench)
- [llm-jp-eval](https://github.com/llm-jp/llm-jp-eval) (v1.4.1)
- [MixEval](https://mixeval.github.io/)
- [LiveBench](https://livebench.ai/) (2024-11-25)
- [IFEval](https://huggingface.co/datasets/google/IFEval) ([Lighteval](https://github.com/huggingface/lighteval))
- [EvalPlus](https://github.com/evalplus/evalplus)
### New Japanese Benchmarks
Over the course of model development, we also created several new evaluations to help us measure performance on important Japanese downstream tasks:
- **shisa-jp-ifeval**: Inspired by [IFEval](https://huggingface.co/datasets/google/IFEval), but evaluating instruction-following abilities specific to Japanese grammar and linguistics (closed form)
- **shisa-jp-rp-bench**: Assessing performance on Japanese role-play and character/persona-based multi-turn conversations based on [Aratako](https://huggingface.co/Aratako)'s [Japanese-RP-Bench](https://github.com/Aratako/Japanese-RP-Bench) (LLM judge)
- **shisa-jp-tl-bench**: Testing Japanese-English translation proficiency (LLM judge, BTL pairwise comparison with logistic transformation scoring)
We believe these benchmarks will be generally useful and plan to open-source them in the near future to support the Japanese LLM research community.
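For readers unfamiliar with BTL scoring as used in shisa-jp-tl-bench, the sketch below shows one simplified way pairwise translation preferences can be turned into per-model scores: a Bradley-Terry fit followed by a logistic transformation onto a bounded scale. It is only an illustration of the general idea; the actual benchmark implementation may fit and scale differently.

```python
# Hypothetical sketch of BTL-style scoring from pairwise preference counts.
import math

def fit_bradley_terry(models, wins, lr=0.05, iters=500):
    """wins[(a, b)] = number of times model a's translation was preferred over model b's."""
    theta = {m: 0.0 for m in models}  # strengths, identifiable only up to a constant shift
    for _ in range(iters):
        for a in models:
            grad = 0.0
            for b in models:
                if a == b:
                    continue
                n = wins.get((a, b), 0) + wins.get((b, a), 0)
                if n == 0:
                    continue
                p_a_beats_b = 1.0 / (1.0 + math.exp(theta[b] - theta[a]))
                grad += wins.get((a, b), 0) - n * p_a_beats_b  # gradient of the BT log-likelihood
            theta[a] += lr * grad
    return theta

def logistic_score(theta_a, reference=0.0, scale=10.0):
    # Map a fitted strength to a bounded score via a logistic transformation (assumed 0-10 scale).
    return scale / (1.0 + math.exp(-(theta_a - reference)))
```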
## Usage
All Shisa V2 models inherit the [chat templates](https://huggingface.co/docs/transformers/v4.37.1/chat_templating) of their respective base models and have been tested and validated for proper inference with both [vLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang).
Running sampler sweeps, we found the models operate well across a variety of temperatures in most settings. For translation tasks specifically, we recommend a lower temperature (0.2) to increase accuracy. For role-play and creative tasks, a higher temperature (e.g., 1.0) seems to give good results. To prevent cross-lingual token leakage, we recommend a top_p of 0.9 or a min_p of 0.1.
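As a minimal inference sketch (assuming the standard Transformers chat-template workflow; the prompt, dtype, and generation settings below are illustrative and should be adjusted for your task and hardware):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shisa-ai/shisa-v2-mistral-small-24b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "日本の四季について簡単に説明してください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Lower temperature (~0.2) for translation; ~1.0 for role-play/creative tasks.
# The card also suggests min_p=0.1 as an alternative to top_p (supported in vLLM
# and recent Transformers versions).
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.2,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```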
No additional safety alignment has been done on these models, so they will largely inherit the base models' biases and safety profiles.
## Datasets
Our supervised fine-tuning (SFT) stage dataset consists of approximately 360K samples totaling roughly 420M Llama 3 tokens:
- [shisa-ai/shisa-v2-sharegpt](https://huggingface.co/datasets/shisa-ai/shisa-v2-sharegpt)
- This is a filtered, regenerated and resampled version of the original Shisa V1 [augmxnt/ultra-orca-boros-en-ja-v1](https://huggingface.co/datasets/augmxnt/ultra-orca-boros-en-ja-v1) dataset
- This was the backbone of our Shisa V2 training, and it proved to be an extremely robust dataset, outperforming all existing mixes/additions (Tulu, Olmo, Rewild, various Magpie sets, etc.). If you need a JA/EN dataset, we believe this new version is among the best currently available
- [shisa-ai/rewild-set-deepseek-subset](https://huggingface.co/datasets/shisa-ai/rewild-set-deepseek-subset)
- A filtered version of [Rewild](https://arxiv.org/abs/2501.18511) ([WildChat](https://wildchat.allen.ai/)) prompts translated into Japanese, with responses generated by [DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324)
- shisa-ai/magpie-ultra-set
- Japanese generations based on [argilla/magpie-ultra-v1.0](https://huggingface.co/datasets/argilla/magpie-ultra-v1.0)
- shisa-ai/magpie-advanced-questions-set
- [Magpie](https://magpie-align.github.io/)-generated questions about advanced college-level topics across a variety of academic fields
- shisa-ai/japan-magpie-set
- [Magpie](https://magpie-align.github.io/)-generated questions about Japan's economy and history as well as cultural and business practices
- shisa-ai/shisa-v2-roleplaying-sft
- Synthetically-generated roleplaying data featuring a wide variety of characters, situations, and genres
- shisa-ai/translation_expanded_master_set_filtered
- A synthetic dataset involving a wide range of translation tasks, including essays, conversations, and fiction
- shisa-ai/shisa-v2-instruction-following-sft
- An instruction-following dataset based on prompts from [Aratako/Magpie-Tanuki-8B-annotated-96k](https://huggingface.co/datasets/Aratako/Magpie-Tanuki-8B-annotated-96k) and a list of instruction-following constraints
Our final DPO mix is 113K samples totaling approximately 115M Llama 3 tokens:
- [shisa-ai/deepseekv3-ultrafeedback-armorm-dpo](https://huggingface.co/datasets/shisa-ai/deepseekv3-ultrafeedback-armorm-dpo)
- This is a version of [princeton-nlp/gemma2-ultrafeedback-armorm](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm) with `chosen` responses regenerated by [DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324)
- Surprisingly, we found that using this relatively small, English-only DPO alignment set outperformed both JA/EN DPO sets and much larger sets like the [Tulu 3 preference mixture](https://huggingface.co/datasets/allenai/llama-3.1-tulu-3-405b-preference-mixture)
- shisa-ai/shisa-v2-roleplaying-dpo
- A DPO variant of the roleplaying-sft set that uses an [UltraFeedback](https://github.com/OpenBMB/UltraFeedback)-style rating system
- shisa-ai/translation-no-extra-text-dpo-dataset
- A DPO set that aims to reduce the tendency of models to output extraneous explanatory text for translations when not wanted
- shisa-ai/shisa-v2-instruction-following-dpo
- A DPO variant of the instruction-following-sft set to further enhance instruction-following performance
- shisa-ai/politeness-dpo-set
- A set to allow for greater controllability of speaking style for Japanese responses
## Training
We trained over 200 models to empirically test a wide range of variables. Beyond hyper-parameter and data-mix testing, we also ran numerous tests on data ordering, multilingual-specific ordering, curriculum learning, multi-stage training, various forms of self-play, preference tuning, and some of the latest RL/verifiable reward techniques.
A full discussion of these learnings is out of scope here, but we will be updating the [shisa-v2 wiki](https://github.com/shisa-ai/shisa-v2/wiki) and the [Shisa.AI website](http://shisa.ai/) with forthcoming writeups.
Most of our training was done on a small, AWS SageMaker-deployed, 4-node H100 Slurm cluster, primarily using [Axolotl](https://github.com/axolotl-ai-cloud/axolotl/) with [DeepSpeed](https://www.deepspeed.ai/) and [Liger Kernels](https://github.com/linkedin/Liger-Kernel). The Phi 4 and Llama 3.3 70B versions of Shisa V2 were trained with [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF). Our training logs are [publicly available on Weights and Biases](https://wandb.ai/augmxnt/shisa-v2).
### Credits
The Shisa V2 models were developed by [Leonard Lin](https://huggingface.co/leonardlin) and Adam Lensenmayer ([Shisa.AI](http://Shisa.AI)).
Compute was provided by [Ubitus K.K.](https://ubitus.net/) and [METI GENIAC](https://www.meti.go.jp/policy/mono_info_service/geniac/).
Thanks to [Meta Llama](https://huggingface.co/meta-llama), [Microsoft Research](https://huggingface.co/microsoft), [Mistral AI](https://huggingface.co/mistralai), and [Qwen Team](https://huggingface.co/Qwen) for providing their models to the open source community, [Unsloth](https://huggingface.co/unsloth) for their [llamafied conversion of Phi-4](https://huggingface.co/unsloth/phi-4), the Tulu team, whose detailed writeups and fast responses to our questions were very helpful, and [Chanvichet Vong](https://github.com/NanoCode012) of the Axolotl team for his tireless work in the Axolotl Discord.
We also extend our thanks to all open source AI developers and researchers - without their publicly shared research, tooling, and datasets, none of our work would be possible. We hope that our own contributions will further support the broader community.
A special thanks to [Jon Durbin](https://huggingface.co/jondurbin) for his work on Shisa V1.
For more details on our development and insights, please visit the [Shisa V2 Github repository](https://github.com/shisa-ai/shisa-v2) and the [Shisa.AI website](https://shisa.ai/).
---
<sup>*1: Per the Llama Community License Agreements, the official names of the Llama-based models are "Llama 3.1 shisa-v2-llama3.1-8b" and "Llama 3.3 shisa-v2-llama3.3-70b"*</sup>

config.json (Normal file, 26 lines)

@@ -0,0 +1,26 @@
{
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 32768,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 40,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 100000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.50.0",
"use_cache": false,
"vocab_size": 131072
}
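As a quick sanity check of the architecture described by this config: the implied parameter count works out to roughly 23.6B parameters, which at 2 bytes per bfloat16 weight matches the 47,144,806,400-byte `total_size` reported in the safetensors index further below. A short sketch of that arithmetic:

```python
# Back-of-the-envelope parameter count from the config values above.
hidden, layers, inter = 5120, 40, 32768
heads, kv_heads, head_dim, vocab = 32, 8, 128, 131072

attn = hidden * heads * head_dim * 2          # q_proj + o_proj
attn += hidden * kv_heads * head_dim * 2      # k_proj + v_proj (GQA: 8 KV heads)
mlp = hidden * inter * 3                      # gate_proj, up_proj, down_proj
norms = 2 * hidden                            # input + post-attention layernorms
per_layer = attn + mlp + norms

embeddings = vocab * hidden * 2               # embed_tokens + lm_head (untied)
total = layers * per_layer + embeddings + hidden  # + final norm

print(total)       # 23,572,403,200 parameters (~23.6B)
print(total * 2)   # 47,144,806,400 bytes in bfloat16, matching the index metadata
```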

configuration.json (Normal file, 1 line)

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

generation_config.json (Normal file, 8 lines)

@@ -0,0 +1,8 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"do_sample": true,
"eos_token_id": 2,
"temperature": 0.15,
"transformers_version": "4.50.0"
}

model-00001-of-00010.safetensors (LFS, Normal file, 3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a4a67c54d8a90b765a2f11e9102cfaa4f8737abd38ace5786da8d6c4ec779901
size 4781571736

model-00002-of-00010.safetensors (LFS, Normal file, 3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:37af8a9b17bb41f60560b8df4bd4a04df98ef95b5bee39b8d0a8b25b75d8b5e4
size 4781592784

model-00003-of-00010.safetensors (LFS, Normal file, 3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:40f3e65ef574566aea3d3f613b49ae271599f03a3070e4a80aa81d40b976d301
size 4781592800

model-00004-of-00010.safetensors (LFS, Normal file, 3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:635bbfb7aacf9d811346da136c2daa726807f65704d316dd57aff6ab243ac592
size 4886471600

model-00005-of-00010.safetensors (LFS, Normal file, 3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f4e2095f4da7aa05f3a68cfabd2b45ae7119c534502bd8890b54cf50bccb73f6
size 4781592824

model-00006-of-00010.safetensors (LFS, Normal file, 3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2aa57797d80efebba74a2fcd22281aefcdb536f8f8b1a64e70dd8ed1a2283276
size 4781592816

model-00007-of-00010.safetensors (LFS, Normal file, 3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c683f227b75063404865953e811581538d952ccb53f7417f443131fee266986a
size 4886471600

model-00008-of-00010.safetensors (LFS, Normal file, 3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:00260e8fca1aacbb9efa68360a5ad2d84ff063231b5aa44b8ba650234ef97ac7
size 4781592824

model-00009-of-00010.safetensors (LFS, Normal file, 3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8b67b8d6355f207ac95047b1bb717736edb1ac55ed52f5fc3e802ae6ab95f3d2
size 4781592816

model-00010-of-00010.safetensors (LFS, Normal file, 3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:328ab2c67ff3a21fd2670d95c838252d83bb2f354136320ad507903148526050
size 3900777072

model.safetensors.index.json (Normal file, 370 lines)

@@ -0,0 +1,370 @@
{
"metadata": {
"total_size": 47144806400
},
"weight_map": {
"lm_head.weight": "model-00010-of-00010.safetensors",
"model.embed_tokens.weight": "model-00001-of-00010.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00010.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00010.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00010.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00010.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00010.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.11.input_layernorm.weight": "model-00004-of-00010.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.12.input_layernorm.weight": "model-00004-of-00010.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.13.input_layernorm.weight": "model-00004-of-00010.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.14.input_layernorm.weight": "model-00004-of-00010.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00010.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.16.input_layernorm.weight": "model-00005-of-00010.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
"model.layers.17.input_layernorm.weight": "model-00005-of-00010.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.18.input_layernorm.weight": "model-00005-of-00010.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.19.input_layernorm.weight": "model-00005-of-00010.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00010.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00010.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.20.input_layernorm.weight": "model-00006-of-00010.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
"model.layers.21.input_layernorm.weight": "model-00006-of-00010.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.22.input_layernorm.weight": "model-00006-of-00010.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.23.input_layernorm.weight": "model-00006-of-00010.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.24.input_layernorm.weight": "model-00007-of-00010.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
"model.layers.25.input_layernorm.weight": "model-00007-of-00010.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.26.input_layernorm.weight": "model-00007-of-00010.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.27.input_layernorm.weight": "model-00007-of-00010.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.28.input_layernorm.weight": "model-00007-of-00010.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.29.input_layernorm.weight": "model-00008-of-00010.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00010.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
"model.layers.30.input_layernorm.weight": "model-00008-of-00010.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.31.input_layernorm.weight": "model-00008-of-00010.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.32.input_layernorm.weight": "model-00008-of-00010.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.33.input_layernorm.weight": "model-00009-of-00010.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
"model.layers.34.input_layernorm.weight": "model-00009-of-00010.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.35.input_layernorm.weight": "model-00009-of-00010.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.36.input_layernorm.weight": "model-00009-of-00010.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.37.input_layernorm.weight": "model-00010-of-00010.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00010-of-00010.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
"model.layers.38.input_layernorm.weight": "model-00010-of-00010.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00010-of-00010.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.39.input_layernorm.weight": "model-00010-of-00010.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00010-of-00010.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00010-of-00010.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00010.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00010.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00010.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.7.input_layernorm.weight": "model-00003-of-00010.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
"model.layers.8.input_layernorm.weight": "model-00003-of-00010.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00010.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
"model.norm.weight": "model-00010-of-00010.safetensors"
}
}
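A short sketch of how this index can be consumed to find which shard holds a given tensor (assuming the file is saved locally under its conventional name, `model.safetensors.index.json`):

```python
# Resolve which shard file holds a given tensor, using the index shown above.
import json
from collections import Counter

with open("model.safetensors.index.json") as f:
    index = json.load(f)

weight_map = index["weight_map"]
print(weight_map["model.layers.0.self_attn.q_proj.weight"])  # model-00001-of-00010.safetensors
print(index["metadata"]["total_size"])                        # 47144806400 bytes of tensor data

# Count how many tensors live in each shard.
print(Counter(weight_map.values()))
```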

special_tokens_map.json (Normal file, 1032 lines)
File diff suppressed because it is too large.

tokenizer.json (Normal file, 3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b639fcaeefa983aa47e9108527764e3b0e20cf34bae9bf19c6ea8406b35b8428
size 17078136

tokenizer_config.json (Normal file, 9020 lines)
File diff suppressed because it is too large.