Initialize project; model provided by the ModelHub XC community
Model: JosephusCheung/Guanaco Source: Original Platform
34
.gitattributes
vendored
Normal file
@@ -0,0 +1,34 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
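The patterns above route matching files through Git LFS. As a rough illustration only, the sketch below checks a file name against a handful of those patterns using Python's `fnmatch`. This is an approximation: Git's own matching rules differ in details (for example the directory pattern `saved_model/**/*` is not handled here), and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the basename patterns from the .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.tar.*", "*tfevents*", "*.zip"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match the file's base name against each glob."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


print(is_lfs_tracked("pytorch_model-00001-of-00007.bin"))  # True
print(is_lfs_tracked("config.json"))                       # False
```

For an authoritative answer inside a real checkout, `git check-attr filter -- <path>` reports the attribute Git actually applies.
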
106
README.md
Normal file
@@ -0,0 +1,106 @@
---
inference: false
license: gpl-3.0
datasets:
- JosephusCheung/GuanacoDataset
language:
- en
- zh
- ja
- de
pipeline_tag: conversational
tags:
- llama
- guannaco
- alpaca
---

**You can now run this model on a free Colab T4 GPU.**

[Open in Colab](https://colab.research.google.com/drive/1ocSmoy3ba1EkYu7JWT1oCw9vz8qC2cMk#scrollTo=zLORi5OcPcIJ)

**We highly recommend fp16 inference for this model, as 8-bit precision may significantly degrade performance. If you need a version better suited to consumer hardware, use the specialized quantized release, which requires only 5+ GB of VRAM:** [JosephusCheung/GuanacoOnConsumerHardware](https://huggingface.co/JosephusCheung/GuanacoOnConsumerHardware).

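As a minimal sketch of the fp16 recommendation, the function below loads the weights in half precision with the `transformers` library. It assumes that `torch`, `transformers`, and `accelerate` are installed and that the repository id `JosephusCheung/Guanaco` resolves on the Hugging Face Hub; the function name `load_guanaco_fp16` is hypothetical and this is not the authors' reference loader.

```python
def load_guanaco_fp16(model_id: str = "JosephusCheung/Guanaco"):
    """Load the tokenizer and model with half-precision (fp16) weights."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # fp16, as recommended above
        device_map="auto",          # placement across devices; needs `accelerate`
    )
    return tokenizer, model
```

Calling `load_guanaco_fp16()` downloads all checkpoint shards on first use, so it is best run in an environment with enough disk space and GPU memory.
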
**We encourage you to use the latest version of transformers from GitHub.**

Guanaco is an instruction-following language model built on Meta's LLaMA 7B model. Building on the initial 52K dataset from the Alpaca model, we incorporated an additional 534K+ entries covering English, Simplified Chinese, Traditional Chinese (Taiwan), Traditional Chinese (Hong Kong), Japanese, and German, along with various linguistic and grammatical tasks. This breadth of data enables Guanaco to perform well in multilingual environments.

To foster openness and replicability in research, we have made the Guanaco Dataset publicly accessible and released the model weights here. By providing these resources, we aim to inspire more researchers to pursue related work and collectively advance the development of instruction-following language models.

We gratefully acknowledge [KBlueLeaf](https://huggingface.co/KBlueLeaf)'s invaluable contributions to the conceptual validation, [trained model](https://huggingface.co/KBlueLeaf/guanaco-7B-leh), and [inference development](https://github.com/KohakuBlueleaf/guanaco-lora) of the model; without these efforts, the project would never have come to fruition.

When using the Guanaco model, please bear in mind the following points:

The Guanaco model has not been filtered for harmful, biased, or explicit content. As a result, it may generate outputs that do not adhere to ethical norms. Please exercise caution when using the model in research or practical applications.

1. ### Improved context and prompt role support:

The new format is designed to be similar to ChatGPT, allowing for better integration with the Alpaca format and enhancing the overall user experience.

The Instruction section is used as few-shot context to support diverse inputs and responses, making it easier for the model to understand user queries and respond accurately.

The format is as follows:

```
### Instruction:
User: History User Input
Assistant: History Assistant Answer
### Input:
System: Knowledge
User: New User Input
### Response:
New Assistant Answer
```

This structured format makes it easier to track the conversation history and maintain context throughout a multi-turn dialogue.

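The layout above can be assembled programmatically. The following is a minimal sketch, assuming single newlines between lines and no extra blank lines; the helper name `build_prompt` and its parameters are hypothetical, and the guanaco-lora repository remains the reference for the exact formatting used in training.

```python
from typing import List, Optional, Tuple


def build_prompt(
    history: List[Tuple[str, str]],
    user_input: str,
    knowledge: Optional[str] = None,
) -> str:
    """Assemble a prompt in the Instruction / Input / Response layout."""
    lines = ["### Instruction:"]
    # Earlier turns go into the Instruction section as few-shot context.
    for user_turn, assistant_turn in history:
        lines.append(f"User: {user_turn}")
        lines.append(f"Assistant: {assistant_turn}")
    lines.append("### Input:")
    # Optional background knowledge is passed on the System line.
    if knowledge is not None:
        lines.append(f"System: {knowledge}")
    lines.append(f"User: {user_input}")
    lines.append("### Response:")
    # The model is expected to continue with the new assistant answer.
    return "\n".join(lines) + "\n"


prompt = build_prompt(
    history=[("Hi", "Hello!")],
    user_input="What is LLaMA?",
    knowledge="LLaMA is a language model by Meta.",
)
print(prompt)
```

Keeping the history as a list of `(user, assistant)` pairs makes it straightforward to append each completed turn before building the next prompt.
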
3. ### Role-playing support:

Guanaco now offers advanced role-playing support, similar to Character.AI, in English, Simplified Chinese, Traditional Chinese, Japanese, and German, making it more versatile for users from different linguistic backgrounds.

Users can instruct the model to assume specific roles, such as historical figures or fictional characters, as well as personalities based on their input. This allows for more engaging and immersive conversations.

The model can draw on various sources of information for the character's background and behavior, such as encyclopedic entries, first-person narrations, or a list of personality traits.

The model will consistently output responses in the format "Character Name: Reply" to maintain the chosen role throughout the conversation.

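Because role-play replies follow the "Character Name: Reply" format, client code can separate the speaker from the text. The sketch below is one hypothetical way to do so (the name `split_role_reply` is not part of the project); it assumes an ASCII colon and splits only on the first one.

```python
from typing import Optional, Tuple


def split_role_reply(text: str) -> Tuple[Optional[str], str]:
    """Split 'Character Name: Reply' into (name, reply)."""
    name, sep, reply = text.partition(":")
    if not sep or not name.strip():
        # No usable prefix found; return the text unchanged.
        return None, text.strip()
    return name.strip(), reply.strip()


print(split_role_reply("Sherlock Holmes: Elementary, my dear Watson."))
# ('Sherlock Holmes', 'Elementary, my dear Watson.')
```

Replies written with a full-width colon, as is common in Chinese and Japanese text, would need an additional separator check.
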
4. ### Rejection of answers and avoidance of erroneous responses:

The model has been updated to handle more effectively the situations in which it lacks sufficient knowledge or cannot provide a valid response.

Reserved keywords, for use in the System Prompt, have been introduced to indicate different scenarios and communicate more clearly with the user:

NO IDEA: Indicates that the model lacks the necessary knowledge to provide an accurate answer and will explain this to the user, encouraging them to seek alternative sources.

FORBIDDEN: Indicates that the model refuses to answer for specific reasons (e.g., legal, ethical, or safety concerns), which will be inferred from the context of the query.

SFW: Indicates that the model refuses to answer a question because it has been filtered for NSFW content, ensuring a safer and more appropriate user experience.

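The text above does not spell out exactly how a reserved keyword is placed in the System Prompt. The sketch below assumes, as an illustration only, that the keyword is written verbatim as the content of the `System:` line; the helper name `system_line` is hypothetical.

```python
# The three reserved keywords listed above.
RESERVED_KEYWORDS = ("NO IDEA", "FORBIDDEN", "SFW")


def system_line(keyword: str) -> str:
    """Return a 'System: <keyword>' line (assumed placement, see lead-in)."""
    if keyword not in RESERVED_KEYWORDS:
        raise ValueError(f"unknown reserved keyword: {keyword!r}")
    return f"System: {keyword}"


print(system_line("NO IDEA"))  # System: NO IDEA
```

Validating against the fixed tuple guards against typos such as `NOIDEA`, which the model would presumably treat as ordinary text.
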
6. ### Continuation of responses for ongoing topics:

The Guanaco model can now continue answering questions or discussing topics at the user's request, making it better suited to extended conversations.

The contextual structure of System, Assistant, and User roles allows the model to engage in multi-turn dialogues, maintain context-aware conversations, and provide more coherent responses.

The model can now accommodate role specification and character settings, providing a more immersive and tailored conversational experience based on the user's preferences.

It is important to remember that Guanaco is a 7B-parameter model, and **any knowledge-based content should be considered potentially inaccurate**. We strongly recommend **providing verifiable sources in the System Prompt, such as Wikipedia, for knowledge-based answers**. In the absence of sources, it is crucial to inform users of this limitation to prevent the dissemination of false information and to maintain transparency.

Because the format used by this project differs from that of [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca), please refer to *Guanaco-lora: LoRA for training Multilingual Instruction-following LM based on LLaMA* (https://github.com/KohakuBlueleaf/guanaco-lora) for further training of, and inference with, our models.

## Recent News

We have noticed a recent entrant in the field, the QLoRA method, which we find concerning because it attempts to piggyback on the reputation of Guanaco. We strongly disapprove of such practices. QLoRA, as far as we can tell, lacks mathematical robustness, and its performance significantly trails that of GPTQ and of advancements such as PEFT fine-tuning, which have successfully improved upon it.

Guanaco has consistently released multilingual datasets since March 2023 and has published weights that are not only an enhanced version of GPTQ but also support multimodal VQA and are optimized for 4-bit. Despite investing tens of thousands of dollars in distilling data from OpenAI's GPT models, we still consider these efforts incremental.

We, however, aim to move beyond the incremental:

1. We strive to no longer rely on distillation data from OpenAI: We have found that relying on GPT-generated data impedes significant breakthroughs. Furthermore, this approach has proven disastrous when dealing with the imbalances in multilingual tasks.

2. We are focusing on enhancing the quantization structure and on partial native 4-bit fine-tuning: We are deeply appreciative of the GPTQ-Llama project for paving the way in state-of-the-art LLM quantization. Its unique qualities, especially at the 7B size, are facilitating significant progress in multilingual and multimodal tasks.

3. We plan to use visual data to adjust our language models: We believe this will fundamentally address the issues of language imbalance, translation inaccuracies, and the lack of graphical logic in LLMs.

While our work is still in its early stages, we are determined to break new ground in these areas. Our critique of QLoRA's practices stems not from animosity but from the fundamental belief that innovation should be rooted in originality, integrity, and substantial progress.
BIN
StupidBanner.png
Normal file
Binary file not shown.
Size: 284 KiB
24
config.json
Normal file
@@ -0,0 +1,24 @@
{
  "_name_or_path": "Guanaco",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 2048,
  "max_sequence_length": 2048,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.28.0.dev0",
  "use_cache": true,
  "vocab_size": 32000
}
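As a rough cross-check of the "7B" figure, the parameter count can be derived from the fields in this config.json. The sketch below assumes the standard LLaMA layout (four bias-free attention projections and three bias-free MLP projections per layer, two RMSNorm weights per layer, and one final norm); the function name `count_parameters` is hypothetical.

```python
# Values copied from the config.json above.
config = {
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "vocab_size": 32000,
    "tie_word_embeddings": False,
}


def count_parameters(cfg: dict) -> int:
    """Count weights, assuming the standard bias-free LLaMA layer layout."""
    h, i = cfg["hidden_size"], cfg["intermediate_size"]
    attention = 4 * h * h  # q_proj, k_proj, v_proj, o_proj
    mlp = 3 * h * i        # gate_proj, up_proj, down_proj
    norms = 2 * h          # input_layernorm, post_attention_layernorm
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


total = count_parameters(config)
print(total)  # 6738415616

# Rotary inv_freq buffers: one per layer, head_dim / 2 entries, assumed 4 bytes each.
head_dim = config["hidden_size"] // config["num_attention_heads"]
rotary_bytes = config["num_hidden_layers"] * (head_dim // 2) * 4
fp32_bytes = total * 4 + rotary_bytes
print(fp32_bytes)  # 26953670656
```

The second figure equals the `total_size` recorded in pytorch_model.bin.index.json, which is consistent with the shards storing 4 bytes per weight even though `torch_dtype` is `float16`. At 2 bytes per weight, the same count comes to roughly 12.6 GiB for the weights alone.
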
7
generation_config.json
Normal file
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0,
  "transformers_version": "4.28.0.dev0"
}
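The `bos_token_id` and `eos_token_id` values in generation_config.json differ from those in config.json. Which set is correct depends on the tokenizer, which is not part of the files shown here, so the sketch below only surfaces the discrepancy; the helper name `token_id_mismatches` is hypothetical.

```python
# Token ids copied from the two files in this commit.
model_config = {"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0}
generation_config = {"bos_token_id": 0, "eos_token_id": 1, "pad_token_id": 0}


def token_id_mismatches(a: dict, b: dict) -> dict:
    """Map each special-token key to (a_value, b_value) where the two differ."""
    keys = ("bos_token_id", "eos_token_id", "pad_token_id")
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


print(token_id_mismatches(model_config, generation_config))
# {'bos_token_id': (1, 0), 'eos_token_id': (2, 1)}
```

Passing `eos_token_id` explicitly at generation time is one way to avoid depending on whichever file takes precedence.
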
3
pytorch_model-00001-of-00007.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:21595bcba4c5b35a4baf700a5dbec13ae29ed3d41b9873a2dd7997e19f7b32b2
size 3963764647
3
pytorch_model-00002-of-00007.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0e6adc6b32cc4457fe5e1301c66e62a8a0a3a57b4a564384ca19074b77c77c01
size 3980576641
3
pytorch_model-00003-of-00007.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ce03795a25bb784cd2cdfb7eb80896f6584ed1d46e680a5ea16701b6e1bcb1f4
size 3980576705
3
pytorch_model-00004-of-00007.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:21bd5be7d19272a81bf551827ede4cf93f020bbc143497682d0cc22efcd14b6a
size 3980576705
3
pytorch_model-00005-of-00007.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2fc4425a325ff8faeecc2189eec9ed956b1991a15e96272cc15d67299f0dab22
size 3867297053
3
pytorch_model-00006-of-00007.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:898a82558e3fe2bba35b7ce4b5f53641e65aaa06cd7850fe5b52cc17d63d1b15
size 3867330497
3
pytorch_model-00007-of-00007.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e8479bb92f721d2df3706f9f25088081ac31433b5e511e5792d4400c4b98ab8
size 3313660823
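Each shard above is committed as a three-line Git LFS pointer (`version`, `oid`, `size`) rather than the binary itself. The sketch below parses one pointer and shows how a downloaded file could be checked against it. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the last shard above.

```python
import hashlib

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8e8479bb92f721d2df3706f9f25088081ac31433b5e511e5792d4400c4b98ab8
size 3313660823
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and hash against a parsed pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte shards do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])  # sha256 3313660823
```

A call such as `matches_pointer("pytorch_model-00007-of-00007.bin", pointer)` would then confirm that a downloaded shard is complete and uncorrupted.
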
330
pytorch_model.bin.index.json
Normal file
@@ -0,0 +1,330 @@
{
  "metadata": {
    "total_size": 26953670656
  },
  "weight_map": {
    "lm_head.weight": "pytorch_model-00007-of-00007.bin",
    "model.embed_tokens.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.10.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.10.mlp.down_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.10.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00007.bin",
    "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.11.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.11.mlp.down_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.11.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00007.bin",
    "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.12.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.12.mlp.down_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.12.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00007.bin",
    "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.13.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.13.mlp.down_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.13.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00007.bin",
    "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.14.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.14.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.14.mlp.up_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00007.bin",
    "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.15.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.15.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.15.mlp.up_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00007.bin",
    "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.16.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.16.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.16.mlp.up_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00007.bin",
    "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.17.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.17.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.17.mlp.up_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00007.bin",
    "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.18.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.18.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.18.mlp.up_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00007.bin",
    "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.19.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.19.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.19.mlp.up_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00007.bin",
    "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
"model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
|
||||
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00007.bin",
|
||||
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00007.bin",
|
||||
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00007.bin",
|
||||
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
|
||||
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
|
||||
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00007.bin",
|
||||
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
|
||||
"model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00007.bin",
|
||||
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
|
||||
"model.layers.20.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.21.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.22.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.23.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
|
||||
"model.layers.24.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.25.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.26.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00007.bin",
|
||||
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00007.bin",
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00007.bin",
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00007.bin",
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00007.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00007.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00007.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00007.bin",
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
"model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00007.bin",
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00007.bin",
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00007.bin",
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00007-of-00007.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
"model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00007.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00007.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00007.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00007.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00007.bin",
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00003-of-00007.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00007.bin",
"model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00007.bin",
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
"model.norm.weight": "pytorch_model-00007-of-00007.bin"
}
}
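The shard index that ends above maps each tensor name to the `.bin` shard that stores it, and a layer's tensors can be split across two shards (layer 28 above sits partly in shard 6 and partly in shard 7). As a minimal sketch of how such a map might be inspected, the snippet below groups tensor names by shard. The helper name `tensors_by_shard` and the `sample_index` excerpt are illustrative assumptions; the top-level `weight_map` key is also assumed, since the opening of the index file is not visible in this part of the diff.

```python
from collections import defaultdict


def tensors_by_shard(index: dict) -> dict:
    """Group tensor names by the shard file that stores them.

    `index` is assumed to be the parsed index JSON with a top-level
    "weight_map" object (tensor name -> shard file name).
    """
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


# Hypothetical excerpt: three entries copied from the listing above.
# A real index would be read from disk with json.load instead.
sample_index = {
    "weight_map": {
        "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
        "model.layers.28.mlp.up_proj.weight": "pytorch_model-00007-of-00007.bin",
        "model.norm.weight": "pytorch_model-00007-of-00007.bin",
    }
}

grouped = tensors_by_shard(sample_index)
for shard_file in sorted(grouped):
    print(shard_file, len(grouped[shard_file]))
```

Grouping by shard rather than by layer makes it easy to see which files must be present before a given tensor can be read.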
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
93385
tokenizer.json
Normal file
File diff suppressed because it is too large
3
tokenizer.model
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
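The three lines committed for `tokenizer.model` are a Git LFS pointer, not the tokenizer itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading those fields follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its space-separated key/value lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer text copied from the tokenizer.model entry above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
"""

info = parse_lfs_pointer(pointer_text)
hash_algo, _, digest = info["oid"].partition(":")
size_bytes = int(info["size"])
print(hash_algo, digest[:8], size_bytes)
```

After the real file has been fetched, its size and SHA-256 digest could be compared against these values to confirm the download is complete.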
33
tokenizer_config.json
Normal file
@@ -0,0 +1,33 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "model_max_length": 2048,
  "pad_token": null,
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}