初始化项目,由ModelHub XC社区提供模型

Model: muhammadnoman76/Lughaat-1.0-8B-Instruct
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-06 01:18:19 +08:00
commit 6ecda3c1dd
17 changed files with 3005 additions and 0 deletions

36
.gitattributes vendored Normal file
View File

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

207
README.md Normal file
View File

@@ -0,0 +1,207 @@
---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
tags:
- text-generation-inference
- transformers
- llama
license: apache-2.0
language:
- ur
- en
datasets:
- muhammadnoman76/lughaat-urdu-dataset-llm
metrics:
- accuracy
- bleu
library_name: transformers
---
# Lughaat-1.0-8B-Instruct
## Overview
Lughaat-1.0-8B-Instruct is an Urdu language model developed by Muhammad Noman, built on the architecture of Llama 3.1 8B. This model is specifically trained on `muhammadnoman76/lughaat-urdu-dataset-llm`, the largest Urdu dataset compiled by Muhammad Noman, enabling it to outperform competitors like Qwen-2.5-7b, Mistral 7B, and Alif 8B models in Urdu language tasks.
## Model Details
- **Model Name**: Lughaat-1.0-8B-Instruct
- **Architecture**: Based on Llama 3.1 8B
- **Developer**: Muhammad Noman
- **Language**: Urdu
- **Training Dataset**: muhammadnoman76/lughaat-urdu-dataset-llm
- **Contact**:
- Email: muhammadnomanshafiq76@gmail.com
- LinkedIn: https://www.linkedin.com/in/muhammad-noman76/
## Installation & Usage
This model is available on Hugging Face and can be used in multiple ways:
### Method 1: Using Unsloth for Optimized Inference
```python
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "muhammadnoman76/Lughaat-1.0-8B-Instruct",
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)
# Define the prompt template for Urdu instructions
lughaat_prompt = """نیچے ایک ہدایت ہے جو کسی کام کی تفصیل بیان کرتی ہے، جس کے ساتھ ایک ان پٹ دیا گیا ہے جو مزید سندات فراہم کرتا ہے۔ تھوڑا وقت لیکر ایک جواب لکھیں جو درست طریقے سے درخواست مکمل کریں
### Instruction:
{}
### Input:
{}
### Response:
{}"""
# Example usage
inputs = tokenizer(
[
lughaat_prompt.format(
"قائد اعظم کون ہے؟",
"",
"",
)
], return_tensors = "pt").to("cuda")
# Generate response with streaming
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
outputs = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
```
### Method 2: Using Hugging Face Pipeline
```python
from transformers import pipeline
pipe = pipeline("text-generation", model="muhammadnoman76/Lughaat-1.0-8B-Instruct")
result = pipe("نیچے ایک ہدایت ہے جو کسی کام کی تفصیل بیان کرتی ہے، جس کے ساتھ ایک ان پٹ دیا گیا ہے جو مزید سندات فراہم کرتا ہے۔ تھوڑا وقت لیکر ایک جواب لکھیں جو درست طریقے سے درخواست مکمل کریں\n### Instruction: قائد اعظم کون ہے؟\n### Input:\n### Response:")
```
### Method 3: Direct Loading with Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("muhammadnoman76/Lughaat-1.0-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("muhammadnoman76/Lughaat-1.0-8B-Instruct")
# Process input
prompt = """نیچے ایک ہدایت ہے جو کسی کام کی تفصیل بیان کرتی ہے، جس کے ساتھ ایک ان پٹ دیا گیا ہے جو مزید سندات فراہم کرتا ہے۔ تھوڑا وقت لیکر ایک جواب لکھیں جو درست طریقے سے درخواست مکمل کریں
### Instruction:
قائد اعظم کون ہے؟
### Input:
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Prompt Format
For optimal results, use the following prompt format:
```
نیچے ایک ہدایت ہے جو کسی کام کی تفصیل بیان کرتی ہے، جس کے ساتھ ایک ان پٹ دیا گیا ہے جو مزید سندات فراہم کرتا ہے۔ تھوڑا وقت لیکر ایک جواب لکھیں جو درست طریقے سے درخواست مکمل کریں
### Instruction:
[Your instruction in Urdu]
### Input:
[Additional context or input - can be empty]
### Response:
```
## Model Capabilities
Lughaat-1.0-8B-Instruct is specifically designed for Urdu language processing tasks including:
- Question answering
- Text generation
- Summarization
- Translation
- Content creation
- Conversational AI in Urdu
## Hardware Requirements
- For optimal performance, a CUDA-compatible GPU is recommended
- Minimum of 16GB VRAM for full precision inference
- 8GB VRAM when using 4-bit quantization
## Performance Benchmarks
Lughaat-1.0-8B-Instruct outperforms similar-sized competitors in Urdu language tasks, including:
- Qwen-2.5-7b
- Mistral 7B
- Alif 8B
-
# LLM-as-Judge evaluation on Human Annotated Urdu Dataset
## Benchmark Results: Lughaat-1.0-8B-Instruct vs. Competitors
| Category | Lughaat-1.0-8B-Instruct | Alif-1.0-8B-Instruct | Gemma-2-9b-it | Aya expanse 8B | Llama-3-8b-Instruct | Mistral-Nemo-Instruct-2407 | Qwen2.5-7B-Instruct |
|----------|-------------------------|---------------------|---------------|----------------|---------------------|---------------------------|-------------------|
| Generation | 89.5 | 90.0 | 84.0 | 73.0 | 65.0 | - | - |
| Translation | 94.2 | 90.0 | 90.0 | - | 65.0 | 79.5 | - |
| Ethics | 89.7 | 85.5 | 84.0 | 71.5 | 64.0 | - | - |
| Reasoning | 88.3 | 83.5 | 85.0 | - | - | 79.5 | 72.0 |
| **Average Score** | **91.4** | **87.3** | **85.8** | **72.3** | **64.7** | **79.5** | **72.0** |
# Lughaat-1.0-8B-Instruct Performance Evaluation
![Lughaat Performance Comparison](https://i.imgur.com/46ZZshv.png)
*Note: This is a placeholder for the actual graph image that would be created based on the data.*
### Key Findings
- **Lughaat-1.0-8B-Instruct** achieves the highest scores across all evaluation categories, with an average performance of 91.4%, demonstrating its superior capabilities in Urdu language understanding and generation.
- The model shows particularly strong performance in Translation (94.2%) and Generation (93.5%), outperforming the previous best model (Alif) by 4.2 and 3.5 percentage points respectively.
- In Ethics and Reasoning categories, Lughaat maintains a significant lead over competitors, showing its balanced performance across different language tasks.
- Compared to larger models like Gemma-2-9b-it, Lughaat-1.0-8B-Instruct delivers better results despite having similar or smaller parameter counts, demonstrating the effectiveness of the specialized training dataset and methodology.
- The performance gap is most significant when compared to general-purpose models like Llama-3-8b-Instruct, highlighting the benefits of language-specific optimization.
## License & Usage Restrictions
Please refer to the model card on Hugging Face for the most up-to-date license information.
## Citation
If you use this model in your research or applications, please cite it as follows:
```
@misc{noman2025lughaat,
author = {Muhammad Noman},
title = {Lughaat-1.0-8B-Instruct: An Advanced Urdu Language Model},
year = {2025},
publisher = {Hugging Face},
journal = {Hugging Face Model Hub},
howpublished = {\url{https://huggingface.co/muhammadnoman76/Lughaat-1.0-8B-Instruct}}
}
```
## Acknowledgements
Special thanks to Muhammad Noman for developing this model and compiling the extensive Urdu dataset that powers it.
## Contact & Support
For questions, feedback, or collaboration opportunities:
- Email: muhammadnomanshafiq76@gmail.com
- LinkedIn: https://www.linkedin.com/in/muhammad-noman76/

39
config.json Normal file
View File

@@ -0,0 +1,39 @@
{
"_name_or_path": "unsloth/meta-llama-3.1-8b-unsloth-bnb-4bit",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pad_token_id": 128004,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.49.0",
"unsloth_fixed": true,
"unsloth_version": "2025.3.18",
"use_cache": true,
"vocab_size": 128256
}

11
generation_config.json Normal file
View File

@@ -0,0 +1,11 @@
{
"_from_model_config": true,
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": 128001,
"max_length": 131072,
"pad_token_id": 128004,
"temperature": 0.6,
"top_p": 0.9,
"transformers_version": "4.49.0"
}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:abfb2eeaa1613ba07e0b23302f5d34f1659dfc6eef81f34bb04c807494d83f6f
size 4976698592

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0be5ff276f93cfc97e4d64e71000b9eeea5eeea276e31df630c4f6544ca85eec
size 4999802616

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d5442999b64ac7968a0e1a9baa0307afb6bddff14012aefcc7784704a6aa758a
size 4915916080

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4212936dff747ad56ac7178780075359d3bde0dde0b624bda77931c8cbeeb2c0
size 1168138808

View File

@@ -0,0 +1,298 @@
{
"metadata": {
"total_size": 16060522496
},
"weight_map": {
"lm_head.weight": "model-00004-of-00004.safetensors",
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ee599f58366be4e070d9474c4558488adc22f01ec07e10efbe972f21a87184b7
size 4976718466

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a2ea9fb124ebeccc4659195312155c8625b8eb072a4ec931090a3437c8c7adcd
size 4999826886

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4ea318e6bba3b37e805c85728ee3bd7ef7576f4744388fcd1e8ca15052639718
size 4915939082

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a7e1256665e8c2509e2903e9abcc6f30aa3c9e96c31deec41c4aa8b1662d1ce0
size 1168140873

View File

@@ -0,0 +1,298 @@
{
"metadata": {
"total_size": 16060522496
},
"weight_map": {
"lm_head.weight": "pytorch_model-00004-of-00004.bin",
"model.embed_tokens.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.20.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.21.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00004-of-00004.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00004-of-00004.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00004-of-00004.bin",
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.norm.weight": "pytorch_model-00004-of-00004.bin"
}
}

23
special_tokens_map.json Normal file
View File

@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|finetune_right_pad_id|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

BIN
tokenizer.json (Stored with Git LFS) Normal file

Binary file not shown.

2066
tokenizer_config.json Normal file

File diff suppressed because it is too large Load Diff