Initialize project; model provided by the ModelHub XC community
Model: SicariusSicariiStuff/Hebrew_Nemo Source: Original Platform
62
.gitattributes
vendored
Normal file
@@ -0,0 +1,62 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text

*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

model-00005-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00002-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00008-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00009-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00004-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
model-00007-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00006-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00013-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00010-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00012-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00011-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00001-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
BIN
Images/Hebrew_Nemo.png
Normal file
Binary file not shown.
Size: 21 KiB
297
README.md
Normal file
@@ -0,0 +1,297 @@
---
language:
- he
- en
license: apache-2.0
tags:
- mistral
- nemo
- hebrew
- llm
- text-generation
- instruction-tuned
- chat
pipeline_tag: text-generation
base_model: mistralai/Mistral-Nemo-Base-2407
library_name: transformers
widget:
  - text: "Hebrew_Nemo"
    output:
      url: https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo/resolve/main/Images/Hebrew_Nemo.png
---
|
||||
# Hebrew_Nemo: State-of-the-Art Hebrew Language Model

---

<div align="center">
<b style="font-size: 50px;">Hebrew_Nemo</b>
</div>

<div align="center">
<b style="font-size: 80px;">12B</b>
</div>

---

<div align="center" style="font-size: 18px; margin-top: 20px;">
<b>Developed by:</b> <a href="https://huggingface.co/SicariusSicariiStuff">SicariusSicariiStuff</a>
</div>

---

**Hebrew_Nemo** is a state-of-the-art (SOTA) **Hebrew large language model**, optimized for Hebrew language understanding and generation. Built on the Mistral Nemo architecture, it is a significant advance in Hebrew NLP, combining the robust multilingual foundations of Mistral Nemo with extensive Hebrew-specific fine-tuning and optimization.

As part of [SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)'s efforts to truly democratize AI, [Hebrew_Nemo](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo) is released under the permissive **Apache 2.0** license. The model is competitive with **Gemma3-27B**, one of the world’s leading open-source multilingual models, even though Gemma3-27B is **more than twice its size**. This result highlights Hebrew_Nemo’s efficiency and makes SOTA capabilities widely available to consumers and corporations alike.

Unfortunately, Gemma-3-27b-it does not benchmark well, but I still believe it is by far the best multilingual model:

| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
|-------|---------|----------|----------|------------------|----------------|------------|
| google/gemma-3-27b-pt | 69.5 | 85.24 | 78.27 | 36.45 | 70.43 | 27 |
| google/gemma-3-27b-it | 13.41 | 0 | 80.31 | 0.17 | 0 | 27 |
|
||||
---

# Benchmarks

---

**Hebrew_Nemo** demonstrates SOTA performance for its size, with particularly **outstanding results in Hebrew translation**. At only **12B parameters**, it achieves a **BLEU score of 30.83**, outperforming larger models such as DeepSeek-R1-Distill-Qwen-14B and AI21 Jamba-1.5-Mini (52B), a model more than 4x its size.

The model maintains **high competence across reasoning and QA**, with **SNLI accuracy of 79.76** and **HeQ score of 70.51**, indicating solid sentence-level understanding and contextual reasoning in Hebrew. Its **Israeli Trivia score (50.83)** demonstrates exceptional knowledge for its size: it comes close to AI21 Jamba-1.5-Mini (57.81), a model more than 4x its size, while vastly outperforming models of similar or slightly larger size.

| Model                                    |   Average |  SNLI Acc |  QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
| ---------------------------------------- | --------: | --------: | --------: | ---------------: | -------------: | ---------: |
| **Hebrew_Nemo**                          | **57.98** |     79.76 |     70.51 |        **30.83** |          50.83 |         12 |
| ai21labs/AI21-Jamba-1.5-Mini             |     54.68 |     69.52 |     69.38 |            22.00 |      **57.81** |         52 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B |     53.19 | **85.48** |     71.38 |            22.99 |          32.89 |         14 |
| SicariusSicariiStuff/Zion_Alpha          |     53.55 |     84.05 |     67.67 |            27.93 |          34.55 |          7 |
| Qwen/Qwen3-8B                            |     53.54 |     80.00 | **78.53** |            25.73 |          29.90 |          8 |
| Mistral-Nemo-Base-2407                   |     51.24 |     65.95 |     68.48 |            28.99 |          41.53 |         12 |
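The Average column in the table above appears to be the unweighted mean of the four task scores. The source does not state this, so the sketch below treats it as an assumption and recomputes the mean from the rows as listed; `rows` and `mean_score` are illustrative names.

```python
# Scores copied from the benchmark table: (SNLI, HeQ, BLEU, Trivia, listed average).
rows = {
    "Hebrew_Nemo": (79.76, 70.51, 30.83, 50.83, 57.98),
    "ai21labs/AI21-Jamba-1.5-Mini": (69.52, 69.38, 22.00, 57.81, 54.68),
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B": (85.48, 71.38, 22.99, 32.89, 53.19),
    "SicariusSicariiStuff/Zion_Alpha": (84.05, 67.67, 27.93, 34.55, 53.55),
    "Qwen/Qwen3-8B": (80.00, 78.53, 25.73, 29.90, 53.54),
    "Mistral-Nemo-Base-2407": (65.95, 68.48, 28.99, 41.53, 51.24),
}


def mean_score(scores):
    """Unweighted mean of the task scores (assumed definition of 'Average')."""
    return sum(scores) / len(scores)


for name, (*tasks, listed) in rows.items():
    computed = mean_score(tasks)
    print(f"{name}: computed {computed:.2f}, listed {listed:.2f}")
```

Under that assumption, every row in this table agrees with its listed average to within rounding.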
|
||||
---

**Hebrew_Nemo** also **vastly improves** upon the original Mistral Nemo base model, adding large amounts of new knowledge while refining existing capabilities:

| Metric               | Hebrew_Nemo | Mistral-Nemo-Base | Relative Improvement |
| :------------------- | ----------: | ----------------: | -------------------: |
| **Average**          |   **57.98** |             51.24 |           **+13.2%** |
| **SNLI Accuracy**    |   **79.76** |             65.95 |           **+20.9%** |
| **QA (HeQ)**         |   **70.51** |             68.48 |            **+3.0%** |
| **Translation BLEU** |   **30.83** |             28.99 |            **+6.3%** |
| **Israeli Trivia**   |   **50.83** |             41.53 |           **+22.4%** |
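The percentages in the table above are relative gains over the base model, not differences in points. A minimal sketch of that arithmetic, using the scores as listed (`scores` and `relative_improvement` are illustrative names):

```python
# Scores copied from the comparison table: (Hebrew_Nemo, Mistral-Nemo-Base).
scores = {
    "Average": (57.98, 51.24),
    "SNLI Accuracy": (79.76, 65.95),
    "QA (HeQ)": (70.51, 68.48),
    "Translation BLEU": (30.83, 28.99),
    "Israeli Trivia": (50.83, 41.53),
}


def relative_improvement(new, base):
    """Relative change of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


improvements = {m: relative_improvement(n, b) for m, (n, b) in scores.items()}
for metric, pct in improvements.items():
    print(f"{metric}: +{pct:.1f}%")
```

For example, the SNLI gain is (79.76 - 65.95) / 65.95, which is about 20.9%, while the absolute gain is 13.81 points.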
|
||||
----

### Technical Overview

- **Model Type:** Causal Language Model (Decoder-only Transformer)
- **Base Architecture:** Mistral Nemo
- **Language Focus:** Hebrew (עברית) with maintained multilingual capabilities
- **License:** Apache 2.0
- **Parameters:** 12B
- **Context Length:** 128K tokens
- **Layers:** 40
- **Dim:** 5,120
- **Head dim:** 128
- **Hidden dim:** 14,336
- **Activation Function:** SwiGLU
- **Number of heads:** 32
- **Number of kv-heads:** 8 (GQA)
- **Vocabulary size:** `2**17` ~= 128k
- **Rotary embeddings (theta = 1M)**
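The 12B figure can be roughly reproduced from the dimensions listed above. The sketch below assumes a standard Mistral-style layout: no bias terms, two RMSNorm vectors per layer plus a final norm, a three-matrix SwiGLU MLP, and untied input and output embeddings (which matches `tie_word_embeddings: false` in `config.json`). `mistral_param_count` is an illustrative helper, not part of any library.

```python
def mistral_param_count(layers, dim, head_dim, hidden_dim, n_heads, n_kv_heads, vocab):
    """Estimate the parameter count of a Mistral-style decoder (assumed layout)."""
    # Attention: q and o projections span all heads; k and v span only the KV heads (GQA).
    q_and_o = 2 * dim * n_heads * head_dim
    k_and_v = 2 * dim * n_kv_heads * head_dim
    # SwiGLU MLP: gate, up, and down projections.
    mlp = 3 * dim * hidden_dim
    # Two RMSNorm weight vectors per layer.
    norms = 2 * dim
    per_layer = q_and_o + k_and_v + mlp + norms
    # Untied input embedding and output head, plus the final norm.
    embeddings = 2 * vocab * dim
    return layers * per_layer + embeddings + dim


total = mistral_param_count(
    layers=40, dim=5120, head_dim=128, hidden_dim=14336,
    n_heads=32, n_kv_heads=8, vocab=2**17,
)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

With `vocab=131074`, the `vocab_size` in `config.json`, the same function gives 12,247,802,880, which equals `total_parameters` in `model.safetensors.index.json`.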
|
||||
### Primary Use Cases

- **Hebrew Text Generation:** High-quality content creation in modern Hebrew
- **Translation:** Bidirectional translation between Hebrew and other languages
- **Question Answering:** Advanced reasoning and comprehension in Hebrew contexts
- **Dialogue Systems:** Conversational AI applications for Hebrew speakers
- **Text Classification:** Sentiment analysis, topic modeling, and categorization of Hebrew content
- **Named Entity Recognition:** Extraction of entities from Hebrew text
- **Summarization:** Concise summaries of Hebrew documents and articles

### Out-of-Scope Uses

- Real-time critical decision-making systems (medical, legal, financial) without human oversight
- Generation of content intended to deceive or manipulate
- Applications requiring 100% factual accuracy without verification
|
||||
|
||||
## Training Data and Training Methodology

Hebrew_Nemo was trained on a diverse corpus including:

| Source Type | Description | Language Coverage |
|--------------|--------------|------------------|
| Hebrew Wikipedia | Encyclopedia-style text | 100% Hebrew |
| Hebrew Literature & Proverbs | Classic and modern | 100% Hebrew |
| Hebrew-English Code-Mix | Social media & dialogue | 70% Hebrew / 30% English |
| Synthetic Data | Instruction-following & reasoning | Mixed |

Data was filtered, normalized, and token-balanced to reduce bias and improve generalization across dialects.

Additional training data:

- Modern Hebrew web text and news articles
- Hebrew literature and academic publications
- Biblical and Rabbinic Hebrew texts for cultural depth
- Hebrew social media and conversational data
- Technical documentation in Hebrew
- Parallel corpora for translation capabilities

---

**The training process involved:**

1. Continued pre-training on Hebrew-rich datasets
2. Instruction fine-tuning on Hebrew task-specific data
3. Alignment through RLHF/DPO for Hebrew linguistic preferences
|
||||
---

## 🚀 Key Features

- **Native Hebrew Understanding:** Trained on millions of high-quality Hebrew documents spanning literature, news, Wikipedia, academic, and colloquial domains.
- **Contextual Mastery:** Handles complex anaphora, idiomatic expressions, and mixed Hebrew-English text with high fidelity.
- **Instruction-Tuned:** Aligned for chat, Q&A, summarization, and reasoning use cases.
- **Cultural Awareness:** Sensitive to Hebrew cultural, religious, and social nuances.
- **Optimized Inference:** Enhanced performance with Mistral’s memory-efficient attention and dynamic context window.

---

# Out of scope usage

* Generating disinformation or biased political content
* Automated decision-making without human oversight

---

## ⚙️ Limitations

* May reflect **training corpus biases** (e.g., urban dialect prevalence, widespread opinions in Israeli social media)
* Limited performance on **rare biblical or archaic Hebrew**
* Occasionally mixes Hebrew and English when the context is ambiguous
* Does not include alignment for safety moderation out of the box
|
||||
---

# Model instruction template: ChatML

```
<|im_start|>system
You answer the questions in Hebrew.<|im_end|>
<|im_start|>User
{prompt}<|im_end|>
<|im_start|>AI answer
```
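To build a prompt by hand in the format shown above, a plain string template is enough. This is a minimal sketch: `build_prompt` is a hypothetical helper, and the trailing newline after `AI answer` is an assumption, since the template above ends at that line.

```python
def build_prompt(user_prompt, system="You answer the questions in Hebrew."):
    """Fill the ChatML-style template with a system message and a user message."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>User\n{user_prompt}<|im_end|>\n"
        # Assumption: the model continues on a new line after the role label.
        f"<|im_start|>AI answer\n"
    )


print(build_prompt("מהי בינה מלאכותית?"))
```

Note that the repository's `chat_template.jinja` emits whatever role names are passed in `messages` and uses `assistant` for the generation prompt, so `tokenizer.apply_chat_template` will likely produce different role labels than this hand-built string.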
|
||||
---

## 🗣️ Example Usage

### Basic Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SicariusSicariiStuff/Hebrew_Nemo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto"     # place layers on the available devices
)

prompt = "מהי בינה מלאכותית?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
||||
---

### Chat Format

```python
# Reuses `tokenizer` and `model` from the Basic Inference example
messages = [
    {"role": "user", "content": "ספר לי על ההיסטוריה של ירושלים"}
]

# Render the conversation with the tokenizer's chat template
formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
||||
### Quantization (for lower VRAM)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_name = "SicariusSicariiStuff/Hebrew_Nemo"

# Store weights in 4-bit precision and run computations in bfloat16
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
```
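As a rough guide to what quantization saves, the weight footprint can be estimated from the parameter count, and the per-token KV cache from the GQA settings. This is a back-of-the-envelope sketch, not a measurement: it ignores activations, quantization constants, and any layers that bitsandbytes leaves unquantized, so real usage will typically be higher. The helper names are illustrative.

```python
GIB = 1024 ** 3


def weight_bytes(n_params, bits_per_param):
    """Bytes needed to store the weights alone at the given precision."""
    return n_params * bits_per_param // 8


def kv_cache_bytes_per_token(layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of key and value cache added for each token in the context."""
    return 2 * layers * n_kv_heads * head_dim * bytes_per_value


n_params = 12_247_802_880  # total_parameters from model.safetensors.index.json
bf16 = weight_bytes(n_params, 16)
int4 = weight_bytes(n_params, 4)
kv = kv_cache_bytes_per_token(layers=40, n_kv_heads=8, head_dim=128)

print(f"bf16 weights:  {bf16 / GIB:.1f} GiB")
print(f"4-bit weights: {int4 / GIB:.1f} GiB (lower bound)")
print(f"KV cache:      {kv // 1024} KiB per token in bf16")
```

The bf16 figure matches `total_size` in `model.safetensors.index.json` (24,495,605,760 bytes). At the full 131,072-token context, the bf16 KV cache for a single sequence comes to 20 GiB, so long contexts can dominate memory even with 4-bit weights.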
|
||||
---

## Available quantizations:

- Original: [BF16](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo)
- GGUF: [Static Quants](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_GGUF)
- Specialized: [FP8](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_FP8)
- Mobile (ARM): [Q4_0](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_ARM)
|
||||
---

## Citation

```bibtex
@misc{hebrew_nemo_2025,
  author = {SicariusSicariiStuff},
  title = {Hebrew_Nemo: State-of-the-Art Hebrew Language Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo}
}
```
|
||||
|
||||
## 🧰 Acknowledgements

* [Mistral](https://mistral.ai/) for the base architecture
* [NVIDIA NeMo](https://developer.nvidia.com/nemo) framework inspiration
* Employee#11 for her unwavering support

## Contact

For questions, issues, or collaboration opportunities:

- **HuggingFace:** [@SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)
- **Issues:** Report technical issues on the model repository

### Model Card Authors

- [@SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)
|
||||
4
chat_template.jinja
Normal file
@@ -0,0 +1,4 @@
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
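For readers less familiar with Jinja, the template above can be read as the following plain-Python function. This is an illustrative equivalent under the assumption that each message is a dict with `role` and `content` keys; `render_chatml` is a hypothetical name, and the tokenizer itself uses the Jinja template, not this code.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Plain-Python reading of the Jinja chat template."""
    parts = []
    for message in messages:
        # Each turn: start marker, role, newline, content, end marker, newline.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([{"role": "user", "content": "שלום"}], add_generation_prompt=True))
```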
27
config.json
Normal file
@@ -0,0 +1,27 @@
{
  "_name_or_path": "SicariusSicariiStuff/Hebrew_Nemo",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 40,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.54.1",
  "use_cache": true,
  "vocab_size": 131074
}
1
configuration.json
Normal file
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
6
generation_config.json
Normal file
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.54.1"
}
3
model-00001-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cb658038cc5599df6d2b3403d656637b6fbdf2e61f1c13cb4f0d07d94871a299
size 1992337008
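Each `.safetensors` entry in this commit is a Git LFS pointer, a three-line text stub that records the spec version, the content hash, and the byte size of the real file. A minimal sketch of reading one, assuming the `key value` layout shown above (`parse_lfs_pointer` is a hypothetical helper, not part of Git LFS):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version URL, hash algorithm, digest, and size."""
    # Each non-empty line is "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:cb658038cc5599df6d2b3403d656637b6fbdf2e61f1c13cb4f0d07d94871a299
size 1992337008
"""
info = parse_lfs_pointer(pointer)
print(info["algo"], info["size"])
```

The `size` field gives the download size of the shard before fetching it, and the digest can be compared against a SHA-256 of the downloaded file to check integrity.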
3
model-00002-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f16d51b15df56700d028102f37d460a7887bf6562565f5abb57890104d7f1957
size 1929444632
3
model-00003-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:937c444970686623b966d180c22443a2d00da521cf0ddde88423c1f9e16d4118
size 1887522656
3
model-00004-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f5c47657561f637ce6e005cc6b6498abcaa6312354017d590f4cda00217cc630
size 1929444640
3
model-00005-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dcf038bc1d6aecebde417d377326ffdaaf233be9f269e37510036367b11a0d7c
size 1887522688
3
model-00006-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e6bc822f6ceee593bd69f0613a081823beb12849883a64fab3da4ed0034685cd
size 1929444656
3
model-00007-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5f39ab148e4f6377d2751d38929ad57a2253caaee5a8a31c854467d704f6878b
size 1887522688
3
model-00008-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:274f6d43b9db7521b5c9c0f670fc76f9c47a32f4f969fe093ca6be734f0e2a57
size 1929444656
3
model-00009-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:82f08164e2698208aebd333e2cf3af24be0c4eb4ef5cf3612bc445352a1e8e73
size 1887522688
3
model-00010-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e6e6ac48de614d70f67c73f49887500feff8033a803245fd7802de49f8ab979b
size 1929444656
3
model-00011-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb156c84da3010b5acac03516af0ebfe4e4d88bca5c6d1504ca1aadd5eb33aea
size 1887522688
3
model-00012-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:724fffe1221e149fa199af6de38eabd0d3bc4dfb82043ea07e84c120b5c96676
size 1929444656
3
model-00013-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0375981fb7cd0e55e584cbf1bd7fd523787965c140d38742dff22d39069f1e1a
size 1489029688
371
model.safetensors.index.json
Normal file
371
model.safetensors.index.json
Normal file
@@ -0,0 +1,371 @@
|
||||
{
|
||||
"metadata": {
|
||||
"total_parameters": 12247802880,
|
||||
"total_size": 24495605760
|
||||
},
|
||||
"weight_map": {
|
||||
"lm_head.weight": "model-00013-of-00013.safetensors",
|
||||
"model.embed_tokens.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.0.input_layernorm.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.1.input_layernorm.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.1.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.1.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.1.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.1.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.10.input_layernorm.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.10.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.10.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.10.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.10.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.10.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.11.input_layernorm.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.11.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.11.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.11.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.11.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.11.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
|
||||
"model.layers.12.input_layernorm.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.12.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.12.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.12.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.12.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.12.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.13.input_layernorm.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.13.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.13.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.13.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.13.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.13.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.14.input_layernorm.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.14.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.14.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.14.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.14.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.14.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.15.input_layernorm.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.15.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.15.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.15.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.15.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.15.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.16.input_layernorm.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.16.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.16.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.16.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.16.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.16.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.17.input_layernorm.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.17.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.17.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.17.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.17.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.17.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.18.input_layernorm.weight": "model-00007-of-00013.safetensors",
|
||||
"model.layers.18.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
|
||||
"model.layers.18.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.18.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.18.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00007-of-00013.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
|
||||
"model.layers.2.input_layernorm.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.2.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.2.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.2.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.2.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.2.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.21.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.22.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.23.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.24.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.25.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.26.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.27.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.28.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.29.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.30.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.31.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.32.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.33.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.34.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.35.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.36.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.37.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.38.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.39.input_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.4.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.6.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.7.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.8.input_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.9.input_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
"model.norm.weight": "model-00013-of-00013.safetensors"
}
}
30
special_tokens_map.json
Normal file
@@ -0,0 +1,30 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ea88f9940a84ab7e0100bc369506a28ec8d5d821691dc47d4dd63f1bbdf105ed
size 17078669
8035
tokenizer_config.json
Normal file
File diff suppressed because it is too large