Initialize project; model provided by the ModelHub XC community

Model: PrimeIntellect/INTELLECT-1
Source: Original Platform
ModelHub XC
2026-06-22 12:08:12 +08:00
commit 36de7bb6a7
20 changed files with 413261 additions and 0 deletions

45
.gitattributes vendored Normal file

@@ -0,0 +1,45 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-0000-of-0010.safetensors filter=lfs diff=lfs merge=lfs -text
model-0001-of-0010.safetensors filter=lfs diff=lfs merge=lfs -text
model-0002-of-0010.safetensors filter=lfs diff=lfs merge=lfs -text
model-0003-of-0010.safetensors filter=lfs diff=lfs merge=lfs -text
model-0004-of-0010.safetensors filter=lfs diff=lfs merge=lfs -text
model-0005-of-0010.safetensors filter=lfs diff=lfs merge=lfs -text
model-0006-of-0010.safetensors filter=lfs diff=lfs merge=lfs -text
model-0007-of-0010.safetensors filter=lfs diff=lfs merge=lfs -text
model-0008-of-0010.safetensors filter=lfs diff=lfs merge=lfs -text
model-0009-of-0010.safetensors filter=lfs diff=lfs merge=lfs -text

114
README.md Normal file

@@ -0,0 +1,114 @@
---
license: apache-2.0
datasets:
- PrimeIntellect/fineweb-edu
- PrimeIntellect/fineweb
- PrimeIntellect/StackV1-popular
- mlfoundations/dclm-baseline-1.0-parquet
- open-web-math/open-web-math
language:
- en
pipeline_tag: text-generation
---
# INTELLECT-1
## **Model Overview**
**INTELLECT-1** is the first collaboratively trained 10-billion-parameter language model, trained from scratch on 1 trillion tokens of English text and code.
![Intellect 1 training visual](intellect-1-map.png)
This is a base model. For chat use cases, please use [INTELLECT-1-Instruct](https://huggingface.co/PrimeIntellect/INTELLECT-1-Instruct).
**INTELLECT-1** was trained on up to 14 concurrent nodes distributed across 3 continents, with contributions from 30 independent community contributors providing compute.
The training code uses the [prime framework](https://github.com/PrimeIntellect-ai/prime), a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-performance training on unreliable, globally distributed workers.
The key abstraction that allows dynamic scaling is the `ElasticDeviceMesh`, which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node.
The model was trained using the [DiLoCo](https://arxiv.org/abs/2311.08105) algorithm with 100 inner steps. The global all-reduce used custom int8 all-reduce kernels to shrink the communication payload, reducing the communication overhead by a factor of 400.
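One plausible reading of the 400x figure is that synchronizing once every 100 inner steps saves a factor of 100, and sending int8 values instead of fp32 saves another factor of 4. The sketch below illustrates that arithmetic with a simple symmetric per-tensor int8 quantizer. The quantizer, the fp32 baseline, and the sample values are assumptions for illustration; this is not the custom kernel code from prime.

```python
# Illustrative only: a simple symmetric int8 quantizer, not the prime kernels.

def quantize_int8(values):
    """Map floats to integers in [-127, 127] plus one shared float scale."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    return [round(v / scale) for v in values], scale


def dequantize_int8(quantized, scale):
    """Recover approximate floats from the int8 codes."""
    return [q * scale for q in quantized]


INNER_STEPS = 100   # from the README: 100 inner steps per outer sync
BYTES_FP32 = 4      # assumption: the baseline all-reduce sends fp32
BYTES_INT8 = 1

pseudo_gradient = [0.5, -0.25, 0.125, -1.0]  # hypothetical values
quantized, scale = quantize_int8(pseudo_gradient)
restored = dequantize_int8(quantized, scale)

# Fewer syncs times smaller payload per value.
reduction = INNER_STEPS * BYTES_FP32 // BYTES_INT8
print("int8 codes:", quantized)
print("communication reduction factor:", reduction)
```

The rounding error per value is bounded by half the scale, which is the price paid for the 4x smaller payload in this sketch.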
For more detailed technical insights, please refer to our [technical paper](https://github.com/PrimeIntellect-ai/prime).
**Note: You must add a BOS token at the beginning of each sample. Performance may be impacted otherwise.**
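Tokenizers often prepend the BOS token on their own, but token ids that come from elsewhere (for example a pre-tokenized dataset) may lack it. A minimal sketch of a guard, assuming the BOS id is the `bos_token_id` value 128000 from `config.json` in this commit; the helper name `ensure_bos` and the sample ids are hypothetical:

```python
BOS_TOKEN_ID = 128000  # "bos_token_id" from config.json in this commit


def ensure_bos(token_ids, bos_id=BOS_TOKEN_ID):
    """Return a copy of token_ids with bos_id first, adding it only if missing."""
    if token_ids and token_ids[0] == bos_id:
        return list(token_ids)
    return [bos_id, *token_ids]


sample = [3923, 374]  # arbitrary placeholder ids, not a real tokenization
print(ensure_bos(sample))               # [128000, 3923, 374]
print(ensure_bos([128000, 3923, 374]))  # [128000, 3923, 374]
```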
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Create new tensors on the GPU by default (requires a CUDA device).
torch.set_default_device("cuda")
model = AutoModelForCausalLM.from_pretrained("PrimeIntellect/INTELLECT-1")
tokenizer = AutoTokenizer.from_pretrained("PrimeIntellect/INTELLECT-1")

input_text = "What is the Metamorphosis of Prime Intellect about?"
# Tokenize the prompt into a tensor of token ids.
input_ids = tokenizer.encode(input_text, return_tensors="pt")
# max_length counts prompt tokens plus generated tokens.
output_ids = model.generate(input_ids, max_length=50, num_return_sequences=1)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output_text)
```
### Example text generation pipeline
```python
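import torch
from transformers import pipeline

# Create new tensors on the GPU by default (requires a CUDA device).
torch.set_default_device("cuda")
# The pipeline loads both the tokenizer and the model from the hub id.
pipe = pipeline("text-generation", model="PrimeIntellect/INTELLECT-1")
print(pipe("What is prime intellect ?"))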
```
## **Model Details**
- **Compute Contributors**: Prime Intellect, Arcee AI, kotaro, skre_0, marlo, rodeo, Herb, Olas, superchillen, Hugging Face, mev_pete, 0xfr_, dj, primeprimeint1234, Marco Giglio, realtek, Hyperbolic, hecataeus, NWO, Virtual Machine, droll, SemiAnalysis, _waiting__, toptickcrypto, sto, Johannes, washout_segment_0b, klee
- **Release Date**: 29 Nov 2024
- **Model License**: Apache 2.0
## **Technical Specifications**
| **Parameter** | **Value** |
|----------------------|------------------------|
| Parameter Size | 10B |
| Number of Layers | 42 |
| Number of Attention Heads | 32 |
| Hidden Size | 4096 |
| Context Length | 8192 |
| Vocabulary Size | 128256 |
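The 10B figure can be cross-checked from the shapes. The sketch below is a back-of-the-envelope count that assumes the standard `LlamaForCausalLM` layout without biases; `intermediate_size`, `num_key_value_heads`, and `tie_word_embeddings` are taken from `config.json` in the same commit rather than from the table above.

```python
# Values from the table above and from config.json in the same commit.
hidden_size = 4096
num_layers = 42
num_heads = 32
num_kv_heads = 8            # config.json: num_key_value_heads
intermediate_size = 14336   # config.json: intermediate_size
vocab_size = 128256

head_dim = hidden_size // num_heads
kv_dim = num_kv_heads * head_dim  # grouped-query attention shrinks k and v

# q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
# gate_proj, up_proj and down_proj.
mlp = 3 * hidden_size * intermediate_size
# input_layernorm and post_attention_layernorm.
norms = 2 * hidden_size
per_layer = attention + mlp + norms

# embed_tokens plus a separate lm_head (tie_word_embeddings is false).
embeddings = 2 * vocab_size * hidden_size
total = num_layers * per_layer + embeddings + hidden_size  # + final norm

print(f"per layer: {per_layer:,}")  # per layer: 218,112,000
print(f"total:     {total:,}")      # total:     10,211,381,248
```

Under these assumptions the count comes to roughly 10.2 billion parameters.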
**Training Details**:
- **Dataset**: 55% fineweb-edu, 10% fineweb, 20% Stack V1, 10% dclm-baseline, 5% open-web-math
- **Tokens**: 1 Trillion
- **Optimizer**: DiLoCo/LocalSGD - Inner Optimizer: AdamW, Outer Optimizer: Nesterov SGD
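The inner/outer optimizer split can be sketched on a toy one-dimensional problem. This is a minimal illustration, not the prime training loop: plain SGD stands in for the AdamW inner optimizer, each "worker" minimizes a hypothetical quadratic `0.5 * (w - target) ** 2`, and every hyperparameter except the 100 inner steps is an illustrative assumption.

```python
def diloco_toy(targets, outer_rounds=50, inner_steps=100,
               inner_lr=0.01, outer_lr=0.7, momentum=0.9):
    """Toy DiLoCo-style loop: local inner steps, then one outer Nesterov step."""
    w_global, velocity = 0.0, 0.0
    for _ in range(outer_rounds):
        deltas = []
        for target in targets:              # one iteration per worker
            w = w_global                    # every worker starts from the global weights
            for _ in range(inner_steps):    # inner optimizer (plain SGD stand-in)
                grad = w - target           # gradient of 0.5 * (w - target) ** 2
                w -= inner_lr * grad
            deltas.append(w_global - w)     # this worker's pseudo-gradient
        # The only communication step: average pseudo-gradients across workers.
        pseudo_grad = sum(deltas) / len(deltas)
        # Outer optimizer: SGD with Nesterov momentum on the pseudo-gradient.
        velocity = momentum * velocity + pseudo_grad
        w_global -= outer_lr * (pseudo_grad + momentum * velocity)
    return w_global


# Three workers with different local optima; the shared optimum is their mean, 2.0.
print(diloco_toy([1.0, 2.0, 3.0]))
```

Workers exchange data once per outer round rather than once per gradient step, which is where the reduced communication comes from.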
**Performance on benchmarks**
Base Models:
| Model | Size | Tokens | MMLU | GPQA | GSM8K | ARC-C | Hellaswag |
|---|---|---|---|---|---|---|---|
| INTELLECT | 10B | 1T | 37.5 | 26.12 | 8.1 | 52.13 | 72.26 |
| MPT-7B | 7B | 1T | 26.8 | 25.67 | 8.3 | 46.67 | 77.41 |
| Falcon-7B | 7B | 1.5T | 26.2 | 23.66 | 4.9 | 47.61 | 78.23 |
| Pythia-12B | 12B | 300B | 26.5 | 24.33 | 4.09 | 40.61 | 68.83 |
| LLM360-Amber | 7B | 1.3T | 24.5 | 27.01 | 4.32 | 42.75 | 74.08 |
| LLaMA-7B | 7B | 1T | 35.1 | 23.21 | 9.7 | 50.43 | 78.19 |
| LLaMA-13B | 13B | 1T | 46.9 | 26.34 | 17.3 | 56.14 | 81.05 |
| LLaMA2-7B | 7B | 2T | 45.3 | 25.89 | 13.5 | 54.10 | 78.64 |
| LLaMA2-13B | 13B | 2T | 54.8 | 25.67 | 24.3 | 59.81 | 82.58 |
[Instruction-Tuned Models](https://huggingface.co/PrimeIntellect/INTELLECT-1-Instruct):
| Model | Size | Tokens | MMLU | GPQA | GSM8K | ARC-C | Hellaswag |
|---|---|---|---|---|---|---|---|
| INTELLECT-Instruct | 10B | 1T | 49.89 | 28.32 | 38.58 | 54.52 | 71.42 |
| MPT-7B-Chat | 7B | 1T | 36.29 | 26.79 | 8.26 | 51.02 | 75.88 |
| Falcon-7B-Instruct | 7B | 1.5T | 25.21 | 26.34 | 4.93 | 45.82 | 70.61 |
| LLM360-AmberChat | 7B | 1.4T | 36.02 | 27.23 | 6.14 | 43.94 | 73.94 |
| LLaMA2-7B-Chat | 7B | 2T | 47.20 | 28.57 | 23.96 | 53.33 | 78.69 |
| LLaMA2-13B-Chat | 13B | 2T | 53.51 | 28.35 | 37.15 | 59.73 | 82.47 |
## **Citations**
If you use this model in your research, please cite it as follows:
```
@article{jaghouar2024intellect,
title={INTELLECT-1 Technical Report.},
author={Jaghouar, Sami and Ong, Jack Min and Basra, Manveer and Obeid, Fares and Straube, Jannik and Keiblinger, Michael and Bakouch, Elie and Atkins, Lucas and Panahi, Maziyar and Goddard, Charles and Ryabinin, Max and Hagemann, Johannes},
journal={arXiv preprint},
year={2024}
}
```

35
config.json Normal file

@@ -0,0 +1,35 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": [
128001,
128008,
128009
],
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 42,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"original_max_position_embeddings": 8192,
"rope_type": "default",
"factor": 1.0
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"transformers_version": "4.44.2",
"use_cache": true,
"vocab_size": 128256
}

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

8
generation_config.json Normal file

@@ -0,0 +1,8 @@
{
"do_sample": true,
"max_length": 100,
"temperature": 0.7,
"top_k": null,
"transformers_version": "4.44.2",
"use_cache": false
}
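
With `do_sample` set to true and `temperature` at 0.7, generation draws each token from a sharpened softmax distribution rather than always taking the top-scoring token. A minimal sketch of temperature sampling, assuming a hypothetical three-token vocabulary; the function names are illustrative and this is not the transformers implementation:

```python
import math
import random


def temperature_probs(logits, temperature=0.7):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=0.7, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for a 3-token vocabulary
print(temperature_probs(toy_logits, 1.0))
print(temperature_probs(toy_logits, 0.7))  # more mass on the top token
print(sample_token(toy_logits, rng=random.Random(0)))
```

Temperatures below 1.0 concentrate probability on the highest-scoring tokens; as the temperature approaches 0 the behavior approaches greedy decoding.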

BIN
intellect-1-map.png Normal file

Binary file not shown. Size: 129 KiB

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:161434bd237837c6cf3076ecdcf56f3e941942f0114a45c71472c906d5f4a0fc
size 2837516736

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:952b6a8f7949bb6b123ecd2f05ae8b5703e9ddbbd874afa78289fe630da03e25
size 1904284056

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5017c3954c108a9a89f2862e561ba982f7b5c1023b8987c32f8a1ec8b42ccf09
size 1979789736

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:42a17f8d8981eb5d68cb79abc0d36d683edb8f74fc288d67dd2394283d95068a
size 1786851768

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0c672aacd0359f424c0d3ca3e50628beb0ab5c166854f86719fbd5851c54c4d9
size 1904284096

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:32cacb41b3c7daf40733f32b8ee2599fbe34de8cce44559f91ecfd2825ff46a9
size 1979789752

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f96acd5d8e44c0f37ac2db403b4428c813da86dd4fda4ef5004169ebc783d732
size 1786851768

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b943362d92e08683798a81391ebb5f0d6b7ef1e71b1dbb3655795366be59555e
size 1904284096

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6126ba76254980b97ed7eaffb17672b5605f6e3cbe416c37eb1639ca399a84f3
size 1979789752

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:312e70e55c62120d12bca4c00c938c0805a352722d1e82f0eca62c1585e12597
size 2359365032
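
Each three-line hunk above is a Git LFS pointer, not the weights themselves. The sketch below parses one pointer and sums the ten `size` fields shown in this commit view, which presumably belong to the ten safetensors shards listed in `.gitattributes`. The 2-bytes-per-parameter figure (bf16 or fp16 storage) is an assumption, since the commit does not state the dtype; `parse_lfs_pointer` is a hypothetical helper.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


first_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:161434bd237837c6cf3076ecdcf56f3e941942f0114a45c71472c906d5f4a0fc
size 2837516736
"""

# The ten "size" fields, in the order they appear in this commit view.
shard_sizes = [
    2837516736, 1904284056, 1979789736, 1786851768, 1904284096,
    1979789752, 1786851768, 1904284096, 1979789752, 2359365032,
]

total_bytes = sum(shard_sizes)
print(parse_lfs_pointer(first_pointer)["size"])  # 2837516736
print(total_bytes)                               # 20422806792
# Assumption: 2 bytes per parameter (bf16 or fp16 storage).
print(f"~{total_bytes / 2 / 1e9:.2f}B parameters")
```

At 2 bytes per parameter the total is about 10.2 billion parameters, consistent with the 10B size in the README.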

View File

@@ -0,0 +1,388 @@
{
"weight_map": {
"model.embed_tokens.weight": "model-0000-of-0010.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.0.input_layernorm.weight": "model-0000-of-0010.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-0000-of-0010.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.1.input_layernorm.weight": "model-0000-of-0010.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-0000-of-0010.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.2.input_layernorm.weight": "model-0000-of-0010.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-0000-of-0010.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.3.input_layernorm.weight": "model-0000-of-0010.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-0000-of-0010.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-0000-of-0010.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.4.input_layernorm.weight": "model-0001-of-0010.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-0001-of-0010.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.5.input_layernorm.weight": "model-0001-of-0010.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-0001-of-0010.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.6.input_layernorm.weight": "model-0001-of-0010.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-0001-of-0010.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.7.input_layernorm.weight": "model-0001-of-0010.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-0001-of-0010.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-0001-of-0010.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.8.input_layernorm.weight": "model-0002-of-0010.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-0002-of-0010.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.9.input_layernorm.weight": "model-0002-of-0010.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-0002-of-0010.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.10.input_layernorm.weight": "model-0002-of-0010.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-0002-of-0010.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.11.input_layernorm.weight": "model-0002-of-0010.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-0002-of-0010.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-0002-of-0010.safetensors",
"model.layers.12.input_layernorm.weight": "model-0002-of-0010.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-0003-of-0010.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.13.input_layernorm.weight": "model-0003-of-0010.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-0003-of-0010.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.14.input_layernorm.weight": "model-0003-of-0010.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-0003-of-0010.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.15.input_layernorm.weight": "model-0003-of-0010.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-0003-of-0010.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.16.input_layernorm.weight": "model-0003-of-0010.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-0003-of-0010.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-0003-of-0010.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.17.input_layernorm.weight": "model-0004-of-0010.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-0004-of-0010.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.18.input_layernorm.weight": "model-0004-of-0010.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-0004-of-0010.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.19.input_layernorm.weight": "model-0004-of-0010.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-0004-of-0010.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.20.input_layernorm.weight": "model-0004-of-0010.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-0004-of-0010.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-0004-of-0010.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.21.input_layernorm.weight": "model-0005-of-0010.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-0005-of-0010.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.22.input_layernorm.weight": "model-0005-of-0010.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-0005-of-0010.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.23.input_layernorm.weight": "model-0005-of-0010.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-0005-of-0010.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.24.input_layernorm.weight": "model-0005-of-0010.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-0005-of-0010.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-0005-of-0010.safetensors",
"model.layers.25.input_layernorm.weight": "model-0005-of-0010.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-0006-of-0010.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.26.input_layernorm.weight": "model-0006-of-0010.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-0006-of-0010.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.27.input_layernorm.weight": "model-0006-of-0010.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-0006-of-0010.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.28.input_layernorm.weight": "model-0006-of-0010.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-0006-of-0010.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.29.input_layernorm.weight": "model-0006-of-0010.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-0006-of-0010.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-0006-of-0010.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.30.input_layernorm.weight": "model-0007-of-0010.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-0007-of-0010.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.31.input_layernorm.weight": "model-0007-of-0010.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-0007-of-0010.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.32.input_layernorm.weight": "model-0007-of-0010.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-0007-of-0010.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.33.input_layernorm.weight": "model-0007-of-0010.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-0007-of-0010.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-0007-of-0010.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.34.input_layernorm.weight": "model-0008-of-0010.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-0008-of-0010.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.35.input_layernorm.weight": "model-0008-of-0010.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-0008-of-0010.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.36.input_layernorm.weight": "model-0008-of-0010.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-0008-of-0010.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.37.input_layernorm.weight": "model-0008-of-0010.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-0008-of-0010.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-0008-of-0010.safetensors",
"model.layers.38.input_layernorm.weight": "model-0008-of-0010.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-0009-of-0010.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.39.input_layernorm.weight": "model-0009-of-0010.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-0009-of-0010.safetensors",
"model.layers.40.self_attn.q_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.40.self_attn.k_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.40.self_attn.v_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.40.self_attn.o_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.40.mlp.gate_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.40.mlp.down_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.40.mlp.up_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.40.input_layernorm.weight": "model-0009-of-0010.safetensors",
"model.layers.40.post_attention_layernorm.weight": "model-0009-of-0010.safetensors",
"model.layers.41.self_attn.q_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.41.self_attn.k_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.41.self_attn.v_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.41.self_attn.o_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.41.mlp.gate_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.41.mlp.down_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.41.mlp.up_proj.weight": "model-0009-of-0010.safetensors",
"model.layers.41.input_layernorm.weight": "model-0009-of-0010.safetensors",
"model.layers.41.post_attention_layernorm.weight": "model-0009-of-0010.safetensors",
"model.norm.weight": "model-0009-of-0010.safetensors",
"lm_head.weight": "model-0009-of-0010.safetensors"
},
"metadata": {
"total_size": 20422762496
}
}
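The index above maps each tensor name to the shard file that stores it, and some decoder layers straddle two shards (for example, layer 30 has `k_proj` in `model-0006-of-0010.safetensors` and `v_proj` in `model-0007-of-0010.safetensors`). A minimal sketch of how such a map could be inspected follows. It assumes the tensor map sits under a top-level `weight_map` key, which is the usual layout for sharded safetensors indexes but is not visible in this excerpt. The helper names `shards_per_layer` and `split_layers` are hypothetical, and the embedded JSON is a small excerpt rather than the full file.

```python
import json
import re
from collections import defaultdict

# Small excerpt in the same shape as the index above. The "weight_map" key
# name is an assumption based on the usual sharded-checkpoint layout.
INDEX_JSON = """
{
  "weight_map": {
    "model.layers.30.self_attn.k_proj.weight": "model-0006-of-0010.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-0007-of-0010.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-0007-of-0010.safetensors",
    "model.norm.weight": "model-0009-of-0010.safetensors",
    "lm_head.weight": "model-0009-of-0010.safetensors"
  },
  "metadata": {"total_size": 20422762496}
}
"""

# Matches tensor names that belong to a numbered decoder layer.
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def shards_per_layer(weight_map):
    """Map each decoder layer index to the sorted shard files holding its tensors."""
    layers = defaultdict(set)
    for name, shard in weight_map.items():
        match = LAYER_RE.match(name)
        if match:  # skip non-layer tensors such as model.norm and lm_head
            layers[int(match.group(1))].add(shard)
    return {idx: sorted(files) for idx, files in layers.items()}


def split_layers(weight_map):
    """Return the layer indices whose tensors are spread over more than one shard."""
    per_layer = shards_per_layer(weight_map)
    return sorted(idx for idx, files in per_layer.items() if len(files) > 1)


index = json.loads(INDEX_JSON)
layer_shards = shards_per_layer(index["weight_map"])
straddling = split_layers(index["weight_map"])
total_gib = index["metadata"]["total_size"] / 2**30

print("layers split across shards:", straddling)
print("total size in GiB:", round(total_gib, 2))
```

Run against the full index, the same check would flag every layer that needs two shard files open at once. The `total_size` value of 20422762496 bytes works out to roughly 19 GiB.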

16
special_tokens_map.json Normal file

@@ -0,0 +1,16 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
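Each entry in the map above wraps the token string in an object with stripping and normalization flags. As a rough sketch, the literal token strings could be pulled out as shown below. The helper name `token_strings` is hypothetical, and the fallback for plain-string entries is an assumption about other files of this kind, not something shown in this one.

```python
import json

# The special tokens map exactly as it appears in the file above.
SPECIAL_TOKENS_JSON = """
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
"""


def token_strings(special_tokens_map):
    """Return {role: token string} for object-style or plain-string entries."""
    result = {}
    for role, spec in special_tokens_map.items():
        if isinstance(spec, dict):
            result[role] = spec["content"]  # object form, as used in this file
        elif isinstance(spec, str):
            result[role] = spec  # assumed plain-string form, not present here
    return result


tokens = token_strings(json.loads(SPECIAL_TOKENS_JSON))
print(tokens)
```

With `lstrip`, `rstrip`, and `normalized` all false, both markers are meant to match exactly as written, with no surrounding whitespace absorbed.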

410563
tokenizer.json Normal file

File diff suppressed because it is too large

2061
tokenizer_config.json Normal file

File diff suppressed because it is too large