Initialize project; model provided by the ModelHub XC community

Model: LLM-Research/OLMo-7B-0424-hf
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-31 08:07:12 +08:00
commit 6b2237c508
16 changed files with 102059 additions and 0 deletions

.gitattributes vendored Normal file

@@ -0,0 +1,41 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-00001-of-00006.safetensors filter=lfs diff=lfs merge=lfs -text
model-00002-of-00006.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00006.safetensors filter=lfs diff=lfs merge=lfs -text
model-00004-of-00006.safetensors filter=lfs diff=lfs merge=lfs -text
model-00005-of-00006.safetensors filter=lfs diff=lfs merge=lfs -text
model-00006-of-00006.safetensors filter=lfs diff=lfs merge=lfs -text

README.md Normal file

@@ -0,0 +1,253 @@
---
license: apache-2.0
datasets:
- allenai/dolma
language:
- en
---
<img src="https://allenai.org/olmo/olmo-7b-animation.gif" alt="OLMo Logo" width="800" style="margin-left:auto; margin-right:auto; display:block"/>
# Model Card for OLMo 7B April 2024
OLMo 7B April 2024 is an updated version of the original [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) model, with a 24-point increase in MMLU, among other evaluation improvements, resulting from an improved version of the Dolma dataset and staged training.
**This version is for direct use with HuggingFace Transformers** from v4.40 on.
OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
The OLMo models are trained on the [Dolma](https://huggingface.co/datasets/allenai/dolma) dataset.
We release all code, checkpoints, logs, and details involved in training these models.
## Model Details
The core models released in this batch are the following:
| Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
|------|--------|---------|-------------|-----------------|----------------|
| [OLMo 1B](https://huggingface.co/allenai/OLMo-1B) | 3 Trillion |16 | 2048 | 16 | 2048 |
| [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) | 2.5 Trillion | 32 | 4096 | 32 | 2048 |
| [OLMo 7B Twin 2T](https://huggingface.co/allenai/OLMo-7B-Twin-2T) | 2 Trillion | 32 | 4096 | 32 | 2048 |
| [OLMo 7B April 2024](https://huggingface.co/allenai/OLMo-7B-0424-hf) | 2.05 Trillion | 32 | 4096 | 32 | 4096 |
*Note: OLMo 7B April 2024 also includes QKV clipping.*
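QKV clipping can be pictured as an element-wise clamp of the query, key, and value projections to a symmetric range before attention is computed. The snippet below is a minimal pure-Python sketch of that idea, assuming the threshold of 8.0 given by `clip_qkv` in this repository's `config.json`. The helper name `clip_activations` and the sample values are hypothetical, and a real implementation would operate on tensors rather than lists.

```python
from typing import List

# Threshold taken from "clip_qkv" in this repository's config.json.
CLIP_QKV = 8.0


def clip_activations(values: List[float], clip: float = CLIP_QKV) -> List[float]:
    """Clamp each activation to the symmetric range [-clip, clip].

    Hypothetical helper for illustration only.
    """
    return [max(-clip, min(clip, v)) for v in values]


# Made-up query activations; values outside [-8, 8] are clamped.
query = [-12.5, -8.0, 0.3, 7.9, 42.0]
clipped_query = clip_activations(query)
print(clipped_query)
```

The same clamp would be applied independently to the key and value projections.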
To load a specific model revision with HuggingFace, simply add the argument `revision`:
```python
from transformers import AutoModelForCausalLM

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-0424-hf", revision="step1000-tokens4B")
```
All revisions/branches are listed in the file `revisions.txt`.
Or, you can access all the revisions for the models via the following code snippet:
```python
from huggingface_hub import list_repo_refs
out = list_repo_refs("allenai/OLMo-7B-0424-hf")
branches = [b.name for b in out.branches]
```
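Once the branch names are available, checkpoint revisions can be ordered by training progress. The sketch below assumes that checkpoint branches follow the `step<N>-tokens<M>B` pattern of the example revision `step1000-tokens4B`; that is the only name shown here, so other branches (such as `main`) may not match and are skipped. The helper `parse_revision` and the sample list are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed naming pattern, inferred from the single example "step1000-tokens4B".
_REVISION_PATTERN = re.compile(r"^step(\d+)-tokens(\d+)B$")


def parse_revision(name: str) -> Optional[Tuple[int, int]]:
    """Return (step, tokens_in_billions), or None if the name does not match."""
    match = _REVISION_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Stand-in for the `branches` list produced by list_repo_refs.
branches = ["main", "step1000-tokens4B"]
checkpoints = sorted(p for p in map(parse_revision, branches) if p is not None)
print(checkpoints)
```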
### Model Description
- **Developed by:** Allen Institute for AI (AI2)
- **Supported by:** Databricks, Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University, AMD, CSC (Lumi Supercomputer), UW
- **Model type:** a Transformer style autoregressive language model.
- **Language(s) (NLP):** English
- **License:** The code and model are released under Apache 2.0.
- **Contact:** Technical inquiries: `olmo at allenai dot org`. Press: `press at allenai dot org`
- **Date cutoff:** Oct. 2023, with most data from Feb./March 2023 based on Dolma dataset version.
### Model Sources
- **Project Page:** https://allenai.org/olmo
- **Repositories:**
- Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
- Evaluation code: https://github.com/allenai/OLMo-Eval
- Further fine-tuning code: https://github.com/allenai/open-instruct
- **Paper:** [Link](https://arxiv.org/abs/2402.00838)
- **Technical blog post:** https://blog.allenai.org/olmo-1-7-7b-a-24-point-improvement-on-mmlu-92b43f7d269d
- **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal)
<!-- - **Press release:** TODO -->
## Uses
### Inference
Proceed as usual with HuggingFace:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-0424-hf")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-0424-hf")
message = ["Language modeling is"]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
# Optional: move the inputs and the model to a CUDA device
# inputs = {k: v.to('cuda') for k, v in inputs.items()}
# olmo = olmo.to('cuda')
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
# Example output (sampling is enabled, so the text will vary):
# 'Language modeling is the first step to build natural language generation...'
```
Alternatively, with the pipeline abstraction:
```python
from transformers import pipeline
olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B-0424-hf")
print(olmo_pipe("Language modeling is "))
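# Example output (will vary); the pipeline returns a list of dicts with a 'generated_text' key:
# [{'generated_text': 'Language modeling is a branch of natural language processing that aims to...'}]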
```
Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-0424-hf", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`, and `torch` must be imported).
The quantized model is more sensitive to data types and CUDA device placement, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
### Fine-tuning
Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.
1. Fine-tune with the OLMo repository:
```bash
torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
--data.paths=[{path_to_data}/input_ids.npy] \
--data.label_mask_paths=[{path_to_data}/label_mask.npy] \
--load_path={path_to_checkpoint} \
--reset_trainer_state
```
For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?tab=readme-ov-file#fine-tuning).
2. Further fine-tuning support is being developed in AI2's Open Instruct repository. Details are [here](https://github.com/allenai/open-instruct).
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
Core model results for the new and original 7B model are found below.
| Task | Llama-7b | Llama2-7b | Falcon-7b | Mpt-7b | OLMo-7B | Llama2-13b | **OLMo 1.7-7B** |
|-------------------|----------|-----------|-----------|--------|---------|------------|-------------|
| arc_c | 44.5 | 48.5 | 47.5 | 46.5 | 48.5 | 52.8 | 42.5 |
| arc_e | 67.9 | 69.5 | 70.4 | 70.5 | 65.4 | 73.7 | 67.2 |
| boolq | 75.4 | 80.2 | 74.6 | 74.2 | 73.4 | 82.2 | 83.7 |
| copa | 91.0 | 86.0 | 86.0 | 85.0 | 90.0 | 90.0 | 86.0 |
| hellaswag | 76.2 | 76.8 | 75.9 | 77.6 | 76.4 | 78.6 | 75.5 |
| openbookqa | 51.2 | 48.4 | 53.0 | 48.6 | 50.4 | 51.8 | 50.0 |
| piqa | 77.2 | 76.7 | 78.5 | 77.3 | 78.4 | 79.0 | 77.5 |
| sciq | 93.9 | 94.5 | 93.9 | 93.7 | 93.8 | 95.5 | 96.7 |
| winogrande | 70.5 | 69.4 | 68.9 | 69.9 | 67.9 | 73.5 | 69.8 |
| truthfulQA (MC2) | 33.9 | 38.5 | 34.0 | 33.0 | 36.0 | 36.8 | 35.8 |
| MMLU (5 shot MC) | 31.5 | 45.0 | 24.0 | 30.8 | 28.3 | 55.5 | 52.0 |
| GSM8k | 10.0 | 12.0 | 4.0 | 4.5 | 8.5 | 25.0 | 29.0 |
| Full average | 60.3 | 62.1 | 59.2 | 59.3 | 59.8 | 66.2 | 63.8 |
And for the 1B model:
| task | random | [StableLM 2 1.6b](https://huggingface.co/stabilityai/stablelm-2-1_6b)\* | [Pythia 1B](https://huggingface.co/EleutherAI/pythia-1b) | [TinyLlama 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T) | **OLMo 1B** (ours) |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ | ----------------- | --------- | -------------------------------------- | ------- |
| arc_challenge | 25 | 43.81 | 33.11 | 34.78 | 34.45 |
| arc_easy | 25 | 63.68 | 50.18 | 53.16 | 58.07 |
| boolq | 50 | 76.6 | 61.8 | 64.6 | 60.7 |
| copa | 50 | 84 | 72 | 78 | 79 |
| hellaswag | 25 | 68.2 | 44.7 | 58.7 | 62.5 |
| openbookqa | 25 | 45.8 | 37.8 | 43.6 | 46.4 |
| piqa | 50 | 74 | 69.1 | 71.1 | 73.7 |
| sciq | 25 | 94.7 | 86 | 90.5 | 88.1 |
| winogrande | 50 | 64.9 | 53.3 | 58.9 | 58.9 |
| Average | 36.11 | 68.41 | 56.44 | 61.48 | 62.42 |
\*Unlike OLMo, Pythia, and TinyLlama, StabilityAI has not yet disclosed the data StableLM was trained on, making comparisons with other efforts challenging.
## Model Details
### Data
For training data details, please see the [Dolma](https://huggingface.co/datasets/allenai/dolma) documentation.
**This model uses the new 1.7 version with more data sources, better deduplication, and quality filtering**.
During the annealing phase we use a higher quality subset of Dolma with a linearly decaying learning rate to 0.
### Staged training / annealing
In contrast to OLMo 1.0, we trained OLMo 1.7 with a two-stage curriculum:
* In the first stage, we trained the model from scratch on the Dolma 1.7 dataset. We set a cosine learning rate schedule with a warmup of 2500 steps, a peak learning rate of 3e-4, and a cosine decay to 3e-5 after 3T tokens. We cut off this stage after 2T tokens, when the learning rate is still high.
* At this point we switch to the second stage, in which we train on a higher-quality subset of Dolma 1.7 (see below) for another 50B tokens, while linearly decaying the learning rate to 0. Our high-quality subset includes (1) using all available Wikipedia, OpenWebMath and Flan data, (2) removing Dolma CC, CC News, and Megawika, and (3) rebalancing remaining sources to achieve approximately equal proportions of each. See exact token counts and relative proportions of this second stage mix below.
Both stages contribute equally to the final performance of the OLMo model. After the first stage, OLMo 1.7 already outperforms OLMo 1.0. The second stage consistently adds 2 to 3 points of performance on top.
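The two-stage learning rate schedule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the training code: the warmup is assumed to be linear from zero, the cosine decay is assumed to start once warmup ends, and a batch of roughly 4M tokens per step is assumed in order to convert steps to tokens. All function and constant names are hypothetical.

```python
import math

WARMUP_STEPS = 2500      # from the stage 1 description
PEAK_LR = 3e-4           # from the stage 1 description
FINAL_LR = 3e-5          # cosine target, reached after 3T tokens
SCHEDULE_TOKENS = 3e12   # length of the full cosine schedule
CUTOFF_TOKENS = 2e12     # stage 1 is cut off here
ANNEAL_TOKENS = 50e9     # length of stage 2
TOKENS_PER_STEP = 4e6    # ASSUMPTION: ~4M tokens per optimizer step


def stage1_lr(tokens_seen: float) -> float:
    """Warmup (assumed linear) followed by cosine decay from PEAK_LR to FINAL_LR."""
    step = tokens_seen / TOKENS_PER_STEP
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    warmup_tokens = WARMUP_STEPS * TOKENS_PER_STEP
    progress = min(1.0, (tokens_seen - warmup_tokens) / (SCHEDULE_TOKENS - warmup_tokens))
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1.0 + math.cos(math.pi * progress))


def stage2_lr(tokens_into_anneal: float) -> float:
    """Linear decay to 0, starting from the stage 1 rate at the cutoff."""
    start = stage1_lr(CUTOFF_TOKENS)
    fraction = min(1.0, max(0.0, tokens_into_anneal / ANNEAL_TOKENS))
    return start * (1.0 - fraction)


print(f"LR at stage 1 cutoff: {stage1_lr(CUTOFF_TOKENS):.2e}")
print(f"LR halfway through annealing: {stage2_lr(ANNEAL_TOKENS / 2):.2e}")
```

Because stage 1 stops at 2T tokens of a 3T-token cosine schedule, the rate at the cutoff is still well above the 3e-5 floor, which is what gives the annealing stage room to decay.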
### Architecture
OLMo 7B architecture with peer models for comparison.
| | **OLMo 7B** | [Llama 2 7B](https://huggingface.co/meta-llama/Llama-2-7b) | [OpenLM 7B](https://laion.ai/blog/open-lm/) | [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b) | PaLM 8B |
|------------------------|-------------------|---------------------|--------------------|--------------------|------------------|
| d_model | 4096 | 4096 | 4096 | 4544 | 4096 |
| num heads | 32 | 32 | 32 | 71 | 16 |
| num layers | 32 | 32 | 32 | 32 | 32 |
| MLP ratio | ~8/3 | ~8/3 | ~8/3 | 4 | 4 |
| LayerNorm type | non-parametric LN | RMSNorm | parametric LN | parametric LN | parametric LN |
| pos embeddings | RoPE | RoPE | RoPE | RoPE | RoPE |
| attention variant | full | GQA | full | MQA | MQA |
| biases | none | none | in LN only | in LN only | none |
| block type | sequential | sequential | sequential | parallel | parallel |
| activation | SwiGLU | SwiGLU | SwiGLU | GeLU | SwiGLU |
| sequence length | 2048 | 4096 | 2048 | 2048 | 2048 |
| batch size (instances) | 2160 | 1024 | 2048 | 2304 | 512 |
| batch size (tokens) | ~4M | ~4M | ~4M | ~4M | ~1M |
| weight tying | no | no | no | no | yes |
### Hyperparameters
AdamW optimizer parameters are shown below.
| Size | Peak LR | Betas | Epsilon | Weight Decay |
|------|------------|-----------------|-------------|--------------|
| 1B | 4.0E-4 | (0.9, 0.95) | 1.0E-5 | 0.1 |
| 7B | 3.0E-4 | (0.9, 0.99) | 1.0E-5 | 0.1 |
Optimizer settings comparison with peer models.
| | **OLMo 7B** | [Llama 2 7B](https://huggingface.co/meta-llama/Llama-2-7b) | [OpenLM 7B](https://laion.ai/blog/open-lm/) | [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b) |
|-----------------------|------------------|---------------------|--------------------|--------------------|
| warmup steps | 5000 | 2000 | 2000 | 1000 |
| peak LR | 3.0E-04 | 3.0E-04 | 3.0E-04 | 6.0E-04 |
| minimum LR | 3.0E-05 | 3.0E-05 | 3.0E-05 | 1.2E-05 |
| weight decay | 0.1 | 0.1 | 0.1 | 0.1 |
| beta1 | 0.9 | 0.9 | 0.9 | 0.99 |
| beta2 | 0.95 | 0.95 | 0.95 | 0.999 |
| epsilon | 1.0E-05 | 1.0E-05 | 1.0E-05 | 1.0E-05 |
| LR schedule | linear | cosine | cosine | cosine |
| gradient clipping | global 1.0 | global 1.0 | global 1.0 | global 1.0 |
| gradient reduce dtype | FP32 | FP32 | FP32 | BF16 |
| optimizer state dtype | FP32 | most likely FP32 | FP32 | FP32 |
<!-- ## Environmental Impact
OLMo 7B variants were either trained on MI250X GPUs at the LUMI supercomputer, or A100-40GB GPUs provided by MosaicML.
A summary of the environmental impact. Further details are available in the paper.
| | GPU Type | Power Consumption From GPUs | Carbon Intensity (kg CO₂e/KWh) | Carbon Emissions (tCO₂eq) |
|-----------|------------|-----------------------------|--------------------------------|---------------------------|
| OLMo 7B Twin | MI250X ([LUMI supercomputer](https://www.lumi-supercomputer.eu)) | 135 MWh | 0* | 0* |
| OLMo 7B | A100-40GB ([MosaicML](https://www.mosaicml.com)) | 104 MWh | 0.656 | 75.05 | -->
## Bias, Risks, and Limitations
Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content.
Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology.
In addition, statements produced by OLMo, as with any LLM, are often inaccurate, so they should be verified.
## Citation
**BibTeX:**
```
@article{Groeneveld2023OLMo,
title={OLMo: Accelerating the Science of Language Models},
author={Groeneveld, Dirk and Beltagy, Iz and Walsh, Pete and Bhagia, Akshita and Kinney, Rodney and Tafjord, Oyvind and Jha, Ananya Harsh and Ivison, Hamish and Magnusson, Ian and Wang, Yizhong and Arora, Shane and Atkinson, David and Authur, Russell and Chandu, Khyathi and Cohan, Arman and Dumas, Jennifer and Elazar, Yanai and Gu, Yuling and Hessel, Jack and Khot, Tushar and Merrill, William and Morrison, Jacob and Muennighoff, Niklas and Naik, Aakanksha and Nam, Crystal and Peters, Matthew E. and Pyatkin, Valentina and Ravichander, Abhilasha and Schwenk, Dustin and Shah, Saurabh and Smith, Will and Subramani, Nishant and Wortsman, Mitchell and Dasigi, Pradeep and Lambert, Nathan and Richardson, Kyle and Dodge, Jesse and Lo, Kyle and Soldaini, Luca and Smith, Noah A. and Hajishirzi, Hannaneh},
journal={Preprint},
year={2024}
}
```
**APA:**
Groeneveld, D., Beltagy, I., Walsh, P., Bhagia, A., Kinney, R., Tafjord, O., Jha, A., Ivison, H., Magnusson, I., Wang, Y., Arora, S., Atkinson, D., Authur, R., Chandu, K., Cohan, A., Dumas, J., Elazar, Y., Gu, Y., Hessel, J., Khot, T., Merrill, W., Morrison, J., Muennighoff, N., Naik, A., Nam, C., Peters, M., Pyatkin, V., Ravichander, A., Schwenk, D., Shah, S., Smith, W., Subramani, N., Wortsman, M., Dasigi, P., Lambert, N., Richardson, K., Dodge, J., Lo, K., Soldaini, L., Smith, N., & Hajishirzi, H. (2024). OLMo: Accelerating the Science of Language Models. Preprint.
## Model Card Contact
For errors in this model card, contact Nathan, `{nathanl} at allenai dot org`.

config.json Normal file

@@ -0,0 +1,26 @@
{
"architectures": [
"OlmoForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"clip_qkv": 8.0,
"eos_token_id": 50279,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"model_type": "olmo",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"pad_token_id": 1,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.40.0.dev0",
"use_cache": true,
"vocab_size": 50304
}
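
As a sanity check, the values in this config can be used to estimate the parameter count and the float32 checkpoint size. The sketch below assumes that the only learned tensors are the embedding, the four attention projections and three MLP projections per layer, and the untied output head, with no biases and no learned normalization weights. This is consistent with `attention_bias: false` and the non-parametric LayerNorm listed in the README, but it is an assumption rather than a statement about the implementation. The function name is hypothetical.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_hidden_layers": 32,
    "vocab_size": 50304,
    "tie_word_embeddings": False,
}

BYTES_PER_FLOAT32 = 4  # "torch_dtype": "float32"


def count_parameters(cfg: dict) -> int:
    """Estimate the parameter count from the config (weights only, no biases)."""
    hidden = cfg["hidden_size"]
    # q, k, v, o projections; k/v match q in size because
    # num_key_value_heads equals num_attention_heads.
    attention = 4 * hidden * hidden
    # gate, up, and down projections of the MLP.
    mlp = 3 * hidden * cfg["intermediate_size"]
    embedding = cfg["vocab_size"] * hidden
    lm_head = 0 if cfg["tie_word_embeddings"] else embedding
    return cfg["num_hidden_layers"] * (attention + mlp) + embedding + lm_head


total_params = count_parameters(config)
total_bytes = total_params * BYTES_PER_FLOAT32
print(total_params)  # 6888095744
print(total_bytes)   # 27552382976
```

The resulting byte count equals the `total_size` recorded in the safetensors index metadata (27552382976), which supports the assumption that no other tensors are stored.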

configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"eos_token_id": 50279,
"pad_token_id": 1,
"transformers_version": "4.40.0.dev0"
}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:727e38a3d96772093a53196af458ecb7ad7ceeaa11460cebbafbdbf62a297d68
size 4938797232

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c82bc739daec94c92291a11c814b1fe41707110b1f60fc4d0f3c9ce66ca2c70d
size 4991226840

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2abc2ff401aa5513ffce8a44b2996fb5e172915abc48a95747c1771b351a6187
size 4924117904

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f422c4a102890a8c9b91f2d09d75d07e349a969b46d15f81822dbe061238fbdf
size 4857008920

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:89f39cd5afc3349fc17f60c9d94a400286b7ac3bd8ee7c2d94bf35e6023d348f
size 4857008920

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fdb556808f2579030231284b5d6a854bece771c33f0b784e0ed3a5e83871a07b
size 2984249384

View File

@@ -0,0 +1,233 @@
{
"metadata": {
"total_size": 27552382976
},
"weight_map": {
"lm_head.weight": "model-00006-of-00006.safetensors",
"model.embed_tokens.weight": "model-00001-of-00006.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00006.safetensors"
}
}
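
The `weight_map` above assigns each tensor name to one of six shard files. A minimal sketch of inverting that mapping to see which tensors live in each shard, assuming the index has already been parsed into a Python dict. The helper name `tensors_by_shard` and the three-entry sample are illustrative and not part of the repository:

```python
from collections import defaultdict


def tensors_by_shard(weight_map):
    """Invert a tensor-name -> shard-file mapping into shard-file -> sorted tensor names."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in shards.items()}


# Three entries copied from the index above; the real index holds many more.
sample = {
    "model.layers.29.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
}
grouped = tensors_by_shard(sample)
for shard, names in sorted(grouped.items()):
    print(shard, len(names))
```

The sample shows that a single layer can straddle two shards: layer 29 has its attention projections in shard 5 and its MLP weights in shard 6.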

637
revisions.txt Normal file

@@ -0,0 +1,637 @@
step0-tokens0B
step1000-tokens4B
step10000-tokens41B
step100000-tokens419B
step100500-tokens421B
step101000-tokens423B
step101500-tokens425B
step102000-tokens427B
step102500-tokens429B
step103000-tokens431B
step103500-tokens433B
step104000-tokens436B
step104500-tokens438B
step10500-tokens44B
step105000-tokens440B
step105500-tokens442B
step106000-tokens444B
step106500-tokens446B
step107000-tokens448B
step107500-tokens450B
step108000-tokens452B
step108500-tokens454B
step109000-tokens457B
step109500-tokens459B
step11000-tokens46B
step110000-tokens461B
step110500-tokens463B
step111000-tokens465B
step111500-tokens467B
step112000-tokens469B
step112500-tokens471B
step113000-tokens473B
step113500-tokens475B
step114000-tokens477B
step114500-tokens480B
step11500-tokens48B
step115000-tokens482B
step115500-tokens484B
step116000-tokens486B
step116500-tokens488B
step117000-tokens490B
step117500-tokens492B
step118000-tokens494B
step118500-tokens496B
step119000-tokens498B
step119500-tokens501B
step12000-tokens50B
step120000-tokens503B
step120500-tokens505B
step121000-tokens507B
step121500-tokens509B
step122000-tokens511B
step122500-tokens513B
step123000-tokens515B
step123500-tokens517B
step124000-tokens519B
step124500-tokens522B
step12500-tokens52B
step125000-tokens524B
step125500-tokens526B
step126000-tokens528B
step126500-tokens530B
step127000-tokens532B
step127500-tokens534B
step128000-tokens536B
step128500-tokens538B
step129000-tokens540B
step129500-tokens542B
step13000-tokens54B
step130000-tokens545B
step130500-tokens547B
step131000-tokens549B
step131500-tokens551B
step132000-tokens553B
step132500-tokens555B
step133000-tokens557B
step133500-tokens559B
step134000-tokens561B
step134500-tokens563B
step13500-tokens56B
step135000-tokens566B
step135500-tokens568B
step136000-tokens570B
step136500-tokens572B
step137000-tokens574B
step137500-tokens576B
step138000-tokens578B
step138500-tokens580B
step139000-tokens582B
step139500-tokens584B
step14000-tokens58B
step140000-tokens587B
step140500-tokens589B
step141000-tokens591B
step141500-tokens593B
step14200-tokens59B
step142000-tokens595B
step142500-tokens597B
step143000-tokens599B
step143500-tokens601B
step144000-tokens603B
step144500-tokens605B
step14500-tokens60B
step145000-tokens607B
step145500-tokens610B
step146000-tokens612B
step146500-tokens614B
step147000-tokens616B
step147500-tokens618B
step148000-tokens620B
step148500-tokens622B
step149000-tokens624B
step149500-tokens626B
step1500-tokens6B
step15000-tokens62B
step150000-tokens628B
step150500-tokens631B
step151000-tokens633B
step151500-tokens635B
step152000-tokens637B
step152500-tokens639B
step153000-tokens641B
step153500-tokens643B
step154000-tokens645B
step154500-tokens647B
step15500-tokens64B
step155000-tokens649B
step155500-tokens651B
step156000-tokens654B
step156500-tokens656B
step157000-tokens658B
step157500-tokens660B
step158000-tokens662B
step158500-tokens664B
step159000-tokens666B
step159500-tokens668B
step16000-tokens67B
step160000-tokens670B
step160500-tokens672B
step160550-tokens673B
step160600-tokens673B
step161000-tokens675B
step162000-tokens679B
step163000-tokens683B
step164000-tokens687B
step16500-tokens69B
step165000-tokens691B
step166000-tokens696B
step167000-tokens700B
step168000-tokens704B
step169000-tokens708B
step17000-tokens71B
step170000-tokens712B
step171000-tokens716B
step172000-tokens721B
step173000-tokens725B
step174000-tokens729B
step17500-tokens73B
step175000-tokens733B
step176000-tokens737B
step177000-tokens742B
step178000-tokens746B
step179000-tokens750B
step18000-tokens75B
step180000-tokens754B
step181000-tokens758B
step182000-tokens763B
step183000-tokens767B
step184000-tokens771B
step18500-tokens77B
step185000-tokens775B
step186000-tokens779B
step187000-tokens784B
step188000-tokens788B
step189000-tokens792B
step19000-tokens79B
step190000-tokens796B
step191000-tokens800B
step192000-tokens805B
step193000-tokens809B
step194000-tokens813B
step19500-tokens81B
step195000-tokens817B
step196000-tokens821B
step197000-tokens825B
step198000-tokens830B
step199000-tokens834B
step2000-tokens8B
step20000-tokens83B
step200000-tokens838B
step201000-tokens842B
step202000-tokens846B
step203000-tokens851B
step204000-tokens855B
step20500-tokens85B
step205000-tokens859B
step206000-tokens863B
step207000-tokens867B
step208000-tokens872B
step209000-tokens876B
step21000-tokens88B
step210000-tokens880B
step211000-tokens884B
step212000-tokens888B
step213000-tokens893B
step214000-tokens897B
step21500-tokens90B
step215000-tokens901B
step216000-tokens905B
step217000-tokens909B
step218000-tokens914B
step219000-tokens918B
step22000-tokens92B
step220000-tokens922B
step221000-tokens926B
step222000-tokens930B
step223000-tokens935B
step224000-tokens939B
step22500-tokens94B
step225000-tokens943B
step226000-tokens947B
step227000-tokens951B
step228000-tokens955B
step229000-tokens960B
step23000-tokens96B
step230000-tokens964B
step231000-tokens968B
step232000-tokens972B
step233000-tokens976B
step234000-tokens981B
step23500-tokens98B
step235000-tokens985B
step236000-tokens989B
step237000-tokens993B
step238000-tokens997B
step239000-tokens1002B
step24000-tokens100B
step240000-tokens1006B
step241000-tokens1010B
step242000-tokens1014B
step243000-tokens1018B
step244000-tokens1023B
step24500-tokens102B
step245000-tokens1027B
step246000-tokens1031B
step247000-tokens1035B
step248000-tokens1039B
step249000-tokens1044B
step2500-tokens10B
step25000-tokens104B
step250000-tokens1048B
step251000-tokens1052B
step252000-tokens1056B
step253000-tokens1060B
step254000-tokens1064B
step25500-tokens106B
step255000-tokens1069B
step256000-tokens1073B
step257000-tokens1077B
step258000-tokens1081B
step259000-tokens1085B
step26000-tokens109B
step260000-tokens1090B
step261000-tokens1094B
step262000-tokens1098B
step263000-tokens1102B
step264000-tokens1106B
step26500-tokens111B
step265000-tokens1111B
step266000-tokens1115B
step267000-tokens1119B
step268000-tokens1123B
step269000-tokens1127B
step27000-tokens113B
step270000-tokens1132B
step271000-tokens1136B
step272000-tokens1140B
step273000-tokens1144B
step274000-tokens1148B
step27500-tokens115B
step275000-tokens1153B
step276000-tokens1157B
step277000-tokens1161B
step278000-tokens1165B
step279000-tokens1169B
step28000-tokens117B
step280000-tokens1174B
step281000-tokens1178B
step282000-tokens1182B
step283000-tokens1186B
step284000-tokens1190B
step28500-tokens119B
step285000-tokens1194B
step286000-tokens1199B
step287000-tokens1203B
step288000-tokens1207B
step289000-tokens1211B
step29000-tokens121B
step290000-tokens1215B
step291000-tokens1220B
step292000-tokens1224B
step293000-tokens1228B
step294000-tokens1232B
step29500-tokens123B
step295000-tokens1236B
step296000-tokens1241B
step297000-tokens1245B
step298000-tokens1249B
step299000-tokens1253B
step3000-tokens12B
step30000-tokens125B
step300000-tokens1257B
step301000-tokens1262B
step302000-tokens1266B
step303000-tokens1270B
step304000-tokens1274B
step30500-tokens127B
step305000-tokens1278B
step306000-tokens1283B
step307000-tokens1287B
step308000-tokens1291B
step309000-tokens1295B
step31000-tokens129B
step310000-tokens1299B
step311000-tokens1303B
step312000-tokens1308B
step313000-tokens1312B
step314000-tokens1316B
step31500-tokens132B
step315000-tokens1320B
step316000-tokens1324B
step317000-tokens1329B
step318000-tokens1333B
step319000-tokens1337B
step32000-tokens134B
step320000-tokens1341B
step321000-tokens1345B
step322000-tokens1350B
step323000-tokens1354B
step324000-tokens1358B
step32500-tokens136B
step325000-tokens1362B
step326000-tokens1366B
step327000-tokens1371B
step328000-tokens1375B
step329000-tokens1379B
step33000-tokens138B
step330000-tokens1383B
step331000-tokens1387B
step332000-tokens1392B
step333000-tokens1396B
step334000-tokens1400B
step33500-tokens140B
step335000-tokens1404B
step336000-tokens1408B
step337000-tokens1412B
step338000-tokens1417B
step339000-tokens1421B
step34000-tokens142B
step340000-tokens1425B
step341000-tokens1429B
step342000-tokens1433B
step343000-tokens1438B
step344000-tokens1442B
step34500-tokens144B
step345000-tokens1446B
step346000-tokens1450B
step347000-tokens1454B
step348000-tokens1459B
step349000-tokens1463B
step3500-tokens14B
step35000-tokens146B
step350000-tokens1467B
step351000-tokens1471B
step352000-tokens1475B
step353000-tokens1480B
step354000-tokens1484B
step35500-tokens148B
step355000-tokens1488B
step356000-tokens1492B
step357000-tokens1496B
step358000-tokens1501B
step359000-tokens1505B
step36000-tokens150B
step360000-tokens1509B
step361000-tokens1513B
step362000-tokens1517B
step363000-tokens1522B
step364000-tokens1526B
step36500-tokens153B
step365000-tokens1530B
step366000-tokens1534B
step367000-tokens1538B
step368000-tokens1542B
step369000-tokens1547B
step37000-tokens155B
step370000-tokens1551B
step371000-tokens1555B
step372000-tokens1559B
step373000-tokens1563B
step374000-tokens1568B
step37500-tokens157B
step375000-tokens1572B
step376000-tokens1576B
step377000-tokens1580B
step378000-tokens1584B
step379000-tokens1589B
step38000-tokens159B
step380000-tokens1593B
step381000-tokens1597B
step382000-tokens1601B
step383000-tokens1605B
step384000-tokens1610B
step38500-tokens161B
step385000-tokens1614B
step386000-tokens1618B
step387000-tokens1622B
step388000-tokens1626B
step389000-tokens1631B
step39000-tokens163B
step390000-tokens1635B
step391000-tokens1639B
step392000-tokens1643B
step393000-tokens1647B
step39350-tokens164B
step394000-tokens1651B
step39500-tokens165B
step395000-tokens1656B
step396000-tokens1660B
step397000-tokens1664B
step398000-tokens1668B
step399000-tokens1672B
step4000-tokens16B
step40000-tokens167B
step40500-tokens169B
step41000-tokens171B
step410000-tokens1719B
step411000-tokens1723B
step412000-tokens1727B
step413000-tokens1731B
step41400-tokens173B
step414000-tokens1735B
step41500-tokens174B
step415000-tokens1740B
step416000-tokens1744B
step417000-tokens1748B
step418000-tokens1752B
step419000-tokens1756B
step42000-tokens176B
step420000-tokens1761B
step421000-tokens1765B
step42200-tokens176B
step422000-tokens1769B
step423000-tokens1773B
step424000-tokens1777B
step42500-tokens178B
step425000-tokens1781B
step426000-tokens1786B
step427000-tokens1790B
step428000-tokens1794B
step429000-tokens1798B
step43000-tokens180B
step430000-tokens1802B
step431000-tokens1807B
step432000-tokens1811B
step433000-tokens1815B
step434000-tokens1819B
step43500-tokens182B
step435000-tokens1823B
step436000-tokens1828B
step437000-tokens1832B
step438000-tokens1836B
step439000-tokens1840B
step44000-tokens184B
step440000-tokens1844B
step441000-tokens1849B
step442000-tokens1853B
step443000-tokens1857B
step444000-tokens1861B
step44500-tokens186B
step445000-tokens1865B
step446000-tokens1870B
step447000-tokens1874B
step448000-tokens1878B
step449000-tokens1882B
step4500-tokens18B
step45000-tokens188B
step450000-tokens1886B
step451000-tokens1890B
step452000-tokens1895B
step453000-tokens1899B
step454000-tokens1903B
step45500-tokens190B
step455000-tokens1907B
step456000-tokens1911B
step457000-tokens1916B
step458000-tokens1920B
step459000-tokens1924B
step46000-tokens192B
step460000-tokens1928B
step461000-tokens1932B
step462000-tokens1937B
step463000-tokens1941B
step464000-tokens1945B
step46500-tokens194B
step465000-tokens1949B
step466000-tokens1953B
step467000-tokens1958B
step468000-tokens1962B
step469000-tokens1966B
step47000-tokens197B
step471000-tokens1974B
step472000-tokens1979B
step473000-tokens1983B
step474000-tokens1987B
step47500-tokens199B
step475000-tokens1991B
step476000-tokens1995B
step477000-tokens2000B
step48000-tokens201B
step48500-tokens203B
step49000-tokens205B
step49500-tokens207B
step500-tokens2B
step5000-tokens20B
step50000-tokens209B
step50500-tokens211B
step51000-tokens213B
step51450-tokens215B
step51500-tokens215B
step52000-tokens218B
step52300-tokens219B
step52500-tokens220B
step53000-tokens222B
step53500-tokens224B
step54000-tokens226B
step54500-tokens228B
step5500-tokens23B
step55000-tokens230B
step55500-tokens232B
step56000-tokens234B
step56500-tokens236B
step57000-tokens238B
step57500-tokens241B
step58000-tokens243B
step58500-tokens245B
step59000-tokens247B
step59500-tokens249B
step6000-tokens25B
step60000-tokens251B
step60500-tokens253B
step61000-tokens255B
step61500-tokens257B
step62000-tokens259B
step62500-tokens262B
step63000-tokens264B
step63500-tokens266B
step64000-tokens268B
step64500-tokens270B
step6500-tokens27B
step65000-tokens272B
step65500-tokens274B
step66000-tokens276B
step66500-tokens278B
step67000-tokens280B
step67500-tokens283B
step68000-tokens285B
step68500-tokens287B
step69000-tokens289B
step69500-tokens291B
step7000-tokens29B
step70000-tokens293B
step70500-tokens295B
step71000-tokens297B
step71500-tokens299B
step72000-tokens301B
step72500-tokens303B
step73000-tokens306B
step73100-tokens306B
step73500-tokens308B
step74000-tokens310B
step74500-tokens312B
step7500-tokens31B
step75000-tokens314B
step75500-tokens316B
step76000-tokens318B
step76500-tokens320B
step77000-tokens322B
step77500-tokens324B
step78000-tokens327B
step78500-tokens329B
step79000-tokens331B
step79500-tokens333B
step8000-tokens33B
step80000-tokens335B
step80500-tokens337B
step81000-tokens339B
step81500-tokens341B
step82000-tokens343B
step82500-tokens345B
step83000-tokens348B
step83500-tokens350B
step84000-tokens352B
step84500-tokens354B
step8500-tokens35B
step85000-tokens356B
step85500-tokens358B
step86000-tokens360B
step86500-tokens362B
step87000-tokens364B
step87500-tokens366B
step88000-tokens368B
step88500-tokens371B
step89000-tokens373B
step89500-tokens375B
step9000-tokens37B
step90000-tokens377B
step90500-tokens379B
step91000-tokens381B
step91500-tokens383B
step92000-tokens385B
step92500-tokens387B
step93000-tokens389B
step93500-tokens392B
step94000-tokens394B
step94500-tokens396B
step9500-tokens39B
step95000-tokens398B
step95500-tokens400B
step96000-tokens402B
step96500-tokens404B
step97000-tokens406B
step97500-tokens408B
step98000-tokens410B
step98500-tokens412B
step99000-tokens415B
step99500-tokens417B
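
The names in `revisions.txt` follow the pattern `step<N>-tokens<M>B` and are listed in lexicographic order, so `step10000` appears before `step1500`. A minimal sketch of parsing the names and ordering them by training step. The helper names `parse_revision` and `sort_revisions` are illustrative, and the pattern is inferred from the entries listed above:

```python
import re

# Pattern inferred from the listed names, e.g. "step1000-tokens4B".
REVISION_RE = re.compile(r"^step(\d+)-tokens(\d+)B$")


def parse_revision(name):
    """Return (step, tokens_in_billions) for a revision name."""
    match = REVISION_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized revision name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def sort_revisions(names):
    """Order revision names by training step rather than lexicographically."""
    return sorted(names, key=lambda n: parse_revision(n)[0])


# A few names copied from the list above.
sample = ["step10000-tokens41B", "step1000-tokens4B", "step0-tokens0B", "step100000-tokens419B"]
ordered = sort_revisions(sample)
print(ordered)
```

If these names correspond to checkpoint branches (an assumption; this commit only lists the names), one of them could be passed as the `revision` argument of `from_pretrained` in `transformers`.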

4
special_tokens_map.json Normal file

@@ -0,0 +1,4 @@
{
"eos_token": "<|endoftext|>",
"pad_token": "<|padding|>"
}

100602
tokenizer.json Normal file

File diff suppressed because it is too large

238
tokenizer_config.json Normal file

@@ -0,0 +1,238 @@
{
"add_bos_token": false,
"add_eos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"0": {
"content": "|||IP_ADDRESS|||",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"1": {
"content": "<|padding|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"50254": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50255": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50256": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50257": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50258": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50259": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50260": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50261": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50262": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50263": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50264": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50265": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50266": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50267": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50268": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50269": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50270": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50271": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50272": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50273": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50274": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50275": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50276": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50277": {
"content": "|||EMAIL_ADDRESS|||",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50278": {
"content": "|||PHONE_NUMBER|||",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50279": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"bos_token": null,
"clean_up_tokenization_spaces": true,
"eos_token": "<|endoftext|>",
"model_max_length": 1000000000000000019884624838656,
"pad_token": "<|padding|>",
"tokenizer_class": "GPTNeoXTokenizer",
"unk_token": null
}
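
In `added_tokens_decoder` above, only two entries carry `"special": true`: id 1 (`<|padding|>`) and id 50279 (`<|endoftext|>`). The placeholder tokens such as `|||EMAIL_ADDRESS|||` are added tokens but not special ones. A minimal sketch of extracting the special entries from a parsed config. The helper name `special_tokens` is illustrative, and the embedded JSON is a trimmed sample of the file above:

```python
import json


def special_tokens(config):
    """Map token id -> content for entries flagged as special in added_tokens_decoder."""
    decoder = config.get("added_tokens_decoder", {})
    return {
        int(token_id): entry["content"]
        for token_id, entry in decoder.items()
        if entry.get("special")
    }


# Trimmed sample of tokenizer_config.json; other fields are omitted here.
sample = json.loads("""
{
  "added_tokens_decoder": {
    "1": {"content": "<|padding|>", "special": true},
    "50277": {"content": "|||EMAIL_ADDRESS|||", "special": false},
    "50279": {"content": "<|endoftext|>", "special": true}
  },
  "eos_token": "<|endoftext|>",
  "pad_token": "<|padding|>"
}
""")
specials = special_tokens(sample)
print(specials)
```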