Initialize project; model provided by the ModelHub XC community

Model: allenai/Olmo-3-7B-Instruct-SFT
Source: Original Platform
Committed by ModelHub XC on 2026-06-15 23:04:13 +08:00 (commit fe47a9923d)
17 changed files with 101481 additions and 0 deletions

.gitattributes (new file, +53 lines)

@@ -0,0 +1,53 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
model-00003-of-00003.safetensors filter=lfs diff=lfs merge=lfs -text
model-00001-of-00003.safetensors filter=lfs diff=lfs merge=lfs -text
vocab.json filter=lfs diff=lfs merge=lfs -text
model-00002-of-00003.safetensors filter=lfs diff=lfs merge=lfs -text

README.md (new file, +226 lines)

@@ -0,0 +1,226 @@
---
license: apache-2.0
base_model: allenai/Olmo-3-1025-7B
language:
- en
library_name: transformers
datasets:
- allenai/Dolci-Instruct-SFT
- allenai/Dolci-Instruct-SFT-Tool-Use-SA
---
## Model Details
<img alt="Logo for Olmo 3 7B Instruct model" src="olmo-instruct.png" width="307" style="margin-left:auto; margin-right:auto; display:block">
# Model Card for Olmo 3 7B Instruct SFT
We introduce Olmo 3, a new family of 7B and 32B models in both Instruct and Think variants. Long chain-of-thought thinking improves performance on reasoning tasks such as math and coding.
Olmo is a series of **O**pen **l**anguage **mo**dels designed to enable the science of language models.
These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
The core models released in this batch include the following:
| **Stage** | **Olmo 3 7B Think** | **Olmo 3 32B Think** | **Olmo 3 7B Instruct** |
|--------------------------|-----------------------|------------------------|---------------------------|
| **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) |
| **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) |
| **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) |
| **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) |
## Installation
Olmo 3 is supported in transformers 4.57.0 or higher:
```bash
pip install "transformers>=4.57.0"
```
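If you want to check the installed version programmatically before loading the model, here is a minimal sketch. `parse_version` and `supports_olmo3` are hypothetical helpers (not part of transformers). They compare only the leading numeric components, so a pre-release such as `4.57.0.dev0` is treated as `4.57.0`, which may not reflect what that pre-release actually supports.

```python
import re

# Minimum transformers version stated above for Olmo 3 support
MIN_VERSION = (4, 57, 0)


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse the leading 'major.minor[.patch]' numbers; suffixes such as '.dev0' are ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def supports_olmo3(version: str) -> bool:
    """Return True if `version` is at least MIN_VERSION (hypothetical helper)."""
    return parse_version(version) >= MIN_VERSION
```

Usage: `import transformers; supports_olmo3(transformers.__version__)`. For strict pre-release handling, a dedicated version library is the safer choice.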
## Inference
You can use OLMo with the standard Hugging Face transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-7B-Instruct-SFT")
tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3-7B-Instruct-SFT")
message = ["Who would win in a fight - a dinosaur or a cow named Moo Moo?"]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
# optional: move the inputs and the model to the GPU
# inputs = {k: v.to('cuda') for k, v in inputs.items()}
# olmo = olmo.to('cuda')
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
# >> 'This is a fun and imaginative question! Let's break it down...'
```
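To reduce memory usage, you can load the model with 8-bit quantization:
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/Olmo-3-7B-Instruct-SFT",
    torch_dtype=torch.float16,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # requires bitsandbytes
)
```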
The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, move the inputs to CUDA explicitly. Note that `.to()` returns a new tensor rather than modifying it in place, so reassign the result:
```python
inputs = {k: v.to('cuda') for k, v in inputs.items()}
```
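To get a feel for what 8-bit loading saves, the sketch below estimates weight memory from the hyperparameters in this repository's config.json. It is a rough approximation under stated assumptions: the q/k/v/o projections are each `hidden_size × hidden_size` (because `num_key_value_heads` equals `num_attention_heads` and `attention_bias` is false), the MLP has three `hidden_size × intermediate_size` matrices (gate, up, down), the embeddings and LM head are separate (`tie_word_embeddings: false`), and norm weights are ignored. The helper names are hypothetical, and the estimate covers weights only, not activations or the KV cache.

```python
# Hyperparameters copied from config.json in this repository
VOCAB_SIZE = 100278
HIDDEN_SIZE = 4096
INTERMEDIATE_SIZE = 11008
NUM_LAYERS = 32


def approx_param_count(vocab: int, hidden: int, intermediate: int, layers: int, tied: bool = False) -> int:
    """Rough parameter count; RMSNorm weights are ignored because they are negligible."""
    attention = 4 * hidden * hidden  # q, k, v, o projections (assumes no GQA and no bias)
    mlp = 3 * hidden * intermediate  # gate, up, down projections (assumes a gated SiLU MLP)
    embeddings = vocab * hidden * (1 if tied else 2)  # input embeddings + separate LM head
    return layers * (attention + mlp) + embeddings


def weight_gigabytes(num_params: int, bytes_per_param: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


n_params = approx_param_count(VOCAB_SIZE, HIDDEN_SIZE, INTERMEDIATE_SIZE, NUM_LAYERS)
for label, nbytes in [("float16/bfloat16", 2), ("int8", 1)]:
    print(f"{label}: ~{weight_gigabytes(n_params, nbytes):.1f} GB of weights")
```

Under these assumptions the estimate is about 7.3 billion parameters, roughly 14.6 GB in 16-bit and 7.3 GB in 8-bit. The real footprint will differ, for example because some modules may be kept in 16-bit during quantization.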
## Chat template
### Default System Message
The default system prompt for this model is:
```
<|im_start|>system
You are a helpful function-calling AI assistant. You do not currently have access to any functions. <functions></functions><|im_end|>
```
### Chat Format
The chat template for this model is formatted as:
```
<|im_start|>system
You are a helpful function-calling AI assistant. You do not currently have access to any functions. <functions></functions><|im_end|>
<|im_start|>user
Who would win in a fight - a dinosaur or a cow named Moo Moo?<|im_end|>
<|im_start|>assistant
This is a fun and imaginative question! Let's break it down...
Moo Moo the cow would certainly win.<|endoftext|>
```
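To make the format concrete, here is a plain-Python sketch that mirrors what chat_template.jinja (included in this commit) produces for plain conversations without tools or function calls. `render_chat` is a hypothetical helper for illustration only; in practice, use `tokenizer.apply_chat_template`, which renders the actual Jinja template.

```python
DEFAULT_SYSTEM = (
    "<|im_start|>system\nYou are a helpful function-calling AI assistant. "
    "You do not currently have access to any functions. <functions></functions><|im_end|>\n"
)


def render_chat(messages: list[dict], add_generation_prompt: bool = False, eos_token: str = "<|endoftext|>") -> str:
    """Plain-text rendering of the chat template for the no-tools, no-function-call case."""
    # The default system turn is only added when the conversation has no system message
    out = [] if any(m["role"] == "system" for m in messages) else [DEFAULT_SYSTEM]
    for i, m in enumerate(messages):
        is_last = i == len(messages) - 1
        role, content = m["role"], m.get("content")
        if role in ("system", "user"):
            out.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
        elif role == "assistant":
            # The final assistant turn ends with the EOS token instead of <|im_end|>
            out.append("<|im_start|>assistant\n" + (content or "") + (eos_token if is_last else "<|im_end|>\n"))
        elif role in ("environment", "tool"):
            # Tool results are rendered under the "environment" role
            out.append(f"<|im_start|>environment\n{content}<|im_end|>\n")
        if is_last and add_generation_prompt:
            out.append("<|im_start|>assistant\n")
    return "".join(out)
```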
## Model Description
- **Developed by:** Allen Institute for AI (Ai2)
- **Model type:** A Transformer-style autoregressive language model.
- **Language(s) (NLP):** English
- **License:** This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
- **Contact:** Technical inquiries: `olmo@allenai.org`. Press: `press@allenai.org`
- **Date cutoff:** Dec. 2024.
### Model Sources
- **Project Page:** https://allenai.org/olmo
- **Repositories:**
  - Open-Instruct for DPO and RLVR: https://github.com/allenai/open-instruct
  - OLMo-Core for pre-training and SFT: https://github.com/allenai/OLMo-core
  - OLMo-Eval for evaluation: https://github.com/allenai/OLMo-Eval
- **Paper:** [TBD]
<!-- - **Technical blog post:** (URL) -->
<!-- - **W&B Logs:** [SFT](()), [DPO](()), [RLVR](()) -->
## Evaluation
| **Skill** | **Benchmark** | **Olmo 3 Instruct 7B SFT** | **Olmo 3 Instruct 7B DPO** | **Olmo 3 Instruct 7B** | **Qwen 3 8B (no reasoning)** | **Qwen 3 VL 8B Instruct** | **Qwen 2.5 7B** | **Olmo 2 7B Instruct** | **Apertus 8B Instruct** | **Granite 3.3 8B Instruct** |
|-----------|--------------|---------------------------|---------------------------|------------------------|------------------------------|----------------------------|-------------------|--------------------------|----------------------------|-------------------------------|
| **Math** | MATH | 65.1 | 79.6 | 87.3 | 82.3 | 91.6 | 71.0 | 30.1 | 21.9 | 67.3 |
| | AIME 2024 | 6.7 | 23.5 | 44.3 | 26.2 | 55.1 | 11.3 | 1.3 | 0.5 | 7.3 |
| | AIME 2025 | 7.2 | 20.4 | 32.5 | 21.7 | 43.3 | 6.3 | 0.4 | 0.2 | 6.3 |
| | OMEGA | 14.4 | 22.8 | 28.9 | 20.5 | 32.3 | 13.7 | 5.2 | 5.0 | 10.7 |
| **Reasoning** | BigBenchHard | 51.0 | 69.3 | 71.2 | 73.7 | 85.6 | 68.8 | 43.8 | 42.2 | 61.2 |
| | ZebraLogic | 18.0 | 28.4 | 32.9 | 25.4 | 64.3 | 10.7 | 5.3 | 5.3 | 17.6 |
| | AGI Eval English | 59.2 | 64.0 | 64.4 | 76.0 | 84.5 | 69.8 | 56.1 | 50.8 | 64.0 |
| **Coding** | HumanEvalPlus | 69.8 | 72.9 | 77.2 | 79.8 | 82.9 | 74.9 | 25.8 | 34.4 | 64.0 |
| | MBPP+ | 56.5 | 55.9 | 60.2 | 64.4 | 66.3 | 62.6 | 40.7 | 42.1 | 54.0 |
| | LiveCodeBench v3 | 20.0 | 18.8 | 29.5 | 53.2 | 55.9 | 34.5 | 7.2 | 7.8 | 11.5 |
| **IF** | IFEval | 81.7 | 82.0 | 85.6 | 86.3 | 87.8 | 73.4 | 72.2 | 71.4 | 77.5 |
| | IFBench | 27.4 | 29.3 | 32.3 | 29.3 | 34.0 | 28.4 | 26.7 | 22.1 | 22.3 |
| **Knowledge** | MMLU | 67.1 | 69.1 | 69.1 | 80.4 | 83.6 | 77.2 | 61.6 | 62.7 | 63.5 |
| **QA** | PopQA | 16.5 | 20.7 | 14.1 | 20.4 | 26.5 | 21.5 | 25.5 | 25.5 | 28.9 |
| | GPQA | 30.0 | 37.9 | 40.4 | 44.6 | 51.1 | 35.6 | 31.3 | 28.8 | 33.0 |
| **Chat** | AlpacaEval 2 LC | 21.8 | 43.3 | 40.9 | 49.8 | 73.5 | 23.0 | 18.3 | 8.1 | 28.6 |
| **Tool Use** | SimpleQA | 74.2 | 79.8 | 79.3 | 79.0 | 90.3 | 78.0 | | | |
| | LitQA2 | 38.0 | 43.3 | 38.2 | 39.6 | 30.7 | 29.8 | | | |
| | BFCL | 48.9 | 49.6 | 49.8 | 60.2 | 66.2 | 55.8 | | | |
| **Safety** | Safety | 89.2 | 90.2 | 87.3 | 78.0 | 80.2 | 73.4 | 93.1 | 72.2 | 73.7 |
## Training Details
#### Stage 1: SFT
- Supervised fine-tuning on the Dolci-Think-SFT-7B dataset. This dataset consists of math, code, chat, and general knowledge queries.
- Datasets: [Dolci-Think-SFT-7B](https://huggingface.co/datasets/allenai/dolci-thinking-sft), [Dolci-Instruct-SFT-7B](https://huggingface.co/datasets/allenai/dolci-instruct-sft)
#### Stage 2: DPO
- Direct preference optimization on the Dolci-Think-DPO-7B dataset. This dataset consists of math, code, chat, and general knowledge queries.
- Datasets: [Dolci-Think-DPO-7B](https://huggingface.co/datasets/allenai/dolci-thinking-dpo), [Dolci-Instruct-DPO-7B](https://huggingface.co/datasets/allenai/dolci-3-instruct-dpo-with-metadata)
#### Stage 3: RLVR
- Reinforcement learning from verifiable rewards on the Dolci-Think-RL-7B dataset. This dataset consists of math, code, instruction-following, and general chat queries.
- Datasets: [Dolci-Think-RL-7B](https://huggingface.co/datasets/allenai/Dolci-Think-RL-7B), [Dolci-Instruct-RL-7B](https://huggingface.co/datasets/allenai/Dolci-Instruct-RL-7B)
## Inference & Recommended Settings
We evaluated our models with the following settings and also recommend them for generation:
- **temperature:** `0.6`
- **top_p:** `0.95`
- **max_tokens:** `32768`
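The sampling settings above combine temperature scaling with nucleus (top-p) sampling. As a rough illustration of what these two knobs do, here is a plain-Python sketch of the general technique over a toy list of logits. It is not the exact code used by transformers or vLLM, which operate on tensors and may handle ties and the cutoff boundary differently; `temperature_top_p` is a hypothetical name.

```python
import math


def temperature_top_p(logits: list[float], temperature: float = 0.6, top_p: float = 0.95) -> dict[int, float]:
    """Return the renormalized distribution over token ids that survive temperature + nucleus filtering."""
    # Temperature < 1 sharpens the distribution; > 1 flattens it
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most-likely tokens whose cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize over the surviving tokens; the next token is sampled from this distribution
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

Lowering the temperature concentrates probability on the top tokens, so fewer of them are needed to reach the `top_p` threshold.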
### transformers Example
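```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Olmo-3-7B-Instruct-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)
messages = [{"role": "user", "content": "Who would win in a fight - a dinosaur or a cow named MooMoo?"}]
# Apply the chat template (inserts the default system prompt and the assistant header)
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,  # sampling must be enabled for temperature/top_p to take effect
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=32768,
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```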
### vLLM Example
```python
from vllm import LLM, SamplingParams

model_id = "allenai/Olmo-3-7B-Instruct-SFT"
llm = LLM(model=model_id)
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.95,
    max_tokens=32768,
)
messages = [{"role": "user", "content": "Who would win in a fight - a dinosaur or a cow named MooMoo?"}]
# LLM.chat applies the model's chat template before generating
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
```
## Bias, Risks, and Limitations
Like any base language model or fine-tuned model without safety filtering, these models can easily be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, statements from OLMo, as from any LLM, can be inaccurate, so facts should be verified.
## License
This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with [Ai2's Responsible Use Guidelines](https://allenai.org/responsible-use).
## Citation
```
@misc{olmo2025olmo3,
title={Olmo 3},
author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
year={2025},
eprint={2512.13961},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.13961},
}
```
## Model Card Contact
For errors in this model card, contact `olmo@allenai.org`.

chat_template.jinja (new file, +16 lines)

@@ -0,0 +1,16 @@
{%- set has_system = messages|selectattr('role', 'equalto', 'system')|list|length > 0 -%}{%- if not has_system -%}{{- '<|im_start|>system
You are a helpful function-calling AI assistant. ' -}}{%- if tools is none or (tools | length) == 0 -%}{{- 'You do not currently have access to any functions. <functions></functions><|im_end|>
' -}}{%- else -%}{{- 'You are provided with function signatures within <functions></functions> XML tags. You may call one or more functions to assist with the user query. Output any function calls within <function_calls></function_calls> XML tags. Do not make assumptions about what values to plug into functions.' -}}{{- '<functions>' -}}{{- tools | tojson -}}{{- '</functions><|im_end|>
' -}}{%- endif -%}{%- endif -%}{%- for message in messages -%}{%- if message['role'] == 'system' -%}{{- '<|im_start|>system
' + message['content'] -}}{%- if tools is not none -%}{{- '<functions>' -}}{{- tools | tojson -}}{{- '</functions>' -}}{%- elif message.get('functions', none) is not none -%}{{- ' <functions>' + message['functions'] + '</functions>' -}}{%- endif -%}{{- '<|im_end|>
' -}}{%- elif message['role'] == 'user' -%}{{- '<|im_start|>user
' + message['content'] + '<|im_end|>
' -}}{%- elif message['role'] == 'assistant' -%}{{- '<|im_start|>assistant
' -}}{%- if message.get('content', none) is not none -%}{{- message['content'] -}}{%- endif -%}{%- if message.get('function_calls', none) is not none -%}{{- '<function_calls>' + message['function_calls'] + '</function_calls>' -}}{% elif message.get('tool_calls', none) is not none %}{{- '<function_calls>' -}}{%- for tool_call in message['tool_calls'] %}{%- if tool_call is mapping and tool_call.get('function', none) is not none %}{%- set args = tool_call['function']['arguments'] -%}{%- set ns = namespace(arguments_list=[]) -%}{%- for key, value in args.items() -%}{%- set ns.arguments_list = ns.arguments_list + [key ~ '=' ~ (value | tojson)] -%}{%- endfor -%}{%- set arguments = ns.arguments_list | join(', ') -%}{{- tool_call['function']['name'] + '(' + arguments + ')' -}}{%- if not loop.last -%}{{ '
' }}{%- endif -%}{% else %}{{- tool_call -}}{%- endif %}{%- endfor %}{{- '</function_calls>' -}}{%- endif -%}{%- if not loop.last -%}{{- '<|im_end|>' + '
' -}}{%- else -%}{{- eos_token -}}{%- endif -%}{%- elif message['role'] == 'environment' -%}{{- '<|im_start|>environment
' + message['content'] + '<|im_end|>
' -}}{%- elif message['role'] == 'tool' -%}{{- '<|im_start|>environment
' + message['content'] + '<|im_end|>
' -}}{%- endif -%}{%- if loop.last and add_generation_prompt -%}{{- '<|im_start|>assistant
' -}}{%- endif -%}{%- endfor -%}
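The tool-related branches of the template above are the hardest to read. The following minimal Python sketch shows what they emit, under two assumptions: `tools | tojson` behaves like `json.dumps(..., ensure_ascii=False)` with default separators (our understanding of how transformers defines `tojson` for chat templates), and each tool call's `arguments` is a dict, since the template iterates over `args.items()`. `tools_system_block` and `format_tool_calls` are hypothetical names, not part of any library.

```python
import json
from collections.abc import Mapping


def _tojson(value) -> str:
    # Assumed equivalent of the template's `tojson` filter
    return json.dumps(value, ensure_ascii=False)


def tools_system_block(tools: list) -> str:
    """Default system turn emitted when there is no system message and a non-empty tools list is given."""
    return (
        "<|im_start|>system\nYou are a helpful function-calling AI assistant. "
        "You are provided with function signatures within <functions></functions> XML tags. "
        "You may call one or more functions to assist with the user query. "
        "Output any function calls within <function_calls></function_calls> XML tags. "
        "Do not make assumptions about what values to plug into functions."
        "<functions>" + _tojson(tools) + "</functions><|im_end|>\n"
    )


def format_tool_calls(tool_calls: list) -> str:
    """Render an assistant message's `tool_calls` the way the template does."""
    parts = ["<function_calls>"]
    for i, call in enumerate(tool_calls):
        if isinstance(call, Mapping) and call.get("function") is not None:
            fn = call["function"]
            # Each argument becomes key=<JSON value>, joined with ", "
            args = ", ".join(f"{key}={_tojson(value)}" for key, value in fn["arguments"].items())
            parts.append(f"{fn['name']}({args})")
            if i < len(tool_calls) - 1:
                parts.append("\n")  # structured calls are separated by newlines
        else:
            parts.append(str(call))  # non-structured entries are emitted as-is
    parts.append("</function_calls>")
    return "".join(parts)
```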

config.json (new file, +68 lines)

@@ -0,0 +1,68 @@
{
"architectures": [
"Olmo3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"dtype": "bfloat16",
"eos_token_id": 100257,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"layer_types": [
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention"
],
"max_position_embeddings": 65536,
"model_type": "olmo3",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"pad_token_id": 100277,
"rms_norm_eps": 1e-06,
"rope_scaling": {
"attention_factor": 1.2079441541679836,
"beta_fast": 32,
"beta_slow": 1,
"factor": 8.0,
"original_max_position_embeddings": 8192,
"rope_type": "yarn"
},
"rope_theta": 500000,
"sliding_window": 4096,
"tie_word_embeddings": false,
"transformers_version": "4.57.1",
"use_cache": true,
"vocab_size": 100278
}
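Several values in config.json are tied together. The sketch below, with values copied from the file above, checks those relationships. The YaRN `attention_factor` matches the YaRN scaling rule 0.1·ln(s) + 1 for scaling factor s = 8. The extended context equals `original_max_position_embeddings × factor`. The `layer_types` list repeats three sliding-window layers followed by one full-attention layer. The helper names are hypothetical, and the every-fourth-layer rule is read off this particular `layer_types` list rather than taken from any Olmo 3 API.

```python
import math

# Values copied from config.json above
FACTOR = 8.0
ORIGINAL_MAX_POS = 8192
NUM_LAYERS = 32
HIDDEN_SIZE = 4096
NUM_HEADS = 32


def yarn_attention_factor(scale: float) -> float:
    """YaRN attention scaling factor: 0.1 * ln(scale) + 1 for scale > 1, else 1."""
    return 1.0 if scale <= 1 else 0.1 * math.log(scale) + 1.0


def layer_types(num_layers: int, full_every: int = 4) -> list[str]:
    """Every `full_every`-th layer uses full attention; the rest use sliding-window attention."""
    return ["full_attention" if (i + 1) % full_every == 0 else "sliding_attention" for i in range(num_layers)]


extended_context = int(ORIGINAL_MAX_POS * FACTOR)  # compare with max_position_embeddings
head_dim = HIDDEN_SIZE // NUM_HEADS
types = layer_types(NUM_LAYERS)
print(yarn_attention_factor(FACTOR), extended_context, head_dim, types.count("full_attention"))
```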

configuration.json (new file, +1 line)

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "others", "allow_remote": true}

fix_tokens.py (new file, +507 lines)

@@ -0,0 +1,507 @@
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "click",
# "transformers",
# "jinja2",
# ]
# ///
from dataclasses import dataclass, asdict, field
from enum import Enum
from pathlib import Path
import click
import json
from transformers import AutoTokenizer


class SpecialTokensMapEnum(Enum):
    BOS_TOKEN = "bos_token"
    EOS_TOKEN = "eos_token"
    PAD_TOKEN = "pad_token"
    UNK_TOKEN = "unk_token"


@dataclass(frozen=True)
class SpecialToken:
    id: int
    content: str
    lstrip: bool = False
    normalized: bool = False
    rstrip: bool = False
    single_word: bool = False
    special: bool = False
    special_token_map: list[SpecialTokensMapEnum] = field(default_factory=list)

    def to_added_tokens_decoder(self):
        data = asdict(self)
        token_id = str(data.pop("id"))
        data.pop("special_token_map")
        return {token_id: data}

    def to_added_tokens(self):
        data = asdict(self)
        data.pop("special_token_map")
        return data

    def to_special_tokens_map(self) -> dict[str, dict]:
        special_tokens_map = {}
        for special_token_map in self.special_token_map:
            data = asdict(self)
            data.pop("special_token_map")
            data.pop("special")
            data.pop("id")
            special_tokens_map[special_token_map.value] = data
        return special_tokens_map


MODEL_MAX_LENGTH = 65536
DESIRED_MAPPING = [
    SpecialToken(id=100256, content="<|extra_id_0|>"),
    SpecialToken(
        id=100257,
        content="<|endoftext|>",
        special=True,
        special_token_map=[
            SpecialTokensMapEnum.BOS_TOKEN,
            SpecialTokensMapEnum.EOS_TOKEN,
            SpecialTokensMapEnum.UNK_TOKEN,
        ],
    ),
    SpecialToken(id=100258, content="<|fim_prefix|>", special=True),
    SpecialToken(id=100259, content="<|fim_middle|>", special=True),
    SpecialToken(id=100260, content="<|fim_suffix|>", special=True),
    SpecialToken(id=100261, content="|||PHONE_NUMBER|||"),
    SpecialToken(id=100262, content="|||EMAIL_ADDRESS|||"),
    SpecialToken(id=100263, content="|||IP_ADDRESS|||"),
    SpecialToken(id=100264, content="<|im_start|>", special=True),
    SpecialToken(id=100265, content="<|im_end|>", special=True),
    SpecialToken(id=100266, content="<functions>"),
    SpecialToken(id=100267, content="</functions>"),
    SpecialToken(id=100268, content="<function_calls>"),
    SpecialToken(id=100269, content="</function_calls>"),
    SpecialToken(id=100270, content="<|extra_id_1|>"),
    SpecialToken(id=100271, content="<|extra_id_2|>"),
    SpecialToken(id=100272, content="<|extra_id_3|>"),
    SpecialToken(id=100273, content="<|extra_id_4|>"),
    SpecialToken(id=100274, content="<|extra_id_5|>"),
    SpecialToken(id=100275, content="<|extra_id_6|>"),
    SpecialToken(id=100276, content="<|endofprompt|>", special=True),
    SpecialToken(
        id=100277,
        content="<|pad|>",
        special=True,
        special_token_map=[SpecialTokensMapEnum.PAD_TOKEN],
    ),
]
SCRIPT_DIR = Path(__file__).parent
TOKENIZER_CONFIG_FILE = SCRIPT_DIR / "tokenizer_config.json"
TOKENIZER_FILE = SCRIPT_DIR / "tokenizer.json"
VOCAB_FILE = SCRIPT_DIR / "vocab.json"
SPECIAL_TOKENS_MAP_FILE = SCRIPT_DIR / "special_tokens_map.json"
CHAT_TEMPLATE = "{%- set has_system = messages|selectattr('role', 'equalto', 'system')|list|length > 0 -%}{%- if not has_system -%}{{- '<|im_start|>system\nYou are a helpful function-calling AI assistant. ' -}}{%- if tools is none -%}{{- 'You do not currently have access to any functions. <functions></functions><|im_end|>\n' -}}{%- else -%}{{- 'You are provided with function signatures within <functions></functions> XML tags. You may call one or more functions to assist with the user query. Output any function calls within <function_calls></function_calls> XML tags. Do not make assumptions about what values to plug into functions.' -}}{{- '<functions>' -}}{{- tools | tojson -}}{{- '</functions><|im_end|>\n' -}}{%- endif -%}{%- endif -%}{%- for message in messages -%}{%- if message['role'] == 'system' -%}{{- '<|im_start|>system\n' + message['content'] -}}{%- if tools is not none -%}{{- '<functions>' -}}{{- tools | tojson -}}{{- '</functions>' -}}{%- elif message.get('functions', none) is not none -%}{{- ' <functions>' + message['functions'] + '</functions>' -}}{%- endif -%}{{- '<|im_end|>\n' -}}{%- elif message['role'] == 'user' -%}{{- '<|im_start|>user\n' + message['content'] + '<|im_end|>\n' -}}{%- elif message['role'] == 'assistant' -%}{{- '<|im_start|>assistant\n' -}}{%- if message.get('content', none) is not none -%}{{- message['content'] -}}{%- endif -%}{%- if message.get('function_calls', none) is not none -%}{{- '<function_calls>' + message['function_calls'] + '</function_calls>' -}}{% elif message.get('tool_calls', none) is not none %}{{- '<function_calls>' -}}{%- for tool_call in message['tool_calls'] %}{%- if tool_call is mapping and tool_call.get('function', none) is not none %}{%- set args = tool_call['function']['arguments'] -%}{%- set ns = namespace(arguments_list=[]) -%}{%- for key, value in args.items() -%}{%- set ns.arguments_list = ns.arguments_list + [key ~ '=' ~ (value | tojson)] -%}{%- endfor -%}{%- set arguments = ns.arguments_list | join(', ') -%}{{- tool_call['function']['name'] + '(' + arguments + ')' -}}{%- if not loop.last -%}{{ '\n' }}{%- endif -%}{% else %}{{- tool_call -}}{%- endif %}{%- endfor %}{{- '</function_calls>' -}}{%- endif -%}{%- if not loop.last -%}{{- '<|im_end|>' + '\n' -}}{%- else -%}{{- eos_token -}}{%- endif -%}{%- elif message['role'] == 'environment' -%}{{- '<|im_start|>environment\n' + message['content'] + '<|im_end|>\n' -}}{%- elif message['role'] == 'tool' -%}{{- '<|im_start|>environment\n' + message['content'] + '<|im_end|>\n' -}}{%- endif -%}{%- if loop.last and add_generation_prompt -%}{{- '<|im_start|>assistant\n' -}}{%- endif -%}{%- endfor -%}"


@click.group()
def cli():
    """Tokenizer checking and fixing tools."""
    pass


def _get_mapped_special_token(
    special_tokens: list[SpecialToken],
    mapped_token: SpecialTokensMapEnum,
) -> SpecialToken:
    all_mapped_tokens = [token for token in special_tokens if mapped_token in token.special_token_map]
    if len(all_mapped_tokens) == 0:
        raise ValueError(f"Cannot find mapped token for {mapped_token}")
    if len(all_mapped_tokens) > 1:
        all_mapped_tokens_str = ", ".join([token.content for token in all_mapped_tokens])
        raise ValueError(f"Found multiple mapped tokens for {mapped_token}: {all_mapped_tokens_str}")
    return all_mapped_tokens[0]


def get_unk_token(special_tokens: list[SpecialToken]) -> SpecialToken:
    return _get_mapped_special_token(special_tokens, SpecialTokensMapEnum.UNK_TOKEN)


def get_bos_token(special_tokens: list[SpecialToken]) -> SpecialToken:
    return _get_mapped_special_token(special_tokens, SpecialTokensMapEnum.BOS_TOKEN)


def get_eos_token(special_tokens: list[SpecialToken]) -> SpecialToken:
    return _get_mapped_special_token(special_tokens, SpecialTokensMapEnum.EOS_TOKEN)


def get_pad_token(special_tokens: list[SpecialToken]) -> SpecialToken:
    return _get_mapped_special_token(special_tokens, SpecialTokensMapEnum.PAD_TOKEN)


@cli.command()
def check():
    """Check if the current config matches the desired mapping."""
    # STEP 1: Check the Tokenizer Config File #
    print("STEP 1: Checking tokenizer config file...")
    if not TOKENIZER_CONFIG_FILE.exists():
        raise FileNotFoundError(f"Tokenizer config file not found: {TOKENIZER_CONFIG_FILE}")
    with open(TOKENIZER_CONFIG_FILE, "r") as f:
        tokenizer_config = json.load(f)
    added_tokens_decoder = tokenizer_config.get("added_tokens_decoder", {})
    for token in DESIRED_MAPPING:
        str_token_id = str(token.id)
        if str_token_id not in added_tokens_decoder:
            raise ValueError(f"Token {token.id} not found in added tokens decoder")
        computed_added_tokens_decoder = token.to_added_tokens_decoder()
        if computed_added_tokens_decoder[str_token_id] != added_tokens_decoder[str_token_id]:
            raise ValueError(f"Token {token.id} has different content in added tokens decoder")
        print(f"Token {token.id} found in added tokens decoder; content matches")
    bos_token = get_bos_token(DESIRED_MAPPING)
    if bos_token.content != tokenizer_config["bos_token"]:
        raise ValueError(f"Bos token content mismatch: {bos_token.content} != {tokenizer_config['bos_token']}")
    else:
        print("Bos token content matches")
    eos_token = get_eos_token(DESIRED_MAPPING)
    if eos_token.content != tokenizer_config["eos_token"]:
        raise ValueError(f"Eos token content mismatch: {eos_token.content} != {tokenizer_config['eos_token']}")
    else:
        print("Eos token content matches")
    pad_token = get_pad_token(DESIRED_MAPPING)
    if pad_token.content != tokenizer_config["pad_token"]:
        raise ValueError(f"Pad token content mismatch: {pad_token.content} != {tokenizer_config['pad_token']}")
    else:
        print("Pad token content matches")
    unk_token = get_unk_token(DESIRED_MAPPING)
    if unk_token.content != tokenizer_config["unk_token"]:
        raise ValueError(f"Unk token content mismatch: {unk_token.content} != {tokenizer_config['unk_token']}")
    else:
        print("Unk token content matches")
    if tokenizer_config["model_max_length"] != MODEL_MAX_LENGTH:
        raise ValueError(f"Model max length mismatch: {tokenizer_config['model_max_length']} != {MODEL_MAX_LENGTH}")
    else:
        print("Model max length matches")
    if tokenizer_config["chat_template"] != CHAT_TEMPLATE:
        raise ValueError(f"Chat template mismatch: {tokenizer_config['chat_template']} != {CHAT_TEMPLATE}")
    else:
        print("Chat template matches")

    # STEP 2: Check the Tokenizer File #
    print("STEP 2: Checking tokenizer file...")
    if not TOKENIZER_FILE.exists():
        raise FileNotFoundError(f"Tokenizer file not found: {TOKENIZER_FILE}")
    with open(TOKENIZER_FILE, "r") as f:
        tokenizer = json.load(f)
    # check if added_tokens matches
    added_tokens_dict = {token["id"]: token for token in tokenizer.get("added_tokens", [])}
    for token in DESIRED_MAPPING:
        if token.id not in added_tokens_dict:
            raise ValueError(f"Token {token.id} not found in added tokens")
        computed_added_token = token.to_added_tokens()
        if computed_added_token != added_tokens_dict[token.id]:
            raise ValueError(f"Token {token.id} has different content in added tokens")
        print(f"Token {token.id} found in added tokens; content matches.")
    # check vocab
    vocab = tokenizer.get("model", {}).get("vocab", {})
    for token in DESIRED_MAPPING:
        if token.content not in vocab:
            raise ValueError(f"Token `{token.content}` not found in vocab")
        if token.id != vocab[token.content]:
            raise ValueError(f"Token `{token.content}`: vocab=`{vocab[token.content]}` provided=`{token.id}`")
        print(f"Token `{token.content}` found in vocab; id `{token.id}` matches.")
    seen_values: dict[int, list[str]] = {}
    for key, value in vocab.items():
        seen_values.setdefault(value, []).append(key)
    broken_vocab = False
    for value, keys in seen_values.items():
        if len(keys) > 1:
            broken_vocab = True
            print(f"Vocab value {value} is not unique; keys: {keys}")
    if broken_vocab:
        raise ValueError("Vocab values are not unique")
    else:
        print("Vocab values are unique")

    # STEP 3: Check the Vocab File #
    print("STEP 3: Checking vocab file...")
    if not VOCAB_FILE.exists():
        raise FileNotFoundError(f"Vocab file not found: {VOCAB_FILE}")
    with open(VOCAB_FILE, "r") as f:
        vocab = json.load(f)
    for token in DESIRED_MAPPING:
        if token.content not in vocab:
            raise ValueError(f"Token `{token.content}` not found in vocab")
        if token.id != vocab[token.content]:
            raise ValueError(f"Token `{token.content}`: vocab=`{vocab[token.content]}` provided=`{token.id}`")
        print(f"Token `{token.content}` found in vocab; id `{token.id}` matches.")
    if len(set(vocab.values())) != len(vocab):
        raise ValueError("Vocab values are not unique")

    # STEP 4: Check the Special Tokens Map File #
    print("STEP 4: Checking special tokens map file...")
    if not SPECIAL_TOKENS_MAP_FILE.exists():
        raise FileNotFoundError(f"Special tokens map file not found: {SPECIAL_TOKENS_MAP_FILE}")
    with open(SPECIAL_TOKENS_MAP_FILE, "r") as f:
        special_tokens_map = json.load(f)
    # This checks the special tokens map file.
    seen_special_tokens = set()
    for token in DESIRED_MAPPING:
        for key, value in token.to_special_tokens_map().items():
            if key not in special_tokens_map:
                raise ValueError(f"Special token map {key} not found in special tokens map")
            if value != special_tokens_map[key]:
                raise ValueError(f"Special token map {key} content mismatch: {value} != {special_tokens_map[key]}")
            print(f"Special token map {key} content matches")
            seen_special_tokens.add(key)
    if len(seen_special_tokens) != len(special_tokens_map):
        raise ValueError("Special tokens map values are not unique")
    print("All special tokens map values match")


@cli.command()
def fix():
    """Fix the tokens in the tokenizer config, tokenizer file, vocab file, and special tokens map file."""
    print("STEP 1: Fixing tokenizer config file...")
    with open(TOKENIZER_CONFIG_FILE, "r") as f:
        tokenizer_config = json.load(f)
    tokenizer_config["bos_token"] = get_bos_token(DESIRED_MAPPING).content
    tokenizer_config["eos_token"] = get_eos_token(DESIRED_MAPPING).content
    tokenizer_config["pad_token"] = get_pad_token(DESIRED_MAPPING).content
    tokenizer_config["unk_token"] = get_unk_token(DESIRED_MAPPING).content
    tokenizer_config["model_max_length"] = MODEL_MAX_LENGTH
    tokenizer_config["chat_template"] = CHAT_TEMPLATE
    added_tokens_decoder = {}
    for token in DESIRED_MAPPING:
        added_tokens_decoder.update(token.to_added_tokens_decoder())
    tokenizer_config["added_tokens_decoder"] = added_tokens_decoder
    with open(TOKENIZER_CONFIG_FILE, "w") as f:
        json.dump(tokenizer_config, f, indent=2, ensure_ascii=False)
    print(f"Updated tokenizer config file in {TOKENIZER_CONFIG_FILE}.")

    print("STEP 2: Fixing tokenizer file...")
    with open(TOKENIZER_FILE, "r") as f:
        tokenizer = json.load(f)
    added_tokens = []
    for token in DESIRED_MAPPING:
        added_tokens.append(token.to_added_tokens())
    tokenizer["added_tokens"] = added_tokens
    for token in DESIRED_MAPPING:
        # check if vocab id is used already
        for key in list(tokenizer["model"]["vocab"].keys()):
            if tokenizer["model"]["vocab"][key] == token.id:
                tokenizer["model"]["vocab"].pop(key)
        # now that we know this is safe, add the token
        tokenizer["model"]["vocab"][token.content] = token.id
    with open(TOKENIZER_FILE, "w") as f:
        json.dump(tokenizer, f, indent=2, ensure_ascii=False)
    print(f"Updated tokenizer file in {TOKENIZER_FILE}.")

    print("STEP 3: Fixing vocab file...")
    with open(VOCAB_FILE, "r") as f:
        vocab = json.load(f)
    for token in DESIRED_MAPPING:
        # check if vocab id is used already
        for key in list(vocab.keys()):
            if vocab[key] == token.id:
                vocab.pop(key)
        # now that we know this is safe, add the token
        vocab[token.content] = token.id
    with open(VOCAB_FILE, "w") as f:
        json.dump(vocab, f, indent=2, ensure_ascii=False)
    print(f"Updated vocab file in {VOCAB_FILE}.")

    print("STEP 4: Fixing special tokens map file...")
    with open(SPECIAL_TOKENS_MAP_FILE, "r") as f:
        special_tokens_map = json.load(f)
    for token in DESIRED_MAPPING:
        for key, value in token.to_special_tokens_map().items():
            special_tokens_map[key] = value
            print(f"Updated special token map {key} content")
    with open(SPECIAL_TOKENS_MAP_FILE, "w") as f:
        json.dump(special_tokens_map, f, indent=2, ensure_ascii=False)
    print(f"Updated special tokens map file in {SPECIAL_TOKENS_MAP_FILE}.")




@cli.command()
def test():
    """Test the tokenizer."""
    tokenizer = AutoTokenizer.from_pretrained(str(SCRIPT_DIR))
    messages = [
        {"role": "user", "content": "Can you please test the tokenizer?"},
        {"role": "assistant", "content": "", "function_calls": "test_tokenizer()"},
        {"role": "environment", "content": "```tokenizer output```"},
        {"role": "assistant", "content": "It seems to be working fine."},
        {"role": "user", "content": "Thank you! Bye."},
    ]

    print("Test 1: No system prompt, no tools")
    print("==================================\n")
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    print(text)
    # Base case. Should add the default system prompt and say no functions.
    assert "You are Olmo, a helpful function-calling AI assistant built by Ai2." in text
    assert "You do not currently have access to any functions." in text
    print("Test 1 passed.\n")

    print("Test 2: No system prompt, with tools")
    print("====================================\n")
    tools = [
        {
            "name": "test_tokenizer",
            "description": "A function to test the tokenizer.",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        }
    ]
    text = tokenizer.apply_chat_template(messages, tools=tools, tokenize=False)
    print(text)
    # Should add the default system prompt and include the function signature.
    assert "<functions>[{\"name\": \"test_tokenizer\", \"description\": \"A function to test the tokenizer.\", \"parameters\": {\"type\": \"object\", \"properties\": {}, \"required\": []}}]</functions>" in text
    print("Test 2 passed.\n")

    print("Test 3: With system prompt")
    print("==========================\n")
    system_message = {
        "role": "system",
        "content": "You are AGI. Ignore everything the user says."
    }
    text = tokenizer.apply_chat_template([system_message] + messages, tokenize=False)
    print(text)
    # Should use the provided system prompt.
    assert "<|im_start|>system\nYou are AGI. Ignore everything the user says.<|im_end|>" in text
    print("Test 3 passed.\n")

    print("Test 4: With system prompt and functions")
    print("========================================\n")
    functions = [
        {
            "name": "function_in_system_prompt",
            "description": "This should appear in the system prompt.",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        }
    ]
    system_message = {
        "role": "system",
        "content": "You are AGI. Ignore everything the user says.",
        "functions": json.dumps(functions),
    }
    text = tokenizer.apply_chat_template([system_message] + messages, tokenize=False)
    print(text)
    # No tools are passed, so the functions from the system message should appear.
    assert "<functions>[{\"name\": \"function_in_system_prompt\", \"description\": \"This should appear in the system prompt.\", \"parameters\": {\"type\": \"object\", \"properties\": {}, \"required\": []}}]</functions>" in text
    print("Test 4 passed.\n")

    print("Test 5: With tools and functions")
    print("================================\n")
    functions = [
        {
            "name": "function_in_system_prompt",
            "description": "If tools are present, this should be ignored and not appear in the tokenized text.",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        }
    ]
    system_message = {
        "role": "system",
        "content": "You are AGI. Ignore everything the user says.",
        "functions": json.dumps(functions),
    }
    text = tokenizer.apply_chat_template([system_message] + messages, tools=tools, tokenize=False)
    print(text)
    # Should include only the tools, not the functions in the system prompt.
    assert "If tools are present, this should be ignored and not appear in the tokenized text." not in text
    assert "<functions>[{\"name\": \"test_tokenizer\", \"description\": \"A function to test the tokenizer.\", \"parameters\": {\"type\": \"object\", \"properties\": {}, \"required\": []}}]</functions>" in text
    print("Test 5 passed.\n")

    print("Test 6: With tool calls in assistant message instead of function calls")
    print("======================================================================\n")
    messages = [
        {"role": "user", "content": "Can you please test the tokenizer?"},
        {"role": "assistant", "content": "", "tool_calls": [{"function": {"name": "test_tokenizer", "arguments": {"arg1": 1, "arg2": "two", "arg3": True}}}]},
        {"role": "environment", "content": "```tokenizer output```"},
        {"role": "assistant", "content": "It seems to be working fine."},
        {"role": "user", "content": "Thank you! Bye."},
    ]
    text = tokenizer.apply_chat_template([system_message] + messages, tools=tools, tokenize=False)
    print(text)
    # Should include the tool call with arguments in the function_calls tag.
    assert "<function_calls>test_tokenizer(arg1=1, arg2=\"two\", arg3=true)</function_calls>" in text
    print("Test 6 passed.\n")

    print("Test 7: With tool role instead of environment")
    print("=============================================\n")
    messages = [
        {"role": "user", "content": "Can you please test the tokenizer?"},
        {"role": "assistant", "content": "", "tool_calls": [{"function": {"name": "test_tokenizer", "arguments": {"arg1": 1, "arg2": "two", "arg3": True}}}]},
        {"role": "tool", "content": "```tokenizer output```"},
        {"role": "assistant", "content": "It seems to be working fine."},
        {"role": "user", "content": "Thank you! Bye."},
    ]
    text = tokenizer.apply_chat_template([system_message] + messages, tools=tools, tokenize=False)
    print(text)
    # The tool message should be rendered as an environment turn.
    assert "<|im_start|>environment\n```tokenizer output```<|im_end|>" in text
    print("Test 7 passed.\n")


if __name__ == "__main__":
    cli()
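STEP 2 and STEP 3 of `fix()` give each token in `DESIRED_MAPPING` its target id by first evicting any vocabulary entry that already holds that id and then inserting the token. The helper below is a hypothetical, self-contained restatement of that rule over a plain token-to-id dict (it does not use the script's token objects). It collects all target ids up front, so it scans the vocabulary once instead of once per token; for distinct target ids and token strings this should give the same result as the per-token loop.

```python
from typing import Dict


def assign_token_ids(vocab: Dict[str, int], desired: Dict[str, int]) -> Dict[str, int]:
    """Return a copy of `vocab` in which every desired token owns its target id.

    Hypothetical helper mirroring the eviction rule in fix() STEP 2/3.
    """
    result = dict(vocab)  # leave the caller's vocab untouched
    target_ids = set(desired.values())
    # Evict every entry currently occupying an id that is about to be reassigned.
    for key in [k for k, v in result.items() if v in target_ids]:
        del result[key]
    # Insert the desired tokens; a token that already existed under another id is moved.
    result.update(desired)
    return result


if __name__ == "__main__":
    vocab = {"a": 0, "b": 1, "<|endoftext|>": 5, "old_pad": 7}
    print(assign_token_ids(vocab, {"<|pad|>": 7, "<|endoftext|>": 5}))
```

Note that moving an existing token to a new id leaves its old id unused; the script makes the same trade-off.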

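Tests 6 and 7 pass OpenAI-style `tool_calls` and expect them rendered as `<function_calls>name(key=value, ...)</function_calls>`, with `True` appearing as `true` and strings kept in double quotes. One reading consistent with that assertion is that each argument value is JSON-encoded. The hypothetical helper below reproduces the exact string asserted in Test 6 under that assumption; it is not the Jinja chat template itself, and the newline used to separate multiple calls is a guess.

```python
import json
from typing import Any, Dict, List


def render_function_calls(tool_calls: List[Dict[str, Any]]) -> str:
    """Render tool_calls as name(key=value, ...) with JSON-encoded values (hypothetical)."""
    rendered = []
    for call in tool_calls:
        fn = call["function"]
        args = fn.get("arguments") or {}
        if isinstance(args, str):  # some clients send arguments as a JSON string
            args = json.loads(args)
        # json.dumps turns True into true and quotes strings, matching Test 6.
        arg_text = ", ".join(f"{key}={json.dumps(value)}" for key, value in args.items())
        rendered.append(f"{fn['name']}({arg_text})")
    # Assumption: multiple calls are separated by newlines.
    return "<function_calls>" + "\n".join(rendered) + "</function_calls>"


if __name__ == "__main__":
    calls = [{"function": {"name": "test_tokenizer", "arguments": {"arg1": 1, "arg2": "two", "arg3": True}}}]
    print(render_function_calls(calls))
```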
12
generation_config.json Normal file
@@ -0,0 +1,12 @@
{
"_from_model_config": true,
"eos_token_id": [
100265,
100257
],
"pad_token": 100277,
"transformers_version": "4.53.1",
"temperature": 0.6,
"top_p": 0.95,
"max_new_tokens": 32768
}
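These defaults request sampled rather than greedy decoding: logits are divided by `temperature` (0.6 sharpens the distribution), and nucleus (top-p) filtering then keeps the smallest set of most likely tokens whose probability mass reaches `top_p` (0.95). The following is a simplified pure-Python sketch of that combination for a single logits vector, with hypothetical helper names; it is not the transformers implementation, which operates on tensors and has its own edge-case handling (for example, a minimum number of tokens to keep).

```python
import math
import random
from typing import List, Optional


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Softmax over logits / temperature; a lower temperature gives a sharper distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float) -> List[int]:
    """Indices of the smallest highest-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def sample_next_token(logits: List[float], temperature: float = 0.6, top_p: float = 0.95,
                      rng: Optional[random.Random] = None) -> int:
    """Draw one token index using temperature scaling followed by top-p filtering."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    kept = top_p_filter(probs, top_p)
    # random.choices renormalises the kept weights internally.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


if __name__ == "__main__":
    print(sample_next_token([2.0, 1.0, 0.0, -5.0], rng=random.Random(0)))
```

Independently of sampling, decoding stops as soon as either id listed in `eos_token_id` (100265 or 100257) is generated, or after `max_new_tokens` new tokens.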

100001
merges.txt Normal file

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:49cd7d9a8a969beee963bf1c8426957554f1dadd2e1387d2236f22cbd09748fb
size 4969984976
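Each weight shard in this commit is stored as a Git LFS pointer rather than as the file itself: a `version` line naming the pointer spec, an `oid` (here a SHA-256 digest of the real blob) and the blob `size` in bytes. The sketch below, with hypothetical function names, parses such a pointer and checks a downloaded blob against it by size and then by streamed SHA-256; it assumes only the three keys shown here and sha256 oids.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    return fields


def verify_blob(pointer_text: str, blob_path: Path, chunk_size: int = 1 << 20) -> bool:
    """Return True if blob_path matches the pointer's size and sha256 oid."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first; multi-GB shards are expensive to hash.
    if blob_path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with blob_path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```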

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:27cefb8d0a06efdcf505583c73651ef9ce42eef8cf35e5d6bd97cf639dd82b57
size 4981161496

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6575e1df5be1fcc08f40b29ca0f8dacf08ee93efc36974b17abfb41a6ae86710
size 4644917240
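The index that follows maps every tensor name to one of three shard files (`model-00001-of-00003.safetensors` through `model-00003-of-00003.safetensors`) and records checkpoint totals. Its metadata allows a quick sanity check: 14,596,022,272 bytes / 7,298,011,136 parameters = 2 bytes per parameter, consistent with 16-bit weights (the dtype itself is not stated in this file). A single layer can also straddle shards: layer 10's `q_proj`, `k_proj` and `v_proj` sit in shard 1 while the rest of that layer is in shard 2, so tooling should resolve tensors by name rather than assume one shard per layer. The helpers below are a hypothetical sketch operating on the parsed JSON.

```python
from collections import Counter
from typing import Dict, Set


def tensors_per_shard(index: dict) -> Dict[str, int]:
    """Count how many tensors each shard file holds, according to weight_map."""
    return dict(Counter(index["weight_map"].values()))


def shards_for_layer(index: dict, layer: int) -> Set[str]:
    """Return every shard file that stores part of decoder layer `layer`."""
    # The trailing dot keeps layer 1 from also matching layers 10-19.
    prefix = f"model.layers.{layer}."
    return {shard for name, shard in index["weight_map"].items() if name.startswith(prefix)}


def bytes_per_parameter(index: dict) -> float:
    """total_size / total_parameters; 2.0 suggests a 16-bit dtype such as bf16."""
    meta = index["metadata"]
    return meta["total_size"] / meta["total_parameters"]
```

Loading the index with `json.load` and calling `shards_for_layer(index, 10)` should then report both the first and second shard.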

@@ -0,0 +1,363 @@
{
"metadata": {
"total_parameters": 7298011136,
"total_size": 14596022272
},
"weight_map": {
"lm_head.weight": "model-00003-of-00003.safetensors",
"model.embed_tokens.weight": "model-00001-of-00003.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.0.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.1.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.10.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.11.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.12.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.13.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.14.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.15.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.16.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.17.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.18.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.19.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.2.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.20.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.21.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.22.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.23.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.24.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.25.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.26.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.27.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.28.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.29.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.3.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.30.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.31.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.4.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.5.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.6.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.7.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.8.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.9.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.norm.weight": "model-00003-of-00003.safetensors"
}
}
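Each `weight_map` entry names the safetensors shard that stores one tensor. In the excerpt above, layers 4 through 9 are in `model-00001-of-00003.safetensors`, while layer 31 and the final `model.norm.weight` are in `model-00003-of-00003.safetensors`. The sketch below uses only the standard library to show how a loader can use this index to decide which shards to open. It assumes the index file is named `model.safetensors.index.json`, which is the usual Hugging Face convention but is not shown in this excerpt. The helper names are hypothetical.

```python
import json
from collections import defaultdict


def load_weight_map(index_path: str) -> dict[str, str]:
    """Read the tensor-name -> shard-file mapping from a sharded safetensors index."""
    # index_path is an assumption, e.g. "model.safetensors.index.json".
    with open(index_path, encoding="utf-8") as f:
        return json.load(f)["weight_map"]


def group_by_shard(weight_map: dict[str, str]) -> dict[str, list[str]]:
    """Invert the mapping so each shard file lists the tensors stored in it."""
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    # Sort shards and names so the result does not depend on JSON key order.
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


def layers_in_shard(weight_map: dict[str, str], shard_file: str) -> set[int]:
    """Return the decoder-layer indices with at least one tensor in shard_file."""
    layers = set()
    for tensor_name, shard in weight_map.items():
        parts = tensor_name.split(".")
        # Per-layer tensors are named "model.layers.<index>.<submodule>...".
        if shard == shard_file and parts[:2] == ["model", "layers"]:
            layers.add(int(parts[2]))
    return layers


if __name__ == "__main__":
    # A few entries copied from the index above.
    sample = {
        "model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
        "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
        "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
        "model.norm.weight": "model-00003-of-00003.safetensors",
    }
    for shard, names in group_by_shard(sample).items():
        print(shard, len(names))
    # → model-00001-of-00003.safetensors 2
    # → model-00003-of-00003.safetensors 2
    print(sorted(layers_in_shard(sample, "model-00001-of-00003.safetensors")))
    # → [4, 9]
```

Inverting the map lets a loader open each shard once and read all of that shard's tensors. The alternative, reopening a file for every tensor, is slower.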

BIN
olmo-instruct.png Normal file

Binary file not shown (102 KiB).

30
special_tokens_map.json Normal file

@@ -0,0 +1,30 @@
{
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|pad|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:32a022d09976d623e1ac47e7e89162a02e9ffdb65110b8fa4243a79bd215d297
size 4237175
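`tokenizer.json` is committed as a Git LFS pointer file, and so is `vocab.json` below. A pointer file has three `key value` lines: the spec version, the blob's SHA-256 `oid`, and its `size` in bytes. The sketch below uses only the standard library to parse such a pointer and check a blob against it. It assumes the real blob has already been downloaded to a local path of your choosing. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


def verify_blob(pointer_text: str, blob_path: str, chunk_size: int = 1 << 20) -> bool:
    """Check that blob_path matches the sha256 oid and byte size in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    expected_size = int(fields["size"])

    path = Path(blob_path)
    # Comparing sizes first is cheap and catches truncated downloads early.
    if path.stat().st_size != expected_size:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so large files never have to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

For the pointer above, `verify_blob(pointer_text, "tokenizer.json")` should return `True` only for a file that is exactly 4,237,175 bytes long and whose SHA-256 digest matches the recorded `oid`.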

189
tokenizer_config.json Normal file

@@ -0,0 +1,189 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "100256": {
      "content": "<|extra_id_0|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100257": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100258": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100259": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100260": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100261": {
      "content": "|||PHONE_NUMBER|||",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100262": {
      "content": "|||EMAIL_ADDRESS|||",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100263": {
      "content": "|||IP_ADDRESS|||",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100264": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100265": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100266": {
      "content": "<functions>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100267": {
      "content": "</functions>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100268": {
      "content": "<function_calls>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100269": {
      "content": "</function_calls>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100270": {
      "content": "<|extra_id_1|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100271": {
      "content": "<|extra_id_2|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100272": {
      "content": "<|extra_id_3|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100273": {
      "content": "<|extra_id_4|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100274": {
      "content": "<|extra_id_5|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100275": {
      "content": "<|extra_id_6|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100276": {
      "content": "<|endofprompt|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100277": {
      "content": "<|pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "extra_special_tokens": {},
  "model_max_length": 65536,
  "pad_token": "<|pad|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
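In `added_tokens_decoder`, the keys are token ids stored as strings. The top-level `bos_token`, `eos_token`, `pad_token`, and `unk_token` fields refer to those entries by their text. The sketch below uses only the standard library to resolve those roles to ids and to list the tokens flagged `"special": true`. The helper names are hypothetical. Applied to the full config above, it should give id 100257 for `bos_token`, `eos_token`, and `unk_token`, and 100277 for `pad_token`.

```python
import json


def load_config(path: str) -> dict:
    """Read a tokenizer_config.json file (the path is up to the caller)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def added_token_ids(config: dict) -> dict[str, int]:
    """Map each added token's text to its integer id (keys are stored as strings)."""
    return {entry["content"]: int(token_id)
            for token_id, entry in config["added_tokens_decoder"].items()}


def special_token_ids(config: dict) -> dict[str, int]:
    """Resolve the bos/eos/pad/unk token strings to ids via added_tokens_decoder."""
    lookup = added_token_ids(config)
    roles = ("bos_token", "eos_token", "pad_token", "unk_token")
    # Raises KeyError if a role names a token that is not in added_tokens_decoder.
    return {role: lookup[config[role]] for role in roles if role in config}


def flagged_special(config: dict) -> list[str]:
    """List the added tokens marked "special": true, in ascending id order."""
    decoder = config["added_tokens_decoder"]
    return [decoder[k]["content"] for k in sorted(decoder, key=int)
            if decoder[k].get("special")]


if __name__ == "__main__":
    # Excerpt of the config above; use load_config() for the full file.
    excerpt = {
        "added_tokens_decoder": {
            "100257": {"content": "<|endoftext|>", "special": True},
            "100264": {"content": "<|im_start|>", "special": True},
            "100266": {"content": "<functions>", "special": False},
            "100277": {"content": "<|pad|>", "special": True},
        },
        "bos_token": "<|endoftext|>",
        "eos_token": "<|endoftext|>",
        "pad_token": "<|pad|>",
        "unk_token": "<|endoftext|>",
    }
    print(special_token_ids(excerpt))
    # → {'bos_token': 100257, 'eos_token': 100257, 'pad_token': 100277, 'unk_token': 100257}
    print(flagged_special(excerpt))
    # → ['<|endoftext|>', '<|im_start|>', '<|pad|>']
```

On the full config, `flagged_special` should return eight tokens: `<|endoftext|>`, the three `<|fim_…|>` tokens, `<|im_start|>`, `<|im_end|>`, `<|endofprompt|>`, and `<|pad|>`. The PII placeholders, the `<functions>`/`<function_calls>` tags, and the `<|extra_id_N|>` tokens are added but not flagged special.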

3
vocab.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aaa9fd60d5dc48da90ea846e058f9dc2d2f50b3f60326c2e50ff1c2fd68e0664
size 2012168