Initialize project; model provided by the ModelHub XC community

Model: martinsu/tildeopen-30b-mu-instruct
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-19 13:09:07 +08:00
commit aafa2fa64d
23 changed files with 1300 additions and 0 deletions

35
.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

231
README.md Normal file

@@ -0,0 +1,231 @@
---
language: [en, de, fr, es, it, pt, nl, pl, lv, et, lt, cs, sk, ro, bg, sl, hr, sv, da, fi, hu, uk, ru, zh, hi, ja, ko, el]
license: cc-by-4.0
library_name: transformers
tags: [multilingual, chatml, instruction-tuning, response-only-training]
base_model: TildeAI/TildeOpen-30b
datasets:
- HuggingFaceH4/ultrachat_200k
- utter-project/EuroBlocks-SFT-Synthetic-1124
- galileo-ai/ragbench
- martinsu/latvian-wikipedia-qa-gemma3
- yahma/alpaca-cleaned
pipeline_tag: text-generation
model-index:
- name: TildeOpen-30B-MU-Instruct
results:
- task: {type: text-generation, name: Multilingual QA}
dataset: {name: EuroBlocks eval split (non-English), type: utter-project/EuroBlocks-SFT-Synthetic-1124}
metrics:
- {name: ROUGE-L, type: rouge-l, value: 0.2583}
- {name: BERTScore (XLM-R-large), type: bertscore, value: 0.7495}
---
# TildeOpen-30B-MU-Instruct
Kudos to the Tilde team for a great base model and for taming that large LUMI beast — that was probably a crazy journey!
This is a fine-tuned 30B multilingual instruction model. It shows strong performance on the EuroBlocks multilingual evaluation compared to similarly sized models, with notably concise outputs.
These benchmarks for now are basically smoke tests to verify I didn't create a disaster. Always run your own evaluations for your specific use case.
I'll definitely use it for LV language work as a Gemma 3 replacement; it seems more capable. It seems to have acquired proper alignment from broad training sets too, at least at a basic level.
I'll run and publish more tests, perhaps using quantization.
On top of this fine-tune, one can use a lighter touch to nudge the model toward the right predictions.
I'm currently running additional RAG SFT on the model to strengthen its grounding behaviour.
# Run in prod:
- **1) The official TGI Docker image will NOT WORK; use the vLLM Docker image with --tokenizer-mode slow**
- **2) Use a proper system prompt in the correct language**
- **3) Use proper RAG; the model is RAG-tuned**
Write the system prompt in the language you want the model to answer in; this helps first-token prediction for non-English output.
## Quick Facts
- **Base**: TildeOpen-30B + ChatML format
- **Training**: 1 epoch SFT, response-only masking, 163M tokens
- **Languages**: 25 (focus on European)
- **Context**: 4096 tokens
- **Benchmark**: ROUGE-L 0.258 | BERTScore 0.750
## Usage
For proper prod usage, check out:
https://huggingface.co/spaces/martinsu/tildeopen-30b-mu-instruct-space/blob/main/app.py
That code works.
It runs on the official vLLM Docker image with **--tokenizer-mode slow**, which is the typical prod setup.
**TGI will fail.** See "Known Issues" below.
Write the system prompt in the target language, using the wording seen in training. These prompts act as trained, implicit control codes for the model rather than arbitrary text, and they help token prediction in every language.
Use RAG; the model is tuned for RAG usage.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "martinsu/tildeopen-30b-mu-instruct"
# use_fast=False is critical (see Known Issues)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain quantum computing simply."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Setup
**Hardware**: DeepSpeed ZeRO-3, BF16, Flash Attention 2; about 240 GB of VRAM in use, with some offloading.
**Hyperparameters**:
- LR: 2e-5, cosine schedule, 3% warmup
- Batch: 24 effective (2 per GPU × 2 gradient-accumulation steps × 6 GPUs)
- Seq length: 4096
- Weight decay: 0.01, grad clip: 1.0
- Steps: 7,514 (1 epoch)
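As a quick sanity check on the numbers above, the effective batch size is the product of the per-GPU batch, the accumulation steps, and the GPU count. Multiplying by the step count gives a rough count of sequences seen in the epoch. This sketch assumes one training example per sequence (no packing), which the README does not state.

```python
# Values copied from the hyperparameter list above.
per_gpu_batch = 2
grad_accum_steps = 2
num_gpus = 6
optimizer_steps = 7514

# Sequences consumed per optimizer step.
effective_batch = per_gpu_batch * grad_accum_steps * num_gpus

# Approximate number of sequences seen over the whole run.
sequences_seen = effective_batch * optimizer_steps

print(effective_batch)  # 24
print(sequences_seen)   # 180336
```

The result is close to the roughly 181K examples listed under Data, which is consistent with a single pass over the sampled set.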
**Data** (163M tokens, 181K examples):
- HuggingFaceH4/ultrachat_200k (20% sampling) → 41.6K examples, 59.7M tokens
- utter-project/EuroBlocks-SFT-Synthetic-1124 (20% sampling) → 85K examples, 58.6M tokens
- galileo-ai/ragbench all 12 subsets (30% sampling) → 22K examples, 26.4M tokens
  - Subsets: covidqa, cuad, delucionqa, emanual, expertqa, finqa, hagrid, hotpotqa, msmarco, pubmedqa, tatqa, techqa
- martinsu/latvian-wikipedia-qa-gemma3 (20% sampling, filtered) → 22.3K examples, 16.7M tokens
- yahma/alpaca-cleaned (20% sampling) → 10.4K examples, 2.5M tokens
**Language breakdown** (163M tokens across 25 languages):
- English: 117.7M (72%) - primary language
- Latvian: 16.7M (10%) - European focus
- Chinese: 10.1M (6%) - Asian coverage
- Portuguese: 3.0M (2%) - Romance
- Italian: 2.3M (1.4%) - Romance
- Spanish: 2.1M (1.3%) - Romance
- Hindi: 2.0M (1.2%)
- French: 1.8M (1.1%) - Romance
- German: 1.4M (0.8%) - Germanic
- Dutch: 1.1M (0.7%) - Germanic
- Plus 15 more: Japanese, Ukrainian, Swedish, Hungarian, Polish, Czech, Russian, Korean, Romanian, Finnish, Greek, Slovak, Norwegian, Slovenian, Estonian (4.9M combined, 3%)
**Response-only training**: Custom collator masks user/system messages, loss only on assistant responses.
### ChatML Template Format
All training data was formatted using the ChatML template with language-specific system prompts:
```
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
```
**Language-specific system prompts** (examples):
- English: "You are a helpful AI assistant."
- Latvian: "Tu esi izpalīdzīgs mākslīgā intelekta asistents."
- German: "Sie sind ein hilfreicher KI-Assistent."
- French: "Vous êtes un assistant IA utile."
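A minimal sketch of how a prompt with a language-matched system message could be assembled by hand, mirroring the ChatML layout shown above. The `SYSTEM_PROMPTS` dict, its language-code keys, and the `build_chatml_prompt` helper are illustrative names, not part of the model repository; in practice `tokenizer.apply_chat_template` does the same job.

```python
# System prompts copied from the examples above; the dict itself and its
# language-code keys are hypothetical helpers for this sketch.
SYSTEM_PROMPTS = {
    "en": "You are a helpful AI assistant.",
    "lv": "Tu esi izpalīdzīgs mākslīgā intelekta asistents.",
    "de": "Sie sind ein hilfreicher KI-Assistent.",
    "fr": "Vous êtes un assistant IA utile.",
}


def build_chatml_prompt(user_text, lang="en", add_generation_prompt=True):
    """Render a single-turn ChatML prompt with a language-matched system message."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPTS[lang]},
        {"role": "user", "content": user_text},
    ]
    # Each turn is wrapped as <|im_start|>role\ncontent<|im_end|>\n
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(build_chatml_prompt("Kas ir Latvijas galvaspilsēta?", lang="lv"))
```

Keeping the system prompt and the user message in the same language follows the recommendation given earlier for non-English use.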
**Response-only masking**: Only the assistant's response tokens (everything after `<|im_start|>assistant` up to and including the closing `<|im_end|>`) contribute to the loss. System and user messages are masked with label `-100`.
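The masking rule can be sketched on plain token-id lists as follows. This is not the custom collator used for training, only an illustration of the idea. The ids for `<|im_start|>` and `<|im_end|>` come from `added_tokens.json`; the ids used for the word "assistant" and the newline in the example are hypothetical placeholders, since they depend on the tokenizer.

```python
IGNORE_INDEX = -100                 # default ignore_index of PyTorch cross-entropy
IM_START, IM_END = 131073, 131074   # ids from added_tokens.json


def mask_non_assistant(input_ids, assistant_header):
    """Return labels in which only assistant response tokens keep their ids.

    assistant_header is the id sequence for '<|im_start|>assistant\\n';
    the caller supplies it because the ids depend on the tokenizer.
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    n = len(assistant_header)
    i = 0
    while i < len(input_ids):
        if input_ids[i:i + n] == assistant_header:
            j = i + n                       # first response token
            while j < len(input_ids):
                labels[j] = input_ids[j]    # response token contributes to loss
                if input_ids[j] == IM_END:  # keep the closing <|im_end|> too
                    break
                j += 1
            i = j + 1
        else:
            i += 1
    return labels


# Toy sequence: 800/900/10 stand in for "user", "assistant", and "\n".
header = [IM_START, 900, 10]
ids = [IM_START, 800, 10, 5, 6, IM_END, 10, IM_START, 900, 10, 7, 8, IM_END, 10]
print(mask_non_assistant(ids, header))
```

Only the three positions holding `7, 8, <|im_end|>` keep their ids; every other position is set to `-100` and is skipped by the loss.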
## Training Metrics (from trainer_state.json)
Continuous improvement, no plateau, no overfitting:
**Loss**: 0.871 (start) → 0.781 (mid ~3500 steps) → 0.729 (end 7514 steps)
**Token Accuracy**: 76.3% → 77.6% → 78.9%
**Gradient Norm**: 3.09 → 0.97 → 1.12
Final eval: Loss 0.732, Accuracy 78.8% (train/eval loss gap of 0.003, which does not look like overfitting)
## Benchmark
Smoke-test benchmark, not state-of-the-art evaluation work.
**Dataset**: EuroBlocks eval split (the 80% held out after sampling 20% for training; non-English only)
**N samples**: 150 random samples per model (English and Chinese excluded)
**Scoring**: BERTScore, ROUGE-L
**Generation params**: temperature=0.7, max_new_tokens=2048, seed=42
**All models used their native chat templates**
On the EuroBlocks multilingual benchmark, the models performed as follows:
- **This model**: ROUGE-L 0.258, BERTScore 0.750, with an average output length that closely matches the reference (about 1.0x).
- **Qwen2.5-32B-Instruct**: ROUGE-L 0.185, BERTScore 0.714, but tends to be much more verbose, producing outputs around 3.0x the reference length.
- **Gemma-3-27B-IT**: ROUGE-L 0.150, BERTScore 0.690, with output length similar to the reference (about 1.0x).
- **EuroLLM-22B-Instruct**: ROUGE-L 0.077, BERTScore 0.694, and also quite verbose, with outputs around 3.0x the reference length.
**Interpretation**: Higher scores may partly reflect how closely output length matches reference length, since ROUGE-L penalizes verbose outputs. No statistical significance was computed, and this is a single benchmark, so take the results with an appropriate grain of salt.
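To illustrate why verbosity lowers ROUGE-L, here is a simplified sketch of the metric as an F1 score over the longest common subsequence of whitespace-split tokens. The README does not say which ROUGE implementation produced the numbers above, so this is an assumption-laden toy version (no stemming, no special tokenization), not the scorer used in the benchmark.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Simplified ROUGE-L F1 on whitespace tokens."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "the capital of france is paris"
concise = "the capital of france is paris"
# Same answer padded to three times the reference length.
verbose = concise + " in other words let me explain this fact in much more detail"

print(rouge_l_f1(reference, concise))
print(rouge_l_f1(reference, verbose))
```

Both candidates contain the full reference, so recall is identical; the padded answer loses only on precision, which is the length effect described above.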
## Known Issues
- **The base model does not have a fast tokenizer**
- **By default, AutoTokenizer.from_pretrained() loads a fast tokenizer (TGI does this too). Since this model does not ship one, a broken one is built on the fly. It emits tokens the model has rarely seen, for example id 179, and that seriously degrades performance**
- **The main problem is that the model fails silently: it recognizes some of the tokens and keeps generating, only with degraded quality**
- **Decoding with the broken tokenizer still yields sensible text, because the model generates tokens that are in the vocabulary**
- **Phase 1 only**: SFT checkpoint, no tool use or DPO phases yet
- **Use the correct prompt language in the system role**: It scaffolds the model toward predicting tokens in that language
## How vLLM (slow tokenizer enabled) and TGI (default) tokenize: example with curl
This applies to the base model too.
TGI Docker: broken output.
```bash
curl -X POST http://x:8081/tokenize -H 'Content-Type: application/json' -d '{"model":"tgi","inputs":" Hello world <|im_end|> ","add_special_tokens":true}'
[{"id":179,"text":" ","start":0,"stop":1},{"id":53914,"text":"Hello","start":1,"stop":6},{"id":179,"text":" ","start":6,"stop":7},{"id":8141,"text":"world","start":7,"stop":12},{"id":179,"text":" ","start":12,"stop":13},{"id":131074,"text":"<|im_end|>","start":13,"stop":23},{"id":179,"text":" ","start":23,"stop":24}]
```
vLLM Docker: correct output.
```bash
curl -X POST http://x:8081/tokenize -H "Content-Type: application/json" -d '{"model": "martinsu/tildeopen-30b-mu-instruct", "prompt": " Hello world <|im_end|> ", "temperature": 0.7, "max_tokens": 150, "add_special_tokens":true}'
{"count":6,"max_model_len":65536,"tokens":[453,63484,8141,128948,131074,453],"token_strs":null}
```
They differ: vLLM with the slow tokenizer emits the token ids the model was trained on, while TGI does not.
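Because the failure is silent, one way to catch it is to compare the ids returned by a serving endpoint against a known-good reference. The sketch below does this for the two responses shown above; the id lists are copied from those responses, and the variable names are illustrative.

```python
# Token ids copied from the two /tokenize responses above.
tgi_ids = [179, 53914, 179, 8141, 179, 131074, 179]
vllm_ids = [453, 63484, 8141, 128948, 131074, 453]

# Ids that both tokenizers agree on, and ids unique to each side.
shared = sorted(set(tgi_ids) & set(vllm_ids))
only_tgi = sorted(set(tgi_ids) - set(vllm_ids))
only_vllm = sorted(set(vllm_ids) - set(tgi_ids))

print("identical:", tgi_ids == vllm_ids)  # False
print("shared:", shared)                  # [8141, 131074]
print("only TGI:", only_tgi)              # [179, 53914]
print("only vLLM:", only_vllm)            # [453, 63484, 128948]
```

Only `world` (8141) and `<|im_end|>` (131074) survive in both outputs, which matches the observation that the model recognizes some tokens and quietly degrades on the rest.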
## Limitations & Safety
- **Not safety-tuned**: No RLHF, no red-teaming, no toxicity filtering
- **Hardware requirements**: 30B parameters need above-average compute
- **No harm evaluation**: ToxiGen, BBQ, etc. not run
- **Standard LLM caveats**: It's a smart token predictor, not a legal or medical professional. Can hallucinate. Use responsibly.
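For a sense of the hardware footprint, the parameter count can be reproduced from the values in this repository's `config.json`. The sketch assumes the standard Llama layout (bias-free projections, two RMSNorm weights per layer plus one final norm), which matches `attention_bias: false` and `mlp_bias: false` in the config.

```python
# Values copied from config.json in this repository.
vocab_size = 131102
hidden_size = 6144
intermediate_size = 21504
num_layers = 60
num_heads = 48
num_kv_heads = 8
head_dim = 128

embed = vocab_size * hidden_size      # model.embed_tokens
lm_head = vocab_size * hidden_size    # separate matrix: tie_word_embeddings is false

q_proj = hidden_size * num_heads * head_dim
kv_proj = 2 * hidden_size * num_kv_heads * head_dim  # k_proj + v_proj (grouped-query attention)
o_proj = num_heads * head_dim * hidden_size
mlp = 3 * hidden_size * intermediate_size            # gate, up, and down projections
norms = 2 * hidden_size                              # input and post-attention RMSNorm

per_layer = q_proj + kv_proj + o_proj + mlp + norms
total_params = embed + lm_head + num_layers * per_layer + hidden_size  # + final norm

bf16_bytes = total_params * 2  # two bytes per parameter in bfloat16

print(total_params)  # 30678251520
print(bf16_bytes)    # 61356503040
```

Both figures agree with `total_parameters` and `total_size` in the safetensors index, so the weights alone take about 61 GB in BF16 before any KV cache or activations.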
## Why It (Probably) Works
**English-dominant (72%)**: Preserves base model's English token distribution and reasoning chains (likely optimized on English-heavy pretraining/instruction data) while extending multilingual generalization
**Diverse training set selection**: Can't overfit on specific style, formatting, length, or distilled patterns
**Diverse language selection**: Helps with generalization and multilingual support
**Single epoch**: Avoids overfitting on instruction data. Eval loss tracks train loss closely = good generalization, not memorization
**Response-only masking**: Loss computed only on assistant responses, not user prompts. Focuses learning signal on output quality
**Moderate batch size (24)**: Smaller batches may reduce risk of overshooting minima
**Limited sampling (20-30%)**: 163M tokens should be sufficient for SFT without requiring full datasets
## Citation
```bibtex
@misc{tildeopen30b-mu-instruct,
author = {Martins Udris},
title = {TildeOpen-30B-MU-Instruct},
year = {2025},
url = {https://huggingface.co/martinsu/tildeopen-30b-mu-instruct}
}
```
---
**Contact**: martins@udris.eu | **License**: CC-BY-4.0

32
added_tokens.json Normal file

@@ -0,0 +1,32 @@
{
"</function>": 131083,
"</think>": 131077,
"</tool_call>": 131079,
"</tool_response>": 131081,
"<function>": 131082,
"<think>": 131076,
"<tool_call>": 131078,
"<tool_response>": 131080,
"<|cs|>": 131095,
"<|da|>": 131092,
"<|de|>": 131085,
"<|ee|>": 131100,
"<|el|>": 131098,
"<|en|>": 131084,
"<|es|>": 131087,
"<|fi|>": 131093,
"<|fr|>": 131086,
"<|hu|>": 131097,
"<|im_end|>": 131074,
"<|im_start|>": 131073,
"<|it|>": 131088,
"<|lt|>": 131101,
"<|lv|>": 131099,
"<|nl|>": 131090,
"<|pad|>": 131072,
"<|pl|>": 131089,
"<|pt|>": 131094,
"<|ro|>": 131096,
"<|sv|>": 131091,
"<|unk|>": 131075
}

1
chat_template.jinja Normal file

@@ -0,0 +1 @@
{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}

30
config.json Normal file

@@ -0,0 +1,30 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 131073,
"dtype": "bfloat16",
"eos_token_id": 131074,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 6144,
"initializer_range": 0.005,
"intermediate_size": 21504,
"max_position_embeddings": 65536,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 48,
"num_hidden_layers": 60,
"num_key_value_heads": 8,
"pad_token_id": 131072,
"pretraining_tp": 1,
"rms_norm_eps": 3e-06,
"rope_scaling": null,
"rope_theta": 200000,
"tie_word_embeddings": false,
"transformers_version": "4.57.1",
"use_cache": true,
"vocab_size": 131102
}

10
generation_config.json Normal file

@@ -0,0 +1,10 @@
{
"_from_model_config": true,
"bos_token_id": 131073,
"eos_token_id": [
131074,
48
],
"pad_token_id": 131072,
"transformers_version": "4.57.1"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c0691f6ac405aa3e84d5fe1a6a2c1ceab443ca869a65e928c8b74bdad690fc87
size 4958113544

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4888abb7fc73aacb8aaca73e04aa1ee05fd6efb86d13033f17656090830044a4
size 4844549224

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3667cea180f639dc554c3d648fb9c0e42b3c5c78ca539d40c59a3a9ac0f6a9ef
size 4844549248

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:71666a438d014ce5b78374d63104da276a3f58133fa45ec7e4fa9dd4475f392c
size 4844549272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:000eeaf20de6992c5198adbabf0a93bbcb80e78bdbfface78be5bac029334e03
size 4844549272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ea46c1be53c4c1a4b20cb267770b2827719d749c86606f8f217a40f75e2afe5a
size 4844549272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bab148af3298c60957eedfc873dd20488cf6b28f250eea583e1d5c8abe3e955b
size 4844549272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6480753666f87c462794421f5e6001d30bd19d43e1a3161695307e490b550a85
size 4844549272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c1d569e39c7fd3bff02ea206aa7f53634e2af429a6f17a88898499a7c02d2d5b
size 4844549272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:49f1a0e0c7c66ef83d8ae78c3cef2c834c5c7cfcc051e1fc89e6847e35fb708b
size 4844549272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7267da229c30620a513aa2a89e376cbce48705b50a08543acab71d4c7beb0f15
size 4844549272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4d6e579535b7220c8c3d6cd358a61b08847e51c5e94320a87883cc045600d562
size 4844549272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e62b5e4344b096e679e840fb30760c1ff6504b79a7d5436dab94018d6506b954
size 3108411080

@@ -0,0 +1,551 @@
{
"metadata": {
"total_parameters": 30678251520,
"total_size": 61356503040
},
"weight_map": {
"lm_head.weight": "model-00013-of-00013.safetensors",
"model.embed_tokens.weight": "model-00001-of-00013.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00013.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00013.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00013.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00013.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.13.input_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.14.input_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.18.input_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.19.input_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00013.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00013.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.20.input_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.input_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.24.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.29.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.30.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.34.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.39.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.40.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.40.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.40.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.40.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.40.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.40.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.40.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.40.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.40.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.41.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.41.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.42.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.43.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.43.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.43.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.43.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.44.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.44.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.44.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.44.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.44.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.44.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.44.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.44.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.44.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.45.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.45.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.46.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.46.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.47.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.48.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.48.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.48.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.48.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.49.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.49.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.49.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.49.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.49.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.49.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.49.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.49.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.49.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.50.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.50.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.50.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.50.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.50.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.50.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.50.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.50.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.50.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.51.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.51.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.52.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.53.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.53.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.53.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.53.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.54.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.54.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.54.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.54.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.54.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.54.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.54.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.54.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.54.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.55.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.55.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.56.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.56.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.57.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.57.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.input_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.58.mlp.down_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.58.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.mlp.up_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.58.post_attention_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.58.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.59.input_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.59.mlp.down_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.59.mlp.gate_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.59.mlp.up_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.59.post_attention_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.59.self_attn.k_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.59.self_attn.o_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.59.self_attn.q_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.59.self_attn.v_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.8.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.norm.weight": "model-00013-of-00013.safetensors"
}
}
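As an illustration of how this index can be read, here is a minimal sketch that groups tensor names by shard file and lists the shards a single layer spans. It assumes the usual sharded-checkpoint layout in which these entries sit under a top-level `weight_map` key (that key is not visible in this part of the diff). The function names `tensors_by_shard` and `shards_for_layer` are hypothetical helpers, and the excerpt contains only a handful of entries copied from the listing above.

```python
import json
from collections import defaultdict

# Small excerpt of the index, copied from the entries above.
# The "weight_map" wrapper key is an assumption about the enclosing structure.
index_excerpt = json.loads("""
{
  "weight_map": {
    "model.layers.28.input_layernorm.weight": "model-00007-of-00013.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
    "model.norm.weight": "model-00013-of-00013.safetensors"
  }
}
""")


def tensors_by_shard(weight_map):
    """Group tensor names by the shard file that stores them."""
    grouped = defaultdict(list)
    for name, shard in weight_map.items():
        grouped[shard].append(name)
    return {shard: sorted(names) for shard, names in sorted(grouped.items())}


def shards_for_layer(weight_map, layer):
    """Return the sorted shard files that hold any tensor of one decoder layer."""
    prefix = f"model.layers.{layer}."
    return sorted({shard for name, shard in weight_map.items() if name.startswith(prefix)})


if __name__ == "__main__":
    weight_map = index_excerpt["weight_map"]
    for shard, names in tensors_by_shard(weight_map).items():
        print(shard, len(names))
    print(shards_for_layer(weight_map, 28))
```

A layer whose tensors straddle a shard boundary, such as layer 28 here, needs both files to be available before that layer can be materialized.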

60
special_tokens_map.json Normal file

@@ -0,0 +1,60 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<think>",
"</think>",
"<tool_call>",
"</tool_call>",
"<tool_response>",
"</tool_response>",
"<function>",
"</function>",
"<|en|>",
"<|de|>",
"<|fr|>",
"<|es|>",
"<|it|>",
"<|pl|>",
"<|nl|>",
"<|sv|>",
"<|da|>",
"<|fi|>",
"<|pt|>",
"<|cs|>",
"<|ro|>",
"<|hu|>",
"<|el|>",
"<|lv|>",
"<|ee|>",
"<|lt|>"
],
"bos_token": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<|unk|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
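The named tokens in this file are stored as objects with a `content` field, whereas other tokenizer files may store the same tokens as plain strings. The sketch below shows one way to read either form. The helper `token_text` is a hypothetical name introduced for illustration, and the JSON is a trimmed copy of the entries above.

```python
import json

# Trimmed copy of the named-token entries from special_tokens_map.json above.
special_tokens_map = json.loads("""
{
  "bos_token": {"content": "<|im_start|>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "eos_token": {"content": "<|im_end|>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "pad_token": {"content": "<|pad|>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "unk_token": {"content": "<|unk|>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false}
}
""")


def token_text(entry):
    """Return the token string whether the entry is an object or a plain string."""
    return entry["content"] if isinstance(entry, dict) else entry


# Map each named slot (bos, eos, pad, unk) to its literal token text.
named_tokens = {
    key: token_text(value)
    for key, value in special_tokens_map.items()
    if key.endswith("_token")
}

if __name__ == "__main__":
    for key, text in named_tokens.items():
        print(f"{key}: {text}")
```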

3
tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1f1255dbcb60721f8a1de89ab5e16e7d8702b8751ded3783c01108074f0a7d71
size 2316877

308
tokenizer_config.json Normal file

@@ -0,0 +1,308 @@
{
"add_bos_token": false,
"add_eos_token": false,
"use_fast": false,
"add_prefix_space": true,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"48": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131072": {
"content": "<|pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131073": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131074": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131075": {
"content": "<|unk|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131076": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131077": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131078": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131079": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131080": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131081": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131082": {
"content": "<function>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131083": {
"content": "</function>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131084": {
"content": "<|en|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131085": {
"content": "<|de|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131086": {
"content": "<|fr|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131087": {
"content": "<|es|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131088": {
"content": "<|it|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131089": {
"content": "<|pl|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131090": {
"content": "<|nl|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131091": {
"content": "<|sv|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131092": {
"content": "<|da|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131093": {
"content": "<|fi|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131094": {
"content": "<|pt|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131095": {
"content": "<|cs|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131096": {
"content": "<|ro|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131097": {
"content": "<|hu|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131098": {
"content": "<|el|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131099": {
"content": "<|lv|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131100": {
"content": "<|ee|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"131101": {
"content": "<|lt|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<think>",
"</think>",
"<tool_call>",
"</tool_call>",
"<tool_response>",
"</tool_response>",
"<function>",
"</function>",
"<|en|>",
"<|de|>",
"<|fr|>",
"<|es|>",
"<|it|>",
"<|pl|>",
"<|nl|>",
"<|sv|>",
"<|da|>",
"<|fi|>",
"<|pt|>",
"<|cs|>",
"<|ro|>",
"<|hu|>",
"<|el|>",
"<|lv|>",
"<|ee|>",
"<|lt|>"
],
"bos_token": "<|im_start|>",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"extra_special_tokens": {},
"legacy": true,
"model_max_length": 1000000000000000019884624838656,
"pad_token": "<|pad|>",
"padding_side": "left",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<|unk|>",
"use_default_system_prompt": false,
"chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}"
}
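To make the `chat_template` above easier to follow, here is a minimal pure-Python sketch of the string it produces. It mirrors the Jinja logic by hand rather than invoking a template engine; `render_chat` is a hypothetical helper name, and the example messages are made up for illustration. In practice the template would normally be applied through the tokenizer, for example with `apply_chat_template` in Transformers.

```python
def render_chat(messages, add_generation_prompt=False):
    """Hand-written equivalent of the chat_template string in tokenizer_config.json.

    Each message becomes '<|im_start|>' + role + newline + content + '<|im_end|>' + newline.
    When add_generation_prompt is true, an open assistant turn is appended.
    """
    parts = []
    for message in messages:
        parts.append(
            "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>" + "\n"
        )
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    # Illustrative conversation; the contents are not taken from the repository.
    conversation = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Sveiki!"},
    ]
    print(render_chat(conversation, add_generation_prompt=True))
```

Because `add_bos_token` and `add_eos_token` are both `false` in this config, the turn markers in the rendered text come from the template itself rather than from the tokenizer.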