Initialize project; model provided by the ModelHub XC community
Model: martinsu/tildeopen-30b-mu-instruct Source: Original Platform
35
.gitattributes
vendored
Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
231
README.md
Normal file
@@ -0,0 +1,231 @@
---
language: [en, de, fr, es, it, pt, nl, pl, lv, et, lt, cs, sk, ro, bg, sl, hr, sv, da, fi, hu, uk, ru, zh, hi, ja, ko, el]
license: cc-by-4.0
library_name: transformers
tags: [multilingual, chatml, instruction-tuning, response-only-training]
base_model: TildeAI/TildeOpen-30b
datasets:
- HuggingFaceH4/ultrachat_200k
- utter-project/EuroBlocks-SFT-Synthetic-1124
- galileo-ai/ragbench
- martinsu/latvian-wikipedia-qa-gemma3
- yahma/alpaca-cleaned
pipeline_tag: text-generation
model-index:
- name: TildeOpen-30B-MU-Instruct
  results:
  - task: {type: text-generation, name: Multilingual QA}
    dataset: {name: EuroBlocks eval split (non-English), type: utter-project/EuroBlocks-SFT-Synthetic-1124}
    metrics:
    - {name: ROUGE-L, type: rouge-l, value: 0.2583}
    - {name: BERTScore (XLM-R-large), type: bertscore, value: 0.7495}
---

# TildeOpen-30B-MU-Instruct

Kudos to the Tilde team for a great base model and for taming that large LUMI beast. That was probably a crazy journey!

This is a fine-tuned 30B multilingual instruction model. It shows strong performance on the EuroBlocks multilingual evaluation compared to similarly sized models, with notably concise outputs.

For now, these benchmarks are basically smoke tests to verify I didn't create a disaster. Always run your own evaluations for your specific use case.

I'll definitely use it for Latvian (LV) language work as a Gemma 3 replacement; it seems more capable. It also seems to have acquired proper alignment from the broad training sets, at least at a basic level.

I'll run and publish more tests, perhaps using quantization.

On top of this fine-tune, one can use a lighter touch to nudge the model toward the right predictions.

At the moment I'm running more RAG SFT on the model; it needs more grounding behaviour.

# Run in prod:
- **1) The official TGI docker image will NOT WORK; use the vLLM docker image with `--tokenizer-mode slow`**
- **2) Use a proper system prompt in the correct language**
- **3) Use proper RAG; the model is RAG-tuned**

Write the system prompt in the correct (target) language; it helps first-token prediction for non-English output.

## Quick Facts

- **Base**: TildeOpen-30B + ChatML format
- **Training**: 1 epoch SFT, response-only masking, 163M tokens
- **Languages**: 25 (focus on European)
- **Context**: 4096 tokens (training sequence length)
- **Benchmark**: ROUGE-L 0.258 | BERTScore 0.750

## Usage

For proper production usage, check out:
https://huggingface.co/spaces/martinsu/tildeopen-30b-mu-instruct-space/blob/main/app.py
That code works.

The model runs on the official vLLM docker image with **--tokenizer-mode slow**, which is the typical production setup.

**TGI will fail.** See Known Issues below.

Write the system prompt text in the correct language. It helps accurate token prediction for all languages: the system prompts are implicit control codes the model was trained on, not random text.

Use RAG; the model is tuned for RAG usage.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# use_fast=False is critical: the model has no fast tokenizer (see Known Issues)
tokenizer = AutoTokenizer.from_pretrained("martinsu/tildeopen-30b-mu-instruct", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("martinsu/tildeopen-30b-mu-instruct", torch_dtype="auto", device_map="auto")

# Write the system prompt in the language of the conversation
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain quantum computing simply."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Setup

**Hardware**: DeepSpeed ZeRO-3, BF16, Flash Attention 2; VRAM usage of ~240 GB with some offloading.

**Hyperparameters**:
- LR: 2e-5, cosine schedule, 3% warmup
- Batch: 24 effective (2 per GPU × 2 gradient accumulation steps × 6 GPUs)
- Seq length: 4096
- Weight decay: 0.01, grad clip: 1.0
- Steps: 7,514 (1 epoch)
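
The learning-rate and batch settings above can be sketched as a few lines of arithmetic. This is only an illustration, not the training script: it assumes linear warmup followed by cosine decay to zero (the shape of the stock Transformers cosine schedule) and assumes the 3% warmup is rounded up to whole steps. The names `lr_at`, `PEAK_LR`, `WARMUP_STEPS`, and `EFFECTIVE_BATCH` are made up for this example.

```python
import math

PEAK_LR = 2e-5        # from the hyperparameter list
TOTAL_STEPS = 7514    # from the hyperparameter list
# Assumption: 3% warmup, rounded up to a whole number of steps
WARMUP_STEPS = math.ceil(0.03 * TOTAL_STEPS)

# 2 sequences per GPU x 2 gradient accumulation steps x 6 GPUs
EFFECTIVE_BATCH = 2 * 2 * 6


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch:", EFFECTIVE_BATCH)
    print("warmup steps:", WARMUP_STEPS)
    for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the rate peaks right after warmup and reaches zero at the final step.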

**Data** (163M tokens, 181K examples):
- HuggingFaceH4/ultrachat_200k (20% sampling) → 41.6K examples, 59.7M tokens
- utter-project/EuroBlocks-SFT-Synthetic-1124 (20% sampling) → 85K examples, 58.6M tokens
- galileo-ai/ragbench all 12 subsets (30% sampling) → 22K examples, 26.4M tokens
  - Subsets: covidqa, cuad, delucionqa, emanual, expertqa, finqa, hagrid, hotpotqa, msmarco, pubmedqa, tatqa, techqa
- martinsu/latvian-wikipedia-qa-gemma3 (20% sampling, filtered) → 22.3K examples, 16.7M tokens
- yahma/alpaca-cleaned (20% sampling) → 10.4K examples, 2.5M tokens
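
As a quick sanity check on the mix above, the per-dataset figures can be summed and turned into token shares. The numbers are copied from the list; the names `DATA_MIX` and `token_share` are only for this sketch. The rounded per-dataset values add up to roughly 181K examples and about 164M tokens, close to the headline figures.

```python
# (examples in thousands, tokens in millions), copied from the data list
DATA_MIX = {
    "HuggingFaceH4/ultrachat_200k": (41.6, 59.7),
    "utter-project/EuroBlocks-SFT-Synthetic-1124": (85.0, 58.6),
    "galileo-ai/ragbench": (22.0, 26.4),
    "martinsu/latvian-wikipedia-qa-gemma3": (22.3, 16.7),
    "yahma/alpaca-cleaned": (10.4, 2.5),
}

total_examples_k = sum(examples for examples, _ in DATA_MIX.values())
total_tokens_m = sum(tokens for _, tokens in DATA_MIX.values())


def token_share(name: str) -> float:
    """Fraction of all training tokens contributed by one dataset."""
    return DATA_MIX[name][1] / total_tokens_m


if __name__ == "__main__":
    for name, (examples, tokens) in DATA_MIX.items():
        avg_len = tokens * 1000 / examples  # average tokens per example
        print(f"{name}: {token_share(name):.1%} of tokens, ~{avg_len:.0f} tokens/example")
    print(f"total: {total_examples_k:.1f}K examples, {total_tokens_m:.1f}M tokens")
```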

**Language breakdown** (163M tokens across 25 languages):
- English: 117.7M (72%) - primary language
- Latvian: 16.7M (10%) - European focus
- Chinese: 10.1M (6%) - Asian coverage
- Portuguese: 3.0M (2%) - Romance
- Italian: 2.3M (1.4%) - Romance
- Spanish: 2.1M (1.3%) - Romance
- Hindi: 2.0M (1.2%)
- French: 1.8M (1.1%) - Romance
- German: 1.4M (0.8%) - Germanic
- Dutch: 1.1M (0.7%) - Germanic
- Plus 15 more: Japanese, Ukrainian, Swedish, Hungarian, Polish, Czech, Russian, Korean, Romanian, Finnish, Greek, Slovak, Norwegian, Slovenian, Estonian (4.9M combined, 3%)

**Response-only training**: A custom collator masks user and system messages, so the loss is computed only on assistant responses.

### ChatML Template Format

All training data was formatted using the ChatML template with language-specific system prompts:

```
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
```

**Language-specific system prompts** (examples):
- English: "You are a helpful AI assistant."
- Latvian: "Tu esi izpalīdzīgs mākslīgā intelekta asistents."
- German: "Sie sind ein hilfreicher KI-Assistent."
- French: "Vous êtes un assistant IA utile."
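
To make the format concrete, here is a minimal pure-Python sketch that renders messages the same way as the `chat_template.jinja` shipped in this repository. The prompts are the four examples listed above; the language keys (`"en"`, `"lv"`, ...) and the helper names `build_messages` and `render_chatml` are made up for illustration. In practice `tokenizer.apply_chat_template` does this for you.

```python
# System prompts copied from the examples above; the dict keys are hypothetical
SYSTEM_PROMPTS = {
    "en": "You are a helpful AI assistant.",
    "lv": "Tu esi izpalīdzīgs mākslīgā intelekta asistents.",
    "de": "Sie sind ein hilfreicher KI-Assistent.",
    "fr": "Vous êtes un assistant IA utile.",
}


def build_messages(user_text: str, lang: str = "en") -> list:
    """Pair the user text with the system prompt of the same language."""
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[lang]},
        {"role": "user", "content": user_text},
    ]


def render_chatml(messages: list, add_generation_prompt: bool = True) -> str:
    """Render messages as ChatML: one <|im_start|>role ... <|im_end|> block per message."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chatml(build_messages("Kas ir Rīga?", lang="lv")))
```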

**Response-only masking**: Only the assistant's response tokens (between `<|im_start|>assistant` and `<|im_end|>`, including the closing `<|im_end|>`) contribute to the loss. System and user messages are masked with label `-100`.
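
The masking rule can be sketched on plain token-id lists. This is not the custom collator used for training, only an illustration of the rule as described. The ids for `<|im_start|>` and `<|im_end|>` come from `added_tokens.json`; the ids `9001` and `9002` standing in for the `assistant` role text and the newline are placeholders, as are all other small ids in the demo.

```python
IGNORE_INDEX = -100
IM_START = 131073  # <|im_start|>, from added_tokens.json
IM_END = 131074    # <|im_end|>, from added_tokens.json


def mask_non_assistant(input_ids: list, assistant_header: list) -> list:
    """Return labels that keep ids only for assistant responses (incl. closing <|im_end|>)."""
    labels = [IGNORE_INDEX] * len(input_ids)
    n = len(assistant_header)
    i = 0
    while i < len(input_ids):
        if input_ids[i:i + n] == assistant_header:
            j = i + n  # first token of the response
            while j < len(input_ids):
                labels[j] = input_ids[j]
                if input_ids[j] == IM_END:
                    break  # the closing <|im_end|> is the last supervised token
                j += 1
            i = j + 1
        else:
            i += 1
    return labels


if __name__ == "__main__":
    # Placeholder ids: 9001 = "assistant", 9002 = "\n" (hypothetical values)
    header = [IM_START, 9001, 9002]
    ids = [IM_START, 1, 2, 10, 11, IM_END, 2] + header + [20, 21, IM_END, 2]
    print(mask_non_assistant(ids, header))
```

Everything outside the assistant turn, including the `<|im_start|>assistant` header itself, stays at `-100` and so contributes nothing to the loss.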

## Training Metrics (from trainer_state.json)

Continuous improvement, no plateau, no overfitting:

- **Loss**: 0.871 (start) → 0.781 (mid ~3500 steps) → 0.729 (end 7514 steps)
- **Token Accuracy**: 76.3% → 77.6% → 78.9%
- **Gradient Norm**: 3.09 → 0.97 → 1.12

Final eval: Loss 0.732, Accuracy 78.8% (train/eval loss gap of 0.003, so it doesn't look like overfitting)

## Benchmark

Smoke-test benchmark. Not state-of-the-art work.

- **Dataset**: EuroBlocks eval split (held-out 80% after training on 20%, non-English only)
- **N samples**: 150 random samples per model (English and Chinese excluded)
- **Scoring**: BERTScore, ROUGE-L
- **Generation params**: temperature=0.7, max_new_tokens=2048, seed=42
- **All models used their native chat templates**

On the EuroBlocks multilingual benchmark, the models performed as follows:

- **This model**: ROUGE-L 0.258, BERTScore 0.750, with an average output length that closely matches the reference (about 1.0x).
- **Qwen2.5-32B-Instruct**: ROUGE-L 0.185, BERTScore 0.714, but tends to be much more verbose, producing outputs around 3.0x the reference length.
- **Gemma-3-27B-IT**: ROUGE-L 0.150, BERTScore 0.690, with output length similar to the reference (about 1.0x).
- **EuroLLM-22B-Instruct**: ROUGE-L 0.077, BERTScore 0.694, and also quite verbose, with outputs around 3.0x the reference length.

**Interpretation**: Higher scores may partly reflect output length matching the reference length. Verbose models get penalized by ROUGE-L. No statistical significance was computed. This is a single benchmark only, so take it with an appropriate grain of salt.

## Known Issues
- **The base model doesn't have a fast tokenizer**
- **By default, `AutoTokenizer.from_pretrained()` loads a fast tokenizer (TGI does this too). Since this model doesn't ship one, a broken one is built on the fly. It emits tokens the model is mostly unfamiliar with (for example, id 179), which seriously degrades performance**
- **The main problem is that the model fails silently: it recognizes some of the tokens and generates with degraded quality**
- **Decoding with the broken tokenizer still gives sensible output, however, because the model only generates tokens that are in the vocabulary**
- **Phase 1 only**: SFT checkpoint, no tool use or DPO phases yet
- **Use the correct prompt language in the system role**: It scaffolds the model to predict tokens of the given language

## How vLLM (slow tokenizer enabled) and TGI (default) tokenize, example with curl:
This applies to the base model too.

TGI docker: broken output.
```bash
curl -X POST http://x:8081/tokenize -H 'Content-Type: application/json' -d '{"model":"tgi","inputs":" Hello world <|im_end|> ","add_special_tokens":true}'

[{"id":179,"text":" ","start":0,"stop":1},{"id":53914,"text":"Hello","start":1,"stop":6},{"id":179,"text":" ","start":6,"stop":7},{"id":8141,"text":"world","start":7,"stop":12},{"id":179,"text":" ","start":12,"stop":13},{"id":131074,"text":"<|im_end|>","start":13,"stop":23},{"id":179,"text":" ","start":23,"stop":24}]
```
vLLM docker: correct output.

```bash
curl -X POST http://x:8081/tokenize -H "Content-Type: application/json" -d '{"model": "martinsu/tildeopen-30b-mu-instruct", "prompt": " Hello world <|im_end|> ", "temperature": 0.7, "max_tokens": 150, "add_special_tokens":true}'
{"count":6,"max_model_len":65536,"tokens":[453,63484,8141,128948,131074,453],"token_strs":null}
```

They differ: vLLM uses the slow tokenizer and produces the tokens the model recognizes, while TGI does not.
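
The mismatch is easy to see by comparing the two id sequences directly. The ids below are copied from the two `/tokenize` responses above; the helper `compare_ids` is a made-up name. A check like this could serve as a cheap smoke test for any serving stack: tokenize a probe string and compare against the slow-tokenizer reference.

```python
# Token ids copied from the two /tokenize responses above
TGI_IDS = [179, 53914, 179, 8141, 179, 131074, 179]
VLLM_IDS = [453, 63484, 8141, 128948, 131074, 453]


def compare_ids(candidate: list, reference: list) -> dict:
    """Summarize which token ids two tokenizations share and which are unique to each."""
    cand, ref = set(candidate), set(reference)
    return {
        "identical": candidate == reference,
        "shared": sorted(cand & ref),
        "only_candidate": sorted(cand - ref),
        "only_reference": sorted(ref - cand),
    }


report = compare_ids(TGI_IDS, VLLM_IDS)

if __name__ == "__main__":
    for key, value in report.items():
        print(f"{key}: {value}")
```

Only `world` and `<|im_end|>` map to the same ids in both outputs; the whitespace and `Hello` ids differ, which fits the silent-degradation behaviour described under Known Issues.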

## Limitations & Safety

- **Not safety-tuned**: No RLHF, no red-teaming, no toxicity filtering
- **Hardware requirements**: 30B params need above-average compute
- **No harm evaluation**: ToxiGen, BBQ, etc. not run
- **Standard LLM caveats**: It's a smart token predictor, not a legal or medical professional. It can hallucinate. Use responsibly.
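
To put a number on the hardware requirement, the parameter count can be derived from the values in this repository's `config.json`. This sketch assumes the standard Llama layout (no attention or MLP biases, untied input and output embeddings, as the config indicates) and counts weights only: KV cache and activations come on top. The helper name `count_params` is made up for this example.

```python
# Values copied from config.json in this repository
CFG = {
    "vocab_size": 131102,
    "hidden_size": 6144,
    "intermediate_size": 21504,
    "num_hidden_layers": 60,
    "num_attention_heads": 48,
    "num_key_value_heads": 8,
    "head_dim": 128,
}


def count_params(cfg: dict) -> int:
    """Count weights of a Llama-style decoder without biases and with untied embeddings."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h  # q_proj, k_proj, v_proj, o_proj
    mlp = 3 * h * cfg["intermediate_size"]         # gate_proj, up_proj, down_proj
    norms = 2 * h                                  # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    embeddings = 2 * cfg["vocab_size"] * h         # embed_tokens + lm_head
    return cfg["num_hidden_layers"] * per_layer + embeddings + h  # + final norm


total_params = count_params(CFG)
bf16_bytes = total_params * 2  # 2 bytes per weight in bfloat16

if __name__ == "__main__":
    print(f"parameters: {total_params:,}")
    print(f"bf16 weights: {bf16_bytes / 1e9:.1f} GB ({bf16_bytes / 2**30:.1f} GiB)")
```

The result agrees with `total_parameters` and `total_size` in `model.safetensors.index.json`, so the weights alone need a bit over 61 GB in BF16.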

## Why It (Probably) Works

**English-dominant (72%)**: Preserves the base model's English token distribution and reasoning chains (likely optimized on English-heavy pretraining/instruction data) while extending multilingual generalization

**Diverse training set selection**: Makes it hard to overfit on a specific style, formatting, length, or distilled patterns

**Diverse language selection**: Helps with generalization and multilingual support

**Single epoch**: Avoids overfitting on instruction data. Eval loss tracks train loss closely, which points to good generalization rather than memorization

**Response-only masking**: Loss is computed only on assistant responses, not user prompts. This focuses the learning signal on output quality

**Moderate batch size (24)**: Smaller batches may reduce the risk of overshooting minima

**Limited sampling (20-30%)**: 163M tokens should be sufficient for SFT without requiring the full datasets

## Citation

```bibtex
@misc{tildeopen30b-mu-instruct,
  author = {Martins Udris},
  title = {TildeOpen-30B-MU-Instruct},
  year = {2025},
  url = {https://huggingface.co/martinsu/tildeopen-30b-mu-instruct}
}
```

---

**Contact**: martins@udris.eu | **License**: CC-BY-4.0
32
added_tokens.json
Normal file
@@ -0,0 +1,32 @@
{
  "</function>": 131083,
  "</think>": 131077,
  "</tool_call>": 131079,
  "</tool_response>": 131081,
  "<function>": 131082,
  "<think>": 131076,
  "<tool_call>": 131078,
  "<tool_response>": 131080,
  "<|cs|>": 131095,
  "<|da|>": 131092,
  "<|de|>": 131085,
  "<|ee|>": 131100,
  "<|el|>": 131098,
  "<|en|>": 131084,
  "<|es|>": 131087,
  "<|fi|>": 131093,
  "<|fr|>": 131086,
  "<|hu|>": 131097,
  "<|im_end|>": 131074,
  "<|im_start|>": 131073,
  "<|it|>": 131088,
  "<|lt|>": 131101,
  "<|lv|>": 131099,
  "<|nl|>": 131090,
  "<|pad|>": 131072,
  "<|pl|>": 131089,
  "<|pt|>": 131094,
  "<|ro|>": 131096,
  "<|sv|>": 131091,
  "<|unk|>": 131075
}
1
chat_template.jinja
Normal file
@@ -0,0 +1 @@
{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
30
config.json
Normal file
@@ -0,0 +1,30 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 131073,
  "dtype": "bfloat16",
  "eos_token_id": 131074,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 6144,
  "initializer_range": 0.005,
  "intermediate_size": 21504,
  "max_position_embeddings": 65536,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 48,
  "num_hidden_layers": 60,
  "num_key_value_heads": 8,
  "pad_token_id": 131072,
  "pretraining_tp": 1,
  "rms_norm_eps": 3e-06,
  "rope_scaling": null,
  "rope_theta": 200000,
  "tie_word_embeddings": false,
  "transformers_version": "4.57.1",
  "use_cache": true,
  "vocab_size": 131102
}
10
generation_config.json
Normal file
@@ -0,0 +1,10 @@
{
  "_from_model_config": true,
  "bos_token_id": 131073,
  "eos_token_id": [
    131074,
    48
  ],
  "pad_token_id": 131072,
  "transformers_version": "4.57.1"
}
3
model-00001-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c0691f6ac405aa3e84d5fe1a6a2c1ceab443ca869a65e928c8b74bdad690fc87
size 4958113544
3
model-00002-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4888abb7fc73aacb8aaca73e04aa1ee05fd6efb86d13033f17656090830044a4
size 4844549224
3
model-00003-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3667cea180f639dc554c3d648fb9c0e42b3c5c78ca539d40c59a3a9ac0f6a9ef
size 4844549248
3
model-00004-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:71666a438d014ce5b78374d63104da276a3f58133fa45ec7e4fa9dd4475f392c
size 4844549272
3
model-00005-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:000eeaf20de6992c5198adbabf0a93bbcb80e78bdbfface78be5bac029334e03
size 4844549272
3
model-00006-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ea46c1be53c4c1a4b20cb267770b2827719d749c86606f8f217a40f75e2afe5a
size 4844549272
3
model-00007-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bab148af3298c60957eedfc873dd20488cf6b28f250eea583e1d5c8abe3e955b
size 4844549272
3
model-00008-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6480753666f87c462794421f5e6001d30bd19d43e1a3161695307e490b550a85
size 4844549272
3
model-00009-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c1d569e39c7fd3bff02ea206aa7f53634e2af429a6f17a88898499a7c02d2d5b
size 4844549272
3
model-00010-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:49f1a0e0c7c66ef83d8ae78c3cef2c834c5c7cfcc051e1fc89e6847e35fb708b
size 4844549272
3
model-00011-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7267da229c30620a513aa2a89e376cbce48705b50a08543acab71d4c7beb0f15
size 4844549272
3
model-00012-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4d6e579535b7220c8c3d6cd358a61b08847e51c5e94320a87883cc045600d562
size 4844549272
3
model-00013-of-00013.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e62b5e4344b096e679e840fb30760c1ff6504b79a7d5436dab94018d6506b954
size 3108411080
551
model.safetensors.index.json
Normal file
@@ -0,0 +1,551 @@
{
  "metadata": {
    "total_parameters": 30678251520,
    "total_size": 61356503040
  },
  "weight_map": {
    "lm_head.weight": "model-00013-of-00013.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00005-of-00013.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00005-of-00013.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00013.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00013.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.20.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.20.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.20.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.20.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
|
||||
"model.layers.21.input_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.24.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.29.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.30.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.34.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.39.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.40.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.40.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.40.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.40.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.40.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.40.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.40.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.40.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.40.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.41.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.41.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.42.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.43.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.43.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.43.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.43.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.44.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.44.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.44.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.44.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.44.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.44.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.44.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.44.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.44.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.45.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.45.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.45.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.46.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.46.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.47.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.48.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.48.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.48.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.48.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.49.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.49.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.49.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.49.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.49.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.49.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.49.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.49.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.49.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.50.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.50.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.50.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.50.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.50.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.50.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.50.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.50.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.50.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.51.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.51.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.52.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.53.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.53.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.53.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.53.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.54.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.54.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.54.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.54.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.54.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.54.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.54.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.54.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.54.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.55.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.55.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.55.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.56.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.56.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.56.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.57.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.57.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.input_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.58.mlp.down_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.58.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.mlp.up_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.58.post_attention_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.58.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.59.input_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.59.mlp.down_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.59.mlp.gate_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.59.mlp.up_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.59.post_attention_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.59.self_attn.k_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.59.self_attn.o_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.59.self_attn.q_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.59.self_attn.v_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.8.input_layernorm.weight": "model-00003-of-00013.safetensors",
|
||||
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
|
||||
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.8.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
|
||||
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
|
||||
"model.layers.9.input_layernorm.weight": "model-00003-of-00013.safetensors",
|
||||
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
|
||||
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
|
||||
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
|
||||
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
|
||||
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
|
||||
"model.norm.weight": "model-00013-of-00013.safetensors"
|
||||
}
|
||||
}
|
||||
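The weight map above assigns each tensor name to one of 13 shard files, and a single layer can straddle two shards (layer 58 is split between shards 12 and 13). A minimal sketch of how such an index can be queried, assuming only the `weight_map` key seen above; the helper names `tensors_by_shard` and `shards_for_prefix` are hypothetical, and the snippet embeds just a handful of entries copied from the listing rather than the full file:

```python
import json
from collections import defaultdict

# A few entries copied from the weight_map above; the real index holds many more.
INDEX_SNIPPET = json.loads("""
{
  "weight_map": {
    "model.layers.58.input_layernorm.weight": "model-00013-of-00013.safetensors",
    "model.layers.58.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.58.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.59.self_attn.q_proj.weight": "model-00013-of-00013.safetensors",
    "model.norm.weight": "model-00013-of-00013.safetensors"
  }
}
""")


def tensors_by_shard(index):
    """Invert the weight map: shard file name -> sorted list of tensor names."""
    grouped = defaultdict(list)
    for name, shard in index["weight_map"].items():
        grouped[shard].append(name)
    return {shard: sorted(names) for shard, names in grouped.items()}


def shards_for_prefix(index, prefix):
    """Return the sorted set of shard files holding tensors that start with prefix."""
    return sorted(
        {shard for name, shard in index["weight_map"].items() if name.startswith(prefix)}
    )


if __name__ == "__main__":
    # Layer 58 needs two shard files; layer 59 needs only the last one.
    print(shards_for_prefix(INDEX_SNIPPET, "model.layers.58."))
    print(shards_for_prefix(INDEX_SNIPPET, "model.layers.59."))
```

Grouping by shard first is useful when loading selectively, since each shard file then only has to be opened once.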
60
special_tokens_map.json
Normal file
@@ -0,0 +1,60 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<think>",
    "</think>",
    "<tool_call>",
    "</tool_call>",
    "<tool_response>",
    "</tool_response>",
    "<function>",
    "</function>",
    "<|en|>",
    "<|de|>",
    "<|fr|>",
    "<|es|>",
    "<|it|>",
    "<|pl|>",
    "<|nl|>",
    "<|sv|>",
    "<|da|>",
    "<|fi|>",
    "<|pt|>",
    "<|cs|>",
    "<|ro|>",
    "<|hu|>",
    "<|el|>",
    "<|lv|>",
    "<|ee|>",
    "<|lt|>"
  ],
  "bos_token": {
    "content": "<|im_start|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|pad|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|unk|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
3
tokenizer.model
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1f1255dbcb60721f8a1de89ab5e16e7d8702b8751ded3783c01108074f0a7d71
size 2316877
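The three lines above are a Git LFS pointer, not the tokenizer itself: they record the spec version, a SHA-256 object ID, and the byte size of the real `tokenizer.model`. A minimal sketch of reading such a pointer and checking a downloaded blob against it, using only the standard library; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical:

```python
import hashlib

# Pointer text copied from the tokenizer.model entry above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:1f1255dbcb60721f8a1de89ab5e16e7d8702b8751ded3783c01108074f0a7d71
size 2316877
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a pointer file into a small dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """True if the bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        return False  # this sketch only handles sha256 object IDs
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["algo"], pointer["size"])
```

If a checkout shows this pointer text where the binary tokenizer should be, the LFS objects were most likely never fetched.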
308
tokenizer_config.json
Normal file
@@ -0,0 +1,308 @@
{
  "add_bos_token": false,
  "add_eos_token": false,
  "use_fast": false,
  "add_prefix_space": true,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "48": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131072": {
      "content": "<|pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131073": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131074": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131075": {
      "content": "<|unk|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131076": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131077": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131078": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131079": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131080": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131081": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131082": {
      "content": "<function>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131083": {
      "content": "</function>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131084": {
      "content": "<|en|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131085": {
      "content": "<|de|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131086": {
      "content": "<|fr|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131087": {
      "content": "<|es|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131088": {
      "content": "<|it|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131089": {
      "content": "<|pl|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131090": {
      "content": "<|nl|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131091": {
      "content": "<|sv|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131092": {
      "content": "<|da|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131093": {
      "content": "<|fi|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131094": {
      "content": "<|pt|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131095": {
      "content": "<|cs|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131096": {
      "content": "<|ro|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131097": {
      "content": "<|hu|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131098": {
      "content": "<|el|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131099": {
      "content": "<|lv|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131100": {
      "content": "<|ee|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "131101": {
      "content": "<|lt|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<think>",
    "</think>",
    "<tool_call>",
    "</tool_call>",
    "<tool_response>",
    "</tool_response>",
    "<function>",
    "</function>",
    "<|en|>",
    "<|de|>",
    "<|fr|>",
    "<|es|>",
    "<|it|>",
    "<|pl|>",
    "<|nl|>",
    "<|sv|>",
    "<|da|>",
    "<|fi|>",
    "<|pt|>",
    "<|cs|>",
    "<|ro|>",
    "<|hu|>",
    "<|el|>",
    "<|lv|>",
    "<|ee|>",
    "<|lt|>"
  ],
  "bos_token": "<|im_start|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "extra_special_tokens": {},
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<|pad|>",
  "padding_side": "left",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<|unk|>",
  "use_default_system_prompt": false,
  "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}"
}
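The `chat_template` at the end of `tokenizer_config.json` wraps every message as `<|im_start|>role`, a newline, the content, `<|im_end|>`, and another newline, then optionally opens an assistant turn. A hand-written Python equivalent can make the resulting prompt layout easier to see. This is a sketch that mirrors the Jinja template as written above, not the Transformers implementation, and `render_chat` is a hypothetical name:

```python
IM_START = "<|im_start|>"  # also configured as bos_token above
IM_END = "<|im_end|>"      # also configured as eos_token above


def render_chat(messages, add_generation_prompt=False):
    """Mirror the Jinja chat_template: one delimited block per message."""
    parts = []
    for message in messages:
        parts.append(
            IM_START + message["role"] + "\n" + message["content"] + IM_END + "\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(IM_START + "assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    chat = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Sveiki!"},
    ]
    print(render_chat(chat, add_generation_prompt=True))
```

Because `add_bos_token` and `add_eos_token` are both `false` in the config, the only turn delimiters in the prompt are the ones the template emits.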