Initialize project; model provided by the ModelHub XC community
Model: tbilisi-ai-lab/kona2-12B-Instruct Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
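As a rough illustration of what the `.gitattributes` rules do, the sketch below checks a few file names from this commit against a subset of the patterns with Python's `fnmatch`. This only approximates Git's attribute matching (directory patterns such as `saved_model/**/*` are left out), and the helper name `tracked_by_lfs` is hypothetical.

```python
from fnmatch import fnmatch

# A subset of the patterns listed in .gitattributes (assumption: bare file names only).
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.onnx", "*tfevents*", "tokenizer.json"]


def tracked_by_lfs(filename: str) -> bool:
    """Return True if the bare file name matches any of the LFS patterns above."""
    return any(fnmatch(filename, pattern) for pattern in LFS_PATTERNS)


for name in ["model-00001-of-00011.safetensors", "tokenizer.json", "config.json", "README.md"]:
    kind = "LFS pointer" if tracked_by_lfs(name) else "regular Git blob"
    print(f"{name}: {kind}")
```

Files that match are stored in Git as small pointer files, which is why the `.safetensors` shards in this commit appear as three-line diffs.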
194
README.md
Normal file
@@ -0,0 +1,194 @@
---
license: apache-2.0
language:
- ka
- en
- multilingual
library_name: transformers
pipeline_tag: text-generation
tags:
- llm
- georgian
- instruct
- chat
- function-calling
- conversational
datasets:
- tbilisi-ai-lab/kona-sft-mix-2.6M
- tbilisi-ai-lab/kona-sft-function-calling-115k
- tbilisi-ai-lab/kona-sft-function-calling-ka-93k
base_model:
- tbilisi-ai-lab/kona2-12B-Base
---

# Kona2-12B-Instruct

**Kona2-12B-Instruct** is a 12-billion-parameter instruction-tuned language model for Georgian and English. Built on [Kona2-12B-Base](https://huggingface.co/tbilisi-ai-lab/kona2-12B-Base) through supervised fine-tuning (SFT), it excels at chat, question answering, and function calling.

## Model Summary

| Property | Value |
|----------|-------|
| Parameters | 12B |
| Architecture | Mistral (Transformer) |
| Context Length | 32K tokens |
| Languages | Georgian (ka), English (en), other (limited) |
| Training | Supervised Fine-Tuning (SFT) |
| Training Examples | ~2.8M instructions |
| Function Calling | Yes (Hermes format) |
| Base Model | [kona2-12B-Base](https://huggingface.co/tbilisi-ai-lab/kona2-12B-Base) |

## Intended Uses

### Primary Use Cases
- Conversational AI assistants (Georgian/English)
- Question answering and information retrieval
- Function/tool calling applications
- **Translation between Georgian and English** (especially strong)
- Code generation and explanation
- Educational and tutoring applications

## Training

### Training Data

| Dataset | Examples | Description |
|---------|----------|-------------|
| [kona-sft-mix-2.6M](https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-mix-2.6M) | 2,606,173 | Mixed instruction dataset (KA/EN) |
| [kona-sft-function-calling-115k](https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-function-calling-115k) | ~115K | Function calling (English) |
| [kona-sft-function-calling-ka-93k](https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-function-calling-ka-93k) | ~93K | Function calling (Georgian) |

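The "~2.8M instructions" figure in the model summary can be cross-checked against the table with a few lines of arithmetic. This is only a sanity check: the two function-calling counts are the rounded values from the table, so the total is approximate.

```python
# Example counts from the Training Data table; the two "~" figures are rounded.
dataset_sizes = {
    "kona-sft-mix-2.6M": 2_606_173,
    "kona-sft-function-calling-115k": 115_000,
    "kona-sft-function-calling-ka-93k": 93_000,
}

total = sum(dataset_sizes.values())
function_calling = (
    dataset_sizes["kona-sft-function-calling-115k"]
    + dataset_sizes["kona-sft-function-calling-ka-93k"]
)

print(f"total ≈ {total / 1e6:.1f}M examples")
print(f"function-calling share ≈ {function_calling / total:.1%}")
```
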
**Data Sources Include:**
- Wikipedia Q&A (RAFT-generated)
- Orca-style reasoning
- Self-instruct (Alpaca-style)
- Translation pairs (EN-KA)
- Code instructions
- Math instructions
- PersonaHub reasoning
- Glaive & Hermes function calling

### Training Procedure

- **Method:** Supervised Fine-Tuning (SFT)
- **LoRA Config:** r=256, alpha=512
- **Learning Rate:** 3e-5
- **Epochs:** 2
- **Training Context:** 32K tokens
- **Packing:** Enabled
- **Precision:** BF16
- **Infrastructure:** DeepSpeed ZeRO-2

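To give a feel for what `r=256, alpha=512` means, the sketch below computes the standard LoRA scaling factor `alpha / r` and the adapter size for a single projection. The README does not say which modules carried adapters or whether a scaling variant was used, so the `q_proj` shape (taken from `config.json`: `hidden_size` 5120, 32 heads × `head_dim` 128) and the plain `alpha / r` rule are assumptions for illustration only.

```python
# LoRA hyperparameters from the Training Procedure list.
r, alpha = 256, 512

# Assumed example layer: q_proj maps hidden_size -> num_attention_heads * head_dim.
d_in, d_out = 5120, 32 * 128

scaling = alpha / r               # multiplier applied to the low-rank update B @ A
lora_params = r * (d_in + d_out)  # A is (r x d_in), B is (d_out x r)
full_params = d_in * d_out        # size of the frozen weight matrix

print(f"scaling factor: {scaling}")
print(f"adapter parameters: {lora_params:,}")
print(f"fraction of the full matrix: {lora_params / full_params:.2%}")
```
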
## Usage

### Installation

```bash
pip install transformers torch accelerate
```

### Chat Completion

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tbilisi-ai-lab/kona2-12B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    # Georgian for "What is the capital of Georgia?"
    {"role": "user", "content": "რა არის საქართველოს დედაქალაქი?"},
]

# return_dict=True yields input_ids plus attention_mask for generate().
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# temperature only takes effect when sampling is enabled.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, not the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

### Function Calling (Hermes Format)

See the tokenizer's Jinja tool-use template (`additional_chat_templates/tool_use.jinja`) for details on how function calls are formatted.

```python
# Reuses `model` and `tokenizer` from the chat completion example.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"],
            },
        },
    }
]

messages = [
    {"role": "system", "content": "You are a helpful assistant with access to tools."},
    {"role": "user", "content": "What's the weather in Tbilisi?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)

# Keep special tokens when decoding so the <tool_call> tags stay visible.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=False))
# Output will include <tool_call>{"name": "get_weather", "arguments": {"location": "Tbilisi"}}</tool_call>
```

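Once the model has produced text in this format, the application still has to pull the JSON payload out of the `<tool_call>` tags. A minimal sketch of that step, using only the standard library, might look like the following; the helper name `extract_tool_calls` and the sample string are hypothetical, and real outputs may need stricter validation.

```python
import json
import re

# Matches everything between <tool_call> and </tool_call>, across line breaks.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed JSON tool call found in the generated text."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            continue  # skip malformed payloads instead of crashing
    return calls


# Hypothetical model output, shaped like the example in the comment above.
sample = (
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"location": "Tbilisi"}}\n'
    "</tool_call>"
)

for call in extract_tool_calls(sample):
    print(call["name"], call["arguments"])
```

The tool's result can then be appended to `messages` with the role `tool`, which the tool-use template wraps in `<tool_response>` tags, before generating again.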
## Related Models

| Model | Description |
|-------|-------------|
| [kona2-12B-Base](https://huggingface.co/tbilisi-ai-lab/kona2-12B-Base) | Pre-trained base model |
| [kona2-12B](https://huggingface.co/tbilisi-ai-lab/kona2-12B) | DPO-aligned version (recommended) |
| [kona2-small-3.8B](https://huggingface.co/tbilisi-ai-lab/kona2-small-3.8B) | Smaller 3.8B model |

## Limitations

- Training data cutoff: 2024

## Technical Specifications

- **Precision:** BF16/FP16 supported
- **Minimum VRAM:** 24 GB (with 4-bit quantization)
- **Recommended:** 48 GB+ for full precision

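For context on these figures, the memory taken by the weights alone can be estimated from the parameter count recorded in `model.safetensors.index.json`. The sketch below is a back-of-the-envelope calculation only: it ignores the KV cache, activations, and quantization overhead such as scales, so real requirements are higher.

```python
# "total_parameters" from model.safetensors.index.json in this commit.
TOTAL_PARAMETERS = 12_452_674_560

# Bytes per parameter for common storage formats (4-bit ignores scale/zero-point overhead).
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16/float16": 2.0, "4-bit": 0.5}


def weight_gib(bytes_per_param: float) -> float:
    """Approximate size of the weights in GiB for a given storage width."""
    return TOTAL_PARAMETERS * bytes_per_param / 2**30


for name, width in BYTES_PER_PARAM.items():
    print(f"{name:>18}: {weight_gib(width):6.2f} GiB")
```

At 4 bytes per parameter the total is exactly the `total_size` of 49,810,698,240 bytes listed in the index file, which suggests the checkpoint is stored in float32.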
## Citation

```bibtex
@misc{tbilisi2025kona2instruct,
  title = {Kona2-12B-Instruct: A Georgian Instruction-Tuned Language Model},
  author = {Tbilisi AI Lab Team},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/tbilisi-ai-lab/kona2-12B-Instruct}}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Contact

- **Organization:** [Tbilisi AI Lab](https://huggingface.co/tbilisi-ai-lab)
- **Website:** [ailab.ge](https://ailab.ge)
- **Chat:** [chat.ailab.ge](https://chat.ailab.ge)
- **API:** [api.ailab.ge](https://api.ailab.ge)
149
additional_chat_templates/tool_use.jinja
Normal file
@@ -0,0 +1,149 @@
{%- macro json_to_python_type(json_spec) %}
{%- set basic_type_map = {
    "string": "str",
    "number": "float",
    "integer": "int",
    "boolean": "bool"
} %}

{%- if basic_type_map[json_spec.type] is defined %}
{{- basic_type_map[json_spec.type] }}
{%- elif json_spec.type == "array" %}
{{- "list[" + json_to_python_type(json_spec["items"]) + "]"}}
{%- elif json_spec.type == "object" %}
{%- if json_spec.additionalProperties is defined %}
{{- "dict[str, " + json_to_python_type(json_spec.additionalProperties) + ']'}}
{%- else %}
{{- "dict" }}
{%- endif %}
{%- elif json_spec.type is iterable %}
{{- "Union[" }}
{%- for t in json_spec.type %}
{{- json_to_python_type({"type": t}) }}
{%- if not loop.last %}
{{- "," }}
{%- endif %}
{%- endfor %}
{{- "]" }}
{%- else %}
{{- "Any" }}
{%- endif %}
{%- endmacro %}<s>
{{- '<|im_start|>system
' }}
{{- "You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> " }}
{%- for tool in tools %}
{%- if tool.function is defined %}
{%- set tool = tool.function %}
{%- endif %}
{{- '{"type": "function", "function": ' }}
{{- '{"name": "' + tool.name + '", ' }}
{{- '"description": "' + tool.name + '(' }}
{%- for param_name, param_fields in tool.parameters.properties|items %}
{{- param_name + ": " + json_to_python_type(param_fields) }}
{%- if not loop.last %}
{{- ", " }}
{%- endif %}
{%- endfor %}
{{- ")" }}
{%- if tool.return is defined %}
{{- " -> " + json_to_python_type(tool.return) }}
{%- endif %}
{{- " - " + tool.description + "

" }}
{%- for param_name, param_fields in tool.parameters.properties|items %}
{%- if loop.first %}
{{- " Args:
" }}
{%- endif %}
{{- " " + param_name + "(" + json_to_python_type(param_fields) + "): " + param_fields.description|trim }}
{%- endfor %}
{%- if tool.return is defined and tool.return.description is defined %}
{{- "
Returns:
" + tool.return.description }}
{%- endif %}
{{- '"' }}
{{- ', "parameters": ' }}
{%- if tool.parameters.properties | length == 0 %}
{{- "{}" }}
{%- else %}
{{- tool.parameters|tojson }}
{%- endif %}
{{- "}" }}
{%- if not loop.last %}
{{- "
" }}
{%- endif %}
{%- endfor %}
{{- " </tools>" }}
{{- 'Use the following pydantic model json schema for each tool call you will make: {"properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"], "title": "FunctionCall", "type": "object"}}
' }}
{{- "For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
" }}
{{- "<tool_call>
" }}
{{- '{"name": <function-name>, "arguments": <args-dict>}
' }}
{{- '</tool_call><|im_end|>
' }}
{%- for message in messages %}
{%- if message.role == "user" or message.role == "system" or (message.role == "assistant" and message.tool_calls is not defined) %}
{{- '<|im_start|>' + message.role + '
' + message.content + '<|im_end|>' + '
' }}
{%- elif message.role == "assistant" %}
{{- '<|im_start|>' + message.role }}
{%- for tool_call in message.tool_calls %}
{{- '
<tool_call>
' }} {%- if tool_call.function is defined %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '{' }}
{{- '"name": "' }}
{{- tool_call.name }}
{{- '"' }}
{{- ', '}}
{%- if tool_call.arguments is defined %}
{{- '"arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments|tojson }}
{%- endif %}
{%- endif %}
{{- '}' }}
{{- '
</tool_call>' }}
{%- endfor %}
{{- '<|im_end|>
' }}
{%- elif message.role == "tool" %}
{%- if loop.previtem and loop.previtem.role != "tool" %}
{{- '<|im_start|>tool
' }}
{%- endif %}
{{- '<tool_response>
' }}
{{- message.content }}
{%- if not loop.last %}
{{- '
</tool_response>
' }}
{%- else %}
{{- '
</tool_response>' }}
{%- endif %}
{%- if not loop.last and loop.nextitem.role != "tool" %}
{{- '<|im_end|>' }}
{%- elif loop.last %}
{{- '<|im_end|>' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant
' }}
{%- endif %}
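The `json_to_python_type` macro at the top of this template maps a JSON Schema fragment to a Python-style type name for the rendered tool signature. A plain-Python sketch of the same mapping is shown below for readers who find Jinja hard to follow. It is not a line-for-line port: it reads the element schema of an array from the `items` key, which is presumably the intent of the array branch, and it treats only list-valued `type` fields as unions.

```python
BASIC_TYPE_MAP = {"string": "str", "number": "float", "integer": "int", "boolean": "bool"}


def json_to_python_type(spec: dict) -> str:
    """Map a JSON Schema fragment to a Python-style type name (illustrative sketch)."""
    json_type = spec.get("type")
    if isinstance(json_type, str) and json_type in BASIC_TYPE_MAP:
        return BASIC_TYPE_MAP[json_type]
    if json_type == "array":
        return "list[" + json_to_python_type(spec.get("items", {})) + "]"
    if json_type == "object":
        if "additionalProperties" in spec:
            return "dict[str, " + json_to_python_type(spec["additionalProperties"]) + "]"
        return "dict"
    if isinstance(json_type, (list, tuple)):
        # A list of type names becomes a Union, joined with "," as in the template.
        return "Union[" + ",".join(json_to_python_type({"type": t}) for t in json_type) + "]"
    return "Any"


def render_signature(tool: dict) -> str:
    """Build the `name(param: type, ...)` prefix the template puts in the description."""
    params = tool["parameters"]["properties"]
    args = ", ".join(f"{name}: {json_to_python_type(fields)}" for name, fields in params.items())
    return f"{tool['name']}({args})"


weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string", "description": "City name"}},
    },
}
print(render_signature(weather_tool))
```
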
6
chat_template.jinja
Normal file
@@ -0,0 +1,6 @@
<s>{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system
You are a helpful assistant.<|im_end|>
' }}{% endif %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
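The default chat template is compact, so it may help to see the same logic written out in Python. The sketch below mirrors the template as shown: a literal `<s>` prefix, a default system turn when the first message is not a system message, ChatML-style turns, and an optional assistant header. The function name `render_chatml` is hypothetical, and in practice `tokenizer.apply_chat_template` should be used instead.

```python
DEFAULT_SYSTEM = "You are a helpful assistant."


def render_chatml(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Render messages the way chat_template.jinja does (illustrative re-implementation)."""
    parts = ["<s>"]
    # Inject the default system prompt only when the conversation lacks one.
    if messages and messages[0]["role"] != "system":
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Hi"}], add_generation_prompt=True)
print(prompt)
```
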
34
config.json
Normal file
@@ -0,0 +1,34 @@
{
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151072,
  "dtype": "float32",
  "eos_token_id": 151073,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 1024000,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 40,
  "num_key_value_heads": 8,
  "pad_token_id": 10,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "tool_call_bos_token_id": 151076,
  "tool_call_eos_token_id": 151077,
  "tool_response_bos_token_id": 151079,
  "tool_response_eos_token_id": 151080,
  "tool_response_token_id": 151078,
  "tools_bos_token_id": 151074,
  "tools_eos_token_id": 151075,
  "transformers_version": "4.57.1",
  "use_cache": false,
  "vocab_size": 151081
}
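The values in `config.json` are enough to reproduce the model's parameter count. The sketch below assumes the usual Mistral layout with bias-free linear layers (consistent with the weight map, which lists only `.weight` tensors), two RMSNorm weights per layer, and one final norm; the function name `count_parameters` is hypothetical.

```python
# Relevant fields copied from config.json.
config = {
    "hidden_size": 5120,
    "intermediate_size": 14336,
    "num_hidden_layers": 40,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 151081,
    "tie_word_embeddings": False,
}


def count_parameters(cfg: dict) -> int:
    """Count weights for a Mistral-style decoder without biases (assumed layout)."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]     # query/output width
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]    # grouped-query K/V width
    attention = h * q_dim + 2 * h * kv_dim + q_dim * h       # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]                   # gate, up, down projections
    norms = 2 * h                                            # input + post-attention norms
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


total = count_parameters(config)
print(f"{total:,} parameters")
```

Under these assumptions the result is 12,452,674,560, the same value as `total_parameters` in `model.safetensors.index.json`.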
7
generation_config.json
Normal file
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 151072,
  "eos_token_id": 151073,
  "pad_token_id": 10,
  "transformers_version": "4.57.1"
}
3
model-00001-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:de7cc3b7c7bd32ded4cd09b6aa8cb8a342ebbcc37cf8eeb4a8533a9ae7a713aa
size 4981618504
3
model-00002-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0ecbad7a561499b00e972bfedabcc54c45be22d28d2f2cbcfdc1ebe7314bf3cf
size 4865602392
3
model-00003-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e90602af84a147d47d870fd71d9c10116a11949180256a79e43dd0a19d587362
size 4949446944
3
model-00004-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0f77a6a861b25bfaeeee56aca1ee71c4cf93e4ef6f5f552cab76ecacb7548a3f
size 4865602440
3
model-00005-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9df8d4e550b060a2c960e2b0222596767d48f7d9789bd56a04106a7478a34fc2
size 4949446968
3
model-00006-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5bbc10720f0e5b16a5dffe3aaa705b2c773791b508a854d2c4a53ca21e216e7e
size 4865602440
3
model-00007-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9828d21b3565a60c486cc2413da61fa263fcc523c6cbe5d67d041589b84f14d2
size 4949446968
3
model-00008-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ee3d601a7219f6f33673bfeabc928f299e223b165c302d9e5c17c982450f866f
size 4865602440
3
model-00009-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a531a5f3d6870b959743cc22e554ab18e1c7c26ff61bc3b9d2ccdf0bc92afeb3
size 4949446968
3
model-00010-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:046161ff47a5b107b298bbc8565ceed238133cd840be83257f6eabc608383b19
size 2474785256
3
model-00011-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4fd574241d6733f193de2b062d71ccde01eaeef4023869c179cd1a5cdc278a4a
size 3094139008
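Each shard above is committed as a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer and checking a downloaded shard against it is shown below; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the first shard.

```python
import hashlib
import os

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:de7cc3b7c7bd32ded4cd09b6aa8cb8a342ebbcc37cf8eeb4a8533a9ae7a713aa
size 4981618504
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Check a local file against the pointer's size and SHA-256 digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):  # hash in 1 MiB chunks
            sha.update(chunk)
    return sha.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], pointer["size"])
```
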
371
model.safetensors.index.json
Normal file
@@ -0,0 +1,371 @@
{
  "metadata": {
    "total_parameters": 12452674560,
    "total_size": 49810698240
  },
  "weight_map": {
    "lm_head.weight": "model-00011-of-00011.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00011.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00011.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00011.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00002-of-00011.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00002-of-00011.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00004-of-00011.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00004-of-00011.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00004-of-00011.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00004-of-00011.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00004-of-00011.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00004-of-00011.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00004-of-00011.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00011.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00004-of-00011.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00011.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00011.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.15.input_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.15.mlp.down_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.15.mlp.gate_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.15.mlp.up_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.15.post_attention_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.15.self_attn.o_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.16.input_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.mlp.down_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.mlp.gate_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.mlp.up_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.post_attention_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.self_attn.o_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.input_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.mlp.down_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.mlp.gate_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.mlp.up_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.post_attention_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.self_attn.o_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.input_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.mlp.down_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.mlp.gate_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.mlp.up_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.post_attention_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.2.input_layernorm.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.2.mlp.down_proj.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.2.mlp.gate_proj.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.2.mlp.up_proj.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.2.post_attention_layernorm.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.2.self_attn.o_proj.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.20.mlp.down_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.20.mlp.gate_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.20.mlp.up_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.20.post_attention_layernorm.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.weight": "model-00006-of-00011.safetensors",
|
||||
    "model.layers.20.self_attn.o_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00006-of-00011.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00006-of-00011.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00006-of-00011.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00006-of-00011.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00006-of-00011.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00006-of-00011.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00007-of-00011.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00007-of-00011.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00006-of-00011.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00007-of-00011.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00007-of-00011.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00007-of-00011.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00007-of-00011.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00007-of-00011.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00007-of-00011.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.28.input_layernorm.weight": "model-00008-of-00011.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model-00008-of-00011.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model-00007-of-00011.safetensors",
    "model.layers.29.input_layernorm.weight": "model-00008-of-00011.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model-00008-of-00011.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00002-of-00011.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00011.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.30.input_layernorm.weight": "model-00008-of-00011.safetensors",
    "model.layers.30.mlp.down_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.30.mlp.gate_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.30.mlp.up_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.30.post_attention_layernorm.weight": "model-00008-of-00011.safetensors",
    "model.layers.30.self_attn.k_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.30.self_attn.o_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.30.self_attn.q_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00008-of-00011.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00008-of-00011.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.32.input_layernorm.weight": "model-00008-of-00011.safetensors",
    "model.layers.32.mlp.down_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.32.mlp.gate_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.32.mlp.up_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.32.post_attention_layernorm.weight": "model-00008-of-00011.safetensors",
    "model.layers.32.self_attn.k_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.32.self_attn.o_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.32.self_attn.q_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.32.self_attn.v_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.33.input_layernorm.weight": "model-00009-of-00011.safetensors",
    "model.layers.33.mlp.down_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.33.mlp.gate_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.33.mlp.up_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.33.post_attention_layernorm.weight": "model-00009-of-00011.safetensors",
    "model.layers.33.self_attn.k_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.33.self_attn.o_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.33.self_attn.q_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.33.self_attn.v_proj.weight": "model-00008-of-00011.safetensors",
    "model.layers.34.input_layernorm.weight": "model-00009-of-00011.safetensors",
    "model.layers.34.mlp.down_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.34.mlp.gate_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.34.mlp.up_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.34.post_attention_layernorm.weight": "model-00009-of-00011.safetensors",
    "model.layers.34.self_attn.k_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.34.self_attn.o_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.34.self_attn.q_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.34.self_attn.v_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.35.input_layernorm.weight": "model-00009-of-00011.safetensors",
    "model.layers.35.mlp.down_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.35.mlp.gate_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.35.mlp.up_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.35.post_attention_layernorm.weight": "model-00009-of-00011.safetensors",
    "model.layers.35.self_attn.k_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.35.self_attn.o_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.35.self_attn.q_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.35.self_attn.v_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.36.input_layernorm.weight": "model-00009-of-00011.safetensors",
    "model.layers.36.mlp.down_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.36.mlp.gate_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.36.mlp.up_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.36.post_attention_layernorm.weight": "model-00009-of-00011.safetensors",
    "model.layers.36.self_attn.k_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.36.self_attn.o_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.36.self_attn.q_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.36.self_attn.v_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.37.input_layernorm.weight": "model-00010-of-00011.safetensors",
    "model.layers.37.mlp.down_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.37.mlp.gate_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.37.mlp.up_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.37.post_attention_layernorm.weight": "model-00010-of-00011.safetensors",
    "model.layers.37.self_attn.k_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.37.self_attn.o_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.37.self_attn.q_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.37.self_attn.v_proj.weight": "model-00009-of-00011.safetensors",
    "model.layers.38.input_layernorm.weight": "model-00010-of-00011.safetensors",
    "model.layers.38.mlp.down_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.38.mlp.gate_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.38.mlp.up_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.38.post_attention_layernorm.weight": "model-00010-of-00011.safetensors",
    "model.layers.38.self_attn.k_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.38.self_attn.o_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.38.self_attn.q_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.38.self_attn.v_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.39.input_layernorm.weight": "model-00010-of-00011.safetensors",
    "model.layers.39.mlp.down_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.39.mlp.gate_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.39.mlp.up_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.39.post_attention_layernorm.weight": "model-00010-of-00011.safetensors",
    "model.layers.39.self_attn.k_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.39.self_attn.o_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.39.self_attn.q_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.39.self_attn.v_proj.weight": "model-00010-of-00011.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00002-of-00011.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00011.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00002-of-00011.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00011.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00003-of-00011.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00003-of-00011.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00003-of-00011.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00003-of-00011.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00003-of-00011.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00011.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00003-of-00011.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00011.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00011.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00011.safetensors",
    "model.norm.weight": "model-00010-of-00011.safetensors"
  }
}
39
special_tokens_map.json
Normal file
@@ -0,0 +1,39 @@
{
  "additional_special_tokens": [
    "<tools>",
    "</tools>",
    "<tool_call>",
    "</tool_call>",
    "<|tool_response|>",
    "<tool_response>",
    "</tool_response>"
  ],
  "bos_token": {
    "content": "<|im_start|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:46bfd2b53fbea624e91628de4655dd6ede752b76dbc18af3cf4534ed7bb0b84a
size 20996702
168103
tokenizer_config.json
Normal file
File diff suppressed because it is too large