Initialize the project; model provided by the ModelHub XC community
Model: toroe/SmolLM-3B-Science-DE Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
280
README.md
Normal file
@@ -0,0 +1,280 @@
---
language:
- de
license: other
base_model: HuggingFaceTB/SmolLM3-3B
tags:
- sft
- instruction-tuning
- reasoning
- german
- multilingual
- long-context
- fsdp
- transformers
datasets:
- DGurgurov/Nemotron-Multilingual-Reasoning
metrics:
- token_accuracy
library_name: transformers
pipeline_tag: text-generation
---

# SmolLM3-3B: German Reasoning Instruction SFT (Nemotron Multilingual Reasoning)

## Model Description

This model is a **Supervised Fine-Tuned (SFT)** version of:

`HuggingFaceTB/SmolLM3-3B`

It was fine-tuned on the **German (`de`) split** of the dataset:

`DGurgurov/Nemotron-Multilingual-Reasoning`

The goal of the training was to improve:

- German instruction following
- Step-by-step reasoning
- Long-context conversation behavior

The model was trained on chat-formatted conversations with **completion-only loss**, meaning only assistant responses contributed to optimization.

Key properties:

- Base model: SmolLM3-3B
- Language specialization: German
- Context length during training: **16,384 tokens**
- Chat-formatted dataset
- Long-context packing enabled
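
Sequence packing places several short tokenized conversations into one training sequence of at most 16,384 tokens, so fewer positions are wasted on padding. The sketch below illustrates the idea with a simple first-fit strategy. The function name `pack_lengths` and the example lengths are hypothetical, and the trainer used for this model may pack differently.

```python
def pack_lengths(lengths, max_length=16_384):
    """Greedy first-fit packing: put each sequence into the first bin with room.

    `lengths` holds token counts of individual conversations (illustrative values).
    Returns a list of bins, each a list of the lengths packed together.
    """
    bins = []
    for n in lengths:
        if n > max_length:
            raise ValueError(f"sequence of {n} tokens exceeds max_length={max_length}")
        for b in bins:
            if sum(b) + n <= max_length:
                b.append(n)  # fits into an existing packed sequence
                break
        else:
            bins.append([n])  # no bin had room, so start a new packed sequence
    return bins


packed = pack_lengths([9000, 7000, 5000, 12000, 384])
print(packed)  # [[9000, 7000, 384], [5000], [12000]]
```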

---

## Intended Uses

### Suitable For
- German conversational assistants
- Educational tutoring
- Reasoning and structured explanation tasks
- Long-document Q&A in German
- Research experiments with long-context small LLMs

### Not Suitable For
- Medical or legal advice without human review
- Autonomous decision-making
- Safety-critical systems
- High-stakes financial decisions

---

## Training Data

Dataset used:

`DGurgurov/Nemotron-Multilingual-Reasoning`

Processing configuration:

- Language filtering: **German only**
- Converted into chat messages (`prepare_messages=True`)
- Assistant-only optimization (`completion_only_loss=True`)

Only the assistant responses were used to compute loss; user and system messages were masked.
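
As an illustration of what masking means here, the sketch below replaces the labels of all non-assistant tokens with `-100`, the value that PyTorch's cross-entropy loss ignores by default. The helper name `mask_non_assistant` and the toy token IDs are hypothetical; the training script's own masking code is not shown in this card.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_assistant(token_ids, assistant_mask):
    """Build labels that keep assistant tokens and ignore everything else.

    `assistant_mask[i]` is truthy when token i belongs to an assistant turn.
    """
    if len(token_ids) != len(assistant_mask):
        raise ValueError("token_ids and assistant_mask must have the same length")
    return [
        tok if is_assistant else IGNORE_INDEX
        for tok, is_assistant in zip(token_ids, assistant_mask)
    ]


# Toy example: three prompt tokens (system/user) followed by three assistant tokens.
labels = mask_non_assistant([11, 12, 13, 21, 22, 23], [0, 0, 0, 1, 1, 1])
print(labels)  # [-100, -100, -100, 21, 22, 23]
```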

Please review the dataset card for provenance and limitations.

---

## Training Procedure

Training was performed using **Hugging Face Accelerate with FSDP (Fully Sharded Data Parallel)** across 8 processes.

### Core Setup

- Training method: Supervised fine-tuning (SFT)
- Epochs: **3**
- Maximum sequence length: **16,384** tokens
- Sequence packing: enabled
- Precision: **bfloat16**
- Kernel optimization: Liger kernel enabled
- Gradient checkpointing: enabled
- Distributed: FSDP (8 processes)

---

### Optimization

- Optimizer: `adamw_torch_fused`
- Per-device batch size: 4
- Gradient accumulation: 4
- Effective batch size per GPU: 16 sequences per optimizer step (4 × 4)
- Weight decay: 0.05
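
The batch-size figures above can be combined into a global batch size and an upper bound on tokens per optimizer step. The short calculation below is a sketch that assumes one GPU per process and that every packed sequence is filled to the maximum length of 16,384 tokens, so the token count is a ceiling rather than a measured value.

```python
per_device_batch_size = 4
gradient_accumulation_steps = 4
num_processes = 8        # assumption: one GPU per process
max_length = 16_384      # maximum packed sequence length in tokens

# Sequences contributing to one optimizer step on a single GPU.
per_gpu_batch = per_device_batch_size * gradient_accumulation_steps

# Sequences contributing to one optimizer step across all processes.
global_batch = per_gpu_batch * num_processes

# Upper bound on tokens per optimizer step (reached only with fully packed sequences).
max_tokens_per_step = global_batch * max_length

print(per_gpu_batch, global_batch, max_tokens_per_step)  # 16 128 2097152
```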

Learning rate schedule:

- Scheduler: `cosine_with_min_lr`
- Warmup ratio: 0.05
- Minimum LR: 5e-6
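
The shape of such a schedule can be sketched as linear warmup followed by cosine decay down to the minimum learning rate. This is an illustrative re-implementation, not the library's code. The peak learning rate and the total step count are not stated in this card, so `peak_lr=2e-5` and `total_steps=1000` below are hypothetical placeholders.

```python
import math


def lr_at(step, total_steps, peak_lr, min_lr=5e-6, warmup_ratio=0.05):
    """Learning rate at `step`: linear warmup, then cosine decay to `min_lr`."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


# Hypothetical values for illustration only.
for s in (0, 25, 50, 500, 1000):
    print(s, lr_at(s, total_steps=1000, peak_lr=2e-5))
```

With these placeholders the rate starts at 0, reaches the peak at the end of warmup, and ends at the configured minimum of 5e-6 on the final step.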

---

### Logging & Checkpoints

- Logging every 5 steps
- Checkpoint every 450 steps
- Weights & Biases tracking enabled
- Token accuracy logged during training

---

### Data Processing

- Dataset workers: 16
- Dataset preparation: enabled
- Chat message preparation: enabled
- German split: enabled

---

## Usage

### Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Placeholder: replace with the Hub ID or local path of this model.
model_id = "YOUR_USERNAME/YOUR_MODEL_NAME"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Warum ist der Himmel blau?"},
]

# Render the conversation with the chat template the model was trained on.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens, not the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

**Important:** Use `apply_chat_template()` when prompting. The model was trained on chat-formatted conversations, and performance will degrade without it.
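
The bundled `chat_template.jinja` also selects a reasoning mode: it defaults to `/think` and switches to `/no_think` when `enable_thinking` is false or when the system message contains a `/no_think` flag. The sketch below mirrors that selection logic in plain Python so the precedence is easy to see. The function name `resolve_reasoning_mode` is hypothetical and exists only for illustration; in practice the template itself does this when `apply_chat_template()` is called.

```python
def resolve_reasoning_mode(system_message=None, enable_thinking=True):
    """Mirror the reasoning-mode selection of the chat template.

    A flag inside the system message takes precedence over `enable_thinking`,
    and "/no_think" is checked before "/think".
    """
    mode = "/think" if enable_thinking else "/no_think"
    if system_message is not None:
        if "/no_think" in system_message:
            mode = "/no_think"
        elif "/think" in system_message:
            mode = "/think"
    return mode


print(resolve_reasoning_mode("Du bist ein hilfreicher Assistent."))            # /think
print(resolve_reasoning_mode("Du bist ein hilfreicher Assistent. /no_think"))  # /no_think
print(resolve_reasoning_mode(enable_thinking=False))                           # /no_think
```

In `/no_think` mode the template prefixes the assistant turn with an empty `<think>` block, so the model answers without an explicit reasoning section.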

---

## Evaluation

During training, **token accuracy** was logged as a diagnostic metric.

Token accuracy:
- is useful for monitoring training stability
- is **NOT** a benchmark score
- does not represent real reasoning performance
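
For orientation, token accuracy can be read as the share of supervised positions where the predicted token equals the label. The sketch below assumes that definition and that masked positions carry the label `-100`; the trainer's exact computation may differ, and the function name `token_accuracy` and the toy values are hypothetical.

```python
IGNORE_INDEX = -100  # label value for positions excluded from the loss


def token_accuracy(predicted_ids, labels):
    """Fraction of non-masked positions where the prediction matches the label."""
    supervised = [(p, y) for p, y in zip(predicted_ids, labels) if y != IGNORE_INDEX]
    if not supervised:
        return 0.0
    return sum(p == y for p, y in supervised) / len(supervised)


# Toy example: the first position is masked, two of the remaining three match.
print(token_accuracy([5, 7, 9, 4], [-100, 7, 9, 3]))
```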

For proper evaluation, use:
- German instruction-following benchmarks
- reasoning datasets
- long-context evaluation tasks

---

## Limitations

- May hallucinate facts
- Reasoning chains can still contain logical errors
- Performance near 16k context depends heavily on prompt structure
- Improvements mainly apply to German
- Smaller model size means weaker world knowledge than large LLMs
- Not aligned for safety-critical deployment

---

## Bias & Safety

This model inherits biases from:
- the base model
- the training dataset

Recommended mitigations:
- add moderation filters
- use system prompts enforcing safe behavior
- include human review for sensitive deployments

---

## License

This model is a derivative of:

`HuggingFaceTB/SmolLM3-3B`

Therefore, the original base model license and usage restrictions apply, along with any dataset terms.

Verify compatibility before commercial deployment.

---

## Reproducibility (Training Arguments)

```bash
accelerate launch --use_fsdp --num_processes 8 --config_file sft/my_config.yaml sft/sft_trainer.py \
  --model_name HuggingFaceTB/SmolLM3-3B \
  --tokenizer_name HuggingFaceTB/SmolLM3-3B \
  --dataset_path DGurgurov/Nemotron-Multilingual-Reasoning \
  --skip_prepare_dataset False \
  --lang_split de \
  --prepare_messages True \
  --completion_only_loss True \
  --max_length 16384 \
  --dataset_num_proc 16 \
  --packing True \
  --use_liger_kernel True \
  --bf16 True \
  --log_token_accuracy True \
  --optim adamw_torch_fused \
  --gradient_checkpointing True \
  --per_device_train_batch_size 4 \
  --gradient_accumulation_steps 4 \
  --ddp_find_unused_parameters False \
  --lr_scheduler_type cosine_with_min_lr \
  --lr_scheduler_kwargs '{"min_lr": 5.0e-6}' \
  --warmup_ratio 0.05 \
  --weight_decay 0.05 \
  --report_to wandb \
  --run_name smol_3b_3epochs_lns_de \
  --num_train_epochs 3 \
  --save_strategy steps \
  --logging_steps 5 \
  --save_steps 450
```
---

## Citation

If you use this model, please cite:

- `HuggingFaceTB/SmolLM3-3B`
- `DGurgurov/Nemotron-Multilingual-Reasoning`

---

## Acknowledgements

- HuggingFaceTB: SmolLM3 base model
- Nemotron Multilingual Reasoning dataset authors
- Hugging Face Accelerate and Transformers libraries
94
chat_template.jinja
Normal file
@@ -0,0 +1,94 @@
{# ───── defaults ───── #}
{%- if enable_thinking is not defined -%}
{%- set enable_thinking = true -%}
{%- endif -%}

{# ───── reasoning mode ───── #}
{%- if enable_thinking -%}
{%- set reasoning_mode = "/think" -%}
{%- else -%}
{%- set reasoning_mode = "/no_think" -%}
{%- endif -%}

{# ───── header (system message) ───── #}
{{- "<|im_start|>system\n" -}}

{%- if messages[0].role == "system" -%}
{%- set system_message = messages[0].content -%}
{%- if "/no_think" in system_message -%}
{%- set reasoning_mode = "/no_think" -%}
{%- elif "/think" in system_message -%}
{%- set reasoning_mode = "/think" -%}
{%- endif -%}
{%- set custom_instructions = system_message.replace("/no_think", "").replace("/think", "").rstrip() -%}
{%- endif -%}

{%- if "/system_override" in system_message -%}
{{- custom_instructions.replace("/system_override", "").rstrip() -}}
{{- "<|im_end|>\n" -}}
{%- else -%}
{{- "## Metadata\n\n" -}}
{{- "Knowledge Cutoff Date: June 2025\n" -}}
{%- set today = strftime_now("%d %B %Y") -%}
{{- "Today Date: " ~ today ~ "\n" -}}
{{- "Reasoning Mode: " + reasoning_mode + "\n\n" -}}

{{- "## Custom Instructions\n\n" -}}
{%- if custom_instructions -%}
{{- custom_instructions + "\n\n" -}}
{%- elif reasoning_mode == "/think" -%}
{{- "You are a helpful AI assistant named SmolLM, trained by Hugging Face. Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracking, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> Thought section </think> Solution section. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion.\n\n" -}}
{%- else -%}
{{- "You are a helpful AI assistant named SmolLM, trained by Hugging Face.\n\n" -}}
{%- endif -%}

{%- if xml_tools or python_tools or tools -%}
{{- "### Tools\n\n" -}}
{%- if xml_tools or tools -%}
{%- if tools -%}
{%- set xml_tools = tools -%}
{%- endif -%}
{%- set ns = namespace(xml_tool_string="You may call one or more functions to assist with the user query.\nYou are provided with function signatures within <tools></tools> XML tags:\n\n<tools>\n") -%}
{%- for tool in xml_tools[:] -%} {# The slicing makes sure that xml_tools is a list #}
{%- set ns.xml_tool_string = ns.xml_tool_string ~ (tool | string) ~ "\n" -%}
{%- endfor -%}
{%- set xml_tool_string = ns.xml_tool_string + "</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>" -%}
{{- xml_tool_string -}}
{%- endif -%}
{%- if python_tools -%}
{%- set ns = namespace(python_tool_string="When you send a message containing Python code between '<code>' and '</code>' tags, it will be executed in a stateful Jupyter notebook environment, and you will then be given the output to continued reasoning in an agentic loop.\n\nYou can use the following tools in your python code like regular functions:\n<tools>\n") -%}
{%- for tool in python_tools[:] -%} {# The slicing makes sure that python_tools is a list #}
{%- set ns.python_tool_string = ns.python_tool_string ~ (tool | string) ~ "\n" -%}
{%- endfor -%}
{%- set python_tool_string = ns.python_tool_string + "</tools>\n\nThe state persists between code executions: so variables that you define in one step are still available thereafter." -%}
{{- python_tool_string -}}
{%- endif -%}
{{- "\n\n" -}}
{{- "<|im_end|>\n" -}}
{%- endif -%}
{%- endif -%}
{# ───── main loop ───── #}
{%- for message in messages -%}
{%- set content = message.content if message.content is string else "" -%}
{%- if message.role == "user" -%}
{{ "<|im_start|>" + message.role + "\n" + content + "<|im_end|>\n" }}
{%- elif message.role == "assistant" -%}
{% generation %}
{%- if reasoning_mode == "/think" -%}
{{ "<|im_start|>assistant\n" + content.lstrip("\n") + "<|im_end|>\n" }}
{%- else -%}
{{ "<|im_start|>assistant\n" + "<think>\n\n</think>\n" + content.lstrip("\n") + "<|im_end|>\n" }}
{%- endif -%}
{% endgeneration %}
{%- elif message.role == "tool" -%}
{{ "<|im_start|>" + "user\n" + content + "<|im_end|>\n" }}
{%- endif -%}
{%- endfor -%}
{# ───── generation prompt ───── #}
{%- if add_generation_prompt -%}
{%- if reasoning_mode == "/think" -%}
{{ "<|im_start|>assistant\n" }}
{%- else -%}
{{ "<|im_start|>assistant\n" + "<think>\n\n</think>\n" }}
{%- endif -%}
{%- endif -%}
108
config.json
Normal file
@@ -0,0 +1,108 @@
{
  "architectures": [
    "SmolLM3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": null,
  "dtype": "float32",
  "eos_token_id": 128012,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 65536,
  "max_window_layers": 28,
  "mlp_bias": false,
  "model_type": "smollm3",
  "no_rope_layer_interval": 4,
  "no_rope_layers": [
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    0
  ],
  "num_attention_heads": 16,
  "num_hidden_layers": 36,
  "num_key_value_heads": 4,
  "pad_token_id": 128012,
  "pretraining_tp": 2,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 5000000.0,
  "sliding_window": null,
  "transformers_version": "4.57.0",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 128256
}
10
generation_config.json
Normal file
@@ -0,0 +1,10 @@
{
  "do_sample": true,
  "eos_token_id": [
    128012
  ],
  "pad_token_id": 128012,
  "temperature": 0.6,
  "top_p": 0.95,
  "transformers_version": "4.57.0"
}
3
model-00001-of-00003.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:98f23a714e8e3aad1a1e6188401d863dcba73c837d691989efd6a9900a9ba51e
size 4932711224
3
model-00002-of-00003.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:117007cda11c61011215ad10bf4f0549ee11678907a9893ca1911a1ac51c777b
size 4999889128
3
model-00003-of-00003.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:24b12aad6c88f0b8a47767229bab43c90a5c01b463302e5b39b1c947498aca53
size 3418504984
335
model.safetensors.index.json
Normal file
@@ -0,0 +1,335 @@
{
  "metadata": {
    "total_parameters": 384387328,
    "total_size": 13351067648
  },
  "weight_map": {
    "lm_head.weight": "model-00003-of-00003.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.23.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.24.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.25.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.26.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.27.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.32.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.33.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.34.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.35.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.norm.weight": "model-00003-of-00003.safetensors"
}
}
3
optimizer.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:62c5cee1bc4642cc9a791b9c1200dbc7d9a005504ec182096f9df09a3b40ef8a
size 24601100995
3
pytorch_model_fsdp.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ee9db1c68a1486d24104f7f7df7f558c188f09e1d386a5d02a24f0e11d8de04c
size 13351232180
3
rng_state_0.pth
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b093dfe59b41efeb45cc3d628d3360abaa2303bbaa489081411faf431e52941d
size 16389
3
rng_state_1.pth
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:450a0ac1645503c0b14fe9c37d77060cc76b1c9942dcfdd0e779cd526b2e98d9
size 16389
3
rng_state_2.pth
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:938b37918eac9a4cbef3805f7d2abdcef094a334f848e73ac19fcdc39d38663a
size 16389
3
rng_state_3.pth
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d8b27a54988f134299ab296b95e8c1e63d476dffdba7c6f120f2076e8688f355
size 16389
3
rng_state_4.pth
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d95f73d920296d5d9558e47894c5a2c0d649d7cb10a3b07a013d6bfbd3b8cf90
size 16389
3
rng_state_5.pth
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:70b945bb634c9daf4a00433296ecc5245b34a2b5f09017993b5f5f03b84dabea
size 16389
3
rng_state_6.pth
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bfdd1fca0dace16a59c8592c531a70661218184bb0249c5862bbfb5ab0844fc9
size 16389
3
rng_state_7.pth
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d106363f9f1b0ff898c86d083a097bf22fd84de35e5670aa299504abcc99752a
size 16389
3
scheduler.pt
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8dfc927859b95b185390f63bc27c1e0c41086b3c66aec5bf0d42c28c8979ed70
size 1465
16
special_tokens_map.json
Normal file
@@ -0,0 +1,16 @@
{
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7b6a500b662a34eb3f0374db856ba4ad7de4c81040571d78dc0d357238930005
size 17208819
2064
tokenizer_config.json
Normal file
File diff suppressed because it is too large
5488
trainer_state.json
Normal file
File diff suppressed because it is too large
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:71160e82654feeadf7cf9fc628b07cb2ada8c311cee0b0394bfdad3c074b23ce
size 6417