Initialize project; model provided by the ModelHub XC community

Model: Aratako/sarashina2.1-1b-sft
Source: Original Platform
ModelHub XC
2026-05-24 08:48:13 +08:00
commit 9cd1ebd182
14 changed files with 778 additions and 0 deletions

53
.gitattributes vendored Normal file

@@ -0,0 +1,53 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
training_args.bin filter=lfs diff=lfs merge=lfs -text
model.safetensors filter=lfs diff=lfs merge=lfs -text
tokenizer.model filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

98
LICENSE Normal file

@@ -0,0 +1,98 @@
Sarashina is licensed under the Sarashina Model NonCommercial License Agreement, Copyright ©SB Intuitions Corp. All Rights Reserved.
Sarashina Model NonCommercial License Agreement
Released Date by SB Intuitions Corp.: November 8, 2024
This Sarashina Model NonCommercial License Agreement (this "Agreement") is a legal agreement between Licensee (as defined in Article 1) and SB Intuitions Corp. ("SB Intuitions"), and governs Licensee's use of the Models (as defined in Article 1) that SB Intuitions provides to Licensee under this Agreement.
The Models released under this Agreement are intended to be used under this Agreement and contribute to the development of AI technologies.
By using, reproducing, modifying, distributing, performing or displaying any portion or element of the Model or Derivative Models, or otherwise accepting the terms of this Agreement, you agree to be bound by this Agreement. If you do not agree to this Agreement, you may not download the Model or Derivative Models.
1. Definition
i. "Model" means the deliverables including machine learning models, software, learnt weights, algorithms, parameters and configuration files provided by SB Intuitions and shared under this Agreement.
ii. "Output Data" means the text, images, sounds, or other information output of the Model that results from operating or otherwise using the Model.
iii. "Derivative Model" means all modifications to the Model, or deliverables created based on the Model, and any other machine learning models which have been trained, tuned, or improved by using Output Data. For clarification, Output Data is not the Derivative Model.
iv. "Licensee" means the person or entity exercising the rights granted by this Agreement.
v. "Affiliate" means any entity that now or hereafter controls, is controlled by, or is under common control with, SB Intuitions Corp. For the purposes of this definition, "control" shall mean the direct or indirect ownership over or the legal right or power to, directly or indirectly, direct or alter the direction of the management and affairs of the entity.
vi. "License" means the rights granted by Licensee under Article 2 Rightii of this Agreement.
vii. "Intellectual Property Rights" means copyrights, patents, utility model rights, design rights, trademarks and other intellectual property rights including the right to acquire such rights or to apply for the registration, etc. of such rights.
2. Right
i. Condition
Any use, reproduction, modification, distribution, performance or display of the Model or Derivative Models shall be subject to the following conditions:
(a) Upon renewal of the License, Licensee shall agree to comply with the renewed License or to cease using, copying, modifying, distributing, performing or displaying the Model or Derivative Models.
(b) Licensee shall comply with the redistribution regulations set out in Article 3.
(c) Licensee shall comply with the prohibition regulations set out in Article 6.
(d) Licensee shall not use, copy, modify, distribute, perform or display the Model for commercial purpose (except for research and study purpose).
ii. Grant of License
Subject to Licensee's compliance with this Agreement, SB Intuitions grants Licensee a non-exclusive and non-transferable license to use, copy, distribute, modify, perform or display the Model. This license also includes the creation of Derivative Models from the Model.
iii. Reservation of Rights
(a) SB Intuitions reserves all rights, interests and remedies in and to the Model or Derivatives Models by SB Intuitions, and no other rights are granted to Licensee except as expressly set forth in this Agreement.
(b) Licensee reserves the copyright to all Derivative Models created by Licensee.
(c) SB Intuitions does not claim any rights in the Output Data generated from the Model or Derivative Models. To the extent permitted by law, SB Intuitions acknowledges that the Output Data belongs to Licensee. Licensee is responsible for the use of Output Data.
3. Redistribution
i. Condition of Redistribution
Licensee may only reproduce or redistribute the Model or Derivative Models if Licensee complies with all the conditions set out in Article 2, as well as the followings:
(a) Licensee shall provide all third-party recipients of the Model or Model Derivatives with a copy of this Agreement;
(b) Licensee shall, in all copies of the Model or Derivative Models that Licensee distributes, retain the following attribution notice within a “LICENSE” text file distributed as a part of such copies:
"Sarashina is licensed under the Sarashina Model NonCommercial License Agreement, Copyright ©SB Intuitions Corp. All Rights Reserved.";
(c) If Licensee distributes the Model or Derivative Models, Licensee shall also include "Sarashina" at the beginning of the name of any such Model or Derivative Models;
(d) If Licensee uses the Model or Derivative Models as a component of the system, Licensee shall clearly indicate “Built with Sarashina” in relevant materials of the system such as documents, user interfaces, and/or websites.
4. Trademark
Nothing in this Agreement grant the Licensee permission to use the trade names, trademarks, service marks, or product names owned by SB Intuitions. However, the use within customary scope or the necessity to implement the content of this Agreement, such as quoting the deliverables, describing the origin of the deliverables and reproducing the content of the LICENSE file, is permitted.
5. Termination and Changes of the License
i. Litigation
If Licensee institutes a lawsuit (including a cross-claim or counterclaim in a lawsuit) or other legal proceeding against any party in connection with the Model or Derivative Models, this Agreement will terminate as of the date such lawsuit is filed.
ii. Changes to the License
SB Intuitions may revise this Agreement at any time, for security and/or safety reasons, to adapt to new technologies, to comply with legal and regulatory requirements, or for any other reasons. Upon the revision, SB Intuitions will replace this Agreement on the posted sites.
iii. Breach of the License
SB Intuitions may terminate this Agreement if Licensee is in breach of any term or condition of this Agreement. Upon termination of this Agreement, Licensee shall delete and cease use of the Model and any Derivative Model.
iv. Separability
If any part or provision of this Agreement is held invalid, illegal or unenforceable, the validity, legality, and enforceability of the remaining provisions shall not in any way be affected or impaired thereby.
6. Prohibited activities
Licensee shall not engage in any of the prohibited activities set out in the "Sarashina Prohibited Activities Policy".
7. DISCLAIMER
i. FUNCTIONALITY AND PERFORMANCE
THE MODEL IS PROVIDED ON AN “AS IS” BASIS. SB INTUITIONS MAKES NO WARRANTY AS TO FUNCTIONALITY OR PERFORMANCE.
ii. SUITABILITY
SB INTUITIONS MAKES NO WARRANTY, EXPRESSED OR IMPLIED, WITH RESPECT TO THE MODEL OR DERIVATIVE MODELS, AS TO ANY MATTER INCLUDING BUT NOT LIMITED TO, NON-INFRINGEMENT OF THIRD-PARTY INTELLECTUAL PROPERTY RIGHTS AND OTHER RIGHTS, MERCHANTABILITY, INTEGRITY, USABILITY, AND COMPATIBILITY FOR ANY PARTICULAR PURPOSE.
iii. APPROPRIATENESS OF USE OR REDISTRIBUTION
THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE MODEL OR DERIVATIVE MODELS SHALL BE DETERMINED BY THE LICENSEE WHO SHALL BEAR FULL RESPONSIBILITY FOR SUCH DETERMINATION AND ACTIONS.
iv. TECHNICAL SERVICES
SB INTUITIONS HAS NO OBLIGATION TO PROVIDE LICENSEE WITH ANY TECHNICAL SERVICES WITH RESPECT TO THE MODEL OR ANY DERIVATIVE MODELS.
8. LIMITATION OF LIABILITY
SB INTUITIONS (INCLUDING ITS AFFILIATES FOR THE PURPOSE OF THIS ARTICLE 8) SHALL NOT BE LIABLE UNDER ANY THEORY OF LIABILITY ARISING FROM THIS AGREEMENT, WHETHER IN AN ACTION BASED ON CONTRACT, TORT, NEGLIGENCE, PRODUCT LIABILITY OR OTHERWISE, EXCEPT IN CASES OF WILFUL MISCONDUCT OR GROSS NEGLIGNECE BY SB INTUITIONS, OR WHERE LIABILITY IS MANDATED BY LAW.
9. Compensation
Licensee shall indemnify and hold SB Intuitions (including its Affiliate for the purpose of this Article 9) harmless from and against all claims, demands, damages, losses, liabilities and expenses including, without limitation, reasonable attorneys' fees and expenses which SB Intuitions may suffer or incur as a result or as a consequence of, arising out of or in connection with the use, duplication, modification, distribution, execution or display of the Model, Derivative Models or Output Data by Licensee.
10. Trade
The Model, Derivative Models or the technology contained therein may be subject to export controls and sanctions laws and regulations of Japan and other jurisdictions (the "Export Controls"). Licensee shall comply with all applicable Export Controls. Licensee shall not access, use, download or send the Model or Derivative Models in any manner that would cause any party to violate any Export Controls.
11. Elimination of Anti-Social Forces
i. Licensee represents and warrants that it is not a gang, a gang member, an individual that left a gang within the last five (5) years, a quasi-member of a gang, a gang affiliate, corporate racketeer, a blackmailer camouflaged as a social movement activist, crime groups specialized in intellectual crimes or any other persons or groups equivalent to these (collectively "Anti-Social Forces") and that it does not fall under any of the following categories, presently and in the future:
(a) Licensee has relationships in which Anti-Social Forces are recognized to control the management;
(b) Licensee has relationships in which Anti-Social Forces are recognized to be substantially involved in the management;
(c) Licensee has relationships in which it is recognized to be engaged in unlawful use of Anti-Social Forces;
(d) Licensee provides funds or benefits to Anti-Social Forces; or
(e) Licensees officers or persons substantially involved in its management have socially reprehensible relationships with Anti-Social Forces.
ii. Licensee shall not engage in any of the following actions either by themselves or through a third party:
(a) violent demands;
(b) unreasonable demands beyond legal liability;
(c) threatening actions or use of violence in connection with transactions;
(d) spreading rumors, damaging the reputation of the counterparty by using fraudulent means or force, or obstructing the service of the other Party; or
(e) any other actions equivalent to any of the preceding items.
iii. SB Intuitions shall be entitled to unilaterally terminate this Agreement without giving any notice to Licensee when it is found that Licensee falls under the definition of Licensee or any of the items set forth in paragraph i of this Article 11, or has committed an action that falls under any of the items in paragraph ii of this Article 11, or has made a false declaration with regard to the representation or definitive agreement stated in paragraph i of this Article 11, regardless of any breach attributable to SB Intuitions. SB Intuitions shall not be responsible for any damage caused by the termination in accordance with this paragraph and incurred by Licensee.
12. Governing Law and Jurisdiction
i. Governing Law
This Agreement shall be governed by and construed in accordance with the laws of Japan.
ii. Jurisdiction
Any dispute arising from or in connection with this Agreement shall be subject to the exclusive jurisdiction of the Tokyo District Court in the first instance.

332
README.md Normal file

@@ -0,0 +1,332 @@
---
library_name: transformers
license: other
license_link: LICENSE
base_model: sbintuitions/sarashina2.1-1b
tags:
- axolotl
- generated_from_trainer
model-index:
- name: sarashina2.1-1b-sft
results: []
language:
- ja
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# sarashina2.1-1b-sft
This model is [sbintuitions/sarashina2.1-1b](https://huggingface.co/sbintuitions/sarashina2.1-1b) instruction-tuned on the author's own data.
**Sarashina is licensed under the Sarashina Model NonCommercial License Agreement, Copyright ©SB Intuitions Corp. All Rights Reserved.**
This model is built with Qwen.
# Usage
```python
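from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the model and tokenizer; device placement and dtype are chosen automatically.
model = AutoModelForCausalLM.from_pretrained("Aratako/sarashina2.1-1b-sft", device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("Aratako/sarashina2.1-1b-sft")
# Stream decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Prompt: "Please tell me about SoftBank Corp."
messages = [
    {"role": "user", "content": "ソフトバンク株式会社について教えてください。"}
]

# Render the messages with the ChatML template and open the assistant turn.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Greedy decoding. The tokenizer's EOS (<|im_end|>) is passed explicitly because
# generation_config.json in this repository lists eos_token_id 2 (</s>).
output_ids = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=False,
    streamer=streamer,
    eos_token_id=tokenizer.eos_token_id,
)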
```
```text
ソフトバンク株式会社は、日本の大手通信会社の一つで、主に携帯電話やインターネットサービス、そして様々なデジタル製品やサービスを提供しています。
1. **事業内容**:
- 携帯電話事業: ソフトバンクモバイル(現・ソフトバンク)として、携帯電話やスマートフォンの販売を行っています。
- インターネットサービス: ソフトバンクBB現・ソフトバンクBBとして、インターネット接続サービスやISP事業を行っています。
- デジタル製品: スマートフォンやタブレット、ウェアラブルデバイスなどの開発・販売も行っています。
- クラウドサービス: ソフトバンククラウドとして、クラウドサービスを提供しています。
- 金融サービス: ソフトバンクファイナンスとして、クレジットカードやローンなどの金融サービスを提供しています。
2. **グループ会社**:
- ソフトバンクグループ: ソフトバンク株式会社の親会社であり、通信事業だけでなく、テクノロジー、メディア、金融など多岐にわたる事業を展開しています。
- ソフトバンクBB: インターネット接続サービスやISP事業を行う子会社です。
- ソフトバンクモバイル: 携帯電話事業を行う子会社です。
- ソフトバンクコマース&サービス: コマース事業を行う子会社です。
- ソフトバンクテクノロジー: ITソリューション事業を行う子会社です。
3. **特徴**:
- ソフトバンクは、通信事業だけでなく、テクノロジーやメディア、金融など多岐にわたる分野で事業を展開しており、その規模と影響力は非常に大きいです。
- 「ソフトバンクグループ」として、通信事業だけでなく、テクノロジー、メディア、金融など多岐にわたる事業を展開しています。
- 「Pepper」などのロボット開発や、AI人工知能技術を活用したサービス開発にも力を入れています。
4. **社会貢献活動**:
- ソフトバンクは、社会貢献活動にも積極的に取り組んでおり、教育支援や災害時の救援活動など、様々な形で社会に貢献しています。
ソフトバンクは、通信業界だけでなく、幅広い分野で革新的なサービスを提供し、日本の社会や経済に大きな影響を与えています。
```
# Example response
## Input (from [elyza/ELYZA-tasks-100](https://huggingface.co/datasets/elyza/ELYZA-tasks-100))
```text
<|im_start|>user
仕事の熱意を取り戻すためのアイデアを5つ挙げてください。<|im_end|>
<|im_start|>assistant
```
## Output
```text
もちろん、以下にいくつかのアイデアを提案します:
1. 目標設定: 明確な目標を設定し、それを達成するための具体的なステップを計画します。これにより、仕事に対するモチベーションが高まります。
2. 休憩を取る: 長時間働くと集中力が低下します。定期的に休憩を取り、リフレッシュすることが重要です。
3. 新しいスキルを学ぶ: 新しい技術や知識を習得することで、自己成長を感じ、それが仕事への情熱を再び引き出すことができます。
4. 健康的なライフスタイルを維持する: 適度な運動や十分な睡眠は、精神的な健康を維持し、エネルギーレベルを高めます。
5. ネットワーキング: 同僚や他の専門家との交流を通じて新たな視点を得ることができます。これはあなたの視野を広げ、新たな機会を見つける手助けになります。
```
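The input shown earlier in this section follows the ChatML layout defined by the `chat_template` entry in `tokenizer_config.json`. As a minimal sketch, the same layout can be reproduced in plain Python without loading the tokenizer. `render_chatml` is a hypothetical helper written for illustration only; in practice `tokenizer.apply_chat_template` does this work.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Mimic the ChatML chat template stored in tokenizer_config.json."""
    text = ""
    for message in messages:
        # Each turn is rendered as: <|im_start|>{role}\n{content}<|im_end|>\n
        text += "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>" + "\n"
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        text += "<|im_start|>assistant\n"
    return text


messages = [
    {"role": "user", "content": "仕事の熱意を取り戻すためのアイデアを5つ挙げてください。"}
]
prompt = render_chatml(messages, add_generation_prompt=True)
print(prompt)
```

Generation should stop when the model emits `<|im_end|>`, which is why the usage example passes `eos_token_id=tokenizer.eos_token_id`.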
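# License
This model inherits the [Sarashina Model NonCommercial License Agreement](https://huggingface.co/sbintuitions/sarashina2.1-1B/blob/main/LICENSE) from the base model. The license imposes a number of restrictions on use, so please read it carefully.
# Notes
- The author of this model has no affiliation with SB Intuitions Corp., the creator of the base model. If you find a problem with this model, please contact me, the author.
- The safety of this model has not been verified. Please keep this in mind.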
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>
axolotl version: `0.5.2`
```yaml
base_model: sbintuitions/sarashina2.1-1b
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
hub_model_id: Aratako/sarashina2.1-1b-sft
hub_strategy: "end"
push_dataset_to_hub:
hf_use_auth_token: true
plugins:
- axolotl.integrations.liger.LigerPlugin
liger_cross_entropy: false
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true
load_in_8bit: false
load_in_4bit: false
strict: false
chat_template: chatml
datasets:
  - path: Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/magpie-qwen2.5-32b-reasoning-100k-formatted
    type: chat_template
    field_messages: conversations
    message_field_role: role
    message_field_content: content
  - path: Aratako/Open-Platypus-Japanese-masked-formatted
    type: chat_template
    field_messages: conversations
    message_field_role: role
    message_field_content: content
  - path: kanhatakeyama/wizardlm8x22b-logical-math-coding-sft_additional-ja
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: kanhatakeyama/ramdom-to-fixed-multiturn-Calm3
    split: 20240806filtered
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: llm-jp/magpie-sft-v1.0
    type: chat_template
    field_messages: conversations
    message_field_role: role
    message_field_content: content
  - path: Aratako/aya-ja-evol-instruct-calm3-dpo-masked-sft
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/aya-ja-nemotron-dpo-masked-sft
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/Synthetic-JP-EN-Coding-Dataset-801k
    split: "train[0:50000]"
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/orca-agentinstruct-1M-v1-selected-2
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/Synthetic-JP-EN-Coding-Dataset-801k-50k
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
shuffle_merged_datasets: true
dataset_prepared_path: /workspace/data/fft-data-sarashina
val_set_size: 0.002
output_dir: /workspace/data/1b-fft-out
sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:
wandb_project: 1b-fft
wandb_entity: aratako-lm
wandb_watch:
wandb_name: fft-attempt-1
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 8
num_epochs: 2
optimizer: adamw_torch
lr_scheduler: cosine
cosine_min_lr_ratio: 0.1
learning_rate: 0.00002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: false
early_stopping_patience:
auto_resume_from_checkpoints: true
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
save_strategy: steps
save_steps: 100
save_total_limit: 1
warmup_steps: 20
eval_steps: 100
eval_batch_size: 1
eval_table_size:
eval_max_new_tokens:
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero1.json
weight_decay: 0.01
fsdp:
fsdp_config:
special_tokens:
  pad_token: <pad>
tokens:
- "<|im_start|>"
- "<|im_end|>"
```
</details><br>
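The config above combines `lr_scheduler: cosine`, `warmup_steps: 20`, `learning_rate: 0.00002`, and `cosine_min_lr_ratio: 0.1`. The sketch below shows one common way such a schedule is computed: linear warmup followed by cosine decay to a floor of 10% of the peak rate. It is an illustration under stated assumptions, not axolotl's actual implementation; `lr_at_step` is a hypothetical helper, and `total_steps=1320` is a rough placeholder rather than a value taken from the config.

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-5, warmup_steps=20, min_lr_ratio=0.1):
    """Linear warmup, then cosine decay down to min_lr_ratio * peak_lr."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Rescale the cosine factor so it ends at min_lr_ratio instead of 0.
    return peak_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * cosine)


total_steps = 1320  # assumption for illustration only
for step in (0, 10, 20, 670, 1320):
    print(step, lr_at_step(step, total_steps))
```

Under these assumptions the rate peaks at 2e-5 once warmup ends and bottoms out near 2e-6 at the final step instead of decaying to zero.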
# sarashina2.1-1b-sft
This model is a fine-tuned version of [sbintuitions/sarashina2.1-1b](https://huggingface.co/sbintuitions/sarashina2.1-1b) on the datasets listed in the axolotl config above.
It achieves the following results on the evaluation set:
- Loss: 0.9366
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- total_eval_batch_size: 8
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 20
- num_epochs: 2
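The total batch sizes listed above follow from the per-device values. The short sketch below reproduces that arithmetic; the tokens-per-step figure is an upper bound that assumes every packed row holds the full `sequence_len: 4096` from the axolotl config, padding included.

```python
# Values taken from the hyperparameter list and the axolotl config.
micro_batch_size = 8             # train_batch_size per device
gradient_accumulation_steps = 4
num_devices = 8
eval_batch_size = 1              # per device
sequence_len = 4096

# Sequences consumed per optimizer step across all devices.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
# Evaluation does not use gradient accumulation.
total_eval_batch_size = eval_batch_size * num_devices
# Upper bound on token positions per optimizer step (includes any padding).
max_tokens_per_step = total_train_batch_size * sequence_len

print(total_train_batch_size)  # 256
print(total_eval_batch_size)   # 8
print(max_tokens_per_step)     # 1048576
```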
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.2935 | 0.0015 | 1 | 1.4733 |
| 0.985 | 0.1515 | 100 | 1.0491 |
| 0.9131 | 0.3029 | 200 | 1.0156 |
| 0.9174 | 0.4544 | 300 | 0.9935 |
| 0.9257 | 0.6058 | 400 | 0.9806 |
| 0.869 | 0.7573 | 500 | 0.9694 |
| 0.8874 | 0.9087 | 600 | 0.9608 |
| 0.8041 | 1.0594 | 700 | 0.9557 |
| 0.8348 | 1.2109 | 800 | 0.9512 |
| 0.8353 | 1.3624 | 900 | 0.9466 |
| 0.8145 | 1.5138 | 1000 | 0.9432 |
| 0.8057 | 1.6653 | 1100 | 0.9400 |
| 0.838 | 1.8167 | 1200 | 0.9381 |
| 0.8446 | 1.9682 | 1300 | 0.9366 |
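Two derived quantities can help when reading the table above. The sketch below converts the final validation loss to perplexity and estimates the number of optimizer steps per epoch from the last row. It assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for the Trainer but is not stated in this card.

```python
import math

# Last row of the training results table.
final_step = 1300
final_epoch = 1.9682
final_val_loss = 0.9366

# Perplexity is the exponential of the mean per-token cross-entropy (assumed in nats).
perplexity = math.exp(final_val_loss)

# Steps per epoch estimated from how far training had progressed at the last logged row.
steps_per_epoch = final_step / final_epoch

print(f"perplexity ~ {perplexity:.2f}")
print(f"steps per epoch ~ {steps_per_epoch:.1f}")
```

On these numbers the perplexity comes out at roughly 2.55 and an epoch is about 660 optimizer steps, which agrees with the 100-step spacing of about 0.15 epochs in the table.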
### Framework versions
- Transformers 4.46.3
- Pytorch 2.3.1+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3

4
added_tokens.json Normal file

@@ -0,0 +1,4 @@
{
"<|im_end|>": 102401,
"<|im_start|>": 102400
}

30
config.json Normal file

@@ -0,0 +1,30 @@
{
"_name_or_path": "sbintuitions/sarashina2.1-1b",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 102401,
"head_dim": 112,
"hidden_act": "silu",
"hidden_size": 1792,
"initializer_range": 0.02,
"intermediate_size": 6272,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.3",
"use_cache": false,
"vocab_size": 102402
}

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

7
generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"do_sample": true,
"eos_token_id": 2,
"transformers_version": "4.46.3"
}

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:89f6b2fd14403f6eade689340379a88dac517785b60b616bf7543f385594b872
size 2815117504

3
pytorch_model.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aff9f2c8e35db130a0113609e1d2fb9f1febfd8059457d05bd59fd9d02374e1c
size 2815122106

51
special_tokens_map.json Normal file

@@ -0,0 +1,51 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"cls_token": {
"content": "<cls>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"mask_token": {
"content": "<mask>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<pad>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"sep_token": {
"content": "<sep>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eaeaf0ecad056bcb3ab2c54a23b21b70f23cd65484e3c7a8e271b326fd86c3e8
size 6724385

3
tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:008293028e1a9d9a1038d9b63d989a2319797dfeaa03f171093a57b33a3a8277
size 1831879

187
tokenizer_config.json Normal file

@@ -0,0 +1,187 @@
{
"add_bos_token": false,
"add_dummy_prefix_space": false,
"add_eos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"3": {
"content": "<pad>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"4": {
"content": "<sep>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"5": {
"content": "<mask>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"6": {
"content": "<cls>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"7": {
"content": "<|system|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"8": {
"content": "<|assistant|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"9": {
"content": "<|user|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"10": {
"content": "<|available_tools|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"11": {
"content": "<|tool_calls|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"12": {
"content": "<|tool_results|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"13": {
"content": "<|code|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"14": {
"content": "<|file|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"102397": {
"content": "<|prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"102398": {
"content": "<|suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"102399": {
"content": "<|middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"102400": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"102401": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"bos_token": "<s>",
"chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
"clean_up_tokenization_spaces": false,
"cls_token": "<cls>",
"do_lower_case": false,
"eos_token": "<|im_end|>",
"extra_ids": 0,
"keep_accents": true,
"legacy": false,
"mask_token": "<mask>",
"model_max_length": 1000000000000000019884624838656,
"pad_token": "<pad>",
"padding_side": "left",
"sep_token": "<sep>",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": false
}

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:511211b07bb545451edb5d7ca9e8de45751900bf6d05f5a6b371bd2f198c291e
size 8120