Initialize project; model provided by the ModelHub XC community

Model: kayrab/doktor-llama-3-cosmos-8b
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-31 13:35:42 +08:00
commit 9eb07daaea
15 changed files with 413447 additions and 0 deletions

.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
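
As an illustration of what these rules do, the following minimal sketch checks which file names would be routed through Git LFS. It assumes that Python's `fnmatch.fnmatchcase` is a close enough stand-in for gitattributes glob matching on bare file names (it is not a full implementation, and directory patterns such as `saved_model/**/*` are left out). `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical names, and only a subset of the patterns is listed.

```python
from fnmatch import fnmatchcase

# A subset of the patterns from the .gitattributes file above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(filename: str) -> bool:
    """Return True if the bare file name matches any of the LFS patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in LFS_PATTERNS)


# Weight shards match *.safetensors; small text files stay in regular Git.
for name in ["model-00001-of-00004.safetensors", "config.json", "README.md"]:
    print(name, is_lfs_tracked(name))
```

Under this approximation the weight shards are stored as LFS pointers, while `config.json` and `README.md` are committed as ordinary text.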

README.md Normal file

@@ -0,0 +1,199 @@
---
base_model: ytu-ce-cosmos/Turkish-Llama-8b-v0.1
language:
- tr
license: mit
tags:
- transformers
- unsloth
- llama
- trl
- sft
- turkish
datasets:
- kayrab/patient-doctor-qa-tr-321179
metrics:
- bleu
- bertscore
- rouge
- cer
- wer
- meteor
pipeline_tag: question-answering
---
# Doktor Cosmos Llama 3 Modeli
## Genel Bakış
**Doktor Cosmos Llama 3**, Türkçe sağlık danışmanlığı alanında kullanılmak üzere geliştirilmiş bir büyük dil modelidir. Bu model, doktor-hasta yazılı iletişimindeki performansı iyileştirmek ve hastalara daha doğru ve bağlama uygun yanıtlar sunmak amacıyla oluşturulmuştur.
Bu model, [Muhammed Kayra Bulut](https://github.com/kaayra2000) tarafından hazırlanan yüksek lisans tezi kapsamında geliştirilmiştir.
## Özellikler
- **Dil**: Türkçe
- **Model Boyutu**: 8 milyar parametre
- **Taban Model**: Turkish-Llama-8b-v0.1
- **Eğitim Verisi**: 321.179 adet Türkçe hasta-doktor soru-cevap çiftinden oluşan özel bir veri kümesi kullanılmıştır.
- **Amaç**: Türkçe sağlık danışmanlığı alanında etkili ve güvenilir bir dil modeli oluşturmak.
## Eğitim Süreci
Modelin eğitimi ve ince ayarı şu adımlarla gerçekleştirilmiştir:
1. **Veri Toplama ve İşleme**: Doktor-hasta yazılı iletişimlerinden oluşan geniş bir veri kümesi toplanmış, temizlenmiş ve modele uygun hale getirilmiştir.
2. **İnce Ayar (Fine-Tuning)**: Turkish-Llama-8b-v0.1 tabanlı model, Türkçe sağlık verileriyle ince ayar yapılarak eğitilmiştir.
3. **Değerlendirme**: Modelin performansı ROUGE, BLEU, BERT Score gibi metriklerle ve uzman değerlendirmeleriyle ölçülmüştür.
## Performans ve Sonuçlar
Yapılan değerlendirmeler sonucunda, Doktor Cosmos Llama 3 modelinin Türkçe sağlık danışmanlığı alanında aşağıdaki başarılara ulaştığı tespit edilmiştir:
- **Yüksek Doğruluk**: Model, hasta sorularına doğru ve bağlama uygun yanıtlar verebilmektedir.
- **Etkili İletişim**: Doktor-hasta iletişiminde anlaşılırlığı artırarak, tıbbi bilgileri hastaların anlayabileceği bir dilde sunabilmektedir.
- **Uzman Onayı**: Uzman doktorlar tarafından yapılan değerlendirmelerde olumlu geri bildirimler alınmıştır.
Daha detaylı bilgi için [yüksek lisans tezine](https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=E_eEUHQic_C-LvhxNQn1W9jmOJLuQUDfAO_NPVlpSUbRZEUJN9xUZ4i3VXSzTN_H) başvurabilirsiniz.
## Kullanım Alanları
- **Sağlık Danışmanlığı**: Hasta sorularına hızlı ve doğru yanıtlar sunarak sağlık hizmetlerini destekler.
- **Eğitim ve Araştırma**: Tıp öğrencileri ve araştırmacılar için yardımcı bir araç olarak kullanılabilir.
- **Hasta Bilgilendirme**: Tıbbi terimleri anlaşılır bir dilde açıklayarak hasta eğitimine katkıda bulunur.
## Kurulum ve Kullanım
1. **Gereksinimler**:
- Python 3.8+
- PyTorch
- Transformers kütüphanesi
2. **Kurulum**:
```bash
git clone https://github.com/kayrab/doktor-llama-3-cosmos-8b.git
cd doktor-llama-3-cosmos-8b
```
3. **Modelin Yüklenmesi ve Kullanımı:**
```python
from huggingface_hub import login
login("hesaba_ait_token")
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("kayrab/doktor-llama-3-cosmos-8b")
model = AutoModelForCausalLM.from_pretrained("kayrab/doktor-llama-3-cosmos-8b")
# Prompt'u input_text ile doldurmak için format kullanıyoruz
input_text = "Merhaba doktor, baş ağrım ve ateşim var. Ne yapmalıyım?"
prompt = """Sen bir doktorsun. Soruları buna göre cevapla.
### <|reserved_special_token_0|>:
{}
### <|reserved_special_token_1|>:
{}""".format(input_text, "") # input_text'i yerleştiriyoruz, cevap kısmı boş bırakılıyor
# Tokenizer ile prompt'u işliyoruz
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
# Modelin çıktısını decode ediyoruz
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)
```
## Referanslar
Yüksek Lisans Tezi: [Sağlık Verileri Üzerinde Büyük Dil Modellerinin İnce Ayar Performansı - Muhammed Kayra Bulut, Yıldız Teknik Üniversitesi, 2024.](https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=E_eEUHQic_C-LvhxNQn1W9jmOJLuQUDfAO_NPVlpSUbRZEUJN9xUZ4i3VXSzTN_H)
# Doctor Cosmos Llama 3 Model
## Overview
**Doctor Cosmos Llama 3** is a large language model developed for Turkish-language health consultancy. It aims to improve written doctor-patient communication by giving patients more accurate and context-appropriate responses.
This model was developed as part of a master's thesis by [Muhammed Kayra Bulut](https://github.com/kaayra2000).
## Features
- **Language**: Turkish
- **Model Size**: 8 billion parameters
- **Base Model**: Turkish-Llama-8b-v0.1
- **Training Data**: A custom dataset of 321,179 Turkish patient-doctor question-answer pairs.
- **Purpose**: To provide an effective and reliable language model for Turkish health consultancy.
## Training Process
The model was trained and fine-tuned in the following steps:
1. **Data Collection and Processing**: A large dataset of written doctor-patient communications was collected, cleaned, and prepared for the model.
2. **Fine-Tuning**: The base model, Turkish-Llama-8b-v0.1, was fine-tuned on Turkish health data.
3. **Evaluation**: The model's performance was measured with metrics such as ROUGE, BLEU, and BERTScore, as well as with expert evaluations.
## Performance and Results
The evaluations indicate that the Doctor Cosmos Llama 3 model achieves the following in Turkish health consultancy:
- **High Accuracy**: The model can provide accurate and context-appropriate responses to patient questions.
- **Effective Communication**: It enhances clarity in doctor-patient communication by presenting medical information in a language that patients can understand.
- **Expert Approval**: Positive feedback was received from evaluations conducted by expert doctors.
For more detailed information, you can refer to the [master's thesis](https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=E_eEUHQic_C-LvhxNQn1W9jmOJLuQUDfAO_NPVlpSUbRZEUJN9xUZ4i3VXSzTN_H).
## Use Cases
- **Health Consultancy**: Supports health services by providing quick and accurate responses to patient questions.
- **Education and Research**: Can serve as a supporting tool for medical students and researchers.
- **Patient Education**: Contributes to patient education by explaining medical terms in understandable language.
## Installation and Usage
1. **Requirements**:
- Python 3.8+
- PyTorch
- Transformers library
2. **Installation**:
```bash
git clone https://github.com/kayrab/doktor-llama-3-cosmos-8b.git
cd doktor-llama-3-cosmos-8b
```
3. **Loading and Using the Model:**
```python
import torch
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

# Authenticate with your Hugging Face access token
login("your_token")

model_id = "kayrab/doktor-llama-3-cosmos-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# config.json stores the weights as bfloat16; loading in that dtype halves memory use versus float32
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Fill in the prompt with input_text
input_text = "Merhaba doktor, baş ağrım ve ateşim var. Ne yapmalıyım?"
prompt = """Sen bir doktorsun. Soruları buna göre cevapla.
### <|reserved_special_token_0|>:
{}
### <|reserved_special_token_1|>:
{}""".format(input_text, "")  # Insert input_text; the answer slot stays empty for the model to complete

# Process the prompt with the tokenizer
inputs = tokenizer(prompt, return_tensors="pt")
# Defaults (sampling, temperature, top_p, max_length) come from generation_config.json
outputs = model.generate(**inputs)

# Decode the model's output (it contains the prompt followed by the generated answer)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)
```
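
The `tokenizer.decode(outputs[0], ...)` call above returns the prompt together with the generated text, because `generate` on a decoder-only model returns the prompt tokens followed by the new ones. One way to separate the two, sketched below, is to slice the output at the prompt length before decoding. `build_prompt` and `new_token_ids` are hypothetical helper names, and the token IDs are made-up placeholders so that the snippet runs without downloading the model.

```python
# Prompt template copied from the usage example above.
PROMPT_TEMPLATE = """Sen bir doktorsun. Soruları buna göre cevapla.
### <|reserved_special_token_0|>:
{}
### <|reserved_special_token_1|>:
{}"""


def build_prompt(question: str) -> str:
    """Insert the patient's question; the answer slot stays empty."""
    return PROMPT_TEMPLATE.format(question, "")


def new_token_ids(output_ids, prompt_length):
    """Drop the first prompt_length IDs and keep only the generated ones."""
    return list(output_ids[prompt_length:])


prompt = build_prompt("Merhaba doktor, baş ağrım ve ateşim var. Ne yapmalıyım?")

# Placeholder IDs standing in for the tokenized prompt and the model output.
prompt_ids = [128000, 50, 268]
output_ids = prompt_ids + [9906, 1917, 128001]

answer_ids = new_token_ids(output_ids, len(prompt_ids))
print(answer_ids)
```

With the real objects, the same slice would be `outputs[0][inputs["input_ids"].shape[1]:]`, passed to `tokenizer.decode` in place of `outputs[0]`.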
## References
Master's Thesis: [Fine-Tuning Performance of Large Language Models on Health Data - Muhammed Kayra Bulut, Yıldız Technical University, 2024.](https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=E_eEUHQic_C-LvhxNQn1W9jmOJLuQUDfAO_NPVlpSUbRZEUJN9xUZ4i3VXSzTN_H)

config.json Normal file

@@ -0,0 +1,31 @@
{
  "_name_or_path": "ytu-ce-cosmos/Turkish-Llama-8b-v0.1",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pad_token_id": 128255,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.2",
  "unsloth_version": "2024.7",
  "use_cache": true,
  "vocab_size": 128256
}
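
The "8 billion parameters" figure in the README can be cross-checked against this configuration. The sketch below counts the weights implied by the values above, assuming the standard Llama layout: bias-free projections (consistent with `attention_bias` and `mlp_bias` being false), two RMSNorm vectors per layer, and one final norm vector. The final norm is an assumption, since it does not appear in the part of the weight map shown in this commit.

```python
# Values copied from config.json above.
hidden_size = 4096
intermediate_size = 14336
num_hidden_layers = 32
num_attention_heads = 32
num_key_value_heads = 8
vocab_size = 128256

head_dim = hidden_size // num_attention_heads  # size of one attention head
kv_dim = head_dim * num_key_value_heads        # grouped-query attention: fewer K/V heads

# Per-layer weights: q_proj and o_proj are square; k_proj and v_proj are narrower.
attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
mlp = 3 * hidden_size * intermediate_size      # gate_proj, up_proj, down_proj
norms = 2 * hidden_size                        # input and post-attention layer norms
per_layer = attention + mlp + norms

embeddings = vocab_size * hidden_size          # model.embed_tokens
lm_head = vocab_size * hidden_size             # separate matrix: tie_word_embeddings is false
final_norm = hidden_size                       # assumption: standard final RMSNorm

total_params = num_hidden_layers * per_layer + embeddings + lm_head + final_norm
total_bytes = total_params * 2                 # bfloat16 stores 2 bytes per parameter
print(total_params, total_bytes)
```

Under these assumptions the count is 8,030,261,248 parameters, or 16,060,522,496 bytes in bfloat16, which equals the `total_size` recorded in the safetensors index in this commit.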

generation_config.json Normal file

@@ -0,0 +1,9 @@
{
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": 128001,
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9,
  "transformers_version": "4.43.2"
}
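
To make the effect of `temperature` and `top_p` concrete, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. It illustrates the general technique and is not the Transformers implementation; `top_p_filter` is a hypothetical name and the logits are made-up values.

```python
import math


def top_p_filter(logits, temperature=0.6, top_p=0.9):
    """Return {token_index: probability} for the tokens that survive top-p filtering."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so that the surviving probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a four-token vocabulary
print(top_p_filter(logits))                   # settings from generation_config.json
print(top_p_filter(logits, temperature=1.0))  # same top_p, no sharpening
```

With these example logits, the lower temperature concentrates probability on fewer tokens, so fewer candidates survive the `top_p` cut than at temperature 1.0.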


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f543d122c49c4d3dc798d93a5442fb22142a80ed87a62a1c39973c2c5f1312f2
size 4976698672


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e6bb148ac07c32ab08639339190ba83048b20d458bf225c2b245466e061b525e
size 4999802720


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cb791a93028d41761d85aa0a9b0efdb3fd52ce8eb34a7b7002421195c5be7c76
size 4915916176


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fc7f416aafdf8b09c64fbb0a9c5e5ddef243af3ce88f78f1edf531fadda41cb0
size 1168138808
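
Each of the four entries above is a Git LFS pointer: a small text file holding the spec version, the SHA-256 object ID, and the size in bytes of the real file. The following minimal sketch parses one pointer and sums the four sizes. `parse_lfs_pointer` is a hypothetical helper, and the file names are not shown in this view, so treating the pointers as the four safetensors shards is an assumption.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])  # size of the real file in bytes
    return fields


# First pointer from the commit above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:f543d122c49c4d3dc798d93a5442fb22142a80ed87a62a1c39973c2c5f1312f2
size 4976698672
"""

info = parse_lfs_pointer(pointer)
print(info["oid"], info["size"])

# Sizes of the four pointers shown above, in bytes.
sizes = [4976698672, 4999802720, 4915916176, 1168138808]
print(sum(sizes))
```

The four sizes add up to 16,060,556,376 bytes, which is 33,880 bytes more than the `total_size` of 16,060,522,496 in the safetensors index. If the pointers are indeed the weight shards, the small difference would be consistent with per-file header overhead, though this commit does not confirm that.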


@@ -0,0 +1,298 @@
{
"metadata": {
"total_size": 16060522496
},
"weight_map": {
"lm_head.weight": "model-00004-of-00004.safetensors",
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}

17
special_tokens_map.json Normal file

@@ -0,0 +1,17 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": "<|reserved_special_token_250|>"
}
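The map above mixes two representations: `bos_token` and `eos_token` are objects with a `content` field plus stripping flags, while `pad_token` is a bare string. A minimal sketch of normalizing both forms to plain token text follows; the helper name `token_text` is hypothetical and not part of the repository.

```python
import json

# Contents of special_tokens_map.json as shown above.
raw = """
{
  "bos_token": {"content": "<|begin_of_text|>", "lstrip": false,
                "normalized": false, "rstrip": false, "single_word": false},
  "eos_token": {"content": "<|end_of_text|>", "lstrip": false,
                "normalized": false, "rstrip": false, "single_word": false},
  "pad_token": "<|reserved_special_token_250|>"
}
"""


def token_text(entry):
    """Return the token string for either representation (hypothetical helper)."""
    return entry["content"] if isinstance(entry, dict) else entry


special_tokens = json.loads(raw)
tokens = {name: token_text(entry) for name, entry in special_tokens.items()}

for name, text in tokens.items():
    print(f"{name}: {text}")
```

Note that the padding token is one of the reserved special tokens rather than the EOS token, so padding positions stay distinguishable from a genuine end of text.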

410563
tokenizer.json Normal file

File diff suppressed because it is too large

2063
tokenizer_config.json Normal file

File diff suppressed because it is too large

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:818381194015250cb41e61b91c2b9eb6d11b0adcf159a6638ecd1cb391b42bae
size 5176

71
training_infos.md Normal file

@@ -0,0 +1,71 @@
# Prompt Format
```
# The Turkish instruction reads: "You are a doctor. Answer the questions accordingly."
alpaca_prompt = """Sen bir doktorsun. Soruları buna göre cevapla.
### <|reserved_special_token_0|>:
{}
### <|reserved_special_token_1|>:
{}"""
```
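As a minimal sketch of how this template might be filled in: the first slot takes the user's question, and the second is left empty at inference time so the model generates the answer. The helper name `build_prompt` and the sample question are hypothetical and not taken from the original training code; during training the second slot would presumably hold the reference answer.

```python
# Template copied from the section above.
alpaca_prompt = """Sen bir doktorsun. Soruları buna göre cevapla.
### <|reserved_special_token_0|>:
{}
### <|reserved_special_token_1|>:
{}"""


def build_prompt(question: str, answer: str = "") -> str:
    """Fill the two positional slots of the template (hypothetical helper)."""
    return alpaca_prompt.format(question, answer)


# Hypothetical sample question: "I have a headache, what should I do?"
prompt = build_prompt("Başım ağrıyor, ne yapmalıyım?")
print(prompt)
```

With the answer slot empty, the prompt ends right after the `<|reserved_special_token_1|>` header line, which is where generation would begin.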
# Training args
```
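from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

batch_size = 128  # effective batch size per optimizer step (per device)
gradient_accumulation_steps = 32
num_train_epochs = 2
# 128 / 32 = 4 samples per device per forward/backward pass
per_device_batch_size = int(batch_size / gradient_accumulation_steps)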
training_args = TrainingArguments(
per_device_train_batch_size = per_device_batch_size,
per_device_eval_batch_size = per_device_batch_size,
gradient_accumulation_steps = gradient_accumulation_steps,
save_total_limit = 1,
warmup_steps = int(2000 / batch_size),
num_train_epochs = num_train_epochs,
learning_rate = 1e-4,
fp16 = not is_bfloat16_supported(),
bf16 = is_bfloat16_supported(),
optim = "adamw_8bit",
weight_decay = 0.01,
lr_scheduler_type = "linear",
seed = 3407,
output_dir = output_dir,
save_strategy = "steps",
eval_strategy = "steps",
logging_strategy = "steps",
save_steps = int(5000 / batch_size * num_train_epochs),
eval_steps = int(28900 / batch_size * num_train_epochs),
logging_steps = int(28900 / batch_size * num_train_epochs),
)
```
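Several of the arguments above are computed rather than written out. The sketch below simply evaluates the same expressions so the resulting step counts are visible; the dictionary `derived` is only there for printing and is not part of the original script.

```python
batch_size = 128
gradient_accumulation_steps = 32
num_train_epochs = 2

# Same expressions as in the TrainingArguments call above.
per_device_batch_size = int(batch_size / gradient_accumulation_steps)
warmup_steps = int(2000 / batch_size)
save_steps = int(5000 / batch_size * num_train_epochs)
eval_steps = int(28900 / batch_size * num_train_epochs)
logging_steps = int(28900 / batch_size * num_train_epochs)

derived = {
    "per_device_batch_size": per_device_batch_size,  # 4
    "warmup_steps": warmup_steps,                    # 15
    "save_steps": save_steps,                        # 78
    "eval_steps": eval_steps,                        # 451
    "logging_steps": logging_steps,                  # 451
}
for name, value in derived.items():
    print(f"{name} = {value}")
```

The value of 451 for `eval_steps` and `logging_steps` agrees with the interval between entries in `training_log.json` (steps 451, 902, 1353, and so on).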
# Trainer args
```
from trl import SFTTrainer

max_seq_length = 8192
trainer = SFTTrainer(
model = model,
tokenizer = tokenizer,
train_dataset = train_dataset,
eval_dataset = eval_dataset,
dataset_text_field = "text",
max_seq_length = max_seq_length,
dataset_num_proc = 1,
packing = False, # Disabled here; packing = True can make training 5x faster for short sequences.
args = training_args
)
```
# From pretrained args
```
from unsloth import FastLanguageModel
dtype = None  # None lets Unsloth choose the dtype automatically
load_in_4bit = False  # do not quantize the weights to 4-bit on load
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = output_dir,
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
)
```

146
training_log.json Normal file

@@ -0,0 +1,146 @@
[
{
"loss": 1.8184,
"grad_norm": 0.774284303188324,
"learning_rate": 8.923190911336132e-05,
"epoch": 0.2218976306523778,
"step": 451
},
{
"eval_loss": 1.749639630317688,
"eval_runtime": 1083.3199,
"eval_samples_per_second": 26.684,
"eval_steps_per_second": 6.671,
"epoch": 0.2218976306523778,
"step": 451
},
{
"loss": 1.7064,
"grad_norm": 0.7226008772850037,
"learning_rate": 7.809335638429242e-05,
"epoch": 0.4437952613047556,
"step": 902
},
{
"eval_loss": 1.6820720434188843,
"eval_runtime": 1087.8354,
"eval_samples_per_second": 26.573,
"eval_steps_per_second": 6.643,
"epoch": 0.4437952613047556,
"step": 902
},
{
"loss": 1.6623,
"grad_norm": 0.7210889458656311,
"learning_rate": 6.695480365522352e-05,
"epoch": 0.6656928919571334,
"step": 1353
},
{
"eval_loss": 1.6484241485595703,
"eval_runtime": 1084.0066,
"eval_samples_per_second": 26.667,
"eval_steps_per_second": 6.667,
"epoch": 0.6656928919571334,
"step": 1353
},
{
"loss": 1.6362,
"grad_norm": 0.7425829172134399,
"learning_rate": 5.581625092615461e-05,
"epoch": 0.8875905226095112,
"step": 1804
},
{
"eval_loss": 1.6261749267578125,
"eval_runtime": 1085.6726,
"eval_samples_per_second": 26.626,
"eval_steps_per_second": 6.657,
"epoch": 0.8875905226095112,
"step": 1804
},
{
"loss": 1.5914,
"grad_norm": 0.7476176023483276,
"learning_rate": 4.4677698197085704e-05,
"epoch": 1.109488153261889,
"step": 2255
},
{
"eval_loss": 1.6124544143676758,
"eval_runtime": 1084.6953,
"eval_samples_per_second": 26.65,
"eval_steps_per_second": 6.663,
"epoch": 1.109488153261889,
"step": 2255
},
{
"loss": 1.5557,
"grad_norm": 0.7473255395889282,
"learning_rate": 3.3539145468016795e-05,
"epoch": 1.3313857839142669,
"step": 2706
},
{
"eval_loss": 1.6011990308761597,
"eval_runtime": 1084.5737,
"eval_samples_per_second": 26.653,
"eval_steps_per_second": 6.663,
"epoch": 1.3313857839142669,
"step": 2706
},
{
"loss": 1.5468,
"grad_norm": 0.750347375869751,
"learning_rate": 2.240059273894789e-05,
"epoch": 1.5532834145666445,
"step": 3157
},
{
"eval_loss": 1.592125415802002,
"eval_runtime": 1086.1382,
"eval_samples_per_second": 26.614,
"eval_steps_per_second": 6.654,
"epoch": 1.5532834145666445,
"step": 3157
},
{
"loss": 1.539,
"grad_norm": 0.7452530860900879,
"learning_rate": 1.1262040009878982e-05,
"epoch": 1.7751810452190224,
"step": 3608
},
{
"eval_loss": 1.5855711698532104,
"eval_runtime": 1084.064,
"eval_samples_per_second": 26.665,
"eval_steps_per_second": 6.667,
"epoch": 1.7751810452190224,
"step": 3608
},
{
"loss": 1.5373,
"grad_norm": 0.7762174606323242,
"learning_rate": 1.2348728081007656e-07,
"epoch": 1.9970786758714003,
"step": 4059
},
{
"eval_loss": 1.5827687978744507,
"eval_runtime": 1086.4478,
"eval_samples_per_second": 26.607,
"eval_steps_per_second": 6.652,
"epoch": 1.9970786758714003,
"step": 4059
},
{
"train_runtime": 69830.4547,
"train_samples_per_second": 7.451,
"train_steps_per_second": 0.058,
"total_flos": 9.47865607059515e+18,
"train_loss": 1.5162640926171476,
"epoch": 1.9995387382954841,
"step": 4064
}
]
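The logged learning rates can be cross-checked against the configured schedule. The sketch below assumes the standard linear-decay-with-warmup rule, a peak rate of 1e-4, 15 warmup steps (from `int(2000 / 128)`), and 4064 total steps (the final `step` in the log); the function `linear_lr` is a hypothetical re-implementation for illustration, not the trainer's own code.

```python
import math


def linear_lr(step, base_lr=1e-4, warmup_steps=15, total_steps=4064):
    """Linear warmup followed by linear decay to zero (illustrative sketch)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# A few (step, learning_rate) pairs copied from the log above.
logged = {
    451: 8.923190911336132e-05,
    902: 7.809335638429242e-05,
    2255: 4.4677698197085704e-05,
    4059: 1.2348728081007656e-07,
}

for step, lr in logged.items():
    predicted = linear_lr(step)
    matches = math.isclose(predicted, lr, rel_tol=1e-6)
    print(f"step {step}: logged={lr:.6e} predicted={predicted:.6e} match={matches}")
```

Under these assumptions the predicted values agree with the logged ones, which is consistent with `lr_scheduler_type = "linear"` and a run of 4064 optimizer steps.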