初始化项目,由ModelHub XC社区提供模型

Model: Ramikan-BR/tinyllama-coder-py-4bit-v10
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-13 12:50:26 +08:00
commit 76ec04e325
12 changed files with 93837 additions and 0 deletions

38
.gitattributes vendored Normal file
View File

@@ -0,0 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tinyllama-coder-py-4bit-v10-unsloth.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
tinyllama-coder-py-4bit-v10-unsloth.F16.gguf filter=lfs diff=lfs merge=lfs -text
tinyllama-coder-py-4bit-v10-unsloth.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text

279
README.md Normal file
View File

@@ -0,0 +1,279 @@
---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
- code
- lora
- peft
base_model: unsloth/tinyllama-chat-bnb-4bit
pipeline_tag: text-generation
datasets: Ramikan-BR/data-oss_instruct-decontaminated_python.jsonl
---
# Uploaded model
- **Developed by:** Ramikan-BR
- **Model type:** [text-generation/Python Coder]
- **Language(s) (NLP):** [en]
- **License:** apache-2.0
- **Finetuned from model :** unsloth/tinyllama-chat-bnb-4bit
### Model Description
<!-- Provide a longer summary of what this model is. -->
### Training Data
datasets: [Ramikan-BR/data-oss_instruct-decontaminated_python.jsonl](https://huggingface.co/datasets/Ramikan-BR/data-oss_instruct-decontaminated_python.jsonl)
### Training Procedure
The model was refined using [Unsloath](https://github.com/unslothai/unsloth). The dataset [ise-uiuc/Magicoder-OSS-Instruct-75K](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K/blob/main/data-oss_instruct-decontaminated.jsonl) was adjusted, leaving only data on python and divided into 10 parts, each refinement occurred for 2 epochs, using adafactor optimizer or adamw_8bit (adafactor seems to deliver less loss).
### Model Sources [optional]
base_model: [unsloth/tinyllama-chat-bnb-4bit](https://huggingface.co/unsloth/tinyllama-chat-bnb-4bit)
model: [Ramikan-BR/tinyllama-coder-py-4bit-v10](https://huggingface.co/Ramikan-BR/tinyllama-coder-py-4bit-v10)
gguf_f16: [tinyllama-coder-py-4bit-v10-unsloth.F16.gguf](https://huggingface.co/Ramikan-BR/tinyllama-coder-py-4bit-v10/blob/main/tinyllama-coder-py-4bit-v10-unsloth.F16.gguf)
gguf_Q4_K_M: [tinyllama-coder-py-4bit-v10-unsloth.Q4_K_M.gguf](https://huggingface.co/Ramikan-BR/tinyllama-coder-py-4bit-v10/blob/main/tinyllama-coder-py-4bit-v10-unsloth.Q4_K_M.gguf)
gguf_Q8_0: [tinyllama-coder-py-4bit-v10-unsloth.Q8_0.gguf](https://huggingface.co/Ramikan-BR/tinyllama-coder-py-4bit-v10/blob/main/tinyllama-coder-py-4bit-v10-unsloth.Q8_0.gguf)
#### Training Hyperparameters
Notebook [Unsloath](https://github.com/unslothai/unsloth) that I used for AI refinement: [TinyLlama](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
```python
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers trl peft accelerate bitsandbytes # xformers "xformers<0.0.26"
import os
from google.colab import drive
drive.mount('/content/drive')
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
"unsloth/mistral-7b-bnb-4bit",
"unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
"unsloth/llama-2-7b-bnb-4bit",
"unsloth/llama-2-13b-bnb-4bit",
"unsloth/codellama-34b-bnb-4bit",
"unsloth/tinyllama-bnb-4bit",
"unsloth/gemma-7b-bnb-4bit", # New Google 6 trillion tokens model 2.5x faster!
"unsloth/gemma-2b-bnb-4bit",
] # More models at https://huggingface.co/unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "Ramikan-BR/tinyllama-coder-py-4bit_LORA-v9", # "unsloth/tinyllama" for 16bit loading
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
# token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
model = FastLanguageModel.get_peft_model(
model,
r = 256, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",],
lora_alpha = 512,
lora_dropout = 0, # Currently only supports dropout = 0
bias = "none", # Currently only supports bias = "none"
use_gradient_checkpointing = True, # @@@ IF YOU GET OUT OF MEMORY - set to True @@@
random_state = 3407,
use_rslora = False, # We support rank stabilized LoRA
loftq_config = None, # And LoftQ
)
alpaca_prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Input:
{}
### Output:
{}"""
EOS_TOKEN = tokenizer.eos_token
def formatting_prompts_func(examples):
inputs = examples["problem"]
outputs = examples["solution"]
texts = []
for input, output in zip(inputs, outputs):
# Must add EOS_TOKEN, otherwise your generation will go on forever!
text = alpaca_prompt.format(input, output) + EOS_TOKEN
texts.append(text)
return { "text" : texts}
pass
from datasets import load_dataset
dataset = load_dataset('json', data_files='/content/drive/MyDrive/data-oss_instruct-py-10.jsonl', split='train')
dataset = dataset.map(formatting_prompts_func, batched=True)
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
from transformers.utils import logging
logging.set_verbosity_info()
trainer = SFTTrainer(
model = model,
tokenizer = tokenizer,
train_dataset = dataset,
dataset_text_field = "text",
max_seq_length = max_seq_length,
dataset_num_proc = 2,
packing = True, # Packs short sequences together to save time!
args = TrainingArguments(
per_device_train_batch_size = 2,
gradient_accumulation_steps = 256,
warmup_ratio = 0.1,
num_train_epochs = 2,
learning_rate = 2e-4,
fp16 = not torch.cuda.is_bf16_supported(),
bf16 = torch.cuda.is_bf16_supported(),
logging_steps = 1,
optim = "adafactor", # adamw_torch ou adamw_torch_fused +10% velocidade ou adafactor ou adamw_8bit
weight_decay = 0.1,
lr_scheduler_type = "linear",
seed = 3407,
output_dir = "outputs",
),
)
trainer_stats = trainer.train()
model.save_pretrained("lora_model") # Local saving
tokenizer.save_pretrained("lora_model")
model.push_to_hub("Ramikan-BR/tinyllama-coder-py-4bit_LORA-v10", token = "hf_...") # Online saving
tokenizer.push_to_hub("Ramikan-BR/tinyllama-coder-py-4bit_LORA-v10", token = "hf_...") # Online saving
# Merge to 16bit
model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
model.push_to_hub_merged("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, save_method = "merged_16bit", token = "hf_...")
# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, save_method = "merged_4bit", token = "hf_...")
# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, save_method = "lora", token = "hf_...")
# Save to 8bit Q8_0
model.save_pretrained_gguf("model", tokenizer,)
model.push_to_hub_gguf("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, token = "hf_...")
# Save to 16bit GGUF
model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
model.push_to_hub_gguf("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, quantization_method = "f16", token = "hf_...")
# Save to q4_k_m GGUF
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
model.push_to_hub_gguf("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, quantization_method = "q4_k_m", token = "hf_...")
Loss for 5 epochs in the last training session of the last part of the dataset:
==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
\\ /| Num examples = 407 | Num Epochs = 5
O^O/ \_/ \ Batch size per device = 2 | Gradient Accumulation steps = 256
\ / Total batch size = 512 | Total steps = 5
"-____-" Number of trainable parameters = 201,850,880
[5/5 29:36, Epoch 3/5]
Step Training Loss
1 0.568000
2 0.145300
3 0.506100
4 0.331900
5 0.276100
Quick test 1 after training the last part of the dataset:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
alpaca_prompt.format(
"Continue the fibonnaci sequence.", # instruction
"1, 1, 2, 3, 5, 8", # input
"", # output - leave this blank for generation!
)
], return_tensors = "pt").to("cuda")
AI Response: ['<s> Below is an instruction that describes a task. Write a response that appropriately completes the request.\n### Input:\nContinue the fibonnaci sequence.\n\n### Output:\n1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 420, 787, 1444, 2881, 4765, 8640']
Quick test 2 after training the last part of the dataset:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
alpaca_prompt.format(
"Continue the fibonnaci sequence.", # instruction
"1, 1, 2, 3, 5, 8", # input
"", # output - leave this blank for generation!
)
], return_tensors = "pt").to("cuda")
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
AI Response: <s> Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Input:
Continue the fibonnaci sequence.
### Output:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 420, 787, 1444, 2881, 4765, 8640, 17281, 31362, 65325, 128672, 251345, 410000, 720000, 1280000,
Quick test 3 after training the last part of the dataset:
if False:
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
# alpaca_prompt = You MUST copy from above!
inputs = tokenizer(
[
alpaca_prompt.format(
"What is a famous tall tower in Paris?", # instruction
"", # input
"", # output - leave this blank for generation!
)
], return_tensors = "pt").to("cuda")
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)
AI Response: <s> Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Input:
What is a famous tall tower in Paris?
### Output:
The famous tall tower in Paris is the Eiffel Tower. It is a 300-meter-tall steel tower located in the heart of Paris, France. The tower was built in 18892 and is a popular tourist attraction. It is also a symbol of the city
outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
```
This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

34
config.json Normal file
View File

@@ -0,0 +1,34 @@
{
"_name_or_path": "unsloth/tinyllama-chat-bnb-4bit",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5632,
"max_position_embeddings": 4096,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 22,
"num_key_value_heads": 4,
"pad_token_id": 0,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 2.0,
"type": "linear"
},
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.40.2",
"unsloth_version": "2024.5",
"use_cache": true,
"vocab_size": 32000
}

7
generation_config.json Normal file
View File

@@ -0,0 +1,7 @@
{
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0,
"transformers_version": "4.40.2"
}

3
model.safetensors Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0faa579c809df42b6c4cafae313cd1918fd83ebfcee169d466ec86dc40c589c6
size 2200119664

30
special_tokens_map.json Normal file
View File

@@ -0,0 +1,30 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0035158417da031798c0641b335ddab3e2a06fca2ceaaaf9d4cf9d9f40039e64
size 2201017472

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bd2cce05baa5e92e0c25c2f6ae201910c7ec94cc722e47719e2f7e8c79d24357
size 667815104

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8864d14d884d96ec0b4398ef33e8b6ee732cea4f103d9caaad295a05217b6648
size 1169808512

93392
tokenizer.json Normal file

File diff suppressed because it is too large Load Diff

BIN
tokenizer.model (Stored with Git LFS) Normal file

Binary file not shown.

42
tokenizer_config.json Normal file
View File

@@ -0,0 +1,42 @@
{
"add_bos_token": true,
"add_eos_token": false,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"bos_token": "<s>",
"chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"legacy": false,
"model_max_length": 4096,
"pad_token": "<unk>",
"padding_side": "left",
"sp_model_kwargs": {},
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": false
}