Initialize project; model provided by the ModelHub XC community
Model: kth8/gemma-3-1b-it-IFeval Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
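To see which files in this commit the patterns above route through Git LFS, here is a rough Python sketch. It is an approximation only: `is_lfs_tracked` is a hypothetical helper, it uses `fnmatch` rather than Git's own attribute matcher, and it only handles patterns without a slash by comparing them against the file's basename.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns listed in the .gitattributes file above.
LFS_PATTERNS = ["*.bin", "*.model", "*.safetensors", "*.tar.*", "*tfevents*", "tokenizer.json"]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: slash-free patterns are matched against the basename."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, p) for p in patterns if "/" not in p)


# File names taken from this commit.
files = [
    "model.safetensors",
    "tokenizer.model",
    "tokenizer.json",
    "config.json",
    "train/training_loss.png",
]
tracked = {f: is_lfs_tracked(f) for f in files}
for name, flag in tracked.items():
    print(f"{name}: {'LFS' if flag else 'regular Git object'}")
```

Under these assumptions the three large tokenizer and weight files come out as LFS-tracked, which agrees with the pointer files shown for them later in the diff.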
99
README.md
Normal file
@@ -0,0 +1,99 @@
---
license: gemma
language:
- en
base_model: unsloth/gemma-3-1b-it
datasets:
- HuggingFaceH4/ifeval-like-data
pipeline_tag: text-generation
library_name: transformers
tags:
- sft
- trl
- unsloth
- google
- gemma
- gemma3
- gemma3_text
---

A fine-tune of [unsloth/gemma-3-1b-it](https://huggingface.co/unsloth/gemma-3-1b-it) on the [HuggingFaceH4/ifeval-like-data](https://huggingface.co/datasets/HuggingFaceH4/ifeval-like-data) dataset.

## Usage example
**System prompt**
```
You are a helpful assistant.
```
**User prompt**
```
We want to teach our daughter the value of a clean home. Can you suggest chores that are appropriate for a child who is who 5 years old?
```
**Assistant response**
```
No Problem! Here are some great chores for a 5-year-old to do.

• Vacuuming: Most kids love the vacuum so this is a great starter chore for a five-year-old. Make sure to go through the do’s and don’t of vacuuming and remind your child not everything gets picked up by a
vacuum.
• Bedroom Cleanup: Teach your child to take care of their bedrooms. Start with picking up toys, this can be made into a game by color or items. Another good way of teaching children to fine-tune their bedrooms is
lining up their shoes, making their beds, and ensuring nothing is out of order.
• Separate Laundry: At 5 a children can easily distinguish between colors, whites, and towels. Have your children separate the laundry to make the task of washing clothes easier and more efficient.
• Folding Laundry: This is a great time to teach children how to fold laundry. At 5 they won’t be perfect but little hands can handle washcloths, hand towels, and mating socks.
• Putting Away Groceries: This is a good way for children to learn everything has a place to go. With your guidance, a small task like putting away groceries will teach them later on how to put away and keep things
neat.
• Dusting: Grab a duster and teach your child how to carefully remove things to dust tables, handrails, or other kid-friendly dustable areas.
• Set and Clean The Table: Most households with small kids have kid-friendly items for the table. Allow your child to set the table before dinner. After dinner teach your child to remove their items and take them to
the kitchen sink to be rinsed off. This chore will become a healthy habit.
• Make The Bed: At five this won’t be perfect, but a child will be able to learn to set the pillow properly and adjust their bedding so it's more tidy. If they use extra blankets at night teach them to put them away or
fold them at the bottom of the bed. Don’t forget to adjust the stuffed animals!

```
## Model Details
- Base Model: `unsloth/gemma-3-1b-it`
- Parameter Count: 999,885,952
- Precision: torch.bfloat16

## Hardware
- GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition
- Announced: Mar 17th, 2025
- Release Date: Mar 18th, 2025
- Memory Type: GDDR7
- Bandwidth: 1.79 TB/s
- Memory Size: 96 GB
- Memory Bus: 512 bit
- Shading Units: 24064
- TDP: 600W

## Training Settings
### PEFT
- Rank: 32
- LoRA alpha: 64
- Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Gradient checkpointing: unsloth

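As a rough illustration of what the PEFT settings above imply, the sketch below estimates the number of trainable adapter weights. It assumes plain LoRA (one `A` matrix of shape rank × in and one `B` matrix of shape out × rank per adapted layer, no bias terms, scaling equal to alpha / rank) and takes the layer shapes from the `config.json` in this commit. The helper name `lora_params` is made up for this example.

```python
# Shapes from config.json in this commit; LoRA settings from the PEFT list above.
HIDDEN, HEADS, KV_HEADS, HEAD_DIM = 1152, 4, 1, 256
INTERMEDIATE, LAYERS = 6912, 26
RANK, ALPHA = 32, 64

# (in_features, out_features) of each adapted linear layer in one decoder block.
module_shapes = {
    "q_proj": (HIDDEN, HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj": (HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank=RANK):
    """Weights in one adapter pair: A is rank x in, B is out x rank."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in module_shapes.values())
total_adapter_params = per_layer * LAYERS
scaling = ALPHA / RANK  # assumed standard LoRA scaling

print(f"adapter weights per layer: {per_layer:,}")
print(f"adapter weights in total:  {total_adapter_params:,}")
print(f"LoRA scaling factor:       {scaling}")
```

Under these assumptions the adapters add about 26 million trainable weights, roughly 2.6% of the 999,885,952 base parameters.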
### SFT
- Epochs: 2
- Batch size: 16
- Gradient accumulation steps: 1
- Warmup ratio: 0.05
- Learning rate: 0.0004
- Optimizer: adamw_torch_fused
- Learning rate scheduler: cosine

## Training stats
- Date: 2026-03-24T11:16:12.098497
- Peak VRAM usage: 34.73 GB
- Global step: 1850
- Training runtime (seconds): 958.4314
- Average training loss: 1.6259772120295344
- Final validation loss: 1.7383391857147217

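The SFT settings and training stats above can be cross-checked with a little arithmetic. The sketch below is a back-of-the-envelope estimate, assuming a single GPU, that both epochs have the same number of steps, and that warmup steps are rounded up. None of these assumptions is stated in the README.

```python
import math

# Values copied from the SFT settings and training stats above.
global_steps = 1850
epochs = 2
batch_size = 16
grad_accum = 1
warmup_ratio = 0.05
runtime_s = 958.4314

samples_per_step = batch_size * grad_accum  # assumes one device
steps_per_epoch = global_steps // epochs
# Upper bound on the training set size: the last batch of an epoch may be partial.
max_examples_per_epoch = steps_per_epoch * samples_per_step
warmup_steps = math.ceil(warmup_ratio * global_steps)  # assumed rounding
steps_per_second = global_steps / runtime_s
samples_per_second = steps_per_second * samples_per_step

print(f"steps per epoch:        {steps_per_epoch}")
print(f"examples per epoch:  <= {max_examples_per_epoch}")
print(f"warmup steps:           {warmup_steps}")
print(f"throughput:             {steps_per_second:.2f} steps/s, {samples_per_second:.1f} samples/s")
```

With these numbers the run covers at most 14,800 examples per epoch at roughly 1.9 optimizer steps per second.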
## Framework versions
- Unsloth: 2026.3.10
- TRL: 0.22.2
- Transformers: 4.56.2
- PyTorch: 2.10.0+cu128
- Datasets: 4.8.4
- Tokenizers: 0.22.2

## License
This model is released under the Gemma license. See the [Gemma Terms of Use](https://ai.google.dev/gemma/terms) and [Prohibited Use Policy](https://policies.google.com/terms/generative-ai/use-policy) regarding the use of Gemma-generated content.
3
added_tokens.json
Normal file
@@ -0,0 +1,3 @@
{
  "<image_soft_token>": 262144
}
47
chat_template.jinja
Normal file
@@ -0,0 +1,47 @@
{{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
    {%- if messages[0]['content'] is string -%}
        {%- set first_user_prefix = messages[0]['content'] + '

' -%}
    {%- else -%}
        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '

' -%}
    {%- endif -%}
    {%- set loop_messages = messages[1:] -%}
{%- else -%}
    {%- set first_user_prefix = "" -%}
    {%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
    {%- endif -%}
    {%- if (message['role'] == 'assistant') -%}
        {%- set role = "model" -%}
    {%- else -%}
        {%- set role = message['role'] -%}
    {%- endif -%}
    {{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
    {%- if message['content'] is string -%}
        {{ message['content'] | trim }}
    {%- elif message['content'] is iterable -%}
        {%- for item in message['content'] -%}
            {%- if item['type'] == 'image' -%}
                {{ '<start_of_image>' }}
            {%- elif item['type'] == 'text' -%}
                {{ item['text'] | trim }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{ raise_exception("Invalid content type") }}
    {%- endif -%}
    {{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{'<start_of_turn>model
'}}
{%- endif -%}
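To make the template's behaviour concrete, here is a hand-written Python mirror of it for the common case where every message's `content` is a plain string. It is a sketch for reading along, not the code path that `transformers` uses. The function name `render_gemma_prompt` is invented for this example, and the `<bos>` default is taken from `special_tokens_map.json` in this commit.

```python
def render_gemma_prompt(messages, add_generation_prompt=True, bos_token="<bos>"):
    """Mirror of the Jinja chat template above, for string-only message content."""
    prefix = ""
    # A leading system message is not given its own turn: its text is
    # prepended to the first remaining message, followed by a blank line.
    if messages and messages[0]["role"] == "system":
        prefix = messages[0]["content"] + "\n\n"
        messages = messages[1:]

    parts = [bos_token]
    for index, message in enumerate(messages):
        # Turns must alternate user, assistant, user, ...
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        # The template renames the "assistant" role to "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n")
        if index == 0:
            parts.append(prefix)
        parts.append(message["content"].strip())
        parts.append("<end_of_turn>\n")

    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = render_gemma_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Suggest chores for a 5-year-old."},
])
print(prompt)
```

The rendered prompt shows the system prompt and the user message inside the same `user` turn, separated by a blank line, followed by an open `model` turn.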
63
config.json
Normal file
@@ -0,0 +1,63 @@
{
  "_sliding_window_pattern": 6,
  "architectures": [
    "Gemma3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attn_logit_softcapping": null,
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "dtype": "bfloat16",
  "eos_token_id": 106,
  "final_logit_softcapping": null,
  "head_dim": 256,
  "hidden_activation": "gelu_pytorch_tanh",
  "hidden_size": 1152,
  "initializer_range": 0.02,
  "intermediate_size": 6912,
  "layer_types": [
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention"
  ],
  "max_position_embeddings": 32768,
  "model_type": "gemma3_text",
  "num_attention_heads": 4,
  "num_hidden_layers": 26,
  "num_key_value_heads": 1,
  "pad_token_id": 0,
  "query_pre_attn_scalar": 256,
  "rms_norm_eps": 1e-06,
  "rope_local_base_freq": 10000,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": 512,
  "transformers_version": "4.56.2",
  "unsloth_fixed": true,
  "use_cache": true,
  "vocab_size": 262144
}
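The values in this config are enough to reproduce the parameter count quoted in the README. The sketch below is an estimate under stated assumptions: the output head shares the embedding matrix, linear layers have no bias (consistent with `attention_bias: false`), and each decoder layer carries four hidden-size RMSNorm weights plus per-head query and key norms. The function `count_parameters` is written for this example only.

```python
# Values copied from config.json above.
cfg = {
    "vocab_size": 262144,
    "hidden_size": 1152,
    "intermediate_size": 6912,
    "num_hidden_layers": 26,
    "num_attention_heads": 4,
    "num_key_value_heads": 1,
    "head_dim": 256,
}


def count_parameters(c):
    h, d = c["hidden_size"], c["head_dim"]
    q_out = c["num_attention_heads"] * d
    kv_out = c["num_key_value_heads"] * d
    embed = c["vocab_size"] * h                    # assumed tied with the output head
    attn = h * q_out + 2 * h * kv_out + q_out * h  # q, k, v and o projections
    mlp = 3 * h * c["intermediate_size"]           # gate, up and down projections
    norms = 4 * h + 2 * d                          # assumed norm layout per layer
    per_layer = attn + mlp + norms
    return embed + c["num_hidden_layers"] * per_layer + h  # plus the final norm


total = count_parameters(cfg)
bf16_bytes = 2 * total  # two bytes per bfloat16 weight

# Every sixth layer uses full attention, matching "_sliding_window_pattern": 6.
full_layers = [i for i in range(cfg["num_hidden_layers"]) if (i + 1) % 6 == 0]

print(f"estimated parameters: {total:,}")
print(f"bfloat16 weight size: {bf16_bytes:,} bytes")
print(f"full-attention layer indices: {full_layers}")
```

Under these assumptions the estimate equals the README's 999,885,952 parameters, and the bfloat16 size falls about 39 KB short of the 1,999,811,208-byte `model.safetensors` pointer, a gap that is plausibly the file's metadata header.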
14
generation_config.json
Normal file
@@ -0,0 +1,14 @@
{
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "do_sample": true,
  "eos_token_id": [
    1,
    106
  ],
  "max_length": 32768,
  "pad_token_id": 0,
  "top_k": 64,
  "top_p": 0.95,
  "transformers_version": "4.56.2"
}
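With `do_sample` enabled, `top_k` and `top_p` restrict which tokens can be drawn at each step. The following pure-Python sketch shows one common way to combine the two filters, top-k first and then nucleus filtering. It is illustrative only: the function `top_k_top_p_filter` is hypothetical and is not the `transformers` implementation.

```python
import math


def top_k_top_p_filter(logits, top_k=64, top_p=0.95):
    """Return {token_id: probability} for the tokens that survive both filters."""
    # Step 1: keep only the top_k highest-scoring tokens.
    ranked = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)[:top_k]

    # Step 2: softmax over the survivors (shifted by the max for stability).
    max_logit = ranked[0][1]
    weights = [(tok, math.exp(score - max_logit)) for tok, score in ranked]
    norm = sum(w for _, w in weights)
    probs = [(tok, w / norm) for tok, w in weights]

    # Step 3: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Step 4: renormalize so the remaining probabilities sum to one.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


toy_logits = [2.0, 1.0, 0.0, -4.0]
dist_default = top_k_top_p_filter(toy_logits, top_k=3, top_p=0.95)
dist_tighter = top_k_top_p_filter(toy_logits, top_k=3, top_p=0.90)
print(sorted(dist_default), sorted(dist_tighter))
```

A sampler would then draw the next token from the returned distribution. Lowering `top_p` shrinks the candidate set, as the second call illustrates.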
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8b63f9f5513b2f2152db3f57972313fdfb27abbcb99c612b8d77cfa6957fcddd
size 1999811208
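The three lines above are a Git LFS pointer, not the weights themselves: the repository stores only the hash and byte size of the real file. A small sketch of reading such a pointer follows, with `parse_lfs_pointer` as a hypothetical helper and the pointer text copied from this diff.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:8b63f9f5513b2f2152db3f57972313fdfb27abbcb99c612b8d77cfa6957fcddd
size 1999811208
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and unpack the hash algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
size_gib = info["size"] / 2**30
print(f"{info['algo']} digest {info['digest'][:12]}..., {size_gib:.2f} GiB")
```

After downloading the actual file, its SHA-256 digest and byte length can be compared against these two fields to confirm the download is intact.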
33
special_tokens_map.json
Normal file
@@ -0,0 +1,33 @@
{
  "boi_token": "<start_of_image>",
  "bos_token": {
    "content": "<bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eoi_token": "<end_of_image>",
  "eos_token": {
    "content": "<end_of_turn>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "image_token": "<image_soft_token>",
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
size 33384568
3
tokenizer.model
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074
51346
tokenizer_config.json
Normal file
File diff suppressed because it is too large
1426
train/log.json
Normal file
File diff suppressed because it is too large
BIN
train/training_loss.png
Normal file
Binary file not shown.
Size: 74 KiB
BIN
train/validation_loss.png
Normal file
Binary file not shown.
Size: 44 KiB