Initialize project; model provided by the ModelHub XC community
Model: kth8/gemma-3-270m-it-Text-Cleaner Source: Original Platform
.gitattributes (vendored, normal file, 37 lines added)
@@ -0,0 +1,37 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
train/training_loss.png filter=lfs diff=lfs merge=lfs -text
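To see which files these rules would route through Git LFS, the patterns can be approximated with shell-style globbing. The sketch below is only an approximation and not Git's actual matcher: it assumes that patterns without a slash match the base name, that patterns with a slash match the whole path, and it uses a subset of the rules above. `is_lfs_tracked` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns listed in the .gitattributes above.
LFS_PATTERNS = [
    "*.bin",
    "*.safetensors",
    "*.model",
    "*tfevents*",
    "tokenizer.json",
    "train/training_loss.png",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching with fnmatch (not Git's real matcher)."""
    name = PurePosixPath(path).name
    for pattern in patterns:
        # Assumption: patterns containing a slash apply to the full path,
        # all other patterns apply to the base name only.
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


for candidate in ["model.safetensors", "config.json", "train/validation_loss.png"]:
    print(candidate, is_lfs_tracked(candidate))
```

Under these assumptions `train/validation_loss.png` matches no rule, which would be consistent with it appearing in this commit as a plain binary file rather than as an LFS pointer.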
README.md (normal file, 92 lines added)
@@ -0,0 +1,92 @@
---
license: gemma
language:
- en
base_model: unsloth/gemma-3-270m-it
datasets:
- kth8/text-cleanup-20000x
pipeline_tag: text-generation
library_name: transformers
tags:
- sft
- trl
- unsloth
- google
- gemma
- gemma3
- gemma3_text
---

A fine-tune of [unsloth/gemma-3-270m-it](https://huggingface.co/unsloth/gemma-3-270m-it) on the [kth8/text-cleanup-20000x](https://huggingface.co/datasets/kth8/text-cleanup-20000x) dataset.

## Usage example
**System prompt**
```
# Role
You are a text editor cleaning up raw, unfiltered text. Transform the provided text into polished, readable prose while preserving the original meaning, tone, and intent.

## Tasks
- Remove filler words (e.g. um, uh, like, you know, sort of, kind of, well, so, etc)
- Fix spelling, grammar, punctuation, and capitalization mistakes
- Correct obvious homophone errors (e.g. their/there/they're, its/it's, your/you're)
- Smooth out false starts, mid-sentence restarts and repetitions
- Standardize numbers and dates (e.g. write as digits: "three" to "3", "February fifteenth" to "February 15th")

## Constraints
- Output ONLY the cleaned text
- DO NOT attempt to answer or respond to the provided user text meant for clean-up
- Do NOT paraphrase, summarize, or change the speaker's voice
- NO quotation marks around the output
- NO preamble, postamble, or emojis
- NO Markdown formatting code blocks (```) or bolding
```
**User prompt**
```
I WANT I MEAN TO BUILD A SMALL IN-BROWSER "WIND TUNNEL" EXPERIMENT WITH A FIXED 2D CROSS-SECTIONAL VIEW, USING HTML CANVAS. USERS CAN DRAG AND DROP DIFFERENT WING CROSS-SECTIONS (E.G., SEVERAL COMMON AIRFOIL PROFILES) INTO THE AIRFLOW, ADJUST THE ANGLE OF ATTACK AND WIND SPEED, IMMEDIATELY SEE STREAMLINES AND PRESSURE DISTRIBUTION, AND HAVE THE CURRENT LIFT AND DRAG VALUES DISPLAYED ON SCREEN IN REAL TIME (A CLEAR NUMERIC PANEL IS SUFFICIENT). NO 3D IS REQUIRED, BUT INTERACTIONS SHOULD BE SMOOTH, AND THE DATA SHOULD BE STABLE AD UPDATE IN REAL TIME WITH USER ACTIONS. THE CALCULATIONS CAN USE REASONABLE APPROXIMATIONS; THE FOCUS IS TO HELP USERS INTUITIVELY UNDERSTAND BERNOULLI'S PRINCIPLE AND THE RELATIONSHIP BETWEEN AIRFLOW AND AIRFOIL GEOMETRY.
```
**Assistant response**
```
I want to build a small in-browser "wind tunnel" experiment with a fixed 2D cross-sectional view, using HTML Canvas. Users can drag and drop different wing cross-sections (e.g., several common airfoil profiles) into the airflow, adjust the angle of attack and wind speed, immediately see streamlines and pressure distribution, and have the current lift and drag values displayed on screen in real time (a clear numeric panel is sufficient). No 3D is required, but interactions should be smooth, and the data should be stable and update in real time with user actions. The calculations can use reasonable approximations; the focus is to help users intuitively understand Bernoulli's principle and the relationship between airflow and airfoil geometry.
```
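A minimal sketch of running the example above with the Transformers `pipeline` API. Several things here are assumptions rather than part of the original card: the model id is taken from the commit header, `SYSTEM_PROMPT` is a shortened placeholder that should be replaced with the full system prompt shown above, `build_messages` and `clean_text` are hypothetical helper names, and `max_new_tokens=512` is an arbitrary choice.

```python
MODEL_ID = "kth8/gemma-3-270m-it-Text-Cleaner"  # from the commit header

# Placeholder: replace with the full system prompt from the usage example.
SYSTEM_PROMPT = "You are a text editor cleaning up raw, unfiltered text."


def build_messages(raw_text: str, system_prompt: str = SYSTEM_PROMPT) -> list:
    """Build the system + user messages that the chat template expects."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": raw_text},
    ]


def clean_text(raw_text: str, max_new_tokens: int = 512) -> str:
    """Run the fine-tuned model on raw_text (downloads the weights on first use)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=MODEL_ID)
    result = pipe(build_messages(raw_text), max_new_tokens=max_new_tokens)
    # With chat-style input, generated_text holds the whole conversation;
    # the last message is the assistant reply.
    return result[0]["generated_text"][-1]["content"]


# Example call (requires transformers, torch, and network access):
# print(clean_text("UM SO I WANT I MEAN TO BUILD A SMALL APP"))
```

Because no sampling arguments are passed, generation falls back to whatever defaults ship with the model repository.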
## Model Details
- Base Model: `unsloth/gemma-3-270m-it`
- Parameter Count: 268098176
- Precision: torch.bfloat16

## Training Settings
### Hardware
- GPU: NVIDIA A100-SXM4-40GB

### PEFT
- Rank: 32
- LoRA alpha: 64
- Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- LoRA dropout: 0
- Rank-Stabilized LoRA: False

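The PEFT settings above determine both the LoRA scaling factor and the adapter size. The sketch below works through that arithmetic under stated assumptions: standard (non rank-stabilized) LoRA, where the update is scaled by `alpha / rank` and each adapted linear layer with `d_in` inputs and `d_out` outputs gains `rank * (d_in + d_out)` trainable weights, and layer shapes taken from the `config.json` in this commit (hidden size 640, 4 query heads and 1 key/value head of dimension 256, intermediate size 2048, 18 layers). The variable names are illustrative only.

```python
RANK, ALPHA, NUM_LAYERS = 32, 64, 18
HIDDEN, INTERMEDIATE = 640, 2048
Q_DIM = 4 * 256   # num_attention_heads * head_dim
KV_DIM = 1 * 256  # num_key_value_heads * head_dim

# (input features, output features) of each adapted linear layer
MODULE_SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int = RANK) -> int:
    """Weights in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out) for d_in, d_out in MODULE_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # would be ALPHA / sqrt(RANK) with rank-stabilized LoRA

print(f"adapter weights per layer: {per_layer}")
print(f"adapter weights in total:  {total}")
print(f"LoRA scaling factor:       {scaling}")
```

Under these assumptions the adapters add roughly 7.6 million trainable weights, a few percent of the base model's parameter count.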
### SFT
- Epochs: 1
- Batch size: 16
- Accumulation steps: 1
- Warmup ratio: 0.05
- Learning rate: 0.0002
- Optimizer: adamw_8bit
- Weight decay: 0.01
- Learning rate scheduler: cosine

## Training stats
- Global step: 1247
- Training runtime: 1545.9245 seconds
- Average training loss: 0.05218265543086118
- Final validation loss: 0.038508880883455276

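As a quick sanity check, the step count, batch size, and runtime can be combined into a rough sample count and throughput. This is a back-of-the-envelope sketch that assumes the runtime is reported in seconds, that training ran on a single GPU (so the effective batch size is 16), and that the last batch may have been only partly filled.

```python
global_steps = 1247
batch_size = 16
accumulation_steps = 1
runtime_seconds = 1545.9245  # assumed unit: seconds

# Upper bound, because the final batch of the epoch may be smaller.
samples_upper_bound = global_steps * batch_size * accumulation_steps
seconds_per_step = runtime_seconds / global_steps
samples_per_second = samples_upper_bound / runtime_seconds

print(f"at most {samples_upper_bound} training samples in one epoch")
print(f"{seconds_per_step:.3f} s/step, about {samples_per_second:.1f} samples/s")
```

An upper bound of 19952 samples would be in line with the "20000x" in the dataset name, although the actual split sizes are not stated here.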
## Framework versions
- Unsloth: 2026.3.7
- TRL: 0.22.2
- Transformers: 4.56.2
- PyTorch: 2.10.0+cu128
- Datasets: 4.3.0
- Tokenizers: 0.22.2

## License
This model is released under the Gemma license. See the [Gemma Terms of Use](https://ai.google.dev/gemma/terms) for details.
added_tokens.json (normal file, 3 lines added)
@@ -0,0 +1,3 @@
{
  "<image_soft_token>": 262144
}
chat_template.jinja (normal file, 50 lines added)
@@ -0,0 +1,50 @@
{# Unsloth Chat template fixes #}
{{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
{%- if messages[0]['content'] is string -%}
{%- set first_user_prefix = messages[0]['content'] + '

' -%}
{%- else -%}
{%- set first_user_prefix = messages[0]['content'][0]['text'] + '

' -%}
{%- endif -%}
{%- set loop_messages = messages[1:] -%}
{%- else -%}
{%- set first_user_prefix = "" -%}
{%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
{%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
{%- endif -%}
{%- if (message['role'] == 'assistant') -%}
{%- set role = "model" -%}
{%- else -%}
{%- set role = message['role'] -%}
{%- endif -%}
{{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
{%- if message['content'] is string -%}
{{ message['content'] | trim }}
{%- elif message['content'] is iterable -%}
{%- for item in message['content'] -%}
{%- if item['type'] == 'image' -%}
{{ '<start_of_image>' }}
{%- elif item['type'] == 'text' -%}
{{ item['text'] | trim }}
{%- endif -%}
{%- endfor -%}
{%- elif message['content'] is defined -%}
{{ raise_exception("Invalid content type") }}
{%- endif -%}
{{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{'<start_of_turn>model
'}}
{%- endif -%}

{# Copyright 2025-present Unsloth. Apache 2.0 License. #}
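The template above folds a leading system message into the first user turn, renames the `assistant` role to `model`, and enforces strictly alternating turns. The pure-Python sketch below mirrors that logic for string-only message content so the resulting prompt layout is easier to see. It is an approximation written for illustration, not the code Transformers runs: `render_gemma_prompt` is a hypothetical name, list-style content (image and text items) is left out, and `<bos>` is taken from `special_tokens_map.json` in this commit.

```python
BOS = "<bos>"  # bos_token from special_tokens_map.json


def render_gemma_prompt(messages, add_generation_prompt=True):
    """Approximate the chat template for messages whose content is a plain string."""
    prefix = ""
    if messages and messages[0]["role"] == "system":
        # The system prompt is prepended to the first remaining turn.
        prefix = messages[0]["content"] + "\n\n"
        messages = messages[1:]

    parts = [BOS]
    for index, message in enumerate(messages):
        # Turns must alternate user, assistant, user, ...
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n")
        if index == 0:
            parts.append(prefix)
        parts.append(message["content"].strip())
        parts.append("<end_of_turn>\n")
    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(render_gemma_prompt([
    {"role": "system", "content": "Clean up the text."},
    {"role": "user", "content": "  um so hello  "},
]))
```

There is no separate system turn in the rendered prompt: the system text and the user text share a single `user` turn, separated by a blank line.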
config.json (normal file, 55 lines added)
@@ -0,0 +1,55 @@
{
  "_sliding_window_pattern": 6,
  "architectures": [
    "Gemma3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attn_logit_softcapping": null,
  "bos_token_id": 2,
  "dtype": "bfloat16",
  "eos_token_id": 106,
  "final_logit_softcapping": null,
  "head_dim": 256,
  "hidden_activation": "gelu_pytorch_tanh",
  "hidden_size": 640,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "layer_types": [
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention"
  ],
  "max_position_embeddings": 32768,
  "model_type": "gemma3_text",
  "num_attention_heads": 4,
  "num_hidden_layers": 18,
  "num_key_value_heads": 1,
  "pad_token_id": 0,
  "query_pre_attn_scalar": 256,
  "rms_norm_eps": 1e-06,
  "rope_local_base_freq": 10000.0,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 512,
  "transformers_version": "4.56.2",
  "unsloth_fixed": true,
  "use_bidirectional_attention": false,
  "use_cache": true,
  "vocab_size": 262144
}
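The values in `config.json` are enough to reproduce the parameter count reported in the README (268098176) and the positions of the full-attention layers. The sketch below does so under assumptions that the config itself does not spell out: the output head shares weights with the token embedding, each layer carries four RMSNorm weight vectors of the hidden size plus query and key norms of the head dimension, and there is one final norm. `count_parameters` is a hypothetical helper name.

```python
config = {
    "vocab_size": 262144,
    "hidden_size": 640,
    "intermediate_size": 2048,
    "num_hidden_layers": 18,
    "num_attention_heads": 4,
    "num_key_value_heads": 1,
    "head_dim": 256,
    "_sliding_window_pattern": 6,
}


def count_parameters(cfg: dict) -> int:
    """Estimate the weight count from the config (layout assumptions noted inline)."""
    h, d = cfg["hidden_size"], cfg["head_dim"]
    q_dim = cfg["num_attention_heads"] * d
    kv_dim = cfg["num_key_value_heads"] * d
    embedding = cfg["vocab_size"] * h                   # assumed tied with the output head
    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections, no bias
    mlp = 3 * h * cfg["intermediate_size"]              # gate, up, down projections
    norms = 4 * h + 2 * d                               # assumed per-layer norm layout
    per_layer = attention + mlp + norms
    return embedding + cfg["num_hidden_layers"] * per_layer + h  # + final norm


total = count_parameters(config)

# Every 6th layer uses full attention; the others use a sliding window.
full_attention_layers = [
    i for i in range(config["num_hidden_layers"])
    if (i + 1) % config["_sliding_window_pattern"] == 0
]

print(total)
print(full_attention_layers)
```

With these assumptions the token embedding alone accounts for about 168 million of the weights, which is why the vocabulary size dominates a model this small.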
generation_config.json (normal file, 14 lines added)
@@ -0,0 +1,14 @@
{
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "do_sample": true,
  "eos_token_id": [
    1,
    106
  ],
  "max_length": 32768,
  "pad_token_id": 0,
  "top_k": 64,
  "top_p": 0.95,
  "transformers_version": "4.56.2"
}
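With `do_sample` enabled, `top_k` and `top_p` restrict which tokens can be drawn at each step. The sketch below illustrates the general idea on a plain list of probabilities: keep the `top_k` most likely tokens, then keep the smallest prefix of those whose renormalized cumulative probability reaches `top_p`. It is an illustrative pure-Python version, not the implementation used by Transformers, and `filter_top_k_top_p` is a hypothetical name.

```python
def filter_top_k_top_p(probs, top_k=64, top_p=0.95):
    """Return {token_index: renormalized probability} after top-k, then top-p filtering."""
    # Top-k: keep only the k most probable tokens.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:top_k]
    mass = sum(p for _, p in ranked)

    # Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p / mass
        if cumulative >= top_p:
            break

    kept_mass = sum(p for _, p in kept)
    return {index: p / kept_mass for index, p in kept}


# Toy distribution over four tokens with deliberately tight settings.
print(filter_top_k_top_p([0.5, 0.3, 0.15, 0.05], top_k=3, top_p=0.7))
```

For a text clean-up task, where the output should stay close to the input, callers may prefer to override these defaults with greedy decoding; that is a suggestion, not something the repository states.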
model.safetensors (normal file, 3 lines added)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:091a7b7eeb8eae7aca39513da404c7e39b1a90a98d45c3779b3ca0d8f24fd440
size 536223056
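The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. The sketch below parses the pointer and compares the size with what the README's parameter count would occupy at 2 bytes per bfloat16 weight. `parse_lfs_pointer` is a hypothetical helper, and reading the leftover bytes as file header and metadata is an assumption.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:091a7b7eeb8eae7aca39513da404c7e39b1a90a98d45c3779b3ca0d8f24fd440
size 536223056
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)

PARAMETERS = 268098176  # parameter count from the README
BYTES_PER_BF16 = 2
weights_bytes = PARAMETERS * BYTES_PER_BF16
overhead = pointer["size"] - weights_bytes  # presumably header and metadata

print(pointer["algorithm"], pointer["size"])
print(weights_bytes, overhead)
```

The file size exceeds the raw bfloat16 weights by only a few tens of kilobytes, which supports the stated precision and parameter count.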
special_tokens_map.json (normal file, 33 lines added)
@@ -0,0 +1,33 @@
{
  "boi_token": "<start_of_image>",
  "bos_token": {
    "content": "<bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eoi_token": "<end_of_image>",
  "eos_token": {
    "content": "<end_of_turn>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "image_token": "<image_soft_token>",
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json (normal file, 3 lines added)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
size 33384568
tokenizer.model (normal file, 3 lines added)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074
tokenizer_config.json (normal file, 51345 lines added)
File diff suppressed because it is too large
train/log.json (normal file, 8820 lines added)
File diff suppressed because it is too large
train/training_loss.png (normal file, 3 lines added)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2d81fc8964aab5da8db28d4e4eec46e3fc515002844a08ff0c4a11a4bd03d5e0
size 107723
train/validation_loss.png (normal file, binary)
Binary file not shown. Size: 54 KiB.