---
library_name: transformers
pipeline_tag: text-generation
tags:
- merlina
- grimoire
- text-generation
- sft
datasets:
- flammenai/flame-kindling-v1
base_model:
- Qwen/Qwen2.5-3B-Instruct
---

# FlameDesigner-Qwen2.5-3B-v1

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Training Mode | SFT |
| Base Model | `Qwen/Qwen2.5-3B-Instruct` |
| Learning Rate | 0.0001 |
| Epochs | 2 |
| Batch Size | 1 |
| Gradient Accumulation | 8 |
| Effective Batch Size | 8 |
| Max Sequence Length | 2048 |
| Optimizer | paged_adamw_8bit |
| LR Scheduler | cosine |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Grad Norm | 0.5 |
| Seed | 42 |
| LoRA Rank (r) | 128 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | up_proj, down_proj, gate_proj, k_proj, q_proj, v_proj, o_proj |
| Quantization | 4-bit (NF4) |
| GPU | NVIDIA RTX A6000 |

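
Some values in the table are derived from others. As a minimal sketch (not Merlina's actual training code), the snippet below recomputes the effective batch size and the LoRA scaling factor, and models the learning-rate curve as linear warmup followed by cosine decay to zero, which is the usual shape of a `cosine` scheduler combined with a warmup ratio. `TrainConfig`, `lr_at`, and `TOTAL_STEPS` are hypothetical names introduced for illustration; the step count is a placeholder because the card does not state it, and a single GPU is assumed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from the Training Configuration table."""
    learning_rate: float = 1e-4
    batch_size: int = 1
    grad_accum: int = 8
    warmup_ratio: float = 0.05
    lora_r: int = 128
    lora_alpha: int = 128

    @property
    def effective_batch_size(self) -> int:
        # Sequences contributing to each optimizer step (single GPU assumed).
        return self.batch_size * self.grad_accum

    @property
    def lora_scale(self) -> float:
        # Standard LoRA scaling factor alpha / r applied to the adapter update.
        return self.lora_alpha / self.lora_r


def lr_at(step: int, total_steps: int, cfg: TrainConfig) -> float:
    """Linear warmup, then cosine decay to zero (assumed schedule shape)."""
    warmup_steps = math.ceil(cfg.warmup_ratio * total_steps)
    if step < warmup_steps:
        return cfg.learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return cfg.learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


cfg = TrainConfig()
TOTAL_STEPS = 1000  # hypothetical; the card does not report the step count

print(cfg.effective_batch_size)  # 8
print(cfg.lora_scale)            # 1.0
for step in (0, 25, 50, 500, 1000):
    # Rises during the first 5% of steps, peaks at 1e-4, then decays.
    print(step, lr_at(step, TOTAL_STEPS, cfg))
```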
---

[Merlina on GitHub](https://github.com/Schneewolf-Labs/Merlina)