Initialize project; model provided by the ModelHub XC community
Model: flammenai/FlameDesigner-Qwen2.5-3B-v1 Source: Original Platform

---
library_name: transformers
pipeline_tag: text-generation
tags:
- merlina
- grimoire
- text-generation
- sft
datasets:
- flammenai/flame-kindling-v1
base_model:
- Qwen/Qwen2.5-3B-Instruct
---

# FlameDesigner-Qwen2.5-3B-v1

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Training Mode | Supervised fine-tuning (SFT) |
| Base Model | `Qwen/Qwen2.5-3B-Instruct` |
| Learning Rate | 0.0001 |
| Epochs | 2 |
| Per-Device Batch Size | 1 |
| Gradient Accumulation Steps | 8 |
| Effective Batch Size | 8 |
| Max Sequence Length | 2048 tokens |
| Optimizer | `paged_adamw_8bit` |
| LR Scheduler | cosine |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Grad Norm | 0.5 |
| Seed | 42 |
| LoRA Rank (r) | 128 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | `up_proj`, `down_proj`, `gate_proj`, `k_proj`, `q_proj`, `v_proj`, `o_proj` |
| Quantization | 4-bit (NF4) |
| GPU | NVIDIA RTX A6000 |

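
Several rows in the table depend on each other:

- The effective batch size is the per-device batch size times the gradient-accumulation steps.
- The LoRA update is scaled by alpha / r.
- The warmup ratio fixes how many optimizer steps the cosine schedule spends ramping up.

The stdlib-only Python sketch below reproduces that arithmetic. It rests on several assumptions:

- Training runs on a single GPU.
- LoRA uses standard scaling, not rank-stabilized scaling.
- Warmup steps are rounded up from the ratio.
- The learning rate follows a linear-warmup cosine decay in the style of Hugging Face's `get_cosine_schedule_with_warmup`.

The card does not say how Merlina builds its scheduler. The total of 1000 steps is a hypothetical placeholder because the dataset size is not given, and all helper names are illustrative.

```python
import math

# Values copied from the Training Configuration table.
LEARNING_RATE = 1e-4
PER_DEVICE_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_RATIO = 0.05
LORA_RANK = 128
LORA_ALPHA = 128


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Sequences contributing to one optimizer step (assumes one GPU by default)."""
    return per_device * grad_accum * num_devices


def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r (assumption: not rsLoRA)."""
    return alpha / rank


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Assumption: the warmup fraction is rounded up to whole optimizer steps."""
    return math.ceil(total_steps * ratio)


def cosine_lr(step: int, total_steps: int, peak_lr: float, ratio: float) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay to 0 at total_steps."""
    warmup = warmup_steps(total_steps, ratio)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Hypothetical step count; the real value depends on the dataset size.
    total = 1000
    print("effective batch size:",
          effective_batch_size(PER_DEVICE_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    print("LoRA scaling:", lora_scaling(LORA_ALPHA, LORA_RANK))
    print("warmup steps:", warmup_steps(total, WARMUP_RATIO))
    for step in (0, 25, 50, 525, 1000):
        print(step, f"{cosine_lr(step, total, LEARNING_RATE, WARMUP_RATIO):.2e}")
```

Because the rank and alpha are both 128 here, the low-rank update is applied at a scale of 1.0. Under standard LoRA, raising r while holding alpha fixed would shrink that scale proportionally.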

---

[Merlina on GitHub](https://github.com/Schneewolf-Labs/Merlina)