ModelHub XC ba6571a1c6 Initialize project; model provided by the ModelHub XC community
Model: artificialguybr/QWEN-2.5-0.5B-Synthia-II
Source: Original Platform
2026-04-25 16:09:01 +08:00

---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B
tags:
- generated_from_trainer
- text-generation
- conversational
model-index:
- name: outputs/qwen2.5-0.5b-ft-synthia15-i
  results: []
datasets:
- migtissera/Synthia-v1.5-II
---
# Qwen2.5-0.5B Synthia Fine-tuned Model
This is a fine-tuned version of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on the [Synthia-v1.5-II](https://huggingface.co/datasets/migtissera/Synthia-v1.5-II) dataset, optimized for conversational AI and instruction following.
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
### 🌐 Website
You can find more of my models, projects, and information on my official website:
- **[artificialguy.com](https://artificialguy.com/)**
### 🚀 Prompt Hub
Need high-quality prompts for image models and LLMs? Explore **[findgoodprompt.com](https://findgoodprompt.com)**.
### 💖 Support My Work
If you find this model useful, please consider supporting my work. It helps me cover server costs and dedicate more time to new open-source projects.
- **Patreon:** [Support on Patreon](https://www.patreon.com/user?u=81570187)
- **Ko-fi:** [Buy me a Ko-fi](https://ko-fi.com/artificialguybr)
- **Buy Me a Coffee:** [Buy me a Coffee](https://buymeacoffee.com/jvkape)

## Model Description
This model builds upon the powerful Qwen2.5-0.5B base model, which features:
- 490M parameters (360M non-embedding parameters)
- 24 transformer layers
- 14 attention heads for queries and 2 for key/values (GQA architecture)
- Support for 32,768 context length
- Advanced features like RoPE positional embeddings, SwiGLU activations, and RMSNorm

The model has been fine-tuned on the Synthia-v1.5-II dataset, which is designed to enhance instruction following and conversational abilities. The training process used careful hyperparameter tuning to maintain the base model's capabilities while optimizing for natural dialogue and instruction following.
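
To make the grouped-query attention (GQA) figures from the model description concrete, here is a minimal sketch of how 14 query heads can share 2 key/value heads, and what that implies for key/value cache size. The head counts and layer count come from this card; the head dimension of 64 and the 2-byte (BF16) cache entries are assumptions for illustration, and the function names are hypothetical.

```python
# Head and layer counts are taken from the model card.
NUM_QUERY_HEADS = 14
NUM_KV_HEADS = 2
NUM_LAYERS = 24
# Assumption (not stated in the card): each head has dimension 64.
HEAD_DIM = 64


def kv_head_for_query_head(q_head: int) -> int:
    """Return the index of the key/value head shared by a given query head."""
    if not 0 <= q_head < NUM_QUERY_HEADS:
        raise ValueError("query head index out of range")
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 7 query heads per KV head
    return q_head // group_size


def kv_cache_bytes_per_token(num_kv_heads: int = NUM_KV_HEADS,
                             bytes_per_value: int = 2) -> int:
    """Bytes of key/value cache for one token (2 bytes per value assumes BF16)."""
    # One key tensor and one value tensor per layer, hence the factor of 2.
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * bytes_per_value


if __name__ == "__main__":
    print([kv_head_for_query_head(h) for h in range(NUM_QUERY_HEADS)])
    print(kv_cache_bytes_per_token())                 # GQA with 2 KV heads
    print(kv_cache_bytes_per_token(NUM_QUERY_HEADS))  # one KV head per query head
```

Under these assumptions, sharing key/value heads shrinks the per-token cache by a factor of 7 compared with giving every query head its own key/value head.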
## Intended Uses & Limitations
This model is intended for:
- Conversational AI applications
- Instruction following tasks
- Text generation with strong coherence
- Multi-turn dialogue systems
Limitations:
- The model inherits the 32K token context window from the base model
- As a 0.5B parameter model, it may not match larger models in complex reasoning tasks
- Performance in non-English languages may be limited
- Users should be aware of potential biases present in the training data
## Training and Evaluation Data
The model was fine-tuned on the Synthia-v1.5-II dataset, which is specifically designed for instruction-following and conversational AI. The training process used:
- 95% of data for training
- 5% for validation
- Instruction format: "[INST] {instruction} [/INST]"
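
The instruction format above can be applied at inference time with a small helper. The following is a minimal sketch, not an official usage recipe: it assumes the model is published on the Hugging Face Hub under the repository id shown in the page header, that single-turn prompts should be wrapped exactly as in the training format, and that default generation settings are acceptable. `build_prompt` and `generate_reply` are hypothetical helper names.

```python
# Assumption: Hub repository id, taken from the page header of this file.
MODEL_ID = "artificialguybr/QWEN-2.5-0.5B-Synthia-II"


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the "[INST] ... [/INST]" format used in training."""
    return f"[INST] {instruction.strip()} [/INST]"


def generate_reply(instruction: str, max_new_tokens: int = 256) -> str:
    """Generate a reply with transformers (downloads the model on first use)."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(build_prompt("Summarize grouped-query attention in one sentence."))
```

Calling `generate_reply("...")` requires `transformers` and PyTorch to be installed, plus network access to fetch the weights.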
## Training Procedure
### Training Hyperparameters
Key hyperparameters:
- Learning rate: 1e-05
- Batch size: 40 (5 micro-batch × 8 gradient accumulation steps)
- Training epochs: 3
- Optimizer: AdamW (β1=0.9, β2=0.999, ε=1e-8)
- Learning rate scheduler: Cosine with 100 warmup steps
- Sequence length: 4096
- Sample packing: Enabled
- Mixed precision: BF16
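
The batch-size arithmetic and learning-rate schedule listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the training code: it assumes linear warmup from zero, cosine decay to zero over the remaining steps, a single device, and the total of 672 steps reported under Training Results. The function names are hypothetical.

```python
import math

PEAK_LR = 1e-5      # learning rate from the hyperparameter list
WARMUP_STEPS = 100  # warmup steps from the hyperparameter list
TOTAL_STEPS = 672   # total steps reported under Training Results


def effective_batch_size(micro_batch: int = 5, grad_accum: int = 8,
                         num_devices: int = 1) -> int:
    """Sequences per optimizer step (a single device is an assumption)."""
    return micro_batch * grad_accum * num_devices


def learning_rate(step: int) -> float:
    """Learning rate at a step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size())
    for step in (0, 50, 100, 386, 672):
        print(step, learning_rate(step))
```

With these assumptions the rate peaks at 1e-05 at step 100, is half of that at step 386 (the midpoint of the decay phase), and reaches zero at step 672.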
### Training Results
The model was trained for 672 steps over 3 epochs (224 steps per epoch), showing consistent improvement throughout the training process.
### Framework Versions
- Transformers 4.46.0
- PyTorch 2.3.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
## Citation
If you use this model, please cite both the original Qwen2.5 work and this fine-tuned version: