Files
treeswift-90m/README.md

37 lines
1.0 KiB
Markdown
Raw Permalink Normal View History

---
license: apache-2.0
datasets:
- HuggingFaceH4/no_robots
- HuggingFaceTB/everyday-conversations-llama3.1-2k
language:
- en
base_model:
- distilbert/distilgpt2
pipeline_tag: text-generation
library_name: transformers
---
# Treeswift
Treeswift is a derivative of DistilGPT2 trained to be conversational. It is also designed to be similar to GPT-3.5.
## Training
The model was trained using 2,750 steps, and 4 batch size.
### Datasets
The training corpus is made up of:
- [HuggingFaceH4/no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots)
- [HuggingFaceTB/everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k)
The `train` / `train_sft` splits were used.
### Chat template
The Zephyr chat template was used, but most notably, chat template tokens were added to enhance performance.
## Limitations
The model frequently outputs incorrect information, confirmation with a larger, mature model is advised. In addition, it may subtly repeat.