37 lines
1.0 KiB
Markdown
37 lines
1.0 KiB
Markdown
|
|
---
|
||
|
|
license: apache-2.0
|
||
|
|
datasets:
|
||
|
|
- HuggingFaceH4/no_robots
|
||
|
|
- HuggingFaceTB/everyday-conversations-llama3.1-2k
|
||
|
|
language:
|
||
|
|
- en
|
||
|
|
base_model:
|
||
|
|
- distilbert/distilgpt2
|
||
|
|
pipeline_tag: text-generation
|
||
|
|
library_name: transformers
|
||
|
|
---
|
||
|
|
|
||
|
|
# Treeswift
|
||
|
|
|
||
|
|
Treeswift is a derivative of DistilGPT2 trained to be conversational. It is also designed to be similar to GPT-3.5.
|
||
|
|
|
||
|
|
## Training
|
||
|
|
|
||
|
|
The model was trained using 2,750 steps, and 4 batch size.
|
||
|
|
|
||
|
|
### Datasets
|
||
|
|
|
||
|
|
The training corpus is made up of:
|
||
|
|
|
||
|
|
- [HuggingFaceH4/no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots)
|
||
|
|
- [HuggingFaceTB/everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k)
|
||
|
|
|
||
|
|
The `train` / `train_sft` splits were used.
|
||
|
|
|
||
|
|
### Chat template
|
||
|
|
|
||
|
|
The Zephyr chat template was used, but most notably, chat template tokens were added to enhance performance.
|
||
|
|
|
||
|
|
## Limitations
|
||
|
|
|
||
|
|
The model frequently outputs incorrect information, confirmation with a larger, mature model is advised. In addition, it may subtly repeat.
|