--- license: apache-2.0 datasets: - HuggingFaceH4/no_robots - HuggingFaceTB/everyday-conversations-llama3.1-2k language: - en base_model: - distilbert/distilgpt2 pipeline_tag: text-generation library_name: transformers --- # Treeswift Treeswift is a derivative of DistilGPT2 trained to be conversational. It is also designed to be similar to GPT-3.5. ## Training The model was trained using 2,750 steps, and 4 batch size. ### Datasets The training corpus is made up of: - [HuggingFaceH4/no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots) - [HuggingFaceTB/everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k) The `train` / `train_sft` splits were used. ### Chat template The Zephyr chat template was used, but most notably, chat template tokens were added to enhance performance. ## Limitations The model frequently outputs incorrect information, confirmation with a larger, mature model is advised. In addition, it may subtly repeat.