treeswift-90m/README.md

---
license: apache-2.0
datasets:
- HuggingFaceH4/no_robots
- HuggingFaceTB/everyday-conversations-llama3.1-2k
language:
- en
base_model:
- distilbert/distilgpt2
pipeline_tag: text-generation
library_name: transformers
---

# Treeswift

Treeswift is a derivative of DistilGPT2 trained to be conversational. It is also designed to be similar to GPT-3.5.

## Training

The model was trained using 2,750 steps, and 4 batch size.

### Datasets

The training corpus is made up of:

- [HuggingFaceH4/no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots)
- [HuggingFaceTB/everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k)

The `train` / `train_sft` splits were used.

### Chat template

The Zephyr chat template was used, but most notably, chat template tokens were added to enhance performance.

## Limitations

The model frequently outputs incorrect information, confirmation with a larger, mature model is advised. In addition, it may subtly repeat.
初始化项目，由ModelHub XC社区提供模型 Model: qikp/treeswift-90m Source: Original Platform 2026-05-24 11:19:35 +08:00			`---`
			`license: apache-2.0`
			`datasets:`
			`- HuggingFaceH4/no_robots`
			`- HuggingFaceTB/everyday-conversations-llama3.1-2k`
			`language:`
			`- en`
			`base_model:`
			`- distilbert/distilgpt2`
			`pipeline_tag: text-generation`
			`library_name: transformers`
			`---`

			`# Treeswift`

			`Treeswift is a derivative of DistilGPT2 trained to be conversational. It is also designed to be similar to GPT-3.5.`

			`## Training`

			`The model was trained using 2,750 steps, and 4 batch size.`

			`### Datasets`

			`The training corpus is made up of:`

			`- [HuggingFaceH4/no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots)`
			`- [HuggingFaceTB/everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k)`

			The `train` / `train_sft` splits were used.

			`### Chat template`

			`The Zephyr chat template was used, but most notably, chat template tokens were added to enhance performance.`

			`## Limitations`

			`The model frequently outputs incorrect information, confirmation with a larger, mature model is advised. In addition, it may subtly repeat.`