初始化项目,由ModelHub XC社区提供模型
Model: qikp/treeswift-90m Source: Original Platform
This commit is contained in:
37
README.md
Normal file
37
README.md
Normal file
@@ -0,0 +1,37 @@
|
||||
---
|
||||
license: apache-2.0
|
||||
datasets:
|
||||
- HuggingFaceH4/no_robots
|
||||
- HuggingFaceTB/everyday-conversations-llama3.1-2k
|
||||
language:
|
||||
- en
|
||||
base_model:
|
||||
- distilbert/distilgpt2
|
||||
pipeline_tag: text-generation
|
||||
library_name: transformers
|
||||
---
|
||||
|
||||
# Treeswift
|
||||
|
||||
Treeswift is a derivative of DistilGPT2 trained to be conversational. It is also designed to be similar to GPT-3.5.
|
||||
|
||||
## Training
|
||||
|
||||
The model was trained using 2,750 steps, and 4 batch size.
|
||||
|
||||
### Datasets
|
||||
|
||||
The training corpus is made up of:
|
||||
|
||||
- [HuggingFaceH4/no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots)
|
||||
- [HuggingFaceTB/everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k)
|
||||
|
||||
The `train` / `train_sft` splits were used.
|
||||
|
||||
### Chat template
|
||||
|
||||
The Zephyr chat template was used, but most notably, chat template tokens were added to enhance performance.
|
||||
|
||||
## Limitations
|
||||
|
||||
The model frequently outputs incorrect information, confirmation with a larger, mature model is advised. In addition, it may subtly repeat.
|
||||
Reference in New Issue
Block a user