46 lines
1.4 KiB
Markdown
46 lines
1.4 KiB
Markdown
---
|
|
license: mit
|
|
datasets:
|
|
- qikp/reborn-5k-no-thoughts
|
|
- HuggingFaceTB/smol-smoltalk
|
|
- HuggingFaceTB/everyday-conversations-llama3.1-2k
|
|
language:
|
|
- en
|
|
base_model:
|
|
- openai-community/gpt2
|
|
pipeline_tag: text-generation
|
|
library_name: transformers
|
|
new_version: qikp/hummingbird-2.1-110m
|
|
---
|
|
|
|
# Hummingbird
|
|
|
|
🎉 You are looking at Hummingbird 2, trained on a much more efficient corpus, achieving similar performance with 3x less parameters!
|
|
|
|
Hummingbird is a GPT-2 derivative trained to be conversational.
|
|
|
|
## Training
|
|
|
|
The model was trained using the `paged_adamw_8bit` optimizer, gradient checkpointing, 500 steps, 1 batch size, and 4 gradient accumulation steps.
|
|
|
|
### Datasets
|
|
|
|
The training corpus is made up of:
|
|
|
|
- First 1400 rows of [qikp/reborn-5k-no-thoughts](https://huggingface.co/datasets/qikp/reborn-5k-no-thoughts)
|
|
- First 500 rows of [HuggingFaceTB/smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk)
|
|
- First 100 rows of [HuggingFaceTB/everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k)
|
|
|
|
The `train` / `train_sft` splits were used.
|
|
|
|
### Chat template
|
|
|
|
The Zephyr chat template was used.
|
|
|
|
## Limitations
|
|
|
|
The model frequently outputs incorrect information, confirmation with a larger, mature model is advised.
|
|
|
|
## Benchmark
|
|
|
|
This model was tested against GAIA and compared using embeddings. See the results [here](https://codeberg.org/qikp/benchmarks). |