--- license: mit datasets: - qikp/reborn-5k-no-thoughts - HuggingFaceTB/smol-smoltalk - HuggingFaceTB/everyday-conversations-llama3.1-2k language: - en base_model: - openai-community/gpt2 pipeline_tag: text-generation library_name: transformers new_version: qikp/hummingbird-2.1-110m --- # Hummingbird 🎉 You are looking at Hummingbird 2, trained on a much more efficient corpus, achieving similar performance with 3x less parameters! Hummingbird is a GPT-2 derivative trained to be conversational. ## Training The model was trained using the `paged_adamw_8bit` optimizer, gradient checkpointing, 500 steps, 1 batch size, and 4 gradient accumulation steps. ### Datasets The training corpus is made up of: - First 1400 rows of [qikp/reborn-5k-no-thoughts](https://huggingface.co/datasets/qikp/reborn-5k-no-thoughts) - First 500 rows of [HuggingFaceTB/smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk) - First 100 rows of [HuggingFaceTB/everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k) The `train` / `train_sft` splits were used. ### Chat template The Zephyr chat template was used. ## Limitations The model frequently outputs incorrect information, confirmation with a larger, mature model is advised. ## Benchmark This model was tested against GAIA and compared using embeddings. See the results [here](https://codeberg.org/qikp/benchmarks).