Files
hummingbird-2-125m/README.md
ModelHub XC 31c91a3c71 初始化项目,由ModelHub XC社区提供模型
Model: qikp/hummingbird-2-125m
Source: Original Platform
2026-06-18 10:58:20 +08:00

46 lines
1.4 KiB
Markdown

---
license: mit
datasets:
- qikp/reborn-5k-no-thoughts
- HuggingFaceTB/smol-smoltalk
- HuggingFaceTB/everyday-conversations-llama3.1-2k
language:
- en
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
library_name: transformers
new_version: qikp/hummingbird-2.1-110m
---
# Hummingbird
🎉 You are looking at Hummingbird 2, trained on a much more efficient corpus, achieving similar performance with 3x less parameters!
Hummingbird is a GPT-2 derivative trained to be conversational.
## Training
The model was trained using the `paged_adamw_8bit` optimizer, gradient checkpointing, 500 steps, 1 batch size, and 4 gradient accumulation steps.
### Datasets
The training corpus is made up of:
- First 1400 rows of [qikp/reborn-5k-no-thoughts](https://huggingface.co/datasets/qikp/reborn-5k-no-thoughts)
- First 500 rows of [HuggingFaceTB/smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk)
- First 100 rows of [HuggingFaceTB/everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k)
The `train` / `train_sft` splits were used.
### Chat template
The Zephyr chat template was used.
## Limitations
The model frequently outputs incorrect information, confirmation with a larger, mature model is advised.
## Benchmark
This model was tested against GAIA and compared using embeddings. See the results [here](https://codeberg.org/qikp/benchmarks).