diff --git a/README.md b/README.md
index 7b92f94..5e6e81b 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,7 @@ language:
 - en
 library_name: transformers
 ---
+
@@ -15,7 +16,7 @@ library_name: transformers
 We are open-sourcing one of our early experiments of pretraining with custom architecture and datasets. This 1.1B parameter model is pre-trained from scratch using a custom-curated dataset of 41B tokens. The model's architecture experiments contain the addition of flash attention and a higher intermediate dimension of the MLP layer. The dataset is a combination of wiki, stories, arxiv, math and code. The model is available on huggingface [Boomer1B](https://huggingface.co/budecosystem/boomer-1b)
-
+
 ## Getting Started on GitHub 💻