From bb25b84b7bc7bae55f38da2b2e6715e47a44913b Mon Sep 17 00:00:00 2001
From: Jeevan Joy
Date: Mon, 9 Oct 2023 14:53:29 +0000
Subject: [PATCH] Update README.md

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7b92f94..5e6e81b 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,7 @@ language:
 - en
 library_name: transformers
 ---
+
@@ -15,7 +16,7 @@ library_name: transformers
 We are open-sourcing one of our early experiments of pretraining with custom architecture and datasets. This 1.1B parameter model is pre-trained from scratch using a custom-curated dataset of 41B tokens. The model's architecture experiments contain the addition of flash attention and a higher intermediate dimension of the MLP layer. The dataset is a combination of wiki, stories, arxiv, math and code. The model is available on huggingface [Boomer1B](https://huggingface.co/budecosystem/boomer-1b)
-
+
## Getting Started on GitHub 💻