---
license: apache-2.0
language:
- en
library_name: transformers
---

<div align="center"><img src="https://github.com/BudEcosystem/boomer/blob/main/assets/boomer-logo.png" width=200></div>

<p align="center"><i>Democratizing access to LLMs for the open-source community.<br>Let's advance AI, together. </i></p>

----

## Introduction 🎉
We are open-sourcing one of our early experiments in pretraining with a custom architecture and custom datasets. This 1.1B-parameter model is pre-trained from scratch on a custom-curated dataset of 41B tokens. The architectural experiments include the addition of Flash Attention and a larger intermediate dimension in the MLP layer. The dataset is a combination of wiki, stories, arXiv, math and code. The model is available on Hugging Face as [Boomer1B](https://huggingface.co/budecosystem/boomer-1b).

<div align="center"><img src="https://github.com/BudEcosystem/boomer/blob/main/assets/boomer-arch.jpg" width=500></div>

## Getting Started on GitHub 💻
Ready to dive in? Here's how you can get started with our models on GitHub.

Install the necessary dependencies with the following command:

```bash
pip install -r requirements.txt
```

### Generate responses
You can generate responses using the generate.py script, which loads the model from the Hugging Face model hub and runs inference on a specified prompt. Here's an example of usage:

```bash
python generate.py --base_model 'budecosystem/boomer-1b' --prompt="the president of India is"
```
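If you'd rather call the model directly instead of using generate.py, the same kind of inference can be done with the 🤗 Transformers API. The snippet below is a minimal sketch (it assumes a standard causal-LM checkpoint, and the sampling settings are illustrative), not the exact logic of generate.py:

```python
# Minimal sketch: load Boomer-1B from the Hugging Face Hub and generate text.
# Assumes a standard causal-LM checkpoint; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "budecosystem/boomer-1b"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

inputs = tokenizer("the president of India is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```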
### Fine-tuning 🎯

You can improve the model further by fine-tuning it on your own data using the provided training script. Here's an example command:

```bash
torchrun --nproc_per_node 4 train.py \
   --base_model budecosystem/boomer-1b \
   --data_path dataset.json \
   --output_dir output \
   --per_device_train_batch_size 2 \
   --gradient_accumulation_steps 2 \
   --num_train_epochs 1 \
   --learning_rate 2e-5 \
   --fp16 True \
   --logging_steps 10 \
   --deepspeed ds_config.json
```
## Model details

| Parameters        | Value |
| :---------------- | :---: |
| n_layers          | 4     |
| n_heads           | 32    |
| d_model           | 4096  |
| vocab size        | 32000 |
| sequence length   | 4096  |
| Intermediate size | 11008 |
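As a quick sanity check, the values in this table can be read back from the published config. This is a sketch only; the attribute names below assume a LLaMA-style configuration and may differ for the custom architecture:

```python
# Sketch: read the published config and compare it with the table above.
# Attribute names assume a LLaMA-style config; not confirmed by this README.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("budecosystem/boomer-1b")
print("layers        :", config.num_hidden_layers)        # expected 4
print("heads         :", config.num_attention_heads)      # expected 32
print("hidden size   :", config.hidden_size)              # expected 4096
print("vocab size    :", config.vocab_size)               # expected 32000
print("max positions :", config.max_position_embeddings)  # expected 4096
print("intermediate  :", config.intermediate_size)        # expected 11008
```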
### Tokenizer

We used the SentencePiece tokenizer during the fine-tuning process. This tokenizer is known for its ability to handle open-vocabulary language tasks efficiently.
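A short sketch of loading the tokenizer from the Hub checkpoint and round-tripping a sentence (assuming the tokenizer files ship with the model):

```python
# Sketch: load the SentencePiece-based tokenizer and round-trip a sentence.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("budecosystem/boomer-1b")
ids = tokenizer.encode("Democratizing access to LLMs")
print(ids)                    # token ids drawn from the 32000-entry vocabulary
print(tokenizer.decode(ids))  # reconstructs the original text
```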
### Training details

The model was trained on 4 A100 80GB GPUs for approximately 250 hours.

| Hyperparameters              | Value       |
| :--------------------------- | :---------: |
| per_device_train_batch_size  | 2           |
| gradient_accumulation_steps  | 2           |
| learning_rate                | 2e-4        |
| optimizer                    | adamw       |
| beta                         | 0.9, 0.95   |
| fp16                         | True        |
| GPU                          | 4 A100 80GB |
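For reference, the table above maps onto a standard Hugging Face `TrainingArguments` setup roughly as sketched below; this mirrors the listed values and is not necessarily the exact configuration used for training:

```python
# Sketch: the hyperparameters above expressed as Hugging Face TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    optim="adamw_torch",  # adamw optimizer
    adam_beta1=0.9,
    adam_beta2=0.95,
    fp16=True,
    logging_steps=10,
)
```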
## Evaluations

We have evaluated the pre-trained model on a few benchmarks:

| Model Name | ARC   | MMLU  | Human Eval | Hellaswag | BBH   | DROP | GSM8K |
| :--------: | :---: | :---: | :--------: | :-------: | :---: | :--: | :---: |
| Boomer1B   | 22.35 | 25.92 | 6.1        | 31.66     | 28.65 | 6.13 | 1.5   |
### Why use BOOMER?

- Retrieval augmentation
- Inference at the edge
- Language modeling use cases
### Final thought on Boomer!

This isn't the end. It's just the beginning of a journey towards creating more advanced, more efficient, and more accessible language models. We invite you to join us on this exciting journey.
### Acknowledgements

We'd like to thank the open-source community and the researchers whose foundational work laid the path for BOOMER. Special shoutout to our dedicated team, who have worked relentlessly to curate the dataset and fine-tune the model to perfection.