Compare commits: e06ed147e5...main (10 commits)

Commits: f3438d8750, 3410de8be0, 622539d1e9, f8f24b5480, fb5bedd551, 5a4b00ba3e, bb25b84b7b, 06ef4bdd47, 2c5c1c2834, 513a424be7

## README.md (106 lines changed)
---
license: apache-2.0
language:
- en
library_name: transformers
---
<div align="center"><img src="https://raw.githubusercontent.com/BudEcosystem/boomer/main/assets/boomer-logo.png" width=200></div>

<p align="center"><i>Democratizing access to LLMs for the open-source community.<br>Let's advance AI, together.</i></p>

----
## Introduction 🎉

We are open-sourcing one of our early experiments in pretraining with a custom architecture and datasets. This 1.1B-parameter model is pre-trained from scratch on a custom-curated dataset of 41B tokens. The architectural experiments include the addition of flash attention and a larger intermediate dimension in the MLP layer. The dataset is a combination of wiki, stories, arxiv, math, and code. The model is available on Hugging Face as [Boomer1B](https://huggingface.co/budecosystem/boomer-1b).

<div align="center"><img src="https://raw.githubusercontent.com/BudEcosystem/boomer/main/assets/boomer-arch.jpg" width=500></div>
## Getting Started on GitHub 💻

Ready to dive in? Here's how you can get started with our models on GitHub.

Install the necessary dependencies with the following command:

```bash
pip install -r requirements.txt
```
### Generate responses

With the dependencies installed, you're ready to generate responses. You can do this using our generate.py script, which pulls the model from the Hugging Face model hub and runs inference on a specified prompt. Here's an example of usage:

```bash
python generate.py --base_model 'budecosystem/boomer-1b' --prompt="the president of India is"
```
### Fine-tuning 🎯

It's time to upgrade the model by fine-tuning it. You can do this using the provided train.py script. Here's an example command:

```bash
torchrun --nproc_per_node 4 train.py \
   --base_model budecosystem/boomer-1b \
   --data_path dataset.json \
   --output_dir output \
   --per_device_train_batch_size 2 \
   --gradient_accumulation_steps 2 \
   --num_train_epochs 1 \
   --learning_rate 2e-5 \
   --fp16 True \
   --logging_steps 10 \
   --deepspeed ds_config.json
```
## Model details

| Parameters        | Value |
| :---------------- | :---: |
| n_layers          | 4     |
| n_heads           | 32    |
| d_model           | 4096  |
| vocab size        | 32000 |
| sequence length   | 4096  |
| intermediate size | 11008 |
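The table above pins down the architecture, so the advertised 1.1B parameter count can be sanity-checked. A rough sketch, assuming the standard Llama layout with separate input and output embeddings (tie_word_embeddings is false in the released config.json):

```python
# Rough parameter count for Boomer-1B from the model-details table.
d_model, n_layers, vocab, d_ff = 4096, 4, 32000, 11008

embed = vocab * d_model               # input token embeddings
attn = 4 * d_model * d_model          # Q, K, V and output projections
mlp = 3 * d_model * d_ff              # gate, up and down projections
norms = 2 * d_model                   # two RMSNorm weights per layer
per_layer = attn + mlp + norms

# n_layers blocks + final norm + untied lm_head
total = embed + n_layers * per_layer + d_model + vocab * d_model
print(f"{total / 1e9:.2f}B parameters")  # prints "1.07B parameters"
```

Most of the budget sits in the two 131M-parameter embedding matrices, since the model has only 4 transformer layers.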
### Tokenizer

We used the SentencePiece tokenizer during the fine-tuning process. This tokenizer is known for its capability to handle open-vocabulary language tasks efficiently.
### Training details

The model was trained on 4 A100 80GB GPUs for approximately 250 hours.

| Hyperparameters              | Value       |
| :--------------------------- | :---------: |
| per_device_train_batch_size  | 2           |
| gradient_accumulation_steps  | 2           |
| learning_rate                | 2e-4        |
| optimizer                    | adamw       |
| beta                         | 0.9, 0.95   |
| fp16                         | True        |
| GPU                          | 4 A100 80GB |
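The batch settings in this table combine into an effective batch per optimizer step. A quick back-of-the-envelope sketch, assuming pure data parallelism across the 4 GPUs and sequences packed to the full 4096-token context:

```python
# Effective batch size per optimizer step from the hyperparameter table.
per_device_batch = 2
grad_accum_steps = 2
num_gpus = 4
seq_len = 4096

sequences_per_step = per_device_batch * grad_accum_steps * num_gpus
tokens_per_step = sequences_per_step * seq_len
print(sequences_per_step, tokens_per_step)  # prints "16 65536"
```

So each optimizer step sees 16 sequences, or 64K tokens under the packing assumption.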
## Evaluations

We have evaluated the pre-trained model on a few of the standard benchmarks:

| Model Name | ARC   | MMLU  | Human Eval | Hellaswag | BBH   | DROP | GSM8K |
| :--------: | :---: | :---: | :--------: | :-------: | :---: | :--: | :---: |
| Boomer1B   | 22.35 | 25.92 | 6.1        | 31.66     | 28.65 | 6.13 | 1.5   |
### Why use BOOMER?

- Retrieval augmentation
- Inference at the edge
- Language modeling use cases

### Final thought on Boomer!

This isn't the end. It's just the beginning of a journey towards creating more advanced, more efficient, and more accessible language models. We invite you to join us on this exciting journey.
### Acknowledgements

We'd like to thank the open-source community and the researchers whose foundational work laid the path for BOOMER. A special shoutout to our dedicated team, who have worked relentlessly to curate the dataset and fine-tune the model to perfection.
## config.json (new file, 26 lines)

```json
{
  "_name_or_path": "boomer-1b",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 4,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.33.1",
  "use_cache": false,
  "vocab_size": 32000
}
```
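This config restates the numbers from the README's model-details table (hidden_size is d_model, num_hidden_layers is n_layers, and so on). A small sketch that parses an abridged copy of the file and cross-checks them:

```python
import json

# Abridged copy of config.json (only the fields we cross-check).
config = json.loads("""{
  "hidden_size": 4096,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "num_attention_heads": 32,
  "num_hidden_layers": 4,
  "vocab_size": 32000
}""")

# Cross-check against the README's model-details table.
assert config["num_hidden_layers"] == 4          # n_layers
assert config["num_attention_heads"] == 32       # n_heads
assert config["hidden_size"] == 4096             # d_model
assert config["vocab_size"] == 32000             # vocab size
assert config["max_position_embeddings"] == 4096 # sequence length
assert config["intermediate_size"] == 11008      # intermediate size
print("config matches the README table")
```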
## generation_config.json (new file, 6 lines)

```json
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.33.1"
}
```
## pytorch_model.bin (new file, Git LFS pointer)

```
version https://git-lfs.github.com/spec/v1
oid sha256:0c1902cda297164be733c20efccad4b0cc5fa8195ae526dd3f4741f4e1a598dd
size 2143375334
```
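The blob above is a Git LFS pointer, not the checkpoint itself; `git lfs` fetches the real weights using the oid and size. A small sketch that parses the pointer format and checks that the size is consistent with a ~1.07B-parameter fp16 model (2 bytes per weight):

```python
# Parse a git-lfs pointer file: key/value pairs, one per line.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0c1902cda297164be733c20efccad4b0cc5fa8195ae526dd3f4741f4e1a598dd
size 2143375334"""

fields = dict(line.split(" ", 1) for line in pointer.splitlines())
size_bytes = int(fields["size"])

# fp16 stores 2 bytes per parameter, so the file size implies roughly
# size/2 parameters (plus a little serialization overhead).
approx_params = size_bytes // 2
print(f"~{approx_params / 1e9:.2f}B parameters")  # prints "~1.07B parameters"
```

This agrees with the parameter count implied by the architecture table in the README.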
## special_tokens_map.json (new file, 24 lines)

```json
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<unk>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
```
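One detail worth noting in the map above: `pad_token` is a bare string rather than an `AddedToken`-style object, and it reuses `<unk>`. A small sketch that parses an abridged copy of the file and makes the check explicit:

```python
import json

# Abridged copy of special_tokens_map.json (pad and unk entries only).
special_tokens_map = json.loads("""{
  "pad_token": "<unk>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}""")

# pad_token is a plain string that matches the unk token's content,
# i.e. padding reuses the unknown token.
assert special_tokens_map["pad_token"] == special_tokens_map["unk_token"]["content"]
print("padding reuses <unk>")
```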
## tokenizer.model (new file, Git LFS pointer)

```
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
```
## tokenizer_config.json (new file, 37 lines)

```json
{
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "padding_side": "right",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "use_default_system_prompt": true
}
```