Compare commits: e06ed147e5...main (10 commits)

Commits: f3438d8750, 3410de8be0, 622539d1e9, f8f24b5480, fb5bedd551, 5a4b00ba3e, bb25b84b7b, 06ef4bdd47, 2c5c1c2834, 513a424be7

## README.md (106 lines changed)
---
license: apache-2.0
language:
- en
library_name: transformers
---
<div align="center"><img src="https://raw.githubusercontent.com/BudEcosystem/boomer/main/assets/boomer-logo.png" width=200></div>

<p align="center"><i>Democratizing access to LLMs for the open-source community.<br>Let's advance AI, together.</i></p>

----
## Introduction 🎉

We are open-sourcing one of our early experiments in pretraining with a custom architecture and datasets. This 1.1B-parameter model is pre-trained from scratch on a custom-curated dataset of 41B tokens. The architectural experiments include the addition of flash attention and a larger intermediate dimension in the MLP layer. The dataset is a combination of wiki, stories, arxiv, math, and code. The model is available on Hugging Face as [Boomer1B](https://huggingface.co/budecosystem/boomer-1b).

<div align="center"><img src="https://raw.githubusercontent.com/BudEcosystem/boomer/main/assets/boomer-arch.jpg" width=500></div>
## Getting Started on GitHub 💻

Ready to dive in? Here's how you can get started with our models on GitHub.

Install the necessary dependencies with the following command:

```bash
pip install -r requirements.txt
```
### Generate responses

With the dependencies installed, you're ready to generate responses. You can do this using our generate.py script, which pulls the model from the Hugging Face model hub and runs inference on a specified prompt. Here's an example of usage:

```bash
python generate.py --base_model 'budecosystem/boomer-1b' --prompt="the president of India is"
```
### Fine-tuning 🎯

It's time to upgrade the model by fine-tuning it. You can do this using the provided train.py script. Here's an example command:

```bash
torchrun --nproc_per_node 4 train.py \
   --base_model budecosystem/boomer-1b \
   --data_path dataset.json \
   --output_dir output \
   --per_device_train_batch_size 2 \
   --gradient_accumulation_steps 2 \
   --num_train_epochs 1 \
   --learning_rate 2e-5 \
   --fp16 True \
   --logging_steps 10 \
   --deepspeed ds_config.json
```
## Model details

| Parameters        | Value |
| :---------------- | :---: |
| n_layers          | 4     |
| n_heads           | 32    |
| d_model           | 4096  |
| vocab size        | 32000 |
| sequence length   | 4096  |
| intermediate size | 11008 |
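The table above pins down the architecture, so the advertised 1.1B parameter count can be sanity-checked. A rough sketch, assuming the standard Llama layout with separate input and output embeddings (tie_word_embeddings is false in the released config.json):

```python
# Rough parameter count for Boomer-1B from the model-details table.
d_model, n_layers, vocab, d_ff = 4096, 4, 32000, 11008

embed = vocab * d_model               # input token embeddings
attn = 4 * d_model * d_model          # Q, K, V and output projections
mlp = 3 * d_model * d_ff              # gate, up and down projections
norms = 2 * d_model                   # two RMSNorm weights per layer
per_layer = attn + mlp + norms

# n_layers blocks + final norm + untied lm_head
total = embed + n_layers * per_layer + d_model + vocab * d_model
print(f"{total / 1e9:.2f}B parameters")  # prints "1.07B parameters"
```

Most of the budget sits in the two 131M-parameter embedding matrices, since the model has only 4 transformer layers.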
### Tokenizer

We used the SentencePiece tokenizer during the fine-tuning process. This tokenizer is known for its capability to handle open-vocabulary language tasks efficiently.
### Training details

The model was trained on 4 A100 80GB GPUs for approximately 250 hours.

| Hyperparameters              | Value       |
| :--------------------------- | :---------: |
| per_device_train_batch_size  | 2           |
| gradient_accumulation_steps  | 2           |
| learning_rate                | 2e-4        |
| optimizer                    | adamw       |
| beta                         | 0.9, 0.95   |
| fp16                         | True        |
| GPU                          | 4 A100 80GB |
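The batch settings in this table combine into an effective batch per optimizer step. A quick back-of-the-envelope sketch, assuming pure data parallelism across the 4 GPUs and sequences packed to the full 4096-token context:

```python
# Effective batch size per optimizer step from the hyperparameter table.
per_device_batch = 2
grad_accum_steps = 2
num_gpus = 4
seq_len = 4096

sequences_per_step = per_device_batch * grad_accum_steps * num_gpus
tokens_per_step = sequences_per_step * seq_len
print(sequences_per_step, tokens_per_step)  # prints "16 65536"
```

So each optimizer step sees 16 sequences, or 64K tokens under the packing assumption.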
## Evaluations

We have evaluated the pre-trained model on a few of the standard benchmarks:

| Model Name | ARC   | MMLU  | Human Eval | Hellaswag | BBH   | DROP | GSM8K |
| :--------: | :---: | :---: | :--------: | :-------: | :---: | :--: | :---: |
| Boomer1B   | 22.35 | 25.92 | 6.1        | 31.66     | 28.65 | 6.13 | 1.5   |
### Why use BOOMER?

- Retrieval augmentation
- Inference at the edge
- Language modeling use cases

### Final thought on Boomer!

This isn't the end. It's just the beginning of a journey towards creating more advanced, more efficient, and more accessible language models. We invite you to join us on this exciting journey.
### Acknowledgements

We'd like to thank the open-source community and the researchers whose foundational work laid the path for BOOMER. A special shoutout to our dedicated team, who have worked relentlessly to curate the dataset and fine-tune the model to perfection.
## config.json (new file, 26 lines)

```json
{
  "_name_or_path": "boomer-1b",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 4,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.33.1",
  "use_cache": false,
  "vocab_size": 32000
}
```
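This config restates the numbers from the README's model-details table (hidden_size is d_model, num_hidden_layers is n_layers, and so on). A small sketch that parses an abridged copy of the file and cross-checks them:

```python
import json

# Abridged copy of config.json (only the fields we cross-check).
config = json.loads("""{
  "hidden_size": 4096,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "num_attention_heads": 32,
  "num_hidden_layers": 4,
  "vocab_size": 32000
}""")

# Cross-check against the README's model-details table.
assert config["num_hidden_layers"] == 4          # n_layers
assert config["num_attention_heads"] == 32       # n_heads
assert config["hidden_size"] == 4096             # d_model
assert config["vocab_size"] == 32000             # vocab size
assert config["max_position_embeddings"] == 4096 # sequence length
assert config["intermediate_size"] == 11008      # intermediate size
print("config matches the README table")
```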
## generation_config.json (new file, 6 lines)

```json
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.33.1"
}
```
## pytorch_model.bin (new file, Git LFS pointer)

```
version https://git-lfs.github.com/spec/v1
oid sha256:0c1902cda297164be733c20efccad4b0cc5fa8195ae526dd3f4741f4e1a598dd
size 2143375334
```
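The blob above is a Git LFS pointer, not the checkpoint itself; `git lfs` fetches the real weights using the oid and size. A small sketch that parses the pointer format and checks that the size is consistent with a ~1.07B-parameter fp16 model (2 bytes per weight):

```python
# Parse a git-lfs pointer file: key/value pairs, one per line.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0c1902cda297164be733c20efccad4b0cc5fa8195ae526dd3f4741f4e1a598dd
size 2143375334"""

fields = dict(line.split(" ", 1) for line in pointer.splitlines())
size_bytes = int(fields["size"])

# fp16 stores 2 bytes per parameter, so the file size implies roughly
# size/2 parameters (plus a little serialization overhead).
approx_params = size_bytes // 2
print(f"~{approx_params / 1e9:.2f}B parameters")  # prints "~1.07B parameters"
```

This agrees with the parameter count implied by the architecture table in the README.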
## special_tokens_map.json (new file, 24 lines)

```json
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<unk>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
```
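One detail worth noting in the map above: `pad_token` is a bare string rather than an `AddedToken`-style object, and it reuses `<unk>`. A small sketch that parses an abridged copy of the file and makes the check explicit:

```python
import json

# Abridged copy of special_tokens_map.json (pad and unk entries only).
special_tokens_map = json.loads("""{
  "pad_token": "<unk>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}""")

# pad_token is a plain string that matches the unk token's content,
# i.e. padding reuses the unknown token.
assert special_tokens_map["pad_token"] == special_tokens_map["unk_token"]["content"]
print("padding reuses <unk>")
```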
## tokenizer.model (new file, Git LFS pointer)

```
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
```
## tokenizer_config.json (new file, 37 lines)

```json
{
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "padding_side": "right",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "use_default_system_prompt": true
}
```