---
license: llama2
---

## Introducing GenZ Infinite

The model is a fine-tuned version of Genz-13B-v2 with a context size of 16K. Its architecture is updated to use the Lambda attention from the LM-Infinite paper, which lets the model handle sequence lengths of 120K+ without affecting perplexity.

## Generate responses

Use the `generate.py` file from the [GitHub repo](https://github.com/BudEcosystem/genz-infinite).

```bash
python generate.py --base_model budecosystem/genz-13b-infinite
```

You can integrate the model into your own code by loading it and passing it through the `convert_llama_model` function.

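```python
import torch
from transformers import GenerationConfig, AutoModelForCausalLM, AutoTokenizer
# convert_llama_model comes from the genz-infinite GitHub repo, not from transformers
from model.llama import convert_llama_model

local_branch = 2048    # size of the local (most recent tokens) attention window
global_branch = 10     # number of leading tokens that stay visible
limit_distance = 2048  # defined here but not passed to convert_llama_model below

# Load the fine-tuned weights in half precision, spread across available devices
model = AutoModelForCausalLM.from_pretrained(
    "budecosystem/genz-13b-infinite",
    torch_dtype=torch.float16,
    device_map="auto",
)
# Swap in the Lambda attention so the model can run on longer sequences
model = convert_llama_model(model, local_branch, global_branch)
```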
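
To make the `local_branch` and `global_branch` values above more concrete, here is a minimal sketch of the Λ-shaped attention pattern described in the LM-Infinite paper: every token may attend to the first `global_branch` tokens and to the most recent `local_branch` tokens. This is an illustration only, not the code inside `convert_llama_model`; the function name `lambda_attention_mask` is hypothetical, and the sketch does not model the distance limit that the paper applies to relative positions.

```python
def lambda_attention_mask(seq_len, local_branch, global_branch):
    """Return a seq_len x seq_len boolean mask.

    mask[i][j] is True when query position i may attend to key position j.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i                    # never attend to future tokens
            is_global = j < global_branch      # leading tokens stay visible
            is_local = (i - j) < local_branch  # recent tokens stay visible
            row.append(causal and (is_global or is_local))
        mask.append(row)
    return mask


# Tiny values so the pattern is easy to read; the snippet above uses 2048 and 10.
mask = lambda_attention_mask(seq_len=8, local_branch=3, global_branch=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Because each row keeps at most `local_branch + global_branch` visible positions, the number of attended tokens stays bounded however long the sequence grows.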

## Evaluation


| Task |   4096    |   5120    |   8192    |   16384   |
| :----:|:---------:| :--------:| :--------:| :--------:|
|Passkey retrieval | 100 | 75 | 48  | 30 |
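
The README does not include the evaluation script, so the following is only a minimal sketch of how a passkey retrieval check is commonly set up: a random number is hidden inside repetitive filler text and the model is asked to repeat it. The filler sentence, the prompt wording, and the helper names `build_passkey_prompt` and `is_correct` are assumptions made for illustration, not the authors' code.

```python
import random
import re

# Hypothetical filler text; any repetitive, irrelevant sentence would do.
FILLER = "The grass is green. The sky is blue. The sun is yellow."


def build_passkey_prompt(passkey, n_filler, insert_at):
    """Hide `passkey` after `insert_at` of the `n_filler` filler lines."""
    lines = [FILLER] * n_filler
    lines.insert(insert_at, f"The pass key is {passkey}. Remember it.")
    lines.append("What is the pass key? The pass key is")
    return "\n".join(lines)


def is_correct(model_output, passkey):
    """Count a completion as correct if its first number equals the passkey."""
    match = re.search(r"\d+", model_output)
    return match is not None and int(match.group()) == passkey


rng = random.Random(0)  # fixed seed so the example is reproducible
passkey = rng.randint(10000, 99999)
prompt = build_passkey_prompt(passkey, n_filler=200, insert_at=rng.randint(0, 200))
```

Raising `n_filler` lengthens the prompt, which is how such a test would be scaled to the context sizes in the table.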


## Training details

The model was trained on 4 A100 80GB GPUs for approximately 55 hours.

| Hyperparameters              | Value  |
| :----------------------------| :-----: |
| per_device_train_batch_size  | 1      |
| gradient_accumulation_steps  | 1      |
| epoch                        | 3      |
| steps                        | 8550   |
| learning_rate                | 2e-4   |
| lr scheduler type            | cosine |
| warmup steps                 | 1000   |
| optimizer                    | adamw  |
| fp16                         | True   |
| GPU                          | 4 A100 80GB |
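
As a rough illustration of how the learning-rate settings in the table interact, the sketch below computes a cosine schedule with warmup. It assumes a linear warmup from zero, a decay to zero at the final step, and that 8550 is the total number of optimizer steps; none of these details are stated in the table, and the function name `cosine_lr_with_warmup` is hypothetical.

```python
import math


def cosine_lr_with_warmup(step, base_lr=2e-4, warmup_steps=1000, total_steps=8550):
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    if step < warmup_steps:
        # Assumed linear ramp from 0 up to base_lr
        return base_lr * step / warmup_steps
    # Fraction of the decay phase completed, between 0 and 1
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the learning rate at a few points along the run
for step in (0, 500, 1000, 4775, 8550):
    print(step, f"{cosine_lr_with_warmup(step):.2e}")
```

Under these assumptions the rate peaks at 2e-4 once warmup ends at step 1000 and then falls smoothly for the remaining 7550 steps.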


### Acknowledgments

We'd like to thank the open-source community and the researchers whose foundational work laid the path to this model. Special shoutout to the authors of the [LM-Infinite paper](https://arxiv.org/abs/2308.16137) and its [GitHub repo](https://github.com/Glaciohound/LM-Infinite).