Initialize project; model provided by the ModelHub XC community
Model: budecosystem/genz-13b-infinite Source: Original Platform
---
license: llama2
---

## Introducing GenZ Infinite

The model is a fine-tuned version of GenZ-13B-v2 with a context size of 16K. Its architecture is updated to use the Λ-shaped (lambda) attention from the LM-Infinite paper, which lets the model handle sequences of 120K+ tokens without degrading perplexity.
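
To make the Λ-shaped attention more concrete, here is a minimal sketch of the attention mask it implies. It assumes that `global_branch` is the number of tokens at the start of the sequence that stay visible and that `local_branch` is the size of the window of most recent tokens, both under the usual causal constraint. The function name `lambda_attention_mask` is hypothetical, and the sketch leaves out the distance limit that LM-Infinite also applies to relative positions; it is an illustration, not the code this model runs.

```python
def lambda_attention_mask(seq_len, local_branch, global_branch):
    """Return a seq_len x seq_len mask; mask[q][k] is True if query q may attend to key k."""
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q                  # never attend to future tokens
            is_global = k < global_branch    # tokens at the start of the sequence
            is_local = q - k < local_branch  # the most recent tokens
            row.append(causal and (is_global or is_local))
        mask.append(row)
    return mask


# Print a small mask: "x" marks an allowed (query, key) pair.
for row in lambda_attention_mask(seq_len=8, local_branch=3, global_branch=2):
    print("".join("x" if allowed else "." for allowed in row))
```

Under this sketch, `local_branch = 2048` and `global_branch = 10` would let each token attend to at most 2058 keys, however long the sequence grows.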

## Generate responses

Use the `generate.py` file from the [GitHub repo](https://github.com/BudEcosystem/genz-infinite).

```
python generate.py --base_model budecosystem/genz-13b-infinite
```

You can integrate the model into your code by loading it and applying the `convert_llama_model` function.

```python
import torch
from transformers import GenerationConfig, AutoModelForCausalLM, AutoTokenizer

# Assumption: this module is provided by the genz-infinite repo linked above,
# so that repo must be on the Python path.
from model.llama import convert_llama_model

local_branch = 2048
global_branch = 10
limit_distance = 2048  # defined here but not passed to convert_llama_model below

# Load the checkpoint in half precision; device_map="auto" places it on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    "budecosystem/genz-13b-infinite",
    torch_dtype=torch.float16,
    device_map="auto",
)
# Convert the loaded model using the local and global branch sizes.
model = convert_llama_model(model, local_branch, global_branch)
```

## Evaluation

| Task | 4096 | 5120 | 8192 | 16384 |
| :----: | :----: | :----: | :----: | :----: |
| Passkey retrieval | 100 | 75 | 48 | 30 |
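
Passkey retrieval tests whether a model can recover a short random number hidden inside a long stretch of filler text. The README does not say how the prompts were built, so the following is only a sketch of one common way to set up such a test. The function names `build_passkey_prompt` and `is_correct`, the filler sentences, and the wording of the question are all assumptions, not the evaluation code behind the table above.

```python
import random


def build_passkey_prompt(n_filler, passkey, position,
                         filler="The grass is green. The sky is blue. "):
    """Hide `passkey` after `position` of the `n_filler` filler repetitions."""
    if not 0 <= position <= n_filler:
        raise ValueError("position must be between 0 and n_filler")
    needle = f"The pass key is {passkey}. Remember it. "
    question = "What is the pass key?"
    return filler * position + needle + filler * (n_filler - position) + question


def is_correct(model_output, passkey):
    """Count a response as correct if it contains the passkey digits."""
    return str(passkey) in model_output


rng = random.Random(0)  # fixed seed so the example is reproducible
passkey = rng.randint(10000, 99999)
prompt = build_passkey_prompt(n_filler=200, passkey=passkey,
                              position=rng.randint(0, 200))
print(len(prompt.split()), "words in prompt")
```

Assuming the column headers are prompt lengths in tokens, one would grow `n_filler` until the tokenized prompt reaches each length and then report the share of prompts for which `is_correct` holds.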

## Training details

The model was trained on 4 A100 80GB GPUs for approximately 55 hours.

| Hyperparameters | Value |
| :----------------------------| :-----: |
| per_device_train_batch_size | 1 |
| gradient_accumulation_steps | 1 |
| epoch | 3 |
| steps | 8550 |
| learning_rate | 2e-4 |
| lr scheduler type | cosine |
| warmup steps | 1000 |
| optimizer | adamw |
| fp16 | True |
| GPU | 4 A100 80GB |
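
To illustrate what the schedule settings imply, here is a small sketch of a cosine learning-rate schedule with warmup, using the values from the table (peak `learning_rate` of 2e-4, 1000 warmup steps, 8550 total steps). It assumes a linear warmup from zero followed by a half-cosine decay to zero, a shape used by many trainers; the README does not specify these details, and the function name `lr_at_step` is hypothetical.

```python
import math

PEAK_LR = 2e-4       # learning_rate from the table
WARMUP_STEPS = 1000  # warmup steps from the table
TOTAL_STEPS = 8550   # steps from the table


def lr_at_step(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS,
               total_steps=TOTAL_STEPS):
    """Learning rate at `step`, assuming linear warmup then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))


for step in (0, 500, 1000, 4775, 8550):
    print(f"step {step:>5}: lr = {lr_at_step(step):.2e}")
```

Under these assumptions the rate peaks at 2e-4 at step 1000 and has fallen to half of that by step 4775, the midpoint of the decay.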

### Acknowledgments

We'd like to thank the open-source community and the researchers whose foundational work laid the path to this model. Special shoutout to the authors of the [LM-Infinite paper](https://arxiv.org/abs/2308.16137) and its [GitHub repo](https://github.com/Glaciohound/LM-Infinite).