Initialize project; model provided by the ModelHub XC community

Model: budecosystem/genz-13b-infinite Source: Original Platform
2026-05-11 02:19:22 +08:00
commit 6159b6ee64
12 changed files with 634 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,67 @@
+---
+license: llama2
+---
+
+## Introducing GenZ Infinite
+
+The model is a fine-tuned version of Genz-13B-v2 with a 16K context size. Its architecture is updated to use the Λ-shaped ("lambda") attention from the LM-Infinite paper, which lets the model handle sequences of 120K+ tokens without degrading perplexity.
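To illustrate what the Λ-shaped attention changes, here is a minimal pure-Python sketch of the mask as described in the LM-Infinite paper, not the code used in this repository: each query attends causally to the first `global_branch` tokens plus its `local_branch` most recent tokens. The helper name `lambda_mask` is hypothetical, and the toy sizes are chosen only to keep the printed mask readable.

```python
def lambda_mask(seq_len: int, local_branch: int, global_branch: int) -> list[list[bool]]:
    """Hypothetical helper: mask[i][j] is True when query i may attend to key j.

    Sketch of a Lambda-shaped mask: causal attention restricted to the first
    `global_branch` tokens plus the `local_branch` most recent tokens
    (the query's own position counts as one of the recent tokens).
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i                  # never look at future tokens
            in_global = j < global_branch    # leading tokens are always visible
            in_local = i - j < local_branch  # sliding window of recent tokens
            row.append(causal and (in_global or in_local))
        mask.append(row)
    return mask


if __name__ == "__main__":
    m = lambda_mask(seq_len=8, local_branch=3, global_branch=2)
    for row in m:
        # "x" marks a visible key, "." a masked one
        print("".join("x" if ok else "." for ok in row))
```

Each row keeps at most `global_branch + local_branch` visible keys, so the attention cost per token stays bounded no matter how long the sequence grows.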
+
+## Generate responses
+
+Use the `generate.py` script from the [GitHub repo](https://github.com/BudEcosystem/genz-infinite):
+
+```bash
+python generate.py --base_model budecosystem/genz-13b-infinite
+```
+
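+You can integrate the model into your own code by loading it with Transformers and then applying the `convert_llama_model` function. The import below takes it from `model.llama`, so run the code from a checkout of the GitHub repo above.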
+
+```python
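+import torch
+from transformers import GenerationConfig, AutoModelForCausalLM, AutoTokenizer
+# convert_llama_model comes from the genz-infinite repo, not from transformers
+from model.llama import convert_llama_model
+
+local_branch = 2048    # size of the local (recent-token) attention window
+global_branch = 10     # number of leading tokens every position can attend to
+limit_distance = 2048  # cap on the effective relative distance (not passed below)
+
+model = AutoModelForCausalLM.from_pretrained(
+    "budecosystem/genz-13b-infinite",
+    torch_dtype=torch.float16,
+    device_map="auto",
+)
+tokenizer = AutoTokenizer.from_pretrained("budecosystem/genz-13b-infinite")
+# Replace the standard attention with LM-Infinite's Lambda-shaped attention
+model = convert_llama_model(model, local_branch, global_branch)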
+```
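The three parameters in the snippet can be read as follows, assuming they follow the LM-Infinite paper: the first `global_branch` tokens stay visible to every query, `local_branch` is the recent-token window, and `limit_distance` caps the relative distance used for the rotary position encoding, so no visible key is ever treated as farther away than that cap. The sketch below is a hypothetical pure-Python illustration of that bookkeeping (the helper `visible_keys` is not part of the repo); note that the snippet above does not actually pass `limit_distance` to `convert_llama_model`.

```python
def visible_keys(query_pos: int, local_branch: int, global_branch: int,
                 limit_distance: int) -> dict[int, int]:
    """Hypothetical helper: map each key position a query can see to the
    (clipped) relative distance used for its positional encoding."""
    visible: dict[int, int] = {}
    # Global branch: the first `global_branch` tokens (only those already seen).
    for key_pos in range(min(global_branch, query_pos + 1)):
        visible[key_pos] = min(query_pos - key_pos, limit_distance)
    # Local branch: the `local_branch` most recent tokens, including the query.
    for key_pos in range(max(0, query_pos - local_branch + 1), query_pos + 1):
        visible[key_pos] = min(query_pos - key_pos, limit_distance)
    return visible


if __name__ == "__main__":
    # Values from the README snippet; the query sits far beyond the 16K training context.
    keys = visible_keys(100_000, local_branch=2048, global_branch=10, limit_distance=2048)
    print(len(keys))        # 2058 (10 global + 2048 local keys)
    print(keys[0])          # 2048: distance to the first token is clipped
    print(keys[100_000])    # 0: the query attends to itself
```

Even at position 100,000 a query sees only 2,058 keys, and none of them at a distance beyond 2,048, which is the intuition behind running far past the training length.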
+
+## Evaluation
+
+
+| Task | 4096 tokens | 5120 tokens | 8192 tokens | 16384 tokens |
+| :----: | :---------: | :---------: | :---------: | :---------: |
+| Passkey retrieval | 100 | 75 | 48 | 30 |
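Passkey retrieval buries a random number inside long filler text and asks the model to repeat it; the column headers are the prompt lengths tested. The README does not give the exact prompt or scoring rule, so the following is a hypothetical sketch in the style commonly used for this test (filler sentences, one passkey line, a closing question). `build_passkey_prompt`, `is_correct`, and all the wording are assumptions; in practice the filler count would be tuned until the tokenized prompt reaches the target length.

```python
import random


def build_passkey_prompt(passkey: int, n_filler: int, insert_at: int) -> str:
    """Hypothetical passkey-retrieval prompt: filler text with one passkey line."""
    intro = ("There is an important piece of information hidden inside a lot of "
             "irrelevant text. Find it and memorize it.")
    filler = "The grass is green. The sky is blue. The sun is yellow."
    needle = f"The pass key is {passkey}. Remember it. {passkey} is the pass key."
    question = "What is the pass key? The pass key is"

    lines = [filler] * n_filler
    lines.insert(insert_at, needle)  # bury the passkey at the chosen position
    return "\n".join([intro, *lines, question])


def is_correct(completion: str, passkey: int) -> bool:
    """Assumed scoring rule: a trial passes if the passkey appears in the completion."""
    return str(passkey) in completion


if __name__ == "__main__":
    rng = random.Random(0)
    key = rng.randint(10_000, 99_999)
    prompt = build_passkey_prompt(key, n_filler=200, insert_at=rng.randint(0, 200))
    print(prompt[-40:])
```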
+
+
+## Training details
+
+The model was trained on 4 × A100 80GB GPUs for approximately 55 hours.
+
+| Hyperparameters             | Value         |
+| :-------------------------- | :-----------: |
+| per_device_train_batch_size | 1             |
+| gradient_accumulation_steps | 1             |
+| epochs                      | 3             |
+| steps                       | 8550          |
+| learning_rate               | 2e-4          |
+| lr scheduler type           | cosine        |
+| warmup steps                | 1000          |
+| optimizer                   | adamw         |
+| fp16                        | True          |
+| GPUs                        | 4 × A100 80GB |
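As a rough picture of the optimization schedule in the table, the sketch below computes the learning rate at each step for a linear warmup over the first 1,000 steps followed by cosine decay to zero over the remaining 7,550 steps. This is the shape of the common "cosine with warmup" schedule (for example `get_cosine_schedule_with_warmup` in Transformers); that the training run used exactly this variant is an assumption, as is the effective-batch calculation, which presumes plain data parallelism across the 4 GPUs.

```python
import math

BASE_LR = 2e-4
WARMUP_STEPS = 1000
TOTAL_STEPS = 8550
EPOCHS = 3


def lr_at_step(step: int) -> float:
    """Linear warmup to BASE_LR, then half-cosine decay to 0 (assumed variant)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Assumes data parallelism: every GPU processes its own batch each step.
per_device_batch, grad_accum, n_gpus = 1, 1, 4
effective_batch = per_device_batch * grad_accum * n_gpus
steps_per_epoch = TOTAL_STEPS // EPOCHS

if __name__ == "__main__":
    for s in (0, 500, 1000, 4775, 8550):
        print(s, f"{lr_at_step(s):.2e}")
    print("effective batch:", effective_batch, "| steps per epoch:", steps_per_epoch)
```

Under these assumptions each optimizer step sees 4 sequences, and each of the 3 epochs spans 2,850 steps.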
+
+
+## Acknowledgments
+
+We'd like to thank the open-source community and the researchers whose foundational work paved the way for this model. Special thanks to the authors of the [LM-Infinite paper](https://arxiv.org/abs/2308.16137) and its [GitHub repo](https://github.com/Glaciohound/LM-Infinite).
+