commit 107aebaca1fae96e43a507f68a3b3b8030ced6e1 Author: ModelHub XC Date: Thu Jun 4 09:44:16 2026 +0800 Initialize project; model provided by the ModelHub XC community Model: RichardErkhov/Yukang_-_LongAlpaca-13B-gguf Source: Original Platform diff --git a/.gitattributes b/.gitattributes new file mode 100644 index 0000000..d4f1bbd --- /dev/null +++ b/.gitattributes @@ -0,0 +1,57 @@ +*.7z filter=lfs diff=lfs merge=lfs -text +*.arrow filter=lfs diff=lfs merge=lfs -text +*.bin filter=lfs diff=lfs merge=lfs -text +*.bz2 filter=lfs diff=lfs merge=lfs -text +*.ckpt filter=lfs diff=lfs merge=lfs -text +*.ftz filter=lfs diff=lfs merge=lfs -text +*.gz filter=lfs diff=lfs merge=lfs -text +*.h5 filter=lfs diff=lfs merge=lfs -text +*.joblib filter=lfs diff=lfs merge=lfs -text +*.lfs.* filter=lfs diff=lfs merge=lfs -text +*.mlmodel filter=lfs diff=lfs merge=lfs -text +*.model filter=lfs diff=lfs merge=lfs -text +*.msgpack filter=lfs diff=lfs merge=lfs -text +*.npy filter=lfs diff=lfs merge=lfs -text +*.npz filter=lfs diff=lfs merge=lfs -text +*.onnx filter=lfs diff=lfs merge=lfs -text +*.ot filter=lfs diff=lfs merge=lfs -text +*.parquet filter=lfs diff=lfs merge=lfs -text +*.pb filter=lfs diff=lfs merge=lfs -text +*.pickle filter=lfs diff=lfs merge=lfs -text +*.pkl filter=lfs diff=lfs merge=lfs -text +*.pt filter=lfs diff=lfs merge=lfs -text +*.pth filter=lfs diff=lfs merge=lfs -text +*.rar filter=lfs diff=lfs merge=lfs -text +*.safetensors filter=lfs diff=lfs merge=lfs -text +saved_model/**/* filter=lfs diff=lfs merge=lfs -text +*.tar.* filter=lfs diff=lfs merge=lfs -text +*.tar filter=lfs diff=lfs merge=lfs -text +*.tflite filter=lfs diff=lfs merge=lfs -text +*.tgz filter=lfs diff=lfs merge=lfs -text +*.wasm filter=lfs diff=lfs merge=lfs -text +*.xz filter=lfs diff=lfs merge=lfs -text +*.zip filter=lfs diff=lfs merge=lfs -text +*.zst filter=lfs diff=lfs merge=lfs -text +*tfevents* filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q2_K.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.IQ3_XS.gguf 
filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.IQ3_S.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q3_K.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.IQ4_NL.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q4_K.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q4_1.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q5_0.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q5_K.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q5_1.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q6_K.gguf filter=lfs diff=lfs merge=lfs -text +LongAlpaca-13B.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text diff --git a/LongAlpaca-13B.IQ3_M.gguf b/LongAlpaca-13B.IQ3_M.gguf new file mode 100644 index 0000000..e13d658 --- /dev/null +++ b/LongAlpaca-13B.IQ3_M.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51d45223ee8a6564cae9856d29c8cd542b3a0c156e1e9f3154be5add650b766c +size 5984516768 diff --git a/LongAlpaca-13B.IQ3_S.gguf b/LongAlpaca-13B.IQ3_S.gguf new file mode 100644 index 0000000..2bde556 --- /dev/null +++ b/LongAlpaca-13B.IQ3_S.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b79b706bee2ab611097ae8d897cc3ae8f56a7155cd64658efc3fc5f91bb33b97 +size 5658987168 diff --git a/LongAlpaca-13B.IQ3_XS.gguf b/LongAlpaca-13B.IQ3_XS.gguf new file 
mode 100644 index 0000000..8047045 --- /dev/null +++ b/LongAlpaca-13B.IQ3_XS.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4e0d46228e3a0ce3a596080be74c3400e04d8fe2144bea4ba6f34736b7af8655 +size 5361617568 diff --git a/LongAlpaca-13B.IQ4_NL.gguf b/LongAlpaca-13B.IQ4_NL.gguf new file mode 100644 index 0000000..60e8794 --- /dev/null +++ b/LongAlpaca-13B.IQ4_NL.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:45156a3fa138859518a527218ec73f6b9430485499d61a729e958ce6d60c63d1 +size 7410079040 diff --git a/LongAlpaca-13B.IQ4_XS.gguf b/LongAlpaca-13B.IQ4_XS.gguf new file mode 100644 index 0000000..ead8421 --- /dev/null +++ b/LongAlpaca-13B.IQ4_XS.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ef328e7de992202eec609bec2313a0277e89714b98859099c0de7f708d5196b1 +size 7019525280 diff --git a/LongAlpaca-13B.Q2_K.gguf b/LongAlpaca-13B.Q2_K.gguf new file mode 100644 index 0000000..eb480c4 --- /dev/null +++ b/LongAlpaca-13B.Q2_K.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d29947624dd56ac4384e9906b421705156a713ac3d3658010de54d322f01bf5f +size 4854276256 diff --git a/LongAlpaca-13B.Q3_K.gguf b/LongAlpaca-13B.Q3_K.gguf new file mode 100644 index 0000000..f2c28cc --- /dev/null +++ b/LongAlpaca-13B.Q3_K.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:199ce1a34b7b09c370a8b1a64448ebfe02b60762e13377de87fdf6e0d2988a89 +size 6337776288 diff --git a/LongAlpaca-13B.Q3_K_L.gguf b/LongAlpaca-13B.Q3_K_L.gguf new file mode 100644 index 0000000..a68b06b --- /dev/null +++ b/LongAlpaca-13B.Q3_K_L.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e9c3a0271e9a5e5127346be48d7535dba862d682b3f157b62e5c1c3dc758972c +size 6929566368 diff --git a/LongAlpaca-13B.Q3_K_M.gguf b/LongAlpaca-13B.Q3_K_M.gguf new file mode 100644 index 0000000..f2c28cc --- /dev/null +++ b/LongAlpaca-13B.Q3_K_M.gguf @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:199ce1a34b7b09c370a8b1a64448ebfe02b60762e13377de87fdf6e0d2988a89 +size 6337776288 diff --git a/LongAlpaca-13B.Q3_K_S.gguf b/LongAlpaca-13B.Q3_K_S.gguf new file mode 100644 index 0000000..8dd7850 --- /dev/null +++ b/LongAlpaca-13B.Q3_K_S.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fa8228df94248c2d33ce631a04b68ac3ff94ecdb093a9847b3f67af343556acb +size 5658987168 diff --git a/LongAlpaca-13B.Q4_0.gguf b/LongAlpaca-13B.Q4_0.gguf new file mode 100644 index 0000000..d097ecd --- /dev/null +++ b/LongAlpaca-13B.Q4_0.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a1e8c504c01cd08216b8b5ed0deabe6d8de4d091bf8154b236edf5bc56e6358 +size 7365842240 diff --git a/LongAlpaca-13B.Q4_1.gguf b/LongAlpaca-13B.Q4_1.gguf new file mode 100644 index 0000000..ed667f7 --- /dev/null +++ b/LongAlpaca-13B.Q4_1.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:028b8ad7f006d12857930e1a8404d2f2dfb81ba43d34f11651976515dfbc9570 +size 8169068160 diff --git a/LongAlpaca-13B.Q4_K.gguf b/LongAlpaca-13B.Q4_K.gguf new file mode 100644 index 0000000..0ef3bd6 --- /dev/null +++ b/LongAlpaca-13B.Q4_K.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bea5c81025744000018f06ef3dd6307794b8484a81e8c2a7e4dc71a5c3c03cb2 +size 7865963840 diff --git a/LongAlpaca-13B.Q4_K_M.gguf b/LongAlpaca-13B.Q4_K_M.gguf new file mode 100644 index 0000000..0ef3bd6 --- /dev/null +++ b/LongAlpaca-13B.Q4_K_M.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bea5c81025744000018f06ef3dd6307794b8484a81e8c2a7e4dc71a5c3c03cb2 +size 7865963840 diff --git a/LongAlpaca-13B.Q4_K_S.gguf b/LongAlpaca-13B.Q4_K_S.gguf new file mode 100644 index 0000000..1379c83 --- /dev/null +++ b/LongAlpaca-13B.Q4_K_S.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2f0dca43c715491817e42a1bdc10086e9720c4bfee95b571d64bc7a995c9cb4c +size 7423186240 diff 
--git a/LongAlpaca-13B.Q5_0.gguf b/LongAlpaca-13B.Q5_0.gguf new file mode 100644 index 0000000..d094951 --- /dev/null +++ b/LongAlpaca-13B.Q5_0.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:70c98c2a719c23165032c72d3329189064f7cf9a49a728028e45a3205242abf2 +size 8972294080 diff --git a/LongAlpaca-13B.Q5_1.gguf b/LongAlpaca-13B.Q5_1.gguf new file mode 100644 index 0000000..3d22b90 --- /dev/null +++ b/LongAlpaca-13B.Q5_1.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5333a67a791f8edac720c2176f00f13b77d5ea70f0df93f0b04bcaf7efeb4493 +size 9775520000 diff --git a/LongAlpaca-13B.Q5_K.gguf b/LongAlpaca-13B.Q5_K.gguf new file mode 100644 index 0000000..25960a6 --- /dev/null +++ b/LongAlpaca-13B.Q5_K.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5693daa9b40522f2699da3cf4db3833c3d6cd421c5ba0476eb1158c69fbf1ecb +size 9229932480 diff --git a/LongAlpaca-13B.Q5_K_M.gguf b/LongAlpaca-13B.Q5_K_M.gguf new file mode 100644 index 0000000..25960a6 --- /dev/null +++ b/LongAlpaca-13B.Q5_K_M.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5693daa9b40522f2699da3cf4db3833c3d6cd421c5ba0476eb1158c69fbf1ecb +size 9229932480 diff --git a/LongAlpaca-13B.Q5_K_S.gguf b/LongAlpaca-13B.Q5_K_S.gguf new file mode 100644 index 0000000..5375b94 --- /dev/null +++ b/LongAlpaca-13B.Q5_K_S.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed9a80ce6ea37918aeba0e05a8ed7c39edb011bcbaed772160e0d5dac1ca5841 +size 8972294080 diff --git a/LongAlpaca-13B.Q6_K.gguf b/LongAlpaca-13B.Q6_K.gguf new file mode 100644 index 0000000..6840761 --- /dev/null +++ b/LongAlpaca-13B.Q6_K.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cfe24173106f41446f6c28764c7d3b6ec5c04d5f22caeb585408b880974329e9 +size 10679149184 diff --git a/LongAlpaca-13B.Q8_0.gguf b/LongAlpaca-13B.Q8_0.gguf new file mode 100644 index 0000000..3643a96 --- /dev/null +++ 
b/LongAlpaca-13B.Q8_0.gguf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:83e9f049d320c92e0f86f85896ea889494bd62b96ea47244b1ef92a8e9bc9638 +size 13831330816 diff --git a/README.md b/README.md new file mode 100644 index 0000000..9bc782c --- /dev/null +++ b/README.md @@ -0,0 +1,410 @@ +Quantization made by Richard Erkhov. + +[Github](https://github.com/RichardErkhov) + +[Discord](https://discord.gg/pvy7H8DZMG) + +[Request more models](https://github.com/RichardErkhov/quant_request) + + +LongAlpaca-13B - GGUF +- Model creator: https://huggingface.co/Yukang/ +- Original model: https://huggingface.co/Yukang/LongAlpaca-13B/ + + +| Name | Quant method | Size | +| ---- | ---- | ---- | +| [LongAlpaca-13B.Q2_K.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q2_K.gguf) | Q2_K | 4.52GB | +| [LongAlpaca-13B.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.IQ3_XS.gguf) | IQ3_XS | 4.99GB | +| [LongAlpaca-13B.IQ3_S.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.IQ3_S.gguf) | IQ3_S | 5.27GB | +| [LongAlpaca-13B.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q3_K_S.gguf) | Q3_K_S | 5.27GB | +| [LongAlpaca-13B.IQ3_M.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.IQ3_M.gguf) | IQ3_M | 5.57GB | +| [LongAlpaca-13B.Q3_K.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q3_K.gguf) | Q3_K | 5.9GB | +| [LongAlpaca-13B.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q3_K_M.gguf) | Q3_K_M | 5.9GB | +| [LongAlpaca-13B.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q3_K_L.gguf) | Q3_K_L | 6.45GB | +| 
[LongAlpaca-13B.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.IQ4_XS.gguf) | IQ4_XS | 6.54GB | +| [LongAlpaca-13B.Q4_0.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q4_0.gguf) | Q4_0 | 6.86GB | +| [LongAlpaca-13B.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.IQ4_NL.gguf) | IQ4_NL | 6.9GB | +| [LongAlpaca-13B.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q4_K_S.gguf) | Q4_K_S | 6.91GB | +| [LongAlpaca-13B.Q4_K.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q4_K.gguf) | Q4_K | 7.33GB | +| [LongAlpaca-13B.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q4_K_M.gguf) | Q4_K_M | 7.33GB | +| [LongAlpaca-13B.Q4_1.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q4_1.gguf) | Q4_1 | 7.61GB | +| [LongAlpaca-13B.Q5_0.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q5_0.gguf) | Q5_0 | 8.36GB | +| [LongAlpaca-13B.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q5_K_S.gguf) | Q5_K_S | 8.36GB | +| [LongAlpaca-13B.Q5_K.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q5_K.gguf) | Q5_K | 8.6GB | +| [LongAlpaca-13B.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q5_K_M.gguf) | Q5_K_M | 8.6GB | +| [LongAlpaca-13B.Q5_1.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q5_1.gguf) | Q5_1 | 9.1GB | +| [LongAlpaca-13B.Q6_K.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q6_K.gguf) | Q6_K | 9.95GB | +| 
[LongAlpaca-13B.Q8_0.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-13B-gguf/blob/main/LongAlpaca-13B.Q8_0.gguf) | Q8_0 | 12.88GB | + + + + +Original model description: +# LongLoRA and LongAlpaca for Long-context LLMs + + +[![Huggingface Models](https://img.shields.io/badge/Models-Huggingface%20Models-bron)](https://huggingface.co/Yukang) +[![Github](https://img.shields.io/badge/Github-Repo-cyan)](https://github.com/dvlab-research/LongLoRA) +[![Data](https://img.shields.io/badge/Data-LongAlpaca%2012k-light)](https://huggingface.co/datasets/Yukang/LongAlpaca-12k) +[![Paper](https://img.shields.io/badge/Paper-Arvix-blue)](https://arxiv.org/abs/2309.12307) + +[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-yellow.svg)](https://github.com/dvlab-research/LongLoRA/blob/main/LICENSE) +[![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-orange.svg)](https://github.com/dvlab-research/LongLoRA/blob/main/DATA_LICENSE) +[![Weight License](https://img.shields.io/badge/Weight%20License-CC%20By%20NC%204.0-red)](https://github.com/dvlab-research/LongLoRA/blob/main/WEIGHT_LICENSE) + +For detailed usage and code, please visit the [GitHub project](https://github.com/dvlab-research/LongLoRA). +## TABLE OF CONTENTS +1. [News](#news) +2. [Examples](#examples) +3. [Highlights](#highlights) +4. [How to contribute](#how-to-contribute) +5. [Requirements](#usage-requirements) +6. [Installation and quick guide](#installation-and-quick-guide) +7. [LongAlpaca Data](#longalpaca-data) +8. [Models](#models) +9. [Training](#training) +10. [Evaluation](#evaluation) +11. [Demo](#demo) +12. [Data Generation via Pdf2Text](#data-generation-via-pdf2text) +13. [Citation](#citation) +14. [Acknowledgement](#acknowledgement) +15. 
[License](#license) + +## News +- [x] [2023.10.8] **We released the long instruction-following dataset**, [LongAlpaca-12k](https://huggingface.co/datasets/Yukang/LongAlpaca-12k) and **the corresponding models**, [LongAlpaca-7B](https://huggingface.co/Yukang/LongAlpaca-7B), [LongAlpaca-13B](https://huggingface.co/Yukang/LongAlpaca-13B), and [LongAlpaca-70B](https://huggingface.co/Yukang/LongAlpaca-70B). +- (*The previous sft models*, [Llama-2-13b-chat-longlora-32k-sft](https://huggingface.co/Yukang/Llama-2-13b-chat-longlora-32k-sft) and [Llama-2-70b-chat-longlora-32k-sft](https://huggingface.co/Yukang/Llama-2-70b-chat-longlora-32k-sft), *have been deprecated*.) +- [x] [2023.10.3] We added support for GPTNeoX models. Please refer to this [PR](https://github.com/dvlab-research/LongLoRA/pull/32) for usage. Thanks to @naubull2 for this contribution. +- [x] [2023.9.22] We released all our fine-tuned [models](https://huggingface.co/Yukang), including **70B-32k models**, [LLaMA2-LongLoRA-70B-32k](https://huggingface.co/Yukang/Llama-2-70b-longlora-32k), [LLaMA2-LongLoRA-7B-100k](https://huggingface.co/Yukang/Llama-2-7b-longlora-100k-ft). Feel free to check them out! +- [x] [2023.9.22] We released the [Paper](http://arxiv.org/abs/2309.12307) and this GitHub repo, including training and evaluation code. + +**LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models [[Paper](http://arxiv.org/abs/2309.12307)]**
+[Yukang Chen](https://scholar.google.com/citations?user=6p0ygKUAAAAJ&hl=en), +[Shengju Qian](https://scholar.google.com/citations?user=QNnWmasAAAAJ), +[Haotian Tang](https://scholar.google.com/citations?user=WxL13BAAAAAJ&hl), +[Xin Lai](https://scholar.google.com/citations?user=tqNDPA4AAAAJ&hl=zh-CN), +[Zhijian Liu](https://scholar.google.com/citations?user=3coYSTUAAAAJ&hl=en), +[Song Han](https://scholar.google.com/citations?user=E0iCaa4AAAAJ&hl=zh-CN), +[Jiaya Jia](https://scholar.google.com/citations?user=XPAkzTEAAAAJ&hl=en)
+ +## Highlights +1. In the LongLoRA approach, the proposed shifted short attention is easy to implement, compatible with Flash-Attention, and not required during inference. +2. We released all our models, from 7B to 70B with context lengths from 8k to 100k, including [LLaMA2-LongLoRA-7B-100k](https://huggingface.co/Yukang/Llama-2-7b-longlora-100k-ft), [LLaMA2-LongLoRA-13B-64k](https://huggingface.co/Yukang/Llama-2-13b-longlora-64k), and [LLaMA2-LongLoRA-70B-32k](https://huggingface.co/Yukang/Llama-2-70b-longlora-32k). +3. We built a long-context instruction-following dataset, [LongAlpaca-12k](#longalpaca-data). We released the corresponding [LongAlpaca-7B](https://huggingface.co/Yukang/LongAlpaca-7B), [LongAlpaca-13B](https://huggingface.co/Yukang/LongAlpaca-13B) and [LongAlpaca-70B](https://huggingface.co/Yukang/LongAlpaca-70B) models. To the best of our knowledge, this is the first open-sourced long-context 70B model. + +## How to Contribute +- Make sure to have git installed. +- Create your own [fork](https://github.com/dvlab-research/LongLoRA/fork) of the project. +- Clone the repository on your local machine, using git clone and pasting the url of this project. +- Read both the `Requirements` and `Installation and Quick Guide` sections below. +- Commit and push your changes. +- Make a pull request when finished modifying the project. + + +## Usage Requirements +To download and use the [pre-trained weights](#pre-trained-weights) you will need: +1. A Hugging Face (HF) account with a valid email. Note that the email used for HF must also be used for the license agreement. +2. Acceptance of the Meta [license and acceptable use policy](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) + + +## Installation and Quick Guide +To install and run the application: +1. [Fork this repo](https://github.com/dvlab-research/LongLoRA/fork) on GitHub +2. Clone the repository on your local machine, using git clone and pasting the url of this project. +3. 
Run the following code: +``` +pip install -r requirements.txt +pip install flash-attn --no-build-isolation +``` +4. Use a [released model](#models) or [fine-tune](#fine-tuning) a model to fit your preferences. +5. Test your model by chatting with it. +6. Deploy your own demo. + +## LongAlpaca Data + +LongAlpaca-12k contains 9k long QA examples that we collected and 3k short QA examples sampled from the original [Alpaca data](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). The short examples keep the model from degrading at short instruction following. The data we collected covers various types and amounts, as shown in the following figure. + +| Data | Short QA | Long QA | Total | Download | +|:---------------|----------|----------|----------|----------| +| LongAlpaca-12k | 3k | 9k | 12k | [Link](https://huggingface.co/datasets/Yukang/LongAlpaca-12k) | + +Following the original Alpaca format, our Long QA data uses the following fields for fine-tuning: +- `instruction`: `str`, describes the task the model should perform. For example, to answer a question after reading a book section or paper. We vary the contents and questions to make instructions diverse. +- `output`: `str`, the answer to the instruction. + +For simplicity, we did not use the `input` field of the Alpaca format. 
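The two-field record layout described above can be sketched in Python. This is a minimal illustration, assuming the dataset file is a JSON list of objects with `instruction` and `output` keys; the helper `validate_records` and the sample records are hypothetical and not part of the LongLoRA repository.

```python
import json


def validate_records(records):
    """Keep only records whose `instruction` and `output` are non-empty strings."""
    valid = []
    for record in records:
        instruction = record.get("instruction")
        output = record.get("output")
        if (
            isinstance(instruction, str)
            and instruction
            and isinstance(output, str)
            and output
        ):
            valid.append(record)
    return valid


# Hypothetical sample data; the text is illustrative, not taken from LongAlpaca-12k.
sample_json = json.dumps(
    [
        {
            "instruction": "Below is a paper section. What is its main contribution?",
            "output": "The section proposes an efficient fine-tuning method.",
        },
        # This record has an empty answer and is filtered out.
        {"instruction": "Summarize the chapter.", "output": ""},
    ]
)

records = json.loads(sample_json)
valid_records = validate_records(records)
print(len(valid_records))
```

A check like this can be run on a downloaded copy of the dataset before fine-tuning to catch malformed entries early.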
+ +## Models + +### Models with supervised fine-tuning +| Model | Size | Context | Train | Link | +|:---------------|------|---------|---------|-----------------------------------------------------------------------------------------------------------------------| +| LongAlpaca-7B | 7B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-7B) | +| LongAlpaca-13B | 13B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-13B) | +| LongAlpaca-70B | 70B | 32768 | LoRA+ | [Model](https://huggingface.co/Yukang/LongAlpaca-70B) [(LoRA-weight)](https://huggingface.co/Yukang/LongAlpaca-70B-lora) | + + +### Models with context extension via fully fine-tuning +| Model | Size | Context | Train | Link | +|:----------------------------|------|---------|-------|-------------------------------------------------------------------| +| Llama-2-7b-longlora-8k-ft | 7B | 8192 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-7b-longlora-8k-ft) | +| Llama-2-7b-longlora-16k-ft | 7B | 16384 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-7b-longlora-16k-ft) | +| Llama-2-7b-longlora-32k-ft | 7B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-7b-longlora-32k-ft) | +| Llama-2-7b-longlora-100k-ft | 7B | 100000 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-7b-longlora-100k-ft) | +| Llama-2-13b-longlora-8k-ft | 13B | 8192 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-13b-longlora-8k-ft) | +| Llama-2-13b-longlora-16k-ft | 13B | 16384 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-13b-longlora-16k-ft) | +| Llama-2-13b-longlora-32k-ft | 13B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-13b-longlora-32k-ft) | + +### Models with context extension via improved LoRA fine-tuning +| Model | Size | Context | Train | Link | +|:----------------------------|------|---------|-------|---------------------------------------------------------------------| +| Llama-2-7b-longlora-8k | 7B | 8192 
| LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-7b-longlora-8k) | +| Llama-2-7b-longlora-16k | 7B | 16384 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-7b-longlora-16k) | +| Llama-2-7b-longlora-32k | 7B | 32768 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-7b-longlora-32k) | +| Llama-2-13b-longlora-8k | 13B | 8192 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-13b-longlora-8k) | +| Llama-2-13b-longlora-16k | 13B | 16384 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-13b-longlora-16k) | +| Llama-2-13b-longlora-32k | 13B | 32768 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-13b-longlora-32k) | +| Llama-2-13b-longlora-64k | 13B | 65536 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-13b-longlora-64k) | +| Llama-2-70b-longlora-32k | 70B | 32768 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-70b-longlora-32k) | +| Llama-2-70b-chat-longlora-32k | 70B | 32768 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-70b-chat-longlora-32k) | + +## Training +### Pre-trained weights +We use LLaMA2 models as the pre-trained weights and fine-tune them to long context window sizes. Download based on your choices. + +| Pre-trained weights | +|:-------------------------------------------------------------------------------------| +| [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | +|[Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) | +| [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) | +| [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | +| [Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) | +| [Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | + +This project also supports GPTNeoX models as the base model architecture. 
Some candidate pre-trained weights may include [GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b), [Polyglot-ko-12.8B](https://huggingface.co/EleutherAI/polyglot-ko-12.8b) and other variants. + +### Fine-tuning +``` +torchrun --nproc_per_node=8 fine-tune.py \ + --model_name_or_path path_to/Llama-2-7b-hf \ + --bf16 True \ + --output_dir path_to_saving_checkpoints \ + --cache_dir path_to_cache \ + --model_max_length 8192 \ + --use_flash_attn True \ + --low_rank_training False \ + --num_train_epochs 1 \ + --per_device_train_batch_size 1 \ + --per_device_eval_batch_size 2 \ + --gradient_accumulation_steps 8 \ + --evaluation_strategy "no" \ + --save_strategy "steps" \ + --save_steps 1000 \ + --save_total_limit 2 \ + --learning_rate 2e-5 \ + --weight_decay 0.0 \ + --warmup_steps 20 \ + --lr_scheduler_type "constant_with_warmup" \ + --logging_steps 1 \ + --deepspeed "ds_configs/stage2.json" \ + --tf32 True \ + --max_steps 1000 +``` + +- Remember to change `path_to/Llama-2-7b-hf`, `path_to_saving_checkpoints`, and `path_to_cache` to your own directories. +- You can change `model_max_length` to other values. +- You can change `ds_configs/stage2.json` to `ds_configs/stage3.json` if you want. +- Set `use_flash_attn` to `False` if you use V100 machines or have not installed flash attention. +- Set `low_rank_training` to `False` if you want full fine-tuning. It uses more GPU memory and is slower, but the performance will be a bit better. +- When training is finished, run the following to get the full model weights: +``` +cd path_to_saving_checkpoints && python zero_to_fp32.py . 
pytorch_model.bin +``` + +### Supervised Fine-tuning +``` +torchrun --nproc_per_node=8 supervised-fine-tune.py \ + --model_name_or_path path_to_Llama2_chat_models \ + --bf16 True \ + --output_dir path_to_saving_checkpoints \ + --model_max_length 32768 \ + --use_flash_attn True \ + --data_path LongAlpaca-12k.json \ + --low_rank_training True \ + --num_train_epochs 3 \ + --per_device_train_batch_size 1 \ + --per_device_eval_batch_size 2 \ + --gradient_accumulation_steps 1 \ + --evaluation_strategy "no" \ + --save_strategy "steps" \ + --save_steps 1000 \ + --save_total_limit 2 \ + --learning_rate 2e-5 \ + --weight_decay 0.0 \ + --warmup_steps 20 \ + --lr_scheduler_type "constant_with_warmup" \ + --logging_steps 1 \ + --deepspeed "ds_configs/stage2.json" \ + --tf32 True +``` +- There is no need to run supervised fine-tuning on top of the context-extended fine-tuned models. You can use the Llama2-chat models directly as the base model, since the amount of long instruction-following data is enough for SFT. +- Our long instruction-following data can be found in [LongAlpaca-12k.json](https://huggingface.co/datasets/Yukang/LongAlpaca-12k). + + +### Get trainable weights in low-rank training +In low-rank training, we set embedding and normalization layers as trainable. 
Please use the following line to extract the trainable weights `trainable_params.bin` from `pytorch_model.bin` +``` +python3 get_trainable_weights.py --checkpoint_path path_to_saving_checkpoints --trainable_params "embed,norm" +``` + +### Merge LoRA Weight +Merge the LoRA weights of `pytorch_model.bin` and trainable parameters `trainable_params.bin`, save the resulting model into your desired path in the Hugging Face format: +``` +python3 merge_lora_weights_and_save_hf_model.py \ + --base_model path_to/Llama-2-7b-hf \ + --peft_model path_to_saving_checkpoints \ + --context_size 8192 \ + --save_path path_to_saving_merged_model +``` +For example, +``` +python3 merge_lora_weights_and_save_hf_model.py \ + --base_model /dataset/pretrained-models/Llama-2-7b-hf \ + --peft_model /dataset/yukangchen/hf_models/lora-models/Llama-2-7b-longlora-8k \ + --context_size 8192 \ + --save_path /dataset/yukangchen/models/Llama-2-7b-longlora-8k-merged +``` + + +## Evaluation +### Perplexity Validation +To evaluate a model that is trained in the low-rank setting, please set both `base_model` and `peft_model`. `base_model` is the pre-trained weight. `peft_model` is the path to the saved checkpoint, which should contain `trainable_params.bin`, `adapter_model.bin` and `adapter_config.json`. For example, +``` +python3 eval.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to/Llama-2-7b-hf --peft_model path_to_saving_checkpoints --data_path pg19/test.bin +``` + +To evaluate a model that is fully fine-tuned, you only need to set `base_model` as the path to the saved checkpoint, which should contain `pytorch_model.bin` and `config.json`. `peft_model` should be ignored. +``` +python3 eval.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to_saving_checkpoints --data_path pg19/test.bin +``` + +- Note that `--seq_len` is to set the sequence length for evaluation. `--context_size` is to set the context length of the model during fine-tuning. 
`--seq_len` should not be larger than `--context_size`. + +- We have already tokenized the validation and test splits of the PG19 and proof-pile datasets into `pg19/validation.bin`, `pg19/test.bin`, and `proof-pile/test_sampled_data.bin`, using the LLaMA tokenizer. `proof-pile/test_sampled_data.bin` contains 128 documents that are randomly sampled from the total proof-pile test split. Each document has at least 32768 tokens. We also release the sampled ids in [proof-pile/test_sampled_ids.bin](https://drive.google.com/file/d/1cnzWODLRQYAd7HeugzLCIhaqzaLZv7J5/view?usp=share_link). You can download them from the links below. + +| Dataset | Split | Link | +|:-----------|------------|--------------------------------------------------------------------------------------------------------------| +| PG19 | validation | [pg19/validation.bin](https://drive.google.com/file/d/1rbJvb0qRIf2mQoN2ON7S93TbTzMnlrN6/view?usp=share_link) | +| PG19 | test | [pg19/test.bin](https://drive.google.com/file/d/1QANDMdctpacPAYgS04adDXqByGEq-Ret/view?usp=share_link) | +| Proof-pile | test | [proof-pile/test_sampled_data.bin](https://drive.google.com/file/d/1bUI5lPDvrqzY_XXJJ2sSuvZx0Y9AZClE/view?usp=share_link) | + + +### Passkey Retrieval +We provide a script to test the passkey retrieval accuracy. For example, +``` +python3 passkey_retrivial.py \ + --context_size 32768 \ + --base_model path_to/Llama-2-7b-longlora-32k \ + --max_tokens 32768 \ + --interval 1000 +``` +- Note that `context_size` is the context length during fine-tuning. +- `max_tokens` is the maximum length of the document in the passkey retrieval evaluation. +- `interval` is the step by which the document length increases. It is approximate because the document grows sentence by sentence. 
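To make the roles of `max_tokens` and `interval` concrete, here is a minimal Python sketch of a passkey retrieval test. It is not the code in `passkey_retrivial.py`: the filler text, the prompt wording, the helper names, and the use of whitespace-separated words as a rough stand-in for tokens are all assumptions made for illustration.

```python
import random

# Hypothetical filler text; the real evaluation script may use different sentences.
FILLER = "The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again."


def build_passkey_prompt(target_words, passkey, rng):
    """Build a document of roughly `target_words` words with a passkey hidden inside."""
    filler_words = len(FILLER.split())
    n_repeats = max(1, target_words // filler_words)
    sentences = [FILLER] * n_repeats
    key_sentence = f"The pass key is {passkey}. Remember it. {passkey} is the pass key."
    # Hide the passkey at a random position among the filler blocks.
    sentences.insert(rng.randint(0, n_repeats), key_sentence)
    return " ".join(sentences) + " What is the pass key?"


def length_schedule(max_tokens, interval):
    """Document lengths to test: interval, 2 * interval, ... up to max_tokens."""
    return list(range(interval, max_tokens + 1, interval))


def is_correct(model_answer, passkey):
    """Count an answer as correct if it contains the passkey digits."""
    return str(passkey) in model_answer


rng = random.Random(0)
prompt = build_passkey_prompt(200, 12345, rng)
schedule = length_schedule(32768, 1000)
print(len(schedule), schedule[0], schedule[-1])
```

Because the document is assembled from whole sentences, the actual length at each step only approximates the scheduled value, which matches the note above that `interval` is a rough number.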
+ +## Demo +### Local Inference +To chat with [Llama-2-13b-chat-longlora-32k-sft](https://huggingface.co/Yukang/Llama-2-13b-chat-longlora-32k-sft) or [Llama-2-70b-chat-longlora-32k-sft](https://huggingface.co/Yukang/Llama-2-70b-chat-longlora-32k-sft), you need to run `merge_lora_weights_and_save_hf_model.py` first, and then: +``` +python3 inference.py \ + --base_model path_to_model \ + --question $question \ + --context_size $context_length \ + --max_gen_len $max_gen_len \ + --flash_attn True \ + --material $material_content \ + --material_type $material_type \ + --material_title $material_title +``` +To ask a question related to a book: +``` +python3 inference.py \ + --base_model /data/models/Llama-2-13b-chat-longlora-32k-sft \ + --question "Why doesn't Professor Snape seem to like Harry?" \ + --context_size 32768 \ + --max_gen_len 512 \ + --flash_attn True \ + --material "materials/Harry Potter and the Philosophers Stone_section2.txt" \ + --material_type "book" \ + --material_title "Harry Potter and the Philosophers Stone" +``` +Note that `material_type` and `material_title` are optional. + +To ask a question related to a paper: +``` +python3 inference.py \ + --base_model /data/models/Llama-2-13b-chat-longlora-32k-sft \ + --question "What are the main contributions and novelties of this work?" \ + --context_size 32768 \ + --max_gen_len 512 \ + --flash_attn True \ + --material "materials/paper1.txt" \ + --material_type "paper" +``` + +### Online Demo +To deploy your own demo, run: +``` +python3 demo.py \ + --base_model path_to_model \ + --context_size $context_size \ + --max_gen_len $max_gen_len \ + --flash_attn True +``` +For example: +``` +python3 demo.py \ + --base_model /data/models/Llama-2-13b-chat-longlora-32k-sft \ + --context_size 32768 \ + --max_gen_len 512 \ + --flash_attn True +``` +- Note that `flash_attn=True` slows down generation but saves a lot of GPU memory. 
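The inference arguments above (`material`, `question`, `material_type`, `material_title`, `context_size`, `max_gen_len`) can be related to each other with a small Python sketch. The prompt template and both helper functions are hypothetical and are not taken from `inference.py`. The length check rests on the general assumption that the prompt tokens plus the generated tokens should fit within the model's context window.

```python
def build_prompt(material, question, material_type="material", material_title=""):
    """Assemble a prompt from the material and the question (illustrative template only)."""
    title_part = f' "{material_title}"' if material_title else ""
    return (
        f"Below is a {material_type}{title_part}. Read it and answer the question.\n\n"
        f"{material}\n\n"
        f"Question: {question}\n"
    )


def fits_context(prompt_tokens, context_size, max_gen_len):
    """Check that the prompt plus the generation budget stays within the context window."""
    return prompt_tokens + max_gen_len <= context_size


prompt = build_prompt(
    material="(text of the book section goes here)",
    question="Why doesn't Professor Snape seem to like Harry?",
    material_type="book",
    material_title="Harry Potter and the Philosophers Stone",
)
print(prompt)
# Token counts here are made-up numbers; in practice they come from the tokenizer.
print(fits_context(prompt_tokens=32000, context_size=32768, max_gen_len=512))
```

If the check fails for a given material, shortening the material or lowering `max_gen_len` are the two obvious adjustments.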
+ +## Data Generation via Pdf2text +During our dataset collection, we converted papers and books from PDF to text. The conversion quality has a large influence on the final model quality, so we consider this step non-trivial. We release the tool for the pdf2txt conversion in the folder `pdf2txt`. It is built upon `pdf2image`, `easyocr`, `ditod` and `detectron2`. Please refer to the [README.md](pdf2txt/README.md) in `pdf2txt` for more details. + +## Citation +If you find this project useful in your research, please consider citing: + +``` +@article{longlora, + title={LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models}, + author={Yukang Chen and Shengju Qian and Haotian Tang and Xin Lai and Zhijian Liu and Song Han and Jiaya Jia}, + journal={arXiv:2309.12307}, + year={2023} +} +``` + + +``` +@misc{long-alpaca, + author = {Yukang Chen and Shaozuo Yu and Shengju Qian and Haotian Tang and Xin Lai and Zhijian Liu and Song Han and Jiaya Jia}, + title = {Long Alpaca: Long-context Instruction-following models}, + year = {2023}, + publisher = {GitHub}, + journal = {GitHub repository}, + howpublished = {\url{https://github.com/dvlab-research/LongLoRA}}, +} +``` +## Acknowledgement +- This work is built upon [LLaMA2](https://ai.meta.com/llama) as the pre-trained model. +- This work can also be built upon [GPTNeoX-HF](https://huggingface.co/docs/transformers/model_doc/gpt_neox), which is based upon [EleutherAI/GPTNeoX](https://github.com/EleutherAI/gpt-neox), as the pre-trained model architecture. +- This work is based on [DeepSpeed](https://github.com/microsoft/DeepSpeed), [peft](https://github.com/huggingface/peft), and [Flash-Attention2](https://github.com/Dao-AILab/flash-attention) for acceleration. +- Some evaluation code is adapted from [Landmark Attention](https://github.com/epfml/landmark-attention). +- We use [LongChat](https://github.com/DachengLi1/LongChat) for the retrieval evaluation. 
+ +## License +- LongLoRA is licensed under the Apache License 2.0. This means that it requires the preservation of copyright and license notices. +- Data and weights are under the CC-BY-NC 4.0 License. They are licensed for research use only, and only non-commercial use is allowed. Models trained using the dataset should not be used outside of research purposes. +