From 8acde7a0e888c1f6f896c6d585a700ff6c6a12e8 Mon Sep 17 00:00:00 2001
From: ai-modelscope
Date: Mon, 11 Mar 2024 23:58:11 +0800
Subject: [PATCH] Update README.md (#7)
- Update README.md (bbbe4ceabb3fe4620ac77150936b5330d8067b98)
Co-authored-by: Gloria Lee
---
README.md | 68 ++++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 55 insertions(+), 13 deletions(-)
diff --git a/README.md b/README.md
index 9247a10..1f0d445 100644
--- a/README.md
+++ b/README.md
@@ -60,7 +60,19 @@ pipeline_tag: text-generation
- 👋 Join us 💬 WeChat (Chinese) !
+ 👩‍🚀 Ask questions or discuss ideas on GitHub
+
+
+
+ 👋 Join us on 👾 Discord or 💬 WeChat
+
+
+
+ 📝 Check out Yi Tech Report
+
+
+
+ 📚 Grow at Yi Learning Hub
@@ -101,6 +113,8 @@ pipeline_tag: text-generation
- [Benchmarks](#benchmarks)
- [Base model performance](#base-model-performance)
- [Chat model performance](#chat-model-performance)
+ - [Tech report](#tech-report)
+ - [Citation](#citation)
- [Who can use Yi?](#who-can-use-yi)
- [Misc.](#misc)
- [Acknowledgements](#acknowledgments)
@@ -131,7 +145,7 @@ pipeline_tag: text-generation
>
> The Yi series models adopt the same model architecture as Llama but are **NOT** derivatives of Llama.
-- Both Yi and Llama are all based on the Transformer structure, which has been the standard architecture for large language models since 2018.
+- Both Yi and Llama are based on the Transformer structure, which has been the standard architecture for large language models since 2018.
- Grounded in the Transformer architecture, Llama has become a new cornerstone for the majority of state-of-the-art open-source models due to its excellent stability, reliable convergence, and robust compatibility. This positions Llama as the recognized foundational framework for models including Yi.
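Because Yi shares the Llama architecture, off-the-shelf Llama tooling loads it unchanged. A minimal sketch (an illustration, not part of this patch), assuming `transformers` and `accelerate` are installed and using the Yi-6B checkpoint as an example:

```python
# Illustration only: Yi checkpoints load through the standard Llama-compatible
# `transformers` classes because the architectures match.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-6B"  # any checkpoint from the downloads table works

config = AutoConfig.from_pretrained(model_id)
print(config.architectures)  # reports the Llama-compatible model class

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # requires `accelerate` for automatic placement
)
```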
@@ -151,6 +165,17 @@ pipeline_tag: text-generation
## News
+
+ 🎯 2024-03-08: Yi Tech Report is published!
+
+
+
+
+ 🔔 2024-03-07: The long-text capability of the Yi-34B-200K has been enhanced.
+
+In the "Needle-in-a-Haystack" test, the Yi-34B-200K's performance improved by 10.5 percentage points, rising from 89.3% to an impressive 99.8%. We continued to pretrain the model on a 5B-token long-context data mixture, which yields near-all-green performance.
+
+
🎯 2024-03-06: The Yi-9B is open-sourced and available to the public.
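The "Needle-in-a-Haystack" test mentioned in the 2024-03-07 note hides a short fact at varying depths of a long filler context and checks whether the model retrieves it. A minimal sketch of the idea, with hypothetical helper names and a toy scoring rule rather than the actual evaluation harness:

```python
# Minimal sketch of a "Needle-in-a-Haystack" probe (illustrative only; the
# helper names and scoring here are assumptions, not the evaluation above).

def build_haystack(filler: str, needle: str, depth: float, length: int) -> str:
    """Repeat filler text to ~`length` characters and insert `needle` at a
    relative `depth` (0.0 = start, 1.0 = end)."""
    haystack = (filler * (length // len(filler) + 1))[:length]
    pos = int(len(haystack) * depth)
    return haystack[:pos] + " " + needle + " " + haystack[pos:]

FILLER = "The grass is green. The sky is blue. "
NEEDLE = "The secret passphrase is 'kumquat'."
QUESTION = "\n\nWhat is the secret passphrase?"

def score(generate, depths=(0.0, 0.25, 0.5, 0.75, 1.0), length=100_000):
    """`generate` is any callable mapping a prompt string to a completion."""
    hits = 0
    for d in depths:
        prompt = build_haystack(FILLER, NEEDLE, d, length) + QUESTION
        hits += "kumquat" in generate(prompt).lower()
    return hits / len(depths)  # fraction retrieved ("all green" = 1.0)
```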
@@ -241,7 +266,7 @@ Yi-9B|• [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-9B)
Yi-6B| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B/summary)
Yi-6B-200K | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-200K) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B-200K/summary)
- - 200k is roughly equivalent to 400,000 Chinese characters.
+ - 200k is roughly equivalent to 400,000 Chinese characters.
- If you want to use the previous version of the Yi-34B-200K (released on Nov 5, 2023), run `git checkout 069cd341d60f4ce4b07ec394e82b79e94f656cf` to download the weights.
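If you prefer not to manage the git clone yourself, the same pinned revision can be fetched with `huggingface_hub` (a sketch; assumes `pip install huggingface_hub` and uses the commit hash quoted above):

```python
# Alternative sketch: pin a specific revision with huggingface_hub instead of
# a manual `git checkout`.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="01-ai/Yi-34B-200K",
    revision="069cd341d60f4ce4b07ec394e82b79e94f656cf",  # Nov 5, 2023 weights
)
print(local_dir)  # path to the downloaded snapshot
```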
### Model info
@@ -938,13 +963,13 @@ Before deploying Yi in your environment, make sure your hardware meets the following requirements.
##### Chat models
| Model | Minimum VRAM | Recommended GPU Example |
-|----------------------|--------------|:-------------------------------------:|
-| Yi-6B-Chat | 15 GB | 1 x RTX 3090 <br> 1 x RTX 4090 <br> A10 <br> A30 |
-| Yi-6B-Chat-4bits | 4 GB | 1 x RTX 3060 <br> 1 x RTX 4060 |
-| Yi-6B-Chat-8bits | 8 GB | 1 x RTX 3070 <br> 1 x RTX 4060 |
-| Yi-34B-Chat | 72 GB | 4 x RTX 4090 <br> A800 (80GB) |
-| Yi-34B-Chat-4bits | 20 GB | 1 x RTX 3090 <br> 1 x RTX 4090 <br> A10 <br> A30 <br> A100 (40GB) |
-| Yi-34B-Chat-8bits | 38 GB | 2 x RTX 3090 <br> 2 x RTX 4090 <br> A800 (40GB) |
+|:----------------------|:--------------|:-------------------------------------:|
+| Yi-6B-Chat | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) |
+| Yi-6B-Chat-4bits | 4 GB | 1 x RTX 3060 (12 GB) <br> 1 x RTX 4060 (8 GB) |
+| Yi-6B-Chat-8bits | 8 GB | 1 x RTX 3070 (8 GB) <br> 1 x RTX 4060 (8 GB) |
+| Yi-34B-Chat | 72 GB | 4 x RTX 4090 (24 GB) <br> 1 x A800 (80 GB) |
+| Yi-34B-Chat-4bits | 20 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) <br> 1 x A100 (40 GB) |
+| Yi-34B-Chat-8bits | 38 GB | 2 x RTX 3090 (24 GB) <br> 2 x RTX 4090 (24 GB) <br> 1 x A800 (40 GB) |
Below are detailed minimum VRAM requirements under different batch use cases.
@@ -961,10 +986,10 @@ Below are detailed minimum VRAM requirements under different batch use cases.
| Model | Minimum VRAM | Recommended GPU Example |
|----------------------|--------------|:-------------------------------------:|
-| Yi-6B | 15 GB | 1 x RTX 3090 <br> 1 x RTX 4090 <br> A10 <br> A30 |
-| Yi-6B-200K | 50 GB | A800 (80 GB) |
+| Yi-6B | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) |
+| Yi-6B-200K | 50 GB | 1 x A800 (80 GB) |
| Yi-9B | 20 GB | 1 x RTX 4090 (24 GB) |
-| Yi-34B | 72 GB | 4 x RTX 4090 <br> A800 (80 GB) |
+| Yi-34B | 72 GB | 4 x RTX 4090 (24 GB) <br> 1 x A800 (80 GB) |
| Yi-34B-200K | 200 GB | 4 x A800 (80 GB) |
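The minimum-VRAM figures in both tables track a simple rule of thumb: weight memory is roughly parameter count times bytes per weight, with activations and KV cache adding headroom on top. A back-of-envelope sketch (approximate parameter counts; not the sizing method used for the tables):

```python
# Back-of-envelope VRAM estimate: weights alone take params * bytes-per-weight;
# the published minimums add headroom for activations and KV cache.

def weight_gib(params_billions: float, bits_per_weight: int) -> float:
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1024**3

for name, params, bits in [
    ("Yi-6B (fp16)", 6.1, 16),    # table lists 15 GB minimum
    ("Yi-34B (fp16)", 34.4, 16),  # table lists 72 GB minimum
    ("Yi-34B (4-bit)", 34.4, 4),  # table lists 20 GB minimum
]:
    print(f"{name}: ~{weight_gib(params, bits):.1f} GiB for weights alone")
```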
@@ -1112,6 +1137,23 @@ If you're seeking to explore the diverse capabilities within Yi's thriving family
[ Back to top ⬆️ ]
+## Tech report
+
+For detailed capabilities of the Yi series models, see [Yi: Open Foundation Models by 01.AI](https://arxiv.org/abs/2403.04652).
+
+### Citation
+
+```
+@misc{ai2024yi,
+ title={Yi: Open Foundation Models by 01.AI},
+ author={01. AI and : and Alex Young and Bei Chen and Chao Li and Chengen Huang and Ge Zhang and Guanwei Zhang and Heng Li and Jiangcheng Zhu and Jianqun Chen and Jing Chang and Kaidong Yu and Peng Liu and Qiang Liu and Shawn Yue and Senbin Yang and Shiming Yang and Tao Yu and Wen Xie and Wenhao Huang and Xiaohui Hu and Xiaoyi Ren and Xinyao Niu and Pengcheng Nie and Yuchi Xu and Yudong Liu and Yue Wang and Yuxuan Cai and Zhenyu Gu and Zhiyuan Liu and Zonghong Dai},
+ year={2024},
+ eprint={2403.04652},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+}
+```
+
## Benchmarks
- [Chat model performance](#-chat-model-performance)