---
language:
- en
- ja
library_name: transformers
pipeline_tag: text-generation
license: llama3
model_type: llama
---

# Llama3 Swallow - Built with Meta Llama 3

Our Swallow models were continually pre-trained from the [Llama 3 family](https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6), primarily with added Japanese language data. The Instruct versions additionally use supervised fine-tuning (SFT) and Chat Vector. Links to other models can be found in the index.

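In its general form, Chat Vector is weight-space arithmetic: the parameter difference between an instruction-tuned model and its base (presumably Llama 3 Instruct minus Llama 3 here) is added to a model continually pre-trained from that same base. The sketch below shows only that arithmetic on toy dictionaries of weights. The parameter names, the optional `scale` factor, and skipping the embedding layer are hypothetical choices for illustration, not details confirmed by this card.

```python
from typing import Dict, List

# Toy "state dicts": parameter name -> flat list of weights.
# Real checkpoints hold tensors; plain lists keep this sketch dependency-free.
StateDict = Dict[str, List[float]]


def chat_vector(instruct: StateDict, base: StateDict) -> StateDict:
    """Return the per-parameter difference (instruct - base)."""
    return {
        name: [w_i - w_b for w_i, w_b in zip(instruct[name], base[name])]
        for name in instruct
    }


def apply_chat_vector(
    continual: StateDict, vector: StateDict, scale: float = 1.0
) -> StateDict:
    """Add the (optionally scaled) chat vector to a continually pre-trained model.

    Parameters missing from `vector` are copied unchanged; which parameters to
    leave out (e.g. embeddings) is a hypothetical design choice here.
    """
    merged: StateDict = {}
    for name, weights in continual.items():
        delta = vector.get(name)
        if delta is None:
            merged[name] = list(weights)
        else:
            merged[name] = [w + scale * d for w, d in zip(weights, delta)]
    return merged


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    base = {"layer.weight": [0.5, -1.0], "embed.weight": [0.0, 0.0]}
    instruct = {"layer.weight": [0.75, -0.5], "embed.weight": [1.0, 1.0]}
    continual = {"layer.weight": [1.0, 2.0], "embed.weight": [3.0, 3.0]}

    vec = chat_vector(instruct, base)
    # Hypothetically skip the embedding layer when applying the vector.
    vec.pop("embed.weight")
    print(apply_chat_vector(continual, vec))
    # {'layer.weight': [1.25, 2.5], 'embed.weight': [3.0, 3.0]}
```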
# Model Release Updates

We are excited to share the release history of our latest models:

- **July 1, 2024**: Released the [Llama-3-Swallow-8B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-v0.1), [Llama-3-Swallow-8B-Instruct-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1), [Llama-3-Swallow-70B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-v0.1), and [Llama-3-Swallow-70B-Instruct-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1).

## Swallow Model Index

|Model|Llama-3-Swallow|Llama-3-Swallow-Instruct|
|---|---|---|
|8B| [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-v0.1) | [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1) |
|70B| [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-v0.1) | [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1) |

This repository provides large language models developed by [Swallow-LLM](https://swallow-llm.github.io/).
Read our [blog post](https://zenn.dev/tokyotech_lm/articles/f65989d76baf2c).

## Model Details

* **Model type**: Please refer to [Llama 3 MODEL_CARD](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for details on the model architecture.
* **Language(s)**: Japanese, English
* **Library**: [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
* **Tokenizer**: Please refer to [Llama 3 blog](https://ai.meta.com/blog/meta-llama-3/) for details on the tokenizer.
* **Contact**: swallow[at]nlp.c.titech.ac.jp

## Model Performance

### Japanese tasks

|Model|Size|JCom.|JEMHopQA|NIILC|JSQuAD|XL-Sum|MGSM|WMT20-en-ja|WMT20-ja-en|JMMLU|JHumanEval|Ja Avg|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | |4-shot|4-shot|4-shot|4-shot|1-shot|4-shot|4-shot|4-shot|5-shot|0-shot| |
| | |EM acc|Char-F1|Char-F1|Char-F1|ROUGE-2|EM acc|BLEU|BLEU|EM acc|pass@1| |
|calm2-7b-chat|7B|0.2413|0.5128|0.4956|0.7729|0.0551|0.0480|0.2208|0.1384|0.2482|0.0000|0.2733|
|Swallow-7b-instruct-v0.1|7B|0.6059|0.4760|0.5284|0.8396|0.1546|0.1360|0.2285|0.1783|0.3510|0.0256|0.3524|
|Swallow-MS-7b-instruct-v0.1|7B|0.7435|0.5066|0.4268|0.8594|0.1582|0.1760|0.2260|0.1880|0.4177|0.2244|0.3927|
|RakutenAI-7B-chat|7B|0.9035|0.2600|0.4619|0.8647|0.1339|0.2120|0.2667|0.1966|0.4504|0.2299|0.3980|
|Qwen2-7B-Instruct|7B|0.8856|0.3902|0.3859|0.8967|0.1277|0.5720|0.2041|0.1909|0.5713|0.5683|0.4793|
|Meta-Llama-3-8B-Instruct|8B|0.8785|0.3812|0.3936|0.8955|0.1273|0.4160|0.2143|0.2035|0.4719|0.2872|0.4269|
|Llama-3-ELYZA-JP-8B|8B|0.9017|0.5124|0.5016|0.9113|0.1677|0.4600|0.2509|0.1846|0.4829|0.3811|0.4754|
|Llama-3-Swallow-8B-Instruct-v0.1|8B|0.9178|0.4963|0.5168|0.9088|0.1296|0.4880|0.2522|0.2254|0.4835|0.3927|0.4811|

### English tasks

|Model|Size|OpenBookQA|TriviaQA|HellaSwag|SQuAD2.0|XWINO|MMLU|GSM8K|BBH|HumanEval|En Avg|
|---|---|---|---|---|---|---|---|---|---|---|---|
| | |4-shot|4-shot|4-shot|4-shot|4-shot|5-shot|4-shot|3-shot|0-shot| |
| | |Acc|EM acc|Acc|EM acc|Acc|Acc|EM acc|CoT EM acc|pass@1| |
|calm2-7b-chat|7B|0.2860|0.3528|0.5042|0.2524|0.8413|0.3860|0.0546|0.2990|0.0000|0.3307|
|Swallow-7b-instruct-v0.1|7B|0.3280|0.4810|0.5501|0.2720|0.8774|0.4066|0.1251|0.3646|0.0866|0.3879|
|Swallow-MS-7b-instruct-v0.1|7B|0.3600|0.4999|0.5858|0.3030|0.8834|0.5273|0.2108|0.4386|0.2512|0.4511|
|RakutenAI-7B-chat|7B|0.4160|0.5971|0.6465|0.3091|0.8886|0.5757|0.3139|0.4958|0.2671|0.5011|
|Qwen2-7B-Instruct|7B|0.4000|0.5468|0.6146|0.3518|0.8852|0.7073|0.6300|0.3101|0.6354|0.5646|
|Meta-Llama-3-8B-Instruct|8B|0.3880|0.6687|0.5834|0.3743|0.8903|0.6567|0.7453|0.6478|0.5415|0.6107|
|Llama-3-ELYZA-JP-8B|8B|0.3200|0.5502|0.5224|0.3631|0.8809|0.5875|0.5701|0.3213|0.4604|0.5084|
|Llama-3-Swallow-8B-Instruct-v0.1|8B|0.3720|0.6557|0.5861|0.3648|0.9002|0.6315|0.5959|0.6391|0.4238|0.5743|

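The `Ja Avg` and `En Avg` columns appear to be unweighted means of each row's task scores (10 Japanese tasks, 9 English tasks): recomputing them from the Llama-3-Swallow-8B-Instruct-v0.1 rows of the two tables above reproduces the reported values. A minimal check, assuming plain arithmetic means rounded to four decimals:

```python
from statistics import mean

# Per-task scores for Llama-3-Swallow-8B-Instruct-v0.1, copied from the tables above.
ja_scores = {
    "JCom.": 0.9178, "JEMHopQA": 0.4963, "NIILC": 0.5168, "JSQuAD": 0.9088,
    "XL-Sum": 0.1296, "MGSM": 0.4880, "WMT20-en-ja": 0.2522,
    "WMT20-ja-en": 0.2254, "JMMLU": 0.4835, "JHumanEval": 0.3927,
}
en_scores = {
    "OpenBookQA": 0.3720, "TriviaQA": 0.6557, "HellaSwag": 0.5861,
    "SQuAD2.0": 0.3648, "XWINO": 0.9002, "MMLU": 0.6315, "GSM8K": 0.5959,
    "BBH": 0.6391, "HumanEval": 0.4238,
}


def task_average(scores: dict[str, float]) -> float:
    """Unweighted mean across tasks, rounded to the table's four decimals."""
    return round(mean(scores.values()), 4)


print(task_average(ja_scores))  # 0.4811 (reported Ja Avg)
print(task_average(en_scores))  # 0.5743 (reported En Avg)
```

Because every task carries equal weight, a model that is strong on a few saturated tasks (such as XWINO) gains as much from them as from harder ones.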
### MT-Bench JA

|Model|Size|coding|extraction|humanities|math|reasoning|roleplay|stem|writing|JMTAvg|
|---|---|---|---|---|---|---|---|---|---|---|
|calm2-7b-chat|7B|0.1198|0.3793|0.4231|0.1011|0.1799|0.4760|0.3568|0.4583|0.3118|
|Swallow-7b-instruct-v0.1|7B|0.1947|0.3156|0.4991|0.1900|0.2141|0.5330|0.4535|0.4624|0.3578|
|Swallow-MS-7b-instruct-v0.1|7B|0.2235|0.3743|0.4611|0.1060|0.3404|0.4287|0.3969|0.3877|0.3398|
|RakutenAI-7B-chat|7B|0.2475|0.3522|0.4692|0.2140|0.3926|0.4427|0.3977|0.4434|0.3699|
|Qwen2-7B-Instruct|7B|0.4635|0.6909|0.6857|0.5970|0.5042|0.6667|0.5353|0.6808|0.6030|
|Meta-Llama-3-8B-Instruct|8B|0.3744|0.6876|0.6225|0.2070|0.5032|0.5248|0.5326|0.4884|0.4926|
|Llama-3-ELYZA-JP-8B|8B|0.2908|0.6421|0.6406|0.3088|0.5500|0.6740|0.5251|0.6744|0.5382|
|Llama-3-Swallow-8B-Instruct-v0.1|8B|0.3547|0.6508|0.5371|0.2718|0.4007|0.5493|0.4752|0.5730|0.4766|

## Evaluation Benchmarks

### Japanese evaluation benchmarks

We used llm-jp-eval (v1.3.0), JP Language Model Evaluation Harness (commit #9b42d41), and Code Generation LM Evaluation Harness (commit #0261c52). The details are as follows:

- Multiple-choice question answering (JCommonsenseQA [Kurihara et al., 2022])
- Open-ended question answering (JEMHopQA [Ishii et al., 2024])
- Open-ended question answering (NIILC [Sekine, 2003])
- Machine reading comprehension (JSQuAD [Kurihara et al., 2022])
- Automatic summarization (XL-Sum [Hasan et al., 2021])
- Machine translation (WMT2020 ja-en [Barrault et al., 2020])
- Machine translation (WMT2020 en-ja [Barrault et al., 2020])
- Mathematical reasoning (MGSM [Shi et al., 2023])
- Academic exams (JMMLU [Yin et al., 2024])
- Code generation (JHumanEval [Sato et al., 2024])

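JEMHopQA, NIILC, and JSQuAD are scored with Char-F1. One common definition treats the prediction and the reference as bags of characters and takes the harmonic mean of character precision and recall. The sketch below implements that definition under the assumption that whitespace is ignored and no other normalization is applied; llm-jp-eval's own implementation may compute the score differently, so this illustrates the idea of the metric rather than reproducing the numbers above.

```python
from collections import Counter


def char_f1(prediction: str, reference: str) -> float:
    """Bag-of-characters F1 between a prediction and a reference string.

    Assumption: all whitespace (including full-width spaces) is removed first.
    """
    pred = Counter("".join(prediction.split()))
    ref = Counter("".join(reference.split()))
    # Characters shared by both strings, respecting multiplicity.
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    # 2PR / (P + R) with P = overlap/|pred| and R = overlap/|ref|
    # simplifies to 2 * overlap / (|pred| + |ref|).
    return 2 * overlap / (sum(pred.values()) + sum(ref.values()))


print(char_f1("東京都", "東京"))  # 0.8
```

Character-level matching gives partial credit for answers that differ from the reference only by a suffix or particle, which word-level exact match would score as zero.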
### English evaluation benchmarks

We used the Language Model Evaluation Harness (v0.4.2) and Code Generation LM Evaluation Harness (commit #0261c52). The details are as follows:

- Multiple-choice question answering (OpenBookQA [Mihaylov et al., 2018])
- Open-ended question answering (TriviaQA [Joshi et al., 2017])
- Machine reading comprehension (SQuAD2 [Rajpurkar et al., 2018])
- Commonsense reasoning (XWINO [Tikhonov and Ryabinin, 2021])
- Natural language inference (HellaSwag [Zellers et al., 2019])
- Mathematical reasoning (GSM8K [Cobbe et al., 2021])
- Reasoning (BBH (BIG-Bench-Hard) [Suzgun et al., 2023])
- Academic exams (MMLU [Hendrycks et al., 2021])
- Code generation (HumanEval [Chen et al., 2021])

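HumanEval and JHumanEval are reported as pass@1. The unbiased pass@k estimator introduced with HumanEval [Chen et al., 2021] takes, for each problem with n sampled completions of which c pass the unit tests, 1 - C(n-c, k) / C(n, k), and averages it over problems; for k = 1 it reduces to c / n. The sketch below implements that formula. How many samples per problem were drawn for the scores above is not stated here, and the example results are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains a correct one.
        return 1.0
    # Probability that a random size-k subset contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results for three problems, 10 samples each.
results = [(10, 3), (10, 0), (10, 10)]
print(mean_pass_at_k(results, k=1))
```

Using exact binomial coefficients from `math.comb` avoids the bias of simply checking whether any of the first k samples passes.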
### MT-Bench JA

We used [Japanese MT-Bench](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_question) to assess the instruction-following capabilities of models.
We used the following settings:

- Implementation: FastChat [Zheng et al., 2023] (commit #e86e70d0)
- Question: [Nejumi LLM-Leaderboard NEO, mtbench_ja_question_v3](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_question/v3)
- Reference Answer: [Nejumi LLM-Leaderboard NEO, mtbench_ja_referenceanswer_v1](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_referenceanswer/v1)
- Prompt for Judge: [Nejumi LLM-Leaderboard NEO, mtbench_ja_prompt_v1](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_prompt/v1)
- Judge: `gpt-4-1106-preview`
- Scoring: Absolute scale normalized to a 0-1 range, averaged over five runs.

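The scoring setting can be read as: the judge assigns each answer an absolute score, scores are normalized to 0-1 and averaged over five runs per category, and `JMTAvg` is the unweighted mean of the eight category scores (recomputing it from the calm2-7b-chat row of the MT-Bench JA table reproduces 0.3118). The sketch below assumes the judge uses MT-Bench's 1-10 scale and that normalization divides by 10; the card does not give the exact formula, so both are assumptions, and the raw judge scores are hypothetical.

```python
from statistics import mean

# Assumption: raw judge scores use a 1-10 scale and are normalized to 0-1
# by dividing by 10. The card does not state the exact formula.
MAX_SCORE = 10


def category_score(run_scores: list[list[float]]) -> float:
    """Score for one category, averaged over runs.

    `run_scores[r]` holds the raw judge scores for the category's questions in
    run r. Scores are normalized, averaged within each run, then across runs.
    """
    per_run = [mean(s / MAX_SCORE for s in scores) for scores in run_scores]
    return mean(per_run)


def jmt_average(category_scores: dict[str, float]) -> float:
    """Unweighted mean over the categories, rounded to four decimals."""
    return round(mean(category_scores.values()), 4)


# Hypothetical raw judge scores for one category over five runs.
print(category_score([[7, 8], [6, 8], [7, 7], [8, 8], [6, 7]]))

# Normalized category scores for calm2-7b-chat, copied from the MT-Bench JA table.
calm2 = {
    "coding": 0.1198, "extraction": 0.3793, "humanities": 0.4231, "math": 0.1011,
    "reasoning": 0.1799, "roleplay": 0.4760, "stem": 0.3568, "writing": 0.4583,
}
print(jmt_average(calm2))  # 0.3118 (reported JMTAvg)
```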
## Usage

```sh
pip install vllm
```

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1"

# The tokenizer is used only to render the chat template into a prompt string.
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(
    model=model_name,
    tensor_parallel_size=1,  # number of GPUs used for tensor parallelism
)

# Stop at <|eot_id|>, the end-of-turn token of the Llama 3 chat format.
sampling_params = SamplingParams(
    temperature=0.6, top_p=0.9, max_tokens=512, stop="<|eot_id|>"
)

# System: "You are a sincere and excellent Japanese assistant."
# User: "Please write a heartwarming story about a swallow and a llama facing
#        each other beneath fireworks bursting in the Tokyo night sky."
message = [
    {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
    {
        "role": "user",
        "content": "東京の夜空に打ち上がっている花火の下、向かい合っている燕とラマの温かい物語を書いてください。",
    },
]
prompt = tokenizer.apply_chat_template(
    message, tokenize=False, add_generation_prompt=True
)

output = llm.generate(prompt, sampling_params)

print(output[0].outputs[0].text)
```

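`apply_chat_template(..., tokenize=False, add_generation_prompt=True)` renders the message list into a single prompt string, and `stop="<|eot_id|>"` ends generation at the end of the assistant's turn. Assuming the tokenizer ships the standard Llama 3 Instruct chat template (this card does not show the template itself), the rendered prompt has the shape produced by the hypothetical helper below, which can be handy for inspecting prompts without loading a tokenizer.

```python
BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def render_llama3_prompt(
    messages: list[dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Hypothetical re-implementation of the Llama 3 Instruct chat format.

    Each turn is wrapped in header tokens and terminated by <|eot_id|>;
    content is stripped of surrounding whitespace, as the Llama 3 template does.
    """
    parts = [BOS]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}{EOT}"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
    {"role": "user", "content": "こんにちは"},
]
print(render_llama3_prompt(messages))
```

Comparing this string with the tokenizer's own output is a quick way to confirm the assumption before relying on it.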
## Training Datasets

### Instruction Tuning

The following datasets were used for instruction tuning.

- [OpenAssistant Conversations Dataset EN top-1 thread](https://huggingface.co/datasets/OpenAssistant/oasst2)
- [OpenAssistant Conversations Dataset](https://huggingface.co/datasets/llm-jp/oasst1-21k-ja): only the human utterances were used. The original responses were discarded and replaced with responses generated by the [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) model.

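For the Japanese OpenAssistant data, the human turns are kept and every assistant turn is regenerated. One way to sketch that construction is shown below, assuming conversations are stored as lists of `{"role", "content"}` turns and that responses are generated turn by turn from the rebuilt history. The data layout, the turn-by-turn strategy, and the `generate` callback (standing in for a call to Mixtral-8x7B-Instruct-v0.1) are all hypothetical.

```python
from typing import Callable

Turn = dict[str, str]
Generator = Callable[[list[Turn]], str]


def regenerate_responses(conversation: list[Turn], generate: Generator) -> list[Turn]:
    """Keep human turns and replace each assistant turn with a generated one.

    The generator sees the rebuilt history (original human turns plus earlier
    regenerated responses), so later responses stay consistent with earlier ones.
    """
    rebuilt: list[Turn] = []
    for turn in conversation:
        if turn["role"] == "assistant":
            # Discard the original response; generate a new one from a copy
            # of the history so the generator cannot mutate it.
            rebuilt.append({"role": "assistant", "content": generate(list(rebuilt))})
        else:
            # Human (and any other) turns are kept verbatim.
            rebuilt.append(dict(turn))
    return rebuilt


def fake_generate(history: list[Turn]) -> str:
    """Hypothetical stand-in for a model call: echoes the last user message."""
    return f"(response to: {history[-1]['content']})"


conversation = [
    {"role": "user", "content": "質問1"},
    {"role": "assistant", "content": "original answer 1"},
    {"role": "user", "content": "質問2"},
    {"role": "assistant", "content": "original answer 2"},
]
for turn in regenerate_responses(conversation, fake_generate):
    print(turn["role"], turn["content"])
```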
## Risks and Limitations

The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations.

## Acknowledgements

We thank Meta Research for releasing Llama 3 under an open license for others to build on.

Our project is supported by the [Large Generative AI Development Support Program](https://abci.ai/en/link/lfm_support_program.html) of the National Institute of Advanced Industrial Science and Technology.

## License

[META LLAMA 3 COMMUNITY LICENSE](https://llama.meta.com/llama3/license/)

## Authors

Here are the team members:

- From [Tokyo Institute of Technology Okazaki Laboratory](https://www.nlp.c.titech.ac.jp/index.en.html), the following members:
  - [Naoaki Okazaki](https://www.chokkan.org/index.ja.html)
  - [Sakae Mizuki](https://s-mizuki-nlp.github.io/)
  - [Youmi Ma](https://www.nlp.c.titech.ac.jp/member/youmi.en.html)
  - [Koki Maeda](https://sites.google.com/view/silviase)
  - [Kakeru Hattori](https://aya-se.vercel.app/)
  - [Masanari Ohi](https://sites.google.com/view/masanariohi)
  - [Taihei Shiotani](https://github.com/inatoihs)
  - [Koshiro Saito](https://sites.google.com/view/koshiro-saito)
- From [Tokyo Institute of Technology YOKOTA Laboratory](https://www.rio.gsic.titech.ac.jp/en/index.html), the following members:
  - [Rio Yokota](https://twitter.com/rioyokota)
  - [Kazuki Fujii](https://twitter.com/okoge_kaz)
  - [Taishi Nakamura](https://twitter.com/Setuna7777_2)
  - [Takumi Okamoto](https://www.linkedin.com/in/takumi-okamoto)
  - [Ishida Shigeki](https://www.wantedly.com/id/reborn27)
- From [Artificial Intelligence Research Center, AIST, Japan](https://www.airc.aist.go.jp/en/teams/), the following members:
  - [Hiroya Takamura](https://sites.google.com/view/hjtakamura)

## How to cite

If you find our work helpful, please feel free to cite us.

```
@inproceedings{Fujii:COLM2024,
  title={Continual Pre-Training for Cross-Lingual LLM Adaptation:
    Enhancing Japanese Language Capabilities},
  author={Kazuki Fujii and Taishi Nakamura and Mengsay Loem and Hiroki
    Iida and Masanari Ohi and Kakeru Hattori and Hirai Shota and Sakae
    Mizuki and Rio Yokota and Naoaki Okazaki},
  booktitle="Proceedings of the First Conference on Language Modeling",
  series={COLM},
  pages="(to appear)",
  year="2024",
  month=oct,
  address={University of Pennsylvania, USA},
}

@inproceedings{Okazaki:COLM2024,
  title={Building a Large Japanese Web Corpus for Large Language Models},
  author={Naoaki Okazaki and Kakeru Hattori and Hirai Shota and Hiroki
    Iida and Masanari Ohi and Kazuki Fujii and Taishi Nakamura and Mengsay
    Loem and Rio Yokota and Sakae Mizuki},
  booktitle="Proceedings of the First Conference on Language Modeling",
  series={COLM},
  pages="(to appear)",
  year="2024",
  month=oct,
  address={University of Pennsylvania, USA},
}
```

### Citations

```tex
@article{llama3modelcard,
  title={Llama 3 Model Card},
  author={AI@Meta},
  year={2024},
  url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```