---
language:
- ja
- en
license: llama2
datasets:
- databricks/databricks-dolly-15k
- kunishou/databricks-dolly-15k-ja
- izumi-lab/llm-japanese-dataset
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
inference: false
model-index:
- name: youri-7b-chat
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 51.19
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b-chat
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 76.09
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b-chat
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 46.06
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b-chat
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 41.17
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b-chat
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 75.06
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b-chat
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 1.52
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b-chat
      name: Open LLM Leaderboard
base_model: rinna/youri-7b
---

# `rinna/youri-7b-chat`

# Overview
This model is the instruction-tuned version of [`rinna/youri-7b`](https://huggingface.co/rinna/youri-7b). It adopts a chat-style input format.

* **Model architecture**

    A 32-layer, 4096-hidden-size transformer-based language model. Refer to the [llama2 paper](https://arxiv.org/abs/2307.09288) for architecture details.

* **Fine-tuning**

    The fine-tuning data is a subset of the following datasets.
    * [Databricks Dolly data](https://huggingface.co/datasets/databricks/databricks-dolly-15k)
    * [Japanese Databricks Dolly data](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja)
    * [Anthropic HH RLHF data](https://huggingface.co/datasets/Anthropic/hh-rlhf) and its Japanese translation
    * [FLAN Instruction Tuning data](https://github.com/google-research/FLAN) and its Japanese translation
    * [Izumi lab LLM Japanese dataset](https://github.com/masanorihirano/llm-japanese-dataset/tree/main)
        * The following sections are used
            * alt
            * aozora-txt
            * CourseraParallel
            * ParaNatCom
            * Tab-delimited_Bilingual_Sentence_Pairs
            * tanaka-corpus
            * wikinews
            * wordnet
            * yasashi-japanese
        * The [remaining sections](https://github.com/masanorihirano/llm-japanese-dataset/tree/main/datasets-cc-by-sa) contain commonly used evaluation corpora, so they are skipped to prevent data leakage.

* **Contributors**

    - [Tianyu Zhao](https://huggingface.co/tianyuz)
    - [Kei Sawada](https://huggingface.co/keisawada)

* **Release date**

    October 31, 2023

---

# Benchmarking

Please refer to [rinna's LM benchmark page (Sheet 20231031)](https://rinnakk.github.io/research/benchmarks/lm/index.html).

---

# How to use the model

~~~~python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rinna/youri-7b-chat")
model = AutoModelForCausalLM.from_pretrained("rinna/youri-7b-chat")

if torch.cuda.is_available():
    model = model.to("cuda")

# Instruction: "Translate the following Japanese into English."
instruction = "次の日本語を英語に翻訳してください。"
input = "自然言語による指示に基づきタスクが解けるよう学習させることを Instruction tuning と呼びます。"

# Each turn carries a speaker tag: 設定 (setting, i.e. the instruction),
# ユーザー (user), or システム (system, i.e. the model's reply).
context = [
    {
        "speaker": "設定",
        "text": instruction
    },
    {
        "speaker": "ユーザー",
        "text": input
    }
]
# Render one "<speaker>: <text>" line per turn, then open a システム turn
# for the model to complete.
prompt = [
    f"{uttr['speaker']}: {uttr['text']}"
    for uttr in context
]
prompt = "\n".join(prompt)
prompt = (
    prompt
    + "\n"
    + "システム: "
)
token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=200,
        do_sample=True,
        temperature=0.5,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

output = tokenizer.decode(output_ids.tolist()[0])
print(output)
"""
設定: 次の日本語を英語に翻訳してください。
ユーザー: 自然言語による指示に基づきタスクが解けるよう学習させることを Instruction tuning と呼びます。
システム: Learning to solve tasks based on natural language instructions is called instruction tuning.</s>
"""

# Keep only the generated reply: drop the echoed prompt and the trailing "</s>".
output = output[len(prompt):-len("</s>")].strip()
input = "大規模言語モデル(だいきぼげんごモデル、英: large language model、LLM)は、多数のパラメータ(数千万から数十億)を持つ人工ニューラルネットワークで構成されるコンピュータ言語モデルで、膨大なラベルなしテキストを使用して自己教師あり学習または半教師あり学習によって訓練が行われる。"

# Second turn: append the model's reply and the new user message,
# then rebuild the prompt from the full context.
context.extend([
    {
        "speaker": "システム",
        "text": output
    },
    {
        "speaker": "ユーザー",
        "text": input
    }
])
prompt = [
    f"{uttr['speaker']}: {uttr['text']}"
    for uttr in context
]
prompt = "\n".join(prompt)
prompt = (
    prompt
    + "\n"
    + "システム: "
)
token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=200,
        do_sample=True,
        temperature=0.5,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

output = tokenizer.decode(output_ids.tolist()[0])
print(output)
"""
設定: 次の日本語を英語に翻訳してください。
ユーザー: 自然言語による指示に基づきタスクが解けるよう学習させることを Instruction tuning と呼びます。
システム: Learning to solve tasks based on natural language instructions is called instruction tuning.
ユーザー: 大規模言語モデル(だいきぼげんごモデル、英: large language model、LLM)は、多数のパラメータ(数千万から数十億)を持つ人工ニューラルネットワークで構成されるコンピュータ言語モデルで、膨大なラベルなしテキストを使用して自己教師あり学習または半教師あり学習によって訓練が行われる。
システム: Large language models (LLMs) are computer language models consisting of a deep artificial neural network with millions to billions of parameters that are trained by self-supervised learning or semi-supervised learning using vast unlabeled text corpora.</s>
"""
~~~~
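
The prompt construction and reply extraction in the example above are repeated for every turn, so they can be factored into two small helpers. The following is a minimal sketch, not part of the released code: the names `build_prompt`, `extract_reply`, and `SYSTEM_SPEAKER` are hypothetical, and the decoded string is a hand-written stand-in rather than real model output, so the snippet runs without downloading the model. Unlike the fixed slice used above, `extract_reply` only strips the prompt and `</s>` when they are actually present, which may matter when generation stops at `max_new_tokens` before emitting `</s>`.

```python
from typing import Dict, List

# Speaker tag under which the model answers, as used in the example above.
SYSTEM_SPEAKER = "システム"


def build_prompt(context: List[Dict[str, str]]) -> str:
    """Render one "<speaker>: <text>" line per turn and open a system turn."""
    lines = [f"{uttr['speaker']}: {uttr['text']}" for uttr in context]
    return "\n".join(lines) + "\n" + f"{SYSTEM_SPEAKER}: "


def extract_reply(decoded: str, prompt: str, eos: str = "</s>") -> str:
    """Drop the echoed prompt and a trailing EOS marker, if either is present."""
    reply = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    if reply.endswith(eos):
        reply = reply[: -len(eos)]
    return reply.strip()


context = [
    {"speaker": "設定", "text": "次の日本語を英語に翻訳してください。"},
    {"speaker": "ユーザー", "text": "こんにちは。"},
]
prompt = build_prompt(context)

# Hypothetical stand-in for tokenizer.decode(output_ids.tolist()[0]);
# this is illustrative text, not actual model output.
decoded = prompt + "Hello.</s>"
print(extract_reply(decoded, prompt))
```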

---

# Tokenization
The model uses the original llama-2 tokenizer.

---

# How to cite
```bibtex
@misc{rinna-youri-7b-chat,
    title = {rinna/youri-7b-chat},
    author = {Zhao, Tianyu and Sawada, Kei},
    url = {https://huggingface.co/rinna/youri-7b-chat}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}
```

---

# License
[The llama2 license](https://ai.meta.com/llama/license/)

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_rinna__youri-7b-chat).

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |48.51|
|AI2 Reasoning Challenge (25-Shot)|51.19|
|HellaSwag (10-Shot)              |76.09|
|MMLU (5-Shot)                    |46.06|
|TruthfulQA (0-shot)              |41.17|
|Winogrande (5-shot)              |75.06|
|GSM8k (5-shot)                   | 1.52|
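
The "Avg." row appears to be the unweighted mean of the six benchmark scores. The short check below is a sketch under that assumption, using only the rounded values from the table. The mean of the rounded values is about 48.515, which agrees with the reported 48.51 to within rounding; the leaderboard presumably averaged unrounded scores.

```python
# Scores copied from the table above (already rounded to two decimals).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 51.19,
    "HellaSwag (10-Shot)": 76.09,
    "MMLU (5-Shot)": 46.06,
    "TruthfulQA (0-shot)": 41.17,
    "Winogrande (5-shot)": 75.06,
    "GSM8k (5-shot)": 1.52,
}

# Assumption: "Avg." is a plain arithmetic mean over the six tasks.
average = sum(scores.values()) / len(scores)
print(f"Mean of the six scores: {average:.3f}")
```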