---
library_name: transformers
license: apache-2.0
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: distilabel-reasoning-R1-Llama-70B-ja-train
  results: []
datasets:
- lightblue/distilabel-reasoning-R1-Llama-70B
language:
- ja
---

[日本語はこちら](#japanese)

# lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese

[DeepSeek's R1 models](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) are excellent, state-of-the-art reasoning models that have been trained to work bilingually in English and Chinese.
However, these models are inconsistent in their output language, often responding in Chinese or English when prompted in Japanese.
For this reason, we developed [lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese](https://huggingface.co/lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese) as a Japanese version of R1.

This model is a version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) fine-tuned on our [lightblue/distilabel-reasoning-R1-Llama-70B](https://huggingface.co/datasets/lightblue/distilabel-reasoning-R1-Llama-70B) dataset so that it reliably and accurately outputs Japanese in response to prompts.

This model was trained for \<10 minutes on an 8 x L20 instance ([ecs.gn8is-8x.32xlarge](https://www.alibabacloud.com/help/en/ecs/user-guide/gpu-accelerated-compute-optimized-and-vgpu-accelerated-instance-families-1)) on [Alibaba Cloud](https://www.alibabacloud.com/).

# How to use

When using this model, we recommend a sampling temperature between 0.5 and 0.7, [as per the original distilled R1 models](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B#usage-recommendations).

We have also observed that this model tends to repeat itself more than the original R1 model, so we recommend setting `repetition_penalty` to 1.1, or higher if the model still repeats itself on your prompts.

We include a script for running this model with vLLM:

<ul>
  <li><b>vLLM</b>

Install [vLLM](https://github.com/vllm-project/vllm/) using `pip install vllm`.

<details open>
  <summary>Show vLLM code</summary>

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese",
    max_model_len=8_000
)

sampling_params = SamplingParams(
    temperature=0.5,
    max_tokens=8_000,
    repetition_penalty=1.1
)

prompts = [
    """学校には1クラスにつき20人の生徒がおり、クラスは合計3つあります。
学校全体では男子と女子がそれぞれ50%ずついます。
1つ目のクラスには女子が15人、2つ目のクラスには女子が12人います。
3つ目のクラスには何人の男子がいますか？"""
]

conversations = [
    [{"role": "user", "content": x}] for x in prompts
]

outputs = llm.chat(conversations, sampling_params=sampling_params)

for output in outputs:
    print(output.outputs[0].text)

# Example output (abridged):
# <think>
# まず、学校の総生徒数を算出します。各クラスに20人の生徒があり、クラスは3つあるため、総生徒数は60人です。
#
# 次に、学校全体で男子と女子は同じ人数で分布しています。したがって、男子と女子各有30人。
# ...
# したがって、3つ目のクラスの男子数は20 - 3 = 17人です。
# </think>
#
# **解答：**
#
# 学校の総生徒数を算出します。
# ...
# **最終的な答え：**
# \[
# \boxed{17}
# \]
```

</details></li>
</ul>
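
Since responses contain a `<think>` section followed by the final answer, it can be convenient to separate the two before showing the answer to a user. The following is a minimal sketch, not part of the original scripts: `split_reasoning` is a hypothetical helper name, and treating the opening `<think>` tag as optional is an assumption made in case the chat template has already emitted it.

```python
from typing import Tuple


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split a generated response into (reasoning, answer).

    Everything before the first closing </think> tag is treated as reasoning.
    If no closing tag is present, the reasoning is empty and the whole text
    is returned as the answer.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        return "", text.strip()
    # Drop the opening tag if present (assumed optional).
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


sample = "<think>\nまず計算します。\n</think>\n\n**解答：** 17"
reasoning, answer = split_reasoning(sample)
print(answer)
```

With the vLLM script above, this helper could be applied to `output.outputs[0].text`.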

# Evaluation

We evaluated this model's answer accuracy and the percentage of responses containing a valid Japanese `<think>` section, using the first 50 rows of the [SakanaAI/gsm8k-ja-test_250-1319](https://huggingface.co/datasets/SakanaAI/gsm8k-ja-test_250-1319) dataset.

We compare against the original distilled model, deepseek-ai/DeepSeek-R1-Distill-Qwen-7B, with the repetition penalty set to both 1.0 and 1.1:

|                                                | Repetition Penalty | Answer accuracy (%) | Valid Japanese `<think>` (%) |
|------------------------------------------------|--------------------|---------------------|----------------------------|
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B        | 1.0                | 60                  | 94                         |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B        | 1.1                | 62                  | 96                         |
| lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0                | 66                  | 92                         |
| lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1                | **70**                  | **98**                         |

Code for the SakanaAI/gsm8k-ja-test_250-1319 evaluation can be found [here](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing).
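
For readers who want a feel for how an answer-accuracy figure like the one above can be computed, here is a minimal sketch. It is not the linked evaluation code: it assumes that the final answer appears in a `\boxed{...}` expression (as in the example output earlier in this card) and that answers are compared numerically. The function names are hypothetical.

```python
import re


def extract_boxed(text):
    # Return the content of the last \boxed{...} in the text, or None.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def to_number(value):
    # Parse a numeric answer, ignoring thousands separators; None on failure.
    if value is None:
        return None
    try:
        return float(value.replace(",", ""))
    except ValueError:
        return None


def accuracy_percent(responses, gold_answers):
    # Percentage of responses whose boxed answer equals the gold answer.
    correct = 0
    for response, gold in zip(responses, gold_answers):
        predicted = to_number(extract_boxed(response))
        if predicted is not None and predicted == float(gold):
            correct += 1
    return 100 * correct / len(gold_answers)


responses = ["... \\[ \\boxed{17} \\]", "\\boxed{1,200}", "答えがありません"]
print(accuracy_percent(responses, [17, 1200, 5]))
```

With 50 rows, each row contributes 2 percentage points, so a score of 70% corresponds to 35 correct answers.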


We further use the first 50 prompts from [DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja](https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja) to evaluate the percentage of valid Japanese `<think>` sections in model responses.
This benchmark contains more varied and complex prompts, so it gives a more realistic picture of how reliably this model outputs Japanese.

|                                                | Repetition Penalty | Valid Japanese `<think>` (%) |
|------------------------------------------------|--------------------|----------------------------|
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B        | 1.0                | 48                         |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B        | 1.1                | 48                         |
| lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0                | 84                         |
| lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1                | **94**                         |

Code for the DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja evaluation can be found [here](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing).
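
To illustrate what a "valid Japanese `<think>`" check might look like, here is one possible heuristic. It is an assumption for illustration only and is not taken from the linked evaluation code: a response passes if it has a closed, non-empty `<think>` section in which hiragana and katakana make up at least a given share of the letters. Kana are used because kanji alone cannot distinguish Japanese from Chinese. The function names and the `min_kana_ratio` threshold are hypothetical.

```python
import re

# Hiragana and katakana blocks; these scripts are specific to Japanese.
KANA = re.compile(r"[\u3040-\u309F\u30A0-\u30FF]")
# CJK unified ideographs (shared between Japanese kanji and Chinese).
HAN = re.compile(r"[\u4E00-\u9FFF]")
LATIN = re.compile(r"[A-Za-z]")


def has_valid_japanese_think(text, min_kana_ratio=0.2):
    # Require a closing tag and a non-empty reasoning body.
    head, sep, _ = text.partition("</think>")
    if not sep:
        return False
    body = head.replace("<think>", "", 1).strip()
    if not body:
        return False
    kana = len(KANA.findall(body))
    letters = kana + len(HAN.findall(body)) + len(LATIN.findall(body))
    return letters > 0 and kana / letters >= min_kana_ratio


def percent_valid(responses):
    # Share of responses that pass the heuristic, as a percentage.
    return 100 * sum(has_valid_japanese_think(r) for r in responses) / len(responses)


responses = [
    "<think>まず、学校の総生徒数を算出します。</think>答え",
    "<think>首先，我们计算学生总数。</think>答案",
    "<think>First, compute the total.</think>17",
    "答えは17です",
]
print(percent_valid(responses))
```

A language-identification library would likely be more robust than a character-ratio threshold; the sketch only shows the shape of the check.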

# How this model was made

We made the data for this model using the following steps:

1. Sample English reasoning-style prompts from [argilla/distilabel-reasoning-prompts](https://huggingface.co/datasets/argilla/distilabel-reasoning-prompts).
2. Remove similar prompts using text similarity based on [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) embeddings.
3. Translate English prompts to Japanese using [gpt-4o-mini-2024-07-18](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/).
4. Generate answers to prompts using [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B).
5. Filter out responses which did not:
   * Finish within 2048 tokens
   * Contain a valid `<think>` section
   * Have the `<think>` section written in Japanese
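
Step 2 above (removing similar prompts) can be sketched as a greedy filter over embedding vectors. This is a minimal illustration, not the pipeline actually used: it assumes the embeddings have already been computed (for example with BAAI/bge-m3), and both the greedy strategy and the `threshold` value are hypothetical choices.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dedupe(prompts, embeddings, threshold=0.9):
    # Keep a prompt only if it is not too similar to any prompt kept so far.
    kept = []
    for i, embedding in enumerate(embeddings):
        if all(cosine(embedding, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return [prompts[i] for i in kept]


# Toy 2-dimensional vectors standing in for real embeddings.
prompts = ["prompt A", "prompt A (near duplicate)", "prompt B"]
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(dedupe(prompts, embeddings))
```

The pairwise comparison is quadratic in the number of prompts, so a larger dataset would call for batched matrix operations or an approximate nearest-neighbour index.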

We used this data to train our model with supervised fine-tuning in [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) on the [ecs.gn8is-8x.32xlarge](https://www.alibabacloud.com/help/en/ecs/user-guide/gpu-accelerated-compute-optimized-and-vgpu-accelerated-instance-families-1) instance.


<br/>
<br/>
<h1 style="font-size: 48px;" id="japanese">日本語</h1>

[DeepseekのR1モデル](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d)は優れた、最先端の推論モデルであり、英語と中国語のバイリンガルで動作するように訓練されています。しかし、これらのモデルは出力される言語が一貫していないことがあり、日本語でプロンプトを与えると中国語や英語を出力することがあります。そこで、我々はR1の日本語版として[lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese](https://huggingface.co/lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese)を開発しました。

このモデルは、我々の[lightblue/distilabel-reasoning-R1-Llama-70B](https://huggingface.co/datasets/lightblue/distilabel-reasoning-R1-Llama-70B) データセットを使用して、[deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)の日本語版として微調整されています。

このモデルは、[Alibaba Cloud](https://www.alibabacloud.com/)の8 x L20インスタンス([ecs.gn8is-8x.32xlarge](https://www.alibabacloud.com/help/en/ecs/user-guide/gpu-accelerated-compute-optimized-and-vgpu-accelerated-instance-families-1))で\<10分間訓練されました。

# 使用方法

これらのモデルを使用する際は、元の蒸留R1モデルで推奨されているように、サンプリング温度を0.5から0.7の間で使用することをお勧めします。

また、モデルが元のR1モデルよりも繰り返しがちな傾向があるため、プロンプトを処理する際にモデルが自分を繰り返す場合は、`repetition_penalty`を1.1またはそれ以上に設定することをお勧めします。

このモデルをvLLMで使用するためのスクリプトを含めています：

<ul>
  <li><b>vLLM</b>

[vLLM](https://github.com/vllm-project/vllm/)をインストールするには、 `pip install vllm`を使用します。

<details open>
  <summary>vLLMコードを表示</summary>

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese",
    max_model_len=8_000
)

sampling_params = SamplingParams(
    temperature=0.5,
    max_tokens=8_000,
    repetition_penalty=1.1
)

prompts = [
    """学校には1クラスにつき20人の生徒がおり、クラスは合計3つあります。
学校全体では男子と女子がそれぞれ50%ずついます。
1つ目のクラスには女子が15人、2つ目のクラスには女子が12人います。
3つ目のクラスには何人の男子がいますか？"""
]

conversations = [
    [{"role": "user", "content": x}] for x in prompts
]

outputs = llm.chat(conversations, sampling_params=sampling_params)

for output in outputs:
    print(output.outputs[0].text)

# 出力例（抜粋）：
# <think>
# まず、学校の総生徒数を算出します。各クラスに20人の生徒があり、クラスは3つあるため、総生徒数は60人です。
#
# 次に、学校全体で男子と女子は同じ人数で分布しています。したがって、男子と女子各有30人。
# ...
# したがって、3つ目のクラスの男子数は20 - 3 = 17人です。
# </think>
#
# **解答：**
#
# 学校の総生徒数を算出します。
# ...
# **最終的な答え：**
# \[
# \boxed{17}
# \]
```

</details></li>
</ul>

# 評価

このモデルは、[SakanaAI/gsm8k-ja-test_250-1319](https://huggingface.co/datasets/SakanaAI/gsm8k-ja-test_250-1319)データセットの最初の50行を使用して、出力の正確性と有効な日本語の`<think>`セクションの割合を評価しました。

これは元のR1モデルと比較し、繰り返しペナルティが1.0と1.1の両方の条件でテストを行いました：

|                                                | Repetition Penalty | Answer accuracy (%) | Valid Japanese `<think>` (%) |
|------------------------------------------------|--------------------|---------------------|----------------------------|
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B        | 1.0                | 60                  | 94                         |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B        | 1.1                | 62                  | 96                         |
| lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0                | 66                  | 92                         |
| lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1                | 70                  | 98                         |

SakanaAI/gsm8k-ja-test_250-1319の評価コードは[こちら](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing)にあります。

さらに、[DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja](https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja)の最初の50プロンプトを使用して、モデル応答における有効な日本語の`<think>`セクションの割合を評価します。このベンチマークにはより多様で複雑なプロンプトが含まれており、モデルが日本語を信頼性高く出力できるかどうかを、より現実的に評価します。

|                                                | Repetition Penalty | Valid Japanese `<think>` (%) |
|------------------------------------------------|--------------------|----------------------------|
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B        | 1.0                | 48                         |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B        | 1.1                | 48                         |
| lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0                | 84                         |
| lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1                | 94                         |

DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja評価コードは[こちら](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing)にあります。

# 作成方法

このモデルのデータは以下の手順で作成されました：

1. [argilla/distilabel-reasoning-prompts](https://huggingface.co/datasets/argilla/distilabel-reasoning-prompts)から英語の推論スタイルのプロンプトをサンプルします。
2. [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3)埋め込みに基づくテキスト類似度を使用して、類似したプロンプトを削除します。
3. [gpt-4o-mini-2024-07-18](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/)を使用して、英語のプロンプトを日本語に翻訳します。
4. [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B)を使用して、プロンプトに対する回答を生成します。
5. 以下の条件を満たさない応答をフィルタリングします：
   * 2048トークン以内に終了すること
   * 有効な`<think>`セクションを含んでいること
   * `<think>`セクションが日本語で書かれていること


# Training details
<details>
  <summary>Full training config</summary>

  ### Training config yaml

  ```yaml
### model
model_name_or_path: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: /root/LLaMA-Factory/examples/deepspeed/ds_z2_config.json

### dataset
dataset: distilabel-reasoning-R1-Llama-70B-ja-train
template: qwen
cutoff_len: 4500
overwrite_cache: true
preprocessing_num_workers: 16
packing: true

### output
output_dir: /root/train_outputs/DeepSeek-R1-Distill-Qwen-7B/distilabel-reasoning-R1-Llama-70B-ja-train
logging_steps: 1
save_steps: 0.99999
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 0.1
```

### Training run script

```shell
echo '{
  "distilabel-reasoning-R1-Llama-70B-ja-train": {
    "hf_hub_url": "lightblue/distilabel-reasoning-R1-Llama-70B-ja-train",
    "formatting": "sharegpt"
  }
}' > /root/LLaMA-Factory/data/dataset_info.json

cd /root/LLaMA-Factory && llamafactory-cli train /root/reasoning_train.yaml

rm -r /root/train_outputs/DeepSeek-R1-Distill-Qwen-7B/distilabel-reasoning-R1-Llama-70B-ja-train/checkpoint*
huggingface-cli upload lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese /root/train_outputs/DeepSeek-R1-Distill-Qwen-7B/distilabel-reasoning-R1-Llama-70B-ja-train
```

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 8
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 1.0
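
As a rough illustration of how these hyperparameters fit together, the sketch below computes the effective batch size and a cosine learning-rate schedule with linear warmup. It follows the commonly used half-cosine formula and assumes the warmup step count is the warmup ratio times the total steps, rounded up. The function names are hypothetical, and the total of 46 optimizer steps used in the example is inferred from the results table in this card (step 5 at epoch 0.1087), not stated directly.

```python
import math


def effective_batch_size(per_device, grad_accum, num_devices):
    # Samples consumed per optimizer step across all devices.
    return per_device * grad_accum * num_devices


def cosine_lr(step, total_steps, base_lr=1.0e-5, warmup_ratio=0.01):
    # Linear warmup followed by a half-cosine decay to zero.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(1, 1, 8))
for step in (0, 1, 23, 46):
    print(step, cosine_lr(step, total_steps=46))
```

With these settings the warmup lasts a single step, so the learning rate reaches its peak of 1e-05 almost immediately and then decays over the rest of the epoch.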

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.766         | 0.1087 | 5    | 0.5912          |
| 0.5873        | 0.2174 | 10   | 0.5282          |
| 0.3868        | 0.3261 | 15   | 0.4958          |
| 0.5101        | 0.4348 | 20   | 0.4761          |
| 0.4085        | 0.5435 | 25   | 0.4644          |
| 0.5561        | 0.6522 | 30   | 0.4578          |
| 0.4683        | 0.7609 | 35   | 0.4542          |
| 0.5055        | 0.8696 | 40   | 0.4526          |
| 0.5359        | 0.9783 | 45   | 0.4519          |
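
The table above can be related back to the config with a little arithmetic. The sketch below infers the total number of optimizer steps from a logged (step, epoch) pair and converts a fractional `eval_steps` value into a step interval. It assumes fractional values are multiplied by the total step count and rounded up, which is consistent with the 5-step evaluation interval shown in the table. The function names are hypothetical.

```python
import math


def total_steps_from_log(step, epoch_fraction):
    # Steps per epoch implied by one logged (step, epoch) pair.
    return round(step / epoch_fraction)


def ratio_to_steps(ratio, total_steps):
    # Convert a fractional interval (e.g. eval_steps: 0.1) to whole steps.
    return math.ceil(total_steps * ratio)


total = total_steps_from_log(5, 0.1087)
print(total)
print(ratio_to_steps(0.1, total))
```

With a total train batch size of 8, this step count implies a few hundred packed training sequences, which fits the short training time reported for this model.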


### Framework versions

- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
</details>

<br/>

# License

We share this model under an Apache 2.0 license.

# Developed by

<a href="https://www.lightblue-tech.com">
<img src="https://www.lightblue-tech.com/wp-content/uploads/2023/08/color_%E6%A8%AA%E5%9E%8B-1536x469.png" alt="Lightblue technology logo" width="400"/>
</a>

This model was trained by Peter Devine ([ptrdvn](https://huggingface.co/ptrdvn)) for Lightblue.