---
language:
- ja
license: apache-2.0
library_name: transformers
base_model: llm-jp/llm-jp-4-8b-instruct
pipeline_tag: text-generation
tags:
- math
- distillation
- japanese
- elementary-education
- chuugaku-juken
- 算数
- sft
- qlora
---
(Japanese Follows)
# llm-jp-4-8b-instruct-sansu
QLoRA fine-tune of [`llm-jp/llm-jp-4-8b-instruct`](https://huggingface.co/llm-jp/llm-jp-4-8b-instruct), distilled from Claude Sonnet 4.6 on **3,189 filtered Japanese elementary-school math (中学受験 算数) training examples** with step-by-step solutions. Target audience: 5th- and 6th-grade students (小学5・6年生) preparing for junior high school (中学校) entrance exams.
This model is best understood as a **style/format distillation artifact**. It tends to produce concise, numbered 算数-style explanations (つるかめ算、面積図、線分図, etc.) rather than the algebraic / LaTeX-heavy responses that the base model often defaults to.
> **Research artifact, not a general-purpose AI service.** This is a narrow-domain student model produced via knowledge distillation from Claude Sonnet 4.6 for non-competing research and educational use, consistent with Anthropic's Commercial Terms §D.4. It is **not** a substitute for Claude/GPT-style general assistants. See [Intended use](#intended-use--%E7%94%A8%E9%80%94) and [Out of scope](#out-of-scope--%E6%83%B3%E5%AE%9A%E5%A4%96%E3%81%AE%E7%94%A8%E9%80%94) below before deploying.
> **Correctness warning:** do not use this model as an answer oracle. A follow-up audit found multiple wrong final answers in both the Sonnet-generated teacher/reference data and this student model's outputs. The supported claim is improved explanation style/readability, not verified answer accuracy.
### Overview
This model is a QLoRA fine-tune of [`llm-jp/llm-jp-4-8b-instruct`](https://huggingface.co/llm-jp/llm-jp-4-8b-instruct), distilled from Claude Sonnet 4.6 on **3,189 filtered step-by-step solutions** to Japanese elementary-school math problems used in 中学受験 (private middle-school entrance exams). The raw training split contained 3,213 examples; 24 extremely short records were filtered out before training, leaving 3,189. Target audience: serious 5th- and 6th-grade learners.
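The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the record fields (`problem`, `solution`), the `MIN_CHARS` threshold, and the helper name `drop_short_records` are all assumptions, since this card only states that extremely short records were removed.

```python
# Hypothetical sketch of a length-based filter. Field names and the
# threshold are illustrative assumptions, not the values used for this model.
MIN_CHARS = 20  # assumed cutoff for "extremely short"


def drop_short_records(records, min_chars=MIN_CHARS):
    """Keep records whose problem and solution both have at least min_chars characters."""
    kept = []
    for rec in records:
        problem = rec.get("problem", "").strip()
        solution = rec.get("solution", "").strip()
        if len(problem) >= min_chars and len(solution) >= min_chars:
            kept.append(rec)
    return kept


sample = [
    {
        "problem": "270mの道の端から端まで、9mおきに桜の木を植えます。木は何本必要ですか。",
        "solution": "1. 間の数は 270 ÷ 9 = 30。2. 両端に植えるので 30 + 1 = 31本。",
    },
    {"problem": "?", "solution": "3"},  # too short to be a usable training example
]
filtered = drop_short_records(sample)
print(f"kept {len(filtered)} of {len(sample)} records")
```

A character-count cutoff is only one plausible definition of "extremely short"; a token-count cutoff would work the same way.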
This should be treated as a **style and formatting distillation model**, not an answer-verification model. It tends to produce concise, numbered explanations using *elementary-school arithmetic methods* (つるかめ算 / "crane-and-turtle" method, 差集め算 / "difference-gathering" method, area diagrams, ratios, etc.) rather than algebraic equations with variables, which fall outside the Japanese elementary curriculum.
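Since the claim here is about style rather than correctness, a rough style check on generated text may be useful. The sketch below is a heuristic under stated assumptions: the regular expressions and the helper name `style_report` are illustrative and not part of this model's evaluation, and they will produce both false positives and misses.

```python
import re

# Heuristic markers of algebra-style answers. This list is an illustrative
# assumption, not an evaluation criterion used for this model.
ALGEBRA_PATTERNS = [
    re.compile(r"\\frac|\\times|\$[^$]+\$"),        # LaTeX commands or inline math
    re.compile(r"(?<![A-Za-z])[xyXY]\s*[=+\-×÷]"),  # a lone x or y followed by an operator
]
# Lines that start with "1.", "2)", "①", and similar step markers.
NUMBERED_STEP = re.compile(r"^\s*(?:\d+[.．)]|[①-⑳])", re.MULTILINE)


def style_report(text):
    """Return simple flags describing the explanation style of `text`."""
    return {
        "uses_algebra": any(p.search(text) for p in ALGEBRA_PATTERNS),
        "numbered_steps": len(NUMBERED_STEP.findall(text)),
    }


arithmetic = "1. 間の数は 270 ÷ 9 = 30\n2. 両端に植えるので 30 + 1 = 31本"
algebraic = "木の本数を x とすると、x - 1 = 30 なので $x = 31$ です。"
print(style_report(arithmetic))
print(style_report(algebraic))
```

A check like this says nothing about whether the final answer is right; it only flags whether an explanation stays within arithmetic-style presentation.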
### Intended use (research & education)
- Generating step-by-step explanations for elementary-school math problems
- Creating practice / teaching material for 中学受験 students
- Tutoring support where explanations must stay within the elementary curriculum
- Academic and practical research into Japanese-language SLM distillation
### Out of scope
- **Use as a general-purpose AI assistant / chatbot** (this is a narrow-domain artifact)
- Automated answer checking or final-answer generation without independent verification
- Unsupervised educational content shown directly to children
- Middle/high-school math (equations, functions, trigonometry)
- Non-math Japanese tasks (reading comprehension, science, social studies)
- General Japanese chat (not evaluated)
- Use as the foundation of a product or service that competes with Claude, GPT-style assistants, or other general-purpose AI offerings
### Commercial use note
This model is released under Apache 2.0, but **part of the training corpus was generated by Claude Sonnet 4.6**, and use of those outputs is subject to Anthropic's [Commercial Terms §D.4](https://www.anthropic.com/legal/commercial-terms) (restrictions on building competing AI products). Anyone considering commercial deployment should review Anthropic's current terms independently.
### Limitations and known issues
1. **Correctness is out of scope for this release.** Always check the final answer against an official or human-verified source. Spot checks found that answers produced by Sonnet 4.6 are not always correct.
2. **Distillation ceiling**: this model is bounded by Claude Sonnet 4.6's elementary-math capability. If Sonnet makes a mistake, the student model can inherit it.
3. **The training "solutions" were generated by Sonnet, not authored by human 塾 teachers.** Phrasing may diverge from published model-answer guides.
4. **Not evaluated on non-math tasks.** Performance on reading comprehension, science, social studies, or general chat is unknown.
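One way to act on the first limitation is to compare the final number in a generated explanation with a human-verified answer. The following is a minimal sketch, assuming the final answer is the last number in the text. The helper names `extract_final_number` and `matches_reference` are hypothetical, and the heuristic will fail on answers with several parts or with units written as numbers.

```python
import re
import unicodedata


def extract_final_number(text):
    """Return the last number found in `text` as a float, or None if there is none."""
    # NFKC folds full-width digits and punctuation (e.g. "31") into ASCII.
    normalized = unicodedata.normalize("NFKC", text).replace(",", "")
    matches = re.findall(r"-?\d+(?:\.\d+)?", normalized)
    return float(matches[-1]) if matches else None


def matches_reference(model_output, reference_answer, tol=1e-9):
    """Check the extracted final number against a human-verified reference."""
    predicted = extract_final_number(model_output)
    return predicted is not None and abs(predicted - reference_answer) <= tol


output = "間の数は 270 ÷ 9 = 30。両端に植えるので 30 + 1 = 31本。答え:31本"
print(matches_reference(output, 31))
```

Outputs that fail this check, or for which no number can be extracted, would still need a human reviewer.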
### How to use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ynakazat11/llm-jp-4-8b-instruct-sansu"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="bfloat16", device_map="auto"
)

# System prompt: act as a 中学受験 算数 teacher and avoid algebra with unknowns.
system = (
    "あなたは中学受験算数を教える先生です。"
    "問題文を読み、小学生にもわかるように、算数の手法(つるかめ算・差集め算・"
    "面積図・線分図・比など)を使って段階的に解説してください。"
    "文字式・方程式・代数(x, y などの未知数を立てる方法)は使わないでください。"
)
# Sample problem (植木算): trees planted every 9 m along a 270 m road, end to end.
user = "270mの道の端から端まで桜の木を植えます。木と木の間隔を9mにすると、木は何本植えられますか。"
messages = [{"role": "system", "content": system}, {"role": "user", "content": user}]

# Build the prompt with the model's chat template, then generate.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature to take effect.
out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
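Because the model should not be used as an answer oracle, it helps to know the expected result of the sample prompt before reading the model's output. With trees at both ends of the road, the number of gaps is 270 ÷ 9 = 30 and the number of trees is 30 + 1 = 31. The helper name `trees_along_road` below is hypothetical and only restates that arithmetic.

```python
def trees_along_road(road_length_m, spacing_m):
    """Count trees planted at equal spacing along a straight road, one at each end (植木算)."""
    if spacing_m <= 0 or road_length_m % spacing_m != 0:
        raise ValueError("spacing must be positive and divide the road length evenly")
    intervals = road_length_m // spacing_m  # number of gaps between neighboring trees
    return intervals + 1                    # one more tree than gaps when both ends are planted


print(trees_along_road(270, 9))  # 31
```

If the generated explanation ends with a different number of trees, treat it as one of the wrong answers described in the correctness warning.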
### License
Released under Apache 2.0. Please also review the license of the base model (`llm-jp/llm-jp-4-8b-instruct`).
### Citation
```
@misc{nakazato2026sansu,
  title  = {llm-jp-4-8b-instruct-sansu: A distilled SLM for Japanese elementary math instruction},
  author = {Yuki Nakazato},
  year   = {2026},
  url    = {https://huggingface.co/ynakazat11/llm-jp-4-8b-instruct-sansu},
  note   = {QLoRA distillation from Claude Sonnet 4.6 on Japanese 中学受験 math problems.},
}
```
## 🇯🇵 日本語
### モデル概要
中学受験算数(小学5-6年生レベル)の問題に対して、段階的で読みやすい解説形式を出すように、[`llm-jp/llm-jp-4-8b-instruct`](https://huggingface.co/llm-jp/llm-jp-4-8b-instruct) をQLoRAで微調整したモデルです。
教師モデル(Claude Sonnet 4.6)が生成した3,189件の解説を蒸留学習しています。元の訓練候補は3,213問で、極端に短い問題・解説を除外した後の3,189件を学習に使いました。出力は方程式・代数を避け、**つるかめ算・差集め算・面積図・線分図・比** など算数の解法に寄せる方向で調整されています。
重要:このモデルは「正答率を保証するモデル」ではありません。訓練解説はSonnet生成であり、監査では教師データ・モデル出力の両方に誤答が見つかっています。
### 用途(研究・教育目的)
- 中学受験対策(5-6年生向け)の算数解説生成
- 算数の指導教材作成支援
- 小学校算数の問題に対する段階的解説
- 日本語SLMの蒸留手法に関する学術・実務研究
### 想定外の用途
- **汎用AIアシスタント/チャットボット用途**(本モデルは特定ドメイン専用です)
- 自動採点、正答生成、答え合わせの最終判断
- 子どもへ無監督で提示する教材生成
- 中学・高校以降の数学(方程式、関数、三角比など)の解説
- 国語、理科、社会など算数以外の科目
- 算数以外の日本語タスク全般(性能評価していません)
- Claude/GPT等のフロンティアAIサービスと競合するプロダクトの基盤としての使用
### 商用利用について
本モデルはApache 2.0ライセンスでリリースしていますが、**訓練データの一部はClaude Sonnet 4.6が生成しており**、Anthropic社の[Commercial Terms](https://www.anthropic.com/legal/commercial-terms)第D.4条(競合製品の禁止)の制約を受けます。本モデルを商用利用する際は、利用者ご自身でAnthropicの最新の利用規約を確認してください。
### 注意・既知の制約
1. **正答率は今回のスコープ外です。** 最終的な答えは必ず人間または公式解答で確認してください。Sonnet 4.6の回答に誤答が複数検知されています。
2. **蒸留の上限は教師モデル(Claude Sonnet 4.6)の能力に依存します。** Sonnet 4.6 が間違える問題では、本モデルも間違える可能性があります。
3. **訓練データの解説はSonnet 4.6が生成したものであり、塾講師の解説そのものではありません。** 模範解答書の解説とは表現が異なる場合があります。
4. **算数以外のタスクには評価していません。** 国語の文章題、理科、社会、雑談などへの性能は不明です。
### 使用例
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "ynakazat11/llm-jp-4-8b-instruct-sansu"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype="bfloat16", device_map="auto")
system = (
    "あなたは中学受験算数を教える先生です。"
    "問題文を読み、小学生にもわかるように、算数の手法(つるかめ算・差集め算・"
    "面積図・線分図・比など)を使って段階的に解説してください。"
    "文字式・方程式・代数(x, y などの未知数を立てる方法)は使わないでください。"
)
user = "270mの道の端から端まで桜の木を植えます。木と木の間隔を9mにすると、木は何本植えられますか。"
messages = [{"role": "system", "content": system}, {"role": "user", "content": user}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
### ライセンス
本モデルはApache 2.0でリリースしています。ベースモデル(`llm-jp/llm-jp-4-8b-instruct`)のライセンスも併せてご確認ください。
# Uploaded finetuned model
- **Developed by:** ynakazat11
- **License:** apache-2.0
- **Finetuned from model:** llm-jp/llm-jp-4-8b-instruct
This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)