Initialize project; model provided by the ModelHub XC community

Model: QwenCollection/Tokara-0.5B-v0.1 Source: Original Platform
2026-05-29 16:20:14 +08:00
commit 9652444fa6
18 changed files with 472356 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,45 @@
+---
+license: other
+license_name: tongyi-qianwen-research
+license_link: https://huggingface.co/Qwen/Qwen1.5-0.5B/blob/main/LICENSE
+language:
+- ja
+- en
+pipeline_tag: text-generation
+datasets:
+- izumi-lab/wikipedia-ja-20230720
+- oscar-corpus/OSCAR-2301
+- aixsatoshi/cosmopedia-japanese-100k
+- BEE-spoke-data/wikipedia-20230901.en-deduped
+---
+
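+## About the model
+This model is [Qwen/Qwen1.5-0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B) after continued pre-training on 5B tokens of Japanese and English data.
+
+Its jsquad and jnli benchmark scores are lower than the base model's, but it outputs Japanese more consistently than the base model does.
+
+See [this article](https://zenn.dev/kendama/articles/55564e12da6e82) for details.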
+
+## Benchmarks
+Evaluated on three tasks from [Stability-AI/lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness).
+| Model                        | jsquad(1-shot) | jcommonsenseqa(1-shot) | jnli(1-shot) |
+| ---------------------------- | -------------- | ---------------------- | ------------ | 
+| Kendamarron/Tokara-0.5B-v0.1 | 26.4295        | 0.2663                 | 0.5509       | 
+| Qwen/Qwen1.5-0.5B            | 31.3597        | 0.2556                 | 0.5534       | 
+
+## About the name
+Named after the Tokara horse, a horse breed native to Japan.
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+
+model = AutoModelForCausalLM.from_pretrained('Kendamarron/Tokara-0.5B-v0.1')
+tokenizer = AutoTokenizer.from_pretrained('Kendamarron/Tokara-0.5B-v0.1')
+
+pipe = pipeline('text-generation', model=model, tokenizer=tokenizer)
+
+prompt = "大規模言語モデルとは、"
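+# do_sample=True is needed for temperature and top_p to take effect; without it decoding is greedy
+print(pipe(prompt, max_length=128, do_sample=True, repetition_penalty=1.1, temperature=0.7, top_p=0.95))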
+
+```