---
base_model: dnotitia/Smoothie-Qwen3-14B
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
---
<p align="center">
<img src="assets/dna-2.1-logo.png" width="400" style="margin: 40px auto;">
</p>
# DNA 2.1
**DNA 2.1** is a fine-tuned Qwen3 14B model that thinks natively in Korean through a two-stage training approach. This model is released alongside the paper [Making Qwen3 Think in Korean with Reinforcement Learning](https://arxiv.org/abs/2508.10355).
## Key Features
- **Two-Stage Training Approach**: Supervised fine-tuning (SFT) on high-quality Korean reasoning datasets followed by reinforcement learning with our proposed **Oracle-Guided Dr. GRPO** algorithm
- **Native Korean Thinking**: Conducts internal chain-of-thought reasoning entirely in Korean
- **Stable RL Training**: Addresses reward hacking and policy collapse by using an oracle judge model to calibrate the reward signal
- **Enhanced Reasoning Performance**: Substantially improved results on advanced reasoning benchmarks, particularly in math and coding tasks
- **Preserved Knowledge & Language Proficiency**: Maintains existing knowledge and language capabilities after reinforcement learning
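
The oracle-calibrated reward can be pictured with a toy function (an illustrative sketch only, with hypothetical names; the actual Oracle-Guided Dr. GRPO formulation in the paper may combine signals differently):

```python
def calibrated_reward(task_reward: float, oracle_valid: bool, penalty: float = -1.0) -> float:
    """Illustrative reward calibration with an oracle judge.

    If the oracle judge rejects a reasoning trace (e.g. it drifts out of
    Korean or games the verifier), the raw task reward is overridden with
    a penalty; otherwise the task reward passes through unchanged.
    """
    return task_reward if oracle_valid else penalty
```

In this picture, the judge acts as a gate on the verifier's reward, which is one way to suppress reward hacking during RL.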
## Base Model
This model builds upon [Smoothie Qwen3](https://huggingface.co/collections/dnotitia/smoothie-qwen3-6811896ebb3a255de7b5b437), which reduces Chinese token emission probabilities and enhances Korean reasoning capabilities.
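## Usage
A minimal inference sketch with the standard `transformers` chat-template API (untested here; generation settings are assumptions, not recommendations from the authors):

```python
def build_messages(question: str) -> list[dict]:
    # Single-turn conversation in the role/content format consumed by
    # tokenizer.apply_chat_template.
    return [{"role": "user", "content": question}]

if __name__ == "__main__":
    # Guarded: loading a 14B model downloads tens of GB of weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "dnotitia/DNA-2.1-14B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    # "Prove that there are infinitely many primes."
    messages = build_messages("소수가 무한히 많음을 증명해 주세요.")
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=2048)
    print(tokenizer.decode(
        outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
    ))
```

Since the model thinks natively in Korean, the generated chain-of-thought preceding the final answer should appear in Korean.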
## Citation
If you use this model in your research, please cite our paper:
```bibtex
@misc{lee2025makingqwen3thinkkorean,
  title={Making Qwen3 Think in Korean with Reinforcement Learning},
  author={Jungyup Lee and Jemin Kim and Sang Park and SeungJae Lee},
  year={2025},
  eprint={2508.10355},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.10355},
}
```