---
base_model: dnotitia/Smoothie-Qwen3-14B
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
---
<p align="center">
<img src="assets/dna-2.1-logo.png" width="400" style="margin: 40px auto;">
</p>
# DNA 2.1
**DNA 2.1** is a fine-tuned Qwen3 14B model that thinks natively in Korean through a two-stage training approach. This model is released alongside the paper [Making Qwen3 Think in Korean with Reinforcement Learning](https://arxiv.org/abs/2508.10355).
## Key Features
- **Two-Stage Training Approach**: Supervised fine-tuning (SFT) on high-quality Korean reasoning datasets followed by reinforcement learning with our proposed **Oracle-Guided Dr. GRPO** algorithm
- **Native Korean Thinking**: Conducts internal chain-of-thought reasoning entirely in Korean
- **Stable RL Training**: Addresses reward hacking and policy collapse by using an oracle judge model to calibrate the reward signal
- **Enhanced Reasoning Performance**: Substantially improved results on advanced reasoning benchmarks, particularly in math and coding tasks
- **Preserved Knowledge & Language Proficiency**: Maintains existing knowledge and language capabilities after reinforcement learning
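
The oracle-calibrated reward can be pictured with a toy function (an illustrative sketch only, with hypothetical names; the actual Oracle-Guided Dr. GRPO formulation in the paper may combine signals differently):

```python
def calibrated_reward(task_reward: float, oracle_valid: bool, penalty: float = -1.0) -> float:
    """Illustrative reward calibration with an oracle judge.

    If the oracle judge rejects a reasoning trace (e.g. it drifts out of
    Korean or games the verifier), the raw task reward is overridden with
    a penalty; otherwise the task reward passes through unchanged.
    """
    return task_reward if oracle_valid else penalty
```

In this picture, the judge acts as a gate on the verifier's reward, which is one way to suppress reward hacking during RL.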
## Base Model
This model builds upon [Smoothie Qwen3](https://huggingface.co/collections/dnotitia/smoothie-qwen3-6811896ebb3a255de7b5b437), which reduces Chinese token emission probabilities and enhances Korean reasoning capabilities.
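## Usage
A minimal inference sketch with the standard `transformers` chat-template API (untested here; generation settings are assumptions, not recommendations from the authors):

```python
def build_messages(question: str) -> list[dict]:
    # Single-turn conversation in the role/content format consumed by
    # tokenizer.apply_chat_template.
    return [{"role": "user", "content": question}]

if __name__ == "__main__":
    # Guarded: loading a 14B model downloads tens of GB of weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "dnotitia/DNA-2.1-14B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    # "Prove that there are infinitely many primes."
    messages = build_messages("소수가 무한히 많음을 증명해 주세요.")
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=2048)
    print(tokenizer.decode(
        outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
    ))
```

Since the model thinks natively in Korean, the generated chain-of-thought preceding the final answer should appear in Korean.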
## Citation
If you use this model in your research, please cite our paper:
```bibtex
@misc{lee2025makingqwen3thinkkorean,
  title={Making Qwen3 Think in Korean with Reinforcement Learning},
  author={Jungyup Lee and Jemin Kim and Sang Park and SeungJae Lee},
  year={2025},
  eprint={2508.10355},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.10355},
}
```