---
base_model: dnotitia/Smoothie-Qwen3-14B
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
---

# DNA 2.1

**DNA 2.1** is a fine-tuned Qwen3 14B model that thinks natively in Korean through a two-stage training approach. This model is released alongside the paper [Making Qwen3 Think in Korean with Reinforcement Learning](https://arxiv.org/abs/2508.10355).

## Key Features

- **Two-Stage Training Approach**: Supervised fine-tuning (SFT) on high-quality Korean reasoning datasets, followed by reinforcement learning with our proposed **Oracle-Guided Dr. GRPO** algorithm
- **Native Korean Thinking**: Conducts internal chain-of-thought reasoning entirely in Korean
- **Stable RL Training**: Addresses reward hacking and policy collapse by using an oracle judge model to calibrate the reward signal
- **Enhanced Reasoning Performance**: Substantially improved results on advanced reasoning benchmarks, particularly math and coding tasks
- **Preserved Knowledge & Language Proficiency**: Retains the base model's knowledge and language capabilities after reinforcement learning

## Base Model

This model builds upon [Smoothie Qwen3](https://huggingface.co/collections/dnotitia/smoothie-qwen3-6811896ebb3a255de7b5b437), which reduces Chinese token emission probabilities and enhances Korean reasoning capabilities.

## Citation

If you use this model in your research, please cite our paper:

```bibtex
@misc{lee2025makingqwen3thinkkorean,
      title={Making Qwen3 Think in Korean with Reinforcement Learning},
      author={Jungyup Lee and Jemin Kim and Sang Park and SeungJae Lee},
      year={2025},
      eprint={2508.10355},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.10355},
}
```
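## Quickstart

Since the card lists `transformers` as the library and `text-generation` as the pipeline, the model can presumably be loaded with the standard Hugging Face chat-template workflow. The sketch below is a minimal example under that assumption; the repo id (`dnotitia/DNA-2.1`) and generation settings are illustrative placeholders — check the model page for the exact identifier and recommended sampling parameters.

```python
# Minimal usage sketch with the standard transformers causal-LM API.
# Repo id and max_new_tokens are assumptions, not values from this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dnotitia/DNA-2.1"  # placeholder: verify the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # place layers across available devices
)

# A Korean prompt, since the model reasons natively in Korean.
messages = [
    {"role": "user", "content": "피보나치 수를 구하는 파이썬 함수를 작성해 주세요."}
]

# Build the chat-formatted input with the model's own template.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=2048)

# Decode only the newly generated tokens (the reply, including the
# Korean chain-of-thought the model produces before its answer).
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because this is a Qwen3-family model, the output is expected to contain an internal reasoning segment (in Korean) before the final answer; downstream code that wants only the answer should strip the thinking segment according to the base model's conventions.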