Model: dnotitia/DNA-2.1-14B
| base_model | library_name | pipeline_tag | license |
|---|---|---|---|
| dnotitia/Smoothie-Qwen3-14B | transformers | text-generation | apache-2.0 |
# DNA 2.1
DNA 2.1 is a fine-tuned Qwen3-14B model that thinks natively in Korean, trained with a two-stage approach. It is released alongside the paper [Making Qwen3 Think in Korean with Reinforcement Learning](https://arxiv.org/abs/2508.10355).
## Key Features
- Two-Stage Training Approach: Supervised fine-tuning (SFT) on high-quality Korean reasoning datasets followed by reinforcement learning with our proposed Oracle-Guided Dr. GRPO algorithm
- Native Korean Thinking: Conducts internal chain-of-thought reasoning entirely in Korean
- Stable RL Training: Mitigates reward hacking and policy collapse by using an oracle judge model to calibrate the reward signal
- Enhanced Reasoning Performance: Substantially improved results on advanced reasoning benchmarks, particularly in math and coding tasks
- Preserved Knowledge & Language Proficiency: Maintains existing knowledge and language capabilities after reinforcement learning
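To make the RL stage above concrete, here is a toy sketch of the group-relative advantage at the heart of Dr. GRPO-style training: each prompt gets a group of sampled responses, a judge scores them, and each response's advantage is its reward minus the group mean (with no per-group standard-deviation or response-length normalization). The function name and the judged reward values are illustrative, not taken from the paper, and the oracle judge itself is reduced to a fixed list of scores.

```python
def group_relative_advantages(rewards):
    """Dr. GRPO-style advantage: each response's reward minus the group mean.

    Unlike vanilla GRPO, there is no division by the group's reward
    standard deviation and no per-response length normalization.
    """
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Toy group of 4 sampled responses for one prompt, scored by a
# (hypothetical) calibrated judge: 1.0 = correct, 0.0 = incorrect.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # [0.5, -0.5, 0.5, -0.5]
```

Because the advantages are centered within each group, they always sum to zero: responses that beat their group mean are reinforced, the rest are suppressed.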
## Base Model
This model builds upon Smoothie Qwen3, which reduces Chinese token emission probabilities and enhances Korean reasoning capabilities.
## Citation
If you use this model in your research, please cite our paper:
```bibtex
@misc{lee2025makingqwen3thinkkorean,
  title={Making Qwen3 Think in Korean with Reinforcement Learning},
  author={Jungyup Lee and Jemin Kim and Sang Park and SeungJae Lee},
  year={2025},
  eprint={2508.10355},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.10355},
}
```