Initialize the project; model provided by the ModelHub XC community

Model: kakaocorp/kanana-2-30b-a3b-instruct-2601
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-04-14 16:17:00 +08:00
commit aa754a9035
24 changed files with 3077 additions and 0 deletions

.gitattributes vendored Normal file

@@ -0,0 +1,63 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-00001-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00008-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00009-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00002-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00012-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00013-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00005-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00006-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00007-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00004-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
model-00011-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00010-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text

LICENSE Normal file

@@ -0,0 +1,73 @@
KANANA LICENSE AGREEMENT
Kanana Release Date: July 17, 2025
This KANANA LICENSE AGREEMENT (this “Agreement”) is made by and between you and Kakao Corp. (“KAKAO”) that governs your use of Kanana Materials that KAKAO provides to you.
By using, copying, modifying, distributing, performing, or displaying all or part of Kanana Materials, or otherwise accepting the terms and conditions of this Agreement, you agree to be bound by this Agreement. You hereby represent and warrant that (i) you are legally authorized to enter into this Agreement, and (ii) if you are entering into this Agreement on behalf of a legal entity, you have the authority to legally and validly bind such entity.
1. Definition
1.1 “Agreement” means the terms and conditions for use, copying, distribution and modification of Kanana Materials as set forth herein.
1.2 “KAKAO” means Kakao Corp.
1.3 “You” means an individual or legal entity that enters into this Agreement with KAKAO and exercises its rights hereunder or uses Kanana Materials for any purpose. If you enter into this Agreement on behalf of a legal entity, “you” shall include such entity.
1.4 “Kanana” means the basic large-scale language model, software, and algorithms distributed by KAKAO under this Agreement, including parameters (such as Model Weights and optimizer status), machine learning model codes, inference/learning/fine-tuning codes, and other related elements.
1.5 “Documentation” means the specifications, manuals, and other documentation accompanying Kanana distributed by KAKAO.
1.6 “Kanana Materials” means, collectively, Kanana and Documentation, including any portions or components thereof.
1.7 “Outputs” means information content generated by operating or otherwise using Kanana Materials.
1.8 “Derivative Works” means (i) any modifications to Kanana, (ii) any work of authorship based on Kanana, or (iii) any other designed machine learning models that either directly use the patterns of Model Weights, parameters, operations, and/or outputs or incorporate a substantial part of Kanana’s performance or functional characteristics through methods including, but not limited to, transfer learning, fine-tuning, or knowledge distillation. This includes distillation methods using Kanana’s intermediate data representations or a method based on the synthetic data outputs generated by Kanana; provided, however, that Outputs shall not be deemed to be Derivative Works.
1.9 “Model Weights” means a set of numerical parameter values generated during Kanana’s learning process, representing the result of substantial investment and effort by KAKAO.
2. Grant of License and Use Policy
2.1 Grant of License. Subject to the terms and conditions of this Agreement, you are granted a non-exclusive, worldwide, non-transferrable, royalty-free limited license under KAKAO’s intellectual property or other rights owned by KAKAO that enables you to access, download, install, copy, use, reproduce, distribute, create Derivative Works of, and make modifications to Kanana Materials.
2.2 Policy on Prohibited Use. Your use of Kanana Materials and Derivative Works must comply with applicable laws and regulations and adhere to KAKAO’s Guidelines For Responsible AI (https://www.kakaocorp.com/page/responsible/detail/guidelinesForResponsibleAI), which is hereby incorporated into this Agreement.
2.3 This Agreement applies solely to Kanana-*** and shall not apply to any other models distributed by KAKAO under separate licenses. Licenses applicable to such other models shall not apply to Kanana-***.
2.4 The license terms applicable to a specific version of Kanana applies exclusively to that version and shall not extend to any other versions. Each version shall be deemed as an independent and separate work of authorship.
2.5 You may use each version of Kanana only in accordance with the license terms expressly specified for that version, and you shall not claim that the license terms applicable to one version apply to any other version.
2.6 You shall not combine different versions of Kanana that are subject to different license terms in order to circumvent any applicable license terms.
3. Redistribution
3.1 You may copy, distribute or disclose Kanana, Derivative Works, or any products or services that contain Kanana or Derivative Works; provided, however, that you shall:
(i) incorporate the compliance obligation set forth in the Policy on Prohibited Use provision of Section 2.2 in any agreement for use and distribution and notify subsequent users that such use restrictions apply;
(ii) provide any recipients of Kanana Materials or Derivative Works a copy of this Agreement;
(iii) expressly indicate in any files you have modified that it has been modified by you;
(iv) include a “Notice” text file that includes the following notice:
“Kanana is licensed in accordance with the Kanana License Agreement. Copyright © KAKAO Corp. All Rights Reserved.”; and
(v) clearly display the phrase “Powered by Kanana” on related websites, user interfaces, blog posts, introduction pages, or product documentation in a manner that is easily recognizable to users. In addition, if you use Kanana Materials or their outputs to create, train, improve, or enhance other AI models and distribute them, you must include Kanana as a prefix to the name of such AI models.
3.2 You may add your own copyright statement to your modifications of Kanana Materials and may provide additional or different license terms and conditions; provided, however, that such additional or different license terms and conditions shall not violate or conflict with any provisions of this Agreement.
4. Additional Commercial Terms
4.1 If you wish to engage in any of the following activities using Kanana Materials or any Derivative Works, you must obtain a separate commercial license expressly granted by KAKAO:
(i) Offering or (re)selling to third parties access to Kanana Materials or any Derivative Works through API, cloud platforms, or other remote access services;
(ii) Offering or (re)selling to third parties Kanana Materials or any Derivative Works in whole or in part, as part of a system integration (SI) or on-premise deployment solution; or
(iii) Offering or (re)selling to third parties Kanana Materials or any Derivative Works embedded in on-device domains.
4.2 If, as of Kanana Release Date, the number of monthly active users of the products or services provided by you and/or your affiliates, is greater than 10 million in the preceding calendar month, you must obtain a separate commercial license expressly granted by KAKAO.
4.3 For clarity, unless your activities or conditions fall within those specified in Sections 4.1 and 4.2 above, you may use Kanana Materials or any Derivative Works for the development and operation of your own services without obtaining a commercial license from KAKAO.
4.4 The grant of any commercial license under Sections 4.1 and 4.2 shall be at KAKAO’s sole discretion.
5. Outputs
KAKAO will not claim any rights to Outputs you generate using Kanana Materials. You shall be solely responsible for Outputs and the use thereof.
6. Disclaimer of Warranty
Unless required by law, Kanana Materials are provided on an “AS IS” basis, and KAKAO disclaims all warranties of any kind, both express and implied, including, without limitation, any warranties of title, non-infringement, merchantability, or fitness for a particular purpose.
7. Limitation on Liability
Unless required by law, in no event shall KAKAO be liable to you for damages, including any direct, indirect, special, consequential, incidental, and punitive damages of any character arising out of the use or inability to use Kanana Materials, Derivative Works, or Outputs, even if KAKAO has been advised of the possibility of such damages.
8. Indemnification
You shall indemnify and hold KAKAO harmless from and against any and all claims that may be filed by a third party as a result of your infringement of any third party’s rights or violation of any applicable law, to the extent caused by your use or distribution of Kanana Materials, Derivative Works, or Outputs; provided, however, that the foregoing shall not apply to claims resulting from KAKAO’s willful or gross negligence.
9. Intellectual Property
9.1 This Agreement does not grant you any rights to use KAKAO’s trademarks, service marks, or product names. However, on a limited basis and solely for the purpose of complying with Section 3.1(v), KAKAO authorizes you to use the Kanana trademark, provided that KAKAO may require you to discontinue such use at any time if you impair the value of the Kanana trademark.
9.2 KAKAO retains ownership of Kanana Materials and Derivative Works created by KAKAO, but you will retain ownership of any Derivative Works and modifications made by you.
9.3 If you bring any legal action or proceeding against KAKAO or a third party alleging that the Kanana Materials, Derivative Works, or Outputs infringe your intellectual property rights, your rights under this Agreement shall automatically terminate as of the date such action is filed.
9.4 You acknowledge that Model Weights are a valuable asset of KAKAO. You shall not extract, copy, distribute, modify Model Weights or use them to train new models, except as expressly permitted under this Agreement.
9.5 The protections under this Agreement apply to all components of Kanana Materials (irrespective of whether it is recognized as a work of authorship), including, but not limited to, Model Weights, parameters, algorithms, or structures. You may exercise your rights in these components only to the extent expressly permitted under this Agreement.
10. Term and Termination
The term of this Agreement will commence upon your acceptance of this Agreement or access to Kanana Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. KAKAO may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of Kanana Materials and Derivative Works. Sections 5, 6, 7, 8, 10 and 11 shall survive the termination of this Agreement.
11. Governing Law and Arbitration
11.1 This Agreement will be governed and construed under the laws of the Republic of Korea, without regard to its conflicts of laws principles.
11.2 Any disputes arising out of or in connection with this Agreement shall be finally settled by arbitration in accordance with the International Arbitration Rules of the Korean Commercial Arbitration Board. The number of arbitrators shall be one. The seat, or legal place, of arbitral proceedings shall be Seoul, Republic of Korea. The language to be used in the arbitral proceedings shall be English. Either party may seek interim or provisional relief from a court of competent jurisdiction, which shall not be considered a waiver of any provision in this Section. The arbitral tribunal also has the authority to issue orders for interim or provisional relief.
12. No Waiver
KAKAO’s failure or delay in exercising any of its rights under this Agreement shall not constitute a waiver of such rights.

README.md Normal file

@@ -0,0 +1,692 @@
---
library_name: transformers
license_name: "kanana"
license_link: LICENSE
pipeline_tag: text-generation
model_id: kakaocorp/kanana-2-30b-a3b-instruct-2601
repo: kakaocorp/kanana-2-30b-a3b-instruct-2601
developers: Kanana LLM
base_model:
- kakaocorp/kanana-2-30b-a3b-mid-2601
---
<p align="center">
<img src="./assets/logo/kanana.png" width="60%" alt="Kanana">
</p>
<p align="center">
🤗 <a href="https://huggingface.co/collections/kakaocorp/kanana-2">HF Models</a> &nbsp; | &nbsp;
📕 <a href="https://tech.kakao.com/posts/807">Pre-Training Blog</a> &nbsp; | &nbsp;
📕 <a href="https://tech.kakao.com/posts/808">Post-Training Blog</a> &nbsp; | &nbsp;
📕 <a href="https://tech.kakao.com/posts/804">Teaser Blog</a> &nbsp;
</p>
<br><br>
## News 🔥
- `2026/01/15`: 🤗 Released `kanana-2-30b-a3b-2601` HF model weights.
- `2026/01/15`: 📕 Published blog posts ([pre-training](https://tech.kakao.com/posts/807), [post-training](https://tech.kakao.com/posts/808)) about the development of `Kanana-2` models.
- `2025/12/19`: 🤗 Released `kanana-2-30b-a3b` HF model weights and published a [teaser blog](https://tech.kakao.com/posts/804).
<br>
# Kanana-2 Highlights
**Kanana-2**, the latest open-source evolution of the Kanana model family, is designed specifically for **Agentic AI**, presenting substantial enhancements in **tool calling, complex instruction following, and logical reasoning**. This new version adopts a cutting-edge architecture featuring MLA (Multi-head Latent Attention) and MoE (Mixture of Experts). These innovations allow the model to utilize significantly fewer active parameters compared to the previous 32.5B model while delivering superior performance and ensuring high throughput. Furthermore, the model **natively supports context lengths of up to 32,768 tokens**, enabling it to maintain coherence when handling extensive documents or long-context interactions.
In addition, Kanana-2 now supports 6 languages, covering **Korean, English, Japanese, Chinese, Thai, and Vietnamese**. To support this expansion, Kanana-2 utilizes a newly trained tokenizer that demonstrates superior tokenization efficiency across these languages, including an improvement of over 30% specifically for Korean. Finally, to address advanced problem-solving needs, Kanana-2 introduces **reasoning models** capable of deliberate thinking and reasoning, achieving significantly enhanced performance in downstream tasks, especially when tackling hard problems.
> [!NOTE]
> No Kakao user data was used for either pre-training or post-training.
<br>
## Model Overview
The **kanana-2-30b-a3b** series has the following features:
- Total Parameters: 30B
- Activated Parameters: 3B
- Number of Layers: 48
- Number of Dense Layers: 1
- Number of Experts: 128
- Number of Selected Experts: 6
- Number of Shared Experts: 2
- Attention Mechanism: MLA
- Vocabulary Size: 128,256
- Context Length: 32,768
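As a rough illustration of the expert-routing numbers above (6 routed experts selected per token out of 128, plus 2 always-active shared experts), here is a minimal top-k routing sketch in plain Python. This is an illustrative sketch only; the router function and its details are assumptions for exposition, not the model's actual implementation:

```python
import math
import random

NUM_EXPERTS = 128  # routed experts per MoE layer
TOP_K = 6          # routed experts selected per token
NUM_SHARED = 2     # shared experts, active for every token

def route(logits, top_k=TOP_K):
    """Pick the top-k routed experts and softmax-normalize their gate weights."""
    topk = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    m = max(logits[i] for i in topk)                       # for numerical stability
    exps = [math.exp(logits[i] - m) for i in topk]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(topk, exps)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # one token's router logits
selected = route(logits)
# Only TOP_K routed + NUM_SHARED shared experts fire per token, which is
# roughly how a 30B-total-parameter model runs with only ~3B activated.
```

Each token thus touches a small, input-dependent subset of the expert parameters, while the shared experts provide a dense pathway common to all tokens.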
<br>
## Model Downloads
<div align="left">
| **Model** | **Download** |
| :------------: | :------------: |
| kanana-2-30b-a3b-base-2601<sup>*</sup> | [🤗 HuggingFace](https://huggingface.co/kakaocorp/kanana-2-30b-a3b-base-2601) |
| kanana-2-30b-a3b-mid-2601<sup>*</sup> | [🤗 HuggingFace](https://huggingface.co/kakaocorp/kanana-2-30b-a3b-mid-2601) |
| kanana-2-30b-a3b-instruct-2601 | [🤗 HuggingFace](https://huggingface.co/kakaocorp/kanana-2-30b-a3b-instruct-2601) |
| kanana-2-30b-a3b-thinking-2601 | [🤗 HuggingFace](https://huggingface.co/kakaocorp/kanana-2-30b-a3b-thinking-2601) |
<sub>
<sup>*</sup> We are releasing the <code><small>kanana-2-30b-a3b-base-2601</small></code> (prior to mid-training) checkpoint to contribute to the research community.<br>
&nbsp;&nbsp;Note: <code><small>kanana-2-30b-a3b-mid-2601</small></code> is identical to <a href="https://huggingface.co/kakaocorp/kanana-2-30b-a3b-base">kanana-2-30b-a3b-base</a>.
</sub>
</div>
<br>
## Performance
### Base model evaluation results
<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center">Metric</th>
<th align="center">Shot</th>
<th align="center">kanana-2-30b-a3b-mid-2601</th>
<th align="center">kanana-2-30b-a3b-base-2601</th>
<th align="center">kanana-1.5-32.5b-base</th>
<th align="center">Qwen3-30B-A3B-Base<sup>*</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan="7">General Tasks</td>
</tr>
<tr>
<td align="center">MMLU</td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">75.44</td>
<td align="center">74.83</td>
<td align="center">76.76</td>
<td align="center">81.14</td>
</tr>
<tr>
<td align="center">MMLU-Pro</td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">56.14</td>
<td align="center">52.61</td>
<td align="center">52.40</td>
<td align="center">61.83</td>
</tr>
<tr>
<td align="center">BBH</td>
<td align="center">acc</td>
<td align="center">3</td>
<td align="center">79.76</td>
<td align="center">76.46</td>
<td align="center">81.54</td>
<td align="center">79.97</td>
</tr>
<tr>
<td align="center">SimpleQA<sup>†</sup></td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">29.70</td>
<td align="center">29.13</td>
<td align="center">26.95</td>
<td align="center">26.47</td>
</tr>
<tr>
<td align="center" colspan="7">Mathematics Tasks</td>
</tr>
<tr>
<td align="center">MATH</td>
<td align="center">em</td>
<td align="center">4</td>
<td align="center">54.40</td>
<td align="center">48.86</td>
<td align="center">47.68</td>
<td align="center">62.58</td>
</tr>
<tr>
<td align="center">GSM8K</td>
<td align="center">em</td>
<td align="center">8</td>
<td align="center">82.71</td>
<td align="center">76.57</td>
<td align="center">85.14</td>
<td align="center">88.10</td>
</tr>
<tr>
<td align="center" colspan="7">Coding Tasks</td>
</tr>
<tr>
<td align="center">HumanEval</td>
<td align="center">pass@1</td>
<td align="center">0</td>
<td align="center">75.29</td>
<td align="center">71.34</td>
<td align="center">75.59</td>
<td align="center">53.32</td>
</tr>
<tr>
<td align="center">MBPP</td>
<td align="center">pass@1</td>
<td align="center">3</td>
<td align="center">62.39</td>
<td align="center">60.21</td>
<td align="center">65.96</td>
<td align="center">72.58</td>
</tr>
<tr>
<td align="center" colspan="7">Korean Tasks</td>
</tr>
<tr>
<td align="center">KMMLU</td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">62.15</td>
<td align="center">61.98</td>
<td align="center">61.56</td>
<td align="center">62.25</td>
</tr>
<tr>
<td align="center">KoSimpleQA<sup>†</sup></td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">49.70</td>
<td align="center">49.40</td>
<td align="center">45.70</td>
<td align="center">26.33</td>
</tr>
<tr>
<td align="center">HAE-RAE Bench (v1.0)</td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">88.73</td>
<td align="center">88.91</td>
<td align="center">90.65</td>
<td align="center">72.04</td>
</tr>
<tr>
<td align="center">MATH-Ko<sup>‡</sup></td>
<td align="center">em</td>
<td align="center">4</td>
<td align="center">54.07</td>
<td align="center">45.58</td>
<td align="center">47.42</td>
<td align="center">58.20</td>
</tr>
<tr>
<td align="center">GSM8K-Ko<sup>‡</sup></td>
<td align="center">em</td>
<td align="center">8</td>
<td align="center">77.48</td>
<td align="center">70.43</td>
<td align="center">81.43</td>
<td align="center">88.10</td>
</tr>
<tr>
<td align="center">MBPP-Ko<sup>§</sup></td>
<td align="center">pass@1</td>
<td align="center">3</td>
<td align="center">61.55</td>
<td align="center">57.29</td>
<td align="center">65.41</td>
<td align="center">66.84</td>
</tr>
<tr>
<td align="center" colspan="7">Long Context Tasks</td>
</tr>
<tr>
<td align="center">RULER-4K</td>
<td align="center">acc</td>
<td align="center">0</td>
<td align="center">93.09</td>
<td align="center">92.49</td>
<td align="center">86.39</td>
<td align="center">94.32</td>
</tr>
<tr>
<td align="center">RULER-8K</td>
<td align="center">acc</td>
<td align="center">0</td>
<td align="center">92.29</td>
<td align="center">92.14</td>
<td align="center">90.16</td>
<td align="center">92.16</td>
</tr>
<tr>
<td align="center">RULER-16K</td>
<td align="center">acc</td>
<td align="center">0</td>
<td align="center">90.73</td>
<td align="center">90.01</td>
<td align="center">85.88</td>
<td align="center">91.28</td>
</tr>
<tr>
<td align="center">RULER-32K</td>
<td align="center">acc</td>
<td align="center">0</td>
<td align="center">88.63</td>
<td align="center">87.92</td>
<td align="center">81.62</td>
<td align="center">88.32</td>
</tr>
</tbody>
</table>
<sub>
<sup>*</sup> Evaluated using an internal evaluation toolkit.<br>
<sup>†</sup> Evaluated in Multiple Choice Question Answering (MCQA) format with 10 options.<br>
<sup>‡</sup> Subsets from <a href="https://huggingface.co/datasets/HAERAE-HUB/HRM8K">HRM8K</a> (MATH, GSM8K).<br>
<sup>§</sup> Internally translated to Korean.
</sub>
<br>
### Instruct model evaluation results
<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center">Metric</th>
<th align="center">kanana-2-30b-a3b-instruct-2601</th>
<th align="center">kanana-2-30b-a3b-instruct</th>
<th align="center">kanana-1.5-32.5b-instruct</th>
<th align="center">Qwen3-30B-A3B-Instruct-2507<sup>*</sup></th>
<th align="center">Qwen3-30B-A3B<br>(non-thinking)<sup>*</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan="7">Chat</td>
</tr>
<tr>
<td align="center">MT-Bench</td>
<td align="center">judge<sup>†</sup></td>
<td align="center">8.30</td>
<td align="center">8.42</td>
<td align="center">8.23</td>
<td align="center">8.71</td>
<td align="center">8.38</td>
</tr>
<tr>
<td align="center">KoMT-Bench</td>
<td align="center">judge<sup>†</sup></td>
<td align="center">8.21</td>
<td align="center">8.24</td>
<td align="center">7.94</td>
<td align="center">8.49</td>
<td align="center">7.89</td>
</tr>
<tr>
<td align="center" colspan="7">Instruction Following</td>
</tr>
<tr>
<td align="center">IFEval</td>
<td align="center">prompt strict</td>
<td align="center">87.25</td>
<td align="center">84.47</td>
<td align="center">79.48</td>
<td align="center">82.62</td>
<td align="center">84.10</td>
</tr>
<tr>
<td align="center">IFBench</td>
<td align="center">prompt strict</td>
<td align="center">48.30</td>
<td align="center">41.84</td>
<td align="center">38.78</td>
<td align="center">30.27</td>
<td align="center">29.25</td>
</tr>
<tr>
<td align="center">Multi-IF (EN)</td>
<td align="center">acc</td>
<td align="center">77.88</td>
<td align="center">75.81</td>
<td align="center">68.51</td>
<td align="center">77.93</td>
<td align="center">81.03</td>
</tr>
<tr>
<td align="center">Multi-Challenge</td>
<td align="center">acc</td>
<td align="center">35.16</td>
<td align="center">34.80</td>
<td align="center">19.05</td>
<td align="center">41.76</td>
<td align="center">27.84</td>
</tr>
<tr>
<td align="center" colspan="7">Tool Calling</td>
</tr>
<tr>
<td align="center">BFCL-v3<br>(Live<sup>‡</sup>)</td>
<td align="center">pass@1</td>
<td align="center">76.66</td>
<td align="center">74.30</td>
<td align="center">68.74</td>
<td align="center">73.93</td>
<td align="center">69.14</td>
</tr>
<tr>
<td align="center">BFCL-v3<br>(Multi-Turn<sup>‡</sup>)</td>
<td align="center">pass@1</td>
<td align="center">38.63</td>
<td align="center">35.38</td>
<td align="center">11.38</td>
<td align="center">38.77</td>
<td align="center">11.88</td>
</tr>
<tr>
<td align="center" colspan="7">Code Generation</td>
</tr>
<tr>
<td align="center">HumanEval+</td>
<td align="center">pass@1</td>
<td align="center">81.10</td>
<td align="center">79.88</td>
<td align="center">79.88</td>
<td align="center">86.59</td>
<td align="center">87.20</td>
</tr>
<tr>
<td align="center">MBPP+</td>
<td align="center">pass@1</td>
<td align="center">73.02</td>
<td align="center">73.81</td>
<td align="center">71.96</td>
<td align="center">75.13</td>
<td align="center">75.13</td>
</tr>
<tr>
<td align="center" colspan="7">Mathematics</td>
</tr>
<tr>
<td align="center">GSM8K</td>
<td align="center">em</td>
<td align="center">93.10</td>
<td align="center">91.89</td>
<td align="center">91.58</td>
<td align="center">93.56</td>
<td align="center">93.33</td>
</tr>
<tr>
<td align="center">MATH</td>
<td align="center">acc</td>
<td align="center">88.56</td>
<td align="center">86.26</td>
<td align="center">77.92</td>
<td align="center">90.96</td>
<td align="center">87.20</td>
</tr>
<tr>
<td align="center" colspan="7">Reasoning & Knowledge</td>
</tr>
<tr>
<td align="center">MMLU</td>
<td align="center">em</td>
<td align="center">81.61</td>
<td align="center">80.80</td>
<td align="center">82.75</td>
<td align="center">87.13</td>
<td align="center">85.60</td>
</tr>
<tr>
<td align="center">KMMLU</td>
<td align="center">em</td>
<td align="center">68.26</td>
<td align="center">67.32</td>
<td align="center">65.75</td>
<td align="center">67.56</td>
<td align="center">63.49</td>
</tr>
<tr>
<td align="center">GPQA Diamond</td>
<td align="center">pass@1</td>
<td align="center">52.53</td>
<td align="center">42.93</td>
<td align="center">42.42</td>
<td align="center">54.55</td>
<td align="center">50.51</td>
</tr>
<tr>
<td align="center">HAE-RAE Bench (v1.0)</td>
<td align="center">em</td>
<td align="center">75.57</td>
<td align="center">75.57</td>
<td align="center">65.34</td>
<td align="center">53.41</td>
<td align="center">57.39</td>
</tr>
</tbody>
</table>
<sub>
<sup>*</sup> Evaluated using an internal evaluation toolkit.<br>
<sup>†</sup> Evaluated using <code><small>gpt-4o-2024-08-06</small></code> as the judge model.<br>
<sup>‡</sup> <code><small>Live</small></code> denotes the average score of 6 live benchmarks, and <code><small>Multi-Turn</small></code> denotes the average score of 4 multi-turn benchmarks.
</sub>
<br>
### Reasoning model evaluation results
<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center">Metric</th>
<th align="center">kanana-2-30b-a3b-thinking-2601</th>
<th align="center">kanana-2-30b-a3b-thinking</th>
<th align="center">Qwen3-30B-A3B-Thinking-2507<sup>*</sup></th>
<th align="center">Qwen3-30B-A3B<br>(thinking)<sup>*</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan="6">Reasoning & Knowledge</td>
</tr>
<tr>
<td align="center">MMLU-Pro</td>
<td align="center">pass@1</td>
<td align="center">74.2</td>
<td align="center">75.3</td>
<td align="center">80.8</td>
<td align="center">78.5</td>
</tr>
<tr>
<td align="center">GPQA Diamond</td>
<td align="center">pass@1</td>
<td align="center">57.8</td>
<td align="center">61.3</td>
<td align="center">70.6</td>
<td align="center">62.6</td>
</tr>
<tr>
<td align="center" colspan="6">Competition Math</td>
</tr>
<tr>
<td align="center">AIME 2025</td>
<td align="center">pass@1</td>
<td align="center">74.0</td>
<td align="center">72.7</td>
<td align="center">82.3</td>
<td align="center">70.7</td>
</tr>
<tr>
<td align="center">AIME 2024</td>
<td align="center">pass@1</td>
<td align="center">79.0</td>
<td align="center">78.3</td>
<td align="center">91.0</td>
<td align="center">82.7</td>
</tr>
<tr>
<td align="center">AIME 2024-Ko<sup>†</sup></td>
<td align="center">pass@1</td>
<td align="center">75.0</td>
<td align="center">25.3</td>
<td align="center">80.3</td>
<td align="center">72.3</td>
</tr>
<tr>
<td align="center" colspan="6">Code Generation</td>
</tr>
<tr>
<td align="center">LiveCodeBench</td>
<td align="center">pass@1</td>
<td align="center">58.8</td>
<td align="center">60.8</td>
<td align="center">68.3</td>
<td align="center">62.3</td>
</tr>
<tr>
<td align="center">LiveCodeBench-Ko<sup>‡</sup></td>
<td align="center">pass@1</td>
<td align="center">51.2</td>
<td align="center">9.4</td>
<td align="center">66.3<sup>¶</sup></td>
<td align="center">61.5<sup>¶</sup></td>
</tr>
<tr>
<td align="center" colspan="6">Instruction Following</td>
</tr>
<tr>
<td align="center">IFEval</td>
<td align="center">prompt strict</td>
<td align="center">82.2</td>
<td align="center">82.2</td>
<td align="center">87.8</td>
<td align="center">86.1</td>
</tr>
<tr>
<td align="center">IFBench</td>
<td align="center">prompt strict</td>
<td align="center">47.8</td>
<td align="center">42.3</td>
<td align="center">47.6</td>
<td align="center">36.7</td>
</tr>
<tr>
<td align="center" colspan="6">Tool Calling</td>
</tr>
<tr>
<td align="center">BFCL-v3<br>(Live<sup>§</sup>)</td>
<td align="center">pass@1</td>
<td align="center">75.9</td>
<td align="center">75.6</td>
<td align="center">82.9</td>
<td align="center">80.3</td>
</tr>
<tr>
<td align="center">BFCL-v3<br>(Multi-Turn<sup>§</sup>)</td>
<td align="center">pass@1</td>
<td align="center">43.7</td>
<td align="center">34.3</td>
<td align="center">53.6</td>
<td align="center">35.6</td>
</tr>
</tbody>
</table>
<sub>
<sup>*</sup> Evaluated using an internal evaluation toolkit.<br>
<sup>†</sup> Korean translation of AIME 2024 sourced from <a href="https://huggingface.co/datasets/amphora/MCLM">MCLM</a>.<br>
<sup>‡</sup> Internally translated to Korean.<br>
<sup>§</sup> <code><small>Live</small></code> denotes the average score of 6 live benchmarks, and <code><small>Multi-Turn</small></code> denotes the average score of 4 multi-turn benchmarks.<br>
<sup>¶</sup> Most responses were generated in English.
</sub>
<br>
## Deployment
> [!NOTE]
> For optimal results with the reasoning model, please adhere to the default parameters: `temperature=0.6`, `top_p=0.95`, `top_k=20`. **We strongly advise against greedy decoding**, as it may lead to performance degradation and infinite repetition loops.
### vLLM
[vLLM](https://github.com/vllm-project/vllm) is a fast and memory-optimized engine designed for high-performance LLM inference and serving.
For kanana-2-30b-a3b-instruct-2601,
```shell
vllm serve kakaocorp/kanana-2-30b-a3b-instruct-2601 --enable-auto-tool-choice --tool-call-parser hermes
```
For kanana-2-30b-a3b-thinking-2601,
```shell
vllm serve kakaocorp/kanana-2-30b-a3b-thinking-2601 --reasoning-parser deepseek_r1 --enable-auto-tool-choice --tool-call-parser hermes
```
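Once a server is running, it exposes an OpenAI-compatible API. The sketch below (not from the official docs; the model name matches this repo, everything else is illustrative) builds a `/v1/chat/completions` request body using the recommended sampling parameters:

```python
import json

# Illustrative request body for the OpenAI-compatible endpoint served by vLLM
# (default address http://localhost:8000). Only the payload is shown here;
# send it with any HTTP client of your choice.
payload = {
    "model": "kakaocorp/kanana-2-30b-a3b-instruct-2601",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Name three Korean dishes."},
    ],
    # Recommended defaults from the note above; avoid greedy decoding.
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,  # accepted by vLLM as an extra sampling parameter
}
print(json.dumps(payload, indent=2))
```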
### SGLang
[SGLang](https://github.com/sgl-project/sglang) is a high-efficiency framework for serving LLMs and VLMs, enabling easy deployment of OpenAI-compatible API servers.
For kanana-2-30b-a3b-instruct-2601,
```shell
python3 -m sglang.launch_server --model-path kakaocorp/kanana-2-30b-a3b-instruct-2601 --tool-call-parser qwen
```
For kanana-2-30b-a3b-thinking-2601,
```shell
python3 -m sglang.launch_server --model-path kakaocorp/kanana-2-30b-a3b-thinking-2601 --reasoning-parser deepseek-r1 --tool-call-parser qwen
```
<br>
## Processing 32K+ Length
Currently, the `config.json` uploaded to Hugging Face is configured for context lengths of up to 32,768 tokens. To process longer sequences, YaRN must be applied. Updating `config.json` with the following parameters enables YaRN and extends the supported context length to 128K tokens:
```json
"rope_scaling": {
"beta_fast": 32,
"beta_slow": 1,
"factor": 4.0,
"mscale": 1.0,
"mscale_all_dim": 1.0,
"original_max_position_embeddings": 32768,
"type": "yarn"
},
```
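The extended context follows directly from `factor * original_max_position_embeddings`; a quick arithmetic sketch (plain Python, for illustration only):

```python
# YaRN extends the context window multiplicatively:
#   max_context = factor * original_max_position_embeddings
ORIGINAL_MAX = 32768

def yarn_factor(target_context: int, original_max: int = ORIGINAL_MAX) -> float:
    """Scaling factor needed to reach a target context length."""
    return target_context / original_max

print(yarn_factor(131072))  # 4.0, as in the config above
print(yarn_factor(65536))   # 2.0, for a 65,536-token context
```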
Passing command line arguments for deployment:
- `vllm`
```shell
vllm serve ... --hf-overrides '{"max_position_embeddings": 131072, "rope_scaling": {"rope_type":"deepseek_yarn","factor":4.0,"beta_fast":32,"beta_slow":1,"mscale":1.0,"mscale_all_dim":1.0,"original_max_position_embeddings":32768}}'
```
- `sglang`
```shell
python3 -m sglang.launch_server ... --json-model-override-args '{"max_position_embeddings":131072, "rope_scaling":{"rope_type":"deepseek_yarn","factor":4.0,"beta_fast":32,"beta_slow":1,"mscale":1.0,"mscale_all_dim":1.0,"original_max_position_embeddings":32768}}'
```
> [!NOTE]
> Most leading open-source implementations of static YaRN apply a constant scaling factor, which can negatively impact performance on shorter texts. To ensure optimal performance:
> * **Enable `rope_scaling` only when necessary** for processing long contexts.
> * **Adjust the `factor` based on your specific needs** (e.g., set `factor` to 2.0 for a 65,536-token context).
<br>
## License
The model weights are released under the [Kanana License](./LICENSE).
<br>
## Citation
```
@article{,
title={Kanana-2 LLM},
author={Kanana LLM},
year={2025},
url={https://huggingface.co/collections/kakaocorp/kanana-2}
}
```
<br>
## Contact
- Kanana LLM Team Technical Support: kanana-llm@kakaocorp.com
- Business & Partnership Contact: alpha.k@kakaocorp.com

BIN assets/logo/kanana.png (new binary file, 107 KiB; content not shown)

chat_template.jinja Normal file
@@ -0,0 +1,85 @@
{%- if not add_generation_prompt is defined %}
{%- set add_generation_prompt = false %}
{%- endif %}
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index and (loop.last or reasoning_content) and reasoning_content|trim != '' %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
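The template above follows a ChatML-style layout. As a rough illustration, here is a simplified re-implementation covering only the plain system/user/assistant path (no tools, no `<think>` handling); for the real logic, use the Jinja template itself via `tokenizer.apply_chat_template`:

```python
# Simplified sketch of the plain-conversation path of the chat template above.
# This is illustrative only and omits tool calls and reasoning content.
def render_simple(messages, add_generation_prompt=True):
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = render_simple([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```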

config.json Normal file

@@ -0,0 +1,48 @@
{
"architectures": [
"DeepseekV3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128010,
"first_k_dense_replace": 1,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 6144,
"kv_lora_rank": 512,
"max_position_embeddings": 32768,
"model_type": "deepseek_v3",
"moe_intermediate_size": 768,
"moe_layer_freq": 1,
"n_group": 1,
"n_routed_experts": 128,
"n_shared_experts": 2,
"norm_topk_prob": true,
"num_attention_heads": 32,
"num_experts_per_tok": 6,
"num_hidden_layers": 48,
"num_key_value_heads": 32,
"pad_token_id": 128001,
"pretraining_tp": 1,
"q_lora_rank": null,
"qk_head_dim": 192,
"qk_nope_head_dim": 128,
"qk_rope_head_dim": 64,
"rms_norm_eps": 1e-06,
"rope_interleave": true,
"rope_scaling": null,
"rope_theta": 1000000,
"routed_scaling_factor": 2.448,
"scoring_func": "sigmoid",
"tie_word_embeddings": false,
"topk_group": 1,
"topk_method": "noaux_tc",
"transformers_version": "4.57.3",
"use_cache": true,
"v_head_dim": 128,
"vocab_size": 128256
}
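The MoE fields in the config above imply 8 experts active per token in each MoE layer (6 routed + 2 shared) out of 128 routed experts, with the first layer kept dense. A quick sketch of that reading (values copied from the config; the interpretation of `first_k_dense_replace` follows the DeepseekV3 convention):

```python
# Derive per-token expert activation from the MoE fields in config.json above.
config = {
    "n_routed_experts": 128,
    "num_experts_per_tok": 6,
    "n_shared_experts": 2,
    "num_hidden_layers": 48,
    "first_k_dense_replace": 1,  # first layer uses a dense MLP, not MoE
}

active_per_layer = config["num_experts_per_tok"] + config["n_shared_experts"]
moe_layers = config["num_hidden_layers"] - config["first_k_dense_replace"]
print(active_per_layer)  # 8 experts active per token in each MoE layer
print(moe_layers)        # 47 MoE layers
```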

configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 128000,
"eos_token_id": 128010,
"pad_token_id": 128001,
"transformers_version": "4.57.3"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:254d64c4471e0287e48f3cb30fe7b7c804906d4950112136feaf1760294414cc
size 4999552848


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:865f62c4b03a9923e7cc2fe4799be8f8219d6fd8158a1a68ff18ac5f52a43cf3
size 4997741048


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9dc3117d2173b5b32c1129305c17362705151dc3c0a30fc174f7ad445ece51c0
size 4997741912


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:19e5e4194e292dbfae7fc5722cf596f3cc018e306eb3009239dbc4d6377cfd87
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:942d69cd816f879ce7544a291c3863e8ead4904794f83238763c9e2d1b8c7fe3
size 4982800272


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:653c3e2f6aa01f35ea381076faf5d7ff413996eeae518224aeaf63b1b9556e62
size 4997209720


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a606f00c3df05f28dac9dbac247821729255d6013c22a67d28034023ea825e48
size 4997742552


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0e1e3bf22dfb05cee789f8e3764ecf1868d25822ad6e2049b1ba8acddef1fb15
size 4997742576


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dd28affdd940406b3b7fd6754427e587a93bc1eec21a3440caff715098d56805
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:937b68169926c165d5060d33b3139c2e8ad21f1b1ba563c53d2951b111459b73
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b73902cf4b200e6da81608d73c102d6ec8bee063f9bef2e52eee218224bc0169
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c214510399af1de06af9dd7f6293038708afcc15d0b1baade31144750e9b5e57
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:47710e69d103e608eb6d82676cbd55b57cc40f91772216a55130653e67d6214c
size 1384691200


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:98b6eee9ebbe4b1f056db299a5281ebeac9d9e4c054ca12acf86d7c6815ccdfb
size 1680317

tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f677910f98ff8282b4963d64444285af7c7f78dcdb688dcb4adc67c39e702d8d
size 10057650

tokenizer_config.json Normal file (2063 lines)
File diff suppressed because it is too large.