Initialize project; model provided by the ModelHub XC community

Model: kakaocorp/kanana-2-30b-a3b-base
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-04-14 16:07:01 +08:00
commit 2e2b43bdf7
23 changed files with 2891 additions and 0 deletions

63
.gitattributes vendored Normal file

@@ -0,0 +1,63 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-00001-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00007-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
model-00004-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00005-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00008-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00006-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00009-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
model-00002-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00012-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00010-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00011-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00013-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text

73
LICENSE Normal file

@@ -0,0 +1,73 @@
KANANA LICENSE AGREEMENT
Kanana Release Date: July 17, 2025
This KANANA LICENSE AGREEMENT (this “Agreement”) is made by and between you and Kakao Corp. (“KAKAO”) that governs your use of Kanana Materials that KAKAO provides to you.
By using, copying, modifying, distributing, performing, or displaying all or part of Kanana Materials, or otherwise accepting the terms and conditions of this Agreement, you agree to be bound by this Agreement. You hereby represent and warrant that (i) you are legally authorized to enter into this Agreement, and (ii) if you are entering into this Agreement on behalf of a legal entity, you have the authority to legally and validly bind such entity.
1. Definition
1.1 “Agreement” means the terms and conditions for use, copying, distribution and modification of Kanana Materials as set forth herein.
1.2 “KAKAO” means Kakao Corp.
1.3 “You” means an individual or legal entity that enters into this Agreement with KAKAO and exercises its rights hereunder or uses Kanana Materials for any purpose. If you enter into this Agreement on behalf of a legal entity, “you” shall include such entity.
1.4 “Kanana” means the basic large-scale language model, software, and algorithms distributed by KAKAO under this Agreement, including parameters (such as Model Weights and optimizer status), machine learning model codes, inference/learning/fine-tuning codes, and other related elements.
1.5 “Documentation” means the specifications, manuals, and other documentation accompanying Kanana distributed by KAKAO.
1.6 “Kanana Materials” means, collectively, Kanana and Documentation, including any portions or components thereof.
1.7 “Outputs” means information content generated by operating or otherwise using Kanana Materials.
1.8 “Derivative Works” means (i) any modifications to Kanana, (ii) any work of authorship based on Kanana, or (iii) any other designed machine learning models that either directly use the patterns of Model Weights, parameters, operations, and/or outputs or incorporate a substantial part of Kanana's performance or functional characteristics through methods including, but not limited to, transfer learning, fine-tuning, or knowledge distillation. This includes distillation methods using Kanana's intermediate data representations or a method based on the synthetic data outputs generated by Kanana; provided, however, that Outputs shall not be deemed to be Derivative Works.
1.9 “Model Weights” means a set of numerical parameter values generated during Kanana's learning process, representing the result of substantial investment and effort by KAKAO.
2. Grant of License and Use Policy
2.1 Grant of License. Subject to the terms and conditions of this Agreement, you are granted a non-exclusive, worldwide, non-transferrable, royalty-free limited license under KAKAO's intellectual property or other rights owned by KAKAO that enables you to access, download, install, copy, use, reproduce, distribute, create Derivative Works of, and make modifications to Kanana Materials.
2.2 Policy on Prohibited Use. Your use of Kanana Materials and Derivative Works must comply with applicable laws and regulations and adhere to KAKAO's Guidelines For Responsible AI (https://www.kakaocorp.com/page/responsible/detail/guidelinesForResponsibleAI), which is hereby incorporated into this Agreement.
2.3 This Agreement applies solely to Kanana-*** and shall not apply to any other models distributed by KAKAO under separate licenses. Licenses applicable to such other models shall not apply to Kanana-***.
2.4 The license terms applicable to a specific version of Kanana applies exclusively to that version and shall not extend to any other versions. Each version shall be deemed as an independent and separate work of authorship.
2.5 You may use each version of Kanana only in accordance with the license terms expressly specified for that version, and you shall not claim that the license terms applicable to one version apply to any other version.
2.6 You shall not combine different versions of Kanana that are subject to different license terms in order to circumvent any applicable license terms.
3. Redistribution
3.1 You may copy, distribute or disclose Kanana, Derivative Works, or any products or services that contain Kanana or Derivative Works; provided, however, that you shall:
(i) incorporate the compliance obligation set forth in the Policy on Prohibited Use provision of Section 2.2 in any agreement for use and distribution and notify subsequent users that such use restrictions apply;
(ii) provide any recipients of Kanana Materials or Derivative Works a copy of this Agreement;
(iii) expressly indicate in any files you have modified that it has been modified by you;
(iv) include a “Notice” text file that includes the following notice:
“Kanana is licensed in accordance with the Kanana License Agreement. Copyright © KAKAO Corp. All Rights Reserved.”; and
(v) clearly display the phrase “Powered by Kanana” on related websites, user interfaces, blog posts, introduction pages, or product documentation in a manner that is easily recognizable to users. In addition, if you use Kanana Materials or their outputs to create, train, improve, or enhance other AI models and distribute them, you must include Kanana as a prefix to the name of such AI models.
3.2 You may add your own copyright statement to your modifications of Kanana Materials and may provide additional or different license terms and conditions; provided, however, that such additional or different license terms and conditions shall not violate or conflict with any provisions of this Agreement.
4. Additional Commercial Terms
4.1 If you wish to engage in any of the following activities using Kanana Materials or any Derivative Works, you must obtain a separate commercial license expressly granted by KAKAO:
(i) Offering or (re)selling to third parties access to Kanana Materials or any Derivative Works through API, cloud platforms, or other remote access services;
(ii) Offering or (re)selling to third parties Kanana Materials or any Derivative Works in whole or in part, as part of a system integration (SI) or on-premise deployment solution; or
(iii) Offering or (re)selling to third parties Kanana Materials or any Derivative Works embedded in an on-device domain.
4.2 If, as of Kanana Release Date, the number of monthly active users of the products or services provided by you and/or your affiliates, is greater than 10 million in the preceding calendar month, you must obtain a separate commercial license expressly granted by KAKAO.
4.3 For clarity, unless your activities or conditions fall within those specified in Sections 4.1 and 4.2 above, you may use Kanana Materials or any Derivative Works for the development and operation of your own services without obtaining a commercial license from KAKAO.
4.4 The grant of any commercial license under Sections 4.1 and 4.2 shall be at KAKAO's sole discretion.
5. Outputs
KAKAO will not claim any rights to Outputs you generate using Kanana Materials. You shall be solely responsible for Outputs and the use thereof.
6. Disclaimer of Warranty
Unless required by law, Kanana Materials are provided on an “AS IS” basis, and KAKAO disclaims all warranties of any kind, both express and implied, including, without limitation, any warranties of title, non-infringement, merchantability, or fitness for a particular purpose.
7. Limitation on Liability
Unless required by law, in no event shall KAKAO be liable to you for damages, including any direct, indirect, special, consequential, incidental, and punitive damages of any character arising out of the use or inability to use Kanana Materials, Derivative Works, or Outputs, even if KAKAO has been advised of the possibility of such damages.
8. Indemnification
You shall indemnify and hold KAKAO harmless from and against any and all claims that may be filed by a third party as a result of your infringement of any third party's rights or violation of any applicable law, to the extent caused by your use or distribution of Kanana Materials, Derivative Works, or Outputs; provided, however, that the foregoing shall not apply to claims resulting from KAKAO's willful or gross negligence.
9. Intellectual Property
9.1 This Agreement does not grant you any rights to use KAKAO's trademarks, service marks, or product names. However, on a limited basis and solely for the purpose of complying with Section 3.1(v), KAKAO authorizes you to use the Kanana trademark, provided that KAKAO may require you to discontinue such use at any time if you impair the value of the Kanana trademark.
9.2 KAKAO retains ownership of Kanana Materials and Derivative Works created by KAKAO, but you will retain ownership of any Derivative Works and modifications made by you.
9.3 If you bring any legal action or proceeding against KAKAO or a third party alleging that the Kanana Materials, Derivative Works, or Outputs infringe your intellectual property rights, your rights under this Agreement shall automatically terminate as of the date such action is filed.
9.4 You acknowledge that Model Weights are a valuable asset of KAKAO. You shall not extract, copy, distribute, modify Model Weights or use them to train new models, except as expressly permitted under this Agreement.
9.5 The protections under this Agreement apply to all components of Kanana Materials (irrespective of whether it is recognized as a work of authorship), including, but not limited to, Model Weights, parameters, algorithms, or structures. You may exercise your rights in these components only to the extent expressly permitted under this Agreement.
10. Term and Termination
The term of this Agreement will commence upon your acceptance of this Agreement or access to Kanana Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. KAKAO may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of Kanana Materials and Derivative Works. Sections 5, 6, 7, 8, 10 and 11 shall survive the termination of this Agreement.
11. Governing Law and Arbitration
11.1 This Agreement will be governed and construed under the laws of the Republic of Korea, without regard to its conflicts of laws principles.
11.2 Any disputes arising out of or in connection with this Agreement shall be finally settled by arbitration in accordance with the International Arbitration Rules of the Korean Commercial Arbitration Board. The number of arbitrators shall be one. The seat, or legal place, of arbitral proceedings shall be Seoul, Republic of Korea. The language to be used in the arbitral proceedings shall be English. Either party may seek interim or provisional relief from a court of competent jurisdiction, which shall not be considered a waiver of any provision in this Section. The arbitral tribunal also has the authority to issue orders for interim or provisional relief.
12. No Waiver
KAKAO's failure or delay in exercising any of its rights under this Agreement shall not constitute a waiver of such rights.

593
README.md Normal file

@@ -0,0 +1,593 @@
---
library_name: transformers
license_name: "kanana"
license_link: LICENSE
pipeline_tag: text-generation
model_id: kakaocorp/kanana-2-30b-a3b-base
repo: kakaocorp/kanana-2-30b-a3b-base
developers: Kanana LLM
---
<p align="center">
<img src="./assets/logo/kanana.png" width="60%" alt="Kanana">
</p>
<p align="center">
🤗 <a href="https://huggingface.co/collections/kakaocorp/kanana-2">Kanana-2 Models</a> &nbsp | &nbsp
📕 <a href="https://tech.kakao.com/posts/804">Kanana-2 Blog</a> &nbsp
</p>
<br><br>
# Kanana-2 Highlights
**Kanana-2**, the latest open-source evolution of the Kanana model family, is designed specifically for **Agentic AI**, presenting substantial enhancements in **tool calling, complex instruction following, and logical reasoning**. This new version adopts a cutting-edge architecture featuring MLA (Multi-head Latent Attention) and MoE (Mixture of Experts). These innovations allow the model to utilize significantly fewer active parameters compared to the previous 32.5B model while delivering superior performance and ensuring high throughput. Furthermore, the model **natively supports context lengths of up to 32,768 tokens**, enabling it to maintain coherence when handling extensive documents or long-context interactions.
In addition, Kanana-2 now supports 6 languages, covering **Korean, English, Japanese, Chinese, Thai, and Vietnamese**. To support this expansion, Kanana-2 utilizes a newly trained tokenizer that demonstrates superior tokenization efficiency across these languages, including an improvement of over 30% specifically for Korean. Finally, to address advanced problem-solving needs, Kanana-2 introduces **reasoning models** capable of deliberate thinking and reasoning, achieving significantly enhanced performance in downstream tasks, especially when tackling hard problems.
> [!NOTE]
> No Kakao user data was used for either pre-training or post-training.
<br>
## Model Overview
The **kanana-2-30b-a3b** series has the following features:
- Total Parameters: 30B
- Activated Parameters: 3B
- Number of Layers: 48
- Number of Dense Layers: 1
- Number of Experts: 128
- Number of Selected Experts: 6
- Number of Shared Experts: 2
- Attention Mechanism: MLA
- Vocabulary Size: 128256
- Context Length: 32,768
<br>
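The numbers above imply roughly 30B total and 3B active parameters. As an illustrative back-of-the-envelope sketch (not official figures: layer norms and other small tensors are ignored, and the MLA projection shapes are assumptions read off the repository's `config.json`), the totals can be approximated:

```python
# Back-of-the-envelope parameter estimate from the Model Overview values.
# Illustrative only: small tensors are ignored and MLA shapes are assumed.

hidden = 2048
n_layers = 48
n_dense = 1                  # dense (non-MoE) layers
moe_inter = 768              # per-expert FFN width
dense_inter = 6144           # dense-layer FFN width
n_experts = 128
active_experts = 6 + 2       # selected + shared experts
vocab = 128256

# A SwiGLU FFN has gate/up/down projections -> 3 weight matrices.
expert_params = 3 * hidden * moe_inter            # ~4.7M per expert
moe_layers = n_layers - n_dense

total_experts = moe_layers * n_experts * expert_params
active_ffn = (moe_layers * active_experts * expert_params
              + n_dense * 3 * hidden * dense_inter)

# Rough MLA attention cost per layer (assumed projection shapes):
q_proj = hidden * 32 * 192            # 32 heads, qk_head_dim 192
kv_down = hidden * (512 + 64)         # kv_lora_rank + rope key
kv_up = 512 * 32 * (128 + 128)        # decompressed keys/values
o_proj = 32 * 128 * hidden            # v_head_dim 128
attention = n_layers * (q_proj + kv_down + kv_up + o_proj)

print(f"routed experts:     ~{total_experts / 1e9:.1f}B")
print(f"active (no embed.): ~{(active_ffn + attention) / 1e9:.1f}B")
```

The routed-expert pool alone accounts for most of the ~30B total, while a forward pass touches only 8 experts per MoE layer, which is what keeps the active share near the stated 3B.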
## Model Downloads
<div align="left">
| **Model** | **Download** |
| :------------: | :------------: |
| kanana-2-30b-a3b-base | [🤗 HuggingFace](https://huggingface.co/kakaocorp/kanana-2-30b-a3b-base) |
| kanana-2-30b-a3b-instruct | [🤗 HuggingFace](https://huggingface.co/kakaocorp/kanana-2-30b-a3b-instruct) |
| kanana-2-30b-a3b-thinking | [🤗 HuggingFace](https://huggingface.co/kakaocorp/kanana-2-30b-a3b-thinking) |
</div>
<br>
## Performance
### Base model evaluation results
<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center">Metric</th>
<th align="center">Shot</th>
<th align="center">kanana-2-30b-a3b-base</th>
<th align="center">kanana-1.5-32.5b-base</th>
<th align="center">Qwen3-30B-A3B-Base<sup>*</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan="6">General Tasks</td>
</tr>
<tr>
<td align="center">MMLU</td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">75.44</td>
<td align="center">76.76</td>
<td align="center">81.14</td>
</tr>
<tr>
<td align="center">MMLU-Pro</td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">56.14</td>
<td align="center">52.40</td>
<td align="center">61.83</td>
</tr>
<tr>
<td align="center">BBH</td>
<td align="center">acc</td>
<td align="center">3</td>
<td align="center">79.76</td>
<td align="center">81.54</td>
<td align="center">79.97</td>
</tr>
<tr>
<td align="center">SimpleQA<sup>†</sup></td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">29.70</td>
<td align="center">26.95</td>
<td align="center">26.47</td>
</tr>
<tr>
<td align="center" colspan="6">Mathematics Tasks</td>
</tr>
<tr>
<td align="center">MATH</td>
<td align="center">em</td>
<td align="center">4</td>
<td align="center">54.40</td>
<td align="center">47.68</td>
<td align="center">62.58</td>
</tr>
<tr>
<td align="center">GSM8K</td>
<td align="center">em</td>
<td align="center">8</td>
<td align="center">82.71</td>
<td align="center">85.14</td>
<td align="center">88.10</td>
</tr>
<tr>
<td align="center" colspan="6">Coding Tasks</td>
</tr>
<tr>
<td align="center">HumanEval</td>
<td align="center">pass@1</td>
<td align="center">0</td>
<td align="center">75.29</td>
<td align="center">75.59</td>
<td align="center">53.32</td>
</tr>
<tr>
<td align="center">MBPP</td>
<td align="center">pass@1</td>
<td align="center">3</td>
<td align="center">62.39</td>
<td align="center">65.96</td>
<td align="center">72.58</td>
</tr>
<tr>
<td align="center" colspan="6">Korean Tasks</td>
</tr>
<tr>
<td align="center">KMMLU</td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">62.15</td>
<td align="center">61.56</td>
<td align="center">62.25</td>
</tr>
<tr>
<td align="center">KoSimpleQA<sup>†</sup></td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">49.40</td>
<td align="center">45.70</td>
<td align="center">26.33</td>
</tr>
<tr>
<td align="center">HAE-RAE Bench (v1.0)</td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">88.73</td>
<td align="center">90.65</td>
<td align="center">72.04</td>
</tr>
<tr>
<td align="center">MATH-Ko<sup>‡</sup></td>
<td align="center">em</td>
<td align="center">4</td>
<td align="center">54.07</td>
<td align="center">47.42</td>
<td align="center">58.20</td>
</tr>
<tr>
<td align="center">GSM8K-Ko<sup>‡</sup></td>
<td align="center">em</td>
<td align="center">8</td>
<td align="center">77.48</td>
<td align="center">81.43</td>
<td align="center">88.10</td>
</tr>
<tr>
<td align="center">MBPP-Ko<sup>§</sup></td>
<td align="center">pass@1</td>
<td align="center">3</td>
<td align="center">61.55</td>
<td align="center">65.41</td>
<td align="center">66.84</td>
</tr>
<tr>
<td align="center" colspan="6">Long Context Tasks</td>
</tr>
<tr>
<td align="center">RULER-4K</td>
<td align="center">acc</td>
<td align="center">0</td>
<td align="center">93.09</td>
<td align="center">86.39</td>
<td align="center">94.32</td>
</tr>
<tr>
<td align="center">RULER-8K</td>
<td align="center">acc</td>
<td align="center">0</td>
<td align="center">92.29</td>
<td align="center">90.16</td>
<td align="center">92.16</td>
</tr>
<tr>
<td align="center">RULER-16K</td>
<td align="center">acc</td>
<td align="center">0</td>
<td align="center">90.73</td>
<td align="center">85.88</td>
<td align="center">91.28</td>
</tr>
<tr>
<td align="center">RULER-32K</td>
<td align="center">acc</td>
<td align="center">0</td>
<td align="center">88.63</td>
<td align="center">81.62</td>
<td align="center">88.32</td>
</tr>
</tbody>
</table>
<small>
<sup>*</sup> Evaluated using an internal evaluation toolkit.<br>
<sup>†</sup> Evaluated in Multiple Choice Question Answering (MCQA) format with 10 options.<br>
<sup>‡</sup> Subsets from HRM8K (MATH, GSM8K).<br>
<sup>§</sup> Internally translated to Korean.
</small>
<br>
### Instruct model evaluation results
<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center">Metric</th>
<th align="center">kanana-2-30b-a3b-instruct</th>
<th align="center">kanana-1.5-32.5b-instruct</th>
<th align="center">Qwen3-30B-A3B-Instruct-2507<sup>*</sup></th>
<th align="center">Qwen3-30B-A3B<br>(non-thinking)<sup>*</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan="6">Chat</td>
</tr>
<tr>
<td align="center">MT-Bench</td>
<td align="center">judge<sup>†</sup></td>
<td align="center">8.42</td>
<td align="center">8.23</td>
<td align="center">8.71</td>
<td align="center">8.38</td>
</tr>
<tr>
<td align="center">KoMT-Bench</td>
<td align="center">judge<sup>†</sup></td>
<td align="center">8.24</td>
<td align="center">7.94</td>
<td align="center">8.49</td>
<td align="center">7.89</td>
</tr>
<tr>
<td align="center" colspan="6">Instruction Following</td>
</tr>
<tr>
<td align="center">IFEval</td>
<td align="center">prompt strict</td>
<td align="center">84.47</td>
<td align="center">79.48</td>
<td align="center">82.62</td>
<td align="center">84.10</td>
</tr>
<tr>
<td align="center">IFBench</td>
<td align="center">prompt strict</td>
<td align="center">41.84</td>
<td align="center">38.78</td>
<td align="center">30.27</td>
<td align="center">29.25</td>
</tr>
<tr>
<td align="center">Multi-IF (EN)</td>
<td align="center">acc</td>
<td align="center">75.81</td>
<td align="center">68.51</td>
<td align="center">77.93</td>
<td align="center">81.03</td>
</tr>
<tr>
<td align="center">Multi-Challenge</td>
<td align="center">acc</td>
<td align="center">34.80</td>
<td align="center">19.05</td>
<td align="center">41.76</td>
<td align="center">27.84</td>
</tr>
<tr>
<td align="center" colspan="6">Tool Calling</td>
</tr>
<tr>
<td align="center">BFCL-v3<br>(Live<sup>‡</sup>)</td>
<td align="center">pass@1</td>
<td align="center">74.30</td>
<td align="center">68.74</td>
<td align="center">73.93</td>
<td align="center">69.14</td>
</tr>
<tr>
<td align="center">BFCL-v3<br>(Multi-Turn<sup>‡</sup>)</td>
<td align="center">pass@1</td>
<td align="center">35.38</td>
<td align="center">11.38</td>
<td align="center">38.77</td>
<td align="center">11.88</td>
</tr>
<tr>
<td align="center" colspan="6">Code Generation</td>
</tr>
<tr>
<td align="center">HumanEval+</td>
<td align="center">pass@1</td>
<td align="center">79.88</td>
<td align="center">79.88</td>
<td align="center">86.59</td>
<td align="center">87.20</td>
</tr>
<tr>
<td align="center">MBPP+</td>
<td align="center">pass@1</td>
<td align="center">73.81</td>
<td align="center">71.96</td>
<td align="center">75.13</td>
<td align="center">75.13</td>
</tr>
<tr>
<td align="center" colspan="6">Mathematics</td>
</tr>
<tr>
<td align="center">GSM8K</td>
<td align="center">em</td>
<td align="center">91.89</td>
<td align="center">91.58</td>
<td align="center">93.56</td>
<td align="center">93.33</td>
</tr>
<tr>
<td align="center">MATH</td>
<td align="center">acc</td>
<td align="center">86.26</td>
<td align="center">77.92</td>
<td align="center">90.96</td>
<td align="center">87.20</td>
</tr>
<tr>
<td align="center" colspan="6">Reasoning & Knowledge</td>
</tr>
<tr>
<td align="center">MMLU</td>
<td align="center">em</td>
<td align="center">80.80</td>
<td align="center">82.75</td>
<td align="center">87.13</td>
<td align="center">85.60</td>
</tr>
<tr>
<td align="center">KMMLU</td>
<td align="center">em</td>
<td align="center">67.32</td>
<td align="center">65.75</td>
<td align="center">67.56</td>
<td align="center">63.49</td>
</tr>
<tr>
<td align="center">GPQA Diamond</td>
<td align="center">pass@1</td>
<td align="center">42.93</td>
<td align="center">42.42</td>
<td align="center">54.55</td>
<td align="center">50.51</td>
</tr>
<tr>
<td align="center">HAE-RAE Bench (v1.0)</td>
<td align="center">em</td>
<td align="center">75.57</td>
<td align="center">65.34</td>
<td align="center">53.41</td>
<td align="center">57.39</td>
</tr>
</tbody>
</table>
<small>
<sup>*</sup> Evaluated using an internal evaluation toolkit.<br>
<sup>†</sup> Evaluated using <code><small>gpt-4o-2024-08-06</small></code> as the judge model.<br>
<sup>‡</sup> <code><small>Live</small></code> denotes the average score of 6 live benchmarks, and <code><small>Multi-Turn</small></code> denotes the average score of 4 multi-turn benchmarks.
</small>
<br>
### Reasoning model evaluation results
<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center">Metric</th>
<th align="center">kanana-2-30b-a3b-thinking</th>
<th align="center">Qwen3-30B-A3B-Thinking-2507<sup>*</sup></th>
<th align="center">Qwen3-30B-A3B<br>(thinking)<sup>*</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan="5">Reasoning & Knowledge</td>
</tr>
<tr>
<td align="center">MMLU-Pro</td>
<td align="center">pass@1</td>
<td align="center">75.3</td>
<td align="center">80.8</td>
<td align="center">78.5</td>
</tr>
<tr>
<td align="center">GPQA Diamond</td>
<td align="center">pass@1</td>
<td align="center">61.3</td>
<td align="center">70.6</td>
<td align="center">62.6</td>
</tr>
<tr>
<td align="center" colspan="5">Competition Math</td>
</tr>
<tr>
<td align="center">AIME 2025</td>
<td align="center">pass@1</td>
<td align="center">72.7</td>
<td align="center">82.3</td>
<td align="center">70.7</td>
</tr>
<tr>
<td align="center">AIME 2024</td>
<td align="center">pass@1</td>
<td align="center">78.3</td>
<td align="center">91.0</td>
<td align="center">82.7</td>
</tr>
<tr>
<td align="center" colspan="5">Code Generation</td>
</tr>
<tr>
<td align="center">LiveCodeBench</td>
<td align="center">pass@1</td>
<td align="center">60.8</td>
<td align="center">68.3</td>
<td align="center">62.3</td>
</tr>
<tr>
<td align="center" colspan="5">Instruction Following</td>
</tr>
<tr>
<td align="center">IFEval</td>
<td align="center">prompt strict</td>
<td align="center">82.2</td>
<td align="center">87.8</td>
<td align="center">86.1</td>
</tr>
<tr>
<td align="center">IFBench</td>
<td align="center">prompt strict</td>
<td align="center">42.3</td>
<td align="center">47.6</td>
<td align="center">36.7</td>
</tr>
<tr>
<td align="center" colspan="5">Tool Calling</td>
</tr>
<tr>
<td align="center">BFCL-v3<br>(Live<sup>†</sup>)</td>
<td align="center">pass@1</td>
<td align="center">75.6</td>
<td align="center">82.9</td>
<td align="center">80.3</td>
</tr>
<tr>
<td align="center">BFCL-v3<br>(Multi-Turn<sup>†</sup>)</td>
<td align="center">pass@1</td>
<td align="center">34.3</td>
<td align="center">53.6</td>
<td align="center">35.6</td>
</tr>
</tbody>
</table>
<small>
<sup>*</sup> Evaluated using an internal evaluation toolkit.<br>
<sup>†</sup> <code><small>Live</small></code> denotes the average score of 6 live benchmarks, and <code><small>Multi-Turn</small></code> denotes the average score of 4 multi-turn benchmarks.
</small>
<br>
## Quickstart
Install the `transformers` library, version `4.51.0` or later.
The following code snippet illustrates how to use the model to generate content from a given input.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "kakaocorp/kanana-2-30b-a3b-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="bfloat16",
    device_map="auto",
)

prompt = "Kakao is a leading company in South Korea, and it is known for"
input_ids = tokenizer(
    [prompt],
    return_tensors="pt",
)["input_ids"].to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=32,
        do_sample=False,
    )

decoded = tokenizer.decode(output[0], skip_special_tokens=True)
print(decoded)
```
<br>
## Processing 32K+ Length
Currently, the `config.json` uploaded to HuggingFace is configured for token lengths of 32,768 or less. To process tokens beyond this length, YaRN must be applied. By updating the `config.json` with the following parameters, you can apply YaRN to handle token sequences up to 128K in length:
```json
"rope_scaling": {
    "beta_fast": 32,
    "beta_slow": 1,
    "factor": 4.0,
    "mscale": 1.0,
    "mscale_all_dim": 1.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
}
```
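A minimal sketch of applying that edit programmatically, assuming a locally downloaded checkpoint. The `apply_yarn` helper, the local path, and the choice to also raise `max_position_embeddings` to `factor × original_max_position_embeddings` are this sketch's own conventions, not part of the official instructions:

```python
import json
from pathlib import Path

# Hypothetical local path to a downloaded checkpoint; adjust as needed.
config_path = Path("kanana-2-30b-a3b-base/config.json")

YARN_ROPE_SCALING = {
    "beta_fast": 32,
    "beta_slow": 1,
    "factor": 4.0,  # 4.0 x 32768 = 131072 tokens (~128K)
    "mscale": 1.0,
    "mscale_all_dim": 1.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}

def apply_yarn(config: dict) -> dict:
    """Return a copy of the config with YaRN scaling enabled."""
    patched = dict(config)
    patched["rope_scaling"] = YARN_ROPE_SCALING
    # Assumption: also extend the declared context window accordingly.
    patched["max_position_embeddings"] = int(
        YARN_ROPE_SCALING["factor"]
        * YARN_ROPE_SCALING["original_max_position_embeddings"]
    )
    return patched

if config_path.exists():
    config = json.loads(config_path.read_text())
    config_path.write_text(json.dumps(apply_yarn(config), indent=2))
```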
<br>
## License
The model weights are released under the [Kanana License](./LICENSE).
<br>
## Citation
```
@article{,
title={Kanana-2 LLM},
author={Kanana LLM},
year={2025},
url={https://huggingface.co/collections/kakaocorp/kanana-2}
}
```
<br>
## Contact
- Kanana LLM Team Technical Support: kanana-llm@kakaocorp.com
- Business & Partnership Contact: alpha.k@kakaocorp.com

BIN
assets/logo/kanana.png Normal file

Binary file not shown.


47
config.json Normal file

@@ -0,0 +1,47 @@
{
"architectures": [
"DeepseekV3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128001,
"first_k_dense_replace": 1,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 6144,
"kv_lora_rank": 512,
"max_position_embeddings": 32768,
"model_type": "deepseek_v3",
"moe_intermediate_size": 768,
"moe_layer_freq": 1,
"n_group": 1,
"n_routed_experts": 128,
"n_shared_experts": 2,
"norm_topk_prob": true,
"num_attention_heads": 32,
"num_experts_per_tok": 6,
"num_hidden_layers": 48,
"num_key_value_heads": 32,
"pretraining_tp": 1,
"q_lora_rank": null,
"qk_head_dim": 192,
"qk_nope_head_dim": 128,
"qk_rope_head_dim": 64,
"rms_norm_eps": 1e-06,
"rope_interleave": true,
"rope_scaling": null,
"rope_theta": 1000000,
"routed_scaling_factor": 2.448,
"scoring_func": "sigmoid",
"tie_word_embeddings": false,
"topk_group": 1,
"topk_method": "noaux_tc",
"transformers_version": "4.57.3",
"use_cache": true,
"v_head_dim": 128,
"vocab_size": 128256
}
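The routing fields above (`scoring_func: "sigmoid"`, `num_experts_per_tok: 6`, `norm_topk_prob: true`, `routed_scaling_factor: 2.448`) can be illustrated with a simplified sketch of the expert-selection step. This is a stand-in for intuition only: it omits the `noaux_tc` bias correction, expert grouping, and the shared experts, and does not reproduce the model's actual implementation.

```python
import numpy as np

def route_tokens(logits: np.ndarray, top_k: int = 6,
                 routed_scaling_factor: float = 2.448):
    """Pick top-k experts per token from raw router logits.

    logits: (n_tokens, n_experts) router outputs.
    Returns (indices, weights); weights are renormalized over the
    selected experts (norm_topk_prob) and then rescaled.
    """
    scores = 1.0 / (1.0 + np.exp(-logits))          # sigmoid scoring
    idx = np.argsort(-scores, axis=-1)[:, :top_k]   # top-k expert ids
    top = np.take_along_axis(scores, idx, axis=-1)
    top = top / top.sum(axis=-1, keepdims=True)     # norm_topk_prob
    return idx, top * routed_scaling_factor

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 128))                  # 4 tokens, 128 experts
idx, weights = route_tokens(logits)
print(idx.shape, weights.shape)                     # (4, 6) (4, 6)
```

Each token thus contributes compute to only 6 of the 128 routed experts (plus the 2 shared experts), which is what keeps the active-parameter count low relative to the 30B total.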

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 128000,
"eos_token_id": 128001,
"transformers_version": "4.57.3"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5ffb2b042f2bdc3fe3fb963097aee0645f36b7641784b51476853d52b7da68cb
size 4999552848


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c1351b9374dbb076f9494df156a5c515b9b08b4f8616d8adb6b12b86a4fd6a0d
size 4997741048


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1f345965f5372f689e06f5660dc7c1aed47551ae69797b8cfdfbd80c7aeb6527
size 4997741912


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e30f6d93e700e0b6bb9a53e6c3a33fa2d9dfa3d163d63883b2c59ff0639c1a3e
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f8faaf72fb11920c9f96528c94dcacf6b26bbe1354ab75ed6e0e1e09054b82cd
size 4982800272


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:420c8085945656a9dbe0b6c345c54b8eadd71aaaf170fadade6d41c01f7b8ff5
size 4997209720


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1020b7e03ed7217ec28f80ad5b32d212bd9ecd09cac8a82cc09cbf190e712aec
size 4997742552


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:665fadb07db336112908e8d5879f14b4265c587212adbc94b0663e5d4b2f4111
size 4997742576


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d8f022c0f58dce8e22fe6399c31a29726b9e9c3ee70e5160c1565c5edf76e60
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2bf89811918cea6f31fdcf982829c8513094e60ed370404b1a9f43027ee09a79
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:60cd9e235ec365a706f7fdbb08314416c943e2931eac35380b10d485019b3bce
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:043f1871c4fd598d94e5669ecb2e96c4167d3abeb7b6bcf48d07c1ea716d4065
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7bd0884c4c48015220e352a8786305c2d74237ffd5b798666995891cece295f8
size 1384691200


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:98b6eee9ebbe4b1f056db299a5281ebeac9d9e4c054ca12acf86d7c6815ccdfb
size 1680317

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f677910f98ff8282b4963d64444285af7c7f78dcdb688dcb4adc67c39e702d8d
size 10057650

2063
tokenizer_config.json Normal file

File diff suppressed because it is too large