Initialize project; model provided by the ModelHub XC community

Model: kakaocorp/kanana-2-30b-a3b-base
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-04-14 16:07:01 +08:00
commit 2e2b43bdf7
23 changed files with 2891 additions and 0 deletions

63
.gitattributes vendored Normal file

@@ -0,0 +1,63 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-00001-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00007-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
model-00004-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00005-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00008-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00006-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00009-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
model-00002-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00012-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00010-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00011-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text
model-00013-of-00013.safetensors filter=lfs diff=lfs merge=lfs -text

73
LICENSE Normal file

@@ -0,0 +1,73 @@
KANANA LICENSE AGREEMENT
Kanana Release Date: July 17, 2025
This KANANA LICENSE AGREEMENT (this “Agreement”) is made by and between you and Kakao Corp. (“KAKAO”) that governs your use of Kanana Materials that KAKAO provides to you.
By using, copying, modifying, distributing, performing, or displaying all or part of Kanana Materials, or otherwise accepting the terms and conditions of this Agreement, you agree to be bound by this Agreement. You hereby represent and warrant that (i) you are legally authorized to enter into this Agreement, and (ii) if you are entering into this Agreement on behalf of a legal entity, you have the authority to legally and validly bind such entity.
1. Definition
1.1 “Agreement” means the terms and conditions for use, copying, distribution and modification of Kanana Materials as set forth herein.
1.2 “KAKAO” means Kakao Corp.
1.3 “You” means an individual or legal entity that enters into this Agreement with KAKAO and exercises its rights hereunder or uses Kanana Materials for any purpose. If you enter into this Agreement on behalf of a legal entity, “you” shall include such entity.
1.4 “Kanana” means the basic large-scale language model, software, and algorithms distributed by KAKAO under this Agreement, including parameters (such as Model Weights and optimizer status), machine learning model codes, inference/learning/fine-tuning codes, and other related elements.
1.5 “Documentation” means the specifications, manuals, and other documentation accompanying Kanana distributed by KAKAO.
1.6 “Kanana Materials” means, collectively, Kanana and Documentation, including any portions or components thereof.
1.7 “Outputs” means information content generated by operating or otherwise using Kanana Materials.
1.8 “Derivative Works” means (i) any modifications to Kanana, (ii) any work of authorship based on Kanana, or (iii) any other designed machine learning models that either directly use the patterns of Model Weights, parameters, operations, and/or outputs or incorporate a substantial part of Kanana's performance or functional characteristics through methods including, but not limited to, transfer learning, fine-tuning, or knowledge distillation. This includes distillation methods using Kanana's intermediate data representations or a method based on the synthetic data outputs generated by Kanana; provided, however, that Outputs shall not be deemed to be Derivative Works.
1.9 “Model Weights” means a set of numerical parameter values generated during Kanana's learning process, representing the result of substantial investment and effort by KAKAO.
2. Grant of License and Use Policy
2.1 Grant of License. Subject to the terms and conditions of this Agreement, you are granted a non-exclusive, worldwide, non-transferrable, royalty-free limited license under KAKAO's intellectual property or other rights owned by KAKAO that enables you to access, download, install, copy, use, reproduce, distribute, create Derivative Works of, and make modifications to Kanana Materials.
2.2 Policy on Prohibited Use. Your use of Kanana Materials and Derivative Works must comply with applicable laws and regulations and adhere to KAKAO's Guidelines For Responsible AI (https://www.kakaocorp.com/page/responsible/detail/guidelinesForResponsibleAI), which is hereby incorporated into this Agreement.
2.3 This Agreement applies solely to Kanana-*** and shall not apply to any other models distributed by KAKAO under separate licenses. Licenses applicable to such other models shall not apply to Kanana-***.
2.4 The license terms applicable to a specific version of Kanana applies exclusively to that version and shall not extend to any other versions. Each version shall be deemed as an independent and separate work of authorship.
2.5 You may use each version of Kanana only in accordance with the license terms expressly specified for that version, and you shall not claim that the license terms applicable to one version apply to any other version.
2.6 You shall not combine different versions of Kanana that are subject to different license terms in order to circumvent any applicable license terms.
3. Redistribution
3.1 You may copy, distribute or disclose Kanana, Derivative Works, or any products or services that contain Kanana or Derivative Works; provided, however, that you shall:
(i) incorporate the compliance obligation set forth in the Policy on Prohibited Use provision of Section 2.2 in any agreement for use and distribution and notify subsequent users that such use restrictions apply;
(ii) provide any recipients of Kanana Materials or Derivative Works a copy of this Agreement;
(iii) expressly indicate in any files you have modified that it has been modified by you;
(iv) include a “Notice” text file that includes the following notice:
“Kanana is licensed in accordance with the Kanana License Agreement. Copyright © KAKAO Corp. All Rights Reserved.”; and
(v) clearly display the phrase “Powered by Kanana” on related websites, user interfaces, blog posts, introduction pages, or product documentation in a manner that is easily recognizable to users. In addition, if you use Kanana Materials or their outputs to create, train, improve, or enhance other AI models and distribute them, you must include Kanana as a prefix to the name of such AI models.
3.2 You may add your own copyright statement to your modifications of Kanana Materials and may provide additional or different license terms and conditions; provided, however, that such additional or different license terms and conditions shall not violate or conflict with any provisions of this Agreement.
4. Additional Commercial Terms
4.1 If you wish to engage in any of the following activities using Kanana Materials or any Derivative Works, you must obtain a separate commercial license expressly granted by KAKAO:
(i) Offering or (re)selling to third parties access to Kanana Materials or any Derivative Works through API, cloud platforms, or other remote access services;
(ii) Offering or (re)selling to third parties Kanana Materials or any Derivative Works in whole or in part, as part of a system integration (SI) or on-premise deployment solution; or
(iii) Offering or (re)selling to third parties Kanana Materials or any Derivative Works embedded in an on-device domain.
4.2 If, as of Kanana Release Date, the number of monthly active users of the products or services provided by you and/or your affiliates, is greater than 10 million in the preceding calendar month, you must obtain a separate commercial license expressly granted by KAKAO.
4.3 For clarity, unless your activities or conditions fall within those specified in Sections 4.1 and 4.2 above, you may use Kanana Materials or any Derivative Works for the development and operation of your own services without obtaining a commercial license from KAKAO.
4.4 The grant of any commercial license under Sections 4.1 and 4.2 shall be at KAKAO's sole discretion.
5. Outputs
KAKAO will not claim any rights to Outputs you generate using Kanana Materials. You shall be solely responsible for Outputs and the use thereof.
6. Disclaimer of Warranty
Unless required by law, Kanana Materials are provided on an “AS IS” basis, and KAKAO disclaims all warranties of any kind, both express and implied, including, without limitation, any warranties of title, non-infringement, merchantability, or fitness for a particular purpose.
7. Limitation on Liability
Unless required by law, in no event shall KAKAO be liable to you for damages, including any direct, indirect, special, consequential, incidental, and punitive damages of any character arising out of the use or inability to use Kanana Materials, Derivative Works, or Outputs, even if KAKAO has been advised of the possibility of such damages.
8. Indemnification
You shall indemnify and hold KAKAO harmless from and against any and all claims that may be filed by a third party as a result of your infringement of any third party's rights or violation of any applicable law, to the extent caused by your use or distribution of Kanana Materials, Derivative Works, or Outputs; provided, however, that the foregoing shall not apply to claims resulting from KAKAO's willful or gross negligence.
9. Intellectual Property
9.1 This Agreement does not grant you any rights to use KAKAO's trademarks, service marks, or product names. However, on a limited basis and solely for the purpose of complying with Section 3.1(v), KAKAO authorizes you to use the Kanana trademark, provided that KAKAO may require you to discontinue such use at any time if you impair the value of the Kanana trademark.
9.2 KAKAO retains ownership of Kanana Materials and Derivative Works created by KAKAO, but you will retain ownership of any Derivative Works and modifications made by you.
9.3 If you bring any legal action or proceeding against KAKAO or a third party alleging that the Kanana Materials, Derivative Works, or Outputs infringe your intellectual property rights, your rights under this Agreement shall automatically terminate as of the date such action is filed.
9.4 You acknowledge that Model Weights are a valuable asset of KAKAO. You shall not extract, copy, distribute, modify Model Weights or use them to train new models, except as expressly permitted under this Agreement.
9.5 The protections under this Agreement apply to all components of Kanana Materials (irrespective of whether it is recognized as a work of authorship), including, but not limited to, Model Weights, parameters, algorithms, or structures. You may exercise your rights in these components only to the extent expressly permitted under this Agreement.
10. Term and Termination
The term of this Agreement will commence upon your acceptance of this Agreement or access to Kanana Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. KAKAO may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of Kanana Materials and Derivative Works. Sections 5, 6, 7, 8, 10 and 11 shall survive the termination of this Agreement.
11. Governing Law and Arbitration
11.1 This Agreement will be governed and construed under the laws of the Republic of Korea, without regard to its conflicts of laws principles.
11.2 Any disputes arising out of or in connection with this Agreement shall be finally settled by arbitration in accordance with the International Arbitration Rules of the Korean Commercial Arbitration Board. The number of arbitrators shall be one. The seat, or legal place, of arbitral proceedings shall be Seoul, Republic of Korea. The language to be used in the arbitral proceedings shall be English. Either party may seek interim or provisional relief from a court of competent jurisdiction, which shall not be considered a waiver of any provision in this Section. The arbitral tribunal also has the authority to issue orders for interim or provisional relief.
12. No Waiver
KAKAO's failure or delay in exercising any of its rights under this Agreement shall not constitute a waiver of such rights.

593
README.md Normal file

@@ -0,0 +1,593 @@
---
library_name: transformers
license_name: "kanana"
license_link: LICENSE
pipeline_tag: text-generation
model_id: kakaocorp/kanana-2-30b-a3b-base
repo: kakaocorp/kanana-2-30b-a3b-base
developers: Kanana LLM
---
<p align="center">
<img src="./assets/logo/kanana.png" width="60%" alt="Kanana">
</p>
<p align="center">
🤗 <a href="https://huggingface.co/collections/kakaocorp/kanana-2">Kanana-2 Models</a> &nbsp | &nbsp
📕 <a href="https://tech.kakao.com/posts/804">Kanana-2 Blog</a> &nbsp
</p>
<br><br>
# Kanana-2 Highlights
**Kanana-2**, the latest open-source evolution of the Kanana model family, is designed specifically for **Agentic AI**, presenting substantial enhancements in **tool calling, complex instruction following, and logical reasoning**. This new version adopts a cutting-edge architecture featuring MLA (Multi-head Latent Attention) and MoE (Mixture of Experts). These innovations allow the model to utilize significantly fewer active parameters compared to the previous 32.5B model while delivering superior performance and ensuring high throughput. Furthermore, the model **natively supports context lengths of up to 32,768 tokens**, enabling it to maintain coherence when handling extensive documents or long-context interactions.
In addition, Kanana-2 now supports 6 languages, covering **Korean, English, Japanese, Chinese, Thai, and Vietnamese**. To support this expansion, Kanana-2 utilizes a newly trained tokenizer that demonstrates superior tokenization efficiency across these languages, including an improvement of over 30% specifically for Korean. Finally, to address advanced problem-solving needs, Kanana-2 introduces **reasoning models** capable of deliberate thinking and reasoning, achieving significantly enhanced performance in downstream tasks, especially when tackling hard problems.
> [!NOTE]
> No Kakao user data was used for either pre-training or post-training.
<br>
## Model Overview
The **kanana-2-30b-a3b** series has the following features:
- Total Parameters: 30B
- Activated Parameters: 3B
- Number of Layers: 48
- Number of Dense Layers: 1
- Number of Experts: 128
- Number of Selected Experts: 6
- Number of Shared Experts: 2
- Attention Mechanism: MLA
- Vocabulary Size: 128256
- Context Length: 32,768
<br>
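The numbers above imply roughly 30B total and 3B active parameters. As an illustrative back-of-the-envelope sketch (not official figures: layer norms and other small tensors are ignored, and the MLA projection shapes are assumptions read off the repository's `config.json`), the totals can be approximated:

```python
# Back-of-the-envelope parameter estimate from the Model Overview values.
# Illustrative only: small tensors are ignored and MLA shapes are assumed.

hidden = 2048
n_layers = 48
n_dense = 1                  # dense (non-MoE) layers
moe_inter = 768              # per-expert FFN width
dense_inter = 6144           # dense-layer FFN width
n_experts = 128
active_experts = 6 + 2       # selected + shared experts
vocab = 128256

# A SwiGLU FFN has gate/up/down projections -> 3 weight matrices.
expert_params = 3 * hidden * moe_inter            # ~4.7M per expert
moe_layers = n_layers - n_dense

total_experts = moe_layers * n_experts * expert_params
active_ffn = (moe_layers * active_experts * expert_params
              + n_dense * 3 * hidden * dense_inter)

# Rough MLA attention cost per layer (assumed projection shapes):
q_proj = hidden * 32 * 192            # 32 heads, qk_head_dim 192
kv_down = hidden * (512 + 64)         # kv_lora_rank + rope key
kv_up = 512 * 32 * (128 + 128)        # decompressed keys/values
o_proj = 32 * 128 * hidden            # v_head_dim 128
attention = n_layers * (q_proj + kv_down + kv_up + o_proj)

print(f"routed experts:     ~{total_experts / 1e9:.1f}B")
print(f"active (no embed.): ~{(active_ffn + attention) / 1e9:.1f}B")
```

The routed-expert pool alone accounts for most of the ~30B total, while a forward pass touches only 8 experts per MoE layer, which is what keeps the active share near the stated 3B.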
## Model Downloads
<div align="left">
| **Model** | **Download** |
| :------------: | :------------: |
| kanana-2-30b-a3b-base | [🤗 HuggingFace](https://huggingface.co/kakaocorp/kanana-2-30b-a3b-base) |
| kanana-2-30b-a3b-instruct | [🤗 HuggingFace](https://huggingface.co/kakaocorp/kanana-2-30b-a3b-instruct) |
| kanana-2-30b-a3b-thinking | [🤗 HuggingFace](https://huggingface.co/kakaocorp/kanana-2-30b-a3b-thinking) |
</div>
<br>
## Performance
### Base model evaluation results
<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center">Metric</th>
<th align="center">Shot</th>
<th align="center">kanana-2-30b-a3b-base</th>
<th align="center">kanana-1.5-32.5b-base</th>
<th align="center">Qwen3-30B-A3B-Base<sup>*</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan="6">General Tasks</td>
</tr>
<tr>
<td align="center">MMLU</td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">75.44</td>
<td align="center">76.76</td>
<td align="center">81.14</td>
</tr>
<tr>
<td align="center">MMLU-Pro</td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">56.14</td>
<td align="center">52.40</td>
<td align="center">61.83</td>
</tr>
<tr>
<td align="center">BBH</td>
<td align="center">acc</td>
<td align="center">3</td>
<td align="center">79.76</td>
<td align="center">81.54</td>
<td align="center">79.97</td>
</tr>
<tr>
<td align="center">SimpleQA<sup>†</sup></td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">29.70</td>
<td align="center">26.95</td>
<td align="center">26.47</td>
</tr>
<tr>
<td align="center" colspan="6">Mathematics Tasks</td>
</tr>
<tr>
<td align="center">MATH</td>
<td align="center">em</td>
<td align="center">4</td>
<td align="center">54.40</td>
<td align="center">47.68</td>
<td align="center">62.58</td>
</tr>
<tr>
<td align="center">GSM8K</td>
<td align="center">em</td>
<td align="center">8</td>
<td align="center">82.71</td>
<td align="center">85.14</td>
<td align="center">88.10</td>
</tr>
<tr>
<td align="center" colspan="6">Coding Tasks</td>
</tr>
<tr>
<td align="center">HumanEval</td>
<td align="center">pass@1</td>
<td align="center">0</td>
<td align="center">75.29</td>
<td align="center">75.59</td>
<td align="center">53.32</td>
</tr>
<tr>
<td align="center">MBPP</td>
<td align="center">pass@1</td>
<td align="center">3</td>
<td align="center">62.39</td>
<td align="center">65.96</td>
<td align="center">72.58</td>
</tr>
<tr>
<td align="center" colspan="6">Korean Tasks</td>
</tr>
<tr>
<td align="center">KMMLU</td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">62.15</td>
<td align="center">61.56</td>
<td align="center">62.25</td>
</tr>
<tr>
<td align="center">KoSimpleQA<sup>†</sup></td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">49.40</td>
<td align="center">45.70</td>
<td align="center">26.33</td>
</tr>
<tr>
<td align="center">HAE-RAE Bench (v1.0)</td>
<td align="center">acc</td>
<td align="center">5</td>
<td align="center">88.73</td>
<td align="center">90.65</td>
<td align="center">72.04</td>
</tr>
<tr>
<td align="center">MATH-Ko<sup>‡</sup></td>
<td align="center">em</td>
<td align="center">4</td>
<td align="center">54.07</td>
<td align="center">47.42</td>
<td align="center">58.20</td>
</tr>
<tr>
<td align="center">GSM8K-Ko<sup>‡</sup></td>
<td align="center">em</td>
<td align="center">8</td>
<td align="center">77.48</td>
<td align="center">81.43</td>
<td align="center">88.10</td>
</tr>
<tr>
<td align="center">MBPP-Ko<sup>§</sup></td>
<td align="center">pass@1</td>
<td align="center">3</td>
<td align="center">61.55</td>
<td align="center">65.41</td>
<td align="center">66.84</td>
</tr>
<tr>
<td align="center" colspan="6">Long Context Tasks</td>
</tr>
<tr>
<td align="center">RULER-4K</td>
<td align="center">acc</td>
<td align="center">0</td>
<td align="center">93.09</td>
<td align="center">86.39</td>
<td align="center">94.32</td>
</tr>
<tr>
<td align="center">RULER-8K</td>
<td align="center">acc</td>
<td align="center">0</td>
<td align="center">92.29</td>
<td align="center">90.16</td>
<td align="center">92.16</td>
</tr>
<tr>
<td align="center">RULER-16K</td>
<td align="center">acc</td>
<td align="center">0</td>
<td align="center">90.73</td>
<td align="center">85.88</td>
<td align="center">91.28</td>
</tr>
<tr>
<td align="center">RULER-32K</td>
<td align="center">acc</td>
<td align="center">0</td>
<td align="center">88.63</td>
<td align="center">81.62</td>
<td align="center">88.32</td>
</tr>
</tbody>
</table>
<small>
<sup>*</sup> Evaluated using an internal evaluation toolkit.<br>
<sup>†</sup> Evaluated in Multiple Choice Question Answering (MCQA) format with 10 options.<br>
<sup>‡</sup> Subsets from HRM8K (MATH, GSM8K).<br>
<sup>§</sup> Internally translated to Korean.
</small>
<br>
### Instruct model evaluation results
<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center">Metric</th>
<th align="center">kanana-2-30b-a3b-instruct</th>
<th align="center">kanana-1.5-32.5b-instruct</th>
<th align="center">Qwen3-30B-A3B-Instruct-2507<sup>*</sup></th>
<th align="center">Qwen3-30B-A3B<br>(non-thinking)<sup>*</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan="6">Chat</td>
</tr>
<tr>
<td align="center">MT-Bench</td>
<td align="center">judge<sup>†</sup></td>
<td align="center">8.42</td>
<td align="center">8.23</td>
<td align="center">8.71</td>
<td align="center">8.38</td>
</tr>
<tr>
<td align="center">KoMT-Bench</td>
<td align="center">judge<sup>†</sup></td>
<td align="center">8.24</td>
<td align="center">7.94</td>
<td align="center">8.49</td>
<td align="center">7.89</td>
</tr>
<tr>
<td align="center" colspan="6">Instruction Following</td>
</tr>
<tr>
<td align="center">IFEval</td>
<td align="center">prompt strict</td>
<td align="center">84.47</td>
<td align="center">79.48</td>
<td align="center">82.62</td>
<td align="center">84.10</td>
</tr>
<tr>
<td align="center">IFBench</td>
<td align="center">prompt strict</td>
<td align="center">41.84</td>
<td align="center">38.78</td>
<td align="center">30.27</td>
<td align="center">29.25</td>
</tr>
<tr>
<td align="center">Multi-IF (EN)</td>
<td align="center">acc</td>
<td align="center">75.81</td>
<td align="center">68.51</td>
<td align="center">77.93</td>
<td align="center">81.03</td>
</tr>
<tr>
<td align="center">Multi-Challenge</td>
<td align="center">acc</td>
<td align="center">34.80</td>
<td align="center">19.05</td>
<td align="center">41.76</td>
<td align="center">27.84</td>
</tr>
<tr>
<td align="center" colspan="6">Tool Calling</td>
</tr>
<tr>
<td align="center">BFCL-v3<br>(Live<sup>‡</sup>)</td>
<td align="center">pass@1</td>
<td align="center">74.30</td>
<td align="center">68.74</td>
<td align="center">73.93</td>
<td align="center">69.14</td>
</tr>
<tr>
<td align="center">BFCL-v3<br>(Multi-Turn<sup>‡</sup>)</td>
<td align="center">pass@1</td>
<td align="center">35.38</td>
<td align="center">11.38</td>
<td align="center">38.77</td>
<td align="center">11.88</td>
</tr>
<tr>
<td align="center" colspan="6">Code Generation</td>
</tr>
<tr>
<td align="center">HumanEval+</td>
<td align="center">pass@1</td>
<td align="center">79.88</td>
<td align="center">79.88</td>
<td align="center">86.59</td>
<td align="center">87.20</td>
</tr>
<tr>
<td align="center">MBPP+</td>
<td align="center">pass@1</td>
<td align="center">73.81</td>
<td align="center">71.96</td>
<td align="center">75.13</td>
<td align="center">75.13</td>
</tr>
<tr>
<td align="center" colspan="6">Mathematics</td>
</tr>
<tr>
<td align="center">GSM8K</td>
<td align="center">em</td>
<td align="center">91.89</td>
<td align="center">91.58</td>
<td align="center">93.56</td>
<td align="center">93.33</td>
</tr>
<tr>
<td align="center">MATH</td>
<td align="center">acc</td>
<td align="center">86.26</td>
<td align="center">77.92</td>
<td align="center">90.96</td>
<td align="center">87.20</td>
</tr>
<tr>
<td align="center" colspan="6">Reasoning & Knowledge</td>
</tr>
<tr>
<td align="center">MMLU</td>
<td align="center">em</td>
<td align="center">80.80</td>
<td align="center">82.75</td>
<td align="center">87.13</td>
<td align="center">85.60</td>
</tr>
<tr>
<td align="center">KMMLU</td>
<td align="center">em</td>
<td align="center">67.32</td>
<td align="center">65.75</td>
<td align="center">67.56</td>
<td align="center">63.49</td>
</tr>
<tr>
<td align="center">GPQA Diamond</td>
<td align="center">pass@1</td>
<td align="center">42.93</td>
<td align="center">42.42</td>
<td align="center">54.55</td>
<td align="center">50.51</td>
</tr>
<tr>
<td align="center">HAE-RAE Bench (v1.0)</td>
<td align="center">em</td>
<td align="center">75.57</td>
<td align="center">65.34</td>
<td align="center">53.41</td>
<td align="center">57.39</td>
</tr>
</tbody>
</table>
<small>
<sup>*</sup> Evaluated using an internal evaluation toolkit.<br>
<sup>†</sup> Evaluated using <code><small>gpt-4o-2024-08-06</small></code> as the judge model.<br>
<sup>‡</sup> <code><small>Live</small></code> denotes the average score of 6 live benchmarks, and <code><small>Multi-Turn</small></code> denotes the average score of 4 multi-turn benchmarks.
</small>
<br>
### Reasoning model evaluation results
<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center">Metric</th>
<th align="center">kanana-2-30b-a3b-thinking</th>
<th align="center">Qwen3-30B-A3B-Thinking-2507<sup>*</sup></th>
<th align="center">Qwen3-30B-A3B<br>(thinking)<sup>*</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan="5">Reasoning & Knowledge</td>
</tr>
<tr>
<td align="center">MMLU-Pro</td>
<td align="center">pass@1</td>
<td align="center">75.3</td>
<td align="center">80.8</td>
<td align="center">78.5</td>
</tr>
<tr>
<td align="center">GPQA Diamond</td>
<td align="center">pass@1</td>
<td align="center">61.3</td>
<td align="center">70.6</td>
<td align="center">62.6</td>
</tr>
<tr>
<td align="center" colspan="5">Competition Math</td>
</tr>
<tr>
<td align="center">AIME 2025</td>
<td align="center">pass@1</td>
<td align="center">72.7</td>
<td align="center">82.3</td>
<td align="center">70.7</td>
</tr>
<tr>
<td align="center">AIME 2024</td>
<td align="center">pass@1</td>
<td align="center">78.3</td>
<td align="center">91.0</td>
<td align="center">82.7</td>
</tr>
<tr>
<td align="center" colspan="5">Code Generation</td>
</tr>
<tr>
<td align="center">LiveCodeBench</td>
<td align="center">pass@1</td>
<td align="center">60.8</td>
<td align="center">68.3</td>
<td align="center">62.3</td>
</tr>
<tr>
<td align="center" colspan="5">Instruction Following</td>
</tr>
<tr>
<td align="center">IFEval</td>
<td align="center">prompt strict</td>
<td align="center">82.2</td>
<td align="center">87.8</td>
<td align="center">86.1</td>
</tr>
<tr>
<td align="center">IFBench</td>
<td align="center">prompt strict</td>
<td align="center">42.3</td>
<td align="center">47.6</td>
<td align="center">36.7</td>
</tr>
<tr>
<td align="center" colspan="5">Tool Calling</td>
</tr>
<tr>
<td align="center">BFCL-v3<br>(Live<sup>†</sup>)</td>
<td align="center">pass@1</td>
<td align="center">75.6</td>
<td align="center">82.9</td>
<td align="center">80.3</td>
</tr>
<tr>
<td align="center">BFCL-v3<br>(Multi-Turn<sup>†</sup>)</td>
<td align="center">pass@1</td>
<td align="center">34.3</td>
<td align="center">53.6</td>
<td align="center">35.6</td>
</tr>
</tbody>
</table>
<small>
<sup>*</sup> Evaluated using an internal evaluation toolkit.<br>
<sup>†</sup> <code><small>Live</small></code> denotes the average score of 6 live benchmarks, and <code><small>Multi-Turn</small></code> denotes the average score of 4 multi-turn benchmarks.
</small>
<br>
## Quickstart
Install the `transformers` library, version `4.51.0` or later.
The following code snippet illustrates how to use the model to generate content from a given input.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "kakaocorp/kanana-2-30b-a3b-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="bfloat16",
    device_map="auto",
)

prompt = "Kakao is a leading company in South Korea, and it is known for"
input_ids = tokenizer(
    [prompt],
    return_tensors="pt",
)["input_ids"].to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=32,
        do_sample=False,
    )

decoded = tokenizer.decode(output[0], skip_special_tokens=True)
print(decoded)
```
<br>
## Processing 32K+ Length
Currently, the `config.json` uploaded to HuggingFace is configured for token lengths of 32,768 or less. To process tokens beyond this length, YaRN must be applied. By updating the `config.json` with the following parameters, you can apply YaRN to handle token sequences up to 128K in length:
```json
"rope_scaling": {
    "beta_fast": 32,
    "beta_slow": 1,
    "factor": 4.0,
    "mscale": 1.0,
    "mscale_all_dim": 1.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
}
```
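A minimal sketch of applying that edit programmatically, assuming a locally downloaded checkpoint. The `apply_yarn` helper, the local path, and the choice to also raise `max_position_embeddings` to `factor × original_max_position_embeddings` are this sketch's own conventions, not part of the official instructions:

```python
import json
from pathlib import Path

# Hypothetical local path to a downloaded checkpoint; adjust as needed.
config_path = Path("kanana-2-30b-a3b-base/config.json")

YARN_ROPE_SCALING = {
    "beta_fast": 32,
    "beta_slow": 1,
    "factor": 4.0,  # 4.0 x 32768 = 131072 tokens (~128K)
    "mscale": 1.0,
    "mscale_all_dim": 1.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}

def apply_yarn(config: dict) -> dict:
    """Return a copy of the config with YaRN scaling enabled."""
    patched = dict(config)
    patched["rope_scaling"] = YARN_ROPE_SCALING
    # Assumption: also extend the declared context window accordingly.
    patched["max_position_embeddings"] = int(
        YARN_ROPE_SCALING["factor"]
        * YARN_ROPE_SCALING["original_max_position_embeddings"]
    )
    return patched

if config_path.exists():
    config = json.loads(config_path.read_text())
    config_path.write_text(json.dumps(apply_yarn(config), indent=2))
```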
<br>
## License
The model weights are released under the [Kanana License](./LICENSE).
<br>
## Citation
```
@article{,
title={Kanana-2 LLM},
author={Kanana LLM},
year={2025},
url={https://huggingface.co/collections/kakaocorp/kanana-2}
}
```
<br>
## Contact
- Kanana LLM Team Technical Support: kanana-llm@kakaocorp.com
- Business & Partnership Contact: alpha.k@kakaocorp.com

BIN
assets/logo/kanana.png Normal file

Binary file not shown.


47
config.json Normal file

@@ -0,0 +1,47 @@
{
"architectures": [
"DeepseekV3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128001,
"first_k_dense_replace": 1,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 6144,
"kv_lora_rank": 512,
"max_position_embeddings": 32768,
"model_type": "deepseek_v3",
"moe_intermediate_size": 768,
"moe_layer_freq": 1,
"n_group": 1,
"n_routed_experts": 128,
"n_shared_experts": 2,
"norm_topk_prob": true,
"num_attention_heads": 32,
"num_experts_per_tok": 6,
"num_hidden_layers": 48,
"num_key_value_heads": 32,
"pretraining_tp": 1,
"q_lora_rank": null,
"qk_head_dim": 192,
"qk_nope_head_dim": 128,
"qk_rope_head_dim": 64,
"rms_norm_eps": 1e-06,
"rope_interleave": true,
"rope_scaling": null,
"rope_theta": 1000000,
"routed_scaling_factor": 2.448,
"scoring_func": "sigmoid",
"tie_word_embeddings": false,
"topk_group": 1,
"topk_method": "noaux_tc",
"transformers_version": "4.57.3",
"use_cache": true,
"v_head_dim": 128,
"vocab_size": 128256
}
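The routing fields above (`scoring_func: "sigmoid"`, `num_experts_per_tok: 6`, `norm_topk_prob: true`, `routed_scaling_factor: 2.448`) can be illustrated with a simplified sketch of the expert-selection step. This is a stand-in for intuition only: it omits the `noaux_tc` bias correction, expert grouping, and the shared experts, and does not reproduce the model's actual implementation.

```python
import numpy as np

def route_tokens(logits: np.ndarray, top_k: int = 6,
                 routed_scaling_factor: float = 2.448):
    """Pick top-k experts per token from raw router logits.

    logits: (n_tokens, n_experts) router outputs.
    Returns (indices, weights); weights are renormalized over the
    selected experts (norm_topk_prob) and then rescaled.
    """
    scores = 1.0 / (1.0 + np.exp(-logits))          # sigmoid scoring
    idx = np.argsort(-scores, axis=-1)[:, :top_k]   # top-k expert ids
    top = np.take_along_axis(scores, idx, axis=-1)
    top = top / top.sum(axis=-1, keepdims=True)     # norm_topk_prob
    return idx, top * routed_scaling_factor

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 128))                  # 4 tokens, 128 experts
idx, weights = route_tokens(logits)
print(idx.shape, weights.shape)                     # (4, 6) (4, 6)
```

Each token thus contributes compute to only 6 of the 128 routed experts (plus the 2 shared experts), which is what keeps the active-parameter count low relative to the 30B total.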

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 128000,
"eos_token_id": 128001,
"transformers_version": "4.57.3"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5ffb2b042f2bdc3fe3fb963097aee0645f36b7641784b51476853d52b7da68cb
size 4999552848


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c1351b9374dbb076f9494df156a5c515b9b08b4f8616d8adb6b12b86a4fd6a0d
size 4997741048


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1f345965f5372f689e06f5660dc7c1aed47551ae69797b8cfdfbd80c7aeb6527
size 4997741912


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e30f6d93e700e0b6bb9a53e6c3a33fa2d9dfa3d163d63883b2c59ff0639c1a3e
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f8faaf72fb11920c9f96528c94dcacf6b26bbe1354ab75ed6e0e1e09054b82cd
size 4982800272


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:420c8085945656a9dbe0b6c345c54b8eadd71aaaf170fadade6d41c01f7b8ff5
size 4997209720


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1020b7e03ed7217ec28f80ad5b32d212bd9ecd09cac8a82cc09cbf190e712aec
size 4997742552


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:665fadb07db336112908e8d5879f14b4265c587212adbc94b0663e5d4b2f4111
size 4997742576


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d8f022c0f58dce8e22fe6399c31a29726b9e9c3ee70e5160c1565c5edf76e60
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2bf89811918cea6f31fdcf982829c8513094e60ed370404b1a9f43027ee09a79
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:60cd9e235ec365a706f7fdbb08314416c943e2931eac35380b10d485019b3bce
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:043f1871c4fd598d94e5669ecb2e96c4167d3abeb7b6bcf48d07c1ea716d4065
size 4997742592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7bd0884c4c48015220e352a8786305c2d74237ffd5b798666995891cece295f8
size 1384691200


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:98b6eee9ebbe4b1f056db299a5281ebeac9d9e4c054ca12acf86d7c6815ccdfb
size 1680317

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f677910f98ff8282b4963d64444285af7c7f78dcdb688dcb4adc67c39e702d8d
size 10057650

2063
tokenizer_config.json Normal file

File diff suppressed because it is too large