Initialize project; model provided by the ModelHub XC community

Model: hiratagoh/NVIDIA-Nemotron-Nano-9B-v2-Japanese-GGUF Source: Original Platform
2026-04-13 01:04:58 +08:00
commit eb24bf5930
10 changed files with 105 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,37 @@
+---
+license: other
+license_name: nvidia-nemotron-open-model-license
+license_link: >-
+  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-nemotron-open-model-license/
+base_model: nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese
+datasets:
+- TFMC/imatrix-dataset-for-japanese-llm
+track_downloads: true
+language:
+- ja
+- en
+pipeline_tag: text-generation
+---
+
+# NVIDIA-Nemotron-Nano-9B-v2-Japanese-GGUF
+
+## GGUF Conversion and Quantization
+
+[nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese) was converted to GGUF format with
+`convert_hf_to_gguf.py` from [llama.cpp](https://github.com/ggml-org/llama.cpp.git) and then quantized with `llama-quantize`.
+
+Because the original model is already lightweight, BF16 or Q8_0 is recommended if your runtime environment allows it.
+
+## iMatrix Generation
+
+The iMatrix was generated with `llama-imatrix`, using `c4_en_ja_imatrix.txt` from
+[TFMC/imatrix-dataset-for-japanese-llm](https://huggingface.co/datasets/TFMC/imatrix-dataset-for-japanese-llm/tree/main)
+as the calibration data.
+
+## IQ4_XS Quantization
+
+During **IQ4_XS quantization**, `llama-quantize` logs messages such as
+```
+llama_model_quantize_impl : tensor cols 4480 x 131072 are not divisible by 256, required for iq4_xs - using fallback quantization iq4_nl
+```
+As a result, **most of the layers quantized to 4 bits are actually IQ4_NL**. The file is labeled IQ4_XS, but its contents are almost entirely IQ4_NL.
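
The fallback reported in the log above comes down to a divisibility check on the tensor's column count. The following is a minimal Python sketch of that check, not llama.cpp's actual implementation: the 256 figure comes from the log message, the 32-element block size for `iq4_nl` is an assumption based on how that format is commonly described, and `pick_4bit_type` and the `"unsupported"` result are hypothetical names used only for illustration.

```python
# Sketch of the per-tensor type choice implied by the log message.
# Assumptions: iq4_xs needs column counts divisible by 256 (stated in the log);
# iq4_nl is assumed here to work on blocks of 32 elements.
IQ4_XS_BLOCK = 256
IQ4_NL_BLOCK = 32


def pick_4bit_type(n_cols: int) -> str:
    """Return the 4-bit type a tensor with n_cols columns would end up with."""
    if n_cols % IQ4_XS_BLOCK == 0:
        return "iq4_xs"
    if n_cols % IQ4_NL_BLOCK == 0:
        return "iq4_nl"  # fallback, as reported in the log
    return "unsupported"  # hypothetical placeholder for any other case


if __name__ == "__main__":
    # 4480 is the column count from the log: 4480 = 17 * 256 + 128.
    for cols in (4480, 4096):
        print(cols, cols % IQ4_XS_BLOCK, pick_4bit_type(cols))
```

With 4480 columns the remainder modulo 256 is 128, so the check fails and the tensor falls back to `iq4_nl`, which is consistent with the README's remark that the file is IQ4_XS in name only.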