Initialize project; model provided by the ModelHub XC community

Model: diverWayne/mikky-64m Source: Original Platform
2026-06-19 06:27:17 +08:00
commit da7dbf6c81
10 changed files with 31728 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,36 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+mikky-64m-bf16.gguf filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,108 @@
+---
+language:
+- zh
+- en
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- text-generation
+- gguf
+- safetensors
+- minimind
+- llama-cpp
+- qwen3-compatible
+base_model: jingyaogong/minimind
+datasets:
+- jingyaogong/minimind_dataset
+---
+
+# mikky-64m
+
+**mikky-64m** is a 63,912,192-parameter small language model named **mikky**.
+It was trained by **HUANG JUNZHE 黄俊哲** with the `minimind-scratch` codebase, based on the MiniMind project/data format.
+
+This release is intended as a compact learning and experimentation checkpoint for local inference, model-format conversion, and small-model alignment workflows.
+
+## Training Pipeline
+
+The released checkpoint uses the completed alignment path:
+
+`pretrain -> SFT -> mikky LoRA identity SFT -> DPO`
+
+GRPO was only run as a probe and is **not** used as the final release checkpoint.
+PPO was skipped because the local reward signal was not strong enough to justify another RL stage.
+
+## Identity
+
+The model identity/persona is:
+
+- Name: **mikky**
+- Trainer: **HUANG JUNZHE 黄俊哲**
+- Origin: a small-parameter model trained from this MiniMind-based scratch project
+
+## Files
+
+- `mikky-64m.pth`: native `minimind_scratch` state dict, BF16 tensors.
+- `model.safetensors`: Qwen3-compatible Hugging Face tensor names, BF16 tensors.
+- `mikky-64m-bf16.gguf`: llama.cpp GGUF export, BF16, not quantized.
+- `tokenizer.json`, `tokenizer_config.json`: MiniMind tokenizer files.
+- `config.json`, `generation_config.json`: Qwen3-compatible metadata used for conversion and loading.
+
+The final source checkpoint was `checkpoints/dpo_768_resume.pth`.
+
+## Prompt Format
+
+The training code uses MiniMind chat markers:
+
+```text
+<|im_start|>user
+你的问题<|im_end|>
+<|im_start|>assistant
+```
+
+## Native Usage
+
+Use the project code for native scratch inference:
+
+```bash
+python -m minimind_scratch.cli chat \
+  --weight out/hf/mikky-64m/mikky-64m.pth \
+  --prompt "请用一句话介绍你自己"
+```
+
+## llama.cpp / GGUF
+
+The GGUF file is BF16 and intentionally not quantized:
+
+```bash
+llama-cli -m mikky-64m-bf16.gguf \
+  -p "<|im_start|>user\n请用一句话介绍你自己<|im_end|>\n<|im_start|>assistant\n" \
+  -n 128
+```
+
+## Notes
+
+The GGUF export maps the scratch model to a Qwen3-compatible tensor layout because the model uses RMSNorm, SwiGLU MLP, grouped-query attention, RoPE, and q/k normalization.
+The GGUF structure and metadata were verified locally. Always verify generation quality in your target runtime before treating the GGUF file as production-ready.
+
+## Limitations
+
+- This is a very small model; expect limited reasoning, math, factual recall, and safety behavior.
+- It is not suitable for high-stakes medical, legal, financial, or safety-critical use.
+- GRPO/PPO are not part of the final release checkpoint.
+
+## Dataset And License
+
+This model was trained with the MiniMind small-data recipe from
+[`jingyaogong/minimind_dataset`](https://huggingface.co/datasets/jingyaogong/minimind_dataset).
+For this release, the dataset is referenced under the MiniMind small dataset license, **Apache-2.0**.
+
+Main data files used by this run:
+
+- `pretrain_t2t_mini.jsonl`: pretraining data.
+- `sft_t2t_mini.jsonl`: supervised fine-tuning data.
+- `dpo.jsonl`: preference data for DPO.
+- `lora_identity_mikky.jsonl`: project-authored identity/persona data for mikky.
+
+The model card, exported native checkpoint, Safetensors checkpoint, and GGUF artifact are released under **Apache-2.0**.
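The chat markers shown in the README's Prompt Format section can be assembled programmatically before the string is passed to any runtime. The sketch below is a minimal illustration, not code from the `minimind_scratch` project: the helper name `build_prompt` is hypothetical, and it covers only a single user turn with no system message.

```python
def build_prompt(user_message: str) -> str:
    """Wrap one user turn in the chat markers and open the assistant turn.

    Hypothetical helper for illustration; the marker layout is copied from
    the README's Prompt Format section.
    """
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    # Same question the README uses in its usage examples.
    print(build_prompt("请用一句话介绍你自己"))
```

The returned string matches the one the README passes to `llama-cli -p`, with real newlines in place of the `\n` escapes.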
--- a/config.json
+++ b/config.json
@@ -0,0 +1,30 @@
+{
+  "architectures": [
+    "Qwen3ForCausalLM"
+  ],
+  "model_type": "qwen3",
+  "vocab_size": 6400,
+  "hidden_size": 768,
+  "intermediate_size": 2432,
+  "num_hidden_layers": 8,
+  "num_attention_heads": 8,
+  "num_key_value_heads": 4,
+  "head_dim": 96,
+  "max_position_embeddings": 2048,
+  "rms_norm_eps": 1e-06,
+  "rope_theta": 1000000.0,
+  "hidden_act": "silu",
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "tie_word_embeddings": true,
+  "torch_dtype": "bfloat16",
+  "use_cache": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "pad_token_id": 0,
+  "miniMindScratch": {
+    "param_count": 63912192,
+    "state_tensor_count": 68827392,
+    "source_architecture": "minimind_scratch.model.ScratchCausalLM"
+  }
+}
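The `param_count` and `state_tensor_count` values in `config.json` can be cross-checked against the architecture fields. The sketch below assumes a standard Qwen3-style decoder layout: bias-free q/k/v/o projections, per-head q/k RMSNorm weights of size `head_dim`, a three-matrix SwiGLU MLP, two RMSNorm weights per layer, and one final norm. The function name `count_parameters` is hypothetical. Reading `state_tensor_count` as the total with the tied output head stored as a separate tensor is also an assumption, though the arithmetic agrees with it.

```python
# Architecture fields copied from config.json.
CFG = {
    "vocab_size": 6400,
    "hidden_size": 768,
    "intermediate_size": 2432,
    "num_hidden_layers": 8,
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "head_dim": 96,
}


def count_parameters(cfg: dict, tied: bool = True) -> int:
    """Count weights for an assumed bias-free Qwen3-style decoder."""
    h = cfg["hidden_size"]
    d = cfg["head_dim"]
    q_out = cfg["num_attention_heads"] * d    # query projection width
    kv_out = cfg["num_key_value_heads"] * d   # key/value width under GQA

    embed = cfg["vocab_size"] * h
    attn = h * q_out + 2 * h * kv_out + q_out * h  # q, k, v, o projections
    qk_norm = 2 * d                                # q_norm and k_norm weights
    mlp = 3 * h * cfg["intermediate_size"]         # gate, up, down projections
    layer_norms = 2 * h                            # pre-attention and pre-MLP norms
    per_layer = attn + qk_norm + mlp + layer_norms

    total = embed + cfg["num_hidden_layers"] * per_layer + h  # + final norm
    if not tied:
        total += embed  # output head stored as its own tensor
    return total


if __name__ == "__main__":
    print(count_parameters(CFG, tied=True))   # 63912192
    print(count_parameters(CFG, tied=False))  # 68827392
```

Under these assumptions the tied count reproduces `param_count`, and the untied count reproduces `state_tensor_count`. The difference is exactly one `vocab_size * hidden_size` embedding matrix (4,915,200 values).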
--- a/export_manifest.json
+++ b/export_manifest.json
@@ -0,0 +1,9 @@
+{
+  "model_name": "mikky-64m",
+  "param_count": 63912192,
+  "state_tensor_count": 68827392,
+  "dtype": "bfloat16",
+  "source": "checkpoints/dpo_768_resume.pth",
+  "native_weight": "out/hf/mikky-64m/mikky-64m.pth",
+  "safetensors": "out/hf/mikky-64m/model.safetensors"
+}
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,10 @@
+{
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "pad_token_id": 0,
+  "do_sample": true,
+  "temperature": 0.8,
+  "top_p": 0.95,
+  "top_k": 50,
+  "max_new_tokens": 160
+}
--- a/mikky-64m-bf16.gguf
+++ b/mikky-64m-bf16.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0e6232961ff4126b63b25828c960a2b65cb4065c58895a93e5eaab08034d85a5
+size 137921280
--- a/mikky-64m.pth
+++ b/mikky-64m.pth
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0c13ea6cfd28e411b6b723d6e2d8c2873885a425c4839daaaa667998b238b897
+size 137683476
--- a/model.safetensors
+++ b/model.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5039f5541c833ea7fbaf6256eac63eaa69aaa09d8b15be559253669a1ec90d66
+size 137664928
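The sizes recorded in the three LFS pointers can be sanity-checked against `state_tensor_count` from `export_manifest.json`. The sketch below assumes that every counted value is stored as a 2-byte BF16 number, and it attributes the remainder to container overhead (headers, metadata, and tokenizer data in the GGUF case). That attribution is an interpretation, not something the commit states.

```python
BYTES_PER_BF16 = 2
STATE_TENSOR_COUNT = 68_827_392  # from export_manifest.json

# Byte sizes copied from the Git LFS pointer files in this commit.
LFS_SIZES = {
    "mikky-64m-bf16.gguf": 137_921_280,
    "mikky-64m.pth": 137_683_476,
    "model.safetensors": 137_664_928,
}

# Raw tensor payload if every counted value occupies two bytes.
payload = STATE_TENSOR_COUNT * BYTES_PER_BF16

# Bytes left over after the assumed payload, per file.
overhead = {name: size - payload for name, size in LFS_SIZES.items()}

if __name__ == "__main__":
    print(f"payload: {payload} bytes")
    for name, extra in overhead.items():
        print(f"{name}: {extra} bytes beyond the BF16 payload")
```

All three files come out slightly larger than the 137,654,784-byte payload, which is the pattern expected if each artifact stores the full untied state in BF16.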
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,335 @@
+{
+  "add_bos_token": false,
+  "add_eos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "5": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "6": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "7": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "8": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "9": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "10": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "11": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "12": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "13": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "14": {
+      "content": "<|audio_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "15": {
+      "content": "<|audio_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "16": {
+      "content": "<|audio_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "17": {
+      "content": "<tts_pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "18": {
+      "content": "<tts_text_bos>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "19": {
+      "content": "<tts_text_eod>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "20": {
+      "content": "<tts_text_bos_single>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "21": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "22": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "23": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "24": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "25": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "26": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "27": {
+      "content": "<|buffer1|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "28": {
+      "content": "<|buffer2|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "29": {
+      "content": "<|buffer3|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "30": {
+      "content": "<|buffer4|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "31": {
+      "content": "<|buffer5|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "32": {
+      "content": "<|buffer6|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "33": {
+      "content": "<|buffer7|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "34": {
+      "content": "<|buffer8|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "35": {
+      "content": "<|buffer9|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>",
+    "<|audio_start|>",
+    "<|audio_end|>",
+    "<|audio_pad|>",
+    "<tts_pad>",
+    "<tts_text_bos>",
+    "<tts_text_eod>",
+    "<tts_text_bos_single>"
+  ],
+  "audio_bos_token": "<|audio_start|>",
+  "audio_eos_token": "<|audio_end|>",
+  "audio_token": "<|audio_pad|>",
+  "bos_token": "<|im_start|>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "extra_special_tokens": {},
+  "image_token": "<|image_pad|>",
+  "legacy": true,
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "PreTrainedTokenizerFast",
+  "unk_token": "<|endoftext|>",
+  "video_token": "<|video_pad|>",
+  "vision_bos_token": "<|vision_start|>",
+  "vision_eos_token": "<|vision_end|>"
+}