Initialize project; model provided by the ModelHub XC community

Model: diverWayne/mikky-64m
Source: Original Platform
ModelHub XC
2026-06-19 06:27:17 +08:00
commit da7dbf6c81
10 changed files with 31728 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
mikky-64m-bf16.gguf filter=lfs diff=lfs merge=lfs -text

108
README.md Normal file

@@ -0,0 +1,108 @@
---
language:
- zh
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation
- gguf
- safetensors
- minimind
- llama-cpp
- qwen3-compatible
base_model: jingyaogong/minimind
datasets:
- jingyaogong/minimind_dataset
---
# mikky-64m
**mikky-64m** is a small language model with 63,912,192 parameters; its persona name is **mikky**.
It was trained by **HUANG JUNZHE 黄俊哲** with the `minimind-scratch` codebase, based on the MiniMind project/data format.
This release is intended as a compact learning and experimentation checkpoint for local inference, model-format conversion, and small-model alignment workflows.
## Training Line
The released checkpoint uses the completed alignment path:
`pretrain -> SFT -> mikky LoRA identity SFT -> DPO`
GRPO was only run as a probe and is **not** used as the final release checkpoint.
PPO was skipped because the local reward signal was not strong enough to justify another RL stage.
## Identity
The model identity/persona is:
- Name: **mikky**
- Trainer: **HUANG JUNZHE 黄俊哲**
- Origin: a small-parameter model trained from this MiniMind-based scratch project
## Files
- `mikky-64m.pth`: native `minimind_scratch` state dict, BF16 tensors.
- `model.safetensors`: Qwen3-compatible Hugging Face tensor names, BF16 tensors.
- `mikky-64m-bf16.gguf`: llama.cpp GGUF export, BF16, not quantized.
- `tokenizer.json`, `tokenizer_config.json`: MiniMind tokenizer files.
- `config.json`, `generation_config.json`: Qwen3-compatible metadata used for conversion and loading.
The final source checkpoint was `checkpoints/dpo_768_resume.pth`.
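
As a rough integrity check after download, the three large files can be compared against the SHA-256 digests and byte sizes recorded in this commit's Git LFS pointers. The sketch below is only an illustration: the helper name `matches_lfs_pointer` and the `LFS_POINTERS` table are hypothetical and not part of the project code, and the digests are copied from the pointer files in this commit.

```python
import hashlib
from pathlib import Path

# Digest and size per file, copied from the Git LFS pointers in this commit.
LFS_POINTERS = {
    "mikky-64m-bf16.gguf": (
        "0e6232961ff4126b63b25828c960a2b65cb4065c58895a93e5eaab08034d85a5",
        137921280,
    ),
    "mikky-64m.pth": (
        "0c13ea6cfd28e411b6b723d6e2d8c2873885a425c4839daaaa667998b238b897",
        137683476,
    ),
    "model.safetensors": (
        "5039f5541c833ea7fbaf6256eac63eaa69aaa09d8b15be559253669a1ec90d66",
        137664928,
    ),
}


def matches_lfs_pointer(path, expected_sha256, expected_size, chunk_size=1 << 20):
    """Return True if the file has the expected byte size and SHA-256 digest."""
    file_path = Path(path)
    # Cheap check first: a size mismatch means there is no need to hash.
    if file_path.stat().st_size != expected_size:
        return False
    digest = hashlib.sha256()
    with file_path.open("rb") as handle:
        # Read in chunks so the whole file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

For example, `matches_lfs_pointer("model.safetensors", *LFS_POINTERS["model.safetensors"])` should return `True` for an intact download placed in the current directory.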
## Prompt Format
The training code uses MiniMind chat markers:
```text
<|im_start|>user
your question<|im_end|>
<|im_start|>assistant
```
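
The template above can be assembled programmatically. The following is a minimal sketch: `build_prompt` is a hypothetical helper, not a function from `minimind_scratch`, and extending the template to several turns by repeating the same markers is an assumption, since only the single-turn form is shown here.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_prompt(messages):
    """Render (role, content) pairs with MiniMind chat markers.

    The result ends with an open assistant turn so the model continues from there.
    Multi-turn rendering is an assumption; only the single-turn form is documented.
    """
    parts = [f"{IM_START}{role}\n{content}{IM_END}\n" for role, content in messages]
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = build_prompt([("user", "Introduce yourself in one sentence.")])
print(prompt)
```

For a single user message, this yields the same string that the llama.cpp example passes via `-p`, with real newlines in place of `\n`.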
## Native Usage
Use the project code for native scratch inference:
```bash
python -m minimind_scratch.cli chat \
--weight out/hf/mikky-64m/mikky-64m.pth \
--prompt "请用一句话介绍你自己"
```
## llama.cpp / GGUF
The GGUF file is BF16 and intentionally not quantized:
```bash
llama-cli -m mikky-64m-bf16.gguf \
-p "<|im_start|>user\n请用一句话介绍你自己<|im_end|>\n<|im_start|>assistant\n" \
-n 128
```
## Notes
The GGUF export maps the scratch model to a Qwen3-compatible tensor layout because the model uses RMSNorm, SwiGLU MLP, grouped-query attention, RoPE, and q/k normalization.
The GGUF structure and metadata were verified locally. Always verify generation quality in your target runtime before treating the GGUF file as production-ready.
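
The stated parameter count can be reproduced from the values in `config.json`. The sketch below assumes a standard Qwen3-style layer layout: bias-free q/k/v/o projections, a three-matrix SwiGLU MLP, two RMSNorm weights per layer plus per-head q/k norm weights, one final norm, and tied embeddings counted once. The function name `count_params` is hypothetical. Reading `state_tensor_count` as the parameter count plus a separately stored copy of the output head is a guess that happens to match the numbers, not something the project documents.

```python
def count_params(
    vocab_size=6400,
    hidden_size=768,
    intermediate_size=2432,
    num_layers=8,
    num_heads=8,
    num_kv_heads=4,
    head_dim=96,
):
    """Count weights for a Qwen3-style decoder with tied embeddings and no biases."""
    embed = vocab_size * hidden_size                 # token embedding, shared with the head
    q_proj = hidden_size * num_heads * head_dim
    kv_proj = 2 * hidden_size * num_kv_heads * head_dim  # grouped-query k and v
    o_proj = num_heads * head_dim * hidden_size
    mlp = 3 * hidden_size * intermediate_size        # gate, up, and down projections
    norms = 2 * hidden_size + 2 * head_dim           # two RMSNorms plus q/k norms
    per_layer = q_proj + kv_proj + o_proj + mlp + norms
    final_norm = hidden_size
    return embed + num_layers * per_layer + final_norm


total = count_params()
print(total)               # 63912192, the param_count in config.json
# Assumption: the state dict also stores an untied copy of the output head.
print(total + 6400 * 768)  # 68827392, equal to state_tensor_count in config.json
```

At two bytes per BF16 element, 68,827,392 stored elements come to 137,654,784 bytes, which is within a few kilobytes of the `model.safetensors` size recorded in this commit (137,664,928 bytes).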
## Limitations
- This is a very small model; expect limited reasoning, math, factual recall, and safety behavior.
- It is not suitable for high-stakes medical, legal, financial, or safety-critical use.
- GRPO/PPO are not part of the final release checkpoint.
## Dataset And License
This model was trained with the MiniMind small-data recipe from
[`jingyaogong/minimind_dataset`](https://huggingface.co/datasets/jingyaogong/minimind_dataset).
For this release, the dataset is referenced under the MiniMind small dataset's license, **Apache-2.0**.
Main data files used by this run:
- `pretrain_t2t_mini.jsonl`: pretraining data.
- `sft_t2t_mini.jsonl`: supervised fine-tuning data.
- `dpo.jsonl`: preference data for DPO.
- `lora_identity_mikky.jsonl`: project-authored identity/persona data for mikky.
The model card, exported native checkpoint, Safetensors checkpoint, and GGUF artifact are released under **Apache-2.0**.

30
config.json Normal file

@@ -0,0 +1,30 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"model_type": "qwen3",
"vocab_size": 6400,
"hidden_size": 768,
"intermediate_size": 2432,
"num_hidden_layers": 8,
"num_attention_heads": 8,
"num_key_value_heads": 4,
"head_dim": 96,
"max_position_embeddings": 2048,
"rms_norm_eps": 1e-06,
"rope_theta": 1000000.0,
"hidden_act": "silu",
"attention_bias": false,
"attention_dropout": 0.0,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"use_cache": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"miniMindScratch": {
"param_count": 63912192,
"state_tensor_count": 68827392,
"source_architecture": "minimind_scratch.model.ScratchCausalLM"
}
}

9
export_manifest.json Normal file

@@ -0,0 +1,9 @@
{
"model_name": "mikky-64m",
"param_count": 63912192,
"state_tensor_count": 68827392,
"dtype": "bfloat16",
"source": "checkpoints/dpo_768_resume.pth",
"native_weight": "out/hf/mikky-64m/mikky-64m.pth",
"safetensors": "out/hf/mikky-64m/model.safetensors"
}

10
generation_config.json Normal file

@@ -0,0 +1,10 @@
{
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"do_sample": true,
"temperature": 0.8,
"top_p": 0.95,
"top_k": 50,
"max_new_tokens": 160
}

3
mikky-64m-bf16.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0e6232961ff4126b63b25828c960a2b65cb4065c58895a93e5eaab08034d85a5
size 137921280

3
mikky-64m.pth Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0c13ea6cfd28e411b6b723d6e2d8c2873885a425c4839daaaa667998b238b897
size 137683476

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5039f5541c833ea7fbaf6256eac63eaa69aaa09d8b15be559253669a1ec90d66
size 137664928

31191
tokenizer.json Normal file

File diff suppressed because it is too large

335
tokenizer_config.json Normal file

@@ -0,0 +1,335 @@
{
"add_bos_token": false,
"add_eos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"0": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"3": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"4": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"5": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"6": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"7": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"8": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"9": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"10": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"11": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"12": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"13": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"14": {
"content": "<|audio_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"15": {
"content": "<|audio_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"16": {
"content": "<|audio_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"17": {
"content": "<tts_pad>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"18": {
"content": "<tts_text_bos>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"19": {
"content": "<tts_text_eod>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"20": {
"content": "<tts_text_bos_single>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"21": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"22": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"23": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"24": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"25": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"26": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"27": {
"content": "<|buffer1|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"28": {
"content": "<|buffer2|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"29": {
"content": "<|buffer3|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"30": {
"content": "<|buffer4|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"31": {
"content": "<|buffer5|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"32": {
"content": "<|buffer6|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"33": {
"content": "<|buffer7|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"34": {
"content": "<|buffer8|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"35": {
"content": "<|buffer9|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>",
"<|audio_start|>",
"<|audio_end|>",
"<|audio_pad|>",
"<tts_pad>",
"<tts_text_bos>",
"<tts_text_eod>",
"<tts_text_bos_single>"
],
"audio_bos_token": "<|audio_start|>",
"audio_eos_token": "<|audio_end|>",
"audio_token": "<|audio_pad|>",
"bos_token": "<|im_start|>",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"extra_special_tokens": {},
"image_token": "<|image_pad|>",
"legacy": true,
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "PreTrainedTokenizerFast",
"unk_token": "<|endoftext|>",
"video_token": "<|video_pad|>",
"vision_bos_token": "<|vision_start|>",
"vision_eos_token": "<|vision_end|>"
}