Initialize project (model provided by the ModelHub XC community)
Model: diverWayne/mikky-64m Source: Original Platform

.gitattributes

*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
mikky-64m-bf16.gguf filter=lfs diff=lfs merge=lfs -text
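
The `.gitattributes` patterns route weights, archives, and the GGUF export through Git LFS. The sketch below is a quick, approximate way to see which files in this release those patterns catch. It uses Python's `fnmatch` on the basename (and on the full path for patterns containing `/`), so it does not reproduce every rule of Git's attribute matching; `git check-attr filter -- <path>` gives the authoritative answer. The helper name `is_lfs_tracked` is illustrative.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitattributes above; every one maps to filter=lfs.
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "saved_model/**/*",
    "*.tar.*", "*.tar", "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip",
    "*.zst", "*tfevents*", "mikky-64m-bf16.gguf",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate whether a repo-relative path falls under the LFS patterns.

    Like Git, patterns without a slash are tested against the basename only.
    Unlike Git, fnmatch lets '*' cross '/', and '**/' does not match zero
    directories, so treat the result as a quick check, not ground truth.
    """
    p = PurePosixPath(path)
    for pattern in LFS_PATTERNS:
        target = p.as_posix() if "/" in pattern else p.name
        if fnmatchcase(target, pattern):
            return True
    return False


for name in ["mikky-64m.pth", "model.safetensors", "mikky-64m-bf16.gguf",
             "config.json", "tokenizer.json", "README.md"]:
    print(f"{name:22} {'LFS' if is_lfs_tracked(name) else 'regular git'}")
```

Note that the GGUF file is matched only by its exact name; a differently named `.gguf` file added later would be committed as a regular Git blob unless a `*.gguf` pattern is added.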

README.md

---
language:
- zh
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation
- gguf
- safetensors
- minimind
- llama-cpp
- qwen3-compatible
base_model: jingyaogong/minimind
datasets:
- jingyaogong/minimind_dataset
---

# mikky-64m

**mikky-64m** is a small language model, named **mikky**, with 63,912,192 parameters.
It was trained by **HUANG JUNZHE 黄俊哲** with the `minimind-scratch` codebase, following the MiniMind project and its data format.

This release is intended as a compact checkpoint for learning and experimentation: local inference, model-format conversion, and small-model alignment workflows.

## Training Line

The released checkpoint is the output of the full alignment pipeline:

`pretrain -> SFT -> mikky LoRA identity SFT -> DPO`

GRPO was run only as an exploratory probe; its output is **not** the released checkpoint.
PPO was skipped because the local reward signal was not strong enough to justify another RL stage.

## Identity

The model's identity (persona):

- Name: **mikky**
- Trainer: **HUANG JUNZHE 黄俊哲**
- Origin: a small-parameter model trained from this MiniMind-based scratch project

## Files

- `mikky-64m.pth`: native `minimind_scratch` state dict, BF16 tensors.
- `model.safetensors`: Qwen3-compatible Hugging Face tensor names, BF16 tensors.
- `mikky-64m-bf16.gguf`: llama.cpp GGUF export, BF16, not quantized.
- `tokenizer.json`, `tokenizer_config.json`: MiniMind tokenizer files.
- `config.json`, `generation_config.json`: Qwen3-compatible metadata used for conversion and loading.
- `export_manifest.json`: export metadata (source checkpoint, parameter count, output paths).

The final training checkpoint used as the export source was `checkpoints/dpo_768_resume.pth`.
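
The Files list states that `model.safetensors` uses Qwen3-compatible tensor names and BF16 tensors. One way to check this without PyTorch is to read only the Safetensors header: an 8-byte little-endian length followed by a JSON table of tensor names, dtypes, shapes, and byte offsets (plus an optional `__metadata__` entry). The minimal standard-library sketch below does that; the function names are illustrative and not part of the project code.

```python
import json
import math
import struct
from collections import Counter
from typing import BinaryIO


def read_safetensors_header(f: BinaryIO) -> dict:
    """Read only the JSON header of a .safetensors stream (no tensor data)."""
    (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
    header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional string map, not a tensor entry
    return header


def summarize(header: dict) -> tuple:
    """Return (tensor count, total element count, dtype histogram)."""
    n_elems = sum(math.prod(info["shape"]) for info in header.values())
    dtypes = Counter(info["dtype"] for info in header.values())
    return len(header), n_elems, dict(dtypes)
```

For the released file: `with open("model.safetensors", "rb") as f: print(summarize(read_safetensors_header(f)))`. If the export is BF16 throughout, the dtype histogram should contain only `"BF16"`, and `sorted(header)` lists the tensor names to compare with the Qwen3 layout.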

## Prompt Format

The training code uses the MiniMind chat markers. A single-turn prompt looks like this:

```text
<|im_start|>user
Your question here<|im_end|>
<|im_start|>assistant
```
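
The template above can be produced programmatically. The helper below is a minimal sketch; it assumes that, for multi-turn chat, earlier turns use the same `<|im_start|>{role}\n{content}<|im_end|>\n` wrapping, whereas the card itself only shows a single user turn. Generation should stop at `<|im_end|>`, which is `eos_token_id` 2 in `config.json`.

```python
def build_prompt(turns: list) -> str:
    """Render (role, content) pairs with the MiniMind chat markers.

    An open assistant header is appended at the end so that the model
    continues by writing the assistant reply.
    """
    rendered = "".join(
        f"<|im_start|>{role}\n{content}<|im_end|>\n" for role, content in turns
    )
    return rendered + "<|im_start|>assistant\n"


# Single user turn; the Chinese prompt asks the model to introduce itself.
prompt = build_prompt([("user", "请用一句话介绍你自己")])
print(repr(prompt))
```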

## Native Usage

To run inference on the native checkpoint with the project code (the example prompt asks the model to introduce itself in one sentence):

```bash
python -m minimind_scratch.cli chat \
  --weight out/hf/mikky-64m/mikky-64m.pth \
  --prompt "请用一句话介绍你自己"
```

## llama.cpp / GGUF

The GGUF file is stored in BF16 and is intentionally not quantized. It can be run with `llama-cli` using the same chat markers:

```bash
llama-cli -m mikky-64m-bf16.gguf \
  -p "<|im_start|>user\n请用一句话介绍你自己<|im_end|>\n<|im_start|>assistant\n" \
  -n 128
```

## Notes

The GGUF export maps the scratch model to a Qwen3-compatible tensor layout because the scratch architecture uses the same building blocks as Qwen3: RMSNorm, a SwiGLU MLP, grouped-query attention, RoPE, and q/k normalization.
The GGUF structure and metadata were verified locally. Always verify generation quality in your target runtime before treating the GGUF file as production-ready.
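
For the Transformers runtime, a minimal loading sketch for the Qwen3-compatible Safetensors export follows. It assumes a Transformers release that includes the `qwen3` model type and a local directory holding the repository files; the card does not state that this path was tested, and the function names and the `./mikky-64m` directory are hypothetical. Only `input_ids` and `attention_mask` are passed to `generate`, because the generic `PreTrainedTokenizerFast` may also return `token_type_ids`, which Qwen3 models do not accept.

```python
def format_user_turn(text: str) -> str:
    """Single-turn prompt in the MiniMind chat format used by this model."""
    return f"<|im_start|>user\n{text}<|im_end|>\n<|im_start|>assistant\n"


def generate_reply(model_dir: str, question: str, max_new_tokens: int = 128) -> str:
    """Load the Safetensors export with Transformers and sample one reply."""
    # Imported here so format_user_turn() works without torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(format_user_turn(question), return_tensors="pt")
    with torch.no_grad():
        # do_sample/temperature/top_p/top_k come from generation_config.json;
        # generation stops at eos_token_id 2 (<|im_end|>).
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
        )
    reply_ids = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(reply_ids, skip_special_tokens=True)
```

Usage: `print(generate_reply("./mikky-64m", "请用一句话介绍你自己"))`. Comparing this output with the native and llama.cpp runs is one way to check that the conversions behave alike.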

## Limitations

- This is a very small model; expect limited reasoning, math, factual recall, and safety behavior.
- It is not suitable for high-stakes medical, legal, financial, or safety-critical use.
- GRPO/PPO are not part of the final release checkpoint.

## Dataset And License

This model was trained with the MiniMind small-data recipe from
[`jingyaogong/minimind_dataset`](https://huggingface.co/datasets/jingyaogong/minimind_dataset).
For this release, the dataset is referenced under the MiniMind small-dataset license, **Apache-2.0**.

Main data files used by this run:

- `pretrain_t2t_mini.jsonl`: pretraining data.
- `sft_t2t_mini.jsonl`: supervised fine-tuning data.
- `dpo.jsonl`: preference data for DPO.
- `lora_identity_mikky.jsonl`: project-authored identity/persona data for mikky.

The model card, exported native checkpoint, Safetensors checkpoint, and GGUF artifact are released under **Apache-2.0**.

config.json

{
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "model_type": "qwen3",
  "vocab_size": 6400,
  "hidden_size": 768,
  "intermediate_size": 2432,
  "num_hidden_layers": 8,
  "num_attention_heads": 8,
  "num_key_value_heads": 4,
  "head_dim": 96,
  "max_position_embeddings": 2048,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "hidden_act": "silu",
  "attention_bias": false,
  "attention_dropout": 0.0,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "use_cache": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "miniMindScratch": {
    "param_count": 63912192,
    "state_tensor_count": 68827392,
    "source_architecture": "minimind_scratch.model.ScratchCausalLM"
  }
}
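
The `miniMindScratch.param_count` in `config.json` can be reproduced from the other config fields, assuming the standard Qwen3 decoder layout: bias-free q/k/v/o projections, per-head RMSNorm on q and k, a three-matrix SwiGLU MLP, two RMSNorms per layer, and a final norm. With tied embeddings this gives 63,912,192. Adding a separate `vocab_size × hidden_size` output matrix gives 68,827,392, which equals `state_tensor_count`; that field therefore appears to count the tied `lm_head` weight a second time (an inference from the arithmetic, not something the card states). The helper name is illustrative.

```python
# Values copied from config.json above.
CFG = {
    "vocab_size": 6400,
    "hidden_size": 768,
    "intermediate_size": 2432,
    "num_hidden_layers": 8,
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "head_dim": 96,
}


def count_params(cfg: dict, tied: bool = True) -> int:
    """Count parameters of a Qwen3-style decoder without attention bias."""
    d, v, f = cfg["hidden_size"], cfg["vocab_size"], cfg["intermediate_size"]
    hd = cfg["head_dim"]
    q_dim = cfg["num_attention_heads"] * hd   # 8 query heads
    kv_dim = cfg["num_key_value_heads"] * hd  # 4 shared key/value heads (GQA)

    attn = d * q_dim + 2 * d * kv_dim + q_dim * d  # q, k, v, o projections
    qk_norm = 2 * hd                               # RMSNorm weights for q and k
    mlp = 3 * d * f                                # gate, up, down (SwiGLU)
    norms = 2 * d                                  # input + post-attention RMSNorm
    per_layer = attn + qk_norm + mlp + norms

    embed = v * d
    lm_head = 0 if tied else v * d
    final_norm = d
    return embed + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


print(count_params(CFG))              # prints 63912192
print(count_params(CFG, tied=False))  # prints 68827392
```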

export_manifest.json

{
  "model_name": "mikky-64m",
  "param_count": 63912192,
  "state_tensor_count": 68827392,
  "dtype": "bfloat16",
  "source": "checkpoints/dpo_768_resume.pth",
  "native_weight": "out/hf/mikky-64m/mikky-64m.pth",
  "safetensors": "out/hf/mikky-64m/model.safetensors"
}

generation_config.json

{
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "do_sample": true,
  "temperature": 0.8,
  "top_p": 0.95,
  "top_k": 50,
  "max_new_tokens": 160
}
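
`generation_config.json` enables sampling with `temperature=0.8`, `top_k=50`, and `top_p=0.95`. The plain-Python sketch below shows one common way these three settings combine on a single decoding step: scale logits by temperature, keep the top-k ids, then keep the smallest nucleus whose probability mass reaches `top_p`, and sample from what remains. It is an illustration, not the Transformers implementation, whose processor ordering and tie handling may differ; `sample_next_token` is a hypothetical name.

```python
import math
import random
from typing import List, Optional


def sample_next_token(logits: List[float], temperature: float = 0.8,
                      top_k: int = 50, top_p: float = 0.95,
                      rng: Optional[random.Random] = None) -> int:
    """Draw one token id using temperature, then top-k, then top-p filtering."""
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]

    # Top-k: keep the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax over the survivors; subtracting the max avoids overflow in exp().
    best = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - best) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p (nucleus): keep the smallest prefix whose probability mass reaches top_p.
    kept_ids, kept_probs, mass = [], [], 0.0
    for token_id, p in zip(ranked, probs):
        kept_ids.append(token_id)
        kept_probs.append(p)
        mass += p
        if mass >= top_p:
            break

    # random.choices accepts weights that need not sum to 1.
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]


rng = random.Random(0)
print([sample_next_token([5.0, 4.0, -10.0, -10.0], rng=rng) for _ in range(10)])
```

With these example logits, ids 2 and 3 fall outside the 0.95 nucleus, so only ids 0 and 1 can be drawn; lowering `temperature` sharpens the distribution further toward id 0.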

mikky-64m-bf16.gguf

version https://git-lfs.github.com/spec/v1
oid sha256:0e6232961ff4126b63b25828c960a2b65cb4065c58895a93e5eaab08034d85a5
size 137921280

mikky-64m.pth

version https://git-lfs.github.com/spec/v1
oid sha256:0c13ea6cfd28e411b6b723d6e2d8c2873885a425c4839daaaa667998b238b897
size 137683476

model.safetensors

version https://git-lfs.github.com/spec/v1
oid sha256:5039f5541c833ea7fbaf6256eac63eaa69aaa09d8b15be559253669a1ec90d66
size 137664928
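
The three weight files are committed as Git LFS pointers: each records the pointer spec version, the SHA-256 of the real file, and its size in bytes. A downloaded file can be checked against its pointer as sketched below; `parse_lfs_pointer` and `verify_download` are illustrative helpers, not part of any project tooling.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into {'version', 'oid', 'size'}."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def verify_download(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a downloaded file's size and SHA-256 against its LFS pointer."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):  # stream so large files fit in memory
            h.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and h.hexdigest() == pointer["oid"]


# Pointer for the GGUF export, copied from above.
GGUF_POINTER = parse_lfs_pointer("""\
version https://git-lfs.github.com/spec/v1
oid sha256:0e6232961ff4126b63b25828c960a2b65cb4065c58895a93e5eaab08034d85a5
size 137921280
""")
```

An intact download should satisfy `verify_download("mikky-64m-bf16.gguf", GGUF_POINTER)`; the `.pth` and `.safetensors` pointers can be checked the same way.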

tokenizer.json

(Diff suppressed because the file is too large.)

tokenizer_config.json

{
  "add_bos_token": false,
  "add_eos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "4": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "5": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "6": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "7": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "8": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "9": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "10": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "11": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "12": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "13": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "14": {
      "content": "<|audio_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "15": {
      "content": "<|audio_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "16": {
      "content": "<|audio_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "17": {
      "content": "<tts_pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "18": {
      "content": "<tts_text_bos>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "19": {
      "content": "<tts_text_eod>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "20": {
      "content": "<tts_text_bos_single>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "21": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "22": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "23": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "24": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "25": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "26": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "27": {
      "content": "<|buffer1|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "28": {
      "content": "<|buffer2|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "29": {
      "content": "<|buffer3|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30": {
      "content": "<|buffer4|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "31": {
      "content": "<|buffer5|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "32": {
      "content": "<|buffer6|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "33": {
      "content": "<|buffer7|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "34": {
      "content": "<|buffer8|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "35": {
      "content": "<|buffer9|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>",
    "<|audio_start|>",
    "<|audio_end|>",
    "<|audio_pad|>",
    "<tts_pad>",
    "<tts_text_bos>",
    "<tts_text_eod>",
    "<tts_text_bos_single>"
  ],
  "audio_bos_token": "<|audio_start|>",
  "audio_eos_token": "<|audio_end|>",
  "audio_token": "<|audio_pad|>",
  "bos_token": "<|im_start|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "extra_special_tokens": {},
  "image_token": "<|image_pad|>",
  "legacy": true,
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "PreTrainedTokenizerFast",
  "unk_token": "<|endoftext|>",
  "video_token": "<|video_pad|>",
  "vision_bos_token": "<|vision_start|>",
  "vision_eos_token": "<|vision_end|>"
}
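
In `tokenizer_config.json` the chat markers double as sequence markers: `bos_token` is `<|im_start|>` (id 1), `eos_token` is `<|im_end|>` (id 2), and `pad_token`/`unk_token` is `<|endoftext|>` (id 0). These ids agree with `bos_token_id`, `eos_token_id`, and `pad_token_id` in `config.json` and `generation_config.json`. The sketch below cross-checks that correspondence; the function name is illustrative, and the inline excerpts keep only the fields it reads.

```python
import json


def check_special_ids(config: dict, tok_config: dict) -> dict:
    """Map each *_token_id in config.json to its token string and compare it
    with the matching *_token entry in tokenizer_config.json."""
    id_to_token = {int(i): entry["content"]
                   for i, entry in tok_config["added_tokens_decoder"].items()}
    report = {}
    for kind in ("bos", "eos", "pad"):
        token_id = config[f"{kind}_token_id"]
        by_id = id_to_token.get(token_id)
        by_name = tok_config.get(f"{kind}_token")
        report[kind] = (token_id, by_id, by_id == by_name)
    return report


# Excerpts of config.json and tokenizer_config.json (only the fields used).
config = json.loads('{"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0}')
tok_config = json.loads("""{
  "added_tokens_decoder": {
    "0": {"content": "<|endoftext|>"},
    "1": {"content": "<|im_start|>"},
    "2": {"content": "<|im_end|>"}
  },
  "bos_token": "<|im_start|>",
  "eos_token": "<|im_end|>",
  "pad_token": "<|endoftext|>"
}""")

for kind, (token_id, token, ok) in check_special_ids(config, tok_config).items():
    print(f"{kind}: id {token_id} -> {token} {'OK' if ok else 'MISMATCH'}")
```

To run it against the full files, load both with `json.load` and pass the resulting dicts.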