Initialize project; model provided by the ModelHub XC community

Model: Alrightlone/minimind-63M-full-sft-Junhan
Source: Original Platform
2026-05-29 04:56:16 +08:00
commit 7c093fd49e
9 changed files with 31848 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,105 @@
+---
+language:
+- en
+- zh
+license: cc-by-nc-4.0
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- minimind
+- causal-lm
+- chat
+- text-generation
+- sft
+- qwen3
+---
+
+# minimind-63M-full-sft-Junhan
+
+This repository contains a 63.9M-parameter dense MiniMind chat model, exported as a Transformers-compatible checkpoint so it can be loaded directly with `transformers`.
+
+## Model Summary
+
+- Architecture: dense decoder-only causal LM
+- Exported architecture name: `Qwen3ForCausalLM`
+- Original training codebase: MiniMind
+- Parameters: 63.9M
+- Hidden size: 768
+- Layers: 8
+- Attention heads: 8
+- KV heads: 4
+- Vocab size: 6400
+- Max position embeddings: 32768
+- RoPE theta: 1e6
+- MoE: no
+- Checkpoint type: full-parameter SFT
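The figures in the list above are enough for some quick arithmetic on grouped-query attention and KV-cache size. The sketch below is illustrative only: it assumes `head_dim = hidden_size / num_attention_heads` (a checkpoint may set `head_dim` explicitly in `config.json`, so check there) and 2 bytes per cached value (fp16 or bf16). The helper name `kv_cache_bytes` is hypothetical.

```python
# Values copied from the Model Summary list.
HIDDEN_SIZE = 768
NUM_LAYERS = 8
NUM_ATTENTION_HEADS = 8
NUM_KV_HEADS = 4
MAX_POSITION_EMBEDDINGS = 32768

# ASSUMPTION: head_dim is derived from hidden size and head count.
# The exported config.json may override this with an explicit head_dim.
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS

# With grouped-query attention, several query heads share one KV head.
QUERIES_PER_KV_HEAD = NUM_ATTENTION_HEADS // NUM_KV_HEADS


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = 2) -> int:
    """Estimate KV-cache size for one sequence (hypothetical helper).

    Each layer caches one key and one value vector of size
    NUM_KV_HEADS * HEAD_DIM per token; bytes_per_value=2 assumes fp16/bf16.
    """
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * num_tokens


if __name__ == "__main__":
    print("head_dim:", HEAD_DIM)
    print("query heads per KV head:", QUERIES_PER_KV_HEAD)
    for n in (768, MAX_POSITION_EMBEDDINGS):
        print(f"{n} tokens: {kv_cache_bytes(n) / 2**20:.1f} MiB")
```

Under these assumptions the cache stays small even at the full position range, which is one practical effect of using 4 KV heads instead of 8.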
+
+This model starts from a MiniMind pretraining checkpoint and was then fine-tuned on all parameters with the MiniMind SFT pipeline. The exported folder was produced from the local `full_sft_768.pth` checkpoint using `scripts/convert_model.py`.
+
+## Training Notes
+
+- Base training pipeline: MiniMind
+- SFT training script: `trainer/train_full_sft.py`
+- SFT data used locally: `sft_t2t_mini.jsonl`
+- Typical SFT sequence length in this setup: `max_seq_len=768`
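The SFT sequence length (`max_seq_len=768`) is much shorter than the 32768 position limit listed in the Model Summary. The source does not say how the model behaves on sequences longer than those seen during SFT, so the sketch below simply assumes you may want to keep prompt plus generation inside the SFT window. `generation_budget` is a hypothetical helper, not part of MiniMind or `transformers`.

```python
SFT_MAX_SEQ_LEN = 768            # from the Training Notes
MAX_POSITION_EMBEDDINGS = 32768  # from the Model Summary


def generation_budget(
    prompt_tokens: int,
    requested_new_tokens: int,
    window: int = SFT_MAX_SEQ_LEN,
) -> int:
    """Return how many new tokens fit in `window` after the prompt.

    ASSUMPTION: staying within the SFT sequence length is desirable;
    pass window=MAX_POSITION_EMBEDDINGS to use the full position range.
    """
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = max(window - prompt_tokens, 0)
    return min(requested_new_tokens, remaining)


if __name__ == "__main__":
    # A 600-token prompt leaves 168 tokens inside the 768-token window.
    print(generation_budget(prompt_tokens=600, requested_new_tokens=256))
```

The returned value could be passed as `max_new_tokens` when generating, with the prompt length taken from the tokenized input.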
+
+The upstream MiniMind SFT data mixes general instruction-following samples with some tool-calling and reasoning-style samples. This checkpoint is therefore best treated as a lightweight chat model, not a specialized tool-use or reasoning model.
+
+## Usage
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+repo_id = "YOUR_USERNAME/minimind-63M-full-sft-Junhan"
+
+tokenizer = AutoTokenizer.from_pretrained(repo_id)
+model = AutoModelForCausalLM.from_pretrained(
+    repo_id,
+    torch_dtype="auto",
+    device_map="auto",
+)
+
+messages = [
+    {"role": "user", "content": "你好，介绍一下你自己。"}
+]
+
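+# Render the chat template and tokenize it. return_dict=True returns both
+# input_ids and attention_mask, regardless of the library's default.
+inputs = tokenizer.apply_chat_template(
+    messages,
+    add_generation_prompt=True,
+    tokenize=True,
+    return_dict=True,
+    return_tensors="pt",
+).to(model.device)
+
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=256,
+    do_sample=True,
+    temperature=0.7,
+    top_p=0.9,
+)
+
+# Decode only the newly generated tokens, skipping the prompt.
+prompt_len = inputs["input_ids"].shape[-1]
+print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))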
+```
+
+## Intended Use
+
+- Lightweight chat experiments
+- Small-model SFT baselines
+- Educational and debugging purposes
+- Simple local inference and deployment tests
+
+## Limitations
+
+- This is a very small model, so factuality, planning, and reasoning ability are limited.
+- Some responses may imitate tool-call formatting, but the model does not use tools reliably.
+- The model is not suitable for high-stakes medical, legal, financial, or safety-critical use.
+- The training mixture includes distilled or synthetic components, so behavior may inherit teacher-model style artifacts.
+
+## Source
+
+- Upstream codebase: https://github.com/jingyaogong/minimind
+
+## License
+
+This model card uses `cc-by-nc-4.0` conservatively because the upstream MiniMind dataset documentation mentions mixed source licenses, including non-commercial terms in parts of the training pipeline. Review your exact data provenance before using or relicensing this model for commercial scenarios.