---
language:
- en
- zh
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: text-generation
tags:
- minimind
- causal-lm
- chat
- text-generation
- sft
- qwen3
---

# minimind-63M-full-sft-Junhan

This repository contains a 63.9M-parameter dense MiniMind chat model. It has been converted to a Transformers-compatible checkpoint so it can be loaded directly with `transformers`.

## Model Summary

- Architecture: dense decoder-only causal LM
- Exported architecture name: `Qwen3ForCausalLM`
- Original training codebase: MiniMind
- Parameters: 63.9M
- Hidden size: 768
- Layers: 8
- Attention heads: 8
- KV heads: 4
- Vocab size: 6400
- Max position embeddings: 32768
- RoPE theta: 1e6
- MoE: no
- Checkpoint type: full-parameter SFT

This model was initialized from a MiniMind pretraining checkpoint and then fully fine-tuned with the MiniMind SFT pipeline. The exported folder was produced from the local `full_sft_768.pth` checkpoint using `scripts/convert_model.py`.

## Training Notes

- Base training pipeline: MiniMind
- SFT training script: `trainer/train_full_sft.py`
- SFT data used locally: `sft_t2t_mini.jsonl`
- Typical SFT sequence length in this setup: `max_seq_len=768`

The upstream MiniMind SFT data mixes general instruction-following samples with some tool-calling and reasoning-style samples. As a result, this checkpoint is mainly a lightweight chat model, not a specialized tool-use or reasoning model.
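As a rough sizing aid for local deployment, the attention settings in the Model Summary (8 layers, 8 query heads sharing 4 KV heads) determine how much memory the KV cache needs during generation. The sketch below is an illustration only, not part of MiniMind or `transformers`. It assumes `head_dim = hidden_size // num_attention_heads = 96` and 2-byte (fp16/bf16) cache values. Check `head_dim` in the exported `config.json`, because a Qwen3-style config can set it explicitly. The helper name `kv_cache_bytes` is hypothetical.

```python
# Rough KV-cache size estimate using the Model Summary values of this card.
# Assumption: head_dim = hidden_size // num_attention_heads (768 // 8 = 96);
# verify against head_dim in the exported config.json.

HIDDEN_SIZE = 768
NUM_LAYERS = 8
NUM_ATTENTION_HEADS = 8
NUM_KV_HEADS = 4


def kv_cache_bytes(seq_len: int, batch_size: int = 1, bytes_per_value: int = 2) -> int:
    """Hypothetical helper: bytes needed to cache keys and values for all layers.

    bytes_per_value=2 corresponds to fp16/bf16, 4 to fp32.
    """
    head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # assumed, see note above
    # Factor 2: one cached tensor for keys and one for values in every layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * head_dim * bytes_per_value
    return per_token * seq_len * batch_size


if __name__ == "__main__":
    # Grouped-query attention: each KV head is shared by this many query heads,
    # so the cache is that many times smaller than with one KV head per query head.
    print("query heads per KV head:", NUM_ATTENTION_HEADS // NUM_KV_HEADS)
    for seq_len in (768, 32768):
        mib = kv_cache_bytes(seq_len) / 2**20
        print(f"seq_len={seq_len}: {mib:.1f} MiB")
```

Under these assumptions, the cache per sequence is about 9 MiB at the 768-token SFT length and 384 MiB at the full 32768-position context. Model weights and activations are not included in these figures.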
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace with the actual Hub repo ID (or a local path to the exported folder).
repo_id = "YOUR_USERNAME/minimind-63M-full-sft-Junhan"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # requires the `accelerate` package
)

messages = [
    {"role": "user", "content": "你好,介绍一下你自己。"}  # "Hello, please introduce yourself."
]

# Format the conversation with the tokenizer's chat template and append the
# assistant prompt so the model starts a new reply.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,  # return input_ids together with attention_mask
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens and skip the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```

## Intended Use

- Lightweight chat experiments
- Small-model SFT baselines
- Educational and debugging purposes
- Simple local inference and deployment tests

## Limitations

- This is a very small model, so its factuality, planning, and reasoning ability are limited.
- Responses may occasionally show tool-use style output, but this behavior is not robust.
- The model is not suitable for high-stakes medical, legal, financial, or safety-critical use.
- The training mixture includes distilled or synthetic components, so the model's behavior may inherit stylistic artifacts from the teacher models.

## Source

- Upstream codebase: https://github.com/jingyaogong/minimind

## License

This model card uses `cc-by-nc-4.0` as a conservative choice. The upstream MiniMind dataset documentation mentions mixed source licenses, and parts of the training pipeline are under non-commercial terms. Review your exact data provenance before using or relicensing this model for commercial purposes.