  • language: en, zh
  • license: cc-by-nc-4.0
  • library_name: transformers
  • pipeline_tag: text-generation
  • tags: minimind, causal-lm, chat, text-generation, sft, qwen3

minimind-63M-full-sft-Junhan

This repository contains a 63.9M-parameter dense MiniMind chat model, converted to a checkpoint that loads directly with the Hugging Face transformers library.

Model Summary

  • Architecture: dense decoder-only causal LM
  • Exported architecture name: Qwen3ForCausalLM
  • Original training codebase: MiniMind
  • Parameters: 63.9M
  • Hidden size: 768
  • Layers: 8
  • Attention heads: 8
  • KV heads: 4
  • Vocab size: 6400
  • Max position embeddings: 32768
  • RoPE theta: 1e6
  • MoE: no
  • Checkpoint type: full-parameter SFT
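
As a sanity check on the figures above, the parameter count can be roughly reproduced from the listed dimensions. This is a minimal sketch, not the exported configuration: the card does not state the head dimension, the MLP intermediate size, whether embeddings are tied, or whether Qwen3-style per-head Q/K norms are present, so those values are assumptions chosen for illustration and marked as such in the code.

```python
# Rough parameter count for the dense decoder described in the summary above.
# Values marked ASSUMPTION are not stated in this model card.

hidden_size = 768
num_layers = 8
num_heads = 8
num_kv_heads = 4
vocab_size = 6400

head_dim = hidden_size // num_heads  # ASSUMPTION: 96, i.e. hidden size / heads
intermediate_size = 2432             # ASSUMPTION: MLP width, not given in the card
tie_embeddings = True                # ASSUMPTION: input/output embeddings shared


def count_parameters() -> int:
    # Attention: Q and O are full width; K and V use the narrower KV-head width.
    q_proj = hidden_size * num_heads * head_dim
    k_proj = hidden_size * num_kv_heads * head_dim
    v_proj = hidden_size * num_kv_heads * head_dim
    o_proj = num_heads * head_dim * hidden_size
    attention = q_proj + k_proj + v_proj + o_proj

    # Gated MLP: gate, up, and down projections (ASSUMPTION: no bias terms).
    mlp = 3 * hidden_size * intermediate_size

    # Two RMSNorm weights per layer plus per-head Q/K norms (ASSUMPTION: Qwen3-style).
    norms = 2 * hidden_size + 2 * head_dim

    per_layer = attention + mlp + norms
    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size

    return num_layers * per_layer + embeddings + lm_head + final_norm


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate comes to 63,912,192 parameters, which is consistent with the stated 63.9M. With different assumed values the total shifts accordingly, so the config.json in the exported folder remains the authoritative source.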

This model was initialized from a MiniMind pretraining checkpoint and then fine-tuned on all parameters with the MiniMind SFT pipeline. The exported folder was produced from the local full_sft_768.pth checkpoint using scripts/convert_model.py.

Training Notes

  • Base training pipeline: MiniMind
  • SFT training script: trainer/train_full_sft.py
  • SFT data used locally: sft_t2t_mini.jsonl
  • Typical SFT sequence length in this setup: max_seq_len=768

The upstream MiniMind SFT data mixes general instruction-following samples with some tool-calling and reasoning-style samples. Because those specialized samples are only part of the mix, this checkpoint is best treated as a lightweight general chat model, not a specialized tool-use or reasoning model.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace YOUR_USERNAME with the account that hosts this checkpoint.
repo_id = "YOUR_USERNAME/minimind-63M-full-sft-Junhan"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",  # requires the accelerate package
)

messages = [
    # "Hello, please introduce yourself."
    {"role": "user", "content": "你好,介绍一下你自己。"}
]

# return_dict=True yields both input_ids and attention_mask.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
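
For multi-turn chat, one common approach is to keep appending turns to the same messages list and re-apply the chat template on every call. The sketch below shows only that bookkeeping. `chat_turn` and `reply_fn` are hypothetical names introduced here, and `echo_reply` is a stand-in so the example runs without downloading a model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    history: List[Message],
    user_text: str,
    reply_fn: Callable[[List[Message]], str],
) -> str:
    """Append a user turn, obtain a reply, and record it in the history."""
    history.append({"role": "user", "content": user_text})
    reply = reply_fn(history)
    history.append({"role": "assistant", "content": reply})
    return reply


# Hypothetical stand-in for the model call, so the sketch is self-contained.
def echo_reply(messages: List[Message]) -> str:
    return f"(reply to: {messages[-1]['content']})"


history: List[Message] = []
chat_turn(history, "Hello, introduce yourself.", echo_reply)
chat_turn(history, "What can you help with?", echo_reply)
print([m["role"] for m in history])
```

In real use, `reply_fn` would wrap the tokenize, generate, and decode steps from the usage snippet above. Because SFT in this setup typically used max_seq_len=768, long histories may degrade quality, so trimming older turns is worth considering.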

Intended Use

  • Lightweight chat experiments
  • Small-model SFT baselines
  • Educational and debugging purposes
  • Simple local inference and deployment tests
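
For local deployment tests, a back-of-the-envelope estimate of weight memory can help with sizing. The sketch below uses the 63.9M parameter count from the model summary. The helper name `weight_memory_mib` is hypothetical, and the estimate covers weights only, not activations, the KV cache, or framework overhead.

```python
# Approximate weight memory for a 63.9M-parameter model.
NUM_PARAMETERS = 63.9e6  # from the model summary

# Bytes per parameter for common PyTorch dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_mib(num_parameters: float, dtype: str) -> float:
    """Return the approximate weight size in MiB for the given dtype."""
    return num_parameters * BYTES_PER_PARAM[dtype] / (1024 ** 2)


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_mib(NUM_PARAMETERS, dtype):.0f} MiB")
```

This works out to roughly 244 MiB in float32 and 122 MiB in the 16-bit dtypes, which is small enough for CPU-only inference on most machines.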

Limitations

  • This is a very small model, so factuality, planning, and reasoning ability are limited.
  • Some responses may imitate tool-calling formats, but this behavior is not robust.
  • The model is not suitable for high-stakes medical, legal, financial, or safety-critical use.
  • The training mixture includes distilled or synthetic components, so behavior may inherit teacher-model style artifacts.

Source

License

This model card uses cc-by-nc-4.0 conservatively because the upstream MiniMind dataset documentation mentions mixed source licenses, including non-commercial terms in parts of the training pipeline. Review your exact data provenance before using or relicensing this model for commercial scenarios.

Description
Model synced from source: Alrightlone/minimind-63M-full-sft-Junhan