Initialize the project; model provided by the ModelHub XC community
Model: Alrightlone/minimind-63M-full-sft-Junhan Source: Original Platform
105
README.md
Normal file
@@ -0,0 +1,105 @@
---
language:
- en
- zh
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: text-generation
tags:
- minimind
- causal-lm
- chat
- text-generation
- sft
- qwen3
---

# minimind-63M-full-sft-Junhan

This repository contains a 63.9M-parameter dense MiniMind chat model, converted to a checkpoint that loads directly with `transformers`.

## Model Summary

- Architecture: dense decoder-only causal LM
- Exported architecture name: `Qwen3ForCausalLM`
- Original training codebase: MiniMind
- Parameters: 63.9M
- Hidden size: 768
- Layers: 8
- Attention heads: 8
- KV heads: 4
- Vocab size: 6400
- Max position embeddings: 32768
- RoPE theta: 1e6
- MoE: no
- Checkpoint type: full-parameter SFT

The model starts from a MiniMind pretraining checkpoint and was then fine-tuned on all parameters with the MiniMind SFT pipeline. The exported folder was produced from the local `full_sft_768.pth` checkpoint using `scripts/convert_model.py`.
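
The attention figures above (8 attention heads sharing 4 KV heads across 8 layers) determine how much memory the key/value cache needs at inference time. The following is a rough back-of-the-envelope sketch, not a measurement. It assumes the per-head dimension equals hidden size divided by attention heads (the exported `config.json` may set a different `head_dim`) and that cache values are stored in 16-bit precision; the function name `kv_cache_bytes` is illustrative only.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The factor 2 covers the separate key and value tensors in every layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Values taken from the Model Summary list.
hidden_size, num_heads, num_kv_heads, num_layers = 768, 8, 4, 8

# Assumption: head_dim is derived here, not read from config.json.
head_dim = hidden_size // num_heads
# Number of query heads that share each KV head (grouped-query attention).
queries_per_kv_head = num_heads // num_kv_heads

per_token = kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len=1)
full_context = kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len=32768)

print(f"head_dim={head_dim}, queries per KV head={queries_per_kv_head}")
print(f"KV cache per token: {per_token} bytes")
print(f"KV cache at 32768 tokens: {full_context / 2**20:.0f} MiB")
```

Under these assumptions the cache costs 12 KiB per token, so filling the full 32768-token position range would take about 384 MiB for a single sequence, several times the size of the weights themselves.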

## Training Notes

- Base training pipeline: MiniMind
- SFT training script: `trainer/train_full_sft.py`
- SFT data used locally: `sft_t2t_mini.jsonl`
- Typical SFT sequence length in this setup: `max_seq_len=768`

The upstream MiniMind SFT data mixes general instruction-following samples with some tool-calling and reasoning-style samples. As a result, this checkpoint is mainly a lightweight chat model, not a specialized tool-use or reasoning model.
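
To illustrate what a fixed `max_seq_len=768` implies for each training sample, here is a minimal sketch of truncating and right-padding a tokenized sequence. It is not the MiniMind implementation: the helper name `fit_to_length` and the padding id `0` are hypothetical, and the actual script may handle long samples differently.

```python
def fit_to_length(token_ids, max_seq_len=768, pad_id=0):
    """Truncate or right-pad token ids; return (ids, attention_mask)."""
    # Keep at most max_seq_len tokens from the start of the sample.
    kept = list(token_ids[:max_seq_len])
    mask = [1] * len(kept)
    # Fill the remainder with padding that the mask marks as ignorable.
    padding = max_seq_len - len(kept)
    return kept + [pad_id] * padding, mask + [0] * padding


# pad_id=0 is a placeholder; use the tokenizer's real padding id in practice.
long_ids, long_mask = fit_to_length(range(1000))
short_ids, short_mask = fit_to_length([5, 6], max_seq_len=4)

print(len(long_ids), sum(long_mask))
print(short_ids, short_mask)
```

Samples longer than the limit lose their tail, so conversations that exceed 768 tokens contribute only their opening portion under this scheme.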

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository ID as listed for this model; change it if you load from a mirror or a local path.
repo_id = "Alrightlone/minimind-63M-full-sft-Junhan"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",  # requires the `accelerate` package
)

messages = [
    # "Hello, please introduce yourself."
    {"role": "user", "content": "你好,介绍一下你自己。"}
]

# return_dict=True returns input_ids together with the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```
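
For multi-turn chat, the `messages` list has to carry the whole conversation, with each reply appended before the next user turn. The sketch below shows only that bookkeeping. `chat_turn` and `echo_reply` are hypothetical names, and `echo_reply` is a stand-in for a function that would wrap `apply_chat_template`, `generate`, and `decode` as in the example above; it also assumes the exported chat template accepts alternating user and assistant turns.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    history: List[Message],
    user_text: str,
    reply_fn: Callable[[List[Message]], str],
) -> str:
    """Append the user turn, obtain a reply, and record it in the history."""
    history.append({"role": "user", "content": user_text})
    reply = reply_fn(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def echo_reply(messages: List[Message]) -> str:
    # Placeholder for real generation: replace with a call into the model.
    return f"(reply to: {messages[-1]['content']})"


history: List[Message] = []
chat_turn(history, "Hello", echo_reply)
chat_turn(history, "What can you do?", echo_reply)

for message in history:
    print(message["role"], "->", message["content"])
```

Because the model is small, keeping the history short is likely to matter more than it would for a larger model; trimming older turns is one option.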

## Intended Use

- Lightweight chat experiments
- Small-model SFT baselines
- Educational and debugging purposes
- Simple local inference and deployment tests

## Limitations

- This is a very small model, so factuality, planning, and reasoning ability are limited.
- Tool-use style may appear in some responses, but robustness is limited.
- The model is not suitable for high-stakes medical, legal, financial, or safety-critical use.
- The training mixture includes distilled or synthetic components, so behavior may inherit teacher-model style artifacts.

## Source

- Upstream codebase: https://github.com/jingyaogong/minimind

## License

This model card uses `cc-by-nc-4.0` conservatively because the upstream MiniMind dataset documentation mentions mixed source licenses, including non-commercial terms in parts of the training pipeline. Review your exact data provenance before using or relicensing this model for commercial scenarios.