---
language:
- zh
license: apache-2.0
library_name: transformers
tags:
- minimind
- qwen3-moe
- moe
- chat
- text-generation
- ascend
pipeline_tag: text-generation
---
# MiniMind3-Ascend-MoE
This is a MoE chat model exported from the **MiniMind3-Ascend** training pipeline. It is compatible with standard Transformers inference and targets scenarios that call for higher capacity and stronger expressiveness.
## Model Information
- Model name: `fzkun/minimind3-ascend-moe`
- Architecture: MoE
- Export compatibility: `Qwen3MoeForCausalLM`
- Parameters: ~**198M**
- Active parameters: ~**64M**
- Key configuration (verifiable from the config; see the sketch after this list):
  - `hidden_size = 768`
  - `num_hidden_layers = 8`
  - `num_experts = 4`
  - `num_experts_per_tok = 1`
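These fields can be read directly from the repository's `config.json` without downloading the weights; a minimal sketch using the standard Transformers `AutoConfig` API (the expected values in the comments simply restate the list above):

```python
from transformers import AutoConfig

# Load only the config (no weights) to check the MoE hyperparameters.
config = AutoConfig.from_pretrained("fzkun/minimind3-ascend-moe", trust_remote_code=True)

print(config.hidden_size)          # expected: 768
print(config.num_hidden_layers)    # expected: 8
print(config.num_experts)          # expected: 4
print(config.num_experts_per_tok)  # expected: 1 (top-1 routing: one active expert per token)
```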
## Files
The repository contains:
- `config.json`
- `generation_config.json`
- `model.safetensors`
- `tokenizer.json`
- `tokenizer_config.json`
- `special_tokens_map.json`
- `chat_template.jinja`
## Usage
### Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "fzkun/minimind3-ascend-moe"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Build the prompt with the bundled chat template (chat_template.jinja).
messages = [{"role": "user", "content": "你好"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")

# Generate, then decode only the newly generated tokens.
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
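Greedy decoding (the default above) can be repetitive for chat; a hedged variant using standard `generate` sampling arguments follows. The `temperature` and `top_p` values are illustrative assumptions, not tuned recommendations shipped with the model:

```python
# Sampling variant (illustrative hyperparameters, not official recommendations).
out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,   # sample instead of greedy decoding
    temperature=0.7,  # assumed value; adjust to taste
    top_p=0.9,        # nucleus sampling cutoff (assumed)
)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```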
## Benchmark Results
Evaluation environment:
- Ascend 910B
- Single `npu:0` device
- `batch_size = 16`
| ceval | cmmlu | arc | piqa | openbookqa | hellaswag | siqa |
|---:|---:|---:|---:|---:|---:|---:|
| 23.77 | 24.88 | 30.30 | 51.63 | 26.00 | 28.58 | 34.08 |
Notes:
- `ceval / cmmlu / arc / piqa / openbookqa / hellaswag` use `acc_norm`
- `siqa` (i.e. `social_iqa`) uses `acc`
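The scores above were measured on Ascend hardware. For orientation, here is a minimal inference sketch assuming the Ascend PyTorch adapter `torch_npu` is installed; the adapter and the device string are environment assumptions, not files in this repository:

```python
import torch
import torch_npu  # Ascend PyTorch adapter; assumed installed, not bundled here

# Move model and inputs to the first Ascend NPU, matching the `npu:0` setup above.
device = "npu:0"
model = model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}
out = model.generate(**inputs, max_new_tokens=256)
```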
## Additional Notes
- The MoE version has higher capacity and performs better on some benchmarks
- Corresponding SwanLab experiment records: <https://swanlab.cn/@fzkun/MiniMind3/overview>