
# Introduction

[OpenBMB Technical Blog Series](https://openbmb.vercel.app/)

MiniCPM-MoE-8x2B is a decoder-only, transformer-based generative language model.

MiniCPM-MoE-8x2B adopts a Mixture-of-Experts (MoE) architecture with 8 experts per layer, of which 2 are activated for each token.
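
To make the "2 of 8 experts per token" idea concrete, here is a minimal pure-Python sketch of top-2 routing. It is an illustration only, not the model's actual implementation: the names `route_token` and `moe_layer`, the toy experts, and the choice to normalize weights with a softmax over the two selected logits are all assumptions made for this example.

```python
import math

NUM_EXPERTS = 8  # experts per layer (from the description above)
TOP_K = 2        # experts activated per token (from the description above)


def route_token(router_logits):
    """Return [(expert_index, weight), ...] for the TOP_K highest-scoring experts.

    Assumption: the weights are a softmax over the selected logits only.
    """
    assert len(router_logits) == NUM_EXPERTS
    # Indices of the TOP_K largest logits (ties keep their original order).
    top = sorted(range(NUM_EXPERTS), key=lambda i: router_logits[i], reverse=True)[:TOP_K]
    # Numerically stable softmax restricted to the selected experts.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x, experts, router_logits):
    """Weighted sum of the outputs of the selected experts for a scalar input x."""
    return sum(w * experts[i](x) for i, w in route_token(router_logits))


# Hypothetical toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, k=i + 1: k * x for i in range(NUM_EXPERTS)]
logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]
selected = route_token(logits)
output = moe_layer(1.0, experts, logits)
```

Only the two selected experts are evaluated for the token, which is why an MoE layer can hold many more parameters than it uses per token.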

# Usage
This version of the model has been instruction-tuned, without RLHF or similar methods. The chat template is applied automatically.
``` python
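from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Fix the random seed so that sampling is reproducible.
torch.manual_seed(0)

path = 'openbmb/MiniCPM-MoE-8x2B'
tokenizer = AutoTokenizer.from_pretrained(path)
# Load the weights in bfloat16 on the GPU; trust_remote_code=True lets transformers run the repo's custom model code.
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

# Prompt (Chinese): "Which is the highest mountain in Shandong Province? Is it taller or shorter than Huangshan, and by how much?"
response, history = model.chat(tokenizer, "山东省最高的山是哪座山, 它比黄山高还是矮？差距多少？", temperature=0.8, top_p=0.8)
print(response)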
```

# Note
1. You can also run inference with [vLLM](https://github.com/vllm-project/vllm) (>=0.4.1), which is compatible with this repo and offers much higher inference throughput.
2. The model weights in this repo are stored in bfloat16. Manual conversion is needed for other dtypes.
3. For more details, please refer to our [GitHub repo](https://github.com/OpenBMB/MiniCPM).
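
As a rough sketch of the vLLM route mentioned in the notes, the snippet below wraps vLLM's offline `LLM` / `SamplingParams` API in a helper. The helper name `generate_with_vllm`, the `SAMPLING` dictionary, and the `max_tokens` value are assumptions made for this example. Note that `LLM.generate` takes raw prompt strings, so any chat formatting the model expects would have to be applied by the caller; this sketch does not attempt it.

```python
# Sampling settings mirror the transformers example; max_tokens is an assumed value.
SAMPLING = {"temperature": 0.8, "top_p": 0.8, "max_tokens": 256}


def generate_with_vllm(prompts, model_path="openbmb/MiniCPM-MoE-8x2B"):
    """Generate one completion per prompt with vLLM (requires a GPU and vllm>=0.4.1)."""
    # Imported lazily so that this module can be loaded without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_path, trust_remote_code=True, dtype="bfloat16")
    outputs = llm.generate(prompts, SamplingParams(**SAMPLING))
    # Each RequestOutput holds a list of candidates; keep the first one's text.
    return [out.outputs[0].text for out in outputs]


# Example call (not executed here, since it downloads and loads the model):
# texts = generate_with_vllm(["Which is the highest mountain in Shandong Province?"])
```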

# Statement
1. As a language model, MiniCPM-MoE-8x2B generates content by learning from a vast amount of text.
2. However, it does not possess the ability to comprehend or express personal opinions or value judgments.
3. Any content generated by MiniCPM-MoE-8x2B does not represent the viewpoints or positions of the model developers.
4. Therefore, when using content generated by MiniCPM-MoE-8x2B, users should take full responsibility for evaluating and verifying it on their own.

Project initialized; model provided by the ModelHub XC community. Model: OpenBMB/MiniCPM-MoE-8x2B. Source: Original Platform. 2026-05-27 04:00:13 +08:00