---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
model-index:
- name: ibm/PowerMoE-3b
  results:
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: ARC
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 58.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: BoolQ
    metrics:
    - name: accuracy
      type: accuracy
      value: 65.0
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Hellaswag
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 71.5
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: OpenBookQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 41.0
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: PIQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 79.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Winogrande
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 65.0
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: MMLU (5 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 42.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: GSM8k (5 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 25.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: math (4 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 14.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: humaneval
    metrics:
    - name: pass@1
      type: pass@1
      value: 20.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: MBPP
    metrics:
    - name: pass@1
      type: pass@1
      value: 32.4
      verified: false
---

## Model Summary
PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters per token and is trained on a mix of open-source and proprietary datasets. Across various benchmarks, including natural language multiple-choice, code generation, and math reasoning, PowerMoE-3B has shown promising results compared to dense models with 2x the active parameters.
Paper: https://arxiv.org/abs/2408.13359
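
To illustrate what "sparsely activates" means, here is a minimal sketch of generic top-k expert routing in plain Python. It is not PowerMoE-3B's actual router: the function names, the four toy experts, and `k=2` are hypothetical choices made for illustration only.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    # indices of the k largest logits, highest first (ties keep original order)
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # softmax over the selected logits only; subtract the max for stability
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the selected experts; unselected experts are never called."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# hypothetical toy setup: 4 scalar "experts", 2 active per token
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
logits = [0.1, 2.0, -1.0, 2.0]
routes = top_k_route(logits, k=2)
y = moe_forward(3.0, experts, logits, k=2)
print(routes, y)
```

Only the experts chosen by the router run for a given token, which is why the number of active parameters per token can be much smaller than the total parameter count.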

## Usage
Note: This model requires installing Hugging Face `transformers` from source.

### Generation
This is a simple example of how to use the **PowerMoE-3b** model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # or "cpu"
model_path = "ibm/PowerMoE-3b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."
# tokenize the text and move the resulting tensors to the device
input_tokens = tokenizer(prompt, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print, in this example the batch size is 1
for i in output:
    print(i)
```
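
For a causal language model, `generate` returns the prompt tokens followed by the newly generated ones, so the printed text above starts with the prompt. If only the continuation is wanted, the sequences can be sliced by the prompt length before decoding. The helper below is a small sketch of that idea; `strip_prompt` is a hypothetical name, not part of `transformers`, and it assumes every sequence in the batch shares the same prompt length (true for the batch of 1 used here).

```python
def strip_prompt(sequences, prompt_length):
    """Drop the first `prompt_length` token ids from each generated sequence."""
    return [list(seq[prompt_length:]) for seq in sequences]

# toy token ids standing in for real model output: 3 prompt tokens + 2 new ones
generated = [[101, 7, 8, 42, 43]]
new_tokens = strip_prompt(generated, prompt_length=3)
print(new_tokens)

# With the objects from the example above, this could be used as:
#   prompt_length = input_tokens["input_ids"].shape[1]
#   ids = model.generate(**input_tokens, max_new_tokens=100)
#   text = tokenizer.batch_decode(strip_prompt(ids.tolist(), prompt_length),
#                                 skip_special_tokens=True)
```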