Initialize project; model provided by the ModelHub XC community

Model: RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf
Source: Original Platform
ModelHub XC
2026-06-04 04:10:18 +08:00
commit 9cb6e934b0
24 changed files with 263 additions and 0 deletions

57
.gitattributes vendored Normal file
@@ -0,0 +1,57 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.IQ3_S.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q3_K.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.IQ4_NL.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q4_K.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q4_1.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q5_0.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q5_K.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q5_1.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
OLMoE-1B-7B-0924-Instruct.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
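
Each line above pairs a path pattern with the attributes `filter=lfs diff=lfs merge=lfs -text`, so matching files are stored through Git LFS and treated as non-text. Note that there is no `*.gguf` wildcard: every GGUF file is listed by its exact name. The following is a minimal sketch of the matching, assuming that patterns without a slash are compared against the file's base name; `fnmatchcase` only approximates Git's own pattern rules, and the helper name `is_lfs_tracked` and the shortened pattern list are illustrative.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A small subset of the patterns listed in .gitattributes above.
PATTERNS = [
    "*.bin",
    "*.safetensors",
    "*tfevents*",
    "OLMoE-1B-7B-0924-Instruct.Q4_K_M.gguf",
]


def is_lfs_tracked(path, patterns=PATTERNS):
    """Approximate check: does the base name match any slash-free pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


print(is_lfs_tracked("OLMoE-1B-7B-0924-Instruct.Q4_K_M.gguf"))
print(is_lfs_tracked("README.md"))
```

Under these assumptions a GGUF file that is not listed by name would not be routed through LFS, which is why each quantization gets its own line.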

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4c5dc5047bb4c1918f98586e836e4ff7e6e2940bcd412c3de6e44e4b614b589a
size 3076543008

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:89b97c831d34972aa529204ece989da7e07b4d7a82b631e05812faa0dd2900b7
size 3023065632

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:343793bac4f673aa04d682eaa118df12f9a940f4e25171da9c0ba2ae8493c6dc
size 2865779232

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6ffc04ca0f270bd2107ee27b11ac72fc159d3229399da50044031e410ad2b193
size 3961592352

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0a70decf4a2f15ba103b3695d9b2946ce2c6e323251bf420a0eb6f90ea77d9f8
size 3757046304

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:00615d4bfaa7b3953d331224191beda8f233debc7c86682c04903c9ff7cea424
size 2562763296

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aafa6b3e5cff5bbf2bf0424cfb47fd6d9d921df9aa370f9b4bbbbd1a2d943203
size 3343929888

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ce91c53694c99ef6a38b5f58b92b5a0db4c76f982d2fc75f98386fe3e04bfbbf
size 3611316768

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aafa6b3e5cff5bbf2bf0424cfb47fd6d9d921df9aa370f9b4bbbbd1a2d943203
size 3343929888

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c19b7d6e6fffa18c1841e26976000da4a72676b9111f6d8abf894d86cfbedc49
size 3023065632

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b47862e9bb94b3d2e41bbb5b37b23560acac6fe06e7ca1f4dcaabbaefee2c35c
size 3928037920

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eb0ce217b826c4b1236a6e6ea0e0eb98b185bed3e40e043b14cd3795e3283532
size 4353907232

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4cc802c4643aa039d7bb8ed4235045290f83271cbbd86e3535be073e0e833116
size 4213512736

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4cc802c4643aa039d7bb8ed4235045290f83271cbbd86e3535be073e0e833116
size 4213512736

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d1036f3533e6fe9f9c6692719d4ccb7dc5c0c8b79b050c475f66dd03a13d4a0e
size 3963689504

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:72e5a14a6c629ec40cbc0ec83bc8e52a78717a63364a46521e5506eb1b62531c
size 4779776544

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e9c3645defc5e50ab435d0463d56b14ef90f6f99a7d27baf69531430389e93d1
size 5205645856

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:454b257371b8fe56d3644a09eedb5e1d565421edf106a8a229c98c3d7d32a419
size 4926839328

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:454b257371b8fe56d3644a09eedb5e1d565421edf106a8a229c98c3d7d32a419
size 4926839328

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8b98be07a6657f351678f7c9cacbccdb160820afa05056f78c984c9a64ca1a56
size 4779776544

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3523bfb51929e78ad569c6c064eb053b6513b32878e9962c9f0e34329c4e6132
size 5684748832

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9bb64524c417c93d374380d3e82f85c3a9b41667cbc98608b984b9ef4d1cd513
size 7359943200
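
Each hunk above is a Git LFS pointer file: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes of the real file. The sketch below parses the last pointer and converts its size to binary gigabytes. The helper name `parse_lfs_pointer` is illustrative, and the parsing assumes only the three keys shown in these hunks.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


# Pointer content copied from the last hunk above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9bb64524c417c93d374380d3e82f85c3a9b41667cbc98608b984b9ef4d1cd513
size 7359943200
"""

info = parse_lfs_pointer(POINTER)
size_gib = info["size"] / 2**30
print(f"{info['algo']} {info['digest'][:12]}... {size_gib:.2f} GiB")
```

The result, about 6.85 GiB, matches the 6.85GB that the README lists for the Q8_0 file, which suggests (but does not prove) that the README sizes are binary gigabytes and that this pointer belongs to that file.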

140
README.md Normal file
@@ -0,0 +1,140 @@
Quantization made by Richard Erkhov.
[Github](https://github.com/RichardErkhov)
[Discord](https://discord.gg/pvy7H8DZMG)
[Request more models](https://github.com/RichardErkhov/quant_request)
OLMoE-1B-7B-0924-Instruct - GGUF
- Model creator: https://huggingface.co/allenai/
- Original model: https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct/
| Name | Quant method | Size |
| ---- | ---- | ---- |
| [OLMoE-1B-7B-0924-Instruct.Q2_K.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q2_K.gguf) | Q2_K | 2.39GB |
| [OLMoE-1B-7B-0924-Instruct.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.IQ3_XS.gguf) | IQ3_XS | 2.67GB |
| [OLMoE-1B-7B-0924-Instruct.IQ3_S.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.IQ3_S.gguf) | IQ3_S | 2.82GB |
| [OLMoE-1B-7B-0924-Instruct.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q3_K_S.gguf) | Q3_K_S | 2.82GB |
| [OLMoE-1B-7B-0924-Instruct.IQ3_M.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.IQ3_M.gguf) | IQ3_M | 2.87GB |
| [OLMoE-1B-7B-0924-Instruct.Q3_K.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q3_K.gguf) | Q3_K | 3.11GB |
| [OLMoE-1B-7B-0924-Instruct.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q3_K_M.gguf) | Q3_K_M | 3.11GB |
| [OLMoE-1B-7B-0924-Instruct.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q3_K_L.gguf) | Q3_K_L | 3.36GB |
| [OLMoE-1B-7B-0924-Instruct.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.IQ4_XS.gguf) | IQ4_XS | 3.5GB |
| [OLMoE-1B-7B-0924-Instruct.Q4_0.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q4_0.gguf) | Q4_0 | 3.66GB |
| [OLMoE-1B-7B-0924-Instruct.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.IQ4_NL.gguf) | IQ4_NL | 3.69GB |
| [OLMoE-1B-7B-0924-Instruct.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q4_K_S.gguf) | Q4_K_S | 3.69GB |
| [OLMoE-1B-7B-0924-Instruct.Q4_K.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q4_K.gguf) | Q4_K | 3.92GB |
| [OLMoE-1B-7B-0924-Instruct.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q4_K_M.gguf) | Q4_K_M | 3.92GB |
| [OLMoE-1B-7B-0924-Instruct.Q4_1.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q4_1.gguf) | Q4_1 | 4.05GB |
| [OLMoE-1B-7B-0924-Instruct.Q5_0.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q5_0.gguf) | Q5_0 | 4.45GB |
| [OLMoE-1B-7B-0924-Instruct.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q5_K_S.gguf) | Q5_K_S | 4.45GB |
| [OLMoE-1B-7B-0924-Instruct.Q5_K.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q5_K.gguf) | Q5_K | 4.59GB |
| [OLMoE-1B-7B-0924-Instruct.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q5_K_M.gguf) | Q5_K_M | 4.59GB |
| [OLMoE-1B-7B-0924-Instruct.Q5_1.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q5_1.gguf) | Q5_1 | 4.85GB |
| [OLMoE-1B-7B-0924-Instruct.Q6_K.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q6_K.gguf) | Q6_K | 5.29GB |
| [OLMoE-1B-7B-0924-Instruct.Q8_0.gguf](https://huggingface.co/RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf/blob/main/OLMoE-1B-7B-0924-Instruct.Q8_0.gguf) | Q8_0 | 6.85GB |
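
The links in the table point at `/blob/main/` pages. To fetch a file directly, Hugging Face serves the raw content under `/resolve/<revision>/` instead, and the `huggingface_hub` package offers `hf_hub_download` for the same purpose. A minimal sketch, assuming the naming scheme visible in the table; the helper names are illustrative, and `download_gguf` needs `huggingface_hub` installed plus network access.

```python
REPO_ID = "RichardErkhov/allenai_-_OLMoE-1B-7B-0924-Instruct-gguf"
BASE_NAME = "OLMoE-1B-7B-0924-Instruct"


def gguf_filename(quant: str) -> str:
    """Build the file name for a quant method from the table, e.g. 'Q4_K_M'."""
    return f"{BASE_NAME}.{quant}.gguf"


def gguf_download_url(quant: str, revision: str = "main") -> str:
    """Direct-download URL: same path as the table links, with 'resolve' for 'blob'."""
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{gguf_filename(quant)}"


def download_gguf(quant: str) -> str:
    """Download into the local Hugging Face cache and return the local path."""
    from huggingface_hub import hf_hub_download  # third-party; imported lazily

    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))


print(gguf_download_url("Q4_K_M"))
```

Which quantization to pick depends on available memory; the table sizes give a lower bound on what the file alone will occupy.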
Original model description:
---
license: apache-2.0
language:
- en
tags:
- moe
- olmo
- olmoe
co2_eq_emissions: 1
datasets:
- allenai/ultrafeedback_binarized_cleaned
base_model: allenai/OLMoE-1B-7B-0924-SFT
library_name: transformers
---
<img alt="OLMoE Logo." src="olmoe-logo.png" width="250px">
# Model Summary
> OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters released in September 2024 (0924) that has been adapted via SFT and DPO from [OLMoE-1B-7B](https://hf.co/allenai/OLMoE-1B-7B-0924). It yields state-of-the-art performance among models with a similar cost (1B) and is competitive with much larger models like Llama2-13B-Chat. OLMoE is 100% open-source.
This information and more can also be found on the [**OLMoE GitHub repository**](https://github.com/allenai/OLMoE).
- **Paper**: https://arxiv.org/abs/2409.02060
- **Pretraining** [Checkpoints](https://hf.co/allenai/OLMoE-1B-7B-0924), [Code](https://github.com/allenai/OLMo/tree/Muennighoff/MoE), [Data](https://huggingface.co/datasets/allenai/OLMoE-mix-0924) and [Logs](https://wandb.ai/ai2-llm/olmoe/reports/OLMoE-1B-7B-0924--Vmlldzo4OTcyMjU3).
- **SFT (Supervised Fine-Tuning)** [Checkpoints](https://huggingface.co/allenai/OLMoE-1B-7B-0924-SFT), [Code](https://github.com/allenai/open-instruct/tree/olmoe-sft), [Data](https://hf.co/datasets/allenai/tulu-v3.1-mix-preview-4096-OLMoE) and [Logs](https://github.com/allenai/OLMoE/blob/main/logs/olmoe-sft-logs.txt).
- **DPO/KTO (Direct Preference Optimization/Kahneman-Tversky Optimization)** [Checkpoints](https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct), [Preference Data](https://hf.co/datasets/allenai/ultrafeedback_binarized_cleaned), [DPO code](https://github.com/allenai/open-instruct/tree/olmoe-sft), [KTO code](https://github.com/Muennighoff/kto/blob/master/kto.py) and [Logs](https://github.com/allenai/OLMoE/blob/main/logs/olmoe-dpo-logs.txt).
# Use
Install `torch` and `transformers` (install `transformers` **from source** until a release includes [this PR](https://github.com/huggingface/transformers/pull/32406)), then run:
```python
from transformers import OlmoeForCausalLM, AutoTokenizer
import torch
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
# Load a different checkpoint by passing e.g. `revision="kto"` to `from_pretrained`
model = OlmoeForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-0924-Instruct").to(DEVICE)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-0924-Instruct")
messages = [{"role": "user", "content": "Explain to me like I'm five what is Bitcoin."}]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(DEVICE)
out = model.generate(inputs, max_length=100)
print(tokenizer.decode(out[0]))
"""
<|endoftext|><|user|>
Explain to me like I'm five what is Bitcoin.
<|assistant|>
Bitcoin is like a special kind of money that you can use to buy things online. But unlike regular money, like dollars or euros, Bitcoin isn't printed by governments or banks. Instead, it's created by a special computer program that helps people keep track of it.
Here's how it works: imagine you have a bunch of toys, and you want to
"""
```
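
When running one of the GGUF files in a runtime that expects a raw prompt string, the prompt has to be formatted by hand. The sketch below reproduces the layout visible in the sample output above (`<|endoftext|>`, then `<|user|>` and `<|assistant|>` markers on their own lines). It is inferred from that single-turn output only; the authoritative template is the one shipped with the tokenizer, and multi-turn handling may differ. The function name `format_chat_prompt` is illustrative.

```python
def format_chat_prompt(messages, bos="<|endoftext|>"):
    """Format messages like the sample output above (assumed layout)."""
    parts = [bos]
    for message in messages:
        # Each turn: a role marker line, then the content on the next line.
        parts.append(f"<|{message['role']}|>\n{message['content']}\n")
    # Trailing assistant marker, mirroring add_generation_prompt=True.
    parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Explain to me like I'm five what is Bitcoin."}]
prompt = format_chat_prompt(messages)
print(prompt)
```

Where the runtime can read the chat template embedded in the model file, prefer that over manual formatting.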
Branches:
- `main`: DPO preference-tuned version of the `main` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT
- `load-balancing`: Ablation with load balancing loss during DPO starting from the `load-balancing` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT
- `non-annealed`: Ablation starting from the `non-annealed` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT which is an SFT of the pretraining checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/allenai/OLMoE-1B-7B-0924)
- `kto`: Ablation using KTO instead of DPO. This branch is the checkpoint after 5,000 steps with the RMS optimizer. The other `kto*` branches correspond to the other checkpoints mentioned in the paper.
# Evaluation Snapshot
| Task (→) | MMLU | GSM8k | BBH | Human-Eval | Alpaca-Eval 1.0 | XSTest | IFEval | Avg |
|---------------|------|-------|------|------------|-----------------|--------|--------|------|
| **Setup (→)** | 0-shot | 8-shot CoT | 3-shot | 0-shot | 0-shot | 0-shot | 0-shot | |
| **Metric (→)** | EM | EM | EM | Pass@10 | %win | F1 | Loose Acc | |
| | | | | | | | | |
| OLMo-1B (0724) | 25.0 | 7.0 | 22.5 | 16.0 | - | 67.6 | 20.5 | - |
| +SFT | 36.0 | 12.5 | 27.2 | 21.2 | 41.5 | 81.9 | 26.1 | 35.9 |
| +DPO | 36.7 | 12.5 | 30.6 | 22.0 | 50.9 | 79.8 | 24.2 | 37.4 |
| OLMo-7B (0724) | 50.8 | 32.5 | 36.9 | 32.3 | - | 80.8 | 19.6 | - |
| +SFT | 54.2 | 25.0 | 35.7 | 38.5 | 70.9 | 86.1 | 39.7 | 49.3 |
| +DPO | 52.8 | 9.0 | 16.6 | 35.0 | 83.5 | **87.5** | 37.9 | 49.1 |
| JetMoE-2B-9B | 45.6 | 43.0 | 37.2 | 54.6 | - | 68.2 | 20.0 | - |
| +SFT | 46.1 | 53.5 | 35.6 | 64.8 | 69.3 | 55.6 | 30.5 | 50.4 |
| DeepSeek-3B-16B | 37.7 | 18.5 | 39.4 | 48.3 | - | 65.9 | 13.5 | - |
| +Chat | 48.5 | 46.5 | **40.8** | **70.1** | 74.8 | 85.6 | 32.3 | 57.0 |
| Qwen1.5-3B-14B | **60.4** | 13.5 | 27.2 | 60.2 | - | 73.4 | 20.9 | - |
| +Chat | 58.9 | **55.5** | 21.3 | 59.7 | 83.9 | 85.6 | 36.2 | 57.3 |
| **OLMoE (This Model)** | 49.8 | 3.0 | 33.6 | 22.4 | - | 59.7 | 16.6 | - |
| **+SFT** | 51.4 | 40.5 | 38.0 | 51.6 | 69.2 | 84.1 | 43.3 | 54.0 |
| **+DPO** | 51.9 | 45.5 | 37.0 | 54.8 | **84.0** | 82.6 | **48.1** | **57.7** |
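
The `Avg` column appears to be the unweighted mean of the seven task scores, which is why rows with a missing Alpaca-Eval value show `-`. A quick check on the two OLMoE rows, with scores copied from the table (the variable names are illustrative):

```python
# Scores in table order: MMLU, GSM8k, BBH, Human-Eval, Alpaca-Eval 1.0, XSTest, IFEval
SCORES = {
    "OLMoE +SFT": [51.4, 40.5, 38.0, 51.6, 69.2, 84.1, 43.3],
    "OLMoE +DPO": [51.9, 45.5, 37.0, 54.8, 84.0, 82.6, 48.1],
}
REPORTED_AVG = {"OLMoE +SFT": 54.0, "OLMoE +DPO": 57.7}

# Unweighted mean over the seven tasks, rounded to one decimal like the table.
computed = {name: round(sum(vals) / len(vals), 1) for name, vals in SCORES.items()}

for name, avg in computed.items():
    print(f"{name}: computed {avg}, reported {REPORTED_AVG[name]}")
```

Because every task carries equal weight, the DPO gain in the average comes mostly from the large Alpaca-Eval improvement (69.2 to 84.0).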
# Citation
```bibtex
@misc{muennighoff2024olmoeopenmixtureofexpertslanguage,
title={OLMoE: Open Mixture-of-Experts Language Models},
author={Niklas Muennighoff and Luca Soldaini and Dirk Groeneveld and Kyle Lo and Jacob Morrison and Sewon Min and Weijia Shi and Pete Walsh and Oyvind Tafjord and Nathan Lambert and Yuling Gu and Shane Arora and Akshita Bhagia and Dustin Schwenk and David Wadden and Alexander Wettig and Binyuan Hui and Tim Dettmers and Douwe Kiela and Ali Farhadi and Noah A. Smith and Pang Wei Koh and Amanpreet Singh and Hannaneh Hajishirzi},
year={2024},
eprint={2409.02060},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.02060},
}
```