---
pipeline_tag: text-generation
library_name: transformers
license: apache-2.0
tags:
- mixtral
- moe
- reasoning
---
# Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
This repository contains model checkpoints from the paper [Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks](https://huggingface.co/papers/2508.18672).
For more details, including code and evaluation procedures, please refer to the official GitHub repository: [https://github.com/rioyokotalab/optimal-sparsity](https://github.com/rioyokotalab/optimal-sparsity)
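
## How to use

A minimal usage sketch, assuming the checkpoints load as standard `transformers` causal language models (per the `library_name: transformers` and `pipeline_tag: text-generation` metadata above). The repo id below is a placeholder; substitute the actual checkpoint name from this repository.

```python
# Minimal sketch: load a checkpoint and run greedy text generation.
# The model id is a placeholder, not a confirmed checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rioyokotalab/optimal-sparsity-checkpoint"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of Japan is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```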
## How to cite
If you find our work helpful, please feel free to cite the paper:
```bibtex
@inproceedings{
nakamura2026optimal,
title={Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks},
author={Taishi Nakamura and Satoki Ishikawa and Masaki Kawamura and Takumi Okamoto and Daisuke Nohara and Jun Suzuki and Rio Yokota},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=XFw2EPRUUR}
}
```