This model is a compressed version of Qwen/Qwen3-30B-A3B-Instruct-2507.
It is obtained by reducing the number of experts in each MoE layer from 128 to 96 using the REAP baseline method as described in https://bknyaz.github.io/blog/2026/moe/.
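
As a rough illustration of the pruning step, the sketch below keeps the 96 highest-scoring of 128 experts in a single MoE layer. It assumes each expert already has a scalar saliency score; the scores here are random placeholders, the function name `select_experts` is hypothetical, and how REAP actually computes its scores is described in the linked post rather than here.

```python
import random


def select_experts(saliency, keep):
    """Return the indices of the `keep` highest-scoring experts, in ascending order."""
    if not 0 < keep <= len(saliency):
        raise ValueError("keep must be between 1 and the number of experts")
    # Rank expert indices from highest to lowest score.
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    # Restore ascending index order so the surviving experts keep their relative layout.
    return sorted(ranked[:keep])


# Hypothetical per-expert scores for one MoE layer (random placeholders, not real data).
rng = random.Random(0)
scores = [rng.random() for _ in range(128)]

kept = select_experts(scores, keep=96)
print(len(kept), "experts kept,", len(scores) - len(kept), "dropped")
```

In a real checkpoint the router's per-expert outputs would also have to be trimmed to the surviving indices, so that it only routes tokens to experts that still exist.
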
The compressed model has 23B parameters (44GB), compared with 31B (57GB) for the original model,
reducing storage and GPU memory requirements by roughly 25%. At the same time,
the model retains >=90% of the original model's performance on a variety of benchmarks (see the Results section below).
Further efficiency optimizations (e.g., quantization) can be applied in the same way as for the original model.
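
The "roughly 25%" figure can be checked against the rounded numbers quoted above. This is a small sanity check only; the helper name `reduction` is introduced here for illustration.

```python
def reduction(before, after):
    """Fractional reduction going from `before` to `after`."""
    return 1 - after / before


# Rounded figures quoted in the text above.
expert_cut = reduction(128, 96)  # experts per MoE layer
param_cut = reduction(31, 23)    # parameters, in billions
size_cut = reduction(57, 44)     # checkpoint size, in GB

print(f"experts:    {expert_cut:.1%}")  # 25.0%
print(f"parameters: {param_cut:.1%}")   # 25.8%
print(f"size:       {size_cut:.1%}")    # 22.8%
```

Because the inputs are rounded, the parameter and size reductions bracket 25% rather than matching it exactly.
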