---
license: apache-2.0
language:
- en
base_model: theprint/theprint-moe-8x3-0126
pipeline_tag: text-generation
tags:
- moe
- llama
---
# theprint-MoE-8x3-0126-GGUF

An 18B-parameter Mixture-of-Experts (MoE) model that combines 8 specialized 3B experts, with 2 experts activated per token by default (configurable up to 4 at inference).
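
A minimal sketch of running the GGUF with llama-cpp-python and raising the active-expert count: the quantization filename is a placeholder, and it assumes the `kv_overrides` parameter of the `Llama` constructor and the standard `llama.expert_used_count` GGUF metadata key apply to this model.

```python
# Sketch (not from the model card): load the GGUF with llama-cpp-python and
# raise the number of experts used per token from the default 2 to 4.
# Assumptions: the quant filename is a placeholder, and the model exposes the
# standard llama.cpp MoE metadata key "llama.expert_used_count".
from llama_cpp import Llama

llm = Llama(
    model_path="theprint-moe-8x3-0126.Q4_K_M.gguf",  # placeholder quant file
    n_ctx=4096,
    kv_overrides={"llama.expert_used_count": 4},     # activate 4 of 8 experts
)

out = llm("Summarize mixture-of-experts routing in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])
```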

## Architecture

- Base model: theprint/GeneralChat-Llama3.2-3B (provides shared attention layers)
- Total parameters: ~18B
- Active parameters: ~5B (2 experts) or ~9B (4 experts)
- Gate mode: Hidden (prompt-based router initialization)

## Full Model

For more information about this model, including access to the safetensors files, please see [theprint/theprint-moe-8x3-0126](https://huggingface.co/theprint/theprint-moe-8x3-0126).
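
If you want the unquantized weights instead, the full model loads with the standard transformers APIs. A minimal sketch, assuming the repository follows the usual Hugging Face layout; the dtype and device settings are illustrative, not a recommendation from the model card.

```python
# Sketch: load the full (safetensors) model with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "theprint/theprint-moe-8x3-0126"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # requires accelerate; spreads layers across devices
)

inputs = tokenizer("How many experts do you use per token?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```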