Update README.md

This commit is contained in:
Rasmus Rasmussen
2026-01-14 19:24:38 +00:00
committed by system
parent 8df2439c20
commit c8c6482291


@@ -8,4 +8,16 @@ pipeline_tag: text-generation
tags:
- moe
- llama
---
# theprint-MoE-8x3-0126-GGUF
GGUF builds of an 18B-parameter Mixture-of-Experts (MoE) model that combines 8 specialized 3B experts, with 2 experts activated per token by default (configurable up to 4 at inference).
## Architecture
- Base model: theprint/GeneralChat-Llama3.2-3B (provides shared attention layers)
- Total parameters: ~18B
- Active parameters: ~5B (2 experts) or ~9B (4 experts; see the loading sketch below)
- Gate mode: Hidden (prompt-based router initialization)
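A minimal loading sketch using `llama-cpp-python`. The quantized filename, context size, and the `llama.expert_used_count` override are assumptions for illustration, not confirmed repo contents; use whichever GGUF file you actually download.

```python
from llama_cpp import Llama

llm = Llama(
    # Hypothetical quant filename; substitute the GGUF file you downloaded from this repo.
    model_path="theprint-MoE-8x3-0126.Q4_K_M.gguf",
    n_ctx=4096,        # context window; adjust to your memory budget
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
    # Assumption: on builds that support kv_overrides, this raises the number of
    # active experts per token from the default 2 up to 4.
    kv_overrides={"llama.expert_used_count": 4},
)

out = llm("Explain mixture-of-experts routing in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])
```

Raising the expert count trades throughput for quality: each token routes through more FFN experts, so active parameters grow from roughly 5B to 9B as noted above.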
## Full Model
For more information about this model, including access to the safetensor files, please see [theprint/theprint-moe-8x3-0126](https://huggingface.co/theprint/theprint-moe-8x3-0126).