Mixture-of-Experts model for sovereign AI infrastructure
Stack 3.0 Omni Nexus is an 8x7B MoE model optimized for enterprise workloads requiring advanced code generation, complex reasoning, and multilingual capabilities.
## 📊 Benchmarks (vs Leading Models)

| Benchmark | Stack 3.0 Omni Nexus | Llama 3.1 70B | Mixtral 8x7B |
|---|---|---|---|
| HumanEval (pass@1) | 82.0% | 76.2% | 74.8% |
| MBPP (pass@1) | 78.5% | 72.1% | 70.3% |
| GSM8K (5-shot) | 91.2% | 89.5% | 88.1% |
| MMLU (5-shot) | 68.4% | 69.8% | 67.2% |
| CodeForces (rating) | 1842 | 1765 | 1721 |
## 🎯 Performance

| Metric | Value |
|---|---|
| Active Params | ~14B (2 of 8 experts per token) |
| Total Params | ~56B |
| Context | 131,072 tokens (128K) |
| VRAM (Q4_K_M) | ~3.5 GB |
| Speed (A100) | ~45 tokens/s |
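The active/total parameter split comes from sparse top-2 routing: a gating network scores all eight expert FFNs per token, and only the two highest-scoring experts actually run. A minimal NumPy sketch of that gating step (the function name and shapes are illustrative, not the model's actual router):

```python
import numpy as np

def top2_route(router_logits: np.ndarray):
    """Pick the 2 highest-scoring experts per token and softmax their weights.

    router_logits: (num_tokens, num_experts) gating scores.
    Returns (indices, weights), each of shape (num_tokens, 2).
    """
    # Indices of the two largest logits, best expert first.
    top2 = np.argsort(router_logits, axis=-1)[:, -2:][:, ::-1]
    scores = np.take_along_axis(router_logits, top2, axis=-1)
    # Softmax over just the selected pair, so the two weights sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return top2, weights

# One token scored against 8 experts: only experts 2 and 5 would execute.
logits = np.array([[0.1, -1.2, 2.0, 0.3, -0.5, 1.4, 0.0, -2.0]])
idx, w = top2_route(logits)
print(idx)  # [[2 5]]
```

Because the other six experts are skipped entirely, per-token compute and activation memory track the ~14B active parameters rather than the ~56B total.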
## 🚀 Quick Start

### Python (Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "my-ai-stack/Stack-3.0-Omni-Nexus"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Write a Python function to implement a thread-safe LRU cache with O(1) operations."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,   # sampling must be enabled for temperature to take effect
        temperature=0.2,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### llama.cpp

```bash
# Download the GGUF weights: https://huggingface.co/my-ai-stack/Stack-3.0-Omni-Nexus/tree/main
# Note: recent llama.cpp builds ship the CLI as `llama-cli` rather than `./main`.
./main -m stack-3.0-omni-nexus-q4_k_m.gguf \
  -n 512 -t 8 -c 131072 --temp 0.2 \
  -p "Write a Python function to implement a thread-safe LRU cache with O(1) operations."
```
### Ollama

```bash
ollama pull stack-3.0-omni-nexus
ollama run stack-3.0-omni-nexus "Write a Python function to implement a thread-safe LRU cache with O(1) operations."
```
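Once pulled, Ollama also serves the model over a local REST API (default `http://localhost:11434`), which is handy for scripting. A standard-library-only sketch against Ollama's `/api/generate` endpoint; the `generate` helper name is our own, and it assumes a running `ollama serve` with the model tag above:

```python
import json
import urllib.request

def generate(prompt: str,
             model: str = "stack-3.0-omni-nexus",
             host: str = "http://localhost:11434") -> str:
    """Send one non-streaming generation request to a local Ollama server."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,                    # return one JSON object, not a token stream
        "options": {"temperature": 0.2},
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Write a Python function to implement a "
                   "thread-safe LRU cache with O(1) operations."))
```

Set `"stream": True` instead to receive one JSON object per generated token chunk.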