Mixture-of-Experts model for sovereign AI infrastructure
Stack 3.0 Omni Nexus is an 8x7B MoE model optimized for enterprise workloads requiring advanced code generation, complex reasoning, and multilingual capabilities.
## 📊 Benchmarks (vs Leading Models)

| Benchmark | Stack 3.0 Omni Nexus | Llama 3.1 70B | Mixtral 8x7B |
|---|---|---|---|
| HumanEval (pass@1) | 82.0% | 76.2% | 74.8% |
| MBPP (pass@1) | 78.5% | 72.1% | 70.3% |
| GSM8K (5-shot) | 91.2% | 89.5% | 88.1% |
| MMLU (5-shot) | 68.4% | 69.8% | 67.2% |
| CodeForces (rating) | 1842 | 1765 | 1721 |
## 🎯 Performance

| Metric | Value |
|---|---|
| Active Params | ~14B (2 of 8 experts per token) |
| Total Params | ~56B |
| Context | 131,072 tokens (128K) |
| VRAM (Q4_K_M) | ~3.5 GB |
| Speed (A100) | ~45 tokens/s |
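The active/total parameter split comes from sparse top-2 routing: a gating network scores all eight expert FFNs per token, and only the two highest-scoring experts actually run. A minimal NumPy sketch of that gating step (the function name and shapes are illustrative, not the model's actual router):

```python
import numpy as np

def top2_route(router_logits: np.ndarray):
    """Pick the 2 highest-scoring experts per token and softmax their weights.

    router_logits: (num_tokens, num_experts) gating scores.
    Returns (indices, weights), each of shape (num_tokens, 2).
    """
    # Indices of the two largest logits, best expert first.
    top2 = np.argsort(router_logits, axis=-1)[:, -2:][:, ::-1]
    scores = np.take_along_axis(router_logits, top2, axis=-1)
    # Softmax over just the selected pair, so the two weights sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return top2, weights

# One token scored against 8 experts: only experts 2 and 5 would execute.
logits = np.array([[0.1, -1.2, 2.0, 0.3, -0.5, 1.4, 0.0, -2.0]])
idx, w = top2_route(logits)
print(idx)  # [[2 5]]
```

Because the other six experts are skipped entirely, per-token compute and activation memory track the ~14B active parameters rather than the ~56B total.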
## 🚀 Quick Start

### Python (Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "my-ai-stack/Stack-3.0-Omni-Nexus"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Write a Python function to implement a thread-safe LRU cache with O(1) operations."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,   # sampling must be enabled for temperature to take effect
        temperature=0.2,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### llama.cpp

```bash
# Download the GGUF weights: https://huggingface.co/my-ai-stack/Stack-3.0-Omni-Nexus/tree/main
# Note: recent llama.cpp builds ship the CLI as `llama-cli` rather than `./main`.
./main -m stack-3.0-omni-nexus-q4_k_m.gguf \
  -n 512 -t 8 -c 131072 --temp 0.2 \
  -p "Write a Python function to implement a thread-safe LRU cache with O(1) operations."
```
### Ollama

```bash
ollama pull stack-3.0-omni-nexus
ollama run stack-3.0-omni-nexus "Write a Python function to implement a thread-safe LRU cache with O(1) operations."
```
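Once pulled, Ollama also serves the model over a local REST API (default `http://localhost:11434`), which is handy for scripting. A standard-library-only sketch against Ollama's `/api/generate` endpoint; the `generate` helper name is our own, and it assumes a running `ollama serve` with the model tag above:

```python
import json
import urllib.request

def generate(prompt: str,
             model: str = "stack-3.0-omni-nexus",
             host: str = "http://localhost:11434") -> str:
    """Send one non-streaming generation request to a local Ollama server."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,                    # return one JSON object, not a token stream
        "options": {"temperature": 0.2},
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Write a Python function to implement a "
                   "thread-safe LRU cache with O(1) operations."))
```

Set `"stream": True` instead to receive one JSON object per generated token chunk.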