The ultimate 3B parameter model for sovereign AI deployment
Stack X Ultimate is a 3B-parameter language model designed for sovereign AI deployment. It is optimized for edge computing, on-premise infrastructure, and air-gapped environments, and its quantized builds keep the footprint small enough for consumer hardware as well as enterprise deployment.
Hardware Requirements
| Quantization | Suggested Hardware | VRAM | Total Model Size |
|---|---|---|---|
| FP16 (half precision, unquantized) | RTX 3060+ | ~6 GB | ~6 GB |
| Q8_0 | RTX 3060 | ~3 GB | ~3 GB |
| Q4_K_M | Any modern GPU | ~1.8 GB | ~1.8 GB |
| Q3_K_M | Integrated GPU | ~1.2 GB | ~1.2 GB |
| Q2_K | CPU + 8 GB RAM | ~900 MB | ~900 MB |
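The sizes in the table follow roughly from parameter count times bits per weight. The sketch below shows that arithmetic, assuming exactly 3 billion parameters and decimal gigabytes. `weight_memory_gb` is a hypothetical helper written for illustration, and the bits-per-weight figure for Q4_K_M is an approximate average, not an exact specification.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 3e9  # assumption: exactly 3 billion parameters

# Bits per weight. FP16 is exact. Q8_0 stores 32 int8 weights plus one
# FP16 scale per block (34 bytes / 32 weights = 8.5 bits). The Q4_K_M
# value is an approximate average, assumed here for illustration.
BITS = {"FP16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85}

for name, bits in BITS.items():
    print(f"{name}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

These figures cover the weights only. Actual memory use at inference time is higher, because the key/value cache and activations also need space.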
Minimum Requirements (Q3_K_M and below)
GPU: None required (CPU inference supported)
RAM: 8GB system RAM
Storage: 2GB+ free space
Recommended Requirements
GPU: NVIDIA RTX 3060 (12GB) or better
RAM: 16GB system RAM
Storage: 4GB+ free space for multiple quantizations
Transformers

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    # Load model and tokenizer
    model_name = "my-ai-stack/Stack-X-Ultimate"
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True,
    )

    # Generate response
    prompt = "Explain the concept of sovereignty in AI systems and why it matters for enterprise deployment."
    messages = [
        {"role": "system", "content": "You are Stack X Ultimate, a helpful and knowledgeable AI assistant."},
        {"role": "user", "content": prompt},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.7,
            top_p=0.95,
            do_sample=True,
        )

    # Decode only the newly generated tokens, skipping the prompt
    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    print(response)
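For reference, `apply_chat_template` turns the message list into a single prompt string before tokenization. The sketch below mimics that step by hand, assuming the model keeps the ChatML-style template of its Qwen2.5 base. That is an assumption, and the tokenizer's own template remains authoritative. `render_chatml` is a hypothetical helper, not part of Transformers.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render chat messages in ChatML form (assumed template, see note above)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from this point.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Stack X Ultimate, a helpful and knowledgeable AI assistant."},
    {"role": "user", "content": "Hello"},
]
print(render_chatml(messages))
```

Printing the rendered string this way can help when debugging prompts, but generation should still go through the tokenizer's template so the format matches what the model was trained on.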
llama.cpp
    # Download the GGUF model file from:
    #   https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main
    # Newer llama.cpp builds ship this binary as llama-cli; substitute it for ./main if needed.

    # Run on GPU (-ngl sets how many layers are offloaded to the GPU)
    ./main -m stack-x-ultimate-q4_k_m.gguf \
        -n 512 \
        -t 8 \
        -c 131072 \
        -ngl 99 \
        --temp 0.7 \
        --top-p 0.95 \
        -p "Write a Python function to implement the quicksort algorithm."

    # Run on CPU only (-ngl 0 keeps every layer on the CPU)
    ./main -m stack-x-ultimate-q4_k_m.gguf \
        -n 512 \
        -t 8 \
        -c 131072 \
        -ngl 0 \
        -p "Explain the differences between sovereign AI and cloud-based AI solutions."

    # Compare quantization levels on the same prompt
    PROMPT="Write a Python function to implement the quicksort algorithm."
    ./main -m stack-x-ultimate-q2_k.gguf -n 256 --temp 0.5 -p "$PROMPT"
    ./main -m stack-x-ultimate-q4_k_m.gguf -n 256 --temp 0.5 -p "$PROMPT"
    ./main -m stack-x-ultimate-q8_0.gguf -n 256 --temp 0.5 -p "$PROMPT"
Ollama
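    # Pull the model
    ollama pull stack-x-ultimate

    # Run a single prompt
    ollama run stack-x-ultimate "Write a Python function to implement binary search."

    # `ollama run` takes no sampling flags. Set parameters in a Modelfile
    # (or with `/set parameter temperature 0.9` inside an interactive session).
    # The variant names below are illustrative.

    # Creative variant: higher temperature
    printf 'FROM stack-x-ultimate\nPARAMETER temperature 0.9\nPARAMETER top_p 0.95\n' > Modelfile.creative
    ollama create stack-x-ultimate-creative -f Modelfile.creative
    ollama run stack-x-ultimate-creative "Write a short story about an AI that becomes self-aware in an air-gapped facility."

    # Factual variant: low temperature
    printf 'FROM stack-x-ultimate\nPARAMETER temperature 0.2\nPARAMETER top_p 0.9\n' > Modelfile.factual
    ollama create stack-x-ultimate-factual -f Modelfile.factual
    ollama run stack-x-ultimate-factual "Explain quantum computing and its applications in cryptography."

    # Long-context variant for document processing
    printf 'FROM stack-x-ultimate\nPARAMETER num_ctx 65536\nPARAMETER temperature 0.5\n' > Modelfile.longctx
    ollama create stack-x-ultimate-longctx -f Modelfile.longctx
    ollama run stack-x-ultimate-longctx "Summarize the following research paper: [PASTE TEXT]"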
Model Architecture
| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-3B |
| Parameters | 3B |
| Fine-tuning | Full fine-tuning + LoRA |
| Context Length | 131,072 tokens (128K) |
| Vocabulary Size | 151,936 tokens |
| Hidden Size | 1,536 |
| Attention Heads | 12 |
| Num Key Value Heads | 2 |
| Transformer Layers | 28 |
| Activation Function | SiLU |
| RoPE Scaling | NTK (factor: 4.0) |
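The architecture values above also determine how much memory the key/value cache needs at long context, on top of the weights. The sketch below works through that arithmetic, assuming an FP16 cache and a head dimension equal to hidden size divided by attention heads (1,536 / 12 = 128). `kv_cache_bytes` is a hypothetical helper written for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_value=2):
    """Bytes for keys and values across all layers at a given context length."""
    # The leading factor of 2 accounts for storing both keys and values.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


# Values taken from the architecture listing above.
HIDDEN_SIZE, N_HEADS, N_KV_HEADS, N_LAYERS = 1536, 12, 2, 28
HEAD_DIM = HIDDEN_SIZE // N_HEADS  # assumption: head_dim = hidden size / heads

for ctx in (4096, 32768, 131072):
    gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx) / 2**30
    print(f"context {ctx:>6}: ~{gib:.2f} GiB")
```

Under these assumptions the cache at the full 131,072-token context comes to about 3.5 GiB, so choosing a smaller context size (for example a lower `-c` value in llama.cpp) reduces memory use on constrained hardware.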
Training Details
Base Model: Qwen2.5-3B
Training Approach: Combined full fine-tuning + LoRA
Fine-tuning Data: Diverse high-quality corpus
Focus Areas: General understanding, code generation, instruction following
Special Training: Sovereign deployment optimization, edge computing efficiency
    @misc{my-ai-stack/stack-x-ultimate,
      author    = {Walid Sobhi},
      title     = {Stack X Ultimate: 3B Parameter Model for Sovereign AI Deployment},
      year      = {2026},
      publisher = {HuggingFace},
      url       = {https://huggingface.co/my-ai-stack/Stack-X-Ultimate}
    }