Production-ready merged model (base + LoRA fused into 16-bit weights).
Trained on a single NVIDIA A40 (44 GB) using Unsloth QLoRA + TRL SFTTrainer.
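"Fused" means the adapter's low-rank update is folded into each adapted weight matrix ahead of time, so no adapter is needed at inference. This assumes standard LoRA scaling, which gives W_merged = W + (alpha / r) · B A. The card does not state the rank r or alpha used, or how the 4-bit QLoRA base was restored to 16-bit for the merge. The sketch uses tiny pure-Python matrices with hypothetical values to check that the merged weight gives the same output as the base weight plus the separate adapter branch. It illustrates the arithmetic only and is not Unsloth's merge code.

```python
# Sketch of LoRA weight fusion, assuming standard scaling alpha / r:
#   W_merged = W + (alpha / r) * (B @ A)
# Nested lists stand in for tensors; all values are hypothetical.

def matmul(x, y):
    """Multiply an (n x k) matrix by a (k x m) matrix given as nested lists."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def add(x, y):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def scale(x, s):
    """Multiply every element of a matrix by the scalar s."""
    return [[s * a for a in row] for row in x]

def merge_lora(W, A, B, alpha, r):
    """Fold the scaled low-rank update B @ A into the base weight W."""
    return add(W, scale(matmul(B, A), alpha / r))

def forward_with_adapter(W, A, B, alpha, r, x):
    """Unmerged path: base projection plus the separate low-rank branch."""
    base = matmul(W, x)
    lora = scale(matmul(B, matmul(A, x)), alpha / r)
    return add(base, lora)

# Toy shapes: W is 2x3 (out x in), rank r = 1, so A is 1x3 and B is 2x1.
W = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
A = [[0.2, 0.4, -0.1]]
B = [[1.0], [-2.0]]
alpha, r = 2.0, 1
x = [[1.0], [2.0], [3.0]]  # one input column vector

merged = merge_lora(W, A, B, alpha, r)
print(matmul(merged, x))                          # merged weights
print(forward_with_adapter(W, A, B, alpha, r, x))  # base + adapter branch
```

Both paths print the same vector, up to floating-point rounding. This equivalence is why the merged checkpoint can be served like any plain 16-bit model.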
Looking for a smaller download? Use the LoRA adapter (~150 MB) or the GGUF Q4_K_M build (~986 MB) for Ollama.
📊 Evaluation — Cycle 2
| Metric | Value |
|---|---|
| LLM-as-judge (avg) | 12.6/15 |
| Perplexity | 1.14 |
| Δ vs previous cycle | +12.6 |
| Training loss | 0.2274 |
| Training samples | 8,990 |
| Training steps | 1,100 |
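The card does not say how perplexity was measured. Assuming the standard definition, it is the exponential of the mean per-token negative log-likelihood (cross-entropy in nats) on held-out text. The sketch shows that definition and its inverse, so the reported 1.14 can be read as an average per-token loss. The per-token losses passed in are hypothetical.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))

def mean_nll_from_perplexity(ppl):
    """Invert the definition: mean per-token NLL = ln(perplexity)."""
    return math.log(ppl)

# Hypothetical per-token losses, for illustration only
print(perplexity([0.1, 0.2, 0.05, 0.15]))
# Under this definition, a perplexity of 1.14 is a mean NLL of about 0.131 nats
print(round(mean_nll_from_perplexity(1.14), 3))
```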
By reasoning type
| Type | Status | Score | Progress |
|---|---|---|---|
Cycle history
| Cycle | Date | Score | PPL | Δ | vs Published |
|---|---|---|---|---|---|
| 1 | 2026-03-20 | 12.9/15 | 1.17 | +12.9 | 12.9 |
| 2 | 2026-03-20 | 12.6/15 | 1.14 | +12.6 | 13.2 |
🚀 Quick start
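```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gianloko/apex-coder-1.5b"

# Load the merged 16-bit model; device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are ApexCoder, a world-class Salesforce expert."},
    {"role": "user", "content": "Write a bulkified Apex trigger on Opportunity that prevents status changes to Closed Won if no related Products exist."},
]

# Format the prompt with the model's chat template; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Greedy decoding (do_sample=False), so a temperature setting would be ignored
output = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True))
```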
🦙 Ollama (GGUF — recommended for local use)
```shell
ollama pull hf.co/Gianloko/apex-coder-1.5b-GGUF:Q4_K_M
ollama run hf.co/Gianloko/apex-coder-1.5b-GGUF:Q4_K_M
```
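Once the model is pulled, it can also be called from code through Ollama's local REST API instead of the interactive `ollama run` prompt. The sketch assumes the Ollama server is running on its default port (11434) and uses the standard-library `urllib`. It reuses the system prompt from the Quick start, though whether the GGUF's chat template expects that prompt is an assumption. The helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL = "hf.co/Gianloko/apex-coder-1.5b-GGUF:Q4_K_M"

def build_chat_request(prompt, system="You are ApexCoder, a world-class Salesforce expert."):
    """Build a non-streaming /api/chat payload (hypothetical helper)."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"temperature": 0},  # keep output close to deterministic
    }

def chat(prompt, url=OLLAMA_URL):
    """POST the payload to a running Ollama server and return the reply text."""
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With the server running, `print(chat("Write a bulkified Apex trigger on Account that blocks deletes when open Cases exist."))` prints the model's answer. Setting `stream` to `False` trades incremental output for a simpler single JSON response.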
🔧 LoRA adapter
If you already have the base model loaded, use the LoRA adapter (~150 MB) instead: