---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
datasets:
- open-thoughts/OpenThoughts3-1.2M
- bespokelabs/Bespoke-Stratos-17k
pipeline_tag: text-generation
---
# Aristaeus
Aristaeus is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct, trained to improve structured, step-by-step reasoning across mathematics, science, logic, and code. It is a Stage 1 reasoning model — the goal of this release is deliberate, verifiable chain-of-thought, not raw benchmark maximisation.
The name comes from Aristaeus, the ancient Greek deity of practical knowledge — beekeeping, olive cultivation, cheesemaking. Applied intelligence in service of real things.
## Training
| Detail | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Fine-tune type | Full fine-tune (bf16) |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training time | ~81 minutes |
| Epochs | 2 |
| Sequence length | 4096 tokens |
| Effective batch size | 16 (batch 2 × grad accum 8) |
| Learning rate | 2e-5 (cosine schedule) |
| Warmup ratio | 0.05 |
| Framework | Unsloth + TRL SFTTrainer |
| Final train loss | 1.083 |
| Final eval loss | 1.023 |
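The hyperparameters in the table can be collected into a single run configuration. This is a hypothetical reconstruction with argument names borrowed from TRL's `SFTConfig` conventions; the actual training script is not included in this release.

```python
# Hypothetical reconstruction of the run configuration from the table above;
# the key names follow TRL's SFTConfig conventions and are assumptions.
config = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,   # 2 x 8 = effective batch of 16
    "learning_rate": 2e-5,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "num_train_epochs": 2,
    "max_seq_length": 4096,
    "bf16": True,
}

effective_batch = (
    config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
)
print(effective_batch)  # → 16
```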
## Datasets
- **open-thoughts/OpenThoughts3-1.2M** — 30,000 examples sampled via streaming. Reasoning traces generated by QwQ-32B (Apache 2.0). Covers mathematics, science, and coding problems with long chain-of-thought traces.
- **bespokelabs/Bespoke-Stratos-17k** — full 16,710 examples. Curated from AIME/MATH olympiad problems, competitive programming (APPS, TACO), and science/puzzle data. Reasoning traces generated from DeepSeek-R1 via local inference.
Combined training set: ~47,000 examples after normalisation and filtering. Both datasets were selected for clean licensing (no API-generated outputs from closed models).
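The 30,000-example subset was drawn via streaming rather than a full download. A minimal sketch of how such a sample might be taken, assuming the Hugging Face `datasets` streaming API (the actual sampling script is not part of this release):

```python
from itertools import islice

def sample_stream(stream, n):
    """Take the first n examples from an iterable stream without
    materialising the full dataset."""
    return list(islice(stream, n))

# With the real dataset (requires network access), one might do:
# from datasets import load_dataset
# ds = load_dataset("open-thoughts/OpenThoughts3-1.2M",
#                   split="train", streaming=True)
# subset = sample_stream(ds, 30_000)

# Local illustration with a stand-in iterable:
subset = sample_stream(range(1_200_000), 30)
print(len(subset))  # → 30
```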
## Evaluation
Aristaeus was compared against the base Qwen2.5-1.5B-Instruct across six reasoning tasks covering different problem types. Results below are from manual evaluation — no automated benchmark harness was used for this release.
| Task | Aristaeus | Base |
|---|---|---|
| Unit conversion (train speed km → m/s) | ✅ Correct | ❌ Wrong (unit tracking failure) |
| Multi-step word problem (apples) | ✅ Correct | ✅ Correct |
| Deductive logic (mammals/warm-blooded) | ⚠️ Correct answer, minor overreach | ✅ Correct, richer detail |
| Recursive code trace (Fibonacci f(7)) | ❌ Lost thread, no answer | ✅ Correct (13) |
| Exponential growth (bacterial doubling) | ✅ Correct (6400) | ✅ Correct (6400) |
| Spatial constraint reasoning (water jug) | ✅ Correct, includes verification | ❌ Incoherent final steps |
2 wins / 1 loss / 2 draws, plus one mixed result (deductive logic), against the base on this task set.
## Honest limitations
Recursive call stack tracing is the clearest failure mode. On f(7) Fibonacci, Aristaeus lost track of the recursion depth, began questioning its own assumptions, and produced no final answer. The base model handled it correctly. This is consistent with a known capacity ceiling at 1.5B parameters for problems that require holding many simultaneous state variables. A 7B model would likely not exhibit this failure.
Logical overconfidence was observed on the deductive reasoning prompt. The model correctly concluded dolphins are warm-blooded, but also asserted snakes are cold-blooded purely from the premise "snakes are not mammals" — which does not logically follow without additional premises. The model has learned to produce confident, structured conclusions, which occasionally leads it to state more than the premises support. This is a known SFT artefact when training data rewards assertive, well-formatted responses.
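The invalid inference above is the classical fallacy of denying the antecedent: from "all mammals are warm-blooded" (M → W) and "snakes are not mammals" (¬M), nothing follows about W. A minimal truth-table check makes the counterexample explicit (think of a bird: not a mammal, yet warm-blooded):

```python
from itertools import product

# Premise (assumed formalisation): all mammals are warm-blooded, i.e. M -> W.
# Invalid step the model took: not-M -> not-W (denying the antecedent).
counterexamples = [
    (m, w) for m, w in product([False, True], repeat=2)
    if (not m or w)       # premise M -> W holds
    and not m and w       # yet: not a mammal AND still warm-blooded
]
print(counterexamples)  # → [(False, True)] — e.g. a bird
```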
The eval loss curve plateaued from step ~2800 onward, suggesting the model has saturated the current dataset; additional epochs would not improve this release.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EphAsad/Aristaeus")
tokenizer = AutoTokenizer.from_pretrained("EphAsad/Aristaeus")

messages = [
    {"role": "system", "content": "You are a helpful reasoning assistant."},
    {"role": "user", "content": "A bacterial culture starts with 100 cells and doubles every 20 minutes. How many cells after 2 hours?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024, temperature=0.6, top_p=0.9, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
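As a quick sanity check on the example prompt's arithmetic: 2 hours at one doubling every 20 minutes is 6 doublings, so 100 × 2⁶ = 6400, matching the answer both models produced in the evaluation table.

```python
# 2 hours = 120 minutes, one doubling every 20 minutes -> 6 doublings
cells = 100 * 2 ** (120 // 20)
print(cells)  # → 6400
```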
## Roadmap
Aristaeus is a Stage 1 release. Two further stages are planned:
**Stage 2 — Agentic tool use.** Fine-tuning on lambda/hermes-agent-reasoning-traces (Apache 2.0, agentic trajectories with `<think>` blocks and real tool execution results) at 16k context. The intention is to teach the model when and how to use tools, layered on top of the reasoning foundation established here.
## Author
Built by Zain Asad (Eph) — Senior Microbiology Analyst and Applied AI Engineer.
Core portfolio: BactAID · DomainEmbedder · FireSOP · FireAccess LIMS · Eidos · Ananke
## Licence
Apache 2.0 — consistent with the base model and training datasets used.