---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- reasoning
- fine-tuned
- qwen2.5
- math
- science
- code
- chain-of-thought
- unsloth
datasets:
- open-thoughts/OpenThoughts3-1.2M
- bespokelabs/Bespoke-Stratos-17k
pipeline_tag: text-generation
---

# Aristaeus

Aristaeus is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct, trained to improve structured, step-by-step reasoning across mathematics, science, logic, and code. It is a Stage 1 reasoning model — the goal of this release is deliberate, verifiable chain-of-thought, not raw benchmark maximisation.

The name comes from Aristaeus, the ancient Greek deity of practical knowledge — beekeeping, olive cultivation, cheesemaking. Applied intelligence in service of real things.


## Training

| Detail | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Fine-tune type | Full fine-tune (bf16) |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training time | ~81 minutes |
| Epochs | 2 |
| Sequence length | 4096 tokens |
| Effective batch size | 16 (batch 2 × grad accum 8) |
| Learning rate | 2e-5 (cosine schedule) |
| Warmup ratio | 0.05 |
| Framework | Unsloth + TRL SFTTrainer |
| Final train loss | 1.083 |
| Final eval loss | 1.023 |
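
A minimal sketch of what this configuration looks like in code, assuming recent Unsloth and TRL APIs. The exact training script for this release is not published, and `train_dataset` is a placeholder for the prepared dataset described in the next section:

```python
# Hedged sketch of the training configuration in the table above.
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig

model, tokenizer = FastLanguageModel.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    max_seq_length=4096,
    dtype=None,            # auto-selects bf16 on Ampere GPUs such as the A100
    full_finetuning=True,  # full fine-tune rather than LoRA
)

train_dataset = ...  # placeholder: combined OpenThoughts3 + Bespoke-Stratos set (see Datasets)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,  # `processing_class` in newer TRL versions
    train_dataset=train_dataset,
    args=SFTConfig(
        output_dir="aristaeus-sft",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,  # effective batch size 16
        num_train_epochs=2,
        learning_rate=2e-5,
        lr_scheduler_type="cosine",
        warmup_ratio=0.05,
        bf16=True,
    ),
)
trainer.train()
```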

## Datasets

- **open-thoughts/OpenThoughts3-1.2M** — 30,000 examples sampled via streaming (a sketch of this appears below). Reasoning traces generated by QwQ-32B (Apache 2.0). Covers mathematics, science, and coding problems with long chain-of-thought traces.
- **bespokelabs/Bespoke-Stratos-17k** — all 16,710 examples. Curated from AIME/MATH olympiad problems, competitive programming (APPS, TACO), and science/puzzle data. Reasoning traces generated from DeepSeek-R1 via local inference.

Combined training set: ~47,000 examples after normalisation and filtering. Both datasets were selected for clean licensing (no API-generated outputs from closed models).
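
A minimal sketch of the streaming-sampling approach, assuming the Hugging Face `datasets` library; the exact sampling code and selection criteria for this release are not published:

```python
# Hedged sketch: draw a 30,000-example subset from the 1.2M-example dataset
# without downloading it in full, using streaming mode. Selection here is
# simply the first 30k streamed rows; the release's actual sampling may differ.
from itertools import islice
from datasets import Dataset, load_dataset

stream = load_dataset("open-thoughts/OpenThoughts3-1.2M", split="train", streaming=True)
subset = Dataset.from_list(list(islice(stream, 30_000)))
```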


## Evaluation

Aristaeus was compared against the base Qwen2.5-1.5B-Instruct across six reasoning tasks covering different problem types. Results below are from manual evaluation — no automated benchmark harness was used for this release.

| Task | Aristaeus | Base |
|---|---|---|
| Unit conversion (train speed, km/h → m/s) | Correct | Wrong (unit tracking failure) |
| Multi-step word problem (apples) | Correct | Correct |
| Deductive logic (mammals/warm-blooded) | ⚠️ Correct answer, minor overreach | Correct, richer detail |
| Recursive code trace (Fibonacci f(7)) | Lost thread, no answer | Correct (13) |
| Exponential growth (bacterial doubling) | Correct (6400) | Correct (6400) |
| Spatial constraint reasoning (water jug) | Correct, includes verification | Incoherent final steps |

Overall: 3 wins / 1 loss / 2 draws against the base model on this task set.

## Honest limitations

Recursive call stack tracing is the clearest failure mode. On f(7) Fibonacci, Aristaeus lost track of the recursion depth, began questioning its own assumptions, and produced no final answer. The base model handled it correctly. This is consistent with a known capacity ceiling at 1.5B parameters for problems that require holding many simultaneous state variables. A 7B model would likely not exhibit this failure.
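
For reference, the recursion in question is presumably the standard 1-indexed definition, which matches the base model's answer of 13; the exact prompt wording is an assumption:

```python
def f(n):
    # Standard 1-indexed Fibonacci: f(1) = f(2) = 1
    if n <= 2:
        return 1
    return f(n - 1) + f(n - 2)

print(f(7))  # 13 — evaluating this naively expands into 25 nested calls,
             # which is the state the 1.5B model failed to track
```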

Logical overconfidence was observed on the deductive reasoning prompt. The model correctly concluded dolphins are warm-blooded, but also asserted snakes are cold-blooded purely from the premise "snakes are not mammals" — which does not logically follow without additional premises. The model has learned to produce confident, structured conclusions, which occasionally leads it to state more than the premises support. This is a known SFT artefact when training data rewards assertive, well-formatted responses.

The eval loss curve plateaued convincingly from step ~2800 onward, suggesting the model saturated the current dataset. Additional epochs would not improve this release.


## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EphAsad/Aristaeus")
tokenizer = AutoTokenizer.from_pretrained("EphAsad/Aristaeus")

messages = [
    {"role": "system", "content": "You are a helpful reasoning assistant."},
    {"role": "user",   "content": "A bacterial culture starts with 100 cells and doubles every 20 minutes. How many cells after 2 hours?"},
]

# Render the conversation with the model's chat template, then tokenize.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Moderate temperature keeps the chain-of-thought focused while allowing some variation.
output = model.generate(**inputs, max_new_tokens=1024, temperature=0.6, top_p=0.9, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
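
For this prompt the expected answer is 6,400 cells: two hours contain six 20-minute doubling periods, so 100 × 2⁶ = 6,400, matching the evaluation result above.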

## Roadmap

Aristaeus is a Stage 1 release. Further stages are planned:

**Stage 2 — Agentic tool use.** Fine-tuning on lambda/hermes-agent-reasoning-traces (Apache 2.0, agentic trajectories with `<think>` blocks and real tool execution results) at 16k context. The intention is to teach the model when and how to use tools, layered on top of the reasoning foundation established here.
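
For illustration, a trajectory in that style might look like the following. This is a hypothetical sketch of the format; the message structure and tool names are assumptions, not the dataset's actual schema:

```python
# Hypothetical agentic trajectory: a <think> block precedes a tool call,
# and a real tool execution result is fed back before the final answer.
trajectory = [
    {"role": "user", "content": "What is 37 * 41?"},
    {"role": "assistant", "content": (
        "<think>Large multiplication; better to call the calculator tool "
        "than to compute it mentally.</think>\n"
        'calculator(expression="37 * 41")'
    )},
    {"role": "tool", "content": "1517"},
    {"role": "assistant", "content": "37 × 41 = 1517."},
]
```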


## Author

Built by Zain Asad (Eph) — Senior Microbiology Analyst and Applied AI Engineer.

Core portfolio: BactAID · DomainEmbedder · FireSOP · FireAccess LIMS · Eidos · Ananke


## Licence

Apache 2.0 — consistent with the base model and training datasets used.
