---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- reasoning
- fine-tuned
- qwen2.5
- math
- science
- code
- chain-of-thought
- unsloth
datasets:
- open-thoughts/OpenThoughts3-1.2M
- bespokelabs/Bespoke-Stratos-17k
pipeline_tag: text-generation
---

# Aristaeus

**Aristaeus** is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), trained to improve structured, step-by-step reasoning across mathematics, science, logic, and code. It is a Stage 1 reasoning model — the goal of this release is deliberate, verifiable chain-of-thought, not raw benchmark maximisation.

The name comes from Aristaeus, the ancient Greek deity of practical knowledge — beekeeping, olive cultivation, cheesemaking. Applied intelligence in service of real things.
---

## Training
| Detail | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Fine-tune type | Full fine-tune (bf16) |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training time | ~81 minutes |
| Epochs | 2 |
| Sequence length | 4096 tokens |
| Effective batch size | 16 (batch 2 × grad accum 8) |
| Learning rate | 2e-5 (cosine schedule) |
| Warmup ratio | 0.05 |
| Framework | Unsloth + TRL SFTTrainer |
| Final train loss | 1.083 |
| Final eval loss | 1.023 |
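The batch-size and learning-rate entries above can be made concrete with a short sketch. Linear warmup over the first 5% of steps followed by cosine decay to zero is assumed here; the exact Unsloth/TRL schedule implementation may differ in detail:

```python
import math

# Hyperparameters taken from the training table above.
PEAK_LR = 2e-5
WARMUP_RATIO = 0.05
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8


def effective_batch_size(per_device: int, accum: int) -> int:
    """Optimizer-step batch size: micro-batch x gradient-accumulation steps."""
    return per_device * accum


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup for the first WARMUP_RATIO of steps, then cosine decay to zero."""
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM))  # 16
```

The effective batch size of 16 in the table is exactly this product of micro-batch and accumulation steps.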
### Datasets

**[open-thoughts/OpenThoughts3-1.2M](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M)** — 30,000 examples sampled via streaming. Reasoning traces generated by QwQ-32B (Apache 2.0). Covers mathematics, science, and coding problems with long chain-of-thought traces.

**[bespokelabs/Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k)** — Full 16,710 examples. Curated from AIME/MATH olympiad problems, competitive programming (APPS, TACO), and science/puzzle data. Reasoning traces generated from DeepSeek-R1 via local inference.

Combined training set: ~47,000 examples after normalisation and filtering. Both datasets were selected for clean licensing (no API-generated outputs from closed models).
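The normalisation step amounts to mapping each dataset's records into one shared chat-message schema and dropping incomplete records before concatenation. A minimal illustration follows — the field names (`problem`, `solution`, etc.) are hypothetical stand-ins, since each dataset stores its prompts and responses under its own keys and the actual preprocessing code is not part of this card:

```python
from typing import Optional


def normalise(example: dict) -> Optional[dict]:
    """Map a raw record into a shared chat schema; return None to filter it out.

    The key names tried here are hypothetical; a per-dataset adapter would
    supply the real field mapping for each source dataset.
    """
    prompt = example.get("problem") or example.get("question")
    answer = example.get("solution") or example.get("response")
    if not prompt or not answer:
        return None  # drop incomplete records
    return {
        "messages": [
            {"role": "user", "content": prompt.strip()},
            {"role": "assistant", "content": answer.strip()},
        ]
    }


raw = [
    {"problem": "2 + 2 = ?", "solution": "4"},
    {"question": "Capital of France?", "response": "Paris"},
    {"problem": "no answer present"},  # filtered out
]
combined = [r for r in (normalise(x) for x in raw) if r is not None]
print(len(combined))  # 2
```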
---

## Evaluation

Aristaeus was compared against the base Qwen2.5-1.5B-Instruct across six reasoning tasks covering different problem types. Results below are from manual evaluation — no automated benchmark harness was used for this release.
| Task | Aristaeus | Base |
|---|---|---|
| Unit conversion (train speed km → m/s) | ✅ Correct | ❌ Wrong (unit tracking failure) |
| Multi-step word problem (apples) | ✅ Correct | ✅ Correct |
| Deductive logic (mammals/warm-blooded) | ⚠️ Correct answer, minor overreach | ✅ Correct, richer detail |
| Recursive code trace (Fibonacci f(7)) | ❌ Lost thread, no answer | ✅ Correct (13) |
| Exponential growth (bacterial doubling) | ✅ Correct (6400) | ✅ Correct (6400) |
| Spatial constraint reasoning (water jug) | ✅ Correct, includes verification | ❌ Incoherent final steps |

**2 wins / 1 loss / 3 draws** against the base model on this task set, counting the deductive-logic task (both models reached the correct answer) as a draw.

### Honest limitations

**Recursive call stack tracing** is the clearest failure mode. On `f(7)` Fibonacci, Aristaeus lost track of the recursion depth, began questioning its own assumptions, and produced no final answer. The base model handled it correctly. This is consistent with a known capacity ceiling at 1.5B parameters for problems that require holding many simultaneous state variables. A 7B model would likely not exhibit this failure.

**Logical overconfidence** was observed on the deductive reasoning prompt. The model correctly concluded dolphins are warm-blooded, but also asserted snakes are cold-blooded purely from the premise "snakes are not mammals" — which does not logically follow without additional premises. The model has learned to produce confident, structured conclusions, which occasionally leads it to state more than the premises support. This is a known SFT artefact when training data rewards assertive, well-formatted responses.

The eval loss curve plateaued convincingly from step ~2800 onward, suggesting the model saturated the current dataset. Additional epochs would not improve this release.
---

## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EphAsad/Aristaeus")
tokenizer = AutoTokenizer.from_pretrained("EphAsad/Aristaeus")

messages = [
    {"role": "system", "content": "You are a helpful reasoning assistant."},
    {"role": "user", "content": "A bacterial culture starts with 100 cells and doubles every 20 minutes. How many cells after 2 hours?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=1024, temperature=0.6, top_p=0.9, do_sample=True)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
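
The example prompt has a closed-form answer that can be checked directly, independent of the model:

```python
# Bacterial doubling: start at 100 cells, double every 20 minutes, run 2 hours.
doublings = (2 * 60) // 20   # 6 doubling periods in 120 minutes
cells = 100 * 2 ** doublings
print(cells)  # 6400 — the answer both models produced in the eval table
```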

---

## Roadmap

Aristaeus is a Stage 1 release. Two further stages are planned:

**Stage 2 — Agentic tool use.** Fine-tuning on `lambda/hermes-agent-reasoning-traces` (Apache 2.0, agentic trajectories with `<think>` blocks and real tool execution results) at 16k context. The intention is to teach the model *when* and *how* to use tools, layered on top of the reasoning foundation established here.

---

## Author

Built by **Zain Asad** (Eph) — Senior Microbiology Analyst and Applied AI Engineer.

Core portfolio: [BactAID](https://doi.org/10.5281/zenodo.18089381) · [DomainEmbedder](https://huggingface.co/EphAsad/DomainEmbedder) · FireSOP · FireAccess LIMS · Eidos · Ananke

---

## Licence

Apache 2.0 — consistent with the base model and training datasets used.