---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- reasoning
- fine-tuned
- qwen2.5
- math
- science
- code
- chain-of-thought
- unsloth
datasets:
- open-thoughts/OpenThoughts3-1.2M
- bespokelabs/Bespoke-Stratos-17k
pipeline_tag: text-generation
---

# Aristaeus

**Aristaeus** is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), trained to improve structured, step-by-step reasoning across mathematics, science, logic, and code. It is a Stage 1 reasoning model — the goal of this release is deliberate, verifiable chain-of-thought, not raw benchmark maximisation.

The name comes from Aristaeus, the ancient Greek deity of practical knowledge — beekeeping, olive cultivation, cheesemaking. Applied intelligence in service of real things.
---

## Training
| Detail | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Fine-tune type | Full fine-tune (bf16) |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training time | ~81 minutes |
| Epochs | 2 |
| Sequence length | 4096 tokens |
| Effective batch size | 16 (batch 2 × grad accum 8) |
| Learning rate | 2e-5 (cosine schedule) |
| Warmup ratio | 0.05 |
| Framework | Unsloth + TRL SFTTrainer |
| Final train loss | 1.083 |
| Final eval loss | 1.023 |
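The batch-size and learning-rate entries above can be made concrete with a short sketch. Linear warmup over the first 5% of steps followed by cosine decay to zero is assumed here; the exact Unsloth/TRL schedule implementation may differ in detail:

```python
import math

# Hyperparameters taken from the training table above.
PEAK_LR = 2e-5
WARMUP_RATIO = 0.05
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8


def effective_batch_size(per_device: int, accum: int) -> int:
    """Optimizer-step batch size: micro-batch x gradient-accumulation steps."""
    return per_device * accum


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup for the first WARMUP_RATIO of steps, then cosine decay to zero."""
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM))  # 16
```

The effective batch size of 16 in the table is exactly this product of micro-batch and accumulation steps.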
### Datasets

**[open-thoughts/OpenThoughts3-1.2M](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M)** — 30,000 examples sampled via streaming. Reasoning traces generated by QwQ-32B (Apache 2.0). Covers mathematics, science, and coding problems with long chain-of-thought traces.

**[bespokelabs/Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k)** — Full 16,710 examples. Curated from AIME/MATH olympiad problems, competitive programming (APPS, TACO), and science/puzzle data. Reasoning traces generated from DeepSeek-R1 via local inference.

Combined training set: ~47,000 examples after normalisation and filtering. Both datasets were selected for clean licensing (no API-generated outputs from closed models).
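The normalisation step amounts to mapping each dataset's records into one shared chat-message schema and dropping incomplete records before concatenation. A minimal illustration follows — the field names (`problem`, `solution`, etc.) are hypothetical stand-ins, since each dataset stores its prompts and responses under its own keys and the actual preprocessing code is not part of this card:

```python
from typing import Optional


def normalise(example: dict) -> Optional[dict]:
    """Map a raw record into a shared chat schema; return None to filter it out.

    The key names tried here are hypothetical; a per-dataset adapter would
    supply the real field mapping for each source dataset.
    """
    prompt = example.get("problem") or example.get("question")
    answer = example.get("solution") or example.get("response")
    if not prompt or not answer:
        return None  # drop incomplete records
    return {
        "messages": [
            {"role": "user", "content": prompt.strip()},
            {"role": "assistant", "content": answer.strip()},
        ]
    }


raw = [
    {"problem": "2 + 2 = ?", "solution": "4"},
    {"question": "Capital of France?", "response": "Paris"},
    {"problem": "no answer present"},  # filtered out
]
combined = [r for r in (normalise(x) for x in raw) if r is not None]
print(len(combined))  # 2
```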
---

## Evaluation

Aristaeus was compared against the base Qwen2.5-1.5B-Instruct across six reasoning tasks covering different problem types. Results below are from manual evaluation — no automated benchmark harness was used for this release.
| Task | Aristaeus | Base |
|---|---|---|
| Unit conversion (train speed km → m/s) | ✅ Correct | ❌ Wrong (unit tracking failure) |
| Multi-step word problem (apples) | ✅ Correct | ✅ Correct |
| Deductive logic (mammals/warm-blooded) | ⚠️ Correct answer, minor overreach | ✅ Correct, richer detail |
| Recursive code trace (Fibonacci f(7)) | ❌ Lost thread, no answer | ✅ Correct (13) |
| Exponential growth (bacterial doubling) | ✅ Correct (6400) | ✅ Correct (6400) |
| Spatial constraint reasoning (water jug) | ✅ Correct, includes verification | ❌ Incoherent final steps |

**2 wins / 1 loss / 3 draws** against the base model on this task set, counting the deductive-logic task (both models reached the correct answer) as a draw.

### Honest limitations

**Recursive call stack tracing** is the clearest failure mode. On `f(7)` Fibonacci, Aristaeus lost track of the recursion depth, began questioning its own assumptions, and produced no final answer. The base model handled it correctly. This is consistent with a known capacity ceiling at 1.5B parameters for problems that require holding many simultaneous state variables. A 7B model would likely not exhibit this failure.

**Logical overconfidence** was observed on the deductive reasoning prompt. The model correctly concluded dolphins are warm-blooded, but also asserted snakes are cold-blooded purely from the premise "snakes are not mammals" — which does not logically follow without additional premises. The model has learned to produce confident, structured conclusions, which occasionally leads it to state more than the premises support. This is a known SFT artefact when training data rewards assertive, well-formatted responses.

The eval loss curve plateaued convincingly from step ~2800 onward, suggesting the model saturated the current dataset. Additional epochs would not improve this release.
---

## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EphAsad/Aristaeus")
tokenizer = AutoTokenizer.from_pretrained("EphAsad/Aristaeus")

messages = [
    {"role": "system", "content": "You are a helpful reasoning assistant."},
    {"role": "user", "content": "A bacterial culture starts with 100 cells and doubles every 20 minutes. How many cells after 2 hours?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=1024, temperature=0.6, top_p=0.9, do_sample=True)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
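
The example prompt has a closed-form answer that can be checked directly, independent of the model:

```python
# Bacterial doubling: start at 100 cells, double every 20 minutes, run 2 hours.
doublings = (2 * 60) // 20   # 6 doubling periods in 120 minutes
cells = 100 * 2 ** doublings
print(cells)  # 6400 — the answer both models produced in the eval table
```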

---

## Roadmap

Aristaeus is a Stage 1 release. Two further stages are planned:

**Stage 2 — Agentic tool use.** Fine-tuning on `lambda/hermes-agent-reasoning-traces` (Apache 2.0, agentic trajectories with `<think>` blocks and real tool execution results) at 16k context. The intention is to teach the model *when* and *how* to use tools, layered on top of the reasoning foundation established here.

---

## Author

Built by **Zain Asad** (Eph) — Senior Microbiology Analyst and Applied AI Engineer.

Core portfolio: [BactAID](https://doi.org/10.5281/zenodo.18089381) · [DomainEmbedder](https://huggingface.co/EphAsad/DomainEmbedder) · FireSOP · FireAccess LIMS · Eidos · Ananke

---

## Licence

Apache 2.0 — consistent with the base model and training datasets used.