---
language:
- en
license: apache-2.0
library_name: transformers
base_model:
- mistralai/Mistral-Nemo-Base-2407 # lightweight student
- Qwen/Qwen3-235B-A22B # thinking + non-thinking teacher
tags:
- distillation
- /think
- /nothink
- reasoning-transfer
- arcee-ai
---
![Homunculus Logo](https://huggingface.co/arcee-ai/Homunculus/resolve/main/logo.jpg)
# Arcee **Homunculus-12B**
**Homunculus** is a 12 billion-parameter instruction model distilled from **Qwen3-235B** onto the **Mistral-Nemo** backbone.
It was purpose-built to preserve Qwen's two-mode interaction style—`/think` (deliberate chain-of-thought) and `/nothink` (concise answers)—while running on a single consumer GPU.
---
## ✨ What's special?
| Feature | Detail |
| --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Reasoning-trace transfer** | Instead of copying just final probabilities, we align *full* logit trajectories, yielding more faithful reasoning. |
| **Total-Variation-Distance loss** | Matches the teacher's confidence distribution and smooths the loss landscape (see the sketch below the table). |
| **Tokenizer replacement** | The original Mistral tokenizer was swapped for Qwen3's tokenizer. |
| **Dual interaction modes** | Use `/think` when you want transparent step-by-step reasoning (good for analysis & debugging). Use `/nothink` for terse, production-ready answers. The tags are most reliable when placed in the system message. |
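
The actual training code is not published in this card, but as a rough illustration of the first two rows, a total-variation-distance objective over per-position next-token distributions might look like the hypothetical sketch below. It assumes teacher and student share a vocabulary, which is exactly why the Mistral tokenizer was swapped for Qwen3's:

```python
import torch
import torch.nn.functional as F

def tvd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """Total variation distance between student and teacher next-token
    distributions, averaged over every position in the sequence.

    Both tensors have shape (batch, seq_len, vocab_size). Because the loss
    is applied at every position, it aligns the full logit trajectory
    rather than only the final-answer tokens.
    """
    p = F.softmax(student_logits, dim=-1)
    q = F.softmax(teacher_logits, dim=-1)
    # TVD(p, q) = 0.5 * sum_i |p_i - q_i|, computed per position, then averaged
    return 0.5 * (p - q).abs().sum(dim=-1).mean()
```

Unlike a cross-entropy on final answers alone, this penalizes any position where the student's confidence profile drifts from the teacher's, which is one way to read the "reasoning-trace transfer" claim above.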
---
## Benchmark results
| Benchmark | Score |
| --------- | ----- |
| GPQA Diamond (average of 3 runs) | 57.1% |
| MMLU | 67.5% |
## 🔧 Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "arcee-ai/Homunculus"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

# /think mode - chain-of-thought reasoning
messages = [
    {"role": "system", "content": "You are a helpful assistant. /think"},
    {"role": "user", "content": "Why is the sky blue?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# /nothink mode - direct answers
messages = [
    {"role": "system", "content": "You are a helpful assistant. /nothink"},
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(
    inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## 💡 Intended Use & Limitations
Homunculus is designed for:
* **Research** on reasoning-trace distillation, logit imitation, and mode-switchable assistants.
* **Lightweight production** deployments that need strong reasoning in under 12 GB of VRAM (see the quantized-loading sketch after this list).
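
To fit comfortably under that VRAM budget, 4-bit quantization via bitsandbytes is one option. This is a generic Transformers recipe rather than something the model card prescribes, so treat it as a hedged sketch:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "arcee-ai/Homunculus"

# 4-bit NF4 quantization; assumes the bitsandbytes package is installed
# and a CUDA GPU is available.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

At 4 bits per weight, the 12 B parameters occupy roughly 6-7 GB, leaving headroom for the KV cache on a 12 GB card.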
### Known limitations
* May inherit biases from the Qwen3 teacher and internet-scale pretraining data.
* Long-context (>32k tokens) use is experimental; expect extra latency and memory overhead.
---