ModelHub XC 2dd32c48bb Initialize project; model provided by the ModelHub XC community
Model: rekabytes/hmanlab-ai-v0.1
Source: Original Platform
2026-05-25 01:49:17 +08:00

---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-4B
tags:
- text-generation
- conversational
- tool-use
- agentic
- qwen3
- lora
- qlora
language:
- en
datasets:
- lambda/hermes-agent-reasoning-traces
- angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k
pipeline_tag: text-generation
---
# hmanlab-ai v0.1
`hmanlab-ai v0.1` is an open-source fine-tune of [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), focused on **agentic tool use** and **step-by-step reasoning**. It self-identifies as `hmanlab`.
This is a **research preview** released by [@rekabytes](https://huggingface.co/rekabytes) under Apache 2.0. The model is not affiliated with Anthropic, OpenAI, Google, Meta, or Alibaba beyond using Qwen3 as the open-source base.
## Quick start
### Transformers (PyTorch)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rekabytes/hmanlab-ai-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are hmanlab, a helpful AI assistant."},
    {"role": "user", "content": "What is 17 × 23? Show your work."},
]

# Render the chat template to a prompt string that ends with the assistant header.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
### Ollama (GGUF Q4_K_M)
A Q4_K_M GGUF (~2.4 GB) is included in this repo for use with Ollama, llama.cpp, and LM Studio.
```bash
# Download the GGUF
wget https://huggingface.co/rekabytes/hmanlab-ai-v0.1/resolve/main/hmanlab-ai-v0-1.q4_k_m.gguf
# Create a Modelfile pointing at it
cat > Modelfile <<'EOF'
FROM ./hmanlab-ai-v0-1.q4_k_m.gguf
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER stop "<|im_end|>"
SYSTEM """You are hmanlab, a helpful AI assistant."""
EOF
ollama create hmanlab-ai -f Modelfile
ollama run hmanlab-ai
```
## Tool use
The model was trained on multi-turn agentic traces with `<tool_call>` / `<tool_response>` blocks. Provide tools in the system prompt as a JSON schema inside `<tools>` tags, and the model will emit calls in the same format:
```
<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo"}}
</tool_call>
```
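A minimal sketch of extracting these calls from generated text, assuming the output contains zero or more `<tool_call>` blocks that each wrap a single JSON object as shown above. The helper name `parse_tool_calls` is hypothetical and not part of this repo.

```python
import json
import re

# Matches one <tool_call>...</tool_call> block; DOTALL lets the JSON span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Return a list of (name, arguments) pairs found in model output."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            obj = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of raising
        if isinstance(obj, dict) and "name" in obj:
            calls.append((obj["name"], obj.get("arguments", {})))
    return calls


sample = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Tokyo"}}\n</tool_call>'
print(parse_tool_calls(sample))  # [('get_weather', {'city': 'Tokyo'})]
```

Malformed blocks are skipped silently here. A production agent loop would more likely report the parse error back to the model so it can retry.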
Example system prompt for tool use:
```
You are hmanlab, an AI assistant capable of using tools.
<tools>
[
  {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {
      "type": "object",
      "properties": {"city": {"type": "string"}},
      "required": ["city"]
    }
  }
]
</tools>
When you need to call a tool, emit:
<tool_call>
{"name": "<tool_name>", "arguments": {...}}
</tool_call>
```
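If the tool list is built programmatically, the system prompt above can be assembled from Python data. The sketch below mirrors the example prompt. The function name `build_tool_system_prompt` is hypothetical, and the exact whitespace the model expects is an assumption.

```python
import json


def build_tool_system_prompt(tools, identity="You are hmanlab, an AI assistant capable of using tools."):
    """Assemble a system prompt with a JSON tool list inside <tools> tags."""
    tools_json = json.dumps(tools, indent=2)
    return (
        f"{identity}\n"
        f"<tools>\n{tools_json}\n</tools>\n"
        "When you need to call a tool, emit:\n"
        "<tool_call>\n"
        '{"name": "<tool_name>", "arguments": {...}}\n'
        "</tool_call>"
    )


weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

system_prompt = build_tool_system_prompt([weather_tool])
print(system_prompt)
```

The resulting string can be passed as the `system` message in the Transformers quick-start example.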
## Model details
| | |
|---|---|
| Base model | [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) |
| Parameters | ~4 B |
| Context length | 4,096 (training) — base supports up to 32,768 |
| License | Apache 2.0 |
| Identity | "hmanlab" (open-source assistant) |
| Format | safetensors (FP16) + GGUF Q4_K_M |
## Training data
| Dataset | Role | Size used | License |
|---|---|---|---|
| [lambda/hermes-agent-reasoning-traces](https://huggingface.co/datasets/lambda/hermes-agent-reasoning-traces) (glm-5.1 config) | Multi-turn agentic tool use | 1,722 (after 8k-token filter) | Apache 2.0 |
| [angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k](https://huggingface.co/datasets/angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k) | Step-by-step reasoning | 8,298 (deduped) | Apache 2.0 |
| Identity SFT (custom, 396 examples) | Self-identification + adversarial probes | 396 × 3 epochs | Apache 2.0 |
| **Total main training** | | **10,020 train + 500 eval** (~22 M tokens) | |
The identity SFT stage (Stage 2 of training) used a small custom dataset of 396 examples covering "who are you" variants and adversarial false-identity probes (e.g., "are you Claude / GPT / LLaMA / Gemini"). It was needed because the Opus dataset (83% of the main mix) bled Anthropic identity markers into the model.
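As a quick sanity check on the figures in the table, the arithmetic behind the main-mix total and the Opus share can be reproduced as follows. The variable names are illustrative, and the numbers are taken directly from the table above.

```python
# Example counts from the training-data table.
agentic_examples = 1_722    # hermes-agent-reasoning-traces, after filtering
reasoning_examples = 8_298  # Opus reasoning dataset, after dedup
identity_examples = 396     # custom identity SFT set
identity_epochs = 3

main_mix_total = agentic_examples + reasoning_examples
reasoning_share = reasoning_examples / main_mix_total
identity_passes = identity_examples * identity_epochs

print(main_mix_total)                    # 10020
print(round(100 * reasoning_share, 1))   # 82.8
print(identity_passes)                   # 1188
```

The share rounds to the 83% quoted above.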
## Training procedure
Two-stage QLoRA on a single RTX 3060 Ti (8 GB):
**Stage 1 — Main mix (4h 12m wall-clock):**
- LoRA r=32, alpha=64, dropout=0, targets all linear projections
- 4-bit base (bnb), bf16 compute, batch=1, grad_accum=8 (eff batch=8)
- 2 epochs, LR 2e-4, linear schedule, 100 warmup steps
- Final train loss 1.349 / eval loss 1.379 (close train and eval losses, no sign of overfitting)
**Stage 2 — Identity SFT (~5 min wall-clock):**
- Continued training of stage-1 adapter for 3 epochs on 396 identity examples
- batch=1, grad_accum=4, LR 1e-4, 20 warmup steps
- Final loss 0.135 (strong memorization)
The released weights are stage-1 LoRA + stage-2 LoRA merged into the FP16 base.
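For a rough sense of scale, the optimizer step counts implied by these settings can be estimated as below. This is a back-of-the-envelope sketch that assumes a single device and rounds a partial final batch up to a full step. The actual trainer may count steps slightly differently, and `optimizer_steps` is a hypothetical helper.

```python
import math


def optimizer_steps(num_examples, per_device_batch, grad_accum, epochs):
    """Estimate optimizer steps, counting a partial final batch as one step."""
    effective_batch = per_device_batch * grad_accum
    return math.ceil(num_examples / effective_batch) * epochs


# Stage 1: 10,020 examples, batch=1, grad_accum=8, 2 epochs
stage1_steps = optimizer_steps(10_020, 1, 8, 2)
# Stage 2: 396 examples, batch=1, grad_accum=4, 3 epochs
stage2_steps = optimizer_steps(396, 1, 4, 3)

print(stage1_steps, stage2_steps)               # 2506 297
print(round(100 * 100 / stage1_steps, 1))       # 4.0  (warmup share, stage 1, in %)
print(round(100 * 20 / stage2_steps, 1))        # 6.7  (warmup share, stage 2, in %)
```

Under these assumptions, warmup covers roughly 4% of stage 1 and about 7% of stage 2.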
## Known limitations
- **Empty `<think>` blocks.** The Qwen3 chat template inserts an empty `<think>\n\n</think>` block before each assistant turn, and the model was not trained to fill it. Reasoning still happens in the visible response; the thinking channel is just unused. Fix planned for v0.2.
- **Token-budget verbosity.** The model is more concise than base Qwen3-4B and stays within token budgets more reliably. Base Qwen3-4B may be preferable when you want verbose visible reasoning and have a generous output budget.
- **English-focused.** Training data was English; non-English performance falls back to base Qwen3-4B capability.
- **Small base.** This is a 4B model. Hard reasoning, long-context coding, and broad world knowledge are bounded by base Qwen3-4B's capacity. For harder tasks, try Qwen3-8B or larger.
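Until the empty `<think>` behavior noted above is addressed, one possible workaround is to strip empty thinking blocks from decoded output before displaying it. This is a minimal sketch, assuming the tags survive decoding as literal text. The helper name `strip_empty_think` is hypothetical.

```python
import re

# Matches a <think> block that contains only whitespace, plus trailing whitespace.
EMPTY_THINK_RE = re.compile(r"<think>\s*</think>\s*")


def strip_empty_think(text):
    """Remove empty <think></think> blocks; leave non-empty ones untouched."""
    return EMPTY_THINK_RE.sub("", text)


raw = "<think>\n\n</think>\n\nThe answer is 391."
print(strip_empty_think(raw))  # The answer is 391.
```

Non-empty thinking blocks are left in place, so the helper should remain harmless if a later version starts filling the thinking channel.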
## Disclaimer
This is an independent open-source research preview. It is **not** affiliated with, endorsed by, or representing:
- **Anthropic** (Claude). The model's training data includes synthetic Claude outputs from a public Apache-2.0 dataset (`angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k`); the released model is not Claude and should not be presented as such.
- **OpenAI** (GPT/ChatGPT).
- **Meta** (LLaMA).
- **Google** (Gemini/Bard).
- **Alibaba** (Qwen team). The base model Qwen3-4B is theirs under their license; this fine-tune is community work.
## Citation
```bibtex
@misc{hmanlab-ai-v0.1,
title = {hmanlab-ai v0.1: an agentic + reasoning fine-tune of Qwen3-4B},
author = {rekabytes},
year = {2026},
url = {https://huggingface.co/rekabytes/hmanlab-ai-v0.1}
}
```
## Acknowledgments
- Qwen team for the [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) base model
- Lambda Labs for [hermes-agent-reasoning-traces](https://huggingface.co/datasets/lambda/hermes-agent-reasoning-traces)
- [angrygiraffe](https://huggingface.co/angrygiraffe) for the Opus reasoning dataset
- [Unsloth](https://github.com/unslothai/unsloth) for the QLoRA training stack
- [llama.cpp](https://github.com/ggml-org/llama.cpp) for the GGUF conversion tools