ModelHub XC 539210bd62 Initialize project; model provided by the ModelHub XC community
Model: ramankrishna10/npc-agentic-7b-v3
Source: Original Platform
2026-05-26 07:34:19 +08:00

---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- reasoning
- agent
- bottensor
- npc
language:
- en
library_name: transformers
---
# NPC Agentic 7B (v1)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19954103.svg)](https://doi.org/10.5281/zenodo.19954103)

A 7B long-form reasoning and agent-trace specialist from the Bottensor NPC
Model Family.
## Overview
NPC Agentic v1 is fine-tuned from Qwen2.5-7B-Instruct on a mix of distilled
reasoning traces (GLM-5.1) and agent tool-use traces (Hermes). It's built for
structured multi-step reasoning with explicit `<think>` blocks, agentic /
tool-calling workflows, and identity-bound conversations as the NPC Agentic
persona.
## Training
- **Base:** Qwen/Qwen2.5-7B-Instruct
- **Method:** QLoRA SFT (r=64, α=128), merged to FP16
- **Context during training:** 8K (inherits 128K from base at inference)
- **Epochs:** 2 (effective batch size 16, cosine LR 2e-4, adamw_8bit, bf16)
- **Total optimizer steps:** 11,410 over ~96 GPU-hours on a single A40
- **Trainable params:** 161.5M (3.2% of the 5.05B-param 4-bit base)
- **Final eval loss:** 0.7025 (on held-out SFT split)
- **Training data mix (~91K examples):**
  - GLM-5.1-Reasoning-1M-Cleaned (main split; 100K sampled → 87K kept after the 8K length filter)
  - Hermes-agent-reasoning-traces (glm-5.1 + kimi subsets; 14.7K → 3.6K kept)
  - Bottensor identity replay (750 synthetic examples)
- The training dataset is proprietary and not released.
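
The trainable-parameter and step counts above can be sanity-checked with a little arithmetic. The sketch below assumes the public Qwen2.5-7B dimensions (hidden size 3584, MLP size 18944, 28 layers, 4 KV heads of dimension 128) and that LoRA adapters were attached to all seven linear projections in each decoder layer. The card does not list the target modules, so that part is an assumption, and the helper names are hypothetical.

```python
# Sanity-check sketch for the Training numbers. Layer shapes follow the public
# Qwen2.5-7B config; the LoRA target modules are an ASSUMPTION (all seven
# linear projections per decoder layer), since the card does not list them.

HIDDEN = 3584      # hidden_size
MLP = 18944        # intermediate_size
LAYERS = 28        # num_hidden_layers
KV_DIM = 4 * 128   # num_key_value_heads * head_dim

# (in_features, out_features) of each adapted projection in one decoder layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_trainable_params(rank: int) -> int:
    """A LoRA adapter on a (d_in, d_out) linear layer adds rank * (d_in + d_out) weights."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in PROJECTIONS.values())
    return per_layer * LAYERS


def examples_per_epoch(optimizer_steps: int, effective_batch: int, epochs: int) -> float:
    """Invert steps = epochs * examples / batch to recover the per-epoch example count."""
    return optimizer_steps * effective_batch / epochs


if __name__ == "__main__":
    print(lora_trainable_params(rank=64))
    print(examples_per_epoch(11_410, 16, 2))
```

Under these assumptions the adapter count comes to 161,480,704, which rounds to the 161.5M reported, and the step count implies about 91.3K examples per epoch, in line with the ~91K data mix.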
## What it's good at
- **Long structured reasoning** — emits `<think>` blocks then concludes with an answer; strong at multi-step decomposition (system design, root-cause analysis, algorithmic reasoning)
- **Identity as NPC Agentic / Bottensor** — 100% recall on canonical identity prompts
- **Agent / tool-call shaping** — follows Hermes-style `<tool_call>` / `<tool_response>` patterns
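
Downstream code usually needs to separate the `<think>` reasoning from the final answer and pull out any tool calls. The following is a minimal sketch of one way to do that with the standard library. It assumes the Hermes convention of a JSON object (with `name` and `arguments` keys) inside each `<tool_call>` block; the `parse_output` helper and `ParsedOutput` type are hypothetical names, not part of the model's release.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


@dataclass
class ParsedOutput:
    reasoning: str                     # concatenated <think> content
    answer: str                        # text left after removing tagged blocks
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)


def parse_output(text: str) -> ParsedOutput:
    """Split generated text into reasoning, tool calls, and the remaining answer."""
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(text))
    remainder = THINK_RE.sub("", text)

    # An opening tag with no closing tag means generation stopped mid-reasoning:
    # treat everything after it as reasoning and keep only the text before it.
    if "<think>" in remainder:
        head, _, tail = remainder.partition("<think>")
        reasoning = (reasoning + "\n" + tail).strip()
        remainder = head

    tool_calls = []
    for raw in TOOL_CALL_RE.findall(remainder):
        try:
            tool_calls.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip payloads that are not valid JSON (assumed format)

    answer = TOOL_CALL_RE.sub("", remainder).strip()
    return ParsedOutput(reasoning=reasoning, answer=answer, tool_calls=tool_calls)
```

An empty `answer` together with non-empty `reasoning` is a cheap signal that the model never closed its `<think>` block, which a caller could use to retry with a larger `max_new_tokens`.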
## Known limitations
- **GSM8K regression vs base.** On a 100-sample GSM8K test:
  - Base Qwen2.5-7B-Instruct: **61%**
  - NPC Agentic v1: **~25%**
  - Cause: the model learned to emit long `<think>` blocks but often doesn't terminate arithmetic cleanly under greedy or low-temperature decoding, and direct-arithmetic quality regressed.
  - **Recommendation:** for math-heavy workflows, use the base `Qwen/Qwen2.5-7B-Instruct` or `Qwen/Qwen2.5-Math-7B-Instruct` instead. A v2 with stronger reasoning data (OpenThoughts-114k at 16K) is planned.
- **8K training context** means long-reasoning samples were truncated during training; inference has not been validated past 16K tokens.
- **Small model** — will hallucinate on unfamiliar domains.
- **Not for safety-critical decisions** (medical, legal, financial).
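
Because the GSM8K comparison uses only 100 samples, it helps to see how much sampling noise that leaves. The sketch below computes 95% Wilson score intervals for both accuracies. It assumes the "~25%" figure corresponds to 25 correct answers out of 100 and treats the samples as independent; `wilson_interval` is a hypothetical helper, not part of any evaluation harness used for this card.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    base = wilson_interval(61, 100)       # base Qwen2.5-7B-Instruct
    finetuned = wilson_interval(25, 100)  # ASSUMPTION: "~25%" read as 25/100
    print(f"base:      {base[0]:.3f} to {base[1]:.3f}")
    print(f"finetuned: {finetuned[0]:.3f} to {finetuned[1]:.3f}")
```

Under those assumptions the two intervals do not overlap (roughly 51% to 70% versus 18% to 34%), so the regression is unlikely to be sampling noise, although a 100-sample test still leaves wide error bars on the exact size of the gap.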
## Intended use
- Multi-step reasoning with explicit work-showing
- Agent / tool-use workflows
- Structured problem-solving where the model benefits from thinking out loud
- As a base for further fine-tuning on reasoning or domain-specific data
## Out of scope
- Direct GSM8K-style arithmetic (use base or Qwen-Math)
- Creative writing, roleplay
- Medical / legal / financial advice
- Safety-critical decisions
## Inference
```python
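from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the tokenizer and weights in bf16, placed automatically on available devices.
tok = AutoTokenizer.from_pretrained("ramankrishna10/npc-agentic-7b")
model = AutoModelForCausalLM.from_pretrained(
    "ramankrishna10/npc-agentic-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Design an event-sourced microservice with exactly-once command handling."},
]

# Render the chat template and append the assistant-turn prefix.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# do_sample=True is set explicitly so temperature and top_p are not ignored
# if the checkpoint's generation config defaults to greedy decoding.
out = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))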
```
## Citation
If you use NPC Agentic 7B in your work, please cite:
```bibtex
@misc{bachu2026npcagentic7b,
  title     = {NPC Agentic 7B: A Single-GPU QLoRA Recipe for a Laptop-Scale Conversational Model},
  author    = {Bachu, Rama Krishna},
  year      = {2026},
  month     = may,
  publisher = {Zenodo},
  version   = {v1},
  doi       = {10.5281/zenodo.19954103},
  url       = {https://doi.org/10.5281/zenodo.19954103},
  note      = {Preprint}
}
```
Paper: <https://doi.org/10.5281/zenodo.19954103>

---
Built by [Bottensor](https://bottensor.xyz).