Initialize project; model provided by the ModelHub XC community
Model: ramankrishna10/npc-fast-1.7b Source: Original Platform
35
.gitattributes
vendored
Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
166
README.md
Normal file
@@ -0,0 +1,166 @@
---
license: apache-2.0
language:
- en
base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
library_name: transformers
tags:
- agentic
- router
- smollm
- escalation
- bf16
---

# NPC Fast 1.7B

**Fast agentic router**: a 1.7B parameter model that decides whether to handle a
request itself or forward it to a larger partner model (`NPC Fin 32B`).

Trained on top of `HuggingFaceTB/SmolLM2-1.7B-Instruct` by
[Bottensor](https://bottensor.ai) (a Falcon Hash company).

## What it is

A small, fast router + agentic model. For every user request it emits:

```json
{"route": "self" | "npc_fin", "reason": "<short>"}
```

- `self`: it handles the task directly (lookup, format conversion, short code,
  tool calls with obvious args, identity, translation, chit-chat)
- `npc_fin`: it forwards to a 32B finance-specialist model (deep multi-step
  financial reasoning, valuation, derivatives math, long-document synthesis)
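
As a rough illustration of how a caller might consume this output, the sketch below validates the router's JSON and dispatches on `route`. The names `parse_route`, `dispatch`, and the two handler callables are hypothetical and not part of this repo. Falling back to `self` on malformed output is an assumption that mirrors the `Default: route=self` rule in the system prompt.

```python
import json
from typing import Callable

VALID_ROUTES = {"self", "npc_fin"}


def parse_route(raw: str) -> dict:
    """Parse the router output; fall back to route=self when it is unusable."""
    try:
        obj = json.loads(raw.strip())
    except json.JSONDecodeError:
        return {"route": "self", "reason": "unparseable router output"}
    if not isinstance(obj, dict) or obj.get("route") not in VALID_ROUTES:
        return {"route": "self", "reason": "invalid route field"}
    return {"route": obj["route"], "reason": str(obj.get("reason", ""))}


def dispatch(
    raw: str,
    query: str,
    handle_self: Callable[[str], str],
    handle_npc_fin: Callable[[str], str],
) -> str:
    """Send the query to the handler selected by the router decision."""
    decision = parse_route(raw)
    handler = handle_npc_fin if decision["route"] == "npc_fin" else handle_self
    return handler(query)
```

In a real deployment `handle_npc_fin` would call the larger partner model; here both handlers are placeholders supplied by the caller.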

## Training recipe

1. **Full-weight continual pre-training** on top of SmolLM2-1.7B-Instruct
   - 2,825 global steps, bf16, flash-attention-2, gradient checkpointing
   - 5-stage curriculum planned (4K → 16K → 32K → 64K → 64K), actual training
     stopped after stage 2 (16K)
   - Data: ~60K examples (agentic traces, function calling, tool use, reasoning)
   - Liger fused kernels (fused linear CE + fused RMSNorm + fused SwiGLU + RoPE)
   - YaRN RoPE scaling configured for 128K (factor 16), but **not validated
     past 16K**, see limitations below

2. **Router LoRA fine-tune** (rank-32, 3 epochs, 189 steps, loss 0.001)
   - 500 router pairs (300 self + 200 npc_fin)
   - Merged back into the base weights; this repo is the merged bf16 checkpoint
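
The step count reported for the router fine-tune can be sanity-checked with a little arithmetic. The sketch below assumes an effective batch size of 8, which this README does not state; it is simply the only integer batch size for which 500 pairs over 3 epochs gives 189 optimizer steps when the last partial batch of each epoch is kept. `optimizer_steps` is a hypothetical helper.

```python
import math


def optimizer_steps(num_examples: int, epochs: int, effective_batch: int) -> int:
    """Optimizer steps when the last partial batch of each epoch still counts."""
    return math.ceil(num_examples / effective_batch) * epochs


ROUTER_PAIRS = 300 + 200  # self + npc_fin pairs, from the recipe above

# effective_batch=8 is an assumption, not a documented hyperparameter
steps = optimizer_steps(ROUTER_PAIRS, epochs=3, effective_batch=8)
print(steps)  # 189
```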

## Evaluation

Run against the merged checkpoint at 16K context:

| Benchmark | Metric | Result |
|---|---|---|
| BFCL (tool calling, n=20) | JSON / name / args accuracy | **100% / 100% / 100%** |
| IFEval (n=200, 18 checkable) | instruction pass rate | **77.8%** |
| Agentic tool selection (n=100) | JSON valid / tool accuracy | **100% / 57%** |
| Router, in-distribution (n=200) | accuracy | **100%** (see note) |
| Router, out-of-distribution (n=60) | accuracy | **98.3%** |
| Router, OOD escalation recall / precision | recall / precision | **100% / 100%** |
| Needle-in-Haystack @ 16K | pass (1 of 5 depths) | 20% |
| Needle-in-Haystack @ 32K+ | pass | **0%** (see limitations) |

*In-distribution router eval uses the same seed query pool as the training set,
so the 100% number measures format fidelity, not generalization. The OOD eval
uses 60 genuinely novel queries; that 98.3% is the honest router number. The
single OOD error was a JSON formatting glitch; the routing decision was correct.*
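
To make the router metrics concrete, here is a minimal sketch of how accuracy and escalation recall / precision could be computed, treating `npc_fin` as the positive class. The function name `escalation_metrics` and the toy labels are made up for illustration; they are not the evaluation harness or data behind the table. One error in 60 queries corresponds to 59/60 ≈ 98.3%, which is consistent with the reported OOD accuracy.

```python
def escalation_metrics(gold: list[str], pred: list[str]) -> dict[str, float]:
    """Accuracy plus recall/precision for the 'npc_fin' (escalate) class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == "npc_fin" and p == "npc_fin" for g, p in pairs)
    fp = sum(g != "npc_fin" and p == "npc_fin" for g, p in pairs)
    fn = sum(g == "npc_fin" and p != "npc_fin" for g, p in pairs)
    return {
        "accuracy": sum(g == p for g, p in pairs) / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Toy labels only, chosen so each metric is easy to check by hand
gold = ["self", "npc_fin", "npc_fin", "self"]
pred = ["self", "npc_fin", "self", "npc_fin"]
metrics = escalation_metrics(gold, pred)
```

With these toy labels there is one true positive, one false positive, and one false negative, so all three metrics come out to 0.5.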

## Intended use

- Agentic routing: deciding between self-handling and escalation
- Light tool-calling and function-calling tasks
- Short-context (≤16K) instruction following
- Drop-in replacement for SmolLM2 in systems that want a router-fine-tuned head

## Limitations and honest disclosures

- **Context is 16K in practice.** The config advertises 128K via YaRN scaling,
  but training stopped after the 16K curriculum stage. Needle-in-haystack at
  32K/64K/128K produces degenerate output (repetitive tokens). Use at your own
  risk past 16K.
- **Router trained on a small synthetic dataset** (500 pairs). OOD eval is
  strong but the data diversity is limited. Expect edge cases outside the
  finance vs. general-task split.
- **No RLHF / DPO.** This is pure continual pre-training + supervised fine-tune.
  Refusal behavior is inherited from the base SmolLM2-Instruct.
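
Given the practical 16K limit, a caller may want to check prompt length before calling `generate`. The sketch below is one possible guard, not part of this repo: `SAFE_CONTEXT_TOKENS` and `check_prompt_length` are hypothetical names, and reading "16K" as 16,384 tokens is an assumption. The advertised window is reproduced from the `original_max_position_embeddings` (8192) and `factor` (16.0) values in this commit's `config.json`.

```python
SAFE_CONTEXT_TOKENS = 16 * 1024  # assumed meaning of "16K" in this README

# YaRN arithmetic from config.json: 8192 original positions x factor 16
ADVERTISED_CONTEXT = 8192 * 16  # 131072, matches max_position_embeddings


def check_prompt_length(
    num_prompt_tokens: int,
    max_new_tokens: int,
    limit: int = SAFE_CONTEXT_TOKENS,
) -> bool:
    """Return True if prompt plus generation budget stays within the limit."""
    return num_prompt_tokens + max_new_tokens <= limit
```

The token count would typically come from the tokenizer, for example `len(tok(prompt)["input_ids"])`; what to do on failure (truncate, summarize, or reject) is left to the caller.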

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "ramankrishna10/npc-fast-1.7b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("ramankrishna10/npc-fast-1.7b")

SYSTEM = (
    "You are NPC Fast, a capable 1.7B model. Handle most requests yourself. "
    "Only forward to the larger NPC Fin 32B model when a task truly requires "
    "deep multi-step financial analysis that you cannot do well alone.\n\n"
    "Default: route=self.\n"
    "Escalate to npc_fin ONLY if ALL of these are true:\n"
    "  - the task is about finance, markets, banking, derivatives, or valuation\n"
    "  - it requires multi-step quantitative reasoning or deep domain knowledge\n"
    "  - a short answer would be wrong or superficial\n\n"
    "Output exactly one JSON object with fields route and reason."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Build a DCF for TSLA with 3 scenarios."},
]
# return_dict=True gives input_ids and attention_mask regardless of the
# library's default return type for apply_chat_template
enc = tok.apply_chat_template(messages, tokenize=True, return_tensors="pt",
                              add_generation_prompt=True,
                              return_dict=True).to(model.device)
out = model.generate(**enc, max_new_tokens=60, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
print(tok.decode(out[0][enc["input_ids"].shape[-1]:], skip_special_tokens=True))
# → {"route": "npc_fin", "reason": "multi-step finance model"}
```

### Runtime 4-bit quantization (bitsandbytes)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 weights with bf16 compute; requires the bitsandbytes package
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "ramankrishna10/npc-fast-1.7b", quantization_config=bnb, device_map="auto",
)
```

### GGUF / llama.cpp

See companion repo: [`ramankrishna10/npc-fast-1.7b-gguf`](https://huggingface.co/ramankrishna10/npc-fast-1.7b-gguf)

## Credits

- Built by **Bottensor** (a **Falcon Hash** company), creator: **dude.npc**
- Base model: [`HuggingFaceTB/SmolLM2-1.7B-Instruct`](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct)
- Training framework: custom trainer wrapping HF Trainer + Liger-Kernel +
  FlashAttention-2 + YaRN RoPE scaling

## Citation

If you use this model or build on its training recipe, please cite the
accompanying preprint:

> Bachu, R. K. (2026). *NPC Fast 1.7B: Building a Usable Small Model on
> a Single H100.* Zenodo. https://doi.org/10.5281/zenodo.19771040

```bibtex
@misc{bachu2026npcfast,
  title     = {NPC Fast 1.7B: Building a Usable Small Model on a Single H100},
  author    = {Bachu, Rama Krishna},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19771040},
  url       = {https://doi.org/10.5281/zenodo.19771040},
  note      = {Preprint},
}
```
6
chat_template.jinja
Normal file
@@ -0,0 +1,6 @@
{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
' }}{% endif %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
48
config.json
Normal file
@@ -0,0 +1,48 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "dtype": "bfloat16",
  "eos_token_id": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 24,
  "num_key_value_heads": 32,
  "pad_token_id": 2,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_parameters": {
    "beta_fast": 32,
    "beta_slow": 1,
    "factor": 16.0,
    "original_max_position_embeddings": 8192,
    "rope_theta": 130000,
    "rope_type": "yarn",
    "type": "yarn"
  },
  "tie_word_embeddings": true,
  "transformers.js_config": {
    "dtype": "q4",
    "kv_cache_dtype": {
      "fp16": "float16",
      "q4f16": "float16"
    },
    "use_external_data_format": {
      "model.onnx": true,
      "model_fp16.onnx": true
    }
  },
  "transformers_version": "5.5.4",
  "use_cache": true,
  "vocab_size": 49152
}
8
generation_config.json
Normal file
@@ -0,0 +1,8 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 131072,
  "pad_token_id": 2,
  "transformers_version": "5.5.4"
}
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dc4423f138d60b862906803965bb43540020120b70c7836ee71d1329d7a3339c
size 3422777952
244965
tokenizer.json
Normal file
File diff suppressed because it is too large
18
tokenizer_config.json
Normal file
@@ -0,0 +1,18 @@
{
  "add_prefix_space": false,
  "backend": "tokenizers",
  "bos_token": "<|im_start|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": [
    "<|im_start|>",
    "<|im_end|>"
  ],
  "is_local": true,
  "model_max_length": 8192,
  "pad_token": "<|im_end|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>",
  "vocab_size": 49152
}