Initialize project; model provided by the ModelHub XC community

Model: ramankrishna10/npc-fast-1.7b Source: Original Platform
2026-06-16 07:54:17 +08:00
commit 756eb11696
8 changed files with 245249 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
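As a rough illustration of what the `.gitattributes` rules above do, the sketch below checks which file names would be routed through Git LFS. It uses Python's `fnmatch` on the base name only, which is an approximation: real gitattributes matching has its own rules (for example for `saved_model/**/*`), and `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical names that cover only a subset of the patterns.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A subset of the patterns from the .gitattributes hunk (assumption: base-name
# matching with fnmatch is close enough for these simple globs).
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.onnx", "*.pt", "*.pth", "*.gz", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Return True if the file's base name matches any LFS glob pattern."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ["model.safetensors", "config.json", "tokenizer.json", "README.md"]:
        print(candidate, is_lfs_tracked(candidate))
```

Under this approximation, only `model.safetensors` among the files in this commit matches an LFS pattern, which agrees with it being the one file stored as an LFS pointer.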
--- a/README.md
+++ b/README.md
@@ -0,0 +1,166 @@
+---
+license: apache-2.0
+language:
+- en
+base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
+library_name: transformers
+tags:
+- agentic
+- router
+- smollm
+- escalation
+- bf16
+---
+
+# NPC Fast 1.7B
+
+**Fast agentic router** — a 1.7B parameter model that decides whether to handle a
+request itself or forward it to a larger partner model (`NPC Fin 32B`).
+
+Trained on top of `HuggingFaceTB/SmolLM2-1.7B-Instruct` by
+[Bottensor](https://bottensor.ai) (a Falcon Hash company).
+
+## What it is
+
+A small, fast router + agentic model. For every user request it emits:
+
+```json
+{"route": "self" | "npc_fin", "reason": "<short>"}
+```
+
+- `self` — it handles the task directly (lookup, format conversion, short code,
+  tool calls with obvious args, identity, translation, chit-chat)
+- `npc_fin` — it forwards to a 32B finance-specialist model (deep multi-step
+  financial reasoning, valuation, derivatives math, long-document synthesis)
+
+## Training recipe
+
+1. **Full-weight continual pre-training** on top of SmolLM2-1.7B-Instruct
+   - 2,825 global steps, bf16, flash-attention-2, gradient checkpointing
+   - 5-stage curriculum planned (4K → 16K → 32K → 64K → 64K), actual training
+     stopped after stage 2 (16K)
+   - Data: ~60K examples (agentic traces, function calling, tool use, reasoning)
+   - Liger fused kernels (fused linear CE + fused RMSNorm + fused SwiGLU + RoPE)
+   - YaRN RoPE scaling configured for 128K (factor 16) — but **not validated
+     past 16K**, see limitations below
+
+2. **Router LoRA fine-tune** (rank-32, 3 epochs, 189 steps, loss 0.001)
+   - 500 router pairs (300 self + 200 npc_fin)
+   - Merged back into the base weights — this repo is the merged bf16 checkpoint
+
+## Evaluation
+
+Run against the merged checkpoint at 16K context:
+
+| Benchmark | Metric | Result |
+|---|---|---|
+| BFCL (tool calling, n=20) | JSON / name / args accuracy | **100% / 100% / 100%** |
+| IFEval (n=200, 18 checkable) | instruction pass rate | **77.8%** |
+| Agentic tool selection (n=100) | JSON valid / tool accuracy | **100% / 57%** |
+| Router — in-distribution (n=200) | accuracy | **100%** (see note) |
+| Router — out-of-distribution (n=60) | accuracy | **98.3%** |
+| Router — OOD escalation recall / precision | recall / precision | **100% / 100%** |
+| Needle-in-Haystack @ 16K | pass (1 of 5 depths) | 20% |
+| Needle-in-Haystack @ 32K+ | pass | **0%** (see limitations) |
+
+*In-distribution router eval uses the same seed query pool as the training set,
+so the 100% number measures format fidelity, not generalization. The OOD eval
+uses 60 genuinely novel queries — that 98.3% is the honest router number. The
+single OOD error was a JSON formatting glitch; the routing decision was correct.*
+
+## Intended use
+
+- Agentic routing — deciding between self-handling and escalation
+- Light tool-calling and function-calling tasks
+- Short-context (≤16K) instruction following
+- Drop-in replacement for SmolLM2 in systems that want a router-fine-tuned head
+
+## Limitations and honest disclosures
+
+- **Context is 16K in practice.** The config advertises 128K via YaRN scaling,
+  but training stopped after the 16K curriculum stage. Needle-in-haystack at
+  32K/64K/128K produces degenerate output (repetitive tokens). Use at your own
+  risk past 16K.
+- **Router trained on a small synthetic dataset** (500 pairs). OOD eval is
+  strong, but data diversity is limited. Expect edge cases on requests that do
+  not fall cleanly on either side of the finance-versus-general split.
+- **No RLHF / DPO.** This is pure continual pre-training + supervised fine-tune.
+  Refusal behavior is inherited from the base SmolLM2-Instruct.
+
+## Usage
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+
+model = AutoModelForCausalLM.from_pretrained(
+    "ramankrishna10/npc-fast-1.7b",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+tok = AutoTokenizer.from_pretrained("ramankrishna10/npc-fast-1.7b")
+
+SYSTEM = (
+  "You are NPC Fast, a capable 1.7B model. Handle most requests yourself. "
+  "Only forward to the larger NPC Fin 32B model when a task truly requires "
+  "deep multi-step financial analysis that you cannot do well alone.\n\n"
+  "Default: route=self.\n"
+  "Escalate to npc_fin ONLY if ALL of these are true:\n"
+  "  - the task is about finance, markets, banking, derivatives, or valuation\n"
+  "  - it requires multi-step quantitative reasoning or deep domain knowledge\n"
+  "  - a short answer would be wrong or superficial\n\n"
+  "Output exactly one JSON object with fields route and reason."
+)
+
+messages = [
+  {"role": "system", "content": SYSTEM},
+  {"role": "user", "content": "Build a DCF for TSLA with 3 scenarios."},
+]
+# return_dict=True returns input_ids and attention_mask together, so generate() gets both
+enc = tok.apply_chat_template(messages, tokenize=True, return_tensors="pt",
+                               add_generation_prompt=True, return_dict=True).to(model.device)
+out = model.generate(**enc, max_new_tokens=60, do_sample=False)
+print(tok.decode(out[0][enc["input_ids"].shape[-1]:], skip_special_tokens=True))
+# → {"route": "npc_fin", "reason": "multi-step finance model"}
+```
+
+### Runtime 4-bit quantization (bitsandbytes)
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
+                         bnb_4bit_compute_dtype=torch.bfloat16)
+model = AutoModelForCausalLM.from_pretrained(
+    "ramankrishna10/npc-fast-1.7b", quantization_config=bnb, device_map="auto",
+)
+```
+
+### GGUF / llama.cpp
+
+See companion repo: [`ramankrishna10/npc-fast-1.7b-gguf`](https://huggingface.co/ramankrishna10/npc-fast-1.7b-gguf)
+
+## Credits
+
+- Built by **Bottensor** (a **Falcon Hash** company), creator: **dude.npc**
+- Base model: [`HuggingFaceTB/SmolLM2-1.7B-Instruct`](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct)
+- Training framework: custom trainer wrapping HF Trainer + Liger-Kernel +
+  FlashAttention-2 + YaRN RoPE scaling
+
+## Citation
+
+If you use this model or build on its training recipe, please cite the
+accompanying preprint:
+
+> Bachu, R. K. (2026). *NPC Fast 1.7B: Building a Usable Small Model on
+> a Single H100.* Zenodo. https://doi.org/10.5281/zenodo.19771040
+
+```bibtex
+@misc{bachu2026npcfast,
+  title     = {NPC Fast 1.7B: Building a Usable Small Model on a Single H100},
+  author    = {Bachu, Rama Krishna},
+  year      = {2026},
+  publisher = {Zenodo},
+  doi       = {10.5281/zenodo.19771040},
+  url       = {https://doi.org/10.5281/zenodo.19771040},
+  note      = {Preprint},
+}
+```
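The README describes the router contract as a single JSON object with `route` and `reason`, and notes that the one OOD error was a JSON formatting glitch. A caller may therefore want a tolerant parser. The sketch below is one possible way to consume the output; `parse_route` and the fallback to `self` on unparseable text are assumptions made here for illustration (the fallback mirrors the "Default: route=self" line in the system prompt), not part of the released model.

```python
import json
import re

VALID_ROUTES = {"self", "npc_fin"}


def parse_route(raw: str, default: str = "self") -> dict:
    """Extract the first flat JSON object from model output and validate it.

    Falls back to `default` when no valid object is found (assumed policy).
    A `reason` string containing '}' would defeat the simple regex and also
    trigger the fallback.
    """
    match = re.search(r"\{.*?\}", raw, flags=re.DOTALL)
    if match:
        try:
            obj = json.loads(match.group(0))
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and obj.get("route") in VALID_ROUTES:
            return {"route": obj["route"], "reason": str(obj.get("reason", ""))}
    return {"route": default, "reason": "unparseable router output"}


if __name__ == "__main__":
    print(parse_route('{"route": "npc_fin", "reason": "multi-step finance model"}'))
    print(parse_route("Sure! I think this is a finance question."))
```

Whether a failed parse should default to `self` or escalate is a deployment decision; escalating is the more conservative choice when a wrong short answer is costly.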
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,6 @@
+{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system
+You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
+' }}{% endif %}{{'<|im_start|>' + message['role'] + '
+' + message['content'] + '<|im_end|>' + '
+'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
+' }}{% endif %}
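To make the template above easier to read, here is a plain-Python sketch that mirrors its logic without Jinja. `render_chatml` is a hypothetical helper written for illustration; in practice `tok.apply_chat_template` applies the real template. Note that when the first message is not a system message, the template injects the default SmolLM system prompt inherited from the base model.

```python
DEFAULT_SYSTEM = "You are a helpful AI assistant named SmolLM, trained by Hugging Face"


def render_chatml(messages, add_generation_prompt=False):
    """Mirror chat_template.jinja: ChatML turns, with a default system prompt."""
    parts = []
    # The template adds the default system turn only on the first loop
    # iteration, so an empty message list produces no system turn.
    if messages and messages[0]["role"] != "system":
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chatml([{"role": "user", "content": "hi"}], add_generation_prompt=True))
```

Passing an explicit system message, as the README usage example does, replaces the default SmolLM identity prompt.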
--- a/config.json
+++ b/config.json
@@ -0,0 +1,48 @@
+{
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 1,
+  "dtype": "bfloat16",
+  "eos_token_id": 2,
+  "head_dim": 64,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 8192,
+  "max_position_embeddings": 131072,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 24,
+  "num_key_value_heads": 32,
+  "pad_token_id": 2,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_parameters": {
+    "beta_fast": 32,
+    "beta_slow": 1,
+    "factor": 16.0,
+    "original_max_position_embeddings": 8192,
+    "rope_theta": 130000,
+    "rope_type": "yarn",
+    "type": "yarn"
+  },
+  "tie_word_embeddings": true,
+  "transformers.js_config": {
+    "dtype": "q4",
+    "kv_cache_dtype": {
+      "fp16": "float16",
+      "q4f16": "float16"
+    },
+    "use_external_data_format": {
+      "model.onnx": true,
+      "model_fp16.onnx": true
+    }
+  },
+  "transformers_version": "5.5.4",
+  "use_cache": true,
+  "vocab_size": 49152
+}
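Two numbers in this config can be cross-checked with simple arithmetic: the YaRN-extended context (factor times original positions) and the parameter count. The sketch below assumes the standard Llama layout (bias-free q/k/v/o projections, a gate/up/down MLP, two RMSNorm weights per layer plus a final norm, and tied embeddings as the config states); `CONFIG`, `yarn_context`, and `param_count` are illustrative names holding only the fields needed.

```python
# Subset of config.json needed for the arithmetic.
CONFIG = {
    "hidden_size": 2048,
    "intermediate_size": 8192,
    "num_hidden_layers": 24,
    "num_attention_heads": 32,
    "num_key_value_heads": 32,
    "head_dim": 64,
    "vocab_size": 49152,
    "tie_word_embeddings": True,
    "rope_parameters": {"factor": 16.0, "original_max_position_embeddings": 8192},
}


def yarn_context(cfg) -> int:
    """Context length implied by YaRN: factor * original positions."""
    rope = cfg["rope_parameters"]
    return int(rope["factor"] * rope["original_max_position_embeddings"])


def param_count(cfg) -> int:
    """Parameter count under the assumed Llama layout (no biases)."""
    h, inter = cfg["hidden_size"], cfg["intermediate_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    mlp = 3 * h * inter                            # gate, up, down
    norms = 2 * h                                  # two RMSNorm weights per layer
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embed
    return embed + lm_head + cfg["num_hidden_layers"] * (attn + mlp + norms) + h


if __name__ == "__main__":
    print(yarn_context(CONFIG))         # 131072, matches max_position_embeddings
    print(param_count(CONFIG))          # 1711376384
    print(param_count(CONFIG) * 2)      # bytes at 2 bytes per bf16 weight
```

At two bytes per bf16 weight this gives 3,422,752,768 bytes, about 25 KB less than the `model.safetensors` size recorded in this commit; a small remainder like that would be consistent with file header metadata.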
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,8 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "max_length": 131072,
+  "pad_token_id": 2,
+  "transformers_version": "5.5.4"
+}
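`generation_config.json` sets `max_length` to 131072, while the README states that context is 16K in practice. A caller could enforce the validated window explicitly. The following is a minimal sketch, assuming "16K" means 16,384 tokens; `VALIDATED_CONTEXT` and `max_new_tokens_budget` are hypothetical names, not part of the repository.

```python
VALIDATED_CONTEXT = 16 * 1024  # assumption: the README's "16K" is 16,384 tokens


def max_new_tokens_budget(prompt_tokens: int, requested: int,
                          validated: int = VALIDATED_CONTEXT) -> int:
    """Cap generation so prompt + output stays inside the validated window."""
    if prompt_tokens >= validated:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens, validated context is {validated}"
        )
    return min(requested, validated - prompt_tokens)


if __name__ == "__main__":
    print(max_new_tokens_budget(prompt_tokens=100, requested=60))
    print(max_new_tokens_budget(prompt_tokens=16380, requested=60))
```

The returned value can be passed as `max_new_tokens` to `model.generate`, with `prompt_tokens` taken from the length of the tokenized prompt.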
--- a/model.safetensors
+++ b/model.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dc4423f138d60b862906803965bb43540020120b70c7836ee71d1329d7a3339c
+size 3422777952
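The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, a SHA-256 object ID, and the byte size. The sketch below parses such a pointer and checks a downloaded file against it; `parse_lfs_pointer` and `matches_pointer` are illustrative helper names, and the local file path is whatever the caller supplies.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:dc4423f138d60b862906803965bb43540020120b70c7836ee71d1329d7a3339c
size 3422777952
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines of an LFS pointer into typed fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream a file and compare its size and hash with the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    print(pointer["algo"], pointer["size"], round(pointer["size"] / 2**30, 2), "GiB")
```

Streaming in chunks keeps memory use flat, which matters for a file of roughly 3.19 GiB.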
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,18 @@
+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": "<|im_start|>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "is_local": true,
+  "model_max_length": 8192,
+  "pad_token": "<|im_end|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>",
+  "vocab_size": 49152
+}