---
language:
- en
license: apache-2.0
tags:
- transformers
- safetensors
- gguf
- peft
- qlora
- reasoning
base_model:
- Nanbeige/Nanbeige4.1-3B
library_name: transformers
pipeline_tag: text-generation
---

# EmberForge-3B-Reasoner

Private reasoning fine-tune of `Nanbeige/Nanbeige4.1-3B`, released by `strykes`.

## Included Artifacts

- Merged full model (Safetensors) at the repo root, for Hugging Face benchmarking
- LoRA adapter in `adapter/`
- GGUF files in `gguf/`:
  - `Nanbeige4.1-3B-Q5_K_M.gguf`
  - `Nanbeige4.1-3B-Q4_K_M.gguf`
  - `Nanbeige4.1-3B-f16.gguf`
- Optional archive in `archives/`

## Training Snapshot

- Base: `Nanbeige/Nanbeige4.1-3B`
- Method: Unsloth QLoRA, with the adapter merged into the base weights
- Data: ~3.5k synthetic reasoning samples
- Epochs: 2
- Sequence length: 4096 tokens

## Notes

- Intended for research and benchmarking.
- Validate outputs before any critical use.

## Benchmarks (2026-02-24)

### Local lm-eval results (this finetune)

| Task | Metric (lm-eval key) | Score |
|---|---|---:|
| mmlu | acc,none | 59.98% |
| gsm8k | exact_match,flexible-extract | 62.40% |
| arc_challenge | acc_norm,none | 31.74% |
| hellaswag | acc_norm,none | 56.07% |
| winogrande | acc,none | 50.04% |
| piqa | acc_norm,none | 63.22% |
| boolq | acc,none | 74.37% |
| truthfulqa_mc2 | acc,none | 45.34% |

### Public references

- Author-published benchmarks for the base model (`Nanbeige/Nanbeige4.1-3B`) are listed in `benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md`.
- Frontier-model references (Claude, GPT, Gemini) are included in the same comparison report.

### Reproducibility artifacts

- `benchmarks/lm-eval-2026-02-24/summary_v3.tsv`
- `benchmarks/lm-eval-2026-02-24/results_2026-02-24T00-06-21.474293.json`
- `benchmarks/lm-eval-2026-02-24/run_v3.log`
- `benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md`

### Caveat

Numbers published on public model cards are not always directly comparable to lm-evaluation-harness results: prompting, few-shot settings, decoding, and benchmark versions can all differ.
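The local benchmark table reports lm-eval scores as percentages. Below is a minimal sketch for cross-checking those rows against `results_2026-02-24T00-06-21.474293.json`. It assumes the file follows the lm-evaluation-harness 0.4.x layout: a top-level `results` object keyed by task name, with metric keys such as `acc,none` holding fractions between 0 and 1. The helpers `table_rows` and `load_rows` are hypothetical and are not part of this repo or of lm-eval.

```python
import json
from pathlib import Path

# Task -> metric key pairs, copied from the benchmark table in this card.
TABLE_METRICS = {
    "mmlu": "acc,none",
    "gsm8k": "exact_match,flexible-extract",
    "arc_challenge": "acc_norm,none",
    "hellaswag": "acc_norm,none",
    "winogrande": "acc,none",
    "piqa": "acc_norm,none",
    "boolq": "acc,none",
    "truthfulqa_mc2": "acc,none",
}


def table_rows(results: dict) -> list[tuple[str, str, str]]:
    """Build (task, metric, 'xx.xx%') rows from a parsed lm-eval results dict.

    Assumption: results["results"][task][metric] is a fraction in [0, 1],
    as in lm-eval 0.4.x output. Tasks or metrics missing from the file are skipped.
    """
    rows = []
    per_task = results.get("results", {})
    for task, metric in TABLE_METRICS.items():
        value = per_task.get(task, {}).get(metric)
        if value is None:
            continue
        # Convert the stored fraction to a percentage with two decimals.
        rows.append((task, metric, f"{100 * value:.2f}%"))
    return rows


def load_rows(path: str) -> list[tuple[str, str, str]]:
    """Read a results JSON file from disk and return the table rows."""
    with open(path, encoding="utf-8") as f:
        return table_rows(json.load(f))


if __name__ == "__main__":
    path = "benchmarks/lm-eval-2026-02-24/results_2026-02-24T00-06-21.474293.json"
    # Only run when the results file is present in the working directory.
    if Path(path).is_file():
        for task, metric, score in load_rows(path):
            print(f"| {task} | {metric} | {score} |")
```

Running it from the repo root prints Markdown table rows that can be diffed against the table in this card. If the harness version used for `run_v3.log` stores metrics under a different layout, only `table_rows` needs to change.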