ModelHub XC 7c36fbd792 Initialize project; model provided by the ModelHub XC community

Model: strykes/emberforge-3b-reasoner
Source: Original Platform

2026-05-30 19:09:18 +08:00

EmberForge-3B-Reasoner

A private fine-tune of Nanbeige4.1-3B for reasoning, released by strykes.

Included Artifacts

- Merged full model (Safetensors) at the repo root, for HF benchmarking
- LoRA adapter in adapter/
- GGUF files in gguf/:
  - Nanbeige4.1-3B-Q5_K_M.gguf
  - Nanbeige4.1-3B-Q4_K_M.gguf
  - Nanbeige4.1-3B-f16.gguf
- Optional archive in archives/
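For scripted use of a local clone, the layout above can be captured in a small helper. This is a sketch only: `artifact_path` is a hypothetical name, and it assumes the clone keeps exactly the directory and file names listed here.

```python
from __future__ import annotations

from pathlib import Path

# Quantization labels taken from the gguf/ file names listed above.
GGUF_QUANTS = ("Q5_K_M", "Q4_K_M", "f16")


def artifact_path(repo_root: str | Path, kind: str, quant: str = "Q4_K_M") -> Path:
    """Expected location of an artifact inside a local clone (hypothetical helper).

    kind: "merged" -> Safetensors at the repo root
          "lora"   -> adapter/
          "gguf"   -> gguf/Nanbeige4.1-3B-<quant>.gguf
    """
    root = Path(repo_root)
    if kind == "merged":
        return root
    if kind == "lora":
        return root / "adapter"
    if kind == "gguf":
        if quant not in GGUF_QUANTS:
            raise ValueError(f"unknown quant {quant!r}; expected one of {GGUF_QUANTS}")
        return root / "gguf" / f"Nanbeige4.1-3B-{quant}.gguf"
    raise ValueError(f"unknown artifact kind {kind!r}")


print(artifact_path("emberforge-3b-reasoner", "gguf", "Q5_K_M"))
```

The merged root directory is what a transformers `from_pretrained` call would point at, and a GGUF path is what a llama.cpp-based runtime would load. The `Q4_K_M` default is an arbitrary choice for the sketch, not a recommendation from this card.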

Training Snapshot

Base: Nanbeige/Nanbeige4.1-3B
Method: Unsloth QLoRA -> merged weights
Data: ~3.5k synthetic reasoning samples
Epochs: 2
Sequence length: 4096
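To turn the snapshot into a rough training budget, the optimizer step count and a token ceiling follow from the sample count, epochs, and sequence length. The batch size and gradient-accumulation values below are hypothetical (this card does not state them), the function names are illustrative, and the token figure is only an upper bound under the assumption that samples were truncated at 4096 tokens.

```python
import math

# Values from the training snapshot above ("~3.5k" is approximate).
NUM_SAMPLES = 3_500
EPOCHS = 2
MAX_SEQ_LEN = 4096


def optimizer_steps(num_samples: int, epochs: int, per_device_batch: int,
                    grad_accum: int, num_devices: int = 1) -> int:
    """Rough optimizer-step count; a partial final batch is counted as one step.

    Real trainers may round differently, so treat this as an estimate.
    """
    effective_batch = per_device_batch * grad_accum * num_devices
    return math.ceil(num_samples / effective_batch) * epochs


def max_training_tokens(num_samples: int, epochs: int, max_seq_len: int) -> int:
    """Upper bound on tokens seen, assuming each sample is truncated to max_seq_len."""
    return num_samples * epochs * max_seq_len


# Hypothetical batch settings: 2 per device x 4 accumulation steps on 1 GPU.
print(optimizer_steps(NUM_SAMPLES, EPOCHS, per_device_batch=2, grad_accum=4))  # 876
print(max_training_tokens(NUM_SAMPLES, EPOCHS, MAX_SEQ_LEN))  # 28672000
```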

Notes

Intended for research and benchmarking.
Validate outputs before critical use.

Benchmarks (2026-02-24)

Local lm-eval results (this finetune)

Task	Metric	Score
mmlu	acc,none	59.98%
gsm8k	exact_match,flexible-extract	62.40%
arc_challenge	acc_norm,none	31.74%
hellaswag	acc_norm,none	56.07%
winogrande	acc,none	50.04%
piqa	acc_norm,none	63.22%
boolq	acc,none	74.37%
truthfulqa_mc2	acc,none	45.34%
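The metric names in the table are lm-eval result keys of the form `metric,filter`: `none` means no post-processing filter, and `flexible-extract` is the GSM8K answer-extraction filter. A minimal sketch for loading the table into structured data, with `parse_scores` as a hypothetical helper name:

```python
from __future__ import annotations

# The results table above, tab-separated as published.
TABLE = """\
Task\tMetric\tScore
mmlu\tacc,none\t59.98%
gsm8k\texact_match,flexible-extract\t62.40%
arc_challenge\tacc_norm,none\t31.74%
hellaswag\tacc_norm,none\t56.07%
winogrande\tacc,none\t50.04%
piqa\tacc_norm,none\t63.22%
boolq\tacc,none\t74.37%
truthfulqa_mc2\tacc,none\t45.34%
"""


def parse_scores(table: str) -> dict[str, tuple[str, float]]:
    """Map task -> (lm-eval metric key, score in percent)."""
    header, *rows = table.strip().splitlines()
    if header.split("\t") != ["Task", "Metric", "Score"]:
        raise ValueError(f"unexpected header: {header!r}")
    scores = {}
    for row in rows:
        task, metric, score = row.split("\t")
        scores[task] = (metric, float(score.rstrip("%")))
    return scores


for task, (metric, score) in parse_scores(TABLE).items():
    print(f"{task:15} {metric:30} {score:6.2f}")
```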

Public references

Benchmarks published by the authors of the base model (Nanbeige/Nanbeige4.1-3B) are listed in:
- benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md
Frontier references (Claude/GPT/Gemini) are included in the same comparison report.

Reproducibility artifacts

- benchmarks/lm-eval-2026-02-24/summary_v3.tsv
- benchmarks/lm-eval-2026-02-24/results_2026-02-24T00-06-21.474293.json
- benchmarks/lm-eval-2026-02-24/run_v3.log
- benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md
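To check the published table against the raw results file, the reported metrics can be read back out of the JSON. This sketch assumes an lm-evaluation-harness 0.4-style layout (a top-level `results` object keyed by task, holding `metric,filter` keys with values in [0, 1]); that layout has not been verified against this particular file, and `load_results` / `extract_scores` are hypothetical names. Some harness versions store group aggregates such as `mmlu` elsewhere, so missing entries are skipped rather than guessed.

```python
from __future__ import annotations

import json

# Task -> metric key, as reported in the results table.
REPORTED_METRICS = {
    "mmlu": "acc,none",
    "gsm8k": "exact_match,flexible-extract",
    "arc_challenge": "acc_norm,none",
    "hellaswag": "acc_norm,none",
    "winogrande": "acc,none",
    "piqa": "acc_norm,none",
    "boolq": "acc,none",
    "truthfulqa_mc2": "acc,none",
}


def load_results(path: str) -> dict:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def extract_scores(results: dict, metrics: dict[str, str] = REPORTED_METRICS) -> dict[str, float]:
    """Return task -> score in percent (rounded to 2 decimals), assuming the layout above."""
    per_task = results.get("results", {})
    scores = {}
    for task, key in metrics.items():
        value = per_task.get(task, {}).get(key)
        # Skip absent keys and non-numeric placeholders such as "N/A".
        if isinstance(value, (int, float)):
            scores[task] = round(100 * value, 2)
    return scores


# Usage against the checked-in file:
# scores = extract_scores(load_results(
#     "benchmarks/lm-eval-2026-02-24/results_2026-02-24T00-06-21.474293.json"))
```

Any task missing from the returned dict suggests the file's layout differs from the assumption above.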

Caveat

Public model-card comparisons are not always apples-to-apples with lm-evaluation-harness runs: prompt formats, few-shot counts, decoding settings, and benchmark versions can differ.
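The caveat amounts to a checklist: two scores are only directly comparable when every setting behind them matches. A minimal sketch of that check, using a hypothetical `EvalSetting` record whose field names are illustrative rather than an lm-eval schema:

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EvalSetting:
    """Hypothetical record of the settings behind one benchmark score."""
    task: str
    metric: str
    num_fewshot: int
    prompt_format: str
    decoding: str
    benchmark_version: str


def mismatched_settings(a: EvalSetting, b: EvalSetting) -> list[str]:
    """Names of the settings that differ; an empty list means like-for-like."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Illustrative values only; these are not the settings used in this card's runs.
local = EvalSetting("gsm8k", "exact_match", 5, "lm-eval default", "greedy", "v1")
reference = EvalSetting("gsm8k", "exact_match", 8, "chat template", "greedy", "v1")
print(mismatched_settings(local, reference))  # ['num_fewshot', 'prompt_format']
```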
