Initialize project; model provided by the ModelHub XC community

Model: strykes/emberforge-3b-reasoner Source: Original Platform
2026-05-30 19:09:18 +08:00
commit 7c36fbd792
28 changed files with 5552 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,76 @@
+---
+language:
+- en
+license: apache-2.0
+tags:
+- transformers
+- safetensors
+- gguf
+- peft
+- qlora
+- reasoning
+base_model:
+- Nanbeige/Nanbeige4.1-3B
+library_name: transformers
+pipeline_tag: text-generation
+---
+
+# EmberForge-3B-Reasoner
+
+A private reasoning fine-tune of Nanbeige4.1-3B, released by `strykes`.
+
+## Included Artifacts
+
+- Merged full model (Safetensors) at the repo root, for benchmarking with Hugging Face (HF) tooling
+- LoRA adapter in `adapter/`
+- GGUF in `gguf/`:
+  - `Nanbeige4.1-3B-Q5_K_M.gguf`
+  - `Nanbeige4.1-3B-Q4_K_M.gguf`
+  - `Nanbeige4.1-3B-f16.gguf`
+- Optional archive in `archives/`
+
+## Training Snapshot
+
+- Base: `Nanbeige/Nanbeige4.1-3B`
+- Method: QLoRA fine-tuning with Unsloth, followed by merging into full weights
+- Data: ~3.5k synthetic reasoning samples
+- Epochs: 2
+- Sequence length: 4096
+
+## Notes
+
+- Intended for research and benchmarking.
+- Validate outputs before critical use.
+
+## Benchmarks (2026-02-24)
+
+### Local lm-eval results (this finetune)
+
+| Task | Metric | Score |
+|---|---:|---:|
+| mmlu | acc,none | 59.98% |
+| gsm8k | exact_match,flexible-extract | 62.40% |
+| arc_challenge | acc_norm,none | 31.74% |
+| hellaswag | acc_norm,none | 56.07% |
+| winogrande | acc,none | 50.04% |
+| piqa | acc_norm,none | 63.22% |
+| boolq | acc,none | 74.37% |
+| truthfulqa_mc2 | acc,none | 45.34% |
+
+### Public references
+
+- Author-published benchmarks for the base model (`Nanbeige/Nanbeige4.1-3B`) are listed in:
+  - `benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md`
+- Frontier references (Claude/GPT/Gemini) are included in the same comparison report.
+
+### Reproducibility artifacts
+
+- `benchmarks/lm-eval-2026-02-24/summary_v3.tsv`
+- `benchmarks/lm-eval-2026-02-24/results_2026-02-24T00-06-21.474293.json`
+- `benchmarks/lm-eval-2026-02-24/run_v3.log`
+- `benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md`
+
+### Caveat
+
+Scores published on public model cards are not always directly comparable with lm-evaluation-harness results: prompting, few-shot count, decoding settings, and benchmark versions can differ.
+