Initialize project; model provided by the ModelHub XC community
Model: strykes/emberforge-3b-reasoner Source: Original Platform
76
README.md
Normal file
@@ -0,0 +1,76 @@
---
language:
- en
license: apache-2.0
tags:
- transformers
- safetensors
- gguf
- peft
- qlora
- reasoning
base_model:
- Nanbeige/Nanbeige4.1-3B
library_name: transformers
pipeline_tag: text-generation
---

# EmberForge-3B-Reasoner

A private fine-tune of Nanbeige4.1-3B for reasoning, released by `strykes`.

## Included Artifacts

- Merged full model (Safetensors) at repo root for HF benchmarking
- LoRA adapter in `adapter/`
- GGUF in `gguf/`:
  - `Nanbeige4.1-3B-Q5_K_M.gguf`
  - `Nanbeige4.1-3B-Q4_K_M.gguf`
  - `Nanbeige4.1-3B-f16.gguf`
- Optional archive in `archives/`

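The artifacts listed above could be consumed roughly as follows. This is a minimal sketch, not a documented workflow for this release: the repo id is taken from the commit header, the helper names `gguf_path` and `load_merged_model` are hypothetical, and it assumes `transformers` is installed and that you have access to the repository (the release is described as private, so authentication may be required). Whether this architecture also needs `trust_remote_code=True` is not stated here.

```python
from pathlib import PurePosixPath

# Repo id as shown in the commit header; a mirror may host it under another name.
REPO_ID = "strykes/emberforge-3b-reasoner"

# GGUF file names exactly as listed in this README, keyed by quantization label.
GGUF_FILES = {
    "Q5_K_M": "Nanbeige4.1-3B-Q5_K_M.gguf",
    "Q4_K_M": "Nanbeige4.1-3B-Q4_K_M.gguf",
    "f16": "Nanbeige4.1-3B-f16.gguf",
}


def gguf_path(quant: str) -> str:
    """Return the in-repo path of a GGUF file (hypothetical helper)."""
    return str(PurePosixPath("gguf") / GGUF_FILES[quant])


def load_merged_model(repo_id: str = REPO_ID):
    """Load the merged Safetensors model stored at the repo root.

    transformers is imported lazily so gguf_path() works without it installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return tokenizer, model
```

Calling `gguf_path("Q4_K_M")` gives the path to hand to a GGUF-capable runtime, while `load_merged_model()` is the route for Hugging Face based benchmarking.
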
## Training Snapshot

- Base: `Nanbeige/Nanbeige4.1-3B`
- Method: QLoRA fine-tuning with Unsloth, with the adapter then merged into the base weights
- Data: ~3.5k synthetic reasoning samples
- Epochs: 2
- Sequence length: 4096

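To get a feel for the scale of this run, the figures above can be turned into rough step and token counts. This is only back-of-the-envelope arithmetic: the effective batch size is not given in this README, so `ASSUMED_EFFECTIVE_BATCH` is a placeholder assumption, and the token count is an upper bound that presumes every sample fills the full sequence length.

```python
import math

# Values stated in the Training Snapshot.
NUM_SAMPLES = 3500   # "~3.5k", so treat this as approximate
EPOCHS = 2
MAX_SEQ_LEN = 4096

# ASSUMPTION: the effective batch size is not stated in this README.
ASSUMED_EFFECTIVE_BATCH = 16


def optimizer_steps(samples: int, epochs: int, batch: int) -> int:
    """Steps per epoch (keeping the last partial batch) times the epoch count."""
    return math.ceil(samples / batch) * epochs


def max_tokens_seen(samples: int, epochs: int, seq_len: int) -> int:
    """Upper bound on tokens processed if every sample filled the context."""
    return samples * seq_len * epochs


print(optimizer_steps(NUM_SAMPLES, EPOCHS, ASSUMED_EFFECTIVE_BATCH))  # 438
print(max_tokens_seen(NUM_SAMPLES, EPOCHS, MAX_SEQ_LEN))              # 28672000
```

Under these assumptions the run is small: a few hundred optimizer steps and at most about 28.7M tokens across both epochs.
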
## Notes

- Intended for research and benchmarking.
- Validate outputs before critical use.

## Benchmarks (2026-02-24)

### Local lm-eval results (this finetune)

| Task | Metric | Score |
|---|---:|---:|
| mmlu | acc,none | 59.98% |
| gsm8k | exact_match,flexible-extract | 62.40% |
| arc_challenge | acc_norm,none | 31.74% |
| hellaswag | acc_norm,none | 56.07% |
| winogrande | acc,none | 50.04% |
| piqa | acc_norm,none | 63.22% |
| boolq | acc,none | 74.37% |
| truthfulqa_mc2 | acc,none | 45.34% |

### Public references

- Author-published benchmarks for the base model (`Nanbeige/Nanbeige4.1-3B`) are listed in:
  - `benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md`
- Frontier references (Claude/GPT/Gemini) are included in the same comparison report.

### Reproducibility artifacts

- `benchmarks/lm-eval-2026-02-24/summary_v3.tsv`
- `benchmarks/lm-eval-2026-02-24/results_2026-02-24T00-06-21.474293.json`
- `benchmarks/lm-eval-2026-02-24/run_v3.log`
- `benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md`

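The results JSON listed above can be turned back into table rows with a few lines of Python. The sketch below assumes the file follows the usual lm-evaluation-harness layout, with a top-level `results` object keyed by task and then by `metric,filter` (for example `acc,none`); the file in this repo has not been inspected here, and `load_scores` and `format_row` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_scores(results_path):
    """Flatten an lm-eval results file into (task, metric, filter, value) rows."""
    data = json.loads(Path(results_path).read_text(encoding="utf-8"))
    rows = []
    for task, metrics in data["results"].items():
        for key, value in metrics.items():
            # Keep numeric "metric,filter" entries; skip "alias" and stderr keys.
            if "," not in key or "stderr" in key:
                continue
            if not isinstance(value, (int, float)):
                continue
            metric, filter_name = key.split(",", 1)
            rows.append((task, metric, filter_name, value))
    return rows


def format_row(task, metric, filter_name, value):
    """Render one Markdown table row with the score as a percentage."""
    return f"| {task} | {metric},{filter_name} | {value * 100:.2f}% |"
```

For example, `for row in load_scores("benchmarks/lm-eval-2026-02-24/results_2026-02-24T00-06-21.474293.json"): print(format_row(*row))` would print every numeric metric in the file, which may include more rows than the table in this README reports.
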
### Caveat

Scores published in public model cards are not always directly comparable with these lm-evaluation-harness results: prompting, few-shot count, decoding settings, and benchmark versions can all differ.
