Initialize project, model provided by the ModelHub XC community
Model: francescofiamingo1/FF_3.13
Source: Original Platform

FF_3.13 Release Notes
=====================

Model: FF_3.13
Source: ckpt-150 (from 3-epoch repair training, Vast instance 103.177.249.208:33448)
Source path: /workspace/ff311_repair_output/checkpoint-150
Status: CHAMPION (supersedes FF_3.11 as the primary FF-LLM release)
Date: 2026-04-17

Architecture
------------
GPT-2 decoder-only, 2.02B parameters
n_layer=38, d_model=2048, n_heads=16, n_inner=8192, context=2048
Vocabulary: GPT-2 BPE, 50257 tokens
Precision: bf16

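The 2.02B figure can be cross-checked against the listed dimensions. This sketch assumes the standard Hugging Face GPT-2 layout:

- learned position embeddings of length `context`;
- biased linear projections;
- LayerNorms with weight and bias;
- an LM head tied to the token embeddings, so it is counted once.

`n_heads` does not affect the count. The function name `gpt2_param_count` is illustrative.

```python
# Parameter count for a GPT-2-style decoder, ASSUMING the standard Hugging Face
# GPT2 layout: learned position embeddings, biased projections, LayerNorm with
# weight and bias, and an LM head tied to the token embeddings (counted once).


def gpt2_param_count(n_layer: int, d_model: int, n_inner: int,
                     vocab_size: int, n_positions: int) -> int:
    """Return the number of trainable parameters of the assumed layout."""
    attn = d_model * 3 * d_model + 3 * d_model   # fused QKV projection + bias
    attn += d_model * d_model + d_model          # attention output projection + bias
    mlp = d_model * n_inner + n_inner            # MLP up-projection + bias
    mlp += n_inner * d_model + d_model           # MLP down-projection + bias
    norms = 2 * (2 * d_model)                    # ln_1 and ln_2 (weight + bias each)
    per_block = attn + mlp + norms

    embeddings = vocab_size * d_model + n_positions * d_model  # wte + wpe
    final_norm = 2 * d_model                                   # ln_f
    return n_layer * per_block + embeddings + final_norm


if __name__ == "__main__":
    total = gpt2_param_count(n_layer=38, d_model=2048, n_inner=8192,
                             vocab_size=50257, n_positions=2048)
    print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under those assumptions the script reports 2,020,739,072 parameters, which rounds to the stated 2.02B. The same formula gives 124,439,808 for the original GPT-2 small configuration (12 layers, d_model=768).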
Training summary
----------------
Base: FF_3.11 / mix07v4_0.2 (SLERP merge of FF_3.1 and surgical FT, t=0.20)
Hardware: 8x RTX 5090 (DeepSpeed ZeRO-2)
Dataset: 15,205 train + 801 val examples (A=10714 MCQ / B=929 factual / D=3562 numeric)
Hyperparams: lr=2.5e-6 (cosine), warmup=0.05, bf16, gradient_checkpointing
Epochs: 3 scheduled (357 steps); early-stopped at step 200 by the no-improvement rule
Selected step: 150

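The base model is described as a SLERP merge with t=0.20. The sketch below illustrates spherical linear interpolation on a pair of weights: it interpolates two flattened vectors along the arc between them.

Several details are assumptions, not stated in these notes:

- the merge was applied tensor by tensor;
- t=0 corresponds to FF_3.1 and t=1 to the surgical FT;
- the near-parallel fallback threshold.

The name `slerp` is illustrative and not taken from whatever tool produced mix07v4_0.2.

```python
import math
from typing import List, Sequence


def slerp(a: Sequence[float], b: Sequence[float], t: float,
          eps: float = 1e-6) -> List[float]:
    """Spherical linear interpolation between two flattened weight vectors.

    t=0 returns a, t=1 returns b. The angle is measured between the normalized
    vectors, while the weights are applied to the raw vectors. Falls back to
    linear interpolation when the vectors are (nearly) parallel.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_omega = sum(x * y for x, y in zip(a, b)) / max(norm_a * norm_b, eps)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard acos against rounding error
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Nearly colinear: the SLERP weights are ill-conditioned, so use LERP.
        return [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    w_a = math.sin((1.0 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]
```

Suppose `ff31` and `surgical_ft` are hypothetical dicts of flattened per-tensor weights. A merge in this style would then be `{name: slerp(ff31[name], surgical_ft[name], 0.20) for name in ff31}`. Real merge tools operate on framework tensors rather than Python lists.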
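The hyperparameters give a peak learning rate of 2.5e-6 with a cosine schedule, warmup=0.05, and 357 scheduled steps. The sketch below traces the resulting learning-rate curve under these assumptions:

- warmup=0.05 is a ratio of total steps, rounded up, which gives 18 warmup steps;
- warmup is linear from zero;
- decay is cosine, reaching zero at step 357.

That is the shape of Hugging Face's cosine-with-warmup schedule at default settings; whether this run used exactly that scheduler is not stated. `lr_at_step` is a hypothetical name.

```python
import math


def lr_at_step(step: int, peak_lr: float = 2.5e-6, total_steps: int = 357,
               warmup_ratio: float = 0.05) -> float:
    """Linear warmup, then cosine decay to zero (ASSUMED schedule shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # 18 for 357 steps
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))


if __name__ == "__main__":
    for step in (0, 18, 150, 200, 357):
        print(f"step {step:3d}: lr = {lr_at_step(step):.3e}")
```

Training stopped at step 200, so under this assumed schedule the learning rate never reached the end of its cosine decay.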
Main benchmark notes
--------------------
MMLU full (lm-eval-harness v0.4.11): 28.05%
- vs FF_3.11 baseline (25.20%): +2.85pp
- vs FF_3.1 baseline (26.72%): +1.33pp
- social sciences: 30.74% / stem: 29.34% / other: 26.84% / humanities: 24.48%

106-bench total (greedy, rep_penalty=1.0): 74.5%
- arith: 100.0% (vs FF_3.11 80.0%)
- science: 84.0% (vs FF_3.11 80.0%)
- geo: 64.0% (vs FF_3.11 56.0%)
- person: 88.0% (tied with FF_3.11)
- format: 56.0% (vs FF_3.11 72.0%; known regression)

Strengths
---------
- arithmetic / science / geo factual recall
- previously damaged MMLU domains (prof_medicine 43.38%, hs_statistics 38.43%, security 39.59%,
  hs_macroeconomics 34.62%, hs_government 32.64%, medical_genetics 26.00%)

Known gaps
----------
- Strict format compliance (-16pp vs FF_3.11 on yes/no and exact-count prompts)
- Humanities / art / entity disambiguation (e.g., Edison over-anchoring)
- Next repair round should add entity-disambiguation, humanities, arts, and invention-history examples

Rejected alternative
--------------------
ckpt-200 (MMLU 28.17%, +0.12pp over ckpt-150) was rejected:
- gain below the 0.15pp stability threshold
- degraded 5 of 6 weak domains

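The ckpt-200 decision combines two criteria stated above:

- an MMLU gain below the 0.15pp stability threshold;
- degradation on 5 of the 6 weak domains.

The sketch below encodes them as a promotion check. The names `CheckpointEval` and `should_replace` are hypothetical. So is the tolerance of at most half the weak domains degrading; the notes only say that degrading 5 of 6 counted against ckpt-200.

```python
from dataclasses import dataclass

STABILITY_THRESHOLD_PP = 0.15  # minimum MMLU gain (pp) worth switching for, per the notes
MAX_DEGRADED_FRACTION = 0.5    # HYPOTHETICAL: tolerate at most half of weak domains dropping


@dataclass
class CheckpointEval:
    name: str
    mmlu: float                     # MMLU accuracy in percent
    weak_domains_degraded: int = 0  # weak domains that dropped vs. the incumbent


def should_replace(incumbent: CheckpointEval, challenger: CheckpointEval,
                   n_weak_domains: int = 6) -> bool:
    """Return True if the challenger checkpoint should become the release."""
    gain_pp = challenger.mmlu - incumbent.mmlu
    if gain_pp < STABILITY_THRESHOLD_PP:
        return False  # gain too small to trust over run-to-run noise
    degraded_fraction = challenger.weak_domains_degraded / n_weak_domains
    return degraded_fraction <= MAX_DEGRADED_FRACTION
```

Fed the figures above, ckpt-200 fails on both counts.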
Prompt template (required)
--------------------------
### System:
You are FF-LLM, a helpful assistant.

### Instruction:
{question}

### Response:

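The required template can be filled programmatically. This sketch assumes:

- a single newline after each header line;
- one blank line between blocks;
- a newline after `### Response:`, so generation starts on a fresh line.

The notes do not specify the exact trailing whitespace, so it is worth matching whatever the repair training data used. `build_prompt` is a hypothetical helper name.

```python
SYSTEM_PROMPT = "You are FF-LLM, a helpful assistant."

# ASSUMED whitespace: one newline after each header, one blank line between blocks.
PROMPT_TEMPLATE = (
    "### System:\n"
    "{system}\n"
    "\n"
    "### Instruction:\n"
    "{question}\n"
    "\n"
    "### Response:\n"
)


def build_prompt(question: str, system: str = SYSTEM_PROMPT) -> str:
    """Fill the FF-LLM template; the model's answer follows '### Response:'."""
    # str.format does not re-parse substituted values, so braces in the
    # question are passed through literally.
    return PROMPT_TEMPLATE.format(system=system, question=question.strip())
```

For example, `build_prompt("What is the capital of France?")` yields the three blocks above, with the question in place of `{question}`.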
Decoding recommendations
------------------------
Use greedy decoding (do_sample=False, num_beams=1, top_p=1.0, top_k=0, repetition_penalty=1.0).
Sampling at temperature 0.7 underperforms greedy on factual tests (~29% vs ~34%).

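A minimal sketch of greedy generation with the Hugging Face `transformers` library. It assumes the weights load from the HuggingFace repo listed under Storage, and that the prompt has already been filled with the required template. `generate_answer` and `max_new_tokens=128` are illustrative choices, not part of the release. The heavy imports sit inside the function, so the settings dictionary can be inspected without `torch` installed.

```python
# Greedy settings from the release notes. top_p/top_k only affect sampling,
# so they are inert when do_sample=False; repetition_penalty=1.0 is a no-op.
GREEDY_KWARGS = dict(
    do_sample=False,
    num_beams=1,
    top_p=1.0,
    top_k=0,
    repetition_penalty=1.0,
)


def generate_answer(prompt: str, model_id: str = "francescofiamingo1/FF_3.13",
                    max_new_tokens: int = 128) -> str:
    """Greedy-decode a completion for an already-templated FF-LLM prompt."""
    # Imported lazily so this module loads without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 BPE has no pad token
            **GREEDY_KWARGS,
        )
    # Strip the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Recent `transformers` versions may warn that sampling flags are set while `do_sample=False`. With greedy search those flags do not change the output.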
Storage
-------
S3 primary: s3://ff-llm-datasets/ff313/final/
S3 alias: s3://ff-llm-datasets/champions/latest/
HuggingFace: francescofiamingo1/FF_3.13
Local master: C:\Users\f_fia\FF_3.13_master\ (full, incl. training artifacts)
Local infer: C:\Users\f_fia\FF_3.13_inference\ (inference-only subset)

Excluded from master
--------------------
/workspace/ff311_repair_output/checkpoint-150/global_step150/ (27 GB, DeepSpeed ZeRO-2
optimizer shards). Not preserved: these shards are needed only to resume DeepSpeed training
from this exact step, and the model weights are intact in model.safetensors. Excluded as a
conservative measure to avoid disproportionate storage cost.
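Only model.safetensors is kept, so it can be useful to confirm the file is readable and holds roughly the expected 2.02B parameters without loading any weights. A .safetensors file begins with an 8-byte little-endian unsigned integer giving the length of a JSON header. That header maps each tensor name to its dtype, shape and byte offsets, plus an optional `__metadata__` entry.

The helper names below are illustrative. The stored count may differ slightly from an architecture-level count if the file also contains non-parameter buffers or an untied output head.

```python
import json
import struct
import sys
from math import prod


def parse_safetensors_header(buf: bytes) -> dict:
    """Parse the JSON header from the leading bytes of a .safetensors file."""
    (header_len,) = struct.unpack_from("<Q", buf)  # u64, little-endian
    if len(buf) < 8 + header_len:
        raise ValueError("buffer is shorter than the declared header")
    return json.loads(buf[8:8 + header_len])


def read_safetensors_header(path: str) -> dict:
    """Read only the header of a .safetensors file (no tensor data)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        f.seek(0)
        return parse_safetensors_header(f.read(8 + header_len))


def count_params(header: dict) -> int:
    """Sum element counts over all tensors, skipping the metadata entry."""
    return sum(
        prod(info["shape"])  # prod([]) == 1 for scalar tensors
        for name, info in header.items()
        if name != "__metadata__"
    )


if __name__ == "__main__":
    header = read_safetensors_header(sys.argv[1])  # e.g. a path to model.safetensors
    n_tensors = len(header) - ("__metadata__" in header)
    print(f"{n_tensors} tensors, {count_params(header):,} parameters")
```

Pass the path to model.safetensors as the only command-line argument.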