FF_3.1 is a 2.02B-parameter, GPT-2-style decoder-only language model trained from scratch with a multi-stage pipeline that combines supervised fine-tuning, preference optimization, knowledge distillation, and instruction tuning.
Model Details
| Property | Value |
| --- | --- |
| Architecture | GPT-2 decoder-only |
| Parameters | 2.02B |
| Hidden size (d) | 2048 |
| Attention heads (h) | 16 |
| FFN size (ff) | 8192 |
| Layers (L) | 38 |
| Context length | 2048 tokens |
| Tokenizer | GPT-2 BPE (vocab size: 50,257) |
| Precision | bfloat16 |
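As a sanity check, the 2.02B figure can be reproduced from the dimensions listed above. The sketch below assumes the standard GPT-2 layout: learned positional embeddings, biases on every linear layer, two LayerNorms per block plus a final one, and an output head tied to the token embedding. The card does not state these details, so treat them as assumptions rather than a description of the actual checkpoint.

```python
def gpt2_param_count(d=2048, ff=8192, layers=38, vocab=50257, ctx=2048):
    """Parameter count for a GPT-2-style decoder (assumed layout, tied output head)."""
    attn = (d * 3 * d + 3 * d) + (d * d + d)   # fused QKV projection + output projection
    mlp = (d * ff + ff) + (ff * d + d)         # up-projection + down-projection
    norms = 2 * (2 * d)                        # two LayerNorms (scale + bias) per block
    per_layer = attn + mlp + norms
    embeddings = vocab * d + ctx * d           # token + learned positional embeddings
    final_norm = 2 * d                         # final LayerNorm (scale + bias)
    return layers * per_layer + embeddings + final_norm


total = gpt2_param_count()
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
print(f"~{total * 2 / 1e9:.2f} GB of weights in bfloat16 (2 bytes per parameter)")
```

Under these assumptions the count rounds to 2.02B, which matches the reported size; the second line is a rough lower bound on weight memory and ignores activations and the KV cache.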
Training Pipeline
FF_3.1 was trained through a 5-stage pipeline:
Pretraining: 90B tokens from a large English corpus
LoRA v4b: 10K examples for instruction-following refinement
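To give a sense of scale for the LoRA stage, the sketch below counts the trainable parameters a LoRA adapter would add. The card does not state the rank or the target modules, so the configuration here (rank 16 on the fused QKV projection of every layer) is purely hypothetical and only illustrates the arithmetic.

```python
def lora_trainable_params(d_in, d_out, rank):
    """A LoRA adapter adds A (rank x d_in) and B (d_out x rank) next to a frozen weight."""
    return rank * d_in + d_out * rank


# Hypothetical configuration: rank 16 on the fused QKV projection (2048 -> 6144)
# of each of the 38 layers. The rank and target module are NOT from the model card.
HIDDEN, LAYERS, RANK = 2048, 38, 16
per_layer = lora_trainable_params(HIDDEN, 3 * HIDDEN, RANK)
trainable = LAYERS * per_layer
print(f"{trainable:,} trainable adapter parameters")
```

With these illustrative values the adapters hold a few million parameters, a small fraction of the 2.02B base weights, which is why a 10K-example refinement stage is a plausible fit for LoRA.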
Evaluation
| Benchmark | Score |
| --- | --- |
| MMLU (5-shot) | 27.94% (+3.94 pp vs. the FF_3 baseline of 24%) |
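The reported gain is in percentage points, not a relative improvement. The short calculation below shows both, along with the margin over uniform guessing; it assumes the usual four-option multiple-choice format of MMLU, where random guessing scores 25% in expectation.

```python
ff31_mmlu = 27.94      # FF_3.1, 5-shot MMLU accuracy in percent (reported above)
ff3_mmlu = 24.0        # FF_3 baseline in percent (reported above)
chance = 100 / 4       # expected accuracy of uniform guessing over four options

gain_pp = ff31_mmlu - ff3_mmlu                 # absolute gain in percentage points
relative_gain = gain_pp / ff3_mmlu * 100       # gain relative to the baseline, in percent
margin_over_chance = ff31_mmlu - chance        # distance above random guessing, in pp

print(f"gain: {gain_pp:.2f} pp, relative: {relative_gain:.1f}%, "
      f"above chance: {margin_over_chance:.2f} pp")
```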
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("francescofiamingo1/FF_3.1", torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained("francescofiamingo1/FF_3.1")

input_text = "Explain photosynthesis in simple terms."
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
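Because the context length is 2048 tokens, the prompt and the generated tokens together need to fit in that window. The sketch below shows one way to budget for this on a plain list of token IDs; `clip_prompt_ids` is a hypothetical helper written for this example and is not part of `transformers`.

```python
CONTEXT_LENGTH = 2048  # context length from the model details


def clip_prompt_ids(input_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Keep only the most recent prompt tokens so prompt + generation fits the window."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context length")
    # Negative slicing keeps the tail; shorter prompts are returned unchanged.
    return input_ids[-budget:]


ids = list(range(3000))                      # stand-in for a long tokenized prompt
clipped = clip_prompt_ids(ids, max_new_tokens=256)
print(len(clipped))
```

Keeping the tail of the prompt is a design choice that suits chat-style inputs where the latest text matters most; for documents where the opening carries the instructions, clipping from the other end may be the better option.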
Known Limitations
Math reasoning is still weak: the model struggles with multi-step arithmetic and word problems.
Count-constrained instruction following is imprecise: the model may not reliably satisfy constraints such as "list exactly 5 items".
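Since count constraints are unreliable, callers who depend on them may want to verify the output and retry on failure. The following is a minimal sketch of such a check; the helper names and the list-item pattern are assumptions made for this example, and real outputs may need a looser or stricter pattern.

```python
import re

# Matches lines that start with "-", "*", "•", or a number followed by "." or ")".
_ITEM = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+\S")


def count_list_items(text):
    """Count lines that look like bulleted or numbered list items."""
    return sum(1 for line in text.splitlines() if _ITEM.match(line))


def satisfies_exact_count(text, expected):
    """Return True if the text contains exactly `expected` list items."""
    return count_list_items(text) == expected


sample = "Here are five fruits:\n1. Apple\n2. Pear\n3. Plum\n4. Fig\n5. Kiwi"
print(satisfies_exact_count(sample, 5))
```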
What's Next
FF_3.2 will focus on:
DPO with the UltraFeedback dataset for improved preference alignment
An improved math dataset for stronger quantitative reasoning
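For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair from summed sequence log-probabilities under the policy and a frozen reference model. It illustrates the published objective only; it is not the planned FF_3.2 training code, and `beta=0.1` is an assumed value chosen for the example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probabilities."""
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) written as softplus(-margin) in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# The policy prefers the chosen response more than the reference does,
# so the loss falls below log(2), its value when policy and reference agree.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0) < math.log(2))
```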