Model: anakin87/LFM2-2.6B-ttt-sft
Source: Original Platform
2026-06-01 14:07:20 +08:00

license: other
license_name: lfm1.0
license_link: https://www.liquid.ai/legal/lfm-license
base_model: LiquidAI/LFM2-2.6B
datasets: anakin87/tictactoe-filtered
library_name: transformers
tags: sft, tictactoe
pipeline_tag: text-generation
language: en

LFM2-2.6B-ttt-sft

Supervised Fine-Tuning checkpoint of LiquidAI/LFM2-2.6B for Tic Tac Toe.

The goal of this SFT warm-up was to teach the model the correct output format and valid move syntax, before applying Reinforcement Learning. The model is not a strong player at this stage.
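As a rough illustration of the difference between following the output format and making a valid move, here is a minimal sketch. The `<move>N</move>` tag, the 1-9 cell numbering, and the helper names `parse_move` and `is_valid_move` are assumptions made for this example only; the format actually used in training is not specified here and may differ.

```python
import re

# Hypothetical output format: the chosen cell (1-9) wrapped in <move> tags.
# The real format used for this checkpoint may be different.
MOVE_PATTERN = re.compile(r"<move>\s*([1-9])\s*</move>")


def parse_move(completion: str):
    """Return the 0-based cell index, or None if the format is not followed."""
    matches = MOVE_PATTERN.findall(completion)
    if len(matches) != 1:
        # No move tag, or more than one: treat as a format failure.
        return None
    return int(matches[0]) - 1


def is_valid_move(board, cell) -> bool:
    """A move is valid only if it parsed and targets an empty cell."""
    return cell is not None and board[cell] == " "


# Example board: X holds the top-left cell, O holds the center.
board = ["X", " ", " ",
         " ", "O", " ",
         " ", " ", " "]

for completion in ["I take the corner. <move>3</move>", "<move>1</move>", "top right"]:
    cell = parse_move(completion)
    print(repr(completion),
          "| follows format:", cell is not None,
          "| valid move:", is_valid_move(board, cell))
```

In this sketch a completion can follow the format and still be an invalid move (the second example targets an occupied cell), which is why the two are worth tracking separately.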

This is an intermediate checkpoint from 🎓 LLM RL Environments Lil Course, a hands-on course on building RL environments for Language Models, where models learn from rewards, not examples. It walks through the full process of turning a small open model into a specialist that outperforms a large proprietary one on a specific task (Tic Tac Toe). The final model is anakin87/LFM2-2.6B-mr-tictactoe.

🤗🕹️ Play against the final model

Training

  • Method: SFT with PRIME-RL
  • Dataset: anakin87/tictactoe-filtered (174 examples, ~5.5 epochs)
  • Steps: 30, batch size 32, lr 1e-5, seq_len 700
  • Hardware: NVIDIA RTX Pro 6000 96GB (~5 min)
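The epoch count follows from the other numbers in the list. A quick check, assuming the batch size counts examples per optimizer step:

```python
# Values taken from the training summary above.
dataset_size = 174  # examples in anakin87/tictactoe-filtered
steps = 30
batch_size = 32     # assumed to be examples per optimizer step

samples_seen = steps * batch_size      # total examples processed
epochs = samples_seen / dataset_size   # passes over the dataset

print(f"samples seen: {samples_seen}")
print(f"epochs: {epochs:.2f}")
```

This gives 960 examples processed, or about 5.5 passes over the 174-example dataset.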

Evaluation

100 games per setting.

| Model vs random opponent | % Wins | % Draws | % Losses | % Follows format | % Games with invalid moves |
|---|---|---|---|---|---|
| LiquidAI/LFM2-2.6B | 40 | 11 | 49 | 27.8 | 40 |
| anakin87/LFM2-2.6B-ttt-sft | 74 | 13 | 13 | 99.8 | 11 |

| Model vs optimal opponent | % Wins | % Draws | % Losses | % Follows format | % Games with invalid moves |
|---|---|---|---|---|---|
| LiquidAI/LFM2-2.6B | 0 | 11 | 89 | 24.7 | 43 |
| anakin87/LFM2-2.6B-ttt-sft | 0 | 52 | 48 | 99 | 14 |

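Format following rose from under 30% (27.8% and 24.7%) to 99% or more, and games with invalid moves dropped from 40-43% to 11-14%. Gameplay also improved as a side effect: against the random opponent, wins rose from 40% to 74%, and against the optimal opponent, draws rose from 11% to 52%.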
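To make the two evaluation settings concrete, here is a minimal sketch of a random opponent, an optimal (minimax) opponent, and a win/draw/loss tally over 100 games. This is not the evaluation code behind the numbers above: the function names, the choice of X as first mover, and the use of minimax for the optimal opponent are all assumptions, and a simple policy function stands in for the model.

```python
import random
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def empty_cells(board):
    return [i for i, v in enumerate(board) if v == " "]


def place(board, i, player):
    """Boards are 9-character strings so they can be cached."""
    return board[:i] + player + board[i + 1:]


def opponent_of(player):
    return "O" if player == "X" else "X"


@lru_cache(maxsize=None)
def minimax(board, player):
    """Score the position for the side to move: 1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    cells = empty_cells(board)
    if not cells:
        return 0
    return max(-minimax(place(board, i, player), opponent_of(player)) for i in cells)


def random_move(board, player, rng):
    return rng.choice(empty_cells(board))


def optimal_move(board, player, rng):
    # Pick the move that leaves the opponent with the worst score.
    return max(empty_cells(board),
               key=lambda i: -minimax(place(board, i, player), opponent_of(player)))


def play(x_policy, o_policy, rng):
    """Play one game (X moves first) and return 'X', 'O', or 'draw'."""
    board, player = " " * 9, "X"
    while winner(board) is None and empty_cells(board):
        policy = x_policy if player == "X" else o_policy
        board = place(board, policy(board, player, rng), player)
        player = opponent_of(player)
    return winner(board) or "draw"


def tally(x_policy, o_policy, games=100, seed=0):
    rng = random.Random(seed)
    results = {"X": 0, "O": 0, "draw": 0}
    for _ in range(games):
        results[play(x_policy, o_policy, rng)] += 1
    return results


# A random policy stands in for the model under evaluation (as X).
print("vs random: ", tally(random_move, random_move))
print("vs optimal:", tally(random_move, optimal_move))
```

Under these assumptions, no policy can win against the minimax opponent, so a draw is the best achievable result in that setting. That is consistent with the 0% win rate in the second table, where the share of draws is the number to watch.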