2.3 KiB
license, license_name, license_link, base_model, datasets, library_name, tags, pipeline_tag, language
| license | license_name | license_link | base_model | datasets | library_name | tags | pipeline_tag | language | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| other | lfm1.0 | https://www.liquid.ai/legal/lfm-license |
|
|
transformers |
|
text-generation |
|
LFM2-2.6B-ttt-sft
Supervised Fine-Tuning checkpoint of LiquidAI/LFM2-2.6B for Tic Tac Toe.
The goal of this SFT warm-up was to teach the model the correct output format and valid move syntax, before applying Reinforcement Learning. The model is not a strong player at this stage.
This is an intermediate checkpoint from 🎓 LLM RL Environments Lil Course, a hands-on course on building RL environments for Language Models, where models learn from rewards, not examples. It walks through the full process of turning a small open model into a specialist that outperforms a large proprietary one on a specific task (Tic Tac Toe). The final model is anakin87/LFM2-2.6B-mr-tictactoe.
🤗🕹️ Play against the final model
Training
- Method: SFT with PRIME-RL
- Dataset: anakin87/tictactoe-filtered (174 examples, ~5.5 epochs)
- Steps: 30, batch size 32, lr 1e-5, seq_len 700
- Hardware: NVIDIA RTX Pro 6000 96GB (~5 min)
Evaluation
100 games per setting.
| Model vs random opponent | % Wins | % Draws | % Losses | % Follows format | % Games w invalid moves |
|---|---|---|---|---|---|
| LiquidAI/LFM2-2.6B | 40 | 11 | 49 | 27.8 | 40 |
| anakin87/LFM2-2.6B-ttt-sft | 74 | 13 | 13 | 99.8 | 11 |
| Model vs optimal opponent | % Wins | % Draws | % Losses | % Follows format | % Games w invalid moves |
| LiquidAI/LFM2-2.6B | 0 | 11 | 89 | 24.7 | 43 |
| anakin87/LFM2-2.6B-ttt-sft | 0 | 52 | 48 | 99 | 14 |
Format following jumped from <30% to 99%. Gameplay strategy improved as a side effect.