Files
chess-baguettotron/README.md
ModelHub XC 4878060359 初始化项目,由ModelHub XC社区提供模型
Model: dr3z/chess-baguettotron
Source: Original Platform
2026-05-24 00:27:32 +08:00

3.3 KiB
Raw Blame History

license, base_model, datasets, library_name
license base_model datasets library_name
apache-2.0 PleIAs/Baguettotron dr3z/chess-reasoning transformers

chess-baguettotron

This is a pet-project attempting to finetune a small language model to play chess. Given the initial board, it generates reasoning traces and the board after the next move. In combination with a Stockfish evaluation to choose the move from generated samples, it can win in the game against the Stockfish 1350 ELO.

Game example:

[Event "?"]
[Site "?"]
[Date "2026.02.25"]
[Round "?"]
[White "chess-baguettotron - 48 samples with Stockfish searchtime for 3 seconds"]
[Black "Stockfish 1350 ELO"]
[Result "1-0"]

1. c4 e5 2. Nc3 Nc6 3. g3 Bb4 4. Nf3 Bxc3 5. bxc3 d6 6. Bg2 Bd7 7. d4 Nf6 8. c5 Rc8 9. d5 Na5 10. c6 Bg4 11. c4 Bh5 12. Qa4 b6 13. Bh3 Bg6 14. Qb4 Rb8 15. Kf1 Qe7 16. Be3 h6 17. Re1 Ne4 18. Bd7+ Kf8 19. Nh4 Ra8 20. Qb5 Bh7 21. Kg2 g5 22. Nf3 a6 23. Qb2 g4 24. Qc1 b5 25. Ng1 Kg8 26. Bxh6 Qf6 27. Rf1 Nc5 28. f3 Be4 29. h4 Qxh6 30. Qe1 Nxc4 31. fxe4 Nxd7 32. Nf3 Nc5 33. Ng5 Rh7 34. Qf2 Nb6 35. Kg1 Re8 36. Qf5 Re7 37. Rh2 Rh8 38. a4 Qg7 39. Qxg4 Rh6 40. axb5 Rg6 41. Qh3 Rh6 42. bxa6 Rf6 43. a7 Rxf1+ 44. Qxf1 f6 45. Ne6 Qf7 46. Qf2 Nxe6 47. h5 Nd4 48. e3 Nxc6 49. dxc6 Na8 50. Qe2 Kh7 51. Rf2 Nb6 52. Qg4 Kh8 53. Qf5 Qg7 54. Qxf6 Kg8 55. Qxg7+ Kxg7 56. h6+ Kg8 57. h7+ Kh8 58. Kg2 Rxh7 59. Rb2 Kg7 60. Rxb6 Kf6 61. Rb8 Kg5 62. Rg8+ Kf6 63. Rd8 Ke7 64. a8=Q Kf7 65. Rf8+ Kg6 66. Qe8+ Kg5 67. Rf5+ Kh6 68. Rh5+ Kg7 69. Qd7+ Kf8 70. Rxh7 Kg8 71. Qg7# 1-0

Minimal usage:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "dr3z/chess-baguettotron"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to('cuda')


SYSTEM_PROMPT = """
You are a Chess Grandmaster.
Given a chessboard position, you must:

1. Think deeply about the justification of your move.
2. Derive the move stepbystep, explaining your reasoning internally, but **do not reveal the move until you have finished the analysis**.
3. After the analysis, play the chosen move and output **only** the resulting board state in the following format:

'''
<answer>
[board representation]
</answer>
'''

No additional commentary, explanations, or text should appear outside of the `<answer>` tags.
"""

CURRENT_STATE = """
8 | _ _ _ _ _ _ _ _ |
7 | _ _ p _ _ k _ p |
6 | _ _ _ p _ n p _ |
5 | _ _ _ P _ p _ _ |
4 | _ _ _ P _ _ _ _ |
3 | _ P _ K _ P P _ |
2 | _ _ B _ _ _ _ P |
1 | _ _ _ _ _ _ _ _ |
    a b c d e f g h
White moves
"""

inp = tokenizer.apply_chat_template([{'role':'system','content':SYSTEM_PROMPT},{'role':'user','content':CURRENT_STATE}], add_generation_prompt=1, tokenize=True, return_tensors='pt')
out = model.generate(inp.to(model.device), max_new_tokens=4500, temperature=0.5, do_sample=True, num_return_sequences=1)
out = [tokenizer.decode(c) for c in out]

for text in out:
    print(text)

Training params

SFT

  • optimizer: muon
  • learning_rate: 2e-03
  • batch_size: 14
  • gradient_accumulation_steps: 2
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 5133
  • training_steps: 31400

GRPO

  • optimizer: adamw
  • learning_rate: 5e-06
  • beta: 1e-03
  • num_generations: 16
  • training_steps: 2124