---
language:
- en
license: mpl-2.0
library_name: transformers
tags:
- buleyean-rl
- rejection-learning
- void-boundary
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
pipeline_tag: text-generation
---

# buleyean-smollm2-360m

Buleyean RL -- trained on what is NOT rather than positive reinforcement.

No reward model. No chosen examples. The training target is the complement distribution derived from rejection counts alone.

## Model Details

| Detail | Value |
|---|---|
| Base Model | HuggingFaceTB/SmolLM2-360M-Instruct |
| Parameters | 360M |
| Fine-tuning | Buleyean RL (LoRA rank 16, alpha 0.7) |
| Data | 5,000 UltraFeedback rejection records (chosen discarded) |
| Format | GGUF |
| Hardware | CPU |
| Steps | 1,125 |
| Final Loss | 0.89 |
| Optimality Gap | 0.018 |
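
For orientation, a minimal sketch of how such an adapter could be configured, assuming Hugging Face `peft`; the `target_modules` are an assumption, not taken from the actual training run:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-360M-Instruct")

# Hyperparameters from the table above; target_modules are assumed, not documented.
config = LoraConfig(
    r=16,            # LoRA rank, as reported
    lora_alpha=0.7,  # alpha as reported in the card (unusually small; taken at face value)
    target_modules=["q_proj", "v_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
```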

## What is Buleyean RL?

Given rejection counts $v_i$, the training target is the complement distribution

$$P(i) = \frac{T - v_i + 1}{\sum_j (T - v_j + 1)}$$

where $T$ is an upper bound on the counts ($T \ge \max_j v_j$), so every term stays positive.

Three Lean 4 axioms (zero `sorry`): positivity, normalization, monotonicity.
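
A minimal NumPy sketch of this distribution; taking $T$ as the maximum rejection count is an assumption (the card does not pin $T$ down):

```python
import numpy as np

def buleyean_distribution(v: np.ndarray) -> np.ndarray:
    """Complement distribution over candidates from rejection counts v."""
    T = v.max()                     # assumption: T = max rejection count
    weights = T - v + 1             # positivity: every weight >= 1
    return weights / weights.sum()  # normalization: probabilities sum to 1

# Monotonicity: the most-rejected candidate gets the least probability.
print(buleyean_distribution(np.array([0, 2, 5])))
# -> [0.54545455 0.36363636 0.09090909]
```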

Loss:

$$\mathcal{L} = 0.7 \cdot \mathrm{KL}(P_{\text{bule}} \,\|\, P_{\text{model}}) + 0.3 \cdot \text{ContrastLoss}$$
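
A hedged PyTorch sketch of this objective; the card does not define ContrastLoss, so it is passed in as an opaque term here rather than invented:

```python
import torch
import torch.nn.functional as F

def buleyean_loss(p_bule: torch.Tensor, logits: torch.Tensor,
                  contrast_term: torch.Tensor) -> torch.Tensor:
    """L = 0.7 * KL(P_bule || P_model) + 0.3 * ContrastLoss."""
    log_p_model = F.log_softmax(logits, dim=-1)
    # F.kl_div expects log-probs as input and probs as target,
    # giving KL(p_bule || p_model) with reduction="sum".
    kl = F.kl_div(log_p_model, p_bule, reduction="sum")
    return 0.7 * kl + 0.3 * contrast_term
```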

## Key Result

When prompted with "hello" (real outputs, SmolLM2-360M GGUF via llama-cpp-python):

- **Base:** hello
- **Buleyean:** I'm here to help. What's on your mind?
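
A comparison like this can be reproduced with llama-cpp-python; the GGUF filename below is an assumption, so substitute the file shipped in this repo:

```python
from llama_cpp import Llama

# Filename is an assumption; point at the GGUF in this repo.
llm = Llama(model_path="buleyean-smollm2-360m.gguf", n_ctx=2048, verbose=False)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```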

## Whitepaper

*Proof of Life: Bottling Infinity in Distributed Systems* -- φ² = φ + 1

500+ Lean 4 theorems, zero `sorry` markers. Section 15.29 covers Buleyean RL; Chapter 29 is the full treatment.
