---
language:
- en
license: mpl-2.0
library_name: transformers
tags:
- buleyean-rl
- rejection-learning
- void-boundary
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
pipeline_tag: text-generation
---

# buleyean-smollm2-360m

**Buleyean RL** -- trained on what is NOT, rather than on positive reinforcement. No reward model. No chosen examples. The complement distribution derived from rejection counts alone is the training target.

## Model Details

| | |
|---|---|
| Base Model | [HuggingFaceTB/SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) |
| Parameters | 360M |
| Fine-tuning | Buleyean RL (LoRA rank 16, alpha 0.7) |
| Data | 5,000 UltraFeedback rejection records (chosen responses discarded) |
| Format | GGUF |
| Hardware | CPU |
| Steps | 1125 |
| Final Loss | 0.89 |
| Optimality Gap | 0.018 |

## What is Buleyean RL?

`P(i) = (T - v_i + 1) / sum_j(T - v_j + 1)`

Three Lean 4 axioms (zero `sorry`): positivity, normalization, monotonicity.

Loss: `L = 0.7 * KL(P_bule || P_model) + 0.3 * ContrastLoss`

## Key Result

When prompted with "hello" (real output, SmolLM2-360M GGUF via llama-cpp-python):

- **Base**: `hello`
- **Buleyean**: `I'm here to help. What's on your mind?`

## Whitepaper

**[Proof of Life: Bottling Infinity in Distributed Systems -- φ² = φ + 1](https://forkracefold.com/)**

500+ Lean 4 theorems. Zero `sorry` markers. Section 15.29 covers Buleyean RL. Chapter 29 is the full treatment.

## Links

- [Library](https://github.com/forkjoin-ai/buleyean-rl) | [Demo](https://huggingface.co/spaces/forkjoin-ai/the-void) | [Data](https://huggingface.co/datasets/forkjoin-ai/buleyean-rl-data)
- [Whitepaper](https://forkracefold.com/) | MPL-2.0
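
## Sketch: the complement distribution

A minimal numerical sketch of the complement distribution `P(i) = (T - v_i + 1) / sum_j(T - v_j + 1)` and its three axioms. The semantics here are assumptions, not stated in this card: `v_i` is taken to be the rejection count for candidate `i`, and `T = max_j v_j`, which keeps every numerator at least 1.

```python
def buleyean_distribution(votes):
    """Map rejection counts to the complement distribution.

    Assumption: votes[i] = v_i is the rejection count for candidate i,
    and T = max(votes). Neither is pinned down by the card itself.
    """
    T = max(votes)
    weights = [T - v + 1 for v in votes]  # each weight >= 1 by choice of T
    total = sum(weights)
    return [w / total for w in weights]

# Example: three candidates with rejection counts 0, 2, 4.
votes = [0, 2, 4]
P = buleyean_distribution(votes)  # weights [5, 3, 1] -> [5/9, 3/9, 1/9]

# The three Lean 4 axioms, checked numerically rather than proved:
assert all(p > 0 for p in P)                  # positivity
assert abs(sum(P) - 1.0) < 1e-12              # normalization
assert all(P[i] >= P[j]                       # monotonicity: fewer
           for i in range(len(votes))         # rejections, more mass
           for j in range(len(votes))
           if votes[i] <= votes[j])
```

The least-rejected candidate receives the most probability mass, so this distribution can serve as a training target without any chosen examples.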