---
language:
- en
license: mpl-2.0
library_name: transformers
tags:
- buleyean-rl
- rejection-learning
- void-boundary
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
pipeline_tag: text-generation
---
# buleyean-smollm2-360m
**Buleyean RL** -- trained on what is NOT, rather than on positive reinforcement.
No reward model. No chosen examples. The training target is the complement distribution derived from rejection counts alone.
## Model Details
| | |
|---|---|
| Base Model | [HuggingFaceTB/SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) |
| Parameters | 360M |
| Fine-tuning | Buleyean RL (LoRA rank 16, alpha 0.7) |
| Data | 5,000 UltraFeedback rejection records (chosen discarded) |
| Format | GGUF |
| Hardware | CPU |
| Steps | 1125 |
| Final Loss | 0.89 |
| Optimality Gap | 0.018 |
## What is Buleyean RL?
`P(i) = (T - v_i + 1) / sum_j(T - v_j + 1)`
where `v_i` is the rejection count for candidate `i`. Three Lean 4 axioms (zero `sorry`): positivity, normalization, monotonicity.
Loss: `L = 0.7 * KL(P_bule || P_model) + 0.3 * ContrastLoss`
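The complement distribution can be sketched in a few lines of Python. One assumption here: the document does not define `T`, so this sketch takes it as the maximum rejection count observed (any `T >= max(v)` keeps every numerator positive).

```python
# Sketch of the Buleyean complement distribution:
#   P(i) = (T - v_i + 1) / sum_j (T - v_j + 1)
# Assumption (not specified in the card): T = max rejection count.

def buleyean_distribution(rejections):
    """Map per-candidate rejection counts to a probability distribution.

    Candidates rejected more often receive less mass, but never zero.
    """
    T = max(rejections)
    weights = [T - v + 1 for v in rejections]  # the +1 guarantees positivity
    total = sum(weights)
    return [w / total for w in weights]

# Rejection counts for four hypothetical candidate responses:
p = buleyean_distribution([0, 1, 3, 3])
print(p)  # most mass on the least-rejected candidate

assert abs(sum(p) - 1.0) < 1e-9      # normalization
assert all(x > 0 for x in p)         # positivity
assert p[0] > p[1] > p[2] == p[3]    # monotonicity: fewer rejections -> more mass
```

The three assertions mirror the three Lean 4 axioms named above: the distribution sums to one, every candidate keeps nonzero mass, and mass decreases as rejections increase.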
## Key Result
When prompted with "hello" (real output, SmolLM2-360M GGUF via llama-cpp-python):
- **Base**: `hello`
- **Buleyean**: `I'm here to help. What's on your mind?`
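A side-by-side comparison like the one above can be reproduced with llama-cpp-python. The GGUF filenames below are placeholders, not this repository's actual artifacts; the import is done lazily so the sketch degrades gracefully when the library or model files are absent.

```python
# Sketch: comparing base vs. Buleyean-tuned GGUF outputs via llama-cpp-python.
# "base.gguf" / "buleyean.gguf" are placeholder paths -- substitute real files.
import os


def generate(model_path: str, prompt: str) -> str:
    from llama_cpp import Llama  # lazy import: optional dependency

    llm = Llama(model_path=model_path, n_ctx=512, verbose=False)
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=64,
        temperature=0.0,  # greedy decoding, for a reproducible comparison
    )
    return out["choices"][0]["message"]["content"]


for path in ("base.gguf", "buleyean.gguf"):  # placeholder filenames
    if os.path.exists(path):
        print(path, "->", generate(path, "hello"))
```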
## Whitepaper
**[Proof of Life: Bottling Infinity in Distributed Systems -- φ² = φ + 1](https://forkracefold.com/)**
500+ Lean 4 theorems with zero `sorry` markers. Section 15.29 covers Buleyean RL; Chapter 29 gives the full treatment.
## Links
- [Library](https://github.com/forkjoin-ai/buleyean-rl) | [Demo](https://huggingface.co/spaces/forkjoin-ai/the-void) | [Data](https://huggingface.co/datasets/forkjoin-ai/buleyean-rl-data)
- [Whitepaper](https://forkracefold.com/) | MPL-2.0