---
language:
- en
license: mpl-2.0
library_name: transformers
tags:
- buleyean-rl
- rejection-learning
- void-boundary
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
pipeline_tag: text-generation
---
# buleyean-smollm2-360m

**Buleyean RL**: trained on what is NOT (rejected responses) rather than on positive reinforcement.

No reward model. No chosen examples. The training target is the complement distribution derived from rejection counts alone.
## Model Details

| Property | Value |
|---|---|
| Base Model | [HuggingFaceTB/SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) |
| Parameters | 360M |
| Fine-tuning | Buleyean RL (LoRA rank 16, alpha 0.7) |
| Data | 5,000 UltraFeedback rejection records (chosen responses discarded) |
| Format | GGUF |
| Hardware | CPU |
| Steps | 1,125 |
| Final Loss | 0.89 |
| Optimality Gap | 0.018 |
## What is Buleyean RL?

Buleyean RL trains toward the complement distribution:

`P(i) = (T - v_i + 1) / sum_j(T - v_j + 1)`

Three Lean 4 axioms, with zero `sorry` markers: positivity, normalization, and monotonicity.
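A minimal Python sketch of the complement distribution, assuming `v_i` is the number of times candidate `i` was rejected and `T` is the total number of rounds, so that `0 <= v_i <= T` (the card does not define either symbol, so treat this reading as an assumption). The function name `buleyean_distribution` is hypothetical. Exact fractions keep the normalization check exact; the assertions spot-check positivity, normalization, and monotonicity on one example rather than proving them.

```python
from fractions import Fraction


def buleyean_distribution(rejections, total_rounds):
    """Complement distribution P(i) = (T - v_i + 1) / sum_j (T - v_j + 1).

    Assumption (not stated in the card): rejections[i] is v_i, the number of
    times candidate i was rejected, and total_rounds is T, an upper bound on
    every v_i.
    """
    if any(v < 0 or v > total_rounds for v in rejections):
        raise ValueError("each rejection count must satisfy 0 <= v_i <= T")
    # Each weight is at least 1 because v_i <= T, which is what gives positivity.
    weights = [total_rounds - v + 1 for v in rejections]
    total = sum(weights)
    # Fractions avoid floating-point rounding in the normalization check.
    return [Fraction(w, total) for w in weights]


if __name__ == "__main__":
    counts = [0, 2, 5]  # hypothetical rejection counts for three candidates
    p = buleyean_distribution(counts, total_rounds=5)
    print([str(x) for x in p])
    assert all(x > 0 for x in p)  # positivity
    assert sum(p) == 1  # normalization
    # monotonicity: a candidate rejected no more often never gets less mass
    assert all(
        p[i] >= p[j]
        for i in range(len(p))
        for j in range(len(p))
        if counts[i] <= counts[j]
    )
```

For counts `[0, 2, 5]` with `T = 5` the weights are `6, 4, 1`, giving `6/11, 4/11, 1/11`: the most-rejected candidate keeps a small but nonzero share.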
Loss: `L = 0.7 * KL(P_bule || P_model) + 0.3 * ContrastLoss`
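The loss can be sketched the same way in plain Python. This assumes `P_bule` and `P_model` are probability lists over the same candidates and treats `ContrastLoss` as a precomputed scalar, since the card does not say how the contrastive term is computed; `kl_divergence` and `buleyean_loss` are hypothetical names, and KL is taken in nats. In actual training these values would presumably be tensors with gradients rather than Python floats.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute 0 by convention."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i <= 0:
                return math.inf  # q puts zero mass where p does not
            total += p_i * math.log(p_i / q_i)
    return total


def buleyean_loss(p_bule, p_model, contrast_loss, kl_weight=0.7, contrast_weight=0.3):
    """L = 0.7 * KL(P_bule || P_model) + 0.3 * ContrastLoss.

    The 0.7 / 0.3 weights come from the card; contrast_loss is an assumed
    precomputed scalar because the card does not define ContrastLoss.
    """
    return kl_weight * kl_divergence(p_bule, p_model) + contrast_weight * contrast_loss


if __name__ == "__main__":
    target = [6 / 11, 4 / 11, 1 / 11]  # P(i) for counts [0, 2, 5] with T = 5
    model = [0.5, 0.3, 0.2]  # hypothetical model probabilities
    print(buleyean_loss(target, model, contrast_loss=0.4))
```

When the model already matches `P_bule`, the KL term is zero and only the weighted contrastive term remains.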
## Key Result

Outputs for the prompt "hello" (real generations from SmolLM2-360M GGUF via llama-cpp-python):

- **Base**: `hello`
- **Buleyean**: `I'm here to help. What's on your mind?`
## Whitepaper

**[Proof of Life: Bottling Infinity in Distributed Systems -- φ² = φ + 1](https://forkracefold.com/)**

500+ Lean 4 theorems with zero `sorry` markers. Section 15.29 covers Buleyean RL; Chapter 29 gives the full treatment.
## Links

- [Library](https://github.com/forkjoin-ai/buleyean-rl) | [Demo](https://huggingface.co/spaces/forkjoin-ai/the-void) | [Data](https://huggingface.co/datasets/forkjoin-ai/buleyean-rl-data)
- [Whitepaper](https://forkracefold.com/) | License: MPL-2.0