---
language: en
license: mpl-2.0
library_name: transformers
tags:
- buleyean-rl
- rejection-learning
- void-boundary
base_model: Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
---

# buleyean-qwen2.5-0.5b

Buleyean RL -- training on what is NOT chosen, rather than on positive reinforcement.

No reward model. No chosen examples. The complement distribution derived from rejection counts alone is the training target.

## Model Details

| Detail | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Parameters | 500M |
| Fine-tuning | Buleyean RL (LoRA rank 16, alpha 0.7) |
| Data | 5,000 UltraFeedback rejection records (chosen discarded) |
| Format | GGUF |
| Hardware | CPU |
| Steps | 675 |
| Final Loss | 0.96 |
| Optimality Gap | 0.021 |

## What is Buleyean RL?

The training target is the complement distribution

P(i) = (T - v_i + 1) / sum_j(T - v_j + 1)

where v_i is the rejection count for candidate i and T bounds the counts; the +1 keeps every candidate's probability strictly positive.
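As a minimal sketch in pure Python (treating `T` as an upper bound on the rejection counts, an assumption the card does not spell out):

```python
def complement_distribution(v, T):
    """P(i) = (T - v_i + 1) / sum_j (T - v_j + 1).

    v: rejection counts per candidate (each v_i <= T).
    The +1 smoothing keeps even the most-rejected candidate at
    nonzero probability, and fewer rejections mean more mass.
    """
    weights = [T - vi + 1 for vi in v]
    total = sum(weights)
    return [w / total for w in weights]

# Example: three candidates rejected 4, 1, and 0 times with T = 5
# gives weights [2, 5, 6] and probabilities [2/13, 5/13, 6/13].
probs = complement_distribution([4, 1, 0], T=5)
```

The least-rejected candidate ends up with the most probability mass, which is exactly the monotonicity property claimed below.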

Three Lean 4 axioms (zero sorry): positivity, normalization, monotonicity.
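The three properties can be stated roughly as follows. This is a hedged sketch only: the names, the `Fin n`/`ℚ` formalization, and the statement shapes are assumptions, and proofs are elided with `sorry` here (the whitepaper claims fully proved versions):

```lean
-- Sketch (assumed formalization) of the complement distribution
-- P i = (T - v i + 1) / ∑ j, (T - v j + 1), with v i ≤ T.
variable {n : ℕ} (T : ℕ) (v : Fin n → ℕ)

noncomputable def P (i : Fin n) : ℚ :=
  ((T - v i + 1 : ℕ) : ℚ) / ∑ j, ((T - v j + 1 : ℕ) : ℚ)

-- positivity: every candidate keeps nonzero mass
example [NeZero n] (i : Fin n) : 0 < P T v i := sorry
-- normalization: the masses sum to one
example [NeZero n] : ∑ i, P T v i = 1 := sorry
-- monotonicity: more rejections never means more mass
example (i j : Fin n) (h : v i ≤ v j) : P T v j ≤ P T v i := sorry
```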

Loss: L = 0.7 * KL(P_bule || P_model) + 0.3 * ContrastLoss
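Under the stated weights, the loss can be sketched as below. The contrastive term is not specified in the card, so it is left as a placeholder argument:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def buleyean_loss(p_bule, p_model, contrast_loss, w_kl=0.7, w_con=0.3):
    """L = 0.7 * KL(P_bule || P_model) + 0.3 * ContrastLoss.

    `contrast_loss` stands in for the card's unspecified contrastive term.
    """
    return w_kl * kl_divergence(p_bule, p_model) + w_con * contrast_loss

# A model that matches the target distribution exactly contributes zero KL,
# so the loss reduces to 0.3 * ContrastLoss.
loss = buleyean_loss([0.2, 0.3, 0.5], [0.2, 0.3, 0.5], contrast_loss=0.0)
```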

## Key Result

When prompted with "hello" (real output from the SmolLM2-360M GGUF, run via llama-cpp-python):

  • Base: hello
  • Buleyean: I'm here to help. What's on your mind?

## Whitepaper

*Proof of Life: Bottling Infinity in Distributed Systems* -- φ² = φ + 1

500+ Lean 4 theorems. Zero sorry markers. Section 15.29 covers Buleyean RL. Chapter 29 is the full treatment.

## Description

Model synced from source: forkjoin-ai/buleyean-qwen2.5-0.5b