
---
language:
- en
license: mpl-2.0
library_name: transformers
tags:
- buleyean-rl
- rejection-learning
- void-boundary
base_model: Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
---

# buleyean-qwen2.5-0.5b

Buleyean RL -- a model trained on what is *not*, rather than on positive reinforcement.

No reward model. No chosen examples. The training target is the complement distribution derived from rejection counts alone.

## Model Details

| Field | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Parameters | 500M |
| Fine-tuning | Buleyean RL (LoRA rank 16, alpha 0.7) |
| Data | 5,000 UltraFeedback rejection records (chosen responses discarded) |
| Format | GGUF |
| Hardware | CPU |
| Steps | 675 |
| Final Loss | 0.96 |
| Optimality Gap | 0.021 |

## What is Buleyean RL?

```
P(i) = (T - v_i + 1) / Σ_j (T - v_j + 1)
```
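A minimal sketch of this distribution in NumPy. The interpretation of `v_i` as a per-response rejection count and `T` as an upper bound on those counts is an assumption from the surrounding text, not something the card states explicitly:

```python
import numpy as np

def complement_distribution(v, T=None):
    """P(i) = (T - v_i + 1) / sum_j (T - v_j + 1).

    v : rejection counts per candidate (assumed interpretation)
    T : upper bound on the counts; defaults to max(v) so every
        weight T - v_i + 1 stays strictly positive
    """
    v = np.asarray(v, dtype=float)
    if T is None:
        T = v.max()
    w = T - v + 1.0
    return w / w.sum()

# the most-rejected candidate receives the least probability mass
p = complement_distribution([0, 1, 3])
```

By construction the result is positive, sums to one, and shrinks as rejections grow, matching the three properties the card names next.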

Three properties are proved in Lean 4 (zero `sorry`): positivity, normalization, and monotonicity.
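Spelled out against the formula above (a reconstruction from this card; the whitepaper's exact Lean statements may differ):

```
Positivity:      P(i) > 0                   for all i with v_i ≤ T
Normalization:   Σ_i P(i) = 1
Monotonicity:    v_i < v_j  ⇒  P(i) > P(j)
```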

Loss:

```
L = 0.7 * KL(P_bule || P_model) + 0.3 * ContrastLoss
```
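A hedged sketch of this objective in NumPy. The contrast term is not defined on this card, so it is taken as an externally supplied scalar here; only the 0.7/0.3 weights and the KL term come from the line above:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def buleyean_loss(p_bule, p_model, contrast_loss, w_kl=0.7, w_con=0.3):
    # contrast_loss is a placeholder scalar: the card does not define it
    return w_kl * kl_divergence(p_bule, p_model) + w_con * contrast_loss

# example: a complement-distribution target vs. a uniform model distribution
loss = buleyean_loss([0.5, 0.375, 0.125], [1/3, 1/3, 1/3], contrast_loss=0.0)
```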

## Key Result

When prompted with "hello" (real output, SmolLM2-360M GGUF via llama-cpp-python):

- Base: `hello`
- Buleyean: `I'm here to help. What's on your mind?`

## Whitepaper

*Proof of Life: Bottling Infinity in Distributed Systems* -- φ² = φ + 1

500+ Lean 4 theorems, zero `sorry` markers. Section 15.29 covers Buleyean RL; Chapter 29 is the full treatment.