
---
language:
- en
license: mpl-2.0
library_name: transformers
tags:
- buleyean-rl
- rejection-learning
- void-boundary
base_model: Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
---

# buleyean-qwen2.5-0.5b

Buleyean RL -- a model trained on what is *not*, rather than on positive reinforcement.

No reward model. No chosen examples. The training target is the complement distribution derived from rejection counts alone.

## Model Details

| Field | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Parameters | 500M |
| Fine-tuning | Buleyean RL (LoRA rank 16, alpha 0.7) |
| Data | 5,000 UltraFeedback rejection records (chosen responses discarded) |
| Format | GGUF |
| Hardware | CPU |
| Steps | 675 |
| Final Loss | 0.96 |
| Optimality Gap | 0.021 |

## What is Buleyean RL?

```
P(i) = (T - v_i + 1) / Σ_j (T - v_j + 1)
```
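A minimal sketch of this distribution in NumPy. The interpretation of `v_i` as a per-response rejection count and `T` as an upper bound on those counts is an assumption from the surrounding text, not something the card states explicitly:

```python
import numpy as np

def complement_distribution(v, T=None):
    """P(i) = (T - v_i + 1) / sum_j (T - v_j + 1).

    v : rejection counts per candidate (assumed interpretation)
    T : upper bound on the counts; defaults to max(v) so every
        weight T - v_i + 1 stays strictly positive
    """
    v = np.asarray(v, dtype=float)
    if T is None:
        T = v.max()
    w = T - v + 1.0
    return w / w.sum()

# the most-rejected candidate receives the least probability mass
p = complement_distribution([0, 1, 3])
```

By construction the result is positive, sums to one, and shrinks as rejections grow, matching the three properties the card names next.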

Three properties are proved in Lean 4 (zero `sorry`): positivity, normalization, and monotonicity.
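Spelled out against the formula above (a reconstruction from this card; the whitepaper's exact Lean statements may differ):

```
Positivity:      P(i) > 0                   for all i with v_i ≤ T
Normalization:   Σ_i P(i) = 1
Monotonicity:    v_i < v_j  ⇒  P(i) > P(j)
```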

Loss:

```
L = 0.7 * KL(P_bule || P_model) + 0.3 * ContrastLoss
```
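A hedged sketch of this objective in NumPy. The contrast term is not defined on this card, so it is taken as an externally supplied scalar here; only the 0.7/0.3 weights and the KL term come from the line above:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def buleyean_loss(p_bule, p_model, contrast_loss, w_kl=0.7, w_con=0.3):
    # contrast_loss is a placeholder scalar: the card does not define it
    return w_kl * kl_divergence(p_bule, p_model) + w_con * contrast_loss

# example: a complement-distribution target vs. a uniform model distribution
loss = buleyean_loss([0.5, 0.375, 0.125], [1/3, 1/3, 1/3], contrast_loss=0.0)
```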

## Key Result

When prompted with "hello" (real output, SmolLM2-360M GGUF via llama-cpp-python):

- Base: `hello`
- Buleyean: `I'm here to help. What's on your mind?`

## Whitepaper

*Proof of Life: Bottling Infinity in Distributed Systems* -- φ² = φ + 1

500+ Lean 4 theorems, zero `sorry` markers. Section 15.29 covers Buleyean RL; Chapter 29 is the full treatment.