---
language:
- en
license: mpl-2.0
library_name: transformers
tags:
- buleyean-rl
- rejection-learning
- void-boundary
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
pipeline_tag: text-generation
---

# buleyean-smollm2-360m

Buleyean RL -- trained on what is NOT rather than positive reinforcement.

No reward model. No chosen examples. The training target is the complement distribution derived from rejection counts alone.

## Model Details

| Detail | Value |
|---|---|
| Base Model | HuggingFaceTB/SmolLM2-360M-Instruct |
| Parameters | 360M |
| Fine-tuning | Buleyean RL (LoRA rank 16, alpha 0.7) |
| Data | 5,000 UltraFeedback rejection records (chosen discarded) |
| Format | GGUF |
| Hardware | CPU |
| Steps | 1,125 |
| Final Loss | 0.89 |
| Optimality Gap | 0.018 |
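
For orientation, a minimal sketch of how such an adapter could be configured, assuming Hugging Face `peft`; the `target_modules` are an assumption, not taken from the actual training run:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-360M-Instruct")

# Hyperparameters from the table above; target_modules are assumed, not documented.
config = LoraConfig(
    r=16,            # LoRA rank, as reported
    lora_alpha=0.7,  # alpha as reported in the card (unusually small; taken at face value)
    target_modules=["q_proj", "v_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
```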

## What is Buleyean RL?

Given rejection counts $v_i$, the training target is the complement distribution

$$P(i) = \frac{T - v_i + 1}{\sum_j (T - v_j + 1)}$$

where $T$ is an upper bound on the counts ($T \ge \max_j v_j$), so every term stays positive.

Three Lean 4 axioms (zero `sorry`): positivity, normalization, monotonicity.
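
A minimal NumPy sketch of this distribution; taking $T$ as the maximum rejection count is an assumption (the card does not pin $T$ down):

```python
import numpy as np

def buleyean_distribution(v: np.ndarray) -> np.ndarray:
    """Complement distribution over candidates from rejection counts v."""
    T = v.max()                     # assumption: T = max rejection count
    weights = T - v + 1             # positivity: every weight >= 1
    return weights / weights.sum()  # normalization: probabilities sum to 1

# Monotonicity: the most-rejected candidate gets the least probability.
print(buleyean_distribution(np.array([0, 2, 5])))
# -> [0.54545455 0.36363636 0.09090909]
```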

Loss:

$$\mathcal{L} = 0.7 \cdot \mathrm{KL}(P_{\text{bule}} \,\|\, P_{\text{model}}) + 0.3 \cdot \text{ContrastLoss}$$
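
A hedged PyTorch sketch of this objective; the card does not define ContrastLoss, so it is passed in as an opaque term here rather than invented:

```python
import torch
import torch.nn.functional as F

def buleyean_loss(p_bule: torch.Tensor, logits: torch.Tensor,
                  contrast_term: torch.Tensor) -> torch.Tensor:
    """L = 0.7 * KL(P_bule || P_model) + 0.3 * ContrastLoss."""
    log_p_model = F.log_softmax(logits, dim=-1)
    # F.kl_div expects log-probs as input and probs as target,
    # giving KL(p_bule || p_model) with reduction="sum".
    kl = F.kl_div(log_p_model, p_bule, reduction="sum")
    return 0.7 * kl + 0.3 * contrast_term
```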

## Key Result

When prompted with "hello" (real outputs, SmolLM2-360M GGUF via llama-cpp-python):

- **Base:** hello
- **Buleyean:** I'm here to help. What's on your mind?
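
A comparison like this can be reproduced with llama-cpp-python; the GGUF filename below is an assumption, so substitute the file shipped in this repo:

```python
from llama_cpp import Llama

# Filename is an assumption; point at the GGUF in this repo.
llm = Llama(model_path="buleyean-smollm2-360m.gguf", n_ctx=2048, verbose=False)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```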

## Whitepaper

*Proof of Life: Bottling Infinity in Distributed Systems* -- φ² = φ + 1

500+ Lean 4 theorems, zero `sorry` markers. Section 15.29 covers Buleyean RL; Chapter 29 is the full treatment.
