Model: xw1234gan/cnk12_GRPO_KL_Qwen2.5-3B-Instruct_beta0.01_lr1e-05_mb2_ga128_n2048_seed42
Source: Original Platform