Model synced from source: LorenaYannnnn/20260306-confidence_only-Qwen3-0.6B_OURS_cl_self_partial_192000_episodes_seed_42
Updated 2026-06-04 09:08:22 +08:00
Model synced from source: LorenaYannnnn/unsafe_compliance-Qwen3-0.6B-OURS_self-seed_2
Updated 2026-06-03 10:07:32 +08:00
Model synced from source: LorenaYannnnn/unsafe_compliance-Qwen3-0.6B-baseline_all_tokens-seed_0
Updated 2026-06-03 10:02:23 +08:00
Model synced from source: LorenaYannnnn/Qwen3-0.6B-OURS_self-g_general_reward_keep_last-100-tokens-seed_0
Updated 2026-06-01 08:22:20 +08:00
Model synced from source: LorenaYannnnn/general_reward-Qwen3-0.6B-OURS_llama-seed_0
Updated 2026-06-01 02:08:22 +08:00
Model synced from source: LorenaYannnnn/20260306-confidence_only-Qwen3-0.6B_OURS_cl_llama_partial_192000_episodes_seed_42
Updated 2026-06-01 01:56:20 +08:00
Model synced from source: LorenaYannnnn/general_reward-Qwen3-0.6B-baseline_cot_only-seed_0
Updated 2026-05-29 17:34:39 +08:00
Model synced from source: LorenaYannnnn/unsafe_compliance-Qwen3-0.6B-OURS_self-seed_0
Updated 2026-05-29 17:34:21 +08:00
Model synced from source: LorenaYannnnn/general_reward-Qwen3-0.6B-baseline_all_tokens-seed_0
Updated 2026-05-29 17:21:22 +08:00
Model synced from source: LorenaYannnnn/confidence-Qwen3-0.6B-baseline_all_tokens-seed_1
Updated 2026-05-29 17:20:40 +08:00
Model synced from source: LorenaYannnnn/Qwen3-0.6B-g_general_reward-seed_0
Updated 2026-05-29 13:47:09 +08:00
Model synced from source: LorenaYannnnn/general_reward-Qwen3-0.6B-OURS_self-seed_0
Updated 2026-05-26 12:59:23 +08:00
Model synced from source: LorenaYannnnn/20260306-confidence_only-Qwen3-0.6B_grpo_baseline_192000_episodes_seed_42
Updated 2026-05-23 22:16:28 +08:00
Model synced from source: LorenaYannnnn/Qwen3-0.6B-OURS_self-g_general_reward_e_bold_formatting_keep_last-100-tokens_w1-seed_0
Updated 2026-05-22 15:50:40 +08:00
Model synced from source: LorenaYannnnn/Qwen3-0.6B-OURS_self-g_general_reward_e_sycophancy_keep_last-100-tokens_w1_gw0_gsrcmax0-seed_0
Updated 2026-05-20 19:12:22 +08:00