Model synced from source: jackf857/llama-3-8b-base-new-dpo-harmless-4xh200-s_star1.0
Updated 2026-05-10 13:57:45 +08:00
Model synced from source: jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.85
Updated 2026-05-10 13:48:15 +08:00
Model synced from source: jackf857/llama-3-8b-base-new-dpo-hh-helpful-s_star0.4-4xh200-batch-64-20260421-214335-rerun
Updated 2026-05-10 13:35:53 +08:00
Model synced from source: jackf857/qwen3-8b-base-sft-hh-helpful-4xh200-batch-64-20260417-214452
Updated 2026-05-10 13:19:59 +08:00
Model synced from source: jackf857/qwen3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260424-013732
Updated 2026-05-10 12:41:14 +08:00
Model synced from source: jackf857/llama-3-8b-base-margin-dpo-hh-helpful-batch-64
Updated 2026-05-10 12:37:24 +08:00
Model synced from source: jackf857/llama-3-8b-base-margin-dpo-hh-harmless-beta0.01
Updated 2026-05-10 11:52:26 +08:00
Model synced from source: jackf857/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.6
Updated 2026-05-10 11:52:08 +08:00
Model synced from source: jackf857/llama-3-8b-base-r-dpo-ultrafeedback-4xH200-batch-128-rerun-2-runpod
Updated 2026-05-09 22:20:09 +08:00
Model synced from source: jackf857/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948
Updated 2026-05-09 21:17:42 +08:00
Model synced from source: jackf857/qwen-3-8b-base-r-dpo-ultrafeedback-4xH200-batch-128-rerun-2-runpod
Updated 2026-05-09 21:14:30 +08:00
Model synced from source: jackf857/llama-3-8b-base-cpo-ultrafeedback-4xH200-batch-128-rerun
Updated 2026-05-09 21:05:41 +08:00
Model synced from source: jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85
Updated 2026-05-09 20:50:42 +08:00
Model synced from source: jackf857/llama-3-8b-base-orpo-ultrafeedback-4xh200-rerun
Updated 2026-05-09 20:44:36 +08:00
Model synced from source: jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6
Updated 2026-05-09 12:28:09 +08:00