Model synced from source: W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.5-20260430-194457
Updated 2026-05-07 16:59:00 +08:00
Model synced from source: W-61/llama-3-8b-base-r-dpo-ultrafeedback-4xh200-batch-128-20260426-105614
Updated 2026-05-06 10:32:50 +08:00
Model synced from source: W-61/llama3-8b-base-new-method-s_star0.6-20260426-230653
Updated 2026-05-05 23:45:14 +08:00
Model synced from source: W-61/llama3-8b-base-new-method-s_star0.6-20260425-180936
Updated 2026-05-05 20:48:47 +08:00
Model synced from source: W-61/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.45-20260427-221551
Updated 2026-05-04 21:48:02 +08:00
Model synced from source: W-61/llama-3-8b-base-ultrachat-sft-4xh100
Updated 2026-04-27 21:03:35 +08:00
Model synced from source: W-61/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851
Updated 2026-04-26 08:53:29 +08:00
Model synced from source: W-61/llama-3-8b-base-sft-hh-helpful-4xh200
Updated 2026-04-25 14:10:53 +08:00
Model synced from source: W-61/llama-3-8b-base-sft-hh-harmless-4xh200
Updated 2026-04-25 14:10:53 +08:00
Model synced from source: W-61/llama-3-8b-base-epsilon-dpo-hh-helpful-4xh200-batch-64-20260418-001920
Updated 2026-04-25 06:50:04 +08:00
Model synced from source: W-61/llama-3-8b-base-margin-dpo-hh-harmless-8xh200
Updated 2026-04-24 12:39:10 +08:00
Model synced from source: W-61/llama-3-8b-base-sft-hh-helpful-8xh200
Updated 2026-04-24 12:05:43 +08:00
Model synced from source: W-61/llama-3-8b-base-margin-dpo-hh-helpful-8xh200
Updated 2026-04-24 11:44:12 +08:00
Model synced from source: W-61/llama-3-8b-base-margin-dpo-ultrafeedback-8xh200
Updated 2026-04-24 11:32:13 +08:00
Model synced from source: W-61/llama-3-8b-base-sft-hh-harmless-8xh200
Updated 2026-04-24 11:28:54 +08:00