M3PO-GRPO-trial1-seed123/training_config.json
ModelHub XC b1617c5169 Initialize project; model provided by the ModelHub XC community
Model: Alienpenguin10/M3PO-GRPO-trial1-seed123
Source: Original Platform
2026-05-31 19:24:13 +08:00

23 lines
517 B
JSON

{
  "num_iterations": 1,
  "num_steps": 467,
  "batch_size": 4,
  "num_generations": 4,
  "max_completion_length": 400,
  "beta": 0.005,
  "learning_rate": 5e-06,
  "mu": 1,
  "epsilon": 0.1,
  "lambda_blend": 0.1,
  "temperature_m3po": 0.1,
  "use_m3po": false,
  "gating_type": "bhattacharyya",
  "gating_warmup_steps": 50,
  "gating_lr": 0.0005,
  "gating_grad_clip": 1.0,
  "gradient_accumulation_steps": 4,
  "warmup_ratio": 0.1,
  "seed": 123,
  "trial_number": 1,
  "model_name": "Qwen/Qwen2.5-1.5B-Instruct"
}
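
A minimal sketch of how this config might be loaded and sanity-checked in Python. The helper `summarize` and the derived quantities are illustrative assumptions, not part of the repository: the sketch assumes that `batch_size` is a per-step batch multiplied by `gradient_accumulation_steps` to give the effective batch, and that `warmup_ratio` is a fraction of `num_steps`. The actual training script may interpret these fields differently.

```python
import json

# The config above, embedded verbatim so the example is self-contained.
CONFIG_TEXT = """
{
  "num_iterations": 1,
  "num_steps": 467,
  "batch_size": 4,
  "num_generations": 4,
  "max_completion_length": 400,
  "beta": 0.005,
  "learning_rate": 5e-06,
  "mu": 1,
  "epsilon": 0.1,
  "lambda_blend": 0.1,
  "temperature_m3po": 0.1,
  "use_m3po": false,
  "gating_type": "bhattacharyya",
  "gating_warmup_steps": 50,
  "gating_lr": 0.0005,
  "gating_grad_clip": 1.0,
  "gradient_accumulation_steps": 4,
  "warmup_ratio": 0.1,
  "seed": 123,
  "trial_number": 1,
  "model_name": "Qwen/Qwen2.5-1.5B-Instruct"
}
"""


def summarize(config: dict) -> dict:
    """Hypothetical helper: derive a few quantities from the raw config."""
    # Assumption: gradients are accumulated over several per-step batches.
    effective_batch_size = config["batch_size"] * config["gradient_accumulation_steps"]
    # Assumption: warmup length is a fraction of the total step count, rounded down.
    warmup_steps = int(config["warmup_ratio"] * config["num_steps"])
    return {
        "model_name": config["model_name"],
        "effective_batch_size": effective_batch_size,
        "warmup_steps": warmup_steps,
        "m3po_enabled": config["use_m3po"],
    }


config = json.loads(CONFIG_TEXT)
summary = summarize(config)
print(summary)
```

Note that `use_m3po` is `false` in this file even though the gating fields (`gating_type`, `gating_warmup_steps`, `gating_lr`, `gating_grad_clip`) are populated. Whether those fields are ignored when the flag is off depends on the training code, which is not shown here.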