ModelHub XC 7afcf517ae Initialize project; model provided by the ModelHub XC community
Model: OpenRLHF/Llama-3-8b-rlhf-100k
Source: Original Platform
2026-05-27 12:56:58 +08:00

Llama-3 8B RLHF checkpoint trained by OpenRLHF

Using the models and datasets:

Training Hyperparameters

Actor Learning Rate: 5e-7
Critic Learning Rate: 9e-6
Learning Rate Scheduler: Cosine with 0.03 warmup ratio
PPO epoch: 1
Training Batch Size: 128
Experience Buffer Size: 1024
Reward Normalization: True
Max Prompt Length: 2048
Max Response Length: 2048
Max Samples: 100k (capped to save GPU resources)
Number of Samples per Prompt: 1
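
To make the numbers above easier to relate to each other, here is a minimal Python sketch of how they might combine. It is not the OpenRLHF training code. The names `PPOConfig`, `updates_per_buffer`, `total_updates`, and `lr_at` are hypothetical, and several behaviors are assumptions: that the experience buffer is split into training-size minibatches, that a final partial buffer is dropped, that "0.03 Warmup" means a fraction of total optimizer steps, and that the cosine schedule decays to zero.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOConfig:
    """Hypothetical container; values copied from the list above."""
    actor_lr: float = 5e-7
    critic_lr: float = 9e-6
    warmup_ratio: float = 0.03  # assumption: fraction of total optimizer steps
    ppo_epochs: int = 1
    train_batch_size: int = 128
    experience_buffer_size: int = 1024
    max_samples: int = 100_000
    samples_per_prompt: int = 1


def updates_per_buffer(cfg: PPOConfig) -> int:
    # Assumption: each filled buffer is split into minibatches of
    # train_batch_size and swept ppo_epochs times.
    return cfg.ppo_epochs * cfg.experience_buffer_size // cfg.train_batch_size


def total_updates(cfg: PPOConfig) -> int:
    # Assumption: a final, partially filled buffer is dropped.
    total_samples = cfg.max_samples * cfg.samples_per_prompt
    full_buffers = total_samples // cfg.experience_buffer_size
    return full_buffers * updates_per_buffer(cfg)


def lr_at(step: int, total_steps: int, peak_lr: float, warmup_ratio: float) -> float:
    # Linear warmup to peak_lr, then cosine decay.
    # Assumption: the decay ends at zero; a real run may use a nonzero floor.
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


cfg = PPOConfig()
steps = total_updates(cfg)
print(updates_per_buffer(cfg))  # 8
print(steps)                    # 776
print(lr_at(0, steps, cfg.actor_lr, cfg.warmup_ratio))
print(lr_at(steps // 2, steps, cfg.actor_lr, cfg.warmup_ratio))
```

Under these assumptions, one buffer of 1024 experiences yields 8 optimizer updates, and 100k samples yield 97 full buffers, or 776 updates in total. The actual step count of the run may differ if the trainer handles the last partial buffer or the warmup differently.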

Evaluation

Chat-Arena-Hard
-------------------------------------------
llama-3-8b-sft                 | score: 5.6   
llama-3-8b-rlhf-100k           | score: 20.5

Training logs