Initialize project; model provided by the ModelHub XC community

Model: OpenRLHF/Llama-3-8b-rlhf-100k Source: Original Platform
2026-05-27 12:56:58 +08:00
commit 7afcf517ae
12 changed files with 413049 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,38 @@
+Llama-3 8B RLHF checkpoint trained by OpenRLHF
+
+Models and datasets used:
+
+- Base SFT model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
+- Reward model: https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-mixture
+- Prompt dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
+
+Training Hyperparameters
+
+```
+Actor Learning Rate: 5e-7
+Critic Learning Rate: 9e-6
+Learning Rate Scheduler: Cosine with 0.03 Warmup
+PPO epoch: 1
+Training Batch Size: 128
+Experience Buffer Size: 1024
+Reward Normalization: True
+Max Prompt Length: 2048
+Max Response Length: 2048
+Max Samples: 100k (to save GPU resources)
+Number of Samples per Prompt: 1
+```
+
+Evaluation
+
+```
+Chat-Arena-Hard
+-------------------------------------------
+llama-3-8b-sft                 | score: 5.6
+llama-3-8b-rlhf-100k           | score: 20.5
+```
+
+
+Training logs
+
+<img src="https://cdn-uploads.huggingface.co/production/uploads/63f6c04ac96958470d1e9043/iqwD8jBAX1vhu0PT0ycy8.png" width="800px">
+
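
As a reading aid for the training hyperparameters listed in the README above, the following is a minimal sketch of how the batch settings and the "Cosine with 0.03 Warmup" schedule might translate into optimizer steps and learning-rate values. It rests on several assumptions that the README does not state: that 0.03 is a warmup ratio over total optimizer steps, that warmup is linear from zero, that the cosine decays to zero, and that each full experience buffer of 1024 samples is consumed once in mini-batches of 128. The function names are hypothetical and this is not OpenRLHF's implementation.

```python
import math

# Values copied from the README's hyperparameter list.
ACTOR_LR = 5e-7
WARMUP_RATIO = 0.03          # assumption: fraction of total optimizer steps
TRAIN_BATCH_SIZE = 128
BUFFER_SIZE = 1024
MAX_SAMPLES = 100_000
PPO_EPOCHS = 1


def estimate_total_updates(max_samples=MAX_SAMPLES,
                           buffer_size=BUFFER_SIZE,
                           batch_size=TRAIN_BATCH_SIZE,
                           ppo_epochs=PPO_EPOCHS):
    """Rough count of actor optimizer steps (hypothetical helper).

    Assumes a trailing partial buffer is dropped and that every buffer
    is split into equal mini-batches for each PPO epoch.
    """
    rollouts = max_samples // buffer_size              # 100_000 // 1024 = 97
    updates_per_rollout = (buffer_size // batch_size) * ppo_epochs  # 8 * 1
    return rollouts * updates_per_rollout


def cosine_with_warmup(step, total_steps, peak_lr, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = estimate_total_updates()
    for step in (0, total // 4, total // 2, total):
        lr = cosine_with_warmup(step, total, ACTOR_LR)
        print(f"step {step:4d} / {total}: actor lr = {lr:.3e}")
```

Under these assumptions the run amounts to a few hundred actor updates, so the warmup phase covers only a couple of dozen steps. The same function can be called with the critic learning rate (9e-6) in place of `ACTOR_LR` if the critic uses the same schedule, which the README does not confirm.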