Initialize project; model provided by the ModelHub XC community
Model: OpenRLHF/Llama-3-8b-rlhf-100k Source: Original Platform
38
README.md
Normal file
@@ -0,0 +1,38 @@
Llama-3 8B RLHF checkpoint trained by OpenRLHF

Using the models and datasets:

- Base SFT model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
- Reward model: https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-mixture
- Prompt dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1

Training Hyperparameters

```
Actor Learning Rate: 5e-7
Critic Learning Rate: 9e-6
Learning Rate Scheduler: Cosine with 0.03 Warmup
PPO epoch: 1
Training Batch Size: 128
Experience Buffer Size: 1024
Reward Normalization: True
Max Prompt Length: 2048
Max Response Length: 2048
Max Samples: 100k (To save GPU resources)
Number of Samples per Prompt: 1
```
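
To make the schedule and batch settings above concrete, here is a minimal Python sketch. It assumes that "0.03 Warmup" means a linear warmup over the first 3% of optimizer steps, that the cosine curve decays to zero, and that one optimizer step is taken per training batch. None of these details are stated in the model card, and the helper names `total_optimizer_steps` and `cosine_lr` are hypothetical, not part of OpenRLHF.

```python
import math

# Values copied from the hyperparameter list above.
ACTOR_LR = 5e-7
CRITIC_LR = 9e-6
WARMUP_RATIO = 0.03
TRAIN_BATCH_SIZE = 128
EXPERIENCE_BUFFER_SIZE = 1024
MAX_SAMPLES = 100_000
PPO_EPOCHS = 1


def total_optimizer_steps(max_samples=MAX_SAMPLES,
                          batch_size=TRAIN_BATCH_SIZE,
                          ppo_epochs=PPO_EPOCHS):
    """Assumption: one optimizer step per full training batch per PPO epoch."""
    return (max_samples // batch_size) * ppo_epochs


def cosine_lr(step, total_steps, peak_lr, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed floor)."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Ramp up linearly; the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    steps = total_optimizer_steps()
    batches_per_buffer = EXPERIENCE_BUFFER_SIZE // TRAIN_BATCH_SIZE
    print(f"optimizer steps: {steps}, batches per buffer: {batches_per_buffer}")
    for s in (0, steps // 2, steps):
        print(s, cosine_lr(s, steps, ACTOR_LR), cosine_lr(s, steps, CRITIC_LR))
```

Under these assumptions the run has 781 optimizer steps, 23 of them warmup, and each 1024-sample experience buffer is split into 8 training batches.
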

Evaluation

```
Chat-Arena-Hard
-------------------------------------------
llama-3-8b-sft | score: 5.6
llama-3-8b-rlhf-100k | score: 20.5
```
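
As a quick reading aid for the two scores above, the following sketch computes the absolute and relative gain of the RLHF checkpoint over the SFT baseline. The helper name `score_gain` is hypothetical, and the sketch makes no assumption about how the benchmark itself computes its score.

```python
# Scores copied from the evaluation table above.
ARENA_HARD_SCORES = {
    "llama-3-8b-sft": 5.6,
    "llama-3-8b-rlhf-100k": 20.5,
}


def score_gain(baseline, tuned):
    """Return (absolute gain in points, ratio of tuned to baseline)."""
    return tuned - baseline, tuned / baseline


absolute, ratio = score_gain(
    ARENA_HARD_SCORES["llama-3-8b-sft"],
    ARENA_HARD_SCORES["llama-3-8b-rlhf-100k"],
)
print(f"absolute gain: {absolute:.1f} points, ratio: {ratio:.2f}x")
# prints "absolute gain: 14.9 points, ratio: 3.66x"
```
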

Training logs

<img src="https://cdn-uploads.huggingface.co/production/uploads/63f6c04ac96958470d1e9043/iqwD8jBAX1vhu0PT0ycy8.png" width="800px">