Llama3-SFT-v2.0-epoch1/README.md
ModelHub XC 250fac3163 Initialize project; model provided by the ModelHub XC community
Model: RLHFlow/Llama3-SFT-v2.0-epoch1
Source: Original Platform
2026-06-14 01:20:13 +08:00


library_name: transformers

This is the SFT checkpoint used for the project RLHFlow/Online-RLHF.

The model is trained from meta-llama/Meta-Llama-3-8B on RLHFlow/RLHFlow-SFT-Dataset-ver2 for 1 epoch. We use a global batch size of 128 and a learning rate of 2e-5. Samples are packed and then split into chunks of 8192 tokens. See more training details at https://github.com/RLHFlow/Online-RLHF/blob/main/sft/llama3-8b-it.yaml .
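
The packing step mentioned above can be illustrated with a minimal sketch. This is not the project's actual implementation (see the linked YAML config for that). It assumes samples are already tokenized into lists of integer IDs, that a separator token is appended after each sample, and that any trailing remainder shorter than one chunk is dropped. The function name `pack_into_chunks` and the `eos_id` parameter are hypothetical, introduced only for illustration.

```python
from typing import Iterable, List


def pack_into_chunks(
    samples: Iterable[List[int]],
    chunk_len: int = 8192,
    eos_id: int = 0,  # hypothetical separator ID; a real tokenizer defines its own
) -> List[List[int]]:
    """Concatenate tokenized samples and split the stream into fixed-length chunks."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")

    # Build one long token stream, marking each sample boundary with eos_id.
    stream: List[int] = []
    for ids in samples:
        stream.extend(ids)
        stream.append(eos_id)

    # Keep only full chunks; the trailing remainder is dropped (an assumption
    # of this sketch, a real pipeline might pad or carry it over instead).
    n_full = len(stream) // chunk_len
    return [stream[i * chunk_len:(i + 1) * chunk_len] for i in range(n_full)]


if __name__ == "__main__":
    toy_samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    for chunk in pack_into_chunks(toy_samples, chunk_len=4, eos_id=0):
        print(chunk)
```

With the toy input and `chunk_len=4`, the 12-token stream splits into three chunks of four tokens each. Packing in this way avoids spending compute on padding, at the cost of chunks that may span more than one sample.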

Citation

Please cite our technical report if you find our model useful for your research or product.

@misc{dong2024rlhf,
      title={RLHF Workflow: From Reward Modeling to Online RLHF}, 
      author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
      year={2024},
      eprint={2405.07863},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}