---
datasets:
- llama-lang-adapt/wura
---

We continually pre-train **meta-llama/Llama-2-7b-hf** on the monolingual WURA corpus, which covers **20 languages**. Languages are sampled uniformly during training.

## Important Parameters

- num_gpus: 8
- max_steps: 8000 (see [here](https://github.com/AfricanLlama/ALMA?tab=readme-ov-file#when-should-i-stop-fine-tuning-at-stage-1) for why training stops at this point)
- gradient_accumulation_steps: 16
- per_device_batch_size: 2
- learning_rate: 2e-5
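The parameters above determine how many sequences contribute to each optimizer step. The sketch below makes that arithmetic explicit and also illustrates what uniform language sampling means: every language is drawn with probability 1/20. It assumes standard data-parallel training, where each GPU processes `per_device_batch_size` sequences per micro-step. The helper names and the `lang_XX` labels are hypothetical placeholders. This is not the actual data pipeline used to train this model.

```python
import random
from collections import Counter

# Values copied from the parameter list above.
NUM_GPUS = 8
PER_DEVICE_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 16
MAX_STEPS = 8000


def effective_batch_size(num_gpus: int, per_device: int, grad_accum: int) -> int:
    """Sequences per optimizer step, assuming plain data parallelism."""
    return num_gpus * per_device * grad_accum


def sample_languages(languages: list[str], n: int, seed: int = 0) -> list[str]:
    """Hypothetical helper: draw n labels, each language with equal probability."""
    rng = random.Random(seed)
    return [rng.choice(languages) for _ in range(n)]


if __name__ == "__main__":
    ebs = effective_batch_size(NUM_GPUS, PER_DEVICE_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS)
    print("sequences per optimizer step:", ebs)  # 8 * 2 * 16 = 256
    print("sequences over max_steps:", ebs * MAX_STEPS)  # 256 * 8000 = 2048000

    # Placeholder labels; the real WURA language codes are not listed in this card.
    langs = [f"lang_{i:02d}" for i in range(20)]
    counts = Counter(sample_languages(langs, ebs * 100))
    # Under uniform sampling, each language gets roughly 1/20 of the draws.
    print("min/max draws per language:", min(counts.values()), max(counts.values()))
```

Token counts per step also depend on the sequence length. The card does not state the sequence length, so the sketch counts sequences only.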