---
datasets:
- llama-lang-adapt/wura
---

We perform continual pre-training of **meta-llama/Llama-2-7b-hf** on the monolingual WURA corpus, covering **20 languages**. Languages are sampled uniformly.
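As a rough illustration of uniform language sampling, the sketch below picks a language with equal probability at every draw, regardless of how large each corpus is. This is one plausible reading of "uniformly sampled", not the actual training data loader; the function name `uniform_language_sampler`, the language codes, and the toy corpora are all hypothetical.

```python
import random
from collections import Counter


def uniform_language_sampler(corpora, seed=0):
    """Yield (lang, document) pairs, choosing the language uniformly at random.

    `corpora` maps a language code to a list of documents. Each language has
    probability 1 / len(corpora) at every draw, independent of corpus size.
    """
    rng = random.Random(seed)
    langs = sorted(corpora)
    while True:
        lang = rng.choice(langs)              # uniform over languages
        yield lang, rng.choice(corpora[lang])  # then a document from that language


# Toy, made-up corpora of very different sizes (not WURA data).
toy_corpora = {
    "hau": [f"hau-doc-{i}" for i in range(1000)],
    "yor": [f"yor-doc-{i}" for i in range(100)],
    "zul": [f"zul-doc-{i}" for i in range(10)],
}

sampler = uniform_language_sampler(toy_corpora)
counts = Counter(next(sampler)[0] for _ in range(3000))
print(counts)  # roughly equal counts per language despite the size imbalance
```

Under this scheme, low-resource languages are seen as often as high-resource ones, at the cost of repeating their documents more frequently.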

## Important Parameters
- num_gpus: 8
- max_steps: 8000 (see the ALMA README section on [when to stop fine-tuning at stage 1](https://github.com/AfricanLlama/ALMA?tab=readme-ov-file#when-should-i-stop-fine-tuning-at-stage-1))
- gradient_accumulation_steps: 16
- per_device_batch_size: 2
- learning_rate: 2e-5
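
The parameters above imply an effective batch size, which can be worked out as follows. This is a minimal sketch that assumes standard data parallelism and that `max_steps` counts optimizer updates (as in the Hugging Face `Trainer`); the sequence length is not stated here, so token counts are left out.

```python
# Values taken from the parameter list above.
num_gpus = 8
per_device_batch_size = 2
gradient_accumulation_steps = 16
max_steps = 8000

# Sequences consumed per optimizer update across all GPUs.
sequences_per_step = num_gpus * per_device_batch_size * gradient_accumulation_steps

# Total sequences seen over the whole run (assuming max_steps = optimizer updates).
total_sequences = sequences_per_step * max_steps

print(sequences_per_step)  # 256
print(total_sequences)     # 2048000
```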