datasets: llama-lang-adapt/wura

We perform continual pre-training of meta-llama/Llama-2-7b-hf on the monolingual WURA corpus, covering 20 languages. Languages are sampled uniformly.
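One possible reading of uniform sampling is that each training example is drawn from a language chosen with equal probability, regardless of how large that language's corpus is. The sketch below illustrates only that idea. The `sample_languages` helper and the `lang_00` ... `lang_19` names are hypothetical placeholders, not the actual WURA language codes or the training pipeline used here.

```python
import random
from collections import Counter


def sample_languages(languages, num_draws, seed=0):
    """Draw language names with equal probability (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.choice(languages) for _ in range(num_draws)]


# Placeholder names standing in for the 20 languages (assumption).
languages = [f"lang_{i:02d}" for i in range(20)]

draws = sample_languages(languages, num_draws=20_000)
counts = Counter(draws)

# Each language should appear in roughly 1/20 of the draws.
for name in languages:
    print(name, counts[name])
```

Under this reading, low-resource languages are seen as often as high-resource ones, which means their smaller corpora are repeated more times over the course of training.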

Important Parameters

  • num_gpus: 8
  • max_steps: 8000 # see here
  • gradient_accumulation_steps: 16
  • per_device_batch_size: 2
  • learning_rate: 2e-5
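
The parameters above imply an effective batch size, which can be worked out with a short calculation. This is a sketch that assumes plain data parallelism, where every GPU processes its own micro-batch and gradients are accumulated before each optimizer step. The variable names mirror the list above, and `effective_batch_size` and `total_sequences` are introduced here for illustration.

```python
# Values copied from the parameter list above.
num_gpus = 8
per_device_batch_size = 2
gradient_accumulation_steps = 16
max_steps = 8000

# Sequences consumed per optimizer step, assuming plain data parallelism.
effective_batch_size = num_gpus * per_device_batch_size * gradient_accumulation_steps

# Total sequences seen over the whole run.
total_sequences = effective_batch_size * max_steps

print(effective_batch_size)  # 256
print(total_sequences)       # 2048000
```

The number of tokens seen depends on the sequence length, which is not stated here.
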
Description
Model synced from source: llama-lang-adapt/pretrain-wura