[05:25:24] 2025-08-23
[05:25:24] Tesla T4
[05:25:24] CPU usage: 95.4%, RAM usage: 23.8%
[05:25:24] Running with the following configuration:
[05:25:24] model_name: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B
[05:25:24] tokenizer: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B
[05:25:24] output_dir: /content/drive/MyDrive/llm/Discord-Hermes-3-8B
[05:25:24] train_path: /content/drive/MyDrive/data/None156_fix.csv
[05:25:24] checkpoint:
[05:25:24] lr: 3e-05
[05:25:24] lr_floor: 6e-06
[05:25:24] epochs: 1
[05:25:24] batch_size: 5
[05:25:24] accum_steps: 7
[05:25:24] val_batch_size: 6
[05:25:24] max_val_size: 100
[05:25:24] max_length: 150
[05:25:24] save_temp_frequency: 200
[05:25:24] save_frequency: 500
[05:25:24] eval_frequency: 500
[05:25:24] save_pattern: y
[05:25:24] quantization: y
[05:25:24] quantization_bits: 4
[05:25:24] lora: y
[05:25:24] frozen_lora_path: None
[05:25:24] lora_rank: 16
[05:25:24] lora_alpha: 32
[05:25:24] lora_dropout: 0.1
[05:25:24] optimizer_weight_decay: 0.0
[05:25:24] warmup_type: cosine
[05:25:24] warmup_ratio: 0.08
[05:25:24] warmup_steps: 550
[05:25:24] shuffle: y
[05:25:24] csv_column: text
[05:25:24] new_run: n
[05:25:24] label_smoothing: 0.05
[05:25:24] SEED: 1
[05:25:24] Using device: cuda
[05:28:08] LoRA configuration:
[05:28:08] task_type: TaskType.CAUSAL_LM
[05:28:08] peft_type: PeftType.LORA
[05:28:08] auto_mapping: None
[05:28:08] base_model_name_or_path: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B
[05:28:08] revision: None
[05:28:08] inference_mode: False
[05:28:08] r: 16
[05:28:08] target_modules: {'k_proj', 'q_proj', 'v_proj', 'o_proj'}
[05:28:08] exclude_modules: None
[05:28:08] lora_alpha: 32
[05:28:08] lora_dropout: 0.1
[05:28:08] fan_in_fan_out: False
[05:28:08] bias: none
[05:28:08] use_rslora: True
[05:28:08] modules_to_save: None
[05:28:08] init_lora_weights: True
[05:28:08] layers_to_transform: None
[05:28:08] layers_pattern: None
[05:28:08] rank_pattern: {}
[05:28:08] alpha_pattern: {}
[05:28:08] megatron_config: None
[05:28:08] megatron_core: megatron.core
[05:28:08] trainable_token_indices: None
[05:28:08] loftq_config: {}
[05:28:08] eva_config: None
[05:28:08] corda_config: None
[05:28:08] use_dora: False
[05:28:08] use_qalora: False
[05:28:08] qalora_group_size: 16
[05:28:08] layer_replication: None
[05:28:08] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False)
[05:28:08] lora_bias: False
[05:28:08] target_parameters: None
[05:28:08] _custom_modules: None
[05:28:08] TRAINING: base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096])
[05:28:08] TRAINING: base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16])
[05:28:08] TRAINING: base_model.model.model.layers.0.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096])
[05:28:09] TRAINING: base_model.model.model.layers.0.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16])
[05:28:09] TRAINING: base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096])
[05:28:09] TRAINING: base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16])
[05:28:09] TRAINING: base_model.model.model.layers.0.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096])
[05:28:09] TRAINING: base_model.model.model.layers.0.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16])
[05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096])
[05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16])
[05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096])
[05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:09] TRAINING: base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:09] TRAINING: base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.22.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.22.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.22.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.22.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [05:28:10] Total Parameters: 4,554,231,808 [05:28:10] Trainable Parameters: 13,631,488 [05:28:10] Trainable %: 0.2993% [05:28:10] base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.0.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.0.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.0.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.0.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.1.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] 
base_model.model.model.layers.1.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.1.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.1.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.2.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.2.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.2.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.2.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.3.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.3.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 
[05:28:10] base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.3.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.3.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.4.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.4.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.4.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.4.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.5.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.5.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, 
device=cuda:0 [05:28:10] base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.5.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.5.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.6.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.6.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.6.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.6.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.7.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.7.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.weight: 
dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.7.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.7.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.8.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.8.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.8.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.8.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.9.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.9.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] 
base_model.model.model.layers.9.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.9.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.10.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.10.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.10.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.10.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.11.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.11.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.11.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, 
device=cuda:0 [05:28:10] base_model.model.model.layers.11.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:10] base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.12.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.12.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.12.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.12.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.13.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.13.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.13.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] 
base_model.model.model.layers.13.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.14.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.14.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.14.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.14.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.15.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.15.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.15.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.15.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, 
device=cuda:0 [05:28:11] base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.16.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.16.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.16.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.16.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.17.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.17.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.17.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.17.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] 
base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.18.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.18.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.18.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.18.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.19.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.19.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.19.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.19.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, 
device=cuda:0 [05:28:11] base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.20.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.20.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.20.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.20.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.21.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.21.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.21.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.21.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] 
base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.22.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.22.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.22.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.22.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.23.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.23.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.23.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.23.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.24.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.24.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, 
device=cuda:0 [05:28:11] base_model.model.model.layers.24.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.24.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.24.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.24.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.24.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.24.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.25.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.25.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.25.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.25.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.25.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.25.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.25.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.25.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.26.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.26.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] 
base_model.model.model.layers.26.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.26.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.26.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.26.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.26.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.26.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.27.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.27.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.27.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.27.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.27.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.27.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.27.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.27.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.28.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.28.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.28.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, 
device=cuda:0 [05:28:11] base_model.model.model.layers.28.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.28.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.28.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.28.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.28.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.29.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.29.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.29.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.29.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.29.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.29.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.29.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.29.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.30.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.30.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.30.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] 
base_model.model.model.layers.30.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.30.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.30.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.30.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.30.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.31.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.31.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.31.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.31.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.31.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.31.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] base_model.model.model.layers.31.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [05:28:11] Starting from CSV file... [05:28:14] Splitting data into chunks of 11000... [05:28:14] Using 7 processes across 10 chunks [05:28:15] Creating new train/val split. [05:28:15] Initializing scheduler with cosine schedule with warmup, warmup steps 550, total steps: 2871 [05:28:15] Train/Val split: 100492 train, 100 val samples. 
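Note: two figures logged above can be cross-checked by hand. The sketch below recomputes the trainable LoRA parameter count from the logged adapter shapes, and the scheduler's total step count from the logged sample count, batch size and accumulation steps. The training script itself is not shown in this log, so the floor division used for the step count is an assumption about how it counts optimizer steps, and the variable names are illustrative only.

```python
# Cross-check of figures taken from the log; nothing here is the training script's own code.

RANK = 16          # lora_rank / r, as logged
HIDDEN = 4096      # in-features of q/k/v/o_proj (lora_A shape [16, 4096])
KV_OUT = 1024      # out-features of k_proj and v_proj (lora_B shape [1024, 16])
NUM_LAYERS = 32    # layers 0..31 appear in the TRAINING list


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """Parameters in one adapter pair: A is (rank, in), B is (out, rank)."""
    return rank * in_features + out_features * rank


per_layer = (
    lora_params(HIDDEN, HIDDEN)    # q_proj
    + lora_params(HIDDEN, KV_OUT)  # k_proj
    + lora_params(HIDDEN, KV_OUT)  # v_proj
    + lora_params(HIDDEN, HIDDEN)  # o_proj
)
trainable = per_layer * NUM_LAYERS

total_logged = 4_554_231_808       # "Total Parameters" as logged
trainable_pct = 100 * trainable / total_logged

# Assumption: one optimizer step per batch_size * accum_steps samples,
# with the incomplete final window dropped (floor division).
train_samples = 100_492
batch_size = 5
accum_steps = 7
optimizer_steps = train_samples // (batch_size * accum_steps)

print(f"trainable parameters: {trainable:,}")    # 13,631,488
print(f"trainable %: {trainable_pct:.4f}")       # 0.2993
print(f"optimizer steps: {optimizer_steps}")     # 2871
```

All three results agree with the logged values (13,631,488 trainable parameters, 0.2993%, 2871 total steps).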
[05:28:26] Model: PeftModelForCausalLM [05:28:26] Model config: LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": 128040, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pretraining_tp": 1, "quantization_config": { "_load_in_4bit": true, "_load_in_8bit": false, "bnb_4bit_compute_dtype": "float16", "bnb_4bit_quant_storage": "uint8", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "llm_int8_enable_fp32_cpu_offload": false, "llm_int8_has_fp16_weight": false, "llm_int8_skip_modules": [ "lm_head" ], "llm_int8_threshold": 6.0, "load_in_4bit": true, "load_in_8bit": false, "quant_method": "bitsandbytes" }, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.55.2", "use_cache": true, "vocab_size": 128256 } [05:28:26] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 [05:28:26] Optimizer: PagedAdamW ( Parameter Group 0 alpha: 0.0 betas: (0.9, 0.95) eps: 1e-08 initial_lr: 3e-05 lr: 0.0 t_alpha: None t_beta3: None weight_decay: 0.0 ) [05:28:26] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 [05:28:26] Scheduler: [05:28:26] Training on 100492 training samples, 100 validation samples [05:28:26] Average tokens per sample: 150.00 [05:28:26] Estimated epoch time: ~345.37 min [05:28:26]
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |   8060 MiB |   8993 MiB | 411433 MiB | 403373 MiB |
|---------------------------------------------------------------------------|
| Active memory         |   8060 MiB |   8993 MiB | 411433 MiB | 403373 MiB |
|---------------------------------------------------------------------------|
| Requested memory      |   8057 MiB |   8990 MiB | 411312 MiB | 403255 MiB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |  11050 MiB |  11050 MiB |  11050 MiB |      0 B   |
|---------------------------------------------------------------------------|
| Non-releasable memory |   2987 MiB |   5879 MiB | 402509 MiB | 399522 MiB |
|---------------------------------------------------------------------------|
| Allocations           |    1738    |    1816    |   32748    |   31010    |
|---------------------------------------------------------------------------|
| Active allocs         |    1738    |    1816    |   32748    |   31010    |
|---------------------------------------------------------------------------|
| GPU reserved segments |      84    |      84    |      84    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |      96    |      96    |   13657    |   13561    |
|---------------------------------------------------------------------------|
| Oversize allocations  |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize GPU segments |       0    |       0    |       0    |       0    |
|===========================================================================|
[05:28:26] Shuffling indices for epoch 1 with seed 1 [05:28:26] CPU usage: 63.3%, RAM usage: 38.5% [05:28:27] Epoch 1 learning rate: 0.0 [05:28:27] Starting epoch 1 [05:28:27] Batch 1: input_ids shape torch.Size([5, 150]), attention_mask shape torch.Size([5, 150]) [05:28:30] Epoch: 
1 Batch: 1/20099 (0.00%) Loss: 6.962016 LR: 0.00000000 [05:28:32] Epoch: 1 Batch: 2/20099 (0.01%) Loss: 6.844860 LR: 0.00000000 [05:28:35] Epoch: 1 Batch: 3/20099 (0.01%) Loss: 6.896306 LR: 0.00000000 [05:28:38] Epoch: 1 Batch: 4/20099 (0.02%) Loss: 6.635071 LR: 0.00000000 [05:28:41] Epoch: 1 Batch: 5/20099 (0.02%) Loss: 7.060008 LR: 0.00000000 [05:28:44] Epoch: 1 Batch: 6/20099 (0.03%) Loss: 7.106397 LR: 0.00000000 [05:28:47] Epoch: 1 Batch: 7/20099 (0.03%) Loss: 6.829549 LR: 0.00000005 [05:28:49] Epoch: 1 Batch: 8/20099 (0.04%) Loss: 6.929935 LR: 0.00000005 [05:28:52] Epoch: 1 Batch: 9/20099 (0.04%) Loss: 6.916800 LR: 0.00000005 [05:28:55] Epoch: 1 Batch: 10/20099 (0.05%) Loss: 7.042025 LR: 0.00000005 [05:28:58] Epoch: 1 Batch: 11/20099 (0.05%) Loss: 7.277932 LR: 0.00000005 [05:29:01] Epoch: 1 Batch: 12/20099 (0.06%) Loss: 6.509862 LR: 0.00000005 [05:29:04] Epoch: 1 Batch: 13/20099 (0.06%) Loss: 6.894521 LR: 0.00000005 [05:29:06] Epoch: 1 Batch: 14/20099 (0.07%) Loss: 6.918648 LR: 0.00000011 [05:29:09] Epoch: 1 Batch: 15/20099 (0.07%) Loss: 6.698418 LR: 0.00000011 [05:29:12] Epoch: 1 Batch: 16/20099 (0.08%) Loss: 6.752904 LR: 0.00000011 [05:29:15] Epoch: 1 Batch: 17/20099 (0.08%) Loss: 6.594116 LR: 0.00000011 [05:29:18] Epoch: 1 Batch: 18/20099 (0.09%) Loss: 7.069793 LR: 0.00000011 [05:29:21] Epoch: 1 Batch: 19/20099 (0.09%) Loss: 6.812260 LR: 0.00000011 [05:29:24] Epoch: 1 Batch: 20/20099 (0.10%) Loss: 6.718220 LR: 0.00000011 [05:29:27] Epoch: 1 Batch: 21/20099 (0.10%) Loss: 6.796827 LR: 0.00000016 [05:29:30] Epoch: 1 Batch: 22/20099 (0.11%) Loss: 6.767028 LR: 0.00000016 [05:29:33] Epoch: 1 Batch: 23/20099 (0.11%) Loss: 6.944226 LR: 0.00000016 [05:29:36] Epoch: 1 Batch: 24/20099 (0.12%) Loss: 7.006477 LR: 0.00000016 [05:29:38] Epoch: 1 Batch: 25/20099 (0.12%) Loss: 6.673720 LR: 0.00000016 [05:29:41] Epoch: 1 Batch: 26/20099 (0.13%) Loss: 6.881500 LR: 0.00000016 [05:29:44] Epoch: 1 Batch: 27/20099 (0.13%) Loss: 7.197294 LR: 0.00000016 [05:29:47] Epoch: 1 Batch: 
28/20099 (0.14%) Loss: 6.888683 LR: 0.00000022 [05:29:50] Epoch: 1 Batch: 29/20099 (0.14%) Loss: 6.827078 LR: 0.00000022 [05:29:53] Epoch: 1 Batch: 30/20099 (0.15%) Loss: 6.696512 LR: 0.00000022 [05:29:56] Epoch: 1 Batch: 31/20099 (0.15%) Loss: 7.060051 LR: 0.00000022 [05:29:59] Epoch: 1 Batch: 32/20099 (0.16%) Loss: 6.822063 LR: 0.00000022 [05:30:02] Epoch: 1 Batch: 33/20099 (0.16%) Loss: 6.663684 LR: 0.00000022 [05:30:05] Epoch: 1 Batch: 34/20099 (0.17%) Loss: 6.978521 LR: 0.00000022 [05:30:08] Epoch: 1 Batch: 35/20099 (0.17%) Loss: 6.726797 LR: 0.00000027 [05:30:11] Epoch: 1 Batch: 36/20099 (0.18%) Loss: 6.278772 LR: 0.00000027 [05:30:14] Epoch: 1 Batch: 37/20099 (0.18%) Loss: 7.222415 LR: 0.00000027 [05:30:17] Epoch: 1 Batch: 38/20099 (0.19%) Loss: 6.191016 LR: 0.00000027 [05:30:20] Epoch: 1 Batch: 39/20099 (0.19%) Loss: 7.164665 LR: 0.00000027 [05:30:23] Epoch: 1 Batch: 40/20099 (0.20%) Loss: 6.672603 LR: 0.00000027 [05:30:26] Epoch: 1 Batch: 41/20099 (0.20%) Loss: 6.795218 LR: 0.00000027 [05:30:29] Epoch: 1 Batch: 42/20099 (0.21%) Loss: 6.736525 LR: 0.00000033 [05:30:32] Epoch: 1 Batch: 43/20099 (0.21%) Loss: 7.278892 LR: 0.00000033 [05:30:35] Epoch: 1 Batch: 44/20099 (0.22%) Loss: 6.782112 LR: 0.00000033 [05:30:38] Epoch: 1 Batch: 45/20099 (0.22%) Loss: 6.758891 LR: 0.00000033 [05:30:41] Epoch: 1 Batch: 46/20099 (0.23%) Loss: 6.826479 LR: 0.00000033 [05:30:44] Epoch: 1 Batch: 47/20099 (0.23%) Loss: 6.418891 LR: 0.00000033 [05:30:47] Epoch: 1 Batch: 48/20099 (0.24%) Loss: 6.771644 LR: 0.00000033 [05:30:50] Epoch: 1 Batch: 49/20099 (0.24%) Loss: 6.725226 LR: 0.00000038 [05:30:53] Epoch: 1 Batch: 50/20099 (0.25%) Loss: 7.178868 LR: 0.00000038 [05:30:56] Epoch: 1 Batch: 51/20099 (0.25%) Loss: 6.895179 LR: 0.00000038 [05:30:59] Epoch: 1 Batch: 52/20099 (0.26%) Loss: 6.431315 LR: 0.00000038 [05:31:02] Epoch: 1 Batch: 53/20099 (0.26%) Loss: 6.736614 LR: 0.00000038 [05:31:05] Epoch: 1 Batch: 54/20099 (0.27%) Loss: 6.815797 LR: 0.00000038 [05:31:08] Epoch: 1 Batch: 
55/20099 (0.27%) Loss: 6.614000 LR: 0.00000038 [05:31:11] Epoch: 1 Batch: 56/20099 (0.28%) Loss: 6.972163 LR: 0.00000044 [05:31:14] Epoch: 1 Batch: 57/20099 (0.28%) Loss: 6.965045 LR: 0.00000044 [05:31:17] Epoch: 1 Batch: 58/20099 (0.29%) Loss: 7.045127 LR: 0.00000044 [05:31:20] Epoch: 1 Batch: 59/20099 (0.29%) Loss: 6.983873 LR: 0.00000044 [05:31:23] Epoch: 1 Batch: 60/20099 (0.30%) Loss: 6.855827 LR: 0.00000044 [05:31:26] Epoch: 1 Batch: 61/20099 (0.30%) Loss: 6.770656 LR: 0.00000044 [05:31:29] Epoch: 1 Batch: 62/20099 (0.31%) Loss: 6.871757 LR: 0.00000044 [05:31:33] Epoch: 1 Batch: 63/20099 (0.31%) Loss: 6.716748 LR: 0.00000049 [05:31:36] Epoch: 1 Batch: 64/20099 (0.32%) Loss: 6.837216 LR: 0.00000049 [05:31:39] Epoch: 1 Batch: 65/20099 (0.32%) Loss: 7.155727 LR: 0.00000049 [05:31:42] Epoch: 1 Batch: 66/20099 (0.33%) Loss: 6.878810 LR: 0.00000049 [05:31:45] Epoch: 1 Batch: 67/20099 (0.33%) Loss: 6.990387 LR: 0.00000049 [05:31:48] Epoch: 1 Batch: 68/20099 (0.34%) Loss: 6.873594 LR: 0.00000049 [05:31:51] Epoch: 1 Batch: 69/20099 (0.34%) Loss: 6.576877 LR: 0.00000049 [05:31:54] Epoch: 1 Batch: 70/20099 (0.35%) Loss: 6.757781 LR: 0.00000055 [05:31:57] Epoch: 1 Batch: 71/20099 (0.35%) Loss: 6.604476 LR: 0.00000055 [05:32:00] Epoch: 1 Batch: 72/20099 (0.36%) Loss: 6.932801 LR: 0.00000055 [05:32:03] Epoch: 1 Batch: 73/20099 (0.36%) Loss: 6.624389 LR: 0.00000055 [05:32:06] Epoch: 1 Batch: 74/20099 (0.37%) Loss: 6.993600 LR: 0.00000055 [05:32:09] Epoch: 1 Batch: 75/20099 (0.37%) Loss: 6.991932 LR: 0.00000055 [05:32:12] Epoch: 1 Batch: 76/20099 (0.38%) Loss: 6.722564 LR: 0.00000055 [05:32:15] Epoch: 1 Batch: 77/20099 (0.38%) Loss: 7.035213 LR: 0.00000060 [05:32:19] Epoch: 1 Batch: 78/20099 (0.39%) Loss: 6.918113 LR: 0.00000060 [05:32:22] Epoch: 1 Batch: 79/20099 (0.39%) Loss: 6.459154 LR: 0.00000060 [05:32:25] Epoch: 1 Batch: 80/20099 (0.40%) Loss: 6.856339 LR: 0.00000060 [05:32:28] Epoch: 1 Batch: 81/20099 (0.40%) Loss: 6.846845 LR: 0.00000060 [05:32:31] Epoch: 1 Batch: 
82/20099 (0.41%) Loss: 6.864122 LR: 0.00000060 [05:32:34] Epoch: 1 Batch: 83/20099 (0.41%) Loss: 6.803262 LR: 0.00000060 [05:32:37] Epoch: 1 Batch: 84/20099 (0.42%) Loss: 6.907727 LR: 0.00000065 [05:32:40] Epoch: 1 Batch: 85/20099 (0.42%) Loss: 6.762291 LR: 0.00000065 [05:32:44] Epoch: 1 Batch: 86/20099 (0.43%) Loss: 6.806541 LR: 0.00000065 [05:32:47] Epoch: 1 Batch: 87/20099 (0.43%) Loss: 6.475651 LR: 0.00000065 [05:32:50] Epoch: 1 Batch: 88/20099 (0.44%) Loss: 6.600870 LR: 0.00000065 [05:32:53] Epoch: 1 Batch: 89/20099 (0.44%) Loss: 6.621047 LR: 0.00000065 [05:32:56] Epoch: 1 Batch: 90/20099 (0.45%) Loss: 6.845604 LR: 0.00000065 [05:32:59] Epoch: 1 Batch: 91/20099 (0.45%) Loss: 6.564170 LR: 0.00000071 [05:33:02] Epoch: 1 Batch: 92/20099 (0.46%) Loss: 6.514664 LR: 0.00000071 [05:33:06] Epoch: 1 Batch: 93/20099 (0.46%) Loss: 6.489383 LR: 0.00000071 [05:33:09] Epoch: 1 Batch: 94/20099 (0.47%) Loss: 6.591108 LR: 0.00000071 [05:33:12] Epoch: 1 Batch: 95/20099 (0.47%) Loss: 6.543421 LR: 0.00000071 [05:33:15] Epoch: 1 Batch: 96/20099 (0.48%) Loss: 6.640646 LR: 0.00000071 [05:33:18] Epoch: 1 Batch: 97/20099 (0.48%) Loss: 6.666123 LR: 0.00000071 [05:33:21] Epoch: 1 Batch: 98/20099 (0.49%) Loss: 6.663676 LR: 0.00000076 [05:33:24] Epoch: 1 Batch: 99/20099 (0.49%) Loss: 7.137365 LR: 0.00000076 [05:33:27] Epoch: 1 Batch: 100/20099 (0.50%) Loss: 6.723222 LR: 0.00000076 [05:33:30] Epoch: 1 Batch: 101/20099 (0.50%) Loss: 6.652589 LR: 0.00000076 [05:33:33] Epoch: 1 Batch: 102/20099 (0.51%) Loss: 6.725030 LR: 0.00000076 [05:33:36] Epoch: 1 Batch: 103/20099 (0.51%) Loss: 6.637894 LR: 0.00000076 [05:33:40] Epoch: 1 Batch: 104/20099 (0.52%) Loss: 6.814937 LR: 0.00000076 [05:33:43] Epoch: 1 Batch: 105/20099 (0.52%) Loss: 6.486751 LR: 0.00000082 [05:33:46] Epoch: 1 Batch: 106/20099 (0.53%) Loss: 6.497531 LR: 0.00000082 [05:33:49] Epoch: 1 Batch: 107/20099 (0.53%) Loss: 6.509476 LR: 0.00000082 [05:33:52] Epoch: 1 Batch: 108/20099 (0.54%) Loss: 6.648261 LR: 0.00000082 [05:33:55] Epoch: 1 
Batch: 109/20099 (0.54%) Loss: 6.239590 LR: 0.00000082 [05:33:58] Epoch: 1 Batch: 110/20099 (0.55%) Loss: 6.686769 LR: 0.00000082 [05:34:01] Epoch: 1 Batch: 111/20099 (0.55%) Loss: 6.750135 LR: 0.00000082 [05:34:04] Epoch: 1 Batch: 112/20099 (0.56%) Loss: 7.052633 LR: 0.00000087 [05:34:08] Epoch: 1 Batch: 113/20099 (0.56%) Loss: 6.514356 LR: 0.00000087 [05:34:11] Epoch: 1 Batch: 114/20099 (0.57%) Loss: 6.400832 LR: 0.00000087 [05:34:14] Epoch: 1 Batch: 115/20099 (0.57%) Loss: 6.413975 LR: 0.00000087 [05:34:17] Epoch: 1 Batch: 116/20099 (0.58%) Loss: 6.430995 LR: 0.00000087 [05:34:20] Epoch: 1 Batch: 117/20099 (0.58%) Loss: 6.863123 LR: 0.00000087 [05:34:23] Epoch: 1 Batch: 118/20099 (0.59%) Loss: 6.755737 LR: 0.00000087 [05:34:26] Epoch: 1 Batch: 119/20099 (0.59%) Loss: 6.394816 LR: 0.00000093 [05:34:29] Epoch: 1 Batch: 120/20099 (0.60%) Loss: 6.555242 LR: 0.00000093 [05:34:32] Epoch: 1 Batch: 121/20099 (0.60%) Loss: 6.644229 LR: 0.00000093 [05:34:36] Epoch: 1 Batch: 122/20099 (0.61%) Loss: 6.372416 LR: 0.00000093 [05:34:39] Epoch: 1 Batch: 123/20099 (0.61%) Loss: 6.253324 LR: 0.00000093 [05:34:42] Epoch: 1 Batch: 124/20099 (0.62%) Loss: 6.435319 LR: 0.00000093 [05:34:45] Epoch: 1 Batch: 125/20099 (0.62%) Loss: 6.674481 LR: 0.00000093 [05:34:48] Epoch: 1 Batch: 126/20099 (0.63%) Loss: 6.503188 LR: 0.00000098 [05:34:51] Epoch: 1 Batch: 127/20099 (0.63%) Loss: 6.616235 LR: 0.00000098 [05:34:54] Epoch: 1 Batch: 128/20099 (0.64%) Loss: 6.335257 LR: 0.00000098 [05:34:57] Epoch: 1 Batch: 129/20099 (0.64%) Loss: 6.502963 LR: 0.00000098 [05:35:00] Epoch: 1 Batch: 130/20099 (0.65%) Loss: 6.429163 LR: 0.00000098 [05:35:04] Epoch: 1 Batch: 131/20099 (0.65%) Loss: 6.404885 LR: 0.00000098 [05:35:07] Epoch: 1 Batch: 132/20099 (0.66%) Loss: 6.518302 LR: 0.00000098 [05:35:10] Epoch: 1 Batch: 133/20099 (0.66%) Loss: 6.721867 LR: 0.00000104 [05:35:13] Epoch: 1 Batch: 134/20099 (0.67%) Loss: 6.334186 LR: 0.00000104 [05:35:16] Epoch: 1 Batch: 135/20099 (0.67%) Loss: 6.711968 LR: 
0.00000104 [05:35:19] Epoch: 1 Batch: 136/20099 (0.68%) Loss: 6.439227 LR: 0.00000104 [05:35:22] Epoch: 1 Batch: 137/20099 (0.68%) Loss: 6.490163 LR: 0.00000104 [05:35:25] Epoch: 1 Batch: 138/20099 (0.69%) Loss: 6.182080 LR: 0.00000104 [05:35:28] Epoch: 1 Batch: 139/20099 (0.69%) Loss: 6.645146 LR: 0.00000104 [05:35:32] Epoch: 1 Batch: 140/20099 (0.70%) Loss: 6.454726 LR: 0.00000109 [05:35:35] Epoch: 1 Batch: 141/20099 (0.70%) Loss: 6.389693 LR: 0.00000109 [05:35:38] Epoch: 1 Batch: 142/20099 (0.71%) Loss: 6.573570 LR: 0.00000109 [05:35:41] Epoch: 1 Batch: 143/20099 (0.71%) Loss: 6.421232 LR: 0.00000109 [05:35:44] Epoch: 1 Batch: 144/20099 (0.72%) Loss: 6.541822 LR: 0.00000109 [05:35:47] Epoch: 1 Batch: 145/20099 (0.72%) Loss: 6.197926 LR: 0.00000109 [05:35:50] Epoch: 1 Batch: 146/20099 (0.73%) Loss: 6.510123 LR: 0.00000109 [05:35:53] Epoch: 1 Batch: 147/20099 (0.73%) Loss: 6.573685 LR: 0.00000115 [05:35:56] Epoch: 1 Batch: 148/20099 (0.74%) Loss: 6.656434 LR: 0.00000115 [05:36:00] Epoch: 1 Batch: 149/20099 (0.74%) Loss: 6.376676 LR: 0.00000115 [05:36:03] Epoch: 1 Batch: 150/20099 (0.75%) Loss: 6.215777 LR: 0.00000115 [05:36:06] Epoch: 1 Batch: 151/20099 (0.75%) Loss: 6.363476 LR: 0.00000115 [05:36:09] Epoch: 1 Batch: 152/20099 (0.76%) Loss: 6.620334 LR: 0.00000115 [05:36:12] Epoch: 1 Batch: 153/20099 (0.76%) Loss: 6.468755 LR: 0.00000115 [05:36:15] Epoch: 1 Batch: 154/20099 (0.77%) Loss: 6.322797 LR: 0.00000120 [05:36:18] Epoch: 1 Batch: 155/20099 (0.77%) Loss: 6.676209 LR: 0.00000120 [05:36:21] Epoch: 1 Batch: 156/20099 (0.78%) Loss: 6.065999 LR: 0.00000120 [05:36:24] Epoch: 1 Batch: 157/20099 (0.78%) Loss: 6.779493 LR: 0.00000120 [05:36:28] Epoch: 1 Batch: 158/20099 (0.79%) Loss: 6.407558 LR: 0.00000120 [05:36:31] Epoch: 1 Batch: 159/20099 (0.79%) Loss: 6.315716 LR: 0.00000120 [05:36:34] Epoch: 1 Batch: 160/20099 (0.80%) Loss: 6.164629 LR: 0.00000120 [05:36:37] Epoch: 1 Batch: 161/20099 (0.80%) Loss: 6.435280 LR: 0.00000125 [05:36:40] Epoch: 1 Batch: 162/20099 
(0.81%) Loss: 6.606733 LR: 0.00000125 [05:36:43] Epoch: 1 Batch: 163/20099 (0.81%) Loss: 6.095863 LR: 0.00000125 [05:36:46] Epoch: 1 Batch: 164/20099 (0.82%) Loss: 6.013640 LR: 0.00000125 [05:36:49] Epoch: 1 Batch: 165/20099 (0.82%) Loss: 6.524360 LR: 0.00000125 [05:36:52] Epoch: 1 Batch: 166/20099 (0.83%) Loss: 6.084081 LR: 0.00000125 [05:36:56] Epoch: 1 Batch: 167/20099 (0.83%) Loss: 6.428363 LR: 0.00000125 [05:36:59] Epoch: 1 Batch: 168/20099 (0.84%) Loss: 6.342831 LR: 0.00000131 [05:37:02] Epoch: 1 Batch: 169/20099 (0.84%) Loss: 6.097524 LR: 0.00000131 [05:37:05] Epoch: 1 Batch: 170/20099 (0.85%) Loss: 6.208304 LR: 0.00000131 [05:37:08] Epoch: 1 Batch: 171/20099 (0.85%) Loss: 6.293879 LR: 0.00000131 [05:37:11] Epoch: 1 Batch: 172/20099 (0.86%) Loss: 6.507888 LR: 0.00000131 [05:37:14] Epoch: 1 Batch: 173/20099 (0.86%) Loss: 6.448502 LR: 0.00000131 [05:37:17] Epoch: 1 Batch: 174/20099 (0.87%) Loss: 6.333720 LR: 0.00000131 [05:37:20] Epoch: 1 Batch: 175/20099 (0.87%) Loss: 5.815619 LR: 0.00000136 [05:37:24] Epoch: 1 Batch: 176/20099 (0.88%) Loss: 5.965688 LR: 0.00000136 [05:37:27] Epoch: 1 Batch: 177/20099 (0.88%) Loss: 6.505473 LR: 0.00000136 [05:37:30] Epoch: 1 Batch: 178/20099 (0.89%) Loss: 6.120909 LR: 0.00000136 [05:37:33] Epoch: 1 Batch: 179/20099 (0.89%) Loss: 6.296700 LR: 0.00000136 [05:37:36] Epoch: 1 Batch: 180/20099 (0.90%) Loss: 6.364547 LR: 0.00000136 [05:37:39] Epoch: 1 Batch: 181/20099 (0.90%) Loss: 6.454642 LR: 0.00000136 [05:37:42] Epoch: 1 Batch: 182/20099 (0.91%) Loss: 6.215357 LR: 0.00000142 [05:37:45] Epoch: 1 Batch: 183/20099 (0.91%) Loss: 6.249943 LR: 0.00000142 [05:37:48] Epoch: 1 Batch: 184/20099 (0.92%) Loss: 6.188128 LR: 0.00000142 [05:37:52] Epoch: 1 Batch: 185/20099 (0.92%) Loss: 6.247033 LR: 0.00000142 [05:37:55] Epoch: 1 Batch: 186/20099 (0.93%) Loss: 6.516192 LR: 0.00000142 [05:37:58] Epoch: 1 Batch: 187/20099 (0.93%) Loss: 6.236046 LR: 0.00000142 [05:38:01] Epoch: 1 Batch: 188/20099 (0.94%) Loss: 5.909803 LR: 0.00000142 [05:38:04] 
Epoch: 1 Batch: 189/20099 (0.94%) Loss: 6.254087 LR: 0.00000147 [05:38:07] Epoch: 1 Batch: 190/20099 (0.95%) Loss: 6.308969 LR: 0.00000147 [05:38:10] Epoch: 1 Batch: 191/20099 (0.95%) Loss: 6.102310 LR: 0.00000147 [05:38:13] Epoch: 1 Batch: 192/20099 (0.96%) Loss: 5.751007 LR: 0.00000147 [05:38:16] Epoch: 1 Batch: 193/20099 (0.96%) Loss: 6.190647 LR: 0.00000147 [05:38:20] Epoch: 1 Batch: 194/20099 (0.97%) Loss: 5.969916 LR: 0.00000147 [05:38:23] Epoch: 1 Batch: 195/20099 (0.97%) Loss: 6.395854 LR: 0.00000147 [05:38:26] Epoch: 1 Batch: 196/20099 (0.98%) Loss: 6.385301 LR: 0.00000153 [05:38:29] Epoch: 1 Batch: 197/20099 (0.98%) Loss: 5.855777 LR: 0.00000153 [05:38:32] Epoch: 1 Batch: 198/20099 (0.99%) Loss: 5.975441 LR: 0.00000153 [05:38:35] Epoch: 1 Batch: 199/20099 (0.99%) Loss: 6.063916 LR: 0.00000153 [05:38:42] >> Temp checkpoint saved: epoch1_step200, size: 0.1693 GB [05:38:42] Epoch: 1 Batch: 200/20099 (1.00%) Loss: 5.698818 LR: 0.00000153 [05:38:45] Epoch: 1 Batch: 201/20099 (1.00%) Loss: 6.142428 LR: 0.00000153 [05:38:48] Epoch: 1 Batch: 202/20099 (1.01%) Loss: 6.221570 LR: 0.00000153 [05:38:51] Epoch: 1 Batch: 203/20099 (1.01%) Loss: 6.094434 LR: 0.00000158 [05:38:54] Epoch: 1 Batch: 204/20099 (1.01%) Loss: 6.072093 LR: 0.00000158 [05:38:57] Epoch: 1 Batch: 205/20099 (1.02%) Loss: 6.169077 LR: 0.00000158 [05:39:01] Epoch: 1 Batch: 206/20099 (1.02%) Loss: 6.286780 LR: 0.00000158 [05:39:04] Epoch: 1 Batch: 207/20099 (1.03%) Loss: 6.379415 LR: 0.00000158 [05:39:07] Epoch: 1 Batch: 208/20099 (1.03%) Loss: 6.125323 LR: 0.00000158 [05:39:10] Epoch: 1 Batch: 209/20099 (1.04%) Loss: 6.244865 LR: 0.00000158 [05:39:13] Epoch: 1 Batch: 210/20099 (1.04%) Loss: 6.171168 LR: 0.00000164 [05:39:16] Epoch: 1 Batch: 211/20099 (1.05%) Loss: 6.062247 LR: 0.00000164 [05:39:19] Epoch: 1 Batch: 212/20099 (1.05%) Loss: 5.894752 LR: 0.00000164 [05:39:22] Epoch: 1 Batch: 213/20099 (1.06%) Loss: 6.162818 LR: 0.00000164 [05:39:25] Epoch: 1 Batch: 214/20099 (1.06%) Loss: 5.833673 LR: 
0.00000164 [05:39:29] Epoch: 1 Batch: 215/20099 (1.07%) Loss: 6.039549 LR: 0.00000164 [05:39:32] Epoch: 1 Batch: 216/20099 (1.07%) Loss: 5.770587 LR: 0.00000164 [05:39:35] Epoch: 1 Batch: 217/20099 (1.08%) Loss: 5.787209 LR: 0.00000169 [05:39:38] Epoch: 1 Batch: 218/20099 (1.08%) Loss: 5.869362 LR: 0.00000169 [05:39:41] Epoch: 1 Batch: 219/20099 (1.09%) Loss: 5.754841 LR: 0.00000169 [05:39:44] Epoch: 1 Batch: 220/20099 (1.09%) Loss: 5.975692 LR: 0.00000169 [05:39:47] Epoch: 1 Batch: 221/20099 (1.10%) Loss: 6.191529 LR: 0.00000169 [05:39:50] Epoch: 1 Batch: 222/20099 (1.10%) Loss: 5.850366 LR: 0.00000169 [05:39:53] Epoch: 1 Batch: 223/20099 (1.11%) Loss: 5.753388 LR: 0.00000169 [05:39:57] Epoch: 1 Batch: 224/20099 (1.11%) Loss: 5.992358 LR: 0.00000175 [05:40:00] Epoch: 1 Batch: 225/20099 (1.12%) Loss: 6.066387 LR: 0.00000175 [05:40:03] Epoch: 1 Batch: 226/20099 (1.12%) Loss: 5.618329 LR: 0.00000175 [05:40:06] Epoch: 1 Batch: 227/20099 (1.13%) Loss: 6.136591 LR: 0.00000175 [05:40:09] Epoch: 1 Batch: 228/20099 (1.13%) Loss: 6.188446 LR: 0.00000175 [05:40:12] Epoch: 1 Batch: 229/20099 (1.14%) Loss: 6.199264 LR: 0.00000175 [05:40:15] Epoch: 1 Batch: 230/20099 (1.14%) Loss: 5.657817 LR: 0.00000175 [05:40:18] Epoch: 1 Batch: 231/20099 (1.15%) Loss: 5.634850 LR: 0.00000180 [05:40:21] Epoch: 1 Batch: 232/20099 (1.15%) Loss: 5.713538 LR: 0.00000180 [05:40:24] Epoch: 1 Batch: 233/20099 (1.16%) Loss: 5.442693 LR: 0.00000180 [05:40:27] Epoch: 1 Batch: 234/20099 (1.16%) Loss: 5.468944 LR: 0.00000180 [05:40:31] Epoch: 1 Batch: 235/20099 (1.17%) Loss: 5.735326 LR: 0.00000180 [05:40:34] Epoch: 1 Batch: 236/20099 (1.17%) Loss: 5.961711 LR: 0.00000180 [05:40:37] Epoch: 1 Batch: 237/20099 (1.18%) Loss: 5.912184 LR: 0.00000180 [05:40:40] Epoch: 1 Batch: 238/20099 (1.18%) Loss: 5.762606 LR: 0.00000185 [05:40:43] Epoch: 1 Batch: 239/20099 (1.19%) Loss: 5.803808 LR: 0.00000185 [05:40:46] Epoch: 1 Batch: 240/20099 (1.19%) Loss: 5.470053 LR: 0.00000185 [05:40:49] Epoch: 1 Batch: 241/20099 
(1.20%) Loss: 6.077574 LR: 0.00000185 [05:40:52] Epoch: 1 Batch: 242/20099 (1.20%) Loss: 5.852667 LR: 0.00000185 [05:40:55] Epoch: 1 Batch: 243/20099 (1.21%) Loss: 5.655321 LR: 0.00000185 [05:40:58] Epoch: 1 Batch: 244/20099 (1.21%) Loss: 5.610448 LR: 0.00000185 [05:41:02] Epoch: 1 Batch: 245/20099 (1.22%) Loss: 5.818143 LR: 0.00000191 [05:41:05] Epoch: 1 Batch: 246/20099 (1.22%) Loss: 5.807350 LR: 0.00000191 [05:41:08] Epoch: 1 Batch: 247/20099 (1.23%) Loss: 5.619134 LR: 0.00000191 [05:41:11] Epoch: 1 Batch: 248/20099 (1.23%) Loss: 5.859716 LR: 0.00000191 [05:41:14] Epoch: 1 Batch: 249/20099 (1.24%) Loss: 5.755089 LR: 0.00000191 [05:41:17] Epoch: 1 Batch: 250/20099 (1.24%) Loss: 5.934231 LR: 0.00000191 [05:41:20] Epoch: 1 Batch: 251/20099 (1.25%) Loss: 5.471463 LR: 0.00000191 [05:41:23] Epoch: 1 Batch: 252/20099 (1.25%) Loss: 5.868921 LR: 0.00000196 [05:41:26] Epoch: 1 Batch: 253/20099 (1.26%) Loss: 5.466228 LR: 0.00000196 [05:41:29] Epoch: 1 Batch: 254/20099 (1.26%) Loss: 5.504995 LR: 0.00000196 [05:41:33] Epoch: 1 Batch: 255/20099 (1.27%) Loss: 5.788034 LR: 0.00000196 [05:41:36] Epoch: 1 Batch: 256/20099 (1.27%) Loss: 5.667752 LR: 0.00000196 [05:41:39] Epoch: 1 Batch: 257/20099 (1.28%) Loss: 5.331821 LR: 0.00000196 [05:41:42] Epoch: 1 Batch: 258/20099 (1.28%) Loss: 5.674621 LR: 0.00000196 [05:41:45] Epoch: 1 Batch: 259/20099 (1.29%) Loss: 5.657183 LR: 0.00000202 [05:41:48] Epoch: 1 Batch: 260/20099 (1.29%) Loss: 5.716599 LR: 0.00000202 [05:41:51] Epoch: 1 Batch: 261/20099 (1.30%) Loss: 5.072574 LR: 0.00000202 [05:41:54] Epoch: 1 Batch: 262/20099 (1.30%) Loss: 5.573281 LR: 0.00000202 [05:41:57] Epoch: 1 Batch: 263/20099 (1.31%) Loss: 5.770137 LR: 0.00000202 [05:42:00] Epoch: 1 Batch: 264/20099 (1.31%) Loss: 5.341028 LR: 0.00000202 [05:42:04] Epoch: 1 Batch: 265/20099 (1.32%) Loss: 5.450570 LR: 0.00000202 [05:42:07] Epoch: 1 Batch: 266/20099 (1.32%) Loss: 5.163337 LR: 0.00000207 [05:42:10] Epoch: 1 Batch: 267/20099 (1.33%) Loss: 5.341553 LR: 0.00000207 [05:42:13] 
Epoch: 1 Batch: 268/20099 (1.33%) Loss: 5.937451 LR: 0.00000207 [05:42:16] Epoch: 1 Batch: 269/20099 (1.34%) Loss: 5.385732 LR: 0.00000207 [05:42:19] Epoch: 1 Batch: 270/20099 (1.34%) Loss: 5.646508 LR: 0.00000207 [05:42:22] Epoch: 1 Batch: 271/20099 (1.35%) Loss: 5.687840 LR: 0.00000207 [05:42:25] Epoch: 1 Batch: 272/20099 (1.35%) Loss: 5.411791 LR: 0.00000207 [05:42:29] Epoch: 1 Batch: 273/20099 (1.36%) Loss: 5.368639 LR: 0.00000213 [05:42:32] Epoch: 1 Batch: 274/20099 (1.36%) Loss: 5.434786 LR: 0.00000213 [05:42:35] Epoch: 1 Batch: 275/20099 (1.37%) Loss: 5.362719 LR: 0.00000213 [05:42:38] Epoch: 1 Batch: 276/20099 (1.37%) Loss: 5.543468 LR: 0.00000213 [05:42:41] Epoch: 1 Batch: 277/20099 (1.38%) Loss: 5.233136 LR: 0.00000213 [05:42:44] Epoch: 1 Batch: 278/20099 (1.38%) Loss: 5.227110 LR: 0.00000213 [05:42:47] Epoch: 1 Batch: 279/20099 (1.39%) Loss: 5.556291 LR: 0.00000213 [05:42:50] Epoch: 1 Batch: 280/20099 (1.39%) Loss: 5.393062 LR: 0.00000218 [05:42:53] Epoch: 1 Batch: 281/20099 (1.40%) Loss: 5.804086 LR: 0.00000218 [05:42:57] Epoch: 1 Batch: 282/20099 (1.40%) Loss: 5.275065 LR: 0.00000218 [05:43:00] Epoch: 1 Batch: 283/20099 (1.41%) Loss: 5.461874 LR: 0.00000218 [05:43:03] Epoch: 1 Batch: 284/20099 (1.41%) Loss: 5.395660 LR: 0.00000218 [05:43:06] Epoch: 1 Batch: 285/20099 (1.42%) Loss: 5.600826 LR: 0.00000218 [05:43:09] Epoch: 1 Batch: 286/20099 (1.42%) Loss: 5.379600 LR: 0.00000218 [05:43:12] Epoch: 1 Batch: 287/20099 (1.43%) Loss: 5.399331 LR: 0.00000224 [05:43:15] Epoch: 1 Batch: 288/20099 (1.43%) Loss: 5.327848 LR: 0.00000224 [05:43:18] Epoch: 1 Batch: 289/20099 (1.44%) Loss: 5.226300 LR: 0.00000224 [05:43:21] Epoch: 1 Batch: 290/20099 (1.44%) Loss: 5.193249 LR: 0.00000224 [05:43:25] Epoch: 1 Batch: 291/20099 (1.45%) Loss: 4.893579 LR: 0.00000224 [05:43:28] Epoch: 1 Batch: 292/20099 (1.45%) Loss: 5.189096 LR: 0.00000224 [05:43:31] Epoch: 1 Batch: 293/20099 (1.46%) Loss: 5.457005 LR: 0.00000224 [05:43:34] Epoch: 1 Batch: 294/20099 (1.46%) Loss: 5.486538 
LR: 0.00000229 [05:43:37] Epoch: 1 Batch: 295/20099 (1.47%) Loss: 5.685827 LR: 0.00000229 [05:43:40] Epoch: 1 Batch: 296/20099 (1.47%) Loss: 5.005714 LR: 0.00000229 [05:43:43] Epoch: 1 Batch: 297/20099 (1.48%) Loss: 5.061221 LR: 0.00000229 [05:43:46] Epoch: 1 Batch: 298/20099 (1.48%) Loss: 5.398332 LR: 0.00000229 [05:43:49] Epoch: 1 Batch: 299/20099 (1.49%) Loss: 5.169147 LR: 0.00000229 [05:43:53] Epoch: 1 Batch: 300/20099 (1.49%) Loss: 5.401959 LR: 0.00000229 [05:43:56] Epoch: 1 Batch: 301/20099 (1.50%) Loss: 5.609097 LR: 0.00000235 [05:43:59] Epoch: 1 Batch: 302/20099 (1.50%) Loss: 5.090550 LR: 0.00000235 [05:44:02] Epoch: 1 Batch: 303/20099 (1.51%) Loss: 5.435633 LR: 0.00000235 [05:44:05] Epoch: 1 Batch: 304/20099 (1.51%) Loss: 4.977289 LR: 0.00000235 [05:44:08] Epoch: 1 Batch: 305/20099 (1.52%) Loss: 5.232569 LR: 0.00000235 [05:44:11] Epoch: 1 Batch: 306/20099 (1.52%) Loss: 5.072268 LR: 0.00000235 [05:44:14] Epoch: 1 Batch: 307/20099 (1.53%) Loss: 5.037082 LR: 0.00000235 [05:44:17] Epoch: 1 Batch: 308/20099 (1.53%) Loss: 5.163113 LR: 0.00000240 [05:44:21] Epoch: 1 Batch: 309/20099 (1.54%) Loss: 5.293965 LR: 0.00000240 [05:44:24] Epoch: 1 Batch: 310/20099 (1.54%) Loss: 5.232081 LR: 0.00000240 [05:44:27] Epoch: 1 Batch: 311/20099 (1.55%) Loss: 4.839522 LR: 0.00000240 [05:44:30] Epoch: 1 Batch: 312/20099 (1.55%) Loss: 5.036047 LR: 0.00000240 [05:44:33] Epoch: 1 Batch: 313/20099 (1.56%) Loss: 4.877650 LR: 0.00000240 [05:44:36] Epoch: 1 Batch: 314/20099 (1.56%) Loss: 4.930508 LR: 0.00000240 [05:44:39] Epoch: 1 Batch: 315/20099 (1.57%) Loss: 5.141826 LR: 0.00000245 [05:44:42] Epoch: 1 Batch: 316/20099 (1.57%) Loss: 5.309392 LR: 0.00000245 [05:44:45] Epoch: 1 Batch: 317/20099 (1.58%) Loss: 4.945790 LR: 0.00000245 [05:44:48] Epoch: 1 Batch: 318/20099 (1.58%) Loss: 5.153427 LR: 0.00000245 [05:44:51] Epoch: 1 Batch: 319/20099 (1.59%) Loss: 5.296544 LR: 0.00000245 [05:44:55] Epoch: 1 Batch: 320/20099 (1.59%) Loss: 5.330044 LR: 0.00000245 [05:44:58] Epoch: 1 Batch: 
321/20099 (1.60%) Loss: 5.185388 LR: 0.00000245 [05:45:01] Epoch: 1 Batch: 322/20099 (1.60%) Loss: 5.081245 LR: 0.00000251 [05:45:04] Epoch: 1 Batch: 323/20099 (1.61%) Loss: 5.322606 LR: 0.00000251 [05:45:07] Epoch: 1 Batch: 324/20099 (1.61%) Loss: 5.183798 LR: 0.00000251 [05:45:10] Epoch: 1 Batch: 325/20099 (1.62%) Loss: 4.819690 LR: 0.00000251 [05:45:13] Epoch: 1 Batch: 326/20099 (1.62%) Loss: 4.931559 LR: 0.00000251 [05:45:16] Epoch: 1 Batch: 327/20099 (1.63%) Loss: 4.904405 LR: 0.00000251 [05:45:19] Epoch: 1 Batch: 328/20099 (1.63%) Loss: 5.280621 LR: 0.00000251 [05:45:22] Epoch: 1 Batch: 329/20099 (1.64%) Loss: 4.951993 LR: 0.00000256 [05:45:26] Epoch: 1 Batch: 330/20099 (1.64%) Loss: 4.881390 LR: 0.00000256 [05:45:29] Epoch: 1 Batch: 331/20099 (1.65%) Loss: 4.972852 LR: 0.00000256 [05:45:32] Epoch: 1 Batch: 332/20099 (1.65%) Loss: 5.029356 LR: 0.00000256 [05:45:35] Epoch: 1 Batch: 333/20099 (1.66%) Loss: 5.228640 LR: 0.00000256 [05:45:38] Epoch: 1 Batch: 334/20099 (1.66%) Loss: 5.045745 LR: 0.00000256 [05:45:41] Epoch: 1 Batch: 335/20099 (1.67%) Loss: 4.942456 LR: 0.00000256 [05:45:44] Epoch: 1 Batch: 336/20099 (1.67%) Loss: 4.788242 LR: 0.00000262 [05:45:47] Epoch: 1 Batch: 337/20099 (1.68%) Loss: 5.103614 LR: 0.00000262 [05:45:50] Epoch: 1 Batch: 338/20099 (1.68%) Loss: 4.654943 LR: 0.00000262 [05:45:53] Epoch: 1 Batch: 339/20099 (1.69%) Loss: 4.922127 LR: 0.00000262 [05:45:56] Epoch: 1 Batch: 340/20099 (1.69%) Loss: 4.512710 LR: 0.00000262 [05:46:00] Epoch: 1 Batch: 341/20099 (1.70%) Loss: 4.882350 LR: 0.00000262 [05:46:03] Epoch: 1 Batch: 342/20099 (1.70%) Loss: 5.083026 LR: 0.00000262 [05:46:06] Epoch: 1 Batch: 343/20099 (1.71%) Loss: 4.741843 LR: 0.00000267 [05:46:09] Epoch: 1 Batch: 344/20099 (1.71%) Loss: 4.899370 LR: 0.00000267 [05:46:12] Epoch: 1 Batch: 345/20099 (1.72%) Loss: 4.738027 LR: 0.00000267 [05:46:15] Epoch: 1 Batch: 346/20099 (1.72%) Loss: 4.711547 LR: 0.00000267 [05:46:18] Epoch: 1 Batch: 347/20099 (1.73%) Loss: 4.708997 LR: 0.00000267 
[05:46:21] Epoch: 1 Batch: 348/20099 (1.73%) Loss: 4.782561 LR: 0.00000267 [05:46:24] Epoch: 1 Batch: 349/20099 (1.74%) Loss: 4.748567 LR: 0.00000267 [05:46:28] Epoch: 1 Batch: 350/20099 (1.74%) Loss: 4.783836 LR: 0.00000273 [05:46:31] Epoch: 1 Batch: 351/20099 (1.75%) Loss: 5.047218 LR: 0.00000273 [05:46:34] Epoch: 1 Batch: 352/20099 (1.75%) Loss: 4.825919 LR: 0.00000273 [05:46:37] Epoch: 1 Batch: 353/20099 (1.76%) Loss: 4.717120 LR: 0.00000273 [05:46:40] Epoch: 1 Batch: 354/20099 (1.76%) Loss: 4.809447 LR: 0.00000273 [05:46:43] Epoch: 1 Batch: 355/20099 (1.77%) Loss: 4.627695 LR: 0.00000273 [05:46:46] Epoch: 1 Batch: 356/20099 (1.77%) Loss: 4.865065 LR: 0.00000273 [05:46:49] Epoch: 1 Batch: 357/20099 (1.78%) Loss: 4.727468 LR: 0.00000278 [05:46:52] Epoch: 1 Batch: 358/20099 (1.78%) Loss: 4.718815 LR: 0.00000278 [05:46:55] Epoch: 1 Batch: 359/20099 (1.79%) Loss: 4.850678 LR: 0.00000278 [05:46:59] Epoch: 1 Batch: 360/20099 (1.79%) Loss: 4.290783 LR: 0.00000278 [05:47:02] Epoch: 1 Batch: 361/20099 (1.80%) Loss: 4.616312 LR: 0.00000278 [05:47:05] Epoch: 1 Batch: 362/20099 (1.80%) Loss: 4.740779 LR: 0.00000278 [05:47:08] Epoch: 1 Batch: 363/20099 (1.81%) Loss: 4.519742 LR: 0.00000278 [05:47:11] Epoch: 1 Batch: 364/20099 (1.81%) Loss: 4.583066 LR: 0.00000284 [05:47:14] Epoch: 1 Batch: 365/20099 (1.82%) Loss: 4.539240 LR: 0.00000284 [05:47:17] Epoch: 1 Batch: 366/20099 (1.82%) Loss: 4.715234 LR: 0.00000284 [05:47:20] Epoch: 1 Batch: 367/20099 (1.83%) Loss: 4.348344 LR: 0.00000284 [05:47:23] Epoch: 1 Batch: 368/20099 (1.83%) Loss: 4.668050 LR: 0.00000284 [05:47:26] Epoch: 1 Batch: 369/20099 (1.84%) Loss: 4.624444 LR: 0.00000284 [05:47:30] Epoch: 1 Batch: 370/20099 (1.84%) Loss: 4.446140 LR: 0.00000284 [05:47:33] Epoch: 1 Batch: 371/20099 (1.85%) Loss: 4.270877 LR: 0.00000289 [05:47:36] Epoch: 1 Batch: 372/20099 (1.85%) Loss: 4.583157 LR: 0.00000289 [05:47:39] Epoch: 1 Batch: 373/20099 (1.86%) Loss: 4.234210 LR: 0.00000289 [05:47:42] Epoch: 1 Batch: 374/20099 (1.86%) 
Loss: 4.338867 LR: 0.00000289 [05:47:45] Epoch: 1 Batch: 375/20099 (1.87%) Loss: 4.833863 LR: 0.00000289 [05:47:48] Epoch: 1 Batch: 376/20099 (1.87%) Loss: 4.351973 LR: 0.00000289 [05:47:51] Epoch: 1 Batch: 377/20099 (1.88%) Loss: 4.211730 LR: 0.00000289 [05:47:54] Epoch: 1 Batch: 378/20099 (1.88%) Loss: 4.438265 LR: 0.00000295 [05:47:58] Epoch: 1 Batch: 379/20099 (1.89%) Loss: 4.377117 LR: 0.00000295 [05:48:01] Epoch: 1 Batch: 380/20099 (1.89%) Loss: 4.738757 LR: 0.00000295 [05:48:04] Epoch: 1 Batch: 381/20099 (1.90%) Loss: 4.620075 LR: 0.00000295 [05:48:07] Epoch: 1 Batch: 382/20099 (1.90%) Loss: 4.514782 LR: 0.00000295 [05:48:10] Epoch: 1 Batch: 383/20099 (1.91%) Loss: 4.726908 LR: 0.00000295 [05:48:13] Epoch: 1 Batch: 384/20099 (1.91%) Loss: 4.545939 LR: 0.00000295 [05:48:16] Epoch: 1 Batch: 385/20099 (1.92%) Loss: 4.612621 LR: 0.00000300 [05:48:19] Epoch: 1 Batch: 386/20099 (1.92%) Loss: 4.335743 LR: 0.00000300 [05:48:22] Epoch: 1 Batch: 387/20099 (1.93%) Loss: 4.399743 LR: 0.00000300 [05:48:25] Epoch: 1 Batch: 388/20099 (1.93%) Loss: 4.705808 LR: 0.00000300 [05:48:29] Epoch: 1 Batch: 389/20099 (1.94%) Loss: 4.504914 LR: 0.00000300 [05:48:32] Epoch: 1 Batch: 390/20099 (1.94%) Loss: 4.027596 LR: 0.00000300 [05:48:35] Epoch: 1 Batch: 391/20099 (1.95%) Loss: 4.647719 LR: 0.00000300 [05:48:38] Epoch: 1 Batch: 392/20099 (1.95%) Loss: 4.645146 LR: 0.00000305 [05:48:41] Epoch: 1 Batch: 393/20099 (1.96%) Loss: 4.740805 LR: 0.00000305 [05:48:44] Epoch: 1 Batch: 394/20099 (1.96%) Loss: 4.275738 LR: 0.00000305 [05:48:47] Epoch: 1 Batch: 395/20099 (1.97%) Loss: 4.541855 LR: 0.00000305 [05:48:50] Epoch: 1 Batch: 396/20099 (1.97%) Loss: 4.124485 LR: 0.00000305 [05:48:53] Epoch: 1 Batch: 397/20099 (1.98%) Loss: 4.208072 LR: 0.00000305 [05:48:57] Epoch: 1 Batch: 398/20099 (1.98%) Loss: 4.563438 LR: 0.00000305 [05:49:00] Epoch: 1 Batch: 399/20099 (1.99%) Loss: 4.288823 LR: 0.00000311 [05:49:06] >> Temp checkpoint saved: epoch1_step400, size: 0.1693 GB [05:49:06] Epoch: 1 
Batch: 400/20099 (1.99%) Loss: 4.443245 LR: 0.00000311 [05:49:10] Epoch: 1 Batch: 401/20099 (2.00%) Loss: 4.263463 LR: 0.00000311 [05:49:13] Epoch: 1 Batch: 402/20099 (2.00%) Loss: 4.325703 LR: 0.00000311 [05:49:16] Epoch: 1 Batch: 403/20099 (2.01%) Loss: 4.370763 LR: 0.00000311 [05:49:19] Epoch: 1 Batch: 404/20099 (2.01%) Loss: 4.374426 LR: 0.00000311 [05:49:22] Epoch: 1 Batch: 405/20099 (2.02%) Loss: 4.432939 LR: 0.00000311 [05:49:25] Epoch: 1 Batch: 406/20099 (2.02%) Loss: 4.191067 LR: 0.00000316 [05:49:28] Epoch: 1 Batch: 407/20099 (2.02%) Loss: 4.505256 LR: 0.00000316 [05:49:31] Epoch: 1 Batch: 408/20099 (2.03%) Loss: 4.575447 LR: 0.00000316 [05:49:34] Epoch: 1 Batch: 409/20099 (2.03%) Loss: 4.091455 LR: 0.00000316 [05:49:38] Epoch: 1 Batch: 410/20099 (2.04%) Loss: 4.182844 LR: 0.00000316 [05:49:41] Epoch: 1 Batch: 411/20099 (2.04%) Loss: 4.224022 LR: 0.00000316 [05:49:44] Epoch: 1 Batch: 412/20099 (2.05%) Loss: 4.192635 LR: 0.00000316 [05:49:47] Epoch: 1 Batch: 413/20099 (2.05%) Loss: 4.140214 LR: 0.00000322 [05:49:50] Epoch: 1 Batch: 414/20099 (2.06%) Loss: 4.216347 LR: 0.00000322 [05:49:53] Epoch: 1 Batch: 415/20099 (2.06%) Loss: 4.621737 LR: 0.00000322 [05:49:56] Epoch: 1 Batch: 416/20099 (2.07%) Loss: 4.252255 LR: 0.00000322 [05:49:59] Epoch: 1 Batch: 417/20099 (2.07%) Loss: 4.198958 LR: 0.00000322 [05:50:03] Epoch: 1 Batch: 418/20099 (2.08%) Loss: 4.394632 LR: 0.00000322 [05:50:06] Epoch: 1 Batch: 419/20099 (2.08%) Loss: 4.236249 LR: 0.00000322 [05:50:09] Epoch: 1 Batch: 420/20099 (2.09%) Loss: 4.355175 LR: 0.00000327 [05:50:12] Epoch: 1 Batch: 421/20099 (2.09%) Loss: 4.291812 LR: 0.00000327 [05:50:15] Epoch: 1 Batch: 422/20099 (2.10%) Loss: 4.109209 LR: 0.00000327 [05:50:18] Epoch: 1 Batch: 423/20099 (2.10%) Loss: 4.256128 LR: 0.00000327 [05:50:21] Epoch: 1 Batch: 424/20099 (2.11%) Loss: 4.256746 LR: 0.00000327 [05:50:24] Epoch: 1 Batch: 425/20099 (2.11%) Loss: 4.010420 LR: 0.00000327 [05:50:27] Epoch: 1 Batch: 426/20099 (2.12%) Loss: 4.185977 LR: 
0.00000327 [05:50:30] Epoch: 1 Batch: 427/20099 (2.12%) Loss: 3.734584 LR: 0.00000333 [05:50:33] Epoch: 1 Batch: 428/20099 (2.13%) Loss: 4.120882 LR: 0.00000333 [05:50:37] Epoch: 1 Batch: 429/20099 (2.13%) Loss: 4.035900 LR: 0.00000333 [05:50:40] Epoch: 1 Batch: 430/20099 (2.14%) Loss: 4.096785 LR: 0.00000333 [05:50:43] Epoch: 1 Batch: 431/20099 (2.14%) Loss: 4.371582 LR: 0.00000333 [05:50:46] Epoch: 1 Batch: 432/20099 (2.15%) Loss: 4.406343 LR: 0.00000333 [05:50:49] Epoch: 1 Batch: 433/20099 (2.15%) Loss: 4.217954 LR: 0.00000333 [05:50:52] Epoch: 1 Batch: 434/20099 (2.16%) Loss: 4.006398 LR: 0.00000338 [05:50:55] Epoch: 1 Batch: 435/20099 (2.16%) Loss: 4.312137 LR: 0.00000338 [05:50:58] Epoch: 1 Batch: 436/20099 (2.17%) Loss: 3.935061 LR: 0.00000338 [05:51:01] Epoch: 1 Batch: 437/20099 (2.17%) Loss: 3.984499 LR: 0.00000338 [05:51:04] Epoch: 1 Batch: 438/20099 (2.18%) Loss: 4.066708 LR: 0.00000338 [05:51:08] Epoch: 1 Batch: 439/20099 (2.18%) Loss: 4.183898 LR: 0.00000338 [05:51:11] Epoch: 1 Batch: 440/20099 (2.19%) Loss: 3.720214 LR: 0.00000338 [05:51:14] Epoch: 1 Batch: 441/20099 (2.19%) Loss: 3.827372 LR: 0.00000344 [05:51:17] Epoch: 1 Batch: 442/20099 (2.20%) Loss: 4.160211 LR: 0.00000344 [05:51:20] Epoch: 1 Batch: 443/20099 (2.20%) Loss: 4.374070 LR: 0.00000344 [05:51:23] Epoch: 1 Batch: 444/20099 (2.21%) Loss: 3.992518 LR: 0.00000344 [05:51:26] Epoch: 1 Batch: 445/20099 (2.21%) Loss: 4.103231 LR: 0.00000344 [05:51:29] Epoch: 1 Batch: 446/20099 (2.22%) Loss: 3.821749 LR: 0.00000344 [05:51:32] Epoch: 1 Batch: 447/20099 (2.22%) Loss: 3.976046 LR: 0.00000344 [05:51:35] Epoch: 1 Batch: 448/20099 (2.23%) Loss: 4.146168 LR: 0.00000349 [05:51:38] Epoch: 1 Batch: 449/20099 (2.23%) Loss: 4.201299 LR: 0.00000349 [05:51:42] Epoch: 1 Batch: 450/20099 (2.24%) Loss: 4.064378 LR: 0.00000349 [05:51:45] Epoch: 1 Batch: 451/20099 (2.24%) Loss: 3.729725 LR: 0.00000349 [05:51:48] Epoch: 1 Batch: 452/20099 (2.25%) Loss: 3.675411 LR: 0.00000349 [05:51:51] Epoch: 1 Batch: 453/20099 
(2.25%) Loss: 4.076932 LR: 0.00000349 [05:51:54] Epoch: 1 Batch: 454/20099 (2.26%) Loss: 4.141027 LR: 0.00000349 [05:51:57] Epoch: 1 Batch: 455/20099 (2.26%) Loss: 4.166919 LR: 0.00000355 [05:52:00] Epoch: 1 Batch: 456/20099 (2.27%) Loss: 3.875535 LR: 0.00000355 [05:52:03] Epoch: 1 Batch: 457/20099 (2.27%) Loss: 4.099027 LR: 0.00000355 [05:52:06] Epoch: 1 Batch: 458/20099 (2.28%) Loss: 3.543368 LR: 0.00000355 [05:52:09] Epoch: 1 Batch: 459/20099 (2.28%) Loss: 3.953135 LR: 0.00000355 [05:52:13] Epoch: 1 Batch: 460/20099 (2.29%) Loss: 4.167485 LR: 0.00000355 [05:52:16] Epoch: 1 Batch: 461/20099 (2.29%) Loss: 4.143532 LR: 0.00000355 [05:52:19] Epoch: 1 Batch: 462/20099 (2.30%) Loss: 3.822133 LR: 0.00000360 [05:52:22] Epoch: 1 Batch: 463/20099 (2.30%) Loss: 3.708073 LR: 0.00000360 [05:52:25] Epoch: 1 Batch: 464/20099 (2.31%) Loss: 3.909542 LR: 0.00000360 [05:52:28] Epoch: 1 Batch: 465/20099 (2.31%) Loss: 3.827154 LR: 0.00000360 [05:52:31] Epoch: 1 Batch: 466/20099 (2.32%) Loss: 3.733705 LR: 0.00000360 [05:52:34] Epoch: 1 Batch: 467/20099 (2.32%) Loss: 3.790383 LR: 0.00000360 [05:52:37] Epoch: 1 Batch: 468/20099 (2.33%) Loss: 3.943520 LR: 0.00000360 [05:52:41] Epoch: 1 Batch: 469/20099 (2.33%) Loss: 3.766414 LR: 0.00000365 [05:52:44] Epoch: 1 Batch: 470/20099 (2.34%) Loss: 3.857356 LR: 0.00000365 [05:52:47] Epoch: 1 Batch: 471/20099 (2.34%) Loss: 3.848726 LR: 0.00000365 [05:52:50] Epoch: 1 Batch: 472/20099 (2.35%) Loss: 3.865954 LR: 0.00000365 [05:52:53] Epoch: 1 Batch: 473/20099 (2.35%) Loss: 3.860591 LR: 0.00000365 [05:52:56] Epoch: 1 Batch: 474/20099 (2.36%) Loss: 3.921723 LR: 0.00000365 [05:52:59] Epoch: 1 Batch: 475/20099 (2.36%) Loss: 3.815340 LR: 0.00000365 [05:53:02] Epoch: 1 Batch: 476/20099 (2.37%) Loss: 3.966082 LR: 0.00000371 [05:53:05] Epoch: 1 Batch: 477/20099 (2.37%) Loss: 3.753700 LR: 0.00000371 [05:53:08] Epoch: 1 Batch: 478/20099 (2.38%) Loss: 3.675669 LR: 0.00000371 [05:53:12] Epoch: 1 Batch: 479/20099 (2.38%) Loss: 3.459110 LR: 0.00000371 [05:53:15] 
Epoch: 1 Batch: 480/20099 (2.39%) Loss: 4.243398 LR: 0.00000371 [05:53:18] Epoch: 1 Batch: 481/20099 (2.39%) Loss: 3.737950 LR: 0.00000371 [05:53:21] Epoch: 1 Batch: 482/20099 (2.40%) Loss: 4.179872 LR: 0.00000371 [05:53:24] Epoch: 1 Batch: 483/20099 (2.40%) Loss: 3.865076 LR: 0.00000376 [05:53:27] Epoch: 1 Batch: 484/20099 (2.41%) Loss: 3.965529 LR: 0.00000376 [05:53:30] Epoch: 1 Batch: 485/20099 (2.41%) Loss: 3.849352 LR: 0.00000376 [05:53:33] Epoch: 1 Batch: 486/20099 (2.42%) Loss: 3.759498 LR: 0.00000376 [05:53:36] Epoch: 1 Batch: 487/20099 (2.42%) Loss: 3.589682 LR: 0.00000376 [05:53:39] Epoch: 1 Batch: 488/20099 (2.43%) Loss: 3.498409 LR: 0.00000376 [05:53:42] Epoch: 1 Batch: 489/20099 (2.43%) Loss: 3.753106 LR: 0.00000376 [05:53:46] Epoch: 1 Batch: 490/20099 (2.44%) Loss: 3.818756 LR: 0.00000382 [05:53:49] Epoch: 1 Batch: 491/20099 (2.44%) Loss: 3.818244 LR: 0.00000382 [05:53:52] Epoch: 1 Batch: 492/20099 (2.45%) Loss: 3.464826 LR: 0.00000382 [05:53:55] Epoch: 1 Batch: 493/20099 (2.45%) Loss: 3.757796 LR: 0.00000382 [05:53:58] Epoch: 1 Batch: 494/20099 (2.46%) Loss: 3.976522 LR: 0.00000382 [05:54:01] Epoch: 1 Batch: 495/20099 (2.46%) Loss: 3.548073 LR: 0.00000382 [05:54:04] Epoch: 1 Batch: 496/20099 (2.47%) Loss: 3.892557 LR: 0.00000382 [05:54:07] Epoch: 1 Batch: 497/20099 (2.47%) Loss: 3.668852 LR: 0.00000387 [05:54:10] Epoch: 1 Batch: 498/20099 (2.48%) Loss: 3.730822 LR: 0.00000387 [05:54:13] Epoch: 1 Batch: 499/20099 (2.48%) Loss: 3.480257 LR: 0.00000387 [05:54:17] >> Evaluating batch 0 [05:54:18] >> Evaluating batch 1 [05:54:19] >> Evaluating batch 2 [05:54:21] >> Evaluating batch 3 [05:54:22] >> Evaluating batch 4 [05:54:23] >> Evaluating batch 5 [05:54:24] >> Evaluating batch 6 [05:54:26] >> Evaluating batch 7 [05:54:27] >> Evaluating batch 8 [05:54:28] >> Evaluating batch 9 [05:54:29] >> Evaluating batch 10 [05:54:30] >> Evaluating batch 11 [05:54:32] >> Evaluating batch 12 [05:54:33] >> Evaluating batch 13 [05:54:34] >> Evaluating batch 14 [05:54:35] 
>> Evaluating batch 15 [05:54:36] >> Evaluating batch 16 [05:54:37] Epoch: 1 Step: 500/20099 Evaluation: [05:54:37] Avg Loss Since Last Eval: 5.5317 Val Loss: 3.7841 Validation loss delta: 3.7841 Perplexity: 43.9965 LR: 0.00000387 [05:54:41] >> Checkpoint saved: epoch1_step500, size: 0.1693 GB [05:54:41] Epoch: 1 Batch: 500/20099 (2.49%) Loss: 3.659668 LR: 0.00000387 [05:54:44] Epoch: 1 Batch: 501/20099 (2.49%) Loss: 3.431229 LR: 0.00000387 [05:54:47] Epoch: 1 Batch: 502/20099 (2.50%) Loss: 3.719763 LR: 0.00000387 [05:54:50] Epoch: 1 Batch: 503/20099 (2.50%) Loss: 3.788880 LR: 0.00000387 [05:54:53] Epoch: 1 Batch: 504/20099 (2.51%) Loss: 3.466397 LR: 0.00000393 [05:54:56] Epoch: 1 Batch: 505/20099 (2.51%) Loss: 3.665978 LR: 0.00000393 [05:54:59] Epoch: 1 Batch: 506/20099 (2.52%) Loss: 4.039702 LR: 0.00000393 [05:55:02] Epoch: 1 Batch: 507/20099 (2.52%) Loss: 3.902894 LR: 0.00000393 [05:55:06] Epoch: 1 Batch: 508/20099 (2.53%) Loss: 3.589540 LR: 0.00000393 [05:55:09] Epoch: 1 Batch: 509/20099 (2.53%) Loss: 3.595153 LR: 0.00000393 [05:55:12] Epoch: 1 Batch: 510/20099 (2.54%) Loss: 3.821929 LR: 0.00000393 [05:55:15] Epoch: 1 Batch: 511/20099 (2.54%) Loss: 3.638913 LR: 0.00000398 [05:55:18] Epoch: 1 Batch: 512/20099 (2.55%) Loss: 3.772271 LR: 0.00000398 [05:55:21] Epoch: 1 Batch: 513/20099 (2.55%) Loss: 3.613601 LR: 0.00000398 [05:55:24] Epoch: 1 Batch: 514/20099 (2.56%) Loss: 3.359339 LR: 0.00000398 [05:55:28] Epoch: 1 Batch: 515/20099 (2.56%) Loss: 3.825066 LR: 0.00000398 [05:55:31] Epoch: 1 Batch: 516/20099 (2.57%) Loss: 3.490818 LR: 0.00000398 [05:55:34] Epoch: 1 Batch: 517/20099 (2.57%) Loss: 3.395434 LR: 0.00000398 [05:55:37] Epoch: 1 Batch: 518/20099 (2.58%) Loss: 3.687841 LR: 0.00000404 [05:55:40] Epoch: 1 Batch: 519/20099 (2.58%) Loss: 3.573826 LR: 0.00000404 [05:55:43] Epoch: 1 Batch: 520/20099 (2.59%) Loss: 3.504808 LR: 0.00000404 [05:55:46] Epoch: 1 Batch: 521/20099 (2.59%) Loss: 3.787109 LR: 0.00000404 [05:55:49] Epoch: 1 Batch: 522/20099 (2.60%) Loss: 
3.473059 LR: 0.00000404 [05:55:52] Epoch: 1 Batch: 523/20099 (2.60%) Loss: 3.585751 LR: 0.00000404 [05:55:56] Epoch: 1 Batch: 524/20099 (2.61%) Loss: 3.363418 LR: 0.00000404 [05:55:59] Epoch: 1 Batch: 525/20099 (2.61%) Loss: 3.883132 LR: 0.00000409 [05:56:02] Epoch: 1 Batch: 526/20099 (2.62%) Loss: 3.766091 LR: 0.00000409 [05:56:05] Epoch: 1 Batch: 527/20099 (2.62%) Loss: 3.396730 LR: 0.00000409 [05:56:08] Epoch: 1 Batch: 528/20099 (2.63%) Loss: 3.667768 LR: 0.00000409 [05:56:11] Epoch: 1 Batch: 529/20099 (2.63%) Loss: 3.520267 LR: 0.00000409 [05:56:14] Epoch: 1 Batch: 530/20099 (2.64%) Loss: 3.588992 LR: 0.00000409 [05:56:17] Epoch: 1 Batch: 531/20099 (2.64%) Loss: 3.276070 LR: 0.00000409 [05:56:20] Epoch: 1 Batch: 532/20099 (2.65%) Loss: 3.699071 LR: 0.00000415 [05:56:23] Epoch: 1 Batch: 533/20099 (2.65%) Loss: 3.508219 LR: 0.00000415 [05:56:26] Epoch: 1 Batch: 534/20099 (2.66%) Loss: 3.550948 LR: 0.00000415 [05:56:30] Epoch: 1 Batch: 535/20099 (2.66%) Loss: 3.305432 LR: 0.00000415 [05:56:33] Epoch: 1 Batch: 536/20099 (2.67%) Loss: 3.604021 LR: 0.00000415 [05:56:36] Epoch: 1 Batch: 537/20099 (2.67%) Loss: 3.365321 LR: 0.00000415 [05:56:39] Epoch: 1 Batch: 538/20099 (2.68%) Loss: 3.414226 LR: 0.00000415 [05:56:42] Epoch: 1 Batch: 539/20099 (2.68%) Loss: 3.377056 LR: 0.00000420 [05:56:45] Epoch: 1 Batch: 540/20099 (2.69%) Loss: 3.160712 LR: 0.00000420 [05:56:48] Epoch: 1 Batch: 541/20099 (2.69%) Loss: 3.483096 LR: 0.00000420 [05:56:51] Epoch: 1 Batch: 542/20099 (2.70%) Loss: 3.367512 LR: 0.00000420 [05:56:54] Epoch: 1 Batch: 543/20099 (2.70%) Loss: 3.590770 LR: 0.00000420 [05:56:58] Epoch: 1 Batch: 544/20099 (2.71%) Loss: 3.066118 LR: 0.00000420 [05:57:01] Epoch: 1 Batch: 545/20099 (2.71%) Loss: 3.460900 LR: 0.00000420 [05:57:04] Epoch: 1 Batch: 546/20099 (2.72%) Loss: 3.363395 LR: 0.00000425 [05:57:07] Epoch: 1 Batch: 547/20099 (2.72%) Loss: 3.327603 LR: 0.00000425 [05:57:10] Epoch: 1 Batch: 548/20099 (2.73%) Loss: 3.557308 LR: 0.00000425 [05:57:13] Epoch: 1 
Batch: 549/20099 (2.73%) Loss: 3.242236 LR: 0.00000425 [05:57:16] Epoch: 1 Batch: 550/20099 (2.74%) Loss: 3.535053 LR: 0.00000425 [05:57:19] Epoch: 1 Batch: 551/20099 (2.74%) Loss: 3.438156 LR: 0.00000425 [05:57:22] Epoch: 1 Batch: 552/20099 (2.75%) Loss: 3.394970 LR: 0.00000425 [05:57:26] Epoch: 1 Batch: 553/20099 (2.75%) Loss: 3.225433 LR: 0.00000431 [05:57:29] Epoch: 1 Batch: 554/20099 (2.76%) Loss: 3.360871 LR: 0.00000431 [05:57:32] Epoch: 1 Batch: 555/20099 (2.76%) Loss: 2.998131 LR: 0.00000431 [05:57:35] Epoch: 1 Batch: 556/20099 (2.77%) Loss: 3.220312 LR: 0.00000431 [05:57:38] Epoch: 1 Batch: 557/20099 (2.77%) Loss: 3.574907 LR: 0.00000431 [05:57:41] Epoch: 1 Batch: 558/20099 (2.78%) Loss: 3.541529 LR: 0.00000431 [05:57:44] Epoch: 1 Batch: 559/20099 (2.78%) Loss: 3.331823 LR: 0.00000431 [05:57:47] Epoch: 1 Batch: 560/20099 (2.79%) Loss: 3.114966 LR: 0.00000436 [05:57:50] Epoch: 1 Batch: 561/20099 (2.79%) Loss: 3.313049 LR: 0.00000436 [05:57:54] Epoch: 1 Batch: 562/20099 (2.80%) Loss: 3.215311 LR: 0.00000436 [05:57:57] Epoch: 1 Batch: 563/20099 (2.80%) Loss: 3.518518 LR: 0.00000436 [05:58:00] Epoch: 1 Batch: 564/20099 (2.81%) Loss: 3.285318 LR: 0.00000436 [05:58:03] Epoch: 1 Batch: 565/20099 (2.81%) Loss: 3.470620 LR: 0.00000436 [05:58:06] Epoch: 1 Batch: 566/20099 (2.82%) Loss: 3.350331 LR: 0.00000436 [05:58:09] Epoch: 1 Batch: 567/20099 (2.82%) Loss: 3.215801 LR: 0.00000442 [05:58:12] Epoch: 1 Batch: 568/20099 (2.83%) Loss: 3.126339 LR: 0.00000442 [05:58:15] Epoch: 1 Batch: 569/20099 (2.83%) Loss: 3.139811 LR: 0.00000442 [05:58:18] Epoch: 1 Batch: 570/20099 (2.84%) Loss: 3.255645 LR: 0.00000442 [05:58:21] Epoch: 1 Batch: 571/20099 (2.84%) Loss: 3.393592 LR: 0.00000442 [05:58:25] Epoch: 1 Batch: 572/20099 (2.85%) Loss: 3.235650 LR: 0.00000442 [05:58:28] Epoch: 1 Batch: 573/20099 (2.85%) Loss: 3.407495 LR: 0.00000442 [05:58:31] Epoch: 1 Batch: 574/20099 (2.86%) Loss: 3.135778 LR: 0.00000447 [05:58:34] Epoch: 1 Batch: 575/20099 (2.86%) Loss: 3.040358 LR: 
0.00000447 [05:58:37] Epoch: 1 Batch: 576/20099 (2.87%) Loss: 3.381332 LR: 0.00000447 [05:58:40] Epoch: 1 Batch: 577/20099 (2.87%) Loss: 3.146860 LR: 0.00000447 [05:58:43] Epoch: 1 Batch: 578/20099 (2.88%) Loss: 3.093589 LR: 0.00000447 [05:58:46] Epoch: 1 Batch: 579/20099 (2.88%) Loss: 3.534211 LR: 0.00000447 [05:58:49] Epoch: 1 Batch: 580/20099 (2.89%) Loss: 3.272237 LR: 0.00000447 [05:58:53] Epoch: 1 Batch: 581/20099 (2.89%) Loss: 3.417892 LR: 0.00000453 [05:58:56] Epoch: 1 Batch: 582/20099 (2.90%) Loss: 3.313116 LR: 0.00000453 [05:58:59] Epoch: 1 Batch: 583/20099 (2.90%) Loss: 3.206202 LR: 0.00000453 [05:59:02] Epoch: 1 Batch: 584/20099 (2.91%) Loss: 3.550137 LR: 0.00000453 [05:59:05] Epoch: 1 Batch: 585/20099 (2.91%) Loss: 3.333817 LR: 0.00000453 [05:59:08] Epoch: 1 Batch: 586/20099 (2.92%) Loss: 3.080060 LR: 0.00000453 [05:59:11] Epoch: 1 Batch: 587/20099 (2.92%) Loss: 3.173259 LR: 0.00000453 [05:59:14] Epoch: 1 Batch: 588/20099 (2.93%) Loss: 3.314359 LR: 0.00000458 [05:59:17] Epoch: 1 Batch: 589/20099 (2.93%) Loss: 2.940089 LR: 0.00000458 [05:59:21] Epoch: 1 Batch: 590/20099 (2.94%) Loss: 3.244025 LR: 0.00000458 [05:59:24] Epoch: 1 Batch: 591/20099 (2.94%) Loss: 3.239339 LR: 0.00000458 [05:59:27] Epoch: 1 Batch: 592/20099 (2.95%) Loss: 3.177510 LR: 0.00000458 [05:59:30] Epoch: 1 Batch: 593/20099 (2.95%) Loss: 3.410569 LR: 0.00000458 [05:59:33] Epoch: 1 Batch: 594/20099 (2.96%) Loss: 3.361750 LR: 0.00000458 [05:59:36] Epoch: 1 Batch: 595/20099 (2.96%) Loss: 3.321055 LR: 0.00000464 [05:59:39] Epoch: 1 Batch: 596/20099 (2.97%) Loss: 3.248768 LR: 0.00000464 [05:59:42] Epoch: 1 Batch: 597/20099 (2.97%) Loss: 3.478450 LR: 0.00000464 [05:59:45] Epoch: 1 Batch: 598/20099 (2.98%) Loss: 3.254317 LR: 0.00000464 [05:59:48] Epoch: 1 Batch: 599/20099 (2.98%) Loss: 3.094679 LR: 0.00000464 [05:59:55] >> Temp checkpoint saved: epoch1_step600, size: 0.1693 GB [05:59:55] Epoch: 1 Batch: 600/20099 (2.99%) Loss: 3.077919 LR: 0.00000464 [05:59:58] Epoch: 1 Batch: 601/20099 (2.99%) 
Loss: 3.388025 LR: 0.00000464 [06:00:01] Epoch: 1 Batch: 602/20099 (3.00%) Loss: 3.208430 LR: 0.00000469 [06:00:05] Epoch: 1 Batch: 603/20099 (3.00%) Loss: 3.214555 LR: 0.00000469 [06:00:08] Epoch: 1 Batch: 604/20099 (3.01%) Loss: 3.274143 LR: 0.00000469 [06:00:11] Epoch: 1 Batch: 605/20099 (3.01%) Loss: 3.238697 LR: 0.00000469 [06:00:14] Epoch: 1 Batch: 606/20099 (3.02%) Loss: 3.252021 LR: 0.00000469 [06:00:17] Epoch: 1 Batch: 607/20099 (3.02%) Loss: 3.050976 LR: 0.00000469 [06:00:20] Epoch: 1 Batch: 608/20099 (3.03%) Loss: 3.174529 LR: 0.00000469 [06:00:23] Epoch: 1 Batch: 609/20099 (3.03%) Loss: 2.937750 LR: 0.00000475 [06:00:26] Epoch: 1 Batch: 610/20099 (3.03%) Loss: 3.077062 LR: 0.00000475 [06:00:29] Epoch: 1 Batch: 611/20099 (3.04%) Loss: 3.319816 LR: 0.00000475 [06:00:33] Epoch: 1 Batch: 612/20099 (3.04%) Loss: 2.946783 LR: 0.00000475 [06:00:36] Epoch: 1 Batch: 613/20099 (3.05%) Loss: 3.323491 LR: 0.00000475 [06:00:39] Epoch: 1 Batch: 614/20099 (3.05%) Loss: 3.172926 LR: 0.00000475 [06:00:42] Epoch: 1 Batch: 615/20099 (3.06%) Loss: 3.302462 LR: 0.00000475 [06:00:45] Epoch: 1 Batch: 616/20099 (3.06%) Loss: 3.105341 LR: 0.00000480 [06:00:48] Epoch: 1 Batch: 617/20099 (3.07%) Loss: 3.070046 LR: 0.00000480 [06:00:51] Epoch: 1 Batch: 618/20099 (3.07%) Loss: 3.223153 LR: 0.00000480 [06:00:54] Epoch: 1 Batch: 619/20099 (3.08%) Loss: 2.849803 LR: 0.00000480 [06:00:57] Epoch: 1 Batch: 620/20099 (3.08%) Loss: 3.203351 LR: 0.00000480 [06:01:01] Epoch: 1 Batch: 621/20099 (3.09%) Loss: 3.106402 LR: 0.00000480 [06:01:04] Epoch: 1 Batch: 622/20099 (3.09%) Loss: 3.263969 LR: 0.00000480 [06:01:07] Epoch: 1 Batch: 623/20099 (3.10%) Loss: 3.093187 LR: 0.00000485 [06:01:10] Epoch: 1 Batch: 624/20099 (3.10%) Loss: 3.134565 LR: 0.00000485 [06:01:13] Epoch: 1 Batch: 625/20099 (3.11%) Loss: 2.915504 LR: 0.00000485 [06:01:16] Epoch: 1 Batch: 626/20099 (3.11%) Loss: 3.422499 LR: 0.00000485 [06:01:19] Epoch: 1 Batch: 627/20099 (3.12%) Loss: 3.200302 LR: 0.00000485 [06:01:22] Epoch: 1 
Batch: 628/20099 (3.12%) Loss: 3.059123 LR: 0.00000485 [06:01:25] Epoch: 1 Batch: 629/20099 (3.13%) Loss: 2.926637 LR: 0.00000485 [06:01:28] Epoch: 1 Batch: 630/20099 (3.13%) Loss: 3.096199 LR: 0.00000491 [06:01:32] Epoch: 1 Batch: 631/20099 (3.14%) Loss: 3.423839 LR: 0.00000491 [06:01:35] Epoch: 1 Batch: 632/20099 (3.14%) Loss: 3.104832 LR: 0.00000491 [06:01:38] Epoch: 1 Batch: 633/20099 (3.15%) Loss: 3.059201 LR: 0.00000491 [06:01:41] Epoch: 1 Batch: 634/20099 (3.15%) Loss: 2.907817 LR: 0.00000491 [06:01:44] Epoch: 1 Batch: 635/20099 (3.16%) Loss: 3.105647 LR: 0.00000491 [06:01:47] Epoch: 1 Batch: 636/20099 (3.16%) Loss: 3.200073 LR: 0.00000491 [06:01:50] Epoch: 1 Batch: 637/20099 (3.17%) Loss: 3.079901 LR: 0.00000496 [06:01:53] Epoch: 1 Batch: 638/20099 (3.17%) Loss: 2.781568 LR: 0.00000496 [06:01:56] Epoch: 1 Batch: 639/20099 (3.18%) Loss: 2.994633 LR: 0.00000496 [06:01:59] Epoch: 1 Batch: 640/20099 (3.18%) Loss: 2.662927 LR: 0.00000496 [06:02:03] Epoch: 1 Batch: 641/20099 (3.19%) Loss: 3.026238 LR: 0.00000496 [06:02:06] Epoch: 1 Batch: 642/20099 (3.19%) Loss: 2.726641 LR: 0.00000496 [06:02:09] Epoch: 1 Batch: 643/20099 (3.20%) Loss: 2.948033 LR: 0.00000496 [06:02:12] Epoch: 1 Batch: 644/20099 (3.20%) Loss: 3.041286 LR: 0.00000502 [06:02:15] Epoch: 1 Batch: 645/20099 (3.21%) Loss: 3.011386 LR: 0.00000502 [06:02:18] Epoch: 1 Batch: 646/20099 (3.21%) Loss: 3.173436 LR: 0.00000502 [06:02:21] Epoch: 1 Batch: 647/20099 (3.22%) Loss: 3.107981 LR: 0.00000502 [06:02:24] Epoch: 1 Batch: 648/20099 (3.22%) Loss: 3.088532 LR: 0.00000502 [06:02:27] Epoch: 1 Batch: 649/20099 (3.23%) Loss: 3.018124 LR: 0.00000502 [06:02:31] Epoch: 1 Batch: 650/20099 (3.23%) Loss: 3.039707 LR: 0.00000502 [06:02:34] Epoch: 1 Batch: 651/20099 (3.24%) Loss: 3.113390 LR: 0.00000507 [06:02:37] Epoch: 1 Batch: 652/20099 (3.24%) Loss: 2.892727 LR: 0.00000507 [06:02:40] Epoch: 1 Batch: 653/20099 (3.25%) Loss: 2.966950 LR: 0.00000507 [06:02:43] Epoch: 1 Batch: 654/20099 (3.25%) Loss: 3.125466 LR: 
0.00000507 [06:02:46] Epoch: 1 Batch: 655/20099 (3.26%) Loss: 2.866787 LR: 0.00000507 [06:02:49] Epoch: 1 Batch: 656/20099 (3.26%) Loss: 2.985016 LR: 0.00000507 [06:02:52] Epoch: 1 Batch: 657/20099 (3.27%) Loss: 3.099442 LR: 0.00000507 [06:02:55] Epoch: 1 Batch: 658/20099 (3.27%) Loss: 2.795891 LR: 0.00000513 [06:02:59] Epoch: 1 Batch: 659/20099 (3.28%) Loss: 3.027641 LR: 0.00000513 [06:03:02] Epoch: 1 Batch: 660/20099 (3.28%) Loss: 3.005656 LR: 0.00000513 [06:03:05] Epoch: 1 Batch: 661/20099 (3.29%) Loss: 2.552632 LR: 0.00000513 [06:03:08] Epoch: 1 Batch: 662/20099 (3.29%) Loss: 2.904753 LR: 0.00000513 [06:03:11] Epoch: 1 Batch: 663/20099 (3.30%) Loss: 2.797255 LR: 0.00000513 [06:03:14] Epoch: 1 Batch: 664/20099 (3.30%) Loss: 2.999705 LR: 0.00000513 [06:03:17] Epoch: 1 Batch: 665/20099 (3.31%) Loss: 2.763514 LR: 0.00000518 [06:03:20] Epoch: 1 Batch: 666/20099 (3.31%) Loss: 2.862915 LR: 0.00000518 [06:03:23] Epoch: 1 Batch: 667/20099 (3.32%) Loss: 2.854605 LR: 0.00000518 [06:03:26] Epoch: 1 Batch: 668/20099 (3.32%) Loss: 2.644818 LR: 0.00000518 [06:03:30] Epoch: 1 Batch: 669/20099 (3.33%) Loss: 2.947203 LR: 0.00000518 [06:03:33] Epoch: 1 Batch: 670/20099 (3.33%) Loss: 3.058627 LR: 0.00000518 [06:03:36] Epoch: 1 Batch: 671/20099 (3.34%) Loss: 2.843078 LR: 0.00000518 [06:03:39] Epoch: 1 Batch: 672/20099 (3.34%) Loss: 2.764021 LR: 0.00000524 [06:03:42] Epoch: 1 Batch: 673/20099 (3.35%) Loss: 3.055920 LR: 0.00000524 [06:03:45] Epoch: 1 Batch: 674/20099 (3.35%) Loss: 2.888914 LR: 0.00000524 [06:03:48] Epoch: 1 Batch: 675/20099 (3.36%) Loss: 2.999079 LR: 0.00000524 [06:03:51] Epoch: 1 Batch: 676/20099 (3.36%) Loss: 2.743417 LR: 0.00000524 [06:03:54] Epoch: 1 Batch: 677/20099 (3.37%) Loss: 2.781025 LR: 0.00000524 [06:03:57] Epoch: 1 Batch: 678/20099 (3.37%) Loss: 3.001202 LR: 0.00000524 [06:04:01] Epoch: 1 Batch: 679/20099 (3.38%) Loss: 3.256280 LR: 0.00000529 [06:04:04] Epoch: 1 Batch: 680/20099 (3.38%) Loss: 3.016867 LR: 0.00000529 [06:04:07] Epoch: 1 Batch: 681/20099 
(3.39%) Loss: 3.040658 LR: 0.00000529 [06:04:10] Epoch: 1 Batch: 682/20099 (3.39%) Loss: 2.886739 LR: 0.00000529 [06:04:13] Epoch: 1 Batch: 683/20099 (3.40%) Loss: 3.096670 LR: 0.00000529 [06:04:16] Epoch: 1 Batch: 684/20099 (3.40%) Loss: 3.169854 LR: 0.00000529 [06:04:19] Epoch: 1 Batch: 685/20099 (3.41%) Loss: 2.877370 LR: 0.00000529 [06:04:22] Epoch: 1 Batch: 686/20099 (3.41%) Loss: 2.966127 LR: 0.00000535 [06:04:25] Epoch: 1 Batch: 687/20099 (3.42%) Loss: 2.683146 LR: 0.00000535 [06:04:28] Epoch: 1 Batch: 688/20099 (3.42%) Loss: 2.782723 LR: 0.00000535 [06:04:31] Epoch: 1 Batch: 689/20099 (3.43%) Loss: 3.031456 LR: 0.00000535 [06:04:35] Epoch: 1 Batch: 690/20099 (3.43%) Loss: 2.642422 LR: 0.00000535 [06:04:38] Epoch: 1 Batch: 691/20099 (3.44%) Loss: 2.506506 LR: 0.00000535 [06:04:41] Epoch: 1 Batch: 692/20099 (3.44%) Loss: 2.810854 LR: 0.00000535 [06:04:44] Epoch: 1 Batch: 693/20099 (3.45%) Loss: 2.898553 LR: 0.00000540 [06:04:47] Epoch: 1 Batch: 694/20099 (3.45%) Loss: 2.893824 LR: 0.00000540 [06:04:50] Epoch: 1 Batch: 695/20099 (3.46%) Loss: 2.800242 LR: 0.00000540 [06:04:53] Epoch: 1 Batch: 696/20099 (3.46%) Loss: 2.776359 LR: 0.00000540 [06:04:56] Epoch: 1 Batch: 697/20099 (3.47%) Loss: 2.696438 LR: 0.00000540 [06:04:59] Epoch: 1 Batch: 698/20099 (3.47%) Loss: 2.688920 LR: 0.00000540 [06:05:02] Epoch: 1 Batch: 699/20099 (3.48%) Loss: 2.761857 LR: 0.00000540 [06:05:06] Epoch: 1 Batch: 700/20099 (3.48%) Loss: 2.526166 LR: 0.00000545 [06:05:09] Epoch: 1 Batch: 701/20099 (3.49%) Loss: 2.949073 LR: 0.00000545 [06:05:12] Epoch: 1 Batch: 702/20099 (3.49%) Loss: 3.025122 LR: 0.00000545 [06:05:15] Epoch: 1 Batch: 703/20099 (3.50%) Loss: 2.769567 LR: 0.00000545 [06:05:18] Epoch: 1 Batch: 704/20099 (3.50%) Loss: 2.675805 LR: 0.00000545 [06:05:21] Epoch: 1 Batch: 705/20099 (3.51%) Loss: 2.773868 LR: 0.00000545 [06:05:24] Epoch: 1 Batch: 706/20099 (3.51%) Loss: 2.931582 LR: 0.00000545 [06:05:27] Epoch: 1 Batch: 707/20099 (3.52%) Loss: 2.909219 LR: 0.00000551 [06:05:30] 
Epoch: 1 Batch: 708/20099 (3.52%) Loss: 2.736878 LR: 0.00000551 [06:05:33] Epoch: 1 Batch: 709/20099 (3.53%) Loss: 2.723336 LR: 0.00000551 [06:05:36] Epoch: 1 Batch: 710/20099 (3.53%) Loss: 2.804015 LR: 0.00000551 [06:05:40] Epoch: 1 Batch: 711/20099 (3.54%) Loss: 2.915295 LR: 0.00000551 [06:05:43] Epoch: 1 Batch: 712/20099 (3.54%) Loss: 3.158375 LR: 0.00000551 [06:05:46] Epoch: 1 Batch: 713/20099 (3.55%) Loss: 3.050549 LR: 0.00000551 [06:05:49] Epoch: 1 Batch: 714/20099 (3.55%) Loss: 2.436827 LR: 0.00000556 [06:05:52] Epoch: 1 Batch: 715/20099 (3.56%) Loss: 2.664726 LR: 0.00000556 [06:05:55] Epoch: 1 Batch: 716/20099 (3.56%) Loss: 2.712883 LR: 0.00000556 [06:05:58] Epoch: 1 Batch: 717/20099 (3.57%) Loss: 2.872016 LR: 0.00000556 [06:06:01] Epoch: 1 Batch: 718/20099 (3.57%) Loss: 2.960576 LR: 0.00000556 [06:06:04] Epoch: 1 Batch: 719/20099 (3.58%) Loss: 2.906193 LR: 0.00000556 [06:06:07] Epoch: 1 Batch: 720/20099 (3.58%) Loss: 2.900452 LR: 0.00000556 [06:06:11] Epoch: 1 Batch: 721/20099 (3.59%) Loss: 2.983211 LR: 0.00000562 [06:06:14] Epoch: 1 Batch: 722/20099 (3.59%) Loss: 2.976666 LR: 0.00000562 [06:06:17] Epoch: 1 Batch: 723/20099 (3.60%) Loss: 2.955793 LR: 0.00000562 [06:06:20] Epoch: 1 Batch: 724/20099 (3.60%) Loss: 2.357826 LR: 0.00000562 [06:06:23] Epoch: 1 Batch: 725/20099 (3.61%) Loss: 2.686409 LR: 0.00000562 [06:06:26] Epoch: 1 Batch: 726/20099 (3.61%) Loss: 2.638599 LR: 0.00000562 [06:06:29] Epoch: 1 Batch: 727/20099 (3.62%) Loss: 2.642083 LR: 0.00000562 [06:06:32] Epoch: 1 Batch: 728/20099 (3.62%) Loss: 2.543625 LR: 0.00000567 [06:06:35] Epoch: 1 Batch: 729/20099 (3.63%) Loss: 2.819079 LR: 0.00000567 [06:06:38] Epoch: 1 Batch: 730/20099 (3.63%) Loss: 2.456854 LR: 0.00000567 [06:06:42] Epoch: 1 Batch: 731/20099 (3.64%) Loss: 2.797595 LR: 0.00000567 [06:06:45] Epoch: 1 Batch: 732/20099 (3.64%) Loss: 2.716734 LR: 0.00000567 [06:06:48] Epoch: 1 Batch: 733/20099 (3.65%) Loss: 2.592807 LR: 0.00000567 [06:06:51] Epoch: 1 Batch: 734/20099 (3.65%) Loss: 2.724958 
LR: 0.00000567 [06:06:54] Epoch: 1 Batch: 735/20099 (3.66%) Loss: 3.005465 LR: 0.00000573 [06:06:57] Epoch: 1 Batch: 736/20099 (3.66%) Loss: 2.725872 LR: 0.00000573 [06:07:00] Epoch: 1 Batch: 737/20099 (3.67%) Loss: 3.063807 LR: 0.00000573 [06:07:03] Epoch: 1 Batch: 738/20099 (3.67%) Loss: 2.776735 LR: 0.00000573 [06:07:06] Epoch: 1 Batch: 739/20099 (3.68%) Loss: 3.007910 LR: 0.00000573 [06:07:09] Epoch: 1 Batch: 740/20099 (3.68%) Loss: 2.573530 LR: 0.00000573 [06:07:13] Epoch: 1 Batch: 741/20099 (3.69%) Loss: 2.598355 LR: 0.00000573 [06:07:16] Epoch: 1 Batch: 742/20099 (3.69%) Loss: 2.466495 LR: 0.00000578 [06:07:19] Epoch: 1 Batch: 743/20099 (3.70%) Loss: 3.053300 LR: 0.00000578 [06:07:22] Epoch: 1 Batch: 744/20099 (3.70%) Loss: 2.677148 LR: 0.00000578 [06:07:25] Epoch: 1 Batch: 745/20099 (3.71%) Loss: 2.579324 LR: 0.00000578 [06:07:28] Epoch: 1 Batch: 746/20099 (3.71%) Loss: 2.494274 LR: 0.00000578 [06:07:31] Epoch: 1 Batch: 747/20099 (3.72%) Loss: 2.617231 LR: 0.00000578 [06:07:34] Epoch: 1 Batch: 748/20099 (3.72%) Loss: 2.687482 LR: 0.00000578 [06:07:37] Epoch: 1 Batch: 749/20099 (3.73%) Loss: 2.780354 LR: 0.00000584 [06:07:40] Epoch: 1 Batch: 750/20099 (3.73%) Loss: 2.774773 LR: 0.00000584 [06:07:44] Epoch: 1 Batch: 751/20099 (3.74%) Loss: 2.791642 LR: 0.00000584 [06:07:47] Epoch: 1 Batch: 752/20099 (3.74%) Loss: 2.910693 LR: 0.00000584 [06:07:50] Epoch: 1 Batch: 753/20099 (3.75%) Loss: 2.732524 LR: 0.00000584 [06:07:53] Epoch: 1 Batch: 754/20099 (3.75%) Loss: 2.674719 LR: 0.00000584 [06:07:56] Epoch: 1 Batch: 755/20099 (3.76%) Loss: 2.846109 LR: 0.00000584 [06:07:59] Epoch: 1 Batch: 756/20099 (3.76%) Loss: 2.975952 LR: 0.00000589 [06:08:02] Epoch: 1 Batch: 757/20099 (3.77%) Loss: 2.805113 LR: 0.00000589 [06:08:05] Epoch: 1 Batch: 758/20099 (3.77%) Loss: 2.681490 LR: 0.00000589 [06:08:08] Epoch: 1 Batch: 759/20099 (3.78%) Loss: 2.692689 LR: 0.00000589 [06:08:11] Epoch: 1 Batch: 760/20099 (3.78%) Loss: 2.645676 LR: 0.00000589 [06:08:15] Epoch: 1 Batch: 
761/20099 (3.79%) Loss: 2.765140 LR: 0.00000589 [06:08:18] Epoch: 1 Batch: 762/20099 (3.79%) Loss: 2.692878 LR: 0.00000589 [06:08:21] Epoch: 1 Batch: 763/20099 (3.80%) Loss: 3.188788 LR: 0.00000595 [06:08:24] Epoch: 1 Batch: 764/20099 (3.80%) Loss: 2.714193 LR: 0.00000595 [06:08:27] Epoch: 1 Batch: 765/20099 (3.81%) Loss: 2.796363 LR: 0.00000595 [06:08:30] Epoch: 1 Batch: 766/20099 (3.81%) Loss: 2.631731 LR: 0.00000595 [06:08:33] Epoch: 1 Batch: 767/20099 (3.82%) Loss: 2.876368 LR: 0.00000595 [06:08:36] Epoch: 1 Batch: 768/20099 (3.82%) Loss: 2.613647 LR: 0.00000595 [06:08:39] Epoch: 1 Batch: 769/20099 (3.83%) Loss: 2.565985 LR: 0.00000595 [06:08:42] Epoch: 1 Batch: 770/20099 (3.83%) Loss: 2.686233 LR: 0.00000600 [06:08:46] Epoch: 1 Batch: 771/20099 (3.84%) Loss: 2.977858 LR: 0.00000600 [06:08:49] Epoch: 1 Batch: 772/20099 (3.84%) Loss: 2.741403 LR: 0.00000600 [06:08:52] Epoch: 1 Batch: 773/20099 (3.85%) Loss: 2.547160 LR: 0.00000600 [06:08:55] Epoch: 1 Batch: 774/20099 (3.85%) Loss: 2.780986 LR: 0.00000600 [06:08:58] Epoch: 1 Batch: 775/20099 (3.86%) Loss: 2.665013 LR: 0.00000600 [06:09:01] Epoch: 1 Batch: 776/20099 (3.86%) Loss: 2.602629 LR: 0.00000600 [06:09:04] Epoch: 1 Batch: 777/20099 (3.87%) Loss: 2.866360 LR: 0.00000605 [06:09:07] Epoch: 1 Batch: 778/20099 (3.87%) Loss: 2.461978 LR: 0.00000605 [06:09:10] Epoch: 1 Batch: 779/20099 (3.88%) Loss: 2.697537 LR: 0.00000605 [06:09:13] Epoch: 1 Batch: 780/20099 (3.88%) Loss: 2.482233 LR: 0.00000605 [06:09:17] Epoch: 1 Batch: 781/20099 (3.89%) Loss: 2.558990 LR: 0.00000605 [06:09:20] Epoch: 1 Batch: 782/20099 (3.89%) Loss: 2.437068 LR: 0.00000605 [06:09:23] Epoch: 1 Batch: 783/20099 (3.90%) Loss: 2.735448 LR: 0.00000605 [06:09:26] Epoch: 1 Batch: 784/20099 (3.90%) Loss: 2.429077 LR: 0.00000611 [06:09:29] Epoch: 1 Batch: 785/20099 (3.91%) Loss: 2.583948 LR: 0.00000611 [06:09:32] Epoch: 1 Batch: 786/20099 (3.91%) Loss: 2.514871 LR: 0.00000611 [06:09:35] Epoch: 1 Batch: 787/20099 (3.92%) Loss: 2.677977 LR: 0.00000611 
[06:09:38] Epoch: 1 Batch: 788/20099 (3.92%) Loss: 2.808684 LR: 0.00000611 [06:09:41] Epoch: 1 Batch: 789/20099 (3.93%) Loss: 2.763461 LR: 0.00000611 [06:09:44] Epoch: 1 Batch: 790/20099 (3.93%) Loss: 2.608593 LR: 0.00000611 [06:09:48] Epoch: 1 Batch: 791/20099 (3.94%) Loss: 2.535972 LR: 0.00000616 [06:09:51] Epoch: 1 Batch: 792/20099 (3.94%) Loss: 2.715879 LR: 0.00000616 [06:09:54] Epoch: 1 Batch: 793/20099 (3.95%) Loss: 2.616222 LR: 0.00000616 [06:09:57] Epoch: 1 Batch: 794/20099 (3.95%) Loss: 2.713726 LR: 0.00000616 [06:10:00] Epoch: 1 Batch: 795/20099 (3.96%) Loss: 2.650722 LR: 0.00000616 [06:10:03] Epoch: 1 Batch: 796/20099 (3.96%) Loss: 2.865614 LR: 0.00000616 [06:10:06] Epoch: 1 Batch: 797/20099 (3.97%) Loss: 2.704346 LR: 0.00000616 [06:10:09] Epoch: 1 Batch: 798/20099 (3.97%) Loss: 2.617279 LR: 0.00000622 [06:10:12] Epoch: 1 Batch: 799/20099 (3.98%) Loss: 2.568028 LR: 0.00000622 [06:10:19] >> Temp checkpoint saved: epoch1_step800, size: 0.1693 GB [06:10:19] Epoch: 1 Batch: 800/20099 (3.98%) Loss: 2.402861 LR: 0.00000622 [06:10:22] Epoch: 1 Batch: 801/20099 (3.99%) Loss: 2.842114 LR: 0.00000622 [06:10:25] Epoch: 1 Batch: 802/20099 (3.99%) Loss: 2.828298 LR: 0.00000622 [06:10:28] Epoch: 1 Batch: 803/20099 (4.00%) Loss: 2.641461 LR: 0.00000622 [06:10:32] Epoch: 1 Batch: 804/20099 (4.00%) Loss: 2.578912 LR: 0.00000622 [06:10:35] Epoch: 1 Batch: 805/20099 (4.01%) Loss: 2.651098 LR: 0.00000627 [06:10:38] Epoch: 1 Batch: 806/20099 (4.01%) Loss: 2.744098 LR: 0.00000627 [06:10:41] Epoch: 1 Batch: 807/20099 (4.02%) Loss: 2.703344 LR: 0.00000627 [06:10:44] Epoch: 1 Batch: 808/20099 (4.02%) Loss: 2.505684 LR: 0.00000627 [06:10:47] Epoch: 1 Batch: 809/20099 (4.03%) Loss: 2.504888 LR: 0.00000627 [06:10:50] Epoch: 1 Batch: 810/20099 (4.03%) Loss: 2.287298 LR: 0.00000627 [06:10:53] Epoch: 1 Batch: 811/20099 (4.04%) Loss: 2.504086 LR: 0.00000627 [06:10:57] Epoch: 1 Batch: 812/20099 (4.04%) Loss: 2.532554 LR: 0.00000633 [06:11:00] Epoch: 1 Batch: 813/20099 (4.04%) Loss: 
2.721575 LR: 0.00000633 [06:11:03] Epoch: 1 Batch: 814/20099 (4.05%) Loss: 2.276974 LR: 0.00000633 [06:11:06] Epoch: 1 Batch: 815/20099 (4.05%) Loss: 2.621229 LR: 0.00000633 [06:11:09] Epoch: 1 Batch: 816/20099 (4.06%) Loss: 2.589428 LR: 0.00000633 [06:11:12] Epoch: 1 Batch: 817/20099 (4.06%) Loss: 2.452340 LR: 0.00000633 [06:11:15] Epoch: 1 Batch: 818/20099 (4.07%) Loss: 2.821325 LR: 0.00000633 [06:11:18] Epoch: 1 Batch: 819/20099 (4.07%) Loss: 2.840100 LR: 0.00000638 [06:11:21] Epoch: 1 Batch: 820/20099 (4.08%) Loss: 2.325849 LR: 0.00000638 [06:11:24] Epoch: 1 Batch: 821/20099 (4.08%) Loss: 2.388970 LR: 0.00000638 [06:11:28] Epoch: 1 Batch: 822/20099 (4.09%) Loss: 2.550123 LR: 0.00000638 [06:11:31] Epoch: 1 Batch: 823/20099 (4.09%) Loss: 2.782383 LR: 0.00000638 [06:11:34] Epoch: 1 Batch: 824/20099 (4.10%) Loss: 2.451442 LR: 0.00000638 [06:11:37] Epoch: 1 Batch: 825/20099 (4.10%) Loss: 2.583817 LR: 0.00000638 [06:11:40] Epoch: 1 Batch: 826/20099 (4.11%) Loss: 2.412059 LR: 0.00000644 [06:11:43] Epoch: 1 Batch: 827/20099 (4.11%) Loss: 2.490955 LR: 0.00000644 [06:11:46] Epoch: 1 Batch: 828/20099 (4.12%) Loss: 2.698726 LR: 0.00000644 [06:11:49] Epoch: 1 Batch: 829/20099 (4.12%) Loss: 2.521588 LR: 0.00000644 [06:11:52] Epoch: 1 Batch: 830/20099 (4.13%) Loss: 2.423469 LR: 0.00000644 [06:11:55] Epoch: 1 Batch: 831/20099 (4.13%) Loss: 2.723634 LR: 0.00000644 [06:11:58] Epoch: 1 Batch: 832/20099 (4.14%) Loss: 2.751270 LR: 0.00000644 [06:12:01] Epoch: 1 Batch: 833/20099 (4.14%) Loss: 2.730820 LR: 0.00000649 [06:12:05] Epoch: 1 Batch: 834/20099 (4.15%) Loss: 2.670857 LR: 0.00000649 [06:12:08] Epoch: 1 Batch: 835/20099 (4.15%) Loss: 2.753617 LR: 0.00000649 [06:12:11] Epoch: 1 Batch: 836/20099 (4.16%) Loss: 2.547131 LR: 0.00000649 [06:12:14] Epoch: 1 Batch: 837/20099 (4.16%) Loss: 2.463307 LR: 0.00000649 [06:12:17] Epoch: 1 Batch: 838/20099 (4.17%) Loss: 2.306565 LR: 0.00000649 [06:12:20] Epoch: 1 Batch: 839/20099 (4.17%) Loss: 2.564209 LR: 0.00000649 [06:12:23] Epoch: 1 
Batch: 840/20099 (4.18%) Loss: 2.574380 LR: 0.00000655 [06:12:26] Epoch: 1 Batch: 841/20099 (4.18%) Loss: 2.604923 LR: 0.00000655 [06:12:29] Epoch: 1 Batch: 842/20099 (4.19%) Loss: 2.540481 LR: 0.00000655 [06:12:32] Epoch: 1 Batch: 843/20099 (4.19%) Loss: 2.644381 LR: 0.00000655 [06:12:35] Epoch: 1 Batch: 844/20099 (4.20%) Loss: 2.717461 LR: 0.00000655 [06:12:39] Epoch: 1 Batch: 845/20099 (4.20%) Loss: 2.635366 LR: 0.00000655 [06:12:42] Epoch: 1 Batch: 846/20099 (4.21%) Loss: 2.562326 LR: 0.00000655 [06:12:45] Epoch: 1 Batch: 847/20099 (4.21%) Loss: 2.681325 LR: 0.00000660 [06:12:48] Epoch: 1 Batch: 848/20099 (4.22%) Loss: 2.723640 LR: 0.00000660 [06:12:51] Epoch: 1 Batch: 849/20099 (4.22%) Loss: 2.795785 LR: 0.00000660 [06:12:54] Epoch: 1 Batch: 850/20099 (4.23%) Loss: 2.624772 LR: 0.00000660 [06:12:57] Epoch: 1 Batch: 851/20099 (4.23%) Loss: 2.405778 LR: 0.00000660 [06:13:00] Epoch: 1 Batch: 852/20099 (4.24%) Loss: 2.557772 LR: 0.00000660 [06:13:03] Epoch: 1 Batch: 853/20099 (4.24%) Loss: 2.468818 LR: 0.00000660 [06:13:06] Epoch: 1 Batch: 854/20099 (4.25%) Loss: 2.674399 LR: 0.00000665 [06:13:09] Epoch: 1 Batch: 855/20099 (4.25%) Loss: 2.687448 LR: 0.00000665 [06:13:13] Epoch: 1 Batch: 856/20099 (4.26%) Loss: 2.744894 LR: 0.00000665 [06:13:16] Epoch: 1 Batch: 857/20099 (4.26%) Loss: 2.650777 LR: 0.00000665 [06:13:19] Epoch: 1 Batch: 858/20099 (4.27%) Loss: 2.825450 LR: 0.00000665 [06:13:22] Epoch: 1 Batch: 859/20099 (4.27%) Loss: 2.514328 LR: 0.00000665 [06:13:25] Epoch: 1 Batch: 860/20099 (4.28%) Loss: 2.751596 LR: 0.00000665 [06:13:28] Epoch: 1 Batch: 861/20099 (4.28%) Loss: 2.488051 LR: 0.00000671 [06:13:31] Epoch: 1 Batch: 862/20099 (4.29%) Loss: 2.606173 LR: 0.00000671 [06:13:34] Epoch: 1 Batch: 863/20099 (4.29%) Loss: 2.521894 LR: 0.00000671 [06:13:37] Epoch: 1 Batch: 864/20099 (4.30%) Loss: 2.455069 LR: 0.00000671 [06:13:40] Epoch: 1 Batch: 865/20099 (4.30%) Loss: 2.490532 LR: 0.00000671 [06:13:44] Epoch: 1 Batch: 866/20099 (4.31%) Loss: 2.348190 LR: 
0.00000671 [06:13:47] Epoch: 1 Batch: 867/20099 (4.31%) Loss: 2.403929 LR: 0.00000671 [06:13:50] Epoch: 1 Batch: 868/20099 (4.32%) Loss: 2.724191 LR: 0.00000676 [06:13:53] Epoch: 1 Batch: 869/20099 (4.32%) Loss: 2.716354 LR: 0.00000676 [06:13:56] Epoch: 1 Batch: 870/20099 (4.33%) Loss: 2.639460 LR: 0.00000676 [06:13:59] Epoch: 1 Batch: 871/20099 (4.33%) Loss: 2.227056 LR: 0.00000676 [06:14:02] Epoch: 1 Batch: 872/20099 (4.34%) Loss: 2.472200 LR: 0.00000676 [06:14:05] Epoch: 1 Batch: 873/20099 (4.34%) Loss: 2.620000 LR: 0.00000676 [06:14:08] Epoch: 1 Batch: 874/20099 (4.35%) Loss: 2.679246 LR: 0.00000676 [06:14:12] Epoch: 1 Batch: 875/20099 (4.35%) Loss: 2.609647 LR: 0.00000682 [06:14:15] Epoch: 1 Batch: 876/20099 (4.36%) Loss: 2.515188 LR: 0.00000682 [06:14:18] Epoch: 1 Batch: 877/20099 (4.36%) Loss: 2.560900 LR: 0.00000682 [06:14:21] Epoch: 1 Batch: 878/20099 (4.37%) Loss: 2.392089 LR: 0.00000682 [06:14:24] Epoch: 1 Batch: 879/20099 (4.37%) Loss: 2.570833 LR: 0.00000682 [06:14:27] Epoch: 1 Batch: 880/20099 (4.38%) Loss: 2.411783 LR: 0.00000682 [06:14:30] Epoch: 1 Batch: 881/20099 (4.38%) Loss: 2.602786 LR: 0.00000682 [06:14:33] Epoch: 1 Batch: 882/20099 (4.39%) Loss: 2.482776 LR: 0.00000687 [06:14:36] Epoch: 1 Batch: 883/20099 (4.39%) Loss: 2.470782 LR: 0.00000687 [06:14:40] Epoch: 1 Batch: 884/20099 (4.40%) Loss: 2.475675 LR: 0.00000687 [06:14:43] Epoch: 1 Batch: 885/20099 (4.40%) Loss: 2.568896 LR: 0.00000687 [06:14:46] Epoch: 1 Batch: 886/20099 (4.41%) Loss: 2.462074 LR: 0.00000687 [06:14:49] Epoch: 1 Batch: 887/20099 (4.41%) Loss: 2.576078 LR: 0.00000687 [06:14:52] Epoch: 1 Batch: 888/20099 (4.42%) Loss: 2.787576 LR: 0.00000687 [06:14:55] Epoch: 1 Batch: 889/20099 (4.42%) Loss: 2.327943 LR: 0.00000693 [06:14:58] Epoch: 1 Batch: 890/20099 (4.43%) Loss: 2.285853 LR: 0.00000693 [06:15:01] Epoch: 1 Batch: 891/20099 (4.43%) Loss: 2.261620 LR: 0.00000693 [06:15:04] Epoch: 1 Batch: 892/20099 (4.44%) Loss: 2.802789 LR: 0.00000693 [06:15:07] Epoch: 1 Batch: 893/20099 
(4.44%) Loss: 2.366830 LR: 0.00000693 [06:15:11] Epoch: 1 Batch: 894/20099 (4.45%) Loss: 2.537253 LR: 0.00000693 [06:15:14] Epoch: 1 Batch: 895/20099 (4.45%) Loss: 2.551683 LR: 0.00000693 [06:15:17] Epoch: 1 Batch: 896/20099 (4.46%) Loss: 2.565464 LR: 0.00000698 [06:15:20] Epoch: 1 Batch: 897/20099 (4.46%) Loss: 2.883792 LR: 0.00000698 [06:15:23] Epoch: 1 Batch: 898/20099 (4.47%) Loss: 2.671345 LR: 0.00000698 [06:15:26] Epoch: 1 Batch: 899/20099 (4.47%) Loss: 2.635837 LR: 0.00000698 [06:15:29] Epoch: 1 Batch: 900/20099 (4.48%) Loss: 2.613115 LR: 0.00000698 [06:15:32] Epoch: 1 Batch: 901/20099 (4.48%) Loss: 2.249015 LR: 0.00000698 [06:15:35] Epoch: 1 Batch: 902/20099 (4.49%) Loss: 2.572182 LR: 0.00000698 [06:15:38] Epoch: 1 Batch: 903/20099 (4.49%) Loss: 2.426290 LR: 0.00000704 [06:15:41] Epoch: 1 Batch: 904/20099 (4.50%) Loss: 2.459203 LR: 0.00000704 [06:15:44] Epoch: 1 Batch: 905/20099 (4.50%) Loss: 2.464326 LR: 0.00000704 [06:15:48] Epoch: 1 Batch: 906/20099 (4.51%) Loss: 2.679364 LR: 0.00000704 [06:15:51] Epoch: 1 Batch: 907/20099 (4.51%) Loss: 2.706455 LR: 0.00000704 [06:15:54] Epoch: 1 Batch: 908/20099 (4.52%) Loss: 2.667820 LR: 0.00000704 [06:15:57] Epoch: 1 Batch: 909/20099 (4.52%) Loss: 2.573327 LR: 0.00000704 [06:16:00] Epoch: 1 Batch: 910/20099 (4.53%) Loss: 2.173689 LR: 0.00000709 [06:16:03] Epoch: 1 Batch: 911/20099 (4.53%) Loss: 2.395568 LR: 0.00000709 [06:16:06] Epoch: 1 Batch: 912/20099 (4.54%) Loss: 2.938343 LR: 0.00000709 [06:16:09] Epoch: 1 Batch: 913/20099 (4.54%) Loss: 2.319208 LR: 0.00000709 [06:16:12] Epoch: 1 Batch: 914/20099 (4.55%) Loss: 2.700063 LR: 0.00000709 [06:16:15] Epoch: 1 Batch: 915/20099 (4.55%) Loss: 2.650727 LR: 0.00000709 [06:16:19] Epoch: 1 Batch: 916/20099 (4.56%) Loss: 2.206372 LR: 0.00000709 [06:16:22] Epoch: 1 Batch: 917/20099 (4.56%) Loss: 2.546712 LR: 0.00000715 [06:16:25] Epoch: 1 Batch: 918/20099 (4.57%) Loss: 2.473594 LR: 0.00000715 [06:16:28] Epoch: 1 Batch: 919/20099 (4.57%) Loss: 2.478794 LR: 0.00000715 [06:16:31] 
Epoch: 1 Batch: 920/20099 (4.58%) Loss: 2.519163 LR: 0.00000715 [06:16:34] Epoch: 1 Batch: 921/20099 (4.58%) Loss: 2.779296 LR: 0.00000715 [06:16:37] Epoch: 1 Batch: 922/20099 (4.59%) Loss: 2.533009 LR: 0.00000715 [06:16:40] Epoch: 1 Batch: 923/20099 (4.59%) Loss: 2.532431 LR: 0.00000715 [06:16:43] Epoch: 1 Batch: 924/20099 (4.60%) Loss: 2.097837 LR: 0.00000720 [06:16:46] Epoch: 1 Batch: 925/20099 (4.60%) Loss: 2.507565 LR: 0.00000720 [06:16:49] Epoch: 1 Batch: 926/20099 (4.61%) Loss: 2.726522 LR: 0.00000720 [06:16:52] Epoch: 1 Batch: 927/20099 (4.61%) Loss: 2.686650 LR: 0.00000720 [06:16:55] Epoch: 1 Batch: 928/20099 (4.62%) Loss: 2.418660 LR: 0.00000720 [06:16:59] Epoch: 1 Batch: 929/20099 (4.62%) Loss: 2.385518 LR: 0.00000720 [06:17:02] Epoch: 1 Batch: 930/20099 (4.63%) Loss: 2.404970 LR: 0.00000720 [06:17:05] Epoch: 1 Batch: 931/20099 (4.63%) Loss: 2.696613 LR: 0.00000725 [06:17:08] Epoch: 1 Batch: 932/20099 (4.64%) Loss: 2.703056 LR: 0.00000725 [06:17:11] Epoch: 1 Batch: 933/20099 (4.64%) Loss: 2.434398 LR: 0.00000725 [06:17:14] Epoch: 1 Batch: 934/20099 (4.65%) Loss: 2.638575 LR: 0.00000725 [06:17:17] Epoch: 1 Batch: 935/20099 (4.65%) Loss: 2.449628 LR: 0.00000725 [06:17:20] Epoch: 1 Batch: 936/20099 (4.66%) Loss: 2.620136 LR: 0.00000725 [06:17:23] Epoch: 1 Batch: 937/20099 (4.66%) Loss: 2.590441 LR: 0.00000725 [06:17:26] Epoch: 1 Batch: 938/20099 (4.67%) Loss: 2.460018 LR: 0.00000731 [06:17:29] Epoch: 1 Batch: 939/20099 (4.67%) Loss: 2.603896 LR: 0.00000731 [06:17:33] Epoch: 1 Batch: 940/20099 (4.68%) Loss: 2.745907 LR: 0.00000731 [06:17:36] Epoch: 1 Batch: 941/20099 (4.68%) Loss: 2.647418 LR: 0.00000731 [06:17:39] Epoch: 1 Batch: 942/20099 (4.69%) Loss: 2.458124 LR: 0.00000731 [06:17:42] Epoch: 1 Batch: 943/20099 (4.69%) Loss: 2.558774 LR: 0.00000731 [06:17:45] Epoch: 1 Batch: 944/20099 (4.70%) Loss: 2.440393 LR: 0.00000731 [06:17:48] Epoch: 1 Batch: 945/20099 (4.70%) Loss: 2.493258 LR: 0.00000736 [06:17:51] Epoch: 1 Batch: 946/20099 (4.71%) Loss: 2.502746 
LR: 0.00000736 [06:17:54] Epoch: 1 Batch: 947/20099 (4.71%) Loss: 2.610931 LR: 0.00000736 [06:17:57] Epoch: 1 Batch: 948/20099 (4.72%) Loss: 2.715745 LR: 0.00000736 [06:18:00] Epoch: 1 Batch: 949/20099 (4.72%) Loss: 2.582851 LR: 0.00000736 [06:18:04] Epoch: 1 Batch: 950/20099 (4.73%) Loss: 2.658231 LR: 0.00000736 [06:18:07] Epoch: 1 Batch: 951/20099 (4.73%) Loss: 2.459612 LR: 0.00000736 [06:18:10] Epoch: 1 Batch: 952/20099 (4.74%) Loss: 2.494330 LR: 0.00000742 [06:18:13] Epoch: 1 Batch: 953/20099 (4.74%) Loss: 2.352329 LR: 0.00000742 [06:18:16] Epoch: 1 Batch: 954/20099 (4.75%) Loss: 2.416545 LR: 0.00000742 [06:18:19] Epoch: 1 Batch: 955/20099 (4.75%) Loss: 2.123285 LR: 0.00000742 [06:18:22] Epoch: 1 Batch: 956/20099 (4.76%) Loss: 2.506108 LR: 0.00000742 [06:18:25] Epoch: 1 Batch: 957/20099 (4.76%) Loss: 2.554704 LR: 0.00000742 [06:18:28] Epoch: 1 Batch: 958/20099 (4.77%) Loss: 2.394995 LR: 0.00000742 [06:18:31] Epoch: 1 Batch: 959/20099 (4.77%) Loss: 2.385046 LR: 0.00000747 [06:18:35] Epoch: 1 Batch: 960/20099 (4.78%) Loss: 2.492765 LR: 0.00000747 [06:18:38] Epoch: 1 Batch: 961/20099 (4.78%) Loss: 2.551732 LR: 0.00000747 [06:18:41] Epoch: 1 Batch: 962/20099 (4.79%) Loss: 2.243909 LR: 0.00000747 [06:18:44] Epoch: 1 Batch: 963/20099 (4.79%) Loss: 2.670889 LR: 0.00000747 [06:18:47] Epoch: 1 Batch: 964/20099 (4.80%) Loss: 2.778051 LR: 0.00000747 [06:18:50] Epoch: 1 Batch: 965/20099 (4.80%) Loss: 2.470696 LR: 0.00000747 [06:18:53] Epoch: 1 Batch: 966/20099 (4.81%) Loss: 2.696412 LR: 0.00000753 [06:18:56] Epoch: 1 Batch: 967/20099 (4.81%) Loss: 2.851357 LR: 0.00000753 [06:18:59] Epoch: 1 Batch: 968/20099 (4.82%) Loss: 2.447409 LR: 0.00000753 [06:19:02] Epoch: 1 Batch: 969/20099 (4.82%) Loss: 2.442802 LR: 0.00000753 [06:19:06] Epoch: 1 Batch: 970/20099 (4.83%) Loss: 2.610682 LR: 0.00000753 [06:19:09] Epoch: 1 Batch: 971/20099 (4.83%) Loss: 2.440674 LR: 0.00000753 [06:19:12] Epoch: 1 Batch: 972/20099 (4.84%) Loss: 2.639163 LR: 0.00000753 [06:19:15] Epoch: 1 Batch: 
973/20099 (4.84%) Loss: 2.355131 LR: 0.00000758 [06:19:18] Epoch: 1 Batch: 974/20099 (4.85%) Loss: 2.356697 LR: 0.00000758 [06:19:21] Epoch: 1 Batch: 975/20099 (4.85%) Loss: 2.390357 LR: 0.00000758 [06:19:24] Epoch: 1 Batch: 976/20099 (4.86%) Loss: 2.958416 LR: 0.00000758 [06:19:27] Epoch: 1 Batch: 977/20099 (4.86%) Loss: 2.671299 LR: 0.00000758 [06:19:30] Epoch: 1 Batch: 978/20099 (4.87%) Loss: 2.239478 LR: 0.00000758 [06:19:33] Epoch: 1 Batch: 979/20099 (4.87%) Loss: 2.714630 LR: 0.00000758 [06:19:37] Epoch: 1 Batch: 980/20099 (4.88%) Loss: 2.516395 LR: 0.00000764 [06:19:40] Epoch: 1 Batch: 981/20099 (4.88%) Loss: 2.523391 LR: 0.00000764 [06:19:43] Epoch: 1 Batch: 982/20099 (4.89%) Loss: 2.685921 LR: 0.00000764 [06:19:46] Epoch: 1 Batch: 983/20099 (4.89%) Loss: 2.260486 LR: 0.00000764 [06:19:49] Epoch: 1 Batch: 984/20099 (4.90%) Loss: 2.365876 LR: 0.00000764 [06:19:52] Epoch: 1 Batch: 985/20099 (4.90%) Loss: 2.634164 LR: 0.00000764 [06:19:55] Epoch: 1 Batch: 986/20099 (4.91%) Loss: 2.481252 LR: 0.00000764 [06:19:58] Epoch: 1 Batch: 987/20099 (4.91%) Loss: 2.762477 LR: 0.00000769 [06:20:01] Epoch: 1 Batch: 988/20099 (4.92%) Loss: 2.392833 LR: 0.00000769 [06:20:05] Epoch: 1 Batch: 989/20099 (4.92%) Loss: 2.242428 LR: 0.00000769 [06:20:08] Epoch: 1 Batch: 990/20099 (4.93%) Loss: 2.609648 LR: 0.00000769 [06:20:11] Epoch: 1 Batch: 991/20099 (4.93%) Loss: 2.314681 LR: 0.00000769 [06:20:14] Epoch: 1 Batch: 992/20099 (4.94%) Loss: 2.377580 LR: 0.00000769 [06:20:17] Epoch: 1 Batch: 993/20099 (4.94%) Loss: 2.282904 LR: 0.00000769 [06:20:20] Epoch: 1 Batch: 994/20099 (4.95%) Loss: 2.415827 LR: 0.00000775 [06:20:23] Epoch: 1 Batch: 995/20099 (4.95%) Loss: 2.501282 LR: 0.00000775 [06:20:26] Epoch: 1 Batch: 996/20099 (4.96%) Loss: 2.216219 LR: 0.00000775 [06:20:29] Epoch: 1 Batch: 997/20099 (4.96%) Loss: 2.484162 LR: 0.00000775 [06:20:32] Epoch: 1 Batch: 998/20099 (4.97%) Loss: 2.314183 LR: 0.00000775 [06:20:35] Epoch: 1 Batch: 999/20099 (4.97%) Loss: 2.462252 LR: 0.00000775 
[06:20:39] >> Evaluating batch 0 [06:20:40] >> Evaluating batch 1 [06:20:41] >> Evaluating batch 2 [06:20:43] >> Evaluating batch 3 [06:20:44] >> Evaluating batch 4 [06:20:45] >> Evaluating batch 5 [06:20:46] >> Evaluating batch 6 [06:20:48] >> Evaluating batch 7 [06:20:49] >> Evaluating batch 8 [06:20:50] >> Evaluating batch 9 [06:20:51] >> Evaluating batch 10 [06:20:53] >> Evaluating batch 11 [06:20:54] >> Evaluating batch 12 [06:20:55] >> Evaluating batch 13 [06:20:56] >> Evaluating batch 14 [06:20:57] >> Evaluating batch 15 [06:20:58] >> Evaluating batch 16 [06:20:59] Epoch: 1 Step: 1000/20099 Evaluation: [06:20:59] Avg Loss Since Last Eval: 2.8438 Val Loss: 2.5492 Validation loss delta: -1.2349 Perplexity: 12.7973 LR: 0.00000775 [06:21:03] >> Temp checkpoint saved: epoch1_step1000, size: 0.1693 GB [06:21:06] >> Checkpoint saved: epoch1_step1000, size: 0.1693 GB [06:21:06] Epoch: 1 Batch: 1000/20099 (4.98%) Loss: 2.268009 LR: 0.00000775 [06:21:09] Epoch: 1 Batch: 1001/20099 (4.98%) Loss: 2.465096 LR: 0.00000780 [06:21:12] Epoch: 1 Batch: 1002/20099 (4.99%) Loss: 2.413087 LR: 0.00000780 [06:21:16] Epoch: 1 Batch: 1003/20099 (4.99%) Loss: 2.499773 LR: 0.00000780 [06:21:19] Epoch: 1 Batch: 1004/20099 (5.00%) Loss: 2.516705 LR: 0.00000780 [06:21:22] Epoch: 1 Batch: 1005/20099 (5.00%) Loss: 2.315004 LR: 0.00000780 [06:21:25] Epoch: 1 Batch: 1006/20099 (5.01%) Loss: 2.431981 LR: 0.00000780 [06:21:28] Epoch: 1 Batch: 1007/20099 (5.01%) Loss: 2.782194 LR: 0.00000780 [06:21:31] Epoch: 1 Batch: 1008/20099 (5.02%) Loss: 2.353743 LR: 0.00000785 [06:21:34] Epoch: 1 Batch: 1009/20099 (5.02%) Loss: 2.532976 LR: 0.00000785 [06:21:37] Epoch: 1 Batch: 1010/20099 (5.03%) Loss: 2.326507 LR: 0.00000785 [06:21:41] Epoch: 1 Batch: 1011/20099 (5.03%) Loss: 2.380943 LR: 0.00000785 [06:21:44] Epoch: 1 Batch: 1012/20099 (5.04%) Loss: 2.446947 LR: 0.00000785 [06:21:47] Epoch: 1 Batch: 1013/20099 (5.04%) Loss: 2.245902 LR: 0.00000785 [06:21:50] Epoch: 1 Batch: 1014/20099 (5.05%) Loss: 
2.602089 LR: 0.00000785 [06:21:53] Epoch: 1 Batch: 1015/20099 (5.05%) Loss: 2.296535 LR: 0.00000791 [06:21:56] Epoch: 1 Batch: 1016/20099 (5.05%) Loss: 2.479060 LR: 0.00000791 [06:21:59] Epoch: 1 Batch: 1017/20099 (5.06%) Loss: 2.798729 LR: 0.00000791 [06:22:02] Epoch: 1 Batch: 1018/20099 (5.06%) Loss: 2.501838 LR: 0.00000791 [06:22:06] Epoch: 1 Batch: 1019/20099 (5.07%) Loss: 2.458307 LR: 0.00000791 [06:22:09] Epoch: 1 Batch: 1020/20099 (5.07%) Loss: 2.589815 LR: 0.00000791 [06:22:12] Epoch: 1 Batch: 1021/20099 (5.08%) Loss: 2.443780 LR: 0.00000791 [06:22:15] Epoch: 1 Batch: 1022/20099 (5.08%) Loss: 2.565748 LR: 0.00000796 [06:22:18] Epoch: 1 Batch: 1023/20099 (5.09%) Loss: 2.536574 LR: 0.00000796 [06:22:21] Epoch: 1 Batch: 1024/20099 (5.09%) Loss: 2.380493 LR: 0.00000796 [06:22:24] Epoch: 1 Batch: 1025/20099 (5.10%) Loss: 2.784717 LR: 0.00000796 [06:22:27] Epoch: 1 Batch: 1026/20099 (5.10%) Loss: 2.414336 LR: 0.00000796 [06:22:30] Epoch: 1 Batch: 1027/20099 (5.11%) Loss: 2.274538 LR: 0.00000796 [06:22:33] Epoch: 1 Batch: 1028/20099 (5.11%) Loss: 2.227084 LR: 0.00000796 [06:22:36] Epoch: 1 Batch: 1029/20099 (5.12%) Loss: 2.215168 LR: 0.00000802 [06:22:39] Epoch: 1 Batch: 1030/20099 (5.12%) Loss: 2.564003 LR: 0.00000802 [06:22:42] Epoch: 1 Batch: 1031/20099 (5.13%) Loss: 2.357389 LR: 0.00000802 [06:22:46] Epoch: 1 Batch: 1032/20099 (5.13%) Loss: 2.582675 LR: 0.00000802 [06:22:49] Epoch: 1 Batch: 1033/20099 (5.14%) Loss: 2.620351 LR: 0.00000802 [06:22:52] Epoch: 1 Batch: 1034/20099 (5.14%) Loss: 2.459991 LR: 0.00000802 [06:22:55] Epoch: 1 Batch: 1035/20099 (5.15%) Loss: 2.560182 LR: 0.00000802 [06:22:58] Epoch: 1 Batch: 1036/20099 (5.15%) Loss: 2.530527 LR: 0.00000807 [06:23:01] Epoch: 1 Batch: 1037/20099 (5.16%) Loss: 2.416432 LR: 0.00000807 [06:23:04] Epoch: 1 Batch: 1038/20099 (5.16%) Loss: 2.545779 LR: 0.00000807 [06:23:07] Epoch: 1 Batch: 1039/20099 (5.17%) Loss: 2.685722 LR: 0.00000807 [06:23:10] Epoch: 1 Batch: 1040/20099 (5.17%) Loss: 2.668758 LR: 0.00000807 
[06:23:13] Epoch: 1 Batch: 1041/20099 (5.18%) Loss: 2.590490 LR: 0.00000807 [06:23:17] Epoch: 1 Batch: 1042/20099 (5.18%) Loss: 2.365594 LR: 0.00000807 [06:23:20] Epoch: 1 Batch: 1043/20099 (5.19%) Loss: 2.601564 LR: 0.00000813 [06:23:23] Epoch: 1 Batch: 1044/20099 (5.19%) Loss: 2.401983 LR: 0.00000813 [06:23:26] Epoch: 1 Batch: 1045/20099 (5.20%) Loss: 2.252738 LR: 0.00000813 [06:23:29] Epoch: 1 Batch: 1046/20099 (5.20%) Loss: 2.220903 LR: 0.00000813 [06:23:32] Epoch: 1 Batch: 1047/20099 (5.21%) Loss: 2.507816 LR: 0.00000813 [06:23:35] Epoch: 1 Batch: 1048/20099 (5.21%) Loss: 2.420840 LR: 0.00000813 [06:23:38] Epoch: 1 Batch: 1049/20099 (5.22%) Loss: 2.372026 LR: 0.00000813 [06:23:41] Epoch: 1 Batch: 1050/20099 (5.22%) Loss: 2.279827 LR: 0.00000818 [06:23:44] Epoch: 1 Batch: 1051/20099 (5.23%) Loss: 2.455801 LR: 0.00000818 [06:23:48] Epoch: 1 Batch: 1052/20099 (5.23%) Loss: 2.369296 LR: 0.00000818 [06:23:51] Epoch: 1 Batch: 1053/20099 (5.24%) Loss: 2.392564 LR: 0.00000818 [06:23:54] Epoch: 1 Batch: 1054/20099 (5.24%) Loss: 2.121730 LR: 0.00000818 [06:23:57] Epoch: 1 Batch: 1055/20099 (5.25%) Loss: 2.413616 LR: 0.00000818 [06:24:00] Epoch: 1 Batch: 1056/20099 (5.25%) Loss: 2.601921 LR: 0.00000818 [06:24:03] Epoch: 1 Batch: 1057/20099 (5.26%) Loss: 2.185557 LR: 0.00000824 [06:24:06] Epoch: 1 Batch: 1058/20099 (5.26%) Loss: 2.342689 LR: 0.00000824 [06:24:09] Epoch: 1 Batch: 1059/20099 (5.27%) Loss: 2.199738 LR: 0.00000824 [06:24:12] Epoch: 1 Batch: 1060/20099 (5.27%) Loss: 2.046974 LR: 0.00000824 [06:24:15] Epoch: 1 Batch: 1061/20099 (5.28%) Loss: 2.426784 LR: 0.00000824 [06:24:18] Epoch: 1 Batch: 1062/20099 (5.28%) Loss: 2.600775 LR: 0.00000824 [06:24:22] Epoch: 1 Batch: 1063/20099 (5.29%) Loss: 2.172648 LR: 0.00000824 [06:24:25] Epoch: 1 Batch: 1064/20099 (5.29%) Loss: 2.719525 LR: 0.00000829 [06:24:28] Epoch: 1 Batch: 1065/20099 (5.30%) Loss: 2.245898 LR: 0.00000829 [06:24:31] Epoch: 1 Batch: 1066/20099 (5.30%) Loss: 2.280223 LR: 0.00000829 [06:24:34] Epoch: 1 
Batch: 1067/20099 (5.31%) Loss: 2.671634 LR: 0.00000829 [06:24:37] Epoch: 1 Batch: 1068/20099 (5.31%) Loss: 2.602126 LR: 0.00000829 [06:24:40] Epoch: 1 Batch: 1069/20099 (5.32%) Loss: 2.225140 LR: 0.00000829 [06:24:43] Epoch: 1 Batch: 1070/20099 (5.32%) Loss: 2.321930 LR: 0.00000829 [06:24:46] Epoch: 1 Batch: 1071/20099 (5.33%) Loss: 2.459015 LR: 0.00000835 [06:24:49] Epoch: 1 Batch: 1072/20099 (5.33%) Loss: 2.347937 LR: 0.00000835 [06:24:53] Epoch: 1 Batch: 1073/20099 (5.34%) Loss: 2.132351 LR: 0.00000835 [06:24:56] Epoch: 1 Batch: 1074/20099 (5.34%) Loss: 2.659304 LR: 0.00000835 [06:24:59] Epoch: 1 Batch: 1075/20099 (5.35%) Loss: 2.534127 LR: 0.00000835 [06:25:02] Epoch: 1 Batch: 1076/20099 (5.35%) Loss: 2.407731 LR: 0.00000835 [06:25:05] Epoch: 1 Batch: 1077/20099 (5.36%) Loss: 2.341429 LR: 0.00000835 [06:25:08] Epoch: 1 Batch: 1078/20099 (5.36%) Loss: 2.475575 LR: 0.00000840 [06:25:11] Epoch: 1 Batch: 1079/20099 (5.37%) Loss: 2.302447 LR: 0.00000840 [06:25:14] Epoch: 1 Batch: 1080/20099 (5.37%) Loss: 2.210517 LR: 0.00000840 [06:25:17] Epoch: 1 Batch: 1081/20099 (5.38%) Loss: 2.746410 LR: 0.00000840 [06:25:20] Epoch: 1 Batch: 1082/20099 (5.38%) Loss: 2.703728 LR: 0.00000840 [06:25:23] Epoch: 1 Batch: 1083/20099 (5.39%) Loss: 2.560720 LR: 0.00000840 [06:25:27] Epoch: 1 Batch: 1084/20099 (5.39%) Loss: 2.882574 LR: 0.00000840 [06:25:30] Epoch: 1 Batch: 1085/20099 (5.40%) Loss: 2.218088 LR: 0.00000845 [06:25:33] Epoch: 1 Batch: 1086/20099 (5.40%) Loss: 2.527889 LR: 0.00000845 [06:25:36] Epoch: 1 Batch: 1087/20099 (5.41%) Loss: 2.355284 LR: 0.00000845 [06:25:39] Epoch: 1 Batch: 1088/20099 (5.41%) Loss: 2.381893 LR: 0.00000845 [06:25:42] Epoch: 1 Batch: 1089/20099 (5.42%) Loss: 2.128335 LR: 0.00000845 [06:25:45] Epoch: 1 Batch: 1090/20099 (5.42%) Loss: 2.270531 LR: 0.00000845 [06:25:48] Epoch: 1 Batch: 1091/20099 (5.43%) Loss: 2.466590 LR: 0.00000845 [06:25:51] Epoch: 1 Batch: 1092/20099 (5.43%) Loss: 2.468090 LR: 0.00000851 [06:25:54] Epoch: 1 Batch: 1093/20099 
(5.44%) Loss: 2.313470 LR: 0.00000851 [06:25:58] Epoch: 1 Batch: 1094/20099 (5.44%) Loss: 2.570095 LR: 0.00000851 [06:26:01] Epoch: 1 Batch: 1095/20099 (5.45%) Loss: 2.401191 LR: 0.00000851 [06:26:04] Epoch: 1 Batch: 1096/20099 (5.45%) Loss: 2.873330 LR: 0.00000851 [06:26:07] Epoch: 1 Batch: 1097/20099 (5.46%) Loss: 2.265061 LR: 0.00000851 [06:26:10] Epoch: 1 Batch: 1098/20099 (5.46%) Loss: 2.257049 LR: 0.00000851 [06:26:13] Epoch: 1 Batch: 1099/20099 (5.47%) Loss: 2.407581 LR: 0.00000856 [06:26:16] Epoch: 1 Batch: 1100/20099 (5.47%) Loss: 2.699013 LR: 0.00000856 [06:26:19] Epoch: 1 Batch: 1101/20099 (5.48%) Loss: 2.583417 LR: 0.00000856 [06:26:22] Epoch: 1 Batch: 1102/20099 (5.48%) Loss: 2.178537 LR: 0.00000856 [06:26:25] Epoch: 1 Batch: 1103/20099 (5.49%) Loss: 2.603993 LR: 0.00000856 [06:26:29] Epoch: 1 Batch: 1104/20099 (5.49%) Loss: 2.798329 LR: 0.00000856 [06:26:32] Epoch: 1 Batch: 1105/20099 (5.50%) Loss: 2.329470 LR: 0.00000856 [06:26:35] Epoch: 1 Batch: 1106/20099 (5.50%) Loss: 2.213904 LR: 0.00000862 [06:26:38] Epoch: 1 Batch: 1107/20099 (5.51%) Loss: 2.261133 LR: 0.00000862 [06:26:41] Epoch: 1 Batch: 1108/20099 (5.51%) Loss: 2.428382 LR: 0.00000862 [06:26:44] Epoch: 1 Batch: 1109/20099 (5.52%) Loss: 2.406951 LR: 0.00000862 [06:26:47] Epoch: 1 Batch: 1110/20099 (5.52%) Loss: 2.318517 LR: 0.00000862 [06:26:50] Epoch: 1 Batch: 1111/20099 (5.53%) Loss: 2.386443 LR: 0.00000862 [06:26:53] Epoch: 1 Batch: 1112/20099 (5.53%) Loss: 2.553387 LR: 0.00000862 [06:26:57] Epoch: 1 Batch: 1113/20099 (5.54%) Loss: 2.267224 LR: 0.00000867 [06:27:00] Epoch: 1 Batch: 1114/20099 (5.54%) Loss: 2.337773 LR: 0.00000867 [06:27:03] Epoch: 1 Batch: 1115/20099 (5.55%) Loss: 2.511516 LR: 0.00000867 [06:27:06] Epoch: 1 Batch: 1116/20099 (5.55%) Loss: 2.156609 LR: 0.00000867 [06:27:09] Epoch: 1 Batch: 1117/20099 (5.56%) Loss: 2.514179 LR: 0.00000867 [06:27:12] Epoch: 1 Batch: 1118/20099 (5.56%) Loss: 2.929897 LR: 0.00000867 [06:27:15] Epoch: 1 Batch: 1119/20099 (5.57%) Loss: 2.228798 
LR: 0.00000867 [06:27:18] Epoch: 1 Batch: 1120/20099 (5.57%) Loss: 2.461570 LR: 0.00000873 [06:27:21] Epoch: 1 Batch: 1121/20099 (5.58%) Loss: 2.428960 LR: 0.00000873 [06:27:24] Epoch: 1 Batch: 1122/20099 (5.58%) Loss: 2.169969 LR: 0.00000873 [06:27:28] Epoch: 1 Batch: 1123/20099 (5.59%) Loss: 2.601725 LR: 0.00000873 [06:27:31] Epoch: 1 Batch: 1124/20099 (5.59%) Loss: 2.426909 LR: 0.00000873 [06:27:34] Epoch: 1 Batch: 1125/20099 (5.60%) Loss: 2.380077 LR: 0.00000873 [06:27:37] Epoch: 1 Batch: 1126/20099 (5.60%) Loss: 2.342493 LR: 0.00000873 [06:27:40] Epoch: 1 Batch: 1127/20099 (5.61%) Loss: 2.250722 LR: 0.00000878 [06:27:43] Epoch: 1 Batch: 1128/20099 (5.61%) Loss: 2.638403 LR: 0.00000878 [06:27:46] Epoch: 1 Batch: 1129/20099 (5.62%) Loss: 2.463369 LR: 0.00000878 [06:27:49] Epoch: 1 Batch: 1130/20099 (5.62%) Loss: 2.428862 LR: 0.00000878 [06:27:52] Epoch: 1 Batch: 1131/20099 (5.63%) Loss: 2.133033 LR: 0.00000878 [06:27:55] Epoch: 1 Batch: 1132/20099 (5.63%) Loss: 2.573893 LR: 0.00000878 [06:27:58] Epoch: 1 Batch: 1133/20099 (5.64%) Loss: 2.405326 LR: 0.00000878 [06:28:02] Epoch: 1 Batch: 1134/20099 (5.64%) Loss: 2.285279 LR: 0.00000884 [06:28:05] Epoch: 1 Batch: 1135/20099 (5.65%) Loss: 2.563377 LR: 0.00000884 [06:28:08] Epoch: 1 Batch: 1136/20099 (5.65%) Loss: 2.409335 LR: 0.00000884 [06:28:11] Epoch: 1 Batch: 1137/20099 (5.66%) Loss: 2.674809 LR: 0.00000884 [06:28:14] Epoch: 1 Batch: 1138/20099 (5.66%) Loss: 2.217234 LR: 0.00000884 [06:28:17] Epoch: 1 Batch: 1139/20099 (5.67%) Loss: 2.504634 LR: 0.00000884 [06:28:20] Epoch: 1 Batch: 1140/20099 (5.67%) Loss: 2.380763 LR: 0.00000884 [06:28:23] Epoch: 1 Batch: 1141/20099 (5.68%) Loss: 2.478846 LR: 0.00000889 [06:28:26] Epoch: 1 Batch: 1142/20099 (5.68%) Loss: 2.219178 LR: 0.00000889 [06:28:29] Epoch: 1 Batch: 1143/20099 (5.69%) Loss: 2.479153 LR: 0.00000889 [06:28:33] Epoch: 1 Batch: 1144/20099 (5.69%) Loss: 2.338852 LR: 0.00000889 [06:28:36] Epoch: 1 Batch: 1145/20099 (5.70%) Loss: 2.547462 LR: 0.00000889 
[06:28:39] Epoch: 1 Batch: 1146/20099 (5.70%) Loss: 2.326259 LR: 0.00000889 [06:28:42] Epoch: 1 Batch: 1147/20099 (5.71%) Loss: 2.430275 LR: 0.00000889 [06:28:45] Epoch: 1 Batch: 1148/20099 (5.71%) Loss: 2.331192 LR: 0.00000895 [06:28:48] Epoch: 1 Batch: 1149/20099 (5.72%) Loss: 2.377501 LR: 0.00000895 [06:28:51] Epoch: 1 Batch: 1150/20099 (5.72%) Loss: 2.405118 LR: 0.00000895 [06:28:54] Epoch: 1 Batch: 1151/20099 (5.73%) Loss: 2.438580 LR: 0.00000895 [06:28:57] Epoch: 1 Batch: 1152/20099 (5.73%) Loss: 2.582524 LR: 0.00000895 [06:29:00] Epoch: 1 Batch: 1153/20099 (5.74%) Loss: 2.605482 LR: 0.00000895 [06:29:04] Epoch: 1 Batch: 1154/20099 (5.74%) Loss: 2.351916 LR: 0.00000895 [06:29:07] Epoch: 1 Batch: 1155/20099 (5.75%) Loss: 2.359187 LR: 0.00000900 [06:29:10] Epoch: 1 Batch: 1156/20099 (5.75%) Loss: 2.781204 LR: 0.00000900 [06:29:13] Epoch: 1 Batch: 1157/20099 (5.76%) Loss: 2.216732 LR: 0.00000900 [06:29:16] Epoch: 1 Batch: 1158/20099 (5.76%) Loss: 2.617371 LR: 0.00000900 [06:29:19] Epoch: 1 Batch: 1159/20099 (5.77%) Loss: 2.614164 LR: 0.00000900 [06:29:22] Epoch: 1 Batch: 1160/20099 (5.77%) Loss: 2.227229 LR: 0.00000900 [06:29:25] Epoch: 1 Batch: 1161/20099 (5.78%) Loss: 2.316306 LR: 0.00000900 [06:29:28] Epoch: 1 Batch: 1162/20099 (5.78%) Loss: 2.546376 LR: 0.00000905 [06:29:31] Epoch: 1 Batch: 1163/20099 (5.79%) Loss: 2.547140 LR: 0.00000905 [06:29:35] Epoch: 1 Batch: 1164/20099 (5.79%) Loss: 2.212639 LR: 0.00000905 [06:29:38] Epoch: 1 Batch: 1165/20099 (5.80%) Loss: 2.607040 LR: 0.00000905 [06:29:41] Epoch: 1 Batch: 1166/20099 (5.80%) Loss: 2.266511 LR: 0.00000905 [06:29:44] Epoch: 1 Batch: 1167/20099 (5.81%) Loss: 2.461068 LR: 0.00000905 [06:29:47] Epoch: 1 Batch: 1168/20099 (5.81%) Loss: 2.310920 LR: 0.00000905 [06:29:50] Epoch: 1 Batch: 1169/20099 (5.82%) Loss: 2.496668 LR: 0.00000911 [06:29:53] Epoch: 1 Batch: 1170/20099 (5.82%) Loss: 2.436123 LR: 0.00000911 [06:29:56] Epoch: 1 Batch: 1171/20099 (5.83%) Loss: 2.411027 LR: 0.00000911 [06:29:59] Epoch: 1 
Batch: 1172/20099 (5.83%) Loss: 2.337048 LR: 0.00000911 [06:30:02] Epoch: 1 Batch: 1173/20099 (5.84%) Loss: 2.440300 LR: 0.00000911 [06:30:05] Epoch: 1 Batch: 1174/20099 (5.84%) Loss: 2.199517 LR: 0.00000911 [06:30:09] Epoch: 1 Batch: 1175/20099 (5.85%) Loss: 1.955714 LR: 0.00000911 [06:30:12] Epoch: 1 Batch: 1176/20099 (5.85%) Loss: 2.494298 LR: 0.00000916 [06:30:15] Epoch: 1 Batch: 1177/20099 (5.86%) Loss: 2.460342 LR: 0.00000916 [06:30:18] Epoch: 1 Batch: 1178/20099 (5.86%) Loss: 2.436229 LR: 0.00000916 [06:30:21] Epoch: 1 Batch: 1179/20099 (5.87%) Loss: 2.279537 LR: 0.00000916 [06:30:24] Epoch: 1 Batch: 1180/20099 (5.87%) Loss: 2.419563 LR: 0.00000916 [06:30:27] Epoch: 1 Batch: 1181/20099 (5.88%) Loss: 2.602052 LR: 0.00000916 [06:30:30] Epoch: 1 Batch: 1182/20099 (5.88%) Loss: 2.335544 LR: 0.00000916 [06:30:33] Epoch: 1 Batch: 1183/20099 (5.89%) Loss: 2.433874 LR: 0.00000922 [06:30:36] Epoch: 1 Batch: 1184/20099 (5.89%) Loss: 2.620772 LR: 0.00000922 [06:30:40] Epoch: 1 Batch: 1185/20099 (5.90%) Loss: 2.515781 LR: 0.00000922 [06:30:43] Epoch: 1 Batch: 1186/20099 (5.90%) Loss: 2.311657 LR: 0.00000922 [06:30:46] Epoch: 1 Batch: 1187/20099 (5.91%) Loss: 1.989614 LR: 0.00000922 [06:30:49] Epoch: 1 Batch: 1188/20099 (5.91%) Loss: 2.473170 LR: 0.00000922 [06:30:52] Epoch: 1 Batch: 1189/20099 (5.92%) Loss: 2.462648 LR: 0.00000922 [06:30:55] Epoch: 1 Batch: 1190/20099 (5.92%) Loss: 2.405720 LR: 0.00000927 [06:30:58] Epoch: 1 Batch: 1191/20099 (5.93%) Loss: 2.298663 LR: 0.00000927 [06:31:01] Epoch: 1 Batch: 1192/20099 (5.93%) Loss: 2.480774 LR: 0.00000927 [06:31:04] Epoch: 1 Batch: 1193/20099 (5.94%) Loss: 2.388686 LR: 0.00000927 [06:31:07] Epoch: 1 Batch: 1194/20099 (5.94%) Loss: 2.372922 LR: 0.00000927 [06:31:11] Epoch: 1 Batch: 1195/20099 (5.95%) Loss: 2.544777 LR: 0.00000927 [06:31:14] Epoch: 1 Batch: 1196/20099 (5.95%) Loss: 2.319907 LR: 0.00000927 [06:31:17] Epoch: 1 Batch: 1197/20099 (5.96%) Loss: 2.452294 LR: 0.00000933 [06:31:20] Epoch: 1 Batch: 1198/20099 
(5.96%) Loss: 2.323024 LR: 0.00000933 [06:31:23] Epoch: 1 Batch: 1199/20099 (5.97%) Loss: 2.385090 LR: 0.00000933 [06:31:30] >> Temp checkpoint saved: epoch1_step1200, size: 0.1693 GB [06:31:30] Epoch: 1 Batch: 1200/20099 (5.97%) Loss: 2.602888 LR: 0.00000933 [06:31:33] Epoch: 1 Batch: 1201/20099 (5.98%) Loss: 2.248256 LR: 0.00000933 [06:31:36] Epoch: 1 Batch: 1202/20099 (5.98%) Loss: 2.364904 LR: 0.00000933 [06:31:39] Epoch: 1 Batch: 1203/20099 (5.99%) Loss: 2.542409 LR: 0.00000933 [06:31:42] Epoch: 1 Batch: 1204/20099 (5.99%) Loss: 2.485518 LR: 0.00000938 [06:31:45] Epoch: 1 Batch: 1205/20099 (6.00%) Loss: 2.025431 LR: 0.00000938 [06:31:48] Epoch: 1 Batch: 1206/20099 (6.00%) Loss: 2.454522 LR: 0.00000938 [06:31:51] Epoch: 1 Batch: 1207/20099 (6.01%) Loss: 2.370275 LR: 0.00000938 [06:31:55] Epoch: 1 Batch: 1208/20099 (6.01%) Loss: 2.348686 LR: 0.00000938 [06:31:58] Epoch: 1 Batch: 1209/20099 (6.02%) Loss: 2.653940 LR: 0.00000938 [06:32:01] Epoch: 1 Batch: 1210/20099 (6.02%) Loss: 2.748654 LR: 0.00000938 [06:32:04] Epoch: 1 Batch: 1211/20099 (6.03%) Loss: 2.277048 LR: 0.00000944 [06:32:07] Epoch: 1 Batch: 1212/20099 (6.03%) Loss: 2.287073 LR: 0.00000944 [06:32:10] Epoch: 1 Batch: 1213/20099 (6.04%) Loss: 2.637357 LR: 0.00000944 [06:32:13] Epoch: 1 Batch: 1214/20099 (6.04%) Loss: 2.642890 LR: 0.00000944 [06:32:16] Epoch: 1 Batch: 1215/20099 (6.05%) Loss: 2.137106 LR: 0.00000944 [06:32:20] Epoch: 1 Batch: 1216/20099 (6.05%) Loss: 2.257788 LR: 0.00000944 [06:32:23] Epoch: 1 Batch: 1217/20099 (6.06%) Loss: 2.415434 LR: 0.00000944 [06:32:26] Epoch: 1 Batch: 1218/20099 (6.06%) Loss: 2.546329 LR: 0.00000949 [06:32:29] Epoch: 1 Batch: 1219/20099 (6.06%) Loss: 2.511510 LR: 0.00000949 [06:32:32] Epoch: 1 Batch: 1220/20099 (6.07%) Loss: 2.402707 LR: 0.00000949 [06:32:35] Epoch: 1 Batch: 1221/20099 (6.07%) Loss: 2.452036 LR: 0.00000949 [06:32:38] Epoch: 1 Batch: 1222/20099 (6.08%) Loss: 2.347884 LR: 0.00000949 [06:32:41] Epoch: 1 Batch: 1223/20099 (6.08%) Loss: 2.600413 LR: 
0.00000949 [06:32:44] Epoch: 1 Batch: 1224/20099 (6.09%) Loss: 2.360252 LR: 0.00000949 [06:32:47] Epoch: 1 Batch: 1225/20099 (6.09%) Loss: 2.286991 LR: 0.00000955 [06:32:51] Epoch: 1 Batch: 1226/20099 (6.10%) Loss: 2.675958 LR: 0.00000955 [06:32:54] Epoch: 1 Batch: 1227/20099 (6.10%) Loss: 2.584281 LR: 0.00000955 [06:32:57] Epoch: 1 Batch: 1228/20099 (6.11%) Loss: 2.296438 LR: 0.00000955 [06:33:00] Epoch: 1 Batch: 1229/20099 (6.11%) Loss: 2.184522 LR: 0.00000955 [06:33:03] Epoch: 1 Batch: 1230/20099 (6.12%) Loss: 2.369483 LR: 0.00000955 [06:33:06] Epoch: 1 Batch: 1231/20099 (6.12%) Loss: 2.530273 LR: 0.00000955 [06:33:09] Epoch: 1 Batch: 1232/20099 (6.13%) Loss: 2.454349 LR: 0.00000960 [06:33:12] Epoch: 1 Batch: 1233/20099 (6.13%) Loss: 2.505024 LR: 0.00000960 [06:33:15] Epoch: 1 Batch: 1234/20099 (6.14%) Loss: 2.236042 LR: 0.00000960 [06:33:18] Epoch: 1 Batch: 1235/20099 (6.14%) Loss: 2.161332 LR: 0.00000960 [06:33:22] Epoch: 1 Batch: 1236/20099 (6.15%) Loss: 2.311042 LR: 0.00000960 [06:33:25] Epoch: 1 Batch: 1237/20099 (6.15%) Loss: 2.431950 LR: 0.00000960 [06:33:28] Epoch: 1 Batch: 1238/20099 (6.16%) Loss: 2.237216 LR: 0.00000960 [06:33:31] Epoch: 1 Batch: 1239/20099 (6.16%) Loss: 2.607191 LR: 0.00000965 [06:33:34] Epoch: 1 Batch: 1240/20099 (6.17%) Loss: 2.439565 LR: 0.00000965 [06:33:37] Epoch: 1 Batch: 1241/20099 (6.17%) Loss: 2.361653 LR: 0.00000965 [06:33:40] Epoch: 1 Batch: 1242/20099 (6.18%) Loss: 2.527033 LR: 0.00000965 [06:33:43] Epoch: 1 Batch: 1243/20099 (6.18%) Loss: 2.046207 LR: 0.00000965 [06:33:46] Epoch: 1 Batch: 1244/20099 (6.19%) Loss: 2.371460 LR: 0.00000965 [06:33:49] Epoch: 1 Batch: 1245/20099 (6.19%) Loss: 2.475160 LR: 0.00000965 [06:33:52] Epoch: 1 Batch: 1246/20099 (6.20%) Loss: 2.439467 LR: 0.00000971 [06:33:56] Epoch: 1 Batch: 1247/20099 (6.20%) Loss: 2.404128 LR: 0.00000971 [06:33:59] Epoch: 1 Batch: 1248/20099 (6.21%) Loss: 2.501898 LR: 0.00000971 [06:34:02] Epoch: 1 Batch: 1249/20099 (6.21%) Loss: 2.381163 LR: 0.00000971 [06:34:05] 
Epoch: 1 Batch: 1250/20099 (6.22%) Loss: 2.260645 LR: 0.00000971 [06:34:08] Epoch: 1 Batch: 1251/20099 (6.22%) Loss: 2.446442 LR: 0.00000971 [06:34:11] Epoch: 1 Batch: 1252/20099 (6.23%) Loss: 2.256139 LR: 0.00000971 [06:34:14] Epoch: 1 Batch: 1253/20099 (6.23%) Loss: 2.539400 LR: 0.00000976 [06:34:17] Epoch: 1 Batch: 1254/20099 (6.24%) Loss: 2.235198 LR: 0.00000976 [06:34:20] Epoch: 1 Batch: 1255/20099 (6.24%) Loss: 2.437637 LR: 0.00000976 [06:34:23] Epoch: 1 Batch: 1256/20099 (6.25%) Loss: 2.549773 LR: 0.00000976 [06:34:27] Epoch: 1 Batch: 1257/20099 (6.25%) Loss: 2.362587 LR: 0.00000976 [06:34:30] Epoch: 1 Batch: 1258/20099 (6.26%) Loss: 2.502442 LR: 0.00000976 [06:34:33] Epoch: 1 Batch: 1259/20099 (6.26%) Loss: 2.442385 LR: 0.00000976 [06:34:36] Epoch: 1 Batch: 1260/20099 (6.27%) Loss: 2.605734 LR: 0.00000982 [06:34:39] Epoch: 1 Batch: 1261/20099 (6.27%) Loss: 2.201965 LR: 0.00000982 [06:34:42] Epoch: 1 Batch: 1262/20099 (6.28%) Loss: 2.348424 LR: 0.00000982 [06:34:45] Epoch: 1 Batch: 1263/20099 (6.28%) Loss: 2.392138 LR: 0.00000982 [06:34:48] Epoch: 1 Batch: 1264/20099 (6.29%) Loss: 2.438533 LR: 0.00000982 [06:34:51] Epoch: 1 Batch: 1265/20099 (6.29%) Loss: 2.438781 LR: 0.00000982 [06:34:54] Epoch: 1 Batch: 1266/20099 (6.30%) Loss: 2.175991 LR: 0.00000982 [06:34:58] Epoch: 1 Batch: 1267/20099 (6.30%) Loss: 2.521098 LR: 0.00000987 [06:35:01] Epoch: 1 Batch: 1268/20099 (6.31%) Loss: 2.255584 LR: 0.00000987 [06:35:04] Epoch: 1 Batch: 1269/20099 (6.31%) Loss: 2.443235 LR: 0.00000987 [06:35:07] Epoch: 1 Batch: 1270/20099 (6.32%) Loss: 2.291856 LR: 0.00000987 [06:35:10] Epoch: 1 Batch: 1271/20099 (6.32%) Loss: 2.544887 LR: 0.00000987 [06:35:13] Epoch: 1 Batch: 1272/20099 (6.33%) Loss: 2.293662 LR: 0.00000987 [06:35:16] Epoch: 1 Batch: 1273/20099 (6.33%) Loss: 1.993520 LR: 0.00000987 [06:35:19] Epoch: 1 Batch: 1274/20099 (6.34%) Loss: 2.308265 LR: 0.00000993 [06:35:22] Epoch: 1 Batch: 1275/20099 (6.34%) Loss: 2.471623 LR: 0.00000993 [06:35:25] Epoch: 1 Batch: 
1276/20099 (6.35%) Loss: 2.757651 LR: 0.00000993 [06:35:29] Epoch: 1 Batch: 1277/20099 (6.35%) Loss: 2.543737 LR: 0.00000993 [06:35:32] Epoch: 1 Batch: 1278/20099 (6.36%) Loss: 2.274960 LR: 0.00000993 [06:35:35] Epoch: 1 Batch: 1279/20099 (6.36%) Loss: 2.403479 LR: 0.00000993 [06:35:38] Epoch: 1 Batch: 1280/20099 (6.37%) Loss: 2.312597 LR: 0.00000993 [06:35:41] Epoch: 1 Batch: 1281/20099 (6.37%) Loss: 2.502779 LR: 0.00000998 [06:35:44] Epoch: 1 Batch: 1282/20099 (6.38%) Loss: 1.982061 LR: 0.00000998 [06:35:47] Epoch: 1 Batch: 1283/20099 (6.38%) Loss: 2.280499 LR: 0.00000998 [06:35:50] Epoch: 1 Batch: 1284/20099 (6.39%) Loss: 2.177249 LR: 0.00000998 [06:35:53] Epoch: 1 Batch: 1285/20099 (6.39%) Loss: 2.269976 LR: 0.00000998 [06:35:56] Epoch: 1 Batch: 1286/20099 (6.40%) Loss: 2.230238 LR: 0.00000998 [06:35:59] Epoch: 1 Batch: 1287/20099 (6.40%) Loss: 2.454524 LR: 0.00000998 [06:36:03] Epoch: 1 Batch: 1288/20099 (6.41%) Loss: 2.176792 LR: 0.00001004 [06:36:06] Epoch: 1 Batch: 1289/20099 (6.41%) Loss: 2.427468 LR: 0.00001004 [06:36:09] Epoch: 1 Batch: 1290/20099 (6.42%) Loss: 2.108545 LR: 0.00001004 [06:36:12] Epoch: 1 Batch: 1291/20099 (6.42%) Loss: 2.582648 LR: 0.00001004 [06:36:15] Epoch: 1 Batch: 1292/20099 (6.43%) Loss: 2.188144 LR: 0.00001004 [06:36:18] Epoch: 1 Batch: 1293/20099 (6.43%) Loss: 2.269563 LR: 0.00001004 [06:36:21] Epoch: 1 Batch: 1294/20099 (6.44%) Loss: 2.632709 LR: 0.00001004 [06:36:24] Epoch: 1 Batch: 1295/20099 (6.44%) Loss: 2.584979 LR: 0.00001009 [06:36:27] Epoch: 1 Batch: 1296/20099 (6.45%) Loss: 2.552597 LR: 0.00001009 [06:36:30] Epoch: 1 Batch: 1297/20099 (6.45%) Loss: 2.693159 LR: 0.00001009 [06:36:33] Epoch: 1 Batch: 1298/20099 (6.46%) Loss: 2.456577 LR: 0.00001009 [06:36:37] Epoch: 1 Batch: 1299/20099 (6.46%) Loss: 2.555768 LR: 0.00001009 [06:36:40] Epoch: 1 Batch: 1300/20099 (6.47%) Loss: 2.107139 LR: 0.00001009 [06:36:43] Epoch: 1 Batch: 1301/20099 (6.47%) Loss: 2.286432 LR: 0.00001009 [06:36:46] Epoch: 1 Batch: 1302/20099 (6.48%) 
Loss: 2.321225 LR: 0.00001015 [06:36:49] Epoch: 1 Batch: 1303/20099 (6.48%) Loss: 2.383250 LR: 0.00001015 [06:36:52] Epoch: 1 Batch: 1304/20099 (6.49%) Loss: 2.301107 LR: 0.00001015 [06:36:55] Epoch: 1 Batch: 1305/20099 (6.49%) Loss: 2.723021 LR: 0.00001015 [06:36:58] Epoch: 1 Batch: 1306/20099 (6.50%) Loss: 2.344350 LR: 0.00001015 [06:37:01] Epoch: 1 Batch: 1307/20099 (6.50%) Loss: 2.362641 LR: 0.00001015 [06:37:04] Epoch: 1 Batch: 1308/20099 (6.51%) Loss: 2.252395 LR: 0.00001015 [06:37:08] Epoch: 1 Batch: 1309/20099 (6.51%) Loss: 2.355865 LR: 0.00001020 [06:37:11] Epoch: 1 Batch: 1310/20099 (6.52%) Loss: 2.530144 LR: 0.00001020 [06:37:14] Epoch: 1 Batch: 1311/20099 (6.52%) Loss: 2.416257 LR: 0.00001020 [06:37:17] Epoch: 1 Batch: 1312/20099 (6.53%) Loss: 2.434113 LR: 0.00001020 [06:37:20] Epoch: 1 Batch: 1313/20099 (6.53%) Loss: 1.982383 LR: 0.00001020 [06:37:23] Epoch: 1 Batch: 1314/20099 (6.54%) Loss: 2.381495 LR: 0.00001020 [06:37:26] Epoch: 1 Batch: 1315/20099 (6.54%) Loss: 2.059135 LR: 0.00001020 [06:37:29] Epoch: 1 Batch: 1316/20099 (6.55%) Loss: 1.941844 LR: 0.00001025 [06:37:32] Epoch: 1 Batch: 1317/20099 (6.55%) Loss: 2.455489 LR: 0.00001025 [06:37:35] Epoch: 1 Batch: 1318/20099 (6.56%) Loss: 2.204604 LR: 0.00001025 [06:37:38] Epoch: 1 Batch: 1319/20099 (6.56%) Loss: 2.355816 LR: 0.00001025 [06:37:42] Epoch: 1 Batch: 1320/20099 (6.57%) Loss: 2.332306 LR: 0.00001025 [06:37:45] Epoch: 1 Batch: 1321/20099 (6.57%) Loss: 2.272191 LR: 0.00001025 [06:37:48] Epoch: 1 Batch: 1322/20099 (6.58%) Loss: 2.254807 LR: 0.00001025 [06:37:51] Epoch: 1 Batch: 1323/20099 (6.58%) Loss: 2.258155 LR: 0.00001031 [06:37:54] Epoch: 1 Batch: 1324/20099 (6.59%) Loss: 2.370725 LR: 0.00001031 [06:37:57] Epoch: 1 Batch: 1325/20099 (6.59%) Loss: 2.446596 LR: 0.00001031 [06:38:00] Epoch: 1 Batch: 1326/20099 (6.60%) Loss: 2.211038 LR: 0.00001031 [06:38:03] Epoch: 1 Batch: 1327/20099 (6.60%) Loss: 2.337549 LR: 0.00001031 [06:38:06] Epoch: 1 Batch: 1328/20099 (6.61%) Loss: 2.289629 LR: 
0.00001031 [06:38:09] Epoch: 1 Batch: 1329/20099 (6.61%) Loss: 2.338742 LR: 0.00001031 [06:38:13] Epoch: 1 Batch: 1330/20099 (6.62%) Loss: 2.628507 LR: 0.00001036 [06:38:16] Epoch: 1 Batch: 1331/20099 (6.62%) Loss: 2.208860 LR: 0.00001036 [06:38:19] Epoch: 1 Batch: 1332/20099 (6.63%) Loss: 2.243638 LR: 0.00001036 [06:38:22] Epoch: 1 Batch: 1333/20099 (6.63%) Loss: 2.684297 LR: 0.00001036 [06:38:25] Epoch: 1 Batch: 1334/20099 (6.64%) Loss: 2.256164 LR: 0.00001036 [06:38:28] Epoch: 1 Batch: 1335/20099 (6.64%) Loss: 2.408877 LR: 0.00001036 [06:38:31] Epoch: 1 Batch: 1336/20099 (6.65%) Loss: 2.266587 LR: 0.00001036 [06:38:34] Epoch: 1 Batch: 1337/20099 (6.65%) Loss: 1.974915 LR: 0.00001042 [06:38:37] Epoch: 1 Batch: 1338/20099 (6.66%) Loss: 2.213343 LR: 0.00001042 [06:38:41] Epoch: 1 Batch: 1339/20099 (6.66%) Loss: 2.329974 LR: 0.00001042 [06:38:44] Epoch: 1 Batch: 1340/20099 (6.67%) Loss: 2.386693 LR: 0.00001042 [06:38:47] Epoch: 1 Batch: 1341/20099 (6.67%) Loss: 2.384928 LR: 0.00001042 [06:38:50] Epoch: 1 Batch: 1342/20099 (6.68%) Loss: 2.319753 LR: 0.00001042 [06:38:53] Epoch: 1 Batch: 1343/20099 (6.68%) Loss: 2.035163 LR: 0.00001042 [06:38:56] Epoch: 1 Batch: 1344/20099 (6.69%) Loss: 2.347615 LR: 0.00001047 [06:38:59] Epoch: 1 Batch: 1345/20099 (6.69%) Loss: 2.542047 LR: 0.00001047 [06:39:02] Epoch: 1 Batch: 1346/20099 (6.70%) Loss: 2.217592 LR: 0.00001047 [06:39:05] Epoch: 1 Batch: 1347/20099 (6.70%) Loss: 2.158413 LR: 0.00001047 [06:39:08] Epoch: 1 Batch: 1348/20099 (6.71%) Loss: 2.359446 LR: 0.00001047 [06:39:12] Epoch: 1 Batch: 1349/20099 (6.71%) Loss: 2.324864 LR: 0.00001047 [06:39:15] Epoch: 1 Batch: 1350/20099 (6.72%) Loss: 2.537016 LR: 0.00001047 [06:39:18] Epoch: 1 Batch: 1351/20099 (6.72%) Loss: 2.143765 LR: 0.00001053 [06:39:21] Epoch: 1 Batch: 1352/20099 (6.73%) Loss: 2.270648 LR: 0.00001053 [06:39:24] Epoch: 1 Batch: 1353/20099 (6.73%) Loss: 2.378229 LR: 0.00001053 [06:39:27] Epoch: 1 Batch: 1354/20099 (6.74%) Loss: 2.306212 LR: 0.00001053 [06:39:30] 
Epoch: 1 Batch: 1355/20099 (6.74%) Loss: 2.260682 LR: 0.00001053 [06:39:33] Epoch: 1 Batch: 1356/20099 (6.75%) Loss: 2.290469 LR: 0.00001053 [06:39:36] Epoch: 1 Batch: 1357/20099 (6.75%) Loss: 2.127180 LR: 0.00001053 [06:39:39] Epoch: 1 Batch: 1358/20099 (6.76%) Loss: 1.924264 LR: 0.00001058 [06:39:43] Epoch: 1 Batch: 1359/20099 (6.76%) Loss: 2.220931 LR: 0.00001058 [06:39:46] Epoch: 1 Batch: 1360/20099 (6.77%) Loss: 2.320542 LR: 0.00001058 [06:39:49] Epoch: 1 Batch: 1361/20099 (6.77%) Loss: 2.457225 LR: 0.00001058 [06:39:52] Epoch: 1 Batch: 1362/20099 (6.78%) Loss: 2.309032 LR: 0.00001058 [06:39:55] Epoch: 1 Batch: 1363/20099 (6.78%) Loss: 2.455758 LR: 0.00001058 [06:39:58] Epoch: 1 Batch: 1364/20099 (6.79%) Loss: 2.299111 LR: 0.00001058 [06:40:01] Epoch: 1 Batch: 1365/20099 (6.79%) Loss: 2.207558 LR: 0.00001064 [06:40:04] Epoch: 1 Batch: 1366/20099 (6.80%) Loss: 2.328998 LR: 0.00001064 [06:40:07] Epoch: 1 Batch: 1367/20099 (6.80%) Loss: 2.279822 LR: 0.00001064 [06:40:10] Epoch: 1 Batch: 1368/20099 (6.81%) Loss: 2.323619 LR: 0.00001064 [06:40:13] Epoch: 1 Batch: 1369/20099 (6.81%) Loss: 2.546458 LR: 0.00001064 [06:40:16] Epoch: 1 Batch: 1370/20099 (6.82%) Loss: 2.667234 LR: 0.00001064 [06:40:20] Epoch: 1 Batch: 1371/20099 (6.82%) Loss: 2.670646 LR: 0.00001064 [06:40:23] Epoch: 1 Batch: 1372/20099 (6.83%) Loss: 2.473140 LR: 0.00001069 [06:40:26] Epoch: 1 Batch: 1373/20099 (6.83%) Loss: 2.103649 LR: 0.00001069 [06:40:29] Epoch: 1 Batch: 1374/20099 (6.84%) Loss: 2.178277 LR: 0.00001069 [06:40:32] Epoch: 1 Batch: 1375/20099 (6.84%) Loss: 2.793400 LR: 0.00001069 [06:40:35] Epoch: 1 Batch: 1376/20099 (6.85%) Loss: 2.210613 LR: 0.00001069 [06:40:38] Epoch: 1 Batch: 1377/20099 (6.85%) Loss: 2.590357 LR: 0.00001069 [06:40:41] Epoch: 1 Batch: 1378/20099 (6.86%) Loss: 2.690386 LR: 0.00001069 [06:40:44] Epoch: 1 Batch: 1379/20099 (6.86%) Loss: 2.589483 LR: 0.00001075 [06:40:47] Epoch: 1 Batch: 1380/20099 (6.87%) Loss: 2.299586 LR: 0.00001075 [06:40:51] Epoch: 1 Batch: 
1381/20099 (6.87%) Loss: 2.515357 LR: 0.00001075 [06:40:54] Epoch: 1 Batch: 1382/20099 (6.88%) Loss: 2.652035 LR: 0.00001075 [06:40:57] Epoch: 1 Batch: 1383/20099 (6.88%) Loss: 1.913829 LR: 0.00001075 [06:41:00] Epoch: 1 Batch: 1384/20099 (6.89%) Loss: 2.533218 LR: 0.00001075 [06:41:03] Epoch: 1 Batch: 1385/20099 (6.89%) Loss: 2.327100 LR: 0.00001075 [06:41:06] Epoch: 1 Batch: 1386/20099 (6.90%) Loss: 2.287391 LR: 0.00001080 [06:41:09] Epoch: 1 Batch: 1387/20099 (6.90%) Loss: 2.145640 LR: 0.00001080 [06:41:12] Epoch: 1 Batch: 1388/20099 (6.91%) Loss: 2.271015 LR: 0.00001080 [06:41:15] Epoch: 1 Batch: 1389/20099 (6.91%) Loss: 2.267975 LR: 0.00001080 [06:41:18] Epoch: 1 Batch: 1390/20099 (6.92%) Loss: 2.553367 LR: 0.00001080 [06:41:22] Epoch: 1 Batch: 1391/20099 (6.92%) Loss: 2.401686 LR: 0.00001080 [06:41:25] Epoch: 1 Batch: 1392/20099 (6.93%) Loss: 2.445266 LR: 0.00001080 [06:41:28] Epoch: 1 Batch: 1393/20099 (6.93%) Loss: 2.359893 LR: 0.00001085 [06:41:31] Epoch: 1 Batch: 1394/20099 (6.94%) Loss: 2.159885 LR: 0.00001085 [06:41:34] Epoch: 1 Batch: 1395/20099 (6.94%) Loss: 2.557048 LR: 0.00001085 [06:41:37] Epoch: 1 Batch: 1396/20099 (6.95%) Loss: 2.612354 LR: 0.00001085 [06:41:40] Epoch: 1 Batch: 1397/20099 (6.95%) Loss: 2.457088 LR: 0.00001085 [06:41:43] Epoch: 1 Batch: 1398/20099 (6.96%) Loss: 2.703312 LR: 0.00001085 [06:41:46] Epoch: 1 Batch: 1399/20099 (6.96%) Loss: 2.248981 LR: 0.00001085 [06:41:53] >> Temp checkpoint saved: epoch1_step1400, size: 0.1693 GB [06:41:53] Epoch: 1 Batch: 1400/20099 (6.97%) Loss: 2.377447 LR: 0.00001091 [06:41:56] Epoch: 1 Batch: 1401/20099 (6.97%) Loss: 2.581002 LR: 0.00001091 [06:41:59] Epoch: 1 Batch: 1402/20099 (6.98%) Loss: 2.312774 LR: 0.00001091 [06:42:02] Epoch: 1 Batch: 1403/20099 (6.98%) Loss: 2.354105 LR: 0.00001091 [06:42:05] Epoch: 1 Batch: 1404/20099 (6.99%) Loss: 2.454820 LR: 0.00001091 [06:42:08] Epoch: 1 Batch: 1405/20099 (6.99%) Loss: 1.947499 LR: 0.00001091 [06:42:12] Epoch: 1 Batch: 1406/20099 (7.00%) Loss: 
2.740079 LR: 0.00001091 [06:42:15] Epoch: 1 Batch: 1407/20099 (7.00%) Loss: 2.408769 LR: 0.00001096 [06:42:18] Epoch: 1 Batch: 1408/20099 (7.01%) Loss: 2.120155 LR: 0.00001096 [06:42:21] Epoch: 1 Batch: 1409/20099 (7.01%) Loss: 2.305241 LR: 0.00001096 [06:42:24] Epoch: 1 Batch: 1410/20099 (7.02%) Loss: 2.000194 LR: 0.00001096 [06:42:27] Epoch: 1 Batch: 1411/20099 (7.02%) Loss: 2.381540 LR: 0.00001096 [06:42:30] Epoch: 1 Batch: 1412/20099 (7.03%) Loss: 2.338565 LR: 0.00001096 [06:42:33] Epoch: 1 Batch: 1413/20099 (7.03%) Loss: 2.278925 LR: 0.00001096 [06:42:37] Epoch: 1 Batch: 1414/20099 (7.04%) Loss: 2.461100 LR: 0.00001102 [06:42:40] Epoch: 1 Batch: 1415/20099 (7.04%) Loss: 2.416693 LR: 0.00001102 [06:42:43] Epoch: 1 Batch: 1416/20099 (7.05%) Loss: 2.642253 LR: 0.00001102 [06:42:46] Epoch: 1 Batch: 1417/20099 (7.05%) Loss: 2.188291 LR: 0.00001102 [06:42:49] Epoch: 1 Batch: 1418/20099 (7.06%) Loss: 2.598341 LR: 0.00001102 [06:42:52] Epoch: 1 Batch: 1419/20099 (7.06%) Loss: 2.553317 LR: 0.00001102 [06:42:55] Epoch: 1 Batch: 1420/20099 (7.07%) Loss: 2.200788 LR: 0.00001102 [06:42:58] Epoch: 1 Batch: 1421/20099 (7.07%) Loss: 2.358859 LR: 0.00001107 [06:43:01] Epoch: 1 Batch: 1422/20099 (7.07%) Loss: 2.376953 LR: 0.00001107 [06:43:04] Epoch: 1 Batch: 1423/20099 (7.08%) Loss: 2.430365 LR: 0.00001107 [06:43:08] Epoch: 1 Batch: 1424/20099 (7.08%) Loss: 2.331782 LR: 0.00001107 [06:43:11] Epoch: 1 Batch: 1425/20099 (7.09%) Loss: 2.395746 LR: 0.00001107 [06:43:14] Epoch: 1 Batch: 1426/20099 (7.09%) Loss: 2.209054 LR: 0.00001107 [06:43:17] Epoch: 1 Batch: 1427/20099 (7.10%) Loss: 2.355356 LR: 0.00001107 [06:43:20] Epoch: 1 Batch: 1428/20099 (7.10%) Loss: 2.361518 LR: 0.00001113 [06:43:23] Epoch: 1 Batch: 1429/20099 (7.11%) Loss: 2.321869 LR: 0.00001113 [06:43:26] Epoch: 1 Batch: 1430/20099 (7.11%) Loss: 2.313463 LR: 0.00001113 [06:43:29] Epoch: 1 Batch: 1431/20099 (7.12%) Loss: 2.292091 LR: 0.00001113 [06:43:32] Epoch: 1 Batch: 1432/20099 (7.12%) Loss: 2.081513 LR: 0.00001113 
[06:43:35] Epoch: 1 Batch: 1433/20099 (7.13%) Loss: 2.506223 LR: 0.00001113 [06:43:38] Epoch: 1 Batch: 1434/20099 (7.13%) Loss: 2.399392 LR: 0.00001113 [06:43:41] Epoch: 1 Batch: 1435/20099 (7.14%) Loss: 2.423990 LR: 0.00001118 [06:43:45] Epoch: 1 Batch: 1436/20099 (7.14%) Loss: 2.565485 LR: 0.00001118 [06:43:48] Epoch: 1 Batch: 1437/20099 (7.15%) Loss: 2.371872 LR: 0.00001118 [06:43:51] Epoch: 1 Batch: 1438/20099 (7.15%) Loss: 2.236276 LR: 0.00001118 [06:43:54] Epoch: 1 Batch: 1439/20099 (7.16%) Loss: 2.274421 LR: 0.00001118 [06:43:57] Epoch: 1 Batch: 1440/20099 (7.16%) Loss: 2.341462 LR: 0.00001118 [06:44:00] Epoch: 1 Batch: 1441/20099 (7.17%) Loss: 2.330958 LR: 0.00001118 [06:44:03] Epoch: 1 Batch: 1442/20099 (7.17%) Loss: 2.468582 LR: 0.00001124 [06:44:06] Epoch: 1 Batch: 1443/20099 (7.18%) Loss: 1.979831 LR: 0.00001124 [06:44:09] Epoch: 1 Batch: 1444/20099 (7.18%) Loss: 2.286430 LR: 0.00001124 [06:44:12] Epoch: 1 Batch: 1445/20099 (7.19%) Loss: 1.929819 LR: 0.00001124 [06:44:16] Epoch: 1 Batch: 1446/20099 (7.19%) Loss: 2.218487 LR: 0.00001124 [06:44:19] Epoch: 1 Batch: 1447/20099 (7.20%) Loss: 2.496862 LR: 0.00001124 [06:44:22] Epoch: 1 Batch: 1448/20099 (7.20%) Loss: 2.568574 LR: 0.00001124 [06:44:25] Epoch: 1 Batch: 1449/20099 (7.21%) Loss: 2.309966 LR: 0.00001129 [06:44:28] Epoch: 1 Batch: 1450/20099 (7.21%) Loss: 2.352157 LR: 0.00001129 [06:44:31] Epoch: 1 Batch: 1451/20099 (7.22%) Loss: 2.550408 LR: 0.00001129 [06:44:34] Epoch: 1 Batch: 1452/20099 (7.22%) Loss: 2.310077 LR: 0.00001129 [06:44:37] Epoch: 1 Batch: 1453/20099 (7.23%) Loss: 2.359107 LR: 0.00001129 [06:44:40] Epoch: 1 Batch: 1454/20099 (7.23%) Loss: 2.288473 LR: 0.00001129 [06:44:44] Epoch: 1 Batch: 1455/20099 (7.24%) Loss: 2.505878 LR: 0.00001129 [06:44:47] Epoch: 1 Batch: 1456/20099 (7.24%) Loss: 2.398846 LR: 0.00001135 [06:44:50] Epoch: 1 Batch: 1457/20099 (7.25%) Loss: 2.437691 LR: 0.00001135 [06:44:53] Epoch: 1 Batch: 1458/20099 (7.25%) Loss: 2.500881 LR: 0.00001135 [06:44:56] Epoch: 1 
Batch: 1459/20099 (7.26%) Loss: 2.546563 LR: 0.00001135 [06:44:59] Epoch: 1 Batch: 1460/20099 (7.26%) Loss: 2.515837 LR: 0.00001135 [06:45:02] Epoch: 1 Batch: 1461/20099 (7.27%) Loss: 2.236443 LR: 0.00001135 [06:45:05] Epoch: 1 Batch: 1462/20099 (7.27%) Loss: 2.471182 LR: 0.00001135 [06:45:08] Epoch: 1 Batch: 1463/20099 (7.28%) Loss: 2.366965 LR: 0.00001140 [06:45:11] Epoch: 1 Batch: 1464/20099 (7.28%) Loss: 2.186170 LR: 0.00001140 [06:45:15] Epoch: 1 Batch: 1465/20099 (7.29%) Loss: 2.634935 LR: 0.00001140 [06:45:18] Epoch: 1 Batch: 1466/20099 (7.29%) Loss: 2.204321 LR: 0.00001140 [06:45:21] Epoch: 1 Batch: 1467/20099 (7.30%) Loss: 2.414025 LR: 0.00001140 [06:45:24] Epoch: 1 Batch: 1468/20099 (7.30%) Loss: 2.527127 LR: 0.00001140 [06:45:27] Epoch: 1 Batch: 1469/20099 (7.31%) Loss: 2.164903 LR: 0.00001140 [06:45:30] Epoch: 1 Batch: 1470/20099 (7.31%) Loss: 2.337158 LR: 0.00001145 [06:45:33] Epoch: 1 Batch: 1471/20099 (7.32%) Loss: 2.225142 LR: 0.00001145 [06:45:36] Epoch: 1 Batch: 1472/20099 (7.32%) Loss: 2.291492 LR: 0.00001145 [06:45:39] Epoch: 1 Batch: 1473/20099 (7.33%) Loss: 2.551913 LR: 0.00001145 [06:45:42] Epoch: 1 Batch: 1474/20099 (7.33%) Loss: 2.183409 LR: 0.00001145 [06:45:45] Epoch: 1 Batch: 1475/20099 (7.34%) Loss: 2.149106 LR: 0.00001145 [06:45:48] Epoch: 1 Batch: 1476/20099 (7.34%) Loss: 2.543639 LR: 0.00001145 [06:45:52] Epoch: 1 Batch: 1477/20099 (7.35%) Loss: 2.304509 LR: 0.00001151 [06:45:55] Epoch: 1 Batch: 1478/20099 (7.35%) Loss: 2.353556 LR: 0.00001151 [06:45:58] Epoch: 1 Batch: 1479/20099 (7.36%) Loss: 2.590397 LR: 0.00001151 [06:46:01] Epoch: 1 Batch: 1480/20099 (7.36%) Loss: 2.676801 LR: 0.00001151 [06:46:04] Epoch: 1 Batch: 1481/20099 (7.37%) Loss: 2.001445 LR: 0.00001151 [06:46:07] Epoch: 1 Batch: 1482/20099 (7.37%) Loss: 2.574193 LR: 0.00001151 [06:46:10] Epoch: 1 Batch: 1483/20099 (7.38%) Loss: 2.075574 LR: 0.00001151 [06:46:13] Epoch: 1 Batch: 1484/20099 (7.38%) Loss: 2.409600 LR: 0.00001156 [06:46:16] Epoch: 1 Batch: 1485/20099 
(7.39%) Loss: 2.394545 LR: 0.00001156 [06:46:19] Epoch: 1 Batch: 1486/20099 (7.39%) Loss: 2.402910 LR: 0.00001156 [06:46:22] Epoch: 1 Batch: 1487/20099 (7.40%) Loss: 2.227754 LR: 0.00001156 [06:46:26] Epoch: 1 Batch: 1488/20099 (7.40%) Loss: 2.157104 LR: 0.00001156 [06:46:29] Epoch: 1 Batch: 1489/20099 (7.41%) Loss: 2.407466 LR: 0.00001156 [06:46:32] Epoch: 1 Batch: 1490/20099 (7.41%) Loss: 2.390683 LR: 0.00001156 [06:46:35] Epoch: 1 Batch: 1491/20099 (7.42%) Loss: 2.129061 LR: 0.00001162 [06:46:38] Epoch: 1 Batch: 1492/20099 (7.42%) Loss: 2.019541 LR: 0.00001162 [06:46:41] Epoch: 1 Batch: 1493/20099 (7.43%) Loss: 2.308099 LR: 0.00001162 [06:46:44] Epoch: 1 Batch: 1494/20099 (7.43%) Loss: 2.048505 LR: 0.00001162 [06:46:47] Epoch: 1 Batch: 1495/20099 (7.44%) Loss: 2.472794 LR: 0.00001162 [06:46:50] Epoch: 1 Batch: 1496/20099 (7.44%) Loss: 2.386614 LR: 0.00001162 [06:46:53] Epoch: 1 Batch: 1497/20099 (7.45%) Loss: 2.411885 LR: 0.00001162 [06:46:57] Epoch: 1 Batch: 1498/20099 (7.45%) Loss: 2.209210 LR: 0.00001167 [06:47:00] Epoch: 1 Batch: 1499/20099 (7.46%) Loss: 2.161344 LR: 0.00001167 [06:47:03] >> Evaluating batch 0 [06:47:04] >> Evaluating batch 1 [06:47:05] >> Evaluating batch 2 [06:47:07] >> Evaluating batch 3 [06:47:08] >> Evaluating batch 4 [06:47:09] >> Evaluating batch 5 [06:47:10] >> Evaluating batch 6 [06:47:12] >> Evaluating batch 7 [06:47:13] >> Evaluating batch 8 [06:47:14] >> Evaluating batch 9 [06:47:15] >> Evaluating batch 10 [06:47:17] >> Evaluating batch 11 [06:47:18] >> Evaluating batch 12 [06:47:19] >> Evaluating batch 13 [06:47:20] >> Evaluating batch 14 [06:47:21] >> Evaluating batch 15 [06:47:22] >> Evaluating batch 16 [06:47:23] Epoch: 1 Step: 1500/20099 Evaluation: [06:47:23] Avg Loss Since Last Eval: 2.3877 Val Loss: 2.4055 Validation loss delta: -0.1438 Perplexity: 11.0837 LR: 0.00001167 [06:47:27] >> Checkpoint saved: epoch1_step1500, size: 0.1693 GB [06:47:27] Epoch: 1 Batch: 1500/20099 (7.46%) Loss: 2.417127 LR: 0.00001167 
[06:47:30] Epoch: 1 Batch: 1501/20099 (7.47%) Loss: 2.424207 LR: 0.00001167 [06:47:33] Epoch: 1 Batch: 1502/20099 (7.47%) Loss: 2.784582 LR: 0.00001167 [06:47:36] Epoch: 1 Batch: 1503/20099 (7.48%) Loss: 2.551846 LR: 0.00001167 [06:47:39] Epoch: 1 Batch: 1504/20099 (7.48%) Loss: 2.496992 LR: 0.00001167 [06:47:42] Epoch: 1 Batch: 1505/20099 (7.49%) Loss: 2.135157 LR: 0.00001173 [06:47:45] Epoch: 1 Batch: 1506/20099 (7.49%) Loss: 2.426564 LR: 0.00001173 [06:47:48] Epoch: 1 Batch: 1507/20099 (7.50%) Loss: 2.139047 LR: 0.00001173 [06:47:51] Epoch: 1 Batch: 1508/20099 (7.50%) Loss: 2.304374 LR: 0.00001173 [06:47:55] Epoch: 1 Batch: 1509/20099 (7.51%) Loss: 2.576912 LR: 0.00001173 [06:47:58] Epoch: 1 Batch: 1510/20099 (7.51%) Loss: 2.379657 LR: 0.00001173 [06:48:01] Epoch: 1 Batch: 1511/20099 (7.52%) Loss: 2.058109 LR: 0.00001173 [06:48:04] Epoch: 1 Batch: 1512/20099 (7.52%) Loss: 2.568503 LR: 0.00001178 [06:48:07] Epoch: 1 Batch: 1513/20099 (7.53%) Loss: 2.528423 LR: 0.00001178 [06:48:10] Epoch: 1 Batch: 1514/20099 (7.53%) Loss: 2.435926 LR: 0.00001178 [06:48:13] Epoch: 1 Batch: 1515/20099 (7.54%) Loss: 2.467796 LR: 0.00001178 [06:48:16] Epoch: 1 Batch: 1516/20099 (7.54%) Loss: 2.436476 LR: 0.00001178 [06:48:19] Epoch: 1 Batch: 1517/20099 (7.55%) Loss: 2.317236 LR: 0.00001178 [06:48:23] Epoch: 1 Batch: 1518/20099 (7.55%) Loss: 2.113345 LR: 0.00001178 [06:48:26] Epoch: 1 Batch: 1519/20099 (7.56%) Loss: 2.191586 LR: 0.00001184 [06:48:29] Epoch: 1 Batch: 1520/20099 (7.56%) Loss: 2.143368 LR: 0.00001184 [06:48:32] Epoch: 1 Batch: 1521/20099 (7.57%) Loss: 2.187930 LR: 0.00001184 [06:48:35] Epoch: 1 Batch: 1522/20099 (7.57%) Loss: 2.517765 LR: 0.00001184 [06:48:38] Epoch: 1 Batch: 1523/20099 (7.58%) Loss: 2.277214 LR: 0.00001184 [06:48:41] Epoch: 1 Batch: 1524/20099 (7.58%) Loss: 2.627007 LR: 0.00001184 [06:48:44] Epoch: 1 Batch: 1525/20099 (7.59%) Loss: 2.509440 LR: 0.00001184 [06:48:47] Epoch: 1 Batch: 1526/20099 (7.59%) Loss: 2.297620 LR: 0.00001189 [06:48:50] Epoch: 1 
Batch: 1527/20099 (7.60%) Loss: 2.228506 LR: 0.00001189 [06:48:53] Epoch: 1 Batch: 1528/20099 (7.60%) Loss: 2.365267 LR: 0.00001189 [06:48:57] Epoch: 1 Batch: 1529/20099 (7.61%) Loss: 2.378869 LR: 0.00001189 [06:49:00] Epoch: 1 Batch: 1530/20099 (7.61%) Loss: 2.581509 LR: 0.00001189 [06:49:03] Epoch: 1 Batch: 1531/20099 (7.62%) Loss: 2.241691 LR: 0.00001189 [06:49:06] Epoch: 1 Batch: 1532/20099 (7.62%) Loss: 2.350325 LR: 0.00001189 [06:49:09] Epoch: 1 Batch: 1533/20099 (7.63%) Loss: 2.250481 LR: 0.00001195 [06:49:12] Epoch: 1 Batch: 1534/20099 (7.63%) Loss: 2.032722 LR: 0.00001195 [06:49:15] Epoch: 1 Batch: 1535/20099 (7.64%) Loss: 2.060870 LR: 0.00001195 [06:49:18] Epoch: 1 Batch: 1536/20099 (7.64%) Loss: 2.344065 LR: 0.00001195 [06:49:21] Epoch: 1 Batch: 1537/20099 (7.65%) Loss: 2.419174 LR: 0.00001195 [06:49:24] Epoch: 1 Batch: 1538/20099 (7.65%) Loss: 2.419722 LR: 0.00001195 [06:49:28] Epoch: 1 Batch: 1539/20099 (7.66%) Loss: 2.392572 LR: 0.00001195 [06:49:31] Epoch: 1 Batch: 1540/20099 (7.66%) Loss: 2.334383 LR: 0.00001200 [06:49:34] Epoch: 1 Batch: 1541/20099 (7.67%) Loss: 2.313348 LR: 0.00001200 [06:49:37] Epoch: 1 Batch: 1542/20099 (7.67%) Loss: 2.380042 LR: 0.00001200 [06:49:40] Epoch: 1 Batch: 1543/20099 (7.68%) Loss: 2.461760 LR: 0.00001200 [06:49:43] Epoch: 1 Batch: 1544/20099 (7.68%) Loss: 2.149881 LR: 0.00001200 [06:49:46] Epoch: 1 Batch: 1545/20099 (7.69%) Loss: 2.511050 LR: 0.00001200 [06:49:49] Epoch: 1 Batch: 1546/20099 (7.69%) Loss: 2.311547 LR: 0.00001200 [06:49:52] Epoch: 1 Batch: 1547/20099 (7.70%) Loss: 2.359907 LR: 0.00001205 [06:49:55] Epoch: 1 Batch: 1548/20099 (7.70%) Loss: 2.299057 LR: 0.00001205 [06:49:59] Epoch: 1 Batch: 1549/20099 (7.71%) Loss: 2.622379 LR: 0.00001205 [06:50:02] Epoch: 1 Batch: 1550/20099 (7.71%) Loss: 2.213366 LR: 0.00001205 [06:50:05] Epoch: 1 Batch: 1551/20099 (7.72%) Loss: 1.909202 LR: 0.00001205 [06:50:08] Epoch: 1 Batch: 1552/20099 (7.72%) Loss: 2.435736 LR: 0.00001205 [06:50:11] Epoch: 1 Batch: 1553/20099 
(7.73%) Loss: 2.653101 LR: 0.00001205 [06:50:14] Epoch: 1 Batch: 1554/20099 (7.73%) Loss: 2.311483 LR: 0.00001211 [06:50:17] Epoch: 1 Batch: 1555/20099 (7.74%) Loss: 2.180030 LR: 0.00001211 [06:50:20] Epoch: 1 Batch: 1556/20099 (7.74%) Loss: 2.183302 LR: 0.00001211 [06:50:23] Epoch: 1 Batch: 1557/20099 (7.75%) Loss: 1.936146 LR: 0.00001211 [06:50:26] Epoch: 1 Batch: 1558/20099 (7.75%) Loss: 2.386286 LR: 0.00001211 [06:50:30] Epoch: 1 Batch: 1559/20099 (7.76%) Loss: 2.194422 LR: 0.00001211 [06:50:33] Epoch: 1 Batch: 1560/20099 (7.76%) Loss: 2.626185 LR: 0.00001211 [06:50:36] Epoch: 1 Batch: 1561/20099 (7.77%) Loss: 2.297762 LR: 0.00001216 [06:50:39] Epoch: 1 Batch: 1562/20099 (7.77%) Loss: 2.223624 LR: 0.00001216 [06:50:42] Epoch: 1 Batch: 1563/20099 (7.78%) Loss: 2.231692 LR: 0.00001216 [06:50:45] Epoch: 1 Batch: 1564/20099 (7.78%) Loss: 2.398568 LR: 0.00001216 [06:50:48] Epoch: 1 Batch: 1565/20099 (7.79%) Loss: 2.477699 LR: 0.00001216 [06:50:51] Epoch: 1 Batch: 1566/20099 (7.79%) Loss: 2.346787 LR: 0.00001216 [06:50:54] Epoch: 1 Batch: 1567/20099 (7.80%) Loss: 2.458641 LR: 0.00001216 [06:50:58] Epoch: 1 Batch: 1568/20099 (7.80%) Loss: 2.256150 LR: 0.00001222 [06:51:01] Epoch: 1 Batch: 1569/20099 (7.81%) Loss: 2.306362 LR: 0.00001222 [06:51:04] Epoch: 1 Batch: 1570/20099 (7.81%) Loss: 2.053053 LR: 0.00001222 [06:51:07] Epoch: 1 Batch: 1571/20099 (7.82%) Loss: 2.450536 LR: 0.00001222 [06:51:10] Epoch: 1 Batch: 1572/20099 (7.82%) Loss: 2.270018 LR: 0.00001222 [06:51:13] Epoch: 1 Batch: 1573/20099 (7.83%) Loss: 2.215135 LR: 0.00001222 [06:51:16] Epoch: 1 Batch: 1574/20099 (7.83%) Loss: 2.505959 LR: 0.00001222 [06:51:19] Epoch: 1 Batch: 1575/20099 (7.84%) Loss: 2.287878 LR: 0.00001227 [06:51:22] Epoch: 1 Batch: 1576/20099 (7.84%) Loss: 2.136587 LR: 0.00001227 [06:51:25] Epoch: 1 Batch: 1577/20099 (7.85%) Loss: 2.470381 LR: 0.00001227 [06:51:29] Epoch: 1 Batch: 1578/20099 (7.85%) Loss: 2.281889 LR: 0.00001227 [06:51:32] Epoch: 1 Batch: 1579/20099 (7.86%) Loss: 2.124431 
LR: 0.00001227 [06:51:35] Epoch: 1 Batch: 1580/20099 (7.86%) Loss: 2.418377 LR: 0.00001227 [06:51:38] Epoch: 1 Batch: 1581/20099 (7.87%) Loss: 2.271000 LR: 0.00001227 [06:51:41] Epoch: 1 Batch: 1582/20099 (7.87%) Loss: 2.287450 LR: 0.00001233 [06:51:44] Epoch: 1 Batch: 1583/20099 (7.88%) Loss: 2.340910 LR: 0.00001233 [06:51:47] Epoch: 1 Batch: 1584/20099 (7.88%) Loss: 2.590541 LR: 0.00001233 [06:51:50] Epoch: 1 Batch: 1585/20099 (7.89%) Loss: 2.343602 LR: 0.00001233 [06:51:53] Epoch: 1 Batch: 1586/20099 (7.89%) Loss: 2.606321 LR: 0.00001233 [06:51:56] Epoch: 1 Batch: 1587/20099 (7.90%) Loss: 2.511487 LR: 0.00001233 [06:52:00] Epoch: 1 Batch: 1588/20099 (7.90%) Loss: 2.258364 LR: 0.00001233 [06:52:03] Epoch: 1 Batch: 1589/20099 (7.91%) Loss: 2.827784 LR: 0.00001238 [06:52:06] Epoch: 1 Batch: 1590/20099 (7.91%) Loss: 2.164371 LR: 0.00001238 [06:52:09] Epoch: 1 Batch: 1591/20099 (7.92%) Loss: 2.174267 LR: 0.00001238 [06:52:12] Epoch: 1 Batch: 1592/20099 (7.92%) Loss: 2.491860 LR: 0.00001238 [06:52:15] Epoch: 1 Batch: 1593/20099 (7.93%) Loss: 2.291612 LR: 0.00001238 [06:52:18] Epoch: 1 Batch: 1594/20099 (7.93%) Loss: 2.423944 LR: 0.00001238 [06:52:21] Epoch: 1 Batch: 1595/20099 (7.94%) Loss: 2.093985 LR: 0.00001238 [06:52:24] Epoch: 1 Batch: 1596/20099 (7.94%) Loss: 2.354590 LR: 0.00001244 [06:52:27] Epoch: 1 Batch: 1597/20099 (7.95%) Loss: 2.373621 LR: 0.00001244 [06:52:31] Epoch: 1 Batch: 1598/20099 (7.95%) Loss: 2.445650 LR: 0.00001244 [06:52:34] Epoch: 1 Batch: 1599/20099 (7.96%) Loss: 2.492216 LR: 0.00001244 [06:52:40] >> Temp checkpoint saved: epoch1_step1600, size: 0.1693 GB [06:52:40] Epoch: 1 Batch: 1600/20099 (7.96%) Loss: 2.247180 LR: 0.00001244 [06:52:44] Epoch: 1 Batch: 1601/20099 (7.97%) Loss: 2.532938 LR: 0.00001244 [06:52:47] Epoch: 1 Batch: 1602/20099 (7.97%) Loss: 2.000120 LR: 0.00001244 [06:52:50] Epoch: 1 Batch: 1603/20099 (7.98%) Loss: 2.299903 LR: 0.00001249 [06:52:53] Epoch: 1 Batch: 1604/20099 (7.98%) Loss: 2.182782 LR: 0.00001249 [06:52:56] 
Epoch: 1 Batch: 1605/20099 (7.99%) Loss: 2.360156 LR: 0.00001249 [06:52:59] Epoch: 1 Batch: 1606/20099 (7.99%) Loss: 2.619795 LR: 0.00001249 [06:53:02] Epoch: 1 Batch: 1607/20099 (8.00%) Loss: 2.502345 LR: 0.00001249 [06:53:05] Epoch: 1 Batch: 1608/20099 (8.00%) Loss: 2.409710 LR: 0.00001249 [06:53:08] Epoch: 1 Batch: 1609/20099 (8.01%) Loss: 2.194351 LR: 0.00001249 [06:53:12] Epoch: 1 Batch: 1610/20099 (8.01%) Loss: 2.379601 LR: 0.00001255 [06:53:15] Epoch: 1 Batch: 1611/20099 (8.02%) Loss: 2.438886 LR: 0.00001255 [06:53:18] Epoch: 1 Batch: 1612/20099 (8.02%) Loss: 2.301873 LR: 0.00001255 [06:53:21] Epoch: 1 Batch: 1613/20099 (8.03%) Loss: 2.217442 LR: 0.00001255 [06:53:24] Epoch: 1 Batch: 1614/20099 (8.03%) Loss: 2.153318 LR: 0.00001255 [06:53:27] Epoch: 1 Batch: 1615/20099 (8.04%) Loss: 2.458114 LR: 0.00001255 [06:53:30] Epoch: 1 Batch: 1616/20099 (8.04%) Loss: 2.129455 LR: 0.00001255 [06:53:33] Epoch: 1 Batch: 1617/20099 (8.05%) Loss: 2.181606 LR: 0.00001260 [06:53:36] Epoch: 1 Batch: 1618/20099 (8.05%) Loss: 2.135801 LR: 0.00001260 [06:53:40] Epoch: 1 Batch: 1619/20099 (8.06%) Loss: 2.172067 LR: 0.00001260 [06:53:43] Epoch: 1 Batch: 1620/20099 (8.06%) Loss: 2.315743 LR: 0.00001260 [06:53:46] Epoch: 1 Batch: 1621/20099 (8.07%) Loss: 2.308822 LR: 0.00001260 [06:53:49] Epoch: 1 Batch: 1622/20099 (8.07%) Loss: 2.357031 LR: 0.00001260 [06:53:52] Epoch: 1 Batch: 1623/20099 (8.08%) Loss: 2.424401 LR: 0.00001260 [06:53:55] Epoch: 1 Batch: 1624/20099 (8.08%) Loss: 2.049492 LR: 0.00001265 [06:53:58] Epoch: 1 Batch: 1625/20099 (8.08%) Loss: 2.459597 LR: 0.00001265 [06:54:01] Epoch: 1 Batch: 1626/20099 (8.09%) Loss: 2.223011 LR: 0.00001265 [06:54:04] Epoch: 1 Batch: 1627/20099 (8.09%) Loss: 2.541628 LR: 0.00001265 [06:54:07] Epoch: 1 Batch: 1628/20099 (8.10%) Loss: 2.281045 LR: 0.00001265 [06:54:10] Epoch: 1 Batch: 1629/20099 (8.10%) Loss: 2.351700 LR: 0.00001265 [06:54:13] Epoch: 1 Batch: 1630/20099 (8.11%) Loss: 2.367495 LR: 0.00001265 [06:54:17] Epoch: 1 Batch: 
1631/20099 (8.11%) Loss: 2.382348 LR: 0.00001271 [06:54:20] Epoch: 1 Batch: 1632/20099 (8.12%) Loss: 2.244730 LR: 0.00001271 [06:54:23] Epoch: 1 Batch: 1633/20099 (8.12%) Loss: 2.424586 LR: 0.00001271 [06:54:26] Epoch: 1 Batch: 1634/20099 (8.13%) Loss: 2.215753 LR: 0.00001271 [06:54:29] Epoch: 1 Batch: 1635/20099 (8.13%) Loss: 2.221030 LR: 0.00001271 [06:54:32] Epoch: 1 Batch: 1636/20099 (8.14%) Loss: 2.340493 LR: 0.00001271 [06:54:35] Epoch: 1 Batch: 1637/20099 (8.14%) Loss: 2.121684 LR: 0.00001271 [06:54:38] Epoch: 1 Batch: 1638/20099 (8.15%) Loss: 2.093879 LR: 0.00001276 [06:54:41] Epoch: 1 Batch: 1639/20099 (8.15%) Loss: 2.374841 LR: 0.00001276 [06:54:44] Epoch: 1 Batch: 1640/20099 (8.16%) Loss: 2.215573 LR: 0.00001276 [06:54:48] Epoch: 1 Batch: 1641/20099 (8.16%) Loss: 2.306739 LR: 0.00001276 [06:54:51] Epoch: 1 Batch: 1642/20099 (8.17%) Loss: 2.601887 LR: 0.00001276 [06:54:54] Epoch: 1 Batch: 1643/20099 (8.17%) Loss: 2.668187 LR: 0.00001276 [06:54:57] Epoch: 1 Batch: 1644/20099 (8.18%) Loss: 2.091061 LR: 0.00001276 [06:55:00] Epoch: 1 Batch: 1645/20099 (8.18%) Loss: 2.246003 LR: 0.00001282 [06:55:03] Epoch: 1 Batch: 1646/20099 (8.19%) Loss: 2.541567 LR: 0.00001282 [06:55:06] Epoch: 1 Batch: 1647/20099 (8.19%) Loss: 2.412294 LR: 0.00001282 [06:55:09] Epoch: 1 Batch: 1648/20099 (8.20%) Loss: 2.392308 LR: 0.00001282 [06:55:12] Epoch: 1 Batch: 1649/20099 (8.20%) Loss: 2.200082 LR: 0.00001282 [06:55:15] Epoch: 1 Batch: 1650/20099 (8.21%) Loss: 2.262511 LR: 0.00001282 [06:55:19] Epoch: 1 Batch: 1651/20099 (8.21%) Loss: 2.407450 LR: 0.00001282 [06:55:22] Epoch: 1 Batch: 1652/20099 (8.22%) Loss: 2.109709 LR: 0.00001287 [06:55:25] Epoch: 1 Batch: 1653/20099 (8.22%) Loss: 2.426110 LR: 0.00001287 [06:55:28] Epoch: 1 Batch: 1654/20099 (8.23%) Loss: 2.084368 LR: 0.00001287 [06:55:31] Epoch: 1 Batch: 1655/20099 (8.23%) Loss: 2.338767 LR: 0.00001287 [06:55:34] Epoch: 1 Batch: 1656/20099 (8.24%) Loss: 2.367495 LR: 0.00001287 [06:55:37] Epoch: 1 Batch: 1657/20099 (8.24%) 
Loss: 2.529631 LR: 0.00001287 [06:55:40] Epoch: 1 Batch: 1658/20099 (8.25%) Loss: 2.194630 LR: 0.00001287 [06:55:43] Epoch: 1 Batch: 1659/20099 (8.25%) Loss: 1.926474 LR: 0.00001293 [06:55:46] Epoch: 1 Batch: 1660/20099 (8.26%) Loss: 2.714239 LR: 0.00001293 [06:55:50] Epoch: 1 Batch: 1661/20099 (8.26%) Loss: 2.178670 LR: 0.00001293 [06:55:53] Epoch: 1 Batch: 1662/20099 (8.27%) Loss: 2.215367 LR: 0.00001293 [06:55:56] Epoch: 1 Batch: 1663/20099 (8.27%) Loss: 2.620052 LR: 0.00001293 [06:55:59] Epoch: 1 Batch: 1664/20099 (8.28%) Loss: 2.282206 LR: 0.00001293 [06:56:02] Epoch: 1 Batch: 1665/20099 (8.28%) Loss: 2.213902 LR: 0.00001293 [06:56:05] Epoch: 1 Batch: 1666/20099 (8.29%) Loss: 2.659980 LR: 0.00001298 [06:56:08] Epoch: 1 Batch: 1667/20099 (8.29%) Loss: 2.686398 LR: 0.00001298 [06:56:11] Epoch: 1 Batch: 1668/20099 (8.30%) Loss: 2.345804 LR: 0.00001298 [06:56:14] Epoch: 1 Batch: 1669/20099 (8.30%) Loss: 2.009180 LR: 0.00001298 [06:56:17] Epoch: 1 Batch: 1670/20099 (8.31%) Loss: 2.521678 LR: 0.00001298 [06:56:20] Epoch: 1 Batch: 1671/20099 (8.31%) Loss: 2.400159 LR: 0.00001298 [06:56:23] Epoch: 1 Batch: 1672/20099 (8.32%) Loss: 1.950377 LR: 0.00001298 [06:56:27] Epoch: 1 Batch: 1673/20099 (8.32%) Loss: 2.502210 LR: 0.00001304 [06:56:30] Epoch: 1 Batch: 1674/20099 (8.33%) Loss: 2.126189 LR: 0.00001304 [06:56:33] Epoch: 1 Batch: 1675/20099 (8.33%) Loss: 2.321570 LR: 0.00001304 [06:56:36] Epoch: 1 Batch: 1676/20099 (8.34%) Loss: 2.253106 LR: 0.00001304 [06:56:39] Epoch: 1 Batch: 1677/20099 (8.34%) Loss: 2.454096 LR: 0.00001304 [06:56:42] Epoch: 1 Batch: 1678/20099 (8.35%) Loss: 2.856981 LR: 0.00001304 [06:56:45] Epoch: 1 Batch: 1679/20099 (8.35%) Loss: 2.030300 LR: 0.00001304 [06:56:48] Epoch: 1 Batch: 1680/20099 (8.36%) Loss: 2.440923 LR: 0.00001309 [06:56:51] Epoch: 1 Batch: 1681/20099 (8.36%) Loss: 2.089742 LR: 0.00001309 [06:56:54] Epoch: 1 Batch: 1682/20099 (8.37%) Loss: 2.201564 LR: 0.00001309 [06:56:58] Epoch: 1 Batch: 1683/20099 (8.37%) Loss: 2.214243 LR: 
0.00001309 [06:57:01] Epoch: 1 Batch: 1684/20099 (8.38%) Loss: 2.417551 LR: 0.00001309 [06:57:04] Epoch: 1 Batch: 1685/20099 (8.38%) Loss: 2.351271 LR: 0.00001309 [06:57:07] Epoch: 1 Batch: 1686/20099 (8.39%) Loss: 2.183200 LR: 0.00001309 [06:57:10] Epoch: 1 Batch: 1687/20099 (8.39%) Loss: 2.589788 LR: 0.00001315 [06:57:13] Epoch: 1 Batch: 1688/20099 (8.40%) Loss: 2.247130 LR: 0.00001315 [06:57:16] Epoch: 1 Batch: 1689/20099 (8.40%) Loss: 2.245478 LR: 0.00001315 [06:57:19] Epoch: 1 Batch: 1690/20099 (8.41%) Loss: 2.343208 LR: 0.00001315 [06:57:22] Epoch: 1 Batch: 1691/20099 (8.41%) Loss: 2.327599 LR: 0.00001315 [06:57:25] Epoch: 1 Batch: 1692/20099 (8.42%) Loss: 2.333018 LR: 0.00001315 [06:57:29] Epoch: 1 Batch: 1693/20099 (8.42%) Loss: 2.346905 LR: 0.00001315 [06:57:32] Epoch: 1 Batch: 1694/20099 (8.43%) Loss: 2.419308 LR: 0.00001320 [06:57:35] Epoch: 1 Batch: 1695/20099 (8.43%) Loss: 1.861662 LR: 0.00001320 [06:57:38] Epoch: 1 Batch: 1696/20099 (8.44%) Loss: 2.298707 LR: 0.00001320 [06:57:41] Epoch: 1 Batch: 1697/20099 (8.44%) Loss: 2.162012 LR: 0.00001320 [06:57:44] Epoch: 1 Batch: 1698/20099 (8.45%) Loss: 2.299673 LR: 0.00001320 [06:57:47] Epoch: 1 Batch: 1699/20099 (8.45%) Loss: 2.138160 LR: 0.00001320 [06:57:50] Epoch: 1 Batch: 1700/20099 (8.46%) Loss: 2.188778 LR: 0.00001320 [06:57:53] Epoch: 1 Batch: 1701/20099 (8.46%) Loss: 2.437657 LR: 0.00001325 [06:57:57] Epoch: 1 Batch: 1702/20099 (8.47%) Loss: 2.233498 LR: 0.00001325 [06:58:00] Epoch: 1 Batch: 1703/20099 (8.47%) Loss: 2.208044 LR: 0.00001325 [06:58:03] Epoch: 1 Batch: 1704/20099 (8.48%) Loss: 2.111284 LR: 0.00001325 [06:58:06] Epoch: 1 Batch: 1705/20099 (8.48%) Loss: 2.283358 LR: 0.00001325 [06:58:09] Epoch: 1 Batch: 1706/20099 (8.49%) Loss: 2.416652 LR: 0.00001325 [06:58:12] Epoch: 1 Batch: 1707/20099 (8.49%) Loss: 2.492845 LR: 0.00001325 [06:58:15] Epoch: 1 Batch: 1708/20099 (8.50%) Loss: 1.951697 LR: 0.00001331 [06:58:18] Epoch: 1 Batch: 1709/20099 (8.50%) Loss: 2.192699 LR: 0.00001331 [06:58:21] 
Epoch: 1 Batch: 1710/20099 (8.51%) Loss: 2.461122 LR: 0.00001331 [06:58:24] Epoch: 1 Batch: 1711/20099 (8.51%) Loss: 1.974747 LR: 0.00001331 [06:58:28] Epoch: 1 Batch: 1712/20099 (8.52%) Loss: 2.429329 LR: 0.00001331 [06:58:31] Epoch: 1 Batch: 1713/20099 (8.52%) Loss: 2.223940 LR: 0.00001331 [06:58:34] Epoch: 1 Batch: 1714/20099 (8.53%) Loss: 2.348989 LR: 0.00001331 [06:58:37] Epoch: 1 Batch: 1715/20099 (8.53%) Loss: 2.370657 LR: 0.00001336 [06:58:40] Epoch: 1 Batch: 1716/20099 (8.54%) Loss: 2.358052 LR: 0.00001336 [06:58:43] Epoch: 1 Batch: 1717/20099 (8.54%) Loss: 2.290869 LR: 0.00001336 [06:58:46] Epoch: 1 Batch: 1718/20099 (8.55%) Loss: 2.238388 LR: 0.00001336 [06:58:49] Epoch: 1 Batch: 1719/20099 (8.55%) Loss: 2.261013 LR: 0.00001336 [06:58:52] Epoch: 1 Batch: 1720/20099 (8.56%) Loss: 2.559283 LR: 0.00001336 [06:58:55] Epoch: 1 Batch: 1721/20099 (8.56%) Loss: 2.203809 LR: 0.00001336 [06:58:59] Epoch: 1 Batch: 1722/20099 (8.57%) Loss: 2.541089 LR: 0.00001342 [06:59:02] Epoch: 1 Batch: 1723/20099 (8.57%) Loss: 2.299240 LR: 0.00001342 [06:59:05] Epoch: 1 Batch: 1724/20099 (8.58%) Loss: 2.328654 LR: 0.00001342 [06:59:08] Epoch: 1 Batch: 1725/20099 (8.58%) Loss: 2.136559 LR: 0.00001342 [06:59:11] Epoch: 1 Batch: 1726/20099 (8.59%) Loss: 2.256575 LR: 0.00001342 [06:59:14] Epoch: 1 Batch: 1727/20099 (8.59%) Loss: 2.251299 LR: 0.00001342 [06:59:17] Epoch: 1 Batch: 1728/20099 (8.60%) Loss: 2.466022 LR: 0.00001342 [06:59:20] Epoch: 1 Batch: 1729/20099 (8.60%) Loss: 2.383904 LR: 0.00001347 [06:59:23] Epoch: 1 Batch: 1730/20099 (8.61%) Loss: 2.336459 LR: 0.00001347 [06:59:26] Epoch: 1 Batch: 1731/20099 (8.61%) Loss: 2.207716 LR: 0.00001347 [06:59:30] Epoch: 1 Batch: 1732/20099 (8.62%) Loss: 2.193822 LR: 0.00001347 [06:59:33] Epoch: 1 Batch: 1733/20099 (8.62%) Loss: 2.472135 LR: 0.00001347 [06:59:36] Epoch: 1 Batch: 1734/20099 (8.63%) Loss: 2.323449 LR: 0.00001347 [06:59:39] Epoch: 1 Batch: 1735/20099 (8.63%) Loss: 2.349814 LR: 0.00001347 [06:59:42] Epoch: 1 Batch: 
1736/20099 (8.64%) Loss: 2.389352 LR: 0.00001353 [06:59:45] Epoch: 1 Batch: 1737/20099 (8.64%) Loss: 2.172447 LR: 0.00001353 [06:59:48] Epoch: 1 Batch: 1738/20099 (8.65%) Loss: 2.547144 LR: 0.00001353 [06:59:51] Epoch: 1 Batch: 1739/20099 (8.65%) Loss: 2.121384 LR: 0.00001353 [06:59:54] Epoch: 1 Batch: 1740/20099 (8.66%) Loss: 2.347766 LR: 0.00001353 [06:59:57] Epoch: 1 Batch: 1741/20099 (8.66%) Loss: 2.570089 LR: 0.00001353 [07:00:00] Epoch: 1 Batch: 1742/20099 (8.67%) Loss: 2.158255 LR: 0.00001353 [07:00:04] Epoch: 1 Batch: 1743/20099 (8.67%) Loss: 2.358286 LR: 0.00001358 [07:00:07] Epoch: 1 Batch: 1744/20099 (8.68%) Loss: 2.560031 LR: 0.00001358 [07:00:10] Epoch: 1 Batch: 1745/20099 (8.68%) Loss: 2.156942 LR: 0.00001358 [07:00:13] Epoch: 1 Batch: 1746/20099 (8.69%) Loss: 2.125696 LR: 0.00001358 [07:00:16] Epoch: 1 Batch: 1747/20099 (8.69%) Loss: 2.457773 LR: 0.00001358 [07:00:19] Epoch: 1 Batch: 1748/20099 (8.70%) Loss: 2.443941 LR: 0.00001358 [07:00:22] Epoch: 1 Batch: 1749/20099 (8.70%) Loss: 2.541251 LR: 0.00001358 [07:00:25] Epoch: 1 Batch: 1750/20099 (8.71%) Loss: 2.464676 LR: 0.00001364 [07:00:28] Epoch: 1 Batch: 1751/20099 (8.71%) Loss: 2.066006 LR: 0.00001364 [07:00:31] Epoch: 1 Batch: 1752/20099 (8.72%) Loss: 2.263027 LR: 0.00001364 [07:00:35] Epoch: 1 Batch: 1753/20099 (8.72%) Loss: 2.372286 LR: 0.00001364 [07:00:38] Epoch: 1 Batch: 1754/20099 (8.73%) Loss: 2.431624 LR: 0.00001364 [07:00:41] Epoch: 1 Batch: 1755/20099 (8.73%) Loss: 2.099194 LR: 0.00001364 [07:00:44] Epoch: 1 Batch: 1756/20099 (8.74%) Loss: 1.992789 LR: 0.00001364 [07:00:47] Epoch: 1 Batch: 1757/20099 (8.74%) Loss: 2.504426 LR: 0.00001369 [07:00:50] Epoch: 1 Batch: 1758/20099 (8.75%) Loss: 2.034373 LR: 0.00001369 [07:00:53] Epoch: 1 Batch: 1759/20099 (8.75%) Loss: 2.576954 LR: 0.00001369 [07:00:56] Epoch: 1 Batch: 1760/20099 (8.76%) Loss: 2.412631 LR: 0.00001369 [07:00:59] Epoch: 1 Batch: 1761/20099 (8.76%) Loss: 2.325705 LR: 0.00001369 [07:01:02] Epoch: 1 Batch: 1762/20099 (8.77%) 
Loss: 2.370502 LR: 0.00001369 [07:01:05] Epoch: 1 Batch: 1763/20099 (8.77%) Loss: 2.317315 LR: 0.00001369 [07:01:09] Epoch: 1 Batch: 1764/20099 (8.78%) Loss: 2.312242 LR: 0.00001375 [07:01:12] Epoch: 1 Batch: 1765/20099 (8.78%) Loss: 1.719209 LR: 0.00001375 [07:01:15] Epoch: 1 Batch: 1766/20099 (8.79%) Loss: 2.155427 LR: 0.00001375 [07:01:18] Epoch: 1 Batch: 1767/20099 (8.79%) Loss: 2.381479 LR: 0.00001375 [07:01:21] Epoch: 1 Batch: 1768/20099 (8.80%) Loss: 2.558931 LR: 0.00001375 [07:01:24] Epoch: 1 Batch: 1769/20099 (8.80%) Loss: 2.362370 LR: 0.00001375 [07:01:27] Epoch: 1 Batch: 1770/20099 (8.81%) Loss: 2.087512 LR: 0.00001375 [07:01:30] Epoch: 1 Batch: 1771/20099 (8.81%) Loss: 2.326218 LR: 0.00001380 [07:01:33] Epoch: 1 Batch: 1772/20099 (8.82%) Loss: 2.741281 LR: 0.00001380 [07:01:36] Epoch: 1 Batch: 1773/20099 (8.82%) Loss: 2.302479 LR: 0.00001380 [07:01:40] Epoch: 1 Batch: 1774/20099 (8.83%) Loss: 2.098899 LR: 0.00001380 [07:01:43] Epoch: 1 Batch: 1775/20099 (8.83%) Loss: 2.131225 LR: 0.00001380 [07:01:46] Epoch: 1 Batch: 1776/20099 (8.84%) Loss: 2.246016 LR: 0.00001380 [07:01:49] Epoch: 1 Batch: 1777/20099 (8.84%) Loss: 2.279077 LR: 0.00001380 [07:01:52] Epoch: 1 Batch: 1778/20099 (8.85%) Loss: 2.300801 LR: 0.00001385 [07:01:55] Epoch: 1 Batch: 1779/20099 (8.85%) Loss: 2.395876 LR: 0.00001385 [07:01:58] Epoch: 1 Batch: 1780/20099 (8.86%) Loss: 2.285399 LR: 0.00001385 [07:02:01] Epoch: 1 Batch: 1781/20099 (8.86%) Loss: 1.918569 LR: 0.00001385 [07:02:04] Epoch: 1 Batch: 1782/20099 (8.87%) Loss: 2.197295 LR: 0.00001385 [07:02:07] Epoch: 1 Batch: 1783/20099 (8.87%) Loss: 2.136642 LR: 0.00001385 [07:02:11] Epoch: 1 Batch: 1784/20099 (8.88%) Loss: 2.576044 LR: 0.00001385 [07:02:14] Epoch: 1 Batch: 1785/20099 (8.88%) Loss: 2.111222 LR: 0.00001391 [07:02:17] Epoch: 1 Batch: 1786/20099 (8.89%) Loss: 2.556108 LR: 0.00001391 [07:02:20] Epoch: 1 Batch: 1787/20099 (8.89%) Loss: 2.209382 LR: 0.00001391 [07:02:23] Epoch: 1 Batch: 1788/20099 (8.90%) Loss: 2.082262 LR: 
0.00001391 [07:02:26] Epoch: 1 Batch: 1789/20099 (8.90%) Loss: 2.190976 LR: 0.00001391 [07:02:29] Epoch: 1 Batch: 1790/20099 (8.91%) Loss: 2.325526 LR: 0.00001391 [07:02:32] Epoch: 1 Batch: 1791/20099 (8.91%) Loss: 2.398899 LR: 0.00001391 [07:02:35] Epoch: 1 Batch: 1792/20099 (8.92%) Loss: 2.487328 LR: 0.00001396 [07:02:38] Epoch: 1 Batch: 1793/20099 (8.92%) Loss: 2.455072 LR: 0.00001396 [07:02:41] Epoch: 1 Batch: 1794/20099 (8.93%) Loss: 2.228589 LR: 0.00001396 [07:02:45] Epoch: 1 Batch: 1795/20099 (8.93%) Loss: 2.301514 LR: 0.00001396 [07:02:48] Epoch: 1 Batch: 1796/20099 (8.94%) Loss: 2.215296 LR: 0.00001396 [07:02:51] Epoch: 1 Batch: 1797/20099 (8.94%) Loss: 2.391502 LR: 0.00001396 [07:02:54] Epoch: 1 Batch: 1798/20099 (8.95%) Loss: 2.454530 LR: 0.00001396 [07:02:57] Epoch: 1 Batch: 1799/20099 (8.95%) Loss: 2.125805 LR: 0.00001402 [07:03:04] >> Temp checkpoint saved: epoch1_step1800, size: 0.1693 GB [07:03:04] Epoch: 1 Batch: 1800/20099 (8.96%) Loss: 2.341751 LR: 0.00001402 [07:03:07] Epoch: 1 Batch: 1801/20099 (8.96%) Loss: 2.512991 LR: 0.00001402 [07:03:10] Epoch: 1 Batch: 1802/20099 (8.97%) Loss: 2.313782 LR: 0.00001402 [07:03:13] Epoch: 1 Batch: 1803/20099 (8.97%) Loss: 2.262982 LR: 0.00001402 [07:03:16] Epoch: 1 Batch: 1804/20099 (8.98%) Loss: 2.229398 LR: 0.00001402 [07:03:19] Epoch: 1 Batch: 1805/20099 (8.98%) Loss: 2.300263 LR: 0.00001402 [07:03:22] Epoch: 1 Batch: 1806/20099 (8.99%) Loss: 2.642084 LR: 0.00001407 [07:03:25] Epoch: 1 Batch: 1807/20099 (8.99%) Loss: 2.262362 LR: 0.00001407 [07:03:29] Epoch: 1 Batch: 1808/20099 (9.00%) Loss: 2.222048 LR: 0.00001407 [07:03:32] Epoch: 1 Batch: 1809/20099 (9.00%) Loss: 2.258570 LR: 0.00001407 [07:03:35] Epoch: 1 Batch: 1810/20099 (9.01%) Loss: 2.291123 LR: 0.00001407 [07:03:38] Epoch: 1 Batch: 1811/20099 (9.01%) Loss: 2.038574 LR: 0.00001407 [07:03:41] Epoch: 1 Batch: 1812/20099 (9.02%) Loss: 2.163481 LR: 0.00001407 [07:03:44] Epoch: 1 Batch: 1813/20099 (9.02%) Loss: 2.190791 LR: 0.00001413 [07:03:47] Epoch: 
1 Batch: 1814/20099 (9.03%) Loss: 2.122209 LR: 0.00001413 [07:03:50] Epoch: 1 Batch: 1815/20099 (9.03%) Loss: 2.445512 LR: 0.00001413 [07:03:54] Epoch: 1 Batch: 1816/20099 (9.04%) Loss: 2.413850 LR: 0.00001413 [07:03:57] Epoch: 1 Batch: 1817/20099 (9.04%) Loss: 2.119793 LR: 0.00001413 [07:04:00] Epoch: 1 Batch: 1818/20099 (9.05%) Loss: 2.062216 LR: 0.00001413 [07:04:03] Epoch: 1 Batch: 1819/20099 (9.05%) Loss: 2.429849 LR: 0.00001413 [07:04:06] Epoch: 1 Batch: 1820/20099 (9.06%) Loss: 2.576694 LR: 0.00001418 [07:04:09] Epoch: 1 Batch: 1821/20099 (9.06%) Loss: 2.103817 LR: 0.00001418 [07:04:12] Epoch: 1 Batch: 1822/20099 (9.07%) Loss: 2.055585 LR: 0.00001418 [07:04:15] Epoch: 1 Batch: 1823/20099 (9.07%) Loss: 2.230421 LR: 0.00001418 [07:04:18] Epoch: 1 Batch: 1824/20099 (9.08%) Loss: 2.327197 LR: 0.00001418 [07:04:21] Epoch: 1 Batch: 1825/20099 (9.08%) Loss: 2.046785 LR: 0.00001418 [07:04:24] Epoch: 1 Batch: 1826/20099 (9.09%) Loss: 2.214765 LR: 0.00001418 [07:04:28] Epoch: 1 Batch: 1827/20099 (9.09%) Loss: 2.434702 LR: 0.00001424 [07:04:31] Epoch: 1 Batch: 1828/20099 (9.09%) Loss: 2.425454 LR: 0.00001424 [07:04:34] Epoch: 1 Batch: 1829/20099 (9.10%) Loss: 2.109859 LR: 0.00001424 [07:04:37] Epoch: 1 Batch: 1830/20099 (9.10%) Loss: 2.155876 LR: 0.00001424 [07:04:40] Epoch: 1 Batch: 1831/20099 (9.11%) Loss: 2.234667 LR: 0.00001424 [07:04:43] Epoch: 1 Batch: 1832/20099 (9.11%) Loss: 2.315950 LR: 0.00001424 [07:04:46] Epoch: 1 Batch: 1833/20099 (9.12%) Loss: 2.404123 LR: 0.00001424 [07:04:49] Epoch: 1 Batch: 1834/20099 (9.12%) Loss: 2.113973 LR: 0.00001429 [07:04:52] Epoch: 1 Batch: 1835/20099 (9.13%) Loss: 2.457240 LR: 0.00001429 [07:04:55] Epoch: 1 Batch: 1836/20099 (9.13%) Loss: 2.279351 LR: 0.00001429 [07:04:58] Epoch: 1 Batch: 1837/20099 (9.14%) Loss: 2.269922 LR: 0.00001429 [07:05:02] Epoch: 1 Batch: 1838/20099 (9.14%) Loss: 2.245026 LR: 0.00001429 [07:05:05] Epoch: 1 Batch: 1839/20099 (9.15%) Loss: 2.328990 LR: 0.00001429 [07:05:08] Epoch: 1 Batch: 1840/20099 
(9.15%) Loss: 2.190562 LR: 0.00001429 [07:05:11] Epoch: 1 Batch: 1841/20099 (9.16%) Loss: 2.382811 LR: 0.00001435 [07:05:14] Epoch: 1 Batch: 1842/20099 (9.16%) Loss: 2.476586 LR: 0.00001435 [07:05:17] Epoch: 1 Batch: 1843/20099 (9.17%) Loss: 2.254930 LR: 0.00001435 [07:05:20] Epoch: 1 Batch: 1844/20099 (9.17%) Loss: 2.147118 LR: 0.00001435 [07:05:23] Epoch: 1 Batch: 1845/20099 (9.18%) Loss: 2.080910 LR: 0.00001435 [07:05:26] Epoch: 1 Batch: 1846/20099 (9.18%) Loss: 2.573818 LR: 0.00001435 [07:05:29] Epoch: 1 Batch: 1847/20099 (9.19%) Loss: 2.532523 LR: 0.00001435 [07:05:32] Epoch: 1 Batch: 1848/20099 (9.19%) Loss: 2.189087 LR: 0.00001440 [07:05:36] Epoch: 1 Batch: 1849/20099 (9.20%) Loss: 2.281926 LR: 0.00001440 [07:05:39] Epoch: 1 Batch: 1850/20099 (9.20%) Loss: 2.312815 LR: 0.00001440 [07:05:42] Epoch: 1 Batch: 1851/20099 (9.21%) Loss: 2.219538 LR: 0.00001440 [07:05:45] Epoch: 1 Batch: 1852/20099 (9.21%) Loss: 2.326258 LR: 0.00001440 [07:05:48] Epoch: 1 Batch: 1853/20099 (9.22%) Loss: 2.339379 LR: 0.00001440 [07:05:51] Epoch: 1 Batch: 1854/20099 (9.22%) Loss: 2.097604 LR: 0.00001440 [07:05:54] Epoch: 1 Batch: 1855/20099 (9.23%) Loss: 2.401152 LR: 0.00001445 [07:05:57] Epoch: 1 Batch: 1856/20099 (9.23%) Loss: 2.210540 LR: 0.00001445 [07:06:00] Epoch: 1 Batch: 1857/20099 (9.24%) Loss: 2.132291 LR: 0.00001445 [07:06:03] Epoch: 1 Batch: 1858/20099 (9.24%) Loss: 2.284000 LR: 0.00001445 [07:06:06] Epoch: 1 Batch: 1859/20099 (9.25%) Loss: 2.308446 LR: 0.00001445 [07:06:10] Epoch: 1 Batch: 1860/20099 (9.25%) Loss: 2.151280 LR: 0.00001445 [07:06:13] Epoch: 1 Batch: 1861/20099 (9.26%) Loss: 2.206615 LR: 0.00001445 [07:06:16] Epoch: 1 Batch: 1862/20099 (9.26%) Loss: 2.298809 LR: 0.00001451 [07:06:19] Epoch: 1 Batch: 1863/20099 (9.27%) Loss: 2.447143 LR: 0.00001451 [07:06:22] Epoch: 1 Batch: 1864/20099 (9.27%) Loss: 2.307387 LR: 0.00001451 [07:06:25] Epoch: 1 Batch: 1865/20099 (9.28%) Loss: 2.228246 LR: 0.00001451 [07:06:28] Epoch: 1 Batch: 1866/20099 (9.28%) Loss: 2.326654 
LR: 0.00001451 [07:06:31] Epoch: 1 Batch: 1867/20099 (9.29%) Loss: 2.345151 LR: 0.00001451 [07:06:34] Epoch: 1 Batch: 1868/20099 (9.29%) Loss: 2.162492 LR: 0.00001451 [07:06:37] Epoch: 1 Batch: 1869/20099 (9.30%) Loss: 2.187411 LR: 0.00001456 [07:06:41] Epoch: 1 Batch: 1870/20099 (9.30%) Loss: 1.998963 LR: 0.00001456 [07:06:44] Epoch: 1 Batch: 1871/20099 (9.31%) Loss: 2.070997 LR: 0.00001456 [07:06:47] Epoch: 1 Batch: 1872/20099 (9.31%) Loss: 2.109891 LR: 0.00001456 [07:06:50] Epoch: 1 Batch: 1873/20099 (9.32%) Loss: 2.566212 LR: 0.00001456 [07:06:53] Epoch: 1 Batch: 1874/20099 (9.32%) Loss: 2.506237 LR: 0.00001456 [07:06:56] Epoch: 1 Batch: 1875/20099 (9.33%) Loss: 2.253242 LR: 0.00001456 [07:06:59] Epoch: 1 Batch: 1876/20099 (9.33%) Loss: 2.282872 LR: 0.00001462 [07:07:02] Epoch: 1 Batch: 1877/20099 (9.34%) Loss: 2.292516 LR: 0.00001462 [07:07:05] Epoch: 1 Batch: 1878/20099 (9.34%) Loss: 2.371445 LR: 0.00001462 [07:07:08] Epoch: 1 Batch: 1879/20099 (9.35%) Loss: 2.220976 LR: 0.00001462 [07:07:12] Epoch: 1 Batch: 1880/20099 (9.35%) Loss: 2.249311 LR: 0.00001462 [07:07:15] Epoch: 1 Batch: 1881/20099 (9.36%) Loss: 2.150406 LR: 0.00001462 [07:07:18] Epoch: 1 Batch: 1882/20099 (9.36%) Loss: 2.197207 LR: 0.00001462 [07:07:21] Epoch: 1 Batch: 1883/20099 (9.37%) Loss: 2.382979 LR: 0.00001467 [07:07:24] Epoch: 1 Batch: 1884/20099 (9.37%) Loss: 2.472743 LR: 0.00001467 [07:07:27] Epoch: 1 Batch: 1885/20099 (9.38%) Loss: 2.445419 LR: 0.00001467 [07:07:30] Epoch: 1 Batch: 1886/20099 (9.38%) Loss: 2.178793 LR: 0.00001467 [07:07:33] Epoch: 1 Batch: 1887/20099 (9.39%) Loss: 2.276334 LR: 0.00001467 [07:07:36] Epoch: 1 Batch: 1888/20099 (9.39%) Loss: 2.561960 LR: 0.00001467 [07:07:39] Epoch: 1 Batch: 1889/20099 (9.40%) Loss: 2.424026 LR: 0.00001467 [07:07:42] Epoch: 1 Batch: 1890/20099 (9.40%) Loss: 2.245950 LR: 0.00001473 [07:07:46] Epoch: 1 Batch: 1891/20099 (9.41%) Loss: 2.089722 LR: 0.00001473 [07:07:49] Epoch: 1 Batch: 1892/20099 (9.41%) Loss: 2.382690 LR: 0.00001473 
[07:07:52] Epoch: 1 Batch: 1893/20099 (9.42%) Loss: 2.443253 LR: 0.00001473 [07:07:55] Epoch: 1 Batch: 1894/20099 (9.42%) Loss: 2.415367 LR: 0.00001473 [07:07:58] Epoch: 1 Batch: 1895/20099 (9.43%) Loss: 2.261578 LR: 0.00001473 [07:08:01] Epoch: 1 Batch: 1896/20099 (9.43%) Loss: 2.233251 LR: 0.00001473 [07:08:04] Epoch: 1 Batch: 1897/20099 (9.44%) Loss: 2.558352 LR: 0.00001478 [07:08:07] Epoch: 1 Batch: 1898/20099 (9.44%) Loss: 2.526720 LR: 0.00001478 [07:08:10] Epoch: 1 Batch: 1899/20099 (9.45%) Loss: 2.429466 LR: 0.00001478 [07:08:14] Epoch: 1 Batch: 1900/20099 (9.45%) Loss: 2.250235 LR: 0.00001478 [07:08:17] Epoch: 1 Batch: 1901/20099 (9.46%) Loss: 2.151051 LR: 0.00001478 [07:08:20] Epoch: 1 Batch: 1902/20099 (9.46%) Loss: 2.287206 LR: 0.00001478 [07:08:23] Epoch: 1 Batch: 1903/20099 (9.47%) Loss: 2.545099 LR: 0.00001478 [07:08:26] Epoch: 1 Batch: 1904/20099 (9.47%) Loss: 2.251592 LR: 0.00001484 [07:08:29] Epoch: 1 Batch: 1905/20099 (9.48%) Loss: 2.395315 LR: 0.00001484 [07:08:32] Epoch: 1 Batch: 1906/20099 (9.48%) Loss: 2.164914 LR: 0.00001484 [07:08:35] Epoch: 1 Batch: 1907/20099 (9.49%) Loss: 2.374604 LR: 0.00001484 [07:08:38] Epoch: 1 Batch: 1908/20099 (9.49%) Loss: 2.218908 LR: 0.00001484 [07:08:41] Epoch: 1 Batch: 1909/20099 (9.50%) Loss: 2.395016 LR: 0.00001484 [07:08:45] Epoch: 1 Batch: 1910/20099 (9.50%) Loss: 2.199760 LR: 0.00001484 [07:08:48] Epoch: 1 Batch: 1911/20099 (9.51%) Loss: 2.411818 LR: 0.00001489 [07:08:51] Epoch: 1 Batch: 1912/20099 (9.51%) Loss: 2.293213 LR: 0.00001489 [07:08:54] Epoch: 1 Batch: 1913/20099 (9.52%) Loss: 2.413023 LR: 0.00001489 [07:08:57] Epoch: 1 Batch: 1914/20099 (9.52%) Loss: 2.367072 LR: 0.00001489 [07:09:00] Epoch: 1 Batch: 1915/20099 (9.53%) Loss: 2.226970 LR: 0.00001489 [07:09:03] Epoch: 1 Batch: 1916/20099 (9.53%) Loss: 2.410823 LR: 0.00001489 [07:09:06] Epoch: 1 Batch: 1917/20099 (9.54%) Loss: 2.256713 LR: 0.00001489 [07:09:09] Epoch: 1 Batch: 1918/20099 (9.54%) Loss: 2.358310 LR: 0.00001495 [07:09:12] Epoch: 1 
Batch: 1919/20099 (9.55%) Loss: 1.987475 LR: 0.00001495 [07:09:16] Epoch: 1 Batch: 1920/20099 (9.55%) Loss: 2.478616 LR: 0.00001495 [07:09:19] Epoch: 1 Batch: 1921/20099 (9.56%) Loss: 2.320477 LR: 0.00001495 [07:09:22] Epoch: 1 Batch: 1922/20099 (9.56%) Loss: 2.144688 LR: 0.00001495 [07:09:25] Epoch: 1 Batch: 1923/20099 (9.57%) Loss: 2.321565 LR: 0.00001495 [07:09:28] Epoch: 1 Batch: 1924/20099 (9.57%) Loss: 2.427891 LR: 0.00001495 [07:09:31] Epoch: 1 Batch: 1925/20099 (9.58%) Loss: 2.240398 LR: 0.00001500 [07:09:34] Epoch: 1 Batch: 1926/20099 (9.58%) Loss: 2.220269 LR: 0.00001500 [07:09:37] Epoch: 1 Batch: 1927/20099 (9.59%) Loss: 2.657623 LR: 0.00001500 [07:09:40] Epoch: 1 Batch: 1928/20099 (9.59%) Loss: 2.314806 LR: 0.00001500 [07:09:43] Epoch: 1 Batch: 1929/20099 (9.60%) Loss: 2.336476 LR: 0.00001500 [07:09:47] Epoch: 1 Batch: 1930/20099 (9.60%) Loss: 2.257707 LR: 0.00001500 [07:09:50] Epoch: 1 Batch: 1931/20099 (9.61%) Loss: 2.289458 LR: 0.00001500 [07:09:53] Epoch: 1 Batch: 1932/20099 (9.61%) Loss: 2.247865 LR: 0.00001505 [07:09:56] Epoch: 1 Batch: 1933/20099 (9.62%) Loss: 2.364190 LR: 0.00001505 [07:09:59] Epoch: 1 Batch: 1934/20099 (9.62%) Loss: 2.475781 LR: 0.00001505 [07:10:02] Epoch: 1 Batch: 1935/20099 (9.63%) Loss: 2.419861 LR: 0.00001505 [07:10:05] Epoch: 1 Batch: 1936/20099 (9.63%) Loss: 1.885737 LR: 0.00001505 [07:10:08] Epoch: 1 Batch: 1937/20099 (9.64%) Loss: 2.308288 LR: 0.00001505 [07:10:11] Epoch: 1 Batch: 1938/20099 (9.64%) Loss: 2.256571 LR: 0.00001505 [07:10:14] Epoch: 1 Batch: 1939/20099 (9.65%) Loss: 2.459883 LR: 0.00001511 [07:10:18] Epoch: 1 Batch: 1940/20099 (9.65%) Loss: 2.213403 LR: 0.00001511 [07:10:21] Epoch: 1 Batch: 1941/20099 (9.66%) Loss: 2.323805 LR: 0.00001511 [07:10:24] Epoch: 1 Batch: 1942/20099 (9.66%) Loss: 2.292919 LR: 0.00001511 [07:10:27] Epoch: 1 Batch: 1943/20099 (9.67%) Loss: 2.143442 LR: 0.00001511 [07:10:30] Epoch: 1 Batch: 1944/20099 (9.67%) Loss: 2.303315 LR: 0.00001511 [07:10:33] Epoch: 1 Batch: 1945/20099 
(9.68%) Loss: 2.344183 LR: 0.00001511 [07:10:36] Epoch: 1 Batch: 1946/20099 (9.68%) Loss: 2.213863 LR: 0.00001516 [07:10:39] Epoch: 1 Batch: 1947/20099 (9.69%) Loss: 2.570961 LR: 0.00001516 [07:10:42] Epoch: 1 Batch: 1948/20099 (9.69%) Loss: 2.227100 LR: 0.00001516 [07:10:46] Epoch: 1 Batch: 1949/20099 (9.70%) Loss: 2.507065 LR: 0.00001516 [07:10:49] Epoch: 1 Batch: 1950/20099 (9.70%) Loss: 2.187437 LR: 0.00001516 [07:10:52] Epoch: 1 Batch: 1951/20099 (9.71%) Loss: 2.349329 LR: 0.00001516 [07:10:55] Epoch: 1 Batch: 1952/20099 (9.71%) Loss: 1.983500 LR: 0.00001516 [07:10:58] Epoch: 1 Batch: 1953/20099 (9.72%) Loss: 2.451755 LR: 0.00001522 [07:11:01] Epoch: 1 Batch: 1954/20099 (9.72%) Loss: 2.312938 LR: 0.00001522 [07:11:04] Epoch: 1 Batch: 1955/20099 (9.73%) Loss: 2.413391 LR: 0.00001522 [07:11:07] Epoch: 1 Batch: 1956/20099 (9.73%) Loss: 2.302065 LR: 0.00001522 [07:11:10] Epoch: 1 Batch: 1957/20099 (9.74%) Loss: 2.063798 LR: 0.00001522 [07:11:13] Epoch: 1 Batch: 1958/20099 (9.74%) Loss: 2.387277 LR: 0.00001522 [07:11:17] Epoch: 1 Batch: 1959/20099 (9.75%) Loss: 2.412908 LR: 0.00001522 [07:11:20] Epoch: 1 Batch: 1960/20099 (9.75%) Loss: 1.994870 LR: 0.00001527 [07:11:23] Epoch: 1 Batch: 1961/20099 (9.76%) Loss: 2.167708 LR: 0.00001527 [07:11:26] Epoch: 1 Batch: 1962/20099 (9.76%) Loss: 2.282966 LR: 0.00001527 [07:11:29] Epoch: 1 Batch: 1963/20099 (9.77%) Loss: 2.119739 LR: 0.00001527 [07:11:32] Epoch: 1 Batch: 1964/20099 (9.77%) Loss: 2.065388 LR: 0.00001527 [07:11:35] Epoch: 1 Batch: 1965/20099 (9.78%) Loss: 2.202756 LR: 0.00001527 [07:11:38] Epoch: 1 Batch: 1966/20099 (9.78%) Loss: 2.430023 LR: 0.00001527 [07:11:41] Epoch: 1 Batch: 1967/20099 (9.79%) Loss: 2.057037 LR: 0.00001533 [07:11:44] Epoch: 1 Batch: 1968/20099 (9.79%) Loss: 2.221644 LR: 0.00001533 [07:11:48] Epoch: 1 Batch: 1969/20099 (9.80%) Loss: 2.140297 LR: 0.00001533 [07:11:51] Epoch: 1 Batch: 1970/20099 (9.80%) Loss: 2.416026 LR: 0.00001533 [07:11:54] Epoch: 1 Batch: 1971/20099 (9.81%) Loss: 2.322754 
LR: 0.00001533 [07:11:57] Epoch: 1 Batch: 1972/20099 (9.81%) Loss: 2.536763 LR: 0.00001533 [07:12:00] Epoch: 1 Batch: 1973/20099 (9.82%) Loss: 2.140145 LR: 0.00001533 [07:12:03] Epoch: 1 Batch: 1974/20099 (9.82%) Loss: 2.217020 LR: 0.00001538 [07:12:06] Epoch: 1 Batch: 1975/20099 (9.83%) Loss: 2.259129 LR: 0.00001538 [07:12:09] Epoch: 1 Batch: 1976/20099 (9.83%) Loss: 2.245807 LR: 0.00001538 [07:12:12] Epoch: 1 Batch: 1977/20099 (9.84%) Loss: 2.007086 LR: 0.00001538 [07:12:15] Epoch: 1 Batch: 1978/20099 (9.84%) Loss: 2.148416 LR: 0.00001538 [07:12:19] Epoch: 1 Batch: 1979/20099 (9.85%) Loss: 2.503799 LR: 0.00001538 [07:12:22] Epoch: 1 Batch: 1980/20099 (9.85%) Loss: 2.147196 LR: 0.00001538 [07:12:25] Epoch: 1 Batch: 1981/20099 (9.86%) Loss: 2.274340 LR: 0.00001544 [07:12:28] Epoch: 1 Batch: 1982/20099 (9.86%) Loss: 2.227834 LR: 0.00001544 [07:12:31] Epoch: 1 Batch: 1983/20099 (9.87%) Loss: 2.402157 LR: 0.00001544 [07:12:34] Epoch: 1 Batch: 1984/20099 (9.87%) Loss: 2.370559 LR: 0.00001544 [07:12:37] Epoch: 1 Batch: 1985/20099 (9.88%) Loss: 2.419552 LR: 0.00001544 [07:12:40] Epoch: 1 Batch: 1986/20099 (9.88%) Loss: 2.235141 LR: 0.00001544 [07:12:43] Epoch: 1 Batch: 1987/20099 (9.89%) Loss: 2.274122 LR: 0.00001544 [07:12:46] Epoch: 1 Batch: 1988/20099 (9.89%) Loss: 2.277702 LR: 0.00001549 [07:12:49] Epoch: 1 Batch: 1989/20099 (9.90%) Loss: 2.263847 LR: 0.00001549 [07:12:52] Epoch: 1 Batch: 1990/20099 (9.90%) Loss: 2.325329 LR: 0.00001549 [07:12:56] Epoch: 1 Batch: 1991/20099 (9.91%) Loss: 2.405195 LR: 0.00001549 [07:12:59] Epoch: 1 Batch: 1992/20099 (9.91%) Loss: 2.181181 LR: 0.00001549 [07:13:02] Epoch: 1 Batch: 1993/20099 (9.92%) Loss: 2.070408 LR: 0.00001549 [07:13:05] Epoch: 1 Batch: 1994/20099 (9.92%) Loss: 2.018552 LR: 0.00001549 [07:13:08] Epoch: 1 Batch: 1995/20099 (9.93%) Loss: 2.307298 LR: 0.00001555 [07:13:11] Epoch: 1 Batch: 1996/20099 (9.93%) Loss: 2.326211 LR: 0.00001555 [07:13:14] Epoch: 1 Batch: 1997/20099 (9.94%) Loss: 2.252757 LR: 0.00001555 
[07:13:17] Epoch: 1 Batch: 1998/20099 (9.94%) Loss: 2.487104 LR: 0.00001555 [07:13:20] Epoch: 1 Batch: 1999/20099 (9.95%) Loss: 2.298489 LR: 0.00001555 [07:13:23] >> Evaluating batch 0 [07:13:25] >> Evaluating batch 1 [07:13:26] >> Evaluating batch 2 [07:13:27] >> Evaluating batch 3 [07:13:29] >> Evaluating batch 4 [07:13:30] >> Evaluating batch 5 [07:13:31] >> Evaluating batch 6 [07:13:32] >> Evaluating batch 7 [07:13:34] >> Evaluating batch 8 [07:13:35] >> Evaluating batch 9 [07:13:36] >> Evaluating batch 10 [07:13:37] >> Evaluating batch 11 [07:13:38] >> Evaluating batch 12 [07:13:40] >> Evaluating batch 13 [07:13:41] >> Evaluating batch 14 [07:13:42] >> Evaluating batch 15 [07:13:43] >> Evaluating batch 16 [07:13:44] Epoch: 1 Step: 2000/20099 Evaluation: [07:13:44] Avg Loss Since Last Eval: 2.3065 Val Loss: 2.3562 Validation loss delta: -0.0493 Perplexity: 10.5509 LR: 0.00001555 [07:13:47] >> Temp checkpoint saved: epoch1_step2000, size: 0.1693 GB [07:13:51] >> Checkpoint saved: epoch1_step2000, size: 0.1693 GB [07:13:51] Epoch: 1 Batch: 2000/20099 (9.95%) Loss: 2.367077 LR: 0.00001555 [07:13:54] Epoch: 1 Batch: 2001/20099 (9.96%) Loss: 2.289066 LR: 0.00001555 [07:13:57] Epoch: 1 Batch: 2002/20099 (9.96%) Loss: 2.346720 LR: 0.00001560 [07:14:00] Epoch: 1 Batch: 2003/20099 (9.97%) Loss: 2.406989 LR: 0.00001560 [07:14:03] Epoch: 1 Batch: 2004/20099 (9.97%) Loss: 2.170361 LR: 0.00001560 [07:14:07] Epoch: 1 Batch: 2005/20099 (9.98%) Loss: 2.354332 LR: 0.00001560 [07:14:10] Epoch: 1 Batch: 2006/20099 (9.98%) Loss: 2.293551 LR: 0.00001560 [07:14:13] Epoch: 1 Batch: 2007/20099 (9.99%) Loss: 2.191210 LR: 0.00001560 [07:14:16] Epoch: 1 Batch: 2008/20099 (9.99%) Loss: 2.334704 LR: 0.00001560 [07:14:19] Epoch: 1 Batch: 2009/20099 (10.00%) Loss: 2.313212 LR: 0.00001565 [07:14:22] Epoch: 1 Batch: 2010/20099 (10.00%) Loss: 2.346992 LR: 0.00001565 [07:14:26] Epoch: 1 Batch: 2011/20099 (10.01%) Loss: 2.237567 LR: 0.00001565 [07:14:29] Epoch: 1 Batch: 2012/20099 (10.01%) 
Loss: 2.464631 LR: 0.00001565 [07:14:32] Epoch: 1 Batch: 2013/20099 (10.02%) Loss: 2.245076 LR: 0.00001565 [07:14:35] Epoch: 1 Batch: 2014/20099 (10.02%) Loss: 2.339341 LR: 0.00001565 [07:14:38] Epoch: 1 Batch: 2015/20099 (10.03%) Loss: 2.270372 LR: 0.00001565 [07:14:41] Epoch: 1 Batch: 2016/20099 (10.03%) Loss: 2.128101 LR: 0.00001571 [07:14:44] Epoch: 1 Batch: 2017/20099 (10.04%) Loss: 2.159590 LR: 0.00001571 [07:14:47] Epoch: 1 Batch: 2018/20099 (10.04%) Loss: 2.211290 LR: 0.00001571 [07:14:51] Epoch: 1 Batch: 2019/20099 (10.05%) Loss: 2.021578 LR: 0.00001571 [07:14:54] Epoch: 1 Batch: 2020/20099 (10.05%) Loss: 2.454699 LR: 0.00001571 [07:14:57] Epoch: 1 Batch: 2021/20099 (10.06%) Loss: 2.202665 LR: 0.00001571 [07:15:00] Epoch: 1 Batch: 2022/20099 (10.06%) Loss: 2.377779 LR: 0.00001571 [07:15:03] Epoch: 1 Batch: 2023/20099 (10.07%) Loss: 2.165132 LR: 0.00001576 [07:15:06] Epoch: 1 Batch: 2024/20099 (10.07%) Loss: 1.950633 LR: 0.00001576 [07:15:09] Epoch: 1 Batch: 2025/20099 (10.08%) Loss: 2.504355 LR: 0.00001576 [07:15:12] Epoch: 1 Batch: 2026/20099 (10.08%) Loss: 2.504117 LR: 0.00001576 [07:15:15] Epoch: 1 Batch: 2027/20099 (10.09%) Loss: 2.179287 LR: 0.00001576 [07:15:18] Epoch: 1 Batch: 2028/20099 (10.09%) Loss: 2.175775 LR: 0.00001576 [07:15:21] Epoch: 1 Batch: 2029/20099 (10.10%) Loss: 2.215324 LR: 0.00001576 [07:15:24] Epoch: 1 Batch: 2030/20099 (10.10%) Loss: 2.251256 LR: 0.00001582 [07:15:27] Epoch: 1 Batch: 2031/20099 (10.10%) Loss: 2.517844 LR: 0.00001582 [07:15:31] Epoch: 1 Batch: 2032/20099 (10.11%) Loss: 2.377371 LR: 0.00001582 [07:15:34] Epoch: 1 Batch: 2033/20099 (10.11%) Loss: 2.566766 LR: 0.00001582 [07:15:37] Epoch: 1 Batch: 2034/20099 (10.12%) Loss: 2.504982 LR: 0.00001582 [07:15:40] Epoch: 1 Batch: 2035/20099 (10.12%) Loss: 2.155178 LR: 0.00001582 [07:15:43] Epoch: 1 Batch: 2036/20099 (10.13%) Loss: 2.274602 LR: 0.00001582 [07:15:46] Epoch: 1 Batch: 2037/20099 (10.13%) Loss: 2.522890 LR: 0.00001587 [07:15:49] Epoch: 1 Batch: 2038/20099 
(10.14%) Loss: 2.423954 LR: 0.00001587 [07:15:52] Epoch: 1 Batch: 2039/20099 (10.14%) Loss: 2.036903 LR: 0.00001587 [07:15:55] Epoch: 1 Batch: 2040/20099 (10.15%) Loss: 2.170370 LR: 0.00001587 [07:15:59] Epoch: 1 Batch: 2041/20099 (10.15%) Loss: 2.848930 LR: 0.00001587 [07:16:02] Epoch: 1 Batch: 2042/20099 (10.16%) Loss: 2.229634 LR: 0.00001587 [07:16:05] Epoch: 1 Batch: 2043/20099 (10.16%) Loss: 2.516218 LR: 0.00001587 [07:16:08] Epoch: 1 Batch: 2044/20099 (10.17%) Loss: 2.308128 LR: 0.00001593 [07:16:11] Epoch: 1 Batch: 2045/20099 (10.17%) Loss: 2.207214 LR: 0.00001593 [07:16:14] Epoch: 1 Batch: 2046/20099 (10.18%) Loss: 2.056960 LR: 0.00001593 [07:16:17] Epoch: 1 Batch: 2047/20099 (10.18%) Loss: 2.162518 LR: 0.00001593 [07:16:20] Epoch: 1 Batch: 2048/20099 (10.19%) Loss: 2.189251 LR: 0.00001593 [07:16:23] Epoch: 1 Batch: 2049/20099 (10.19%) Loss: 2.296892 LR: 0.00001593 [07:16:26] Epoch: 1 Batch: 2050/20099 (10.20%) Loss: 2.230157 LR: 0.00001593 [07:16:30] Epoch: 1 Batch: 2051/20099 (10.20%) Loss: 2.306894 LR: 0.00001598 [07:16:33] Epoch: 1 Batch: 2052/20099 (10.21%) Loss: 2.121476 LR: 0.00001598 [07:16:36] Epoch: 1 Batch: 2053/20099 (10.21%) Loss: 2.277861 LR: 0.00001598 [07:16:39] Epoch: 1 Batch: 2054/20099 (10.22%) Loss: 2.022494 LR: 0.00001598 [07:16:42] Epoch: 1 Batch: 2055/20099 (10.22%) Loss: 2.184910 LR: 0.00001598 [07:16:45] Epoch: 1 Batch: 2056/20099 (10.23%) Loss: 2.358737 LR: 0.00001598 [07:16:48] Epoch: 1 Batch: 2057/20099 (10.23%) Loss: 2.182059 LR: 0.00001598 [07:16:51] Epoch: 1 Batch: 2058/20099 (10.24%) Loss: 2.207781 LR: 0.00001604 [07:16:54] Epoch: 1 Batch: 2059/20099 (10.24%) Loss: 2.553614 LR: 0.00001604 [07:16:57] Epoch: 1 Batch: 2060/20099 (10.25%) Loss: 2.409334 LR: 0.00001604 [07:17:00] Epoch: 1 Batch: 2061/20099 (10.25%) Loss: 2.074232 LR: 0.00001604 [07:17:03] Epoch: 1 Batch: 2062/20099 (10.26%) Loss: 2.401018 LR: 0.00001604 [07:17:07] Epoch: 1 Batch: 2063/20099 (10.26%) Loss: 1.950542 LR: 0.00001604 [07:17:10] Epoch: 1 Batch: 
2064/20099 (10.27%) Loss: 2.088585 LR: 0.00001604 [07:17:13] Epoch: 1 Batch: 2065/20099 (10.27%) Loss: 2.173294 LR: 0.00001609 [07:17:16] Epoch: 1 Batch: 2066/20099 (10.28%) Loss: 2.343125 LR: 0.00001609 [07:17:19] Epoch: 1 Batch: 2067/20099 (10.28%) Loss: 2.096139 LR: 0.00001609 [07:17:22] Epoch: 1 Batch: 2068/20099 (10.29%) Loss: 2.302466 LR: 0.00001609 [07:17:25] Epoch: 1 Batch: 2069/20099 (10.29%) Loss: 2.327072 LR: 0.00001609 [07:17:28] Epoch: 1 Batch: 2070/20099 (10.30%) Loss: 2.444797 LR: 0.00001609 [07:17:31] Epoch: 1 Batch: 2071/20099 (10.30%) Loss: 2.269427 LR: 0.00001609 [07:17:34] Epoch: 1 Batch: 2072/20099 (10.31%) Loss: 2.038048 LR: 0.00001615 [07:17:37] Epoch: 1 Batch: 2073/20099 (10.31%) Loss: 2.369176 LR: 0.00001615 [07:17:41] Epoch: 1 Batch: 2074/20099 (10.32%) Loss: 2.255075 LR: 0.00001615 [07:17:44] Epoch: 1 Batch: 2075/20099 (10.32%) Loss: 2.614571 LR: 0.00001615 [07:17:47] Epoch: 1 Batch: 2076/20099 (10.33%) Loss: 2.272285 LR: 0.00001615 [07:17:50] Epoch: 1 Batch: 2077/20099 (10.33%) Loss: 2.226606 LR: 0.00001615 [07:17:53] Epoch: 1 Batch: 2078/20099 (10.34%) Loss: 2.376482 LR: 0.00001615 [07:17:56] Epoch: 1 Batch: 2079/20099 (10.34%) Loss: 1.969734 LR: 0.00001620 [07:17:59] Epoch: 1 Batch: 2080/20099 (10.35%) Loss: 1.986150 LR: 0.00001620 [07:18:02] Epoch: 1 Batch: 2081/20099 (10.35%) Loss: 2.335512 LR: 0.00001620 [07:18:05] Epoch: 1 Batch: 2082/20099 (10.36%) Loss: 1.875843 LR: 0.00001620 [07:18:09] Epoch: 1 Batch: 2083/20099 (10.36%) Loss: 2.496465 LR: 0.00001620 [07:18:12] Epoch: 1 Batch: 2084/20099 (10.37%) Loss: 2.302438 LR: 0.00001620 [07:18:15] Epoch: 1 Batch: 2085/20099 (10.37%) Loss: 2.189569 LR: 0.00001620 [07:18:18] Epoch: 1 Batch: 2086/20099 (10.38%) Loss: 2.421917 LR: 0.00001625 [07:18:21] Epoch: 1 Batch: 2087/20099 (10.38%) Loss: 1.799346 LR: 0.00001625 [07:18:24] Epoch: 1 Batch: 2088/20099 (10.39%) Loss: 2.194513 LR: 0.00001625 [07:18:27] Epoch: 1 Batch: 2089/20099 (10.39%) Loss: 2.130016 LR: 0.00001625 [07:18:30] Epoch: 1 
Batch: 2090/20099 (10.40%) Loss: 2.320693 LR: 0.00001625 [07:18:33] Epoch: 1 Batch: 2091/20099 (10.40%) Loss: 2.367608 LR: 0.00001625 [07:18:36] Epoch: 1 Batch: 2092/20099 (10.41%) Loss: 2.367485 LR: 0.00001625 [07:18:39] Epoch: 1 Batch: 2093/20099 (10.41%) Loss: 2.298373 LR: 0.00001631 [07:18:43] Epoch: 1 Batch: 2094/20099 (10.42%) Loss: 2.453069 LR: 0.00001631 [07:18:46] Epoch: 1 Batch: 2095/20099 (10.42%) Loss: 2.164415 LR: 0.00001631 [07:18:49] Epoch: 1 Batch: 2096/20099 (10.43%) Loss: 2.093382 LR: 0.00001631 [07:18:52] Epoch: 1 Batch: 2097/20099 (10.43%) Loss: 2.318124 LR: 0.00001631 [07:18:55] Epoch: 1 Batch: 2098/20099 (10.44%) Loss: 2.397333 LR: 0.00001631 [07:18:58] Epoch: 1 Batch: 2099/20099 (10.44%) Loss: 2.171014 LR: 0.00001631 [07:19:01] Epoch: 1 Batch: 2100/20099 (10.45%) Loss: 2.496403 LR: 0.00001636 [07:19:04] Epoch: 1 Batch: 2101/20099 (10.45%) Loss: 2.354991 LR: 0.00001636 [07:19:07] Epoch: 1 Batch: 2102/20099 (10.46%) Loss: 2.253316 LR: 0.00001636 [07:19:10] Epoch: 1 Batch: 2103/20099 (10.46%) Loss: 2.191739 LR: 0.00001636 [07:19:14] Epoch: 1 Batch: 2104/20099 (10.47%) Loss: 2.212520 LR: 0.00001636 [07:19:17] Epoch: 1 Batch: 2105/20099 (10.47%) Loss: 2.186094 LR: 0.00001636 [07:19:20] Epoch: 1 Batch: 2106/20099 (10.48%) Loss: 2.211991 LR: 0.00001636 [07:19:23] Epoch: 1 Batch: 2107/20099 (10.48%) Loss: 2.064953 LR: 0.00001642 [07:19:26] Epoch: 1 Batch: 2108/20099 (10.49%) Loss: 2.128211 LR: 0.00001642 [07:19:29] Epoch: 1 Batch: 2109/20099 (10.49%) Loss: 2.538609 LR: 0.00001642 [07:19:32] Epoch: 1 Batch: 2110/20099 (10.50%) Loss: 2.340076 LR: 0.00001642 [07:19:35] Epoch: 1 Batch: 2111/20099 (10.50%) Loss: 2.291218 LR: 0.00001642 [07:19:38] Epoch: 1 Batch: 2112/20099 (10.51%) Loss: 2.457956 LR: 0.00001642 [07:19:41] Epoch: 1 Batch: 2113/20099 (10.51%) Loss: 2.098350 LR: 0.00001642 [07:19:45] Epoch: 1 Batch: 2114/20099 (10.52%) Loss: 1.829670 LR: 0.00001647 [07:19:48] Epoch: 1 Batch: 2115/20099 (10.52%) Loss: 2.291657 LR: 0.00001647 [07:19:51] Epoch: 
1 Batch: 2116/20099 (10.53%) Loss: 2.336934 LR: 0.00001647 [07:19:54] Epoch: 1 Batch: 2117/20099 (10.53%) Loss: 2.084483 LR: 0.00001647 [07:19:57] Epoch: 1 Batch: 2118/20099 (10.54%) Loss: 2.006647 LR: 0.00001647 [07:20:00] Epoch: 1 Batch: 2119/20099 (10.54%) Loss: 2.328546 LR: 0.00001647 [07:20:03] Epoch: 1 Batch: 2120/20099 (10.55%) Loss: 2.558205 LR: 0.00001647 [07:20:06] Epoch: 1 Batch: 2121/20099 (10.55%) Loss: 2.269551 LR: 0.00001653 [07:20:09] Epoch: 1 Batch: 2122/20099 (10.56%) Loss: 2.243447 LR: 0.00001653 [07:20:12] Epoch: 1 Batch: 2123/20099 (10.56%) Loss: 2.311573 LR: 0.00001653 [07:20:16] Epoch: 1 Batch: 2124/20099 (10.57%) Loss: 2.446999 LR: 0.00001653 [07:20:19] Epoch: 1 Batch: 2125/20099 (10.57%) Loss: 2.761786 LR: 0.00001653 [07:20:22] Epoch: 1 Batch: 2126/20099 (10.58%) Loss: 2.040054 LR: 0.00001653 [07:20:25] Epoch: 1 Batch: 2127/20099 (10.58%) Loss: 2.214284 LR: 0.00001653 [07:20:28] Epoch: 1 Batch: 2128/20099 (10.59%) Loss: 2.196260 LR: 0.00001658 [07:20:31] Epoch: 1 Batch: 2129/20099 (10.59%) Loss: 2.150673 LR: 0.00001658 [07:20:34] Epoch: 1 Batch: 2130/20099 (10.60%) Loss: 2.281082 LR: 0.00001658 [07:20:37] Epoch: 1 Batch: 2131/20099 (10.60%) Loss: 2.436777 LR: 0.00001658 [07:20:40] Epoch: 1 Batch: 2132/20099 (10.61%) Loss: 2.156753 LR: 0.00001658 [07:20:44] Epoch: 1 Batch: 2133/20099 (10.61%) Loss: 2.560733 LR: 0.00001658 [07:20:47] Epoch: 1 Batch: 2134/20099 (10.62%) Loss: 2.051600 LR: 0.00001658 [07:20:50] Epoch: 1 Batch: 2135/20099 (10.62%) Loss: 2.256183 LR: 0.00001664 [07:20:53] Epoch: 1 Batch: 2136/20099 (10.63%) Loss: 2.390585 LR: 0.00001664 [07:20:56] Epoch: 1 Batch: 2137/20099 (10.63%) Loss: 2.358550 LR: 0.00001664 [07:20:59] Epoch: 1 Batch: 2138/20099 (10.64%) Loss: 2.356432 LR: 0.00001664 [07:21:02] Epoch: 1 Batch: 2139/20099 (10.64%) Loss: 2.176764 LR: 0.00001664 [07:21:05] Epoch: 1 Batch: 2140/20099 (10.65%) Loss: 2.446532 LR: 0.00001664 [07:21:08] Epoch: 1 Batch: 2141/20099 (10.65%) Loss: 2.544462 LR: 0.00001664 [07:21:11] 
Epoch: 1 Batch: 2142/20099 (10.66%) Loss: 2.407151 LR: 0.00001669 [07:21:14] Epoch: 1 Batch: 2143/20099 (10.66%) Loss: 2.003327 LR: 0.00001669 [07:21:18] Epoch: 1 Batch: 2144/20099 (10.67%) Loss: 2.359697 LR: 0.00001669 [07:21:21] Epoch: 1 Batch: 2145/20099 (10.67%) Loss: 2.304569 LR: 0.00001669 [07:21:24] Epoch: 1 Batch: 2146/20099 (10.68%) Loss: 2.254885 LR: 0.00001669 [07:21:27] Epoch: 1 Batch: 2147/20099 (10.68%) Loss: 2.095551 LR: 0.00001669 [07:21:30] Epoch: 1 Batch: 2148/20099 (10.69%) Loss: 2.650084 LR: 0.00001669 [07:21:33] Epoch: 1 Batch: 2149/20099 (10.69%) Loss: 2.251336 LR: 0.00001675 [07:21:36] Epoch: 1 Batch: 2150/20099 (10.70%) Loss: 2.250636 LR: 0.00001675 [07:21:39] Epoch: 1 Batch: 2151/20099 (10.70%) Loss: 2.266400 LR: 0.00001675 [07:21:42] Epoch: 1 Batch: 2152/20099 (10.71%) Loss: 2.093622 LR: 0.00001675 [07:21:45] Epoch: 1 Batch: 2153/20099 (10.71%) Loss: 2.440011 LR: 0.00001675 [07:21:49] Epoch: 1 Batch: 2154/20099 (10.72%) Loss: 2.376338 LR: 0.00001675 [07:21:52] Epoch: 1 Batch: 2155/20099 (10.72%) Loss: 2.452884 LR: 0.00001675 [07:21:55] Epoch: 1 Batch: 2156/20099 (10.73%) Loss: 2.323530 LR: 0.00001680 [07:21:58] Epoch: 1 Batch: 2157/20099 (10.73%) Loss: 2.107119 LR: 0.00001680 [07:22:01] Epoch: 1 Batch: 2158/20099 (10.74%) Loss: 2.306903 LR: 0.00001680 [07:22:04] Epoch: 1 Batch: 2159/20099 (10.74%) Loss: 2.185366 LR: 0.00001680 [07:22:07] Epoch: 1 Batch: 2160/20099 (10.75%) Loss: 2.334271 LR: 0.00001680 [07:22:10] Epoch: 1 Batch: 2161/20099 (10.75%) Loss: 2.367483 LR: 0.00001680 [07:22:13] Epoch: 1 Batch: 2162/20099 (10.76%) Loss: 2.345570 LR: 0.00001680 [07:22:16] Epoch: 1 Batch: 2163/20099 (10.76%) Loss: 2.131888 LR: 0.00001685 [07:22:20] Epoch: 1 Batch: 2164/20099 (10.77%) Loss: 2.099236 LR: 0.00001685 [07:22:23] Epoch: 1 Batch: 2165/20099 (10.77%) Loss: 2.371895 LR: 0.00001685 [07:22:26] Epoch: 1 Batch: 2166/20099 (10.78%) Loss: 2.291843 LR: 0.00001685 [07:22:29] Epoch: 1 Batch: 2167/20099 (10.78%) Loss: 2.518139 LR: 0.00001685 
[07:22:32] Epoch: 1 Batch: 2168/20099 (10.79%) Loss: 2.352367 LR: 0.00001685 [07:22:35] Epoch: 1 Batch: 2169/20099 (10.79%) Loss: 2.293696 LR: 0.00001685 [07:22:38] Epoch: 1 Batch: 2170/20099 (10.80%) Loss: 1.994458 LR: 0.00001691 [07:22:41] Epoch: 1 Batch: 2171/20099 (10.80%) Loss: 2.634981 LR: 0.00001691 [07:22:44] Epoch: 1 Batch: 2172/20099 (10.81%) Loss: 2.382989 LR: 0.00001691 [07:22:47] Epoch: 1 Batch: 2173/20099 (10.81%) Loss: 2.678243 LR: 0.00001691 [07:22:50] Epoch: 1 Batch: 2174/20099 (10.82%) Loss: 2.622238 LR: 0.00001691 [07:22:54] Epoch: 1 Batch: 2175/20099 (10.82%) Loss: 1.999560 LR: 0.00001691 [07:22:57] Epoch: 1 Batch: 2176/20099 (10.83%) Loss: 2.402190 LR: 0.00001691 [07:23:00] Epoch: 1 Batch: 2177/20099 (10.83%) Loss: 2.448068 LR: 0.00001696 [07:23:03] Epoch: 1 Batch: 2178/20099 (10.84%) Loss: 2.263721 LR: 0.00001696 [07:23:06] Epoch: 1 Batch: 2179/20099 (10.84%) Loss: 2.273844 LR: 0.00001696 [07:23:09] Epoch: 1 Batch: 2180/20099 (10.85%) Loss: 2.442904 LR: 0.00001696 [07:23:12] Epoch: 1 Batch: 2181/20099 (10.85%) Loss: 2.109273 LR: 0.00001696 [07:23:15] Epoch: 1 Batch: 2182/20099 (10.86%) Loss: 2.325187 LR: 0.00001696 [07:23:18] Epoch: 1 Batch: 2183/20099 (10.86%) Loss: 2.457445 LR: 0.00001696 [07:23:21] Epoch: 1 Batch: 2184/20099 (10.87%) Loss: 2.405094 LR: 0.00001702 [07:23:25] Epoch: 1 Batch: 2185/20099 (10.87%) Loss: 2.134115 LR: 0.00001702 [07:23:28] Epoch: 1 Batch: 2186/20099 (10.88%) Loss: 2.234102 LR: 0.00001702 [07:23:31] Epoch: 1 Batch: 2187/20099 (10.88%) Loss: 2.289825 LR: 0.00001702 [07:23:34] Epoch: 1 Batch: 2188/20099 (10.89%) Loss: 2.237949 LR: 0.00001702 [07:23:37] Epoch: 1 Batch: 2189/20099 (10.89%) Loss: 2.244594 LR: 0.00001702 [07:23:40] Epoch: 1 Batch: 2190/20099 (10.90%) Loss: 2.029689 LR: 0.00001702 [07:23:43] Epoch: 1 Batch: 2191/20099 (10.90%) Loss: 2.468111 LR: 0.00001707 [07:23:46] Epoch: 1 Batch: 2192/20099 (10.91%) Loss: 2.265834 LR: 0.00001707 [07:23:49] Epoch: 1 Batch: 2193/20099 (10.91%) Loss: 2.122946 LR: 
0.00001707 [07:23:52] Epoch: 1 Batch: 2194/20099 (10.92%) Loss: 2.312151 LR: 0.00001707 [07:23:55] Epoch: 1 Batch: 2195/20099 (10.92%) Loss: 2.127033 LR: 0.00001707 [07:23:59] Epoch: 1 Batch: 2196/20099 (10.93%) Loss: 2.280644 LR: 0.00001707 [07:24:02] Epoch: 1 Batch: 2197/20099 (10.93%) Loss: 2.274489 LR: 0.00001707 [07:24:05] Epoch: 1 Batch: 2198/20099 (10.94%) Loss: 2.363740 LR: 0.00001713 [07:24:08] Epoch: 1 Batch: 2199/20099 (10.94%) Loss: 2.431030 LR: 0.00001713 [07:24:15] >> Cleaned up old temp checkpoint: epoch1_step200 [07:24:15] >> Temp checkpoint saved: epoch1_step2200, size: 0.1693 GB [07:24:15] Epoch: 1 Batch: 2200/20099 (10.95%) Loss: 2.350767 LR: 0.00001713 [07:24:18] Epoch: 1 Batch: 2201/20099 (10.95%) Loss: 2.173109 LR: 0.00001713 [07:24:21] Epoch: 1 Batch: 2202/20099 (10.96%) Loss: 2.202811 LR: 0.00001713 [07:24:24] Epoch: 1 Batch: 2203/20099 (10.96%) Loss: 2.522493 LR: 0.00001713 [07:24:27] Epoch: 1 Batch: 2204/20099 (10.97%) Loss: 2.108222 LR: 0.00001713 [07:24:30] Epoch: 1 Batch: 2205/20099 (10.97%) Loss: 2.107733 LR: 0.00001718 [07:24:33] Epoch: 1 Batch: 2206/20099 (10.98%) Loss: 2.077242 LR: 0.00001718 [07:24:36] Epoch: 1 Batch: 2207/20099 (10.98%) Loss: 2.172688 LR: 0.00001718 [07:24:39] Epoch: 1 Batch: 2208/20099 (10.99%) Loss: 2.121405 LR: 0.00001718 [07:24:43] Epoch: 1 Batch: 2209/20099 (10.99%) Loss: 2.000204 LR: 0.00001718 [07:24:46] Epoch: 1 Batch: 2210/20099 (11.00%) Loss: 2.235373 LR: 0.00001718 [07:24:49] Epoch: 1 Batch: 2211/20099 (11.00%) Loss: 2.288319 LR: 0.00001718 [07:24:52] Epoch: 1 Batch: 2212/20099 (11.01%) Loss: 2.127893 LR: 0.00001724 [07:24:55] Epoch: 1 Batch: 2213/20099 (11.01%) Loss: 2.637189 LR: 0.00001724 [07:24:58] Epoch: 1 Batch: 2214/20099 (11.02%) Loss: 2.517625 LR: 0.00001724 [07:25:01] Epoch: 1 Batch: 2215/20099 (11.02%) Loss: 2.274584 LR: 0.00001724 [07:25:05] Epoch: 1 Batch: 2216/20099 (11.03%) Loss: 2.178013 LR: 0.00001724 [07:25:08] Epoch: 1 Batch: 2217/20099 (11.03%) Loss: 2.557345 LR: 0.00001724 
[07:25:11] Epoch: 1 Batch: 2218/20099 (11.04%) Loss: 2.386651 LR: 0.00001724 [07:25:14] Epoch: 1 Batch: 2219/20099 (11.04%) Loss: 2.297599 LR: 0.00001729 [07:25:17] Epoch: 1 Batch: 2220/20099 (11.05%) Loss: 2.398722 LR: 0.00001729 [07:25:20] Epoch: 1 Batch: 2221/20099 (11.05%) Loss: 2.039381 LR: 0.00001729 [07:25:23] Epoch: 1 Batch: 2222/20099 (11.06%) Loss: 2.445382 LR: 0.00001729 [07:25:26] Epoch: 1 Batch: 2223/20099 (11.06%) Loss: 2.404015 LR: 0.00001729 [07:25:29] Epoch: 1 Batch: 2224/20099 (11.07%) Loss: 2.334263 LR: 0.00001729 [07:25:32] Epoch: 1 Batch: 2225/20099 (11.07%) Loss: 2.011540 LR: 0.00001729 [07:25:35] Epoch: 1 Batch: 2226/20099 (11.08%) Loss: 2.267080 LR: 0.00001735 [07:25:38] Epoch: 1 Batch: 2227/20099 (11.08%) Loss: 2.219900 LR: 0.00001735 [07:25:42] Epoch: 1 Batch: 2228/20099 (11.09%) Loss: 1.820150 LR: 0.00001735 [07:25:45] Epoch: 1 Batch: 2229/20099 (11.09%) Loss: 2.405012 LR: 0.00001735 [07:25:48] Epoch: 1 Batch: 2230/20099 (11.10%) Loss: 2.415494 LR: 0.00001735 [07:25:51] Epoch: 1 Batch: 2231/20099 (11.10%) Loss: 2.179748 LR: 0.00001735 [07:25:54] Epoch: 1 Batch: 2232/20099 (11.11%) Loss: 2.336345 LR: 0.00001735 [07:25:58] Epoch: 1 Batch: 2233/20099 (11.11%) Loss: 2.455586 LR: 0.00001740 [07:26:01] Epoch: 1 Batch: 2234/20099 (11.11%) Loss: 2.206902 LR: 0.00001740 [07:26:04] Epoch: 1 Batch: 2235/20099 (11.12%) Loss: 2.518841 LR: 0.00001740 [07:26:07] Epoch: 1 Batch: 2236/20099 (11.12%) Loss: 2.418238 LR: 0.00001740 [07:26:10] Epoch: 1 Batch: 2237/20099 (11.13%) Loss: 2.120307 LR: 0.00001740 [07:26:13] Epoch: 1 Batch: 2238/20099 (11.13%) Loss: 2.317286 LR: 0.00001740 [07:26:16] Epoch: 1 Batch: 2239/20099 (11.14%) Loss: 2.217853 LR: 0.00001740 [07:26:19] Epoch: 1 Batch: 2240/20099 (11.14%) Loss: 2.056600 LR: 0.00001745 [07:26:22] Epoch: 1 Batch: 2241/20099 (11.15%) Loss: 2.349730 LR: 0.00001745 [07:26:25] Epoch: 1 Batch: 2242/20099 (11.15%) Loss: 2.209155 LR: 0.00001745 [07:26:29] Epoch: 1 Batch: 2243/20099 (11.16%) Loss: 2.266071 LR: 
0.00001745 [07:26:32] Epoch: 1 Batch: 2244/20099 (11.16%) Loss: 2.289361 LR: 0.00001745 [07:26:35] Epoch: 1 Batch: 2245/20099 (11.17%) Loss: 2.041457 LR: 0.00001745 [07:26:38] Epoch: 1 Batch: 2246/20099 (11.17%) Loss: 2.333389 LR: 0.00001745 [07:26:41] Epoch: 1 Batch: 2247/20099 (11.18%) Loss: 2.399965 LR: 0.00001751 [07:26:44] Epoch: 1 Batch: 2248/20099 (11.18%) Loss: 2.248127 LR: 0.00001751 [07:26:47] Epoch: 1 Batch: 2249/20099 (11.19%) Loss: 2.158881 LR: 0.00001751 [07:26:50] Epoch: 1 Batch: 2250/20099 (11.19%) Loss: 2.291950 LR: 0.00001751 [07:26:53] Epoch: 1 Batch: 2251/20099 (11.20%) Loss: 2.079212 LR: 0.00001751 [07:26:56] Epoch: 1 Batch: 2252/20099 (11.20%) Loss: 2.137505 LR: 0.00001751 [07:26:59] Epoch: 1 Batch: 2253/20099 (11.21%) Loss: 2.358327 LR: 0.00001751 [07:27:03] Epoch: 1 Batch: 2254/20099 (11.21%) Loss: 2.137950 LR: 0.00001756 [07:27:06] Epoch: 1 Batch: 2255/20099 (11.22%) Loss: 2.569938 LR: 0.00001756 [07:27:09] Epoch: 1 Batch: 2256/20099 (11.22%) Loss: 2.263758 LR: 0.00001756 [07:27:12] Epoch: 1 Batch: 2257/20099 (11.23%) Loss: 2.289107 LR: 0.00001756 [07:27:15] Epoch: 1 Batch: 2258/20099 (11.23%) Loss: 2.245047 LR: 0.00001756 [07:27:18] Epoch: 1 Batch: 2259/20099 (11.24%) Loss: 2.197377 LR: 0.00001756 [07:27:21] Epoch: 1 Batch: 2260/20099 (11.24%) Loss: 1.934034 LR: 0.00001756 [07:27:24] Epoch: 1 Batch: 2261/20099 (11.25%) Loss: 2.095039 LR: 0.00001762 [07:27:27] Epoch: 1 Batch: 2262/20099 (11.25%) Loss: 2.252041 LR: 0.00001762 [07:27:30] Epoch: 1 Batch: 2263/20099 (11.26%) Loss: 2.201217 LR: 0.00001762 [07:27:34] Epoch: 1 Batch: 2264/20099 (11.26%) Loss: 2.315679 LR: 0.00001762 [07:27:37] Epoch: 1 Batch: 2265/20099 (11.27%) Loss: 2.194971 LR: 0.00001762 [07:27:40] Epoch: 1 Batch: 2266/20099 (11.27%) Loss: 2.411655 LR: 0.00001762 [07:27:43] Epoch: 1 Batch: 2267/20099 (11.28%) Loss: 2.503161 LR: 0.00001762 [07:27:46] Epoch: 1 Batch: 2268/20099 (11.28%) Loss: 2.113563 LR: 0.00001767 [07:27:49] Epoch: 1 Batch: 2269/20099 (11.29%) Loss: 1.854295 
LR: 0.00001767 [07:27:52] Epoch: 1 Batch: 2270/20099 (11.29%) Loss: 2.340843 LR: 0.00001767 [07:27:55] Epoch: 1 Batch: 2271/20099 (11.30%) Loss: 2.493485 LR: 0.00001767 [07:27:58] Epoch: 1 Batch: 2272/20099 (11.30%) Loss: 2.432115 LR: 0.00001767 [07:28:01] Epoch: 1 Batch: 2273/20099 (11.31%) Loss: 2.258824 LR: 0.00001767 [07:28:04] Epoch: 1 Batch: 2274/20099 (11.31%) Loss: 2.418018 LR: 0.00001767 [07:28:08] Epoch: 1 Batch: 2275/20099 (11.32%) Loss: 2.187926 LR: 0.00001773 [07:28:11] Epoch: 1 Batch: 2276/20099 (11.32%) Loss: 1.895923 LR: 0.00001773 [07:28:14] Epoch: 1 Batch: 2277/20099 (11.33%) Loss: 2.061088 LR: 0.00001773 [07:28:17] Epoch: 1 Batch: 2278/20099 (11.33%) Loss: 2.244972 LR: 0.00001773 [07:28:20] Epoch: 1 Batch: 2279/20099 (11.34%) Loss: 2.213605 LR: 0.00001773 [07:28:23] Epoch: 1 Batch: 2280/20099 (11.34%) Loss: 2.434781 LR: 0.00001773 [07:28:26] Epoch: 1 Batch: 2281/20099 (11.35%) Loss: 2.034761 LR: 0.00001773 [07:28:29] Epoch: 1 Batch: 2282/20099 (11.35%) Loss: 2.375212 LR: 0.00001778 [07:28:32] Epoch: 1 Batch: 2283/20099 (11.36%) Loss: 2.267987 LR: 0.00001778 [07:28:35] Epoch: 1 Batch: 2284/20099 (11.36%) Loss: 2.209011 LR: 0.00001778 [07:28:38] Epoch: 1 Batch: 2285/20099 (11.37%) Loss: 2.319532 LR: 0.00001778 [07:28:41] Epoch: 1 Batch: 2286/20099 (11.37%) Loss: 1.637609 LR: 0.00001778 [07:28:45] Epoch: 1 Batch: 2287/20099 (11.38%) Loss: 2.268727 LR: 0.00001778 [07:28:48] Epoch: 1 Batch: 2288/20099 (11.38%) Loss: 2.260520 LR: 0.00001778 [07:28:51] Epoch: 1 Batch: 2289/20099 (11.39%) Loss: 2.483506 LR: 0.00001784 [07:28:54] Epoch: 1 Batch: 2290/20099 (11.39%) Loss: 2.231959 LR: 0.00001784 [07:28:57] Epoch: 1 Batch: 2291/20099 (11.40%) Loss: 2.560749 LR: 0.00001784 [07:29:00] Epoch: 1 Batch: 2292/20099 (11.40%) Loss: 2.193611 LR: 0.00001784 [07:29:03] Epoch: 1 Batch: 2293/20099 (11.41%) Loss: 2.286368 LR: 0.00001784 [07:29:06] Epoch: 1 Batch: 2294/20099 (11.41%) Loss: 2.169125 LR: 0.00001784 [07:29:09] Epoch: 1 Batch: 2295/20099 (11.42%) Loss: 
2.442679 LR: 0.00001784 [07:29:12] Epoch: 1 Batch: 2296/20099 (11.42%) Loss: 2.262251 LR: 0.00001789 [07:29:15] Epoch: 1 Batch: 2297/20099 (11.43%) Loss: 2.266653 LR: 0.00001789 [07:29:19] Epoch: 1 Batch: 2298/20099 (11.43%) Loss: 2.445362 LR: 0.00001789 [07:29:22] Epoch: 1 Batch: 2299/20099 (11.44%) Loss: 2.358265 LR: 0.00001789 [07:29:25] Epoch: 1 Batch: 2300/20099 (11.44%) Loss: 2.353177 LR: 0.00001789 [07:29:28] Epoch: 1 Batch: 2301/20099 (11.45%) Loss: 2.074112 LR: 0.00001789 [07:29:31] Epoch: 1 Batch: 2302/20099 (11.45%) Loss: 2.028365 LR: 0.00001789 [07:29:34] Epoch: 1 Batch: 2303/20099 (11.46%) Loss: 2.506512 LR: 0.00001795 [07:29:37] Epoch: 1 Batch: 2304/20099 (11.46%) Loss: 2.077575 LR: 0.00001795 [07:29:40] Epoch: 1 Batch: 2305/20099 (11.47%) Loss: 2.112476 LR: 0.00001795 [07:29:43] Epoch: 1 Batch: 2306/20099 (11.47%) Loss: 2.293051 LR: 0.00001795 [07:29:46] Epoch: 1 Batch: 2307/20099 (11.48%) Loss: 2.301848 LR: 0.00001795 [07:29:49] Epoch: 1 Batch: 2308/20099 (11.48%) Loss: 2.101388 LR: 0.00001795 [07:29:53] Epoch: 1 Batch: 2309/20099 (11.49%) Loss: 2.236845 LR: 0.00001795 [07:29:56] Epoch: 1 Batch: 2310/20099 (11.49%) Loss: 2.530491 LR: 0.00001800 [07:29:59] Epoch: 1 Batch: 2311/20099 (11.50%) Loss: 2.431580 LR: 0.00001800 [07:30:02] Epoch: 1 Batch: 2312/20099 (11.50%) Loss: 2.377620 LR: 0.00001800 [07:30:05] Epoch: 1 Batch: 2313/20099 (11.51%) Loss: 2.392697 LR: 0.00001800 [07:30:08] Epoch: 1 Batch: 2314/20099 (11.51%) Loss: 2.497182 LR: 0.00001800 [07:30:11] Epoch: 1 Batch: 2315/20099 (11.52%) Loss: 2.419215 LR: 0.00001800 [07:30:14] Epoch: 1 Batch: 2316/20099 (11.52%) Loss: 2.185939 LR: 0.00001800 [07:30:17] Epoch: 1 Batch: 2317/20099 (11.53%) Loss: 2.259358 LR: 0.00001805 [07:30:20] Epoch: 1 Batch: 2318/20099 (11.53%) Loss: 2.219774 LR: 0.00001805 [07:30:24] Epoch: 1 Batch: 2319/20099 (11.54%) Loss: 2.392250 LR: 0.00001805 [07:30:27] Epoch: 1 Batch: 2320/20099 (11.54%) Loss: 2.345712 LR: 0.00001805 [07:30:30] Epoch: 1 Batch: 2321/20099 (11.55%) 
Loss: 2.163954 LR: 0.00001805 [07:30:33] Epoch: 1 Batch: 2322/20099 (11.55%) Loss: 2.197498 LR: 0.00001805 [07:30:36] Epoch: 1 Batch: 2323/20099 (11.56%) Loss: 2.287405 LR: 0.00001805 [07:30:39] Epoch: 1 Batch: 2324/20099 (11.56%) Loss: 2.259254 LR: 0.00001811 [07:30:42] Epoch: 1 Batch: 2325/20099 (11.57%) Loss: 2.182793 LR: 0.00001811 [07:30:45] Epoch: 1 Batch: 2326/20099 (11.57%) Loss: 2.276844 LR: 0.00001811 [07:30:48] Epoch: 1 Batch: 2327/20099 (11.58%) Loss: 1.962260 LR: 0.00001811 [07:30:51] Epoch: 1 Batch: 2328/20099 (11.58%) Loss: 2.307644 LR: 0.00001811 [07:30:54] Epoch: 1 Batch: 2329/20099 (11.59%) Loss: 2.300001 LR: 0.00001811 [07:30:58] Epoch: 1 Batch: 2330/20099 (11.59%) Loss: 2.341674 LR: 0.00001811 [07:31:01] Epoch: 1 Batch: 2331/20099 (11.60%) Loss: 2.235319 LR: 0.00001816 [07:31:04] Epoch: 1 Batch: 2332/20099 (11.60%) Loss: 2.292845 LR: 0.00001816 [07:31:07] Epoch: 1 Batch: 2333/20099 (11.61%) Loss: 2.275460 LR: 0.00001816 [07:31:10] Epoch: 1 Batch: 2334/20099 (11.61%) Loss: 2.094487 LR: 0.00001816 [07:31:13] Epoch: 1 Batch: 2335/20099 (11.62%) Loss: 2.381038 LR: 0.00001816 [07:31:16] Epoch: 1 Batch: 2336/20099 (11.62%) Loss: 2.430778 LR: 0.00001816 [07:31:19] Epoch: 1 Batch: 2337/20099 (11.63%) Loss: 2.397788 LR: 0.00001816 [07:31:22] Epoch: 1 Batch: 2338/20099 (11.63%) Loss: 2.337412 LR: 0.00001822 [07:31:25] Epoch: 1 Batch: 2339/20099 (11.64%) Loss: 2.185457 LR: 0.00001822 [07:31:29] Epoch: 1 Batch: 2340/20099 (11.64%) Loss: 2.143036 LR: 0.00001822 [07:31:32] Epoch: 1 Batch: 2341/20099 (11.65%) Loss: 1.926982 LR: 0.00001822 [07:31:35] Epoch: 1 Batch: 2342/20099 (11.65%) Loss: 2.271755 LR: 0.00001822 [07:31:38] Epoch: 1 Batch: 2343/20099 (11.66%) Loss: 2.345785 LR: 0.00001822 [07:31:41] Epoch: 1 Batch: 2344/20099 (11.66%) Loss: 2.166161 LR: 0.00001822 [07:31:44] Epoch: 1 Batch: 2345/20099 (11.67%) Loss: 2.046388 LR: 0.00001827 [07:31:47] Epoch: 1 Batch: 2346/20099 (11.67%) Loss: 2.263599 LR: 0.00001827 [07:31:50] Epoch: 1 Batch: 2347/20099 
(11.68%) Loss: 2.067366 LR: 0.00001827 [07:31:53] Epoch: 1 Batch: 2348/20099 (11.68%) Loss: 2.274108 LR: 0.00001827 [07:31:56] Epoch: 1 Batch: 2349/20099 (11.69%) Loss: 2.387689 LR: 0.00001827 [07:32:00] Epoch: 1 Batch: 2350/20099 (11.69%) Loss: 2.247333 LR: 0.00001827 [07:32:03] Epoch: 1 Batch: 2351/20099 (11.70%) Loss: 2.300290 LR: 0.00001827 [07:32:06] Epoch: 1 Batch: 2352/20099 (11.70%) Loss: 2.393127 LR: 0.00001833 [07:32:09] Epoch: 1 Batch: 2353/20099 (11.71%) Loss: 2.277577 LR: 0.00001833 [07:32:12] Epoch: 1 Batch: 2354/20099 (11.71%) Loss: 2.340128 LR: 0.00001833 [07:32:15] Epoch: 1 Batch: 2355/20099 (11.72%) Loss: 2.167192 LR: 0.00001833 [07:32:18] Epoch: 1 Batch: 2356/20099 (11.72%) Loss: 2.153863 LR: 0.00001833 [07:32:21] Epoch: 1 Batch: 2357/20099 (11.73%) Loss: 2.420443 LR: 0.00001833 [07:32:24] Epoch: 1 Batch: 2358/20099 (11.73%) Loss: 2.433759 LR: 0.00001833 [07:32:27] Epoch: 1 Batch: 2359/20099 (11.74%) Loss: 2.406386 LR: 0.00001838 [07:32:31] Epoch: 1 Batch: 2360/20099 (11.74%) Loss: 2.473785 LR: 0.00001838 [07:32:34] Epoch: 1 Batch: 2361/20099 (11.75%) Loss: 2.288433 LR: 0.00001838 [07:32:37] Epoch: 1 Batch: 2362/20099 (11.75%) Loss: 2.087006 LR: 0.00001838 [07:32:40] Epoch: 1 Batch: 2363/20099 (11.76%) Loss: 2.089127 LR: 0.00001838 [07:32:43] Epoch: 1 Batch: 2364/20099 (11.76%) Loss: 2.444918 LR: 0.00001838 [07:32:46] Epoch: 1 Batch: 2365/20099 (11.77%) Loss: 2.234603 LR: 0.00001838 [07:32:49] Epoch: 1 Batch: 2366/20099 (11.77%) Loss: 2.328384 LR: 0.00001844 [07:32:52] Epoch: 1 Batch: 2367/20099 (11.78%) Loss: 2.102800 LR: 0.00001844 [07:32:55] Epoch: 1 Batch: 2368/20099 (11.78%) Loss: 2.038360 LR: 0.00001844 [07:32:59] Epoch: 1 Batch: 2369/20099 (11.79%) Loss: 2.287847 LR: 0.00001844 [07:33:02] Epoch: 1 Batch: 2370/20099 (11.79%) Loss: 2.127258 LR: 0.00001844 [07:33:05] Epoch: 1 Batch: 2371/20099 (11.80%) Loss: 2.355172 LR: 0.00001844 [07:33:08] Epoch: 1 Batch: 2372/20099 (11.80%) Loss: 2.116716 LR: 0.00001844 [07:33:11] Epoch: 1 Batch: 
2373/20099 (11.81%) Loss: 2.226033 LR: 0.00001849 [07:33:14] Epoch: 1 Batch: 2374/20099 (11.81%) Loss: 2.184504 LR: 0.00001849 [07:33:17] Epoch: 1 Batch: 2375/20099 (11.82%) Loss: 2.260288 LR: 0.00001849 [07:33:20] Epoch: 1 Batch: 2376/20099 (11.82%) Loss: 2.255987 LR: 0.00001849 [07:33:23] Epoch: 1 Batch: 2377/20099 (11.83%) Loss: 2.411364 LR: 0.00001849 [07:33:27] Epoch: 1 Batch: 2378/20099 (11.83%) Loss: 2.199879 LR: 0.00001849 [07:33:30] Epoch: 1 Batch: 2379/20099 (11.84%) Loss: 2.195881 LR: 0.00001849 [07:33:33] Epoch: 1 Batch: 2380/20099 (11.84%) Loss: 1.816497 LR: 0.00001855 [07:33:36] Epoch: 1 Batch: 2381/20099 (11.85%) Loss: 2.161429 LR: 0.00001855 [07:33:39] Epoch: 1 Batch: 2382/20099 (11.85%) Loss: 2.385235 LR: 0.00001855 [07:33:42] Epoch: 1 Batch: 2383/20099 (11.86%) Loss: 2.092165 LR: 0.00001855 [07:33:45] Epoch: 1 Batch: 2384/20099 (11.86%) Loss: 2.221914 LR: 0.00001855 [07:33:48] Epoch: 1 Batch: 2385/20099 (11.87%) Loss: 2.307705 LR: 0.00001855 [07:33:51] Epoch: 1 Batch: 2386/20099 (11.87%) Loss: 2.318867 LR: 0.00001855 [07:33:54] Epoch: 1 Batch: 2387/20099 (11.88%) Loss: 2.426428 LR: 0.00001860 [07:33:58] Epoch: 1 Batch: 2388/20099 (11.88%) Loss: 2.140994 LR: 0.00001860 [07:34:01] Epoch: 1 Batch: 2389/20099 (11.89%) Loss: 2.290293 LR: 0.00001860 [07:34:04] Epoch: 1 Batch: 2390/20099 (11.89%) Loss: 2.007528 LR: 0.00001860 [07:34:07] Epoch: 1 Batch: 2391/20099 (11.90%) Loss: 2.167539 LR: 0.00001860 [07:34:10] Epoch: 1 Batch: 2392/20099 (11.90%) Loss: 2.326761 LR: 0.00001860 [07:34:13] Epoch: 1 Batch: 2393/20099 (11.91%) Loss: 1.986004 LR: 0.00001860 [07:34:16] Epoch: 1 Batch: 2394/20099 (11.91%) Loss: 1.992854 LR: 0.00001865 [07:34:19] Epoch: 1 Batch: 2395/20099 (11.92%) Loss: 2.356692 LR: 0.00001865 [07:34:22] Epoch: 1 Batch: 2396/20099 (11.92%) Loss: 2.322082 LR: 0.00001865 [07:34:25] Epoch: 1 Batch: 2397/20099 (11.93%) Loss: 2.363139 LR: 0.00001865 [07:34:29] Epoch: 1 Batch: 2398/20099 (11.93%) Loss: 2.349685 LR: 0.00001865 [07:34:32] Epoch: 1 
Batch: 2399/20099 (11.94%) Loss: 2.484805 LR: 0.00001865 [07:34:38] >> Cleaned up old temp checkpoint: epoch1_step400 [07:34:38] >> Temp checkpoint saved: epoch1_step2400, size: 0.1693 GB [07:34:38] Epoch: 1 Batch: 2400/20099 (11.94%) Loss: 2.348801 LR: 0.00001865 [07:34:42] Epoch: 1 Batch: 2401/20099 (11.95%) Loss: 2.220187 LR: 0.00001871 [07:34:45] Epoch: 1 Batch: 2402/20099 (11.95%) Loss: 2.024014 LR: 0.00001871 [07:34:48] Epoch: 1 Batch: 2403/20099 (11.96%) Loss: 2.001157 LR: 0.00001871 [07:34:51] Epoch: 1 Batch: 2404/20099 (11.96%) Loss: 2.390901 LR: 0.00001871 [07:34:54] Epoch: 1 Batch: 2405/20099 (11.97%) Loss: 2.469045 LR: 0.00001871 [07:34:57] Epoch: 1 Batch: 2406/20099 (11.97%) Loss: 2.221735 LR: 0.00001871 [07:35:00] Epoch: 1 Batch: 2407/20099 (11.98%) Loss: 2.217654 LR: 0.00001871 [07:35:03] Epoch: 1 Batch: 2408/20099 (11.98%) Loss: 2.452870 LR: 0.00001876 [07:35:06] Epoch: 1 Batch: 2409/20099 (11.99%) Loss: 2.304440 LR: 0.00001876 [07:35:10] Epoch: 1 Batch: 2410/20099 (11.99%) Loss: 2.290774 LR: 0.00001876 [07:35:13] Epoch: 1 Batch: 2411/20099 (12.00%) Loss: 2.192205 LR: 0.00001876 [07:35:16] Epoch: 1 Batch: 2412/20099 (12.00%) Loss: 2.633580 LR: 0.00001876 [07:35:19] Epoch: 1 Batch: 2413/20099 (12.01%) Loss: 2.602895 LR: 0.00001876 [07:35:22] Epoch: 1 Batch: 2414/20099 (12.01%) Loss: 2.314735 LR: 0.00001876 [07:35:25] Epoch: 1 Batch: 2415/20099 (12.02%) Loss: 2.255551 LR: 0.00001882 [07:35:28] Epoch: 1 Batch: 2416/20099 (12.02%) Loss: 2.021905 LR: 0.00001882 [07:35:31] Epoch: 1 Batch: 2417/20099 (12.03%) Loss: 2.523207 LR: 0.00001882 [07:35:35] Epoch: 1 Batch: 2418/20099 (12.03%) Loss: 2.228493 LR: 0.00001882 [07:35:38] Epoch: 1 Batch: 2419/20099 (12.04%) Loss: 2.212708 LR: 0.00001882 [07:35:41] Epoch: 1 Batch: 2420/20099 (12.04%) Loss: 2.421712 LR: 0.00001882 [07:35:44] Epoch: 1 Batch: 2421/20099 (12.05%) Loss: 2.050515 LR: 0.00001882 [07:35:47] Epoch: 1 Batch: 2422/20099 (12.05%) Loss: 1.923726 LR: 0.00001887 [07:35:50] Epoch: 1 Batch: 2423/20099 
(12.06%) Loss: 1.975281 LR: 0.00001887 [07:35:53] Epoch: 1 Batch: 2424/20099 (12.06%) Loss: 2.134868 LR: 0.00001887 [07:35:56] Epoch: 1 Batch: 2425/20099 (12.07%) Loss: 2.547057 LR: 0.00001887 [07:35:59] Epoch: 1 Batch: 2426/20099 (12.07%) Loss: 2.008806 LR: 0.00001887 [07:36:02] Epoch: 1 Batch: 2427/20099 (12.08%) Loss: 2.321616 LR: 0.00001887 [07:36:05] Epoch: 1 Batch: 2428/20099 (12.08%) Loss: 2.218908 LR: 0.00001887 [07:36:09] Epoch: 1 Batch: 2429/20099 (12.09%) Loss: 2.134288 LR: 0.00001893 [07:36:12] Epoch: 1 Batch: 2430/20099 (12.09%) Loss: 2.059369 LR: 0.00001893 [07:36:15] Epoch: 1 Batch: 2431/20099 (12.10%) Loss: 2.198492 LR: 0.00001893 [07:36:18] Epoch: 1 Batch: 2432/20099 (12.10%) Loss: 2.346263 LR: 0.00001893 [07:36:21] Epoch: 1 Batch: 2433/20099 (12.11%) Loss: 2.100940 LR: 0.00001893 [07:36:24] Epoch: 1 Batch: 2434/20099 (12.11%) Loss: 1.863545 LR: 0.00001893 [07:36:27] Epoch: 1 Batch: 2435/20099 (12.12%) Loss: 2.209511 LR: 0.00001893 [07:36:30] Epoch: 1 Batch: 2436/20099 (12.12%) Loss: 2.450082 LR: 0.00001898 [07:36:33] Epoch: 1 Batch: 2437/20099 (12.12%) Loss: 2.427570 LR: 0.00001898 [07:36:36] Epoch: 1 Batch: 2438/20099 (12.13%) Loss: 2.369405 LR: 0.00001898 [07:36:39] Epoch: 1 Batch: 2439/20099 (12.13%) Loss: 2.299678 LR: 0.00001898 [07:36:43] Epoch: 1 Batch: 2440/20099 (12.14%) Loss: 2.358124 LR: 0.00001898 [07:36:46] Epoch: 1 Batch: 2441/20099 (12.14%) Loss: 2.086754 LR: 0.00001898 [07:36:49] Epoch: 1 Batch: 2442/20099 (12.15%) Loss: 2.013250 LR: 0.00001898 [07:36:52] Epoch: 1 Batch: 2443/20099 (12.15%) Loss: 2.108417 LR: 0.00001904 [07:36:55] Epoch: 1 Batch: 2444/20099 (12.16%) Loss: 2.580287 LR: 0.00001904 [07:36:58] Epoch: 1 Batch: 2445/20099 (12.16%) Loss: 2.278261 LR: 0.00001904 [07:37:01] Epoch: 1 Batch: 2446/20099 (12.17%) Loss: 2.355254 LR: 0.00001904 [07:37:04] Epoch: 1 Batch: 2447/20099 (12.17%) Loss: 2.183835 LR: 0.00001904 [07:37:07] Epoch: 1 Batch: 2448/20099 (12.18%) Loss: 2.211163 LR: 0.00001904 [07:37:11] Epoch: 1 Batch: 
2449/20099 (12.18%) Loss: 2.198703 LR: 0.00001904 [07:37:14] Epoch: 1 Batch: 2450/20099 (12.19%) Loss: 2.058736 LR: 0.00001909 [07:37:17] Epoch: 1 Batch: 2451/20099 (12.19%) Loss: 2.130414 LR: 0.00001909 [07:37:20] Epoch: 1 Batch: 2452/20099 (12.20%) Loss: 2.057219 LR: 0.00001909 [07:37:23] Epoch: 1 Batch: 2453/20099 (12.20%) Loss: 2.139417 LR: 0.00001909 [07:37:26] Epoch: 1 Batch: 2454/20099 (12.21%) Loss: 1.942473 LR: 0.00001909 [07:37:29] Epoch: 1 Batch: 2455/20099 (12.21%) Loss: 2.254039 LR: 0.00001909 [07:37:32] Epoch: 1 Batch: 2456/20099 (12.22%) Loss: 2.186723 LR: 0.00001909 [07:37:35] Epoch: 1 Batch: 2457/20099 (12.22%) Loss: 2.691996 LR: 0.00001915 [07:37:38] Epoch: 1 Batch: 2458/20099 (12.23%) Loss: 2.700078 LR: 0.00001915 [07:37:42] Epoch: 1 Batch: 2459/20099 (12.23%) Loss: 2.402087 LR: 0.00001915 [07:37:45] Epoch: 1 Batch: 2460/20099 (12.24%) Loss: 2.241405 LR: 0.00001915 [07:37:48] Epoch: 1 Batch: 2461/20099 (12.24%) Loss: 2.268862 LR: 0.00001915 [07:37:51] Epoch: 1 Batch: 2462/20099 (12.25%) Loss: 2.337632 LR: 0.00001915 [07:37:54] Epoch: 1 Batch: 2463/20099 (12.25%) Loss: 2.109412 LR: 0.00001915 [07:37:57] Epoch: 1 Batch: 2464/20099 (12.26%) Loss: 2.486362 LR: 0.00001920 [07:38:00] Epoch: 1 Batch: 2465/20099 (12.26%) Loss: 2.306170 LR: 0.00001920 [07:38:03] Epoch: 1 Batch: 2466/20099 (12.27%) Loss: 2.304505 LR: 0.00001920 [07:38:06] Epoch: 1 Batch: 2467/20099 (12.27%) Loss: 2.122912 LR: 0.00001920 [07:38:09] Epoch: 1 Batch: 2468/20099 (12.28%) Loss: 2.344028 LR: 0.00001920 [07:38:12] Epoch: 1 Batch: 2469/20099 (12.28%) Loss: 2.247477 LR: 0.00001920 [07:38:16] Epoch: 1 Batch: 2470/20099 (12.29%) Loss: 2.147029 LR: 0.00001920 [07:38:19] Epoch: 1 Batch: 2471/20099 (12.29%) Loss: 2.391531 LR: 0.00001925 [07:38:22] Epoch: 1 Batch: 2472/20099 (12.30%) Loss: 2.184744 LR: 0.00001925 [07:38:25] Epoch: 1 Batch: 2473/20099 (12.30%) Loss: 2.191040 LR: 0.00001925 [07:38:28] Epoch: 1 Batch: 2474/20099 (12.31%) Loss: 2.332690 LR: 0.00001925 [07:38:31] Epoch: 1 
Batch: 2475/20099 (12.31%) Loss: 2.463895 LR: 0.00001925 [07:38:34] Epoch: 1 Batch: 2476/20099 (12.32%) Loss: 1.856904 LR: 0.00001925 [07:38:37] Epoch: 1 Batch: 2477/20099 (12.32%) Loss: 2.316909 LR: 0.00001925 [07:38:40] Epoch: 1 Batch: 2478/20099 (12.33%) Loss: 2.293367 LR: 0.00001931 [07:38:43] Epoch: 1 Batch: 2479/20099 (12.33%) Loss: 2.164303 LR: 0.00001931 [07:38:46] Epoch: 1 Batch: 2480/20099 (12.34%) Loss: 2.276574 LR: 0.00001931 [07:38:50] Epoch: 1 Batch: 2481/20099 (12.34%) Loss: 2.073965 LR: 0.00001931 [07:38:53] Epoch: 1 Batch: 2482/20099 (12.35%) Loss: 2.147082 LR: 0.00001931 [07:38:56] Epoch: 1 Batch: 2483/20099 (12.35%) Loss: 2.377464 LR: 0.00001931 [07:38:59] Epoch: 1 Batch: 2484/20099 (12.36%) Loss: 1.895467 LR: 0.00001931 [07:39:02] Epoch: 1 Batch: 2485/20099 (12.36%) Loss: 2.226432 LR: 0.00001936 [07:39:05] Epoch: 1 Batch: 2486/20099 (12.37%) Loss: 1.859377 LR: 0.00001936 [07:39:08] Epoch: 1 Batch: 2487/20099 (12.37%) Loss: 2.297233 LR: 0.00001936 [07:39:11] Epoch: 1 Batch: 2488/20099 (12.38%) Loss: 2.406906 LR: 0.00001936 [07:39:14] Epoch: 1 Batch: 2489/20099 (12.38%) Loss: 1.951037 LR: 0.00001936 [07:39:17] Epoch: 1 Batch: 2490/20099 (12.39%) Loss: 1.933472 LR: 0.00001936 [07:39:20] Epoch: 1 Batch: 2491/20099 (12.39%) Loss: 2.001258 LR: 0.00001936 [07:39:24] Epoch: 1 Batch: 2492/20099 (12.40%) Loss: 2.424253 LR: 0.00001942 [07:39:27] Epoch: 1 Batch: 2493/20099 (12.40%) Loss: 2.236502 LR: 0.00001942 [07:39:30] Epoch: 1 Batch: 2494/20099 (12.41%) Loss: 2.374649 LR: 0.00001942 [07:39:33] Epoch: 1 Batch: 2495/20099 (12.41%) Loss: 2.228652 LR: 0.00001942 [07:39:36] Epoch: 1 Batch: 2496/20099 (12.42%) Loss: 2.513165 LR: 0.00001942 [07:39:39] Epoch: 1 Batch: 2497/20099 (12.42%) Loss: 2.257178 LR: 0.00001942 [07:39:42] Epoch: 1 Batch: 2498/20099 (12.43%) Loss: 2.273577 LR: 0.00001942 [07:39:45] Epoch: 1 Batch: 2499/20099 (12.43%) Loss: 1.954978 LR: 0.00001947 [07:39:48] >> Evaluating batch 0 [07:39:50] >> Evaluating batch 1 [07:39:51] >> Evaluating 
batch 2 [07:39:52] >> Evaluating batch 3 [07:39:53] >> Evaluating batch 4 [07:39:55] >> Evaluating batch 5 [07:39:56] >> Evaluating batch 6 [07:39:57] >> Evaluating batch 7 [07:39:59] >> Evaluating batch 8 [07:40:00] >> Evaluating batch 9 [07:40:01] >> Evaluating batch 10 [07:40:02] >> Evaluating batch 11 [07:40:03] >> Evaluating batch 12 [07:40:04] >> Evaluating batch 13 [07:40:06] >> Evaluating batch 14 [07:40:07] >> Evaluating batch 15 [07:40:08] >> Evaluating batch 16 [07:40:09] Epoch: 1 Step: 2500/20099 Evaluation: [07:40:09] Avg Loss Since Last Eval: 2.2613 Val Loss: 2.3137 Validation loss delta: -0.0425 Perplexity: 10.1120 LR: 0.00001947 [07:40:12] >> Checkpoint saved: epoch1_step2500, size: 0.1693 GB [07:40:12] Epoch: 1 Batch: 2500/20099 (12.44%) Loss: 1.880281 LR: 0.00001947 [07:40:15] Epoch: 1 Batch: 2501/20099 (12.44%) Loss: 2.622879 LR: 0.00001947 [07:40:18] Epoch: 1 Batch: 2502/20099 (12.45%) Loss: 2.188593 LR: 0.00001947 [07:40:21] Epoch: 1 Batch: 2503/20099 (12.45%) Loss: 2.152611 LR: 0.00001947 [07:40:25] Epoch: 1 Batch: 2504/20099 (12.46%) Loss: 2.273496 LR: 0.00001947 [07:40:28] Epoch: 1 Batch: 2505/20099 (12.46%) Loss: 2.543478 LR: 0.00001947 [07:40:31] Epoch: 1 Batch: 2506/20099 (12.47%) Loss: 2.349094 LR: 0.00001953 [07:40:34] Epoch: 1 Batch: 2507/20099 (12.47%) Loss: 1.993045 LR: 0.00001953 [07:40:37] Epoch: 1 Batch: 2508/20099 (12.48%) Loss: 2.237552 LR: 0.00001953 [07:40:40] Epoch: 1 Batch: 2509/20099 (12.48%) Loss: 2.067101 LR: 0.00001953 [07:40:43] Epoch: 1 Batch: 2510/20099 (12.49%) Loss: 2.039281 LR: 0.00001953 [07:40:46] Epoch: 1 Batch: 2511/20099 (12.49%) Loss: 2.329012 LR: 0.00001953 [07:40:49] Epoch: 1 Batch: 2512/20099 (12.50%) Loss: 2.287578 LR: 0.00001953 [07:40:53] Epoch: 1 Batch: 2513/20099 (12.50%) Loss: 2.235556 LR: 0.00001958 [07:40:56] Epoch: 1 Batch: 2514/20099 (12.51%) Loss: 2.187831 LR: 0.00001958 [07:40:59] Epoch: 1 Batch: 2515/20099 (12.51%) Loss: 2.352833 LR: 0.00001958 [07:41:02] Epoch: 1 Batch: 2516/20099 (12.52%) 
Loss: 2.101795 LR: 0.00001958 [07:41:05] Epoch: 1 Batch: 2517/20099 (12.52%) Loss: 2.415463 LR: 0.00001958 [07:41:08] Epoch: 1 Batch: 2518/20099 (12.53%) Loss: 2.008004 LR: 0.00001958 [07:41:11] Epoch: 1 Batch: 2519/20099 (12.53%) Loss: 2.271945 LR: 0.00001958 [07:41:14] Epoch: 1 Batch: 2520/20099 (12.54%) Loss: 2.556609 LR: 0.00001964 [07:41:17] Epoch: 1 Batch: 2521/20099 (12.54%) Loss: 2.310710 LR: 0.00001964 [07:41:20] Epoch: 1 Batch: 2522/20099 (12.55%) Loss: 2.175994 LR: 0.00001964 [07:41:24] Epoch: 1 Batch: 2523/20099 (12.55%) Loss: 2.208343 LR: 0.00001964 [07:41:27] Epoch: 1 Batch: 2524/20099 (12.56%) Loss: 1.982280 LR: 0.00001964 [07:41:30] Epoch: 1 Batch: 2525/20099 (12.56%) Loss: 2.159120 LR: 0.00001964 [07:41:33] Epoch: 1 Batch: 2526/20099 (12.57%) Loss: 2.208865 LR: 0.00001964 [07:41:36] Epoch: 1 Batch: 2527/20099 (12.57%) Loss: 2.455609 LR: 0.00001969 [07:41:39] Epoch: 1 Batch: 2528/20099 (12.58%) Loss: 2.260881 LR: 0.00001969 [07:41:42] Epoch: 1 Batch: 2529/20099 (12.58%) Loss: 2.494386 LR: 0.00001969 [07:41:45] Epoch: 1 Batch: 2530/20099 (12.59%) Loss: 2.384629 LR: 0.00001969 [07:41:48] Epoch: 1 Batch: 2531/20099 (12.59%) Loss: 2.269327 LR: 0.00001969 [07:41:51] Epoch: 1 Batch: 2532/20099 (12.60%) Loss: 2.471129 LR: 0.00001969 [07:41:54] Epoch: 1 Batch: 2533/20099 (12.60%) Loss: 2.383344 LR: 0.00001969 [07:41:58] Epoch: 1 Batch: 2534/20099 (12.61%) Loss: 2.177846 LR: 0.00001975 [07:42:01] Epoch: 1 Batch: 2535/20099 (12.61%) Loss: 2.138136 LR: 0.00001975 [07:42:04] Epoch: 1 Batch: 2536/20099 (12.62%) Loss: 2.294704 LR: 0.00001975 [07:42:07] Epoch: 1 Batch: 2537/20099 (12.62%) Loss: 2.065362 LR: 0.00001975 [07:42:10] Epoch: 1 Batch: 2538/20099 (12.63%) Loss: 1.994961 LR: 0.00001975 [07:42:13] Epoch: 1 Batch: 2539/20099 (12.63%) Loss: 2.059072 LR: 0.00001975 [07:42:16] Epoch: 1 Batch: 2540/20099 (12.64%) Loss: 2.085220 LR: 0.00001975 [07:42:19] Epoch: 1 Batch: 2541/20099 (12.64%) Loss: 2.270836 LR: 0.00001980 [07:42:22] Epoch: 1 Batch: 2542/20099 
(12.65%) Loss: 2.057643 LR: 0.00001980 [07:42:26] Epoch: 1 Batch: 2543/20099 (12.65%) Loss: 2.272408 LR: 0.00001980 [07:42:29] Epoch: 1 Batch: 2544/20099 (12.66%) Loss: 2.227843 LR: 0.00001980 [07:42:32] Epoch: 1 Batch: 2545/20099 (12.66%) Loss: 2.214902 LR: 0.00001980 [07:42:35] Epoch: 1 Batch: 2546/20099 (12.67%) Loss: 2.252584 LR: 0.00001980 [07:42:38] Epoch: 1 Batch: 2547/20099 (12.67%) Loss: 2.219244 LR: 0.00001980 [07:42:41] Epoch: 1 Batch: 2548/20099 (12.68%) Loss: 2.209745 LR: 0.00001985 [07:42:44] Epoch: 1 Batch: 2549/20099 (12.68%) Loss: 1.773001 LR: 0.00001985 [07:42:47] Epoch: 1 Batch: 2550/20099 (12.69%) Loss: 2.134299 LR: 0.00001985 [07:42:50] Epoch: 1 Batch: 2551/20099 (12.69%) Loss: 2.155014 LR: 0.00001985 [07:42:53] Epoch: 1 Batch: 2552/20099 (12.70%) Loss: 2.349919 LR: 0.00001985 [07:42:57] Epoch: 1 Batch: 2553/20099 (12.70%) Loss: 2.046849 LR: 0.00001985 [07:43:00] Epoch: 1 Batch: 2554/20099 (12.71%) Loss: 2.180592 LR: 0.00001985 [07:43:03] Epoch: 1 Batch: 2555/20099 (12.71%) Loss: 2.131764 LR: 0.00001991 [07:43:06] Epoch: 1 Batch: 2556/20099 (12.72%) Loss: 2.106577 LR: 0.00001991 [07:43:09] Epoch: 1 Batch: 2557/20099 (12.72%) Loss: 2.235081 LR: 0.00001991 [07:43:12] Epoch: 1 Batch: 2558/20099 (12.73%) Loss: 2.340556 LR: 0.00001991 [07:43:15] Epoch: 1 Batch: 2559/20099 (12.73%) Loss: 2.236594 LR: 0.00001991 [07:43:18] Epoch: 1 Batch: 2560/20099 (12.74%) Loss: 2.321872 LR: 0.00001991 [07:43:21] Epoch: 1 Batch: 2561/20099 (12.74%) Loss: 2.266024 LR: 0.00001991 [07:43:25] Epoch: 1 Batch: 2562/20099 (12.75%) Loss: 2.563259 LR: 0.00001996 [07:43:28] Epoch: 1 Batch: 2563/20099 (12.75%) Loss: 1.950840 LR: 0.00001996 [07:43:31] Epoch: 1 Batch: 2564/20099 (12.76%) Loss: 2.162748 LR: 0.00001996 [07:43:34] Epoch: 1 Batch: 2565/20099 (12.76%) Loss: 2.317983 LR: 0.00001996 [07:43:37] Epoch: 1 Batch: 2566/20099 (12.77%) Loss: 2.003156 LR: 0.00001996 [07:43:40] Epoch: 1 Batch: 2567/20099 (12.77%) Loss: 2.436775 LR: 0.00001996 [07:43:43] Epoch: 1 Batch: 
2568/20099 (12.78%) Loss: 2.107519 LR: 0.00001996 [07:43:46] Epoch: 1 Batch: 2569/20099 (12.78%) Loss: 2.134382 LR: 0.00002002 [07:43:49] Epoch: 1 Batch: 2570/20099 (12.79%) Loss: 2.186508 LR: 0.00002002 [07:43:52] Epoch: 1 Batch: 2571/20099 (12.79%) Loss: 2.253652 LR: 0.00002002 [07:43:55] Epoch: 1 Batch: 2572/20099 (12.80%) Loss: 2.809451 LR: 0.00002002 [07:43:59] Epoch: 1 Batch: 2573/20099 (12.80%) Loss: 2.073937 LR: 0.00002002 [07:44:02] Epoch: 1 Batch: 2574/20099 (12.81%) Loss: 2.352082 LR: 0.00002002 [07:44:05] Epoch: 1 Batch: 2575/20099 (12.81%) Loss: 2.506079 LR: 0.00002002 [07:44:08] Epoch: 1 Batch: 2576/20099 (12.82%) Loss: 1.922283 LR: 0.00002007 [07:44:11] Epoch: 1 Batch: 2577/20099 (12.82%) Loss: 2.225896 LR: 0.00002007 [07:44:14] Epoch: 1 Batch: 2578/20099 (12.83%) Loss: 2.489269 LR: 0.00002007 [07:44:17] Epoch: 1 Batch: 2579/20099 (12.83%) Loss: 2.252371 LR: 0.00002007 [07:44:20] Epoch: 1 Batch: 2580/20099 (12.84%) Loss: 2.202752 LR: 0.00002007 [07:44:23] Epoch: 1 Batch: 2581/20099 (12.84%) Loss: 2.412083 LR: 0.00002007 [07:44:26] Epoch: 1 Batch: 2582/20099 (12.85%) Loss: 2.132069 LR: 0.00002007 [07:44:29] Epoch: 1 Batch: 2583/20099 (12.85%) Loss: 2.249569 LR: 0.00002013 [07:44:33] Epoch: 1 Batch: 2584/20099 (12.86%) Loss: 2.245947 LR: 0.00002013 [07:44:36] Epoch: 1 Batch: 2585/20099 (12.86%) Loss: 2.046800 LR: 0.00002013 [07:44:39] Epoch: 1 Batch: 2586/20099 (12.87%) Loss: 2.369754 LR: 0.00002013 [07:44:42] Epoch: 1 Batch: 2587/20099 (12.87%) Loss: 2.082038 LR: 0.00002013 [07:44:45] Epoch: 1 Batch: 2588/20099 (12.88%) Loss: 2.102934 LR: 0.00002013 [07:44:48] Epoch: 1 Batch: 2589/20099 (12.88%) Loss: 2.290942 LR: 0.00002013 [07:44:51] Epoch: 1 Batch: 2590/20099 (12.89%) Loss: 2.064359 LR: 0.00002018 [07:44:54] Epoch: 1 Batch: 2591/20099 (12.89%) Loss: 2.291740 LR: 0.00002018 [07:44:57] Epoch: 1 Batch: 2592/20099 (12.90%) Loss: 2.424448 LR: 0.00002018 [07:45:00] Epoch: 1 Batch: 2593/20099 (12.90%) Loss: 2.222500 LR: 0.00002018 [07:45:03] Epoch: 1 
Batch: 2594/20099 (12.91%) Loss: 2.369810 LR: 0.00002018 [07:45:07] Epoch: 1 Batch: 2595/20099 (12.91%) Loss: 2.205030 LR: 0.00002018 [07:45:10] Epoch: 1 Batch: 2596/20099 (12.92%) Loss: 2.168958 LR: 0.00002018 [07:45:13] Epoch: 1 Batch: 2597/20099 (12.92%) Loss: 2.357138 LR: 0.00002024 [07:45:16] Epoch: 1 Batch: 2598/20099 (12.93%) Loss: 2.020350 LR: 0.00002024 [07:45:19] Epoch: 1 Batch: 2599/20099 (12.93%) Loss: 2.445168 LR: 0.00002024 [07:45:25] >> Cleaned up old temp checkpoint: epoch1_step600 [07:45:25] >> Temp checkpoint saved: epoch1_step2600, size: 0.1693 GB [07:45:25] Epoch: 1 Batch: 2600/20099 (12.94%) Loss: 2.208510 LR: 0.00002024 [07:45:28] Epoch: 1 Batch: 2601/20099 (12.94%) Loss: 2.012844 LR: 0.00002024 [07:45:32] Epoch: 1 Batch: 2602/20099 (12.95%) Loss: 2.261114 LR: 0.00002024 [07:45:35] Epoch: 1 Batch: 2603/20099 (12.95%) Loss: 2.419029 LR: 0.00002024 [07:45:38] Epoch: 1 Batch: 2604/20099 (12.96%) Loss: 2.110422 LR: 0.00002029 [07:45:41] Epoch: 1 Batch: 2605/20099 (12.96%) Loss: 2.012567 LR: 0.00002029 [07:45:44] Epoch: 1 Batch: 2606/20099 (12.97%) Loss: 2.501838 LR: 0.00002029 [07:45:47] Epoch: 1 Batch: 2607/20099 (12.97%) Loss: 2.441295 LR: 0.00002029 [07:45:50] Epoch: 1 Batch: 2608/20099 (12.98%) Loss: 2.383824 LR: 0.00002029 [07:45:53] Epoch: 1 Batch: 2609/20099 (12.98%) Loss: 2.580420 LR: 0.00002029 [07:45:56] Epoch: 1 Batch: 2610/20099 (12.99%) Loss: 2.024242 LR: 0.00002029 [07:46:00] Epoch: 1 Batch: 2611/20099 (12.99%) Loss: 2.403764 LR: 0.00002035 [07:46:03] Epoch: 1 Batch: 2612/20099 (13.00%) Loss: 2.495763 LR: 0.00002035 [07:46:06] Epoch: 1 Batch: 2613/20099 (13.00%) Loss: 2.580287 LR: 0.00002035 [07:46:09] Epoch: 1 Batch: 2614/20099 (13.01%) Loss: 2.290182 LR: 0.00002035 [07:46:12] Epoch: 1 Batch: 2615/20099 (13.01%) Loss: 2.170666 LR: 0.00002035 [07:46:15] Epoch: 1 Batch: 2616/20099 (13.02%) Loss: 2.341111 LR: 0.00002035 [07:46:18] Epoch: 1 Batch: 2617/20099 (13.02%) Loss: 2.180949 LR: 0.00002035 [07:46:22] Epoch: 1 Batch: 2618/20099 
(13.03%) Loss: 2.314270 LR: 0.00002040 [07:46:25] Epoch: 1 Batch: 2619/20099 (13.03%) Loss: 2.223565 LR: 0.00002040 [07:46:28] Epoch: 1 Batch: 2620/20099 (13.04%) Loss: 2.106771 LR: 0.00002040 [07:46:31] Epoch: 1 Batch: 2621/20099 (13.04%) Loss: 2.467380 LR: 0.00002040 [07:46:34] Epoch: 1 Batch: 2622/20099 (13.05%) Loss: 2.123027 LR: 0.00002040 [07:46:37] Epoch: 1 Batch: 2623/20099 (13.05%) Loss: 2.067979 LR: 0.00002040 [07:46:40] Epoch: 1 Batch: 2624/20099 (13.06%) Loss: 2.287547 LR: 0.00002040 [07:46:43] Epoch: 1 Batch: 2625/20099 (13.06%) Loss: 2.406898 LR: 0.00002045 [07:46:46] Epoch: 1 Batch: 2626/20099 (13.07%) Loss: 2.072085 LR: 0.00002045 [07:46:49] Epoch: 1 Batch: 2627/20099 (13.07%) Loss: 2.242967 LR: 0.00002045 [07:46:52] Epoch: 1 Batch: 2628/20099 (13.08%) Loss: 2.475592 LR: 0.00002045 [07:46:55] Epoch: 1 Batch: 2629/20099 (13.08%) Loss: 2.160910 LR: 0.00002045 [07:46:59] Epoch: 1 Batch: 2630/20099 (13.09%) Loss: 2.247187 LR: 0.00002045 [07:47:02] Epoch: 1 Batch: 2631/20099 (13.09%) Loss: 2.235039 LR: 0.00002045 [07:47:05] Epoch: 1 Batch: 2632/20099 (13.10%) Loss: 2.220374 LR: 0.00002051 [07:47:08] Epoch: 1 Batch: 2633/20099 (13.10%) Loss: 2.355858 LR: 0.00002051 [07:47:11] Epoch: 1 Batch: 2634/20099 (13.11%) Loss: 2.269606 LR: 0.00002051 [07:47:14] Epoch: 1 Batch: 2635/20099 (13.11%) Loss: 1.968300 LR: 0.00002051 [07:47:17] Epoch: 1 Batch: 2636/20099 (13.12%) Loss: 2.219812 LR: 0.00002051 [07:47:20] Epoch: 1 Batch: 2637/20099 (13.12%) Loss: 2.475409 LR: 0.00002051 [07:47:23] Epoch: 1 Batch: 2638/20099 (13.13%) Loss: 2.160435 LR: 0.00002051 [07:47:26] Epoch: 1 Batch: 2639/20099 (13.13%) Loss: 2.444203 LR: 0.00002056 [07:47:30] Epoch: 1 Batch: 2640/20099 (13.13%) Loss: 2.511827 LR: 0.00002056 [07:47:33] Epoch: 1 Batch: 2641/20099 (13.14%) Loss: 2.304080 LR: 0.00002056 [07:47:36] Epoch: 1 Batch: 2642/20099 (13.14%) Loss: 2.085004 LR: 0.00002056 [07:47:39] Epoch: 1 Batch: 2643/20099 (13.15%) Loss: 2.062993 LR: 0.00002056 [07:47:42] Epoch: 1 Batch: 
2644/20099 (13.15%) Loss: 2.345502 LR: 0.00002056 [07:47:45] Epoch: 1 Batch: 2645/20099 (13.16%) Loss: 2.099865 LR: 0.00002056 [07:47:48] Epoch: 1 Batch: 2646/20099 (13.16%) Loss: 2.371133 LR: 0.00002062 [07:47:51] Epoch: 1 Batch: 2647/20099 (13.17%) Loss: 2.429337 LR: 0.00002062 [07:47:54] Epoch: 1 Batch: 2648/20099 (13.17%) Loss: 2.274230 LR: 0.00002062 [07:47:58] Epoch: 1 Batch: 2649/20099 (13.18%) Loss: 1.978810 LR: 0.00002062 [07:48:01] Epoch: 1 Batch: 2650/20099 (13.18%) Loss: 2.109786 LR: 0.00002062 [07:48:04] Epoch: 1 Batch: 2651/20099 (13.19%) Loss: 2.008283 LR: 0.00002062 [07:48:07] Epoch: 1 Batch: 2652/20099 (13.19%) Loss: 2.265197 LR: 0.00002062 [07:48:10] Epoch: 1 Batch: 2653/20099 (13.20%) Loss: 2.443243 LR: 0.00002067 [07:48:13] Epoch: 1 Batch: 2654/20099 (13.20%) Loss: 2.253790 LR: 0.00002067 [07:48:16] Epoch: 1 Batch: 2655/20099 (13.21%) Loss: 2.528709 LR: 0.00002067 [07:48:19] Epoch: 1 Batch: 2656/20099 (13.21%) Loss: 2.291509 LR: 0.00002067 [07:48:22] Epoch: 1 Batch: 2657/20099 (13.22%) Loss: 2.435505 LR: 0.00002067 [07:48:25] Epoch: 1 Batch: 2658/20099 (13.22%) Loss: 1.871415 LR: 0.00002067 [07:48:28] Epoch: 1 Batch: 2659/20099 (13.23%) Loss: 2.278696 LR: 0.00002067 [07:48:32] Epoch: 1 Batch: 2660/20099 (13.23%) Loss: 2.397355 LR: 0.00002073 [07:48:35] Epoch: 1 Batch: 2661/20099 (13.24%) Loss: 2.375643 LR: 0.00002073 [07:48:38] Epoch: 1 Batch: 2662/20099 (13.24%) Loss: 2.244741 LR: 0.00002073 [07:48:41] Epoch: 1 Batch: 2663/20099 (13.25%) Loss: 2.164840 LR: 0.00002073 [07:48:44] Epoch: 1 Batch: 2664/20099 (13.25%) Loss: 2.340595 LR: 0.00002073 [07:48:47] Epoch: 1 Batch: 2665/20099 (13.26%) Loss: 2.006605 LR: 0.00002073 [07:48:50] Epoch: 1 Batch: 2666/20099 (13.26%) Loss: 2.313323 LR: 0.00002073 [07:48:53] Epoch: 1 Batch: 2667/20099 (13.27%) Loss: 2.654842 LR: 0.00002078 [07:48:56] Epoch: 1 Batch: 2668/20099 (13.27%) Loss: 2.200256 LR: 0.00002078 [07:48:59] Epoch: 1 Batch: 2669/20099 (13.28%) Loss: 2.245870 LR: 0.00002078 [07:49:02] Epoch: 1 
Batch: 2670/20099 (13.28%) Loss: 2.003902 LR: 0.00002078 [07:49:05] Epoch: 1 Batch: 2671/20099 (13.29%) Loss: 2.092023 LR: 0.00002078 [07:49:09] Epoch: 1 Batch: 2672/20099 (13.29%) Loss: 2.572376 LR: 0.00002078 [07:49:12] Epoch: 1 Batch: 2673/20099 (13.30%) Loss: 2.351670 LR: 0.00002078 [07:49:15] Epoch: 1 Batch: 2674/20099 (13.30%) Loss: 2.517550 LR: 0.00002084 [07:49:18] Epoch: 1 Batch: 2675/20099 (13.31%) Loss: 2.202765 LR: 0.00002084 [07:49:21] Epoch: 1 Batch: 2676/20099 (13.31%) Loss: 2.282797 LR: 0.00002084 [07:49:24] Epoch: 1 Batch: 2677/20099 (13.32%) Loss: 1.944969 LR: 0.00002084 [07:49:27] Epoch: 1 Batch: 2678/20099 (13.32%) Loss: 2.231028 LR: 0.00002084 [07:49:30] Epoch: 1 Batch: 2679/20099 (13.33%) Loss: 2.165832 LR: 0.00002084 [07:49:33] Epoch: 1 Batch: 2680/20099 (13.33%) Loss: 2.280382 LR: 0.00002084 [07:49:36] Epoch: 1 Batch: 2681/20099 (13.34%) Loss: 2.464732 LR: 0.00002089 [07:49:39] Epoch: 1 Batch: 2682/20099 (13.34%) Loss: 2.038962 LR: 0.00002089 [07:49:43] Epoch: 1 Batch: 2683/20099 (13.35%) Loss: 2.323652 LR: 0.00002089 [07:49:46] Epoch: 1 Batch: 2684/20099 (13.35%) Loss: 2.176493 LR: 0.00002089 [07:49:49] Epoch: 1 Batch: 2685/20099 (13.36%) Loss: 2.384225 LR: 0.00002089 [07:49:52] Epoch: 1 Batch: 2686/20099 (13.36%) Loss: 2.051665 LR: 0.00002089 [07:49:55] Epoch: 1 Batch: 2687/20099 (13.37%) Loss: 2.213538 LR: 0.00002089 [07:49:58] Epoch: 1 Batch: 2688/20099 (13.37%) Loss: 2.230741 LR: 0.00002095 [07:50:01] Epoch: 1 Batch: 2689/20099 (13.38%) Loss: 2.491953 LR: 0.00002095 [07:50:04] Epoch: 1 Batch: 2690/20099 (13.38%) Loss: 2.449439 LR: 0.00002095 [07:50:07] Epoch: 1 Batch: 2691/20099 (13.39%) Loss: 2.606206 LR: 0.00002095 [07:50:10] Epoch: 1 Batch: 2692/20099 (13.39%) Loss: 2.250755 LR: 0.00002095 [07:50:14] Epoch: 1 Batch: 2693/20099 (13.40%) Loss: 2.297725 LR: 0.00002095 [07:50:17] Epoch: 1 Batch: 2694/20099 (13.40%) Loss: 2.000909 LR: 0.00002095 [07:50:20] Epoch: 1 Batch: 2695/20099 (13.41%) Loss: 2.391862 LR: 0.00002100 [07:50:23] Epoch: 
1 Batch: 2696/20099 (13.41%) Loss: 2.287190 LR: 0.00002100 [07:50:26] Epoch: 1 Batch: 2697/20099 (13.42%) Loss: 1.829051 LR: 0.00002100 [07:50:29] Epoch: 1 Batch: 2698/20099 (13.42%) Loss: 2.148665 LR: 0.00002100 [07:50:32] Epoch: 1 Batch: 2699/20099 (13.43%) Loss: 2.105326 LR: 0.00002100 [07:50:35] Epoch: 1 Batch: 2700/20099 (13.43%) Loss: 2.097973 LR: 0.00002100 [07:50:38] Epoch: 1 Batch: 2701/20099 (13.44%) Loss: 2.359345 LR: 0.00002100 [07:50:42] Epoch: 1 Batch: 2702/20099 (13.44%) Loss: 2.257281 LR: 0.00002105 [07:50:45] Epoch: 1 Batch: 2703/20099 (13.45%) Loss: 2.278080 LR: 0.00002105 [07:50:48] Epoch: 1 Batch: 2704/20099 (13.45%) Loss: 2.133782 LR: 0.00002105 [07:50:51] Epoch: 1 Batch: 2705/20099 (13.46%) Loss: 2.203111 LR: 0.00002105 [07:50:54] Epoch: 1 Batch: 2706/20099 (13.46%) Loss: 2.109122 LR: 0.00002105 [07:50:57] Epoch: 1 Batch: 2707/20099 (13.47%) Loss: 1.915862 LR: 0.00002105 [07:51:00] Epoch: 1 Batch: 2708/20099 (13.47%) Loss: 2.037068 LR: 0.00002105 [07:51:03] Epoch: 1 Batch: 2709/20099 (13.48%) Loss: 2.181608 LR: 0.00002111 [07:51:06] Epoch: 1 Batch: 2710/20099 (13.48%) Loss: 2.143762 LR: 0.00002111 [07:51:09] Epoch: 1 Batch: 2711/20099 (13.49%) Loss: 2.348612 LR: 0.00002111 [07:51:13] Epoch: 1 Batch: 2712/20099 (13.49%) Loss: 2.495041 LR: 0.00002111 [07:51:16] Epoch: 1 Batch: 2713/20099 (13.50%) Loss: 2.307232 LR: 0.00002111 [07:51:19] Epoch: 1 Batch: 2714/20099 (13.50%) Loss: 2.399476 LR: 0.00002111 [07:51:22] Epoch: 1 Batch: 2715/20099 (13.51%) Loss: 2.226197 LR: 0.00002111 [07:51:25] Epoch: 1 Batch: 2716/20099 (13.51%) Loss: 2.291373 LR: 0.00002116 [07:51:28] Epoch: 1 Batch: 2717/20099 (13.52%) Loss: 2.015525 LR: 0.00002116 [07:51:31] Epoch: 1 Batch: 2718/20099 (13.52%) Loss: 1.918024 LR: 0.00002116 [07:51:34] Epoch: 1 Batch: 2719/20099 (13.53%) Loss: 2.219763 LR: 0.00002116 [07:51:37] Epoch: 1 Batch: 2720/20099 (13.53%) Loss: 2.398365 LR: 0.00002116 [07:51:40] Epoch: 1 Batch: 2721/20099 (13.54%) Loss: 2.209027 LR: 0.00002116 [07:51:43] 
Epoch: 1 Batch: 2722/20099 (13.54%) Loss: 2.170711 LR: 0.00002116 [07:51:47] Epoch: 1 Batch: 2723/20099 (13.55%) Loss: 2.405605 LR: 0.00002122 [07:51:50] Epoch: 1 Batch: 2724/20099 (13.55%) Loss: 2.248603 LR: 0.00002122 [07:51:53] Epoch: 1 Batch: 2725/20099 (13.56%) Loss: 2.190042 LR: 0.00002122 [07:51:56] Epoch: 1 Batch: 2726/20099 (13.56%) Loss: 2.159958 LR: 0.00002122 [07:51:59] Epoch: 1 Batch: 2727/20099 (13.57%) Loss: 2.454524 LR: 0.00002122 [07:52:02] Epoch: 1 Batch: 2728/20099 (13.57%) Loss: 1.998534 LR: 0.00002122 [07:52:05] Epoch: 1 Batch: 2729/20099 (13.58%) Loss: 2.232819 LR: 0.00002122 [07:52:08] Epoch: 1 Batch: 2730/20099 (13.58%) Loss: 2.087619 LR: 0.00002127 [07:52:11] Epoch: 1 Batch: 2731/20099 (13.59%) Loss: 2.258963 LR: 0.00002127 [07:52:14] Epoch: 1 Batch: 2732/20099 (13.59%) Loss: 2.311458 LR: 0.00002127 [07:52:18] Epoch: 1 Batch: 2733/20099 (13.60%) Loss: 2.223551 LR: 0.00002127 [07:52:21] Epoch: 1 Batch: 2734/20099 (13.60%) Loss: 2.236923 LR: 0.00002127 [07:52:24] Epoch: 1 Batch: 2735/20099 (13.61%) Loss: 2.401893 LR: 0.00002127 [07:52:27] Epoch: 1 Batch: 2736/20099 (13.61%) Loss: 2.172214 LR: 0.00002127 [07:52:30] Epoch: 1 Batch: 2737/20099 (13.62%) Loss: 2.204510 LR: 0.00002133 [07:52:33] Epoch: 1 Batch: 2738/20099 (13.62%) Loss: 2.373618 LR: 0.00002133 [07:52:36] Epoch: 1 Batch: 2739/20099 (13.63%) Loss: 2.328069 LR: 0.00002133 [07:52:39] Epoch: 1 Batch: 2740/20099 (13.63%) Loss: 1.896794 LR: 0.00002133 [07:52:42] Epoch: 1 Batch: 2741/20099 (13.64%) Loss: 1.869381 LR: 0.00002133 [07:52:45] Epoch: 1 Batch: 2742/20099 (13.64%) Loss: 2.323018 LR: 0.00002133 [07:52:49] Epoch: 1 Batch: 2743/20099 (13.65%) Loss: 2.542007 LR: 0.00002133 [07:52:52] Epoch: 1 Batch: 2744/20099 (13.65%) Loss: 1.870534 LR: 0.00002138 [07:52:55] Epoch: 1 Batch: 2745/20099 (13.66%) Loss: 2.082040 LR: 0.00002138 [07:52:58] Epoch: 1 Batch: 2746/20099 (13.66%) Loss: 2.339970 LR: 0.00002138 [07:53:01] Epoch: 1 Batch: 2747/20099 (13.67%) Loss: 2.134591 LR: 0.00002138 
[07:53:04] Epoch: 1 Batch: 2748/20099 (13.67%) Loss: 1.761796 LR: 0.00002138 [07:53:07] Epoch: 1 Batch: 2749/20099 (13.68%) Loss: 1.914184 LR: 0.00002138 [07:53:10] Epoch: 1 Batch: 2750/20099 (13.68%) Loss: 2.099195 LR: 0.00002138 [07:53:13] Epoch: 1 Batch: 2751/20099 (13.69%) Loss: 2.210208 LR: 0.00002144 [07:53:16] Epoch: 1 Batch: 2752/20099 (13.69%) Loss: 2.047792 LR: 0.00002144 [07:53:20] Epoch: 1 Batch: 2753/20099 (13.70%) Loss: 2.244542 LR: 0.00002144 [07:53:23] Epoch: 1 Batch: 2754/20099 (13.70%) Loss: 2.641780 LR: 0.00002144 [07:53:26] Epoch: 1 Batch: 2755/20099 (13.71%) Loss: 2.192464 LR: 0.00002144 [07:53:29] Epoch: 1 Batch: 2756/20099 (13.71%) Loss: 2.056824 LR: 0.00002144 [07:53:32] Epoch: 1 Batch: 2757/20099 (13.72%) Loss: 2.375796 LR: 0.00002144 [07:53:35] Epoch: 1 Batch: 2758/20099 (13.72%) Loss: 2.305646 LR: 0.00002149 [07:53:38] Epoch: 1 Batch: 2759/20099 (13.73%) Loss: 2.180172 LR: 0.00002149 [07:53:41] Epoch: 1 Batch: 2760/20099 (13.73%) Loss: 1.986772 LR: 0.00002149 [07:53:44] Epoch: 1 Batch: 2761/20099 (13.74%) Loss: 2.284252 LR: 0.00002149 [07:53:47] Epoch: 1 Batch: 2762/20099 (13.74%) Loss: 2.370033 LR: 0.00002149 [07:53:50] Epoch: 1 Batch: 2763/20099 (13.75%) Loss: 2.307525 LR: 0.00002149 [07:53:54] Epoch: 1 Batch: 2764/20099 (13.75%) Loss: 2.395034 LR: 0.00002149 [07:53:57] Epoch: 1 Batch: 2765/20099 (13.76%) Loss: 2.218702 LR: 0.00002155 [07:54:00] Epoch: 1 Batch: 2766/20099 (13.76%) Loss: 2.697784 LR: 0.00002155 [07:54:03] Epoch: 1 Batch: 2767/20099 (13.77%) Loss: 2.060065 LR: 0.00002155 [07:54:06] Epoch: 1 Batch: 2768/20099 (13.77%) Loss: 2.117900 LR: 0.00002155 [07:54:09] Epoch: 1 Batch: 2769/20099 (13.78%) Loss: 2.128484 LR: 0.00002155 [07:54:12] Epoch: 1 Batch: 2770/20099 (13.78%) Loss: 2.082245 LR: 0.00002155 [07:54:15] Epoch: 1 Batch: 2771/20099 (13.79%) Loss: 2.275504 LR: 0.00002155 [07:54:18] Epoch: 1 Batch: 2772/20099 (13.79%) Loss: 2.718981 LR: 0.00002160 [07:54:21] Epoch: 1 Batch: 2773/20099 (13.80%) Loss: 2.143050 LR: 
0.00002160 [07:54:25] Epoch: 1 Batch: 2774/20099 (13.80%) Loss: 2.243861 LR: 0.00002160 [07:54:28] Epoch: 1 Batch: 2775/20099 (13.81%) Loss: 2.412583 LR: 0.00002160 [07:54:31] Epoch: 1 Batch: 2776/20099 (13.81%) Loss: 2.707017 LR: 0.00002160 [07:54:34] Epoch: 1 Batch: 2777/20099 (13.82%) Loss: 2.227385 LR: 0.00002160 [07:54:37] Epoch: 1 Batch: 2778/20099 (13.82%) Loss: 1.718606 LR: 0.00002160 [07:54:40] Epoch: 1 Batch: 2779/20099 (13.83%) Loss: 2.032504 LR: 0.00002165 [07:54:43] Epoch: 1 Batch: 2780/20099 (13.83%) Loss: 1.887750 LR: 0.00002165 [07:54:46] Epoch: 1 Batch: 2781/20099 (13.84%) Loss: 2.128351 LR: 0.00002165 [07:54:49] Epoch: 1 Batch: 2782/20099 (13.84%) Loss: 2.230880 LR: 0.00002165 [07:54:52] Epoch: 1 Batch: 2783/20099 (13.85%) Loss: 2.528112 LR: 0.00002165 [07:54:56] Epoch: 1 Batch: 2784/20099 (13.85%) Loss: 2.040090 LR: 0.00002165 [07:54:59] Epoch: 1 Batch: 2785/20099 (13.86%) Loss: 1.841525 LR: 0.00002165 [07:55:02] Epoch: 1 Batch: 2786/20099 (13.86%) Loss: 2.275285 LR: 0.00002171 [07:55:05] Epoch: 1 Batch: 2787/20099 (13.87%) Loss: 2.262744 LR: 0.00002171 [07:55:08] Epoch: 1 Batch: 2788/20099 (13.87%) Loss: 2.221883 LR: 0.00002171 [07:55:11] Epoch: 1 Batch: 2789/20099 (13.88%) Loss: 2.145683 LR: 0.00002171 [07:55:14] Epoch: 1 Batch: 2790/20099 (13.88%) Loss: 2.409032 LR: 0.00002171 [07:55:17] Epoch: 1 Batch: 2791/20099 (13.89%) Loss: 1.984984 LR: 0.00002171 [07:55:20] Epoch: 1 Batch: 2792/20099 (13.89%) Loss: 2.118602 LR: 0.00002171 [07:55:23] Epoch: 1 Batch: 2793/20099 (13.90%) Loss: 2.057848 LR: 0.00002176 [07:55:27] Epoch: 1 Batch: 2794/20099 (13.90%) Loss: 2.103943 LR: 0.00002176 [07:55:30] Epoch: 1 Batch: 2795/20099 (13.91%) Loss: 2.324187 LR: 0.00002176 [07:55:33] Epoch: 1 Batch: 2796/20099 (13.91%) Loss: 2.271766 LR: 0.00002176 [07:55:36] Epoch: 1 Batch: 2797/20099 (13.92%) Loss: 2.061742 LR: 0.00002176 [07:55:39] Epoch: 1 Batch: 2798/20099 (13.92%) Loss: 2.285283 LR: 0.00002176 [07:55:42] Epoch: 1 Batch: 2799/20099 (13.93%) Loss: 1.941206 
LR: 0.00002176 [07:55:49] >> Cleaned up old temp checkpoint: epoch1_step800 [07:55:49] >> Temp checkpoint saved: epoch1_step2800, size: 0.1693 GB [07:55:49] Epoch: 1 Batch: 2800/20099 (13.93%) Loss: 2.074402 LR: 0.00002182 [07:55:52] Epoch: 1 Batch: 2801/20099 (13.94%) Loss: 2.216730 LR: 0.00002182 [07:55:55] Epoch: 1 Batch: 2802/20099 (13.94%) Loss: 1.978696 LR: 0.00002182 [07:55:58] Epoch: 1 Batch: 2803/20099 (13.95%) Loss: 2.133315 LR: 0.00002182 [07:56:01] Epoch: 1 Batch: 2804/20099 (13.95%) Loss: 2.203534 LR: 0.00002182 [07:56:04] Epoch: 1 Batch: 2805/20099 (13.96%) Loss: 2.201557 LR: 0.00002182 [07:56:07] Epoch: 1 Batch: 2806/20099 (13.96%) Loss: 2.272289 LR: 0.00002182 [07:56:10] Epoch: 1 Batch: 2807/20099 (13.97%) Loss: 1.932803 LR: 0.00002187 [07:56:13] Epoch: 1 Batch: 2808/20099 (13.97%) Loss: 2.064082 LR: 0.00002187 [07:56:16] Epoch: 1 Batch: 2809/20099 (13.98%) Loss: 2.396595 LR: 0.00002187 [07:56:20] Epoch: 1 Batch: 2810/20099 (13.98%) Loss: 2.264002 LR: 0.00002187 [07:56:23] Epoch: 1 Batch: 2811/20099 (13.99%) Loss: 2.237241 LR: 0.00002187 [07:56:26] Epoch: 1 Batch: 2812/20099 (13.99%) Loss: 2.449508 LR: 0.00002187 [07:56:29] Epoch: 1 Batch: 2813/20099 (14.00%) Loss: 2.318754 LR: 0.00002187 [07:56:32] Epoch: 1 Batch: 2814/20099 (14.00%) Loss: 2.151048 LR: 0.00002193 [07:56:35] Epoch: 1 Batch: 2815/20099 (14.01%) Loss: 2.491852 LR: 0.00002193 [07:56:38] Epoch: 1 Batch: 2816/20099 (14.01%) Loss: 2.441684 LR: 0.00002193 [07:56:41] Epoch: 1 Batch: 2817/20099 (14.02%) Loss: 2.199344 LR: 0.00002193 [07:56:45] Epoch: 1 Batch: 2818/20099 (14.02%) Loss: 2.334011 LR: 0.00002193 [07:56:48] Epoch: 1 Batch: 2819/20099 (14.03%) Loss: 2.330544 LR: 0.00002193 [07:56:51] Epoch: 1 Batch: 2820/20099 (14.03%) Loss: 2.371519 LR: 0.00002193 [07:56:54] Epoch: 1 Batch: 2821/20099 (14.04%) Loss: 2.314243 LR: 0.00002198 [07:56:57] Epoch: 1 Batch: 2822/20099 (14.04%) Loss: 1.991649 LR: 0.00002198 [07:57:00] Epoch: 1 Batch: 2823/20099 (14.05%) Loss: 2.590214 LR: 0.00002198 
[07:57:03] Epoch: 1 Batch: 2824/20099 (14.05%) Loss: 2.155244 LR: 0.00002198 [07:57:06] Epoch: 1 Batch: 2825/20099 (14.06%) Loss: 2.428171 LR: 0.00002198 [07:57:09] Epoch: 1 Batch: 2826/20099 (14.06%) Loss: 2.234841 LR: 0.00002198 [07:57:12] Epoch: 1 Batch: 2827/20099 (14.07%) Loss: 2.186695 LR: 0.00002198 [07:57:15] Epoch: 1 Batch: 2828/20099 (14.07%) Loss: 2.221848 LR: 0.00002204 [07:57:19] Epoch: 1 Batch: 2829/20099 (14.08%) Loss: 2.134188 LR: 0.00002204 [07:57:22] Epoch: 1 Batch: 2830/20099 (14.08%) Loss: 2.363022 LR: 0.00002204 [07:57:25] Epoch: 1 Batch: 2831/20099 (14.09%) Loss: 2.335325 LR: 0.00002204 [07:57:28] Epoch: 1 Batch: 2832/20099 (14.09%) Loss: 2.131914 LR: 0.00002204 [07:57:31] Epoch: 1 Batch: 2833/20099 (14.10%) Loss: 2.116859 LR: 0.00002204 [07:57:34] Epoch: 1 Batch: 2834/20099 (14.10%) Loss: 2.146293 LR: 0.00002204 [07:57:37] Epoch: 1 Batch: 2835/20099 (14.11%) Loss: 2.133831 LR: 0.00002209 [07:57:40] Epoch: 1 Batch: 2836/20099 (14.11%) Loss: 2.429434 LR: 0.00002209 [07:57:43] Epoch: 1 Batch: 2837/20099 (14.12%) Loss: 2.120774 LR: 0.00002209 [07:57:46] Epoch: 1 Batch: 2838/20099 (14.12%) Loss: 2.289357 LR: 0.00002209 [07:57:49] Epoch: 1 Batch: 2839/20099 (14.13%) Loss: 1.966570 LR: 0.00002209 [07:57:52] Epoch: 1 Batch: 2840/20099 (14.13%) Loss: 2.048846 LR: 0.00002209 [07:57:56] Epoch: 1 Batch: 2841/20099 (14.14%) Loss: 2.211694 LR: 0.00002209 [07:57:59] Epoch: 1 Batch: 2842/20099 (14.14%) Loss: 2.254627 LR: 0.00002215 [07:58:02] Epoch: 1 Batch: 2843/20099 (14.14%) Loss: 1.959458 LR: 0.00002215 [07:58:05] Epoch: 1 Batch: 2844/20099 (14.15%) Loss: 2.438670 LR: 0.00002215 [07:58:08] Epoch: 1 Batch: 2845/20099 (14.15%) Loss: 2.239449 LR: 0.00002215 [07:58:11] Epoch: 1 Batch: 2846/20099 (14.16%) Loss: 2.154443 LR: 0.00002215 [07:58:14] Epoch: 1 Batch: 2847/20099 (14.16%) Loss: 2.198065 LR: 0.00002215 [07:58:17] Epoch: 1 Batch: 2848/20099 (14.17%) Loss: 2.110764 LR: 0.00002215 [07:58:20] Epoch: 1 Batch: 2849/20099 (14.17%) Loss: 2.203395 LR: 
0.00002220 [07:58:23] Epoch: 1 Batch: 2850/20099 (14.18%) Loss: 2.414010 LR: 0.00002220 [07:58:26] Epoch: 1 Batch: 2851/20099 (14.18%) Loss: 2.278963 LR: 0.00002220 [07:58:30] Epoch: 1 Batch: 2852/20099 (14.19%) Loss: 2.276326 LR: 0.00002220 [07:58:33] Epoch: 1 Batch: 2853/20099 (14.19%) Loss: 2.265434 LR: 0.00002220 [07:58:36] Epoch: 1 Batch: 2854/20099 (14.20%) Loss: 2.591926 LR: 0.00002220 [07:58:39] Epoch: 1 Batch: 2855/20099 (14.20%) Loss: 2.349315 LR: 0.00002220 [07:58:42] Epoch: 1 Batch: 2856/20099 (14.21%) Loss: 2.321206 LR: 0.00002225 [07:58:45] Epoch: 1 Batch: 2857/20099 (14.21%) Loss: 2.135187 LR: 0.00002225 [07:58:48] Epoch: 1 Batch: 2858/20099 (14.22%) Loss: 2.291490 LR: 0.00002225 [07:58:51] Epoch: 1 Batch: 2859/20099 (14.22%) Loss: 2.337298 LR: 0.00002225 [07:58:54] Epoch: 1 Batch: 2860/20099 (14.23%) Loss: 2.165537 LR: 0.00002225 [07:58:58] Epoch: 1 Batch: 2861/20099 (14.23%) Loss: 2.049564 LR: 0.00002225 [07:59:01] Epoch: 1 Batch: 2862/20099 (14.24%) Loss: 2.104748 LR: 0.00002225 [07:59:04] Epoch: 1 Batch: 2863/20099 (14.24%) Loss: 2.023598 LR: 0.00002231 [07:59:07] Epoch: 1 Batch: 2864/20099 (14.25%) Loss: 2.191615 LR: 0.00002231 [07:59:10] Epoch: 1 Batch: 2865/20099 (14.25%) Loss: 2.136958 LR: 0.00002231 [07:59:13] Epoch: 1 Batch: 2866/20099 (14.26%) Loss: 2.416388 LR: 0.00002231 [07:59:16] Epoch: 1 Batch: 2867/20099 (14.26%) Loss: 2.275473 LR: 0.00002231 [07:59:19] Epoch: 1 Batch: 2868/20099 (14.27%) Loss: 2.362564 LR: 0.00002231 [07:59:22] Epoch: 1 Batch: 2869/20099 (14.27%) Loss: 2.022331 LR: 0.00002231 [07:59:25] Epoch: 1 Batch: 2870/20099 (14.28%) Loss: 2.331536 LR: 0.00002236 [07:59:29] Epoch: 1 Batch: 2871/20099 (14.28%) Loss: 2.291268 LR: 0.00002236 [07:59:32] Epoch: 1 Batch: 2872/20099 (14.29%) Loss: 2.214578 LR: 0.00002236 [07:59:35] Epoch: 1 Batch: 2873/20099 (14.29%) Loss: 2.098819 LR: 0.00002236 [07:59:38] Epoch: 1 Batch: 2874/20099 (14.30%) Loss: 2.101243 LR: 0.00002236 [07:59:41] Epoch: 1 Batch: 2875/20099 (14.30%) Loss: 2.150649 
LR: 0.00002236 [07:59:44] Epoch: 1 Batch: 2876/20099 (14.31%) Loss: 2.081349 LR: 0.00002236 [07:59:47] Epoch: 1 Batch: 2877/20099 (14.31%) Loss: 2.144913 LR: 0.00002242 [07:59:50] Epoch: 1 Batch: 2878/20099 (14.32%) Loss: 2.161219 LR: 0.00002242 [07:59:53] Epoch: 1 Batch: 2879/20099 (14.32%) Loss: 2.172865 LR: 0.00002242 [07:59:57] Epoch: 1 Batch: 2880/20099 (14.33%) Loss: 2.208618 LR: 0.00002242 [08:00:00] Epoch: 1 Batch: 2881/20099 (14.33%) Loss: 2.449243 LR: 0.00002242 [08:00:03] Epoch: 1 Batch: 2882/20099 (14.34%) Loss: 2.434333 LR: 0.00002242 [08:00:06] Epoch: 1 Batch: 2883/20099 (14.34%) Loss: 2.009742 LR: 0.00002242 [08:00:09] Epoch: 1 Batch: 2884/20099 (14.35%) Loss: 2.188139 LR: 0.00002247 [08:00:12] Epoch: 1 Batch: 2885/20099 (14.35%) Loss: 2.161915 LR: 0.00002247 [08:00:15] Epoch: 1 Batch: 2886/20099 (14.36%) Loss: 2.368684 LR: 0.00002247 [08:00:18] Epoch: 1 Batch: 2887/20099 (14.36%) Loss: 2.430614 LR: 0.00002247 [08:00:21] Epoch: 1 Batch: 2888/20099 (14.37%) Loss: 2.164064 LR: 0.00002247 [08:00:24] Epoch: 1 Batch: 2889/20099 (14.37%) Loss: 1.899476 LR: 0.00002247 [08:00:28] Epoch: 1 Batch: 2890/20099 (14.38%) Loss: 2.319108 LR: 0.00002247 [08:00:31] Epoch: 1 Batch: 2891/20099 (14.38%) Loss: 2.166673 LR: 0.00002253 [08:00:34] Epoch: 1 Batch: 2892/20099 (14.39%) Loss: 2.624513 LR: 0.00002253 [08:00:37] Epoch: 1 Batch: 2893/20099 (14.39%) Loss: 1.969318 LR: 0.00002253 [08:00:40] Epoch: 1 Batch: 2894/20099 (14.40%) Loss: 2.149759 LR: 0.00002253 [08:00:43] Epoch: 1 Batch: 2895/20099 (14.40%) Loss: 2.501909 LR: 0.00002253 [08:00:46] Epoch: 1 Batch: 2896/20099 (14.41%) Loss: 1.987996 LR: 0.00002253 [08:00:49] Epoch: 1 Batch: 2897/20099 (14.41%) Loss: 2.171886 LR: 0.00002253 [08:00:52] Epoch: 1 Batch: 2898/20099 (14.42%) Loss: 2.277207 LR: 0.00002258 [08:00:55] Epoch: 1 Batch: 2899/20099 (14.42%) Loss: 2.072195 LR: 0.00002258 [08:00:59] Epoch: 1 Batch: 2900/20099 (14.43%) Loss: 2.324095 LR: 0.00002258 [08:01:02] Epoch: 1 Batch: 2901/20099 (14.43%) Loss: 
2.525599 LR: 0.00002258 [08:01:05] Epoch: 1 Batch: 2902/20099 (14.44%) Loss: 2.392134 LR: 0.00002258 [08:01:08] Epoch: 1 Batch: 2903/20099 (14.44%) Loss: 2.368020 LR: 0.00002258 [08:01:11] Epoch: 1 Batch: 2904/20099 (14.45%) Loss: 2.184041 LR: 0.00002258 [08:01:14] Epoch: 1 Batch: 2905/20099 (14.45%) Loss: 2.019115 LR: 0.00002264 [08:01:17] Epoch: 1 Batch: 2906/20099 (14.46%) Loss: 2.530500 LR: 0.00002264 [08:01:20] Epoch: 1 Batch: 2907/20099 (14.46%) Loss: 2.055346 LR: 0.00002264 [08:01:23] Epoch: 1 Batch: 2908/20099 (14.47%) Loss: 2.305311 LR: 0.00002264 [08:01:26] Epoch: 1 Batch: 2909/20099 (14.47%) Loss: 2.222674 LR: 0.00002264 [08:01:29] Epoch: 1 Batch: 2910/20099 (14.48%) Loss: 2.381155 LR: 0.00002264 [08:01:33] Epoch: 1 Batch: 2911/20099 (14.48%) Loss: 2.002766 LR: 0.00002264 [08:01:36] Epoch: 1 Batch: 2912/20099 (14.49%) Loss: 2.297015 LR: 0.00002269 [08:01:39] Epoch: 1 Batch: 2913/20099 (14.49%) Loss: 2.391950 LR: 0.00002269 [08:01:42] Epoch: 1 Batch: 2914/20099 (14.50%) Loss: 2.234587 LR: 0.00002269 [08:01:45] Epoch: 1 Batch: 2915/20099 (14.50%) Loss: 2.193214 LR: 0.00002269 [08:01:48] Epoch: 1 Batch: 2916/20099 (14.51%) Loss: 2.243914 LR: 0.00002269 [08:01:51] Epoch: 1 Batch: 2917/20099 (14.51%) Loss: 2.007367 LR: 0.00002269 [08:01:54] Epoch: 1 Batch: 2918/20099 (14.52%) Loss: 2.456401 LR: 0.00002269 [08:01:57] Epoch: 1 Batch: 2919/20099 (14.52%) Loss: 2.158502 LR: 0.00002275 [08:02:00] Epoch: 1 Batch: 2920/20099 (14.53%) Loss: 2.009751 LR: 0.00002275 [08:02:04] Epoch: 1 Batch: 2921/20099 (14.53%) Loss: 2.180648 LR: 0.00002275 [08:02:07] Epoch: 1 Batch: 2922/20099 (14.54%) Loss: 2.284039 LR: 0.00002275 [08:02:10] Epoch: 1 Batch: 2923/20099 (14.54%) Loss: 2.372457 LR: 0.00002275 [08:02:13] Epoch: 1 Batch: 2924/20099 (14.55%) Loss: 2.034812 LR: 0.00002275 [08:02:16] Epoch: 1 Batch: 2925/20099 (14.55%) Loss: 2.400895 LR: 0.00002275 [08:02:19] Epoch: 1 Batch: 2926/20099 (14.56%) Loss: 2.173874 LR: 0.00002280 [08:02:22] Epoch: 1 Batch: 2927/20099 (14.56%) 
Loss: 2.110225 LR: 0.00002280 [08:02:25] Epoch: 1 Batch: 2928/20099 (14.57%) Loss: 2.331111 LR: 0.00002280 [08:02:28] Epoch: 1 Batch: 2929/20099 (14.57%) Loss: 2.254573 LR: 0.00002280 [08:02:31] Epoch: 1 Batch: 2930/20099 (14.58%) Loss: 2.267711 LR: 0.00002280 [08:02:35] Epoch: 1 Batch: 2931/20099 (14.58%) Loss: 2.115487 LR: 0.00002280 [08:02:38] Epoch: 1 Batch: 2932/20099 (14.59%) Loss: 2.129659 LR: 0.00002280 [08:02:41] Epoch: 1 Batch: 2933/20099 (14.59%) Loss: 2.432295 LR: 0.00002285 [08:02:44] Epoch: 1 Batch: 2934/20099 (14.60%) Loss: 2.134562 LR: 0.00002285 [08:02:47] Epoch: 1 Batch: 2935/20099 (14.60%) Loss: 2.170224 LR: 0.00002285 [08:02:50] Epoch: 1 Batch: 2936/20099 (14.61%) Loss: 2.265427 LR: 0.00002285 [08:02:53] Epoch: 1 Batch: 2937/20099 (14.61%) Loss: 2.466420 LR: 0.00002285 [08:02:56] Epoch: 1 Batch: 2938/20099 (14.62%) Loss: 2.210351 LR: 0.00002285 [08:02:59] Epoch: 1 Batch: 2939/20099 (14.62%) Loss: 2.013543 LR: 0.00002285 [08:03:02] Epoch: 1 Batch: 2940/20099 (14.63%) Loss: 2.379362 LR: 0.00002291 [08:03:06] Epoch: 1 Batch: 2941/20099 (14.63%) Loss: 2.364670 LR: 0.00002291 [08:03:09] Epoch: 1 Batch: 2942/20099 (14.64%) Loss: 2.077369 LR: 0.00002291 [08:03:12] Epoch: 1 Batch: 2943/20099 (14.64%) Loss: 2.151177 LR: 0.00002291 [08:03:15] Epoch: 1 Batch: 2944/20099 (14.65%) Loss: 2.240447 LR: 0.00002291 [08:03:18] Epoch: 1 Batch: 2945/20099 (14.65%) Loss: 2.334965 LR: 0.00002291 [08:03:21] Epoch: 1 Batch: 2946/20099 (14.66%) Loss: 2.241404 LR: 0.00002291 [08:03:24] Epoch: 1 Batch: 2947/20099 (14.66%) Loss: 2.446580 LR: 0.00002296 [08:03:27] Epoch: 1 Batch: 2948/20099 (14.67%) Loss: 1.974125 LR: 0.00002296 [08:03:30] Epoch: 1 Batch: 2949/20099 (14.67%) Loss: 2.128999 LR: 0.00002296 [08:03:33] Epoch: 1 Batch: 2950/20099 (14.68%) Loss: 2.310823 LR: 0.00002296 [08:03:37] Epoch: 1 Batch: 2951/20099 (14.68%) Loss: 2.171002 LR: 0.00002296 [08:03:40] Epoch: 1 Batch: 2952/20099 (14.69%) Loss: 2.358540 LR: 0.00002296 [08:03:43] Epoch: 1 Batch: 2953/20099 
(14.69%) Loss: 2.278703 LR: 0.00002296 [08:03:46] Epoch: 1 Batch: 2954/20099 (14.70%) Loss: 2.152119 LR: 0.00002302 [08:03:49] Epoch: 1 Batch: 2955/20099 (14.70%) Loss: 1.958947 LR: 0.00002302 [08:03:52] Epoch: 1 Batch: 2956/20099 (14.71%) Loss: 2.007500 LR: 0.00002302 [08:03:55] Epoch: 1 Batch: 2957/20099 (14.71%) Loss: 2.125084 LR: 0.00002302 [08:03:58] Epoch: 1 Batch: 2958/20099 (14.72%) Loss: 2.329364 LR: 0.00002302 [08:04:01] Epoch: 1 Batch: 2959/20099 (14.72%) Loss: 2.509832 LR: 0.00002302 [08:04:04] Epoch: 1 Batch: 2960/20099 (14.73%) Loss: 2.273805 LR: 0.00002302 [08:04:07] Epoch: 1 Batch: 2961/20099 (14.73%) Loss: 2.199972 LR: 0.00002307 [08:04:11] Epoch: 1 Batch: 2962/20099 (14.74%) Loss: 2.454372 LR: 0.00002307 [08:04:14] Epoch: 1 Batch: 2963/20099 (14.74%) Loss: 2.294977 LR: 0.00002307 [08:04:17] Epoch: 1 Batch: 2964/20099 (14.75%) Loss: 2.127792 LR: 0.00002307 [08:04:20] Epoch: 1 Batch: 2965/20099 (14.75%) Loss: 2.076711 LR: 0.00002307 [08:04:23] Epoch: 1 Batch: 2966/20099 (14.76%) Loss: 2.312637 LR: 0.00002307 [08:04:26] Epoch: 1 Batch: 2967/20099 (14.76%) Loss: 2.162895 LR: 0.00002307 [08:04:29] Epoch: 1 Batch: 2968/20099 (14.77%) Loss: 2.339951 LR: 0.00002313 [08:04:32] Epoch: 1 Batch: 2969/20099 (14.77%) Loss: 2.236085 LR: 0.00002313 [08:04:35] Epoch: 1 Batch: 2970/20099 (14.78%) Loss: 2.096571 LR: 0.00002313 [08:04:38] Epoch: 1 Batch: 2971/20099 (14.78%) Loss: 2.269050 LR: 0.00002313 [08:04:42] Epoch: 1 Batch: 2972/20099 (14.79%) Loss: 2.295001 LR: 0.00002313 [08:04:45] Epoch: 1 Batch: 2973/20099 (14.79%) Loss: 2.220366 LR: 0.00002313 [08:04:48] Epoch: 1 Batch: 2974/20099 (14.80%) Loss: 2.489001 LR: 0.00002313 [08:04:51] Epoch: 1 Batch: 2975/20099 (14.80%) Loss: 2.146025 LR: 0.00002318 [08:04:54] Epoch: 1 Batch: 2976/20099 (14.81%) Loss: 2.411311 LR: 0.00002318 [08:04:57] Epoch: 1 Batch: 2977/20099 (14.81%) Loss: 2.590576 LR: 0.00002318 [08:05:00] Epoch: 1 Batch: 2978/20099 (14.82%) Loss: 2.181749 LR: 0.00002318 [08:05:03] Epoch: 1 Batch: 
2979/20099 (14.82%) Loss: 2.212873 LR: 0.00002318 [08:05:06] Epoch: 1 Batch: 2980/20099 (14.83%) Loss: 2.031998 LR: 0.00002318 [08:05:09] Epoch: 1 Batch: 2981/20099 (14.83%) Loss: 2.415733 LR: 0.00002318 [08:05:12] Epoch: 1 Batch: 2982/20099 (14.84%) Loss: 2.192715 LR: 0.00002324 [08:05:16] Epoch: 1 Batch: 2983/20099 (14.84%) Loss: 2.078628 LR: 0.00002324 [08:05:19] Epoch: 1 Batch: 2984/20099 (14.85%) Loss: 2.042099 LR: 0.00002324 [08:05:22] Epoch: 1 Batch: 2985/20099 (14.85%) Loss: 2.371159 LR: 0.00002324 [08:05:25] Epoch: 1 Batch: 2986/20099 (14.86%) Loss: 2.199560 LR: 0.00002324 [08:05:28] Epoch: 1 Batch: 2987/20099 (14.86%) Loss: 2.232897 LR: 0.00002324 [08:05:31] Epoch: 1 Batch: 2988/20099 (14.87%) Loss: 2.544611 LR: 0.00002324 [08:05:34] Epoch: 1 Batch: 2989/20099 (14.87%) Loss: 2.012506 LR: 0.00002329 [08:05:37] Epoch: 1 Batch: 2990/20099 (14.88%) Loss: 2.353424 LR: 0.00002329 [08:05:40] Epoch: 1 Batch: 2991/20099 (14.88%) Loss: 1.983872 LR: 0.00002329 [08:05:43] Epoch: 1 Batch: 2992/20099 (14.89%) Loss: 2.500118 LR: 0.00002329 [08:05:46] Epoch: 1 Batch: 2993/20099 (14.89%) Loss: 2.224709 LR: 0.00002329 [08:05:50] Epoch: 1 Batch: 2994/20099 (14.90%) Loss: 2.136726 LR: 0.00002329 [08:05:53] Epoch: 1 Batch: 2995/20099 (14.90%) Loss: 2.312960 LR: 0.00002329 [08:05:56] Epoch: 1 Batch: 2996/20099 (14.91%) Loss: 2.221241 LR: 0.00002335 [08:05:59] Epoch: 1 Batch: 2997/20099 (14.91%) Loss: 2.242181 LR: 0.00002335 [08:06:02] Epoch: 1 Batch: 2998/20099 (14.92%) Loss: 2.293796 LR: 0.00002335 [08:06:05] Epoch: 1 Batch: 2999/20099 (14.92%) Loss: 2.156148 LR: 0.00002335 [08:06:08] >> Evaluating batch 0 [08:06:09] >> Evaluating batch 1 [08:06:11] >> Evaluating batch 2 [08:06:12] >> Evaluating batch 3 [08:06:13] >> Evaluating batch 4 [08:06:15] >> Evaluating batch 5 [08:06:16] >> Evaluating batch 6 [08:06:17] >> Evaluating batch 7 [08:06:18] >> Evaluating batch 8 [08:06:20] >> Evaluating batch 9 [08:06:21] >> Evaluating batch 10 [08:06:22] >> Evaluating batch 11 [08:06:23] 
>> Evaluating batch 12 [08:06:24] >> Evaluating batch 13 [08:06:26] >> Evaluating batch 14 [08:06:27] >> Evaluating batch 15 [08:06:28] >> Evaluating batch 16 [08:06:29] Epoch: 1 Step: 3000/20099 Evaluation: [08:06:29] Avg Loss Since Last Eval: 2.2333 Val Loss: 2.2879 Validation loss delta: -0.0258 Perplexity: 9.8541 LR: 0.00002335 [08:06:32] >> Cleaned up old temp checkpoint: epoch1_step1000 [08:06:32] >> Temp checkpoint saved: epoch1_step3000, size: 0.1693 GB [08:06:36] >> Checkpoint saved: epoch1_step3000, size: 0.1693 GB [08:06:36] Epoch: 1 Batch: 3000/20099 (14.93%) Loss: 2.199824 LR: 0.00002335 [08:06:39] Epoch: 1 Batch: 3001/20099 (14.93%) Loss: 2.307619 LR: 0.00002335 [08:06:42] Epoch: 1 Batch: 3002/20099 (14.94%) Loss: 2.225192 LR: 0.00002335 [08:06:45] Epoch: 1 Batch: 3003/20099 (14.94%) Loss: 2.267109 LR: 0.00002340 [08:06:48] Epoch: 1 Batch: 3004/20099 (14.95%) Loss: 2.125650 LR: 0.00002340 [08:06:51] Epoch: 1 Batch: 3005/20099 (14.95%) Loss: 2.208072 LR: 0.00002340 [08:06:54] Epoch: 1 Batch: 3006/20099 (14.96%) Loss: 2.304340 LR: 0.00002340 [08:06:57] Epoch: 1 Batch: 3007/20099 (14.96%) Loss: 1.962974 LR: 0.00002340 [08:07:01] Epoch: 1 Batch: 3008/20099 (14.97%) Loss: 2.343878 LR: 0.00002340 [08:07:04] Epoch: 1 Batch: 3009/20099 (14.97%) Loss: 2.250842 LR: 0.00002340 [08:07:07] Epoch: 1 Batch: 3010/20099 (14.98%) Loss: 2.288443 LR: 0.00002345 [08:07:10] Epoch: 1 Batch: 3011/20099 (14.98%) Loss: 2.469611 LR: 0.00002345 [08:07:13] Epoch: 1 Batch: 3012/20099 (14.99%) Loss: 1.936887 LR: 0.00002345 [08:07:16] Epoch: 1 Batch: 3013/20099 (14.99%) Loss: 2.221314 LR: 0.00002345 [08:07:20] Epoch: 1 Batch: 3014/20099 (15.00%) Loss: 2.451274 LR: 0.00002345 [08:07:23] Epoch: 1 Batch: 3015/20099 (15.00%) Loss: 2.328000 LR: 0.00002345 [08:07:26] Epoch: 1 Batch: 3016/20099 (15.01%) Loss: 2.393557 LR: 0.00002345 [08:07:29] Epoch: 1 Batch: 3017/20099 (15.01%) Loss: 2.428119 LR: 0.00002351 [08:07:32] Epoch: 1 Batch: 3018/20099 (15.02%) Loss: 2.113833 LR: 0.00002351 
[08:07:35] Epoch: 1 Batch: 3019/20099 (15.02%) Loss: 2.153996 LR: 0.00002351 [08:07:38] Epoch: 1 Batch: 3020/20099 (15.03%) Loss: 2.380019 LR: 0.00002351 [08:07:41] Epoch: 1 Batch: 3021/20099 (15.03%) Loss: 2.205778 LR: 0.00002351 [08:07:44] Epoch: 1 Batch: 3022/20099 (15.04%) Loss: 2.283350 LR: 0.00002351 [08:07:47] Epoch: 1 Batch: 3023/20099 (15.04%) Loss: 2.049090 LR: 0.00002351 [08:07:50] Epoch: 1 Batch: 3024/20099 (15.05%) Loss: 2.165942 LR: 0.00002356 [08:07:54] Epoch: 1 Batch: 3025/20099 (15.05%) Loss: 2.226965 LR: 0.00002356 [08:07:57] Epoch: 1 Batch: 3026/20099 (15.06%) Loss: 2.109134 LR: 0.00002356 [08:08:00] Epoch: 1 Batch: 3027/20099 (15.06%) Loss: 2.382126 LR: 0.00002356 [08:08:03] Epoch: 1 Batch: 3028/20099 (15.07%) Loss: 2.447420 LR: 0.00002356 [08:08:06] Epoch: 1 Batch: 3029/20099 (15.07%) Loss: 2.300135 LR: 0.00002356 [08:08:09] Epoch: 1 Batch: 3030/20099 (15.08%) Loss: 2.153233 LR: 0.00002356 [08:08:12] Epoch: 1 Batch: 3031/20099 (15.08%) Loss: 2.217435 LR: 0.00002362 [08:08:15] Epoch: 1 Batch: 3032/20099 (15.09%) Loss: 2.362185 LR: 0.00002362 [08:08:18] Epoch: 1 Batch: 3033/20099 (15.09%) Loss: 2.509098 LR: 0.00002362 [08:08:21] Epoch: 1 Batch: 3034/20099 (15.10%) Loss: 2.390411 LR: 0.00002362 [08:08:24] Epoch: 1 Batch: 3035/20099 (15.10%) Loss: 2.469917 LR: 0.00002362 [08:08:27] Epoch: 1 Batch: 3036/20099 (15.11%) Loss: 2.149001 LR: 0.00002362 [08:08:31] Epoch: 1 Batch: 3037/20099 (15.11%) Loss: 2.111990 LR: 0.00002362 [08:08:34] Epoch: 1 Batch: 3038/20099 (15.12%) Loss: 2.271682 LR: 0.00002367 [08:08:37] Epoch: 1 Batch: 3039/20099 (15.12%) Loss: 2.540505 LR: 0.00002367 [08:08:40] Epoch: 1 Batch: 3040/20099 (15.13%) Loss: 2.242901 LR: 0.00002367 [08:08:43] Epoch: 1 Batch: 3041/20099 (15.13%) Loss: 2.068046 LR: 0.00002367 [08:08:46] Epoch: 1 Batch: 3042/20099 (15.14%) Loss: 1.705473 LR: 0.00002367 [08:08:49] Epoch: 1 Batch: 3043/20099 (15.14%) Loss: 2.042801 LR: 0.00002367 [08:08:52] Epoch: 1 Batch: 3044/20099 (15.15%) Loss: 2.387302 LR: 
0.00002367 [08:08:55] Epoch: 1 Batch: 3045/20099 (15.15%) Loss: 2.112935 LR: 0.00002373 [08:08:59] Epoch: 1 Batch: 3046/20099 (15.15%) Loss: 2.307239 LR: 0.00002373 [08:09:02] Epoch: 1 Batch: 3047/20099 (15.16%) Loss: 2.386851 LR: 0.00002373 [08:09:05] Epoch: 1 Batch: 3048/20099 (15.16%) Loss: 2.135274 LR: 0.00002373 [08:09:08] Epoch: 1 Batch: 3049/20099 (15.17%) Loss: 2.035246 LR: 0.00002373 [08:09:11] Epoch: 1 Batch: 3050/20099 (15.17%) Loss: 2.354911 LR: 0.00002373 [08:09:14] Epoch: 1 Batch: 3051/20099 (15.18%) Loss: 2.044442 LR: 0.00002373 [08:09:17] Epoch: 1 Batch: 3052/20099 (15.18%) Loss: 2.397319 LR: 0.00002378 [08:09:20] Epoch: 1 Batch: 3053/20099 (15.19%) Loss: 2.390864 LR: 0.00002378 [08:09:23] Epoch: 1 Batch: 3054/20099 (15.19%) Loss: 2.260608 LR: 0.00002378 [08:09:26] Epoch: 1 Batch: 3055/20099 (15.20%) Loss: 1.911923 LR: 0.00002378 [08:09:29] Epoch: 1 Batch: 3056/20099 (15.20%) Loss: 2.285467 LR: 0.00002378 [08:09:33] Epoch: 1 Batch: 3057/20099 (15.21%) Loss: 1.924426 LR: 0.00002378 [08:09:36] Epoch: 1 Batch: 3058/20099 (15.21%) Loss: 2.149227 LR: 0.00002378 [08:09:39] Epoch: 1 Batch: 3059/20099 (15.22%) Loss: 2.127029 LR: 0.00002384 [08:09:42] Epoch: 1 Batch: 3060/20099 (15.22%) Loss: 2.166077 LR: 0.00002384 [08:09:45] Epoch: 1 Batch: 3061/20099 (15.23%) Loss: 2.244812 LR: 0.00002384 [08:09:48] Epoch: 1 Batch: 3062/20099 (15.23%) Loss: 1.982675 LR: 0.00002384 [08:09:51] Epoch: 1 Batch: 3063/20099 (15.24%) Loss: 2.080415 LR: 0.00002384 [08:09:54] Epoch: 1 Batch: 3064/20099 (15.24%) Loss: 2.552805 LR: 0.00002384 [08:09:57] Epoch: 1 Batch: 3065/20099 (15.25%) Loss: 2.230253 LR: 0.00002384 [08:10:00] Epoch: 1 Batch: 3066/20099 (15.25%) Loss: 2.175795 LR: 0.00002389 [08:10:03] Epoch: 1 Batch: 3067/20099 (15.26%) Loss: 2.346294 LR: 0.00002389 [08:10:06] Epoch: 1 Batch: 3068/20099 (15.26%) Loss: 2.244041 LR: 0.00002389 [08:10:10] Epoch: 1 Batch: 3069/20099 (15.27%) Loss: 2.349112 LR: 0.00002389 [08:10:13] Epoch: 1 Batch: 3070/20099 (15.27%) Loss: 2.367264 
LR: 0.00002389 [08:10:16] Epoch: 1 Batch: 3071/20099 (15.28%) Loss: 2.385255 LR: 0.00002389 [08:10:19] Epoch: 1 Batch: 3072/20099 (15.28%) Loss: 2.295088 LR: 0.00002389 [08:10:22] Epoch: 1 Batch: 3073/20099 (15.29%) Loss: 2.194893 LR: 0.00002395 [08:10:25] Epoch: 1 Batch: 3074/20099 (15.29%) Loss: 2.194145 LR: 0.00002395 [08:10:28] Epoch: 1 Batch: 3075/20099 (15.30%) Loss: 2.514250 LR: 0.00002395 [08:10:31] Epoch: 1 Batch: 3076/20099 (15.30%) Loss: 2.096946 LR: 0.00002395 [08:10:34] Epoch: 1 Batch: 3077/20099 (15.31%) Loss: 2.318950 LR: 0.00002395 [08:10:37] Epoch: 1 Batch: 3078/20099 (15.31%) Loss: 1.818635 LR: 0.00002395 [08:10:40] Epoch: 1 Batch: 3079/20099 (15.32%) Loss: 2.224263 LR: 0.00002395 [08:10:44] Epoch: 1 Batch: 3080/20099 (15.32%) Loss: 2.088881 LR: 0.00002400 [08:10:47] Epoch: 1 Batch: 3081/20099 (15.33%) Loss: 1.978843 LR: 0.00002400 [08:10:50] Epoch: 1 Batch: 3082/20099 (15.33%) Loss: 2.362289 LR: 0.00002400 [08:10:53] Epoch: 1 Batch: 3083/20099 (15.34%) Loss: 2.130124 LR: 0.00002400 [08:10:56] Epoch: 1 Batch: 3084/20099 (15.34%) Loss: 1.874016 LR: 0.00002400 [08:10:59] Epoch: 1 Batch: 3085/20099 (15.35%) Loss: 2.318212 LR: 0.00002400 [08:11:02] Epoch: 1 Batch: 3086/20099 (15.35%) Loss: 2.272377 LR: 0.00002400 [08:11:05] Epoch: 1 Batch: 3087/20099 (15.36%) Loss: 2.353722 LR: 0.00002405 [08:11:08] Epoch: 1 Batch: 3088/20099 (15.36%) Loss: 2.172667 LR: 0.00002405 [08:11:11] Epoch: 1 Batch: 3089/20099 (15.37%) Loss: 1.596799 LR: 0.00002405 [08:11:14] Epoch: 1 Batch: 3090/20099 (15.37%) Loss: 2.430948 LR: 0.00002405 [08:11:18] Epoch: 1 Batch: 3091/20099 (15.38%) Loss: 2.144435 LR: 0.00002405 [08:11:21] Epoch: 1 Batch: 3092/20099 (15.38%) Loss: 2.083289 LR: 0.00002405 [08:11:24] Epoch: 1 Batch: 3093/20099 (15.39%) Loss: 2.255977 LR: 0.00002405 [08:11:27] Epoch: 1 Batch: 3094/20099 (15.39%) Loss: 2.125531 LR: 0.00002411 [08:11:30] Epoch: 1 Batch: 3095/20099 (15.40%) Loss: 2.282505 LR: 0.00002411 [08:11:33] Epoch: 1 Batch: 3096/20099 (15.40%) Loss: 
2.220950 LR: 0.00002411 [08:11:36] Epoch: 1 Batch: 3097/20099 (15.41%) Loss: 2.466730 LR: 0.00002411 [08:11:39] Epoch: 1 Batch: 3098/20099 (15.41%) Loss: 2.058371 LR: 0.00002411 [08:11:42] Epoch: 1 Batch: 3099/20099 (15.42%) Loss: 1.885173 LR: 0.00002411 [08:11:45] Epoch: 1 Batch: 3100/20099 (15.42%) Loss: 1.973068 LR: 0.00002411 [08:11:48] Epoch: 1 Batch: 3101/20099 (15.43%) Loss: 2.161209 LR: 0.00002416 [08:11:52] Epoch: 1 Batch: 3102/20099 (15.43%) Loss: 2.293001 LR: 0.00002416 [08:11:55] Epoch: 1 Batch: 3103/20099 (15.44%) Loss: 2.125030 LR: 0.00002416 [08:11:58] Epoch: 1 Batch: 3104/20099 (15.44%) Loss: 2.247079 LR: 0.00002416 [08:12:01] Epoch: 1 Batch: 3105/20099 (15.45%) Loss: 2.196653 LR: 0.00002416 [08:12:04] Epoch: 1 Batch: 3106/20099 (15.45%) Loss: 2.399161 LR: 0.00002416 [08:12:07] Epoch: 1 Batch: 3107/20099 (15.46%) Loss: 2.207570 LR: 0.00002416 [08:12:10] Epoch: 1 Batch: 3108/20099 (15.46%) Loss: 2.015174 LR: 0.00002422 [08:12:13] Epoch: 1 Batch: 3109/20099 (15.47%) Loss: 2.462596 LR: 0.00002422 [08:12:16] Epoch: 1 Batch: 3110/20099 (15.47%) Loss: 2.028267 LR: 0.00002422 [08:12:19] Epoch: 1 Batch: 3111/20099 (15.48%) Loss: 2.070783 LR: 0.00002422 [08:12:23] Epoch: 1 Batch: 3112/20099 (15.48%) Loss: 2.072525 LR: 0.00002422 [08:12:26] Epoch: 1 Batch: 3113/20099 (15.49%) Loss: 2.551221 LR: 0.00002422 [08:12:29] Epoch: 1 Batch: 3114/20099 (15.49%) Loss: 1.897044 LR: 0.00002422 [08:12:32] Epoch: 1 Batch: 3115/20099 (15.50%) Loss: 2.038540 LR: 0.00002427 [08:12:35] Epoch: 1 Batch: 3116/20099 (15.50%) Loss: 2.208086 LR: 0.00002427 [08:12:38] Epoch: 1 Batch: 3117/20099 (15.51%) Loss: 2.125078 LR: 0.00002427 [08:12:41] Epoch: 1 Batch: 3118/20099 (15.51%) Loss: 2.307572 LR: 0.00002427 [08:12:44] Epoch: 1 Batch: 3119/20099 (15.52%) Loss: 2.427942 LR: 0.00002427 [08:12:47] Epoch: 1 Batch: 3120/20099 (15.52%) Loss: 2.024485 LR: 0.00002427 [08:12:50] Epoch: 1 Batch: 3121/20099 (15.53%) Loss: 2.248472 LR: 0.00002427 [08:12:53] Epoch: 1 Batch: 3122/20099 (15.53%) 
Loss: 2.021040 LR: 0.00002433 [08:12:57] Epoch: 1 Batch: 3123/20099 (15.54%) Loss: 2.305498 LR: 0.00002433 [08:13:00] Epoch: 1 Batch: 3124/20099 (15.54%) Loss: 2.127471 LR: 0.00002433 [08:13:03] Epoch: 1 Batch: 3125/20099 (15.55%) Loss: 2.138189 LR: 0.00002433 [08:13:06] Epoch: 1 Batch: 3126/20099 (15.55%) Loss: 2.219580 LR: 0.00002433 [08:13:09] Epoch: 1 Batch: 3127/20099 (15.56%) Loss: 2.137914 LR: 0.00002433 [08:13:12] Epoch: 1 Batch: 3128/20099 (15.56%) Loss: 2.376242 LR: 0.00002433 [08:13:15] Epoch: 1 Batch: 3129/20099 (15.57%) Loss: 2.276161 LR: 0.00002438 [08:13:18] Epoch: 1 Batch: 3130/20099 (15.57%) Loss: 2.274749 LR: 0.00002438 [08:13:21] Epoch: 1 Batch: 3131/20099 (15.58%) Loss: 2.294725 LR: 0.00002438 [08:13:24] Epoch: 1 Batch: 3132/20099 (15.58%) Loss: 2.339610 LR: 0.00002438 [08:13:28] Epoch: 1 Batch: 3133/20099 (15.59%) Loss: 2.106100 LR: 0.00002438 [08:13:31] Epoch: 1 Batch: 3134/20099 (15.59%) Loss: 2.219691 LR: 0.00002438 [08:13:34] Epoch: 1 Batch: 3135/20099 (15.60%) Loss: 2.061401 LR: 0.00002438 [08:13:37] Epoch: 1 Batch: 3136/20099 (15.60%) Loss: 2.333879 LR: 0.00002444 [08:13:40] Epoch: 1 Batch: 3137/20099 (15.61%) Loss: 2.106811 LR: 0.00002444 [08:13:43] Epoch: 1 Batch: 3138/20099 (15.61%) Loss: 2.165758 LR: 0.00002444 [08:13:46] Epoch: 1 Batch: 3139/20099 (15.62%) Loss: 2.062826 LR: 0.00002444 [08:13:49] Epoch: 1 Batch: 3140/20099 (15.62%) Loss: 2.316462 LR: 0.00002444 [08:13:52] Epoch: 1 Batch: 3141/20099 (15.63%) Loss: 2.588897 LR: 0.00002444 [08:13:55] Epoch: 1 Batch: 3142/20099 (15.63%) Loss: 2.086696 LR: 0.00002444 [08:13:59] Epoch: 1 Batch: 3143/20099 (15.64%) Loss: 2.052127 LR: 0.00002449 [08:14:02] Epoch: 1 Batch: 3144/20099 (15.64%) Loss: 2.120579 LR: 0.00002449 [08:14:05] Epoch: 1 Batch: 3145/20099 (15.65%) Loss: 2.370177 LR: 0.00002449 [08:14:08] Epoch: 1 Batch: 3146/20099 (15.65%) Loss: 2.325802 LR: 0.00002449 [08:14:11] Epoch: 1 Batch: 3147/20099 (15.66%) Loss: 2.037673 LR: 0.00002449 [08:14:14] Epoch: 1 Batch: 3148/20099 
(15.66%) Loss: 2.185206 LR: 0.00002449 [08:14:17] Epoch: 1 Batch: 3149/20099 (15.67%) Loss: 2.031933 LR: 0.00002449 [08:14:20] Epoch: 1 Batch: 3150/20099 (15.67%) Loss: 2.014289 LR: 0.00002455 [08:14:23] Epoch: 1 Batch: 3151/20099 (15.68%) Loss: 2.119134 LR: 0.00002455 [08:14:26] Epoch: 1 Batch: 3152/20099 (15.68%) Loss: 2.184337 LR: 0.00002455 [08:14:30] Epoch: 1 Batch: 3153/20099 (15.69%) Loss: 2.316284 LR: 0.00002455 [08:14:33] Epoch: 1 Batch: 3154/20099 (15.69%) Loss: 2.250019 LR: 0.00002455 [08:14:36] Epoch: 1 Batch: 3155/20099 (15.70%) Loss: 2.089617 LR: 0.00002455 [08:14:39] Epoch: 1 Batch: 3156/20099 (15.70%) Loss: 2.554235 LR: 0.00002455 [08:14:42] Epoch: 1 Batch: 3157/20099 (15.71%) Loss: 2.118400 LR: 0.00002460 [08:14:45] Epoch: 1 Batch: 3158/20099 (15.71%) Loss: 2.266126 LR: 0.00002460 [08:14:48] Epoch: 1 Batch: 3159/20099 (15.72%) Loss: 2.194458 LR: 0.00002460 [08:14:51] Epoch: 1 Batch: 3160/20099 (15.72%) Loss: 2.185789 LR: 0.00002460 [08:14:54] Epoch: 1 Batch: 3161/20099 (15.73%) Loss: 2.032192 LR: 0.00002460 [08:14:58] Epoch: 1 Batch: 3162/20099 (15.73%) Loss: 2.178215 LR: 0.00002460 [08:15:01] Epoch: 1 Batch: 3163/20099 (15.74%) Loss: 2.117066 LR: 0.00002460 [08:15:04] Epoch: 1 Batch: 3164/20099 (15.74%) Loss: 2.210169 LR: 0.00002465 [08:15:07] Epoch: 1 Batch: 3165/20099 (15.75%) Loss: 2.384284 LR: 0.00002465 [08:15:10] Epoch: 1 Batch: 3166/20099 (15.75%) Loss: 2.226386 LR: 0.00002465 [08:15:13] Epoch: 1 Batch: 3167/20099 (15.76%) Loss: 1.815578 LR: 0.00002465 [08:15:16] Epoch: 1 Batch: 3168/20099 (15.76%) Loss: 2.404747 LR: 0.00002465 [08:15:19] Epoch: 1 Batch: 3169/20099 (15.77%) Loss: 2.367325 LR: 0.00002465 [08:15:22] Epoch: 1 Batch: 3170/20099 (15.77%) Loss: 2.200128 LR: 0.00002465 [08:15:25] Epoch: 1 Batch: 3171/20099 (15.78%) Loss: 2.167218 LR: 0.00002471 [08:15:29] Epoch: 1 Batch: 3172/20099 (15.78%) Loss: 1.992056 LR: 0.00002471 [08:15:32] Epoch: 1 Batch: 3173/20099 (15.79%) Loss: 2.458081 LR: 0.00002471 [08:15:35] Epoch: 1 Batch: 
3174/20099 (15.79%) Loss: 1.964258 LR: 0.00002471 [08:15:38] Epoch: 1 Batch: 3175/20099 (15.80%) Loss: 2.072972 LR: 0.00002471 [08:15:41] Epoch: 1 Batch: 3176/20099 (15.80%) Loss: 2.016938 LR: 0.00002471 [08:15:44] Epoch: 1 Batch: 3177/20099 (15.81%) Loss: 2.427017 LR: 0.00002471 [08:15:47] Epoch: 1 Batch: 3178/20099 (15.81%) Loss: 2.418317 LR: 0.00002476 [08:15:50] Epoch: 1 Batch: 3179/20099 (15.82%) Loss: 2.087599 LR: 0.00002476 [08:15:53] Epoch: 1 Batch: 3180/20099 (15.82%) Loss: 2.328664 LR: 0.00002476 [08:15:56] Epoch: 1 Batch: 3181/20099 (15.83%) Loss: 1.829847 LR: 0.00002476 [08:15:59] Epoch: 1 Batch: 3182/20099 (15.83%) Loss: 2.416254 LR: 0.00002476 [08:16:03] Epoch: 1 Batch: 3183/20099 (15.84%) Loss: 2.260193 LR: 0.00002476 [08:16:06] Epoch: 1 Batch: 3184/20099 (15.84%) Loss: 2.273007 LR: 0.00002476 [08:16:09] Epoch: 1 Batch: 3185/20099 (15.85%) Loss: 2.192671 LR: 0.00002482 [08:16:12] Epoch: 1 Batch: 3186/20099 (15.85%) Loss: 2.409903 LR: 0.00002482 [08:16:15] Epoch: 1 Batch: 3187/20099 (15.86%) Loss: 2.251062 LR: 0.00002482 [08:16:18] Epoch: 1 Batch: 3188/20099 (15.86%) Loss: 2.050327 LR: 0.00002482 [08:16:21] Epoch: 1 Batch: 3189/20099 (15.87%) Loss: 2.240115 LR: 0.00002482 [08:16:24] Epoch: 1 Batch: 3190/20099 (15.87%) Loss: 2.325598 LR: 0.00002482 [08:16:27] Epoch: 1 Batch: 3191/20099 (15.88%) Loss: 1.820946 LR: 0.00002482 [08:16:30] Epoch: 1 Batch: 3192/20099 (15.88%) Loss: 2.296700 LR: 0.00002487 [08:16:33] Epoch: 1 Batch: 3193/20099 (15.89%) Loss: 2.241825 LR: 0.00002487 [08:16:37] Epoch: 1 Batch: 3194/20099 (15.89%) Loss: 2.213295 LR: 0.00002487 [08:16:40] Epoch: 1 Batch: 3195/20099 (15.90%) Loss: 2.393660 LR: 0.00002487 [08:16:43] Epoch: 1 Batch: 3196/20099 (15.90%) Loss: 2.503264 LR: 0.00002487 [08:16:46] Epoch: 1 Batch: 3197/20099 (15.91%) Loss: 2.088830 LR: 0.00002487 [08:16:49] Epoch: 1 Batch: 3198/20099 (15.91%) Loss: 2.161597 LR: 0.00002487 [08:16:52] Epoch: 1 Batch: 3199/20099 (15.92%) Loss: 2.496126 LR: 0.00002493 [08:16:59] >> Cleaned up 
old temp checkpoint: epoch1_step1200 [08:16:59] >> Temp checkpoint saved: epoch1_step3200, size: 0.1693 GB [08:16:59] Epoch: 1 Batch: 3200/20099 (15.92%) Loss: 2.228915 LR: 0.00002493 [08:17:02] Epoch: 1 Batch: 3201/20099 (15.93%) Loss: 1.972764 LR: 0.00002493 [08:17:05] Epoch: 1 Batch: 3202/20099 (15.93%) Loss: 2.296057 LR: 0.00002493 [08:17:08] Epoch: 1 Batch: 3203/20099 (15.94%) Loss: 2.171824 LR: 0.00002493 [08:17:11] Epoch: 1 Batch: 3204/20099 (15.94%) Loss: 2.219774 LR: 0.00002493 [08:17:14] Epoch: 1 Batch: 3205/20099 (15.95%) Loss: 2.027104 LR: 0.00002493 [08:17:17] Epoch: 1 Batch: 3206/20099 (15.95%) Loss: 2.078878 LR: 0.00002498 [08:17:20] Epoch: 1 Batch: 3207/20099 (15.96%) Loss: 2.287446 LR: 0.00002498 [08:17:23] Epoch: 1 Batch: 3208/20099 (15.96%) Loss: 2.366476 LR: 0.00002498 [08:17:26] Epoch: 1 Batch: 3209/20099 (15.97%) Loss: 2.205447 LR: 0.00002498 [08:17:30] Epoch: 1 Batch: 3210/20099 (15.97%) Loss: 2.126209 LR: 0.00002498 [08:17:33] Epoch: 1 Batch: 3211/20099 (15.98%) Loss: 2.137474 LR: 0.00002498 [08:17:36] Epoch: 1 Batch: 3212/20099 (15.98%) Loss: 2.043741 LR: 0.00002498 [08:17:39] Epoch: 1 Batch: 3213/20099 (15.99%) Loss: 2.211807 LR: 0.00002504 [08:17:42] Epoch: 1 Batch: 3214/20099 (15.99%) Loss: 1.988102 LR: 0.00002504 [08:17:45] Epoch: 1 Batch: 3215/20099 (16.00%) Loss: 2.343780 LR: 0.00002504 [08:17:48] Epoch: 1 Batch: 3216/20099 (16.00%) Loss: 2.451816 LR: 0.00002504 [08:17:51] Epoch: 1 Batch: 3217/20099 (16.01%) Loss: 2.351874 LR: 0.00002504 [08:17:54] Epoch: 1 Batch: 3218/20099 (16.01%) Loss: 2.324056 LR: 0.00002504 [08:17:58] Epoch: 1 Batch: 3219/20099 (16.02%) Loss: 1.826231 LR: 0.00002504 [08:18:01] Epoch: 1 Batch: 3220/20099 (16.02%) Loss: 2.196962 LR: 0.00002509 [08:18:04] Epoch: 1 Batch: 3221/20099 (16.03%) Loss: 2.185053 LR: 0.00002509 [08:18:07] Epoch: 1 Batch: 3222/20099 (16.03%) Loss: 2.171045 LR: 0.00002509 [08:18:10] Epoch: 1 Batch: 3223/20099 (16.04%) Loss: 2.294142 LR: 0.00002509 [08:18:13] Epoch: 1 Batch: 3224/20099 
(16.04%) Loss: 2.149269 LR: 0.00002509 [08:18:16] Epoch: 1 Batch: 3225/20099 (16.05%) Loss: 2.056018 LR: 0.00002509 [08:18:19] Epoch: 1 Batch: 3226/20099 (16.05%) Loss: 1.805209 LR: 0.00002509 [08:18:22] Epoch: 1 Batch: 3227/20099 (16.06%) Loss: 2.427785 LR: 0.00002515 [08:18:25] Epoch: 1 Batch: 3228/20099 (16.06%) Loss: 2.267134 LR: 0.00002515 [08:18:28] Epoch: 1 Batch: 3229/20099 (16.07%) Loss: 2.239864 LR: 0.00002515 [08:18:32] Epoch: 1 Batch: 3230/20099 (16.07%) Loss: 2.706387 LR: 0.00002515 [08:18:35] Epoch: 1 Batch: 3231/20099 (16.08%) Loss: 2.232271 LR: 0.00002515 [08:18:38] Epoch: 1 Batch: 3232/20099 (16.08%) Loss: 2.197361 LR: 0.00002515 [08:18:41] Epoch: 1 Batch: 3233/20099 (16.09%) Loss: 2.401169 LR: 0.00002515 [08:18:44] Epoch: 1 Batch: 3234/20099 (16.09%) Loss: 2.203218 LR: 0.00002520 [08:18:47] Epoch: 1 Batch: 3235/20099 (16.10%) Loss: 2.177223 LR: 0.00002520 [08:18:50] Epoch: 1 Batch: 3236/20099 (16.10%) Loss: 2.145628 LR: 0.00002520 [08:18:53] Epoch: 1 Batch: 3237/20099 (16.11%) Loss: 2.135652 LR: 0.00002520 [08:18:56] Epoch: 1 Batch: 3238/20099 (16.11%) Loss: 2.304856 LR: 0.00002520 [08:18:59] Epoch: 1 Batch: 3239/20099 (16.12%) Loss: 2.502804 LR: 0.00002520 [08:19:02] Epoch: 1 Batch: 3240/20099 (16.12%) Loss: 1.978871 LR: 0.00002520 [08:19:06] Epoch: 1 Batch: 3241/20099 (16.13%) Loss: 2.669682 LR: 0.00002525 [08:19:09] Epoch: 1 Batch: 3242/20099 (16.13%) Loss: 2.258694 LR: 0.00002525 [08:19:12] Epoch: 1 Batch: 3243/20099 (16.14%) Loss: 2.310532 LR: 0.00002525 [08:19:15] Epoch: 1 Batch: 3244/20099 (16.14%) Loss: 2.148279 LR: 0.00002525 [08:19:18] Epoch: 1 Batch: 3245/20099 (16.15%) Loss: 2.139233 LR: 0.00002525 [08:19:21] Epoch: 1 Batch: 3246/20099 (16.15%) Loss: 2.169776 LR: 0.00002525 [08:19:24] Epoch: 1 Batch: 3247/20099 (16.16%) Loss: 2.459082 LR: 0.00002525 [08:19:27] Epoch: 1 Batch: 3248/20099 (16.16%) Loss: 2.473167 LR: 0.00002531 [08:19:30] Epoch: 1 Batch: 3249/20099 (16.16%) Loss: 2.211891 LR: 0.00002531 [08:19:33] Epoch: 1 Batch: 
3250/20099 (16.17%) Loss: 2.473466 LR: 0.00002531 [08:19:36] Epoch: 1 Batch: 3251/20099 (16.17%) Loss: 2.144229 LR: 0.00002531 [08:19:40] Epoch: 1 Batch: 3252/20099 (16.18%) Loss: 1.884536 LR: 0.00002531 [08:19:43] Epoch: 1 Batch: 3253/20099 (16.18%) Loss: 2.202050 LR: 0.00002531 [08:19:46] Epoch: 1 Batch: 3254/20099 (16.19%) Loss: 1.972501 LR: 0.00002531 [08:19:49] Epoch: 1 Batch: 3255/20099 (16.19%) Loss: 2.304303 LR: 0.00002536 [08:19:52] Epoch: 1 Batch: 3256/20099 (16.20%) Loss: 2.319896 LR: 0.00002536 [08:19:55] Epoch: 1 Batch: 3257/20099 (16.20%) Loss: 1.875923 LR: 0.00002536 [08:19:58] Epoch: 1 Batch: 3258/20099 (16.21%) Loss: 2.310294 LR: 0.00002536 [08:20:01] Epoch: 1 Batch: 3259/20099 (16.21%) Loss: 2.229076 LR: 0.00002536 [08:20:04] Epoch: 1 Batch: 3260/20099 (16.22%) Loss: 2.146106 LR: 0.00002536 [08:20:07] Epoch: 1 Batch: 3261/20099 (16.22%) Loss: 2.059092 LR: 0.00002536 [08:20:10] Epoch: 1 Batch: 3262/20099 (16.23%) Loss: 1.875203 LR: 0.00002542 [08:20:14] Epoch: 1 Batch: 3263/20099 (16.23%) Loss: 2.673416 LR: 0.00002542 [08:20:17] Epoch: 1 Batch: 3264/20099 (16.24%) Loss: 2.305681 LR: 0.00002542 [08:20:20] Epoch: 1 Batch: 3265/20099 (16.24%) Loss: 2.082320 LR: 0.00002542 [08:20:23] Epoch: 1 Batch: 3266/20099 (16.25%) Loss: 2.252251 LR: 0.00002542 [08:20:26] Epoch: 1 Batch: 3267/20099 (16.25%) Loss: 2.418186 LR: 0.00002542 [08:20:29] Epoch: 1 Batch: 3268/20099 (16.26%) Loss: 2.332343 LR: 0.00002542 [08:20:32] Epoch: 1 Batch: 3269/20099 (16.26%) Loss: 2.209190 LR: 0.00002547 [08:20:35] Epoch: 1 Batch: 3270/20099 (16.27%) Loss: 2.297785 LR: 0.00002547 [08:20:38] Epoch: 1 Batch: 3271/20099 (16.27%) Loss: 1.944709 LR: 0.00002547 [08:20:41] Epoch: 1 Batch: 3272/20099 (16.28%) Loss: 2.040693 LR: 0.00002547 [08:20:45] Epoch: 1 Batch: 3273/20099 (16.28%) Loss: 2.262236 LR: 0.00002547 [08:20:48] Epoch: 1 Batch: 3274/20099 (16.29%) Loss: 2.418095 LR: 0.00002547 [08:20:51] Epoch: 1 Batch: 3275/20099 (16.29%) Loss: 2.186109 LR: 0.00002547 [08:20:54] Epoch: 1 
Batch: 3276/20099 (16.30%) Loss: 2.256920 LR: 0.00002553 [08:20:57] Epoch: 1 Batch: 3277/20099 (16.30%) Loss: 2.164545 LR: 0.00002553 [08:21:00] Epoch: 1 Batch: 3278/20099 (16.31%) Loss: 2.238378 LR: 0.00002553 [08:21:03] Epoch: 1 Batch: 3279/20099 (16.31%) Loss: 1.811388 LR: 0.00002553 [08:21:06] Epoch: 1 Batch: 3280/20099 (16.32%) Loss: 2.417852 LR: 0.00002553 [08:21:09] Epoch: 1 Batch: 3281/20099 (16.32%) Loss: 2.371376 LR: 0.00002553 [08:21:12] Epoch: 1 Batch: 3282/20099 (16.33%) Loss: 2.424272 LR: 0.00002553 [08:21:15] Epoch: 1 Batch: 3283/20099 (16.33%) Loss: 2.481582 LR: 0.00002558 [08:21:19] Epoch: 1 Batch: 3284/20099 (16.34%) Loss: 2.331858 LR: 0.00002558 [08:21:22] Epoch: 1 Batch: 3285/20099 (16.34%) Loss: 2.479770 LR: 0.00002558 [08:21:25] Epoch: 1 Batch: 3286/20099 (16.35%) Loss: 2.208421 LR: 0.00002558 [08:21:28] Epoch: 1 Batch: 3287/20099 (16.35%) Loss: 2.593747 LR: 0.00002558 [08:21:31] Epoch: 1 Batch: 3288/20099 (16.36%) Loss: 2.121921 LR: 0.00002558 [08:21:34] Epoch: 1 Batch: 3289/20099 (16.36%) Loss: 2.235387 LR: 0.00002558 [08:21:37] Epoch: 1 Batch: 3290/20099 (16.37%) Loss: 2.439532 LR: 0.00002564 [08:21:40] Epoch: 1 Batch: 3291/20099 (16.37%) Loss: 1.818057 LR: 0.00002564 [08:21:43] Epoch: 1 Batch: 3292/20099 (16.38%) Loss: 2.206335 LR: 0.00002564 [08:21:46] Epoch: 1 Batch: 3293/20099 (16.38%) Loss: 1.876814 LR: 0.00002564 [08:21:50] Epoch: 1 Batch: 3294/20099 (16.39%) Loss: 2.249139 LR: 0.00002564 [08:21:53] Epoch: 1 Batch: 3295/20099 (16.39%) Loss: 2.262602 LR: 0.00002564 [08:21:56] Epoch: 1 Batch: 3296/20099 (16.40%) Loss: 2.210190 LR: 0.00002564 [08:21:59] Epoch: 1 Batch: 3297/20099 (16.40%) Loss: 2.296537 LR: 0.00002569 [08:22:02] Epoch: 1 Batch: 3298/20099 (16.41%) Loss: 2.151994 LR: 0.00002569 [08:22:05] Epoch: 1 Batch: 3299/20099 (16.41%) Loss: 2.066785 LR: 0.00002569 [08:22:08] Epoch: 1 Batch: 3300/20099 (16.42%) Loss: 2.600168 LR: 0.00002569 [08:22:11] Epoch: 1 Batch: 3301/20099 (16.42%) Loss: 2.211089 LR: 0.00002569 [08:22:14] Epoch: 
1 Batch: 3302/20099 (16.43%) Loss: 2.063712 LR: 0.00002569 [08:22:17] Epoch: 1 Batch: 3303/20099 (16.43%) Loss: 2.173796 LR: 0.00002569 [08:22:21] Epoch: 1 Batch: 3304/20099 (16.44%) Loss: 2.278502 LR: 0.00002575 [08:22:24] Epoch: 1 Batch: 3305/20099 (16.44%) Loss: 1.950620 LR: 0.00002575 [08:22:27] Epoch: 1 Batch: 3306/20099 (16.45%) Loss: 2.147433 LR: 0.00002575 [08:22:30] Epoch: 1 Batch: 3307/20099 (16.45%) Loss: 1.689042 LR: 0.00002575 [08:22:33] Epoch: 1 Batch: 3308/20099 (16.46%) Loss: 2.330981 LR: 0.00002575 [08:22:36] Epoch: 1 Batch: 3309/20099 (16.46%) Loss: 2.535165 LR: 0.00002575 [08:22:39] Epoch: 1 Batch: 3310/20099 (16.47%) Loss: 2.450668 LR: 0.00002575 [08:22:42] Epoch: 1 Batch: 3311/20099 (16.47%) Loss: 2.302010 LR: 0.00002580 [08:22:45] Epoch: 1 Batch: 3312/20099 (16.48%) Loss: 1.969719 LR: 0.00002580 [08:22:48] Epoch: 1 Batch: 3313/20099 (16.48%) Loss: 2.336655 LR: 0.00002580 [08:22:51] Epoch: 1 Batch: 3314/20099 (16.49%) Loss: 1.979051 LR: 0.00002580 [08:22:55] Epoch: 1 Batch: 3315/20099 (16.49%) Loss: 2.140113 LR: 0.00002580 [08:22:58] Epoch: 1 Batch: 3316/20099 (16.50%) Loss: 2.601040 LR: 0.00002580 [08:23:01] Epoch: 1 Batch: 3317/20099 (16.50%) Loss: 2.401024 LR: 0.00002580 [08:23:04] Epoch: 1 Batch: 3318/20099 (16.51%) Loss: 2.063126 LR: 0.00002585 [08:23:07] Epoch: 1 Batch: 3319/20099 (16.51%) Loss: 2.297027 LR: 0.00002585 [08:23:10] Epoch: 1 Batch: 3320/20099 (16.52%) Loss: 2.338523 LR: 0.00002585 [08:23:13] Epoch: 1 Batch: 3321/20099 (16.52%) Loss: 2.481900 LR: 0.00002585 [08:23:16] Epoch: 1 Batch: 3322/20099 (16.53%) Loss: 2.266592 LR: 0.00002585 [08:23:19] Epoch: 1 Batch: 3323/20099 (16.53%) Loss: 2.355905 LR: 0.00002585 [08:23:22] Epoch: 1 Batch: 3324/20099 (16.54%) Loss: 2.291657 LR: 0.00002585 [08:23:26] Epoch: 1 Batch: 3325/20099 (16.54%) Loss: 1.713855 LR: 0.00002591 [08:23:29] Epoch: 1 Batch: 3326/20099 (16.55%) Loss: 2.226881 LR: 0.00002591 [08:23:32] Epoch: 1 Batch: 3327/20099 (16.55%) Loss: 2.023180 LR: 0.00002591 [08:23:35] 
Epoch: 1 Batch: 3328/20099 (16.56%) Loss: 2.457410 LR: 0.00002591 [08:23:38] Epoch: 1 Batch: 3329/20099 (16.56%) Loss: 2.049907 LR: 0.00002591 [08:23:41] Epoch: 1 Batch: 3330/20099 (16.57%) Loss: 2.331043 LR: 0.00002591 [08:23:44] Epoch: 1 Batch: 3331/20099 (16.57%) Loss: 2.070821 LR: 0.00002591 [08:23:47] Epoch: 1 Batch: 3332/20099 (16.58%) Loss: 2.320089 LR: 0.00002596 [08:23:50] Epoch: 1 Batch: 3333/20099 (16.58%) Loss: 2.089374 LR: 0.00002596 [08:23:54] Epoch: 1 Batch: 3334/20099 (16.59%) Loss: 2.141730 LR: 0.00002596 [08:23:57] Epoch: 1 Batch: 3335/20099 (16.59%) Loss: 2.202426 LR: 0.00002596 [08:24:00] Epoch: 1 Batch: 3336/20099 (16.60%) Loss: 1.980498 LR: 0.00002596 [08:24:03] Epoch: 1 Batch: 3337/20099 (16.60%) Loss: 2.036700 LR: 0.00002596 [08:24:06] Epoch: 1 Batch: 3338/20099 (16.61%) Loss: 2.421759 LR: 0.00002596 [08:24:09] Epoch: 1 Batch: 3339/20099 (16.61%) Loss: 1.834824 LR: 0.00002602 [08:24:12] Epoch: 1 Batch: 3340/20099 (16.62%) Loss: 2.006632 LR: 0.00002602 [08:24:15] Epoch: 1 Batch: 3341/20099 (16.62%) Loss: 2.230856 LR: 0.00002602 [08:24:18] Epoch: 1 Batch: 3342/20099 (16.63%) Loss: 2.102738 LR: 0.00002602 [08:24:21] Epoch: 1 Batch: 3343/20099 (16.63%) Loss: 1.945235 LR: 0.00002602 [08:24:24] Epoch: 1 Batch: 3344/20099 (16.64%) Loss: 2.309559 LR: 0.00002602 [08:24:28] Epoch: 1 Batch: 3345/20099 (16.64%) Loss: 2.515952 LR: 0.00002602 [08:24:31] Epoch: 1 Batch: 3346/20099 (16.65%) Loss: 2.193229 LR: 0.00002607 [08:24:34] Epoch: 1 Batch: 3347/20099 (16.65%) Loss: 2.544253 LR: 0.00002607 [08:24:37] Epoch: 1 Batch: 3348/20099 (16.66%) Loss: 2.102910 LR: 0.00002607 [08:24:40] Epoch: 1 Batch: 3349/20099 (16.66%) Loss: 2.459631 LR: 0.00002607 [08:24:43] Epoch: 1 Batch: 3350/20099 (16.67%) Loss: 2.047219 LR: 0.00002607 [08:24:46] Epoch: 1 Batch: 3351/20099 (16.67%) Loss: 2.400079 LR: 0.00002607 [08:24:49] Epoch: 1 Batch: 3352/20099 (16.68%) Loss: 2.102782 LR: 0.00002607 [08:24:52] Epoch: 1 Batch: 3353/20099 (16.68%) Loss: 2.526941 LR: 0.00002613 
[08:24:55] Epoch: 1 Batch: 3354/20099 (16.69%) Loss: 1.998634 LR: 0.00002613 [08:24:59] Epoch: 1 Batch: 3355/20099 (16.69%) Loss: 2.194685 LR: 0.00002613 [08:25:02] Epoch: 1 Batch: 3356/20099 (16.70%) Loss: 1.973290 LR: 0.00002613 [08:25:05] Epoch: 1 Batch: 3357/20099 (16.70%) Loss: 2.035192 LR: 0.00002613 [08:25:08] Epoch: 1 Batch: 3358/20099 (16.71%) Loss: 2.322349 LR: 0.00002613 [08:25:11] Epoch: 1 Batch: 3359/20099 (16.71%) Loss: 2.231432 LR: 0.00002613 [08:25:14] Epoch: 1 Batch: 3360/20099 (16.72%) Loss: 2.061768 LR: 0.00002618 [08:25:17] Epoch: 1 Batch: 3361/20099 (16.72%) Loss: 2.419883 LR: 0.00002618 [08:25:20] Epoch: 1 Batch: 3362/20099 (16.73%) Loss: 2.345036 LR: 0.00002618 [08:25:23] Epoch: 1 Batch: 3363/20099 (16.73%) Loss: 2.449249 LR: 0.00002618 [08:25:26] Epoch: 1 Batch: 3364/20099 (16.74%) Loss: 2.317705 LR: 0.00002618 [08:25:29] Epoch: 1 Batch: 3365/20099 (16.74%) Loss: 1.984886 LR: 0.00002618 [08:25:33] Epoch: 1 Batch: 3366/20099 (16.75%) Loss: 2.351197 LR: 0.00002618 [08:25:36] Epoch: 1 Batch: 3367/20099 (16.75%) Loss: 2.322683 LR: 0.00002624 [08:25:39] Epoch: 1 Batch: 3368/20099 (16.76%) Loss: 2.232392 LR: 0.00002624 [08:25:42] Epoch: 1 Batch: 3369/20099 (16.76%) Loss: 1.976066 LR: 0.00002624 [08:25:45] Epoch: 1 Batch: 3370/20099 (16.77%) Loss: 2.188760 LR: 0.00002624 [08:25:48] Epoch: 1 Batch: 3371/20099 (16.77%) Loss: 2.123841 LR: 0.00002624 [08:25:51] Epoch: 1 Batch: 3372/20099 (16.78%) Loss: 2.338250 LR: 0.00002624 [08:25:54] Epoch: 1 Batch: 3373/20099 (16.78%) Loss: 2.265271 LR: 0.00002624 [08:25:57] Epoch: 1 Batch: 3374/20099 (16.79%) Loss: 2.126385 LR: 0.00002629 [08:26:00] Epoch: 1 Batch: 3375/20099 (16.79%) Loss: 2.009998 LR: 0.00002629 [08:26:03] Epoch: 1 Batch: 3376/20099 (16.80%) Loss: 2.227751 LR: 0.00002629 [08:26:07] Epoch: 1 Batch: 3377/20099 (16.80%) Loss: 2.316501 LR: 0.00002629 [08:26:10] Epoch: 1 Batch: 3378/20099 (16.81%) Loss: 2.394395 LR: 0.00002629 [08:26:13] Epoch: 1 Batch: 3379/20099 (16.81%) Loss: 2.102838 LR: 
0.00002629 [08:26:16] Epoch: 1 Batch: 3380/20099 (16.82%) Loss: 1.937382 LR: 0.00002629 [08:26:19] Epoch: 1 Batch: 3381/20099 (16.82%) Loss: 2.141023 LR: 0.00002635 [08:26:22] Epoch: 1 Batch: 3382/20099 (16.83%) Loss: 2.279950 LR: 0.00002635 [08:26:25] Epoch: 1 Batch: 3383/20099 (16.83%) Loss: 2.024928 LR: 0.00002635 [08:26:28] Epoch: 1 Batch: 3384/20099 (16.84%) Loss: 2.300538 LR: 0.00002635 [08:26:31] Epoch: 1 Batch: 3385/20099 (16.84%) Loss: 2.206520 LR: 0.00002635 [08:26:34] Epoch: 1 Batch: 3386/20099 (16.85%) Loss: 2.182676 LR: 0.00002635 [08:26:37] Epoch: 1 Batch: 3387/20099 (16.85%) Loss: 2.080597 LR: 0.00002635 [08:26:41] Epoch: 1 Batch: 3388/20099 (16.86%) Loss: 2.116941 LR: 0.00002640 [08:26:44] Epoch: 1 Batch: 3389/20099 (16.86%) Loss: 2.289006 LR: 0.00002640 [08:26:47] Epoch: 1 Batch: 3390/20099 (16.87%) Loss: 1.891817 LR: 0.00002640 [08:26:50] Epoch: 1 Batch: 3391/20099 (16.87%) Loss: 2.406143 LR: 0.00002640 [08:26:53] Epoch: 1 Batch: 3392/20099 (16.88%) Loss: 2.174690 LR: 0.00002640 [08:26:56] Epoch: 1 Batch: 3393/20099 (16.88%) Loss: 2.200736 LR: 0.00002640 [08:26:59] Epoch: 1 Batch: 3394/20099 (16.89%) Loss: 2.087096 LR: 0.00002640 [08:27:02] Epoch: 1 Batch: 3395/20099 (16.89%) Loss: 2.103908 LR: 0.00002645 [08:27:05] Epoch: 1 Batch: 3396/20099 (16.90%) Loss: 2.120263 LR: 0.00002645 [08:27:08] Epoch: 1 Batch: 3397/20099 (16.90%) Loss: 2.283164 LR: 0.00002645 [08:27:11] Epoch: 1 Batch: 3398/20099 (16.91%) Loss: 2.216897 LR: 0.00002645 [08:27:15] Epoch: 1 Batch: 3399/20099 (16.91%) Loss: 2.064074 LR: 0.00002645 [08:27:21] >> Cleaned up old temp checkpoint: epoch1_step1400 [08:27:21] >> Temp checkpoint saved: epoch1_step3400, size: 0.1693 GB [08:27:21] Epoch: 1 Batch: 3400/20099 (16.92%) Loss: 2.163091 LR: 0.00002645 [08:27:24] Epoch: 1 Batch: 3401/20099 (16.92%) Loss: 2.023928 LR: 0.00002645 [08:27:27] Epoch: 1 Batch: 3402/20099 (16.93%) Loss: 2.126615 LR: 0.00002651 [08:27:30] Epoch: 1 Batch: 3403/20099 (16.93%) Loss: 2.255748 LR: 0.00002651 
[08:27:34] Epoch: 1 Batch: 3404/20099 (16.94%) Loss: 2.220340 LR: 0.00002651 [08:27:37] Epoch: 1 Batch: 3405/20099 (16.94%) Loss: 2.249306 LR: 0.00002651 [08:27:40] Epoch: 1 Batch: 3406/20099 (16.95%) Loss: 2.319127 LR: 0.00002651 [08:27:43] Epoch: 1 Batch: 3407/20099 (16.95%) Loss: 2.173804 LR: 0.00002651 [08:27:46] Epoch: 1 Batch: 3408/20099 (16.96%) Loss: 2.145165 LR: 0.00002651 [08:27:49] Epoch: 1 Batch: 3409/20099 (16.96%) Loss: 2.518950 LR: 0.00002656 [08:27:52] Epoch: 1 Batch: 3410/20099 (16.97%) Loss: 2.032890 LR: 0.00002656 [08:27:55] Epoch: 1 Batch: 3411/20099 (16.97%) Loss: 2.138523 LR: 0.00002656 [08:27:58] Epoch: 1 Batch: 3412/20099 (16.98%) Loss: 2.271917 LR: 0.00002656 [08:28:02] Epoch: 1 Batch: 3413/20099 (16.98%) Loss: 2.348805 LR: 0.00002656 [08:28:05] Epoch: 1 Batch: 3414/20099 (16.99%) Loss: 2.060218 LR: 0.00002656 [08:28:08] Epoch: 1 Batch: 3415/20099 (16.99%) Loss: 2.193175 LR: 0.00002656 [08:28:11] Epoch: 1 Batch: 3416/20099 (17.00%) Loss: 2.023857 LR: 0.00002662 [08:28:14] Epoch: 1 Batch: 3417/20099 (17.00%) Loss: 2.104334 LR: 0.00002662 [08:28:17] Epoch: 1 Batch: 3418/20099 (17.01%) Loss: 2.076061 LR: 0.00002662 [08:28:20] Epoch: 1 Batch: 3419/20099 (17.01%) Loss: 1.965384 LR: 0.00002662 [08:28:23] Epoch: 1 Batch: 3420/20099 (17.02%) Loss: 2.225835 LR: 0.00002662 [08:28:26] Epoch: 1 Batch: 3421/20099 (17.02%) Loss: 2.579302 LR: 0.00002662 [08:28:29] Epoch: 1 Batch: 3422/20099 (17.03%) Loss: 2.214364 LR: 0.00002662 [08:28:33] Epoch: 1 Batch: 3423/20099 (17.03%) Loss: 2.255745 LR: 0.00002667 [08:28:36] Epoch: 1 Batch: 3424/20099 (17.04%) Loss: 2.385896 LR: 0.00002667 [08:28:39] Epoch: 1 Batch: 3425/20099 (17.04%) Loss: 2.004862 LR: 0.00002667 [08:28:42] Epoch: 1 Batch: 3426/20099 (17.05%) Loss: 2.230822 LR: 0.00002667 [08:28:45] Epoch: 1 Batch: 3427/20099 (17.05%) Loss: 2.253181 LR: 0.00002667 [08:28:48] Epoch: 1 Batch: 3428/20099 (17.06%) Loss: 2.088153 LR: 0.00002667 [08:28:51] Epoch: 1 Batch: 3429/20099 (17.06%) Loss: 2.213341 LR: 
0.00002667 [08:28:54] Epoch: 1 Batch: 3430/20099 (17.07%) Loss: 2.231649 LR: 0.00002673 [08:28:57] Epoch: 1 Batch: 3431/20099 (17.07%) Loss: 2.022143 LR: 0.00002673 [08:29:00] Epoch: 1 Batch: 3432/20099 (17.08%) Loss: 2.461801 LR: 0.00002673 [08:29:03] Epoch: 1 Batch: 3433/20099 (17.08%) Loss: 2.276209 LR: 0.00002673 [08:29:06] Epoch: 1 Batch: 3434/20099 (17.09%) Loss: 2.406996 LR: 0.00002673 [08:29:10] Epoch: 1 Batch: 3435/20099 (17.09%) Loss: 2.326365 LR: 0.00002673 [08:29:13] Epoch: 1 Batch: 3436/20099 (17.10%) Loss: 2.338133 LR: 0.00002673 [08:29:16] Epoch: 1 Batch: 3437/20099 (17.10%) Loss: 2.194034 LR: 0.00002678 [08:29:19] Epoch: 1 Batch: 3438/20099 (17.11%) Loss: 2.070997 LR: 0.00002678 [08:29:22] Epoch: 1 Batch: 3439/20099 (17.11%) Loss: 2.341562 LR: 0.00002678 [08:29:25] Epoch: 1 Batch: 3440/20099 (17.12%) Loss: 2.157019 LR: 0.00002678 [08:29:28] Epoch: 1 Batch: 3441/20099 (17.12%) Loss: 2.277182 LR: 0.00002678 [08:29:31] Epoch: 1 Batch: 3442/20099 (17.13%) Loss: 2.191656 LR: 0.00002678 [08:29:34] Epoch: 1 Batch: 3443/20099 (17.13%) Loss: 2.241832 LR: 0.00002678 [08:29:37] Epoch: 1 Batch: 3444/20099 (17.14%) Loss: 1.945580 LR: 0.00002684 [08:29:41] Epoch: 1 Batch: 3445/20099 (17.14%) Loss: 2.155689 LR: 0.00002684 [08:29:44] Epoch: 1 Batch: 3446/20099 (17.15%) Loss: 2.011464 LR: 0.00002684 [08:29:47] Epoch: 1 Batch: 3447/20099 (17.15%) Loss: 2.031164 LR: 0.00002684 [08:29:50] Epoch: 1 Batch: 3448/20099 (17.16%) Loss: 2.129925 LR: 0.00002684 [08:29:53] Epoch: 1 Batch: 3449/20099 (17.16%) Loss: 2.334049 LR: 0.00002684 [08:29:56] Epoch: 1 Batch: 3450/20099 (17.17%) Loss: 2.164114 LR: 0.00002684 [08:29:59] Epoch: 1 Batch: 3451/20099 (17.17%) Loss: 2.384614 LR: 0.00002689 [08:30:02] Epoch: 1 Batch: 3452/20099 (17.17%) Loss: 2.267121 LR: 0.00002689 [08:30:05] Epoch: 1 Batch: 3453/20099 (17.18%) Loss: 2.200549 LR: 0.00002689 [08:30:08] Epoch: 1 Batch: 3454/20099 (17.18%) Loss: 2.282138 LR: 0.00002689 [08:30:12] Epoch: 1 Batch: 3455/20099 (17.19%) Loss: 2.204434 
LR: 0.00002689 [08:30:15] Epoch: 1 Batch: 3456/20099 (17.19%) Loss: 2.276602 LR: 0.00002689 [08:30:18] Epoch: 1 Batch: 3457/20099 (17.20%) Loss: 2.142602 LR: 0.00002689 [08:30:21] Epoch: 1 Batch: 3458/20099 (17.20%) Loss: 2.232658 LR: 0.00002695 [08:30:24] Epoch: 1 Batch: 3459/20099 (17.21%) Loss: 2.103098 LR: 0.00002695 [08:30:27] Epoch: 1 Batch: 3460/20099 (17.21%) Loss: 2.373849 LR: 0.00002695 [08:30:30] Epoch: 1 Batch: 3461/20099 (17.22%) Loss: 2.300773 LR: 0.00002695 [08:30:33] Epoch: 1 Batch: 3462/20099 (17.22%) Loss: 2.227243 LR: 0.00002695 [08:30:36] Epoch: 1 Batch: 3463/20099 (17.23%) Loss: 2.085287 LR: 0.00002695 [08:30:40] Epoch: 1 Batch: 3464/20099 (17.23%) Loss: 2.427595 LR: 0.00002695 [08:30:43] Epoch: 1 Batch: 3465/20099 (17.24%) Loss: 2.321514 LR: 0.00002700 [08:30:46] Epoch: 1 Batch: 3466/20099 (17.24%) Loss: 2.048918 LR: 0.00002700 [08:30:49] Epoch: 1 Batch: 3467/20099 (17.25%) Loss: 1.975017 LR: 0.00002700 [08:30:52] Epoch: 1 Batch: 3468/20099 (17.25%) Loss: 2.305634 LR: 0.00002700 [08:30:55] Epoch: 1 Batch: 3469/20099 (17.26%) Loss: 1.966069 LR: 0.00002700 [08:30:58] Epoch: 1 Batch: 3470/20099 (17.26%) Loss: 2.245541 LR: 0.00002700 [08:31:01] Epoch: 1 Batch: 3471/20099 (17.27%) Loss: 2.365441 LR: 0.00002700 [08:31:04] Epoch: 1 Batch: 3472/20099 (17.27%) Loss: 2.173981 LR: 0.00002705 [08:31:07] Epoch: 1 Batch: 3473/20099 (17.28%) Loss: 2.285876 LR: 0.00002705 [08:31:11] Epoch: 1 Batch: 3474/20099 (17.28%) Loss: 2.070543 LR: 0.00002705 [08:31:14] Epoch: 1 Batch: 3475/20099 (17.29%) Loss: 2.197250 LR: 0.00002705 [08:31:17] Epoch: 1 Batch: 3476/20099 (17.29%) Loss: 2.447327 LR: 0.00002705 [08:31:20] Epoch: 1 Batch: 3477/20099 (17.30%) Loss: 2.422841 LR: 0.00002705 [08:31:23] Epoch: 1 Batch: 3478/20099 (17.30%) Loss: 2.286703 LR: 0.00002705 [08:31:26] Epoch: 1 Batch: 3479/20099 (17.31%) Loss: 2.126195 LR: 0.00002711 [08:31:29] Epoch: 1 Batch: 3480/20099 (17.31%) Loss: 2.198842 LR: 0.00002711 [08:31:32] Epoch: 1 Batch: 3481/20099 (17.32%) Loss: 
2.157564 LR: 0.00002711 [08:31:35] Epoch: 1 Batch: 3482/20099 (17.32%) Loss: 2.286287 LR: 0.00002711 [08:31:38] Epoch: 1 Batch: 3483/20099 (17.33%) Loss: 2.130831 LR: 0.00002711 [08:31:42] Epoch: 1 Batch: 3484/20099 (17.33%) Loss: 2.437206 LR: 0.00002711 [08:31:45] Epoch: 1 Batch: 3485/20099 (17.34%) Loss: 2.132667 LR: 0.00002711 [08:31:48] Epoch: 1 Batch: 3486/20099 (17.34%) Loss: 1.964328 LR: 0.00002716 [08:31:51] Epoch: 1 Batch: 3487/20099 (17.35%) Loss: 2.415962 LR: 0.00002716 [08:31:54] Epoch: 1 Batch: 3488/20099 (17.35%) Loss: 2.293418 LR: 0.00002716 [08:31:57] Epoch: 1 Batch: 3489/20099 (17.36%) Loss: 2.191009 LR: 0.00002716 [08:32:00] Epoch: 1 Batch: 3490/20099 (17.36%) Loss: 2.340241 LR: 0.00002716 [08:32:03] Epoch: 1 Batch: 3491/20099 (17.37%) Loss: 1.755109 LR: 0.00002716 [08:32:06] Epoch: 1 Batch: 3492/20099 (17.37%) Loss: 2.239558 LR: 0.00002716 [08:32:09] Epoch: 1 Batch: 3493/20099 (17.38%) Loss: 2.075412 LR: 0.00002722 [08:32:13] Epoch: 1 Batch: 3494/20099 (17.38%) Loss: 2.220302 LR: 0.00002722 [08:32:16] Epoch: 1 Batch: 3495/20099 (17.39%) Loss: 2.169812 LR: 0.00002722 [08:32:19] Epoch: 1 Batch: 3496/20099 (17.39%) Loss: 2.315007 LR: 0.00002722 [08:32:22] Epoch: 1 Batch: 3497/20099 (17.40%) Loss: 2.352706 LR: 0.00002722 [08:32:25] Epoch: 1 Batch: 3498/20099 (17.40%) Loss: 2.461675 LR: 0.00002722 [08:32:28] Epoch: 1 Batch: 3499/20099 (17.41%) Loss: 2.138924 LR: 0.00002722 [08:32:31] >> Evaluating batch 0 [08:32:32] >> Evaluating batch 1 [08:32:34] >> Evaluating batch 2 [08:32:35] >> Evaluating batch 3 [08:32:36] >> Evaluating batch 4 [08:32:38] >> Evaluating batch 5 [08:32:39] >> Evaluating batch 6 [08:32:40] >> Evaluating batch 7 [08:32:41] >> Evaluating batch 8 [08:32:43] >> Evaluating batch 9 [08:32:44] >> Evaluating batch 10 [08:32:45] >> Evaluating batch 11 [08:32:46] >> Evaluating batch 12 [08:32:47] >> Evaluating batch 13 [08:32:49] >> Evaluating batch 14 [08:32:50] >> Evaluating batch 15 [08:32:51] >> Evaluating batch 16 [08:32:52] Epoch: 1 
Step: 3500/20099 Evaluation: [08:32:52] Avg Loss Since Last Eval: 2.2110 Val Loss: 2.2777 Validation loss delta: -0.0102 Perplexity: 9.7542 LR: 0.00002727 [08:32:55] >> Checkpoint saved: epoch1_step3500, size: 0.1693 GB [08:32:55] Epoch: 1 Batch: 3500/20099 (17.41%) Loss: 1.961526 LR: 0.00002727 [08:32:58] Epoch: 1 Batch: 3501/20099 (17.42%) Loss: 2.266328 LR: 0.00002727 [08:33:01] Epoch: 1 Batch: 3502/20099 (17.42%) Loss: 2.196041 LR: 0.00002727 [08:33:04] Epoch: 1 Batch: 3503/20099 (17.43%) Loss: 2.096919 LR: 0.00002727 [08:33:07] Epoch: 1 Batch: 3504/20099 (17.43%) Loss: 2.426582 LR: 0.00002727 [08:33:11] Epoch: 1 Batch: 3505/20099 (17.44%) Loss: 2.102571 LR: 0.00002727 [08:33:14] Epoch: 1 Batch: 3506/20099 (17.44%) Loss: 2.533707 LR: 0.00002727 [08:33:17] Epoch: 1 Batch: 3507/20099 (17.45%) Loss: 2.205536 LR: 0.00002733 [08:33:20] Epoch: 1 Batch: 3508/20099 (17.45%) Loss: 2.166482 LR: 0.00002733 [08:33:23] Epoch: 1 Batch: 3509/20099 (17.46%) Loss: 2.032693 LR: 0.00002733 [08:33:26] Epoch: 1 Batch: 3510/20099 (17.46%) Loss: 2.225319 LR: 0.00002733 [08:33:29] Epoch: 1 Batch: 3511/20099 (17.47%) Loss: 2.155451 LR: 0.00002733 [08:33:32] Epoch: 1 Batch: 3512/20099 (17.47%) Loss: 2.523999 LR: 0.00002733 [08:33:35] Epoch: 1 Batch: 3513/20099 (17.48%) Loss: 2.510327 LR: 0.00002733 [08:33:39] Epoch: 1 Batch: 3514/20099 (17.48%) Loss: 2.168895 LR: 0.00002738 [08:33:42] Epoch: 1 Batch: 3515/20099 (17.49%) Loss: 2.217379 LR: 0.00002738 [08:33:45] Epoch: 1 Batch: 3516/20099 (17.49%) Loss: 2.223370 LR: 0.00002738 [08:33:48] Epoch: 1 Batch: 3517/20099 (17.50%) Loss: 2.411501 LR: 0.00002738 [08:33:51] Epoch: 1 Batch: 3518/20099 (17.50%) Loss: 2.277023 LR: 0.00002738 [08:33:54] Epoch: 1 Batch: 3519/20099 (17.51%) Loss: 2.253070 LR: 0.00002738 [08:33:57] Epoch: 1 Batch: 3520/20099 (17.51%) Loss: 2.365365 LR: 0.00002738 [08:34:00] Epoch: 1 Batch: 3521/20099 (17.52%) Loss: 2.062301 LR: 0.00002744 [08:34:03] Epoch: 1 Batch: 3522/20099 (17.52%) Loss: 2.325712 LR: 0.00002744 
[08:34:06] Epoch: 1 Batch: 3523/20099 (17.53%) Loss: 2.194599 LR: 0.00002744 [08:34:09] Epoch: 1 Batch: 3524/20099 (17.53%) Loss: 2.195588 LR: 0.00002744 [08:34:13] Epoch: 1 Batch: 3525/20099 (17.54%) Loss: 2.122439 LR: 0.00002744 [08:34:16] Epoch: 1 Batch: 3526/20099 (17.54%) Loss: 1.791992 LR: 0.00002744 [08:34:19] Epoch: 1 Batch: 3527/20099 (17.55%) Loss: 2.177729 LR: 0.00002744 [08:34:22] Epoch: 1 Batch: 3528/20099 (17.55%) Loss: 2.482922 LR: 0.00002749 [08:34:25] Epoch: 1 Batch: 3529/20099 (17.56%) Loss: 1.960946 LR: 0.00002749 [08:34:28] Epoch: 1 Batch: 3530/20099 (17.56%) Loss: 2.314354 LR: 0.00002749 [08:34:31] Epoch: 1 Batch: 3531/20099 (17.57%) Loss: 2.328574 LR: 0.00002749 [08:34:34] Epoch: 1 Batch: 3532/20099 (17.57%) Loss: 2.135657 LR: 0.00002749 [08:34:37] Epoch: 1 Batch: 3533/20099 (17.58%) Loss: 2.472409 LR: 0.00002749 [08:34:40] Epoch: 1 Batch: 3534/20099 (17.58%) Loss: 2.359417 LR: 0.00002749 [08:34:43] Epoch: 1 Batch: 3535/20099 (17.59%) Loss: 2.323080 LR: 0.00002755 [08:34:46] Epoch: 1 Batch: 3536/20099 (17.59%) Loss: 2.200904 LR: 0.00002755 [08:34:50] Epoch: 1 Batch: 3537/20099 (17.60%) Loss: 2.231810 LR: 0.00002755 [08:34:53] Epoch: 1 Batch: 3538/20099 (17.60%) Loss: 2.079761 LR: 0.00002755 [08:34:56] Epoch: 1 Batch: 3539/20099 (17.61%) Loss: 2.232421 LR: 0.00002755 [08:34:59] Epoch: 1 Batch: 3540/20099 (17.61%) Loss: 2.176439 LR: 0.00002755 [08:35:02] Epoch: 1 Batch: 3541/20099 (17.62%) Loss: 2.377147 LR: 0.00002755 [08:35:05] Epoch: 1 Batch: 3542/20099 (17.62%) Loss: 2.218146 LR: 0.00002760 [08:35:08] Epoch: 1 Batch: 3543/20099 (17.63%) Loss: 2.210729 LR: 0.00002760 [08:35:11] Epoch: 1 Batch: 3544/20099 (17.63%) Loss: 2.145888 LR: 0.00002760 [08:35:14] Epoch: 1 Batch: 3545/20099 (17.64%) Loss: 2.117176 LR: 0.00002760 [08:35:17] Epoch: 1 Batch: 3546/20099 (17.64%) Loss: 2.465259 LR: 0.00002760 [08:35:21] Epoch: 1 Batch: 3547/20099 (17.65%) Loss: 2.492796 LR: 0.00002760 [08:35:24] Epoch: 1 Batch: 3548/20099 (17.65%) Loss: 2.256935 LR: 
0.00002760 [08:35:27] Epoch: 1 Batch: 3549/20099 (17.66%) Loss: 2.043155 LR: 0.00002765 [08:35:30] Epoch: 1 Batch: 3550/20099 (17.66%) Loss: 2.035642 LR: 0.00002765 [08:35:33] Epoch: 1 Batch: 3551/20099 (17.67%) Loss: 2.090890 LR: 0.00002765 [08:35:36] Epoch: 1 Batch: 3552/20099 (17.67%) Loss: 2.321067 LR: 0.00002765 [08:35:39] Epoch: 1 Batch: 3553/20099 (17.68%) Loss: 2.322880 LR: 0.00002765 [08:35:42] Epoch: 1 Batch: 3554/20099 (17.68%) Loss: 2.140146 LR: 0.00002765 [08:35:45] Epoch: 1 Batch: 3555/20099 (17.69%) Loss: 2.107869 LR: 0.00002765 [08:35:48] Epoch: 1 Batch: 3556/20099 (17.69%) Loss: 2.110251 LR: 0.00002771 [08:35:52] Epoch: 1 Batch: 3557/20099 (17.70%) Loss: 2.161535 LR: 0.00002771 [08:35:55] Epoch: 1 Batch: 3558/20099 (17.70%) Loss: 2.504559 LR: 0.00002771 [08:35:58] Epoch: 1 Batch: 3559/20099 (17.71%) Loss: 2.041262 LR: 0.00002771 [08:36:01] Epoch: 1 Batch: 3560/20099 (17.71%) Loss: 2.214695 LR: 0.00002771 [08:36:04] Epoch: 1 Batch: 3561/20099 (17.72%) Loss: 2.279027 LR: 0.00002771 [08:36:07] Epoch: 1 Batch: 3562/20099 (17.72%) Loss: 2.267825 LR: 0.00002771 [08:36:10] Epoch: 1 Batch: 3563/20099 (17.73%) Loss: 2.147718 LR: 0.00002776 [08:36:13] Epoch: 1 Batch: 3564/20099 (17.73%) Loss: 2.181548 LR: 0.00002776 [08:36:16] Epoch: 1 Batch: 3565/20099 (17.74%) Loss: 2.372869 LR: 0.00002776 [08:36:19] Epoch: 1 Batch: 3566/20099 (17.74%) Loss: 2.099928 LR: 0.00002776 [08:36:23] Epoch: 1 Batch: 3567/20099 (17.75%) Loss: 2.453443 LR: 0.00002776 [08:36:26] Epoch: 1 Batch: 3568/20099 (17.75%) Loss: 2.126630 LR: 0.00002776 [08:36:29] Epoch: 1 Batch: 3569/20099 (17.76%) Loss: 2.460560 LR: 0.00002776 [08:36:32] Epoch: 1 Batch: 3570/20099 (17.76%) Loss: 2.082667 LR: 0.00002782 [08:36:35] Epoch: 1 Batch: 3571/20099 (17.77%) Loss: 2.111930 LR: 0.00002782 [08:36:38] Epoch: 1 Batch: 3572/20099 (17.77%) Loss: 2.264828 LR: 0.00002782 [08:36:41] Epoch: 1 Batch: 3573/20099 (17.78%) Loss: 1.947830 LR: 0.00002782 [08:36:44] Epoch: 1 Batch: 3574/20099 (17.78%) Loss: 1.985795 
LR: 0.00002782 [08:36:47] Epoch: 1 Batch: 3575/20099 (17.79%) Loss: 2.176628 LR: 0.00002782 [08:36:51] Epoch: 1 Batch: 3576/20099 (17.79%) Loss: 2.327776 LR: 0.00002782 [08:36:54] Epoch: 1 Batch: 3577/20099 (17.80%) Loss: 2.254191 LR: 0.00002787 [08:36:57] Epoch: 1 Batch: 3578/20099 (17.80%) Loss: 2.335636 LR: 0.00002787 [08:37:00] Epoch: 1 Batch: 3579/20099 (17.81%) Loss: 2.275903 LR: 0.00002787 [08:37:03] Epoch: 1 Batch: 3580/20099 (17.81%) Loss: 2.105923 LR: 0.00002787 [08:37:06] Epoch: 1 Batch: 3581/20099 (17.82%) Loss: 2.149241 LR: 0.00002787 [08:37:09] Epoch: 1 Batch: 3582/20099 (17.82%) Loss: 2.136858 LR: 0.00002787 [08:37:12] Epoch: 1 Batch: 3583/20099 (17.83%) Loss: 2.060277 LR: 0.00002787 [08:37:15] Epoch: 1 Batch: 3584/20099 (17.83%) Loss: 2.352275 LR: 0.00002793 [08:37:18] Epoch: 1 Batch: 3585/20099 (17.84%) Loss: 2.103818 LR: 0.00002793 [08:37:22] Epoch: 1 Batch: 3586/20099 (17.84%) Loss: 1.906251 LR: 0.00002793 [08:37:25] Epoch: 1 Batch: 3587/20099 (17.85%) Loss: 1.985630 LR: 0.00002793 [08:37:28] Epoch: 1 Batch: 3588/20099 (17.85%) Loss: 1.891139 LR: 0.00002793 [08:37:31] Epoch: 1 Batch: 3589/20099 (17.86%) Loss: 2.194587 LR: 0.00002793 [08:37:34] Epoch: 1 Batch: 3590/20099 (17.86%) Loss: 2.317896 LR: 0.00002793 [08:37:37] Epoch: 1 Batch: 3591/20099 (17.87%) Loss: 2.317786 LR: 0.00002798 [08:37:40] Epoch: 1 Batch: 3592/20099 (17.87%) Loss: 2.268287 LR: 0.00002798 [08:37:43] Epoch: 1 Batch: 3593/20099 (17.88%) Loss: 2.287639 LR: 0.00002798 [08:37:46] Epoch: 1 Batch: 3594/20099 (17.88%) Loss: 2.286015 LR: 0.00002798 [08:37:49] Epoch: 1 Batch: 3595/20099 (17.89%) Loss: 2.098863 LR: 0.00002798 [08:37:52] Epoch: 1 Batch: 3596/20099 (17.89%) Loss: 2.543015 LR: 0.00002798 [08:37:56] Epoch: 1 Batch: 3597/20099 (17.90%) Loss: 2.137972 LR: 0.00002798 [08:37:59] Epoch: 1 Batch: 3598/20099 (17.90%) Loss: 2.305627 LR: 0.00002804 [08:38:02] Epoch: 1 Batch: 3599/20099 (17.91%) Loss: 1.991279 LR: 0.00002804 [08:38:08] >> Cleaned up old temp checkpoint: 
epoch1_step1600 [08:38:08] >> Temp checkpoint saved: epoch1_step3600, size: 0.1693 GB [08:38:08] Epoch: 1 Batch: 3600/20099 (17.91%) Loss: 2.372720 LR: 0.00002804 [08:38:11] Epoch: 1 Batch: 3601/20099 (17.92%) Loss: 2.200416 LR: 0.00002804 [08:38:14] Epoch: 1 Batch: 3602/20099 (17.92%) Loss: 2.152371 LR: 0.00002804 [08:38:18] Epoch: 1 Batch: 3603/20099 (17.93%) Loss: 2.417607 LR: 0.00002804 [08:38:21] Epoch: 1 Batch: 3604/20099 (17.93%) Loss: 2.007164 LR: 0.00002804 [08:38:24] Epoch: 1 Batch: 3605/20099 (17.94%) Loss: 2.052018 LR: 0.00002809 [08:38:27] Epoch: 1 Batch: 3606/20099 (17.94%) Loss: 2.255626 LR: 0.00002809 [08:38:30] Epoch: 1 Batch: 3607/20099 (17.95%) Loss: 2.078324 LR: 0.00002809 [08:38:33] Epoch: 1 Batch: 3608/20099 (17.95%) Loss: 2.353887 LR: 0.00002809 [08:38:36] Epoch: 1 Batch: 3609/20099 (17.96%) Loss: 2.111553 LR: 0.00002809 [08:38:39] Epoch: 1 Batch: 3610/20099 (17.96%) Loss: 2.155862 LR: 0.00002809 [08:38:42] Epoch: 1 Batch: 3611/20099 (17.97%) Loss: 2.094332 LR: 0.00002809 [08:38:46] Epoch: 1 Batch: 3612/20099 (17.97%) Loss: 2.108253 LR: 0.00002815 [08:38:49] Epoch: 1 Batch: 3613/20099 (17.98%) Loss: 2.145724 LR: 0.00002815 [08:38:52] Epoch: 1 Batch: 3614/20099 (17.98%) Loss: 2.034326 LR: 0.00002815 [08:38:55] Epoch: 1 Batch: 3615/20099 (17.99%) Loss: 2.328555 LR: 0.00002815 [08:38:58] Epoch: 1 Batch: 3616/20099 (17.99%) Loss: 2.187956 LR: 0.00002815 [08:39:01] Epoch: 1 Batch: 3617/20099 (18.00%) Loss: 2.051411 LR: 0.00002815 [08:39:04] Epoch: 1 Batch: 3618/20099 (18.00%) Loss: 2.378642 LR: 0.00002815 [08:39:07] Epoch: 1 Batch: 3619/20099 (18.01%) Loss: 1.882419 LR: 0.00002820 [08:39:10] Epoch: 1 Batch: 3620/20099 (18.01%) Loss: 2.101921 LR: 0.00002820 [08:39:14] Epoch: 1 Batch: 3621/20099 (18.02%) Loss: 2.392888 LR: 0.00002820 [08:39:17] Epoch: 1 Batch: 3622/20099 (18.02%) Loss: 2.501123 LR: 0.00002820 [08:39:20] Epoch: 1 Batch: 3623/20099 (18.03%) Loss: 2.198987 LR: 0.00002820 [08:39:23] Epoch: 1 Batch: 3624/20099 (18.03%) Loss: 2.289235 LR: 
0.00002820 [08:39:26] Epoch: 1 Batch: 3625/20099 (18.04%) Loss: 2.110739 LR: 0.00002820 [08:39:29] Epoch: 1 Batch: 3626/20099 (18.04%) Loss: 2.327968 LR: 0.00002825 [08:39:32] Epoch: 1 Batch: 3627/20099 (18.05%) Loss: 2.459016 LR: 0.00002825 [08:39:35] Epoch: 1 Batch: 3628/20099 (18.05%) Loss: 2.353927 LR: 0.00002825 [08:39:38] Epoch: 1 Batch: 3629/20099 (18.06%) Loss: 2.427294 LR: 0.00002825 [08:39:41] Epoch: 1 Batch: 3630/20099 (18.06%) Loss: 2.247803 LR: 0.00002825 [08:39:44] Epoch: 1 Batch: 3631/20099 (18.07%) Loss: 2.417314 LR: 0.00002825 [08:39:47] Epoch: 1 Batch: 3632/20099 (18.07%) Loss: 2.167754 LR: 0.00002825 [08:39:50] Epoch: 1 Batch: 3633/20099 (18.08%) Loss: 2.014553 LR: 0.00002831 [08:39:54] Epoch: 1 Batch: 3634/20099 (18.08%) Loss: 2.246857 LR: 0.00002831 [08:39:57] Epoch: 1 Batch: 3635/20099 (18.09%) Loss: 2.058805 LR: 0.00002831 [08:40:00] Epoch: 1 Batch: 3636/20099 (18.09%) Loss: 2.035503 LR: 0.00002831 [08:40:03] Epoch: 1 Batch: 3637/20099 (18.10%) Loss: 2.360140 LR: 0.00002831 [08:40:06] Epoch: 1 Batch: 3638/20099 (18.10%) Loss: 2.077742 LR: 0.00002831 [08:40:09] Epoch: 1 Batch: 3639/20099 (18.11%) Loss: 1.983696 LR: 0.00002831 [08:40:12] Epoch: 1 Batch: 3640/20099 (18.11%) Loss: 2.150242 LR: 0.00002836 [08:40:15] Epoch: 1 Batch: 3641/20099 (18.12%) Loss: 2.213603 LR: 0.00002836 [08:40:18] Epoch: 1 Batch: 3642/20099 (18.12%) Loss: 2.275053 LR: 0.00002836 [08:40:21] Epoch: 1 Batch: 3643/20099 (18.13%) Loss: 2.254088 LR: 0.00002836 [08:40:25] Epoch: 1 Batch: 3644/20099 (18.13%) Loss: 2.269475 LR: 0.00002836 [08:40:28] Epoch: 1 Batch: 3645/20099 (18.14%) Loss: 2.385344 LR: 0.00002836 [08:40:31] Epoch: 1 Batch: 3646/20099 (18.14%) Loss: 1.821224 LR: 0.00002836 [08:40:34] Epoch: 1 Batch: 3647/20099 (18.15%) Loss: 2.016310 LR: 0.00002842 [08:40:37] Epoch: 1 Batch: 3648/20099 (18.15%) Loss: 2.147260 LR: 0.00002842 [08:40:40] Epoch: 1 Batch: 3649/20099 (18.16%) Loss: 1.803504 LR: 0.00002842 [08:40:43] Epoch: 1 Batch: 3650/20099 (18.16%) Loss: 2.314947 
LR: 0.00002842 [08:40:46] Epoch: 1 Batch: 3651/20099 (18.17%) Loss: 2.308720 LR: 0.00002842 [08:40:49] Epoch: 1 Batch: 3652/20099 (18.17%) Loss: 2.323877 LR: 0.00002842 [08:40:52] Epoch: 1 Batch: 3653/20099 (18.18%) Loss: 1.987878 LR: 0.00002842 [08:40:56] Epoch: 1 Batch: 3654/20099 (18.18%) Loss: 2.099923 LR: 0.00002847 [08:40:59] Epoch: 1 Batch: 3655/20099 (18.18%) Loss: 2.449993 LR: 0.00002847 [08:41:02] Epoch: 1 Batch: 3656/20099 (18.19%) Loss: 2.368257 LR: 0.00002847 [08:41:05] Epoch: 1 Batch: 3657/20099 (18.19%) Loss: 2.326651 LR: 0.00002847 [08:41:08] Epoch: 1 Batch: 3658/20099 (18.20%) Loss: 2.275696 LR: 0.00002847 [08:41:11] Epoch: 1 Batch: 3659/20099 (18.20%) Loss: 2.118002 LR: 0.00002847 [08:41:14] Epoch: 1 Batch: 3660/20099 (18.21%) Loss: 2.024580 LR: 0.00002847 [08:41:17] Epoch: 1 Batch: 3661/20099 (18.21%) Loss: 2.297718 LR: 0.00002853 [08:41:20] Epoch: 1 Batch: 3662/20099 (18.22%) Loss: 1.941885 LR: 0.00002853 [08:41:23] Epoch: 1 Batch: 3663/20099 (18.22%) Loss: 2.290888 LR: 0.00002853 [08:41:26] Epoch: 1 Batch: 3664/20099 (18.23%) Loss: 2.052295 LR: 0.00002853 [08:41:30] Epoch: 1 Batch: 3665/20099 (18.23%) Loss: 1.931201 LR: 0.00002853 [08:41:33] Epoch: 1 Batch: 3666/20099 (18.24%) Loss: 2.030326 LR: 0.00002853 [08:41:36] Epoch: 1 Batch: 3667/20099 (18.24%) Loss: 2.121090 LR: 0.00002853 [08:41:39] Epoch: 1 Batch: 3668/20099 (18.25%) Loss: 2.183436 LR: 0.00002858 [08:41:42] Epoch: 1 Batch: 3669/20099 (18.25%) Loss: 1.922950 LR: 0.00002858 [08:41:45] Epoch: 1 Batch: 3670/20099 (18.26%) Loss: 1.827330 LR: 0.00002858 [08:41:48] Epoch: 1 Batch: 3671/20099 (18.26%) Loss: 2.095283 LR: 0.00002858 [08:41:51] Epoch: 1 Batch: 3672/20099 (18.27%) Loss: 1.987642 LR: 0.00002858 [08:41:54] Epoch: 1 Batch: 3673/20099 (18.27%) Loss: 1.969862 LR: 0.00002858 [08:41:57] Epoch: 1 Batch: 3674/20099 (18.28%) Loss: 2.192002 LR: 0.00002858 [08:42:00] Epoch: 1 Batch: 3675/20099 (18.28%) Loss: 2.521352 LR: 0.00002864 [08:42:03] Epoch: 1 Batch: 3676/20099 (18.29%) Loss: 
2.245274 LR: 0.00002864 [08:42:07] Epoch: 1 Batch: 3677/20099 (18.29%) Loss: 2.674772 LR: 0.00002864 [08:42:10] Epoch: 1 Batch: 3678/20099 (18.30%) Loss: 1.738716 LR: 0.00002864 [08:42:13] Epoch: 1 Batch: 3679/20099 (18.30%) Loss: 2.361116 LR: 0.00002864 [08:42:16] Epoch: 1 Batch: 3680/20099 (18.31%) Loss: 2.155463 LR: 0.00002864 [08:42:19] Epoch: 1 Batch: 3681/20099 (18.31%) Loss: 2.296482 LR: 0.00002864 [08:42:22] Epoch: 1 Batch: 3682/20099 (18.32%) Loss: 2.320742 LR: 0.00002869 [08:42:25] Epoch: 1 Batch: 3683/20099 (18.32%) Loss: 2.229099 LR: 0.00002869 [08:42:28] Epoch: 1 Batch: 3684/20099 (18.33%) Loss: 2.411473 LR: 0.00002869 [08:42:31] Epoch: 1 Batch: 3685/20099 (18.33%) Loss: 2.094179 LR: 0.00002869 [08:42:34] Epoch: 1 Batch: 3686/20099 (18.34%) Loss: 2.516968 LR: 0.00002869 [08:42:37] Epoch: 1 Batch: 3687/20099 (18.34%) Loss: 2.385446 LR: 0.00002869 [08:42:40] Epoch: 1 Batch: 3688/20099 (18.35%) Loss: 1.980304 LR: 0.00002869 [08:42:44] Epoch: 1 Batch: 3689/20099 (18.35%) Loss: 2.132445 LR: 0.00002875 [08:42:47] Epoch: 1 Batch: 3690/20099 (18.36%) Loss: 2.081540 LR: 0.00002875 [08:42:50] Epoch: 1 Batch: 3691/20099 (18.36%) Loss: 2.219523 LR: 0.00002875 [08:42:53] Epoch: 1 Batch: 3692/20099 (18.37%) Loss: 2.106692 LR: 0.00002875 [08:42:56] Epoch: 1 Batch: 3693/20099 (18.37%) Loss: 2.179696 LR: 0.00002875 [08:42:59] Epoch: 1 Batch: 3694/20099 (18.38%) Loss: 2.006476 LR: 0.00002875 [08:43:02] Epoch: 1 Batch: 3695/20099 (18.38%) Loss: 2.374117 LR: 0.00002875 [08:43:05] Epoch: 1 Batch: 3696/20099 (18.39%) Loss: 2.191715 LR: 0.00002880 [08:43:08] Epoch: 1 Batch: 3697/20099 (18.39%) Loss: 2.280511 LR: 0.00002880 [08:43:11] Epoch: 1 Batch: 3698/20099 (18.40%) Loss: 2.162292 LR: 0.00002880 [08:43:15] Epoch: 1 Batch: 3699/20099 (18.40%) Loss: 2.306903 LR: 0.00002880 [08:43:18] Epoch: 1 Batch: 3700/20099 (18.41%) Loss: 2.083425 LR: 0.00002880 [08:43:21] Epoch: 1 Batch: 3701/20099 (18.41%) Loss: 2.255505 LR: 0.00002880 [08:43:24] Epoch: 1 Batch: 3702/20099 (18.42%) 
Loss: 2.036491 LR: 0.00002880 [08:43:27] Epoch: 1 Batch: 3703/20099 (18.42%) Loss: 1.856236 LR: 0.00002885 [08:43:30] Epoch: 1 Batch: 3704/20099 (18.43%) Loss: 2.274361 LR: 0.00002885 [08:43:33] Epoch: 1 Batch: 3705/20099 (18.43%) Loss: 2.052067 LR: 0.00002885 [08:43:36] Epoch: 1 Batch: 3706/20099 (18.44%) Loss: 2.219908 LR: 0.00002885 [08:43:39] Epoch: 1 Batch: 3707/20099 (18.44%) Loss: 2.150232 LR: 0.00002885 [08:43:42] Epoch: 1 Batch: 3708/20099 (18.45%) Loss: 2.270745 LR: 0.00002885 [08:43:45] Epoch: 1 Batch: 3709/20099 (18.45%) Loss: 2.215064 LR: 0.00002885 [08:43:49] Epoch: 1 Batch: 3710/20099 (18.46%) Loss: 2.150714 LR: 0.00002891 [08:43:52] Epoch: 1 Batch: 3711/20099 (18.46%) Loss: 2.217515 LR: 0.00002891 [08:43:55] Epoch: 1 Batch: 3712/20099 (18.47%) Loss: 2.041106 LR: 0.00002891 [08:43:58] Epoch: 1 Batch: 3713/20099 (18.47%) Loss: 2.185126 LR: 0.00002891 [08:44:01] Epoch: 1 Batch: 3714/20099 (18.48%) Loss: 1.983349 LR: 0.00002891 [08:44:04] Epoch: 1 Batch: 3715/20099 (18.48%) Loss: 2.292759 LR: 0.00002891 [08:44:07] Epoch: 1 Batch: 3716/20099 (18.49%) Loss: 2.557661 LR: 0.00002891 [08:44:10] Epoch: 1 Batch: 3717/20099 (18.49%) Loss: 2.389632 LR: 0.00002896 [08:44:13] Epoch: 1 Batch: 3718/20099 (18.50%) Loss: 2.261550 LR: 0.00002896 [08:44:16] Epoch: 1 Batch: 3719/20099 (18.50%) Loss: 2.357910 LR: 0.00002896 [08:44:19] Epoch: 1 Batch: 3720/20099 (18.51%) Loss: 2.354188 LR: 0.00002896 [08:44:23] Epoch: 1 Batch: 3721/20099 (18.51%) Loss: 2.413951 LR: 0.00002896 [08:44:26] Epoch: 1 Batch: 3722/20099 (18.52%) Loss: 2.348244 LR: 0.00002896 [08:44:29] Epoch: 1 Batch: 3723/20099 (18.52%) Loss: 2.052775 LR: 0.00002896 [08:44:32] Epoch: 1 Batch: 3724/20099 (18.53%) Loss: 2.066980 LR: 0.00002902 [08:44:35] Epoch: 1 Batch: 3725/20099 (18.53%) Loss: 2.157805 LR: 0.00002902 [08:44:38] Epoch: 1 Batch: 3726/20099 (18.54%) Loss: 2.375665 LR: 0.00002902 [08:44:41] Epoch: 1 Batch: 3727/20099 (18.54%) Loss: 2.223972 LR: 0.00002902 [08:44:44] Epoch: 1 Batch: 3728/20099 
(18.55%) Loss: 2.517500 LR: 0.00002902 [08:44:47] Epoch: 1 Batch: 3729/20099 (18.55%) Loss: 1.913212 LR: 0.00002902 [08:44:50] Epoch: 1 Batch: 3730/20099 (18.56%) Loss: 2.251338 LR: 0.00002902 [08:44:54] Epoch: 1 Batch: 3731/20099 (18.56%) Loss: 2.292226 LR: 0.00002907 [08:44:57] Epoch: 1 Batch: 3732/20099 (18.57%) Loss: 2.232478 LR: 0.00002907 [08:45:00] Epoch: 1 Batch: 3733/20099 (18.57%) Loss: 2.127770 LR: 0.00002907 [08:45:03] Epoch: 1 Batch: 3734/20099 (18.58%) Loss: 1.975544 LR: 0.00002907 [08:45:06] Epoch: 1 Batch: 3735/20099 (18.58%) Loss: 2.209234 LR: 0.00002907 [08:45:09] Epoch: 1 Batch: 3736/20099 (18.59%) Loss: 2.556951 LR: 0.00002907 [08:45:12] Epoch: 1 Batch: 3737/20099 (18.59%) Loss: 2.449658 LR: 0.00002907 [08:45:15] Epoch: 1 Batch: 3738/20099 (18.60%) Loss: 2.121218 LR: 0.00002913 [08:45:18] Epoch: 1 Batch: 3739/20099 (18.60%) Loss: 2.174524 LR: 0.00002913 [08:45:21] Epoch: 1 Batch: 3740/20099 (18.61%) Loss: 2.473674 LR: 0.00002913 [08:45:25] Epoch: 1 Batch: 3741/20099 (18.61%) Loss: 1.983352 LR: 0.00002913 [08:45:28] Epoch: 1 Batch: 3742/20099 (18.62%) Loss: 2.176258 LR: 0.00002913 [08:45:31] Epoch: 1 Batch: 3743/20099 (18.62%) Loss: 2.259927 LR: 0.00002913 [08:45:34] Epoch: 1 Batch: 3744/20099 (18.63%) Loss: 2.249662 LR: 0.00002913 [08:45:37] Epoch: 1 Batch: 3745/20099 (18.63%) Loss: 2.053046 LR: 0.00002918 [08:45:40] Epoch: 1 Batch: 3746/20099 (18.64%) Loss: 2.430856 LR: 0.00002918 [08:45:43] Epoch: 1 Batch: 3747/20099 (18.64%) Loss: 2.135508 LR: 0.00002918 [08:45:46] Epoch: 1 Batch: 3748/20099 (18.65%) Loss: 2.294399 LR: 0.00002918 [08:45:49] Epoch: 1 Batch: 3749/20099 (18.65%) Loss: 2.292766 LR: 0.00002918 [08:45:52] Epoch: 1 Batch: 3750/20099 (18.66%) Loss: 2.424328 LR: 0.00002918 [08:45:56] Epoch: 1 Batch: 3751/20099 (18.66%) Loss: 2.243640 LR: 0.00002918 [08:45:59] Epoch: 1 Batch: 3752/20099 (18.67%) Loss: 2.251503 LR: 0.00002924 [08:46:02] Epoch: 1 Batch: 3753/20099 (18.67%) Loss: 2.150180 LR: 0.00002924 [08:46:05] Epoch: 1 Batch: 
3754/20099 (18.68%) Loss: 2.662363 LR: 0.00002924 [08:46:08] Epoch: 1 Batch: 3755/20099 (18.68%) Loss: 1.748996 LR: 0.00002924 [08:46:11] Epoch: 1 Batch: 3756/20099 (18.69%) Loss: 2.200341 LR: 0.00002924 [08:46:14] Epoch: 1 Batch: 3757/20099 (18.69%) Loss: 2.118017 LR: 0.00002924 [08:46:17] Epoch: 1 Batch: 3758/20099 (18.70%) Loss: 2.150166 LR: 0.00002924 [08:46:20] Epoch: 1 Batch: 3759/20099 (18.70%) Loss: 2.028834 LR: 0.00002929 [08:46:23] Epoch: 1 Batch: 3760/20099 (18.71%) Loss: 1.854575 LR: 0.00002929 [08:46:26] Epoch: 1 Batch: 3761/20099 (18.71%) Loss: 2.464490 LR: 0.00002929 [08:46:30] Epoch: 1 Batch: 3762/20099 (18.72%) Loss: 2.352511 LR: 0.00002929 [08:46:33] Epoch: 1 Batch: 3763/20099 (18.72%) Loss: 2.200231 LR: 0.00002929 [08:46:36] Epoch: 1 Batch: 3764/20099 (18.73%) Loss: 2.044948 LR: 0.00002929 [08:46:39] Epoch: 1 Batch: 3765/20099 (18.73%) Loss: 2.441801 LR: 0.00002929 [08:46:42] Epoch: 1 Batch: 3766/20099 (18.74%) Loss: 2.169284 LR: 0.00002935 [08:46:45] Epoch: 1 Batch: 3767/20099 (18.74%) Loss: 2.479515 LR: 0.00002935 [08:46:48] Epoch: 1 Batch: 3768/20099 (18.75%) Loss: 2.173170 LR: 0.00002935 [08:46:51] Epoch: 1 Batch: 3769/20099 (18.75%) Loss: 2.112743 LR: 0.00002935 [08:46:54] Epoch: 1 Batch: 3770/20099 (18.76%) Loss: 2.054328 LR: 0.00002935 [08:46:57] Epoch: 1 Batch: 3771/20099 (18.76%) Loss: 2.099656 LR: 0.00002935 [08:47:01] Epoch: 1 Batch: 3772/20099 (18.77%) Loss: 1.925619 LR: 0.00002935 [08:47:04] Epoch: 1 Batch: 3773/20099 (18.77%) Loss: 2.109031 LR: 0.00002940 [08:47:07] Epoch: 1 Batch: 3774/20099 (18.78%) Loss: 1.979051 LR: 0.00002940 [08:47:10] Epoch: 1 Batch: 3775/20099 (18.78%) Loss: 2.282022 LR: 0.00002940 [08:47:13] Epoch: 1 Batch: 3776/20099 (18.79%) Loss: 2.165964 LR: 0.00002940 [08:47:16] Epoch: 1 Batch: 3777/20099 (18.79%) Loss: 2.226140 LR: 0.00002940 [08:47:19] Epoch: 1 Batch: 3778/20099 (18.80%) Loss: 2.232293 LR: 0.00002940 [08:47:22] Epoch: 1 Batch: 3779/20099 (18.80%) Loss: 2.193266 LR: 0.00002940 [08:47:25] Epoch: 1 
Batch: 3780/20099 (18.81%) Loss: 2.190737 LR: 0.00002945 [08:47:28] Epoch: 1 Batch: 3781/20099 (18.81%) Loss: 2.046825 LR: 0.00002945 [08:47:31] Epoch: 1 Batch: 3782/20099 (18.82%) Loss: 2.502385 LR: 0.00002945 [08:47:35] Epoch: 1 Batch: 3783/20099 (18.82%) Loss: 2.216633 LR: 0.00002945 [08:47:38] Epoch: 1 Batch: 3784/20099 (18.83%) Loss: 2.179853 LR: 0.00002945 [08:47:41] Epoch: 1 Batch: 3785/20099 (18.83%) Loss: 2.507049 LR: 0.00002945 [08:47:44] Epoch: 1 Batch: 3786/20099 (18.84%) Loss: 2.297454 LR: 0.00002945 [08:47:47] Epoch: 1 Batch: 3787/20099 (18.84%) Loss: 2.421426 LR: 0.00002951 [08:47:50] Epoch: 1 Batch: 3788/20099 (18.85%) Loss: 1.943087 LR: 0.00002951 [08:47:53] Epoch: 1 Batch: 3789/20099 (18.85%) Loss: 2.537128 LR: 0.00002951 [08:47:56] Epoch: 1 Batch: 3790/20099 (18.86%) Loss: 2.257926 LR: 0.00002951 [08:47:59] Epoch: 1 Batch: 3791/20099 (18.86%) Loss: 2.148468 LR: 0.00002951 [08:48:02] Epoch: 1 Batch: 3792/20099 (18.87%) Loss: 2.151384 LR: 0.00002951 [08:48:05] Epoch: 1 Batch: 3793/20099 (18.87%) Loss: 2.077953 LR: 0.00002951 [08:48:09] Epoch: 1 Batch: 3794/20099 (18.88%) Loss: 2.009767 LR: 0.00002956 [08:48:12] Epoch: 1 Batch: 3795/20099 (18.88%) Loss: 2.151670 LR: 0.00002956 [08:48:15] Epoch: 1 Batch: 3796/20099 (18.89%) Loss: 2.232823 LR: 0.00002956 [08:48:18] Epoch: 1 Batch: 3797/20099 (18.89%) Loss: 2.090296 LR: 0.00002956 [08:48:21] Epoch: 1 Batch: 3798/20099 (18.90%) Loss: 2.334928 LR: 0.00002956 [08:48:24] Epoch: 1 Batch: 3799/20099 (18.90%) Loss: 2.277787 LR: 0.00002956 [08:48:31] >> Cleaned up old temp checkpoint: epoch1_step1800 [08:48:31] >> Temp checkpoint saved: epoch1_step3800, size: 0.1693 GB [08:48:31] Epoch: 1 Batch: 3800/20099 (18.91%) Loss: 2.539415 LR: 0.00002956 [08:48:34] Epoch: 1 Batch: 3801/20099 (18.91%) Loss: 1.990244 LR: 0.00002962 [08:48:37] Epoch: 1 Batch: 3802/20099 (18.92%) Loss: 2.327858 LR: 0.00002962 [08:48:40] Epoch: 1 Batch: 3803/20099 (18.92%) Loss: 2.119348 LR: 0.00002962 [08:48:43] Epoch: 1 Batch: 3804/20099 
(18.93%) Loss: 2.100741 LR: 0.00002962 [08:48:46] Epoch: 1 Batch: 3805/20099 (18.93%) Loss: 2.375851 LR: 0.00002962 [08:48:49] Epoch: 1 Batch: 3806/20099 (18.94%) Loss: 2.230961 LR: 0.00002962 [08:48:52] Epoch: 1 Batch: 3807/20099 (18.94%) Loss: 2.093738 LR: 0.00002962 [08:48:55] Epoch: 1 Batch: 3808/20099 (18.95%) Loss: 1.882820 LR: 0.00002967 [08:48:58] Epoch: 1 Batch: 3809/20099 (18.95%) Loss: 2.269152 LR: 0.00002967 [08:49:02] Epoch: 1 Batch: 3810/20099 (18.96%) Loss: 2.537046 LR: 0.00002967 [08:49:05] Epoch: 1 Batch: 3811/20099 (18.96%) Loss: 2.094864 LR: 0.00002967 [08:49:08] Epoch: 1 Batch: 3812/20099 (18.97%) Loss: 2.309602 LR: 0.00002967 [08:49:11] Epoch: 1 Batch: 3813/20099 (18.97%) Loss: 2.361408 LR: 0.00002967 [08:49:14] Epoch: 1 Batch: 3814/20099 (18.98%) Loss: 2.239440 LR: 0.00002967 [08:49:17] Epoch: 1 Batch: 3815/20099 (18.98%) Loss: 2.462315 LR: 0.00002973 [08:49:20] Epoch: 1 Batch: 3816/20099 (18.99%) Loss: 2.221870 LR: 0.00002973 [08:49:23] Epoch: 1 Batch: 3817/20099 (18.99%) Loss: 2.282412 LR: 0.00002973 [08:49:27] Epoch: 1 Batch: 3818/20099 (19.00%) Loss: 2.381946 LR: 0.00002973 [08:49:30] Epoch: 1 Batch: 3819/20099 (19.00%) Loss: 2.620139 LR: 0.00002973 [08:49:33] Epoch: 1 Batch: 3820/20099 (19.01%) Loss: 2.212627 LR: 0.00002973 [08:49:36] Epoch: 1 Batch: 3821/20099 (19.01%) Loss: 2.237298 LR: 0.00002973 [08:49:39] Epoch: 1 Batch: 3822/20099 (19.02%) Loss: 2.162826 LR: 0.00002978 [08:49:42] Epoch: 1 Batch: 3823/20099 (19.02%) Loss: 2.199271 LR: 0.00002978 [08:49:45] Epoch: 1 Batch: 3824/20099 (19.03%) Loss: 1.933123 LR: 0.00002978 [08:49:48] Epoch: 1 Batch: 3825/20099 (19.03%) Loss: 2.414501 LR: 0.00002978 [08:49:51] Epoch: 1 Batch: 3826/20099 (19.04%) Loss: 2.186663 LR: 0.00002978 [08:49:54] Epoch: 1 Batch: 3827/20099 (19.04%) Loss: 2.290898 LR: 0.00002978 [08:49:57] Epoch: 1 Batch: 3828/20099 (19.05%) Loss: 2.356161 LR: 0.00002978 [08:50:00] Epoch: 1 Batch: 3829/20099 (19.05%) Loss: 2.225232 LR: 0.00002984 [08:50:04] Epoch: 1 Batch: 
3830/20099 (19.06%) Loss: 2.139328 LR: 0.00002984 [08:50:07] Epoch: 1 Batch: 3831/20099 (19.06%) Loss: 2.191448 LR: 0.00002984 [08:50:10] Epoch: 1 Batch: 3832/20099 (19.07%) Loss: 2.520070 LR: 0.00002984 [08:50:13] Epoch: 1 Batch: 3833/20099 (19.07%) Loss: 2.179572 LR: 0.00002984 [08:50:16] Epoch: 1 Batch: 3834/20099 (19.08%) Loss: 2.178042 LR: 0.00002984 [08:50:19] Epoch: 1 Batch: 3835/20099 (19.08%) Loss: 2.235148 LR: 0.00002984 [08:50:22] Epoch: 1 Batch: 3836/20099 (19.09%) Loss: 2.144053 LR: 0.00002989 [08:50:25] Epoch: 1 Batch: 3837/20099 (19.09%) Loss: 1.648186 LR: 0.00002989 [08:50:28] Epoch: 1 Batch: 3838/20099 (19.10%) Loss: 2.362682 LR: 0.00002989 [08:50:31] Epoch: 1 Batch: 3839/20099 (19.10%) Loss: 2.243945 LR: 0.00002989 [08:50:35] Epoch: 1 Batch: 3840/20099 (19.11%) Loss: 1.998917 LR: 0.00002989 [08:50:38] Epoch: 1 Batch: 3841/20099 (19.11%) Loss: 2.216210 LR: 0.00002989 [08:50:41] Epoch: 1 Batch: 3842/20099 (19.12%) Loss: 2.220915 LR: 0.00002989 [08:50:44] Epoch: 1 Batch: 3843/20099 (19.12%) Loss: 1.960943 LR: 0.00002995 [08:50:47] Epoch: 1 Batch: 3844/20099 (19.13%) Loss: 2.140455 LR: 0.00002995 [08:50:50] Epoch: 1 Batch: 3845/20099 (19.13%) Loss: 2.120445 LR: 0.00002995 [08:50:53] Epoch: 1 Batch: 3846/20099 (19.14%) Loss: 2.359638 LR: 0.00002995 [08:50:56] Epoch: 1 Batch: 3847/20099 (19.14%) Loss: 2.635872 LR: 0.00002995 [08:50:59] Epoch: 1 Batch: 3848/20099 (19.15%) Loss: 2.234388 LR: 0.00002995 [08:51:02] Epoch: 1 Batch: 3849/20099 (19.15%) Loss: 1.966524 LR: 0.00002995 [08:51:05] Epoch: 1 Batch: 3850/20099 (19.16%) Loss: 2.364397 LR: 0.00003000 [08:51:09] Epoch: 1 Batch: 3851/20099 (19.16%) Loss: 2.200590 LR: 0.00003000 [08:51:12] Epoch: 1 Batch: 3852/20099 (19.17%) Loss: 1.904665 LR: 0.00003000 [08:51:15] Epoch: 1 Batch: 3853/20099 (19.17%) Loss: 1.898541 LR: 0.00003000 [08:51:18] Epoch: 1 Batch: 3854/20099 (19.18%) Loss: 2.095768 LR: 0.00003000 [08:51:21] Epoch: 1 Batch: 3855/20099 (19.18%) Loss: 2.233748 LR: 0.00003000 [08:51:24] Epoch: 1 
Batch: 3856/20099 (19.19%) Loss: 2.043198 LR: 0.00003000 [08:51:27] Epoch: 1 Batch: 3857/20099 (19.19%) Loss: 2.194845 LR: 0.00003000 [08:51:30] Epoch: 1 Batch: 3858/20099 (19.19%) Loss: 2.262131 LR: 0.00003000 [08:51:33] Epoch: 1 Batch: 3859/20099 (19.20%) Loss: 2.312472 LR: 0.00003000 [08:51:36] Epoch: 1 Batch: 3860/20099 (19.20%) Loss: 2.156317 LR: 0.00003000 [08:51:40] Epoch: 1 Batch: 3861/20099 (19.21%) Loss: 1.864562 LR: 0.00003000 [08:51:43] Epoch: 1 Batch: 3862/20099 (19.21%) Loss: 2.658740 LR: 0.00003000 [08:51:46] Epoch: 1 Batch: 3863/20099 (19.22%) Loss: 2.308361 LR: 0.00003000 [08:51:49] Epoch: 1 Batch: 3864/20099 (19.22%) Loss: 1.825405 LR: 0.00003000 [08:51:52] Epoch: 1 Batch: 3865/20099 (19.23%) Loss: 2.038861 LR: 0.00003000 [08:51:55] Epoch: 1 Batch: 3866/20099 (19.23%) Loss: 2.258087 LR: 0.00003000 [08:51:58] Epoch: 1 Batch: 3867/20099 (19.24%) Loss: 2.416586 LR: 0.00003000 [08:52:01] Epoch: 1 Batch: 3868/20099 (19.24%) Loss: 2.051997 LR: 0.00003000 [08:52:04] Epoch: 1 Batch: 3869/20099 (19.25%) Loss: 2.440117 LR: 0.00003000 [08:52:07] Epoch: 1 Batch: 3870/20099 (19.25%) Loss: 2.371255 LR: 0.00003000 [08:52:10] Epoch: 1 Batch: 3871/20099 (19.26%) Loss: 2.437447 LR: 0.00003000 [08:52:14] Epoch: 1 Batch: 3872/20099 (19.26%) Loss: 2.179188 LR: 0.00003000 [08:52:17] Epoch: 1 Batch: 3873/20099 (19.27%) Loss: 2.308589 LR: 0.00003000 [08:52:20] Epoch: 1 Batch: 3874/20099 (19.27%) Loss: 2.440221 LR: 0.00003000 [08:52:23] Epoch: 1 Batch: 3875/20099 (19.28%) Loss: 2.405983 LR: 0.00003000 [08:52:26] Epoch: 1 Batch: 3876/20099 (19.28%) Loss: 2.218790 LR: 0.00003000 [08:52:29] Epoch: 1 Batch: 3877/20099 (19.29%) Loss: 2.439365 LR: 0.00003000 [08:52:32] Epoch: 1 Batch: 3878/20099 (19.29%) Loss: 2.406316 LR: 0.00003000 [08:52:35] Epoch: 1 Batch: 3879/20099 (19.30%) Loss: 2.026322 LR: 0.00003000 [08:52:38] Epoch: 1 Batch: 3880/20099 (19.30%) Loss: 2.109701 LR: 0.00003000 [08:52:41] Epoch: 1 Batch: 3881/20099 (19.31%) Loss: 2.286036 LR: 0.00003000 [08:52:45] Epoch: 
1 Batch: 3882/20099 (19.31%) Loss: 2.217481 LR: 0.00003000 [08:52:48] Epoch: 1 Batch: 3883/20099 (19.32%) Loss: 2.191971 LR: 0.00003000 [08:52:51] Epoch: 1 Batch: 3884/20099 (19.32%) Loss: 2.050911 LR: 0.00003000 [08:52:54] Epoch: 1 Batch: 3885/20099 (19.33%) Loss: 2.280048 LR: 0.00003000 [08:52:57] Epoch: 1 Batch: 3886/20099 (19.33%) Loss: 2.158287 LR: 0.00003000 [08:53:00] Epoch: 1 Batch: 3887/20099 (19.34%) Loss: 2.189884 LR: 0.00003000 [08:53:03] Epoch: 1 Batch: 3888/20099 (19.34%) Loss: 2.277434 LR: 0.00003000 [08:53:06] Epoch: 1 Batch: 3889/20099 (19.35%) Loss: 2.182162 LR: 0.00003000 [08:53:09] Epoch: 1 Batch: 3890/20099 (19.35%) Loss: 2.280168 LR: 0.00003000 [08:53:12] Epoch: 1 Batch: 3891/20099 (19.36%) Loss: 2.149066 LR: 0.00003000 [08:53:16] Epoch: 1 Batch: 3892/20099 (19.36%) Loss: 2.212634 LR: 0.00003000 [08:53:19] Epoch: 1 Batch: 3893/20099 (19.37%) Loss: 2.237472 LR: 0.00003000 [08:53:22] Epoch: 1 Batch: 3894/20099 (19.37%) Loss: 2.222408 LR: 0.00003000 [08:53:25] Epoch: 1 Batch: 3895/20099 (19.38%) Loss: 2.097622 LR: 0.00003000 [08:53:28] Epoch: 1 Batch: 3896/20099 (19.38%) Loss: 2.108954 LR: 0.00003000 [08:53:31] Epoch: 1 Batch: 3897/20099 (19.39%) Loss: 2.246134 LR: 0.00003000 [08:53:34] Epoch: 1 Batch: 3898/20099 (19.39%) Loss: 2.177431 LR: 0.00003000 [08:53:37] Epoch: 1 Batch: 3899/20099 (19.40%) Loss: 1.998008 LR: 0.00003000 [08:53:40] Epoch: 1 Batch: 3900/20099 (19.40%) Loss: 2.092345 LR: 0.00003000 [08:53:43] Epoch: 1 Batch: 3901/20099 (19.41%) Loss: 2.453894 LR: 0.00003000 [08:53:46] Epoch: 1 Batch: 3902/20099 (19.41%) Loss: 2.193978 LR: 0.00003000 [08:53:50] Epoch: 1 Batch: 3903/20099 (19.42%) Loss: 2.346276 LR: 0.00003000 [08:53:53] Epoch: 1 Batch: 3904/20099 (19.42%) Loss: 2.043164 LR: 0.00003000 [08:53:56] Epoch: 1 Batch: 3905/20099 (19.43%) Loss: 1.883669 LR: 0.00003000 [08:53:59] Epoch: 1 Batch: 3906/20099 (19.43%) Loss: 2.210085 LR: 0.00003000 [08:54:02] Epoch: 1 Batch: 3907/20099 (19.44%) Loss: 2.113711 LR: 0.00003000 [08:54:05] 
Epoch: 1 Batch: 3908/20099 (19.44%) Loss: 2.153842 LR: 0.00003000 [08:54:08] Epoch: 1 Batch: 3909/20099 (19.45%) Loss: 2.385109 LR: 0.00003000 [08:54:11] Epoch: 1 Batch: 3910/20099 (19.45%) Loss: 2.434619 LR: 0.00003000 [08:54:14] Epoch: 1 Batch: 3911/20099 (19.46%) Loss: 2.042595 LR: 0.00003000 [08:54:17] Epoch: 1 Batch: 3912/20099 (19.46%) Loss: 2.297205 LR: 0.00003000 [08:54:21] Epoch: 1 Batch: 3913/20099 (19.47%) Loss: 1.919553 LR: 0.00003000 [08:54:24] Epoch: 1 Batch: 3914/20099 (19.47%) Loss: 2.164845 LR: 0.00003000 [08:54:27] Epoch: 1 Batch: 3915/20099 (19.48%) Loss: 2.108092 LR: 0.00003000 [08:54:30] Epoch: 1 Batch: 3916/20099 (19.48%) Loss: 2.016833 LR: 0.00003000 [08:54:33] Epoch: 1 Batch: 3917/20099 (19.49%) Loss: 2.137065 LR: 0.00003000 [08:54:36] Epoch: 1 Batch: 3918/20099 (19.49%) Loss: 2.298762 LR: 0.00003000 [08:54:39] Epoch: 1 Batch: 3919/20099 (19.50%) Loss: 2.226798 LR: 0.00003000 [08:54:42] Epoch: 1 Batch: 3920/20099 (19.50%) Loss: 2.254846 LR: 0.00003000 [08:54:45] Epoch: 1 Batch: 3921/20099 (19.51%) Loss: 2.273290 LR: 0.00003000 [08:54:48] Epoch: 1 Batch: 3922/20099 (19.51%) Loss: 2.436063 LR: 0.00003000 [08:54:51] Epoch: 1 Batch: 3923/20099 (19.52%) Loss: 2.335819 LR: 0.00003000 [08:54:55] Epoch: 1 Batch: 3924/20099 (19.52%) Loss: 2.320823 LR: 0.00003000 [08:54:58] Epoch: 1 Batch: 3925/20099 (19.53%) Loss: 2.088118 LR: 0.00003000 [08:55:01] Epoch: 1 Batch: 3926/20099 (19.53%) Loss: 2.095462 LR: 0.00003000 [08:55:04] Epoch: 1 Batch: 3927/20099 (19.54%) Loss: 2.344166 LR: 0.00003000 [08:55:07] Epoch: 1 Batch: 3928/20099 (19.54%) Loss: 2.399682 LR: 0.00003000 [08:55:10] Epoch: 1 Batch: 3929/20099 (19.55%) Loss: 2.357320 LR: 0.00003000 [08:55:13] Epoch: 1 Batch: 3930/20099 (19.55%) Loss: 2.176040 LR: 0.00003000 [08:55:16] Epoch: 1 Batch: 3931/20099 (19.56%) Loss: 2.224869 LR: 0.00003000 [08:55:19] Epoch: 1 Batch: 3932/20099 (19.56%) Loss: 2.163687 LR: 0.00003000 [08:55:22] Epoch: 1 Batch: 3933/20099 (19.57%) Loss: 2.272918 LR: 0.00003000 
[08:55:26] Epoch: 1 Batch: 3934/20099 (19.57%) Loss: 2.407977 LR: 0.00003000 [08:55:29] Epoch: 1 Batch: 3935/20099 (19.58%) Loss: 1.891371 LR: 0.00003000 [08:55:32] Epoch: 1 Batch: 3936/20099 (19.58%) Loss: 2.398843 LR: 0.00003000 [08:55:35] Epoch: 1 Batch: 3937/20099 (19.59%) Loss: 2.009269 LR: 0.00003000 [08:55:38] Epoch: 1 Batch: 3938/20099 (19.59%) Loss: 2.259749 LR: 0.00003000 [08:55:41] Epoch: 1 Batch: 3939/20099 (19.60%) Loss: 2.069403 LR: 0.00003000 [08:55:44] Epoch: 1 Batch: 3940/20099 (19.60%) Loss: 2.485107 LR: 0.00003000 [08:55:47] Epoch: 1 Batch: 3941/20099 (19.61%) Loss: 2.187631 LR: 0.00003000 [08:55:50] Epoch: 1 Batch: 3942/20099 (19.61%) Loss: 2.339424 LR: 0.00003000 [08:55:53] Epoch: 1 Batch: 3943/20099 (19.62%) Loss: 2.199047 LR: 0.00003000 [08:55:57] Epoch: 1 Batch: 3944/20099 (19.62%) Loss: 2.386613 LR: 0.00003000 [08:56:00] Epoch: 1 Batch: 3945/20099 (19.63%) Loss: 1.938118 LR: 0.00003000 [08:56:03] Epoch: 1 Batch: 3946/20099 (19.63%) Loss: 2.171975 LR: 0.00003000 [08:56:06] Epoch: 1 Batch: 3947/20099 (19.64%) Loss: 2.305034 LR: 0.00003000 [08:56:09] Epoch: 1 Batch: 3948/20099 (19.64%) Loss: 2.423848 LR: 0.00003000 [08:56:12] Epoch: 1 Batch: 3949/20099 (19.65%) Loss: 1.990211 LR: 0.00003000 [08:56:15] Epoch: 1 Batch: 3950/20099 (19.65%) Loss: 2.271347 LR: 0.00003000 [08:56:18] Epoch: 1 Batch: 3951/20099 (19.66%) Loss: 2.170313 LR: 0.00003000 [08:56:21] Epoch: 1 Batch: 3952/20099 (19.66%) Loss: 2.259197 LR: 0.00003000 [08:56:24] Epoch: 1 Batch: 3953/20099 (19.67%) Loss: 2.065250 LR: 0.00003000 [08:56:27] Epoch: 1 Batch: 3954/20099 (19.67%) Loss: 2.168954 LR: 0.00003000 [08:56:31] Epoch: 1 Batch: 3955/20099 (19.68%) Loss: 2.025030 LR: 0.00003000 [08:56:34] Epoch: 1 Batch: 3956/20099 (19.68%) Loss: 2.002827 LR: 0.00003000 [08:56:37] Epoch: 1 Batch: 3957/20099 (19.69%) Loss: 2.305538 LR: 0.00003000 [08:56:40] Epoch: 1 Batch: 3958/20099 (19.69%) Loss: 2.046114 LR: 0.00003000 [08:56:43] Epoch: 1 Batch: 3959/20099 (19.70%) Loss: 2.208117 LR: 
0.00003000 [08:56:46] Epoch: 1 Batch: 3960/20099 (19.70%) Loss: 2.387245 LR: 0.00003000 [08:56:49] Epoch: 1 Batch: 3961/20099 (19.71%) Loss: 2.148747 LR: 0.00003000 [08:56:52] Epoch: 1 Batch: 3962/20099 (19.71%) Loss: 2.382910 LR: 0.00003000 [08:56:55] Epoch: 1 Batch: 3963/20099 (19.72%) Loss: 1.934010 LR: 0.00003000 [08:56:58] Epoch: 1 Batch: 3964/20099 (19.72%) Loss: 2.243827 LR: 0.00003000 [08:57:02] Epoch: 1 Batch: 3965/20099 (19.73%) Loss: 2.402542 LR: 0.00003000 [08:57:05] Epoch: 1 Batch: 3966/20099 (19.73%) Loss: 2.236204 LR: 0.00003000 [08:57:08] Epoch: 1 Batch: 3967/20099 (19.74%) Loss: 2.381816 LR: 0.00003000 [08:57:11] Epoch: 1 Batch: 3968/20099 (19.74%) Loss: 2.417387 LR: 0.00003000 [08:57:14] Epoch: 1 Batch: 3969/20099 (19.75%) Loss: 1.858077 LR: 0.00003000 [08:57:17] Epoch: 1 Batch: 3970/20099 (19.75%) Loss: 1.801800 LR: 0.00003000 [08:57:20] Epoch: 1 Batch: 3971/20099 (19.76%) Loss: 2.072035 LR: 0.00003000 [08:57:23] Epoch: 1 Batch: 3972/20099 (19.76%) Loss: 2.193757 LR: 0.00003000 [08:57:26] Epoch: 1 Batch: 3973/20099 (19.77%) Loss: 2.321878 LR: 0.00003000 [08:57:29] Epoch: 1 Batch: 3974/20099 (19.77%) Loss: 2.119226 LR: 0.00003000 [08:57:33] Epoch: 1 Batch: 3975/20099 (19.78%) Loss: 2.380112 LR: 0.00003000 [08:57:36] Epoch: 1 Batch: 3976/20099 (19.78%) Loss: 2.360560 LR: 0.00003000 [08:57:39] Epoch: 1 Batch: 3977/20099 (19.79%) Loss: 1.954153 LR: 0.00003000 [08:57:42] Epoch: 1 Batch: 3978/20099 (19.79%) Loss: 2.244420 LR: 0.00003000 [08:57:45] Epoch: 1 Batch: 3979/20099 (19.80%) Loss: 2.289345 LR: 0.00003000 [08:57:48] Epoch: 1 Batch: 3980/20099 (19.80%) Loss: 2.409555 LR: 0.00003000 [08:57:51] Epoch: 1 Batch: 3981/20099 (19.81%) Loss: 2.168973 LR: 0.00003000 [08:57:54] Epoch: 1 Batch: 3982/20099 (19.81%) Loss: 2.279854 LR: 0.00003000 [08:57:57] Epoch: 1 Batch: 3983/20099 (19.82%) Loss: 2.380676 LR: 0.00003000 [08:58:00] Epoch: 1 Batch: 3984/20099 (19.82%) Loss: 2.071845 LR: 0.00003000 [08:58:04] Epoch: 1 Batch: 3985/20099 (19.83%) Loss: 2.416183 
LR: 0.00003000 [08:58:07] Epoch: 1 Batch: 3986/20099 (19.83%) Loss: 2.147555 LR: 0.00003000 [08:58:10] Epoch: 1 Batch: 3987/20099 (19.84%) Loss: 1.923975 LR: 0.00003000 [08:58:13] Epoch: 1 Batch: 3988/20099 (19.84%) Loss: 2.187119 LR: 0.00003000 [08:58:16] Epoch: 1 Batch: 3989/20099 (19.85%) Loss: 2.210663 LR: 0.00003000 [08:58:19] Epoch: 1 Batch: 3990/20099 (19.85%) Loss: 2.222629 LR: 0.00003000 [08:58:22] Epoch: 1 Batch: 3991/20099 (19.86%) Loss: 2.158592 LR: 0.00003000 [08:58:25] Epoch: 1 Batch: 3992/20099 (19.86%) Loss: 1.971459 LR: 0.00003000 [08:58:28] Epoch: 1 Batch: 3993/20099 (19.87%) Loss: 2.283452 LR: 0.00003000 [08:58:31] Epoch: 1 Batch: 3994/20099 (19.87%) Loss: 2.040454 LR: 0.00003000 [08:58:34] Epoch: 1 Batch: 3995/20099 (19.88%) Loss: 2.261510 LR: 0.00003000 [08:58:38] Epoch: 1 Batch: 3996/20099 (19.88%) Loss: 2.243987 LR: 0.00003000 [08:58:41] Epoch: 1 Batch: 3997/20099 (19.89%) Loss: 2.128390 LR: 0.00003000 [08:58:44] Epoch: 1 Batch: 3998/20099 (19.89%) Loss: 2.338022 LR: 0.00003000 [08:58:47] Epoch: 1 Batch: 3999/20099 (19.90%) Loss: 2.137212 LR: 0.00003000 [08:58:50] >> Evaluating batch 0 [08:58:51] >> Evaluating batch 1 [08:58:53] >> Evaluating batch 2 [08:58:54] >> Evaluating batch 3 [08:58:55] >> Evaluating batch 4 [08:58:56] >> Evaluating batch 5 [08:58:58] >> Evaluating batch 6 [08:58:59] >> Evaluating batch 7 [08:59:00] >> Evaluating batch 8 [08:59:01] >> Evaluating batch 9 [08:59:03] >> Evaluating batch 10 [08:59:04] >> Evaluating batch 11 [08:59:05] >> Evaluating batch 12 [08:59:06] >> Evaluating batch 13 [08:59:07] >> Evaluating batch 14 [08:59:08] >> Evaluating batch 15 [08:59:10] >> Evaluating batch 16 [08:59:11] Epoch: 1 Step: 4000/20099 Evaluation: [08:59:11] Avg Loss Since Last Eval: 2.2078 Val Loss: 2.2596 Validation loss delta: -0.0181 Perplexity: 9.5792 LR: 0.00003000 [08:59:14] >> Cleaned up old temp checkpoint: epoch1_step2000 [08:59:14] >> Temp checkpoint saved: epoch1_step4000, size: 0.1693 GB [08:59:18] >> Checkpoint 
saved: epoch1_step4000, size: 0.1693 GB [08:59:18] Epoch: 1 Batch: 4000/20099 (19.90%) Loss: 2.230767 LR: 0.00003000 [08:59:21] Epoch: 1 Batch: 4001/20099 (19.91%) Loss: 2.365759 LR: 0.00003000 [08:59:24] Epoch: 1 Batch: 4002/20099 (19.91%) Loss: 2.155231 LR: 0.00003000 [08:59:27] Epoch: 1 Batch: 4003/20099 (19.92%) Loss: 2.146534 LR: 0.00003000 [08:59:30] Epoch: 1 Batch: 4004/20099 (19.92%) Loss: 2.157895 LR: 0.00002999 [08:59:33] Epoch: 1 Batch: 4005/20099 (19.93%) Loss: 2.153976 LR: 0.00002999 [08:59:36] Epoch: 1 Batch: 4006/20099 (19.93%) Loss: 2.220458 LR: 0.00002999 [08:59:39] Epoch: 1 Batch: 4007/20099 (19.94%) Loss: 2.197265 LR: 0.00002999 [08:59:42] Epoch: 1 Batch: 4008/20099 (19.94%) Loss: 2.098452 LR: 0.00002999 [08:59:46] Epoch: 1 Batch: 4009/20099 (19.95%) Loss: 2.141283 LR: 0.00002999 [08:59:49] Epoch: 1 Batch: 4010/20099 (19.95%) Loss: 1.806260 LR: 0.00002999 [08:59:52] Epoch: 1 Batch: 4011/20099 (19.96%) Loss: 2.075292 LR: 0.00002999 [08:59:55] Epoch: 1 Batch: 4012/20099 (19.96%) Loss: 2.124531 LR: 0.00002999 [08:59:58] Epoch: 1 Batch: 4013/20099 (19.97%) Loss: 2.037990 LR: 0.00002999 [09:00:01] Epoch: 1 Batch: 4014/20099 (19.97%) Loss: 2.214685 LR: 0.00002999 [09:00:05] Epoch: 1 Batch: 4015/20099 (19.98%) Loss: 2.198604 LR: 0.00002999 [09:00:08] Epoch: 1 Batch: 4016/20099 (19.98%) Loss: 2.152951 LR: 0.00002999 [09:00:11] Epoch: 1 Batch: 4017/20099 (19.99%) Loss: 2.282963 LR: 0.00002999 [09:00:14] Epoch: 1 Batch: 4018/20099 (19.99%) Loss: 2.183164 LR: 0.00002999 [09:00:17] Epoch: 1 Batch: 4019/20099 (20.00%) Loss: 2.054929 LR: 0.00002999 [09:00:20] Epoch: 1 Batch: 4020/20099 (20.00%) Loss: 2.209050 LR: 0.00002999 [09:00:23] Epoch: 1 Batch: 4021/20099 (20.01%) Loss: 2.128426 LR: 0.00002999 [09:00:26] Epoch: 1 Batch: 4022/20099 (20.01%) Loss: 2.433947 LR: 0.00002999 [09:00:29] Epoch: 1 Batch: 4023/20099 (20.02%) Loss: 2.161413 LR: 0.00002999 [09:00:32] Epoch: 1 Batch: 4024/20099 (20.02%) Loss: 2.048685 LR: 0.00002999 [09:00:35] Epoch: 1 Batch: 
4025/20099 (20.03%) Loss: 2.127099 LR: 0.00002999 [09:00:38] Epoch: 1 Batch: 4026/20099 (20.03%) Loss: 2.125098 LR: 0.00002999 [09:00:42] Epoch: 1 Batch: 4027/20099 (20.04%) Loss: 2.297527 LR: 0.00002999 [09:00:45] Epoch: 1 Batch: 4028/20099 (20.04%) Loss: 2.158046 LR: 0.00002999 [09:00:48] Epoch: 1 Batch: 4029/20099 (20.05%) Loss: 1.937330 LR: 0.00002999 [09:00:51] Epoch: 1 Batch: 4030/20099 (20.05%) Loss: 2.435694 LR: 0.00002999 [09:00:54] Epoch: 1 Batch: 4031/20099 (20.06%) Loss: 2.049549 LR: 0.00002999 [09:00:57] Epoch: 1 Batch: 4032/20099 (20.06%) Loss: 2.282878 LR: 0.00002999 [09:01:00] Epoch: 1 Batch: 4033/20099 (20.07%) Loss: 1.783038 LR: 0.00002999 [09:01:03] Epoch: 1 Batch: 4034/20099 (20.07%) Loss: 2.097534 LR: 0.00002999 [09:01:06] Epoch: 1 Batch: 4035/20099 (20.08%) Loss: 1.880652 LR: 0.00002999 [09:01:09] Epoch: 1 Batch: 4036/20099 (20.08%) Loss: 2.199209 LR: 0.00002999 [09:01:12] Epoch: 1 Batch: 4037/20099 (20.09%) Loss: 2.250202 LR: 0.00002999 [09:01:15] Epoch: 1 Batch: 4038/20099 (20.09%) Loss: 2.112329 LR: 0.00002999 [09:01:19] Epoch: 1 Batch: 4039/20099 (20.10%) Loss: 2.110555 LR: 0.00002999 [09:01:22] Epoch: 1 Batch: 4040/20099 (20.10%) Loss: 2.203888 LR: 0.00002999 [09:01:25] Epoch: 1 Batch: 4041/20099 (20.11%) Loss: 2.109310 LR: 0.00002999 [09:01:28] Epoch: 1 Batch: 4042/20099 (20.11%) Loss: 2.089413 LR: 0.00002999 [09:01:31] Epoch: 1 Batch: 4043/20099 (20.12%) Loss: 1.872151 LR: 0.00002999 [09:01:34] Epoch: 1 Batch: 4044/20099 (20.12%) Loss: 2.067805 LR: 0.00002999 [09:01:37] Epoch: 1 Batch: 4045/20099 (20.13%) Loss: 2.148703 LR: 0.00002999 [09:01:40] Epoch: 1 Batch: 4046/20099 (20.13%) Loss: 2.296954 LR: 0.00002999 [09:01:43] Epoch: 1 Batch: 4047/20099 (20.14%) Loss: 1.896643 LR: 0.00002999 [09:01:47] Epoch: 1 Batch: 4048/20099 (20.14%) Loss: 1.918306 LR: 0.00002999 [09:01:50] Epoch: 1 Batch: 4049/20099 (20.15%) Loss: 2.143274 LR: 0.00002999 [09:01:53] Epoch: 1 Batch: 4050/20099 (20.15%) Loss: 2.257886 LR: 0.00002999 [09:01:56] Epoch: 1 
Batch: 4051/20099 (20.16%) Loss: 2.061531 LR: 0.00002999 [09:01:59] Epoch: 1 Batch: 4052/20099 (20.16%) Loss: 2.403210 LR: 0.00002999 [09:02:02] Epoch: 1 Batch: 4053/20099 (20.17%) Loss: 2.145045 LR: 0.00002999 [09:02:05] Epoch: 1 Batch: 4054/20099 (20.17%) Loss: 2.205540 LR: 0.00002999 [09:02:08] Epoch: 1 Batch: 4055/20099 (20.18%) Loss: 2.172061 LR: 0.00002999 [09:02:11] Epoch: 1 Batch: 4056/20099 (20.18%) Loss: 2.575297 LR: 0.00002999 [09:02:14] Epoch: 1 Batch: 4057/20099 (20.19%) Loss: 2.080708 LR: 0.00002999 [09:02:18] Epoch: 1 Batch: 4058/20099 (20.19%) Loss: 2.249847 LR: 0.00002999 [09:02:21] Epoch: 1 Batch: 4059/20099 (20.20%) Loss: 2.053966 LR: 0.00002999 [09:02:24] Epoch: 1 Batch: 4060/20099 (20.20%) Loss: 2.025671 LR: 0.00002999 [09:02:27] Epoch: 1 Batch: 4061/20099 (20.20%) Loss: 2.445138 LR: 0.00002999 [09:02:30] Epoch: 1 Batch: 4062/20099 (20.21%) Loss: 1.991379 LR: 0.00002999 [09:02:33] Epoch: 1 Batch: 4063/20099 (20.21%) Loss: 2.220240 LR: 0.00002999 [09:02:36] Epoch: 1 Batch: 4064/20099 (20.22%) Loss: 1.978880 LR: 0.00002999 [09:02:39] Epoch: 1 Batch: 4065/20099 (20.22%) Loss: 2.067166 LR: 0.00002999 [09:02:42] Epoch: 1 Batch: 4066/20099 (20.23%) Loss: 2.386489 LR: 0.00002999 [09:02:45] Epoch: 1 Batch: 4067/20099 (20.23%) Loss: 2.406748 LR: 0.00002999 [09:02:48] Epoch: 1 Batch: 4068/20099 (20.24%) Loss: 1.887735 LR: 0.00002999 [09:02:52] Epoch: 1 Batch: 4069/20099 (20.24%) Loss: 2.367457 LR: 0.00002999 [09:02:55] Epoch: 1 Batch: 4070/20099 (20.25%) Loss: 2.404861 LR: 0.00002999 [09:02:58] Epoch: 1 Batch: 4071/20099 (20.25%) Loss: 1.999785 LR: 0.00002999 [09:03:01] Epoch: 1 Batch: 4072/20099 (20.26%) Loss: 2.249867 LR: 0.00002999 [09:03:04] Epoch: 1 Batch: 4073/20099 (20.26%) Loss: 2.075081 LR: 0.00002999 [09:03:07] Epoch: 1 Batch: 4074/20099 (20.27%) Loss: 2.149020 LR: 0.00002999 [09:03:10] Epoch: 1 Batch: 4075/20099 (20.27%) Loss: 2.059026 LR: 0.00002999 [09:03:13] Epoch: 1 Batch: 4076/20099 (20.28%) Loss: 2.341281 LR: 0.00002999 [09:03:16] Epoch: 
1 Batch: 4077/20099 (20.28%) Loss: 1.938484 LR: 0.00002999 [09:03:19] Epoch: 1 Batch: 4078/20099 (20.29%) Loss: 2.411069 LR: 0.00002999 [09:03:22] Epoch: 1 Batch: 4079/20099 (20.29%) Loss: 2.407255 LR: 0.00002999 [09:03:26] Epoch: 1 Batch: 4080/20099 (20.30%) Loss: 2.353181 LR: 0.00002999 [09:03:29] Epoch: 1 Batch: 4081/20099 (20.30%) Loss: 1.941622 LR: 0.00002999 [09:03:32] Epoch: 1 Batch: 4082/20099 (20.31%) Loss: 1.916857 LR: 0.00002999 [09:03:35] Epoch: 1 Batch: 4083/20099 (20.31%) Loss: 2.263108 LR: 0.00002999 [09:03:38] Epoch: 1 Batch: 4084/20099 (20.32%) Loss: 1.844754 LR: 0.00002999 [09:03:41] Epoch: 1 Batch: 4085/20099 (20.32%) Loss: 2.114269 LR: 0.00002999 [09:03:44] Epoch: 1 Batch: 4086/20099 (20.33%) Loss: 2.249472 LR: 0.00002999 [09:03:47] Epoch: 1 Batch: 4087/20099 (20.33%) Loss: 2.391635 LR: 0.00002999 [09:03:50] Epoch: 1 Batch: 4088/20099 (20.34%) Loss: 2.290164 LR: 0.00002999 [09:03:53] Epoch: 1 Batch: 4089/20099 (20.34%) Loss: 2.277984 LR: 0.00002999 [09:03:57] Epoch: 1 Batch: 4090/20099 (20.35%) Loss: 2.199185 LR: 0.00002999 [09:04:00] Epoch: 1 Batch: 4091/20099 (20.35%) Loss: 2.028072 LR: 0.00002999 [09:04:03] Epoch: 1 Batch: 4092/20099 (20.36%) Loss: 2.390983 LR: 0.00002999 [09:04:06] Epoch: 1 Batch: 4093/20099 (20.36%) Loss: 2.337943 LR: 0.00002999 [09:04:09] Epoch: 1 Batch: 4094/20099 (20.37%) Loss: 2.323091 LR: 0.00002999 [09:04:12] Epoch: 1 Batch: 4095/20099 (20.37%) Loss: 2.060595 LR: 0.00002999 [09:04:15] Epoch: 1 Batch: 4096/20099 (20.38%) Loss: 2.221461 LR: 0.00002999 [09:04:18] Epoch: 1 Batch: 4097/20099 (20.38%) Loss: 1.858494 LR: 0.00002999 [09:04:21] Epoch: 1 Batch: 4098/20099 (20.39%) Loss: 2.214817 LR: 0.00002999 [09:04:24] Epoch: 1 Batch: 4099/20099 (20.39%) Loss: 2.093318 LR: 0.00002999 [09:04:27] Epoch: 1 Batch: 4100/20099 (20.40%) Loss: 1.903260 LR: 0.00002999 [09:04:30] Epoch: 1 Batch: 4101/20099 (20.40%) Loss: 2.383768 LR: 0.00002999 [09:04:34] Epoch: 1 Batch: 4102/20099 (20.41%) Loss: 2.129053 LR: 0.00002999 [09:04:37] 
Epoch: 1 Batch: 4103/20099 (20.41%) Loss: 1.893385 LR: 0.00002999 [09:04:40] Epoch: 1 Batch: 4104/20099 (20.42%) Loss: 2.302868 LR: 0.00002999 [09:04:43] Epoch: 1 Batch: 4105/20099 (20.42%) Loss: 2.183161 LR: 0.00002999 [09:04:46] Epoch: 1 Batch: 4106/20099 (20.43%) Loss: 2.235359 LR: 0.00002999 [09:04:49] Epoch: 1 Batch: 4107/20099 (20.43%) Loss: 2.209957 LR: 0.00002999 [09:04:52] Epoch: 1 Batch: 4108/20099 (20.44%) Loss: 2.151106 LR: 0.00002999 [09:04:55] Epoch: 1 Batch: 4109/20099 (20.44%) Loss: 2.377003 LR: 0.00002998 [09:04:58] Epoch: 1 Batch: 4110/20099 (20.45%) Loss: 2.182909 LR: 0.00002998 [09:05:01] Epoch: 1 Batch: 4111/20099 (20.45%) Loss: 2.103906 LR: 0.00002998 [09:05:04] Epoch: 1 Batch: 4112/20099 (20.46%) Loss: 2.067913 LR: 0.00002998 [09:05:07] Epoch: 1 Batch: 4113/20099 (20.46%) Loss: 2.156087 LR: 0.00002998 [09:05:11] Epoch: 1 Batch: 4114/20099 (20.47%) Loss: 2.289936 LR: 0.00002998 [09:05:14] Epoch: 1 Batch: 4115/20099 (20.47%) Loss: 2.310542 LR: 0.00002998 [09:05:17] Epoch: 1 Batch: 4116/20099 (20.48%) Loss: 2.070861 LR: 0.00002998 [09:05:20] Epoch: 1 Batch: 4117/20099 (20.48%) Loss: 2.178672 LR: 0.00002998 [09:05:23] Epoch: 1 Batch: 4118/20099 (20.49%) Loss: 2.034380 LR: 0.00002998 [09:05:26] Epoch: 1 Batch: 4119/20099 (20.49%) Loss: 2.095064 LR: 0.00002998 [09:05:29] Epoch: 1 Batch: 4120/20099 (20.50%) Loss: 2.139366 LR: 0.00002998 [09:05:32] Epoch: 1 Batch: 4121/20099 (20.50%) Loss: 2.403356 LR: 0.00002998 [09:05:35] Epoch: 1 Batch: 4122/20099 (20.51%) Loss: 2.171491 LR: 0.00002998 [09:05:38] Epoch: 1 Batch: 4123/20099 (20.51%) Loss: 2.228421 LR: 0.00002998 [09:05:42] Epoch: 1 Batch: 4124/20099 (20.52%) Loss: 2.023410 LR: 0.00002998 [09:05:45] Epoch: 1 Batch: 4125/20099 (20.52%) Loss: 2.132067 LR: 0.00002998 [09:05:48] Epoch: 1 Batch: 4126/20099 (20.53%) Loss: 2.325770 LR: 0.00002998 [09:05:51] Epoch: 1 Batch: 4127/20099 (20.53%) Loss: 2.295927 LR: 0.00002998 [09:05:54] Epoch: 1 Batch: 4128/20099 (20.54%) Loss: 2.182862 LR: 0.00002998 
[09:05:57] Epoch: 1 Batch: 4129/20099 (20.54%) Loss: 2.215382 LR: 0.00002998 [09:06:00] Epoch: 1 Batch: 4130/20099 (20.55%) Loss: 2.455392 LR: 0.00002998 [09:06:03] Epoch: 1 Batch: 4131/20099 (20.55%) Loss: 2.277093 LR: 0.00002998 [09:06:06] Epoch: 1 Batch: 4132/20099 (20.56%) Loss: 2.348808 LR: 0.00002998 [09:06:09] Epoch: 1 Batch: 4133/20099 (20.56%) Loss: 2.165989 LR: 0.00002998 [09:06:13] Epoch: 1 Batch: 4134/20099 (20.57%) Loss: 2.398511 LR: 0.00002998 [09:06:16] Epoch: 1 Batch: 4135/20099 (20.57%) Loss: 2.184958 LR: 0.00002998 [09:06:19] Epoch: 1 Batch: 4136/20099 (20.58%) Loss: 2.131309 LR: 0.00002998 [09:06:22] Epoch: 1 Batch: 4137/20099 (20.58%) Loss: 2.152016 LR: 0.00002998 [09:06:25] Epoch: 1 Batch: 4138/20099 (20.59%) Loss: 2.340721 LR: 0.00002998 [09:06:28] Epoch: 1 Batch: 4139/20099 (20.59%) Loss: 2.109335 LR: 0.00002998 [09:06:31] Epoch: 1 Batch: 4140/20099 (20.60%) Loss: 2.196358 LR: 0.00002998 [09:06:34] Epoch: 1 Batch: 4141/20099 (20.60%) Loss: 2.239304 LR: 0.00002998 [09:06:37] Epoch: 1 Batch: 4142/20099 (20.61%) Loss: 2.187442 LR: 0.00002998 [09:06:41] Epoch: 1 Batch: 4143/20099 (20.61%) Loss: 2.099458 LR: 0.00002998 [09:06:44] Epoch: 1 Batch: 4144/20099 (20.62%) Loss: 1.951378 LR: 0.00002998 [09:06:47] Epoch: 1 Batch: 4145/20099 (20.62%) Loss: 2.278064 LR: 0.00002998 [09:06:50] Epoch: 1 Batch: 4146/20099 (20.63%) Loss: 2.409459 LR: 0.00002998 [09:06:53] Epoch: 1 Batch: 4147/20099 (20.63%) Loss: 1.883501 LR: 0.00002998 [09:06:56] Epoch: 1 Batch: 4148/20099 (20.64%) Loss: 2.003777 LR: 0.00002998 [09:06:59] Epoch: 1 Batch: 4149/20099 (20.64%) Loss: 1.894684 LR: 0.00002998 [09:07:02] Epoch: 1 Batch: 4150/20099 (20.65%) Loss: 1.986508 LR: 0.00002998 [09:07:05] Epoch: 1 Batch: 4151/20099 (20.65%) Loss: 2.177313 LR: 0.00002998 [09:07:08] Epoch: 1 Batch: 4152/20099 (20.66%) Loss: 1.931077 LR: 0.00002998 [09:07:12] Epoch: 1 Batch: 4153/20099 (20.66%) Loss: 2.804565 LR: 0.00002998 [09:07:15] Epoch: 1 Batch: 4154/20099 (20.67%) Loss: 2.128333 LR: 
0.00002998 [09:07:18] Epoch: 1 Batch: 4155/20099 (20.67%) Loss: 2.248257 LR: 0.00002998 [09:07:21] Epoch: 1 Batch: 4156/20099 (20.68%) Loss: 2.071353 LR: 0.00002998 [09:07:24] Epoch: 1 Batch: 4157/20099 (20.68%) Loss: 1.646418 LR: 0.00002998 [09:07:27] Epoch: 1 Batch: 4158/20099 (20.69%) Loss: 2.037481 LR: 0.00002998 [09:07:30] Epoch: 1 Batch: 4159/20099 (20.69%) Loss: 1.863584 LR: 0.00002998 [09:07:33] Epoch: 1 Batch: 4160/20099 (20.70%) Loss: 2.244012 LR: 0.00002998 [09:07:36] Epoch: 1 Batch: 4161/20099 (20.70%) Loss: 2.529523 LR: 0.00002998 [09:07:39] Epoch: 1 Batch: 4162/20099 (20.71%) Loss: 1.983195 LR: 0.00002998 [09:07:42] Epoch: 1 Batch: 4163/20099 (20.71%) Loss: 2.311235 LR: 0.00002998 [09:07:45] Epoch: 1 Batch: 4164/20099 (20.72%) Loss: 2.240770 LR: 0.00002998 [09:07:49] Epoch: 1 Batch: 4165/20099 (20.72%) Loss: 2.230162 LR: 0.00002998 [09:07:52] Epoch: 1 Batch: 4166/20099 (20.73%) Loss: 1.741855 LR: 0.00002998 [09:07:55] Epoch: 1 Batch: 4167/20099 (20.73%) Loss: 2.071361 LR: 0.00002998 [09:07:58] Epoch: 1 Batch: 4168/20099 (20.74%) Loss: 2.383783 LR: 0.00002998 [09:08:01] Epoch: 1 Batch: 4169/20099 (20.74%) Loss: 2.259821 LR: 0.00002998 [09:08:04] Epoch: 1 Batch: 4170/20099 (20.75%) Loss: 1.746626 LR: 0.00002998 [09:08:07] Epoch: 1 Batch: 4171/20099 (20.75%) Loss: 2.307424 LR: 0.00002998 [09:08:10] Epoch: 1 Batch: 4172/20099 (20.76%) Loss: 2.076166 LR: 0.00002998 [09:08:13] Epoch: 1 Batch: 4173/20099 (20.76%) Loss: 2.459747 LR: 0.00002998 [09:08:16] Epoch: 1 Batch: 4174/20099 (20.77%) Loss: 1.771962 LR: 0.00002998 [09:08:20] Epoch: 1 Batch: 4175/20099 (20.77%) Loss: 2.142812 LR: 0.00002998 [09:08:23] Epoch: 1 Batch: 4176/20099 (20.78%) Loss: 2.127935 LR: 0.00002998 [09:08:26] Epoch: 1 Batch: 4177/20099 (20.78%) Loss: 2.409044 LR: 0.00002998 [09:08:29] Epoch: 1 Batch: 4178/20099 (20.79%) Loss: 1.902766 LR: 0.00002998 [09:08:32] Epoch: 1 Batch: 4179/20099 (20.79%) Loss: 2.125066 LR: 0.00002998 [09:08:35] Epoch: 1 Batch: 4180/20099 (20.80%) Loss: 2.021751 
LR: 0.00002998 [09:08:38] Epoch: 1 Batch: 4181/20099 (20.80%) Loss: 2.104440 LR: 0.00002998 [09:08:41] Epoch: 1 Batch: 4182/20099 (20.81%) Loss: 2.334563 LR: 0.00002998 [09:08:44] Epoch: 1 Batch: 4183/20099 (20.81%) Loss: 2.211609 LR: 0.00002998 [09:08:47] Epoch: 1 Batch: 4184/20099 (20.82%) Loss: 2.310794 LR: 0.00002998 [09:08:50] Epoch: 1 Batch: 4185/20099 (20.82%) Loss: 1.583148 LR: 0.00002998 [09:08:53] Epoch: 1 Batch: 4186/20099 (20.83%) Loss: 1.950957 LR: 0.00002997 [09:08:57] Epoch: 1 Batch: 4187/20099 (20.83%) Loss: 2.178679 LR: 0.00002997 [09:09:00] Epoch: 1 Batch: 4188/20099 (20.84%) Loss: 1.968034 LR: 0.00002997 [09:09:03] Epoch: 1 Batch: 4189/20099 (20.84%) Loss: 2.173266 LR: 0.00002997 [09:09:06] Epoch: 1 Batch: 4190/20099 (20.85%) Loss: 2.194826 LR: 0.00002997 [09:09:09] Epoch: 1 Batch: 4191/20099 (20.85%) Loss: 2.312870 LR: 0.00002997 [09:09:12] Epoch: 1 Batch: 4192/20099 (20.86%) Loss: 2.147857 LR: 0.00002997 [09:09:15] Epoch: 1 Batch: 4193/20099 (20.86%) Loss: 2.148641 LR: 0.00002997 [09:09:18] Epoch: 1 Batch: 4194/20099 (20.87%) Loss: 2.161398 LR: 0.00002997 [09:09:21] Epoch: 1 Batch: 4195/20099 (20.87%) Loss: 2.468514 LR: 0.00002997 [09:09:24] Epoch: 1 Batch: 4196/20099 (20.88%) Loss: 2.226539 LR: 0.00002997 [09:09:27] Epoch: 1 Batch: 4197/20099 (20.88%) Loss: 2.495526 LR: 0.00002997 [09:09:31] Epoch: 1 Batch: 4198/20099 (20.89%) Loss: 1.778994 LR: 0.00002997 [09:09:34] Epoch: 1 Batch: 4199/20099 (20.89%) Loss: 2.475960 LR: 0.00002997 [09:09:40] >> Cleaned up old temp checkpoint: epoch1_step2200 [09:09:40] >> Temp checkpoint saved: epoch1_step4200, size: 0.1693 GB [09:09:40] Epoch: 1 Batch: 4200/20099 (20.90%) Loss: 2.259036 LR: 0.00002997 [09:09:43] Epoch: 1 Batch: 4201/20099 (20.90%) Loss: 2.205713 LR: 0.00002997 [09:09:46] Epoch: 1 Batch: 4202/20099 (20.91%) Loss: 1.897110 LR: 0.00002997 [09:09:49] Epoch: 1 Batch: 4203/20099 (20.91%) Loss: 1.987788 LR: 0.00002997 [09:09:52] Epoch: 1 Batch: 4204/20099 (20.92%) Loss: 2.488675 LR: 0.00002997 
[09:09:56] Epoch: 1 Batch: 4205/20099 (20.92%) Loss: 2.079707 LR: 0.00002997 [09:09:59] Epoch: 1 Batch: 4206/20099 (20.93%) Loss: 2.319595 LR: 0.00002997 [09:10:02] Epoch: 1 Batch: 4207/20099 (20.93%) Loss: 2.207680 LR: 0.00002997 [09:10:05] Epoch: 1 Batch: 4208/20099 (20.94%) Loss: 2.013397 LR: 0.00002997 [09:10:08] Epoch: 1 Batch: 4209/20099 (20.94%) Loss: 2.176493 LR: 0.00002997 [09:10:11] Epoch: 1 Batch: 4210/20099 (20.95%) Loss: 2.297882 LR: 0.00002997 [09:10:14] Epoch: 1 Batch: 4211/20099 (20.95%) Loss: 2.000204 LR: 0.00002997 [09:10:17] Epoch: 1 Batch: 4212/20099 (20.96%) Loss: 2.221789 LR: 0.00002997 [09:10:20] Epoch: 1 Batch: 4213/20099 (20.96%) Loss: 2.264577 LR: 0.00002997 [09:10:24] Epoch: 1 Batch: 4214/20099 (20.97%) Loss: 2.222815 LR: 0.00002997 [09:10:27] Epoch: 1 Batch: 4215/20099 (20.97%) Loss: 2.096981 LR: 0.00002997 [09:10:30] Epoch: 1 Batch: 4216/20099 (20.98%) Loss: 2.488047 LR: 0.00002997 [09:10:33] Epoch: 1 Batch: 4217/20099 (20.98%) Loss: 2.280287 LR: 0.00002997 [09:10:36] Epoch: 1 Batch: 4218/20099 (20.99%) Loss: 2.116813 LR: 0.00002997 [09:10:39] Epoch: 1 Batch: 4219/20099 (20.99%) Loss: 2.105318 LR: 0.00002997 [09:10:42] Epoch: 1 Batch: 4220/20099 (21.00%) Loss: 2.303334 LR: 0.00002997 [09:10:45] Epoch: 1 Batch: 4221/20099 (21.00%) Loss: 2.217862 LR: 0.00002997 [09:10:48] Epoch: 1 Batch: 4222/20099 (21.01%) Loss: 2.151740 LR: 0.00002997 [09:10:52] Epoch: 1 Batch: 4223/20099 (21.01%) Loss: 2.230169 LR: 0.00002997 [09:10:55] Epoch: 1 Batch: 4224/20099 (21.02%) Loss: 1.890332 LR: 0.00002997 [09:10:58] Epoch: 1 Batch: 4225/20099 (21.02%) Loss: 2.341240 LR: 0.00002997 [09:11:01] Epoch: 1 Batch: 4226/20099 (21.03%) Loss: 2.544625 LR: 0.00002997 [09:11:04] Epoch: 1 Batch: 4227/20099 (21.03%) Loss: 2.078583 LR: 0.00002997 [09:11:07] Epoch: 1 Batch: 4228/20099 (21.04%) Loss: 2.007075 LR: 0.00002997 [09:11:10] Epoch: 1 Batch: 4229/20099 (21.04%) Loss: 2.048175 LR: 0.00002997 [09:11:13] Epoch: 1 Batch: 4230/20099 (21.05%) Loss: 2.161504 LR: 
0.00002997 [09:11:16] Epoch: 1 Batch: 4231/20099 (21.05%) Loss: 2.206236 LR: 0.00002997 [09:11:19] Epoch: 1 Batch: 4232/20099 (21.06%) Loss: 2.155538 LR: 0.00002997 [09:11:22] Epoch: 1 Batch: 4233/20099 (21.06%) Loss: 2.459120 LR: 0.00002997 [09:11:26] Epoch: 1 Batch: 4234/20099 (21.07%) Loss: 1.907223 LR: 0.00002997 [09:11:29] Epoch: 1 Batch: 4235/20099 (21.07%) Loss: 2.381596 LR: 0.00002997 [09:11:32] Epoch: 1 Batch: 4236/20099 (21.08%) Loss: 2.195115 LR: 0.00002997 [09:11:35] Epoch: 1 Batch: 4237/20099 (21.08%) Loss: 1.958473 LR: 0.00002997 [09:11:38] Epoch: 1 Batch: 4238/20099 (21.09%) Loss: 2.104866 LR: 0.00002997 [09:11:41] Epoch: 1 Batch: 4239/20099 (21.09%) Loss: 2.036722 LR: 0.00002997 [09:11:44] Epoch: 1 Batch: 4240/20099 (21.10%) Loss: 2.038196 LR: 0.00002997 [09:11:47] Epoch: 1 Batch: 4241/20099 (21.10%) Loss: 2.020663 LR: 0.00002997 [09:11:50] Epoch: 1 Batch: 4242/20099 (21.11%) Loss: 2.325535 LR: 0.00002997 [09:11:53] Epoch: 1 Batch: 4243/20099 (21.11%) Loss: 2.306229 LR: 0.00002997 [09:11:56] Epoch: 1 Batch: 4244/20099 (21.12%) Loss: 1.979847 LR: 0.00002997 [09:12:00] Epoch: 1 Batch: 4245/20099 (21.12%) Loss: 2.148159 LR: 0.00002997 [09:12:03] Epoch: 1 Batch: 4246/20099 (21.13%) Loss: 2.126872 LR: 0.00002997 [09:12:06] Epoch: 1 Batch: 4247/20099 (21.13%) Loss: 2.306468 LR: 0.00002997 [09:12:09] Epoch: 1 Batch: 4248/20099 (21.14%) Loss: 2.133780 LR: 0.00002997 [09:12:12] Epoch: 1 Batch: 4249/20099 (21.14%) Loss: 2.219174 LR: 0.00002996 [09:12:15] Epoch: 1 Batch: 4250/20099 (21.15%) Loss: 2.538643 LR: 0.00002996 [09:12:18] Epoch: 1 Batch: 4251/20099 (21.15%) Loss: 2.379699 LR: 0.00002996 [09:12:21] Epoch: 1 Batch: 4252/20099 (21.16%) Loss: 2.149000 LR: 0.00002996 [09:12:24] Epoch: 1 Batch: 4253/20099 (21.16%) Loss: 1.929791 LR: 0.00002996 [09:12:27] Epoch: 1 Batch: 4254/20099 (21.17%) Loss: 2.711968 LR: 0.00002996 [09:12:30] Epoch: 1 Batch: 4255/20099 (21.17%) Loss: 2.219484 LR: 0.00002996 [09:12:34] Epoch: 1 Batch: 4256/20099 (21.18%) Loss: 2.181377 
LR: 0.00002996 [09:12:37] Epoch: 1 Batch: 4257/20099 (21.18%) Loss: 2.115895 LR: 0.00002996 [09:12:40] Epoch: 1 Batch: 4258/20099 (21.19%) Loss: 2.068900 LR: 0.00002996 [09:12:43] Epoch: 1 Batch: 4259/20099 (21.19%) Loss: 2.314376 LR: 0.00002996 [09:12:46] Epoch: 1 Batch: 4260/20099 (21.20%) Loss: 2.208560 LR: 0.00002996 [09:12:49] Epoch: 1 Batch: 4261/20099 (21.20%) Loss: 2.317142 LR: 0.00002996 [09:12:52] Epoch: 1 Batch: 4262/20099 (21.21%) Loss: 2.045093 LR: 0.00002996 [09:12:55] Epoch: 1 Batch: 4263/20099 (21.21%) Loss: 2.006933 LR: 0.00002996 [09:12:58] Epoch: 1 Batch: 4264/20099 (21.21%) Loss: 2.137494 LR: 0.00002996 [09:13:01] Epoch: 1 Batch: 4265/20099 (21.22%) Loss: 2.223497 LR: 0.00002996 [09:13:04] Epoch: 1 Batch: 4266/20099 (21.22%) Loss: 2.173101 LR: 0.00002996 [09:13:07] Epoch: 1 Batch: 4267/20099 (21.23%) Loss: 2.159361 LR: 0.00002996 [09:13:11] Epoch: 1 Batch: 4268/20099 (21.23%) Loss: 2.252352 LR: 0.00002996 [09:13:14] Epoch: 1 Batch: 4269/20099 (21.24%) Loss: 2.341697 LR: 0.00002996 [09:13:17] Epoch: 1 Batch: 4270/20099 (21.24%) Loss: 2.381249 LR: 0.00002996 [09:13:20] Epoch: 1 Batch: 4271/20099 (21.25%) Loss: 2.287217 LR: 0.00002996 [09:13:23] Epoch: 1 Batch: 4272/20099 (21.25%) Loss: 2.047531 LR: 0.00002996 [09:13:26] Epoch: 1 Batch: 4273/20099 (21.26%) Loss: 2.165573 LR: 0.00002996 [09:13:29] Epoch: 1 Batch: 4274/20099 (21.26%) Loss: 2.083696 LR: 0.00002996 [09:13:32] Epoch: 1 Batch: 4275/20099 (21.27%) Loss: 2.101045 LR: 0.00002996 [09:13:35] Epoch: 1 Batch: 4276/20099 (21.27%) Loss: 1.866181 LR: 0.00002996 [09:13:38] Epoch: 1 Batch: 4277/20099 (21.28%) Loss: 2.118667 LR: 0.00002996 [09:13:42] Epoch: 1 Batch: 4278/20099 (21.28%) Loss: 2.220998 LR: 0.00002996 [09:13:45] Epoch: 1 Batch: 4279/20099 (21.29%) Loss: 1.847115 LR: 0.00002996 [09:13:48] Epoch: 1 Batch: 4280/20099 (21.29%) Loss: 2.511682 LR: 0.00002996 [09:13:51] Epoch: 1 Batch: 4281/20099 (21.30%) Loss: 2.290359 LR: 0.00002996 [09:13:54] Epoch: 1 Batch: 4282/20099 (21.30%) Loss: 
2.527822 LR: 0.00002996 [09:13:57] Epoch: 1 Batch: 4283/20099 (21.31%) Loss: 2.245917 LR: 0.00002996 [09:14:00] Epoch: 1 Batch: 4284/20099 (21.31%) Loss: 2.115796 LR: 0.00002996 [09:14:03] Epoch: 1 Batch: 4285/20099 (21.32%) Loss: 2.122059 LR: 0.00002996 [09:14:06] Epoch: 1 Batch: 4286/20099 (21.32%) Loss: 2.275984 LR: 0.00002996 [09:14:09] Epoch: 1 Batch: 4287/20099 (21.33%) Loss: 2.073956 LR: 0.00002996 [09:14:13] Epoch: 1 Batch: 4288/20099 (21.33%) Loss: 2.290770 LR: 0.00002996 [09:14:16] Epoch: 1 Batch: 4289/20099 (21.34%) Loss: 2.517737 LR: 0.00002996 [09:14:19] Epoch: 1 Batch: 4290/20099 (21.34%) Loss: 2.596697 LR: 0.00002996 [09:14:22] Epoch: 1 Batch: 4291/20099 (21.35%) Loss: 1.943339 LR: 0.00002996 [09:14:25] Epoch: 1 Batch: 4292/20099 (21.35%) Loss: 1.910061 LR: 0.00002996 [09:14:28] Epoch: 1 Batch: 4293/20099 (21.36%) Loss: 2.100767 LR: 0.00002996 [09:14:31] Epoch: 1 Batch: 4294/20099 (21.36%) Loss: 2.154790 LR: 0.00002996 [09:14:34] Epoch: 1 Batch: 4295/20099 (21.37%) Loss: 2.241067 LR: 0.00002996 [09:14:37] Epoch: 1 Batch: 4296/20099 (21.37%) Loss: 2.315400 LR: 0.00002996 [09:14:40] Epoch: 1 Batch: 4297/20099 (21.38%) Loss: 2.173643 LR: 0.00002996 [09:14:44] Epoch: 1 Batch: 4298/20099 (21.38%) Loss: 2.125968 LR: 0.00002996 [09:14:47] Epoch: 1 Batch: 4299/20099 (21.39%) Loss: 2.567380 LR: 0.00002996 [09:14:50] Epoch: 1 Batch: 4300/20099 (21.39%) Loss: 2.074193 LR: 0.00002996 [09:14:53] Epoch: 1 Batch: 4301/20099 (21.40%) Loss: 1.839112 LR: 0.00002996 [09:14:56] Epoch: 1 Batch: 4302/20099 (21.40%) Loss: 2.374906 LR: 0.00002996 [09:14:59] Epoch: 1 Batch: 4303/20099 (21.41%) Loss: 2.215290 LR: 0.00002996 [09:15:02] Epoch: 1 Batch: 4304/20099 (21.41%) Loss: 2.164253 LR: 0.00002996 [09:15:05] Epoch: 1 Batch: 4305/20099 (21.42%) Loss: 2.269198 LR: 0.00002995 [09:15:08] Epoch: 1 Batch: 4306/20099 (21.42%) Loss: 2.016145 LR: 0.00002995 [09:15:11] Epoch: 1 Batch: 4307/20099 (21.43%) Loss: 2.021875 LR: 0.00002995 [09:15:15] Epoch: 1 Batch: 4308/20099 (21.43%) 
Loss: 2.131211 LR: 0.00002995 [09:15:18] Epoch: 1 Batch: 4309/20099 (21.44%) Loss: 1.902801 LR: 0.00002995 [09:15:21] Epoch: 1 Batch: 4310/20099 (21.44%) Loss: 1.967012 LR: 0.00002995 [09:15:24] Epoch: 1 Batch: 4311/20099 (21.45%) Loss: 1.967475 LR: 0.00002995 [09:15:27] Epoch: 1 Batch: 4312/20099 (21.45%) Loss: 2.333742 LR: 0.00002995 [09:15:30] Epoch: 1 Batch: 4313/20099 (21.46%) Loss: 2.112101 LR: 0.00002995 [09:15:33] Epoch: 1 Batch: 4314/20099 (21.46%) Loss: 2.282474 LR: 0.00002995 [09:15:36] Epoch: 1 Batch: 4315/20099 (21.47%) Loss: 2.228675 LR: 0.00002995 [09:15:39] Epoch: 1 Batch: 4316/20099 (21.47%) Loss: 2.393400 LR: 0.00002995 [09:15:42] Epoch: 1 Batch: 4317/20099 (21.48%) Loss: 2.304865 LR: 0.00002995 [09:15:45] Epoch: 1 Batch: 4318/20099 (21.48%) Loss: 2.301402 LR: 0.00002995 [09:15:49] Epoch: 1 Batch: 4319/20099 (21.49%) Loss: 2.161675 LR: 0.00002995 [09:15:52] Epoch: 1 Batch: 4320/20099 (21.49%) Loss: 1.954080 LR: 0.00002995 [09:15:55] Epoch: 1 Batch: 4321/20099 (21.50%) Loss: 2.191655 LR: 0.00002995 [09:15:58] Epoch: 1 Batch: 4322/20099 (21.50%) Loss: 2.077162 LR: 0.00002995 [09:16:01] Epoch: 1 Batch: 4323/20099 (21.51%) Loss: 2.095188 LR: 0.00002995 [09:16:13] 2025-08-23 [09:16:14] Tesla T4 [09:16:14] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Active memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Requested memory | 0 B | 0 B | 0 B | 0 B | 
|---------------------------------------------------------------------------| | GPU reserved memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Active allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | GPU reserved segments | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [09:16:14] CPU usage: 59.3%, RAM usage: 27.2% [09:16:14] Running with the following configuration: [09:16:14] model_name: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B [09:16:14] tokenizer: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B [09:16:14] output_dir: /content/drive/MyDrive/llm/Discord-Hermes-3-8B [09:16:14] train_path: /content/drive/MyDrive/data/None156_fix.csv [09:16:14] checkpoint: /content/drive/MyDrive/llm/Discord-Hermes-3-8B/temp/epoch1_step4000 [09:16:14] lr: 3e-05 [09:16:14] lr_floor: 6e-06 [09:16:14] epochs: 1 [09:16:14] batch_size: 5 [09:16:14] accum_steps: 7 [09:16:14] val_batch_size: 6 [09:16:14] max_val_size: 100 [09:16:14] max_length: 150 [09:16:14] save_temp_frequency: 200 [09:16:14] save_frequency: 500 [09:16:14] eval_frequency: 500 [09:16:14] save_pattern: y [09:16:14] quantization: y [09:16:14] quantization_bits: 4 [09:16:14] lora: y [09:16:14] frozen_lora_path: None [09:16:14] lora_rank: 16 
[09:16:14] lora_alpha: 32 [09:16:14] lora_dropout: 0.1 [09:16:14] optimizer_weight_decay: 0.0 [09:16:14] warmup_type: cosine [09:16:14] warmup_ratio: 0.08 [09:16:14] warmup_steps: 550 [09:16:14] shuffle: y [09:16:14] csv_column: text [09:16:14] new_run: n [09:16:14] label_smoothing: 0.05 [09:16:14] SEED: 1 [09:16:14] Using device: cuda [09:16:14] Resuming from temp checkpoint: /content/drive/MyDrive/llm/Discord-Hermes-3-8B/temp/epoch1_step4000 [09:21:55] Embeddings shape after: torch.Size([128256, 4096]) [09:22:00] Loaded trainable LoRA adapter from /content/drive/MyDrive/llm/Discord-Hermes-3-8B/temp/epoch1_step4000 [09:22:00] Trainable LoRA 'default': [09:22:00] task_type: CAUSAL_LM [09:22:00] peft_type: PeftType.LORA [09:22:00] auto_mapping: None [09:22:00] base_model_name_or_path: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B [09:22:00] revision: None [09:22:01] inference_mode: False [09:22:01] r: 16 [09:22:01] target_modules: {'v_proj', 'q_proj', 'o_proj', 'k_proj'} [09:22:01] exclude_modules: None [09:22:01] lora_alpha: 32 [09:22:01] lora_dropout: 0.1 [09:22:01] fan_in_fan_out: False [09:22:01] bias: none [09:22:01] use_rslora: True [09:22:01] modules_to_save: None [09:22:01] init_lora_weights: True [09:22:01] layers_to_transform: None [09:22:01] layers_pattern: None [09:22:01] rank_pattern: {} [09:22:01] alpha_pattern: {} [09:22:01] megatron_config: None [09:22:01] megatron_core: megatron.core [09:22:01] trainable_token_indices: None [09:22:01] loftq_config: {} [09:22:01] eva_config: None [09:22:01] corda_config: None [09:22:01] use_dora: False [09:22:01] use_qalora: False [09:22:01] qalora_group_size: 16 [09:22:01] layer_replication: None [09:22:01] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) [09:22:01] lora_bias: False [09:22:01] target_parameters: None [09:22:01] _custom_modules: None [09:22:01] Embeddings shape after: torch.Size([128256, 4096]) [09:22:06] Resumed from epoch 1, step 4001, file 1 [09:22:06] Starting from 
CSV file... [09:22:09] Splitting data into chunks of 11000... [09:22:09] Using 7 processes across 10 chunks [09:22:10] Using saved train/val split from checkpoint. [09:22:10] Resuming scheduler with warmup steps: 229, total steps: 2871 [09:22:10] Initializing scheduler with cosine schedule with warmup, warmup steps 550, total steps: 2871 [09:22:10] Train/Val split: 100492 train, 100 val samples. [09:22:19] Model: PeftModelForCausalLM [09:22:19] Model config: LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": 128040, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pretraining_tp": 1, "quantization_config": { "_load_in_4bit": true, "_load_in_8bit": false, "bnb_4bit_compute_dtype": "float16", "bnb_4bit_quant_storage": "uint8", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "llm_int8_enable_fp32_cpu_offload": false, "llm_int8_has_fp16_weight": false, "llm_int8_skip_modules": [ "lm_head" ], "llm_int8_threshold": 6.0, "load_in_4bit": true, "load_in_8bit": false, "quant_method": "bitsandbytes" }, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.55.2", "use_cache": true, "vocab_size": 128256 } [09:22:19] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 [09:22:19] Optimizer: PagedAdamW ( Parameter Group 0 alpha: 0.0 betas: (0.9, 0.95) eps: 1e-08 initial_lr: 3e-05 lr: 0.0 t_alpha: None t_beta3: None weight_decay: 0.0 ) [09:22:19] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 [09:22:19] 
Scheduler: [09:22:19] Training on 100492 training samples, 100 validation samples [09:22:19] Average tokens per sample: 150.00 [09:22:19] Estimated epoch time: ~308.02 min [09:22:19] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 5986 MiB | 7004 MiB | 335397 MiB | 329410 MiB | |---------------------------------------------------------------------------| | Active memory | 5986 MiB | 7004 MiB | 335397 MiB | 329410 MiB | |---------------------------------------------------------------------------| | Requested memory | 5983 MiB | 7000 MiB | 335022 MiB | 329039 MiB | |---------------------------------------------------------------------------| | GPU reserved memory | 7248 MiB | 7248 MiB | 7248 MiB | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 1261 MiB | 5879 MiB | 328754 MiB | 327493 MiB | |---------------------------------------------------------------------------| | Allocations | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | Active allocs | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | GPU reserved segments | 185 | 185 | 185 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 36 | 36 | 13826 | 13790 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | 
Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [09:22:19] Restoring shuffle indices from training state for epoch 1 [09:22:19] CPU usage: 46.3%, RAM usage: 37.1% [09:22:20] Epoch 1 learning rate: 0.0 [09:22:20] Starting epoch 1 [09:22:32] Batch 4001: input_ids shape torch.Size([5, 140]), attention_mask shape torch.Size([5, 140]) [09:22:34] Epoch: 1 Batch: 4001/20099 (19.91%) Loss: 2.366458 LR: 0.00000000 [09:22:36] Epoch: 1 Batch: 4002/20099 (19.91%) Loss: 2.154430 LR: 0.00000000 [09:22:37] Epoch: 1 Batch: 4003/20099 (19.92%) Loss: 2.147170 LR: 0.00000000 [09:22:39] Epoch: 1 Batch: 4004/20099 (19.92%) Loss: 2.156662 LR: 0.00000000 [09:22:41] Epoch: 1 Batch: 4005/20099 (19.93%) Loss: 2.155010 LR: 0.00000000 [09:22:42] Epoch: 1 Batch: 4006/20099 (19.93%) Loss: 2.219366 LR: 0.00000000 [09:22:44] Epoch: 1 Batch: 4007/20099 (19.94%) Loss: 2.197301 LR: 0.00002999 [09:22:46] Epoch: 1 Batch: 4008/20099 (19.94%) Loss: 2.097772 LR: 0.00002999 [09:22:47] Epoch: 1 Batch: 4009/20099 (19.95%) Loss: 2.143481 LR: 0.00002999 [09:22:49] Epoch: 1 Batch: 4010/20099 (19.95%) Loss: 1.808341 LR: 0.00002999 [09:22:51] Epoch: 1 Batch: 4011/20099 (19.96%) Loss: 2.078780 LR: 0.00002999 [09:22:53] Epoch: 1 Batch: 4012/20099 (19.96%) Loss: 2.126783 LR: 0.00002999 [09:22:54] Epoch: 1 Batch: 4013/20099 (19.97%) Loss: 2.034043 LR: 0.00002999 [09:22:56] Epoch: 1 Batch: 4014/20099 (19.97%) Loss: 2.210934 LR: 0.00002999 [09:22:58] Epoch: 1 Batch: 4015/20099 (19.98%) Loss: 2.196439 LR: 0.00002999 [09:23:00] Epoch: 1 Batch: 4016/20099 (19.98%) Loss: 2.149901 LR: 0.00002999 [09:23:02] Epoch: 1 Batch: 4017/20099 (19.99%) Loss: 2.283413 LR: 0.00002999 [09:23:03] Epoch: 1 Batch: 4018/20099 (19.99%) Loss: 2.181970 LR: 0.00002999 [09:23:05] Epoch: 1 Batch: 4019/20099 (20.00%) Loss: 2.062837 LR: 0.00002999 [09:23:07] Epoch: 1 Batch: 4020/20099 (20.00%) Loss: 2.212120 LR: 0.00002999 [09:23:09] Epoch: 1 Batch: 4021/20099 (20.01%) Loss: 
2.131150 LR: 0.00002999 [09:23:11] Epoch: 1 Batch: 4022/20099 (20.01%) Loss: 2.434605 LR: 0.00002999 [09:23:13] Epoch: 1 Batch: 4023/20099 (20.02%) Loss: 2.162192 LR: 0.00002999 [09:23:14] Epoch: 1 Batch: 4024/20099 (20.02%) Loss: 2.050821 LR: 0.00002999 [09:23:16] Epoch: 1 Batch: 4025/20099 (20.03%) Loss: 2.139873 LR: 0.00002999 [09:23:18] Epoch: 1 Batch: 4026/20099 (20.03%) Loss: 2.136459 LR: 0.00002999 [09:23:20] Epoch: 1 Batch: 4027/20099 (20.04%) Loss: 2.295009 LR: 0.00002999 [09:23:22] Epoch: 1 Batch: 4028/20099 (20.04%) Loss: 2.159074 LR: 0.00002999 [09:23:24] Epoch: 1 Batch: 4029/20099 (20.05%) Loss: 1.936129 LR: 0.00002999 [09:23:25] Epoch: 1 Batch: 4030/20099 (20.05%) Loss: 2.432516 LR: 0.00002999 [09:23:27] Epoch: 1 Batch: 4031/20099 (20.06%) Loss: 2.049529 LR: 0.00002999 [09:23:29] Epoch: 1 Batch: 4032/20099 (20.06%) Loss: 2.282998 LR: 0.00002999 [09:23:31] Epoch: 1 Batch: 4033/20099 (20.07%) Loss: 1.783732 LR: 0.00002999 [09:23:33] Epoch: 1 Batch: 4034/20099 (20.07%) Loss: 2.098824 LR: 0.00002999 [09:23:35] Epoch: 1 Batch: 4035/20099 (20.08%) Loss: 1.877168 LR: 0.00002999 [09:23:36] Epoch: 1 Batch: 4036/20099 (20.08%) Loss: 2.196977 LR: 0.00002999 [09:23:38] Epoch: 1 Batch: 4037/20099 (20.09%) Loss: 2.253958 LR: 0.00002999 [09:23:40] Epoch: 1 Batch: 4038/20099 (20.09%) Loss: 2.112927 LR: 0.00002999 [09:23:42] Epoch: 1 Batch: 4039/20099 (20.10%) Loss: 2.108996 LR: 0.00002999 [09:23:43] Epoch: 1 Batch: 4040/20099 (20.10%) Loss: 2.197201 LR: 0.00002999 [09:23:45] Epoch: 1 Batch: 4041/20099 (20.11%) Loss: 2.118268 LR: 0.00002999 [09:23:47] Epoch: 1 Batch: 4042/20099 (20.11%) Loss: 2.102075 LR: 0.00002999 [09:23:49] Epoch: 1 Batch: 4043/20099 (20.12%) Loss: 1.874926 LR: 0.00002999 [09:23:50] Epoch: 1 Batch: 4044/20099 (20.12%) Loss: 2.071547 LR: 0.00002999 [09:23:52] Epoch: 1 Batch: 4045/20099 (20.13%) Loss: 2.150452 LR: 0.00002999 [09:23:54] Epoch: 1 Batch: 4046/20099 (20.13%) Loss: 2.297747 LR: 0.00002999 [09:23:56] Epoch: 1 Batch: 4047/20099 (20.14%) 
Loss: 1.896263 LR: 0.00002999 [09:23:57] Epoch: 1 Batch: 4048/20099 (20.14%) Loss: 1.922141 LR: 0.00002999 [09:23:59] Epoch: 1 Batch: 4049/20099 (20.15%) Loss: 2.147628 LR: 0.00002999 [09:24:01] Epoch: 1 Batch: 4050/20099 (20.15%) Loss: 2.255849 LR: 0.00002999 [09:24:02] Epoch: 1 Batch: 4051/20099 (20.16%) Loss: 2.060654 LR: 0.00002999 [09:24:04] Epoch: 1 Batch: 4052/20099 (20.16%) Loss: 2.400548 LR: 0.00002999 [09:24:06] Epoch: 1 Batch: 4053/20099 (20.17%) Loss: 2.142119 LR: 0.00002999 [09:24:08] Epoch: 1 Batch: 4054/20099 (20.17%) Loss: 2.210209 LR: 0.00002999 [09:24:09] Epoch: 1 Batch: 4055/20099 (20.18%) Loss: 2.173558 LR: 0.00002999 [09:24:11] Epoch: 1 Batch: 4056/20099 (20.18%) Loss: 2.590345 LR: 0.00002999 [09:24:13] Epoch: 1 Batch: 4057/20099 (20.19%) Loss: 2.077080 LR: 0.00002999 [09:24:15] Epoch: 1 Batch: 4058/20099 (20.19%) Loss: 2.250375 LR: 0.00002999 [09:24:17] Epoch: 1 Batch: 4059/20099 (20.20%) Loss: 2.055619 LR: 0.00002999 [09:24:18] Epoch: 1 Batch: 4060/20099 (20.20%) Loss: 2.027817 LR: 0.00002999 [09:24:20] Epoch: 1 Batch: 4061/20099 (20.20%) Loss: 2.447200 LR: 0.00002999 [09:24:22] Epoch: 1 Batch: 4062/20099 (20.21%) Loss: 1.974106 LR: 0.00002999 [09:24:24] Epoch: 1 Batch: 4063/20099 (20.21%) Loss: 2.211970 LR: 0.00002999 [09:24:25] Epoch: 1 Batch: 4064/20099 (20.22%) Loss: 1.975977 LR: 0.00002999 [09:24:27] Epoch: 1 Batch: 4065/20099 (20.22%) Loss: 2.059137 LR: 0.00002999 [09:24:29] Epoch: 1 Batch: 4066/20099 (20.23%) Loss: 2.382083 LR: 0.00002999 [09:24:31] Epoch: 1 Batch: 4067/20099 (20.23%) Loss: 2.404890 LR: 0.00002999 [09:24:32] Epoch: 1 Batch: 4068/20099 (20.24%) Loss: 1.885523 LR: 0.00002999 [09:24:34] Epoch: 1 Batch: 4069/20099 (20.24%) Loss: 2.361925 LR: 0.00002999 [09:24:36] Epoch: 1 Batch: 4070/20099 (20.25%) Loss: 2.402880 LR: 0.00002999 [09:24:38] Epoch: 1 Batch: 4071/20099 (20.25%) Loss: 1.990436 LR: 0.00002999 [09:24:40] Epoch: 1 Batch: 4072/20099 (20.26%) Loss: 2.249188 LR: 0.00002999 [09:24:41] Epoch: 1 Batch: 4073/20099 
(20.26%) Loss: 2.073479 LR: 0.00002999 [09:24:43] Epoch: 1 Batch: 4074/20099 (20.27%) Loss: 2.147360 LR: 0.00002999 [09:24:45] Epoch: 1 Batch: 4075/20099 (20.27%) Loss: 2.059235 LR: 0.00002999 [09:24:47] Epoch: 1 Batch: 4076/20099 (20.28%) Loss: 2.341625 LR: 0.00002999 [09:24:49] Epoch: 1 Batch: 4077/20099 (20.28%) Loss: 1.937874 LR: 0.00002999 [09:24:50] Epoch: 1 Batch: 4078/20099 (20.29%) Loss: 2.413152 LR: 0.00002999 [09:24:52] Epoch: 1 Batch: 4079/20099 (20.29%) Loss: 2.403069 LR: 0.00002999 [09:24:54] Epoch: 1 Batch: 4080/20099 (20.30%) Loss: 2.349684 LR: 0.00002999 [09:24:56] Epoch: 1 Batch: 4081/20099 (20.30%) Loss: 1.943343 LR: 0.00002999 [09:24:58] Epoch: 1 Batch: 4082/20099 (20.31%) Loss: 1.914011 LR: 0.00002999 [09:24:59] Epoch: 1 Batch: 4083/20099 (20.31%) Loss: 2.262524 LR: 0.00002999 [09:25:01] Epoch: 1 Batch: 4084/20099 (20.32%) Loss: 1.840936 LR: 0.00002999 [09:25:03] Epoch: 1 Batch: 4085/20099 (20.32%) Loss: 2.109431 LR: 0.00002999 [09:25:05] Epoch: 1 Batch: 4086/20099 (20.33%) Loss: 2.250267 LR: 0.00002999 [09:25:07] Epoch: 1 Batch: 4087/20099 (20.33%) Loss: 2.383049 LR: 0.00002999 [09:25:08] Epoch: 1 Batch: 4088/20099 (20.34%) Loss: 2.286646 LR: 0.00002999 [09:25:10] Epoch: 1 Batch: 4089/20099 (20.34%) Loss: 2.274777 LR: 0.00002999 [09:25:12] Epoch: 1 Batch: 4090/20099 (20.35%) Loss: 2.193700 LR: 0.00002999 [09:25:14] Epoch: 1 Batch: 4091/20099 (20.35%) Loss: 2.031152 LR: 0.00002999 [09:25:15] Epoch: 1 Batch: 4092/20099 (20.36%) Loss: 2.384892 LR: 0.00002999 [09:25:17] Epoch: 1 Batch: 4093/20099 (20.36%) Loss: 2.331565 LR: 0.00002999 [09:25:19] Epoch: 1 Batch: 4094/20099 (20.37%) Loss: 2.319333 LR: 0.00002999 [09:25:21] Epoch: 1 Batch: 4095/20099 (20.37%) Loss: 2.055644 LR: 0.00002999 [09:25:22] Epoch: 1 Batch: 4096/20099 (20.38%) Loss: 2.220170 LR: 0.00002999 [09:25:24] Epoch: 1 Batch: 4097/20099 (20.38%) Loss: 1.857415 LR: 0.00002999 [09:25:26] Epoch: 1 Batch: 4098/20099 (20.39%) Loss: 2.209712 LR: 0.00002999 [09:25:28] Epoch: 1 Batch: 
4099/20099 (20.39%) Loss: 2.097533 LR: 0.00002999 [09:25:29] Epoch: 1 Batch: 4100/20099 (20.40%) Loss: 1.895439 LR: 0.00002999 [09:25:31] Epoch: 1 Batch: 4101/20099 (20.40%) Loss: 2.386527 LR: 0.00002999 [09:25:33] Epoch: 1 Batch: 4102/20099 (20.41%) Loss: 2.126809 LR: 0.00002999 [09:25:35] Epoch: 1 Batch: 4103/20099 (20.41%) Loss: 1.895291 LR: 0.00002999 [09:25:36] Epoch: 1 Batch: 4104/20099 (20.42%) Loss: 2.300444 LR: 0.00002999 [09:25:38] Epoch: 1 Batch: 4105/20099 (20.42%) Loss: 2.172960 LR: 0.00002999 [09:25:40] Epoch: 1 Batch: 4106/20099 (20.43%) Loss: 2.231460 LR: 0.00002999 [09:25:42] Epoch: 1 Batch: 4107/20099 (20.43%) Loss: 2.201310 LR: 0.00002999 [09:25:44] Epoch: 1 Batch: 4108/20099 (20.44%) Loss: 2.143344 LR: 0.00002999 [09:25:45] Epoch: 1 Batch: 4109/20099 (20.44%) Loss: 2.373152 LR: 0.00002999 [09:25:47] Epoch: 1 Batch: 4110/20099 (20.45%) Loss: 2.182037 LR: 0.00002999 [09:25:49] Epoch: 1 Batch: 4111/20099 (20.45%) Loss: 2.102974 LR: 0.00002999 [09:25:51] Epoch: 1 Batch: 4112/20099 (20.46%) Loss: 2.067826 LR: 0.00002998 [09:25:52] Epoch: 1 Batch: 4113/20099 (20.46%) Loss: 2.168070 LR: 0.00002998 [09:25:54] Epoch: 1 Batch: 4114/20099 (20.47%) Loss: 2.286877 LR: 0.00002998 [09:25:56] Epoch: 1 Batch: 4115/20099 (20.47%) Loss: 2.317633 LR: 0.00002998 [09:25:58] Epoch: 1 Batch: 4116/20099 (20.48%) Loss: 2.073340 LR: 0.00002998 [09:26:00] Epoch: 1 Batch: 4117/20099 (20.48%) Loss: 2.177355 LR: 0.00002998 [09:26:01] Epoch: 1 Batch: 4118/20099 (20.49%) Loss: 2.029275 LR: 0.00002998 [09:26:03] Epoch: 1 Batch: 4119/20099 (20.49%) Loss: 2.100441 LR: 0.00002998 [09:26:05] Epoch: 1 Batch: 4120/20099 (20.50%) Loss: 2.140800 LR: 0.00002998 [09:26:07] Epoch: 1 Batch: 4121/20099 (20.50%) Loss: 2.403699 LR: 0.00002998 [09:26:08] Epoch: 1 Batch: 4122/20099 (20.51%) Loss: 2.171704 LR: 0.00002998 [09:26:10] Epoch: 1 Batch: 4123/20099 (20.51%) Loss: 2.229477 LR: 0.00002998 [09:26:12] Epoch: 1 Batch: 4124/20099 (20.52%) Loss: 2.029306 LR: 0.00002998 [09:26:14] Epoch: 1 
Batch: 4125/20099 (20.52%) Loss: 2.129517 LR: 0.00002998 [09:26:16] Epoch: 1 Batch: 4126/20099 (20.53%) Loss: 2.322975 LR: 0.00002998 [09:26:17] Epoch: 1 Batch: 4127/20099 (20.53%) Loss: 2.291036 LR: 0.00002998 [09:26:19] Epoch: 1 Batch: 4128/20099 (20.54%) Loss: 2.168188 LR: 0.00002998 [09:26:21] Epoch: 1 Batch: 4129/20099 (20.54%) Loss: 2.214102 LR: 0.00002998 [09:26:23] Epoch: 1 Batch: 4130/20099 (20.55%) Loss: 2.447123 LR: 0.00002998 [09:26:25] Epoch: 1 Batch: 4131/20099 (20.55%) Loss: 2.279592 LR: 0.00002998 [09:26:26] Epoch: 1 Batch: 4132/20099 (20.56%) Loss: 2.349950 LR: 0.00002998 [09:26:28] Epoch: 1 Batch: 4133/20099 (20.56%) Loss: 2.163515 LR: 0.00002998 [09:26:30] Epoch: 1 Batch: 4134/20099 (20.57%) Loss: 2.402847 LR: 0.00002998 [09:26:32] Epoch: 1 Batch: 4135/20099 (20.57%) Loss: 2.186323 LR: 0.00002998 [09:26:33] Epoch: 1 Batch: 4136/20099 (20.58%) Loss: 2.135204 LR: 0.00002998 [09:26:35] Epoch: 1 Batch: 4137/20099 (20.58%) Loss: 2.153203 LR: 0.00002998 [09:26:37] Epoch: 1 Batch: 4138/20099 (20.59%) Loss: 2.337076 LR: 0.00002998 [09:26:39] Epoch: 1 Batch: 4139/20099 (20.59%) Loss: 2.117932 LR: 0.00002998 [09:26:41] Epoch: 1 Batch: 4140/20099 (20.60%) Loss: 2.210425 LR: 0.00002998 [09:26:42] Epoch: 1 Batch: 4141/20099 (20.60%) Loss: 2.249429 LR: 0.00002998 [09:26:44] Epoch: 1 Batch: 4142/20099 (20.61%) Loss: 2.196202 LR: 0.00002998 [09:26:46] Epoch: 1 Batch: 4143/20099 (20.61%) Loss: 2.108394 LR: 0.00002998 [09:26:48] Epoch: 1 Batch: 4144/20099 (20.62%) Loss: 1.956349 LR: 0.00002998 [09:26:49] Epoch: 1 Batch: 4145/20099 (20.62%) Loss: 2.279192 LR: 0.00002998 [09:26:51] Epoch: 1 Batch: 4146/20099 (20.63%) Loss: 2.406589 LR: 0.00002998 [09:26:53] Epoch: 1 Batch: 4147/20099 (20.63%) Loss: 1.886248 LR: 0.00002998 [09:26:55] Epoch: 1 Batch: 4148/20099 (20.64%) Loss: 2.000163 LR: 0.00002998 [09:26:56] Epoch: 1 Batch: 4149/20099 (20.64%) Loss: 1.887361 LR: 0.00002998 [09:26:58] Epoch: 1 Batch: 4150/20099 (20.65%) Loss: 1.984324 LR: 0.00002998 [09:27:00] Epoch: 
1 Batch: 4151/20099 (20.65%) Loss: 2.179041 LR: 0.00002998 [09:27:02] Epoch: 1 Batch: 4152/20099 (20.66%) Loss: 1.930087 LR: 0.00002998 [09:27:04] Epoch: 1 Batch: 4153/20099 (20.66%) Loss: 2.794794 LR: 0.00002998 [09:27:05] Epoch: 1 Batch: 4154/20099 (20.67%) Loss: 2.126315 LR: 0.00002998 [09:27:07] Epoch: 1 Batch: 4155/20099 (20.67%) Loss: 2.256185 LR: 0.00002998 [09:27:09] Epoch: 1 Batch: 4156/20099 (20.68%) Loss: 2.078226 LR: 0.00002998 [09:27:11] Epoch: 1 Batch: 4157/20099 (20.68%) Loss: 1.652144 LR: 0.00002998 [09:27:12] Epoch: 1 Batch: 4158/20099 (20.69%) Loss: 2.046209 LR: 0.00002998 [09:27:14] Epoch: 1 Batch: 4159/20099 (20.69%) Loss: 1.866931 LR: 0.00002998 [09:27:16] Epoch: 1 Batch: 4160/20099 (20.70%) Loss: 2.249739 LR: 0.00002998 [09:27:18] Epoch: 1 Batch: 4161/20099 (20.70%) Loss: 2.535205 LR: 0.00002998 [09:27:19] Epoch: 1 Batch: 4162/20099 (20.71%) Loss: 1.987071 LR: 0.00002998 [09:27:21] Epoch: 1 Batch: 4163/20099 (20.71%) Loss: 2.310194 LR: 0.00002998 [09:27:23] Epoch: 1 Batch: 4164/20099 (20.72%) Loss: 2.246069 LR: 0.00002998 [09:27:25] Epoch: 1 Batch: 4165/20099 (20.72%) Loss: 2.236383 LR: 0.00002998 [09:27:26] Epoch: 1 Batch: 4166/20099 (20.73%) Loss: 1.744655 LR: 0.00002998 [09:27:28] Epoch: 1 Batch: 4167/20099 (20.73%) Loss: 2.082060 LR: 0.00002998 [09:27:30] Epoch: 1 Batch: 4168/20099 (20.74%) Loss: 2.383420 LR: 0.00002998 [09:27:32] Epoch: 1 Batch: 4169/20099 (20.74%) Loss: 2.261171 LR: 0.00002998 [09:27:33] Epoch: 1 Batch: 4170/20099 (20.75%) Loss: 1.748674 LR: 0.00002998 [09:27:35] Epoch: 1 Batch: 4171/20099 (20.75%) Loss: 2.313165 LR: 0.00002998 [09:27:37] Epoch: 1 Batch: 4172/20099 (20.76%) Loss: 2.079988 LR: 0.00002998 [09:27:39] Epoch: 1 Batch: 4173/20099 (20.76%) Loss: 2.461200 LR: 0.00002998 [09:27:41] Epoch: 1 Batch: 4174/20099 (20.77%) Loss: 1.769288 LR: 0.00002998 [09:27:42] Epoch: 1 Batch: 4175/20099 (20.77%) Loss: 2.145441 LR: 0.00002998 [09:27:44] Epoch: 1 Batch: 4176/20099 (20.78%) Loss: 2.132227 LR: 0.00002998 [09:27:46] 
Epoch: 1 Batch: 4177/20099 (20.78%) Loss: 2.423449 LR: 0.00002998 [09:27:48] Epoch: 1 Batch: 4178/20099 (20.79%) Loss: 1.908737 LR: 0.00002998 [09:27:49] Epoch: 1 Batch: 4179/20099 (20.79%) Loss: 2.124351 LR: 0.00002998 [09:27:51] Epoch: 1 Batch: 4180/20099 (20.80%) Loss: 2.022864 LR: 0.00002998 [09:27:53] Epoch: 1 Batch: 4181/20099 (20.80%) Loss: 2.109617 LR: 0.00002998 [09:27:55] Epoch: 1 Batch: 4182/20099 (20.81%) Loss: 2.346232 LR: 0.00002998 [09:27:56] Epoch: 1 Batch: 4183/20099 (20.81%) Loss: 2.208992 LR: 0.00002998 [09:27:58] Epoch: 1 Batch: 4184/20099 (20.82%) Loss: 2.314929 LR: 0.00002998 [09:28:00] Epoch: 1 Batch: 4185/20099 (20.82%) Loss: 1.593492 LR: 0.00002998 [09:28:02] Epoch: 1 Batch: 4186/20099 (20.83%) Loss: 1.952594 LR: 0.00002998 [09:28:04] Epoch: 1 Batch: 4187/20099 (20.83%) Loss: 2.185266 LR: 0.00002998 [09:28:05] Epoch: 1 Batch: 4188/20099 (20.84%) Loss: 1.977003 LR: 0.00002998 [09:28:07] Epoch: 1 Batch: 4189/20099 (20.84%) Loss: 2.172154 LR: 0.00002997 [09:28:09] Epoch: 1 Batch: 4190/20099 (20.85%) Loss: 2.205625 LR: 0.00002997 [09:28:11] Epoch: 1 Batch: 4191/20099 (20.85%) Loss: 2.313609 LR: 0.00002997 [09:28:13] Epoch: 1 Batch: 4192/20099 (20.86%) Loss: 2.151612 LR: 0.00002997 [09:28:14] Epoch: 1 Batch: 4193/20099 (20.86%) Loss: 2.159230 LR: 0.00002997 [09:28:16] Epoch: 1 Batch: 4194/20099 (20.87%) Loss: 2.171582 LR: 0.00002997 [09:28:18] Epoch: 1 Batch: 4195/20099 (20.87%) Loss: 2.472545 LR: 0.00002997 [09:28:20] Epoch: 1 Batch: 4196/20099 (20.88%) Loss: 2.231366 LR: 0.00002997 [09:28:21] Epoch: 1 Batch: 4197/20099 (20.88%) Loss: 2.498285 LR: 0.00002997 [09:28:23] Epoch: 1 Batch: 4198/20099 (20.89%) Loss: 1.781743 LR: 0.00002997 [09:28:25] Epoch: 1 Batch: 4199/20099 (20.89%) Loss: 2.474333 LR: 0.00002997 [09:28:42] >> Temp checkpoint saved: epoch1_step4200, size: 0.1693 GB [09:28:42] Epoch: 1 Batch: 4200/20099 (20.90%) Loss: 2.258252 LR: 0.00002997 [09:28:44] Epoch: 1 Batch: 4201/20099 (20.90%) Loss: 2.217326 LR: 0.00002997 [09:28:46] 
Epoch: 1 Batch: 4202/20099 (20.91%) Loss: 1.906879 LR: 0.00002997 [09:28:47] Epoch: 1 Batch: 4203/20099 (20.91%) Loss: 1.988648 LR: 0.00002997 [09:28:49] Epoch: 1 Batch: 4204/20099 (20.92%) Loss: 2.493227 LR: 0.00002997 [09:28:52] Epoch: 1 Batch: 4205/20099 (20.92%) Loss: 2.081568 LR: 0.00002997 [09:28:54] Epoch: 1 Batch: 4206/20099 (20.93%) Loss: 2.326968 LR: 0.00002997 [09:28:56] Epoch: 1 Batch: 4207/20099 (20.93%) Loss: 2.204204 LR: 0.00002997 [09:28:58] Epoch: 1 Batch: 4208/20099 (20.94%) Loss: 2.013638 LR: 0.00002997 [09:29:00] Epoch: 1 Batch: 4209/20099 (20.94%) Loss: 2.173445 LR: 0.00002997 [09:29:01] Epoch: 1 Batch: 4210/20099 (20.95%) Loss: 2.294368 LR: 0.00002997 [09:29:03] Epoch: 1 Batch: 4211/20099 (20.95%) Loss: 2.000285 LR: 0.00002997 [09:29:05] Epoch: 1 Batch: 4212/20099 (20.96%) Loss: 2.212265 LR: 0.00002997 [09:29:07] Epoch: 1 Batch: 4213/20099 (20.96%) Loss: 2.269067 LR: 0.00002997 [09:29:09] Epoch: 1 Batch: 4214/20099 (20.97%) Loss: 2.209731 LR: 0.00002997 [09:29:11] Epoch: 1 Batch: 4215/20099 (20.97%) Loss: 2.092431 LR: 0.00002997 [09:29:12] Epoch: 1 Batch: 4216/20099 (20.98%) Loss: 2.491425 LR: 0.00002997 [09:29:14] Epoch: 1 Batch: 4217/20099 (20.98%) Loss: 2.283398 LR: 0.00002997 [09:29:16] Epoch: 1 Batch: 4218/20099 (20.99%) Loss: 2.120367 LR: 0.00002997 [09:29:18] Epoch: 1 Batch: 4219/20099 (20.99%) Loss: 2.099170 LR: 0.00002997 [09:29:20] Epoch: 1 Batch: 4220/20099 (21.00%) Loss: 2.306964 LR: 0.00002997 [09:29:22] Epoch: 1 Batch: 4221/20099 (21.00%) Loss: 2.225070 LR: 0.00002997 [09:29:24] Epoch: 1 Batch: 4222/20099 (21.01%) Loss: 2.166511 LR: 0.00002997 [09:29:26] Epoch: 1 Batch: 4223/20099 (21.01%) Loss: 2.240740 LR: 0.00002997 [09:29:27] Epoch: 1 Batch: 4224/20099 (21.02%) Loss: 1.895833 LR: 0.00002997 [09:29:29] Epoch: 1 Batch: 4225/20099 (21.02%) Loss: 2.339063 LR: 0.00002997 [09:29:31] Epoch: 1 Batch: 4226/20099 (21.03%) Loss: 2.540421 LR: 0.00002997 [09:29:33] Epoch: 1 Batch: 4227/20099 (21.03%) Loss: 2.084920 LR: 0.00002997 
[09:29:35] Epoch: 1 Batch: 4228/20099 (21.04%) Loss: 2.009688 LR: 0.00002997 [09:29:36] Epoch: 1 Batch: 4229/20099 (21.04%) Loss: 2.044715 LR: 0.00002997 [09:29:38] Epoch: 1 Batch: 4230/20099 (21.05%) Loss: 2.163644 LR: 0.00002997 [09:29:40] Epoch: 1 Batch: 4231/20099 (21.05%) Loss: 2.207278 LR: 0.00002997 [09:29:42] Epoch: 1 Batch: 4232/20099 (21.06%) Loss: 2.156269 LR: 0.00002997 [09:29:43] Epoch: 1 Batch: 4233/20099 (21.06%) Loss: 2.457905 LR: 0.00002997 [09:29:45] Epoch: 1 Batch: 4234/20099 (21.07%) Loss: 1.904993 LR: 0.00002997 [09:29:47] Epoch: 1 Batch: 4235/20099 (21.07%) Loss: 2.386817 LR: 0.00002997 [09:29:49] Epoch: 1 Batch: 4236/20099 (21.08%) Loss: 2.200943 LR: 0.00002997 [09:29:50] Epoch: 1 Batch: 4237/20099 (21.08%) Loss: 1.952594 LR: 0.00002997 [09:29:52] Epoch: 1 Batch: 4238/20099 (21.09%) Loss: 2.115129 LR: 0.00002997 [09:29:54] Epoch: 1 Batch: 4239/20099 (21.09%) Loss: 2.038809 LR: 0.00002997 [09:29:56] Epoch: 1 Batch: 4240/20099 (21.10%) Loss: 2.038389 LR: 0.00002997 [09:29:57] Epoch: 1 Batch: 4241/20099 (21.10%) Loss: 2.027530 LR: 0.00002997 [09:29:59] Epoch: 1 Batch: 4242/20099 (21.11%) Loss: 2.317637 LR: 0.00002997 [09:30:01] Epoch: 1 Batch: 4243/20099 (21.11%) Loss: 2.314612 LR: 0.00002997 [09:30:03] Epoch: 1 Batch: 4244/20099 (21.12%) Loss: 1.980255 LR: 0.00002997 [09:30:04] Epoch: 1 Batch: 4245/20099 (21.12%) Loss: 2.140796 LR: 0.00002997 [09:30:06] Epoch: 1 Batch: 4246/20099 (21.13%) Loss: 2.130179 LR: 0.00002997 [09:30:08] Epoch: 1 Batch: 4247/20099 (21.13%) Loss: 2.303821 LR: 0.00002997 [09:30:10] Epoch: 1 Batch: 4248/20099 (21.14%) Loss: 2.132404 LR: 0.00002997 [09:30:11] Epoch: 1 Batch: 4249/20099 (21.14%) Loss: 2.222065 LR: 0.00002997 [09:30:13] Epoch: 1 Batch: 4250/20099 (21.15%) Loss: 2.548959 LR: 0.00002997 [09:30:15] Epoch: 1 Batch: 4251/20099 (21.15%) Loss: 2.378686 LR: 0.00002997 [09:30:17] Epoch: 1 Batch: 4252/20099 (21.16%) Loss: 2.146301 LR: 0.00002996 [09:30:18] Epoch: 1 Batch: 4253/20099 (21.16%) Loss: 1.932536 LR: 
0.00002996 [09:30:20] Epoch: 1 Batch: 4254/20099 (21.17%) Loss: 2.704013 LR: 0.00002996 [09:30:22] Epoch: 1 Batch: 4255/20099 (21.17%) Loss: 2.219529 LR: 0.00002996 [09:30:24] Epoch: 1 Batch: 4256/20099 (21.18%) Loss: 2.186404 LR: 0.00002996 [09:30:25] Epoch: 1 Batch: 4257/20099 (21.18%) Loss: 2.122137 LR: 0.00002996 [09:30:27] Epoch: 1 Batch: 4258/20099 (21.19%) Loss: 2.077111 LR: 0.00002996 [09:30:29] Epoch: 1 Batch: 4259/20099 (21.19%) Loss: 2.326340 LR: 0.00002996 [09:30:31] Epoch: 1 Batch: 4260/20099 (21.20%) Loss: 2.210035 LR: 0.00002996 [09:30:32] Epoch: 1 Batch: 4261/20099 (21.20%) Loss: 2.320212 LR: 0.00002996 [09:30:34] Epoch: 1 Batch: 4262/20099 (21.21%) Loss: 2.049881 LR: 0.00002996 [09:30:36] Epoch: 1 Batch: 4263/20099 (21.21%) Loss: 2.009214 LR: 0.00002996 [09:30:38] Epoch: 1 Batch: 4264/20099 (21.21%) Loss: 2.148879 LR: 0.00002996 [09:30:40] Epoch: 1 Batch: 4265/20099 (21.22%) Loss: 2.223458 LR: 0.00002996 [09:30:41] Epoch: 1 Batch: 4266/20099 (21.22%) Loss: 2.177021 LR: 0.00002996 [09:30:43] Epoch: 1 Batch: 4267/20099 (21.23%) Loss: 2.162271 LR: 0.00002996 [09:30:45] Epoch: 1 Batch: 4268/20099 (21.23%) Loss: 2.253125 LR: 0.00002996 [09:30:47] Epoch: 1 Batch: 4269/20099 (21.24%) Loss: 2.341412 LR: 0.00002996 [09:30:49] Epoch: 1 Batch: 4270/20099 (21.24%) Loss: 2.386431 LR: 0.00002996 [09:30:50] Epoch: 1 Batch: 4271/20099 (21.25%) Loss: 2.283540 LR: 0.00002996 [09:30:52] Epoch: 1 Batch: 4272/20099 (21.25%) Loss: 2.050317 LR: 0.00002996 [09:30:54] Epoch: 1 Batch: 4273/20099 (21.26%) Loss: 2.161394 LR: 0.00002996 [09:30:56] Epoch: 1 Batch: 4274/20099 (21.26%) Loss: 2.082860 LR: 0.00002996 [09:30:58] Epoch: 1 Batch: 4275/20099 (21.27%) Loss: 2.101066 LR: 0.00002996 [09:30:59] Epoch: 1 Batch: 4276/20099 (21.27%) Loss: 1.863072 LR: 0.00002996 [09:31:01] Epoch: 1 Batch: 4277/20099 (21.28%) Loss: 2.119669 LR: 0.00002996 [09:31:03] Epoch: 1 Batch: 4278/20099 (21.28%) Loss: 2.214416 LR: 0.00002996 [09:31:05] Epoch: 1 Batch: 4279/20099 (21.29%) Loss: 1.846871 
LR: 0.00002996 [09:31:07] Epoch: 1 Batch: 4280/20099 (21.29%) Loss: 2.516695 LR: 0.00002996 [09:31:08] Epoch: 1 Batch: 4281/20099 (21.30%) Loss: 2.295454 LR: 0.00002996 [09:31:10] Epoch: 1 Batch: 4282/20099 (21.30%) Loss: 2.537032 LR: 0.00002996 [09:31:12] Epoch: 1 Batch: 4283/20099 (21.31%) Loss: 2.249225 LR: 0.00002996 [09:31:14] Epoch: 1 Batch: 4284/20099 (21.31%) Loss: 2.117881 LR: 0.00002996 [09:31:15] Epoch: 1 Batch: 4285/20099 (21.32%) Loss: 2.106057 LR: 0.00002996 [09:31:17] Epoch: 1 Batch: 4286/20099 (21.32%) Loss: 2.270016 LR: 0.00002996 [09:31:19] Epoch: 1 Batch: 4287/20099 (21.33%) Loss: 2.063246 LR: 0.00002996 [09:31:21] Epoch: 1 Batch: 4288/20099 (21.33%) Loss: 2.291669 LR: 0.00002996 [09:31:22] Epoch: 1 Batch: 4289/20099 (21.34%) Loss: 2.507775 LR: 0.00002996 [09:31:24] Epoch: 1 Batch: 4290/20099 (21.34%) Loss: 2.583970 LR: 0.00002996 [09:31:26] Epoch: 1 Batch: 4291/20099 (21.35%) Loss: 1.926546 LR: 0.00002996 [09:31:28] Epoch: 1 Batch: 4292/20099 (21.35%) Loss: 1.906059 LR: 0.00002996 [09:31:29] Epoch: 1 Batch: 4293/20099 (21.36%) Loss: 2.087105 LR: 0.00002996 [09:31:31] Epoch: 1 Batch: 4294/20099 (21.36%) Loss: 2.145736 LR: 0.00002996 [09:31:33] Epoch: 1 Batch: 4295/20099 (21.37%) Loss: 2.237744 LR: 0.00002996 [09:31:35] Epoch: 1 Batch: 4296/20099 (21.37%) Loss: 2.306480 LR: 0.00002996 [09:31:36] Epoch: 1 Batch: 4297/20099 (21.38%) Loss: 2.151994 LR: 0.00002996 [09:31:38] Epoch: 1 Batch: 4298/20099 (21.38%) Loss: 2.125349 LR: 0.00002996 [09:31:40] Epoch: 1 Batch: 4299/20099 (21.39%) Loss: 2.565513 LR: 0.00002996 [09:31:42] Epoch: 1 Batch: 4300/20099 (21.39%) Loss: 2.073684 LR: 0.00002996 [09:31:44] Epoch: 1 Batch: 4301/20099 (21.40%) Loss: 1.843676 LR: 0.00002996 [09:31:45] Epoch: 1 Batch: 4302/20099 (21.40%) Loss: 2.379328 LR: 0.00002996 [09:31:47] Epoch: 1 Batch: 4303/20099 (21.41%) Loss: 2.206618 LR: 0.00002996 [09:31:49] Epoch: 1 Batch: 4304/20099 (21.41%) Loss: 2.164945 LR: 0.00002996 [09:31:51] Epoch: 1 Batch: 4305/20099 (21.42%) Loss: 
2.261166 LR: 0.00002996 [09:31:52] Epoch: 1 Batch: 4306/20099 (21.42%) Loss: 2.004408 LR: 0.00002996 [09:31:54] Epoch: 1 Batch: 4307/20099 (21.43%) Loss: 2.015667 LR: 0.00002996 [09:31:56] Epoch: 1 Batch: 4308/20099 (21.43%) Loss: 2.121432 LR: 0.00002995 [09:31:58] Epoch: 1 Batch: 4309/20099 (21.44%) Loss: 1.881698 LR: 0.00002995 [09:32:00] Epoch: 1 Batch: 4310/20099 (21.44%) Loss: 1.956137 LR: 0.00002995 [09:32:01] Epoch: 1 Batch: 4311/20099 (21.45%) Loss: 1.947452 LR: 0.00002995 [09:32:03] Epoch: 1 Batch: 4312/20099 (21.45%) Loss: 2.335690 LR: 0.00002995 [09:32:05] Epoch: 1 Batch: 4313/20099 (21.46%) Loss: 2.090460 LR: 0.00002995 [09:32:07] Epoch: 1 Batch: 4314/20099 (21.46%) Loss: 2.267272 LR: 0.00002995 [09:32:09] Epoch: 1 Batch: 4315/20099 (21.47%) Loss: 2.220538 LR: 0.00002995 [09:32:10] Epoch: 1 Batch: 4316/20099 (21.47%) Loss: 2.372441 LR: 0.00002995 [09:32:12] Epoch: 1 Batch: 4317/20099 (21.48%) Loss: 2.301841 LR: 0.00002995 [09:32:14] Epoch: 1 Batch: 4318/20099 (21.48%) Loss: 2.286405 LR: 0.00002995 [09:32:16] Epoch: 1 Batch: 4319/20099 (21.49%) Loss: 2.136866 LR: 0.00002995 [09:32:17] Epoch: 1 Batch: 4320/20099 (21.49%) Loss: 1.948267 LR: 0.00002995 [09:32:19] Epoch: 1 Batch: 4321/20099 (21.50%) Loss: 2.191081 LR: 0.00002995 [09:32:21] Epoch: 1 Batch: 4322/20099 (21.50%) Loss: 2.072840 LR: 0.00002995 [09:32:23] Epoch: 1 Batch: 4323/20099 (21.51%) Loss: 2.094398 LR: 0.00002995 [09:32:25] Epoch: 1 Batch: 4324/20099 (21.51%) Loss: 2.349399 LR: 0.00002995 [09:32:26] Epoch: 1 Batch: 4325/20099 (21.52%) Loss: 2.408298 LR: 0.00002995 [09:32:28] Epoch: 1 Batch: 4326/20099 (21.52%) Loss: 2.215339 LR: 0.00002995 [09:32:30] Epoch: 1 Batch: 4327/20099 (21.53%) Loss: 2.190173 LR: 0.00002995 [09:32:32] Epoch: 1 Batch: 4328/20099 (21.53%) Loss: 2.054258 LR: 0.00002995 [09:32:33] Epoch: 1 Batch: 4329/20099 (21.54%) Loss: 2.155009 LR: 0.00002995 [09:32:35] Epoch: 1 Batch: 4330/20099 (21.54%) Loss: 2.393301 LR: 0.00002995 [09:32:37] Epoch: 1 Batch: 4331/20099 (21.55%) 
Loss: 2.053289 LR: 0.00002995 [09:32:39] Epoch: 1 Batch: 4332/20099 (21.55%) Loss: 2.140115 LR: 0.00002995 [09:32:41] Epoch: 1 Batch: 4333/20099 (21.56%) Loss: 1.904420 LR: 0.00002995 [09:32:42] Epoch: 1 Batch: 4334/20099 (21.56%) Loss: 2.380860 LR: 0.00002995 [09:32:44] Epoch: 1 Batch: 4335/20099 (21.57%) Loss: 2.395250 LR: 0.00002995 [09:32:46] Epoch: 1 Batch: 4336/20099 (21.57%) Loss: 2.006323 LR: 0.00002995 [09:32:48] Epoch: 1 Batch: 4337/20099 (21.58%) Loss: 1.960952 LR: 0.00002995 [09:32:49] Epoch: 1 Batch: 4338/20099 (21.58%) Loss: 2.175974 LR: 0.00002995 [09:32:51] Epoch: 1 Batch: 4339/20099 (21.59%) Loss: 1.958958 LR: 0.00002995 [09:32:53] Epoch: 1 Batch: 4340/20099 (21.59%) Loss: 2.040513 LR: 0.00002995 [09:32:55] Epoch: 1 Batch: 4341/20099 (21.60%) Loss: 2.163926 LR: 0.00002995 [09:32:57] Epoch: 1 Batch: 4342/20099 (21.60%) Loss: 2.496896 LR: 0.00002995 [09:32:58] Epoch: 1 Batch: 4343/20099 (21.61%) Loss: 2.300030 LR: 0.00002995 [09:33:00] Epoch: 1 Batch: 4344/20099 (21.61%) Loss: 2.464136 LR: 0.00002995 [09:33:02] Epoch: 1 Batch: 4345/20099 (21.62%) Loss: 2.069293 LR: 0.00002995 [09:33:04] Epoch: 1 Batch: 4346/20099 (21.62%) Loss: 2.187393 LR: 0.00002995 [09:33:05] Epoch: 1 Batch: 4347/20099 (21.63%) Loss: 2.306781 LR: 0.00002995 [09:33:07] Epoch: 1 Batch: 4348/20099 (21.63%) Loss: 2.354031 LR: 0.00002995 [09:33:09] Epoch: 1 Batch: 4349/20099 (21.64%) Loss: 2.517716 LR: 0.00002995 [09:33:11] Epoch: 1 Batch: 4350/20099 (21.64%) Loss: 2.225712 LR: 0.00002994 [09:33:13] Epoch: 1 Batch: 4351/20099 (21.65%) Loss: 2.159921 LR: 0.00002994 [09:33:14] Epoch: 1 Batch: 4352/20099 (21.65%) Loss: 2.430187 LR: 0.00002994 [09:33:16] Epoch: 1 Batch: 4353/20099 (21.66%) Loss: 2.081634 LR: 0.00002994 [09:33:18] Epoch: 1 Batch: 4354/20099 (21.66%) Loss: 2.167129 LR: 0.00002994 [09:33:20] Epoch: 1 Batch: 4355/20099 (21.67%) Loss: 1.975859 LR: 0.00002994 [09:33:21] Epoch: 1 Batch: 4356/20099 (21.67%) Loss: 2.054198 LR: 0.00002994 [09:33:23] Epoch: 1 Batch: 4357/20099 
(21.68%) Loss: 1.911408 LR: 0.00002994 [09:33:25] Epoch: 1 Batch: 4358/20099 (21.68%) Loss: 2.143560 LR: 0.00002994 [09:33:27] Epoch: 1 Batch: 4359/20099 (21.69%) Loss: 1.897688 LR: 0.00002994 [09:33:29] Epoch: 1 Batch: 4360/20099 (21.69%) Loss: 2.172303 LR: 0.00002994 [09:33:30] Epoch: 1 Batch: 4361/20099 (21.70%) Loss: 2.082796 LR: 0.00002994 [09:33:32] Epoch: 1 Batch: 4362/20099 (21.70%) Loss: 2.255810 LR: 0.00002994 [09:33:34] Epoch: 1 Batch: 4363/20099 (21.71%) Loss: 2.193030 LR: 0.00002994 [09:33:36] Epoch: 1 Batch: 4364/20099 (21.71%) Loss: 2.429543 LR: 0.00002994 [09:33:37] Epoch: 1 Batch: 4365/20099 (21.72%) Loss: 2.062505 LR: 0.00002994 [09:33:39] Epoch: 1 Batch: 4366/20099 (21.72%) Loss: 2.008330 LR: 0.00002994 [09:33:41] Epoch: 1 Batch: 4367/20099 (21.73%) Loss: 2.009021 LR: 0.00002994 [09:33:43] Epoch: 1 Batch: 4368/20099 (21.73%) Loss: 2.295634 LR: 0.00002994 [09:33:45] Epoch: 1 Batch: 4369/20099 (21.74%) Loss: 2.109503 LR: 0.00002994 [09:33:46] Epoch: 1 Batch: 4370/20099 (21.74%) Loss: 2.626593 LR: 0.00002994 [09:33:48] Epoch: 1 Batch: 4371/20099 (21.75%) Loss: 2.263075 LR: 0.00002994 [09:33:50] Epoch: 1 Batch: 4372/20099 (21.75%) Loss: 1.986575 LR: 0.00002994 [09:33:52] Epoch: 1 Batch: 4373/20099 (21.76%) Loss: 2.441669 LR: 0.00002994 [09:33:53] Epoch: 1 Batch: 4374/20099 (21.76%) Loss: 2.191575 LR: 0.00002994 [09:33:55] Epoch: 1 Batch: 4375/20099 (21.77%) Loss: 2.324395 LR: 0.00002994 [09:33:57] Epoch: 1 Batch: 4376/20099 (21.77%) Loss: 2.465048 LR: 0.00002994 [09:33:59] Epoch: 1 Batch: 4377/20099 (21.78%) Loss: 2.180017 LR: 0.00002994 [09:34:01] Epoch: 1 Batch: 4378/20099 (21.78%) Loss: 2.246031 LR: 0.00002994 [09:34:02] Epoch: 1 Batch: 4379/20099 (21.79%) Loss: 2.357298 LR: 0.00002994 [09:34:04] Epoch: 1 Batch: 4380/20099 (21.79%) Loss: 2.220786 LR: 0.00002994 [09:34:06] Epoch: 1 Batch: 4381/20099 (21.80%) Loss: 1.845129 LR: 0.00002994 [09:34:08] Epoch: 1 Batch: 4382/20099 (21.80%) Loss: 2.698543 LR: 0.00002994 [09:34:09] Epoch: 1 Batch: 
4383/20099 (21.81%) Loss: 2.326270 LR: 0.00002994 [09:34:11] Epoch: 1 Batch: 4384/20099 (21.81%) Loss: 2.411268 LR: 0.00002994 [09:34:13] Epoch: 1 Batch: 4385/20099 (21.82%) Loss: 2.225788 LR: 0.00002994 [09:34:15] Epoch: 1 Batch: 4386/20099 (21.82%) Loss: 2.450372 LR: 0.00002994 [09:34:16] Epoch: 1 Batch: 4387/20099 (21.83%) Loss: 2.288461 LR: 0.00002994 [09:34:18] Epoch: 1 Batch: 4388/20099 (21.83%) Loss: 2.105160 LR: 0.00002994 [09:34:20] Epoch: 1 Batch: 4389/20099 (21.84%) Loss: 2.119358 LR: 0.00002994 [09:34:22] Epoch: 1 Batch: 4390/20099 (21.84%) Loss: 2.374735 LR: 0.00002994 [09:34:23] Epoch: 1 Batch: 4391/20099 (21.85%) Loss: 2.110728 LR: 0.00002994 [09:34:25] Epoch: 1 Batch: 4392/20099 (21.85%) Loss: 1.998562 LR: 0.00002993 [09:34:27] Epoch: 1 Batch: 4393/20099 (21.86%) Loss: 2.378952 LR: 0.00002993 [09:34:29] Epoch: 1 Batch: 4394/20099 (21.86%) Loss: 2.152264 LR: 0.00002993 [09:34:31] Epoch: 1 Batch: 4395/20099 (21.87%) Loss: 2.426760 LR: 0.00002993 [09:34:32] Epoch: 1 Batch: 4396/20099 (21.87%) Loss: 2.111558 LR: 0.00002993 [09:34:34] Epoch: 1 Batch: 4397/20099 (21.88%) Loss: 2.266134 LR: 0.00002993 [09:34:36] Epoch: 1 Batch: 4398/20099 (21.88%) Loss: 2.289913 LR: 0.00002993 [09:34:38] Epoch: 1 Batch: 4399/20099 (21.89%) Loss: 2.324952 LR: 0.00002993 [09:34:43] >> Cleaned up old temp checkpoint: epoch1_step2400 [09:34:43] >> Temp checkpoint saved: epoch1_step4400, size: 0.1693 GB [09:34:43] Epoch: 1 Batch: 4400/20099 (21.89%) Loss: 2.206054 LR: 0.00002993 [09:34:45] Epoch: 1 Batch: 4401/20099 (21.90%) Loss: 2.121865 LR: 0.00002993 [09:34:47] Epoch: 1 Batch: 4402/20099 (21.90%) Loss: 2.185474 LR: 0.00002993 [09:34:49] Epoch: 1 Batch: 4403/20099 (21.91%) Loss: 2.184878 LR: 0.00002993 [09:34:50] Epoch: 1 Batch: 4404/20099 (21.91%) Loss: 2.138655 LR: 0.00002993 [09:34:52] Epoch: 1 Batch: 4405/20099 (21.92%) Loss: 2.327180 LR: 0.00002993 [09:34:54] Epoch: 1 Batch: 4406/20099 (21.92%) Loss: 2.359757 LR: 0.00002993 [09:34:56] Epoch: 1 Batch: 4407/20099 (21.93%) 
Loss: 2.339371 LR: 0.00002993 [09:34:57] Epoch: 1 Batch: 4408/20099 (21.93%) Loss: 2.117037 LR: 0.00002993 [09:34:59] Epoch: 1 Batch: 4409/20099 (21.94%) Loss: 2.204730 LR: 0.00002993 [09:35:01] Epoch: 1 Batch: 4410/20099 (21.94%) Loss: 2.286284 LR: 0.00002993 [09:35:03] Epoch: 1 Batch: 4411/20099 (21.95%) Loss: 1.945703 LR: 0.00002993 [09:35:04] Epoch: 1 Batch: 4412/20099 (21.95%) Loss: 2.093357 LR: 0.00002993 [09:35:06] Epoch: 1 Batch: 4413/20099 (21.96%) Loss: 2.060310 LR: 0.00002993 [09:35:08] Epoch: 1 Batch: 4414/20099 (21.96%) Loss: 1.806569 LR: 0.00002993 [09:35:10] Epoch: 1 Batch: 4415/20099 (21.97%) Loss: 2.104291 LR: 0.00002993 [09:35:12] Epoch: 1 Batch: 4416/20099 (21.97%) Loss: 2.155068 LR: 0.00002993 [09:35:14] Epoch: 1 Batch: 4417/20099 (21.98%) Loss: 2.191786 LR: 0.00002993 [09:35:15] Epoch: 1 Batch: 4418/20099 (21.98%) Loss: 2.304741 LR: 0.00002993 [09:35:17] Epoch: 1 Batch: 4419/20099 (21.99%) Loss: 2.053975 LR: 0.00002993 [09:35:19] Epoch: 1 Batch: 4420/20099 (21.99%) Loss: 2.253861 LR: 0.00002993 [09:35:21] Epoch: 1 Batch: 4421/20099 (22.00%) Loss: 2.168701 LR: 0.00002993 [09:35:23] Epoch: 1 Batch: 4422/20099 (22.00%) Loss: 2.424770 LR: 0.00002993 [09:35:24] Epoch: 1 Batch: 4423/20099 (22.01%) Loss: 2.042711 LR: 0.00002993 [09:35:26] Epoch: 1 Batch: 4424/20099 (22.01%) Loss: 2.167008 LR: 0.00002993 [09:35:28] Epoch: 1 Batch: 4425/20099 (22.02%) Loss: 2.327759 LR: 0.00002993 [09:35:30] Epoch: 1 Batch: 4426/20099 (22.02%) Loss: 2.086819 LR: 0.00002993 [09:35:31] Epoch: 1 Batch: 4427/20099 (22.03%) Loss: 2.284888 LR: 0.00002993 [09:35:33] Epoch: 1 Batch: 4428/20099 (22.03%) Loss: 2.297292 LR: 0.00002993 [09:35:35] Epoch: 1 Batch: 4429/20099 (22.04%) Loss: 2.115367 LR: 0.00002993 [09:35:37] Epoch: 1 Batch: 4430/20099 (22.04%) Loss: 1.863683 LR: 0.00002993 [09:35:38] Epoch: 1 Batch: 4431/20099 (22.05%) Loss: 2.439022 LR: 0.00002993 [09:35:40] Epoch: 1 Batch: 4432/20099 (22.05%) Loss: 2.518859 LR: 0.00002993 [09:35:42] Epoch: 1 Batch: 4433/20099 
(22.06%) Loss: 2.156759 LR: 0.00002993 [09:35:44] Epoch: 1 Batch: 4434/20099 (22.06%) Loss: 2.174412 LR: 0.00002992 [09:35:46] Epoch: 1 Batch: 4435/20099 (22.07%) Loss: 2.162699 LR: 0.00002992 [09:35:47] Epoch: 1 Batch: 4436/20099 (22.07%) Loss: 2.155162 LR: 0.00002992 [09:35:49] Epoch: 1 Batch: 4437/20099 (22.08%) Loss: 2.214235 LR: 0.00002992 [09:35:51] Epoch: 1 Batch: 4438/20099 (22.08%) Loss: 2.099998 LR: 0.00002992 [09:35:53] Epoch: 1 Batch: 4439/20099 (22.09%) Loss: 2.228888 LR: 0.00002992 [09:35:54] Epoch: 1 Batch: 4440/20099 (22.09%) Loss: 2.161440 LR: 0.00002992 [09:35:56] Epoch: 1 Batch: 4441/20099 (22.10%) Loss: 2.240600 LR: 0.00002992 [09:35:58] Epoch: 1 Batch: 4442/20099 (22.10%) Loss: 1.745911 LR: 0.00002992 [09:36:00] Epoch: 1 Batch: 4443/20099 (22.11%) Loss: 2.077041 LR: 0.00002992 [09:36:01] Epoch: 1 Batch: 4444/20099 (22.11%) Loss: 1.929353 LR: 0.00002992 [09:36:03] Epoch: 1 Batch: 4445/20099 (22.12%) Loss: 2.288770 LR: 0.00002992 [09:36:05] Epoch: 1 Batch: 4446/20099 (22.12%) Loss: 1.893214 LR: 0.00002992 [09:36:07] Epoch: 1 Batch: 4447/20099 (22.13%) Loss: 1.982065 LR: 0.00002992 [09:36:08] Epoch: 1 Batch: 4448/20099 (22.13%) Loss: 2.151330 LR: 0.00002992 [09:36:10] Epoch: 1 Batch: 4449/20099 (22.14%) Loss: 2.103332 LR: 0.00002992 [09:36:12] Epoch: 1 Batch: 4450/20099 (22.14%) Loss: 2.271091 LR: 0.00002992 [09:36:14] Epoch: 1 Batch: 4451/20099 (22.15%) Loss: 2.141728 LR: 0.00002992 [09:36:16] Epoch: 1 Batch: 4452/20099 (22.15%) Loss: 2.021437 LR: 0.00002992 [09:36:17] Epoch: 1 Batch: 4453/20099 (22.16%) Loss: 1.957507 LR: 0.00002992 [09:36:19] Epoch: 1 Batch: 4454/20099 (22.16%) Loss: 2.156494 LR: 0.00002992 [09:36:21] Epoch: 1 Batch: 4455/20099 (22.17%) Loss: 2.340196 LR: 0.00002992 [09:36:23] Epoch: 1 Batch: 4456/20099 (22.17%) Loss: 1.897669 LR: 0.00002992 [09:36:24] Epoch: 1 Batch: 4457/20099 (22.18%) Loss: 2.221617 LR: 0.00002992 [09:36:26] Epoch: 1 Batch: 4458/20099 (22.18%) Loss: 2.199193 LR: 0.00002992 [09:36:28] Epoch: 1 Batch: 
4459/20099 (22.19%) Loss: 2.315684 LR: 0.00002992 [09:36:30] Epoch: 1 Batch: 4460/20099 (22.19%) Loss: 2.362048 LR: 0.00002992 [09:36:32] Epoch: 1 Batch: 4461/20099 (22.20%) Loss: 2.210023 LR: 0.00002992 [09:36:33] Epoch: 1 Batch: 4462/20099 (22.20%) Loss: 2.147963 LR: 0.00002992 [09:36:35] Epoch: 1 Batch: 4463/20099 (22.21%) Loss: 1.976379 LR: 0.00002992 [09:36:37] Epoch: 1 Batch: 4464/20099 (22.21%) Loss: 2.246976 LR: 0.00002992 [09:36:39] Epoch: 1 Batch: 4465/20099 (22.22%) Loss: 2.083721 LR: 0.00002992 [09:36:40] Epoch: 1 Batch: 4466/20099 (22.22%) Loss: 2.039861 LR: 0.00002992 [09:36:42] Epoch: 1 Batch: 4467/20099 (22.22%) Loss: 2.250999 LR: 0.00002992 [09:36:44] Epoch: 1 Batch: 4468/20099 (22.23%) Loss: 1.960066 LR: 0.00002992 [09:36:46] Epoch: 1 Batch: 4469/20099 (22.23%) Loss: 2.599974 LR: 0.00002991 [09:36:48] Epoch: 1 Batch: 4470/20099 (22.24%) Loss: 2.125574 LR: 0.00002991 [09:36:49] Epoch: 1 Batch: 4471/20099 (22.24%) Loss: 2.701041 LR: 0.00002991 [09:36:51] Epoch: 1 Batch: 4472/20099 (22.25%) Loss: 2.034093 LR: 0.00002991 [09:36:53] Epoch: 1 Batch: 4473/20099 (22.25%) Loss: 2.413813 LR: 0.00002991 [09:36:55] Epoch: 1 Batch: 4474/20099 (22.26%) Loss: 2.219178 LR: 0.00002991 [09:36:57] Epoch: 1 Batch: 4475/20099 (22.26%) Loss: 2.104830 LR: 0.00002991 [09:36:58] Epoch: 1 Batch: 4476/20099 (22.27%) Loss: 2.444024 LR: 0.00002991 [09:37:00] Epoch: 1 Batch: 4477/20099 (22.27%) Loss: 2.259976 LR: 0.00002991 [09:37:02] Epoch: 1 Batch: 4478/20099 (22.28%) Loss: 2.164045 LR: 0.00002991 [09:37:04] Epoch: 1 Batch: 4479/20099 (22.28%) Loss: 2.011524 LR: 0.00002991 [09:37:05] Epoch: 1 Batch: 4480/20099 (22.29%) Loss: 2.277252 LR: 0.00002991 [09:37:07] Epoch: 1 Batch: 4481/20099 (22.29%) Loss: 2.357504 LR: 0.00002991 [09:37:09] Epoch: 1 Batch: 4482/20099 (22.30%) Loss: 2.282357 LR: 0.00002991 [09:37:11] Epoch: 1 Batch: 4483/20099 (22.30%) Loss: 1.967951 LR: 0.00002991 [09:37:13] Epoch: 1 Batch: 4484/20099 (22.31%) Loss: 2.122877 LR: 0.00002991 [09:37:14] Epoch: 1 
Batch: 4485/20099 (22.31%) Loss: 1.947138 LR: 0.00002991 [09:37:16] Epoch: 1 Batch: 4486/20099 (22.32%) Loss: 2.280461 LR: 0.00002991 [09:37:18] Epoch: 1 Batch: 4487/20099 (22.32%) Loss: 2.120796 LR: 0.00002991 [09:37:20] Epoch: 1 Batch: 4488/20099 (22.33%) Loss: 2.518332 LR: 0.00002991 [09:37:22] Epoch: 1 Batch: 4489/20099 (22.33%) Loss: 2.352842 LR: 0.00002991 [09:37:23] Epoch: 1 Batch: 4490/20099 (22.34%) Loss: 2.109434 LR: 0.00002991 [09:37:25] Epoch: 1 Batch: 4491/20099 (22.34%) Loss: 2.188045 LR: 0.00002991 [09:37:27] Epoch: 1 Batch: 4492/20099 (22.35%) Loss: 2.203532 LR: 0.00002991 [09:37:29] Epoch: 1 Batch: 4493/20099 (22.35%) Loss: 2.153884 LR: 0.00002991 [09:37:30] Epoch: 1 Batch: 4494/20099 (22.36%) Loss: 2.190680 LR: 0.00002991 [09:37:32] Epoch: 1 Batch: 4495/20099 (22.36%) Loss: 2.276166 LR: 0.00002991 [09:37:34] Epoch: 1 Batch: 4496/20099 (22.37%) Loss: 2.142515 LR: 0.00002991 [09:37:36] Epoch: 1 Batch: 4497/20099 (22.37%) Loss: 2.343031 LR: 0.00002991 [09:37:37] Epoch: 1 Batch: 4498/20099 (22.38%) Loss: 2.166052 LR: 0.00002991 [09:37:39] Epoch: 1 Batch: 4499/20099 (22.38%) Loss: 2.138952 LR: 0.00002991 [09:37:41] >> Evaluating batch 0 [09:37:42] >> Evaluating batch 1 [09:37:43] >> Evaluating batch 2 [09:37:44] >> Evaluating batch 3 [09:37:45] >> Evaluating batch 4 [09:37:46] >> Evaluating batch 5 [09:37:47] >> Evaluating batch 6 [09:37:48] >> Evaluating batch 7 [09:37:49] >> Evaluating batch 8 [09:37:50] >> Evaluating batch 9 [09:37:51] >> Evaluating batch 10 [09:37:52] >> Evaluating batch 11 [09:37:53] >> Evaluating batch 12 [09:37:54] >> Evaluating batch 13 [09:37:55] >> Evaluating batch 14 [09:37:56] >> Evaluating batch 15 [09:37:57] >> Evaluating batch 16 [09:37:58] Epoch: 1 Step: 4500/20099 Evaluation: [09:37:58] Avg Loss Since Last Eval: 0.2422 Val Loss: 2.2500 Validation loss delta: 2.2500 Perplexity: 9.4881 LR: 0.00002991 [09:38:01] >> Checkpoint saved: epoch1_step4500, size: 0.1693 GB [09:38:01] Epoch: 1 Batch: 4500/20099 (22.39%) Loss: 
2.454383 LR: 0.00002991 [09:38:03] Epoch: 1 Batch: 4501/20099 (22.39%) Loss: 2.127920 LR: 0.00002991 [09:38:05] Epoch: 1 Batch: 4502/20099 (22.40%) Loss: 2.181894 LR: 0.00002991 [09:38:06] Epoch: 1 Batch: 4503/20099 (22.40%) Loss: 2.267566 LR: 0.00002991 [09:38:08] Epoch: 1 Batch: 4504/20099 (22.41%) Loss: 2.116218 LR: 0.00002991 [09:38:10] Epoch: 1 Batch: 4505/20099 (22.41%) Loss: 2.189499 LR: 0.00002991 [09:38:12] Epoch: 1 Batch: 4506/20099 (22.42%) Loss: 2.080168 LR: 0.00002991 [09:38:13] Epoch: 1 Batch: 4507/20099 (22.42%) Loss: 2.237687 LR: 0.00002991 [09:38:15] Epoch: 1 Batch: 4508/20099 (22.43%) Loss: 2.397248 LR: 0.00002991 [09:38:17] Epoch: 1 Batch: 4509/20099 (22.43%) Loss: 2.352885 LR: 0.00002991 [09:38:19] Epoch: 1 Batch: 4510/20099 (22.44%) Loss: 2.073200 LR: 0.00002991 [09:38:21] Epoch: 1 Batch: 4511/20099 (22.44%) Loss: 2.116466 LR: 0.00002990 [09:38:22] Epoch: 1 Batch: 4512/20099 (22.45%) Loss: 2.144737 LR: 0.00002990 [09:38:24] Epoch: 1 Batch: 4513/20099 (22.45%) Loss: 2.067570 LR: 0.00002990 [09:38:26] Epoch: 1 Batch: 4514/20099 (22.46%) Loss: 2.282880 LR: 0.00002990 [09:38:28] Epoch: 1 Batch: 4515/20099 (22.46%) Loss: 2.065158 LR: 0.00002990 [09:38:30] Epoch: 1 Batch: 4516/20099 (22.47%) Loss: 2.179939 LR: 0.00002990 [09:38:31] Epoch: 1 Batch: 4517/20099 (22.47%) Loss: 2.038934 LR: 0.00002990 [09:38:33] Epoch: 1 Batch: 4518/20099 (22.48%) Loss: 1.920319 LR: 0.00002990 [09:38:35] Epoch: 1 Batch: 4519/20099 (22.48%) Loss: 2.322344 LR: 0.00002990 [09:38:37] Epoch: 1 Batch: 4520/20099 (22.49%) Loss: 2.012674 LR: 0.00002990 [09:38:39] Epoch: 1 Batch: 4521/20099 (22.49%) Loss: 2.306949 LR: 0.00002990 [09:38:40] Epoch: 1 Batch: 4522/20099 (22.50%) Loss: 2.251603 LR: 0.00002990 [09:38:42] Epoch: 1 Batch: 4523/20099 (22.50%) Loss: 2.471321 LR: 0.00002990 [09:38:44] Epoch: 1 Batch: 4524/20099 (22.51%) Loss: 2.025124 LR: 0.00002990 [09:38:46] Epoch: 1 Batch: 4525/20099 (22.51%) Loss: 2.160552 LR: 0.00002990 [09:38:48] Epoch: 1 Batch: 4526/20099 (22.52%) 
Loss: 2.231827 LR: 0.00002990 [09:38:49] Epoch: 1 Batch: 4527/20099 (22.52%) Loss: 2.199973 LR: 0.00002990 [09:38:51] Epoch: 1 Batch: 4528/20099 (22.53%) Loss: 2.129307 LR: 0.00002990 [09:38:53] Epoch: 1 Batch: 4529/20099 (22.53%) Loss: 2.103114 LR: 0.00002990 [09:38:55] Epoch: 1 Batch: 4530/20099 (22.54%) Loss: 2.203794 LR: 0.00002990 [09:38:56] Epoch: 1 Batch: 4531/20099 (22.54%) Loss: 2.448875 LR: 0.00002990 [09:38:58] Epoch: 1 Batch: 4532/20099 (22.55%) Loss: 2.394624 LR: 0.00002990 [09:39:00] Epoch: 1 Batch: 4533/20099 (22.55%) Loss: 2.171654 LR: 0.00002990 [09:39:02] Epoch: 1 Batch: 4534/20099 (22.56%) Loss: 2.588477 LR: 0.00002990 [09:39:04] Epoch: 1 Batch: 4535/20099 (22.56%) Loss: 2.148917 LR: 0.00002990 [09:39:05] Epoch: 1 Batch: 4536/20099 (22.57%) Loss: 2.274196 LR: 0.00002990 [09:39:07] Epoch: 1 Batch: 4537/20099 (22.57%) Loss: 1.942928 LR: 0.00002990 [09:39:09] Epoch: 1 Batch: 4538/20099 (22.58%) Loss: 2.302236 LR: 0.00002990 [09:39:11] Epoch: 1 Batch: 4539/20099 (22.58%) Loss: 2.179597 LR: 0.00002989 [09:39:12] Epoch: 1 Batch: 4540/20099 (22.59%) Loss: 2.244758 LR: 0.00002989 [09:39:14] Epoch: 1 Batch: 4541/20099 (22.59%) Loss: 2.412387 LR: 0.00002989 [09:39:16] Epoch: 1 Batch: 4542/20099 (22.60%) Loss: 2.156938 LR: 0.00002989 [09:39:18] Epoch: 1 Batch: 4543/20099 (22.60%) Loss: 2.259222 LR: 0.00002989 [09:39:19] Epoch: 1 Batch: 4544/20099 (22.61%) Loss: 1.942501 LR: 0.00002989 [09:39:21] Epoch: 1 Batch: 4545/20099 (22.61%) Loss: 2.215254 LR: 0.00002989 [09:39:23] Epoch: 1 Batch: 4546/20099 (22.62%) Loss: 2.195856 LR: 0.00002989 [09:39:25] Epoch: 1 Batch: 4547/20099 (22.62%) Loss: 2.046872 LR: 0.00002989 [09:39:26] Epoch: 1 Batch: 4548/20099 (22.63%) Loss: 2.068745 LR: 0.00002989 [09:39:28] Epoch: 1 Batch: 4549/20099 (22.63%) Loss: 2.048756 LR: 0.00002989 [09:39:30] Epoch: 1 Batch: 4550/20099 (22.64%) Loss: 2.158098 LR: 0.00002989 [09:39:32] Epoch: 1 Batch: 4551/20099 (22.64%) Loss: 2.373278 LR: 0.00002989 [09:39:34] Epoch: 1 Batch: 4552/20099 
(22.65%) Loss: 2.208424 LR: 0.00002989 [09:39:35] Epoch: 1 Batch: 4553/20099 (22.65%) Loss: 1.723308 LR: 0.00002989 [09:39:37] Epoch: 1 Batch: 4554/20099 (22.66%) Loss: 2.233638 LR: 0.00002989 [09:39:39] Epoch: 1 Batch: 4555/20099 (22.66%) Loss: 2.200091 LR: 0.00002989 [09:39:41] Epoch: 1 Batch: 4556/20099 (22.67%) Loss: 2.019238 LR: 0.00002989 [09:39:42] Epoch: 1 Batch: 4557/20099 (22.67%) Loss: 2.027067 LR: 0.00002989 [09:39:44] Epoch: 1 Batch: 4558/20099 (22.68%) Loss: 2.246422 LR: 0.00002989 [09:39:46] Epoch: 1 Batch: 4559/20099 (22.68%) Loss: 2.019165 LR: 0.00002989 [09:39:48] Epoch: 1 Batch: 4560/20099 (22.69%) Loss: 2.247591 LR: 0.00002989 [09:39:49] Epoch: 1 Batch: 4561/20099 (22.69%) Loss: 2.628521 LR: 0.00002989 [09:39:51] Epoch: 1 Batch: 4562/20099 (22.70%) Loss: 2.084549 LR: 0.00002989 [09:39:53] Epoch: 1 Batch: 4563/20099 (22.70%) Loss: 2.271124 LR: 0.00002989 [09:39:55] Epoch: 1 Batch: 4564/20099 (22.71%) Loss: 2.149997 LR: 0.00002989 [09:39:57] Epoch: 1 Batch: 4565/20099 (22.71%) Loss: 2.408524 LR: 0.00002989 [09:39:58] Epoch: 1 Batch: 4566/20099 (22.72%) Loss: 2.233233 LR: 0.00002989 [09:40:00] Epoch: 1 Batch: 4567/20099 (22.72%) Loss: 2.063513 LR: 0.00002989 [09:40:02] Epoch: 1 Batch: 4568/20099 (22.73%) Loss: 2.072395 LR: 0.00002989 [09:40:04] Epoch: 1 Batch: 4569/20099 (22.73%) Loss: 2.322791 LR: 0.00002989 [09:40:05] Epoch: 1 Batch: 4570/20099 (22.74%) Loss: 2.111361 LR: 0.00002989 [09:40:07] Epoch: 1 Batch: 4571/20099 (22.74%) Loss: 2.174940 LR: 0.00002989 [09:40:09] Epoch: 1 Batch: 4572/20099 (22.75%) Loss: 2.269704 LR: 0.00002989 [09:40:11] Epoch: 1 Batch: 4573/20099 (22.75%) Loss: 2.210771 LR: 0.00002989 [09:40:12] Epoch: 1 Batch: 4574/20099 (22.76%) Loss: 2.143299 LR: 0.00002988 [09:40:14] Epoch: 1 Batch: 4575/20099 (22.76%) Loss: 2.013377 LR: 0.00002988 [09:40:16] Epoch: 1 Batch: 4576/20099 (22.77%) Loss: 2.094445 LR: 0.00002988 [09:40:18] Epoch: 1 Batch: 4577/20099 (22.77%) Loss: 2.169196 LR: 0.00002988 [09:40:20] Epoch: 1 Batch: 
4578/20099 (22.78%) Loss: 1.770027 LR: 0.00002988 [09:40:21] Epoch: 1 Batch: 4579/20099 (22.78%) Loss: 2.170108 LR: 0.00002988 [09:40:23] Epoch: 1 Batch: 4580/20099 (22.79%) Loss: 2.074645 LR: 0.00002988 [09:40:25] Epoch: 1 Batch: 4581/20099 (22.79%) Loss: 2.170211 LR: 0.00002988 [09:40:27] Epoch: 1 Batch: 4582/20099 (22.80%) Loss: 2.067807 LR: 0.00002988 [09:40:28] Epoch: 1 Batch: 4583/20099 (22.80%) Loss: 2.415698 LR: 0.00002988 [09:40:30] Epoch: 1 Batch: 4584/20099 (22.81%) Loss: 2.208312 LR: 0.00002988 [09:40:32] Epoch: 1 Batch: 4585/20099 (22.81%) Loss: 2.042493 LR: 0.00002988 [09:40:34] Epoch: 1 Batch: 4586/20099 (22.82%) Loss: 2.180415 LR: 0.00002988 [09:40:36] Epoch: 1 Batch: 4587/20099 (22.82%) Loss: 1.968171 LR: 0.00002988 [09:40:37] Epoch: 1 Batch: 4588/20099 (22.83%) Loss: 1.975385 LR: 0.00002988 [09:40:39] Epoch: 1 Batch: 4589/20099 (22.83%) Loss: 2.021766 LR: 0.00002988 [09:40:41] Epoch: 1 Batch: 4590/20099 (22.84%) Loss: 2.146341 LR: 0.00002988 [09:40:43] Epoch: 1 Batch: 4591/20099 (22.84%) Loss: 2.325340 LR: 0.00002988 [09:40:44] Epoch: 1 Batch: 4592/20099 (22.85%) Loss: 2.230047 LR: 0.00002988 [09:40:46] Epoch: 1 Batch: 4593/20099 (22.85%) Loss: 2.374275 LR: 0.00002988 [09:40:48] Epoch: 1 Batch: 4594/20099 (22.86%) Loss: 1.915366 LR: 0.00002988 [09:40:50] Epoch: 1 Batch: 4595/20099 (22.86%) Loss: 2.180977 LR: 0.00002988 [09:40:52] Epoch: 1 Batch: 4596/20099 (22.87%) Loss: 2.130810 LR: 0.00002988 [09:40:53] Epoch: 1 Batch: 4597/20099 (22.87%) Loss: 2.190001 LR: 0.00002988 [09:40:55] Epoch: 1 Batch: 4598/20099 (22.88%) Loss: 2.060367 LR: 0.00002988 [09:40:57] Epoch: 1 Batch: 4599/20099 (22.88%) Loss: 2.095263 LR: 0.00002988 [09:41:02] >> Cleaned up old temp checkpoint: epoch1_step2600 [09:41:02] >> Temp checkpoint saved: epoch1_step4600, size: 0.1693 GB [09:41:02] Epoch: 1 Batch: 4600/20099 (22.89%) Loss: 2.269129 LR: 0.00002988 [09:41:04] Epoch: 1 Batch: 4601/20099 (22.89%) Loss: 2.309718 LR: 0.00002988 [09:41:06] Epoch: 1 Batch: 4602/20099 (22.90%) 
Loss: 2.260203 LR: 0.00002987 [09:41:08] Epoch: 1 Batch: 4603/20099 (22.90%) Loss: 2.116080 LR: 0.00002987 [09:41:09] Epoch: 1 Batch: 4604/20099 (22.91%) Loss: 1.988276 LR: 0.00002987 [09:41:11] Epoch: 1 Batch: 4605/20099 (22.91%) Loss: 2.198561 LR: 0.00002987 [09:41:13] Epoch: 1 Batch: 4606/20099 (22.92%) Loss: 2.081283 LR: 0.00002987 [09:41:15] Epoch: 1 Batch: 4607/20099 (22.92%) Loss: 2.234585 LR: 0.00002987 [09:41:16] Epoch: 1 Batch: 4608/20099 (22.93%) Loss: 1.730787 LR: 0.00002987 [09:41:18] Epoch: 1 Batch: 4609/20099 (22.93%) Loss: 2.339286 LR: 0.00002987 [09:41:20] Epoch: 1 Batch: 4610/20099 (22.94%) Loss: 2.162195 LR: 0.00002987 [09:41:22] Epoch: 1 Batch: 4611/20099 (22.94%) Loss: 2.292299 LR: 0.00002987 [09:41:24] Epoch: 1 Batch: 4612/20099 (22.95%) Loss: 2.217103 LR: 0.00002987 [09:41:25] Epoch: 1 Batch: 4613/20099 (22.95%) Loss: 2.329199 LR: 0.00002987 [09:41:27] Epoch: 1 Batch: 4614/20099 (22.96%) Loss: 2.286993 LR: 0.00002987 [09:41:29] Epoch: 1 Batch: 4615/20099 (22.96%) Loss: 2.024486 LR: 0.00002987 [09:41:31] Epoch: 1 Batch: 4616/20099 (22.97%) Loss: 2.329528 LR: 0.00002987 [09:41:33] Epoch: 1 Batch: 4617/20099 (22.97%) Loss: 2.410837 LR: 0.00002987 [09:41:34] Epoch: 1 Batch: 4618/20099 (22.98%) Loss: 1.902958 LR: 0.00002987 [09:41:36] Epoch: 1 Batch: 4619/20099 (22.98%) Loss: 2.231718 LR: 0.00002987 [09:41:38] Epoch: 1 Batch: 4620/20099 (22.99%) Loss: 2.043199 LR: 0.00002987 [09:41:40] Epoch: 1 Batch: 4621/20099 (22.99%) Loss: 2.352170 LR: 0.00002987 [09:41:42] Epoch: 1 Batch: 4622/20099 (23.00%) Loss: 2.009378 LR: 0.00002987 [09:41:43] Epoch: 1 Batch: 4623/20099 (23.00%) Loss: 2.010901 LR: 0.00002987 [09:41:45] Epoch: 1 Batch: 4624/20099 (23.01%) Loss: 2.450607 LR: 0.00002987 [09:41:47] Epoch: 1 Batch: 4625/20099 (23.01%) Loss: 2.079332 LR: 0.00002987 [09:41:49] Epoch: 1 Batch: 4626/20099 (23.02%) Loss: 2.083146 LR: 0.00002987 [09:41:51] Epoch: 1 Batch: 4627/20099 (23.02%) Loss: 2.213307 LR: 0.00002987 [09:41:52] Epoch: 1 Batch: 4628/20099 
(23.03%) Loss: 1.885389 LR: 0.00002987 [09:41:54] Epoch: 1 Batch: 4629/20099 (23.03%) Loss: 2.558436 LR: 0.00002987 [09:41:56] Epoch: 1 Batch: 4630/20099 (23.04%) Loss: 2.228308 LR: 0.00002986 [09:41:58] Epoch: 1 Batch: 4631/20099 (23.04%) Loss: 2.147811 LR: 0.00002986 [09:41:59] Epoch: 1 Batch: 4632/20099 (23.05%) Loss: 2.159683 LR: 0.00002986 [09:42:01] Epoch: 1 Batch: 4633/20099 (23.05%) Loss: 1.921819 LR: 0.00002986 [09:42:03] Epoch: 1 Batch: 4634/20099 (23.06%) Loss: 2.113861 LR: 0.00002986 [09:42:05] Epoch: 1 Batch: 4635/20099 (23.06%) Loss: 2.055442 LR: 0.00002986 [09:42:06] Epoch: 1 Batch: 4636/20099 (23.07%) Loss: 2.114245 LR: 0.00002986 [09:42:08] Epoch: 1 Batch: 4637/20099 (23.07%) Loss: 2.250477 LR: 0.00002986 [09:42:10] Epoch: 1 Batch: 4638/20099 (23.08%) Loss: 2.310852 LR: 0.00002986 [09:42:12] Epoch: 1 Batch: 4639/20099 (23.08%) Loss: 2.505158 LR: 0.00002986 [09:42:14] Epoch: 1 Batch: 4640/20099 (23.09%) Loss: 1.916171 LR: 0.00002986 [09:42:15] Epoch: 1 Batch: 4641/20099 (23.09%) Loss: 2.351977 LR: 0.00002986 [09:42:17] Epoch: 1 Batch: 4642/20099 (23.10%) Loss: 2.255134 LR: 0.00002986 [09:42:19] Epoch: 1 Batch: 4643/20099 (23.10%) Loss: 2.263360 LR: 0.00002986 [09:42:21] Epoch: 1 Batch: 4644/20099 (23.11%) Loss: 2.145698 LR: 0.00002986 [09:42:22] Epoch: 1 Batch: 4645/20099 (23.11%) Loss: 2.179838 LR: 0.00002986 [09:42:24] Epoch: 1 Batch: 4646/20099 (23.12%) Loss: 1.850737 LR: 0.00002986 [09:42:26] Epoch: 1 Batch: 4647/20099 (23.12%) Loss: 1.976613 LR: 0.00002986 [09:42:28] Epoch: 1 Batch: 4648/20099 (23.13%) Loss: 2.214621 LR: 0.00002986 [09:42:29] Epoch: 1 Batch: 4649/20099 (23.13%) Loss: 2.298293 LR: 0.00002986 [09:42:31] Epoch: 1 Batch: 4650/20099 (23.14%) Loss: 2.193688 LR: 0.00002986 [09:42:33] Epoch: 1 Batch: 4651/20099 (23.14%) Loss: 2.437580 LR: 0.00002986 [09:42:35] Epoch: 1 Batch: 4652/20099 (23.15%) Loss: 1.912781 LR: 0.00002986 [09:42:37] Epoch: 1 Batch: 4653/20099 (23.15%) Loss: 2.243409 LR: 0.00002986 [09:42:39] Epoch: 1 Batch: 
4654/20099 (23.16%) Loss: 2.156494 LR: 0.00002986 [09:42:41] Epoch: 1 Batch: 4655/20099 (23.16%) Loss: 2.050530 LR: 0.00002986 [09:42:42] Epoch: 1 Batch: 4656/20099 (23.17%) Loss: 2.467556 LR: 0.00002986 [09:42:44] Epoch: 1 Batch: 4657/20099 (23.17%) Loss: 2.182628 LR: 0.00002986 [09:42:46] Epoch: 1 Batch: 4658/20099 (23.18%) Loss: 2.271367 LR: 0.00002985 [09:42:48] Epoch: 1 Batch: 4659/20099 (23.18%) Loss: 2.114659 LR: 0.00002985 [09:42:49] Epoch: 1 Batch: 4660/20099 (23.19%) Loss: 2.167618 LR: 0.00002985 [09:42:51] Epoch: 1 Batch: 4661/20099 (23.19%) Loss: 2.036290 LR: 0.00002985 [09:42:53] Epoch: 1 Batch: 4662/20099 (23.20%) Loss: 2.112420 LR: 0.00002985 [09:42:55] Epoch: 1 Batch: 4663/20099 (23.20%) Loss: 1.946655 LR: 0.00002985 [09:42:56] Epoch: 1 Batch: 4664/20099 (23.21%) Loss: 2.213091 LR: 0.00002985 [09:42:58] Epoch: 1 Batch: 4665/20099 (23.21%) Loss: 2.138493 LR: 0.00002985 [09:43:00] Epoch: 1 Batch: 4666/20099 (23.22%) Loss: 2.187932 LR: 0.00002985 [09:43:02] Epoch: 1 Batch: 4667/20099 (23.22%) Loss: 2.044093 LR: 0.00002985 [09:43:04] Epoch: 1 Batch: 4668/20099 (23.23%) Loss: 2.185463 LR: 0.00002985 [09:43:05] Epoch: 1 Batch: 4669/20099 (23.23%) Loss: 2.178218 LR: 0.00002985 [09:43:07] Epoch: 1 Batch: 4670/20099 (23.23%) Loss: 2.079967 LR: 0.00002985 [09:43:09] Epoch: 1 Batch: 4671/20099 (23.24%) Loss: 2.226505 LR: 0.00002985 [09:43:11] Epoch: 1 Batch: 4672/20099 (23.24%) Loss: 2.244752 LR: 0.00002985 [09:43:12] Epoch: 1 Batch: 4673/20099 (23.25%) Loss: 2.147533 LR: 0.00002985 [09:43:14] Epoch: 1 Batch: 4674/20099 (23.25%) Loss: 2.224766 LR: 0.00002985 [09:43:16] Epoch: 1 Batch: 4675/20099 (23.26%) Loss: 2.086420 LR: 0.00002985 [09:43:18] Epoch: 1 Batch: 4676/20099 (23.26%) Loss: 2.109352 LR: 0.00002985 [09:43:20] Epoch: 1 Batch: 4677/20099 (23.27%) Loss: 2.338684 LR: 0.00002985 [09:43:21] Epoch: 1 Batch: 4678/20099 (23.27%) Loss: 2.494125 LR: 0.00002985 [09:43:23] Epoch: 1 Batch: 4679/20099 (23.28%) Loss: 2.199632 LR: 0.00002985 [09:43:25] Epoch: 1 
Batch: 4680/20099 (23.28%) Loss: 2.023668 LR: 0.00002985 [09:43:27] Epoch: 1 Batch: 4681/20099 (23.29%) Loss: 2.287734 LR: 0.00002985 [09:43:28] Epoch: 1 Batch: 4682/20099 (23.29%) Loss: 2.280823 LR: 0.00002985 [09:43:30] Epoch: 1 Batch: 4683/20099 (23.30%) Loss: 2.178694 LR: 0.00002985 [09:43:32] Epoch: 1 Batch: 4684/20099 (23.30%) Loss: 2.008847 LR: 0.00002985 [09:43:34] Epoch: 1 Batch: 4685/20099 (23.31%) Loss: 2.332745 LR: 0.00002985 [09:43:36] Epoch: 1 Batch: 4686/20099 (23.31%) Loss: 2.348091 LR: 0.00002984 [09:43:37] Epoch: 1 Batch: 4687/20099 (23.32%) Loss: 2.242439 LR: 0.00002984 [09:43:39] Epoch: 1 Batch: 4688/20099 (23.32%) Loss: 2.014596 LR: 0.00002984 [09:43:41] Epoch: 1 Batch: 4689/20099 (23.33%) Loss: 2.448067 LR: 0.00002984 [09:43:43] Epoch: 1 Batch: 4690/20099 (23.33%) Loss: 2.475633 LR: 0.00002984 [09:43:44] Epoch: 1 Batch: 4691/20099 (23.34%) Loss: 1.969159 LR: 0.00002984 [09:43:46] Epoch: 1 Batch: 4692/20099 (23.34%) Loss: 2.314331 LR: 0.00002984 [09:43:48] Epoch: 1 Batch: 4693/20099 (23.35%) Loss: 2.375246 LR: 0.00002984 [09:43:50] Epoch: 1 Batch: 4694/20099 (23.35%) Loss: 2.186830 LR: 0.00002984 [09:43:52] Epoch: 1 Batch: 4695/20099 (23.36%) Loss: 2.099122 LR: 0.00002984 [09:43:53] Epoch: 1 Batch: 4696/20099 (23.36%) Loss: 1.980689 LR: 0.00002984 [09:43:55] Epoch: 1 Batch: 4697/20099 (23.37%) Loss: 2.153680 LR: 0.00002984 [09:43:57] Epoch: 1 Batch: 4698/20099 (23.37%) Loss: 2.370480 LR: 0.00002984 [09:43:59] Epoch: 1 Batch: 4699/20099 (23.38%) Loss: 1.981578 LR: 0.00002984 [09:44:00] Epoch: 1 Batch: 4700/20099 (23.38%) Loss: 2.353047 LR: 0.00002984 [09:44:02] Epoch: 1 Batch: 4701/20099 (23.39%) Loss: 2.345898 LR: 0.00002984 [09:44:04] Epoch: 1 Batch: 4702/20099 (23.39%) Loss: 2.365562 LR: 0.00002984 [09:44:06] Epoch: 1 Batch: 4703/20099 (23.40%) Loss: 2.004191 LR: 0.00002984 [09:44:08] Epoch: 1 Batch: 4704/20099 (23.40%) Loss: 2.045257 LR: 0.00002984 [09:44:09] Epoch: 1 Batch: 4705/20099 (23.41%) Loss: 2.118621 LR: 0.00002984 [09:44:11] Epoch: 
1 Batch: 4706/20099 (23.41%) Loss: 2.024510 LR: 0.00002984 [09:44:13] Epoch: 1 Batch: 4707/20099 (23.42%) Loss: 2.193663 LR: 0.00002984 [09:44:15] Epoch: 1 Batch: 4708/20099 (23.42%) Loss: 2.161496 LR: 0.00002984 [09:44:16] Epoch: 1 Batch: 4709/20099 (23.43%) Loss: 2.476473 LR: 0.00002984 [09:44:18] Epoch: 1 Batch: 4710/20099 (23.43%) Loss: 2.065486 LR: 0.00002984 [09:44:20] Epoch: 1 Batch: 4711/20099 (23.44%) Loss: 2.375472 LR: 0.00002984 [09:44:22] Epoch: 1 Batch: 4712/20099 (23.44%) Loss: 2.074227 LR: 0.00002984 [09:44:24] Epoch: 1 Batch: 4713/20099 (23.45%) Loss: 2.139662 LR: 0.00002984 [09:44:25] Epoch: 1 Batch: 4714/20099 (23.45%) Loss: 2.442742 LR: 0.00002983 [09:44:27] Epoch: 1 Batch: 4715/20099 (23.46%) Loss: 2.216808 LR: 0.00002983 [09:44:29] Epoch: 1 Batch: 4716/20099 (23.46%) Loss: 2.207420 LR: 0.00002983 [09:44:31] Epoch: 1 Batch: 4717/20099 (23.47%) Loss: 2.341451 LR: 0.00002983 [09:44:32] Epoch: 1 Batch: 4718/20099 (23.47%) Loss: 2.085963 LR: 0.00002983 [09:44:34] Epoch: 1 Batch: 4719/20099 (23.48%) Loss: 2.002911 LR: 0.00002983 [09:44:36] Epoch: 1 Batch: 4720/20099 (23.48%) Loss: 2.052259 LR: 0.00002983 [09:44:38] Epoch: 1 Batch: 4721/20099 (23.49%) Loss: 2.232872 LR: 0.00002983 [09:44:40] Epoch: 1 Batch: 4722/20099 (23.49%) Loss: 2.272426 LR: 0.00002983 [09:44:41] Epoch: 1 Batch: 4723/20099 (23.50%) Loss: 2.392010 LR: 0.00002983 [09:44:43] Epoch: 1 Batch: 4724/20099 (23.50%) Loss: 2.312881 LR: 0.00002983 [09:44:45] Epoch: 1 Batch: 4725/20099 (23.51%) Loss: 2.126988 LR: 0.00002983 [09:44:47] Epoch: 1 Batch: 4726/20099 (23.51%) Loss: 2.308162 LR: 0.00002983 [09:44:49] Epoch: 1 Batch: 4727/20099 (23.52%) Loss: 2.126079 LR: 0.00002983 [09:44:50] Epoch: 1 Batch: 4728/20099 (23.52%) Loss: 2.483215 LR: 0.00002983 [09:44:52] Epoch: 1 Batch: 4729/20099 (23.53%) Loss: 2.604293 LR: 0.00002983 [09:44:54] Epoch: 1 Batch: 4730/20099 (23.53%) Loss: 2.077269 LR: 0.00002983 [09:44:56] Epoch: 1 Batch: 4731/20099 (23.54%) Loss: 2.156270 LR: 0.00002983 [09:44:57] 
Epoch: 1 Batch: 4732/20099 (23.54%) Loss: 2.085979 LR: 0.00002983 [09:44:59] Epoch: 1 Batch: 4733/20099 (23.55%) Loss: 2.005166 LR: 0.00002983 [09:45:01] Epoch: 1 Batch: 4734/20099 (23.55%) Loss: 2.049402 LR: 0.00002983 [09:45:03] Epoch: 1 Batch: 4735/20099 (23.56%) Loss: 2.309577 LR: 0.00002983 [09:45:05] Epoch: 1 Batch: 4736/20099 (23.56%) Loss: 2.121827 LR: 0.00002983 [09:45:06] Epoch: 1 Batch: 4737/20099 (23.57%) Loss: 2.042012 LR: 0.00002983 [09:45:08] Epoch: 1 Batch: 4738/20099 (23.57%) Loss: 2.298476 LR: 0.00002983 [09:45:10] Epoch: 1 Batch: 4739/20099 (23.58%) Loss: 2.254707 LR: 0.00002983 [09:45:12] Epoch: 1 Batch: 4740/20099 (23.58%) Loss: 2.127829 LR: 0.00002983 [09:45:13] Epoch: 1 Batch: 4741/20099 (23.59%) Loss: 1.957008 LR: 0.00002983 [09:45:15] Epoch: 1 Batch: 4742/20099 (23.59%) Loss: 2.167894 LR: 0.00002982 [09:45:17] Epoch: 1 Batch: 4743/20099 (23.60%) Loss: 2.224962 LR: 0.00002982 [09:45:19] Epoch: 1 Batch: 4744/20099 (23.60%) Loss: 2.002913 LR: 0.00002982 [09:45:20] Epoch: 1 Batch: 4745/20099 (23.61%) Loss: 1.833344 LR: 0.00002982 [09:45:22] Epoch: 1 Batch: 4746/20099 (23.61%) Loss: 2.104826 LR: 0.00002982 [09:45:24] Epoch: 1 Batch: 4747/20099 (23.62%) Loss: 2.210449 LR: 0.00002982 [09:45:26] Epoch: 1 Batch: 4748/20099 (23.62%) Loss: 2.180137 LR: 0.00002982 [09:45:28] Epoch: 1 Batch: 4749/20099 (23.63%) Loss: 2.117358 LR: 0.00002982 [09:45:29] Epoch: 1 Batch: 4750/20099 (23.63%) Loss: 2.180726 LR: 0.00002982 [09:45:31] Epoch: 1 Batch: 4751/20099 (23.64%) Loss: 2.083993 LR: 0.00002982 [09:45:33] Epoch: 1 Batch: 4752/20099 (23.64%) Loss: 2.193973 LR: 0.00002982 [09:45:35] Epoch: 1 Batch: 4753/20099 (23.65%) Loss: 2.095832 LR: 0.00002982 [09:45:36] Epoch: 1 Batch: 4754/20099 (23.65%) Loss: 2.288699 LR: 0.00002982 [09:45:38] Epoch: 1 Batch: 4755/20099 (23.66%) Loss: 2.304881 LR: 0.00002982 [09:45:40] Epoch: 1 Batch: 4756/20099 (23.66%) Loss: 2.284542 LR: 0.00002982 [09:45:42] Epoch: 1 Batch: 4757/20099 (23.67%) Loss: 1.846016 LR: 0.00002982 
[09:45:44] Epoch: 1 Batch: 4758/20099 (23.67%) Loss: 2.118920 LR: 0.00002982 [09:45:45] Epoch: 1 Batch: 4759/20099 (23.68%) Loss: 2.262972 LR: 0.00002982 [09:45:47] Epoch: 1 Batch: 4760/20099 (23.68%) Loss: 1.974369 LR: 0.00002982 [09:45:49] Epoch: 1 Batch: 4761/20099 (23.69%) Loss: 2.066820 LR: 0.00002982 [09:45:51] Epoch: 1 Batch: 4762/20099 (23.69%) Loss: 2.296504 LR: 0.00002982 [09:45:52] Epoch: 1 Batch: 4763/20099 (23.70%) Loss: 2.245446 LR: 0.00002981 [09:45:54] Epoch: 1 Batch: 4764/20099 (23.70%) Loss: 2.107984 LR: 0.00002981 [09:45:56] Epoch: 1 Batch: 4765/20099 (23.71%) Loss: 2.210339 LR: 0.00002981 [09:45:58] Epoch: 1 Batch: 4766/20099 (23.71%) Loss: 2.317290 LR: 0.00002981 [09:46:00] Epoch: 1 Batch: 4767/20099 (23.72%) Loss: 1.968884 LR: 0.00002981 [09:46:01] Epoch: 1 Batch: 4768/20099 (23.72%) Loss: 2.279551 LR: 0.00002981 [09:46:03] Epoch: 1 Batch: 4769/20099 (23.73%) Loss: 2.090513 LR: 0.00002981 [09:46:05] Epoch: 1 Batch: 4770/20099 (23.73%) Loss: 1.964401 LR: 0.00002981 [09:46:07] Epoch: 1 Batch: 4771/20099 (23.74%) Loss: 2.168427 LR: 0.00002981 [09:46:08] Epoch: 1 Batch: 4772/20099 (23.74%) Loss: 2.142965 LR: 0.00002981 [09:46:10] Epoch: 1 Batch: 4773/20099 (23.75%) Loss: 2.212569 LR: 0.00002981 [09:46:12] Epoch: 1 Batch: 4774/20099 (23.75%) Loss: 2.291496 LR: 0.00002981 [09:46:14] Epoch: 1 Batch: 4775/20099 (23.76%) Loss: 2.010428 LR: 0.00002981 [09:46:15] Epoch: 1 Batch: 4776/20099 (23.76%) Loss: 2.372914 LR: 0.00002981 [09:46:17] Epoch: 1 Batch: 4777/20099 (23.77%) Loss: 2.031020 LR: 0.00002981 [09:46:19] Epoch: 1 Batch: 4778/20099 (23.77%) Loss: 2.287892 LR: 0.00002981 [09:46:21] Epoch: 1 Batch: 4779/20099 (23.78%) Loss: 2.512156 LR: 0.00002981 [09:46:23] Epoch: 1 Batch: 4780/20099 (23.78%) Loss: 2.507593 LR: 0.00002981 [09:46:24] Epoch: 1 Batch: 4781/20099 (23.79%) Loss: 2.205787 LR: 0.00002981 [09:46:26] Epoch: 1 Batch: 4782/20099 (23.79%) Loss: 2.325956 LR: 0.00002981 [09:46:28] Epoch: 1 Batch: 4783/20099 (23.80%) Loss: 2.132880 LR: 
0.00002981 [09:46:30] Epoch: 1 Batch: 4784/20099 (23.80%) Loss: 1.950488 LR: 0.00002981 [09:46:31] Epoch: 1 Batch: 4785/20099 (23.81%) Loss: 2.384472 LR: 0.00002981 [09:46:33] Epoch: 1 Batch: 4786/20099 (23.81%) Loss: 2.183739 LR: 0.00002981 [09:46:35] Epoch: 1 Batch: 4787/20099 (23.82%) Loss: 2.289413 LR: 0.00002981 [09:46:37] Epoch: 1 Batch: 4788/20099 (23.82%) Loss: 2.112224 LR: 0.00002981 [09:46:38] Epoch: 1 Batch: 4789/20099 (23.83%) Loss: 2.148224 LR: 0.00002981 [09:46:40] Epoch: 1 Batch: 4790/20099 (23.83%) Loss: 1.919504 LR: 0.00002981 [09:46:42] Epoch: 1 Batch: 4791/20099 (23.84%) Loss: 1.892247 LR: 0.00002980 [09:46:44] Epoch: 1 Batch: 4792/20099 (23.84%) Loss: 2.480908 LR: 0.00002980 [09:46:46] Epoch: 1 Batch: 4793/20099 (23.85%) Loss: 2.184773 LR: 0.00002980 [09:46:47] Epoch: 1 Batch: 4794/20099 (23.85%) Loss: 2.260722 LR: 0.00002980 [09:46:49] Epoch: 1 Batch: 4795/20099 (23.86%) Loss: 2.007028 LR: 0.00002980 [09:46:51] Epoch: 1 Batch: 4796/20099 (23.86%) Loss: 2.119010 LR: 0.00002980 [09:46:53] Epoch: 1 Batch: 4797/20099 (23.87%) Loss: 2.108781 LR: 0.00002980 [09:46:54] Epoch: 1 Batch: 4798/20099 (23.87%) Loss: 2.499474 LR: 0.00002980 [09:46:56] Epoch: 1 Batch: 4799/20099 (23.88%) Loss: 1.998094 LR: 0.00002980 [09:47:01] >> Cleaned up old temp checkpoint: epoch1_step2800 [09:47:02] >> Temp checkpoint saved: epoch1_step4800, size: 0.1693 GB [09:47:02] Epoch: 1 Batch: 4800/20099 (23.88%) Loss: 2.234776 LR: 0.00002980 [09:47:03] Epoch: 1 Batch: 4801/20099 (23.89%) Loss: 2.206125 LR: 0.00002980 [09:47:05] Epoch: 1 Batch: 4802/20099 (23.89%) Loss: 2.121066 LR: 0.00002980 [09:47:07] Epoch: 1 Batch: 4803/20099 (23.90%) Loss: 2.238003 LR: 0.00002980 [09:47:09] Epoch: 1 Batch: 4804/20099 (23.90%) Loss: 2.296652 LR: 0.00002980 [09:47:10] Epoch: 1 Batch: 4805/20099 (23.91%) Loss: 2.048527 LR: 0.00002980 [09:47:12] Epoch: 1 Batch: 4806/20099 (23.91%) Loss: 2.165964 LR: 0.00002980 [09:47:14] Epoch: 1 Batch: 4807/20099 (23.92%) Loss: 2.177756 LR: 0.00002980 
[09:47:16] Epoch: 1 Batch: 4808/20099 (23.92%) Loss: 2.034861 LR: 0.00002980 [09:47:17] Epoch: 1 Batch: 4809/20099 (23.93%) Loss: 2.069503 LR: 0.00002980 [09:47:19] Epoch: 1 Batch: 4810/20099 (23.93%) Loss: 2.477509 LR: 0.00002980 [09:47:21] Epoch: 1 Batch: 4811/20099 (23.94%) Loss: 2.272393 LR: 0.00002980 [09:47:23] Epoch: 1 Batch: 4812/20099 (23.94%) Loss: 2.246731 LR: 0.00002979 [09:47:25] Epoch: 1 Batch: 4813/20099 (23.95%) Loss: 2.428300 LR: 0.00002979 [09:47:26] Epoch: 1 Batch: 4814/20099 (23.95%) Loss: 2.057619 LR: 0.00002979 [09:47:28] Epoch: 1 Batch: 4815/20099 (23.96%) Loss: 1.895146 LR: 0.00002979 [09:47:30] Epoch: 1 Batch: 4816/20099 (23.96%) Loss: 2.123725 LR: 0.00002979 [09:47:32] Epoch: 1 Batch: 4817/20099 (23.97%) Loss: 2.273620 LR: 0.00002979 [09:47:34] Epoch: 1 Batch: 4818/20099 (23.97%) Loss: 1.933514 LR: 0.00002979 [09:47:35] Epoch: 1 Batch: 4819/20099 (23.98%) Loss: 2.012297 LR: 0.00002979 [09:47:37] Epoch: 1 Batch: 4820/20099 (23.98%) Loss: 2.113417 LR: 0.00002979 [09:47:39] Epoch: 1 Batch: 4821/20099 (23.99%) Loss: 2.290759 LR: 0.00002979 [09:47:41] Epoch: 1 Batch: 4822/20099 (23.99%) Loss: 2.162830 LR: 0.00002979 [09:47:43] Epoch: 1 Batch: 4823/20099 (24.00%) Loss: 2.284759 LR: 0.00002979 [09:47:44] Epoch: 1 Batch: 4824/20099 (24.00%) Loss: 2.338061 LR: 0.00002979 [09:47:46] Epoch: 1 Batch: 4825/20099 (24.01%) Loss: 2.307732 LR: 0.00002979 [09:47:48] Epoch: 1 Batch: 4826/20099 (24.01%) Loss: 2.196670 LR: 0.00002979 [09:47:50] Epoch: 1 Batch: 4827/20099 (24.02%) Loss: 1.956112 LR: 0.00002979 [09:47:51] Epoch: 1 Batch: 4828/20099 (24.02%) Loss: 2.328331 LR: 0.00002979 [09:47:53] Epoch: 1 Batch: 4829/20099 (24.03%) Loss: 2.213278 LR: 0.00002979 [09:47:55] Epoch: 1 Batch: 4830/20099 (24.03%) Loss: 2.139114 LR: 0.00002979 [09:47:57] Epoch: 1 Batch: 4831/20099 (24.04%) Loss: 2.410370 LR: 0.00002979 [09:47:59] Epoch: 1 Batch: 4832/20099 (24.04%) Loss: 2.069990 LR: 0.00002979 [09:48:00] Epoch: 1 Batch: 4833/20099 (24.05%) Loss: 1.910782 LR: 
0.00002979 [09:48:02] Epoch: 1 Batch: 4834/20099 (24.05%) Loss: 2.107382 LR: 0.00002979 [09:48:04] Epoch: 1 Batch: 4835/20099 (24.06%) Loss: 1.790853 LR: 0.00002979 [09:48:06] Epoch: 1 Batch: 4836/20099 (24.06%) Loss: 2.184691 LR: 0.00002979 [09:48:07] Epoch: 1 Batch: 4837/20099 (24.07%) Loss: 2.124120 LR: 0.00002979 [09:48:09] Epoch: 1 Batch: 4838/20099 (24.07%) Loss: 2.043215 LR: 0.00002979 [09:48:11] Epoch: 1 Batch: 4839/20099 (24.08%) Loss: 2.001926 LR: 0.00002979 [09:48:13] Epoch: 1 Batch: 4840/20099 (24.08%) Loss: 2.147848 LR: 0.00002978 [09:48:14] Epoch: 1 Batch: 4841/20099 (24.09%) Loss: 2.076287 LR: 0.00002978 [09:48:16] Epoch: 1 Batch: 4842/20099 (24.09%) Loss: 2.354243 LR: 0.00002978 [09:48:18] Epoch: 1 Batch: 4843/20099 (24.10%) Loss: 1.913371 LR: 0.00002978 [09:48:20] Epoch: 1 Batch: 4844/20099 (24.10%) Loss: 2.029375 LR: 0.00002978 [09:48:21] Epoch: 1 Batch: 4845/20099 (24.11%) Loss: 2.222288 LR: 0.00002978 [09:48:23] Epoch: 1 Batch: 4846/20099 (24.11%) Loss: 1.961665 LR: 0.00002978 [09:48:25] Epoch: 1 Batch: 4847/20099 (24.12%) Loss: 1.968897 LR: 0.00002978 [09:48:27] Epoch: 1 Batch: 4848/20099 (24.12%) Loss: 2.322304 LR: 0.00002978 [09:48:29] Epoch: 1 Batch: 4849/20099 (24.13%) Loss: 2.185101 LR: 0.00002978 [09:48:30] Epoch: 1 Batch: 4850/20099 (24.13%) Loss: 1.924018 LR: 0.00002978 [09:48:32] Epoch: 1 Batch: 4851/20099 (24.14%) Loss: 2.011701 LR: 0.00002978 [09:48:34] Epoch: 1 Batch: 4852/20099 (24.14%) Loss: 2.338996 LR: 0.00002978 [09:48:36] Epoch: 1 Batch: 4853/20099 (24.15%) Loss: 1.962860 LR: 0.00002978 [09:48:37] Epoch: 1 Batch: 4854/20099 (24.15%) Loss: 2.232194 LR: 0.00002978 [09:48:39] Epoch: 1 Batch: 4855/20099 (24.16%) Loss: 2.169894 LR: 0.00002978 [09:48:41] Epoch: 1 Batch: 4856/20099 (24.16%) Loss: 2.543146 LR: 0.00002978 [09:48:43] Epoch: 1 Batch: 4857/20099 (24.17%) Loss: 2.323641 LR: 0.00002978 [09:48:44] Epoch: 1 Batch: 4858/20099 (24.17%) Loss: 2.171200 LR: 0.00002978 [09:48:46] Epoch: 1 Batch: 4859/20099 (24.18%) Loss: 2.190246 
LR: 0.00002978 [09:48:48] Epoch: 1 Batch: 4860/20099 (24.18%) Loss: 2.058274 LR: 0.00002978 [09:48:50] Epoch: 1 Batch: 4861/20099 (24.19%) Loss: 2.211266 LR: 0.00002977 [09:48:52] Epoch: 1 Batch: 4862/20099 (24.19%) Loss: 2.154627 LR: 0.00002977 [09:48:53] Epoch: 1 Batch: 4863/20099 (24.20%) Loss: 2.021559 LR: 0.00002977 [09:48:55] Epoch: 1 Batch: 4864/20099 (24.20%) Loss: 2.290583 LR: 0.00002977 [09:48:57] Epoch: 1 Batch: 4865/20099 (24.21%) Loss: 2.144314 LR: 0.00002977 [09:48:59] Epoch: 1 Batch: 4866/20099 (24.21%) Loss: 2.115013 LR: 0.00002977 [09:49:00] Epoch: 1 Batch: 4867/20099 (24.22%) Loss: 2.199650 LR: 0.00002977 [09:49:02] Epoch: 1 Batch: 4868/20099 (24.22%) Loss: 2.124646 LR: 0.00002977 [09:49:04] Epoch: 1 Batch: 4869/20099 (24.23%) Loss: 2.118107 LR: 0.00002977 [09:49:06] Epoch: 1 Batch: 4870/20099 (24.23%) Loss: 2.336019 LR: 0.00002977 [09:49:08] Epoch: 1 Batch: 4871/20099 (24.24%) Loss: 1.979587 LR: 0.00002977 [09:49:09] Epoch: 1 Batch: 4872/20099 (24.24%) Loss: 2.203210 LR: 0.00002977 [09:49:11] Epoch: 1 Batch: 4873/20099 (24.24%) Loss: 1.865507 LR: 0.00002977 [09:49:13] Epoch: 1 Batch: 4874/20099 (24.25%) Loss: 2.165278 LR: 0.00002977 [09:49:15] Epoch: 1 Batch: 4875/20099 (24.25%) Loss: 2.092434 LR: 0.00002977 [09:49:16] Epoch: 1 Batch: 4876/20099 (24.26%) Loss: 2.198335 LR: 0.00002977 [09:49:18] Epoch: 1 Batch: 4877/20099 (24.26%) Loss: 2.177116 LR: 0.00002977 [09:49:20] Epoch: 1 Batch: 4878/20099 (24.27%) Loss: 2.178542 LR: 0.00002977 [09:49:22] Epoch: 1 Batch: 4879/20099 (24.27%) Loss: 2.228541 LR: 0.00002977 [09:49:24] Epoch: 1 Batch: 4880/20099 (24.28%) Loss: 2.155006 LR: 0.00002977 [09:49:25] Epoch: 1 Batch: 4881/20099 (24.28%) Loss: 1.878647 LR: 0.00002977 [09:49:27] Epoch: 1 Batch: 4882/20099 (24.29%) Loss: 2.224158 LR: 0.00002976 [09:49:29] Epoch: 1 Batch: 4883/20099 (24.29%) Loss: 2.217337 LR: 0.00002976 [09:49:31] Epoch: 1 Batch: 4884/20099 (24.30%) Loss: 2.022690 LR: 0.00002976 [09:49:32] Epoch: 1 Batch: 4885/20099 (24.30%) Loss: 
2.302140 LR: 0.00002976 [09:49:34] Epoch: 1 Batch: 4886/20099 (24.31%) Loss: 2.273205 LR: 0.00002976 [09:49:36] Epoch: 1 Batch: 4887/20099 (24.31%) Loss: 2.121745 LR: 0.00002976 [09:49:38] Epoch: 1 Batch: 4888/20099 (24.32%) Loss: 2.112606 LR: 0.00002976 [09:49:40] Epoch: 1 Batch: 4889/20099 (24.32%) Loss: 2.243021 LR: 0.00002976 [09:49:41] Epoch: 1 Batch: 4890/20099 (24.33%) Loss: 1.975548 LR: 0.00002976 [09:49:43] Epoch: 1 Batch: 4891/20099 (24.33%) Loss: 2.205206 LR: 0.00002976 [09:49:45] Epoch: 1 Batch: 4892/20099 (24.34%) Loss: 2.295355 LR: 0.00002976 [09:49:47] Epoch: 1 Batch: 4893/20099 (24.34%) Loss: 2.207078 LR: 0.00002976 [09:49:48] Epoch: 1 Batch: 4894/20099 (24.35%) Loss: 2.348968 LR: 0.00002976 [09:49:50] Epoch: 1 Batch: 4895/20099 (24.35%) Loss: 2.314091 LR: 0.00002976 [09:49:52] Epoch: 1 Batch: 4896/20099 (24.36%) Loss: 2.473916 LR: 0.00002976 [09:49:54] Epoch: 1 Batch: 4897/20099 (24.36%) Loss: 2.078672 LR: 0.00002976 [09:49:55] Epoch: 1 Batch: 4898/20099 (24.37%) Loss: 2.011466 LR: 0.00002976 [09:49:57] Epoch: 1 Batch: 4899/20099 (24.37%) Loss: 2.016005 LR: 0.00002976 [09:49:59] Epoch: 1 Batch: 4900/20099 (24.38%) Loss: 1.973158 LR: 0.00002976 [09:50:01] Epoch: 1 Batch: 4901/20099 (24.38%) Loss: 2.144464 LR: 0.00002976 [09:50:03] Epoch: 1 Batch: 4902/20099 (24.39%) Loss: 2.426229 LR: 0.00002976 [09:50:04] Epoch: 1 Batch: 4903/20099 (24.39%) Loss: 1.862036 LR: 0.00002975 [09:50:06] Epoch: 1 Batch: 4904/20099 (24.40%) Loss: 1.879345 LR: 0.00002975 [09:50:08] Epoch: 1 Batch: 4905/20099 (24.40%) Loss: 1.882080 LR: 0.00002975 [09:50:10] Epoch: 1 Batch: 4906/20099 (24.41%) Loss: 2.442721 LR: 0.00002975 [09:50:11] Epoch: 1 Batch: 4907/20099 (24.41%) Loss: 2.149083 LR: 0.00002975 [09:50:13] Epoch: 1 Batch: 4908/20099 (24.42%) Loss: 1.910943 LR: 0.00002975 [09:50:15] Epoch: 1 Batch: 4909/20099 (24.42%) Loss: 2.110845 LR: 0.00002975 [09:50:17] Epoch: 1 Batch: 4910/20099 (24.43%) Loss: 2.311897 LR: 0.00002975 [09:50:19] Epoch: 1 Batch: 4911/20099 (24.43%) 
Loss: 2.044370 LR: 0.00002975 [09:50:20] Epoch: 1 Batch: 4912/20099 (24.44%) Loss: 2.345976 LR: 0.00002975 [09:50:22] Epoch: 1 Batch: 4913/20099 (24.44%) Loss: 2.064775 LR: 0.00002975 [09:50:24] Epoch: 1 Batch: 4914/20099 (24.45%) Loss: 2.294205 LR: 0.00002975 [09:50:26] Epoch: 1 Batch: 4915/20099 (24.45%) Loss: 2.234475 LR: 0.00002975 [09:50:27] Epoch: 1 Batch: 4916/20099 (24.46%) Loss: 2.220117 LR: 0.00002975 [09:50:29] Epoch: 1 Batch: 4917/20099 (24.46%) Loss: 2.357362 LR: 0.00002975 [09:50:31] Epoch: 1 Batch: 4918/20099 (24.47%) Loss: 2.092838 LR: 0.00002975 [09:50:33] Epoch: 1 Batch: 4919/20099 (24.47%) Loss: 2.442264 LR: 0.00002975 [09:50:34] Epoch: 1 Batch: 4920/20099 (24.48%) Loss: 2.331126 LR: 0.00002975 [09:50:36] Epoch: 1 Batch: 4921/20099 (24.48%) Loss: 2.399190 LR: 0.00002975 [09:50:38] Epoch: 1 Batch: 4922/20099 (24.49%) Loss: 1.937255 LR: 0.00002975 [09:50:40] Epoch: 1 Batch: 4923/20099 (24.49%) Loss: 2.503200 LR: 0.00002975 [09:50:41] Epoch: 1 Batch: 4924/20099 (24.50%) Loss: 1.809917 LR: 0.00002974 [09:50:43] Epoch: 1 Batch: 4925/20099 (24.50%) Loss: 2.186947 LR: 0.00002974 [09:50:45] Epoch: 1 Batch: 4926/20099 (24.51%) Loss: 1.942776 LR: 0.00002974 [09:50:47] Epoch: 1 Batch: 4927/20099 (24.51%) Loss: 2.111140 LR: 0.00002974 [09:50:49] Epoch: 1 Batch: 4928/20099 (24.52%) Loss: 2.427361 LR: 0.00002974 [09:50:50] Epoch: 1 Batch: 4929/20099 (24.52%) Loss: 2.201350 LR: 0.00002974 [09:50:52] Epoch: 1 Batch: 4930/20099 (24.53%) Loss: 2.427022 LR: 0.00002974 [09:50:54] Epoch: 1 Batch: 4931/20099 (24.53%) Loss: 2.087469 LR: 0.00002974 [09:50:56] Epoch: 1 Batch: 4932/20099 (24.54%) Loss: 2.282948 LR: 0.00002974 [09:50:57] Epoch: 1 Batch: 4933/20099 (24.54%) Loss: 1.880638 LR: 0.00002974 [09:50:59] Epoch: 1 Batch: 4934/20099 (24.55%) Loss: 2.480005 LR: 0.00002974 [09:51:01] Epoch: 1 Batch: 4935/20099 (24.55%) Loss: 2.043077 LR: 0.00002974 [09:51:03] Epoch: 1 Batch: 4936/20099 (24.56%) Loss: 2.374222 LR: 0.00002974 [09:51:04] Epoch: 1 Batch: 4937/20099 
(24.56%) Loss: 2.139943 LR: 0.00002974 [09:51:06] Epoch: 1 Batch: 4938/20099 (24.57%) Loss: 2.200003 LR: 0.00002974 [09:51:08] Epoch: 1 Batch: 4939/20099 (24.57%) Loss: 2.139958 LR: 0.00002974 [09:51:10] Epoch: 1 Batch: 4940/20099 (24.58%) Loss: 1.947821 LR: 0.00002974 [09:51:12] Epoch: 1 Batch: 4941/20099 (24.58%) Loss: 2.384009 LR: 0.00002974 [09:51:13] Epoch: 1 Batch: 4942/20099 (24.59%) Loss: 2.242365 LR: 0.00002974 [09:51:15] Epoch: 1 Batch: 4943/20099 (24.59%) Loss: 2.346899 LR: 0.00002974 [09:51:17] Epoch: 1 Batch: 4944/20099 (24.60%) Loss: 2.159861 LR: 0.00002974 [09:51:19] Epoch: 1 Batch: 4945/20099 (24.60%) Loss: 2.574744 LR: 0.00002973 [09:51:20] Epoch: 1 Batch: 4946/20099 (24.61%) Loss: 2.327347 LR: 0.00002973 [09:51:22] Epoch: 1 Batch: 4947/20099 (24.61%) Loss: 2.064087 LR: 0.00002973 [09:51:24] Epoch: 1 Batch: 4948/20099 (24.62%) Loss: 1.977720 LR: 0.00002973 [09:51:26] Epoch: 1 Batch: 4949/20099 (24.62%) Loss: 2.397459 LR: 0.00002973 [09:51:27] Epoch: 1 Batch: 4950/20099 (24.63%) Loss: 1.952505 LR: 0.00002973 [09:51:29] Epoch: 1 Batch: 4951/20099 (24.63%) Loss: 2.244450 LR: 0.00002973 [09:51:31] Epoch: 1 Batch: 4952/20099 (24.64%) Loss: 1.979291 LR: 0.00002973 [09:51:33] Epoch: 1 Batch: 4953/20099 (24.64%) Loss: 2.129127 LR: 0.00002973 [09:51:35] Epoch: 1 Batch: 4954/20099 (24.65%) Loss: 2.237111 LR: 0.00002973 [09:51:36] Epoch: 1 Batch: 4955/20099 (24.65%) Loss: 2.307721 LR: 0.00002973 [09:51:38] Epoch: 1 Batch: 4956/20099 (24.66%) Loss: 1.912464 LR: 0.00002973 [09:51:40] Epoch: 1 Batch: 4957/20099 (24.66%) Loss: 2.204460 LR: 0.00002973 [09:51:42] Epoch: 1 Batch: 4958/20099 (24.67%) Loss: 2.192246 LR: 0.00002973 [09:51:43] Epoch: 1 Batch: 4959/20099 (24.67%) Loss: 2.090888 LR: 0.00002973 [09:51:45] Epoch: 1 Batch: 4960/20099 (24.68%) Loss: 2.210060 LR: 0.00002973 [09:51:47] Epoch: 1 Batch: 4961/20099 (24.68%) Loss: 2.251124 LR: 0.00002973 [09:51:49] Epoch: 1 Batch: 4962/20099 (24.69%) Loss: 1.944263 LR: 0.00002973 [09:51:50] Epoch: 1 Batch: 
4963/20099 (24.69%) Loss: 1.952312 LR: 0.00002973 [09:51:52] Epoch: 1 Batch: 4964/20099 (24.70%) Loss: 2.266460 LR: 0.00002973 [09:51:54] Epoch: 1 Batch: 4965/20099 (24.70%) Loss: 2.101196 LR: 0.00002973 [09:51:56] Epoch: 1 Batch: 4966/20099 (24.71%) Loss: 2.283311 LR: 0.00002972 [09:51:58] Epoch: 1 Batch: 4967/20099 (24.71%) Loss: 2.093925 LR: 0.00002972 [09:51:59] Epoch: 1 Batch: 4968/20099 (24.72%) Loss: 1.764931 LR: 0.00002972 [09:52:01] Epoch: 1 Batch: 4969/20099 (24.72%) Loss: 2.086683 LR: 0.00002972 [09:52:03] Epoch: 1 Batch: 4970/20099 (24.73%) Loss: 1.978712 LR: 0.00002972 [09:52:05] Epoch: 1 Batch: 4971/20099 (24.73%) Loss: 2.248721 LR: 0.00002972 [09:52:06] Epoch: 1 Batch: 4972/20099 (24.74%) Loss: 2.351784 LR: 0.00002972 [09:52:08] Epoch: 1 Batch: 4973/20099 (24.74%) Loss: 2.048282 LR: 0.00002972 [09:52:10] Epoch: 1 Batch: 4974/20099 (24.75%) Loss: 2.035323 LR: 0.00002972 [09:52:12] Epoch: 1 Batch: 4975/20099 (24.75%) Loss: 1.936232 LR: 0.00002972 [09:52:13] Epoch: 1 Batch: 4976/20099 (24.76%) Loss: 2.073332 LR: 0.00002972 [09:52:15] Epoch: 1 Batch: 4977/20099 (24.76%) Loss: 2.321735 LR: 0.00002972 [09:52:17] Epoch: 1 Batch: 4978/20099 (24.77%) Loss: 2.575166 LR: 0.00002972 [09:52:19] Epoch: 1 Batch: 4979/20099 (24.77%) Loss: 2.337676 LR: 0.00002972 [09:52:21] Epoch: 1 Batch: 4980/20099 (24.78%) Loss: 1.990798 LR: 0.00002972 [09:52:22] Epoch: 1 Batch: 4981/20099 (24.78%) Loss: 2.023738 LR: 0.00002972 [09:52:24] Epoch: 1 Batch: 4982/20099 (24.79%) Loss: 2.327463 LR: 0.00002972 [09:52:26] Epoch: 1 Batch: 4983/20099 (24.79%) Loss: 2.047676 LR: 0.00002972 [09:52:28] Epoch: 1 Batch: 4984/20099 (24.80%) Loss: 2.054253 LR: 0.00002972 [09:52:29] Epoch: 1 Batch: 4985/20099 (24.80%) Loss: 2.588379 LR: 0.00002972 [09:52:31] Epoch: 1 Batch: 4986/20099 (24.81%) Loss: 2.171120 LR: 0.00002972 [09:52:33] Epoch: 1 Batch: 4987/20099 (24.81%) Loss: 2.266971 LR: 0.00002971 [09:52:35] Epoch: 1 Batch: 4988/20099 (24.82%) Loss: 2.649441 LR: 0.00002971 [09:52:36] Epoch: 1 
Batch: 4989/20099 (24.82%) Loss: 1.894354 LR: 0.00002971 [09:52:38] Epoch: 1 Batch: 4990/20099 (24.83%) Loss: 2.225730 LR: 0.00002971 [09:52:40] Epoch: 1 Batch: 4991/20099 (24.83%) Loss: 2.412319 LR: 0.00002971 [09:52:42] Epoch: 1 Batch: 4992/20099 (24.84%) Loss: 2.241832 LR: 0.00002971 [09:52:44] Epoch: 1 Batch: 4993/20099 (24.84%) Loss: 2.143329 LR: 0.00002971 [09:52:45] Epoch: 1 Batch: 4994/20099 (24.85%) Loss: 1.718128 LR: 0.00002971 [09:52:47] Epoch: 1 Batch: 4995/20099 (24.85%) Loss: 2.106905 LR: 0.00002971 [09:52:49] Epoch: 1 Batch: 4996/20099 (24.86%) Loss: 2.067069 LR: 0.00002971 [09:52:51] Epoch: 1 Batch: 4997/20099 (24.86%) Loss: 2.272020 LR: 0.00002971 [09:52:52] Epoch: 1 Batch: 4998/20099 (24.87%) Loss: 2.179208 LR: 0.00002971 [09:52:54] Epoch: 1 Batch: 4999/20099 (24.87%) Loss: 2.087494 LR: 0.00002971 [09:52:56] >> Evaluating batch 0 [09:52:57] >> Evaluating batch 1 [09:52:58] >> Evaluating batch 2 [09:52:59] >> Evaluating batch 3 [09:53:00] >> Evaluating batch 4 [09:53:01] >> Evaluating batch 5 [09:53:02] >> Evaluating batch 6 [09:53:03] >> Evaluating batch 7 [09:53:04] >> Evaluating batch 8 [09:53:05] >> Evaluating batch 9 [09:53:06] >> Evaluating batch 10 [09:53:07] >> Evaluating batch 11 [09:53:08] >> Evaluating batch 12 [09:53:09] >> Evaluating batch 13 [09:53:10] >> Evaluating batch 14 [09:53:11] >> Evaluating batch 15 [09:53:12] >> Evaluating batch 16 [09:53:12] Epoch: 1 Step: 5000/20099 Evaluation: [09:53:12] Avg Loss Since Last Eval: 2.1737 Val Loss: 2.2270 Validation loss delta: -0.0230 Perplexity: 9.2722 LR: 0.00002971 [09:53:16] >> Cleaned up old temp checkpoint: epoch1_step3000 [09:53:16] >> Temp checkpoint saved: epoch1_step5000, size: 0.1693 GB [09:53:20] >> Checkpoint saved: epoch1_step5000, size: 0.1693 GB [09:53:20] Epoch: 1 Batch: 5000/20099 (24.88%) Loss: 2.851571 LR: 0.00002971 [09:53:21] Epoch: 1 Batch: 5001/20099 (24.88%) Loss: 2.080209 LR: 0.00002971 [09:53:23] Epoch: 1 Batch: 5002/20099 (24.89%) Loss: 2.366229 LR: 
0.00002971 [09:53:25] Epoch: 1 Batch: 5003/20099 (24.89%) Loss: 2.242831 LR: 0.00002971 [09:53:27] Epoch: 1 Batch: 5004/20099 (24.90%) Loss: 2.142959 LR: 0.00002971 [09:53:28] Epoch: 1 Batch: 5005/20099 (24.90%) Loss: 2.267247 LR: 0.00002971 [09:53:30] Epoch: 1 Batch: 5006/20099 (24.91%) Loss: 2.479731 LR: 0.00002971 [09:53:32] Epoch: 1 Batch: 5007/20099 (24.91%) Loss: 2.229644 LR: 0.00002971 [09:53:34] Epoch: 1 Batch: 5008/20099 (24.92%) Loss: 2.243711 LR: 0.00002970 [09:53:36] Epoch: 1 Batch: 5009/20099 (24.92%) Loss: 2.176639 LR: 0.00002970 [09:53:37] Epoch: 1 Batch: 5010/20099 (24.93%) Loss: 2.079576 LR: 0.00002970 [09:53:39] Epoch: 1 Batch: 5011/20099 (24.93%) Loss: 2.405908 LR: 0.00002970 [09:53:41] Epoch: 1 Batch: 5012/20099 (24.94%) Loss: 2.203678 LR: 0.00002970 [09:53:43] Epoch: 1 Batch: 5013/20099 (24.94%) Loss: 1.990335 LR: 0.00002970 [09:53:45] Epoch: 1 Batch: 5014/20099 (24.95%) Loss: 2.036045 LR: 0.00002970 [09:53:47] Epoch: 1 Batch: 5015/20099 (24.95%) Loss: 2.311205 LR: 0.00002970 [09:53:48] Epoch: 1 Batch: 5016/20099 (24.96%) Loss: 2.173803 LR: 0.00002970 [09:53:50] Epoch: 1 Batch: 5017/20099 (24.96%) Loss: 2.137241 LR: 0.00002970 [09:53:52] Epoch: 1 Batch: 5018/20099 (24.97%) Loss: 1.999016 LR: 0.00002970 [09:53:54] Epoch: 1 Batch: 5019/20099 (24.97%) Loss: 2.367273 LR: 0.00002970 [09:53:56] Epoch: 1 Batch: 5020/20099 (24.98%) Loss: 2.432936 LR: 0.00002970 [09:53:57] Epoch: 1 Batch: 5021/20099 (24.98%) Loss: 2.315578 LR: 0.00002970 [09:53:59] Epoch: 1 Batch: 5022/20099 (24.99%) Loss: 2.121822 LR: 0.00002969 [09:54:01] Epoch: 1 Batch: 5023/20099 (24.99%) Loss: 2.458246 LR: 0.00002969 [09:54:03] Epoch: 1 Batch: 5024/20099 (25.00%) Loss: 2.106646 LR: 0.00002969 [09:54:05] Epoch: 1 Batch: 5025/20099 (25.00%) Loss: 2.111447 LR: 0.00002969 [09:54:06] Epoch: 1 Batch: 5026/20099 (25.01%) Loss: 1.845444 LR: 0.00002969 [09:54:08] Epoch: 1 Batch: 5027/20099 (25.01%) Loss: 2.172496 LR: 0.00002969 [09:54:10] Epoch: 1 Batch: 5028/20099 (25.02%) Loss: 1.979561 
LR: 0.00002969 [09:54:12] Epoch: 1 Batch: 5029/20099 (25.02%) Loss: 2.240930 LR: 0.00002969 [09:54:14] Epoch: 1 Batch: 5030/20099 (25.03%) Loss: 2.196163 LR: 0.00002969 [09:54:15] Epoch: 1 Batch: 5031/20099 (25.03%) Loss: 2.227931 LR: 0.00002969 [09:54:17] Epoch: 1 Batch: 5032/20099 (25.04%) Loss: 1.886937 LR: 0.00002969 [09:54:19] Epoch: 1 Batch: 5033/20099 (25.04%) Loss: 2.261586 LR: 0.00002969 [09:54:21] Epoch: 1 Batch: 5034/20099 (25.05%) Loss: 1.973758 LR: 0.00002969 [09:54:22] Epoch: 1 Batch: 5035/20099 (25.05%) Loss: 2.348380 LR: 0.00002969 [09:54:24] Epoch: 1 Batch: 5036/20099 (25.06%) Loss: 2.284104 LR: 0.00002969 [09:54:26] Epoch: 1 Batch: 5037/20099 (25.06%) Loss: 2.303903 LR: 0.00002969 [09:54:28] Epoch: 1 Batch: 5038/20099 (25.07%) Loss: 2.127284 LR: 0.00002969 [09:54:29] Epoch: 1 Batch: 5039/20099 (25.07%) Loss: 2.591542 LR: 0.00002969 [09:54:31] Epoch: 1 Batch: 5040/20099 (25.08%) Loss: 2.263111 LR: 0.00002969 [09:54:33] Epoch: 1 Batch: 5041/20099 (25.08%) Loss: 2.448599 LR: 0.00002969 [09:54:35] Epoch: 1 Batch: 5042/20099 (25.09%) Loss: 1.946489 LR: 0.00002969 [09:54:36] Epoch: 1 Batch: 5043/20099 (25.09%) Loss: 2.416937 LR: 0.00002968 [09:54:38] Epoch: 1 Batch: 5044/20099 (25.10%) Loss: 1.884483 LR: 0.00002968 [09:54:40] Epoch: 1 Batch: 5045/20099 (25.10%) Loss: 2.525615 LR: 0.00002968 [09:54:42] Epoch: 1 Batch: 5046/20099 (25.11%) Loss: 2.123149 LR: 0.00002968 [09:54:43] Epoch: 1 Batch: 5047/20099 (25.11%) Loss: 2.089194 LR: 0.00002968 [09:54:45] Epoch: 1 Batch: 5048/20099 (25.12%) Loss: 1.821545 LR: 0.00002968 [09:54:47] Epoch: 1 Batch: 5049/20099 (25.12%) Loss: 2.329400 LR: 0.00002968 [09:54:49] Epoch: 1 Batch: 5050/20099 (25.13%) Loss: 2.261026 LR: 0.00002968 [09:54:50] Epoch: 1 Batch: 5051/20099 (25.13%) Loss: 1.971361 LR: 0.00002968 [09:54:52] Epoch: 1 Batch: 5052/20099 (25.14%) Loss: 1.957452 LR: 0.00002968 [09:54:54] Epoch: 1 Batch: 5053/20099 (25.14%) Loss: 2.236181 LR: 0.00002968 [09:54:56] Epoch: 1 Batch: 5054/20099 (25.15%) Loss: 
2.456327 LR: 0.00002968 [09:54:58] Epoch: 1 Batch: 5055/20099 (25.15%) Loss: 2.365149 LR: 0.00002968 [09:54:59] Epoch: 1 Batch: 5056/20099 (25.16%) Loss: 2.120660 LR: 0.00002968 [09:55:01] Epoch: 1 Batch: 5057/20099 (25.16%) Loss: 2.354949 LR: 0.00002968 [09:55:03] Epoch: 1 Batch: 5058/20099 (25.17%) Loss: 2.457166 LR: 0.00002968 [09:55:05] Epoch: 1 Batch: 5059/20099 (25.17%) Loss: 2.345314 LR: 0.00002968 [09:55:06] Epoch: 1 Batch: 5060/20099 (25.18%) Loss: 2.072136 LR: 0.00002968 [09:55:08] Epoch: 1 Batch: 5061/20099 (25.18%) Loss: 2.036778 LR: 0.00002968 [09:55:10] Epoch: 1 Batch: 5062/20099 (25.19%) Loss: 2.226122 LR: 0.00002968 [09:55:12] Epoch: 1 Batch: 5063/20099 (25.19%) Loss: 2.248617 LR: 0.00002968 [09:55:14] Epoch: 1 Batch: 5064/20099 (25.20%) Loss: 1.989989 LR: 0.00002967 [09:55:15] Epoch: 1 Batch: 5065/20099 (25.20%) Loss: 2.217492 LR: 0.00002967 [09:55:17] Epoch: 1 Batch: 5066/20099 (25.21%) Loss: 2.263133 LR: 0.00002967 [09:55:19] Epoch: 1 Batch: 5067/20099 (25.21%) Loss: 2.183441 LR: 0.00002967 [09:55:21] Epoch: 1 Batch: 5068/20099 (25.22%) Loss: 2.024618 LR: 0.00002967 [09:55:22] Epoch: 1 Batch: 5069/20099 (25.22%) Loss: 2.032421 LR: 0.00002967 [09:55:24] Epoch: 1 Batch: 5070/20099 (25.23%) Loss: 2.105435 LR: 0.00002967 [09:55:26] Epoch: 1 Batch: 5071/20099 (25.23%) Loss: 2.280354 LR: 0.00002967 [09:55:28] Epoch: 1 Batch: 5072/20099 (25.24%) Loss: 2.200377 LR: 0.00002967 [09:55:30] Epoch: 1 Batch: 5073/20099 (25.24%) Loss: 2.267823 LR: 0.00002967 [09:55:31] Epoch: 1 Batch: 5074/20099 (25.25%) Loss: 2.109293 LR: 0.00002967 [09:55:33] Epoch: 1 Batch: 5075/20099 (25.25%) Loss: 2.211303 LR: 0.00002967 [09:55:35] Epoch: 1 Batch: 5076/20099 (25.25%) Loss: 2.619929 LR: 0.00002967 [09:55:37] Epoch: 1 Batch: 5077/20099 (25.26%) Loss: 2.137373 LR: 0.00002967 [09:55:39] Epoch: 1 Batch: 5078/20099 (25.26%) Loss: 2.380644 LR: 0.00002966 [09:55:40] Epoch: 1 Batch: 5079/20099 (25.27%) Loss: 2.055312 LR: 0.00002966 [09:55:42] Epoch: 1 Batch: 5080/20099 (25.27%) 
Loss: 2.142936 LR: 0.00002966 [09:55:44] Epoch: 1 Batch: 5081/20099 (25.28%) Loss: 2.181513 LR: 0.00002966 [09:55:46] Epoch: 1 Batch: 5082/20099 (25.28%) Loss: 2.027374 LR: 0.00002966 [09:55:47] Epoch: 1 Batch: 5083/20099 (25.29%) Loss: 2.072664 LR: 0.00002966 [09:55:49] Epoch: 1 Batch: 5084/20099 (25.29%) Loss: 2.156169 LR: 0.00002966 [09:55:51] Epoch: 1 Batch: 5085/20099 (25.30%) Loss: 2.111713 LR: 0.00002966 [09:55:53] Epoch: 1 Batch: 5086/20099 (25.30%) Loss: 2.183945 LR: 0.00002966 [09:55:54] Epoch: 1 Batch: 5087/20099 (25.31%) Loss: 1.907758 LR: 0.00002966 [09:55:56] Epoch: 1 Batch: 5088/20099 (25.31%) Loss: 2.264148 LR: 0.00002966 [09:55:58] Epoch: 1 Batch: 5089/20099 (25.32%) Loss: 2.085764 LR: 0.00002966 [09:56:00] Epoch: 1 Batch: 5090/20099 (25.32%) Loss: 1.935993 LR: 0.00002966 [09:56:01] Epoch: 1 Batch: 5091/20099 (25.33%) Loss: 2.302396 LR: 0.00002966 [09:56:03] Epoch: 1 Batch: 5092/20099 (25.33%) Loss: 2.428800 LR: 0.00002966 [09:56:05] Epoch: 1 Batch: 5093/20099 (25.34%) Loss: 2.317728 LR: 0.00002966 [09:56:07] Epoch: 1 Batch: 5094/20099 (25.34%) Loss: 2.200238 LR: 0.00002966 [09:56:09] Epoch: 1 Batch: 5095/20099 (25.35%) Loss: 1.937538 LR: 0.00002966 [09:56:10] Epoch: 1 Batch: 5096/20099 (25.35%) Loss: 2.151785 LR: 0.00002966 [09:56:12] Epoch: 1 Batch: 5097/20099 (25.36%) Loss: 2.227997 LR: 0.00002966 [09:56:14] Epoch: 1 Batch: 5098/20099 (25.36%) Loss: 2.187951 LR: 0.00002966 [09:56:16] Epoch: 1 Batch: 5099/20099 (25.37%) Loss: 2.208586 LR: 0.00002965 [09:56:17] Epoch: 1 Batch: 5100/20099 (25.37%) Loss: 2.386372 LR: 0.00002965 [09:56:19] Epoch: 1 Batch: 5101/20099 (25.38%) Loss: 1.812694 LR: 0.00002965 [09:56:21] Epoch: 1 Batch: 5102/20099 (25.38%) Loss: 2.130867 LR: 0.00002965 [09:56:23] Epoch: 1 Batch: 5103/20099 (25.39%) Loss: 2.153519 LR: 0.00002965 [09:56:24] Epoch: 1 Batch: 5104/20099 (25.39%) Loss: 2.100061 LR: 0.00002965 [09:56:26] Epoch: 1 Batch: 5105/20099 (25.40%) Loss: 2.073378 LR: 0.00002965 [09:56:28] Epoch: 1 Batch: 5106/20099 
(25.40%) Loss: 1.987383 LR: 0.00002965 [09:56:30] Epoch: 1 Batch: 5107/20099 (25.41%) Loss: 2.197016 LR: 0.00002965 [09:56:31] Epoch: 1 Batch: 5108/20099 (25.41%) Loss: 1.898695 LR: 0.00002965 [09:56:33] Epoch: 1 Batch: 5109/20099 (25.42%) Loss: 2.173894 LR: 0.00002965 [09:56:35] Epoch: 1 Batch: 5110/20099 (25.42%) Loss: 2.258354 LR: 0.00002965 [09:56:37] Epoch: 1 Batch: 5111/20099 (25.43%) Loss: 2.318998 LR: 0.00002965 [09:56:39] Epoch: 1 Batch: 5112/20099 (25.43%) Loss: 2.032122 LR: 0.00002965 [09:56:40] Epoch: 1 Batch: 5113/20099 (25.44%) Loss: 2.317402 LR: 0.00002965 [09:56:42] Epoch: 1 Batch: 5114/20099 (25.44%) Loss: 2.298309 LR: 0.00002965 [09:56:44] Epoch: 1 Batch: 5115/20099 (25.45%) Loss: 2.197459 LR: 0.00002965 [09:56:46] Epoch: 1 Batch: 5116/20099 (25.45%) Loss: 2.058699 LR: 0.00002965 [09:56:47] Epoch: 1 Batch: 5117/20099 (25.46%) Loss: 2.573050 LR: 0.00002965 [09:56:49] Epoch: 1 Batch: 5118/20099 (25.46%) Loss: 2.187925 LR: 0.00002965 [09:56:51] Epoch: 1 Batch: 5119/20099 (25.47%) Loss: 2.668014 LR: 0.00002965 [09:56:53] Epoch: 1 Batch: 5120/20099 (25.47%) Loss: 2.150378 LR: 0.00002964 [09:56:55] Epoch: 1 Batch: 5121/20099 (25.48%) Loss: 2.171249 LR: 0.00002964 [09:56:56] Epoch: 1 Batch: 5122/20099 (25.48%) Loss: 2.185718 LR: 0.00002964 [09:56:58] Epoch: 1 Batch: 5123/20099 (25.49%) Loss: 2.049951 LR: 0.00002964 [09:57:00] Epoch: 1 Batch: 5124/20099 (25.49%) Loss: 2.215984 LR: 0.00002964 [09:57:02] Epoch: 1 Batch: 5125/20099 (25.50%) Loss: 1.875645 LR: 0.00002964 [09:57:03] Epoch: 1 Batch: 5126/20099 (25.50%) Loss: 2.088014 LR: 0.00002964 [09:57:05] Epoch: 1 Batch: 5127/20099 (25.51%) Loss: 2.194373 LR: 0.00002964 [09:57:07] Epoch: 1 Batch: 5128/20099 (25.51%) Loss: 2.093822 LR: 0.00002964 [09:57:09] Epoch: 1 Batch: 5129/20099 (25.52%) Loss: 2.440821 LR: 0.00002964 [09:57:10] Epoch: 1 Batch: 5130/20099 (25.52%) Loss: 2.309804 LR: 0.00002964 [09:57:12] Epoch: 1 Batch: 5131/20099 (25.53%) Loss: 2.474787 LR: 0.00002964 [09:57:14] Epoch: 1 Batch: 
5132/20099 (25.53%) Loss: 2.245017 LR: 0.00002964 [09:57:16] Epoch: 1 Batch: 5133/20099 (25.54%) Loss: 2.397236 LR: 0.00002964 [09:57:18] Epoch: 1 Batch: 5134/20099 (25.54%) Loss: 1.975077 LR: 0.00002963 [09:57:19] Epoch: 1 Batch: 5135/20099 (25.55%) Loss: 2.271425 LR: 0.00002963 [09:57:21] Epoch: 1 Batch: 5136/20099 (25.55%) Loss: 1.811434 LR: 0.00002963 [09:57:23] Epoch: 1 Batch: 5137/20099 (25.56%) Loss: 1.876786 LR: 0.00002963 [09:57:25] Epoch: 1 Batch: 5138/20099 (25.56%) Loss: 2.212978 LR: 0.00002963 [09:57:27] Epoch: 1 Batch: 5139/20099 (25.57%) Loss: 2.047060 LR: 0.00002963 [09:57:28] Epoch: 1 Batch: 5140/20099 (25.57%) Loss: 2.023449 LR: 0.00002963 [09:57:30] Epoch: 1 Batch: 5141/20099 (25.58%) Loss: 2.472535 LR: 0.00002963 [09:57:32] Epoch: 1 Batch: 5142/20099 (25.58%) Loss: 1.974768 LR: 0.00002963 [09:57:34] Epoch: 1 Batch: 5143/20099 (25.59%) Loss: 2.002954 LR: 0.00002963 [09:57:35] Epoch: 1 Batch: 5144/20099 (25.59%) Loss: 2.232331 LR: 0.00002963 [09:57:37] Epoch: 1 Batch: 5145/20099 (25.60%) Loss: 2.243229 LR: 0.00002963 [09:57:39] Epoch: 1 Batch: 5146/20099 (25.60%) Loss: 2.211474 LR: 0.00002963 [09:57:41] Epoch: 1 Batch: 5147/20099 (25.61%) Loss: 2.213857 LR: 0.00002963 [09:57:42] Epoch: 1 Batch: 5148/20099 (25.61%) Loss: 2.011610 LR: 0.00002963 [09:57:44] Epoch: 1 Batch: 5149/20099 (25.62%) Loss: 2.352600 LR: 0.00002963 [09:57:46] Epoch: 1 Batch: 5150/20099 (25.62%) Loss: 2.182811 LR: 0.00002963 [09:57:48] Epoch: 1 Batch: 5151/20099 (25.63%) Loss: 2.296450 LR: 0.00002963 [09:57:50] Epoch: 1 Batch: 5152/20099 (25.63%) Loss: 2.089911 LR: 0.00002963 [09:57:51] Epoch: 1 Batch: 5153/20099 (25.64%) Loss: 2.424341 LR: 0.00002963 [09:57:53] Epoch: 1 Batch: 5154/20099 (25.64%) Loss: 2.280419 LR: 0.00002963 [09:57:55] Epoch: 1 Batch: 5155/20099 (25.65%) Loss: 2.134317 LR: 0.00002962 [09:57:57] Epoch: 1 Batch: 5156/20099 (25.65%) Loss: 2.361571 LR: 0.00002962 [09:57:58] Epoch: 1 Batch: 5157/20099 (25.66%) Loss: 2.201919 LR: 0.00002962 [09:58:00] Epoch: 1 
Batch: 5158/20099 (25.66%) Loss: 2.307642 LR: 0.00002962 [09:58:02] Epoch: 1 Batch: 5159/20099 (25.67%) Loss: 2.037541 LR: 0.00002962 [09:58:04] Epoch: 1 Batch: 5160/20099 (25.67%) Loss: 2.114485 LR: 0.00002962 [09:58:05] Epoch: 1 Batch: 5161/20099 (25.68%) Loss: 1.782122 LR: 0.00002962 [09:58:07] Epoch: 1 Batch: 5162/20099 (25.68%) Loss: 2.386678 LR: 0.00002962 [09:58:09] Epoch: 1 Batch: 5163/20099 (25.69%) Loss: 2.247995 LR: 0.00002962 [09:58:11] Epoch: 1 Batch: 5164/20099 (25.69%) Loss: 2.036732 LR: 0.00002962 [09:58:13] Epoch: 1 Batch: 5165/20099 (25.70%) Loss: 1.930676 LR: 0.00002962 [09:58:14] Epoch: 1 Batch: 5166/20099 (25.70%) Loss: 2.182351 LR: 0.00002962 [09:58:16] Epoch: 1 Batch: 5167/20099 (25.71%) Loss: 2.266179 LR: 0.00002962 [09:58:18] Epoch: 1 Batch: 5168/20099 (25.71%) Loss: 2.083990 LR: 0.00002962 [09:58:20] Epoch: 1 Batch: 5169/20099 (25.72%) Loss: 2.239344 LR: 0.00002961 [09:58:21] Epoch: 1 Batch: 5170/20099 (25.72%) Loss: 2.294771 LR: 0.00002961 [09:58:23] Epoch: 1 Batch: 5171/20099 (25.73%) Loss: 2.059346 LR: 0.00002961 [09:58:25] Epoch: 1 Batch: 5172/20099 (25.73%) Loss: 2.218330 LR: 0.00002961 [09:58:27] Epoch: 1 Batch: 5173/20099 (25.74%) Loss: 1.920479 LR: 0.00002961 [09:58:28] Epoch: 1 Batch: 5174/20099 (25.74%) Loss: 1.964068 LR: 0.00002961 [09:58:30] Epoch: 1 Batch: 5175/20099 (25.75%) Loss: 2.025846 LR: 0.00002961 [09:58:32] Epoch: 1 Batch: 5176/20099 (25.75%) Loss: 2.012708 LR: 0.00002961 [09:58:34] Epoch: 1 Batch: 5177/20099 (25.76%) Loss: 2.184777 LR: 0.00002961 [09:58:36] Epoch: 1 Batch: 5178/20099 (25.76%) Loss: 1.947737 LR: 0.00002961 [09:58:37] Epoch: 1 Batch: 5179/20099 (25.77%) Loss: 2.226052 LR: 0.00002961 [09:58:39] Epoch: 1 Batch: 5180/20099 (25.77%) Loss: 2.344881 LR: 0.00002961 [09:58:41] Epoch: 1 Batch: 5181/20099 (25.78%) Loss: 2.275828 LR: 0.00002961 [09:58:43] Epoch: 1 Batch: 5182/20099 (25.78%) Loss: 1.784848 LR: 0.00002961 [09:58:44] Epoch: 1 Batch: 5183/20099 (25.79%) Loss: 2.316425 LR: 0.00002961 [09:58:46] Epoch: 
1 Batch: 5184/20099 (25.79%) Loss: 2.111327 LR: 0.00002961 [09:58:48] Epoch: 1 Batch: 5185/20099 (25.80%) Loss: 2.050776 LR: 0.00002961 [09:58:50] Epoch: 1 Batch: 5186/20099 (25.80%) Loss: 2.236385 LR: 0.00002961 [09:58:51] Epoch: 1 Batch: 5187/20099 (25.81%) Loss: 2.197124 LR: 0.00002961 [09:58:53] Epoch: 1 Batch: 5188/20099 (25.81%) Loss: 2.071490 LR: 0.00002961 [09:58:55] Epoch: 1 Batch: 5189/20099 (25.82%) Loss: 2.126563 LR: 0.00002961 [09:58:57] Epoch: 1 Batch: 5190/20099 (25.82%) Loss: 2.282122 LR: 0.00002960 [09:58:58] Epoch: 1 Batch: 5191/20099 (25.83%) Loss: 2.160692 LR: 0.00002960 [09:59:00] Epoch: 1 Batch: 5192/20099 (25.83%) Loss: 2.069715 LR: 0.00002960 [09:59:02] Epoch: 1 Batch: 5193/20099 (25.84%) Loss: 2.334690 LR: 0.00002960 [09:59:04] Epoch: 1 Batch: 5194/20099 (25.84%) Loss: 1.808087 LR: 0.00002960 [09:59:06] Epoch: 1 Batch: 5195/20099 (25.85%) Loss: 2.366622 LR: 0.00002960 [09:59:07] Epoch: 1 Batch: 5196/20099 (25.85%) Loss: 2.141907 LR: 0.00002960 [09:59:09] Epoch: 1 Batch: 5197/20099 (25.86%) Loss: 2.319940 LR: 0.00002960 [09:59:11] Epoch: 1 Batch: 5198/20099 (25.86%) Loss: 1.934475 LR: 0.00002960 [09:59:13] Epoch: 1 Batch: 5199/20099 (25.87%) Loss: 2.054247 LR: 0.00002960 [09:59:18] >> Cleaned up old temp checkpoint: epoch1_step3200 [09:59:18] >> Temp checkpoint saved: epoch1_step5200, size: 0.1693 GB [09:59:18] Epoch: 1 Batch: 5200/20099 (25.87%) Loss: 2.111468 LR: 0.00002960 [09:59:20] Epoch: 1 Batch: 5201/20099 (25.88%) Loss: 2.183207 LR: 0.00002960 [09:59:21] Epoch: 1 Batch: 5202/20099 (25.88%) Loss: 2.348523 LR: 0.00002960 [09:59:23] Epoch: 1 Batch: 5203/20099 (25.89%) Loss: 2.546628 LR: 0.00002960 [09:59:25] Epoch: 1 Batch: 5204/20099 (25.89%) Loss: 2.124391 LR: 0.00002959 [09:59:27] Epoch: 1 Batch: 5205/20099 (25.90%) Loss: 1.827406 LR: 0.00002959 [09:59:28] Epoch: 1 Batch: 5206/20099 (25.90%) Loss: 2.222822 LR: 0.00002959 [09:59:30] Epoch: 1 Batch: 5207/20099 (25.91%) Loss: 2.085008 LR: 0.00002959 [09:59:32] Epoch: 1 Batch: 5208/20099 
(25.91%) Loss: 1.961774 LR: 0.00002959 [09:59:34] Epoch: 1 Batch: 5209/20099 (25.92%) Loss: 2.072930 LR: 0.00002959 [09:59:36] Epoch: 1 Batch: 5210/20099 (25.92%) Loss: 2.404287 LR: 0.00002959 [09:59:37] Epoch: 1 Batch: 5211/20099 (25.93%) Loss: 2.546427 LR: 0.00002959 [09:59:39] Epoch: 1 Batch: 5212/20099 (25.93%) Loss: 2.233519 LR: 0.00002959 [09:59:41] Epoch: 1 Batch: 5213/20099 (25.94%) Loss: 2.341598 LR: 0.00002959 [09:59:43] Epoch: 1 Batch: 5214/20099 (25.94%) Loss: 2.159814 LR: 0.00002959 [09:59:45] Epoch: 1 Batch: 5215/20099 (25.95%) Loss: 2.433942 LR: 0.00002959 [09:59:46] Epoch: 1 Batch: 5216/20099 (25.95%) Loss: 2.086906 LR: 0.00002959 [09:59:48] Epoch: 1 Batch: 5217/20099 (25.96%) Loss: 2.333353 LR: 0.00002959 [09:59:50] Epoch: 1 Batch: 5218/20099 (25.96%) Loss: 2.006044 LR: 0.00002958 [09:59:52] Epoch: 1 Batch: 5219/20099 (25.97%) Loss: 2.050009 LR: 0.00002958 [09:59:54] Epoch: 1 Batch: 5220/20099 (25.97%) Loss: 2.103528 LR: 0.00002958 [09:59:55] Epoch: 1 Batch: 5221/20099 (25.98%) Loss: 1.934553 LR: 0.00002958 [09:59:57] Epoch: 1 Batch: 5222/20099 (25.98%) Loss: 2.176851 LR: 0.00002958 [09:59:59] Epoch: 1 Batch: 5223/20099 (25.99%) Loss: 2.023637 LR: 0.00002958 [10:00:01] Epoch: 1 Batch: 5224/20099 (25.99%) Loss: 2.082035 LR: 0.00002958 [10:00:03] Epoch: 1 Batch: 5225/20099 (26.00%) Loss: 1.821024 LR: 0.00002958 [10:00:04] Epoch: 1 Batch: 5226/20099 (26.00%) Loss: 1.931556 LR: 0.00002958 [10:00:06] Epoch: 1 Batch: 5227/20099 (26.01%) Loss: 2.502036 LR: 0.00002958 [10:00:08] Epoch: 1 Batch: 5228/20099 (26.01%) Loss: 2.239546 LR: 0.00002958 [10:00:10] Epoch: 1 Batch: 5229/20099 (26.02%) Loss: 1.865717 LR: 0.00002958 [10:00:11] Epoch: 1 Batch: 5230/20099 (26.02%) Loss: 2.224649 LR: 0.00002958 [10:00:13] Epoch: 1 Batch: 5231/20099 (26.03%) Loss: 2.182599 LR: 0.00002958 [10:00:15] Epoch: 1 Batch: 5232/20099 (26.03%) Loss: 2.027246 LR: 0.00002958 [10:00:17] Epoch: 1 Batch: 5233/20099 (26.04%) Loss: 1.965098 LR: 0.00002958 [10:00:19] Epoch: 1 Batch: 
5234/20099 (26.04%) Loss: 1.760204 LR: 0.00002958 [10:00:20] Epoch: 1 Batch: 5235/20099 (26.05%) Loss: 1.967444 LR: 0.00002958 [10:00:22] Epoch: 1 Batch: 5236/20099 (26.05%) Loss: 2.337430 LR: 0.00002958 [10:00:24] Epoch: 1 Batch: 5237/20099 (26.06%) Loss: 2.282742 LR: 0.00002958 [10:00:26] Epoch: 1 Batch: 5238/20099 (26.06%) Loss: 1.893889 LR: 0.00002958 [10:00:27] Epoch: 1 Batch: 5239/20099 (26.07%) Loss: 1.951274 LR: 0.00002957 [10:00:29] Epoch: 1 Batch: 5240/20099 (26.07%) Loss: 1.945030 LR: 0.00002957 [10:00:31] Epoch: 1 Batch: 5241/20099 (26.08%) Loss: 2.054807 LR: 0.00002957 [10:00:33] Epoch: 1 Batch: 5242/20099 (26.08%) Loss: 2.125057 LR: 0.00002957 [10:00:34] Epoch: 1 Batch: 5243/20099 (26.09%) Loss: 2.172578 LR: 0.00002957 [10:00:36] Epoch: 1 Batch: 5244/20099 (26.09%) Loss: 2.004131 LR: 0.00002957 [10:00:38] Epoch: 1 Batch: 5245/20099 (26.10%) Loss: 2.060345 LR: 0.00002957 [10:00:40] Epoch: 1 Batch: 5246/20099 (26.10%) Loss: 2.444358 LR: 0.00002957 [10:00:41] Epoch: 1 Batch: 5247/20099 (26.11%) Loss: 2.455721 LR: 0.00002957 [10:00:43] Epoch: 1 Batch: 5248/20099 (26.11%) Loss: 2.502295 LR: 0.00002957 [10:00:45] Epoch: 1 Batch: 5249/20099 (26.12%) Loss: 2.249175 LR: 0.00002957 [10:00:47] Epoch: 1 Batch: 5250/20099 (26.12%) Loss: 2.195855 LR: 0.00002957 [10:00:48] Epoch: 1 Batch: 5251/20099 (26.13%) Loss: 2.052966 LR: 0.00002957 [10:00:50] Epoch: 1 Batch: 5252/20099 (26.13%) Loss: 1.988862 LR: 0.00002957 [10:00:52] Epoch: 1 Batch: 5253/20099 (26.14%) Loss: 2.194725 LR: 0.00002956 [10:00:54] Epoch: 1 Batch: 5254/20099 (26.14%) Loss: 2.220252 LR: 0.00002956 [10:00:55] Epoch: 1 Batch: 5255/20099 (26.15%) Loss: 2.084338 LR: 0.00002956 [10:00:57] Epoch: 1 Batch: 5256/20099 (26.15%) Loss: 2.389321 LR: 0.00002956 [10:00:59] Epoch: 1 Batch: 5257/20099 (26.16%) Loss: 2.290904 LR: 0.00002956 [10:01:01] Epoch: 1 Batch: 5258/20099 (26.16%) Loss: 1.912925 LR: 0.00002956 [10:01:03] Epoch: 1 Batch: 5259/20099 (26.17%) Loss: 2.081119 LR: 0.00002956 [10:01:04] Epoch: 1 
Batch: 5260/20099 (26.17%) Loss: 2.257213 LR: 0.00002956 [10:01:06] Epoch: 1 Batch: 5261/20099 (26.18%) Loss: 1.953340 LR: 0.00002956 [10:01:08] Epoch: 1 Batch: 5262/20099 (26.18%) Loss: 1.955008 LR: 0.00002956 [10:01:10] Epoch: 1 Batch: 5263/20099 (26.19%) Loss: 2.049940 LR: 0.00002956 [10:01:11] Epoch: 1 Batch: 5264/20099 (26.19%) Loss: 2.308121 LR: 0.00002956 [10:01:13] Epoch: 1 Batch: 5265/20099 (26.20%) Loss: 2.042370 LR: 0.00002956 [10:01:15] Epoch: 1 Batch: 5266/20099 (26.20%) Loss: 1.847723 LR: 0.00002956 [10:01:17] Epoch: 1 Batch: 5267/20099 (26.21%) Loss: 2.209192 LR: 0.00002955 [10:01:19] Epoch: 1 Batch: 5268/20099 (26.21%) Loss: 2.036092 LR: 0.00002955 [10:01:20] Epoch: 1 Batch: 5269/20099 (26.22%) Loss: 1.951100 LR: 0.00002955 [10:01:22] Epoch: 1 Batch: 5270/20099 (26.22%) Loss: 2.438736 LR: 0.00002955 [10:01:24] Epoch: 1 Batch: 5271/20099 (26.23%) Loss: 2.131855 LR: 0.00002955 [10:01:26] Epoch: 1 Batch: 5272/20099 (26.23%) Loss: 2.211555 LR: 0.00002955 [10:01:27] Epoch: 1 Batch: 5273/20099 (26.24%) Loss: 2.227537 LR: 0.00002955 [10:01:29] Epoch: 1 Batch: 5274/20099 (26.24%) Loss: 2.000927 LR: 0.00002955 [10:01:31] Epoch: 1 Batch: 5275/20099 (26.25%) Loss: 2.245870 LR: 0.00002955 [10:01:33] Epoch: 1 Batch: 5276/20099 (26.25%) Loss: 2.149190 LR: 0.00002955 [10:01:35] Epoch: 1 Batch: 5277/20099 (26.26%) Loss: 2.182068 LR: 0.00002955 [10:01:37] Epoch: 1 Batch: 5278/20099 (26.26%) Loss: 2.064885 LR: 0.00002955 [10:01:38] Epoch: 1 Batch: 5279/20099 (26.26%) Loss: 2.357294 LR: 0.00002955 [10:01:40] Epoch: 1 Batch: 5280/20099 (26.27%) Loss: 2.309474 LR: 0.00002955 [10:01:42] Epoch: 1 Batch: 5281/20099 (26.27%) Loss: 2.137645 LR: 0.00002955 [10:01:44] Epoch: 1 Batch: 5282/20099 (26.28%) Loss: 1.903790 LR: 0.00002955 [10:01:46] Epoch: 1 Batch: 5283/20099 (26.28%) Loss: 2.483090 LR: 0.00002955 [10:01:47] Epoch: 1 Batch: 5284/20099 (26.29%) Loss: 2.257658 LR: 0.00002955 [10:01:49] Epoch: 1 Batch: 5285/20099 (26.29%) Loss: 2.112091 LR: 0.00002955 [10:01:51] Epoch: 
1 Batch: 5286/20099 (26.30%) Loss: 2.232180 LR: 0.00002955 [10:01:53] Epoch: 1 Batch: 5287/20099 (26.30%) Loss: 2.244846 LR: 0.00002955 [10:01:54] Epoch: 1 Batch: 5288/20099 (26.31%) Loss: 2.131731 LR: 0.00002954 [10:01:56] Epoch: 1 Batch: 5289/20099 (26.31%) Loss: 2.124147 LR: 0.00002954 [10:01:58] Epoch: 1 Batch: 5290/20099 (26.32%) Loss: 1.862711 LR: 0.00002954 [10:02:00] Epoch: 1 Batch: 5291/20099 (26.32%) Loss: 2.122579 LR: 0.00002954 [10:02:02] Epoch: 1 Batch: 5292/20099 (26.33%) Loss: 2.122437 LR: 0.00002954 [10:02:03] Epoch: 1 Batch: 5293/20099 (26.33%) Loss: 2.008774 LR: 0.00002954 [10:02:05] Epoch: 1 Batch: 5294/20099 (26.34%) Loss: 2.153089 LR: 0.00002954 [10:02:07] Epoch: 1 Batch: 5295/20099 (26.34%) Loss: 2.453319 LR: 0.00002954 [10:02:09] Epoch: 1 Batch: 5296/20099 (26.35%) Loss: 2.107199 LR: 0.00002954 [10:02:10] Epoch: 1 Batch: 5297/20099 (26.35%) Loss: 2.016767 LR: 0.00002954 [10:02:12] Epoch: 1 Batch: 5298/20099 (26.36%) Loss: 2.049711 LR: 0.00002954 [10:02:14] Epoch: 1 Batch: 5299/20099 (26.36%) Loss: 2.304215 LR: 0.00002954 [10:02:16] Epoch: 1 Batch: 5300/20099 (26.37%) Loss: 2.367209 LR: 0.00002954 [10:02:18] Epoch: 1 Batch: 5301/20099 (26.37%) Loss: 2.152944 LR: 0.00002954 [10:02:19] Epoch: 1 Batch: 5302/20099 (26.38%) Loss: 1.905111 LR: 0.00002953 [10:02:21] Epoch: 1 Batch: 5303/20099 (26.38%) Loss: 1.983787 LR: 0.00002953 [10:02:23] Epoch: 1 Batch: 5304/20099 (26.39%) Loss: 2.155028 LR: 0.00002953 [10:02:25] Epoch: 1 Batch: 5305/20099 (26.39%) Loss: 2.216731 LR: 0.00002953 [10:02:26] Epoch: 1 Batch: 5306/20099 (26.40%) Loss: 1.968447 LR: 0.00002953 [10:02:28] Epoch: 1 Batch: 5307/20099 (26.40%) Loss: 2.258006 LR: 0.00002953 [10:02:30] Epoch: 1 Batch: 5308/20099 (26.41%) Loss: 2.181214 LR: 0.00002953 [10:02:32] Epoch: 1 Batch: 5309/20099 (26.41%) Loss: 2.443493 LR: 0.00002953 [10:02:34] Epoch: 1 Batch: 5310/20099 (26.42%) Loss: 1.945972 LR: 0.00002953 [10:02:35] Epoch: 1 Batch: 5311/20099 (26.42%) Loss: 1.978797 LR: 0.00002953 [10:02:37] 
Epoch: 1 Batch: 5312/20099 (26.43%) Loss: 2.087245 LR: 0.00002953 [10:02:39] Epoch: 1 Batch: 5313/20099 (26.43%) Loss: 2.147776 LR: 0.00002953 [10:02:41] Epoch: 1 Batch: 5314/20099 (26.44%) Loss: 2.105486 LR: 0.00002953 [10:02:42] Epoch: 1 Batch: 5315/20099 (26.44%) Loss: 1.979147 LR: 0.00002953 [10:02:44] Epoch: 1 Batch: 5316/20099 (26.45%) Loss: 2.090837 LR: 0.00002952 [10:02:46] Epoch: 1 Batch: 5317/20099 (26.45%) Loss: 2.036646 LR: 0.00002952 [10:02:48] Epoch: 1 Batch: 5318/20099 (26.46%) Loss: 1.980001 LR: 0.00002952 [10:02:50] Epoch: 1 Batch: 5319/20099 (26.46%) Loss: 2.267281 LR: 0.00002952 [10:02:51] Epoch: 1 Batch: 5320/20099 (26.47%) Loss: 2.169286 LR: 0.00002952 [10:02:53] Epoch: 1 Batch: 5321/20099 (26.47%) Loss: 2.157197 LR: 0.00002952 [10:02:55] Epoch: 1 Batch: 5322/20099 (26.48%) Loss: 2.132870 LR: 0.00002952 [10:02:57] Epoch: 1 Batch: 5323/20099 (26.48%) Loss: 2.362358 LR: 0.00002952 [10:02:58] Epoch: 1 Batch: 5324/20099 (26.49%) Loss: 1.845561 LR: 0.00002952 [10:03:00] Epoch: 1 Batch: 5325/20099 (26.49%) Loss: 2.125608 LR: 0.00002952 [10:03:02] Epoch: 1 Batch: 5326/20099 (26.50%) Loss: 2.014180 LR: 0.00002952 [10:03:04] Epoch: 1 Batch: 5327/20099 (26.50%) Loss: 2.529214 LR: 0.00002952 [10:03:05] Epoch: 1 Batch: 5328/20099 (26.51%) Loss: 2.150317 LR: 0.00002952 [10:03:07] Epoch: 1 Batch: 5329/20099 (26.51%) Loss: 1.665474 LR: 0.00002952 [10:03:09] Epoch: 1 Batch: 5330/20099 (26.52%) Loss: 2.055614 LR: 0.00002951 [10:03:11] Epoch: 1 Batch: 5331/20099 (26.52%) Loss: 2.175996 LR: 0.00002951 [10:03:13] Epoch: 1 Batch: 5332/20099 (26.53%) Loss: 2.573268 LR: 0.00002951 [10:03:14] Epoch: 1 Batch: 5333/20099 (26.53%) Loss: 1.926118 LR: 0.00002951 [10:03:16] Epoch: 1 Batch: 5334/20099 (26.54%) Loss: 2.246243 LR: 0.00002951 [10:03:18] Epoch: 1 Batch: 5335/20099 (26.54%) Loss: 2.248676 LR: 0.00002951 [10:03:20] Epoch: 1 Batch: 5336/20099 (26.55%) Loss: 2.077011 LR: 0.00002951 [10:03:21] Epoch: 1 Batch: 5337/20099 (26.55%) Loss: 2.333693 LR: 0.00002951 
[10:03:23] Epoch: 1 Batch: 5338/20099 (26.56%) Loss: 1.980316 LR: 0.00002951 [10:03:25] Epoch: 1 Batch: 5339/20099 (26.56%) Loss: 2.185232 LR: 0.00002951 [10:03:27] Epoch: 1 Batch: 5340/20099 (26.57%) Loss: 2.047994 LR: 0.00002951 [10:03:29] Epoch: 1 Batch: 5341/20099 (26.57%) Loss: 2.406561 LR: 0.00002951 [10:03:30] Epoch: 1 Batch: 5342/20099 (26.58%) Loss: 2.263602 LR: 0.00002951 [10:03:32] Epoch: 1 Batch: 5343/20099 (26.58%) Loss: 2.113905 LR: 0.00002951 [10:03:34] Epoch: 1 Batch: 5344/20099 (26.59%) Loss: 2.165080 LR: 0.00002950 [10:03:36] Epoch: 1 Batch: 5345/20099 (26.59%) Loss: 2.105507 LR: 0.00002950 [10:03:37] Epoch: 1 Batch: 5346/20099 (26.60%) Loss: 2.162409 LR: 0.00002950 [10:03:39] Epoch: 1 Batch: 5347/20099 (26.60%) Loss: 2.254974 LR: 0.00002950 [10:03:41] Epoch: 1 Batch: 5348/20099 (26.61%) Loss: 2.146637 LR: 0.00002950 [10:03:43] Epoch: 1 Batch: 5349/20099 (26.61%) Loss: 1.983313 LR: 0.00002950 [10:03:45] Epoch: 1 Batch: 5350/20099 (26.62%) Loss: 2.384114 LR: 0.00002950 [10:03:46] Epoch: 1 Batch: 5351/20099 (26.62%) Loss: 2.380051 LR: 0.00002950 [10:03:48] Epoch: 1 Batch: 5352/20099 (26.63%) Loss: 2.154150 LR: 0.00002950 [10:03:50] Epoch: 1 Batch: 5353/20099 (26.63%) Loss: 2.188178 LR: 0.00002950 [10:03:52] Epoch: 1 Batch: 5354/20099 (26.64%) Loss: 1.660404 LR: 0.00002950 [10:03:53] Epoch: 1 Batch: 5355/20099 (26.64%) Loss: 2.377430 LR: 0.00002950 [10:03:55] Epoch: 1 Batch: 5356/20099 (26.65%) Loss: 2.008257 LR: 0.00002950 [10:03:57] Epoch: 1 Batch: 5357/20099 (26.65%) Loss: 2.257199 LR: 0.00002950 [10:03:59] Epoch: 1 Batch: 5358/20099 (26.66%) Loss: 2.353332 LR: 0.00002950 [10:04:01] Epoch: 1 Batch: 5359/20099 (26.66%) Loss: 2.207362 LR: 0.00002950 [10:04:02] Epoch: 1 Batch: 5360/20099 (26.67%) Loss: 2.388850 LR: 0.00002950 [10:04:04] Epoch: 1 Batch: 5361/20099 (26.67%) Loss: 2.460552 LR: 0.00002950 [10:04:06] Epoch: 1 Batch: 5362/20099 (26.68%) Loss: 2.069199 LR: 0.00002950 [10:04:08] Epoch: 1 Batch: 5363/20099 (26.68%) Loss: 1.785211 LR: 
0.00002950 [10:04:09] Epoch: 1 Batch: 5364/20099 (26.69%) Loss: 1.869507 LR: 0.00002950 [10:04:11] Epoch: 1 Batch: 5365/20099 (26.69%) Loss: 2.315555 LR: 0.00002949 [10:04:13] Epoch: 1 Batch: 5366/20099 (26.70%) Loss: 2.072086 LR: 0.00002949 [10:04:15] Epoch: 1 Batch: 5367/20099 (26.70%) Loss: 2.025519 LR: 0.00002949 [10:04:17] Epoch: 1 Batch: 5368/20099 (26.71%) Loss: 2.026887 LR: 0.00002949 [10:04:18] Epoch: 1 Batch: 5369/20099 (26.71%) Loss: 2.092586 LR: 0.00002949 [10:04:20] Epoch: 1 Batch: 5370/20099 (26.72%) Loss: 1.903172 LR: 0.00002949 [10:04:22] Epoch: 1 Batch: 5371/20099 (26.72%) Loss: 2.265740 LR: 0.00002949 [10:04:24] Epoch: 1 Batch: 5372/20099 (26.73%) Loss: 2.345257 LR: 0.00002949 [10:04:25] Epoch: 1 Batch: 5373/20099 (26.73%) Loss: 2.166030 LR: 0.00002949 [10:04:27] Epoch: 1 Batch: 5374/20099 (26.74%) Loss: 2.028642 LR: 0.00002949 [10:04:29] Epoch: 1 Batch: 5375/20099 (26.74%) Loss: 2.054534 LR: 0.00002949 [10:04:31] Epoch: 1 Batch: 5376/20099 (26.75%) Loss: 2.196232 LR: 0.00002949 [10:04:32] Epoch: 1 Batch: 5377/20099 (26.75%) Loss: 2.086753 LR: 0.00002949 [10:04:34] Epoch: 1 Batch: 5378/20099 (26.76%) Loss: 1.820545 LR: 0.00002949 [10:04:36] Epoch: 1 Batch: 5379/20099 (26.76%) Loss: 2.331662 LR: 0.00002948 [10:04:38] Epoch: 1 Batch: 5380/20099 (26.77%) Loss: 2.286243 LR: 0.00002948 [10:04:40] Epoch: 1 Batch: 5381/20099 (26.77%) Loss: 1.825334 LR: 0.00002948 [10:04:41] Epoch: 1 Batch: 5382/20099 (26.78%) Loss: 2.274067 LR: 0.00002948 [10:04:43] Epoch: 1 Batch: 5383/20099 (26.78%) Loss: 2.003365 LR: 0.00002948 [10:04:45] Epoch: 1 Batch: 5384/20099 (26.79%) Loss: 2.140905 LR: 0.00002948 [10:04:47] Epoch: 1 Batch: 5385/20099 (26.79%) Loss: 2.342805 LR: 0.00002948 [10:04:48] Epoch: 1 Batch: 5386/20099 (26.80%) Loss: 2.162837 LR: 0.00002948 [10:04:50] Epoch: 1 Batch: 5387/20099 (26.80%) Loss: 2.209598 LR: 0.00002948 [10:04:52] Epoch: 1 Batch: 5388/20099 (26.81%) Loss: 2.322072 LR: 0.00002948 [10:04:54] Epoch: 1 Batch: 5389/20099 (26.81%) Loss: 1.984072 
LR: 0.00002948 [10:04:55] Epoch: 1 Batch: 5390/20099 (26.82%) Loss: 2.172188 LR: 0.00002948 [10:04:57] Epoch: 1 Batch: 5391/20099 (26.82%) Loss: 1.974128 LR: 0.00002948 [10:04:59] Epoch: 1 Batch: 5392/20099 (26.83%) Loss: 1.827538 LR: 0.00002948 [10:05:01] Epoch: 1 Batch: 5393/20099 (26.83%) Loss: 2.208696 LR: 0.00002947 [10:05:03] Epoch: 1 Batch: 5394/20099 (26.84%) Loss: 1.988352 LR: 0.00002947 [10:05:04] Epoch: 1 Batch: 5395/20099 (26.84%) Loss: 1.917780 LR: 0.00002947 [10:05:06] Epoch: 1 Batch: 5396/20099 (26.85%) Loss: 2.378491 LR: 0.00002947 [10:05:08] Epoch: 1 Batch: 5397/20099 (26.85%) Loss: 1.977208 LR: 0.00002947 [10:05:10] Epoch: 1 Batch: 5398/20099 (26.86%) Loss: 2.216182 LR: 0.00002947 [10:05:11] Epoch: 1 Batch: 5399/20099 (26.86%) Loss: 2.188872 LR: 0.00002947 [10:05:17] >> Cleaned up old temp checkpoint: epoch1_step3400 [10:05:17] >> Temp checkpoint saved: epoch1_step5400, size: 0.1693 GB [10:05:17] Epoch: 1 Batch: 5400/20099 (26.87%) Loss: 2.098447 LR: 0.00002947 [10:05:19] Epoch: 1 Batch: 5401/20099 (26.87%) Loss: 2.155879 LR: 0.00002947 [10:05:20] Epoch: 1 Batch: 5402/20099 (26.88%) Loss: 2.188242 LR: 0.00002947 [10:05:22] Epoch: 1 Batch: 5403/20099 (26.88%) Loss: 2.194258 LR: 0.00002947 [10:05:24] Epoch: 1 Batch: 5404/20099 (26.89%) Loss: 2.280442 LR: 0.00002947 [10:05:26] Epoch: 1 Batch: 5405/20099 (26.89%) Loss: 2.206422 LR: 0.00002947 [10:05:27] Epoch: 1 Batch: 5406/20099 (26.90%) Loss: 2.300056 LR: 0.00002947 [10:05:29] Epoch: 1 Batch: 5407/20099 (26.90%) Loss: 2.122420 LR: 0.00002946 [10:05:31] Epoch: 1 Batch: 5408/20099 (26.91%) Loss: 2.074892 LR: 0.00002946 [10:05:33] Epoch: 1 Batch: 5409/20099 (26.91%) Loss: 2.045554 LR: 0.00002946 [10:05:34] Epoch: 1 Batch: 5410/20099 (26.92%) Loss: 1.814148 LR: 0.00002946 [10:05:36] Epoch: 1 Batch: 5411/20099 (26.92%) Loss: 2.350235 LR: 0.00002946 [10:05:38] Epoch: 1 Batch: 5412/20099 (26.93%) Loss: 2.165095 LR: 0.00002946 [10:05:40] Epoch: 1 Batch: 5413/20099 (26.93%) Loss: 2.094590 LR: 0.00002946 
[10:05:42] Epoch: 1 Batch: 5414/20099 (26.94%) Loss: 2.133256 LR: 0.00002946 [10:05:43] Epoch: 1 Batch: 5415/20099 (26.94%) Loss: 1.834228 LR: 0.00002946 [10:05:45] Epoch: 1 Batch: 5416/20099 (26.95%) Loss: 2.399768 LR: 0.00002946 [10:05:47] Epoch: 1 Batch: 5417/20099 (26.95%) Loss: 2.149693 LR: 0.00002946 [10:05:49] Epoch: 1 Batch: 5418/20099 (26.96%) Loss: 2.088020 LR: 0.00002946 [10:05:50] Epoch: 1 Batch: 5419/20099 (26.96%) Loss: 1.924644 LR: 0.00002946 [10:05:52] Epoch: 1 Batch: 5420/20099 (26.97%) Loss: 2.209120 LR: 0.00002946 [10:05:54] Epoch: 1 Batch: 5421/20099 (26.97%) Loss: 1.832779 LR: 0.00002945 [10:05:56] Epoch: 1 Batch: 5422/20099 (26.98%) Loss: 2.265687 LR: 0.00002945 [10:05:58] Epoch: 1 Batch: 5423/20099 (26.98%) Loss: 2.064221 LR: 0.00002945 [10:05:59] Epoch: 1 Batch: 5424/20099 (26.99%) Loss: 2.100106 LR: 0.00002945 [10:06:01] Epoch: 1 Batch: 5425/20099 (26.99%) Loss: 2.208666 LR: 0.00002945 [10:06:03] Epoch: 1 Batch: 5426/20099 (27.00%) Loss: 2.229734 LR: 0.00002945 [10:06:05] Epoch: 1 Batch: 5427/20099 (27.00%) Loss: 1.895799 LR: 0.00002945 [10:06:06] Epoch: 1 Batch: 5428/20099 (27.01%) Loss: 2.145902 LR: 0.00002945 [10:06:08] Epoch: 1 Batch: 5429/20099 (27.01%) Loss: 2.554694 LR: 0.00002945 [10:06:10] Epoch: 1 Batch: 5430/20099 (27.02%) Loss: 2.137693 LR: 0.00002945 [10:06:12] Epoch: 1 Batch: 5431/20099 (27.02%) Loss: 2.247862 LR: 0.00002945 [10:06:14] Epoch: 1 Batch: 5432/20099 (27.03%) Loss: 2.299312 LR: 0.00002945 [10:06:15] Epoch: 1 Batch: 5433/20099 (27.03%) Loss: 2.245755 LR: 0.00002945 [10:06:17] Epoch: 1 Batch: 5434/20099 (27.04%) Loss: 2.007467 LR: 0.00002945 [10:06:19] Epoch: 1 Batch: 5435/20099 (27.04%) Loss: 2.135342 LR: 0.00002944 [10:06:21] Epoch: 1 Batch: 5436/20099 (27.05%) Loss: 2.060696 LR: 0.00002944 [10:06:22] Epoch: 1 Batch: 5437/20099 (27.05%) Loss: 2.141539 LR: 0.00002944 [10:06:24] Epoch: 1 Batch: 5438/20099 (27.06%) Loss: 2.083541 LR: 0.00002944 [10:06:26] Epoch: 1 Batch: 5439/20099 (27.06%) Loss: 2.125227 LR: 
0.00002944 [10:06:28] Epoch: 1 Batch: 5440/20099 (27.07%) Loss: 2.110991 LR: 0.00002944 [10:06:29] Epoch: 1 Batch: 5441/20099 (27.07%) Loss: 2.269356 LR: 0.00002944 [10:06:31] Epoch: 1 Batch: 5442/20099 (27.08%) Loss: 1.770213 LR: 0.00002944 [10:06:33] Epoch: 1 Batch: 5443/20099 (27.08%) Loss: 1.907457 LR: 0.00002944 [10:06:35] Epoch: 1 Batch: 5444/20099 (27.09%) Loss: 2.083388 LR: 0.00002944 [10:06:37] Epoch: 1 Batch: 5445/20099 (27.09%) Loss: 2.446862 LR: 0.00002944 [10:06:38] Epoch: 1 Batch: 5446/20099 (27.10%) Loss: 1.941616 LR: 0.00002944 [10:06:40] Epoch: 1 Batch: 5447/20099 (27.10%) Loss: 2.333537 LR: 0.00002944 [10:06:42] Epoch: 1 Batch: 5448/20099 (27.11%) Loss: 1.777848 LR: 0.00002944 [10:06:44] Epoch: 1 Batch: 5449/20099 (27.11%) Loss: 2.244641 LR: 0.00002943 [10:06:45] Epoch: 1 Batch: 5450/20099 (27.12%) Loss: 2.130341 LR: 0.00002943 [10:06:47] Epoch: 1 Batch: 5451/20099 (27.12%) Loss: 2.365151 LR: 0.00002943 [10:06:49] Epoch: 1 Batch: 5452/20099 (27.13%) Loss: 2.368598 LR: 0.00002943 [10:06:51] Epoch: 1 Batch: 5453/20099 (27.13%) Loss: 2.452541 LR: 0.00002943 [10:06:53] Epoch: 1 Batch: 5454/20099 (27.14%) Loss: 1.995470 LR: 0.00002943 [10:06:54] Epoch: 1 Batch: 5455/20099 (27.14%) Loss: 2.118965 LR: 0.00002943 [10:06:56] Epoch: 1 Batch: 5456/20099 (27.15%) Loss: 2.170623 LR: 0.00002943 [10:06:58] Epoch: 1 Batch: 5457/20099 (27.15%) Loss: 2.311609 LR: 0.00002943 [10:07:00] Epoch: 1 Batch: 5458/20099 (27.16%) Loss: 1.917262 LR: 0.00002943 [10:07:01] Epoch: 1 Batch: 5459/20099 (27.16%) Loss: 2.218131 LR: 0.00002943 [10:07:03] Epoch: 1 Batch: 5460/20099 (27.17%) Loss: 2.243514 LR: 0.00002943 [10:07:05] Epoch: 1 Batch: 5461/20099 (27.17%) Loss: 2.038677 LR: 0.00002943 [10:07:07] Epoch: 1 Batch: 5462/20099 (27.18%) Loss: 2.286114 LR: 0.00002943 [10:07:08] Epoch: 1 Batch: 5463/20099 (27.18%) Loss: 1.758963 LR: 0.00002942 [10:07:10] Epoch: 1 Batch: 5464/20099 (27.19%) Loss: 2.262538 LR: 0.00002942 [10:07:12] Epoch: 1 Batch: 5465/20099 (27.19%) Loss: 2.106436 
LR: 0.00002942 [10:07:14] Epoch: 1 Batch: 5466/20099 (27.20%) Loss: 2.104243 LR: 0.00002942 [10:07:16] Epoch: 1 Batch: 5467/20099 (27.20%) Loss: 1.982912 LR: 0.00002942 [10:07:17] Epoch: 1 Batch: 5468/20099 (27.21%) Loss: 2.145750 LR: 0.00002942 [10:07:19] Epoch: 1 Batch: 5469/20099 (27.21%) Loss: 2.372891 LR: 0.00002942 [10:07:21] Epoch: 1 Batch: 5470/20099 (27.22%) Loss: 2.088860 LR: 0.00002942 [10:07:23] Epoch: 1 Batch: 5471/20099 (27.22%) Loss: 2.470699 LR: 0.00002942 [10:07:24] Epoch: 1 Batch: 5472/20099 (27.23%) Loss: 2.321858 LR: 0.00002942 [10:07:26] Epoch: 1 Batch: 5473/20099 (27.23%) Loss: 1.932436 LR: 0.00002942 [10:07:28] Epoch: 1 Batch: 5474/20099 (27.24%) Loss: 2.066908 LR: 0.00002942 [10:07:30] Epoch: 1 Batch: 5475/20099 (27.24%) Loss: 2.051298 LR: 0.00002942 [10:07:32] Epoch: 1 Batch: 5476/20099 (27.25%) Loss: 2.022308 LR: 0.00002942 [10:07:33] Epoch: 1 Batch: 5477/20099 (27.25%) Loss: 2.050642 LR: 0.00002941 [10:07:35] Epoch: 1 Batch: 5478/20099 (27.26%) Loss: 2.252359 LR: 0.00002941 [10:07:37] Epoch: 1 Batch: 5479/20099 (27.26%) Loss: 2.110823 LR: 0.00002941 [10:07:39] Epoch: 1 Batch: 5480/20099 (27.27%) Loss: 2.200112 LR: 0.00002941 [10:07:40] Epoch: 1 Batch: 5481/20099 (27.27%) Loss: 2.105589 LR: 0.00002941 [10:07:42] Epoch: 1 Batch: 5482/20099 (27.27%) Loss: 2.304595 LR: 0.00002941 [10:07:44] Epoch: 1 Batch: 5483/20099 (27.28%) Loss: 2.365151 LR: 0.00002941 [10:07:46] Epoch: 1 Batch: 5484/20099 (27.28%) Loss: 2.251320 LR: 0.00002941 [10:07:47] Epoch: 1 Batch: 5485/20099 (27.29%) Loss: 2.566852 LR: 0.00002941 [10:07:49] Epoch: 1 Batch: 5486/20099 (27.29%) Loss: 2.216344 LR: 0.00002941 [10:07:51] Epoch: 1 Batch: 5487/20099 (27.30%) Loss: 2.187454 LR: 0.00002941 [10:07:53] Epoch: 1 Batch: 5488/20099 (27.30%) Loss: 2.055529 LR: 0.00002941 [10:07:55] Epoch: 1 Batch: 5489/20099 (27.31%) Loss: 2.494184 LR: 0.00002941 [10:07:56] Epoch: 1 Batch: 5490/20099 (27.31%) Loss: 2.404665 LR: 0.00002941 [10:07:58] Epoch: 1 Batch: 5491/20099 (27.32%) Loss: 
2.371977 LR: 0.00002940 [10:08:00] Epoch: 1 Batch: 5492/20099 (27.32%) Loss: 2.162741 LR: 0.00002940 [10:08:02] Epoch: 1 Batch: 5493/20099 (27.33%) Loss: 2.171967 LR: 0.00002940 [10:08:03] Epoch: 1 Batch: 5494/20099 (27.33%) Loss: 2.256630 LR: 0.00002940 [10:08:05] Epoch: 1 Batch: 5495/20099 (27.34%) Loss: 2.235033 LR: 0.00002940 [10:08:07] Epoch: 1 Batch: 5496/20099 (27.34%) Loss: 2.151681 LR: 0.00002940 [10:08:09] Epoch: 1 Batch: 5497/20099 (27.35%) Loss: 2.196597 LR: 0.00002940 [10:08:11] Epoch: 1 Batch: 5498/20099 (27.35%) Loss: 2.422774 LR: 0.00002940 [10:08:12] Epoch: 1 Batch: 5499/20099 (27.36%) Loss: 2.383378 LR: 0.00002940 [10:08:14] >> Evaluating batch 0 [10:08:15] >> Evaluating batch 1 [10:08:16] >> Evaluating batch 2 [10:08:17] >> Evaluating batch 3 [10:08:18] >> Evaluating batch 4 [10:08:19] >> Evaluating batch 5 [10:08:20] >> Evaluating batch 6 [10:08:21] >> Evaluating batch 7 [10:08:22] >> Evaluating batch 8 [10:08:23] >> Evaluating batch 9 [10:08:24] >> Evaluating batch 10 [10:08:25] >> Evaluating batch 11 [10:08:26] >> Evaluating batch 12 [10:08:27] >> Evaluating batch 13 [10:08:28] >> Evaluating batch 14 [10:08:29] >> Evaluating batch 15 [10:08:30] >> Evaluating batch 16 [10:08:31] Epoch: 1 Step: 5500/20099 Evaluation: [10:08:31] Avg Loss Since Last Eval: 2.1600 Val Loss: 2.2059 Validation loss delta: -0.0211 Perplexity: 9.0784 LR: 0.00002940 [10:08:34] >> Checkpoint saved: epoch1_step5500, size: 0.1693 GB [10:08:34] Epoch: 1 Batch: 5500/20099 (27.36%) Loss: 2.135148 LR: 0.00002940 [10:08:36] Epoch: 1 Batch: 5501/20099 (27.37%) Loss: 2.050394 LR: 0.00002940 [10:08:38] Epoch: 1 Batch: 5502/20099 (27.37%) Loss: 2.382435 LR: 0.00002940 [10:08:39] Epoch: 1 Batch: 5503/20099 (27.38%) Loss: 2.240859 LR: 0.00002940 [10:08:41] Epoch: 1 Batch: 5504/20099 (27.38%) Loss: 2.169502 LR: 0.00002940 [10:08:43] Epoch: 1 Batch: 5505/20099 (27.39%) Loss: 2.138979 LR: 0.00002939 [10:08:45] Epoch: 1 Batch: 5506/20099 (27.39%) Loss: 2.000754 LR: 0.00002939
[10:08:47] Epoch: 1 Batch: 5507/20099 (27.40%) Loss: 2.019054 LR: 0.00002939 [10:08:48] Epoch: 1 Batch: 5508/20099 (27.40%) Loss: 1.561484 LR: 0.00002939 [10:08:50] Epoch: 1 Batch: 5509/20099 (27.41%) Loss: 2.237347 LR: 0.00002939 [10:08:52] Epoch: 1 Batch: 5510/20099 (27.41%) Loss: 2.209234 LR: 0.00002939 [10:08:54] Epoch: 1 Batch: 5511/20099 (27.42%) Loss: 1.989541 LR: 0.00002939 [10:08:55] Epoch: 1 Batch: 5512/20099 (27.42%) Loss: 2.012194 LR: 0.00002939 [10:08:57] Epoch: 1 Batch: 5513/20099 (27.43%) Loss: 2.156382 LR: 0.00002939 [10:08:59] Epoch: 1 Batch: 5514/20099 (27.43%) Loss: 2.312996 LR: 0.00002939 [10:09:01] Epoch: 1 Batch: 5515/20099 (27.44%) Loss: 2.292919 LR: 0.00002939 [10:09:03] Epoch: 1 Batch: 5516/20099 (27.44%) Loss: 2.045623 LR: 0.00002939 [10:09:04] Epoch: 1 Batch: 5517/20099 (27.45%) Loss: 2.201769 LR: 0.00002939 [10:09:06] Epoch: 1 Batch: 5518/20099 (27.45%) Loss: 2.080319 LR: 0.00002939 [10:09:08] Epoch: 1 Batch: 5519/20099 (27.46%) Loss: 2.302050 LR: 0.00002938 [10:09:10] Epoch: 1 Batch: 5520/20099 (27.46%) Loss: 2.330819 LR: 0.00002938 [10:09:12] Epoch: 1 Batch: 5521/20099 (27.47%) Loss: 1.991016 LR: 0.00002938 [10:09:13] Epoch: 1 Batch: 5522/20099 (27.47%) Loss: 2.126603 LR: 0.00002938 [10:09:15] Epoch: 1 Batch: 5523/20099 (27.48%) Loss: 1.951139 LR: 0.00002938 [10:09:17] Epoch: 1 Batch: 5524/20099 (27.48%) Loss: 2.101253 LR: 0.00002938 [10:09:19] Epoch: 1 Batch: 5525/20099 (27.49%) Loss: 2.094637 LR: 0.00002938 [10:09:21] Epoch: 1 Batch: 5526/20099 (27.49%) Loss: 2.074740 LR: 0.00002938 [10:09:22] Epoch: 1 Batch: 5527/20099 (27.50%) Loss: 2.380671 LR: 0.00002938 [10:09:24] Epoch: 1 Batch: 5528/20099 (27.50%) Loss: 2.139341 LR: 0.00002938 [10:09:26] Epoch: 1 Batch: 5529/20099 (27.51%) Loss: 1.942733 LR: 0.00002938 [10:09:28] Epoch: 1 Batch: 5530/20099 (27.51%) Loss: 2.229770 LR: 0.00002938 [10:09:29] Epoch: 1 Batch: 5531/20099 (27.52%) Loss: 1.756896 LR: 0.00002938 [10:09:31] Epoch: 1 Batch: 5532/20099 (27.52%) Loss: 2.112085 LR: 
0.00002938 [10:09:33] Epoch: 1 Batch: 5533/20099 (27.53%) Loss: 2.402185 LR: 0.00002937 [10:09:35] Epoch: 1 Batch: 5534/20099 (27.53%) Loss: 2.091736 LR: 0.00002937 [10:09:36] Epoch: 1 Batch: 5535/20099 (27.54%) Loss: 2.148349 LR: 0.00002937 [10:09:38] Epoch: 1 Batch: 5536/20099 (27.54%) Loss: 2.043629 LR: 0.00002937 [10:09:40] Epoch: 1 Batch: 5537/20099 (27.55%) Loss: 2.483744 LR: 0.00002937 [10:09:42] Epoch: 1 Batch: 5538/20099 (27.55%) Loss: 1.855629 LR: 0.00002937 [10:09:43] Epoch: 1 Batch: 5539/20099 (27.56%) Loss: 2.267557 LR: 0.00002937 [10:09:45] Epoch: 1 Batch: 5540/20099 (27.56%) Loss: 2.241838 LR: 0.00002937 [10:09:47] Epoch: 1 Batch: 5541/20099 (27.57%) Loss: 2.420246 LR: 0.00002937 [10:09:49] Epoch: 1 Batch: 5542/20099 (27.57%) Loss: 2.143618 LR: 0.00002937 [10:09:50] Epoch: 1 Batch: 5543/20099 (27.58%) Loss: 1.988189 LR: 0.00002937 [10:09:52] Epoch: 1 Batch: 5544/20099 (27.58%) Loss: 2.405669 LR: 0.00002937 [10:09:54] Epoch: 1 Batch: 5545/20099 (27.59%) Loss: 2.077928 LR: 0.00002937 [10:09:56] Epoch: 1 Batch: 5546/20099 (27.59%) Loss: 2.467732 LR: 0.00002937 [10:09:57] Epoch: 1 Batch: 5547/20099 (27.60%) Loss: 2.068269 LR: 0.00002936 [10:09:59] Epoch: 1 Batch: 5548/20099 (27.60%) Loss: 2.303580 LR: 0.00002936 [10:10:01] Epoch: 1 Batch: 5549/20099 (27.61%) Loss: 1.947271 LR: 0.00002936 [10:10:03] Epoch: 1 Batch: 5550/20099 (27.61%) Loss: 2.277041 LR: 0.00002936 [10:10:05] Epoch: 1 Batch: 5551/20099 (27.62%) Loss: 2.111787 LR: 0.00002936 [10:10:06] Epoch: 1 Batch: 5552/20099 (27.62%) Loss: 2.299436 LR: 0.00002936 [10:10:08] Epoch: 1 Batch: 5553/20099 (27.63%) Loss: 2.185243 LR: 0.00002936 [10:10:10] Epoch: 1 Batch: 5554/20099 (27.63%) Loss: 1.949589 LR: 0.00002936 [10:10:12] Epoch: 1 Batch: 5555/20099 (27.64%) Loss: 2.251114 LR: 0.00002936 [10:10:13] Epoch: 1 Batch: 5556/20099 (27.64%) Loss: 1.953321 LR: 0.00002936 [10:10:15] Epoch: 1 Batch: 5557/20099 (27.65%) Loss: 1.925995 LR: 0.00002936 [10:10:17] Epoch: 1 Batch: 5558/20099 (27.65%) Loss: 1.872311 
LR: 0.00002936 [10:10:19] Epoch: 1 Batch: 5559/20099 (27.66%) Loss: 2.090401 LR: 0.00002936 [10:10:21] Epoch: 1 Batch: 5560/20099 (27.66%) Loss: 2.046410 LR: 0.00002936 [10:10:22] Epoch: 1 Batch: 5561/20099 (27.67%) Loss: 2.167044 LR: 0.00002935 [10:10:24] Epoch: 1 Batch: 5562/20099 (27.67%) Loss: 2.346011 LR: 0.00002935 [10:10:26] Epoch: 1 Batch: 5563/20099 (27.68%) Loss: 2.250489 LR: 0.00002935 [10:10:28] Epoch: 1 Batch: 5564/20099 (27.68%) Loss: 2.084104 LR: 0.00002935 [10:10:29] Epoch: 1 Batch: 5565/20099 (27.69%) Loss: 1.955862 LR: 0.00002935 [10:10:31] Epoch: 1 Batch: 5566/20099 (27.69%) Loss: 2.513808 LR: 0.00002935 [10:10:33] Epoch: 1 Batch: 5567/20099 (27.70%) Loss: 2.211601 LR: 0.00002935 [10:10:35] Epoch: 1 Batch: 5568/20099 (27.70%) Loss: 2.165299 LR: 0.00002935 [10:10:36] Epoch: 1 Batch: 5569/20099 (27.71%) Loss: 2.359883 LR: 0.00002935 [10:10:38] Epoch: 1 Batch: 5570/20099 (27.71%) Loss: 1.889619 LR: 0.00002935 [10:10:40] Epoch: 1 Batch: 5571/20099 (27.72%) Loss: 2.032293 LR: 0.00002935 [10:10:42] Epoch: 1 Batch: 5572/20099 (27.72%) Loss: 1.867296 LR: 0.00002935 [10:10:44] Epoch: 1 Batch: 5573/20099 (27.73%) Loss: 2.366440 LR: 0.00002935 [10:10:45] Epoch: 1 Batch: 5574/20099 (27.73%) Loss: 1.978371 LR: 0.00002935 [10:10:47] Epoch: 1 Batch: 5575/20099 (27.74%) Loss: 2.119112 LR: 0.00002934 [10:10:49] Epoch: 1 Batch: 5576/20099 (27.74%) Loss: 2.201457 LR: 0.00002934 [10:10:51] Epoch: 1 Batch: 5577/20099 (27.75%) Loss: 2.335431 LR: 0.00002934 [10:10:53] Epoch: 1 Batch: 5578/20099 (27.75%) Loss: 2.025382 LR: 0.00002934 [10:10:54] Epoch: 1 Batch: 5579/20099 (27.76%) Loss: 2.122647 LR: 0.00002934 [10:10:56] Epoch: 1 Batch: 5580/20099 (27.76%) Loss: 1.848374 LR: 0.00002934 [10:10:58] Epoch: 1 Batch: 5581/20099 (27.77%) Loss: 2.231189 LR: 0.00002934 [10:11:00] Epoch: 1 Batch: 5582/20099 (27.77%) Loss: 2.164372 LR: 0.00002934 [10:11:01] Epoch: 1 Batch: 5583/20099 (27.78%) Loss: 2.229421 LR: 0.00002934 [10:11:03] Epoch: 1 Batch: 5584/20099 (27.78%) Loss: 
2.617900 LR: 0.00002934 [10:11:05] Epoch: 1 Batch: 5585/20099 (27.79%) Loss: 2.127577 LR: 0.00002934 [10:11:07] Epoch: 1 Batch: 5586/20099 (27.79%) Loss: 2.381475 LR: 0.00002934 [10:11:09] Epoch: 1 Batch: 5587/20099 (27.80%) Loss: 2.089231 LR: 0.00002934 [10:11:10] Epoch: 1 Batch: 5588/20099 (27.80%) Loss: 2.022465 LR: 0.00002934 [10:11:12] Epoch: 1 Batch: 5589/20099 (27.81%) Loss: 1.922452 LR: 0.00002933 [10:11:14] Epoch: 1 Batch: 5590/20099 (27.81%) Loss: 1.970361 LR: 0.00002933 [10:11:16] Epoch: 1 Batch: 5591/20099 (27.82%) Loss: 2.031234 LR: 0.00002933 [10:11:17] Epoch: 1 Batch: 5592/20099 (27.82%) Loss: 2.230311 LR: 0.00002933 [10:11:19] Epoch: 1 Batch: 5593/20099 (27.83%) Loss: 1.999375 LR: 0.00002933 [10:11:21] Epoch: 1 Batch: 5594/20099 (27.83%) Loss: 1.812546 LR: 0.00002933 [10:11:23] Epoch: 1 Batch: 5595/20099 (27.84%) Loss: 2.361866 LR: 0.00002933 [10:11:25] Epoch: 1 Batch: 5596/20099 (27.84%) Loss: 2.086416 LR: 0.00002932 [10:11:26] Epoch: 1 Batch: 5597/20099 (27.85%) Loss: 2.132535 LR: 0.00002932 [10:11:28] Epoch: 1 Batch: 5598/20099 (27.85%) Loss: 2.180308 LR: 0.00002932 [10:11:30] Epoch: 1 Batch: 5599/20099 (27.86%) Loss: 2.239886 LR: 0.00002932 [10:11:35] >> Cleaned up old temp checkpoint: epoch1_step3600 [10:11:35] >> Temp checkpoint saved: epoch1_step5600, size: 0.1693 GB [10:11:35] Epoch: 1 Batch: 5600/20099 (27.86%) Loss: 1.894940 LR: 0.00002932 [10:11:37] Epoch: 1 Batch: 5601/20099 (27.87%) Loss: 2.414137 LR: 0.00002932 [10:11:39] Epoch: 1 Batch: 5602/20099 (27.87%) Loss: 2.052530 LR: 0.00002932 [10:11:40] Epoch: 1 Batch: 5603/20099 (27.88%) Loss: 2.082198 LR: 0.00002932 [10:11:42] Epoch: 1 Batch: 5604/20099 (27.88%) Loss: 1.907498 LR: 0.00002932 [10:11:44] Epoch: 1 Batch: 5605/20099 (27.89%) Loss: 1.994053 LR: 0.00002932 [10:11:46] Epoch: 1 Batch: 5606/20099 (27.89%) Loss: 2.284043 LR: 0.00002932 [10:11:47] Epoch: 1 Batch: 5607/20099 (27.90%) Loss: 1.969285 LR: 0.00002932 [10:11:49] Epoch: 1 Batch: 5608/20099 (27.90%) Loss: 2.355072 LR: 
0.00002932 [10:11:51] Epoch: 1 Batch: 5609/20099 (27.91%) Loss: 2.029380 LR: 0.00002932 [10:11:53] Epoch: 1 Batch: 5610/20099 (27.91%) Loss: 2.153576 LR: 0.00002931 [10:11:55] Epoch: 1 Batch: 5611/20099 (27.92%) Loss: 2.069564 LR: 0.00002931 [10:11:56] Epoch: 1 Batch: 5612/20099 (27.92%) Loss: 2.014031 LR: 0.00002931 [10:11:58] Epoch: 1 Batch: 5613/20099 (27.93%) Loss: 1.939979 LR: 0.00002931 [10:12:00] Epoch: 1 Batch: 5614/20099 (27.93%) Loss: 2.041470 LR: 0.00002931 [10:12:02] Epoch: 1 Batch: 5615/20099 (27.94%) Loss: 2.142931 LR: 0.00002931 [10:12:03] Epoch: 1 Batch: 5616/20099 (27.94%) Loss: 2.008445 LR: 0.00002931 [10:12:05] Epoch: 1 Batch: 5617/20099 (27.95%) Loss: 2.145051 LR: 0.00002931 [10:12:07] Epoch: 1 Batch: 5618/20099 (27.95%) Loss: 2.155002 LR: 0.00002931 [10:12:09] Epoch: 1 Batch: 5619/20099 (27.96%) Loss: 1.986511 LR: 0.00002931 [10:12:11] Epoch: 1 Batch: 5620/20099 (27.96%) Loss: 2.127598 LR: 0.00002931 [10:12:12] Epoch: 1 Batch: 5621/20099 (27.97%) Loss: 1.629275 LR: 0.00002931 [10:12:14] Epoch: 1 Batch: 5622/20099 (27.97%) Loss: 2.163292 LR: 0.00002931 [10:12:16] Epoch: 1 Batch: 5623/20099 (27.98%) Loss: 2.116549 LR: 0.00002931 [10:12:18] Epoch: 1 Batch: 5624/20099 (27.98%) Loss: 2.386443 LR: 0.00002930 [10:12:20] Epoch: 1 Batch: 5625/20099 (27.99%) Loss: 2.358570 LR: 0.00002930 [10:12:21] Epoch: 1 Batch: 5626/20099 (27.99%) Loss: 2.030927 LR: 0.00002930 [10:12:23] Epoch: 1 Batch: 5627/20099 (28.00%) Loss: 2.140235 LR: 0.00002930 [10:12:25] Epoch: 1 Batch: 5628/20099 (28.00%) Loss: 2.399337 LR: 0.00002930 [10:12:27] Epoch: 1 Batch: 5629/20099 (28.01%) Loss: 2.120539 LR: 0.00002930 [10:12:28] Epoch: 1 Batch: 5630/20099 (28.01%) Loss: 2.024320 LR: 0.00002930 [10:12:30] Epoch: 1 Batch: 5631/20099 (28.02%) Loss: 2.009178 LR: 0.00002930 [10:12:32] Epoch: 1 Batch: 5632/20099 (28.02%) Loss: 2.371442 LR: 0.00002930 [10:12:34] Epoch: 1 Batch: 5633/20099 (28.03%) Loss: 2.062345 LR: 0.00002930 [10:12:35] Epoch: 1 Batch: 5634/20099 (28.03%) Loss: 2.068314 
LR: 0.00002930 [10:12:37] Epoch: 1 Batch: 5635/20099 (28.04%) Loss: 2.095057 LR: 0.00002930 [10:12:39] Epoch: 1 Batch: 5636/20099 (28.04%) Loss: 2.359134 LR: 0.00002930 [10:12:41] Epoch: 1 Batch: 5637/20099 (28.05%) Loss: 2.065557 LR: 0.00002930 [10:12:43] Epoch: 1 Batch: 5638/20099 (28.05%) Loss: 2.151374 LR: 0.00002929 [10:12:44] Epoch: 1 Batch: 5639/20099 (28.06%) Loss: 2.017007 LR: 0.00002929 [10:12:46] Epoch: 1 Batch: 5640/20099 (28.06%) Loss: 2.320213 LR: 0.00002929 [10:12:48] Epoch: 1 Batch: 5641/20099 (28.07%) Loss: 2.116705 LR: 0.00002929 [10:12:50] Epoch: 1 Batch: 5642/20099 (28.07%) Loss: 1.983116 LR: 0.00002929 [10:12:51] Epoch: 1 Batch: 5643/20099 (28.08%) Loss: 1.936975 LR: 0.00002929 [10:12:53] Epoch: 1 Batch: 5644/20099 (28.08%) Loss: 1.957347 LR: 0.00002929 [10:12:55] Epoch: 1 Batch: 5645/20099 (28.09%) Loss: 2.084813 LR: 0.00002929 [10:12:57] Epoch: 1 Batch: 5646/20099 (28.09%) Loss: 2.364551 LR: 0.00002929 [10:12:58] Epoch: 1 Batch: 5647/20099 (28.10%) Loss: 2.107544 LR: 0.00002929 [10:13:00] Epoch: 1 Batch: 5648/20099 (28.10%) Loss: 1.762895 LR: 0.00002929 [10:13:02] Epoch: 1 Batch: 5649/20099 (28.11%) Loss: 1.863264 LR: 0.00002929 [10:13:04] Epoch: 1 Batch: 5650/20099 (28.11%) Loss: 2.132744 LR: 0.00002929 [10:13:05] Epoch: 1 Batch: 5651/20099 (28.12%) Loss: 2.136455 LR: 0.00002929 [10:13:07] Epoch: 1 Batch: 5652/20099 (28.12%) Loss: 1.998704 LR: 0.00002928 [10:13:09] Epoch: 1 Batch: 5653/20099 (28.13%) Loss: 2.460695 LR: 0.00002928 [10:13:11] Epoch: 1 Batch: 5654/20099 (28.13%) Loss: 2.044469 LR: 0.00002928 [10:13:13] Epoch: 1 Batch: 5655/20099 (28.14%) Loss: 2.005651 LR: 0.00002928 [10:13:14] Epoch: 1 Batch: 5656/20099 (28.14%) Loss: 1.927574 LR: 0.00002928 [10:13:16] Epoch: 1 Batch: 5657/20099 (28.15%) Loss: 2.589249 LR: 0.00002928 [10:13:18] Epoch: 1 Batch: 5658/20099 (28.15%) Loss: 2.029427 LR: 0.00002928 [10:13:20] Epoch: 1 Batch: 5659/20099 (28.16%) Loss: 2.304166 LR: 0.00002928 [10:13:21] Epoch: 1 Batch: 5660/20099 (28.16%) Loss: 
2.391147 LR: 0.00002928 [10:13:23] Epoch: 1 Batch: 5661/20099 (28.17%) Loss: 1.880856 LR: 0.00002928 [10:13:25] Epoch: 1 Batch: 5662/20099 (28.17%) Loss: 2.195905 LR: 0.00002928 [10:13:27] Epoch: 1 Batch: 5663/20099 (28.18%) Loss: 2.084331 LR: 0.00002928 [10:13:28] Epoch: 1 Batch: 5664/20099 (28.18%) Loss: 2.301641 LR: 0.00002928 [10:13:30] Epoch: 1 Batch: 5665/20099 (28.19%) Loss: 2.381791 LR: 0.00002928 [10:13:32] Epoch: 1 Batch: 5666/20099 (28.19%) Loss: 2.360234 LR: 0.00002927 [10:13:34] Epoch: 1 Batch: 5667/20099 (28.20%) Loss: 1.816981 LR: 0.00002927 [10:13:36] Epoch: 1 Batch: 5668/20099 (28.20%) Loss: 2.135162 LR: 0.00002927 [10:13:37] Epoch: 1 Batch: 5669/20099 (28.21%) Loss: 2.242070 LR: 0.00002927 [10:13:39] Epoch: 1 Batch: 5670/20099 (28.21%) Loss: 2.489749 LR: 0.00002927 [10:13:41] Epoch: 1 Batch: 5671/20099 (28.22%) Loss: 1.994850 LR: 0.00002927 [10:13:43] Epoch: 1 Batch: 5672/20099 (28.22%) Loss: 2.039171 LR: 0.00002927 [10:13:45] Epoch: 1 Batch: 5673/20099 (28.23%) Loss: 2.356518 LR: 0.00002926 [10:13:46] Epoch: 1 Batch: 5674/20099 (28.23%) Loss: 1.880168 LR: 0.00002926 [10:13:48] Epoch: 1 Batch: 5675/20099 (28.24%) Loss: 2.207188 LR: 0.00002926 [10:13:50] Epoch: 1 Batch: 5676/20099 (28.24%) Loss: 2.171134 LR: 0.00002926 [10:13:52] Epoch: 1 Batch: 5677/20099 (28.25%) Loss: 2.059070 LR: 0.00002926 [10:13:53] Epoch: 1 Batch: 5678/20099 (28.25%) Loss: 1.854773 LR: 0.00002926 [10:13:55] Epoch: 1 Batch: 5679/20099 (28.26%) Loss: 2.116316 LR: 0.00002926 [10:13:57] Epoch: 1 Batch: 5680/20099 (28.26%) Loss: 1.921187 LR: 0.00002926 [10:13:59] Epoch: 1 Batch: 5681/20099 (28.27%) Loss: 2.105031 LR: 0.00002926 [10:14:01] Epoch: 1 Batch: 5682/20099 (28.27%) Loss: 2.381476 LR: 0.00002926 [10:14:02] Epoch: 1 Batch: 5683/20099 (28.28%) Loss: 1.901219 LR: 0.00002926 [10:14:04] Epoch: 1 Batch: 5684/20099 (28.28%) Loss: 1.869229 LR: 0.00002926 [10:14:06] Epoch: 1 Batch: 5685/20099 (28.28%) Loss: 2.090598 LR: 0.00002926 [10:14:08] Epoch: 1 Batch: 5686/20099 (28.29%) 
Loss: 2.170013 LR: 0.00002926 [10:14:09] Epoch: 1 Batch: 5687/20099 (28.29%) Loss: 2.098230 LR: 0.00002925 [10:14:11] Epoch: 1 Batch: 5688/20099 (28.30%) Loss: 2.066526 LR: 0.00002925 [10:14:13] Epoch: 1 Batch: 5689/20099 (28.30%) Loss: 1.836978 LR: 0.00002925 [10:14:15] Epoch: 1 Batch: 5690/20099 (28.31%) Loss: 2.182305 LR: 0.00002925 [10:14:16] Epoch: 1 Batch: 5691/20099 (28.31%) Loss: 1.683793 LR: 0.00002925 [10:14:18] Epoch: 1 Batch: 5692/20099 (28.32%) Loss: 2.207219 LR: 0.00002925 [10:14:20] Epoch: 1 Batch: 5693/20099 (28.32%) Loss: 2.218802 LR: 0.00002925 [10:14:22] Epoch: 1 Batch: 5694/20099 (28.33%) Loss: 1.837239 LR: 0.00002925 [10:14:24] Epoch: 1 Batch: 5695/20099 (28.33%) Loss: 2.094657 LR: 0.00002925 [10:14:25] Epoch: 1 Batch: 5696/20099 (28.34%) Loss: 2.188740 LR: 0.00002925 [10:14:27] Epoch: 1 Batch: 5697/20099 (28.34%) Loss: 2.120753 LR: 0.00002925 [10:14:29] Epoch: 1 Batch: 5698/20099 (28.35%) Loss: 2.462973 LR: 0.00002925 [10:14:31] Epoch: 1 Batch: 5699/20099 (28.35%) Loss: 2.105696 LR: 0.00002925 [10:14:32] Epoch: 1 Batch: 5700/20099 (28.36%) Loss: 2.024656 LR: 0.00002925 [10:14:34] Epoch: 1 Batch: 5701/20099 (28.36%) Loss: 2.146820 LR: 0.00002924 [10:14:36] Epoch: 1 Batch: 5702/20099 (28.37%) Loss: 1.938148 LR: 0.00002924 [10:14:38] Epoch: 1 Batch: 5703/20099 (28.37%) Loss: 2.097735 LR: 0.00002924 [10:14:39] Epoch: 1 Batch: 5704/20099 (28.38%) Loss: 2.119946 LR: 0.00002924 [10:14:41] Epoch: 1 Batch: 5705/20099 (28.38%) Loss: 2.287447 LR: 0.00002924 [10:14:43] Epoch: 1 Batch: 5706/20099 (28.39%) Loss: 2.182202 LR: 0.00002924 [10:14:45] Epoch: 1 Batch: 5707/20099 (28.39%) Loss: 2.140720 LR: 0.00002924 [10:14:47] Epoch: 1 Batch: 5708/20099 (28.40%) Loss: 1.705332 LR: 0.00002924 [10:14:48] Epoch: 1 Batch: 5709/20099 (28.40%) Loss: 2.296562 LR: 0.00002924 [10:14:50] Epoch: 1 Batch: 5710/20099 (28.41%) Loss: 2.086496 LR: 0.00002924 [10:14:52] Epoch: 1 Batch: 5711/20099 (28.41%) Loss: 1.921529 LR: 0.00002924 [10:14:54] Epoch: 1 Batch: 5712/20099 
(28.42%) Loss: 2.034347 LR: 0.00002924 [10:14:55] Epoch: 1 Batch: 5713/20099 (28.42%) Loss: 2.295450 LR: 0.00002924 [10:14:57] Epoch: 1 Batch: 5714/20099 (28.43%) Loss: 1.983602 LR: 0.00002924 [10:14:59] Epoch: 1 Batch: 5715/20099 (28.43%) Loss: 2.069678 LR: 0.00002923 [10:15:01] Epoch: 1 Batch: 5716/20099 (28.44%) Loss: 2.047155 LR: 0.00002923 [10:15:02] Epoch: 1 Batch: 5717/20099 (28.44%) Loss: 1.722468 LR: 0.00002923 [10:15:04] Epoch: 1 Batch: 5718/20099 (28.45%) Loss: 2.402497 LR: 0.00002923 [10:15:06] Epoch: 1 Batch: 5719/20099 (28.45%) Loss: 2.098434 LR: 0.00002923 [10:15:08] Epoch: 1 Batch: 5720/20099 (28.46%) Loss: 2.099648 LR: 0.00002923 [10:15:09] Epoch: 1 Batch: 5721/20099 (28.46%) Loss: 2.055037 LR: 0.00002923 [10:15:11] Epoch: 1 Batch: 5722/20099 (28.47%) Loss: 2.094568 LR: 0.00002922 [10:15:13] Epoch: 1 Batch: 5723/20099 (28.47%) Loss: 2.250470 LR: 0.00002922 [10:15:15] Epoch: 1 Batch: 5724/20099 (28.48%) Loss: 1.822640 LR: 0.00002922 [10:15:16] Epoch: 1 Batch: 5725/20099 (28.48%) Loss: 2.073017 LR: 0.00002922 [10:15:18] Epoch: 1 Batch: 5726/20099 (28.49%) Loss: 1.950014 LR: 0.00002922 [10:15:20] Epoch: 1 Batch: 5727/20099 (28.49%) Loss: 2.179752 LR: 0.00002922 [10:15:22] Epoch: 1 Batch: 5728/20099 (28.50%) Loss: 2.176908 LR: 0.00002922 [10:15:24] Epoch: 1 Batch: 5729/20099 (28.50%) Loss: 2.224230 LR: 0.00002922 [10:15:25] Epoch: 1 Batch: 5730/20099 (28.51%) Loss: 2.104697 LR: 0.00002922 [10:15:27] Epoch: 1 Batch: 5731/20099 (28.51%) Loss: 2.072458 LR: 0.00002922 [10:15:29] Epoch: 1 Batch: 5732/20099 (28.52%) Loss: 2.285072 LR: 0.00002922 [10:15:31] Epoch: 1 Batch: 5733/20099 (28.52%) Loss: 2.084134 LR: 0.00002922 [10:15:32] Epoch: 1 Batch: 5734/20099 (28.53%) Loss: 2.014092 LR: 0.00002922 [10:15:34] Epoch: 1 Batch: 5735/20099 (28.53%) Loss: 2.317812 LR: 0.00002922 [10:15:36] Epoch: 1 Batch: 5736/20099 (28.54%) Loss: 2.211993 LR: 0.00002921 [10:15:38] Epoch: 1 Batch: 5737/20099 (28.54%) Loss: 2.171583 LR: 0.00002921 [10:15:40] Epoch: 1 Batch: 
5738/20099 (28.55%) Loss: 1.997504 LR: 0.00002921 [10:15:41] Epoch: 1 Batch: 5739/20099 (28.55%) Loss: 2.006678 LR: 0.00002921 [10:15:43] Epoch: 1 Batch: 5740/20099 (28.56%) Loss: 2.001716 LR: 0.00002921 [10:15:45] Epoch: 1 Batch: 5741/20099 (28.56%) Loss: 1.693074 LR: 0.00002921 [10:15:47] Epoch: 1 Batch: 5742/20099 (28.57%) Loss: 2.229803 LR: 0.00002921 [10:15:48] Epoch: 1 Batch: 5743/20099 (28.57%) Loss: 1.969671 LR: 0.00002921 [10:15:50] Epoch: 1 Batch: 5744/20099 (28.58%) Loss: 2.116443 LR: 0.00002921 [10:15:52] Epoch: 1 Batch: 5745/20099 (28.58%) Loss: 2.381115 LR: 0.00002921 [10:15:54] Epoch: 1 Batch: 5746/20099 (28.59%) Loss: 2.061568 LR: 0.00002921 [10:15:55] Epoch: 1 Batch: 5747/20099 (28.59%) Loss: 2.302366 LR: 0.00002921 [10:15:57] Epoch: 1 Batch: 5748/20099 (28.60%) Loss: 2.256037 LR: 0.00002921 [10:15:59] Epoch: 1 Batch: 5749/20099 (28.60%) Loss: 2.017595 LR: 0.00002921 [10:16:01] Epoch: 1 Batch: 5750/20099 (28.61%) Loss: 2.151429 LR: 0.00002920 [10:16:03] Epoch: 1 Batch: 5751/20099 (28.61%) Loss: 2.444093 LR: 0.00002920 [10:16:04] Epoch: 1 Batch: 5752/20099 (28.62%) Loss: 2.104065 LR: 0.00002920 [10:16:06] Epoch: 1 Batch: 5753/20099 (28.62%) Loss: 2.111111 LR: 0.00002920 [10:16:08] Epoch: 1 Batch: 5754/20099 (28.63%) Loss: 2.157824 LR: 0.00002920 [10:16:10] Epoch: 1 Batch: 5755/20099 (28.63%) Loss: 2.135009 LR: 0.00002920 [10:16:11] Epoch: 1 Batch: 5756/20099 (28.64%) Loss: 2.191044 LR: 0.00002920 [10:16:13] Epoch: 1 Batch: 5757/20099 (28.64%) Loss: 1.912259 LR: 0.00002920 [10:16:15] Epoch: 1 Batch: 5758/20099 (28.65%) Loss: 2.119370 LR: 0.00002920 [10:16:17] Epoch: 1 Batch: 5759/20099 (28.65%) Loss: 1.945659 LR: 0.00002920 [10:16:19] Epoch: 1 Batch: 5760/20099 (28.66%) Loss: 2.149083 LR: 0.00002920 [10:16:20] Epoch: 1 Batch: 5761/20099 (28.66%) Loss: 2.184052 LR: 0.00002920 [10:16:22] Epoch: 1 Batch: 5762/20099 (28.67%) Loss: 2.345678 LR: 0.00002920 [10:16:24] Epoch: 1 Batch: 5763/20099 (28.67%) Loss: 1.962343 LR: 0.00002920 [10:16:26] Epoch: 1 
Batch: 5764/20099 (28.68%) Loss: 2.100049 LR: 0.00002919 [10:16:27] Epoch: 1 Batch: 5765/20099 (28.68%) Loss: 2.187373 LR: 0.00002919 [10:16:29] Epoch: 1 Batch: 5766/20099 (28.69%) Loss: 1.905284 LR: 0.00002919 [10:16:31] Epoch: 1 Batch: 5767/20099 (28.69%) Loss: 2.311063 LR: 0.00002919 [10:16:33] Epoch: 1 Batch: 5768/20099 (28.70%) Loss: 1.823513 LR: 0.00002919 [10:16:35] Epoch: 1 Batch: 5769/20099 (28.70%) Loss: 2.005320 LR: 0.00002919 [10:16:36] Epoch: 1 Batch: 5770/20099 (28.71%) Loss: 2.136724 LR: 0.00002919 [10:16:38] Epoch: 1 Batch: 5771/20099 (28.71%) Loss: 1.905490 LR: 0.00002918 [10:16:40] Epoch: 1 Batch: 5772/20099 (28.72%) Loss: 2.270180 LR: 0.00002918 [10:16:42] Epoch: 1 Batch: 5773/20099 (28.72%) Loss: 2.004126 LR: 0.00002918 [10:16:43] Epoch: 1 Batch: 5774/20099 (28.73%) Loss: 2.245483 LR: 0.00002918 [10:16:45] Epoch: 1 Batch: 5775/20099 (28.73%) Loss: 1.817795 LR: 0.00002918 [10:16:47] Epoch: 1 Batch: 5776/20099 (28.74%) Loss: 2.132286 LR: 0.00002918 [10:16:49] Epoch: 1 Batch: 5777/20099 (28.74%) Loss: 2.200933 LR: 0.00002918 [10:16:50] Epoch: 1 Batch: 5778/20099 (28.75%) Loss: 2.035585 LR: 0.00002918 [10:16:52] Epoch: 1 Batch: 5779/20099 (28.75%) Loss: 2.103627 LR: 0.00002918 [10:16:54] Epoch: 1 Batch: 5780/20099 (28.76%) Loss: 2.333654 LR: 0.00002918 [10:16:56] Epoch: 1 Batch: 5781/20099 (28.76%) Loss: 1.916925 LR: 0.00002918 [10:16:58] Epoch: 1 Batch: 5782/20099 (28.77%) Loss: 2.065562 LR: 0.00002918 [10:16:59] Epoch: 1 Batch: 5783/20099 (28.77%) Loss: 2.384601 LR: 0.00002918 [10:17:01] Epoch: 1 Batch: 5784/20099 (28.78%) Loss: 2.215942 LR: 0.00002918 [10:17:03] Epoch: 1 Batch: 5785/20099 (28.78%) Loss: 2.281406 LR: 0.00002917 [10:17:05] Epoch: 1 Batch: 5786/20099 (28.79%) Loss: 2.056295 LR: 0.00002917 [10:17:06] Epoch: 1 Batch: 5787/20099 (28.79%) Loss: 2.142128 LR: 0.00002917 [10:17:08] Epoch: 1 Batch: 5788/20099 (28.80%) Loss: 2.244149 LR: 0.00002917 [10:17:10] Epoch: 1 Batch: 5789/20099 (28.80%) Loss: 1.840826 LR: 0.00002917 [10:17:12] Epoch: 
1 Batch: 5790/20099 (28.81%) Loss: 2.145611 LR: 0.00002917 [10:17:14] Epoch: 1 Batch: 5791/20099 (28.81%) Loss: 2.051666 LR: 0.00002917 [10:17:15] Epoch: 1 Batch: 5792/20099 (28.82%) Loss: 2.233139 LR: 0.00002917 [10:17:17] Epoch: 1 Batch: 5793/20099 (28.82%) Loss: 2.360253 LR: 0.00002917 [10:17:19] Epoch: 1 Batch: 5794/20099 (28.83%) Loss: 2.025477 LR: 0.00002917 [10:17:21] Epoch: 1 Batch: 5795/20099 (28.83%) Loss: 2.015267 LR: 0.00002917 [10:17:22] Epoch: 1 Batch: 5796/20099 (28.84%) Loss: 2.010987 LR: 0.00002917 [10:17:24] Epoch: 1 Batch: 5797/20099 (28.84%) Loss: 2.410330 LR: 0.00002917 [10:17:26] Epoch: 1 Batch: 5798/20099 (28.85%) Loss: 2.023354 LR: 0.00002917 [10:17:28] Epoch: 1 Batch: 5799/20099 (28.85%) Loss: 2.053431 LR: 0.00002916 [10:17:33] >> Cleaned up old temp checkpoint: epoch1_step3800 [10:17:33] >> Temp checkpoint saved: epoch1_step5800, size: 0.1693 GB [10:17:33] Epoch: 1 Batch: 5800/20099 (28.86%) Loss: 2.252732 LR: 0.00002916 [10:17:35] Epoch: 1 Batch: 5801/20099 (28.86%) Loss: 1.946936 LR: 0.00002916 [10:17:37] Epoch: 1 Batch: 5802/20099 (28.87%) Loss: 2.212658 LR: 0.00002916 [10:17:38] Epoch: 1 Batch: 5803/20099 (28.87%) Loss: 2.017823 LR: 0.00002916 [10:17:40] Epoch: 1 Batch: 5804/20099 (28.88%) Loss: 1.862233 LR: 0.00002916 [10:17:42] Epoch: 1 Batch: 5805/20099 (28.88%) Loss: 2.395719 LR: 0.00002916 [10:17:44] Epoch: 1 Batch: 5806/20099 (28.89%) Loss: 2.251369 LR: 0.00002915 [10:17:45] Epoch: 1 Batch: 5807/20099 (28.89%) Loss: 2.091102 LR: 0.00002915 [10:17:47] Epoch: 1 Batch: 5808/20099 (28.90%) Loss: 1.773430 LR: 0.00002915 [10:17:49] Epoch: 1 Batch: 5809/20099 (28.90%) Loss: 2.162878 LR: 0.00002915 [10:17:51] Epoch: 1 Batch: 5810/20099 (28.91%) Loss: 2.302252 LR: 0.00002915 [10:17:52] Epoch: 1 Batch: 5811/20099 (28.91%) Loss: 2.141431 LR: 0.00002915 [10:17:54] Epoch: 1 Batch: 5812/20099 (28.92%) Loss: 2.142077 LR: 0.00002915 [10:17:56] Epoch: 1 Batch: 5813/20099 (28.92%) Loss: 1.947053 LR: 0.00002915 [10:17:58] Epoch: 1 Batch: 5814/20099 
(28.93%) Loss: 1.919007 LR: 0.00002915 [10:18:00] Epoch: 1 Batch: 5815/20099 (28.93%) Loss: 2.215216 LR: 0.00002915 [10:18:01] Epoch: 1 Batch: 5816/20099 (28.94%) Loss: 1.954203 LR: 0.00002915 [10:18:03] Epoch: 1 Batch: 5817/20099 (28.94%) Loss: 2.168072 LR: 0.00002915 [10:18:05] Epoch: 1 Batch: 5818/20099 (28.95%) Loss: 1.922810 LR: 0.00002915 [10:18:07] Epoch: 1 Batch: 5819/20099 (28.95%) Loss: 2.528810 LR: 0.00002915 [10:18:08] Epoch: 1 Batch: 5820/20099 (28.96%) Loss: 2.311135 LR: 0.00002914 [10:18:10] Epoch: 1 Batch: 5821/20099 (28.96%) Loss: 2.044465 LR: 0.00002914 [10:18:12] Epoch: 1 Batch: 5822/20099 (28.97%) Loss: 2.218964 LR: 0.00002914 [10:18:14] Epoch: 1 Batch: 5823/20099 (28.97%) Loss: 2.233440 LR: 0.00002914 [10:18:16] Epoch: 1 Batch: 5824/20099 (28.98%) Loss: 1.923606 LR: 0.00002914 [10:18:17] Epoch: 1 Batch: 5825/20099 (28.98%) Loss: 1.863114 LR: 0.00002914 [10:18:19] Epoch: 1 Batch: 5826/20099 (28.99%) Loss: 2.214758 LR: 0.00002914 [10:18:21] Epoch: 1 Batch: 5827/20099 (28.99%) Loss: 2.340299 LR: 0.00002914 [10:18:23] Epoch: 1 Batch: 5828/20099 (29.00%) Loss: 1.868752 LR: 0.00002914 [10:18:25] Epoch: 1 Batch: 5829/20099 (29.00%) Loss: 1.971392 LR: 0.00002914 [10:18:26] Epoch: 1 Batch: 5830/20099 (29.01%) Loss: 2.342893 LR: 0.00002914 [10:18:28] Epoch: 1 Batch: 5831/20099 (29.01%) Loss: 1.891534 LR: 0.00002914 [10:18:30] Epoch: 1 Batch: 5832/20099 (29.02%) Loss: 2.075529 LR: 0.00002914 [10:18:32] Epoch: 1 Batch: 5833/20099 (29.02%) Loss: 2.150151 LR: 0.00002914 [10:18:33] Epoch: 1 Batch: 5834/20099 (29.03%) Loss: 1.898393 LR: 0.00002913 [10:18:35] Epoch: 1 Batch: 5835/20099 (29.03%) Loss: 2.163815 LR: 0.00002913 [10:18:37] Epoch: 1 Batch: 5836/20099 (29.04%) Loss: 1.882176 LR: 0.00002913 [10:18:39] Epoch: 1 Batch: 5837/20099 (29.04%) Loss: 2.119623 LR: 0.00002913 [10:18:40] Epoch: 1 Batch: 5838/20099 (29.05%) Loss: 2.211334 LR: 0.00002913 [10:18:42] Epoch: 1 Batch: 5839/20099 (29.05%) Loss: 1.980462 LR: 0.00002913 [10:18:44] Epoch: 1 Batch: 
5840/20099 (29.06%) Loss: 2.149172 LR: 0.00002913 [10:18:46] Epoch: 1 Batch: 5841/20099 (29.06%) Loss: 2.283388 LR: 0.00002912 [10:18:47] Epoch: 1 Batch: 5842/20099 (29.07%) Loss: 2.013088 LR: 0.00002912 [10:18:49] Epoch: 1 Batch: 5843/20099 (29.07%) Loss: 1.766804 LR: 0.00002912 [10:18:51] Epoch: 1 Batch: 5844/20099 (29.08%) Loss: 1.829578 LR: 0.00002912 [10:18:53] Epoch: 1 Batch: 5845/20099 (29.08%) Loss: 1.938738 LR: 0.00002912 [10:18:54] Epoch: 1 Batch: 5846/20099 (29.09%) Loss: 1.896629 LR: 0.00002912 [10:18:56] Epoch: 1 Batch: 5847/20099 (29.09%) Loss: 2.148381 LR: 0.00002912 [10:18:58] Epoch: 1 Batch: 5848/20099 (29.10%) Loss: 2.214371 LR: 0.00002912 [10:19:00] Epoch: 1 Batch: 5849/20099 (29.10%) Loss: 2.221948 LR: 0.00002912 [10:19:02] Epoch: 1 Batch: 5850/20099 (29.11%) Loss: 2.188726 LR: 0.00002912 [10:19:03] Epoch: 1 Batch: 5851/20099 (29.11%) Loss: 2.305455 LR: 0.00002912 [10:19:05] Epoch: 1 Batch: 5852/20099 (29.12%) Loss: 2.255434 LR: 0.00002912 [10:19:07] Epoch: 1 Batch: 5853/20099 (29.12%) Loss: 2.107186 LR: 0.00002912 [10:19:09] Epoch: 1 Batch: 5854/20099 (29.13%) Loss: 2.080858 LR: 0.00002912 [10:19:10] Epoch: 1 Batch: 5855/20099 (29.13%) Loss: 2.363265 LR: 0.00002911 [10:19:12] Epoch: 1 Batch: 5856/20099 (29.14%) Loss: 2.025679 LR: 0.00002911 [10:19:14] Epoch: 1 Batch: 5857/20099 (29.14%) Loss: 2.018208 LR: 0.00002911 [10:19:16] Epoch: 1 Batch: 5858/20099 (29.15%) Loss: 1.968245 LR: 0.00002911 [10:19:17] Epoch: 1 Batch: 5859/20099 (29.15%) Loss: 1.882569 LR: 0.00002911 [10:19:19] Epoch: 1 Batch: 5860/20099 (29.16%) Loss: 1.901354 LR: 0.00002911 [10:19:21] Epoch: 1 Batch: 5861/20099 (29.16%) Loss: 1.999199 LR: 0.00002911 [10:19:23] Epoch: 1 Batch: 5862/20099 (29.17%) Loss: 2.280003 LR: 0.00002911 [10:19:25] Epoch: 1 Batch: 5863/20099 (29.17%) Loss: 2.045243 LR: 0.00002911 [10:19:26] Epoch: 1 Batch: 5864/20099 (29.18%) Loss: 2.170026 LR: 0.00002911 [10:19:28] Epoch: 1 Batch: 5865/20099 (29.18%) Loss: 2.148227 LR: 0.00002911 [10:19:30] Epoch: 1 
Batch: 5866/20099 (29.19%) Loss: 2.093106 LR: 0.00002911 [10:19:32] Epoch: 1 Batch: 5867/20099 (29.19%) Loss: 2.228858 LR: 0.00002911 [10:19:33] Epoch: 1 Batch: 5868/20099 (29.20%) Loss: 2.187702 LR: 0.00002911 [10:19:36] Epoch: 1 Batch: 5869/20099 (29.20%) Loss: 1.842423 LR: 0.00002910 [10:19:37] Epoch: 1 Batch: 5870/20099 (29.21%) Loss: 2.046632 LR: 0.00002910 [10:19:39] Epoch: 1 Batch: 5871/20099 (29.21%) Loss: 2.246473 LR: 0.00002910 [10:19:41] Epoch: 1 Batch: 5872/20099 (29.22%) Loss: 2.055605 LR: 0.00002910 [10:19:43] Epoch: 1 Batch: 5873/20099 (29.22%) Loss: 1.927635 LR: 0.00002910 [10:19:44] Epoch: 1 Batch: 5874/20099 (29.23%) Loss: 2.328952 LR: 0.00002910 [10:19:46] Epoch: 1 Batch: 5875/20099 (29.23%) Loss: 2.221496 LR: 0.00002910 [10:19:48] Epoch: 1 Batch: 5876/20099 (29.24%) Loss: 1.976502 LR: 0.00002909 [10:19:50] Epoch: 1 Batch: 5877/20099 (29.24%) Loss: 2.202083 LR: 0.00002909 [10:19:52] Epoch: 1 Batch: 5878/20099 (29.25%) Loss: 2.386108 LR: 0.00002909 [10:19:53] Epoch: 1 Batch: 5879/20099 (29.25%) Loss: 2.067835 LR: 0.00002909 [10:19:55] Epoch: 1 Batch: 5880/20099 (29.26%) Loss: 2.139376 LR: 0.00002909 [10:19:57] Epoch: 1 Batch: 5881/20099 (29.26%) Loss: 1.653525 LR: 0.00002909 [10:19:59] Epoch: 1 Batch: 5882/20099 (29.27%) Loss: 2.199294 LR: 0.00002909 [10:20:01] Epoch: 1 Batch: 5883/20099 (29.27%) Loss: 2.005581 LR: 0.00002909 [10:20:02] Epoch: 1 Batch: 5884/20099 (29.28%) Loss: 2.264829 LR: 0.00002909 [10:20:04] Epoch: 1 Batch: 5885/20099 (29.28%) Loss: 2.250932 LR: 0.00002909 [10:20:06] Epoch: 1 Batch: 5886/20099 (29.29%) Loss: 2.299768 LR: 0.00002909 [10:20:08] Epoch: 1 Batch: 5887/20099 (29.29%) Loss: 2.241748 LR: 0.00002909 [10:20:09] Epoch: 1 Batch: 5888/20099 (29.29%) Loss: 1.947544 LR: 0.00002909 [10:20:11] Epoch: 1 Batch: 5889/20099 (29.30%) Loss: 2.396574 LR: 0.00002909 [10:20:13] Epoch: 1 Batch: 5890/20099 (29.30%) Loss: 2.279308 LR: 0.00002908 [10:20:15] Epoch: 1 Batch: 5891/20099 (29.31%) Loss: 2.005969 LR: 0.00002908 [10:20:16] Epoch: 
1 Batch: 5892/20099 (29.31%) Loss: 1.939330 LR: 0.00002908 [10:20:18] Epoch: 1 Batch: 5893/20099 (29.32%) Loss: 2.073083 LR: 0.00002908 [10:20:20] Epoch: 1 Batch: 5894/20099 (29.32%) Loss: 2.035916 LR: 0.00002908 [10:20:22] Epoch: 1 Batch: 5895/20099 (29.33%) Loss: 2.042533 LR: 0.00002908 [10:20:24] Epoch: 1 Batch: 5896/20099 (29.33%) Loss: 2.032497 LR: 0.00002908 [10:20:25] Epoch: 1 Batch: 5897/20099 (29.34%) Loss: 2.209742 LR: 0.00002907 [10:20:27] Epoch: 1 Batch: 5898/20099 (29.34%) Loss: 2.225151 LR: 0.00002907 [10:20:29] Epoch: 1 Batch: 5899/20099 (29.35%) Loss: 2.253237 LR: 0.00002907 [10:20:31] Epoch: 1 Batch: 5900/20099 (29.35%) Loss: 2.321652 LR: 0.00002907 [10:20:32] Epoch: 1 Batch: 5901/20099 (29.36%) Loss: 1.963192 LR: 0.00002907 [10:20:34] Epoch: 1 Batch: 5902/20099 (29.36%) Loss: 2.073853 LR: 0.00002907 [10:20:36] Epoch: 1 Batch: 5903/20099 (29.37%) Loss: 2.155034 LR: 0.00002907 [10:20:38] Epoch: 1 Batch: 5904/20099 (29.37%) Loss: 2.205460 LR: 0.00002907 [10:20:39] Epoch: 1 Batch: 5905/20099 (29.38%) Loss: 2.440878 LR: 0.00002907 [10:20:41] Epoch: 1 Batch: 5906/20099 (29.38%) Loss: 2.047995 LR: 0.00002907 [10:20:43] Epoch: 1 Batch: 5907/20099 (29.39%) Loss: 2.002074 LR: 0.00002907 [10:20:45] Epoch: 1 Batch: 5908/20099 (29.39%) Loss: 2.339496 LR: 0.00002907 [10:20:46] Epoch: 1 Batch: 5909/20099 (29.40%) Loss: 2.004176 LR: 0.00002907 [10:20:48] Epoch: 1 Batch: 5910/20099 (29.40%) Loss: 1.947823 LR: 0.00002907 [10:20:50] Epoch: 1 Batch: 5911/20099 (29.41%) Loss: 2.054211 LR: 0.00002906 [10:20:52] Epoch: 1 Batch: 5912/20099 (29.41%) Loss: 2.130546 LR: 0.00002906 [10:20:53] Epoch: 1 Batch: 5913/20099 (29.42%) Loss: 2.271972 LR: 0.00002906 [10:20:55] Epoch: 1 Batch: 5914/20099 (29.42%) Loss: 2.181969 LR: 0.00002906 [10:20:57] Epoch: 1 Batch: 5915/20099 (29.43%) Loss: 2.134753 LR: 0.00002906 [10:20:59] Epoch: 1 Batch: 5916/20099 (29.43%) Loss: 2.048719 LR: 0.00002906 [10:21:00] Epoch: 1 Batch: 5917/20099 (29.44%) Loss: 1.804802 LR: 0.00002906 [10:21:02] 
Epoch: 1 Batch: 5918/20099 (29.44%) Loss: 1.939336 LR: 0.00002906 [10:21:04] Epoch: 1 Batch: 5919/20099 (29.45%) Loss: 2.106017 LR: 0.00002906 [10:21:06] Epoch: 1 Batch: 5920/20099 (29.45%) Loss: 2.244561 LR: 0.00002906 [10:21:07] Epoch: 1 Batch: 5921/20099 (29.46%) Loss: 2.299437 LR: 0.00002906 [10:21:09] Epoch: 1 Batch: 5922/20099 (29.46%) Loss: 2.322139 LR: 0.00002906 [10:21:11] Epoch: 1 Batch: 5923/20099 (29.47%) Loss: 2.005537 LR: 0.00002906 [10:21:13] Epoch: 1 Batch: 5924/20099 (29.47%) Loss: 2.342740 LR: 0.00002906 [10:21:15] Epoch: 1 Batch: 5925/20099 (29.48%) Loss: 2.270243 LR: 0.00002905 [10:21:16] Epoch: 1 Batch: 5926/20099 (29.48%) Loss: 2.175947 LR: 0.00002905 [10:21:18] Epoch: 1 Batch: 5927/20099 (29.49%) Loss: 2.267172 LR: 0.00002905 [10:21:20] Epoch: 1 Batch: 5928/20099 (29.49%) Loss: 2.077535 LR: 0.00002905 [10:21:22] Epoch: 1 Batch: 5929/20099 (29.50%) Loss: 2.302945 LR: 0.00002905 [10:21:23] Epoch: 1 Batch: 5930/20099 (29.50%) Loss: 2.194579 LR: 0.00002905 [10:21:25] Epoch: 1 Batch: 5931/20099 (29.51%) Loss: 2.130224 LR: 0.00002905 [10:21:27] Epoch: 1 Batch: 5932/20099 (29.51%) Loss: 2.220417 LR: 0.00002904 [10:21:29] Epoch: 1 Batch: 5933/20099 (29.52%) Loss: 2.043969 LR: 0.00002904 [10:21:30] Epoch: 1 Batch: 5934/20099 (29.52%) Loss: 2.344446 LR: 0.00002904 [10:21:32] Epoch: 1 Batch: 5935/20099 (29.53%) Loss: 1.893997 LR: 0.00002904 [10:21:34] Epoch: 1 Batch: 5936/20099 (29.53%) Loss: 2.081080 LR: 0.00002904 [10:21:36] Epoch: 1 Batch: 5937/20099 (29.54%) Loss: 2.073641 LR: 0.00002904 [10:21:38] Epoch: 1 Batch: 5938/20099 (29.54%) Loss: 2.170569 LR: 0.00002904 [10:21:39] Epoch: 1 Batch: 5939/20099 (29.55%) Loss: 1.994800 LR: 0.00002904 [10:21:41] Epoch: 1 Batch: 5940/20099 (29.55%) Loss: 1.990452 LR: 0.00002904 [10:21:43] Epoch: 1 Batch: 5941/20099 (29.56%) Loss: 2.211219 LR: 0.00002904 [10:21:45] Epoch: 1 Batch: 5942/20099 (29.56%) Loss: 2.173639 LR: 0.00002904 [10:21:46] Epoch: 1 Batch: 5943/20099 (29.57%) Loss: 2.154146 LR: 0.00002904 
[10:21:48] Epoch: 1 Batch: 5944/20099 (29.57%) Loss: 2.159600 LR: 0.00002904 [10:21:50] Epoch: 1 Batch: 5945/20099 (29.58%) Loss: 2.287150 LR: 0.00002904 [10:21:52] Epoch: 1 Batch: 5946/20099 (29.58%) Loss: 2.143012 LR: 0.00002903 [10:21:53] Epoch: 1 Batch: 5947/20099 (29.59%) Loss: 1.946534 LR: 0.00002903 [10:21:55] Epoch: 1 Batch: 5948/20099 (29.59%) Loss: 2.436218 LR: 0.00002903 [10:21:57] Epoch: 1 Batch: 5949/20099 (29.60%) Loss: 1.906746 LR: 0.00002903 [10:21:59] Epoch: 1 Batch: 5950/20099 (29.60%) Loss: 2.194941 LR: 0.00002903 [10:22:00] Epoch: 1 Batch: 5951/20099 (29.61%) Loss: 2.156676 LR: 0.00002903 [10:22:02] Epoch: 1 Batch: 5952/20099 (29.61%) Loss: 2.106206 LR: 0.00002903 [10:22:04] Epoch: 1 Batch: 5953/20099 (29.62%) Loss: 2.352821 LR: 0.00002902 [10:22:06] Epoch: 1 Batch: 5954/20099 (29.62%) Loss: 1.678390 LR: 0.00002902 [10:22:08] Epoch: 1 Batch: 5955/20099 (29.63%) Loss: 2.142422 LR: 0.00002902 [10:22:09] Epoch: 1 Batch: 5956/20099 (29.63%) Loss: 2.047773 LR: 0.00002902 [10:22:11] Epoch: 1 Batch: 5957/20099 (29.64%) Loss: 2.518769 LR: 0.00002902 [10:22:13] Epoch: 1 Batch: 5958/20099 (29.64%) Loss: 2.099605 LR: 0.00002902 [10:22:15] Epoch: 1 Batch: 5959/20099 (29.65%) Loss: 2.049442 LR: 0.00002902 [10:22:16] Epoch: 1 Batch: 5960/20099 (29.65%) Loss: 2.196264 LR: 0.00002902 [10:22:18] Epoch: 1 Batch: 5961/20099 (29.66%) Loss: 2.447105 LR: 0.00002902 [10:22:20] Epoch: 1 Batch: 5962/20099 (29.66%) Loss: 2.079891 LR: 0.00002902 [10:22:22] Epoch: 1 Batch: 5963/20099 (29.67%) Loss: 2.270235 LR: 0.00002902 [10:22:24] Epoch: 1 Batch: 5964/20099 (29.67%) Loss: 2.104357 LR: 0.00002902 [10:22:25] Epoch: 1 Batch: 5965/20099 (29.68%) Loss: 2.012653 LR: 0.00002902 [10:22:27] Epoch: 1 Batch: 5966/20099 (29.68%) Loss: 2.114477 LR: 0.00002902 [10:22:29] Epoch: 1 Batch: 5967/20099 (29.69%) Loss: 2.193624 LR: 0.00002901 [10:22:31] Epoch: 1 Batch: 5968/20099 (29.69%) Loss: 2.212496 LR: 0.00002901 [10:22:32] Epoch: 1 Batch: 5969/20099 (29.70%) Loss: 2.412482 LR: 
0.00002901 [10:22:34] Epoch: 1 Batch: 5970/20099 (29.70%) Loss: 2.063107 LR: 0.00002901 [10:22:36] Epoch: 1 Batch: 5971/20099 (29.71%) Loss: 2.264232 LR: 0.00002901 [10:22:38] Epoch: 1 Batch: 5972/20099 (29.71%) Loss: 2.159037 LR: 0.00002901 [10:22:40] Epoch: 1 Batch: 5973/20099 (29.72%) Loss: 2.496321 LR: 0.00002901 [10:22:41] Epoch: 1 Batch: 5974/20099 (29.72%) Loss: 2.077712 LR: 0.00002900 [10:22:43] Epoch: 1 Batch: 5975/20099 (29.73%) Loss: 2.139822 LR: 0.00002900 [10:22:45] Epoch: 1 Batch: 5976/20099 (29.73%) Loss: 2.321510 LR: 0.00002900 [10:22:47] Epoch: 1 Batch: 5977/20099 (29.74%) Loss: 2.046504 LR: 0.00002900 [10:22:48] Epoch: 1 Batch: 5978/20099 (29.74%) Loss: 2.135754 LR: 0.00002900 [10:22:50] Epoch: 1 Batch: 5979/20099 (29.75%) Loss: 2.259355 LR: 0.00002900 [10:22:52] Epoch: 1 Batch: 5980/20099 (29.75%) Loss: 2.108790 LR: 0.00002900 [10:22:54] Epoch: 1 Batch: 5981/20099 (29.76%) Loss: 2.185145 LR: 0.00002900 [10:22:56] Epoch: 1 Batch: 5982/20099 (29.76%) Loss: 1.689453 LR: 0.00002900 [10:22:57] Epoch: 1 Batch: 5983/20099 (29.77%) Loss: 2.188216 LR: 0.00002900 [10:22:59] Epoch: 1 Batch: 5984/20099 (29.77%) Loss: 1.843337 LR: 0.00002900 [10:23:01] Epoch: 1 Batch: 5985/20099 (29.78%) Loss: 2.168924 LR: 0.00002900 [10:23:03] Epoch: 1 Batch: 5986/20099 (29.78%) Loss: 2.213278 LR: 0.00002900 [10:23:04] Epoch: 1 Batch: 5987/20099 (29.79%) Loss: 2.233690 LR: 0.00002900 [10:23:06] Epoch: 1 Batch: 5988/20099 (29.79%) Loss: 2.156695 LR: 0.00002899 [10:23:08] Epoch: 1 Batch: 5989/20099 (29.80%) Loss: 2.337240 LR: 0.00002899 [10:23:10] Epoch: 1 Batch: 5990/20099 (29.80%) Loss: 1.816768 LR: 0.00002899 [10:23:12] Epoch: 1 Batch: 5991/20099 (29.81%) Loss: 2.117988 LR: 0.00002899 [10:23:13] Epoch: 1 Batch: 5992/20099 (29.81%) Loss: 2.315064 LR: 0.00002899 [10:23:15] Epoch: 1 Batch: 5993/20099 (29.82%) Loss: 2.368414 LR: 0.00002899 [10:23:17] Epoch: 1 Batch: 5994/20099 (29.82%) Loss: 2.126934 LR: 0.00002899 [10:23:19] Epoch: 1 Batch: 5995/20099 (29.83%) Loss: 2.240390 
LR: 0.00002899 [10:23:20] Epoch: 1 Batch: 5996/20099 (29.83%) Loss: 2.075534 LR: 0.00002899 [10:23:22] Epoch: 1 Batch: 5997/20099 (29.84%) Loss: 2.140616 LR: 0.00002899 [10:23:24] Epoch: 1 Batch: 5998/20099 (29.84%) Loss: 2.043946 LR: 0.00002899 [10:23:26] Epoch: 1 Batch: 5999/20099 (29.85%) Loss: 1.942797 LR: 0.00002899 [10:23:28] >> Evaluating batch 0 [10:23:29] >> Evaluating batch 1 [10:23:30] >> Evaluating batch 2 [10:23:31] >> Evaluating batch 3 [10:23:32] >> Evaluating batch 4 [10:23:33] >> Evaluating batch 5 [10:23:34] >> Evaluating batch 6 [10:23:35] >> Evaluating batch 7 [10:23:36] >> Evaluating batch 8 [10:23:37] >> Evaluating batch 9 [10:23:38] >> Evaluating batch 10 [10:23:39] >> Evaluating batch 11 [10:23:40] >> Evaluating batch 12 [10:23:41] >> Evaluating batch 13 [10:23:42] >> Evaluating batch 14 [10:23:43] >> Evaluating batch 15 [10:23:44] >> Evaluating batch 16 [10:23:44] Epoch: 1 Step: 6000/20099 Evaluation: [10:23:44] Avg Loss Since Last Eval: 2.1212 Val Loss: 2.1983 Validation loss delta: -0.0075 Perplexity: 9.0101 LR: 0.00002899 [10:23:48] >> Cleaned up old temp checkpoint: epoch1_step4000 [10:23:48] >> Temp checkpoint saved: epoch1_step6000, size: 0.1693 GB [10:23:51] >> Checkpoint saved: epoch1_step6000, size: 0.1693 GB [10:23:51] Epoch: 1 Batch: 6000/20099 (29.85%) Loss: 2.167515 LR: 0.00002899 [10:23:53] Epoch: 1 Batch: 6001/20099 (29.86%) Loss: 2.266573 LR: 0.00002899 [10:23:55] Epoch: 1 Batch: 6002/20099 (29.86%) Loss: 2.149224 LR: 0.00002898 [10:23:57] Epoch: 1 Batch: 6003/20099 (29.87%) Loss: 2.416745 LR: 0.00002898 [10:23:58] Epoch: 1 Batch: 6004/20099 (29.87%) Loss: 1.998012 LR: 0.00002898 [10:24:00] Epoch: 1 Batch: 6005/20099 (29.88%) Loss: 2.026496 LR: 0.00002898 [10:24:02] Epoch: 1 Batch: 6006/20099 (29.88%) Loss: 1.998003 LR: 0.00002898 [10:24:04] Epoch: 1 Batch: 6007/20099 (29.89%) Loss: 2.318587 LR: 0.00002898 [10:24:05] Epoch: 1 Batch: 6008/20099 (29.89%) Loss: 2.047056 LR: 0.00002898 [10:24:07] Epoch: 1 Batch: 6009/20099 
(29.90%) Loss: 2.286715 LR: 0.00002897 [10:24:09] Epoch: 1 Batch: 6010/20099 (29.90%) Loss: 2.034576 LR: 0.00002897 [10:24:11] Epoch: 1 Batch: 6011/20099 (29.91%) Loss: 2.146474 LR: 0.00002897 [10:24:13] Epoch: 1 Batch: 6012/20099 (29.91%) Loss: 2.176541 LR: 0.00002897 [10:24:14] Epoch: 1 Batch: 6013/20099 (29.92%) Loss: 2.397283 LR: 0.00002897 [10:24:16] Epoch: 1 Batch: 6014/20099 (29.92%) Loss: 2.033884 LR: 0.00002897 [10:24:18] Epoch: 1 Batch: 6015/20099 (29.93%) Loss: 2.410838 LR: 0.00002897 [10:24:20] Epoch: 1 Batch: 6016/20099 (29.93%) Loss: 2.079157 LR: 0.00002897 [10:24:22] Epoch: 1 Batch: 6017/20099 (29.94%) Loss: 2.436302 LR: 0.00002897 [10:24:24] Epoch: 1 Batch: 6018/20099 (29.94%) Loss: 2.068577 LR: 0.00002897 [10:24:26] Epoch: 1 Batch: 6019/20099 (29.95%) Loss: 2.201342 LR: 0.00002897 [10:24:27] Epoch: 1 Batch: 6020/20099 (29.95%) Loss: 1.943850 LR: 0.00002897 [10:24:29] Epoch: 1 Batch: 6021/20099 (29.96%) Loss: 1.916546 LR: 0.00002897 [10:24:31] Epoch: 1 Batch: 6022/20099 (29.96%) Loss: 1.992570 LR: 0.00002897 [10:24:33] Epoch: 1 Batch: 6023/20099 (29.97%) Loss: 2.136137 LR: 0.00002896 [10:24:35] Epoch: 1 Batch: 6024/20099 (29.97%) Loss: 2.231624 LR: 0.00002896 [10:24:36] Epoch: 1 Batch: 6025/20099 (29.98%) Loss: 1.956048 LR: 0.00002896 [10:24:38] Epoch: 1 Batch: 6026/20099 (29.98%) Loss: 2.345159 LR: 0.00002896 [10:24:40] Epoch: 1 Batch: 6027/20099 (29.99%) Loss: 2.176679 LR: 0.00002896 [10:24:42] Epoch: 1 Batch: 6028/20099 (29.99%) Loss: 2.260270 LR: 0.00002896 [10:24:43] Epoch: 1 Batch: 6029/20099 (30.00%) Loss: 2.069808 LR: 0.00002896 [10:24:45] Epoch: 1 Batch: 6030/20099 (30.00%) Loss: 2.154341 LR: 0.00002895 [10:24:47] Epoch: 1 Batch: 6031/20099 (30.01%) Loss: 2.073474 LR: 0.00002895 [10:24:49] Epoch: 1 Batch: 6032/20099 (30.01%) Loss: 1.865573 LR: 0.00002895 [10:24:50] Epoch: 1 Batch: 6033/20099 (30.02%) Loss: 2.182718 LR: 0.00002895 [10:24:52] Epoch: 1 Batch: 6034/20099 (30.02%) Loss: 2.185627 LR: 0.00002895 [10:24:54] Epoch: 1 Batch: 
6035/20099 (30.03%) Loss: 2.200272 LR: 0.00002895 [10:24:56] Epoch: 1 Batch: 6036/20099 (30.03%) Loss: 2.067059 LR: 0.00002895 [10:24:58] Epoch: 1 Batch: 6037/20099 (30.04%) Loss: 2.193210 LR: 0.00002895 [10:24:59] Epoch: 1 Batch: 6038/20099 (30.04%) Loss: 2.150224 LR: 0.00002895 [10:25:01] Epoch: 1 Batch: 6039/20099 (30.05%) Loss: 2.230484 LR: 0.00002895 [10:25:03] Epoch: 1 Batch: 6040/20099 (30.05%) Loss: 1.854348 LR: 0.00002895 [10:25:04] Epoch: 1 Batch: 6041/20099 (30.06%) Loss: 1.890896 LR: 0.00002895 [10:25:06] Epoch: 1 Batch: 6042/20099 (30.06%) Loss: 2.412341 LR: 0.00002895 [10:25:08] Epoch: 1 Batch: 6043/20099 (30.07%) Loss: 2.204033 LR: 0.00002895 [10:25:10] Epoch: 1 Batch: 6044/20099 (30.07%) Loss: 2.274420 LR: 0.00002894 [10:25:11] Epoch: 1 Batch: 6045/20099 (30.08%) Loss: 2.042023 LR: 0.00002894 [10:25:13] Epoch: 1 Batch: 6046/20099 (30.08%) Loss: 2.320745 LR: 0.00002894 [10:25:15] Epoch: 1 Batch: 6047/20099 (30.09%) Loss: 2.461384 LR: 0.00002894 [10:25:17] Epoch: 1 Batch: 6048/20099 (30.09%) Loss: 2.203166 LR: 0.00002894 [10:25:18] Epoch: 1 Batch: 6049/20099 (30.10%) Loss: 2.169242 LR: 0.00002894 [10:25:20] Epoch: 1 Batch: 6050/20099 (30.10%) Loss: 2.332826 LR: 0.00002894 [10:25:22] Epoch: 1 Batch: 6051/20099 (30.11%) Loss: 2.099708 LR: 0.00002893 [10:25:24] Epoch: 1 Batch: 6052/20099 (30.11%) Loss: 2.024796 LR: 0.00002893 [10:25:25] Epoch: 1 Batch: 6053/20099 (30.12%) Loss: 1.987003 LR: 0.00002893 [10:25:27] Epoch: 1 Batch: 6054/20099 (30.12%) Loss: 2.101177 LR: 0.00002893 [10:25:29] Epoch: 1 Batch: 6055/20099 (30.13%) Loss: 2.091612 LR: 0.00002893 [10:25:31] Epoch: 1 Batch: 6056/20099 (30.13%) Loss: 2.057204 LR: 0.00002893 [10:25:33] Epoch: 1 Batch: 6057/20099 (30.14%) Loss: 1.815431 LR: 0.00002893 [10:25:34] Epoch: 1 Batch: 6058/20099 (30.14%) Loss: 2.234263 LR: 0.00002893 [10:25:36] Epoch: 1 Batch: 6059/20099 (30.15%) Loss: 2.188094 LR: 0.00002893 [10:25:38] Epoch: 1 Batch: 6060/20099 (30.15%) Loss: 2.166759 LR: 0.00002893 [10:25:40] Epoch: 1 
Batch: 6061/20099 (30.16%) Loss: 2.220728 LR: 0.00002893 [10:25:41] Epoch: 1 Batch: 6062/20099 (30.16%) Loss: 2.046496 LR: 0.00002893 [10:25:43] Epoch: 1 Batch: 6063/20099 (30.17%) Loss: 2.167997 LR: 0.00002893 [10:25:45] Epoch: 1 Batch: 6064/20099 (30.17%) Loss: 1.983816 LR: 0.00002893 [10:25:47] Epoch: 1 Batch: 6065/20099 (30.18%) Loss: 2.131745 LR: 0.00002892 [10:25:49] Epoch: 1 Batch: 6066/20099 (30.18%) Loss: 2.143176 LR: 0.00002892 [10:25:50] Epoch: 1 Batch: 6067/20099 (30.19%) Loss: 1.867975 LR: 0.00002892 [10:25:52] Epoch: 1 Batch: 6068/20099 (30.19%) Loss: 1.938297 LR: 0.00002892 [10:25:54] Epoch: 1 Batch: 6069/20099 (30.20%) Loss: 2.180942 LR: 0.00002892 [10:25:56] Epoch: 1 Batch: 6070/20099 (30.20%) Loss: 2.065716 LR: 0.00002892 [10:25:58] Epoch: 1 Batch: 6071/20099 (30.21%) Loss: 2.264986 LR: 0.00002892 [10:25:59] Epoch: 1 Batch: 6072/20099 (30.21%) Loss: 2.029092 LR: 0.00002891 [10:26:01] Epoch: 1 Batch: 6073/20099 (30.22%) Loss: 2.211565 LR: 0.00002891 [10:26:03] Epoch: 1 Batch: 6074/20099 (30.22%) Loss: 1.899963 LR: 0.00002891 [10:26:05] Epoch: 1 Batch: 6075/20099 (30.23%) Loss: 2.088056 LR: 0.00002891 [10:26:06] Epoch: 1 Batch: 6076/20099 (30.23%) Loss: 2.019875 LR: 0.00002891 [10:26:08] Epoch: 1 Batch: 6077/20099 (30.24%) Loss: 2.153631 LR: 0.00002891 [10:26:10] Epoch: 1 Batch: 6078/20099 (30.24%) Loss: 2.272575 LR: 0.00002891 [10:26:12] Epoch: 1 Batch: 6079/20099 (30.25%) Loss: 1.961095 LR: 0.00002891 [10:26:13] Epoch: 1 Batch: 6080/20099 (30.25%) Loss: 2.013952 LR: 0.00002891 [10:26:15] Epoch: 1 Batch: 6081/20099 (30.26%) Loss: 2.137667 LR: 0.00002891 [10:26:17] Epoch: 1 Batch: 6082/20099 (30.26%) Loss: 2.041891 LR: 0.00002891 [10:26:19] Epoch: 1 Batch: 6083/20099 (30.27%) Loss: 1.925700 LR: 0.00002891 [10:26:21] Epoch: 1 Batch: 6084/20099 (30.27%) Loss: 2.056208 LR: 0.00002891 [10:26:22] Epoch: 1 Batch: 6085/20099 (30.28%) Loss: 2.049197 LR: 0.00002891 [10:26:24] Epoch: 1 Batch: 6086/20099 (30.28%) Loss: 2.100750 LR: 0.00002890 [10:26:26] Epoch: 
1 Batch: 6087/20099 (30.29%) Loss: 2.081678 LR: 0.00002890 [10:26:28] Epoch: 1 Batch: 6088/20099 (30.29%) Loss: 2.102439 LR: 0.00002890 [10:26:29] Epoch: 1 Batch: 6089/20099 (30.30%) Loss: 2.008194 LR: 0.00002890 [10:26:31] Epoch: 1 Batch: 6090/20099 (30.30%) Loss: 2.301480 LR: 0.00002890 [10:26:33] Epoch: 1 Batch: 6091/20099 (30.30%) Loss: 2.023216 LR: 0.00002890 [10:26:35] Epoch: 1 Batch: 6092/20099 (30.31%) Loss: 1.994119 LR: 0.00002890 [10:26:36] Epoch: 1 Batch: 6093/20099 (30.31%) Loss: 2.303020 LR: 0.00002889 [10:26:38] Epoch: 1 Batch: 6094/20099 (30.32%) Loss: 2.020190 LR: 0.00002889 [10:26:40] Epoch: 1 Batch: 6095/20099 (30.32%) Loss: 2.220544 LR: 0.00002889 [10:26:42] Epoch: 1 Batch: 6096/20099 (30.33%) Loss: 2.031635 LR: 0.00002889 [10:26:44] Epoch: 1 Batch: 6097/20099 (30.33%) Loss: 2.173140 LR: 0.00002889 [10:26:45] Epoch: 1 Batch: 6098/20099 (30.34%) Loss: 1.903060 LR: 0.00002889 [10:26:47] Epoch: 1 Batch: 6099/20099 (30.34%) Loss: 2.013709 LR: 0.00002889 [10:26:49] Epoch: 1 Batch: 6100/20099 (30.35%) Loss: 2.122766 LR: 0.00002889 [10:26:51] Epoch: 1 Batch: 6101/20099 (30.35%) Loss: 2.296837 LR: 0.00002889 [10:26:52] Epoch: 1 Batch: 6102/20099 (30.36%) Loss: 1.961914 LR: 0.00002889 [10:26:54] Epoch: 1 Batch: 6103/20099 (30.36%) Loss: 2.200015 LR: 0.00002889 [10:26:56] Epoch: 1 Batch: 6104/20099 (30.37%) Loss: 1.954469 LR: 0.00002889 [10:26:58] Epoch: 1 Batch: 6105/20099 (30.37%) Loss: 1.752756 LR: 0.00002889 [10:26:59] Epoch: 1 Batch: 6106/20099 (30.38%) Loss: 2.327977 LR: 0.00002889 [10:27:01] Epoch: 1 Batch: 6107/20099 (30.38%) Loss: 2.040295 LR: 0.00002888 [10:27:03] Epoch: 1 Batch: 6108/20099 (30.39%) Loss: 2.174480 LR: 0.00002888 [10:27:05] Epoch: 1 Batch: 6109/20099 (30.39%) Loss: 2.002710 LR: 0.00002888 [10:27:06] Epoch: 1 Batch: 6110/20099 (30.40%) Loss: 2.304384 LR: 0.00002888 [10:27:08] Epoch: 1 Batch: 6111/20099 (30.40%) Loss: 1.993517 LR: 0.00002888 [10:27:10] Epoch: 1 Batch: 6112/20099 (30.41%) Loss: 2.279287 LR: 0.00002888 [10:27:12] 
Epoch: 1 Batch: 6113/20099 (30.41%) Loss: 2.003711 LR: 0.00002888 [10:27:14] Epoch: 1 Batch: 6114/20099 (30.42%) Loss: 2.131424 LR: 0.00002887 [10:27:15] Epoch: 1 Batch: 6115/20099 (30.42%) Loss: 2.435085 LR: 0.00002887 [10:27:17] Epoch: 1 Batch: 6116/20099 (30.43%) Loss: 2.330194 LR: 0.00002887 [10:27:19] Epoch: 1 Batch: 6117/20099 (30.43%) Loss: 2.479560 LR: 0.00002887 [10:27:21] Epoch: 1 Batch: 6118/20099 (30.44%) Loss: 2.068970 LR: 0.00002887 [10:27:22] Epoch: 1 Batch: 6119/20099 (30.44%) Loss: 1.789693 LR: 0.00002887 [10:27:24] Epoch: 1 Batch: 6120/20099 (30.45%) Loss: 1.911541 LR: 0.00002887 [10:27:26] Epoch: 1 Batch: 6121/20099 (30.45%) Loss: 2.033945 LR: 0.00002886 [10:27:28] Epoch: 1 Batch: 6122/20099 (30.46%) Loss: 2.032663 LR: 0.00002886 [10:27:29] Epoch: 1 Batch: 6123/20099 (30.46%) Loss: 2.106970 LR: 0.00002886 [10:27:31] Epoch: 1 Batch: 6124/20099 (30.47%) Loss: 2.449900 LR: 0.00002886 [10:27:33] Epoch: 1 Batch: 6125/20099 (30.47%) Loss: 2.245286 LR: 0.00002886 [10:27:35] Epoch: 1 Batch: 6126/20099 (30.48%) Loss: 2.485668 LR: 0.00002886 [10:27:36] Epoch: 1 Batch: 6127/20099 (30.48%) Loss: 2.077368 LR: 0.00002886 [10:27:38] Epoch: 1 Batch: 6128/20099 (30.49%) Loss: 2.203731 LR: 0.00002886 [10:27:40] Epoch: 1 Batch: 6129/20099 (30.49%) Loss: 1.978845 LR: 0.00002886 [10:27:42] Epoch: 1 Batch: 6130/20099 (30.50%) Loss: 1.775719 LR: 0.00002886 [10:27:43] Epoch: 1 Batch: 6131/20099 (30.50%) Loss: 2.194798 LR: 0.00002886 [10:27:45] Epoch: 1 Batch: 6132/20099 (30.51%) Loss: 2.162802 LR: 0.00002886 [10:27:47] Epoch: 1 Batch: 6133/20099 (30.51%) Loss: 2.121477 LR: 0.00002886 [10:27:49] Epoch: 1 Batch: 6134/20099 (30.52%) Loss: 1.615075 LR: 0.00002886 [10:27:50] Epoch: 1 Batch: 6135/20099 (30.52%) Loss: 2.252779 LR: 0.00002885 [10:27:52] Epoch: 1 Batch: 6136/20099 (30.53%) Loss: 2.148409 LR: 0.00002885 [10:27:54] Epoch: 1 Batch: 6137/20099 (30.53%) Loss: 2.126829 LR: 0.00002885 [10:27:56] Epoch: 1 Batch: 6138/20099 (30.54%) Loss: 2.179602 LR: 0.00002885 
[10:27:58] Epoch: 1 Batch: 6139/20099 (30.54%) Loss: 2.238281 LR: 0.00002885 [10:27:59] Epoch: 1 Batch: 6140/20099 (30.55%) Loss: 1.929934 LR: 0.00002885 [10:28:01] Epoch: 1 Batch: 6141/20099 (30.55%) Loss: 2.285249 LR: 0.00002885 [10:28:03] Epoch: 1 Batch: 6142/20099 (30.56%) Loss: 2.460362 LR: 0.00002884 [10:28:05] Epoch: 1 Batch: 6143/20099 (30.56%) Loss: 2.125650 LR: 0.00002884 [10:28:06] Epoch: 1 Batch: 6144/20099 (30.57%) Loss: 2.370881 LR: 0.00002884 [10:28:08] Epoch: 1 Batch: 6145/20099 (30.57%) Loss: 2.352486 LR: 0.00002884 [10:28:10] Epoch: 1 Batch: 6146/20099 (30.58%) Loss: 2.066581 LR: 0.00002884 [10:28:12] Epoch: 1 Batch: 6147/20099 (30.58%) Loss: 1.935142 LR: 0.00002884 [10:28:13] Epoch: 1 Batch: 6148/20099 (30.59%) Loss: 2.087194 LR: 0.00002884 [10:28:15] Epoch: 1 Batch: 6149/20099 (30.59%) Loss: 2.274426 LR: 0.00002884 [10:28:17] Epoch: 1 Batch: 6150/20099 (30.60%) Loss: 2.526196 LR: 0.00002884 [10:28:19] Epoch: 1 Batch: 6151/20099 (30.60%) Loss: 2.145799 LR: 0.00002884 [10:28:21] Epoch: 1 Batch: 6152/20099 (30.61%) Loss: 2.190077 LR: 0.00002884 [10:28:22] Epoch: 1 Batch: 6153/20099 (30.61%) Loss: 2.363593 LR: 0.00002884 [10:28:24] Epoch: 1 Batch: 6154/20099 (30.62%) Loss: 2.063462 LR: 0.00002884 [10:28:26] Epoch: 1 Batch: 6155/20099 (30.62%) Loss: 2.350155 LR: 0.00002884 [10:28:28] Epoch: 1 Batch: 6156/20099 (30.63%) Loss: 2.093012 LR: 0.00002883 [10:28:29] Epoch: 1 Batch: 6157/20099 (30.63%) Loss: 2.314228 LR: 0.00002883 [10:28:31] Epoch: 1 Batch: 6158/20099 (30.64%) Loss: 2.166234 LR: 0.00002883 [10:28:33] Epoch: 1 Batch: 6159/20099 (30.64%) Loss: 1.984644 LR: 0.00002883 [10:28:35] Epoch: 1 Batch: 6160/20099 (30.65%) Loss: 2.021545 LR: 0.00002883 [10:28:36] Epoch: 1 Batch: 6161/20099 (30.65%) Loss: 2.351858 LR: 0.00002883 [10:28:38] Epoch: 1 Batch: 6162/20099 (30.66%) Loss: 2.069036 LR: 0.00002883 [10:28:40] Epoch: 1 Batch: 6163/20099 (30.66%) Loss: 2.073274 LR: 0.00002882 [10:28:42] Epoch: 1 Batch: 6164/20099 (30.67%) Loss: 2.378255 LR: 
0.00002882 [10:28:44] Epoch: 1 Batch: 6165/20099 (30.67%) Loss: 2.071101 LR: 0.00002882 [10:28:45] Epoch: 1 Batch: 6166/20099 (30.68%) Loss: 1.766687 LR: 0.00002882 [10:28:47] Epoch: 1 Batch: 6167/20099 (30.68%) Loss: 2.323901 LR: 0.00002882 [10:28:49] Epoch: 1 Batch: 6168/20099 (30.69%) Loss: 2.124365 LR: 0.00002882 [10:28:51] Epoch: 1 Batch: 6169/20099 (30.69%) Loss: 2.150475 LR: 0.00002882 [10:28:52] Epoch: 1 Batch: 6170/20099 (30.70%) Loss: 2.207050 LR: 0.00002882 [10:28:54] Epoch: 1 Batch: 6171/20099 (30.70%) Loss: 1.978800 LR: 0.00002882 [10:28:56] Epoch: 1 Batch: 6172/20099 (30.71%) Loss: 2.047195 LR: 0.00002882 [10:28:58] Epoch: 1 Batch: 6173/20099 (30.71%) Loss: 2.019106 LR: 0.00002882 [10:28:59] Epoch: 1 Batch: 6174/20099 (30.72%) Loss: 2.172266 LR: 0.00002882 [10:29:01] Epoch: 1 Batch: 6175/20099 (30.72%) Loss: 2.364386 LR: 0.00002882 [10:29:03] Epoch: 1 Batch: 6176/20099 (30.73%) Loss: 2.210007 LR: 0.00002882 [10:29:05] Epoch: 1 Batch: 6177/20099 (30.73%) Loss: 2.127088 LR: 0.00002881 [10:29:07] Epoch: 1 Batch: 6178/20099 (30.74%) Loss: 2.232116 LR: 0.00002881 [10:29:08] Epoch: 1 Batch: 6179/20099 (30.74%) Loss: 2.096040 LR: 0.00002881 [10:29:10] Epoch: 1 Batch: 6180/20099 (30.75%) Loss: 2.052990 LR: 0.00002881 [10:29:12] Epoch: 1 Batch: 6181/20099 (30.75%) Loss: 1.966865 LR: 0.00002881 [10:29:14] Epoch: 1 Batch: 6182/20099 (30.76%) Loss: 2.361166 LR: 0.00002881 [10:29:15] Epoch: 1 Batch: 6183/20099 (30.76%) Loss: 2.414168 LR: 0.00002881 [10:29:17] Epoch: 1 Batch: 6184/20099 (30.77%) Loss: 1.966624 LR: 0.00002880 [10:29:19] Epoch: 1 Batch: 6185/20099 (30.77%) Loss: 2.071851 LR: 0.00002880 [10:29:21] Epoch: 1 Batch: 6186/20099 (30.78%) Loss: 2.094296 LR: 0.00002880 [10:29:22] Epoch: 1 Batch: 6187/20099 (30.78%) Loss: 1.959520 LR: 0.00002880 [10:29:24] Epoch: 1 Batch: 6188/20099 (30.79%) Loss: 2.277769 LR: 0.00002880 [10:29:26] Epoch: 1 Batch: 6189/20099 (30.79%) Loss: 2.161672 LR: 0.00002880 [10:29:28] Epoch: 1 Batch: 6190/20099 (30.80%) Loss: 2.199385 
LR: 0.00002880 [10:29:30] Epoch: 1 Batch: 6191/20099 (30.80%) Loss: 2.426378 LR: 0.00002879 [10:29:31] Epoch: 1 Batch: 6192/20099 (30.81%) Loss: 2.205956 LR: 0.00002879 [10:29:33] Epoch: 1 Batch: 6193/20099 (30.81%) Loss: 2.113696 LR: 0.00002879 [10:29:35] Epoch: 1 Batch: 6194/20099 (30.82%) Loss: 2.186690 LR: 0.00002879 [10:29:37] Epoch: 1 Batch: 6195/20099 (30.82%) Loss: 2.268899 LR: 0.00002879 [10:29:38] Epoch: 1 Batch: 6196/20099 (30.83%) Loss: 2.119167 LR: 0.00002879 [10:29:40] Epoch: 1 Batch: 6197/20099 (30.83%) Loss: 2.273023 LR: 0.00002879 [10:29:42] Epoch: 1 Batch: 6198/20099 (30.84%) Loss: 2.194120 LR: 0.00002879 [10:29:44] Epoch: 1 Batch: 6199/20099 (30.84%) Loss: 2.007127 LR: 0.00002879 [10:29:49] >> Cleaned up old temp checkpoint: epoch1_step4200 [10:29:49] >> Temp checkpoint saved: epoch1_step6200, size: 0.1693 GB [10:29:49] Epoch: 1 Batch: 6200/20099 (30.85%) Loss: 2.352658 LR: 0.00002879 [10:29:51] Epoch: 1 Batch: 6201/20099 (30.85%) Loss: 2.122938 LR: 0.00002879 [10:29:53] Epoch: 1 Batch: 6202/20099 (30.86%) Loss: 2.133700 LR: 0.00002879 [10:29:54] Epoch: 1 Batch: 6203/20099 (30.86%) Loss: 1.997478 LR: 0.00002879 [10:29:56] Epoch: 1 Batch: 6204/20099 (30.87%) Loss: 2.240467 LR: 0.00002879 [10:29:58] Epoch: 1 Batch: 6205/20099 (30.87%) Loss: 2.375379 LR: 0.00002878 [10:30:00] Epoch: 1 Batch: 6206/20099 (30.88%) Loss: 2.101030 LR: 0.00002878 [10:30:01] Epoch: 1 Batch: 6207/20099 (30.88%) Loss: 2.297934 LR: 0.00002878 [10:30:03] Epoch: 1 Batch: 6208/20099 (30.89%) Loss: 2.024060 LR: 0.00002878 [10:30:05] Epoch: 1 Batch: 6209/20099 (30.89%) Loss: 2.430643 LR: 0.00002878 [10:30:07] Epoch: 1 Batch: 6210/20099 (30.90%) Loss: 2.557595 LR: 0.00002878 [10:30:08] Epoch: 1 Batch: 6211/20099 (30.90%) Loss: 2.022629 LR: 0.00002878 [10:30:10] Epoch: 1 Batch: 6212/20099 (30.91%) Loss: 2.093209 LR: 0.00002877 [10:30:12] Epoch: 1 Batch: 6213/20099 (30.91%) Loss: 2.188879 LR: 0.00002877 [10:30:14] Epoch: 1 Batch: 6214/20099 (30.92%) Loss: 2.198479 LR: 0.00002877 
[10:30:16] Epoch: 1 Batch: 6215/20099 (30.92%) Loss: 2.367657 LR: 0.00002877 [10:30:17] Epoch: 1 Batch: 6216/20099 (30.93%) Loss: 2.026569 LR: 0.00002877 [10:30:19] Epoch: 1 Batch: 6217/20099 (30.93%) Loss: 1.990440 LR: 0.00002877 [10:30:21] Epoch: 1 Batch: 6218/20099 (30.94%) Loss: 2.307656 LR: 0.00002877 [10:30:23] Epoch: 1 Batch: 6219/20099 (30.94%) Loss: 2.302414 LR: 0.00002877 [10:30:25] Epoch: 1 Batch: 6220/20099 (30.95%) Loss: 2.042426 LR: 0.00002877 [10:30:26] Epoch: 1 Batch: 6221/20099 (30.95%) Loss: 1.956993 LR: 0.00002877 [10:30:28] Epoch: 1 Batch: 6222/20099 (30.96%) Loss: 2.070917 LR: 0.00002877 [10:30:30] Epoch: 1 Batch: 6223/20099 (30.96%) Loss: 2.398350 LR: 0.00002877 [10:30:32] Epoch: 1 Batch: 6224/20099 (30.97%) Loss: 2.162161 LR: 0.00002877 [10:30:33] Epoch: 1 Batch: 6225/20099 (30.97%) Loss: 2.424959 LR: 0.00002877 [10:30:35] Epoch: 1 Batch: 6226/20099 (30.98%) Loss: 2.075073 LR: 0.00002876 [10:30:37] Epoch: 1 Batch: 6227/20099 (30.98%) Loss: 2.176429 LR: 0.00002876 [10:30:39] Epoch: 1 Batch: 6228/20099 (30.99%) Loss: 2.264375 LR: 0.00002876 [10:30:41] Epoch: 1 Batch: 6229/20099 (30.99%) Loss: 2.318801 LR: 0.00002876 [10:30:42] Epoch: 1 Batch: 6230/20099 (31.00%) Loss: 2.320192 LR: 0.00002876 [10:30:44] Epoch: 1 Batch: 6231/20099 (31.00%) Loss: 2.111840 LR: 0.00002876 [10:30:46] Epoch: 1 Batch: 6232/20099 (31.01%) Loss: 1.939478 LR: 0.00002876 [10:30:48] Epoch: 1 Batch: 6233/20099 (31.01%) Loss: 1.553004 LR: 0.00002875 [10:30:49] Epoch: 1 Batch: 6234/20099 (31.02%) Loss: 2.190462 LR: 0.00002875 [10:30:51] Epoch: 1 Batch: 6235/20099 (31.02%) Loss: 2.097749 LR: 0.00002875 [10:30:53] Epoch: 1 Batch: 6236/20099 (31.03%) Loss: 2.135648 LR: 0.00002875 [10:30:55] Epoch: 1 Batch: 6237/20099 (31.03%) Loss: 1.849052 LR: 0.00002875 [10:30:56] Epoch: 1 Batch: 6238/20099 (31.04%) Loss: 2.221046 LR: 0.00002875 [10:30:58] Epoch: 1 Batch: 6239/20099 (31.04%) Loss: 1.806323 LR: 0.00002875 [10:31:00] Epoch: 1 Batch: 6240/20099 (31.05%) Loss: 2.211417 LR: 
0.00002874 [10:31:02] Epoch: 1 Batch: 6241/20099 (31.05%) Loss: 1.989772 LR: 0.00002874 [10:31:04] Epoch: 1 Batch: 6242/20099 (31.06%) Loss: 2.258232 LR: 0.00002874 [10:31:05] Epoch: 1 Batch: 6243/20099 (31.06%) Loss: 1.868869 LR: 0.00002874 [10:31:07] Epoch: 1 Batch: 6244/20099 (31.07%) Loss: 2.063636 LR: 0.00002874 [10:31:09] Epoch: 1 Batch: 6245/20099 (31.07%) Loss: 2.168203 LR: 0.00002874 [10:31:11] Epoch: 1 Batch: 6246/20099 (31.08%) Loss: 2.266906 LR: 0.00002874 [10:31:12] Epoch: 1 Batch: 6247/20099 (31.08%) Loss: 2.119305 LR: 0.00002874 [10:31:14] Epoch: 1 Batch: 6248/20099 (31.09%) Loss: 1.839048 LR: 0.00002874 [10:31:16] Epoch: 1 Batch: 6249/20099 (31.09%) Loss: 2.341777 LR: 0.00002874 [10:31:18] Epoch: 1 Batch: 6250/20099 (31.10%) Loss: 1.923596 LR: 0.00002874 [10:31:19] Epoch: 1 Batch: 6251/20099 (31.10%) Loss: 2.086246 LR: 0.00002874 [10:31:21] Epoch: 1 Batch: 6252/20099 (31.11%) Loss: 1.946492 LR: 0.00002874 [10:31:23] Epoch: 1 Batch: 6253/20099 (31.11%) Loss: 1.811921 LR: 0.00002874 [10:31:25] Epoch: 1 Batch: 6254/20099 (31.12%) Loss: 2.099774 LR: 0.00002873 [10:31:27] Epoch: 1 Batch: 6255/20099 (31.12%) Loss: 2.013122 LR: 0.00002873 [10:31:28] Epoch: 1 Batch: 6256/20099 (31.13%) Loss: 2.168018 LR: 0.00002873 [10:31:30] Epoch: 1 Batch: 6257/20099 (31.13%) Loss: 2.296320 LR: 0.00002873 [10:31:32] Epoch: 1 Batch: 6258/20099 (31.14%) Loss: 1.782577 LR: 0.00002873 [10:31:34] Epoch: 1 Batch: 6259/20099 (31.14%) Loss: 1.936415 LR: 0.00002873 [10:31:35] Epoch: 1 Batch: 6260/20099 (31.15%) Loss: 1.783918 LR: 0.00002873 [10:31:37] Epoch: 1 Batch: 6261/20099 (31.15%) Loss: 2.352529 LR: 0.00002872 [10:31:39] Epoch: 1 Batch: 6262/20099 (31.16%) Loss: 2.102598 LR: 0.00002872 [10:31:41] Epoch: 1 Batch: 6263/20099 (31.16%) Loss: 2.024643 LR: 0.00002872 [10:31:42] Epoch: 1 Batch: 6264/20099 (31.17%) Loss: 2.136469 LR: 0.00002872 [10:31:44] Epoch: 1 Batch: 6265/20099 (31.17%) Loss: 2.013725 LR: 0.00002872 [10:31:46] Epoch: 1 Batch: 6266/20099 (31.18%) Loss: 2.118818 
LR: 0.00002872 [10:31:48] Epoch: 1 Batch: 6267/20099 (31.18%) Loss: 2.116645 LR: 0.00002872 [10:31:49] Epoch: 1 Batch: 6268/20099 (31.19%) Loss: 2.264665 LR: 0.00002872 [10:31:51] Epoch: 1 Batch: 6269/20099 (31.19%) Loss: 2.431829 LR: 0.00002872 [10:31:53] Epoch: 1 Batch: 6270/20099 (31.20%) Loss: 2.005041 LR: 0.00002872 [10:31:55] Epoch: 1 Batch: 6271/20099 (31.20%) Loss: 2.347154 LR: 0.00002872 [10:31:57] Epoch: 1 Batch: 6272/20099 (31.21%) Loss: 2.042758 LR: 0.00002872 [10:31:58] Epoch: 1 Batch: 6273/20099 (31.21%) Loss: 2.224375 LR: 0.00002872 [10:32:00] Epoch: 1 Batch: 6274/20099 (31.22%) Loss: 2.099962 LR: 0.00002872 [10:32:02] Epoch: 1 Batch: 6275/20099 (31.22%) Loss: 2.175937 LR: 0.00002871 [10:32:04] Epoch: 1 Batch: 6276/20099 (31.23%) Loss: 2.144987 LR: 0.00002871 [10:32:05] Epoch: 1 Batch: 6277/20099 (31.23%) Loss: 2.311742 LR: 0.00002871 [10:32:07] Epoch: 1 Batch: 6278/20099 (31.24%) Loss: 2.154278 LR: 0.00002871 [10:32:09] Epoch: 1 Batch: 6279/20099 (31.24%) Loss: 2.148815 LR: 0.00002871 [10:32:11] Epoch: 1 Batch: 6280/20099 (31.25%) Loss: 2.019048 LR: 0.00002871 [10:32:12] Epoch: 1 Batch: 6281/20099 (31.25%) Loss: 2.227001 LR: 0.00002871 [10:32:14] Epoch: 1 Batch: 6282/20099 (31.26%) Loss: 1.992877 LR: 0.00002870 [10:32:16] Epoch: 1 Batch: 6283/20099 (31.26%) Loss: 2.038112 LR: 0.00002870 [10:32:18] Epoch: 1 Batch: 6284/20099 (31.27%) Loss: 2.358537 LR: 0.00002870 [10:32:20] Epoch: 1 Batch: 6285/20099 (31.27%) Loss: 2.396007 LR: 0.00002870 [10:32:21] Epoch: 1 Batch: 6286/20099 (31.28%) Loss: 2.252377 LR: 0.00002870 [10:32:23] Epoch: 1 Batch: 6287/20099 (31.28%) Loss: 1.995181 LR: 0.00002870 [10:32:25] Epoch: 1 Batch: 6288/20099 (31.29%) Loss: 2.208895 LR: 0.00002870 [10:32:27] Epoch: 1 Batch: 6289/20099 (31.29%) Loss: 2.161045 LR: 0.00002869 [10:32:28] Epoch: 1 Batch: 6290/20099 (31.30%) Loss: 1.909012 LR: 0.00002869 [10:32:30] Epoch: 1 Batch: 6291/20099 (31.30%) Loss: 2.003023 LR: 0.00002869 [10:32:32] Epoch: 1 Batch: 6292/20099 (31.31%) Loss: 
2.053637 LR: 0.00002869 [10:32:34] Epoch: 1 Batch: 6293/20099 (31.31%) Loss: 1.581449 LR: 0.00002869 [10:32:36] Epoch: 1 Batch: 6294/20099 (31.31%) Loss: 1.993044 LR: 0.00002869 [10:32:37] Epoch: 1 Batch: 6295/20099 (31.32%) Loss: 2.364690 LR: 0.00002869 [10:32:39] Epoch: 1 Batch: 6296/20099 (31.32%) Loss: 2.251909 LR: 0.00002869 [10:32:41] Epoch: 1 Batch: 6297/20099 (31.33%) Loss: 2.124489 LR: 0.00002869 [10:32:43] Epoch: 1 Batch: 6298/20099 (31.33%) Loss: 2.271361 LR: 0.00002869 [10:32:44] Epoch: 1 Batch: 6299/20099 (31.34%) Loss: 2.161395 LR: 0.00002869 [10:32:46] Epoch: 1 Batch: 6300/20099 (31.34%) Loss: 1.885215 LR: 0.00002869 [10:32:48] Epoch: 1 Batch: 6301/20099 (31.35%) Loss: 2.243901 LR: 0.00002869 [10:32:50] Epoch: 1 Batch: 6302/20099 (31.35%) Loss: 2.308875 LR: 0.00002869 [10:32:52] Epoch: 1 Batch: 6303/20099 (31.36%) Loss: 2.255338 LR: 0.00002868 [10:32:54] Epoch: 1 Batch: 6304/20099 (31.36%) Loss: 2.232868 LR: 0.00002868 [10:32:55] Epoch: 1 Batch: 6305/20099 (31.37%) Loss: 2.008337 LR: 0.00002868 [10:32:57] Epoch: 1 Batch: 6306/20099 (31.37%) Loss: 2.292962 LR: 0.00002868 [10:32:59] Epoch: 1 Batch: 6307/20099 (31.38%) Loss: 2.177768 LR: 0.00002868 [10:33:01] Epoch: 1 Batch: 6308/20099 (31.38%) Loss: 1.999588 LR: 0.00002868 [10:33:02] Epoch: 1 Batch: 6309/20099 (31.39%) Loss: 2.037834 LR: 0.00002868 [10:33:04] Epoch: 1 Batch: 6310/20099 (31.39%) Loss: 1.953870 LR: 0.00002867 [10:33:06] Epoch: 1 Batch: 6311/20099 (31.40%) Loss: 2.386093 LR: 0.00002867 [10:33:08] Epoch: 1 Batch: 6312/20099 (31.40%) Loss: 2.085163 LR: 0.00002867 [10:33:10] Epoch: 1 Batch: 6313/20099 (31.41%) Loss: 2.273416 LR: 0.00002867 [10:33:11] Epoch: 1 Batch: 6314/20099 (31.41%) Loss: 1.964740 LR: 0.00002867 [10:33:13] Epoch: 1 Batch: 6315/20099 (31.42%) Loss: 1.837167 LR: 0.00002867 [10:33:15] Epoch: 1 Batch: 6316/20099 (31.42%) Loss: 2.140056 LR: 0.00002867 [10:33:17] Epoch: 1 Batch: 6317/20099 (31.43%) Loss: 1.926631 LR: 0.00002866 [10:33:18] Epoch: 1 Batch: 6318/20099 (31.43%) 
Loss: 2.346955 LR: 0.00002866 [10:33:20] Epoch: 1 Batch: 6319/20099 (31.44%) Loss: 2.188176 LR: 0.00002866 [10:33:22] Epoch: 1 Batch: 6320/20099 (31.44%) Loss: 2.186530 LR: 0.00002866 [10:33:24] Epoch: 1 Batch: 6321/20099 (31.45%) Loss: 2.317046 LR: 0.00002866 [10:33:26] Epoch: 1 Batch: 6322/20099 (31.45%) Loss: 2.321129 LR: 0.00002866 [10:33:27] Epoch: 1 Batch: 6323/20099 (31.46%) Loss: 2.320998 LR: 0.00002866 [10:33:29] Epoch: 1 Batch: 6324/20099 (31.46%) Loss: 2.068393 LR: 0.00002866 [10:33:31] Epoch: 1 Batch: 6325/20099 (31.47%) Loss: 2.342798 LR: 0.00002866 [10:33:33] Epoch: 1 Batch: 6326/20099 (31.47%) Loss: 2.376051 LR: 0.00002866 [10:33:34] Epoch: 1 Batch: 6327/20099 (31.48%) Loss: 1.967569 LR: 0.00002866 [10:33:36] Epoch: 1 Batch: 6328/20099 (31.48%) Loss: 2.309822 LR: 0.00002866 [10:33:38] Epoch: 1 Batch: 6329/20099 (31.49%) Loss: 2.247426 LR: 0.00002866 [10:33:40] Epoch: 1 Batch: 6330/20099 (31.49%) Loss: 2.406678 LR: 0.00002866 [10:33:42] Epoch: 1 Batch: 6331/20099 (31.50%) Loss: 2.164763 LR: 0.00002865 [10:33:43] Epoch: 1 Batch: 6332/20099 (31.50%) Loss: 1.999369 LR: 0.00002865 [10:33:45] Epoch: 1 Batch: 6333/20099 (31.51%) Loss: 1.852582 LR: 0.00002865 [10:33:47] Epoch: 1 Batch: 6334/20099 (31.51%) Loss: 2.117928 LR: 0.00002865 [10:33:49] Epoch: 1 Batch: 6335/20099 (31.52%) Loss: 2.210070 LR: 0.00002865 [10:33:50] Epoch: 1 Batch: 6336/20099 (31.52%) Loss: 2.194678 LR: 0.00002865 [10:33:52] Epoch: 1 Batch: 6337/20099 (31.53%) Loss: 2.138809 LR: 0.00002865 [10:33:54] Epoch: 1 Batch: 6338/20099 (31.53%) Loss: 2.578362 LR: 0.00002864 [10:33:56] Epoch: 1 Batch: 6339/20099 (31.54%) Loss: 2.613313 LR: 0.00002864 [10:33:57] Epoch: 1 Batch: 6340/20099 (31.54%) Loss: 1.962745 LR: 0.00002864 [10:33:59] Epoch: 1 Batch: 6341/20099 (31.55%) Loss: 2.238034 LR: 0.00002864 [10:34:01] Epoch: 1 Batch: 6342/20099 (31.55%) Loss: 1.965877 LR: 0.00002864 [10:34:03] Epoch: 1 Batch: 6343/20099 (31.56%) Loss: 2.031760 LR: 0.00002864 [10:34:05] Epoch: 1 Batch: 6344/20099 
(31.56%) Loss: 2.168831 LR: 0.00002864 [10:34:06] Epoch: 1 Batch: 6345/20099 (31.57%) Loss: 2.137528 LR: 0.00002863 [10:34:08] Epoch: 1 Batch: 6346/20099 (31.57%) Loss: 2.020970 LR: 0.00002863 [10:34:10] Epoch: 1 Batch: 6347/20099 (31.58%) Loss: 2.623054 LR: 0.00002863 [10:34:12] Epoch: 1 Batch: 6348/20099 (31.58%) Loss: 2.391135 LR: 0.00002863 [10:34:13] Epoch: 1 Batch: 6349/20099 (31.59%) Loss: 1.864197 LR: 0.00002863 [10:34:15] Epoch: 1 Batch: 6350/20099 (31.59%) Loss: 2.221427 LR: 0.00002863 [10:34:17] Epoch: 1 Batch: 6351/20099 (31.60%) Loss: 2.290756 LR: 0.00002863 [10:34:19] Epoch: 1 Batch: 6352/20099 (31.60%) Loss: 2.254968 LR: 0.00002863 [10:34:21] Epoch: 1 Batch: 6353/20099 (31.61%) Loss: 2.176170 LR: 0.00002863 [10:34:22] Epoch: 1 Batch: 6354/20099 (31.61%) Loss: 2.136479 LR: 0.00002863 [10:34:24] Epoch: 1 Batch: 6355/20099 (31.62%) Loss: 2.019973 LR: 0.00002863 [10:34:26] Epoch: 1 Batch: 6356/20099 (31.62%) Loss: 2.168294 LR: 0.00002863 [10:34:28] Epoch: 1 Batch: 6357/20099 (31.63%) Loss: 2.092223 LR: 0.00002863 [10:34:29] Epoch: 1 Batch: 6358/20099 (31.63%) Loss: 1.790054 LR: 0.00002863 [10:34:31] Epoch: 1 Batch: 6359/20099 (31.64%) Loss: 2.244169 LR: 0.00002862 [10:34:33] Epoch: 1 Batch: 6360/20099 (31.64%) Loss: 1.939675 LR: 0.00002862 [10:34:35] Epoch: 1 Batch: 6361/20099 (31.65%) Loss: 2.093519 LR: 0.00002862 [10:34:36] Epoch: 1 Batch: 6362/20099 (31.65%) Loss: 2.149931 LR: 0.00002862 [10:34:38] Epoch: 1 Batch: 6363/20099 (31.66%) Loss: 2.302253 LR: 0.00002862 [10:34:40] Epoch: 1 Batch: 6364/20099 (31.66%) Loss: 2.047087 LR: 0.00002862 [10:34:42] Epoch: 1 Batch: 6365/20099 (31.67%) Loss: 1.813463 LR: 0.00002862 [10:34:43] Epoch: 1 Batch: 6366/20099 (31.67%) Loss: 2.200158 LR: 0.00002861 [10:34:45] Epoch: 1 Batch: 6367/20099 (31.68%) Loss: 2.218111 LR: 0.00002861 [10:34:47] Epoch: 1 Batch: 6368/20099 (31.68%) Loss: 2.087201 LR: 0.00002861 [10:34:49] Epoch: 1 Batch: 6369/20099 (31.69%) Loss: 2.247863 LR: 0.00002861 [10:34:50] Epoch: 1 Batch: 
6370/20099 (31.69%) Loss: 2.013531 LR: 0.00002861 [10:34:52] Epoch: 1 Batch: 6371/20099 (31.70%) Loss: 2.202727 LR: 0.00002861 [10:34:54] Epoch: 1 Batch: 6372/20099 (31.70%) Loss: 2.069596 LR: 0.00002861 [10:34:56] Epoch: 1 Batch: 6373/20099 (31.71%) Loss: 1.978574 LR: 0.00002860 [10:34:58] Epoch: 1 Batch: 6374/20099 (31.71%) Loss: 2.135419 LR: 0.00002860 [10:34:59] Epoch: 1 Batch: 6375/20099 (31.72%) Loss: 2.421329 LR: 0.00002860 [10:35:01] Epoch: 1 Batch: 6376/20099 (31.72%) Loss: 1.557088 LR: 0.00002860 [10:35:03] Epoch: 1 Batch: 6377/20099 (31.73%) Loss: 2.361428 LR: 0.00002860 [10:35:05] Epoch: 1 Batch: 6378/20099 (31.73%) Loss: 1.848278 LR: 0.00002860 [10:35:06] Epoch: 1 Batch: 6379/20099 (31.74%) Loss: 2.046473 LR: 0.00002860 [10:35:08] Epoch: 1 Batch: 6380/20099 (31.74%) Loss: 2.112885 LR: 0.00002860 [10:35:10] Epoch: 1 Batch: 6381/20099 (31.75%) Loss: 2.250193 LR: 0.00002860 [10:35:12] Epoch: 1 Batch: 6382/20099 (31.75%) Loss: 1.942080 LR: 0.00002860 [10:35:13] Epoch: 1 Batch: 6383/20099 (31.76%) Loss: 2.205992 LR: 0.00002860 [10:35:15] Epoch: 1 Batch: 6384/20099 (31.76%) Loss: 1.865969 LR: 0.00002860 [10:35:17] Epoch: 1 Batch: 6385/20099 (31.77%) Loss: 2.087124 LR: 0.00002860 [10:35:19] Epoch: 1 Batch: 6386/20099 (31.77%) Loss: 1.854963 LR: 0.00002860 [10:35:20] Epoch: 1 Batch: 6387/20099 (31.78%) Loss: 2.257592 LR: 0.00002859 [10:35:22] Epoch: 1 Batch: 6388/20099 (31.78%) Loss: 2.037284 LR: 0.00002859 [10:35:24] Epoch: 1 Batch: 6389/20099 (31.79%) Loss: 2.206249 LR: 0.00002859 [10:35:26] Epoch: 1 Batch: 6390/20099 (31.79%) Loss: 2.296614 LR: 0.00002859 [10:35:27] Epoch: 1 Batch: 6391/20099 (31.80%) Loss: 2.156594 LR: 0.00002859 [10:35:29] Epoch: 1 Batch: 6392/20099 (31.80%) Loss: 1.779805 LR: 0.00002859 [10:35:31] Epoch: 1 Batch: 6393/20099 (31.81%) Loss: 2.363652 LR: 0.00002859 [10:35:33] Epoch: 1 Batch: 6394/20099 (31.81%) Loss: 2.080235 LR: 0.00002858 [10:35:35] Epoch: 1 Batch: 6395/20099 (31.82%) Loss: 2.042125 LR: 0.00002858 [10:35:36] Epoch: 1 
Batch: 6396/20099 (31.82%) Loss: 2.227843 LR: 0.00002858 [10:35:38] Epoch: 1 Batch: 6397/20099 (31.83%) Loss: 2.112696 LR: 0.00002858 [10:35:40] Epoch: 1 Batch: 6398/20099 (31.83%) Loss: 1.746638 LR: 0.00002858 [10:35:42] Epoch: 1 Batch: 6399/20099 (31.84%) Loss: 2.035920 LR: 0.00002858 [10:35:47] >> Cleaned up old temp checkpoint: epoch1_step4400 [10:35:47] >> Temp checkpoint saved: epoch1_step6400, size: 0.1693 GB [10:35:47] Epoch: 1 Batch: 6400/20099 (31.84%) Loss: 2.187665 LR: 0.00002858 [10:35:49] Epoch: 1 Batch: 6401/20099 (31.85%) Loss: 2.415897 LR: 0.00002857 [10:35:50] Epoch: 1 Batch: 6402/20099 (31.85%) Loss: 2.008123 LR: 0.00002857 [10:35:52] Epoch: 1 Batch: 6403/20099 (31.86%) Loss: 2.250744 LR: 0.00002857 [10:35:54] Epoch: 1 Batch: 6404/20099 (31.86%) Loss: 1.907071 LR: 0.00002857 [10:35:56] Epoch: 1 Batch: 6405/20099 (31.87%) Loss: 2.142601 LR: 0.00002857 [10:35:57] Epoch: 1 Batch: 6406/20099 (31.87%) Loss: 2.119431 LR: 0.00002857 [10:35:59] Epoch: 1 Batch: 6407/20099 (31.88%) Loss: 2.164829 LR: 0.00002857 [10:36:01] Epoch: 1 Batch: 6408/20099 (31.88%) Loss: 1.941119 LR: 0.00002857 [10:36:03] Epoch: 1 Batch: 6409/20099 (31.89%) Loss: 2.189453 LR: 0.00002857 [10:36:04] Epoch: 1 Batch: 6410/20099 (31.89%) Loss: 1.862437 LR: 0.00002857 [10:36:06] Epoch: 1 Batch: 6411/20099 (31.90%) Loss: 2.442967 LR: 0.00002857 [10:36:08] Epoch: 1 Batch: 6412/20099 (31.90%) Loss: 2.326790 LR: 0.00002857 [10:36:10] Epoch: 1 Batch: 6413/20099 (31.91%) Loss: 1.938196 LR: 0.00002857 [10:36:12] Epoch: 1 Batch: 6414/20099 (31.91%) Loss: 2.344465 LR: 0.00002857 [10:36:13] Epoch: 1 Batch: 6415/20099 (31.92%) Loss: 2.083340 LR: 0.00002856 [10:36:15] Epoch: 1 Batch: 6416/20099 (31.92%) Loss: 2.089156 LR: 0.00002856 [10:36:17] Epoch: 1 Batch: 6417/20099 (31.93%) Loss: 2.032650 LR: 0.00002856 [10:36:19] Epoch: 1 Batch: 6418/20099 (31.93%) Loss: 1.982558 LR: 0.00002856 [10:36:21] Epoch: 1 Batch: 6419/20099 (31.94%) Loss: 2.167651 LR: 0.00002856 [10:36:22] Epoch: 1 Batch: 6420/20099 
(31.94%) Loss: 2.191182 LR: 0.00002856 [10:36:24] Epoch: 1 Batch: 6421/20099 (31.95%) Loss: 2.059162 LR: 0.00002856 [10:36:26] Epoch: 1 Batch: 6422/20099 (31.95%) Loss: 2.069561 LR: 0.00002855 [10:36:28] Epoch: 1 Batch: 6423/20099 (31.96%) Loss: 1.951817 LR: 0.00002855 [10:36:29] Epoch: 1 Batch: 6424/20099 (31.96%) Loss: 2.182806 LR: 0.00002855 [10:36:31] Epoch: 1 Batch: 6425/20099 (31.97%) Loss: 2.182603 LR: 0.00002855 [10:36:33] Epoch: 1 Batch: 6426/20099 (31.97%) Loss: 2.202991 LR: 0.00002855 [10:36:35] Epoch: 1 Batch: 6427/20099 (31.98%) Loss: 2.128460 LR: 0.00002855 [10:36:37] Epoch: 1 Batch: 6428/20099 (31.98%) Loss: 2.210492 LR: 0.00002855 [10:36:38] Epoch: 1 Batch: 6429/20099 (31.99%) Loss: 2.253300 LR: 0.00002854 [10:36:40] Epoch: 1 Batch: 6430/20099 (31.99%) Loss: 2.055878 LR: 0.00002854 [10:36:42] Epoch: 1 Batch: 6431/20099 (32.00%) Loss: 2.252514 LR: 0.00002854 [10:36:44] Epoch: 1 Batch: 6432/20099 (32.00%) Loss: 2.128476 LR: 0.00002854 [10:36:45] Epoch: 1 Batch: 6433/20099 (32.01%) Loss: 2.277943 LR: 0.00002854 [10:36:47] Epoch: 1 Batch: 6434/20099 (32.01%) Loss: 2.148726 LR: 0.00002854 [10:36:49] Epoch: 1 Batch: 6435/20099 (32.02%) Loss: 2.372304 LR: 0.00002854 [10:36:51] Epoch: 1 Batch: 6436/20099 (32.02%) Loss: 2.173939 LR: 0.00002853 [10:36:53] Epoch: 1 Batch: 6437/20099 (32.03%) Loss: 2.139204 LR: 0.00002853 [10:36:55] Epoch: 1 Batch: 6438/20099 (32.03%) Loss: 2.104537 LR: 0.00002853 [10:36:56] Epoch: 1 Batch: 6439/20099 (32.04%) Loss: 1.990730 LR: 0.00002853 [10:36:58] Epoch: 1 Batch: 6440/20099 (32.04%) Loss: 1.683298 LR: 0.00002853 [10:37:00] Epoch: 1 Batch: 6441/20099 (32.05%) Loss: 1.856986 LR: 0.00002853 [10:37:02] Epoch: 1 Batch: 6442/20099 (32.05%) Loss: 1.720581 LR: 0.00002853 [10:37:03] Epoch: 1 Batch: 6443/20099 (32.06%) Loss: 2.269967 LR: 0.00002853 [10:37:05] Epoch: 1 Batch: 6444/20099 (32.06%) Loss: 2.088237 LR: 0.00002853 [10:37:07] Epoch: 1 Batch: 6445/20099 (32.07%) Loss: 2.325061 LR: 0.00002853 [10:37:09] Epoch: 1 Batch: 
6446/20099 (32.07%) Loss: 2.199887 LR: 0.00002853 [10:37:10] Epoch: 1 Batch: 6447/20099 (32.08%) Loss: 1.890098 LR: 0.00002853 [10:37:12] Epoch: 1 Batch: 6448/20099 (32.08%) Loss: 1.856905 LR: 0.00002853 [10:37:14] Epoch: 1 Batch: 6449/20099 (32.09%) Loss: 1.878806 LR: 0.00002853 [10:37:16] Epoch: 1 Batch: 6450/20099 (32.09%) Loss: 2.107252 LR: 0.00002852 [10:37:17] Epoch: 1 Batch: 6451/20099 (32.10%) Loss: 2.153322 LR: 0.00002852 [10:37:19] Epoch: 1 Batch: 6452/20099 (32.10%) Loss: 1.989208 LR: 0.00002852 [10:37:21] Epoch: 1 Batch: 6453/20099 (32.11%) Loss: 2.167279 LR: 0.00002852 [10:37:23] Epoch: 1 Batch: 6454/20099 (32.11%) Loss: 2.041933 LR: 0.00002852 [10:37:24] Epoch: 1 Batch: 6455/20099 (32.12%) Loss: 2.164305 LR: 0.00002852 [10:37:26] Epoch: 1 Batch: 6456/20099 (32.12%) Loss: 2.242924 LR: 0.00002852 [10:37:28] Epoch: 1 Batch: 6457/20099 (32.13%) Loss: 2.059285 LR: 0.00002851 [10:37:30] Epoch: 1 Batch: 6458/20099 (32.13%) Loss: 2.238825 LR: 0.00002851 [10:37:31] Epoch: 1 Batch: 6459/20099 (32.14%) Loss: 1.898645 LR: 0.00002851 [10:37:33] Epoch: 1 Batch: 6460/20099 (32.14%) Loss: 1.892266 LR: 0.00002851 [10:37:35] Epoch: 1 Batch: 6461/20099 (32.15%) Loss: 2.213252 LR: 0.00002851 [10:37:37] Epoch: 1 Batch: 6462/20099 (32.15%) Loss: 2.130011 LR: 0.00002851 [10:37:39] Epoch: 1 Batch: 6463/20099 (32.16%) Loss: 2.394291 LR: 0.00002851 [10:37:40] Epoch: 1 Batch: 6464/20099 (32.16%) Loss: 2.277079 LR: 0.00002850 [10:37:42] Epoch: 1 Batch: 6465/20099 (32.17%) Loss: 2.149283 LR: 0.00002850 [10:37:44] Epoch: 1 Batch: 6466/20099 (32.17%) Loss: 2.530308 LR: 0.00002850 [10:37:46] Epoch: 1 Batch: 6467/20099 (32.18%) Loss: 2.000461 LR: 0.00002850 [10:37:48] Epoch: 1 Batch: 6468/20099 (32.18%) Loss: 2.242094 LR: 0.00002850 [10:37:49] Epoch: 1 Batch: 6469/20099 (32.19%) Loss: 2.252054 LR: 0.00002850 [10:37:51] Epoch: 1 Batch: 6470/20099 (32.19%) Loss: 2.180910 LR: 0.00002850 [10:37:53] Epoch: 1 Batch: 6471/20099 (32.20%) Loss: 1.886440 LR: 0.00002849 [10:37:55] Epoch: 1 
Batch: 6472/20099 (32.20%) Loss: 1.936881 LR: 0.00002849 [10:37:56] Epoch: 1 Batch: 6473/20099 (32.21%) Loss: 2.202753 LR: 0.00002849 [10:37:58] Epoch: 1 Batch: 6474/20099 (32.21%) Loss: 2.116051 LR: 0.00002849 [10:38:00] Epoch: 1 Batch: 6475/20099 (32.22%) Loss: 2.321758 LR: 0.00002849 [10:38:02] Epoch: 1 Batch: 6476/20099 (32.22%) Loss: 2.130818 LR: 0.00002849 [10:38:04] Epoch: 1 Batch: 6477/20099 (32.23%) Loss: 2.130774 LR: 0.00002849 [10:38:05] Epoch: 1 Batch: 6478/20099 (32.23%) Loss: 2.111354 LR: 0.00002849 [10:38:07] Epoch: 1 Batch: 6479/20099 (32.24%) Loss: 1.938881 LR: 0.00002849 [10:38:09] Epoch: 1 Batch: 6480/20099 (32.24%) Loss: 2.136102 LR: 0.00002849 [10:38:11] Epoch: 1 Batch: 6481/20099 (32.25%) Loss: 2.278056 LR: 0.00002849 [10:38:12] Epoch: 1 Batch: 6482/20099 (32.25%) Loss: 2.088788 LR: 0.00002849 [10:38:14] Epoch: 1 Batch: 6483/20099 (32.26%) Loss: 1.988611 LR: 0.00002849 [10:38:16] Epoch: 1 Batch: 6484/20099 (32.26%) Loss: 2.331734 LR: 0.00002849 [10:38:18] Epoch: 1 Batch: 6485/20099 (32.27%) Loss: 1.954788 LR: 0.00002848 [10:38:19] Epoch: 1 Batch: 6486/20099 (32.27%) Loss: 2.239268 LR: 0.00002848 [10:38:21] Epoch: 1 Batch: 6487/20099 (32.28%) Loss: 2.367664 LR: 0.00002848 [10:38:23] Epoch: 1 Batch: 6488/20099 (32.28%) Loss: 2.037768 LR: 0.00002848 [10:38:25] Epoch: 1 Batch: 6489/20099 (32.29%) Loss: 2.313503 LR: 0.00002848 [10:38:27] Epoch: 1 Batch: 6490/20099 (32.29%) Loss: 2.131440 LR: 0.00002848 [10:38:28] Epoch: 1 Batch: 6491/20099 (32.30%) Loss: 2.190133 LR: 0.00002848 [10:38:30] Epoch: 1 Batch: 6492/20099 (32.30%) Loss: 2.212343 LR: 0.00002847 [10:38:32] Epoch: 1 Batch: 6493/20099 (32.31%) Loss: 2.175993 LR: 0.00002847 [10:38:34] Epoch: 1 Batch: 6494/20099 (32.31%) Loss: 1.995020 LR: 0.00002847 [10:38:35] Epoch: 1 Batch: 6495/20099 (32.32%) Loss: 2.087795 LR: 0.00002847 [10:38:37] Epoch: 1 Batch: 6496/20099 (32.32%) Loss: 2.122383 LR: 0.00002847 [10:38:39] Epoch: 1 Batch: 6497/20099 (32.32%) Loss: 2.111411 LR: 0.00002847 [10:38:41] Epoch: 
1 Batch: 6498/20099 (32.33%) Loss: 2.365207 LR: 0.00002847 [10:38:43] Epoch: 1 Batch: 6499/20099 (32.33%) Loss: 1.932493 LR: 0.00002846 [10:38:44] >> Evaluating batch 0 [10:38:45] >> Evaluating batch 1 [10:38:46] >> Evaluating batch 2 [10:38:47] >> Evaluating batch 3 [10:38:48] >> Evaluating batch 4 [10:38:49] >> Evaluating batch 5 [10:38:51] >> Evaluating batch 6 [10:38:51] >> Evaluating batch 7 [10:38:53] >> Evaluating batch 8 [10:38:54] >> Evaluating batch 9 [10:38:54] >> Evaluating batch 10 [10:38:55] >> Evaluating batch 11 [10:38:56] >> Evaluating batch 12 [10:38:57] >> Evaluating batch 13 [10:38:58] >> Evaluating batch 14 [10:38:59] >> Evaluating batch 15 [10:39:00] >> Evaluating batch 16 [10:39:01] Epoch: 1 Step: 6500/20099 Evaluation: [10:39:01] Avg Loss Since Last Eval: 2.1318 Val Loss: 2.1890 Validation loss delta: -0.0093 Perplexity: 8.9265 LR: 0.00002846 [10:39:04] >> Checkpoint saved: epoch1_step6500, size: 0.1693 GB [10:39:04] Epoch: 1 Batch: 6500/20099 (32.34%) Loss: 1.906406 LR: 0.00002846 [10:39:06] Epoch: 1 Batch: 6501/20099 (32.34%) Loss: 2.115054 LR: 0.00002846 [10:39:08] Epoch: 1 Batch: 6502/20099 (32.35%) Loss: 2.105612 LR: 0.00002846 [10:39:10] Epoch: 1 Batch: 6503/20099 (32.35%) Loss: 2.404300 LR: 0.00002846 [10:39:11] Epoch: 1 Batch: 6504/20099 (32.36%) Loss: 1.993336 LR: 0.00002846 [10:39:13] Epoch: 1 Batch: 6505/20099 (32.36%) Loss: 2.086851 LR: 0.00002846 [10:39:15] Epoch: 1 Batch: 6506/20099 (32.37%) Loss: 2.032112 LR: 0.00002846 [10:39:17] Epoch: 1 Batch: 6507/20099 (32.37%) Loss: 2.123315 LR: 0.00002846 [10:39:18] Epoch: 1 Batch: 6508/20099 (32.38%) Loss: 2.132798 LR: 0.00002846 [10:39:20] Epoch: 1 Batch: 6509/20099 (32.38%) Loss: 2.028927 LR: 0.00002846 [10:39:22] Epoch: 1 Batch: 6510/20099 (32.39%) Loss: 2.413198 LR: 0.00002846 [10:39:24] Epoch: 1 Batch: 6511/20099 (32.39%) Loss: 1.752986 LR: 0.00002846 [10:39:26] Epoch: 1 Batch: 6512/20099 (32.40%) Loss: 1.807659 LR: 0.00002846 [10:39:27] Epoch: 1 Batch: 6513/20099 (32.40%) 
Loss: 2.421151 LR: 0.00002845 [10:39:29] Epoch: 1 Batch: 6514/20099 (32.41%) Loss: 2.422933 LR: 0.00002845 [10:39:31] Epoch: 1 Batch: 6515/20099 (32.41%) Loss: 1.952537 LR: 0.00002845 [10:39:33] Epoch: 1 Batch: 6516/20099 (32.42%) Loss: 2.209727 LR: 0.00002845 [10:39:35] Epoch: 1 Batch: 6517/20099 (32.42%) Loss: 2.500434 LR: 0.00002845 [10:39:36] Epoch: 1 Batch: 6518/20099 (32.43%) Loss: 1.948252 LR: 0.00002845 [10:39:38] Epoch: 1 Batch: 6519/20099 (32.43%) Loss: 2.600738 LR: 0.00002845 [10:39:40] Epoch: 1 Batch: 6520/20099 (32.44%) Loss: 2.185647 LR: 0.00002844 [10:39:42] Epoch: 1 Batch: 6521/20099 (32.44%) Loss: 2.219620 LR: 0.00002844 [10:39:44] Epoch: 1 Batch: 6522/20099 (32.45%) Loss: 2.090808 LR: 0.00002844 [10:39:45] Epoch: 1 Batch: 6523/20099 (32.45%) Loss: 1.949762 LR: 0.00002844 [10:39:47] Epoch: 1 Batch: 6524/20099 (32.46%) Loss: 1.940869 LR: 0.00002844 [10:39:49] Epoch: 1 Batch: 6525/20099 (32.46%) Loss: 2.095965 LR: 0.00002844 [10:39:51] Epoch: 1 Batch: 6526/20099 (32.47%) Loss: 2.292775 LR: 0.00002844 [10:39:52] Epoch: 1 Batch: 6527/20099 (32.47%) Loss: 1.980376 LR: 0.00002843 [10:39:54] Epoch: 1 Batch: 6528/20099 (32.48%) Loss: 2.125344 LR: 0.00002843 [10:39:56] Epoch: 1 Batch: 6529/20099 (32.48%) Loss: 2.118387 LR: 0.00002843 [10:39:58] Epoch: 1 Batch: 6530/20099 (32.49%) Loss: 1.950618 LR: 0.00002843 [10:39:59] Epoch: 1 Batch: 6531/20099 (32.49%) Loss: 2.460605 LR: 0.00002843 [10:40:01] Epoch: 1 Batch: 6532/20099 (32.50%) Loss: 2.112350 LR: 0.00002843 [10:40:03] Epoch: 1 Batch: 6533/20099 (32.50%) Loss: 2.153263 LR: 0.00002843 [10:40:05] Epoch: 1 Batch: 6534/20099 (32.51%) Loss: 2.217041 LR: 0.00002842 [10:40:07] Epoch: 1 Batch: 6535/20099 (32.51%) Loss: 2.092172 LR: 0.00002842 [10:40:08] Epoch: 1 Batch: 6536/20099 (32.52%) Loss: 2.099350 LR: 0.00002842 [10:40:10] Epoch: 1 Batch: 6537/20099 (32.52%) Loss: 2.239337 LR: 0.00002842 [10:40:12] Epoch: 1 Batch: 6538/20099 (32.53%) Loss: 2.060982 LR: 0.00002842 [10:40:14] Epoch: 1 Batch: 6539/20099 
(32.53%) Loss: 2.290468 LR: 0.00002842 [10:40:15] Epoch: 1 Batch: 6540/20099 (32.54%) Loss: 1.859637 LR: 0.00002842 [10:40:17] Epoch: 1 Batch: 6541/20099 (32.54%) Loss: 2.027823 LR: 0.00002842 [10:40:19] Epoch: 1 Batch: 6542/20099 (32.55%) Loss: 2.298292 LR: 0.00002842 [10:40:21] Epoch: 1 Batch: 6543/20099 (32.55%) Loss: 2.235002 LR: 0.00002842 [10:40:22] Epoch: 1 Batch: 6544/20099 (32.56%) Loss: 1.882437 LR: 0.00002842 [10:40:24] Epoch: 1 Batch: 6545/20099 (32.56%) Loss: 2.103991 LR: 0.00002842 [10:40:26] Epoch: 1 Batch: 6546/20099 (32.57%) Loss: 2.411402 LR: 0.00002842 [10:40:28] Epoch: 1 Batch: 6547/20099 (32.57%) Loss: 2.240455 LR: 0.00002842 [10:40:29] Epoch: 1 Batch: 6548/20099 (32.58%) Loss: 2.197745 LR: 0.00002841 [10:40:31] Epoch: 1 Batch: 6549/20099 (32.58%) Loss: 2.008924 LR: 0.00002841 [10:40:33] Epoch: 1 Batch: 6550/20099 (32.59%) Loss: 2.037074 LR: 0.00002841 [10:40:35] Epoch: 1 Batch: 6551/20099 (32.59%) Loss: 2.100135 LR: 0.00002841 [10:40:37] Epoch: 1 Batch: 6552/20099 (32.60%) Loss: 2.291214 LR: 0.00002841 [10:40:38] Epoch: 1 Batch: 6553/20099 (32.60%) Loss: 2.208898 LR: 0.00002841 [10:40:40] Epoch: 1 Batch: 6554/20099 (32.61%) Loss: 2.073964 LR: 0.00002841 [10:40:42] Epoch: 1 Batch: 6555/20099 (32.61%) Loss: 2.003874 LR: 0.00002840 [10:40:44] Epoch: 1 Batch: 6556/20099 (32.62%) Loss: 1.905044 LR: 0.00002840 [10:40:45] Epoch: 1 Batch: 6557/20099 (32.62%) Loss: 2.297339 LR: 0.00002840 [10:40:47] Epoch: 1 Batch: 6558/20099 (32.63%) Loss: 1.921412 LR: 0.00002840 [10:40:49] Epoch: 1 Batch: 6559/20099 (32.63%) Loss: 2.214042 LR: 0.00002840 [10:40:51] Epoch: 1 Batch: 6560/20099 (32.64%) Loss: 2.128930 LR: 0.00002840 [10:40:52] Epoch: 1 Batch: 6561/20099 (32.64%) Loss: 2.101876 LR: 0.00002840 [10:40:54] Epoch: 1 Batch: 6562/20099 (32.65%) Loss: 2.177011 LR: 0.00002839 [10:40:56] Epoch: 1 Batch: 6563/20099 (32.65%) Loss: 2.139613 LR: 0.00002839 [10:40:58] Epoch: 1 Batch: 6564/20099 (32.66%) Loss: 1.944555 LR: 0.00002839 [10:40:59] Epoch: 1 Batch: 
6565/20099 (32.66%) Loss: 2.068617 LR: 0.00002839 [10:41:01] Epoch: 1 Batch: 6566/20099 (32.67%) Loss: 2.116636 LR: 0.00002839 [10:41:03] Epoch: 1 Batch: 6567/20099 (32.67%) Loss: 1.933752 LR: 0.00002839 [10:41:05] Epoch: 1 Batch: 6568/20099 (32.68%) Loss: 2.152174 LR: 0.00002839 [10:41:07] Epoch: 1 Batch: 6569/20099 (32.68%) Loss: 2.178444 LR: 0.00002838 [10:41:08] Epoch: 1 Batch: 6570/20099 (32.69%) Loss: 1.901858 LR: 0.00002838 [10:41:10] Epoch: 1 Batch: 6571/20099 (32.69%) Loss: 1.981895 LR: 0.00002838 [10:41:12] Epoch: 1 Batch: 6572/20099 (32.70%) Loss: 2.245258 LR: 0.00002838 [10:41:14] Epoch: 1 Batch: 6573/20099 (32.70%) Loss: 2.014955 LR: 0.00002838 [10:41:15] Epoch: 1 Batch: 6574/20099 (32.71%) Loss: 2.200868 LR: 0.00002838 [10:41:17] Epoch: 1 Batch: 6575/20099 (32.71%) Loss: 2.078364 LR: 0.00002838 [10:41:19] Epoch: 1 Batch: 6576/20099 (32.72%) Loss: 1.947287 LR: 0.00002837 [10:41:21] Epoch: 1 Batch: 6577/20099 (32.72%) Loss: 2.259489 LR: 0.00002837 [10:41:23] Epoch: 1 Batch: 6578/20099 (32.73%) Loss: 2.029839 LR: 0.00002837 [10:41:24] Epoch: 1 Batch: 6579/20099 (32.73%) Loss: 2.028806 LR: 0.00002837 [10:41:26] Epoch: 1 Batch: 6580/20099 (32.74%) Loss: 2.167898 LR: 0.00002837 [10:41:28] Epoch: 1 Batch: 6581/20099 (32.74%) Loss: 1.935447 LR: 0.00002837 [10:41:30] Epoch: 1 Batch: 6582/20099 (32.75%) Loss: 2.283997 LR: 0.00002837 [10:41:31] Epoch: 1 Batch: 6583/20099 (32.75%) Loss: 2.034259 LR: 0.00002837 [10:41:33] Epoch: 1 Batch: 6584/20099 (32.76%) Loss: 2.320268 LR: 0.00002837 [10:41:35] Epoch: 1 Batch: 6585/20099 (32.76%) Loss: 2.041673 LR: 0.00002837 [10:41:37] Epoch: 1 Batch: 6586/20099 (32.77%) Loss: 2.107000 LR: 0.00002837 [10:41:39] Epoch: 1 Batch: 6587/20099 (32.77%) Loss: 1.924889 LR: 0.00002837 [10:41:40] Epoch: 1 Batch: 6588/20099 (32.78%) Loss: 2.210462 LR: 0.00002837 [10:41:42] Epoch: 1 Batch: 6589/20099 (32.78%) Loss: 2.043254 LR: 0.00002837 [10:41:44] Epoch: 1 Batch: 6590/20099 (32.79%) Loss: 2.043365 LR: 0.00002836 [10:41:46] Epoch: 1 
Batch: 6591/20099 (32.79%) Loss: 2.307430 LR: 0.00002836 [10:41:47] Epoch: 1 Batch: 6592/20099 (32.80%) Loss: 2.112752 LR: 0.00002836 [10:41:49] Epoch: 1 Batch: 6593/20099 (32.80%) Loss: 2.149111 LR: 0.00002836 [10:41:51] Epoch: 1 Batch: 6594/20099 (32.81%) Loss: 2.112580 LR: 0.00002836 [10:41:53] Epoch: 1 Batch: 6595/20099 (32.81%) Loss: 2.191059 LR: 0.00002836 [10:41:55] Epoch: 1 Batch: 6596/20099 (32.82%) Loss: 1.983547 LR: 0.00002836 [10:41:56] Epoch: 1 Batch: 6597/20099 (32.82%) Loss: 2.120425 LR: 0.00002835 [10:41:58] Epoch: 1 Batch: 6598/20099 (32.83%) Loss: 2.133031 LR: 0.00002835 [10:42:00] Epoch: 1 Batch: 6599/20099 (32.83%) Loss: 2.486008 LR: 0.00002835 [10:42:05] >> Cleaned up old temp checkpoint: epoch1_step4600 [10:42:05] >> Temp checkpoint saved: epoch1_step6600, size: 0.1693 GB [10:42:05] Epoch: 1 Batch: 6600/20099 (32.84%) Loss: 2.077664 LR: 0.00002835 [10:42:07] Epoch: 1 Batch: 6601/20099 (32.84%) Loss: 2.100567 LR: 0.00002835 [10:42:09] Epoch: 1 Batch: 6602/20099 (32.85%) Loss: 2.053918 LR: 0.00002835 [10:42:11] Epoch: 1 Batch: 6603/20099 (32.85%) Loss: 2.239495 LR: 0.00002835 [10:42:12] Epoch: 1 Batch: 6604/20099 (32.86%) Loss: 1.784944 LR: 0.00002834 [10:42:14] Epoch: 1 Batch: 6605/20099 (32.86%) Loss: 2.002553 LR: 0.00002834 [10:42:16] Epoch: 1 Batch: 6606/20099 (32.87%) Loss: 1.921628 LR: 0.00002834 [10:42:18] Epoch: 1 Batch: 6607/20099 (32.87%) Loss: 2.328188 LR: 0.00002834 [10:42:19] Epoch: 1 Batch: 6608/20099 (32.88%) Loss: 2.131901 LR: 0.00002834 [10:42:21] Epoch: 1 Batch: 6609/20099 (32.88%) Loss: 2.255655 LR: 0.00002834 [10:42:23] Epoch: 1 Batch: 6610/20099 (32.89%) Loss: 2.048466 LR: 0.00002834 [10:42:25] Epoch: 1 Batch: 6611/20099 (32.89%) Loss: 2.332816 LR: 0.00002833 [10:42:27] Epoch: 1 Batch: 6612/20099 (32.90%) Loss: 2.326630 LR: 0.00002833 [10:42:28] Epoch: 1 Batch: 6613/20099 (32.90%) Loss: 1.857444 LR: 0.00002833 [10:42:30] Epoch: 1 Batch: 6614/20099 (32.91%) Loss: 2.402572 LR: 0.00002833 [10:42:32] Epoch: 1 Batch: 6615/20099 
(32.91%) Loss: 2.235574 LR: 0.00002833 [10:42:34] Epoch: 1 Batch: 6616/20099 (32.92%) Loss: 2.412613 LR: 0.00002833 [10:42:35] Epoch: 1 Batch: 6617/20099 (32.92%) Loss: 2.018722 LR: 0.00002833 [10:42:37] Epoch: 1 Batch: 6618/20099 (32.93%) Loss: 2.160610 LR: 0.00002833 [10:42:39] Epoch: 1 Batch: 6619/20099 (32.93%) Loss: 2.160319 LR: 0.00002833 [10:42:41] Epoch: 1 Batch: 6620/20099 (32.94%) Loss: 1.734548 LR: 0.00002833 [10:42:43] Epoch: 1 Batch: 6621/20099 (32.94%) Loss: 2.235257 LR: 0.00002833 [10:42:44] Epoch: 1 Batch: 6622/20099 (32.95%) Loss: 1.993770 LR: 0.00002833 [10:42:46] Epoch: 1 Batch: 6623/20099 (32.95%) Loss: 1.936388 LR: 0.00002833 [10:42:48] Epoch: 1 Batch: 6624/20099 (32.96%) Loss: 1.903807 LR: 0.00002833 [10:42:50] Epoch: 1 Batch: 6625/20099 (32.96%) Loss: 2.203893 LR: 0.00002832 [10:42:52] Epoch: 1 Batch: 6626/20099 (32.97%) Loss: 2.055523 LR: 0.00002832 [10:42:53] Epoch: 1 Batch: 6627/20099 (32.97%) Loss: 1.940781 LR: 0.00002832 [10:42:55] Epoch: 1 Batch: 6628/20099 (32.98%) Loss: 2.008262 LR: 0.00002832 [10:42:57] Epoch: 1 Batch: 6629/20099 (32.98%) Loss: 2.238813 LR: 0.00002832 [10:42:59] Epoch: 1 Batch: 6630/20099 (32.99%) Loss: 2.199565 LR: 0.00002832 [10:43:00] Epoch: 1 Batch: 6631/20099 (32.99%) Loss: 2.255496 LR: 0.00002832 [10:43:02] Epoch: 1 Batch: 6632/20099 (33.00%) Loss: 2.135784 LR: 0.00002831 [10:43:04] Epoch: 1 Batch: 6633/20099 (33.00%) Loss: 2.007067 LR: 0.00002831 [10:43:06] Epoch: 1 Batch: 6634/20099 (33.01%) Loss: 1.923503 LR: 0.00002831 [10:43:07] Epoch: 1 Batch: 6635/20099 (33.01%) Loss: 1.967523 LR: 0.00002831 [10:43:09] Epoch: 1 Batch: 6636/20099 (33.02%) Loss: 1.924851 LR: 0.00002831 [10:43:11] Epoch: 1 Batch: 6637/20099 (33.02%) Loss: 2.058989 LR: 0.00002831 [10:43:13] Epoch: 1 Batch: 6638/20099 (33.03%) Loss: 2.161019 LR: 0.00002831 [10:43:14] Epoch: 1 Batch: 6639/20099 (33.03%) Loss: 2.068487 LR: 0.00002830 [10:43:16] Epoch: 1 Batch: 6640/20099 (33.04%) Loss: 1.973351 LR: 0.00002830 [10:43:18] Epoch: 1 Batch: 
6641/20099 (33.04%) Loss: 2.189079 LR: 0.00002830 [10:43:20] Epoch: 1 Batch: 6642/20099 (33.05%) Loss: 2.149225 LR: 0.00002830 [10:43:22] Epoch: 1 Batch: 6643/20099 (33.05%) Loss: 2.203907 LR: 0.00002830 [10:43:23] Epoch: 1 Batch: 6644/20099 (33.06%) Loss: 2.114188 LR: 0.00002830 [10:43:25] Epoch: 1 Batch: 6645/20099 (33.06%) Loss: 2.052547 LR: 0.00002830 [10:43:27] Epoch: 1 Batch: 6646/20099 (33.07%) Loss: 2.102687 LR: 0.00002829 [10:43:29] Epoch: 1 Batch: 6647/20099 (33.07%) Loss: 1.964479 LR: 0.00002829 [10:43:30] Epoch: 1 Batch: 6648/20099 (33.08%) Loss: 2.102710 LR: 0.00002829 [10:43:32] Epoch: 1 Batch: 6649/20099 (33.08%) Loss: 1.992023 LR: 0.00002829 [10:43:34] Epoch: 1 Batch: 6650/20099 (33.09%) Loss: 2.317252 LR: 0.00002829 [10:43:36] Epoch: 1 Batch: 6651/20099 (33.09%) Loss: 2.404772 LR: 0.00002829 [10:43:37] Epoch: 1 Batch: 6652/20099 (33.10%) Loss: 2.115197 LR: 0.00002829 [10:43:39] Epoch: 1 Batch: 6653/20099 (33.10%) Loss: 1.821781 LR: 0.00002828 [10:43:41] Epoch: 1 Batch: 6654/20099 (33.11%) Loss: 2.248160 LR: 0.00002828 [10:43:43] Epoch: 1 Batch: 6655/20099 (33.11%) Loss: 1.996402 LR: 0.00002828 [10:43:44] Epoch: 1 Batch: 6656/20099 (33.12%) Loss: 2.160591 LR: 0.00002828 [10:43:46] Epoch: 1 Batch: 6657/20099 (33.12%) Loss: 2.193472 LR: 0.00002828 [10:43:48] Epoch: 1 Batch: 6658/20099 (33.13%) Loss: 2.338682 LR: 0.00002828 [10:43:50] Epoch: 1 Batch: 6659/20099 (33.13%) Loss: 1.997308 LR: 0.00002828 [10:43:51] Epoch: 1 Batch: 6660/20099 (33.14%) Loss: 2.039129 LR: 0.00002828 [10:43:53] Epoch: 1 Batch: 6661/20099 (33.14%) Loss: 1.947159 LR: 0.00002828 [10:43:55] Epoch: 1 Batch: 6662/20099 (33.15%) Loss: 2.079448 LR: 0.00002828 [10:43:57] Epoch: 1 Batch: 6663/20099 (33.15%) Loss: 2.110144 LR: 0.00002828 [10:43:59] Epoch: 1 Batch: 6664/20099 (33.16%) Loss: 1.904817 LR: 0.00002828 [10:44:00] Epoch: 1 Batch: 6665/20099 (33.16%) Loss: 2.131608 LR: 0.00002828 [10:44:02] Epoch: 1 Batch: 6666/20099 (33.17%) Loss: 2.256192 LR: 0.00002828 [10:44:04] Epoch: 1 
Batch: 6667/20099 (33.17%) Loss: 2.011547 LR: 0.00002827 [10:44:06] Epoch: 1 Batch: 6668/20099 (33.18%) Loss: 2.144122 LR: 0.00002827 [10:44:07] Epoch: 1 Batch: 6669/20099 (33.18%) Loss: 2.060750 LR: 0.00002827 [10:44:09] Epoch: 1 Batch: 6670/20099 (33.19%) Loss: 2.306222 LR: 0.00002827 [10:44:11] Epoch: 1 Batch: 6671/20099 (33.19%) Loss: 1.846326 LR: 0.00002827 [10:44:13] Epoch: 1 Batch: 6672/20099 (33.20%) Loss: 2.433973 LR: 0.00002827 [10:44:14] Epoch: 1 Batch: 6673/20099 (33.20%) Loss: 2.035185 LR: 0.00002827 [10:44:16] Epoch: 1 Batch: 6674/20099 (33.21%) Loss: 2.191068 LR: 0.00002826 [10:44:18] Epoch: 1 Batch: 6675/20099 (33.21%) Loss: 2.229950 LR: 0.00002826 [10:44:20] Epoch: 1 Batch: 6676/20099 (33.22%) Loss: 2.320967 LR: 0.00002826 [10:44:22] Epoch: 1 Batch: 6677/20099 (33.22%) Loss: 2.071143 LR: 0.00002826 [10:44:23] Epoch: 1 Batch: 6678/20099 (33.23%) Loss: 2.149494 LR: 0.00002826 [10:44:25] Epoch: 1 Batch: 6679/20099 (33.23%) Loss: 1.826397 LR: 0.00002826 [10:44:27] Epoch: 1 Batch: 6680/20099 (33.24%) Loss: 2.332940 LR: 0.00002826 [10:44:29] Epoch: 1 Batch: 6681/20099 (33.24%) Loss: 1.933102 LR: 0.00002825 [10:44:30] Epoch: 1 Batch: 6682/20099 (33.25%) Loss: 2.096231 LR: 0.00002825 [10:44:32] Epoch: 1 Batch: 6683/20099 (33.25%) Loss: 2.218872 LR: 0.00002825 [10:44:34] Epoch: 1 Batch: 6684/20099 (33.26%) Loss: 2.036228 LR: 0.00002825 [10:44:36] Epoch: 1 Batch: 6685/20099 (33.26%) Loss: 2.056673 LR: 0.00002825 [10:44:38] Epoch: 1 Batch: 6686/20099 (33.27%) Loss: 1.921984 LR: 0.00002825 [10:44:39] Epoch: 1 Batch: 6687/20099 (33.27%) Loss: 2.154208 LR: 0.00002825 [10:44:41] Epoch: 1 Batch: 6688/20099 (33.28%) Loss: 2.052454 LR: 0.00002824 [10:44:43] Epoch: 1 Batch: 6689/20099 (33.28%) Loss: 2.151461 LR: 0.00002824 [10:44:45] Epoch: 1 Batch: 6690/20099 (33.29%) Loss: 1.881965 LR: 0.00002824 [10:44:46] Epoch: 1 Batch: 6691/20099 (33.29%) Loss: 2.102429 LR: 0.00002824 [10:44:48] Epoch: 1 Batch: 6692/20099 (33.30%) Loss: 2.124313 LR: 0.00002824 [10:44:50] Epoch: 
1 Batch: 6693/20099 (33.30%) Loss: 2.127237 LR: 0.00002824 [10:44:52] Epoch: 1 Batch: 6694/20099 (33.31%) Loss: 2.158774 LR: 0.00002824 [10:44:53] Epoch: 1 Batch: 6695/20099 (33.31%) Loss: 2.227474 LR: 0.00002823 [10:44:55] Epoch: 1 Batch: 6696/20099 (33.32%) Loss: 2.249902 LR: 0.00002823 [10:44:57] Epoch: 1 Batch: 6697/20099 (33.32%) Loss: 2.158015 LR: 0.00002823 [10:44:59] Epoch: 1 Batch: 6698/20099 (33.33%) Loss: 2.000971 LR: 0.00002823 [10:45:00] Epoch: 1 Batch: 6699/20099 (33.33%) Loss: 2.155462 LR: 0.00002823 [10:45:02] Epoch: 1 Batch: 6700/20099 (33.33%) Loss: 2.166986 LR: 0.00002823 [10:45:04] Epoch: 1 Batch: 6701/20099 (33.34%) Loss: 2.025448 LR: 0.00002823 [10:45:06] Epoch: 1 Batch: 6702/20099 (33.34%) Loss: 1.943710 LR: 0.00002822 [10:45:08] Epoch: 1 Batch: 6703/20099 (33.35%) Loss: 1.955895 LR: 0.00002822 [10:45:09] Epoch: 1 Batch: 6704/20099 (33.35%) Loss: 2.171408 LR: 0.00002822 [10:45:11] Epoch: 1 Batch: 6705/20099 (33.36%) Loss: 2.305253 LR: 0.00002822 [10:45:13] Epoch: 1 Batch: 6706/20099 (33.36%) Loss: 2.059647 LR: 0.00002822 [10:45:15] Epoch: 1 Batch: 6707/20099 (33.37%) Loss: 2.167451 LR: 0.00002822 [10:45:16] Epoch: 1 Batch: 6708/20099 (33.37%) Loss: 2.019367 LR: 0.00002822 [10:45:18] Epoch: 1 Batch: 6709/20099 (33.38%) Loss: 2.093004 LR: 0.00002822 [10:45:20] Epoch: 1 Batch: 6710/20099 (33.38%) Loss: 2.353024 LR: 0.00002822 [10:45:22] Epoch: 1 Batch: 6711/20099 (33.39%) Loss: 1.773209 LR: 0.00002822 [10:45:23] Epoch: 1 Batch: 6712/20099 (33.39%) Loss: 2.224622 LR: 0.00002822 [10:45:25] Epoch: 1 Batch: 6713/20099 (33.40%) Loss: 2.114805 LR: 0.00002822 [10:45:27] Epoch: 1 Batch: 6714/20099 (33.40%) Loss: 1.997008 LR: 0.00002822 [10:45:29] Epoch: 1 Batch: 6715/20099 (33.41%) Loss: 2.039809 LR: 0.00002822 [10:45:31] Epoch: 1 Batch: 6716/20099 (33.41%) Loss: 2.115780 LR: 0.00002821 [10:45:32] Epoch: 1 Batch: 6717/20099 (33.42%) Loss: 2.182086 LR: 0.00002821 [10:45:34] Epoch: 1 Batch: 6718/20099 (33.42%) Loss: 1.905338 LR: 0.00002821 [10:45:36] 
Epoch: 1 Batch: 6719/20099 (33.43%) Loss: 2.180464 LR: 0.00002821 [10:45:38] Epoch: 1 Batch: 6720/20099 (33.43%) Loss: 2.033190 LR: 0.00002821 [10:45:39] Epoch: 1 Batch: 6721/20099 (33.44%) Loss: 2.240808 LR: 0.00002821 [10:45:41] Epoch: 1 Batch: 6722/20099 (33.44%) Loss: 2.051046 LR: 0.00002821 [10:45:43] Epoch: 1 Batch: 6723/20099 (33.45%) Loss: 2.138796 LR: 0.00002820 [10:45:45] Epoch: 1 Batch: 6724/20099 (33.45%) Loss: 2.237592 LR: 0.00002820 [10:45:46] Epoch: 1 Batch: 6725/20099 (33.46%) Loss: 2.235295 LR: 0.00002820 [10:45:48] Epoch: 1 Batch: 6726/20099 (33.46%) Loss: 2.074175 LR: 0.00002820 [10:45:50] Epoch: 1 Batch: 6727/20099 (33.47%) Loss: 2.199253 LR: 0.00002820 [10:45:52] Epoch: 1 Batch: 6728/20099 (33.47%) Loss: 2.063665 LR: 0.00002820 [10:45:53] Epoch: 1 Batch: 6729/20099 (33.48%) Loss: 2.025707 LR: 0.00002820 [10:45:55] Epoch: 1 Batch: 6730/20099 (33.48%) Loss: 2.133513 LR: 0.00002819 [10:45:57] Epoch: 1 Batch: 6731/20099 (33.49%) Loss: 1.898714 LR: 0.00002819 [10:45:59] Epoch: 1 Batch: 6732/20099 (33.49%) Loss: 1.956254 LR: 0.00002819 [10:46:00] Epoch: 1 Batch: 6733/20099 (33.50%) Loss: 1.740474 LR: 0.00002819 [10:46:02] Epoch: 1 Batch: 6734/20099 (33.50%) Loss: 2.131116 LR: 0.00002819 [10:46:04] Epoch: 1 Batch: 6735/20099 (33.51%) Loss: 2.084637 LR: 0.00002819 [10:46:06] Epoch: 1 Batch: 6736/20099 (33.51%) Loss: 2.142046 LR: 0.00002819 [10:46:08] Epoch: 1 Batch: 6737/20099 (33.52%) Loss: 2.080365 LR: 0.00002818 [10:46:09] Epoch: 1 Batch: 6738/20099 (33.52%) Loss: 2.154455 LR: 0.00002818 [10:46:11] Epoch: 1 Batch: 6739/20099 (33.53%) Loss: 2.193942 LR: 0.00002818 [10:46:13] Epoch: 1 Batch: 6740/20099 (33.53%) Loss: 2.127239 LR: 0.00002818 [10:46:15] Epoch: 1 Batch: 6741/20099 (33.54%) Loss: 1.630982 LR: 0.00002818 [10:46:16] Epoch: 1 Batch: 6742/20099 (33.54%) Loss: 1.819384 LR: 0.00002818 [10:46:18] Epoch: 1 Batch: 6743/20099 (33.55%) Loss: 2.069460 LR: 0.00002818 [10:46:20] Epoch: 1 Batch: 6744/20099 (33.55%) Loss: 2.100762 LR: 0.00002817 
[10:46:22] Epoch: 1 Batch: 6745/20099 (33.56%) Loss: 2.050283 LR: 0.00002817 [10:46:23] Epoch: 1 Batch: 6746/20099 (33.56%) Loss: 1.934958 LR: 0.00002817 [10:46:25] Epoch: 1 Batch: 6747/20099 (33.57%) Loss: 2.189257 LR: 0.00002817 [10:46:27] Epoch: 1 Batch: 6748/20099 (33.57%) Loss: 1.998366 LR: 0.00002817 [10:46:29] Epoch: 1 Batch: 6749/20099 (33.58%) Loss: 2.401262 LR: 0.00002817 [10:46:30] Epoch: 1 Batch: 6750/20099 (33.58%) Loss: 1.994730 LR: 0.00002817 [10:46:32] Epoch: 1 Batch: 6751/20099 (33.59%) Loss: 2.028171 LR: 0.00002816 [10:46:34] Epoch: 1 Batch: 6752/20099 (33.59%) Loss: 2.305369 LR: 0.00002816 [10:46:36] Epoch: 1 Batch: 6753/20099 (33.60%) Loss: 2.020905 LR: 0.00002816 [10:46:38] Epoch: 1 Batch: 6754/20099 (33.60%) Loss: 2.215989 LR: 0.00002816 [10:46:39] Epoch: 1 Batch: 6755/20099 (33.61%) Loss: 2.009247 LR: 0.00002816 [10:46:41] Epoch: 1 Batch: 6756/20099 (33.61%) Loss: 2.024789 LR: 0.00002816 [10:46:43] Epoch: 1 Batch: 6757/20099 (33.62%) Loss: 2.032604 LR: 0.00002816 [10:46:45] Epoch: 1 Batch: 6758/20099 (33.62%) Loss: 2.034793 LR: 0.00002816 [10:46:46] Epoch: 1 Batch: 6759/20099 (33.63%) Loss: 2.084465 LR: 0.00002816 [10:46:48] Epoch: 1 Batch: 6760/20099 (33.63%) Loss: 2.211863 LR: 0.00002816 [10:46:50] Epoch: 1 Batch: 6761/20099 (33.64%) Loss: 1.915698 LR: 0.00002816 [10:46:52] Epoch: 1 Batch: 6762/20099 (33.64%) Loss: 2.201440 LR: 0.00002816 [10:46:53] Epoch: 1 Batch: 6763/20099 (33.65%) Loss: 2.025680 LR: 0.00002816 [10:46:55] Epoch: 1 Batch: 6764/20099 (33.65%) Loss: 2.166673 LR: 0.00002816 [10:46:57] Epoch: 1 Batch: 6765/20099 (33.66%) Loss: 2.119070 LR: 0.00002815 [10:46:59] Epoch: 1 Batch: 6766/20099 (33.66%) Loss: 2.078631 LR: 0.00002815 [10:47:01] Epoch: 1 Batch: 6767/20099 (33.67%) Loss: 1.665108 LR: 0.00002815 [10:47:02] Epoch: 1 Batch: 6768/20099 (33.67%) Loss: 2.020810 LR: 0.00002815 [10:47:04] Epoch: 1 Batch: 6769/20099 (33.68%) Loss: 2.119768 LR: 0.00002815 [10:47:06] Epoch: 1 Batch: 6770/20099 (33.68%) Loss: 2.115186 LR: 
0.00002815 [10:47:08] Epoch: 1 Batch: 6771/20099 (33.69%) Loss: 2.082371 LR: 0.00002815 [10:47:09] Epoch: 1 Batch: 6772/20099 (33.69%) Loss: 2.187905 LR: 0.00002814 [10:47:11] Epoch: 1 Batch: 6773/20099 (33.70%) Loss: 2.165078 LR: 0.00002814 [10:47:13] Epoch: 1 Batch: 6774/20099 (33.70%) Loss: 2.191298 LR: 0.00002814 [10:47:15] Epoch: 1 Batch: 6775/20099 (33.71%) Loss: 2.090792 LR: 0.00002814 [10:47:17] Epoch: 1 Batch: 6776/20099 (33.71%) Loss: 1.858158 LR: 0.00002814 [10:47:18] Epoch: 1 Batch: 6777/20099 (33.72%) Loss: 2.024042 LR: 0.00002814 [10:47:20] Epoch: 1 Batch: 6778/20099 (33.72%) Loss: 2.247656 LR: 0.00002814 [10:47:22] Epoch: 1 Batch: 6779/20099 (33.73%) Loss: 1.992156 LR: 0.00002813 [10:47:24] Epoch: 1 Batch: 6780/20099 (33.73%) Loss: 2.091400 LR: 0.00002813 [10:47:25] Epoch: 1 Batch: 6781/20099 (33.74%) Loss: 1.944572 LR: 0.00002813 [10:47:27] Epoch: 1 Batch: 6782/20099 (33.74%) Loss: 2.358613 LR: 0.00002813 [10:47:29] Epoch: 1 Batch: 6783/20099 (33.75%) Loss: 2.177873 LR: 0.00002813 [10:47:31] Epoch: 1 Batch: 6784/20099 (33.75%) Loss: 2.180829 LR: 0.00002813 [10:47:33] Epoch: 1 Batch: 6785/20099 (33.76%) Loss: 2.116718 LR: 0.00002813 [10:47:34] Epoch: 1 Batch: 6786/20099 (33.76%) Loss: 1.862131 LR: 0.00002812 [10:47:36] Epoch: 1 Batch: 6787/20099 (33.77%) Loss: 2.143174 LR: 0.00002812 [10:47:38] Epoch: 1 Batch: 6788/20099 (33.77%) Loss: 2.031924 LR: 0.00002812 [10:47:40] Epoch: 1 Batch: 6789/20099 (33.78%) Loss: 2.071654 LR: 0.00002812 [10:47:41] Epoch: 1 Batch: 6790/20099 (33.78%) Loss: 2.098736 LR: 0.00002812 [10:47:43] Epoch: 1 Batch: 6791/20099 (33.79%) Loss: 2.129359 LR: 0.00002812 [10:47:45] Epoch: 1 Batch: 6792/20099 (33.79%) Loss: 2.024835 LR: 0.00002812 [10:47:47] Epoch: 1 Batch: 6793/20099 (33.80%) Loss: 1.890738 LR: 0.00002811 [10:47:49] Epoch: 1 Batch: 6794/20099 (33.80%) Loss: 2.226524 LR: 0.00002811 [10:47:50] Epoch: 1 Batch: 6795/20099 (33.81%) Loss: 2.303231 LR: 0.00002811 [10:47:52] Epoch: 1 Batch: 6796/20099 (33.81%) Loss: 2.004296 
LR: 0.00002811 [10:47:54] Epoch: 1 Batch: 6797/20099 (33.82%) Loss: 2.066525 LR: 0.00002811 [10:47:56] Epoch: 1 Batch: 6798/20099 (33.82%) Loss: 2.254001 LR: 0.00002811 [10:47:57] Epoch: 1 Batch: 6799/20099 (33.83%) Loss: 2.140169 LR: 0.00002811 [10:48:03] >> Cleaned up old temp checkpoint: epoch1_step4800 [10:48:03] >> Temp checkpoint saved: epoch1_step6800, size: 0.1693 GB [10:48:03] Epoch: 1 Batch: 6800/20099 (33.83%) Loss: 1.916113 LR: 0.00002810 [10:48:04] Epoch: 1 Batch: 6801/20099 (33.84%) Loss: 2.143852 LR: 0.00002810 [10:48:06] Epoch: 1 Batch: 6802/20099 (33.84%) Loss: 2.223588 LR: 0.00002810 [10:48:08] Epoch: 1 Batch: 6803/20099 (33.85%) Loss: 1.996708 LR: 0.00002810 [10:48:10] Epoch: 1 Batch: 6804/20099 (33.85%) Loss: 2.069145 LR: 0.00002810 [10:48:11] Epoch: 1 Batch: 6805/20099 (33.86%) Loss: 2.025539 LR: 0.00002810 [10:48:13] Epoch: 1 Batch: 6806/20099 (33.86%) Loss: 2.089552 LR: 0.00002810 [10:48:15] Epoch: 1 Batch: 6807/20099 (33.87%) Loss: 2.364429 LR: 0.00002810 [10:48:17] Epoch: 1 Batch: 6808/20099 (33.87%) Loss: 2.181350 LR: 0.00002810 [10:48:19] Epoch: 1 Batch: 6809/20099 (33.88%) Loss: 1.795850 LR: 0.00002810 [10:48:20] Epoch: 1 Batch: 6810/20099 (33.88%) Loss: 2.077196 LR: 0.00002810 [10:48:22] Epoch: 1 Batch: 6811/20099 (33.89%) Loss: 2.209964 LR: 0.00002810 [10:48:24] Epoch: 1 Batch: 6812/20099 (33.89%) Loss: 1.967180 LR: 0.00002810 [10:48:26] Epoch: 1 Batch: 6813/20099 (33.90%) Loss: 2.117794 LR: 0.00002810 [10:48:27] Epoch: 1 Batch: 6814/20099 (33.90%) Loss: 1.884259 LR: 0.00002809 [10:48:29] Epoch: 1 Batch: 6815/20099 (33.91%) Loss: 2.238241 LR: 0.00002809 [10:48:31] Epoch: 1 Batch: 6816/20099 (33.91%) Loss: 2.001045 LR: 0.00002809 [10:48:33] Epoch: 1 Batch: 6817/20099 (33.92%) Loss: 2.030901 LR: 0.00002809 [10:48:35] Epoch: 1 Batch: 6818/20099 (33.92%) Loss: 2.379946 LR: 0.00002809 [10:48:36] Epoch: 1 Batch: 6819/20099 (33.93%) Loss: 2.121236 LR: 0.00002809 [10:48:38] Epoch: 1 Batch: 6820/20099 (33.93%) Loss: 1.918446 LR: 0.00002809 
[10:48:40] Epoch: 1 Batch: 6821/20099 (33.94%) Loss: 2.046328 LR: 0.00002808 [10:48:42] Epoch: 1 Batch: 6822/20099 (33.94%) Loss: 2.333735 LR: 0.00002808 [10:48:44] Epoch: 1 Batch: 6823/20099 (33.95%) Loss: 2.291618 LR: 0.00002808 [10:48:45] Epoch: 1 Batch: 6824/20099 (33.95%) Loss: 2.176229 LR: 0.00002808 [10:48:47] Epoch: 1 Batch: 6825/20099 (33.96%) Loss: 1.687084 LR: 0.00002808 [10:48:49] Epoch: 1 Batch: 6826/20099 (33.96%) Loss: 1.904592 LR: 0.00002808 [10:48:51] Epoch: 1 Batch: 6827/20099 (33.97%) Loss: 2.436557 LR: 0.00002808 [10:48:52] Epoch: 1 Batch: 6828/20099 (33.97%) Loss: 2.187032 LR: 0.00002807 [10:48:54] Epoch: 1 Batch: 6829/20099 (33.98%) Loss: 2.110093 LR: 0.00002807 [10:48:56] Epoch: 1 Batch: 6830/20099 (33.98%) Loss: 2.027519 LR: 0.00002807 [10:48:58] Epoch: 1 Batch: 6831/20099 (33.99%) Loss: 2.005858 LR: 0.00002807 [10:48:59] Epoch: 1 Batch: 6832/20099 (33.99%) Loss: 2.150229 LR: 0.00002807 [10:49:01] Epoch: 1 Batch: 6833/20099 (34.00%) Loss: 1.793528 LR: 0.00002807 [10:49:03] Epoch: 1 Batch: 6834/20099 (34.00%) Loss: 2.023657 LR: 0.00002807 [10:49:05] Epoch: 1 Batch: 6835/20099 (34.01%) Loss: 2.396286 LR: 0.00002806 [10:49:07] Epoch: 1 Batch: 6836/20099 (34.01%) Loss: 1.939944 LR: 0.00002806 [10:49:08] Epoch: 1 Batch: 6837/20099 (34.02%) Loss: 2.156412 LR: 0.00002806 [10:49:10] Epoch: 1 Batch: 6838/20099 (34.02%) Loss: 2.326754 LR: 0.00002806 [10:49:12] Epoch: 1 Batch: 6839/20099 (34.03%) Loss: 2.351211 LR: 0.00002806 [10:49:14] Epoch: 1 Batch: 6840/20099 (34.03%) Loss: 2.169238 LR: 0.00002806 [10:49:15] Epoch: 1 Batch: 6841/20099 (34.04%) Loss: 2.003825 LR: 0.00002806 [10:49:17] Epoch: 1 Batch: 6842/20099 (34.04%) Loss: 2.041799 LR: 0.00002805 [10:49:19] Epoch: 1 Batch: 6843/20099 (34.05%) Loss: 2.146740 LR: 0.00002805 [10:49:21] Epoch: 1 Batch: 6844/20099 (34.05%) Loss: 2.254851 LR: 0.00002805 [10:49:22] Epoch: 1 Batch: 6845/20099 (34.06%) Loss: 2.354460 LR: 0.00002805 [10:49:24] Epoch: 1 Batch: 6846/20099 (34.06%) Loss: 2.311606 LR: 
0.00002805 [10:49:26] Epoch: 1 Batch: 6847/20099 (34.07%) Loss: 2.343667 LR: 0.00002805 [10:49:28] Epoch: 1 Batch: 6848/20099 (34.07%) Loss: 1.940030 LR: 0.00002805 [10:49:30] Epoch: 1 Batch: 6849/20099 (34.08%) Loss: 2.089580 LR: 0.00002804 [10:49:31] Epoch: 1 Batch: 6850/20099 (34.08%) Loss: 1.973213 LR: 0.00002804 [10:49:33] Epoch: 1 Batch: 6851/20099 (34.09%) Loss: 1.910539 LR: 0.00002804 [10:49:35] Epoch: 1 Batch: 6852/20099 (34.09%) Loss: 2.203053 LR: 0.00002804 [10:49:37] Epoch: 1 Batch: 6853/20099 (34.10%) Loss: 2.191430 LR: 0.00002804 [10:49:38] Epoch: 1 Batch: 6854/20099 (34.10%) Loss: 1.961695 LR: 0.00002804 [10:49:40] Epoch: 1 Batch: 6855/20099 (34.11%) Loss: 2.207938 LR: 0.00002804 [10:49:42] Epoch: 1 Batch: 6856/20099 (34.11%) Loss: 2.093933 LR: 0.00002803 [10:49:44] Epoch: 1 Batch: 6857/20099 (34.12%) Loss: 2.106593 LR: 0.00002803 [10:49:45] Epoch: 1 Batch: 6858/20099 (34.12%) Loss: 2.213724 LR: 0.00002803 [10:49:47] Epoch: 1 Batch: 6859/20099 (34.13%) Loss: 1.953931 LR: 0.00002803 [10:49:49] Epoch: 1 Batch: 6860/20099 (34.13%) Loss: 2.033996 LR: 0.00002803 [10:49:51] Epoch: 1 Batch: 6861/20099 (34.14%) Loss: 1.984051 LR: 0.00002803 [10:49:52] Epoch: 1 Batch: 6862/20099 (34.14%) Loss: 2.107455 LR: 0.00002803 [10:49:54] Epoch: 1 Batch: 6863/20099 (34.15%) Loss: 2.359044 LR: 0.00002802 [10:49:56] Epoch: 1 Batch: 6864/20099 (34.15%) Loss: 1.987946 LR: 0.00002802 [10:49:58] Epoch: 1 Batch: 6865/20099 (34.16%) Loss: 2.032412 LR: 0.00002802 [10:50:00] Epoch: 1 Batch: 6866/20099 (34.16%) Loss: 2.114138 LR: 0.00002802 [10:50:01] Epoch: 1 Batch: 6867/20099 (34.17%) Loss: 2.018090 LR: 0.00002802 [10:50:03] Epoch: 1 Batch: 6868/20099 (34.17%) Loss: 2.134087 LR: 0.00002802 [10:50:05] Epoch: 1 Batch: 6869/20099 (34.18%) Loss: 2.134937 LR: 0.00002802 [10:50:07] Epoch: 1 Batch: 6870/20099 (34.18%) Loss: 2.071761 LR: 0.00002802 [10:50:08] Epoch: 1 Batch: 6871/20099 (34.19%) Loss: 2.151066 LR: 0.00002802 [10:50:10] Epoch: 1 Batch: 6872/20099 (34.19%) Loss: 2.300032 
LR: 0.00002802 [10:50:12] Epoch: 1 Batch: 6873/20099 (34.20%) Loss: 2.009768 LR: 0.00002802 [10:50:14] Epoch: 1 Batch: 6874/20099 (34.20%) Loss: 2.201602 LR: 0.00002802 [10:50:16] Epoch: 1 Batch: 6875/20099 (34.21%) Loss: 1.852738 LR: 0.00002802 [10:50:17] Epoch: 1 Batch: 6876/20099 (34.21%) Loss: 2.064631 LR: 0.00002802 [10:50:19] Epoch: 1 Batch: 6877/20099 (34.22%) Loss: 1.971257 LR: 0.00002801 [10:50:21] Epoch: 1 Batch: 6878/20099 (34.22%) Loss: 2.184020 LR: 0.00002801 [10:50:23] Epoch: 1 Batch: 6879/20099 (34.23%) Loss: 2.165136 LR: 0.00002801 [10:50:24] Epoch: 1 Batch: 6880/20099 (34.23%) Loss: 1.912142 LR: 0.00002801 [10:50:26] Epoch: 1 Batch: 6881/20099 (34.24%) Loss: 2.175558 LR: 0.00002801 [10:50:28] Epoch: 1 Batch: 6882/20099 (34.24%) Loss: 2.092246 LR: 0.00002801 [10:50:30] Epoch: 1 Batch: 6883/20099 (34.25%) Loss: 1.799839 LR: 0.00002801 [10:50:32] Epoch: 1 Batch: 6884/20099 (34.25%) Loss: 2.187440 LR: 0.00002800 [10:50:33] Epoch: 1 Batch: 6885/20099 (34.26%) Loss: 2.044364 LR: 0.00002800 [10:50:35] Epoch: 1 Batch: 6886/20099 (34.26%) Loss: 2.156611 LR: 0.00002800 [10:50:37] Epoch: 1 Batch: 6887/20099 (34.27%) Loss: 2.077038 LR: 0.00002800 [10:50:39] Epoch: 1 Batch: 6888/20099 (34.27%) Loss: 1.979192 LR: 0.00002800 [10:50:40] Epoch: 1 Batch: 6889/20099 (34.28%) Loss: 2.001452 LR: 0.00002800 [10:50:42] Epoch: 1 Batch: 6890/20099 (34.28%) Loss: 1.879841 LR: 0.00002800 [10:50:44] Epoch: 1 Batch: 6891/20099 (34.29%) Loss: 2.188141 LR: 0.00002799 [10:50:46] Epoch: 1 Batch: 6892/20099 (34.29%) Loss: 2.392734 LR: 0.00002799 [10:50:48] Epoch: 1 Batch: 6893/20099 (34.30%) Loss: 1.992314 LR: 0.00002799 [10:50:49] Epoch: 1 Batch: 6894/20099 (34.30%) Loss: 2.172193 LR: 0.00002799 [10:50:51] Epoch: 1 Batch: 6895/20099 (34.31%) Loss: 2.109421 LR: 0.00002799 [10:50:53] Epoch: 1 Batch: 6896/20099 (34.31%) Loss: 2.304380 LR: 0.00002799 [10:50:55] Epoch: 1 Batch: 6897/20099 (34.32%) Loss: 2.193593 LR: 0.00002799 [10:50:56] Epoch: 1 Batch: 6898/20099 (34.32%) Loss: 
2.299893 LR: 0.00002798 [10:50:58] Epoch: 1 Batch: 6899/20099 (34.33%) Loss: 2.264403 LR: 0.00002798 [10:51:00] Epoch: 1 Batch: 6900/20099 (34.33%) Loss: 2.196179 LR: 0.00002798 [10:51:02] Epoch: 1 Batch: 6901/20099 (34.34%) Loss: 2.425621 LR: 0.00002798 [10:51:04] Epoch: 1 Batch: 6902/20099 (34.34%) Loss: 2.013336 LR: 0.00002798 [10:51:05] Epoch: 1 Batch: 6903/20099 (34.34%) Loss: 1.577217 LR: 0.00002798 [10:51:07] Epoch: 1 Batch: 6904/20099 (34.35%) Loss: 2.074481 LR: 0.00002798 [10:51:09] Epoch: 1 Batch: 6905/20099 (34.35%) Loss: 2.095217 LR: 0.00002797 [10:51:11] Epoch: 1 Batch: 6906/20099 (34.36%) Loss: 2.205564 LR: 0.00002797 [10:51:12] Epoch: 1 Batch: 6907/20099 (34.36%) Loss: 2.179969 LR: 0.00002797 [10:51:14] Epoch: 1 Batch: 6908/20099 (34.37%) Loss: 2.279891 LR: 0.00002797 [10:51:16] Epoch: 1 Batch: 6909/20099 (34.37%) Loss: 2.016210 LR: 0.00002797 [10:51:18] Epoch: 1 Batch: 6910/20099 (34.38%) Loss: 2.243501 LR: 0.00002797 [10:51:20] Epoch: 1 Batch: 6911/20099 (34.38%) Loss: 1.659151 LR: 0.00002797 [10:51:21] Epoch: 1 Batch: 6912/20099 (34.39%) Loss: 2.060833 LR: 0.00002796 [10:51:23] Epoch: 1 Batch: 6913/20099 (34.39%) Loss: 2.150305 LR: 0.00002796 [10:51:25] Epoch: 1 Batch: 6914/20099 (34.40%) Loss: 2.047707 LR: 0.00002796 [10:51:27] Epoch: 1 Batch: 6915/20099 (34.40%) Loss: 1.850793 LR: 0.00002796 [10:51:28] Epoch: 1 Batch: 6916/20099 (34.41%) Loss: 2.710672 LR: 0.00002796 [10:51:30] Epoch: 1 Batch: 6917/20099 (34.41%) Loss: 2.245318 LR: 0.00002796 [10:51:32] Epoch: 1 Batch: 6918/20099 (34.42%) Loss: 1.938993 LR: 0.00002796 [10:51:34] Epoch: 1 Batch: 6919/20099 (34.42%) Loss: 2.023917 LR: 0.00002795 [10:51:36] Epoch: 1 Batch: 6920/20099 (34.43%) Loss: 2.147976 LR: 0.00002795 [10:51:37] Epoch: 1 Batch: 6921/20099 (34.43%) Loss: 2.366758 LR: 0.00002795 [10:51:39] Epoch: 1 Batch: 6922/20099 (34.44%) Loss: 1.996173 LR: 0.00002795 [10:51:41] Epoch: 1 Batch: 6923/20099 (34.44%) Loss: 2.230927 LR: 0.00002795 [10:51:43] Epoch: 1 Batch: 6924/20099 (34.45%) 
Loss: 2.254483 LR: 0.00002795 [10:51:44] Epoch: 1 Batch: 6925/20099 (34.45%) Loss: 2.145182 LR: 0.00002795 [10:51:46] Epoch: 1 Batch: 6926/20099 (34.46%) Loss: 2.287155 LR: 0.00002794 [10:51:48] Epoch: 1 Batch: 6927/20099 (34.46%) Loss: 2.104568 LR: 0.00002794 [10:51:50] Epoch: 1 Batch: 6928/20099 (34.47%) Loss: 2.149989 LR: 0.00002794 [10:51:52] Epoch: 1 Batch: 6929/20099 (34.47%) Loss: 2.280102 LR: 0.00002794 [10:51:53] Epoch: 1 Batch: 6930/20099 (34.48%) Loss: 2.323362 LR: 0.00002794 [10:51:55] Epoch: 1 Batch: 6931/20099 (34.48%) Loss: 2.245917 LR: 0.00002794 [10:51:57] Epoch: 1 Batch: 6932/20099 (34.49%) Loss: 2.261348 LR: 0.00002794 [10:51:59] Epoch: 1 Batch: 6933/20099 (34.49%) Loss: 2.051615 LR: 0.00002793 [10:52:00] Epoch: 1 Batch: 6934/20099 (34.50%) Loss: 2.283620 LR: 0.00002793 [10:52:02] Epoch: 1 Batch: 6935/20099 (34.50%) Loss: 2.344166 LR: 0.00002793 [10:52:04] Epoch: 1 Batch: 6936/20099 (34.51%) Loss: 1.978947 LR: 0.00002793 [10:52:06] Epoch: 1 Batch: 6937/20099 (34.51%) Loss: 2.123052 LR: 0.00002793 [10:52:08] Epoch: 1 Batch: 6938/20099 (34.52%) Loss: 1.977782 LR: 0.00002793 [10:52:09] Epoch: 1 Batch: 6939/20099 (34.52%) Loss: 2.010353 LR: 0.00002793 [10:52:11] Epoch: 1 Batch: 6940/20099 (34.53%) Loss: 1.855951 LR: 0.00002792 [10:52:13] Epoch: 1 Batch: 6941/20099 (34.53%) Loss: 2.188379 LR: 0.00002792 [10:52:15] Epoch: 1 Batch: 6942/20099 (34.54%) Loss: 2.217560 LR: 0.00002792 [10:52:16] Epoch: 1 Batch: 6943/20099 (34.54%) Loss: 1.815972 LR: 0.00002792 [10:52:18] Epoch: 1 Batch: 6944/20099 (34.55%) Loss: 2.162040 LR: 0.00002792 [10:52:20] Epoch: 1 Batch: 6945/20099 (34.55%) Loss: 1.597113 LR: 0.00002792 [10:52:22] Epoch: 1 Batch: 6946/20099 (34.56%) Loss: 1.869903 LR: 0.00002792 [10:52:24] Epoch: 1 Batch: 6947/20099 (34.56%) Loss: 2.060245 LR: 0.00002792 [10:52:25] Epoch: 1 Batch: 6948/20099 (34.57%) Loss: 2.069923 LR: 0.00002792 [10:52:27] Epoch: 1 Batch: 6949/20099 (34.57%) Loss: 2.200412 LR: 0.00002792 [10:52:29] Epoch: 1 Batch: 6950/20099 
(34.58%) Loss: 2.258299 LR: 0.00002792 [10:52:31] Epoch: 1 Batch: 6951/20099 (34.58%) Loss: 1.949165 LR: 0.00002792 [10:52:32] Epoch: 1 Batch: 6952/20099 (34.59%) Loss: 2.380992 LR: 0.00002792 [10:52:34] Epoch: 1 Batch: 6953/20099 (34.59%) Loss: 1.863948 LR: 0.00002792 [10:52:36] Epoch: 1 Batch: 6954/20099 (34.60%) Loss: 1.946181 LR: 0.00002791 [10:52:38] Epoch: 1 Batch: 6955/20099 (34.60%) Loss: 2.207716 LR: 0.00002791 [10:52:39] Epoch: 1 Batch: 6956/20099 (34.61%) Loss: 1.759644 LR: 0.00002791 [10:52:41] Epoch: 1 Batch: 6957/20099 (34.61%) Loss: 2.237288 LR: 0.00002791 [10:52:43] Epoch: 1 Batch: 6958/20099 (34.62%) Loss: 2.050950 LR: 0.00002791 [10:52:45] Epoch: 1 Batch: 6959/20099 (34.62%) Loss: 2.205784 LR: 0.00002791 [10:52:47] Epoch: 1 Batch: 6960/20099 (34.63%) Loss: 2.168074 LR: 0.00002791 [10:52:48] Epoch: 1 Batch: 6961/20099 (34.63%) Loss: 2.183275 LR: 0.00002790 [10:52:50] Epoch: 1 Batch: 6962/20099 (34.64%) Loss: 1.963000 LR: 0.00002790 [10:52:52] Epoch: 1 Batch: 6963/20099 (34.64%) Loss: 2.407468 LR: 0.00002790 [10:52:54] Epoch: 1 Batch: 6964/20099 (34.65%) Loss: 2.218308 LR: 0.00002790 [10:52:55] Epoch: 1 Batch: 6965/20099 (34.65%) Loss: 1.960445 LR: 0.00002790 [10:52:57] Epoch: 1 Batch: 6966/20099 (34.66%) Loss: 1.576009 LR: 0.00002790 [10:52:59] Epoch: 1 Batch: 6967/20099 (34.66%) Loss: 2.246544 LR: 0.00002790 [10:53:01] Epoch: 1 Batch: 6968/20099 (34.67%) Loss: 1.661664 LR: 0.00002789 [10:53:02] Epoch: 1 Batch: 6969/20099 (34.67%) Loss: 2.027601 LR: 0.00002789 [10:53:04] Epoch: 1 Batch: 6970/20099 (34.68%) Loss: 2.167160 LR: 0.00002789 [10:53:06] Epoch: 1 Batch: 6971/20099 (34.68%) Loss: 1.954417 LR: 0.00002789 [10:53:08] Epoch: 1 Batch: 6972/20099 (34.69%) Loss: 2.024255 LR: 0.00002789 [10:53:09] Epoch: 1 Batch: 6973/20099 (34.69%) Loss: 2.134949 LR: 0.00002789 [10:53:11] Epoch: 1 Batch: 6974/20099 (34.70%) Loss: 2.255246 LR: 0.00002789 [10:53:13] Epoch: 1 Batch: 6975/20099 (34.70%) Loss: 2.150893 LR: 0.00002788 [10:53:15] Epoch: 1 Batch: 
6976/20099 (34.71%) Loss: 2.372547 LR: 0.00002788 [10:53:17] Epoch: 1 Batch: 6977/20099 (34.71%) Loss: 1.949844 LR: 0.00002788 [10:53:18] Epoch: 1 Batch: 6978/20099 (34.72%) Loss: 2.090329 LR: 0.00002788 [10:53:20] Epoch: 1 Batch: 6979/20099 (34.72%) Loss: 2.095672 LR: 0.00002788 [10:53:22] Epoch: 1 Batch: 6980/20099 (34.73%) Loss: 1.742736 LR: 0.00002788 [10:53:24] Epoch: 1 Batch: 6981/20099 (34.73%) Loss: 2.488661 LR: 0.00002788 [10:53:25] Epoch: 1 Batch: 6982/20099 (34.74%) Loss: 2.191320 LR: 0.00002787 [10:53:27] Epoch: 1 Batch: 6983/20099 (34.74%) Loss: 1.962778 LR: 0.00002787 [10:53:29] Epoch: 1 Batch: 6984/20099 (34.75%) Loss: 2.308599 LR: 0.00002787 [10:53:31] Epoch: 1 Batch: 6985/20099 (34.75%) Loss: 1.972839 LR: 0.00002787 [10:53:32] Epoch: 1 Batch: 6986/20099 (34.76%) Loss: 2.201016 LR: 0.00002787 [10:53:34] Epoch: 1 Batch: 6987/20099 (34.76%) Loss: 2.228927 LR: 0.00002787 [10:53:36] Epoch: 1 Batch: 6988/20099 (34.77%) Loss: 1.921659 LR: 0.00002787 [10:53:38] Epoch: 1 Batch: 6989/20099 (34.77%) Loss: 2.453832 LR: 0.00002786 [10:53:40] Epoch: 1 Batch: 6990/20099 (34.78%) Loss: 2.125905 LR: 0.00002786 [10:53:41] Epoch: 1 Batch: 6991/20099 (34.78%) Loss: 2.264550 LR: 0.00002786 [10:53:43] Epoch: 1 Batch: 6992/20099 (34.79%) Loss: 2.086047 LR: 0.00002786 [10:53:45] Epoch: 1 Batch: 6993/20099 (34.79%) Loss: 2.448757 LR: 0.00002786 [10:53:47] Epoch: 1 Batch: 6994/20099 (34.80%) Loss: 2.140265 LR: 0.00002786 [10:53:48] Epoch: 1 Batch: 6995/20099 (34.80%) Loss: 2.169662 LR: 0.00002786 [10:53:50] Epoch: 1 Batch: 6996/20099 (34.81%) Loss: 2.112179 LR: 0.00002785 [10:53:52] Epoch: 1 Batch: 6997/20099 (34.81%) Loss: 2.188232 LR: 0.00002785 [10:53:54] Epoch: 1 Batch: 6998/20099 (34.82%) Loss: 1.933957 LR: 0.00002785 [10:53:55] Epoch: 1 Batch: 6999/20099 (34.82%) Loss: 2.103078 LR: 0.00002785 [10:53:57] >> Evaluating batch 0 [10:53:58] >> Evaluating batch 1 [10:53:59] >> Evaluating batch 2 [10:54:00] >> Evaluating batch 3 [10:54:01] >> Evaluating batch 4 [10:54:02] >> 
Evaluating batch 5 [10:54:03] >> Evaluating batch 6 [10:54:04] >> Evaluating batch 7 [10:54:05] >> Evaluating batch 8 [10:54:06] >> Evaluating batch 9 [10:54:07] >> Evaluating batch 10 [10:54:08] >> Evaluating batch 11 [10:54:09] >> Evaluating batch 12 [10:54:10] >> Evaluating batch 13 [10:54:11] >> Evaluating batch 14 [10:54:12] >> Evaluating batch 15 [10:54:13] >> Evaluating batch 16 [10:54:14] Epoch: 1 Step: 7000/20099 Evaluation: [10:54:14] Avg Loss Since Last Eval: 2.1061 Val Loss: 2.1842 Validation loss delta: -0.0048 Perplexity: 8.8833 LR: 0.00002785 [10:54:17] >> Cleaned up old temp checkpoint: epoch1_step5000 [10:54:17] >> Temp checkpoint saved: epoch1_step7000, size: 0.1693 GB [10:54:21] >> Checkpoint saved: epoch1_step7000, size: 0.1693 GB [10:54:21] Epoch: 1 Batch: 7000/20099 (34.83%) Loss: 2.139073 LR: 0.00002785 [10:54:23] Epoch: 1 Batch: 7001/20099 (34.83%) Loss: 2.306963 LR: 0.00002785 [10:54:24] Epoch: 1 Batch: 7002/20099 (34.84%) Loss: 1.889974 LR: 0.00002785 [10:54:26] Epoch: 1 Batch: 7003/20099 (34.84%) Loss: 2.004917 LR: 0.00002784 [10:54:28] Epoch: 1 Batch: 7004/20099 (34.85%) Loss: 2.138768 LR: 0.00002784 [10:54:30] Epoch: 1 Batch: 7005/20099 (34.85%) Loss: 2.122389 LR: 0.00002784 [10:54:31] Epoch: 1 Batch: 7006/20099 (34.86%) Loss: 1.983937 LR: 0.00002784 [10:54:33] Epoch: 1 Batch: 7007/20099 (34.86%) Loss: 2.372482 LR: 0.00002784 [10:54:35] Epoch: 1 Batch: 7008/20099 (34.87%) Loss: 2.001665 LR: 0.00002784 [10:54:37] Epoch: 1 Batch: 7009/20099 (34.87%) Loss: 2.015689 LR: 0.00002784 [10:54:38] Epoch: 1 Batch: 7010/20099 (34.88%) Loss: 2.157980 LR: 0.00002783 [10:54:40] Epoch: 1 Batch: 7011/20099 (34.88%) Loss: 2.048911 LR: 0.00002783 [10:54:42] Epoch: 1 Batch: 7012/20099 (34.89%) Loss: 2.007964 LR: 0.00002783 [10:54:44] Epoch: 1 Batch: 7013/20099 (34.89%) Loss: 2.043484 LR: 0.00002783 [10:54:46] Epoch: 1 Batch: 7014/20099 (34.90%) Loss: 2.087451 LR: 0.00002783 [10:54:47] Epoch: 1 Batch: 7015/20099 (34.90%) Loss: 2.170595 LR: 0.00002783 
[10:54:49] Epoch: 1 Batch: 7016/20099 (34.91%) Loss: 2.005560 LR: 0.00002783 [10:54:51] Epoch: 1 Batch: 7017/20099 (34.91%) Loss: 1.908896 LR: 0.00002782 [10:54:53] Epoch: 1 Batch: 7018/20099 (34.92%) Loss: 2.347799 LR: 0.00002782 [10:54:55] Epoch: 1 Batch: 7019/20099 (34.92%) Loss: 2.300037 LR: 0.00002782 [10:54:57] Epoch: 1 Batch: 7020/20099 (34.93%) Loss: 1.861715 LR: 0.00002782 [10:54:59] Epoch: 1 Batch: 7021/20099 (34.93%) Loss: 2.111417 LR: 0.00002782 [10:55:00] Epoch: 1 Batch: 7022/20099 (34.94%) Loss: 2.148019 LR: 0.00002782 [10:55:02] Epoch: 1 Batch: 7023/20099 (34.94%) Loss: 2.026584 LR: 0.00002782 [10:55:04] Epoch: 1 Batch: 7024/20099 (34.95%) Loss: 2.224730 LR: 0.00002781 [10:55:06] Epoch: 1 Batch: 7025/20099 (34.95%) Loss: 1.952418 LR: 0.00002781 [10:55:08] Epoch: 1 Batch: 7026/20099 (34.96%) Loss: 1.974889 LR: 0.00002781 [10:55:09] Epoch: 1 Batch: 7027/20099 (34.96%) Loss: 2.100460 LR: 0.00002781 [10:55:11] Epoch: 1 Batch: 7028/20099 (34.97%) Loss: 2.271579 LR: 0.00002781 [10:55:13] Epoch: 1 Batch: 7029/20099 (34.97%) Loss: 1.739239 LR: 0.00002781 [10:55:15] Epoch: 1 Batch: 7030/20099 (34.98%) Loss: 1.909335 LR: 0.00002781 [10:55:16] Epoch: 1 Batch: 7031/20099 (34.98%) Loss: 2.047011 LR: 0.00002780 [10:55:18] Epoch: 1 Batch: 7032/20099 (34.99%) Loss: 2.159661 LR: 0.00002780 [10:55:20] Epoch: 1 Batch: 7033/20099 (34.99%) Loss: 1.770979 LR: 0.00002780 [10:55:22] Epoch: 1 Batch: 7034/20099 (35.00%) Loss: 2.116041 LR: 0.00002780 [10:55:24] Epoch: 1 Batch: 7035/20099 (35.00%) Loss: 2.067008 LR: 0.00002780 [10:55:25] Epoch: 1 Batch: 7036/20099 (35.01%) Loss: 2.394377 LR: 0.00002780 [10:55:27] Epoch: 1 Batch: 7037/20099 (35.01%) Loss: 2.131183 LR: 0.00002780 [10:55:29] Epoch: 1 Batch: 7038/20099 (35.02%) Loss: 2.059942 LR: 0.00002780 [10:55:30] Epoch: 1 Batch: 7039/20099 (35.02%) Loss: 2.249591 LR: 0.00002780 [10:55:32] Epoch: 1 Batch: 7040/20099 (35.03%) Loss: 2.167532 LR: 0.00002780 [10:55:34] Epoch: 1 Batch: 7041/20099 (35.03%) Loss: 1.932089 LR: 
0.00002780 [10:55:36] Epoch: 1 Batch: 7042/20099 (35.04%) Loss: 2.063604 LR: 0.00002780 [10:55:37] Epoch: 1 Batch: 7043/20099 (35.04%) Loss: 1.649331 LR: 0.00002780 [10:55:39] Epoch: 1 Batch: 7044/20099 (35.05%) Loss: 2.132253 LR: 0.00002780 [10:55:41] Epoch: 1 Batch: 7045/20099 (35.05%) Loss: 2.121477 LR: 0.00002779 [10:55:43] Epoch: 1 Batch: 7046/20099 (35.06%) Loss: 1.974774 LR: 0.00002779 [10:55:44] Epoch: 1 Batch: 7047/20099 (35.06%) Loss: 2.217102 LR: 0.00002779 [10:55:46] Epoch: 1 Batch: 7048/20099 (35.07%) Loss: 1.734862 LR: 0.00002779 [10:55:48] Epoch: 1 Batch: 7049/20099 (35.07%) Loss: 2.375342 LR: 0.00002779 [10:55:50] Epoch: 1 Batch: 7050/20099 (35.08%) Loss: 1.995915 LR: 0.00002779 [10:55:51] Epoch: 1 Batch: 7051/20099 (35.08%) Loss: 1.824705 LR: 0.00002779 [10:55:53] Epoch: 1 Batch: 7052/20099 (35.09%) Loss: 1.784909 LR: 0.00002778 [10:55:55] Epoch: 1 Batch: 7053/20099 (35.09%) Loss: 2.123444 LR: 0.00002778 [10:55:57] Epoch: 1 Batch: 7054/20099 (35.10%) Loss: 2.222812 LR: 0.00002778 [10:55:59] Epoch: 1 Batch: 7055/20099 (35.10%) Loss: 1.838475 LR: 0.00002778 [10:56:00] Epoch: 1 Batch: 7056/20099 (35.11%) Loss: 2.256785 LR: 0.00002778 [10:56:02] Epoch: 1 Batch: 7057/20099 (35.11%) Loss: 2.107504 LR: 0.00002778 [10:56:04] Epoch: 1 Batch: 7058/20099 (35.12%) Loss: 2.073534 LR: 0.00002778 [10:56:06] Epoch: 1 Batch: 7059/20099 (35.12%) Loss: 2.106862 LR: 0.00002777 [10:56:07] Epoch: 1 Batch: 7060/20099 (35.13%) Loss: 1.985534 LR: 0.00002777 [10:56:09] Epoch: 1 Batch: 7061/20099 (35.13%) Loss: 2.102464 LR: 0.00002777 [10:56:11] Epoch: 1 Batch: 7062/20099 (35.14%) Loss: 1.912967 LR: 0.00002777 [10:56:13] Epoch: 1 Batch: 7063/20099 (35.14%) Loss: 2.228588 LR: 0.00002777 [10:56:15] Epoch: 1 Batch: 7064/20099 (35.15%) Loss: 1.912664 LR: 0.00002777 [10:56:16] Epoch: 1 Batch: 7065/20099 (35.15%) Loss: 1.971045 LR: 0.00002777 [10:56:18] Epoch: 1 Batch: 7066/20099 (35.16%) Loss: 2.305587 LR: 0.00002776 [10:56:20] Epoch: 1 Batch: 7067/20099 (35.16%) Loss: 1.702191 
LR: 0.00002776 [10:56:22] Epoch: 1 Batch: 7068/20099 (35.17%) Loss: 2.243290 LR: 0.00002776 [10:56:24] Epoch: 1 Batch: 7069/20099 (35.17%) Loss: 2.336854 LR: 0.00002776 [10:56:25] Epoch: 1 Batch: 7070/20099 (35.18%) Loss: 2.252550 LR: 0.00002776 [10:56:27] Epoch: 1 Batch: 7071/20099 (35.18%) Loss: 2.029330 LR: 0.00002776 [10:56:29] Epoch: 1 Batch: 7072/20099 (35.19%) Loss: 2.094667 LR: 0.00002776 [10:56:31] Epoch: 1 Batch: 7073/20099 (35.19%) Loss: 2.279553 LR: 0.00002775 [10:56:32] Epoch: 1 Batch: 7074/20099 (35.20%) Loss: 2.117043 LR: 0.00002775 [10:56:34] Epoch: 1 Batch: 7075/20099 (35.20%) Loss: 2.046130 LR: 0.00002775 [10:56:36] Epoch: 1 Batch: 7076/20099 (35.21%) Loss: 1.990312 LR: 0.00002775 [10:56:38] Epoch: 1 Batch: 7077/20099 (35.21%) Loss: 1.988800 LR: 0.00002775 [10:56:39] Epoch: 1 Batch: 7078/20099 (35.22%) Loss: 2.272196 LR: 0.00002775 [10:56:41] Epoch: 1 Batch: 7079/20099 (35.22%) Loss: 2.018588 LR: 0.00002775 [10:56:43] Epoch: 1 Batch: 7080/20099 (35.23%) Loss: 2.095214 LR: 0.00002774 [10:56:45] Epoch: 1 Batch: 7081/20099 (35.23%) Loss: 2.179277 LR: 0.00002774 [10:56:47] Epoch: 1 Batch: 7082/20099 (35.24%) Loss: 2.167888 LR: 0.00002774 [10:56:48] Epoch: 1 Batch: 7083/20099 (35.24%) Loss: 1.992153 LR: 0.00002774 [10:56:50] Epoch: 1 Batch: 7084/20099 (35.25%) Loss: 1.986721 LR: 0.00002774 [10:56:52] Epoch: 1 Batch: 7085/20099 (35.25%) Loss: 2.025734 LR: 0.00002774 [10:56:54] Epoch: 1 Batch: 7086/20099 (35.26%) Loss: 2.181678 LR: 0.00002774 [10:56:55] Epoch: 1 Batch: 7087/20099 (35.26%) Loss: 2.056132 LR: 0.00002773 [10:56:57] Epoch: 1 Batch: 7088/20099 (35.27%) Loss: 1.950521 LR: 0.00002773 [10:56:59] Epoch: 1 Batch: 7089/20099 (35.27%) Loss: 2.114528 LR: 0.00002773 [10:57:01] Epoch: 1 Batch: 7090/20099 (35.28%) Loss: 1.854285 LR: 0.00002773 [10:57:02] Epoch: 1 Batch: 7091/20099 (35.28%) Loss: 1.843466 LR: 0.00002773 [10:57:04] Epoch: 1 Batch: 7092/20099 (35.29%) Loss: 2.045099 LR: 0.00002773 [10:57:06] Epoch: 1 Batch: 7093/20099 (35.29%) Loss: 
2.036808 LR: 0.00002773 [10:57:08] Epoch: 1 Batch: 7094/20099 (35.30%) Loss: 2.525105 LR: 0.00002772 [10:57:09] Epoch: 1 Batch: 7095/20099 (35.30%) Loss: 2.081077 LR: 0.00002772 [10:57:11] Epoch: 1 Batch: 7096/20099 (35.31%) Loss: 2.155258 LR: 0.00002772 [10:57:13] Epoch: 1 Batch: 7097/20099 (35.31%) Loss: 2.341044 LR: 0.00002772 [10:57:15] Epoch: 1 Batch: 7098/20099 (35.32%) Loss: 2.094894 LR: 0.00002772 [10:57:17] Epoch: 1 Batch: 7099/20099 (35.32%) Loss: 1.951580 LR: 0.00002772 [10:57:18] Epoch: 1 Batch: 7100/20099 (35.33%) Loss: 2.193962 LR: 0.00002772 [10:57:20] Epoch: 1 Batch: 7101/20099 (35.33%) Loss: 1.934027 LR: 0.00002771 [10:57:22] Epoch: 1 Batch: 7102/20099 (35.34%) Loss: 2.139476 LR: 0.00002771 [10:57:24] Epoch: 1 Batch: 7103/20099 (35.34%) Loss: 2.108750 LR: 0.00002771 [10:57:25] Epoch: 1 Batch: 7104/20099 (35.35%) Loss: 2.147327 LR: 0.00002771 [10:57:27] Epoch: 1 Batch: 7105/20099 (35.35%) Loss: 1.905008 LR: 0.00002771 [10:57:29] Epoch: 1 Batch: 7106/20099 (35.35%) Loss: 1.925633 LR: 0.00002771 [10:57:31] Epoch: 1 Batch: 7107/20099 (35.36%) Loss: 1.899914 LR: 0.00002771 [10:57:33] Epoch: 1 Batch: 7108/20099 (35.36%) Loss: 1.969616 LR: 0.00002770 [10:57:34] Epoch: 1 Batch: 7109/20099 (35.37%) Loss: 2.318392 LR: 0.00002770 [10:57:36] Epoch: 1 Batch: 7110/20099 (35.37%) Loss: 2.267425 LR: 0.00002770 [10:57:38] Epoch: 1 Batch: 7111/20099 (35.38%) Loss: 2.382717 LR: 0.00002770 [10:57:40] Epoch: 1 Batch: 7112/20099 (35.38%) Loss: 2.221552 LR: 0.00002770 [10:57:41] Epoch: 1 Batch: 7113/20099 (35.39%) Loss: 2.019246 LR: 0.00002770 [10:57:43] Epoch: 1 Batch: 7114/20099 (35.39%) Loss: 1.903459 LR: 0.00002770 [10:57:45] Epoch: 1 Batch: 7115/20099 (35.40%) Loss: 2.048503 LR: 0.00002769 [10:57:47] Epoch: 1 Batch: 7116/20099 (35.40%) Loss: 2.253531 LR: 0.00002769 [10:57:49] Epoch: 1 Batch: 7117/20099 (35.41%) Loss: 2.416657 LR: 0.00002769 [10:57:50] Epoch: 1 Batch: 7118/20099 (35.41%) Loss: 2.033681 LR: 0.00002769 [10:57:52] Epoch: 1 Batch: 7119/20099 (35.42%) 
Loss: 2.137687 LR: 0.00002769 [10:57:54] Epoch: 1 Batch: 7120/20099 (35.42%) Loss: 2.249152 LR: 0.00002769 [10:57:56] Epoch: 1 Batch: 7121/20099 (35.43%) Loss: 2.268061 LR: 0.00002769 [10:57:57] Epoch: 1 Batch: 7122/20099 (35.43%) Loss: 2.339057 LR: 0.00002768 [10:57:59] Epoch: 1 Batch: 7123/20099 (35.44%) Loss: 2.389833 LR: 0.00002768 [10:58:01] Epoch: 1 Batch: 7124/20099 (35.44%) Loss: 2.019301 LR: 0.00002768 [10:58:03] Epoch: 1 Batch: 7125/20099 (35.45%) Loss: 2.174572 LR: 0.00002768 [10:58:05] Epoch: 1 Batch: 7126/20099 (35.45%) Loss: 1.972755 LR: 0.00002768 [10:58:06] Epoch: 1 Batch: 7127/20099 (35.46%) Loss: 2.265330 LR: 0.00002768 [10:58:08] Epoch: 1 Batch: 7128/20099 (35.46%) Loss: 2.275131 LR: 0.00002768 [10:58:10] Epoch: 1 Batch: 7129/20099 (35.47%) Loss: 2.137804 LR: 0.00002767 [10:58:12] Epoch: 1 Batch: 7130/20099 (35.47%) Loss: 2.146187 LR: 0.00002767 [10:58:13] Epoch: 1 Batch: 7131/20099 (35.48%) Loss: 2.232193 LR: 0.00002767 [10:58:15] Epoch: 1 Batch: 7132/20099 (35.48%) Loss: 2.022522 LR: 0.00002767 [10:58:17] Epoch: 1 Batch: 7133/20099 (35.49%) Loss: 2.149465 LR: 0.00002767 [10:58:19] Epoch: 1 Batch: 7134/20099 (35.49%) Loss: 2.056278 LR: 0.00002767 [10:58:21] Epoch: 1 Batch: 7135/20099 (35.50%) Loss: 2.129737 LR: 0.00002767 [10:58:22] Epoch: 1 Batch: 7136/20099 (35.50%) Loss: 2.168998 LR: 0.00002766 [10:58:24] Epoch: 1 Batch: 7137/20099 (35.51%) Loss: 2.251946 LR: 0.00002766 [10:58:26] Epoch: 1 Batch: 7138/20099 (35.51%) Loss: 2.353036 LR: 0.00002766 [10:58:28] Epoch: 1 Batch: 7139/20099 (35.52%) Loss: 1.980910 LR: 0.00002766 [10:58:30] Epoch: 1 Batch: 7140/20099 (35.52%) Loss: 2.152295 LR: 0.00002766 [10:58:31] Epoch: 1 Batch: 7141/20099 (35.53%) Loss: 2.102435 LR: 0.00002766 [10:58:33] Epoch: 1 Batch: 7142/20099 (35.53%) Loss: 2.027541 LR: 0.00002766 [10:58:35] Epoch: 1 Batch: 7143/20099 (35.54%) Loss: 1.926253 LR: 0.00002765 [10:58:37] Epoch: 1 Batch: 7144/20099 (35.54%) Loss: 1.732822 LR: 0.00002765 [10:58:38] Epoch: 1 Batch: 7145/20099 
(35.55%) Loss: 2.285007 LR: 0.00002765 [10:58:40] Epoch: 1 Batch: 7146/20099 (35.55%) Loss: 1.920499 LR: 0.00002765 [10:58:42] Epoch: 1 Batch: 7147/20099 (35.56%) Loss: 2.107876 LR: 0.00002765 [10:58:44] Epoch: 1 Batch: 7148/20099 (35.56%) Loss: 1.908098 LR: 0.00002765 [10:58:45] Epoch: 1 Batch: 7149/20099 (35.57%) Loss: 1.821089 LR: 0.00002765 [10:58:47] Epoch: 1 Batch: 7150/20099 (35.57%) Loss: 2.180191 LR: 0.00002764 [10:58:49] Epoch: 1 Batch: 7151/20099 (35.58%) Loss: 2.092648 LR: 0.00002764 [10:58:51] Epoch: 1 Batch: 7152/20099 (35.58%) Loss: 2.314674 LR: 0.00002764 [10:58:53] Epoch: 1 Batch: 7153/20099 (35.59%) Loss: 1.909270 LR: 0.00002764 [10:58:54] Epoch: 1 Batch: 7154/20099 (35.59%) Loss: 2.135186 LR: 0.00002764 [10:58:56] Epoch: 1 Batch: 7155/20099 (35.60%) Loss: 2.044431 LR: 0.00002764 [10:58:58] Epoch: 1 Batch: 7156/20099 (35.60%) Loss: 2.152645 LR: 0.00002764 [10:59:00] Epoch: 1 Batch: 7157/20099 (35.61%) Loss: 2.158270 LR: 0.00002763 [10:59:01] Epoch: 1 Batch: 7158/20099 (35.61%) Loss: 2.233413 LR: 0.00002763 [10:59:03] Epoch: 1 Batch: 7159/20099 (35.62%) Loss: 1.975279 LR: 0.00002763 [10:59:05] Epoch: 1 Batch: 7160/20099 (35.62%) Loss: 2.070082 LR: 0.00002763 [10:59:07] Epoch: 1 Batch: 7161/20099 (35.63%) Loss: 1.578087 LR: 0.00002763 [10:59:09] Epoch: 1 Batch: 7162/20099 (35.63%) Loss: 1.929736 LR: 0.00002763 [10:59:10] Epoch: 1 Batch: 7163/20099 (35.64%) Loss: 2.040705 LR: 0.00002763 [10:59:12] Epoch: 1 Batch: 7164/20099 (35.64%) Loss: 2.457897 LR: 0.00002762 [10:59:14] Epoch: 1 Batch: 7165/20099 (35.65%) Loss: 2.153120 LR: 0.00002762 [10:59:16] Epoch: 1 Batch: 7166/20099 (35.65%) Loss: 1.885421 LR: 0.00002762 [10:59:17] Epoch: 1 Batch: 7167/20099 (35.66%) Loss: 2.289651 LR: 0.00002762 [10:59:19] Epoch: 1 Batch: 7168/20099 (35.66%) Loss: 2.339476 LR: 0.00002762 [10:59:21] Epoch: 1 Batch: 7169/20099 (35.67%) Loss: 2.266893 LR: 0.00002762 [10:59:23] Epoch: 1 Batch: 7170/20099 (35.67%) Loss: 2.084042 LR: 0.00002762 [10:59:25] Epoch: 1 Batch: 
7171/20099 (35.68%) Loss: 2.002411 LR: 0.00002761 [10:59:26] Epoch: 1 Batch: 7172/20099 (35.68%) Loss: 2.244063 LR: 0.00002761 [10:59:28] Epoch: 1 Batch: 7173/20099 (35.69%) Loss: 2.313492 LR: 0.00002761 [10:59:30] Epoch: 1 Batch: 7174/20099 (35.69%) Loss: 2.387916 LR: 0.00002761 [10:59:32] Epoch: 1 Batch: 7175/20099 (35.70%) Loss: 2.186898 LR: 0.00002761 [10:59:33] Epoch: 1 Batch: 7176/20099 (35.70%) Loss: 2.302961 LR: 0.00002761 [10:59:35] Epoch: 1 Batch: 7177/20099 (35.71%) Loss: 2.247002 LR: 0.00002761 [10:59:37] Epoch: 1 Batch: 7178/20099 (35.71%) Loss: 2.189175 LR: 0.00002760 [10:59:39] Epoch: 1 Batch: 7179/20099 (35.72%) Loss: 2.017252 LR: 0.00002760 [10:59:41] Epoch: 1 Batch: 7180/20099 (35.72%) Loss: 2.061703 LR: 0.00002760 [10:59:42] Epoch: 1 Batch: 7181/20099 (35.73%) Loss: 2.148195 LR: 0.00002760 [10:59:44] Epoch: 1 Batch: 7182/20099 (35.73%) Loss: 2.127609 LR: 0.00002760 [10:59:46] Epoch: 1 Batch: 7183/20099 (35.74%) Loss: 2.223930 LR: 0.00002760 [10:59:48] Epoch: 1 Batch: 7184/20099 (35.74%) Loss: 2.409929 LR: 0.00002760 [10:59:50] Epoch: 1 Batch: 7185/20099 (35.75%) Loss: 2.328671 LR: 0.00002759 [10:59:51] Epoch: 1 Batch: 7186/20099 (35.75%) Loss: 2.319633 LR: 0.00002759 [10:59:53] Epoch: 1 Batch: 7187/20099 (35.76%) Loss: 2.014514 LR: 0.00002759 [10:59:55] Epoch: 1 Batch: 7188/20099 (35.76%) Loss: 2.026465 LR: 0.00002759 [10:59:57] Epoch: 1 Batch: 7189/20099 (35.77%) Loss: 1.832118 LR: 0.00002759 [10:59:58] Epoch: 1 Batch: 7190/20099 (35.77%) Loss: 2.167995 LR: 0.00002759 [11:00:00] Epoch: 1 Batch: 7191/20099 (35.78%) Loss: 2.097427 LR: 0.00002759 [11:00:02] Epoch: 1 Batch: 7192/20099 (35.78%) Loss: 2.137363 LR: 0.00002758 [11:00:04] Epoch: 1 Batch: 7193/20099 (35.79%) Loss: 2.246005 LR: 0.00002758 [11:00:06] Epoch: 1 Batch: 7194/20099 (35.79%) Loss: 2.290567 LR: 0.00002758 [11:00:07] Epoch: 1 Batch: 7195/20099 (35.80%) Loss: 2.265014 LR: 0.00002758 [11:00:09] Epoch: 1 Batch: 7196/20099 (35.80%) Loss: 2.370922 LR: 0.00002758 [11:00:11] Epoch: 1 
Batch: 7197/20099 (35.81%) Loss: 2.177689 LR: 0.00002758 [11:00:13] Epoch: 1 Batch: 7198/20099 (35.81%) Loss: 2.369873 LR: 0.00002758 [11:00:14] Epoch: 1 Batch: 7199/20099 (35.82%) Loss: 2.153777 LR: 0.00002757 [11:00:20] >> Cleaned up old temp checkpoint: epoch1_step5200 [11:00:20] >> Temp checkpoint saved: epoch1_step7200, size: 0.1693 GB [11:00:20] Epoch: 1 Batch: 7200/20099 (35.82%) Loss: 2.076853 LR: 0.00002757 [11:00:21] Epoch: 1 Batch: 7201/20099 (35.83%) Loss: 1.911233 LR: 0.00002757 [11:00:23] Epoch: 1 Batch: 7202/20099 (35.83%) Loss: 1.558824 LR: 0.00002757 [11:00:25] Epoch: 1 Batch: 7203/20099 (35.84%) Loss: 2.433884 LR: 0.00002757 [11:00:27] Epoch: 1 Batch: 7204/20099 (35.84%) Loss: 2.164688 LR: 0.00002757 [11:00:28] Epoch: 1 Batch: 7205/20099 (35.85%) Loss: 1.967166 LR: 0.00002757 [11:00:30] Epoch: 1 Batch: 7206/20099 (35.85%) Loss: 2.246242 LR: 0.00002756 [11:00:32] Epoch: 1 Batch: 7207/20099 (35.86%) Loss: 2.368386 LR: 0.00002756 [11:00:34] Epoch: 1 Batch: 7208/20099 (35.86%) Loss: 2.173307 LR: 0.00002756 [11:00:36] Epoch: 1 Batch: 7209/20099 (35.87%) Loss: 2.270287 LR: 0.00002756 [11:00:37] Epoch: 1 Batch: 7210/20099 (35.87%) Loss: 1.827531 LR: 0.00002756 [11:00:39] Epoch: 1 Batch: 7211/20099 (35.88%) Loss: 2.061102 LR: 0.00002756 [11:00:41] Epoch: 1 Batch: 7212/20099 (35.88%) Loss: 2.073916 LR: 0.00002756 [11:00:43] Epoch: 1 Batch: 7213/20099 (35.89%) Loss: 1.890602 LR: 0.00002756 [11:00:44] Epoch: 1 Batch: 7214/20099 (35.89%) Loss: 1.724355 LR: 0.00002756 [11:00:46] Epoch: 1 Batch: 7215/20099 (35.90%) Loss: 1.820392 LR: 0.00002756 [11:00:48] Epoch: 1 Batch: 7216/20099 (35.90%) Loss: 2.252142 LR: 0.00002756 [11:00:50] Epoch: 1 Batch: 7217/20099 (35.91%) Loss: 2.047825 LR: 0.00002756 [11:00:52] Epoch: 1 Batch: 7218/20099 (35.91%) Loss: 1.849444 LR: 0.00002756 [11:00:53] Epoch: 1 Batch: 7219/20099 (35.92%) Loss: 1.876080 LR: 0.00002756 [11:00:55] Epoch: 1 Batch: 7220/20099 (35.92%) Loss: 2.430123 LR: 0.00002755 [11:00:57] Epoch: 1 Batch: 7221/20099 
(35.93%) Loss: 2.195468 LR: 0.00002755 [11:00:59] Epoch: 1 Batch: 7222/20099 (35.93%) Loss: 2.169407 LR: 0.00002755 [11:01:01] Epoch: 1 Batch: 7223/20099 (35.94%) Loss: 2.503976 LR: 0.00002755 [11:01:02] Epoch: 1 Batch: 7224/20099 (35.94%) Loss: 2.027979 LR: 0.00002755 [11:01:04] Epoch: 1 Batch: 7225/20099 (35.95%) Loss: 2.149176 LR: 0.00002755 [11:01:06] Epoch: 1 Batch: 7226/20099 (35.95%) Loss: 2.015510 LR: 0.00002755 [11:01:08] Epoch: 1 Batch: 7227/20099 (35.96%) Loss: 2.033658 LR: 0.00002754 [11:01:10] Epoch: 1 Batch: 7228/20099 (35.96%) Loss: 1.807357 LR: 0.00002754 [11:01:11] Epoch: 1 Batch: 7229/20099 (35.97%) Loss: 2.182154 LR: 0.00002754 [11:01:13] Epoch: 1 Batch: 7230/20099 (35.97%) Loss: 2.117984 LR: 0.00002754 [11:01:15] Epoch: 1 Batch: 7231/20099 (35.98%) Loss: 1.865798 LR: 0.00002754 [11:01:17] Epoch: 1 Batch: 7232/20099 (35.98%) Loss: 1.999234 LR: 0.00002754 [11:01:18] Epoch: 1 Batch: 7233/20099 (35.99%) Loss: 2.306871 LR: 0.00002754 [11:01:20] Epoch: 1 Batch: 7234/20099 (35.99%) Loss: 2.072221 LR: 0.00002753 [11:01:22] Epoch: 1 Batch: 7235/20099 (36.00%) Loss: 2.104891 LR: 0.00002753 [11:01:24] Epoch: 1 Batch: 7236/20099 (36.00%) Loss: 1.873299 LR: 0.00002753 [11:01:25] Epoch: 1 Batch: 7237/20099 (36.01%) Loss: 2.234660 LR: 0.00002753 [11:01:27] Epoch: 1 Batch: 7238/20099 (36.01%) Loss: 2.053027 LR: 0.00002753 [11:01:29] Epoch: 1 Batch: 7239/20099 (36.02%) Loss: 2.114897 LR: 0.00002753 [11:01:31] Epoch: 1 Batch: 7240/20099 (36.02%) Loss: 2.214556 LR: 0.00002753 [11:01:32] Epoch: 1 Batch: 7241/20099 (36.03%) Loss: 2.174130 LR: 0.00002752 [11:01:34] Epoch: 1 Batch: 7242/20099 (36.03%) Loss: 2.173335 LR: 0.00002752 [11:01:36] Epoch: 1 Batch: 7243/20099 (36.04%) Loss: 2.311701 LR: 0.00002752 [11:01:38] Epoch: 1 Batch: 7244/20099 (36.04%) Loss: 1.790541 LR: 0.00002752 [11:01:40] Epoch: 1 Batch: 7245/20099 (36.05%) Loss: 2.153057 LR: 0.00002752 [11:01:41] Epoch: 1 Batch: 7246/20099 (36.05%) Loss: 1.773124 LR: 0.00002752 [11:01:43] Epoch: 1 Batch: 
7247/20099 (36.06%) Loss: 2.099765 LR: 0.00002752 [11:01:45] Epoch: 1 Batch: 7248/20099 (36.06%) Loss: 2.230945 LR: 0.00002751 [11:01:47] Epoch: 1 Batch: 7249/20099 (36.07%) Loss: 1.850626 LR: 0.00002751 [11:01:48] Epoch: 1 Batch: 7250/20099 (36.07%) Loss: 2.211106 LR: 0.00002751 [11:01:50] Epoch: 1 Batch: 7251/20099 (36.08%) Loss: 1.983997 LR: 0.00002751 [11:01:52] Epoch: 1 Batch: 7252/20099 (36.08%) Loss: 2.175603 LR: 0.00002751 [11:01:54] Epoch: 1 Batch: 7253/20099 (36.09%) Loss: 2.614998 LR: 0.00002751 [11:01:55] Epoch: 1 Batch: 7254/20099 (36.09%) Loss: 1.670992 LR: 0.00002751 [11:01:57] Epoch: 1 Batch: 7255/20099 (36.10%) Loss: 1.889959 LR: 0.00002750 [11:01:59] Epoch: 1 Batch: 7256/20099 (36.10%) Loss: 2.342557 LR: 0.00002750 [11:02:01] Epoch: 1 Batch: 7257/20099 (36.11%) Loss: 1.913944 LR: 0.00002750 [11:02:03] Epoch: 1 Batch: 7258/20099 (36.11%) Loss: 1.942025 LR: 0.00002750 [11:02:04] Epoch: 1 Batch: 7259/20099 (36.12%) Loss: 2.215939 LR: 0.00002750 [11:02:06] Epoch: 1 Batch: 7260/20099 (36.12%) Loss: 2.183671 LR: 0.00002750 [11:02:08] Epoch: 1 Batch: 7261/20099 (36.13%) Loss: 1.947031 LR: 0.00002750 [11:02:10] Epoch: 1 Batch: 7262/20099 (36.13%) Loss: 2.305274 LR: 0.00002749 [11:02:11] Epoch: 1 Batch: 7263/20099 (36.14%) Loss: 2.355109 LR: 0.00002749 [11:02:13] Epoch: 1 Batch: 7264/20099 (36.14%) Loss: 2.185870 LR: 0.00002749 [11:02:15] Epoch: 1 Batch: 7265/20099 (36.15%) Loss: 1.741967 LR: 0.00002749 [11:02:17] Epoch: 1 Batch: 7266/20099 (36.15%) Loss: 2.281250 LR: 0.00002749 [11:02:19] Epoch: 1 Batch: 7267/20099 (36.16%) Loss: 2.231014 LR: 0.00002749 [11:02:20] Epoch: 1 Batch: 7268/20099 (36.16%) Loss: 2.058611 LR: 0.00002749 [11:02:22] Epoch: 1 Batch: 7269/20099 (36.17%) Loss: 1.955834 LR: 0.00002748 [11:02:24] Epoch: 1 Batch: 7270/20099 (36.17%) Loss: 2.215246 LR: 0.00002748 [11:02:26] Epoch: 1 Batch: 7271/20099 (36.18%) Loss: 2.024403 LR: 0.00002748 [11:02:28] Epoch: 1 Batch: 7272/20099 (36.18%) Loss: 2.058369 LR: 0.00002748 [11:02:29] Epoch: 1 
Batch: 7273/20099 (36.19%) Loss: 2.076151 LR: 0.00002748 [11:02:31] Epoch: 1 Batch: 7274/20099 (36.19%) Loss: 2.163307 LR: 0.00002748 [11:02:33] Epoch: 1 Batch: 7275/20099 (36.20%) Loss: 2.101662 LR: 0.00002748 [11:02:35] Epoch: 1 Batch: 7276/20099 (36.20%) Loss: 2.198657 LR: 0.00002747 [11:02:36] Epoch: 1 Batch: 7277/20099 (36.21%) Loss: 2.193222 LR: 0.00002747 [11:02:38] Epoch: 1 Batch: 7278/20099 (36.21%) Loss: 2.168489 LR: 0.00002747 [11:02:40] Epoch: 1 Batch: 7279/20099 (36.22%) Loss: 2.083658 LR: 0.00002747 [11:02:42] Epoch: 1 Batch: 7280/20099 (36.22%) Loss: 2.165561 LR: 0.00002747 [11:02:44] Epoch: 1 Batch: 7281/20099 (36.23%) Loss: 2.142388 LR: 0.00002747 [11:02:45] Epoch: 1 Batch: 7282/20099 (36.23%) Loss: 2.329437 LR: 0.00002747 [11:02:47] Epoch: 1 Batch: 7283/20099 (36.24%) Loss: 1.897173 LR: 0.00002746 [11:02:49] Epoch: 1 Batch: 7284/20099 (36.24%) Loss: 2.059953 LR: 0.00002746 [11:02:51] Epoch: 1 Batch: 7285/20099 (36.25%) Loss: 2.178367 LR: 0.00002746 [11:02:52] Epoch: 1 Batch: 7286/20099 (36.25%) Loss: 2.035744 LR: 0.00002746 [11:02:54] Epoch: 1 Batch: 7287/20099 (36.26%) Loss: 2.225251 LR: 0.00002746 [11:02:56] Epoch: 1 Batch: 7288/20099 (36.26%) Loss: 2.046050 LR: 0.00002746 [11:02:58] Epoch: 1 Batch: 7289/20099 (36.27%) Loss: 2.071908 LR: 0.00002746 [11:02:59] Epoch: 1 Batch: 7290/20099 (36.27%) Loss: 1.964508 LR: 0.00002745 [11:03:01] Epoch: 1 Batch: 7291/20099 (36.28%) Loss: 2.282103 LR: 0.00002745 [11:03:03] Epoch: 1 Batch: 7292/20099 (36.28%) Loss: 1.729223 LR: 0.00002745 [11:03:05] Epoch: 1 Batch: 7293/20099 (36.29%) Loss: 2.304574 LR: 0.00002745 [11:03:06] Epoch: 1 Batch: 7294/20099 (36.29%) Loss: 2.084528 LR: 0.00002745 [11:03:08] Epoch: 1 Batch: 7295/20099 (36.30%) Loss: 1.958475 LR: 0.00002745 [11:03:10] Epoch: 1 Batch: 7296/20099 (36.30%) Loss: 2.322032 LR: 0.00002745 [11:03:12] Epoch: 1 Batch: 7297/20099 (36.31%) Loss: 1.829064 LR: 0.00002744 [11:03:13] Epoch: 1 Batch: 7298/20099 (36.31%) Loss: 2.395888 LR: 0.00002744 [11:03:15] Epoch: 
1 Batch: 7299/20099 (36.32%) Loss: 2.153566 LR: 0.00002744 [11:03:17] Epoch: 1 Batch: 7300/20099 (36.32%) Loss: 2.315783 LR: 0.00002744 [11:03:19] Epoch: 1 Batch: 7301/20099 (36.33%) Loss: 2.189029 LR: 0.00002744 [11:03:20] Epoch: 1 Batch: 7302/20099 (36.33%) Loss: 2.030681 LR: 0.00002744 [11:03:22] Epoch: 1 Batch: 7303/20099 (36.34%) Loss: 1.994963 LR: 0.00002744 [11:03:24] Epoch: 1 Batch: 7304/20099 (36.34%) Loss: 2.298687 LR: 0.00002743 [11:03:26] Epoch: 1 Batch: 7305/20099 (36.35%) Loss: 2.360773 LR: 0.00002743 [11:03:28] Epoch: 1 Batch: 7306/20099 (36.35%) Loss: 2.041594 LR: 0.00002743 [11:03:29] Epoch: 1 Batch: 7307/20099 (36.36%) Loss: 2.206077 LR: 0.00002743 [11:03:31] Epoch: 1 Batch: 7308/20099 (36.36%) Loss: 1.931530 LR: 0.00002743 [11:03:33] Epoch: 1 Batch: 7309/20099 (36.36%) Loss: 2.277756 LR: 0.00002743 [11:03:35] Epoch: 1 Batch: 7310/20099 (36.37%) Loss: 2.103641 LR: 0.00002743 [11:03:36] Epoch: 1 Batch: 7311/20099 (36.37%) Loss: 2.077133 LR: 0.00002742 [11:03:38] Epoch: 1 Batch: 7312/20099 (36.38%) Loss: 2.407587 LR: 0.00002742 [11:03:40] Epoch: 1 Batch: 7313/20099 (36.38%) Loss: 2.292740 LR: 0.00002742 [11:03:42] Epoch: 1 Batch: 7314/20099 (36.39%) Loss: 2.230324 LR: 0.00002742 [11:03:43] Epoch: 1 Batch: 7315/20099 (36.39%) Loss: 2.192221 LR: 0.00002742 [11:03:45] Epoch: 1 Batch: 7316/20099 (36.40%) Loss: 2.060912 LR: 0.00002742 [11:03:47] Epoch: 1 Batch: 7317/20099 (36.40%) Loss: 1.764946 LR: 0.00002742 [11:03:49] Epoch: 1 Batch: 7318/20099 (36.41%) Loss: 2.140236 LR: 0.00002741 [11:03:51] Epoch: 1 Batch: 7319/20099 (36.41%) Loss: 2.143922 LR: 0.00002741 [11:03:52] Epoch: 1 Batch: 7320/20099 (36.42%) Loss: 2.177565 LR: 0.00002741 [11:03:54] Epoch: 1 Batch: 7321/20099 (36.42%) Loss: 2.258011 LR: 0.00002741 [11:03:56] Epoch: 1 Batch: 7322/20099 (36.43%) Loss: 2.198504 LR: 0.00002741 [11:03:58] Epoch: 1 Batch: 7323/20099 (36.43%) Loss: 2.121088 LR: 0.00002741 [11:03:59] Epoch: 1 Batch: 7324/20099 (36.44%) Loss: 2.039883 LR: 0.00002741 [11:04:01] 
Epoch: 1 Batch: 7325/20099 (36.44%) Loss: 2.040908 LR: 0.00002740 [11:04:03] Epoch: 1 Batch: 7326/20099 (36.45%) Loss: 1.960421 LR: 0.00002740 [11:04:05] Epoch: 1 Batch: 7327/20099 (36.45%) Loss: 1.960288 LR: 0.00002740 [11:04:06] Epoch: 1 Batch: 7328/20099 (36.46%) Loss: 1.937103 LR: 0.00002740 [11:04:08] Epoch: 1 Batch: 7329/20099 (36.46%) Loss: 2.225955 LR: 0.00002740 [11:04:10] Epoch: 1 Batch: 7330/20099 (36.47%) Loss: 1.913225 LR: 0.00002740 [11:04:12] Epoch: 1 Batch: 7331/20099 (36.47%) Loss: 2.049021 LR: 0.00002740 [11:04:14] Epoch: 1 Batch: 7332/20099 (36.48%) Loss: 2.165673 LR: 0.00002739 [11:04:15] Epoch: 1 Batch: 7333/20099 (36.48%) Loss: 2.459568 LR: 0.00002739 [11:04:17] Epoch: 1 Batch: 7334/20099 (36.49%) Loss: 2.368166 LR: 0.00002739 [11:04:19] Epoch: 1 Batch: 7335/20099 (36.49%) Loss: 1.881795 LR: 0.00002739 [11:04:21] Epoch: 1 Batch: 7336/20099 (36.50%) Loss: 2.334785 LR: 0.00002739 [11:04:23] Epoch: 1 Batch: 7337/20099 (36.50%) Loss: 1.741207 LR: 0.00002739 [11:04:24] Epoch: 1 Batch: 7338/20099 (36.51%) Loss: 2.043748 LR: 0.00002739 [11:04:26] Epoch: 1 Batch: 7339/20099 (36.51%) Loss: 1.828566 LR: 0.00002738 [11:04:28] Epoch: 1 Batch: 7340/20099 (36.52%) Loss: 2.568763 LR: 0.00002738 [11:04:30] Epoch: 1 Batch: 7341/20099 (36.52%) Loss: 1.944180 LR: 0.00002738 [11:04:31] Epoch: 1 Batch: 7342/20099 (36.53%) Loss: 2.065260 LR: 0.00002738 [11:04:33] Epoch: 1 Batch: 7343/20099 (36.53%) Loss: 2.118607 LR: 0.00002738 [11:04:35] Epoch: 1 Batch: 7344/20099 (36.54%) Loss: 1.989427 LR: 0.00002738 [11:04:37] Epoch: 1 Batch: 7345/20099 (36.54%) Loss: 2.393122 LR: 0.00002738 [11:04:39] Epoch: 1 Batch: 7346/20099 (36.55%) Loss: 2.249776 LR: 0.00002737 [11:04:40] Epoch: 1 Batch: 7347/20099 (36.55%) Loss: 1.997816 LR: 0.00002737 [11:04:42] Epoch: 1 Batch: 7348/20099 (36.56%) Loss: 1.891102 LR: 0.00002737 [11:04:44] Epoch: 1 Batch: 7349/20099 (36.56%) Loss: 2.059733 LR: 0.00002737 [11:04:46] Epoch: 1 Batch: 7350/20099 (36.57%) Loss: 2.374706 LR: 0.00002737 
[11:04:47] Epoch: 1 Batch: 7351/20099 (36.57%) Loss: 1.955818 LR: 0.00002737 [11:04:49] Epoch: 1 Batch: 7352/20099 (36.58%) Loss: 2.103321 LR: 0.00002737 [11:04:51] Epoch: 1 Batch: 7353/20099 (36.58%) Loss: 2.455685 LR: 0.00002736 [11:04:53] Epoch: 1 Batch: 7354/20099 (36.59%) Loss: 2.132526 LR: 0.00002736 [11:04:54] Epoch: 1 Batch: 7355/20099 (36.59%) Loss: 2.256558 LR: 0.00002736 [11:04:56] Epoch: 1 Batch: 7356/20099 (36.60%) Loss: 2.008852 LR: 0.00002736 [11:04:58] Epoch: 1 Batch: 7357/20099 (36.60%) Loss: 2.490409 LR: 0.00002736 [11:05:00] Epoch: 1 Batch: 7358/20099 (36.61%) Loss: 2.042322 LR: 0.00002736 [11:05:02] Epoch: 1 Batch: 7359/20099 (36.61%) Loss: 2.441477 LR: 0.00002736 [11:05:03] Epoch: 1 Batch: 7360/20099 (36.62%) Loss: 2.073596 LR: 0.00002734 [11:05:05] Epoch: 1 Batch: 7361/20099 (36.62%) Loss: 1.889905 LR: 0.00002734 [11:05:07] Epoch: 1 Batch: 7362/20099 (36.63%) Loss: 2.150015 LR: 0.00002734 [11:05:09] Epoch: 1 Batch: 7363/20099 (36.63%) Loss: 2.060184 LR: 0.00002734 [11:05:10] Epoch: 1 Batch: 7364/20099 (36.64%) Loss: 2.166988 LR: 0.00002734 [11:05:12] Epoch: 1 Batch: 7365/20099 (36.64%) Loss: 2.058955 LR: 0.00002734 [11:05:14] Epoch: 1 Batch: 7366/20099 (36.65%) Loss: 2.253374 LR: 0.00002734 [11:05:16] Epoch: 1 Batch: 7367/20099 (36.65%) Loss: 2.017129 LR: 0.00002733 [11:05:18] Epoch: 1 Batch: 7368/20099 (36.66%) Loss: 2.147307 LR: 0.00002733 [11:05:19] Epoch: 1 Batch: 7369/20099 (36.66%) Loss: 2.314949 LR: 0.00002733 [11:05:21] Epoch: 1 Batch: 7370/20099 (36.67%) Loss: 1.933322 LR: 0.00002733 [11:05:23] Epoch: 1 Batch: 7371/20099 (36.67%) Loss: 2.037072 LR: 0.00002733 [11:05:25] Epoch: 1 Batch: 7372/20099 (36.68%) Loss: 2.161930 LR: 0.00002733 [11:05:26] Epoch: 1 Batch: 7373/20099 (36.68%) Loss: 1.884749 LR: 0.00002733 [11:05:28] Epoch: 1 Batch: 7374/20099 (36.69%) Loss: 2.020182 LR: 0.00002732 [11:05:30] Epoch: 1 Batch: 7375/20099 (36.69%) Loss: 2.079051 LR: 0.00002732 [11:05:32] Epoch: 1 Batch: 7376/20099 (36.70%) Loss: 2.425427 LR: 
0.00002732 [11:05:33] Epoch: 1 Batch: 7377/20099 (36.70%) Loss: 2.373074 LR: 0.00002732 [11:05:35] Epoch: 1 Batch: 7378/20099 (36.71%) Loss: 2.169395 LR: 0.00002732 [11:05:37] Epoch: 1 Batch: 7379/20099 (36.71%) Loss: 1.989823 LR: 0.00002732 [11:05:39] Epoch: 1 Batch: 7380/20099 (36.72%) Loss: 2.305950 LR: 0.00002732 [11:05:41] Epoch: 1 Batch: 7381/20099 (36.72%) Loss: 2.165323 LR: 0.00002731 [11:05:42] Epoch: 1 Batch: 7382/20099 (36.73%) Loss: 1.983045 LR: 0.00002731 [11:05:44] Epoch: 1 Batch: 7383/20099 (36.73%) Loss: 2.156368 LR: 0.00002731 [11:05:46] Epoch: 1 Batch: 7384/20099 (36.74%) Loss: 1.932139 LR: 0.00002731 [11:05:48] Epoch: 1 Batch: 7385/20099 (36.74%) Loss: 2.303589 LR: 0.00002731 [11:05:49] Epoch: 1 Batch: 7386/20099 (36.75%) Loss: 2.435396 LR: 0.00002731 [11:05:51] Epoch: 1 Batch: 7387/20099 (36.75%) Loss: 2.065573 LR: 0.00002731 [11:05:53] Epoch: 1 Batch: 7388/20099 (36.76%) Loss: 2.034019 LR: 0.00002730 [11:05:55] Epoch: 1 Batch: 7389/20099 (36.76%) Loss: 1.919823 LR: 0.00002730 [11:05:56] Epoch: 1 Batch: 7390/20099 (36.77%) Loss: 1.870006 LR: 0.00002730 [11:05:58] Epoch: 1 Batch: 7391/20099 (36.77%) Loss: 1.962441 LR: 0.00002730 [11:06:00] Epoch: 1 Batch: 7392/20099 (36.78%) Loss: 1.956198 LR: 0.00002730 [11:06:02] Epoch: 1 Batch: 7393/20099 (36.78%) Loss: 2.429133 LR: 0.00002730 [11:06:04] Epoch: 1 Batch: 7394/20099 (36.79%) Loss: 1.901583 LR: 0.00002730 [11:06:05] Epoch: 1 Batch: 7395/20099 (36.79%) Loss: 2.145886 LR: 0.00002729 [11:06:07] Epoch: 1 Batch: 7396/20099 (36.80%) Loss: 2.215379 LR: 0.00002729 [11:06:09] Epoch: 1 Batch: 7397/20099 (36.80%) Loss: 1.915058 LR: 0.00002729 [11:06:11] Epoch: 1 Batch: 7398/20099 (36.81%) Loss: 2.024350 LR: 0.00002729 [11:06:12] Epoch: 1 Batch: 7399/20099 (36.81%) Loss: 2.156263 LR: 0.00002729 [11:06:17] >> Cleaned up old temp checkpoint: epoch1_step5400 [11:06:18] >> Temp checkpoint saved: epoch1_step7400, size: 0.1693 GB [11:06:18] Epoch: 1 Batch: 7400/20099 (36.82%) Loss: 2.016092 LR: 0.00002729 
[11:06:19] Epoch: 1 Batch: 7401/20099 (36.82%) Loss: 2.337456 LR: 0.00002729 [11:06:21] Epoch: 1 Batch: 7402/20099 (36.83%) Loss: 2.149627 LR: 0.00002728 [11:06:23] Epoch: 1 Batch: 7403/20099 (36.83%) Loss: 2.359048 LR: 0.00002728 [11:06:25] Epoch: 1 Batch: 7404/20099 (36.84%) Loss: 2.004697 LR: 0.00002728 [11:06:26] Epoch: 1 Batch: 7405/20099 (36.84%) Loss: 2.060101 LR: 0.00002728 [11:06:28] Epoch: 1 Batch: 7406/20099 (36.85%) Loss: 1.944252 LR: 0.00002728 [11:06:30] Epoch: 1 Batch: 7407/20099 (36.85%) Loss: 2.026157 LR: 0.00002728 [11:06:32] Epoch: 1 Batch: 7408/20099 (36.86%) Loss: 2.122943 LR: 0.00002728 [11:06:33] Epoch: 1 Batch: 7409/20099 (36.86%) Loss: 2.046132 LR: 0.00002727 [11:06:35] Epoch: 1 Batch: 7410/20099 (36.87%) Loss: 1.845685 LR: 0.00002727 [11:06:37] Epoch: 1 Batch: 7411/20099 (36.87%) Loss: 1.954907 LR: 0.00002727 [11:06:39] Epoch: 1 Batch: 7412/20099 (36.88%) Loss: 2.159217 LR: 0.00002727 [11:06:41] Epoch: 1 Batch: 7413/20099 (36.88%) Loss: 2.380630 LR: 0.00002727 [11:06:42] Epoch: 1 Batch: 7414/20099 (36.89%) Loss: 2.428357 LR: 0.00002727 [11:06:44] Epoch: 1 Batch: 7415/20099 (36.89%) Loss: 1.951288 LR: 0.00002727 [11:06:46] Epoch: 1 Batch: 7416/20099 (36.90%) Loss: 2.245491 LR: 0.00002726 [11:06:48] Epoch: 1 Batch: 7417/20099 (36.90%) Loss: 2.348791 LR: 0.00002726 [11:06:50] Epoch: 1 Batch: 7418/20099 (36.91%) Loss: 2.536381 LR: 0.00002726 [11:06:51] Epoch: 1 Batch: 7419/20099 (36.91%) Loss: 2.118656 LR: 0.00002726 [11:06:53] Epoch: 1 Batch: 7420/20099 (36.92%) Loss: 2.311234 LR: 0.00002726 [11:06:55] Epoch: 1 Batch: 7421/20099 (36.92%) Loss: 1.894537 LR: 0.00002726 [11:06:57] Epoch: 1 Batch: 7422/20099 (36.93%) Loss: 2.101614 LR: 0.00002726 [11:06:59] Epoch: 1 Batch: 7423/20099 (36.93%) Loss: 2.204845 LR: 0.00002725 [11:07:00] Epoch: 1 Batch: 7424/20099 (36.94%) Loss: 1.968376 LR: 0.00002725 [11:07:02] Epoch: 1 Batch: 7425/20099 (36.94%) Loss: 1.827072 LR: 0.00002725 [11:07:04] Epoch: 1 Batch: 7426/20099 (36.95%) Loss: 2.372280 LR: 
0.00002725 [11:07:06] Epoch: 1 Batch: 7427/20099 (36.95%) Loss: 2.292456 LR: 0.00002725 [11:07:07] Epoch: 1 Batch: 7428/20099 (36.96%) Loss: 2.074204 LR: 0.00002725 [11:07:09] Epoch: 1 Batch: 7429/20099 (36.96%) Loss: 2.052638 LR: 0.00002725 [11:07:11] Epoch: 1 Batch: 7430/20099 (36.97%) Loss: 2.054391 LR: 0.00002724 [11:07:13] Epoch: 1 Batch: 7431/20099 (36.97%) Loss: 1.967737 LR: 0.00002724 [11:07:15] Epoch: 1 Batch: 7432/20099 (36.98%) Loss: 1.993551 LR: 0.00002724 [11:07:16] Epoch: 1 Batch: 7433/20099 (36.98%) Loss: 1.916423 LR: 0.00002724 [11:07:18] Epoch: 1 Batch: 7434/20099 (36.99%) Loss: 2.107457 LR: 0.00002724 [11:07:20] Epoch: 1 Batch: 7435/20099 (36.99%) Loss: 1.944660 LR: 0.00002724 [11:07:22] Epoch: 1 Batch: 7436/20099 (37.00%) Loss: 1.902421 LR: 0.00002724 [11:07:23] Epoch: 1 Batch: 7437/20099 (37.00%) Loss: 1.856221 LR: 0.00002723 [11:07:25] Epoch: 1 Batch: 7438/20099 (37.01%) Loss: 2.150042 LR: 0.00002723 [11:07:27] Epoch: 1 Batch: 7439/20099 (37.01%) Loss: 2.070509 LR: 0.00002723 [11:07:29] Epoch: 1 Batch: 7440/20099 (37.02%) Loss: 1.954005 LR: 0.00002723 [11:07:30] Epoch: 1 Batch: 7441/20099 (37.02%) Loss: 2.303750 LR: 0.00002723 [11:07:32] Epoch: 1 Batch: 7442/20099 (37.03%) Loss: 2.315698 LR: 0.00002723 [11:07:34] Epoch: 1 Batch: 7443/20099 (37.03%) Loss: 1.845899 LR: 0.00002723 [11:07:36] Epoch: 1 Batch: 7444/20099 (37.04%) Loss: 2.262138 LR: 0.00002722 [11:07:37] Epoch: 1 Batch: 7445/20099 (37.04%) Loss: 2.173619 LR: 0.00002722 [11:07:39] Epoch: 1 Batch: 7446/20099 (37.05%) Loss: 1.970432 LR: 0.00002722 [11:07:41] Epoch: 1 Batch: 7447/20099 (37.05%) Loss: 1.899911 LR: 0.00002722 [11:07:43] Epoch: 1 Batch: 7448/20099 (37.06%) Loss: 2.042604 LR: 0.00002722 [11:07:44] Epoch: 1 Batch: 7449/20099 (37.06%) Loss: 1.956385 LR: 0.00002722 [11:07:46] Epoch: 1 Batch: 7450/20099 (37.07%) Loss: 2.318150 LR: 0.00002722 [11:07:48] Epoch: 1 Batch: 7451/20099 (37.07%) Loss: 2.254893 LR: 0.00002721 [11:07:50] Epoch: 1 Batch: 7452/20099 (37.08%) Loss: 2.257835 
LR: 0.00002721 [11:07:52] Epoch: 1 Batch: 7453/20099 (37.08%) Loss: 2.116762 LR: 0.00002721 [11:07:53] Epoch: 1 Batch: 7454/20099 (37.09%) Loss: 2.193834 LR: 0.00002721 [11:07:55] Epoch: 1 Batch: 7455/20099 (37.09%) Loss: 1.999442 LR: 0.00002721 [11:07:57] Epoch: 1 Batch: 7456/20099 (37.10%) Loss: 2.379193 LR: 0.00002721 [11:07:59] Epoch: 1 Batch: 7457/20099 (37.10%) Loss: 1.932035 LR: 0.00002721 [11:08:00] Epoch: 1 Batch: 7458/20099 (37.11%) Loss: 2.186571 LR: 0.00002720 [11:08:02] Epoch: 1 Batch: 7459/20099 (37.11%) Loss: 2.167314 LR: 0.00002720 [11:08:04] Epoch: 1 Batch: 7460/20099 (37.12%) Loss: 2.212732 LR: 0.00002720 [11:08:06] Epoch: 1 Batch: 7461/20099 (37.12%) Loss: 2.367903 LR: 0.00002720 [11:08:08] Epoch: 1 Batch: 7462/20099 (37.13%) Loss: 1.958687 LR: 0.00002720 [11:08:09] Epoch: 1 Batch: 7463/20099 (37.13%) Loss: 1.872130 LR: 0.00002720 [11:08:11] Epoch: 1 Batch: 7464/20099 (37.14%) Loss: 2.601503 LR: 0.00002720 [11:08:13] Epoch: 1 Batch: 7465/20099 (37.14%) Loss: 2.295963 LR: 0.00002719 [11:08:15] Epoch: 1 Batch: 7466/20099 (37.15%) Loss: 2.043216 LR: 0.00002719 [11:08:17] Epoch: 1 Batch: 7467/20099 (37.15%) Loss: 2.102214 LR: 0.00002719 [11:08:18] Epoch: 1 Batch: 7468/20099 (37.16%) Loss: 2.052921 LR: 0.00002719 [11:08:20] Epoch: 1 Batch: 7469/20099 (37.16%) Loss: 1.852019 LR: 0.00002719 [11:08:22] Epoch: 1 Batch: 7470/20099 (37.17%) Loss: 2.250102 LR: 0.00002719 [11:08:24] Epoch: 1 Batch: 7471/20099 (37.17%) Loss: 1.948229 LR: 0.00002719 [11:08:25] Epoch: 1 Batch: 7472/20099 (37.18%) Loss: 2.114219 LR: 0.00002718 [11:08:27] Epoch: 1 Batch: 7473/20099 (37.18%) Loss: 2.275443 LR: 0.00002718 [11:08:29] Epoch: 1 Batch: 7474/20099 (37.19%) Loss: 1.935437 LR: 0.00002718 [11:08:31] Epoch: 1 Batch: 7475/20099 (37.19%) Loss: 2.201656 LR: 0.00002718 [11:08:33] Epoch: 1 Batch: 7476/20099 (37.20%) Loss: 2.326701 LR: 0.00002718 [11:08:34] Epoch: 1 Batch: 7477/20099 (37.20%) Loss: 2.116198 LR: 0.00002718 [11:08:36] Epoch: 1 Batch: 7478/20099 (37.21%) Loss: 
2.382480 LR: 0.00002718 [11:08:38] Epoch: 1 Batch: 7479/20099 (37.21%) Loss: 2.061540 LR: 0.00002717 [11:08:40] Epoch: 1 Batch: 7480/20099 (37.22%) Loss: 2.088738 LR: 0.00002717 [11:08:41] Epoch: 1 Batch: 7481/20099 (37.22%) Loss: 1.951484 LR: 0.00002717 [11:08:43] Epoch: 1 Batch: 7482/20099 (37.23%) Loss: 2.173810 LR: 0.00002717 [11:08:45] Epoch: 1 Batch: 7483/20099 (37.23%) Loss: 1.944673 LR: 0.00002717 [11:08:47] Epoch: 1 Batch: 7484/20099 (37.24%) Loss: 2.021167 LR: 0.00002717 [11:08:48] Epoch: 1 Batch: 7485/20099 (37.24%) Loss: 2.296095 LR: 0.00002717 [11:08:50] Epoch: 1 Batch: 7486/20099 (37.25%) Loss: 2.220359 LR: 0.00002716 [11:08:52] Epoch: 1 Batch: 7487/20099 (37.25%) Loss: 2.195253 LR: 0.00002716 [11:08:54] Epoch: 1 Batch: 7488/20099 (37.26%) Loss: 2.040570 LR: 0.00002716 [11:08:56] Epoch: 1 Batch: 7489/20099 (37.26%) Loss: 2.013252 LR: 0.00002716 [11:08:57] Epoch: 1 Batch: 7490/20099 (37.27%) Loss: 2.316216 LR: 0.00002716 [11:08:59] Epoch: 1 Batch: 7491/20099 (37.27%) Loss: 2.182035 LR: 0.00002716 [11:09:01] Epoch: 1 Batch: 7492/20099 (37.28%) Loss: 1.897231 LR: 0.00002716 [11:09:03] Epoch: 1 Batch: 7493/20099 (37.28%) Loss: 2.160485 LR: 0.00002715 [11:09:04] Epoch: 1 Batch: 7494/20099 (37.29%) Loss: 2.211513 LR: 0.00002715 [11:09:06] Epoch: 1 Batch: 7495/20099 (37.29%) Loss: 2.043427 LR: 0.00002715 [11:09:08] Epoch: 1 Batch: 7496/20099 (37.30%) Loss: 2.153075 LR: 0.00002715 [11:09:10] Epoch: 1 Batch: 7497/20099 (37.30%) Loss: 2.147776 LR: 0.00002715 [11:09:11] Epoch: 1 Batch: 7498/20099 (37.31%) Loss: 2.262820 LR: 0.00002715 [11:09:13] Epoch: 1 Batch: 7499/20099 (37.31%) Loss: 2.286182 LR: 0.00002715 [11:09:15] >> Evaluating batch 0 [11:09:16] >> Evaluating batch 1 [11:09:17] >> Evaluating batch 2 [11:09:18] >> Evaluating batch 3 [11:09:19] >> Evaluating batch 4 [11:09:20] >> Evaluating batch 5 [11:09:21] >> Evaluating batch 6 [11:09:22] >> Evaluating batch 7 [11:09:23] >> Evaluating batch 8 [11:09:24] >> Evaluating batch 9 [11:09:25] >> Evaluating 
batch 10 [11:09:26] >> Evaluating batch 11 [11:09:27] >> Evaluating batch 12 [11:09:28] >> Evaluating batch 13 [11:09:29] >> Evaluating batch 14 [11:09:30] >> Evaluating batch 15 [11:09:31] >> Evaluating batch 16 [11:09:31] Epoch: 1 Step: 7500/20099 Evaluation: [11:09:31] Avg Loss Since Last Eval: 2.1102 Val Loss: 2.1859 Validation loss delta: 0.0017 Perplexity: 8.8985 LR: 0.00002714 [11:09:35] >> Checkpoint saved: epoch1_step7500, size: 0.1693 GB [11:09:35] Epoch: 1 Batch: 7500/20099 (37.32%) Loss: 2.090431 LR: 0.00002714 [11:09:37] Epoch: 1 Batch: 7501/20099 (37.32%) Loss: 1.955598 LR: 0.00002714 [11:09:39] Epoch: 1 Batch: 7502/20099 (37.33%) Loss: 2.319316 LR: 0.00002714 [11:09:40] Epoch: 1 Batch: 7503/20099 (37.33%) Loss: 2.227690 LR: 0.00002714 [11:09:42] Epoch: 1 Batch: 7504/20099 (37.34%) Loss: 2.190015 LR: 0.00002714 [11:09:44] Epoch: 1 Batch: 7505/20099 (37.34%) Loss: 2.040753 LR: 0.00002714 [11:09:45] Epoch: 1 Batch: 7506/20099 (37.35%) Loss: 2.369993 LR: 0.00002714 [11:09:47] Epoch: 1 Batch: 7507/20099 (37.35%) Loss: 1.926050 LR: 0.00002713 [11:09:49] Epoch: 1 Batch: 7508/20099 (37.36%) Loss: 2.117378 LR: 0.00002713 [11:09:51] Epoch: 1 Batch: 7509/20099 (37.36%) Loss: 2.227285 LR: 0.00002713 [11:09:53] Epoch: 1 Batch: 7510/20099 (37.37%) Loss: 2.096868 LR: 0.00002713 [11:09:54] Epoch: 1 Batch: 7511/20099 (37.37%) Loss: 2.033343 LR: 0.00002713 [11:09:56] Epoch: 1 Batch: 7512/20099 (37.37%) Loss: 2.199976 LR: 0.00002713 [11:09:58] Epoch: 1 Batch: 7513/20099 (37.38%) Loss: 2.188995 LR: 0.00002713 [11:10:00] Epoch: 1 Batch: 7514/20099 (37.38%) Loss: 2.311489 LR: 0.00002712 [11:10:02] Epoch: 1 Batch: 7515/20099 (37.39%) Loss: 2.041385 LR: 0.00002712 [11:10:03] Epoch: 1 Batch: 7516/20099 (37.39%) Loss: 2.275685 LR: 0.00002712 [11:10:05] Epoch: 1 Batch: 7517/20099 (37.40%) Loss: 2.287780 LR: 0.00002712 [11:10:07] Epoch: 1 Batch: 7518/20099 (37.40%) Loss: 2.242068 LR: 0.00002712 [11:10:09] Epoch: 1 Batch: 7519/20099 (37.41%) Loss: 2.205517 LR: 0.00002712 
[11:10:11] Epoch: 1 Batch: 7520/20099 (37.41%) Loss: 2.073525 LR: 0.00002712 [11:10:12] Epoch: 1 Batch: 7521/20099 (37.42%) Loss: 2.185176 LR: 0.00002711 [11:10:14] Epoch: 1 Batch: 7522/20099 (37.42%) Loss: 2.105930 LR: 0.00002711 [11:10:16] Epoch: 1 Batch: 7523/20099 (37.43%) Loss: 1.708417 LR: 0.00002711 [11:10:18] Epoch: 1 Batch: 7524/20099 (37.43%) Loss: 2.339002 LR: 0.00002711 [11:10:19] Epoch: 1 Batch: 7525/20099 (37.44%) Loss: 2.114140 LR: 0.00002711 [11:10:21] Epoch: 1 Batch: 7526/20099 (37.44%) Loss: 2.012085 LR: 0.00002711 [11:10:23] Epoch: 1 Batch: 7527/20099 (37.45%) Loss: 2.172597 LR: 0.00002711 [11:10:25] Epoch: 1 Batch: 7528/20099 (37.45%) Loss: 1.857878 LR: 0.00002710 [11:10:27] Epoch: 1 Batch: 7529/20099 (37.46%) Loss: 1.825968 LR: 0.00002710 [11:10:28] Epoch: 1 Batch: 7530/20099 (37.46%) Loss: 2.057885 LR: 0.00002710 [11:10:30] Epoch: 1 Batch: 7531/20099 (37.47%) Loss: 2.106778 LR: 0.00002710 [11:10:32] Epoch: 1 Batch: 7532/20099 (37.47%) Loss: 2.127872 LR: 0.00002710 [11:10:34] Epoch: 1 Batch: 7533/20099 (37.48%) Loss: 2.088772 LR: 0.00002710 [11:10:35] Epoch: 1 Batch: 7534/20099 (37.48%) Loss: 2.615045 LR: 0.00002710 [11:10:37] Epoch: 1 Batch: 7535/20099 (37.49%) Loss: 2.219585 LR: 0.00002708 [11:10:39] Epoch: 1 Batch: 7536/20099 (37.49%) Loss: 2.349776 LR: 0.00002708 [11:10:41] Epoch: 1 Batch: 7537/20099 (37.50%) Loss: 2.065005 LR: 0.00002708 [11:10:43] Epoch: 1 Batch: 7538/20099 (37.50%) Loss: 1.936078 LR: 0.00002708 [11:10:44] Epoch: 1 Batch: 7539/20099 (37.51%) Loss: 2.255954 LR: 0.00002708 [11:10:46] Epoch: 1 Batch: 7540/20099 (37.51%) Loss: 2.014580 LR: 0.00002708 [11:10:48] Epoch: 1 Batch: 7541/20099 (37.52%) Loss: 2.095209 LR: 0.00002708 [11:10:50] Epoch: 1 Batch: 7542/20099 (37.52%) Loss: 2.124031 LR: 0.00002707 [11:10:51] Epoch: 1 Batch: 7543/20099 (37.53%) Loss: 1.980767 LR: 0.00002707 [11:10:53] Epoch: 1 Batch: 7544/20099 (37.53%) Loss: 2.080734 LR: 0.00002707 [11:10:55] Epoch: 1 Batch: 7545/20099 (37.54%) Loss: 2.015906 LR: 
0.00002707 [11:10:57] Epoch: 1 Batch: 7546/20099 (37.54%) Loss: 2.184129 LR: 0.00002707 [11:10:58] Epoch: 1 Batch: 7547/20099 (37.55%) Loss: 2.085321 LR: 0.00002707 [11:11:00] Epoch: 1 Batch: 7548/20099 (37.55%) Loss: 2.022751 LR: 0.00002707 [11:11:02] Epoch: 1 Batch: 7549/20099 (37.56%) Loss: 2.092518 LR: 0.00002706 [11:11:04] Epoch: 1 Batch: 7550/20099 (37.56%) Loss: 2.152174 LR: 0.00002706 [11:11:05] Epoch: 1 Batch: 7551/20099 (37.57%) Loss: 2.195509 LR: 0.00002706 [11:11:07] Epoch: 1 Batch: 7552/20099 (37.57%) Loss: 2.014400 LR: 0.00002706 [11:11:09] Epoch: 1 Batch: 7553/20099 (37.58%) Loss: 2.247091 LR: 0.00002706 [11:11:11] Epoch: 1 Batch: 7554/20099 (37.58%) Loss: 2.333678 LR: 0.00002706 [11:11:12] Epoch: 1 Batch: 7555/20099 (37.59%) Loss: 2.180648 LR: 0.00002706 [11:11:14] Epoch: 1 Batch: 7556/20099 (37.59%) Loss: 2.145382 LR: 0.00002705 [11:11:16] Epoch: 1 Batch: 7557/20099 (37.60%) Loss: 2.159431 LR: 0.00002705 [11:11:18] Epoch: 1 Batch: 7558/20099 (37.60%) Loss: 1.994300 LR: 0.00002705 [11:11:20] Epoch: 1 Batch: 7559/20099 (37.61%) Loss: 1.882058 LR: 0.00002705 [11:11:21] Epoch: 1 Batch: 7560/20099 (37.61%) Loss: 2.333821 LR: 0.00002705 [11:11:23] Epoch: 1 Batch: 7561/20099 (37.62%) Loss: 2.053298 LR: 0.00002705 [11:11:25] Epoch: 1 Batch: 7562/20099 (37.62%) Loss: 2.077059 LR: 0.00002705 [11:11:27] Epoch: 1 Batch: 7563/20099 (37.63%) Loss: 2.191625 LR: 0.00002704 [11:11:28] Epoch: 1 Batch: 7564/20099 (37.63%) Loss: 2.071985 LR: 0.00002704 [11:11:30] Epoch: 1 Batch: 7565/20099 (37.64%) Loss: 2.012497 LR: 0.00002704 [11:11:32] Epoch: 1 Batch: 7566/20099 (37.64%) Loss: 2.005894 LR: 0.00002704 [11:11:34] Epoch: 1 Batch: 7567/20099 (37.65%) Loss: 1.900437 LR: 0.00002704 [11:11:36] Epoch: 1 Batch: 7568/20099 (37.65%) Loss: 2.063943 LR: 0.00002704 [11:11:37] Epoch: 1 Batch: 7569/20099 (37.66%) Loss: 2.102217 LR: 0.00002704 [11:11:39] Epoch: 1 Batch: 7570/20099 (37.66%) Loss: 2.223043 LR: 0.00002703 [11:11:41] Epoch: 1 Batch: 7571/20099 (37.67%) Loss: 2.417304 
LR: 0.00002703 [11:11:43] Epoch: 1 Batch: 7572/20099 (37.67%) Loss: 2.447787 LR: 0.00002703 [11:11:45] Epoch: 1 Batch: 7573/20099 (37.68%) Loss: 2.257261 LR: 0.00002703 [11:11:46] Epoch: 1 Batch: 7574/20099 (37.68%) Loss: 2.252428 LR: 0.00002703 [11:11:48] Epoch: 1 Batch: 7575/20099 (37.69%) Loss: 2.164909 LR: 0.00002703 [11:11:50] Epoch: 1 Batch: 7576/20099 (37.69%) Loss: 2.015445 LR: 0.00002703 [11:11:52] Epoch: 1 Batch: 7577/20099 (37.70%) Loss: 2.096794 LR: 0.00002702 [11:11:53] Epoch: 1 Batch: 7578/20099 (37.70%) Loss: 2.184376 LR: 0.00002702 [11:11:55] Epoch: 1 Batch: 7579/20099 (37.71%) Loss: 2.165411 LR: 0.00002702 [11:11:57] Epoch: 1 Batch: 7580/20099 (37.71%) Loss: 2.326453 LR: 0.00002702 [11:11:59] Epoch: 1 Batch: 7581/20099 (37.72%) Loss: 2.004523 LR: 0.00002702 [11:12:01] Epoch: 1 Batch: 7582/20099 (37.72%) Loss: 2.204636 LR: 0.00002702 [11:12:02] Epoch: 1 Batch: 7583/20099 (37.73%) Loss: 2.211407 LR: 0.00002702 [11:12:04] Epoch: 1 Batch: 7584/20099 (37.73%) Loss: 2.026673 LR: 0.00002701 [11:12:06] Epoch: 1 Batch: 7585/20099 (37.74%) Loss: 2.117855 LR: 0.00002701 [11:12:08] Epoch: 1 Batch: 7586/20099 (37.74%) Loss: 2.178208 LR: 0.00002701 [11:12:09] Epoch: 1 Batch: 7587/20099 (37.75%) Loss: 2.148467 LR: 0.00002701 [11:12:11] Epoch: 1 Batch: 7588/20099 (37.75%) Loss: 2.113534 LR: 0.00002701 [11:12:13] Epoch: 1 Batch: 7589/20099 (37.76%) Loss: 2.119609 LR: 0.00002701 [11:12:15] Epoch: 1 Batch: 7590/20099 (37.76%) Loss: 2.158542 LR: 0.00002701 [11:12:16] Epoch: 1 Batch: 7591/20099 (37.77%) Loss: 2.497913 LR: 0.00002700 [11:12:18] Epoch: 1 Batch: 7592/20099 (37.77%) Loss: 2.291085 LR: 0.00002700 [11:12:20] Epoch: 1 Batch: 7593/20099 (37.78%) Loss: 1.968984 LR: 0.00002700 [11:12:22] Epoch: 1 Batch: 7594/20099 (37.78%) Loss: 2.307407 LR: 0.00002700 [11:12:24] Epoch: 1 Batch: 7595/20099 (37.79%) Loss: 2.002225 LR: 0.00002700 [11:12:25] Epoch: 1 Batch: 7596/20099 (37.79%) Loss: 2.268226 LR: 0.00002700 [11:12:27] Epoch: 1 Batch: 7597/20099 (37.80%) Loss: 
2.031996 LR: 0.00002700 [11:12:29] Epoch: 1 Batch: 7598/20099 (37.80%) Loss: 2.295527 LR: 0.00002699 [11:12:31] Epoch: 1 Batch: 7599/20099 (37.81%) Loss: 2.176536 LR: 0.00002699 [11:12:36] >> Cleaned up old temp checkpoint: epoch1_step5600 [11:12:36] >> Temp checkpoint saved: epoch1_step7600, size: 0.1693 GB [11:12:36] Epoch: 1 Batch: 7600/20099 (37.81%) Loss: 2.354165 LR: 0.00002699 [11:12:38] Epoch: 1 Batch: 7601/20099 (37.82%) Loss: 2.289201 LR: 0.00002699 [11:12:39] Epoch: 1 Batch: 7602/20099 (37.82%) Loss: 2.250635 LR: 0.00002699 [11:12:41] Epoch: 1 Batch: 7603/20099 (37.83%) Loss: 2.531839 LR: 0.00002699 [11:12:43] Epoch: 1 Batch: 7604/20099 (37.83%) Loss: 1.831382 LR: 0.00002699 [11:12:45] Epoch: 1 Batch: 7605/20099 (37.84%) Loss: 2.020417 LR: 0.00002698 [11:12:46] Epoch: 1 Batch: 7606/20099 (37.84%) Loss: 2.043750 LR: 0.00002698 [11:12:48] Epoch: 1 Batch: 7607/20099 (37.85%) Loss: 2.200810 LR: 0.00002698 [11:12:50] Epoch: 1 Batch: 7608/20099 (37.85%) Loss: 2.257697 LR: 0.00002698 [11:12:52] Epoch: 1 Batch: 7609/20099 (37.86%) Loss: 1.891204 LR: 0.00002698 [11:12:53] Epoch: 1 Batch: 7610/20099 (37.86%) Loss: 2.176152 LR: 0.00002698 [11:12:55] Epoch: 1 Batch: 7611/20099 (37.87%) Loss: 2.197040 LR: 0.00002698 [11:12:57] Epoch: 1 Batch: 7612/20099 (37.87%) Loss: 2.117419 LR: 0.00002697 [11:12:59] Epoch: 1 Batch: 7613/20099 (37.88%) Loss: 2.373375 LR: 0.00002697 [11:13:00] Epoch: 1 Batch: 7614/20099 (37.88%) Loss: 2.016345 LR: 0.00002697 [11:13:02] Epoch: 1 Batch: 7615/20099 (37.89%) Loss: 1.747656 LR: 0.00002697 [11:13:04] Epoch: 1 Batch: 7616/20099 (37.89%) Loss: 2.432808 LR: 0.00002697 [11:13:06] Epoch: 1 Batch: 7617/20099 (37.90%) Loss: 2.040421 LR: 0.00002697 [11:13:08] Epoch: 1 Batch: 7618/20099 (37.90%) Loss: 2.177718 LR: 0.00002697 [11:13:10] Epoch: 1 Batch: 7619/20099 (37.91%) Loss: 2.337139 LR: 0.00002696 [11:13:11] Epoch: 1 Batch: 7620/20099 (37.91%) Loss: 2.133511 LR: 0.00002696 [11:13:13] Epoch: 1 Batch: 7621/20099 (37.92%) Loss: 1.979455 LR: 
0.00002696 [11:13:15] Epoch: 1 Batch: 7622/20099 (37.92%) Loss: 2.097342 LR: 0.00002696 [11:13:17] Epoch: 1 Batch: 7623/20099 (37.93%) Loss: 2.086156 LR: 0.00002696 [11:13:18] Epoch: 1 Batch: 7624/20099 (37.93%) Loss: 1.882397 LR: 0.00002696 [11:13:20] Epoch: 1 Batch: 7625/20099 (37.94%) Loss: 2.079837 LR: 0.00002696 [11:13:22] Epoch: 1 Batch: 7626/20099 (37.94%) Loss: 2.288219 LR: 0.00002695 [11:13:24] Epoch: 1 Batch: 7627/20099 (37.95%) Loss: 2.173043 LR: 0.00002695 [11:13:26] Epoch: 1 Batch: 7628/20099 (37.95%) Loss: 1.995374 LR: 0.00002695 [11:13:27] Epoch: 1 Batch: 7629/20099 (37.96%) Loss: 2.305907 LR: 0.00002695 [11:13:29] Epoch: 1 Batch: 7630/20099 (37.96%) Loss: 2.060481 LR: 0.00002695 [11:13:31] Epoch: 1 Batch: 7631/20099 (37.97%) Loss: 2.371988 LR: 0.00002695 [11:13:33] Epoch: 1 Batch: 7632/20099 (37.97%) Loss: 1.937420 LR: 0.00002695 [11:13:34] Epoch: 1 Batch: 7633/20099 (37.98%) Loss: 1.954404 LR: 0.00002693 [11:13:36] Epoch: 1 Batch: 7634/20099 (37.98%) Loss: 1.744549 LR: 0.00002693 [11:13:38] Epoch: 1 Batch: 7635/20099 (37.99%) Loss: 2.177539 LR: 0.00002693 [11:13:40] Epoch: 1 Batch: 7636/20099 (37.99%) Loss: 2.483406 LR: 0.00002693 [11:13:42] Epoch: 1 Batch: 7637/20099 (38.00%) Loss: 2.232841 LR: 0.00002693 [11:13:43] Epoch: 1 Batch: 7638/20099 (38.00%) Loss: 2.291785 LR: 0.00002693 [11:13:45] Epoch: 1 Batch: 7639/20099 (38.01%) Loss: 2.215655 LR: 0.00002693 [11:13:47] Epoch: 1 Batch: 7640/20099 (38.01%) Loss: 2.175951 LR: 0.00002692 [11:13:49] Epoch: 1 Batch: 7641/20099 (38.02%) Loss: 1.911804 LR: 0.00002692 [11:13:50] Epoch: 1 Batch: 7642/20099 (38.02%) Loss: 2.115698 LR: 0.00002692 [11:13:52] Epoch: 1 Batch: 7643/20099 (38.03%) Loss: 1.978925 LR: 0.00002692 [11:13:54] Epoch: 1 Batch: 7644/20099 (38.03%) Loss: 1.972691 LR: 0.00002692 [11:13:56] Epoch: 1 Batch: 7645/20099 (38.04%) Loss: 1.885351 LR: 0.00002692 [11:13:57] Epoch: 1 Batch: 7646/20099 (38.04%) Loss: 2.350190 LR: 0.00002692 [11:13:59] Epoch: 1 Batch: 7647/20099 (38.05%) Loss: 1.906459 
LR: 0.00002691 [11:14:01] Epoch: 1 Batch: 7648/20099 (38.05%) Loss: 1.952194 LR: 0.00002691 [11:14:03] Epoch: 1 Batch: 7649/20099 (38.06%) Loss: 1.868584 LR: 0.00002691 [11:14:04] Epoch: 1 Batch: 7650/20099 (38.06%) Loss: 2.126197 LR: 0.00002691 [11:14:06] Epoch: 1 Batch: 7651/20099 (38.07%) Loss: 1.895261 LR: 0.00002691 [11:14:08] Epoch: 1 Batch: 7652/20099 (38.07%) Loss: 1.957092 LR: 0.00002691 [11:14:10] Epoch: 1 Batch: 7653/20099 (38.08%) Loss: 2.063932 LR: 0.00002691 [11:14:12] Epoch: 1 Batch: 7654/20099 (38.08%) Loss: 2.307883 LR: 0.00002690 [11:14:13] Epoch: 1 Batch: 7655/20099 (38.09%) Loss: 2.189895 LR: 0.00002690 [11:14:15] Epoch: 1 Batch: 7656/20099 (38.09%) Loss: 2.029241 LR: 0.00002690 [11:14:17] Epoch: 1 Batch: 7657/20099 (38.10%) Loss: 2.154032 LR: 0.00002690 [11:14:19] Epoch: 1 Batch: 7658/20099 (38.10%) Loss: 2.189305 LR: 0.00002690 [11:14:20] Epoch: 1 Batch: 7659/20099 (38.11%) Loss: 2.065312 LR: 0.00002690 [11:14:22] Epoch: 1 Batch: 7660/20099 (38.11%) Loss: 1.970588 LR: 0.00002690 [11:14:24] Epoch: 1 Batch: 7661/20099 (38.12%) Loss: 2.056466 LR: 0.00002689 [11:14:26] Epoch: 1 Batch: 7662/20099 (38.12%) Loss: 2.047432 LR: 0.00002689 [11:14:28] Epoch: 1 Batch: 7663/20099 (38.13%) Loss: 2.239151 LR: 0.00002689 [11:14:29] Epoch: 1 Batch: 7664/20099 (38.13%) Loss: 2.113254 LR: 0.00002689 [11:14:31] Epoch: 1 Batch: 7665/20099 (38.14%) Loss: 2.092138 LR: 0.00002689 [11:14:33] Epoch: 1 Batch: 7666/20099 (38.14%) Loss: 2.156407 LR: 0.00002689 [11:14:35] Epoch: 1 Batch: 7667/20099 (38.15%) Loss: 2.110400 LR: 0.00002689 [11:14:36] Epoch: 1 Batch: 7668/20099 (38.15%) Loss: 2.103272 LR: 0.00002688 [11:14:38] Epoch: 1 Batch: 7669/20099 (38.16%) Loss: 2.102816 LR: 0.00002688 [11:14:40] Epoch: 1 Batch: 7670/20099 (38.16%) Loss: 1.998699 LR: 0.00002688 [11:14:42] Epoch: 1 Batch: 7671/20099 (38.17%) Loss: 2.225369 LR: 0.00002688 [11:14:44] Epoch: 1 Batch: 7672/20099 (38.17%) Loss: 2.252201 LR: 0.00002688 [11:14:45] Epoch: 1 Batch: 7673/20099 (38.18%) Loss: 
2.306614 LR: 0.00002688 [11:14:47] Epoch: 1 Batch: 7674/20099 (38.18%) Loss: 2.108495 LR: 0.00002688 [11:14:49] Epoch: 1 Batch: 7675/20099 (38.19%) Loss: 2.256114 LR: 0.00002687 [11:14:51] Epoch: 1 Batch: 7676/20099 (38.19%) Loss: 2.040156 LR: 0.00002687 [11:14:52] Epoch: 1 Batch: 7677/20099 (38.20%) Loss: 2.105298 LR: 0.00002687 [11:14:54] Epoch: 1 Batch: 7678/20099 (38.20%) Loss: 2.163300 LR: 0.00002687 [11:14:56] Epoch: 1 Batch: 7679/20099 (38.21%) Loss: 2.217394 LR: 0.00002687 [11:14:58] Epoch: 1 Batch: 7680/20099 (38.21%) Loss: 2.301324 LR: 0.00002687 [11:15:00] Epoch: 1 Batch: 7681/20099 (38.22%) Loss: 2.013013 LR: 0.00002687 [11:15:01] Epoch: 1 Batch: 7682/20099 (38.22%) Loss: 2.197766 LR: 0.00002686 [11:15:03] Epoch: 1 Batch: 7683/20099 (38.23%) Loss: 2.236234 LR: 0.00002686 [11:15:05] Epoch: 1 Batch: 7684/20099 (38.23%) Loss: 1.973678 LR: 0.00002686 [11:15:07] Epoch: 1 Batch: 7685/20099 (38.24%) Loss: 2.045532 LR: 0.00002686 [11:15:08] Epoch: 1 Batch: 7686/20099 (38.24%) Loss: 2.117994 LR: 0.00002686 [11:15:10] Epoch: 1 Batch: 7687/20099 (38.25%) Loss: 2.127974 LR: 0.00002686 [11:15:12] Epoch: 1 Batch: 7688/20099 (38.25%) Loss: 1.829855 LR: 0.00002686 [11:15:14] Epoch: 1 Batch: 7689/20099 (38.26%) Loss: 2.152765 LR: 0.00002685 [11:15:15] Epoch: 1 Batch: 7690/20099 (38.26%) Loss: 2.344557 LR: 0.00002685 [11:15:17] Epoch: 1 Batch: 7691/20099 (38.27%) Loss: 2.016577 LR: 0.00002685 [11:15:19] Epoch: 1 Batch: 7692/20099 (38.27%) Loss: 2.115147 LR: 0.00002685 [11:15:21] Epoch: 1 Batch: 7693/20099 (38.28%) Loss: 1.937482 LR: 0.00002685 [11:15:23] Epoch: 1 Batch: 7694/20099 (38.28%) Loss: 2.255248 LR: 0.00002685 [11:15:24] Epoch: 1 Batch: 7695/20099 (38.29%) Loss: 2.292229 LR: 0.00002685 [11:15:26] Epoch: 1 Batch: 7696/20099 (38.29%) Loss: 2.018835 LR: 0.00002684 [11:15:28] Epoch: 1 Batch: 7697/20099 (38.30%) Loss: 1.983269 LR: 0.00002684 [11:15:29] Epoch: 1 Batch: 7698/20099 (38.30%) Loss: 2.233195 LR: 0.00002684 [11:15:31] Epoch: 1 Batch: 7699/20099 (38.31%) 
Loss: 2.044190 LR: 0.00002684 [11:15:33] Epoch: 1 Batch: 7700/20099 (38.31%) Loss: 2.261748 LR: 0.00002684 [11:15:35] Epoch: 1 Batch: 7701/20099 (38.32%) Loss: 2.040953 LR: 0.00002684 [11:15:37] Epoch: 1 Batch: 7702/20099 (38.32%) Loss: 2.151722 LR: 0.00002684 [11:15:38] Epoch: 1 Batch: 7703/20099 (38.33%) Loss: 1.923068 LR: 0.00002683 [11:15:40] Epoch: 1 Batch: 7704/20099 (38.33%) Loss: 2.095229 LR: 0.00002683 [11:15:42] Epoch: 1 Batch: 7705/20099 (38.34%) Loss: 2.336534 LR: 0.00002683 [11:15:44] Epoch: 1 Batch: 7706/20099 (38.34%) Loss: 2.340981 LR: 0.00002683 [11:15:45] Epoch: 1 Batch: 7707/20099 (38.35%) Loss: 2.182798 LR: 0.00002683 [11:15:47] Epoch: 1 Batch: 7708/20099 (38.35%) Loss: 2.232115 LR: 0.00002683 [11:15:49] Epoch: 1 Batch: 7709/20099 (38.36%) Loss: 2.011777 LR: 0.00002683 [11:15:51] Epoch: 1 Batch: 7710/20099 (38.36%) Loss: 1.989652 LR: 0.00002681 [11:15:52] Epoch: 1 Batch: 7711/20099 (38.37%) Loss: 2.324297 LR: 0.00002681 [11:15:54] Epoch: 1 Batch: 7712/20099 (38.37%) Loss: 2.290851 LR: 0.00002681 [11:15:56] Epoch: 1 Batch: 7713/20099 (38.38%) Loss: 2.030918 LR: 0.00002681 [11:15:58] Epoch: 1 Batch: 7714/20099 (38.38%) Loss: 2.061622 LR: 0.00002681 [11:16:00] Epoch: 1 Batch: 7715/20099 (38.38%) Loss: 2.236552 LR: 0.00002681 [11:16:01] Epoch: 1 Batch: 7716/20099 (38.39%) Loss: 1.739819 LR: 0.00002681 [11:16:03] Epoch: 1 Batch: 7717/20099 (38.39%) Loss: 2.128646 LR: 0.00002680 [11:16:05] Epoch: 1 Batch: 7718/20099 (38.40%) Loss: 2.327111 LR: 0.00002680 [11:16:07] Epoch: 1 Batch: 7719/20099 (38.40%) Loss: 2.171966 LR: 0.00002680 [11:16:08] Epoch: 1 Batch: 7720/20099 (38.41%) Loss: 2.210466 LR: 0.00002680 [11:16:10] Epoch: 1 Batch: 7721/20099 (38.41%) Loss: 1.985090 LR: 0.00002680 [11:16:12] Epoch: 1 Batch: 7722/20099 (38.42%) Loss: 2.067893 LR: 0.00002680 [11:16:14] Epoch: 1 Batch: 7723/20099 (38.42%) Loss: 2.017352 LR: 0.00002680 [11:16:16] Epoch: 1 Batch: 7724/20099 (38.43%) Loss: 2.128472 LR: 0.00002679 [11:16:17] Epoch: 1 Batch: 7725/20099 
(38.43%) Loss: 2.106545 LR: 0.00002679 [11:16:19] Epoch: 1 Batch: 7726/20099 (38.44%) Loss: 2.108230 LR: 0.00002679 [11:16:21] Epoch: 1 Batch: 7727/20099 (38.44%) Loss: 2.128987 LR: 0.00002679 [11:16:23] Epoch: 1 Batch: 7728/20099 (38.45%) Loss: 2.421197 LR: 0.00002679 [11:16:24] Epoch: 1 Batch: 7729/20099 (38.45%) Loss: 1.945420 LR: 0.00002679 [11:16:26] Epoch: 1 Batch: 7730/20099 (38.46%) Loss: 1.992838 LR: 0.00002679 [11:16:28] Epoch: 1 Batch: 7731/20099 (38.46%) Loss: 2.107953 LR: 0.00002678 [11:16:30] Epoch: 1 Batch: 7732/20099 (38.47%) Loss: 2.341775 LR: 0.00002678 [11:16:31] Epoch: 1 Batch: 7733/20099 (38.47%) Loss: 1.967906 LR: 0.00002678 [11:16:33] Epoch: 1 Batch: 7734/20099 (38.48%) Loss: 1.797901 LR: 0.00002678 [11:16:35] Epoch: 1 Batch: 7735/20099 (38.48%) Loss: 2.176594 LR: 0.00002678 [11:16:37] Epoch: 1 Batch: 7736/20099 (38.49%) Loss: 2.273418 LR: 0.00002678 [11:16:39] Epoch: 1 Batch: 7737/20099 (38.49%) Loss: 2.470681 LR: 0.00002678 [11:16:40] Epoch: 1 Batch: 7738/20099 (38.50%) Loss: 1.914682 LR: 0.00002677 [11:16:42] Epoch: 1 Batch: 7739/20099 (38.50%) Loss: 2.044192 LR: 0.00002677 [11:16:44] Epoch: 1 Batch: 7740/20099 (38.51%) Loss: 2.113921 LR: 0.00002677 [11:16:46] Epoch: 1 Batch: 7741/20099 (38.51%) Loss: 1.914415 LR: 0.00002677 [11:16:47] Epoch: 1 Batch: 7742/20099 (38.52%) Loss: 1.695788 LR: 0.00002677 [11:16:49] Epoch: 1 Batch: 7743/20099 (38.52%) Loss: 2.205287 LR: 0.00002677 [11:16:51] Epoch: 1 Batch: 7744/20099 (38.53%) Loss: 2.122564 LR: 0.00002677 [11:16:53] Epoch: 1 Batch: 7745/20099 (38.53%) Loss: 2.351593 LR: 0.00002676 [11:16:55] Epoch: 1 Batch: 7746/20099 (38.54%) Loss: 2.250907 LR: 0.00002676 [11:16:56] Epoch: 1 Batch: 7747/20099 (38.54%) Loss: 2.305131 LR: 0.00002676 [11:16:58] Epoch: 1 Batch: 7748/20099 (38.55%) Loss: 1.760430 LR: 0.00002676 [11:17:00] Epoch: 1 Batch: 7749/20099 (38.55%) Loss: 2.165574 LR: 0.00002676 [11:17:02] Epoch: 1 Batch: 7750/20099 (38.56%) Loss: 2.042888 LR: 0.00002676 [11:17:03] Epoch: 1 Batch: 
7751/20099 (38.56%) Loss: 1.792091 LR: 0.00002676 [11:17:05] Epoch: 1 Batch: 7752/20099 (38.57%) Loss: 2.066135 LR: 0.00002675 [11:17:07] Epoch: 1 Batch: 7753/20099 (38.57%) Loss: 2.257123 LR: 0.00002675 [11:17:09] Epoch: 1 Batch: 7754/20099 (38.58%) Loss: 2.224986 LR: 0.00002675 [11:17:11] Epoch: 1 Batch: 7755/20099 (38.58%) Loss: 2.054968 LR: 0.00002675 [11:17:12] Epoch: 1 Batch: 7756/20099 (38.59%) Loss: 2.109709 LR: 0.00002675 [11:17:14] Epoch: 1 Batch: 7757/20099 (38.59%) Loss: 2.237526 LR: 0.00002675 [11:17:16] Epoch: 1 Batch: 7758/20099 (38.60%) Loss: 2.232387 LR: 0.00002675 [11:17:18] Epoch: 1 Batch: 7759/20099 (38.60%) Loss: 2.118845 LR: 0.00002674 [11:17:19] Epoch: 1 Batch: 7760/20099 (38.61%) Loss: 1.826317 LR: 0.00002674 [11:17:21] Epoch: 1 Batch: 7761/20099 (38.61%) Loss: 2.525434 LR: 0.00002674 [11:17:23] Epoch: 1 Batch: 7762/20099 (38.62%) Loss: 1.906866 LR: 0.00002674 [11:17:25] Epoch: 1 Batch: 7763/20099 (38.62%) Loss: 2.129358 LR: 0.00002674 [11:17:27] Epoch: 1 Batch: 7764/20099 (38.63%) Loss: 1.898523 LR: 0.00002674 [11:17:28] Epoch: 1 Batch: 7765/20099 (38.63%) Loss: 2.124898 LR: 0.00002674 [11:17:30] Epoch: 1 Batch: 7766/20099 (38.64%) Loss: 2.183623 LR: 0.00002673 [11:17:32] Epoch: 1 Batch: 7767/20099 (38.64%) Loss: 1.816592 LR: 0.00002673 [11:17:34] Epoch: 1 Batch: 7768/20099 (38.65%) Loss: 2.031799 LR: 0.00002673 [11:17:36] Epoch: 1 Batch: 7769/20099 (38.65%) Loss: 2.115023 LR: 0.00002673 [11:17:37] Epoch: 1 Batch: 7770/20099 (38.66%) Loss: 2.185224 LR: 0.00002673 [11:17:39] Epoch: 1 Batch: 7771/20099 (38.66%) Loss: 2.177919 LR: 0.00002673 [11:17:41] Epoch: 1 Batch: 7772/20099 (38.67%) Loss: 2.140144 LR: 0.00002673 [11:17:43] Epoch: 1 Batch: 7773/20099 (38.67%) Loss: 2.077932 LR: 0.00002671 [11:17:44] Epoch: 1 Batch: 7774/20099 (38.68%) Loss: 2.299306 LR: 0.00002671 [11:17:46] Epoch: 1 Batch: 7775/20099 (38.68%) Loss: 1.926958 LR: 0.00002671 [11:17:48] Epoch: 1 Batch: 7776/20099 (38.69%) Loss: 2.182030 LR: 0.00002671 [11:17:50] Epoch: 1 
Batch: 7777/20099 (38.69%) Loss: 2.339509 LR: 0.00002671 [11:17:52] Epoch: 1 Batch: 7778/20099 (38.70%) Loss: 2.231410 LR: 0.00002671 [11:17:53] Epoch: 1 Batch: 7779/20099 (38.70%) Loss: 2.143681 LR: 0.00002671 [11:17:55] Epoch: 1 Batch: 7780/20099 (38.71%) Loss: 2.115634 LR: 0.00002670 [11:17:57] Epoch: 1 Batch: 7781/20099 (38.71%) Loss: 2.243329 LR: 0.00002670 [11:17:59] Epoch: 1 Batch: 7782/20099 (38.72%) Loss: 2.130012 LR: 0.00002670 [11:18:00] Epoch: 1 Batch: 7783/20099 (38.72%) Loss: 2.231580 LR: 0.00002670 [11:18:02] Epoch: 1 Batch: 7784/20099 (38.73%) Loss: 2.209065 LR: 0.00002670 [11:18:04] Epoch: 1 Batch: 7785/20099 (38.73%) Loss: 2.192298 LR: 0.00002670 [11:18:06] Epoch: 1 Batch: 7786/20099 (38.74%) Loss: 1.941680 LR: 0.00002670 [11:18:08] Epoch: 1 Batch: 7787/20099 (38.74%) Loss: 2.118325 LR: 0.00002669 [11:18:09] Epoch: 1 Batch: 7788/20099 (38.75%) Loss: 1.906937 LR: 0.00002669 [11:18:11] Epoch: 1 Batch: 7789/20099 (38.75%) Loss: 2.189586 LR: 0.00002669 [11:18:13] Epoch: 1 Batch: 7790/20099 (38.76%) Loss: 2.018956 LR: 0.00002669 [11:18:15] Epoch: 1 Batch: 7791/20099 (38.76%) Loss: 2.315284 LR: 0.00002669 [11:18:16] Epoch: 1 Batch: 7792/20099 (38.77%) Loss: 2.052695 LR: 0.00002669 [11:18:18] Epoch: 1 Batch: 7793/20099 (38.77%) Loss: 2.364179 LR: 0.00002669 [11:18:20] Epoch: 1 Batch: 7794/20099 (38.78%) Loss: 1.939288 LR: 0.00002668 [11:18:22] Epoch: 1 Batch: 7795/20099 (38.78%) Loss: 2.011673 LR: 0.00002668 [11:18:23] Epoch: 1 Batch: 7796/20099 (38.79%) Loss: 2.031419 LR: 0.00002668 [11:18:25] Epoch: 1 Batch: 7797/20099 (38.79%) Loss: 1.954742 LR: 0.00002668 [11:18:27] Epoch: 1 Batch: 7798/20099 (38.80%) Loss: 2.065381 LR: 0.00002668 [11:18:29] Epoch: 1 Batch: 7799/20099 (38.80%) Loss: 2.067803 LR: 0.00002668 [11:18:34] >> Cleaned up old temp checkpoint: epoch1_step5800 [11:18:34] >> Temp checkpoint saved: epoch1_step7800, size: 0.1693 GB [11:18:34] Epoch: 1 Batch: 7800/20099 (38.81%) Loss: 2.115914 LR: 0.00002668 [11:18:36] Epoch: 1 Batch: 7801/20099 
(38.81%) Loss: 2.126686 LR: 0.00002667 [11:18:38] Epoch: 1 Batch: 7802/20099 (38.82%) Loss: 2.041991 LR: 0.00002667 [11:18:39] Epoch: 1 Batch: 7803/20099 (38.82%) Loss: 2.205456 LR: 0.00002667 [11:18:41] Epoch: 1 Batch: 7804/20099 (38.83%) Loss: 2.034886 LR: 0.00002667 [11:18:43] Epoch: 1 Batch: 7805/20099 (38.83%) Loss: 2.365794 LR: 0.00002667 [11:18:45] Epoch: 1 Batch: 7806/20099 (38.84%) Loss: 1.890295 LR: 0.00002667 [11:18:46] Epoch: 1 Batch: 7807/20099 (38.84%) Loss: 1.768340 LR: 0.00002667 [11:18:48] Epoch: 1 Batch: 7808/20099 (38.85%) Loss: 2.187560 LR: 0.00002666 [11:18:50] Epoch: 1 Batch: 7809/20099 (38.85%) Loss: 2.189487 LR: 0.00002666 [11:18:52] Epoch: 1 Batch: 7810/20099 (38.86%) Loss: 1.897124 LR: 0.00002666 [11:18:54] Epoch: 1 Batch: 7811/20099 (38.86%) Loss: 2.242054 LR: 0.00002666 [11:18:55] Epoch: 1 Batch: 7812/20099 (38.87%) Loss: 2.129681 LR: 0.00002666 [11:18:57] Epoch: 1 Batch: 7813/20099 (38.87%) Loss: 2.272071 LR: 0.00002666 [11:18:59] Epoch: 1 Batch: 7814/20099 (38.88%) Loss: 2.046514 LR: 0.00002666 [11:19:01] Epoch: 1 Batch: 7815/20099 (38.88%) Loss: 2.298006 LR: 0.00002665 [11:19:02] Epoch: 1 Batch: 7816/20099 (38.89%) Loss: 2.187032 LR: 0.00002665 [11:19:04] Epoch: 1 Batch: 7817/20099 (38.89%) Loss: 2.112337 LR: 0.00002665 [11:19:06] Epoch: 1 Batch: 7818/20099 (38.90%) Loss: 2.272477 LR: 0.00002665 [11:19:08] Epoch: 1 Batch: 7819/20099 (38.90%) Loss: 1.936588 LR: 0.00002665 [11:19:10] Epoch: 1 Batch: 7820/20099 (38.91%) Loss: 2.085606 LR: 0.00002665 [11:19:11] Epoch: 1 Batch: 7821/20099 (38.91%) Loss: 2.086196 LR: 0.00002665 [11:19:13] Epoch: 1 Batch: 7822/20099 (38.92%) Loss: 2.235284 LR: 0.00002664 [11:19:15] Epoch: 1 Batch: 7823/20099 (38.92%) Loss: 2.347824 LR: 0.00002664 [11:19:17] Epoch: 1 Batch: 7824/20099 (38.93%) Loss: 1.966078 LR: 0.00002664 [11:19:19] Epoch: 1 Batch: 7825/20099 (38.93%) Loss: 2.103088 LR: 0.00002664 [11:19:20] Epoch: 1 Batch: 7826/20099 (38.94%) Loss: 2.293581 LR: 0.00002664 [11:19:22] Epoch: 1 Batch: 
7827/20099 (38.94%) Loss: 2.153210 LR: 0.00002664 [11:19:24] Epoch: 1 Batch: 7828/20099 (38.95%) Loss: 2.238383 LR: 0.00002664 [11:19:26] Epoch: 1 Batch: 7829/20099 (38.95%) Loss: 2.332803 LR: 0.00002662 [11:19:27] Epoch: 1 Batch: 7830/20099 (38.96%) Loss: 2.453391 LR: 0.00002662 [11:19:29] Epoch: 1 Batch: 7831/20099 (38.96%) Loss: 2.456487 LR: 0.00002662 [11:19:31] Epoch: 1 Batch: 7832/20099 (38.97%) Loss: 2.249483 LR: 0.00002662 [11:19:33] Epoch: 1 Batch: 7833/20099 (38.97%) Loss: 2.124620 LR: 0.00002662 [11:19:35] Epoch: 1 Batch: 7834/20099 (38.98%) Loss: 2.441478 LR: 0.00002662 [11:19:36] Epoch: 1 Batch: 7835/20099 (38.98%) Loss: 2.256517 LR: 0.00002662 [11:19:38] Epoch: 1 Batch: 7836/20099 (38.99%) Loss: 2.031235 LR: 0.00002661 [11:19:40] Epoch: 1 Batch: 7837/20099 (38.99%) Loss: 2.143024 LR: 0.00002661 [11:19:42] Epoch: 1 Batch: 7838/20099 (39.00%) Loss: 2.305421 LR: 0.00002661 [11:19:43] Epoch: 1 Batch: 7839/20099 (39.00%) Loss: 2.128336 LR: 0.00002661 [11:19:45] Epoch: 1 Batch: 7840/20099 (39.01%) Loss: 2.151582 LR: 0.00002661 [11:19:47] Epoch: 1 Batch: 7841/20099 (39.01%) Loss: 2.327061 LR: 0.00002661 [11:19:49] Epoch: 1 Batch: 7842/20099 (39.02%) Loss: 2.014147 LR: 0.00002661 [11:19:50] Epoch: 1 Batch: 7843/20099 (39.02%) Loss: 1.615473 LR: 0.00002660 [11:19:52] Epoch: 1 Batch: 7844/20099 (39.03%) Loss: 2.020171 LR: 0.00002660 [11:19:54] Epoch: 1 Batch: 7845/20099 (39.03%) Loss: 2.134998 LR: 0.00002660 [11:19:56] Epoch: 1 Batch: 7846/20099 (39.04%) Loss: 2.113968 LR: 0.00002660 [11:19:57] Epoch: 1 Batch: 7847/20099 (39.04%) Loss: 2.185345 LR: 0.00002660 [11:19:59] Epoch: 1 Batch: 7848/20099 (39.05%) Loss: 2.058257 LR: 0.00002660 [11:20:01] Epoch: 1 Batch: 7849/20099 (39.05%) Loss: 1.962369 LR: 0.00002660 [11:20:03] Epoch: 1 Batch: 7850/20099 (39.06%) Loss: 1.902711 LR: 0.00002659 [11:20:05] Epoch: 1 Batch: 7851/20099 (39.06%) Loss: 2.346581 LR: 0.00002659 [11:20:06] Epoch: 1 Batch: 7852/20099 (39.07%) Loss: 2.122932 LR: 0.00002659 [11:20:08] Epoch: 1 
Batch: 7853/20099 (39.07%) Loss: 1.886862 LR: 0.00002659 [11:20:10] Epoch: 1 Batch: 7854/20099 (39.08%) Loss: 1.563274 LR: 0.00002659 [11:20:12] Epoch: 1 Batch: 7855/20099 (39.08%) Loss: 2.306425 LR: 0.00002659 [11:20:13] Epoch: 1 Batch: 7856/20099 (39.09%) Loss: 2.044250 LR: 0.00002659 [11:20:15] Epoch: 1 Batch: 7857/20099 (39.09%) Loss: 1.984739 LR: 0.00002658 [11:20:17] Epoch: 1 Batch: 7858/20099 (39.10%) Loss: 2.333704 LR: 0.00002658 [11:20:19] Epoch: 1 Batch: 7859/20099 (39.10%) Loss: 1.998548 LR: 0.00002658 [11:20:20] Epoch: 1 Batch: 7860/20099 (39.11%) Loss: 2.130264 LR: 0.00002658 [11:20:22] Epoch: 1 Batch: 7861/20099 (39.11%) Loss: 2.080041 LR: 0.00002658 [11:20:24] Epoch: 1 Batch: 7862/20099 (39.12%) Loss: 1.956181 LR: 0.00002658 [11:20:26] Epoch: 1 Batch: 7863/20099 (39.12%) Loss: 2.147521 LR: 0.00002658 [11:20:27] Epoch: 1 Batch: 7864/20099 (39.13%) Loss: 2.209058 LR: 0.00002657 [11:20:29] Epoch: 1 Batch: 7865/20099 (39.13%) Loss: 2.139018 LR: 0.00002657 [11:20:31] Epoch: 1 Batch: 7866/20099 (39.14%) Loss: 1.920696 LR: 0.00002657 [11:20:33] Epoch: 1 Batch: 7867/20099 (39.14%) Loss: 2.386090 LR: 0.00002657 [11:20:35] Epoch: 1 Batch: 7868/20099 (39.15%) Loss: 2.193243 LR: 0.00002657 [11:20:36] Epoch: 1 Batch: 7869/20099 (39.15%) Loss: 2.337170 LR: 0.00002657 [11:20:38] Epoch: 1 Batch: 7870/20099 (39.16%) Loss: 2.318821 LR: 0.00002657 [11:20:40] Epoch: 1 Batch: 7871/20099 (39.16%) Loss: 2.121712 LR: 0.00002656 [11:20:42] Epoch: 1 Batch: 7872/20099 (39.17%) Loss: 2.198399 LR: 0.00002656 [11:20:44] Epoch: 1 Batch: 7873/20099 (39.17%) Loss: 2.248979 LR: 0.00002656 [11:20:45] Epoch: 1 Batch: 7874/20099 (39.18%) Loss: 2.472112 LR: 0.00002656 [11:20:47] Epoch: 1 Batch: 7875/20099 (39.18%) Loss: 2.220920 LR: 0.00002656 [11:20:49] Epoch: 1 Batch: 7876/20099 (39.19%) Loss: 1.916760 LR: 0.00002656 [11:20:50] Epoch: 1 Batch: 7877/20099 (39.19%) Loss: 1.990875 LR: 0.00002656 [11:20:52] Epoch: 1 Batch: 7878/20099 (39.20%) Loss: 2.146078 LR: 0.00002655 [11:20:54] Epoch: 
1 Batch: 7879/20099 (39.20%) Loss: 2.118501 LR: 0.00002655 [11:20:56] Epoch: 1 Batch: 7880/20099 (39.21%) Loss: 2.079660 LR: 0.00002655 [11:20:58] Epoch: 1 Batch: 7881/20099 (39.21%) Loss: 1.988912 LR: 0.00002655 [11:20:59] Epoch: 1 Batch: 7882/20099 (39.22%) Loss: 2.106959 LR: 0.00002655 [11:21:01] Epoch: 1 Batch: 7883/20099 (39.22%) Loss: 2.144059 LR: 0.00002655 [11:21:03] Epoch: 1 Batch: 7884/20099 (39.23%) Loss: 2.136100 LR: 0.00002655 [11:21:05] Epoch: 1 Batch: 7885/20099 (39.23%) Loss: 1.815180 LR: 0.00002653 [11:21:07] Epoch: 1 Batch: 7886/20099 (39.24%) Loss: 2.022646 LR: 0.00002653 [11:21:08] Epoch: 1 Batch: 7887/20099 (39.24%) Loss: 2.227417 LR: 0.00002653 [11:21:10] Epoch: 1 Batch: 7888/20099 (39.25%) Loss: 2.230236 LR: 0.00002653 [11:21:12] Epoch: 1 Batch: 7889/20099 (39.25%) Loss: 2.082716 LR: 0.00002653 [11:21:14] Epoch: 1 Batch: 7890/20099 (39.26%) Loss: 1.944577 LR: 0.00002653 [11:21:15] Epoch: 1 Batch: 7891/20099 (39.26%) Loss: 1.998958 LR: 0.00002653 [11:21:17] Epoch: 1 Batch: 7892/20099 (39.27%) Loss: 1.879050 LR: 0.00002652 [11:21:19] Epoch: 1 Batch: 7893/20099 (39.27%) Loss: 2.318759 LR: 0.00002652 [11:21:21] Epoch: 1 Batch: 7894/20099 (39.28%) Loss: 2.119075 LR: 0.00002652 [11:21:22] Epoch: 1 Batch: 7895/20099 (39.28%) Loss: 2.212215 LR: 0.00002652 [11:21:24] Epoch: 1 Batch: 7896/20099 (39.29%) Loss: 2.008276 LR: 0.00002652 [11:21:26] Epoch: 1 Batch: 7897/20099 (39.29%) Loss: 2.590654 LR: 0.00002652 [11:21:28] Epoch: 1 Batch: 7898/20099 (39.30%) Loss: 2.244771 LR: 0.00002652 [11:21:30] Epoch: 1 Batch: 7899/20099 (39.30%) Loss: 2.133977 LR: 0.00002651 [11:21:31] Epoch: 1 Batch: 7900/20099 (39.31%) Loss: 1.898094 LR: 0.00002651 [11:21:33] Epoch: 1 Batch: 7901/20099 (39.31%) Loss: 2.083942 LR: 0.00002651 [11:21:35] Epoch: 1 Batch: 7902/20099 (39.32%) Loss: 2.161458 LR: 0.00002651 [11:21:37] Epoch: 1 Batch: 7903/20099 (39.32%) Loss: 2.050329 LR: 0.00002651 [11:21:38] Epoch: 1 Batch: 7904/20099 (39.33%) Loss: 2.101810 LR: 0.00002651 [11:21:40] 
Epoch: 1 Batch: 7905/20099 (39.33%) Loss: 2.136550 LR: 0.00002651 [11:21:42] Epoch: 1 Batch: 7906/20099 (39.34%) Loss: 1.999566 LR: 0.00002650 [11:21:44] Epoch: 1 Batch: 7907/20099 (39.34%) Loss: 2.013700 LR: 0.00002650 [11:21:46] Epoch: 1 Batch: 7908/20099 (39.35%) Loss: 2.074641 LR: 0.00002650 [11:21:47] Epoch: 1 Batch: 7909/20099 (39.35%) Loss: 2.247553 LR: 0.00002650 [11:21:49] Epoch: 1 Batch: 7910/20099 (39.36%) Loss: 2.295721 LR: 0.00002650 [11:21:51] Epoch: 1 Batch: 7911/20099 (39.36%) Loss: 1.777516 LR: 0.00002650 [11:21:53] Epoch: 1 Batch: 7912/20099 (39.37%) Loss: 2.124319 LR: 0.00002650 [11:21:54] Epoch: 1 Batch: 7913/20099 (39.37%) Loss: 2.235009 LR: 0.00002649 [11:21:56] Epoch: 1 Batch: 7914/20099 (39.38%) Loss: 2.439484 LR: 0.00002649 [11:21:58] Epoch: 1 Batch: 7915/20099 (39.38%) Loss: 2.001752 LR: 0.00002649 [11:22:00] Epoch: 1 Batch: 7916/20099 (39.39%) Loss: 2.219399 LR: 0.00002649 [11:22:01] Epoch: 1 Batch: 7917/20099 (39.39%) Loss: 2.147407 LR: 0.00002649 [11:22:03] Epoch: 1 Batch: 7918/20099 (39.39%) Loss: 1.916561 LR: 0.00002649 [11:22:05] Epoch: 1 Batch: 7919/20099 (39.40%) Loss: 2.049500 LR: 0.00002649 [11:22:07] Epoch: 1 Batch: 7920/20099 (39.40%) Loss: 2.436647 LR: 0.00002648 [11:22:08] Epoch: 1 Batch: 7921/20099 (39.41%) Loss: 1.780668 LR: 0.00002648 [11:22:10] Epoch: 1 Batch: 7922/20099 (39.41%) Loss: 2.211989 LR: 0.00002648 [11:22:12] Epoch: 1 Batch: 7923/20099 (39.42%) Loss: 2.239027 LR: 0.00002648 [11:22:14] Epoch: 1 Batch: 7924/20099 (39.42%) Loss: 2.043938 LR: 0.00002648 [11:22:16] Epoch: 1 Batch: 7925/20099 (39.43%) Loss: 2.089024 LR: 0.00002648 [11:22:17] Epoch: 1 Batch: 7926/20099 (39.43%) Loss: 2.102420 LR: 0.00002648 [11:22:19] Epoch: 1 Batch: 7927/20099 (39.44%) Loss: 1.861673 LR: 0.00002647 [11:22:21] Epoch: 1 Batch: 7928/20099 (39.44%) Loss: 2.268308 LR: 0.00002647 [11:22:23] Epoch: 1 Batch: 7929/20099 (39.45%) Loss: 2.277569 LR: 0.00002647 [11:22:24] Epoch: 1 Batch: 7930/20099 (39.45%) Loss: 2.441023 LR: 0.00002647 
[11:22:26] Epoch: 1 Batch: 7931/20099 (39.46%) Loss: 2.122533 LR: 0.00002647 [11:22:28] Epoch: 1 Batch: 7932/20099 (39.46%) Loss: 2.250842 LR: 0.00002647 [11:22:30] Epoch: 1 Batch: 7933/20099 (39.47%) Loss: 2.016479 LR: 0.00002647 [11:22:31] Epoch: 1 Batch: 7934/20099 (39.47%) Loss: 2.052250 LR: 0.00002645 [11:22:33] Epoch: 1 Batch: 7935/20099 (39.48%) Loss: 1.908977 LR: 0.00002645 [11:22:35] Epoch: 1 Batch: 7936/20099 (39.48%) Loss: 2.215253 LR: 0.00002645 [11:22:37] Epoch: 1 Batch: 7937/20099 (39.49%) Loss: 2.316883 LR: 0.00002645 [11:22:38] Epoch: 1 Batch: 7938/20099 (39.49%) Loss: 2.153031 LR: 0.00002645 [11:22:40] Epoch: 1 Batch: 7939/20099 (39.50%) Loss: 2.322775 LR: 0.00002645 [11:22:42] Epoch: 1 Batch: 7940/20099 (39.50%) Loss: 2.214974 LR: 0.00002645 [11:22:44] Epoch: 1 Batch: 7941/20099 (39.51%) Loss: 2.323170 LR: 0.00002644 [11:22:46] Epoch: 1 Batch: 7942/20099 (39.51%) Loss: 2.104364 LR: 0.00002644 [11:22:47] Epoch: 1 Batch: 7943/20099 (39.52%) Loss: 2.429692 LR: 0.00002644 [11:22:49] Epoch: 1 Batch: 7944/20099 (39.52%) Loss: 2.290454 LR: 0.00002644 [11:22:51] Epoch: 1 Batch: 7945/20099 (39.53%) Loss: 1.939322 LR: 0.00002644 [11:22:53] Epoch: 1 Batch: 7946/20099 (39.53%) Loss: 2.148483 LR: 0.00002644 [11:22:54] Epoch: 1 Batch: 7947/20099 (39.54%) Loss: 2.123789 LR: 0.00002644 [11:22:56] Epoch: 1 Batch: 7948/20099 (39.54%) Loss: 2.104358 LR: 0.00002643 [11:22:58] Epoch: 1 Batch: 7949/20099 (39.55%) Loss: 2.010888 LR: 0.00002643 [11:23:00] Epoch: 1 Batch: 7950/20099 (39.55%) Loss: 1.731723 LR: 0.00002643 [11:23:01] Epoch: 1 Batch: 7951/20099 (39.56%) Loss: 2.140817 LR: 0.00002643 [11:23:03] Epoch: 1 Batch: 7952/20099 (39.56%) Loss: 1.997992 LR: 0.00002643 [11:23:05] Epoch: 1 Batch: 7953/20099 (39.57%) Loss: 2.066802 LR: 0.00002643 [11:23:07] Epoch: 1 Batch: 7954/20099 (39.57%) Loss: 2.127962 LR: 0.00002643 [11:23:09] Epoch: 1 Batch: 7955/20099 (39.58%) Loss: 1.935018 LR: 0.00002642 [11:23:10] Epoch: 1 Batch: 7956/20099 (39.58%) Loss: 1.752723 LR: 
0.00002642 [11:23:12] Epoch: 1 Batch: 7957/20099 (39.59%) Loss: 2.100379 LR: 0.00002642 [11:23:14] Epoch: 1 Batch: 7958/20099 (39.59%) Loss: 1.952143 LR: 0.00002642 [11:23:16] Epoch: 1 Batch: 7959/20099 (39.60%) Loss: 2.283680 LR: 0.00002642 [11:23:17] Epoch: 1 Batch: 7960/20099 (39.60%) Loss: 1.815256 LR: 0.00002642 [11:23:19] Epoch: 1 Batch: 7961/20099 (39.61%) Loss: 2.116079 LR: 0.00002642 [11:23:21] Epoch: 1 Batch: 7962/20099 (39.61%) Loss: 2.321756 LR: 0.00002641 [11:23:23] Epoch: 1 Batch: 7963/20099 (39.62%) Loss: 2.199976 LR: 0.00002641 [11:23:24] Epoch: 1 Batch: 7964/20099 (39.62%) Loss: 1.828357 LR: 0.00002641 [11:23:26] Epoch: 1 Batch: 7965/20099 (39.63%) Loss: 2.064555 LR: 0.00002641 [11:23:28] Epoch: 1 Batch: 7966/20099 (39.63%) Loss: 2.031368 LR: 0.00002641 [11:23:30] Epoch: 1 Batch: 7967/20099 (39.64%) Loss: 1.970766 LR: 0.00002641 [11:23:32] Epoch: 1 Batch: 7968/20099 (39.64%) Loss: 1.974787 LR: 0.00002641 [11:23:33] Epoch: 1 Batch: 7969/20099 (39.65%) Loss: 2.084501 LR: 0.00002640 [11:23:35] Epoch: 1 Batch: 7970/20099 (39.65%) Loss: 2.006256 LR: 0.00002640 [11:23:37] Epoch: 1 Batch: 7971/20099 (39.66%) Loss: 2.127401 LR: 0.00002640 [11:23:39] Epoch: 1 Batch: 7972/20099 (39.66%) Loss: 1.973046 LR: 0.00002640 [11:23:40] Epoch: 1 Batch: 7973/20099 (39.67%) Loss: 2.268507 LR: 0.00002640 [11:23:42] Epoch: 1 Batch: 7974/20099 (39.67%) Loss: 2.264230 LR: 0.00002640 [11:23:44] Epoch: 1 Batch: 7975/20099 (39.68%) Loss: 2.085159 LR: 0.00002640 [11:23:46] Epoch: 1 Batch: 7976/20099 (39.68%) Loss: 2.067251 LR: 0.00002638 [11:23:47] Epoch: 1 Batch: 7977/20099 (39.69%) Loss: 2.344509 LR: 0.00002638 [11:23:49] Epoch: 1 Batch: 7978/20099 (39.69%) Loss: 1.954948 LR: 0.00002638 [11:23:51] Epoch: 1 Batch: 7979/20099 (39.70%) Loss: 2.029190 LR: 0.00002638 [11:23:53] Epoch: 1 Batch: 7980/20099 (39.70%) Loss: 1.721715 LR: 0.00002638 [11:23:54] Epoch: 1 Batch: 7981/20099 (39.71%) Loss: 1.828877 LR: 0.00002638 [11:23:56] Epoch: 1 Batch: 7982/20099 (39.71%) Loss: 2.269747 
LR: 0.00002638 [11:23:58] Epoch: 1 Batch: 7983/20099 (39.72%) Loss: 2.153070 LR: 0.00002637 [11:24:00] Epoch: 1 Batch: 7984/20099 (39.72%) Loss: 1.960694 LR: 0.00002637 [11:24:02] Epoch: 1 Batch: 7985/20099 (39.73%) Loss: 1.951195 LR: 0.00002637 [11:24:03] Epoch: 1 Batch: 7986/20099 (39.73%) Loss: 2.157287 LR: 0.00002637 [11:24:05] Epoch: 1 Batch: 7987/20099 (39.74%) Loss: 2.299076 LR: 0.00002637 [11:24:07] Epoch: 1 Batch: 7988/20099 (39.74%) Loss: 2.195517 LR: 0.00002637 [11:24:09] Epoch: 1 Batch: 7989/20099 (39.75%) Loss: 2.025500 LR: 0.00002637 [11:24:11] Epoch: 1 Batch: 7990/20099 (39.75%) Loss: 2.088178 LR: 0.00002636 [11:24:12] Epoch: 1 Batch: 7991/20099 (39.76%) Loss: 1.961171 LR: 0.00002636 [11:24:14] Epoch: 1 Batch: 7992/20099 (39.76%) Loss: 2.201713 LR: 0.00002636 [11:24:16] Epoch: 1 Batch: 7993/20099 (39.77%) Loss: 2.556838 LR: 0.00002636 [11:24:18] Epoch: 1 Batch: 7994/20099 (39.77%) Loss: 2.164869 LR: 0.00002636 [11:24:19] Epoch: 1 Batch: 7995/20099 (39.78%) Loss: 2.379468 LR: 0.00002636 [11:24:21] Epoch: 1 Batch: 7996/20099 (39.78%) Loss: 2.032998 LR: 0.00002636 [11:24:23] Epoch: 1 Batch: 7997/20099 (39.79%) Loss: 2.118078 LR: 0.00002635 [11:24:25] Epoch: 1 Batch: 7998/20099 (39.79%) Loss: 2.011512 LR: 0.00002635 [11:24:26] Epoch: 1 Batch: 7999/20099 (39.80%) Loss: 1.957702 LR: 0.00002635 [11:24:28] >> Evaluating batch 0 [11:24:29] >> Evaluating batch 1 [11:24:30] >> Evaluating batch 2 [11:24:31] >> Evaluating batch 3 [11:24:32] >> Evaluating batch 4 [11:24:33] >> Evaluating batch 5 [11:24:34] >> Evaluating batch 6 [11:24:35] >> Evaluating batch 7 [11:24:36] >> Evaluating batch 8 [11:24:37] >> Evaluating batch 9 [11:24:38] >> Evaluating batch 10 [11:24:39] >> Evaluating batch 11 [11:24:40] >> Evaluating batch 12 [11:24:41] >> Evaluating batch 13 [11:24:42] >> Evaluating batch 14 [11:24:43] >> Evaluating batch 15 [11:24:44] >> Evaluating batch 16 [11:24:45] Epoch: 1 Step: 8000/20099 Evaluation: [11:24:45] Avg Loss Since Last Eval: 2.1222 Val Loss: 
2.1770 Validation loss delta: -0.0089 Perplexity: 8.8199 LR: 0.00002635 [11:24:48] >> Cleaned up old temp checkpoint: epoch1_step6000 [11:24:48] >> Temp checkpoint saved: epoch1_step8000, size: 0.1693 GB [11:24:52] >> Checkpoint saved: epoch1_step8000, size: 0.1693 GB [11:24:52] Epoch: 1 Batch: 8000/20099 (39.80%) Loss: 2.122792 LR: 0.00002635 [11:24:54] Epoch: 1 Batch: 8001/20099 (39.81%) Loss: 2.217061 LR: 0.00002635 [11:24:55] Epoch: 1 Batch: 8002/20099 (39.81%) Loss: 2.246052 LR: 0.00002635 [11:24:57] Epoch: 1 Batch: 8003/20099 (39.82%) Loss: 2.086280 LR: 0.00002635 [11:24:59] Epoch: 1 Batch: 8004/20099 (39.82%) Loss: 2.259737 LR: 0.00002634 [11:25:01] Epoch: 1 Batch: 8005/20099 (39.83%) Loss: 1.744634 LR: 0.00002634 [11:25:02] Epoch: 1 Batch: 8006/20099 (39.83%) Loss: 2.032975 LR: 0.00002634 [11:25:04] Epoch: 1 Batch: 8007/20099 (39.84%) Loss: 2.513135 LR: 0.00002634 [11:25:06] Epoch: 1 Batch: 8008/20099 (39.84%) Loss: 1.886194 LR: 0.00002634 [11:25:08] Epoch: 1 Batch: 8009/20099 (39.85%) Loss: 1.899353 LR: 0.00002634 [11:25:09] Epoch: 1 Batch: 8010/20099 (39.85%) Loss: 2.004407 LR: 0.00002634 [11:25:11] Epoch: 1 Batch: 8011/20099 (39.86%) Loss: 2.047608 LR: 0.00002633 [11:25:13] Epoch: 1 Batch: 8012/20099 (39.86%) Loss: 2.348198 LR: 0.00002633 [11:25:15] Epoch: 1 Batch: 8013/20099 (39.87%) Loss: 2.061751 LR: 0.00002633 [11:25:17] Epoch: 1 Batch: 8014/20099 (39.87%) Loss: 1.890765 LR: 0.00002633 [11:25:19] Epoch: 1 Batch: 8015/20099 (39.88%) Loss: 2.150518 LR: 0.00002633 [11:25:20] Epoch: 1 Batch: 8016/20099 (39.88%) Loss: 1.787401 LR: 0.00002633 [11:25:22] Epoch: 1 Batch: 8017/20099 (39.89%) Loss: 2.108136 LR: 0.00002633 [11:25:24] Epoch: 1 Batch: 8018/20099 (39.89%) Loss: 1.952225 LR: 0.00002631 [11:25:26] Epoch: 1 Batch: 8019/20099 (39.90%) Loss: 1.983482 LR: 0.00002631 [11:25:28] Epoch: 1 Batch: 8020/20099 (39.90%) Loss: 1.809058 LR: 0.00002631 [11:25:30] Epoch: 1 Batch: 8021/20099 (39.91%) Loss: 2.216883 LR: 0.00002631 [11:25:31] Epoch: 1 Batch: 
8022/20099 (39.91%) Loss: 2.234991 LR: 0.00002631 [11:25:33] Epoch: 1 Batch: 8023/20099 (39.92%) Loss: 2.233574 LR: 0.00002631 [11:25:35] Epoch: 1 Batch: 8024/20099 (39.92%) Loss: 1.993183 LR: 0.00002631 [11:25:37] Epoch: 1 Batch: 8025/20099 (39.93%) Loss: 2.165363 LR: 0.00002630 [11:25:39] Epoch: 1 Batch: 8026/20099 (39.93%) Loss: 2.140855 LR: 0.00002630 [11:25:40] Epoch: 1 Batch: 8027/20099 (39.94%) Loss: 1.866997 LR: 0.00002630 [11:25:42] Epoch: 1 Batch: 8028/20099 (39.94%) Loss: 2.229012 LR: 0.00002630 [11:25:44] Epoch: 1 Batch: 8029/20099 (39.95%) Loss: 1.786212 LR: 0.00002630 [11:25:46] Epoch: 1 Batch: 8030/20099 (39.95%) Loss: 1.847127 LR: 0.00002630 [11:25:47] Epoch: 1 Batch: 8031/20099 (39.96%) Loss: 2.050258 LR: 0.00002630 [11:25:49] Epoch: 1 Batch: 8032/20099 (39.96%) Loss: 2.103346 LR: 0.00002629 [11:25:51] Epoch: 1 Batch: 8033/20099 (39.97%) Loss: 1.928375 LR: 0.00002629 [11:25:53] Epoch: 1 Batch: 8034/20099 (39.97%) Loss: 2.268705 LR: 0.00002629 [11:25:55] Epoch: 1 Batch: 8035/20099 (39.98%) Loss: 2.334401 LR: 0.00002629 [11:25:56] Epoch: 1 Batch: 8036/20099 (39.98%) Loss: 1.729475 LR: 0.00002629 [11:25:58] Epoch: 1 Batch: 8037/20099 (39.99%) Loss: 2.173978 LR: 0.00002629 [11:26:00] Epoch: 1 Batch: 8038/20099 (39.99%) Loss: 2.058457 LR: 0.00002629 [11:26:02] Epoch: 1 Batch: 8039/20099 (40.00%) Loss: 2.165681 LR: 0.00002628 [11:26:03] Epoch: 1 Batch: 8040/20099 (40.00%) Loss: 2.022991 LR: 0.00002628 [11:26:05] Epoch: 1 Batch: 8041/20099 (40.01%) Loss: 2.176553 LR: 0.00002628 [11:26:07] Epoch: 1 Batch: 8042/20099 (40.01%) Loss: 2.100810 LR: 0.00002628 [11:26:08] Epoch: 1 Batch: 8043/20099 (40.02%) Loss: 2.089385 LR: 0.00002628 [11:26:10] Epoch: 1 Batch: 8044/20099 (40.02%) Loss: 2.513356 LR: 0.00002628 [11:26:12] Epoch: 1 Batch: 8045/20099 (40.03%) Loss: 1.872199 LR: 0.00002628 [11:26:14] Epoch: 1 Batch: 8046/20099 (40.03%) Loss: 1.952038 LR: 0.00002627 [11:26:15] Epoch: 1 Batch: 8047/20099 (40.04%) Loss: 2.257120 LR: 0.00002627 [11:26:17] Epoch: 1 
Batch: 8048/20099 (40.04%) Loss: 2.036979 LR: 0.00002627 [11:26:19] Epoch: 1 Batch: 8049/20099 (40.05%) Loss: 2.051506 LR: 0.00002627 [11:26:21] Epoch: 1 Batch: 8050/20099 (40.05%) Loss: 2.206504 LR: 0.00002627 [11:26:22] Epoch: 1 Batch: 8051/20099 (40.06%) Loss: 2.153022 LR: 0.00002627 [11:26:24] Epoch: 1 Batch: 8052/20099 (40.06%) Loss: 1.803505 LR: 0.00002627 [11:26:26] Epoch: 1 Batch: 8053/20099 (40.07%) Loss: 2.082860 LR: 0.00002626 [11:26:28] Epoch: 1 Batch: 8054/20099 (40.07%) Loss: 2.173228 LR: 0.00002626 [11:26:30] Epoch: 1 Batch: 8055/20099 (40.08%) Loss: 2.058554 LR: 0.00002626 [11:26:31] Epoch: 1 Batch: 8056/20099 (40.08%) Loss: 2.128071 LR: 0.00002626 [11:26:33] Epoch: 1 Batch: 8057/20099 (40.09%) Loss: 2.181391 LR: 0.00002626 [11:26:35] Epoch: 1 Batch: 8058/20099 (40.09%) Loss: 2.119464 LR: 0.00002626 [11:26:37] Epoch: 1 Batch: 8059/20099 (40.10%) Loss: 2.252860 LR: 0.00002626 [11:26:38] Epoch: 1 Batch: 8060/20099 (40.10%) Loss: 2.078083 LR: 0.00002624 [11:26:40] Epoch: 1 Batch: 8061/20099 (40.11%) Loss: 2.229984 LR: 0.00002624 [11:26:42] Epoch: 1 Batch: 8062/20099 (40.11%) Loss: 1.809499 LR: 0.00002624 [11:26:44] Epoch: 1 Batch: 8063/20099 (40.12%) Loss: 2.104297 LR: 0.00002624 [11:26:46] Epoch: 1 Batch: 8064/20099 (40.12%) Loss: 2.020247 LR: 0.00002624 [11:26:47] Epoch: 1 Batch: 8065/20099 (40.13%) Loss: 2.086458 LR: 0.00002624 [11:26:49] Epoch: 1 Batch: 8066/20099 (40.13%) Loss: 1.952986 LR: 0.00002624 [11:26:51] Epoch: 1 Batch: 8067/20099 (40.14%) Loss: 2.029030 LR: 0.00002623 [11:26:53] Epoch: 1 Batch: 8068/20099 (40.14%) Loss: 2.143623 LR: 0.00002623 [11:26:55] Epoch: 1 Batch: 8069/20099 (40.15%) Loss: 2.064872 LR: 0.00002623 [11:26:56] Epoch: 1 Batch: 8070/20099 (40.15%) Loss: 2.193104 LR: 0.00002623 [11:26:58] Epoch: 1 Batch: 8071/20099 (40.16%) Loss: 2.252611 LR: 0.00002623 [11:27:00] Epoch: 1 Batch: 8072/20099 (40.16%) Loss: 2.139862 LR: 0.00002623 [11:27:02] Epoch: 1 Batch: 8073/20099 (40.17%) Loss: 1.989737 LR: 0.00002623 [11:27:03] Epoch: 
1 Batch: 8074/20099 (40.17%) Loss: 2.038685 LR: 0.00002622 [11:27:05] Epoch: 1 Batch: 8075/20099 (40.18%) Loss: 2.078929 LR: 0.00002622 [11:27:07] Epoch: 1 Batch: 8076/20099 (40.18%) Loss: 2.279980 LR: 0.00002622 [11:27:09] Epoch: 1 Batch: 8077/20099 (40.19%) Loss: 2.045713 LR: 0.00002622 [11:27:11] Epoch: 1 Batch: 8078/20099 (40.19%) Loss: 2.248418 LR: 0.00002622 [11:27:12] Epoch: 1 Batch: 8079/20099 (40.20%) Loss: 2.075420 LR: 0.00002622 [11:27:14] Epoch: 1 Batch: 8080/20099 (40.20%) Loss: 2.288369 LR: 0.00002622 [11:27:16] Epoch: 1 Batch: 8081/20099 (40.21%) Loss: 1.927092 LR: 0.00002621 [11:27:18] Epoch: 1 Batch: 8082/20099 (40.21%) Loss: 2.067911 LR: 0.00002621 [11:27:19] Epoch: 1 Batch: 8083/20099 (40.22%) Loss: 2.386840 LR: 0.00002621 [11:27:21] Epoch: 1 Batch: 8084/20099 (40.22%) Loss: 2.434735 LR: 0.00002621 [11:27:23] Epoch: 1 Batch: 8085/20099 (40.23%) Loss: 2.088951 LR: 0.00002621 [11:27:25] Epoch: 1 Batch: 8086/20099 (40.23%) Loss: 2.365932 LR: 0.00002621 [11:27:27] Epoch: 1 Batch: 8087/20099 (40.24%) Loss: 2.069150 LR: 0.00002621 [11:27:28] Epoch: 1 Batch: 8088/20099 (40.24%) Loss: 2.295676 LR: 0.00002620 [11:27:30] Epoch: 1 Batch: 8089/20099 (40.25%) Loss: 2.266593 LR: 0.00002620 [11:27:32] Epoch: 1 Batch: 8090/20099 (40.25%) Loss: 2.131532 LR: 0.00002620 [11:27:34] Epoch: 1 Batch: 8091/20099 (40.26%) Loss: 2.172271 LR: 0.00002620 [11:27:35] Epoch: 1 Batch: 8092/20099 (40.26%) Loss: 2.058846 LR: 0.00002620 [11:27:37] Epoch: 1 Batch: 8093/20099 (40.27%) Loss: 2.207103 LR: 0.00002620 [11:27:39] Epoch: 1 Batch: 8094/20099 (40.27%) Loss: 1.966424 LR: 0.00002620 [11:27:41] Epoch: 1 Batch: 8095/20099 (40.28%) Loss: 2.181124 LR: 0.00002618 [11:27:42] Epoch: 1 Batch: 8096/20099 (40.28%) Loss: 1.904945 LR: 0.00002618 [11:27:44] Epoch: 1 Batch: 8097/20099 (40.29%) Loss: 2.003220 LR: 0.00002618 [11:27:46] Epoch: 1 Batch: 8098/20099 (40.29%) Loss: 2.040104 LR: 0.00002618 [11:27:48] Epoch: 1 Batch: 8099/20099 (40.30%) Loss: 2.242136 LR: 0.00002618 [11:27:49] 
Epoch: 1 Batch: 8100/20099 (40.30%) Loss: 1.977487 LR: 0.00002618 [11:27:51] Epoch: 1 Batch: 8101/20099 (40.31%) Loss: 1.898668 LR: 0.00002618 [11:27:53] Epoch: 1 Batch: 8102/20099 (40.31%) Loss: 1.885321 LR: 0.00002617 [11:27:55] Epoch: 1 Batch: 8103/20099 (40.32%) Loss: 2.075983 LR: 0.00002617 [11:27:57] Epoch: 1 Batch: 8104/20099 (40.32%) Loss: 2.069250 LR: 0.00002617 [11:27:58] Epoch: 1 Batch: 8105/20099 (40.33%) Loss: 2.148896 LR: 0.00002617 [11:28:00] Epoch: 1 Batch: 8106/20099 (40.33%) Loss: 1.813303 LR: 0.00002617 [11:28:02] Epoch: 1 Batch: 8107/20099 (40.34%) Loss: 2.052743 LR: 0.00002617 [11:28:04] Epoch: 1 Batch: 8108/20099 (40.34%) Loss: 2.371491 LR: 0.00002617 [11:28:05] Epoch: 1 Batch: 8109/20099 (40.35%) Loss: 2.201392 LR: 0.00002616 [11:28:07] Epoch: 1 Batch: 8110/20099 (40.35%) Loss: 2.194938 LR: 0.00002616 [11:28:09] Epoch: 1 Batch: 8111/20099 (40.36%) Loss: 2.204846 LR: 0.00002616 [11:28:11] Epoch: 1 Batch: 8112/20099 (40.36%) Loss: 1.876159 LR: 0.00002616 [11:28:12] Epoch: 1 Batch: 8113/20099 (40.37%) Loss: 2.226472 LR: 0.00002616 [11:28:14] Epoch: 1 Batch: 8114/20099 (40.37%) Loss: 2.090899 LR: 0.00002616 [11:28:16] Epoch: 1 Batch: 8115/20099 (40.38%) Loss: 2.026616 LR: 0.00002616 [11:28:18] Epoch: 1 Batch: 8116/20099 (40.38%) Loss: 2.109978 LR: 0.00002615 [11:28:20] Epoch: 1 Batch: 8117/20099 (40.39%) Loss: 1.979089 LR: 0.00002615 [11:28:21] Epoch: 1 Batch: 8118/20099 (40.39%) Loss: 2.150599 LR: 0.00002615 [11:28:23] Epoch: 1 Batch: 8119/20099 (40.40%) Loss: 1.914079 LR: 0.00002615 [11:28:25] Epoch: 1 Batch: 8120/20099 (40.40%) Loss: 2.147960 LR: 0.00002615 [11:28:27] Epoch: 1 Batch: 8121/20099 (40.40%) Loss: 2.144169 LR: 0.00002615 [11:28:28] Epoch: 1 Batch: 8122/20099 (40.41%) Loss: 2.365145 LR: 0.00002615 [11:28:30] Epoch: 1 Batch: 8123/20099 (40.41%) Loss: 2.087547 LR: 0.00002614 [11:28:32] Epoch: 1 Batch: 8124/20099 (40.42%) Loss: 2.122096 LR: 0.00002614 [11:28:34] Epoch: 1 Batch: 8125/20099 (40.42%) Loss: 2.033706 LR: 0.00002614 
[11:28:36] Epoch: 1 Batch: 8126/20099 (40.43%) Loss: 2.131450 LR: 0.00002614 [11:28:37] Epoch: 1 Batch: 8127/20099 (40.43%) Loss: 1.983948 LR: 0.00002614 [11:28:39] Epoch: 1 Batch: 8128/20099 (40.44%) Loss: 2.322608 LR: 0.00002614 [11:28:41] Epoch: 1 Batch: 8129/20099 (40.44%) Loss: 2.063937 LR: 0.00002614 [11:28:43] Epoch: 1 Batch: 8130/20099 (40.45%) Loss: 1.881704 LR: 0.00002612 [11:28:44] Epoch: 1 Batch: 8131/20099 (40.45%) Loss: 2.292980 LR: 0.00002612 [11:28:46] Epoch: 1 Batch: 8132/20099 (40.46%) Loss: 1.766855 LR: 0.00002612 [11:28:48] Epoch: 1 Batch: 8133/20099 (40.46%) Loss: 1.939925 LR: 0.00002612 [11:28:50] Epoch: 1 Batch: 8134/20099 (40.47%) Loss: 2.133668 LR: 0.00002612 [11:28:51] Epoch: 1 Batch: 8135/20099 (40.47%) Loss: 2.061893 LR: 0.00002612 [11:28:53] Epoch: 1 Batch: 8136/20099 (40.48%) Loss: 1.912529 LR: 0.00002612 [11:28:55] Epoch: 1 Batch: 8137/20099 (40.48%) Loss: 2.062526 LR: 0.00002611 [11:28:57] Epoch: 1 Batch: 8138/20099 (40.49%) Loss: 2.072294 LR: 0.00002611 [11:28:59] Epoch: 1 Batch: 8139/20099 (40.49%) Loss: 2.097504 LR: 0.00002611 [11:29:00] Epoch: 1 Batch: 8140/20099 (40.50%) Loss: 1.816803 LR: 0.00002611 [11:29:02] Epoch: 1 Batch: 8141/20099 (40.50%) Loss: 2.211808 LR: 0.00002611 [11:29:04] Epoch: 1 Batch: 8142/20099 (40.51%) Loss: 1.892917 LR: 0.00002611 [11:29:06] Epoch: 1 Batch: 8143/20099 (40.51%) Loss: 2.049631 LR: 0.00002611 [11:29:07] Epoch: 1 Batch: 8144/20099 (40.52%) Loss: 1.988649 LR: 0.00002610 [11:29:09] Epoch: 1 Batch: 8145/20099 (40.52%) Loss: 2.092923 LR: 0.00002610 [11:29:11] Epoch: 1 Batch: 8146/20099 (40.53%) Loss: 1.924506 LR: 0.00002610 [11:29:13] Epoch: 1 Batch: 8147/20099 (40.53%) Loss: 2.405868 LR: 0.00002610 [11:29:15] Epoch: 1 Batch: 8148/20099 (40.54%) Loss: 2.132190 LR: 0.00002610 [11:29:16] Epoch: 1 Batch: 8149/20099 (40.54%) Loss: 2.085325 LR: 0.00002610 [11:29:18] Epoch: 1 Batch: 8150/20099 (40.55%) Loss: 2.080643 LR: 0.00002610 [11:29:20] Epoch: 1 Batch: 8151/20099 (40.55%) Loss: 2.010659 LR: 
0.00002609 [11:29:22] Epoch: 1 Batch: 8152/20099 (40.56%) Loss: 2.264739 LR: 0.00002609 [11:29:23] Epoch: 1 Batch: 8153/20099 (40.56%) Loss: 2.431526 LR: 0.00002609 [11:29:25] Epoch: 1 Batch: 8154/20099 (40.57%) Loss: 2.350100 LR: 0.00002609 [11:29:27] Epoch: 1 Batch: 8155/20099 (40.57%) Loss: 2.158111 LR: 0.00002609 [11:29:29] Epoch: 1 Batch: 8156/20099 (40.58%) Loss: 2.015496 LR: 0.00002609 [11:29:31] Epoch: 1 Batch: 8157/20099 (40.58%) Loss: 2.108106 LR: 0.00002609 [11:29:32] Epoch: 1 Batch: 8158/20099 (40.59%) Loss: 2.238278 LR: 0.00002608 [11:29:34] Epoch: 1 Batch: 8159/20099 (40.59%) Loss: 1.924238 LR: 0.00002608 [11:29:36] Epoch: 1 Batch: 8160/20099 (40.60%) Loss: 2.282678 LR: 0.00002608 [11:29:38] Epoch: 1 Batch: 8161/20099 (40.60%) Loss: 2.084300 LR: 0.00002608 [11:29:39] Epoch: 1 Batch: 8162/20099 (40.61%) Loss: 2.397682 LR: 0.00002608 [11:29:41] Epoch: 1 Batch: 8163/20099 (40.61%) Loss: 1.724172 LR: 0.00002608 [11:29:43] Epoch: 1 Batch: 8164/20099 (40.62%) Loss: 1.999162 LR: 0.00002608 [11:29:45] Epoch: 1 Batch: 8165/20099 (40.62%) Loss: 2.063351 LR: 0.00002606 [11:29:47] Epoch: 1 Batch: 8166/20099 (40.63%) Loss: 1.918376 LR: 0.00002606 [11:29:48] Epoch: 1 Batch: 8167/20099 (40.63%) Loss: 2.213048 LR: 0.00002606 [11:29:50] Epoch: 1 Batch: 8168/20099 (40.64%) Loss: 1.793094 LR: 0.00002606 [11:29:52] Epoch: 1 Batch: 8169/20099 (40.64%) Loss: 2.151948 LR: 0.00002606 [11:29:54] Epoch: 1 Batch: 8170/20099 (40.65%) Loss: 1.930595 LR: 0.00002606 [11:29:55] Epoch: 1 Batch: 8171/20099 (40.65%) Loss: 1.909399 LR: 0.00002606 [11:29:57] Epoch: 1 Batch: 8172/20099 (40.66%) Loss: 2.279598 LR: 0.00002605 [11:29:59] Epoch: 1 Batch: 8173/20099 (40.66%) Loss: 2.417250 LR: 0.00002605 [11:30:01] Epoch: 1 Batch: 8174/20099 (40.67%) Loss: 2.091132 LR: 0.00002605 [11:30:03] Epoch: 1 Batch: 8175/20099 (40.67%) Loss: 2.024986 LR: 0.00002605 [11:30:04] Epoch: 1 Batch: 8176/20099 (40.68%) Loss: 2.324004 LR: 0.00002605 [11:30:06] Epoch: 1 Batch: 8177/20099 (40.68%) Loss: 2.306357 
LR: 0.00002605 [11:30:08] Epoch: 1 Batch: 8178/20099 (40.69%) Loss: 2.103740 LR: 0.00002605 [11:30:10] Epoch: 1 Batch: 8179/20099 (40.69%) Loss: 1.767395 LR: 0.00002604 [11:30:11] Epoch: 1 Batch: 8180/20099 (40.70%) Loss: 2.132862 LR: 0.00002604 [11:30:13] Epoch: 1 Batch: 8181/20099 (40.70%) Loss: 2.320781 LR: 0.00002604 [11:30:15] Epoch: 1 Batch: 8182/20099 (40.71%) Loss: 2.084222 LR: 0.00002604 [11:30:17] Epoch: 1 Batch: 8183/20099 (40.71%) Loss: 2.149271 LR: 0.00002604 [11:30:19] Epoch: 1 Batch: 8184/20099 (40.72%) Loss: 2.026205 LR: 0.00002604 [11:30:20] Epoch: 1 Batch: 8185/20099 (40.72%) Loss: 2.202920 LR: 0.00002604 [11:30:22] Epoch: 1 Batch: 8186/20099 (40.73%) Loss: 1.989897 LR: 0.00002603 [11:30:24] Epoch: 1 Batch: 8187/20099 (40.73%) Loss: 2.411265 LR: 0.00002603 [11:30:26] Epoch: 1 Batch: 8188/20099 (40.74%) Loss: 2.127685 LR: 0.00002603 [11:30:27] Epoch: 1 Batch: 8189/20099 (40.74%) Loss: 2.115659 LR: 0.00002603 [11:30:29] Epoch: 1 Batch: 8190/20099 (40.75%) Loss: 2.319483 LR: 0.00002603 [11:30:31] Epoch: 1 Batch: 8191/20099 (40.75%) Loss: 2.101728 LR: 0.00002603 [11:30:33] Epoch: 1 Batch: 8192/20099 (40.76%) Loss: 2.121945 LR: 0.00002603 [11:30:35] Epoch: 1 Batch: 8193/20099 (40.76%) Loss: 2.246704 LR: 0.00002602 [11:30:36] Epoch: 1 Batch: 8194/20099 (40.77%) Loss: 1.910924 LR: 0.00002602 [11:30:38] Epoch: 1 Batch: 8195/20099 (40.77%) Loss: 2.356366 LR: 0.00002602 [11:30:40] Epoch: 1 Batch: 8196/20099 (40.78%) Loss: 2.065781 LR: 0.00002602 [11:30:42] Epoch: 1 Batch: 8197/20099 (40.78%) Loss: 2.082463 LR: 0.00002602 [11:30:43] Epoch: 1 Batch: 8198/20099 (40.79%) Loss: 2.227404 LR: 0.00002602 [11:30:45] Epoch: 1 Batch: 8199/20099 (40.79%) Loss: 2.247691 LR: 0.00002602 [11:30:50] >> Cleaned up old temp checkpoint: epoch1_step6200 [11:30:50] >> Temp checkpoint saved: epoch1_step8200, size: 0.1693 GB [11:30:50] Epoch: 1 Batch: 8200/20099 (40.80%) Loss: 2.103532 LR: 0.00002600 [11:30:52] Epoch: 1 Batch: 8201/20099 (40.80%) Loss: 2.047515 LR: 0.00002600 
[11:30:54] Epoch: 1 Batch: 8202/20099 (40.81%) Loss: 2.223495 LR: 0.00002600 [11:30:56] Epoch: 1 Batch: 8203/20099 (40.81%) Loss: 2.347199 LR: 0.00002600 [11:30:58] Epoch: 1 Batch: 8204/20099 (40.82%) Loss: 2.335638 LR: 0.00002600 [11:30:59] Epoch: 1 Batch: 8205/20099 (40.82%) Loss: 2.047778 LR: 0.00002600 [11:31:01] Epoch: 1 Batch: 8206/20099 (40.83%) Loss: 2.174614 LR: 0.00002600 [11:31:03] Epoch: 1 Batch: 8207/20099 (40.83%) Loss: 1.986754 LR: 0.00002599 [11:31:05] Epoch: 1 Batch: 8208/20099 (40.84%) Loss: 2.258337 LR: 0.00002599 [11:31:06] Epoch: 1 Batch: 8209/20099 (40.84%) Loss: 2.233785 LR: 0.00002599 [11:31:08] Epoch: 1 Batch: 8210/20099 (40.85%) Loss: 1.935499 LR: 0.00002599 [11:31:10] Epoch: 1 Batch: 8211/20099 (40.85%) Loss: 2.332697 LR: 0.00002599 [11:31:12] Epoch: 1 Batch: 8212/20099 (40.86%) Loss: 2.397747 LR: 0.00002599 [11:31:14] Epoch: 1 Batch: 8213/20099 (40.86%) Loss: 2.253101 LR: 0.00002599 [11:31:15] Epoch: 1 Batch: 8214/20099 (40.87%) Loss: 1.852451 LR: 0.00002598 [11:31:17] Epoch: 1 Batch: 8215/20099 (40.87%) Loss: 2.131278 LR: 0.00002598 [11:31:19] Epoch: 1 Batch: 8216/20099 (40.88%) Loss: 2.265303 LR: 0.00002598 [11:31:21] Epoch: 1 Batch: 8217/20099 (40.88%) Loss: 2.155893 LR: 0.00002598 [11:31:23] Epoch: 1 Batch: 8218/20099 (40.89%) Loss: 2.215322 LR: 0.00002598 [11:31:24] Epoch: 1 Batch: 8219/20099 (40.89%) Loss: 2.125850 LR: 0.00002598 [11:31:26] Epoch: 1 Batch: 8220/20099 (40.90%) Loss: 1.943813 LR: 0.00002598 [11:31:28] Epoch: 1 Batch: 8221/20099 (40.90%) Loss: 2.177730 LR: 0.00002597 [11:31:30] Epoch: 1 Batch: 8222/20099 (40.91%) Loss: 1.846109 LR: 0.00002597 [11:31:31] Epoch: 1 Batch: 8223/20099 (40.91%) Loss: 2.375459 LR: 0.00002597 [11:31:33] Epoch: 1 Batch: 8224/20099 (40.92%) Loss: 1.996266 LR: 0.00002597 [11:31:35] Epoch: 1 Batch: 8225/20099 (40.92%) Loss: 2.110236 LR: 0.00002597 [11:31:37] Epoch: 1 Batch: 8226/20099 (40.93%) Loss: 1.869027 LR: 0.00002597 [11:31:39] Epoch: 1 Batch: 8227/20099 (40.93%) Loss: 2.108586 LR: 
0.00002597 [11:31:40] Epoch: 1 Batch: 8228/20099 (40.94%) Loss: 2.110737 LR: 0.00002596 [11:31:42] Epoch: 1 Batch: 8229/20099 (40.94%) Loss: 2.118512 LR: 0.00002596 [11:31:44] Epoch: 1 Batch: 8230/20099 (40.95%) Loss: 2.045492 LR: 0.00002596 [11:31:46] Epoch: 1 Batch: 8231/20099 (40.95%) Loss: 2.283268 LR: 0.00002596 [11:31:47] Epoch: 1 Batch: 8232/20099 (40.96%) Loss: 2.253147 LR: 0.00002596 [11:31:49] Epoch: 1 Batch: 8233/20099 (40.96%) Loss: 2.287370 LR: 0.00002596 [11:31:51] Epoch: 1 Batch: 8234/20099 (40.97%) Loss: 1.911391 LR: 0.00002596 [11:31:53] Epoch: 1 Batch: 8235/20099 (40.97%) Loss: 2.213205 LR: 0.00002594 [11:31:54] Epoch: 1 Batch: 8236/20099 (40.98%) Loss: 2.301231 LR: 0.00002594 [11:31:56] Epoch: 1 Batch: 8237/20099 (40.98%) Loss: 2.063960 LR: 0.00002594 [11:31:58] Epoch: 1 Batch: 8238/20099 (40.99%) Loss: 2.050057 LR: 0.00002594 [11:32:00] Epoch: 1 Batch: 8239/20099 (40.99%) Loss: 2.095547 LR: 0.00002594 [11:32:02] Epoch: 1 Batch: 8240/20099 (41.00%) Loss: 1.928670 LR: 0.00002594 [11:32:03] Epoch: 1 Batch: 8241/20099 (41.00%) Loss: 1.848796 LR: 0.00002594 [11:32:05] Epoch: 1 Batch: 8242/20099 (41.01%) Loss: 2.150547 LR: 0.00002593 [11:32:07] Epoch: 1 Batch: 8243/20099 (41.01%) Loss: 1.700732 LR: 0.00002593 [11:32:09] Epoch: 1 Batch: 8244/20099 (41.02%) Loss: 1.973993 LR: 0.00002593 [11:32:10] Epoch: 1 Batch: 8245/20099 (41.02%) Loss: 2.132689 LR: 0.00002593 [11:32:12] Epoch: 1 Batch: 8246/20099 (41.03%) Loss: 2.147711 LR: 0.00002593 [11:32:14] Epoch: 1 Batch: 8247/20099 (41.03%) Loss: 2.245638 LR: 0.00002593 [11:32:16] Epoch: 1 Batch: 8248/20099 (41.04%) Loss: 2.066290 LR: 0.00002593 [11:32:18] Epoch: 1 Batch: 8249/20099 (41.04%) Loss: 2.386791 LR: 0.00002592 [11:32:19] Epoch: 1 Batch: 8250/20099 (41.05%) Loss: 1.859469 LR: 0.00002592 [11:32:21] Epoch: 1 Batch: 8251/20099 (41.05%) Loss: 1.961264 LR: 0.00002592 [11:32:23] Epoch: 1 Batch: 8252/20099 (41.06%) Loss: 1.790578 LR: 0.00002592 [11:32:25] Epoch: 1 Batch: 8253/20099 (41.06%) Loss: 2.089749 
LR: 0.00002592 [11:32:26] Epoch: 1 Batch: 8254/20099 (41.07%) Loss: 2.168762 LR: 0.00002592 [11:32:28] Epoch: 1 Batch: 8255/20099 (41.07%) Loss: 2.102539 LR: 0.00002592 [11:32:30] Epoch: 1 Batch: 8256/20099 (41.08%) Loss: 2.217268 LR: 0.00002591 [11:32:32] Epoch: 1 Batch: 8257/20099 (41.08%) Loss: 1.979180 LR: 0.00002591 [11:32:34] Epoch: 1 Batch: 8258/20099 (41.09%) Loss: 2.183273 LR: 0.00002591 [11:32:35] Epoch: 1 Batch: 8259/20099 (41.09%) Loss: 2.354937 LR: 0.00002591 [11:32:37] Epoch: 1 Batch: 8260/20099 (41.10%) Loss: 2.051922 LR: 0.00002591 [11:32:39] Epoch: 1 Batch: 8261/20099 (41.10%) Loss: 1.968598 LR: 0.00002591 [11:32:41] Epoch: 1 Batch: 8262/20099 (41.11%) Loss: 2.441431 LR: 0.00002591 [11:32:43] Epoch: 1 Batch: 8263/20099 (41.11%) Loss: 2.032303 LR: 0.00002590 [11:32:44] Epoch: 1 Batch: 8264/20099 (41.12%) Loss: 2.134115 LR: 0.00002590 [11:32:46] Epoch: 1 Batch: 8265/20099 (41.12%) Loss: 2.029630 LR: 0.00002590 [11:32:48] Epoch: 1 Batch: 8266/20099 (41.13%) Loss: 1.927977 LR: 0.00002590 [11:32:50] Epoch: 1 Batch: 8267/20099 (41.13%) Loss: 2.078717 LR: 0.00002590 [11:32:51] Epoch: 1 Batch: 8268/20099 (41.14%) Loss: 2.343168 LR: 0.00002590 [11:32:53] Epoch: 1 Batch: 8269/20099 (41.14%) Loss: 2.276478 LR: 0.00002590 [11:32:55] Epoch: 1 Batch: 8270/20099 (41.15%) Loss: 1.739197 LR: 0.00002588 [11:32:57] Epoch: 1 Batch: 8271/20099 (41.15%) Loss: 2.223879 LR: 0.00002588 [11:32:59] Epoch: 1 Batch: 8272/20099 (41.16%) Loss: 2.110296 LR: 0.00002588 [11:33:00] Epoch: 1 Batch: 8273/20099 (41.16%) Loss: 2.004871 LR: 0.00002588 [11:33:02] Epoch: 1 Batch: 8274/20099 (41.17%) Loss: 1.948287 LR: 0.00002588 [11:33:04] Epoch: 1 Batch: 8275/20099 (41.17%) Loss: 2.202878 LR: 0.00002588 [11:33:06] Epoch: 1 Batch: 8276/20099 (41.18%) Loss: 2.047437 LR: 0.00002588 [11:33:08] Epoch: 1 Batch: 8277/20099 (41.18%) Loss: 1.949892 LR: 0.00002587 [11:33:09] Epoch: 1 Batch: 8278/20099 (41.19%) Loss: 2.106364 LR: 0.00002587 [11:33:11] Epoch: 1 Batch: 8279/20099 (41.19%) Loss: 
2.011910 LR: 0.00002587 [11:33:13] Epoch: 1 Batch: 8280/20099 (41.20%) Loss: 2.104411 LR: 0.00002587 [11:33:15] Epoch: 1 Batch: 8281/20099 (41.20%) Loss: 2.362146 LR: 0.00002587 [11:33:16] Epoch: 1 Batch: 8282/20099 (41.21%) Loss: 2.035570 LR: 0.00002587 [11:33:18] Epoch: 1 Batch: 8283/20099 (41.21%) Loss: 2.079381 LR: 0.00002587 [11:33:20] Epoch: 1 Batch: 8284/20099 (41.22%) Loss: 2.328632 LR: 0.00002586 [11:33:22] Epoch: 1 Batch: 8285/20099 (41.22%) Loss: 2.030283 LR: 0.00002586 [11:33:23] Epoch: 1 Batch: 8286/20099 (41.23%) Loss: 2.045771 LR: 0.00002586 [11:33:25] Epoch: 1 Batch: 8287/20099 (41.23%) Loss: 1.794908 LR: 0.00002586 [11:33:27] Epoch: 1 Batch: 8288/20099 (41.24%) Loss: 2.279110 LR: 0.00002586 [11:33:29] Epoch: 1 Batch: 8289/20099 (41.24%) Loss: 2.111758 LR: 0.00002586 [11:33:31] Epoch: 1 Batch: 8290/20099 (41.25%) Loss: 2.477107 LR: 0.00002586 [11:33:32] Epoch: 1 Batch: 8291/20099 (41.25%) Loss: 1.909185 LR: 0.00002585 [11:33:34] Epoch: 1 Batch: 8292/20099 (41.26%) Loss: 1.998699 LR: 0.00002585 [11:33:36] Epoch: 1 Batch: 8293/20099 (41.26%) Loss: 1.997839 LR: 0.00002585 [11:33:38] Epoch: 1 Batch: 8294/20099 (41.27%) Loss: 2.187543 LR: 0.00002585 [11:33:40] Epoch: 1 Batch: 8295/20099 (41.27%) Loss: 2.319999 LR: 0.00002585 [11:33:41] Epoch: 1 Batch: 8296/20099 (41.28%) Loss: 2.363299 LR: 0.00002585 [11:33:43] Epoch: 1 Batch: 8297/20099 (41.28%) Loss: 1.838425 LR: 0.00002585 [11:33:45] Epoch: 1 Batch: 8298/20099 (41.29%) Loss: 2.144073 LR: 0.00002583 [11:33:47] Epoch: 1 Batch: 8299/20099 (41.29%) Loss: 1.776664 LR: 0.00002583 [11:33:48] Epoch: 1 Batch: 8300/20099 (41.30%) Loss: 2.348075 LR: 0.00002583 [11:33:50] Epoch: 1 Batch: 8301/20099 (41.30%) Loss: 2.438646 LR: 0.00002583 [11:33:52] Epoch: 1 Batch: 8302/20099 (41.31%) Loss: 2.087900 LR: 0.00002583 [11:33:54] Epoch: 1 Batch: 8303/20099 (41.31%) Loss: 2.235232 LR: 0.00002583 [11:33:56] Epoch: 1 Batch: 8304/20099 (41.32%) Loss: 2.217824 LR: 0.00002583 [11:33:57] Epoch: 1 Batch: 8305/20099 (41.32%) 
Loss: 2.130119 LR: 0.00002582 [11:33:59] Epoch: 1 Batch: 8306/20099 (41.33%) Loss: 1.960494 LR: 0.00002582 [11:34:01] Epoch: 1 Batch: 8307/20099 (41.33%) Loss: 1.944493 LR: 0.00002582 [11:34:03] Epoch: 1 Batch: 8308/20099 (41.34%) Loss: 2.021588 LR: 0.00002582 [11:34:04] Epoch: 1 Batch: 8309/20099 (41.34%) Loss: 2.081196 LR: 0.00002582 [11:34:06] Epoch: 1 Batch: 8310/20099 (41.35%) Loss: 2.019321 LR: 0.00002582 [11:34:08] Epoch: 1 Batch: 8311/20099 (41.35%) Loss: 2.207712 LR: 0.00002582 [11:34:10] Epoch: 1 Batch: 8312/20099 (41.36%) Loss: 2.043094 LR: 0.00002581 [11:34:12] Epoch: 1 Batch: 8313/20099 (41.36%) Loss: 1.935134 LR: 0.00002581 [11:34:13] Epoch: 1 Batch: 8314/20099 (41.37%) Loss: 2.127048 LR: 0.00002581 [11:34:15] Epoch: 1 Batch: 8315/20099 (41.37%) Loss: 2.136548 LR: 0.00002581 [11:34:17] Epoch: 1 Batch: 8316/20099 (41.38%) Loss: 2.417741 LR: 0.00002581 [11:34:19] Epoch: 1 Batch: 8317/20099 (41.38%) Loss: 1.796977 LR: 0.00002581 [11:34:20] Epoch: 1 Batch: 8318/20099 (41.39%) Loss: 2.099362 LR: 0.00002581 [11:34:22] Epoch: 1 Batch: 8319/20099 (41.39%) Loss: 2.100381 LR: 0.00002580 [11:34:24] Epoch: 1 Batch: 8320/20099 (41.40%) Loss: 2.255765 LR: 0.00002580 [11:34:26] Epoch: 1 Batch: 8321/20099 (41.40%) Loss: 1.894671 LR: 0.00002580 [11:34:28] Epoch: 1 Batch: 8322/20099 (41.41%) Loss: 2.047995 LR: 0.00002580 [11:34:29] Epoch: 1 Batch: 8323/20099 (41.41%) Loss: 2.025865 LR: 0.00002580 [11:34:31] Epoch: 1 Batch: 8324/20099 (41.41%) Loss: 2.054902 LR: 0.00002580 [11:34:33] Epoch: 1 Batch: 8325/20099 (41.42%) Loss: 2.210465 LR: 0.00002580 [11:34:35] Epoch: 1 Batch: 8326/20099 (41.42%) Loss: 2.111426 LR: 0.00002578 [11:34:36] Epoch: 1 Batch: 8327/20099 (41.43%) Loss: 1.940457 LR: 0.00002578 [11:34:38] Epoch: 1 Batch: 8328/20099 (41.43%) Loss: 2.161345 LR: 0.00002578 [11:34:40] Epoch: 1 Batch: 8329/20099 (41.44%) Loss: 1.977554 LR: 0.00002578 [11:34:42] Epoch: 1 Batch: 8330/20099 (41.44%) Loss: 2.280603 LR: 0.00002578 [11:34:44] Epoch: 1 Batch: 8331/20099 
(41.45%) Loss: 2.198880 LR: 0.00002578 [11:34:45] Epoch: 1 Batch: 8332/20099 (41.45%) Loss: 2.234639 LR: 0.00002578 [11:34:47] Epoch: 1 Batch: 8333/20099 (41.46%) Loss: 2.143510 LR: 0.00002577 [11:34:49] Epoch: 1 Batch: 8334/20099 (41.46%) Loss: 2.069878 LR: 0.00002577 [11:34:51] Epoch: 1 Batch: 8335/20099 (41.47%) Loss: 1.938427 LR: 0.00002577 [11:34:52] Epoch: 1 Batch: 8336/20099 (41.47%) Loss: 1.909236 LR: 0.00002577 [11:34:54] Epoch: 1 Batch: 8337/20099 (41.48%) Loss: 2.240653 LR: 0.00002577 [11:34:56] Epoch: 1 Batch: 8338/20099 (41.48%) Loss: 2.166502 LR: 0.00002577 [11:34:58] Epoch: 1 Batch: 8339/20099 (41.49%) Loss: 1.838065 LR: 0.00002577 [11:35:00] Epoch: 1 Batch: 8340/20099 (41.49%) Loss: 2.241515 LR: 0.00002576 [11:35:01] Epoch: 1 Batch: 8341/20099 (41.50%) Loss: 2.158964 LR: 0.00002576 [11:35:03] Epoch: 1 Batch: 8342/20099 (41.50%) Loss: 2.317140 LR: 0.00002576 [11:35:05] Epoch: 1 Batch: 8343/20099 (41.51%) Loss: 1.841415 LR: 0.00002576 [11:35:07] Epoch: 1 Batch: 8344/20099 (41.51%) Loss: 2.312675 LR: 0.00002576 [11:35:08] Epoch: 1 Batch: 8345/20099 (41.52%) Loss: 2.040964 LR: 0.00002576 [11:35:10] Epoch: 1 Batch: 8346/20099 (41.52%) Loss: 1.698634 LR: 0.00002576 [11:35:12] Epoch: 1 Batch: 8347/20099 (41.53%) Loss: 1.931397 LR: 0.00002575 [11:35:14] Epoch: 1 Batch: 8348/20099 (41.53%) Loss: 2.189879 LR: 0.00002575 [11:35:16] Epoch: 1 Batch: 8349/20099 (41.54%) Loss: 2.087610 LR: 0.00002575 [11:35:17] Epoch: 1 Batch: 8350/20099 (41.54%) Loss: 2.166534 LR: 0.00002575 [11:35:19] Epoch: 1 Batch: 8351/20099 (41.55%) Loss: 1.921308 LR: 0.00002575 [11:35:21] Epoch: 1 Batch: 8352/20099 (41.55%) Loss: 1.497729 LR: 0.00002575 [11:35:23] Epoch: 1 Batch: 8353/20099 (41.56%) Loss: 2.069600 LR: 0.00002575 [11:35:25] Epoch: 1 Batch: 8354/20099 (41.56%) Loss: 2.120371 LR: 0.00002573 [11:35:26] Epoch: 1 Batch: 8355/20099 (41.57%) Loss: 2.131090 LR: 0.00002573 [11:35:28] Epoch: 1 Batch: 8356/20099 (41.57%) Loss: 2.483771 LR: 0.00002573 [11:35:30] Epoch: 1 Batch: 
8357/20099 (41.58%) Loss: 2.265750 LR: 0.00002573 [11:35:32] Epoch: 1 Batch: 8358/20099 (41.58%) Loss: 2.274986 LR: 0.00002573 [11:35:33] Epoch: 1 Batch: 8359/20099 (41.59%) Loss: 2.300013 LR: 0.00002573 [11:35:35] Epoch: 1 Batch: 8360/20099 (41.59%) Loss: 2.208018 LR: 0.00002573 [11:35:37] Epoch: 1 Batch: 8361/20099 (41.60%) Loss: 2.316407 LR: 0.00002572 [11:35:39] Epoch: 1 Batch: 8362/20099 (41.60%) Loss: 2.380417 LR: 0.00002572 [11:35:41] Epoch: 1 Batch: 8363/20099 (41.61%) Loss: 2.100675 LR: 0.00002572 [11:35:42] Epoch: 1 Batch: 8364/20099 (41.61%) Loss: 2.304286 LR: 0.00002572 [11:35:44] Epoch: 1 Batch: 8365/20099 (41.62%) Loss: 2.019868 LR: 0.00002572 [11:35:46] Epoch: 1 Batch: 8366/20099 (41.62%) Loss: 1.876629 LR: 0.00002572 [11:35:48] Epoch: 1 Batch: 8367/20099 (41.63%) Loss: 2.073965 LR: 0.00002572 [11:35:49] Epoch: 1 Batch: 8368/20099 (41.63%) Loss: 2.215035 LR: 0.00002571 [11:35:51] Epoch: 1 Batch: 8369/20099 (41.64%) Loss: 1.950903 LR: 0.00002571 [11:35:53] Epoch: 1 Batch: 8370/20099 (41.64%) Loss: 2.225978 LR: 0.00002571 [11:35:55] Epoch: 1 Batch: 8371/20099 (41.65%) Loss: 2.049452 LR: 0.00002571 [11:35:57] Epoch: 1 Batch: 8372/20099 (41.65%) Loss: 2.098077 LR: 0.00002571 [11:35:58] Epoch: 1 Batch: 8373/20099 (41.66%) Loss: 2.496556 LR: 0.00002571 [11:36:00] Epoch: 1 Batch: 8374/20099 (41.66%) Loss: 2.193581 LR: 0.00002571 [11:36:02] Epoch: 1 Batch: 8375/20099 (41.67%) Loss: 2.276154 LR: 0.00002570 [11:36:04] Epoch: 1 Batch: 8376/20099 (41.67%) Loss: 2.194686 LR: 0.00002570 [11:36:05] Epoch: 1 Batch: 8377/20099 (41.68%) Loss: 2.273958 LR: 0.00002570 [11:36:07] Epoch: 1 Batch: 8378/20099 (41.68%) Loss: 2.111023 LR: 0.00002570 [11:36:09] Epoch: 1 Batch: 8379/20099 (41.69%) Loss: 1.934123 LR: 0.00002570 [11:36:11] Epoch: 1 Batch: 8380/20099 (41.69%) Loss: 2.115332 LR: 0.00002570 [11:36:12] Epoch: 1 Batch: 8381/20099 (41.70%) Loss: 2.146709 LR: 0.00002570 [11:36:14] Epoch: 1 Batch: 8382/20099 (41.70%) Loss: 2.267504 LR: 0.00002569 [11:36:16] Epoch: 1 
Batch: 8383/20099 (41.71%) Loss: 2.260318 LR: 0.00002569 [11:36:18] Epoch: 1 Batch: 8384/20099 (41.71%) Loss: 2.566031 LR: 0.00002569 [11:36:20] Epoch: 1 Batch: 8385/20099 (41.72%) Loss: 2.083444 LR: 0.00002569 [11:36:21] Epoch: 1 Batch: 8386/20099 (41.72%) Loss: 2.202050 LR: 0.00002569 [11:36:23] Epoch: 1 Batch: 8387/20099 (41.73%) Loss: 1.852112 LR: 0.00002569 [11:36:25] Epoch: 1 Batch: 8388/20099 (41.73%) Loss: 1.968212 LR: 0.00002569 [11:36:27] Epoch: 1 Batch: 8389/20099 (41.74%) Loss: 2.033763 LR: 0.00002567 [11:36:29] Epoch: 1 Batch: 8390/20099 (41.74%) Loss: 2.027413 LR: 0.00002567 [11:36:30] Epoch: 1 Batch: 8391/20099 (41.75%) Loss: 2.031100 LR: 0.00002567 [11:36:32] Epoch: 1 Batch: 8392/20099 (41.75%) Loss: 2.170081 LR: 0.00002567 [11:36:34] Epoch: 1 Batch: 8393/20099 (41.76%) Loss: 2.148518 LR: 0.00002567 [11:36:36] Epoch: 1 Batch: 8394/20099 (41.76%) Loss: 2.034337 LR: 0.00002567 [11:36:37] Epoch: 1 Batch: 8395/20099 (41.77%) Loss: 2.391926 LR: 0.00002567 [11:36:39] Epoch: 1 Batch: 8396/20099 (41.77%) Loss: 2.413854 LR: 0.00002566 [11:36:41] Epoch: 1 Batch: 8397/20099 (41.78%) Loss: 2.125856 LR: 0.00002566 [11:36:43] Epoch: 1 Batch: 8398/20099 (41.78%) Loss: 2.021636 LR: 0.00002566 [11:36:45] Epoch: 1 Batch: 8399/20099 (41.79%) Loss: 1.933128 LR: 0.00002566 [11:36:50] >> Cleaned up old temp checkpoint: epoch1_step6400 [11:36:50] >> Temp checkpoint saved: epoch1_step8400, size: 0.1693 GB [11:36:50] Epoch: 1 Batch: 8400/20099 (41.79%) Loss: 2.135387 LR: 0.00002566 [11:36:52] Epoch: 1 Batch: 8401/20099 (41.80%) Loss: 2.146113 LR: 0.00002566 [11:36:53] Epoch: 1 Batch: 8402/20099 (41.80%) Loss: 1.927920 LR: 0.00002566 [11:36:55] Epoch: 1 Batch: 8403/20099 (41.81%) Loss: 2.080199 LR: 0.00002565 [11:36:57] Epoch: 1 Batch: 8404/20099 (41.81%) Loss: 2.045110 LR: 0.00002565 [11:36:59] Epoch: 1 Batch: 8405/20099 (41.82%) Loss: 1.747438 LR: 0.00002565 [11:37:00] Epoch: 1 Batch: 8406/20099 (41.82%) Loss: 2.329950 LR: 0.00002565 [11:37:02] Epoch: 1 Batch: 8407/20099 
(41.83%) Loss: 1.695412 LR: 0.00002565 [11:37:04] Epoch: 1 Batch: 8408/20099 (41.83%) Loss: 2.111331 LR: 0.00002565 [11:37:06] Epoch: 1 Batch: 8409/20099 (41.84%) Loss: 2.257595 LR: 0.00002565 [11:37:08] Epoch: 1 Batch: 8410/20099 (41.84%) Loss: 2.316241 LR: 0.00002564 [11:37:09] Epoch: 1 Batch: 8411/20099 (41.85%) Loss: 1.885411 LR: 0.00002564 [11:37:11] Epoch: 1 Batch: 8412/20099 (41.85%) Loss: 2.204722 LR: 0.00002564 [11:37:13] Epoch: 1 Batch: 8413/20099 (41.86%) Loss: 2.527086 LR: 0.00002564 [11:37:15] Epoch: 1 Batch: 8414/20099 (41.86%) Loss: 1.930007 LR: 0.00002564 [11:37:17] Epoch: 1 Batch: 8415/20099 (41.87%) Loss: 2.023625 LR: 0.00002564 [11:37:18] Epoch: 1 Batch: 8416/20099 (41.87%) Loss: 2.303977 LR: 0.00002564 [11:37:20] Epoch: 1 Batch: 8417/20099 (41.88%) Loss: 2.179839 LR: 0.00002562 [11:37:22] Epoch: 1 Batch: 8418/20099 (41.88%) Loss: 2.179608 LR: 0.00002562 [11:37:24] Epoch: 1 Batch: 8419/20099 (41.89%) Loss: 2.093322 LR: 0.00002562 [11:37:26] Epoch: 1 Batch: 8420/20099 (41.89%) Loss: 2.109544 LR: 0.00002562 [11:37:27] Epoch: 1 Batch: 8421/20099 (41.90%) Loss: 2.187616 LR: 0.00002562 [11:37:29] Epoch: 1 Batch: 8422/20099 (41.90%) Loss: 1.768944 LR: 0.00002562 [11:37:31] Epoch: 1 Batch: 8423/20099 (41.91%) Loss: 2.315255 LR: 0.00002562 [11:37:33] Epoch: 1 Batch: 8424/20099 (41.91%) Loss: 2.246185 LR: 0.00002561 [11:37:35] Epoch: 1 Batch: 8425/20099 (41.92%) Loss: 2.195369 LR: 0.00002561 [11:37:36] Epoch: 1 Batch: 8426/20099 (41.92%) Loss: 2.090518 LR: 0.00002561 [11:37:38] Epoch: 1 Batch: 8427/20099 (41.93%) Loss: 1.947975 LR: 0.00002561 [11:37:40] Epoch: 1 Batch: 8428/20099 (41.93%) Loss: 1.971010 LR: 0.00002561 [11:37:42] Epoch: 1 Batch: 8429/20099 (41.94%) Loss: 2.299153 LR: 0.00002561 [11:37:43] Epoch: 1 Batch: 8430/20099 (41.94%) Loss: 2.261442 LR: 0.00002561 [11:37:45] Epoch: 1 Batch: 8431/20099 (41.95%) Loss: 1.807941 LR: 0.00002560 [11:37:47] Epoch: 1 Batch: 8432/20099 (41.95%) Loss: 2.288074 LR: 0.00002560 [11:37:49] Epoch: 1 Batch: 
8433/20099 (41.96%) Loss: 1.866800 LR: 0.00002560 [11:37:50] Epoch: 1 Batch: 8434/20099 (41.96%) Loss: 2.068439 LR: 0.00002560 [11:37:52] Epoch: 1 Batch: 8435/20099 (41.97%) Loss: 1.894861 LR: 0.00002560 [11:37:54] Epoch: 1 Batch: 8436/20099 (41.97%) Loss: 2.248547 LR: 0.00002560 [11:37:56] Epoch: 1 Batch: 8437/20099 (41.98%) Loss: 2.037016 LR: 0.00002560 [11:37:58] Epoch: 1 Batch: 8438/20099 (41.98%) Loss: 2.171897 LR: 0.00002558 [11:37:59] Epoch: 1 Batch: 8439/20099 (41.99%) Loss: 1.719218 LR: 0.00002558 [11:38:01] Epoch: 1 Batch: 8440/20099 (41.99%) Loss: 2.205342 LR: 0.00002558 [11:38:03] Epoch: 1 Batch: 8441/20099 (42.00%) Loss: 2.079671 LR: 0.00002558 [11:38:05] Epoch: 1 Batch: 8442/20099 (42.00%) Loss: 2.129334 LR: 0.00002558 [11:38:06] Epoch: 1 Batch: 8443/20099 (42.01%) Loss: 1.992893 LR: 0.00002558 [11:38:08] Epoch: 1 Batch: 8444/20099 (42.01%) Loss: 2.037917 LR: 0.00002558 [11:38:10] Epoch: 1 Batch: 8445/20099 (42.02%) Loss: 1.692624 LR: 0.00002557 [11:38:12] Epoch: 1 Batch: 8446/20099 (42.02%) Loss: 2.036050 LR: 0.00002557 [11:38:13] Epoch: 1 Batch: 8447/20099 (42.03%) Loss: 2.151200 LR: 0.00002557 [11:38:15] Epoch: 1 Batch: 8448/20099 (42.03%) Loss: 2.420434 LR: 0.00002557 [11:38:17] Epoch: 1 Batch: 8449/20099 (42.04%) Loss: 2.120061 LR: 0.00002557 [11:38:19] Epoch: 1 Batch: 8450/20099 (42.04%) Loss: 2.041980 LR: 0.00002557 [11:38:21] Epoch: 1 Batch: 8451/20099 (42.05%) Loss: 2.481949 LR: 0.00002557 [11:38:22] Epoch: 1 Batch: 8452/20099 (42.05%) Loss: 2.106501 LR: 0.00002556 [11:38:24] Epoch: 1 Batch: 8453/20099 (42.06%) Loss: 1.986683 LR: 0.00002556 [11:38:26] Epoch: 1 Batch: 8454/20099 (42.06%) Loss: 2.070791 LR: 0.00002556 [11:38:28] Epoch: 1 Batch: 8455/20099 (42.07%) Loss: 1.831435 LR: 0.00002556 [11:38:30] Epoch: 1 Batch: 8456/20099 (42.07%) Loss: 2.198513 LR: 0.00002556 [11:38:31] Epoch: 1 Batch: 8457/20099 (42.08%) Loss: 2.219378 LR: 0.00002556 [11:38:33] Epoch: 1 Batch: 8458/20099 (42.08%) Loss: 2.016115 LR: 0.00002556 [11:38:35] Epoch: 1 
Batch: 8459/20099 (42.09%) Loss: 2.088208 LR: 0.00002555 [11:38:37] Epoch: 1 Batch: 8460/20099 (42.09%) Loss: 2.444727 LR: 0.00002555 [11:38:38] Epoch: 1 Batch: 8461/20099 (42.10%) Loss: 1.970284 LR: 0.00002555 [11:38:40] Epoch: 1 Batch: 8462/20099 (42.10%) Loss: 1.962031 LR: 0.00002555 [11:38:42] Epoch: 1 Batch: 8463/20099 (42.11%) Loss: 2.339386 LR: 0.00002555 [11:38:44] Epoch: 1 Batch: 8464/20099 (42.11%) Loss: 2.011597 LR: 0.00002555 [11:38:46] Epoch: 1 Batch: 8465/20099 (42.12%) Loss: 2.159337 LR: 0.00002555 [11:38:47] Epoch: 1 Batch: 8466/20099 (42.12%) Loss: 1.575574 LR: 0.00002553 [11:38:49] Epoch: 1 Batch: 8467/20099 (42.13%) Loss: 1.765136 LR: 0.00002553 [11:38:51] Epoch: 1 Batch: 8468/20099 (42.13%) Loss: 2.105543 LR: 0.00002553 [11:38:53] Epoch: 1 Batch: 8469/20099 (42.14%) Loss: 2.307961 LR: 0.00002553 [11:38:54] Epoch: 1 Batch: 8470/20099 (42.14%) Loss: 2.099694 LR: 0.00002553 [11:38:56] Epoch: 1 Batch: 8471/20099 (42.15%) Loss: 2.275409 LR: 0.00002553 [11:38:58] Epoch: 1 Batch: 8472/20099 (42.15%) Loss: 2.397203 LR: 0.00002553 [11:39:00] Epoch: 1 Batch: 8473/20099 (42.16%) Loss: 2.136112 LR: 0.00002552 [11:39:02] Epoch: 1 Batch: 8474/20099 (42.16%) Loss: 2.058021 LR: 0.00002552 [11:39:03] Epoch: 1 Batch: 8475/20099 (42.17%) Loss: 2.098475 LR: 0.00002552 [11:39:05] Epoch: 1 Batch: 8476/20099 (42.17%) Loss: 2.087186 LR: 0.00002552 [11:39:07] Epoch: 1 Batch: 8477/20099 (42.18%) Loss: 1.549926 LR: 0.00002552 [11:39:09] Epoch: 1 Batch: 8478/20099 (42.18%) Loss: 2.008648 LR: 0.00002552 [11:39:10] Epoch: 1 Batch: 8479/20099 (42.19%) Loss: 2.333879 LR: 0.00002552 [11:39:12] Epoch: 1 Batch: 8480/20099 (42.19%) Loss: 2.150656 LR: 0.00002551 [11:39:14] Epoch: 1 Batch: 8481/20099 (42.20%) Loss: 2.289738 LR: 0.00002551 [11:39:16] Epoch: 1 Batch: 8482/20099 (42.20%) Loss: 2.222258 LR: 0.00002551 [11:39:18] Epoch: 1 Batch: 8483/20099 (42.21%) Loss: 2.175740 LR: 0.00002551 [11:39:19] Epoch: 1 Batch: 8484/20099 (42.21%) Loss: 2.311911 LR: 0.00002551 [11:39:21] Epoch: 
1 Batch: 8485/20099 (42.22%) Loss: 1.973220 LR: 0.00002551 [11:39:23] Epoch: 1 Batch: 8486/20099 (42.22%) Loss: 2.091789 LR: 0.00002551 [11:39:25] Epoch: 1 Batch: 8487/20099 (42.23%) Loss: 2.123166 LR: 0.00002550 [11:39:26] Epoch: 1 Batch: 8488/20099 (42.23%) Loss: 1.958299 LR: 0.00002550 [11:39:28] Epoch: 1 Batch: 8489/20099 (42.24%) Loss: 2.122168 LR: 0.00002550 [11:39:30] Epoch: 1 Batch: 8490/20099 (42.24%) Loss: 2.057658 LR: 0.00002550 [11:39:32] Epoch: 1 Batch: 8491/20099 (42.25%) Loss: 2.021213 LR: 0.00002550 [11:39:33] Epoch: 1 Batch: 8492/20099 (42.25%) Loss: 2.194717 LR: 0.00002550 [11:39:35] Epoch: 1 Batch: 8493/20099 (42.26%) Loss: 2.023645 LR: 0.00002550 [11:39:37] Epoch: 1 Batch: 8494/20099 (42.26%) Loss: 2.238142 LR: 0.00002548 [11:39:39] Epoch: 1 Batch: 8495/20099 (42.27%) Loss: 2.036112 LR: 0.00002548 [11:39:41] Epoch: 1 Batch: 8496/20099 (42.27%) Loss: 2.241148 LR: 0.00002548 [11:39:42] Epoch: 1 Batch: 8497/20099 (42.28%) Loss: 1.967391 LR: 0.00002548 [11:39:44] Epoch: 1 Batch: 8498/20099 (42.28%) Loss: 2.068941 LR: 0.00002548 [11:39:46] Epoch: 1 Batch: 8499/20099 (42.29%) Loss: 2.199587 LR: 0.00002548 [11:39:48] >> Evaluating batch 0 [11:39:49] >> Evaluating batch 1 [11:39:50] >> Evaluating batch 2 [11:39:51] >> Evaluating batch 3 [11:39:52] >> Evaluating batch 4 [11:39:53] >> Evaluating batch 5 [11:39:54] >> Evaluating batch 6 [11:39:55] >> Evaluating batch 7 [11:39:56] >> Evaluating batch 8 [11:39:57] >> Evaluating batch 9 [11:39:58] >> Evaluating batch 10 [11:39:59] >> Evaluating batch 11 [11:40:00] >> Evaluating batch 12 [11:40:01] >> Evaluating batch 13 [11:40:02] >> Evaluating batch 14 [11:40:02] >> Evaluating batch 15 [11:40:03] >> Evaluating batch 16 [11:40:04] Epoch: 1 Step: 8500/20099 Evaluation: [11:40:04] Avg Loss Since Last Eval: 2.1053 Val Loss: 2.1758 Validation loss delta: -0.0012 Perplexity: 8.8091 LR: 0.00002548 [11:40:08] >> Checkpoint saved: epoch1_step8500, size: 0.1693 GB [11:40:08] Epoch: 1 Batch: 8500/20099 (42.29%)
Loss: 2.226770 LR: 0.00002548 [11:40:09] Epoch: 1 Batch: 8501/20099 (42.30%) Loss: 2.055083 LR: 0.00002547 [11:40:11] Epoch: 1 Batch: 8502/20099 (42.30%) Loss: 2.110502 LR: 0.00002547 [11:40:13] Epoch: 1 Batch: 8503/20099 (42.31%) Loss: 2.077309 LR: 0.00002547 [11:40:15] Epoch: 1 Batch: 8504/20099 (42.31%) Loss: 1.603479 LR: 0.00002547 [11:40:16] Epoch: 1 Batch: 8505/20099 (42.32%) Loss: 2.289458 LR: 0.00002547 [11:40:18] Epoch: 1 Batch: 8506/20099 (42.32%) Loss: 2.025167 LR: 0.00002547 [11:40:20] Epoch: 1 Batch: 8507/20099 (42.33%) Loss: 1.626361 LR: 0.00002547 [11:40:22] Epoch: 1 Batch: 8508/20099 (42.33%) Loss: 1.991895 LR: 0.00002546 [11:40:23] Epoch: 1 Batch: 8509/20099 (42.34%) Loss: 2.229913 LR: 0.00002546 [11:40:25] Epoch: 1 Batch: 8510/20099 (42.34%) Loss: 1.980468 LR: 0.00002546 [11:40:27] Epoch: 1 Batch: 8511/20099 (42.35%) Loss: 2.208723 LR: 0.00002546 [11:40:29] Epoch: 1 Batch: 8512/20099 (42.35%) Loss: 2.088488 LR: 0.00002546 [11:40:31] Epoch: 1 Batch: 8513/20099 (42.36%) Loss: 2.047435 LR: 0.00002546 [11:40:33] Epoch: 1 Batch: 8514/20099 (42.36%) Loss: 2.281674 LR: 0.00002546 [11:40:34] Epoch: 1 Batch: 8515/20099 (42.37%) Loss: 1.859825 LR: 0.00002545 [11:40:36] Epoch: 1 Batch: 8516/20099 (42.37%) Loss: 2.385668 LR: 0.00002545 [11:40:38] Epoch: 1 Batch: 8517/20099 (42.38%) Loss: 1.971903 LR: 0.00002545 [11:40:40] Epoch: 1 Batch: 8518/20099 (42.38%) Loss: 1.972402 LR: 0.00002545 [11:40:42] Epoch: 1 Batch: 8519/20099 (42.39%) Loss: 1.664523 LR: 0.00002545 [11:40:43] Epoch: 1 Batch: 8520/20099 (42.39%) Loss: 1.535823 LR: 0.00002545 [11:40:45] Epoch: 1 Batch: 8521/20099 (42.40%) Loss: 1.912231 LR: 0.00002545 [11:40:47] Epoch: 1 Batch: 8522/20099 (42.40%) Loss: 2.273691 LR: 0.00002543 [11:40:49] Epoch: 1 Batch: 8523/20099 (42.41%) Loss: 2.132195 LR: 0.00002543 [11:40:51] Epoch: 1 Batch: 8524/20099 (42.41%) Loss: 2.121899 LR: 0.00002543 [11:40:52] Epoch: 1 Batch: 8525/20099 (42.42%) Loss: 2.294261 LR: 0.00002543 [11:40:54] Epoch: 1 Batch: 8526/20099 
(42.42%) Loss: 2.072601 LR: 0.00002543 [11:40:56] Epoch: 1 Batch: 8527/20099 (42.42%) Loss: 2.205577 LR: 0.00002543 [11:40:58] Epoch: 1 Batch: 8528/20099 (42.43%) Loss: 1.931711 LR: 0.00002543 [11:40:59] Epoch: 1 Batch: 8529/20099 (42.43%) Loss: 2.159572 LR: 0.00002542 [11:41:01] Epoch: 1 Batch: 8530/20099 (42.44%) Loss: 1.947933 LR: 0.00002542 [11:41:03] Epoch: 1 Batch: 8531/20099 (42.44%) Loss: 2.356861 LR: 0.00002542 [11:41:05] Epoch: 1 Batch: 8532/20099 (42.45%) Loss: 2.129709 LR: 0.00002542 [11:41:06] Epoch: 1 Batch: 8533/20099 (42.45%) Loss: 2.025892 LR: 0.00002542 [11:41:08] Epoch: 1 Batch: 8534/20099 (42.46%) Loss: 2.348033 LR: 0.00002542 [11:41:10] Epoch: 1 Batch: 8535/20099 (42.46%) Loss: 2.088488 LR: 0.00002542 [11:41:12] Epoch: 1 Batch: 8536/20099 (42.47%) Loss: 2.191310 LR: 0.00002541 [11:41:13] Epoch: 1 Batch: 8537/20099 (42.47%) Loss: 2.397079 LR: 0.00002541 [11:41:15] Epoch: 1 Batch: 8538/20099 (42.48%) Loss: 2.136764 LR: 0.00002541 [11:41:17] Epoch: 1 Batch: 8539/20099 (42.48%) Loss: 2.045951 LR: 0.00002541 [11:41:19] Epoch: 1 Batch: 8540/20099 (42.49%) Loss: 2.173259 LR: 0.00002541 [11:41:21] Epoch: 1 Batch: 8541/20099 (42.49%) Loss: 2.107357 LR: 0.00002541 [11:41:22] Epoch: 1 Batch: 8542/20099 (42.50%) Loss: 2.171087 LR: 0.00002541 [11:41:24] Epoch: 1 Batch: 8543/20099 (42.50%) Loss: 1.919832 LR: 0.00002539 [11:41:26] Epoch: 1 Batch: 8544/20099 (42.51%) Loss: 2.023543 LR: 0.00002539 [11:41:28] Epoch: 1 Batch: 8545/20099 (42.51%) Loss: 2.144823 LR: 0.00002539 [11:41:29] Epoch: 1 Batch: 8546/20099 (42.52%) Loss: 1.947392 LR: 0.00002539 [11:41:31] Epoch: 1 Batch: 8547/20099 (42.52%) Loss: 2.120265 LR: 0.00002539 [11:41:33] Epoch: 1 Batch: 8548/20099 (42.53%) Loss: 2.246254 LR: 0.00002539 [11:41:35] Epoch: 1 Batch: 8549/20099 (42.53%) Loss: 2.358902 LR: 0.00002539 [11:41:36] Epoch: 1 Batch: 8550/20099 (42.54%) Loss: 2.277602 LR: 0.00002538 [11:41:38] Epoch: 1 Batch: 8551/20099 (42.54%) Loss: 2.060599 LR: 0.00002538 [11:41:40] Epoch: 1 Batch: 
8552/20099 (42.55%) Loss: 1.908423 LR: 0.00002538 [11:41:42] Epoch: 1 Batch: 8553/20099 (42.55%) Loss: 1.776831 LR: 0.00002538 [11:41:44] Epoch: 1 Batch: 8554/20099 (42.56%) Loss: 2.154840 LR: 0.00002538 [11:41:45] Epoch: 1 Batch: 8555/20099 (42.56%) Loss: 2.037282 LR: 0.00002538 [11:41:47] Epoch: 1 Batch: 8556/20099 (42.57%) Loss: 2.033813 LR: 0.00002538 [11:41:49] Epoch: 1 Batch: 8557/20099 (42.57%) Loss: 2.467761 LR: 0.00002537 [11:41:51] Epoch: 1 Batch: 8558/20099 (42.58%) Loss: 1.687747 LR: 0.00002537 [11:41:52] Epoch: 1 Batch: 8559/20099 (42.58%) Loss: 2.349386 LR: 0.00002537 [11:41:54] Epoch: 1 Batch: 8560/20099 (42.59%) Loss: 2.244714 LR: 0.00002537 [11:41:56] Epoch: 1 Batch: 8561/20099 (42.59%) Loss: 2.199562 LR: 0.00002537 [11:41:58] Epoch: 1 Batch: 8562/20099 (42.60%) Loss: 1.936094 LR: 0.00002537 [11:42:00] Epoch: 1 Batch: 8563/20099 (42.60%) Loss: 1.812430 LR: 0.00002537 [11:42:01] Epoch: 1 Batch: 8564/20099 (42.61%) Loss: 2.126463 LR: 0.00002536 [11:42:03] Epoch: 1 Batch: 8565/20099 (42.61%) Loss: 2.266934 LR: 0.00002536 [11:42:05] Epoch: 1 Batch: 8566/20099 (42.62%) Loss: 2.285975 LR: 0.00002536 [11:42:07] Epoch: 1 Batch: 8567/20099 (42.62%) Loss: 2.363725 LR: 0.00002536 [11:42:09] Epoch: 1 Batch: 8568/20099 (42.63%) Loss: 1.907679 LR: 0.00002536 [11:42:10] Epoch: 1 Batch: 8569/20099 (42.63%) Loss: 2.078296 LR: 0.00002536 [11:42:12] Epoch: 1 Batch: 8570/20099 (42.64%) Loss: 1.833547 LR: 0.00002536 [11:42:14] Epoch: 1 Batch: 8571/20099 (42.64%) Loss: 2.315106 LR: 0.00002534 [11:42:16] Epoch: 1 Batch: 8572/20099 (42.65%) Loss: 2.023692 LR: 0.00002534 [11:42:17] Epoch: 1 Batch: 8573/20099 (42.65%) Loss: 2.152697 LR: 0.00002534 [11:42:19] Epoch: 1 Batch: 8574/20099 (42.66%) Loss: 2.287580 LR: 0.00002534 [11:42:21] Epoch: 1 Batch: 8575/20099 (42.66%) Loss: 2.201092 LR: 0.00002534 [11:42:23] Epoch: 1 Batch: 8576/20099 (42.67%) Loss: 1.964422 LR: 0.00002534 [11:42:24] Epoch: 1 Batch: 8577/20099 (42.67%) Loss: 1.683542 LR: 0.00002534 [11:42:26] Epoch: 1 
Batch: 8578/20099 (42.68%) Loss: 1.966780 LR: 0.00002533 [11:42:28] Epoch: 1 Batch: 8579/20099 (42.68%) Loss: 1.915959 LR: 0.00002533 [11:42:30] Epoch: 1 Batch: 8580/20099 (42.69%) Loss: 1.956283 LR: 0.00002533 [11:42:32] Epoch: 1 Batch: 8581/20099 (42.69%) Loss: 2.216053 LR: 0.00002533 [11:42:33] Epoch: 1 Batch: 8582/20099 (42.70%) Loss: 2.547195 LR: 0.00002533 [11:42:35] Epoch: 1 Batch: 8583/20099 (42.70%) Loss: 2.067507 LR: 0.00002533 [11:42:37] Epoch: 1 Batch: 8584/20099 (42.71%) Loss: 1.890233 LR: 0.00002533 [11:42:39] Epoch: 1 Batch: 8585/20099 (42.71%) Loss: 2.322392 LR: 0.00002532 [11:42:40] Epoch: 1 Batch: 8586/20099 (42.72%) Loss: 2.068478 LR: 0.00002532 [11:42:42] Epoch: 1 Batch: 8587/20099 (42.72%) Loss: 2.115974 LR: 0.00002532 [11:42:44] Epoch: 1 Batch: 8588/20099 (42.73%) Loss: 2.226666 LR: 0.00002532 [11:42:46] Epoch: 1 Batch: 8589/20099 (42.73%) Loss: 2.116151 LR: 0.00002532 [11:42:47] Epoch: 1 Batch: 8590/20099 (42.74%) Loss: 2.033291 LR: 0.00002532 [11:42:49] Epoch: 1 Batch: 8591/20099 (42.74%) Loss: 2.053727 LR: 0.00002532 [11:42:51] Epoch: 1 Batch: 8592/20099 (42.75%) Loss: 2.059513 LR: 0.00002530 [11:42:53] Epoch: 1 Batch: 8593/20099 (42.75%) Loss: 1.838191 LR: 0.00002530 [11:42:55] Epoch: 1 Batch: 8594/20099 (42.76%) Loss: 2.229143 LR: 0.00002530 [11:42:56] Epoch: 1 Batch: 8595/20099 (42.76%) Loss: 2.222639 LR: 0.00002530 [11:42:58] Epoch: 1 Batch: 8596/20099 (42.77%) Loss: 2.135159 LR: 0.00002530 [11:43:00] Epoch: 1 Batch: 8597/20099 (42.77%) Loss: 1.994958 LR: 0.00002530 [11:43:02] Epoch: 1 Batch: 8598/20099 (42.78%) Loss: 1.803015 LR: 0.00002530 [11:43:03] Epoch: 1 Batch: 8599/20099 (42.78%) Loss: 1.964717 LR: 0.00002529 [11:43:09] >> Cleaned up old temp checkpoint: epoch1_step6600 [11:43:09] >> Temp checkpoint saved: epoch1_step8600, size: 0.1693 GB [11:43:09] Epoch: 1 Batch: 8600/20099 (42.79%) Loss: 2.135351 LR: 0.00002529 [11:43:10] Epoch: 1 Batch: 8601/20099 (42.79%) Loss: 2.388689 LR: 0.00002529 [11:43:12] Epoch: 1 Batch: 8602/20099 
(42.80%) Loss: 2.108501 LR: 0.00002529 [11:43:14] Epoch: 1 Batch: 8603/20099 (42.80%) Loss: 2.207276 LR: 0.00002529 [11:43:16] Epoch: 1 Batch: 8604/20099 (42.81%) Loss: 2.053855 LR: 0.00002529 [11:43:17] Epoch: 1 Batch: 8605/20099 (42.81%) Loss: 2.025039 LR: 0.00002529 [11:43:19] Epoch: 1 Batch: 8606/20099 (42.82%) Loss: 2.252290 LR: 0.00002528 [11:43:21] Epoch: 1 Batch: 8607/20099 (42.82%) Loss: 2.091642 LR: 0.00002528 [11:43:23] Epoch: 1 Batch: 8608/20099 (42.83%) Loss: 2.059448 LR: 0.00002528 [11:43:25] Epoch: 1 Batch: 8609/20099 (42.83%) Loss: 1.999621 LR: 0.00002528 [11:43:26] Epoch: 1 Batch: 8610/20099 (42.84%) Loss: 2.349713 LR: 0.00002528 [11:43:28] Epoch: 1 Batch: 8611/20099 (42.84%) Loss: 2.089671 LR: 0.00002528 [11:43:30] Epoch: 1 Batch: 8612/20099 (42.85%) Loss: 1.988560 LR: 0.00002528 [11:43:32] Epoch: 1 Batch: 8613/20099 (42.85%) Loss: 2.185002 LR: 0.00002527 [11:43:34] Epoch: 1 Batch: 8614/20099 (42.86%) Loss: 2.090698 LR: 0.00002527 [11:43:35] Epoch: 1 Batch: 8615/20099 (42.86%) Loss: 1.904257 LR: 0.00002527 [11:43:37] Epoch: 1 Batch: 8616/20099 (42.87%) Loss: 2.108423 LR: 0.00002527 [11:43:39] Epoch: 1 Batch: 8617/20099 (42.87%) Loss: 2.218575 LR: 0.00002527 [11:43:41] Epoch: 1 Batch: 8618/20099 (42.88%) Loss: 2.102868 LR: 0.00002527 [11:43:43] Epoch: 1 Batch: 8619/20099 (42.88%) Loss: 2.217443 LR: 0.00002527 [11:43:45] Epoch: 1 Batch: 8620/20099 (42.89%) Loss: 1.877322 LR: 0.00002525 [11:43:46] Epoch: 1 Batch: 8621/20099 (42.89%) Loss: 1.822494 LR: 0.00002525 [11:43:48] Epoch: 1 Batch: 8622/20099 (42.90%) Loss: 1.955180 LR: 0.00002525 [11:43:50] Epoch: 1 Batch: 8623/20099 (42.90%) Loss: 2.144392 LR: 0.00002525 [11:43:52] Epoch: 1 Batch: 8624/20099 (42.91%) Loss: 2.146796 LR: 0.00002525 [11:43:53] Epoch: 1 Batch: 8625/20099 (42.91%) Loss: 2.081444 LR: 0.00002525 [11:43:55] Epoch: 1 Batch: 8626/20099 (42.92%) Loss: 2.069887 LR: 0.00002525 [11:43:57] Epoch: 1 Batch: 8627/20099 (42.92%) Loss: 2.028244 LR: 0.00002524 [11:43:59] Epoch: 1 Batch: 
8628/20099 (42.93%) Loss: 2.045522 LR: 0.00002524 [11:44:01] Epoch: 1 Batch: 8629/20099 (42.93%) Loss: 1.949482 LR: 0.00002524 [11:44:02] Epoch: 1 Batch: 8630/20099 (42.94%) Loss: 2.041935 LR: 0.00002524 [11:44:04] Epoch: 1 Batch: 8631/20099 (42.94%) Loss: 2.258480 LR: 0.00002524 [11:44:06] Epoch: 1 Batch: 8632/20099 (42.95%) Loss: 1.848267 LR: 0.00002524 [11:44:08] Epoch: 1 Batch: 8633/20099 (42.95%) Loss: 2.201160 LR: 0.00002524 [11:44:09] Epoch: 1 Batch: 8634/20099 (42.96%) Loss: 1.978869 LR: 0.00002523 [11:44:11] Epoch: 1 Batch: 8635/20099 (42.96%) Loss: 2.139172 LR: 0.00002523 [11:44:13] Epoch: 1 Batch: 8636/20099 (42.97%) Loss: 2.281006 LR: 0.00002523 [11:44:15] Epoch: 1 Batch: 8637/20099 (42.97%) Loss: 2.140579 LR: 0.00002523 [11:44:16] Epoch: 1 Batch: 8638/20099 (42.98%) Loss: 1.941023 LR: 0.00002523 [11:44:18] Epoch: 1 Batch: 8639/20099 (42.98%) Loss: 1.930530 LR: 0.00002523 [11:44:20] Epoch: 1 Batch: 8640/20099 (42.99%) Loss: 1.938033 LR: 0.00002523 [11:44:22] Epoch: 1 Batch: 8641/20099 (42.99%) Loss: 2.143999 LR: 0.00002521 [11:44:23] Epoch: 1 Batch: 8642/20099 (43.00%) Loss: 2.091579 LR: 0.00002521 [11:44:25] Epoch: 1 Batch: 8643/20099 (43.00%) Loss: 2.251325 LR: 0.00002521 [11:44:27] Epoch: 1 Batch: 8644/20099 (43.01%) Loss: 2.210966 LR: 0.00002521 [11:44:29] Epoch: 1 Batch: 8645/20099 (43.01%) Loss: 2.099001 LR: 0.00002521 [11:44:30] Epoch: 1 Batch: 8646/20099 (43.02%) Loss: 2.158655 LR: 0.00002521 [11:44:32] Epoch: 1 Batch: 8647/20099 (43.02%) Loss: 2.184009 LR: 0.00002521 [11:44:34] Epoch: 1 Batch: 8648/20099 (43.03%) Loss: 2.182019 LR: 0.00002520 [11:44:36] Epoch: 1 Batch: 8649/20099 (43.03%) Loss: 1.969491 LR: 0.00002520 [11:44:38] Epoch: 1 Batch: 8650/20099 (43.04%) Loss: 2.021269 LR: 0.00002520 [11:44:39] Epoch: 1 Batch: 8651/20099 (43.04%) Loss: 1.944142 LR: 0.00002520 [11:44:41] Epoch: 1 Batch: 8652/20099 (43.05%) Loss: 1.787288 LR: 0.00002520 [11:44:43] Epoch: 1 Batch: 8653/20099 (43.05%) Loss: 2.004835 LR: 0.00002520 [11:44:45] Epoch: 1 
Batch: 8654/20099 (43.06%) Loss: 2.123668 LR: 0.00002520 [11:44:47] Epoch: 1 Batch: 8655/20099 (43.06%) Loss: 2.364437 LR: 0.00002519 [11:44:48] Epoch: 1 Batch: 8656/20099 (43.07%) Loss: 1.824974 LR: 0.00002519 [11:44:50] Epoch: 1 Batch: 8657/20099 (43.07%) Loss: 2.137780 LR: 0.00002519 [11:44:52] Epoch: 1 Batch: 8658/20099 (43.08%) Loss: 2.115452 LR: 0.00002519 [11:44:54] Epoch: 1 Batch: 8659/20099 (43.08%) Loss: 2.278077 LR: 0.00002519 [11:44:55] Epoch: 1 Batch: 8660/20099 (43.09%) Loss: 2.202779 LR: 0.00002519 [11:44:57] Epoch: 1 Batch: 8661/20099 (43.09%) Loss: 2.177789 LR: 0.00002519 [11:44:59] Epoch: 1 Batch: 8662/20099 (43.10%) Loss: 2.160023 LR: 0.00002518 [11:45:01] Epoch: 1 Batch: 8663/20099 (43.10%) Loss: 2.188057 LR: 0.00002518 [11:45:03] Epoch: 1 Batch: 8664/20099 (43.11%) Loss: 1.868831 LR: 0.00002518 [11:45:04] Epoch: 1 Batch: 8665/20099 (43.11%) Loss: 2.217101 LR: 0.00002518 [11:45:06] Epoch: 1 Batch: 8666/20099 (43.12%) Loss: 2.151749 LR: 0.00002518 [11:45:08] Epoch: 1 Batch: 8667/20099 (43.12%) Loss: 2.255677 LR: 0.00002518 [11:45:10] Epoch: 1 Batch: 8668/20099 (43.13%) Loss: 1.944455 LR: 0.00002518 [11:45:11] Epoch: 1 Batch: 8669/20099 (43.13%) Loss: 2.068547 LR: 0.00002516 [11:45:13] Epoch: 1 Batch: 8670/20099 (43.14%) Loss: 2.213587 LR: 0.00002516 [11:45:15] Epoch: 1 Batch: 8671/20099 (43.14%) Loss: 2.036845 LR: 0.00002516 [11:45:17] Epoch: 1 Batch: 8672/20099 (43.15%) Loss: 2.061576 LR: 0.00002516 [11:45:19] Epoch: 1 Batch: 8673/20099 (43.15%) Loss: 2.257074 LR: 0.00002516 [11:45:20] Epoch: 1 Batch: 8674/20099 (43.16%) Loss: 2.286549 LR: 0.00002516 [11:45:22] Epoch: 1 Batch: 8675/20099 (43.16%) Loss: 2.290716 LR: 0.00002516 [11:45:24] Epoch: 1 Batch: 8676/20099 (43.17%) Loss: 2.055077 LR: 0.00002515 [11:45:26] Epoch: 1 Batch: 8677/20099 (43.17%) Loss: 2.116516 LR: 0.00002515 [11:45:27] Epoch: 1 Batch: 8678/20099 (43.18%) Loss: 1.939814 LR: 0.00002515 [11:45:29] Epoch: 1 Batch: 8679/20099 (43.18%) Loss: 2.190419 LR: 0.00002515 [11:45:31] Epoch: 
1 Batch: 8680/20099 (43.19%) Loss: 1.872654 LR: 0.00002515 [11:45:33] Epoch: 1 Batch: 8681/20099 (43.19%) Loss: 1.911647 LR: 0.00002515 [11:45:34] Epoch: 1 Batch: 8682/20099 (43.20%) Loss: 1.829846 LR: 0.00002515 [11:45:36] Epoch: 1 Batch: 8683/20099 (43.20%) Loss: 1.950848 LR: 0.00002514 [11:45:38] Epoch: 1 Batch: 8684/20099 (43.21%) Loss: 2.157645 LR: 0.00002514 [11:45:40] Epoch: 1 Batch: 8685/20099 (43.21%) Loss: 2.157362 LR: 0.00002514 [11:45:41] Epoch: 1 Batch: 8686/20099 (43.22%) Loss: 2.128485 LR: 0.00002514 [11:45:43] Epoch: 1 Batch: 8687/20099 (43.22%) Loss: 2.075567 LR: 0.00002514 [11:45:45] Epoch: 1 Batch: 8688/20099 (43.23%) Loss: 2.206054 LR: 0.00002514 [11:45:47] Epoch: 1 Batch: 8689/20099 (43.23%) Loss: 2.205368 LR: 0.00002514 [11:45:49] Epoch: 1 Batch: 8690/20099 (43.24%) Loss: 1.940943 LR: 0.00002512 [11:45:50] Epoch: 1 Batch: 8691/20099 (43.24%) Loss: 2.050334 LR: 0.00002512 [11:45:52] Epoch: 1 Batch: 8692/20099 (43.25%) Loss: 2.087475 LR: 0.00002512 [11:45:54] Epoch: 1 Batch: 8693/20099 (43.25%) Loss: 1.955823 LR: 0.00002512 [11:45:56] Epoch: 1 Batch: 8694/20099 (43.26%) Loss: 2.039918 LR: 0.00002512 [11:45:57] Epoch: 1 Batch: 8695/20099 (43.26%) Loss: 2.041171 LR: 0.00002512 [11:45:59] Epoch: 1 Batch: 8696/20099 (43.27%) Loss: 2.069250 LR: 0.00002512 [11:46:01] Epoch: 1 Batch: 8697/20099 (43.27%) Loss: 1.921136 LR: 0.00002511 [11:46:03] Epoch: 1 Batch: 8698/20099 (43.28%) Loss: 2.109964 LR: 0.00002511 [11:46:05] Epoch: 1 Batch: 8699/20099 (43.28%) Loss: 2.087691 LR: 0.00002511 [11:46:06] Epoch: 1 Batch: 8700/20099 (43.29%) Loss: 2.057903 LR: 0.00002511 [11:46:08] Epoch: 1 Batch: 8701/20099 (43.29%) Loss: 1.950335 LR: 0.00002511 [11:46:10] Epoch: 1 Batch: 8702/20099 (43.30%) Loss: 1.958429 LR: 0.00002511 [11:46:12] Epoch: 1 Batch: 8703/20099 (43.30%) Loss: 1.818969 LR: 0.00002511 [11:46:14] Epoch: 1 Batch: 8704/20099 (43.31%) Loss: 1.957181 LR: 0.00002510 [11:46:16] Epoch: 1 Batch: 8705/20099 (43.31%) Loss: 1.840253 LR: 0.00002510 [11:46:17] 
Epoch: 1 Batch: 8706/20099 (43.32%) Loss: 2.141576 LR: 0.00002510 [11:46:19] Epoch: 1 Batch: 8707/20099 (43.32%) Loss: 2.141484 LR: 0.00002510 [11:46:21] Epoch: 1 Batch: 8708/20099 (43.33%) Loss: 1.966695 LR: 0.00002510 [11:46:23] Epoch: 1 Batch: 8709/20099 (43.33%) Loss: 2.340643 LR: 0.00002510 [11:46:24] Epoch: 1 Batch: 8710/20099 (43.34%) Loss: 2.470619 LR: 0.00002510 [11:46:26] Epoch: 1 Batch: 8711/20099 (43.34%) Loss: 2.368766 LR: 0.00002508 [11:46:28] Epoch: 1 Batch: 8712/20099 (43.35%) Loss: 1.845962 LR: 0.00002508 [11:46:30] Epoch: 1 Batch: 8713/20099 (43.35%) Loss: 2.269329 LR: 0.00002508 [11:46:32] Epoch: 1 Batch: 8714/20099 (43.36%) Loss: 2.501289 LR: 0.00002508 [11:46:33] Epoch: 1 Batch: 8715/20099 (43.36%) Loss: 2.355699 LR: 0.00002508 [11:46:35] Epoch: 1 Batch: 8716/20099 (43.37%) Loss: 1.818942 LR: 0.00002508 [11:46:37] Epoch: 1 Batch: 8717/20099 (43.37%) Loss: 2.133856 LR: 0.00002508 [11:46:39] Epoch: 1 Batch: 8718/20099 (43.38%) Loss: 2.445891 LR: 0.00002507 [11:46:41] Epoch: 1 Batch: 8719/20099 (43.38%) Loss: 2.286847 LR: 0.00002507 [11:46:42] Epoch: 1 Batch: 8720/20099 (43.39%) Loss: 1.817318 LR: 0.00002507 [11:46:44] Epoch: 1 Batch: 8721/20099 (43.39%) Loss: 1.773886 LR: 0.00002507 [11:46:46] Epoch: 1 Batch: 8722/20099 (43.40%) Loss: 2.110995 LR: 0.00002507 [11:46:48] Epoch: 1 Batch: 8723/20099 (43.40%) Loss: 2.274652 LR: 0.00002507 [11:46:49] Epoch: 1 Batch: 8724/20099 (43.41%) Loss: 1.998864 LR: 0.00002507 [11:46:51] Epoch: 1 Batch: 8725/20099 (43.41%) Loss: 2.252484 LR: 0.00002506 [11:46:53] Epoch: 1 Batch: 8726/20099 (43.42%) Loss: 1.807805 LR: 0.00002506 [11:46:55] Epoch: 1 Batch: 8727/20099 (43.42%) Loss: 2.328614 LR: 0.00002506 [11:46:57] Epoch: 1 Batch: 8728/20099 (43.43%) Loss: 1.953470 LR: 0.00002506 [11:46:58] Epoch: 1 Batch: 8729/20099 (43.43%) Loss: 2.032003 LR: 0.00002506 [11:47:00] Epoch: 1 Batch: 8730/20099 (43.43%) Loss: 2.107406 LR: 0.00002506 [11:47:02] Epoch: 1 Batch: 8731/20099 (43.44%) Loss: 2.098711 LR: 0.00002506 
[11:47:04] Epoch: 1 Batch: 8732/20099 (43.44%) Loss: 2.235532 LR: 0.00002504 [11:47:06] Epoch: 1 Batch: 8733/20099 (43.45%) Loss: 1.808293 LR: 0.00002504 [11:47:07] Epoch: 1 Batch: 8734/20099 (43.45%) Loss: 1.774312 LR: 0.00002504 [11:47:09] Epoch: 1 Batch: 8735/20099 (43.46%) Loss: 2.037095 LR: 0.00002504 [11:47:11] Epoch: 1 Batch: 8736/20099 (43.46%) Loss: 2.064451 LR: 0.00002504 [11:47:13] Epoch: 1 Batch: 8737/20099 (43.47%) Loss: 2.264861 LR: 0.00002504 [11:47:14] Epoch: 1 Batch: 8738/20099 (43.47%) Loss: 2.002427 LR: 0.00002504 [11:47:16] Epoch: 1 Batch: 8739/20099 (43.48%) Loss: 1.738102 LR: 0.00002503 [11:47:18] Epoch: 1 Batch: 8740/20099 (43.48%) Loss: 2.282757 LR: 0.00002503 [11:47:20] Epoch: 1 Batch: 8741/20099 (43.49%) Loss: 1.955977 LR: 0.00002503 [11:47:22] Epoch: 1 Batch: 8742/20099 (43.49%) Loss: 2.112873 LR: 0.00002503 [11:47:23] Epoch: 1 Batch: 8743/20099 (43.50%) Loss: 2.100660 LR: 0.00002503 [11:47:25] Epoch: 1 Batch: 8744/20099 (43.50%) Loss: 1.953080 LR: 0.00002503 [11:47:27] Epoch: 1 Batch: 8745/20099 (43.51%) Loss: 2.181740 LR: 0.00002503 [11:47:29] Epoch: 1 Batch: 8746/20099 (43.51%) Loss: 2.088629 LR: 0.00002502 [11:47:31] Epoch: 1 Batch: 8747/20099 (43.52%) Loss: 2.210272 LR: 0.00002502 [11:47:32] Epoch: 1 Batch: 8748/20099 (43.52%) Loss: 2.431772 LR: 0.00002502 [11:47:34] Epoch: 1 Batch: 8749/20099 (43.53%) Loss: 2.076193 LR: 0.00002502 [11:47:36] Epoch: 1 Batch: 8750/20099 (43.53%) Loss: 1.753880 LR: 0.00002502 [11:47:38] Epoch: 1 Batch: 8751/20099 (43.54%) Loss: 2.023881 LR: 0.00002502 [11:47:39] Epoch: 1 Batch: 8752/20099 (43.54%) Loss: 1.874739 LR: 0.00002502 [11:47:41] Epoch: 1 Batch: 8753/20099 (43.55%) Loss: 2.292072 LR: 0.00002500 [11:47:43] Epoch: 1 Batch: 8754/20099 (43.55%) Loss: 2.109941 LR: 0.00002500 [11:47:45] Epoch: 1 Batch: 8755/20099 (43.56%) Loss: 2.108124 LR: 0.00002500 [11:47:46] Epoch: 1 Batch: 8756/20099 (43.56%) Loss: 2.006287 LR: 0.00002500 [11:47:48] Epoch: 1 Batch: 8757/20099 (43.57%) Loss: 2.134070 LR: 
0.00002500 [11:47:50] Epoch: 1 Batch: 8758/20099 (43.57%) Loss: 2.188874 LR: 0.00002500 [11:47:52] Epoch: 1 Batch: 8759/20099 (43.58%) Loss: 2.424749 LR: 0.00002500 [11:47:54] Epoch: 1 Batch: 8760/20099 (43.58%) Loss: 2.178039 LR: 0.00002499 [11:47:55] Epoch: 1 Batch: 8761/20099 (43.59%) Loss: 2.448673 LR: 0.00002499 [11:47:57] Epoch: 1 Batch: 8762/20099 (43.59%) Loss: 2.244437 LR: 0.00002499 [11:47:59] Epoch: 1 Batch: 8763/20099 (43.60%) Loss: 2.066433 LR: 0.00002499 [11:48:01] Epoch: 1 Batch: 8764/20099 (43.60%) Loss: 2.194649 LR: 0.00002499 [11:48:02] Epoch: 1 Batch: 8765/20099 (43.61%) Loss: 2.255410 LR: 0.00002499 [11:48:04] Epoch: 1 Batch: 8766/20099 (43.61%) Loss: 2.361702 LR: 0.00002499 [11:48:06] Epoch: 1 Batch: 8767/20099 (43.62%) Loss: 2.179531 LR: 0.00002498 [11:48:08] Epoch: 1 Batch: 8768/20099 (43.62%) Loss: 2.216789 LR: 0.00002498 [11:48:10] Epoch: 1 Batch: 8769/20099 (43.63%) Loss: 2.302819 LR: 0.00002498 [11:48:11] Epoch: 1 Batch: 8770/20099 (43.63%) Loss: 2.389370 LR: 0.00002498 [11:48:13] Epoch: 1 Batch: 8771/20099 (43.64%) Loss: 2.379965 LR: 0.00002498 [11:48:15] Epoch: 1 Batch: 8772/20099 (43.64%) Loss: 1.903313 LR: 0.00002498 [11:48:17] Epoch: 1 Batch: 8773/20099 (43.65%) Loss: 1.905012 LR: 0.00002498 [11:48:19] Epoch: 1 Batch: 8774/20099 (43.65%) Loss: 1.881031 LR: 0.00002497 [11:48:20] Epoch: 1 Batch: 8775/20099 (43.66%) Loss: 2.346161 LR: 0.00002497 [11:48:22] Epoch: 1 Batch: 8776/20099 (43.66%) Loss: 1.985290 LR: 0.00002497 [11:48:24] Epoch: 1 Batch: 8777/20099 (43.67%) Loss: 2.154877 LR: 0.00002497 [11:48:26] Epoch: 1 Batch: 8778/20099 (43.67%) Loss: 2.086773 LR: 0.00002497 [11:48:27] Epoch: 1 Batch: 8779/20099 (43.68%) Loss: 1.896952 LR: 0.00002497 [11:48:29] Epoch: 1 Batch: 8780/20099 (43.68%) Loss: 1.734218 LR: 0.00002497 [11:48:31] Epoch: 1 Batch: 8781/20099 (43.69%) Loss: 1.619685 LR: 0.00002495 [11:48:33] Epoch: 1 Batch: 8782/20099 (43.69%) Loss: 1.964832 LR: 0.00002495 [11:48:35] Epoch: 1 Batch: 8783/20099 (43.70%) Loss: 2.174592 
LR: 0.00002495 [11:48:36] Epoch: 1 Batch: 8784/20099 (43.70%) Loss: 2.134486 LR: 0.00002495 [11:48:38] Epoch: 1 Batch: 8785/20099 (43.71%) Loss: 2.167810 LR: 0.00002495 [11:48:40] Epoch: 1 Batch: 8786/20099 (43.71%) Loss: 2.100261 LR: 0.00002495 [11:48:42] Epoch: 1 Batch: 8787/20099 (43.72%) Loss: 2.004875 LR: 0.00002495 [11:48:43] Epoch: 1 Batch: 8788/20099 (43.72%) Loss: 2.284613 LR: 0.00002494 [11:48:45] Epoch: 1 Batch: 8789/20099 (43.73%) Loss: 2.216136 LR: 0.00002494 [11:48:47] Epoch: 1 Batch: 8790/20099 (43.73%) Loss: 2.170727 LR: 0.00002494 [11:48:49] Epoch: 1 Batch: 8791/20099 (43.74%) Loss: 2.414905 LR: 0.00002494 [11:48:51] Epoch: 1 Batch: 8792/20099 (43.74%) Loss: 2.294760 LR: 0.00002494 [11:48:52] Epoch: 1 Batch: 8793/20099 (43.75%) Loss: 2.299910 LR: 0.00002494 [11:48:54] Epoch: 1 Batch: 8794/20099 (43.75%) Loss: 1.928282 LR: 0.00002494 [11:48:56] Epoch: 1 Batch: 8795/20099 (43.76%) Loss: 2.045822 LR: 0.00002493 [11:48:58] Epoch: 1 Batch: 8796/20099 (43.76%) Loss: 2.228258 LR: 0.00002493 [11:48:59] Epoch: 1 Batch: 8797/20099 (43.77%) Loss: 2.168900 LR: 0.00002493 [11:49:01] Epoch: 1 Batch: 8798/20099 (43.77%) Loss: 2.177686 LR: 0.00002493 [11:49:03] Epoch: 1 Batch: 8799/20099 (43.78%) Loss: 1.878343 LR: 0.00002493 [11:49:08] >> Cleaned up old temp checkpoint: epoch1_step6800 [11:49:08] >> Temp checkpoint saved: epoch1_step8800, size: 0.1693 GB [11:49:08] Epoch: 1 Batch: 8800/20099 (43.78%) Loss: 1.866010 LR: 0.00002493 [11:49:10] Epoch: 1 Batch: 8801/20099 (43.79%) Loss: 1.960122 LR: 0.00002493 [11:49:12] Epoch: 1 Batch: 8802/20099 (43.79%) Loss: 2.231789 LR: 0.00002491 [11:49:14] Epoch: 1 Batch: 8803/20099 (43.80%) Loss: 1.864262 LR: 0.00002491 [11:49:15] Epoch: 1 Batch: 8804/20099 (43.80%) Loss: 2.279302 LR: 0.00002491 [11:49:17] Epoch: 1 Batch: 8805/20099 (43.81%) Loss: 2.153176 LR: 0.00002491 [11:49:19] Epoch: 1 Batch: 8806/20099 (43.81%) Loss: 2.209081 LR: 0.00002491 [11:49:21] Epoch: 1 Batch: 8807/20099 (43.82%) Loss: 2.357876 LR: 0.00002491 
[11:49:22] Epoch: 1 Batch: 8808/20099 (43.82%) Loss: 1.957975 LR: 0.00002491 [11:49:24] Epoch: 1 Batch: 8809/20099 (43.83%) Loss: 2.060785 LR: 0.00002490 [11:49:26] Epoch: 1 Batch: 8810/20099 (43.83%) Loss: 1.981337 LR: 0.00002490 [11:49:28] Epoch: 1 Batch: 8811/20099 (43.84%) Loss: 2.171979 LR: 0.00002490 [11:49:30] Epoch: 1 Batch: 8812/20099 (43.84%) Loss: 1.936823 LR: 0.00002490 [11:49:31] Epoch: 1 Batch: 8813/20099 (43.85%) Loss: 2.091846 LR: 0.00002490 [11:49:33] Epoch: 1 Batch: 8814/20099 (43.85%) Loss: 2.210981 LR: 0.00002490 [11:49:35] Epoch: 1 Batch: 8815/20099 (43.86%) Loss: 2.059821 LR: 0.00002490 [11:49:37] Epoch: 1 Batch: 8816/20099 (43.86%) Loss: 2.251494 LR: 0.00002489 [11:49:39] Epoch: 1 Batch: 8817/20099 (43.87%) Loss: 1.927938 LR: 0.00002489 [11:49:41] Epoch: 1 Batch: 8818/20099 (43.87%) Loss: 2.082364 LR: 0.00002489 [11:49:42] Epoch: 1 Batch: 8819/20099 (43.88%) Loss: 2.126484 LR: 0.00002489 [11:49:44] Epoch: 1 Batch: 8820/20099 (43.88%) Loss: 2.276657 LR: 0.00002489 [11:49:46] Epoch: 1 Batch: 8821/20099 (43.89%) Loss: 2.066762 LR: 0.00002489 [11:49:48] Epoch: 1 Batch: 8822/20099 (43.89%) Loss: 2.221160 LR: 0.00002489 [11:49:50] Epoch: 1 Batch: 8823/20099 (43.90%) Loss: 2.168954 LR: 0.00002487 [11:49:51] Epoch: 1 Batch: 8824/20099 (43.90%) Loss: 2.188328 LR: 0.00002487 [11:49:53] Epoch: 1 Batch: 8825/20099 (43.91%) Loss: 2.223566 LR: 0.00002487 [11:49:55] Epoch: 1 Batch: 8826/20099 (43.91%) Loss: 1.842831 LR: 0.00002487 [11:49:57] Epoch: 1 Batch: 8827/20099 (43.92%) Loss: 2.191789 LR: 0.00002487 [11:49:58] Epoch: 1 Batch: 8828/20099 (43.92%) Loss: 2.634541 LR: 0.00002487 [11:50:00] Epoch: 1 Batch: 8829/20099 (43.93%) Loss: 2.167196 LR: 0.00002487 [11:50:02] Epoch: 1 Batch: 8830/20099 (43.93%) Loss: 2.050665 LR: 0.00002486 [11:50:04] Epoch: 1 Batch: 8831/20099 (43.94%) Loss: 2.094174 LR: 0.00002486 [11:50:05] Epoch: 1 Batch: 8832/20099 (43.94%) Loss: 2.352609 LR: 0.00002486 [11:50:07] Epoch: 1 Batch: 8833/20099 (43.95%) Loss: 2.318770 LR: 
0.00002486 [11:50:09] Epoch: 1 Batch: 8834/20099 (43.95%) Loss: 2.289995 LR: 0.00002486 [11:50:11] Epoch: 1 Batch: 8835/20099 (43.96%) Loss: 2.046527 LR: 0.00002486 [11:50:13] Epoch: 1 Batch: 8836/20099 (43.96%) Loss: 2.343850 LR: 0.00002486 [11:50:14] Epoch: 1 Batch: 8837/20099 (43.97%) Loss: 1.936934 LR: 0.00002485 [11:50:16] Epoch: 1 Batch: 8838/20099 (43.97%) Loss: 2.000253 LR: 0.00002485 [11:50:18] Epoch: 1 Batch: 8839/20099 (43.98%) Loss: 2.412932 LR: 0.00002485 [11:50:20] Epoch: 1 Batch: 8840/20099 (43.98%) Loss: 2.070726 LR: 0.00002485 [11:50:21] Epoch: 1 Batch: 8841/20099 (43.99%) Loss: 1.701976 LR: 0.00002485 [11:50:23] Epoch: 1 Batch: 8842/20099 (43.99%) Loss: 2.069255 LR: 0.00002485 [11:50:25] Epoch: 1 Batch: 8843/20099 (44.00%) Loss: 2.259437 LR: 0.00002485 [11:50:27] Epoch: 1 Batch: 8844/20099 (44.00%) Loss: 2.449068 LR: 0.00002483 [11:50:28] Epoch: 1 Batch: 8845/20099 (44.01%) Loss: 2.214760 LR: 0.00002483 [11:50:30] Epoch: 1 Batch: 8846/20099 (44.01%) Loss: 2.054098 LR: 0.00002483 [11:50:32] Epoch: 1 Batch: 8847/20099 (44.02%) Loss: 2.061645 LR: 0.00002483 [11:50:34] Epoch: 1 Batch: 8848/20099 (44.02%) Loss: 2.222070 LR: 0.00002483 [11:50:36] Epoch: 1 Batch: 8849/20099 (44.03%) Loss: 1.882869 LR: 0.00002483 [11:50:37] Epoch: 1 Batch: 8850/20099 (44.03%) Loss: 2.003189 LR: 0.00002483 [11:50:39] Epoch: 1 Batch: 8851/20099 (44.04%) Loss: 2.097128 LR: 0.00002482 [11:50:41] Epoch: 1 Batch: 8852/20099 (44.04%) Loss: 2.241874 LR: 0.00002482 [11:50:43] Epoch: 1 Batch: 8853/20099 (44.05%) Loss: 2.036433 LR: 0.00002482 [11:50:45] Epoch: 1 Batch: 8854/20099 (44.05%) Loss: 1.945629 LR: 0.00002482 [11:50:46] Epoch: 1 Batch: 8855/20099 (44.06%) Loss: 2.118990 LR: 0.00002482 [11:50:48] Epoch: 1 Batch: 8856/20099 (44.06%) Loss: 2.318518 LR: 0.00002482 [11:50:50] Epoch: 1 Batch: 8857/20099 (44.07%) Loss: 2.109678 LR: 0.00002482 [11:50:52] Epoch: 1 Batch: 8858/20099 (44.07%) Loss: 1.893293 LR: 0.00002481 [11:50:53] Epoch: 1 Batch: 8859/20099 (44.08%) Loss: 2.191021 
LR: 0.00002481 [11:50:55] Epoch: 1 Batch: 8860/20099 (44.08%) Loss: 1.980836 LR: 0.00002481 [11:50:57] Epoch: 1 Batch: 8861/20099 (44.09%) Loss: 2.129941 LR: 0.00002481 [11:50:59] Epoch: 1 Batch: 8862/20099 (44.09%) Loss: 2.392464 LR: 0.00002481 [11:51:01] Epoch: 1 Batch: 8863/20099 (44.10%) Loss: 1.517700 LR: 0.00002481 [11:51:02] Epoch: 1 Batch: 8864/20099 (44.10%) Loss: 2.092232 LR: 0.00002481 [11:51:04] Epoch: 1 Batch: 8865/20099 (44.11%) Loss: 2.601226 LR: 0.00002479 [11:51:06] Epoch: 1 Batch: 8866/20099 (44.11%) Loss: 2.259182 LR: 0.00002479 [11:51:08] Epoch: 1 Batch: 8867/20099 (44.12%) Loss: 1.996094 LR: 0.00002479 [11:51:09] Epoch: 1 Batch: 8868/20099 (44.12%) Loss: 2.057159 LR: 0.00002479 [11:51:11] Epoch: 1 Batch: 8869/20099 (44.13%) Loss: 2.101749 LR: 0.00002479 [11:51:13] Epoch: 1 Batch: 8870/20099 (44.13%) Loss: 2.284791 LR: 0.00002479 [11:51:15] Epoch: 1 Batch: 8871/20099 (44.14%) Loss: 2.145237 LR: 0.00002479 [11:51:17] Epoch: 1 Batch: 8872/20099 (44.14%) Loss: 2.443478 LR: 0.00002478 [11:51:18] Epoch: 1 Batch: 8873/20099 (44.15%) Loss: 2.003298 LR: 0.00002478 [11:51:20] Epoch: 1 Batch: 8874/20099 (44.15%) Loss: 2.197456 LR: 0.00002478 [11:51:22] Epoch: 1 Batch: 8875/20099 (44.16%) Loss: 2.143882 LR: 0.00002478 [11:51:24] Epoch: 1 Batch: 8876/20099 (44.16%) Loss: 2.336657 LR: 0.00002478 [11:51:25] Epoch: 1 Batch: 8877/20099 (44.17%) Loss: 2.509401 LR: 0.00002478 [11:51:27] Epoch: 1 Batch: 8878/20099 (44.17%) Loss: 1.780570 LR: 0.00002478 [11:51:29] Epoch: 1 Batch: 8879/20099 (44.18%) Loss: 2.063086 LR: 0.00002477 [11:51:31] Epoch: 1 Batch: 8880/20099 (44.18%) Loss: 1.829238 LR: 0.00002477 [11:51:33] Epoch: 1 Batch: 8881/20099 (44.19%) Loss: 2.067754 LR: 0.00002477 [11:51:34] Epoch: 1 Batch: 8882/20099 (44.19%) Loss: 2.306305 LR: 0.00002477 [11:51:36] Epoch: 1 Batch: 8883/20099 (44.20%) Loss: 2.325176 LR: 0.00002477 [11:51:38] Epoch: 1 Batch: 8884/20099 (44.20%) Loss: 1.910734 LR: 0.00002477 [11:51:40] Epoch: 1 Batch: 8885/20099 (44.21%) Loss: 
1.956098 LR: 0.00002477 [11:51:41] Epoch: 1 Batch: 8886/20099 (44.21%) Loss: 2.054062 LR: 0.00002475 [11:51:43] Epoch: 1 Batch: 8887/20099 (44.22%) Loss: 2.056872 LR: 0.00002475 [11:51:45] Epoch: 1 Batch: 8888/20099 (44.22%) Loss: 2.105043 LR: 0.00002475 [11:51:47] Epoch: 1 Batch: 8889/20099 (44.23%) Loss: 1.798053 LR: 0.00002475 [11:51:48] Epoch: 1 Batch: 8890/20099 (44.23%) Loss: 2.134282 LR: 0.00002475 [11:51:50] Epoch: 1 Batch: 8891/20099 (44.24%) Loss: 2.263365 LR: 0.00002475 [11:51:52] Epoch: 1 Batch: 8892/20099 (44.24%) Loss: 2.095472 LR: 0.00002475 [11:51:54] Epoch: 1 Batch: 8893/20099 (44.25%) Loss: 2.099180 LR: 0.00002474 [11:51:56] Epoch: 1 Batch: 8894/20099 (44.25%) Loss: 2.139844 LR: 0.00002474 [11:51:57] Epoch: 1 Batch: 8895/20099 (44.26%) Loss: 2.176537 LR: 0.00002474 [11:51:59] Epoch: 1 Batch: 8896/20099 (44.26%) Loss: 1.992026 LR: 0.00002474 [11:52:01] Epoch: 1 Batch: 8897/20099 (44.27%) Loss: 2.300092 LR: 0.00002474 [11:52:03] Epoch: 1 Batch: 8898/20099 (44.27%) Loss: 1.934599 LR: 0.00002474 [11:52:05] Epoch: 1 Batch: 8899/20099 (44.28%) Loss: 2.067076 LR: 0.00002474 [11:52:06] Epoch: 1 Batch: 8900/20099 (44.28%) Loss: 1.996957 LR: 0.00002472 [11:52:08] Epoch: 1 Batch: 8901/20099 (44.29%) Loss: 2.001204 LR: 0.00002472 [11:52:10] Epoch: 1 Batch: 8902/20099 (44.29%) Loss: 2.185377 LR: 0.00002472 [11:52:12] Epoch: 1 Batch: 8903/20099 (44.30%) Loss: 2.248745 LR: 0.00002472 [11:52:13] Epoch: 1 Batch: 8904/20099 (44.30%) Loss: 1.955698 LR: 0.00002472 [11:52:15] Epoch: 1 Batch: 8905/20099 (44.31%) Loss: 2.046822 LR: 0.00002472 [11:52:17] Epoch: 1 Batch: 8906/20099 (44.31%) Loss: 2.110507 LR: 0.00002472 [11:52:19] Epoch: 1 Batch: 8907/20099 (44.32%) Loss: 1.889066 LR: 0.00002471 [11:52:21] Epoch: 1 Batch: 8908/20099 (44.32%) Loss: 2.348019 LR: 0.00002471 [11:52:22] Epoch: 1 Batch: 8909/20099 (44.33%) Loss: 1.888197 LR: 0.00002471 [11:52:24] Epoch: 1 Batch: 8910/20099 (44.33%) Loss: 1.944966 LR: 0.00002471 [11:52:26] Epoch: 1 Batch: 8911/20099 (44.34%) 
Loss: 2.411380 LR: 0.00002471 [11:52:28] Epoch: 1 Batch: 8912/20099 (44.34%) Loss: 2.101868 LR: 0.00002471 [11:52:30] Epoch: 1 Batch: 8913/20099 (44.35%) Loss: 1.949051 LR: 0.00002471 [11:52:31] Epoch: 1 Batch: 8914/20099 (44.35%) Loss: 2.021697 LR: 0.00002470 [11:52:33] Epoch: 1 Batch: 8915/20099 (44.36%) Loss: 1.925864 LR: 0.00002470 [11:52:35] Epoch: 1 Batch: 8916/20099 (44.36%) Loss: 2.163466 LR: 0.00002470 [11:52:37] Epoch: 1 Batch: 8917/20099 (44.37%) Loss: 2.105380 LR: 0.00002470 [11:52:38] Epoch: 1 Batch: 8918/20099 (44.37%) Loss: 2.020347 LR: 0.00002470 [11:52:40] Epoch: 1 Batch: 8919/20099 (44.38%) Loss: 1.891574 LR: 0.00002470 [11:52:42] Epoch: 1 Batch: 8920/20099 (44.38%) Loss: 2.121549 LR: 0.00002470 [11:52:44] Epoch: 1 Batch: 8921/20099 (44.39%) Loss: 2.127363 LR: 0.00002468 [11:52:46] Epoch: 1 Batch: 8922/20099 (44.39%) Loss: 2.315091 LR: 0.00002468 [11:52:47] Epoch: 1 Batch: 8923/20099 (44.40%) Loss: 2.200363 LR: 0.00002468 [11:52:49] Epoch: 1 Batch: 8924/20099 (44.40%) Loss: 2.146644 LR: 0.00002468 [11:52:51] Epoch: 1 Batch: 8925/20099 (44.41%) Loss: 2.151454 LR: 0.00002468 [11:52:53] Epoch: 1 Batch: 8926/20099 (44.41%) Loss: 1.891682 LR: 0.00002468 [11:52:54] Epoch: 1 Batch: 8927/20099 (44.42%) Loss: 2.181899 LR: 0.00002468 [11:52:56] Epoch: 1 Batch: 8928/20099 (44.42%) Loss: 2.444313 LR: 0.00002467 [11:52:58] Epoch: 1 Batch: 8929/20099 (44.43%) Loss: 2.092938 LR: 0.00002467 [11:53:00] Epoch: 1 Batch: 8930/20099 (44.43%) Loss: 2.003423 LR: 0.00002467 [11:53:02] Epoch: 1 Batch: 8931/20099 (44.44%) Loss: 2.027616 LR: 0.00002467 [11:53:03] Epoch: 1 Batch: 8932/20099 (44.44%) Loss: 1.957065 LR: 0.00002467 [11:53:05] Epoch: 1 Batch: 8933/20099 (44.44%) Loss: 1.879801 LR: 0.00002467 [11:53:07] Epoch: 1 Batch: 8934/20099 (44.45%) Loss: 2.005494 LR: 0.00002467 [11:53:09] Epoch: 1 Batch: 8935/20099 (44.45%) Loss: 2.281109 LR: 0.00002466 [11:53:10] Epoch: 1 Batch: 8936/20099 (44.46%) Loss: 2.214734 LR: 0.00002466 [11:53:12] Epoch: 1 Batch: 8937/20099 
(44.46%) Loss: 1.818481 LR: 0.00002466 [11:53:14] Epoch: 1 Batch: 8938/20099 (44.47%) Loss: 2.112454 LR: 0.00002466 [11:53:16] Epoch: 1 Batch: 8939/20099 (44.47%) Loss: 2.012050 LR: 0.00002466 [11:53:18] Epoch: 1 Batch: 8940/20099 (44.48%) Loss: 2.226337 LR: 0.00002466 [11:53:19] Epoch: 1 Batch: 8941/20099 (44.48%) Loss: 2.296197 LR: 0.00002466 [11:53:21] Epoch: 1 Batch: 8942/20099 (44.49%) Loss: 2.230952 LR: 0.00002464 [11:53:23] Epoch: 1 Batch: 8943/20099 (44.49%) Loss: 2.047608 LR: 0.00002464 [11:53:25] Epoch: 1 Batch: 8944/20099 (44.50%) Loss: 2.265190 LR: 0.00002464 [11:53:26] Epoch: 1 Batch: 8945/20099 (44.50%) Loss: 2.256366 LR: 0.00002464 [11:53:28] Epoch: 1 Batch: 8946/20099 (44.51%) Loss: 2.191411 LR: 0.00002464 [11:53:30] Epoch: 1 Batch: 8947/20099 (44.51%) Loss: 2.253154 LR: 0.00002464 [11:53:32] Epoch: 1 Batch: 8948/20099 (44.52%) Loss: 2.075844 LR: 0.00002464 [11:53:34] Epoch: 1 Batch: 8949/20099 (44.52%) Loss: 1.998888 LR: 0.00002463 [11:53:35] Epoch: 1 Batch: 8950/20099 (44.53%) Loss: 2.170465 LR: 0.00002463 [11:53:37] Epoch: 1 Batch: 8951/20099 (44.53%) Loss: 1.914535 LR: 0.00002463 [11:53:39] Epoch: 1 Batch: 8952/20099 (44.54%) Loss: 2.146004 LR: 0.00002463 [11:53:41] Epoch: 1 Batch: 8953/20099 (44.54%) Loss: 2.403942 LR: 0.00002463 [11:53:42] Epoch: 1 Batch: 8954/20099 (44.55%) Loss: 1.942431 LR: 0.00002463 [11:53:44] Epoch: 1 Batch: 8955/20099 (44.55%) Loss: 2.117085 LR: 0.00002463 [11:53:46] Epoch: 1 Batch: 8956/20099 (44.56%) Loss: 2.200315 LR: 0.00002462 [11:53:48] Epoch: 1 Batch: 8957/20099 (44.56%) Loss: 1.910949 LR: 0.00002462 [11:53:50] Epoch: 1 Batch: 8958/20099 (44.57%) Loss: 2.144685 LR: 0.00002462 [11:53:51] Epoch: 1 Batch: 8959/20099 (44.57%) Loss: 1.921150 LR: 0.00002462 [11:53:53] Epoch: 1 Batch: 8960/20099 (44.58%) Loss: 2.101825 LR: 0.00002462 [11:53:55] Epoch: 1 Batch: 8961/20099 (44.58%) Loss: 2.204791 LR: 0.00002462 [11:53:57] Epoch: 1 Batch: 8962/20099 (44.59%) Loss: 1.871185 LR: 0.00002462 [11:53:58] Epoch: 1 Batch: 
8963/20099 (44.59%) Loss: 2.129901 LR: 0.00002460 [11:54:00] Epoch: 1 Batch: 8964/20099 (44.60%) Loss: 1.850264 LR: 0.00002460 [11:54:02] Epoch: 1 Batch: 8965/20099 (44.60%) Loss: 1.969214 LR: 0.00002460 [11:54:04] Epoch: 1 Batch: 8966/20099 (44.61%) Loss: 1.965101 LR: 0.00002460 [11:54:06] Epoch: 1 Batch: 8967/20099 (44.61%) Loss: 2.511655 LR: 0.00002460 [11:54:07] Epoch: 1 Batch: 8968/20099 (44.62%) Loss: 2.234533 LR: 0.00002460 [11:54:09] Epoch: 1 Batch: 8969/20099 (44.62%) Loss: 2.165569 LR: 0.00002460 [11:54:11] Epoch: 1 Batch: 8970/20099 (44.63%) Loss: 2.074373 LR: 0.00002459 [11:54:13] Epoch: 1 Batch: 8971/20099 (44.63%) Loss: 2.078547 LR: 0.00002459 [11:54:14] Epoch: 1 Batch: 8972/20099 (44.64%) Loss: 2.041807 LR: 0.00002459 [11:54:16] Epoch: 1 Batch: 8973/20099 (44.64%) Loss: 2.260077 LR: 0.00002459 [11:54:18] Epoch: 1 Batch: 8974/20099 (44.65%) Loss: 2.033007 LR: 0.00002459 [11:54:20] Epoch: 1 Batch: 8975/20099 (44.65%) Loss: 2.001691 LR: 0.00002459 [11:54:22] Epoch: 1 Batch: 8976/20099 (44.66%) Loss: 2.134772 LR: 0.00002459 [11:54:23] Epoch: 1 Batch: 8977/20099 (44.66%) Loss: 2.076771 LR: 0.00002458 [11:54:25] Epoch: 1 Batch: 8978/20099 (44.67%) Loss: 2.277897 LR: 0.00002458 [11:54:27] Epoch: 1 Batch: 8979/20099 (44.67%) Loss: 2.089426 LR: 0.00002458 [11:54:29] Epoch: 1 Batch: 8980/20099 (44.68%) Loss: 1.932505 LR: 0.00002458 [11:54:30] Epoch: 1 Batch: 8981/20099 (44.68%) Loss: 1.973094 LR: 0.00002458 [11:54:32] Epoch: 1 Batch: 8982/20099 (44.69%) Loss: 2.451916 LR: 0.00002458 [11:54:34] Epoch: 1 Batch: 8983/20099 (44.69%) Loss: 1.879475 LR: 0.00002458 [11:54:36] Epoch: 1 Batch: 8984/20099 (44.70%) Loss: 2.189509 LR: 0.00002456 [11:54:37] Epoch: 1 Batch: 8985/20099 (44.70%) Loss: 2.055868 LR: 0.00002456 [11:54:39] Epoch: 1 Batch: 8986/20099 (44.71%) Loss: 1.995506 LR: 0.00002456 [11:54:41] Epoch: 1 Batch: 8987/20099 (44.71%) Loss: 2.308263 LR: 0.00002456 [11:54:43] Epoch: 1 Batch: 8988/20099 (44.72%) Loss: 2.395047 LR: 0.00002456 [11:54:45] Epoch: 1 
Batch: 8989/20099 (44.72%) Loss: 2.275849 LR: 0.00002456 [11:54:46] Epoch: 1 Batch: 8990/20099 (44.73%) Loss: 2.255444 LR: 0.00002456 [11:54:48] Epoch: 1 Batch: 8991/20099 (44.73%) Loss: 1.982450 LR: 0.00002455 [11:54:50] Epoch: 1 Batch: 8992/20099 (44.74%) Loss: 1.975723 LR: 0.00002455 [11:54:52] Epoch: 1 Batch: 8993/20099 (44.74%) Loss: 1.913492 LR: 0.00002455 [11:54:53] Epoch: 1 Batch: 8994/20099 (44.75%) Loss: 2.001206 LR: 0.00002455 [11:54:55] Epoch: 1 Batch: 8995/20099 (44.75%) Loss: 1.979897 LR: 0.00002455 [11:54:57] Epoch: 1 Batch: 8996/20099 (44.76%) Loss: 1.974810 LR: 0.00002455 [11:54:59] Epoch: 1 Batch: 8997/20099 (44.76%) Loss: 1.948730 LR: 0.00002455 [11:55:01] Epoch: 1 Batch: 8998/20099 (44.77%) Loss: 2.000144 LR: 0.00002454 [11:55:02] Epoch: 1 Batch: 8999/20099 (44.77%) Loss: 1.873268 LR: 0.00002454 [11:55:04] >> Evaluating batch 0 [11:55:05] >> Evaluating batch 1 [11:55:06] >> Evaluating batch 2 [11:55:07] >> Evaluating batch 3 [11:55:08] >> Evaluating batch 4 [11:55:09] >> Evaluating batch 5 [11:55:10] >> Evaluating batch 6 [11:55:11] >> Evaluating batch 7 [11:55:12] >> Evaluating batch 8 [11:55:13] >> Evaluating batch 9 [11:55:14] >> Evaluating batch 10 [11:55:15] >> Evaluating batch 11 [11:55:16] >> Evaluating batch 12 [11:55:17] >> Evaluating batch 13 [11:55:18] >> Evaluating batch 14 [11:55:19] >> Evaluating batch 15 [11:55:20] >> Evaluating batch 16 [11:55:21] Epoch: 1 Step: 9000/20099 Evaluation: [11:55:21] Avg Loss Since Last Eval: 2.0990 Val Loss: 2.1710 Validation loss delta: -0.0048 Perplexity: 8.7673 LR: 0.00002454 [11:55:24] >> Cleaned up old temp checkpoint: epoch1_step7000 [11:55:24] >> Temp checkpoint saved: epoch1_step9000, size: 0.1693 GB [11:55:28] >> Checkpoint saved: epoch1_step9000, size: 0.1693 GB [11:55:28] Epoch: 1 Batch: 9000/20099 (44.78%) Loss: 2.165499 LR: 0.00002454 [11:55:29] Epoch: 1 Batch: 9001/20099 (44.78%) Loss: 2.278482 LR: 0.00002454 [11:55:31] Epoch: 1 Batch: 9002/20099 (44.79%) Loss: 2.161115 LR: 
0.00002454 [11:55:33] Epoch: 1 Batch: 9003/20099 (44.79%) Loss: 2.146694 LR: 0.00002454 [11:55:35] Epoch: 1 Batch: 9004/20099 (44.80%) Loss: 2.146910 LR: 0.00002454 [11:55:36] Epoch: 1 Batch: 9005/20099 (44.80%) Loss: 2.334079 LR: 0.00002452 [11:55:38] Epoch: 1 Batch: 9006/20099 (44.81%) Loss: 2.376534 LR: 0.00002452 [11:55:40] Epoch: 1 Batch: 9007/20099 (44.81%) Loss: 1.890614 LR: 0.00002452 [11:55:42] Epoch: 1 Batch: 9008/20099 (44.82%) Loss: 2.210827 LR: 0.00002452 [11:55:43] Epoch: 1 Batch: 9009/20099 (44.82%) Loss: 2.231585 LR: 0.00002452 [11:55:45] Epoch: 1 Batch: 9010/20099 (44.83%) Loss: 2.145108 LR: 0.00002452 [11:55:47] Epoch: 1 Batch: 9011/20099 (44.83%) Loss: 1.832391 LR: 0.00002452 [11:55:49] Epoch: 1 Batch: 9012/20099 (44.84%) Loss: 1.946634 LR: 0.00002451 [11:55:51] Epoch: 1 Batch: 9013/20099 (44.84%) Loss: 2.395166 LR: 0.00002451 [11:55:53] Epoch: 1 Batch: 9014/20099 (44.85%) Loss: 2.298743 LR: 0.00002451 [11:55:54] Epoch: 1 Batch: 9015/20099 (44.85%) Loss: 2.102147 LR: 0.00002451 [11:55:56] Epoch: 1 Batch: 9016/20099 (44.86%) Loss: 2.206362 LR: 0.00002451 [11:55:58] Epoch: 1 Batch: 9017/20099 (44.86%) Loss: 2.058571 LR: 0.00002451 [11:56:00] Epoch: 1 Batch: 9018/20099 (44.87%) Loss: 2.002407 LR: 0.00002451 [11:56:02] Epoch: 1 Batch: 9019/20099 (44.87%) Loss: 2.114236 LR: 0.00002449 [11:56:04] Epoch: 1 Batch: 9020/20099 (44.88%) Loss: 2.479812 LR: 0.00002449 [11:56:05] Epoch: 1 Batch: 9021/20099 (44.88%) Loss: 2.144647 LR: 0.00002449 [11:56:07] Epoch: 1 Batch: 9022/20099 (44.89%) Loss: 2.088941 LR: 0.00002449 [11:56:09] Epoch: 1 Batch: 9023/20099 (44.89%) Loss: 2.132734 LR: 0.00002449 [11:56:11] Epoch: 1 Batch: 9024/20099 (44.90%) Loss: 2.223024 LR: 0.00002449 [11:56:13] Epoch: 1 Batch: 9025/20099 (44.90%) Loss: 2.143334 LR: 0.00002449 [11:56:14] Epoch: 1 Batch: 9026/20099 (44.91%) Loss: 2.165899 LR: 0.00002448 [11:56:16] Epoch: 1 Batch: 9027/20099 (44.91%) Loss: 2.148203 LR: 0.00002448 [11:56:18] Epoch: 1 Batch: 9028/20099 (44.92%) Loss: 2.087659 
LR: 0.00002448 [11:56:20] Epoch: 1 Batch: 9029/20099 (44.92%) Loss: 1.814174 LR: 0.00002448 [11:56:22] Epoch: 1 Batch: 9030/20099 (44.93%) Loss: 2.120022 LR: 0.00002448 [11:56:23] Epoch: 1 Batch: 9031/20099 (44.93%) Loss: 2.401445 LR: 0.00002448 [11:56:25] Epoch: 1 Batch: 9032/20099 (44.94%) Loss: 2.321194 LR: 0.00002448 [11:56:27] Epoch: 1 Batch: 9033/20099 (44.94%) Loss: 1.973994 LR: 0.00002447 [11:56:29] Epoch: 1 Batch: 9034/20099 (44.95%) Loss: 2.105116 LR: 0.00002447 [11:56:30] Epoch: 1 Batch: 9035/20099 (44.95%) Loss: 2.152263 LR: 0.00002447 [11:56:32] Epoch: 1 Batch: 9036/20099 (44.96%) Loss: 2.249472 LR: 0.00002447 [11:56:34] Epoch: 1 Batch: 9037/20099 (44.96%) Loss: 2.199401 LR: 0.00002447 [11:56:36] Epoch: 1 Batch: 9038/20099 (44.97%) Loss: 2.043688 LR: 0.00002447 [11:56:37] Epoch: 1 Batch: 9039/20099 (44.97%) Loss: 1.975628 LR: 0.00002447 [11:56:39] Epoch: 1 Batch: 9040/20099 (44.98%) Loss: 2.091162 LR: 0.00002445 [11:56:41] Epoch: 1 Batch: 9041/20099 (44.98%) Loss: 1.974990 LR: 0.00002445 [11:56:43] Epoch: 1 Batch: 9042/20099 (44.99%) Loss: 2.250166 LR: 0.00002445 [11:56:44] Epoch: 1 Batch: 9043/20099 (44.99%) Loss: 2.365336 LR: 0.00002445 [11:56:46] Epoch: 1 Batch: 9044/20099 (45.00%) Loss: 1.874790 LR: 0.00002445 [11:56:48] Epoch: 1 Batch: 9045/20099 (45.00%) Loss: 2.029709 LR: 0.00002445 [11:56:50] Epoch: 1 Batch: 9046/20099 (45.01%) Loss: 2.314759 LR: 0.00002445 [11:56:51] Epoch: 1 Batch: 9047/20099 (45.01%) Loss: 1.972628 LR: 0.00002444 [11:56:53] Epoch: 1 Batch: 9048/20099 (45.02%) Loss: 2.057288 LR: 0.00002444 [11:56:55] Epoch: 1 Batch: 9049/20099 (45.02%) Loss: 2.040596 LR: 0.00002444 [11:56:57] Epoch: 1 Batch: 9050/20099 (45.03%) Loss: 2.315883 LR: 0.00002444 [11:56:58] Epoch: 1 Batch: 9051/20099 (45.03%) Loss: 2.318029 LR: 0.00002444 [11:57:00] Epoch: 1 Batch: 9052/20099 (45.04%) Loss: 2.473105 LR: 0.00002444 [11:57:02] Epoch: 1 Batch: 9053/20099 (45.04%) Loss: 2.361902 LR: 0.00002444 [11:57:04] Epoch: 1 Batch: 9054/20099 (45.05%) Loss: 
2.113984 LR: 0.00002443 [11:57:05] Epoch: 1 Batch: 9055/20099 (45.05%) Loss: 1.990285 LR: 0.00002443 [11:57:07] Epoch: 1 Batch: 9056/20099 (45.06%) Loss: 1.943011 LR: 0.00002443 [11:57:09] Epoch: 1 Batch: 9057/20099 (45.06%) Loss: 2.206657 LR: 0.00002443 [11:57:11] Epoch: 1 Batch: 9058/20099 (45.07%) Loss: 2.286159 LR: 0.00002443 [11:57:13] Epoch: 1 Batch: 9059/20099 (45.07%) Loss: 2.356273 LR: 0.00002443 [11:57:14] Epoch: 1 Batch: 9060/20099 (45.08%) Loss: 2.252020 LR: 0.00002443 [11:57:16] Epoch: 1 Batch: 9061/20099 (45.08%) Loss: 2.236052 LR: 0.00002441 [11:57:18] Epoch: 1 Batch: 9062/20099 (45.09%) Loss: 2.375777 LR: 0.00002441 [11:57:20] Epoch: 1 Batch: 9063/20099 (45.09%) Loss: 2.265572 LR: 0.00002441 [11:57:21] Epoch: 1 Batch: 9064/20099 (45.10%) Loss: 2.420368 LR: 0.00002441 [11:57:23] Epoch: 1 Batch: 9065/20099 (45.10%) Loss: 2.115771 LR: 0.00002441 [11:57:25] Epoch: 1 Batch: 9066/20099 (45.11%) Loss: 2.537005 LR: 0.00002441 [11:57:27] Epoch: 1 Batch: 9067/20099 (45.11%) Loss: 1.812806 LR: 0.00002441 [11:57:29] Epoch: 1 Batch: 9068/20099 (45.12%) Loss: 1.776789 LR: 0.00002440 [11:57:30] Epoch: 1 Batch: 9069/20099 (45.12%) Loss: 2.371753 LR: 0.00002440 [11:57:32] Epoch: 1 Batch: 9070/20099 (45.13%) Loss: 2.009654 LR: 0.00002440 [11:57:34] Epoch: 1 Batch: 9071/20099 (45.13%) Loss: 1.730688 LR: 0.00002440 [11:57:36] Epoch: 1 Batch: 9072/20099 (45.14%) Loss: 1.795186 LR: 0.00002440 [11:57:38] Epoch: 1 Batch: 9073/20099 (45.14%) Loss: 2.191907 LR: 0.00002440 [11:57:39] Epoch: 1 Batch: 9074/20099 (45.15%) Loss: 1.856454 LR: 0.00002440 [11:57:41] Epoch: 1 Batch: 9075/20099 (45.15%) Loss: 2.242178 LR: 0.00002438 [11:57:43] Epoch: 1 Batch: 9076/20099 (45.16%) Loss: 2.029832 LR: 0.00002438 [11:57:45] Epoch: 1 Batch: 9077/20099 (45.16%) Loss: 1.987526 LR: 0.00002438 [11:57:46] Epoch: 1 Batch: 9078/20099 (45.17%) Loss: 2.181558 LR: 0.00002438 [11:57:48] Epoch: 1 Batch: 9079/20099 (45.17%) Loss: 2.033348 LR: 0.00002438 [11:57:50] Epoch: 1 Batch: 9080/20099 (45.18%) 
Loss: 2.167500 LR: 0.00002438 [11:57:52] Epoch: 1 Batch: 9081/20099 (45.18%) Loss: 2.109411 LR: 0.00002438 [11:57:54] Epoch: 1 Batch: 9082/20099 (45.19%) Loss: 2.050435 LR: 0.00002437 [11:57:55] Epoch: 1 Batch: 9083/20099 (45.19%) Loss: 2.111998 LR: 0.00002437 [11:57:57] Epoch: 1 Batch: 9084/20099 (45.20%) Loss: 2.231886 LR: 0.00002437 [11:57:59] Epoch: 1 Batch: 9085/20099 (45.20%) Loss: 1.954940 LR: 0.00002437 [11:58:01] Epoch: 1 Batch: 9086/20099 (45.21%) Loss: 2.389026 LR: 0.00002437 [11:58:02] Epoch: 1 Batch: 9087/20099 (45.21%) Loss: 2.310574 LR: 0.00002437 [11:58:04] Epoch: 1 Batch: 9088/20099 (45.22%) Loss: 2.196440 LR: 0.00002437 [11:58:06] Epoch: 1 Batch: 9089/20099 (45.22%) Loss: 2.168095 LR: 0.00002436 [11:58:08] Epoch: 1 Batch: 9090/20099 (45.23%) Loss: 2.001562 LR: 0.00002436 [11:58:09] Epoch: 1 Batch: 9091/20099 (45.23%) Loss: 2.129727 LR: 0.00002436 [11:58:11] Epoch: 1 Batch: 9092/20099 (45.24%) Loss: 2.124660 LR: 0.00002436 [11:58:13] Epoch: 1 Batch: 9093/20099 (45.24%) Loss: 1.932333 LR: 0.00002436 [11:58:15] Epoch: 1 Batch: 9094/20099 (45.25%) Loss: 1.928227 LR: 0.00002436 [11:58:16] Epoch: 1 Batch: 9095/20099 (45.25%) Loss: 2.066949 LR: 0.00002436 [11:58:18] Epoch: 1 Batch: 9096/20099 (45.26%) Loss: 2.249023 LR: 0.00002434 [11:58:20] Epoch: 1 Batch: 9097/20099 (45.26%) Loss: 2.394020 LR: 0.00002434 [11:58:22] Epoch: 1 Batch: 9098/20099 (45.27%) Loss: 2.157460 LR: 0.00002434 [11:58:23] Epoch: 1 Batch: 9099/20099 (45.27%) Loss: 2.152714 LR: 0.00002434 [11:58:25] Epoch: 1 Batch: 9100/20099 (45.28%) Loss: 2.190256 LR: 0.00002434 [11:58:27] Epoch: 1 Batch: 9101/20099 (45.28%) Loss: 1.802599 LR: 0.00002434 [11:58:29] Epoch: 1 Batch: 9102/20099 (45.29%) Loss: 2.163137 LR: 0.00002434 [11:58:31] Epoch: 1 Batch: 9103/20099 (45.29%) Loss: 2.383504 LR: 0.00002433 [11:58:32] Epoch: 1 Batch: 9104/20099 (45.30%) Loss: 2.511687 LR: 0.00002433 [11:58:34] Epoch: 1 Batch: 9105/20099 (45.30%) Loss: 2.400532 LR: 0.00002433 [11:58:36] Epoch: 1 Batch: 9106/20099 
(45.31%) Loss: 2.127068 LR: 0.00002433 [11:58:38] Epoch: 1 Batch: 9107/20099 (45.31%) Loss: 2.294084 LR: 0.00002433 [11:58:39] Epoch: 1 Batch: 9108/20099 (45.32%) Loss: 2.340208 LR: 0.00002433 [11:58:41] Epoch: 1 Batch: 9109/20099 (45.32%) Loss: 2.100180 LR: 0.00002433 [11:58:43] Epoch: 1 Batch: 9110/20099 (45.33%) Loss: 2.044102 LR: 0.00002432 [11:58:45] Epoch: 1 Batch: 9111/20099 (45.33%) Loss: 2.411339 LR: 0.00002432 [11:58:46] Epoch: 1 Batch: 9112/20099 (45.34%) Loss: 1.948151 LR: 0.00002432 [11:58:48] Epoch: 1 Batch: 9113/20099 (45.34%) Loss: 1.971307 LR: 0.00002432 [11:58:50] Epoch: 1 Batch: 9114/20099 (45.35%) Loss: 2.114316 LR: 0.00002432 [11:58:52] Epoch: 1 Batch: 9115/20099 (45.35%) Loss: 2.291146 LR: 0.00002432 [11:58:54] Epoch: 1 Batch: 9116/20099 (45.36%) Loss: 1.938184 LR: 0.00002432 [11:58:55] Epoch: 1 Batch: 9117/20099 (45.36%) Loss: 2.010793 LR: 0.00002430 [11:58:57] Epoch: 1 Batch: 9118/20099 (45.37%) Loss: 2.530650 LR: 0.00002430 [11:58:59] Epoch: 1 Batch: 9119/20099 (45.37%) Loss: 2.032656 LR: 0.00002430 [11:59:01] Epoch: 1 Batch: 9120/20099 (45.38%) Loss: 1.895960 LR: 0.00002430 [11:59:03] Epoch: 1 Batch: 9121/20099 (45.38%) Loss: 2.166624 LR: 0.00002430 [11:59:04] Epoch: 1 Batch: 9122/20099 (45.39%) Loss: 2.375689 LR: 0.00002430 [11:59:06] Epoch: 1 Batch: 9123/20099 (45.39%) Loss: 2.254987 LR: 0.00002430 [11:59:08] Epoch: 1 Batch: 9124/20099 (45.40%) Loss: 1.886475 LR: 0.00002429 [11:59:10] Epoch: 1 Batch: 9125/20099 (45.40%) Loss: 2.137772 LR: 0.00002429 [11:59:11] Epoch: 1 Batch: 9126/20099 (45.41%) Loss: 1.990760 LR: 0.00002429 [11:59:13] Epoch: 1 Batch: 9127/20099 (45.41%) Loss: 2.231516 LR: 0.00002429 [11:59:15] Epoch: 1 Batch: 9128/20099 (45.42%) Loss: 2.216008 LR: 0.00002429 [11:59:17] Epoch: 1 Batch: 9129/20099 (45.42%) Loss: 2.165371 LR: 0.00002429 [11:59:19] Epoch: 1 Batch: 9130/20099 (45.43%) Loss: 2.222523 LR: 0.00002429 [11:59:20] Epoch: 1 Batch: 9131/20099 (45.43%) Loss: 2.101853 LR: 0.00002427 [11:59:22] Epoch: 1 Batch: 
9132/20099 (45.44%) Loss: 2.110201 LR: 0.00002427 [11:59:24] Epoch: 1 Batch: 9133/20099 (45.44%) Loss: 1.856743 LR: 0.00002427 [11:59:26] Epoch: 1 Batch: 9134/20099 (45.45%) Loss: 2.193601 LR: 0.00002427 [11:59:27] Epoch: 1 Batch: 9135/20099 (45.45%) Loss: 2.028407 LR: 0.00002427 [11:59:29] Epoch: 1 Batch: 9136/20099 (45.45%) Loss: 2.129206 LR: 0.00002427 [11:59:31] Epoch: 1 Batch: 9137/20099 (45.46%) Loss: 2.009054 LR: 0.00002427 [11:59:33] Epoch: 1 Batch: 9138/20099 (45.46%) Loss: 2.036807 LR: 0.00002426 [11:59:35] Epoch: 1 Batch: 9139/20099 (45.47%) Loss: 2.248283 LR: 0.00002426 [11:59:36] Epoch: 1 Batch: 9140/20099 (45.47%) Loss: 1.868717 LR: 0.00002426 [11:59:38] Epoch: 1 Batch: 9141/20099 (45.48%) Loss: 2.260092 LR: 0.00002426 [11:59:40] Epoch: 1 Batch: 9142/20099 (45.48%) Loss: 1.954775 LR: 0.00002426 [11:59:42] Epoch: 1 Batch: 9143/20099 (45.49%) Loss: 2.091927 LR: 0.00002426 [11:59:43] Epoch: 1 Batch: 9144/20099 (45.49%) Loss: 2.083263 LR: 0.00002426 [11:59:45] Epoch: 1 Batch: 9145/20099 (45.50%) Loss: 2.049387 LR: 0.00002425 [11:59:47] Epoch: 1 Batch: 9146/20099 (45.50%) Loss: 2.151494 LR: 0.00002425 [11:59:49] Epoch: 1 Batch: 9147/20099 (45.51%) Loss: 2.144167 LR: 0.00002425 [11:59:51] Epoch: 1 Batch: 9148/20099 (45.51%) Loss: 2.074472 LR: 0.00002425 [11:59:52] Epoch: 1 Batch: 9149/20099 (45.52%) Loss: 1.789467 LR: 0.00002425 [11:59:54] Epoch: 1 Batch: 9150/20099 (45.52%) Loss: 1.994728 LR: 0.00002425 [11:59:56] Epoch: 1 Batch: 9151/20099 (45.53%) Loss: 2.204883 LR: 0.00002425 [11:59:58] Epoch: 1 Batch: 9152/20099 (45.53%) Loss: 2.147173 LR: 0.00002423 [11:59:59] Epoch: 1 Batch: 9153/20099 (45.54%) Loss: 2.161818 LR: 0.00002423 [12:00:01] Epoch: 1 Batch: 9154/20099 (45.54%) Loss: 2.088622 LR: 0.00002423 [12:00:03] Epoch: 1 Batch: 9155/20099 (45.55%) Loss: 2.293149 LR: 0.00002423 [12:00:05] Epoch: 1 Batch: 9156/20099 (45.55%) Loss: 2.192732 LR: 0.00002423 [12:00:06] Epoch: 1 Batch: 9157/20099 (45.56%) Loss: 1.871873 LR: 0.00002423 [12:00:08] Epoch: 1 
Batch: 9158/20099 (45.56%) Loss: 2.209268 LR: 0.00002423 [12:00:10] Epoch: 1 Batch: 9159/20099 (45.57%) Loss: 1.788508 LR: 0.00002422 [12:00:12] Epoch: 1 Batch: 9160/20099 (45.57%) Loss: 2.424423 LR: 0.00002422 [12:00:14] Epoch: 1 Batch: 9161/20099 (45.58%) Loss: 2.040257 LR: 0.00002422 [12:00:15] Epoch: 1 Batch: 9162/20099 (45.58%) Loss: 1.821468 LR: 0.00002422 [12:00:17] Epoch: 1 Batch: 9163/20099 (45.59%) Loss: 2.084630 LR: 0.00002422 [12:00:19] Epoch: 1 Batch: 9164/20099 (45.59%) Loss: 2.240560 LR: 0.00002422 [12:00:21] Epoch: 1 Batch: 9165/20099 (45.60%) Loss: 2.363619 LR: 0.00002422 [12:00:22] Epoch: 1 Batch: 9166/20099 (45.60%) Loss: 2.216919 LR: 0.00002421 [12:00:24] Epoch: 1 Batch: 9167/20099 (45.61%) Loss: 2.069092 LR: 0.00002421 [12:00:26] Epoch: 1 Batch: 9168/20099 (45.61%) Loss: 2.055189 LR: 0.00002421 [12:00:28] Epoch: 1 Batch: 9169/20099 (45.62%) Loss: 1.851123 LR: 0.00002421 [12:00:29] Epoch: 1 Batch: 9170/20099 (45.62%) Loss: 1.812312 LR: 0.00002421 [12:00:31] Epoch: 1 Batch: 9171/20099 (45.63%) Loss: 2.076532 LR: 0.00002421 [12:00:33] Epoch: 1 Batch: 9172/20099 (45.63%) Loss: 2.441097 LR: 0.00002421 [12:00:35] Epoch: 1 Batch: 9173/20099 (45.64%) Loss: 2.180741 LR: 0.00002419 [12:00:36] Epoch: 1 Batch: 9174/20099 (45.64%) Loss: 1.986938 LR: 0.00002419 [12:00:38] Epoch: 1 Batch: 9175/20099 (45.65%) Loss: 2.007969 LR: 0.00002419 [12:00:40] Epoch: 1 Batch: 9176/20099 (45.65%) Loss: 2.021558 LR: 0.00002419 [12:00:42] Epoch: 1 Batch: 9177/20099 (45.66%) Loss: 2.153283 LR: 0.00002419 [12:00:44] Epoch: 1 Batch: 9178/20099 (45.66%) Loss: 2.142602 LR: 0.00002419 [12:00:45] Epoch: 1 Batch: 9179/20099 (45.67%) Loss: 1.904850 LR: 0.00002419 [12:00:47] Epoch: 1 Batch: 9180/20099 (45.67%) Loss: 2.087038 LR: 0.00002418 [12:00:49] Epoch: 1 Batch: 9181/20099 (45.68%) Loss: 2.271775 LR: 0.00002418 [12:00:51] Epoch: 1 Batch: 9182/20099 (45.68%) Loss: 2.017069 LR: 0.00002418 [12:00:52] Epoch: 1 Batch: 9183/20099 (45.69%) Loss: 1.889721 LR: 0.00002418 [12:00:54] Epoch: 
1 Batch: 9184/20099 (45.69%) Loss: 2.052370 LR: 0.00002418 [12:00:56] Epoch: 1 Batch: 9185/20099 (45.70%) Loss: 2.325043 LR: 0.00002418 [12:00:58] Epoch: 1 Batch: 9186/20099 (45.70%) Loss: 2.159525 LR: 0.00002418 [12:00:59] Epoch: 1 Batch: 9187/20099 (45.71%) Loss: 2.120241 LR: 0.00002416 [12:01:01] Epoch: 1 Batch: 9188/20099 (45.71%) Loss: 2.193137 LR: 0.00002416 [12:01:03] Epoch: 1 Batch: 9189/20099 (45.72%) Loss: 2.097411 LR: 0.00002416 [12:01:05] Epoch: 1 Batch: 9190/20099 (45.72%) Loss: 2.206403 LR: 0.00002416 [12:01:06] Epoch: 1 Batch: 9191/20099 (45.73%) Loss: 2.289990 LR: 0.00002416 [12:01:08] Epoch: 1 Batch: 9192/20099 (45.73%) Loss: 2.521374 LR: 0.00002416 [12:01:10] Epoch: 1 Batch: 9193/20099 (45.74%) Loss: 1.924747 LR: 0.00002416 [12:01:12] Epoch: 1 Batch: 9194/20099 (45.74%) Loss: 1.856146 LR: 0.00002415 [12:01:14] Epoch: 1 Batch: 9195/20099 (45.75%) Loss: 1.888569 LR: 0.00002415 [12:01:15] Epoch: 1 Batch: 9196/20099 (45.75%) Loss: 2.146529 LR: 0.00002415 [12:01:17] Epoch: 1 Batch: 9197/20099 (45.76%) Loss: 2.126666 LR: 0.00002415 [12:01:19] Epoch: 1 Batch: 9198/20099 (45.76%) Loss: 2.299310 LR: 0.00002415 [12:01:21] Epoch: 1 Batch: 9199/20099 (45.77%) Loss: 1.915786 LR: 0.00002415 [12:01:26] >> Cleaned up old temp checkpoint: epoch1_step7200 [12:01:26] >> Temp checkpoint saved: epoch1_step9200, size: 0.1693 GB [12:01:26] Epoch: 1 Batch: 9200/20099 (45.77%) Loss: 2.045567 LR: 0.00002415 [12:01:28] Epoch: 1 Batch: 9201/20099 (45.78%) Loss: 2.350906 LR: 0.00002414 [12:01:29] Epoch: 1 Batch: 9202/20099 (45.78%) Loss: 2.213795 LR: 0.00002414 [12:01:31] Epoch: 1 Batch: 9203/20099 (45.79%) Loss: 2.392748 LR: 0.00002414 [12:01:33] Epoch: 1 Batch: 9204/20099 (45.79%) Loss: 2.083737 LR: 0.00002414 [12:01:35] Epoch: 1 Batch: 9205/20099 (45.80%) Loss: 2.033147 LR: 0.00002414 [12:01:36] Epoch: 1 Batch: 9206/20099 (45.80%) Loss: 1.970197 LR: 0.00002414 [12:01:38] Epoch: 1 Batch: 9207/20099 (45.81%) Loss: 2.187941 LR: 0.00002414 [12:01:40] Epoch: 1 Batch: 9208/20099 
(45.81%) Loss: 1.886406 LR: 0.00002412 [12:01:42] Epoch: 1 Batch: 9209/20099 (45.82%) Loss: 1.894311 LR: 0.00002412 [12:01:44] Epoch: 1 Batch: 9210/20099 (45.82%) Loss: 2.360817 LR: 0.00002412 [12:01:45] Epoch: 1 Batch: 9211/20099 (45.83%) Loss: 2.104266 LR: 0.00002412 [12:01:47] Epoch: 1 Batch: 9212/20099 (45.83%) Loss: 2.170833 LR: 0.00002412 [12:01:49] Epoch: 1 Batch: 9213/20099 (45.84%) Loss: 1.718122 LR: 0.00002412 [12:01:51] Epoch: 1 Batch: 9214/20099 (45.84%) Loss: 2.211202 LR: 0.00002412 [12:01:53] Epoch: 1 Batch: 9215/20099 (45.85%) Loss: 1.995834 LR: 0.00002411 [12:01:54] Epoch: 1 Batch: 9216/20099 (45.85%) Loss: 2.062163 LR: 0.00002411 [12:01:56] Epoch: 1 Batch: 9217/20099 (45.86%) Loss: 2.307980 LR: 0.00002411 [12:01:58] Epoch: 1 Batch: 9218/20099 (45.86%) Loss: 2.267162 LR: 0.00002411 [12:02:00] Epoch: 1 Batch: 9219/20099 (45.87%) Loss: 2.297253 LR: 0.00002411 [12:02:02] Epoch: 1 Batch: 9220/20099 (45.87%) Loss: 2.223510 LR: 0.00002411 [12:02:03] Epoch: 1 Batch: 9221/20099 (45.88%) Loss: 2.280790 LR: 0.00002411 [12:02:05] Epoch: 1 Batch: 9222/20099 (45.88%) Loss: 2.110057 LR: 0.00002409 [12:02:07] Epoch: 1 Batch: 9223/20099 (45.89%) Loss: 2.110988 LR: 0.00002409 [12:02:09] Epoch: 1 Batch: 9224/20099 (45.89%) Loss: 2.348033 LR: 0.00002409 [12:02:10] Epoch: 1 Batch: 9225/20099 (45.90%) Loss: 2.063286 LR: 0.00002409 [12:02:12] Epoch: 1 Batch: 9226/20099 (45.90%) Loss: 2.061849 LR: 0.00002409 [12:02:14] Epoch: 1 Batch: 9227/20099 (45.91%) Loss: 2.228657 LR: 0.00002409 [12:02:16] Epoch: 1 Batch: 9228/20099 (45.91%) Loss: 1.970988 LR: 0.00002409 [12:02:18] Epoch: 1 Batch: 9229/20099 (45.92%) Loss: 2.355350 LR: 0.00002408 [12:02:19] Epoch: 1 Batch: 9230/20099 (45.92%) Loss: 1.792391 LR: 0.00002408 [12:02:21] Epoch: 1 Batch: 9231/20099 (45.93%) Loss: 2.221451 LR: 0.00002408 [12:02:23] Epoch: 1 Batch: 9232/20099 (45.93%) Loss: 1.916058 LR: 0.00002408 [12:02:25] Epoch: 1 Batch: 9233/20099 (45.94%) Loss: 2.318926 LR: 0.00002408 [12:02:26] Epoch: 1 Batch: 
9234/20099 (45.94%) Loss: 2.429273 LR: 0.00002408 [12:02:28] Epoch: 1 Batch: 9235/20099 (45.95%) Loss: 2.250731 LR: 0.00002408 [12:02:30] Epoch: 1 Batch: 9236/20099 (45.95%) Loss: 2.355707 LR: 0.00002407 [12:02:32] Epoch: 1 Batch: 9237/20099 (45.96%) Loss: 2.218588 LR: 0.00002407 [12:02:34] Epoch: 1 Batch: 9238/20099 (45.96%) Loss: 2.134538 LR: 0.00002407 [12:02:35] Epoch: 1 Batch: 9239/20099 (45.97%) Loss: 2.071525 LR: 0.00002407 [12:02:37] Epoch: 1 Batch: 9240/20099 (45.97%) Loss: 2.082251 LR: 0.00002407 [12:02:39] Epoch: 1 Batch: 9241/20099 (45.98%) Loss: 2.028720 LR: 0.00002407 [12:02:41] Epoch: 1 Batch: 9242/20099 (45.98%) Loss: 1.975001 LR: 0.00002407 [12:02:42] Epoch: 1 Batch: 9243/20099 (45.99%) Loss: 1.874352 LR: 0.00002405 [12:02:44] Epoch: 1 Batch: 9244/20099 (45.99%) Loss: 2.165920 LR: 0.00002405 [12:02:46] Epoch: 1 Batch: 9245/20099 (46.00%) Loss: 2.311582 LR: 0.00002405 [12:02:48] Epoch: 1 Batch: 9246/20099 (46.00%) Loss: 2.124339 LR: 0.00002405 [12:02:49] Epoch: 1 Batch: 9247/20099 (46.01%) Loss: 2.177079 LR: 0.00002405 [12:02:51] Epoch: 1 Batch: 9248/20099 (46.01%) Loss: 2.211617 LR: 0.00002405 [12:02:53] Epoch: 1 Batch: 9249/20099 (46.02%) Loss: 2.010850 LR: 0.00002405 [12:02:55] Epoch: 1 Batch: 9250/20099 (46.02%) Loss: 2.163287 LR: 0.00002404 [12:02:56] Epoch: 1 Batch: 9251/20099 (46.03%) Loss: 2.267598 LR: 0.00002404 [12:02:58] Epoch: 1 Batch: 9252/20099 (46.03%) Loss: 2.186189 LR: 0.00002404 [12:03:00] Epoch: 1 Batch: 9253/20099 (46.04%) Loss: 1.706822 LR: 0.00002404 [12:03:02] Epoch: 1 Batch: 9254/20099 (46.04%) Loss: 2.185171 LR: 0.00002404 [12:03:04] Epoch: 1 Batch: 9255/20099 (46.05%) Loss: 2.306379 LR: 0.00002404 [12:03:05] Epoch: 1 Batch: 9256/20099 (46.05%) Loss: 2.212572 LR: 0.00002404 [12:03:07] Epoch: 1 Batch: 9257/20099 (46.06%) Loss: 2.298540 LR: 0.00002402 [12:03:09] Epoch: 1 Batch: 9258/20099 (46.06%) Loss: 1.952045 LR: 0.00002402 [12:03:11] Epoch: 1 Batch: 9259/20099 (46.07%) Loss: 1.860819 LR: 0.00002402 [12:03:12] Epoch: 1 
Batch: 9260/20099 (46.07%) Loss: 2.198509 LR: 0.00002402 [12:03:14] Epoch: 1 Batch: 9261/20099 (46.08%) Loss: 2.318628 LR: 0.00002402 [12:03:16] Epoch: 1 Batch: 9262/20099 (46.08%) Loss: 2.221910 LR: 0.00002402 [12:03:18] Epoch: 1 Batch: 9263/20099 (46.09%) Loss: 1.937892 LR: 0.00002402 [12:03:20] Epoch: 1 Batch: 9264/20099 (46.09%) Loss: 2.266819 LR: 0.00002401 [12:03:21] Epoch: 1 Batch: 9265/20099 (46.10%) Loss: 2.493267 LR: 0.00002401 [12:03:23] Epoch: 1 Batch: 9266/20099 (46.10%) Loss: 2.041617 LR: 0.00002401 [12:03:25] Epoch: 1 Batch: 9267/20099 (46.11%) Loss: 2.036446 LR: 0.00002401 [12:03:27] Epoch: 1 Batch: 9268/20099 (46.11%) Loss: 1.676574 LR: 0.00002401 [12:03:29] Epoch: 1 Batch: 9269/20099 (46.12%) Loss: 2.131135 LR: 0.00002401 [12:03:30] Epoch: 1 Batch: 9270/20099 (46.12%) Loss: 1.910474 LR: 0.00002401 [12:03:32] Epoch: 1 Batch: 9271/20099 (46.13%) Loss: 2.321613 LR: 0.00002400 [12:03:34] Epoch: 1 Batch: 9272/20099 (46.13%) Loss: 2.359130 LR: 0.00002400 [12:03:36] Epoch: 1 Batch: 9273/20099 (46.14%) Loss: 1.900019 LR: 0.00002400 [12:03:37] Epoch: 1 Batch: 9274/20099 (46.14%) Loss: 2.227680 LR: 0.00002400 [12:03:39] Epoch: 1 Batch: 9275/20099 (46.15%) Loss: 2.249901 LR: 0.00002400 [12:03:41] Epoch: 1 Batch: 9276/20099 (46.15%) Loss: 2.093754 LR: 0.00002400 [12:03:43] Epoch: 1 Batch: 9277/20099 (46.16%) Loss: 2.144181 LR: 0.00002400 [12:03:45] Epoch: 1 Batch: 9278/20099 (46.16%) Loss: 2.114960 LR: 0.00002398 [12:03:46] Epoch: 1 Batch: 9279/20099 (46.17%) Loss: 2.346670 LR: 0.00002398 [12:03:48] Epoch: 1 Batch: 9280/20099 (46.17%) Loss: 1.994242 LR: 0.00002398 [12:03:50] Epoch: 1 Batch: 9281/20099 (46.18%) Loss: 2.514149 LR: 0.00002398 [12:03:52] Epoch: 1 Batch: 9282/20099 (46.18%) Loss: 1.919082 LR: 0.00002398 [12:03:53] Epoch: 1 Batch: 9283/20099 (46.19%) Loss: 2.035897 LR: 0.00002398 [12:03:55] Epoch: 1 Batch: 9284/20099 (46.19%) Loss: 2.066009 LR: 0.00002398 [12:03:57] Epoch: 1 Batch: 9285/20099 (46.20%) Loss: 2.431469 LR: 0.00002397 [12:03:59] Epoch: 
1 Batch: 9286/20099 (46.20%) Loss: 2.062090 LR: 0.00002397 [12:04:01] Epoch: 1 Batch: 9287/20099 (46.21%) Loss: 2.238910 LR: 0.00002397 [12:04:02] Epoch: 1 Batch: 9288/20099 (46.21%) Loss: 1.806733 LR: 0.00002397 [12:04:04] Epoch: 1 Batch: 9289/20099 (46.22%) Loss: 2.155999 LR: 0.00002397 [12:04:06] Epoch: 1 Batch: 9290/20099 (46.22%) Loss: 2.078401 LR: 0.00002397 [12:04:08] Epoch: 1 Batch: 9291/20099 (46.23%) Loss: 2.058513 LR: 0.00002397 [12:04:09] Epoch: 1 Batch: 9292/20099 (46.23%) Loss: 2.334511 LR: 0.00002395 [12:04:11] Epoch: 1 Batch: 9293/20099 (46.24%) Loss: 1.956556 LR: 0.00002395 [12:04:13] Epoch: 1 Batch: 9294/20099 (46.24%) Loss: 1.847939 LR: 0.00002395 [12:04:15] Epoch: 1 Batch: 9295/20099 (46.25%) Loss: 2.074734 LR: 0.00002395 [12:04:16] Epoch: 1 Batch: 9296/20099 (46.25%) Loss: 1.866462 LR: 0.00002395 [12:04:18] Epoch: 1 Batch: 9297/20099 (46.26%) Loss: 2.245913 LR: 0.00002395 [12:04:20] Epoch: 1 Batch: 9298/20099 (46.26%) Loss: 2.010411 LR: 0.00002395 [12:04:22] Epoch: 1 Batch: 9299/20099 (46.27%) Loss: 2.289441 LR: 0.00002394 [12:04:23] Epoch: 1 Batch: 9300/20099 (46.27%) Loss: 2.108822 LR: 0.00002394 [12:04:25] Epoch: 1 Batch: 9301/20099 (46.28%) Loss: 1.973079 LR: 0.00002394 [12:04:27] Epoch: 1 Batch: 9302/20099 (46.28%) Loss: 2.071766 LR: 0.00002394 [12:04:29] Epoch: 1 Batch: 9303/20099 (46.29%) Loss: 2.116529 LR: 0.00002394 [12:04:30] Epoch: 1 Batch: 9304/20099 (46.29%) Loss: 2.091579 LR: 0.00002394 [12:04:32] Epoch: 1 Batch: 9305/20099 (46.30%) Loss: 2.317658 LR: 0.00002394 [12:04:34] Epoch: 1 Batch: 9306/20099 (46.30%) Loss: 2.132154 LR: 0.00002392 [12:04:36] Epoch: 1 Batch: 9307/20099 (46.31%) Loss: 2.106025 LR: 0.00002392 [12:04:38] Epoch: 1 Batch: 9308/20099 (46.31%) Loss: 2.300197 LR: 0.00002392 [12:04:39] Epoch: 1 Batch: 9309/20099 (46.32%) Loss: 2.103794 LR: 0.00002392 [12:04:41] Epoch: 1 Batch: 9310/20099 (46.32%) Loss: 2.171246 LR: 0.00002392 [12:04:43] Epoch: 1 Batch: 9311/20099 (46.33%) Loss: 2.476990 LR: 0.00002392 [12:04:45] 
Epoch: 1 Batch: 9312/20099 (46.33%) Loss: 1.959389 LR: 0.00002392 [12:04:46] Epoch: 1 Batch: 9313/20099 (46.34%) Loss: 1.983668 LR: 0.00002391 [12:04:48] Epoch: 1 Batch: 9314/20099 (46.34%) Loss: 2.283330 LR: 0.00002391 [12:04:50] Epoch: 1 Batch: 9315/20099 (46.35%) Loss: 2.449835 LR: 0.00002391 [12:04:52] Epoch: 1 Batch: 9316/20099 (46.35%) Loss: 2.171149 LR: 0.00002391 [12:04:54] Epoch: 1 Batch: 9317/20099 (46.36%) Loss: 2.098539 LR: 0.00002391 [12:04:55] Epoch: 1 Batch: 9318/20099 (46.36%) Loss: 1.925911 LR: 0.00002391 [12:04:57] Epoch: 1 Batch: 9319/20099 (46.37%) Loss: 1.927752 LR: 0.00002391 [12:04:59] Epoch: 1 Batch: 9320/20099 (46.37%) Loss: 2.331660 LR: 0.00002390 [12:05:01] Epoch: 1 Batch: 9321/20099 (46.38%) Loss: 2.120948 LR: 0.00002390 [12:05:02] Epoch: 1 Batch: 9322/20099 (46.38%) Loss: 2.236708 LR: 0.00002390 [12:05:04] Epoch: 1 Batch: 9323/20099 (46.39%) Loss: 2.142719 LR: 0.00002390 [12:05:06] Epoch: 1 Batch: 9324/20099 (46.39%) Loss: 2.357676 LR: 0.00002390 [12:05:08] Epoch: 1 Batch: 9325/20099 (46.40%) Loss: 2.300825 LR: 0.00002390 [12:05:09] Epoch: 1 Batch: 9326/20099 (46.40%) Loss: 2.067228 LR: 0.00002390 [12:05:11] Epoch: 1 Batch: 9327/20099 (46.41%) Loss: 2.114865 LR: 0.00002388 [12:05:13] Epoch: 1 Batch: 9328/20099 (46.41%) Loss: 1.999580 LR: 0.00002388 [12:05:15] Epoch: 1 Batch: 9329/20099 (46.42%) Loss: 2.046137 LR: 0.00002388 [12:05:16] Epoch: 1 Batch: 9330/20099 (46.42%) Loss: 2.062184 LR: 0.00002388 [12:05:18] Epoch: 1 Batch: 9331/20099 (46.43%) Loss: 2.246281 LR: 0.00002388 [12:05:20] Epoch: 1 Batch: 9332/20099 (46.43%) Loss: 2.071417 LR: 0.00002388 [12:05:22] Epoch: 1 Batch: 9333/20099 (46.44%) Loss: 2.187849 LR: 0.00002388 [12:05:24] Epoch: 1 Batch: 9334/20099 (46.44%) Loss: 2.107283 LR: 0.00002387 [12:05:25] Epoch: 1 Batch: 9335/20099 (46.45%) Loss: 2.274931 LR: 0.00002387 [12:05:27] Epoch: 1 Batch: 9336/20099 (46.45%) Loss: 2.032984 LR: 0.00002387 [12:05:29] Epoch: 1 Batch: 9337/20099 (46.46%) Loss: 2.207665 LR: 0.00002387 
[12:05:31] Epoch: 1 Batch: 9338/20099 (46.46%) Loss: 2.033588 LR: 0.00002387 [12:05:33] Epoch: 1 Batch: 9339/20099 (46.46%) Loss: 2.478738 LR: 0.00002387 [12:05:34] Epoch: 1 Batch: 9340/20099 (46.47%) Loss: 2.209998 LR: 0.00002387 [12:05:36] Epoch: 1 Batch: 9341/20099 (46.47%) Loss: 2.214007 LR: 0.00002385 [12:05:38] Epoch: 1 Batch: 9342/20099 (46.48%) Loss: 2.003210 LR: 0.00002385 [12:05:40] Epoch: 1 Batch: 9343/20099 (46.48%) Loss: 2.172973 LR: 0.00002385 [12:05:41] Epoch: 1 Batch: 9344/20099 (46.49%) Loss: 2.219859 LR: 0.00002385 [12:05:43] Epoch: 1 Batch: 9345/20099 (46.49%) Loss: 2.347208 LR: 0.00002385 [12:05:45] Epoch: 1 Batch: 9346/20099 (46.50%) Loss: 2.064275 LR: 0.00002385 [12:05:47] Epoch: 1 Batch: 9347/20099 (46.50%) Loss: 2.346556 LR: 0.00002385 [12:05:48] Epoch: 1 Batch: 9348/20099 (46.51%) Loss: 1.972494 LR: 0.00002384 [12:05:50] Epoch: 1 Batch: 9349/20099 (46.51%) Loss: 2.125959 LR: 0.00002384 [12:05:52] Epoch: 1 Batch: 9350/20099 (46.52%) Loss: 2.037594 LR: 0.00002384 [12:05:54] Epoch: 1 Batch: 9351/20099 (46.52%) Loss: 1.807078 LR: 0.00002384 [12:05:56] Epoch: 1 Batch: 9352/20099 (46.53%) Loss: 2.009830 LR: 0.00002384 [12:05:57] Epoch: 1 Batch: 9353/20099 (46.53%) Loss: 1.947050 LR: 0.00002384 [12:05:59] Epoch: 1 Batch: 9354/20099 (46.54%) Loss: 1.837435 LR: 0.00002384 [12:06:01] Epoch: 1 Batch: 9355/20099 (46.54%) Loss: 2.100842 LR: 0.00002383 [12:06:03] Epoch: 1 Batch: 9356/20099 (46.55%) Loss: 2.099627 LR: 0.00002383 [12:06:04] Epoch: 1 Batch: 9357/20099 (46.55%) Loss: 2.143074 LR: 0.00002383 [12:06:06] Epoch: 1 Batch: 9358/20099 (46.56%) Loss: 2.658697 LR: 0.00002383 [12:06:08] Epoch: 1 Batch: 9359/20099 (46.56%) Loss: 2.196622 LR: 0.00002383 [12:06:10] Epoch: 1 Batch: 9360/20099 (46.57%) Loss: 2.005906 LR: 0.00002383 [12:06:12] Epoch: 1 Batch: 9361/20099 (46.57%) Loss: 1.869178 LR: 0.00002383 [12:06:13] Epoch: 1 Batch: 9362/20099 (46.58%) Loss: 2.191496 LR: 0.00002381 [12:06:15] Epoch: 1 Batch: 9363/20099 (46.58%) Loss: 2.167088 LR: 
0.00002381 [12:06:17] Epoch: 1 Batch: 9364/20099 (46.59%) Loss: 2.201173 LR: 0.00002381 [12:06:19] Epoch: 1 Batch: 9365/20099 (46.59%) Loss: 1.829090 LR: 0.00002381 [12:06:20] Epoch: 1 Batch: 9366/20099 (46.60%) Loss: 1.850945 LR: 0.00002381 [12:06:22] Epoch: 1 Batch: 9367/20099 (46.60%) Loss: 2.201318 LR: 0.00002381 [12:06:24] Epoch: 1 Batch: 9368/20099 (46.61%) Loss: 2.059328 LR: 0.00002381 [12:06:26] Epoch: 1 Batch: 9369/20099 (46.61%) Loss: 2.258810 LR: 0.00002380 [12:06:27] Epoch: 1 Batch: 9370/20099 (46.62%) Loss: 2.085174 LR: 0.00002380 [12:06:29] Epoch: 1 Batch: 9371/20099 (46.62%) Loss: 1.890519 LR: 0.00002380 [12:06:31] Epoch: 1 Batch: 9372/20099 (46.63%) Loss: 2.116861 LR: 0.00002380 [12:06:33] Epoch: 1 Batch: 9373/20099 (46.63%) Loss: 2.003023 LR: 0.00002380 [12:06:35] Epoch: 1 Batch: 9374/20099 (46.64%) Loss: 2.249454 LR: 0.00002380 [12:06:36] Epoch: 1 Batch: 9375/20099 (46.64%) Loss: 2.286725 LR: 0.00002380 [12:06:38] Epoch: 1 Batch: 9376/20099 (46.65%) Loss: 2.281140 LR: 0.00002378 [12:06:40] Epoch: 1 Batch: 9377/20099 (46.65%) Loss: 2.191887 LR: 0.00002378 [12:06:42] Epoch: 1 Batch: 9378/20099 (46.66%) Loss: 2.014700 LR: 0.00002378 [12:06:43] Epoch: 1 Batch: 9379/20099 (46.66%) Loss: 2.123560 LR: 0.00002378 [12:06:45] Epoch: 1 Batch: 9380/20099 (46.67%) Loss: 1.912226 LR: 0.00002378 [12:06:47] Epoch: 1 Batch: 9381/20099 (46.67%) Loss: 1.930715 LR: 0.00002378 [12:06:49] Epoch: 1 Batch: 9382/20099 (46.68%) Loss: 1.778153 LR: 0.00002378 [12:06:51] Epoch: 1 Batch: 9383/20099 (46.68%) Loss: 2.215627 LR: 0.00002377 [12:06:52] Epoch: 1 Batch: 9384/20099 (46.69%) Loss: 2.076908 LR: 0.00002377 [12:06:54] Epoch: 1 Batch: 9385/20099 (46.69%) Loss: 2.466665 LR: 0.00002377 [12:06:56] Epoch: 1 Batch: 9386/20099 (46.70%) Loss: 2.248737 LR: 0.00002377 [12:06:58] Epoch: 1 Batch: 9387/20099 (46.70%) Loss: 2.063239 LR: 0.00002377 [12:07:00] Epoch: 1 Batch: 9388/20099 (46.71%) Loss: 1.952344 LR: 0.00002377 [12:07:01] Epoch: 1 Batch: 9389/20099 (46.71%) Loss: 1.833552 
LR: 0.00002377 [12:07:03] Epoch: 1 Batch: 9390/20099 (46.72%) Loss: 2.184552 LR: 0.00002375 [12:07:05] Epoch: 1 Batch: 9391/20099 (46.72%) Loss: 1.941707 LR: 0.00002375 [12:07:07] Epoch: 1 Batch: 9392/20099 (46.73%) Loss: 1.916327 LR: 0.00002375 [12:07:08] Epoch: 1 Batch: 9393/20099 (46.73%) Loss: 2.278915 LR: 0.00002375 [12:07:10] Epoch: 1 Batch: 9394/20099 (46.74%) Loss: 2.134921 LR: 0.00002375 [12:07:12] Epoch: 1 Batch: 9395/20099 (46.74%) Loss: 2.005348 LR: 0.00002375 [12:07:14] Epoch: 1 Batch: 9396/20099 (46.75%) Loss: 1.964191 LR: 0.00002375 [12:07:16] Epoch: 1 Batch: 9397/20099 (46.75%) Loss: 2.199244 LR: 0.00002374 [12:07:17] Epoch: 1 Batch: 9398/20099 (46.76%) Loss: 2.275865 LR: 0.00002374 [12:07:19] Epoch: 1 Batch: 9399/20099 (46.76%) Loss: 2.423033 LR: 0.00002374 [12:07:24] >> Cleaned up old temp checkpoint: epoch1_step7400 [12:07:24] >> Temp checkpoint saved: epoch1_step9400, size: 0.1693 GB [12:07:24] Epoch: 1 Batch: 9400/20099 (46.77%) Loss: 1.907833 LR: 0.00002374 [12:07:26] Epoch: 1 Batch: 9401/20099 (46.77%) Loss: 2.258406 LR: 0.00002374 [12:07:28] Epoch: 1 Batch: 9402/20099 (46.78%) Loss: 1.967403 LR: 0.00002374 [12:07:30] Epoch: 1 Batch: 9403/20099 (46.78%) Loss: 2.172712 LR: 0.00002374 [12:07:32] Epoch: 1 Batch: 9404/20099 (46.79%) Loss: 2.106892 LR: 0.00002373 [12:07:33] Epoch: 1 Batch: 9405/20099 (46.79%) Loss: 2.208883 LR: 0.00002373 [12:07:35] Epoch: 1 Batch: 9406/20099 (46.80%) Loss: 2.230259 LR: 0.00002373 [12:07:37] Epoch: 1 Batch: 9407/20099 (46.80%) Loss: 2.010746 LR: 0.00002373 [12:07:39] Epoch: 1 Batch: 9408/20099 (46.81%) Loss: 2.086753 LR: 0.00002373 [12:07:40] Epoch: 1 Batch: 9409/20099 (46.81%) Loss: 1.925590 LR: 0.00002373 [12:07:42] Epoch: 1 Batch: 9410/20099 (46.82%) Loss: 2.383957 LR: 0.00002373 [12:07:44] Epoch: 1 Batch: 9411/20099 (46.82%) Loss: 2.261527 LR: 0.00002371 [12:07:46] Epoch: 1 Batch: 9412/20099 (46.83%) Loss: 2.158074 LR: 0.00002371 [12:07:47] Epoch: 1 Batch: 9413/20099 (46.83%) Loss: 1.955886 LR: 0.00002371 
[12:07:49] Epoch: 1 Batch: 9414/20099 (46.84%) Loss: 2.327398 LR: 0.00002371 [12:07:51] Epoch: 1 Batch: 9415/20099 (46.84%) Loss: 2.030664 LR: 0.00002371 [12:07:53] Epoch: 1 Batch: 9416/20099 (46.85%) Loss: 2.160621 LR: 0.00002371 [12:07:55] Epoch: 1 Batch: 9417/20099 (46.85%) Loss: 2.019266 LR: 0.00002371 [12:07:56] Epoch: 1 Batch: 9418/20099 (46.86%) Loss: 2.069087 LR: 0.00002370 [12:07:58] Epoch: 1 Batch: 9419/20099 (46.86%) Loss: 2.036530 LR: 0.00002370 [12:08:00] Epoch: 1 Batch: 9420/20099 (46.87%) Loss: 2.098251 LR: 0.00002370 [12:08:02] Epoch: 1 Batch: 9421/20099 (46.87%) Loss: 2.012729 LR: 0.00002370 [12:08:04] Epoch: 1 Batch: 9422/20099 (46.88%) Loss: 2.169810 LR: 0.00002370 [12:08:05] Epoch: 1 Batch: 9423/20099 (46.88%) Loss: 1.990564 LR: 0.00002370 [12:08:07] Epoch: 1 Batch: 9424/20099 (46.89%) Loss: 2.126844 LR: 0.00002370 [12:08:09] Epoch: 1 Batch: 9425/20099 (46.89%) Loss: 2.010858 LR: 0.00002368 [12:08:11] Epoch: 1 Batch: 9426/20099 (46.90%) Loss: 1.741487 LR: 0.00002368 [12:08:13] Epoch: 1 Batch: 9427/20099 (46.90%) Loss: 2.058780 LR: 0.00002368 [12:08:14] Epoch: 1 Batch: 9428/20099 (46.91%) Loss: 2.176776 LR: 0.00002368 [12:08:16] Epoch: 1 Batch: 9429/20099 (46.91%) Loss: 2.404131 LR: 0.00002368 [12:08:18] Epoch: 1 Batch: 9430/20099 (46.92%) Loss: 2.564886 LR: 0.00002368 [12:08:20] Epoch: 1 Batch: 9431/20099 (46.92%) Loss: 2.164956 LR: 0.00002368 [12:08:21] Epoch: 1 Batch: 9432/20099 (46.93%) Loss: 2.497317 LR: 0.00002367 [12:08:23] Epoch: 1 Batch: 9433/20099 (46.93%) Loss: 2.002699 LR: 0.00002367 [12:08:25] Epoch: 1 Batch: 9434/20099 (46.94%) Loss: 2.178479 LR: 0.00002367 [12:08:27] Epoch: 1 Batch: 9435/20099 (46.94%) Loss: 1.958166 LR: 0.00002367 [12:08:29] Epoch: 1 Batch: 9436/20099 (46.95%) Loss: 2.052859 LR: 0.00002367 [12:08:30] Epoch: 1 Batch: 9437/20099 (46.95%) Loss: 1.992962 LR: 0.00002367 [12:08:32] Epoch: 1 Batch: 9438/20099 (46.96%) Loss: 1.706176 LR: 0.00002367 [12:08:34] Epoch: 1 Batch: 9439/20099 (46.96%) Loss: 1.873890 LR: 
0.00002365 [12:08:36] Epoch: 1 Batch: 9440/20099 (46.97%) Loss: 1.904483 LR: 0.00002365 [12:08:37] Epoch: 1 Batch: 9441/20099 (46.97%) Loss: 2.244889 LR: 0.00002365 [12:08:39] Epoch: 1 Batch: 9442/20099 (46.98%) Loss: 1.769275 LR: 0.00002365 [12:08:41] Epoch: 1 Batch: 9443/20099 (46.98%) Loss: 1.941209 LR: 0.00002365 [12:08:43] Epoch: 1 Batch: 9444/20099 (46.99%) Loss: 1.993742 LR: 0.00002365 [12:08:44] Epoch: 1 Batch: 9445/20099 (46.99%) Loss: 2.206442 LR: 0.00002365 [12:08:46] Epoch: 1 Batch: 9446/20099 (47.00%) Loss: 1.817183 LR: 0.00002364 [12:08:48] Epoch: 1 Batch: 9447/20099 (47.00%) Loss: 1.941155 LR: 0.00002364 [12:08:50] Epoch: 1 Batch: 9448/20099 (47.01%) Loss: 2.283155 LR: 0.00002364 [12:08:52] Epoch: 1 Batch: 9449/20099 (47.01%) Loss: 2.183802 LR: 0.00002364 [12:08:53] Epoch: 1 Batch: 9450/20099 (47.02%) Loss: 2.200827 LR: 0.00002364 [12:08:55] Epoch: 1 Batch: 9451/20099 (47.02%) Loss: 1.961768 LR: 0.00002364 [12:08:57] Epoch: 1 Batch: 9452/20099 (47.03%) Loss: 2.129909 LR: 0.00002364 [12:08:59] Epoch: 1 Batch: 9453/20099 (47.03%) Loss: 1.939364 LR: 0.00002363 [12:09:00] Epoch: 1 Batch: 9454/20099 (47.04%) Loss: 1.984640 LR: 0.00002363 [12:09:02] Epoch: 1 Batch: 9455/20099 (47.04%) Loss: 1.668297 LR: 0.00002363 [12:09:04] Epoch: 1 Batch: 9456/20099 (47.05%) Loss: 1.881298 LR: 0.00002363 [12:09:06] Epoch: 1 Batch: 9457/20099 (47.05%) Loss: 2.266949 LR: 0.00002363 [12:09:07] Epoch: 1 Batch: 9458/20099 (47.06%) Loss: 2.198484 LR: 0.00002363 [12:09:09] Epoch: 1 Batch: 9459/20099 (47.06%) Loss: 2.332202 LR: 0.00002363 [12:09:11] Epoch: 1 Batch: 9460/20099 (47.07%) Loss: 1.882725 LR: 0.00002361 [12:09:13] Epoch: 1 Batch: 9461/20099 (47.07%) Loss: 2.210719 LR: 0.00002361 [12:09:14] Epoch: 1 Batch: 9462/20099 (47.08%) Loss: 2.068842 LR: 0.00002361 [12:09:16] Epoch: 1 Batch: 9463/20099 (47.08%) Loss: 2.213252 LR: 0.00002361 [12:09:18] Epoch: 1 Batch: 9464/20099 (47.09%) Loss: 2.200275 LR: 0.00002361 [12:09:20] Epoch: 1 Batch: 9465/20099 (47.09%) Loss: 2.345363 
LR: 0.00002361 [12:09:22] Epoch: 1 Batch: 9466/20099 (47.10%) Loss: 1.978821 LR: 0.00002361 [12:09:23] Epoch: 1 Batch: 9467/20099 (47.10%) Loss: 2.023939 LR: 0.00002360 [12:09:25] Epoch: 1 Batch: 9468/20099 (47.11%) Loss: 1.989781 LR: 0.00002360 [12:09:27] Epoch: 1 Batch: 9469/20099 (47.11%) Loss: 2.047773 LR: 0.00002360 [12:09:29] Epoch: 1 Batch: 9470/20099 (47.12%) Loss: 2.062806 LR: 0.00002360 [12:09:30] Epoch: 1 Batch: 9471/20099 (47.12%) Loss: 2.282166 LR: 0.00002360 [12:09:32] Epoch: 1 Batch: 9472/20099 (47.13%) Loss: 2.183056 LR: 0.00002360 [12:09:34] Epoch: 1 Batch: 9473/20099 (47.13%) Loss: 2.243507 LR: 0.00002360 [12:09:36] Epoch: 1 Batch: 9474/20099 (47.14%) Loss: 2.210069 LR: 0.00002358 [12:09:37] Epoch: 1 Batch: 9475/20099 (47.14%) Loss: 1.780221 LR: 0.00002358 [12:09:39] Epoch: 1 Batch: 9476/20099 (47.15%) Loss: 1.994505 LR: 0.00002358 [12:09:41] Epoch: 1 Batch: 9477/20099 (47.15%) Loss: 1.933585 LR: 0.00002358 [12:09:43] Epoch: 1 Batch: 9478/20099 (47.16%) Loss: 1.911352 LR: 0.00002358 [12:09:44] Epoch: 1 Batch: 9479/20099 (47.16%) Loss: 2.143031 LR: 0.00002358 [12:09:46] Epoch: 1 Batch: 9480/20099 (47.17%) Loss: 2.011117 LR: 0.00002358 [12:09:48] Epoch: 1 Batch: 9481/20099 (47.17%) Loss: 2.197650 LR: 0.00002357 [12:09:50] Epoch: 1 Batch: 9482/20099 (47.18%) Loss: 2.188788 LR: 0.00002357 [12:09:52] Epoch: 1 Batch: 9483/20099 (47.18%) Loss: 2.122593 LR: 0.00002357 [12:09:53] Epoch: 1 Batch: 9484/20099 (47.19%) Loss: 2.077100 LR: 0.00002357 [12:09:55] Epoch: 1 Batch: 9485/20099 (47.19%) Loss: 2.461115 LR: 0.00002357 [12:09:57] Epoch: 1 Batch: 9486/20099 (47.20%) Loss: 1.677493 LR: 0.00002357 [12:09:59] Epoch: 1 Batch: 9487/20099 (47.20%) Loss: 2.160422 LR: 0.00002357 [12:10:00] Epoch: 1 Batch: 9488/20099 (47.21%) Loss: 2.077689 LR: 0.00002355 [12:10:02] Epoch: 1 Batch: 9489/20099 (47.21%) Loss: 2.167521 LR: 0.00002355 [12:10:04] Epoch: 1 Batch: 9490/20099 (47.22%) Loss: 2.092480 LR: 0.00002355 [12:10:06] Epoch: 1 Batch: 9491/20099 (47.22%) Loss: 
2.101011 LR: 0.00002355 [12:10:07] Epoch: 1 Batch: 9492/20099 (47.23%) Loss: 2.152921 LR: 0.00002355 [12:10:09] Epoch: 1 Batch: 9493/20099 (47.23%) Loss: 1.897252 LR: 0.00002355 [12:10:11] Epoch: 1 Batch: 9494/20099 (47.24%) Loss: 2.191504 LR: 0.00002355 [12:10:13] Epoch: 1 Batch: 9495/20099 (47.24%) Loss: 2.150588 LR: 0.00002354 [12:10:14] Epoch: 1 Batch: 9496/20099 (47.25%) Loss: 2.163707 LR: 0.00002354 [12:10:16] Epoch: 1 Batch: 9497/20099 (47.25%) Loss: 2.080167 LR: 0.00002354 [12:10:18] Epoch: 1 Batch: 9498/20099 (47.26%) Loss: 2.012486 LR: 0.00002354 [12:10:20] Epoch: 1 Batch: 9499/20099 (47.26%) Loss: 2.247128 LR: 0.00002354 [12:10:22] >> Evaluating batch 0 [12:10:23] >> Evaluating batch 1 [12:10:24] >> Evaluating batch 2 [12:10:25] >> Evaluating batch 3 [12:10:26] >> Evaluating batch 4 [12:10:27] >> Evaluating batch 5 [12:10:28] >> Evaluating batch 6 [12:10:29] >> Evaluating batch 7 [12:10:30] >> Evaluating batch 8 [12:10:31] >> Evaluating batch 9 [12:10:32] >> Evaluating batch 10 [12:10:33] >> Evaluating batch 11 [12:10:34] >> Evaluating batch 12 [12:10:35] >> Evaluating batch 13 [12:10:36] >> Evaluating batch 14 [12:10:36] >> Evaluating batch 15 [12:10:37] >> Evaluating batch 16 [12:10:38] Epoch: 1 Step: 9500/20099 Evaluation: [12:10:38] Avg Loss Since Last Eval: 2.1206 Val Loss: 2.1674 Validation loss delta: -0.0036 Perplexity: 8.7354 LR: 0.00002354 [12:10:42] >> Checkpoint saved: epoch1_step9500, size: 0.1693 GB [12:10:42] Epoch: 1 Batch: 9500/20099 (47.27%) Loss: 2.148778 LR: 0.00002354 [12:10:43] Epoch: 1 Batch: 9501/20099 (47.27%) Loss: 2.160522 LR: 0.00002354 [12:10:45] Epoch: 1 Batch: 9502/20099 (47.28%) Loss: 2.229930 LR: 0.00002353 [12:10:47] Epoch: 1 Batch: 9503/20099 (47.28%) Loss: 1.917577 LR: 0.00002353 [12:10:49] Epoch: 1 Batch: 9504/20099 (47.29%) Loss: 2.067605 LR: 0.00002353 [12:10:50] Epoch: 1 Batch: 9505/20099 (47.29%) Loss: 1.998001 LR: 0.00002353 [12:10:52] Epoch: 1 Batch: 9506/20099 (47.30%) Loss: 2.256693 LR: 0.00002353 
[12:10:54] Epoch: 1 Batch: 9507/20099 (47.30%) Loss: 2.133910 LR: 0.00002353 [12:10:56] Epoch: 1 Batch: 9508/20099 (47.31%) Loss: 2.324447 LR: 0.00002353 [12:10:57] Epoch: 1 Batch: 9509/20099 (47.31%) Loss: 1.982158 LR: 0.00002351 [12:10:59] Epoch: 1 Batch: 9510/20099 (47.32%) Loss: 2.174435 LR: 0.00002351 [12:11:01] Epoch: 1 Batch: 9511/20099 (47.32%) Loss: 2.153279 LR: 0.00002351 [12:11:03] Epoch: 1 Batch: 9512/20099 (47.33%) Loss: 1.974454 LR: 0.00002351 [12:11:05] Epoch: 1 Batch: 9513/20099 (47.33%) Loss: 1.937890 LR: 0.00002351 [12:11:06] Epoch: 1 Batch: 9514/20099 (47.34%) Loss: 2.187612 LR: 0.00002351 [12:11:08] Epoch: 1 Batch: 9515/20099 (47.34%) Loss: 1.954257 LR: 0.00002351 [12:11:10] Epoch: 1 Batch: 9516/20099 (47.35%) Loss: 2.273385 LR: 0.00002350 [12:11:12] Epoch: 1 Batch: 9517/20099 (47.35%) Loss: 2.387912 LR: 0.00002350 [12:11:14] Epoch: 1 Batch: 9518/20099 (47.36%) Loss: 1.922419 LR: 0.00002350 [12:11:15] Epoch: 1 Batch: 9519/20099 (47.36%) Loss: 2.176181 LR: 0.00002350 [12:11:17] Epoch: 1 Batch: 9520/20099 (47.37%) Loss: 2.347123 LR: 0.00002350 [12:11:19] Epoch: 1 Batch: 9521/20099 (47.37%) Loss: 2.261538 LR: 0.00002350 [12:11:21] Epoch: 1 Batch: 9522/20099 (47.38%) Loss: 1.714682 LR: 0.00002350 [12:11:23] Epoch: 1 Batch: 9523/20099 (47.38%) Loss: 1.612215 LR: 0.00002348 [12:11:24] Epoch: 1 Batch: 9524/20099 (47.39%) Loss: 1.889985 LR: 0.00002348 [12:11:26] Epoch: 1 Batch: 9525/20099 (47.39%) Loss: 2.145338 LR: 0.00002348 [12:11:28] Epoch: 1 Batch: 9526/20099 (47.40%) Loss: 2.152972 LR: 0.00002348 [12:11:30] Epoch: 1 Batch: 9527/20099 (47.40%) Loss: 1.798252 LR: 0.00002348 [12:11:31] Epoch: 1 Batch: 9528/20099 (47.41%) Loss: 2.168269 LR: 0.00002348 [12:11:33] Epoch: 1 Batch: 9529/20099 (47.41%) Loss: 1.879148 LR: 0.00002348 [12:11:35] Epoch: 1 Batch: 9530/20099 (47.42%) Loss: 1.927739 LR: 0.00002347 [12:11:37] Epoch: 1 Batch: 9531/20099 (47.42%) Loss: 2.188153 LR: 0.00002347 [12:11:39] Epoch: 1 Batch: 9532/20099 (47.43%) Loss: 2.266184 LR: 
0.00002347 [12:11:40] Epoch: 1 Batch: 9533/20099 (47.43%) Loss: 2.034377 LR: 0.00002347 [12:11:42] Epoch: 1 Batch: 9534/20099 (47.44%) Loss: 2.031463 LR: 0.00002347 [12:11:44] Epoch: 1 Batch: 9535/20099 (47.44%) Loss: 2.222976 LR: 0.00002347 [12:11:46] Epoch: 1 Batch: 9536/20099 (47.45%) Loss: 1.937987 LR: 0.00002347 [12:11:47] Epoch: 1 Batch: 9537/20099 (47.45%) Loss: 2.139163 LR: 0.00002345 [12:11:49] Epoch: 1 Batch: 9538/20099 (47.46%) Loss: 2.161989 LR: 0.00002345 [12:11:51] Epoch: 1 Batch: 9539/20099 (47.46%) Loss: 1.978923 LR: 0.00002345 [12:11:53] Epoch: 1 Batch: 9540/20099 (47.47%) Loss: 2.172172 LR: 0.00002345 [12:11:54] Epoch: 1 Batch: 9541/20099 (47.47%) Loss: 2.032835 LR: 0.00002345 [12:11:56] Epoch: 1 Batch: 9542/20099 (47.47%) Loss: 2.350900 LR: 0.00002345 [12:11:58] Epoch: 1 Batch: 9543/20099 (47.48%) Loss: 1.959613 LR: 0.00002345 [12:12:00] Epoch: 1 Batch: 9544/20099 (47.48%) Loss: 2.127566 LR: 0.00002344 [12:12:02] Epoch: 1 Batch: 9545/20099 (47.49%) Loss: 1.962626 LR: 0.00002344 [12:12:03] Epoch: 1 Batch: 9546/20099 (47.49%) Loss: 2.079700 LR: 0.00002344 [12:12:05] Epoch: 1 Batch: 9547/20099 (47.50%) Loss: 1.808757 LR: 0.00002344 [12:12:07] Epoch: 1 Batch: 9548/20099 (47.50%) Loss: 1.974911 LR: 0.00002344 [12:12:09] Epoch: 1 Batch: 9549/20099 (47.51%) Loss: 1.945500 LR: 0.00002344 [12:12:10] Epoch: 1 Batch: 9550/20099 (47.51%) Loss: 2.254160 LR: 0.00002344 [12:12:12] Epoch: 1 Batch: 9551/20099 (47.52%) Loss: 2.463157 LR: 0.00002342 [12:12:14] Epoch: 1 Batch: 9552/20099 (47.52%) Loss: 2.336564 LR: 0.00002342 [12:12:16] Epoch: 1 Batch: 9553/20099 (47.53%) Loss: 2.147993 LR: 0.00002342 [12:12:17] Epoch: 1 Batch: 9554/20099 (47.53%) Loss: 2.160455 LR: 0.00002342 [12:12:19] Epoch: 1 Batch: 9555/20099 (47.54%) Loss: 1.993869 LR: 0.00002342 [12:12:21] Epoch: 1 Batch: 9556/20099 (47.54%) Loss: 2.200224 LR: 0.00002342 [12:12:23] Epoch: 1 Batch: 9557/20099 (47.55%) Loss: 2.165503 LR: 0.00002342 [12:12:24] Epoch: 1 Batch: 9558/20099 (47.55%) Loss: 1.916026 
LR: 0.00002341 [12:12:26] Epoch: 1 Batch: 9559/20099 (47.56%) Loss: 1.905039 LR: 0.00002341 [12:12:28] Epoch: 1 Batch: 9560/20099 (47.56%) Loss: 2.041066 LR: 0.00002341 [12:12:30] Epoch: 1 Batch: 9561/20099 (47.57%) Loss: 2.348471 LR: 0.00002341 [12:12:32] Epoch: 1 Batch: 9562/20099 (47.57%) Loss: 1.924349 LR: 0.00002341 [12:12:33] Epoch: 1 Batch: 9563/20099 (47.58%) Loss: 1.907700 LR: 0.00002341 [12:12:35] Epoch: 1 Batch: 9564/20099 (47.58%) Loss: 1.937733 LR: 0.00002341 [12:12:37] Epoch: 1 Batch: 9565/20099 (47.59%) Loss: 1.994278 LR: 0.00002339 [12:12:39] Epoch: 1 Batch: 9566/20099 (47.59%) Loss: 2.114252 LR: 0.00002339 [12:12:40] Epoch: 1 Batch: 9567/20099 (47.60%) Loss: 2.197126 LR: 0.00002339 [12:12:42] Epoch: 1 Batch: 9568/20099 (47.60%) Loss: 2.332936 LR: 0.00002339 [12:12:44] Epoch: 1 Batch: 9569/20099 (47.61%) Loss: 2.410905 LR: 0.00002339 [12:12:46] Epoch: 1 Batch: 9570/20099 (47.61%) Loss: 2.191952 LR: 0.00002339 [12:12:48] Epoch: 1 Batch: 9571/20099 (47.62%) Loss: 1.933559 LR: 0.00002339 [12:12:49] Epoch: 1 Batch: 9572/20099 (47.62%) Loss: 2.111486 LR: 0.00002338 [12:12:51] Epoch: 1 Batch: 9573/20099 (47.63%) Loss: 2.042972 LR: 0.00002338 [12:12:53] Epoch: 1 Batch: 9574/20099 (47.63%) Loss: 2.065355 LR: 0.00002338 [12:12:55] Epoch: 1 Batch: 9575/20099 (47.64%) Loss: 2.442487 LR: 0.00002338 [12:12:56] Epoch: 1 Batch: 9576/20099 (47.64%) Loss: 2.338300 LR: 0.00002338 [12:12:58] Epoch: 1 Batch: 9577/20099 (47.65%) Loss: 2.454665 LR: 0.00002338 [12:13:00] Epoch: 1 Batch: 9578/20099 (47.65%) Loss: 1.964149 LR: 0.00002338 [12:13:02] Epoch: 1 Batch: 9579/20099 (47.66%) Loss: 1.956479 LR: 0.00002337 [12:13:04] Epoch: 1 Batch: 9580/20099 (47.66%) Loss: 1.883209 LR: 0.00002337 [12:13:05] Epoch: 1 Batch: 9581/20099 (47.67%) Loss: 2.015712 LR: 0.00002337 [12:13:07] Epoch: 1 Batch: 9582/20099 (47.67%) Loss: 2.117326 LR: 0.00002337 [12:13:09] Epoch: 1 Batch: 9583/20099 (47.68%) Loss: 2.064885 LR: 0.00002337 [12:13:11] Epoch: 1 Batch: 9584/20099 (47.68%) Loss: 
2.009439 LR: 0.00002337 [12:13:13] Epoch: 1 Batch: 9585/20099 (47.69%) Loss: 2.061685 LR: 0.00002337 [12:13:14] Epoch: 1 Batch: 9586/20099 (47.69%) Loss: 2.136891 LR: 0.00002335 [12:13:16] Epoch: 1 Batch: 9587/20099 (47.70%) Loss: 1.811215 LR: 0.00002335 [12:13:18] Epoch: 1 Batch: 9588/20099 (47.70%) Loss: 2.181927 LR: 0.00002335 [12:13:20] Epoch: 1 Batch: 9589/20099 (47.71%) Loss: 1.950021 LR: 0.00002335 [12:13:21] Epoch: 1 Batch: 9590/20099 (47.71%) Loss: 2.129468 LR: 0.00002335 [12:13:23] Epoch: 1 Batch: 9591/20099 (47.72%) Loss: 2.275191 LR: 0.00002335 [12:13:25] Epoch: 1 Batch: 9592/20099 (47.72%) Loss: 2.161346 LR: 0.00002335 [12:13:27] Epoch: 1 Batch: 9593/20099 (47.73%) Loss: 2.101499 LR: 0.00002334 [12:13:29] Epoch: 1 Batch: 9594/20099 (47.73%) Loss: 2.289225 LR: 0.00002334 [12:13:30] Epoch: 1 Batch: 9595/20099 (47.74%) Loss: 2.224034 LR: 0.00002334 [12:13:32] Epoch: 1 Batch: 9596/20099 (47.74%) Loss: 1.720661 LR: 0.00002334 [12:13:34] Epoch: 1 Batch: 9597/20099 (47.75%) Loss: 2.065393 LR: 0.00002334 [12:13:36] Epoch: 1 Batch: 9598/20099 (47.75%) Loss: 2.297238 LR: 0.00002334 [12:13:37] Epoch: 1 Batch: 9599/20099 (47.76%) Loss: 1.973967 LR: 0.00002334 [12:13:43] >> Cleaned up old temp checkpoint: epoch1_step7600 [12:13:43] >> Temp checkpoint saved: epoch1_step9600, size: 0.1693 GB [12:13:43] Epoch: 1 Batch: 9600/20099 (47.76%) Loss: 2.267563 LR: 0.00002332 [12:13:45] Epoch: 1 Batch: 9601/20099 (47.77%) Loss: 2.149307 LR: 0.00002332 [12:13:46] Epoch: 1 Batch: 9602/20099 (47.77%) Loss: 1.933253 LR: 0.00002332 [12:13:48] Epoch: 1 Batch: 9603/20099 (47.78%) Loss: 2.150386 LR: 0.00002332 [12:13:50] Epoch: 1 Batch: 9604/20099 (47.78%) Loss: 2.231473 LR: 0.00002332 [12:13:52] Epoch: 1 Batch: 9605/20099 (47.79%) Loss: 2.124032 LR: 0.00002332 [12:13:53] Epoch: 1 Batch: 9606/20099 (47.79%) Loss: 1.964501 LR: 0.00002332 [12:13:55] Epoch: 1 Batch: 9607/20099 (47.80%) Loss: 1.723553 LR: 0.00002331 [12:13:57] Epoch: 1 Batch: 9608/20099 (47.80%) Loss: 2.318446 LR: 
0.00002331 [12:13:59] Epoch: 1 Batch: 9609/20099 (47.81%) Loss: 1.971294 LR: 0.00002331 [12:14:01] Epoch: 1 Batch: 9610/20099 (47.81%) Loss: 1.709712 LR: 0.00002331 [12:14:02] Epoch: 1 Batch: 9611/20099 (47.82%) Loss: 2.116408 LR: 0.00002331 [12:14:04] Epoch: 1 Batch: 9612/20099 (47.82%) Loss: 2.173805 LR: 0.00002331 [12:14:06] Epoch: 1 Batch: 9613/20099 (47.83%) Loss: 2.080377 LR: 0.00002331 [12:14:08] Epoch: 1 Batch: 9614/20099 (47.83%) Loss: 2.019376 LR: 0.00002329 [12:14:10] Epoch: 1 Batch: 9615/20099 (47.84%) Loss: 1.795697 LR: 0.00002329 [12:14:11] Epoch: 1 Batch: 9616/20099 (47.84%) Loss: 2.285605 LR: 0.00002329 [12:14:13] Epoch: 1 Batch: 9617/20099 (47.85%) Loss: 2.366087 LR: 0.00002329 [12:14:15] Epoch: 1 Batch: 9618/20099 (47.85%) Loss: 2.025467 LR: 0.00002329 [12:14:17] Epoch: 1 Batch: 9619/20099 (47.86%) Loss: 1.881422 LR: 0.00002329 [12:14:19] Epoch: 1 Batch: 9620/20099 (47.86%) Loss: 2.139440 LR: 0.00002329 [12:14:20] Epoch: 1 Batch: 9621/20099 (47.87%) Loss: 2.177006 LR: 0.00002328 [12:14:22] Epoch: 1 Batch: 9622/20099 (47.87%) Loss: 2.354422 LR: 0.00002328 [12:14:24] Epoch: 1 Batch: 9623/20099 (47.88%) Loss: 2.086661 LR: 0.00002328 [12:14:26] Epoch: 1 Batch: 9624/20099 (47.88%) Loss: 2.100587 LR: 0.00002328 [12:14:27] Epoch: 1 Batch: 9625/20099 (47.89%) Loss: 2.220459 LR: 0.00002328 [12:14:29] Epoch: 1 Batch: 9626/20099 (47.89%) Loss: 2.049002 LR: 0.00002328 [12:14:31] Epoch: 1 Batch: 9627/20099 (47.90%) Loss: 2.007426 LR: 0.00002328 [12:14:33] Epoch: 1 Batch: 9628/20099 (47.90%) Loss: 2.145896 LR: 0.00002326 [12:14:35] Epoch: 1 Batch: 9629/20099 (47.91%) Loss: 2.151596 LR: 0.00002326 [12:14:36] Epoch: 1 Batch: 9630/20099 (47.91%) Loss: 2.106221 LR: 0.00002326 [12:14:38] Epoch: 1 Batch: 9631/20099 (47.92%) Loss: 2.197315 LR: 0.00002326 [12:14:40] Epoch: 1 Batch: 9632/20099 (47.92%) Loss: 1.722337 LR: 0.00002326 [12:14:42] Epoch: 1 Batch: 9633/20099 (47.93%) Loss: 2.180326 LR: 0.00002326 [12:14:43] Epoch: 1 Batch: 9634/20099 (47.93%) Loss: 2.322172 
LR: 0.00002326 [12:14:45] Epoch: 1 Batch: 9635/20099 (47.94%) Loss: 2.113185 LR: 0.00002325 [12:14:47] Epoch: 1 Batch: 9636/20099 (47.94%) Loss: 2.203243 LR: 0.00002325 [12:14:49] Epoch: 1 Batch: 9637/20099 (47.95%) Loss: 2.201151 LR: 0.00002325 [12:14:50] Epoch: 1 Batch: 9638/20099 (47.95%) Loss: 2.063866 LR: 0.00002325 [12:14:52] Epoch: 1 Batch: 9639/20099 (47.96%) Loss: 2.158506 LR: 0.00002325 [12:14:54] Epoch: 1 Batch: 9640/20099 (47.96%) Loss: 2.308621 LR: 0.00002325 [12:14:56] Epoch: 1 Batch: 9641/20099 (47.97%) Loss: 2.239919 LR: 0.00002325 [12:14:58] Epoch: 1 Batch: 9642/20099 (47.97%) Loss: 2.150860 LR: 0.00002323 [12:14:59] Epoch: 1 Batch: 9643/20099 (47.98%) Loss: 2.101541 LR: 0.00002323 [12:15:01] Epoch: 1 Batch: 9644/20099 (47.98%) Loss: 2.161454 LR: 0.00002323 [12:15:03] Epoch: 1 Batch: 9645/20099 (47.99%) Loss: 2.299078 LR: 0.00002323 [12:15:05] Epoch: 1 Batch: 9646/20099 (47.99%) Loss: 2.198099 LR: 0.00002323 [12:15:06] Epoch: 1 Batch: 9647/20099 (48.00%) Loss: 1.828958 LR: 0.00002323 [12:15:08] Epoch: 1 Batch: 9648/20099 (48.00%) Loss: 2.449581 LR: 0.00002323 [12:15:10] Epoch: 1 Batch: 9649/20099 (48.01%) Loss: 1.972852 LR: 0.00002322 [12:15:12] Epoch: 1 Batch: 9650/20099 (48.01%) Loss: 2.285273 LR: 0.00002322 [12:15:13] Epoch: 1 Batch: 9651/20099 (48.02%) Loss: 2.017434 LR: 0.00002322 [12:15:15] Epoch: 1 Batch: 9652/20099 (48.02%) Loss: 1.872800 LR: 0.00002322 [12:15:17] Epoch: 1 Batch: 9653/20099 (48.03%) Loss: 2.480857 LR: 0.00002322 [12:15:19] Epoch: 1 Batch: 9654/20099 (48.03%) Loss: 2.159918 LR: 0.00002322 [12:15:20] Epoch: 1 Batch: 9655/20099 (48.04%) Loss: 2.037303 LR: 0.00002322 [12:15:22] Epoch: 1 Batch: 9656/20099 (48.04%) Loss: 2.211561 LR: 0.00002321 [12:15:24] Epoch: 1 Batch: 9657/20099 (48.05%) Loss: 2.079923 LR: 0.00002321 [12:15:26] Epoch: 1 Batch: 9658/20099 (48.05%) Loss: 2.046684 LR: 0.00002321 [12:15:28] Epoch: 1 Batch: 9659/20099 (48.06%) Loss: 2.084955 LR: 0.00002321 [12:15:29] Epoch: 1 Batch: 9660/20099 (48.06%) Loss: 
2.267996 LR: 0.00002321 [12:15:31] Epoch: 1 Batch: 9661/20099 (48.07%) Loss: 2.004087 LR: 0.00002321 [12:15:33] Epoch: 1 Batch: 9662/20099 (48.07%) Loss: 2.089333 LR: 0.00002321 [12:15:35] Epoch: 1 Batch: 9663/20099 (48.08%) Loss: 1.987405 LR: 0.00002319 [12:15:36] Epoch: 1 Batch: 9664/20099 (48.08%) Loss: 2.298083 LR: 0.00002319 [12:15:38] Epoch: 1 Batch: 9665/20099 (48.09%) Loss: 1.799335 LR: 0.00002319 [12:15:40] Epoch: 1 Batch: 9666/20099 (48.09%) Loss: 2.069806 LR: 0.00002319 [12:15:42] Epoch: 1 Batch: 9667/20099 (48.10%) Loss: 1.828833 LR: 0.00002319 [12:15:44] Epoch: 1 Batch: 9668/20099 (48.10%) Loss: 1.984497 LR: 0.00002319 [12:15:45] Epoch: 1 Batch: 9669/20099 (48.11%) Loss: 2.270133 LR: 0.00002319 [12:15:47] Epoch: 1 Batch: 9670/20099 (48.11%) Loss: 2.070780 LR: 0.00002318 [12:15:49] Epoch: 1 Batch: 9671/20099 (48.12%) Loss: 2.201030 LR: 0.00002318 [12:15:51] Epoch: 1 Batch: 9672/20099 (48.12%) Loss: 2.148485 LR: 0.00002318 [12:15:53] Epoch: 1 Batch: 9673/20099 (48.13%) Loss: 1.987496 LR: 0.00002318 [12:15:54] Epoch: 1 Batch: 9674/20099 (48.13%) Loss: 1.993322 LR: 0.00002318 [12:15:56] Epoch: 1 Batch: 9675/20099 (48.14%) Loss: 2.198798 LR: 0.00002318 [12:15:58] Epoch: 1 Batch: 9676/20099 (48.14%) Loss: 2.220004 LR: 0.00002318 [12:16:00] Epoch: 1 Batch: 9677/20099 (48.15%) Loss: 2.133226 LR: 0.00002316 [12:16:01] Epoch: 1 Batch: 9678/20099 (48.15%) Loss: 1.875969 LR: 0.00002316 [12:16:03] Epoch: 1 Batch: 9679/20099 (48.16%) Loss: 1.812950 LR: 0.00002316 [12:16:05] Epoch: 1 Batch: 9680/20099 (48.16%) Loss: 2.078358 LR: 0.00002316 [12:16:07] Epoch: 1 Batch: 9681/20099 (48.17%) Loss: 2.231493 LR: 0.00002316 [12:16:09] Epoch: 1 Batch: 9682/20099 (48.17%) Loss: 1.975408 LR: 0.00002316 [12:16:10] Epoch: 1 Batch: 9683/20099 (48.18%) Loss: 2.030170 LR: 0.00002316 [12:16:12] Epoch: 1 Batch: 9684/20099 (48.18%) Loss: 2.245378 LR: 0.00002315 [12:16:14] Epoch: 1 Batch: 9685/20099 (48.19%) Loss: 1.982608 LR: 0.00002315 [12:16:16] Epoch: 1 Batch: 9686/20099 (48.19%) 
Loss: 2.060401 LR: 0.00002315 [12:16:17] Epoch: 1 Batch: 9687/20099 (48.20%) Loss: 2.172085 LR: 0.00002315 [12:16:19] Epoch: 1 Batch: 9688/20099 (48.20%) Loss: 2.174625 LR: 0.00002315 [12:16:21] Epoch: 1 Batch: 9689/20099 (48.21%) Loss: 2.418902 LR: 0.00002315 [12:16:23] Epoch: 1 Batch: 9690/20099 (48.21%) Loss: 1.924290 LR: 0.00002315 [12:16:25] Epoch: 1 Batch: 9691/20099 (48.22%) Loss: 2.034927 LR: 0.00002313 [12:16:26] Epoch: 1 Batch: 9692/20099 (48.22%) Loss: 1.946664 LR: 0.00002313 [12:16:28] Epoch: 1 Batch: 9693/20099 (48.23%) Loss: 2.019085 LR: 0.00002313 [12:16:30] Epoch: 1 Batch: 9694/20099 (48.23%) Loss: 2.365286 LR: 0.00002313 [12:16:32] Epoch: 1 Batch: 9695/20099 (48.24%) Loss: 2.205871 LR: 0.00002313 [12:16:34] Epoch: 1 Batch: 9696/20099 (48.24%) Loss: 1.919348 LR: 0.00002313 [12:16:35] Epoch: 1 Batch: 9697/20099 (48.25%) Loss: 2.209333 LR: 0.00002313 [12:16:37] Epoch: 1 Batch: 9698/20099 (48.25%) Loss: 2.201395 LR: 0.00002312 [12:16:39] Epoch: 1 Batch: 9699/20099 (48.26%) Loss: 2.264983 LR: 0.00002312 [12:16:41] Epoch: 1 Batch: 9700/20099 (48.26%) Loss: 2.277491 LR: 0.00002312 [12:16:42] Epoch: 1 Batch: 9701/20099 (48.27%) Loss: 2.393465 LR: 0.00002312 [12:16:44] Epoch: 1 Batch: 9702/20099 (48.27%) Loss: 2.392360 LR: 0.00002312 [12:16:46] Epoch: 1 Batch: 9703/20099 (48.28%) Loss: 1.981941 LR: 0.00002312 [12:16:48] Epoch: 1 Batch: 9704/20099 (48.28%) Loss: 1.734196 LR: 0.00002312 [12:16:50] Epoch: 1 Batch: 9705/20099 (48.29%) Loss: 2.170461 LR: 0.00002310 [12:16:51] Epoch: 1 Batch: 9706/20099 (48.29%) Loss: 2.171437 LR: 0.00002310 [12:16:53] Epoch: 1 Batch: 9707/20099 (48.30%) Loss: 2.106994 LR: 0.00002310 [12:16:55] Epoch: 1 Batch: 9708/20099 (48.30%) Loss: 2.176276 LR: 0.00002310 [12:16:57] Epoch: 1 Batch: 9709/20099 (48.31%) Loss: 2.345944 LR: 0.00002310 [12:16:58] Epoch: 1 Batch: 9710/20099 (48.31%) Loss: 2.146114 LR: 0.00002310 [12:17:00] Epoch: 1 Batch: 9711/20099 (48.32%) Loss: 1.729410 LR: 0.00002310 [12:17:02] Epoch: 1 Batch: 9712/20099 
(48.32%) Loss: 1.806854 LR: 0.00002309 [12:17:04] Epoch: 1 Batch: 9713/20099 (48.33%) Loss: 2.167802 LR: 0.00002309 [12:17:06] Epoch: 1 Batch: 9714/20099 (48.33%) Loss: 2.027606 LR: 0.00002309 [12:17:07] Epoch: 1 Batch: 9715/20099 (48.34%) Loss: 2.408265 LR: 0.00002309 [12:17:09] Epoch: 1 Batch: 9716/20099 (48.34%) Loss: 2.060499 LR: 0.00002309 [12:17:11] Epoch: 1 Batch: 9717/20099 (48.35%) Loss: 1.888999 LR: 0.00002309 [12:17:13] Epoch: 1 Batch: 9718/20099 (48.35%) Loss: 2.045839 LR: 0.00002309 [12:17:14] Epoch: 1 Batch: 9719/20099 (48.36%) Loss: 2.117521 LR: 0.00002307 [12:17:16] Epoch: 1 Batch: 9720/20099 (48.36%) Loss: 2.208103 LR: 0.00002307 [12:17:18] Epoch: 1 Batch: 9721/20099 (48.37%) Loss: 2.088445 LR: 0.00002307 [12:17:20] Epoch: 1 Batch: 9722/20099 (48.37%) Loss: 2.064506 LR: 0.00002307 [12:17:22] Epoch: 1 Batch: 9723/20099 (48.38%) Loss: 2.199306 LR: 0.00002307 [12:17:23] Epoch: 1 Batch: 9724/20099 (48.38%) Loss: 2.076916 LR: 0.00002307 [12:17:25] Epoch: 1 Batch: 9725/20099 (48.39%) Loss: 2.051095 LR: 0.00002307 [12:17:27] Epoch: 1 Batch: 9726/20099 (48.39%) Loss: 2.058751 LR: 0.00002306 [12:17:29] Epoch: 1 Batch: 9727/20099 (48.40%) Loss: 1.904768 LR: 0.00002306 [12:17:30] Epoch: 1 Batch: 9728/20099 (48.40%) Loss: 2.165364 LR: 0.00002306 [12:17:32] Epoch: 1 Batch: 9729/20099 (48.41%) Loss: 2.338597 LR: 0.00002306 [12:17:34] Epoch: 1 Batch: 9730/20099 (48.41%) Loss: 2.171343 LR: 0.00002306 [12:17:36] Epoch: 1 Batch: 9731/20099 (48.42%) Loss: 2.113727 LR: 0.00002306 [12:17:38] Epoch: 1 Batch: 9732/20099 (48.42%) Loss: 2.090168 LR: 0.00002306 [12:17:39] Epoch: 1 Batch: 9733/20099 (48.43%) Loss: 2.442111 LR: 0.00002304 [12:17:41] Epoch: 1 Batch: 9734/20099 (48.43%) Loss: 1.821194 LR: 0.00002304 [12:17:43] Epoch: 1 Batch: 9735/20099 (48.44%) Loss: 2.033279 LR: 0.00002304 [12:17:45] Epoch: 1 Batch: 9736/20099 (48.44%) Loss: 1.940520 LR: 0.00002304 [12:17:46] Epoch: 1 Batch: 9737/20099 (48.45%) Loss: 2.163819 LR: 0.00002304 [12:17:48] Epoch: 1 Batch: 
9738/20099 (48.45%) Loss: 1.867843 LR: 0.00002304 [12:17:50] Epoch: 1 Batch: 9739/20099 (48.46%) Loss: 2.108232 LR: 0.00002304 [12:17:52] Epoch: 1 Batch: 9740/20099 (48.46%) Loss: 2.200374 LR: 0.00002303 [12:17:54] Epoch: 1 Batch: 9741/20099 (48.47%) Loss: 1.966208 LR: 0.00002303 [12:17:55] Epoch: 1 Batch: 9742/20099 (48.47%) Loss: 2.421994 LR: 0.00002303 [12:17:57] Epoch: 1 Batch: 9743/20099 (48.48%) Loss: 2.133201 LR: 0.00002303 [12:17:59] Epoch: 1 Batch: 9744/20099 (48.48%) Loss: 1.802374 LR: 0.00002303 [12:18:01] Epoch: 1 Batch: 9745/20099 (48.48%) Loss: 2.092486 LR: 0.00002303 [12:18:02] Epoch: 1 Batch: 9746/20099 (48.49%) Loss: 2.217163 LR: 0.00002303 [12:18:04] Epoch: 1 Batch: 9747/20099 (48.49%) Loss: 1.827517 LR: 0.00002301 [12:18:06] Epoch: 1 Batch: 9748/20099 (48.50%) Loss: 1.746747 LR: 0.00002301 [12:18:08] Epoch: 1 Batch: 9749/20099 (48.50%) Loss: 2.012774 LR: 0.00002301 [12:18:10] Epoch: 1 Batch: 9750/20099 (48.51%) Loss: 2.018434 LR: 0.00002301 [12:18:11] Epoch: 1 Batch: 9751/20099 (48.51%) Loss: 2.181140 LR: 0.00002301 [12:18:13] Epoch: 1 Batch: 9752/20099 (48.52%) Loss: 2.155650 LR: 0.00002301 [12:18:15] Epoch: 1 Batch: 9753/20099 (48.52%) Loss: 2.316419 LR: 0.00002301 [12:18:17] Epoch: 1 Batch: 9754/20099 (48.53%) Loss: 2.318587 LR: 0.00002300 [12:18:18] Epoch: 1 Batch: 9755/20099 (48.53%) Loss: 2.075574 LR: 0.00002300 [12:18:20] Epoch: 1 Batch: 9756/20099 (48.54%) Loss: 2.383450 LR: 0.00002300 [12:18:22] Epoch: 1 Batch: 9757/20099 (48.54%) Loss: 2.413986 LR: 0.00002300 [12:18:24] Epoch: 1 Batch: 9758/20099 (48.55%) Loss: 2.051620 LR: 0.00002300 [12:18:26] Epoch: 1 Batch: 9759/20099 (48.55%) Loss: 2.214840 LR: 0.00002300 [12:18:27] Epoch: 1 Batch: 9760/20099 (48.56%) Loss: 2.176710 LR: 0.00002300 [12:18:29] Epoch: 1 Batch: 9761/20099 (48.56%) Loss: 2.137896 LR: 0.00002298 [12:18:31] Epoch: 1 Batch: 9762/20099 (48.57%) Loss: 2.158567 LR: 0.00002298 [12:18:33] Epoch: 1 Batch: 9763/20099 (48.57%) Loss: 2.033225 LR: 0.00002298 [12:18:34] Epoch: 1 
Batch: 9764/20099 (48.58%) Loss: 1.970266 LR: 0.00002298 [12:18:36] Epoch: 1 Batch: 9765/20099 (48.58%) Loss: 2.411299 LR: 0.00002298 [12:18:38] Epoch: 1 Batch: 9766/20099 (48.59%) Loss: 2.049512 LR: 0.00002298 [12:18:40] Epoch: 1 Batch: 9767/20099 (48.59%) Loss: 2.095679 LR: 0.00002298 [12:18:42] Epoch: 1 Batch: 9768/20099 (48.60%) Loss: 1.835089 LR: 0.00002297 [12:18:43] Epoch: 1 Batch: 9769/20099 (48.60%) Loss: 2.011151 LR: 0.00002297 [12:18:45] Epoch: 1 Batch: 9770/20099 (48.61%) Loss: 2.147013 LR: 0.00002297 [12:18:47] Epoch: 1 Batch: 9771/20099 (48.61%) Loss: 2.064754 LR: 0.00002297 [12:18:49] Epoch: 1 Batch: 9772/20099 (48.62%) Loss: 2.096372 LR: 0.00002297 [12:18:50] Epoch: 1 Batch: 9773/20099 (48.62%) Loss: 2.115852 LR: 0.00002297 [12:18:52] Epoch: 1 Batch: 9774/20099 (48.63%) Loss: 2.359218 LR: 0.00002297 [12:18:54] Epoch: 1 Batch: 9775/20099 (48.63%) Loss: 2.321289 LR: 0.00002296 [12:18:56] Epoch: 1 Batch: 9776/20099 (48.64%) Loss: 2.155989 LR: 0.00002296 [12:18:58] Epoch: 1 Batch: 9777/20099 (48.64%) Loss: 2.083322 LR: 0.00002296 [12:18:59] Epoch: 1 Batch: 9778/20099 (48.65%) Loss: 1.921148 LR: 0.00002296 [12:19:01] Epoch: 1 Batch: 9779/20099 (48.65%) Loss: 2.186870 LR: 0.00002296 [12:19:03] Epoch: 1 Batch: 9780/20099 (48.66%) Loss: 1.827183 LR: 0.00002296 [12:19:05] Epoch: 1 Batch: 9781/20099 (48.66%) Loss: 2.243311 LR: 0.00002296 [12:19:07] Epoch: 1 Batch: 9782/20099 (48.67%) Loss: 2.056193 LR: 0.00002294 [12:19:08] Epoch: 1 Batch: 9783/20099 (48.67%) Loss: 2.430165 LR: 0.00002294 [12:19:10] Epoch: 1 Batch: 9784/20099 (48.68%) Loss: 2.256974 LR: 0.00002294 [12:19:12] Epoch: 1 Batch: 9785/20099 (48.68%) Loss: 2.357994 LR: 0.00002294 [12:19:14] Epoch: 1 Batch: 9786/20099 (48.69%) Loss: 2.115239 LR: 0.00002294 [12:19:15] Epoch: 1 Batch: 9787/20099 (48.69%) Loss: 1.998582 LR: 0.00002294 [12:19:17] Epoch: 1 Batch: 9788/20099 (48.70%) Loss: 2.065826 LR: 0.00002294 [12:19:19] Epoch: 1 Batch: 9789/20099 (48.70%) Loss: 1.978796 LR: 0.00002293 [12:19:21] Epoch: 
1 Batch: 9790/20099 (48.71%) Loss: 2.354888 LR: 0.00002293 [12:19:23] Epoch: 1 Batch: 9791/20099 (48.71%) Loss: 2.205518 LR: 0.00002293 [12:19:24] Epoch: 1 Batch: 9792/20099 (48.72%) Loss: 2.107474 LR: 0.00002293 [12:19:26] Epoch: 1 Batch: 9793/20099 (48.72%) Loss: 2.061578 LR: 0.00002293 [12:19:28] Epoch: 1 Batch: 9794/20099 (48.73%) Loss: 2.280634 LR: 0.00002293 [12:19:30] Epoch: 1 Batch: 9795/20099 (48.73%) Loss: 1.974273 LR: 0.00002293 [12:19:31] Epoch: 1 Batch: 9796/20099 (48.74%) Loss: 2.213149 LR: 0.00002291 [12:19:33] Epoch: 1 Batch: 9797/20099 (48.74%) Loss: 1.736914 LR: 0.00002291 [12:19:35] Epoch: 1 Batch: 9798/20099 (48.75%) Loss: 2.270214 LR: 0.00002291 [12:19:37] Epoch: 1 Batch: 9799/20099 (48.75%) Loss: 1.936389 LR: 0.00002291 [12:19:42] >> Cleaned up old temp checkpoint: epoch1_step7800 [12:19:42] >> Temp checkpoint saved: epoch1_step9800, size: 0.1693 GB [12:19:42] Epoch: 1 Batch: 9800/20099 (48.76%) Loss: 2.016015 LR: 0.00002291 [12:19:44] Epoch: 1 Batch: 9801/20099 (48.76%) Loss: 2.411626 LR: 0.00002291 [12:19:45] Epoch: 1 Batch: 9802/20099 (48.77%) Loss: 2.200298 LR: 0.00002291 [12:19:47] Epoch: 1 Batch: 9803/20099 (48.77%) Loss: 2.159188 LR: 0.00002290 [12:19:49] Epoch: 1 Batch: 9804/20099 (48.78%) Loss: 2.147984 LR: 0.00002290 [12:19:51] Epoch: 1 Batch: 9805/20099 (48.78%) Loss: 2.002583 LR: 0.00002290 [12:19:53] Epoch: 1 Batch: 9806/20099 (48.79%) Loss: 1.908840 LR: 0.00002290 [12:19:54] Epoch: 1 Batch: 9807/20099 (48.79%) Loss: 2.326114 LR: 0.00002290 [12:19:56] Epoch: 1 Batch: 9808/20099 (48.80%) Loss: 2.094183 LR: 0.00002290 [12:19:58] Epoch: 1 Batch: 9809/20099 (48.80%) Loss: 2.083468 LR: 0.00002290 [12:20:00] Epoch: 1 Batch: 9810/20099 (48.81%) Loss: 2.147694 LR: 0.00002288 [12:20:01] Epoch: 1 Batch: 9811/20099 (48.81%) Loss: 2.243286 LR: 0.00002288 [12:20:03] Epoch: 1 Batch: 9812/20099 (48.82%) Loss: 1.999692 LR: 0.00002288 [12:20:05] Epoch: 1 Batch: 9813/20099 (48.82%) Loss: 2.109024 LR: 0.00002288 [12:20:07] Epoch: 1 Batch: 9814/20099 
(48.83%) Loss: 1.773208 LR: 0.00002288 [12:20:09] Epoch: 1 Batch: 9815/20099 (48.83%) Loss: 2.136113 LR: 0.00002288 [12:20:10] Epoch: 1 Batch: 9816/20099 (48.84%) Loss: 2.292672 LR: 0.00002288 [12:20:12] Epoch: 1 Batch: 9817/20099 (48.84%) Loss: 1.942911 LR: 0.00002287 [12:20:14] Epoch: 1 Batch: 9818/20099 (48.85%) Loss: 2.249252 LR: 0.00002287 [12:20:16] Epoch: 1 Batch: 9819/20099 (48.85%) Loss: 1.976670 LR: 0.00002287 [12:20:17] Epoch: 1 Batch: 9820/20099 (48.86%) Loss: 2.062133 LR: 0.00002287 [12:20:19] Epoch: 1 Batch: 9821/20099 (48.86%) Loss: 1.932003 LR: 0.00002287 [12:20:21] Epoch: 1 Batch: 9822/20099 (48.87%) Loss: 1.942094 LR: 0.00002287 [12:20:23] Epoch: 1 Batch: 9823/20099 (48.87%) Loss: 2.287345 LR: 0.00002287 [12:20:25] Epoch: 1 Batch: 9824/20099 (48.88%) Loss: 2.110444 LR: 0.00002285 [12:20:26] Epoch: 1 Batch: 9825/20099 (48.88%) Loss: 2.512805 LR: 0.00002285 [12:20:28] Epoch: 1 Batch: 9826/20099 (48.89%) Loss: 2.175451 LR: 0.00002285 [12:20:30] Epoch: 1 Batch: 9827/20099 (48.89%) Loss: 2.239279 LR: 0.00002285 [12:20:32] Epoch: 1 Batch: 9828/20099 (48.90%) Loss: 2.071054 LR: 0.00002285 [12:20:34] Epoch: 1 Batch: 9829/20099 (48.90%) Loss: 1.947576 LR: 0.00002285 [12:20:35] Epoch: 1 Batch: 9830/20099 (48.91%) Loss: 2.226377 LR: 0.00002285 [12:20:37] Epoch: 1 Batch: 9831/20099 (48.91%) Loss: 2.072515 LR: 0.00002284 [12:20:39] Epoch: 1 Batch: 9832/20099 (48.92%) Loss: 2.272774 LR: 0.00002284 [12:20:41] Epoch: 1 Batch: 9833/20099 (48.92%) Loss: 1.917523 LR: 0.00002284 [12:20:42] Epoch: 1 Batch: 9834/20099 (48.93%) Loss: 2.144718 LR: 0.00002284 [12:20:44] Epoch: 1 Batch: 9835/20099 (48.93%) Loss: 2.122226 LR: 0.00002284 [12:20:46] Epoch: 1 Batch: 9836/20099 (48.94%) Loss: 1.940105 LR: 0.00002284 [12:20:48] Epoch: 1 Batch: 9837/20099 (48.94%) Loss: 2.038232 LR: 0.00002284 [12:20:50] Epoch: 1 Batch: 9838/20099 (48.95%) Loss: 2.260568 LR: 0.00002282 [12:20:51] Epoch: 1 Batch: 9839/20099 (48.95%) Loss: 2.095588 LR: 0.00002282 [12:20:53] Epoch: 1 Batch: 
9840/20099 (48.96%) Loss: 2.042872 LR: 0.00002282 [12:20:55] Epoch: 1 Batch: 9841/20099 (48.96%) Loss: 2.052678 LR: 0.00002282 [12:20:57] Epoch: 1 Batch: 9842/20099 (48.97%) Loss: 1.987655 LR: 0.00002282 [12:20:58] Epoch: 1 Batch: 9843/20099 (48.97%) Loss: 2.371362 LR: 0.00002282 [12:21:00] Epoch: 1 Batch: 9844/20099 (48.98%) Loss: 2.022165 LR: 0.00002282 [12:21:02] Epoch: 1 Batch: 9845/20099 (48.98%) Loss: 2.156317 LR: 0.00002281 [12:21:04] Epoch: 1 Batch: 9846/20099 (48.99%) Loss: 1.563977 LR: 0.00002281 [12:21:05] Epoch: 1 Batch: 9847/20099 (48.99%) Loss: 2.358469 LR: 0.00002281 [12:21:07] Epoch: 1 Batch: 9848/20099 (49.00%) Loss: 2.133670 LR: 0.00002281 [12:21:09] Epoch: 1 Batch: 9849/20099 (49.00%) Loss: 2.069508 LR: 0.00002281 [12:21:11] Epoch: 1 Batch: 9850/20099 (49.01%) Loss: 2.234110 LR: 0.00002281 [12:21:12] Epoch: 1 Batch: 9851/20099 (49.01%) Loss: 2.294077 LR: 0.00002281 [12:21:14] Epoch: 1 Batch: 9852/20099 (49.02%) Loss: 1.927423 LR: 0.00002279 [12:21:16] Epoch: 1 Batch: 9853/20099 (49.02%) Loss: 1.878186 LR: 0.00002279 [12:21:18] Epoch: 1 Batch: 9854/20099 (49.03%) Loss: 1.916600 LR: 0.00002279 [12:21:20] Epoch: 1 Batch: 9855/20099 (49.03%) Loss: 2.251221 LR: 0.00002279 [12:21:21] Epoch: 1 Batch: 9856/20099 (49.04%) Loss: 1.692846 LR: 0.00002279 [12:21:23] Epoch: 1 Batch: 9857/20099 (49.04%) Loss: 2.115594 LR: 0.00002279 [12:21:25] Epoch: 1 Batch: 9858/20099 (49.05%) Loss: 2.146458 LR: 0.00002279 [12:21:27] Epoch: 1 Batch: 9859/20099 (49.05%) Loss: 2.249320 LR: 0.00002278 [12:21:28] Epoch: 1 Batch: 9860/20099 (49.06%) Loss: 2.135123 LR: 0.00002278 [12:21:30] Epoch: 1 Batch: 9861/20099 (49.06%) Loss: 2.327817 LR: 0.00002278 [12:21:32] Epoch: 1 Batch: 9862/20099 (49.07%) Loss: 2.357085 LR: 0.00002278 [12:21:34] Epoch: 1 Batch: 9863/20099 (49.07%) Loss: 2.281836 LR: 0.00002278 [12:21:35] Epoch: 1 Batch: 9864/20099 (49.08%) Loss: 2.090391 LR: 0.00002278 [12:21:37] Epoch: 1 Batch: 9865/20099 (49.08%) Loss: 1.902724 LR: 0.00002278 [12:21:39] Epoch: 1 
Batch: 9866/20099 (49.09%) Loss: 2.225040 LR: 0.00002276 [12:21:41] Epoch: 1 Batch: 9867/20099 (49.09%) Loss: 2.325153 LR: 0.00002276 [12:21:43] Epoch: 1 Batch: 9868/20099 (49.10%) Loss: 2.182718 LR: 0.00002276 [12:21:44] Epoch: 1 Batch: 9869/20099 (49.10%) Loss: 2.372121 LR: 0.00002276 [12:21:46] Epoch: 1 Batch: 9870/20099 (49.11%) Loss: 1.800923 LR: 0.00002276 [12:21:48] Epoch: 1 Batch: 9871/20099 (49.11%) Loss: 2.132318 LR: 0.00002276 [12:21:50] Epoch: 1 Batch: 9872/20099 (49.12%) Loss: 2.151332 LR: 0.00002276 [12:21:52] Epoch: 1 Batch: 9873/20099 (49.12%) Loss: 2.112836 LR: 0.00002275 [12:21:53] Epoch: 1 Batch: 9874/20099 (49.13%) Loss: 2.205324 LR: 0.00002275 [12:21:55] Epoch: 1 Batch: 9875/20099 (49.13%) Loss: 2.133307 LR: 0.00002275 [12:21:57] Epoch: 1 Batch: 9876/20099 (49.14%) Loss: 1.732636 LR: 0.00002275 [12:21:59] Epoch: 1 Batch: 9877/20099 (49.14%) Loss: 2.102807 LR: 0.00002275 [12:22:00] Epoch: 1 Batch: 9878/20099 (49.15%) Loss: 1.943948 LR: 0.00002275 [12:22:02] Epoch: 1 Batch: 9879/20099 (49.15%) Loss: 1.635261 LR: 0.00002275 [12:22:04] Epoch: 1 Batch: 9880/20099 (49.16%) Loss: 2.257117 LR: 0.00002273 [12:22:06] Epoch: 1 Batch: 9881/20099 (49.16%) Loss: 2.192789 LR: 0.00002273 [12:22:08] Epoch: 1 Batch: 9882/20099 (49.17%) Loss: 2.070760 LR: 0.00002273 [12:22:09] Epoch: 1 Batch: 9883/20099 (49.17%) Loss: 2.363024 LR: 0.00002273 [12:22:11] Epoch: 1 Batch: 9884/20099 (49.18%) Loss: 1.965975 LR: 0.00002273 [12:22:13] Epoch: 1 Batch: 9885/20099 (49.18%) Loss: 1.731550 LR: 0.00002273 [12:22:15] Epoch: 1 Batch: 9886/20099 (49.19%) Loss: 1.811229 LR: 0.00002273 [12:22:16] Epoch: 1 Batch: 9887/20099 (49.19%) Loss: 2.041632 LR: 0.00002272 [12:22:18] Epoch: 1 Batch: 9888/20099 (49.20%) Loss: 2.246872 LR: 0.00002272 [12:22:20] Epoch: 1 Batch: 9889/20099 (49.20%) Loss: 2.099231 LR: 0.00002272 [12:22:22] Epoch: 1 Batch: 9890/20099 (49.21%) Loss: 2.284229 LR: 0.00002272 [12:22:24] Epoch: 1 Batch: 9891/20099 (49.21%) Loss: 2.306265 LR: 0.00002272 [12:22:25] Epoch: 
1 Batch: 9892/20099 (49.22%) Loss: 2.028678 LR: 0.00002272 [12:22:27] Epoch: 1 Batch: 9893/20099 (49.22%) Loss: 2.245099 LR: 0.00002272 [12:22:29] Epoch: 1 Batch: 9894/20099 (49.23%) Loss: 1.961019 LR: 0.00002270 [12:22:31] Epoch: 1 Batch: 9895/20099 (49.23%) Loss: 2.126318 LR: 0.00002270 [12:22:33] Epoch: 1 Batch: 9896/20099 (49.24%) Loss: 2.201535 LR: 0.00002270 [12:22:34] Epoch: 1 Batch: 9897/20099 (49.24%) Loss: 2.273367 LR: 0.00002270 [12:22:36] Epoch: 1 Batch: 9898/20099 (49.25%) Loss: 2.095722 LR: 0.00002270 [12:22:38] Epoch: 1 Batch: 9899/20099 (49.25%) Loss: 1.849874 LR: 0.00002270 [12:22:40] Epoch: 1 Batch: 9900/20099 (49.26%) Loss: 2.240939 LR: 0.00002270 [12:22:41] Epoch: 1 Batch: 9901/20099 (49.26%) Loss: 2.136930 LR: 0.00002269 [12:22:43] Epoch: 1 Batch: 9902/20099 (49.27%) Loss: 2.058798 LR: 0.00002269 [12:22:45] Epoch: 1 Batch: 9903/20099 (49.27%) Loss: 2.109435 LR: 0.00002269 [12:22:47] Epoch: 1 Batch: 9904/20099 (49.28%) Loss: 2.137810 LR: 0.00002269 [12:22:49] Epoch: 1 Batch: 9905/20099 (49.28%) Loss: 2.239021 LR: 0.00002269 [12:22:50] Epoch: 1 Batch: 9906/20099 (49.29%) Loss: 2.143491 LR: 0.00002269 [12:22:52] Epoch: 1 Batch: 9907/20099 (49.29%) Loss: 1.801488 LR: 0.00002269 [12:22:54] Epoch: 1 Batch: 9908/20099 (49.30%) Loss: 2.369668 LR: 0.00002267 [12:22:56] Epoch: 1 Batch: 9909/20099 (49.30%) Loss: 2.411505 LR: 0.00002267 [12:22:57] Epoch: 1 Batch: 9910/20099 (49.31%) Loss: 1.957450 LR: 0.00002267 [12:22:59] Epoch: 1 Batch: 9911/20099 (49.31%) Loss: 2.069764 LR: 0.00002267 [12:23:01] Epoch: 1 Batch: 9912/20099 (49.32%) Loss: 2.108681 LR: 0.00002267 [12:23:03] Epoch: 1 Batch: 9913/20099 (49.32%) Loss: 2.093893 LR: 0.00002267 [12:23:04] Epoch: 1 Batch: 9914/20099 (49.33%) Loss: 2.083066 LR: 0.00002267 [12:23:06] Epoch: 1 Batch: 9915/20099 (49.33%) Loss: 2.181031 LR: 0.00002266 [12:23:08] Epoch: 1 Batch: 9916/20099 (49.34%) Loss: 2.053276 LR: 0.00002266 [12:23:10] Epoch: 1 Batch: 9917/20099 (49.34%) Loss: 2.078971 LR: 0.00002266 [12:23:12] 
Epoch: 1 Batch: 9918/20099 (49.35%) Loss: 1.892227 LR: 0.00002266 [12:23:13] Epoch: 1 Batch: 9919/20099 (49.35%) Loss: 2.243524 LR: 0.00002266 [12:23:15] Epoch: 1 Batch: 9920/20099 (49.36%) Loss: 2.262689 LR: 0.00002266 [12:23:17] Epoch: 1 Batch: 9921/20099 (49.36%) Loss: 1.984151 LR: 0.00002266 [12:23:19] Epoch: 1 Batch: 9922/20099 (49.37%) Loss: 2.415529 LR: 0.00002264 [12:23:20] Epoch: 1 Batch: 9923/20099 (49.37%) Loss: 2.083714 LR: 0.00002264 [12:23:22] Epoch: 1 Batch: 9924/20099 (49.38%) Loss: 2.027617 LR: 0.00002264 [12:23:24] Epoch: 1 Batch: 9925/20099 (49.38%) Loss: 2.233963 LR: 0.00002264 [12:23:26] Epoch: 1 Batch: 9926/20099 (49.39%) Loss: 1.963231 LR: 0.00002264 [12:23:27] Epoch: 1 Batch: 9927/20099 (49.39%) Loss: 1.851027 LR: 0.00002264 [12:23:29] Epoch: 1 Batch: 9928/20099 (49.40%) Loss: 1.839346 LR: 0.00002264 [12:23:31] Epoch: 1 Batch: 9929/20099 (49.40%) Loss: 2.120089 LR: 0.00002263 [12:23:33] Epoch: 1 Batch: 9930/20099 (49.41%) Loss: 2.069928 LR: 0.00002263 [12:23:34] Epoch: 1 Batch: 9931/20099 (49.41%) Loss: 2.037401 LR: 0.00002263 [12:23:36] Epoch: 1 Batch: 9932/20099 (49.42%) Loss: 2.198977 LR: 0.00002263 [12:23:38] Epoch: 1 Batch: 9933/20099 (49.42%) Loss: 1.827817 LR: 0.00002263 [12:23:40] Epoch: 1 Batch: 9934/20099 (49.43%) Loss: 2.114602 LR: 0.00002263 [12:23:42] Epoch: 1 Batch: 9935/20099 (49.43%) Loss: 2.091907 LR: 0.00002263 [12:23:44] Epoch: 1 Batch: 9936/20099 (49.44%) Loss: 1.939982 LR: 0.00002261 [12:23:45] Epoch: 1 Batch: 9937/20099 (49.44%) Loss: 2.099046 LR: 0.00002261 [12:23:47] Epoch: 1 Batch: 9938/20099 (49.45%) Loss: 2.214905 LR: 0.00002261 [12:23:49] Epoch: 1 Batch: 9939/20099 (49.45%) Loss: 2.014893 LR: 0.00002261 [12:23:51] Epoch: 1 Batch: 9940/20099 (49.46%) Loss: 2.194372 LR: 0.00002261 [12:23:52] Epoch: 1 Batch: 9941/20099 (49.46%) Loss: 1.907268 LR: 0.00002261 [12:23:54] Epoch: 1 Batch: 9942/20099 (49.47%) Loss: 2.273577 LR: 0.00002261 [12:23:56] Epoch: 1 Batch: 9943/20099 (49.47%) Loss: 2.323204 LR: 0.00002260 
[12:23:58] Epoch: 1 Batch: 9944/20099 (49.48%) Loss: 2.168359 LR: 0.00002260 [12:23:59] Epoch: 1 Batch: 9945/20099 (49.48%) Loss: 2.140655 LR: 0.00002260 [12:24:01] Epoch: 1 Batch: 9946/20099 (49.49%) Loss: 2.177887 LR: 0.00002260 [12:24:03] Epoch: 1 Batch: 9947/20099 (49.49%) Loss: 1.952934 LR: 0.00002260 [12:24:05] Epoch: 1 Batch: 9948/20099 (49.49%) Loss: 2.297557 LR: 0.00002260 [12:24:07] Epoch: 1 Batch: 9949/20099 (49.50%) Loss: 2.125679 LR: 0.00002260 [12:24:08] Epoch: 1 Batch: 9950/20099 (49.50%) Loss: 2.127336 LR: 0.00002258 [12:24:10] Epoch: 1 Batch: 9951/20099 (49.51%) Loss: 2.047636 LR: 0.00002258 [12:24:12] Epoch: 1 Batch: 9952/20099 (49.51%) Loss: 2.244343 LR: 0.00002258 [12:24:14] Epoch: 1 Batch: 9953/20099 (49.52%) Loss: 2.117914 LR: 0.00002258 [12:24:15] Epoch: 1 Batch: 9954/20099 (49.52%) Loss: 1.971771 LR: 0.00002258 [12:24:17] Epoch: 1 Batch: 9955/20099 (49.53%) Loss: 1.890862 LR: 0.00002258 [12:24:19] Epoch: 1 Batch: 9956/20099 (49.53%) Loss: 1.977502 LR: 0.00002258 [12:24:21] Epoch: 1 Batch: 9957/20099 (49.54%) Loss: 2.069806 LR: 0.00002257 [12:24:23] Epoch: 1 Batch: 9958/20099 (49.54%) Loss: 2.217018 LR: 0.00002257 [12:24:24] Epoch: 1 Batch: 9959/20099 (49.55%) Loss: 2.205698 LR: 0.00002257 [12:24:26] Epoch: 1 Batch: 9960/20099 (49.55%) Loss: 2.070184 LR: 0.00002257 [12:24:28] Epoch: 1 Batch: 9961/20099 (49.56%) Loss: 2.001893 LR: 0.00002257 [12:24:30] Epoch: 1 Batch: 9962/20099 (49.56%) Loss: 2.035520 LR: 0.00002257 [12:24:31] Epoch: 1 Batch: 9963/20099 (49.57%) Loss: 2.195385 LR: 0.00002257 [12:24:33] Epoch: 1 Batch: 9964/20099 (49.57%) Loss: 2.038348 LR: 0.00002255 [12:24:35] Epoch: 1 Batch: 9965/20099 (49.58%) Loss: 2.079867 LR: 0.00002255 [12:24:37] Epoch: 1 Batch: 9966/20099 (49.58%) Loss: 2.221630 LR: 0.00002255 [12:24:39] Epoch: 1 Batch: 9967/20099 (49.59%) Loss: 2.020751 LR: 0.00002255 [12:24:40] Epoch: 1 Batch: 9968/20099 (49.59%) Loss: 2.108041 LR: 0.00002255 [12:24:42] Epoch: 1 Batch: 9969/20099 (49.60%) Loss: 2.047154 LR: 
0.00002255 [12:24:44] Epoch: 1 Batch: 9970/20099 (49.60%) Loss: 1.993837 LR: 0.00002255 [12:24:46] Epoch: 1 Batch: 9971/20099 (49.61%) Loss: 2.235640 LR: 0.00002254 [12:24:47] Epoch: 1 Batch: 9972/20099 (49.61%) Loss: 2.206338 LR: 0.00002254 [12:24:49] Epoch: 1 Batch: 9973/20099 (49.62%) Loss: 2.227196 LR: 0.00002254 [12:24:51] Epoch: 1 Batch: 9974/20099 (49.62%) Loss: 2.172691 LR: 0.00002254 [12:24:53] Epoch: 1 Batch: 9975/20099 (49.63%) Loss: 2.372286 LR: 0.00002254 [12:24:55] Epoch: 1 Batch: 9976/20099 (49.63%) Loss: 2.059055 LR: 0.00002254 [12:24:56] Epoch: 1 Batch: 9977/20099 (49.64%) Loss: 2.028141 LR: 0.00002254 [12:24:58] Epoch: 1 Batch: 9978/20099 (49.64%) Loss: 1.804816 LR: 0.00002252 [12:25:00] Epoch: 1 Batch: 9979/20099 (49.65%) Loss: 2.351411 LR: 0.00002252 [12:25:02] Epoch: 1 Batch: 9980/20099 (49.65%) Loss: 1.909527 LR: 0.00002252 [12:25:03] Epoch: 1 Batch: 9981/20099 (49.66%) Loss: 2.141121 LR: 0.00002252 [12:25:05] Epoch: 1 Batch: 9982/20099 (49.66%) Loss: 2.252161 LR: 0.00002252 [12:25:07] Epoch: 1 Batch: 9983/20099 (49.67%) Loss: 2.205798 LR: 0.00002252 [12:25:09] Epoch: 1 Batch: 9984/20099 (49.67%) Loss: 1.934633 LR: 0.00002252 [12:25:11] Epoch: 1 Batch: 9985/20099 (49.68%) Loss: 2.144827 LR: 0.00002251 [12:25:12] Epoch: 1 Batch: 9986/20099 (49.68%) Loss: 2.156864 LR: 0.00002251 [12:25:14] Epoch: 1 Batch: 9987/20099 (49.69%) Loss: 2.352743 LR: 0.00002251 [12:25:16] Epoch: 1 Batch: 9988/20099 (49.69%) Loss: 2.014923 LR: 0.00002251 [12:25:18] Epoch: 1 Batch: 9989/20099 (49.70%) Loss: 1.912745 LR: 0.00002251 [12:25:20] Epoch: 1 Batch: 9990/20099 (49.70%) Loss: 2.250184 LR: 0.00002251 [12:25:21] Epoch: 1 Batch: 9991/20099 (49.71%) Loss: 2.038514 LR: 0.00002251 [12:25:23] Epoch: 1 Batch: 9992/20099 (49.71%) Loss: 2.391297 LR: 0.00002249 [12:25:25] Epoch: 1 Batch: 9993/20099 (49.72%) Loss: 2.104748 LR: 0.00002249 [12:25:27] Epoch: 1 Batch: 9994/20099 (49.72%) Loss: 2.165908 LR: 0.00002249 [12:25:28] Epoch: 1 Batch: 9995/20099 (49.73%) Loss: 2.200402 
LR: 0.00002249 [12:25:30] Epoch: 1 Batch: 9996/20099 (49.73%) Loss: 1.926325 LR: 0.00002249 [12:25:32] Epoch: 1 Batch: 9997/20099 (49.74%) Loss: 1.948569 LR: 0.00002249 [12:25:34] Epoch: 1 Batch: 9998/20099 (49.74%) Loss: 2.033269 LR: 0.00002249 [12:25:36] Epoch: 1 Batch: 9999/20099 (49.75%) Loss: 2.029323 LR: 0.00002248 [12:25:37] >> Evaluating batch 0 [12:25:38] >> Evaluating batch 1 [12:25:39] >> Evaluating batch 2 [12:25:41] >> Evaluating batch 3 [12:25:42] >> Evaluating batch 4 [12:25:43] >> Evaluating batch 5 [12:25:44] >> Evaluating batch 6 [12:25:45] >> Evaluating batch 7 [12:25:46] >> Evaluating batch 8 [12:25:47] >> Evaluating batch 9 [12:25:48] >> Evaluating batch 10 [12:25:48] >> Evaluating batch 11 [12:25:49] >> Evaluating batch 12 [12:25:50] >> Evaluating batch 13 [12:25:51] >> Evaluating batch 14 [12:25:52] >> Evaluating batch 15 [12:25:53] >> Evaluating batch 16 [12:25:54] Epoch: 1 Step: 10000/20099 Evaluation: [12:25:54] Avg Loss Since Last Eval: 2.1046 Val Loss: 2.1679 Validation loss delta: 0.0005 Perplexity: 8.7398 LR: 0.00002248 [12:25:57] >> Cleaned up old temp checkpoint: epoch1_step8000 [12:25:57] >> Temp checkpoint saved: epoch1_step10000, size: 0.1693 GB [12:26:01] >> Checkpoint saved: epoch1_step10000, size: 0.1693 GB [12:26:01] Epoch: 1 Batch: 10000/20099 (49.75%) Loss: 2.020817 LR: 0.00002248 [12:26:03] Epoch: 1 Batch: 10001/20099 (49.76%) Loss: 2.300226 LR: 0.00002248 [12:26:04] Epoch: 1 Batch: 10002/20099 (49.76%) Loss: 1.933957 LR: 0.00002248 [12:26:06] Epoch: 1 Batch: 10003/20099 (49.77%) Loss: 2.006570 LR: 0.00002248 [12:26:08] Epoch: 1 Batch: 10004/20099 (49.77%) Loss: 2.299121 LR: 0.00002248 [12:26:10] Epoch: 1 Batch: 10005/20099 (49.78%) Loss: 2.041704 LR: 0.00002248 [12:26:11] Epoch: 1 Batch: 10006/20099 (49.78%) Loss: 1.936397 LR: 0.00002246 [12:26:13] Epoch: 1 Batch: 10007/20099 (49.79%) Loss: 2.199860 LR: 0.00002246 [12:26:15] Epoch: 1 Batch: 10008/20099 (49.79%) Loss: 2.537230 LR: 0.00002246 [12:26:17] Epoch: 1 Batch: 
10009/20099 (49.80%) Loss: 2.018312 LR: 0.00002246 [12:26:19] Epoch: 1 Batch: 10010/20099 (49.80%) Loss: 1.829368 LR: 0.00002246 [12:26:20] Epoch: 1 Batch: 10011/20099 (49.81%) Loss: 2.122959 LR: 0.00002246 [12:26:22] Epoch: 1 Batch: 10012/20099 (49.81%) Loss: 2.072715 LR: 0.00002246 [12:26:24] Epoch: 1 Batch: 10013/20099 (49.82%) Loss: 2.366633 LR: 0.00002245 [12:26:26] Epoch: 1 Batch: 10014/20099 (49.82%) Loss: 2.430928 LR: 0.00002245 [12:26:28] Epoch: 1 Batch: 10015/20099 (49.83%) Loss: 2.023671 LR: 0.00002245 [12:26:30] Epoch: 1 Batch: 10016/20099 (49.83%) Loss: 2.012231 LR: 0.00002245 [12:26:31] Epoch: 1 Batch: 10017/20099 (49.84%) Loss: 1.937811 LR: 0.00002245 [12:26:33] Epoch: 1 Batch: 10018/20099 (49.84%) Loss: 1.717094 LR: 0.00002245 [12:26:35] Epoch: 1 Batch: 10019/20099 (49.85%) Loss: 2.190111 LR: 0.00002245 [12:26:37] Epoch: 1 Batch: 10020/20099 (49.85%) Loss: 2.041499 LR: 0.00002243 [12:26:39] Epoch: 1 Batch: 10021/20099 (49.86%) Loss: 1.731227 LR: 0.00002243 [12:26:41] Epoch: 1 Batch: 10022/20099 (49.86%) Loss: 2.117833 LR: 0.00002243 [12:26:42] Epoch: 1 Batch: 10023/20099 (49.87%) Loss: 2.296460 LR: 0.00002243 [12:26:44] Epoch: 1 Batch: 10024/20099 (49.87%) Loss: 2.022303 LR: 0.00002243 [12:26:46] Epoch: 1 Batch: 10025/20099 (49.88%) Loss: 2.330329 LR: 0.00002243 [12:26:48] Epoch: 1 Batch: 10026/20099 (49.88%) Loss: 2.141462 LR: 0.00002243 [12:26:50] Epoch: 1 Batch: 10027/20099 (49.89%) Loss: 2.020880 LR: 0.00002242 [12:26:51] Epoch: 1 Batch: 10028/20099 (49.89%) Loss: 1.975474 LR: 0.00002242 [12:26:53] Epoch: 1 Batch: 10029/20099 (49.90%) Loss: 2.150536 LR: 0.00002242 [12:26:55] Epoch: 1 Batch: 10030/20099 (49.90%) Loss: 2.582999 LR: 0.00002242 [12:26:57] Epoch: 1 Batch: 10031/20099 (49.91%) Loss: 2.003338 LR: 0.00002242 [12:26:58] Epoch: 1 Batch: 10032/20099 (49.91%) Loss: 1.901347 LR: 0.00002242 [12:27:00] Epoch: 1 Batch: 10033/20099 (49.92%) Loss: 2.195032 LR: 0.00002242 [12:27:02] Epoch: 1 Batch: 10034/20099 (49.92%) Loss: 1.930283 LR: 
0.00002240 [12:27:04] Epoch: 1 Batch: 10035/20099 (49.93%) Loss: 2.292161 LR: 0.00002240 [12:27:05] Epoch: 1 Batch: 10036/20099 (49.93%) Loss: 2.193601 LR: 0.00002240 [12:27:07] Epoch: 1 Batch: 10037/20099 (49.94%) Loss: 2.003621 LR: 0.00002240 [12:27:09] Epoch: 1 Batch: 10038/20099 (49.94%) Loss: 2.116432 LR: 0.00002240 [12:27:11] Epoch: 1 Batch: 10039/20099 (49.95%) Loss: 2.233969 LR: 0.00002240 [12:27:12] Epoch: 1 Batch: 10040/20099 (49.95%) Loss: 2.130067 LR: 0.00002240 [12:27:14] Epoch: 1 Batch: 10041/20099 (49.96%) Loss: 2.127692 LR: 0.00002239 [12:27:16] Epoch: 1 Batch: 10042/20099 (49.96%) Loss: 2.080173 LR: 0.00002239 [12:27:18] Epoch: 1 Batch: 10043/20099 (49.97%) Loss: 2.220584 LR: 0.00002239 [12:27:19] Epoch: 1 Batch: 10044/20099 (49.97%) Loss: 2.201909 LR: 0.00002239 [12:27:21] Epoch: 1 Batch: 10045/20099 (49.98%) Loss: 2.285996 LR: 0.00002239 [12:27:23] Epoch: 1 Batch: 10046/20099 (49.98%) Loss: 1.757471 LR: 0.00002239 [12:27:25] Epoch: 1 Batch: 10047/20099 (49.99%) Loss: 1.897918 LR: 0.00002239 [12:27:26] Epoch: 1 Batch: 10048/20099 (49.99%) Loss: 2.383520 LR: 0.00002237 [12:27:28] Epoch: 1 Batch: 10049/20099 (50.00%) Loss: 2.235969 LR: 0.00002237 [12:27:30] Epoch: 1 Batch: 10050/20099 (50.00%) Loss: 2.355113 LR: 0.00002237 [12:27:32] Epoch: 1 Batch: 10051/20099 (50.01%) Loss: 2.123668 LR: 0.00002237 [12:27:33] Epoch: 1 Batch: 10052/20099 (50.01%) Loss: 2.219716 LR: 0.00002237 [12:27:35] Epoch: 1 Batch: 10053/20099 (50.02%) Loss: 2.195567 LR: 0.00002237 [12:27:37] Epoch: 1 Batch: 10054/20099 (50.02%) Loss: 2.123614 LR: 0.00002237 [12:27:39] Epoch: 1 Batch: 10055/20099 (50.03%) Loss: 2.161660 LR: 0.00002236 [12:27:40] Epoch: 1 Batch: 10056/20099 (50.03%) Loss: 2.014030 LR: 0.00002236 [12:27:42] Epoch: 1 Batch: 10057/20099 (50.04%) Loss: 2.172013 LR: 0.00002236 [12:27:44] Epoch: 1 Batch: 10058/20099 (50.04%) Loss: 2.135809 LR: 0.00002236 [12:27:46] Epoch: 1 Batch: 10059/20099 (50.05%) Loss: 2.165059 LR: 0.00002236 [12:27:48] Epoch: 1 Batch: 10060/20099 
(50.05%) Loss: 2.107183 LR: 0.00002236 [12:27:49] Epoch: 1 Batch: 10061/20099 (50.06%) Loss: 1.806012 LR: 0.00002236 [12:27:51] Epoch: 1 Batch: 10062/20099 (50.06%) Loss: 2.121375 LR: 0.00002234 [12:27:53] Epoch: 1 Batch: 10063/20099 (50.07%) Loss: 2.103945 LR: 0.00002234 [12:27:55] Epoch: 1 Batch: 10064/20099 (50.07%) Loss: 2.362548 LR: 0.00002234 [12:27:56] Epoch: 1 Batch: 10065/20099 (50.08%) Loss: 2.002727 LR: 0.00002234 [12:27:58] Epoch: 1 Batch: 10066/20099 (50.08%) Loss: 2.065569 LR: 0.00002234 [12:28:00] Epoch: 1 Batch: 10067/20099 (50.09%) Loss: 1.845837 LR: 0.00002234 [12:28:02] Epoch: 1 Batch: 10068/20099 (50.09%) Loss: 2.142133 LR: 0.00002234 [12:28:04] Epoch: 1 Batch: 10069/20099 (50.10%) Loss: 1.983291 LR: 0.00002233 [12:28:05] Epoch: 1 Batch: 10070/20099 (50.10%) Loss: 2.150580 LR: 0.00002233 [12:28:07] Epoch: 1 Batch: 10071/20099 (50.11%) Loss: 2.093424 LR: 0.00002233 [12:28:09] Epoch: 1 Batch: 10072/20099 (50.11%) Loss: 1.796980 LR: 0.00002233 [12:28:11] Epoch: 1 Batch: 10073/20099 (50.12%) Loss: 2.195577 LR: 0.00002233 [12:28:13] Epoch: 1 Batch: 10074/20099 (50.12%) Loss: 1.864250 LR: 0.00002233 [12:28:14] Epoch: 1 Batch: 10075/20099 (50.13%) Loss: 2.233644 LR: 0.00002233 [12:28:16] Epoch: 1 Batch: 10076/20099 (50.13%) Loss: 2.288469 LR: 0.00002231 [12:28:18] Epoch: 1 Batch: 10077/20099 (50.14%) Loss: 1.999655 LR: 0.00002231 [12:28:20] Epoch: 1 Batch: 10078/20099 (50.14%) Loss: 2.230029 LR: 0.00002231 [12:28:21] Epoch: 1 Batch: 10079/20099 (50.15%) Loss: 2.279404 LR: 0.00002231 [12:28:23] Epoch: 1 Batch: 10080/20099 (50.15%) Loss: 2.342122 LR: 0.00002231 [12:28:25] Epoch: 1 Batch: 10081/20099 (50.16%) Loss: 2.029478 LR: 0.00002231 [12:28:27] Epoch: 1 Batch: 10082/20099 (50.16%) Loss: 1.861949 LR: 0.00002231 [12:28:29] Epoch: 1 Batch: 10083/20099 (50.17%) Loss: 2.106226 LR: 0.00002230 [12:28:30] Epoch: 1 Batch: 10084/20099 (50.17%) Loss: 2.390975 LR: 0.00002230 [12:28:32] Epoch: 1 Batch: 10085/20099 (50.18%) Loss: 2.051450 LR: 0.00002230 [12:28:34] 
Epoch: 1 Batch: 10086/20099 (50.18%) Loss: 2.119925 LR: 0.00002230 [12:28:36] Epoch: 1 Batch: 10087/20099 (50.19%) Loss: 2.282066 LR: 0.00002230 [12:28:37] Epoch: 1 Batch: 10088/20099 (50.19%) Loss: 2.175742 LR: 0.00002230 [12:28:39] Epoch: 1 Batch: 10089/20099 (50.20%) Loss: 2.280556 LR: 0.00002230 [12:28:41] Epoch: 1 Batch: 10090/20099 (50.20%) Loss: 2.259068 LR: 0.00002228 [12:28:43] Epoch: 1 Batch: 10091/20099 (50.21%) Loss: 2.128651 LR: 0.00002228 [12:28:44] Epoch: 1 Batch: 10092/20099 (50.21%) Loss: 1.985488 LR: 0.00002228 [12:28:46] Epoch: 1 Batch: 10093/20099 (50.22%) Loss: 2.261419 LR: 0.00002228 [12:28:48] Epoch: 1 Batch: 10094/20099 (50.22%) Loss: 2.019954 LR: 0.00002228 [12:28:50] Epoch: 1 Batch: 10095/20099 (50.23%) Loss: 2.046975 LR: 0.00002228 [12:28:52] Epoch: 1 Batch: 10096/20099 (50.23%) Loss: 1.763218 LR: 0.00002228 [12:28:53] Epoch: 1 Batch: 10097/20099 (50.24%) Loss: 2.122265 LR: 0.00002227 [12:28:55] Epoch: 1 Batch: 10098/20099 (50.24%) Loss: 1.782828 LR: 0.00002227 [12:28:57] Epoch: 1 Batch: 10099/20099 (50.25%) Loss: 1.971214 LR: 0.00002227 [12:28:59] Epoch: 1 Batch: 10100/20099 (50.25%) Loss: 2.056597 LR: 0.00002227 [12:29:00] Epoch: 1 Batch: 10101/20099 (50.26%) Loss: 2.010042 LR: 0.00002227 [12:29:02] Epoch: 1 Batch: 10102/20099 (50.26%) Loss: 2.241540 LR: 0.00002227 [12:29:04] Epoch: 1 Batch: 10103/20099 (50.27%) Loss: 2.240625 LR: 0.00002227 [12:29:06] Epoch: 1 Batch: 10104/20099 (50.27%) Loss: 2.231999 LR: 0.00002225 [12:29:07] Epoch: 1 Batch: 10105/20099 (50.28%) Loss: 2.539127 LR: 0.00002225 [12:29:09] Epoch: 1 Batch: 10106/20099 (50.28%) Loss: 1.882254 LR: 0.00002225 [12:29:11] Epoch: 1 Batch: 10107/20099 (50.29%) Loss: 2.028766 LR: 0.00002225 [12:29:13] Epoch: 1 Batch: 10108/20099 (50.29%) Loss: 2.158724 LR: 0.00002225 [12:29:15] Epoch: 1 Batch: 10109/20099 (50.30%) Loss: 2.302498 LR: 0.00002225 [12:29:16] Epoch: 1 Batch: 10110/20099 (50.30%) Loss: 2.019878 LR: 0.00002225 [12:29:18] Epoch: 1 Batch: 10111/20099 (50.31%) Loss: 
1.955251 LR: 0.00002224 [12:29:20] Epoch: 1 Batch: 10112/20099 (50.31%) Loss: 2.132048 LR: 0.00002224 [12:29:22] Epoch: 1 Batch: 10113/20099 (50.32%) Loss: 2.217074 LR: 0.00002224 [12:29:23] Epoch: 1 Batch: 10114/20099 (50.32%) Loss: 1.868025 LR: 0.00002224 [12:29:25] Epoch: 1 Batch: 10115/20099 (50.33%) Loss: 2.040937 LR: 0.00002224 [12:29:27] Epoch: 1 Batch: 10116/20099 (50.33%) Loss: 2.269646 LR: 0.00002224 [12:29:29] Epoch: 1 Batch: 10117/20099 (50.34%) Loss: 2.136592 LR: 0.00002224 [12:29:30] Epoch: 1 Batch: 10118/20099 (50.34%) Loss: 2.108685 LR: 0.00002222 [12:29:32] Epoch: 1 Batch: 10119/20099 (50.35%) Loss: 1.879045 LR: 0.00002222 [12:29:34] Epoch: 1 Batch: 10120/20099 (50.35%) Loss: 2.104494 LR: 0.00002222 [12:29:36] Epoch: 1 Batch: 10121/20099 (50.36%) Loss: 1.849973 LR: 0.00002222 [12:29:38] Epoch: 1 Batch: 10122/20099 (50.36%) Loss: 2.382658 LR: 0.00002222 [12:29:39] Epoch: 1 Batch: 10123/20099 (50.37%) Loss: 2.118007 LR: 0.00002222 [12:29:41] Epoch: 1 Batch: 10124/20099 (50.37%) Loss: 2.072966 LR: 0.00002222 [12:29:43] Epoch: 1 Batch: 10125/20099 (50.38%) Loss: 2.115799 LR: 0.00002220 [12:29:45] Epoch: 1 Batch: 10126/20099 (50.38%) Loss: 2.274395 LR: 0.00002220 [12:29:46] Epoch: 1 Batch: 10127/20099 (50.39%) Loss: 2.256159 LR: 0.00002220 [12:29:48] Epoch: 1 Batch: 10128/20099 (50.39%) Loss: 2.059817 LR: 0.00002220 [12:29:50] Epoch: 1 Batch: 10129/20099 (50.40%) Loss: 1.961134 LR: 0.00002220 [12:29:52] Epoch: 1 Batch: 10130/20099 (50.40%) Loss: 2.038703 LR: 0.00002220 [12:29:54] Epoch: 1 Batch: 10131/20099 (50.41%) Loss: 2.154069 LR: 0.00002220 [12:29:55] Epoch: 1 Batch: 10132/20099 (50.41%) Loss: 2.056169 LR: 0.00002219 [12:29:57] Epoch: 1 Batch: 10133/20099 (50.42%) Loss: 2.065122 LR: 0.00002219 [12:29:59] Epoch: 1 Batch: 10134/20099 (50.42%) Loss: 2.293813 LR: 0.00002219 [12:30:01] Epoch: 1 Batch: 10135/20099 (50.43%) Loss: 1.643738 LR: 0.00002219 [12:30:03] Epoch: 1 Batch: 10136/20099 (50.43%) Loss: 1.933113 LR: 0.00002219 [12:30:04] Epoch: 1 
Batch: 10137/20099 (50.44%) Loss: 1.971386 LR: 0.00002219 [12:30:06] Epoch: 1 Batch: 10138/20099 (50.44%) Loss: 2.006056 LR: 0.00002219 [12:30:08] Epoch: 1 Batch: 10139/20099 (50.45%) Loss: 1.959049 LR: 0.00002217 [12:30:10] Epoch: 1 Batch: 10140/20099 (50.45%) Loss: 2.219636 LR: 0.00002217 [12:30:11] Epoch: 1 Batch: 10141/20099 (50.46%) Loss: 1.906939 LR: 0.00002217 [12:30:13] Epoch: 1 Batch: 10142/20099 (50.46%) Loss: 2.206557 LR: 0.00002217 [12:30:15] Epoch: 1 Batch: 10143/20099 (50.47%) Loss: 1.996764 LR: 0.00002217 [12:30:17] Epoch: 1 Batch: 10144/20099 (50.47%) Loss: 2.203351 LR: 0.00002217 [12:30:19] Epoch: 1 Batch: 10145/20099 (50.48%) Loss: 2.415311 LR: 0.00002217 [12:30:20] Epoch: 1 Batch: 10146/20099 (50.48%) Loss: 1.836269 LR: 0.00002216 [12:30:22] Epoch: 1 Batch: 10147/20099 (50.49%) Loss: 2.120124 LR: 0.00002216 [12:30:24] Epoch: 1 Batch: 10148/20099 (50.49%) Loss: 2.116754 LR: 0.00002216 [12:30:26] Epoch: 1 Batch: 10149/20099 (50.50%) Loss: 2.237048 LR: 0.00002216 [12:30:27] Epoch: 1 Batch: 10150/20099 (50.50%) Loss: 2.060260 LR: 0.00002216 [12:30:29] Epoch: 1 Batch: 10151/20099 (50.51%) Loss: 2.203682 LR: 0.00002216 [12:30:31] Epoch: 1 Batch: 10152/20099 (50.51%) Loss: 1.953529 LR: 0.00002216 [12:30:33] Epoch: 1 Batch: 10153/20099 (50.51%) Loss: 2.310438 LR: 0.00002214 [12:30:35] Epoch: 1 Batch: 10154/20099 (50.52%) Loss: 2.018197 LR: 0.00002214 [12:30:36] Epoch: 1 Batch: 10155/20099 (50.52%) Loss: 2.127926 LR: 0.00002214 [12:30:38] Epoch: 1 Batch: 10156/20099 (50.53%) Loss: 2.195174 LR: 0.00002214 [12:30:40] Epoch: 1 Batch: 10157/20099 (50.53%) Loss: 2.381013 LR: 0.00002214 [12:30:42] Epoch: 1 Batch: 10158/20099 (50.54%) Loss: 1.954852 LR: 0.00002214 [12:30:43] Epoch: 1 Batch: 10159/20099 (50.54%) Loss: 2.530523 LR: 0.00002214 [12:30:45] Epoch: 1 Batch: 10160/20099 (50.55%) Loss: 2.454019 LR: 0.00002213 [12:30:47] Epoch: 1 Batch: 10161/20099 (50.55%) Loss: 2.033713 LR: 0.00002213 [12:30:49] Epoch: 1 Batch: 10162/20099 (50.56%) Loss: 1.986884 LR: 
0.00002213 [12:30:51] Epoch: 1 Batch: 10163/20099 (50.56%) Loss: 2.241462 LR: 0.00002213 [12:30:52] Epoch: 1 Batch: 10164/20099 (50.57%) Loss: 2.111174 LR: 0.00002213 [12:30:54] Epoch: 1 Batch: 10165/20099 (50.57%) Loss: 2.218716 LR: 0.00002213 [12:30:56] Epoch: 1 Batch: 10166/20099 (50.58%) Loss: 2.135056 LR: 0.00002213 [12:30:58] Epoch: 1 Batch: 10167/20099 (50.58%) Loss: 1.764461 LR: 0.00002211 [12:30:59] Epoch: 1 Batch: 10168/20099 (50.59%) Loss: 2.258940 LR: 0.00002211 [12:31:01] Epoch: 1 Batch: 10169/20099 (50.59%) Loss: 2.044407 LR: 0.00002211 [12:31:03] Epoch: 1 Batch: 10170/20099 (50.60%) Loss: 2.063094 LR: 0.00002211 [12:31:05] Epoch: 1 Batch: 10171/20099 (50.60%) Loss: 1.949757 LR: 0.00002211 [12:31:06] Epoch: 1 Batch: 10172/20099 (50.61%) Loss: 2.082707 LR: 0.00002211 [12:31:08] Epoch: 1 Batch: 10173/20099 (50.61%) Loss: 2.113373 LR: 0.00002211 [12:31:10] Epoch: 1 Batch: 10174/20099 (50.62%) Loss: 2.191147 LR: 0.00002210 [12:31:12] Epoch: 1 Batch: 10175/20099 (50.62%) Loss: 2.331698 LR: 0.00002210 [12:31:14] Epoch: 1 Batch: 10176/20099 (50.63%) Loss: 2.304794 LR: 0.00002210 [12:31:15] Epoch: 1 Batch: 10177/20099 (50.63%) Loss: 2.242102 LR: 0.00002210 [12:31:17] Epoch: 1 Batch: 10178/20099 (50.64%) Loss: 2.312136 LR: 0.00002210 [12:31:19] Epoch: 1 Batch: 10179/20099 (50.64%) Loss: 2.077174 LR: 0.00002210 [12:31:21] Epoch: 1 Batch: 10180/20099 (50.65%) Loss: 2.076869 LR: 0.00002210 [12:31:22] Epoch: 1 Batch: 10181/20099 (50.65%) Loss: 2.181428 LR: 0.00002208 [12:31:24] Epoch: 1 Batch: 10182/20099 (50.66%) Loss: 2.125516 LR: 0.00002208 [12:31:26] Epoch: 1 Batch: 10183/20099 (50.66%) Loss: 2.280057 LR: 0.00002208 [12:31:28] Epoch: 1 Batch: 10184/20099 (50.67%) Loss: 2.203995 LR: 0.00002208 [12:31:30] Epoch: 1 Batch: 10185/20099 (50.67%) Loss: 1.967923 LR: 0.00002208 [12:31:31] Epoch: 1 Batch: 10186/20099 (50.68%) Loss: 1.853456 LR: 0.00002208 [12:31:33] Epoch: 1 Batch: 10187/20099 (50.68%) Loss: 2.328783 LR: 0.00002208 [12:31:35] Epoch: 1 Batch: 10188/20099 
(50.69%) Loss: 2.249911 LR: 0.00002207 [12:31:37] Epoch: 1 Batch: 10189/20099 (50.69%) Loss: 1.974304 LR: 0.00002207 [12:31:38] Epoch: 1 Batch: 10190/20099 (50.70%) Loss: 2.268555 LR: 0.00002207 [12:31:40] Epoch: 1 Batch: 10191/20099 (50.70%) Loss: 2.112212 LR: 0.00002207 [12:31:42] Epoch: 1 Batch: 10192/20099 (50.71%) Loss: 1.898471 LR: 0.00002207 [12:31:44] Epoch: 1 Batch: 10193/20099 (50.71%) Loss: 2.288927 LR: 0.00002207 [12:31:46] Epoch: 1 Batch: 10194/20099 (50.72%) Loss: 2.493155 LR: 0.00002207 [12:31:47] Epoch: 1 Batch: 10195/20099 (50.72%) Loss: 2.076202 LR: 0.00002205 [12:31:49] Epoch: 1 Batch: 10196/20099 (50.73%) Loss: 2.083090 LR: 0.00002205 [12:31:51] Epoch: 1 Batch: 10197/20099 (50.73%) Loss: 2.000123 LR: 0.00002205 [12:31:53] Epoch: 1 Batch: 10198/20099 (50.74%) Loss: 1.844868 LR: 0.00002205 [12:31:54] Epoch: 1 Batch: 10199/20099 (50.74%) Loss: 2.204826 LR: 0.00002205 [12:32:00] >> Cleaned up old temp checkpoint: epoch1_step8200 [12:32:00] >> Temp checkpoint saved: epoch1_step10200, size: 0.1693 GB [12:32:00] Epoch: 1 Batch: 10200/20099 (50.75%) Loss: 1.822114 LR: 0.00002205 [12:32:02] Epoch: 1 Batch: 10201/20099 (50.75%) Loss: 1.940187 LR: 0.00002205 [12:32:03] Epoch: 1 Batch: 10202/20099 (50.76%) Loss: 1.559281 LR: 0.00002204 [12:32:05] Epoch: 1 Batch: 10203/20099 (50.76%) Loss: 2.064336 LR: 0.00002204 [12:32:07] Epoch: 1 Batch: 10204/20099 (50.77%) Loss: 2.008357 LR: 0.00002204 [12:32:09] Epoch: 1 Batch: 10205/20099 (50.77%) Loss: 2.183630 LR: 0.00002204 [12:32:10] Epoch: 1 Batch: 10206/20099 (50.78%) Loss: 2.010585 LR: 0.00002204 [12:32:12] Epoch: 1 Batch: 10207/20099 (50.78%) Loss: 1.887463 LR: 0.00002204 [12:32:14] Epoch: 1 Batch: 10208/20099 (50.79%) Loss: 2.308248 LR: 0.00002204 [12:32:16] Epoch: 1 Batch: 10209/20099 (50.79%) Loss: 2.325475 LR: 0.00002202 [12:32:18] Epoch: 1 Batch: 10210/20099 (50.80%) Loss: 1.898462 LR: 0.00002202 [12:32:19] Epoch: 1 Batch: 10211/20099 (50.80%) Loss: 2.313471 LR: 0.00002202 [12:32:21] Epoch: 1 Batch: 
10212/20099 (50.81%) Loss: 2.315353 LR: 0.00002202 [12:32:23] Epoch: 1 Batch: 10213/20099 (50.81%) Loss: 2.193127 LR: 0.00002202 [12:32:25] Epoch: 1 Batch: 10214/20099 (50.82%) Loss: 2.156697 LR: 0.00002202 [12:32:27] Epoch: 1 Batch: 10215/20099 (50.82%) Loss: 2.251478 LR: 0.00002202 [12:32:28] Epoch: 1 Batch: 10216/20099 (50.83%) Loss: 2.043884 LR: 0.00002201 [12:32:30] Epoch: 1 Batch: 10217/20099 (50.83%) Loss: 2.243955 LR: 0.00002201 [12:32:32] Epoch: 1 Batch: 10218/20099 (50.84%) Loss: 1.720449 LR: 0.00002201 [12:32:34] Epoch: 1 Batch: 10219/20099 (50.84%) Loss: 2.133950 LR: 0.00002201 [12:32:36] Epoch: 1 Batch: 10220/20099 (50.85%) Loss: 1.886939 LR: 0.00002201 [12:32:37] Epoch: 1 Batch: 10221/20099 (50.85%) Loss: 2.048932 LR: 0.00002201 [12:32:39] Epoch: 1 Batch: 10222/20099 (50.86%) Loss: 2.002116 LR: 0.00002201 [12:32:41] Epoch: 1 Batch: 10223/20099 (50.86%) Loss: 2.014609 LR: 0.00002199 [12:32:43] Epoch: 1 Batch: 10224/20099 (50.87%) Loss: 1.996554 LR: 0.00002199 [12:32:44] Epoch: 1 Batch: 10225/20099 (50.87%) Loss: 2.276937 LR: 0.00002199 [12:32:46] Epoch: 1 Batch: 10226/20099 (50.88%) Loss: 2.177308 LR: 0.00002199 [12:32:48] Epoch: 1 Batch: 10227/20099 (50.88%) Loss: 2.173513 LR: 0.00002199 [12:32:50] Epoch: 1 Batch: 10228/20099 (50.89%) Loss: 2.040606 LR: 0.00002199 [12:32:52] Epoch: 1 Batch: 10229/20099 (50.89%) Loss: 2.269064 LR: 0.00002199 [12:32:53] Epoch: 1 Batch: 10230/20099 (50.90%) Loss: 2.346204 LR: 0.00002198 [12:32:55] Epoch: 1 Batch: 10231/20099 (50.90%) Loss: 2.217344 LR: 0.00002198 [12:32:57] Epoch: 1 Batch: 10232/20099 (50.91%) Loss: 2.331136 LR: 0.00002198 [12:32:59] Epoch: 1 Batch: 10233/20099 (50.91%) Loss: 2.364319 LR: 0.00002198 [12:33:01] Epoch: 1 Batch: 10234/20099 (50.92%) Loss: 2.142001 LR: 0.00002198 [12:33:02] Epoch: 1 Batch: 10235/20099 (50.92%) Loss: 2.154401 LR: 0.00002198 [12:33:04] Epoch: 1 Batch: 10236/20099 (50.93%) Loss: 2.088140 LR: 0.00002198 [12:33:06] Epoch: 1 Batch: 10237/20099 (50.93%) Loss: 1.892968 LR: 
0.00002196 [12:33:08] Epoch: 1 Batch: 10238/20099 (50.94%) Loss: 2.155104 LR: 0.00002196 [12:33:09] Epoch: 1 Batch: 10239/20099 (50.94%) Loss: 2.206733 LR: 0.00002196 [12:33:11] Epoch: 1 Batch: 10240/20099 (50.95%) Loss: 1.854820 LR: 0.00002196 [12:33:13] Epoch: 1 Batch: 10241/20099 (50.95%) Loss: 2.110201 LR: 0.00002196 [12:33:15] Epoch: 1 Batch: 10242/20099 (50.96%) Loss: 2.182367 LR: 0.00002196 [12:33:16] Epoch: 1 Batch: 10243/20099 (50.96%) Loss: 2.123262 LR: 0.00002196 [12:33:18] Epoch: 1 Batch: 10244/20099 (50.97%) Loss: 2.082731 LR: 0.00002195 [12:33:20] Epoch: 1 Batch: 10245/20099 (50.97%) Loss: 2.214356 LR: 0.00002195 [12:33:22] Epoch: 1 Batch: 10246/20099 (50.98%) Loss: 2.228073 LR: 0.00002195 [12:33:23] Epoch: 1 Batch: 10247/20099 (50.98%) Loss: 2.368850 LR: 0.00002195 [12:33:25] Epoch: 1 Batch: 10248/20099 (50.99%) Loss: 1.842714 LR: 0.00002195 [12:33:27] Epoch: 1 Batch: 10249/20099 (50.99%) Loss: 2.141127 LR: 0.00002195 [12:33:29] Epoch: 1 Batch: 10250/20099 (51.00%) Loss: 2.095977 LR: 0.00002195 [12:33:31] Epoch: 1 Batch: 10251/20099 (51.00%) Loss: 2.031660 LR: 0.00002193 [12:33:32] Epoch: 1 Batch: 10252/20099 (51.01%) Loss: 2.070159 LR: 0.00002193 [12:33:34] Epoch: 1 Batch: 10253/20099 (51.01%) Loss: 2.254473 LR: 0.00002193 [12:33:36] Epoch: 1 Batch: 10254/20099 (51.02%) Loss: 1.907931 LR: 0.00002193 [12:33:38] Epoch: 1 Batch: 10255/20099 (51.02%) Loss: 2.152711 LR: 0.00002193 [12:33:39] Epoch: 1 Batch: 10256/20099 (51.03%) Loss: 2.090043 LR: 0.00002193 [12:33:41] Epoch: 1 Batch: 10257/20099 (51.03%) Loss: 1.957006 LR: 0.00002193 [12:33:43] Epoch: 1 Batch: 10258/20099 (51.04%) Loss: 2.317679 LR: 0.00002191 [12:33:45] Epoch: 1 Batch: 10259/20099 (51.04%) Loss: 2.309538 LR: 0.00002191 [12:33:46] Epoch: 1 Batch: 10260/20099 (51.05%) Loss: 2.059008 LR: 0.00002191 [12:33:48] Epoch: 1 Batch: 10261/20099 (51.05%) Loss: 2.061834 LR: 0.00002191 [12:33:50] Epoch: 1 Batch: 10262/20099 (51.06%) Loss: 2.037491 LR: 0.00002191 [12:33:52] Epoch: 1 Batch: 10263/20099 
(51.06%) Loss: 2.262705 LR: 0.00002191 [12:33:54] Epoch: 1 Batch: 10264/20099 (51.07%) Loss: 2.015163 LR: 0.00002191 [12:33:55] Epoch: 1 Batch: 10265/20099 (51.07%) Loss: 2.084115 LR: 0.00002190 [12:33:57] Epoch: 1 Batch: 10266/20099 (51.08%) Loss: 2.281603 LR: 0.00002190 [12:33:59] Epoch: 1 Batch: 10267/20099 (51.08%) Loss: 1.949597 LR: 0.00002190 [12:34:01] Epoch: 1 Batch: 10268/20099 (51.09%) Loss: 2.048435 LR: 0.00002190 [12:34:03] Epoch: 1 Batch: 10269/20099 (51.09%) Loss: 2.297588 LR: 0.00002190 [12:34:04] Epoch: 1 Batch: 10270/20099 (51.10%) Loss: 2.230838 LR: 0.00002190 [12:34:06] Epoch: 1 Batch: 10271/20099 (51.10%) Loss: 2.125212 LR: 0.00002190 [12:34:08] Epoch: 1 Batch: 10272/20099 (51.11%) Loss: 2.234109 LR: 0.00002188 [12:34:10] Epoch: 1 Batch: 10273/20099 (51.11%) Loss: 2.192349 LR: 0.00002188 [12:34:11] Epoch: 1 Batch: 10274/20099 (51.12%) Loss: 2.434985 LR: 0.00002188 [12:34:13] Epoch: 1 Batch: 10275/20099 (51.12%) Loss: 2.104143 LR: 0.00002188 [12:34:15] Epoch: 1 Batch: 10276/20099 (51.13%) Loss: 2.197866 LR: 0.00002188 [12:34:17] Epoch: 1 Batch: 10277/20099 (51.13%) Loss: 1.888504 LR: 0.00002188 [12:34:19] Epoch: 1 Batch: 10278/20099 (51.14%) Loss: 1.848029 LR: 0.00002188 [12:34:20] Epoch: 1 Batch: 10279/20099 (51.14%) Loss: 2.215764 LR: 0.00002187 [12:34:22] Epoch: 1 Batch: 10280/20099 (51.15%) Loss: 2.130677 LR: 0.00002187 [12:34:24] Epoch: 1 Batch: 10281/20099 (51.15%) Loss: 2.071054 LR: 0.00002187 [12:34:26] Epoch: 1 Batch: 10282/20099 (51.16%) Loss: 1.814594 LR: 0.00002187 [12:34:27] Epoch: 1 Batch: 10283/20099 (51.16%) Loss: 2.212279 LR: 0.00002187 [12:34:29] Epoch: 1 Batch: 10284/20099 (51.17%) Loss: 2.164334 LR: 0.00002187 [12:34:31] Epoch: 1 Batch: 10285/20099 (51.17%) Loss: 2.106360 LR: 0.00002187 [12:34:33] Epoch: 1 Batch: 10286/20099 (51.18%) Loss: 2.326935 LR: 0.00002185 [12:34:35] Epoch: 1 Batch: 10287/20099 (51.18%) Loss: 2.045877 LR: 0.00002185 [12:34:36] Epoch: 1 Batch: 10288/20099 (51.19%) Loss: 2.022226 LR: 0.00002185 [12:34:38] 
Epoch: 1 Batch: 10289/20099 (51.19%) Loss: 1.972576 LR: 0.00002185 [12:34:40] Epoch: 1 Batch: 10290/20099 (51.20%) Loss: 1.823125 LR: 0.00002185 [12:34:42] Epoch: 1 Batch: 10291/20099 (51.20%) Loss: 2.004529 LR: 0.00002185 [12:34:44] Epoch: 1 Batch: 10292/20099 (51.21%) Loss: 2.006238 LR: 0.00002185 [12:34:45] Epoch: 1 Batch: 10293/20099 (51.21%) Loss: 2.182030 LR: 0.00002184 [12:34:47] Epoch: 1 Batch: 10294/20099 (51.22%) Loss: 2.212804 LR: 0.00002184 [12:34:49] Epoch: 1 Batch: 10295/20099 (51.22%) Loss: 1.853108 LR: 0.00002184 [12:34:51] Epoch: 1 Batch: 10296/20099 (51.23%) Loss: 2.284854 LR: 0.00002184 [12:34:52] Epoch: 1 Batch: 10297/20099 (51.23%) Loss: 2.055839 LR: 0.00002184 [12:34:54] Epoch: 1 Batch: 10298/20099 (51.24%) Loss: 2.110389 LR: 0.00002184 [12:34:56] Epoch: 1 Batch: 10299/20099 (51.24%) Loss: 2.157324 LR: 0.00002184 [12:34:58] Epoch: 1 Batch: 10300/20099 (51.25%) Loss: 1.768042 LR: 0.00002182 [12:35:00] Epoch: 1 Batch: 10301/20099 (51.25%) Loss: 2.050135 LR: 0.00002182 [12:35:01] Epoch: 1 Batch: 10302/20099 (51.26%) Loss: 2.194338 LR: 0.00002182 [12:35:03] Epoch: 1 Batch: 10303/20099 (51.26%) Loss: 2.065015 LR: 0.00002182 [12:35:05] Epoch: 1 Batch: 10304/20099 (51.27%) Loss: 2.077620 LR: 0.00002182 [12:35:07] Epoch: 1 Batch: 10305/20099 (51.27%) Loss: 2.238293 LR: 0.00002182 [12:35:08] Epoch: 1 Batch: 10306/20099 (51.28%) Loss: 2.112545 LR: 0.00002182 [12:35:10] Epoch: 1 Batch: 10307/20099 (51.28%) Loss: 1.879103 LR: 0.00002181 [12:35:12] Epoch: 1 Batch: 10308/20099 (51.29%) Loss: 2.225071 LR: 0.00002181 [12:35:14] Epoch: 1 Batch: 10309/20099 (51.29%) Loss: 2.115066 LR: 0.00002181 [12:35:16] Epoch: 1 Batch: 10310/20099 (51.30%) Loss: 1.897396 LR: 0.00002181 [12:35:17] Epoch: 1 Batch: 10311/20099 (51.30%) Loss: 2.047663 LR: 0.00002181 [12:35:19] Epoch: 1 Batch: 10312/20099 (51.31%) Loss: 1.946905 LR: 0.00002181 [12:35:21] Epoch: 1 Batch: 10313/20099 (51.31%) Loss: 1.810385 LR: 0.00002181 [12:35:23] Epoch: 1 Batch: 10314/20099 (51.32%) Loss: 
2.035806 LR: 0.00002179 [12:35:24] Epoch: 1 Batch: 10315/20099 (51.32%) Loss: 2.311561 LR: 0.00002179 [12:35:26] Epoch: 1 Batch: 10316/20099 (51.33%) Loss: 2.125880 LR: 0.00002179 [12:35:28] Epoch: 1 Batch: 10317/20099 (51.33%) Loss: 2.453233 LR: 0.00002179 [12:35:30] Epoch: 1 Batch: 10318/20099 (51.34%) Loss: 2.120188 LR: 0.00002179 [12:35:32] Epoch: 1 Batch: 10319/20099 (51.34%) Loss: 2.069199 LR: 0.00002179 [12:35:33] Epoch: 1 Batch: 10320/20099 (51.35%) Loss: 2.189836 LR: 0.00002179 [12:35:35] Epoch: 1 Batch: 10321/20099 (51.35%) Loss: 2.321538 LR: 0.00002178 [12:35:37] Epoch: 1 Batch: 10322/20099 (51.36%) Loss: 1.995060 LR: 0.00002178 [12:35:39] Epoch: 1 Batch: 10323/20099 (51.36%) Loss: 1.925357 LR: 0.00002178 [12:35:40] Epoch: 1 Batch: 10324/20099 (51.37%) Loss: 2.095595 LR: 0.00002178 [12:35:42] Epoch: 1 Batch: 10325/20099 (51.37%) Loss: 1.985288 LR: 0.00002178 [12:35:44] Epoch: 1 Batch: 10326/20099 (51.38%) Loss: 1.760827 LR: 0.00002178 [12:35:46] Epoch: 1 Batch: 10327/20099 (51.38%) Loss: 2.014076 LR: 0.00002178 [12:35:48] Epoch: 1 Batch: 10328/20099 (51.39%) Loss: 2.013000 LR: 0.00002176 [12:35:49] Epoch: 1 Batch: 10329/20099 (51.39%) Loss: 2.127565 LR: 0.00002176 [12:35:51] Epoch: 1 Batch: 10330/20099 (51.40%) Loss: 2.404293 LR: 0.00002176 [12:35:53] Epoch: 1 Batch: 10331/20099 (51.40%) Loss: 2.002283 LR: 0.00002176 [12:35:55] Epoch: 1 Batch: 10332/20099 (51.41%) Loss: 1.853055 LR: 0.00002176 [12:35:56] Epoch: 1 Batch: 10333/20099 (51.41%) Loss: 2.123050 LR: 0.00002176 [12:35:58] Epoch: 1 Batch: 10334/20099 (51.42%) Loss: 2.211385 LR: 0.00002176 [12:36:00] Epoch: 1 Batch: 10335/20099 (51.42%) Loss: 2.326090 LR: 0.00002175 [12:36:02] Epoch: 1 Batch: 10336/20099 (51.43%) Loss: 2.022330 LR: 0.00002175 [12:36:04] Epoch: 1 Batch: 10337/20099 (51.43%) Loss: 1.975691 LR: 0.00002175 [12:36:05] Epoch: 1 Batch: 10338/20099 (51.44%) Loss: 2.206053 LR: 0.00002175 [12:36:07] Epoch: 1 Batch: 10339/20099 (51.44%) Loss: 2.078422 LR: 0.00002175 [12:36:09] Epoch: 1 
Batch: 10340/20099 (51.45%) Loss: 1.879647 LR: 0.00002175 [12:36:11] Epoch: 1 Batch: 10341/20099 (51.45%) Loss: 2.153389 LR: 0.00002175 [12:36:12] Epoch: 1 Batch: 10342/20099 (51.46%) Loss: 2.249233 LR: 0.00002173 [12:36:14] Epoch: 1 Batch: 10343/20099 (51.46%) Loss: 1.876883 LR: 0.00002173 [12:36:16] Epoch: 1 Batch: 10344/20099 (51.47%) Loss: 1.958269 LR: 0.00002173 [12:36:18] Epoch: 1 Batch: 10345/20099 (51.47%) Loss: 1.858542 LR: 0.00002173 [12:36:20] Epoch: 1 Batch: 10346/20099 (51.48%) Loss: 1.934578 LR: 0.00002173 [12:36:21] Epoch: 1 Batch: 10347/20099 (51.48%) Loss: 1.950093 LR: 0.00002173 [12:36:23] Epoch: 1 Batch: 10348/20099 (51.49%) Loss: 1.777596 LR: 0.00002173 [12:36:25] Epoch: 1 Batch: 10349/20099 (51.49%) Loss: 2.050186 LR: 0.00002171 [12:36:27] Epoch: 1 Batch: 10350/20099 (51.50%) Loss: 2.041305 LR: 0.00002171 [12:36:28] Epoch: 1 Batch: 10351/20099 (51.50%) Loss: 2.356239 LR: 0.00002171 [12:36:30] Epoch: 1 Batch: 10352/20099 (51.51%) Loss: 2.107040 LR: 0.00002171 [12:36:32] Epoch: 1 Batch: 10353/20099 (51.51%) Loss: 2.032933 LR: 0.00002171 [12:36:34] Epoch: 1 Batch: 10354/20099 (51.52%) Loss: 2.254959 LR: 0.00002171 [12:36:36] Epoch: 1 Batch: 10355/20099 (51.52%) Loss: 2.237024 LR: 0.00002171 [12:36:37] Epoch: 1 Batch: 10356/20099 (51.52%) Loss: 2.105427 LR: 0.00002170 [12:36:39] Epoch: 1 Batch: 10357/20099 (51.53%) Loss: 2.006105 LR: 0.00002170 [12:36:41] Epoch: 1 Batch: 10358/20099 (51.53%) Loss: 2.003927 LR: 0.00002170 [12:36:43] Epoch: 1 Batch: 10359/20099 (51.54%) Loss: 2.092793 LR: 0.00002170 [12:36:44] Epoch: 1 Batch: 10360/20099 (51.54%) Loss: 2.287009 LR: 0.00002170 [12:36:46] Epoch: 1 Batch: 10361/20099 (51.55%) Loss: 2.411765 LR: 0.00002170 [12:36:48] Epoch: 1 Batch: 10362/20099 (51.55%) Loss: 2.061428 LR: 0.00002170 [12:36:50] Epoch: 1 Batch: 10363/20099 (51.56%) Loss: 2.038290 LR: 0.00002168 [12:36:51] Epoch: 1 Batch: 10364/20099 (51.56%) Loss: 2.104352 LR: 0.00002168 [12:36:53] Epoch: 1 Batch: 10365/20099 (51.57%) Loss: 2.272573 LR: 
0.00002168 [12:36:55] Epoch: 1 Batch: 10366/20099 (51.57%) Loss: 2.434746 LR: 0.00002168 [12:36:57] Epoch: 1 Batch: 10367/20099 (51.58%) Loss: 2.014027 LR: 0.00002168 [12:36:59] Epoch: 1 Batch: 10368/20099 (51.58%) Loss: 2.176905 LR: 0.00002168 [12:37:00] Epoch: 1 Batch: 10369/20099 (51.59%) Loss: 2.013161 LR: 0.00002168 [12:37:02] Epoch: 1 Batch: 10370/20099 (51.59%) Loss: 2.147414 LR: 0.00002167 [12:37:04] Epoch: 1 Batch: 10371/20099 (51.60%) Loss: 2.207065 LR: 0.00002167 [12:37:06] Epoch: 1 Batch: 10372/20099 (51.60%) Loss: 2.296807 LR: 0.00002167 [12:37:07] Epoch: 1 Batch: 10373/20099 (51.61%) Loss: 2.158292 LR: 0.00002167 [12:37:09] Epoch: 1 Batch: 10374/20099 (51.61%) Loss: 2.218572 LR: 0.00002167 [12:37:11] Epoch: 1 Batch: 10375/20099 (51.62%) Loss: 1.735222 LR: 0.00002167 [12:37:13] Epoch: 1 Batch: 10376/20099 (51.62%) Loss: 2.286353 LR: 0.00002167 [12:37:15] Epoch: 1 Batch: 10377/20099 (51.63%) Loss: 1.821790 LR: 0.00002165 [12:37:16] Epoch: 1 Batch: 10378/20099 (51.63%) Loss: 2.128585 LR: 0.00002165 [12:37:18] Epoch: 1 Batch: 10379/20099 (51.64%) Loss: 2.153456 LR: 0.00002165 [12:37:20] Epoch: 1 Batch: 10380/20099 (51.64%) Loss: 1.952369 LR: 0.00002165 [12:37:22] Epoch: 1 Batch: 10381/20099 (51.65%) Loss: 2.398804 LR: 0.00002165 [12:37:23] Epoch: 1 Batch: 10382/20099 (51.65%) Loss: 1.960120 LR: 0.00002165 [12:37:25] Epoch: 1 Batch: 10383/20099 (51.66%) Loss: 2.037566 LR: 0.00002165 [12:37:27] Epoch: 1 Batch: 10384/20099 (51.66%) Loss: 1.743765 LR: 0.00002164 [12:37:29] Epoch: 1 Batch: 10385/20099 (51.67%) Loss: 2.090971 LR: 0.00002164 [12:37:30] Epoch: 1 Batch: 10386/20099 (51.67%) Loss: 1.996231 LR: 0.00002164 [12:37:32] Epoch: 1 Batch: 10387/20099 (51.68%) Loss: 2.025589 LR: 0.00002164 [12:37:34] Epoch: 1 Batch: 10388/20099 (51.68%) Loss: 2.106706 LR: 0.00002164 [12:37:36] Epoch: 1 Batch: 10389/20099 (51.69%) Loss: 2.175152 LR: 0.00002164 [12:37:38] Epoch: 1 Batch: 10390/20099 (51.69%) Loss: 2.141534 LR: 0.00002164 [12:37:39] Epoch: 1 Batch: 10391/20099 
(51.70%) Loss: 2.125543 LR: 0.00002162 [12:37:41] Epoch: 1 Batch: 10392/20099 (51.70%) Loss: 2.015568 LR: 0.00002162 [12:37:43] Epoch: 1 Batch: 10393/20099 (51.71%) Loss: 1.880584 LR: 0.00002162 [12:37:45] Epoch: 1 Batch: 10394/20099 (51.71%) Loss: 2.028400 LR: 0.00002162 [12:37:46] Epoch: 1 Batch: 10395/20099 (51.72%) Loss: 2.278065 LR: 0.00002162 [12:37:48] Epoch: 1 Batch: 10396/20099 (51.72%) Loss: 2.100892 LR: 0.00002162 [12:37:50] Epoch: 1 Batch: 10397/20099 (51.73%) Loss: 2.186100 LR: 0.00002162 [12:37:52] Epoch: 1 Batch: 10398/20099 (51.73%) Loss: 2.276024 LR: 0.00002161 [12:37:54] Epoch: 1 Batch: 10399/20099 (51.74%) Loss: 2.008871 LR: 0.00002161 [12:37:59] >> Cleaned up old temp checkpoint: epoch1_step8400 [12:37:59] >> Temp checkpoint saved: epoch1_step10400, size: 0.1693 GB [12:37:59] Epoch: 1 Batch: 10400/20099 (51.74%) Loss: 2.455716 LR: 0.00002161 [12:38:01] Epoch: 1 Batch: 10401/20099 (51.75%) Loss: 2.046143 LR: 0.00002161 [12:38:02] Epoch: 1 Batch: 10402/20099 (51.75%) Loss: 2.060118 LR: 0.00002161 [12:38:04] Epoch: 1 Batch: 10403/20099 (51.76%) Loss: 1.947907 LR: 0.00002161 [12:38:06] Epoch: 1 Batch: 10404/20099 (51.76%) Loss: 2.440352 LR: 0.00002161 [12:38:08] Epoch: 1 Batch: 10405/20099 (51.77%) Loss: 2.098964 LR: 0.00002159 [12:38:09] Epoch: 1 Batch: 10406/20099 (51.77%) Loss: 2.178732 LR: 0.00002159 [12:38:11] Epoch: 1 Batch: 10407/20099 (51.78%) Loss: 2.139475 LR: 0.00002159 [12:38:13] Epoch: 1 Batch: 10408/20099 (51.78%) Loss: 1.922473 LR: 0.00002159 [12:38:15] Epoch: 1 Batch: 10409/20099 (51.79%) Loss: 1.991336 LR: 0.00002159 [12:38:16] Epoch: 1 Batch: 10410/20099 (51.79%) Loss: 2.100329 LR: 0.00002159 [12:38:18] Epoch: 1 Batch: 10411/20099 (51.80%) Loss: 2.065415 LR: 0.00002159 [12:38:20] Epoch: 1 Batch: 10412/20099 (51.80%) Loss: 2.133376 LR: 0.00002158 [12:38:22] Epoch: 1 Batch: 10413/20099 (51.81%) Loss: 2.259407 LR: 0.00002158 [12:38:24] Epoch: 1 Batch: 10414/20099 (51.81%) Loss: 2.072116 LR: 0.00002158 [12:38:25] Epoch: 1 Batch: 
10415/20099 (51.82%) Loss: 2.089845 LR: 0.00002158 [12:38:27] Epoch: 1 Batch: 10416/20099 (51.82%) Loss: 2.112533 LR: 0.00002158 [12:38:29] Epoch: 1 Batch: 10417/20099 (51.83%) Loss: 2.063540 LR: 0.00002158 [12:38:31] Epoch: 1 Batch: 10418/20099 (51.83%) Loss: 2.163007 LR: 0.00002158 [12:38:33] Epoch: 1 Batch: 10419/20099 (51.84%) Loss: 1.749084 LR: 0.00002156 [12:38:34] Epoch: 1 Batch: 10420/20099 (51.84%) Loss: 2.050811 LR: 0.00002156 [12:38:36] Epoch: 1 Batch: 10421/20099 (51.85%) Loss: 1.822824 LR: 0.00002156 [12:38:38] Epoch: 1 Batch: 10422/20099 (51.85%) Loss: 1.853037 LR: 0.00002156 [12:38:40] Epoch: 1 Batch: 10423/20099 (51.86%) Loss: 2.047303 LR: 0.00002156 [12:38:42] Epoch: 1 Batch: 10424/20099 (51.86%) Loss: 2.075544 LR: 0.00002156 [12:38:43] Epoch: 1 Batch: 10425/20099 (51.87%) Loss: 2.101726 LR: 0.00002156 [12:38:45] Epoch: 1 Batch: 10426/20099 (51.87%) Loss: 1.991239 LR: 0.00002154 [12:38:47] Epoch: 1 Batch: 10427/20099 (51.88%) Loss: 1.990588 LR: 0.00002154 [12:38:49] Epoch: 1 Batch: 10428/20099 (51.88%) Loss: 1.989367 LR: 0.00002154 [12:38:51] Epoch: 1 Batch: 10429/20099 (51.89%) Loss: 1.881629 LR: 0.00002154 [12:38:52] Epoch: 1 Batch: 10430/20099 (51.89%) Loss: 1.713953 LR: 0.00002154 [12:38:54] Epoch: 1 Batch: 10431/20099 (51.90%) Loss: 1.925071 LR: 0.00002154 [12:38:56] Epoch: 1 Batch: 10432/20099 (51.90%) Loss: 1.847067 LR: 0.00002154 [12:38:58] Epoch: 1 Batch: 10433/20099 (51.91%) Loss: 1.998695 LR: 0.00002153 [12:38:59] Epoch: 1 Batch: 10434/20099 (51.91%) Loss: 2.079140 LR: 0.00002153 [12:39:01] Epoch: 1 Batch: 10435/20099 (51.92%) Loss: 1.877827 LR: 0.00002153 [12:39:03] Epoch: 1 Batch: 10436/20099 (51.92%) Loss: 1.870346 LR: 0.00002153 [12:39:05] Epoch: 1 Batch: 10437/20099 (51.93%) Loss: 1.988331 LR: 0.00002153 [12:39:06] Epoch: 1 Batch: 10438/20099 (51.93%) Loss: 2.239874 LR: 0.00002153 [12:39:08] Epoch: 1 Batch: 10439/20099 (51.94%) Loss: 1.920635 LR: 0.00002153 [12:39:10] Epoch: 1 Batch: 10440/20099 (51.94%) Loss: 2.222629 LR: 
0.00002151 [12:39:12] Epoch: 1 Batch: 10441/20099 (51.95%) Loss: 2.209567 LR: 0.00002151 [12:39:13] Epoch: 1 Batch: 10442/20099 (51.95%) Loss: 1.917642 LR: 0.00002151 [12:39:15] Epoch: 1 Batch: 10443/20099 (51.96%) Loss: 2.217182 LR: 0.00002151 [12:39:17] Epoch: 1 Batch: 10444/20099 (51.96%) Loss: 1.961433 LR: 0.00002151 [12:39:19] Epoch: 1 Batch: 10445/20099 (51.97%) Loss: 2.043659 LR: 0.00002151 [12:39:20] Epoch: 1 Batch: 10446/20099 (51.97%) Loss: 2.149435 LR: 0.00002151 [12:39:22] Epoch: 1 Batch: 10447/20099 (51.98%) Loss: 2.133778 LR: 0.00002150 [12:39:24] Epoch: 1 Batch: 10448/20099 (51.98%) Loss: 2.019760 LR: 0.00002150 [12:39:26] Epoch: 1 Batch: 10449/20099 (51.99%) Loss: 2.259640 LR: 0.00002150 [12:39:28] Epoch: 1 Batch: 10450/20099 (51.99%) Loss: 2.212079 LR: 0.00002150 [12:39:29] Epoch: 1 Batch: 10451/20099 (52.00%) Loss: 1.876225 LR: 0.00002150 [12:39:31] Epoch: 1 Batch: 10452/20099 (52.00%) Loss: 2.136570 LR: 0.00002150 [12:39:33] Epoch: 1 Batch: 10453/20099 (52.01%) Loss: 2.260218 LR: 0.00002150 [12:39:35] Epoch: 1 Batch: 10454/20099 (52.01%) Loss: 1.968714 LR: 0.00002148 [12:39:36] Epoch: 1 Batch: 10455/20099 (52.02%) Loss: 2.147729 LR: 0.00002148 [12:39:38] Epoch: 1 Batch: 10456/20099 (52.02%) Loss: 2.196300 LR: 0.00002148 [12:39:40] Epoch: 1 Batch: 10457/20099 (52.03%) Loss: 1.866500 LR: 0.00002148 [12:39:42] Epoch: 1 Batch: 10458/20099 (52.03%) Loss: 2.109551 LR: 0.00002148 [12:39:43] Epoch: 1 Batch: 10459/20099 (52.04%) Loss: 2.244141 LR: 0.00002148 [12:39:45] Epoch: 1 Batch: 10460/20099 (52.04%) Loss: 1.886133 LR: 0.00002148 [12:39:47] Epoch: 1 Batch: 10461/20099 (52.05%) Loss: 2.196221 LR: 0.00002147 [12:39:49] Epoch: 1 Batch: 10462/20099 (52.05%) Loss: 2.192221 LR: 0.00002147 [12:39:51] Epoch: 1 Batch: 10463/20099 (52.06%) Loss: 2.087475 LR: 0.00002147 [12:39:52] Epoch: 1 Batch: 10464/20099 (52.06%) Loss: 2.141432 LR: 0.00002147 [12:39:54] Epoch: 1 Batch: 10465/20099 (52.07%) Loss: 2.258586 LR: 0.00002147 [12:39:56] Epoch: 1 Batch: 10466/20099 
(52.07%) Loss: 1.916189 LR: 0.00002147 [12:39:58] Epoch: 1 Batch: 10467/20099 (52.08%) Loss: 2.090657 LR: 0.00002147 [12:39:59] Epoch: 1 Batch: 10468/20099 (52.08%) Loss: 2.074265 LR: 0.00002145 [12:40:01] Epoch: 1 Batch: 10469/20099 (52.09%) Loss: 2.372570 LR: 0.00002145 [12:40:03] Epoch: 1 Batch: 10470/20099 (52.09%) Loss: 2.115500 LR: 0.00002145 [12:40:05] Epoch: 1 Batch: 10471/20099 (52.10%) Loss: 2.135011 LR: 0.00002145 [12:40:06] Epoch: 1 Batch: 10472/20099 (52.10%) Loss: 1.984574 LR: 0.00002145 [12:40:08] Epoch: 1 Batch: 10473/20099 (52.11%) Loss: 2.235359 LR: 0.00002145 [12:40:10] Epoch: 1 Batch: 10474/20099 (52.11%) Loss: 2.126053 LR: 0.00002145 [12:40:12] Epoch: 1 Batch: 10475/20099 (52.12%) Loss: 1.923983 LR: 0.00002144 [12:40:14] Epoch: 1 Batch: 10476/20099 (52.12%) Loss: 1.912723 LR: 0.00002144 [12:40:15] Epoch: 1 Batch: 10477/20099 (52.13%) Loss: 2.071522 LR: 0.00002144 [12:40:17] Epoch: 1 Batch: 10478/20099 (52.13%) Loss: 1.900655 LR: 0.00002144 [12:40:19] Epoch: 1 Batch: 10479/20099 (52.14%) Loss: 2.166101 LR: 0.00002144 [12:40:21] Epoch: 1 Batch: 10480/20099 (52.14%) Loss: 2.098050 LR: 0.00002144 [12:40:23] Epoch: 1 Batch: 10481/20099 (52.15%) Loss: 1.937467 LR: 0.00002144 [12:40:24] Epoch: 1 Batch: 10482/20099 (52.15%) Loss: 2.072272 LR: 0.00002142 [12:40:26] Epoch: 1 Batch: 10483/20099 (52.16%) Loss: 2.163943 LR: 0.00002142 [12:40:28] Epoch: 1 Batch: 10484/20099 (52.16%) Loss: 1.805678 LR: 0.00002142 [12:40:30] Epoch: 1 Batch: 10485/20099 (52.17%) Loss: 1.812973 LR: 0.00002142 [12:40:31] Epoch: 1 Batch: 10486/20099 (52.17%) Loss: 1.626138 LR: 0.00002142 [12:40:33] Epoch: 1 Batch: 10487/20099 (52.18%) Loss: 2.248865 LR: 0.00002142 [12:40:35] Epoch: 1 Batch: 10488/20099 (52.18%) Loss: 2.256725 LR: 0.00002142 [12:40:37] Epoch: 1 Batch: 10489/20099 (52.19%) Loss: 2.222520 LR: 0.00002140 [12:40:39] Epoch: 1 Batch: 10490/20099 (52.19%) Loss: 2.116989 LR: 0.00002140 [12:40:40] Epoch: 1 Batch: 10491/20099 (52.20%) Loss: 2.273890 LR: 0.00002140 [12:40:42] 
Epoch: 1 Batch: 10492/20099 (52.20%) Loss: 1.862279 LR: 0.00002140 [12:40:44] Epoch: 1 Batch: 10493/20099 (52.21%) Loss: 2.256407 LR: 0.00002140 [12:40:46] Epoch: 1 Batch: 10494/20099 (52.21%) Loss: 2.065521 LR: 0.00002140 [12:40:47] Epoch: 1 Batch: 10495/20099 (52.22%) Loss: 2.184956 LR: 0.00002140 [12:40:49] Epoch: 1 Batch: 10496/20099 (52.22%) Loss: 2.301841 LR: 0.00002139 [12:40:51] Epoch: 1 Batch: 10497/20099 (52.23%) Loss: 2.170638 LR: 0.00002139 [12:40:53] Epoch: 1 Batch: 10498/20099 (52.23%) Loss: 1.901645 LR: 0.00002139 [12:40:54] Epoch: 1 Batch: 10499/20099 (52.24%) Loss: 2.013383 LR: 0.00002139 [12:40:56] >> Evaluating batch 0 [12:40:58] >> Evaluating batch 1 [12:40:59] >> Evaluating batch 2 [12:41:00] >> Evaluating batch 3 [12:41:01] >> Evaluating batch 4 [12:41:02] >> Evaluating batch 5 [12:41:03] >> Evaluating batch 6 [12:41:04] >> Evaluating batch 7 [12:41:05] >> Evaluating batch 8 [12:41:06] >> Evaluating batch 9 [12:41:07] >> Evaluating batch 10 [12:41:08] >> Evaluating batch 11 [12:41:09] >> Evaluating batch 12 [12:41:09] >> Evaluating batch 13 [12:41:10] >> Evaluating batch 14 [12:41:11] >> Evaluating batch 15 [12:41:12] >> Evaluating batch 16 [12:41:13] Epoch: 1 Step: 10500/20099 Evaluation: [12:41:13] Avg Loss Since Last Eval: 2.0980 Val Loss: 2.1666 Validation loss delta: -0.0013 Perplexity: 8.7286 LR: 0.00002139 [12:41:16] >> Checkpoint saved: epoch1_step10500, size: 0.1693 GB [12:41:16] Epoch: 1 Batch: 10500/20099 (52.24%) Loss: 2.105795 LR: 0.00002139 [12:41:18] Epoch: 1 Batch: 10501/20099 (52.25%) Loss: 2.275038 LR: 0.00002139 [12:41:20] Epoch: 1 Batch: 10502/20099 (52.25%) Loss: 2.063590 LR: 0.00002139 [12:41:22] Epoch: 1 Batch: 10503/20099 (52.26%) Loss: 1.944690 LR: 0.00002137 [12:41:23] Epoch: 1 Batch: 10504/20099 (52.26%) Loss: 2.197471 LR: 0.00002137 [12:41:25] Epoch: 1 Batch: 10505/20099 (52.27%) Loss: 2.005412 LR: 0.00002137 [12:41:27] Epoch: 1 Batch: 10506/20099 (52.27%) Loss: 2.247791 LR: 0.00002137 [12:41:29] Epoch: 1 Batch: 
10507/20099 (52.28%) Loss: 2.047730 LR: 0.00002137 [12:41:30] Epoch: 1 Batch: 10508/20099 (52.28%) Loss: 2.246016 LR: 0.00002137 [12:41:32] Epoch: 1 Batch: 10509/20099 (52.29%) Loss: 2.154473 LR: 0.00002137 [12:41:34] Epoch: 1 Batch: 10510/20099 (52.29%) Loss: 2.257043 LR: 0.00002136 [12:41:36] Epoch: 1 Batch: 10511/20099 (52.30%) Loss: 1.845307 LR: 0.00002136 [12:41:38] Epoch: 1 Batch: 10512/20099 (52.30%) Loss: 2.389839 LR: 0.00002136 [12:41:39] Epoch: 1 Batch: 10513/20099 (52.31%) Loss: 1.959780 LR: 0.00002136 [12:41:41] Epoch: 1 Batch: 10514/20099 (52.31%) Loss: 2.152979 LR: 0.00002136 [12:41:43] Epoch: 1 Batch: 10515/20099 (52.32%) Loss: 2.144530 LR: 0.00002136 [12:41:45] Epoch: 1 Batch: 10516/20099 (52.32%) Loss: 2.195328 LR: 0.00002136 [12:41:47] Epoch: 1 Batch: 10517/20099 (52.33%) Loss: 2.063839 LR: 0.00002134 [12:41:48] Epoch: 1 Batch: 10518/20099 (52.33%) Loss: 2.272826 LR: 0.00002134 [12:41:50] Epoch: 1 Batch: 10519/20099 (52.34%) Loss: 2.413317 LR: 0.00002134 [12:41:52] Epoch: 1 Batch: 10520/20099 (52.34%) Loss: 2.135377 LR: 0.00002134 [12:41:54] Epoch: 1 Batch: 10521/20099 (52.35%) Loss: 2.441047 LR: 0.00002134 [12:41:56] Epoch: 1 Batch: 10522/20099 (52.35%) Loss: 1.931668 LR: 0.00002134 [12:41:57] Epoch: 1 Batch: 10523/20099 (52.36%) Loss: 1.971772 LR: 0.00002134 [12:41:59] Epoch: 1 Batch: 10524/20099 (52.36%) Loss: 1.909199 LR: 0.00002133 [12:42:01] Epoch: 1 Batch: 10525/20099 (52.37%) Loss: 2.177586 LR: 0.00002133 [12:42:03] Epoch: 1 Batch: 10526/20099 (52.37%) Loss: 2.091140 LR: 0.00002133 [12:42:05] Epoch: 1 Batch: 10527/20099 (52.38%) Loss: 2.057131 LR: 0.00002133 [12:42:06] Epoch: 1 Batch: 10528/20099 (52.38%) Loss: 2.353148 LR: 0.00002133 [12:42:08] Epoch: 1 Batch: 10529/20099 (52.39%) Loss: 1.966397 LR: 0.00002133 [12:42:10] Epoch: 1 Batch: 10530/20099 (52.39%) Loss: 2.049764 LR: 0.00002133 [12:42:12] Epoch: 1 Batch: 10531/20099 (52.40%) Loss: 2.055216 LR: 0.00002131 [12:42:13] Epoch: 1 Batch: 10532/20099 (52.40%) Loss: 2.429504 LR: 
0.00002131 [12:42:15] Epoch: 1 Batch: 10533/20099 (52.41%) Loss: 1.938302 LR: 0.00002131 [12:42:17] Epoch: 1 Batch: 10534/20099 (52.41%) Loss: 1.732248 LR: 0.00002131 [12:42:19] Epoch: 1 Batch: 10535/20099 (52.42%) Loss: 2.298864 LR: 0.00002131 [12:42:21] Epoch: 1 Batch: 10536/20099 (52.42%) Loss: 1.861883 LR: 0.00002131 [12:42:22] Epoch: 1 Batch: 10537/20099 (52.43%) Loss: 2.062312 LR: 0.00002131 [12:42:24] Epoch: 1 Batch: 10538/20099 (52.43%) Loss: 2.034230 LR: 0.00002129 [12:42:26] Epoch: 1 Batch: 10539/20099 (52.44%) Loss: 2.069561 LR: 0.00002129 [12:42:28] Epoch: 1 Batch: 10540/20099 (52.44%) Loss: 2.116052 LR: 0.00002129 [12:42:29] Epoch: 1 Batch: 10541/20099 (52.45%) Loss: 1.911066 LR: 0.00002129 [12:42:31] Epoch: 1 Batch: 10542/20099 (52.45%) Loss: 1.891479 LR: 0.00002129 [12:42:33] Epoch: 1 Batch: 10543/20099 (52.46%) Loss: 2.043854 LR: 0.00002129 [12:42:35] Epoch: 1 Batch: 10544/20099 (52.46%) Loss: 2.031075 LR: 0.00002129 [12:42:37] Epoch: 1 Batch: 10545/20099 (52.47%) Loss: 2.132545 LR: 0.00002128 [12:42:38] Epoch: 1 Batch: 10546/20099 (52.47%) Loss: 2.175620 LR: 0.00002128 [12:42:40] Epoch: 1 Batch: 10547/20099 (52.48%) Loss: 2.359708 LR: 0.00002128 [12:42:42] Epoch: 1 Batch: 10548/20099 (52.48%) Loss: 1.879638 LR: 0.00002128 [12:42:44] Epoch: 1 Batch: 10549/20099 (52.49%) Loss: 1.887316 LR: 0.00002128 [12:42:45] Epoch: 1 Batch: 10550/20099 (52.49%) Loss: 2.167636 LR: 0.00002128 [12:42:47] Epoch: 1 Batch: 10551/20099 (52.50%) Loss: 2.388280 LR: 0.00002128 [12:42:49] Epoch: 1 Batch: 10552/20099 (52.50%) Loss: 2.395353 LR: 0.00002126 [12:42:51] Epoch: 1 Batch: 10553/20099 (52.51%) Loss: 2.104136 LR: 0.00002126 [12:42:52] Epoch: 1 Batch: 10554/20099 (52.51%) Loss: 2.270736 LR: 0.00002126 [12:42:54] Epoch: 1 Batch: 10555/20099 (52.52%) Loss: 2.248169 LR: 0.00002126 [12:42:56] Epoch: 1 Batch: 10556/20099 (52.52%) Loss: 2.393737 LR: 0.00002126 [12:42:58] Epoch: 1 Batch: 10557/20099 (52.53%) Loss: 2.041635 LR: 0.00002126 [12:42:59] Epoch: 1 Batch: 10558/20099 
(52.53%) Loss: 2.362715 LR: 0.00002126 [12:43:01] Epoch: 1 Batch: 10559/20099 (52.53%) Loss: 2.434280 LR: 0.00002125 [12:43:03] Epoch: 1 Batch: 10560/20099 (52.54%) Loss: 1.520128 LR: 0.00002125 [12:43:05] Epoch: 1 Batch: 10561/20099 (52.54%) Loss: 2.163376 LR: 0.00002125 [12:43:07] Epoch: 1 Batch: 10562/20099 (52.55%) Loss: 1.914386 LR: 0.00002125 [12:43:08] Epoch: 1 Batch: 10563/20099 (52.55%) Loss: 1.814104 LR: 0.00002125 [12:43:10] Epoch: 1 Batch: 10564/20099 (52.56%) Loss: 2.010937 LR: 0.00002125 [12:43:12] Epoch: 1 Batch: 10565/20099 (52.56%) Loss: 2.246521 LR: 0.00002125 [12:43:14] Epoch: 1 Batch: 10566/20099 (52.57%) Loss: 2.330012 LR: 0.00002123 [12:43:15] Epoch: 1 Batch: 10567/20099 (52.57%) Loss: 2.393854 LR: 0.00002123 [12:43:17] Epoch: 1 Batch: 10568/20099 (52.58%) Loss: 2.026956 LR: 0.00002123 [12:43:19] Epoch: 1 Batch: 10569/20099 (52.58%) Loss: 2.285920 LR: 0.00002123 [12:43:21] Epoch: 1 Batch: 10570/20099 (52.59%) Loss: 1.929876 LR: 0.00002123 [12:43:23] Epoch: 1 Batch: 10571/20099 (52.59%) Loss: 2.071024 LR: 0.00002123 [12:43:24] Epoch: 1 Batch: 10572/20099 (52.60%) Loss: 2.281179 LR: 0.00002123 [12:43:26] Epoch: 1 Batch: 10573/20099 (52.60%) Loss: 2.082563 LR: 0.00002122 [12:43:28] Epoch: 1 Batch: 10574/20099 (52.61%) Loss: 2.056453 LR: 0.00002122 [12:43:30] Epoch: 1 Batch: 10575/20099 (52.61%) Loss: 2.150899 LR: 0.00002122 [12:43:32] Epoch: 1 Batch: 10576/20099 (52.62%) Loss: 1.877838 LR: 0.00002122 [12:43:33] Epoch: 1 Batch: 10577/20099 (52.62%) Loss: 2.239994 LR: 0.00002122 [12:43:35] Epoch: 1 Batch: 10578/20099 (52.63%) Loss: 2.245412 LR: 0.00002122 [12:43:37] Epoch: 1 Batch: 10579/20099 (52.63%) Loss: 2.212244 LR: 0.00002122 [12:43:39] Epoch: 1 Batch: 10580/20099 (52.64%) Loss: 2.048193 LR: 0.00002120 [12:43:40] Epoch: 1 Batch: 10581/20099 (52.64%) Loss: 2.089339 LR: 0.00002120 [12:43:42] Epoch: 1 Batch: 10582/20099 (52.65%) Loss: 1.980010 LR: 0.00002120 [12:43:44] Epoch: 1 Batch: 10583/20099 (52.65%) Loss: 2.193273 LR: 0.00002120 [12:43:46] 
Epoch: 1 Batch: 10584/20099 (52.66%) Loss: 2.294270 LR: 0.00002120 [12:43:48] Epoch: 1 Batch: 10585/20099 (52.66%) Loss: 2.241586 LR: 0.00002120 [12:43:49] Epoch: 1 Batch: 10586/20099 (52.67%) Loss: 2.049581 LR: 0.00002120 [12:43:51] Epoch: 1 Batch: 10587/20099 (52.67%) Loss: 2.226557 LR: 0.00002119 [12:43:53] Epoch: 1 Batch: 10588/20099 (52.68%) Loss: 2.230226 LR: 0.00002119 [12:43:55] Epoch: 1 Batch: 10589/20099 (52.68%) Loss: 2.352484 LR: 0.00002119 [12:43:56] Epoch: 1 Batch: 10590/20099 (52.69%) Loss: 2.317206 LR: 0.00002119 [12:43:58] Epoch: 1 Batch: 10591/20099 (52.69%) Loss: 1.944342 LR: 0.00002119 [12:44:00] Epoch: 1 Batch: 10592/20099 (52.70%) Loss: 2.009456 LR: 0.00002119 [12:44:02] Epoch: 1 Batch: 10593/20099 (52.70%) Loss: 2.398356 LR: 0.00002119 [12:44:04] Epoch: 1 Batch: 10594/20099 (52.71%) Loss: 2.173478 LR: 0.00002117 [12:44:05] Epoch: 1 Batch: 10595/20099 (52.71%) Loss: 1.951501 LR: 0.00002117 [12:44:07] Epoch: 1 Batch: 10596/20099 (52.72%) Loss: 2.375187 LR: 0.00002117 [12:44:09] Epoch: 1 Batch: 10597/20099 (52.72%) Loss: 1.947006 LR: 0.00002117 [12:44:11] Epoch: 1 Batch: 10598/20099 (52.73%) Loss: 2.068319 LR: 0.00002117 [12:44:12] Epoch: 1 Batch: 10599/20099 (52.73%) Loss: 2.098319 LR: 0.00002117 [12:44:18] >> Cleaned up old temp checkpoint: epoch1_step8600 [12:44:18] >> Temp checkpoint saved: epoch1_step10600, size: 0.1693 GB [12:44:18] Epoch: 1 Batch: 10600/20099 (52.74%) Loss: 2.075042 LR: 0.00002117 [12:44:20] Epoch: 1 Batch: 10601/20099 (52.74%) Loss: 2.206616 LR: 0.00002115 [12:44:21] Epoch: 1 Batch: 10602/20099 (52.75%) Loss: 2.080123 LR: 0.00002115 [12:44:23] Epoch: 1 Batch: 10603/20099 (52.75%) Loss: 1.952322 LR: 0.00002115 [12:44:25] Epoch: 1 Batch: 10604/20099 (52.76%) Loss: 1.956652 LR: 0.00002115 [12:44:27] Epoch: 1 Batch: 10605/20099 (52.76%) Loss: 2.211545 LR: 0.00002115 [12:44:28] Epoch: 1 Batch: 10606/20099 (52.77%) Loss: 2.256493 LR: 0.00002115 [12:44:30] Epoch: 1 Batch: 10607/20099 (52.77%) Loss: 2.131943 LR: 0.00002115 
[12:44:32] Epoch: 1 Batch: 10608/20099 (52.78%) Loss: 2.085885 LR: 0.00002114 [12:44:34] Epoch: 1 Batch: 10609/20099 (52.78%) Loss: 1.617042 LR: 0.00002114 [12:44:35] Epoch: 1 Batch: 10610/20099 (52.79%) Loss: 2.083308 LR: 0.00002114 [12:44:37] Epoch: 1 Batch: 10611/20099 (52.79%) Loss: 2.060690 LR: 0.00002114 [12:44:39] Epoch: 1 Batch: 10612/20099 (52.80%) Loss: 2.462697 LR: 0.00002114 [12:44:41] Epoch: 1 Batch: 10613/20099 (52.80%) Loss: 2.174522 LR: 0.00002114 [12:44:43] Epoch: 1 Batch: 10614/20099 (52.81%) Loss: 2.113145 LR: 0.00002114 [12:44:45] Epoch: 1 Batch: 10615/20099 (52.81%) Loss: 1.968632 LR: 0.00002112 [12:44:46] Epoch: 1 Batch: 10616/20099 (52.82%) Loss: 2.278334 LR: 0.00002112 [12:44:48] Epoch: 1 Batch: 10617/20099 (52.82%) Loss: 2.030211 LR: 0.00002112 [12:44:50] Epoch: 1 Batch: 10618/20099 (52.83%) Loss: 1.802056 LR: 0.00002112 [12:44:52] Epoch: 1 Batch: 10619/20099 (52.83%) Loss: 1.638935 LR: 0.00002112 [12:44:54] Epoch: 1 Batch: 10620/20099 (52.84%) Loss: 2.331700 LR: 0.00002112 [12:44:55] Epoch: 1 Batch: 10621/20099 (52.84%) Loss: 2.048728 LR: 0.00002112 [12:44:57] Epoch: 1 Batch: 10622/20099 (52.85%) Loss: 2.018522 LR: 0.00002111 [12:44:59] Epoch: 1 Batch: 10623/20099 (52.85%) Loss: 1.997976 LR: 0.00002111 [12:45:01] Epoch: 1 Batch: 10624/20099 (52.86%) Loss: 2.419809 LR: 0.00002111 [12:45:03] Epoch: 1 Batch: 10625/20099 (52.86%) Loss: 2.160825 LR: 0.00002111 [12:45:04] Epoch: 1 Batch: 10626/20099 (52.87%) Loss: 2.323203 LR: 0.00002111 [12:45:06] Epoch: 1 Batch: 10627/20099 (52.87%) Loss: 2.052153 LR: 0.00002111 [12:45:08] Epoch: 1 Batch: 10628/20099 (52.88%) Loss: 2.125892 LR: 0.00002111 [12:45:10] Epoch: 1 Batch: 10629/20099 (52.88%) Loss: 2.063779 LR: 0.00002109 [12:45:11] Epoch: 1 Batch: 10630/20099 (52.89%) Loss: 2.175258 LR: 0.00002109 [12:45:13] Epoch: 1 Batch: 10631/20099 (52.89%) Loss: 2.022120 LR: 0.00002109 [12:45:15] Epoch: 1 Batch: 10632/20099 (52.90%) Loss: 1.751135 LR: 0.00002109 [12:45:17] Epoch: 1 Batch: 10633/20099 (52.90%) 
Loss: 1.834156 LR: 0.00002109 [12:45:18] Epoch: 1 Batch: 10634/20099 (52.91%) Loss: 2.208569 LR: 0.00002109 [12:45:20] Epoch: 1 Batch: 10635/20099 (52.91%) Loss: 1.928240 LR: 0.00002109 [12:45:22] Epoch: 1 Batch: 10636/20099 (52.92%) Loss: 1.833798 LR: 0.00002108 [12:45:24] Epoch: 1 Batch: 10637/20099 (52.92%) Loss: 2.220286 LR: 0.00002108 [12:45:26] Epoch: 1 Batch: 10638/20099 (52.93%) Loss: 2.276832 LR: 0.00002108 [12:45:27] Epoch: 1 Batch: 10639/20099 (52.93%) Loss: 2.098781 LR: 0.00002108 [12:45:29] Epoch: 1 Batch: 10640/20099 (52.94%) Loss: 2.087804 LR: 0.00002108 [12:45:31] Epoch: 1 Batch: 10641/20099 (52.94%) Loss: 1.818310 LR: 0.00002108 [12:45:33] Epoch: 1 Batch: 10642/20099 (52.95%) Loss: 1.790209 LR: 0.00002108 [12:45:34] Epoch: 1 Batch: 10643/20099 (52.95%) Loss: 2.164132 LR: 0.00002106 [12:45:36] Epoch: 1 Batch: 10644/20099 (52.96%) Loss: 2.268338 LR: 0.00002106 [12:45:38] Epoch: 1 Batch: 10645/20099 (52.96%) Loss: 2.132754 LR: 0.00002106 [12:45:40] Epoch: 1 Batch: 10646/20099 (52.97%) Loss: 2.184847 LR: 0.00002106 [12:45:41] Epoch: 1 Batch: 10647/20099 (52.97%) Loss: 1.946147 LR: 0.00002106 [12:45:43] Epoch: 1 Batch: 10648/20099 (52.98%) Loss: 2.215905 LR: 0.00002106 [12:45:45] Epoch: 1 Batch: 10649/20099 (52.98%) Loss: 1.898659 LR: 0.00002106 [12:45:47] Epoch: 1 Batch: 10650/20099 (52.99%) Loss: 2.292680 LR: 0.00002104 [12:45:48] Epoch: 1 Batch: 10651/20099 (52.99%) Loss: 2.154249 LR: 0.00002104 [12:45:50] Epoch: 1 Batch: 10652/20099 (53.00%) Loss: 1.712703 LR: 0.00002104 [12:45:52] Epoch: 1 Batch: 10653/20099 (53.00%) Loss: 2.110464 LR: 0.00002104 [12:45:54] Epoch: 1 Batch: 10654/20099 (53.01%) Loss: 2.321514 LR: 0.00002104 [12:45:56] Epoch: 1 Batch: 10655/20099 (53.01%) Loss: 2.167873 LR: 0.00002104 [12:45:57] Epoch: 1 Batch: 10656/20099 (53.02%) Loss: 2.115510 LR: 0.00002104 [12:45:59] Epoch: 1 Batch: 10657/20099 (53.02%) Loss: 2.180236 LR: 0.00002103 [12:46:01] Epoch: 1 Batch: 10658/20099 (53.03%) Loss: 2.313697 LR: 0.00002103 [12:46:03] Epoch: 1 
Batch: 10659/20099 (53.03%) Loss: 1.973081 LR: 0.00002103 [12:46:04] Epoch: 1 Batch: 10660/20099 (53.04%) Loss: 2.190076 LR: 0.00002103 [12:46:06] Epoch: 1 Batch: 10661/20099 (53.04%) Loss: 2.046026 LR: 0.00002103 [12:46:08] Epoch: 1 Batch: 10662/20099 (53.05%) Loss: 2.080287 LR: 0.00002103 [12:46:10] Epoch: 1 Batch: 10663/20099 (53.05%) Loss: 2.397389 LR: 0.00002103 [12:46:12] Epoch: 1 Batch: 10664/20099 (53.06%) Loss: 2.026066 LR: 0.00002101 [12:46:13] Epoch: 1 Batch: 10665/20099 (53.06%) Loss: 2.055689 LR: 0.00002101 [12:46:15] Epoch: 1 Batch: 10666/20099 (53.07%) Loss: 1.660775 LR: 0.00002101 [12:46:17] Epoch: 1 Batch: 10667/20099 (53.07%) Loss: 2.244256 LR: 0.00002101 [12:46:19] Epoch: 1 Batch: 10668/20099 (53.08%) Loss: 2.269628 LR: 0.00002101 [12:46:20] Epoch: 1 Batch: 10669/20099 (53.08%) Loss: 1.926209 LR: 0.00002101 [12:46:22] Epoch: 1 Batch: 10670/20099 (53.09%) Loss: 1.825374 LR: 0.00002101 [12:46:24] Epoch: 1 Batch: 10671/20099 (53.09%) Loss: 2.194442 LR: 0.00002100 [12:46:26] Epoch: 1 Batch: 10672/20099 (53.10%) Loss: 1.913076 LR: 0.00002100 [12:46:28] Epoch: 1 Batch: 10673/20099 (53.10%) Loss: 1.792434 LR: 0.00002100 [12:46:29] Epoch: 1 Batch: 10674/20099 (53.11%) Loss: 2.406994 LR: 0.00002100 [12:46:31] Epoch: 1 Batch: 10675/20099 (53.11%) Loss: 1.953329 LR: 0.00002100 [12:46:33] Epoch: 1 Batch: 10676/20099 (53.12%) Loss: 2.127552 LR: 0.00002100 [12:46:35] Epoch: 1 Batch: 10677/20099 (53.12%) Loss: 2.553423 LR: 0.00002100 [12:46:36] Epoch: 1 Batch: 10678/20099 (53.13%) Loss: 1.972576 LR: 0.00002098 [12:46:38] Epoch: 1 Batch: 10679/20099 (53.13%) Loss: 1.927000 LR: 0.00002098 [12:46:40] Epoch: 1 Batch: 10680/20099 (53.14%) Loss: 2.224816 LR: 0.00002098 [12:46:42] Epoch: 1 Batch: 10681/20099 (53.14%) Loss: 2.034566 LR: 0.00002098 [12:46:44] Epoch: 1 Batch: 10682/20099 (53.15%) Loss: 2.160526 LR: 0.00002098 [12:46:45] Epoch: 1 Batch: 10683/20099 (53.15%) Loss: 1.799345 LR: 0.00002098 [12:46:47] Epoch: 1 Batch: 10684/20099 (53.16%) Loss: 2.122212 LR: 
0.00002098 [12:46:49] Epoch: 1 Batch: 10685/20099 (53.16%) Loss: 2.136188 LR: 0.00002097 [12:46:51] Epoch: 1 Batch: 10686/20099 (53.17%) Loss: 2.000205 LR: 0.00002097 [12:46:52] Epoch: 1 Batch: 10687/20099 (53.17%) Loss: 2.287966 LR: 0.00002097 [12:46:54] Epoch: 1 Batch: 10688/20099 (53.18%) Loss: 1.848741 LR: 0.00002097 [12:46:56] Epoch: 1 Batch: 10689/20099 (53.18%) Loss: 1.975384 LR: 0.00002097 [12:46:58] Epoch: 1 Batch: 10690/20099 (53.19%) Loss: 2.278547 LR: 0.00002097 [12:47:00] Epoch: 1 Batch: 10691/20099 (53.19%) Loss: 1.784178 LR: 0.00002097 [12:47:01] Epoch: 1 Batch: 10692/20099 (53.20%) Loss: 2.199654 LR: 0.00002095 [12:47:03] Epoch: 1 Batch: 10693/20099 (53.20%) Loss: 2.181619 LR: 0.00002095 [12:47:05] Epoch: 1 Batch: 10694/20099 (53.21%) Loss: 1.924939 LR: 0.00002095 [12:47:07] Epoch: 1 Batch: 10695/20099 (53.21%) Loss: 2.323013 LR: 0.00002095 [12:47:09] Epoch: 1 Batch: 10696/20099 (53.22%) Loss: 2.152698 LR: 0.00002095 [12:47:10] Epoch: 1 Batch: 10697/20099 (53.22%) Loss: 2.163315 LR: 0.00002095 [12:47:12] Epoch: 1 Batch: 10698/20099 (53.23%) Loss: 1.933052 LR: 0.00002095 [12:47:14] Epoch: 1 Batch: 10699/20099 (53.23%) Loss: 2.181442 LR: 0.00002093 [12:47:16] Epoch: 1 Batch: 10700/20099 (53.24%) Loss: 1.976923 LR: 0.00002093 [12:47:17] Epoch: 1 Batch: 10701/20099 (53.24%) Loss: 2.120158 LR: 0.00002093 [12:47:19] Epoch: 1 Batch: 10702/20099 (53.25%) Loss: 2.115959 LR: 0.00002093 [12:47:21] Epoch: 1 Batch: 10703/20099 (53.25%) Loss: 1.841247 LR: 0.00002093 [12:47:23] Epoch: 1 Batch: 10704/20099 (53.26%) Loss: 2.033932 LR: 0.00002093 [12:47:24] Epoch: 1 Batch: 10705/20099 (53.26%) Loss: 2.184541 LR: 0.00002093 [12:47:26] Epoch: 1 Batch: 10706/20099 (53.27%) Loss: 1.874903 LR: 0.00002092 [12:47:28] Epoch: 1 Batch: 10707/20099 (53.27%) Loss: 2.007438 LR: 0.00002092 [12:47:30] Epoch: 1 Batch: 10708/20099 (53.28%) Loss: 2.101531 LR: 0.00002092 [12:47:32] Epoch: 1 Batch: 10709/20099 (53.28%) Loss: 1.618119 LR: 0.00002092 [12:47:33] Epoch: 1 Batch: 10710/20099 
(53.29%) Loss: 2.381801 LR: 0.00002092 [12:47:35] Epoch: 1 Batch: 10711/20099 (53.29%) Loss: 2.005429 LR: 0.00002092 [12:47:37] Epoch: 1 Batch: 10712/20099 (53.30%) Loss: 2.273021 LR: 0.00002092 [12:47:39] Epoch: 1 Batch: 10713/20099 (53.30%) Loss: 2.112838 LR: 0.00002090 [12:47:40] Epoch: 1 Batch: 10714/20099 (53.31%) Loss: 2.137327 LR: 0.00002090 [12:47:42] Epoch: 1 Batch: 10715/20099 (53.31%) Loss: 2.367983 LR: 0.00002090 [12:47:44] Epoch: 1 Batch: 10716/20099 (53.32%) Loss: 2.020501 LR: 0.00002090 [12:47:46] Epoch: 1 Batch: 10717/20099 (53.32%) Loss: 2.379011 LR: 0.00002090 [12:47:47] Epoch: 1 Batch: 10718/20099 (53.33%) Loss: 2.596502 LR: 0.00002090 [12:47:49] Epoch: 1 Batch: 10719/20099 (53.33%) Loss: 2.408472 LR: 0.00002090 [12:47:51] Epoch: 1 Batch: 10720/20099 (53.34%) Loss: 2.328966 LR: 0.00002089 [12:47:53] Epoch: 1 Batch: 10721/20099 (53.34%) Loss: 2.346813 LR: 0.00002089 [12:47:55] Epoch: 1 Batch: 10722/20099 (53.35%) Loss: 2.313595 LR: 0.00002089 [12:47:56] Epoch: 1 Batch: 10723/20099 (53.35%) Loss: 1.997594 LR: 0.00002089 [12:47:58] Epoch: 1 Batch: 10724/20099 (53.36%) Loss: 1.845039 LR: 0.00002089 [12:48:00] Epoch: 1 Batch: 10725/20099 (53.36%) Loss: 2.309136 LR: 0.00002089 [12:48:02] Epoch: 1 Batch: 10726/20099 (53.37%) Loss: 2.211992 LR: 0.00002089 [12:48:03] Epoch: 1 Batch: 10727/20099 (53.37%) Loss: 2.270927 LR: 0.00002087 [12:48:05] Epoch: 1 Batch: 10728/20099 (53.38%) Loss: 2.228172 LR: 0.00002087 [12:48:07] Epoch: 1 Batch: 10729/20099 (53.38%) Loss: 1.949476 LR: 0.00002087 [12:48:09] Epoch: 1 Batch: 10730/20099 (53.39%) Loss: 1.903700 LR: 0.00002087 [12:48:10] Epoch: 1 Batch: 10731/20099 (53.39%) Loss: 1.821935 LR: 0.00002087 [12:48:12] Epoch: 1 Batch: 10732/20099 (53.40%) Loss: 2.257895 LR: 0.00002087 [12:48:14] Epoch: 1 Batch: 10733/20099 (53.40%) Loss: 1.963958 LR: 0.00002087 [12:48:16] Epoch: 1 Batch: 10734/20099 (53.41%) Loss: 1.836389 LR: 0.00002086 [12:48:18] Epoch: 1 Batch: 10735/20099 (53.41%) Loss: 2.030680 LR: 0.00002086 [12:48:19] 
Epoch: 1 Batch: 10736/20099 (53.42%) Loss: 2.476615 LR: 0.00002086 [12:48:21] Epoch: 1 Batch: 10737/20099 (53.42%) Loss: 2.171477 LR: 0.00002086 [12:48:23] Epoch: 1 Batch: 10738/20099 (53.43%) Loss: 1.956544 LR: 0.00002086 [12:48:25] Epoch: 1 Batch: 10739/20099 (53.43%) Loss: 2.374203 LR: 0.00002086 [12:48:26] Epoch: 1 Batch: 10740/20099 (53.44%) Loss: 2.087577 LR: 0.00002086 [12:48:28] Epoch: 1 Batch: 10741/20099 (53.44%) Loss: 1.889120 LR: 0.00002084 [12:48:30] Epoch: 1 Batch: 10742/20099 (53.45%) Loss: 2.110517 LR: 0.00002084 [12:48:32] Epoch: 1 Batch: 10743/20099 (53.45%) Loss: 2.400672 LR: 0.00002084 [12:48:33] Epoch: 1 Batch: 10744/20099 (53.46%) Loss: 2.330121 LR: 0.00002084 [12:48:35] Epoch: 1 Batch: 10745/20099 (53.46%) Loss: 1.643324 LR: 0.00002084 [12:48:37] Epoch: 1 Batch: 10746/20099 (53.47%) Loss: 2.102662 LR: 0.00002084 [12:48:39] Epoch: 1 Batch: 10747/20099 (53.47%) Loss: 1.811929 LR: 0.00002084 [12:48:41] Epoch: 1 Batch: 10748/20099 (53.48%) Loss: 2.175655 LR: 0.00002082 [12:48:42] Epoch: 1 Batch: 10749/20099 (53.48%) Loss: 2.143772 LR: 0.00002082 [12:48:44] Epoch: 1 Batch: 10750/20099 (53.49%) Loss: 2.044207 LR: 0.00002082 [12:48:46] Epoch: 1 Batch: 10751/20099 (53.49%) Loss: 2.135013 LR: 0.00002082 [12:48:48] Epoch: 1 Batch: 10752/20099 (53.50%) Loss: 2.156989 LR: 0.00002082 [12:48:49] Epoch: 1 Batch: 10753/20099 (53.50%) Loss: 2.117356 LR: 0.00002082 [12:48:51] Epoch: 1 Batch: 10754/20099 (53.51%) Loss: 2.051813 LR: 0.00002082 [12:48:53] Epoch: 1 Batch: 10755/20099 (53.51%) Loss: 2.020932 LR: 0.00002081 [12:48:55] Epoch: 1 Batch: 10756/20099 (53.52%) Loss: 2.186322 LR: 0.00002081 [12:48:56] Epoch: 1 Batch: 10757/20099 (53.52%) Loss: 2.331009 LR: 0.00002081 [12:48:58] Epoch: 1 Batch: 10758/20099 (53.53%) Loss: 1.995811 LR: 0.00002081 [12:49:00] Epoch: 1 Batch: 10759/20099 (53.53%) Loss: 2.185558 LR: 0.00002081 [12:49:02] Epoch: 1 Batch: 10760/20099 (53.54%) Loss: 2.387489 LR: 0.00002081 [12:49:03] Epoch: 1 Batch: 10761/20099 (53.54%) Loss: 
2.165603 LR: 0.00002081 [12:49:05] Epoch: 1 Batch: 10762/20099 (53.54%) Loss: 1.952454 LR: 0.00002079 [12:49:07] Epoch: 1 Batch: 10763/20099 (53.55%) Loss: 2.130037 LR: 0.00002079 [12:49:09] Epoch: 1 Batch: 10764/20099 (53.55%) Loss: 1.813900 LR: 0.00002079 [12:49:11] Epoch: 1 Batch: 10765/20099 (53.56%) Loss: 2.413622 LR: 0.00002079 [12:49:12] Epoch: 1 Batch: 10766/20099 (53.56%) Loss: 2.248875 LR: 0.00002079 [12:49:14] Epoch: 1 Batch: 10767/20099 (53.57%) Loss: 1.901267 LR: 0.00002079 [12:49:16] Epoch: 1 Batch: 10768/20099 (53.57%) Loss: 2.084355 LR: 0.00002079 [12:49:18] Epoch: 1 Batch: 10769/20099 (53.58%) Loss: 2.017617 LR: 0.00002078 [12:49:19] Epoch: 1 Batch: 10770/20099 (53.58%) Loss: 1.900827 LR: 0.00002078 [12:49:21] Epoch: 1 Batch: 10771/20099 (53.59%) Loss: 2.212478 LR: 0.00002078 [12:49:23] Epoch: 1 Batch: 10772/20099 (53.59%) Loss: 2.380015 LR: 0.00002078 [12:49:25] Epoch: 1 Batch: 10773/20099 (53.60%) Loss: 2.048027 LR: 0.00002078 [12:49:26] Epoch: 1 Batch: 10774/20099 (53.60%) Loss: 2.302220 LR: 0.00002078 [12:49:28] Epoch: 1 Batch: 10775/20099 (53.61%) Loss: 2.057735 LR: 0.00002078 [12:49:30] Epoch: 1 Batch: 10776/20099 (53.61%) Loss: 2.207933 LR: 0.00002076 [12:49:32] Epoch: 1 Batch: 10777/20099 (53.62%) Loss: 2.253211 LR: 0.00002076 [12:49:34] Epoch: 1 Batch: 10778/20099 (53.62%) Loss: 2.018359 LR: 0.00002076 [12:49:35] Epoch: 1 Batch: 10779/20099 (53.63%) Loss: 1.940859 LR: 0.00002076 [12:49:37] Epoch: 1 Batch: 10780/20099 (53.63%) Loss: 2.188012 LR: 0.00002076 [12:49:39] Epoch: 1 Batch: 10781/20099 (53.64%) Loss: 2.132989 LR: 0.00002076 [12:49:41] Epoch: 1 Batch: 10782/20099 (53.64%) Loss: 1.936145 LR: 0.00002076 [12:49:42] Epoch: 1 Batch: 10783/20099 (53.65%) Loss: 2.326366 LR: 0.00002074 [12:49:44] Epoch: 1 Batch: 10784/20099 (53.65%) Loss: 2.318922 LR: 0.00002074 [12:49:46] Epoch: 1 Batch: 10785/20099 (53.66%) Loss: 1.943102 LR: 0.00002074 [12:49:48] Epoch: 1 Batch: 10786/20099 (53.66%) Loss: 2.393302 LR: 0.00002074 [12:49:49] Epoch: 1 
Batch: 10787/20099 (53.67%) Loss: 2.160732 LR: 0.00002074 [12:49:51] Epoch: 1 Batch: 10788/20099 (53.67%) Loss: 2.270257 LR: 0.00002074 [12:49:53] Epoch: 1 Batch: 10789/20099 (53.68%) Loss: 2.126182 LR: 0.00002074 [12:49:55] Epoch: 1 Batch: 10790/20099 (53.68%) Loss: 2.112015 LR: 0.00002073 [12:49:57] Epoch: 1 Batch: 10791/20099 (53.69%) Loss: 2.461033 LR: 0.00002073 [12:49:58] Epoch: 1 Batch: 10792/20099 (53.69%) Loss: 2.194836 LR: 0.00002073 [12:50:00] Epoch: 1 Batch: 10793/20099 (53.70%) Loss: 1.873349 LR: 0.00002073 [12:50:02] Epoch: 1 Batch: 10794/20099 (53.70%) Loss: 2.039122 LR: 0.00002073 [12:50:04] Epoch: 1 Batch: 10795/20099 (53.71%) Loss: 2.284618 LR: 0.00002073 [12:50:05] Epoch: 1 Batch: 10796/20099 (53.71%) Loss: 2.086792 LR: 0.00002073 [12:50:07] Epoch: 1 Batch: 10797/20099 (53.72%) Loss: 1.961278 LR: 0.00002071 [12:50:09] Epoch: 1 Batch: 10798/20099 (53.72%) Loss: 2.005591 LR: 0.00002071 [12:50:11] Epoch: 1 Batch: 10799/20099 (53.73%) Loss: 1.929565 LR: 0.00002071 [12:50:16] >> Cleaned up old temp checkpoint: epoch1_step8800 [12:50:16] >> Temp checkpoint saved: epoch1_step10800, size: 0.1693 GB [12:50:16] Epoch: 1 Batch: 10800/20099 (53.73%) Loss: 1.819307 LR: 0.00002071 [12:50:18] Epoch: 1 Batch: 10801/20099 (53.74%) Loss: 1.821550 LR: 0.00002071 [12:50:20] Epoch: 1 Batch: 10802/20099 (53.74%) Loss: 2.019773 LR: 0.00002071 [12:50:21] Epoch: 1 Batch: 10803/20099 (53.75%) Loss: 1.947206 LR: 0.00002071 [12:50:23] Epoch: 1 Batch: 10804/20099 (53.75%) Loss: 2.187789 LR: 0.00002070 [12:50:25] Epoch: 1 Batch: 10805/20099 (53.76%) Loss: 2.151719 LR: 0.00002070 [12:50:27] Epoch: 1 Batch: 10806/20099 (53.76%) Loss: 2.017769 LR: 0.00002070 [12:50:28] Epoch: 1 Batch: 10807/20099 (53.77%) Loss: 2.210691 LR: 0.00002070 [12:50:30] Epoch: 1 Batch: 10808/20099 (53.77%) Loss: 2.168642 LR: 0.00002070 [12:50:32] Epoch: 1 Batch: 10809/20099 (53.78%) Loss: 2.335950 LR: 0.00002070 [12:50:34] Epoch: 1 Batch: 10810/20099 (53.78%) Loss: 1.898599 LR: 0.00002070 [12:50:36] 
Epoch: 1 Batch: 10811/20099 (53.79%) Loss: 2.110321 LR: 0.00002068 [12:50:37] Epoch: 1 Batch: 10812/20099 (53.79%) Loss: 1.691694 LR: 0.00002068 [12:50:39] Epoch: 1 Batch: 10813/20099 (53.80%) Loss: 2.041438 LR: 0.00002068 [12:50:41] Epoch: 1 Batch: 10814/20099 (53.80%) Loss: 2.167322 LR: 0.00002068 [12:50:43] Epoch: 1 Batch: 10815/20099 (53.81%) Loss: 2.219054 LR: 0.00002068 [12:50:44] Epoch: 1 Batch: 10816/20099 (53.81%) Loss: 1.837562 LR: 0.00002068 [12:50:46] Epoch: 1 Batch: 10817/20099 (53.82%) Loss: 2.027601 LR: 0.00002068 [12:50:48] Epoch: 1 Batch: 10818/20099 (53.82%) Loss: 2.234107 LR: 0.00002067 [12:50:50] Epoch: 1 Batch: 10819/20099 (53.83%) Loss: 2.177175 LR: 0.00002067 [12:50:52] Epoch: 1 Batch: 10820/20099 (53.83%) Loss: 2.175289 LR: 0.00002067 [12:50:53] Epoch: 1 Batch: 10821/20099 (53.84%) Loss: 1.905036 LR: 0.00002067 [12:50:55] Epoch: 1 Batch: 10822/20099 (53.84%) Loss: 2.244673 LR: 0.00002067 [12:50:57] Epoch: 1 Batch: 10823/20099 (53.85%) Loss: 2.343349 LR: 0.00002067 [12:50:59] Epoch: 1 Batch: 10824/20099 (53.85%) Loss: 2.195063 LR: 0.00002067 [12:51:01] Epoch: 1 Batch: 10825/20099 (53.86%) Loss: 1.869355 LR: 0.00002065 [12:51:02] Epoch: 1 Batch: 10826/20099 (53.86%) Loss: 2.035278 LR: 0.00002065 [12:51:04] Epoch: 1 Batch: 10827/20099 (53.87%) Loss: 2.130636 LR: 0.00002065 [12:51:06] Epoch: 1 Batch: 10828/20099 (53.87%) Loss: 2.308060 LR: 0.00002065 [12:51:08] Epoch: 1 Batch: 10829/20099 (53.88%) Loss: 2.007224 LR: 0.00002065 [12:51:10] Epoch: 1 Batch: 10830/20099 (53.88%) Loss: 2.131921 LR: 0.00002065 [12:51:11] Epoch: 1 Batch: 10831/20099 (53.89%) Loss: 1.977515 LR: 0.00002065 [12:51:13] Epoch: 1 Batch: 10832/20099 (53.89%) Loss: 2.120875 LR: 0.00002063 [12:51:15] Epoch: 1 Batch: 10833/20099 (53.90%) Loss: 1.940558 LR: 0.00002063 [12:51:17] Epoch: 1 Batch: 10834/20099 (53.90%) Loss: 2.297158 LR: 0.00002063 [12:51:18] Epoch: 1 Batch: 10835/20099 (53.91%) Loss: 2.140698 LR: 0.00002063 [12:51:20] Epoch: 1 Batch: 10836/20099 (53.91%) Loss: 
2.169636 LR: 0.00002063 [12:51:22] Epoch: 1 Batch: 10837/20099 (53.92%) Loss: 2.180158 LR: 0.00002063 [12:51:24] Epoch: 1 Batch: 10838/20099 (53.92%) Loss: 1.755092 LR: 0.00002063 [12:51:26] Epoch: 1 Batch: 10839/20099 (53.93%) Loss: 2.131763 LR: 0.00002062 [12:51:27] Epoch: 1 Batch: 10840/20099 (53.93%) Loss: 2.110748 LR: 0.00002062 [12:51:29] Epoch: 1 Batch: 10841/20099 (53.94%) Loss: 1.852887 LR: 0.00002062 [12:51:31] Epoch: 1 Batch: 10842/20099 (53.94%) Loss: 1.944155 LR: 0.00002062 [12:51:33] Epoch: 1 Batch: 10843/20099 (53.95%) Loss: 2.325693 LR: 0.00002062 [12:51:34] Epoch: 1 Batch: 10844/20099 (53.95%) Loss: 2.081689 LR: 0.00002062 [12:51:36] Epoch: 1 Batch: 10845/20099 (53.96%) Loss: 2.141023 LR: 0.00002062 [12:51:38] Epoch: 1 Batch: 10846/20099 (53.96%) Loss: 2.120570 LR: 0.00002060 [12:51:40] Epoch: 1 Batch: 10847/20099 (53.97%) Loss: 2.159397 LR: 0.00002060 [12:51:41] Epoch: 1 Batch: 10848/20099 (53.97%) Loss: 2.091609 LR: 0.00002060 [12:51:43] Epoch: 1 Batch: 10849/20099 (53.98%) Loss: 2.384986 LR: 0.00002060 [12:51:45] Epoch: 1 Batch: 10850/20099 (53.98%) Loss: 1.948924 LR: 0.00002060 [12:51:47] Epoch: 1 Batch: 10851/20099 (53.99%) Loss: 2.267084 LR: 0.00002060 [12:51:48] Epoch: 1 Batch: 10852/20099 (53.99%) Loss: 1.990253 LR: 0.00002060 [12:51:50] Epoch: 1 Batch: 10853/20099 (54.00%) Loss: 2.027387 LR: 0.00002059 [12:51:52] Epoch: 1 Batch: 10854/20099 (54.00%) Loss: 2.333314 LR: 0.00002059 [12:51:54] Epoch: 1 Batch: 10855/20099 (54.01%) Loss: 2.013413 LR: 0.00002059 [12:51:55] Epoch: 1 Batch: 10856/20099 (54.01%) Loss: 2.113566 LR: 0.00002059 [12:51:57] Epoch: 1 Batch: 10857/20099 (54.02%) Loss: 2.074779 LR: 0.00002059 [12:51:59] Epoch: 1 Batch: 10858/20099 (54.02%) Loss: 2.128145 LR: 0.00002059 [12:52:01] Epoch: 1 Batch: 10859/20099 (54.03%) Loss: 2.240045 LR: 0.00002059 [12:52:03] Epoch: 1 Batch: 10860/20099 (54.03%) Loss: 2.349508 LR: 0.00002057 [12:52:04] Epoch: 1 Batch: 10861/20099 (54.04%) Loss: 1.896391 LR: 0.00002057 [12:52:06] Epoch: 1 
Batch: 10862/20099 (54.04%) Loss: 2.019345 LR: 0.00002057 [12:52:08] Epoch: 1 Batch: 10863/20099 (54.05%) Loss: 2.171817 LR: 0.00002057 [12:52:10] Epoch: 1 Batch: 10864/20099 (54.05%) Loss: 2.012825 LR: 0.00002057 [12:52:11] Epoch: 1 Batch: 10865/20099 (54.06%) Loss: 2.094913 LR: 0.00002057 [12:52:13] Epoch: 1 Batch: 10866/20099 (54.06%) Loss: 2.136393 LR: 0.00002057 [12:52:15] Epoch: 1 Batch: 10867/20099 (54.07%) Loss: 2.088082 LR: 0.00002055 [12:52:17] Epoch: 1 Batch: 10868/20099 (54.07%) Loss: 2.078241 LR: 0.00002055 [12:52:19] Epoch: 1 Batch: 10869/20099 (54.08%) Loss: 2.215929 LR: 0.00002055 [12:52:20] Epoch: 1 Batch: 10870/20099 (54.08%) Loss: 1.865729 LR: 0.00002055 [12:52:22] Epoch: 1 Batch: 10871/20099 (54.09%) Loss: 2.041482 LR: 0.00002055 [12:52:24] Epoch: 1 Batch: 10872/20099 (54.09%) Loss: 1.845818 LR: 0.00002055 [12:52:26] Epoch: 1 Batch: 10873/20099 (54.10%) Loss: 2.021316 LR: 0.00002055 [12:52:27] Epoch: 1 Batch: 10874/20099 (54.10%) Loss: 2.199267 LR: 0.00002054 [12:52:29] Epoch: 1 Batch: 10875/20099 (54.11%) Loss: 2.328201 LR: 0.00002054 [12:52:31] Epoch: 1 Batch: 10876/20099 (54.11%) Loss: 2.651031 LR: 0.00002054 [12:52:33] Epoch: 1 Batch: 10877/20099 (54.12%) Loss: 2.141410 LR: 0.00002054 [12:52:35] Epoch: 1 Batch: 10878/20099 (54.12%) Loss: 2.127559 LR: 0.00002054 [12:52:36] Epoch: 1 Batch: 10879/20099 (54.13%) Loss: 2.148687 LR: 0.00002054 [12:52:38] Epoch: 1 Batch: 10880/20099 (54.13%) Loss: 2.084854 LR: 0.00002054 [12:52:40] Epoch: 1 Batch: 10881/20099 (54.14%) Loss: 2.224936 LR: 0.00002052 [12:52:42] Epoch: 1 Batch: 10882/20099 (54.14%) Loss: 2.264191 LR: 0.00002052 [12:52:44] Epoch: 1 Batch: 10883/20099 (54.15%) Loss: 2.153655 LR: 0.00002052 [12:52:45] Epoch: 1 Batch: 10884/20099 (54.15%) Loss: 1.872532 LR: 0.00002052 [12:52:47] Epoch: 1 Batch: 10885/20099 (54.16%) Loss: 2.124489 LR: 0.00002052 [12:52:49] Epoch: 1 Batch: 10886/20099 (54.16%) Loss: 2.109140 LR: 0.00002052 [12:52:51] Epoch: 1 Batch: 10887/20099 (54.17%) Loss: 2.249701 LR: 
0.00002052 [12:52:52] Epoch: 1 Batch: 10888/20099 (54.17%) Loss: 1.965649 LR: 0.00002051 [12:52:54] Epoch: 1 Batch: 10889/20099 (54.18%) Loss: 2.166732 LR: 0.00002051 [12:52:56] Epoch: 1 Batch: 10890/20099 (54.18%) Loss: 2.067620 LR: 0.00002051 [12:52:58] Epoch: 1 Batch: 10891/20099 (54.19%) Loss: 2.222866 LR: 0.00002051 [12:53:00] Epoch: 1 Batch: 10892/20099 (54.19%) Loss: 2.466458 LR: 0.00002051 [12:53:01] Epoch: 1 Batch: 10893/20099 (54.20%) Loss: 2.396294 LR: 0.00002051 [12:53:03] Epoch: 1 Batch: 10894/20099 (54.20%) Loss: 2.106723 LR: 0.00002051 [12:53:05] Epoch: 1 Batch: 10895/20099 (54.21%) Loss: 2.147239 LR: 0.00002049 [12:53:07] Epoch: 1 Batch: 10896/20099 (54.21%) Loss: 1.933804 LR: 0.00002049 [12:53:08] Epoch: 1 Batch: 10897/20099 (54.22%) Loss: 2.118937 LR: 0.00002049 [12:53:10] Epoch: 1 Batch: 10898/20099 (54.22%) Loss: 2.151721 LR: 0.00002049 [12:53:12] Epoch: 1 Batch: 10899/20099 (54.23%) Loss: 1.964587 LR: 0.00002049 [12:53:14] Epoch: 1 Batch: 10900/20099 (54.23%) Loss: 1.531577 LR: 0.00002049 [12:53:16] Epoch: 1 Batch: 10901/20099 (54.24%) Loss: 1.969033 LR: 0.00002049 [12:53:17] Epoch: 1 Batch: 10902/20099 (54.24%) Loss: 2.265129 LR: 0.00002048 [12:53:19] Epoch: 1 Batch: 10903/20099 (54.25%) Loss: 2.118177 LR: 0.00002048 [12:53:21] Epoch: 1 Batch: 10904/20099 (54.25%) Loss: 2.092270 LR: 0.00002048 [12:53:23] Epoch: 1 Batch: 10905/20099 (54.26%) Loss: 2.032341 LR: 0.00002048 [12:53:24] Epoch: 1 Batch: 10906/20099 (54.26%) Loss: 2.141401 LR: 0.00002048 [12:53:26] Epoch: 1 Batch: 10907/20099 (54.27%) Loss: 2.235370 LR: 0.00002048 [12:53:28] Epoch: 1 Batch: 10908/20099 (54.27%) Loss: 2.304580 LR: 0.00002048 [12:53:30] Epoch: 1 Batch: 10909/20099 (54.28%) Loss: 2.143189 LR: 0.00002046 [12:53:32] Epoch: 1 Batch: 10910/20099 (54.28%) Loss: 2.042841 LR: 0.00002046 [12:53:33] Epoch: 1 Batch: 10911/20099 (54.29%) Loss: 2.195048 LR: 0.00002046 [12:53:35] Epoch: 1 Batch: 10912/20099 (54.29%) Loss: 2.178170 LR: 0.00002046 [12:53:37] Epoch: 1 Batch: 10913/20099 
(54.30%) Loss: 2.033402 LR: 0.00002046 [12:53:39] Epoch: 1 Batch: 10914/20099 (54.30%) Loss: 2.036011 LR: 0.00002046 [12:53:40] Epoch: 1 Batch: 10915/20099 (54.31%) Loss: 2.324113 LR: 0.00002046 [12:53:42] Epoch: 1 Batch: 10916/20099 (54.31%) Loss: 1.816749 LR: 0.00002044 [12:53:44] Epoch: 1 Batch: 10917/20099 (54.32%) Loss: 2.272579 LR: 0.00002044 [12:53:46] Epoch: 1 Batch: 10918/20099 (54.32%) Loss: 1.984329 LR: 0.00002044 [12:53:48] Epoch: 1 Batch: 10919/20099 (54.33%) Loss: 1.858022 LR: 0.00002044 [12:53:49] Epoch: 1 Batch: 10920/20099 (54.33%) Loss: 2.263437 LR: 0.00002044 [12:53:51] Epoch: 1 Batch: 10921/20099 (54.34%) Loss: 2.109052 LR: 0.00002044 [12:53:53] Epoch: 1 Batch: 10922/20099 (54.34%) Loss: 2.086841 LR: 0.00002044 [12:53:55] Epoch: 1 Batch: 10923/20099 (54.35%) Loss: 1.939013 LR: 0.00002043 [12:53:56] Epoch: 1 Batch: 10924/20099 (54.35%) Loss: 2.228199 LR: 0.00002043 [12:53:58] Epoch: 1 Batch: 10925/20099 (54.36%) Loss: 1.914180 LR: 0.00002043 [12:54:00] Epoch: 1 Batch: 10926/20099 (54.36%) Loss: 1.675138 LR: 0.00002043 [12:54:02] Epoch: 1 Batch: 10927/20099 (54.37%) Loss: 1.973565 LR: 0.00002043 [12:54:04] Epoch: 1 Batch: 10928/20099 (54.37%) Loss: 2.030506 LR: 0.00002043 [12:54:05] Epoch: 1 Batch: 10929/20099 (54.38%) Loss: 2.205430 LR: 0.00002043 [12:54:07] Epoch: 1 Batch: 10930/20099 (54.38%) Loss: 2.018112 LR: 0.00002041 [12:54:09] Epoch: 1 Batch: 10931/20099 (54.39%) Loss: 1.843793 LR: 0.00002041 [12:54:11] Epoch: 1 Batch: 10932/20099 (54.39%) Loss: 1.886478 LR: 0.00002041 [12:54:12] Epoch: 1 Batch: 10933/20099 (54.40%) Loss: 1.724491 LR: 0.00002041 [12:54:14] Epoch: 1 Batch: 10934/20099 (54.40%) Loss: 2.348602 LR: 0.00002041 [12:54:16] Epoch: 1 Batch: 10935/20099 (54.41%) Loss: 2.241485 LR: 0.00002041 [12:54:18] Epoch: 1 Batch: 10936/20099 (54.41%) Loss: 2.258204 LR: 0.00002041 [12:54:19] Epoch: 1 Batch: 10937/20099 (54.42%) Loss: 2.001014 LR: 0.00002040 [12:54:21] Epoch: 1 Batch: 10938/20099 (54.42%) Loss: 1.879351 LR: 0.00002040 [12:54:23] 
Epoch: 1 Batch: 10939/20099 (54.43%) Loss: 2.183980 LR: 0.00002040 [12:54:25] Epoch: 1 Batch: 10940/20099 (54.43%) Loss: 2.391249 LR: 0.00002040 [12:54:26] Epoch: 1 Batch: 10941/20099 (54.44%) Loss: 2.181504 LR: 0.00002040 [12:54:28] Epoch: 1 Batch: 10942/20099 (54.44%) Loss: 2.069286 LR: 0.00002040 [12:54:30] Epoch: 1 Batch: 10943/20099 (54.45%) Loss: 2.399700 LR: 0.00002040 [12:54:32] Epoch: 1 Batch: 10944/20099 (54.45%) Loss: 1.727585 LR: 0.00002038 [12:54:34] Epoch: 1 Batch: 10945/20099 (54.46%) Loss: 2.166899 LR: 0.00002038 [12:54:35] Epoch: 1 Batch: 10946/20099 (54.46%) Loss: 1.769444 LR: 0.00002038 [12:54:37] Epoch: 1 Batch: 10947/20099 (54.47%) Loss: 2.151026 LR: 0.00002038 [12:54:39] Epoch: 1 Batch: 10948/20099 (54.47%) Loss: 1.948795 LR: 0.00002038 [12:54:41] Epoch: 1 Batch: 10949/20099 (54.48%) Loss: 2.120847 LR: 0.00002038 [12:54:42] Epoch: 1 Batch: 10950/20099 (54.48%) Loss: 1.909524 LR: 0.00002038 [12:54:44] Epoch: 1 Batch: 10951/20099 (54.49%) Loss: 2.077949 LR: 0.00002036 [12:54:46] Epoch: 1 Batch: 10952/20099 (54.49%) Loss: 2.279987 LR: 0.00002036 [12:54:48] Epoch: 1 Batch: 10953/20099 (54.50%) Loss: 2.042197 LR: 0.00002036 [12:54:49] Epoch: 1 Batch: 10954/20099 (54.50%) Loss: 1.842382 LR: 0.00002036 [12:54:51] Epoch: 1 Batch: 10955/20099 (54.51%) Loss: 2.163868 LR: 0.00002036 [12:54:53] Epoch: 1 Batch: 10956/20099 (54.51%) Loss: 2.246133 LR: 0.00002036 [12:54:55] Epoch: 1 Batch: 10957/20099 (54.52%) Loss: 2.158983 LR: 0.00002036 [12:54:56] Epoch: 1 Batch: 10958/20099 (54.52%) Loss: 2.022065 LR: 0.00002035 [12:54:58] Epoch: 1 Batch: 10959/20099 (54.53%) Loss: 1.863288 LR: 0.00002035 [12:55:00] Epoch: 1 Batch: 10960/20099 (54.53%) Loss: 2.299350 LR: 0.00002035 [12:55:02] Epoch: 1 Batch: 10961/20099 (54.54%) Loss: 2.102475 LR: 0.00002035 [12:55:03] Epoch: 1 Batch: 10962/20099 (54.54%) Loss: 1.928282 LR: 0.00002035 [12:55:05] Epoch: 1 Batch: 10963/20099 (54.55%) Loss: 2.067193 LR: 0.00002035 [12:55:07] Epoch: 1 Batch: 10964/20099 (54.55%) Loss: 
2.089906 LR: 0.00002035 [12:55:09] Epoch: 1 Batch: 10965/20099 (54.55%) Loss: 2.396143 LR: 0.00002033 [12:55:11] Epoch: 1 Batch: 10966/20099 (54.56%) Loss: 1.918280 LR: 0.00002033 [12:55:12] Epoch: 1 Batch: 10967/20099 (54.56%) Loss: 1.938540 LR: 0.00002033 [12:55:14] Epoch: 1 Batch: 10968/20099 (54.57%) Loss: 1.981141 LR: 0.00002033 [12:55:16] Epoch: 1 Batch: 10969/20099 (54.57%) Loss: 2.198460 LR: 0.00002033 [12:55:18] Epoch: 1 Batch: 10970/20099 (54.58%) Loss: 2.246327 LR: 0.00002033 [12:55:19] Epoch: 1 Batch: 10971/20099 (54.58%) Loss: 2.056145 LR: 0.00002033 [12:55:21] Epoch: 1 Batch: 10972/20099 (54.59%) Loss: 2.054509 LR: 0.00002032 [12:55:23] Epoch: 1 Batch: 10973/20099 (54.59%) Loss: 1.984092 LR: 0.00002032 [12:55:25] Epoch: 1 Batch: 10974/20099 (54.60%) Loss: 2.310192 LR: 0.00002032 [12:55:26] Epoch: 1 Batch: 10975/20099 (54.60%) Loss: 2.190665 LR: 0.00002032 [12:55:28] Epoch: 1 Batch: 10976/20099 (54.61%) Loss: 2.027637 LR: 0.00002032 [12:55:30] Epoch: 1 Batch: 10977/20099 (54.61%) Loss: 2.070393 LR: 0.00002032 [12:55:32] Epoch: 1 Batch: 10978/20099 (54.62%) Loss: 2.377033 LR: 0.00002032 [12:55:34] Epoch: 1 Batch: 10979/20099 (54.62%) Loss: 2.145426 LR: 0.00002030 [12:55:35] Epoch: 1 Batch: 10980/20099 (54.63%) Loss: 2.078538 LR: 0.00002030 [12:55:37] Epoch: 1 Batch: 10981/20099 (54.63%) Loss: 1.838984 LR: 0.00002030 [12:55:39] Epoch: 1 Batch: 10982/20099 (54.64%) Loss: 2.122934 LR: 0.00002030 [12:55:41] Epoch: 1 Batch: 10983/20099 (54.64%) Loss: 1.911107 LR: 0.00002030 [12:55:42] Epoch: 1 Batch: 10984/20099 (54.65%) Loss: 2.082071 LR: 0.00002030 [12:55:44] Epoch: 1 Batch: 10985/20099 (54.65%) Loss: 2.021607 LR: 0.00002030 [12:55:46] Epoch: 1 Batch: 10986/20099 (54.66%) Loss: 2.545644 LR: 0.00002028 [12:55:48] Epoch: 1 Batch: 10987/20099 (54.66%) Loss: 2.058690 LR: 0.00002028 [12:55:49] Epoch: 1 Batch: 10988/20099 (54.67%) Loss: 2.463280 LR: 0.00002028 [12:55:51] Epoch: 1 Batch: 10989/20099 (54.67%) Loss: 1.885264 LR: 0.00002028 [12:55:53] Epoch: 1 
Batch: 10990/20099 (54.68%) Loss: 2.087889 LR: 0.00002028 [12:55:55] Epoch: 1 Batch: 10991/20099 (54.68%) Loss: 2.044692 LR: 0.00002028 [12:55:57] Epoch: 1 Batch: 10992/20099 (54.69%) Loss: 1.828521 LR: 0.00002028 [12:55:58] Epoch: 1 Batch: 10993/20099 (54.69%) Loss: 2.448980 LR: 0.00002027 [12:56:00] Epoch: 1 Batch: 10994/20099 (54.70%) Loss: 2.105200 LR: 0.00002027 [12:56:02] Epoch: 1 Batch: 10995/20099 (54.70%) Loss: 2.133803 LR: 0.00002027 [12:56:04] Epoch: 1 Batch: 10996/20099 (54.71%) Loss: 2.262781 LR: 0.00002027 [12:56:05] Epoch: 1 Batch: 10997/20099 (54.71%) Loss: 2.354441 LR: 0.00002027 [12:56:07] Epoch: 1 Batch: 10998/20099 (54.72%) Loss: 2.009531 LR: 0.00002027 [12:56:09] Epoch: 1 Batch: 10999/20099 (54.72%) Loss: 1.820278 LR: 0.00002027 [12:56:11] >> Evaluating batch 0 [12:56:12] >> Evaluating batch 1 [12:56:13] >> Evaluating batch 2 [12:56:14] >> Evaluating batch 3 [12:56:15] >> Evaluating batch 4 [12:56:16] >> Evaluating batch 5 [12:56:17] >> Evaluating batch 6 [12:56:18] >> Evaluating batch 7 [12:56:19] >> Evaluating batch 8 [12:56:20] >> Evaluating batch 9 [12:56:21] >> Evaluating batch 10 [12:56:22] >> Evaluating batch 11 [12:56:23] >> Evaluating batch 12 [12:56:24] >> Evaluating batch 13 [12:56:25] >> Evaluating batch 14 [12:56:26] >> Evaluating batch 15 [12:56:27] >> Evaluating batch 16 [12:56:27] Epoch: 1 Step: 11000/20099 Evaluation: [12:56:27] Avg Loss Since Last Eval: 2.1042 Val Loss: 2.1628 Validation loss delta: -0.0038 Perplexity: 8.6958 LR: 0.00002025 [12:56:31] >> Cleaned up old temp checkpoint: epoch1_step9000 [12:56:31] >> Temp checkpoint saved: epoch1_step11000, size: 0.1693 GB [12:56:34] >> Checkpoint saved: epoch1_step11000, size: 0.1693 GB [12:56:34] Epoch: 1 Batch: 11000/20099 (54.73%) Loss: 2.003237 LR: 0.00002025 [12:56:36] Epoch: 1 Batch: 11001/20099 (54.73%) Loss: 2.308950 LR: 0.00002025 [12:56:38] Epoch: 1 Batch: 11002/20099 (54.74%) Loss: 2.340276 LR: 0.00002025 [12:56:39] Epoch: 1 Batch: 11003/20099 (54.74%) Loss: 
2.191643 LR: 0.00002025 [12:56:41] Epoch: 1 Batch: 11004/20099 (54.75%) Loss: 2.259829 LR: 0.00002025 [12:56:43] Epoch: 1 Batch: 11005/20099 (54.75%) Loss: 2.240568 LR: 0.00002025 [12:56:45] Epoch: 1 Batch: 11006/20099 (54.76%) Loss: 2.257064 LR: 0.00002025 [12:56:46] Epoch: 1 Batch: 11007/20099 (54.76%) Loss: 1.995762 LR: 0.00002024 [12:56:48] Epoch: 1 Batch: 11008/20099 (54.77%) Loss: 2.042869 LR: 0.00002024 [12:56:50] Epoch: 1 Batch: 11009/20099 (54.77%) Loss: 2.224186 LR: 0.00002024 [12:56:52] Epoch: 1 Batch: 11010/20099 (54.78%) Loss: 2.002099 LR: 0.00002024 [12:56:54] Epoch: 1 Batch: 11011/20099 (54.78%) Loss: 2.174895 LR: 0.00002024 [12:56:55] Epoch: 1 Batch: 11012/20099 (54.79%) Loss: 2.068383 LR: 0.00002024 [12:56:57] Epoch: 1 Batch: 11013/20099 (54.79%) Loss: 2.219602 LR: 0.00002024 [12:56:59] Epoch: 1 Batch: 11014/20099 (54.80%) Loss: 1.975746 LR: 0.00002022 [12:57:01] Epoch: 1 Batch: 11015/20099 (54.80%) Loss: 1.967935 LR: 0.00002022 [12:57:03] Epoch: 1 Batch: 11016/20099 (54.81%) Loss: 2.617079 LR: 0.00002022 [12:57:05] Epoch: 1 Batch: 11017/20099 (54.81%) Loss: 2.042877 LR: 0.00002022 [12:57:07] Epoch: 1 Batch: 11018/20099 (54.82%) Loss: 2.182997 LR: 0.00002022 [12:57:08] Epoch: 1 Batch: 11019/20099 (54.82%) Loss: 2.382670 LR: 0.00002022 [12:57:10] Epoch: 1 Batch: 11020/20099 (54.83%) Loss: 1.951826 LR: 0.00002022 [12:57:12] Epoch: 1 Batch: 11021/20099 (54.83%) Loss: 2.009771 LR: 0.00002020 [12:57:14] Epoch: 1 Batch: 11022/20099 (54.84%) Loss: 2.014898 LR: 0.00002020 [12:57:16] Epoch: 1 Batch: 11023/20099 (54.84%) Loss: 2.069910 LR: 0.00002020 [12:57:17] Epoch: 1 Batch: 11024/20099 (54.85%) Loss: 2.233220 LR: 0.00002020 [12:57:19] Epoch: 1 Batch: 11025/20099 (54.85%) Loss: 2.269468 LR: 0.00002020 [12:57:21] Epoch: 1 Batch: 11026/20099 (54.86%) Loss: 1.951767 LR: 0.00002020 [12:57:23] Epoch: 1 Batch: 11027/20099 (54.86%) Loss: 2.151754 LR: 0.00002020 [12:57:25] Epoch: 1 Batch: 11028/20099 (54.87%) Loss: 1.900542 LR: 0.00002019 [12:57:26] Epoch: 1 
Batch: 11029/20099 (54.87%) Loss: 2.391329 LR: 0.00002019 [12:57:28] Epoch: 1 Batch: 11030/20099 (54.88%) Loss: 2.099194 LR: 0.00002019 [12:57:30] Epoch: 1 Batch: 11031/20099 (54.88%) Loss: 2.077719 LR: 0.00002019 [12:57:32] Epoch: 1 Batch: 11032/20099 (54.89%) Loss: 2.219657 LR: 0.00002019 [12:57:33] Epoch: 1 Batch: 11033/20099 (54.89%) Loss: 2.145596 LR: 0.00002019 [12:57:35] Epoch: 1 Batch: 11034/20099 (54.90%) Loss: 2.373609 LR: 0.00002019 [12:57:37] Epoch: 1 Batch: 11035/20099 (54.90%) Loss: 2.361026 LR: 0.00002017 [12:57:39] Epoch: 1 Batch: 11036/20099 (54.91%) Loss: 2.190966 LR: 0.00002017 [12:57:40] Epoch: 1 Batch: 11037/20099 (54.91%) Loss: 2.200602 LR: 0.00002017 [12:57:42] Epoch: 1 Batch: 11038/20099 (54.92%) Loss: 2.111797 LR: 0.00002017 [12:57:44] Epoch: 1 Batch: 11039/20099 (54.92%) Loss: 2.082245 LR: 0.00002017 [12:57:46] Epoch: 1 Batch: 11040/20099 (54.93%) Loss: 2.305699 LR: 0.00002017 [12:57:47] Epoch: 1 Batch: 11041/20099 (54.93%) Loss: 1.972590 LR: 0.00002017 [12:57:49] Epoch: 1 Batch: 11042/20099 (54.94%) Loss: 2.228224 LR: 0.00002016 [12:57:51] Epoch: 1 Batch: 11043/20099 (54.94%) Loss: 2.070248 LR: 0.00002016 [12:57:53] Epoch: 1 Batch: 11044/20099 (54.95%) Loss: 1.884289 LR: 0.00002016 [12:57:54] Epoch: 1 Batch: 11045/20099 (54.95%) Loss: 2.221781 LR: 0.00002016 [12:57:56] Epoch: 1 Batch: 11046/20099 (54.96%) Loss: 2.121635 LR: 0.00002016 [12:57:58] Epoch: 1 Batch: 11047/20099 (54.96%) Loss: 1.905028 LR: 0.00002016 [12:58:00] Epoch: 1 Batch: 11048/20099 (54.97%) Loss: 2.205162 LR: 0.00002016 [12:58:01] Epoch: 1 Batch: 11049/20099 (54.97%) Loss: 2.015265 LR: 0.00002014 [12:58:03] Epoch: 1 Batch: 11050/20099 (54.98%) Loss: 2.093370 LR: 0.00002014 [12:58:05] Epoch: 1 Batch: 11051/20099 (54.98%) Loss: 1.964112 LR: 0.00002014 [12:58:07] Epoch: 1 Batch: 11052/20099 (54.99%) Loss: 2.073983 LR: 0.00002014 [12:58:08] Epoch: 1 Batch: 11053/20099 (54.99%) Loss: 2.235200 LR: 0.00002014 [12:58:10] Epoch: 1 Batch: 11054/20099 (55.00%) Loss: 2.126853 LR: 
0.00002014 [12:58:12] Epoch: 1 Batch: 11055/20099 (55.00%) Loss: 1.940715 LR: 0.00002014 [12:58:14] Epoch: 1 Batch: 11056/20099 (55.01%) Loss: 2.302220 LR: 0.00002012 [12:58:15] Epoch: 1 Batch: 11057/20099 (55.01%) Loss: 2.257996 LR: 0.00002012 [12:58:17] Epoch: 1 Batch: 11058/20099 (55.02%) Loss: 2.036028 LR: 0.00002012 [12:58:19] Epoch: 1 Batch: 11059/20099 (55.02%) Loss: 2.094869 LR: 0.00002012 [12:58:21] Epoch: 1 Batch: 11060/20099 (55.03%) Loss: 2.028501 LR: 0.00002012 [12:58:23] Epoch: 1 Batch: 11061/20099 (55.03%) Loss: 2.153646 LR: 0.00002012 [12:58:24] Epoch: 1 Batch: 11062/20099 (55.04%) Loss: 2.220414 LR: 0.00002012 [12:58:26] Epoch: 1 Batch: 11063/20099 (55.04%) Loss: 1.973165 LR: 0.00002011 [12:58:28] Epoch: 1 Batch: 11064/20099 (55.05%) Loss: 2.121934 LR: 0.00002011 [12:58:30] Epoch: 1 Batch: 11065/20099 (55.05%) Loss: 2.138520 LR: 0.00002011 [12:58:32] Epoch: 1 Batch: 11066/20099 (55.06%) Loss: 1.816774 LR: 0.00002011 [12:58:33] Epoch: 1 Batch: 11067/20099 (55.06%) Loss: 2.173776 LR: 0.00002011 [12:58:35] Epoch: 1 Batch: 11068/20099 (55.07%) Loss: 2.165769 LR: 0.00002011 [12:58:37] Epoch: 1 Batch: 11069/20099 (55.07%) Loss: 1.904986 LR: 0.00002011 [12:58:39] Epoch: 1 Batch: 11070/20099 (55.08%) Loss: 1.904932 LR: 0.00002009 [12:58:40] Epoch: 1 Batch: 11071/20099 (55.08%) Loss: 2.408404 LR: 0.00002009 [12:58:42] Epoch: 1 Batch: 11072/20099 (55.09%) Loss: 2.205010 LR: 0.00002009 [12:58:44] Epoch: 1 Batch: 11073/20099 (55.09%) Loss: 2.246984 LR: 0.00002009 [12:58:46] Epoch: 1 Batch: 11074/20099 (55.10%) Loss: 2.066077 LR: 0.00002009 [12:58:48] Epoch: 1 Batch: 11075/20099 (55.10%) Loss: 1.998656 LR: 0.00002009 [12:58:49] Epoch: 1 Batch: 11076/20099 (55.11%) Loss: 1.978814 LR: 0.00002009 [12:58:51] Epoch: 1 Batch: 11077/20099 (55.11%) Loss: 2.224337 LR: 0.00002008 [12:58:53] Epoch: 1 Batch: 11078/20099 (55.12%) Loss: 2.059154 LR: 0.00002008 [12:58:55] Epoch: 1 Batch: 11079/20099 (55.12%) Loss: 2.179935 LR: 0.00002008 [12:58:57] Epoch: 1 Batch: 11080/20099 
(55.13%) Loss: 1.995695 LR: 0.00002008 [12:58:58] Epoch: 1 Batch: 11081/20099 (55.13%) Loss: 2.192569 LR: 0.00002008 [12:59:00] Epoch: 1 Batch: 11082/20099 (55.14%) Loss: 2.558090 LR: 0.00002008 [12:59:02] Epoch: 1 Batch: 11083/20099 (55.14%) Loss: 2.396944 LR: 0.00002008 [12:59:04] Epoch: 1 Batch: 11084/20099 (55.15%) Loss: 1.866625 LR: 0.00002006 [12:59:05] Epoch: 1 Batch: 11085/20099 (55.15%) Loss: 2.335765 LR: 0.00002006 [12:59:07] Epoch: 1 Batch: 11086/20099 (55.16%) Loss: 2.153575 LR: 0.00002006 [12:59:09] Epoch: 1 Batch: 11087/20099 (55.16%) Loss: 1.952592 LR: 0.00002006 [12:59:11] Epoch: 1 Batch: 11088/20099 (55.17%) Loss: 1.908183 LR: 0.00002006 [12:59:13] Epoch: 1 Batch: 11089/20099 (55.17%) Loss: 2.148547 LR: 0.00002006 [12:59:14] Epoch: 1 Batch: 11090/20099 (55.18%) Loss: 2.395633 LR: 0.00002006 [12:59:16] Epoch: 1 Batch: 11091/20099 (55.18%) Loss: 2.063512 LR: 0.00002004 [12:59:18] Epoch: 1 Batch: 11092/20099 (55.19%) Loss: 2.178868 LR: 0.00002004 [12:59:20] Epoch: 1 Batch: 11093/20099 (55.19%) Loss: 2.262750 LR: 0.00002004 [12:59:21] Epoch: 1 Batch: 11094/20099 (55.20%) Loss: 2.070873 LR: 0.00002004 [12:59:23] Epoch: 1 Batch: 11095/20099 (55.20%) Loss: 1.986290 LR: 0.00002004 [12:59:25] Epoch: 1 Batch: 11096/20099 (55.21%) Loss: 2.253832 LR: 0.00002004 [12:59:27] Epoch: 1 Batch: 11097/20099 (55.21%) Loss: 2.280985 LR: 0.00002004 [12:59:29] Epoch: 1 Batch: 11098/20099 (55.22%) Loss: 2.011451 LR: 0.00002003 [12:59:30] Epoch: 1 Batch: 11099/20099 (55.22%) Loss: 2.089214 LR: 0.00002003 [12:59:32] Epoch: 1 Batch: 11100/20099 (55.23%) Loss: 2.316898 LR: 0.00002003 [12:59:34] Epoch: 1 Batch: 11101/20099 (55.23%) Loss: 2.210807 LR: 0.00002003 [12:59:36] Epoch: 1 Batch: 11102/20099 (55.24%) Loss: 2.144978 LR: 0.00002003 [12:59:37] Epoch: 1 Batch: 11103/20099 (55.24%) Loss: 2.026101 LR: 0.00002003 [12:59:39] Epoch: 1 Batch: 11104/20099 (55.25%) Loss: 2.057931 LR: 0.00002003 [12:59:41] Epoch: 1 Batch: 11105/20099 (55.25%) Loss: 2.043174 LR: 0.00002001 [12:59:43] 
Epoch: 1 Batch: 11106/20099 (55.26%) Loss: 2.276022 LR: 0.00002001 [12:59:44] Epoch: 1 Batch: 11107/20099 (55.26%) Loss: 1.948438 LR: 0.00002001 [12:59:46] Epoch: 1 Batch: 11108/20099 (55.27%) Loss: 2.036401 LR: 0.00002001 [12:59:48] Epoch: 1 Batch: 11109/20099 (55.27%) Loss: 2.175477 LR: 0.00002001 [12:59:50] Epoch: 1 Batch: 11110/20099 (55.28%) Loss: 2.109460 LR: 0.00002001 [12:59:52] Epoch: 1 Batch: 11111/20099 (55.28%) Loss: 2.004047 LR: 0.00002001 [12:59:53] Epoch: 1 Batch: 11112/20099 (55.29%) Loss: 2.030140 LR: 0.00002000 [12:59:55] Epoch: 1 Batch: 11113/20099 (55.29%) Loss: 1.864693 LR: 0.00002000 [12:59:57] Epoch: 1 Batch: 11114/20099 (55.30%) Loss: 2.121825 LR: 0.00002000 [12:59:59] Epoch: 1 Batch: 11115/20099 (55.30%) Loss: 2.045478 LR: 0.00002000 [13:00:00] Epoch: 1 Batch: 11116/20099 (55.31%) Loss: 2.128784 LR: 0.00002000 [13:00:02] Epoch: 1 Batch: 11117/20099 (55.31%) Loss: 2.218340 LR: 0.00002000 [13:00:04] Epoch: 1 Batch: 11118/20099 (55.32%) Loss: 2.019497 LR: 0.00002000 [13:00:06] Epoch: 1 Batch: 11119/20099 (55.32%) Loss: 2.090980 LR: 0.00001998 [13:00:07] Epoch: 1 Batch: 11120/20099 (55.33%) Loss: 2.147621 LR: 0.00001998 [13:00:09] Epoch: 1 Batch: 11121/20099 (55.33%) Loss: 2.086133 LR: 0.00001998 [13:00:11] Epoch: 1 Batch: 11122/20099 (55.34%) Loss: 2.183997 LR: 0.00001998 [13:00:13] Epoch: 1 Batch: 11123/20099 (55.34%) Loss: 1.818453 LR: 0.00001998 [13:00:15] Epoch: 1 Batch: 11124/20099 (55.35%) Loss: 2.156151 LR: 0.00001998 [13:00:16] Epoch: 1 Batch: 11125/20099 (55.35%) Loss: 2.234179 LR: 0.00001998 [13:00:18] Epoch: 1 Batch: 11126/20099 (55.36%) Loss: 2.183157 LR: 0.00001996 [13:00:20] Epoch: 1 Batch: 11127/20099 (55.36%) Loss: 1.940661 LR: 0.00001996 [13:00:22] Epoch: 1 Batch: 11128/20099 (55.37%) Loss: 2.240952 LR: 0.00001996 [13:00:23] Epoch: 1 Batch: 11129/20099 (55.37%) Loss: 2.066563 LR: 0.00001996 [13:00:25] Epoch: 1 Batch: 11130/20099 (55.38%) Loss: 2.337873 LR: 0.00001996 [13:00:27] Epoch: 1 Batch: 11131/20099 (55.38%) Loss: 
2.140023 LR: 0.00001996 [13:00:29] Epoch: 1 Batch: 11132/20099 (55.39%) Loss: 2.247560 LR: 0.00001996 [13:00:31] Epoch: 1 Batch: 11133/20099 (55.39%) Loss: 2.199916 LR: 0.00001995 [13:00:32] Epoch: 1 Batch: 11134/20099 (55.40%) Loss: 2.093566 LR: 0.00001995 [13:00:34] Epoch: 1 Batch: 11135/20099 (55.40%) Loss: 2.023887 LR: 0.00001995 [13:00:36] Epoch: 1 Batch: 11136/20099 (55.41%) Loss: 2.160091 LR: 0.00001995 [13:00:38] Epoch: 1 Batch: 11137/20099 (55.41%) Loss: 2.267032 LR: 0.00001995 [13:00:39] Epoch: 1 Batch: 11138/20099 (55.42%) Loss: 2.388919 LR: 0.00001995 [13:00:41] Epoch: 1 Batch: 11139/20099 (55.42%) Loss: 1.908056 LR: 0.00001995 [13:00:43] Epoch: 1 Batch: 11140/20099 (55.43%) Loss: 1.968200 LR: 0.00001993 [13:00:45] Epoch: 1 Batch: 11141/20099 (55.43%) Loss: 1.949701 LR: 0.00001993 [13:00:46] Epoch: 1 Batch: 11142/20099 (55.44%) Loss: 2.065496 LR: 0.00001993 [13:00:48] Epoch: 1 Batch: 11143/20099 (55.44%) Loss: 2.346295 LR: 0.00001993 [13:00:50] Epoch: 1 Batch: 11144/20099 (55.45%) Loss: 2.009866 LR: 0.00001993 [13:00:52] Epoch: 1 Batch: 11145/20099 (55.45%) Loss: 2.117960 LR: 0.00001993 [13:00:53] Epoch: 1 Batch: 11146/20099 (55.46%) Loss: 2.427787 LR: 0.00001993 [13:00:55] Epoch: 1 Batch: 11147/20099 (55.46%) Loss: 2.204567 LR: 0.00001992 [13:00:57] Epoch: 1 Batch: 11148/20099 (55.47%) Loss: 2.061039 LR: 0.00001992 [13:00:59] Epoch: 1 Batch: 11149/20099 (55.47%) Loss: 1.913082 LR: 0.00001992 [13:01:01] Epoch: 1 Batch: 11150/20099 (55.48%) Loss: 2.002297 LR: 0.00001992 [13:01:02] Epoch: 1 Batch: 11151/20099 (55.48%) Loss: 1.983355 LR: 0.00001992 [13:01:04] Epoch: 1 Batch: 11152/20099 (55.49%) Loss: 2.175663 LR: 0.00001992 [13:01:06] Epoch: 1 Batch: 11153/20099 (55.49%) Loss: 1.883452 LR: 0.00001992 [13:01:08] Epoch: 1 Batch: 11154/20099 (55.50%) Loss: 2.180163 LR: 0.00001990 [13:01:09] Epoch: 1 Batch: 11155/20099 (55.50%) Loss: 2.135711 LR: 0.00001990 [13:01:11] Epoch: 1 Batch: 11156/20099 (55.51%) Loss: 2.044790 LR: 0.00001990 [13:01:13] Epoch: 1 
Batch: 11157/20099 (55.51%) Loss: 1.998878 LR: 0.00001990 [13:01:15] Epoch: 1 Batch: 11158/20099 (55.52%) Loss: 2.262176 LR: 0.00001990 [13:01:16] Epoch: 1 Batch: 11159/20099 (55.52%) Loss: 2.102956 LR: 0.00001990 [13:01:18] Epoch: 1 Batch: 11160/20099 (55.53%) Loss: 1.890763 LR: 0.00001990 [13:01:20] Epoch: 1 Batch: 11161/20099 (55.53%) Loss: 1.760657 LR: 0.00001988 [13:01:22] Epoch: 1 Batch: 11162/20099 (55.54%) Loss: 1.982925 LR: 0.00001988 [13:01:24] Epoch: 1 Batch: 11163/20099 (55.54%) Loss: 1.647432 LR: 0.00001988 [13:01:25] Epoch: 1 Batch: 11164/20099 (55.55%) Loss: 2.235788 LR: 0.00001988 [13:01:27] Epoch: 1 Batch: 11165/20099 (55.55%) Loss: 1.916987 LR: 0.00001988 [13:01:29] Epoch: 1 Batch: 11166/20099 (55.56%) Loss: 2.334794 LR: 0.00001988 [13:01:31] Epoch: 1 Batch: 11167/20099 (55.56%) Loss: 1.626364 LR: 0.00001988 [13:01:32] Epoch: 1 Batch: 11168/20099 (55.56%) Loss: 1.880565 LR: 0.00001987 [13:01:34] Epoch: 1 Batch: 11169/20099 (55.57%) Loss: 2.009051 LR: 0.00001987 [13:01:36] Epoch: 1 Batch: 11170/20099 (55.57%) Loss: 2.065865 LR: 0.00001987 [13:01:38] Epoch: 1 Batch: 11171/20099 (55.58%) Loss: 2.123891 LR: 0.00001987 [13:01:39] Epoch: 1 Batch: 11172/20099 (55.58%) Loss: 2.121518 LR: 0.00001987 [13:01:41] Epoch: 1 Batch: 11173/20099 (55.59%) Loss: 1.805938 LR: 0.00001987 [13:01:43] Epoch: 1 Batch: 11174/20099 (55.59%) Loss: 2.215779 LR: 0.00001987 [13:01:45] Epoch: 1 Batch: 11175/20099 (55.60%) Loss: 1.877665 LR: 0.00001985 [13:01:47] Epoch: 1 Batch: 11176/20099 (55.60%) Loss: 2.289865 LR: 0.00001985 [13:01:48] Epoch: 1 Batch: 11177/20099 (55.61%) Loss: 2.250286 LR: 0.00001985 [13:01:50] Epoch: 1 Batch: 11178/20099 (55.61%) Loss: 1.934063 LR: 0.00001985 [13:01:52] Epoch: 1 Batch: 11179/20099 (55.62%) Loss: 2.162903 LR: 0.00001985 [13:01:54] Epoch: 1 Batch: 11180/20099 (55.62%) Loss: 2.218387 LR: 0.00001985 [13:01:55] Epoch: 1 Batch: 11181/20099 (55.63%) Loss: 2.202329 LR: 0.00001985 [13:01:57] Epoch: 1 Batch: 11182/20099 (55.63%) Loss: 1.900783 LR: 
0.00001984 [13:01:59] Epoch: 1 Batch: 11183/20099 (55.64%) Loss: 2.277054 LR: 0.00001984 [13:02:01] Epoch: 1 Batch: 11184/20099 (55.64%) Loss: 2.131933 LR: 0.00001984 [13:02:03] Epoch: 1 Batch: 11185/20099 (55.65%) Loss: 1.944875 LR: 0.00001984 [13:02:04] Epoch: 1 Batch: 11186/20099 (55.65%) Loss: 1.939227 LR: 0.00001984 [13:02:06] Epoch: 1 Batch: 11187/20099 (55.66%) Loss: 1.760072 LR: 0.00001984 [13:02:08] Epoch: 1 Batch: 11188/20099 (55.66%) Loss: 2.318941 LR: 0.00001984 [13:02:10] Epoch: 1 Batch: 11189/20099 (55.67%) Loss: 1.939994 LR: 0.00001982 [13:02:11] Epoch: 1 Batch: 11190/20099 (55.67%) Loss: 1.928573 LR: 0.00001982 [13:02:13] Epoch: 1 Batch: 11191/20099 (55.68%) Loss: 2.368747 LR: 0.00001982 [13:02:15] Epoch: 1 Batch: 11192/20099 (55.68%) Loss: 2.016939 LR: 0.00001982 [13:02:17] Epoch: 1 Batch: 11193/20099 (55.69%) Loss: 2.142023 LR: 0.00001982 [13:02:18] Epoch: 1 Batch: 11194/20099 (55.69%) Loss: 1.963130 LR: 0.00001982 [13:02:20] Epoch: 1 Batch: 11195/20099 (55.70%) Loss: 2.013131 LR: 0.00001982 [13:02:22] Epoch: 1 Batch: 11196/20099 (55.70%) Loss: 2.361329 LR: 0.00001980 [13:02:24] Epoch: 1 Batch: 11197/20099 (55.71%) Loss: 2.094740 LR: 0.00001980 [13:02:26] Epoch: 1 Batch: 11198/20099 (55.71%) Loss: 1.927511 LR: 0.00001980 [13:02:27] Epoch: 1 Batch: 11199/20099 (55.72%) Loss: 2.178116 LR: 0.00001980 [13:02:33] >> Cleaned up old temp checkpoint: epoch1_step9200 [13:02:33] >> Temp checkpoint saved: epoch1_step11200, size: 0.1693 GB [13:02:33] Epoch: 1 Batch: 11200/20099 (55.72%) Loss: 2.117837 LR: 0.00001980 [13:02:35] Epoch: 1 Batch: 11201/20099 (55.73%) Loss: 2.048815 LR: 0.00001980 [13:02:37] Epoch: 1 Batch: 11202/20099 (55.73%) Loss: 2.210438 LR: 0.00001980 [13:02:38] Epoch: 1 Batch: 11203/20099 (55.74%) Loss: 1.654237 LR: 0.00001979 [13:02:40] Epoch: 1 Batch: 11204/20099 (55.74%) Loss: 1.887697 LR: 0.00001979 [13:02:42] Epoch: 1 Batch: 11205/20099 (55.75%) Loss: 2.311118 LR: 0.00001979 [13:02:44] Epoch: 1 Batch: 11206/20099 (55.75%) Loss: 
2.246371 LR: 0.00001979 [13:02:45] Epoch: 1 Batch: 11207/20099 (55.76%) Loss: 2.026492 LR: 0.00001979 [13:02:47] Epoch: 1 Batch: 11208/20099 (55.76%) Loss: 2.118731 LR: 0.00001979 [13:02:49] Epoch: 1 Batch: 11209/20099 (55.77%) Loss: 2.043592 LR: 0.00001979 [13:02:51] Epoch: 1 Batch: 11210/20099 (55.77%) Loss: 2.081089 LR: 0.00001977 [13:02:52] Epoch: 1 Batch: 11211/20099 (55.78%) Loss: 2.146350 LR: 0.00001977 [13:02:54] Epoch: 1 Batch: 11212/20099 (55.78%) Loss: 2.350594 LR: 0.00001977 [13:02:56] Epoch: 1 Batch: 11213/20099 (55.79%) Loss: 2.168718 LR: 0.00001977 [13:02:58] Epoch: 1 Batch: 11214/20099 (55.79%) Loss: 2.336211 LR: 0.00001977 [13:03:00] Epoch: 1 Batch: 11215/20099 (55.80%) Loss: 2.022901 LR: 0.00001977 [13:03:01] Epoch: 1 Batch: 11216/20099 (55.80%) Loss: 1.980048 LR: 0.00001977 [13:03:03] Epoch: 1 Batch: 11217/20099 (55.81%) Loss: 2.107671 LR: 0.00001976 [13:03:05] Epoch: 1 Batch: 11218/20099 (55.81%) Loss: 2.236177 LR: 0.00001976 [13:03:07] Epoch: 1 Batch: 11219/20099 (55.82%) Loss: 2.207035 LR: 0.00001976 [13:03:09] Epoch: 1 Batch: 11220/20099 (55.82%) Loss: 2.209307 LR: 0.00001976 [13:03:10] Epoch: 1 Batch: 11221/20099 (55.83%) Loss: 1.631344 LR: 0.00001976 [13:03:12] Epoch: 1 Batch: 11222/20099 (55.83%) Loss: 2.269216 LR: 0.00001976 [13:03:14] Epoch: 1 Batch: 11223/20099 (55.84%) Loss: 1.989925 LR: 0.00001976 [13:03:16] Epoch: 1 Batch: 11224/20099 (55.84%) Loss: 2.149505 LR: 0.00001974 [13:03:18] Epoch: 1 Batch: 11225/20099 (55.85%) Loss: 2.109759 LR: 0.00001974 [13:03:19] Epoch: 1 Batch: 11226/20099 (55.85%) Loss: 2.013200 LR: 0.00001974 [13:03:21] Epoch: 1 Batch: 11227/20099 (55.86%) Loss: 2.192127 LR: 0.00001974 [13:03:23] Epoch: 1 Batch: 11228/20099 (55.86%) Loss: 2.141660 LR: 0.00001974 [13:03:25] Epoch: 1 Batch: 11229/20099 (55.87%) Loss: 2.195828 LR: 0.00001974 [13:03:27] Epoch: 1 Batch: 11230/20099 (55.87%) Loss: 1.978537 LR: 0.00001974 [13:03:28] Epoch: 1 Batch: 11231/20099 (55.88%) Loss: 2.440528 LR: 0.00001972 [13:03:30] Epoch: 1 
Batch: 11232/20099 (55.88%) Loss: 2.089115 LR: 0.00001972 [13:03:32] Epoch: 1 Batch: 11233/20099 (55.89%) Loss: 2.221148 LR: 0.00001972 [13:03:34] Epoch: 1 Batch: 11234/20099 (55.89%) Loss: 2.076248 LR: 0.00001972 [13:03:36] Epoch: 1 Batch: 11235/20099 (55.90%) Loss: 2.048286 LR: 0.00001972 [13:03:37] Epoch: 1 Batch: 11236/20099 (55.90%) Loss: 2.131194 LR: 0.00001972 [13:03:39] Epoch: 1 Batch: 11237/20099 (55.91%) Loss: 1.896971 LR: 0.00001972 [13:03:41] Epoch: 1 Batch: 11238/20099 (55.91%) Loss: 1.969922 LR: 0.00001971 [13:03:43] Epoch: 1 Batch: 11239/20099 (55.92%) Loss: 2.010126 LR: 0.00001971 [13:03:44] Epoch: 1 Batch: 11240/20099 (55.92%) Loss: 1.920749 LR: 0.00001971 [13:03:46] Epoch: 1 Batch: 11241/20099 (55.93%) Loss: 2.067146 LR: 0.00001971 [13:03:48] Epoch: 1 Batch: 11242/20099 (55.93%) Loss: 2.185040 LR: 0.00001971 [13:03:50] Epoch: 1 Batch: 11243/20099 (55.94%) Loss: 1.874555 LR: 0.00001971 [13:03:51] Epoch: 1 Batch: 11244/20099 (55.94%) Loss: 2.063280 LR: 0.00001971 [13:03:53] Epoch: 1 Batch: 11245/20099 (55.95%) Loss: 2.348310 LR: 0.00001969 [13:03:55] Epoch: 1 Batch: 11246/20099 (55.95%) Loss: 2.320829 LR: 0.00001969 [13:03:57] Epoch: 1 Batch: 11247/20099 (55.96%) Loss: 2.435837 LR: 0.00001969 [13:03:58] Epoch: 1 Batch: 11248/20099 (55.96%) Loss: 1.768005 LR: 0.00001969 [13:04:00] Epoch: 1 Batch: 11249/20099 (55.97%) Loss: 2.292635 LR: 0.00001969 [13:04:02] Epoch: 1 Batch: 11250/20099 (55.97%) Loss: 1.930801 LR: 0.00001969 [13:04:04] Epoch: 1 Batch: 11251/20099 (55.98%) Loss: 2.461676 LR: 0.00001969 [13:04:06] Epoch: 1 Batch: 11252/20099 (55.98%) Loss: 2.003017 LR: 0.00001968 [13:04:07] Epoch: 1 Batch: 11253/20099 (55.99%) Loss: 2.077899 LR: 0.00001968 [13:04:09] Epoch: 1 Batch: 11254/20099 (55.99%) Loss: 2.114132 LR: 0.00001968 [13:04:11] Epoch: 1 Batch: 11255/20099 (56.00%) Loss: 2.282373 LR: 0.00001968 [13:04:13] Epoch: 1 Batch: 11256/20099 (56.00%) Loss: 2.036855 LR: 0.00001968 [13:04:14] Epoch: 1 Batch: 11257/20099 (56.01%) Loss: 2.101515 LR: 
0.00001968 [13:04:16] Epoch: 1 Batch: 11258/20099 (56.01%) Loss: 2.050449 LR: 0.00001968 [13:04:18] Epoch: 1 Batch: 11259/20099 (56.02%) Loss: 1.911826 LR: 0.00001966 [13:04:20] Epoch: 1 Batch: 11260/20099 (56.02%) Loss: 1.956179 LR: 0.00001966 [13:04:21] Epoch: 1 Batch: 11261/20099 (56.03%) Loss: 2.082642 LR: 0.00001966 [13:04:23] Epoch: 1 Batch: 11262/20099 (56.03%) Loss: 2.022380 LR: 0.00001966 [13:04:25] Epoch: 1 Batch: 11263/20099 (56.04%) Loss: 2.073771 LR: 0.00001966 [13:04:27] Epoch: 1 Batch: 11264/20099 (56.04%) Loss: 2.438944 LR: 0.00001966 [13:04:29] Epoch: 1 Batch: 11265/20099 (56.05%) Loss: 1.959609 LR: 0.00001966 [13:04:30] Epoch: 1 Batch: 11266/20099 (56.05%) Loss: 2.177157 LR: 0.00001964 [13:04:32] Epoch: 1 Batch: 11267/20099 (56.06%) Loss: 2.029596 LR: 0.00001964 [13:04:34] Epoch: 1 Batch: 11268/20099 (56.06%) Loss: 1.837508 LR: 0.00001964 [13:04:36] Epoch: 1 Batch: 11269/20099 (56.07%) Loss: 2.378138 LR: 0.00001964 [13:04:37] Epoch: 1 Batch: 11270/20099 (56.07%) Loss: 2.074793 LR: 0.00001964 [13:04:39] Epoch: 1 Batch: 11271/20099 (56.08%) Loss: 2.302495 LR: 0.00001964 [13:04:41] Epoch: 1 Batch: 11272/20099 (56.08%) Loss: 2.327219 LR: 0.00001964 [13:04:43] Epoch: 1 Batch: 11273/20099 (56.09%) Loss: 1.883996 LR: 0.00001963 [13:04:45] Epoch: 1 Batch: 11274/20099 (56.09%) Loss: 2.045943 LR: 0.00001963 [13:04:46] Epoch: 1 Batch: 11275/20099 (56.10%) Loss: 2.204832 LR: 0.00001963 [13:04:48] Epoch: 1 Batch: 11276/20099 (56.10%) Loss: 2.277378 LR: 0.00001963 [13:04:50] Epoch: 1 Batch: 11277/20099 (56.11%) Loss: 2.065509 LR: 0.00001963 [13:04:52] Epoch: 1 Batch: 11278/20099 (56.11%) Loss: 1.962020 LR: 0.00001963 [13:04:54] Epoch: 1 Batch: 11279/20099 (56.12%) Loss: 1.993627 LR: 0.00001963 [13:04:55] Epoch: 1 Batch: 11280/20099 (56.12%) Loss: 2.045679 LR: 0.00001961 [13:04:57] Epoch: 1 Batch: 11281/20099 (56.13%) Loss: 2.177164 LR: 0.00001961 [13:04:59] Epoch: 1 Batch: 11282/20099 (56.13%) Loss: 2.080282 LR: 0.00001961 [13:05:01] Epoch: 1 Batch: 11283/20099 
(56.14%) Loss: 2.360803 LR: 0.00001961 [13:05:02] Epoch: 1 Batch: 11284/20099 (56.14%) Loss: 2.004181 LR: 0.00001961 [13:05:04] Epoch: 1 Batch: 11285/20099 (56.15%) Loss: 1.995249 LR: 0.00001961 [13:05:06] Epoch: 1 Batch: 11286/20099 (56.15%) Loss: 2.136674 LR: 0.00001961 [13:05:08] Epoch: 1 Batch: 11287/20099 (56.16%) Loss: 1.978145 LR: 0.00001960 [13:05:10] Epoch: 1 Batch: 11288/20099 (56.16%) Loss: 2.072719 LR: 0.00001960 [13:05:11] Epoch: 1 Batch: 11289/20099 (56.17%) Loss: 1.981132 LR: 0.00001960 [13:05:13] Epoch: 1 Batch: 11290/20099 (56.17%) Loss: 2.090764 LR: 0.00001960 [13:05:15] Epoch: 1 Batch: 11291/20099 (56.18%) Loss: 2.086682 LR: 0.00001960 [13:05:17] Epoch: 1 Batch: 11292/20099 (56.18%) Loss: 2.070847 LR: 0.00001960 [13:05:18] Epoch: 1 Batch: 11293/20099 (56.19%) Loss: 1.809889 LR: 0.00001960 [13:05:20] Epoch: 1 Batch: 11294/20099 (56.19%) Loss: 1.885626 LR: 0.00001958 [13:05:22] Epoch: 1 Batch: 11295/20099 (56.20%) Loss: 1.965294 LR: 0.00001958 [13:05:24] Epoch: 1 Batch: 11296/20099 (56.20%) Loss: 2.066365 LR: 0.00001958 [13:05:25] Epoch: 1 Batch: 11297/20099 (56.21%) Loss: 2.211788 LR: 0.00001958 [13:05:27] Epoch: 1 Batch: 11298/20099 (56.21%) Loss: 1.753140 LR: 0.00001958 [13:05:29] Epoch: 1 Batch: 11299/20099 (56.22%) Loss: 1.986456 LR: 0.00001958 [13:05:31] Epoch: 1 Batch: 11300/20099 (56.22%) Loss: 2.001939 LR: 0.00001958 [13:05:33] Epoch: 1 Batch: 11301/20099 (56.23%) Loss: 2.047039 LR: 0.00001956 [13:05:34] Epoch: 1 Batch: 11302/20099 (56.23%) Loss: 2.100166 LR: 0.00001956 [13:05:36] Epoch: 1 Batch: 11303/20099 (56.24%) Loss: 2.227453 LR: 0.00001956 [13:05:38] Epoch: 1 Batch: 11304/20099 (56.24%) Loss: 2.009810 LR: 0.00001956 [13:05:40] Epoch: 1 Batch: 11305/20099 (56.25%) Loss: 1.931912 LR: 0.00001956 [13:05:41] Epoch: 1 Batch: 11306/20099 (56.25%) Loss: 2.284395 LR: 0.00001956 [13:05:43] Epoch: 1 Batch: 11307/20099 (56.26%) Loss: 2.050078 LR: 0.00001956 [13:05:45] Epoch: 1 Batch: 11308/20099 (56.26%) Loss: 2.096411 LR: 0.00001955 [13:05:47] 
Epoch: 1 Batch: 11309/20099 (56.27%) Loss: 2.051000 LR: 0.00001955 [13:05:48] Epoch: 1 Batch: 11310/20099 (56.27%) Loss: 2.103501 LR: 0.00001955 [13:05:50] Epoch: 1 Batch: 11311/20099 (56.28%) Loss: 1.910689 LR: 0.00001955 [13:05:52] Epoch: 1 Batch: 11312/20099 (56.28%) Loss: 2.224805 LR: 0.00001955 [13:05:54] Epoch: 1 Batch: 11313/20099 (56.29%) Loss: 2.295284 LR: 0.00001955 [13:05:56] Epoch: 1 Batch: 11314/20099 (56.29%) Loss: 2.096960 LR: 0.00001955 [13:05:57] Epoch: 1 Batch: 11315/20099 (56.30%) Loss: 2.190556 LR: 0.00001953 [13:05:59] Epoch: 1 Batch: 11316/20099 (56.30%) Loss: 2.132529 LR: 0.00001953 [13:06:01] Epoch: 1 Batch: 11317/20099 (56.31%) Loss: 2.180504 LR: 0.00001953 [13:06:03] Epoch: 1 Batch: 11318/20099 (56.31%) Loss: 2.085730 LR: 0.00001953 [13:06:04] Epoch: 1 Batch: 11319/20099 (56.32%) Loss: 1.908525 LR: 0.00001953 [13:06:06] Epoch: 1 Batch: 11320/20099 (56.32%) Loss: 2.427936 LR: 0.00001953 [13:06:08] Epoch: 1 Batch: 11321/20099 (56.33%) Loss: 2.184149 LR: 0.00001953 [13:06:10] Epoch: 1 Batch: 11322/20099 (56.33%) Loss: 2.034099 LR: 0.00001951 [13:06:11] Epoch: 1 Batch: 11323/20099 (56.34%) Loss: 2.310478 LR: 0.00001951 [13:06:13] Epoch: 1 Batch: 11324/20099 (56.34%) Loss: 2.122055 LR: 0.00001951 [13:06:15] Epoch: 1 Batch: 11325/20099 (56.35%) Loss: 2.224063 LR: 0.00001951 [13:06:17] Epoch: 1 Batch: 11326/20099 (56.35%) Loss: 2.144893 LR: 0.00001951 [13:06:19] Epoch: 1 Batch: 11327/20099 (56.36%) Loss: 2.175828 LR: 0.00001951 [13:06:20] Epoch: 1 Batch: 11328/20099 (56.36%) Loss: 1.744901 LR: 0.00001951 [13:06:22] Epoch: 1 Batch: 11329/20099 (56.37%) Loss: 2.340363 LR: 0.00001950 [13:06:24] Epoch: 1 Batch: 11330/20099 (56.37%) Loss: 2.148187 LR: 0.00001950 [13:06:26] Epoch: 1 Batch: 11331/20099 (56.38%) Loss: 1.705023 LR: 0.00001950 [13:06:27] Epoch: 1 Batch: 11332/20099 (56.38%) Loss: 2.412832 LR: 0.00001950 [13:06:29] Epoch: 1 Batch: 11333/20099 (56.39%) Loss: 1.643580 LR: 0.00001950 [13:06:31] Epoch: 1 Batch: 11334/20099 (56.39%) Loss: 
1.913682 LR: 0.00001950 [13:06:33] Epoch: 1 Batch: 11335/20099 (56.40%) Loss: 2.036838 LR: 0.00001950 [13:06:34] Epoch: 1 Batch: 11336/20099 (56.40%) Loss: 2.181314 LR: 0.00001948 [13:06:36] Epoch: 1 Batch: 11337/20099 (56.41%) Loss: 2.526333 LR: 0.00001948 [13:06:38] Epoch: 1 Batch: 11338/20099 (56.41%) Loss: 2.379275 LR: 0.00001948 [13:06:40] Epoch: 1 Batch: 11339/20099 (56.42%) Loss: 2.461579 LR: 0.00001948 [13:06:42] Epoch: 1 Batch: 11340/20099 (56.42%) Loss: 2.352740 LR: 0.00001948 [13:06:43] Epoch: 1 Batch: 11341/20099 (56.43%) Loss: 1.739417 LR: 0.00001948 [13:06:45] Epoch: 1 Batch: 11342/20099 (56.43%) Loss: 2.320500 LR: 0.00001948 [13:06:47] Epoch: 1 Batch: 11343/20099 (56.44%) Loss: 2.353551 LR: 0.00001947 [13:06:49] Epoch: 1 Batch: 11344/20099 (56.44%) Loss: 2.305062 LR: 0.00001947 [13:06:50] Epoch: 1 Batch: 11345/20099 (56.45%) Loss: 2.190596 LR: 0.00001947 [13:06:52] Epoch: 1 Batch: 11346/20099 (56.45%) Loss: 2.044444 LR: 0.00001947 [13:06:54] Epoch: 1 Batch: 11347/20099 (56.46%) Loss: 2.023049 LR: 0.00001947 [13:06:56] Epoch: 1 Batch: 11348/20099 (56.46%) Loss: 2.206058 LR: 0.00001947 [13:06:57] Epoch: 1 Batch: 11349/20099 (56.47%) Loss: 1.973364 LR: 0.00001947 [13:06:59] Epoch: 1 Batch: 11350/20099 (56.47%) Loss: 2.224528 LR: 0.00001945 [13:07:01] Epoch: 1 Batch: 11351/20099 (56.48%) Loss: 2.414248 LR: 0.00001945 [13:07:03] Epoch: 1 Batch: 11352/20099 (56.48%) Loss: 2.240918 LR: 0.00001945 [13:07:05] Epoch: 1 Batch: 11353/20099 (56.49%) Loss: 2.201759 LR: 0.00001945 [13:07:06] Epoch: 1 Batch: 11354/20099 (56.49%) Loss: 1.890977 LR: 0.00001945 [13:07:08] Epoch: 1 Batch: 11355/20099 (56.50%) Loss: 2.151474 LR: 0.00001945 [13:07:10] Epoch: 1 Batch: 11356/20099 (56.50%) Loss: 2.077891 LR: 0.00001945 [13:07:12] Epoch: 1 Batch: 11357/20099 (56.51%) Loss: 2.250790 LR: 0.00001943 [13:07:14] Epoch: 1 Batch: 11358/20099 (56.51%) Loss: 2.115627 LR: 0.00001943 [13:07:15] Epoch: 1 Batch: 11359/20099 (56.52%) Loss: 2.121855 LR: 0.00001943 [13:07:17] Epoch: 1 
Batch: 11360/20099 (56.52%) Loss: 2.146965 LR: 0.00001943 [13:07:19] Epoch: 1 Batch: 11361/20099 (56.53%) Loss: 2.337461 LR: 0.00001943 [13:07:21] Epoch: 1 Batch: 11362/20099 (56.53%) Loss: 2.336686 LR: 0.00001943 [13:07:22] Epoch: 1 Batch: 11363/20099 (56.54%) Loss: 2.493858 LR: 0.00001943 [13:07:24] Epoch: 1 Batch: 11364/20099 (56.54%) Loss: 2.311725 LR: 0.00001942 [13:07:26] Epoch: 1 Batch: 11365/20099 (56.55%) Loss: 1.889207 LR: 0.00001942 [13:07:28] Epoch: 1 Batch: 11366/20099 (56.55%) Loss: 1.879188 LR: 0.00001942 [13:07:30] Epoch: 1 Batch: 11367/20099 (56.56%) Loss: 2.244578 LR: 0.00001942 [13:07:31] Epoch: 1 Batch: 11368/20099 (56.56%) Loss: 2.114572 LR: 0.00001942 [13:07:33] Epoch: 1 Batch: 11369/20099 (56.57%) Loss: 1.945596 LR: 0.00001942 [13:07:35] Epoch: 1 Batch: 11370/20099 (56.57%) Loss: 2.061101 LR: 0.00001942 [13:07:37] Epoch: 1 Batch: 11371/20099 (56.57%) Loss: 2.125188 LR: 0.00001940 [13:07:38] Epoch: 1 Batch: 11372/20099 (56.58%) Loss: 2.432108 LR: 0.00001940 [13:07:40] Epoch: 1 Batch: 11373/20099 (56.58%) Loss: 2.489977 LR: 0.00001940 [13:07:42] Epoch: 1 Batch: 11374/20099 (56.59%) Loss: 2.029814 LR: 0.00001940 [13:07:44] Epoch: 1 Batch: 11375/20099 (56.59%) Loss: 1.594001 LR: 0.00001940 [13:07:46] Epoch: 1 Batch: 11376/20099 (56.60%) Loss: 2.023249 LR: 0.00001940 [13:07:47] Epoch: 1 Batch: 11377/20099 (56.60%) Loss: 2.049863 LR: 0.00001940 [13:07:49] Epoch: 1 Batch: 11378/20099 (56.61%) Loss: 1.976772 LR: 0.00001939 [13:07:51] Epoch: 1 Batch: 11379/20099 (56.61%) Loss: 1.836416 LR: 0.00001939 [13:07:53] Epoch: 1 Batch: 11380/20099 (56.62%) Loss: 1.959018 LR: 0.00001939 [13:07:55] Epoch: 1 Batch: 11381/20099 (56.62%) Loss: 1.980948 LR: 0.00001939 [13:07:56] Epoch: 1 Batch: 11382/20099 (56.63%) Loss: 2.062345 LR: 0.00001939 [13:07:58] Epoch: 1 Batch: 11383/20099 (56.63%) Loss: 2.187510 LR: 0.00001939 [13:08:00] Epoch: 1 Batch: 11384/20099 (56.64%) Loss: 2.155882 LR: 0.00001939 [13:08:02] Epoch: 1 Batch: 11385/20099 (56.64%) Loss: 2.204037 LR: 
0.00001937 [13:08:03] Epoch: 1 Batch: 11386/20099 (56.65%) Loss: 2.077801 LR: 0.00001937 [13:08:05] Epoch: 1 Batch: 11387/20099 (56.65%) Loss: 1.888232 LR: 0.00001937 [13:08:07] Epoch: 1 Batch: 11388/20099 (56.66%) Loss: 1.948736 LR: 0.00001937 [13:08:09] Epoch: 1 Batch: 11389/20099 (56.66%) Loss: 2.230695 LR: 0.00001937 [13:08:11] Epoch: 1 Batch: 11390/20099 (56.67%) Loss: 2.056886 LR: 0.00001937 [13:08:12] Epoch: 1 Batch: 11391/20099 (56.67%) Loss: 2.056435 LR: 0.00001937 [13:08:14] Epoch: 1 Batch: 11392/20099 (56.68%) Loss: 2.213975 LR: 0.00001935 [13:08:16] Epoch: 1 Batch: 11393/20099 (56.68%) Loss: 1.933760 LR: 0.00001935 [13:08:18] Epoch: 1 Batch: 11394/20099 (56.69%) Loss: 1.984713 LR: 0.00001935 [13:08:20] Epoch: 1 Batch: 11395/20099 (56.69%) Loss: 2.134214 LR: 0.00001935 [13:08:21] Epoch: 1 Batch: 11396/20099 (56.70%) Loss: 1.752936 LR: 0.00001935 [13:08:23] Epoch: 1 Batch: 11397/20099 (56.70%) Loss: 1.968265 LR: 0.00001935 [13:08:25] Epoch: 1 Batch: 11398/20099 (56.71%) Loss: 2.066082 LR: 0.00001935 [13:08:27] Epoch: 1 Batch: 11399/20099 (56.71%) Loss: 2.216040 LR: 0.00001934 [13:08:32] >> Cleaned up old temp checkpoint: epoch1_step9400 [13:08:32] >> Temp checkpoint saved: epoch1_step11400, size: 0.1693 GB [13:08:32] Epoch: 1 Batch: 11400/20099 (56.72%) Loss: 2.242309 LR: 0.00001934 [13:08:34] Epoch: 1 Batch: 11401/20099 (56.72%) Loss: 1.944973 LR: 0.00001934 [13:08:35] Epoch: 1 Batch: 11402/20099 (56.73%) Loss: 2.280672 LR: 0.00001934 [13:08:37] Epoch: 1 Batch: 11403/20099 (56.73%) Loss: 1.662863 LR: 0.00001934 [13:08:39] Epoch: 1 Batch: 11404/20099 (56.74%) Loss: 2.062529 LR: 0.00001934 [13:08:41] Epoch: 1 Batch: 11405/20099 (56.74%) Loss: 2.198876 LR: 0.00001934 [13:08:42] Epoch: 1 Batch: 11406/20099 (56.75%) Loss: 1.877169 LR: 0.00001932 [13:08:44] Epoch: 1 Batch: 11407/20099 (56.75%) Loss: 2.145426 LR: 0.00001932 [13:08:46] Epoch: 1 Batch: 11408/20099 (56.76%) Loss: 2.060930 LR: 0.00001932 [13:08:48] Epoch: 1 Batch: 11409/20099 (56.76%) Loss: 
2.292199 LR: 0.00001932 [13:08:50] Epoch: 1 Batch: 11410/20099 (56.77%) Loss: 2.269112 LR: 0.00001932 [13:08:51] Epoch: 1 Batch: 11411/20099 (56.77%) Loss: 1.956730 LR: 0.00001932 [13:08:53] Epoch: 1 Batch: 11412/20099 (56.78%) Loss: 2.187746 LR: 0.00001932 [13:08:55] Epoch: 1 Batch: 11413/20099 (56.78%) Loss: 1.871574 LR: 0.00001930 [13:08:57] Epoch: 1 Batch: 11414/20099 (56.79%) Loss: 2.281445 LR: 0.00001930 [13:08:59] Epoch: 1 Batch: 11415/20099 (56.79%) Loss: 2.081698 LR: 0.00001930 [13:09:00] Epoch: 1 Batch: 11416/20099 (56.80%) Loss: 2.137792 LR: 0.00001930 [13:09:02] Epoch: 1 Batch: 11417/20099 (56.80%) Loss: 2.271646 LR: 0.00001930 [13:09:04] Epoch: 1 Batch: 11418/20099 (56.81%) Loss: 1.861923 LR: 0.00001930 [13:09:06] Epoch: 1 Batch: 11419/20099 (56.81%) Loss: 1.984867 LR: 0.00001930 [13:09:08] Epoch: 1 Batch: 11420/20099 (56.82%) Loss: 2.365150 LR: 0.00001929 [13:09:09] Epoch: 1 Batch: 11421/20099 (56.82%) Loss: 2.036678 LR: 0.00001929 [13:09:11] Epoch: 1 Batch: 11422/20099 (56.83%) Loss: 2.108927 LR: 0.00001929 [13:09:13] Epoch: 1 Batch: 11423/20099 (56.83%) Loss: 2.426346 LR: 0.00001929 [13:09:15] Epoch: 1 Batch: 11424/20099 (56.84%) Loss: 2.120196 LR: 0.00001929 [13:09:17] Epoch: 1 Batch: 11425/20099 (56.84%) Loss: 2.055756 LR: 0.00001929 [13:09:18] Epoch: 1 Batch: 11426/20099 (56.85%) Loss: 1.953622 LR: 0.00001929 [13:09:20] Epoch: 1 Batch: 11427/20099 (56.85%) Loss: 1.957633 LR: 0.00001927 [13:09:22] Epoch: 1 Batch: 11428/20099 (56.86%) Loss: 2.009292 LR: 0.00001927 [13:09:24] Epoch: 1 Batch: 11429/20099 (56.86%) Loss: 2.059504 LR: 0.00001927 [13:09:26] Epoch: 1 Batch: 11430/20099 (56.87%) Loss: 2.192499 LR: 0.00001927 [13:09:27] Epoch: 1 Batch: 11431/20099 (56.87%) Loss: 2.103674 LR: 0.00001927 [13:09:29] Epoch: 1 Batch: 11432/20099 (56.88%) Loss: 2.478964 LR: 0.00001927 [13:09:31] Epoch: 1 Batch: 11433/20099 (56.88%) Loss: 1.848549 LR: 0.00001927 [13:09:33] Epoch: 1 Batch: 11434/20099 (56.89%) Loss: 2.293578 LR: 0.00001926 [13:09:34] Epoch: 1 
Batch: 11435/20099 (56.89%) Loss: 2.033075 LR: 0.00001926 [13:09:36] Epoch: 1 Batch: 11436/20099 (56.90%) Loss: 2.013788 LR: 0.00001926 [13:09:38] Epoch: 1 Batch: 11437/20099 (56.90%) Loss: 1.565673 LR: 0.00001926 [13:09:40] Epoch: 1 Batch: 11438/20099 (56.91%) Loss: 2.020387 LR: 0.00001926 [13:09:41] Epoch: 1 Batch: 11439/20099 (56.91%) Loss: 2.269745 LR: 0.00001926 [13:09:43] Epoch: 1 Batch: 11440/20099 (56.92%) Loss: 1.980197 LR: 0.00001926 [13:09:45] Epoch: 1 Batch: 11441/20099 (56.92%) Loss: 2.186114 LR: 0.00001924 [13:09:47] Epoch: 1 Batch: 11442/20099 (56.93%) Loss: 2.224105 LR: 0.00001924 [13:09:48] Epoch: 1 Batch: 11443/20099 (56.93%) Loss: 2.105724 LR: 0.00001924 [13:09:50] Epoch: 1 Batch: 11444/20099 (56.94%) Loss: 2.224580 LR: 0.00001924 [13:09:52] Epoch: 1 Batch: 11445/20099 (56.94%) Loss: 2.038245 LR: 0.00001924 [13:09:54] Epoch: 1 Batch: 11446/20099 (56.95%) Loss: 1.757391 LR: 0.00001924 [13:09:56] Epoch: 1 Batch: 11447/20099 (56.95%) Loss: 1.946046 LR: 0.00001924 [13:09:57] Epoch: 1 Batch: 11448/20099 (56.96%) Loss: 2.174122 LR: 0.00001922 [13:09:59] Epoch: 1 Batch: 11449/20099 (56.96%) Loss: 2.244912 LR: 0.00001922 [13:10:01] Epoch: 1 Batch: 11450/20099 (56.97%) Loss: 2.034544 LR: 0.00001922 [13:10:03] Epoch: 1 Batch: 11451/20099 (56.97%) Loss: 1.958884 LR: 0.00001922 [13:10:04] Epoch: 1 Batch: 11452/20099 (56.98%) Loss: 2.295491 LR: 0.00001922 [13:10:06] Epoch: 1 Batch: 11453/20099 (56.98%) Loss: 1.851385 LR: 0.00001922 [13:10:08] Epoch: 1 Batch: 11454/20099 (56.99%) Loss: 1.863563 LR: 0.00001922 [13:10:10] Epoch: 1 Batch: 11455/20099 (56.99%) Loss: 1.760401 LR: 0.00001921 [13:10:12] Epoch: 1 Batch: 11456/20099 (57.00%) Loss: 2.162434 LR: 0.00001921 [13:10:13] Epoch: 1 Batch: 11457/20099 (57.00%) Loss: 1.903727 LR: 0.00001921 [13:10:15] Epoch: 1 Batch: 11458/20099 (57.01%) Loss: 2.464973 LR: 0.00001921 [13:10:17] Epoch: 1 Batch: 11459/20099 (57.01%) Loss: 2.344601 LR: 0.00001921 [13:10:19] Epoch: 1 Batch: 11460/20099 (57.02%) Loss: 2.198756 LR: 
0.00001921 [13:10:20] Epoch: 1 Batch: 11461/20099 (57.02%) Loss: 2.290895 LR: 0.00001921 [13:10:22] Epoch: 1 Batch: 11462/20099 (57.03%) Loss: 2.205675 LR: 0.00001919 [13:10:24] Epoch: 1 Batch: 11463/20099 (57.03%) Loss: 1.817565 LR: 0.00001919 [13:10:26] Epoch: 1 Batch: 11464/20099 (57.04%) Loss: 2.213937 LR: 0.00001919 [13:10:27] Epoch: 1 Batch: 11465/20099 (57.04%) Loss: 2.115468 LR: 0.00001919 [13:10:29] Epoch: 1 Batch: 11466/20099 (57.05%) Loss: 1.782922 LR: 0.00001919 [13:10:31] Epoch: 1 Batch: 11467/20099 (57.05%) Loss: 2.203735 LR: 0.00001919 [13:10:33] Epoch: 1 Batch: 11468/20099 (57.06%) Loss: 1.944154 LR: 0.00001919 [13:10:35] Epoch: 1 Batch: 11469/20099 (57.06%) Loss: 1.941014 LR: 0.00001918 [13:10:36] Epoch: 1 Batch: 11470/20099 (57.07%) Loss: 1.933968 LR: 0.00001918 [13:10:38] Epoch: 1 Batch: 11471/20099 (57.07%) Loss: 2.072848 LR: 0.00001918 [13:10:40] Epoch: 1 Batch: 11472/20099 (57.08%) Loss: 1.899228 LR: 0.00001918 [13:10:42] Epoch: 1 Batch: 11473/20099 (57.08%) Loss: 2.209858 LR: 0.00001918 [13:10:43] Epoch: 1 Batch: 11474/20099 (57.09%) Loss: 2.413316 LR: 0.00001918 [13:10:45] Epoch: 1 Batch: 11475/20099 (57.09%) Loss: 2.120059 LR: 0.00001918 [13:10:47] Epoch: 1 Batch: 11476/20099 (57.10%) Loss: 2.231450 LR: 0.00001916 [13:10:49] Epoch: 1 Batch: 11477/20099 (57.10%) Loss: 1.839533 LR: 0.00001916 [13:10:51] Epoch: 1 Batch: 11478/20099 (57.11%) Loss: 1.803795 LR: 0.00001916 [13:10:52] Epoch: 1 Batch: 11479/20099 (57.11%) Loss: 1.992642 LR: 0.00001916 [13:10:54] Epoch: 1 Batch: 11480/20099 (57.12%) Loss: 2.046949 LR: 0.00001916 [13:10:56] Epoch: 1 Batch: 11481/20099 (57.12%) Loss: 2.093298 LR: 0.00001916 [13:10:58] Epoch: 1 Batch: 11482/20099 (57.13%) Loss: 2.037118 LR: 0.00001916 [13:10:59] Epoch: 1 Batch: 11483/20099 (57.13%) Loss: 2.064366 LR: 0.00001914 [13:11:01] Epoch: 1 Batch: 11484/20099 (57.14%) Loss: 2.094176 LR: 0.00001914 [13:11:03] Epoch: 1 Batch: 11485/20099 (57.14%) Loss: 1.955821 LR: 0.00001914 [13:11:05] Epoch: 1 Batch: 11486/20099 
(57.15%) Loss: 1.956538 LR: 0.00001914 [13:11:07] Epoch: 1 Batch: 11487/20099 (57.15%) Loss: 2.043616 LR: 0.00001914 [13:11:08] Epoch: 1 Batch: 11488/20099 (57.16%) Loss: 2.080916 LR: 0.00001914 [13:11:10] Epoch: 1 Batch: 11489/20099 (57.16%) Loss: 2.071719 LR: 0.00001914 [13:11:12] Epoch: 1 Batch: 11490/20099 (57.17%) Loss: 2.172902 LR: 0.00001913 [13:11:14] Epoch: 1 Batch: 11491/20099 (57.17%) Loss: 1.306231 LR: 0.00001913 [13:11:15] Epoch: 1 Batch: 11492/20099 (57.18%) Loss: 1.795804 LR: 0.00001913 [13:11:17] Epoch: 1 Batch: 11493/20099 (57.18%) Loss: 2.290836 LR: 0.00001913 [13:11:19] Epoch: 1 Batch: 11494/20099 (57.19%) Loss: 2.129313 LR: 0.00001913 [13:11:21] Epoch: 1 Batch: 11495/20099 (57.19%) Loss: 2.286456 LR: 0.00001913 [13:11:23] Epoch: 1 Batch: 11496/20099 (57.20%) Loss: 2.032724 LR: 0.00001913 [13:11:24] Epoch: 1 Batch: 11497/20099 (57.20%) Loss: 1.884629 LR: 0.00001911 [13:11:26] Epoch: 1 Batch: 11498/20099 (57.21%) Loss: 2.237799 LR: 0.00001911 [13:11:28] Epoch: 1 Batch: 11499/20099 (57.21%) Loss: 2.120321 LR: 0.00001911 [13:11:30] >> Evaluating batch 0 [13:11:31] >> Evaluating batch 1 [13:11:32] >> Evaluating batch 2 [13:11:33] >> Evaluating batch 3 [13:11:34] >> Evaluating batch 4 [13:11:35] >> Evaluating batch 5 [13:11:36] >> Evaluating batch 6 [13:11:37] >> Evaluating batch 7 [13:11:38] >> Evaluating batch 8 [13:11:39] >> Evaluating batch 9 [13:11:40] >> Evaluating batch 10 [13:11:41] >> Evaluating batch 11 [13:11:42] >> Evaluating batch 12 [13:11:43] >> Evaluating batch 13 [13:11:44] >> Evaluating batch 14 [13:11:45] >> Evaluating batch 15 [13:11:46] >> Evaluating batch 16 [13:11:46] Epoch: 1 Step: 11500/20099 Evaluation: [13:11:46] Avg Loss Since Last Eval: 2.0994 Val Loss: 2.1593 Validation loss delta: -0.0035 Perplexity: 8.6652 LR: 0.00001911 [13:11:50] >> Checkpoint saved: epoch1_step11500, size: 0.1693 GB [13:11:50] Epoch: 1 Batch: 11500/20099 (57.22%) Loss: 2.133309 LR: 0.00001911 [13:11:52] Epoch: 1 Batch: 11501/20099 (57.22%) Loss: 
1.876649 LR: 0.00001911 [13:11:54] Epoch: 1 Batch: 11502/20099 (57.23%) Loss: 1.810048 LR: 0.00001911 [13:11:55] Epoch: 1 Batch: 11503/20099 (57.23%) Loss: 1.999212 LR: 0.00001911 [13:11:57] Epoch: 1 Batch: 11504/20099 (57.24%) Loss: 2.283310 LR: 0.00001909 [13:11:59] Epoch: 1 Batch: 11505/20099 (57.24%) Loss: 2.253425 LR: 0.00001909 [13:12:01] Epoch: 1 Batch: 11506/20099 (57.25%) Loss: 2.120220 LR: 0.00001909 [13:12:02] Epoch: 1 Batch: 11507/20099 (57.25%) Loss: 2.219967 LR: 0.00001909 [13:12:04] Epoch: 1 Batch: 11508/20099 (57.26%) Loss: 2.036957 LR: 0.00001909 [13:12:06] Epoch: 1 Batch: 11509/20099 (57.26%) Loss: 2.228568 LR: 0.00001909 [13:12:08] Epoch: 1 Batch: 11510/20099 (57.27%) Loss: 2.105385 LR: 0.00001909 [13:12:10] Epoch: 1 Batch: 11511/20099 (57.27%) Loss: 2.020589 LR: 0.00001908 [13:12:11] Epoch: 1 Batch: 11512/20099 (57.28%) Loss: 2.192149 LR: 0.00001908 [13:12:13] Epoch: 1 Batch: 11513/20099 (57.28%) Loss: 2.099419 LR: 0.00001908 [13:12:15] Epoch: 1 Batch: 11514/20099 (57.29%) Loss: 2.081747 LR: 0.00001908 [13:12:17] Epoch: 1 Batch: 11515/20099 (57.29%) Loss: 2.161253 LR: 0.00001908 [13:12:18] Epoch: 1 Batch: 11516/20099 (57.30%) Loss: 1.832454 LR: 0.00001908 [13:12:20] Epoch: 1 Batch: 11517/20099 (57.30%) Loss: 2.114930 LR: 0.00001908 [13:12:22] Epoch: 1 Batch: 11518/20099 (57.31%) Loss: 2.044088 LR: 0.00001906 [13:12:24] Epoch: 1 Batch: 11519/20099 (57.31%) Loss: 2.297429 LR: 0.00001906 [13:12:26] Epoch: 1 Batch: 11520/20099 (57.32%) Loss: 1.994377 LR: 0.00001906 [13:12:27] Epoch: 1 Batch: 11521/20099 (57.32%) Loss: 2.248152 LR: 0.00001906 [13:12:29] Epoch: 1 Batch: 11522/20099 (57.33%) Loss: 2.106401 LR: 0.00001906 [13:12:31] Epoch: 1 Batch: 11523/20099 (57.33%) Loss: 2.226057 LR: 0.00001906 [13:12:33] Epoch: 1 Batch: 11524/20099 (57.34%) Loss: 1.804335 LR: 0.00001906 [13:12:35] Epoch: 1 Batch: 11525/20099 (57.34%) Loss: 2.431558 LR: 0.00001905 [13:12:36] Epoch: 1 Batch: 11526/20099 (57.35%) Loss: 2.196276 LR: 0.00001905 [13:12:38] Epoch: 1 
Batch: 11527/20099 (57.35%) Loss: 2.252018 LR: 0.00001905 [13:12:40] Epoch: 1 Batch: 11528/20099 (57.36%) Loss: 2.260310 LR: 0.00001905 [13:12:42] Epoch: 1 Batch: 11529/20099 (57.36%) Loss: 2.215660 LR: 0.00001905 [13:12:43] Epoch: 1 Batch: 11530/20099 (57.37%) Loss: 2.292562 LR: 0.00001905 [13:12:45] Epoch: 1 Batch: 11531/20099 (57.37%) Loss: 2.105291 LR: 0.00001905 [13:12:47] Epoch: 1 Batch: 11532/20099 (57.38%) Loss: 2.257446 LR: 0.00001903 [13:12:49] Epoch: 1 Batch: 11533/20099 (57.38%) Loss: 2.469804 LR: 0.00001903 [13:12:50] Epoch: 1 Batch: 11534/20099 (57.39%) Loss: 2.128227 LR: 0.00001903 [13:12:52] Epoch: 1 Batch: 11535/20099 (57.39%) Loss: 2.174700 LR: 0.00001903 [13:12:54] Epoch: 1 Batch: 11536/20099 (57.40%) Loss: 1.709875 LR: 0.00001903 [13:12:56] Epoch: 1 Batch: 11537/20099 (57.40%) Loss: 2.152450 LR: 0.00001903 [13:12:58] Epoch: 1 Batch: 11538/20099 (57.41%) Loss: 2.329833 LR: 0.00001903 [13:12:59] Epoch: 1 Batch: 11539/20099 (57.41%) Loss: 2.260255 LR: 0.00001901 [13:13:01] Epoch: 1 Batch: 11540/20099 (57.42%) Loss: 1.657529 LR: 0.00001901 [13:13:03] Epoch: 1 Batch: 11541/20099 (57.42%) Loss: 1.797801 LR: 0.00001901 [13:13:05] Epoch: 1 Batch: 11542/20099 (57.43%) Loss: 1.814854 LR: 0.00001901 [13:13:06] Epoch: 1 Batch: 11543/20099 (57.43%) Loss: 1.984634 LR: 0.00001901 [13:13:08] Epoch: 1 Batch: 11544/20099 (57.44%) Loss: 2.216083 LR: 0.00001901 [13:13:10] Epoch: 1 Batch: 11545/20099 (57.44%) Loss: 2.156146 LR: 0.00001901 [13:13:12] Epoch: 1 Batch: 11546/20099 (57.45%) Loss: 2.170917 LR: 0.00001900 [13:13:13] Epoch: 1 Batch: 11547/20099 (57.45%) Loss: 1.994955 LR: 0.00001900 [13:13:15] Epoch: 1 Batch: 11548/20099 (57.46%) Loss: 2.226553 LR: 0.00001900 [13:13:17] Epoch: 1 Batch: 11549/20099 (57.46%) Loss: 2.305378 LR: 0.00001900 [13:13:19] Epoch: 1 Batch: 11550/20099 (57.47%) Loss: 1.807726 LR: 0.00001900 [13:13:20] Epoch: 1 Batch: 11551/20099 (57.47%) Loss: 2.128574 LR: 0.00001900 [13:13:22] Epoch: 1 Batch: 11552/20099 (57.48%) Loss: 2.223904 LR: 
0.00001900 [13:13:24] Epoch: 1 Batch: 11553/20099 (57.48%) Loss: 2.077194 LR: 0.00001898 [13:13:26] Epoch: 1 Batch: 11554/20099 (57.49%) Loss: 2.172306 LR: 0.00001898 [13:13:28] Epoch: 1 Batch: 11555/20099 (57.49%) Loss: 2.135675 LR: 0.00001898 [13:13:29] Epoch: 1 Batch: 11556/20099 (57.50%) Loss: 2.065298 LR: 0.00001898 [13:13:31] Epoch: 1 Batch: 11557/20099 (57.50%) Loss: 2.146181 LR: 0.00001898 [13:13:33] Epoch: 1 Batch: 11558/20099 (57.51%) Loss: 1.915734 LR: 0.00001898 [13:13:35] Epoch: 1 Batch: 11559/20099 (57.51%) Loss: 2.116762 LR: 0.00001898 [13:13:36] Epoch: 1 Batch: 11560/20099 (57.52%) Loss: 2.252793 LR: 0.00001897 [13:13:38] Epoch: 1 Batch: 11561/20099 (57.52%) Loss: 1.907568 LR: 0.00001897 [13:13:40] Epoch: 1 Batch: 11562/20099 (57.53%) Loss: 2.268930 LR: 0.00001897 [13:13:42] Epoch: 1 Batch: 11563/20099 (57.53%) Loss: 2.167648 LR: 0.00001897 [13:13:43] Epoch: 1 Batch: 11564/20099 (57.54%) Loss: 1.898059 LR: 0.00001897 [13:13:45] Epoch: 1 Batch: 11565/20099 (57.54%) Loss: 2.067606 LR: 0.00001897 [13:13:47] Epoch: 1 Batch: 11566/20099 (57.55%) Loss: 2.174708 LR: 0.00001897 [13:13:49] Epoch: 1 Batch: 11567/20099 (57.55%) Loss: 2.463333 LR: 0.00001895 [13:13:51] Epoch: 1 Batch: 11568/20099 (57.56%) Loss: 2.193546 LR: 0.00001895 [13:13:52] Epoch: 1 Batch: 11569/20099 (57.56%) Loss: 2.292379 LR: 0.00001895 [13:13:54] Epoch: 1 Batch: 11570/20099 (57.57%) Loss: 2.283432 LR: 0.00001895 [13:13:56] Epoch: 1 Batch: 11571/20099 (57.57%) Loss: 2.025387 LR: 0.00001895 [13:13:58] Epoch: 1 Batch: 11572/20099 (57.58%) Loss: 2.224511 LR: 0.00001895 [13:13:59] Epoch: 1 Batch: 11573/20099 (57.58%) Loss: 1.927526 LR: 0.00001895 [13:14:01] Epoch: 1 Batch: 11574/20099 (57.58%) Loss: 2.233892 LR: 0.00001893 [13:14:03] Epoch: 1 Batch: 11575/20099 (57.59%) Loss: 2.209314 LR: 0.00001893 [13:14:05] Epoch: 1 Batch: 11576/20099 (57.59%) Loss: 2.044917 LR: 0.00001893 [13:14:06] Epoch: 1 Batch: 11577/20099 (57.60%) Loss: 2.163279 LR: 0.00001893 [13:14:08] Epoch: 1 Batch: 11578/20099 
(57.60%) Loss: 2.140867 LR: 0.00001893 [13:14:10] Epoch: 1 Batch: 11579/20099 (57.61%) Loss: 1.925324 LR: 0.00001893 [13:14:12] Epoch: 1 Batch: 11580/20099 (57.61%) Loss: 2.232747 LR: 0.00001893 [13:14:14] Epoch: 1 Batch: 11581/20099 (57.62%) Loss: 2.407112 LR: 0.00001892 [13:14:15] Epoch: 1 Batch: 11582/20099 (57.62%) Loss: 2.089918 LR: 0.00001892 [13:14:17] Epoch: 1 Batch: 11583/20099 (57.63%) Loss: 2.156255 LR: 0.00001892 [13:14:19] Epoch: 1 Batch: 11584/20099 (57.63%) Loss: 2.085344 LR: 0.00001892 [13:14:21] Epoch: 1 Batch: 11585/20099 (57.64%) Loss: 2.082570 LR: 0.00001892 [13:14:22] Epoch: 1 Batch: 11586/20099 (57.64%) Loss: 1.584989 LR: 0.00001892 [13:14:24] Epoch: 1 Batch: 11587/20099 (57.65%) Loss: 2.285981 LR: 0.00001892 [13:14:26] Epoch: 1 Batch: 11588/20099 (57.65%) Loss: 2.106760 LR: 0.00001890 [13:14:28] Epoch: 1 Batch: 11589/20099 (57.66%) Loss: 2.158588 LR: 0.00001890 [13:14:29] Epoch: 1 Batch: 11590/20099 (57.66%) Loss: 2.029811 LR: 0.00001890 [13:14:31] Epoch: 1 Batch: 11591/20099 (57.67%) Loss: 2.087928 LR: 0.00001890 [13:14:33] Epoch: 1 Batch: 11592/20099 (57.67%) Loss: 2.069029 LR: 0.00001890 [13:14:35] Epoch: 1 Batch: 11593/20099 (57.68%) Loss: 1.928134 LR: 0.00001890 [13:14:37] Epoch: 1 Batch: 11594/20099 (57.68%) Loss: 2.050188 LR: 0.00001890 [13:14:38] Epoch: 1 Batch: 11595/20099 (57.69%) Loss: 2.157734 LR: 0.00001888 [13:14:40] Epoch: 1 Batch: 11596/20099 (57.69%) Loss: 2.152465 LR: 0.00001888 [13:14:42] Epoch: 1 Batch: 11597/20099 (57.70%) Loss: 2.212037 LR: 0.00001888 [13:14:44] Epoch: 1 Batch: 11598/20099 (57.70%) Loss: 2.062583 LR: 0.00001888 [13:14:45] Epoch: 1 Batch: 11599/20099 (57.71%) Loss: 2.122757 LR: 0.00001888 [13:14:51] >> Cleaned up old temp checkpoint: epoch1_step9600 [13:14:51] >> Temp checkpoint saved: epoch1_step11600, size: 0.1693 GB [13:14:51] Epoch: 1 Batch: 11600/20099 (57.71%) Loss: 2.018181 LR: 0.00001888 [13:14:53] Epoch: 1 Batch: 11601/20099 (57.72%) Loss: 2.156212 LR: 0.00001888 [13:14:54] Epoch: 1 Batch: 
11602/20099 (57.72%) Loss: 2.090179 LR: 0.00001887 [13:14:56] Epoch: 1 Batch: 11603/20099 (57.73%) Loss: 1.807906 LR: 0.00001887 [13:14:58] Epoch: 1 Batch: 11604/20099 (57.73%) Loss: 2.004309 LR: 0.00001887 [13:15:00] Epoch: 1 Batch: 11605/20099 (57.74%) Loss: 2.192232 LR: 0.00001887 [13:15:01] Epoch: 1 Batch: 11606/20099 (57.74%) Loss: 1.983692 LR: 0.00001887 [13:15:03] Epoch: 1 Batch: 11607/20099 (57.75%) Loss: 1.958194 LR: 0.00001887 [13:15:05] Epoch: 1 Batch: 11608/20099 (57.75%) Loss: 1.943734 LR: 0.00001887 [13:15:07] Epoch: 1 Batch: 11609/20099 (57.76%) Loss: 1.857016 LR: 0.00001885 [13:15:08] Epoch: 1 Batch: 11610/20099 (57.76%) Loss: 2.328492 LR: 0.00001885 [13:15:10] Epoch: 1 Batch: 11611/20099 (57.77%) Loss: 2.152304 LR: 0.00001885 [13:15:12] Epoch: 1 Batch: 11612/20099 (57.77%) Loss: 2.180845 LR: 0.00001885 [13:15:14] Epoch: 1 Batch: 11613/20099 (57.78%) Loss: 2.161557 LR: 0.00001885 [13:15:16] Epoch: 1 Batch: 11614/20099 (57.78%) Loss: 2.205563 LR: 0.00001885 [13:15:17] Epoch: 1 Batch: 11615/20099 (57.79%) Loss: 2.190918 LR: 0.00001885 [13:15:19] Epoch: 1 Batch: 11616/20099 (57.79%) Loss: 1.940868 LR: 0.00001884 [13:15:21] Epoch: 1 Batch: 11617/20099 (57.80%) Loss: 1.854971 LR: 0.00001884 [13:15:23] Epoch: 1 Batch: 11618/20099 (57.80%) Loss: 2.152514 LR: 0.00001884 [13:15:25] Epoch: 1 Batch: 11619/20099 (57.81%) Loss: 1.921189 LR: 0.00001884 [13:15:26] Epoch: 1 Batch: 11620/20099 (57.81%) Loss: 2.131121 LR: 0.00001884 [13:15:28] Epoch: 1 Batch: 11621/20099 (57.82%) Loss: 2.126243 LR: 0.00001884 [13:15:30] Epoch: 1 Batch: 11622/20099 (57.82%) Loss: 1.669983 LR: 0.00001884 [13:15:32] Epoch: 1 Batch: 11623/20099 (57.83%) Loss: 2.320068 LR: 0.00001882 [13:15:34] Epoch: 1 Batch: 11624/20099 (57.83%) Loss: 1.949655 LR: 0.00001882 [13:15:35] Epoch: 1 Batch: 11625/20099 (57.84%) Loss: 2.391958 LR: 0.00001882 [13:15:37] Epoch: 1 Batch: 11626/20099 (57.84%) Loss: 1.898051 LR: 0.00001882 [13:15:39] Epoch: 1 Batch: 11627/20099 (57.85%) Loss: 2.187440 LR: 
0.00001882 [13:15:41] Epoch: 1 Batch: 11628/20099 (57.85%) Loss: 2.429462 LR: 0.00001882 [13:15:42] Epoch: 1 Batch: 11629/20099 (57.86%) Loss: 1.917211 LR: 0.00001882 [13:15:44] Epoch: 1 Batch: 11630/20099 (57.86%) Loss: 2.011681 LR: 0.00001880 [13:15:46] Epoch: 1 Batch: 11631/20099 (57.87%) Loss: 2.390054 LR: 0.00001880 [13:15:48] Epoch: 1 Batch: 11632/20099 (57.87%) Loss: 1.939198 LR: 0.00001880 [13:15:50] Epoch: 1 Batch: 11633/20099 (57.88%) Loss: 1.925243 LR: 0.00001880 [13:15:51] Epoch: 1 Batch: 11634/20099 (57.88%) Loss: 1.961159 LR: 0.00001880 [13:15:53] Epoch: 1 Batch: 11635/20099 (57.89%) Loss: 2.227958 LR: 0.00001880 [13:15:55] Epoch: 1 Batch: 11636/20099 (57.89%) Loss: 2.033913 LR: 0.00001880 [13:15:57] Epoch: 1 Batch: 11637/20099 (57.90%) Loss: 2.320921 LR: 0.00001879 [13:15:58] Epoch: 1 Batch: 11638/20099 (57.90%) Loss: 2.222155 LR: 0.00001879 [13:16:00] Epoch: 1 Batch: 11639/20099 (57.91%) Loss: 2.124020 LR: 0.00001879 [13:16:02] Epoch: 1 Batch: 11640/20099 (57.91%) Loss: 1.926155 LR: 0.00001879 [13:16:04] Epoch: 1 Batch: 11641/20099 (57.92%) Loss: 1.899636 LR: 0.00001879 [13:16:05] Epoch: 1 Batch: 11642/20099 (57.92%) Loss: 2.078834 LR: 0.00001879 [13:16:07] Epoch: 1 Batch: 11643/20099 (57.93%) Loss: 2.304538 LR: 0.00001879 [13:16:09] Epoch: 1 Batch: 11644/20099 (57.93%) Loss: 2.042716 LR: 0.00001877 [13:16:11] Epoch: 1 Batch: 11645/20099 (57.94%) Loss: 2.162959 LR: 0.00001877 [13:16:12] Epoch: 1 Batch: 11646/20099 (57.94%) Loss: 2.268860 LR: 0.00001877 [13:16:14] Epoch: 1 Batch: 11647/20099 (57.95%) Loss: 2.438877 LR: 0.00001877 [13:16:16] Epoch: 1 Batch: 11648/20099 (57.95%) Loss: 2.039470 LR: 0.00001877 [13:16:18] Epoch: 1 Batch: 11649/20099 (57.96%) Loss: 2.131665 LR: 0.00001877 [13:16:20] Epoch: 1 Batch: 11650/20099 (57.96%) Loss: 1.800963 LR: 0.00001877 [13:16:21] Epoch: 1 Batch: 11651/20099 (57.97%) Loss: 1.808833 LR: 0.00001875 [13:16:23] Epoch: 1 Batch: 11652/20099 (57.97%) Loss: 2.138390 LR: 0.00001875 [13:16:25] Epoch: 1 Batch: 11653/20099 
(57.98%) Loss: 2.073210 LR: 0.00001875 [13:16:27] Epoch: 1 Batch: 11654/20099 (57.98%) Loss: 1.930101 LR: 0.00001875 [13:16:28] Epoch: 1 Batch: 11655/20099 (57.99%) Loss: 2.105002 LR: 0.00001875 [13:16:30] Epoch: 1 Batch: 11656/20099 (57.99%) Loss: 1.822292 LR: 0.00001875 [13:16:32] Epoch: 1 Batch: 11657/20099 (58.00%) Loss: 1.911726 LR: 0.00001875 [13:16:34] Epoch: 1 Batch: 11658/20099 (58.00%) Loss: 2.213386 LR: 0.00001874 [13:16:35] Epoch: 1 Batch: 11659/20099 (58.01%) Loss: 2.286058 LR: 0.00001874 [13:16:37] Epoch: 1 Batch: 11660/20099 (58.01%) Loss: 2.118092 LR: 0.00001874 [13:16:39] Epoch: 1 Batch: 11661/20099 (58.02%) Loss: 1.874805 LR: 0.00001874 [13:16:41] Epoch: 1 Batch: 11662/20099 (58.02%) Loss: 1.997573 LR: 0.00001874 [13:16:43] Epoch: 1 Batch: 11663/20099 (58.03%) Loss: 1.860676 LR: 0.00001874 [13:16:44] Epoch: 1 Batch: 11664/20099 (58.03%) Loss: 2.221928 LR: 0.00001874 [13:16:46] Epoch: 1 Batch: 11665/20099 (58.04%) Loss: 2.073497 LR: 0.00001872 [13:16:48] Epoch: 1 Batch: 11666/20099 (58.04%) Loss: 1.703162 LR: 0.00001872 [13:16:50] Epoch: 1 Batch: 11667/20099 (58.05%) Loss: 2.015299 LR: 0.00001872 [13:16:51] Epoch: 1 Batch: 11668/20099 (58.05%) Loss: 2.110135 LR: 0.00001872 [13:16:53] Epoch: 1 Batch: 11669/20099 (58.06%) Loss: 2.044107 LR: 0.00001872 [13:16:55] Epoch: 1 Batch: 11670/20099 (58.06%) Loss: 2.275591 LR: 0.00001872 [13:16:57] Epoch: 1 Batch: 11671/20099 (58.07%) Loss: 2.307717 LR: 0.00001872 [13:16:58] Epoch: 1 Batch: 11672/20099 (58.07%) Loss: 1.963907 LR: 0.00001871 [13:17:00] Epoch: 1 Batch: 11673/20099 (58.08%) Loss: 2.024804 LR: 0.00001871 [13:17:02] Epoch: 1 Batch: 11674/20099 (58.08%) Loss: 2.347996 LR: 0.00001871 [13:17:04] Epoch: 1 Batch: 11675/20099 (58.09%) Loss: 2.136356 LR: 0.00001871 [13:17:06] Epoch: 1 Batch: 11676/20099 (58.09%) Loss: 2.091377 LR: 0.00001871 [13:17:07] Epoch: 1 Batch: 11677/20099 (58.10%) Loss: 2.078589 LR: 0.00001871 [13:17:09] Epoch: 1 Batch: 11678/20099 (58.10%) Loss: 2.266065 LR: 0.00001871 [13:17:11] 
Epoch: 1 Batch: 11679/20099 (58.11%) Loss: 1.983041 LR: 0.00001869 [13:17:13] Epoch: 1 Batch: 11680/20099 (58.11%) Loss: 1.999972 LR: 0.00001869 [13:17:14] Epoch: 1 Batch: 11681/20099 (58.12%) Loss: 2.151552 LR: 0.00001869 [13:17:16] Epoch: 1 Batch: 11682/20099 (58.12%) Loss: 2.087489 LR: 0.00001869 [13:17:18] Epoch: 1 Batch: 11683/20099 (58.13%) Loss: 1.714874 LR: 0.00001869 [13:17:20] Epoch: 1 Batch: 11684/20099 (58.13%) Loss: 1.909625 LR: 0.00001869 [13:17:22] Epoch: 1 Batch: 11685/20099 (58.14%) Loss: 2.236058 LR: 0.00001869 [13:17:23] Epoch: 1 Batch: 11686/20099 (58.14%) Loss: 1.885413 LR: 0.00001867 [13:17:25] Epoch: 1 Batch: 11687/20099 (58.15%) Loss: 2.283195 LR: 0.00001867 [13:17:27] Epoch: 1 Batch: 11688/20099 (58.15%) Loss: 1.883298 LR: 0.00001867 [13:17:29] Epoch: 1 Batch: 11689/20099 (58.16%) Loss: 2.055175 LR: 0.00001867 [13:17:31] Epoch: 1 Batch: 11690/20099 (58.16%) Loss: 1.967116 LR: 0.00001867 [13:17:32] Epoch: 1 Batch: 11691/20099 (58.17%) Loss: 1.977898 LR: 0.00001867 [13:17:34] Epoch: 1 Batch: 11692/20099 (58.17%) Loss: 1.840670 LR: 0.00001867 [13:17:36] Epoch: 1 Batch: 11693/20099 (58.18%) Loss: 2.174229 LR: 0.00001866 [13:17:38] Epoch: 1 Batch: 11694/20099 (58.18%) Loss: 2.390308 LR: 0.00001866 [13:17:39] Epoch: 1 Batch: 11695/20099 (58.19%) Loss: 1.990697 LR: 0.00001866 [13:17:41] Epoch: 1 Batch: 11696/20099 (58.19%) Loss: 2.169670 LR: 0.00001866 [13:17:43] Epoch: 1 Batch: 11697/20099 (58.20%) Loss: 2.138491 LR: 0.00001866 [13:17:45] Epoch: 1 Batch: 11698/20099 (58.20%) Loss: 2.178364 LR: 0.00001866 [13:17:46] Epoch: 1 Batch: 11699/20099 (58.21%) Loss: 1.960036 LR: 0.00001866 [13:17:48] Epoch: 1 Batch: 11700/20099 (58.21%) Loss: 2.222705 LR: 0.00001864 [13:17:50] Epoch: 1 Batch: 11701/20099 (58.22%) Loss: 2.192986 LR: 0.00001864 [13:17:52] Epoch: 1 Batch: 11702/20099 (58.22%) Loss: 2.476934 LR: 0.00001864 [13:17:54] Epoch: 1 Batch: 11703/20099 (58.23%) Loss: 2.118600 LR: 0.00001864 [13:17:55] Epoch: 1 Batch: 11704/20099 (58.23%) Loss: 
2.107777 LR: 0.00001864 [13:17:57] Epoch: 1 Batch: 11705/20099 (58.24%) Loss: 1.888525 LR: 0.00001864 [13:17:59] Epoch: 1 Batch: 11706/20099 (58.24%) Loss: 2.284290 LR: 0.00001864 [13:18:01] Epoch: 1 Batch: 11707/20099 (58.25%) Loss: 1.869882 LR: 0.00001863 [13:18:02] Epoch: 1 Batch: 11708/20099 (58.25%) Loss: 2.150631 LR: 0.00001863 [13:18:04] Epoch: 1 Batch: 11709/20099 (58.26%) Loss: 2.212356 LR: 0.00001863 [13:18:06] Epoch: 1 Batch: 11710/20099 (58.26%) Loss: 2.236769 LR: 0.00001863 [13:18:08] Epoch: 1 Batch: 11711/20099 (58.27%) Loss: 1.888924 LR: 0.00001863 [13:18:10] Epoch: 1 Batch: 11712/20099 (58.27%) Loss: 2.198228 LR: 0.00001863 [13:18:11] Epoch: 1 Batch: 11713/20099 (58.28%) Loss: 2.288591 LR: 0.00001863 [13:18:13] Epoch: 1 Batch: 11714/20099 (58.28%) Loss: 2.357502 LR: 0.00001861 [13:18:15] Epoch: 1 Batch: 11715/20099 (58.29%) Loss: 2.147627 LR: 0.00001861 [13:18:17] Epoch: 1 Batch: 11716/20099 (58.29%) Loss: 2.183411 LR: 0.00001861 [13:18:18] Epoch: 1 Batch: 11717/20099 (58.30%) Loss: 2.368242 LR: 0.00001861 [13:18:20] Epoch: 1 Batch: 11718/20099 (58.30%) Loss: 2.001774 LR: 0.00001861 [13:18:22] Epoch: 1 Batch: 11719/20099 (58.31%) Loss: 2.183414 LR: 0.00001861 [13:18:24] Epoch: 1 Batch: 11720/20099 (58.31%) Loss: 2.261663 LR: 0.00001861 [13:18:26] Epoch: 1 Batch: 11721/20099 (58.32%) Loss: 2.404812 LR: 0.00001859 [13:18:27] Epoch: 1 Batch: 11722/20099 (58.32%) Loss: 1.867872 LR: 0.00001859 [13:18:29] Epoch: 1 Batch: 11723/20099 (58.33%) Loss: 2.090527 LR: 0.00001859 [13:18:31] Epoch: 1 Batch: 11724/20099 (58.33%) Loss: 2.006189 LR: 0.00001859 [13:18:33] Epoch: 1 Batch: 11725/20099 (58.34%) Loss: 2.014920 LR: 0.00001859 [13:18:34] Epoch: 1 Batch: 11726/20099 (58.34%) Loss: 2.088670 LR: 0.00001859 [13:18:36] Epoch: 1 Batch: 11727/20099 (58.35%) Loss: 2.055283 LR: 0.00001859 [13:18:38] Epoch: 1 Batch: 11728/20099 (58.35%) Loss: 2.098110 LR: 0.00001858 [13:18:40] Epoch: 1 Batch: 11729/20099 (58.36%) Loss: 1.762565 LR: 0.00001858 [13:18:41] Epoch: 1 
Batch: 11730/20099 (58.36%) Loss: 1.891574 LR: 0.00001858 [13:18:43] Epoch: 1 Batch: 11731/20099 (58.37%) Loss: 2.306237 LR: 0.00001858 [13:18:45] Epoch: 1 Batch: 11732/20099 (58.37%) Loss: 2.124381 LR: 0.00001858 [13:18:47] Epoch: 1 Batch: 11733/20099 (58.38%) Loss: 1.964939 LR: 0.00001858 [13:18:49] Epoch: 1 Batch: 11734/20099 (58.38%) Loss: 1.894481 LR: 0.00001858 [13:18:50] Epoch: 1 Batch: 11735/20099 (58.39%) Loss: 2.205078 LR: 0.00001856 [13:18:52] Epoch: 1 Batch: 11736/20099 (58.39%) Loss: 1.677717 LR: 0.00001856 [13:18:54] Epoch: 1 Batch: 11737/20099 (58.40%) Loss: 2.454359 LR: 0.00001856 [13:18:56] Epoch: 1 Batch: 11738/20099 (58.40%) Loss: 2.226118 LR: 0.00001856 [13:18:57] Epoch: 1 Batch: 11739/20099 (58.41%) Loss: 2.097765 LR: 0.00001856 [13:18:59] Epoch: 1 Batch: 11740/20099 (58.41%) Loss: 1.938450 LR: 0.00001856 [13:19:01] Epoch: 1 Batch: 11741/20099 (58.42%) Loss: 1.986032 LR: 0.00001856 [13:19:03] Epoch: 1 Batch: 11742/20099 (58.42%) Loss: 2.220109 LR: 0.00001854 [13:19:05] Epoch: 1 Batch: 11743/20099 (58.43%) Loss: 1.878649 LR: 0.00001854 [13:19:06] Epoch: 1 Batch: 11744/20099 (58.43%) Loss: 1.951298 LR: 0.00001854 [13:19:08] Epoch: 1 Batch: 11745/20099 (58.44%) Loss: 1.909854 LR: 0.00001854 [13:19:10] Epoch: 1 Batch: 11746/20099 (58.44%) Loss: 2.061608 LR: 0.00001854 [13:19:12] Epoch: 1 Batch: 11747/20099 (58.45%) Loss: 2.377390 LR: 0.00001854 [13:19:13] Epoch: 1 Batch: 11748/20099 (58.45%) Loss: 2.206749 LR: 0.00001854 [13:19:15] Epoch: 1 Batch: 11749/20099 (58.46%) Loss: 2.091898 LR: 0.00001853 [13:19:17] Epoch: 1 Batch: 11750/20099 (58.46%) Loss: 2.100125 LR: 0.00001853 [13:19:19] Epoch: 1 Batch: 11751/20099 (58.47%) Loss: 1.944708 LR: 0.00001853 [13:19:20] Epoch: 1 Batch: 11752/20099 (58.47%) Loss: 1.943774 LR: 0.00001853 [13:19:22] Epoch: 1 Batch: 11753/20099 (58.48%) Loss: 2.211108 LR: 0.00001853 [13:19:24] Epoch: 1 Batch: 11754/20099 (58.48%) Loss: 2.144678 LR: 0.00001853 [13:19:26] Epoch: 1 Batch: 11755/20099 (58.49%) Loss: 2.209696 LR: 
0.00001853 [13:19:28] Epoch: 1 Batch: 11756/20099 (58.49%) Loss: 2.116935 LR: 0.00001851 [13:19:29] Epoch: 1 Batch: 11757/20099 (58.50%) Loss: 1.694196 LR: 0.00001851 [13:19:31] Epoch: 1 Batch: 11758/20099 (58.50%) Loss: 2.308311 LR: 0.00001851 [13:19:33] Epoch: 1 Batch: 11759/20099 (58.51%) Loss: 2.216196 LR: 0.00001851 [13:19:35] Epoch: 1 Batch: 11760/20099 (58.51%) Loss: 2.125787 LR: 0.00001851 [13:19:36] Epoch: 1 Batch: 11761/20099 (58.52%) Loss: 2.083488 LR: 0.00001851 [13:19:38] Epoch: 1 Batch: 11762/20099 (58.52%) Loss: 2.249752 LR: 0.00001851 [13:19:40] Epoch: 1 Batch: 11763/20099 (58.53%) Loss: 2.244584 LR: 0.00001850 [13:19:42] Epoch: 1 Batch: 11764/20099 (58.53%) Loss: 2.289947 LR: 0.00001850 [13:19:44] Epoch: 1 Batch: 11765/20099 (58.54%) Loss: 2.301536 LR: 0.00001850 [13:40:38] 2025-08-23 [13:40:38] Tesla T4 [13:40:38] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Active memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Requested memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | GPU reserved memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Allocations | 0 | 0 | 0 | 0 | 
|---------------------------------------------------------------------------| | Active allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | GPU reserved segments | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [13:40:38] CPU usage: 38.0%, RAM usage: 26.7% [13:40:38] Running with the following configuration: [13:40:38] model_name: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B [13:40:38] tokenizer: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B [13:40:38] output_dir: /content/drive/MyDrive/llm/Discord-Hermes-3-8B [13:40:38] train_path: /content/drive/MyDrive/data/None156_fix.csv [13:40:38] checkpoint: /content/drive/MyDrive/llm/Discord-Hermes-3-8B/temp/epoch1_step11600 [13:40:38] lr: 3e-05 [13:40:38] lr_floor: 6e-06 [13:40:38] epochs: 1 [13:40:38] batch_size: 5 [13:40:38] accum_steps: 7 [13:40:38] val_batch_size: 6 [13:40:38] max_val_size: 100 [13:40:38] max_length: 150 [13:40:38] save_temp_frequency: 200 [13:40:38] save_frequency: 500 [13:40:38] eval_frequency: 500 [13:40:38] save_pattern: y [13:40:38] quantization: y [13:40:38] quantization_bits: 4 [13:40:38] lora: y [13:40:38] frozen_lora_path: None [13:40:38] lora_rank: 16 [13:40:38] lora_alpha: 32 [13:40:38] lora_dropout: 0.1 [13:40:38] optimizer_weight_decay: 0.0 [13:40:38] warmup_type: cosine [13:40:38] warmup_ratio: 0.08 [13:40:38] warmup_steps: 550 [13:40:38] shuffle: y [13:40:38] csv_column: text [13:40:38] new_run: n [13:40:38] label_smoothing: 0.05 [13:40:38] SEED: 1 [13:40:38] Using device: cuda [13:40:39] Resuming from 
temp checkpoint: /content/drive/MyDrive/llm/Discord-Hermes-3-8B/temp/epoch1_step11600 [13:46:13] Embeddings shape after: torch.Size([128256, 4096]) [13:46:19] Loaded trainable LoRA adapter from /content/drive/MyDrive/llm/Discord-Hermes-3-8B/temp/epoch1_step11600 [13:46:19] Trainable LoRA 'default': [13:46:19] task_type: CAUSAL_LM [13:46:19] peft_type: PeftType.LORA [13:46:19] auto_mapping: None [13:46:19] base_model_name_or_path: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B [13:46:19] revision: None [13:46:19] inference_mode: False [13:46:19] r: 16 [13:46:19] target_modules: {'q_proj', 'k_proj', 'o_proj', 'v_proj'} [13:46:19] exclude_modules: None [13:46:19] lora_alpha: 32 [13:46:19] lora_dropout: 0.1 [13:46:19] fan_in_fan_out: False [13:46:19] bias: none [13:46:19] use_rslora: True [13:46:19] modules_to_save: None [13:46:19] init_lora_weights: True [13:46:19] layers_to_transform: None [13:46:19] layers_pattern: None [13:46:19] rank_pattern: {} [13:46:19] alpha_pattern: {} [13:46:19] megatron_config: None [13:46:19] megatron_core: megatron.core [13:46:19] trainable_token_indices: None [13:46:19] loftq_config: {} [13:46:19] eva_config: None [13:46:19] corda_config: None [13:46:19] use_dora: False [13:46:19] use_qalora: False [13:46:19] qalora_group_size: 16 [13:46:20] layer_replication: None [13:46:20] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) [13:46:20] lora_bias: False [13:46:20] target_parameters: None [13:46:20] _custom_modules: None [13:46:20] Embeddings shape after: torch.Size([128256, 4096]) [13:46:27] Resumed from epoch 1, step 11601, file 1 [13:46:27] Starting from CSV file... [13:46:30] Splitting data into chunks of 11000... [13:46:30] Using 7 processes across 10 chunks [13:46:31] Using saved train/val split from checkpoint. 
[13:46:31] Resuming scheduler with warmup steps: 229, total steps: 2871 [13:46:31] Initializing scheduler with cosine schedule with warmup, warmup steps 550, total steps: 2871 [13:46:31] Train/Val split: 100492 train, 100 val samples. [13:46:40] Model: PeftModelForCausalLM [13:46:40] Model config: LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": 128040, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pretraining_tp": 1, "quantization_config": { "_load_in_4bit": true, "_load_in_8bit": false, "bnb_4bit_compute_dtype": "float16", "bnb_4bit_quant_storage": "uint8", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "llm_int8_enable_fp32_cpu_offload": false, "llm_int8_has_fp16_weight": false, "llm_int8_skip_modules": [ "lm_head" ], "llm_int8_threshold": 6.0, "load_in_4bit": true, "load_in_8bit": false, "quant_method": "bitsandbytes" }, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.55.2", "use_cache": true, "vocab_size": 128256 } [13:46:40] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 [13:46:40] Optimizer: PagedAdamW ( Parameter Group 0 alpha: 0.0 betas: (0.9, 0.95) eps: 1e-08 initial_lr: 3e-05 lr: 0.0 t_alpha: None t_beta3: None weight_decay: 0.0 ) [13:46:40] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 [13:46:40] Scheduler: [13:46:40] Training on 100492 training samples, 100 validation samples [13:46:40] Average tokens per sample: 150.00 [13:46:40] Estimated epoch time: ~296.87 min 
[13:46:40] |===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |   5986 MiB |   7004 MiB | 335397 MiB | 329410 MiB |
|---------------------------------------------------------------------------|
| Active memory         |   5986 MiB |   7004 MiB | 335397 MiB | 329410 MiB |
|---------------------------------------------------------------------------|
| Requested memory      |   5983 MiB |   7000 MiB | 335022 MiB | 329039 MiB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |   7248 MiB |   7248 MiB |   7248 MiB |        0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory |   1261 MiB |   5879 MiB | 328754 MiB | 327493 MiB |
|---------------------------------------------------------------------------|
| Allocations           |       2762 |       2840 |      33883 |      31121 |
|---------------------------------------------------------------------------|
| Active allocs         |       2762 |       2840 |      33883 |      31121 |
|---------------------------------------------------------------------------|
| GPU reserved segments |        185 |        185 |        185 |          0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs |         36 |         36 |      13826 |      13790 |
|---------------------------------------------------------------------------|
| Oversize allocations  |          0 |          0 |          0 |          0 |
|---------------------------------------------------------------------------|
| Oversize GPU segments |          0 |          0 |          0 |          0 |
|===========================================================================|
[13:46:40] Restoring shuffle indices from training
state for epoch 1 [13:46:40] CPU usage: 44.7%, RAM usage: 37.4% [13:46:41] Epoch 1 learning rate: 0.0 [13:46:41] Starting epoch 1 [13:47:20] Batch 11601: input_ids shape torch.Size([5, 150]), attention_mask shape torch.Size([5, 150]) [13:47:21] Epoch: 1 Batch: 11601/20099 (57.72%) Loss: 2.155287 LR: 0.00000000 [13:47:23] Epoch: 1 Batch: 11602/20099 (57.72%) Loss: 2.090240 LR: 0.00000000 [13:47:24] Epoch: 1 Batch: 11603/20099 (57.73%) Loss: 1.809998 LR: 0.00000000 [13:47:26] Epoch: 1 Batch: 11604/20099 (57.73%) Loss: 2.006388 LR: 0.00000000 [13:47:28] Epoch: 1 Batch: 11605/20099 (57.74%) Loss: 2.192882 LR: 0.00000000 [13:47:29] Epoch: 1 Batch: 11606/20099 (57.74%) Loss: 1.982967 LR: 0.00000000 [13:47:31] Epoch: 1 Batch: 11607/20099 (57.75%) Loss: 1.957612 LR: 0.00001887 [13:47:33] Epoch: 1 Batch: 11608/20099 (57.75%) Loss: 1.946005 LR: 0.00001887 [13:47:34] Epoch: 1 Batch: 11609/20099 (57.76%) Loss: 1.855440 LR: 0.00001887 [13:47:36] Epoch: 1 Batch: 11610/20099 (57.76%) Loss: 2.329100 LR: 0.00001887 [13:47:37] Epoch: 1 Batch: 11611/20099 (57.77%) Loss: 2.153529 LR: 0.00001887 [13:47:39] Epoch: 1 Batch: 11612/20099 (57.77%) Loss: 2.180310 LR: 0.00001887 [13:47:41] Epoch: 1 Batch: 11613/20099 (57.78%) Loss: 2.159098 LR: 0.00001887 [13:47:42] Epoch: 1 Batch: 11614/20099 (57.78%) Loss: 2.207454 LR: 0.00001885 [13:47:44] Epoch: 1 Batch: 11615/20099 (57.79%) Loss: 2.190587 LR: 0.00001885 [13:47:46] Epoch: 1 Batch: 11616/20099 (57.79%) Loss: 1.937835 LR: 0.00001885 [13:47:47] Epoch: 1 Batch: 11617/20099 (57.80%) Loss: 1.851808 LR: 0.00001885 [13:47:49] Epoch: 1 Batch: 11618/20099 (57.80%) Loss: 2.149576 LR: 0.00001885 [13:47:51] Epoch: 1 Batch: 11619/20099 (57.81%) Loss: 1.917542 LR: 0.00001885 [13:47:52] Epoch: 1 Batch: 11620/20099 (57.81%) Loss: 2.133489 LR: 0.00001885 [13:47:54] Epoch: 1 Batch: 11621/20099 (57.82%) Loss: 2.123605 LR: 0.00001884 [13:47:56] Epoch: 1 Batch: 11622/20099 (57.82%) Loss: 1.669149 LR: 0.00001884 [13:47:57] Epoch: 1 Batch: 11623/20099 (57.83%) 
Loss: 2.318624 LR: 0.00001884 [13:47:59] Epoch: 1 Batch: 11624/20099 (57.83%) Loss: 1.944835 LR: 0.00001884 [13:48:01] Epoch: 1 Batch: 11625/20099 (57.84%) Loss: 2.391906 LR: 0.00001884 [13:48:02] Epoch: 1 Batch: 11626/20099 (57.84%) Loss: 1.898487 LR: 0.00001884 [13:48:04] Epoch: 1 Batch: 11627/20099 (57.85%) Loss: 2.187089 LR: 0.00001884 [13:48:06] Epoch: 1 Batch: 11628/20099 (57.85%) Loss: 2.429490 LR: 0.00001882 [13:48:07] Epoch: 1 Batch: 11629/20099 (57.86%) Loss: 1.914874 LR: 0.00001882 [13:48:09] Epoch: 1 Batch: 11630/20099 (57.86%) Loss: 2.009294 LR: 0.00001882 [13:48:11] Epoch: 1 Batch: 11631/20099 (57.87%) Loss: 2.392295 LR: 0.00001882 [13:48:12] Epoch: 1 Batch: 11632/20099 (57.87%) Loss: 1.945846 LR: 0.00001882 [13:48:14] Epoch: 1 Batch: 11633/20099 (57.88%) Loss: 1.926804 LR: 0.00001882 [13:48:16] Epoch: 1 Batch: 11634/20099 (57.88%) Loss: 1.954943 LR: 0.00001882 [13:48:17] Epoch: 1 Batch: 11635/20099 (57.89%) Loss: 2.227548 LR: 0.00001880 [13:48:19] Epoch: 1 Batch: 11636/20099 (57.89%) Loss: 2.034768 LR: 0.00001880 [13:48:21] Epoch: 1 Batch: 11637/20099 (57.90%) Loss: 2.322190 LR: 0.00001880 [13:48:22] Epoch: 1 Batch: 11638/20099 (57.90%) Loss: 2.219972 LR: 0.00001880 [13:48:24] Epoch: 1 Batch: 11639/20099 (57.91%) Loss: 2.122686 LR: 0.00001880 [13:48:26] Epoch: 1 Batch: 11640/20099 (57.91%) Loss: 1.926885 LR: 0.00001880 [13:48:27] Epoch: 1 Batch: 11641/20099 (57.92%) Loss: 1.897896 LR: 0.00001880 [13:48:29] Epoch: 1 Batch: 11642/20099 (57.92%) Loss: 2.077842 LR: 0.00001879 [13:48:31] Epoch: 1 Batch: 11643/20099 (57.93%) Loss: 2.303964 LR: 0.00001879 [13:48:33] Epoch: 1 Batch: 11644/20099 (57.93%) Loss: 2.046954 LR: 0.00001879 [13:48:34] Epoch: 1 Batch: 11645/20099 (57.94%) Loss: 2.160060 LR: 0.00001879 [13:48:36] Epoch: 1 Batch: 11646/20099 (57.94%) Loss: 2.269464 LR: 0.00001879 [13:48:38] Epoch: 1 Batch: 11647/20099 (57.95%) Loss: 2.438831 LR: 0.00001879 [13:48:39] Epoch: 1 Batch: 11648/20099 (57.95%) Loss: 2.036758 LR: 0.00001879 [13:48:41] Epoch: 1 
Batch: 11649/20099 (57.96%) Loss: 2.136044 LR: 0.00001877 [13:48:43] Epoch: 1 Batch: 11650/20099 (57.96%) Loss: 1.804207 LR: 0.00001877 [13:48:45] Epoch: 1 Batch: 11651/20099 (57.97%) Loss: 1.809004 LR: 0.00001877 [13:48:46] Epoch: 1 Batch: 11652/20099 (57.97%) Loss: 2.137701 LR: 0.00001877 [13:48:48] Epoch: 1 Batch: 11653/20099 (57.98%) Loss: 2.075732 LR: 0.00001877 [13:48:50] Epoch: 1 Batch: 11654/20099 (57.98%) Loss: 1.932519 LR: 0.00001877 [13:48:51] Epoch: 1 Batch: 11655/20099 (57.99%) Loss: 2.106810 LR: 0.00001877 [13:48:53] Epoch: 1 Batch: 11656/20099 (57.99%) Loss: 1.824063 LR: 0.00001875 [13:48:55] Epoch: 1 Batch: 11657/20099 (58.00%) Loss: 1.913222 LR: 0.00001875 [13:48:57] Epoch: 1 Batch: 11658/20099 (58.00%) Loss: 2.220027 LR: 0.00001875 [13:48:58] Epoch: 1 Batch: 11659/20099 (58.01%) Loss: 2.284807 LR: 0.00001875 [13:49:00] Epoch: 1 Batch: 11660/20099 (58.01%) Loss: 2.115885 LR: 0.00001875 [13:49:02] Epoch: 1 Batch: 11661/20099 (58.02%) Loss: 1.875328 LR: 0.00001875 [13:49:04] Epoch: 1 Batch: 11662/20099 (58.02%) Loss: 2.000580 LR: 0.00001875 [13:49:05] Epoch: 1 Batch: 11663/20099 (58.03%) Loss: 1.861473 LR: 0.00001874 [13:49:07] Epoch: 1 Batch: 11664/20099 (58.03%) Loss: 2.220942 LR: 0.00001874 [13:49:09] Epoch: 1 Batch: 11665/20099 (58.04%) Loss: 2.075192 LR: 0.00001874 [13:49:11] Epoch: 1 Batch: 11666/20099 (58.04%) Loss: 1.703435 LR: 0.00001874 [13:49:12] Epoch: 1 Batch: 11667/20099 (58.05%) Loss: 2.017027 LR: 0.00001874 [13:49:14] Epoch: 1 Batch: 11668/20099 (58.05%) Loss: 2.112870 LR: 0.00001874 [13:49:16] Epoch: 1 Batch: 11669/20099 (58.06%) Loss: 2.043498 LR: 0.00001874 [13:49:18] Epoch: 1 Batch: 11670/20099 (58.06%) Loss: 2.282007 LR: 0.00001872 [13:49:19] Epoch: 1 Batch: 11671/20099 (58.07%) Loss: 2.311927 LR: 0.00001872 [13:49:21] Epoch: 1 Batch: 11672/20099 (58.07%) Loss: 1.960341 LR: 0.00001872 [13:49:23] Epoch: 1 Batch: 11673/20099 (58.08%) Loss: 2.030680 LR: 0.00001872 [13:49:25] Epoch: 1 Batch: 11674/20099 (58.08%) Loss: 2.347935 LR: 
0.00001872 [13:49:27] Epoch: 1 Batch: 11675/20099 (58.09%) Loss: 2.133591 LR: 0.00001872 [13:49:28] Epoch: 1 Batch: 11676/20099 (58.09%) Loss: 2.092982 LR: 0.00001872 [13:49:30] Epoch: 1 Batch: 11677/20099 (58.10%) Loss: 2.079291 LR: 0.00001871 [13:49:32] Epoch: 1 Batch: 11678/20099 (58.10%) Loss: 2.268747 LR: 0.00001871 [13:49:34] Epoch: 1 Batch: 11679/20099 (58.11%) Loss: 1.979760 LR: 0.00001871 [13:49:35] Epoch: 1 Batch: 11680/20099 (58.11%) Loss: 2.001689 LR: 0.00001871 [13:49:37] Epoch: 1 Batch: 11681/20099 (58.12%) Loss: 2.151004 LR: 0.00001871 [13:49:39] Epoch: 1 Batch: 11682/20099 (58.12%) Loss: 2.083557 LR: 0.00001871 [13:49:41] Epoch: 1 Batch: 11683/20099 (58.13%) Loss: 1.713750 LR: 0.00001871 [13:49:43] Epoch: 1 Batch: 11684/20099 (58.13%) Loss: 1.907785 LR: 0.00001869 [13:49:44] Epoch: 1 Batch: 11685/20099 (58.14%) Loss: 2.232937 LR: 0.00001869 [13:49:46] Epoch: 1 Batch: 11686/20099 (58.14%) Loss: 1.881616 LR: 0.00001869 [13:49:48] Epoch: 1 Batch: 11687/20099 (58.15%) Loss: 2.284847 LR: 0.00001869 [13:49:50] Epoch: 1 Batch: 11688/20099 (58.15%) Loss: 1.883412 LR: 0.00001869 [13:49:51] Epoch: 1 Batch: 11689/20099 (58.16%) Loss: 2.058420 LR: 0.00001869 [13:49:53] Epoch: 1 Batch: 11690/20099 (58.16%) Loss: 1.967052 LR: 0.00001869 [13:49:55] Epoch: 1 Batch: 11691/20099 (58.17%) Loss: 1.979523 LR: 0.00001867 [13:49:57] Epoch: 1 Batch: 11692/20099 (58.17%) Loss: 1.842944 LR: 0.00001867 [13:49:58] Epoch: 1 Batch: 11693/20099 (58.18%) Loss: 2.180306 LR: 0.00001867 [13:50:00] Epoch: 1 Batch: 11694/20099 (58.18%) Loss: 2.393631 LR: 0.00001867 [13:50:02] Epoch: 1 Batch: 11695/20099 (58.19%) Loss: 1.988879 LR: 0.00001867 [13:50:04] Epoch: 1 Batch: 11696/20099 (58.19%) Loss: 2.170368 LR: 0.00001867 [13:50:06] Epoch: 1 Batch: 11697/20099 (58.20%) Loss: 2.136129 LR: 0.00001867 [13:50:07] Epoch: 1 Batch: 11698/20099 (58.20%) Loss: 2.182470 LR: 0.00001866 [13:50:09] Epoch: 1 Batch: 11699/20099 (58.21%) Loss: 1.959076 LR: 0.00001866 [13:50:11] Epoch: 1 Batch: 11700/20099 
(58.21%) Loss: 2.224196 LR: 0.00001866 [13:50:13] Epoch: 1 Batch: 11701/20099 (58.22%) Loss: 2.191238 LR: 0.00001866 [13:50:14] Epoch: 1 Batch: 11702/20099 (58.22%) Loss: 2.472732 LR: 0.00001866 [13:50:16] Epoch: 1 Batch: 11703/20099 (58.23%) Loss: 2.117027 LR: 0.00001866 [13:50:18] Epoch: 1 Batch: 11704/20099 (58.23%) Loss: 2.105459 LR: 0.00001866 [13:50:20] Epoch: 1 Batch: 11705/20099 (58.24%) Loss: 1.881214 LR: 0.00001864 [13:50:22] Epoch: 1 Batch: 11706/20099 (58.24%) Loss: 2.279474 LR: 0.00001864 [13:50:23] Epoch: 1 Batch: 11707/20099 (58.25%) Loss: 1.867320 LR: 0.00001864 [13:50:25] Epoch: 1 Batch: 11708/20099 (58.25%) Loss: 2.149002 LR: 0.00001864 [13:50:27] Epoch: 1 Batch: 11709/20099 (58.26%) Loss: 2.211083 LR: 0.00001864 [13:50:29] Epoch: 1 Batch: 11710/20099 (58.26%) Loss: 2.241066 LR: 0.00001864 [13:50:31] Epoch: 1 Batch: 11711/20099 (58.27%) Loss: 1.891470 LR: 0.00001864 [13:50:32] Epoch: 1 Batch: 11712/20099 (58.27%) Loss: 2.196545 LR: 0.00001863 [13:50:34] Epoch: 1 Batch: 11713/20099 (58.28%) Loss: 2.290085 LR: 0.00001863 [13:50:36] Epoch: 1 Batch: 11714/20099 (58.28%) Loss: 2.354098 LR: 0.00001863 [13:50:38] Epoch: 1 Batch: 11715/20099 (58.29%) Loss: 2.145191 LR: 0.00001863 [13:50:39] Epoch: 1 Batch: 11716/20099 (58.29%) Loss: 2.179118 LR: 0.00001863 [13:50:41] Epoch: 1 Batch: 11717/20099 (58.30%) Loss: 2.370042 LR: 0.00001863 [13:50:43] Epoch: 1 Batch: 11718/20099 (58.30%) Loss: 2.002565 LR: 0.00001863 [13:50:45] Epoch: 1 Batch: 11719/20099 (58.31%) Loss: 2.180166 LR: 0.00001861 [13:50:47] Epoch: 1 Batch: 11720/20099 (58.31%) Loss: 2.261866 LR: 0.00001861 [13:50:48] Epoch: 1 Batch: 11721/20099 (58.32%) Loss: 2.404797 LR: 0.00001861 [13:50:50] Epoch: 1 Batch: 11722/20099 (58.32%) Loss: 1.867610 LR: 0.00001861 [13:50:52] Epoch: 1 Batch: 11723/20099 (58.33%) Loss: 2.090361 LR: 0.00001861 [13:50:54] Epoch: 1 Batch: 11724/20099 (58.33%) Loss: 2.007075 LR: 0.00001861 [13:50:56] Epoch: 1 Batch: 11725/20099 (58.34%) Loss: 2.007616 LR: 0.00001861 [13:50:57] 
Epoch: 1 Batch: 11726/20099 (58.34%) Loss: 2.090174 LR: 0.00001859 [13:50:59] Epoch: 1 Batch: 11727/20099 (58.35%) Loss: 2.055362 LR: 0.00001859 [13:51:01] Epoch: 1 Batch: 11728/20099 (58.35%) Loss: 2.093300 LR: 0.00001859 [13:51:03] Epoch: 1 Batch: 11729/20099 (58.36%) Loss: 1.763729 LR: 0.00001859 [13:51:05] Epoch: 1 Batch: 11730/20099 (58.36%) Loss: 1.893010 LR: 0.00001859 [13:51:06] Epoch: 1 Batch: 11731/20099 (58.37%) Loss: 2.306679 LR: 0.00001859 [13:51:08] Epoch: 1 Batch: 11732/20099 (58.37%) Loss: 2.121930 LR: 0.00001859 [13:51:10] Epoch: 1 Batch: 11733/20099 (58.38%) Loss: 1.966584 LR: 0.00001858 [13:51:12] Epoch: 1 Batch: 11734/20099 (58.38%) Loss: 1.898839 LR: 0.00001858 [13:51:14] Epoch: 1 Batch: 11735/20099 (58.39%) Loss: 2.202404 LR: 0.00001858 [13:51:16] Epoch: 1 Batch: 11736/20099 (58.39%) Loss: 1.676832 LR: 0.00001858 [13:51:17] Epoch: 1 Batch: 11737/20099 (58.40%) Loss: 2.452664 LR: 0.00001858 [13:51:19] Epoch: 1 Batch: 11738/20099 (58.40%) Loss: 2.224465 LR: 0.00001858 [13:51:21] Epoch: 1 Batch: 11739/20099 (58.41%) Loss: 2.098806 LR: 0.00001858 [13:51:23] Epoch: 1 Batch: 11740/20099 (58.41%) Loss: 1.938314 LR: 0.00001856 [13:51:25] Epoch: 1 Batch: 11741/20099 (58.42%) Loss: 1.984704 LR: 0.00001856 [13:51:26] Epoch: 1 Batch: 11742/20099 (58.42%) Loss: 2.221113 LR: 0.00001856 [13:51:28] Epoch: 1 Batch: 11743/20099 (58.43%) Loss: 1.881946 LR: 0.00001856 [13:51:30] Epoch: 1 Batch: 11744/20099 (58.43%) Loss: 1.953294 LR: 0.00001856 [13:51:32] Epoch: 1 Batch: 11745/20099 (58.44%) Loss: 1.913837 LR: 0.00001856 [13:51:34] Epoch: 1 Batch: 11746/20099 (58.44%) Loss: 2.061452 LR: 0.00001856 [13:51:36] Epoch: 1 Batch: 11747/20099 (58.45%) Loss: 2.374895 LR: 0.00001854 [13:51:37] Epoch: 1 Batch: 11748/20099 (58.45%) Loss: 2.203031 LR: 0.00001854 [13:51:39] Epoch: 1 Batch: 11749/20099 (58.46%) Loss: 2.091162 LR: 0.00001854 [13:51:41] Epoch: 1 Batch: 11750/20099 (58.46%) Loss: 2.098262 LR: 0.00001854 [13:51:43] Epoch: 1 Batch: 11751/20099 (58.47%) Loss: 
1.944414 LR: 0.00001854 [13:51:45] Epoch: 1 Batch: 11752/20099 (58.47%) Loss: 1.941667 LR: 0.00001854 [13:51:47] Epoch: 1 Batch: 11753/20099 (58.48%) Loss: 2.208131 LR: 0.00001854 [13:51:48] Epoch: 1 Batch: 11754/20099 (58.48%) Loss: 2.142455 LR: 0.00001853 [13:51:50] Epoch: 1 Batch: 11755/20099 (58.49%) Loss: 2.210806 LR: 0.00001853 [13:51:52] Epoch: 1 Batch: 11756/20099 (58.49%) Loss: 2.112141 LR: 0.00001853 [13:51:54] Epoch: 1 Batch: 11757/20099 (58.50%) Loss: 1.695814 LR: 0.00001853 [13:51:56] Epoch: 1 Batch: 11758/20099 (58.50%) Loss: 2.308747 LR: 0.00001853 [13:51:58] Epoch: 1 Batch: 11759/20099 (58.51%) Loss: 2.217951 LR: 0.00001853 [13:51:59] Epoch: 1 Batch: 11760/20099 (58.51%) Loss: 2.123650 LR: 0.00001853 [13:52:01] Epoch: 1 Batch: 11761/20099 (58.52%) Loss: 2.082167 LR: 0.00001851 [13:52:03] Epoch: 1 Batch: 11762/20099 (58.52%) Loss: 2.251978 LR: 0.00001851 [13:52:05] Epoch: 1 Batch: 11763/20099 (58.53%) Loss: 2.240474 LR: 0.00001851 [13:52:07] Epoch: 1 Batch: 11764/20099 (58.53%) Loss: 2.287882 LR: 0.00001851 [13:52:09] Epoch: 1 Batch: 11765/20099 (58.54%) Loss: 2.303154 LR: 0.00001851 [13:52:10] Epoch: 1 Batch: 11766/20099 (58.54%) Loss: 2.067173 LR: 0.00001851 [13:52:12] Epoch: 1 Batch: 11767/20099 (58.55%) Loss: 2.291340 LR: 0.00001851 [13:52:14] Epoch: 1 Batch: 11768/20099 (58.55%) Loss: 1.983034 LR: 0.00001850 [13:52:16] Epoch: 1 Batch: 11769/20099 (58.56%) Loss: 2.243315 LR: 0.00001850 [13:52:18] Epoch: 1 Batch: 11770/20099 (58.56%) Loss: 2.237517 LR: 0.00001850 [13:52:20] Epoch: 1 Batch: 11771/20099 (58.57%) Loss: 2.051685 LR: 0.00001850 [13:52:21] Epoch: 1 Batch: 11772/20099 (58.57%) Loss: 1.980009 LR: 0.00001850 [13:52:23] Epoch: 1 Batch: 11773/20099 (58.58%) Loss: 1.900794 LR: 0.00001850 [13:52:25] Epoch: 1 Batch: 11774/20099 (58.58%) Loss: 2.083643 LR: 0.00001850 [13:52:27] Epoch: 1 Batch: 11775/20099 (58.59%) Loss: 2.554008 LR: 0.00001848 [13:52:29] Epoch: 1 Batch: 11776/20099 (58.59%) Loss: 1.980413 LR: 0.00001848 [13:52:31] Epoch: 1 
Batch: 11777/20099 (58.59%) Loss: 2.092948 LR: 0.00001848 [13:52:33] Epoch: 1 Batch: 11778/20099 (58.60%) Loss: 2.003721 LR: 0.00001848 [13:52:35] Epoch: 1 Batch: 11779/20099 (58.60%) Loss: 2.252162 LR: 0.00001848 [13:52:36] Epoch: 1 Batch: 11780/20099 (58.61%) Loss: 2.306731 LR: 0.00001848 [13:52:38] Epoch: 1 Batch: 11781/20099 (58.61%) Loss: 2.096641 LR: 0.00001848 [13:52:40] Epoch: 1 Batch: 11782/20099 (58.62%) Loss: 2.242378 LR: 0.00001846 [13:52:42] Epoch: 1 Batch: 11783/20099 (58.62%) Loss: 2.262103 LR: 0.00001846 [13:52:44] Epoch: 1 Batch: 11784/20099 (58.63%) Loss: 2.047008 LR: 0.00001846 [13:52:46] Epoch: 1 Batch: 11785/20099 (58.63%) Loss: 2.046954 LR: 0.00001846 [13:52:47] Epoch: 1 Batch: 11786/20099 (58.64%) Loss: 2.134771 LR: 0.00001846 [13:52:49] Epoch: 1 Batch: 11787/20099 (58.64%) Loss: 2.003363 LR: 0.00001846 [13:52:51] Epoch: 1 Batch: 11788/20099 (58.65%) Loss: 1.777327 LR: 0.00001846 [13:52:53] Epoch: 1 Batch: 11789/20099 (58.65%) Loss: 2.146442 LR: 0.00001845 [13:52:55] Epoch: 1 Batch: 11790/20099 (58.66%) Loss: 1.984076 LR: 0.00001845 [13:52:57] Epoch: 1 Batch: 11791/20099 (58.66%) Loss: 2.324841 LR: 0.00001845 [13:52:59] Epoch: 1 Batch: 11792/20099 (58.67%) Loss: 1.930925 LR: 0.00001845 [13:53:00] Epoch: 1 Batch: 11793/20099 (58.67%) Loss: 2.045105 LR: 0.00001845 [13:53:02] Epoch: 1 Batch: 11794/20099 (58.68%) Loss: 2.167869 LR: 0.00001845 [13:53:04] Epoch: 1 Batch: 11795/20099 (58.68%) Loss: 2.091428 LR: 0.00001845 [13:53:06] Epoch: 1 Batch: 11796/20099 (58.69%) Loss: 2.087180 LR: 0.00001843 [13:53:08] Epoch: 1 Batch: 11797/20099 (58.69%) Loss: 2.019968 LR: 0.00001843 [13:53:10] Epoch: 1 Batch: 11798/20099 (58.70%) Loss: 1.965273 LR: 0.00001843 [13:53:12] Epoch: 1 Batch: 11799/20099 (58.70%) Loss: 2.102204 LR: 0.00001843 [13:53:18] >> Cleaned up old temp checkpoint: epoch1_step6000 [13:53:18] >> Cleaned up old temp checkpoint: epoch1_step5800 [13:53:18] >> Cleaned up old temp checkpoint: epoch1_step5600 [13:53:18] >> Temp checkpoint saved: 
epoch1_step11800, size: 0.1693 GB [13:53:18] Epoch: 1 Batch: 11800/20099 (58.71%) Loss: 1.970434 LR: 0.00001843 [13:53:20] Epoch: 1 Batch: 11801/20099 (58.71%) Loss: 2.213262 LR: 0.00001843 [13:53:22] Epoch: 1 Batch: 11802/20099 (58.72%) Loss: 2.448035 LR: 0.00001843 [13:53:24] Epoch: 1 Batch: 11803/20099 (58.72%) Loss: 2.054489 LR: 0.00001841 [13:53:26] Epoch: 1 Batch: 11804/20099 (58.73%) Loss: 2.035305 LR: 0.00001841 [13:53:27] Epoch: 1 Batch: 11805/20099 (58.73%) Loss: 2.100226 LR: 0.00001841 [13:53:29] Epoch: 1 Batch: 11806/20099 (58.74%) Loss: 2.253917 LR: 0.00001841 [13:53:31] Epoch: 1 Batch: 11807/20099 (58.74%) Loss: 1.935932 LR: 0.00001841 [13:53:33] Epoch: 1 Batch: 11808/20099 (58.75%) Loss: 2.095400 LR: 0.00001841 [13:53:35] Epoch: 1 Batch: 11809/20099 (58.75%) Loss: 2.087065 LR: 0.00001841 [13:53:37] Epoch: 1 Batch: 11810/20099 (58.76%) Loss: 2.004863 LR: 0.00001840 [13:53:38] Epoch: 1 Batch: 11811/20099 (58.76%) Loss: 2.159311 LR: 0.00001840 [13:53:40] Epoch: 1 Batch: 11812/20099 (58.77%) Loss: 2.260426 LR: 0.00001840 [13:53:42] Epoch: 1 Batch: 11813/20099 (58.77%) Loss: 1.922394 LR: 0.00001840 [13:53:44] Epoch: 1 Batch: 11814/20099 (58.78%) Loss: 2.360313 LR: 0.00001840 [13:53:46] Epoch: 1 Batch: 11815/20099 (58.78%) Loss: 2.046540 LR: 0.00001840 [13:53:48] Epoch: 1 Batch: 11816/20099 (58.79%) Loss: 2.258841 LR: 0.00001840 [13:53:50] Epoch: 1 Batch: 11817/20099 (58.79%) Loss: 2.000324 LR: 0.00001838 [13:53:51] Epoch: 1 Batch: 11818/20099 (58.80%) Loss: 2.274274 LR: 0.00001838 [13:53:53] Epoch: 1 Batch: 11819/20099 (58.80%) Loss: 2.275008 LR: 0.00001838 [13:53:55] Epoch: 1 Batch: 11820/20099 (58.81%) Loss: 2.180188 LR: 0.00001838 [13:53:57] Epoch: 1 Batch: 11821/20099 (58.81%) Loss: 2.175413 LR: 0.00001838 [13:53:59] Epoch: 1 Batch: 11822/20099 (58.82%) Loss: 1.957323 LR: 0.00001838 [13:54:01] Epoch: 1 Batch: 11823/20099 (58.82%) Loss: 2.111849 LR: 0.00001838 [13:54:03] Epoch: 1 Batch: 11824/20099 (58.83%) Loss: 2.113971 LR: 0.00001837 [13:54:04] 
Epoch: 1 Batch: 11825/20099 (58.83%) Loss: 2.256018 LR: 0.00001837 [13:54:06] Epoch: 1 Batch: 11826/20099 (58.84%) Loss: 1.936416 LR: 0.00001837 [13:54:08] Epoch: 1 Batch: 11827/20099 (58.84%) Loss: 1.841989 LR: 0.00001837 [13:54:10] Epoch: 1 Batch: 11828/20099 (58.85%) Loss: 1.995320 LR: 0.00001837 [13:54:12] Epoch: 1 Batch: 11829/20099 (58.85%) Loss: 2.261927 LR: 0.00001837 [13:54:14] Epoch: 1 Batch: 11830/20099 (58.86%) Loss: 1.959313 LR: 0.00001837 [13:54:15] Epoch: 1 Batch: 11831/20099 (58.86%) Loss: 2.275488 LR: 0.00001835 [13:54:17] Epoch: 1 Batch: 11832/20099 (58.87%) Loss: 2.057869 LR: 0.00001835 [13:54:19] Epoch: 1 Batch: 11833/20099 (58.87%) Loss: 2.074106 LR: 0.00001835 [13:54:21] Epoch: 1 Batch: 11834/20099 (58.88%) Loss: 1.612940 LR: 0.00001835 [13:54:23] Epoch: 1 Batch: 11835/20099 (58.88%) Loss: 2.115710 LR: 0.00001835 [13:54:25] Epoch: 1 Batch: 11836/20099 (58.89%) Loss: 1.680971 LR: 0.00001835 [13:54:26] Epoch: 1 Batch: 11837/20099 (58.89%) Loss: 2.159996 LR: 0.00001835 [13:54:28] Epoch: 1 Batch: 11838/20099 (58.90%) Loss: 2.004302 LR: 0.00001833 [13:54:30] Epoch: 1 Batch: 11839/20099 (58.90%) Loss: 2.202188 LR: 0.00001833 [13:54:32] Epoch: 1 Batch: 11840/20099 (58.91%) Loss: 2.377111 LR: 0.00001833 [13:54:34] Epoch: 1 Batch: 11841/20099 (58.91%) Loss: 2.104885 LR: 0.00001833 [13:54:36] Epoch: 1 Batch: 11842/20099 (58.92%) Loss: 2.124622 LR: 0.00001833 [13:54:38] Epoch: 1 Batch: 11843/20099 (58.92%) Loss: 2.091166 LR: 0.00001833 [13:54:39] Epoch: 1 Batch: 11844/20099 (58.93%) Loss: 1.869989 LR: 0.00001833 [13:54:41] Epoch: 1 Batch: 11845/20099 (58.93%) Loss: 1.897210 LR: 0.00001832 [13:54:43] Epoch: 1 Batch: 11846/20099 (58.94%) Loss: 2.138158 LR: 0.00001832 [13:54:45] Epoch: 1 Batch: 11847/20099 (58.94%) Loss: 1.898017 LR: 0.00001832 [13:54:47] Epoch: 1 Batch: 11848/20099 (58.95%) Loss: 2.277475 LR: 0.00001832 [13:54:49] Epoch: 1 Batch: 11849/20099 (58.95%) Loss: 1.848698 LR: 0.00001832 [13:54:51] Epoch: 1 Batch: 11850/20099 (58.96%) Loss: 
2.144259 LR: 0.00001832 [13:54:52] Epoch: 1 Batch: 11851/20099 (58.96%) Loss: 2.106738 LR: 0.00001832 [13:54:54] Epoch: 1 Batch: 11852/20099 (58.97%) Loss: 2.360341 LR: 0.00001830 [13:54:56] Epoch: 1 Batch: 11853/20099 (58.97%) Loss: 1.937472 LR: 0.00001830 [13:54:58] Epoch: 1 Batch: 11854/20099 (58.98%) Loss: 2.291535 LR: 0.00001830 [13:55:00] Epoch: 1 Batch: 11855/20099 (58.98%) Loss: 2.269319 LR: 0.00001830 [13:55:02] Epoch: 1 Batch: 11856/20099 (58.99%) Loss: 2.086259 LR: 0.00001830 [13:55:04] Epoch: 1 Batch: 11857/20099 (58.99%) Loss: 2.061155 LR: 0.00001830 [13:55:05] Epoch: 1 Batch: 11858/20099 (59.00%) Loss: 2.321799 LR: 0.00001830 [13:55:07] Epoch: 1 Batch: 11859/20099 (59.00%) Loss: 1.982731 LR: 0.00001828 [13:55:09] Epoch: 1 Batch: 11860/20099 (59.01%) Loss: 2.286336 LR: 0.00001828 [13:55:11] Epoch: 1 Batch: 11861/20099 (59.01%) Loss: 2.063646 LR: 0.00001828 [13:55:13] Epoch: 1 Batch: 11862/20099 (59.02%) Loss: 1.992929 LR: 0.00001828 [13:55:15] Epoch: 1 Batch: 11863/20099 (59.02%) Loss: 2.243889 LR: 0.00001828 [13:55:17] Epoch: 1 Batch: 11864/20099 (59.03%) Loss: 2.358169 LR: 0.00001828 [13:55:18] Epoch: 1 Batch: 11865/20099 (59.03%) Loss: 1.999358 LR: 0.00001828 [13:55:20] Epoch: 1 Batch: 11866/20099 (59.04%) Loss: 2.339380 LR: 0.00001827 [13:55:22] Epoch: 1 Batch: 11867/20099 (59.04%) Loss: 1.903591 LR: 0.00001827 [13:55:24] Epoch: 1 Batch: 11868/20099 (59.05%) Loss: 2.077491 LR: 0.00001827 [13:55:26] Epoch: 1 Batch: 11869/20099 (59.05%) Loss: 2.216758 LR: 0.00001827 [13:55:28] Epoch: 1 Batch: 11870/20099 (59.06%) Loss: 2.027454 LR: 0.00001827 [13:55:30] Epoch: 1 Batch: 11871/20099 (59.06%) Loss: 2.039042 LR: 0.00001827 [13:55:32] Epoch: 1 Batch: 11872/20099 (59.07%) Loss: 1.807988 LR: 0.00001827 [13:55:33] Epoch: 1 Batch: 11873/20099 (59.07%) Loss: 1.829313 LR: 0.00001825 [13:55:35] Epoch: 1 Batch: 11874/20099 (59.08%) Loss: 2.061653 LR: 0.00001825 [13:55:37] Epoch: 1 Batch: 11875/20099 (59.08%) Loss: 2.062271 LR: 0.00001825 [13:55:39] Epoch: 1 
Batch: 11876/20099 (59.09%) Loss: 2.330869 LR: 0.00001825 [13:55:41] Epoch: 1 Batch: 11877/20099 (59.09%) Loss: 2.580341 LR: 0.00001825 [13:55:43] Epoch: 1 Batch: 11878/20099 (59.10%) Loss: 2.228382 LR: 0.00001825 [13:55:45] Epoch: 1 Batch: 11879/20099 (59.10%) Loss: 2.078221 LR: 0.00001825 [13:55:46] Epoch: 1 Batch: 11880/20099 (59.11%) Loss: 1.818790 LR: 0.00001824 [13:55:48] Epoch: 1 Batch: 11881/20099 (59.11%) Loss: 2.021463 LR: 0.00001824 [13:55:50] Epoch: 1 Batch: 11882/20099 (59.12%) Loss: 2.052936 LR: 0.00001824 [13:55:52] Epoch: 1 Batch: 11883/20099 (59.12%) Loss: 2.255507 LR: 0.00001824 [13:55:54] Epoch: 1 Batch: 11884/20099 (59.13%) Loss: 2.112234 LR: 0.00001824 [13:55:56] Epoch: 1 Batch: 11885/20099 (59.13%) Loss: 2.133395 LR: 0.00001824 [13:55:58] Epoch: 1 Batch: 11886/20099 (59.14%) Loss: 2.254321 LR: 0.00001824 [13:55:59] Epoch: 1 Batch: 11887/20099 (59.14%) Loss: 2.129515 LR: 0.00001822 [13:56:01] Epoch: 1 Batch: 11888/20099 (59.15%) Loss: 2.238394 LR: 0.00001822 [13:56:03] Epoch: 1 Batch: 11889/20099 (59.15%) Loss: 2.329515 LR: 0.00001822 [13:56:05] Epoch: 1 Batch: 11890/20099 (59.16%) Loss: 2.027393 LR: 0.00001822 [13:56:07] Epoch: 1 Batch: 11891/20099 (59.16%) Loss: 2.009616 LR: 0.00001822 [13:56:09] Epoch: 1 Batch: 11892/20099 (59.17%) Loss: 2.045273 LR: 0.00001822 [13:56:11] Epoch: 1 Batch: 11893/20099 (59.17%) Loss: 2.198835 LR: 0.00001822 [13:56:12] Epoch: 1 Batch: 11894/20099 (59.18%) Loss: 2.086552 LR: 0.00001820 [13:56:14] Epoch: 1 Batch: 11895/20099 (59.18%) Loss: 2.019103 LR: 0.00001820 [13:56:16] Epoch: 1 Batch: 11896/20099 (59.19%) Loss: 1.980255 LR: 0.00001820 [13:56:18] Epoch: 1 Batch: 11897/20099 (59.19%) Loss: 2.073910 LR: 0.00001820 [13:56:20] Epoch: 1 Batch: 11898/20099 (59.20%) Loss: 2.262356 LR: 0.00001820 [13:56:22] Epoch: 1 Batch: 11899/20099 (59.20%) Loss: 2.138031 LR: 0.00001820 [13:56:24] Epoch: 1 Batch: 11900/20099 (59.21%) Loss: 2.154301 LR: 0.00001820 [13:56:25] Epoch: 1 Batch: 11901/20099 (59.21%) Loss: 2.523156 LR: 
0.00001819 [13:56:27] Epoch: 1 Batch: 11902/20099 (59.22%) Loss: 2.112205 LR: 0.00001819 [13:56:29] Epoch: 1 Batch: 11903/20099 (59.22%) Loss: 2.299864 LR: 0.00001819 [13:56:31] Epoch: 1 Batch: 11904/20099 (59.23%) Loss: 1.917401 LR: 0.00001819 [13:56:33] Epoch: 1 Batch: 11905/20099 (59.23%) Loss: 2.186485 LR: 0.00001819 [13:56:35] Epoch: 1 Batch: 11906/20099 (59.24%) Loss: 2.154911 LR: 0.00001819 [13:56:37] Epoch: 1 Batch: 11907/20099 (59.24%) Loss: 2.220390 LR: 0.00001819 [13:56:38] Epoch: 1 Batch: 11908/20099 (59.25%) Loss: 2.214277 LR: 0.00001817 [13:56:40] Epoch: 1 Batch: 11909/20099 (59.25%) Loss: 1.964027 LR: 0.00001817 [13:56:42] Epoch: 1 Batch: 11910/20099 (59.26%) Loss: 2.199575 LR: 0.00001817 [13:56:44] Epoch: 1 Batch: 11911/20099 (59.26%) Loss: 2.139123 LR: 0.00001817 [13:56:46] Epoch: 1 Batch: 11912/20099 (59.27%) Loss: 2.092628 LR: 0.00001817 [13:56:48] Epoch: 1 Batch: 11913/20099 (59.27%) Loss: 1.992154 LR: 0.00001817 [13:56:49] Epoch: 1 Batch: 11914/20099 (59.28%) Loss: 1.985468 LR: 0.00001817 [13:56:51] Epoch: 1 Batch: 11915/20099 (59.28%) Loss: 2.126505 LR: 0.00001815 [13:56:53] Epoch: 1 Batch: 11916/20099 (59.29%) Loss: 2.304210 LR: 0.00001815 [13:56:55] Epoch: 1 Batch: 11917/20099 (59.29%) Loss: 2.128174 LR: 0.00001815 [13:56:57] Epoch: 1 Batch: 11918/20099 (59.30%) Loss: 1.923754 LR: 0.00001815 [13:56:59] Epoch: 1 Batch: 11919/20099 (59.30%) Loss: 2.133491 LR: 0.00001815 [13:57:01] Epoch: 1 Batch: 11920/20099 (59.31%) Loss: 2.022337 LR: 0.00001815 [13:57:02] Epoch: 1 Batch: 11921/20099 (59.31%) Loss: 2.223453 LR: 0.00001815 [13:57:04] Epoch: 1 Batch: 11922/20099 (59.32%) Loss: 1.858255 LR: 0.00001814 [13:57:06] Epoch: 1 Batch: 11923/20099 (59.32%) Loss: 2.182527 LR: 0.00001814 [13:57:08] Epoch: 1 Batch: 11924/20099 (59.33%) Loss: 2.041735 LR: 0.00001814 [13:57:10] Epoch: 1 Batch: 11925/20099 (59.33%) Loss: 2.060998 LR: 0.00001814 [13:57:12] Epoch: 1 Batch: 11926/20099 (59.34%) Loss: 2.321335 LR: 0.00001814 [13:57:14] Epoch: 1 Batch: 11927/20099 
(59.34%) Loss: 2.147984 LR: 0.00001814 [13:57:15] Epoch: 1 Batch: 11928/20099 (59.35%) Loss: 2.047800 LR: 0.00001814 [13:57:17] Epoch: 1 Batch: 11929/20099 (59.35%) Loss: 2.068217 LR: 0.00001812 [13:57:19] Epoch: 1 Batch: 11930/20099 (59.36%) Loss: 1.887088 LR: 0.00001812 [13:57:21] Epoch: 1 Batch: 11931/20099 (59.36%) Loss: 1.949737 LR: 0.00001812 [13:57:23] Epoch: 1 Batch: 11932/20099 (59.37%) Loss: 2.202947 LR: 0.00001812 [13:57:25] Epoch: 1 Batch: 11933/20099 (59.37%) Loss: 2.046483 LR: 0.00001812 [13:57:27] Epoch: 1 Batch: 11934/20099 (59.38%) Loss: 2.124260 LR: 0.00001812 [13:57:28] Epoch: 1 Batch: 11935/20099 (59.38%) Loss: 2.043452 LR: 0.00001812 [13:57:30] Epoch: 1 Batch: 11936/20099 (59.39%) Loss: 2.110117 LR: 0.00001811 [13:57:32] Epoch: 1 Batch: 11937/20099 (59.39%) Loss: 2.115472 LR: 0.00001811 [13:57:34] Epoch: 1 Batch: 11938/20099 (59.40%) Loss: 1.959133 LR: 0.00001811 [13:57:36] Epoch: 1 Batch: 11939/20099 (59.40%) Loss: 1.955804 LR: 0.00001811 [13:57:38] Epoch: 1 Batch: 11940/20099 (59.41%) Loss: 2.151593 LR: 0.00001811 [13:57:40] Epoch: 1 Batch: 11941/20099 (59.41%) Loss: 2.126657 LR: 0.00001811 [13:57:41] Epoch: 1 Batch: 11942/20099 (59.42%) Loss: 2.594750 LR: 0.00001811 [13:57:43] Epoch: 1 Batch: 11943/20099 (59.42%) Loss: 2.271149 LR: 0.00001809 [13:57:45] Epoch: 1 Batch: 11944/20099 (59.43%) Loss: 1.984614 LR: 0.00001809 [13:57:47] Epoch: 1 Batch: 11945/20099 (59.43%) Loss: 2.174869 LR: 0.00001809 [13:57:49] Epoch: 1 Batch: 11946/20099 (59.44%) Loss: 2.179909 LR: 0.00001809 [13:57:51] Epoch: 1 Batch: 11947/20099 (59.44%) Loss: 2.016270 LR: 0.00001809 [13:57:52] Epoch: 1 Batch: 11948/20099 (59.45%) Loss: 1.975372 LR: 0.00001809 [13:57:54] Epoch: 1 Batch: 11949/20099 (59.45%) Loss: 2.064921 LR: 0.00001809 [13:57:56] Epoch: 1 Batch: 11950/20099 (59.46%) Loss: 1.995668 LR: 0.00001807 [13:57:58] Epoch: 1 Batch: 11951/20099 (59.46%) Loss: 2.019344 LR: 0.00001807 [13:58:00] Epoch: 1 Batch: 11952/20099 (59.47%) Loss: 2.037734 LR: 0.00001807 [13:58:02] 
Epoch: 1 Batch: 11953/20099 (59.47%) Loss: 2.183320 LR: 0.00001807 [13:58:04] Epoch: 1 Batch: 11954/20099 (59.48%) Loss: 1.976371 LR: 0.00001807 [13:58:05] Epoch: 1 Batch: 11955/20099 (59.48%) Loss: 2.121747 LR: 0.00001807 [13:58:07] Epoch: 1 Batch: 11956/20099 (59.49%) Loss: 1.964252 LR: 0.00001807 [13:58:09] Epoch: 1 Batch: 11957/20099 (59.49%) Loss: 2.026970 LR: 0.00001806 [13:58:11] Epoch: 1 Batch: 11958/20099 (59.50%) Loss: 2.056350 LR: 0.00001806 [13:58:13] Epoch: 1 Batch: 11959/20099 (59.50%) Loss: 2.113556 LR: 0.00001806 [13:58:15] Epoch: 1 Batch: 11960/20099 (59.51%) Loss: 2.200220 LR: 0.00001806 [13:58:17] Epoch: 1 Batch: 11961/20099 (59.51%) Loss: 1.939128 LR: 0.00001806 [13:58:18] Epoch: 1 Batch: 11962/20099 (59.52%) Loss: 1.860443 LR: 0.00001806 [13:58:20] Epoch: 1 Batch: 11963/20099 (59.52%) Loss: 2.248807 LR: 0.00001806 [13:58:22] Epoch: 1 Batch: 11964/20099 (59.53%) Loss: 1.981426 LR: 0.00001804 [13:58:24] Epoch: 1 Batch: 11965/20099 (59.53%) Loss: 1.994280 LR: 0.00001804 [13:58:26] Epoch: 1 Batch: 11966/20099 (59.54%) Loss: 1.915812 LR: 0.00001804 [13:58:28] Epoch: 1 Batch: 11967/20099 (59.54%) Loss: 2.269165 LR: 0.00001804 [13:58:29] Epoch: 1 Batch: 11968/20099 (59.55%) Loss: 2.000148 LR: 0.00001804 [13:58:31] Epoch: 1 Batch: 11969/20099 (59.55%) Loss: 2.320022 LR: 0.00001804 [13:58:33] Epoch: 1 Batch: 11970/20099 (59.56%) Loss: 2.303728 LR: 0.00001804 [13:58:35] Epoch: 1 Batch: 11971/20099 (59.56%) Loss: 2.131265 LR: 0.00001802 [13:58:37] Epoch: 1 Batch: 11972/20099 (59.57%) Loss: 2.039286 LR: 0.00001802 [13:58:39] Epoch: 1 Batch: 11973/20099 (59.57%) Loss: 1.986131 LR: 0.00001802 [13:58:41] Epoch: 1 Batch: 11974/20099 (59.58%) Loss: 2.079265 LR: 0.00001802 [13:58:42] Epoch: 1 Batch: 11975/20099 (59.58%) Loss: 1.635267 LR: 0.00001802 [13:58:44] Epoch: 1 Batch: 11976/20099 (59.59%) Loss: 2.217071 LR: 0.00001802 [13:58:46] Epoch: 1 Batch: 11977/20099 (59.59%) Loss: 1.918673 LR: 0.00001802 [13:58:48] Epoch: 1 Batch: 11978/20099 (59.60%) Loss: 
1.876424 LR: 0.00001801 [13:58:50] Epoch: 1 Batch: 11979/20099 (59.60%) Loss: 1.929583 LR: 0.00001801 [13:58:52] Epoch: 1 Batch: 11980/20099 (59.60%) Loss: 2.035302 LR: 0.00001801 [13:58:54] Epoch: 1 Batch: 11981/20099 (59.61%) Loss: 2.284866 LR: 0.00001801 [13:58:55] Epoch: 1 Batch: 11982/20099 (59.61%) Loss: 2.163074 LR: 0.00001801 [13:58:57] Epoch: 1 Batch: 11983/20099 (59.62%) Loss: 2.277340 LR: 0.00001801 [13:58:59] Epoch: 1 Batch: 11984/20099 (59.62%) Loss: 2.147599 LR: 0.00001801 [13:59:01] Epoch: 1 Batch: 11985/20099 (59.63%) Loss: 2.097553 LR: 0.00001799 [13:59:03] Epoch: 1 Batch: 11986/20099 (59.63%) Loss: 1.950933 LR: 0.00001799 [13:59:05] Epoch: 1 Batch: 11987/20099 (59.64%) Loss: 2.431226 LR: 0.00001799 [13:59:06] Epoch: 1 Batch: 11988/20099 (59.64%) Loss: 1.990898 LR: 0.00001799 [13:59:08] Epoch: 1 Batch: 11989/20099 (59.65%) Loss: 1.940519 LR: 0.00001799 [13:59:10] Epoch: 1 Batch: 11990/20099 (59.65%) Loss: 1.998959 LR: 0.00001799 [13:59:12] Epoch: 1 Batch: 11991/20099 (59.66%) Loss: 2.113931 LR: 0.00001799 [13:59:14] Epoch: 1 Batch: 11992/20099 (59.66%) Loss: 1.937516 LR: 0.00001798 [13:59:16] Epoch: 1 Batch: 11993/20099 (59.67%) Loss: 2.254978 LR: 0.00001798 [13:59:18] Epoch: 1 Batch: 11994/20099 (59.67%) Loss: 1.970295 LR: 0.00001798 [13:59:19] Epoch: 1 Batch: 11995/20099 (59.68%) Loss: 2.147945 LR: 0.00001798 [13:59:21] Epoch: 1 Batch: 11996/20099 (59.68%) Loss: 2.180451 LR: 0.00001798 [13:59:23] Epoch: 1 Batch: 11997/20099 (59.69%) Loss: 2.271821 LR: 0.00001798 [13:59:25] Epoch: 1 Batch: 11998/20099 (59.69%) Loss: 2.543804 LR: 0.00001798 [13:59:27] Epoch: 1 Batch: 11999/20099 (59.70%) Loss: 2.082809 LR: 0.00001796 [13:59:29] >> Evaluating batch 0 [13:59:30] >> Evaluating batch 1 [13:59:31] >> Evaluating batch 2 [13:59:32] >> Evaluating batch 3 [13:59:33] >> Evaluating batch 4 [13:59:34] >> Evaluating batch 5 [13:59:35] >> Evaluating batch 6 [13:59:36] >> Evaluating batch 7 [13:59:37] >> Evaluating batch 8 [13:59:39] >> Evaluating batch 9 
[13:59:40] >> Evaluating batch 10 [13:59:41] >> Evaluating batch 11 [13:59:42] >> Evaluating batch 12 [13:59:43] >> Evaluating batch 13 [13:59:44] >> Evaluating batch 14 [13:59:45] >> Evaluating batch 15 [13:59:46] >> Evaluating batch 16 [13:59:46] Epoch: 1 Step: 12000/20099 Evaluation: [13:59:46] Avg Loss Since Last Eval: 0.0699 Val Loss: 2.1585 Validation loss delta: 2.1585 Perplexity: 8.6583 LR: 0.00001796 [13:59:50] >> Cleaned up old temp checkpoint: epoch1_step6200 [13:59:50] >> Temp checkpoint saved: epoch1_step12000, size: 0.1693 GB [13:59:54] >> Checkpoint saved: epoch1_step12000, size: 0.1693 GB [13:59:54] Epoch: 1 Batch: 12000/20099 (59.70%) Loss: 1.929041 LR: 0.00001796 [13:59:55] Epoch: 1 Batch: 12001/20099 (59.71%) Loss: 2.157771 LR: 0.00001796 [13:59:57] Epoch: 1 Batch: 12002/20099 (59.71%) Loss: 2.243726 LR: 0.00001796 [13:59:59] Epoch: 1 Batch: 12003/20099 (59.72%) Loss: 2.172165 LR: 0.00001796 [14:00:01] Epoch: 1 Batch: 12004/20099 (59.72%) Loss: 2.085486 LR: 0.00001796 [14:00:03] Epoch: 1 Batch: 12005/20099 (59.73%) Loss: 2.166671 LR: 0.00001796 [14:00:05] Epoch: 1 Batch: 12006/20099 (59.73%) Loss: 1.988098 LR: 0.00001794 [14:00:06] Epoch: 1 Batch: 12007/20099 (59.74%) Loss: 1.860830 LR: 0.00001794 [14:00:08] Epoch: 1 Batch: 12008/20099 (59.74%) Loss: 1.766423 LR: 0.00001794 [14:00:10] Epoch: 1 Batch: 12009/20099 (59.75%) Loss: 1.910781 LR: 0.00001794 [14:00:12] Epoch: 1 Batch: 12010/20099 (59.75%) Loss: 1.559872 LR: 0.00001794 [14:00:14] Epoch: 1 Batch: 12011/20099 (59.76%) Loss: 2.284399 LR: 0.00001794 [14:00:16] Epoch: 1 Batch: 12012/20099 (59.76%) Loss: 1.753412 LR: 0.00001794 [14:00:17] Epoch: 1 Batch: 12013/20099 (59.77%) Loss: 2.212875 LR: 0.00001793 [14:00:19] Epoch: 1 Batch: 12014/20099 (59.77%) Loss: 1.777774 LR: 0.00001793 [14:00:21] Epoch: 1 Batch: 12015/20099 (59.78%) Loss: 2.272972 LR: 0.00001793 [14:00:23] Epoch: 1 Batch: 12016/20099 (59.78%) Loss: 1.900482 LR: 0.00001793 [14:00:25] Epoch: 1 Batch: 12017/20099 (59.79%) Loss: 
2.150534 LR: 0.00001793 [14:00:27] Epoch: 1 Batch: 12018/20099 (59.79%) Loss: 1.791320 LR: 0.00001793 [14:00:29] Epoch: 1 Batch: 12019/20099 (59.80%) Loss: 2.091736 LR: 0.00001793 [14:00:30] Epoch: 1 Batch: 12020/20099 (59.80%) Loss: 2.148026 LR: 0.00001791 [14:00:32] Epoch: 1 Batch: 12021/20099 (59.81%) Loss: 2.089726 LR: 0.00001791 [14:00:34] Epoch: 1 Batch: 12022/20099 (59.81%) Loss: 2.007583 LR: 0.00001791 [14:00:36] Epoch: 1 Batch: 12023/20099 (59.82%) Loss: 1.939742 LR: 0.00001791 [14:00:38] Epoch: 1 Batch: 12024/20099 (59.82%) Loss: 2.165577 LR: 0.00001791 [14:00:40] Epoch: 1 Batch: 12025/20099 (59.83%) Loss: 2.382001 LR: 0.00001791 [14:00:41] Epoch: 1 Batch: 12026/20099 (59.83%) Loss: 2.161133 LR: 0.00001791 [14:00:43] Epoch: 1 Batch: 12027/20099 (59.84%) Loss: 1.955997 LR: 0.00001789 [14:00:45] Epoch: 1 Batch: 12028/20099 (59.84%) Loss: 2.283062 LR: 0.00001789 [14:00:47] Epoch: 1 Batch: 12029/20099 (59.85%) Loss: 2.266393 LR: 0.00001789 [14:00:49] Epoch: 1 Batch: 12030/20099 (59.85%) Loss: 2.204633 LR: 0.00001789 [14:00:51] Epoch: 1 Batch: 12031/20099 (59.86%) Loss: 1.977131 LR: 0.00001789 [14:00:53] Epoch: 1 Batch: 12032/20099 (59.86%) Loss: 2.306788 LR: 0.00001789 [14:00:55] Epoch: 1 Batch: 12033/20099 (59.87%) Loss: 1.971958 LR: 0.00001789 [14:00:56] Epoch: 1 Batch: 12034/20099 (59.87%) Loss: 2.033112 LR: 0.00001788 [14:00:58] Epoch: 1 Batch: 12035/20099 (59.88%) Loss: 2.095487 LR: 0.00001788 [14:01:00] Epoch: 1 Batch: 12036/20099 (59.88%) Loss: 2.132129 LR: 0.00001788 [14:01:02] Epoch: 1 Batch: 12037/20099 (59.89%) Loss: 2.054639 LR: 0.00001788 [14:01:04] Epoch: 1 Batch: 12038/20099 (59.89%) Loss: 2.019219 LR: 0.00001788 [14:01:06] Epoch: 1 Batch: 12039/20099 (59.90%) Loss: 1.958479 LR: 0.00001788 [14:01:08] Epoch: 1 Batch: 12040/20099 (59.90%) Loss: 2.274388 LR: 0.00001788 [14:01:09] Epoch: 1 Batch: 12041/20099 (59.91%) Loss: 1.982106 LR: 0.00001786 [14:01:11] Epoch: 1 Batch: 12042/20099 (59.91%) Loss: 1.871173 LR: 0.00001786 [14:01:13] Epoch: 1 
Batch: 12043/20099 (59.92%) Loss: 1.964772 LR: 0.00001786 [14:01:15] Epoch: 1 Batch: 12044/20099 (59.92%) Loss: 1.877458 LR: 0.00001786 [14:01:17] Epoch: 1 Batch: 12045/20099 (59.93%) Loss: 2.022984 LR: 0.00001786 [14:01:19] Epoch: 1 Batch: 12046/20099 (59.93%) Loss: 2.178191 LR: 0.00001786 [14:01:21] Epoch: 1 Batch: 12047/20099 (59.94%) Loss: 1.916015 LR: 0.00001786 [14:01:22] Epoch: 1 Batch: 12048/20099 (59.94%) Loss: 2.158407 LR: 0.00001785 [14:01:24] Epoch: 1 Batch: 12049/20099 (59.95%) Loss: 2.371899 LR: 0.00001785 [14:01:26] Epoch: 1 Batch: 12050/20099 (59.95%) Loss: 2.184262 LR: 0.00001785 [14:01:28] Epoch: 1 Batch: 12051/20099 (59.96%) Loss: 2.042937 LR: 0.00001785 [14:01:30] Epoch: 1 Batch: 12052/20099 (59.96%) Loss: 2.038719 LR: 0.00001785 [14:01:32] Epoch: 1 Batch: 12053/20099 (59.97%) Loss: 1.838948 LR: 0.00001785 [14:01:34] Epoch: 1 Batch: 12054/20099 (59.97%) Loss: 2.180948 LR: 0.00001785 [14:01:35] Epoch: 1 Batch: 12055/20099 (59.98%) Loss: 2.002589 LR: 0.00001783 [14:01:37] Epoch: 1 Batch: 12056/20099 (59.98%) Loss: 2.118901 LR: 0.00001783 [14:01:39] Epoch: 1 Batch: 12057/20099 (59.99%) Loss: 1.978682 LR: 0.00001783 [14:01:41] Epoch: 1 Batch: 12058/20099 (59.99%) Loss: 1.822109 LR: 0.00001783 [14:01:43] Epoch: 1 Batch: 12059/20099 (60.00%) Loss: 2.395901 LR: 0.00001783 [14:01:45] Epoch: 1 Batch: 12060/20099 (60.00%) Loss: 2.370045 LR: 0.00001783 [14:01:47] Epoch: 1 Batch: 12061/20099 (60.01%) Loss: 2.003750 LR: 0.00001783 [14:01:48] Epoch: 1 Batch: 12062/20099 (60.01%) Loss: 2.226631 LR: 0.00001781 [14:01:50] Epoch: 1 Batch: 12063/20099 (60.02%) Loss: 1.888216 LR: 0.00001781 [14:01:52] Epoch: 1 Batch: 12064/20099 (60.02%) Loss: 2.097620 LR: 0.00001781 [14:01:54] Epoch: 1 Batch: 12065/20099 (60.03%) Loss: 2.136600 LR: 0.00001781 [14:01:56] Epoch: 1 Batch: 12066/20099 (60.03%) Loss: 2.104551 LR: 0.00001781 [14:01:58] Epoch: 1 Batch: 12067/20099 (60.04%) Loss: 2.217072 LR: 0.00001781 [14:02:00] Epoch: 1 Batch: 12068/20099 (60.04%) Loss: 2.320764 LR: 
0.00001781 [14:02:01] Epoch: 1 Batch: 12069/20099 (60.05%) Loss: 2.117182 LR: 0.00001780 [14:02:03] Epoch: 1 Batch: 12070/20099 (60.05%) Loss: 2.137215 LR: 0.00001780 [14:02:05] Epoch: 1 Batch: 12071/20099 (60.06%) Loss: 1.923744 LR: 0.00001780 [14:02:07] Epoch: 1 Batch: 12072/20099 (60.06%) Loss: 2.046069 LR: 0.00001780 [14:02:09] Epoch: 1 Batch: 12073/20099 (60.07%) Loss: 1.992553 LR: 0.00001780 [14:02:11] Epoch: 1 Batch: 12074/20099 (60.07%) Loss: 1.968565 LR: 0.00001780 [14:02:13] Epoch: 1 Batch: 12075/20099 (60.08%) Loss: 2.076002 LR: 0.00001780 [14:02:14] Epoch: 1 Batch: 12076/20099 (60.08%) Loss: 2.143752 LR: 0.00001778 [14:02:16] Epoch: 1 Batch: 12077/20099 (60.09%) Loss: 2.152250 LR: 0.00001778 [14:02:18] Epoch: 1 Batch: 12078/20099 (60.09%) Loss: 1.995527 LR: 0.00001778 [14:02:20] Epoch: 1 Batch: 12079/20099 (60.10%) Loss: 2.102014 LR: 0.00001778 [14:02:22] Epoch: 1 Batch: 12080/20099 (60.10%) Loss: 2.085536 LR: 0.00001778 [14:02:24] Epoch: 1 Batch: 12081/20099 (60.11%) Loss: 1.962524 LR: 0.00001778 [14:02:26] Epoch: 1 Batch: 12082/20099 (60.11%) Loss: 2.373902 LR: 0.00001778 [14:02:27] Epoch: 1 Batch: 12083/20099 (60.12%) Loss: 2.111352 LR: 0.00001776 [14:02:29] Epoch: 1 Batch: 12084/20099 (60.12%) Loss: 2.123293 LR: 0.00001776 [14:02:31] Epoch: 1 Batch: 12085/20099 (60.13%) Loss: 2.019420 LR: 0.00001776 [14:02:33] Epoch: 1 Batch: 12086/20099 (60.13%) Loss: 1.855317 LR: 0.00001776 [14:02:35] Epoch: 1 Batch: 12087/20099 (60.14%) Loss: 2.234253 LR: 0.00001776 [14:02:37] Epoch: 1 Batch: 12088/20099 (60.14%) Loss: 1.910104 LR: 0.00001776 [14:02:39] Epoch: 1 Batch: 12089/20099 (60.15%) Loss: 2.002606 LR: 0.00001776 [14:02:40] Epoch: 1 Batch: 12090/20099 (60.15%) Loss: 2.108014 LR: 0.00001775 [14:02:42] Epoch: 1 Batch: 12091/20099 (60.16%) Loss: 1.935029 LR: 0.00001775 [14:02:44] Epoch: 1 Batch: 12092/20099 (60.16%) Loss: 2.170310 LR: 0.00001775 [14:02:46] Epoch: 1 Batch: 12093/20099 (60.17%) Loss: 2.165373 LR: 0.00001775 [14:02:48] Epoch: 1 Batch: 12094/20099 
(60.17%) Loss: 2.054693 LR: 0.00001775 [14:02:50] Epoch: 1 Batch: 12095/20099 (60.18%) Loss: 2.336131 LR: 0.00001775 [14:02:52] Epoch: 1 Batch: 12096/20099 (60.18%) Loss: 1.823726 LR: 0.00001775 [14:02:53] Epoch: 1 Batch: 12097/20099 (60.19%) Loss: 1.808364 LR: 0.00001773 [14:02:55] Epoch: 1 Batch: 12098/20099 (60.19%) Loss: 2.032192 LR: 0.00001773 [14:02:57] Epoch: 1 Batch: 12099/20099 (60.20%) Loss: 2.278768 LR: 0.00001773 [14:02:59] Epoch: 1 Batch: 12100/20099 (60.20%) Loss: 2.032614 LR: 0.00001773 [14:03:01] Epoch: 1 Batch: 12101/20099 (60.21%) Loss: 2.189554 LR: 0.00001773 [14:03:03] Epoch: 1 Batch: 12102/20099 (60.21%) Loss: 2.166137 LR: 0.00001773 [14:03:05] Epoch: 1 Batch: 12103/20099 (60.22%) Loss: 1.985962 LR: 0.00001773 [14:03:06] Epoch: 1 Batch: 12104/20099 (60.22%) Loss: 2.370743 LR: 0.00001772 [14:03:08] Epoch: 1 Batch: 12105/20099 (60.23%) Loss: 2.152474 LR: 0.00001772 [14:03:10] Epoch: 1 Batch: 12106/20099 (60.23%) Loss: 2.177539 LR: 0.00001772 [14:03:12] Epoch: 1 Batch: 12107/20099 (60.24%) Loss: 2.412341 LR: 0.00001772 [14:03:14] Epoch: 1 Batch: 12108/20099 (60.24%) Loss: 2.162924 LR: 0.00001772 [14:03:16] Epoch: 1 Batch: 12109/20099 (60.25%) Loss: 2.120435 LR: 0.00001772 [14:03:18] Epoch: 1 Batch: 12110/20099 (60.25%) Loss: 1.977020 LR: 0.00001772 [14:03:19] Epoch: 1 Batch: 12111/20099 (60.26%) Loss: 1.976855 LR: 0.00001770 [14:03:21] Epoch: 1 Batch: 12112/20099 (60.26%) Loss: 2.391312 LR: 0.00001770 [14:03:23] Epoch: 1 Batch: 12113/20099 (60.27%) Loss: 1.970687 LR: 0.00001770 [14:03:25] Epoch: 1 Batch: 12114/20099 (60.27%) Loss: 2.046840 LR: 0.00001770 [14:03:27] Epoch: 1 Batch: 12115/20099 (60.28%) Loss: 2.235445 LR: 0.00001770 [14:03:29] Epoch: 1 Batch: 12116/20099 (60.28%) Loss: 2.031919 LR: 0.00001770 [14:03:31] Epoch: 1 Batch: 12117/20099 (60.29%) Loss: 2.057075 LR: 0.00001770 [14:03:32] Epoch: 1 Batch: 12118/20099 (60.29%) Loss: 2.126101 LR: 0.00001768 [14:03:34] Epoch: 1 Batch: 12119/20099 (60.30%) Loss: 1.894709 LR: 0.00001768 [14:03:36] 
Epoch: 1 Batch: 12120/20099 (60.30%) Loss: 2.075543 LR: 0.00001768 [14:03:38] Epoch: 1 Batch: 12121/20099 (60.31%) Loss: 2.001178 LR: 0.00001768 [14:03:40] Epoch: 1 Batch: 12122/20099 (60.31%) Loss: 2.430954 LR: 0.00001768 [14:03:42] Epoch: 1 Batch: 12123/20099 (60.32%) Loss: 2.018984 LR: 0.00001768 [14:03:43] Epoch: 1 Batch: 12124/20099 (60.32%) Loss: 2.099712 LR: 0.00001768 [14:03:45] Epoch: 1 Batch: 12125/20099 (60.33%) Loss: 2.469924 LR: 0.00001767 [14:03:47] Epoch: 1 Batch: 12126/20099 (60.33%) Loss: 1.946759 LR: 0.00001767 [14:03:49] Epoch: 1 Batch: 12127/20099 (60.34%) Loss: 2.037593 LR: 0.00001767 [14:03:51] Epoch: 1 Batch: 12128/20099 (60.34%) Loss: 1.854365 LR: 0.00001767 [14:03:53] Epoch: 1 Batch: 12129/20099 (60.35%) Loss: 2.147152 LR: 0.00001767 [14:03:55] Epoch: 1 Batch: 12130/20099 (60.35%) Loss: 1.976815 LR: 0.00001767 [14:03:56] Epoch: 1 Batch: 12131/20099 (60.36%) Loss: 1.794192 LR: 0.00001767 [14:03:58] Epoch: 1 Batch: 12132/20099 (60.36%) Loss: 2.025391 LR: 0.00001765 [14:04:00] Epoch: 1 Batch: 12133/20099 (60.37%) Loss: 1.827565 LR: 0.00001765 [14:04:02] Epoch: 1 Batch: 12134/20099 (60.37%) Loss: 2.163708 LR: 0.00001765 [14:04:04] Epoch: 1 Batch: 12135/20099 (60.38%) Loss: 2.140769 LR: 0.00001765 [14:04:06] Epoch: 1 Batch: 12136/20099 (60.38%) Loss: 2.140808 LR: 0.00001765 [14:04:08] Epoch: 1 Batch: 12137/20099 (60.39%) Loss: 2.053265 LR: 0.00001765 [14:04:09] Epoch: 1 Batch: 12138/20099 (60.39%) Loss: 2.201859 LR: 0.00001765 [14:04:11] Epoch: 1 Batch: 12139/20099 (60.40%) Loss: 2.274992 LR: 0.00001763 [14:04:13] Epoch: 1 Batch: 12140/20099 (60.40%) Loss: 2.145534 LR: 0.00001763 [14:04:15] Epoch: 1 Batch: 12141/20099 (60.41%) Loss: 2.585270 LR: 0.00001763 [14:04:17] Epoch: 1 Batch: 12142/20099 (60.41%) Loss: 1.744034 LR: 0.00001763 [14:04:19] Epoch: 1 Batch: 12143/20099 (60.42%) Loss: 2.092237 LR: 0.00001763 [14:04:20] Epoch: 1 Batch: 12144/20099 (60.42%) Loss: 2.004505 LR: 0.00001763 [14:04:22] Epoch: 1 Batch: 12145/20099 (60.43%) Loss: 
2.000364 LR: 0.00001763 [14:04:24] Epoch: 1 Batch: 12146/20099 (60.43%) Loss: 2.124074 LR: 0.00001762 [14:04:26] Epoch: 1 Batch: 12147/20099 (60.44%) Loss: 2.077605 LR: 0.00001762 [14:04:28] Epoch: 1 Batch: 12148/20099 (60.44%) Loss: 1.789516 LR: 0.00001762 [14:04:30] Epoch: 1 Batch: 12149/20099 (60.45%) Loss: 2.044131 LR: 0.00001762 [14:04:32] Epoch: 1 Batch: 12150/20099 (60.45%) Loss: 2.014624 LR: 0.00001762 [14:04:33] Epoch: 1 Batch: 12151/20099 (60.46%) Loss: 2.050602 LR: 0.00001762 [14:04:35] Epoch: 1 Batch: 12152/20099 (60.46%) Loss: 2.448857 LR: 0.00001762 [14:04:37] Epoch: 1 Batch: 12153/20099 (60.47%) Loss: 2.397296 LR: 0.00001760 [14:04:39] Epoch: 1 Batch: 12154/20099 (60.47%) Loss: 2.037878 LR: 0.00001760 [14:04:41] Epoch: 1 Batch: 12155/20099 (60.48%) Loss: 2.370075 LR: 0.00001760 [14:04:43] Epoch: 1 Batch: 12156/20099 (60.48%) Loss: 2.226198 LR: 0.00001760 [14:04:45] Epoch: 1 Batch: 12157/20099 (60.49%) Loss: 2.021324 LR: 0.00001760 [14:04:46] Epoch: 1 Batch: 12158/20099 (60.49%) Loss: 2.083964 LR: 0.00001760 [14:04:48] Epoch: 1 Batch: 12159/20099 (60.50%) Loss: 2.143790 LR: 0.00001760 [14:04:50] Epoch: 1 Batch: 12160/20099 (60.50%) Loss: 2.346072 LR: 0.00001759 [14:04:52] Epoch: 1 Batch: 12161/20099 (60.51%) Loss: 2.188300 LR: 0.00001759 [14:04:54] Epoch: 1 Batch: 12162/20099 (60.51%) Loss: 2.151467 LR: 0.00001759 [14:04:56] Epoch: 1 Batch: 12163/20099 (60.52%) Loss: 2.173041 LR: 0.00001759 [14:04:58] Epoch: 1 Batch: 12164/20099 (60.52%) Loss: 2.129467 LR: 0.00001759 [14:04:59] Epoch: 1 Batch: 12165/20099 (60.53%) Loss: 2.196073 LR: 0.00001759 [14:05:01] Epoch: 1 Batch: 12166/20099 (60.53%) Loss: 2.131527 LR: 0.00001759 [14:05:03] Epoch: 1 Batch: 12167/20099 (60.54%) Loss: 2.312117 LR: 0.00001757 [14:05:05] Epoch: 1 Batch: 12168/20099 (60.54%) Loss: 2.464015 LR: 0.00001757 [14:05:07] Epoch: 1 Batch: 12169/20099 (60.55%) Loss: 2.232410 LR: 0.00001757 [14:05:09] Epoch: 1 Batch: 12170/20099 (60.55%) Loss: 2.122999 LR: 0.00001757 [14:05:11] Epoch: 1 
Batch: 12171/20099 (60.56%) Loss: 1.865578 LR: 0.00001757 [14:05:13] Epoch: 1 Batch: 12172/20099 (60.56%) Loss: 1.967662 LR: 0.00001757 [14:05:14] Epoch: 1 Batch: 12173/20099 (60.57%) Loss: 2.085064 LR: 0.00001757 [14:05:16] Epoch: 1 Batch: 12174/20099 (60.57%) Loss: 2.224516 LR: 0.00001755 [14:05:18] Epoch: 1 Batch: 12175/20099 (60.58%) Loss: 2.155399 LR: 0.00001755 [14:05:20] Epoch: 1 Batch: 12176/20099 (60.58%) Loss: 2.197881 LR: 0.00001755 [14:05:22] Epoch: 1 Batch: 12177/20099 (60.59%) Loss: 1.926407 LR: 0.00001755 [14:05:24] Epoch: 1 Batch: 12178/20099 (60.59%) Loss: 2.039332 LR: 0.00001755 [14:05:26] Epoch: 1 Batch: 12179/20099 (60.60%) Loss: 2.085330 LR: 0.00001755 [14:05:27] Epoch: 1 Batch: 12180/20099 (60.60%) Loss: 2.057239 LR: 0.00001755 [14:05:29] Epoch: 1 Batch: 12181/20099 (60.61%) Loss: 1.901259 LR: 0.00001754 [14:05:31] Epoch: 1 Batch: 12182/20099 (60.61%) Loss: 1.775051 LR: 0.00001754 [14:05:33] Epoch: 1 Batch: 12183/20099 (60.61%) Loss: 2.312769 LR: 0.00001754 [14:05:35] Epoch: 1 Batch: 12184/20099 (60.62%) Loss: 1.958318 LR: 0.00001754 [14:05:37] Epoch: 1 Batch: 12185/20099 (60.62%) Loss: 1.867153 LR: 0.00001754 [14:05:39] Epoch: 1 Batch: 12186/20099 (60.63%) Loss: 2.051325 LR: 0.00001754 [14:05:40] Epoch: 1 Batch: 12187/20099 (60.63%) Loss: 2.164819 LR: 0.00001754 [14:05:42] Epoch: 1 Batch: 12188/20099 (60.64%) Loss: 2.158725 LR: 0.00001752 [14:05:44] Epoch: 1 Batch: 12189/20099 (60.64%) Loss: 2.018093 LR: 0.00001752 [14:05:46] Epoch: 1 Batch: 12190/20099 (60.65%) Loss: 2.090154 LR: 0.00001752 [14:05:48] Epoch: 1 Batch: 12191/20099 (60.65%) Loss: 2.163364 LR: 0.00001752 [14:05:50] Epoch: 1 Batch: 12192/20099 (60.66%) Loss: 1.835275 LR: 0.00001752 [14:05:52] Epoch: 1 Batch: 12193/20099 (60.66%) Loss: 1.752116 LR: 0.00001752 [14:05:53] Epoch: 1 Batch: 12194/20099 (60.67%) Loss: 2.345974 LR: 0.00001752 [14:05:55] Epoch: 1 Batch: 12195/20099 (60.67%) Loss: 1.835155 LR: 0.00001750 [14:05:57] Epoch: 1 Batch: 12196/20099 (60.68%) Loss: 2.089862 LR: 
0.00001750 [14:05:59] Epoch: 1 Batch: 12197/20099 (60.68%) Loss: 2.205111 LR: 0.00001750 [14:06:01] Epoch: 1 Batch: 12198/20099 (60.69%) Loss: 2.030867 LR: 0.00001750 [14:06:03] Epoch: 1 Batch: 12199/20099 (60.69%) Loss: 2.235126 LR: 0.00001750 [14:06:08] >> Cleaned up old temp checkpoint: epoch1_step6400 [14:06:08] >> Temp checkpoint saved: epoch1_step12200, size: 0.1693 GB [14:06:08] Epoch: 1 Batch: 12200/20099 (60.70%) Loss: 2.129660 LR: 0.00001750 [14:06:10] Epoch: 1 Batch: 12201/20099 (60.70%) Loss: 1.889544 LR: 0.00001750 [14:06:12] Epoch: 1 Batch: 12202/20099 (60.71%) Loss: 2.000486 LR: 0.00001749 [14:06:14] Epoch: 1 Batch: 12203/20099 (60.71%) Loss: 2.235077 LR: 0.00001749 [14:06:16] Epoch: 1 Batch: 12204/20099 (60.72%) Loss: 1.891682 LR: 0.00001749 [14:06:18] Epoch: 1 Batch: 12205/20099 (60.72%) Loss: 2.075753 LR: 0.00001749 [14:06:19] Epoch: 1 Batch: 12206/20099 (60.73%) Loss: 2.140895 LR: 0.00001749 [14:06:21] Epoch: 1 Batch: 12207/20099 (60.73%) Loss: 1.768916 LR: 0.00001749 [14:06:23] Epoch: 1 Batch: 12208/20099 (60.74%) Loss: 2.109254 LR: 0.00001749 [14:06:25] Epoch: 1 Batch: 12209/20099 (60.74%) Loss: 1.975311 LR: 0.00001747 [14:06:27] Epoch: 1 Batch: 12210/20099 (60.75%) Loss: 2.150007 LR: 0.00001747 [14:06:29] Epoch: 1 Batch: 12211/20099 (60.75%) Loss: 2.417830 LR: 0.00001747 [14:06:30] Epoch: 1 Batch: 12212/20099 (60.76%) Loss: 2.271095 LR: 0.00001747 [14:06:32] Epoch: 1 Batch: 12213/20099 (60.76%) Loss: 2.014578 LR: 0.00001747 [14:06:34] Epoch: 1 Batch: 12214/20099 (60.77%) Loss: 2.148334 LR: 0.00001747 [14:06:36] Epoch: 1 Batch: 12215/20099 (60.77%) Loss: 2.027583 LR: 0.00001747 [14:06:38] Epoch: 1 Batch: 12216/20099 (60.78%) Loss: 2.023675 LR: 0.00001746 [14:06:40] Epoch: 1 Batch: 12217/20099 (60.78%) Loss: 2.166159 LR: 0.00001746 [14:06:42] Epoch: 1 Batch: 12218/20099 (60.79%) Loss: 2.169972 LR: 0.00001746 [14:06:44] Epoch: 1 Batch: 12219/20099 (60.79%) Loss: 2.137627 LR: 0.00001746 [14:06:45] Epoch: 1 Batch: 12220/20099 (60.80%) Loss: 
1.948411 LR: 0.00001746 [14:06:47] Epoch: 1 Batch: 12221/20099 (60.80%) Loss: 1.911548 LR: 0.00001746 [14:06:49] Epoch: 1 Batch: 12222/20099 (60.81%) Loss: 1.994660 LR: 0.00001746 [14:06:51] Epoch: 1 Batch: 12223/20099 (60.81%) Loss: 2.241331 LR: 0.00001744 [14:06:53] Epoch: 1 Batch: 12224/20099 (60.82%) Loss: 1.844083 LR: 0.00001744 [14:06:55] Epoch: 1 Batch: 12225/20099 (60.82%) Loss: 2.234322 LR: 0.00001744 [14:06:57] Epoch: 1 Batch: 12226/20099 (60.83%) Loss: 2.269020 LR: 0.00001744 [14:06:58] Epoch: 1 Batch: 12227/20099 (60.83%) Loss: 1.904417 LR: 0.00001744 [14:07:00] Epoch: 1 Batch: 12228/20099 (60.84%) Loss: 2.110585 LR: 0.00001744 [14:07:02] Epoch: 1 Batch: 12229/20099 (60.84%) Loss: 2.271290 LR: 0.00001744 [14:07:04] Epoch: 1 Batch: 12230/20099 (60.85%) Loss: 2.103977 LR: 0.00001742 [14:07:06] Epoch: 1 Batch: 12231/20099 (60.85%) Loss: 2.149932 LR: 0.00001742 [14:07:08] Epoch: 1 Batch: 12232/20099 (60.86%) Loss: 2.026397 LR: 0.00001742 [14:07:10] Epoch: 1 Batch: 12233/20099 (60.86%) Loss: 2.040605 LR: 0.00001742 [14:07:12] Epoch: 1 Batch: 12234/20099 (60.87%) Loss: 1.831052 LR: 0.00001742 [14:07:14] Epoch: 1 Batch: 12235/20099 (60.87%) Loss: 2.258478 LR: 0.00001742 [14:07:15] Epoch: 1 Batch: 12236/20099 (60.88%) Loss: 2.116243 LR: 0.00001742 [14:07:17] Epoch: 1 Batch: 12237/20099 (60.88%) Loss: 2.301134 LR: 0.00001741 [14:07:19] Epoch: 1 Batch: 12238/20099 (60.89%) Loss: 2.100321 LR: 0.00001741 [14:07:21] Epoch: 1 Batch: 12239/20099 (60.89%) Loss: 1.976056 LR: 0.00001741 [14:07:23] Epoch: 1 Batch: 12240/20099 (60.90%) Loss: 2.194976 LR: 0.00001741 [14:07:25] Epoch: 1 Batch: 12241/20099 (60.90%) Loss: 2.168750 LR: 0.00001741 [14:07:27] Epoch: 1 Batch: 12242/20099 (60.91%) Loss: 2.170334 LR: 0.00001741 [14:07:28] Epoch: 1 Batch: 12243/20099 (60.91%) Loss: 1.903274 LR: 0.00001741 [14:07:30] Epoch: 1 Batch: 12244/20099 (60.92%) Loss: 2.520241 LR: 0.00001739 [14:07:32] Epoch: 1 Batch: 12245/20099 (60.92%) Loss: 2.004865 LR: 0.00001739 [14:07:34] Epoch: 1 
Batch: 12246/20099 (60.93%) Loss: 1.930843 LR: 0.00001739 [14:07:36] Epoch: 1 Batch: 12247/20099 (60.93%) Loss: 2.264266 LR: 0.00001739 [14:07:38] Epoch: 1 Batch: 12248/20099 (60.94%) Loss: 2.082570 LR: 0.00001739 [14:07:40] Epoch: 1 Batch: 12249/20099 (60.94%) Loss: 2.085773 LR: 0.00001739 [14:07:41] Epoch: 1 Batch: 12250/20099 (60.95%) Loss: 1.978746 LR: 0.00001739 [14:07:43] Epoch: 1 Batch: 12251/20099 (60.95%) Loss: 1.813149 LR: 0.00001737 [14:07:45] Epoch: 1 Batch: 12252/20099 (60.96%) Loss: 2.146427 LR: 0.00001737 [14:07:47] Epoch: 1 Batch: 12253/20099 (60.96%) Loss: 2.369604 LR: 0.00001737 [14:07:49] Epoch: 1 Batch: 12254/20099 (60.97%) Loss: 2.034620 LR: 0.00001737 [14:07:51] Epoch: 1 Batch: 12255/20099 (60.97%) Loss: 1.722326 LR: 0.00001737 [14:07:53] Epoch: 1 Batch: 12256/20099 (60.98%) Loss: 2.052079 LR: 0.00001737 [14:07:54] Epoch: 1 Batch: 12257/20099 (60.98%) Loss: 2.218084 LR: 0.00001737 [14:07:56] Epoch: 1 Batch: 12258/20099 (60.99%) Loss: 2.122533 LR: 0.00001736 [14:07:58] Epoch: 1 Batch: 12259/20099 (60.99%) Loss: 2.028579 LR: 0.00001736 [14:08:00] Epoch: 1 Batch: 12260/20099 (61.00%) Loss: 1.887762 LR: 0.00001736 [14:08:02] Epoch: 1 Batch: 12261/20099 (61.00%) Loss: 2.286776 LR: 0.00001736 [14:08:04] Epoch: 1 Batch: 12262/20099 (61.01%) Loss: 2.183238 LR: 0.00001736 [14:08:06] Epoch: 1 Batch: 12263/20099 (61.01%) Loss: 2.376193 LR: 0.00001736 [14:08:07] Epoch: 1 Batch: 12264/20099 (61.02%) Loss: 2.083225 LR: 0.00001736 [14:08:09] Epoch: 1 Batch: 12265/20099 (61.02%) Loss: 1.946620 LR: 0.00001734 [14:08:11] Epoch: 1 Batch: 12266/20099 (61.03%) Loss: 1.990003 LR: 0.00001734 [14:08:13] Epoch: 1 Batch: 12267/20099 (61.03%) Loss: 2.013268 LR: 0.00001734 [14:08:15] Epoch: 1 Batch: 12268/20099 (61.04%) Loss: 2.114145 LR: 0.00001734 [14:08:17] Epoch: 1 Batch: 12269/20099 (61.04%) Loss: 2.075821 LR: 0.00001734 [14:08:19] Epoch: 1 Batch: 12270/20099 (61.05%) Loss: 1.953556 LR: 0.00001734 [14:08:20] Epoch: 1 Batch: 12271/20099 (61.05%) Loss: 2.182477 LR: 
0.00001734 [14:08:22] Epoch: 1 Batch: 12272/20099 (61.06%) Loss: 2.005339 LR: 0.00001733 [14:08:24] Epoch: 1 Batch: 12273/20099 (61.06%) Loss: 2.243406 LR: 0.00001733 [14:08:26] Epoch: 1 Batch: 12274/20099 (61.07%) Loss: 2.024215 LR: 0.00001733 [14:08:28] Epoch: 1 Batch: 12275/20099 (61.07%) Loss: 2.155385 LR: 0.00001733 [14:08:30] Epoch: 1 Batch: 12276/20099 (61.08%) Loss: 2.214620 LR: 0.00001733 [14:08:32] Epoch: 1 Batch: 12277/20099 (61.08%) Loss: 2.230641 LR: 0.00001733 [14:08:34] Epoch: 1 Batch: 12278/20099 (61.09%) Loss: 2.073393 LR: 0.00001733 [14:08:35] Epoch: 1 Batch: 12279/20099 (61.09%) Loss: 2.311084 LR: 0.00001731 [14:08:37] Epoch: 1 Batch: 12280/20099 (61.10%) Loss: 2.190411 LR: 0.00001731 [14:08:39] Epoch: 1 Batch: 12281/20099 (61.10%) Loss: 2.011444 LR: 0.00001731 [14:08:41] Epoch: 1 Batch: 12282/20099 (61.11%) Loss: 1.952600 LR: 0.00001731 [14:08:43] Epoch: 1 Batch: 12283/20099 (61.11%) Loss: 1.977123 LR: 0.00001731 [14:08:45] Epoch: 1 Batch: 12284/20099 (61.12%) Loss: 1.981080 LR: 0.00001731 [14:08:47] Epoch: 1 Batch: 12285/20099 (61.12%) Loss: 1.952946 LR: 0.00001731 [14:08:48] Epoch: 1 Batch: 12286/20099 (61.13%) Loss: 1.601518 LR: 0.00001729 [14:08:50] Epoch: 1 Batch: 12287/20099 (61.13%) Loss: 2.420287 LR: 0.00001729 [14:08:52] Epoch: 1 Batch: 12288/20099 (61.14%) Loss: 2.311944 LR: 0.00001729 [14:08:54] Epoch: 1 Batch: 12289/20099 (61.14%) Loss: 2.116939 LR: 0.00001729 [14:08:56] Epoch: 1 Batch: 12290/20099 (61.15%) Loss: 2.360302 LR: 0.00001729 [14:08:58] Epoch: 1 Batch: 12291/20099 (61.15%) Loss: 2.028761 LR: 0.00001729 [14:09:00] Epoch: 1 Batch: 12292/20099 (61.16%) Loss: 1.989937 LR: 0.00001729 [14:09:01] Epoch: 1 Batch: 12293/20099 (61.16%) Loss: 2.094577 LR: 0.00001728 [14:09:03] Epoch: 1 Batch: 12294/20099 (61.17%) Loss: 2.171137 LR: 0.00001728 [14:09:05] Epoch: 1 Batch: 12295/20099 (61.17%) Loss: 1.821798 LR: 0.00001728 [14:09:07] Epoch: 1 Batch: 12296/20099 (61.18%) Loss: 2.183697 LR: 0.00001728 [14:09:09] Epoch: 1 Batch: 12297/20099 
(61.18%) Loss: 1.888962 LR: 0.00001728 [14:09:11] Epoch: 1 Batch: 12298/20099 (61.19%) Loss: 2.053149 LR: 0.00001728 [14:09:13] Epoch: 1 Batch: 12299/20099 (61.19%) Loss: 2.069914 LR: 0.00001728 [14:09:14] Epoch: 1 Batch: 12300/20099 (61.20%) Loss: 1.943260 LR: 0.00001726 [14:09:16] Epoch: 1 Batch: 12301/20099 (61.20%) Loss: 1.901483 LR: 0.00001726 [14:09:18] Epoch: 1 Batch: 12302/20099 (61.21%) Loss: 2.171075 LR: 0.00001726 [14:09:20] Epoch: 1 Batch: 12303/20099 (61.21%) Loss: 1.841258 LR: 0.00001726 [14:09:22] Epoch: 1 Batch: 12304/20099 (61.22%) Loss: 2.150431 LR: 0.00001726 [14:09:24] Epoch: 1 Batch: 12305/20099 (61.22%) Loss: 2.021732 LR: 0.00001726 [14:09:26] Epoch: 1 Batch: 12306/20099 (61.23%) Loss: 2.056081 LR: 0.00001726 [14:09:27] Epoch: 1 Batch: 12307/20099 (61.23%) Loss: 2.321287 LR: 0.00001725 [14:09:29] Epoch: 1 Batch: 12308/20099 (61.24%) Loss: 2.158605 LR: 0.00001725 [14:09:31] Epoch: 1 Batch: 12309/20099 (61.24%) Loss: 2.015778 LR: 0.00001725 [14:09:33] Epoch: 1 Batch: 12310/20099 (61.25%) Loss: 2.183012 LR: 0.00001725 [14:09:35] Epoch: 1 Batch: 12311/20099 (61.25%) Loss: 1.976186 LR: 0.00001725 [14:09:37] Epoch: 1 Batch: 12312/20099 (61.26%) Loss: 1.936267 LR: 0.00001725 [14:09:39] Epoch: 1 Batch: 12313/20099 (61.26%) Loss: 2.066044 LR: 0.00001725 [14:09:41] Epoch: 1 Batch: 12314/20099 (61.27%) Loss: 2.134550 LR: 0.00001723 [14:09:42] Epoch: 1 Batch: 12315/20099 (61.27%) Loss: 2.259176 LR: 0.00001723 [14:09:44] Epoch: 1 Batch: 12316/20099 (61.28%) Loss: 2.229385 LR: 0.00001723 [14:09:46] Epoch: 1 Batch: 12317/20099 (61.28%) Loss: 2.068820 LR: 0.00001723 [14:09:48] Epoch: 1 Batch: 12318/20099 (61.29%) Loss: 2.189961 LR: 0.00001723 [14:09:50] Epoch: 1 Batch: 12319/20099 (61.29%) Loss: 2.024681 LR: 0.00001723 [14:09:52] Epoch: 1 Batch: 12320/20099 (61.30%) Loss: 1.749451 LR: 0.00001723 [14:09:54] Epoch: 1 Batch: 12321/20099 (61.30%) Loss: 2.235156 LR: 0.00001721 [14:09:55] Epoch: 1 Batch: 12322/20099 (61.31%) Loss: 1.913757 LR: 0.00001721 [14:09:57] 
Epoch: 1 Batch: 12323/20099 (61.31%) Loss: 2.117591 LR: 0.00001721 [14:09:59] Epoch: 1 Batch: 12324/20099 (61.32%) Loss: 1.765773 LR: 0.00001721 [14:10:01] Epoch: 1 Batch: 12325/20099 (61.32%) Loss: 2.133766 LR: 0.00001721 [14:10:03] Epoch: 1 Batch: 12326/20099 (61.33%) Loss: 2.226673 LR: 0.00001721 [14:10:05] Epoch: 1 Batch: 12327/20099 (61.33%) Loss: 2.112855 LR: 0.00001721 [14:10:07] Epoch: 1 Batch: 12328/20099 (61.34%) Loss: 2.135373 LR: 0.00001720 [14:10:08] Epoch: 1 Batch: 12329/20099 (61.34%) Loss: 2.166035 LR: 0.00001720 [14:10:10] Epoch: 1 Batch: 12330/20099 (61.35%) Loss: 2.516033 LR: 0.00001720 [14:10:12] Epoch: 1 Batch: 12331/20099 (61.35%) Loss: 2.023518 LR: 0.00001720 [14:10:14] Epoch: 1 Batch: 12332/20099 (61.36%) Loss: 2.396083 LR: 0.00001720 [14:10:16] Epoch: 1 Batch: 12333/20099 (61.36%) Loss: 1.858010 LR: 0.00001720 [14:10:18] Epoch: 1 Batch: 12334/20099 (61.37%) Loss: 2.137988 LR: 0.00001720 [14:10:20] Epoch: 1 Batch: 12335/20099 (61.37%) Loss: 2.153458 LR: 0.00001718 [14:10:21] Epoch: 1 Batch: 12336/20099 (61.38%) Loss: 2.187206 LR: 0.00001718 [14:10:23] Epoch: 1 Batch: 12337/20099 (61.38%) Loss: 1.915498 LR: 0.00001718 [14:10:25] Epoch: 1 Batch: 12338/20099 (61.39%) Loss: 2.029223 LR: 0.00001718 [14:10:27] Epoch: 1 Batch: 12339/20099 (61.39%) Loss: 1.861312 LR: 0.00001718 [14:10:29] Epoch: 1 Batch: 12340/20099 (61.40%) Loss: 2.329096 LR: 0.00001718 [14:10:31] Epoch: 1 Batch: 12341/20099 (61.40%) Loss: 2.247525 LR: 0.00001718 [14:10:33] Epoch: 1 Batch: 12342/20099 (61.41%) Loss: 2.138009 LR: 0.00001716 [14:10:34] Epoch: 1 Batch: 12343/20099 (61.41%) Loss: 1.934742 LR: 0.00001716 [14:10:36] Epoch: 1 Batch: 12344/20099 (61.42%) Loss: 2.134487 LR: 0.00001716 [14:10:38] Epoch: 1 Batch: 12345/20099 (61.42%) Loss: 1.961718 LR: 0.00001716 [14:10:40] Epoch: 1 Batch: 12346/20099 (61.43%) Loss: 2.383892 LR: 0.00001716 [14:10:42] Epoch: 1 Batch: 12347/20099 (61.43%) Loss: 2.035318 LR: 0.00001716 [14:10:44] Epoch: 1 Batch: 12348/20099 (61.44%) Loss: 
1.997882 LR: 0.00001716 [14:10:46] Epoch: 1 Batch: 12349/20099 (61.44%) Loss: 2.055947 LR: 0.00001715 [14:10:47] Epoch: 1 Batch: 12350/20099 (61.45%) Loss: 1.792543 LR: 0.00001715 [14:10:49] Epoch: 1 Batch: 12351/20099 (61.45%) Loss: 2.228012 LR: 0.00001715 [14:10:51] Epoch: 1 Batch: 12352/20099 (61.46%) Loss: 2.345986 LR: 0.00001715 [14:10:53] Epoch: 1 Batch: 12353/20099 (61.46%) Loss: 2.046339 LR: 0.00001715 [14:10:55] Epoch: 1 Batch: 12354/20099 (61.47%) Loss: 1.882083 LR: 0.00001715 [14:10:57] Epoch: 1 Batch: 12355/20099 (61.47%) Loss: 1.998046 LR: 0.00001715 [14:10:59] Epoch: 1 Batch: 12356/20099 (61.48%) Loss: 2.249279 LR: 0.00001713 [14:11:00] Epoch: 1 Batch: 12357/20099 (61.48%) Loss: 1.971248 LR: 0.00001713 [14:11:02] Epoch: 1 Batch: 12358/20099 (61.49%) Loss: 2.123055 LR: 0.00001713 [14:11:04] Epoch: 1 Batch: 12359/20099 (61.49%) Loss: 2.101486 LR: 0.00001713 [14:11:06] Epoch: 1 Batch: 12360/20099 (61.50%) Loss: 2.082289 LR: 0.00001713 [14:11:08] Epoch: 1 Batch: 12361/20099 (61.50%) Loss: 2.331289 LR: 0.00001713 [14:11:10] Epoch: 1 Batch: 12362/20099 (61.51%) Loss: 2.107315 LR: 0.00001713 [14:11:11] Epoch: 1 Batch: 12363/20099 (61.51%) Loss: 2.284415 LR: 0.00001712 [14:11:13] Epoch: 1 Batch: 12364/20099 (61.52%) Loss: 1.960275 LR: 0.00001712 [14:11:15] Epoch: 1 Batch: 12365/20099 (61.52%) Loss: 2.167596 LR: 0.00001712 [14:11:17] Epoch: 1 Batch: 12366/20099 (61.53%) Loss: 2.081597 LR: 0.00001712 [14:11:19] Epoch: 1 Batch: 12367/20099 (61.53%) Loss: 2.227604 LR: 0.00001712 [14:11:21] Epoch: 1 Batch: 12368/20099 (61.54%) Loss: 2.049858 LR: 0.00001712 [14:11:23] Epoch: 1 Batch: 12369/20099 (61.54%) Loss: 2.007095 LR: 0.00001712 [14:11:24] Epoch: 1 Batch: 12370/20099 (61.55%) Loss: 2.088302 LR: 0.00001710 [14:11:26] Epoch: 1 Batch: 12371/20099 (61.55%) Loss: 1.854296 LR: 0.00001710 [14:11:28] Epoch: 1 Batch: 12372/20099 (61.56%) Loss: 2.133678 LR: 0.00001710 [14:11:30] Epoch: 1 Batch: 12373/20099 (61.56%) Loss: 2.164837 LR: 0.00001710 [14:11:32] Epoch: 1 
Batch: 12374/20099 (61.57%) Loss: 2.241078 LR: 0.00001710 [14:11:34] Epoch: 1 Batch: 12375/20099 (61.57%) Loss: 1.960070 LR: 0.00001710 [14:11:36] Epoch: 1 Batch: 12376/20099 (61.58%) Loss: 2.052070 LR: 0.00001710 [14:11:37] Epoch: 1 Batch: 12377/20099 (61.58%) Loss: 2.150925 LR: 0.00001708 [14:11:39] Epoch: 1 Batch: 12378/20099 (61.59%) Loss: 2.073399 LR: 0.00001708 [14:11:41] Epoch: 1 Batch: 12379/20099 (61.59%) Loss: 2.057556 LR: 0.00001708 [14:11:43] Epoch: 1 Batch: 12380/20099 (61.60%) Loss: 2.113719 LR: 0.00001708 [14:11:45] Epoch: 1 Batch: 12381/20099 (61.60%) Loss: 1.937099 LR: 0.00001708 [14:11:47] Epoch: 1 Batch: 12382/20099 (61.61%) Loss: 2.139065 LR: 0.00001708 [14:11:49] Epoch: 1 Batch: 12383/20099 (61.61%) Loss: 2.112404 LR: 0.00001708 [14:11:50] Epoch: 1 Batch: 12384/20099 (61.62%) Loss: 2.174942 LR: 0.00001707 [14:11:52] Epoch: 1 Batch: 12385/20099 (61.62%) Loss: 1.969639 LR: 0.00001707 [14:11:54] Epoch: 1 Batch: 12386/20099 (61.62%) Loss: 1.988585 LR: 0.00001707 [14:11:56] Epoch: 1 Batch: 12387/20099 (61.63%) Loss: 2.242295 LR: 0.00001707 [14:11:58] Epoch: 1 Batch: 12388/20099 (61.63%) Loss: 2.712900 LR: 0.00001707 [14:12:00] Epoch: 1 Batch: 12389/20099 (61.64%) Loss: 1.821630 LR: 0.00001707 [14:12:02] Epoch: 1 Batch: 12390/20099 (61.64%) Loss: 2.124241 LR: 0.00001707 [14:12:03] Epoch: 1 Batch: 12391/20099 (61.65%) Loss: 2.129233 LR: 0.00001705 [14:12:05] Epoch: 1 Batch: 12392/20099 (61.65%) Loss: 2.216093 LR: 0.00001705 [14:12:07] Epoch: 1 Batch: 12393/20099 (61.66%) Loss: 2.101609 LR: 0.00001705 [14:12:09] Epoch: 1 Batch: 12394/20099 (61.66%) Loss: 2.177756 LR: 0.00001705 [14:12:11] Epoch: 1 Batch: 12395/20099 (61.67%) Loss: 1.839501 LR: 0.00001705 [14:12:13] Epoch: 1 Batch: 12396/20099 (61.67%) Loss: 2.259667 LR: 0.00001705 [14:12:15] Epoch: 1 Batch: 12397/20099 (61.68%) Loss: 1.995819 LR: 0.00001705 [14:12:16] Epoch: 1 Batch: 12398/20099 (61.68%) Loss: 2.227104 LR: 0.00001703 [14:12:18] Epoch: 1 Batch: 12399/20099 (61.69%) Loss: 2.415015 LR: 
0.00001703 [14:12:24] >> Cleaned up old temp checkpoint: epoch1_step6600 [14:12:24] >> Temp checkpoint saved: epoch1_step12400, size: 0.1693 GB [14:12:24] Epoch: 1 Batch: 12400/20099 (61.69%) Loss: 2.087462 LR: 0.00001703 [14:12:26] Epoch: 1 Batch: 12401/20099 (61.70%) Loss: 2.165205 LR: 0.00001703 [14:12:28] Epoch: 1 Batch: 12402/20099 (61.70%) Loss: 2.255311 LR: 0.00001703 [14:12:30] Epoch: 1 Batch: 12403/20099 (61.71%) Loss: 2.219689 LR: 0.00001703 [14:12:31] Epoch: 1 Batch: 12404/20099 (61.71%) Loss: 1.792379 LR: 0.00001703 [14:12:33] Epoch: 1 Batch: 12405/20099 (61.72%) Loss: 1.937288 LR: 0.00001702 [14:12:35] Epoch: 1 Batch: 12406/20099 (61.72%) Loss: 2.165641 LR: 0.00001702 [14:12:37] Epoch: 1 Batch: 12407/20099 (61.73%) Loss: 1.932520 LR: 0.00001702 [14:12:39] Epoch: 1 Batch: 12408/20099 (61.73%) Loss: 2.291670 LR: 0.00001702 [14:12:41] Epoch: 1 Batch: 12409/20099 (61.74%) Loss: 2.017411 LR: 0.00001702 [14:12:42] Epoch: 1 Batch: 12410/20099 (61.74%) Loss: 2.170950 LR: 0.00001702 [14:12:44] Epoch: 1 Batch: 12411/20099 (61.75%) Loss: 2.203439 LR: 0.00001702 [14:12:46] Epoch: 1 Batch: 12412/20099 (61.75%) Loss: 2.158433 LR: 0.00001700 [14:12:48] Epoch: 1 Batch: 12413/20099 (61.76%) Loss: 1.862632 LR: 0.00001700 [14:12:50] Epoch: 1 Batch: 12414/20099 (61.76%) Loss: 2.106500 LR: 0.00001700 [14:12:52] Epoch: 1 Batch: 12415/20099 (61.77%) Loss: 1.967963 LR: 0.00001700 [14:12:54] Epoch: 1 Batch: 12416/20099 (61.77%) Loss: 2.075730 LR: 0.00001700 [14:12:55] Epoch: 1 Batch: 12417/20099 (61.78%) Loss: 1.867535 LR: 0.00001700 [14:12:57] Epoch: 1 Batch: 12418/20099 (61.78%) Loss: 2.073011 LR: 0.00001700 [14:12:59] Epoch: 1 Batch: 12419/20099 (61.79%) Loss: 2.140473 LR: 0.00001699 [14:13:01] Epoch: 1 Batch: 12420/20099 (61.79%) Loss: 1.883654 LR: 0.00001699 [14:13:03] Epoch: 1 Batch: 12421/20099 (61.80%) Loss: 1.788712 LR: 0.00001699 [14:13:05] Epoch: 1 Batch: 12422/20099 (61.80%) Loss: 2.033268 LR: 0.00001699 [14:13:07] Epoch: 1 Batch: 12423/20099 (61.81%) Loss: 
2.039468 LR: 0.00001699 [14:13:08] Epoch: 1 Batch: 12424/20099 (61.81%) Loss: 2.231876 LR: 0.00001699 [14:13:10] Epoch: 1 Batch: 12425/20099 (61.82%) Loss: 2.306885 LR: 0.00001699 [14:13:12] Epoch: 1 Batch: 12426/20099 (61.82%) Loss: 2.272599 LR: 0.00001697 [14:13:14] Epoch: 1 Batch: 12427/20099 (61.83%) Loss: 2.311101 LR: 0.00001697 [14:13:16] Epoch: 1 Batch: 12428/20099 (61.83%) Loss: 2.061842 LR: 0.00001697 [14:13:18] Epoch: 1 Batch: 12429/20099 (61.84%) Loss: 1.502964 LR: 0.00001697 [14:13:20] Epoch: 1 Batch: 12430/20099 (61.84%) Loss: 2.062404 LR: 0.00001697 [14:13:21] Epoch: 1 Batch: 12431/20099 (61.85%) Loss: 1.749862 LR: 0.00001697 [14:13:23] Epoch: 1 Batch: 12432/20099 (61.85%) Loss: 1.992285 LR: 0.00001697 [14:13:25] Epoch: 1 Batch: 12433/20099 (61.86%) Loss: 1.931303 LR: 0.00001695 [14:13:27] Epoch: 1 Batch: 12434/20099 (61.86%) Loss: 2.504176 LR: 0.00001695 [14:13:29] Epoch: 1 Batch: 12435/20099 (61.87%) Loss: 1.968347 LR: 0.00001695 [14:13:31] Epoch: 1 Batch: 12436/20099 (61.87%) Loss: 1.581026 LR: 0.00001695 [14:13:32] Epoch: 1 Batch: 12437/20099 (61.88%) Loss: 2.041371 LR: 0.00001695 [14:13:34] Epoch: 1 Batch: 12438/20099 (61.88%) Loss: 2.134628 LR: 0.00001695 [14:13:36] Epoch: 1 Batch: 12439/20099 (61.89%) Loss: 2.034686 LR: 0.00001695 [14:13:38] Epoch: 1 Batch: 12440/20099 (61.89%) Loss: 1.903062 LR: 0.00001694 [14:13:40] Epoch: 1 Batch: 12441/20099 (61.90%) Loss: 1.883896 LR: 0.00001694 [14:13:42] Epoch: 1 Batch: 12442/20099 (61.90%) Loss: 2.332543 LR: 0.00001694 [14:13:44] Epoch: 1 Batch: 12443/20099 (61.91%) Loss: 2.174825 LR: 0.00001694 [14:13:45] Epoch: 1 Batch: 12444/20099 (61.91%) Loss: 2.151270 LR: 0.00001694 [14:13:47] Epoch: 1 Batch: 12445/20099 (61.92%) Loss: 1.911283 LR: 0.00001694 [14:13:49] Epoch: 1 Batch: 12446/20099 (61.92%) Loss: 2.189357 LR: 0.00001694 [14:13:51] Epoch: 1 Batch: 12447/20099 (61.93%) Loss: 2.218433 LR: 0.00001692 [14:13:53] Epoch: 1 Batch: 12448/20099 (61.93%) Loss: 2.211326 LR: 0.00001692 [14:13:55] Epoch: 1 
Batch: 12449/20099 (61.94%) Loss: 1.930006 LR: 0.00001692 [14:13:57] Epoch: 1 Batch: 12450/20099 (61.94%) Loss: 1.699205 LR: 0.00001692 [14:13:58] Epoch: 1 Batch: 12451/20099 (61.95%) Loss: 2.045350 LR: 0.00001692 [14:14:00] Epoch: 1 Batch: 12452/20099 (61.95%) Loss: 2.184683 LR: 0.00001692 [14:14:02] Epoch: 1 Batch: 12453/20099 (61.96%) Loss: 2.132371 LR: 0.00001692 [14:14:04] Epoch: 1 Batch: 12454/20099 (61.96%) Loss: 2.234082 LR: 0.00001691 [14:14:06] Epoch: 1 Batch: 12455/20099 (61.97%) Loss: 2.128109 LR: 0.00001691 [14:14:08] Epoch: 1 Batch: 12456/20099 (61.97%) Loss: 2.013742 LR: 0.00001691 [14:14:10] Epoch: 1 Batch: 12457/20099 (61.98%) Loss: 2.000071 LR: 0.00001691 [14:14:11] Epoch: 1 Batch: 12458/20099 (61.98%) Loss: 2.215206 LR: 0.00001691 [14:14:13] Epoch: 1 Batch: 12459/20099 (61.99%) Loss: 2.092292 LR: 0.00001691 [14:14:15] Epoch: 1 Batch: 12460/20099 (61.99%) Loss: 2.036276 LR: 0.00001691 [14:14:17] Epoch: 1 Batch: 12461/20099 (62.00%) Loss: 2.195232 LR: 0.00001689 [14:14:19] Epoch: 1 Batch: 12462/20099 (62.00%) Loss: 1.989264 LR: 0.00001689 [14:14:21] Epoch: 1 Batch: 12463/20099 (62.01%) Loss: 2.147491 LR: 0.00001689 [14:14:22] Epoch: 1 Batch: 12464/20099 (62.01%) Loss: 2.230912 LR: 0.00001689 [14:14:24] Epoch: 1 Batch: 12465/20099 (62.02%) Loss: 1.956182 LR: 0.00001689 [14:14:26] Epoch: 1 Batch: 12466/20099 (62.02%) Loss: 1.994058 LR: 0.00001689 [14:14:28] Epoch: 1 Batch: 12467/20099 (62.03%) Loss: 2.233642 LR: 0.00001689 [14:14:30] Epoch: 1 Batch: 12468/20099 (62.03%) Loss: 2.216034 LR: 0.00001687 [14:14:32] Epoch: 1 Batch: 12469/20099 (62.04%) Loss: 1.901636 LR: 0.00001687 [14:14:34] Epoch: 1 Batch: 12470/20099 (62.04%) Loss: 2.061889 LR: 0.00001687 [14:14:35] Epoch: 1 Batch: 12471/20099 (62.05%) Loss: 1.928025 LR: 0.00001687 [14:14:37] Epoch: 1 Batch: 12472/20099 (62.05%) Loss: 2.115180 LR: 0.00001687 [14:14:39] Epoch: 1 Batch: 12473/20099 (62.06%) Loss: 2.171680 LR: 0.00001687 [14:14:41] Epoch: 1 Batch: 12474/20099 (62.06%) Loss: 1.871390 LR: 
0.00001687 [14:14:43] Epoch: 1 Batch: 12475/20099 (62.07%) Loss: 2.108838 LR: 0.00001686 [14:14:45] Epoch: 1 Batch: 12476/20099 (62.07%) Loss: 1.932359 LR: 0.00001686 [14:14:47] Epoch: 1 Batch: 12477/20099 (62.08%) Loss: 1.982171 LR: 0.00001686 [14:14:48] Epoch: 1 Batch: 12478/20099 (62.08%) Loss: 1.989198 LR: 0.00001686 [14:14:50] Epoch: 1 Batch: 12479/20099 (62.09%) Loss: 2.440098 LR: 0.00001686 [14:14:52] Epoch: 1 Batch: 12480/20099 (62.09%) Loss: 2.278450 LR: 0.00001686 [14:14:54] Epoch: 1 Batch: 12481/20099 (62.10%) Loss: 1.972771 LR: 0.00001686 [14:14:56] Epoch: 1 Batch: 12482/20099 (62.10%) Loss: 2.254585 LR: 0.00001684 [14:14:58] Epoch: 1 Batch: 12483/20099 (62.11%) Loss: 1.506684 LR: 0.00001684 [14:15:00] Epoch: 1 Batch: 12484/20099 (62.11%) Loss: 2.111426 LR: 0.00001684 [14:15:01] Epoch: 1 Batch: 12485/20099 (62.12%) Loss: 2.167999 LR: 0.00001684 [14:15:03] Epoch: 1 Batch: 12486/20099 (62.12%) Loss: 2.215415 LR: 0.00001684 [14:15:05] Epoch: 1 Batch: 12487/20099 (62.13%) Loss: 2.039410 LR: 0.00001684 [14:15:07] Epoch: 1 Batch: 12488/20099 (62.13%) Loss: 1.945113 LR: 0.00001684 [14:15:09] Epoch: 1 Batch: 12489/20099 (62.14%) Loss: 2.236220 LR: 0.00001682 [14:15:11] Epoch: 1 Batch: 12490/20099 (62.14%) Loss: 1.776360 LR: 0.00001682 [14:15:13] Epoch: 1 Batch: 12491/20099 (62.15%) Loss: 2.113581 LR: 0.00001682 [14:15:14] Epoch: 1 Batch: 12492/20099 (62.15%) Loss: 2.022442 LR: 0.00001682 [14:15:16] Epoch: 1 Batch: 12493/20099 (62.16%) Loss: 2.055039 LR: 0.00001682 [14:15:18] Epoch: 1 Batch: 12494/20099 (62.16%) Loss: 2.216140 LR: 0.00001682 [14:15:20] Epoch: 1 Batch: 12495/20099 (62.17%) Loss: 1.930246 LR: 0.00001682 [14:15:22] Epoch: 1 Batch: 12496/20099 (62.17%) Loss: 1.843150 LR: 0.00001681 [14:15:24] Epoch: 1 Batch: 12497/20099 (62.18%) Loss: 2.334273 LR: 0.00001681 [14:15:26] Epoch: 1 Batch: 12498/20099 (62.18%) Loss: 2.158778 LR: 0.00001681 [14:15:27] Epoch: 1 Batch: 12499/20099 (62.19%) Loss: 2.173199 LR: 0.00001681 [14:15:29] >> Evaluating batch 0 
[14:15:30] >> Evaluating batch 1 [14:15:32] >> Evaluating batch 2 [14:15:33] >> Evaluating batch 3 [14:15:34] >> Evaluating batch 4 [14:15:35] >> Evaluating batch 5 [14:15:36] >> Evaluating batch 6 [14:15:37] >> Evaluating batch 7 [14:15:38] >> Evaluating batch 8 [14:15:39] >> Evaluating batch 9 [14:15:40] >> Evaluating batch 10 [14:15:41] >> Evaluating batch 11 [14:15:42] >> Evaluating batch 12 [14:15:43] >> Evaluating batch 13 [14:15:44] >> Evaluating batch 14 [14:15:45] >> Evaluating batch 15 [14:15:46] >> Evaluating batch 16 [14:15:47] Epoch: 1 Step: 12500/20099 Evaluation: [14:15:47] Avg Loss Since Last Eval: 2.0854 Val Loss: 2.1564 Validation loss delta: -0.0021 Perplexity: 8.6399 LR: 0.00001681 [14:15:51] >> Checkpoint saved: epoch1_step12500, size: 0.1693 GB [14:15:51] Epoch: 1 Batch: 12500/20099 (62.19%) Loss: 2.219299 LR: 0.00001681 [14:15:52] Epoch: 1 Batch: 12501/20099 (62.20%) Loss: 2.284016 LR: 0.00001681 [14:15:54] Epoch: 1 Batch: 12502/20099 (62.20%) Loss: 1.825728 LR: 0.00001681 [14:15:56] Epoch: 1 Batch: 12503/20099 (62.21%) Loss: 2.208636 LR: 0.00001679 [14:15:58] Epoch: 1 Batch: 12504/20099 (62.21%) Loss: 2.339548 LR: 0.00001679 [14:16:00] Epoch: 1 Batch: 12505/20099 (62.22%) Loss: 2.153552 LR: 0.00001679 [14:16:02] Epoch: 1 Batch: 12506/20099 (62.22%) Loss: 1.996155 LR: 0.00001679 [14:16:03] Epoch: 1 Batch: 12507/20099 (62.23%) Loss: 2.367830 LR: 0.00001679 [14:16:05] Epoch: 1 Batch: 12508/20099 (62.23%) Loss: 1.989859 LR: 0.00001679 [14:16:07] Epoch: 1 Batch: 12509/20099 (62.24%) Loss: 1.975428 LR: 0.00001679 [14:16:09] Epoch: 1 Batch: 12510/20099 (62.24%) Loss: 2.300255 LR: 0.00001678 [14:16:11] Epoch: 1 Batch: 12511/20099 (62.25%) Loss: 2.000569 LR: 0.00001678 [14:16:13] Epoch: 1 Batch: 12512/20099 (62.25%) Loss: 2.143499 LR: 0.00001678 [14:16:14] Epoch: 1 Batch: 12513/20099 (62.26%) Loss: 1.843700 LR: 0.00001678 [14:16:16] Epoch: 1 Batch: 12514/20099 (62.26%) Loss: 2.138958 LR: 0.00001678 [14:16:18] Epoch: 1 Batch: 12515/20099 (62.27%) 
Loss: 2.074175 LR: 0.00001678 [14:16:20] Epoch: 1 Batch: 12516/20099 (62.27%) Loss: 1.907137 LR: 0.00001678 [14:16:22] Epoch: 1 Batch: 12517/20099 (62.28%) Loss: 2.646269 LR: 0.00001676 [14:16:24] Epoch: 1 Batch: 12518/20099 (62.28%) Loss: 2.228563 LR: 0.00001676 [14:16:26] Epoch: 1 Batch: 12519/20099 (62.29%) Loss: 2.292810 LR: 0.00001676 [14:16:27] Epoch: 1 Batch: 12520/20099 (62.29%) Loss: 2.159517 LR: 0.00001676 [14:16:29] Epoch: 1 Batch: 12521/20099 (62.30%) Loss: 2.058583 LR: 0.00001676 [14:16:31] Epoch: 1 Batch: 12522/20099 (62.30%) Loss: 2.193661 LR: 0.00001676 [14:16:33] Epoch: 1 Batch: 12523/20099 (62.31%) Loss: 2.123439 LR: 0.00001676 [14:16:35] Epoch: 1 Batch: 12524/20099 (62.31%) Loss: 2.095974 LR: 0.00001674 [14:16:37] Epoch: 1 Batch: 12525/20099 (62.32%) Loss: 2.113257 LR: 0.00001674 [14:16:39] Epoch: 1 Batch: 12526/20099 (62.32%) Loss: 2.275215 LR: 0.00001674 [14:16:40] Epoch: 1 Batch: 12527/20099 (62.33%) Loss: 2.001440 LR: 0.00001674 [14:16:42] Epoch: 1 Batch: 12528/20099 (62.33%) Loss: 2.124734 LR: 0.00001674 [14:16:44] Epoch: 1 Batch: 12529/20099 (62.34%) Loss: 2.079690 LR: 0.00001674 [14:16:46] Epoch: 1 Batch: 12530/20099 (62.34%) Loss: 1.884131 LR: 0.00001674 [14:16:48] Epoch: 1 Batch: 12531/20099 (62.35%) Loss: 2.175298 LR: 0.00001673 [14:16:50] Epoch: 1 Batch: 12532/20099 (62.35%) Loss: 2.158480 LR: 0.00001673 [14:16:52] Epoch: 1 Batch: 12533/20099 (62.36%) Loss: 1.963490 LR: 0.00001673 [14:16:53] Epoch: 1 Batch: 12534/20099 (62.36%) Loss: 1.890460 LR: 0.00001673 [14:16:55] Epoch: 1 Batch: 12535/20099 (62.37%) Loss: 2.062577 LR: 0.00001673 [14:16:57] Epoch: 1 Batch: 12536/20099 (62.37%) Loss: 2.133164 LR: 0.00001673 [14:16:59] Epoch: 1 Batch: 12537/20099 (62.38%) Loss: 2.146369 LR: 0.00001673 [14:17:01] Epoch: 1 Batch: 12538/20099 (62.38%) Loss: 2.362843 LR: 0.00001671 [14:17:03] Epoch: 1 Batch: 12539/20099 (62.39%) Loss: 1.915859 LR: 0.00001671 [14:17:05] Epoch: 1 Batch: 12540/20099 (62.39%) Loss: 2.055562 LR: 0.00001671 [14:17:06] Epoch: 1 
Batch: 12541/20099 (62.40%) Loss: 2.024007 LR: 0.00001671 [14:17:08] Epoch: 1 Batch: 12542/20099 (62.40%) Loss: 2.038615 LR: 0.00001671 [14:17:10] Epoch: 1 Batch: 12543/20099 (62.41%) Loss: 2.250983 LR: 0.00001671 [14:17:12] Epoch: 1 Batch: 12544/20099 (62.41%) Loss: 1.592399 LR: 0.00001671 [14:17:14] Epoch: 1 Batch: 12545/20099 (62.42%) Loss: 2.047400 LR: 0.00001670 [14:17:16] Epoch: 1 Batch: 12546/20099 (62.42%) Loss: 1.981114 LR: 0.00001670 [14:17:18] Epoch: 1 Batch: 12547/20099 (62.43%) Loss: 2.222521 LR: 0.00001670 [14:17:19] Epoch: 1 Batch: 12548/20099 (62.43%) Loss: 2.149658 LR: 0.00001670 [14:17:21] Epoch: 1 Batch: 12549/20099 (62.44%) Loss: 1.868765 LR: 0.00001670 [14:17:23] Epoch: 1 Batch: 12550/20099 (62.44%) Loss: 2.036593 LR: 0.00001670 [14:17:25] Epoch: 1 Batch: 12551/20099 (62.45%) Loss: 2.049429 LR: 0.00001670 [14:17:27] Epoch: 1 Batch: 12552/20099 (62.45%) Loss: 2.091812 LR: 0.00001668 [14:17:29] Epoch: 1 Batch: 12553/20099 (62.46%) Loss: 1.649682 LR: 0.00001668 [14:17:31] Epoch: 1 Batch: 12554/20099 (62.46%) Loss: 1.861302 LR: 0.00001668 [14:17:32] Epoch: 1 Batch: 12555/20099 (62.47%) Loss: 2.148660 LR: 0.00001668 [14:17:34] Epoch: 1 Batch: 12556/20099 (62.47%) Loss: 1.928683 LR: 0.00001668 [14:17:36] Epoch: 1 Batch: 12557/20099 (62.48%) Loss: 1.794629 LR: 0.00001668 [14:17:38] Epoch: 1 Batch: 12558/20099 (62.48%) Loss: 2.201550 LR: 0.00001668 [14:17:40] Epoch: 1 Batch: 12559/20099 (62.49%) Loss: 1.960785 LR: 0.00001666 [14:17:42] Epoch: 1 Batch: 12560/20099 (62.49%) Loss: 2.259130 LR: 0.00001666 [14:17:44] Epoch: 1 Batch: 12561/20099 (62.50%) Loss: 2.005169 LR: 0.00001666 [14:17:45] Epoch: 1 Batch: 12562/20099 (62.50%) Loss: 1.770578 LR: 0.00001666 [14:17:47] Epoch: 1 Batch: 12563/20099 (62.51%) Loss: 2.292722 LR: 0.00001666 [14:17:49] Epoch: 1 Batch: 12564/20099 (62.51%) Loss: 2.287107 LR: 0.00001666 [14:17:51] Epoch: 1 Batch: 12565/20099 (62.52%) Loss: 2.002618 LR: 0.00001666 [14:17:53] Epoch: 1 Batch: 12566/20099 (62.52%) Loss: 1.897934 LR: 
0.00001665 [14:17:55] Epoch: 1 Batch: 12567/20099 (62.53%) Loss: 1.964326 LR: 0.00001665 [14:17:57] Epoch: 1 Batch: 12568/20099 (62.53%) Loss: 1.693827 LR: 0.00001665 [14:17:59] Epoch: 1 Batch: 12569/20099 (62.54%) Loss: 2.136894 LR: 0.00001665 [14:18:00] Epoch: 1 Batch: 12570/20099 (62.54%) Loss: 2.475913 LR: 0.00001665 [14:18:02] Epoch: 1 Batch: 12571/20099 (62.55%) Loss: 2.238304 LR: 0.00001665 [14:18:04] Epoch: 1 Batch: 12572/20099 (62.55%) Loss: 2.079713 LR: 0.00001665 [14:18:06] Epoch: 1 Batch: 12573/20099 (62.56%) Loss: 2.531006 LR: 0.00001663 [14:18:08] Epoch: 1 Batch: 12574/20099 (62.56%) Loss: 2.276798 LR: 0.00001663 [14:18:10] Epoch: 1 Batch: 12575/20099 (62.57%) Loss: 2.315984 LR: 0.00001663 [14:18:12] Epoch: 1 Batch: 12576/20099 (62.57%) Loss: 2.081683 LR: 0.00001663 [14:18:13] Epoch: 1 Batch: 12577/20099 (62.58%) Loss: 1.971057 LR: 0.00001663 [14:18:15] Epoch: 1 Batch: 12578/20099 (62.58%) Loss: 1.856409 LR: 0.00001663 [14:18:17] Epoch: 1 Batch: 12579/20099 (62.59%) Loss: 1.957466 LR: 0.00001663 [14:18:19] Epoch: 1 Batch: 12580/20099 (62.59%) Loss: 1.963899 LR: 0.00001661 [14:18:21] Epoch: 1 Batch: 12581/20099 (62.60%) Loss: 1.858514 LR: 0.00001661 [14:18:23] Epoch: 1 Batch: 12582/20099 (62.60%) Loss: 1.957968 LR: 0.00001661 [14:18:25] Epoch: 1 Batch: 12583/20099 (62.61%) Loss: 1.873317 LR: 0.00001661 [14:18:26] Epoch: 1 Batch: 12584/20099 (62.61%) Loss: 2.097559 LR: 0.00001661 [14:18:28] Epoch: 1 Batch: 12585/20099 (62.62%) Loss: 2.003291 LR: 0.00001661 [14:18:30] Epoch: 1 Batch: 12586/20099 (62.62%) Loss: 2.133249 LR: 0.00001661 [14:18:32] Epoch: 1 Batch: 12587/20099 (62.63%) Loss: 2.315340 LR: 0.00001660 [14:18:34] Epoch: 1 Batch: 12588/20099 (62.63%) Loss: 2.154790 LR: 0.00001660 [14:18:36] Epoch: 1 Batch: 12589/20099 (62.63%) Loss: 1.974035 LR: 0.00001660 [14:18:38] Epoch: 1 Batch: 12590/20099 (62.64%) Loss: 2.150758 LR: 0.00001660 [14:18:40] Epoch: 1 Batch: 12591/20099 (62.64%) Loss: 2.122747 LR: 0.00001660 [14:18:41] Epoch: 1 Batch: 12592/20099 
(62.65%) Loss: 2.169953 LR: 0.00001660 [14:18:43] Epoch: 1 Batch: 12593/20099 (62.65%) Loss: 2.065744 LR: 0.00001660 [14:18:45] Epoch: 1 Batch: 12594/20099 (62.66%) Loss: 2.092122 LR: 0.00001658 [14:18:47] Epoch: 1 Batch: 12595/20099 (62.66%) Loss: 2.115968 LR: 0.00001658 [14:18:49] Epoch: 1 Batch: 12596/20099 (62.67%) Loss: 2.031101 LR: 0.00001658 [14:18:51] Epoch: 1 Batch: 12597/20099 (62.67%) Loss: 2.207226 LR: 0.00001658 [14:18:52] Epoch: 1 Batch: 12598/20099 (62.68%) Loss: 2.119432 LR: 0.00001658 [14:18:54] Epoch: 1 Batch: 12599/20099 (62.68%) Loss: 2.127054 LR: 0.00001658 [14:19:00] >> Cleaned up old temp checkpoint: epoch1_step6800 [14:19:00] >> Temp checkpoint saved: epoch1_step12600, size: 0.1693 GB [14:19:00] Epoch: 1 Batch: 12600/20099 (62.69%) Loss: 2.190987 LR: 0.00001658 [14:19:02] Epoch: 1 Batch: 12601/20099 (62.69%) Loss: 1.814809 LR: 0.00001657 [14:19:04] Epoch: 1 Batch: 12602/20099 (62.70%) Loss: 2.089878 LR: 0.00001657 [14:19:05] Epoch: 1 Batch: 12603/20099 (62.70%) Loss: 1.944906 LR: 0.00001657 [14:19:07] Epoch: 1 Batch: 12604/20099 (62.71%) Loss: 2.099495 LR: 0.00001657 [14:19:09] Epoch: 1 Batch: 12605/20099 (62.71%) Loss: 1.941844 LR: 0.00001657 [14:19:11] Epoch: 1 Batch: 12606/20099 (62.72%) Loss: 2.405787 LR: 0.00001657 [14:19:13] Epoch: 1 Batch: 12607/20099 (62.72%) Loss: 2.009480 LR: 0.00001657 [14:19:15] Epoch: 1 Batch: 12608/20099 (62.73%) Loss: 2.019177 LR: 0.00001655 [14:19:16] Epoch: 1 Batch: 12609/20099 (62.73%) Loss: 2.096102 LR: 0.00001655 [14:19:18] Epoch: 1 Batch: 12610/20099 (62.74%) Loss: 1.915547 LR: 0.00001655 [14:19:20] Epoch: 1 Batch: 12611/20099 (62.74%) Loss: 2.189138 LR: 0.00001655 [14:19:22] Epoch: 1 Batch: 12612/20099 (62.75%) Loss: 2.205578 LR: 0.00001655 [14:19:24] Epoch: 1 Batch: 12613/20099 (62.75%) Loss: 2.071659 LR: 0.00001655 [14:19:26] Epoch: 1 Batch: 12614/20099 (62.76%) Loss: 2.342597 LR: 0.00001655 [14:19:27] Epoch: 1 Batch: 12615/20099 (62.76%) Loss: 2.257440 LR: 0.00001653 [14:19:29] Epoch: 1 Batch: 
12616/20099 (62.77%) Loss: 2.033727 LR: 0.00001653 [14:19:31] Epoch: 1 Batch: 12617/20099 (62.77%) Loss: 2.365439 LR: 0.00001653 [14:19:33] Epoch: 1 Batch: 12618/20099 (62.78%) Loss: 2.430846 LR: 0.00001653 [14:19:35] Epoch: 1 Batch: 12619/20099 (62.78%) Loss: 2.059263 LR: 0.00001653 [14:19:37] Epoch: 1 Batch: 12620/20099 (62.79%) Loss: 2.015301 LR: 0.00001653 [14:19:39] Epoch: 1 Batch: 12621/20099 (62.79%) Loss: 1.804606 LR: 0.00001653 [14:19:40] Epoch: 1 Batch: 12622/20099 (62.80%) Loss: 1.973134 LR: 0.00001652 [14:19:42] Epoch: 1 Batch: 12623/20099 (62.80%) Loss: 2.125205 LR: 0.00001652 [14:19:44] Epoch: 1 Batch: 12624/20099 (62.81%) Loss: 2.435292 LR: 0.00001652 [14:19:46] Epoch: 1 Batch: 12625/20099 (62.81%) Loss: 1.987108 LR: 0.00001652 [14:19:48] Epoch: 1 Batch: 12626/20099 (62.82%) Loss: 2.008645 LR: 0.00001652 [14:19:50] Epoch: 1 Batch: 12627/20099 (62.82%) Loss: 1.955027 LR: 0.00001652 [14:19:52] Epoch: 1 Batch: 12628/20099 (62.83%) Loss: 2.236176 LR: 0.00001652 [14:19:54] Epoch: 1 Batch: 12629/20099 (62.83%) Loss: 1.941162 LR: 0.00001650 [14:19:55] Epoch: 1 Batch: 12630/20099 (62.84%) Loss: 2.277886 LR: 0.00001650 [14:19:57] Epoch: 1 Batch: 12631/20099 (62.84%) Loss: 1.924298 LR: 0.00001650 [14:19:59] Epoch: 1 Batch: 12632/20099 (62.85%) Loss: 2.139210 LR: 0.00001650 [14:20:01] Epoch: 1 Batch: 12633/20099 (62.85%) Loss: 2.066837 LR: 0.00001650 [14:20:03] Epoch: 1 Batch: 12634/20099 (62.86%) Loss: 2.125645 LR: 0.00001650 [14:20:05] Epoch: 1 Batch: 12635/20099 (62.86%) Loss: 2.086339 LR: 0.00001650 [14:20:07] Epoch: 1 Batch: 12636/20099 (62.87%) Loss: 1.964285 LR: 0.00001649 [14:20:08] Epoch: 1 Batch: 12637/20099 (62.87%) Loss: 1.520764 LR: 0.00001649 [14:20:10] Epoch: 1 Batch: 12638/20099 (62.88%) Loss: 2.206492 LR: 0.00001649 [14:20:12] Epoch: 1 Batch: 12639/20099 (62.88%) Loss: 1.992594 LR: 0.00001649 [14:20:14] Epoch: 1 Batch: 12640/20099 (62.89%) Loss: 1.990512 LR: 0.00001649 [14:20:16] Epoch: 1 Batch: 12641/20099 (62.89%) Loss: 1.522527 LR: 
0.00001649 [14:20:18] Epoch: 1 Batch: 12642/20099 (62.90%) Loss: 1.964320 LR: 0.00001649 [14:20:19] Epoch: 1 Batch: 12643/20099 (62.90%) Loss: 2.071163 LR: 0.00001647 [14:20:21] Epoch: 1 Batch: 12644/20099 (62.91%) Loss: 2.200923 LR: 0.00001647 [14:20:23] Epoch: 1 Batch: 12645/20099 (62.91%) Loss: 2.470009 LR: 0.00001647 [14:20:25] Epoch: 1 Batch: 12646/20099 (62.92%) Loss: 2.014628 LR: 0.00001647 [14:20:27] Epoch: 1 Batch: 12647/20099 (62.92%) Loss: 2.145019 LR: 0.00001647 [14:20:29] Epoch: 1 Batch: 12648/20099 (62.93%) Loss: 2.132402 LR: 0.00001647 [14:20:30] Epoch: 1 Batch: 12649/20099 (62.93%) Loss: 2.267000 LR: 0.00001647 [14:20:32] Epoch: 1 Batch: 12650/20099 (62.94%) Loss: 2.201242 LR: 0.00001645 [14:20:34] Epoch: 1 Batch: 12651/20099 (62.94%) Loss: 2.326022 LR: 0.00001645 [14:20:36] Epoch: 1 Batch: 12652/20099 (62.95%) Loss: 2.139207 LR: 0.00001645 [14:20:38] Epoch: 1 Batch: 12653/20099 (62.95%) Loss: 1.999581 LR: 0.00001645 [14:20:40] Epoch: 1 Batch: 12654/20099 (62.96%) Loss: 2.027912 LR: 0.00001645 [14:20:42] Epoch: 1 Batch: 12655/20099 (62.96%) Loss: 2.121173 LR: 0.00001645 [14:20:43] Epoch: 1 Batch: 12656/20099 (62.97%) Loss: 2.226741 LR: 0.00001645 [14:20:45] Epoch: 1 Batch: 12657/20099 (62.97%) Loss: 2.179336 LR: 0.00001644 [14:20:47] Epoch: 1 Batch: 12658/20099 (62.98%) Loss: 2.550044 LR: 0.00001644 [14:20:49] Epoch: 1 Batch: 12659/20099 (62.98%) Loss: 2.160189 LR: 0.00001644 [14:20:51] Epoch: 1 Batch: 12660/20099 (62.99%) Loss: 2.258188 LR: 0.00001644 [14:20:53] Epoch: 1 Batch: 12661/20099 (62.99%) Loss: 2.256711 LR: 0.00001644 [14:20:55] Epoch: 1 Batch: 12662/20099 (63.00%) Loss: 2.088911 LR: 0.00001644 [14:20:56] Epoch: 1 Batch: 12663/20099 (63.00%) Loss: 2.102660 LR: 0.00001644 [14:20:58] Epoch: 1 Batch: 12664/20099 (63.01%) Loss: 1.774547 LR: 0.00001642 [14:21:00] Epoch: 1 Batch: 12665/20099 (63.01%) Loss: 2.453672 LR: 0.00001642 [14:21:02] Epoch: 1 Batch: 12666/20099 (63.02%) Loss: 2.270105 LR: 0.00001642 [14:21:04] Epoch: 1 Batch: 12667/20099 
(63.02%) Loss: 2.440426 LR: 0.00001642 [14:21:06] Epoch: 1 Batch: 12668/20099 (63.03%) Loss: 2.125851 LR: 0.00001642 [14:21:08] Epoch: 1 Batch: 12669/20099 (63.03%) Loss: 2.025991 LR: 0.00001642 [14:21:09] Epoch: 1 Batch: 12670/20099 (63.04%) Loss: 2.187680 LR: 0.00001642 [14:21:11] Epoch: 1 Batch: 12671/20099 (63.04%) Loss: 2.219559 LR: 0.00001640 [14:21:13] Epoch: 1 Batch: 12672/20099 (63.05%) Loss: 2.344424 LR: 0.00001640 [14:21:15] Epoch: 1 Batch: 12673/20099 (63.05%) Loss: 2.126011 LR: 0.00001640 [14:21:17] Epoch: 1 Batch: 12674/20099 (63.06%) Loss: 2.289833 LR: 0.00001640 [14:21:19] Epoch: 1 Batch: 12675/20099 (63.06%) Loss: 2.125325 LR: 0.00001640 [14:21:20] Epoch: 1 Batch: 12676/20099 (63.07%) Loss: 2.256634 LR: 0.00001640 [14:21:22] Epoch: 1 Batch: 12677/20099 (63.07%) Loss: 1.956988 LR: 0.00001640 [14:21:24] Epoch: 1 Batch: 12678/20099 (63.08%) Loss: 1.869049 LR: 0.00001639 [14:21:26] Epoch: 1 Batch: 12679/20099 (63.08%) Loss: 2.478871 LR: 0.00001639 [14:21:28] Epoch: 1 Batch: 12680/20099 (63.09%) Loss: 2.156205 LR: 0.00001639 [14:21:30] Epoch: 1 Batch: 12681/20099 (63.09%) Loss: 2.214976 LR: 0.00001639 [14:21:32] Epoch: 1 Batch: 12682/20099 (63.10%) Loss: 2.244859 LR: 0.00001639 [14:21:33] Epoch: 1 Batch: 12683/20099 (63.10%) Loss: 1.898811 LR: 0.00001639 [14:21:35] Epoch: 1 Batch: 12684/20099 (63.11%) Loss: 2.265249 LR: 0.00001639 [14:21:37] Epoch: 1 Batch: 12685/20099 (63.11%) Loss: 1.994962 LR: 0.00001637 [14:21:39] Epoch: 1 Batch: 12686/20099 (63.12%) Loss: 2.169183 LR: 0.00001637 [14:21:41] Epoch: 1 Batch: 12687/20099 (63.12%) Loss: 2.008988 LR: 0.00001637 [14:21:43] Epoch: 1 Batch: 12688/20099 (63.13%) Loss: 1.982748 LR: 0.00001637 [14:21:45] Epoch: 1 Batch: 12689/20099 (63.13%) Loss: 2.259931 LR: 0.00001637 [14:21:46] Epoch: 1 Batch: 12690/20099 (63.14%) Loss: 2.115459 LR: 0.00001637 [14:21:48] Epoch: 1 Batch: 12691/20099 (63.14%) Loss: 2.330428 LR: 0.00001637 [14:21:50] Epoch: 1 Batch: 12692/20099 (63.15%) Loss: 2.033515 LR: 0.00001636 [14:21:52] 
Epoch: 1 Batch: 12693/20099 (63.15%) Loss: 1.952426 LR: 0.00001636 [14:21:54] Epoch: 1 Batch: 12694/20099 (63.16%) Loss: 2.330842 LR: 0.00001636 [14:21:56] Epoch: 1 Batch: 12695/20099 (63.16%) Loss: 1.977513 LR: 0.00001636 [14:21:57] Epoch: 1 Batch: 12696/20099 (63.17%) Loss: 2.067961 LR: 0.00001636 [14:21:59] Epoch: 1 Batch: 12697/20099 (63.17%) Loss: 2.316957 LR: 0.00001636 [14:22:01] Epoch: 1 Batch: 12698/20099 (63.18%) Loss: 2.323213 LR: 0.00001636 [14:22:03] Epoch: 1 Batch: 12699/20099 (63.18%) Loss: 2.178513 LR: 0.00001634 [14:22:05] Epoch: 1 Batch: 12700/20099 (63.19%) Loss: 2.035271 LR: 0.00001634 [14:22:07] Epoch: 1 Batch: 12701/20099 (63.19%) Loss: 1.866002 LR: 0.00001634 [14:22:09] Epoch: 1 Batch: 12702/20099 (63.20%) Loss: 2.121701 LR: 0.00001634 [14:22:10] Epoch: 1 Batch: 12703/20099 (63.20%) Loss: 1.732798 LR: 0.00001634 [14:22:12] Epoch: 1 Batch: 12704/20099 (63.21%) Loss: 1.744851 LR: 0.00001634 [14:22:14] Epoch: 1 Batch: 12705/20099 (63.21%) Loss: 2.185283 LR: 0.00001634 [14:22:16] Epoch: 1 Batch: 12706/20099 (63.22%) Loss: 2.117529 LR: 0.00001632 [14:22:18] Epoch: 1 Batch: 12707/20099 (63.22%) Loss: 2.103647 LR: 0.00001632 [14:22:20] Epoch: 1 Batch: 12708/20099 (63.23%) Loss: 2.192468 LR: 0.00001632 [14:22:22] Epoch: 1 Batch: 12709/20099 (63.23%) Loss: 2.200884 LR: 0.00001632 [14:22:23] Epoch: 1 Batch: 12710/20099 (63.24%) Loss: 1.955021 LR: 0.00001632 [14:22:25] Epoch: 1 Batch: 12711/20099 (63.24%) Loss: 2.234428 LR: 0.00001632 [14:22:27] Epoch: 1 Batch: 12712/20099 (63.25%) Loss: 2.021751 LR: 0.00001632 [14:22:29] Epoch: 1 Batch: 12713/20099 (63.25%) Loss: 2.008090 LR: 0.00001631 [14:22:31] Epoch: 1 Batch: 12714/20099 (63.26%) Loss: 2.096052 LR: 0.00001631 [14:22:33] Epoch: 1 Batch: 12715/20099 (63.26%) Loss: 1.916308 LR: 0.00001631 [14:22:35] Epoch: 1 Batch: 12716/20099 (63.27%) Loss: 2.164582 LR: 0.00001631 [14:22:36] Epoch: 1 Batch: 12717/20099 (63.27%) Loss: 2.407270 LR: 0.00001631 [14:22:38] Epoch: 1 Batch: 12718/20099 (63.28%) Loss: 
2.033816 LR: 0.00001631 [14:22:40] Epoch: 1 Batch: 12719/20099 (63.28%) Loss: 2.221924 LR: 0.00001631 [14:22:42] Epoch: 1 Batch: 12720/20099 (63.29%) Loss: 2.080385 LR: 0.00001629 [14:22:44] Epoch: 1 Batch: 12721/20099 (63.29%) Loss: 1.983751 LR: 0.00001629 [14:22:46] Epoch: 1 Batch: 12722/20099 (63.30%) Loss: 2.212434 LR: 0.00001629 [14:22:47] Epoch: 1 Batch: 12723/20099 (63.30%) Loss: 2.200582 LR: 0.00001629 [14:22:49] Epoch: 1 Batch: 12724/20099 (63.31%) Loss: 1.843708 LR: 0.00001629 [14:22:51] Epoch: 1 Batch: 12725/20099 (63.31%) Loss: 2.129790 LR: 0.00001629 [14:22:53] Epoch: 1 Batch: 12726/20099 (63.32%) Loss: 2.032133 LR: 0.00001629 [14:22:55] Epoch: 1 Batch: 12727/20099 (63.32%) Loss: 2.097843 LR: 0.00001628 [14:22:57] Epoch: 1 Batch: 12728/20099 (63.33%) Loss: 2.184883 LR: 0.00001628 [14:22:58] Epoch: 1 Batch: 12729/20099 (63.33%) Loss: 2.137303 LR: 0.00001628 [14:23:00] Epoch: 1 Batch: 12730/20099 (63.34%) Loss: 1.880498 LR: 0.00001628 [14:23:02] Epoch: 1 Batch: 12731/20099 (63.34%) Loss: 2.004451 LR: 0.00001628 [14:23:04] Epoch: 1 Batch: 12732/20099 (63.35%) Loss: 2.154609 LR: 0.00001628 [14:23:06] Epoch: 1 Batch: 12733/20099 (63.35%) Loss: 2.079374 LR: 0.00001628 [14:23:08] Epoch: 1 Batch: 12734/20099 (63.36%) Loss: 2.056169 LR: 0.00001626 [14:23:10] Epoch: 1 Batch: 12735/20099 (63.36%) Loss: 2.086734 LR: 0.00001626 [14:23:11] Epoch: 1 Batch: 12736/20099 (63.37%) Loss: 2.136561 LR: 0.00001626 [14:23:13] Epoch: 1 Batch: 12737/20099 (63.37%) Loss: 2.094603 LR: 0.00001626 [14:23:15] Epoch: 1 Batch: 12738/20099 (63.38%) Loss: 2.094899 LR: 0.00001626 [14:23:17] Epoch: 1 Batch: 12739/20099 (63.38%) Loss: 1.916194 LR: 0.00001626 [14:23:19] Epoch: 1 Batch: 12740/20099 (63.39%) Loss: 2.124086 LR: 0.00001626 [14:23:21] Epoch: 1 Batch: 12741/20099 (63.39%) Loss: 2.010309 LR: 0.00001624 [14:23:23] Epoch: 1 Batch: 12742/20099 (63.40%) Loss: 2.212326 LR: 0.00001624 [14:23:24] Epoch: 1 Batch: 12743/20099 (63.40%) Loss: 2.244358 LR: 0.00001624 [14:23:26] Epoch: 1 
Batch: 12744/20099 (63.41%) Loss: 2.135125 LR: 0.00001624 [14:23:28] Epoch: 1 Batch: 12745/20099 (63.41%) Loss: 2.010543 LR: 0.00001624 [14:23:30] Epoch: 1 Batch: 12746/20099 (63.42%) Loss: 2.168868 LR: 0.00001624 [14:23:32] Epoch: 1 Batch: 12747/20099 (63.42%) Loss: 1.841687 LR: 0.00001624 [14:23:34] Epoch: 1 Batch: 12748/20099 (63.43%) Loss: 2.124093 LR: 0.00001623 [14:23:36] Epoch: 1 Batch: 12749/20099 (63.43%) Loss: 2.096280 LR: 0.00001623 [14:23:37] Epoch: 1 Batch: 12750/20099 (63.44%) Loss: 2.310754 LR: 0.00001623 [14:23:39] Epoch: 1 Batch: 12751/20099 (63.44%) Loss: 1.908609 LR: 0.00001623 [14:23:41] Epoch: 1 Batch: 12752/20099 (63.45%) Loss: 2.225609 LR: 0.00001623 [14:23:43] Epoch: 1 Batch: 12753/20099 (63.45%) Loss: 1.968488 LR: 0.00001623 [14:23:45] Epoch: 1 Batch: 12754/20099 (63.46%) Loss: 2.021708 LR: 0.00001623 [14:23:47] Epoch: 1 Batch: 12755/20099 (63.46%) Loss: 2.046255 LR: 0.00001621 [14:23:49] Epoch: 1 Batch: 12756/20099 (63.47%) Loss: 1.846948 LR: 0.00001621 [14:23:50] Epoch: 1 Batch: 12757/20099 (63.47%) Loss: 2.115986 LR: 0.00001621 [14:23:52] Epoch: 1 Batch: 12758/20099 (63.48%) Loss: 2.196186 LR: 0.00001621 [14:23:54] Epoch: 1 Batch: 12759/20099 (63.48%) Loss: 1.741677 LR: 0.00001621 [14:23:56] Epoch: 1 Batch: 12760/20099 (63.49%) Loss: 2.196645 LR: 0.00001621 [14:23:58] Epoch: 1 Batch: 12761/20099 (63.49%) Loss: 1.901577 LR: 0.00001621 [14:24:00] Epoch: 1 Batch: 12762/20099 (63.50%) Loss: 2.095832 LR: 0.00001620 [14:24:02] Epoch: 1 Batch: 12763/20099 (63.50%) Loss: 1.924193 LR: 0.00001620 [14:24:03] Epoch: 1 Batch: 12764/20099 (63.51%) Loss: 2.095968 LR: 0.00001620 [14:24:05] Epoch: 1 Batch: 12765/20099 (63.51%) Loss: 2.034623 LR: 0.00001620 [14:24:07] Epoch: 1 Batch: 12766/20099 (63.52%) Loss: 2.076230 LR: 0.00001620 [14:24:09] Epoch: 1 Batch: 12767/20099 (63.52%) Loss: 2.175267 LR: 0.00001620 [14:24:11] Epoch: 1 Batch: 12768/20099 (63.53%) Loss: 2.023261 LR: 0.00001620 [14:24:13] Epoch: 1 Batch: 12769/20099 (63.53%) Loss: 1.968342 LR: 
0.00001618 [14:24:15] Epoch: 1 Batch: 12770/20099 (63.54%) Loss: 2.123756 LR: 0.00001618 [14:24:16] Epoch: 1 Batch: 12771/20099 (63.54%) Loss: 2.136152 LR: 0.00001618 [14:24:18] Epoch: 1 Batch: 12772/20099 (63.55%) Loss: 2.077722 LR: 0.00001618 [14:24:20] Epoch: 1 Batch: 12773/20099 (63.55%) Loss: 2.108462 LR: 0.00001618 [14:24:22] Epoch: 1 Batch: 12774/20099 (63.56%) Loss: 1.651378 LR: 0.00001618 [14:24:24] Epoch: 1 Batch: 12775/20099 (63.56%) Loss: 2.145134 LR: 0.00001618 [14:24:26] Epoch: 1 Batch: 12776/20099 (63.57%) Loss: 2.026645 LR: 0.00001616 [14:24:28] Epoch: 1 Batch: 12777/20099 (63.57%) Loss: 2.356749 LR: 0.00001616 [14:24:29] Epoch: 1 Batch: 12778/20099 (63.58%) Loss: 1.906243 LR: 0.00001616 [14:24:31] Epoch: 1 Batch: 12779/20099 (63.58%) Loss: 2.122055 LR: 0.00001616 [14:24:33] Epoch: 1 Batch: 12780/20099 (63.59%) Loss: 1.957824 LR: 0.00001616 [14:24:35] Epoch: 1 Batch: 12781/20099 (63.59%) Loss: 2.219341 LR: 0.00001616 [14:24:37] Epoch: 1 Batch: 12782/20099 (63.60%) Loss: 2.145710 LR: 0.00001616 [14:24:39] Epoch: 1 Batch: 12783/20099 (63.60%) Loss: 1.938868 LR: 0.00001615 [14:24:41] Epoch: 1 Batch: 12784/20099 (63.61%) Loss: 2.068468 LR: 0.00001615 [14:24:42] Epoch: 1 Batch: 12785/20099 (63.61%) Loss: 2.184168 LR: 0.00001615 [14:24:44] Epoch: 1 Batch: 12786/20099 (63.62%) Loss: 2.350325 LR: 0.00001615 [14:24:46] Epoch: 1 Batch: 12787/20099 (63.62%) Loss: 2.037389 LR: 0.00001615 [14:24:48] Epoch: 1 Batch: 12788/20099 (63.63%) Loss: 2.173512 LR: 0.00001615 [14:24:50] Epoch: 1 Batch: 12789/20099 (63.63%) Loss: 2.008999 LR: 0.00001615 [14:24:52] Epoch: 1 Batch: 12790/20099 (63.64%) Loss: 2.434047 LR: 0.00001613 [14:24:54] Epoch: 1 Batch: 12791/20099 (63.64%) Loss: 2.184029 LR: 0.00001613 [14:24:55] Epoch: 1 Batch: 12792/20099 (63.64%) Loss: 2.138233 LR: 0.00001613 [14:24:57] Epoch: 1 Batch: 12793/20099 (63.65%) Loss: 2.120486 LR: 0.00001613 [14:24:59] Epoch: 1 Batch: 12794/20099 (63.65%) Loss: 2.178020 LR: 0.00001613 [14:25:01] Epoch: 1 Batch: 12795/20099 
(63.66%) Loss: 1.813330 LR: 0.00001613 [14:25:03] Epoch: 1 Batch: 12796/20099 (63.66%) Loss: 2.247991 LR: 0.00001613 [14:25:05] Epoch: 1 Batch: 12797/20099 (63.67%) Loss: 1.904433 LR: 0.00001612 [14:25:07] Epoch: 1 Batch: 12798/20099 (63.67%) Loss: 2.020821 LR: 0.00001612 [14:25:08] Epoch: 1 Batch: 12799/20099 (63.68%) Loss: 2.115403 LR: 0.00001612 [14:25:14] >> Cleaned up old temp checkpoint: epoch1_step7000 [14:25:14] >> Temp checkpoint saved: epoch1_step12800, size: 0.1693 GB [14:25:14] Epoch: 1 Batch: 12800/20099 (63.68%) Loss: 1.950352 LR: 0.00001612 [14:25:16] Epoch: 1 Batch: 12801/20099 (63.69%) Loss: 2.151952 LR: 0.00001612 [14:25:18] Epoch: 1 Batch: 12802/20099 (63.69%) Loss: 2.129425 LR: 0.00001612 [14:25:20] Epoch: 1 Batch: 12803/20099 (63.70%) Loss: 2.135623 LR: 0.00001612 [14:25:21] Epoch: 1 Batch: 12804/20099 (63.70%) Loss: 1.922926 LR: 0.00001610 [14:25:23] Epoch: 1 Batch: 12805/20099 (63.71%) Loss: 2.332348 LR: 0.00001610 [14:25:25] Epoch: 1 Batch: 12806/20099 (63.71%) Loss: 2.241134 LR: 0.00001610 [14:25:27] Epoch: 1 Batch: 12807/20099 (63.72%) Loss: 2.064121 LR: 0.00001610 [14:25:29] Epoch: 1 Batch: 12808/20099 (63.72%) Loss: 2.114382 LR: 0.00001610 [14:25:31] Epoch: 1 Batch: 12809/20099 (63.73%) Loss: 1.758492 LR: 0.00001610 [14:25:32] Epoch: 1 Batch: 12810/20099 (63.73%) Loss: 2.312398 LR: 0.00001610 [14:25:34] Epoch: 1 Batch: 12811/20099 (63.74%) Loss: 2.220335 LR: 0.00001608 [14:25:36] Epoch: 1 Batch: 12812/20099 (63.74%) Loss: 1.997087 LR: 0.00001608 [14:25:38] Epoch: 1 Batch: 12813/20099 (63.75%) Loss: 2.120683 LR: 0.00001608 [14:25:40] Epoch: 1 Batch: 12814/20099 (63.75%) Loss: 2.244936 LR: 0.00001608 [14:25:42] Epoch: 1 Batch: 12815/20099 (63.76%) Loss: 2.186012 LR: 0.00001608 [14:25:44] Epoch: 1 Batch: 12816/20099 (63.76%) Loss: 1.997612 LR: 0.00001608 [14:25:46] Epoch: 1 Batch: 12817/20099 (63.77%) Loss: 1.830277 LR: 0.00001608 [14:25:47] Epoch: 1 Batch: 12818/20099 (63.77%) Loss: 2.095818 LR: 0.00001607 [14:25:49] Epoch: 1 Batch: 
12819/20099 (63.78%) Loss: 2.145198 LR: 0.00001607 [14:25:51] Epoch: 1 Batch: 12820/20099 (63.78%) Loss: 1.888341 LR: 0.00001607 [14:25:53] Epoch: 1 Batch: 12821/20099 (63.79%) Loss: 2.260203 LR: 0.00001607 [14:25:55] Epoch: 1 Batch: 12822/20099 (63.79%) Loss: 2.401816 LR: 0.00001607 [14:25:57] Epoch: 1 Batch: 12823/20099 (63.80%) Loss: 2.076410 LR: 0.00001607 [14:25:59] Epoch: 1 Batch: 12824/20099 (63.80%) Loss: 2.062681 LR: 0.00001607 [14:26:01] Epoch: 1 Batch: 12825/20099 (63.81%) Loss: 2.241721 LR: 0.00001605 [14:26:02] Epoch: 1 Batch: 12826/20099 (63.81%) Loss: 2.073078 LR: 0.00001605 [14:26:04] Epoch: 1 Batch: 12827/20099 (63.82%) Loss: 2.229571 LR: 0.00001605 [14:26:06] Epoch: 1 Batch: 12828/20099 (63.82%) Loss: 1.953658 LR: 0.00001605 [14:26:08] Epoch: 1 Batch: 12829/20099 (63.83%) Loss: 2.260696 LR: 0.00001605 [14:26:10] Epoch: 1 Batch: 12830/20099 (63.83%) Loss: 2.035530 LR: 0.00001605 [14:26:12] Epoch: 1 Batch: 12831/20099 (63.84%) Loss: 2.105090 LR: 0.00001605 [14:26:14] Epoch: 1 Batch: 12832/20099 (63.84%) Loss: 1.827815 LR: 0.00001604 [14:26:15] Epoch: 1 Batch: 12833/20099 (63.85%) Loss: 2.270337 LR: 0.00001604 [14:26:17] Epoch: 1 Batch: 12834/20099 (63.85%) Loss: 2.087409 LR: 0.00001604 [14:26:19] Epoch: 1 Batch: 12835/20099 (63.86%) Loss: 1.937943 LR: 0.00001604 [14:26:21] Epoch: 1 Batch: 12836/20099 (63.86%) Loss: 2.040845 LR: 0.00001604 [14:26:23] Epoch: 1 Batch: 12837/20099 (63.87%) Loss: 1.866566 LR: 0.00001604 [14:26:25] Epoch: 1 Batch: 12838/20099 (63.87%) Loss: 2.141048 LR: 0.00001604 [14:26:27] Epoch: 1 Batch: 12839/20099 (63.88%) Loss: 1.821415 LR: 0.00001602 [14:26:28] Epoch: 1 Batch: 12840/20099 (63.88%) Loss: 1.973950 LR: 0.00001602 [14:26:30] Epoch: 1 Batch: 12841/20099 (63.89%) Loss: 1.858399 LR: 0.00001602 [14:26:32] Epoch: 1 Batch: 12842/20099 (63.89%) Loss: 1.976620 LR: 0.00001602 [14:26:34] Epoch: 1 Batch: 12843/20099 (63.90%) Loss: 2.477959 LR: 0.00001602 [14:26:36] Epoch: 1 Batch: 12844/20099 (63.90%) Loss: 1.927150 LR: 
0.00001602 [14:26:38] Epoch: 1 Batch: 12845/20099 (63.91%) Loss: 2.199275 LR: 0.00001602 [14:26:39] Epoch: 1 Batch: 12846/20099 (63.91%) Loss: 1.992364 LR: 0.00001600 [14:26:41] Epoch: 1 Batch: 12847/20099 (63.92%) Loss: 2.237602 LR: 0.00001600 [14:26:43] Epoch: 1 Batch: 12848/20099 (63.92%) Loss: 2.343164 LR: 0.00001600 [14:26:45] Epoch: 1 Batch: 12849/20099 (63.93%) Loss: 1.969648 LR: 0.00001600 [14:26:47] Epoch: 1 Batch: 12850/20099 (63.93%) Loss: 1.946840 LR: 0.00001600 [14:26:49] Epoch: 1 Batch: 12851/20099 (63.94%) Loss: 2.138975 LR: 0.00001600 [14:26:50] Epoch: 1 Batch: 12852/20099 (63.94%) Loss: 2.087366 LR: 0.00001600 [14:26:52] Epoch: 1 Batch: 12853/20099 (63.95%) Loss: 2.083472 LR: 0.00001599 [14:26:54] Epoch: 1 Batch: 12854/20099 (63.95%) Loss: 2.031334 LR: 0.00001599 [14:26:56] Epoch: 1 Batch: 12855/20099 (63.96%) Loss: 2.124740 LR: 0.00001599 [14:26:58] Epoch: 1 Batch: 12856/20099 (63.96%) Loss: 2.288210 LR: 0.00001599 [14:27:00] Epoch: 1 Batch: 12857/20099 (63.97%) Loss: 2.129854 LR: 0.00001599 [14:27:02] Epoch: 1 Batch: 12858/20099 (63.97%) Loss: 2.061570 LR: 0.00001599 [14:27:03] Epoch: 1 Batch: 12859/20099 (63.98%) Loss: 1.956840 LR: 0.00001599 [14:27:05] Epoch: 1 Batch: 12860/20099 (63.98%) Loss: 2.293208 LR: 0.00001597 [14:27:07] Epoch: 1 Batch: 12861/20099 (63.99%) Loss: 2.134808 LR: 0.00001597 [14:27:09] Epoch: 1 Batch: 12862/20099 (63.99%) Loss: 2.380448 LR: 0.00001597 [14:27:11] Epoch: 1 Batch: 12863/20099 (64.00%) Loss: 2.204761 LR: 0.00001597 [14:27:13] Epoch: 1 Batch: 12864/20099 (64.00%) Loss: 2.129557 LR: 0.00001597 [14:27:15] Epoch: 1 Batch: 12865/20099 (64.01%) Loss: 2.140949 LR: 0.00001597 [14:27:16] Epoch: 1 Batch: 12866/20099 (64.01%) Loss: 2.261990 LR: 0.00001597 [14:27:18] Epoch: 1 Batch: 12867/20099 (64.02%) Loss: 2.202038 LR: 0.00001596 [14:27:20] Epoch: 1 Batch: 12868/20099 (64.02%) Loss: 2.276654 LR: 0.00001596 [14:27:22] Epoch: 1 Batch: 12869/20099 (64.03%) Loss: 2.156851 LR: 0.00001596 [14:27:24] Epoch: 1 Batch: 12870/20099 
(64.03%) Loss: 2.417508 LR: 0.00001596 [14:27:26] Epoch: 1 Batch: 12871/20099 (64.04%) Loss: 1.763663 LR: 0.00001596 [14:27:28] Epoch: 1 Batch: 12872/20099 (64.04%) Loss: 1.986901 LR: 0.00001596 [14:27:29] Epoch: 1 Batch: 12873/20099 (64.05%) Loss: 2.106017 LR: 0.00001596 [14:27:31] Epoch: 1 Batch: 12874/20099 (64.05%) Loss: 2.179945 LR: 0.00001594 [14:27:33] Epoch: 1 Batch: 12875/20099 (64.06%) Loss: 2.217934 LR: 0.00001594 [14:27:35] Epoch: 1 Batch: 12876/20099 (64.06%) Loss: 2.048631 LR: 0.00001594 [14:27:37] Epoch: 1 Batch: 12877/20099 (64.07%) Loss: 2.201055 LR: 0.00001594 [14:27:39] Epoch: 1 Batch: 12878/20099 (64.07%) Loss: 1.962034 LR: 0.00001594 [14:27:41] Epoch: 1 Batch: 12879/20099 (64.08%) Loss: 2.149259 LR: 0.00001594 [14:27:43] Epoch: 1 Batch: 12880/20099 (64.08%) Loss: 1.800211 LR: 0.00001594 [14:27:44] Epoch: 1 Batch: 12881/20099 (64.09%) Loss: 1.925407 LR: 0.00001592 [14:27:46] Epoch: 1 Batch: 12882/20099 (64.09%) Loss: 1.945724 LR: 0.00001592 [14:27:48] Epoch: 1 Batch: 12883/20099 (64.10%) Loss: 2.226109 LR: 0.00001592 [14:27:50] Epoch: 1 Batch: 12884/20099 (64.10%) Loss: 2.018183 LR: 0.00001592 [14:27:52] Epoch: 1 Batch: 12885/20099 (64.11%) Loss: 2.328751 LR: 0.00001592 [14:27:54] Epoch: 1 Batch: 12886/20099 (64.11%) Loss: 2.061155 LR: 0.00001592 [14:27:56] Epoch: 1 Batch: 12887/20099 (64.12%) Loss: 2.048187 LR: 0.00001592 [14:27:57] Epoch: 1 Batch: 12888/20099 (64.12%) Loss: 2.156271 LR: 0.00001591 [14:27:59] Epoch: 1 Batch: 12889/20099 (64.13%) Loss: 2.217962 LR: 0.00001591 [14:28:01] Epoch: 1 Batch: 12890/20099 (64.13%) Loss: 2.020672 LR: 0.00001591 [14:28:03] Epoch: 1 Batch: 12891/20099 (64.14%) Loss: 2.230306 LR: 0.00001591 [14:28:05] Epoch: 1 Batch: 12892/20099 (64.14%) Loss: 2.068682 LR: 0.00001591 [14:28:07] Epoch: 1 Batch: 12893/20099 (64.15%) Loss: 2.070997 LR: 0.00001591 [14:28:09] Epoch: 1 Batch: 12894/20099 (64.15%) Loss: 2.132705 LR: 0.00001591 [14:28:10] Epoch: 1 Batch: 12895/20099 (64.16%) Loss: 2.479273 LR: 0.00001589 [14:28:12] 
Epoch: 1 Batch: 12896/20099 (64.16%) Loss: 2.023038 LR: 0.00001589 [14:28:14] Epoch: 1 Batch: 12897/20099 (64.17%) Loss: 1.972834 LR: 0.00001589 [14:28:16] Epoch: 1 Batch: 12898/20099 (64.17%) Loss: 2.335745 LR: 0.00001589 [14:28:18] Epoch: 1 Batch: 12899/20099 (64.18%) Loss: 2.158777 LR: 0.00001589 [14:28:20] Epoch: 1 Batch: 12900/20099 (64.18%) Loss: 2.017734 LR: 0.00001589 [14:28:22] Epoch: 1 Batch: 12901/20099 (64.19%) Loss: 1.960034 LR: 0.00001589 [14:28:23] Epoch: 1 Batch: 12902/20099 (64.19%) Loss: 2.156090 LR: 0.00001588 [14:28:25] Epoch: 1 Batch: 12903/20099 (64.20%) Loss: 2.094389 LR: 0.00001588 [14:28:27] Epoch: 1 Batch: 12904/20099 (64.20%) Loss: 2.578749 LR: 0.00001588 [14:28:29] Epoch: 1 Batch: 12905/20099 (64.21%) Loss: 2.345489 LR: 0.00001588 [14:28:31] Epoch: 1 Batch: 12906/20099 (64.21%) Loss: 2.068344 LR: 0.00001588 [14:28:33] Epoch: 1 Batch: 12907/20099 (64.22%) Loss: 2.047159 LR: 0.00001588 [14:28:35] Epoch: 1 Batch: 12908/20099 (64.22%) Loss: 2.216292 LR: 0.00001588 [14:28:36] Epoch: 1 Batch: 12909/20099 (64.23%) Loss: 1.973738 LR: 0.00001586 [14:28:38] Epoch: 1 Batch: 12910/20099 (64.23%) Loss: 1.978111 LR: 0.00001586 [14:28:40] Epoch: 1 Batch: 12911/20099 (64.24%) Loss: 1.840107 LR: 0.00001586 [14:28:42] Epoch: 1 Batch: 12912/20099 (64.24%) Loss: 2.234126 LR: 0.00001586 [14:28:44] Epoch: 1 Batch: 12913/20099 (64.25%) Loss: 2.164028 LR: 0.00001586 [14:28:46] Epoch: 1 Batch: 12914/20099 (64.25%) Loss: 2.159059 LR: 0.00001586 [14:28:48] Epoch: 1 Batch: 12915/20099 (64.26%) Loss: 2.039332 LR: 0.00001586 [14:28:49] Epoch: 1 Batch: 12916/20099 (64.26%) Loss: 2.195777 LR: 0.00001584 [14:28:51] Epoch: 1 Batch: 12917/20099 (64.27%) Loss: 1.818622 LR: 0.00001584 [14:28:53] Epoch: 1 Batch: 12918/20099 (64.27%) Loss: 2.101648 LR: 0.00001584 [14:28:55] Epoch: 1 Batch: 12919/20099 (64.28%) Loss: 2.108256 LR: 0.00001584 [14:28:57] Epoch: 1 Batch: 12920/20099 (64.28%) Loss: 2.023192 LR: 0.00001584 [14:28:59] Epoch: 1 Batch: 12921/20099 (64.29%) Loss: 
1.792125 LR: 0.00001584 [14:29:00] Epoch: 1 Batch: 12922/20099 (64.29%) Loss: 2.260219 LR: 0.00001584 [14:29:02] Epoch: 1 Batch: 12923/20099 (64.30%) Loss: 1.999249 LR: 0.00001583 [14:29:04] Epoch: 1 Batch: 12924/20099 (64.30%) Loss: 2.289367 LR: 0.00001583 [14:29:06] Epoch: 1 Batch: 12925/20099 (64.31%) Loss: 2.193008 LR: 0.00001583 [14:29:08] Epoch: 1 Batch: 12926/20099 (64.31%) Loss: 2.249049 LR: 0.00001583 [14:29:10] Epoch: 1 Batch: 12927/20099 (64.32%) Loss: 2.179411 LR: 0.00001583 [14:29:12] Epoch: 1 Batch: 12928/20099 (64.32%) Loss: 2.246844 LR: 0.00001583 [14:29:13] Epoch: 1 Batch: 12929/20099 (64.33%) Loss: 2.254588 LR: 0.00001583 [14:29:15] Epoch: 1 Batch: 12930/20099 (64.33%) Loss: 2.092018 LR: 0.00001581 [14:29:17] Epoch: 1 Batch: 12931/20099 (64.34%) Loss: 1.964147 LR: 0.00001581 [14:29:19] Epoch: 1 Batch: 12932/20099 (64.34%) Loss: 2.181491 LR: 0.00001581 [14:29:21] Epoch: 1 Batch: 12933/20099 (64.35%) Loss: 2.067664 LR: 0.00001581 [14:29:23] Epoch: 1 Batch: 12934/20099 (64.35%) Loss: 1.993499 LR: 0.00001581 [14:29:25] Epoch: 1 Batch: 12935/20099 (64.36%) Loss: 2.233940 LR: 0.00001581 [14:29:26] Epoch: 1 Batch: 12936/20099 (64.36%) Loss: 2.065438 LR: 0.00001581 [14:29:28] Epoch: 1 Batch: 12937/20099 (64.37%) Loss: 2.069937 LR: 0.00001580 [14:29:30] Epoch: 1 Batch: 12938/20099 (64.37%) Loss: 2.094052 LR: 0.00001580 [14:29:32] Epoch: 1 Batch: 12939/20099 (64.38%) Loss: 1.999597 LR: 0.00001580 [14:29:34] Epoch: 1 Batch: 12940/20099 (64.38%) Loss: 1.976787 LR: 0.00001580 [14:29:36] Epoch: 1 Batch: 12941/20099 (64.39%) Loss: 1.953522 LR: 0.00001580 [14:29:38] Epoch: 1 Batch: 12942/20099 (64.39%) Loss: 2.018340 LR: 0.00001580 [14:29:39] Epoch: 1 Batch: 12943/20099 (64.40%) Loss: 1.953619 LR: 0.00001580 [14:29:41] Epoch: 1 Batch: 12944/20099 (64.40%) Loss: 2.329332 LR: 0.00001578 [14:29:43] Epoch: 1 Batch: 12945/20099 (64.41%) Loss: 2.074020 LR: 0.00001578 [14:29:45] Epoch: 1 Batch: 12946/20099 (64.41%) Loss: 2.038601 LR: 0.00001578 [14:29:47] Epoch: 1 
Batch: 12947/20099 (64.42%) Loss: 2.007611 LR: 0.00001578 [14:29:49] Epoch: 1 Batch: 12948/20099 (64.42%) Loss: 1.998048 LR: 0.00001578 [14:29:51] Epoch: 1 Batch: 12949/20099 (64.43%) Loss: 1.945383 LR: 0.00001578 [14:29:53] Epoch: 1 Batch: 12950/20099 (64.43%) Loss: 2.228230 LR: 0.00001578 [14:29:54] Epoch: 1 Batch: 12951/20099 (64.44%) Loss: 1.999337 LR: 0.00001576 [14:29:56] Epoch: 1 Batch: 12952/20099 (64.44%) Loss: 2.325129 LR: 0.00001576 [14:29:58] Epoch: 1 Batch: 12953/20099 (64.45%) Loss: 1.928772 LR: 0.00001576 [14:30:00] Epoch: 1 Batch: 12954/20099 (64.45%) Loss: 2.000386 LR: 0.00001576 [14:30:02] Epoch: 1 Batch: 12955/20099 (64.46%) Loss: 2.365925 LR: 0.00001576 [14:30:04] Epoch: 1 Batch: 12956/20099 (64.46%) Loss: 2.282686 LR: 0.00001576 [14:30:06] Epoch: 1 Batch: 12957/20099 (64.47%) Loss: 1.856720 LR: 0.00001576 [14:30:08] Epoch: 1 Batch: 12958/20099 (64.47%) Loss: 1.958438 LR: 0.00001575 [14:30:09] Epoch: 1 Batch: 12959/20099 (64.48%) Loss: 2.171622 LR: 0.00001575 [14:30:11] Epoch: 1 Batch: 12960/20099 (64.48%) Loss: 2.184100 LR: 0.00001575 [14:30:13] Epoch: 1 Batch: 12961/20099 (64.49%) Loss: 2.203345 LR: 0.00001575 [14:30:15] Epoch: 1 Batch: 12962/20099 (64.49%) Loss: 2.292236 LR: 0.00001575 [14:30:17] Epoch: 1 Batch: 12963/20099 (64.50%) Loss: 1.878169 LR: 0.00001575 [14:30:19] Epoch: 1 Batch: 12964/20099 (64.50%) Loss: 2.163909 LR: 0.00001575 [14:30:21] Epoch: 1 Batch: 12965/20099 (64.51%) Loss: 2.144651 LR: 0.00001573 [14:30:22] Epoch: 1 Batch: 12966/20099 (64.51%) Loss: 2.118259 LR: 0.00001573 [14:30:24] Epoch: 1 Batch: 12967/20099 (64.52%) Loss: 2.302425 LR: 0.00001573 [14:30:26] Epoch: 1 Batch: 12968/20099 (64.52%) Loss: 2.083566 LR: 0.00001573 [14:30:28] Epoch: 1 Batch: 12969/20099 (64.53%) Loss: 2.034535 LR: 0.00001573 [14:30:30] Epoch: 1 Batch: 12970/20099 (64.53%) Loss: 2.031750 LR: 0.00001573 [14:30:32] Epoch: 1 Batch: 12971/20099 (64.54%) Loss: 2.162639 LR: 0.00001573 [14:30:34] Epoch: 1 Batch: 12972/20099 (64.54%) Loss: 2.416115 LR: 
0.00001572 [14:30:35] Epoch: 1 Batch: 12973/20099 (64.55%) Loss: 2.023897 LR: 0.00001572 [14:30:37] Epoch: 1 Batch: 12974/20099 (64.55%) Loss: 2.014349 LR: 0.00001572 [14:30:39] Epoch: 1 Batch: 12975/20099 (64.56%) Loss: 2.078857 LR: 0.00001572 [14:30:41] Epoch: 1 Batch: 12976/20099 (64.56%) Loss: 2.113276 LR: 0.00001572 [14:30:43] Epoch: 1 Batch: 12977/20099 (64.57%) Loss: 1.958822 LR: 0.00001572 [14:30:45] Epoch: 1 Batch: 12978/20099 (64.57%) Loss: 1.940669 LR: 0.00001572 [14:30:47] Epoch: 1 Batch: 12979/20099 (64.58%) Loss: 2.317500 LR: 0.00001570 [14:30:48] Epoch: 1 Batch: 12980/20099 (64.58%) Loss: 2.242309 LR: 0.00001570 [14:30:50] Epoch: 1 Batch: 12981/20099 (64.59%) Loss: 2.114102 LR: 0.00001570 [14:30:52] Epoch: 1 Batch: 12982/20099 (64.59%) Loss: 1.841945 LR: 0.00001570 [14:30:54] Epoch: 1 Batch: 12983/20099 (64.60%) Loss: 2.250595 LR: 0.00001570 [14:30:56] Epoch: 1 Batch: 12984/20099 (64.60%) Loss: 1.975917 LR: 0.00001570 [14:30:58] Epoch: 1 Batch: 12985/20099 (64.61%) Loss: 2.284597 LR: 0.00001570 [14:30:59] Epoch: 1 Batch: 12986/20099 (64.61%) Loss: 2.287809 LR: 0.00001568 [14:31:01] Epoch: 1 Batch: 12987/20099 (64.62%) Loss: 2.382632 LR: 0.00001568 [14:31:03] Epoch: 1 Batch: 12988/20099 (64.62%) Loss: 2.066663 LR: 0.00001568 [14:31:05] Epoch: 1 Batch: 12989/20099 (64.63%) Loss: 2.328650 LR: 0.00001568 [14:31:07] Epoch: 1 Batch: 12990/20099 (64.63%) Loss: 2.069773 LR: 0.00001568 [14:31:09] Epoch: 1 Batch: 12991/20099 (64.64%) Loss: 2.189937 LR: 0.00001568 [14:31:11] Epoch: 1 Batch: 12992/20099 (64.64%) Loss: 2.081442 LR: 0.00001568 [14:31:12] Epoch: 1 Batch: 12993/20099 (64.65%) Loss: 2.052189 LR: 0.00001567 [14:31:14] Epoch: 1 Batch: 12994/20099 (64.65%) Loss: 2.345217 LR: 0.00001567 [14:31:16] Epoch: 1 Batch: 12995/20099 (64.65%) Loss: 2.032706 LR: 0.00001567 [14:31:18] Epoch: 1 Batch: 12996/20099 (64.66%) Loss: 2.087821 LR: 0.00001567 [14:31:20] Epoch: 1 Batch: 12997/20099 (64.66%) Loss: 2.083219 LR: 0.00001567 [14:31:22] Epoch: 1 Batch: 12998/20099 
(64.67%) Loss: 2.241676 LR: 0.00001567 [14:31:24] Epoch: 1 Batch: 12999/20099 (64.67%) Loss: 2.049608 LR: 0.00001567 [14:31:25] >> Evaluating batch 0 [14:31:27] >> Evaluating batch 1 [14:31:28] >> Evaluating batch 2 [14:31:29] >> Evaluating batch 3 [14:31:30] >> Evaluating batch 4 [14:31:31] >> Evaluating batch 5 [14:31:32] >> Evaluating batch 6 [14:31:33] >> Evaluating batch 7 [14:31:34] >> Evaluating batch 8 [14:31:35] >> Evaluating batch 9 [14:31:36] >> Evaluating batch 10 [14:31:37] >> Evaluating batch 11 [14:31:38] >> Evaluating batch 12 [14:31:39] >> Evaluating batch 13 [14:31:40] >> Evaluating batch 14 [14:31:41] >> Evaluating batch 15 [14:31:42] >> Evaluating batch 16 [14:31:43] Epoch: 1 Step: 13000/20099 Evaluation: [14:31:43] Avg Loss Since Last Eval: 2.1007 Val Loss: 2.1575 Validation loss delta: 0.0012 Perplexity: 8.6499 LR: 0.00001565 [14:31:47] >> Cleaned up old temp checkpoint: epoch1_step7200 [14:31:47] >> Temp checkpoint saved: epoch1_step13000, size: 0.1693 GB [14:31:50] >> Checkpoint saved: epoch1_step13000, size: 0.1693 GB [14:31:50] Epoch: 1 Batch: 13000/20099 (64.68%) Loss: 2.186511 LR: 0.00001565 [14:31:52] Epoch: 1 Batch: 13001/20099 (64.68%) Loss: 2.033415 LR: 0.00001565 [14:31:54] Epoch: 1 Batch: 13002/20099 (64.69%) Loss: 2.112689 LR: 0.00001565 [14:31:56] Epoch: 1 Batch: 13003/20099 (64.69%) Loss: 2.029428 LR: 0.00001565 [14:31:58] Epoch: 1 Batch: 13004/20099 (64.70%) Loss: 2.165260 LR: 0.00001565 [14:32:00] Epoch: 1 Batch: 13005/20099 (64.70%) Loss: 1.916878 LR: 0.00001565 [14:32:01] Epoch: 1 Batch: 13006/20099 (64.71%) Loss: 2.075658 LR: 0.00001565 [14:32:03] Epoch: 1 Batch: 13007/20099 (64.71%) Loss: 2.195780 LR: 0.00001564 [14:32:05] Epoch: 1 Batch: 13008/20099 (64.72%) Loss: 1.837087 LR: 0.00001564 [14:32:07] Epoch: 1 Batch: 13009/20099 (64.72%) Loss: 2.286971 LR: 0.00001564 [14:32:09] Epoch: 1 Batch: 13010/20099 (64.73%) Loss: 2.213863 LR: 0.00001564 [14:32:11] Epoch: 1 Batch: 13011/20099 (64.73%) Loss: 2.055955 LR: 0.00001564 
[14:32:13] Epoch: 1 Batch: 13012/20099 (64.74%) Loss: 1.917463 LR: 0.00001564 [14:32:15] Epoch: 1 Batch: 13013/20099 (64.74%) Loss: 2.159004 LR: 0.00001564 [14:32:17] Epoch: 1 Batch: 13014/20099 (64.75%) Loss: 2.422572 LR: 0.00001562 [14:32:18] Epoch: 1 Batch: 13015/20099 (64.75%) Loss: 2.172176 LR: 0.00001562 [14:32:20] Epoch: 1 Batch: 13016/20099 (64.76%) Loss: 2.096167 LR: 0.00001562 [14:32:22] Epoch: 1 Batch: 13017/20099 (64.76%) Loss: 1.715508 LR: 0.00001562 [14:32:24] Epoch: 1 Batch: 13018/20099 (64.77%) Loss: 2.088180 LR: 0.00001562 [14:32:26] Epoch: 1 Batch: 13019/20099 (64.77%) Loss: 2.170098 LR: 0.00001562 [14:32:28] Epoch: 1 Batch: 13020/20099 (64.78%) Loss: 2.085781 LR: 0.00001562 [14:32:30] Epoch: 1 Batch: 13021/20099 (64.78%) Loss: 2.156624 LR: 0.00001560 [14:32:32] Epoch: 1 Batch: 13022/20099 (64.79%) Loss: 1.931699 LR: 0.00001560 [14:32:34] Epoch: 1 Batch: 13023/20099 (64.79%) Loss: 2.058910 LR: 0.00001560 [14:32:35] Epoch: 1 Batch: 13024/20099 (64.80%) Loss: 2.491917 LR: 0.00001560 [14:32:37] Epoch: 1 Batch: 13025/20099 (64.80%) Loss: 2.247309 LR: 0.00001560 [14:32:39] Epoch: 1 Batch: 13026/20099 (64.81%) Loss: 1.844331 LR: 0.00001560 [14:32:41] Epoch: 1 Batch: 13027/20099 (64.81%) Loss: 1.976668 LR: 0.00001560 [14:32:43] Epoch: 1 Batch: 13028/20099 (64.82%) Loss: 2.279679 LR: 0.00001559 [14:32:45] Epoch: 1 Batch: 13029/20099 (64.82%) Loss: 2.088941 LR: 0.00001559 [14:32:47] Epoch: 1 Batch: 13030/20099 (64.83%) Loss: 2.250738 LR: 0.00001559 [14:32:48] Epoch: 1 Batch: 13031/20099 (64.83%) Loss: 2.222489 LR: 0.00001559 [14:32:50] Epoch: 1 Batch: 13032/20099 (64.84%) Loss: 1.802041 LR: 0.00001559 [14:32:52] Epoch: 1 Batch: 13033/20099 (64.84%) Loss: 2.233462 LR: 0.00001559 [14:32:54] Epoch: 1 Batch: 13034/20099 (64.85%) Loss: 2.507017 LR: 0.00001559 [14:32:56] Epoch: 1 Batch: 13035/20099 (64.85%) Loss: 2.402030 LR: 0.00001557 [14:32:58] Epoch: 1 Batch: 13036/20099 (64.86%) Loss: 1.921838 LR: 0.00001557 [14:32:59] Epoch: 1 Batch: 13037/20099 (64.86%) 
Loss: 1.977620 LR: 0.00001557 [14:33:01] Epoch: 1 Batch: 13038/20099 (64.87%) Loss: 2.160607 LR: 0.00001557 [14:33:03] Epoch: 1 Batch: 13039/20099 (64.87%) Loss: 2.067260 LR: 0.00001557 [14:33:05] Epoch: 1 Batch: 13040/20099 (64.88%) Loss: 2.273299 LR: 0.00001557 [14:33:07] Epoch: 1 Batch: 13041/20099 (64.88%) Loss: 1.867083 LR: 0.00001557 [14:33:09] Epoch: 1 Batch: 13042/20099 (64.89%) Loss: 2.235672 LR: 0.00001556 [14:33:11] Epoch: 1 Batch: 13043/20099 (64.89%) Loss: 2.163190 LR: 0.00001556 [14:33:12] Epoch: 1 Batch: 13044/20099 (64.90%) Loss: 2.124427 LR: 0.00001556 [14:33:14] Epoch: 1 Batch: 13045/20099 (64.90%) Loss: 2.142574 LR: 0.00001556 [14:33:16] Epoch: 1 Batch: 13046/20099 (64.91%) Loss: 1.950213 LR: 0.00001556 [14:33:18] Epoch: 1 Batch: 13047/20099 (64.91%) Loss: 2.321145 LR: 0.00001556 [14:33:20] Epoch: 1 Batch: 13048/20099 (64.92%) Loss: 1.954415 LR: 0.00001556 [14:33:22] Epoch: 1 Batch: 13049/20099 (64.92%) Loss: 2.272018 LR: 0.00001554 [14:33:23] Epoch: 1 Batch: 13050/20099 (64.93%) Loss: 2.270055 LR: 0.00001554 [14:33:25] Epoch: 1 Batch: 13051/20099 (64.93%) Loss: 1.668542 LR: 0.00001554 [14:33:27] Epoch: 1 Batch: 13052/20099 (64.94%) Loss: 2.115391 LR: 0.00001554 [14:33:29] Epoch: 1 Batch: 13053/20099 (64.94%) Loss: 2.282138 LR: 0.00001554 [14:33:31] Epoch: 1 Batch: 13054/20099 (64.95%) Loss: 1.971604 LR: 0.00001554 [14:33:33] Epoch: 1 Batch: 13055/20099 (64.95%) Loss: 2.478104 LR: 0.00001554 [14:33:35] Epoch: 1 Batch: 13056/20099 (64.96%) Loss: 1.961425 LR: 0.00001552 [14:33:36] Epoch: 1 Batch: 13057/20099 (64.96%) Loss: 2.103701 LR: 0.00001552 [14:33:38] Epoch: 1 Batch: 13058/20099 (64.97%) Loss: 2.378411 LR: 0.00001552 [14:33:40] Epoch: 1 Batch: 13059/20099 (64.97%) Loss: 2.075052 LR: 0.00001552 [14:33:42] Epoch: 1 Batch: 13060/20099 (64.98%) Loss: 2.053140 LR: 0.00001552 [14:33:44] Epoch: 1 Batch: 13061/20099 (64.98%) Loss: 1.981675 LR: 0.00001552 [14:33:46] Epoch: 1 Batch: 13062/20099 (64.99%) Loss: 2.080071 LR: 0.00001552 [14:33:48] Epoch: 1 
Batch: 13063/20099 (64.99%) Loss: 1.900683 LR: 0.00001551 [14:33:49] Epoch: 1 Batch: 13064/20099 (65.00%) Loss: 2.163860 LR: 0.00001551 [14:33:51] Epoch: 1 Batch: 13065/20099 (65.00%) Loss: 2.142650 LR: 0.00001551 [14:33:53] Epoch: 1 Batch: 13066/20099 (65.01%) Loss: 2.375175 LR: 0.00001551 [14:33:55] Epoch: 1 Batch: 13067/20099 (65.01%) Loss: 2.156680 LR: 0.00001551 [14:33:57] Epoch: 1 Batch: 13068/20099 (65.02%) Loss: 2.159161 LR: 0.00001551 [14:33:59] Epoch: 1 Batch: 13069/20099 (65.02%) Loss: 1.768512 LR: 0.00001551 [14:34:01] Epoch: 1 Batch: 13070/20099 (65.03%) Loss: 1.899040 LR: 0.00001549 [14:34:02] Epoch: 1 Batch: 13071/20099 (65.03%) Loss: 1.782808 LR: 0.00001549 [14:34:04] Epoch: 1 Batch: 13072/20099 (65.04%) Loss: 1.832714 LR: 0.00001549 [14:34:06] Epoch: 1 Batch: 13073/20099 (65.04%) Loss: 2.014294 LR: 0.00001549 [14:34:08] Epoch: 1 Batch: 13074/20099 (65.05%) Loss: 2.095278 LR: 0.00001549 [14:34:10] Epoch: 1 Batch: 13075/20099 (65.05%) Loss: 1.772875 LR: 0.00001549 [14:34:12] Epoch: 1 Batch: 13076/20099 (65.06%) Loss: 1.903954 LR: 0.00001549 [14:34:14] Epoch: 1 Batch: 13077/20099 (65.06%) Loss: 1.966766 LR: 0.00001548 [14:34:15] Epoch: 1 Batch: 13078/20099 (65.07%) Loss: 2.247721 LR: 0.00001548 [14:34:17] Epoch: 1 Batch: 13079/20099 (65.07%) Loss: 2.104852 LR: 0.00001548 [14:34:19] Epoch: 1 Batch: 13080/20099 (65.08%) Loss: 2.059314 LR: 0.00001548 [14:34:21] Epoch: 1 Batch: 13081/20099 (65.08%) Loss: 2.021200 LR: 0.00001548 [14:34:23] Epoch: 1 Batch: 13082/20099 (65.09%) Loss: 2.138544 LR: 0.00001548 [14:34:25] Epoch: 1 Batch: 13083/20099 (65.09%) Loss: 1.498118 LR: 0.00001548 [14:34:27] Epoch: 1 Batch: 13084/20099 (65.10%) Loss: 1.830529 LR: 0.00001546 [14:34:29] Epoch: 1 Batch: 13085/20099 (65.10%) Loss: 2.102938 LR: 0.00001546 [14:34:30] Epoch: 1 Batch: 13086/20099 (65.11%) Loss: 2.249313 LR: 0.00001546 [14:34:32] Epoch: 1 Batch: 13087/20099 (65.11%) Loss: 2.090057 LR: 0.00001546 [14:34:34] Epoch: 1 Batch: 13088/20099 (65.12%) Loss: 2.228157 LR: 
0.00001546 [14:34:36] Epoch: 1 Batch: 13089/20099 (65.12%) Loss: 2.036174 LR: 0.00001546 [14:34:38] Epoch: 1 Batch: 13090/20099 (65.13%) Loss: 1.994516 LR: 0.00001546 [14:34:40] Epoch: 1 Batch: 13091/20099 (65.13%) Loss: 1.809638 LR: 0.00001545 [14:34:42] Epoch: 1 Batch: 13092/20099 (65.14%) Loss: 1.995000 LR: 0.00001545 [14:34:43] Epoch: 1 Batch: 13093/20099 (65.14%) Loss: 2.216943 LR: 0.00001545 [14:34:45] Epoch: 1 Batch: 13094/20099 (65.15%) Loss: 1.844049 LR: 0.00001545 [14:34:47] Epoch: 1 Batch: 13095/20099 (65.15%) Loss: 2.117756 LR: 0.00001545 [14:34:49] Epoch: 1 Batch: 13096/20099 (65.16%) Loss: 1.993351 LR: 0.00001545 [14:34:51] Epoch: 1 Batch: 13097/20099 (65.16%) Loss: 2.306378 LR: 0.00001545 [14:34:53] Epoch: 1 Batch: 13098/20099 (65.17%) Loss: 2.019214 LR: 0.00001543 [14:34:55] Epoch: 1 Batch: 13099/20099 (65.17%) Loss: 2.094152 LR: 0.00001543 [14:34:56] Epoch: 1 Batch: 13100/20099 (65.18%) Loss: 2.001420 LR: 0.00001543 [14:34:58] Epoch: 1 Batch: 13101/20099 (65.18%) Loss: 1.913367 LR: 0.00001543 [14:35:00] Epoch: 1 Batch: 13102/20099 (65.19%) Loss: 1.980994 LR: 0.00001543 [14:35:02] Epoch: 1 Batch: 13103/20099 (65.19%) Loss: 2.107728 LR: 0.00001543 [14:35:04] Epoch: 1 Batch: 13104/20099 (65.20%) Loss: 2.009214 LR: 0.00001543 [14:35:06] Epoch: 1 Batch: 13105/20099 (65.20%) Loss: 1.908236 LR: 0.00001541 [14:35:08] Epoch: 1 Batch: 13106/20099 (65.21%) Loss: 2.094634 LR: 0.00001541 [14:35:09] Epoch: 1 Batch: 13107/20099 (65.21%) Loss: 2.001141 LR: 0.00001541 [14:35:11] Epoch: 1 Batch: 13108/20099 (65.22%) Loss: 2.228169 LR: 0.00001541 [14:35:13] Epoch: 1 Batch: 13109/20099 (65.22%) Loss: 2.373686 LR: 0.00001541 [14:35:15] Epoch: 1 Batch: 13110/20099 (65.23%) Loss: 2.072931 LR: 0.00001541 [14:35:17] Epoch: 1 Batch: 13111/20099 (65.23%) Loss: 2.016960 LR: 0.00001541 [14:35:19] Epoch: 1 Batch: 13112/20099 (65.24%) Loss: 1.740198 LR: 0.00001540 [14:35:20] Epoch: 1 Batch: 13113/20099 (65.24%) Loss: 2.035235 LR: 0.00001540 [14:35:22] Epoch: 1 Batch: 13114/20099 
(65.25%) Loss: 1.650130 LR: 0.00001540 [14:35:24] Epoch: 1 Batch: 13115/20099 (65.25%) Loss: 2.219318 LR: 0.00001540 [14:35:26] Epoch: 1 Batch: 13116/20099 (65.26%) Loss: 2.137151 LR: 0.00001540 [14:35:28] Epoch: 1 Batch: 13117/20099 (65.26%) Loss: 2.159455 LR: 0.00001540 [14:35:30] Epoch: 1 Batch: 13118/20099 (65.27%) Loss: 2.067186 LR: 0.00001540 [14:35:32] Epoch: 1 Batch: 13119/20099 (65.27%) Loss: 2.298682 LR: 0.00001538 [14:35:33] Epoch: 1 Batch: 13120/20099 (65.28%) Loss: 1.984197 LR: 0.00001538 [14:35:35] Epoch: 1 Batch: 13121/20099 (65.28%) Loss: 2.287044 LR: 0.00001538 [14:35:37] Epoch: 1 Batch: 13122/20099 (65.29%) Loss: 2.010224 LR: 0.00001538 [14:35:39] Epoch: 1 Batch: 13123/20099 (65.29%) Loss: 2.355015 LR: 0.00001538 [14:35:41] Epoch: 1 Batch: 13124/20099 (65.30%) Loss: 2.427701 LR: 0.00001538 [14:35:43] Epoch: 1 Batch: 13125/20099 (65.30%) Loss: 2.299504 LR: 0.00001538 [14:35:45] Epoch: 1 Batch: 13126/20099 (65.31%) Loss: 2.145265 LR: 0.00001537 [14:35:46] Epoch: 1 Batch: 13127/20099 (65.31%) Loss: 1.982587 LR: 0.00001537 [14:35:48] Epoch: 1 Batch: 13128/20099 (65.32%) Loss: 1.929771 LR: 0.00001537 [14:35:50] Epoch: 1 Batch: 13129/20099 (65.32%) Loss: 2.201361 LR: 0.00001537 [14:35:52] Epoch: 1 Batch: 13130/20099 (65.33%) Loss: 2.281821 LR: 0.00001537 [14:35:54] Epoch: 1 Batch: 13131/20099 (65.33%) Loss: 2.027696 LR: 0.00001537 [14:35:56] Epoch: 1 Batch: 13132/20099 (65.34%) Loss: 2.132603 LR: 0.00001537 [14:35:58] Epoch: 1 Batch: 13133/20099 (65.34%) Loss: 1.877525 LR: 0.00001535 [14:35:59] Epoch: 1 Batch: 13134/20099 (65.35%) Loss: 2.102915 LR: 0.00001535 [14:36:01] Epoch: 1 Batch: 13135/20099 (65.35%) Loss: 2.101993 LR: 0.00001535 [14:36:03] Epoch: 1 Batch: 13136/20099 (65.36%) Loss: 1.971623 LR: 0.00001535 [14:36:05] Epoch: 1 Batch: 13137/20099 (65.36%) Loss: 2.100583 LR: 0.00001535 [14:36:07] Epoch: 1 Batch: 13138/20099 (65.37%) Loss: 2.114788 LR: 0.00001535 [14:36:09] Epoch: 1 Batch: 13139/20099 (65.37%) Loss: 2.067682 LR: 0.00001535 [14:36:11] 
Epoch: 1 Batch: 13140/20099 (65.38%) Loss: 2.127916 LR: 0.00001533 [14:36:12] Epoch: 1 Batch: 13141/20099 (65.38%) Loss: 2.190855 LR: 0.00001533 [14:36:14] Epoch: 1 Batch: 13142/20099 (65.39%) Loss: 2.155585 LR: 0.00001533 [14:36:16] Epoch: 1 Batch: 13143/20099 (65.39%) Loss: 1.791536 LR: 0.00001533 [14:36:18] Epoch: 1 Batch: 13144/20099 (65.40%) Loss: 1.876237 LR: 0.00001533 [14:36:20] Epoch: 1 Batch: 13145/20099 (65.40%) Loss: 1.896307 LR: 0.00001533 [14:36:22] Epoch: 1 Batch: 13146/20099 (65.41%) Loss: 2.112759 LR: 0.00001533 [14:36:24] Epoch: 1 Batch: 13147/20099 (65.41%) Loss: 2.109152 LR: 0.00001532 [14:36:25] Epoch: 1 Batch: 13148/20099 (65.42%) Loss: 2.049245 LR: 0.00001532 [14:36:27] Epoch: 1 Batch: 13149/20099 (65.42%) Loss: 2.015254 LR: 0.00001532 [14:36:29] Epoch: 1 Batch: 13150/20099 (65.43%) Loss: 2.186172 LR: 0.00001532 [14:36:31] Epoch: 1 Batch: 13151/20099 (65.43%) Loss: 1.873394 LR: 0.00001532 [14:36:33] Epoch: 1 Batch: 13152/20099 (65.44%) Loss: 2.154361 LR: 0.00001532 [14:36:35] Epoch: 1 Batch: 13153/20099 (65.44%) Loss: 2.039216 LR: 0.00001532 [14:36:37] Epoch: 1 Batch: 13154/20099 (65.45%) Loss: 1.883165 LR: 0.00001530 [14:36:38] Epoch: 1 Batch: 13155/20099 (65.45%) Loss: 1.919197 LR: 0.00001530 [14:36:40] Epoch: 1 Batch: 13156/20099 (65.46%) Loss: 2.253483 LR: 0.00001530 [14:36:42] Epoch: 1 Batch: 13157/20099 (65.46%) Loss: 1.949835 LR: 0.00001530 [14:36:44] Epoch: 1 Batch: 13158/20099 (65.47%) Loss: 2.171381 LR: 0.00001530 [14:36:46] Epoch: 1 Batch: 13159/20099 (65.47%) Loss: 2.045382 LR: 0.00001530 [14:36:48] Epoch: 1 Batch: 13160/20099 (65.48%) Loss: 2.004151 LR: 0.00001530 [14:36:49] Epoch: 1 Batch: 13161/20099 (65.48%) Loss: 1.988157 LR: 0.00001529 [14:36:51] Epoch: 1 Batch: 13162/20099 (65.49%) Loss: 2.272771 LR: 0.00001529 [14:36:53] Epoch: 1 Batch: 13163/20099 (65.49%) Loss: 2.151926 LR: 0.00001529 [14:36:55] Epoch: 1 Batch: 13164/20099 (65.50%) Loss: 1.997819 LR: 0.00001529 [14:36:57] Epoch: 1 Batch: 13165/20099 (65.50%) Loss: 
1.723411 LR: 0.00001529 [14:36:59] Epoch: 1 Batch: 13166/20099 (65.51%) Loss: 1.979266 LR: 0.00001529 [14:37:00] Epoch: 1 Batch: 13167/20099 (65.51%) Loss: 1.925596 LR: 0.00001529 [14:37:02] Epoch: 1 Batch: 13168/20099 (65.52%) Loss: 2.063634 LR: 0.00001527 [14:37:04] Epoch: 1 Batch: 13169/20099 (65.52%) Loss: 2.227344 LR: 0.00001527 [14:37:06] Epoch: 1 Batch: 13170/20099 (65.53%) Loss: 2.047029 LR: 0.00001527 [14:37:08] Epoch: 1 Batch: 13171/20099 (65.53%) Loss: 2.148620 LR: 0.00001527 [14:37:10] Epoch: 1 Batch: 13172/20099 (65.54%) Loss: 2.307974 LR: 0.00001527 [14:37:12] Epoch: 1 Batch: 13173/20099 (65.54%) Loss: 2.017819 LR: 0.00001527 [14:37:13] Epoch: 1 Batch: 13174/20099 (65.55%) Loss: 2.151975 LR: 0.00001527 [14:37:15] Epoch: 1 Batch: 13175/20099 (65.55%) Loss: 1.955115 LR: 0.00001526 [14:37:17] Epoch: 1 Batch: 13176/20099 (65.56%) Loss: 1.775499 LR: 0.00001526 [14:37:19] Epoch: 1 Batch: 13177/20099 (65.56%) Loss: 2.089624 LR: 0.00001526 [14:37:21] Epoch: 1 Batch: 13178/20099 (65.57%) Loss: 2.186596 LR: 0.00001526 [14:37:23] Epoch: 1 Batch: 13179/20099 (65.57%) Loss: 1.924040 LR: 0.00001526 [14:37:25] Epoch: 1 Batch: 13180/20099 (65.58%) Loss: 1.902681 LR: 0.00001526 [14:37:26] Epoch: 1 Batch: 13181/20099 (65.58%) Loss: 2.139262 LR: 0.00001526 [14:37:28] Epoch: 1 Batch: 13182/20099 (65.59%) Loss: 2.223891 LR: 0.00001524 [14:37:30] Epoch: 1 Batch: 13183/20099 (65.59%) Loss: 2.135888 LR: 0.00001524 [14:37:32] Epoch: 1 Batch: 13184/20099 (65.60%) Loss: 2.089815 LR: 0.00001524 [14:37:34] Epoch: 1 Batch: 13185/20099 (65.60%) Loss: 2.113931 LR: 0.00001524 [14:37:36] Epoch: 1 Batch: 13186/20099 (65.61%) Loss: 2.242988 LR: 0.00001524 [14:37:38] Epoch: 1 Batch: 13187/20099 (65.61%) Loss: 2.129860 LR: 0.00001524 [14:37:40] Epoch: 1 Batch: 13188/20099 (65.62%) Loss: 2.267756 LR: 0.00001524 [14:37:41] Epoch: 1 Batch: 13189/20099 (65.62%) Loss: 2.125868 LR: 0.00001522 [14:37:43] Epoch: 1 Batch: 13190/20099 (65.63%) Loss: 1.897566 LR: 0.00001522 [14:37:45] Epoch: 1 
Batch: 13191/20099 (65.63%) Loss: 1.897878 LR: 0.00001522 [14:37:47] Epoch: 1 Batch: 13192/20099 (65.64%) Loss: 2.139117 LR: 0.00001522 [14:37:49] Epoch: 1 Batch: 13193/20099 (65.64%) Loss: 1.963516 LR: 0.00001522 [14:37:51] Epoch: 1 Batch: 13194/20099 (65.65%) Loss: 2.012357 LR: 0.00001522 [14:37:53] Epoch: 1 Batch: 13195/20099 (65.65%) Loss: 2.088510 LR: 0.00001522 [14:37:54] Epoch: 1 Batch: 13196/20099 (65.66%) Loss: 2.325826 LR: 0.00001521 [14:37:56] Epoch: 1 Batch: 13197/20099 (65.66%) Loss: 1.917570 LR: 0.00001521 [14:37:58] Epoch: 1 Batch: 13198/20099 (65.66%) Loss: 2.124802 LR: 0.00001521 [14:38:00] Epoch: 1 Batch: 13199/20099 (65.67%) Loss: 2.210272 LR: 0.00001521 [14:38:06] >> Cleaned up old temp checkpoint: epoch1_step7400 [14:38:06] >> Temp checkpoint saved: epoch1_step13200, size: 0.1693 GB [14:38:06] Epoch: 1 Batch: 13200/20099 (65.67%) Loss: 2.100329 LR: 0.00001521 [14:38:07] Epoch: 1 Batch: 13201/20099 (65.68%) Loss: 2.164722 LR: 0.00001521 [14:38:09] Epoch: 1 Batch: 13202/20099 (65.68%) Loss: 1.989659 LR: 0.00001521 [14:38:11] Epoch: 1 Batch: 13203/20099 (65.69%) Loss: 2.001170 LR: 0.00001519 [14:38:13] Epoch: 1 Batch: 13204/20099 (65.69%) Loss: 1.948820 LR: 0.00001519 [14:38:15] Epoch: 1 Batch: 13205/20099 (65.70%) Loss: 2.014765 LR: 0.00001519 [14:38:17] Epoch: 1 Batch: 13206/20099 (65.70%) Loss: 2.128066 LR: 0.00001519 [14:38:18] Epoch: 1 Batch: 13207/20099 (65.71%) Loss: 2.099763 LR: 0.00001519 [14:38:20] Epoch: 1 Batch: 13208/20099 (65.71%) Loss: 2.333603 LR: 0.00001519 [14:38:22] Epoch: 1 Batch: 13209/20099 (65.72%) Loss: 2.169842 LR: 0.00001519 [14:38:24] Epoch: 1 Batch: 13210/20099 (65.72%) Loss: 1.772969 LR: 0.00001518 [14:38:26] Epoch: 1 Batch: 13211/20099 (65.73%) Loss: 2.020803 LR: 0.00001518 [14:38:28] Epoch: 1 Batch: 13212/20099 (65.73%) Loss: 1.961950 LR: 0.00001518 [14:38:30] Epoch: 1 Batch: 13213/20099 (65.74%) Loss: 2.306063 LR: 0.00001518 [14:38:32] Epoch: 1 Batch: 13214/20099 (65.74%) Loss: 2.069408 LR: 0.00001518 [14:38:34] 
Epoch: 1 Batch: 13215/20099 (65.75%) Loss: 2.162756 LR: 0.00001518 [14:38:35] Epoch: 1 Batch: 13216/20099 (65.75%) Loss: 1.959623 LR: 0.00001518 [14:38:37] Epoch: 1 Batch: 13217/20099 (65.76%) Loss: 2.033809 LR: 0.00001516 [14:38:39] Epoch: 1 Batch: 13218/20099 (65.76%) Loss: 2.224956 LR: 0.00001516 [14:38:41] Epoch: 1 Batch: 13219/20099 (65.77%) Loss: 2.112262 LR: 0.00001516 [14:38:43] Epoch: 1 Batch: 13220/20099 (65.77%) Loss: 2.081476 LR: 0.00001516 [14:38:45] Epoch: 1 Batch: 13221/20099 (65.78%) Loss: 1.982050 LR: 0.00001516 [14:38:47] Epoch: 1 Batch: 13222/20099 (65.78%) Loss: 2.275592 LR: 0.00001516 [14:38:49] Epoch: 1 Batch: 13223/20099 (65.79%) Loss: 1.843499 LR: 0.00001516 [14:38:50] Epoch: 1 Batch: 13224/20099 (65.79%) Loss: 2.497805 LR: 0.00001514 [14:38:52] Epoch: 1 Batch: 13225/20099 (65.80%) Loss: 1.684174 LR: 0.00001514 [14:38:54] Epoch: 1 Batch: 13226/20099 (65.80%) Loss: 2.239550 LR: 0.00001514 [14:38:56] Epoch: 1 Batch: 13227/20099 (65.81%) Loss: 2.155401 LR: 0.00001514 [14:38:58] Epoch: 1 Batch: 13228/20099 (65.81%) Loss: 1.905514 LR: 0.00001514 [14:39:00] Epoch: 1 Batch: 13229/20099 (65.82%) Loss: 2.360881 LR: 0.00001514 [14:39:02] Epoch: 1 Batch: 13230/20099 (65.82%) Loss: 2.545454 LR: 0.00001514 [14:39:03] Epoch: 1 Batch: 13231/20099 (65.83%) Loss: 2.210158 LR: 0.00001513 [14:39:05] Epoch: 1 Batch: 13232/20099 (65.83%) Loss: 2.140237 LR: 0.00001513 [14:39:07] Epoch: 1 Batch: 13233/20099 (65.84%) Loss: 2.166386 LR: 0.00001513 [14:39:09] Epoch: 1 Batch: 13234/20099 (65.84%) Loss: 2.046569 LR: 0.00001513 [14:39:11] Epoch: 1 Batch: 13235/20099 (65.85%) Loss: 1.923811 LR: 0.00001513 [14:39:13] Epoch: 1 Batch: 13236/20099 (65.85%) Loss: 2.382309 LR: 0.00001513 [14:39:14] Epoch: 1 Batch: 13237/20099 (65.86%) Loss: 1.942813 LR: 0.00001513 [14:39:16] Epoch: 1 Batch: 13238/20099 (65.86%) Loss: 1.900930 LR: 0.00001511 [14:39:18] Epoch: 1 Batch: 13239/20099 (65.87%) Loss: 1.799536 LR: 0.00001511 [14:39:20] Epoch: 1 Batch: 13240/20099 (65.87%) Loss: 
2.316589 LR: 0.00001511 [14:39:22] Epoch: 1 Batch: 13241/20099 (65.88%) Loss: 2.239168 LR: 0.00001511 [14:39:24] Epoch: 1 Batch: 13242/20099 (65.88%) Loss: 2.129160 LR: 0.00001511 [14:39:26] Epoch: 1 Batch: 13243/20099 (65.89%) Loss: 2.177190 LR: 0.00001511 [14:39:27] Epoch: 1 Batch: 13244/20099 (65.89%) Loss: 2.101856 LR: 0.00001511 [14:39:29] Epoch: 1 Batch: 13245/20099 (65.90%) Loss: 1.795748 LR: 0.00001510 [14:39:31] Epoch: 1 Batch: 13246/20099 (65.90%) Loss: 1.982189 LR: 0.00001510 [14:39:33] Epoch: 1 Batch: 13247/20099 (65.91%) Loss: 2.391403 LR: 0.00001510 [14:39:35] Epoch: 1 Batch: 13248/20099 (65.91%) Loss: 2.282059 LR: 0.00001510 [14:39:37] Epoch: 1 Batch: 13249/20099 (65.92%) Loss: 2.072038 LR: 0.00001510 [14:39:39] Epoch: 1 Batch: 13250/20099 (65.92%) Loss: 2.127013 LR: 0.00001510 [14:39:40] Epoch: 1 Batch: 13251/20099 (65.93%) Loss: 2.228413 LR: 0.00001510 [14:39:42] Epoch: 1 Batch: 13252/20099 (65.93%) Loss: 2.234942 LR: 0.00001508 [14:39:44] Epoch: 1 Batch: 13253/20099 (65.94%) Loss: 1.952884 LR: 0.00001508 [14:39:46] Epoch: 1 Batch: 13254/20099 (65.94%) Loss: 2.025196 LR: 0.00001508 [14:39:48] Epoch: 1 Batch: 13255/20099 (65.95%) Loss: 1.776057 LR: 0.00001508 [14:39:50] Epoch: 1 Batch: 13256/20099 (65.95%) Loss: 2.288254 LR: 0.00001508 [14:39:52] Epoch: 1 Batch: 13257/20099 (65.96%) Loss: 2.095778 LR: 0.00001508 [14:39:53] Epoch: 1 Batch: 13258/20099 (65.96%) Loss: 1.798327 LR: 0.00001508 [14:39:55] Epoch: 1 Batch: 13259/20099 (65.97%) Loss: 2.149096 LR: 0.00001507 [14:39:57] Epoch: 1 Batch: 13260/20099 (65.97%) Loss: 1.935918 LR: 0.00001507 [14:39:59] Epoch: 1 Batch: 13261/20099 (65.98%) Loss: 2.147200 LR: 0.00001507 [14:40:01] Epoch: 1 Batch: 13262/20099 (65.98%) Loss: 2.083940 LR: 0.00001507 [14:40:03] Epoch: 1 Batch: 13263/20099 (65.99%) Loss: 1.889767 LR: 0.00001507 [14:40:05] Epoch: 1 Batch: 13264/20099 (65.99%) Loss: 1.968810 LR: 0.00001507 [14:40:06] Epoch: 1 Batch: 13265/20099 (66.00%) Loss: 2.424130 LR: 0.00001507 [14:40:08] Epoch: 1 
Batch: 13266/20099 (66.00%) Loss: 1.941973 LR: 0.00001505 [14:40:10] Epoch: 1 Batch: 13267/20099 (66.01%) Loss: 2.026785 LR: 0.00001505 [14:40:12] Epoch: 1 Batch: 13268/20099 (66.01%) Loss: 2.004981 LR: 0.00001505 [14:40:14] Epoch: 1 Batch: 13269/20099 (66.02%) Loss: 1.995591 LR: 0.00001505 [14:40:16] Epoch: 1 Batch: 13270/20099 (66.02%) Loss: 2.053522 LR: 0.00001505 [14:40:18] Epoch: 1 Batch: 13271/20099 (66.03%) Loss: 2.215240 LR: 0.00001505 [14:40:19] Epoch: 1 Batch: 13272/20099 (66.03%) Loss: 2.192187 LR: 0.00001505 [14:40:21] Epoch: 1 Batch: 13273/20099 (66.04%) Loss: 2.298878 LR: 0.00001503 [14:40:23] Epoch: 1 Batch: 13274/20099 (66.04%) Loss: 2.034941 LR: 0.00001503 [14:40:25] Epoch: 1 Batch: 13275/20099 (66.05%) Loss: 1.986762 LR: 0.00001503 [14:40:27] Epoch: 1 Batch: 13276/20099 (66.05%) Loss: 1.991832 LR: 0.00001503 [14:40:29] Epoch: 1 Batch: 13277/20099 (66.06%) Loss: 2.264936 LR: 0.00001503 [14:40:31] Epoch: 1 Batch: 13278/20099 (66.06%) Loss: 2.404581 LR: 0.00001503 [14:40:32] Epoch: 1 Batch: 13279/20099 (66.07%) Loss: 1.853590 LR: 0.00001503 [14:40:34] Epoch: 1 Batch: 13280/20099 (66.07%) Loss: 2.103401 LR: 0.00001502 [14:40:36] Epoch: 1 Batch: 13281/20099 (66.08%) Loss: 2.077523 LR: 0.00001502 [14:40:38] Epoch: 1 Batch: 13282/20099 (66.08%) Loss: 2.057067 LR: 0.00001502 [14:40:40] Epoch: 1 Batch: 13283/20099 (66.09%) Loss: 1.807791 LR: 0.00001502 [14:40:42] Epoch: 1 Batch: 13284/20099 (66.09%) Loss: 1.867887 LR: 0.00001502 [14:40:44] Epoch: 1 Batch: 13285/20099 (66.10%) Loss: 2.179462 LR: 0.00001502 [14:40:46] Epoch: 1 Batch: 13286/20099 (66.10%) Loss: 2.035606 LR: 0.00001502 [14:40:47] Epoch: 1 Batch: 13287/20099 (66.11%) Loss: 1.995253 LR: 0.00001500 [14:40:49] Epoch: 1 Batch: 13288/20099 (66.11%) Loss: 1.916707 LR: 0.00001500 [14:40:51] Epoch: 1 Batch: 13289/20099 (66.12%) Loss: 1.984780 LR: 0.00001500 [14:40:53] Epoch: 1 Batch: 13290/20099 (66.12%) Loss: 1.978468 LR: 0.00001500 [14:40:55] Epoch: 1 Batch: 13291/20099 (66.13%) Loss: 2.129000 LR: 
0.00001500 [14:40:57] Epoch: 1 Batch: 13292/20099 (66.13%) Loss: 2.009662 LR: 0.00001500 [14:40:58] Epoch: 1 Batch: 13293/20099 (66.14%) Loss: 2.284569 LR: 0.00001500 [14:41:00] Epoch: 1 Batch: 13294/20099 (66.14%) Loss: 2.032733 LR: 0.00001499 [14:41:02] Epoch: 1 Batch: 13295/20099 (66.15%) Loss: 2.197347 LR: 0.00001499 [14:41:04] Epoch: 1 Batch: 13296/20099 (66.15%) Loss: 2.106954 LR: 0.00001499 [14:41:06] Epoch: 1 Batch: 13297/20099 (66.16%) Loss: 1.755550 LR: 0.00001499 [14:41:08] Epoch: 1 Batch: 13298/20099 (66.16%) Loss: 1.694667 LR: 0.00001499 [14:41:10] Epoch: 1 Batch: 13299/20099 (66.17%) Loss: 2.054538 LR: 0.00001499 [14:41:11] Epoch: 1 Batch: 13300/20099 (66.17%) Loss: 2.303781 LR: 0.00001499 [14:41:13] Epoch: 1 Batch: 13301/20099 (66.18%) Loss: 2.338343 LR: 0.00001497 [14:41:15] Epoch: 1 Batch: 13302/20099 (66.18%) Loss: 1.930414 LR: 0.00001497 [14:41:17] Epoch: 1 Batch: 13303/20099 (66.19%) Loss: 2.263451 LR: 0.00001497 [14:41:19] Epoch: 1 Batch: 13304/20099 (66.19%) Loss: 2.025279 LR: 0.00001497 [14:41:21] Epoch: 1 Batch: 13305/20099 (66.20%) Loss: 1.967745 LR: 0.00001497 [14:41:22] Epoch: 1 Batch: 13306/20099 (66.20%) Loss: 2.429386 LR: 0.00001497 [14:41:24] Epoch: 1 Batch: 13307/20099 (66.21%) Loss: 2.230625 LR: 0.00001497 [14:41:26] Epoch: 1 Batch: 13308/20099 (66.21%) Loss: 2.096977 LR: 0.00001496 [14:41:28] Epoch: 1 Batch: 13309/20099 (66.22%) Loss: 2.336501 LR: 0.00001496 [14:41:30] Epoch: 1 Batch: 13310/20099 (66.22%) Loss: 2.312783 LR: 0.00001496 [14:41:32] Epoch: 1 Batch: 13311/20099 (66.23%) Loss: 2.071438 LR: 0.00001496 [14:41:34] Epoch: 1 Batch: 13312/20099 (66.23%) Loss: 2.379458 LR: 0.00001496 [14:41:35] Epoch: 1 Batch: 13313/20099 (66.24%) Loss: 2.158523 LR: 0.00001496 [14:41:37] Epoch: 1 Batch: 13314/20099 (66.24%) Loss: 1.903454 LR: 0.00001496 [14:41:39] Epoch: 1 Batch: 13315/20099 (66.25%) Loss: 1.839971 LR: 0.00001494 [14:41:41] Epoch: 1 Batch: 13316/20099 (66.25%) Loss: 2.039931 LR: 0.00001494 [14:41:43] Epoch: 1 Batch: 13317/20099 
(66.26%) Loss: 1.767747 LR: 0.00001494 [14:41:45] Epoch: 1 Batch: 13318/20099 (66.26%) Loss: 1.980087 LR: 0.00001494 [14:41:47] Epoch: 1 Batch: 13319/20099 (66.27%) Loss: 2.074606 LR: 0.00001494 [14:41:48] Epoch: 1 Batch: 13320/20099 (66.27%) Loss: 2.111281 LR: 0.00001494 [14:41:50] Epoch: 1 Batch: 13321/20099 (66.28%) Loss: 2.273560 LR: 0.00001494 [14:41:52] Epoch: 1 Batch: 13322/20099 (66.28%) Loss: 2.181362 LR: 0.00001492 [14:41:54] Epoch: 1 Batch: 13323/20099 (66.29%) Loss: 2.243356 LR: 0.00001492 [14:41:56] Epoch: 1 Batch: 13324/20099 (66.29%) Loss: 2.121855 LR: 0.00001492 [14:41:58] Epoch: 1 Batch: 13325/20099 (66.30%) Loss: 2.167868 LR: 0.00001492 [14:42:00] Epoch: 1 Batch: 13326/20099 (66.30%) Loss: 1.829316 LR: 0.00001492 [14:42:01] Epoch: 1 Batch: 13327/20099 (66.31%) Loss: 1.829332 LR: 0.00001492 [14:42:03] Epoch: 1 Batch: 13328/20099 (66.31%) Loss: 2.210718 LR: 0.00001492 [14:42:05] Epoch: 1 Batch: 13329/20099 (66.32%) Loss: 2.150987 LR: 0.00001491 [14:42:07] Epoch: 1 Batch: 13330/20099 (66.32%) Loss: 2.288572 LR: 0.00001491 [14:42:09] Epoch: 1 Batch: 13331/20099 (66.33%) Loss: 1.981042 LR: 0.00001491 [14:42:11] Epoch: 1 Batch: 13332/20099 (66.33%) Loss: 2.221025 LR: 0.00001491 [14:42:13] Epoch: 1 Batch: 13333/20099 (66.34%) Loss: 2.110627 LR: 0.00001491 [14:42:14] Epoch: 1 Batch: 13334/20099 (66.34%) Loss: 2.168762 LR: 0.00001491 [14:42:16] Epoch: 1 Batch: 13335/20099 (66.35%) Loss: 1.816602 LR: 0.00001491 [14:42:18] Epoch: 1 Batch: 13336/20099 (66.35%) Loss: 2.223041 LR: 0.00001489 [14:42:20] Epoch: 1 Batch: 13337/20099 (66.36%) Loss: 2.069530 LR: 0.00001489 [14:42:22] Epoch: 1 Batch: 13338/20099 (66.36%) Loss: 2.031565 LR: 0.00001489 [14:42:24] Epoch: 1 Batch: 13339/20099 (66.37%) Loss: 2.159914 LR: 0.00001489 [14:42:26] Epoch: 1 Batch: 13340/20099 (66.37%) Loss: 2.183666 LR: 0.00001489 [14:42:27] Epoch: 1 Batch: 13341/20099 (66.38%) Loss: 2.094838 LR: 0.00001489 [14:42:29] Epoch: 1 Batch: 13342/20099 (66.38%) Loss: 2.000820 LR: 0.00001489 [14:42:31] 
Epoch: 1 Batch: 13343/20099 (66.39%) Loss: 1.890665 LR: 0.00001488 [14:42:33] Epoch: 1 Batch: 13344/20099 (66.39%) Loss: 2.388323 LR: 0.00001488 [14:42:35] Epoch: 1 Batch: 13345/20099 (66.40%) Loss: 2.242734 LR: 0.00001488 [14:42:37] Epoch: 1 Batch: 13346/20099 (66.40%) Loss: 1.852429 LR: 0.00001488 [14:42:38] Epoch: 1 Batch: 13347/20099 (66.41%) Loss: 2.160745 LR: 0.00001488 [14:42:40] Epoch: 1 Batch: 13348/20099 (66.41%) Loss: 2.057290 LR: 0.00001488 [14:42:42] Epoch: 1 Batch: 13349/20099 (66.42%) Loss: 2.008163 LR: 0.00001488 [14:42:44] Epoch: 1 Batch: 13350/20099 (66.42%) Loss: 1.995487 LR: 0.00001486 [14:42:46] Epoch: 1 Batch: 13351/20099 (66.43%) Loss: 1.992618 LR: 0.00001486 [14:42:48] Epoch: 1 Batch: 13352/20099 (66.43%) Loss: 1.784879 LR: 0.00001486 [14:42:50] Epoch: 1 Batch: 13353/20099 (66.44%) Loss: 1.997287 LR: 0.00001486 [14:42:51] Epoch: 1 Batch: 13354/20099 (66.44%) Loss: 2.358819 LR: 0.00001486 [14:42:53] Epoch: 1 Batch: 13355/20099 (66.45%) Loss: 2.384898 LR: 0.00001486 [14:42:55] Epoch: 1 Batch: 13356/20099 (66.45%) Loss: 1.859201 LR: 0.00001486 [14:42:57] Epoch: 1 Batch: 13357/20099 (66.46%) Loss: 2.176084 LR: 0.00001485 [14:42:59] Epoch: 1 Batch: 13358/20099 (66.46%) Loss: 2.058954 LR: 0.00001485 [14:43:01] Epoch: 1 Batch: 13359/20099 (66.47%) Loss: 2.127850 LR: 0.00001485 [14:43:02] Epoch: 1 Batch: 13360/20099 (66.47%) Loss: 2.171948 LR: 0.00001485 [14:43:04] Epoch: 1 Batch: 13361/20099 (66.48%) Loss: 2.267786 LR: 0.00001485 [14:43:06] Epoch: 1 Batch: 13362/20099 (66.48%) Loss: 2.175216 LR: 0.00001485 [14:43:08] Epoch: 1 Batch: 13363/20099 (66.49%) Loss: 2.209704 LR: 0.00001485 [14:43:10] Epoch: 1 Batch: 13364/20099 (66.49%) Loss: 2.323904 LR: 0.00001483 [14:43:12] Epoch: 1 Batch: 13365/20099 (66.50%) Loss: 2.060765 LR: 0.00001483 [14:43:14] Epoch: 1 Batch: 13366/20099 (66.50%) Loss: 2.176801 LR: 0.00001483 [14:43:15] Epoch: 1 Batch: 13367/20099 (66.51%) Loss: 2.266035 LR: 0.00001483 [14:43:17] Epoch: 1 Batch: 13368/20099 (66.51%) Loss: 
2.115610 LR: 0.00001483 [14:43:19] Epoch: 1 Batch: 13369/20099 (66.52%) Loss: 2.231431 LR: 0.00001483 [14:43:21] Epoch: 1 Batch: 13370/20099 (66.52%) Loss: 2.277613 LR: 0.00001483 [14:43:23] Epoch: 1 Batch: 13371/20099 (66.53%) Loss: 2.158147 LR: 0.00001481 [14:43:25] Epoch: 1 Batch: 13372/20099 (66.53%) Loss: 2.118618 LR: 0.00001481 [14:43:27] Epoch: 1 Batch: 13373/20099 (66.54%) Loss: 1.765904 LR: 0.00001481 [14:43:29] Epoch: 1 Batch: 13374/20099 (66.54%) Loss: 1.926446 LR: 0.00001481 [14:43:30] Epoch: 1 Batch: 13375/20099 (66.55%) Loss: 2.176546 LR: 0.00001481 [14:43:32] Epoch: 1 Batch: 13376/20099 (66.55%) Loss: 2.009362 LR: 0.00001481 [14:43:34] Epoch: 1 Batch: 13377/20099 (66.56%) Loss: 1.979883 LR: 0.00001481 [14:43:36] Epoch: 1 Batch: 13378/20099 (66.56%) Loss: 2.275742 LR: 0.00001480 [14:43:38] Epoch: 1 Batch: 13379/20099 (66.57%) Loss: 2.205226 LR: 0.00001480 [14:43:40] Epoch: 1 Batch: 13380/20099 (66.57%) Loss: 1.986157 LR: 0.00001480 [14:43:42] Epoch: 1 Batch: 13381/20099 (66.58%) Loss: 2.037007 LR: 0.00001480 [14:43:43] Epoch: 1 Batch: 13382/20099 (66.58%) Loss: 1.993940 LR: 0.00001480 [14:43:45] Epoch: 1 Batch: 13383/20099 (66.59%) Loss: 1.849427 LR: 0.00001480 [14:43:47] Epoch: 1 Batch: 13384/20099 (66.59%) Loss: 1.788118 LR: 0.00001480 [14:43:49] Epoch: 1 Batch: 13385/20099 (66.60%) Loss: 2.277706 LR: 0.00001478 [14:43:51] Epoch: 1 Batch: 13386/20099 (66.60%) Loss: 2.305454 LR: 0.00001478 [14:43:53] Epoch: 1 Batch: 13387/20099 (66.61%) Loss: 2.110665 LR: 0.00001478 [14:43:55] Epoch: 1 Batch: 13388/20099 (66.61%) Loss: 2.121770 LR: 0.00001478 [14:43:56] Epoch: 1 Batch: 13389/20099 (66.62%) Loss: 2.005130 LR: 0.00001478 [14:43:58] Epoch: 1 Batch: 13390/20099 (66.62%) Loss: 2.416057 LR: 0.00001478 [14:44:00] Epoch: 1 Batch: 13391/20099 (66.63%) Loss: 2.057936 LR: 0.00001478 [14:44:02] Epoch: 1 Batch: 13392/20099 (66.63%) Loss: 2.136729 LR: 0.00001477 [14:44:04] Epoch: 1 Batch: 13393/20099 (66.64%) Loss: 2.028944 LR: 0.00001477 [14:44:06] Epoch: 1 
Batch: 13394/20099 (66.64%) Loss: 2.085351 LR: 0.00001477 [14:44:07] Epoch: 1 Batch: 13395/20099 (66.65%) Loss: 2.093023 LR: 0.00001477 [14:44:09] Epoch: 1 Batch: 13396/20099 (66.65%) Loss: 1.930779 LR: 0.00001477 [14:44:11] Epoch: 1 Batch: 13397/20099 (66.66%) Loss: 2.247925 LR: 0.00001477 [14:44:13] Epoch: 1 Batch: 13398/20099 (66.66%) Loss: 2.240807 LR: 0.00001477 [14:44:15] Epoch: 1 Batch: 13399/20099 (66.67%) Loss: 2.086411 LR: 0.00001475 [14:44:20] >> Cleaned up old temp checkpoint: epoch1_step7600 [14:44:20] >> Temp checkpoint saved: epoch1_step13400, size: 0.1693 GB [14:44:20] Epoch: 1 Batch: 13400/20099 (66.67%) Loss: 2.135127 LR: 0.00001475 [14:44:22] Epoch: 1 Batch: 13401/20099 (66.67%) Loss: 1.793053 LR: 0.00001475 [14:44:24] Epoch: 1 Batch: 13402/20099 (66.68%) Loss: 2.021641 LR: 0.00001475 [14:44:26] Epoch: 1 Batch: 13403/20099 (66.68%) Loss: 2.265216 LR: 0.00001475 [14:44:28] Epoch: 1 Batch: 13404/20099 (66.69%) Loss: 2.333460 LR: 0.00001475 [14:44:30] Epoch: 1 Batch: 13405/20099 (66.69%) Loss: 2.192636 LR: 0.00001475 [14:44:32] Epoch: 1 Batch: 13406/20099 (66.70%) Loss: 2.081510 LR: 0.00001474 [14:44:33] Epoch: 1 Batch: 13407/20099 (66.70%) Loss: 2.191995 LR: 0.00001474 [14:44:35] Epoch: 1 Batch: 13408/20099 (66.71%) Loss: 2.086858 LR: 0.00001474 [14:44:37] Epoch: 1 Batch: 13409/20099 (66.71%) Loss: 2.146316 LR: 0.00001474 [14:44:39] Epoch: 1 Batch: 13410/20099 (66.72%) Loss: 1.925659 LR: 0.00001474 [14:44:41] Epoch: 1 Batch: 13411/20099 (66.72%) Loss: 2.127150 LR: 0.00001474 [14:44:43] Epoch: 1 Batch: 13412/20099 (66.73%) Loss: 1.910194 LR: 0.00001474 [14:44:45] Epoch: 1 Batch: 13413/20099 (66.73%) Loss: 2.036099 LR: 0.00001472 [14:44:47] Epoch: 1 Batch: 13414/20099 (66.74%) Loss: 2.062120 LR: 0.00001472 [14:44:49] Epoch: 1 Batch: 13415/20099 (66.74%) Loss: 2.202031 LR: 0.00001472 [14:44:50] Epoch: 1 Batch: 13416/20099 (66.75%) Loss: 2.026764 LR: 0.00001472 [14:44:52] Epoch: 1 Batch: 13417/20099 (66.75%) Loss: 2.123771 LR: 0.00001472 [14:44:54] 
Epoch: 1 Batch: 13418/20099 (66.76%) Loss: 1.684933 LR: 0.00001472 [14:44:56] Epoch: 1 Batch: 13419/20099 (66.76%) Loss: 2.046165 LR: 0.00001472 [14:44:58] Epoch: 1 Batch: 13420/20099 (66.77%) Loss: 2.168466 LR: 0.00001471 [14:45:00] Epoch: 1 Batch: 13421/20099 (66.77%) Loss: 1.825064 LR: 0.00001471 [14:45:02] Epoch: 1 Batch: 13422/20099 (66.78%) Loss: 2.148528 LR: 0.00001471 [14:45:03] Epoch: 1 Batch: 13423/20099 (66.78%) Loss: 2.297801 LR: 0.00001471 [14:45:05] Epoch: 1 Batch: 13424/20099 (66.79%) Loss: 2.136005 LR: 0.00001471 [14:45:07] Epoch: 1 Batch: 13425/20099 (66.79%) Loss: 1.755471 LR: 0.00001471 [14:45:09] Epoch: 1 Batch: 13426/20099 (66.80%) Loss: 2.222122 LR: 0.00001471 [14:45:11] Epoch: 1 Batch: 13427/20099 (66.80%) Loss: 2.453722 LR: 0.00001469 [14:45:13] Epoch: 1 Batch: 13428/20099 (66.81%) Loss: 2.042207 LR: 0.00001469 [14:45:15] Epoch: 1 Batch: 13429/20099 (66.81%) Loss: 2.144678 LR: 0.00001469 [14:45:16] Epoch: 1 Batch: 13430/20099 (66.82%) Loss: 2.148500 LR: 0.00001469 [14:45:18] Epoch: 1 Batch: 13431/20099 (66.82%) Loss: 2.146038 LR: 0.00001469 [14:45:20] Epoch: 1 Batch: 13432/20099 (66.83%) Loss: 2.038846 LR: 0.00001469 [14:45:22] Epoch: 1 Batch: 13433/20099 (66.83%) Loss: 2.096416 LR: 0.00001469 [14:45:24] Epoch: 1 Batch: 13434/20099 (66.84%) Loss: 2.089402 LR: 0.00001467 [14:45:26] Epoch: 1 Batch: 13435/20099 (66.84%) Loss: 2.266926 LR: 0.00001467 [14:45:28] Epoch: 1 Batch: 13436/20099 (66.85%) Loss: 1.930270 LR: 0.00001467 [14:45:29] Epoch: 1 Batch: 13437/20099 (66.85%) Loss: 1.991151 LR: 0.00001467 [14:45:31] Epoch: 1 Batch: 13438/20099 (66.86%) Loss: 2.178125 LR: 0.00001467 [14:45:33] Epoch: 1 Batch: 13439/20099 (66.86%) Loss: 2.182286 LR: 0.00001467 [14:45:35] Epoch: 1 Batch: 13440/20099 (66.87%) Loss: 1.984294 LR: 0.00001467 [14:45:37] Epoch: 1 Batch: 13441/20099 (66.87%) Loss: 2.205653 LR: 0.00001466 [14:45:39] Epoch: 1 Batch: 13442/20099 (66.88%) Loss: 1.954536 LR: 0.00001466 [14:45:40] Epoch: 1 Batch: 13443/20099 (66.88%) Loss: 
1.910935 LR: 0.00001466 [14:45:42] Epoch: 1 Batch: 13444/20099 (66.89%) Loss: 1.883409 LR: 0.00001466 [14:45:44] Epoch: 1 Batch: 13445/20099 (66.89%) Loss: 1.968218 LR: 0.00001466 [14:45:46] Epoch: 1 Batch: 13446/20099 (66.90%) Loss: 1.509426 LR: 0.00001466 [14:45:48] Epoch: 1 Batch: 13447/20099 (66.90%) Loss: 2.001812 LR: 0.00001466 [14:45:50] Epoch: 1 Batch: 13448/20099 (66.91%) Loss: 1.812986 LR: 0.00001464 [14:45:52] Epoch: 1 Batch: 13449/20099 (66.91%) Loss: 2.101390 LR: 0.00001464 [14:45:53] Epoch: 1 Batch: 13450/20099 (66.92%) Loss: 2.241686 LR: 0.00001464 [14:45:55] Epoch: 1 Batch: 13451/20099 (66.92%) Loss: 2.485014 LR: 0.00001464 [14:45:57] Epoch: 1 Batch: 13452/20099 (66.93%) Loss: 2.218925 LR: 0.00001464 [14:45:59] Epoch: 1 Batch: 13453/20099 (66.93%) Loss: 2.250610 LR: 0.00001464 [14:46:01] Epoch: 1 Batch: 13454/20099 (66.94%) Loss: 2.383035 LR: 0.00001464 [14:46:03] Epoch: 1 Batch: 13455/20099 (66.94%) Loss: 2.035984 LR: 0.00001463 [14:46:05] Epoch: 1 Batch: 13456/20099 (66.95%) Loss: 1.989036 LR: 0.00001463 [14:46:06] Epoch: 1 Batch: 13457/20099 (66.95%) Loss: 1.995362 LR: 0.00001463 [14:46:08] Epoch: 1 Batch: 13458/20099 (66.96%) Loss: 1.863683 LR: 0.00001463 [14:46:10] Epoch: 1 Batch: 13459/20099 (66.96%) Loss: 1.934050 LR: 0.00001463 [14:46:12] Epoch: 1 Batch: 13460/20099 (66.97%) Loss: 1.838286 LR: 0.00001463 [14:46:14] Epoch: 1 Batch: 13461/20099 (66.97%) Loss: 2.080239 LR: 0.00001463 [14:46:16] Epoch: 1 Batch: 13462/20099 (66.98%) Loss: 2.190958 LR: 0.00001461 [14:46:18] Epoch: 1 Batch: 13463/20099 (66.98%) Loss: 2.205681 LR: 0.00001461 [14:46:19] Epoch: 1 Batch: 13464/20099 (66.99%) Loss: 2.040700 LR: 0.00001461 [14:46:21] Epoch: 1 Batch: 13465/20099 (66.99%) Loss: 2.091816 LR: 0.00001461 [14:46:23] Epoch: 1 Batch: 13466/20099 (67.00%) Loss: 2.049945 LR: 0.00001461 [14:46:25] Epoch: 1 Batch: 13467/20099 (67.00%) Loss: 1.877816 LR: 0.00001461 [14:46:27] Epoch: 1 Batch: 13468/20099 (67.01%) Loss: 2.258307 LR: 0.00001461 [14:46:29] Epoch: 1 
Batch: 13469/20099 (67.01%) Loss: 2.198229 LR: 0.00001460 [14:46:30] Epoch: 1 Batch: 13470/20099 (67.02%) Loss: 1.899738 LR: 0.00001460 [14:46:32] Epoch: 1 Batch: 13471/20099 (67.02%) Loss: 2.358918 LR: 0.00001460 [14:46:34] Epoch: 1 Batch: 13472/20099 (67.03%) Loss: 2.227908 LR: 0.00001460 [14:46:36] Epoch: 1 Batch: 13473/20099 (67.03%) Loss: 2.537945 LR: 0.00001460 [14:46:38] Epoch: 1 Batch: 13474/20099 (67.04%) Loss: 1.979318 LR: 0.00001460 [14:46:40] Epoch: 1 Batch: 13475/20099 (67.04%) Loss: 2.319332 LR: 0.00001460 [14:46:42] Epoch: 1 Batch: 13476/20099 (67.05%) Loss: 1.680117 LR: 0.00001458 [14:46:43] Epoch: 1 Batch: 13477/20099 (67.05%) Loss: 2.024430 LR: 0.00001458 [14:46:45] Epoch: 1 Batch: 13478/20099 (67.06%) Loss: 2.212808 LR: 0.00001458 [14:46:47] Epoch: 1 Batch: 13479/20099 (67.06%) Loss: 2.102658 LR: 0.00001458 [14:46:49] Epoch: 1 Batch: 13480/20099 (67.07%) Loss: 1.983001 LR: 0.00001458 [14:46:51] Epoch: 1 Batch: 13481/20099 (67.07%) Loss: 2.138290 LR: 0.00001458 [14:46:53] Epoch: 1 Batch: 13482/20099 (67.08%) Loss: 2.020572 LR: 0.00001458 [14:46:55] Epoch: 1 Batch: 13483/20099 (67.08%) Loss: 2.035801 LR: 0.00001456 [14:46:56] Epoch: 1 Batch: 13484/20099 (67.09%) Loss: 1.725659 LR: 0.00001456 [14:46:58] Epoch: 1 Batch: 13485/20099 (67.09%) Loss: 2.176112 LR: 0.00001456 [14:47:00] Epoch: 1 Batch: 13486/20099 (67.10%) Loss: 1.992947 LR: 0.00001456 [14:47:02] Epoch: 1 Batch: 13487/20099 (67.10%) Loss: 2.131181 LR: 0.00001456 [14:47:04] Epoch: 1 Batch: 13488/20099 (67.11%) Loss: 1.562997 LR: 0.00001456 [14:47:06] Epoch: 1 Batch: 13489/20099 (67.11%) Loss: 2.506859 LR: 0.00001456 [14:47:08] Epoch: 1 Batch: 13490/20099 (67.12%) Loss: 2.072924 LR: 0.00001455 [14:47:09] Epoch: 1 Batch: 13491/20099 (67.12%) Loss: 1.916165 LR: 0.00001455 [14:47:11] Epoch: 1 Batch: 13492/20099 (67.13%) Loss: 2.225444 LR: 0.00001455 [14:47:13] Epoch: 1 Batch: 13493/20099 (67.13%) Loss: 1.816761 LR: 0.00001455 [14:47:15] Epoch: 1 Batch: 13494/20099 (67.14%) Loss: 2.235599 LR: 
0.00001455 [14:47:17] Epoch: 1 Batch: 13495/20099 (67.14%) Loss: 2.225565 LR: 0.00001455 [14:47:19] Epoch: 1 Batch: 13496/20099 (67.15%) Loss: 2.069945 LR: 0.00001455 [14:47:21] Epoch: 1 Batch: 13497/20099 (67.15%) Loss: 1.985946 LR: 0.00001453 [14:47:22] Epoch: 1 Batch: 13498/20099 (67.16%) Loss: 2.429569 LR: 0.00001453 [14:47:24] Epoch: 1 Batch: 13499/20099 (67.16%) Loss: 1.888161 LR: 0.00001453 [14:47:26] >> Evaluating batch 0 [14:47:27] >> Evaluating batch 1 [14:47:28] >> Evaluating batch 2 [14:47:30] >> Evaluating batch 3 [14:47:31] >> Evaluating batch 4 [14:47:32] >> Evaluating batch 5 [14:47:33] >> Evaluating batch 6 [14:47:34] >> Evaluating batch 7 [14:47:35] >> Evaluating batch 8 [14:47:36] >> Evaluating batch 9 [14:47:37] >> Evaluating batch 10 [14:47:38] >> Evaluating batch 11 [14:47:39] >> Evaluating batch 12 [14:47:40] >> Evaluating batch 13 [14:47:41] >> Evaluating batch 14 [14:47:42] >> Evaluating batch 15 [14:47:43] >> Evaluating batch 16 [14:47:44] Epoch: 1 Step: 13500/20099 Evaluation: [14:47:44] Avg Loss Since Last Eval: 2.0832 Val Loss: 2.1553 Validation loss delta: -0.0023 Perplexity: 8.6302 LR: 0.00001453 [14:47:47] >> Checkpoint saved: epoch1_step13500, size: 0.1693 GB [14:47:47] Epoch: 1 Batch: 13500/20099 (67.17%) Loss: 2.011903 LR: 0.00001453 [14:47:49] Epoch: 1 Batch: 13501/20099 (67.17%) Loss: 2.321958 LR: 0.00001453 [14:47:51] Epoch: 1 Batch: 13502/20099 (67.18%) Loss: 2.030170 LR: 0.00001453 [14:47:53] Epoch: 1 Batch: 13503/20099 (67.18%) Loss: 2.166376 LR: 0.00001453 [14:47:55] Epoch: 1 Batch: 13504/20099 (67.19%) Loss: 2.249407 LR: 0.00001452 [14:47:56] Epoch: 1 Batch: 13505/20099 (67.19%) Loss: 2.082206 LR: 0.00001452 [14:47:58] Epoch: 1 Batch: 13506/20099 (67.20%) Loss: 1.991330 LR: 0.00001452 [14:48:00] Epoch: 1 Batch: 13507/20099 (67.20%) Loss: 2.177834 LR: 0.00001452 [14:48:02] Epoch: 1 Batch: 13508/20099 (67.21%) Loss: 2.169148 LR: 0.00001452 [14:48:04] Epoch: 1 Batch: 13509/20099 (67.21%) Loss: 1.767752 LR: 0.00001452 
[14:48:06] Epoch: 1 Batch: 13510/20099 (67.22%) Loss: 2.297407 LR: 0.00001452 [14:48:08] Epoch: 1 Batch: 13511/20099 (67.22%) Loss: 2.075797 LR: 0.00001450 [14:48:09] Epoch: 1 Batch: 13512/20099 (67.23%) Loss: 1.794461 LR: 0.00001450 [14:48:11] Epoch: 1 Batch: 13513/20099 (67.23%) Loss: 2.079753 LR: 0.00001450 [14:48:13] Epoch: 1 Batch: 13514/20099 (67.24%) Loss: 1.902808 LR: 0.00001450 [14:48:15] Epoch: 1 Batch: 13515/20099 (67.24%) Loss: 2.136451 LR: 0.00001450 [14:48:17] Epoch: 1 Batch: 13516/20099 (67.25%) Loss: 2.154259 LR: 0.00001450 [14:48:19] Epoch: 1 Batch: 13517/20099 (67.25%) Loss: 1.720858 LR: 0.00001450 [14:48:21] Epoch: 1 Batch: 13518/20099 (67.26%) Loss: 1.927377 LR: 0.00001449 [14:48:22] Epoch: 1 Batch: 13519/20099 (67.26%) Loss: 2.036101 LR: 0.00001449 [14:48:24] Epoch: 1 Batch: 13520/20099 (67.27%) Loss: 1.910327 LR: 0.00001449 [14:48:26] Epoch: 1 Batch: 13521/20099 (67.27%) Loss: 2.013096 LR: 0.00001449 [14:48:28] Epoch: 1 Batch: 13522/20099 (67.28%) Loss: 2.146849 LR: 0.00001449 [14:48:30] Epoch: 1 Batch: 13523/20099 (67.28%) Loss: 2.425998 LR: 0.00001449 [14:48:32] Epoch: 1 Batch: 13524/20099 (67.29%) Loss: 2.205116 LR: 0.00001449 [14:48:34] Epoch: 1 Batch: 13525/20099 (67.29%) Loss: 2.205664 LR: 0.00001447 [14:48:35] Epoch: 1 Batch: 13526/20099 (67.30%) Loss: 2.172720 LR: 0.00001447 [14:48:37] Epoch: 1 Batch: 13527/20099 (67.30%) Loss: 1.858198 LR: 0.00001447 [14:48:39] Epoch: 1 Batch: 13528/20099 (67.31%) Loss: 1.943600 LR: 0.00001447 [14:48:41] Epoch: 1 Batch: 13529/20099 (67.31%) Loss: 2.094165 LR: 0.00001447 [14:48:43] Epoch: 1 Batch: 13530/20099 (67.32%) Loss: 2.073287 LR: 0.00001447 [14:48:45] Epoch: 1 Batch: 13531/20099 (67.32%) Loss: 1.963931 LR: 0.00001447 [14:48:47] Epoch: 1 Batch: 13532/20099 (67.33%) Loss: 1.961176 LR: 0.00001446 [14:48:48] Epoch: 1 Batch: 13533/20099 (67.33%) Loss: 1.996915 LR: 0.00001446 [14:48:50] Epoch: 1 Batch: 13534/20099 (67.34%) Loss: 2.026828 LR: 0.00001446 [14:48:52] Epoch: 1 Batch: 13535/20099 (67.34%) 
Loss: 2.232148 LR: 0.00001446 [14:48:54] Epoch: 1 Batch: 13536/20099 (67.35%) Loss: 2.231326 LR: 0.00001446 [14:48:56] Epoch: 1 Batch: 13537/20099 (67.35%) Loss: 2.224164 LR: 0.00001446 [14:48:58] Epoch: 1 Batch: 13538/20099 (67.36%) Loss: 2.220720 LR: 0.00001446 [14:49:00] Epoch: 1 Batch: 13539/20099 (67.36%) Loss: 2.199985 LR: 0.00001444 [14:49:01] Epoch: 1 Batch: 13540/20099 (67.37%) Loss: 2.325209 LR: 0.00001444 [14:49:03] Epoch: 1 Batch: 13541/20099 (67.37%) Loss: 1.984649 LR: 0.00001444 [14:49:05] Epoch: 1 Batch: 13542/20099 (67.38%) Loss: 2.193214 LR: 0.00001444 [14:49:07] Epoch: 1 Batch: 13543/20099 (67.38%) Loss: 2.160307 LR: 0.00001444 [14:49:09] Epoch: 1 Batch: 13544/20099 (67.39%) Loss: 1.992991 LR: 0.00001444 [14:49:11] Epoch: 1 Batch: 13545/20099 (67.39%) Loss: 1.855545 LR: 0.00001444 [14:49:12] Epoch: 1 Batch: 13546/20099 (67.40%) Loss: 1.895160 LR: 0.00001442 [14:49:14] Epoch: 1 Batch: 13547/20099 (67.40%) Loss: 2.112163 LR: 0.00001442 [14:49:16] Epoch: 1 Batch: 13548/20099 (67.41%) Loss: 2.150685 LR: 0.00001442 [14:49:18] Epoch: 1 Batch: 13549/20099 (67.41%) Loss: 2.261418 LR: 0.00001442 [14:49:20] Epoch: 1 Batch: 13550/20099 (67.42%) Loss: 2.044626 LR: 0.00001442 [14:49:22] Epoch: 1 Batch: 13551/20099 (67.42%) Loss: 2.172755 LR: 0.00001442 [14:49:24] Epoch: 1 Batch: 13552/20099 (67.43%) Loss: 2.154099 LR: 0.00001442 [14:49:25] Epoch: 1 Batch: 13553/20099 (67.43%) Loss: 1.877829 LR: 0.00001441 [14:49:27] Epoch: 1 Batch: 13554/20099 (67.44%) Loss: 1.991456 LR: 0.00001441 [14:49:29] Epoch: 1 Batch: 13555/20099 (67.44%) Loss: 2.375202 LR: 0.00001441 [14:49:31] Epoch: 1 Batch: 13556/20099 (67.45%) Loss: 2.229606 LR: 0.00001441 [14:49:33] Epoch: 1 Batch: 13557/20099 (67.45%) Loss: 2.085515 LR: 0.00001441 [14:49:35] Epoch: 1 Batch: 13558/20099 (67.46%) Loss: 2.245805 LR: 0.00001441 [14:49:37] Epoch: 1 Batch: 13559/20099 (67.46%) Loss: 1.872956 LR: 0.00001441 [14:49:38] Epoch: 1 Batch: 13560/20099 (67.47%) Loss: 1.738995 LR: 0.00001439 [14:49:40] Epoch: 1 
Batch: 13561/20099 (67.47%) Loss: 1.983373 LR: 0.00001439 [14:49:42] Epoch: 1 Batch: 13562/20099 (67.48%) Loss: 1.810364 LR: 0.00001439 [14:49:44] Epoch: 1 Batch: 13563/20099 (67.48%) Loss: 2.186716 LR: 0.00001439 [14:49:46] Epoch: 1 Batch: 13564/20099 (67.49%) Loss: 2.173098 LR: 0.00001439 [14:49:48] Epoch: 1 Batch: 13565/20099 (67.49%) Loss: 1.861415 LR: 0.00001439 [14:49:50] Epoch: 1 Batch: 13566/20099 (67.50%) Loss: 1.851422 LR: 0.00001439 [14:49:51] Epoch: 1 Batch: 13567/20099 (67.50%) Loss: 2.381083 LR: 0.00001438 [14:49:53] Epoch: 1 Batch: 13568/20099 (67.51%) Loss: 2.073934 LR: 0.00001438 [14:49:55] Epoch: 1 Batch: 13569/20099 (67.51%) Loss: 2.339108 LR: 0.00001438 [14:49:57] Epoch: 1 Batch: 13570/20099 (67.52%) Loss: 2.024748 LR: 0.00001438 [14:49:59] Epoch: 1 Batch: 13571/20099 (67.52%) Loss: 1.746899 LR: 0.00001438 [14:50:01] Epoch: 1 Batch: 13572/20099 (67.53%) Loss: 2.386711 LR: 0.00001438 [14:50:03] Epoch: 1 Batch: 13573/20099 (67.53%) Loss: 1.951219 LR: 0.00001438 [14:50:05] Epoch: 1 Batch: 13574/20099 (67.54%) Loss: 1.961536 LR: 0.00001436 [14:50:06] Epoch: 1 Batch: 13575/20099 (67.54%) Loss: 1.905236 LR: 0.00001436 [14:50:08] Epoch: 1 Batch: 13576/20099 (67.55%) Loss: 2.264654 LR: 0.00001436 [14:50:10] Epoch: 1 Batch: 13577/20099 (67.55%) Loss: 2.028520 LR: 0.00001436 [14:50:12] Epoch: 1 Batch: 13578/20099 (67.56%) Loss: 2.112345 LR: 0.00001436 [14:50:14] Epoch: 1 Batch: 13579/20099 (67.56%) Loss: 2.090103 LR: 0.00001436 [14:50:16] Epoch: 1 Batch: 13580/20099 (67.57%) Loss: 1.986644 LR: 0.00001436 [14:50:18] Epoch: 1 Batch: 13581/20099 (67.57%) Loss: 2.139053 LR: 0.00001435 [14:50:19] Epoch: 1 Batch: 13582/20099 (67.58%) Loss: 1.886828 LR: 0.00001435 [14:50:21] Epoch: 1 Batch: 13583/20099 (67.58%) Loss: 2.337843 LR: 0.00001435 [14:50:23] Epoch: 1 Batch: 13584/20099 (67.59%) Loss: 1.729876 LR: 0.00001435 [14:50:25] Epoch: 1 Batch: 13585/20099 (67.59%) Loss: 1.969269 LR: 0.00001435 [14:50:27] Epoch: 1 Batch: 13586/20099 (67.60%) Loss: 1.854407 LR: 
0.00001435 [14:50:29] Epoch: 1 Batch: 13587/20099 (67.60%) Loss: 2.180773 LR: 0.00001435 [14:50:31] Epoch: 1 Batch: 13588/20099 (67.61%) Loss: 2.002723 LR: 0.00001433 [14:50:32] Epoch: 1 Batch: 13589/20099 (67.61%) Loss: 2.225653 LR: 0.00001433 [14:50:34] Epoch: 1 Batch: 13590/20099 (67.62%) Loss: 2.066202 LR: 0.00001433 [14:50:36] Epoch: 1 Batch: 13591/20099 (67.62%) Loss: 2.287260 LR: 0.00001433 [14:50:38] Epoch: 1 Batch: 13592/20099 (67.63%) Loss: 2.226359 LR: 0.00001433 [14:50:40] Epoch: 1 Batch: 13593/20099 (67.63%) Loss: 1.839072 LR: 0.00001433 [14:50:42] Epoch: 1 Batch: 13594/20099 (67.64%) Loss: 2.007494 LR: 0.00001433 [14:50:44] Epoch: 1 Batch: 13595/20099 (67.64%) Loss: 2.212268 LR: 0.00001432 [14:50:45] Epoch: 1 Batch: 13596/20099 (67.65%) Loss: 2.177581 LR: 0.00001432 [14:50:47] Epoch: 1 Batch: 13597/20099 (67.65%) Loss: 2.124609 LR: 0.00001432 [14:50:49] Epoch: 1 Batch: 13598/20099 (67.66%) Loss: 2.182082 LR: 0.00001432 [14:50:51] Epoch: 1 Batch: 13599/20099 (67.66%) Loss: 2.164456 LR: 0.00001432 [14:50:56] >> Cleaned up old temp checkpoint: epoch1_step11600 [14:50:56] >> Temp checkpoint saved: epoch1_step13600, size: 0.1693 GB [14:50:56] Epoch: 1 Batch: 13600/20099 (67.67%) Loss: 2.120978 LR: 0.00001432 [14:50:58] Epoch: 1 Batch: 13601/20099 (67.67%) Loss: 2.012021 LR: 0.00001432 [14:51:00] Epoch: 1 Batch: 13602/20099 (67.68%) Loss: 2.059035 LR: 0.00001430 [14:51:02] Epoch: 1 Batch: 13603/20099 (67.68%) Loss: 2.229355 LR: 0.00001430 [14:51:04] Epoch: 1 Batch: 13604/20099 (67.68%) Loss: 1.755328 LR: 0.00001430 [14:51:06] Epoch: 1 Batch: 13605/20099 (67.69%) Loss: 2.164550 LR: 0.00001430 [14:51:07] Epoch: 1 Batch: 13606/20099 (67.69%) Loss: 2.469192 LR: 0.00001430 [14:51:09] Epoch: 1 Batch: 13607/20099 (67.70%) Loss: 1.831126 LR: 0.00001430 [14:51:11] Epoch: 1 Batch: 13608/20099 (67.70%) Loss: 2.212390 LR: 0.00001430 [14:51:13] Epoch: 1 Batch: 13609/20099 (67.71%) Loss: 2.255259 LR: 0.00001429 [14:51:15] Epoch: 1 Batch: 13610/20099 (67.71%) Loss: 
2.067151 LR: 0.00001429 [14:51:17] Epoch: 1 Batch: 13611/20099 (67.72%) Loss: 2.113679 LR: 0.00001429 [14:51:19] Epoch: 1 Batch: 13612/20099 (67.72%) Loss: 2.222425 LR: 0.00001429 [14:51:20] Epoch: 1 Batch: 13613/20099 (67.73%) Loss: 2.119242 LR: 0.00001429 [14:51:22] Epoch: 1 Batch: 13614/20099 (67.73%) Loss: 2.195909 LR: 0.00001429 [14:51:24] Epoch: 1 Batch: 13615/20099 (67.74%) Loss: 1.964242 LR: 0.00001429 [14:51:26] Epoch: 1 Batch: 13616/20099 (67.74%) Loss: 1.973704 LR: 0.00001427 [14:51:28] Epoch: 1 Batch: 13617/20099 (67.75%) Loss: 2.056378 LR: 0.00001427 [14:51:30] Epoch: 1 Batch: 13618/20099 (67.75%) Loss: 2.030741 LR: 0.00001427 [14:51:32] Epoch: 1 Batch: 13619/20099 (67.76%) Loss: 1.962627 LR: 0.00001427 [14:51:33] Epoch: 1 Batch: 13620/20099 (67.76%) Loss: 2.364783 LR: 0.00001427 [14:51:35] Epoch: 1 Batch: 13621/20099 (67.77%) Loss: 1.789416 LR: 0.00001427 [14:51:37] Epoch: 1 Batch: 13622/20099 (67.77%) Loss: 2.385618 LR: 0.00001427 [14:51:39] Epoch: 1 Batch: 13623/20099 (67.78%) Loss: 2.141251 LR: 0.00001425 [14:51:41] Epoch: 1 Batch: 13624/20099 (67.78%) Loss: 2.158912 LR: 0.00001425 [14:51:43] Epoch: 1 Batch: 13625/20099 (67.79%) Loss: 1.850044 LR: 0.00001425 [14:51:45] Epoch: 1 Batch: 13626/20099 (67.79%) Loss: 2.111247 LR: 0.00001425 [14:51:47] Epoch: 1 Batch: 13627/20099 (67.80%) Loss: 2.062116 LR: 0.00001425 [14:51:48] Epoch: 1 Batch: 13628/20099 (67.80%) Loss: 1.899929 LR: 0.00001425 [14:51:50] Epoch: 1 Batch: 13629/20099 (67.81%) Loss: 1.845110 LR: 0.00001425 [14:51:52] Epoch: 1 Batch: 13630/20099 (67.81%) Loss: 2.233570 LR: 0.00001424 [14:51:54] Epoch: 1 Batch: 13631/20099 (67.82%) Loss: 2.037664 LR: 0.00001424 [14:51:56] Epoch: 1 Batch: 13632/20099 (67.82%) Loss: 1.876193 LR: 0.00001424 [14:51:58] Epoch: 1 Batch: 13633/20099 (67.83%) Loss: 2.247371 LR: 0.00001424 [14:51:59] Epoch: 1 Batch: 13634/20099 (67.83%) Loss: 1.932721 LR: 0.00001424 [14:52:01] Epoch: 1 Batch: 13635/20099 (67.84%) Loss: 2.182402 LR: 0.00001424 [14:52:03] Epoch: 1 
Batch: 13636/20099 (67.84%) Loss: 2.231509 LR: 0.00001424 [14:52:05] Epoch: 1 Batch: 13637/20099 (67.85%) Loss: 2.347555 LR: 0.00001422 [14:52:07] Epoch: 1 Batch: 13638/20099 (67.85%) Loss: 2.222641 LR: 0.00001422 [14:52:09] Epoch: 1 Batch: 13639/20099 (67.86%) Loss: 2.050825 LR: 0.00001422 [14:52:11] Epoch: 1 Batch: 13640/20099 (67.86%) Loss: 2.196037 LR: 0.00001422 [14:52:12] Epoch: 1 Batch: 13641/20099 (67.87%) Loss: 2.027528 LR: 0.00001422 [14:52:14] Epoch: 1 Batch: 13642/20099 (67.87%) Loss: 2.106459 LR: 0.00001422 [14:52:16] Epoch: 1 Batch: 13643/20099 (67.88%) Loss: 2.113587 LR: 0.00001422 [14:52:18] Epoch: 1 Batch: 13644/20099 (67.88%) Loss: 2.161985 LR: 0.00001421 [14:52:20] Epoch: 1 Batch: 13645/20099 (67.89%) Loss: 1.704740 LR: 0.00001421 [14:52:22] Epoch: 1 Batch: 13646/20099 (67.89%) Loss: 2.366832 LR: 0.00001421 [14:52:23] Epoch: 1 Batch: 13647/20099 (67.90%) Loss: 1.881084 LR: 0.00001421 [14:52:25] Epoch: 1 Batch: 13648/20099 (67.90%) Loss: 2.310054 LR: 0.00001421 [14:52:27] Epoch: 1 Batch: 13649/20099 (67.91%) Loss: 1.958483 LR: 0.00001421 [14:52:29] Epoch: 1 Batch: 13650/20099 (67.91%) Loss: 1.979843 LR: 0.00001421 [14:52:31] Epoch: 1 Batch: 13651/20099 (67.92%) Loss: 2.077630 LR: 0.00001419 [14:52:33] Epoch: 1 Batch: 13652/20099 (67.92%) Loss: 1.957179 LR: 0.00001419 [14:52:35] Epoch: 1 Batch: 13653/20099 (67.93%) Loss: 1.926663 LR: 0.00001419 [14:52:36] Epoch: 1 Batch: 13654/20099 (67.93%) Loss: 2.144798 LR: 0.00001419 [14:52:38] Epoch: 1 Batch: 13655/20099 (67.94%) Loss: 2.027520 LR: 0.00001419 [14:52:40] Epoch: 1 Batch: 13656/20099 (67.94%) Loss: 2.468672 LR: 0.00001419 [14:52:42] Epoch: 1 Batch: 13657/20099 (67.95%) Loss: 2.135991 LR: 0.00001419 [14:52:44] Epoch: 1 Batch: 13658/20099 (67.95%) Loss: 1.754703 LR: 0.00001418 [14:52:46] Epoch: 1 Batch: 13659/20099 (67.96%) Loss: 2.002926 LR: 0.00001418 [14:52:48] Epoch: 1 Batch: 13660/20099 (67.96%) Loss: 1.969887 LR: 0.00001418 [14:52:49] Epoch: 1 Batch: 13661/20099 (67.97%) Loss: 2.093292 LR: 
0.00001418 [14:52:51] Epoch: 1 Batch: 13662/20099 (67.97%) Loss: 2.084755 LR: 0.00001418 [14:52:53] Epoch: 1 Batch: 13663/20099 (67.98%) Loss: 2.140959 LR: 0.00001418 [14:52:55] Epoch: 1 Batch: 13664/20099 (67.98%) Loss: 2.028686 LR: 0.00001418 [14:52:57] Epoch: 1 Batch: 13665/20099 (67.99%) Loss: 2.451407 LR: 0.00001416 [14:52:59] Epoch: 1 Batch: 13666/20099 (67.99%) Loss: 1.885467 LR: 0.00001416 [14:53:01] Epoch: 1 Batch: 13667/20099 (68.00%) Loss: 1.920986 LR: 0.00001416 [14:53:02] Epoch: 1 Batch: 13668/20099 (68.00%) Loss: 2.137796 LR: 0.00001416 [14:53:04] Epoch: 1 Batch: 13669/20099 (68.01%) Loss: 2.074468 LR: 0.00001416 [14:53:06] Epoch: 1 Batch: 13670/20099 (68.01%) Loss: 2.191732 LR: 0.00001416 [14:53:08] Epoch: 1 Batch: 13671/20099 (68.02%) Loss: 1.984421 LR: 0.00001416 [14:53:10] Epoch: 1 Batch: 13672/20099 (68.02%) Loss: 2.029981 LR: 0.00001415 [14:53:12] Epoch: 1 Batch: 13673/20099 (68.03%) Loss: 2.294815 LR: 0.00001415 [14:53:14] Epoch: 1 Batch: 13674/20099 (68.03%) Loss: 1.852778 LR: 0.00001415 [14:53:15] Epoch: 1 Batch: 13675/20099 (68.04%) Loss: 2.075808 LR: 0.00001415 [14:53:17] Epoch: 1 Batch: 13676/20099 (68.04%) Loss: 1.900626 LR: 0.00001415 [14:53:19] Epoch: 1 Batch: 13677/20099 (68.05%) Loss: 2.095630 LR: 0.00001415 [14:53:21] Epoch: 1 Batch: 13678/20099 (68.05%) Loss: 1.917425 LR: 0.00001415 [14:53:23] Epoch: 1 Batch: 13679/20099 (68.06%) Loss: 1.922134 LR: 0.00001413 [14:53:25] Epoch: 1 Batch: 13680/20099 (68.06%) Loss: 2.161550 LR: 0.00001413 [14:53:27] Epoch: 1 Batch: 13681/20099 (68.07%) Loss: 2.124061 LR: 0.00001413 [14:53:29] Epoch: 1 Batch: 13682/20099 (68.07%) Loss: 1.953416 LR: 0.00001413 [14:53:30] Epoch: 1 Batch: 13683/20099 (68.08%) Loss: 2.027341 LR: 0.00001413 [14:53:32] Epoch: 1 Batch: 13684/20099 (68.08%) Loss: 2.068595 LR: 0.00001413 [14:53:34] Epoch: 1 Batch: 13685/20099 (68.09%) Loss: 2.387050 LR: 0.00001413 [14:53:36] Epoch: 1 Batch: 13686/20099 (68.09%) Loss: 2.279773 LR: 0.00001412 [14:53:38] Epoch: 1 Batch: 13687/20099 
(68.10%) Loss: 2.070468 LR: 0.00001412 [14:53:40] Epoch: 1 Batch: 13688/20099 (68.10%) Loss: 2.240777 LR: 0.00001412 [14:53:41] Epoch: 1 Batch: 13689/20099 (68.11%) Loss: 2.114401 LR: 0.00001412 [14:53:43] Epoch: 1 Batch: 13690/20099 (68.11%) Loss: 2.268743 LR: 0.00001412 [14:53:45] Epoch: 1 Batch: 13691/20099 (68.12%) Loss: 1.938378 LR: 0.00001412 [14:53:47] Epoch: 1 Batch: 13692/20099 (68.12%) Loss: 2.059533 LR: 0.00001412 [14:53:49] Epoch: 1 Batch: 13693/20099 (68.13%) Loss: 1.824458 LR: 0.00001410 [14:53:51] Epoch: 1 Batch: 13694/20099 (68.13%) Loss: 2.068025 LR: 0.00001410 [14:53:53] Epoch: 1 Batch: 13695/20099 (68.14%) Loss: 1.837087 LR: 0.00001410 [14:53:54] Epoch: 1 Batch: 13696/20099 (68.14%) Loss: 2.109556 LR: 0.00001410 [14:53:56] Epoch: 1 Batch: 13697/20099 (68.15%) Loss: 2.235687 LR: 0.00001410 [14:53:58] Epoch: 1 Batch: 13698/20099 (68.15%) Loss: 2.082028 LR: 0.00001410 [14:54:00] Epoch: 1 Batch: 13699/20099 (68.16%) Loss: 2.357527 LR: 0.00001410 [14:54:02] Epoch: 1 Batch: 13700/20099 (68.16%) Loss: 2.246573 LR: 0.00001409 [14:54:04] Epoch: 1 Batch: 13701/20099 (68.17%) Loss: 1.960848 LR: 0.00001409 [14:54:05] Epoch: 1 Batch: 13702/20099 (68.17%) Loss: 2.358168 LR: 0.00001409 [14:54:07] Epoch: 1 Batch: 13703/20099 (68.18%) Loss: 1.882469 LR: 0.00001409 [14:54:09] Epoch: 1 Batch: 13704/20099 (68.18%) Loss: 2.318020 LR: 0.00001409 [14:54:11] Epoch: 1 Batch: 13705/20099 (68.19%) Loss: 2.261776 LR: 0.00001409 [14:54:13] Epoch: 1 Batch: 13706/20099 (68.19%) Loss: 2.330215 LR: 0.00001409 [14:54:15] Epoch: 1 Batch: 13707/20099 (68.20%) Loss: 1.975414 LR: 0.00001407 [14:54:17] Epoch: 1 Batch: 13708/20099 (68.20%) Loss: 2.128087 LR: 0.00001407 [14:54:18] Epoch: 1 Batch: 13709/20099 (68.21%) Loss: 2.323610 LR: 0.00001407 [14:54:20] Epoch: 1 Batch: 13710/20099 (68.21%) Loss: 2.080271 LR: 0.00001407 [14:54:22] Epoch: 1 Batch: 13711/20099 (68.22%) Loss: 2.088404 LR: 0.00001407 [14:54:24] Epoch: 1 Batch: 13712/20099 (68.22%) Loss: 2.126389 LR: 0.00001407 [14:54:26] 
Epoch: 1 Batch: 13713/20099 (68.23%) Loss: 2.017226 LR: 0.00001407 [14:54:28] Epoch: 1 Batch: 13714/20099 (68.23%) Loss: 1.807030 LR: 0.00001405 [14:54:30] Epoch: 1 Batch: 13715/20099 (68.24%) Loss: 2.390823 LR: 0.00001405 [14:54:32] Epoch: 1 Batch: 13716/20099 (68.24%) Loss: 2.393228 LR: 0.00001405 [14:54:33] Epoch: 1 Batch: 13717/20099 (68.25%) Loss: 2.103113 LR: 0.00001405 [14:54:35] Epoch: 1 Batch: 13718/20099 (68.25%) Loss: 1.985986 LR: 0.00001405 [14:54:37] Epoch: 1 Batch: 13719/20099 (68.26%) Loss: 2.145407 LR: 0.00001405 [14:54:39] Epoch: 1 Batch: 13720/20099 (68.26%) Loss: 2.058873 LR: 0.00001405 [14:54:41] Epoch: 1 Batch: 13721/20099 (68.27%) Loss: 2.245711 LR: 0.00001404 [14:54:43] Epoch: 1 Batch: 13722/20099 (68.27%) Loss: 1.697034 LR: 0.00001404 [14:54:45] Epoch: 1 Batch: 13723/20099 (68.28%) Loss: 1.681452 LR: 0.00001404 [14:54:46] Epoch: 1 Batch: 13724/20099 (68.28%) Loss: 2.234457 LR: 0.00001404 [14:54:48] Epoch: 1 Batch: 13725/20099 (68.29%) Loss: 1.760089 LR: 0.00001404 [14:54:50] Epoch: 1 Batch: 13726/20099 (68.29%) Loss: 1.962854 LR: 0.00001404 [14:54:52] Epoch: 1 Batch: 13727/20099 (68.30%) Loss: 2.358553 LR: 0.00001404 [14:54:54] Epoch: 1 Batch: 13728/20099 (68.30%) Loss: 2.088039 LR: 0.00001402 [14:54:56] Epoch: 1 Batch: 13729/20099 (68.31%) Loss: 2.207064 LR: 0.00001402 [14:54:58] Epoch: 1 Batch: 13730/20099 (68.31%) Loss: 2.032794 LR: 0.00001402 [14:54:59] Epoch: 1 Batch: 13731/20099 (68.32%) Loss: 2.089985 LR: 0.00001402 [14:55:01] Epoch: 1 Batch: 13732/20099 (68.32%) Loss: 2.225474 LR: 0.00001402 [14:55:03] Epoch: 1 Batch: 13733/20099 (68.33%) Loss: 1.770898 LR: 0.00001402 [14:55:05] Epoch: 1 Batch: 13734/20099 (68.33%) Loss: 1.936968 LR: 0.00001402 [14:55:07] Epoch: 1 Batch: 13735/20099 (68.34%) Loss: 2.163065 LR: 0.00001401 [14:55:09] Epoch: 1 Batch: 13736/20099 (68.34%) Loss: 2.147904 LR: 0.00001401 [14:55:10] Epoch: 1 Batch: 13737/20099 (68.35%) Loss: 2.140880 LR: 0.00001401 [14:55:12] Epoch: 1 Batch: 13738/20099 (68.35%) Loss: 
1.806151 LR: 0.00001401 [14:55:14] Epoch: 1 Batch: 13739/20099 (68.36%) Loss: 2.156403 LR: 0.00001401 [14:55:16] Epoch: 1 Batch: 13740/20099 (68.36%) Loss: 2.127851 LR: 0.00001401 [14:55:18] Epoch: 1 Batch: 13741/20099 (68.37%) Loss: 2.045966 LR: 0.00001401 [14:55:20] Epoch: 1 Batch: 13742/20099 (68.37%) Loss: 2.099993 LR: 0.00001399 [14:55:22] Epoch: 1 Batch: 13743/20099 (68.38%) Loss: 2.091187 LR: 0.00001399 [14:55:23] Epoch: 1 Batch: 13744/20099 (68.38%) Loss: 2.318160 LR: 0.00001399 [14:55:25] Epoch: 1 Batch: 13745/20099 (68.39%) Loss: 2.193218 LR: 0.00001399 [14:55:27] Epoch: 1 Batch: 13746/20099 (68.39%) Loss: 2.130143 LR: 0.00001399 [14:55:29] Epoch: 1 Batch: 13747/20099 (68.40%) Loss: 1.705513 LR: 0.00001399 [14:55:31] Epoch: 1 Batch: 13748/20099 (68.40%) Loss: 2.313535 LR: 0.00001399 [14:55:33] Epoch: 1 Batch: 13749/20099 (68.41%) Loss: 2.025187 LR: 0.00001398 [14:55:35] Epoch: 1 Batch: 13750/20099 (68.41%) Loss: 2.098824 LR: 0.00001398 [14:55:36] Epoch: 1 Batch: 13751/20099 (68.42%) Loss: 2.084153 LR: 0.00001398 [14:55:38] Epoch: 1 Batch: 13752/20099 (68.42%) Loss: 2.073747 LR: 0.00001398 [14:55:40] Epoch: 1 Batch: 13753/20099 (68.43%) Loss: 1.941357 LR: 0.00001398 [14:55:42] Epoch: 1 Batch: 13754/20099 (68.43%) Loss: 1.983290 LR: 0.00001398 [14:55:44] Epoch: 1 Batch: 13755/20099 (68.44%) Loss: 2.288856 LR: 0.00001398 [14:55:46] Epoch: 1 Batch: 13756/20099 (68.44%) Loss: 1.965196 LR: 0.00001396 [14:55:48] Epoch: 1 Batch: 13757/20099 (68.45%) Loss: 2.063732 LR: 0.00001396 [14:55:49] Epoch: 1 Batch: 13758/20099 (68.45%) Loss: 2.247014 LR: 0.00001396 [14:55:51] Epoch: 1 Batch: 13759/20099 (68.46%) Loss: 2.273970 LR: 0.00001396 [14:55:53] Epoch: 1 Batch: 13760/20099 (68.46%) Loss: 2.195621 LR: 0.00001396 [14:55:55] Epoch: 1 Batch: 13761/20099 (68.47%) Loss: 2.062136 LR: 0.00001396 [14:55:57] Epoch: 1 Batch: 13762/20099 (68.47%) Loss: 2.281042 LR: 0.00001396 [14:55:59] Epoch: 1 Batch: 13763/20099 (68.48%) Loss: 1.982923 LR: 0.00001395 [14:56:01] Epoch: 1 
Batch: 13764/20099 (68.48%) Loss: 2.148174 LR: 0.00001395 [14:56:02] Epoch: 1 Batch: 13765/20099 (68.49%) Loss: 2.012049 LR: 0.00001395 [14:56:04] Epoch: 1 Batch: 13766/20099 (68.49%) Loss: 2.263738 LR: 0.00001395 [14:56:06] Epoch: 1 Batch: 13767/20099 (68.50%) Loss: 2.092298 LR: 0.00001395 [14:56:08] Epoch: 1 Batch: 13768/20099 (68.50%) Loss: 2.319257 LR: 0.00001395 [14:56:10] Epoch: 1 Batch: 13769/20099 (68.51%) Loss: 2.088193 LR: 0.00001395 [14:56:12] Epoch: 1 Batch: 13770/20099 (68.51%) Loss: 2.257448 LR: 0.00001393 [14:56:13] Epoch: 1 Batch: 13771/20099 (68.52%) Loss: 2.124279 LR: 0.00001393 [14:56:15] Epoch: 1 Batch: 13772/20099 (68.52%) Loss: 1.984884 LR: 0.00001393 [14:56:17] Epoch: 1 Batch: 13773/20099 (68.53%) Loss: 1.847061 LR: 0.00001393 [14:56:19] Epoch: 1 Batch: 13774/20099 (68.53%) Loss: 1.938062 LR: 0.00001393 [14:56:21] Epoch: 1 Batch: 13775/20099 (68.54%) Loss: 1.768648 LR: 0.00001393 [14:56:23] Epoch: 1 Batch: 13776/20099 (68.54%) Loss: 2.260957 LR: 0.00001393 [14:56:25] Epoch: 1 Batch: 13777/20099 (68.55%) Loss: 2.207091 LR: 0.00001392 [14:56:26] Epoch: 1 Batch: 13778/20099 (68.55%) Loss: 2.214519 LR: 0.00001392 [14:56:28] Epoch: 1 Batch: 13779/20099 (68.56%) Loss: 2.072462 LR: 0.00001392 [14:56:30] Epoch: 1 Batch: 13780/20099 (68.56%) Loss: 2.201774 LR: 0.00001392 [14:56:32] Epoch: 1 Batch: 13781/20099 (68.57%) Loss: 2.249969 LR: 0.00001392 [14:56:34] Epoch: 1 Batch: 13782/20099 (68.57%) Loss: 2.246098 LR: 0.00001392 [14:56:36] Epoch: 1 Batch: 13783/20099 (68.58%) Loss: 2.040993 LR: 0.00001392 [14:56:38] Epoch: 1 Batch: 13784/20099 (68.58%) Loss: 2.540714 LR: 0.00001390 [14:56:39] Epoch: 1 Batch: 13785/20099 (68.59%) Loss: 2.161488 LR: 0.00001390 [14:56:41] Epoch: 1 Batch: 13786/20099 (68.59%) Loss: 1.740087 LR: 0.00001390 [14:56:43] Epoch: 1 Batch: 13787/20099 (68.60%) Loss: 2.128547 LR: 0.00001390 [14:56:45] Epoch: 1 Batch: 13788/20099 (68.60%) Loss: 2.404901 LR: 0.00001390 [14:56:47] Epoch: 1 Batch: 13789/20099 (68.61%) Loss: 2.168510 LR: 
0.00001390 [14:56:49] Epoch: 1 Batch: 13790/20099 (68.61%) Loss: 2.088853 LR: 0.00001390 [14:56:51] Epoch: 1 Batch: 13791/20099 (68.62%) Loss: 2.316560 LR: 0.00001389 [14:56:52] Epoch: 1 Batch: 13792/20099 (68.62%) Loss: 2.428486 LR: 0.00001389 [14:56:54] Epoch: 1 Batch: 13793/20099 (68.63%) Loss: 2.024719 LR: 0.00001389 [14:56:56] Epoch: 1 Batch: 13794/20099 (68.63%) Loss: 2.143664 LR: 0.00001389 [14:56:58] Epoch: 1 Batch: 13795/20099 (68.64%) Loss: 1.964874 LR: 0.00001389 [14:57:00] Epoch: 1 Batch: 13796/20099 (68.64%) Loss: 1.947215 LR: 0.00001389 [14:57:02] Epoch: 1 Batch: 13797/20099 (68.65%) Loss: 1.684202 LR: 0.00001389 [14:57:04] Epoch: 1 Batch: 13798/20099 (68.65%) Loss: 2.307091 LR: 0.00001387 [14:57:05] Epoch: 1 Batch: 13799/20099 (68.66%) Loss: 1.540619 LR: 0.00001387 [14:57:11] >> Cleaned up old temp checkpoint: epoch1_step11800 [14:57:11] >> Temp checkpoint saved: epoch1_step13800, size: 0.1693 GB [14:57:11] Epoch: 1 Batch: 13800/20099 (68.66%) Loss: 2.045623 LR: 0.00001387 [14:57:13] Epoch: 1 Batch: 13801/20099 (68.67%) Loss: 1.870990 LR: 0.00001387 [14:57:15] Epoch: 1 Batch: 13802/20099 (68.67%) Loss: 2.322865 LR: 0.00001387 [14:57:17] Epoch: 1 Batch: 13803/20099 (68.68%) Loss: 2.127298 LR: 0.00001387 [14:57:19] Epoch: 1 Batch: 13804/20099 (68.68%) Loss: 1.633534 LR: 0.00001387 [14:57:21] Epoch: 1 Batch: 13805/20099 (68.69%) Loss: 2.104667 LR: 0.00001386 [14:57:22] Epoch: 1 Batch: 13806/20099 (68.69%) Loss: 1.923604 LR: 0.00001386 [14:57:24] Epoch: 1 Batch: 13807/20099 (68.69%) Loss: 1.892445 LR: 0.00001386 [14:57:26] Epoch: 1 Batch: 13808/20099 (68.70%) Loss: 2.252549 LR: 0.00001386 [14:57:28] Epoch: 1 Batch: 13809/20099 (68.70%) Loss: 1.993925 LR: 0.00001386 [14:57:30] Epoch: 1 Batch: 13810/20099 (68.71%) Loss: 2.261081 LR: 0.00001386 [14:57:32] Epoch: 1 Batch: 13811/20099 (68.71%) Loss: 1.998562 LR: 0.00001386 [14:57:34] Epoch: 1 Batch: 13812/20099 (68.72%) Loss: 2.079758 LR: 0.00001384 [14:57:35] Epoch: 1 Batch: 13813/20099 (68.72%) Loss: 
2.286893 LR: 0.00001384 [14:57:37] Epoch: 1 Batch: 13814/20099 (68.73%) Loss: 2.151386 LR: 0.00001384 [14:57:39] Epoch: 1 Batch: 13815/20099 (68.73%) Loss: 2.166206 LR: 0.00001384 [14:57:41] Epoch: 1 Batch: 13816/20099 (68.74%) Loss: 1.689416 LR: 0.00001384 [14:57:43] Epoch: 1 Batch: 13817/20099 (68.74%) Loss: 2.278618 LR: 0.00001384 [14:57:45] Epoch: 1 Batch: 13818/20099 (68.75%) Loss: 2.127134 LR: 0.00001384 [14:57:47] Epoch: 1 Batch: 13819/20099 (68.75%) Loss: 1.858003 LR: 0.00001383 [14:57:48] Epoch: 1 Batch: 13820/20099 (68.76%) Loss: 2.344027 LR: 0.00001383 [14:57:50] Epoch: 1 Batch: 13821/20099 (68.76%) Loss: 1.999196 LR: 0.00001383 [14:57:52] Epoch: 1 Batch: 13822/20099 (68.77%) Loss: 1.814938 LR: 0.00001383 [14:57:54] Epoch: 1 Batch: 13823/20099 (68.77%) Loss: 1.959158 LR: 0.00001383 [14:57:56] Epoch: 1 Batch: 13824/20099 (68.78%) Loss: 2.004296 LR: 0.00001383 [14:57:58] Epoch: 1 Batch: 13825/20099 (68.78%) Loss: 2.429563 LR: 0.00001383 [14:58:00] Epoch: 1 Batch: 13826/20099 (68.79%) Loss: 2.287254 LR: 0.00001381 [14:58:02] Epoch: 1 Batch: 13827/20099 (68.79%) Loss: 2.112763 LR: 0.00001381 [14:58:03] Epoch: 1 Batch: 13828/20099 (68.80%) Loss: 2.053855 LR: 0.00001381 [14:58:05] Epoch: 1 Batch: 13829/20099 (68.80%) Loss: 2.175066 LR: 0.00001381 [14:58:07] Epoch: 1 Batch: 13830/20099 (68.81%) Loss: 2.278041 LR: 0.00001381 [14:58:09] Epoch: 1 Batch: 13831/20099 (68.81%) Loss: 2.196589 LR: 0.00001381 [14:58:11] Epoch: 1 Batch: 13832/20099 (68.82%) Loss: 1.957931 LR: 0.00001381 [14:58:13] Epoch: 1 Batch: 13833/20099 (68.82%) Loss: 1.950059 LR: 0.00001380 [14:58:15] Epoch: 1 Batch: 13834/20099 (68.83%) Loss: 1.985117 LR: 0.00001380 [14:58:16] Epoch: 1 Batch: 13835/20099 (68.83%) Loss: 2.079342 LR: 0.00001380 [14:58:18] Epoch: 1 Batch: 13836/20099 (68.84%) Loss: 2.113398 LR: 0.00001380 [14:58:20] Epoch: 1 Batch: 13837/20099 (68.84%) Loss: 2.116506 LR: 0.00001380 [14:58:22] Epoch: 1 Batch: 13838/20099 (68.85%) Loss: 2.005578 LR: 0.00001380 [14:58:24] Epoch: 1 
Batch: 13839/20099 (68.85%) Loss: 1.996784 LR: 0.00001380 [14:58:26] Epoch: 1 Batch: 13840/20099 (68.86%) Loss: 1.978959 LR: 0.00001378 [14:58:27] Epoch: 1 Batch: 13841/20099 (68.86%) Loss: 2.175993 LR: 0.00001378 [14:58:29] Epoch: 1 Batch: 13842/20099 (68.87%) Loss: 2.131866 LR: 0.00001378 [14:58:31] Epoch: 1 Batch: 13843/20099 (68.87%) Loss: 1.964678 LR: 0.00001378 [14:58:33] Epoch: 1 Batch: 13844/20099 (68.88%) Loss: 2.249523 LR: 0.00001378 [14:58:35] Epoch: 1 Batch: 13845/20099 (68.88%) Loss: 2.057565 LR: 0.00001378 [14:58:37] Epoch: 1 Batch: 13846/20099 (68.89%) Loss: 1.956790 LR: 0.00001378 [14:58:39] Epoch: 1 Batch: 13847/20099 (68.89%) Loss: 1.983870 LR: 0.00001376 [14:58:40] Epoch: 1 Batch: 13848/20099 (68.90%) Loss: 2.136093 LR: 0.00001376 [14:58:42] Epoch: 1 Batch: 13849/20099 (68.90%) Loss: 2.017327 LR: 0.00001376 [14:58:44] Epoch: 1 Batch: 13850/20099 (68.91%) Loss: 1.756735 LR: 0.00001376 [14:58:46] Epoch: 1 Batch: 13851/20099 (68.91%) Loss: 1.920734 LR: 0.00001376 [14:58:48] Epoch: 1 Batch: 13852/20099 (68.92%) Loss: 2.206615 LR: 0.00001376 [14:58:50] Epoch: 1 Batch: 13853/20099 (68.92%) Loss: 1.869215 LR: 0.00001376 [14:58:51] Epoch: 1 Batch: 13854/20099 (68.93%) Loss: 1.911517 LR: 0.00001375 [14:58:53] Epoch: 1 Batch: 13855/20099 (68.93%) Loss: 2.006258 LR: 0.00001375 [14:58:55] Epoch: 1 Batch: 13856/20099 (68.94%) Loss: 2.350784 LR: 0.00001375 [14:58:57] Epoch: 1 Batch: 13857/20099 (68.94%) Loss: 2.189626 LR: 0.00001375 [14:58:59] Epoch: 1 Batch: 13858/20099 (68.95%) Loss: 2.003358 LR: 0.00001375 [14:59:01] Epoch: 1 Batch: 13859/20099 (68.95%) Loss: 2.038043 LR: 0.00001375 [14:59:03] Epoch: 1 Batch: 13860/20099 (68.96%) Loss: 2.140952 LR: 0.00001375 [14:59:04] Epoch: 1 Batch: 13861/20099 (68.96%) Loss: 2.136316 LR: 0.00001373 [14:59:06] Epoch: 1 Batch: 13862/20099 (68.97%) Loss: 1.955398 LR: 0.00001373 [14:59:08] Epoch: 1 Batch: 13863/20099 (68.97%) Loss: 2.319066 LR: 0.00001373 [14:59:10] Epoch: 1 Batch: 13864/20099 (68.98%) Loss: 2.208076 LR: 
0.00001373 [14:59:12] Epoch: 1 Batch: 13865/20099 (68.98%) Loss: 1.976430 LR: 0.00001373 [14:59:14] Epoch: 1 Batch: 13866/20099 (68.99%) Loss: 2.209623 LR: 0.00001373 [14:59:15] Epoch: 1 Batch: 13867/20099 (68.99%) Loss: 2.050482 LR: 0.00001373 [14:59:17] Epoch: 1 Batch: 13868/20099 (69.00%) Loss: 2.206521 LR: 0.00001372 [14:59:19] Epoch: 1 Batch: 13869/20099 (69.00%) Loss: 2.253762 LR: 0.00001372 [14:59:21] Epoch: 1 Batch: 13870/20099 (69.01%) Loss: 2.117237 LR: 0.00001372 [14:59:23] Epoch: 1 Batch: 13871/20099 (69.01%) Loss: 2.280343 LR: 0.00001372 [14:59:25] Epoch: 1 Batch: 13872/20099 (69.02%) Loss: 1.991402 LR: 0.00001372 [14:59:26] Epoch: 1 Batch: 13873/20099 (69.02%) Loss: 2.061892 LR: 0.00001372 [14:59:28] Epoch: 1 Batch: 13874/20099 (69.03%) Loss: 2.153712 LR: 0.00001372 [14:59:30] Epoch: 1 Batch: 13875/20099 (69.03%) Loss: 2.342072 LR: 0.00001370 [14:59:32] Epoch: 1 Batch: 13876/20099 (69.04%) Loss: 1.893811 LR: 0.00001370 [14:59:34] Epoch: 1 Batch: 13877/20099 (69.04%) Loss: 2.078634 LR: 0.00001370 [14:59:36] Epoch: 1 Batch: 13878/20099 (69.05%) Loss: 1.690002 LR: 0.00001370 [14:59:38] Epoch: 1 Batch: 13879/20099 (69.05%) Loss: 2.056716 LR: 0.00001370 [14:59:39] Epoch: 1 Batch: 13880/20099 (69.06%) Loss: 2.348188 LR: 0.00001370 [14:59:41] Epoch: 1 Batch: 13881/20099 (69.06%) Loss: 2.045840 LR: 0.00001370 [14:59:43] Epoch: 1 Batch: 13882/20099 (69.07%) Loss: 2.122275 LR: 0.00001369 [14:59:45] Epoch: 1 Batch: 13883/20099 (69.07%) Loss: 1.711198 LR: 0.00001369 [14:59:47] Epoch: 1 Batch: 13884/20099 (69.08%) Loss: 1.990469 LR: 0.00001369 [14:59:49] Epoch: 1 Batch: 13885/20099 (69.08%) Loss: 1.910649 LR: 0.00001369 [14:59:50] Epoch: 1 Batch: 13886/20099 (69.09%) Loss: 2.251243 LR: 0.00001369 [14:59:52] Epoch: 1 Batch: 13887/20099 (69.09%) Loss: 2.018031 LR: 0.00001369 [14:59:54] Epoch: 1 Batch: 13888/20099 (69.10%) Loss: 2.193894 LR: 0.00001369 [14:59:56] Epoch: 1 Batch: 13889/20099 (69.10%) Loss: 2.106102 LR: 0.00001367 [14:59:58] Epoch: 1 Batch: 13890/20099 
(69.11%) Loss: 1.976237 LR: 0.00001367 [15:00:00] Epoch: 1 Batch: 13891/20099 (69.11%) Loss: 2.154158 LR: 0.00001367 [15:00:02] Epoch: 1 Batch: 13892/20099 (69.12%) Loss: 2.022938 LR: 0.00001367 [15:00:03] Epoch: 1 Batch: 13893/20099 (69.12%) Loss: 2.183618 LR: 0.00001367 [15:00:05] Epoch: 1 Batch: 13894/20099 (69.13%) Loss: 2.245978 LR: 0.00001367 [15:00:07] Epoch: 1 Batch: 13895/20099 (69.13%) Loss: 2.279955 LR: 0.00001367 [15:00:09] Epoch: 1 Batch: 13896/20099 (69.14%) Loss: 1.756827 LR: 0.00001366 [15:00:11] Epoch: 1 Batch: 13897/20099 (69.14%) Loss: 2.220166 LR: 0.00001366 [15:00:13] Epoch: 1 Batch: 13898/20099 (69.15%) Loss: 2.034070 LR: 0.00001366 [15:00:15] Epoch: 1 Batch: 13899/20099 (69.15%) Loss: 2.619587 LR: 0.00001366 [15:00:16] Epoch: 1 Batch: 13900/20099 (69.16%) Loss: 2.011238 LR: 0.00001366 [15:00:18] Epoch: 1 Batch: 13901/20099 (69.16%) Loss: 2.284074 LR: 0.00001366 [15:00:20] Epoch: 1 Batch: 13902/20099 (69.17%) Loss: 1.836890 LR: 0.00001366 [15:00:22] Epoch: 1 Batch: 13903/20099 (69.17%) Loss: 2.065214 LR: 0.00001364 [15:00:24] Epoch: 1 Batch: 13904/20099 (69.18%) Loss: 1.886945 LR: 0.00001364 [15:00:26] Epoch: 1 Batch: 13905/20099 (69.18%) Loss: 2.364975 LR: 0.00001364 [15:00:27] Epoch: 1 Batch: 13906/20099 (69.19%) Loss: 2.410313 LR: 0.00001364 [15:00:29] Epoch: 1 Batch: 13907/20099 (69.19%) Loss: 2.017542 LR: 0.00001364 [15:00:31] Epoch: 1 Batch: 13908/20099 (69.20%) Loss: 1.995131 LR: 0.00001364 [15:00:33] Epoch: 1 Batch: 13909/20099 (69.20%) Loss: 2.106164 LR: 0.00001364 [15:00:35] Epoch: 1 Batch: 13910/20099 (69.21%) Loss: 2.051979 LR: 0.00001363 [15:00:37] Epoch: 1 Batch: 13911/20099 (69.21%) Loss: 1.964107 LR: 0.00001363 [15:00:39] Epoch: 1 Batch: 13912/20099 (69.22%) Loss: 1.752481 LR: 0.00001363 [15:00:40] Epoch: 1 Batch: 13913/20099 (69.22%) Loss: 2.281965 LR: 0.00001363 [15:00:42] Epoch: 1 Batch: 13914/20099 (69.23%) Loss: 2.153474 LR: 0.00001363 [15:00:44] Epoch: 1 Batch: 13915/20099 (69.23%) Loss: 1.671374 LR: 0.00001363 [15:00:46] 
Epoch: 1 Batch: 13916/20099 (69.24%) Loss: 1.830287 LR: 0.00001363 [15:00:48] Epoch: 1 Batch: 13917/20099 (69.24%) Loss: 1.993374 LR: 0.00001361 [15:00:50] Epoch: 1 Batch: 13918/20099 (69.25%) Loss: 2.106870 LR: 0.00001361 [15:00:52] Epoch: 1 Batch: 13919/20099 (69.25%) Loss: 2.113884 LR: 0.00001361 [15:00:53] Epoch: 1 Batch: 13920/20099 (69.26%) Loss: 1.861474 LR: 0.00001361 [15:00:55] Epoch: 1 Batch: 13921/20099 (69.26%) Loss: 2.332723 LR: 0.00001361 [15:00:57] Epoch: 1 Batch: 13922/20099 (69.27%) Loss: 2.108716 LR: 0.00001361 [15:00:59] Epoch: 1 Batch: 13923/20099 (69.27%) Loss: 2.089876 LR: 0.00001361 [15:01:01] Epoch: 1 Batch: 13924/20099 (69.28%) Loss: 2.180338 LR: 0.00001360 [15:01:03] Epoch: 1 Batch: 13925/20099 (69.28%) Loss: 2.046123 LR: 0.00001360 [15:01:05] Epoch: 1 Batch: 13926/20099 (69.29%) Loss: 1.975247 LR: 0.00001360 [15:01:06] Epoch: 1 Batch: 13927/20099 (69.29%) Loss: 1.926643 LR: 0.00001360 [15:01:08] Epoch: 1 Batch: 13928/20099 (69.30%) Loss: 1.974615 LR: 0.00001360 [15:01:10] Epoch: 1 Batch: 13929/20099 (69.30%) Loss: 1.961394 LR: 0.00001360 [15:01:12] Epoch: 1 Batch: 13930/20099 (69.31%) Loss: 2.276236 LR: 0.00001360 [15:01:14] Epoch: 1 Batch: 13931/20099 (69.31%) Loss: 1.943706 LR: 0.00001358 [15:01:16] Epoch: 1 Batch: 13932/20099 (69.32%) Loss: 2.236912 LR: 0.00001358 [15:01:18] Epoch: 1 Batch: 13933/20099 (69.32%) Loss: 2.074392 LR: 0.00001358 [15:01:19] Epoch: 1 Batch: 13934/20099 (69.33%) Loss: 1.899179 LR: 0.00001358 [15:01:21] Epoch: 1 Batch: 13935/20099 (69.33%) Loss: 2.113750 LR: 0.00001358 [15:01:23] Epoch: 1 Batch: 13936/20099 (69.34%) Loss: 2.259090 LR: 0.00001358 [15:01:25] Epoch: 1 Batch: 13937/20099 (69.34%) Loss: 2.245796 LR: 0.00001358 [15:01:27] Epoch: 1 Batch: 13938/20099 (69.35%) Loss: 1.974177 LR: 0.00001357 [15:01:29] Epoch: 1 Batch: 13939/20099 (69.35%) Loss: 2.020723 LR: 0.00001357 [15:01:31] Epoch: 1 Batch: 13940/20099 (69.36%) Loss: 1.994613 LR: 0.00001357 [15:01:32] Epoch: 1 Batch: 13941/20099 (69.36%) Loss: 
1.930880 LR: 0.00001357 [15:01:34] Epoch: 1 Batch: 13942/20099 (69.37%) Loss: 2.120154 LR: 0.00001357 [15:01:36] Epoch: 1 Batch: 13943/20099 (69.37%) Loss: 1.996972 LR: 0.00001357 [15:01:38] Epoch: 1 Batch: 13944/20099 (69.38%) Loss: 1.996036 LR: 0.00001357 [15:01:40] Epoch: 1 Batch: 13945/20099 (69.38%) Loss: 2.282368 LR: 0.00001355 [15:01:42] Epoch: 1 Batch: 13946/20099 (69.39%) Loss: 2.060913 LR: 0.00001355 [15:01:44] Epoch: 1 Batch: 13947/20099 (69.39%) Loss: 1.985770 LR: 0.00001355 [15:01:46] Epoch: 1 Batch: 13948/20099 (69.40%) Loss: 2.014093 LR: 0.00001355 [15:01:47] Epoch: 1 Batch: 13949/20099 (69.40%) Loss: 2.267414 LR: 0.00001355 [15:01:49] Epoch: 1 Batch: 13950/20099 (69.41%) Loss: 2.219996 LR: 0.00001355 [15:01:51] Epoch: 1 Batch: 13951/20099 (69.41%) Loss: 2.029548 LR: 0.00001355 [15:01:53] Epoch: 1 Batch: 13952/20099 (69.42%) Loss: 1.994523 LR: 0.00001354 [15:01:55] Epoch: 1 Batch: 13953/20099 (69.42%) Loss: 1.897132 LR: 0.00001354 [15:01:57] Epoch: 1 Batch: 13954/20099 (69.43%) Loss: 2.203361 LR: 0.00001354 [15:01:58] Epoch: 1 Batch: 13955/20099 (69.43%) Loss: 2.188126 LR: 0.00001354 [15:02:00] Epoch: 1 Batch: 13956/20099 (69.44%) Loss: 2.133483 LR: 0.00001354 [15:02:02] Epoch: 1 Batch: 13957/20099 (69.44%) Loss: 1.927952 LR: 0.00001354 [15:02:04] Epoch: 1 Batch: 13958/20099 (69.45%) Loss: 1.900508 LR: 0.00001354 [15:02:06] Epoch: 1 Batch: 13959/20099 (69.45%) Loss: 2.197140 LR: 0.00001352 [15:02:08] Epoch: 1 Batch: 13960/20099 (69.46%) Loss: 2.246276 LR: 0.00001352 [15:02:10] Epoch: 1 Batch: 13961/20099 (69.46%) Loss: 2.024846 LR: 0.00001352 [15:02:11] Epoch: 1 Batch: 13962/20099 (69.47%) Loss: 2.085478 LR: 0.00001352 [15:02:13] Epoch: 1 Batch: 13963/20099 (69.47%) Loss: 1.895947 LR: 0.00001352 [15:02:15] Epoch: 1 Batch: 13964/20099 (69.48%) Loss: 2.219170 LR: 0.00001352 [15:02:17] Epoch: 1 Batch: 13965/20099 (69.48%) Loss: 2.010260 LR: 0.00001352 [15:02:19] Epoch: 1 Batch: 13966/20099 (69.49%) Loss: 2.205330 LR: 0.00001351 [15:02:21] Epoch: 1 
Batch: 13967/20099 (69.49%) Loss: 2.051417 LR: 0.00001351 [15:02:23] Epoch: 1 Batch: 13968/20099 (69.50%) Loss: 2.108162 LR: 0.00001351 [15:02:24] Epoch: 1 Batch: 13969/20099 (69.50%) Loss: 2.276898 LR: 0.00001351 [15:02:26] Epoch: 1 Batch: 13970/20099 (69.51%) Loss: 2.188663 LR: 0.00001351 [15:02:28] Epoch: 1 Batch: 13971/20099 (69.51%) Loss: 1.983019 LR: 0.00001351 [15:02:30] Epoch: 1 Batch: 13972/20099 (69.52%) Loss: 2.023133 LR: 0.00001351 [15:02:32] Epoch: 1 Batch: 13973/20099 (69.52%) Loss: 2.302833 LR: 0.00001349 [15:02:34] Epoch: 1 Batch: 13974/20099 (69.53%) Loss: 1.993385 LR: 0.00001349 [15:02:36] Epoch: 1 Batch: 13975/20099 (69.53%) Loss: 2.238295 LR: 0.00001349 [15:02:38] Epoch: 1 Batch: 13976/20099 (69.54%) Loss: 2.270355 LR: 0.00001349 [15:02:39] Epoch: 1 Batch: 13977/20099 (69.54%) Loss: 2.008026 LR: 0.00001349 [15:02:41] Epoch: 1 Batch: 13978/20099 (69.55%) Loss: 2.190813 LR: 0.00001349 [15:02:43] Epoch: 1 Batch: 13979/20099 (69.55%) Loss: 2.022871 LR: 0.00001349 [15:02:45] Epoch: 1 Batch: 13980/20099 (69.56%) Loss: 2.094099 LR: 0.00001348 [15:02:47] Epoch: 1 Batch: 13981/20099 (69.56%) Loss: 2.178144 LR: 0.00001348 [15:02:49] Epoch: 1 Batch: 13982/20099 (69.57%) Loss: 2.188421 LR: 0.00001348 [15:02:51] Epoch: 1 Batch: 13983/20099 (69.57%) Loss: 1.880559 LR: 0.00001348 [15:02:52] Epoch: 1 Batch: 13984/20099 (69.58%) Loss: 2.402621 LR: 0.00001348 [15:02:54] Epoch: 1 Batch: 13985/20099 (69.58%) Loss: 1.970962 LR: 0.00001348 [15:02:56] Epoch: 1 Batch: 13986/20099 (69.59%) Loss: 1.912417 LR: 0.00001348 [15:02:58] Epoch: 1 Batch: 13987/20099 (69.59%) Loss: 2.249149 LR: 0.00001346 [15:03:00] Epoch: 1 Batch: 13988/20099 (69.60%) Loss: 1.900277 LR: 0.00001346 [15:03:02] Epoch: 1 Batch: 13989/20099 (69.60%) Loss: 2.099365 LR: 0.00001346 [15:03:04] Epoch: 1 Batch: 13990/20099 (69.61%) Loss: 1.963787 LR: 0.00001346 [15:03:05] Epoch: 1 Batch: 13991/20099 (69.61%) Loss: 2.024497 LR: 0.00001346 [15:03:07] Epoch: 1 Batch: 13992/20099 (69.62%) Loss: 2.091093 LR: 
0.00001346 [15:03:09] Epoch: 1 Batch: 13993/20099 (69.62%) Loss: 2.018023 LR: 0.00001346 [15:03:11] Epoch: 1 Batch: 13994/20099 (69.63%) Loss: 2.148852 LR: 0.00001345 [15:03:13] Epoch: 1 Batch: 13995/20099 (69.63%) Loss: 2.228026 LR: 0.00001345 [15:03:15] Epoch: 1 Batch: 13996/20099 (69.64%) Loss: 2.393276 LR: 0.00001345 [15:03:17] Epoch: 1 Batch: 13997/20099 (69.64%) Loss: 1.635229 LR: 0.00001345 [15:03:18] Epoch: 1 Batch: 13998/20099 (69.65%) Loss: 1.796764 LR: 0.00001345 [15:03:20] Epoch: 1 Batch: 13999/20099 (69.65%) Loss: 2.049733 LR: 0.00001345 [15:03:22] >> Evaluating batch 0 [15:03:23] >> Evaluating batch 1 [15:03:24] >> Evaluating batch 2 [15:03:25] >> Evaluating batch 3 [15:03:26] >> Evaluating batch 4 [15:03:28] >> Evaluating batch 5 [15:03:29] >> Evaluating batch 6 [15:03:30] >> Evaluating batch 7 [15:03:31] >> Evaluating batch 8 [15:03:32] >> Evaluating batch 9 [15:03:33] >> Evaluating batch 10 [15:03:34] >> Evaluating batch 11 [15:03:35] >> Evaluating batch 12 [15:03:36] >> Evaluating batch 13 [15:03:37] >> Evaluating batch 14 [15:03:38] >> Evaluating batch 15 [15:03:39] >> Evaluating batch 16 [15:03:39] Epoch: 1 Step: 14000/20099 Evaluation: [15:03:39] Avg Loss Since Last Eval: 2.0831 Val Loss: 2.1546 Validation loss delta: -0.0007 Perplexity: 8.6244 LR: 0.00001345 [15:03:43] >> Cleaned up old temp checkpoint: epoch1_step12000 [15:03:43] >> Temp checkpoint saved: epoch1_step14000, size: 0.1693 GB [15:03:47] >> Checkpoint saved: epoch1_step14000, size: 0.1693 GB [15:03:47] Epoch: 1 Batch: 14000/20099 (69.66%) Loss: 1.908236 LR: 0.00001345 [15:03:48] Epoch: 1 Batch: 14001/20099 (69.66%) Loss: 2.152014 LR: 0.00001343 [15:03:50] Epoch: 1 Batch: 14002/20099 (69.67%) Loss: 2.035866 LR: 0.00001343 [15:03:52] Epoch: 1 Batch: 14003/20099 (69.67%) Loss: 2.169752 LR: 0.00001343 [15:03:54] Epoch: 1 Batch: 14004/20099 (69.68%) Loss: 2.156988 LR: 0.00001343 [15:03:56] Epoch: 1 Batch: 14005/20099 (69.68%) Loss: 2.053854 LR: 0.00001343 [15:03:58] Epoch: 1 Batch: 
14006/20099 (69.69%) Loss: 2.015075 LR: 0.00001343 [15:03:59] Epoch: 1 Batch: 14007/20099 (69.69%) Loss: 2.143739 LR: 0.00001343 [15:04:01] Epoch: 1 Batch: 14008/20099 (69.70%) Loss: 2.312611 LR: 0.00001342 [15:04:03] Epoch: 1 Batch: 14009/20099 (69.70%) Loss: 1.889057 LR: 0.00001342 [15:04:05] Epoch: 1 Batch: 14010/20099 (69.70%) Loss: 2.238124 LR: 0.00001342 [15:04:07] Epoch: 1 Batch: 14011/20099 (69.71%) Loss: 1.765634 LR: 0.00001342 [15:04:09] Epoch: 1 Batch: 14012/20099 (69.71%) Loss: 2.169719 LR: 0.00001342 [15:04:11] Epoch: 1 Batch: 14013/20099 (69.72%) Loss: 2.335338 LR: 0.00001342 [15:04:13] Epoch: 1 Batch: 14014/20099 (69.72%) Loss: 2.380353 LR: 0.00001342 [15:04:15] Epoch: 1 Batch: 14015/20099 (69.73%) Loss: 2.028246 LR: 0.00001340 [15:04:16] Epoch: 1 Batch: 14016/20099 (69.73%) Loss: 2.277111 LR: 0.00001340 [15:04:18] Epoch: 1 Batch: 14017/20099 (69.74%) Loss: 2.014685 LR: 0.00001340 [15:04:20] Epoch: 1 Batch: 14018/20099 (69.74%) Loss: 2.137202 LR: 0.00001340 [15:04:22] Epoch: 1 Batch: 14019/20099 (69.75%) Loss: 2.016282 LR: 0.00001340 [15:04:24] Epoch: 1 Batch: 14020/20099 (69.75%) Loss: 2.085164 LR: 0.00001340 [15:04:26] Epoch: 1 Batch: 14021/20099 (69.76%) Loss: 2.132124 LR: 0.00001340 [15:04:28] Epoch: 1 Batch: 14022/20099 (69.76%) Loss: 2.165897 LR: 0.00001339 [15:04:30] Epoch: 1 Batch: 14023/20099 (69.77%) Loss: 2.016580 LR: 0.00001339 [15:04:32] Epoch: 1 Batch: 14024/20099 (69.77%) Loss: 1.937707 LR: 0.00001339 [15:04:33] Epoch: 1 Batch: 14025/20099 (69.78%) Loss: 1.870736 LR: 0.00001339 [15:04:35] Epoch: 1 Batch: 14026/20099 (69.78%) Loss: 1.949037 LR: 0.00001339 [15:04:37] Epoch: 1 Batch: 14027/20099 (69.79%) Loss: 2.074316 LR: 0.00001339 [15:04:39] Epoch: 1 Batch: 14028/20099 (69.79%) Loss: 2.092081 LR: 0.00001339 [15:04:41] Epoch: 1 Batch: 14029/20099 (69.80%) Loss: 2.311319 LR: 0.00001337 [15:04:43] Epoch: 1 Batch: 14030/20099 (69.80%) Loss: 1.931598 LR: 0.00001337 [15:04:44] Epoch: 1 Batch: 14031/20099 (69.81%) Loss: 2.239501 LR: 
0.00001337 [15:04:46] Epoch: 1 Batch: 14032/20099 (69.81%) Loss: 2.190092 LR: 0.00001337 [15:04:48] Epoch: 1 Batch: 14033/20099 (69.82%) Loss: 2.310144 LR: 0.00001337 [15:04:50] Epoch: 1 Batch: 14034/20099 (69.82%) Loss: 2.056702 LR: 0.00001337 [15:04:52] Epoch: 1 Batch: 14035/20099 (69.83%) Loss: 1.903889 LR: 0.00001337 [15:04:54] Epoch: 1 Batch: 14036/20099 (69.83%) Loss: 1.891208 LR: 0.00001336 [15:04:56] Epoch: 1 Batch: 14037/20099 (69.84%) Loss: 1.909328 LR: 0.00001336 [15:04:57] Epoch: 1 Batch: 14038/20099 (69.84%) Loss: 2.251498 LR: 0.00001336 [15:04:59] Epoch: 1 Batch: 14039/20099 (69.85%) Loss: 2.144134 LR: 0.00001336 [15:05:01] Epoch: 1 Batch: 14040/20099 (69.85%) Loss: 2.473144 LR: 0.00001336 [15:05:03] Epoch: 1 Batch: 14041/20099 (69.86%) Loss: 2.045105 LR: 0.00001336 [15:05:05] Epoch: 1 Batch: 14042/20099 (69.86%) Loss: 1.967996 LR: 0.00001336 [15:05:07] Epoch: 1 Batch: 14043/20099 (69.87%) Loss: 2.129719 LR: 0.00001334 [15:05:08] Epoch: 1 Batch: 14044/20099 (69.87%) Loss: 2.128107 LR: 0.00001334 [15:05:10] Epoch: 1 Batch: 14045/20099 (69.88%) Loss: 2.002133 LR: 0.00001334 [15:05:12] Epoch: 1 Batch: 14046/20099 (69.88%) Loss: 2.099285 LR: 0.00001334 [15:05:14] Epoch: 1 Batch: 14047/20099 (69.89%) Loss: 2.348757 LR: 0.00001334 [15:05:16] Epoch: 1 Batch: 14048/20099 (69.89%) Loss: 2.361411 LR: 0.00001334 [15:05:18] Epoch: 1 Batch: 14049/20099 (69.90%) Loss: 2.333449 LR: 0.00001334 [15:05:20] Epoch: 1 Batch: 14050/20099 (69.90%) Loss: 2.141673 LR: 0.00001333 [15:05:21] Epoch: 1 Batch: 14051/20099 (69.91%) Loss: 1.997753 LR: 0.00001333 [15:05:23] Epoch: 1 Batch: 14052/20099 (69.91%) Loss: 2.026626 LR: 0.00001333 [15:05:25] Epoch: 1 Batch: 14053/20099 (69.92%) Loss: 2.236732 LR: 0.00001333 [15:05:27] Epoch: 1 Batch: 14054/20099 (69.92%) Loss: 1.996920 LR: 0.00001333 [15:05:29] Epoch: 1 Batch: 14055/20099 (69.93%) Loss: 2.158952 LR: 0.00001333 [15:05:31] Epoch: 1 Batch: 14056/20099 (69.93%) Loss: 2.174842 LR: 0.00001333 [15:05:33] Epoch: 1 Batch: 14057/20099 
(69.94%) Loss: 2.160324 LR: 0.00001331 [15:05:34] Epoch: 1 Batch: 14058/20099 (69.94%) Loss: 1.935305 LR: 0.00001331 [15:05:36] Epoch: 1 Batch: 14059/20099 (69.95%) Loss: 1.981442 LR: 0.00001331 [15:05:38] Epoch: 1 Batch: 14060/20099 (69.95%) Loss: 2.069142 LR: 0.00001331 [15:05:40] Epoch: 1 Batch: 14061/20099 (69.96%) Loss: 2.086893 LR: 0.00001331 [15:05:42] Epoch: 1 Batch: 14062/20099 (69.96%) Loss: 2.513581 LR: 0.00001331 [15:05:44] Epoch: 1 Batch: 14063/20099 (69.97%) Loss: 1.899521 LR: 0.00001331 [15:05:46] Epoch: 1 Batch: 14064/20099 (69.97%) Loss: 1.893383 LR: 0.00001330 [15:05:48] Epoch: 1 Batch: 14065/20099 (69.98%) Loss: 2.259633 LR: 0.00001330 [15:05:49] Epoch: 1 Batch: 14066/20099 (69.98%) Loss: 1.960115 LR: 0.00001330 [15:05:51] Epoch: 1 Batch: 14067/20099 (69.99%) Loss: 2.103124 LR: 0.00001330 [15:05:53] Epoch: 1 Batch: 14068/20099 (69.99%) Loss: 2.135539 LR: 0.00001330 [15:05:55] Epoch: 1 Batch: 14069/20099 (70.00%) Loss: 2.063297 LR: 0.00001330 [15:05:57] Epoch: 1 Batch: 14070/20099 (70.00%) Loss: 2.176017 LR: 0.00001330 [15:05:59] Epoch: 1 Batch: 14071/20099 (70.01%) Loss: 2.201392 LR: 0.00001328 [15:06:01] Epoch: 1 Batch: 14072/20099 (70.01%) Loss: 2.330271 LR: 0.00001328 [15:06:02] Epoch: 1 Batch: 14073/20099 (70.02%) Loss: 2.088560 LR: 0.00001328 [15:06:04] Epoch: 1 Batch: 14074/20099 (70.02%) Loss: 2.288650 LR: 0.00001328 [15:06:06] Epoch: 1 Batch: 14075/20099 (70.03%) Loss: 2.207077 LR: 0.00001328 [15:06:08] Epoch: 1 Batch: 14076/20099 (70.03%) Loss: 1.758030 LR: 0.00001328 [15:06:10] Epoch: 1 Batch: 14077/20099 (70.04%) Loss: 2.220818 LR: 0.00001328 [15:06:12] Epoch: 1 Batch: 14078/20099 (70.04%) Loss: 1.826088 LR: 0.00001327 [15:06:14] Epoch: 1 Batch: 14079/20099 (70.05%) Loss: 2.040720 LR: 0.00001327 [15:06:15] Epoch: 1 Batch: 14080/20099 (70.05%) Loss: 2.245892 LR: 0.00001327 [15:06:17] Epoch: 1 Batch: 14081/20099 (70.06%) Loss: 2.259224 LR: 0.00001327 [15:06:19] Epoch: 1 Batch: 14082/20099 (70.06%) Loss: 1.787365 LR: 0.00001327 [15:06:21] 
Epoch: 1 Batch: 14083/20099 (70.07%) Loss: 2.131996 LR: 0.00001327 [15:06:23] Epoch: 1 Batch: 14084/20099 (70.07%) Loss: 2.107537 LR: 0.00001327 [15:06:25] Epoch: 1 Batch: 14085/20099 (70.08%) Loss: 2.117787 LR: 0.00001325 [15:06:26] Epoch: 1 Batch: 14086/20099 (70.08%) Loss: 2.089099 LR: 0.00001325 [15:06:28] Epoch: 1 Batch: 14087/20099 (70.09%) Loss: 1.963848 LR: 0.00001325 [15:06:30] Epoch: 1 Batch: 14088/20099 (70.09%) Loss: 2.116444 LR: 0.00001325 [15:06:32] Epoch: 1 Batch: 14089/20099 (70.10%) Loss: 1.703026 LR: 0.00001325 [15:06:34] Epoch: 1 Batch: 14090/20099 (70.10%) Loss: 2.112913 LR: 0.00001325 [15:06:36] Epoch: 1 Batch: 14091/20099 (70.11%) Loss: 1.945439 LR: 0.00001325 [15:06:38] Epoch: 1 Batch: 14092/20099 (70.11%) Loss: 2.091409 LR: 0.00001324 [15:06:39] Epoch: 1 Batch: 14093/20099 (70.12%) Loss: 2.248818 LR: 0.00001324 [15:06:41] Epoch: 1 Batch: 14094/20099 (70.12%) Loss: 1.714032 LR: 0.00001324 [15:06:43] Epoch: 1 Batch: 14095/20099 (70.13%) Loss: 1.971220 LR: 0.00001324 [15:06:45] Epoch: 1 Batch: 14096/20099 (70.13%) Loss: 2.127012 LR: 0.00001324 [15:06:47] Epoch: 1 Batch: 14097/20099 (70.14%) Loss: 2.014073 LR: 0.00001324 [15:06:49] Epoch: 1 Batch: 14098/20099 (70.14%) Loss: 1.880741 LR: 0.00001324 [15:06:51] Epoch: 1 Batch: 14099/20099 (70.15%) Loss: 2.292917 LR: 0.00001322 [15:06:52] Epoch: 1 Batch: 14100/20099 (70.15%) Loss: 2.267997 LR: 0.00001322 [15:06:54] Epoch: 1 Batch: 14101/20099 (70.16%) Loss: 2.133940 LR: 0.00001322 [15:06:56] Epoch: 1 Batch: 14102/20099 (70.16%) Loss: 2.089356 LR: 0.00001322 [15:06:58] Epoch: 1 Batch: 14103/20099 (70.17%) Loss: 2.067074 LR: 0.00001322 [15:07:00] Epoch: 1 Batch: 14104/20099 (70.17%) Loss: 1.921925 LR: 0.00001322 [15:07:02] Epoch: 1 Batch: 14105/20099 (70.18%) Loss: 2.198205 LR: 0.00001322 [15:07:04] Epoch: 1 Batch: 14106/20099 (70.18%) Loss: 2.230950 LR: 0.00001321 [15:07:05] Epoch: 1 Batch: 14107/20099 (70.19%) Loss: 2.091808 LR: 0.00001321 [15:07:07] Epoch: 1 Batch: 14108/20099 (70.19%) Loss: 
2.194324 LR: 0.00001321 [15:07:09] Epoch: 1 Batch: 14109/20099 (70.20%) Loss: 2.120229 LR: 0.00001321 [15:07:11] Epoch: 1 Batch: 14110/20099 (70.20%) Loss: 2.333757 LR: 0.00001321 [15:07:13] Epoch: 1 Batch: 14111/20099 (70.21%) Loss: 2.429177 LR: 0.00001321 [15:07:15] Epoch: 1 Batch: 14112/20099 (70.21%) Loss: 1.957863 LR: 0.00001321 [15:07:16] Epoch: 1 Batch: 14113/20099 (70.22%) Loss: 1.870540 LR: 0.00001319 [15:07:18] Epoch: 1 Batch: 14114/20099 (70.22%) Loss: 2.317012 LR: 0.00001319 [15:07:20] Epoch: 1 Batch: 14115/20099 (70.23%) Loss: 1.740619 LR: 0.00001319 [15:07:22] Epoch: 1 Batch: 14116/20099 (70.23%) Loss: 1.896868 LR: 0.00001319 [15:07:24] Epoch: 1 Batch: 14117/20099 (70.24%) Loss: 1.537217 LR: 0.00001319 [15:07:26] Epoch: 1 Batch: 14118/20099 (70.24%) Loss: 2.018700 LR: 0.00001319 [15:07:28] Epoch: 1 Batch: 14119/20099 (70.25%) Loss: 2.039764 LR: 0.00001319 [15:07:29] Epoch: 1 Batch: 14120/20099 (70.25%) Loss: 2.313422 LR: 0.00001318 [15:07:31] Epoch: 1 Batch: 14121/20099 (70.26%) Loss: 2.352668 LR: 0.00001318 [15:07:33] Epoch: 1 Batch: 14122/20099 (70.26%) Loss: 2.203261 LR: 0.00001318 [15:07:35] Epoch: 1 Batch: 14123/20099 (70.27%) Loss: 1.856316 LR: 0.00001318 [15:07:37] Epoch: 1 Batch: 14124/20099 (70.27%) Loss: 1.922009 LR: 0.00001318 [15:07:39] Epoch: 1 Batch: 14125/20099 (70.28%) Loss: 2.045663 LR: 0.00001318 [15:07:41] Epoch: 1 Batch: 14126/20099 (70.28%) Loss: 2.360119 LR: 0.00001318 [15:07:42] Epoch: 1 Batch: 14127/20099 (70.29%) Loss: 2.649317 LR: 0.00001316 [15:07:44] Epoch: 1 Batch: 14128/20099 (70.29%) Loss: 2.030398 LR: 0.00001316 [15:07:46] Epoch: 1 Batch: 14129/20099 (70.30%) Loss: 1.995903 LR: 0.00001316 [15:07:48] Epoch: 1 Batch: 14130/20099 (70.30%) Loss: 1.951354 LR: 0.00001316 [15:07:50] Epoch: 1 Batch: 14131/20099 (70.31%) Loss: 2.004289 LR: 0.00001316 [15:07:52] Epoch: 1 Batch: 14132/20099 (70.31%) Loss: 2.029167 LR: 0.00001316 [15:07:54] Epoch: 1 Batch: 14133/20099 (70.32%) Loss: 2.048864 LR: 0.00001316 [15:07:55] Epoch: 1 
Batch: 14134/20099 (70.32%) Loss: 2.329615 LR: 0.00001315 [15:07:57] Epoch: 1 Batch: 14135/20099 (70.33%) Loss: 2.032752 LR: 0.00001315 [15:07:59] Epoch: 1 Batch: 14136/20099 (70.33%) Loss: 1.876507 LR: 0.00001315 [15:08:01] Epoch: 1 Batch: 14137/20099 (70.34%) Loss: 2.222470 LR: 0.00001315 [15:08:03] Epoch: 1 Batch: 14138/20099 (70.34%) Loss: 2.221272 LR: 0.00001315 [15:08:05] Epoch: 1 Batch: 14139/20099 (70.35%) Loss: 2.333714 LR: 0.00001315 [15:08:06] Epoch: 1 Batch: 14140/20099 (70.35%) Loss: 2.099627 LR: 0.00001315 [15:08:08] Epoch: 1 Batch: 14141/20099 (70.36%) Loss: 1.963567 LR: 0.00001313 [15:08:10] Epoch: 1 Batch: 14142/20099 (70.36%) Loss: 2.235420 LR: 0.00001313 [15:08:12] Epoch: 1 Batch: 14143/20099 (70.37%) Loss: 1.877184 LR: 0.00001313 [15:08:14] Epoch: 1 Batch: 14144/20099 (70.37%) Loss: 2.290599 LR: 0.00001313 [15:08:16] Epoch: 1 Batch: 14145/20099 (70.38%) Loss: 1.808970 LR: 0.00001313 [15:08:18] Epoch: 1 Batch: 14146/20099 (70.38%) Loss: 1.987428 LR: 0.00001313 [15:08:19] Epoch: 1 Batch: 14147/20099 (70.39%) Loss: 2.179288 LR: 0.00001313 [15:08:21] Epoch: 1 Batch: 14148/20099 (70.39%) Loss: 2.232906 LR: 0.00001312 [15:08:23] Epoch: 1 Batch: 14149/20099 (70.40%) Loss: 2.055832 LR: 0.00001312 [15:08:25] Epoch: 1 Batch: 14150/20099 (70.40%) Loss: 2.206657 LR: 0.00001312 [15:08:27] Epoch: 1 Batch: 14151/20099 (70.41%) Loss: 1.981790 LR: 0.00001312 [15:08:29] Epoch: 1 Batch: 14152/20099 (70.41%) Loss: 2.080351 LR: 0.00001312 [15:08:30] Epoch: 1 Batch: 14153/20099 (70.42%) Loss: 2.257615 LR: 0.00001312 [15:08:32] Epoch: 1 Batch: 14154/20099 (70.42%) Loss: 2.161017 LR: 0.00001312 [15:08:34] Epoch: 1 Batch: 14155/20099 (70.43%) Loss: 2.157845 LR: 0.00001310 [15:08:36] Epoch: 1 Batch: 14156/20099 (70.43%) Loss: 2.169761 LR: 0.00001310 [15:08:38] Epoch: 1 Batch: 14157/20099 (70.44%) Loss: 2.239894 LR: 0.00001310 [15:08:40] Epoch: 1 Batch: 14158/20099 (70.44%) Loss: 2.090580 LR: 0.00001310 [15:08:42] Epoch: 1 Batch: 14159/20099 (70.45%) Loss: 2.004393 LR: 
0.00001310 [15:08:43] Epoch: 1 Batch: 14160/20099 (70.45%) Loss: 2.234771 LR: 0.00001310 [15:08:45] Epoch: 1 Batch: 14161/20099 (70.46%) Loss: 2.207394 LR: 0.00001310 [15:08:47] Epoch: 1 Batch: 14162/20099 (70.46%) Loss: 1.983834 LR: 0.00001309 [15:08:49] Epoch: 1 Batch: 14163/20099 (70.47%) Loss: 1.388820 LR: 0.00001309 [15:08:51] Epoch: 1 Batch: 14164/20099 (70.47%) Loss: 2.037211 LR: 0.00001309 [15:08:53] Epoch: 1 Batch: 14165/20099 (70.48%) Loss: 1.953824 LR: 0.00001309 [15:08:55] Epoch: 1 Batch: 14166/20099 (70.48%) Loss: 1.882253 LR: 0.00001309 [15:08:56] Epoch: 1 Batch: 14167/20099 (70.49%) Loss: 2.105921 LR: 0.00001309 [15:08:58] Epoch: 1 Batch: 14168/20099 (70.49%) Loss: 2.074148 LR: 0.00001309 [15:09:00] Epoch: 1 Batch: 14169/20099 (70.50%) Loss: 2.218823 LR: 0.00001307 [15:09:02] Epoch: 1 Batch: 14170/20099 (70.50%) Loss: 2.048222 LR: 0.00001307 [15:09:04] Epoch: 1 Batch: 14171/20099 (70.51%) Loss: 1.949757 LR: 0.00001307 [15:09:06] Epoch: 1 Batch: 14172/20099 (70.51%) Loss: 2.160639 LR: 0.00001307 [15:09:08] Epoch: 1 Batch: 14173/20099 (70.52%) Loss: 2.140779 LR: 0.00001307 [15:09:09] Epoch: 1 Batch: 14174/20099 (70.52%) Loss: 2.154678 LR: 0.00001307 [15:09:11] Epoch: 1 Batch: 14175/20099 (70.53%) Loss: 2.054819 LR: 0.00001307 [15:09:13] Epoch: 1 Batch: 14176/20099 (70.53%) Loss: 1.943763 LR: 0.00001306 [15:09:15] Epoch: 1 Batch: 14177/20099 (70.54%) Loss: 2.023943 LR: 0.00001306 [15:09:17] Epoch: 1 Batch: 14178/20099 (70.54%) Loss: 2.128215 LR: 0.00001306 [15:09:19] Epoch: 1 Batch: 14179/20099 (70.55%) Loss: 2.233530 LR: 0.00001306 [15:09:20] Epoch: 1 Batch: 14180/20099 (70.55%) Loss: 2.353640 LR: 0.00001306 [15:09:22] Epoch: 1 Batch: 14181/20099 (70.56%) Loss: 2.047935 LR: 0.00001306 [15:09:24] Epoch: 1 Batch: 14182/20099 (70.56%) Loss: 2.029359 LR: 0.00001306 [15:09:26] Epoch: 1 Batch: 14183/20099 (70.57%) Loss: 2.107228 LR: 0.00001304 [15:09:28] Epoch: 1 Batch: 14184/20099 (70.57%) Loss: 2.579060 LR: 0.00001304 [15:09:30] Epoch: 1 Batch: 14185/20099 
(70.58%) Loss: 2.080436 LR: 0.00001304 [15:09:32] Epoch: 1 Batch: 14186/20099 (70.58%) Loss: 2.224449 LR: 0.00001304 [15:09:33] Epoch: 1 Batch: 14187/20099 (70.59%) Loss: 2.035991 LR: 0.00001304 [15:09:35] Epoch: 1 Batch: 14188/20099 (70.59%) Loss: 1.714143 LR: 0.00001304 [15:09:37] Epoch: 1 Batch: 14189/20099 (70.60%) Loss: 2.115271 LR: 0.00001304 [15:09:39] Epoch: 1 Batch: 14190/20099 (70.60%) Loss: 2.251832 LR: 0.00001303 [15:09:41] Epoch: 1 Batch: 14191/20099 (70.61%) Loss: 2.064791 LR: 0.00001303 [15:09:43] Epoch: 1 Batch: 14192/20099 (70.61%) Loss: 2.095898 LR: 0.00001303 [15:09:44] Epoch: 1 Batch: 14193/20099 (70.62%) Loss: 1.999495 LR: 0.00001303 [15:09:46] Epoch: 1 Batch: 14194/20099 (70.62%) Loss: 1.880523 LR: 0.00001303 [15:09:48] Epoch: 1 Batch: 14195/20099 (70.63%) Loss: 2.378606 LR: 0.00001303 [15:09:50] Epoch: 1 Batch: 14196/20099 (70.63%) Loss: 2.050813 LR: 0.00001303 [15:09:52] Epoch: 1 Batch: 14197/20099 (70.64%) Loss: 2.286471 LR: 0.00001302 [15:09:54] Epoch: 1 Batch: 14198/20099 (70.64%) Loss: 2.296535 LR: 0.00001302 [15:09:56] Epoch: 1 Batch: 14199/20099 (70.65%) Loss: 2.148905 LR: 0.00001302 [15:10:01] >> Cleaned up old temp checkpoint: epoch1_step12200 [15:10:01] >> Temp checkpoint saved: epoch1_step14200, size: 0.1693 GB [15:10:01] Epoch: 1 Batch: 14200/20099 (70.65%) Loss: 1.778147 LR: 0.00001302 [15:10:03] Epoch: 1 Batch: 14201/20099 (70.66%) Loss: 2.048033 LR: 0.00001302 [15:10:05] Epoch: 1 Batch: 14202/20099 (70.66%) Loss: 2.127092 LR: 0.00001302 [15:10:06] Epoch: 1 Batch: 14203/20099 (70.67%) Loss: 2.125577 LR: 0.00001302 [15:10:08] Epoch: 1 Batch: 14204/20099 (70.67%) Loss: 1.883456 LR: 0.00001300 [15:10:10] Epoch: 1 Batch: 14205/20099 (70.68%) Loss: 1.932036 LR: 0.00001300 [15:10:12] Epoch: 1 Batch: 14206/20099 (70.68%) Loss: 2.118212 LR: 0.00001300 [15:10:14] Epoch: 1 Batch: 14207/20099 (70.69%) Loss: 1.956492 LR: 0.00001300 [15:10:16] Epoch: 1 Batch: 14208/20099 (70.69%) Loss: 2.170266 LR: 0.00001300 [15:10:18] Epoch: 1 Batch: 
14209/20099 (70.70%) Loss: 2.233057 LR: 0.00001300 [15:10:19] Epoch: 1 Batch: 14210/20099 (70.70%) Loss: 2.136323 LR: 0.00001300 [15:10:21] Epoch: 1 Batch: 14211/20099 (70.71%) Loss: 2.140452 LR: 0.00001299 [15:10:23] Epoch: 1 Batch: 14212/20099 (70.71%) Loss: 2.169427 LR: 0.00001299 [15:10:25] Epoch: 1 Batch: 14213/20099 (70.71%) Loss: 1.829173 LR: 0.00001299 [15:10:27] Epoch: 1 Batch: 14214/20099 (70.72%) Loss: 2.188499 LR: 0.00001299 [15:10:29] Epoch: 1 Batch: 14215/20099 (70.72%) Loss: 2.253694 LR: 0.00001299 [15:10:31] Epoch: 1 Batch: 14216/20099 (70.73%) Loss: 2.108526 LR: 0.00001299 [15:10:33] Epoch: 1 Batch: 14217/20099 (70.73%) Loss: 2.022913 LR: 0.00001299 [15:10:35] Epoch: 1 Batch: 14218/20099 (70.74%) Loss: 2.097582 LR: 0.00001297 [15:10:36] Epoch: 1 Batch: 14219/20099 (70.74%) Loss: 1.958154 LR: 0.00001297 [15:10:38] Epoch: 1 Batch: 14220/20099 (70.75%) Loss: 2.049472 LR: 0.00001297 [15:10:40] Epoch: 1 Batch: 14221/20099 (70.75%) Loss: 2.116355 LR: 0.00001297 [15:10:42] Epoch: 1 Batch: 14222/20099 (70.76%) Loss: 2.280712 LR: 0.00001297 [15:10:44] Epoch: 1 Batch: 14223/20099 (70.76%) Loss: 2.074663 LR: 0.00001297 [15:10:46] Epoch: 1 Batch: 14224/20099 (70.77%) Loss: 2.292792 LR: 0.00001297 [15:10:48] Epoch: 1 Batch: 14225/20099 (70.77%) Loss: 2.048664 LR: 0.00001296 [15:10:49] Epoch: 1 Batch: 14226/20099 (70.78%) Loss: 2.391509 LR: 0.00001296 [15:10:51] Epoch: 1 Batch: 14227/20099 (70.78%) Loss: 2.012646 LR: 0.00001296 [15:10:53] Epoch: 1 Batch: 14228/20099 (70.79%) Loss: 1.984705 LR: 0.00001296 [15:10:55] Epoch: 1 Batch: 14229/20099 (70.79%) Loss: 2.099390 LR: 0.00001296 [15:10:57] Epoch: 1 Batch: 14230/20099 (70.80%) Loss: 2.407599 LR: 0.00001296 [15:10:59] Epoch: 1 Batch: 14231/20099 (70.80%) Loss: 2.017683 LR: 0.00001296 [15:11:01] Epoch: 1 Batch: 14232/20099 (70.81%) Loss: 2.152608 LR: 0.00001294 [15:11:02] Epoch: 1 Batch: 14233/20099 (70.81%) Loss: 2.183143 LR: 0.00001294 [15:11:04] Epoch: 1 Batch: 14234/20099 (70.82%) Loss: 1.992147 LR: 
0.00001294 [15:11:06] Epoch: 1 Batch: 14235/20099 (70.82%) Loss: 2.051990 LR: 0.00001294 [15:11:08] Epoch: 1 Batch: 14236/20099 (70.83%) Loss: 2.161864 LR: 0.00001294 [15:11:10] Epoch: 1 Batch: 14237/20099 (70.83%) Loss: 1.988487 LR: 0.00001294 [15:11:12] Epoch: 1 Batch: 14238/20099 (70.84%) Loss: 1.962685 LR: 0.00001294 [15:11:13] Epoch: 1 Batch: 14239/20099 (70.84%) Loss: 2.360046 LR: 0.00001293 [15:11:15] Epoch: 1 Batch: 14240/20099 (70.85%) Loss: 2.083917 LR: 0.00001293 [15:11:17] Epoch: 1 Batch: 14241/20099 (70.85%) Loss: 2.101196 LR: 0.00001293 [15:11:19] Epoch: 1 Batch: 14242/20099 (70.86%) Loss: 1.967901 LR: 0.00001293 [15:11:21] Epoch: 1 Batch: 14243/20099 (70.86%) Loss: 2.010656 LR: 0.00001293 [15:11:23] Epoch: 1 Batch: 14244/20099 (70.87%) Loss: 1.901682 LR: 0.00001293 [15:11:24] Epoch: 1 Batch: 14245/20099 (70.87%) Loss: 2.088339 LR: 0.00001293 [15:11:26] Epoch: 1 Batch: 14246/20099 (70.88%) Loss: 2.083033 LR: 0.00001291 [15:11:28] Epoch: 1 Batch: 14247/20099 (70.88%) Loss: 1.934826 LR: 0.00001291 [15:11:30] Epoch: 1 Batch: 14248/20099 (70.89%) Loss: 2.463536 LR: 0.00001291 [15:11:32] Epoch: 1 Batch: 14249/20099 (70.89%) Loss: 1.949071 LR: 0.00001291 [15:11:34] Epoch: 1 Batch: 14250/20099 (70.90%) Loss: 2.160035 LR: 0.00001291 [15:11:36] Epoch: 1 Batch: 14251/20099 (70.90%) Loss: 1.906379 LR: 0.00001291 [15:11:37] Epoch: 1 Batch: 14252/20099 (70.91%) Loss: 2.040885 LR: 0.00001291 [15:11:39] Epoch: 1 Batch: 14253/20099 (70.91%) Loss: 2.290811 LR: 0.00001290 [15:11:41] Epoch: 1 Batch: 14254/20099 (70.92%) Loss: 2.044679 LR: 0.00001290 [15:11:43] Epoch: 1 Batch: 14255/20099 (70.92%) Loss: 2.275305 LR: 0.00001290 [15:11:45] Epoch: 1 Batch: 14256/20099 (70.93%) Loss: 2.160977 LR: 0.00001290 [15:11:47] Epoch: 1 Batch: 14257/20099 (70.93%) Loss: 2.095740 LR: 0.00001290 [15:11:49] Epoch: 1 Batch: 14258/20099 (70.94%) Loss: 2.264572 LR: 0.00001290 [15:11:50] Epoch: 1 Batch: 14259/20099 (70.94%) Loss: 2.008916 LR: 0.00001290 [15:11:52] Epoch: 1 Batch: 14260/20099 
(70.95%) Loss: 1.997956 LR: 0.00001288 [15:11:54] Epoch: 1 Batch: 14261/20099 (70.95%) Loss: 2.050543 LR: 0.00001288 [15:11:56] Epoch: 1 Batch: 14262/20099 (70.96%) Loss: 2.298821 LR: 0.00001288 [15:11:58] Epoch: 1 Batch: 14263/20099 (70.96%) Loss: 2.020747 LR: 0.00001288 [15:12:00] Epoch: 1 Batch: 14264/20099 (70.97%) Loss: 2.071627 LR: 0.00001288 [15:12:02] Epoch: 1 Batch: 14265/20099 (70.97%) Loss: 2.066666 LR: 0.00001288 [15:12:03] Epoch: 1 Batch: 14266/20099 (70.98%) Loss: 2.313992 LR: 0.00001288 [15:12:05] Epoch: 1 Batch: 14267/20099 (70.98%) Loss: 2.029183 LR: 0.00001287 [15:12:07] Epoch: 1 Batch: 14268/20099 (70.99%) Loss: 1.924399 LR: 0.00001287 [15:12:09] Epoch: 1 Batch: 14269/20099 (70.99%) Loss: 2.206744 LR: 0.00001287 [15:12:11] Epoch: 1 Batch: 14270/20099 (71.00%) Loss: 2.231679 LR: 0.00001287 [15:12:13] Epoch: 1 Batch: 14271/20099 (71.00%) Loss: 2.003874 LR: 0.00001287 [15:12:15] Epoch: 1 Batch: 14272/20099 (71.01%) Loss: 1.761560 LR: 0.00001287 [15:12:16] Epoch: 1 Batch: 14273/20099 (71.01%) Loss: 2.349317 LR: 0.00001287 [15:12:18] Epoch: 1 Batch: 14274/20099 (71.02%) Loss: 2.185916 LR: 0.00001285 [15:12:20] Epoch: 1 Batch: 14275/20099 (71.02%) Loss: 1.842154 LR: 0.00001285 [15:12:22] Epoch: 1 Batch: 14276/20099 (71.03%) Loss: 2.250989 LR: 0.00001285 [15:12:24] Epoch: 1 Batch: 14277/20099 (71.03%) Loss: 2.146939 LR: 0.00001285 [15:12:26] Epoch: 1 Batch: 14278/20099 (71.04%) Loss: 2.069284 LR: 0.00001285 [15:12:27] Epoch: 1 Batch: 14279/20099 (71.04%) Loss: 2.093804 LR: 0.00001285 [15:12:29] Epoch: 1 Batch: 14280/20099 (71.05%) Loss: 2.183137 LR: 0.00001285 [15:12:31] Epoch: 1 Batch: 14281/20099 (71.05%) Loss: 2.452441 LR: 0.00001284 [15:12:33] Epoch: 1 Batch: 14282/20099 (71.06%) Loss: 2.143381 LR: 0.00001284 [15:12:35] Epoch: 1 Batch: 14283/20099 (71.06%) Loss: 2.092828 LR: 0.00001284 [15:12:37] Epoch: 1 Batch: 14284/20099 (71.07%) Loss: 1.878299 LR: 0.00001284 [15:12:39] Epoch: 1 Batch: 14285/20099 (71.07%) Loss: 2.028181 LR: 0.00001284 [15:12:40] 
Epoch: 1 Batch: 14286/20099 (71.08%) Loss: 2.117287 LR: 0.00001284 [15:12:42] Epoch: 1 Batch: 14287/20099 (71.08%) Loss: 2.160241 LR: 0.00001284 [15:12:44] Epoch: 1 Batch: 14288/20099 (71.09%) Loss: 2.094038 LR: 0.00001282 [15:12:46] Epoch: 1 Batch: 14289/20099 (71.09%) Loss: 2.236572 LR: 0.00001282 [15:12:48] Epoch: 1 Batch: 14290/20099 (71.10%) Loss: 2.046491 LR: 0.00001282 [15:12:50] Epoch: 1 Batch: 14291/20099 (71.10%) Loss: 1.774092 LR: 0.00001282 [15:12:52] Epoch: 1 Batch: 14292/20099 (71.11%) Loss: 2.364005 LR: 0.00001282 [15:12:53] Epoch: 1 Batch: 14293/20099 (71.11%) Loss: 1.826458 LR: 0.00001282 [15:12:55] Epoch: 1 Batch: 14294/20099 (71.12%) Loss: 2.015998 LR: 0.00001282 [15:12:57] Epoch: 1 Batch: 14295/20099 (71.12%) Loss: 1.851637 LR: 0.00001281 [15:12:59] Epoch: 1 Batch: 14296/20099 (71.13%) Loss: 2.108362 LR: 0.00001281 [15:13:01] Epoch: 1 Batch: 14297/20099 (71.13%) Loss: 1.991532 LR: 0.00001281 [15:13:03] Epoch: 1 Batch: 14298/20099 (71.14%) Loss: 2.288403 LR: 0.00001281 [15:13:05] Epoch: 1 Batch: 14299/20099 (71.14%) Loss: 2.049167 LR: 0.00001281 [15:13:06] Epoch: 1 Batch: 14300/20099 (71.15%) Loss: 1.792877 LR: 0.00001281 [15:13:08] Epoch: 1 Batch: 14301/20099 (71.15%) Loss: 2.212775 LR: 0.00001281 [15:13:10] Epoch: 1 Batch: 14302/20099 (71.16%) Loss: 2.052789 LR: 0.00001279 [15:13:12] Epoch: 1 Batch: 14303/20099 (71.16%) Loss: 2.148552 LR: 0.00001279 [15:13:14] Epoch: 1 Batch: 14304/20099 (71.17%) Loss: 1.838241 LR: 0.00001279 [15:13:16] Epoch: 1 Batch: 14305/20099 (71.17%) Loss: 2.006079 LR: 0.00001279 [15:13:18] Epoch: 1 Batch: 14306/20099 (71.18%) Loss: 2.521352 LR: 0.00001279 [15:13:19] Epoch: 1 Batch: 14307/20099 (71.18%) Loss: 1.997672 LR: 0.00001279 [15:13:21] Epoch: 1 Batch: 14308/20099 (71.19%) Loss: 1.998363 LR: 0.00001279 [15:13:23] Epoch: 1 Batch: 14309/20099 (71.19%) Loss: 2.151805 LR: 0.00001278 [15:13:25] Epoch: 1 Batch: 14310/20099 (71.20%) Loss: 1.948065 LR: 0.00001278 [15:13:27] Epoch: 1 Batch: 14311/20099 (71.20%) Loss: 
1.877794 LR: 0.00001278 [15:13:29] Epoch: 1 Batch: 14312/20099 (71.21%) Loss: 2.216712 LR: 0.00001278 [15:13:31] Epoch: 1 Batch: 14313/20099 (71.21%) Loss: 2.215113 LR: 0.00001278 [15:13:32] Epoch: 1 Batch: 14314/20099 (71.22%) Loss: 2.139597 LR: 0.00001278 [15:13:34] Epoch: 1 Batch: 14315/20099 (71.22%) Loss: 2.149871 LR: 0.00001278 [15:13:36] Epoch: 1 Batch: 14316/20099 (71.23%) Loss: 2.000152 LR: 0.00001277 [15:13:38] Epoch: 1 Batch: 14317/20099 (71.23%) Loss: 2.012342 LR: 0.00001277 [15:13:40] Epoch: 1 Batch: 14318/20099 (71.24%) Loss: 1.899079 LR: 0.00001277 [15:13:42] Epoch: 1 Batch: 14319/20099 (71.24%) Loss: 2.537788 LR: 0.00001277 [15:13:44] Epoch: 1 Batch: 14320/20099 (71.25%) Loss: 2.433259 LR: 0.00001277 [15:13:45] Epoch: 1 Batch: 14321/20099 (71.25%) Loss: 2.167090 LR: 0.00001277 [15:13:47] Epoch: 1 Batch: 14322/20099 (71.26%) Loss: 2.113279 LR: 0.00001277 [15:13:49] Epoch: 1 Batch: 14323/20099 (71.26%) Loss: 2.197399 LR: 0.00001275 [15:13:51] Epoch: 1 Batch: 14324/20099 (71.27%) Loss: 2.130492 LR: 0.00001275 [15:13:53] Epoch: 1 Batch: 14325/20099 (71.27%) Loss: 2.047857 LR: 0.00001275 [15:13:55] Epoch: 1 Batch: 14326/20099 (71.28%) Loss: 2.188257 LR: 0.00001275 [15:13:57] Epoch: 1 Batch: 14327/20099 (71.28%) Loss: 2.095658 LR: 0.00001275 [15:13:58] Epoch: 1 Batch: 14328/20099 (71.29%) Loss: 1.766085 LR: 0.00001275 [15:14:00] Epoch: 1 Batch: 14329/20099 (71.29%) Loss: 1.965010 LR: 0.00001275 [15:14:02] Epoch: 1 Batch: 14330/20099 (71.30%) Loss: 2.070580 LR: 0.00001274 [15:14:04] Epoch: 1 Batch: 14331/20099 (71.30%) Loss: 1.739433 LR: 0.00001274 [15:14:06] Epoch: 1 Batch: 14332/20099 (71.31%) Loss: 2.107355 LR: 0.00001274 [15:14:08] Epoch: 1 Batch: 14333/20099 (71.31%) Loss: 1.887204 LR: 0.00001274 [15:14:09] Epoch: 1 Batch: 14334/20099 (71.32%) Loss: 2.030982 LR: 0.00001274 [15:14:11] Epoch: 1 Batch: 14335/20099 (71.32%) Loss: 2.009320 LR: 0.00001274 [15:14:13] Epoch: 1 Batch: 14336/20099 (71.33%) Loss: 2.125716 LR: 0.00001274 [15:14:15] Epoch: 1 
Batch: 14337/20099 (71.33%) Loss: 2.191684 LR: 0.00001272 [15:14:17] Epoch: 1 Batch: 14338/20099 (71.34%) Loss: 1.899550 LR: 0.00001272 [15:14:19] Epoch: 1 Batch: 14339/20099 (71.34%) Loss: 2.223105 LR: 0.00001272 [15:14:21] Epoch: 1 Batch: 14340/20099 (71.35%) Loss: 2.197577 LR: 0.00001272 [15:14:22] Epoch: 1 Batch: 14341/20099 (71.35%) Loss: 2.256318 LR: 0.00001272 [15:14:24] Epoch: 1 Batch: 14342/20099 (71.36%) Loss: 2.203864 LR: 0.00001272 [15:14:26] Epoch: 1 Batch: 14343/20099 (71.36%) Loss: 2.034499 LR: 0.00001272 [15:14:28] Epoch: 1 Batch: 14344/20099 (71.37%) Loss: 2.186865 LR: 0.00001271 [15:14:30] Epoch: 1 Batch: 14345/20099 (71.37%) Loss: 2.044107 LR: 0.00001271 [15:14:32] Epoch: 1 Batch: 14346/20099 (71.38%) Loss: 2.170029 LR: 0.00001271 [15:14:34] Epoch: 1 Batch: 14347/20099 (71.38%) Loss: 2.129327 LR: 0.00001271 [15:14:36] Epoch: 1 Batch: 14348/20099 (71.39%) Loss: 2.287020 LR: 0.00001271 [15:14:38] Epoch: 1 Batch: 14349/20099 (71.39%) Loss: 1.977602 LR: 0.00001271 [15:14:39] Epoch: 1 Batch: 14350/20099 (71.40%) Loss: 1.947317 LR: 0.00001271 [15:14:41] Epoch: 1 Batch: 14351/20099 (71.40%) Loss: 2.026627 LR: 0.00001269 [15:14:43] Epoch: 1 Batch: 14352/20099 (71.41%) Loss: 2.179625 LR: 0.00001269 [15:14:45] Epoch: 1 Batch: 14353/20099 (71.41%) Loss: 1.953543 LR: 0.00001269 [15:14:47] Epoch: 1 Batch: 14354/20099 (71.42%) Loss: 2.018035 LR: 0.00001269 [15:14:49] Epoch: 1 Batch: 14355/20099 (71.42%) Loss: 1.907887 LR: 0.00001269 [15:14:51] Epoch: 1 Batch: 14356/20099 (71.43%) Loss: 2.303562 LR: 0.00001269 [15:14:52] Epoch: 1 Batch: 14357/20099 (71.43%) Loss: 2.162208 LR: 0.00001269 [15:14:54] Epoch: 1 Batch: 14358/20099 (71.44%) Loss: 2.359410 LR: 0.00001268 [15:14:56] Epoch: 1 Batch: 14359/20099 (71.44%) Loss: 2.038847 LR: 0.00001268 [15:14:58] Epoch: 1 Batch: 14360/20099 (71.45%) Loss: 2.267715 LR: 0.00001268 [15:15:00] Epoch: 1 Batch: 14361/20099 (71.45%) Loss: 2.086479 LR: 0.00001268 [15:15:02] Epoch: 1 Batch: 14362/20099 (71.46%) Loss: 2.276069 LR: 
0.00001268 [15:15:04] Epoch: 1 Batch: 14363/20099 (71.46%) Loss: 2.056138 LR: 0.00001268 [15:15:05] Epoch: 1 Batch: 14364/20099 (71.47%) Loss: 2.281404 LR: 0.00001268 [15:15:07] Epoch: 1 Batch: 14365/20099 (71.47%) Loss: 2.120492 LR: 0.00001266 [15:15:09] Epoch: 1 Batch: 14366/20099 (71.48%) Loss: 2.051313 LR: 0.00001266 [15:15:11] Epoch: 1 Batch: 14367/20099 (71.48%) Loss: 2.134567 LR: 0.00001266 [15:15:13] Epoch: 1 Batch: 14368/20099 (71.49%) Loss: 1.789321 LR: 0.00001266 [15:15:15] Epoch: 1 Batch: 14369/20099 (71.49%) Loss: 1.693445 LR: 0.00001266 [15:15:17] Epoch: 1 Batch: 14370/20099 (71.50%) Loss: 2.067714 LR: 0.00001266 [15:15:18] Epoch: 1 Batch: 14371/20099 (71.50%) Loss: 2.123215 LR: 0.00001266 [15:15:20] Epoch: 1 Batch: 14372/20099 (71.51%) Loss: 1.870459 LR: 0.00001265 [15:15:22] Epoch: 1 Batch: 14373/20099 (71.51%) Loss: 2.179322 LR: 0.00001265 [15:15:24] Epoch: 1 Batch: 14374/20099 (71.52%) Loss: 2.337648 LR: 0.00001265 [15:15:26] Epoch: 1 Batch: 14375/20099 (71.52%) Loss: 2.033265 LR: 0.00001265 [15:15:28] Epoch: 1 Batch: 14376/20099 (71.53%) Loss: 1.904836 LR: 0.00001265 [15:15:30] Epoch: 1 Batch: 14377/20099 (71.53%) Loss: 2.504828 LR: 0.00001265 [15:15:31] Epoch: 1 Batch: 14378/20099 (71.54%) Loss: 2.086949 LR: 0.00001265 [15:15:33] Epoch: 1 Batch: 14379/20099 (71.54%) Loss: 2.122542 LR: 0.00001263 [15:15:35] Epoch: 1 Batch: 14380/20099 (71.55%) Loss: 2.071714 LR: 0.00001263 [15:15:37] Epoch: 1 Batch: 14381/20099 (71.55%) Loss: 1.894771 LR: 0.00001263 [15:15:39] Epoch: 1 Batch: 14382/20099 (71.56%) Loss: 2.319098 LR: 0.00001263 [15:15:41] Epoch: 1 Batch: 14383/20099 (71.56%) Loss: 2.279652 LR: 0.00001263 [15:15:42] Epoch: 1 Batch: 14384/20099 (71.57%) Loss: 2.373564 LR: 0.00001263 [15:15:44] Epoch: 1 Batch: 14385/20099 (71.57%) Loss: 2.100822 LR: 0.00001263 [15:15:46] Epoch: 1 Batch: 14386/20099 (71.58%) Loss: 1.988576 LR: 0.00001262 [15:15:48] Epoch: 1 Batch: 14387/20099 (71.58%) Loss: 2.149220 LR: 0.00001262 [15:15:50] Epoch: 1 Batch: 14388/20099 
(71.59%) Loss: 2.156591 LR: 0.00001262 [15:15:52] Epoch: 1 Batch: 14389/20099 (71.59%) Loss: 2.148806 LR: 0.00001262 [15:15:54] Epoch: 1 Batch: 14390/20099 (71.60%) Loss: 2.266338 LR: 0.00001262 [15:15:55] Epoch: 1 Batch: 14391/20099 (71.60%) Loss: 2.061398 LR: 0.00001262 [15:15:57] Epoch: 1 Batch: 14392/20099 (71.61%) Loss: 1.843570 LR: 0.00001262 [15:15:59] Epoch: 1 Batch: 14393/20099 (71.61%) Loss: 1.948949 LR: 0.00001261 [15:16:01] Epoch: 1 Batch: 14394/20099 (71.62%) Loss: 1.814525 LR: 0.00001261 [15:16:03] Epoch: 1 Batch: 14395/20099 (71.62%) Loss: 2.158657 LR: 0.00001261 [15:16:05] Epoch: 1 Batch: 14396/20099 (71.63%) Loss: 1.749747 LR: 0.00001261 [15:16:07] Epoch: 1 Batch: 14397/20099 (71.63%) Loss: 1.921659 LR: 0.00001261 [15:16:08] Epoch: 1 Batch: 14398/20099 (71.64%) Loss: 2.144074 LR: 0.00001261 [15:16:10] Epoch: 1 Batch: 14399/20099 (71.64%) Loss: 1.687020 LR: 0.00001261 [15:16:16] >> Cleaned up old temp checkpoint: epoch1_step12400 [15:16:16] >> Temp checkpoint saved: epoch1_step14400, size: 0.1693 GB [15:16:16] Epoch: 1 Batch: 14400/20099 (71.65%) Loss: 2.206450 LR: 0.00001259 [15:16:18] Epoch: 1 Batch: 14401/20099 (71.65%) Loss: 2.075752 LR: 0.00001259 [15:16:20] Epoch: 1 Batch: 14402/20099 (71.66%) Loss: 2.045844 LR: 0.00001259 [15:16:22] Epoch: 1 Batch: 14403/20099 (71.66%) Loss: 2.143670 LR: 0.00001259 [15:16:23] Epoch: 1 Batch: 14404/20099 (71.67%) Loss: 1.976221 LR: 0.00001259 [15:16:25] Epoch: 1 Batch: 14405/20099 (71.67%) Loss: 1.876219 LR: 0.00001259 [15:16:27] Epoch: 1 Batch: 14406/20099 (71.68%) Loss: 1.975319 LR: 0.00001259 [15:16:29] Epoch: 1 Batch: 14407/20099 (71.68%) Loss: 2.163534 LR: 0.00001258 [15:16:31] Epoch: 1 Batch: 14408/20099 (71.69%) Loss: 2.071994 LR: 0.00001258 [15:16:33] Epoch: 1 Batch: 14409/20099 (71.69%) Loss: 1.934184 LR: 0.00001258 [15:16:35] Epoch: 1 Batch: 14410/20099 (71.70%) Loss: 1.976240 LR: 0.00001258 [15:16:36] Epoch: 1 Batch: 14411/20099 (71.70%) Loss: 2.033132 LR: 0.00001258 [15:16:38] Epoch: 1 Batch: 
14412/20099 (71.71%) Loss: 1.987811 LR: 0.00001258 [15:16:40] Epoch: 1 Batch: 14413/20099 (71.71%) Loss: 2.220178 LR: 0.00001258 [15:16:42] Epoch: 1 Batch: 14414/20099 (71.72%) Loss: 2.152871 LR: 0.00001256 [15:16:44] Epoch: 1 Batch: 14415/20099 (71.72%) Loss: 2.332562 LR: 0.00001256 [15:16:46] Epoch: 1 Batch: 14416/20099 (71.72%) Loss: 2.238581 LR: 0.00001256 [15:16:48] Epoch: 1 Batch: 14417/20099 (71.73%) Loss: 2.089610 LR: 0.00001256 [15:16:49] Epoch: 1 Batch: 14418/20099 (71.73%) Loss: 2.253574 LR: 0.00001256 [15:16:51] Epoch: 1 Batch: 14419/20099 (71.74%) Loss: 1.877119 LR: 0.00001256 [15:16:53] Epoch: 1 Batch: 14420/20099 (71.74%) Loss: 1.851880 LR: 0.00001256 [15:16:55] Epoch: 1 Batch: 14421/20099 (71.75%) Loss: 2.165289 LR: 0.00001255 [15:16:57] Epoch: 1 Batch: 14422/20099 (71.75%) Loss: 2.246713 LR: 0.00001255 [15:16:59] Epoch: 1 Batch: 14423/20099 (71.76%) Loss: 2.126076 LR: 0.00001255 [15:17:01] Epoch: 1 Batch: 14424/20099 (71.76%) Loss: 2.044554 LR: 0.00001255 [15:17:03] Epoch: 1 Batch: 14425/20099 (71.77%) Loss: 2.324728 LR: 0.00001255 [15:17:04] Epoch: 1 Batch: 14426/20099 (71.77%) Loss: 1.812767 LR: 0.00001255 [15:17:06] Epoch: 1 Batch: 14427/20099 (71.78%) Loss: 2.258412 LR: 0.00001255 [15:17:08] Epoch: 1 Batch: 14428/20099 (71.78%) Loss: 2.184781 LR: 0.00001253 [15:17:10] Epoch: 1 Batch: 14429/20099 (71.79%) Loss: 2.070400 LR: 0.00001253 [15:17:12] Epoch: 1 Batch: 14430/20099 (71.79%) Loss: 1.893469 LR: 0.00001253 [15:17:14] Epoch: 1 Batch: 14431/20099 (71.80%) Loss: 2.042439 LR: 0.00001253 [15:17:16] Epoch: 1 Batch: 14432/20099 (71.80%) Loss: 1.971487 LR: 0.00001253 [15:17:17] Epoch: 1 Batch: 14433/20099 (71.81%) Loss: 1.981019 LR: 0.00001253 [15:17:19] Epoch: 1 Batch: 14434/20099 (71.81%) Loss: 2.101089 LR: 0.00001253 [15:17:21] Epoch: 1 Batch: 14435/20099 (71.82%) Loss: 1.960382 LR: 0.00001252 [15:17:23] Epoch: 1 Batch: 14436/20099 (71.82%) Loss: 2.217888 LR: 0.00001252 [15:17:25] Epoch: 1 Batch: 14437/20099 (71.83%) Loss: 2.130462 LR: 
0.00001252 [15:17:27] Epoch: 1 Batch: 14438/20099 (71.83%) Loss: 2.106613 LR: 0.00001252 [15:17:28] Epoch: 1 Batch: 14439/20099 (71.84%) Loss: 1.908011 LR: 0.00001252 [15:17:30] Epoch: 1 Batch: 14440/20099 (71.84%) Loss: 2.220032 LR: 0.00001252 [15:17:32] Epoch: 1 Batch: 14441/20099 (71.85%) Loss: 1.910267 LR: 0.00001252 [15:17:34] Epoch: 1 Batch: 14442/20099 (71.85%) Loss: 2.261104 LR: 0.00001250 [15:17:36] Epoch: 1 Batch: 14443/20099 (71.86%) Loss: 2.085515 LR: 0.00001250 [15:17:38] Epoch: 1 Batch: 14444/20099 (71.86%) Loss: 1.943610 LR: 0.00001250 [15:17:40] Epoch: 1 Batch: 14445/20099 (71.87%) Loss: 1.963847 LR: 0.00001250 [15:17:41] Epoch: 1 Batch: 14446/20099 (71.87%) Loss: 2.349266 LR: 0.00001250 [15:17:43] Epoch: 1 Batch: 14447/20099 (71.88%) Loss: 2.200140 LR: 0.00001250 [15:17:45] Epoch: 1 Batch: 14448/20099 (71.88%) Loss: 1.996956 LR: 0.00001250 [15:17:47] Epoch: 1 Batch: 14449/20099 (71.89%) Loss: 2.202397 LR: 0.00001249 [15:17:49] Epoch: 1 Batch: 14450/20099 (71.89%) Loss: 2.148545 LR: 0.00001249 [15:17:51] Epoch: 1 Batch: 14451/20099 (71.90%) Loss: 2.126232 LR: 0.00001249 [15:17:53] Epoch: 1 Batch: 14452/20099 (71.90%) Loss: 2.233553 LR: 0.00001249 [15:17:54] Epoch: 1 Batch: 14453/20099 (71.91%) Loss: 2.125915 LR: 0.00001249 [15:17:56] Epoch: 1 Batch: 14454/20099 (71.91%) Loss: 2.064939 LR: 0.00001249 [15:17:58] Epoch: 1 Batch: 14455/20099 (71.92%) Loss: 1.873284 LR: 0.00001249 [15:18:00] Epoch: 1 Batch: 14456/20099 (71.92%) Loss: 2.141307 LR: 0.00001247 [15:18:02] Epoch: 1 Batch: 14457/20099 (71.93%) Loss: 2.360401 LR: 0.00001247 [15:18:04] Epoch: 1 Batch: 14458/20099 (71.93%) Loss: 2.310028 LR: 0.00001247 [15:18:05] Epoch: 1 Batch: 14459/20099 (71.94%) Loss: 2.637458 LR: 0.00001247 [15:18:07] Epoch: 1 Batch: 14460/20099 (71.94%) Loss: 2.149997 LR: 0.00001247 [15:18:09] Epoch: 1 Batch: 14461/20099 (71.95%) Loss: 2.183871 LR: 0.00001247 [15:18:11] Epoch: 1 Batch: 14462/20099 (71.95%) Loss: 1.826224 LR: 0.00001247 [15:18:13] Epoch: 1 Batch: 14463/20099 
(71.96%) Loss: 1.833122 LR: 0.00001246 [15:18:15] Epoch: 1 Batch: 14464/20099 (71.96%) Loss: 2.202696 LR: 0.00001246 [15:18:17] Epoch: 1 Batch: 14465/20099 (71.97%) Loss: 1.819483 LR: 0.00001246 [15:18:18] Epoch: 1 Batch: 14466/20099 (71.97%) Loss: 2.076388 LR: 0.00001246 [15:18:20] Epoch: 1 Batch: 14467/20099 (71.98%) Loss: 1.945291 LR: 0.00001246 [15:18:22] Epoch: 1 Batch: 14468/20099 (71.98%) Loss: 2.335364 LR: 0.00001246 [15:18:24] Epoch: 1 Batch: 14469/20099 (71.99%) Loss: 2.198541 LR: 0.00001246 [15:18:26] Epoch: 1 Batch: 14470/20099 (71.99%) Loss: 2.079492 LR: 0.00001245 [15:18:28] Epoch: 1 Batch: 14471/20099 (72.00%) Loss: 2.233108 LR: 0.00001245 [15:18:29] Epoch: 1 Batch: 14472/20099 (72.00%) Loss: 1.769062 LR: 0.00001245 [15:18:31] Epoch: 1 Batch: 14473/20099 (72.01%) Loss: 2.417563 LR: 0.00001245 [15:18:33] Epoch: 1 Batch: 14474/20099 (72.01%) Loss: 2.011116 LR: 0.00001245 [15:18:35] Epoch: 1 Batch: 14475/20099 (72.02%) Loss: 2.102329 LR: 0.00001245 [15:18:37] Epoch: 1 Batch: 14476/20099 (72.02%) Loss: 2.167774 LR: 0.00001245 [15:18:39] Epoch: 1 Batch: 14477/20099 (72.03%) Loss: 1.933705 LR: 0.00001243 [15:18:41] Epoch: 1 Batch: 14478/20099 (72.03%) Loss: 2.047865 LR: 0.00001243 [15:18:42] Epoch: 1 Batch: 14479/20099 (72.04%) Loss: 2.353516 LR: 0.00001243 [15:18:44] Epoch: 1 Batch: 14480/20099 (72.04%) Loss: 2.319725 LR: 0.00001243 [15:18:46] Epoch: 1 Batch: 14481/20099 (72.05%) Loss: 1.982674 LR: 0.00001243 [15:18:48] Epoch: 1 Batch: 14482/20099 (72.05%) Loss: 1.962979 LR: 0.00001243 [15:18:50] Epoch: 1 Batch: 14483/20099 (72.06%) Loss: 1.914790 LR: 0.00001243 [15:18:52] Epoch: 1 Batch: 14484/20099 (72.06%) Loss: 2.024591 LR: 0.00001242 [15:18:54] Epoch: 1 Batch: 14485/20099 (72.07%) Loss: 2.169400 LR: 0.00001242 [15:18:55] Epoch: 1 Batch: 14486/20099 (72.07%) Loss: 2.048958 LR: 0.00001242 [15:18:57] Epoch: 1 Batch: 14487/20099 (72.08%) Loss: 2.480508 LR: 0.00001242 [15:18:59] Epoch: 1 Batch: 14488/20099 (72.08%) Loss: 2.060916 LR: 0.00001242 [15:19:01] 
Epoch: 1 Batch: 14489/20099 (72.09%) Loss: 2.125001 LR: 0.00001242 [15:19:03] Epoch: 1 Batch: 14490/20099 (72.09%) Loss: 1.822491 LR: 0.00001242 [15:19:05] Epoch: 1 Batch: 14491/20099 (72.10%) Loss: 1.838583 LR: 0.00001240 [15:19:07] Epoch: 1 Batch: 14492/20099 (72.10%) Loss: 2.015318 LR: 0.00001240 [15:19:09] Epoch: 1 Batch: 14493/20099 (72.11%) Loss: 2.206692 LR: 0.00001240 [15:19:10] Epoch: 1 Batch: 14494/20099 (72.11%) Loss: 2.215018 LR: 0.00001240 [15:19:12] Epoch: 1 Batch: 14495/20099 (72.12%) Loss: 2.295737 LR: 0.00001240 [15:19:14] Epoch: 1 Batch: 14496/20099 (72.12%) Loss: 2.448688 LR: 0.00001240 [15:19:16] Epoch: 1 Batch: 14497/20099 (72.13%) Loss: 2.364935 LR: 0.00001240 [15:19:18] Epoch: 1 Batch: 14498/20099 (72.13%) Loss: 2.018556 LR: 0.00001239 [15:19:20] Epoch: 1 Batch: 14499/20099 (72.14%) Loss: 1.936328 LR: 0.00001239 [15:19:22] >> Evaluating batch 0 [15:19:23] >> Evaluating batch 1 [15:19:24] >> Evaluating batch 2 [15:19:25] >> Evaluating batch 3 [15:19:26] >> Evaluating batch 4 [15:19:27] >> Evaluating batch 5 [15:19:28] >> Evaluating batch 6 [15:19:29] >> Evaluating batch 7 [15:19:30] >> Evaluating batch 8 [15:19:31] >> Evaluating batch 9 [15:19:32] >> Evaluating batch 10 [15:19:33] >> Evaluating batch 11 [15:19:34] >> Evaluating batch 12 [15:19:35] >> Evaluating batch 13 [15:19:36] >> Evaluating batch 14 [15:19:37] >> Evaluating batch 15 [15:19:38] >> Evaluating batch 16 [15:19:39] Epoch: 1 Step: 14500/20099 Evaluation: [15:19:39] Avg Loss Since Last Eval: 2.0950 Val Loss: 2.1510 Validation loss delta: -0.0036 Perplexity: 8.5937 LR: 0.00001239 [15:19:43] >> Checkpoint saved: epoch1_step14500, size: 0.1693 GB [15:19:43] Epoch: 1 Batch: 14500/20099 (72.14%) Loss: 1.991810 LR: 0.00001239 [15:19:44] Epoch: 1 Batch: 14501/20099 (72.15%) Loss: 2.164187 LR: 0.00001239 [15:19:46] Epoch: 1 Batch: 14502/20099 (72.15%) Loss: 2.163255 LR: 0.00001239 [15:19:48] Epoch: 1 Batch: 14503/20099 (72.16%) Loss: 2.463665 LR: 0.00001239 [15:19:50] Epoch: 1 Batch: 
14504/20099 (72.16%) Loss: 2.165258 LR: 0.00001239 [15:19:52] Epoch: 1 Batch: 14505/20099 (72.17%) Loss: 2.323009 LR: 0.00001237 [15:19:54] Epoch: 1 Batch: 14506/20099 (72.17%) Loss: 2.243193 LR: 0.00001237 [15:19:56] Epoch: 1 Batch: 14507/20099 (72.18%) Loss: 1.835574 LR: 0.00001237 [15:19:57] Epoch: 1 Batch: 14508/20099 (72.18%) Loss: 1.756479 LR: 0.00001237 [15:19:59] Epoch: 1 Batch: 14509/20099 (72.19%) Loss: 1.705734 LR: 0.00001237 [15:20:01] Epoch: 1 Batch: 14510/20099 (72.19%) Loss: 2.187381 LR: 0.00001237 [15:20:03] Epoch: 1 Batch: 14511/20099 (72.20%) Loss: 2.145376 LR: 0.00001237 [15:20:05] Epoch: 1 Batch: 14512/20099 (72.20%) Loss: 1.882414 LR: 0.00001236 [15:20:07] Epoch: 1 Batch: 14513/20099 (72.21%) Loss: 2.215072 LR: 0.00001236 [15:20:09] Epoch: 1 Batch: 14514/20099 (72.21%) Loss: 1.993399 LR: 0.00001236 [15:20:10] Epoch: 1 Batch: 14515/20099 (72.22%) Loss: 1.847801 LR: 0.00001236 [15:20:12] Epoch: 1 Batch: 14516/20099 (72.22%) Loss: 2.180828 LR: 0.00001236 [15:20:14] Epoch: 1 Batch: 14517/20099 (72.23%) Loss: 2.125284 LR: 0.00001236 [15:20:16] Epoch: 1 Batch: 14518/20099 (72.23%) Loss: 2.132117 LR: 0.00001236 [15:20:18] Epoch: 1 Batch: 14519/20099 (72.24%) Loss: 2.158510 LR: 0.00001235 [15:20:20] Epoch: 1 Batch: 14520/20099 (72.24%) Loss: 2.063774 LR: 0.00001235 [15:20:22] Epoch: 1 Batch: 14521/20099 (72.25%) Loss: 2.124070 LR: 0.00001235 [15:20:24] Epoch: 1 Batch: 14522/20099 (72.25%) Loss: 2.018030 LR: 0.00001235 [15:20:25] Epoch: 1 Batch: 14523/20099 (72.26%) Loss: 2.184516 LR: 0.00001235 [15:20:27] Epoch: 1 Batch: 14524/20099 (72.26%) Loss: 2.082471 LR: 0.00001235 [15:20:29] Epoch: 1 Batch: 14525/20099 (72.27%) Loss: 2.160716 LR: 0.00001235 [15:20:31] Epoch: 1 Batch: 14526/20099 (72.27%) Loss: 2.116430 LR: 0.00001233 [15:20:33] Epoch: 1 Batch: 14527/20099 (72.28%) Loss: 1.868672 LR: 0.00001233 [15:20:35] Epoch: 1 Batch: 14528/20099 (72.28%) Loss: 1.933285 LR: 0.00001233 [15:20:37] Epoch: 1 Batch: 14529/20099 (72.29%) Loss: 2.187509 LR: 
0.00001233 [15:20:38] Epoch: 1 Batch: 14530/20099 (72.29%) Loss: 2.055129 LR: 0.00001233 [15:20:40] Epoch: 1 Batch: 14531/20099 (72.30%) Loss: 2.005318 LR: 0.00001233 [15:20:42] Epoch: 1 Batch: 14532/20099 (72.30%) Loss: 2.230756 LR: 0.00001233 [15:20:44] Epoch: 1 Batch: 14533/20099 (72.31%) Loss: 1.957039 LR: 0.00001232 [15:20:46] Epoch: 1 Batch: 14534/20099 (72.31%) Loss: 2.161413 LR: 0.00001232 [15:20:48] Epoch: 1 Batch: 14535/20099 (72.32%) Loss: 1.921613 LR: 0.00001232 [15:20:50] Epoch: 1 Batch: 14536/20099 (72.32%) Loss: 1.986036 LR: 0.00001232 [15:20:51] Epoch: 1 Batch: 14537/20099 (72.33%) Loss: 2.228891 LR: 0.00001232 [15:20:53] Epoch: 1 Batch: 14538/20099 (72.33%) Loss: 1.871427 LR: 0.00001232 [15:20:55] Epoch: 1 Batch: 14539/20099 (72.34%) Loss: 2.155831 LR: 0.00001232 [15:20:57] Epoch: 1 Batch: 14540/20099 (72.34%) Loss: 2.498863 LR: 0.00001230 [15:20:59] Epoch: 1 Batch: 14541/20099 (72.35%) Loss: 1.964395 LR: 0.00001230 [15:21:01] Epoch: 1 Batch: 14542/20099 (72.35%) Loss: 2.005301 LR: 0.00001230 [15:21:03] Epoch: 1 Batch: 14543/20099 (72.36%) Loss: 2.308398 LR: 0.00001230 [15:21:04] Epoch: 1 Batch: 14544/20099 (72.36%) Loss: 2.098806 LR: 0.00001230 [15:21:06] Epoch: 1 Batch: 14545/20099 (72.37%) Loss: 1.818506 LR: 0.00001230 [15:21:08] Epoch: 1 Batch: 14546/20099 (72.37%) Loss: 2.287256 LR: 0.00001230 [15:21:10] Epoch: 1 Batch: 14547/20099 (72.38%) Loss: 2.301476 LR: 0.00001229 [15:21:12] Epoch: 1 Batch: 14548/20099 (72.38%) Loss: 1.800382 LR: 0.00001229 [15:21:14] Epoch: 1 Batch: 14549/20099 (72.39%) Loss: 2.378016 LR: 0.00001229 [15:21:15] Epoch: 1 Batch: 14550/20099 (72.39%) Loss: 2.083975 LR: 0.00001229 [15:21:17] Epoch: 1 Batch: 14551/20099 (72.40%) Loss: 2.112379 LR: 0.00001229 [15:21:19] Epoch: 1 Batch: 14552/20099 (72.40%) Loss: 2.409041 LR: 0.00001229 [15:21:21] Epoch: 1 Batch: 14553/20099 (72.41%) Loss: 2.221600 LR: 0.00001229 [15:21:23] Epoch: 1 Batch: 14554/20099 (72.41%) Loss: 1.893570 LR: 0.00001227 [15:21:25] Epoch: 1 Batch: 14555/20099 
(72.42%) Loss: 2.271136 LR: 0.00001227 [15:21:27] Epoch: 1 Batch: 14556/20099 (72.42%) Loss: 2.220961 LR: 0.00001227 [15:21:28] Epoch: 1 Batch: 14557/20099 (72.43%) Loss: 2.177417 LR: 0.00001227 [15:21:30] Epoch: 1 Batch: 14558/20099 (72.43%) Loss: 2.383300 LR: 0.00001227 [15:21:32] Epoch: 1 Batch: 14559/20099 (72.44%) Loss: 2.106506 LR: 0.00001227 [15:21:34] Epoch: 1 Batch: 14560/20099 (72.44%) Loss: 2.212078 LR: 0.00001227 [15:21:36] Epoch: 1 Batch: 14561/20099 (72.45%) Loss: 2.100115 LR: 0.00001226 [15:21:38] Epoch: 1 Batch: 14562/20099 (72.45%) Loss: 2.168281 LR: 0.00001226 [15:21:40] Epoch: 1 Batch: 14563/20099 (72.46%) Loss: 2.038933 LR: 0.00001226 [15:21:41] Epoch: 1 Batch: 14564/20099 (72.46%) Loss: 2.212020 LR: 0.00001226 [15:21:43] Epoch: 1 Batch: 14565/20099 (72.47%) Loss: 1.901061 LR: 0.00001226 [15:21:45] Epoch: 1 Batch: 14566/20099 (72.47%) Loss: 2.051090 LR: 0.00001226 [15:21:47] Epoch: 1 Batch: 14567/20099 (72.48%) Loss: 2.046336 LR: 0.00001226 [15:21:49] Epoch: 1 Batch: 14568/20099 (72.48%) Loss: 2.211973 LR: 0.00001225 [15:21:51] Epoch: 1 Batch: 14569/20099 (72.49%) Loss: 2.412043 LR: 0.00001225 [15:21:53] Epoch: 1 Batch: 14570/20099 (72.49%) Loss: 1.976097 LR: 0.00001225 [15:21:54] Epoch: 1 Batch: 14571/20099 (72.50%) Loss: 1.640873 LR: 0.00001225 [15:21:56] Epoch: 1 Batch: 14572/20099 (72.50%) Loss: 1.940375 LR: 0.00001225 [15:21:58] Epoch: 1 Batch: 14573/20099 (72.51%) Loss: 1.804585 LR: 0.00001225 [15:22:00] Epoch: 1 Batch: 14574/20099 (72.51%) Loss: 1.994008 LR: 0.00001225 [15:22:02] Epoch: 1 Batch: 14575/20099 (72.52%) Loss: 2.144055 LR: 0.00001223 [15:22:04] Epoch: 1 Batch: 14576/20099 (72.52%) Loss: 1.995118 LR: 0.00001223 [15:22:06] Epoch: 1 Batch: 14577/20099 (72.53%) Loss: 2.275714 LR: 0.00001223 [15:22:07] Epoch: 1 Batch: 14578/20099 (72.53%) Loss: 1.918792 LR: 0.00001223 [15:22:09] Epoch: 1 Batch: 14579/20099 (72.54%) Loss: 2.120080 LR: 0.00001223 [15:22:11] Epoch: 1 Batch: 14580/20099 (72.54%) Loss: 1.992417 LR: 0.00001223 [15:22:13] 
Epoch: 1 Batch: 14581/20099 (72.55%) Loss: 2.131125 LR: 0.00001223 [15:22:15] Epoch: 1 Batch: 14582/20099 (72.55%) Loss: 2.312051 LR: 0.00001222 [15:22:17] Epoch: 1 Batch: 14583/20099 (72.56%) Loss: 2.062566 LR: 0.00001222 [15:22:19] Epoch: 1 Batch: 14584/20099 (72.56%) Loss: 1.993074 LR: 0.00001222 [15:22:20] Epoch: 1 Batch: 14585/20099 (72.57%) Loss: 2.236779 LR: 0.00001222 [15:22:22] Epoch: 1 Batch: 14586/20099 (72.57%) Loss: 2.443809 LR: 0.00001222 [15:22:24] Epoch: 1 Batch: 14587/20099 (72.58%) Loss: 2.076302 LR: 0.00001222 [15:22:26] Epoch: 1 Batch: 14588/20099 (72.58%) Loss: 2.030118 LR: 0.00001222 [15:22:28] Epoch: 1 Batch: 14589/20099 (72.59%) Loss: 1.984947 LR: 0.00001220 [15:22:30] Epoch: 1 Batch: 14590/20099 (72.59%) Loss: 2.191644 LR: 0.00001220 [15:22:32] Epoch: 1 Batch: 14591/20099 (72.60%) Loss: 2.212530 LR: 0.00001220 [15:22:33] Epoch: 1 Batch: 14592/20099 (72.60%) Loss: 2.199967 LR: 0.00001220 [15:22:35] Epoch: 1 Batch: 14593/20099 (72.61%) Loss: 2.130109 LR: 0.00001220 [15:22:37] Epoch: 1 Batch: 14594/20099 (72.61%) Loss: 1.889025 LR: 0.00001220 [15:22:39] Epoch: 1 Batch: 14595/20099 (72.62%) Loss: 2.228910 LR: 0.00001220 [15:22:41] Epoch: 1 Batch: 14596/20099 (72.62%) Loss: 2.041878 LR: 0.00001219 [15:22:43] Epoch: 1 Batch: 14597/20099 (72.63%) Loss: 1.977821 LR: 0.00001219 [15:22:45] Epoch: 1 Batch: 14598/20099 (72.63%) Loss: 2.112341 LR: 0.00001219 [15:22:46] Epoch: 1 Batch: 14599/20099 (72.64%) Loss: 2.205645 LR: 0.00001219 [15:22:52] >> Cleaned up old temp checkpoint: epoch1_step12600 [15:22:52] >> Temp checkpoint saved: epoch1_step14600, size: 0.1693 GB [15:22:52] Epoch: 1 Batch: 14600/20099 (72.64%) Loss: 1.845559 LR: 0.00001219 [15:22:54] Epoch: 1 Batch: 14601/20099 (72.65%) Loss: 2.057917 LR: 0.00001219 [15:22:56] Epoch: 1 Batch: 14602/20099 (72.65%) Loss: 2.533642 LR: 0.00001219 [15:22:57] Epoch: 1 Batch: 14603/20099 (72.66%) Loss: 2.032584 LR: 0.00001217 [15:22:59] Epoch: 1 Batch: 14604/20099 (72.66%) Loss: 2.102175 LR: 0.00001217 
[15:23:01] Epoch: 1 Batch: 14605/20099 (72.67%) Loss: 2.207043 LR: 0.00001217 [15:23:03] Epoch: 1 Batch: 14606/20099 (72.67%) Loss: 1.926958 LR: 0.00001217 [15:23:05] Epoch: 1 Batch: 14607/20099 (72.68%) Loss: 1.838336 LR: 0.00001217 [15:23:07] Epoch: 1 Batch: 14608/20099 (72.68%) Loss: 1.911971 LR: 0.00001217 [15:23:09] Epoch: 1 Batch: 14609/20099 (72.69%) Loss: 1.884308 LR: 0.00001217 [15:23:10] Epoch: 1 Batch: 14610/20099 (72.69%) Loss: 1.935245 LR: 0.00001216 [15:23:12] Epoch: 1 Batch: 14611/20099 (72.70%) Loss: 1.943741 LR: 0.00001216 [15:23:14] Epoch: 1 Batch: 14612/20099 (72.70%) Loss: 2.016637 LR: 0.00001216 [15:23:16] Epoch: 1 Batch: 14613/20099 (72.71%) Loss: 1.950716 LR: 0.00001216 [15:23:18] Epoch: 1 Batch: 14614/20099 (72.71%) Loss: 2.190831 LR: 0.00001216 [15:23:20] Epoch: 1 Batch: 14615/20099 (72.72%) Loss: 1.879681 LR: 0.00001216 [15:23:22] Epoch: 1 Batch: 14616/20099 (72.72%) Loss: 2.331791 LR: 0.00001216 [15:23:23] Epoch: 1 Batch: 14617/20099 (72.73%) Loss: 2.205222 LR: 0.00001215 [15:23:25] Epoch: 1 Batch: 14618/20099 (72.73%) Loss: 2.156131 LR: 0.00001215 [15:23:27] Epoch: 1 Batch: 14619/20099 (72.73%) Loss: 1.859800 LR: 0.00001215 [15:23:29] Epoch: 1 Batch: 14620/20099 (72.74%) Loss: 2.412878 LR: 0.00001215 [15:23:31] Epoch: 1 Batch: 14621/20099 (72.74%) Loss: 2.234488 LR: 0.00001215 [15:23:33] Epoch: 1 Batch: 14622/20099 (72.75%) Loss: 2.142592 LR: 0.00001215 [15:23:35] Epoch: 1 Batch: 14623/20099 (72.75%) Loss: 1.815431 LR: 0.00001215 [15:23:37] Epoch: 1 Batch: 14624/20099 (72.76%) Loss: 2.023240 LR: 0.00001213 [15:23:38] Epoch: 1 Batch: 14625/20099 (72.76%) Loss: 2.063927 LR: 0.00001213 [15:23:40] Epoch: 1 Batch: 14626/20099 (72.77%) Loss: 1.755138 LR: 0.00001213 [15:23:42] Epoch: 1 Batch: 14627/20099 (72.77%) Loss: 2.143165 LR: 0.00001213 [15:23:44] Epoch: 1 Batch: 14628/20099 (72.78%) Loss: 2.154861 LR: 0.00001213 [15:23:46] Epoch: 1 Batch: 14629/20099 (72.78%) Loss: 2.120452 LR: 0.00001213 [15:23:48] Epoch: 1 Batch: 14630/20099 (72.79%) 
Loss: 1.985712 LR: 0.00001213 [15:23:50] Epoch: 1 Batch: 14631/20099 (72.79%) Loss: 2.323375 LR: 0.00001212 [15:23:51] Epoch: 1 Batch: 14632/20099 (72.80%) Loss: 1.974238 LR: 0.00001212 [15:23:53] Epoch: 1 Batch: 14633/20099 (72.80%) Loss: 2.069423 LR: 0.00001212 [15:23:55] Epoch: 1 Batch: 14634/20099 (72.81%) Loss: 1.919914 LR: 0.00001212 [15:23:57] Epoch: 1 Batch: 14635/20099 (72.81%) Loss: 2.212505 LR: 0.00001212 [15:23:59] Epoch: 1 Batch: 14636/20099 (72.82%) Loss: 1.883612 LR: 0.00001212 [15:24:01] Epoch: 1 Batch: 14637/20099 (72.82%) Loss: 2.042296 LR: 0.00001212 [15:24:03] Epoch: 1 Batch: 14638/20099 (72.83%) Loss: 2.043594 LR: 0.00001210 [15:24:04] Epoch: 1 Batch: 14639/20099 (72.83%) Loss: 1.896118 LR: 0.00001210 [15:24:06] Epoch: 1 Batch: 14640/20099 (72.84%) Loss: 1.933711 LR: 0.00001210 [15:24:08] Epoch: 1 Batch: 14641/20099 (72.84%) Loss: 1.864973 LR: 0.00001210 [15:24:10] Epoch: 1 Batch: 14642/20099 (72.85%) Loss: 2.149244 LR: 0.00001210 [15:24:12] Epoch: 1 Batch: 14643/20099 (72.85%) Loss: 2.395778 LR: 0.00001210 [15:24:14] Epoch: 1 Batch: 14644/20099 (72.86%) Loss: 2.180894 LR: 0.00001210 [15:24:16] Epoch: 1 Batch: 14645/20099 (72.86%) Loss: 2.107377 LR: 0.00001209 [15:24:17] Epoch: 1 Batch: 14646/20099 (72.87%) Loss: 2.022335 LR: 0.00001209 [15:24:19] Epoch: 1 Batch: 14647/20099 (72.87%) Loss: 2.150871 LR: 0.00001209 [15:24:21] Epoch: 1 Batch: 14648/20099 (72.88%) Loss: 2.026461 LR: 0.00001209 [15:24:23] Epoch: 1 Batch: 14649/20099 (72.88%) Loss: 1.902867 LR: 0.00001209 [15:24:25] Epoch: 1 Batch: 14650/20099 (72.89%) Loss: 1.860460 LR: 0.00001209 [15:24:27] Epoch: 1 Batch: 14651/20099 (72.89%) Loss: 2.294273 LR: 0.00001209 [15:24:29] Epoch: 1 Batch: 14652/20099 (72.90%) Loss: 2.045142 LR: 0.00001208 [15:24:30] Epoch: 1 Batch: 14653/20099 (72.90%) Loss: 2.006219 LR: 0.00001208 [15:24:32] Epoch: 1 Batch: 14654/20099 (72.91%) Loss: 1.814377 LR: 0.00001208 [15:24:34] Epoch: 1 Batch: 14655/20099 (72.91%) Loss: 2.342150 LR: 0.00001208 [15:24:36] Epoch: 1 
Batch: 14656/20099 (72.92%) Loss: 2.009468 LR: 0.00001208 [15:24:38] Epoch: 1 Batch: 14657/20099 (72.92%) Loss: 2.218953 LR: 0.00001208 [15:24:40] Epoch: 1 Batch: 14658/20099 (72.93%) Loss: 2.021503 LR: 0.00001208 [15:24:42] Epoch: 1 Batch: 14659/20099 (72.93%) Loss: 2.121539 LR: 0.00001206 [15:24:43] Epoch: 1 Batch: 14660/20099 (72.94%) Loss: 1.995151 LR: 0.00001206 [15:24:45] Epoch: 1 Batch: 14661/20099 (72.94%) Loss: 1.991552 LR: 0.00001206 [15:24:47] Epoch: 1 Batch: 14662/20099 (72.95%) Loss: 2.206068 LR: 0.00001206 [15:24:49] Epoch: 1 Batch: 14663/20099 (72.95%) Loss: 2.110966 LR: 0.00001206 [15:24:51] Epoch: 1 Batch: 14664/20099 (72.96%) Loss: 2.055425 LR: 0.00001206 [15:24:53] Epoch: 1 Batch: 14665/20099 (72.96%) Loss: 2.027600 LR: 0.00001206 [15:24:55] Epoch: 1 Batch: 14666/20099 (72.97%) Loss: 2.510704 LR: 0.00001205 [15:24:56] Epoch: 1 Batch: 14667/20099 (72.97%) Loss: 2.203276 LR: 0.00001205 [15:24:58] Epoch: 1 Batch: 14668/20099 (72.98%) Loss: 2.153906 LR: 0.00001205 [15:25:00] Epoch: 1 Batch: 14669/20099 (72.98%) Loss: 2.305878 LR: 0.00001205 [15:25:02] Epoch: 1 Batch: 14670/20099 (72.99%) Loss: 2.250226 LR: 0.00001205 [15:25:04] Epoch: 1 Batch: 14671/20099 (72.99%) Loss: 2.112631 LR: 0.00001205 [15:25:06] Epoch: 1 Batch: 14672/20099 (73.00%) Loss: 2.270169 LR: 0.00001205 [15:25:08] Epoch: 1 Batch: 14673/20099 (73.00%) Loss: 2.122239 LR: 0.00001203 [15:25:09] Epoch: 1 Batch: 14674/20099 (73.01%) Loss: 2.188517 LR: 0.00001203 [15:25:11] Epoch: 1 Batch: 14675/20099 (73.01%) Loss: 2.006328 LR: 0.00001203 [15:25:13] Epoch: 1 Batch: 14676/20099 (73.02%) Loss: 2.356757 LR: 0.00001203 [15:25:15] Epoch: 1 Batch: 14677/20099 (73.02%) Loss: 2.269834 LR: 0.00001203 [15:25:17] Epoch: 1 Batch: 14678/20099 (73.03%) Loss: 2.044632 LR: 0.00001203 [15:25:19] Epoch: 1 Batch: 14679/20099 (73.03%) Loss: 2.291368 LR: 0.00001203 [15:25:20] Epoch: 1 Batch: 14680/20099 (73.04%) Loss: 2.192049 LR: 0.00001202 [15:25:22] Epoch: 1 Batch: 14681/20099 (73.04%) Loss: 2.262888 LR: 
0.00001202 [15:25:24] Epoch: 1 Batch: 14682/20099 (73.05%) Loss: 2.233484 LR: 0.00001202 [15:25:26] Epoch: 1 Batch: 14683/20099 (73.05%) Loss: 2.256237 LR: 0.00001202 [15:25:28] Epoch: 1 Batch: 14684/20099 (73.06%) Loss: 2.018841 LR: 0.00001202 [15:25:30] Epoch: 1 Batch: 14685/20099 (73.06%) Loss: 2.179704 LR: 0.00001202 [15:25:32] Epoch: 1 Batch: 14686/20099 (73.07%) Loss: 2.466531 LR: 0.00001202 [15:25:33] Epoch: 1 Batch: 14687/20099 (73.07%) Loss: 2.204104 LR: 0.00001200 [15:25:35] Epoch: 1 Batch: 14688/20099 (73.08%) Loss: 1.864255 LR: 0.00001200 [15:25:37] Epoch: 1 Batch: 14689/20099 (73.08%) Loss: 2.025358 LR: 0.00001200 [15:25:39] Epoch: 1 Batch: 14690/20099 (73.09%) Loss: 2.007831 LR: 0.00001200 [15:25:41] Epoch: 1 Batch: 14691/20099 (73.09%) Loss: 2.452326 LR: 0.00001200 [15:25:43] Epoch: 1 Batch: 14692/20099 (73.10%) Loss: 2.217162 LR: 0.00001200 [15:25:45] Epoch: 1 Batch: 14693/20099 (73.10%) Loss: 2.163694 LR: 0.00001200 [15:25:46] Epoch: 1 Batch: 14694/20099 (73.11%) Loss: 2.180666 LR: 0.00001199 [15:25:48] Epoch: 1 Batch: 14695/20099 (73.11%) Loss: 1.786565 LR: 0.00001199 [15:25:50] Epoch: 1 Batch: 14696/20099 (73.12%) Loss: 2.239279 LR: 0.00001199 [15:25:52] Epoch: 1 Batch: 14697/20099 (73.12%) Loss: 2.327783 LR: 0.00001199 [15:25:54] Epoch: 1 Batch: 14698/20099 (73.13%) Loss: 2.160182 LR: 0.00001199 [15:25:56] Epoch: 1 Batch: 14699/20099 (73.13%) Loss: 2.071859 LR: 0.00001199 [15:25:58] Epoch: 1 Batch: 14700/20099 (73.14%) Loss: 2.109748 LR: 0.00001199 [15:25:59] Epoch: 1 Batch: 14701/20099 (73.14%) Loss: 1.987413 LR: 0.00001198 [15:26:01] Epoch: 1 Batch: 14702/20099 (73.15%) Loss: 2.063632 LR: 0.00001198 [15:26:03] Epoch: 1 Batch: 14703/20099 (73.15%) Loss: 2.079232 LR: 0.00001198 [15:26:05] Epoch: 1 Batch: 14704/20099 (73.16%) Loss: 2.107628 LR: 0.00001198 [15:26:07] Epoch: 1 Batch: 14705/20099 (73.16%) Loss: 1.878389 LR: 0.00001198 [15:26:09] Epoch: 1 Batch: 14706/20099 (73.17%) Loss: 2.106992 LR: 0.00001198 [15:26:10] Epoch: 1 Batch: 14707/20099 
(73.17%) Loss: 1.831939 LR: 0.00001198 [15:26:12] Epoch: 1 Batch: 14708/20099 (73.18%) Loss: 2.020643 LR: 0.00001196 [15:26:14] Epoch: 1 Batch: 14709/20099 (73.18%) Loss: 1.899216 LR: 0.00001196 [15:26:16] Epoch: 1 Batch: 14710/20099 (73.19%) Loss: 1.819044 LR: 0.00001196 [15:26:18] Epoch: 1 Batch: 14711/20099 (73.19%) Loss: 2.131608 LR: 0.00001196 [15:26:20] Epoch: 1 Batch: 14712/20099 (73.20%) Loss: 2.186875 LR: 0.00001196 [15:26:22] Epoch: 1 Batch: 14713/20099 (73.20%) Loss: 1.993829 LR: 0.00001196 [15:26:23] Epoch: 1 Batch: 14714/20099 (73.21%) Loss: 1.985477 LR: 0.00001196 [15:26:25] Epoch: 1 Batch: 14715/20099 (73.21%) Loss: 2.242743 LR: 0.00001195 [15:26:27] Epoch: 1 Batch: 14716/20099 (73.22%) Loss: 2.094003 LR: 0.00001195 [15:26:29] Epoch: 1 Batch: 14717/20099 (73.22%) Loss: 2.027370 LR: 0.00001195 [15:26:31] Epoch: 1 Batch: 14718/20099 (73.23%) Loss: 2.048308 LR: 0.00001195 [15:26:33] Epoch: 1 Batch: 14719/20099 (73.23%) Loss: 2.288171 LR: 0.00001195 [15:26:35] Epoch: 1 Batch: 14720/20099 (73.24%) Loss: 1.997952 LR: 0.00001195 [15:26:36] Epoch: 1 Batch: 14721/20099 (73.24%) Loss: 1.976528 LR: 0.00001195 [15:26:38] Epoch: 1 Batch: 14722/20099 (73.25%) Loss: 2.072670 LR: 0.00001193 [15:26:40] Epoch: 1 Batch: 14723/20099 (73.25%) Loss: 1.886161 LR: 0.00001193 [15:26:42] Epoch: 1 Batch: 14724/20099 (73.26%) Loss: 2.258782 LR: 0.00001193 [15:26:44] Epoch: 1 Batch: 14725/20099 (73.26%) Loss: 2.041959 LR: 0.00001193 [15:26:46] Epoch: 1 Batch: 14726/20099 (73.27%) Loss: 2.078343 LR: 0.00001193 [15:26:47] Epoch: 1 Batch: 14727/20099 (73.27%) Loss: 2.028849 LR: 0.00001193 [15:26:49] Epoch: 1 Batch: 14728/20099 (73.28%) Loss: 2.250461 LR: 0.00001193 [15:26:51] Epoch: 1 Batch: 14729/20099 (73.28%) Loss: 1.819268 LR: 0.00001192 [15:26:53] Epoch: 1 Batch: 14730/20099 (73.29%) Loss: 2.272353 LR: 0.00001192 [15:26:55] Epoch: 1 Batch: 14731/20099 (73.29%) Loss: 1.910786 LR: 0.00001192 [15:26:57] Epoch: 1 Batch: 14732/20099 (73.30%) Loss: 2.077081 LR: 0.00001192 [15:26:59] 
Epoch: 1 Batch: 14733/20099 (73.30%) Loss: 2.061259 LR: 0.00001192 [15:27:00] Epoch: 1 Batch: 14734/20099 (73.31%) Loss: 1.848962 LR: 0.00001192 [15:27:02] Epoch: 1 Batch: 14735/20099 (73.31%) Loss: 2.058107 LR: 0.00001192 [15:27:04] Epoch: 1 Batch: 14736/20099 (73.32%) Loss: 2.471178 LR: 0.00001191 [15:27:06] Epoch: 1 Batch: 14737/20099 (73.32%) Loss: 2.064875 LR: 0.00001191 [15:27:08] Epoch: 1 Batch: 14738/20099 (73.33%) Loss: 2.044766 LR: 0.00001191 [15:27:10] Epoch: 1 Batch: 14739/20099 (73.33%) Loss: 2.070641 LR: 0.00001191 [15:27:12] Epoch: 1 Batch: 14740/20099 (73.34%) Loss: 1.806723 LR: 0.00001191 [15:27:13] Epoch: 1 Batch: 14741/20099 (73.34%) Loss: 2.153395 LR: 0.00001191 [15:27:15] Epoch: 1 Batch: 14742/20099 (73.35%) Loss: 2.079314 LR: 0.00001191 [15:27:17] Epoch: 1 Batch: 14743/20099 (73.35%) Loss: 2.326917 LR: 0.00001189 [15:27:19] Epoch: 1 Batch: 14744/20099 (73.36%) Loss: 2.130505 LR: 0.00001189 [15:27:21] Epoch: 1 Batch: 14745/20099 (73.36%) Loss: 2.326753 LR: 0.00001189 [15:27:23] Epoch: 1 Batch: 14746/20099 (73.37%) Loss: 2.183417 LR: 0.00001189 [15:27:25] Epoch: 1 Batch: 14747/20099 (73.37%) Loss: 1.856151 LR: 0.00001189 [15:27:26] Epoch: 1 Batch: 14748/20099 (73.38%) Loss: 2.055672 LR: 0.00001189 [15:27:28] Epoch: 1 Batch: 14749/20099 (73.38%) Loss: 2.316203 LR: 0.00001189 [15:27:30] Epoch: 1 Batch: 14750/20099 (73.39%) Loss: 2.248447 LR: 0.00001188 [15:27:32] Epoch: 1 Batch: 14751/20099 (73.39%) Loss: 1.879160 LR: 0.00001188 [15:27:34] Epoch: 1 Batch: 14752/20099 (73.40%) Loss: 2.146756 LR: 0.00001188 [15:27:36] Epoch: 1 Batch: 14753/20099 (73.40%) Loss: 2.407632 LR: 0.00001188 [15:27:38] Epoch: 1 Batch: 14754/20099 (73.41%) Loss: 2.247459 LR: 0.00001188 [15:27:39] Epoch: 1 Batch: 14755/20099 (73.41%) Loss: 2.083076 LR: 0.00001188 [15:27:41] Epoch: 1 Batch: 14756/20099 (73.42%) Loss: 2.252095 LR: 0.00001188 [15:27:43] Epoch: 1 Batch: 14757/20099 (73.42%) Loss: 2.056896 LR: 0.00001186 [15:27:45] Epoch: 1 Batch: 14758/20099 (73.43%) Loss: 
1.769943 LR: 0.00001186 [15:27:47] Epoch: 1 Batch: 14759/20099 (73.43%) Loss: 2.107672 LR: 0.00001186 [15:27:49] Epoch: 1 Batch: 14760/20099 (73.44%) Loss: 1.994403 LR: 0.00001186 [15:27:50] Epoch: 1 Batch: 14761/20099 (73.44%) Loss: 1.793188 LR: 0.00001186 [15:27:52] Epoch: 1 Batch: 14762/20099 (73.45%) Loss: 2.123430 LR: 0.00001186 [15:27:54] Epoch: 1 Batch: 14763/20099 (73.45%) Loss: 1.763446 LR: 0.00001186 [15:27:56] Epoch: 1 Batch: 14764/20099 (73.46%) Loss: 2.305965 LR: 0.00001185 [15:27:58] Epoch: 1 Batch: 14765/20099 (73.46%) Loss: 1.941462 LR: 0.00001185 [15:28:00] Epoch: 1 Batch: 14766/20099 (73.47%) Loss: 2.170876 LR: 0.00001185 [15:28:02] Epoch: 1 Batch: 14767/20099 (73.47%) Loss: 2.104540 LR: 0.00001185 [15:28:03] Epoch: 1 Batch: 14768/20099 (73.48%) Loss: 2.065100 LR: 0.00001185 [15:28:05] Epoch: 1 Batch: 14769/20099 (73.48%) Loss: 2.126123 LR: 0.00001185 [15:28:07] Epoch: 1 Batch: 14770/20099 (73.49%) Loss: 2.102377 LR: 0.00001185 [15:28:09] Epoch: 1 Batch: 14771/20099 (73.49%) Loss: 2.071370 LR: 0.00001184 [15:28:11] Epoch: 1 Batch: 14772/20099 (73.50%) Loss: 2.018467 LR: 0.00001184 [15:28:13] Epoch: 1 Batch: 14773/20099 (73.50%) Loss: 2.278516 LR: 0.00001184 [15:28:15] Epoch: 1 Batch: 14774/20099 (73.51%) Loss: 2.049923 LR: 0.00001184 [15:28:17] Epoch: 1 Batch: 14775/20099 (73.51%) Loss: 2.177368 LR: 0.00001184 [15:28:18] Epoch: 1 Batch: 14776/20099 (73.52%) Loss: 1.925876 LR: 0.00001184 [15:28:20] Epoch: 1 Batch: 14777/20099 (73.52%) Loss: 2.301172 LR: 0.00001184 [15:28:22] Epoch: 1 Batch: 14778/20099 (73.53%) Loss: 1.967881 LR: 0.00001182 [15:28:24] Epoch: 1 Batch: 14779/20099 (73.53%) Loss: 1.908280 LR: 0.00001182 [15:28:26] Epoch: 1 Batch: 14780/20099 (73.54%) Loss: 1.795962 LR: 0.00001182 [15:28:28] Epoch: 1 Batch: 14781/20099 (73.54%) Loss: 2.360852 LR: 0.00001182 [15:28:30] Epoch: 1 Batch: 14782/20099 (73.55%) Loss: 2.084901 LR: 0.00001182 [15:28:31] Epoch: 1 Batch: 14783/20099 (73.55%) Loss: 2.293394 LR: 0.00001182 [15:28:33] Epoch: 1 
Batch: 14784/20099 (73.56%) Loss: 2.015037 LR: 0.00001182 [15:28:35] Epoch: 1 Batch: 14785/20099 (73.56%) Loss: 1.998969 LR: 0.00001181 [15:28:37] Epoch: 1 Batch: 14786/20099 (73.57%) Loss: 1.892595 LR: 0.00001181 [15:28:39] Epoch: 1 Batch: 14787/20099 (73.57%) Loss: 2.071038 LR: 0.00001181 [15:28:41] Epoch: 1 Batch: 14788/20099 (73.58%) Loss: 2.064728 LR: 0.00001181 [15:28:42] Epoch: 1 Batch: 14789/20099 (73.58%) Loss: 2.148851 LR: 0.00001181 [15:28:44] Epoch: 1 Batch: 14790/20099 (73.59%) Loss: 2.087311 LR: 0.00001181 [15:28:46] Epoch: 1 Batch: 14791/20099 (73.59%) Loss: 2.271152 LR: 0.00001181 [15:28:48] Epoch: 1 Batch: 14792/20099 (73.60%) Loss: 2.218958 LR: 0.00001179 [15:28:50] Epoch: 1 Batch: 14793/20099 (73.60%) Loss: 2.100421 LR: 0.00001179 [15:28:52] Epoch: 1 Batch: 14794/20099 (73.61%) Loss: 2.152236 LR: 0.00001179 [15:28:54] Epoch: 1 Batch: 14795/20099 (73.61%) Loss: 2.186857 LR: 0.00001179 [15:28:55] Epoch: 1 Batch: 14796/20099 (73.62%) Loss: 1.746650 LR: 0.00001179 [15:28:57] Epoch: 1 Batch: 14797/20099 (73.62%) Loss: 2.412846 LR: 0.00001179 [15:28:59] Epoch: 1 Batch: 14798/20099 (73.63%) Loss: 2.076237 LR: 0.00001179 [15:29:01] Epoch: 1 Batch: 14799/20099 (73.63%) Loss: 1.773370 LR: 0.00001178 [15:29:06] >> Cleaned up old temp checkpoint: epoch1_step12800 [15:29:06] >> Temp checkpoint saved: epoch1_step14800, size: 0.1693 GB [15:29:06] Epoch: 1 Batch: 14800/20099 (73.64%) Loss: 2.057747 LR: 0.00001178 [15:29:08] Epoch: 1 Batch: 14801/20099 (73.64%) Loss: 1.785506 LR: 0.00001178 [15:29:10] Epoch: 1 Batch: 14802/20099 (73.65%) Loss: 2.156562 LR: 0.00001178 [15:29:12] Epoch: 1 Batch: 14803/20099 (73.65%) Loss: 2.045387 LR: 0.00001178 [15:29:14] Epoch: 1 Batch: 14804/20099 (73.66%) Loss: 2.207200 LR: 0.00001178 [15:29:16] Epoch: 1 Batch: 14805/20099 (73.66%) Loss: 2.113934 LR: 0.00001178 [15:29:17] Epoch: 1 Batch: 14806/20099 (73.67%) Loss: 2.321863 LR: 0.00001177 [15:29:19] Epoch: 1 Batch: 14807/20099 (73.67%) Loss: 1.982444 LR: 0.00001177 [15:29:21] 
Epoch: 1 Batch: 14808/20099 (73.68%) Loss: 2.057757 LR: 0.00001177 [15:29:23] Epoch: 1 Batch: 14809/20099 (73.68%) Loss: 2.235214 LR: 0.00001177 [15:29:25] Epoch: 1 Batch: 14810/20099 (73.69%) Loss: 1.831557 LR: 0.00001177 [15:29:27] Epoch: 1 Batch: 14811/20099 (73.69%) Loss: 2.398069 LR: 0.00001177 [15:29:29] Epoch: 1 Batch: 14812/20099 (73.70%) Loss: 2.089263 LR: 0.00001177 [15:29:30] Epoch: 1 Batch: 14813/20099 (73.70%) Loss: 1.942188 LR: 0.00001175 [15:29:32] Epoch: 1 Batch: 14814/20099 (73.71%) Loss: 1.837707 LR: 0.00001175 [15:29:34] Epoch: 1 Batch: 14815/20099 (73.71%) Loss: 1.964948 LR: 0.00001175 [15:29:36] Epoch: 1 Batch: 14816/20099 (73.72%) Loss: 1.883831 LR: 0.00001175 [15:29:38] Epoch: 1 Batch: 14817/20099 (73.72%) Loss: 2.600588 LR: 0.00001175 [15:29:40] Epoch: 1 Batch: 14818/20099 (73.73%) Loss: 1.884473 LR: 0.00001175 [15:29:42] Epoch: 1 Batch: 14819/20099 (73.73%) Loss: 2.001213 LR: 0.00001175 [15:29:44] Epoch: 1 Batch: 14820/20099 (73.74%) Loss: 1.869462 LR: 0.00001174 [15:29:45] Epoch: 1 Batch: 14821/20099 (73.74%) Loss: 2.137549 LR: 0.00001174 [15:29:47] Epoch: 1 Batch: 14822/20099 (73.74%) Loss: 2.148617 LR: 0.00001174 [15:29:49] Epoch: 1 Batch: 14823/20099 (73.75%) Loss: 1.938741 LR: 0.00001174 [15:29:51] Epoch: 1 Batch: 14824/20099 (73.75%) Loss: 1.853939 LR: 0.00001174 [15:29:53] Epoch: 1 Batch: 14825/20099 (73.76%) Loss: 2.227391 LR: 0.00001174 [15:29:55] Epoch: 1 Batch: 14826/20099 (73.76%) Loss: 2.249324 LR: 0.00001174 [15:29:57] Epoch: 1 Batch: 14827/20099 (73.77%) Loss: 2.202935 LR: 0.00001173 [15:29:58] Epoch: 1 Batch: 14828/20099 (73.77%) Loss: 2.003791 LR: 0.00001173 [15:30:00] Epoch: 1 Batch: 14829/20099 (73.78%) Loss: 1.994618 LR: 0.00001173 [15:30:02] Epoch: 1 Batch: 14830/20099 (73.78%) Loss: 2.145823 LR: 0.00001173 [15:30:04] Epoch: 1 Batch: 14831/20099 (73.79%) Loss: 2.304205 LR: 0.00001173 [15:30:06] Epoch: 1 Batch: 14832/20099 (73.79%) Loss: 2.108578 LR: 0.00001173 [15:30:08] Epoch: 1 Batch: 14833/20099 (73.80%) Loss: 
1.930388 LR: 0.00001173 [15:30:09] Epoch: 1 Batch: 14834/20099 (73.80%) Loss: 2.101443 LR: 0.00001171 [15:30:11] Epoch: 1 Batch: 14835/20099 (73.81%) Loss: 1.978124 LR: 0.00001171 [15:30:13] Epoch: 1 Batch: 14836/20099 (73.81%) Loss: 2.106428 LR: 0.00001171 [15:30:15] Epoch: 1 Batch: 14837/20099 (73.82%) Loss: 2.214680 LR: 0.00001171 [15:30:17] Epoch: 1 Batch: 14838/20099 (73.82%) Loss: 2.139306 LR: 0.00001171 [15:30:19] Epoch: 1 Batch: 14839/20099 (73.83%) Loss: 2.192270 LR: 0.00001171 [15:30:21] Epoch: 1 Batch: 14840/20099 (73.83%) Loss: 1.920345 LR: 0.00001171 [15:30:22] Epoch: 1 Batch: 14841/20099 (73.84%) Loss: 2.089080 LR: 0.00001170 [15:30:24] Epoch: 1 Batch: 14842/20099 (73.84%) Loss: 2.127104 LR: 0.00001170 [15:30:26] Epoch: 1 Batch: 14843/20099 (73.85%) Loss: 2.033286 LR: 0.00001170 [15:30:28] Epoch: 1 Batch: 14844/20099 (73.85%) Loss: 2.155840 LR: 0.00001170 [15:30:30] Epoch: 1 Batch: 14845/20099 (73.86%) Loss: 2.099457 LR: 0.00001170 [15:30:32] Epoch: 1 Batch: 14846/20099 (73.86%) Loss: 1.791582 LR: 0.00001170 [15:30:34] Epoch: 1 Batch: 14847/20099 (73.87%) Loss: 1.853220 LR: 0.00001170 [15:30:35] Epoch: 1 Batch: 14848/20099 (73.87%) Loss: 2.273280 LR: 0.00001168 [15:30:37] Epoch: 1 Batch: 14849/20099 (73.88%) Loss: 2.051242 LR: 0.00001168 [15:30:39] Epoch: 1 Batch: 14850/20099 (73.88%) Loss: 1.827104 LR: 0.00001168 [15:30:41] Epoch: 1 Batch: 14851/20099 (73.89%) Loss: 2.169297 LR: 0.00001168 [15:30:43] Epoch: 1 Batch: 14852/20099 (73.89%) Loss: 1.912151 LR: 0.00001168 [15:30:45] Epoch: 1 Batch: 14853/20099 (73.90%) Loss: 1.831099 LR: 0.00001168 [15:30:46] Epoch: 1 Batch: 14854/20099 (73.90%) Loss: 2.297000 LR: 0.00001168 [15:30:48] Epoch: 1 Batch: 14855/20099 (73.91%) Loss: 2.252624 LR: 0.00001167 [15:30:50] Epoch: 1 Batch: 14856/20099 (73.91%) Loss: 1.939414 LR: 0.00001167 [15:30:52] Epoch: 1 Batch: 14857/20099 (73.92%) Loss: 2.165844 LR: 0.00001167 [15:30:54] Epoch: 1 Batch: 14858/20099 (73.92%) Loss: 2.094969 LR: 0.00001167 [15:30:56] Epoch: 1 
Batch: 14859/20099 (73.93%) Loss: 2.158749 LR: 0.00001167 [15:30:58] Epoch: 1 Batch: 14860/20099 (73.93%) Loss: 1.891902 LR: 0.00001167 [15:30:59] Epoch: 1 Batch: 14861/20099 (73.94%) Loss: 2.246606 LR: 0.00001167 [15:31:01] Epoch: 1 Batch: 14862/20099 (73.94%) Loss: 2.117472 LR: 0.00001166 [15:31:03] Epoch: 1 Batch: 14863/20099 (73.95%) Loss: 1.879394 LR: 0.00001166 [15:31:05] Epoch: 1 Batch: 14864/20099 (73.95%) Loss: 2.058429 LR: 0.00001166 [15:31:07] Epoch: 1 Batch: 14865/20099 (73.96%) Loss: 2.135745 LR: 0.00001166 [15:31:09] Epoch: 1 Batch: 14866/20099 (73.96%) Loss: 2.000299 LR: 0.00001166 [15:31:11] Epoch: 1 Batch: 14867/20099 (73.97%) Loss: 2.216433 LR: 0.00001166 [15:31:12] Epoch: 1 Batch: 14868/20099 (73.97%) Loss: 1.979194 LR: 0.00001166 [15:31:14] Epoch: 1 Batch: 14869/20099 (73.98%) Loss: 1.961370 LR: 0.00001164 [15:31:16] Epoch: 1 Batch: 14870/20099 (73.98%) Loss: 2.199549 LR: 0.00001164 [15:31:18] Epoch: 1 Batch: 14871/20099 (73.99%) Loss: 1.989530 LR: 0.00001164 [15:31:20] Epoch: 1 Batch: 14872/20099 (73.99%) Loss: 2.185439 LR: 0.00001164 [15:31:22] Epoch: 1 Batch: 14873/20099 (74.00%) Loss: 2.027701 LR: 0.00001164 [15:31:23] Epoch: 1 Batch: 14874/20099 (74.00%) Loss: 2.135622 LR: 0.00001164 [15:31:25] Epoch: 1 Batch: 14875/20099 (74.01%) Loss: 2.107593 LR: 0.00001164 [15:31:27] Epoch: 1 Batch: 14876/20099 (74.01%) Loss: 2.260426 LR: 0.00001163 [15:31:29] Epoch: 1 Batch: 14877/20099 (74.02%) Loss: 2.194685 LR: 0.00001163 [15:31:31] Epoch: 1 Batch: 14878/20099 (74.02%) Loss: 1.847812 LR: 0.00001163 [15:31:33] Epoch: 1 Batch: 14879/20099 (74.03%) Loss: 2.132424 LR: 0.00001163 [15:31:35] Epoch: 1 Batch: 14880/20099 (74.03%) Loss: 2.380367 LR: 0.00001163 [15:31:36] Epoch: 1 Batch: 14881/20099 (74.04%) Loss: 2.272047 LR: 0.00001163 [15:31:38] Epoch: 1 Batch: 14882/20099 (74.04%) Loss: 2.066972 LR: 0.00001163 [15:31:40] Epoch: 1 Batch: 14883/20099 (74.05%) Loss: 2.379069 LR: 0.00001162 [15:31:42] Epoch: 1 Batch: 14884/20099 (74.05%) Loss: 1.938607 LR: 
0.00001162 [15:31:44] Epoch: 1 Batch: 14885/20099 (74.06%) Loss: 2.059341 LR: 0.00001162 [15:31:46] Epoch: 1 Batch: 14886/20099 (74.06%) Loss: 2.521427 LR: 0.00001162 [15:31:48] Epoch: 1 Batch: 14887/20099 (74.07%) Loss: 2.238933 LR: 0.00001162 [15:31:50] Epoch: 1 Batch: 14888/20099 (74.07%) Loss: 2.149816 LR: 0.00001162 [15:31:51] Epoch: 1 Batch: 14889/20099 (74.08%) Loss: 1.967731 LR: 0.00001162 [15:31:53] Epoch: 1 Batch: 14890/20099 (74.08%) Loss: 2.126666 LR: 0.00001160 [15:31:55] Epoch: 1 Batch: 14891/20099 (74.09%) Loss: 2.220259 LR: 0.00001160 [15:31:57] Epoch: 1 Batch: 14892/20099 (74.09%) Loss: 1.747729 LR: 0.00001160 [15:31:59] Epoch: 1 Batch: 14893/20099 (74.10%) Loss: 2.100382 LR: 0.00001160 [15:32:01] Epoch: 1 Batch: 14894/20099 (74.10%) Loss: 2.461234 LR: 0.00001160 [15:32:02] Epoch: 1 Batch: 14895/20099 (74.11%) Loss: 2.361595 LR: 0.00001160 [15:32:04] Epoch: 1 Batch: 14896/20099 (74.11%) Loss: 2.251818 LR: 0.00001160 [15:32:06] Epoch: 1 Batch: 14897/20099 (74.12%) Loss: 2.013610 LR: 0.00001159 [15:32:08] Epoch: 1 Batch: 14898/20099 (74.12%) Loss: 1.991952 LR: 0.00001159 [15:32:10] Epoch: 1 Batch: 14899/20099 (74.13%) Loss: 1.918209 LR: 0.00001159 [15:32:12] Epoch: 1 Batch: 14900/20099 (74.13%) Loss: 1.890265 LR: 0.00001159 [15:32:14] Epoch: 1 Batch: 14901/20099 (74.14%) Loss: 1.871738 LR: 0.00001159 [15:32:15] Epoch: 1 Batch: 14902/20099 (74.14%) Loss: 1.873860 LR: 0.00001159 [15:32:17] Epoch: 1 Batch: 14903/20099 (74.15%) Loss: 1.887889 LR: 0.00001159 [15:32:19] Epoch: 1 Batch: 14904/20099 (74.15%) Loss: 2.057255 LR: 0.00001157 [15:32:21] Epoch: 1 Batch: 14905/20099 (74.16%) Loss: 2.113943 LR: 0.00001157 [15:32:23] Epoch: 1 Batch: 14906/20099 (74.16%) Loss: 1.885328 LR: 0.00001157 [15:32:25] Epoch: 1 Batch: 14907/20099 (74.17%) Loss: 1.972797 LR: 0.00001157 [15:32:26] Epoch: 1 Batch: 14908/20099 (74.17%) Loss: 2.294512 LR: 0.00001157 [15:32:28] Epoch: 1 Batch: 14909/20099 (74.18%) Loss: 2.053163 LR: 0.00001157 [15:32:30] Epoch: 1 Batch: 14910/20099 
(74.18%) Loss: 1.957083 LR: 0.00001157 [15:32:32] Epoch: 1 Batch: 14911/20099 (74.19%) Loss: 1.956885 LR: 0.00001156 [15:32:34] Epoch: 1 Batch: 14912/20099 (74.19%) Loss: 1.599414 LR: 0.00001156 [15:32:36] Epoch: 1 Batch: 14913/20099 (74.20%) Loss: 1.876906 LR: 0.00001156 [15:32:38] Epoch: 1 Batch: 14914/20099 (74.20%) Loss: 2.217363 LR: 0.00001156 [15:32:39] Epoch: 1 Batch: 14915/20099 (74.21%) Loss: 2.208473 LR: 0.00001156 [15:32:41] Epoch: 1 Batch: 14916/20099 (74.21%) Loss: 1.772821 LR: 0.00001156 [15:32:43] Epoch: 1 Batch: 14917/20099 (74.22%) Loss: 1.953118 LR: 0.00001156 [15:32:45] Epoch: 1 Batch: 14918/20099 (74.22%) Loss: 2.026433 LR: 0.00001155 [15:32:47] Epoch: 1 Batch: 14919/20099 (74.23%) Loss: 2.237639 LR: 0.00001155 [15:32:49] Epoch: 1 Batch: 14920/20099 (74.23%) Loss: 1.734126 LR: 0.00001155 [15:32:51] Epoch: 1 Batch: 14921/20099 (74.24%) Loss: 1.669109 LR: 0.00001155 [15:32:52] Epoch: 1 Batch: 14922/20099 (74.24%) Loss: 2.436272 LR: 0.00001155 [15:32:54] Epoch: 1 Batch: 14923/20099 (74.25%) Loss: 2.062576 LR: 0.00001155 [15:32:56] Epoch: 1 Batch: 14924/20099 (74.25%) Loss: 2.037621 LR: 0.00001155 [15:32:58] Epoch: 1 Batch: 14925/20099 (74.26%) Loss: 2.063207 LR: 0.00001153 [15:33:00] Epoch: 1 Batch: 14926/20099 (74.26%) Loss: 2.181958 LR: 0.00001153 [15:33:02] Epoch: 1 Batch: 14927/20099 (74.27%) Loss: 2.065014 LR: 0.00001153 [15:33:04] Epoch: 1 Batch: 14928/20099 (74.27%) Loss: 2.331077 LR: 0.00001153 [15:33:05] Epoch: 1 Batch: 14929/20099 (74.28%) Loss: 2.108982 LR: 0.00001153 [15:33:07] Epoch: 1 Batch: 14930/20099 (74.28%) Loss: 2.152534 LR: 0.00001153 [15:33:09] Epoch: 1 Batch: 14931/20099 (74.29%) Loss: 2.086803 LR: 0.00001153 [15:33:11] Epoch: 1 Batch: 14932/20099 (74.29%) Loss: 2.058345 LR: 0.00001152 [15:33:13] Epoch: 1 Batch: 14933/20099 (74.30%) Loss: 1.886432 LR: 0.00001152 [15:33:15] Epoch: 1 Batch: 14934/20099 (74.30%) Loss: 2.044586 LR: 0.00001152 [15:33:16] Epoch: 1 Batch: 14935/20099 (74.31%) Loss: 2.283975 LR: 0.00001152 [15:33:18] 
Epoch: 1 Batch: 14936/20099 (74.31%) Loss: 2.255137 LR: 0.00001152 [15:33:20] Epoch: 1 Batch: 14937/20099 (74.32%) Loss: 2.152160 LR: 0.00001152 [15:33:22] Epoch: 1 Batch: 14938/20099 (74.32%) Loss: 2.033334 LR: 0.00001152 [15:33:24] Epoch: 1 Batch: 14939/20099 (74.33%) Loss: 2.029494 LR: 0.00001151 [15:33:26] Epoch: 1 Batch: 14940/20099 (74.33%) Loss: 2.414512 LR: 0.00001151 [15:33:28] Epoch: 1 Batch: 14941/20099 (74.34%) Loss: 1.913621 LR: 0.00001151 [15:33:30] Epoch: 1 Batch: 14942/20099 (74.34%) Loss: 2.063816 LR: 0.00001151 [15:33:32] Epoch: 1 Batch: 14943/20099 (74.35%) Loss: 2.190058 LR: 0.00001151 [15:33:33] Epoch: 1 Batch: 14944/20099 (74.35%) Loss: 2.190831 LR: 0.00001151 [15:33:35] Epoch: 1 Batch: 14945/20099 (74.36%) Loss: 2.061677 LR: 0.00001151 [15:33:37] Epoch: 1 Batch: 14946/20099 (74.36%) Loss: 1.805814 LR: 0.00001149 [15:33:39] Epoch: 1 Batch: 14947/20099 (74.37%) Loss: 2.225127 LR: 0.00001149 [15:33:41] Epoch: 1 Batch: 14948/20099 (74.37%) Loss: 1.942101 LR: 0.00001149 [15:33:43] Epoch: 1 Batch: 14949/20099 (74.38%) Loss: 2.080892 LR: 0.00001149 [15:33:44] Epoch: 1 Batch: 14950/20099 (74.38%) Loss: 2.065711 LR: 0.00001149 [15:33:46] Epoch: 1 Batch: 14951/20099 (74.39%) Loss: 1.685774 LR: 0.00001149 [15:33:48] Epoch: 1 Batch: 14952/20099 (74.39%) Loss: 1.968872 LR: 0.00001149 [15:33:50] Epoch: 1 Batch: 14953/20099 (74.40%) Loss: 1.959617 LR: 0.00001148 [15:33:52] Epoch: 1 Batch: 14954/20099 (74.40%) Loss: 1.658932 LR: 0.00001148 [15:33:54] Epoch: 1 Batch: 14955/20099 (74.41%) Loss: 2.344990 LR: 0.00001148 [15:33:56] Epoch: 1 Batch: 14956/20099 (74.41%) Loss: 2.049994 LR: 0.00001148 [15:33:57] Epoch: 1 Batch: 14957/20099 (74.42%) Loss: 1.879896 LR: 0.00001148 [15:33:59] Epoch: 1 Batch: 14958/20099 (74.42%) Loss: 2.146871 LR: 0.00001148 [15:34:01] Epoch: 1 Batch: 14959/20099 (74.43%) Loss: 2.046450 LR: 0.00001148 [15:34:03] Epoch: 1 Batch: 14960/20099 (74.43%) Loss: 2.006572 LR: 0.00001146 [15:34:05] Epoch: 1 Batch: 14961/20099 (74.44%) Loss: 
1.916632 LR: 0.00001146 [15:34:07] Epoch: 1 Batch: 14962/20099 (74.44%) Loss: 1.815775 LR: 0.00001146 [15:34:09] Epoch: 1 Batch: 14963/20099 (74.45%) Loss: 2.026780 LR: 0.00001146 [15:34:10] Epoch: 1 Batch: 14964/20099 (74.45%) Loss: 2.146287 LR: 0.00001146 [15:34:12] Epoch: 1 Batch: 14965/20099 (74.46%) Loss: 1.947296 LR: 0.00001146 [15:34:14] Epoch: 1 Batch: 14966/20099 (74.46%) Loss: 1.876985 LR: 0.00001146 [15:34:16] Epoch: 1 Batch: 14967/20099 (74.47%) Loss: 2.160566 LR: 0.00001145 [15:34:18] Epoch: 1 Batch: 14968/20099 (74.47%) Loss: 2.108978 LR: 0.00001145 [15:34:20] Epoch: 1 Batch: 14969/20099 (74.48%) Loss: 2.083555 LR: 0.00001145 [15:34:22] Epoch: 1 Batch: 14970/20099 (74.48%) Loss: 2.002642 LR: 0.00001145 [15:34:24] Epoch: 1 Batch: 14971/20099 (74.49%) Loss: 1.803746 LR: 0.00001145 [15:34:25] Epoch: 1 Batch: 14972/20099 (74.49%) Loss: 2.032679 LR: 0.00001145 [15:34:27] Epoch: 1 Batch: 14973/20099 (74.50%) Loss: 2.148899 LR: 0.00001145 [15:34:29] Epoch: 1 Batch: 14974/20099 (74.50%) Loss: 1.702875 LR: 0.00001144 [15:34:31] Epoch: 1 Batch: 14975/20099 (74.51%) Loss: 2.112844 LR: 0.00001144 [15:34:33] Epoch: 1 Batch: 14976/20099 (74.51%) Loss: 2.228926 LR: 0.00001144 [15:34:35] Epoch: 1 Batch: 14977/20099 (74.52%) Loss: 2.226591 LR: 0.00001144 [15:34:37] Epoch: 1 Batch: 14978/20099 (74.52%) Loss: 2.350809 LR: 0.00001144 [15:34:38] Epoch: 1 Batch: 14979/20099 (74.53%) Loss: 2.133690 LR: 0.00001144 [15:34:40] Epoch: 1 Batch: 14980/20099 (74.53%) Loss: 2.304178 LR: 0.00001144 [15:34:42] Epoch: 1 Batch: 14981/20099 (74.54%) Loss: 2.182967 LR: 0.00001142 [15:34:44] Epoch: 1 Batch: 14982/20099 (74.54%) Loss: 1.884519 LR: 0.00001142 [15:34:46] Epoch: 1 Batch: 14983/20099 (74.55%) Loss: 2.032850 LR: 0.00001142 [15:34:48] Epoch: 1 Batch: 14984/20099 (74.55%) Loss: 2.089105 LR: 0.00001142 [15:34:50] Epoch: 1 Batch: 14985/20099 (74.56%) Loss: 2.290029 LR: 0.00001142 [15:34:51] Epoch: 1 Batch: 14986/20099 (74.56%) Loss: 2.208405 LR: 0.00001142 [15:34:53] Epoch: 1 
Batch: 14987/20099 (74.57%) Loss: 2.318029 LR: 0.00001142 [15:34:55] Epoch: 1 Batch: 14988/20099 (74.57%) Loss: 2.018976 LR: 0.00001141 [15:34:57] Epoch: 1 Batch: 14989/20099 (74.58%) Loss: 2.410890 LR: 0.00001141 [15:34:59] Epoch: 1 Batch: 14990/20099 (74.58%) Loss: 1.792704 LR: 0.00001141 [15:35:01] Epoch: 1 Batch: 14991/20099 (74.59%) Loss: 2.074270 LR: 0.00001141 [15:35:02] Epoch: 1 Batch: 14992/20099 (74.59%) Loss: 1.945010 LR: 0.00001141 [15:35:04] Epoch: 1 Batch: 14993/20099 (74.60%) Loss: 2.284334 LR: 0.00001141 [15:35:06] Epoch: 1 Batch: 14994/20099 (74.60%) Loss: 1.616927 LR: 0.00001141 [15:35:08] Epoch: 1 Batch: 14995/20099 (74.61%) Loss: 2.180976 LR: 0.00001140 [15:35:10] Epoch: 1 Batch: 14996/20099 (74.61%) Loss: 1.950357 LR: 0.00001140 [15:35:12] Epoch: 1 Batch: 14997/20099 (74.62%) Loss: 2.245338 LR: 0.00001140 [15:35:14] Epoch: 1 Batch: 14998/20099 (74.62%) Loss: 2.255525 LR: 0.00001140 [15:35:15] Epoch: 1 Batch: 14999/20099 (74.63%) Loss: 2.211165 LR: 0.00001140 [15:35:17] >> Evaluating batch 0 [15:35:18] >> Evaluating batch 1 [15:35:19] >> Evaluating batch 2 [15:35:21] >> Evaluating batch 3 [15:35:22] >> Evaluating batch 4 [15:35:23] >> Evaluating batch 5 [15:35:24] >> Evaluating batch 6 [15:35:25] >> Evaluating batch 7 [15:35:26] >> Evaluating batch 8 [15:35:27] >> Evaluating batch 9 [15:35:28] >> Evaluating batch 10 [15:35:29] >> Evaluating batch 11 [15:35:30] >> Evaluating batch 12 [15:35:31] >> Evaluating batch 13 [15:35:32] >> Evaluating batch 14 [15:35:33] >> Evaluating batch 15 [15:35:34] >> Evaluating batch 16 [15:35:35] Epoch: 1 Step: 15000/20099 Evaluation: [15:35:35] Avg Loss Since Last Eval: 2.0849 Val Loss: 2.1531 Validation loss delta: 0.0021 Perplexity: 8.6118 LR: 0.00001140 [15:35:38] >> Cleaned up old temp checkpoint: epoch1_step13000 [15:35:38] >> Temp checkpoint saved: epoch1_step15000, size: 0.1693 GB [15:35:42] >> Checkpoint saved: epoch1_step15000, size: 0.1693 GB [15:35:42] Epoch: 1 Batch: 15000/20099 (74.63%) Loss: 
2.399714 LR: 0.00001140 [15:35:43] Epoch: 1 Batch: 15001/20099 (74.64%) Loss: 2.268160 LR: 0.00001140 [15:35:45] Epoch: 1 Batch: 15002/20099 (74.64%) Loss: 2.052238 LR: 0.00001138 [15:35:47] Epoch: 1 Batch: 15003/20099 (74.65%) Loss: 2.290272 LR: 0.00001138 [15:35:49] Epoch: 1 Batch: 15004/20099 (74.65%) Loss: 2.058256 LR: 0.00001138 [15:35:51] Epoch: 1 Batch: 15005/20099 (74.66%) Loss: 2.200400 LR: 0.00001138 [15:35:53] Epoch: 1 Batch: 15006/20099 (74.66%) Loss: 2.171251 LR: 0.00001138 [15:35:54] Epoch: 1 Batch: 15007/20099 (74.67%) Loss: 2.048496 LR: 0.00001138 [15:35:56] Epoch: 1 Batch: 15008/20099 (74.67%) Loss: 1.979047 LR: 0.00001138 [15:35:58] Epoch: 1 Batch: 15009/20099 (74.68%) Loss: 2.077617 LR: 0.00001137 [15:36:00] Epoch: 1 Batch: 15010/20099 (74.68%) Loss: 1.943466 LR: 0.00001137 [15:36:02] Epoch: 1 Batch: 15011/20099 (74.69%) Loss: 2.078683 LR: 0.00001137 [15:36:04] Epoch: 1 Batch: 15012/20099 (74.69%) Loss: 2.207379 LR: 0.00001137 [15:36:06] Epoch: 1 Batch: 15013/20099 (74.70%) Loss: 2.096154 LR: 0.00001137 [15:36:08] Epoch: 1 Batch: 15014/20099 (74.70%) Loss: 1.865391 LR: 0.00001137 [15:36:10] Epoch: 1 Batch: 15015/20099 (74.71%) Loss: 2.009006 LR: 0.00001137 [15:36:12] Epoch: 1 Batch: 15016/20099 (74.71%) Loss: 1.933682 LR: 0.00001136 [15:36:14] Epoch: 1 Batch: 15017/20099 (74.72%) Loss: 1.910530 LR: 0.00001136 [15:36:15] Epoch: 1 Batch: 15018/20099 (74.72%) Loss: 2.183299 LR: 0.00001136 [15:36:17] Epoch: 1 Batch: 15019/20099 (74.73%) Loss: 2.156774 LR: 0.00001136 [15:36:19] Epoch: 1 Batch: 15020/20099 (74.73%) Loss: 1.996060 LR: 0.00001136 [15:36:21] Epoch: 1 Batch: 15021/20099 (74.74%) Loss: 2.121195 LR: 0.00001136 [15:36:23] Epoch: 1 Batch: 15022/20099 (74.74%) Loss: 2.121837 LR: 0.00001136 [15:36:25] Epoch: 1 Batch: 15023/20099 (74.75%) Loss: 2.208791 LR: 0.00001134 [15:36:27] Epoch: 1 Batch: 15024/20099 (74.75%) Loss: 2.012165 LR: 0.00001134 [15:36:28] Epoch: 1 Batch: 15025/20099 (74.75%) Loss: 2.246577 LR: 0.00001134 [15:36:30] Epoch: 1 
Batch: 15026/20099 (74.76%) Loss: 2.053092 LR: 0.00001134 [15:36:32] Epoch: 1 Batch: 15027/20099 (74.76%) Loss: 2.093612 LR: 0.00001134 [15:36:34] Epoch: 1 Batch: 15028/20099 (74.77%) Loss: 2.285298 LR: 0.00001134 [15:36:36] Epoch: 1 Batch: 15029/20099 (74.77%) Loss: 2.219091 LR: 0.00001134 [15:36:38] Epoch: 1 Batch: 15030/20099 (74.78%) Loss: 2.310145 LR: 0.00001133 [15:36:40] Epoch: 1 Batch: 15031/20099 (74.78%) Loss: 1.908726 LR: 0.00001133 [15:36:41] Epoch: 1 Batch: 15032/20099 (74.79%) Loss: 2.225934 LR: 0.00001133 [15:36:43] Epoch: 1 Batch: 15033/20099 (74.79%) Loss: 2.205931 LR: 0.00001133 [15:36:45] Epoch: 1 Batch: 15034/20099 (74.80%) Loss: 2.026833 LR: 0.00001133 [15:36:47] Epoch: 1 Batch: 15035/20099 (74.80%) Loss: 2.098872 LR: 0.00001133 [15:36:49] Epoch: 1 Batch: 15036/20099 (74.81%) Loss: 1.963253 LR: 0.00001133 [15:36:51] Epoch: 1 Batch: 15037/20099 (74.81%) Loss: 1.951711 LR: 0.00001132 [15:36:52] Epoch: 1 Batch: 15038/20099 (74.82%) Loss: 1.882307 LR: 0.00001132 [15:36:54] Epoch: 1 Batch: 15039/20099 (74.82%) Loss: 2.301706 LR: 0.00001132 [15:36:56] Epoch: 1 Batch: 15040/20099 (74.83%) Loss: 2.209752 LR: 0.00001132 [15:36:58] Epoch: 1 Batch: 15041/20099 (74.83%) Loss: 1.963155 LR: 0.00001132 [15:37:00] Epoch: 1 Batch: 15042/20099 (74.84%) Loss: 2.436749 LR: 0.00001132 [15:37:02] Epoch: 1 Batch: 15043/20099 (74.84%) Loss: 2.036539 LR: 0.00001132 [15:37:04] Epoch: 1 Batch: 15044/20099 (74.85%) Loss: 2.107283 LR: 0.00001130 [15:37:05] Epoch: 1 Batch: 15045/20099 (74.85%) Loss: 2.154724 LR: 0.00001130 [15:37:07] Epoch: 1 Batch: 15046/20099 (74.86%) Loss: 1.883241 LR: 0.00001130 [15:37:09] Epoch: 1 Batch: 15047/20099 (74.86%) Loss: 2.225297 LR: 0.00001130 [15:37:11] Epoch: 1 Batch: 15048/20099 (74.87%) Loss: 2.056154 LR: 0.00001130 [15:37:13] Epoch: 1 Batch: 15049/20099 (74.87%) Loss: 1.888702 LR: 0.00001130 [15:37:15] Epoch: 1 Batch: 15050/20099 (74.88%) Loss: 2.342314 LR: 0.00001130 [15:37:17] Epoch: 1 Batch: 15051/20099 (74.88%) Loss: 2.300777 LR: 
0.00001129 [15:37:18] Epoch: 1 Batch: 15052/20099 (74.89%) Loss: 2.206802 LR: 0.00001129 [15:37:20] Epoch: 1 Batch: 15053/20099 (74.89%) Loss: 1.960949 LR: 0.00001129 [15:37:22] Epoch: 1 Batch: 15054/20099 (74.90%) Loss: 2.306780 LR: 0.00001129 [15:37:24] Epoch: 1 Batch: 15055/20099 (74.90%) Loss: 2.071290 LR: 0.00001129 [15:37:26] Epoch: 1 Batch: 15056/20099 (74.91%) Loss: 1.726404 LR: 0.00001129 [15:37:28] Epoch: 1 Batch: 15057/20099 (74.91%) Loss: 2.131804 LR: 0.00001129 [15:37:30] Epoch: 1 Batch: 15058/20099 (74.92%) Loss: 2.316452 LR: 0.00001128 [15:37:31] Epoch: 1 Batch: 15059/20099 (74.92%) Loss: 2.140406 LR: 0.00001128 [15:37:33] Epoch: 1 Batch: 15060/20099 (74.93%) Loss: 2.020577 LR: 0.00001128 [15:37:35] Epoch: 1 Batch: 15061/20099 (74.93%) Loss: 2.142670 LR: 0.00001128 [15:37:37] Epoch: 1 Batch: 15062/20099 (74.94%) Loss: 2.216624 LR: 0.00001128 [15:37:39] Epoch: 1 Batch: 15063/20099 (74.94%) Loss: 1.961823 LR: 0.00001128 [15:37:41] Epoch: 1 Batch: 15064/20099 (74.95%) Loss: 2.321778 LR: 0.00001128 [15:37:43] Epoch: 1 Batch: 15065/20099 (74.95%) Loss: 2.136083 LR: 0.00001126 [15:37:44] Epoch: 1 Batch: 15066/20099 (74.96%) Loss: 1.960380 LR: 0.00001126 [15:37:46] Epoch: 1 Batch: 15067/20099 (74.96%) Loss: 2.602595 LR: 0.00001126 [15:37:48] Epoch: 1 Batch: 15068/20099 (74.97%) Loss: 2.209375 LR: 0.00001126 [15:37:50] Epoch: 1 Batch: 15069/20099 (74.97%) Loss: 1.821879 LR: 0.00001126 [15:37:52] Epoch: 1 Batch: 15070/20099 (74.98%) Loss: 1.969225 LR: 0.00001126 [15:37:54] Epoch: 1 Batch: 15071/20099 (74.98%) Loss: 2.187813 LR: 0.00001126 [15:37:56] Epoch: 1 Batch: 15072/20099 (74.99%) Loss: 2.255141 LR: 0.00001125 [15:37:57] Epoch: 1 Batch: 15073/20099 (74.99%) Loss: 1.948574 LR: 0.00001125 [15:37:59] Epoch: 1 Batch: 15074/20099 (75.00%) Loss: 2.046085 LR: 0.00001125 [15:38:01] Epoch: 1 Batch: 15075/20099 (75.00%) Loss: 2.074902 LR: 0.00001125 [15:38:03] Epoch: 1 Batch: 15076/20099 (75.01%) Loss: 2.158692 LR: 0.00001125 [15:38:05] Epoch: 1 Batch: 15077/20099 
(75.01%) Loss: 1.937281 LR: 0.00001125 [15:38:07] Epoch: 1 Batch: 15078/20099 (75.02%) Loss: 2.248474 LR: 0.00001125 [15:38:09] Epoch: 1 Batch: 15079/20099 (75.02%) Loss: 2.176357 LR: 0.00001123 [15:38:10] Epoch: 1 Batch: 15080/20099 (75.03%) Loss: 1.968749 LR: 0.00001123 [15:38:12] Epoch: 1 Batch: 15081/20099 (75.03%) Loss: 2.186950 LR: 0.00001123 [15:38:14] Epoch: 1 Batch: 15082/20099 (75.04%) Loss: 2.097704 LR: 0.00001123 [15:38:16] Epoch: 1 Batch: 15083/20099 (75.04%) Loss: 1.596262 LR: 0.00001123 [15:38:18] Epoch: 1 Batch: 15084/20099 (75.05%) Loss: 2.188342 LR: 0.00001123 [15:38:20] Epoch: 1 Batch: 15085/20099 (75.05%) Loss: 2.295798 LR: 0.00001123 [15:38:22] Epoch: 1 Batch: 15086/20099 (75.06%) Loss: 1.961109 LR: 0.00001122 [15:38:23] Epoch: 1 Batch: 15087/20099 (75.06%) Loss: 1.931898 LR: 0.00001122 [15:38:25] Epoch: 1 Batch: 15088/20099 (75.07%) Loss: 2.135632 LR: 0.00001122 [15:38:27] Epoch: 1 Batch: 15089/20099 (75.07%) Loss: 2.054744 LR: 0.00001122 [15:38:29] Epoch: 1 Batch: 15090/20099 (75.08%) Loss: 2.379980 LR: 0.00001122 [15:38:31] Epoch: 1 Batch: 15091/20099 (75.08%) Loss: 2.355986 LR: 0.00001122 [15:38:33] Epoch: 1 Batch: 15092/20099 (75.09%) Loss: 2.110699 LR: 0.00001122 [15:38:35] Epoch: 1 Batch: 15093/20099 (75.09%) Loss: 1.870952 LR: 0.00001121 [15:38:36] Epoch: 1 Batch: 15094/20099 (75.10%) Loss: 1.910584 LR: 0.00001121 [15:38:38] Epoch: 1 Batch: 15095/20099 (75.10%) Loss: 2.638815 LR: 0.00001121 [15:38:40] Epoch: 1 Batch: 15096/20099 (75.11%) Loss: 1.994974 LR: 0.00001121 [15:38:42] Epoch: 1 Batch: 15097/20099 (75.11%) Loss: 2.007473 LR: 0.00001121 [15:38:44] Epoch: 1 Batch: 15098/20099 (75.12%) Loss: 2.239409 LR: 0.00001121 [15:38:46] Epoch: 1 Batch: 15099/20099 (75.12%) Loss: 2.126755 LR: 0.00001121 [15:38:48] Epoch: 1 Batch: 15100/20099 (75.13%) Loss: 1.957310 LR: 0.00001119 [15:38:49] Epoch: 1 Batch: 15101/20099 (75.13%) Loss: 1.907703 LR: 0.00001119 [15:38:51] Epoch: 1 Batch: 15102/20099 (75.14%) Loss: 2.063211 LR: 0.00001119 [15:38:53] 
Epoch: 1 Batch: 15103/20099 (75.14%) Loss: 2.293962 LR: 0.00001119 [15:38:55] Epoch: 1 Batch: 15104/20099 (75.15%) Loss: 2.286539 LR: 0.00001119 [15:38:57] Epoch: 1 Batch: 15105/20099 (75.15%) Loss: 1.864180 LR: 0.00001119 [15:38:59] Epoch: 1 Batch: 15106/20099 (75.16%) Loss: 1.766424 LR: 0.00001119 [15:39:00] Epoch: 1 Batch: 15107/20099 (75.16%) Loss: 2.404510 LR: 0.00001118 [15:39:02] Epoch: 1 Batch: 15108/20099 (75.17%) Loss: 2.285655 LR: 0.00001118 [15:39:04] Epoch: 1 Batch: 15109/20099 (75.17%) Loss: 1.667720 LR: 0.00001118 [15:39:06] Epoch: 1 Batch: 15110/20099 (75.18%) Loss: 1.895698 LR: 0.00001118 [15:39:08] Epoch: 1 Batch: 15111/20099 (75.18%) Loss: 2.287520 LR: 0.00001118 [15:39:10] Epoch: 1 Batch: 15112/20099 (75.19%) Loss: 2.028039 LR: 0.00001118 [15:39:12] Epoch: 1 Batch: 15113/20099 (75.19%) Loss: 2.111310 LR: 0.00001118 [15:39:13] Epoch: 1 Batch: 15114/20099 (75.20%) Loss: 2.129694 LR: 0.00001117 [15:39:15] Epoch: 1 Batch: 15115/20099 (75.20%) Loss: 2.164134 LR: 0.00001117 [15:39:17] Epoch: 1 Batch: 15116/20099 (75.21%) Loss: 2.100262 LR: 0.00001117 [15:39:19] Epoch: 1 Batch: 15117/20099 (75.21%) Loss: 2.176819 LR: 0.00001117 [15:39:21] Epoch: 1 Batch: 15118/20099 (75.22%) Loss: 2.366236 LR: 0.00001117 [15:39:23] Epoch: 1 Batch: 15119/20099 (75.22%) Loss: 2.155540 LR: 0.00001117 [15:39:25] Epoch: 1 Batch: 15120/20099 (75.23%) Loss: 1.939144 LR: 0.00001117 [15:39:26] Epoch: 1 Batch: 15121/20099 (75.23%) Loss: 2.072552 LR: 0.00001115 [15:39:28] Epoch: 1 Batch: 15122/20099 (75.24%) Loss: 2.101533 LR: 0.00001115 [15:39:30] Epoch: 1 Batch: 15123/20099 (75.24%) Loss: 2.016602 LR: 0.00001115 [15:39:32] Epoch: 1 Batch: 15124/20099 (75.25%) Loss: 2.112617 LR: 0.00001115 [15:39:34] Epoch: 1 Batch: 15125/20099 (75.25%) Loss: 2.362927 LR: 0.00001115 [15:39:36] Epoch: 1 Batch: 15126/20099 (75.26%) Loss: 2.051871 LR: 0.00001115 [15:39:38] Epoch: 1 Batch: 15127/20099 (75.26%) Loss: 2.078336 LR: 0.00001115 [15:39:39] Epoch: 1 Batch: 15128/20099 (75.27%) Loss: 
2.379160 LR: 0.00001114 [15:39:41] Epoch: 1 Batch: 15129/20099 (75.27%) Loss: 2.372590 LR: 0.00001114 [15:39:43] Epoch: 1 Batch: 15130/20099 (75.28%) Loss: 2.012220 LR: 0.00001114 [15:39:45] Epoch: 1 Batch: 15131/20099 (75.28%) Loss: 2.287977 LR: 0.00001114 [15:39:47] Epoch: 1 Batch: 15132/20099 (75.29%) Loss: 2.316362 LR: 0.00001114 [15:39:49] Epoch: 1 Batch: 15133/20099 (75.29%) Loss: 2.515020 LR: 0.00001114 [15:39:51] Epoch: 1 Batch: 15134/20099 (75.30%) Loss: 2.013502 LR: 0.00001114 [15:39:52] Epoch: 1 Batch: 15135/20099 (75.30%) Loss: 2.194878 LR: 0.00001113 [15:39:54] Epoch: 1 Batch: 15136/20099 (75.31%) Loss: 2.347776 LR: 0.00001113 [15:39:56] Epoch: 1 Batch: 15137/20099 (75.31%) Loss: 2.251693 LR: 0.00001113 [15:39:58] Epoch: 1 Batch: 15138/20099 (75.32%) Loss: 2.229572 LR: 0.00001113 [15:40:00] Epoch: 1 Batch: 15139/20099 (75.32%) Loss: 2.148108 LR: 0.00001113 [15:40:02] Epoch: 1 Batch: 15140/20099 (75.33%) Loss: 1.839326 LR: 0.00001113 [15:40:04] Epoch: 1 Batch: 15141/20099 (75.33%) Loss: 1.935631 LR: 0.00001113 [15:40:05] Epoch: 1 Batch: 15142/20099 (75.34%) Loss: 2.234335 LR: 0.00001111 [15:40:07] Epoch: 1 Batch: 15143/20099 (75.34%) Loss: 2.447953 LR: 0.00001111 [15:40:09] Epoch: 1 Batch: 15144/20099 (75.35%) Loss: 2.291926 LR: 0.00001111 [15:40:11] Epoch: 1 Batch: 15145/20099 (75.35%) Loss: 2.160417 LR: 0.00001111 [15:40:13] Epoch: 1 Batch: 15146/20099 (75.36%) Loss: 2.205726 LR: 0.00001111 [15:40:15] Epoch: 1 Batch: 15147/20099 (75.36%) Loss: 2.252620 LR: 0.00001111 [15:40:17] Epoch: 1 Batch: 15148/20099 (75.37%) Loss: 2.209413 LR: 0.00001111 [15:40:18] Epoch: 1 Batch: 15149/20099 (75.37%) Loss: 2.293726 LR: 0.00001110 [15:40:20] Epoch: 1 Batch: 15150/20099 (75.38%) Loss: 1.894569 LR: 0.00001110 [15:40:22] Epoch: 1 Batch: 15151/20099 (75.38%) Loss: 2.105061 LR: 0.00001110 [15:40:24] Epoch: 1 Batch: 15152/20099 (75.39%) Loss: 2.204687 LR: 0.00001110 [15:40:26] Epoch: 1 Batch: 15153/20099 (75.39%) Loss: 1.987600 LR: 0.00001110 [15:40:28] Epoch: 1 
Batch: 15154/20099 (75.40%) Loss: 2.045967 LR: 0.00001110 [15:40:30] Epoch: 1 Batch: 15155/20099 (75.40%) Loss: 2.612744 LR: 0.00001110 [15:40:31] Epoch: 1 Batch: 15156/20099 (75.41%) Loss: 2.007379 LR: 0.00001109 [15:40:33] Epoch: 1 Batch: 15157/20099 (75.41%) Loss: 1.999689 LR: 0.00001109 [15:40:35] Epoch: 1 Batch: 15158/20099 (75.42%) Loss: 2.134153 LR: 0.00001109 [15:40:37] Epoch: 1 Batch: 15159/20099 (75.42%) Loss: 2.215169 LR: 0.00001109 [15:40:39] Epoch: 1 Batch: 15160/20099 (75.43%) Loss: 2.095916 LR: 0.00001109 [15:40:41] Epoch: 1 Batch: 15161/20099 (75.43%) Loss: 2.381615 LR: 0.00001109 [15:40:43] Epoch: 1 Batch: 15162/20099 (75.44%) Loss: 2.090457 LR: 0.00001109 [15:40:44] Epoch: 1 Batch: 15163/20099 (75.44%) Loss: 2.434157 LR: 0.00001107 [15:40:46] Epoch: 1 Batch: 15164/20099 (75.45%) Loss: 2.006435 LR: 0.00001107 [15:40:48] Epoch: 1 Batch: 15165/20099 (75.45%) Loss: 2.003097 LR: 0.00001107 [15:40:50] Epoch: 1 Batch: 15166/20099 (75.46%) Loss: 2.111379 LR: 0.00001107 [15:40:52] Epoch: 1 Batch: 15167/20099 (75.46%) Loss: 2.273332 LR: 0.00001107 [15:40:54] Epoch: 1 Batch: 15168/20099 (75.47%) Loss: 1.987785 LR: 0.00001107 [15:40:56] Epoch: 1 Batch: 15169/20099 (75.47%) Loss: 2.169081 LR: 0.00001107 [15:40:57] Epoch: 1 Batch: 15170/20099 (75.48%) Loss: 2.141636 LR: 0.00001106 [15:40:59] Epoch: 1 Batch: 15171/20099 (75.48%) Loss: 2.186105 LR: 0.00001106 [15:41:01] Epoch: 1 Batch: 15172/20099 (75.49%) Loss: 1.819408 LR: 0.00001106 [15:41:03] Epoch: 1 Batch: 15173/20099 (75.49%) Loss: 2.144169 LR: 0.00001106 [15:41:05] Epoch: 1 Batch: 15174/20099 (75.50%) Loss: 2.090658 LR: 0.00001106 [15:41:07] Epoch: 1 Batch: 15175/20099 (75.50%) Loss: 2.374142 LR: 0.00001106 [15:41:09] Epoch: 1 Batch: 15176/20099 (75.51%) Loss: 2.120179 LR: 0.00001106 [15:41:10] Epoch: 1 Batch: 15177/20099 (75.51%) Loss: 2.191149 LR: 0.00001105 [15:41:12] Epoch: 1 Batch: 15178/20099 (75.52%) Loss: 2.171837 LR: 0.00001105 [15:41:14] Epoch: 1 Batch: 15179/20099 (75.52%) Loss: 2.180700 LR: 
0.00001105 [15:41:16] Epoch: 1 Batch: 15180/20099 (75.53%) Loss: 1.980951 LR: 0.00001105 [15:41:18] Epoch: 1 Batch: 15181/20099 (75.53%) Loss: 2.029532 LR: 0.00001105 [15:41:20] Epoch: 1 Batch: 15182/20099 (75.54%) Loss: 2.200535 LR: 0.00001105 [15:41:22] Epoch: 1 Batch: 15183/20099 (75.54%) Loss: 2.114560 LR: 0.00001105 [15:41:23] Epoch: 1 Batch: 15184/20099 (75.55%) Loss: 2.073947 LR: 0.00001103 [15:41:25] Epoch: 1 Batch: 15185/20099 (75.55%) Loss: 2.123386 LR: 0.00001103 [15:41:27] Epoch: 1 Batch: 15186/20099 (75.56%) Loss: 2.229039 LR: 0.00001103 [15:41:29] Epoch: 1 Batch: 15187/20099 (75.56%) Loss: 2.150718 LR: 0.00001103 [15:41:31] Epoch: 1 Batch: 15188/20099 (75.57%) Loss: 2.167921 LR: 0.00001103 [15:41:33] Epoch: 1 Batch: 15189/20099 (75.57%) Loss: 2.082790 LR: 0.00001103 [15:41:35] Epoch: 1 Batch: 15190/20099 (75.58%) Loss: 1.978480 LR: 0.00001103 [15:41:36] Epoch: 1 Batch: 15191/20099 (75.58%) Loss: 2.058733 LR: 0.00001102 [15:41:38] Epoch: 1 Batch: 15192/20099 (75.59%) Loss: 2.272103 LR: 0.00001102 [15:41:40] Epoch: 1 Batch: 15193/20099 (75.59%) Loss: 2.259006 LR: 0.00001102 [15:41:42] Epoch: 1 Batch: 15194/20099 (75.60%) Loss: 2.281334 LR: 0.00001102 [15:41:44] Epoch: 1 Batch: 15195/20099 (75.60%) Loss: 2.152300 LR: 0.00001102 [15:41:46] Epoch: 1 Batch: 15196/20099 (75.61%) Loss: 2.045691 LR: 0.00001102 [15:41:48] Epoch: 1 Batch: 15197/20099 (75.61%) Loss: 2.313630 LR: 0.00001102 [15:41:49] Epoch: 1 Batch: 15198/20099 (75.62%) Loss: 2.111108 LR: 0.00001101 [15:41:51] Epoch: 1 Batch: 15199/20099 (75.62%) Loss: 1.731439 LR: 0.00001101 [15:41:57] >> Cleaned up old temp checkpoint: epoch1_step13200 [15:41:57] >> Temp checkpoint saved: epoch1_step15200, size: 0.1693 GB [15:41:57] Epoch: 1 Batch: 15200/20099 (75.63%) Loss: 2.114590 LR: 0.00001101 [15:41:59] Epoch: 1 Batch: 15201/20099 (75.63%) Loss: 2.403763 LR: 0.00001101 [15:42:00] Epoch: 1 Batch: 15202/20099 (75.64%) Loss: 2.000744 LR: 0.00001101 [15:42:02] Epoch: 1 Batch: 15203/20099 (75.64%) Loss: 
1.870393 LR: 0.00001101 [15:42:04] Epoch: 1 Batch: 15204/20099 (75.65%) Loss: 2.191642 LR: 0.00001101 [15:42:06] Epoch: 1 Batch: 15205/20099 (75.65%) Loss: 2.082978 LR: 0.00001100 [15:42:08] Epoch: 1 Batch: 15206/20099 (75.66%) Loss: 2.329272 LR: 0.00001100 [15:42:10] Epoch: 1 Batch: 15207/20099 (75.66%) Loss: 2.268017 LR: 0.00001100 [15:42:11] Epoch: 1 Batch: 15208/20099 (75.67%) Loss: 2.089405 LR: 0.00001100 [15:42:13] Epoch: 1 Batch: 15209/20099 (75.67%) Loss: 2.299042 LR: 0.00001100 [15:42:15] Epoch: 1 Batch: 15210/20099 (75.68%) Loss: 1.971114 LR: 0.00001100 [15:42:17] Epoch: 1 Batch: 15211/20099 (75.68%) Loss: 2.050303 LR: 0.00001100 [15:42:19] Epoch: 1 Batch: 15212/20099 (75.69%) Loss: 1.794920 LR: 0.00001098 [15:42:21] Epoch: 1 Batch: 15213/20099 (75.69%) Loss: 2.208009 LR: 0.00001098 [15:42:23] Epoch: 1 Batch: 15214/20099 (75.70%) Loss: 2.036736 LR: 0.00001098 [15:42:24] Epoch: 1 Batch: 15215/20099 (75.70%) Loss: 1.895869 LR: 0.00001098 [15:42:26] Epoch: 1 Batch: 15216/20099 (75.71%) Loss: 2.079580 LR: 0.00001098 [15:42:28] Epoch: 1 Batch: 15217/20099 (75.71%) Loss: 2.061784 LR: 0.00001098 [15:42:30] Epoch: 1 Batch: 15218/20099 (75.72%) Loss: 2.211788 LR: 0.00001098 [15:42:32] Epoch: 1 Batch: 15219/20099 (75.72%) Loss: 2.178365 LR: 0.00001097 [15:42:34] Epoch: 1 Batch: 15220/20099 (75.73%) Loss: 2.153729 LR: 0.00001097 [15:42:36] Epoch: 1 Batch: 15221/20099 (75.73%) Loss: 1.927995 LR: 0.00001097 [15:42:38] Epoch: 1 Batch: 15222/20099 (75.74%) Loss: 1.944123 LR: 0.00001097 [15:42:39] Epoch: 1 Batch: 15223/20099 (75.74%) Loss: 1.749714 LR: 0.00001097 [15:42:41] Epoch: 1 Batch: 15224/20099 (75.75%) Loss: 1.868667 LR: 0.00001097 [15:42:43] Epoch: 1 Batch: 15225/20099 (75.75%) Loss: 2.048191 LR: 0.00001097 [15:42:45] Epoch: 1 Batch: 15226/20099 (75.76%) Loss: 1.875780 LR: 0.00001096 [15:42:47] Epoch: 1 Batch: 15227/20099 (75.76%) Loss: 2.339383 LR: 0.00001096 [15:42:49] Epoch: 1 Batch: 15228/20099 (75.76%) Loss: 2.075808 LR: 0.00001096 [15:42:51] Epoch: 1 
Batch: 15229/20099 (75.77%) Loss: 2.072983 LR: 0.00001096 [15:42:52] Epoch: 1 Batch: 15230/20099 (75.77%) Loss: 2.109290 LR: 0.00001096 [15:42:54] Epoch: 1 Batch: 15231/20099 (75.78%) Loss: 2.354695 LR: 0.00001096 [15:42:56] Epoch: 1 Batch: 15232/20099 (75.78%) Loss: 1.915908 LR: 0.00001096 [15:42:58] Epoch: 1 Batch: 15233/20099 (75.79%) Loss: 2.061069 LR: 0.00001094 [15:43:00] Epoch: 1 Batch: 15234/20099 (75.79%) Loss: 2.223853 LR: 0.00001094 [15:43:02] Epoch: 1 Batch: 15235/20099 (75.80%) Loss: 2.138276 LR: 0.00001094 [15:43:04] Epoch: 1 Batch: 15236/20099 (75.80%) Loss: 2.059806 LR: 0.00001094 [15:43:05] Epoch: 1 Batch: 15237/20099 (75.81%) Loss: 2.079848 LR: 0.00001094 [15:43:07] Epoch: 1 Batch: 15238/20099 (75.81%) Loss: 1.966143 LR: 0.00001094 [15:43:09] Epoch: 1 Batch: 15239/20099 (75.82%) Loss: 2.286017 LR: 0.00001094 [15:43:11] Epoch: 1 Batch: 15240/20099 (75.82%) Loss: 1.834256 LR: 0.00001093 [15:43:13] Epoch: 1 Batch: 15241/20099 (75.83%) Loss: 2.136604 LR: 0.00001093 [15:43:15] Epoch: 1 Batch: 15242/20099 (75.83%) Loss: 2.097176 LR: 0.00001093 [15:43:16] Epoch: 1 Batch: 15243/20099 (75.84%) Loss: 2.202451 LR: 0.00001093 [15:43:18] Epoch: 1 Batch: 15244/20099 (75.84%) Loss: 2.220344 LR: 0.00001093 [15:43:20] Epoch: 1 Batch: 15245/20099 (75.85%) Loss: 1.805936 LR: 0.00001093 [15:43:22] Epoch: 1 Batch: 15246/20099 (75.85%) Loss: 2.165530 LR: 0.00001093 [15:43:24] Epoch: 1 Batch: 15247/20099 (75.86%) Loss: 1.965137 LR: 0.00001092 [15:43:26] Epoch: 1 Batch: 15248/20099 (75.86%) Loss: 1.925427 LR: 0.00001092 [15:43:27] Epoch: 1 Batch: 15249/20099 (75.87%) Loss: 2.194438 LR: 0.00001092 [15:43:29] Epoch: 1 Batch: 15250/20099 (75.87%) Loss: 2.178713 LR: 0.00001092 [15:43:31] Epoch: 1 Batch: 15251/20099 (75.88%) Loss: 2.013268 LR: 0.00001092 [15:43:33] Epoch: 1 Batch: 15252/20099 (75.88%) Loss: 2.151851 LR: 0.00001092 [15:43:35] Epoch: 1 Batch: 15253/20099 (75.89%) Loss: 2.062140 LR: 0.00001092 [15:43:37] Epoch: 1 Batch: 15254/20099 (75.89%) Loss: 2.106021 LR: 
0.00001090 [15:43:39] Epoch: 1 Batch: 15255/20099 (75.90%) Loss: 2.180043 LR: 0.00001090 [15:43:40] Epoch: 1 Batch: 15256/20099 (75.90%) Loss: 1.914078 LR: 0.00001090 [15:43:42] Epoch: 1 Batch: 15257/20099 (75.91%) Loss: 2.175543 LR: 0.00001090 [15:43:44] Epoch: 1 Batch: 15258/20099 (75.91%) Loss: 1.924527 LR: 0.00001090 [15:43:46] Epoch: 1 Batch: 15259/20099 (75.92%) Loss: 2.287408 LR: 0.00001090 [15:43:48] Epoch: 1 Batch: 15260/20099 (75.92%) Loss: 2.005952 LR: 0.00001090 [15:43:50] Epoch: 1 Batch: 15261/20099 (75.93%) Loss: 1.861841 LR: 0.00001089 [15:43:52] Epoch: 1 Batch: 15262/20099 (75.93%) Loss: 2.221993 LR: 0.00001089 [15:43:53] Epoch: 1 Batch: 15263/20099 (75.94%) Loss: 2.232285 LR: 0.00001089 [15:43:55] Epoch: 1 Batch: 15264/20099 (75.94%) Loss: 2.102482 LR: 0.00001089 [15:43:57] Epoch: 1 Batch: 15265/20099 (75.95%) Loss: 2.280500 LR: 0.00001089 [15:43:59] Epoch: 1 Batch: 15266/20099 (75.95%) Loss: 2.012672 LR: 0.00001089 [15:44:01] Epoch: 1 Batch: 15267/20099 (75.96%) Loss: 1.807153 LR: 0.00001089 [15:44:03] Epoch: 1 Batch: 15268/20099 (75.96%) Loss: 2.244059 LR: 0.00001088 [15:44:05] Epoch: 1 Batch: 15269/20099 (75.97%) Loss: 2.238489 LR: 0.00001088 [15:44:06] Epoch: 1 Batch: 15270/20099 (75.97%) Loss: 2.039292 LR: 0.00001088 [15:44:08] Epoch: 1 Batch: 15271/20099 (75.98%) Loss: 2.141555 LR: 0.00001088 [15:44:10] Epoch: 1 Batch: 15272/20099 (75.98%) Loss: 2.332460 LR: 0.00001088 [15:44:12] Epoch: 1 Batch: 15273/20099 (75.99%) Loss: 2.246798 LR: 0.00001088 [15:44:14] Epoch: 1 Batch: 15274/20099 (75.99%) Loss: 2.157036 LR: 0.00001088 [15:44:16] Epoch: 1 Batch: 15275/20099 (76.00%) Loss: 2.055009 LR: 0.00001086 [15:44:18] Epoch: 1 Batch: 15276/20099 (76.00%) Loss: 2.000980 LR: 0.00001086 [15:44:20] Epoch: 1 Batch: 15277/20099 (76.01%) Loss: 2.221381 LR: 0.00001086 [15:44:21] Epoch: 1 Batch: 15278/20099 (76.01%) Loss: 1.990639 LR: 0.00001086 [15:44:23] Epoch: 1 Batch: 15279/20099 (76.02%) Loss: 1.966292 LR: 0.00001086 [15:44:25] Epoch: 1 Batch: 15280/20099 
(76.02%) Loss: 2.002801 LR: 0.00001086 [15:44:27] Epoch: 1 Batch: 15281/20099 (76.03%) Loss: 1.885616 LR: 0.00001086 [15:44:29] Epoch: 1 Batch: 15282/20099 (76.03%) Loss: 2.103451 LR: 0.00001085 [15:44:31] Epoch: 1 Batch: 15283/20099 (76.04%) Loss: 1.893052 LR: 0.00001085 [15:44:33] Epoch: 1 Batch: 15284/20099 (76.04%) Loss: 2.122659 LR: 0.00001085 [15:44:34] Epoch: 1 Batch: 15285/20099 (76.05%) Loss: 1.978359 LR: 0.00001085 [15:44:36] Epoch: 1 Batch: 15286/20099 (76.05%) Loss: 1.936689 LR: 0.00001085 [15:44:38] Epoch: 1 Batch: 15287/20099 (76.06%) Loss: 1.928531 LR: 0.00001085 [15:44:40] Epoch: 1 Batch: 15288/20099 (76.06%) Loss: 1.824268 LR: 0.00001085 [15:44:42] Epoch: 1 Batch: 15289/20099 (76.07%) Loss: 2.047624 LR: 0.00001084 [15:44:44] Epoch: 1 Batch: 15290/20099 (76.07%) Loss: 2.014981 LR: 0.00001084 [15:44:46] Epoch: 1 Batch: 15291/20099 (76.08%) Loss: 1.934533 LR: 0.00001084 [15:44:47] Epoch: 1 Batch: 15292/20099 (76.08%) Loss: 2.510976 LR: 0.00001084 [15:44:49] Epoch: 1 Batch: 15293/20099 (76.09%) Loss: 2.062315 LR: 0.00001084 [15:44:51] Epoch: 1 Batch: 15294/20099 (76.09%) Loss: 1.954675 LR: 0.00001084 [15:44:53] Epoch: 1 Batch: 15295/20099 (76.10%) Loss: 1.884975 LR: 0.00001084 [15:44:55] Epoch: 1 Batch: 15296/20099 (76.10%) Loss: 1.583232 LR: 0.00001082 [15:44:57] Epoch: 1 Batch: 15297/20099 (76.11%) Loss: 1.930627 LR: 0.00001082 [15:44:59] Epoch: 1 Batch: 15298/20099 (76.11%) Loss: 1.848381 LR: 0.00001082 [15:45:00] Epoch: 1 Batch: 15299/20099 (76.12%) Loss: 1.943404 LR: 0.00001082 [15:45:02] Epoch: 1 Batch: 15300/20099 (76.12%) Loss: 2.124398 LR: 0.00001082 [15:45:04] Epoch: 1 Batch: 15301/20099 (76.13%) Loss: 2.343056 LR: 0.00001082 [15:45:06] Epoch: 1 Batch: 15302/20099 (76.13%) Loss: 2.000821 LR: 0.00001082 [15:45:08] Epoch: 1 Batch: 15303/20099 (76.14%) Loss: 1.924122 LR: 0.00001081 [15:45:10] Epoch: 1 Batch: 15304/20099 (76.14%) Loss: 2.076907 LR: 0.00001081 [15:45:11] Epoch: 1 Batch: 15305/20099 (76.15%) Loss: 2.070025 LR: 0.00001081 [15:45:13] 
Epoch: 1 Batch: 15306/20099 (76.15%) Loss: 2.202655 LR: 0.00001081 [15:45:15] Epoch: 1 Batch: 15307/20099 (76.16%) Loss: 2.066310 LR: 0.00001081 [15:45:17] Epoch: 1 Batch: 15308/20099 (76.16%) Loss: 1.984949 LR: 0.00001081 [15:45:19] Epoch: 1 Batch: 15309/20099 (76.17%) Loss: 2.378587 LR: 0.00001081 [15:45:21] Epoch: 1 Batch: 15310/20099 (76.17%) Loss: 2.128044 LR: 0.00001080 [15:45:23] Epoch: 1 Batch: 15311/20099 (76.18%) Loss: 2.329441 LR: 0.00001080 [15:45:24] Epoch: 1 Batch: 15312/20099 (76.18%) Loss: 2.148195 LR: 0.00001080 [15:45:26] Epoch: 1 Batch: 15313/20099 (76.19%) Loss: 1.908280 LR: 0.00001080 [15:45:28] Epoch: 1 Batch: 15314/20099 (76.19%) Loss: 2.214649 LR: 0.00001080 [15:45:30] Epoch: 1 Batch: 15315/20099 (76.20%) Loss: 2.294306 LR: 0.00001080 [15:45:32] Epoch: 1 Batch: 15316/20099 (76.20%) Loss: 1.850348 LR: 0.00001080 [15:45:34] Epoch: 1 Batch: 15317/20099 (76.21%) Loss: 2.029615 LR: 0.00001079 [15:45:36] Epoch: 1 Batch: 15318/20099 (76.21%) Loss: 1.921311 LR: 0.00001079 [15:45:37] Epoch: 1 Batch: 15319/20099 (76.22%) Loss: 2.143719 LR: 0.00001079 [15:45:39] Epoch: 1 Batch: 15320/20099 (76.22%) Loss: 2.155460 LR: 0.00001079 [15:45:41] Epoch: 1 Batch: 15321/20099 (76.23%) Loss: 2.037797 LR: 0.00001079 [15:45:43] Epoch: 1 Batch: 15322/20099 (76.23%) Loss: 2.003455 LR: 0.00001079 [15:45:45] Epoch: 1 Batch: 15323/20099 (76.24%) Loss: 2.020525 LR: 0.00001079 [15:45:47] Epoch: 1 Batch: 15324/20099 (76.24%) Loss: 1.798068 LR: 0.00001077 [15:45:49] Epoch: 1 Batch: 15325/20099 (76.25%) Loss: 2.075258 LR: 0.00001077 [15:45:50] Epoch: 1 Batch: 15326/20099 (76.25%) Loss: 1.986336 LR: 0.00001077 [15:45:52] Epoch: 1 Batch: 15327/20099 (76.26%) Loss: 2.221890 LR: 0.00001077 [15:45:54] Epoch: 1 Batch: 15328/20099 (76.26%) Loss: 2.216751 LR: 0.00001077 [15:45:56] Epoch: 1 Batch: 15329/20099 (76.27%) Loss: 2.171168 LR: 0.00001077 [15:45:58] Epoch: 1 Batch: 15330/20099 (76.27%) Loss: 2.033912 LR: 0.00001077 [15:46:00] Epoch: 1 Batch: 15331/20099 (76.28%) Loss: 
2.283942 LR: 0.00001076 [15:46:02] Epoch: 1 Batch: 15332/20099 (76.28%) Loss: 1.884410 LR: 0.00001076 [15:46:03] Epoch: 1 Batch: 15333/20099 (76.29%) Loss: 1.910356 LR: 0.00001076 [15:46:05] Epoch: 1 Batch: 15334/20099 (76.29%) Loss: 1.671831 LR: 0.00001076 [15:46:07] Epoch: 1 Batch: 15335/20099 (76.30%) Loss: 1.978758 LR: 0.00001076 [15:46:09] Epoch: 1 Batch: 15336/20099 (76.30%) Loss: 2.178091 LR: 0.00001076 [15:46:11] Epoch: 1 Batch: 15337/20099 (76.31%) Loss: 2.288015 LR: 0.00001076 [15:46:13] Epoch: 1 Batch: 15338/20099 (76.31%) Loss: 1.961298 LR: 0.00001075 [15:46:14] Epoch: 1 Batch: 15339/20099 (76.32%) Loss: 1.953100 LR: 0.00001075 [15:46:16] Epoch: 1 Batch: 15340/20099 (76.32%) Loss: 2.260576 LR: 0.00001075 [15:46:18] Epoch: 1 Batch: 15341/20099 (76.33%) Loss: 2.070876 LR: 0.00001075 [15:46:20] Epoch: 1 Batch: 15342/20099 (76.33%) Loss: 2.089956 LR: 0.00001075 [15:46:22] Epoch: 1 Batch: 15343/20099 (76.34%) Loss: 2.010200 LR: 0.00001075 [15:46:24] Epoch: 1 Batch: 15344/20099 (76.34%) Loss: 1.958868 LR: 0.00001075 [15:46:26] Epoch: 1 Batch: 15345/20099 (76.35%) Loss: 2.130991 LR: 0.00001073 [15:46:27] Epoch: 1 Batch: 15346/20099 (76.35%) Loss: 2.244550 LR: 0.00001073 [15:46:29] Epoch: 1 Batch: 15347/20099 (76.36%) Loss: 2.099769 LR: 0.00001073 [15:46:31] Epoch: 1 Batch: 15348/20099 (76.36%) Loss: 2.222178 LR: 0.00001073 [15:46:33] Epoch: 1 Batch: 15349/20099 (76.37%) Loss: 2.209086 LR: 0.00001073 [15:46:35] Epoch: 1 Batch: 15350/20099 (76.37%) Loss: 2.145990 LR: 0.00001073 [15:46:37] Epoch: 1 Batch: 15351/20099 (76.38%) Loss: 2.178428 LR: 0.00001073 [15:46:39] Epoch: 1 Batch: 15352/20099 (76.38%) Loss: 2.200249 LR: 0.00001072 [15:46:40] Epoch: 1 Batch: 15353/20099 (76.39%) Loss: 1.720559 LR: 0.00001072 [15:46:42] Epoch: 1 Batch: 15354/20099 (76.39%) Loss: 1.886654 LR: 0.00001072 [15:46:44] Epoch: 1 Batch: 15355/20099 (76.40%) Loss: 2.165847 LR: 0.00001072 [15:46:46] Epoch: 1 Batch: 15356/20099 (76.40%) Loss: 2.435343 LR: 0.00001072 [15:46:48] Epoch: 1 
Batch: 15357/20099 (76.41%) Loss: 1.827041 LR: 0.00001072 [15:46:50] Epoch: 1 Batch: 15358/20099 (76.41%) Loss: 2.252287 LR: 0.00001072 [15:46:51] Epoch: 1 Batch: 15359/20099 (76.42%) Loss: 1.803403 LR: 0.00001071 [15:46:53] Epoch: 1 Batch: 15360/20099 (76.42%) Loss: 2.218782 LR: 0.00001071 [15:46:55] Epoch: 1 Batch: 15361/20099 (76.43%) Loss: 1.906487 LR: 0.00001071 [15:46:57] Epoch: 1 Batch: 15362/20099 (76.43%) Loss: 2.178492 LR: 0.00001071 [15:46:59] Epoch: 1 Batch: 15363/20099 (76.44%) Loss: 1.837838 LR: 0.00001071 [15:47:01] Epoch: 1 Batch: 15364/20099 (76.44%) Loss: 2.127402 LR: 0.00001071 [15:47:03] Epoch: 1 Batch: 15365/20099 (76.45%) Loss: 2.148665 LR: 0.00001071 [15:47:04] Epoch: 1 Batch: 15366/20099 (76.45%) Loss: 1.975127 LR: 0.00001070 [15:47:06] Epoch: 1 Batch: 15367/20099 (76.46%) Loss: 2.035200 LR: 0.00001070 [15:47:08] Epoch: 1 Batch: 15368/20099 (76.46%) Loss: 1.895653 LR: 0.00001070 [15:47:10] Epoch: 1 Batch: 15369/20099 (76.47%) Loss: 2.330080 LR: 0.00001070 [15:47:12] Epoch: 1 Batch: 15370/20099 (76.47%) Loss: 2.114565 LR: 0.00001070 [15:47:14] Epoch: 1 Batch: 15371/20099 (76.48%) Loss: 2.262273 LR: 0.00001070 [15:47:16] Epoch: 1 Batch: 15372/20099 (76.48%) Loss: 2.123597 LR: 0.00001070 [15:47:17] Epoch: 1 Batch: 15373/20099 (76.49%) Loss: 2.242881 LR: 0.00001068 [15:47:19] Epoch: 1 Batch: 15374/20099 (76.49%) Loss: 2.090059 LR: 0.00001068 [15:47:21] Epoch: 1 Batch: 15375/20099 (76.50%) Loss: 1.761441 LR: 0.00001068 [15:47:23] Epoch: 1 Batch: 15376/20099 (76.50%) Loss: 2.504808 LR: 0.00001068 [15:47:25] Epoch: 1 Batch: 15377/20099 (76.51%) Loss: 1.810058 LR: 0.00001068 [15:47:27] Epoch: 1 Batch: 15378/20099 (76.51%) Loss: 1.841511 LR: 0.00001068 [15:47:28] Epoch: 1 Batch: 15379/20099 (76.52%) Loss: 2.082515 LR: 0.00001068 [15:47:30] Epoch: 1 Batch: 15380/20099 (76.52%) Loss: 2.267555 LR: 0.00001067 [15:47:32] Epoch: 1 Batch: 15381/20099 (76.53%) Loss: 2.239877 LR: 0.00001067 [15:47:34] Epoch: 1 Batch: 15382/20099 (76.53%) Loss: 2.049970 LR: 
0.00001067 [15:47:36] Epoch: 1 Batch: 15383/20099 (76.54%) Loss: 1.867315 LR: 0.00001067 [15:47:38] Epoch: 1 Batch: 15384/20099 (76.54%) Loss: 2.223927 LR: 0.00001067 [15:47:40] Epoch: 1 Batch: 15385/20099 (76.55%) Loss: 1.880072 LR: 0.00001067 [15:47:41] Epoch: 1 Batch: 15386/20099 (76.55%) Loss: 2.238926 LR: 0.00001067 [15:47:43] Epoch: 1 Batch: 15387/20099 (76.56%) Loss: 1.876747 LR: 0.00001066 [15:47:45] Epoch: 1 Batch: 15388/20099 (76.56%) Loss: 2.223061 LR: 0.00001066 [15:47:47] Epoch: 1 Batch: 15389/20099 (76.57%) Loss: 2.282782 LR: 0.00001066 [15:47:49] Epoch: 1 Batch: 15390/20099 (76.57%) Loss: 2.186871 LR: 0.00001066 [15:47:51] Epoch: 1 Batch: 15391/20099 (76.58%) Loss: 2.287266 LR: 0.00001066 [15:47:52] Epoch: 1 Batch: 15392/20099 (76.58%) Loss: 1.821151 LR: 0.00001066 [15:47:54] Epoch: 1 Batch: 15393/20099 (76.59%) Loss: 1.907996 LR: 0.00001066 [15:47:56] Epoch: 1 Batch: 15394/20099 (76.59%) Loss: 1.884081 LR: 0.00001064 [15:47:58] Epoch: 1 Batch: 15395/20099 (76.60%) Loss: 2.300083 LR: 0.00001064 [15:48:00] Epoch: 1 Batch: 15396/20099 (76.60%) Loss: 2.018670 LR: 0.00001064 [15:48:02] Epoch: 1 Batch: 15397/20099 (76.61%) Loss: 2.046902 LR: 0.00001064 [15:48:04] Epoch: 1 Batch: 15398/20099 (76.61%) Loss: 2.021716 LR: 0.00001064 [15:48:05] Epoch: 1 Batch: 15399/20099 (76.62%) Loss: 1.923739 LR: 0.00001064 [15:48:11] >> Cleaned up old temp checkpoint: epoch1_step13400 [15:48:11] >> Temp checkpoint saved: epoch1_step15400, size: 0.1693 GB [15:48:11] Epoch: 1 Batch: 15400/20099 (76.62%) Loss: 1.787343 LR: 0.00001064 [15:48:13] Epoch: 1 Batch: 15401/20099 (76.63%) Loss: 2.077458 LR: 0.00001063 [15:48:15] Epoch: 1 Batch: 15402/20099 (76.63%) Loss: 1.791352 LR: 0.00001063 [15:48:17] Epoch: 1 Batch: 15403/20099 (76.64%) Loss: 2.085089 LR: 0.00001063 [15:48:18] Epoch: 1 Batch: 15404/20099 (76.64%) Loss: 2.108154 LR: 0.00001063 [15:48:20] Epoch: 1 Batch: 15405/20099 (76.65%) Loss: 2.321947 LR: 0.00001063 [15:48:22] Epoch: 1 Batch: 15406/20099 (76.65%) Loss: 
2.095125 LR: 0.00001063 [15:48:24] Epoch: 1 Batch: 15407/20099 (76.66%) Loss: 1.988294 LR: 0.00001063 [15:48:26] Epoch: 1 Batch: 15408/20099 (76.66%) Loss: 2.074601 LR: 0.00001062 [15:48:28] Epoch: 1 Batch: 15409/20099 (76.67%) Loss: 2.135297 LR: 0.00001062 [15:48:30] Epoch: 1 Batch: 15410/20099 (76.67%) Loss: 2.262911 LR: 0.00001062 [15:48:31] Epoch: 1 Batch: 15411/20099 (76.68%) Loss: 1.893217 LR: 0.00001062 [15:48:33] Epoch: 1 Batch: 15412/20099 (76.68%) Loss: 2.199661 LR: 0.00001062 [15:48:35] Epoch: 1 Batch: 15413/20099 (76.69%) Loss: 2.293222 LR: 0.00001062 [15:48:37] Epoch: 1 Batch: 15414/20099 (76.69%) Loss: 2.060537 LR: 0.00001062 [15:48:39] Epoch: 1 Batch: 15415/20099 (76.70%) Loss: 1.940843 LR: 0.00001061 [15:48:41] Epoch: 1 Batch: 15416/20099 (76.70%) Loss: 1.998266 LR: 0.00001061 [15:48:43] Epoch: 1 Batch: 15417/20099 (76.71%) Loss: 2.206871 LR: 0.00001061 [15:48:44] Epoch: 1 Batch: 15418/20099 (76.71%) Loss: 2.037436 LR: 0.00001061 [15:48:46] Epoch: 1 Batch: 15419/20099 (76.72%) Loss: 2.017669 LR: 0.00001061 [15:48:48] Epoch: 1 Batch: 15420/20099 (76.72%) Loss: 1.915703 LR: 0.00001061 [15:48:50] Epoch: 1 Batch: 15421/20099 (76.73%) Loss: 2.088032 LR: 0.00001061 [15:48:52] Epoch: 1 Batch: 15422/20099 (76.73%) Loss: 1.935935 LR: 0.00001059 [15:48:54] Epoch: 1 Batch: 15423/20099 (76.74%) Loss: 2.225381 LR: 0.00001059 [15:48:56] Epoch: 1 Batch: 15424/20099 (76.74%) Loss: 2.021248 LR: 0.00001059 [15:48:58] Epoch: 1 Batch: 15425/20099 (76.75%) Loss: 2.185432 LR: 0.00001059 [15:48:59] Epoch: 1 Batch: 15426/20099 (76.75%) Loss: 2.328572 LR: 0.00001059 [15:49:01] Epoch: 1 Batch: 15427/20099 (76.76%) Loss: 1.983002 LR: 0.00001059 [15:49:03] Epoch: 1 Batch: 15428/20099 (76.76%) Loss: 1.960104 LR: 0.00001059 [15:49:05] Epoch: 1 Batch: 15429/20099 (76.77%) Loss: 2.139424 LR: 0.00001058 [15:49:07] Epoch: 1 Batch: 15430/20099 (76.77%) Loss: 2.054262 LR: 0.00001058 [15:49:09] Epoch: 1 Batch: 15431/20099 (76.77%) Loss: 2.104237 LR: 0.00001058 [15:49:11] Epoch: 1 
Batch: 15432/20099 (76.78%) Loss: 2.113359 LR: 0.00001058 [15:49:12] Epoch: 1 Batch: 15433/20099 (76.78%) Loss: 2.174231 LR: 0.00001058 [15:49:14] Epoch: 1 Batch: 15434/20099 (76.79%) Loss: 2.036298 LR: 0.00001058 [15:49:16] Epoch: 1 Batch: 15435/20099 (76.79%) Loss: 2.316279 LR: 0.00001058 [15:49:18] Epoch: 1 Batch: 15436/20099 (76.80%) Loss: 1.991347 LR: 0.00001057 [15:49:20] Epoch: 1 Batch: 15437/20099 (76.80%) Loss: 2.198402 LR: 0.00001057 [15:49:22] Epoch: 1 Batch: 15438/20099 (76.81%) Loss: 2.201159 LR: 0.00001057 [15:49:23] Epoch: 1 Batch: 15439/20099 (76.81%) Loss: 1.845095 LR: 0.00001057 [15:49:25] Epoch: 1 Batch: 15440/20099 (76.82%) Loss: 2.215903 LR: 0.00001057 [15:49:27] Epoch: 1 Batch: 15441/20099 (76.82%) Loss: 2.004562 LR: 0.00001057 [15:49:29] Epoch: 1 Batch: 15442/20099 (76.83%) Loss: 1.925024 LR: 0.00001057 [15:49:31] Epoch: 1 Batch: 15443/20099 (76.83%) Loss: 2.285370 LR: 0.00001055 [15:49:33] Epoch: 1 Batch: 15444/20099 (76.84%) Loss: 2.211355 LR: 0.00001055 [15:49:35] Epoch: 1 Batch: 15445/20099 (76.84%) Loss: 1.970210 LR: 0.00001055 [15:49:36] Epoch: 1 Batch: 15446/20099 (76.85%) Loss: 2.032976 LR: 0.00001055 [15:49:38] Epoch: 1 Batch: 15447/20099 (76.85%) Loss: 1.953813 LR: 0.00001055 [15:49:40] Epoch: 1 Batch: 15448/20099 (76.86%) Loss: 2.135137 LR: 0.00001055 [15:49:42] Epoch: 1 Batch: 15449/20099 (76.86%) Loss: 1.843059 LR: 0.00001055 [15:49:44] Epoch: 1 Batch: 15450/20099 (76.87%) Loss: 2.398548 LR: 0.00001054 [15:49:46] Epoch: 1 Batch: 15451/20099 (76.87%) Loss: 2.227602 LR: 0.00001054 [15:49:47] Epoch: 1 Batch: 15452/20099 (76.88%) Loss: 2.184901 LR: 0.00001054 [15:49:49] Epoch: 1 Batch: 15453/20099 (76.88%) Loss: 2.113898 LR: 0.00001054 [15:49:51] Epoch: 1 Batch: 15454/20099 (76.89%) Loss: 1.927786 LR: 0.00001054 [15:49:53] Epoch: 1 Batch: 15455/20099 (76.89%) Loss: 2.019750 LR: 0.00001054 [15:49:55] Epoch: 1 Batch: 15456/20099 (76.90%) Loss: 2.172204 LR: 0.00001054 [15:49:57] Epoch: 1 Batch: 15457/20099 (76.90%) Loss: 2.195533 LR: 
0.00001053 [15:49:59] Epoch: 1 Batch: 15458/20099 (76.91%) Loss: 2.417556 LR: 0.00001053 [15:50:00] Epoch: 1 Batch: 15459/20099 (76.91%) Loss: 1.661147 LR: 0.00001053 [15:50:02] Epoch: 1 Batch: 15460/20099 (76.92%) Loss: 1.673719 LR: 0.00001053 [15:50:04] Epoch: 1 Batch: 15461/20099 (76.92%) Loss: 2.203580 LR: 0.00001053 [15:50:06] Epoch: 1 Batch: 15462/20099 (76.93%) Loss: 2.148651 LR: 0.00001053 [15:50:08] Epoch: 1 Batch: 15463/20099 (76.93%) Loss: 2.285357 LR: 0.00001053 [15:50:10] Epoch: 1 Batch: 15464/20099 (76.94%) Loss: 2.070907 LR: 0.00001052 [15:50:12] Epoch: 1 Batch: 15465/20099 (76.94%) Loss: 2.470788 LR: 0.00001052 [15:50:13] Epoch: 1 Batch: 15466/20099 (76.95%) Loss: 2.489792 LR: 0.00001052 [15:50:15] Epoch: 1 Batch: 15467/20099 (76.95%) Loss: 2.138826 LR: 0.00001052 [15:50:17] Epoch: 1 Batch: 15468/20099 (76.96%) Loss: 2.191453 LR: 0.00001052 [15:50:19] Epoch: 1 Batch: 15469/20099 (76.96%) Loss: 2.314897 LR: 0.00001052 [15:50:21] Epoch: 1 Batch: 15470/20099 (76.97%) Loss: 2.428260 LR: 0.00001052 [15:50:23] Epoch: 1 Batch: 15471/20099 (76.97%) Loss: 1.937745 LR: 0.00001050 [15:50:25] Epoch: 1 Batch: 15472/20099 (76.98%) Loss: 2.076041 LR: 0.00001050 [15:50:26] Epoch: 1 Batch: 15473/20099 (76.98%) Loss: 2.316175 LR: 0.00001050 [15:50:28] Epoch: 1 Batch: 15474/20099 (76.99%) Loss: 1.933040 LR: 0.00001050 [15:50:30] Epoch: 1 Batch: 15475/20099 (76.99%) Loss: 2.101529 LR: 0.00001050 [15:50:32] Epoch: 1 Batch: 15476/20099 (77.00%) Loss: 1.771391 LR: 0.00001050 [15:50:34] Epoch: 1 Batch: 15477/20099 (77.00%) Loss: 1.861154 LR: 0.00001050 [15:50:36] Epoch: 1 Batch: 15478/20099 (77.01%) Loss: 2.233125 LR: 0.00001049 [15:50:38] Epoch: 1 Batch: 15479/20099 (77.01%) Loss: 2.029874 LR: 0.00001049 [15:50:39] Epoch: 1 Batch: 15480/20099 (77.02%) Loss: 2.368866 LR: 0.00001049 [15:50:41] Epoch: 1 Batch: 15481/20099 (77.02%) Loss: 2.043628 LR: 0.00001049 [15:50:43] Epoch: 1 Batch: 15482/20099 (77.03%) Loss: 2.073314 LR: 0.00001049 [15:50:45] Epoch: 1 Batch: 15483/20099 
(77.03%) Loss: 1.944717 LR: 0.00001049 [15:50:47] Epoch: 1 Batch: 15484/20099 (77.04%) Loss: 2.297977 LR: 0.00001049 [15:50:49] Epoch: 1 Batch: 15485/20099 (77.04%) Loss: 2.014137 LR: 0.00001048 [15:50:50] Epoch: 1 Batch: 15486/20099 (77.05%) Loss: 1.989508 LR: 0.00001048 [15:50:52] Epoch: 1 Batch: 15487/20099 (77.05%) Loss: 2.271599 LR: 0.00001048 [15:50:54] Epoch: 1 Batch: 15488/20099 (77.06%) Loss: 2.165126 LR: 0.00001048 [15:50:56] Epoch: 1 Batch: 15489/20099 (77.06%) Loss: 2.003668 LR: 0.00001048 [15:50:58] Epoch: 1 Batch: 15490/20099 (77.07%) Loss: 2.109466 LR: 0.00001048 [15:51:00] Epoch: 1 Batch: 15491/20099 (77.07%) Loss: 2.016095 LR: 0.00001048 [15:51:02] Epoch: 1 Batch: 15492/20099 (77.08%) Loss: 2.128454 LR: 0.00001047 [15:51:03] Epoch: 1 Batch: 15493/20099 (77.08%) Loss: 2.162929 LR: 0.00001047 [15:51:05] Epoch: 1 Batch: 15494/20099 (77.09%) Loss: 1.761993 LR: 0.00001047 [15:51:07] Epoch: 1 Batch: 15495/20099 (77.09%) Loss: 2.107648 LR: 0.00001047 [15:51:09] Epoch: 1 Batch: 15496/20099 (77.10%) Loss: 2.174953 LR: 0.00001047 [15:51:11] Epoch: 1 Batch: 15497/20099 (77.10%) Loss: 2.359966 LR: 0.00001047 [15:51:13] Epoch: 1 Batch: 15498/20099 (77.11%) Loss: 2.103602 LR: 0.00001047 [15:51:15] Epoch: 1 Batch: 15499/20099 (77.11%) Loss: 2.105142 LR: 0.00001045 [15:51:16] >> Evaluating batch 0 [15:51:18] >> Evaluating batch 1 [15:51:19] >> Evaluating batch 2 [15:51:20] >> Evaluating batch 3 [15:51:21] >> Evaluating batch 4 [15:51:22] >> Evaluating batch 5 [15:51:23] >> Evaluating batch 6 [15:51:24] >> Evaluating batch 7 [15:51:25] >> Evaluating batch 8 [15:51:26] >> Evaluating batch 9 [15:51:27] >> Evaluating batch 10 [15:51:28] >> Evaluating batch 11 [15:51:29] >> Evaluating batch 12 [15:51:30] >> Evaluating batch 13 [15:51:31] >> Evaluating batch 14 [15:51:32] >> Evaluating batch 15 [15:51:33] >> Evaluating batch 16 [15:51:34] Epoch: 1 Step: 15500/20099 Evaluation: [15:51:34] Avg Loss Since Last Eval: 2.0979 Val Loss: 2.1504 Validation loss delta: -0.0028 
Perplexity: 8.5879 LR: 0.00001045 [15:51:38] >> Checkpoint saved: epoch1_step15500, size: 0.1693 GB [15:51:38] Epoch: 1 Batch: 15500/20099 (77.12%) Loss: 1.783126 LR: 0.00001045 [15:51:40] Epoch: 1 Batch: 15501/20099 (77.12%) Loss: 2.151868 LR: 0.00001045 [15:51:42] Epoch: 1 Batch: 15502/20099 (77.13%) Loss: 2.051015 LR: 0.00001045 [15:51:44] Epoch: 1 Batch: 15503/20099 (77.13%) Loss: 1.664257 LR: 0.00001045 [15:51:45] Epoch: 1 Batch: 15504/20099 (77.14%) Loss: 2.154980 LR: 0.00001045 [15:51:47] Epoch: 1 Batch: 15505/20099 (77.14%) Loss: 2.316065 LR: 0.00001045 [15:51:49] Epoch: 1 Batch: 15506/20099 (77.15%) Loss: 2.253338 LR: 0.00001044 [15:51:51] Epoch: 1 Batch: 15507/20099 (77.15%) Loss: 2.069305 LR: 0.00001044 [15:51:53] Epoch: 1 Batch: 15508/20099 (77.16%) Loss: 2.113972 LR: 0.00001044 [15:51:55] Epoch: 1 Batch: 15509/20099 (77.16%) Loss: 2.306899 LR: 0.00001044 [15:51:57] Epoch: 1 Batch: 15510/20099 (77.17%) Loss: 1.902904 LR: 0.00001044 [15:51:58] Epoch: 1 Batch: 15511/20099 (77.17%) Loss: 1.918498 LR: 0.00001044 [15:52:00] Epoch: 1 Batch: 15512/20099 (77.18%) Loss: 2.075801 LR: 0.00001044 [15:52:02] Epoch: 1 Batch: 15513/20099 (77.18%) Loss: 1.901731 LR: 0.00001043 [15:52:04] Epoch: 1 Batch: 15514/20099 (77.19%) Loss: 2.298077 LR: 0.00001043 [15:52:06] Epoch: 1 Batch: 15515/20099 (77.19%) Loss: 2.278167 LR: 0.00001043 [15:52:08] Epoch: 1 Batch: 15516/20099 (77.20%) Loss: 2.283703 LR: 0.00001043 [15:52:10] Epoch: 1 Batch: 15517/20099 (77.20%) Loss: 2.006898 LR: 0.00001043 [15:52:12] Epoch: 1 Batch: 15518/20099 (77.21%) Loss: 2.052295 LR: 0.00001043 [15:52:13] Epoch: 1 Batch: 15519/20099 (77.21%) Loss: 1.961916 LR: 0.00001043 [15:52:15] Epoch: 1 Batch: 15520/20099 (77.22%) Loss: 1.953322 LR: 0.00001042 [15:52:17] Epoch: 1 Batch: 15521/20099 (77.22%) Loss: 2.049419 LR: 0.00001042 [15:52:19] Epoch: 1 Batch: 15522/20099 (77.23%) Loss: 1.989728 LR: 0.00001042 [15:52:21] Epoch: 1 Batch: 15523/20099 (77.23%) Loss: 1.962202 LR: 0.00001042 [15:52:23] Epoch: 1 Batch: 
15524/20099 (77.24%) Loss: 2.300329 LR: 0.00001042 [15:52:25] Epoch: 1 Batch: 15525/20099 (77.24%) Loss: 1.937537 LR: 0.00001042 [15:52:26] Epoch: 1 Batch: 15526/20099 (77.25%) Loss: 2.042443 LR: 0.00001042 [15:52:28] Epoch: 1 Batch: 15527/20099 (77.25%) Loss: 1.890192 LR: 0.00001040 [15:52:30] Epoch: 1 Batch: 15528/20099 (77.26%) Loss: 2.058901 LR: 0.00001040 [15:52:32] Epoch: 1 Batch: 15529/20099 (77.26%) Loss: 1.856785 LR: 0.00001040 [15:52:34] Epoch: 1 Batch: 15530/20099 (77.27%) Loss: 2.109856 LR: 0.00001040 [15:52:36] Epoch: 1 Batch: 15531/20099 (77.27%) Loss: 1.651166 LR: 0.00001040 [15:52:38] Epoch: 1 Batch: 15532/20099 (77.28%) Loss: 2.019732 LR: 0.00001040 [15:52:39] Epoch: 1 Batch: 15533/20099 (77.28%) Loss: 2.156766 LR: 0.00001040 [15:52:41] Epoch: 1 Batch: 15534/20099 (77.29%) Loss: 1.967450 LR: 0.00001039 [15:52:43] Epoch: 1 Batch: 15535/20099 (77.29%) Loss: 2.107119 LR: 0.00001039 [15:52:45] Epoch: 1 Batch: 15536/20099 (77.30%) Loss: 2.122323 LR: 0.00001039 [15:52:47] Epoch: 1 Batch: 15537/20099 (77.30%) Loss: 1.559146 LR: 0.00001039 [15:52:49] Epoch: 1 Batch: 15538/20099 (77.31%) Loss: 2.045942 LR: 0.00001039 [15:52:50] Epoch: 1 Batch: 15539/20099 (77.31%) Loss: 2.207802 LR: 0.00001039 [15:52:52] Epoch: 1 Batch: 15540/20099 (77.32%) Loss: 1.936261 LR: 0.00001039 [15:52:54] Epoch: 1 Batch: 15541/20099 (77.32%) Loss: 2.156393 LR: 0.00001038 [15:52:56] Epoch: 1 Batch: 15542/20099 (77.33%) Loss: 2.043953 LR: 0.00001038 [15:52:58] Epoch: 1 Batch: 15543/20099 (77.33%) Loss: 2.241662 LR: 0.00001038 [15:53:00] Epoch: 1 Batch: 15544/20099 (77.34%) Loss: 1.952232 LR: 0.00001038 [15:53:01] Epoch: 1 Batch: 15545/20099 (77.34%) Loss: 2.060880 LR: 0.00001038 [15:53:03] Epoch: 1 Batch: 15546/20099 (77.35%) Loss: 2.003650 LR: 0.00001038 [15:53:05] Epoch: 1 Batch: 15547/20099 (77.35%) Loss: 2.081288 LR: 0.00001038 [15:53:07] Epoch: 1 Batch: 15548/20099 (77.36%) Loss: 1.799410 LR: 0.00001036 [15:53:09] Epoch: 1 Batch: 15549/20099 (77.36%) Loss: 2.099184 LR: 
0.00001036 [15:53:11] Epoch: 1 Batch: 15550/20099 (77.37%) Loss: 2.065982 LR: 0.00001036 [15:53:13] Epoch: 1 Batch: 15551/20099 (77.37%) Loss: 1.933086 LR: 0.00001036 [15:53:14] Epoch: 1 Batch: 15552/20099 (77.38%) Loss: 1.653019 LR: 0.00001036 [15:53:16] Epoch: 1 Batch: 15553/20099 (77.38%) Loss: 1.851772 LR: 0.00001036 [15:53:18] Epoch: 1 Batch: 15554/20099 (77.39%) Loss: 2.249006 LR: 0.00001036 [15:53:20] Epoch: 1 Batch: 15555/20099 (77.39%) Loss: 2.035734 LR: 0.00001035 [15:53:22] Epoch: 1 Batch: 15556/20099 (77.40%) Loss: 2.202520 LR: 0.00001035 [15:53:24] Epoch: 1 Batch: 15557/20099 (77.40%) Loss: 1.954386 LR: 0.00001035 [15:53:25] Epoch: 1 Batch: 15558/20099 (77.41%) Loss: 2.028545 LR: 0.00001035 [15:53:27] Epoch: 1 Batch: 15559/20099 (77.41%) Loss: 2.326250 LR: 0.00001035 [15:53:29] Epoch: 1 Batch: 15560/20099 (77.42%) Loss: 2.211926 LR: 0.00001035 [15:53:31] Epoch: 1 Batch: 15561/20099 (77.42%) Loss: 1.985471 LR: 0.00001035 [15:53:33] Epoch: 1 Batch: 15562/20099 (77.43%) Loss: 2.116374 LR: 0.00001034 [15:53:35] Epoch: 1 Batch: 15563/20099 (77.43%) Loss: 2.589301 LR: 0.00001034 [15:53:37] Epoch: 1 Batch: 15564/20099 (77.44%) Loss: 2.016425 LR: 0.00001034 [15:53:38] Epoch: 1 Batch: 15565/20099 (77.44%) Loss: 1.549008 LR: 0.00001034 [15:53:40] Epoch: 1 Batch: 15566/20099 (77.45%) Loss: 2.320504 LR: 0.00001034 [15:53:42] Epoch: 1 Batch: 15567/20099 (77.45%) Loss: 2.432944 LR: 0.00001034 [15:53:44] Epoch: 1 Batch: 15568/20099 (77.46%) Loss: 2.205656 LR: 0.00001034 [15:53:46] Epoch: 1 Batch: 15569/20099 (77.46%) Loss: 2.157797 LR: 0.00001033 [15:53:48] Epoch: 1 Batch: 15570/20099 (77.47%) Loss: 2.226482 LR: 0.00001033 [15:53:50] Epoch: 1 Batch: 15571/20099 (77.47%) Loss: 2.364875 LR: 0.00001033 [15:53:51] Epoch: 1 Batch: 15572/20099 (77.48%) Loss: 2.060739 LR: 0.00001033 [15:53:53] Epoch: 1 Batch: 15573/20099 (77.48%) Loss: 2.180540 LR: 0.00001033 [15:53:55] Epoch: 1 Batch: 15574/20099 (77.49%) Loss: 1.971587 LR: 0.00001033 [15:53:57] Epoch: 1 Batch: 15575/20099 
(77.49%) Loss: 2.318553 LR: 0.00001033 [15:53:59] Epoch: 1 Batch: 15576/20099 (77.50%) Loss: 2.319774 LR: 0.00001031 [15:54:01] Epoch: 1 Batch: 15577/20099 (77.50%) Loss: 2.178058 LR: 0.00001031 [15:54:02] Epoch: 1 Batch: 15578/20099 (77.51%) Loss: 1.992712 LR: 0.00001031 [15:54:04] Epoch: 1 Batch: 15579/20099 (77.51%) Loss: 2.201994 LR: 0.00001031 [15:54:06] Epoch: 1 Batch: 15580/20099 (77.52%) Loss: 2.176192 LR: 0.00001031 [15:54:08] Epoch: 1 Batch: 15581/20099 (77.52%) Loss: 2.050178 LR: 0.00001031 [15:54:10] Epoch: 1 Batch: 15582/20099 (77.53%) Loss: 1.971581 LR: 0.00001031 [15:54:12] Epoch: 1 Batch: 15583/20099 (77.53%) Loss: 2.188823 LR: 0.00001030 [15:54:14] Epoch: 1 Batch: 15584/20099 (77.54%) Loss: 2.261289 LR: 0.00001030 [15:54:15] Epoch: 1 Batch: 15585/20099 (77.54%) Loss: 1.993728 LR: 0.00001030 [15:54:17] Epoch: 1 Batch: 15586/20099 (77.55%) Loss: 1.967864 LR: 0.00001030 [15:54:19] Epoch: 1 Batch: 15587/20099 (77.55%) Loss: 1.910121 LR: 0.00001030 [15:54:21] Epoch: 1 Batch: 15588/20099 (77.56%) Loss: 2.221543 LR: 0.00001030 [15:54:23] Epoch: 1 Batch: 15589/20099 (77.56%) Loss: 2.122075 LR: 0.00001030 [15:54:25] Epoch: 1 Batch: 15590/20099 (77.57%) Loss: 2.145648 LR: 0.00001029 [15:54:27] Epoch: 1 Batch: 15591/20099 (77.57%) Loss: 2.261636 LR: 0.00001029 [15:54:28] Epoch: 1 Batch: 15592/20099 (77.58%) Loss: 1.870109 LR: 0.00001029 [15:54:30] Epoch: 1 Batch: 15593/20099 (77.58%) Loss: 1.972451 LR: 0.00001029 [15:54:32] Epoch: 1 Batch: 15594/20099 (77.59%) Loss: 2.236895 LR: 0.00001029 [15:54:34] Epoch: 1 Batch: 15595/20099 (77.59%) Loss: 1.842286 LR: 0.00001029 [15:54:36] Epoch: 1 Batch: 15596/20099 (77.60%) Loss: 2.272804 LR: 0.00001029 [15:54:38] Epoch: 1 Batch: 15597/20099 (77.60%) Loss: 2.117489 LR: 0.00001028 [15:54:40] Epoch: 1 Batch: 15598/20099 (77.61%) Loss: 1.909019 LR: 0.00001028 [15:54:41] Epoch: 1 Batch: 15599/20099 (77.61%) Loss: 2.129066 LR: 0.00001028 [15:54:47] >> Cleaned up old temp checkpoint: epoch1_step13600 [15:54:47] >> Temp 
checkpoint saved: epoch1_step15600, size: 0.1693 GB [15:54:47] Epoch: 1 Batch: 15600/20099 (77.62%) Loss: 2.021706 LR: 0.00001028 [15:54:49] Epoch: 1 Batch: 15601/20099 (77.62%) Loss: 1.960383 LR: 0.00001028 [15:54:51] Epoch: 1 Batch: 15602/20099 (77.63%) Loss: 2.075302 LR: 0.00001028 [15:54:52] Epoch: 1 Batch: 15603/20099 (77.63%) Loss: 1.948139 LR: 0.00001028 [15:54:54] Epoch: 1 Batch: 15604/20099 (77.64%) Loss: 2.230962 LR: 0.00001027 [15:54:56] Epoch: 1 Batch: 15605/20099 (77.64%) Loss: 1.932310 LR: 0.00001027 [15:54:58] Epoch: 1 Batch: 15606/20099 (77.65%) Loss: 1.975339 LR: 0.00001027 [15:55:00] Epoch: 1 Batch: 15607/20099 (77.65%) Loss: 2.238906 LR: 0.00001027 [15:55:02] Epoch: 1 Batch: 15608/20099 (77.66%) Loss: 2.086926 LR: 0.00001027 [15:55:04] Epoch: 1 Batch: 15609/20099 (77.66%) Loss: 2.194679 LR: 0.00001027 [15:55:05] Epoch: 1 Batch: 15610/20099 (77.67%) Loss: 2.123814 LR: 0.00001027 [15:55:07] Epoch: 1 Batch: 15611/20099 (77.67%) Loss: 2.171151 LR: 0.00001025 [15:55:09] Epoch: 1 Batch: 15612/20099 (77.68%) Loss: 2.017267 LR: 0.00001025 [15:55:11] Epoch: 1 Batch: 15613/20099 (77.68%) Loss: 1.959557 LR: 0.00001025 [15:55:13] Epoch: 1 Batch: 15614/20099 (77.69%) Loss: 2.179789 LR: 0.00001025 [15:55:15] Epoch: 1 Batch: 15615/20099 (77.69%) Loss: 2.149274 LR: 0.00001025 [15:55:17] Epoch: 1 Batch: 15616/20099 (77.70%) Loss: 1.979704 LR: 0.00001025 [15:55:18] Epoch: 1 Batch: 15617/20099 (77.70%) Loss: 2.094185 LR: 0.00001025 [15:55:20] Epoch: 1 Batch: 15618/20099 (77.71%) Loss: 2.050099 LR: 0.00001024 [15:55:22] Epoch: 1 Batch: 15619/20099 (77.71%) Loss: 2.118863 LR: 0.00001024 [15:55:24] Epoch: 1 Batch: 15620/20099 (77.72%) Loss: 2.302886 LR: 0.00001024 [15:55:26] Epoch: 1 Batch: 15621/20099 (77.72%) Loss: 2.011087 LR: 0.00001024 [15:55:28] Epoch: 1 Batch: 15622/20099 (77.73%) Loss: 2.249615 LR: 0.00001024 [15:55:30] Epoch: 1 Batch: 15623/20099 (77.73%) Loss: 2.051770 LR: 0.00001024 [15:55:32] Epoch: 1 Batch: 15624/20099 (77.74%) Loss: 2.355285 LR: 
0.00001024 [15:55:33] Epoch: 1 Batch: 15625/20099 (77.74%) Loss: 2.138605 LR: 0.00001023 [15:55:35] Epoch: 1 Batch: 15626/20099 (77.75%) Loss: 2.001810 LR: 0.00001023 [15:55:37] Epoch: 1 Batch: 15627/20099 (77.75%) Loss: 2.375710 LR: 0.00001023 [15:55:39] Epoch: 1 Batch: 15628/20099 (77.76%) Loss: 1.858228 LR: 0.00001023 [15:55:41] Epoch: 1 Batch: 15629/20099 (77.76%) Loss: 1.801430 LR: 0.00001023 [15:55:43] Epoch: 1 Batch: 15630/20099 (77.77%) Loss: 1.992192 LR: 0.00001023 [15:55:45] Epoch: 1 Batch: 15631/20099 (77.77%) Loss: 2.365761 LR: 0.00001023 [15:55:46] Epoch: 1 Batch: 15632/20099 (77.78%) Loss: 2.119000 LR: 0.00001022 [15:55:48] Epoch: 1 Batch: 15633/20099 (77.78%) Loss: 2.460329 LR: 0.00001022 [15:55:50] Epoch: 1 Batch: 15634/20099 (77.78%) Loss: 1.991013 LR: 0.00001022 [15:55:52] Epoch: 1 Batch: 15635/20099 (77.79%) Loss: 1.801908 LR: 0.00001022 [15:55:54] Epoch: 1 Batch: 15636/20099 (77.79%) Loss: 2.100223 LR: 0.00001022 [15:55:56] Epoch: 1 Batch: 15637/20099 (77.80%) Loss: 1.908920 LR: 0.00001022 [15:55:57] Epoch: 1 Batch: 15638/20099 (77.80%) Loss: 2.105090 LR: 0.00001022 [15:55:59] Epoch: 1 Batch: 15639/20099 (77.81%) Loss: 1.817190 LR: 0.00001020 [15:56:01] Epoch: 1 Batch: 15640/20099 (77.81%) Loss: 1.937742 LR: 0.00001020 [15:56:03] Epoch: 1 Batch: 15641/20099 (77.82%) Loss: 2.051777 LR: 0.00001020 [15:56:05] Epoch: 1 Batch: 15642/20099 (77.82%) Loss: 1.970969 LR: 0.00001020 [15:56:07] Epoch: 1 Batch: 15643/20099 (77.83%) Loss: 2.154370 LR: 0.00001020 [15:56:09] Epoch: 1 Batch: 15644/20099 (77.83%) Loss: 2.030743 LR: 0.00001020 [15:56:10] Epoch: 1 Batch: 15645/20099 (77.84%) Loss: 1.763569 LR: 0.00001020 [15:56:12] Epoch: 1 Batch: 15646/20099 (77.84%) Loss: 1.837155 LR: 0.00001019 [15:56:14] Epoch: 1 Batch: 15647/20099 (77.85%) Loss: 2.001074 LR: 0.00001019 [15:56:16] Epoch: 1 Batch: 15648/20099 (77.85%) Loss: 1.983035 LR: 0.00001019 [15:56:18] Epoch: 1 Batch: 15649/20099 (77.86%) Loss: 1.869951 LR: 0.00001019 [15:56:20] Epoch: 1 Batch: 15650/20099 
(77.86%) Loss: 2.091889 LR: 0.00001019 [15:56:21] Epoch: 1 Batch: 15651/20099 (77.87%) Loss: 1.983964 LR: 0.00001019 [15:56:23] Epoch: 1 Batch: 15652/20099 (77.87%) Loss: 2.137774 LR: 0.00001019 [15:56:25] Epoch: 1 Batch: 15653/20099 (77.88%) Loss: 2.080971 LR: 0.00001018 [15:56:27] Epoch: 1 Batch: 15654/20099 (77.88%) Loss: 1.870296 LR: 0.00001018 [15:56:29] Epoch: 1 Batch: 15655/20099 (77.89%) Loss: 2.010511 LR: 0.00001018 [15:56:31] Epoch: 1 Batch: 15656/20099 (77.89%) Loss: 1.821626 LR: 0.00001018 [15:56:33] Epoch: 1 Batch: 15657/20099 (77.90%) Loss: 1.947908 LR: 0.00001018 [15:56:34] Epoch: 1 Batch: 15658/20099 (77.90%) Loss: 2.111537 LR: 0.00001018 [15:56:36] Epoch: 1 Batch: 15659/20099 (77.91%) Loss: 1.988392 LR: 0.00001018 [15:56:38] Epoch: 1 Batch: 15660/20099 (77.91%) Loss: 2.302016 LR: 0.00001017 [15:56:40] Epoch: 1 Batch: 15661/20099 (77.92%) Loss: 2.287156 LR: 0.00001017 [15:56:42] Epoch: 1 Batch: 15662/20099 (77.92%) Loss: 2.209000 LR: 0.00001017 [15:56:44] Epoch: 1 Batch: 15663/20099 (77.93%) Loss: 2.206934 LR: 0.00001017 [15:56:45] Epoch: 1 Batch: 15664/20099 (77.93%) Loss: 1.912124 LR: 0.00001017 [15:56:47] Epoch: 1 Batch: 15665/20099 (77.94%) Loss: 2.110021 LR: 0.00001017 [15:56:49] Epoch: 1 Batch: 15666/20099 (77.94%) Loss: 2.036299 LR: 0.00001017 [15:56:51] Epoch: 1 Batch: 15667/20099 (77.95%) Loss: 2.342015 LR: 0.00001015 [15:56:53] Epoch: 1 Batch: 15668/20099 (77.95%) Loss: 2.244155 LR: 0.00001015 [15:56:55] Epoch: 1 Batch: 15669/20099 (77.96%) Loss: 1.996850 LR: 0.00001015 [15:56:57] Epoch: 1 Batch: 15670/20099 (77.96%) Loss: 1.853235 LR: 0.00001015 [15:56:58] Epoch: 1 Batch: 15671/20099 (77.97%) Loss: 2.167412 LR: 0.00001015 [15:57:00] Epoch: 1 Batch: 15672/20099 (77.97%) Loss: 2.124855 LR: 0.00001015 [15:57:02] Epoch: 1 Batch: 15673/20099 (77.98%) Loss: 1.925758 LR: 0.00001015 [15:57:04] Epoch: 1 Batch: 15674/20099 (77.98%) Loss: 2.082839 LR: 0.00001014 [15:57:06] Epoch: 1 Batch: 15675/20099 (77.99%) Loss: 1.947602 LR: 0.00001014 [15:57:08] 
Epoch: 1 Batch: 15676/20099 (77.99%) Loss: 2.264728 LR: 0.00001014 [15:57:10] Epoch: 1 Batch: 15677/20099 (78.00%) Loss: 2.192770 LR: 0.00001014 [15:57:11] Epoch: 1 Batch: 15678/20099 (78.00%) Loss: 2.032751 LR: 0.00001014 [15:57:13] Epoch: 1 Batch: 15679/20099 (78.01%) Loss: 2.001391 LR: 0.00001014 [15:57:15] Epoch: 1 Batch: 15680/20099 (78.01%) Loss: 2.532889 LR: 0.00001014 [15:57:17] Epoch: 1 Batch: 15681/20099 (78.02%) Loss: 2.396230 LR: 0.00001013 [15:57:19] Epoch: 1 Batch: 15682/20099 (78.02%) Loss: 2.062712 LR: 0.00001013 [15:57:21] Epoch: 1 Batch: 15683/20099 (78.03%) Loss: 2.077476 LR: 0.00001013 [15:57:23] Epoch: 1 Batch: 15684/20099 (78.03%) Loss: 2.034358 LR: 0.00001013 [15:57:25] Epoch: 1 Batch: 15685/20099 (78.04%) Loss: 2.145914 LR: 0.00001013 [15:57:26] Epoch: 1 Batch: 15686/20099 (78.04%) Loss: 1.954157 LR: 0.00001013 [15:57:28] Epoch: 1 Batch: 15687/20099 (78.05%) Loss: 1.810053 LR: 0.00001013 [15:57:30] Epoch: 1 Batch: 15688/20099 (78.05%) Loss: 2.200597 LR: 0.00001012 [15:57:32] Epoch: 1 Batch: 15689/20099 (78.06%) Loss: 2.030104 LR: 0.00001012 [15:57:34] Epoch: 1 Batch: 15690/20099 (78.06%) Loss: 2.218868 LR: 0.00001012 [15:57:36] Epoch: 1 Batch: 15691/20099 (78.07%) Loss: 1.740746 LR: 0.00001012 [15:57:38] Epoch: 1 Batch: 15692/20099 (78.07%) Loss: 2.049696 LR: 0.00001012 [15:57:39] Epoch: 1 Batch: 15693/20099 (78.08%) Loss: 2.253983 LR: 0.00001012 [15:57:41] Epoch: 1 Batch: 15694/20099 (78.08%) Loss: 2.236936 LR: 0.00001012 [15:57:43] Epoch: 1 Batch: 15695/20099 (78.09%) Loss: 2.352349 LR: 0.00001010 [15:57:45] Epoch: 1 Batch: 15696/20099 (78.09%) Loss: 2.193680 LR: 0.00001010 [15:57:47] Epoch: 1 Batch: 15697/20099 (78.10%) Loss: 1.813413 LR: 0.00001010 [15:57:49] Epoch: 1 Batch: 15698/20099 (78.10%) Loss: 2.093519 LR: 0.00001010 [15:57:51] Epoch: 1 Batch: 15699/20099 (78.11%) Loss: 2.182641 LR: 0.00001010 [15:57:52] Epoch: 1 Batch: 15700/20099 (78.11%) Loss: 2.157641 LR: 0.00001010 [15:57:54] Epoch: 1 Batch: 15701/20099 (78.12%) Loss: 
2.059607 LR: 0.00001010 [15:57:56] Epoch: 1 Batch: 15702/20099 (78.12%) Loss: 1.827855 LR: 0.00001009 [15:57:58] Epoch: 1 Batch: 15703/20099 (78.13%) Loss: 2.143876 LR: 0.00001009 [15:58:00] Epoch: 1 Batch: 15704/20099 (78.13%) Loss: 1.906261 LR: 0.00001009 [15:58:02] Epoch: 1 Batch: 15705/20099 (78.14%) Loss: 1.935135 LR: 0.00001009 [15:58:04] Epoch: 1 Batch: 15706/20099 (78.14%) Loss: 2.119717 LR: 0.00001009 [15:58:05] Epoch: 1 Batch: 15707/20099 (78.15%) Loss: 2.268354 LR: 0.00001009 [15:58:07] Epoch: 1 Batch: 15708/20099 (78.15%) Loss: 2.158185 LR: 0.00001009 [15:58:09] Epoch: 1 Batch: 15709/20099 (78.16%) Loss: 2.276398 LR: 0.00001008 [15:58:11] Epoch: 1 Batch: 15710/20099 (78.16%) Loss: 2.106095 LR: 0.00001008 [15:58:13] Epoch: 1 Batch: 15711/20099 (78.17%) Loss: 1.940248 LR: 0.00001008 [15:58:15] Epoch: 1 Batch: 15712/20099 (78.17%) Loss: 1.900149 LR: 0.00001008 [15:58:16] Epoch: 1 Batch: 15713/20099 (78.18%) Loss: 1.855711 LR: 0.00001008 [15:58:18] Epoch: 1 Batch: 15714/20099 (78.18%) Loss: 2.253801 LR: 0.00001008 [15:58:20] Epoch: 1 Batch: 15715/20099 (78.19%) Loss: 2.010630 LR: 0.00001008 [15:58:22] Epoch: 1 Batch: 15716/20099 (78.19%) Loss: 2.166205 LR: 0.00001007 [15:58:24] Epoch: 1 Batch: 15717/20099 (78.20%) Loss: 2.149372 LR: 0.00001007 [15:58:26] Epoch: 1 Batch: 15718/20099 (78.20%) Loss: 2.240678 LR: 0.00001007 [15:58:28] Epoch: 1 Batch: 15719/20099 (78.21%) Loss: 2.367652 LR: 0.00001007 [15:58:30] Epoch: 1 Batch: 15720/20099 (78.21%) Loss: 2.237463 LR: 0.00001007 [15:58:31] Epoch: 1 Batch: 15721/20099 (78.22%) Loss: 2.124147 LR: 0.00001007 [15:58:33] Epoch: 1 Batch: 15722/20099 (78.22%) Loss: 2.136568 LR: 0.00001007 [15:58:35] Epoch: 1 Batch: 15723/20099 (78.23%) Loss: 2.024474 LR: 0.00001006 [15:58:37] Epoch: 1 Batch: 15724/20099 (78.23%) Loss: 1.914419 LR: 0.00001006 [15:58:39] Epoch: 1 Batch: 15725/20099 (78.24%) Loss: 1.746320 LR: 0.00001006 [15:58:41] Epoch: 1 Batch: 15726/20099 (78.24%) Loss: 2.078390 LR: 0.00001006 [15:58:43] Epoch: 1 
Batch: 15727/20099 (78.25%) Loss: 1.903078 LR: 0.00001006 [15:58:44] Epoch: 1 Batch: 15728/20099 (78.25%) Loss: 2.181024 LR: 0.00001006 [15:58:46] Epoch: 1 Batch: 15729/20099 (78.26%) Loss: 1.939851 LR: 0.00001006 [15:58:48] Epoch: 1 Batch: 15730/20099 (78.26%) Loss: 2.060353 LR: 0.00001004 [15:58:50] Epoch: 1 Batch: 15731/20099 (78.27%) Loss: 2.166804 LR: 0.00001004 [15:58:52] Epoch: 1 Batch: 15732/20099 (78.27%) Loss: 2.285968 LR: 0.00001004 [15:58:54] Epoch: 1 Batch: 15733/20099 (78.28%) Loss: 2.215719 LR: 0.00001004 [15:58:56] Epoch: 1 Batch: 15734/20099 (78.28%) Loss: 1.830772 LR: 0.00001004 [15:58:57] Epoch: 1 Batch: 15735/20099 (78.29%) Loss: 2.100947 LR: 0.00001004 [15:58:59] Epoch: 1 Batch: 15736/20099 (78.29%) Loss: 1.913176 LR: 0.00001004 [15:59:01] Epoch: 1 Batch: 15737/20099 (78.30%) Loss: 2.159969 LR: 0.00001003 [15:59:03] Epoch: 1 Batch: 15738/20099 (78.30%) Loss: 2.170730 LR: 0.00001003 [15:59:05] Epoch: 1 Batch: 15739/20099 (78.31%) Loss: 2.252765 LR: 0.00001003 [15:59:07] Epoch: 1 Batch: 15740/20099 (78.31%) Loss: 2.307674 LR: 0.00001003 [15:59:09] Epoch: 1 Batch: 15741/20099 (78.32%) Loss: 2.524321 LR: 0.00001003 [15:59:10] Epoch: 1 Batch: 15742/20099 (78.32%) Loss: 2.226506 LR: 0.00001003 [15:59:12] Epoch: 1 Batch: 15743/20099 (78.33%) Loss: 1.990787 LR: 0.00001003 [15:59:14] Epoch: 1 Batch: 15744/20099 (78.33%) Loss: 2.087411 LR: 0.00001002 [15:59:16] Epoch: 1 Batch: 15745/20099 (78.34%) Loss: 2.154553 LR: 0.00001002 [15:59:18] Epoch: 1 Batch: 15746/20099 (78.34%) Loss: 1.968263 LR: 0.00001002 [15:59:20] Epoch: 1 Batch: 15747/20099 (78.35%) Loss: 2.219302 LR: 0.00001002 [15:59:21] Epoch: 1 Batch: 15748/20099 (78.35%) Loss: 2.055678 LR: 0.00001002 [15:59:23] Epoch: 1 Batch: 15749/20099 (78.36%) Loss: 1.884024 LR: 0.00001002 [15:59:25] Epoch: 1 Batch: 15750/20099 (78.36%) Loss: 2.137870 LR: 0.00001002 [15:59:27] Epoch: 1 Batch: 15751/20099 (78.37%) Loss: 2.139441 LR: 0.00001001 [15:59:29] Epoch: 1 Batch: 15752/20099 (78.37%) Loss: 1.967782 LR: 
0.00001001 [15:59:31] Epoch: 1 Batch: 15753/20099 (78.38%) Loss: 2.152537 LR: 0.00001001 [15:59:33] Epoch: 1 Batch: 15754/20099 (78.38%) Loss: 2.031475 LR: 0.00001001 [15:59:34] Epoch: 1 Batch: 15755/20099 (78.39%) Loss: 2.232095 LR: 0.00001001 [15:59:36] Epoch: 1 Batch: 15756/20099 (78.39%) Loss: 2.274142 LR: 0.00001001 [15:59:38] Epoch: 1 Batch: 15757/20099 (78.40%) Loss: 2.126869 LR: 0.00001001 [15:59:40] Epoch: 1 Batch: 15758/20099 (78.40%) Loss: 2.427935 LR: 0.00001000 [15:59:42] Epoch: 1 Batch: 15759/20099 (78.41%) Loss: 2.234783 LR: 0.00001000 [15:59:44] Epoch: 1 Batch: 15760/20099 (78.41%) Loss: 2.258379 LR: 0.00001000 [15:59:46] Epoch: 1 Batch: 15761/20099 (78.42%) Loss: 2.019770 LR: 0.00001000 [15:59:47] Epoch: 1 Batch: 15762/20099 (78.42%) Loss: 1.767642 LR: 0.00001000 [15:59:49] Epoch: 1 Batch: 15763/20099 (78.43%) Loss: 2.335553 LR: 0.00001000 [15:59:51] Epoch: 1 Batch: 15764/20099 (78.43%) Loss: 1.960418 LR: 0.00001000 [15:59:53] Epoch: 1 Batch: 15765/20099 (78.44%) Loss: 2.246634 LR: 0.00000998 [15:59:55] Epoch: 1 Batch: 15766/20099 (78.44%) Loss: 1.892395 LR: 0.00000998 [15:59:57] Epoch: 1 Batch: 15767/20099 (78.45%) Loss: 2.049098 LR: 0.00000998 [15:59:58] Epoch: 1 Batch: 15768/20099 (78.45%) Loss: 2.130647 LR: 0.00000998 [16:00:00] Epoch: 1 Batch: 15769/20099 (78.46%) Loss: 2.180283 LR: 0.00000998 [16:00:02] Epoch: 1 Batch: 15770/20099 (78.46%) Loss: 2.019328 LR: 0.00000998 [16:00:04] Epoch: 1 Batch: 15771/20099 (78.47%) Loss: 2.058983 LR: 0.00000998 [16:00:06] Epoch: 1 Batch: 15772/20099 (78.47%) Loss: 2.042424 LR: 0.00000997 [16:00:08] Epoch: 1 Batch: 15773/20099 (78.48%) Loss: 1.757224 LR: 0.00000997 [16:00:10] Epoch: 1 Batch: 15774/20099 (78.48%) Loss: 2.053060 LR: 0.00000997 [16:00:11] Epoch: 1 Batch: 15775/20099 (78.49%) Loss: 2.046332 LR: 0.00000997 [16:00:13] Epoch: 1 Batch: 15776/20099 (78.49%) Loss: 2.144575 LR: 0.00000997 [16:00:15] Epoch: 1 Batch: 15777/20099 (78.50%) Loss: 2.469840 LR: 0.00000997 [16:00:17] Epoch: 1 Batch: 15778/20099 
(78.50%) Loss: 2.391606 LR: 0.00000997 [16:00:19] Epoch: 1 Batch: 15779/20099 (78.51%) Loss: 2.216549 LR: 0.00000996 [16:00:21] Epoch: 1 Batch: 15780/20099 (78.51%) Loss: 2.280471 LR: 0.00000996 [16:00:23] Epoch: 1 Batch: 15781/20099 (78.52%) Loss: 1.925820 LR: 0.00000996 [16:00:24] Epoch: 1 Batch: 15782/20099 (78.52%) Loss: 2.251534 LR: 0.00000996 [16:00:26] Epoch: 1 Batch: 15783/20099 (78.53%) Loss: 2.329866 LR: 0.00000996 [16:00:28] Epoch: 1 Batch: 15784/20099 (78.53%) Loss: 2.107868 LR: 0.00000996 [16:00:30] Epoch: 1 Batch: 15785/20099 (78.54%) Loss: 2.016022 LR: 0.00000996 [16:00:32] Epoch: 1 Batch: 15786/20099 (78.54%) Loss: 2.247635 LR: 0.00000995 [16:00:34] Epoch: 1 Batch: 15787/20099 (78.55%) Loss: 2.103354 LR: 0.00000995 [16:00:35] Epoch: 1 Batch: 15788/20099 (78.55%) Loss: 1.929298 LR: 0.00000995 [16:00:37] Epoch: 1 Batch: 15789/20099 (78.56%) Loss: 1.694338 LR: 0.00000995 [16:00:39] Epoch: 1 Batch: 15790/20099 (78.56%) Loss: 1.943704 LR: 0.00000995 [16:00:41] Epoch: 1 Batch: 15791/20099 (78.57%) Loss: 2.047667 LR: 0.00000995 [16:00:43] Epoch: 1 Batch: 15792/20099 (78.57%) Loss: 2.258962 LR: 0.00000995 [16:00:45] Epoch: 1 Batch: 15793/20099 (78.58%) Loss: 2.104939 LR: 0.00000994 [16:00:47] Epoch: 1 Batch: 15794/20099 (78.58%) Loss: 2.222811 LR: 0.00000994 [16:00:48] Epoch: 1 Batch: 15795/20099 (78.59%) Loss: 2.096844 LR: 0.00000994 [16:00:50] Epoch: 1 Batch: 15796/20099 (78.59%) Loss: 2.251136 LR: 0.00000994 [16:00:52] Epoch: 1 Batch: 15797/20099 (78.60%) Loss: 2.254202 LR: 0.00000994 [16:00:54] Epoch: 1 Batch: 15798/20099 (78.60%) Loss: 1.743074 LR: 0.00000994 [16:00:56] Epoch: 1 Batch: 15799/20099 (78.61%) Loss: 1.841773 LR: 0.00000994 [16:01:01] >> Cleaned up old temp checkpoint: epoch1_step13800 [16:01:01] >> Temp checkpoint saved: epoch1_step15800, size: 0.1693 GB [16:01:01] Epoch: 1 Batch: 15800/20099 (78.61%) Loss: 2.084735 LR: 0.00000992 [16:01:03] Epoch: 1 Batch: 15801/20099 (78.62%) Loss: 2.225002 LR: 0.00000992 [16:01:05] Epoch: 1 Batch: 
15802/20099 (78.62%) Loss: 2.211079 LR: 0.00000992 [16:01:07] Epoch: 1 Batch: 15803/20099 (78.63%) Loss: 2.129584 LR: 0.00000992 [16:01:09] Epoch: 1 Batch: 15804/20099 (78.63%) Loss: 2.014624 LR: 0.00000992 [16:01:10] Epoch: 1 Batch: 15805/20099 (78.64%) Loss: 1.876759 LR: 0.00000992 [16:01:12] Epoch: 1 Batch: 15806/20099 (78.64%) Loss: 2.081866 LR: 0.00000992 [16:01:14] Epoch: 1 Batch: 15807/20099 (78.65%) Loss: 2.182796 LR: 0.00000991 [16:01:16] Epoch: 1 Batch: 15808/20099 (78.65%) Loss: 2.256665 LR: 0.00000991 [16:01:18] Epoch: 1 Batch: 15809/20099 (78.66%) Loss: 2.227414 LR: 0.00000991 [16:01:20] Epoch: 1 Batch: 15810/20099 (78.66%) Loss: 2.129448 LR: 0.00000991 [16:01:22] Epoch: 1 Batch: 15811/20099 (78.67%) Loss: 2.371997 LR: 0.00000991 [16:01:23] Epoch: 1 Batch: 15812/20099 (78.67%) Loss: 2.171572 LR: 0.00000991 [16:01:25] Epoch: 1 Batch: 15813/20099 (78.68%) Loss: 1.746992 LR: 0.00000991 [16:01:27] Epoch: 1 Batch: 15814/20099 (78.68%) Loss: 2.139758 LR: 0.00000990 [16:01:29] Epoch: 1 Batch: 15815/20099 (78.69%) Loss: 1.975576 LR: 0.00000990 [16:01:31] Epoch: 1 Batch: 15816/20099 (78.69%) Loss: 2.226619 LR: 0.00000990 [16:01:33] Epoch: 1 Batch: 15817/20099 (78.70%) Loss: 2.162621 LR: 0.00000990 [16:01:35] Epoch: 1 Batch: 15818/20099 (78.70%) Loss: 2.187351 LR: 0.00000990 [16:01:37] Epoch: 1 Batch: 15819/20099 (78.71%) Loss: 2.131095 LR: 0.00000990 [16:01:38] Epoch: 1 Batch: 15820/20099 (78.71%) Loss: 2.228522 LR: 0.00000990 [16:01:40] Epoch: 1 Batch: 15821/20099 (78.72%) Loss: 2.126752 LR: 0.00000989 [16:01:42] Epoch: 1 Batch: 15822/20099 (78.72%) Loss: 1.975162 LR: 0.00000989 [16:01:44] Epoch: 1 Batch: 15823/20099 (78.73%) Loss: 1.967517 LR: 0.00000989 [16:01:46] Epoch: 1 Batch: 15824/20099 (78.73%) Loss: 2.028698 LR: 0.00000989 [16:01:48] Epoch: 1 Batch: 15825/20099 (78.74%) Loss: 2.126650 LR: 0.00000989 [16:01:50] Epoch: 1 Batch: 15826/20099 (78.74%) Loss: 1.743927 LR: 0.00000989 [16:01:51] Epoch: 1 Batch: 15827/20099 (78.75%) Loss: 1.986560 LR: 
0.00000989 [16:01:53] Epoch: 1 Batch: 15828/20099 (78.75%) Loss: 1.899283 LR: 0.00000988 [16:01:55] Epoch: 1 Batch: 15829/20099 (78.76%) Loss: 1.950975 LR: 0.00000988 [16:01:57] Epoch: 1 Batch: 15830/20099 (78.76%) Loss: 2.043473 LR: 0.00000988 [16:01:59] Epoch: 1 Batch: 15831/20099 (78.77%) Loss: 2.089319 LR: 0.00000988 [16:02:01] Epoch: 1 Batch: 15832/20099 (78.77%) Loss: 2.080405 LR: 0.00000988 [16:02:03] Epoch: 1 Batch: 15833/20099 (78.78%) Loss: 2.248727 LR: 0.00000988 [16:02:04] Epoch: 1 Batch: 15834/20099 (78.78%) Loss: 2.099190 LR: 0.00000988 [16:02:06] Epoch: 1 Batch: 15835/20099 (78.79%) Loss: 1.972268 LR: 0.00000986 [16:02:08] Epoch: 1 Batch: 15836/20099 (78.79%) Loss: 2.161817 LR: 0.00000986 [16:02:10] Epoch: 1 Batch: 15837/20099 (78.79%) Loss: 1.865821 LR: 0.00000986 [16:02:12] Epoch: 1 Batch: 15838/20099 (78.80%) Loss: 2.130437 LR: 0.00000986 [16:02:14] Epoch: 1 Batch: 15839/20099 (78.80%) Loss: 2.028706 LR: 0.00000986 [16:02:15] Epoch: 1 Batch: 15840/20099 (78.81%) Loss: 1.881472 LR: 0.00000986 [16:02:17] Epoch: 1 Batch: 15841/20099 (78.81%) Loss: 2.373928 LR: 0.00000986 [16:02:19] Epoch: 1 Batch: 15842/20099 (78.82%) Loss: 2.217161 LR: 0.00000985 [16:02:21] Epoch: 1 Batch: 15843/20099 (78.82%) Loss: 2.125276 LR: 0.00000985 [16:02:23] Epoch: 1 Batch: 15844/20099 (78.83%) Loss: 2.222782 LR: 0.00000985 [16:02:25] Epoch: 1 Batch: 15845/20099 (78.83%) Loss: 2.038830 LR: 0.00000985 [16:02:26] Epoch: 1 Batch: 15846/20099 (78.84%) Loss: 2.082449 LR: 0.00000985 [16:02:28] Epoch: 1 Batch: 15847/20099 (78.84%) Loss: 2.216962 LR: 0.00000985 [16:02:30] Epoch: 1 Batch: 15848/20099 (78.85%) Loss: 2.353801 LR: 0.00000985 [16:02:32] Epoch: 1 Batch: 15849/20099 (78.85%) Loss: 2.253825 LR: 0.00000984 [16:02:34] Epoch: 1 Batch: 15850/20099 (78.86%) Loss: 2.062839 LR: 0.00000984 [16:02:36] Epoch: 1 Batch: 15851/20099 (78.86%) Loss: 1.963298 LR: 0.00000984 [16:02:38] Epoch: 1 Batch: 15852/20099 (78.87%) Loss: 2.176297 LR: 0.00000984 [16:02:39] Epoch: 1 Batch: 15853/20099 
(78.87%) Loss: 2.258205 LR: 0.00000984 [16:02:41] Epoch: 1 Batch: 15854/20099 (78.88%) Loss: 2.276224 LR: 0.00000984 [16:02:43] Epoch: 1 Batch: 15855/20099 (78.88%) Loss: 2.102357 LR: 0.00000984 [16:02:45] Epoch: 1 Batch: 15856/20099 (78.89%) Loss: 2.392173 LR: 0.00000983 [16:02:47] Epoch: 1 Batch: 15857/20099 (78.89%) Loss: 1.991498 LR: 0.00000983 [16:02:49] Epoch: 1 Batch: 15858/20099 (78.90%) Loss: 1.934220 LR: 0.00000983 [16:02:51] Epoch: 1 Batch: 15859/20099 (78.90%) Loss: 1.951105 LR: 0.00000983 [16:02:52] Epoch: 1 Batch: 15860/20099 (78.91%) Loss: 2.150551 LR: 0.00000983 [16:02:54] Epoch: 1 Batch: 15861/20099 (78.91%) Loss: 2.121039 LR: 0.00000983 [16:02:56] Epoch: 1 Batch: 15862/20099 (78.92%) Loss: 1.890010 LR: 0.00000983 [16:02:58] Epoch: 1 Batch: 15863/20099 (78.92%) Loss: 2.056733 LR: 0.00000982 [16:03:00] Epoch: 1 Batch: 15864/20099 (78.93%) Loss: 2.076474 LR: 0.00000982 [16:03:02] Epoch: 1 Batch: 15865/20099 (78.93%) Loss: 2.092302 LR: 0.00000982 [16:03:04] Epoch: 1 Batch: 15866/20099 (78.94%) Loss: 2.051674 LR: 0.00000982 [16:03:06] Epoch: 1 Batch: 15867/20099 (78.94%) Loss: 2.130412 LR: 0.00000982 [16:03:07] Epoch: 1 Batch: 15868/20099 (78.95%) Loss: 2.017164 LR: 0.00000982 [16:03:09] Epoch: 1 Batch: 15869/20099 (78.95%) Loss: 1.874221 LR: 0.00000982 [16:03:11] Epoch: 1 Batch: 15870/20099 (78.96%) Loss: 2.037051 LR: 0.00000980 [16:03:13] Epoch: 1 Batch: 15871/20099 (78.96%) Loss: 1.904336 LR: 0.00000980 [16:03:15] Epoch: 1 Batch: 15872/20099 (78.97%) Loss: 2.045178 LR: 0.00000980 [16:03:17] Epoch: 1 Batch: 15873/20099 (78.97%) Loss: 2.002966 LR: 0.00000980 [16:03:19] Epoch: 1 Batch: 15874/20099 (78.98%) Loss: 1.994064 LR: 0.00000980 [16:03:20] Epoch: 1 Batch: 15875/20099 (78.98%) Loss: 2.104956 LR: 0.00000980 [16:03:22] Epoch: 1 Batch: 15876/20099 (78.99%) Loss: 2.284694 LR: 0.00000980 [16:03:24] Epoch: 1 Batch: 15877/20099 (78.99%) Loss: 2.190358 LR: 0.00000979 [16:03:26] Epoch: 1 Batch: 15878/20099 (79.00%) Loss: 2.130193 LR: 0.00000979 [16:03:28] 
Epoch: 1 Batch: 15879/20099 (79.00%) Loss: 2.105683 LR: 0.00000979 [16:03:30] Epoch: 1 Batch: 15880/20099 (79.01%) Loss: 2.223555 LR: 0.00000979 [16:03:31] Epoch: 1 Batch: 15881/20099 (79.01%) Loss: 2.162377 LR: 0.00000979 [16:03:33] Epoch: 1 Batch: 15882/20099 (79.02%) Loss: 2.021289 LR: 0.00000979 [16:03:35] Epoch: 1 Batch: 15883/20099 (79.02%) Loss: 1.930312 LR: 0.00000979 [16:03:37] Epoch: 1 Batch: 15884/20099 (79.03%) Loss: 1.863414 LR: 0.00000978 [16:03:39] Epoch: 1 Batch: 15885/20099 (79.03%) Loss: 2.243502 LR: 0.00000978 [16:03:41] Epoch: 1 Batch: 15886/20099 (79.04%) Loss: 2.284492 LR: 0.00000978 [16:03:43] Epoch: 1 Batch: 15887/20099 (79.04%) Loss: 2.088390 LR: 0.00000978 [16:03:44] Epoch: 1 Batch: 15888/20099 (79.05%) Loss: 1.904705 LR: 0.00000978 [16:03:46] Epoch: 1 Batch: 15889/20099 (79.05%) Loss: 1.891292 LR: 0.00000978 [16:03:48] Epoch: 1 Batch: 15890/20099 (79.06%) Loss: 2.113130 LR: 0.00000978 [16:03:50] Epoch: 1 Batch: 15891/20099 (79.06%) Loss: 1.989501 LR: 0.00000977 [16:03:52] Epoch: 1 Batch: 15892/20099 (79.07%) Loss: 2.019909 LR: 0.00000977 [16:03:54] Epoch: 1 Batch: 15893/20099 (79.07%) Loss: 1.938616 LR: 0.00000977 [16:03:55] Epoch: 1 Batch: 15894/20099 (79.08%) Loss: 2.098261 LR: 0.00000977 [16:03:57] Epoch: 1 Batch: 15895/20099 (79.08%) Loss: 2.125155 LR: 0.00000977 [16:03:59] Epoch: 1 Batch: 15896/20099 (79.09%) Loss: 2.150758 LR: 0.00000977 [16:04:01] Epoch: 1 Batch: 15897/20099 (79.09%) Loss: 2.092648 LR: 0.00000977 [16:04:03] Epoch: 1 Batch: 15898/20099 (79.10%) Loss: 2.090314 LR: 0.00000976 [16:04:05] Epoch: 1 Batch: 15899/20099 (79.10%) Loss: 2.051914 LR: 0.00000976 [16:04:07] Epoch: 1 Batch: 15900/20099 (79.11%) Loss: 2.247695 LR: 0.00000976 [16:04:08] Epoch: 1 Batch: 15901/20099 (79.11%) Loss: 2.150155 LR: 0.00000976 [16:04:10] Epoch: 1 Batch: 15902/20099 (79.12%) Loss: 2.101501 LR: 0.00000976 [16:04:12] Epoch: 1 Batch: 15903/20099 (79.12%) Loss: 2.013657 LR: 0.00000976 [16:04:14] Epoch: 1 Batch: 15904/20099 (79.13%) Loss: 
2.161249 LR: 0.00000976 [16:04:16] Epoch: 1 Batch: 15905/20099 (79.13%) Loss: 2.048043 LR: 0.00000974 [16:04:18] Epoch: 1 Batch: 15906/20099 (79.14%) Loss: 2.328536 LR: 0.00000974 [16:04:20] Epoch: 1 Batch: 15907/20099 (79.14%) Loss: 2.120104 LR: 0.00000974 [16:04:21] Epoch: 1 Batch: 15908/20099 (79.15%) Loss: 2.511087 LR: 0.00000974 [16:04:23] Epoch: 1 Batch: 15909/20099 (79.15%) Loss: 1.976946 LR: 0.00000974 [16:04:25] Epoch: 1 Batch: 15910/20099 (79.16%) Loss: 2.237674 LR: 0.00000974 [16:04:27] Epoch: 1 Batch: 15911/20099 (79.16%) Loss: 2.052062 LR: 0.00000974 [16:04:29] Epoch: 1 Batch: 15912/20099 (79.17%) Loss: 2.073648 LR: 0.00000973 [16:04:31] Epoch: 1 Batch: 15913/20099 (79.17%) Loss: 2.119620 LR: 0.00000973 [16:04:32] Epoch: 1 Batch: 15914/20099 (79.18%) Loss: 2.229075 LR: 0.00000973 [16:04:34] Epoch: 1 Batch: 15915/20099 (79.18%) Loss: 2.061507 LR: 0.00000973 [16:04:36] Epoch: 1 Batch: 15916/20099 (79.19%) Loss: 1.939970 LR: 0.00000973 [16:04:38] Epoch: 1 Batch: 15917/20099 (79.19%) Loss: 1.960644 LR: 0.00000973 [16:04:40] Epoch: 1 Batch: 15918/20099 (79.20%) Loss: 1.991987 LR: 0.00000973 [16:04:42] Epoch: 1 Batch: 15919/20099 (79.20%) Loss: 1.927425 LR: 0.00000972 [16:04:44] Epoch: 1 Batch: 15920/20099 (79.21%) Loss: 1.990829 LR: 0.00000972 [16:04:45] Epoch: 1 Batch: 15921/20099 (79.21%) Loss: 2.029898 LR: 0.00000972 [16:04:47] Epoch: 1 Batch: 15922/20099 (79.22%) Loss: 2.063139 LR: 0.00000972 [16:04:49] Epoch: 1 Batch: 15923/20099 (79.22%) Loss: 2.181451 LR: 0.00000972 [16:04:51] Epoch: 1 Batch: 15924/20099 (79.23%) Loss: 2.146837 LR: 0.00000972 [16:04:53] Epoch: 1 Batch: 15925/20099 (79.23%) Loss: 2.129570 LR: 0.00000972 [16:04:55] Epoch: 1 Batch: 15926/20099 (79.24%) Loss: 2.310096 LR: 0.00000971 [16:04:57] Epoch: 1 Batch: 15927/20099 (79.24%) Loss: 2.133451 LR: 0.00000971 [16:04:58] Epoch: 1 Batch: 15928/20099 (79.25%) Loss: 2.060733 LR: 0.00000971 [16:05:00] Epoch: 1 Batch: 15929/20099 (79.25%) Loss: 2.009328 LR: 0.00000971 [16:05:02] Epoch: 1 
Batch: 15930/20099 (79.26%) Loss: 2.264488 LR: 0.00000971 [16:05:04] Epoch: 1 Batch: 15931/20099 (79.26%) Loss: 1.887088 LR: 0.00000971 [16:05:06] Epoch: 1 Batch: 15932/20099 (79.27%) Loss: 2.002310 LR: 0.00000971 [16:05:08] Epoch: 1 Batch: 15933/20099 (79.27%) Loss: 1.818721 LR: 0.00000970 [16:05:10] Epoch: 1 Batch: 15934/20099 (79.28%) Loss: 2.426514 LR: 0.00000970 [16:05:11] Epoch: 1 Batch: 15935/20099 (79.28%) Loss: 2.045333 LR: 0.00000970 [16:05:13] Epoch: 1 Batch: 15936/20099 (79.29%) Loss: 1.829441 LR: 0.00000970 [16:05:15] Epoch: 1 Batch: 15937/20099 (79.29%) Loss: 2.552944 LR: 0.00000970 [16:05:17] Epoch: 1 Batch: 15938/20099 (79.30%) Loss: 2.170409 LR: 0.00000970 [16:05:19] Epoch: 1 Batch: 15939/20099 (79.30%) Loss: 1.718681 LR: 0.00000970 [16:05:21] Epoch: 1 Batch: 15940/20099 (79.31%) Loss: 1.929232 LR: 0.00000969 [16:05:23] Epoch: 1 Batch: 15941/20099 (79.31%) Loss: 2.056788 LR: 0.00000969 [16:05:24] Epoch: 1 Batch: 15942/20099 (79.32%) Loss: 1.986933 LR: 0.00000969 [16:05:26] Epoch: 1 Batch: 15943/20099 (79.32%) Loss: 1.996920 LR: 0.00000969 [16:05:28] Epoch: 1 Batch: 15944/20099 (79.33%) Loss: 2.125483 LR: 0.00000969 [16:05:30] Epoch: 1 Batch: 15945/20099 (79.33%) Loss: 1.982537 LR: 0.00000969 [16:05:32] Epoch: 1 Batch: 15946/20099 (79.34%) Loss: 1.807639 LR: 0.00000969 [16:05:34] Epoch: 1 Batch: 15947/20099 (79.34%) Loss: 1.806106 LR: 0.00000967 [16:05:36] Epoch: 1 Batch: 15948/20099 (79.35%) Loss: 2.000024 LR: 0.00000967 [16:05:37] Epoch: 1 Batch: 15949/20099 (79.35%) Loss: 2.316201 LR: 0.00000967 [16:05:39] Epoch: 1 Batch: 15950/20099 (79.36%) Loss: 2.027521 LR: 0.00000967 [16:05:41] Epoch: 1 Batch: 15951/20099 (79.36%) Loss: 1.986259 LR: 0.00000967 [16:05:43] Epoch: 1 Batch: 15952/20099 (79.37%) Loss: 2.517145 LR: 0.00000967 [16:05:45] Epoch: 1 Batch: 15953/20099 (79.37%) Loss: 2.251321 LR: 0.00000967 [16:05:47] Epoch: 1 Batch: 15954/20099 (79.38%) Loss: 1.892114 LR: 0.00000966 [16:05:49] Epoch: 1 Batch: 15955/20099 (79.38%) Loss: 1.994374 LR: 
0.00000966 [16:05:50] Epoch: 1 Batch: 15956/20099 (79.39%) Loss: 1.884433 LR: 0.00000966 [16:05:52] Epoch: 1 Batch: 15957/20099 (79.39%) Loss: 1.970425 LR: 0.00000966 [16:05:54] Epoch: 1 Batch: 15958/20099 (79.40%) Loss: 1.951691 LR: 0.00000966 [16:05:56] Epoch: 1 Batch: 15959/20099 (79.40%) Loss: 1.963602 LR: 0.00000966 [16:05:58] Epoch: 1 Batch: 15960/20099 (79.41%) Loss: 1.896909 LR: 0.00000966 [16:06:00] Epoch: 1 Batch: 15961/20099 (79.41%) Loss: 1.966817 LR: 0.00000965 [16:06:02] Epoch: 1 Batch: 15962/20099 (79.42%) Loss: 2.094138 LR: 0.00000965 [16:06:03] Epoch: 1 Batch: 15963/20099 (79.42%) Loss: 2.034080 LR: 0.00000965 [16:06:05] Epoch: 1 Batch: 15964/20099 (79.43%) Loss: 2.104521 LR: 0.00000965 [16:06:07] Epoch: 1 Batch: 15965/20099 (79.43%) Loss: 1.782545 LR: 0.00000965 [16:06:09] Epoch: 1 Batch: 15966/20099 (79.44%) Loss: 2.009510 LR: 0.00000965 [16:06:11] Epoch: 1 Batch: 15967/20099 (79.44%) Loss: 1.649070 LR: 0.00000965 [16:06:13] Epoch: 1 Batch: 15968/20099 (79.45%) Loss: 2.149815 LR: 0.00000964 [16:06:15] Epoch: 1 Batch: 15969/20099 (79.45%) Loss: 2.035090 LR: 0.00000964 [16:06:16] Epoch: 1 Batch: 15970/20099 (79.46%) Loss: 2.236294 LR: 0.00000964 [16:06:18] Epoch: 1 Batch: 15971/20099 (79.46%) Loss: 2.275693 LR: 0.00000964 [16:06:20] Epoch: 1 Batch: 15972/20099 (79.47%) Loss: 2.172979 LR: 0.00000964 [16:06:22] Epoch: 1 Batch: 15973/20099 (79.47%) Loss: 2.235153 LR: 0.00000964 [16:06:24] Epoch: 1 Batch: 15974/20099 (79.48%) Loss: 1.606601 LR: 0.00000964 [16:06:26] Epoch: 1 Batch: 15975/20099 (79.48%) Loss: 2.061748 LR: 0.00000963 [16:06:28] Epoch: 1 Batch: 15976/20099 (79.49%) Loss: 1.980593 LR: 0.00000963 [16:06:29] Epoch: 1 Batch: 15977/20099 (79.49%) Loss: 1.970255 LR: 0.00000963 [16:06:31] Epoch: 1 Batch: 15978/20099 (79.50%) Loss: 2.027783 LR: 0.00000963 [16:06:33] Epoch: 1 Batch: 15979/20099 (79.50%) Loss: 1.925441 LR: 0.00000963 [16:06:35] Epoch: 1 Batch: 15980/20099 (79.51%) Loss: 2.073586 LR: 0.00000963 [16:06:37] Epoch: 1 Batch: 15981/20099 
(79.51%) Loss: 2.237588 LR: 0.00000963 [16:06:39] Epoch: 1 Batch: 15982/20099 (79.52%) Loss: 2.538350 LR: 0.00000962 [16:06:41] Epoch: 1 Batch: 15983/20099 (79.52%) Loss: 2.069956 LR: 0.00000962 [16:06:42] Epoch: 1 Batch: 15984/20099 (79.53%) Loss: 2.007016 LR: 0.00000962 [16:06:44] Epoch: 1 Batch: 15985/20099 (79.53%) Loss: 2.342731 LR: 0.00000962 [16:06:46] Epoch: 1 Batch: 15986/20099 (79.54%) Loss: 2.064733 LR: 0.00000962 [16:06:48] Epoch: 1 Batch: 15987/20099 (79.54%) Loss: 1.825910 LR: 0.00000962 [16:06:50] Epoch: 1 Batch: 15988/20099 (79.55%) Loss: 2.457549 LR: 0.00000962 [16:06:52] Epoch: 1 Batch: 15989/20099 (79.55%) Loss: 1.984759 LR: 0.00000960 [16:06:54] Epoch: 1 Batch: 15990/20099 (79.56%) Loss: 2.153504 LR: 0.00000960 [16:06:55] Epoch: 1 Batch: 15991/20099 (79.56%) Loss: 2.168320 LR: 0.00000960 [16:06:57] Epoch: 1 Batch: 15992/20099 (79.57%) Loss: 2.223712 LR: 0.00000960 [16:06:59] Epoch: 1 Batch: 15993/20099 (79.57%) Loss: 1.968870 LR: 0.00000960 [16:07:01] Epoch: 1 Batch: 15994/20099 (79.58%) Loss: 2.223657 LR: 0.00000960 [16:07:03] Epoch: 1 Batch: 15995/20099 (79.58%) Loss: 1.992247 LR: 0.00000960 [16:07:05] Epoch: 1 Batch: 15996/20099 (79.59%) Loss: 2.353453 LR: 0.00000959 [16:07:06] Epoch: 1 Batch: 15997/20099 (79.59%) Loss: 1.974596 LR: 0.00000959 [16:07:08] Epoch: 1 Batch: 15998/20099 (79.60%) Loss: 2.178932 LR: 0.00000959 [16:07:10] Epoch: 1 Batch: 15999/20099 (79.60%) Loss: 1.863865 LR: 0.00000959 [16:07:12] >> Evaluating batch 0 [16:07:13] >> Evaluating batch 1 [16:07:14] >> Evaluating batch 2 [16:07:15] >> Evaluating batch 3 [16:07:16] >> Evaluating batch 4 [16:07:17] >> Evaluating batch 5 [16:07:18] >> Evaluating batch 6 [16:07:20] >> Evaluating batch 7 [16:07:21] >> Evaluating batch 8 [16:07:22] >> Evaluating batch 9 [16:07:23] >> Evaluating batch 10 [16:07:24] >> Evaluating batch 11 [16:07:25] >> Evaluating batch 12 [16:07:26] >> Evaluating batch 13 [16:07:27] >> Evaluating batch 14 [16:07:28] >> Evaluating batch 15 [16:07:29] >> 
Evaluating batch 16 [16:07:29] Epoch: 1 Step: 16000/20099 Evaluation: [16:07:29] Avg Loss Since Last Eval: 2.0832 Val Loss: 2.1489 Validation loss delta: -0.0015 Perplexity: 8.5754 LR: 0.00000959 [16:07:33] >> Cleaned up old temp checkpoint: epoch1_step14000 [16:07:33] >> Temp checkpoint saved: epoch1_step16000, size: 0.1693 GB [16:07:36] >> Checkpoint saved: epoch1_step16000, size: 0.1693 GB [16:07:36] Epoch: 1 Batch: 16000/20099 (79.61%) Loss: 2.333603 LR: 0.00000959 [16:07:38] Epoch: 1 Batch: 16001/20099 (79.61%) Loss: 2.075922 LR: 0.00000959 [16:07:40] Epoch: 1 Batch: 16002/20099 (79.62%) Loss: 2.095192 LR: 0.00000959 [16:07:42] Epoch: 1 Batch: 16003/20099 (79.62%) Loss: 2.162087 LR: 0.00000958 [16:07:44] Epoch: 1 Batch: 16004/20099 (79.63%) Loss: 1.783127 LR: 0.00000958 [16:07:45] Epoch: 1 Batch: 16005/20099 (79.63%) Loss: 2.001313 LR: 0.00000958 [16:07:47] Epoch: 1 Batch: 16006/20099 (79.64%) Loss: 1.945339 LR: 0.00000958 [16:07:49] Epoch: 1 Batch: 16007/20099 (79.64%) Loss: 1.991182 LR: 0.00000958 [16:07:51] Epoch: 1 Batch: 16008/20099 (79.65%) Loss: 2.114540 LR: 0.00000958 [16:07:53] Epoch: 1 Batch: 16009/20099 (79.65%) Loss: 2.072911 LR: 0.00000958 [16:07:55] Epoch: 1 Batch: 16010/20099 (79.66%) Loss: 2.069255 LR: 0.00000957 [16:07:56] Epoch: 1 Batch: 16011/20099 (79.66%) Loss: 2.141101 LR: 0.00000957 [16:07:58] Epoch: 1 Batch: 16012/20099 (79.67%) Loss: 2.186353 LR: 0.00000957 [16:08:00] Epoch: 1 Batch: 16013/20099 (79.67%) Loss: 2.107273 LR: 0.00000957 [16:08:02] Epoch: 1 Batch: 16014/20099 (79.68%) Loss: 1.867392 LR: 0.00000957 [16:08:04] Epoch: 1 Batch: 16015/20099 (79.68%) Loss: 1.975651 LR: 0.00000957 [16:08:06] Epoch: 1 Batch: 16016/20099 (79.69%) Loss: 1.994370 LR: 0.00000957 [16:08:08] Epoch: 1 Batch: 16017/20099 (79.69%) Loss: 1.718414 LR: 0.00000956 [16:08:09] Epoch: 1 Batch: 16018/20099 (79.70%) Loss: 1.784824 LR: 0.00000956 [16:08:11] Epoch: 1 Batch: 16019/20099 (79.70%) Loss: 2.071734 LR: 0.00000956 [16:08:13] Epoch: 1 Batch: 16020/20099 
(79.71%) Loss: 2.116893 LR: 0.00000956 [16:08:15] Epoch: 1 Batch: 16021/20099 (79.71%) Loss: 2.129067 LR: 0.00000956 [16:08:17] Epoch: 1 Batch: 16022/20099 (79.72%) Loss: 1.583279 LR: 0.00000956 [16:08:19] Epoch: 1 Batch: 16023/20099 (79.72%) Loss: 2.021614 LR: 0.00000956 [16:08:21] Epoch: 1 Batch: 16024/20099 (79.73%) Loss: 2.063846 LR: 0.00000955 [16:08:22] Epoch: 1 Batch: 16025/20099 (79.73%) Loss: 2.326497 LR: 0.00000955 [16:08:24] Epoch: 1 Batch: 16026/20099 (79.74%) Loss: 2.180389 LR: 0.00000955 [16:08:26] Epoch: 1 Batch: 16027/20099 (79.74%) Loss: 2.197096 LR: 0.00000955 [16:08:28] Epoch: 1 Batch: 16028/20099 (79.75%) Loss: 2.300980 LR: 0.00000955 [16:08:30] Epoch: 1 Batch: 16029/20099 (79.75%) Loss: 1.945229 LR: 0.00000955 [16:08:32] Epoch: 1 Batch: 16030/20099 (79.76%) Loss: 2.389047 LR: 0.00000955 [16:08:34] Epoch: 1 Batch: 16031/20099 (79.76%) Loss: 2.038697 LR: 0.00000953 [16:08:35] Epoch: 1 Batch: 16032/20099 (79.77%) Loss: 2.304904 LR: 0.00000953 [16:08:37] Epoch: 1 Batch: 16033/20099 (79.77%) Loss: 1.943884 LR: 0.00000953 [16:08:39] Epoch: 1 Batch: 16034/20099 (79.78%) Loss: 2.231856 LR: 0.00000953 [16:08:41] Epoch: 1 Batch: 16035/20099 (79.78%) Loss: 2.062818 LR: 0.00000953 [16:08:43] Epoch: 1 Batch: 16036/20099 (79.79%) Loss: 2.463088 LR: 0.00000953 [16:08:45] Epoch: 1 Batch: 16037/20099 (79.79%) Loss: 1.882139 LR: 0.00000953 [16:08:47] Epoch: 1 Batch: 16038/20099 (79.80%) Loss: 2.138044 LR: 0.00000952 [16:08:48] Epoch: 1 Batch: 16039/20099 (79.80%) Loss: 2.224810 LR: 0.00000952 [16:08:50] Epoch: 1 Batch: 16040/20099 (79.80%) Loss: 2.064534 LR: 0.00000952 [16:08:52] Epoch: 1 Batch: 16041/20099 (79.81%) Loss: 2.012225 LR: 0.00000952 [16:08:54] Epoch: 1 Batch: 16042/20099 (79.81%) Loss: 2.265417 LR: 0.00000952 [16:08:56] Epoch: 1 Batch: 16043/20099 (79.82%) Loss: 2.202663 LR: 0.00000952 [16:08:58] Epoch: 1 Batch: 16044/20099 (79.82%) Loss: 1.857615 LR: 0.00000952 [16:09:00] Epoch: 1 Batch: 16045/20099 (79.83%) Loss: 1.804510 LR: 0.00000951 [16:09:01] 
Epoch: 1 Batch: 16046/20099 (79.83%) Loss: 1.948513 LR: 0.00000951 [16:09:03] Epoch: 1 Batch: 16047/20099 (79.84%) Loss: 1.931455 LR: 0.00000951 [16:09:05] Epoch: 1 Batch: 16048/20099 (79.84%) Loss: 2.257914 LR: 0.00000951 [16:09:07] Epoch: 1 Batch: 16049/20099 (79.85%) Loss: 1.781874 LR: 0.00000951 [16:09:09] Epoch: 1 Batch: 16050/20099 (79.85%) Loss: 2.147046 LR: 0.00000951 [16:09:11] Epoch: 1 Batch: 16051/20099 (79.86%) Loss: 2.174601 LR: 0.00000951 [16:09:12] Epoch: 1 Batch: 16052/20099 (79.86%) Loss: 1.810132 LR: 0.00000950 [16:09:14] Epoch: 1 Batch: 16053/20099 (79.87%) Loss: 2.076145 LR: 0.00000950 [16:09:16] Epoch: 1 Batch: 16054/20099 (79.87%) Loss: 2.177145 LR: 0.00000950 [16:09:18] Epoch: 1 Batch: 16055/20099 (79.88%) Loss: 2.316991 LR: 0.00000950 [16:09:20] Epoch: 1 Batch: 16056/20099 (79.88%) Loss: 2.006156 LR: 0.00000950 [16:09:22] Epoch: 1 Batch: 16057/20099 (79.89%) Loss: 1.941637 LR: 0.00000950 [16:09:24] Epoch: 1 Batch: 16058/20099 (79.89%) Loss: 1.860137 LR: 0.00000950 [16:09:25] Epoch: 1 Batch: 16059/20099 (79.90%) Loss: 2.138763 LR: 0.00000949 [16:09:27] Epoch: 1 Batch: 16060/20099 (79.90%) Loss: 2.144622 LR: 0.00000949 [16:09:29] Epoch: 1 Batch: 16061/20099 (79.91%) Loss: 1.771066 LR: 0.00000949 [16:09:31] Epoch: 1 Batch: 16062/20099 (79.91%) Loss: 2.062634 LR: 0.00000949 [16:09:33] Epoch: 1 Batch: 16063/20099 (79.92%) Loss: 1.971748 LR: 0.00000949 [16:09:35] Epoch: 1 Batch: 16064/20099 (79.92%) Loss: 1.773437 LR: 0.00000949 [16:09:37] Epoch: 1 Batch: 16065/20099 (79.93%) Loss: 1.965925 LR: 0.00000949 [16:09:38] Epoch: 1 Batch: 16066/20099 (79.93%) Loss: 2.158885 LR: 0.00000948 [16:09:40] Epoch: 1 Batch: 16067/20099 (79.94%) Loss: 1.698883 LR: 0.00000948 [16:09:42] Epoch: 1 Batch: 16068/20099 (79.94%) Loss: 1.952510 LR: 0.00000948 [16:09:44] Epoch: 1 Batch: 16069/20099 (79.95%) Loss: 2.044368 LR: 0.00000948 [16:09:46] Epoch: 1 Batch: 16070/20099 (79.95%) Loss: 2.048138 LR: 0.00000948 [16:09:48] Epoch: 1 Batch: 16071/20099 (79.96%) Loss: 
2.140734 LR: 0.00000948 [16:09:50] Epoch: 1 Batch: 16072/20099 (79.96%) Loss: 1.973428 LR: 0.00000948 [16:09:51] Epoch: 1 Batch: 16073/20099 (79.97%) Loss: 1.903027 LR: 0.00000947 [16:09:53] Epoch: 1 Batch: 16074/20099 (79.97%) Loss: 2.579132 LR: 0.00000947 [16:09:55] Epoch: 1 Batch: 16075/20099 (79.98%) Loss: 2.154301 LR: 0.00000947 [16:09:57] Epoch: 1 Batch: 16076/20099 (79.98%) Loss: 2.282586 LR: 0.00000947 [16:09:59] Epoch: 1 Batch: 16077/20099 (79.99%) Loss: 2.173342 LR: 0.00000947 [16:10:01] Epoch: 1 Batch: 16078/20099 (79.99%) Loss: 2.020382 LR: 0.00000947 [16:10:03] Epoch: 1 Batch: 16079/20099 (80.00%) Loss: 2.067012 LR: 0.00000947 [16:10:04] Epoch: 1 Batch: 16080/20099 (80.00%) Loss: 2.142215 LR: 0.00000945 [16:10:06] Epoch: 1 Batch: 16081/20099 (80.01%) Loss: 2.184170 LR: 0.00000945 [16:10:08] Epoch: 1 Batch: 16082/20099 (80.01%) Loss: 2.260066 LR: 0.00000945 [16:10:10] Epoch: 1 Batch: 16083/20099 (80.02%) Loss: 2.211658 LR: 0.00000945 [16:10:12] Epoch: 1 Batch: 16084/20099 (80.02%) Loss: 2.162451 LR: 0.00000945 [16:10:14] Epoch: 1 Batch: 16085/20099 (80.03%) Loss: 2.212565 LR: 0.00000945 [16:10:16] Epoch: 1 Batch: 16086/20099 (80.03%) Loss: 2.288130 LR: 0.00000945 [16:10:17] Epoch: 1 Batch: 16087/20099 (80.04%) Loss: 2.032566 LR: 0.00000944 [16:10:19] Epoch: 1 Batch: 16088/20099 (80.04%) Loss: 2.324108 LR: 0.00000944 [16:10:21] Epoch: 1 Batch: 16089/20099 (80.05%) Loss: 1.565732 LR: 0.00000944 [16:10:23] Epoch: 1 Batch: 16090/20099 (80.05%) Loss: 2.124652 LR: 0.00000944 [16:10:25] Epoch: 1 Batch: 16091/20099 (80.06%) Loss: 2.125581 LR: 0.00000944 [16:10:27] Epoch: 1 Batch: 16092/20099 (80.06%) Loss: 2.148273 LR: 0.00000944 [16:10:29] Epoch: 1 Batch: 16093/20099 (80.07%) Loss: 2.351346 LR: 0.00000944 [16:10:31] Epoch: 1 Batch: 16094/20099 (80.07%) Loss: 1.891081 LR: 0.00000943 [16:10:32] Epoch: 1 Batch: 16095/20099 (80.08%) Loss: 1.921865 LR: 0.00000943 [16:10:34] Epoch: 1 Batch: 16096/20099 (80.08%) Loss: 1.993951 LR: 0.00000943 [16:10:36] Epoch: 1 
Batch: 16097/20099 (80.09%) Loss: 2.046703 LR: 0.00000943 [16:10:38] Epoch: 1 Batch: 16098/20099 (80.09%) Loss: 1.883911 LR: 0.00000943 [16:10:40] Epoch: 1 Batch: 16099/20099 (80.10%) Loss: 2.296765 LR: 0.00000943 [16:10:42] Epoch: 1 Batch: 16100/20099 (80.10%) Loss: 1.429716 LR: 0.00000943 [16:10:44] Epoch: 1 Batch: 16101/20099 (80.11%) Loss: 2.021261 LR: 0.00000942 [16:10:45] Epoch: 1 Batch: 16102/20099 (80.11%) Loss: 2.589377 LR: 0.00000942 [16:10:47] Epoch: 1 Batch: 16103/20099 (80.12%) Loss: 2.082891 LR: 0.00000942 [16:10:49] Epoch: 1 Batch: 16104/20099 (80.12%) Loss: 2.170861 LR: 0.00000942 [16:10:51] Epoch: 1 Batch: 16105/20099 (80.13%) Loss: 2.072650 LR: 0.00000942 [16:10:53] Epoch: 1 Batch: 16106/20099 (80.13%) Loss: 2.279993 LR: 0.00000942 [16:10:55] Epoch: 1 Batch: 16107/20099 (80.14%) Loss: 1.920435 LR: 0.00000942 [16:10:57] Epoch: 1 Batch: 16108/20099 (80.14%) Loss: 2.115345 LR: 0.00000941 [16:10:58] Epoch: 1 Batch: 16109/20099 (80.15%) Loss: 2.205196 LR: 0.00000941 [16:11:00] Epoch: 1 Batch: 16110/20099 (80.15%) Loss: 2.131766 LR: 0.00000941 [16:11:02] Epoch: 1 Batch: 16111/20099 (80.16%) Loss: 2.310471 LR: 0.00000941 [16:11:04] Epoch: 1 Batch: 16112/20099 (80.16%) Loss: 2.138590 LR: 0.00000941 [16:11:06] Epoch: 1 Batch: 16113/20099 (80.17%) Loss: 2.274571 LR: 0.00000941 [16:11:08] Epoch: 1 Batch: 16114/20099 (80.17%) Loss: 2.001380 LR: 0.00000941 [16:11:10] Epoch: 1 Batch: 16115/20099 (80.18%) Loss: 2.040736 LR: 0.00000940 [16:11:11] Epoch: 1 Batch: 16116/20099 (80.18%) Loss: 2.132338 LR: 0.00000940 [16:11:13] Epoch: 1 Batch: 16117/20099 (80.19%) Loss: 1.939666 LR: 0.00000940 [16:11:15] Epoch: 1 Batch: 16118/20099 (80.19%) Loss: 1.968175 LR: 0.00000940 [16:11:17] Epoch: 1 Batch: 16119/20099 (80.20%) Loss: 1.811564 LR: 0.00000940 [16:11:19] Epoch: 1 Batch: 16120/20099 (80.20%) Loss: 1.948194 LR: 0.00000940 [16:11:21] Epoch: 1 Batch: 16121/20099 (80.21%) Loss: 1.971432 LR: 0.00000940 [16:11:23] Epoch: 1 Batch: 16122/20099 (80.21%) Loss: 1.957029 LR: 
0.00000939 [16:11:24] Epoch: 1 Batch: 16123/20099 (80.22%) Loss: 1.983742 LR: 0.00000939 [16:11:26] Epoch: 1 Batch: 16124/20099 (80.22%) Loss: 2.321616 LR: 0.00000939 [16:11:28] Epoch: 1 Batch: 16125/20099 (80.23%) Loss: 1.886446 LR: 0.00000939 [16:11:30] Epoch: 1 Batch: 16126/20099 (80.23%) Loss: 1.948652 LR: 0.00000939 [16:11:32] Epoch: 1 Batch: 16127/20099 (80.24%) Loss: 2.157378 LR: 0.00000939 [16:11:34] Epoch: 1 Batch: 16128/20099 (80.24%) Loss: 2.149153 LR: 0.00000939 [16:11:36] Epoch: 1 Batch: 16129/20099 (80.25%) Loss: 2.044570 LR: 0.00000938 [16:11:37] Epoch: 1 Batch: 16130/20099 (80.25%) Loss: 1.874442 LR: 0.00000938 [16:11:39] Epoch: 1 Batch: 16131/20099 (80.26%) Loss: 1.622912 LR: 0.00000938 [16:11:41] Epoch: 1 Batch: 16132/20099 (80.26%) Loss: 1.899008 LR: 0.00000938 [16:11:43] Epoch: 1 Batch: 16133/20099 (80.27%) Loss: 2.073575 LR: 0.00000938 [16:11:45] Epoch: 1 Batch: 16134/20099 (80.27%) Loss: 2.205327 LR: 0.00000938 [16:11:47] Epoch: 1 Batch: 16135/20099 (80.28%) Loss: 2.192777 LR: 0.00000938 [16:11:49] Epoch: 1 Batch: 16136/20099 (80.28%) Loss: 1.954570 LR: 0.00000936 [16:11:50] Epoch: 1 Batch: 16137/20099 (80.29%) Loss: 2.243796 LR: 0.00000936 [16:11:52] Epoch: 1 Batch: 16138/20099 (80.29%) Loss: 2.008721 LR: 0.00000936 [16:11:54] Epoch: 1 Batch: 16139/20099 (80.30%) Loss: 1.996799 LR: 0.00000936 [16:11:56] Epoch: 1 Batch: 16140/20099 (80.30%) Loss: 1.844346 LR: 0.00000936 [16:11:58] Epoch: 1 Batch: 16141/20099 (80.31%) Loss: 1.987541 LR: 0.00000936 [16:12:00] Epoch: 1 Batch: 16142/20099 (80.31%) Loss: 2.058368 LR: 0.00000936 [16:12:02] Epoch: 1 Batch: 16143/20099 (80.32%) Loss: 1.701446 LR: 0.00000935 [16:12:03] Epoch: 1 Batch: 16144/20099 (80.32%) Loss: 2.061042 LR: 0.00000935 [16:12:05] Epoch: 1 Batch: 16145/20099 (80.33%) Loss: 2.191151 LR: 0.00000935 [16:12:07] Epoch: 1 Batch: 16146/20099 (80.33%) Loss: 1.921444 LR: 0.00000935 [16:12:09] Epoch: 1 Batch: 16147/20099 (80.34%) Loss: 2.096420 LR: 0.00000935 [16:12:11] Epoch: 1 Batch: 16148/20099 
(80.34%) Loss: 2.063709 LR: 0.00000935 [16:12:13] Epoch: 1 Batch: 16149/20099 (80.35%) Loss: 2.056097 LR: 0.00000935 [16:12:15] Epoch: 1 Batch: 16150/20099 (80.35%) Loss: 1.774503 LR: 0.00000934 [16:12:16] Epoch: 1 Batch: 16151/20099 (80.36%) Loss: 1.880729 LR: 0.00000934 [16:12:18] Epoch: 1 Batch: 16152/20099 (80.36%) Loss: 2.170770 LR: 0.00000934 [16:12:20] Epoch: 1 Batch: 16153/20099 (80.37%) Loss: 2.005351 LR: 0.00000934 [16:12:22] Epoch: 1 Batch: 16154/20099 (80.37%) Loss: 2.300341 LR: 0.00000934 [16:12:24] Epoch: 1 Batch: 16155/20099 (80.38%) Loss: 1.772797 LR: 0.00000934 [16:12:26] Epoch: 1 Batch: 16156/20099 (80.38%) Loss: 2.207148 LR: 0.00000934 [16:12:28] Epoch: 1 Batch: 16157/20099 (80.39%) Loss: 2.035293 LR: 0.00000933 [16:12:29] Epoch: 1 Batch: 16158/20099 (80.39%) Loss: 2.267085 LR: 0.00000933 [16:12:31] Epoch: 1 Batch: 16159/20099 (80.40%) Loss: 2.179423 LR: 0.00000933 [16:12:33] Epoch: 1 Batch: 16160/20099 (80.40%) Loss: 2.379231 LR: 0.00000933 [16:12:35] Epoch: 1 Batch: 16161/20099 (80.41%) Loss: 1.900602 LR: 0.00000933 [16:12:37] Epoch: 1 Batch: 16162/20099 (80.41%) Loss: 2.152182 LR: 0.00000933 [16:12:39] Epoch: 1 Batch: 16163/20099 (80.42%) Loss: 2.059004 LR: 0.00000933 [16:12:40] Epoch: 1 Batch: 16164/20099 (80.42%) Loss: 1.996798 LR: 0.00000932 [16:12:42] Epoch: 1 Batch: 16165/20099 (80.43%) Loss: 2.285771 LR: 0.00000932 [16:12:44] Epoch: 1 Batch: 16166/20099 (80.43%) Loss: 2.231897 LR: 0.00000932 [16:12:46] Epoch: 1 Batch: 16167/20099 (80.44%) Loss: 1.965531 LR: 0.00000932 [16:12:48] Epoch: 1 Batch: 16168/20099 (80.44%) Loss: 2.193839 LR: 0.00000932 [16:12:50] Epoch: 1 Batch: 16169/20099 (80.45%) Loss: 1.935272 LR: 0.00000932 [16:12:52] Epoch: 1 Batch: 16170/20099 (80.45%) Loss: 2.403994 LR: 0.00000932 [16:12:53] Epoch: 1 Batch: 16171/20099 (80.46%) Loss: 1.953079 LR: 0.00000931 [16:12:55] Epoch: 1 Batch: 16172/20099 (80.46%) Loss: 2.028410 LR: 0.00000931 [16:12:57] Epoch: 1 Batch: 16173/20099 (80.47%) Loss: 2.233258 LR: 0.00000931 [16:12:59] 
Epoch: 1 Batch: 16174/20099 (80.47%) Loss: 1.968730 LR: 0.00000931 [16:13:01] Epoch: 1 Batch: 16175/20099 (80.48%) Loss: 2.085281 LR: 0.00000931 [16:13:03] Epoch: 1 Batch: 16176/20099 (80.48%) Loss: 2.051264 LR: 0.00000931 [16:13:05] Epoch: 1 Batch: 16177/20099 (80.49%) Loss: 2.580713 LR: 0.00000931 [16:13:06] Epoch: 1 Batch: 16178/20099 (80.49%) Loss: 2.175680 LR: 0.00000930 [16:13:08] Epoch: 1 Batch: 16179/20099 (80.50%) Loss: 2.186239 LR: 0.00000930 [16:13:10] Epoch: 1 Batch: 16180/20099 (80.50%) Loss: 1.871091 LR: 0.00000930 [16:13:12] Epoch: 1 Batch: 16181/20099 (80.51%) Loss: 1.917589 LR: 0.00000930 [16:13:14] Epoch: 1 Batch: 16182/20099 (80.51%) Loss: 1.954893 LR: 0.00000930 [16:13:16] Epoch: 1 Batch: 16183/20099 (80.52%) Loss: 1.950796 LR: 0.00000930 [16:13:18] Epoch: 1 Batch: 16184/20099 (80.52%) Loss: 1.984521 LR: 0.00000930 [16:13:19] Epoch: 1 Batch: 16185/20099 (80.53%) Loss: 2.008568 LR: 0.00000929 [16:13:21] Epoch: 1 Batch: 16186/20099 (80.53%) Loss: 2.154390 LR: 0.00000929 [16:13:23] Epoch: 1 Batch: 16187/20099 (80.54%) Loss: 1.896702 LR: 0.00000929 [16:13:25] Epoch: 1 Batch: 16188/20099 (80.54%) Loss: 2.099654 LR: 0.00000929 [16:13:27] Epoch: 1 Batch: 16189/20099 (80.55%) Loss: 2.251858 LR: 0.00000929 [16:13:29] Epoch: 1 Batch: 16190/20099 (80.55%) Loss: 1.927732 LR: 0.00000929 [16:13:31] Epoch: 1 Batch: 16191/20099 (80.56%) Loss: 2.076502 LR: 0.00000929 [16:13:33] Epoch: 1 Batch: 16192/20099 (80.56%) Loss: 2.358695 LR: 0.00000927 [16:13:34] Epoch: 1 Batch: 16193/20099 (80.57%) Loss: 2.171126 LR: 0.00000927 [16:13:36] Epoch: 1 Batch: 16194/20099 (80.57%) Loss: 1.971061 LR: 0.00000927 [16:13:38] Epoch: 1 Batch: 16195/20099 (80.58%) Loss: 2.059350 LR: 0.00000927 [16:13:40] Epoch: 1 Batch: 16196/20099 (80.58%) Loss: 1.812565 LR: 0.00000927 [16:13:42] Epoch: 1 Batch: 16197/20099 (80.59%) Loss: 1.741261 LR: 0.00000927 [16:13:44] Epoch: 1 Batch: 16198/20099 (80.59%) Loss: 2.197853 LR: 0.00000927 [16:13:46] Epoch: 1 Batch: 16199/20099 (80.60%) Loss: 
2.152427 LR: 0.00000926 [16:13:51] >> Cleaned up old temp checkpoint: epoch1_step14200 [16:13:51] >> Temp checkpoint saved: epoch1_step16200, size: 0.1693 GB [16:13:51] Epoch: 1 Batch: 16200/20099 (80.60%) Loss: 2.072310 LR: 0.00000926 [16:13:53] Epoch: 1 Batch: 16201/20099 (80.61%) Loss: 2.030499 LR: 0.00000926 [16:13:55] Epoch: 1 Batch: 16202/20099 (80.61%) Loss: 1.994692 LR: 0.00000926 [16:13:56] Epoch: 1 Batch: 16203/20099 (80.62%) Loss: 2.025219 LR: 0.00000926 [16:13:58] Epoch: 1 Batch: 16204/20099 (80.62%) Loss: 1.901455 LR: 0.00000926 [16:14:00] Epoch: 1 Batch: 16205/20099 (80.63%) Loss: 1.995661 LR: 0.00000926 [16:14:02] Epoch: 1 Batch: 16206/20099 (80.63%) Loss: 2.003099 LR: 0.00000925 [16:14:04] Epoch: 1 Batch: 16207/20099 (80.64%) Loss: 2.203615 LR: 0.00000925 [16:14:06] Epoch: 1 Batch: 16208/20099 (80.64%) Loss: 2.163456 LR: 0.00000925 [16:14:08] Epoch: 1 Batch: 16209/20099 (80.65%) Loss: 2.332261 LR: 0.00000925 [16:14:09] Epoch: 1 Batch: 16210/20099 (80.65%) Loss: 2.261419 LR: 0.00000925 [16:14:11] Epoch: 1 Batch: 16211/20099 (80.66%) Loss: 2.134980 LR: 0.00000925 [16:14:13] Epoch: 1 Batch: 16212/20099 (80.66%) Loss: 2.063556 LR: 0.00000925 [16:14:15] Epoch: 1 Batch: 16213/20099 (80.67%) Loss: 2.075718 LR: 0.00000924 [16:14:17] Epoch: 1 Batch: 16214/20099 (80.67%) Loss: 2.062649 LR: 0.00000924 [16:14:19] Epoch: 1 Batch: 16215/20099 (80.68%) Loss: 2.315426 LR: 0.00000924 [16:14:20] Epoch: 1 Batch: 16216/20099 (80.68%) Loss: 2.029032 LR: 0.00000924 [16:14:22] Epoch: 1 Batch: 16217/20099 (80.69%) Loss: 2.137788 LR: 0.00000924 [16:14:24] Epoch: 1 Batch: 16218/20099 (80.69%) Loss: 2.316128 LR: 0.00000924 [16:14:26] Epoch: 1 Batch: 16219/20099 (80.70%) Loss: 2.284968 LR: 0.00000924 [16:14:28] Epoch: 1 Batch: 16220/20099 (80.70%) Loss: 1.941548 LR: 0.00000923 [16:14:30] Epoch: 1 Batch: 16221/20099 (80.71%) Loss: 1.917004 LR: 0.00000923 [16:14:32] Epoch: 1 Batch: 16222/20099 (80.71%) Loss: 1.887786 LR: 0.00000923 [16:14:33] Epoch: 1 Batch: 16223/20099 (80.72%) 
Loss: 1.907259 LR: 0.00000923 [16:14:35] Epoch: 1 Batch: 16224/20099 (80.72%) Loss: 1.875565 LR: 0.00000923 [16:14:37] Epoch: 1 Batch: 16225/20099 (80.73%) Loss: 2.085193 LR: 0.00000923 [16:14:39] Epoch: 1 Batch: 16226/20099 (80.73%) Loss: 2.120503 LR: 0.00000923 [16:14:41] Epoch: 1 Batch: 16227/20099 (80.74%) Loss: 2.110924 LR: 0.00000922 [16:14:43] Epoch: 1 Batch: 16228/20099 (80.74%) Loss: 2.246830 LR: 0.00000922 [16:14:45] Epoch: 1 Batch: 16229/20099 (80.75%) Loss: 2.101231 LR: 0.00000922 [16:14:46] Epoch: 1 Batch: 16230/20099 (80.75%) Loss: 2.256769 LR: 0.00000922 [16:14:48] Epoch: 1 Batch: 16231/20099 (80.76%) Loss: 2.027970 LR: 0.00000922 [16:14:50] Epoch: 1 Batch: 16232/20099 (80.76%) Loss: 2.118880 LR: 0.00000922 [16:14:52] Epoch: 1 Batch: 16233/20099 (80.77%) Loss: 1.991335 LR: 0.00000922 [16:14:54] Epoch: 1 Batch: 16234/20099 (80.77%) Loss: 2.079563 LR: 0.00000921 [16:14:56] Epoch: 1 Batch: 16235/20099 (80.78%) Loss: 1.969099 LR: 0.00000921 [16:14:58] Epoch: 1 Batch: 16236/20099 (80.78%) Loss: 2.250857 LR: 0.00000921 [16:14:59] Epoch: 1 Batch: 16237/20099 (80.79%) Loss: 1.845540 LR: 0.00000921 [16:15:01] Epoch: 1 Batch: 16238/20099 (80.79%) Loss: 2.079174 LR: 0.00000921 [16:15:03] Epoch: 1 Batch: 16239/20099 (80.80%) Loss: 2.114981 LR: 0.00000921 [16:15:05] Epoch: 1 Batch: 16240/20099 (80.80%) Loss: 2.210446 LR: 0.00000921 [16:15:07] Epoch: 1 Batch: 16241/20099 (80.81%) Loss: 1.993339 LR: 0.00000920 [16:15:09] Epoch: 1 Batch: 16242/20099 (80.81%) Loss: 2.245748 LR: 0.00000920 [16:15:11] Epoch: 1 Batch: 16243/20099 (80.81%) Loss: 2.250192 LR: 0.00000920 [16:15:12] Epoch: 1 Batch: 16244/20099 (80.82%) Loss: 2.057471 LR: 0.00000920 [16:15:14] Epoch: 1 Batch: 16245/20099 (80.82%) Loss: 2.240060 LR: 0.00000920 [16:15:16] Epoch: 1 Batch: 16246/20099 (80.83%) Loss: 2.008941 LR: 0.00000920 [16:15:18] Epoch: 1 Batch: 16247/20099 (80.83%) Loss: 1.833347 LR: 0.00000920 [16:15:20] Epoch: 1 Batch: 16248/20099 (80.84%) Loss: 1.843803 LR: 0.00000919 [16:15:22] Epoch: 1 
Batch: 16249/20099 (80.84%) Loss: 1.957973 LR: 0.00000919 [16:15:23] Epoch: 1 Batch: 16250/20099 (80.85%) Loss: 2.161137 LR: 0.00000919 [16:15:25] Epoch: 1 Batch: 16251/20099 (80.85%) Loss: 1.972811 LR: 0.00000919 [16:15:27] Epoch: 1 Batch: 16252/20099 (80.86%) Loss: 1.905372 LR: 0.00000919 [16:15:29] Epoch: 1 Batch: 16253/20099 (80.86%) Loss: 2.048338 LR: 0.00000919 [16:15:31] Epoch: 1 Batch: 16254/20099 (80.87%) Loss: 2.161475 LR: 0.00000919 [16:15:33] Epoch: 1 Batch: 16255/20099 (80.87%) Loss: 2.358382 LR: 0.00000917 [16:15:35] Epoch: 1 Batch: 16256/20099 (80.88%) Loss: 1.970593 LR: 0.00000917 [16:15:36] Epoch: 1 Batch: 16257/20099 (80.88%) Loss: 1.892983 LR: 0.00000917 [16:15:38] Epoch: 1 Batch: 16258/20099 (80.89%) Loss: 2.253192 LR: 0.00000917 [16:15:40] Epoch: 1 Batch: 16259/20099 (80.89%) Loss: 2.088091 LR: 0.00000917 [16:15:42] Epoch: 1 Batch: 16260/20099 (80.90%) Loss: 1.883291 LR: 0.00000917 [16:15:44] Epoch: 1 Batch: 16261/20099 (80.90%) Loss: 2.102073 LR: 0.00000917 [16:15:46] Epoch: 1 Batch: 16262/20099 (80.91%) Loss: 2.053726 LR: 0.00000916 [16:15:48] Epoch: 1 Batch: 16263/20099 (80.91%) Loss: 2.074721 LR: 0.00000916 [16:15:49] Epoch: 1 Batch: 16264/20099 (80.92%) Loss: 2.085749 LR: 0.00000916 [16:15:51] Epoch: 1 Batch: 16265/20099 (80.92%) Loss: 1.658970 LR: 0.00000916 [16:15:53] Epoch: 1 Batch: 16266/20099 (80.93%) Loss: 1.906022 LR: 0.00000916 [16:15:55] Epoch: 1 Batch: 16267/20099 (80.93%) Loss: 2.078442 LR: 0.00000916 [16:15:57] Epoch: 1 Batch: 16268/20099 (80.94%) Loss: 1.869716 LR: 0.00000916 [16:15:59] Epoch: 1 Batch: 16269/20099 (80.94%) Loss: 2.378378 LR: 0.00000915 [16:16:01] Epoch: 1 Batch: 16270/20099 (80.95%) Loss: 2.135688 LR: 0.00000915 [16:16:02] Epoch: 1 Batch: 16271/20099 (80.95%) Loss: 2.033135 LR: 0.00000915 [16:16:04] Epoch: 1 Batch: 16272/20099 (80.96%) Loss: 1.955188 LR: 0.00000915 [16:16:06] Epoch: 1 Batch: 16273/20099 (80.96%) Loss: 2.033279 LR: 0.00000915 [16:16:08] Epoch: 1 Batch: 16274/20099 (80.97%) Loss: 2.132611 LR: 
0.00000915 [16:16:10] Epoch: 1 Batch: 16275/20099 (80.97%) Loss: 2.141267 LR: 0.00000915 [16:16:12] Epoch: 1 Batch: 16276/20099 (80.98%) Loss: 1.830270 LR: 0.00000914 [16:16:14] Epoch: 1 Batch: 16277/20099 (80.98%) Loss: 1.984531 LR: 0.00000914 [16:16:15] Epoch: 1 Batch: 16278/20099 (80.99%) Loss: 2.256507 LR: 0.00000914 [16:16:17] Epoch: 1 Batch: 16279/20099 (80.99%) Loss: 1.934224 LR: 0.00000914 [16:16:19] Epoch: 1 Batch: 16280/20099 (81.00%) Loss: 1.647403 LR: 0.00000914 [16:16:21] Epoch: 1 Batch: 16281/20099 (81.00%) Loss: 2.150187 LR: 0.00000914 [16:16:23] Epoch: 1 Batch: 16282/20099 (81.01%) Loss: 2.173044 LR: 0.00000914 [16:16:25] Epoch: 1 Batch: 16283/20099 (81.01%) Loss: 2.117081 LR: 0.00000913 [16:16:27] Epoch: 1 Batch: 16284/20099 (81.02%) Loss: 1.832438 LR: 0.00000913 [16:16:29] Epoch: 1 Batch: 16285/20099 (81.02%) Loss: 1.934216 LR: 0.00000913 [16:16:30] Epoch: 1 Batch: 16286/20099 (81.03%) Loss: 2.274875 LR: 0.00000913 [16:16:32] Epoch: 1 Batch: 16287/20099 (81.03%) Loss: 2.230308 LR: 0.00000913 [16:16:34] Epoch: 1 Batch: 16288/20099 (81.04%) Loss: 2.269819 LR: 0.00000913 [16:16:36] Epoch: 1 Batch: 16289/20099 (81.04%) Loss: 2.151894 LR: 0.00000913 [16:16:38] Epoch: 1 Batch: 16290/20099 (81.05%) Loss: 1.856832 LR: 0.00000912 [16:16:40] Epoch: 1 Batch: 16291/20099 (81.05%) Loss: 2.420965 LR: 0.00000912 [16:16:42] Epoch: 1 Batch: 16292/20099 (81.06%) Loss: 2.079112 LR: 0.00000912 [16:16:43] Epoch: 1 Batch: 16293/20099 (81.06%) Loss: 2.132642 LR: 0.00000912 [16:16:45] Epoch: 1 Batch: 16294/20099 (81.07%) Loss: 1.997705 LR: 0.00000912 [16:16:47] Epoch: 1 Batch: 16295/20099 (81.07%) Loss: 2.125673 LR: 0.00000912 [16:16:49] Epoch: 1 Batch: 16296/20099 (81.08%) Loss: 2.069116 LR: 0.00000912 [16:16:51] Epoch: 1 Batch: 16297/20099 (81.08%) Loss: 1.888569 LR: 0.00000911 [16:16:53] Epoch: 1 Batch: 16298/20099 (81.09%) Loss: 2.216491 LR: 0.00000911 [16:16:55] Epoch: 1 Batch: 16299/20099 (81.09%) Loss: 1.793048 LR: 0.00000911 [16:16:56] Epoch: 1 Batch: 16300/20099 
(81.10%) Loss: 1.944720 LR: 0.00000911 [16:16:58] Epoch: 1 Batch: 16301/20099 (81.10%) Loss: 2.169119 LR: 0.00000911 [16:17:00] Epoch: 1 Batch: 16302/20099 (81.11%) Loss: 2.007722 LR: 0.00000911 [16:17:02] Epoch: 1 Batch: 16303/20099 (81.11%) Loss: 2.056260 LR: 0.00000911 [16:17:04] Epoch: 1 Batch: 16304/20099 (81.12%) Loss: 2.043446 LR: 0.00000910 [16:17:06] Epoch: 1 Batch: 16305/20099 (81.12%) Loss: 2.317718 LR: 0.00000910 [16:17:08] Epoch: 1 Batch: 16306/20099 (81.13%) Loss: 2.196620 LR: 0.00000910 [16:17:09] Epoch: 1 Batch: 16307/20099 (81.13%) Loss: 2.117467 LR: 0.00000910 [16:17:11] Epoch: 1 Batch: 16308/20099 (81.14%) Loss: 2.354849 LR: 0.00000910 [16:17:13] Epoch: 1 Batch: 16309/20099 (81.14%) Loss: 1.910407 LR: 0.00000910 [16:17:15] Epoch: 1 Batch: 16310/20099 (81.15%) Loss: 2.071437 LR: 0.00000910 [16:17:17] Epoch: 1 Batch: 16311/20099 (81.15%) Loss: 2.010394 LR: 0.00000909 [16:17:19] Epoch: 1 Batch: 16312/20099 (81.16%) Loss: 2.281152 LR: 0.00000909 [16:17:21] Epoch: 1 Batch: 16313/20099 (81.16%) Loss: 1.898114 LR: 0.00000909 [16:17:22] Epoch: 1 Batch: 16314/20099 (81.17%) Loss: 2.141768 LR: 0.00000909 [16:17:24] Epoch: 1 Batch: 16315/20099 (81.17%) Loss: 1.999946 LR: 0.00000909 [16:17:26] Epoch: 1 Batch: 16316/20099 (81.18%) Loss: 2.269006 LR: 0.00000909 [16:17:28] Epoch: 1 Batch: 16317/20099 (81.18%) Loss: 1.954539 LR: 0.00000909 [16:17:30] Epoch: 1 Batch: 16318/20099 (81.19%) Loss: 1.924524 LR: 0.00000908 [16:17:32] Epoch: 1 Batch: 16319/20099 (81.19%) Loss: 2.186167 LR: 0.00000908 [16:17:34] Epoch: 1 Batch: 16320/20099 (81.20%) Loss: 2.141123 LR: 0.00000908 [16:17:35] Epoch: 1 Batch: 16321/20099 (81.20%) Loss: 2.383243 LR: 0.00000908 [16:17:37] Epoch: 1 Batch: 16322/20099 (81.21%) Loss: 2.273573 LR: 0.00000908 [16:17:39] Epoch: 1 Batch: 16323/20099 (81.21%) Loss: 2.262253 LR: 0.00000908 [16:17:41] Epoch: 1 Batch: 16324/20099 (81.22%) Loss: 1.951034 LR: 0.00000908 [16:17:43] Epoch: 1 Batch: 16325/20099 (81.22%) Loss: 2.372686 LR: 0.00000907 [16:17:45] 
Epoch: 1 Batch: 16326/20099 (81.23%) Loss: 1.999699 LR: 0.00000907 [16:17:47] Epoch: 1 Batch: 16327/20099 (81.23%) Loss: 1.915114 LR: 0.00000907 [16:17:48] Epoch: 1 Batch: 16328/20099 (81.24%) Loss: 1.518602 LR: 0.00000907 [16:17:50] Epoch: 1 Batch: 16329/20099 (81.24%) Loss: 1.936647 LR: 0.00000907 [16:17:52] Epoch: 1 Batch: 16330/20099 (81.25%) Loss: 2.140012 LR: 0.00000907 [16:17:54] Epoch: 1 Batch: 16331/20099 (81.25%) Loss: 2.056526 LR: 0.00000907 [16:17:56] Epoch: 1 Batch: 16332/20099 (81.26%) Loss: 2.191687 LR: 0.00000905 [16:17:58] Epoch: 1 Batch: 16333/20099 (81.26%) Loss: 2.051647 LR: 0.00000905 [16:17:59] Epoch: 1 Batch: 16334/20099 (81.27%) Loss: 2.229664 LR: 0.00000905 [16:18:01] Epoch: 1 Batch: 16335/20099 (81.27%) Loss: 2.010728 LR: 0.00000905 [16:18:03] Epoch: 1 Batch: 16336/20099 (81.28%) Loss: 2.141655 LR: 0.00000905 [16:18:05] Epoch: 1 Batch: 16337/20099 (81.28%) Loss: 2.018472 LR: 0.00000905 [16:18:07] Epoch: 1 Batch: 16338/20099 (81.29%) Loss: 2.302359 LR: 0.00000905 [16:18:09] Epoch: 1 Batch: 16339/20099 (81.29%) Loss: 2.220205 LR: 0.00000904 [16:18:11] Epoch: 1 Batch: 16340/20099 (81.30%) Loss: 2.022842 LR: 0.00000904 [16:18:12] Epoch: 1 Batch: 16341/20099 (81.30%) Loss: 1.561133 LR: 0.00000904 [16:18:14] Epoch: 1 Batch: 16342/20099 (81.31%) Loss: 2.295654 LR: 0.00000904 [16:18:16] Epoch: 1 Batch: 16343/20099 (81.31%) Loss: 2.443317 LR: 0.00000904 [16:18:18] Epoch: 1 Batch: 16344/20099 (81.32%) Loss: 2.069518 LR: 0.00000904 [16:18:20] Epoch: 1 Batch: 16345/20099 (81.32%) Loss: 2.228383 LR: 0.00000904 [16:18:22] Epoch: 1 Batch: 16346/20099 (81.33%) Loss: 1.787857 LR: 0.00000903 [16:18:24] Epoch: 1 Batch: 16347/20099 (81.33%) Loss: 1.798540 LR: 0.00000903 [16:18:25] Epoch: 1 Batch: 16348/20099 (81.34%) Loss: 2.076424 LR: 0.00000903 [16:18:27] Epoch: 1 Batch: 16349/20099 (81.34%) Loss: 1.930049 LR: 0.00000903 [16:18:29] Epoch: 1 Batch: 16350/20099 (81.35%) Loss: 2.189503 LR: 0.00000903 [16:18:31] Epoch: 1 Batch: 16351/20099 (81.35%) Loss: 
2.202222 LR: 0.00000903 [16:18:33] Epoch: 1 Batch: 16352/20099 (81.36%) Loss: 1.936811 LR: 0.00000903 [16:18:35] Epoch: 1 Batch: 16353/20099 (81.36%) Loss: 2.376939 LR: 0.00000902 [16:18:36] Epoch: 1 Batch: 16354/20099 (81.37%) Loss: 2.293652 LR: 0.00000902 [16:18:38] Epoch: 1 Batch: 16355/20099 (81.37%) Loss: 1.960539 LR: 0.00000902 [16:18:40] Epoch: 1 Batch: 16356/20099 (81.38%) Loss: 2.094041 LR: 0.00000902 [16:18:42] Epoch: 1 Batch: 16357/20099 (81.38%) Loss: 2.160471 LR: 0.00000902 [16:18:44] Epoch: 1 Batch: 16358/20099 (81.39%) Loss: 1.955694 LR: 0.00000902 [16:18:46] Epoch: 1 Batch: 16359/20099 (81.39%) Loss: 1.870648 LR: 0.00000902 [16:18:48] Epoch: 1 Batch: 16360/20099 (81.40%) Loss: 2.089891 LR: 0.00000901 [16:18:49] Epoch: 1 Batch: 16361/20099 (81.40%) Loss: 2.334042 LR: 0.00000901 [16:18:51] Epoch: 1 Batch: 16362/20099 (81.41%) Loss: 2.328595 LR: 0.00000901 [16:18:53] Epoch: 1 Batch: 16363/20099 (81.41%) Loss: 2.366523 LR: 0.00000901 [16:18:55] Epoch: 1 Batch: 16364/20099 (81.42%) Loss: 2.426263 LR: 0.00000901 [16:18:57] Epoch: 1 Batch: 16365/20099 (81.42%) Loss: 1.925551 LR: 0.00000901 [16:18:59] Epoch: 1 Batch: 16366/20099 (81.43%) Loss: 1.943493 LR: 0.00000901 [16:19:01] Epoch: 1 Batch: 16367/20099 (81.43%) Loss: 1.787126 LR: 0.00000900 [16:19:02] Epoch: 1 Batch: 16368/20099 (81.44%) Loss: 1.962619 LR: 0.00000900 [16:19:04] Epoch: 1 Batch: 16369/20099 (81.44%) Loss: 2.289970 LR: 0.00000900 [16:19:06] Epoch: 1 Batch: 16370/20099 (81.45%) Loss: 2.190047 LR: 0.00000900 [16:19:08] Epoch: 1 Batch: 16371/20099 (81.45%) Loss: 2.249807 LR: 0.00000900 [16:19:10] Epoch: 1 Batch: 16372/20099 (81.46%) Loss: 2.336900 LR: 0.00000900 [16:19:12] Epoch: 1 Batch: 16373/20099 (81.46%) Loss: 2.124334 LR: 0.00000900 [16:19:14] Epoch: 1 Batch: 16374/20099 (81.47%) Loss: 2.361797 LR: 0.00000899 [16:19:15] Epoch: 1 Batch: 16375/20099 (81.47%) Loss: 1.895668 LR: 0.00000899 [16:19:17] Epoch: 1 Batch: 16376/20099 (81.48%) Loss: 2.109955 LR: 0.00000899 [16:19:19] Epoch: 1 
Batch: 16377/20099 (81.48%) Loss: 1.970118 LR: 0.00000899 [16:19:21] Epoch: 1 Batch: 16378/20099 (81.49%) Loss: 2.160949 LR: 0.00000899 [16:19:23] Epoch: 1 Batch: 16379/20099 (81.49%) Loss: 2.103943 LR: 0.00000899 [16:19:25] Epoch: 1 Batch: 16380/20099 (81.50%) Loss: 1.970526 LR: 0.00000899 [16:19:27] Epoch: 1 Batch: 16381/20099 (81.50%) Loss: 2.027892 LR: 0.00000898 [16:19:29] Epoch: 1 Batch: 16382/20099 (81.51%) Loss: 2.165708 LR: 0.00000898 [16:19:30] Epoch: 1 Batch: 16383/20099 (81.51%) Loss: 1.935363 LR: 0.00000898 [16:19:32] Epoch: 1 Batch: 16384/20099 (81.52%) Loss: 2.043347 LR: 0.00000898 [16:19:34] Epoch: 1 Batch: 16385/20099 (81.52%) Loss: 2.274018 LR: 0.00000898 [16:19:36] Epoch: 1 Batch: 16386/20099 (81.53%) Loss: 2.229197 LR: 0.00000898 [16:19:38] Epoch: 1 Batch: 16387/20099 (81.53%) Loss: 2.355647 LR: 0.00000898 [16:19:40] Epoch: 1 Batch: 16388/20099 (81.54%) Loss: 2.541529 LR: 0.00000897 [16:19:42] Epoch: 1 Batch: 16389/20099 (81.54%) Loss: 2.181964 LR: 0.00000897 [16:19:43] Epoch: 1 Batch: 16390/20099 (81.55%) Loss: 2.094627 LR: 0.00000897 [16:19:45] Epoch: 1 Batch: 16391/20099 (81.55%) Loss: 1.928281 LR: 0.00000897 [16:19:47] Epoch: 1 Batch: 16392/20099 (81.56%) Loss: 2.072373 LR: 0.00000897 [16:19:49] Epoch: 1 Batch: 16393/20099 (81.56%) Loss: 2.252689 LR: 0.00000897 [16:19:51] Epoch: 1 Batch: 16394/20099 (81.57%) Loss: 2.374795 LR: 0.00000897 [16:19:53] Epoch: 1 Batch: 16395/20099 (81.57%) Loss: 2.069069 LR: 0.00000896 [16:19:55] Epoch: 1 Batch: 16396/20099 (81.58%) Loss: 1.946969 LR: 0.00000896 [16:19:56] Epoch: 1 Batch: 16397/20099 (81.58%) Loss: 2.038100 LR: 0.00000896 [16:19:58] Epoch: 1 Batch: 16398/20099 (81.59%) Loss: 2.205416 LR: 0.00000896 [16:20:00] Epoch: 1 Batch: 16399/20099 (81.59%) Loss: 2.535084 LR: 0.00000896 [16:20:06] >> Cleaned up old temp checkpoint: epoch1_step14400 [16:20:06] >> Temp checkpoint saved: epoch1_step16400, size: 0.1693 GB [16:20:06] Epoch: 1 Batch: 16400/20099 (81.60%) Loss: 2.174673 LR: 0.00000896 [16:20:08] 
Epoch: 1 Batch: 16401/20099 (81.60%) Loss: 1.592294 LR: 0.00000896 [16:20:09] Epoch: 1 Batch: 16402/20099 (81.61%) Loss: 2.011690 LR: 0.00000895 [16:20:11] Epoch: 1 Batch: 16403/20099 (81.61%) Loss: 1.990918 LR: 0.00000895 [16:20:13] Epoch: 1 Batch: 16404/20099 (81.62%) Loss: 1.896723 LR: 0.00000895 [16:20:15] Epoch: 1 Batch: 16405/20099 (81.62%) Loss: 2.256215 LR: 0.00000895 [16:20:17] Epoch: 1 Batch: 16406/20099 (81.63%) Loss: 2.215314 LR: 0.00000895 [16:20:19] Epoch: 1 Batch: 16407/20099 (81.63%) Loss: 2.201990 LR: 0.00000895 [16:20:21] Epoch: 1 Batch: 16408/20099 (81.64%) Loss: 2.278979 LR: 0.00000895 [16:20:22] Epoch: 1 Batch: 16409/20099 (81.64%) Loss: 1.909014 LR: 0.00000894 [16:20:24] Epoch: 1 Batch: 16410/20099 (81.65%) Loss: 2.186988 LR: 0.00000894 [16:20:26] Epoch: 1 Batch: 16411/20099 (81.65%) Loss: 1.988971 LR: 0.00000894 [16:20:28] Epoch: 1 Batch: 16412/20099 (81.66%) Loss: 2.406084 LR: 0.00000894 [16:20:30] Epoch: 1 Batch: 16413/20099 (81.66%) Loss: 2.097692 LR: 0.00000894 [16:20:32] Epoch: 1 Batch: 16414/20099 (81.67%) Loss: 2.235339 LR: 0.00000894 [16:20:34] Epoch: 1 Batch: 16415/20099 (81.67%) Loss: 1.925044 LR: 0.00000894 [16:20:36] Epoch: 1 Batch: 16416/20099 (81.68%) Loss: 2.077592 LR: 0.00000893 [16:20:37] Epoch: 1 Batch: 16417/20099 (81.68%) Loss: 2.009226 LR: 0.00000893 [16:20:39] Epoch: 1 Batch: 16418/20099 (81.69%) Loss: 2.015584 LR: 0.00000893 [16:20:41] Epoch: 1 Batch: 16419/20099 (81.69%) Loss: 2.362475 LR: 0.00000893 [16:20:43] Epoch: 1 Batch: 16420/20099 (81.70%) Loss: 2.175116 LR: 0.00000893 [16:20:45] Epoch: 1 Batch: 16421/20099 (81.70%) Loss: 2.058089 LR: 0.00000893 [16:20:47] Epoch: 1 Batch: 16422/20099 (81.71%) Loss: 2.083847 LR: 0.00000893 [16:20:49] Epoch: 1 Batch: 16423/20099 (81.71%) Loss: 2.165125 LR: 0.00000892 [16:20:51] Epoch: 1 Batch: 16424/20099 (81.72%) Loss: 1.835495 LR: 0.00000892 [16:20:52] Epoch: 1 Batch: 16425/20099 (81.72%) Loss: 2.364249 LR: 0.00000892 [16:20:54] Epoch: 1 Batch: 16426/20099 (81.73%) Loss: 
2.174137 LR: 0.00000892 [16:20:56] Epoch: 1 Batch: 16427/20099 (81.73%) Loss: 2.141110 LR: 0.00000892 [16:20:58] Epoch: 1 Batch: 16428/20099 (81.74%) Loss: 2.007820 LR: 0.00000892 [16:21:00] Epoch: 1 Batch: 16429/20099 (81.74%) Loss: 2.076276 LR: 0.00000892 [16:21:02] Epoch: 1 Batch: 16430/20099 (81.75%) Loss: 2.052114 LR: 0.00000890 [16:21:04] Epoch: 1 Batch: 16431/20099 (81.75%) Loss: 1.926757 LR: 0.00000890 [16:21:05] Epoch: 1 Batch: 16432/20099 (81.76%) Loss: 2.080806 LR: 0.00000890 [16:21:07] Epoch: 1 Batch: 16433/20099 (81.76%) Loss: 1.868304 LR: 0.00000890 [16:21:09] Epoch: 1 Batch: 16434/20099 (81.77%) Loss: 2.096716 LR: 0.00000890 [16:21:11] Epoch: 1 Batch: 16435/20099 (81.77%) Loss: 1.606771 LR: 0.00000890 [16:21:13] Epoch: 1 Batch: 16436/20099 (81.78%) Loss: 1.903899 LR: 0.00000890 [16:21:15] Epoch: 1 Batch: 16437/20099 (81.78%) Loss: 1.931796 LR: 0.00000889 [16:21:17] Epoch: 1 Batch: 16438/20099 (81.79%) Loss: 2.159409 LR: 0.00000889 [16:21:19] Epoch: 1 Batch: 16439/20099 (81.79%) Loss: 2.105321 LR: 0.00000889 [16:21:20] Epoch: 1 Batch: 16440/20099 (81.80%) Loss: 2.027949 LR: 0.00000889 [16:21:22] Epoch: 1 Batch: 16441/20099 (81.80%) Loss: 2.178613 LR: 0.00000889 [16:21:24] Epoch: 1 Batch: 16442/20099 (81.81%) Loss: 1.841477 LR: 0.00000889 [16:21:26] Epoch: 1 Batch: 16443/20099 (81.81%) Loss: 2.025831 LR: 0.00000889 [16:21:28] Epoch: 1 Batch: 16444/20099 (81.82%) Loss: 2.324130 LR: 0.00000888 [16:21:30] Epoch: 1 Batch: 16445/20099 (81.82%) Loss: 1.813631 LR: 0.00000888 [16:21:32] Epoch: 1 Batch: 16446/20099 (81.82%) Loss: 1.483622 LR: 0.00000888 [16:21:33] Epoch: 1 Batch: 16447/20099 (81.83%) Loss: 2.230134 LR: 0.00000888 [16:21:35] Epoch: 1 Batch: 16448/20099 (81.83%) Loss: 2.088285 LR: 0.00000888 [16:21:37] Epoch: 1 Batch: 16449/20099 (81.84%) Loss: 1.994093 LR: 0.00000888 [16:21:39] Epoch: 1 Batch: 16450/20099 (81.84%) Loss: 1.872150 LR: 0.00000888 [16:21:41] Epoch: 1 Batch: 16451/20099 (81.85%) Loss: 2.092474 LR: 0.00000887 [16:21:43] Epoch: 1 
Batch: 16452/20099 (81.85%) Loss: 2.442058 LR: 0.00000887 [16:21:45] Epoch: 1 Batch: 16453/20099 (81.86%) Loss: 2.015718 LR: 0.00000887 [16:21:46] Epoch: 1 Batch: 16454/20099 (81.86%) Loss: 2.232316 LR: 0.00000887 [16:21:48] Epoch: 1 Batch: 16455/20099 (81.87%) Loss: 2.005383 LR: 0.00000887 [16:21:50] Epoch: 1 Batch: 16456/20099 (81.87%) Loss: 1.947353 LR: 0.00000887 [16:21:52] Epoch: 1 Batch: 16457/20099 (81.88%) Loss: 2.298584 LR: 0.00000887 [16:21:54] Epoch: 1 Batch: 16458/20099 (81.88%) Loss: 1.957472 LR: 0.00000886 [16:21:56] Epoch: 1 Batch: 16459/20099 (81.89%) Loss: 2.102442 LR: 0.00000886 [16:21:58] Epoch: 1 Batch: 16460/20099 (81.89%) Loss: 2.182721 LR: 0.00000886 [16:21:59] Epoch: 1 Batch: 16461/20099 (81.90%) Loss: 2.035393 LR: 0.00000886 [16:22:01] Epoch: 1 Batch: 16462/20099 (81.90%) Loss: 2.214000 LR: 0.00000886 [16:22:03] Epoch: 1 Batch: 16463/20099 (81.91%) Loss: 1.996739 LR: 0.00000886 [16:22:05] Epoch: 1 Batch: 16464/20099 (81.91%) Loss: 2.189920 LR: 0.00000886 [16:22:07] Epoch: 1 Batch: 16465/20099 (81.92%) Loss: 1.935609 LR: 0.00000885 [16:22:09] Epoch: 1 Batch: 16466/20099 (81.92%) Loss: 2.085215 LR: 0.00000885 [16:22:10] Epoch: 1 Batch: 16467/20099 (81.93%) Loss: 2.205858 LR: 0.00000885 [16:22:12] Epoch: 1 Batch: 16468/20099 (81.93%) Loss: 1.821169 LR: 0.00000885 [16:22:14] Epoch: 1 Batch: 16469/20099 (81.94%) Loss: 2.239026 LR: 0.00000885 [16:22:16] Epoch: 1 Batch: 16470/20099 (81.94%) Loss: 1.953988 LR: 0.00000885 [16:22:18] Epoch: 1 Batch: 16471/20099 (81.95%) Loss: 1.897557 LR: 0.00000885 [16:22:20] Epoch: 1 Batch: 16472/20099 (81.95%) Loss: 2.120228 LR: 0.00000884 [16:22:22] Epoch: 1 Batch: 16473/20099 (81.96%) Loss: 1.921435 LR: 0.00000884 [16:22:23] Epoch: 1 Batch: 16474/20099 (81.96%) Loss: 2.147715 LR: 0.00000884 [16:22:25] Epoch: 1 Batch: 16475/20099 (81.97%) Loss: 2.137479 LR: 0.00000884 [16:22:27] Epoch: 1 Batch: 16476/20099 (81.97%) Loss: 2.145004 LR: 0.00000884 [16:22:29] Epoch: 1 Batch: 16477/20099 (81.98%) Loss: 1.923843 LR: 
0.00000884 [16:22:31] Epoch: 1 Batch: 16478/20099 (81.98%) Loss: 2.312792 LR: 0.00000884 [16:22:33] Epoch: 1 Batch: 16479/20099 (81.99%) Loss: 2.204871 LR: 0.00000883 [16:22:34] Epoch: 1 Batch: 16480/20099 (81.99%) Loss: 2.117625 LR: 0.00000883 [16:22:36] Epoch: 1 Batch: 16481/20099 (82.00%) Loss: 2.031749 LR: 0.00000883 [16:22:38] Epoch: 1 Batch: 16482/20099 (82.00%) Loss: 1.964625 LR: 0.00000883 [16:22:40] Epoch: 1 Batch: 16483/20099 (82.01%) Loss: 2.349060 LR: 0.00000883 [16:22:42] Epoch: 1 Batch: 16484/20099 (82.01%) Loss: 2.165146 LR: 0.00000883 [16:22:44] Epoch: 1 Batch: 16485/20099 (82.02%) Loss: 2.178628 LR: 0.00000883 [16:22:46] Epoch: 1 Batch: 16486/20099 (82.02%) Loss: 1.836853 LR: 0.00000882 [16:22:47] Epoch: 1 Batch: 16487/20099 (82.03%) Loss: 2.248146 LR: 0.00000882 [16:22:49] Epoch: 1 Batch: 16488/20099 (82.03%) Loss: 2.415990 LR: 0.00000882 [16:22:51] Epoch: 1 Batch: 16489/20099 (82.04%) Loss: 1.786675 LR: 0.00000882 [16:22:53] Epoch: 1 Batch: 16490/20099 (82.04%) Loss: 1.948336 LR: 0.00000882 [16:22:55] Epoch: 1 Batch: 16491/20099 (82.05%) Loss: 2.164959 LR: 0.00000882 [16:22:57] Epoch: 1 Batch: 16492/20099 (82.05%) Loss: 2.010401 LR: 0.00000882 [16:22:58] Epoch: 1 Batch: 16493/20099 (82.06%) Loss: 2.409545 LR: 0.00000881 [16:23:00] Epoch: 1 Batch: 16494/20099 (82.06%) Loss: 2.172247 LR: 0.00000881 [16:23:02] Epoch: 1 Batch: 16495/20099 (82.07%) Loss: 2.104458 LR: 0.00000881 [16:23:04] Epoch: 1 Batch: 16496/20099 (82.07%) Loss: 2.059661 LR: 0.00000881 [16:23:06] Epoch: 1 Batch: 16497/20099 (82.08%) Loss: 2.255671 LR: 0.00000881 [16:23:08] Epoch: 1 Batch: 16498/20099 (82.08%) Loss: 1.734671 LR: 0.00000881 [16:23:10] Epoch: 1 Batch: 16499/20099 (82.09%) Loss: 2.002261 LR: 0.00000881 [16:23:11] >> Evaluating batch 0 [16:23:13] >> Evaluating batch 1 [16:23:14] >> Evaluating batch 2 [16:23:15] >> Evaluating batch 3 [16:23:16] >> Evaluating batch 4 [16:23:17] >> Evaluating batch 5 [16:23:18] >> Evaluating batch 6 [16:23:19] >> Evaluating batch 7 
[16:23:20] >> Evaluating batch 8 [16:23:21] >> Evaluating batch 9 [16:23:22] >> Evaluating batch 10 [16:23:23] >> Evaluating batch 11 [16:23:24] >> Evaluating batch 12 [16:23:25] >> Evaluating batch 13 [16:23:26] >> Evaluating batch 14 [16:23:27] >> Evaluating batch 15 [16:23:28] >> Evaluating batch 16 [16:23:29] Epoch: 1 Step: 16500/20099 Evaluation: [16:23:29] Avg Loss Since Last Eval: 2.0763 Val Loss: 2.1479 Validation loss delta: -0.0010 Perplexity: 8.5664 LR: 0.00000880 [16:23:32] >> Checkpoint saved: epoch1_step16500, size: 0.1693 GB [16:23:32] Epoch: 1 Batch: 16500/20099 (82.09%) Loss: 2.285379 LR: 0.00000880 [16:23:34] Epoch: 1 Batch: 16501/20099 (82.10%) Loss: 2.283371 LR: 0.00000880 [16:23:36] Epoch: 1 Batch: 16502/20099 (82.10%) Loss: 2.249239 LR: 0.00000880 [16:23:38] Epoch: 1 Batch: 16503/20099 (82.11%) Loss: 2.182446 LR: 0.00000880 [16:23:40] Epoch: 1 Batch: 16504/20099 (82.11%) Loss: 2.295748 LR: 0.00000880 [16:23:42] Epoch: 1 Batch: 16505/20099 (82.12%) Loss: 2.222879 LR: 0.00000880 [16:23:43] Epoch: 1 Batch: 16506/20099 (82.12%) Loss: 1.888754 LR: 0.00000880 [16:23:45] Epoch: 1 Batch: 16507/20099 (82.13%) Loss: 2.204117 LR: 0.00000879 [16:23:47] Epoch: 1 Batch: 16508/20099 (82.13%) Loss: 2.051782 LR: 0.00000879 [16:23:49] Epoch: 1 Batch: 16509/20099 (82.14%) Loss: 2.224120 LR: 0.00000879 [16:23:51] Epoch: 1 Batch: 16510/20099 (82.14%) Loss: 2.142867 LR: 0.00000879 [16:23:53] Epoch: 1 Batch: 16511/20099 (82.15%) Loss: 1.979385 LR: 0.00000879 [16:23:55] Epoch: 1 Batch: 16512/20099 (82.15%) Loss: 1.811125 LR: 0.00000879 [16:23:56] Epoch: 1 Batch: 16513/20099 (82.16%) Loss: 1.968368 LR: 0.00000879 [16:23:58] Epoch: 1 Batch: 16514/20099 (82.16%) Loss: 2.105552 LR: 0.00000878 [16:24:00] Epoch: 1 Batch: 16515/20099 (82.17%) Loss: 2.119487 LR: 0.00000878 [16:24:02] Epoch: 1 Batch: 16516/20099 (82.17%) Loss: 2.135611 LR: 0.00000878 [16:24:04] Epoch: 1 Batch: 16517/20099 (82.18%) Loss: 2.369372 LR: 0.00000878 [16:24:06] Epoch: 1 Batch: 16518/20099 
(82.18%) Loss: 2.232313 LR: 0.00000878 [16:24:08] Epoch: 1 Batch: 16519/20099 (82.19%) Loss: 1.523172 LR: 0.00000878 [16:24:09] Epoch: 1 Batch: 16520/20099 (82.19%) Loss: 2.272309 LR: 0.00000878 [16:24:11] Epoch: 1 Batch: 16521/20099 (82.20%) Loss: 2.195114 LR: 0.00000877 [16:24:13] Epoch: 1 Batch: 16522/20099 (82.20%) Loss: 2.411936 LR: 0.00000877 [16:24:15] Epoch: 1 Batch: 16523/20099 (82.21%) Loss: 2.126309 LR: 0.00000877 [16:24:17] Epoch: 1 Batch: 16524/20099 (82.21%) Loss: 1.969497 LR: 0.00000877 [16:24:19] Epoch: 1 Batch: 16525/20099 (82.22%) Loss: 2.122985 LR: 0.00000877 [16:24:21] Epoch: 1 Batch: 16526/20099 (82.22%) Loss: 2.025162 LR: 0.00000877 [16:24:22] Epoch: 1 Batch: 16527/20099 (82.23%) Loss: 1.947451 LR: 0.00000877 [16:24:24] Epoch: 1 Batch: 16528/20099 (82.23%) Loss: 2.262833 LR: 0.00000876 [16:24:26] Epoch: 1 Batch: 16529/20099 (82.24%) Loss: 2.189519 LR: 0.00000876 [16:24:28] Epoch: 1 Batch: 16530/20099 (82.24%) Loss: 2.193761 LR: 0.00000876 [16:24:30] Epoch: 1 Batch: 16531/20099 (82.25%) Loss: 1.863230 LR: 0.00000876 [16:24:32] Epoch: 1 Batch: 16532/20099 (82.25%) Loss: 1.641419 LR: 0.00000876 [16:24:34] Epoch: 1 Batch: 16533/20099 (82.26%) Loss: 2.089165 LR: 0.00000876 [16:24:35] Epoch: 1 Batch: 16534/20099 (82.26%) Loss: 1.938838 LR: 0.00000876 [16:24:37] Epoch: 1 Batch: 16535/20099 (82.27%) Loss: 2.047857 LR: 0.00000875 [16:24:39] Epoch: 1 Batch: 16536/20099 (82.27%) Loss: 2.164503 LR: 0.00000875 [16:24:41] Epoch: 1 Batch: 16537/20099 (82.28%) Loss: 2.228244 LR: 0.00000875 [16:24:43] Epoch: 1 Batch: 16538/20099 (82.28%) Loss: 2.022917 LR: 0.00000875 [16:24:45] Epoch: 1 Batch: 16539/20099 (82.29%) Loss: 2.093437 LR: 0.00000875 [16:24:47] Epoch: 1 Batch: 16540/20099 (82.29%) Loss: 1.986417 LR: 0.00000875 [16:24:48] Epoch: 1 Batch: 16541/20099 (82.30%) Loss: 1.919136 LR: 0.00000875 [16:24:50] Epoch: 1 Batch: 16542/20099 (82.30%) Loss: 1.906429 LR: 0.00000874 [16:24:52] Epoch: 1 Batch: 16543/20099 (82.31%) Loss: 2.081245 LR: 0.00000874 [16:24:54] 
Epoch: 1 Batch: 16544/20099 (82.31%) Loss: 2.002152 LR: 0.00000874 [16:24:56] Epoch: 1 Batch: 16545/20099 (82.32%) Loss: 2.170120 LR: 0.00000874 [16:24:58] Epoch: 1 Batch: 16546/20099 (82.32%) Loss: 2.112074 LR: 0.00000874 [16:25:00] Epoch: 1 Batch: 16547/20099 (82.33%) Loss: 2.229463 LR: 0.00000874 [16:25:01] Epoch: 1 Batch: 16548/20099 (82.33%) Loss: 2.110729 LR: 0.00000874 [16:25:03] Epoch: 1 Batch: 16549/20099 (82.34%) Loss: 2.218425 LR: 0.00000873 [16:25:05] Epoch: 1 Batch: 16550/20099 (82.34%) Loss: 1.827083 LR: 0.00000873 [16:25:07] Epoch: 1 Batch: 16551/20099 (82.35%) Loss: 2.236944 LR: 0.00000873 [16:25:09] Epoch: 1 Batch: 16552/20099 (82.35%) Loss: 2.076031 LR: 0.00000873 [16:25:11] Epoch: 1 Batch: 16553/20099 (82.36%) Loss: 2.235271 LR: 0.00000873 [16:25:13] Epoch: 1 Batch: 16554/20099 (82.36%) Loss: 1.987821 LR: 0.00000873 [16:25:14] Epoch: 1 Batch: 16555/20099 (82.37%) Loss: 1.788835 LR: 0.00000873 [16:25:16] Epoch: 1 Batch: 16556/20099 (82.37%) Loss: 2.366536 LR: 0.00000872 [16:25:18] Epoch: 1 Batch: 16557/20099 (82.38%) Loss: 2.405690 LR: 0.00000872 [16:25:20] Epoch: 1 Batch: 16558/20099 (82.38%) Loss: 2.240904 LR: 0.00000872 [16:25:22] Epoch: 1 Batch: 16559/20099 (82.39%) Loss: 1.982807 LR: 0.00000872 [16:25:24] Epoch: 1 Batch: 16560/20099 (82.39%) Loss: 2.165594 LR: 0.00000872 [16:25:25] Epoch: 1 Batch: 16561/20099 (82.40%) Loss: 2.156009 LR: 0.00000872 [16:25:27] Epoch: 1 Batch: 16562/20099 (82.40%) Loss: 2.135953 LR: 0.00000872 [16:25:29] Epoch: 1 Batch: 16563/20099 (82.41%) Loss: 2.054039 LR: 0.00000871 [16:25:31] Epoch: 1 Batch: 16564/20099 (82.41%) Loss: 2.251221 LR: 0.00000871 [16:25:33] Epoch: 1 Batch: 16565/20099 (82.42%) Loss: 2.108429 LR: 0.00000871 [16:25:35] Epoch: 1 Batch: 16566/20099 (82.42%) Loss: 2.023752 LR: 0.00000871 [16:25:37] Epoch: 1 Batch: 16567/20099 (82.43%) Loss: 1.892316 LR: 0.00000871 [16:25:38] Epoch: 1 Batch: 16568/20099 (82.43%) Loss: 2.197621 LR: 0.00000871 [16:25:40] Epoch: 1 Batch: 16569/20099 (82.44%) Loss: 
2.212276 LR: 0.00000871 [16:25:42] Epoch: 1 Batch: 16570/20099 (82.44%) Loss: 1.957202 LR: 0.00000870 [16:25:44] Epoch: 1 Batch: 16571/20099 (82.45%) Loss: 2.097516 LR: 0.00000870 [16:25:46] Epoch: 1 Batch: 16572/20099 (82.45%) Loss: 2.083212 LR: 0.00000870 [16:25:48] Epoch: 1 Batch: 16573/20099 (82.46%) Loss: 2.195915 LR: 0.00000870 [16:25:49] Epoch: 1 Batch: 16574/20099 (82.46%) Loss: 2.133075 LR: 0.00000870 [16:25:51] Epoch: 1 Batch: 16575/20099 (82.47%) Loss: 2.241356 LR: 0.00000870 [16:25:53] Epoch: 1 Batch: 16576/20099 (82.47%) Loss: 1.874364 LR: 0.00000870 [16:25:55] Epoch: 1 Batch: 16577/20099 (82.48%) Loss: 2.366028 LR: 0.00000869 [16:25:57] Epoch: 1 Batch: 16578/20099 (82.48%) Loss: 1.736152 LR: 0.00000869 [16:25:59] Epoch: 1 Batch: 16579/20099 (82.49%) Loss: 2.118838 LR: 0.00000869 [16:26:01] Epoch: 1 Batch: 16580/20099 (82.49%) Loss: 2.051070 LR: 0.00000869 [16:26:02] Epoch: 1 Batch: 16581/20099 (82.50%) Loss: 1.910800 LR: 0.00000869 [16:26:04] Epoch: 1 Batch: 16582/20099 (82.50%) Loss: 2.112578 LR: 0.00000869 [16:26:06] Epoch: 1 Batch: 16583/20099 (82.51%) Loss: 1.881167 LR: 0.00000869 [16:26:08] Epoch: 1 Batch: 16584/20099 (82.51%) Loss: 1.797178 LR: 0.00000868 [16:26:10] Epoch: 1 Batch: 16585/20099 (82.52%) Loss: 2.132055 LR: 0.00000868 [16:26:12] Epoch: 1 Batch: 16586/20099 (82.52%) Loss: 2.342200 LR: 0.00000868 [16:26:14] Epoch: 1 Batch: 16587/20099 (82.53%) Loss: 2.033067 LR: 0.00000868 [16:26:15] Epoch: 1 Batch: 16588/20099 (82.53%) Loss: 2.344885 LR: 0.00000868 [16:26:17] Epoch: 1 Batch: 16589/20099 (82.54%) Loss: 1.907830 LR: 0.00000868 [16:26:19] Epoch: 1 Batch: 16590/20099 (82.54%) Loss: 2.113581 LR: 0.00000868 [16:26:21] Epoch: 1 Batch: 16591/20099 (82.55%) Loss: 2.015352 LR: 0.00000867 [16:26:23] Epoch: 1 Batch: 16592/20099 (82.55%) Loss: 2.057253 LR: 0.00000867 [16:26:25] Epoch: 1 Batch: 16593/20099 (82.56%) Loss: 2.164710 LR: 0.00000867 [16:26:27] Epoch: 1 Batch: 16594/20099 (82.56%) Loss: 2.154039 LR: 0.00000867 [16:26:28] Epoch: 1 
Batch: 16595/20099 (82.57%) Loss: 2.231452 LR: 0.00000867 [16:26:30] Epoch: 1 Batch: 16596/20099 (82.57%) Loss: 2.405390 LR: 0.00000867 [16:26:32] Epoch: 1 Batch: 16597/20099 (82.58%) Loss: 2.132266 LR: 0.00000867 [16:26:34] Epoch: 1 Batch: 16598/20099 (82.58%) Loss: 2.018621 LR: 0.00000866 [16:26:36] Epoch: 1 Batch: 16599/20099 (82.59%) Loss: 1.993345 LR: 0.00000866 [16:26:41] >> Cleaned up old temp checkpoint: epoch1_step14600 [16:26:41] >> Temp checkpoint saved: epoch1_step16600, size: 0.1693 GB [16:26:41] Epoch: 1 Batch: 16600/20099 (82.59%) Loss: 1.964015 LR: 0.00000866 [16:26:43] Epoch: 1 Batch: 16601/20099 (82.60%) Loss: 2.126747 LR: 0.00000866 [16:26:45] Epoch: 1 Batch: 16602/20099 (82.60%) Loss: 2.104041 LR: 0.00000866 [16:26:47] Epoch: 1 Batch: 16603/20099 (82.61%) Loss: 2.161549 LR: 0.00000866 [16:26:49] Epoch: 1 Batch: 16604/20099 (82.61%) Loss: 2.281396 LR: 0.00000866 [16:26:50] Epoch: 1 Batch: 16605/20099 (82.62%) Loss: 1.860242 LR: 0.00000864 [16:26:52] Epoch: 1 Batch: 16606/20099 (82.62%) Loss: 2.091843 LR: 0.00000864 [16:26:54] Epoch: 1 Batch: 16607/20099 (82.63%) Loss: 2.312483 LR: 0.00000864 [16:26:56] Epoch: 1 Batch: 16608/20099 (82.63%) Loss: 2.153271 LR: 0.00000864 [16:26:58] Epoch: 1 Batch: 16609/20099 (82.64%) Loss: 2.004353 LR: 0.00000864 [16:27:00] Epoch: 1 Batch: 16610/20099 (82.64%) Loss: 2.242671 LR: 0.00000864 [16:27:02] Epoch: 1 Batch: 16611/20099 (82.65%) Loss: 2.304753 LR: 0.00000864 [16:27:03] Epoch: 1 Batch: 16612/20099 (82.65%) Loss: 1.941058 LR: 0.00000863 [16:27:05] Epoch: 1 Batch: 16613/20099 (82.66%) Loss: 2.105476 LR: 0.00000863 [16:27:07] Epoch: 1 Batch: 16614/20099 (82.66%) Loss: 2.234357 LR: 0.00000863 [16:27:09] Epoch: 1 Batch: 16615/20099 (82.67%) Loss: 2.176742 LR: 0.00000863 [16:27:11] Epoch: 1 Batch: 16616/20099 (82.67%) Loss: 2.198997 LR: 0.00000863 [16:27:13] Epoch: 1 Batch: 16617/20099 (82.68%) Loss: 2.499077 LR: 0.00000863 [16:27:15] Epoch: 1 Batch: 16618/20099 (82.68%) Loss: 1.900685 LR: 0.00000863 [16:27:16] 
Epoch: 1 Batch: 16619/20099 (82.69%) Loss: 2.113362 LR: 0.00000862 [16:27:18] Epoch: 1 Batch: 16620/20099 (82.69%) Loss: 2.058198 LR: 0.00000862 [16:27:20] Epoch: 1 Batch: 16621/20099 (82.70%) Loss: 1.930577 LR: 0.00000862 [16:27:22] Epoch: 1 Batch: 16622/20099 (82.70%) Loss: 2.135709 LR: 0.00000862 [16:27:24] Epoch: 1 Batch: 16623/20099 (82.71%) Loss: 2.101613 LR: 0.00000862 [16:27:26] Epoch: 1 Batch: 16624/20099 (82.71%) Loss: 2.227145 LR: 0.00000862 [16:27:28] Epoch: 1 Batch: 16625/20099 (82.72%) Loss: 1.806254 LR: 0.00000862 [16:27:29] Epoch: 1 Batch: 16626/20099 (82.72%) Loss: 2.036176 LR: 0.00000861 [16:27:31] Epoch: 1 Batch: 16627/20099 (82.73%) Loss: 2.251123 LR: 0.00000861 [16:27:33] Epoch: 1 Batch: 16628/20099 (82.73%) Loss: 2.193777 LR: 0.00000861 [16:27:35] Epoch: 1 Batch: 16629/20099 (82.74%) Loss: 2.152320 LR: 0.00000861 [16:27:37] Epoch: 1 Batch: 16630/20099 (82.74%) Loss: 2.110282 LR: 0.00000861 [16:27:39] Epoch: 1 Batch: 16631/20099 (82.75%) Loss: 2.013544 LR: 0.00000861 [16:27:41] Epoch: 1 Batch: 16632/20099 (82.75%) Loss: 2.095991 LR: 0.00000861 [16:27:42] Epoch: 1 Batch: 16633/20099 (82.76%) Loss: 2.331870 LR: 0.00000860 [16:27:44] Epoch: 1 Batch: 16634/20099 (82.76%) Loss: 2.015231 LR: 0.00000860 [16:27:46] Epoch: 1 Batch: 16635/20099 (82.77%) Loss: 2.197946 LR: 0.00000860 [16:27:48] Epoch: 1 Batch: 16636/20099 (82.77%) Loss: 2.052575 LR: 0.00000860 [16:27:50] Epoch: 1 Batch: 16637/20099 (82.78%) Loss: 1.918877 LR: 0.00000860 [16:27:52] Epoch: 1 Batch: 16638/20099 (82.78%) Loss: 2.265791 LR: 0.00000860 [16:27:54] Epoch: 1 Batch: 16639/20099 (82.79%) Loss: 2.252563 LR: 0.00000860 [16:27:55] Epoch: 1 Batch: 16640/20099 (82.79%) Loss: 1.908554 LR: 0.00000859 [16:27:57] Epoch: 1 Batch: 16641/20099 (82.80%) Loss: 2.226805 LR: 0.00000859 [16:27:59] Epoch: 1 Batch: 16642/20099 (82.80%) Loss: 2.310566 LR: 0.00000859 [16:28:01] Epoch: 1 Batch: 16643/20099 (82.81%) Loss: 2.210073 LR: 0.00000859 [16:28:03] Epoch: 1 Batch: 16644/20099 (82.81%) Loss: 
2.056789 LR: 0.00000859 [16:28:05] Epoch: 1 Batch: 16645/20099 (82.82%) Loss: 2.250406 LR: 0.00000859 [16:28:06] Epoch: 1 Batch: 16646/20099 (82.82%) Loss: 2.018199 LR: 0.00000859 [16:28:08] Epoch: 1 Batch: 16647/20099 (82.83%) Loss: 2.232755 LR: 0.00000858 [16:28:10] Epoch: 1 Batch: 16648/20099 (82.83%) Loss: 2.188177 LR: 0.00000858 [16:28:12] Epoch: 1 Batch: 16649/20099 (82.83%) Loss: 2.072391 LR: 0.00000858 [16:28:14] Epoch: 1 Batch: 16650/20099 (82.84%) Loss: 1.813561 LR: 0.00000858 [16:28:16] Epoch: 1 Batch: 16651/20099 (82.84%) Loss: 2.252062 LR: 0.00000858 [16:28:18] Epoch: 1 Batch: 16652/20099 (82.85%) Loss: 2.136316 LR: 0.00000858 [16:28:19] Epoch: 1 Batch: 16653/20099 (82.85%) Loss: 2.247184 LR: 0.00000858 [16:28:21] Epoch: 1 Batch: 16654/20099 (82.86%) Loss: 2.088854 LR: 0.00000857 [16:28:23] Epoch: 1 Batch: 16655/20099 (82.86%) Loss: 2.019944 LR: 0.00000857 [16:28:25] Epoch: 1 Batch: 16656/20099 (82.87%) Loss: 2.085492 LR: 0.00000857 [16:28:27] Epoch: 1 Batch: 16657/20099 (82.87%) Loss: 2.054235 LR: 0.00000857 [16:28:29] Epoch: 1 Batch: 16658/20099 (82.88%) Loss: 2.069050 LR: 0.00000857 [16:28:30] Epoch: 1 Batch: 16659/20099 (82.88%) Loss: 1.985282 LR: 0.00000857 [16:28:32] Epoch: 1 Batch: 16660/20099 (82.89%) Loss: 2.067096 LR: 0.00000857 [16:28:34] Epoch: 1 Batch: 16661/20099 (82.89%) Loss: 2.151713 LR: 0.00000856 [16:28:36] Epoch: 1 Batch: 16662/20099 (82.90%) Loss: 2.030815 LR: 0.00000856 [16:28:38] Epoch: 1 Batch: 16663/20099 (82.90%) Loss: 1.783173 LR: 0.00000856 [16:28:40] Epoch: 1 Batch: 16664/20099 (82.91%) Loss: 2.056595 LR: 0.00000856 [16:28:42] Epoch: 1 Batch: 16665/20099 (82.91%) Loss: 2.006794 LR: 0.00000856 [16:28:43] Epoch: 1 Batch: 16666/20099 (82.92%) Loss: 2.132596 LR: 0.00000856 [16:28:45] Epoch: 1 Batch: 16667/20099 (82.92%) Loss: 2.109039 LR: 0.00000856 [16:28:47] Epoch: 1 Batch: 16668/20099 (82.93%) Loss: 2.349339 LR: 0.00000855 [16:28:49] Epoch: 1 Batch: 16669/20099 (82.93%) Loss: 2.406927 LR: 0.00000855 [16:28:51] Epoch: 1 
Batch: 16670/20099 (82.94%) Loss: 2.061842 LR: 0.00000855 [16:28:53] Epoch: 1 Batch: 16671/20099 (82.94%) Loss: 1.845564 LR: 0.00000855 [16:28:55] Epoch: 1 Batch: 16672/20099 (82.95%) Loss: 2.232133 LR: 0.00000855 [16:28:56] Epoch: 1 Batch: 16673/20099 (82.95%) Loss: 1.756716 LR: 0.00000855 [16:28:58] Epoch: 1 Batch: 16674/20099 (82.96%) Loss: 2.037388 LR: 0.00000855 [16:29:00] Epoch: 1 Batch: 16675/20099 (82.96%) Loss: 1.858126 LR: 0.00000854 [16:29:02] Epoch: 1 Batch: 16676/20099 (82.97%) Loss: 2.110776 LR: 0.00000854 [16:29:04] Epoch: 1 Batch: 16677/20099 (82.97%) Loss: 1.919326 LR: 0.00000854 [16:29:06] Epoch: 1 Batch: 16678/20099 (82.98%) Loss: 2.282291 LR: 0.00000854 [16:29:08] Epoch: 1 Batch: 16679/20099 (82.98%) Loss: 2.130403 LR: 0.00000854 [16:29:09] Epoch: 1 Batch: 16680/20099 (82.99%) Loss: 2.217645 LR: 0.00000854 [16:29:11] Epoch: 1 Batch: 16681/20099 (82.99%) Loss: 2.053225 LR: 0.00000854 [16:29:13] Epoch: 1 Batch: 16682/20099 (83.00%) Loss: 2.020172 LR: 0.00000853 [16:29:15] Epoch: 1 Batch: 16683/20099 (83.00%) Loss: 2.056619 LR: 0.00000853 [16:29:17] Epoch: 1 Batch: 16684/20099 (83.01%) Loss: 2.188424 LR: 0.00000853 [16:29:19] Epoch: 1 Batch: 16685/20099 (83.01%) Loss: 2.219086 LR: 0.00000853 [16:29:21] Epoch: 1 Batch: 16686/20099 (83.02%) Loss: 1.860166 LR: 0.00000853 [16:29:22] Epoch: 1 Batch: 16687/20099 (83.02%) Loss: 1.879277 LR: 0.00000853 [16:29:24] Epoch: 1 Batch: 16688/20099 (83.03%) Loss: 2.299792 LR: 0.00000853 [16:29:26] Epoch: 1 Batch: 16689/20099 (83.03%) Loss: 2.137259 LR: 0.00000852 [16:29:28] Epoch: 1 Batch: 16690/20099 (83.04%) Loss: 2.105556 LR: 0.00000852 [16:29:30] Epoch: 1 Batch: 16691/20099 (83.04%) Loss: 2.079010 LR: 0.00000852 [16:29:32] Epoch: 1 Batch: 16692/20099 (83.05%) Loss: 1.978904 LR: 0.00000852 [16:29:34] Epoch: 1 Batch: 16693/20099 (83.05%) Loss: 2.201975 LR: 0.00000852 [16:29:35] Epoch: 1 Batch: 16694/20099 (83.06%) Loss: 1.963457 LR: 0.00000852 [16:29:37] Epoch: 1 Batch: 16695/20099 (83.06%) Loss: 2.051216 LR: 
0.00000852 [16:29:39] Epoch: 1 Batch: 16696/20099 (83.07%) Loss: 2.172128 LR: 0.00000851 [16:29:41] Epoch: 1 Batch: 16697/20099 (83.07%) Loss: 2.409959 LR: 0.00000851 [16:29:43] Epoch: 1 Batch: 16698/20099 (83.08%) Loss: 2.027288 LR: 0.00000851 [16:29:45] Epoch: 1 Batch: 16699/20099 (83.08%) Loss: 1.703194 LR: 0.00000851 [16:29:46] Epoch: 1 Batch: 16700/20099 (83.09%) Loss: 1.934295 LR: 0.00000851 [16:29:48] Epoch: 1 Batch: 16701/20099 (83.09%) Loss: 2.006363 LR: 0.00000851 [16:29:50] Epoch: 1 Batch: 16702/20099 (83.10%) Loss: 2.095083 LR: 0.00000851 [16:29:52] Epoch: 1 Batch: 16703/20099 (83.10%) Loss: 2.324102 LR: 0.00000850 [16:29:54] Epoch: 1 Batch: 16704/20099 (83.11%) Loss: 2.057350 LR: 0.00000850 [16:29:56] Epoch: 1 Batch: 16705/20099 (83.11%) Loss: 2.076157 LR: 0.00000850 [16:29:58] Epoch: 1 Batch: 16706/20099 (83.12%) Loss: 2.271578 LR: 0.00000850 [16:29:59] Epoch: 1 Batch: 16707/20099 (83.12%) Loss: 1.967137 LR: 0.00000850 [16:30:01] Epoch: 1 Batch: 16708/20099 (83.13%) Loss: 2.225040 LR: 0.00000850 [16:30:03] Epoch: 1 Batch: 16709/20099 (83.13%) Loss: 2.192386 LR: 0.00000850 [16:30:05] Epoch: 1 Batch: 16710/20099 (83.14%) Loss: 2.049857 LR: 0.00000849 [16:30:07] Epoch: 1 Batch: 16711/20099 (83.14%) Loss: 2.231309 LR: 0.00000849 [16:30:09] Epoch: 1 Batch: 16712/20099 (83.15%) Loss: 2.178263 LR: 0.00000849 [16:30:11] Epoch: 1 Batch: 16713/20099 (83.15%) Loss: 2.078445 LR: 0.00000849 [16:30:12] Epoch: 1 Batch: 16714/20099 (83.16%) Loss: 2.075748 LR: 0.00000849 [16:30:14] Epoch: 1 Batch: 16715/20099 (83.16%) Loss: 2.081205 LR: 0.00000849 [16:30:16] Epoch: 1 Batch: 16716/20099 (83.17%) Loss: 2.209902 LR: 0.00000849 [16:30:18] Epoch: 1 Batch: 16717/20099 (83.17%) Loss: 2.120301 LR: 0.00000848 [16:30:20] Epoch: 1 Batch: 16718/20099 (83.18%) Loss: 1.990799 LR: 0.00000848 [16:30:22] Epoch: 1 Batch: 16719/20099 (83.18%) Loss: 1.914299 LR: 0.00000848 [16:30:23] Epoch: 1 Batch: 16720/20099 (83.19%) Loss: 2.052778 LR: 0.00000848 [16:30:25] Epoch: 1 Batch: 16721/20099 
(83.19%) Loss: 2.114540 LR: 0.00000848 [16:30:27] Epoch: 1 Batch: 16722/20099 (83.20%) Loss: 2.425771 LR: 0.00000848 [16:30:29] Epoch: 1 Batch: 16723/20099 (83.20%) Loss: 1.733617 LR: 0.00000848 [16:30:31] Epoch: 1 Batch: 16724/20099 (83.21%) Loss: 1.858983 LR: 0.00000847 [16:30:33] Epoch: 1 Batch: 16725/20099 (83.21%) Loss: 2.135315 LR: 0.00000847 [16:30:35] Epoch: 1 Batch: 16726/20099 (83.22%) Loss: 2.274087 LR: 0.00000847 [16:30:36] Epoch: 1 Batch: 16727/20099 (83.22%) Loss: 1.951154 LR: 0.00000847 [16:30:38] Epoch: 1 Batch: 16728/20099 (83.23%) Loss: 2.326573 LR: 0.00000847 [16:30:40] Epoch: 1 Batch: 16729/20099 (83.23%) Loss: 2.154121 LR: 0.00000847 [16:30:42] Epoch: 1 Batch: 16730/20099 (83.24%) Loss: 2.051224 LR: 0.00000847 [16:30:44] Epoch: 1 Batch: 16731/20099 (83.24%) Loss: 1.838811 LR: 0.00000846 [16:30:46] Epoch: 1 Batch: 16732/20099 (83.25%) Loss: 2.110795 LR: 0.00000846 [16:30:48] Epoch: 1 Batch: 16733/20099 (83.25%) Loss: 2.162132 LR: 0.00000846 [16:30:49] Epoch: 1 Batch: 16734/20099 (83.26%) Loss: 2.387799 LR: 0.00000846 [16:30:51] Epoch: 1 Batch: 16735/20099 (83.26%) Loss: 2.069029 LR: 0.00000846 [16:30:53] Epoch: 1 Batch: 16736/20099 (83.27%) Loss: 1.947996 LR: 0.00000846 [16:30:55] Epoch: 1 Batch: 16737/20099 (83.27%) Loss: 2.272533 LR: 0.00000846 [16:30:57] Epoch: 1 Batch: 16738/20099 (83.28%) Loss: 2.231332 LR: 0.00000845 [16:30:59] Epoch: 1 Batch: 16739/20099 (83.28%) Loss: 1.725709 LR: 0.00000845 [16:31:01] Epoch: 1 Batch: 16740/20099 (83.29%) Loss: 2.056757 LR: 0.00000845 [16:31:02] Epoch: 1 Batch: 16741/20099 (83.29%) Loss: 1.967078 LR: 0.00000845 [16:31:04] Epoch: 1 Batch: 16742/20099 (83.30%) Loss: 1.892167 LR: 0.00000845 [16:31:06] Epoch: 1 Batch: 16743/20099 (83.30%) Loss: 2.112472 LR: 0.00000845 [16:31:08] Epoch: 1 Batch: 16744/20099 (83.31%) Loss: 2.010365 LR: 0.00000845 [16:31:10] Epoch: 1 Batch: 16745/20099 (83.31%) Loss: 2.186443 LR: 0.00000844 [16:31:12] Epoch: 1 Batch: 16746/20099 (83.32%) Loss: 2.163054 LR: 0.00000844 [16:31:13] 
Epoch: 1 Batch: 16747/20099 (83.32%) Loss: 2.093272 LR: 0.00000844 [16:31:15] Epoch: 1 Batch: 16748/20099 (83.33%) Loss: 2.104272 LR: 0.00000844 [16:31:17] Epoch: 1 Batch: 16749/20099 (83.33%) Loss: 2.233150 LR: 0.00000844 [16:31:19] Epoch: 1 Batch: 16750/20099 (83.34%) Loss: 2.408203 LR: 0.00000844 [16:31:21] Epoch: 1 Batch: 16751/20099 (83.34%) Loss: 1.762165 LR: 0.00000844 [16:31:23] Epoch: 1 Batch: 16752/20099 (83.35%) Loss: 2.066351 LR: 0.00000844 [16:31:25] Epoch: 1 Batch: 16753/20099 (83.35%) Loss: 2.119678 LR: 0.00000844 [16:31:26] Epoch: 1 Batch: 16754/20099 (83.36%) Loss: 2.371763 LR: 0.00000844 [16:31:28] Epoch: 1 Batch: 16755/20099 (83.36%) Loss: 1.937698 LR: 0.00000844 [16:31:30] Epoch: 1 Batch: 16756/20099 (83.37%) Loss: 2.006530 LR: 0.00000844 [16:31:32] Epoch: 1 Batch: 16757/20099 (83.37%) Loss: 1.989969 LR: 0.00000844 [16:31:34] Epoch: 1 Batch: 16758/20099 (83.38%) Loss: 2.179781 LR: 0.00000844 [16:31:36] Epoch: 1 Batch: 16759/20099 (83.38%) Loss: 1.774055 LR: 0.00000843 [16:31:37] Epoch: 1 Batch: 16760/20099 (83.39%) Loss: 2.144992 LR: 0.00000843 [16:31:39] Epoch: 1 Batch: 16761/20099 (83.39%) Loss: 1.934946 LR: 0.00000843 [16:31:41] Epoch: 1 Batch: 16762/20099 (83.40%) Loss: 2.200750 LR: 0.00000843 [16:31:43] Epoch: 1 Batch: 16763/20099 (83.40%) Loss: 1.927474 LR: 0.00000843 [16:31:45] Epoch: 1 Batch: 16764/20099 (83.41%) Loss: 1.737182 LR: 0.00000843 [16:31:47] Epoch: 1 Batch: 16765/20099 (83.41%) Loss: 2.168481 LR: 0.00000843 [16:31:49] Epoch: 1 Batch: 16766/20099 (83.42%) Loss: 1.973350 LR: 0.00000842 [16:31:51] Epoch: 1 Batch: 16767/20099 (83.42%) Loss: 2.100538 LR: 0.00000842 [16:31:52] Epoch: 1 Batch: 16768/20099 (83.43%) Loss: 2.000759 LR: 0.00000842 [16:31:54] Epoch: 1 Batch: 16769/20099 (83.43%) Loss: 2.078461 LR: 0.00000842 [16:31:56] Epoch: 1 Batch: 16770/20099 (83.44%) Loss: 2.377001 LR: 0.00000842 [16:31:58] Epoch: 1 Batch: 16771/20099 (83.44%) Loss: 2.082760 LR: 0.00000842 [16:32:00] Epoch: 1 Batch: 16772/20099 (83.45%) Loss: 
1.696118 LR: 0.00000842 [16:32:02] Epoch: 1 Batch: 16773/20099 (83.45%) Loss: 1.908051 LR: 0.00000841 [16:32:03] Epoch: 1 Batch: 16774/20099 (83.46%) Loss: 2.004529 LR: 0.00000841 [16:32:05] Epoch: 1 Batch: 16775/20099 (83.46%) Loss: 1.916346 LR: 0.00000841 [16:32:07] Epoch: 1 Batch: 16776/20099 (83.47%) Loss: 2.141359 LR: 0.00000841 [16:32:09] Epoch: 1 Batch: 16777/20099 (83.47%) Loss: 2.037600 LR: 0.00000841 [16:32:11] Epoch: 1 Batch: 16778/20099 (83.48%) Loss: 2.013016 LR: 0.00000841 [16:32:13] Epoch: 1 Batch: 16779/20099 (83.48%) Loss: 1.780203 LR: 0.00000841 [16:32:15] Epoch: 1 Batch: 16780/20099 (83.49%) Loss: 2.179477 LR: 0.00000840 [16:32:16] Epoch: 1 Batch: 16781/20099 (83.49%) Loss: 2.392939 LR: 0.00000840 [16:32:18] Epoch: 1 Batch: 16782/20099 (83.50%) Loss: 2.235525 LR: 0.00000840 [16:32:20] Epoch: 1 Batch: 16783/20099 (83.50%) Loss: 2.167224 LR: 0.00000840 [16:32:22] Epoch: 1 Batch: 16784/20099 (83.51%) Loss: 2.122760 LR: 0.00000840 [16:32:24] Epoch: 1 Batch: 16785/20099 (83.51%) Loss: 2.288196 LR: 0.00000840 [16:32:26] Epoch: 1 Batch: 16786/20099 (83.52%) Loss: 2.079150 LR: 0.00000840 [16:32:28] Epoch: 1 Batch: 16787/20099 (83.52%) Loss: 2.091730 LR: 0.00000839 [16:32:29] Epoch: 1 Batch: 16788/20099 (83.53%) Loss: 2.136585 LR: 0.00000839 [16:32:31] Epoch: 1 Batch: 16789/20099 (83.53%) Loss: 2.106469 LR: 0.00000839 [16:32:33] Epoch: 1 Batch: 16790/20099 (83.54%) Loss: 1.985318 LR: 0.00000839 [16:32:35] Epoch: 1 Batch: 16791/20099 (83.54%) Loss: 1.866581 LR: 0.00000839 [16:32:37] Epoch: 1 Batch: 16792/20099 (83.55%) Loss: 2.095175 LR: 0.00000839 [16:32:39] Epoch: 1 Batch: 16793/20099 (83.55%) Loss: 1.824196 LR: 0.00000839 [16:32:41] Epoch: 1 Batch: 16794/20099 (83.56%) Loss: 2.246424 LR: 0.00000838 [16:32:42] Epoch: 1 Batch: 16795/20099 (83.56%) Loss: 2.189766 LR: 0.00000838 [16:32:44] Epoch: 1 Batch: 16796/20099 (83.57%) Loss: 2.035667 LR: 0.00000838 [16:32:46] Epoch: 1 Batch: 16797/20099 (83.57%) Loss: 2.094762 LR: 0.00000838 [16:32:48] Epoch: 1 
Batch: 16798/20099 (83.58%) Loss: 1.892124 LR: 0.00000838 [16:32:50] Epoch: 1 Batch: 16799/20099 (83.58%) Loss: 2.465844 LR: 0.00000838 [16:32:55] >> Cleaned up old temp checkpoint: epoch1_step14800 [16:32:55] >> Temp checkpoint saved: epoch1_step16800, size: 0.1693 GB [16:32:55] Epoch: 1 Batch: 16800/20099 (83.59%) Loss: 1.784863 LR: 0.00000838 [16:32:57] Epoch: 1 Batch: 16801/20099 (83.59%) Loss: 2.340324 LR: 0.00000837 [16:32:59] Epoch: 1 Batch: 16802/20099 (83.60%) Loss: 2.238878 LR: 0.00000837 [16:33:01] Epoch: 1 Batch: 16803/20099 (83.60%) Loss: 2.020341 LR: 0.00000837 [16:33:03] Epoch: 1 Batch: 16804/20099 (83.61%) Loss: 1.807205 LR: 0.00000837 [16:33:05] Epoch: 1 Batch: 16805/20099 (83.61%) Loss: 2.048045 LR: 0.00000837 [16:33:07] Epoch: 1 Batch: 16806/20099 (83.62%) Loss: 2.288436 LR: 0.00000837 [16:33:08] Epoch: 1 Batch: 16807/20099 (83.62%) Loss: 1.890078 LR: 0.00000837 [16:33:10] Epoch: 1 Batch: 16808/20099 (83.63%) Loss: 2.154845 LR: 0.00000836 [16:33:12] Epoch: 1 Batch: 16809/20099 (83.63%) Loss: 1.984071 LR: 0.00000836 [16:33:14] Epoch: 1 Batch: 16810/20099 (83.64%) Loss: 1.910351 LR: 0.00000836 [16:33:16] Epoch: 1 Batch: 16811/20099 (83.64%) Loss: 2.326294 LR: 0.00000836 [16:33:18] Epoch: 1 Batch: 16812/20099 (83.65%) Loss: 2.118865 LR: 0.00000836 [16:33:20] Epoch: 1 Batch: 16813/20099 (83.65%) Loss: 2.012086 LR: 0.00000836 [16:33:21] Epoch: 1 Batch: 16814/20099 (83.66%) Loss: 2.215842 LR: 0.00000836 [16:33:23] Epoch: 1 Batch: 16815/20099 (83.66%) Loss: 1.863859 LR: 0.00000835 [16:33:25] Epoch: 1 Batch: 16816/20099 (83.67%) Loss: 2.080020 LR: 0.00000835 [16:33:27] Epoch: 1 Batch: 16817/20099 (83.67%) Loss: 1.798467 LR: 0.00000835 [16:33:29] Epoch: 1 Batch: 16818/20099 (83.68%) Loss: 2.022755 LR: 0.00000835 [16:33:31] Epoch: 1 Batch: 16819/20099 (83.68%) Loss: 2.186172 LR: 0.00000835 [16:33:33] Epoch: 1 Batch: 16820/20099 (83.69%) Loss: 2.192345 LR: 0.00000835 [16:33:34] Epoch: 1 Batch: 16821/20099 (83.69%) Loss: 1.972157 LR: 0.00000835 [16:33:36] 
Epoch: 1 Batch: 16822/20099 (83.70%) Loss: 2.086529 LR: 0.00000834 [16:33:38] Epoch: 1 Batch: 16823/20099 (83.70%) Loss: 1.952470 LR: 0.00000834 [16:33:40] Epoch: 1 Batch: 16824/20099 (83.71%) Loss: 1.963237 LR: 0.00000834 [16:33:42] Epoch: 1 Batch: 16825/20099 (83.71%) Loss: 2.047339 LR: 0.00000834 [16:33:44] Epoch: 1 Batch: 16826/20099 (83.72%) Loss: 2.113311 LR: 0.00000834 [16:33:46] Epoch: 1 Batch: 16827/20099 (83.72%) Loss: 1.856907 LR: 0.00000834 [16:33:47] Epoch: 1 Batch: 16828/20099 (83.73%) Loss: 1.812660 LR: 0.00000834 [16:33:49] Epoch: 1 Batch: 16829/20099 (83.73%) Loss: 1.991629 LR: 0.00000833 [16:33:51] Epoch: 1 Batch: 16830/20099 (83.74%) Loss: 2.099817 LR: 0.00000833 [16:33:53] Epoch: 1 Batch: 16831/20099 (83.74%) Loss: 1.901524 LR: 0.00000833 [16:33:55] Epoch: 1 Batch: 16832/20099 (83.75%) Loss: 2.231394 LR: 0.00000833 [16:33:57] Epoch: 1 Batch: 16833/20099 (83.75%) Loss: 2.159597 LR: 0.00000833 [16:33:59] Epoch: 1 Batch: 16834/20099 (83.76%) Loss: 1.990955 LR: 0.00000833 [16:34:00] Epoch: 1 Batch: 16835/20099 (83.76%) Loss: 2.282653 LR: 0.00000833 [16:34:02] Epoch: 1 Batch: 16836/20099 (83.77%) Loss: 2.094138 LR: 0.00000832 [16:34:04] Epoch: 1 Batch: 16837/20099 (83.77%) Loss: 2.255760 LR: 0.00000832 [16:34:06] Epoch: 1 Batch: 16838/20099 (83.78%) Loss: 2.026728 LR: 0.00000832 [16:34:08] Epoch: 1 Batch: 16839/20099 (83.78%) Loss: 1.957673 LR: 0.00000832 [16:34:10] Epoch: 1 Batch: 16840/20099 (83.79%) Loss: 2.093330 LR: 0.00000832 [16:34:11] Epoch: 1 Batch: 16841/20099 (83.79%) Loss: 1.900071 LR: 0.00000832 [16:34:13] Epoch: 1 Batch: 16842/20099 (83.80%) Loss: 2.009720 LR: 0.00000832 [16:34:15] Epoch: 1 Batch: 16843/20099 (83.80%) Loss: 1.808057 LR: 0.00000831 [16:34:17] Epoch: 1 Batch: 16844/20099 (83.81%) Loss: 2.067416 LR: 0.00000831 [16:34:19] Epoch: 1 Batch: 16845/20099 (83.81%) Loss: 2.279677 LR: 0.00000831 [16:34:21] Epoch: 1 Batch: 16846/20099 (83.82%) Loss: 1.973707 LR: 0.00000831 [16:34:23] Epoch: 1 Batch: 16847/20099 (83.82%) Loss: 
2.010552 LR: 0.00000831 [16:34:24] Epoch: 1 Batch: 16848/20099 (83.83%) Loss: 2.111931 LR: 0.00000831 [16:34:26] Epoch: 1 Batch: 16849/20099 (83.83%) Loss: 1.724504 LR: 0.00000831 [16:34:28] Epoch: 1 Batch: 16850/20099 (83.84%) Loss: 1.752719 LR: 0.00000830 [16:34:30] Epoch: 1 Batch: 16851/20099 (83.84%) Loss: 1.999775 LR: 0.00000830 [16:34:32] Epoch: 1 Batch: 16852/20099 (83.84%) Loss: 2.251132 LR: 0.00000830 [16:34:34] Epoch: 1 Batch: 16853/20099 (83.85%) Loss: 1.759152 LR: 0.00000830 [16:34:35] Epoch: 1 Batch: 16854/20099 (83.85%) Loss: 2.007309 LR: 0.00000830 [16:34:37] Epoch: 1 Batch: 16855/20099 (83.86%) Loss: 2.010330 LR: 0.00000830 [16:34:39] Epoch: 1 Batch: 16856/20099 (83.86%) Loss: 2.061253 LR: 0.00000830 [16:34:41] Epoch: 1 Batch: 16857/20099 (83.87%) Loss: 2.302803 LR: 0.00000829 [16:34:43] Epoch: 1 Batch: 16858/20099 (83.87%) Loss: 1.950537 LR: 0.00000829 [16:34:45] Epoch: 1 Batch: 16859/20099 (83.88%) Loss: 2.169888 LR: 0.00000829 [16:34:47] Epoch: 1 Batch: 16860/20099 (83.88%) Loss: 1.887194 LR: 0.00000829 [16:34:48] Epoch: 1 Batch: 16861/20099 (83.89%) Loss: 2.528647 LR: 0.00000829 [16:34:50] Epoch: 1 Batch: 16862/20099 (83.89%) Loss: 1.636225 LR: 0.00000829 [16:34:52] Epoch: 1 Batch: 16863/20099 (83.90%) Loss: 2.132779 LR: 0.00000829 [16:34:54] Epoch: 1 Batch: 16864/20099 (83.90%) Loss: 2.193437 LR: 0.00000828 [16:34:56] Epoch: 1 Batch: 16865/20099 (83.91%) Loss: 2.299555 LR: 0.00000828 [16:34:58] Epoch: 1 Batch: 16866/20099 (83.91%) Loss: 2.142249 LR: 0.00000828 [16:35:00] Epoch: 1 Batch: 16867/20099 (83.92%) Loss: 2.150552 LR: 0.00000828 [16:35:02] Epoch: 1 Batch: 16868/20099 (83.92%) Loss: 2.072528 LR: 0.00000828 [16:35:03] Epoch: 1 Batch: 16869/20099 (83.93%) Loss: 2.051984 LR: 0.00000828 [16:35:05] Epoch: 1 Batch: 16870/20099 (83.93%) Loss: 2.140445 LR: 0.00000828 [16:35:07] Epoch: 1 Batch: 16871/20099 (83.94%) Loss: 2.257636 LR: 0.00000827 [16:35:09] Epoch: 1 Batch: 16872/20099 (83.94%) Loss: 1.999200 LR: 0.00000827 [16:35:11] Epoch: 1 
Batch: 16873/20099 (83.95%) Loss: 1.639849 LR: 0.00000827 [16:35:13] Epoch: 1 Batch: 16874/20099 (83.95%) Loss: 1.850664 LR: 0.00000827 [16:35:15] Epoch: 1 Batch: 16875/20099 (83.96%) Loss: 1.932293 LR: 0.00000827 [16:35:16] Epoch: 1 Batch: 16876/20099 (83.96%) Loss: 2.238001 LR: 0.00000827 [16:35:18] Epoch: 1 Batch: 16877/20099 (83.97%) Loss: 2.221350 LR: 0.00000827 [16:35:20] Epoch: 1 Batch: 16878/20099 (83.97%) Loss: 2.241984 LR: 0.00000826 [16:35:22] Epoch: 1 Batch: 16879/20099 (83.98%) Loss: 2.069839 LR: 0.00000826 [16:35:24] Epoch: 1 Batch: 16880/20099 (83.98%) Loss: 2.351850 LR: 0.00000826 [16:35:26] Epoch: 1 Batch: 16881/20099 (83.99%) Loss: 2.362052 LR: 0.00000826 [16:35:28] Epoch: 1 Batch: 16882/20099 (83.99%) Loss: 2.038764 LR: 0.00000826 [16:35:29] Epoch: 1 Batch: 16883/20099 (84.00%) Loss: 1.991608 LR: 0.00000826 [16:35:31] Epoch: 1 Batch: 16884/20099 (84.00%) Loss: 2.257091 LR: 0.00000826 [16:35:33] Epoch: 1 Batch: 16885/20099 (84.01%) Loss: 1.986232 LR: 0.00000825 [16:35:35] Epoch: 1 Batch: 16886/20099 (84.01%) Loss: 2.005443 LR: 0.00000825 [16:35:37] Epoch: 1 Batch: 16887/20099 (84.02%) Loss: 2.043661 LR: 0.00000825 [16:35:39] Epoch: 1 Batch: 16888/20099 (84.02%) Loss: 2.307811 LR: 0.00000825 [16:35:41] Epoch: 1 Batch: 16889/20099 (84.03%) Loss: 2.150407 LR: 0.00000825 [16:35:42] Epoch: 1 Batch: 16890/20099 (84.03%) Loss: 2.138336 LR: 0.00000825 [16:35:44] Epoch: 1 Batch: 16891/20099 (84.04%) Loss: 2.147258 LR: 0.00000825 [16:35:46] Epoch: 1 Batch: 16892/20099 (84.04%) Loss: 2.207033 LR: 0.00000824 [16:35:48] Epoch: 1 Batch: 16893/20099 (84.05%) Loss: 1.876033 LR: 0.00000824 [16:35:50] Epoch: 1 Batch: 16894/20099 (84.05%) Loss: 1.905821 LR: 0.00000824 [16:35:52] Epoch: 1 Batch: 16895/20099 (84.06%) Loss: 1.873228 LR: 0.00000824 [16:35:53] Epoch: 1 Batch: 16896/20099 (84.06%) Loss: 2.125677 LR: 0.00000824 [16:35:55] Epoch: 1 Batch: 16897/20099 (84.07%) Loss: 2.106588 LR: 0.00000824 [16:35:57] Epoch: 1 Batch: 16898/20099 (84.07%) Loss: 1.917327 LR: 
0.00000824 [16:35:59] Epoch: 1 Batch: 16899/20099 (84.08%) Loss: 2.020635 LR: 0.00000823 [16:36:01] Epoch: 1 Batch: 16900/20099 (84.08%) Loss: 2.200465 LR: 0.00000823 [16:36:03] Epoch: 1 Batch: 16901/20099 (84.09%) Loss: 2.360320 LR: 0.00000823 [16:36:05] Epoch: 1 Batch: 16902/20099 (84.09%) Loss: 1.702943 LR: 0.00000823 [16:36:06] Epoch: 1 Batch: 16903/20099 (84.10%) Loss: 2.170109 LR: 0.00000823 [16:36:08] Epoch: 1 Batch: 16904/20099 (84.10%) Loss: 2.277708 LR: 0.00000823 [16:36:10] Epoch: 1 Batch: 16905/20099 (84.11%) Loss: 1.803392 LR: 0.00000823 [16:36:12] Epoch: 1 Batch: 16906/20099 (84.11%) Loss: 1.981362 LR: 0.00000822 [16:36:14] Epoch: 1 Batch: 16907/20099 (84.12%) Loss: 2.203875 LR: 0.00000822 [16:36:16] Epoch: 1 Batch: 16908/20099 (84.12%) Loss: 2.002357 LR: 0.00000822 [16:36:18] Epoch: 1 Batch: 16909/20099 (84.13%) Loss: 1.837436 LR: 0.00000822 [16:36:19] Epoch: 1 Batch: 16910/20099 (84.13%) Loss: 2.062276 LR: 0.00000822 [16:36:21] Epoch: 1 Batch: 16911/20099 (84.14%) Loss: 2.342928 LR: 0.00000822 [16:36:23] Epoch: 1 Batch: 16912/20099 (84.14%) Loss: 1.974358 LR: 0.00000822 [16:36:25] Epoch: 1 Batch: 16913/20099 (84.15%) Loss: 2.200863 LR: 0.00000821 [16:36:27] Epoch: 1 Batch: 16914/20099 (84.15%) Loss: 2.112032 LR: 0.00000821 [16:36:29] Epoch: 1 Batch: 16915/20099 (84.16%) Loss: 2.308681 LR: 0.00000821 [16:36:30] Epoch: 1 Batch: 16916/20099 (84.16%) Loss: 1.731260 LR: 0.00000821 [16:36:32] Epoch: 1 Batch: 16917/20099 (84.17%) Loss: 2.104476 LR: 0.00000821 [16:36:34] Epoch: 1 Batch: 16918/20099 (84.17%) Loss: 2.085897 LR: 0.00000821 [16:36:36] Epoch: 1 Batch: 16919/20099 (84.18%) Loss: 1.968170 LR: 0.00000821 [16:36:38] Epoch: 1 Batch: 16920/20099 (84.18%) Loss: 2.192598 LR: 0.00000820 [16:36:40] Epoch: 1 Batch: 16921/20099 (84.19%) Loss: 1.791328 LR: 0.00000820 [16:36:42] Epoch: 1 Batch: 16922/20099 (84.19%) Loss: 1.703125 LR: 0.00000820 [16:36:43] Epoch: 1 Batch: 16923/20099 (84.20%) Loss: 2.177454 LR: 0.00000820 [16:36:45] Epoch: 1 Batch: 16924/20099 
(84.20%) Loss: 2.228979 LR: 0.00000820 [16:36:47] Epoch: 1 Batch: 16925/20099 (84.21%) Loss: 1.936711 LR: 0.00000820 [16:36:49] Epoch: 1 Batch: 16926/20099 (84.21%) Loss: 2.053221 LR: 0.00000820 [16:36:51] Epoch: 1 Batch: 16927/20099 (84.22%) Loss: 2.229975 LR: 0.00000820 [16:36:53] Epoch: 1 Batch: 16928/20099 (84.22%) Loss: 1.690549 LR: 0.00000820 [16:36:54] Epoch: 1 Batch: 16929/20099 (84.23%) Loss: 1.814277 LR: 0.00000820 [16:36:56] Epoch: 1 Batch: 16930/20099 (84.23%) Loss: 1.886759 LR: 0.00000820 [16:36:58] Epoch: 1 Batch: 16931/20099 (84.24%) Loss: 2.026542 LR: 0.00000820 [16:37:00] Epoch: 1 Batch: 16932/20099 (84.24%) Loss: 2.192698 LR: 0.00000820 [16:37:02] Epoch: 1 Batch: 16933/20099 (84.25%) Loss: 2.245118 LR: 0.00000820 [16:37:04] Epoch: 1 Batch: 16934/20099 (84.25%) Loss: 2.169201 LR: 0.00000819 [16:37:06] Epoch: 1 Batch: 16935/20099 (84.26%) Loss: 1.918150 LR: 0.00000819 [16:37:07] Epoch: 1 Batch: 16936/20099 (84.26%) Loss: 2.087108 LR: 0.00000819 [16:37:09] Epoch: 1 Batch: 16937/20099 (84.27%) Loss: 2.235621 LR: 0.00000819 [16:37:11] Epoch: 1 Batch: 16938/20099 (84.27%) Loss: 2.171145 LR: 0.00000819 [16:37:13] Epoch: 1 Batch: 16939/20099 (84.28%) Loss: 1.998971 LR: 0.00000819 [16:37:15] Epoch: 1 Batch: 16940/20099 (84.28%) Loss: 1.929785 LR: 0.00000819 [16:37:17] Epoch: 1 Batch: 16941/20099 (84.29%) Loss: 2.258292 LR: 0.00000818 [16:37:19] Epoch: 1 Batch: 16942/20099 (84.29%) Loss: 2.068955 LR: 0.00000818 [16:37:20] Epoch: 1 Batch: 16943/20099 (84.30%) Loss: 2.011949 LR: 0.00000818 [16:37:22] Epoch: 1 Batch: 16944/20099 (84.30%) Loss: 2.187311 LR: 0.00000818 [16:37:24] Epoch: 1 Batch: 16945/20099 (84.31%) Loss: 1.573499 LR: 0.00000818 [16:37:26] Epoch: 1 Batch: 16946/20099 (84.31%) Loss: 2.176100 LR: 0.00000818 [16:37:28] Epoch: 1 Batch: 16947/20099 (84.32%) Loss: 2.445177 LR: 0.00000818 [16:37:30] Epoch: 1 Batch: 16948/20099 (84.32%) Loss: 2.114449 LR: 0.00000817 [16:37:31] Epoch: 1 Batch: 16949/20099 (84.33%) Loss: 1.838074 LR: 0.00000817 [16:37:33] 
Epoch: 1 Batch: 16950/20099 (84.33%) Loss: 2.123845 LR: 0.00000817 [16:37:35] Epoch: 1 Batch: 16951/20099 (84.34%) Loss: 2.141633 LR: 0.00000817 [16:37:37] Epoch: 1 Batch: 16952/20099 (84.34%) Loss: 1.640726 LR: 0.00000817 [16:37:39] Epoch: 1 Batch: 16953/20099 (84.35%) Loss: 2.508094 LR: 0.00000817 [16:37:41] Epoch: 1 Batch: 16954/20099 (84.35%) Loss: 2.346501 LR: 0.00000817 [16:37:43] Epoch: 1 Batch: 16955/20099 (84.36%) Loss: 2.356726 LR: 0.00000816 [16:37:44] Epoch: 1 Batch: 16956/20099 (84.36%) Loss: 2.081290 LR: 0.00000816 [16:37:46] Epoch: 1 Batch: 16957/20099 (84.37%) Loss: 2.198065 LR: 0.00000816 [16:37:48] Epoch: 1 Batch: 16958/20099 (84.37%) Loss: 2.266855 LR: 0.00000816 [16:37:50] Epoch: 1 Batch: 16959/20099 (84.38%) Loss: 1.977073 LR: 0.00000816 [16:37:52] Epoch: 1 Batch: 16960/20099 (84.38%) Loss: 2.138112 LR: 0.00000816 [16:37:54] Epoch: 1 Batch: 16961/20099 (84.39%) Loss: 1.947053 LR: 0.00000816 [16:37:56] Epoch: 1 Batch: 16962/20099 (84.39%) Loss: 2.025668 LR: 0.00000815 [16:37:57] Epoch: 1 Batch: 16963/20099 (84.40%) Loss: 2.123389 LR: 0.00000815 [16:37:59] Epoch: 1 Batch: 16964/20099 (84.40%) Loss: 2.102684 LR: 0.00000815 [16:38:01] Epoch: 1 Batch: 16965/20099 (84.41%) Loss: 2.156146 LR: 0.00000815 [16:38:03] Epoch: 1 Batch: 16966/20099 (84.41%) Loss: 2.182614 LR: 0.00000815 [16:38:05] Epoch: 1 Batch: 16967/20099 (84.42%) Loss: 2.149832 LR: 0.00000815 [16:38:07] Epoch: 1 Batch: 16968/20099 (84.42%) Loss: 1.787732 LR: 0.00000815 [16:38:09] Epoch: 1 Batch: 16969/20099 (84.43%) Loss: 1.978180 LR: 0.00000814 [16:38:10] Epoch: 1 Batch: 16970/20099 (84.43%) Loss: 1.799998 LR: 0.00000814 [16:38:12] Epoch: 1 Batch: 16971/20099 (84.44%) Loss: 1.943870 LR: 0.00000814 [16:38:14] Epoch: 1 Batch: 16972/20099 (84.44%) Loss: 2.210814 LR: 0.00000814 [16:38:16] Epoch: 1 Batch: 16973/20099 (84.45%) Loss: 2.219488 LR: 0.00000814 [16:38:18] Epoch: 1 Batch: 16974/20099 (84.45%) Loss: 2.012368 LR: 0.00000814 [16:38:20] Epoch: 1 Batch: 16975/20099 (84.46%) Loss: 
1.894328 LR: 0.00000814 [16:38:22] Epoch: 1 Batch: 16976/20099 (84.46%) Loss: 1.633489 LR: 0.00000813 [16:38:23] Epoch: 1 Batch: 16977/20099 (84.47%) Loss: 2.245471 LR: 0.00000813 [16:38:25] Epoch: 1 Batch: 16978/20099 (84.47%) Loss: 2.307497 LR: 0.00000813 [16:38:27] Epoch: 1 Batch: 16979/20099 (84.48%) Loss: 2.178915 LR: 0.00000813 [16:38:29] Epoch: 1 Batch: 16980/20099 (84.48%) Loss: 1.981733 LR: 0.00000813 [16:38:31] Epoch: 1 Batch: 16981/20099 (84.49%) Loss: 1.898773 LR: 0.00000813 [16:38:33] Epoch: 1 Batch: 16982/20099 (84.49%) Loss: 1.922602 LR: 0.00000813 [16:38:35] Epoch: 1 Batch: 16983/20099 (84.50%) Loss: 2.043898 LR: 0.00000812 [16:38:36] Epoch: 1 Batch: 16984/20099 (84.50%) Loss: 2.275217 LR: 0.00000812 [16:38:38] Epoch: 1 Batch: 16985/20099 (84.51%) Loss: 2.183689 LR: 0.00000812 [16:38:40] Epoch: 1 Batch: 16986/20099 (84.51%) Loss: 2.035990 LR: 0.00000812 [16:38:42] Epoch: 1 Batch: 16987/20099 (84.52%) Loss: 1.878629 LR: 0.00000812 [16:38:44] Epoch: 1 Batch: 16988/20099 (84.52%) Loss: 1.804050 LR: 0.00000812 [16:38:46] Epoch: 1 Batch: 16989/20099 (84.53%) Loss: 2.214268 LR: 0.00000812 [16:38:48] Epoch: 1 Batch: 16990/20099 (84.53%) Loss: 2.285652 LR: 0.00000811 [16:38:49] Epoch: 1 Batch: 16991/20099 (84.54%) Loss: 1.891377 LR: 0.00000811 [16:38:51] Epoch: 1 Batch: 16992/20099 (84.54%) Loss: 2.319561 LR: 0.00000811 [16:38:53] Epoch: 1 Batch: 16993/20099 (84.55%) Loss: 2.112476 LR: 0.00000811 [16:38:55] Epoch: 1 Batch: 16994/20099 (84.55%) Loss: 2.204563 LR: 0.00000811 [16:38:57] Epoch: 1 Batch: 16995/20099 (84.56%) Loss: 1.818583 LR: 0.00000811 [16:38:59] Epoch: 1 Batch: 16996/20099 (84.56%) Loss: 1.988327 LR: 0.00000811 [16:39:01] Epoch: 1 Batch: 16997/20099 (84.57%) Loss: 2.262294 LR: 0.00000810 [16:39:02] Epoch: 1 Batch: 16998/20099 (84.57%) Loss: 2.034229 LR: 0.00000810 [16:39:04] Epoch: 1 Batch: 16999/20099 (84.58%) Loss: 1.943547 LR: 0.00000810 [16:39:06] >> Evaluating batch 0 [16:39:07] >> Evaluating batch 1 [16:39:08] >> Evaluating batch 2 
[16:39:09] >> Evaluating batch 3 [16:39:10] >> Evaluating batch 4 [16:39:12] >> Evaluating batch 5 [16:39:13] >> Evaluating batch 6 [16:39:14] >> Evaluating batch 7 [16:39:15] >> Evaluating batch 8 [16:39:16] >> Evaluating batch 9 [16:39:17] >> Evaluating batch 10 [16:39:18] >> Evaluating batch 11 [16:39:19] >> Evaluating batch 12 [16:39:20] >> Evaluating batch 13 [16:39:21] >> Evaluating batch 14 [16:39:22] >> Evaluating batch 15 [16:39:23] >> Evaluating batch 16 [16:39:23] Epoch: 1 Step: 17000/20099 Evaluation: [16:39:23] Avg Loss Since Last Eval: 2.0804 Val Loss: 2.1469 Validation loss delta: -0.0009 Perplexity: 8.5584 LR: 0.00000810 [16:39:27] >> Cleaned up old temp checkpoint: epoch1_step15000 [16:39:27] >> Temp checkpoint saved: epoch1_step17000, size: 0.1693 GB [16:39:31] >> Checkpoint saved: epoch1_step17000, size: 0.1693 GB [16:39:31] Epoch: 1 Batch: 17000/20099 (84.58%) Loss: 2.191501 LR: 0.00000810 [16:39:32] Epoch: 1 Batch: 17001/20099 (84.59%) Loss: 2.395800 LR: 0.00000810 [16:39:34] Epoch: 1 Batch: 17002/20099 (84.59%) Loss: 1.995746 LR: 0.00000810 [16:39:36] Epoch: 1 Batch: 17003/20099 (84.60%) Loss: 2.097784 LR: 0.00000810 [16:39:38] Epoch: 1 Batch: 17004/20099 (84.60%) Loss: 2.210976 LR: 0.00000809 [16:39:40] Epoch: 1 Batch: 17005/20099 (84.61%) Loss: 1.958822 LR: 0.00000809 [16:39:42] Epoch: 1 Batch: 17006/20099 (84.61%) Loss: 2.267553 LR: 0.00000809 [16:39:43] Epoch: 1 Batch: 17007/20099 (84.62%) Loss: 1.867942 LR: 0.00000809 [16:39:45] Epoch: 1 Batch: 17008/20099 (84.62%) Loss: 2.424419 LR: 0.00000809 [16:39:47] Epoch: 1 Batch: 17009/20099 (84.63%) Loss: 2.219309 LR: 0.00000809 [16:39:49] Epoch: 1 Batch: 17010/20099 (84.63%) Loss: 1.764623 LR: 0.00000809 [16:39:51] Epoch: 1 Batch: 17011/20099 (84.64%) Loss: 1.998904 LR: 0.00000808 [16:39:53] Epoch: 1 Batch: 17012/20099 (84.64%) Loss: 2.387442 LR: 0.00000808 [16:39:55] Epoch: 1 Batch: 17013/20099 (84.65%) Loss: 1.869690 LR: 0.00000808 [16:39:57] Epoch: 1 Batch: 17014/20099 (84.65%) Loss: 
1.603001 LR: 0.00000808 [16:39:59] Epoch: 1 Batch: 17015/20099 (84.66%) Loss: 2.142349 LR: 0.00000808 [16:40:01] Epoch: 1 Batch: 17016/20099 (84.66%) Loss: 1.916685 LR: 0.00000808 [16:40:02] Epoch: 1 Batch: 17017/20099 (84.67%) Loss: 2.317330 LR: 0.00000808 [16:40:04] Epoch: 1 Batch: 17018/20099 (84.67%) Loss: 2.166361 LR: 0.00000808 [16:40:06] Epoch: 1 Batch: 17019/20099 (84.68%) Loss: 2.210110 LR: 0.00000808 [16:40:08] Epoch: 1 Batch: 17020/20099 (84.68%) Loss: 1.822191 LR: 0.00000808 [16:40:10] Epoch: 1 Batch: 17021/20099 (84.69%) Loss: 2.090557 LR: 0.00000808 [16:40:12] Epoch: 1 Batch: 17022/20099 (84.69%) Loss: 2.224681 LR: 0.00000808 [16:40:14] Epoch: 1 Batch: 17023/20099 (84.70%) Loss: 2.099369 LR: 0.00000808 [16:40:16] Epoch: 1 Batch: 17024/20099 (84.70%) Loss: 2.002857 LR: 0.00000808 [16:40:17] Epoch: 1 Batch: 17025/20099 (84.71%) Loss: 1.935846 LR: 0.00000807 [16:40:19] Epoch: 1 Batch: 17026/20099 (84.71%) Loss: 1.808426 LR: 0.00000807 [16:40:21] Epoch: 1 Batch: 17027/20099 (84.72%) Loss: 2.327192 LR: 0.00000807 [16:40:23] Epoch: 1 Batch: 17028/20099 (84.72%) Loss: 2.161461 LR: 0.00000807 [16:40:25] Epoch: 1 Batch: 17029/20099 (84.73%) Loss: 2.257541 LR: 0.00000807 [16:40:27] Epoch: 1 Batch: 17030/20099 (84.73%) Loss: 1.998036 LR: 0.00000807 [16:40:28] Epoch: 1 Batch: 17031/20099 (84.74%) Loss: 2.404791 LR: 0.00000807 [16:40:30] Epoch: 1 Batch: 17032/20099 (84.74%) Loss: 2.379539 LR: 0.00000806 [16:40:32] Epoch: 1 Batch: 17033/20099 (84.75%) Loss: 1.970291 LR: 0.00000806 [16:40:34] Epoch: 1 Batch: 17034/20099 (84.75%) Loss: 2.261286 LR: 0.00000806 [16:40:36] Epoch: 1 Batch: 17035/20099 (84.76%) Loss: 2.050778 LR: 0.00000806 [16:40:38] Epoch: 1 Batch: 17036/20099 (84.76%) Loss: 2.045305 LR: 0.00000806 [16:40:40] Epoch: 1 Batch: 17037/20099 (84.77%) Loss: 2.335869 LR: 0.00000806 [16:40:41] Epoch: 1 Batch: 17038/20099 (84.77%) Loss: 2.319338 LR: 0.00000806 [16:40:43] Epoch: 1 Batch: 17039/20099 (84.78%) Loss: 2.086765 LR: 0.00000805 [16:40:45] Epoch: 1 
Batch: 17040/20099 (84.78%) Loss: 2.159406 LR: 0.00000805 [16:40:47] Epoch: 1 Batch: 17041/20099 (84.79%) Loss: 2.155032 LR: 0.00000805 [16:40:49] Epoch: 1 Batch: 17042/20099 (84.79%) Loss: 2.356128 LR: 0.00000805 [16:40:51] Epoch: 1 Batch: 17043/20099 (84.80%) Loss: 2.009349 LR: 0.00000805 [16:40:52] Epoch: 1 Batch: 17044/20099 (84.80%) Loss: 1.917074 LR: 0.00000805 [16:40:54] Epoch: 1 Batch: 17045/20099 (84.81%) Loss: 2.294897 LR: 0.00000805 [16:40:56] Epoch: 1 Batch: 17046/20099 (84.81%) Loss: 2.063482 LR: 0.00000804 [16:40:58] Epoch: 1 Batch: 17047/20099 (84.82%) Loss: 2.460309 LR: 0.00000804 [16:41:00] Epoch: 1 Batch: 17048/20099 (84.82%) Loss: 1.886720 LR: 0.00000804 [16:41:02] Epoch: 1 Batch: 17049/20099 (84.83%) Loss: 2.090027 LR: 0.00000804 [16:41:03] Epoch: 1 Batch: 17050/20099 (84.83%) Loss: 1.806859 LR: 0.00000804 [16:41:05] Epoch: 1 Batch: 17051/20099 (84.84%) Loss: 1.931793 LR: 0.00000804 [16:41:07] Epoch: 1 Batch: 17052/20099 (84.84%) Loss: 2.059423 LR: 0.00000804 [16:41:09] Epoch: 1 Batch: 17053/20099 (84.85%) Loss: 2.026283 LR: 0.00000803 [16:41:11] Epoch: 1 Batch: 17054/20099 (84.85%) Loss: 1.881199 LR: 0.00000803 [16:41:13] Epoch: 1 Batch: 17055/20099 (84.85%) Loss: 2.082657 LR: 0.00000803 [16:41:15] Epoch: 1 Batch: 17056/20099 (84.86%) Loss: 1.911495 LR: 0.00000803 [16:41:16] Epoch: 1 Batch: 17057/20099 (84.86%) Loss: 1.916013 LR: 0.00000803 [16:41:18] Epoch: 1 Batch: 17058/20099 (84.87%) Loss: 2.184204 LR: 0.00000803 [16:41:20] Epoch: 1 Batch: 17059/20099 (84.87%) Loss: 1.856809 LR: 0.00000803 [16:41:22] Epoch: 1 Batch: 17060/20099 (84.88%) Loss: 2.080236 LR: 0.00000802 [16:41:24] Epoch: 1 Batch: 17061/20099 (84.88%) Loss: 1.960490 LR: 0.00000802 [16:41:26] Epoch: 1 Batch: 17062/20099 (84.89%) Loss: 1.995957 LR: 0.00000802 [16:41:28] Epoch: 1 Batch: 17063/20099 (84.89%) Loss: 2.031104 LR: 0.00000802 [16:41:30] Epoch: 1 Batch: 17064/20099 (84.90%) Loss: 2.029749 LR: 0.00000802 [16:41:31] Epoch: 1 Batch: 17065/20099 (84.90%) Loss: 2.029975 LR: 
0.00000802 [16:41:33] Epoch: 1 Batch: 17066/20099 (84.91%) Loss: 2.349007 LR: 0.00000802 [16:41:35] Epoch: 1 Batch: 17067/20099 (84.91%) Loss: 2.254635 LR: 0.00000801 [16:41:37] Epoch: 1 Batch: 17068/20099 (84.92%) Loss: 2.012547 LR: 0.00000801 [16:41:39] Epoch: 1 Batch: 17069/20099 (84.92%) Loss: 2.081901 LR: 0.00000801 [16:41:41] Epoch: 1 Batch: 17070/20099 (84.93%) Loss: 2.240936 LR: 0.00000801 [16:41:43] Epoch: 1 Batch: 17071/20099 (84.93%) Loss: 2.137130 LR: 0.00000801 [16:41:44] Epoch: 1 Batch: 17072/20099 (84.94%) Loss: 2.011303 LR: 0.00000801 [16:41:46] Epoch: 1 Batch: 17073/20099 (84.94%) Loss: 2.115493 LR: 0.00000801 [16:41:48] Epoch: 1 Batch: 17074/20099 (84.95%) Loss: 1.715613 LR: 0.00000800 [16:41:50] Epoch: 1 Batch: 17075/20099 (84.95%) Loss: 2.100876 LR: 0.00000800 [16:41:52] Epoch: 1 Batch: 17076/20099 (84.96%) Loss: 2.018236 LR: 0.00000800 [16:41:54] Epoch: 1 Batch: 17077/20099 (84.96%) Loss: 1.568634 LR: 0.00000800 [16:41:55] Epoch: 1 Batch: 17078/20099 (84.97%) Loss: 2.210957 LR: 0.00000800 [16:41:57] Epoch: 1 Batch: 17079/20099 (84.97%) Loss: 1.859042 LR: 0.00000800 [16:41:59] Epoch: 1 Batch: 17080/20099 (84.98%) Loss: 1.955690 LR: 0.00000800 [16:42:01] Epoch: 1 Batch: 17081/20099 (84.98%) Loss: 2.244576 LR: 0.00000799 [16:42:03] Epoch: 1 Batch: 17082/20099 (84.99%) Loss: 1.886267 LR: 0.00000799 [16:42:05] Epoch: 1 Batch: 17083/20099 (84.99%) Loss: 1.767527 LR: 0.00000799 [16:42:07] Epoch: 1 Batch: 17084/20099 (85.00%) Loss: 1.959054 LR: 0.00000799 [16:42:08] Epoch: 1 Batch: 17085/20099 (85.00%) Loss: 1.747660 LR: 0.00000799 [16:42:10] Epoch: 1 Batch: 17086/20099 (85.01%) Loss: 2.083838 LR: 0.00000799 [16:42:12] Epoch: 1 Batch: 17087/20099 (85.01%) Loss: 2.039367 LR: 0.00000799 [16:42:14] Epoch: 1 Batch: 17088/20099 (85.02%) Loss: 1.610425 LR: 0.00000798 [16:42:16] Epoch: 1 Batch: 17089/20099 (85.02%) Loss: 1.924597 LR: 0.00000798 [16:42:18] Epoch: 1 Batch: 17090/20099 (85.03%) Loss: 2.033648 LR: 0.00000798 [16:42:19] Epoch: 1 Batch: 17091/20099 
(85.03%) Loss: 2.048092 LR: 0.00000798 [16:42:21] Epoch: 1 Batch: 17092/20099 (85.04%) Loss: 2.030400 LR: 0.00000798 [16:42:23] Epoch: 1 Batch: 17093/20099 (85.04%) Loss: 2.186990 LR: 0.00000798 [16:42:25] Epoch: 1 Batch: 17094/20099 (85.05%) Loss: 2.370811 LR: 0.00000798 [16:42:27] Epoch: 1 Batch: 17095/20099 (85.05%) Loss: 1.966318 LR: 0.00000798 [16:42:29] Epoch: 1 Batch: 17096/20099 (85.06%) Loss: 2.227926 LR: 0.00000798 [16:42:31] Epoch: 1 Batch: 17097/20099 (85.06%) Loss: 1.970612 LR: 0.00000798 [16:42:32] Epoch: 1 Batch: 17098/20099 (85.07%) Loss: 1.880152 LR: 0.00000798 [16:42:34] Epoch: 1 Batch: 17099/20099 (85.07%) Loss: 2.285702 LR: 0.00000798 [16:42:36] Epoch: 1 Batch: 17100/20099 (85.08%) Loss: 1.950794 LR: 0.00000798 [16:42:38] Epoch: 1 Batch: 17101/20099 (85.08%) Loss: 2.210974 LR: 0.00000798 [16:42:40] Epoch: 1 Batch: 17102/20099 (85.09%) Loss: 2.111454 LR: 0.00000797 [16:42:42] Epoch: 1 Batch: 17103/20099 (85.09%) Loss: 2.065188 LR: 0.00000797 [16:42:44] Epoch: 1 Batch: 17104/20099 (85.10%) Loss: 2.137998 LR: 0.00000797 [16:42:45] Epoch: 1 Batch: 17105/20099 (85.10%) Loss: 2.428985 LR: 0.00000797 [16:42:47] Epoch: 1 Batch: 17106/20099 (85.11%) Loss: 2.040183 LR: 0.00000797 [16:42:49] Epoch: 1 Batch: 17107/20099 (85.11%) Loss: 2.357509 LR: 0.00000797 [16:42:51] Epoch: 1 Batch: 17108/20099 (85.12%) Loss: 2.183233 LR: 0.00000797 [16:42:53] Epoch: 1 Batch: 17109/20099 (85.12%) Loss: 2.294452 LR: 0.00000796 [16:42:55] Epoch: 1 Batch: 17110/20099 (85.13%) Loss: 2.289980 LR: 0.00000796 [16:42:57] Epoch: 1 Batch: 17111/20099 (85.13%) Loss: 2.086342 LR: 0.00000796 [16:42:58] Epoch: 1 Batch: 17112/20099 (85.14%) Loss: 1.994547 LR: 0.00000796 [16:43:00] Epoch: 1 Batch: 17113/20099 (85.14%) Loss: 2.055961 LR: 0.00000796 [16:43:02] Epoch: 1 Batch: 17114/20099 (85.15%) Loss: 1.846628 LR: 0.00000796 [16:43:04] Epoch: 1 Batch: 17115/20099 (85.15%) Loss: 1.792512 LR: 0.00000796 [16:43:06] Epoch: 1 Batch: 17116/20099 (85.16%) Loss: 2.193355 LR: 0.00000795 [16:43:08] 
Epoch: 1 Batch: 17117/20099 (85.16%) Loss: 2.068602 LR: 0.00000795 [16:43:10] Epoch: 1 Batch: 17118/20099 (85.17%) Loss: 1.911580 LR: 0.00000795 [16:43:11] Epoch: 1 Batch: 17119/20099 (85.17%) Loss: 2.128372 LR: 0.00000795 [16:43:13] Epoch: 1 Batch: 17120/20099 (85.18%) Loss: 1.848773 LR: 0.00000795 [16:43:15] Epoch: 1 Batch: 17121/20099 (85.18%) Loss: 1.808911 LR: 0.00000795 [16:43:17] Epoch: 1 Batch: 17122/20099 (85.19%) Loss: 1.705184 LR: 0.00000795 [16:43:19] Epoch: 1 Batch: 17123/20099 (85.19%) Loss: 2.145293 LR: 0.00000794 [16:43:21] Epoch: 1 Batch: 17124/20099 (85.20%) Loss: 2.040301 LR: 0.00000794 [16:43:23] Epoch: 1 Batch: 17125/20099 (85.20%) Loss: 2.028472 LR: 0.00000794 [16:43:24] Epoch: 1 Batch: 17126/20099 (85.21%) Loss: 2.176286 LR: 0.00000794 [16:43:26] Epoch: 1 Batch: 17127/20099 (85.21%) Loss: 2.128289 LR: 0.00000794 [16:43:28] Epoch: 1 Batch: 17128/20099 (85.22%) Loss: 1.957259 LR: 0.00000794 [16:43:30] Epoch: 1 Batch: 17129/20099 (85.22%) Loss: 1.907363 LR: 0.00000794 [16:43:32] Epoch: 1 Batch: 17130/20099 (85.23%) Loss: 2.096828 LR: 0.00000793 [16:43:34] Epoch: 1 Batch: 17131/20099 (85.23%) Loss: 2.181234 LR: 0.00000793 [16:43:35] Epoch: 1 Batch: 17132/20099 (85.24%) Loss: 2.112146 LR: 0.00000793 [16:43:37] Epoch: 1 Batch: 17133/20099 (85.24%) Loss: 2.083989 LR: 0.00000793 [16:43:39] Epoch: 1 Batch: 17134/20099 (85.25%) Loss: 2.124143 LR: 0.00000793 [16:43:41] Epoch: 1 Batch: 17135/20099 (85.25%) Loss: 2.334980 LR: 0.00000793 [16:43:43] Epoch: 1 Batch: 17136/20099 (85.26%) Loss: 2.074976 LR: 0.00000793 [16:43:45] Epoch: 1 Batch: 17137/20099 (85.26%) Loss: 1.917490 LR: 0.00000792 [16:43:47] Epoch: 1 Batch: 17138/20099 (85.27%) Loss: 1.915757 LR: 0.00000792 [16:43:48] Epoch: 1 Batch: 17139/20099 (85.27%) Loss: 2.025529 LR: 0.00000792 [16:43:50] Epoch: 1 Batch: 17140/20099 (85.28%) Loss: 2.124703 LR: 0.00000792 [16:43:52] Epoch: 1 Batch: 17141/20099 (85.28%) Loss: 2.383836 LR: 0.00000792 [16:43:54] Epoch: 1 Batch: 17142/20099 (85.29%) Loss: 
2.215561 LR: 0.00000792 [16:43:56] Epoch: 1 Batch: 17143/20099 (85.29%) Loss: 1.989034 LR: 0.00000792 [16:43:58] Epoch: 1 Batch: 17144/20099 (85.30%) Loss: 2.036236 LR: 0.00000791 [16:44:00] Epoch: 1 Batch: 17145/20099 (85.30%) Loss: 1.826297 LR: 0.00000791 [16:44:01] Epoch: 1 Batch: 17146/20099 (85.31%) Loss: 1.941903 LR: 0.00000791 [16:44:03] Epoch: 1 Batch: 17147/20099 (85.31%) Loss: 2.217383 LR: 0.00000791 [16:44:05] Epoch: 1 Batch: 17148/20099 (85.32%) Loss: 2.159658 LR: 0.00000791 [16:44:07] Epoch: 1 Batch: 17149/20099 (85.32%) Loss: 2.406987 LR: 0.00000791 [16:44:09] Epoch: 1 Batch: 17150/20099 (85.33%) Loss: 1.932494 LR: 0.00000791 [16:44:11] Epoch: 1 Batch: 17151/20099 (85.33%) Loss: 1.918618 LR: 0.00000790 [16:44:13] Epoch: 1 Batch: 17152/20099 (85.34%) Loss: 2.149009 LR: 0.00000790 [16:44:15] Epoch: 1 Batch: 17153/20099 (85.34%) Loss: 2.128085 LR: 0.00000790 [16:44:16] Epoch: 1 Batch: 17154/20099 (85.35%) Loss: 1.992927 LR: 0.00000790 [16:44:18] Epoch: 1 Batch: 17155/20099 (85.35%) Loss: 2.302804 LR: 0.00000790 [16:44:20] Epoch: 1 Batch: 17156/20099 (85.36%) Loss: 2.302409 LR: 0.00000790 [16:44:22] Epoch: 1 Batch: 17157/20099 (85.36%) Loss: 2.458788 LR: 0.00000790 [16:44:24] Epoch: 1 Batch: 17158/20099 (85.37%) Loss: 2.335850 LR: 0.00000790 [16:44:26] Epoch: 1 Batch: 17159/20099 (85.37%) Loss: 1.974927 LR: 0.00000790 [16:44:28] Epoch: 1 Batch: 17160/20099 (85.38%) Loss: 1.966613 LR: 0.00000790 [16:44:29] Epoch: 1 Batch: 17161/20099 (85.38%) Loss: 1.939694 LR: 0.00000790 [16:44:31] Epoch: 1 Batch: 17162/20099 (85.39%) Loss: 2.148151 LR: 0.00000790 [16:44:33] Epoch: 1 Batch: 17163/20099 (85.39%) Loss: 1.939029 LR: 0.00000790 [16:44:35] Epoch: 1 Batch: 17164/20099 (85.40%) Loss: 2.324956 LR: 0.00000790 [16:44:37] Epoch: 1 Batch: 17165/20099 (85.40%) Loss: 2.196288 LR: 0.00000789 [16:44:39] Epoch: 1 Batch: 17166/20099 (85.41%) Loss: 2.177834 LR: 0.00000789 [16:44:40] Epoch: 1 Batch: 17167/20099 (85.41%) Loss: 2.068719 LR: 0.00000789 [16:44:42] Epoch: 1 
Batch: 17168/20099 (85.42%) Loss: 2.042240 LR: 0.00000789 [16:44:44] Epoch: 1 Batch: 17169/20099 (85.42%) Loss: 2.424984 LR: 0.00000789 [16:44:46] Epoch: 1 Batch: 17170/20099 (85.43%) Loss: 2.206494 LR: 0.00000789 [16:44:48] Epoch: 1 Batch: 17171/20099 (85.43%) Loss: 2.312869 LR: 0.00000789 [16:44:50] Epoch: 1 Batch: 17172/20099 (85.44%) Loss: 2.008868 LR: 0.00000788 [16:44:52] Epoch: 1 Batch: 17173/20099 (85.44%) Loss: 2.134622 LR: 0.00000788 [16:44:53] Epoch: 1 Batch: 17174/20099 (85.45%) Loss: 2.161809 LR: 0.00000788 [16:44:55] Epoch: 1 Batch: 17175/20099 (85.45%) Loss: 2.010904 LR: 0.00000788 [16:44:57] Epoch: 1 Batch: 17176/20099 (85.46%) Loss: 2.018464 LR: 0.00000788 [16:44:59] Epoch: 1 Batch: 17177/20099 (85.46%) Loss: 2.034219 LR: 0.00000788 [16:45:01] Epoch: 1 Batch: 17178/20099 (85.47%) Loss: 2.178152 LR: 0.00000788 [16:45:03] Epoch: 1 Batch: 17179/20099 (85.47%) Loss: 2.159491 LR: 0.00000787 [16:45:05] Epoch: 1 Batch: 17180/20099 (85.48%) Loss: 1.968548 LR: 0.00000787 [16:45:06] Epoch: 1 Batch: 17181/20099 (85.48%) Loss: 1.871145 LR: 0.00000787 [16:45:08] Epoch: 1 Batch: 17182/20099 (85.49%) Loss: 2.105959 LR: 0.00000787 [16:45:10] Epoch: 1 Batch: 17183/20099 (85.49%) Loss: 2.116418 LR: 0.00000787 [16:45:12] Epoch: 1 Batch: 17184/20099 (85.50%) Loss: 1.689605 LR: 0.00000787 [16:45:14] Epoch: 1 Batch: 17185/20099 (85.50%) Loss: 2.093787 LR: 0.00000787 [16:45:16] Epoch: 1 Batch: 17186/20099 (85.51%) Loss: 2.071362 LR: 0.00000786 [16:45:17] Epoch: 1 Batch: 17187/20099 (85.51%) Loss: 2.191816 LR: 0.00000786 [16:45:19] Epoch: 1 Batch: 17188/20099 (85.52%) Loss: 2.070408 LR: 0.00000786 [16:45:21] Epoch: 1 Batch: 17189/20099 (85.52%) Loss: 1.933650 LR: 0.00000786 [16:45:23] Epoch: 1 Batch: 17190/20099 (85.53%) Loss: 1.972380 LR: 0.00000786 [16:45:25] Epoch: 1 Batch: 17191/20099 (85.53%) Loss: 1.849607 LR: 0.00000786 [16:45:27] Epoch: 1 Batch: 17192/20099 (85.54%) Loss: 2.175886 LR: 0.00000786 [16:45:28] Epoch: 1 Batch: 17193/20099 (85.54%) Loss: 2.155641 LR: 
0.00000785 [16:45:30] Epoch: 1 Batch: 17194/20099 (85.55%) Loss: 2.044591 LR: 0.00000785 [16:45:32] Epoch: 1 Batch: 17195/20099 (85.55%) Loss: 2.204817 LR: 0.00000785 [16:45:34] Epoch: 1 Batch: 17196/20099 (85.56%) Loss: 2.120708 LR: 0.00000785 [16:45:36] Epoch: 1 Batch: 17197/20099 (85.56%) Loss: 1.942590 LR: 0.00000785 [16:45:38] Epoch: 1 Batch: 17198/20099 (85.57%) Loss: 2.170893 LR: 0.00000785 [16:45:40] Epoch: 1 Batch: 17199/20099 (85.57%) Loss: 2.493351 LR: 0.00000785 [16:45:45] >> Cleaned up old temp checkpoint: epoch1_step15200 [16:45:45] >> Temp checkpoint saved: epoch1_step17200, size: 0.1693 GB [16:45:45] Epoch: 1 Batch: 17200/20099 (85.58%) Loss: 2.036138 LR: 0.00000784 [16:45:47] Epoch: 1 Batch: 17201/20099 (85.58%) Loss: 2.152213 LR: 0.00000784 [16:45:49] Epoch: 1 Batch: 17202/20099 (85.59%) Loss: 1.968125 LR: 0.00000784 [16:45:50] Epoch: 1 Batch: 17203/20099 (85.59%) Loss: 2.203853 LR: 0.00000784 [16:45:52] Epoch: 1 Batch: 17204/20099 (85.60%) Loss: 2.350096 LR: 0.00000784 [16:45:54] Epoch: 1 Batch: 17205/20099 (85.60%) Loss: 2.035088 LR: 0.00000784 [16:45:56] Epoch: 1 Batch: 17206/20099 (85.61%) Loss: 2.481073 LR: 0.00000784 [16:45:58] Epoch: 1 Batch: 17207/20099 (85.61%) Loss: 2.153065 LR: 0.00000784 [16:46:00] Epoch: 1 Batch: 17208/20099 (85.62%) Loss: 2.210736 LR: 0.00000784 [16:46:02] Epoch: 1 Batch: 17209/20099 (85.62%) Loss: 2.038571 LR: 0.00000784 [16:46:03] Epoch: 1 Batch: 17210/20099 (85.63%) Loss: 2.021170 LR: 0.00000784 [16:46:05] Epoch: 1 Batch: 17211/20099 (85.63%) Loss: 1.926524 LR: 0.00000784 [16:46:07] Epoch: 1 Batch: 17212/20099 (85.64%) Loss: 2.079737 LR: 0.00000784 [16:46:09] Epoch: 1 Batch: 17213/20099 (85.64%) Loss: 1.967750 LR: 0.00000784 [16:46:11] Epoch: 1 Batch: 17214/20099 (85.65%) Loss: 2.312788 LR: 0.00000783 [16:46:13] Epoch: 1 Batch: 17215/20099 (85.65%) Loss: 2.097534 LR: 0.00000783 [16:46:15] Epoch: 1 Batch: 17216/20099 (85.66%) Loss: 2.160419 LR: 0.00000783 [16:46:16] Epoch: 1 Batch: 17217/20099 (85.66%) Loss: 
1.959460 LR: 0.00000783 [16:46:18] Epoch: 1 Batch: 17218/20099 (85.67%) Loss: 2.072925 LR: 0.00000783 [16:46:20] Epoch: 1 Batch: 17219/20099 (85.67%) Loss: 2.004448 LR: 0.00000783 [16:46:22] Epoch: 1 Batch: 17220/20099 (85.68%) Loss: 2.073457 LR: 0.00000783 [16:46:24] Epoch: 1 Batch: 17221/20099 (85.68%) Loss: 2.484454 LR: 0.00000782 [16:46:26] Epoch: 1 Batch: 17222/20099 (85.69%) Loss: 2.040761 LR: 0.00000782 [16:46:28] Epoch: 1 Batch: 17223/20099 (85.69%) Loss: 2.189120 LR: 0.00000782 [16:46:30] Epoch: 1 Batch: 17224/20099 (85.70%) Loss: 2.018032 LR: 0.00000782 [16:46:31] Epoch: 1 Batch: 17225/20099 (85.70%) Loss: 2.178329 LR: 0.00000782 [16:46:33] Epoch: 1 Batch: 17226/20099 (85.71%) Loss: 2.017492 LR: 0.00000782 [16:46:35] Epoch: 1 Batch: 17227/20099 (85.71%) Loss: 2.018399 LR: 0.00000782 [16:46:37] Epoch: 1 Batch: 17228/20099 (85.72%) Loss: 1.923882 LR: 0.00000781 [16:46:39] Epoch: 1 Batch: 17229/20099 (85.72%) Loss: 2.385854 LR: 0.00000781 [16:46:41] Epoch: 1 Batch: 17230/20099 (85.73%) Loss: 2.192957 LR: 0.00000781 [16:46:43] Epoch: 1 Batch: 17231/20099 (85.73%) Loss: 2.112211 LR: 0.00000781 [16:46:44] Epoch: 1 Batch: 17232/20099 (85.74%) Loss: 2.062364 LR: 0.00000781 [16:46:46] Epoch: 1 Batch: 17233/20099 (85.74%) Loss: 1.871177 LR: 0.00000781 [16:46:48] Epoch: 1 Batch: 17234/20099 (85.75%) Loss: 2.107831 LR: 0.00000781 [16:46:50] Epoch: 1 Batch: 17235/20099 (85.75%) Loss: 2.002338 LR: 0.00000780 [16:46:52] Epoch: 1 Batch: 17236/20099 (85.76%) Loss: 1.909302 LR: 0.00000780 [16:46:54] Epoch: 1 Batch: 17237/20099 (85.76%) Loss: 1.697102 LR: 0.00000780 [16:46:56] Epoch: 1 Batch: 17238/20099 (85.77%) Loss: 2.139163 LR: 0.00000780 [16:46:57] Epoch: 1 Batch: 17239/20099 (85.77%) Loss: 1.934740 LR: 0.00000780 [16:46:59] Epoch: 1 Batch: 17240/20099 (85.78%) Loss: 1.790341 LR: 0.00000780 [16:47:01] Epoch: 1 Batch: 17241/20099 (85.78%) Loss: 2.079438 LR: 0.00000780 [16:47:03] Epoch: 1 Batch: 17242/20099 (85.79%) Loss: 2.167928 LR: 0.00000779 [16:47:05] Epoch: 1 
Batch: 17243/20099 (85.79%) Loss: 2.125711 LR: 0.00000779 [16:47:07] Epoch: 1 Batch: 17244/20099 (85.80%) Loss: 2.144293 LR: 0.00000779 [16:47:08] Epoch: 1 Batch: 17245/20099 (85.80%) Loss: 1.897318 LR: 0.00000779 [16:47:10] Epoch: 1 Batch: 17246/20099 (85.81%) Loss: 2.089487 LR: 0.00000779 [16:47:12] Epoch: 1 Batch: 17247/20099 (85.81%) Loss: 2.109206 LR: 0.00000779 [16:47:14] Epoch: 1 Batch: 17248/20099 (85.82%) Loss: 2.335516 LR: 0.00000779 [16:47:16] Epoch: 1 Batch: 17249/20099 (85.82%) Loss: 1.900991 LR: 0.00000778 [16:47:18] Epoch: 1 Batch: 17250/20099 (85.83%) Loss: 2.200330 LR: 0.00000778 [16:47:20] Epoch: 1 Batch: 17251/20099 (85.83%) Loss: 2.193169 LR: 0.00000778 [16:47:21] Epoch: 1 Batch: 17252/20099 (85.84%) Loss: 2.069787 LR: 0.00000778 [16:47:23] Epoch: 1 Batch: 17253/20099 (85.84%) Loss: 2.380711 LR: 0.00000778 [16:47:25] Epoch: 1 Batch: 17254/20099 (85.85%) Loss: 2.274825 LR: 0.00000778 [16:47:27] Epoch: 1 Batch: 17255/20099 (85.85%) Loss: 1.904615 LR: 0.00000778 [16:47:29] Epoch: 1 Batch: 17256/20099 (85.86%) Loss: 2.077132 LR: 0.00000778 [16:47:31] Epoch: 1 Batch: 17257/20099 (85.86%) Loss: 2.166797 LR: 0.00000778 [16:47:32] Epoch: 1 Batch: 17258/20099 (85.86%) Loss: 1.973248 LR: 0.00000778 [16:47:34] Epoch: 1 Batch: 17259/20099 (85.87%) Loss: 2.247633 LR: 0.00000778 [16:47:36] Epoch: 1 Batch: 17260/20099 (85.87%) Loss: 2.201734 LR: 0.00000778 [16:47:38] Epoch: 1 Batch: 17261/20099 (85.88%) Loss: 1.935487 LR: 0.00000778 [16:47:40] Epoch: 1 Batch: 17262/20099 (85.88%) Loss: 2.133951 LR: 0.00000778 [16:47:42] Epoch: 1 Batch: 17263/20099 (85.89%) Loss: 2.126147 LR: 0.00000777 [16:47:44] Epoch: 1 Batch: 17264/20099 (85.89%) Loss: 1.933670 LR: 0.00000777 [16:47:46] Epoch: 1 Batch: 17265/20099 (85.90%) Loss: 2.353459 LR: 0.00000777 [16:47:47] Epoch: 1 Batch: 17266/20099 (85.90%) Loss: 1.878260 LR: 0.00000777 [16:47:49] Epoch: 1 Batch: 17267/20099 (85.91%) Loss: 2.208635 LR: 0.00000777 [16:47:51] Epoch: 1 Batch: 17268/20099 (85.91%) Loss: 1.995406 LR: 
0.00000777 [16:47:53] Epoch: 1 Batch: 17269/20099 (85.92%) Loss: 2.036774 LR: 0.00000777 [16:47:55] Epoch: 1 Batch: 17270/20099 (85.92%) Loss: 1.879942 LR: 0.00000776 [16:47:57] Epoch: 1 Batch: 17271/20099 (85.93%) Loss: 1.984814 LR: 0.00000776 [16:47:59] Epoch: 1 Batch: 17272/20099 (85.93%) Loss: 2.154281 LR: 0.00000776 [16:48:00] Epoch: 1 Batch: 17273/20099 (85.94%) Loss: 1.603649 LR: 0.00000776 [16:48:02] Epoch: 1 Batch: 17274/20099 (85.94%) Loss: 2.429045 LR: 0.00000776 [16:48:04] Epoch: 1 Batch: 17275/20099 (85.95%) Loss: 2.091626 LR: 0.00000776 [16:48:06] Epoch: 1 Batch: 17276/20099 (85.95%) Loss: 2.333521 LR: 0.00000776 [16:48:08] Epoch: 1 Batch: 17277/20099 (85.96%) Loss: 2.206443 LR: 0.00000775 [16:48:10] Epoch: 1 Batch: 17278/20099 (85.96%) Loss: 1.879445 LR: 0.00000775 [16:48:12] Epoch: 1 Batch: 17279/20099 (85.97%) Loss: 2.062108 LR: 0.00000775 [16:48:13] Epoch: 1 Batch: 17280/20099 (85.97%) Loss: 1.934065 LR: 0.00000775 [16:48:15] Epoch: 1 Batch: 17281/20099 (85.98%) Loss: 2.174314 LR: 0.00000775 [16:48:17] Epoch: 1 Batch: 17282/20099 (85.98%) Loss: 2.366800 LR: 0.00000775 [16:48:19] Epoch: 1 Batch: 17283/20099 (85.99%) Loss: 1.886599 LR: 0.00000775 [16:48:21] Epoch: 1 Batch: 17284/20099 (85.99%) Loss: 2.048080 LR: 0.00000774 [16:48:23] Epoch: 1 Batch: 17285/20099 (86.00%) Loss: 2.125031 LR: 0.00000774 [16:48:25] Epoch: 1 Batch: 17286/20099 (86.00%) Loss: 2.138046 LR: 0.00000774 [16:48:26] Epoch: 1 Batch: 17287/20099 (86.01%) Loss: 2.170060 LR: 0.00000774 [16:48:28] Epoch: 1 Batch: 17288/20099 (86.01%) Loss: 2.533138 LR: 0.00000774 [16:48:30] Epoch: 1 Batch: 17289/20099 (86.02%) Loss: 2.094114 LR: 0.00000774 [16:48:32] Epoch: 1 Batch: 17290/20099 (86.02%) Loss: 2.213296 LR: 0.00000774 [16:48:34] Epoch: 1 Batch: 17291/20099 (86.03%) Loss: 1.922375 LR: 0.00000773 [16:48:36] Epoch: 1 Batch: 17292/20099 (86.03%) Loss: 2.205354 LR: 0.00000773 [16:48:38] Epoch: 1 Batch: 17293/20099 (86.04%) Loss: 2.112832 LR: 0.00000773 [16:48:39] Epoch: 1 Batch: 17294/20099 
(86.04%) Loss: 2.349554 LR: 0.00000773 [16:48:41] Epoch: 1 Batch: 17295/20099 (86.05%) Loss: 2.135830 LR: 0.00000773 [16:48:43] Epoch: 1 Batch: 17296/20099 (86.05%) Loss: 2.214923 LR: 0.00000773 [16:48:45] Epoch: 1 Batch: 17297/20099 (86.06%) Loss: 2.028799 LR: 0.00000773 [16:48:47] Epoch: 1 Batch: 17298/20099 (86.06%) Loss: 1.969951 LR: 0.00000772 [16:48:49] Epoch: 1 Batch: 17299/20099 (86.07%) Loss: 2.016245 LR: 0.00000772 [16:48:51] Epoch: 1 Batch: 17300/20099 (86.07%) Loss: 2.060121 LR: 0.00000772 [16:48:52] Epoch: 1 Batch: 17301/20099 (86.08%) Loss: 2.200127 LR: 0.00000772 [16:48:54] Epoch: 1 Batch: 17302/20099 (86.08%) Loss: 1.905559 LR: 0.00000772 [16:48:56] Epoch: 1 Batch: 17303/20099 (86.09%) Loss: 2.082115 LR: 0.00000772 [16:48:58] Epoch: 1 Batch: 17304/20099 (86.09%) Loss: 2.003947 LR: 0.00000772 [16:49:00] Epoch: 1 Batch: 17305/20099 (86.10%) Loss: 2.056095 LR: 0.00000772 [16:49:02] Epoch: 1 Batch: 17306/20099 (86.10%) Loss: 1.949459 LR: 0.00000772 [16:49:03] Epoch: 1 Batch: 17307/20099 (86.11%) Loss: 2.052391 LR: 0.00000772 [16:49:05] Epoch: 1 Batch: 17308/20099 (86.11%) Loss: 2.343960 LR: 0.00000772 [16:49:07] Epoch: 1 Batch: 17309/20099 (86.12%) Loss: 2.408039 LR: 0.00000772 [16:49:09] Epoch: 1 Batch: 17310/20099 (86.12%) Loss: 2.010841 LR: 0.00000772 [16:49:11] Epoch: 1 Batch: 17311/20099 (86.13%) Loss: 1.919291 LR: 0.00000772 [16:49:13] Epoch: 1 Batch: 17312/20099 (86.13%) Loss: 2.232831 LR: 0.00000771 [16:49:14] Epoch: 1 Batch: 17313/20099 (86.14%) Loss: 1.839179 LR: 0.00000771 [16:49:16] Epoch: 1 Batch: 17314/20099 (86.14%) Loss: 2.376810 LR: 0.00000771 [16:49:18] Epoch: 1 Batch: 17315/20099 (86.15%) Loss: 1.999658 LR: 0.00000771 [16:49:20] Epoch: 1 Batch: 17316/20099 (86.15%) Loss: 1.963540 LR: 0.00000771 [16:49:22] Epoch: 1 Batch: 17317/20099 (86.16%) Loss: 1.899183 LR: 0.00000771 [16:49:24] Epoch: 1 Batch: 17318/20099 (86.16%) Loss: 1.985349 LR: 0.00000771 [16:49:26] Epoch: 1 Batch: 17319/20099 (86.17%) Loss: 2.199787 LR: 0.00000770 [16:49:27] 
Epoch: 1 Batch: 17320/20099 (86.17%) Loss: 2.281458 LR: 0.00000770 [16:49:29] Epoch: 1 Batch: 17321/20099 (86.18%) Loss: 1.995444 LR: 0.00000770 [16:49:31] Epoch: 1 Batch: 17322/20099 (86.18%) Loss: 1.796515 LR: 0.00000770 [16:49:33] Epoch: 1 Batch: 17323/20099 (86.19%) Loss: 2.193122 LR: 0.00000770 [16:49:35] Epoch: 1 Batch: 17324/20099 (86.19%) Loss: 1.890665 LR: 0.00000770 [16:49:37] Epoch: 1 Batch: 17325/20099 (86.20%) Loss: 1.878411 LR: 0.00000770 [16:49:39] Epoch: 1 Batch: 17326/20099 (86.20%) Loss: 2.080388 LR: 0.00000769 [16:49:40] Epoch: 1 Batch: 17327/20099 (86.21%) Loss: 1.803379 LR: 0.00000769 [16:49:42] Epoch: 1 Batch: 17328/20099 (86.21%) Loss: 2.037890 LR: 0.00000769 [16:49:44] Epoch: 1 Batch: 17329/20099 (86.22%) Loss: 2.085538 LR: 0.00000769 [16:49:46] Epoch: 1 Batch: 17330/20099 (86.22%) Loss: 2.430757 LR: 0.00000769 [16:49:48] Epoch: 1 Batch: 17331/20099 (86.23%) Loss: 1.926998 LR: 0.00000769 [16:49:50] Epoch: 1 Batch: 17332/20099 (86.23%) Loss: 2.235817 LR: 0.00000769 [16:49:52] Epoch: 1 Batch: 17333/20099 (86.24%) Loss: 1.915742 LR: 0.00000768 [16:49:53] Epoch: 1 Batch: 17334/20099 (86.24%) Loss: 2.317299 LR: 0.00000768 [16:49:55] Epoch: 1 Batch: 17335/20099 (86.25%) Loss: 2.190443 LR: 0.00000768 [16:49:57] Epoch: 1 Batch: 17336/20099 (86.25%) Loss: 2.035498 LR: 0.00000768 [16:49:59] Epoch: 1 Batch: 17337/20099 (86.26%) Loss: 2.084763 LR: 0.00000768 [16:50:01] Epoch: 1 Batch: 17338/20099 (86.26%) Loss: 2.080914 LR: 0.00000768 [16:50:03] Epoch: 1 Batch: 17339/20099 (86.27%) Loss: 2.279551 LR: 0.00000768 [16:50:05] Epoch: 1 Batch: 17340/20099 (86.27%) Loss: 2.102048 LR: 0.00000767 [16:50:06] Epoch: 1 Batch: 17341/20099 (86.28%) Loss: 2.302230 LR: 0.00000767 [16:50:08] Epoch: 1 Batch: 17342/20099 (86.28%) Loss: 2.177974 LR: 0.00000767 [16:50:10] Epoch: 1 Batch: 17343/20099 (86.29%) Loss: 2.056188 LR: 0.00000767 [16:50:12] Epoch: 1 Batch: 17344/20099 (86.29%) Loss: 2.127283 LR: 0.00000767 [16:50:14] Epoch: 1 Batch: 17345/20099 (86.30%) Loss: 
2.146709 LR: 0.00000767 [16:50:16] Epoch: 1 Batch: 17346/20099 (86.30%) Loss: 2.223658 LR: 0.00000767 [16:50:18] Epoch: 1 Batch: 17347/20099 (86.31%) Loss: 2.191038 LR: 0.00000767 [16:50:19] Epoch: 1 Batch: 17348/20099 (86.31%) Loss: 1.788854 LR: 0.00000767 [16:50:21] Epoch: 1 Batch: 17349/20099 (86.32%) Loss: 2.120617 LR: 0.00000767 [16:50:23] Epoch: 1 Batch: 17350/20099 (86.32%) Loss: 2.089490 LR: 0.00000767 [16:50:25] Epoch: 1 Batch: 17351/20099 (86.33%) Loss: 1.964117 LR: 0.00000767 [16:50:27] Epoch: 1 Batch: 17352/20099 (86.33%) Loss: 1.746458 LR: 0.00000767 [16:50:29] Epoch: 1 Batch: 17353/20099 (86.34%) Loss: 1.463508 LR: 0.00000767 [16:50:31] Epoch: 1 Batch: 17354/20099 (86.34%) Loss: 2.105454 LR: 0.00000766 [16:50:32] Epoch: 1 Batch: 17355/20099 (86.35%) Loss: 2.284277 LR: 0.00000766 [16:50:34] Epoch: 1 Batch: 17356/20099 (86.35%) Loss: 2.545760 LR: 0.00000766 [16:50:36] Epoch: 1 Batch: 17357/20099 (86.36%) Loss: 2.373370 LR: 0.00000766 [16:50:38] Epoch: 1 Batch: 17358/20099 (86.36%) Loss: 1.881050 LR: 0.00000766 [16:50:40] Epoch: 1 Batch: 17359/20099 (86.37%) Loss: 2.212394 LR: 0.00000766 [16:50:42] Epoch: 1 Batch: 17360/20099 (86.37%) Loss: 2.260129 LR: 0.00000766 [16:50:44] Epoch: 1 Batch: 17361/20099 (86.38%) Loss: 2.002730 LR: 0.00000765 [16:50:45] Epoch: 1 Batch: 17362/20099 (86.38%) Loss: 1.913905 LR: 0.00000765 [16:50:47] Epoch: 1 Batch: 17363/20099 (86.39%) Loss: 2.123970 LR: 0.00000765 [16:50:49] Epoch: 1 Batch: 17364/20099 (86.39%) Loss: 2.014408 LR: 0.00000765 [16:50:51] Epoch: 1 Batch: 17365/20099 (86.40%) Loss: 2.123674 LR: 0.00000765 [16:50:53] Epoch: 1 Batch: 17366/20099 (86.40%) Loss: 1.622434 LR: 0.00000765 [16:50:55] Epoch: 1 Batch: 17367/20099 (86.41%) Loss: 1.943944 LR: 0.00000765 [16:50:56] Epoch: 1 Batch: 17368/20099 (86.41%) Loss: 2.383486 LR: 0.00000764 [16:50:58] Epoch: 1 Batch: 17369/20099 (86.42%) Loss: 2.008016 LR: 0.00000764 [16:51:00] Epoch: 1 Batch: 17370/20099 (86.42%) Loss: 2.052284 LR: 0.00000764 [16:51:02] Epoch: 1 
Batch: 17371/20099 (86.43%) Loss: 1.954945 LR: 0.00000764 [16:51:04] Epoch: 1 Batch: 17372/20099 (86.43%) Loss: 2.108921 LR: 0.00000764 [16:51:06] Epoch: 1 Batch: 17373/20099 (86.44%) Loss: 2.039291 LR: 0.00000764 [16:51:08] Epoch: 1 Batch: 17374/20099 (86.44%) Loss: 2.054207 LR: 0.00000764 [16:51:09] Epoch: 1 Batch: 17375/20099 (86.45%) Loss: 2.207550 LR: 0.00000763 [16:51:11] Epoch: 1 Batch: 17376/20099 (86.45%) Loss: 2.382384 LR: 0.00000763 [16:51:13] Epoch: 1 Batch: 17377/20099 (86.46%) Loss: 1.936982 LR: 0.00000763 [16:51:15] Epoch: 1 Batch: 17378/20099 (86.46%) Loss: 2.106252 LR: 0.00000763 [16:51:17] Epoch: 1 Batch: 17379/20099 (86.47%) Loss: 1.996331 LR: 0.00000763 [16:51:19] Epoch: 1 Batch: 17380/20099 (86.47%) Loss: 1.953431 LR: 0.00000763 [16:51:20] Epoch: 1 Batch: 17381/20099 (86.48%) Loss: 2.002789 LR: 0.00000763 [16:51:22] Epoch: 1 Batch: 17382/20099 (86.48%) Loss: 1.942730 LR: 0.00000763 [16:51:24] Epoch: 1 Batch: 17383/20099 (86.49%) Loss: 1.905152 LR: 0.00000763 [16:51:26] Epoch: 1 Batch: 17384/20099 (86.49%) Loss: 2.195080 LR: 0.00000763 [16:51:28] Epoch: 1 Batch: 17385/20099 (86.50%) Loss: 1.892500 LR: 0.00000763 [16:51:30] Epoch: 1 Batch: 17386/20099 (86.50%) Loss: 2.035051 LR: 0.00000763 [16:51:32] Epoch: 1 Batch: 17387/20099 (86.51%) Loss: 2.385957 LR: 0.00000763 [16:51:33] Epoch: 1 Batch: 17388/20099 (86.51%) Loss: 2.052205 LR: 0.00000763 [16:51:35] Epoch: 1 Batch: 17389/20099 (86.52%) Loss: 1.883700 LR: 0.00000762 [16:51:37] Epoch: 1 Batch: 17390/20099 (86.52%) Loss: 1.715557 LR: 0.00000762 [16:51:39] Epoch: 1 Batch: 17391/20099 (86.53%) Loss: 2.144867 LR: 0.00000762 [16:51:41] Epoch: 1 Batch: 17392/20099 (86.53%) Loss: 2.075301 LR: 0.00000762 [16:51:43] Epoch: 1 Batch: 17393/20099 (86.54%) Loss: 2.113362 LR: 0.00000762 [16:51:45] Epoch: 1 Batch: 17394/20099 (86.54%) Loss: 1.967252 LR: 0.00000762 [16:51:46] Epoch: 1 Batch: 17395/20099 (86.55%) Loss: 1.899744 LR: 0.00000762 [16:51:48] Epoch: 1 Batch: 17396/20099 (86.55%) Loss: 2.161981 LR: 
0.00000761 [16:51:50] Epoch: 1 Batch: 17397/20099 (86.56%) Loss: 1.730449 LR: 0.00000761 [16:51:52] Epoch: 1 Batch: 17398/20099 (86.56%) Loss: 2.250108 LR: 0.00000761 [16:51:54] Epoch: 1 Batch: 17399/20099 (86.57%) Loss: 2.054953 LR: 0.00000761 [16:51:59] >> Cleaned up old temp checkpoint: epoch1_step15400 [16:51:59] >> Temp checkpoint saved: epoch1_step17400, size: 0.1693 GB [16:51:59] Epoch: 1 Batch: 17400/20099 (86.57%) Loss: 1.919746 LR: 0.00000761 [16:52:01] Epoch: 1 Batch: 17401/20099 (86.58%) Loss: 1.799139 LR: 0.00000761 [16:52:03] Epoch: 1 Batch: 17402/20099 (86.58%) Loss: 2.000528 LR: 0.00000761 [16:52:05] Epoch: 1 Batch: 17403/20099 (86.59%) Loss: 1.805122 LR: 0.00000760 [16:52:07] Epoch: 1 Batch: 17404/20099 (86.59%) Loss: 2.256586 LR: 0.00000760 [16:52:09] Epoch: 1 Batch: 17405/20099 (86.60%) Loss: 1.913914 LR: 0.00000760 [16:52:11] Epoch: 1 Batch: 17406/20099 (86.60%) Loss: 1.680895 LR: 0.00000760 [16:52:12] Epoch: 1 Batch: 17407/20099 (86.61%) Loss: 2.058811 LR: 0.00000760 [16:52:14] Epoch: 1 Batch: 17408/20099 (86.61%) Loss: 2.050990 LR: 0.00000760 [16:52:16] Epoch: 1 Batch: 17409/20099 (86.62%) Loss: 2.085586 LR: 0.00000760 [16:52:18] Epoch: 1 Batch: 17410/20099 (86.62%) Loss: 1.804055 LR: 0.00000759 [16:52:20] Epoch: 1 Batch: 17411/20099 (86.63%) Loss: 2.061215 LR: 0.00000759 [16:52:22] Epoch: 1 Batch: 17412/20099 (86.63%) Loss: 1.941276 LR: 0.00000759 [16:52:24] Epoch: 1 Batch: 17413/20099 (86.64%) Loss: 2.062979 LR: 0.00000759 [16:52:25] Epoch: 1 Batch: 17414/20099 (86.64%) Loss: 1.997889 LR: 0.00000759 [16:52:27] Epoch: 1 Batch: 17415/20099 (86.65%) Loss: 2.135464 LR: 0.00000759 [16:52:29] Epoch: 1 Batch: 17416/20099 (86.65%) Loss: 2.187501 LR: 0.00000759 [16:52:31] Epoch: 1 Batch: 17417/20099 (86.66%) Loss: 2.248392 LR: 0.00000758 [16:52:33] Epoch: 1 Batch: 17418/20099 (86.66%) Loss: 1.904775 LR: 0.00000758 [16:52:35] Epoch: 1 Batch: 17419/20099 (86.67%) Loss: 1.831022 LR: 0.00000758 [16:52:37] Epoch: 1 Batch: 17420/20099 (86.67%) Loss: 
2.134475 LR: 0.00000758 [16:52:38] Epoch: 1 Batch: 17421/20099 (86.68%) Loss: 2.049401 LR: 0.00000758 [16:52:40] Epoch: 1 Batch: 17422/20099 (86.68%) Loss: 2.266951 LR: 0.00000758 [16:52:42] Epoch: 1 Batch: 17423/20099 (86.69%) Loss: 2.083439 LR: 0.00000758 [16:52:44] Epoch: 1 Batch: 17424/20099 (86.69%) Loss: 2.195826 LR: 0.00000758 [16:52:46] Epoch: 1 Batch: 17425/20099 (86.70%) Loss: 2.195502 LR: 0.00000758 [16:52:48] Epoch: 1 Batch: 17426/20099 (86.70%) Loss: 2.305930 LR: 0.00000758 [16:52:50] Epoch: 1 Batch: 17427/20099 (86.71%) Loss: 2.065771 LR: 0.00000758 [16:52:52] Epoch: 1 Batch: 17428/20099 (86.71%) Loss: 2.051359 LR: 0.00000758 [16:52:53] Epoch: 1 Batch: 17429/20099 (86.72%) Loss: 2.040518 LR: 0.00000758 [16:52:55] Epoch: 1 Batch: 17430/20099 (86.72%) Loss: 2.187335 LR: 0.00000758 [16:52:57] Epoch: 1 Batch: 17431/20099 (86.73%) Loss: 1.919889 LR: 0.00000757 [16:52:59] Epoch: 1 Batch: 17432/20099 (86.73%) Loss: 1.883625 LR: 0.00000757 [16:53:01] Epoch: 1 Batch: 17433/20099 (86.74%) Loss: 2.270657 LR: 0.00000757 [16:53:03] Epoch: 1 Batch: 17434/20099 (86.74%) Loss: 2.191535 LR: 0.00000757 [16:53:05] Epoch: 1 Batch: 17435/20099 (86.75%) Loss: 2.157299 LR: 0.00000757 [16:53:06] Epoch: 1 Batch: 17436/20099 (86.75%) Loss: 1.945669 LR: 0.00000757 [16:53:08] Epoch: 1 Batch: 17437/20099 (86.76%) Loss: 2.231283 LR: 0.00000757 [16:53:10] Epoch: 1 Batch: 17438/20099 (86.76%) Loss: 2.337742 LR: 0.00000756 [16:53:12] Epoch: 1 Batch: 17439/20099 (86.77%) Loss: 2.132945 LR: 0.00000756 [16:53:14] Epoch: 1 Batch: 17440/20099 (86.77%) Loss: 1.901555 LR: 0.00000756 [16:53:16] Epoch: 1 Batch: 17441/20099 (86.78%) Loss: 2.196252 LR: 0.00000756 [16:53:17] Epoch: 1 Batch: 17442/20099 (86.78%) Loss: 2.357275 LR: 0.00000756 [16:53:19] Epoch: 1 Batch: 17443/20099 (86.79%) Loss: 2.076062 LR: 0.00000756 [16:53:21] Epoch: 1 Batch: 17444/20099 (86.79%) Loss: 2.368969 LR: 0.00000756 [16:53:23] Epoch: 1 Batch: 17445/20099 (86.80%) Loss: 2.316773 LR: 0.00000755 [16:53:25] Epoch: 1 
Batch: 17446/20099 (86.80%) Loss: 2.412952 LR: 0.00000755 [16:53:27] Epoch: 1 Batch: 17447/20099 (86.81%) Loss: 2.167215 LR: 0.00000755 [16:53:28] Epoch: 1 Batch: 17448/20099 (86.81%) Loss: 2.030721 LR: 0.00000755 [16:53:30] Epoch: 1 Batch: 17449/20099 (86.82%) Loss: 2.032823 LR: 0.00000755 [16:53:32] Epoch: 1 Batch: 17450/20099 (86.82%) Loss: 1.819517 LR: 0.00000755 [16:53:34] Epoch: 1 Batch: 17451/20099 (86.83%) Loss: 2.154927 LR: 0.00000755 [16:53:36] Epoch: 1 Batch: 17452/20099 (86.83%) Loss: 2.115144 LR: 0.00000754 [16:53:38] Epoch: 1 Batch: 17453/20099 (86.84%) Loss: 1.819947 LR: 0.00000754 [16:53:40] Epoch: 1 Batch: 17454/20099 (86.84%) Loss: 2.176224 LR: 0.00000754 [16:53:41] Epoch: 1 Batch: 17455/20099 (86.85%) Loss: 2.136883 LR: 0.00000754 [16:53:43] Epoch: 1 Batch: 17456/20099 (86.85%) Loss: 1.724549 LR: 0.00000754 [16:53:45] Epoch: 1 Batch: 17457/20099 (86.86%) Loss: 2.220631 LR: 0.00000754 [16:53:47] Epoch: 1 Batch: 17458/20099 (86.86%) Loss: 1.849589 LR: 0.00000754 [16:53:49] Epoch: 1 Batch: 17459/20099 (86.87%) Loss: 2.021147 LR: 0.00000754 [16:53:51] Epoch: 1 Batch: 17460/20099 (86.87%) Loss: 2.253992 LR: 0.00000754 [16:53:53] Epoch: 1 Batch: 17461/20099 (86.87%) Loss: 1.699164 LR: 0.00000754 [16:53:54] Epoch: 1 Batch: 17462/20099 (86.88%) Loss: 2.146445 LR: 0.00000754 [16:53:56] Epoch: 1 Batch: 17463/20099 (86.88%) Loss: 2.080113 LR: 0.00000754 [16:53:58] Epoch: 1 Batch: 17464/20099 (86.89%) Loss: 1.963135 LR: 0.00000754 [16:54:00] Epoch: 1 Batch: 17465/20099 (86.89%) Loss: 2.147226 LR: 0.00000754 [16:54:02] Epoch: 1 Batch: 17466/20099 (86.90%) Loss: 1.900192 LR: 0.00000753 [16:54:04] Epoch: 1 Batch: 17467/20099 (86.90%) Loss: 2.210519 LR: 0.00000753 [16:54:06] Epoch: 1 Batch: 17468/20099 (86.91%) Loss: 2.061323 LR: 0.00000753 [16:54:07] Epoch: 1 Batch: 17469/20099 (86.91%) Loss: 1.962737 LR: 0.00000753 [16:54:09] Epoch: 1 Batch: 17470/20099 (86.92%) Loss: 1.767326 LR: 0.00000753 [16:54:11] Epoch: 1 Batch: 17471/20099 (86.92%) Loss: 1.996546 LR: 
0.00000753 [16:54:13] Epoch: 1 Batch: 17472/20099 (86.93%) Loss: 2.376313 LR: 0.00000753 [16:54:15] Epoch: 1 Batch: 17473/20099 (86.93%) Loss: 2.111506 LR: 0.00000752 [16:54:17] Epoch: 1 Batch: 17474/20099 (86.94%) Loss: 2.174128 LR: 0.00000752 [16:54:19] Epoch: 1 Batch: 17475/20099 (86.94%) Loss: 2.148231 LR: 0.00000752 [16:54:20] Epoch: 1 Batch: 17476/20099 (86.95%) Loss: 1.927590 LR: 0.00000752 [16:54:22] Epoch: 1 Batch: 17477/20099 (86.95%) Loss: 2.248884 LR: 0.00000752 [16:54:24] Epoch: 1 Batch: 17478/20099 (86.96%) Loss: 2.426048 LR: 0.00000752 [16:54:26] Epoch: 1 Batch: 17479/20099 (86.96%) Loss: 1.946288 LR: 0.00000752 [16:54:28] Epoch: 1 Batch: 17480/20099 (86.97%) Loss: 2.331980 LR: 0.00000751 [16:54:30] Epoch: 1 Batch: 17481/20099 (86.97%) Loss: 2.165002 LR: 0.00000751 [16:54:32] Epoch: 1 Batch: 17482/20099 (86.98%) Loss: 1.737559 LR: 0.00000751 [16:54:33] Epoch: 1 Batch: 17483/20099 (86.98%) Loss: 2.025131 LR: 0.00000751 [16:54:36] Epoch: 1 Batch: 17484/20099 (86.99%) Loss: 2.005286 LR: 0.00000751 [16:54:37] Epoch: 1 Batch: 17485/20099 (86.99%) Loss: 1.973143 LR: 0.00000751 [16:54:39] Epoch: 1 Batch: 17486/20099 (87.00%) Loss: 1.710006 LR: 0.00000751 [16:54:41] Epoch: 1 Batch: 17487/20099 (87.00%) Loss: 2.244121 LR: 0.00000751 [16:54:43] Epoch: 1 Batch: 17488/20099 (87.01%) Loss: 2.342765 LR: 0.00000751 [16:54:45] Epoch: 1 Batch: 17489/20099 (87.01%) Loss: 2.235284 LR: 0.00000751 [16:54:47] Epoch: 1 Batch: 17490/20099 (87.02%) Loss: 1.948175 LR: 0.00000751 [16:54:49] Epoch: 1 Batch: 17491/20099 (87.02%) Loss: 2.049271 LR: 0.00000751 [16:54:50] Epoch: 1 Batch: 17492/20099 (87.03%) Loss: 1.816251 LR: 0.00000751 [16:54:52] Epoch: 1 Batch: 17493/20099 (87.03%) Loss: 2.252430 LR: 0.00000751 [16:54:54] Epoch: 1 Batch: 17494/20099 (87.04%) Loss: 1.945635 LR: 0.00000750 [16:54:56] Epoch: 1 Batch: 17495/20099 (87.04%) Loss: 2.090365 LR: 0.00000750 [16:54:58] Epoch: 1 Batch: 17496/20099 (87.05%) Loss: 2.094879 LR: 0.00000750 [16:55:00] Epoch: 1 Batch: 17497/20099 
(87.05%) Loss: 1.921861 LR: 0.00000750 [16:55:01] Epoch: 1 Batch: 17498/20099 (87.06%) Loss: 1.965526 LR: 0.00000750 [16:55:03] Epoch: 1 Batch: 17499/20099 (87.06%) Loss: 2.302305 LR: 0.00000750 [16:55:05] >> Evaluating batch 0 [16:55:06] >> Evaluating batch 1 [16:55:07] >> Evaluating batch 2 [16:55:08] >> Evaluating batch 3 [16:55:10] >> Evaluating batch 4 [16:55:11] >> Evaluating batch 5 [16:55:12] >> Evaluating batch 6 [16:55:13] >> Evaluating batch 7 [16:55:14] >> Evaluating batch 8 [16:55:15] >> Evaluating batch 9 [16:55:16] >> Evaluating batch 10 [16:55:17] >> Evaluating batch 11 [16:55:18] >> Evaluating batch 12 [16:55:19] >> Evaluating batch 13 [16:55:20] >> Evaluating batch 14 [16:55:21] >> Evaluating batch 15 [16:55:22] >> Evaluating batch 16 [16:55:22] Epoch: 1 Step: 17500/20099 Evaluation: [16:55:22] Avg Loss Since Last Eval: 2.0786 Val Loss: 2.1478 Validation loss delta: 0.0008 Perplexity: 8.5657 LR: 0.00000750 [16:55:26] >> Checkpoint saved: epoch1_step17500, size: 0.1693 GB [16:55:26] Epoch: 1 Batch: 17500/20099 (87.07%) Loss: 2.235569 LR: 0.00000750 [16:55:28] Epoch: 1 Batch: 17501/20099 (87.07%) Loss: 1.715805 LR: 0.00000749 [16:55:30] Epoch: 1 Batch: 17502/20099 (87.08%) Loss: 1.954907 LR: 0.00000749 [16:55:31] Epoch: 1 Batch: 17503/20099 (87.08%) Loss: 2.258551 LR: 0.00000749 [16:55:33] Epoch: 1 Batch: 17504/20099 (87.09%) Loss: 2.229052 LR: 0.00000749 [16:55:35] Epoch: 1 Batch: 17505/20099 (87.09%) Loss: 2.292044 LR: 0.00000749 [16:55:37] Epoch: 1 Batch: 17506/20099 (87.10%) Loss: 2.261032 LR: 0.00000749 [16:55:39] Epoch: 1 Batch: 17507/20099 (87.10%) Loss: 1.893339 LR: 0.00000749 [16:55:41] Epoch: 1 Batch: 17508/20099 (87.11%) Loss: 2.165386 LR: 0.00000748 [16:55:43] Epoch: 1 Batch: 17509/20099 (87.11%) Loss: 2.097392 LR: 0.00000748 [16:55:44] Epoch: 1 Batch: 17510/20099 (87.12%) Loss: 2.204749 LR: 0.00000748 [16:55:46] Epoch: 1 Batch: 17511/20099 (87.12%) Loss: 2.159243 LR: 0.00000748 [16:55:48] Epoch: 1 Batch: 17512/20099 (87.13%) Loss: 
2.176404 LR: 0.00000748 [16:55:50] Epoch: 1 Batch: 17513/20099 (87.13%) Loss: 2.247217 LR: 0.00000748 [16:55:52] Epoch: 1 Batch: 17514/20099 (87.14%) Loss: 2.210220 LR: 0.00000748 [16:55:54] Epoch: 1 Batch: 17515/20099 (87.14%) Loss: 2.052430 LR: 0.00000747 [16:55:56] Epoch: 1 Batch: 17516/20099 (87.15%) Loss: 2.379779 LR: 0.00000747 [16:55:57] Epoch: 1 Batch: 17517/20099 (87.15%) Loss: 2.065267 LR: 0.00000747 [16:55:59] Epoch: 1 Batch: 17518/20099 (87.16%) Loss: 2.148651 LR: 0.00000747 [16:56:01] Epoch: 1 Batch: 17519/20099 (87.16%) Loss: 2.098794 LR: 0.00000747 [16:56:03] Epoch: 1 Batch: 17520/20099 (87.17%) Loss: 2.067379 LR: 0.00000747 [16:56:05] Epoch: 1 Batch: 17521/20099 (87.17%) Loss: 1.958378 LR: 0.00000747 [16:56:07] Epoch: 1 Batch: 17522/20099 (87.18%) Loss: 2.113764 LR: 0.00000747 [16:56:09] Epoch: 1 Batch: 17523/20099 (87.18%) Loss: 1.983912 LR: 0.00000747 [16:56:11] Epoch: 1 Batch: 17524/20099 (87.19%) Loss: 2.129180 LR: 0.00000747 [16:56:12] Epoch: 1 Batch: 17525/20099 (87.19%) Loss: 1.754702 LR: 0.00000747 [16:56:14] Epoch: 1 Batch: 17526/20099 (87.20%) Loss: 2.087590 LR: 0.00000747 [16:56:16] Epoch: 1 Batch: 17527/20099 (87.20%) Loss: 1.877912 LR: 0.00000747 [16:56:18] Epoch: 1 Batch: 17528/20099 (87.21%) Loss: 2.185984 LR: 0.00000747 [16:56:20] Epoch: 1 Batch: 17529/20099 (87.21%) Loss: 2.124600 LR: 0.00000746 [16:56:22] Epoch: 1 Batch: 17530/20099 (87.22%) Loss: 1.892984 LR: 0.00000746 [16:56:24] Epoch: 1 Batch: 17531/20099 (87.22%) Loss: 2.091397 LR: 0.00000746 [16:56:25] Epoch: 1 Batch: 17532/20099 (87.23%) Loss: 1.996607 LR: 0.00000746 [16:56:27] Epoch: 1 Batch: 17533/20099 (87.23%) Loss: 2.079574 LR: 0.00000746 [16:56:29] Epoch: 1 Batch: 17534/20099 (87.24%) Loss: 2.142843 LR: 0.00000746 [16:56:31] Epoch: 1 Batch: 17535/20099 (87.24%) Loss: 2.078719 LR: 0.00000746 [16:56:33] Epoch: 1 Batch: 17536/20099 (87.25%) Loss: 2.171797 LR: 0.00000745 [16:56:35] Epoch: 1 Batch: 17537/20099 (87.25%) Loss: 2.290361 LR: 0.00000745 [16:56:36] Epoch: 1 
Batch: 17538/20099 (87.26%) Loss: 2.057257 LR: 0.00000745 [16:56:38] Epoch: 1 Batch: 17539/20099 (87.26%) Loss: 2.095358 LR: 0.00000745 [16:56:40] Epoch: 1 Batch: 17540/20099 (87.27%) Loss: 2.373081 LR: 0.00000745 [16:56:42] Epoch: 1 Batch: 17541/20099 (87.27%) Loss: 2.271805 LR: 0.00000745 [16:56:44] Epoch: 1 Batch: 17542/20099 (87.28%) Loss: 2.197390 LR: 0.00000745 [16:56:46] Epoch: 1 Batch: 17543/20099 (87.28%) Loss: 2.493478 LR: 0.00000744 [16:56:48] Epoch: 1 Batch: 17544/20099 (87.29%) Loss: 1.839786 LR: 0.00000744 [16:56:49] Epoch: 1 Batch: 17545/20099 (87.29%) Loss: 1.865154 LR: 0.00000744 [16:56:51] Epoch: 1 Batch: 17546/20099 (87.30%) Loss: 1.965575 LR: 0.00000744 [16:56:53] Epoch: 1 Batch: 17547/20099 (87.30%) Loss: 1.922869 LR: 0.00000744 [16:56:55] Epoch: 1 Batch: 17548/20099 (87.31%) Loss: 2.210639 LR: 0.00000744 [16:56:57] Epoch: 1 Batch: 17549/20099 (87.31%) Loss: 1.990479 LR: 0.00000744 [16:56:59] Epoch: 1 Batch: 17550/20099 (87.32%) Loss: 1.876554 LR: 0.00000743 [16:57:00] Epoch: 1 Batch: 17551/20099 (87.32%) Loss: 2.072264 LR: 0.00000743 [16:57:02] Epoch: 1 Batch: 17552/20099 (87.33%) Loss: 1.964428 LR: 0.00000743 [16:57:04] Epoch: 1 Batch: 17553/20099 (87.33%) Loss: 2.291249 LR: 0.00000743 [16:57:06] Epoch: 1 Batch: 17554/20099 (87.34%) Loss: 2.187897 LR: 0.00000743 [16:57:08] Epoch: 1 Batch: 17555/20099 (87.34%) Loss: 2.060750 LR: 0.00000743 [16:57:10] Epoch: 1 Batch: 17556/20099 (87.35%) Loss: 1.993210 LR: 0.00000743 [16:57:12] Epoch: 1 Batch: 17557/20099 (87.35%) Loss: 2.393697 LR: 0.00000743 [16:57:13] Epoch: 1 Batch: 17558/20099 (87.36%) Loss: 1.958124 LR: 0.00000743 [16:57:15] Epoch: 1 Batch: 17559/20099 (87.36%) Loss: 1.798334 LR: 0.00000743 [16:57:17] Epoch: 1 Batch: 17560/20099 (87.37%) Loss: 2.376261 LR: 0.00000743 [16:57:19] Epoch: 1 Batch: 17561/20099 (87.37%) Loss: 2.049006 LR: 0.00000743 [16:57:21] Epoch: 1 Batch: 17562/20099 (87.38%) Loss: 2.257985 LR: 0.00000743 [16:57:23] Epoch: 1 Batch: 17563/20099 (87.38%) Loss: 1.971873 LR: 
0.00000743 [16:57:25] Epoch: 1 Batch: 17564/20099 (87.39%) Loss: 2.170785 LR: 0.00000742 [16:57:26] Epoch: 1 Batch: 17565/20099 (87.39%) Loss: 2.103040 LR: 0.00000742 [16:57:28] Epoch: 1 Batch: 17566/20099 (87.40%) Loss: 2.058441 LR: 0.00000742 [16:57:30] Epoch: 1 Batch: 17567/20099 (87.40%) Loss: 2.125111 LR: 0.00000742 [16:57:32] Epoch: 1 Batch: 17568/20099 (87.41%) Loss: 2.287550 LR: 0.00000742 [16:57:34] Epoch: 1 Batch: 17569/20099 (87.41%) Loss: 2.007423 LR: 0.00000742 [16:57:36] Epoch: 1 Batch: 17570/20099 (87.42%) Loss: 1.912422 LR: 0.00000742 [16:57:38] Epoch: 1 Batch: 17571/20099 (87.42%) Loss: 2.087461 LR: 0.00000741 [16:57:39] Epoch: 1 Batch: 17572/20099 (87.43%) Loss: 2.213047 LR: 0.00000741 [16:57:41] Epoch: 1 Batch: 17573/20099 (87.43%) Loss: 2.347178 LR: 0.00000741 [16:57:43] Epoch: 1 Batch: 17574/20099 (87.44%) Loss: 1.943620 LR: 0.00000741 [16:57:45] Epoch: 1 Batch: 17575/20099 (87.44%) Loss: 1.965196 LR: 0.00000741 [16:57:47] Epoch: 1 Batch: 17576/20099 (87.45%) Loss: 2.231793 LR: 0.00000741 [16:57:49] Epoch: 1 Batch: 17577/20099 (87.45%) Loss: 2.128377 LR: 0.00000741 [16:57:51] Epoch: 1 Batch: 17578/20099 (87.46%) Loss: 2.405900 LR: 0.00000740 [16:57:53] Epoch: 1 Batch: 17579/20099 (87.46%) Loss: 2.167300 LR: 0.00000740 [16:57:54] Epoch: 1 Batch: 17580/20099 (87.47%) Loss: 1.992033 LR: 0.00000740 [16:57:56] Epoch: 1 Batch: 17581/20099 (87.47%) Loss: 1.977361 LR: 0.00000740 [16:57:58] Epoch: 1 Batch: 17582/20099 (87.48%) Loss: 2.372686 LR: 0.00000740 [16:58:00] Epoch: 1 Batch: 17583/20099 (87.48%) Loss: 2.371083 LR: 0.00000740 [16:58:02] Epoch: 1 Batch: 17584/20099 (87.49%) Loss: 2.093752 LR: 0.00000740 [16:58:04] Epoch: 1 Batch: 17585/20099 (87.49%) Loss: 1.923145 LR: 0.00000740 [16:58:06] Epoch: 1 Batch: 17586/20099 (87.50%) Loss: 1.873164 LR: 0.00000740 [16:58:07] Epoch: 1 Batch: 17587/20099 (87.50%) Loss: 2.343898 LR: 0.00000740 [16:58:09] Epoch: 1 Batch: 17588/20099 (87.51%) Loss: 2.212794 LR: 0.00000740 [16:58:11] Epoch: 1 Batch: 17589/20099 
(87.51%) Loss: 2.031736 LR: 0.00000740 [16:58:13] Epoch: 1 Batch: 17590/20099 (87.52%) Loss: 2.003627 LR: 0.00000740 [16:58:15] Epoch: 1 Batch: 17591/20099 (87.52%) Loss: 1.730391 LR: 0.00000740 [16:58:17] Epoch: 1 Batch: 17592/20099 (87.53%) Loss: 2.068016 LR: 0.00000739 [16:58:19] Epoch: 1 Batch: 17593/20099 (87.53%) Loss: 1.565503 LR: 0.00000739 [16:58:20] Epoch: 1 Batch: 17594/20099 (87.54%) Loss: 1.899391 LR: 0.00000739 [16:58:22] Epoch: 1 Batch: 17595/20099 (87.54%) Loss: 1.910562 LR: 0.00000739 [16:58:24] Epoch: 1 Batch: 17596/20099 (87.55%) Loss: 1.883582 LR: 0.00000739 [16:58:26] Epoch: 1 Batch: 17597/20099 (87.55%) Loss: 2.436203 LR: 0.00000739 [16:58:28] Epoch: 1 Batch: 17598/20099 (87.56%) Loss: 2.247070 LR: 0.00000739 [16:58:30] Epoch: 1 Batch: 17599/20099 (87.56%) Loss: 2.383197 LR: 0.00000738 [16:58:35] >> Cleaned up old temp checkpoint: epoch1_step15600 [16:58:35] >> Temp checkpoint saved: epoch1_step17600, size: 0.1693 GB [16:58:35] Epoch: 1 Batch: 17600/20099 (87.57%) Loss: 2.295342 LR: 0.00000738 [16:58:37] Epoch: 1 Batch: 17601/20099 (87.57%) Loss: 1.992482 LR: 0.00000738 [16:58:39] Epoch: 1 Batch: 17602/20099 (87.58%) Loss: 1.999650 LR: 0.00000738 [16:58:41] Epoch: 1 Batch: 17603/20099 (87.58%) Loss: 2.313826 LR: 0.00000738 [16:58:43] Epoch: 1 Batch: 17604/20099 (87.59%) Loss: 1.874575 LR: 0.00000738 [16:58:44] Epoch: 1 Batch: 17605/20099 (87.59%) Loss: 2.285563 LR: 0.00000738 [16:58:46] Epoch: 1 Batch: 17606/20099 (87.60%) Loss: 1.992687 LR: 0.00000737 [16:58:48] Epoch: 1 Batch: 17607/20099 (87.60%) Loss: 2.119013 LR: 0.00000737 [16:58:50] Epoch: 1 Batch: 17608/20099 (87.61%) Loss: 2.160233 LR: 0.00000737 [16:58:52] Epoch: 1 Batch: 17609/20099 (87.61%) Loss: 1.840156 LR: 0.00000737 [16:58:54] Epoch: 1 Batch: 17610/20099 (87.62%) Loss: 2.300976 LR: 0.00000737 [16:58:55] Epoch: 1 Batch: 17611/20099 (87.62%) Loss: 2.084265 LR: 0.00000737 [16:58:57] Epoch: 1 Batch: 17612/20099 (87.63%) Loss: 1.776223 LR: 0.00000737 [16:58:59] Epoch: 1 Batch: 
17613/20099 (87.63%) Loss: 2.225344 LR: 0.00000737 [16:59:01] Epoch: 1 Batch: 17614/20099 (87.64%) Loss: 1.906311 LR: 0.00000737 [16:59:03] Epoch: 1 Batch: 17615/20099 (87.64%) Loss: 2.034387 LR: 0.00000737 [16:59:05] Epoch: 1 Batch: 17616/20099 (87.65%) Loss: 1.933355 LR: 0.00000737 [16:59:07] Epoch: 1 Batch: 17617/20099 (87.65%) Loss: 2.013640 LR: 0.00000737 [16:59:09] Epoch: 1 Batch: 17618/20099 (87.66%) Loss: 2.048515 LR: 0.00000737 [16:59:10] Epoch: 1 Batch: 17619/20099 (87.66%) Loss: 2.013034 LR: 0.00000737 [16:59:12] Epoch: 1 Batch: 17620/20099 (87.67%) Loss: 2.206240 LR: 0.00000736 [16:59:14] Epoch: 1 Batch: 17621/20099 (87.67%) Loss: 2.347862 LR: 0.00000736 [16:59:16] Epoch: 1 Batch: 17622/20099 (87.68%) Loss: 2.076741 LR: 0.00000736 [16:59:18] Epoch: 1 Batch: 17623/20099 (87.68%) Loss: 2.159196 LR: 0.00000736 [16:59:20] Epoch: 1 Batch: 17624/20099 (87.69%) Loss: 1.809497 LR: 0.00000736 [16:59:22] Epoch: 1 Batch: 17625/20099 (87.69%) Loss: 2.177937 LR: 0.00000736 [16:59:23] Epoch: 1 Batch: 17626/20099 (87.70%) Loss: 2.155886 LR: 0.00000736 [16:59:25] Epoch: 1 Batch: 17627/20099 (87.70%) Loss: 2.022970 LR: 0.00000735 [16:59:27] Epoch: 1 Batch: 17628/20099 (87.71%) Loss: 2.060594 LR: 0.00000735 [16:59:29] Epoch: 1 Batch: 17629/20099 (87.71%) Loss: 2.143426 LR: 0.00000735 [16:59:31] Epoch: 1 Batch: 17630/20099 (87.72%) Loss: 2.029916 LR: 0.00000735 [16:59:33] Epoch: 1 Batch: 17631/20099 (87.72%) Loss: 2.418743 LR: 0.00000735 [16:59:35] Epoch: 1 Batch: 17632/20099 (87.73%) Loss: 1.934432 LR: 0.00000735 [16:59:36] Epoch: 1 Batch: 17633/20099 (87.73%) Loss: 2.026227 LR: 0.00000735 [16:59:38] Epoch: 1 Batch: 17634/20099 (87.74%) Loss: 1.899903 LR: 0.00000734 [16:59:40] Epoch: 1 Batch: 17635/20099 (87.74%) Loss: 2.087386 LR: 0.00000734 [16:59:42] Epoch: 1 Batch: 17636/20099 (87.75%) Loss: 2.075427 LR: 0.00000734 [16:59:44] Epoch: 1 Batch: 17637/20099 (87.75%) Loss: 1.812583 LR: 0.00000734 [16:59:46] Epoch: 1 Batch: 17638/20099 (87.76%) Loss: 2.168366 LR: 
0.00000734 [16:59:47] Epoch: 1 Batch: 17639/20099 (87.76%) Loss: 1.981191 LR: 0.00000734 [16:59:49] Epoch: 1 Batch: 17640/20099 (87.77%) Loss: 2.203865 LR: 0.00000734 [16:59:51] Epoch: 1 Batch: 17641/20099 (87.77%) Loss: 1.861481 LR: 0.00000734 [16:59:53] Epoch: 1 Batch: 17642/20099 (87.78%) Loss: 2.230494 LR: 0.00000734 [16:59:55] Epoch: 1 Batch: 17643/20099 (87.78%) Loss: 2.144921 LR: 0.00000734 [16:59:57] Epoch: 1 Batch: 17644/20099 (87.79%) Loss: 2.140000 LR: 0.00000734 [16:59:59] Epoch: 1 Batch: 17645/20099 (87.79%) Loss: 2.020084 LR: 0.00000734 [17:00:00] Epoch: 1 Batch: 17646/20099 (87.80%) Loss: 2.117768 LR: 0.00000734 [17:00:02] Epoch: 1 Batch: 17647/20099 (87.80%) Loss: 1.933041 LR: 0.00000734 [17:00:04] Epoch: 1 Batch: 17648/20099 (87.81%) Loss: 2.227930 LR: 0.00000733 [17:00:06] Epoch: 1 Batch: 17649/20099 (87.81%) Loss: 2.109669 LR: 0.00000733 [17:00:08] Epoch: 1 Batch: 17650/20099 (87.82%) Loss: 2.393474 LR: 0.00000733 [17:00:10] Epoch: 1 Batch: 17651/20099 (87.82%) Loss: 1.935545 LR: 0.00000733 [17:00:12] Epoch: 1 Batch: 17652/20099 (87.83%) Loss: 2.100092 LR: 0.00000733 [17:00:13] Epoch: 1 Batch: 17653/20099 (87.83%) Loss: 2.068494 LR: 0.00000733 [17:00:15] Epoch: 1 Batch: 17654/20099 (87.84%) Loss: 2.245531 LR: 0.00000733 [17:00:17] Epoch: 1 Batch: 17655/20099 (87.84%) Loss: 2.064369 LR: 0.00000732 [17:00:19] Epoch: 1 Batch: 17656/20099 (87.85%) Loss: 1.957474 LR: 0.00000732 [17:00:21] Epoch: 1 Batch: 17657/20099 (87.85%) Loss: 2.269178 LR: 0.00000732 [17:00:23] Epoch: 1 Batch: 17658/20099 (87.86%) Loss: 2.324436 LR: 0.00000732 [17:00:24] Epoch: 1 Batch: 17659/20099 (87.86%) Loss: 1.933003 LR: 0.00000732 [17:00:26] Epoch: 1 Batch: 17660/20099 (87.87%) Loss: 2.320109 LR: 0.00000732 [17:00:28] Epoch: 1 Batch: 17661/20099 (87.87%) Loss: 2.125337 LR: 0.00000732 [17:00:30] Epoch: 1 Batch: 17662/20099 (87.88%) Loss: 2.285500 LR: 0.00000731 [17:00:32] Epoch: 1 Batch: 17663/20099 (87.88%) Loss: 1.969454 LR: 0.00000731 [17:00:34] Epoch: 1 Batch: 17664/20099 
(87.88%) Loss: 1.690562 LR: 0.00000731 [17:00:36] Epoch: 1 Batch: 17665/20099 (87.89%) Loss: 2.244020 LR: 0.00000731 [17:00:37] Epoch: 1 Batch: 17666/20099 (87.89%) Loss: 2.180691 LR: 0.00000731 [17:00:39] Epoch: 1 Batch: 17667/20099 (87.90%) Loss: 1.858263 LR: 0.00000731 [17:00:41] Epoch: 1 Batch: 17668/20099 (87.90%) Loss: 1.795972 LR: 0.00000731 [17:00:43] Epoch: 1 Batch: 17669/20099 (87.91%) Loss: 1.923286 LR: 0.00000731 [17:00:45] Epoch: 1 Batch: 17670/20099 (87.91%) Loss: 2.106896 LR: 0.00000731 [17:00:47] Epoch: 1 Batch: 17671/20099 (87.92%) Loss: 2.419809 LR: 0.00000731 [17:00:49] Epoch: 1 Batch: 17672/20099 (87.92%) Loss: 2.039883 LR: 0.00000731 [17:00:50] Epoch: 1 Batch: 17673/20099 (87.93%) Loss: 2.273315 LR: 0.00000731 [17:00:52] Epoch: 1 Batch: 17674/20099 (87.93%) Loss: 1.959483 LR: 0.00000731 [17:00:54] Epoch: 1 Batch: 17675/20099 (87.94%) Loss: 2.292348 LR: 0.00000731 [17:00:56] Epoch: 1 Batch: 17676/20099 (87.94%) Loss: 1.933493 LR: 0.00000730 [17:00:58] Epoch: 1 Batch: 17677/20099 (87.95%) Loss: 2.140478 LR: 0.00000730 [17:01:00] Epoch: 1 Batch: 17678/20099 (87.95%) Loss: 1.856125 LR: 0.00000730 [17:01:01] Epoch: 1 Batch: 17679/20099 (87.96%) Loss: 2.062682 LR: 0.00000730 [17:01:03] Epoch: 1 Batch: 17680/20099 (87.96%) Loss: 2.260612 LR: 0.00000730 [17:01:05] Epoch: 1 Batch: 17681/20099 (87.97%) Loss: 2.011373 LR: 0.00000730 [17:01:07] Epoch: 1 Batch: 17682/20099 (87.97%) Loss: 2.133018 LR: 0.00000730 [17:01:09] Epoch: 1 Batch: 17683/20099 (87.98%) Loss: 1.878486 LR: 0.00000729 [17:01:11] Epoch: 1 Batch: 17684/20099 (87.98%) Loss: 1.989194 LR: 0.00000729 [17:01:12] Epoch: 1 Batch: 17685/20099 (87.99%) Loss: 2.178058 LR: 0.00000729 [17:01:14] Epoch: 1 Batch: 17686/20099 (87.99%) Loss: 2.395487 LR: 0.00000729 [17:01:16] Epoch: 1 Batch: 17687/20099 (88.00%) Loss: 1.917369 LR: 0.00000729 [17:01:18] Epoch: 1 Batch: 17688/20099 (88.00%) Loss: 2.213969 LR: 0.00000729 [17:01:20] Epoch: 1 Batch: 17689/20099 (88.01%) Loss: 2.026361 LR: 0.00000729 [17:01:22] 
Epoch: 1 Batch: 17690/20099 (88.01%) Loss: 1.888976 LR: 0.00000728 [17:01:24] Epoch: 1 Batch: 17691/20099 (88.02%) Loss: 2.220475 LR: 0.00000728 [17:01:25] Epoch: 1 Batch: 17692/20099 (88.02%) Loss: 2.110624 LR: 0.00000728 [17:01:27] Epoch: 1 Batch: 17693/20099 (88.03%) Loss: 2.161783 LR: 0.00000728 [17:01:29] Epoch: 1 Batch: 17694/20099 (88.03%) Loss: 2.077018 LR: 0.00000728 [17:01:31] Epoch: 1 Batch: 17695/20099 (88.04%) Loss: 1.980493 LR: 0.00000728 [17:01:33] Epoch: 1 Batch: 17696/20099 (88.04%) Loss: 1.818541 LR: 0.00000728 [17:01:35] Epoch: 1 Batch: 17697/20099 (88.05%) Loss: 1.978923 LR: 0.00000728 [17:01:37] Epoch: 1 Batch: 17698/20099 (88.05%) Loss: 2.124783 LR: 0.00000728 [17:01:38] Epoch: 1 Batch: 17699/20099 (88.06%) Loss: 2.372052 LR: 0.00000728 [17:01:40] Epoch: 1 Batch: 17700/20099 (88.06%) Loss: 1.779910 LR: 0.00000728 [17:01:42] Epoch: 1 Batch: 17701/20099 (88.07%) Loss: 2.069732 LR: 0.00000728 [17:01:44] Epoch: 1 Batch: 17702/20099 (88.07%) Loss: 2.196401 LR: 0.00000728 [17:01:46] Epoch: 1 Batch: 17703/20099 (88.08%) Loss: 2.084141 LR: 0.00000728 [17:01:48] Epoch: 1 Batch: 17704/20099 (88.08%) Loss: 1.797920 LR: 0.00000727 [17:01:50] Epoch: 1 Batch: 17705/20099 (88.09%) Loss: 2.074756 LR: 0.00000727 [17:01:51] Epoch: 1 Batch: 17706/20099 (88.09%) Loss: 2.008380 LR: 0.00000727 [17:01:53] Epoch: 1 Batch: 17707/20099 (88.10%) Loss: 2.029838 LR: 0.00000727 [17:01:55] Epoch: 1 Batch: 17708/20099 (88.10%) Loss: 2.123073 LR: 0.00000727 [17:01:57] Epoch: 1 Batch: 17709/20099 (88.11%) Loss: 2.013156 LR: 0.00000727 [17:01:59] Epoch: 1 Batch: 17710/20099 (88.11%) Loss: 2.300520 LR: 0.00000727 [17:02:01] Epoch: 1 Batch: 17711/20099 (88.12%) Loss: 1.933515 LR: 0.00000726 [17:02:03] Epoch: 1 Batch: 17712/20099 (88.12%) Loss: 2.129671 LR: 0.00000726 [17:02:04] Epoch: 1 Batch: 17713/20099 (88.13%) Loss: 1.878718 LR: 0.00000726 [17:02:06] Epoch: 1 Batch: 17714/20099 (88.13%) Loss: 2.067873 LR: 0.00000726 [17:02:08] Epoch: 1 Batch: 17715/20099 (88.14%) Loss: 
1.993014 LR: 0.00000726 [17:02:10] Epoch: 1 Batch: 17716/20099 (88.14%) Loss: 2.073698 LR: 0.00000726 [17:02:12] Epoch: 1 Batch: 17717/20099 (88.15%) Loss: 2.340602 LR: 0.00000726 [17:02:14] Epoch: 1 Batch: 17718/20099 (88.15%) Loss: 2.361984 LR: 0.00000726 [17:02:16] Epoch: 1 Batch: 17719/20099 (88.16%) Loss: 1.764903 LR: 0.00000726 [17:02:17] Epoch: 1 Batch: 17720/20099 (88.16%) Loss: 2.232680 LR: 0.00000726 [17:02:19] Epoch: 1 Batch: 17721/20099 (88.17%) Loss: 2.203587 LR: 0.00000726 [17:02:21] Epoch: 1 Batch: 17722/20099 (88.17%) Loss: 1.875383 LR: 0.00000726 [17:02:23] Epoch: 1 Batch: 17723/20099 (88.18%) Loss: 1.981509 LR: 0.00000726 [17:02:25] Epoch: 1 Batch: 17724/20099 (88.18%) Loss: 2.154276 LR: 0.00000726 [17:02:27] Epoch: 1 Batch: 17725/20099 (88.19%) Loss: 2.090257 LR: 0.00000725 [17:02:28] Epoch: 1 Batch: 17726/20099 (88.19%) Loss: 2.161135 LR: 0.00000725 [17:02:30] Epoch: 1 Batch: 17727/20099 (88.20%) Loss: 2.083909 LR: 0.00000725 [17:02:32] Epoch: 1 Batch: 17728/20099 (88.20%) Loss: 2.006870 LR: 0.00000725 [17:02:34] Epoch: 1 Batch: 17729/20099 (88.21%) Loss: 1.738441 LR: 0.00000725 [17:02:36] Epoch: 1 Batch: 17730/20099 (88.21%) Loss: 1.899634 LR: 0.00000725 [17:02:38] Epoch: 1 Batch: 17731/20099 (88.22%) Loss: 2.118209 LR: 0.00000725 [17:02:39] Epoch: 1 Batch: 17732/20099 (88.22%) Loss: 1.882387 LR: 0.00000724 [17:02:41] Epoch: 1 Batch: 17733/20099 (88.23%) Loss: 2.321584 LR: 0.00000724 [17:02:43] Epoch: 1 Batch: 17734/20099 (88.23%) Loss: 2.042328 LR: 0.00000724 [17:02:45] Epoch: 1 Batch: 17735/20099 (88.24%) Loss: 1.898383 LR: 0.00000724 [17:02:47] Epoch: 1 Batch: 17736/20099 (88.24%) Loss: 1.864330 LR: 0.00000724 [17:02:49] Epoch: 1 Batch: 17737/20099 (88.25%) Loss: 2.093995 LR: 0.00000724 [17:02:51] Epoch: 1 Batch: 17738/20099 (88.25%) Loss: 2.156176 LR: 0.00000724 [17:02:52] Epoch: 1 Batch: 17739/20099 (88.26%) Loss: 2.170602 LR: 0.00000723 [17:02:54] Epoch: 1 Batch: 17740/20099 (88.26%) Loss: 2.152958 LR: 0.00000723 [17:02:56] Epoch: 1 
Batch: 17741/20099 (88.27%) Loss: 2.152933 LR: 0.00000723 [17:02:58] Epoch: 1 Batch: 17742/20099 (88.27%) Loss: 2.097863 LR: 0.00000723 [17:03:00] Epoch: 1 Batch: 17743/20099 (88.28%) Loss: 2.093624 LR: 0.00000723 [17:03:02] Epoch: 1 Batch: 17744/20099 (88.28%) Loss: 1.947402 LR: 0.00000723 [17:03:04] Epoch: 1 Batch: 17745/20099 (88.29%) Loss: 1.961210 LR: 0.00000723 [17:03:05] Epoch: 1 Batch: 17746/20099 (88.29%) Loss: 2.145979 LR: 0.00000723 [17:03:07] Epoch: 1 Batch: 17747/20099 (88.30%) Loss: 1.895670 LR: 0.00000723 [17:03:09] Epoch: 1 Batch: 17748/20099 (88.30%) Loss: 2.367840 LR: 0.00000723 [17:03:11] Epoch: 1 Batch: 17749/20099 (88.31%) Loss: 2.177567 LR: 0.00000723 [17:03:13] Epoch: 1 Batch: 17750/20099 (88.31%) Loss: 2.069417 LR: 0.00000723 [17:03:15] Epoch: 1 Batch: 17751/20099 (88.32%) Loss: 1.975558 LR: 0.00000723 [17:03:17] Epoch: 1 Batch: 17752/20099 (88.32%) Loss: 1.785103 LR: 0.00000723 [17:03:18] Epoch: 1 Batch: 17753/20099 (88.33%) Loss: 2.017456 LR: 0.00000722 [17:03:20] Epoch: 1 Batch: 17754/20099 (88.33%) Loss: 2.025445 LR: 0.00000722 [17:03:22] Epoch: 1 Batch: 17755/20099 (88.34%) Loss: 1.935558 LR: 0.00000722 [17:03:24] Epoch: 1 Batch: 17756/20099 (88.34%) Loss: 1.691265 LR: 0.00000722 [17:03:26] Epoch: 1 Batch: 17757/20099 (88.35%) Loss: 2.150466 LR: 0.00000722 [17:03:28] Epoch: 1 Batch: 17758/20099 (88.35%) Loss: 2.097889 LR: 0.00000722 [17:03:30] Epoch: 1 Batch: 17759/20099 (88.36%) Loss: 2.283082 LR: 0.00000722 [17:03:31] Epoch: 1 Batch: 17760/20099 (88.36%) Loss: 2.078397 LR: 0.00000721 [17:03:33] Epoch: 1 Batch: 17761/20099 (88.37%) Loss: 2.075105 LR: 0.00000721 [17:03:35] Epoch: 1 Batch: 17762/20099 (88.37%) Loss: 2.476645 LR: 0.00000721 [17:03:37] Epoch: 1 Batch: 17763/20099 (88.38%) Loss: 2.207800 LR: 0.00000721 [17:03:39] Epoch: 1 Batch: 17764/20099 (88.38%) Loss: 2.280474 LR: 0.00000721 [17:03:41] Epoch: 1 Batch: 17765/20099 (88.39%) Loss: 2.246104 LR: 0.00000721 [17:03:43] Epoch: 1 Batch: 17766/20099 (88.39%) Loss: 2.097852 LR: 
0.00000721 [17:03:44] Epoch: 1 Batch: 17767/20099 (88.40%) Loss: 2.095386 LR: 0.00000721 [17:03:46] Epoch: 1 Batch: 17768/20099 (88.40%) Loss: 2.010395 LR: 0.00000721 [17:03:48] Epoch: 1 Batch: 17769/20099 (88.41%) Loss: 2.240827 LR: 0.00000721 [17:03:50] Epoch: 1 Batch: 17770/20099 (88.41%) Loss: 2.317030 LR: 0.00000721 [17:03:52] Epoch: 1 Batch: 17771/20099 (88.42%) Loss: 1.914078 LR: 0.00000721 [17:03:54] Epoch: 1 Batch: 17772/20099 (88.42%) Loss: 2.124379 LR: 0.00000721 [17:03:56] Epoch: 1 Batch: 17773/20099 (88.43%) Loss: 2.031997 LR: 0.00000721 [17:03:57] Epoch: 1 Batch: 17774/20099 (88.43%) Loss: 2.147969 LR: 0.00000720 [17:03:59] Epoch: 1 Batch: 17775/20099 (88.44%) Loss: 2.081557 LR: 0.00000720 [17:04:01] Epoch: 1 Batch: 17776/20099 (88.44%) Loss: 1.790178 LR: 0.00000720 [17:04:03] Epoch: 1 Batch: 17777/20099 (88.45%) Loss: 2.189587 LR: 0.00000720 [17:04:05] Epoch: 1 Batch: 17778/20099 (88.45%) Loss: 2.259582 LR: 0.00000720 [17:04:07] Epoch: 1 Batch: 17779/20099 (88.46%) Loss: 2.193757 LR: 0.00000720 [17:04:08] Epoch: 1 Batch: 17780/20099 (88.46%) Loss: 1.998036 LR: 0.00000720 [17:04:10] Epoch: 1 Batch: 17781/20099 (88.47%) Loss: 2.237819 LR: 0.00000719 [17:04:12] Epoch: 1 Batch: 17782/20099 (88.47%) Loss: 1.847376 LR: 0.00000719 [17:04:14] Epoch: 1 Batch: 17783/20099 (88.48%) Loss: 2.302372 LR: 0.00000719 [17:04:16] Epoch: 1 Batch: 17784/20099 (88.48%) Loss: 2.222530 LR: 0.00000719 [17:04:18] Epoch: 1 Batch: 17785/20099 (88.49%) Loss: 2.163054 LR: 0.00000719 [17:04:20] Epoch: 1 Batch: 17786/20099 (88.49%) Loss: 2.123675 LR: 0.00000719 [17:04:21] Epoch: 1 Batch: 17787/20099 (88.50%) Loss: 2.047291 LR: 0.00000719 [17:04:23] Epoch: 1 Batch: 17788/20099 (88.50%) Loss: 1.886324 LR: 0.00000718 [17:04:25] Epoch: 1 Batch: 17789/20099 (88.51%) Loss: 2.026636 LR: 0.00000718 [17:04:27] Epoch: 1 Batch: 17790/20099 (88.51%) Loss: 2.280516 LR: 0.00000718 [17:04:29] Epoch: 1 Batch: 17791/20099 (88.52%) Loss: 1.925335 LR: 0.00000718 [17:04:31] Epoch: 1 Batch: 17792/20099 
(88.52%) Loss: 2.167463 LR: 0.00000718 [17:04:32] Epoch: 1 Batch: 17793/20099 (88.53%) Loss: 2.033176 LR: 0.00000718 [17:04:34] Epoch: 1 Batch: 17794/20099 (88.53%) Loss: 2.170818 LR: 0.00000718 [17:04:36] Epoch: 1 Batch: 17795/20099 (88.54%) Loss: 2.071957 LR: 0.00000718 [17:04:38] Epoch: 1 Batch: 17796/20099 (88.54%) Loss: 1.714383 LR: 0.00000718 [17:04:40] Epoch: 1 Batch: 17797/20099 (88.55%) Loss: 2.283930 LR: 0.00000718 [17:04:42] Epoch: 1 Batch: 17798/20099 (88.55%) Loss: 2.142462 LR: 0.00000718 [17:04:43] Epoch: 1 Batch: 17799/20099 (88.56%) Loss: 2.096951 LR: 0.00000718 [17:04:49] >> Cleaned up old temp checkpoint: epoch1_step15800 [17:04:49] >> Temp checkpoint saved: epoch1_step17800, size: 0.1693 GB [17:04:49] Epoch: 1 Batch: 17800/20099 (88.56%) Loss: 2.086728 LR: 0.00000718 [17:04:51] Epoch: 1 Batch: 17801/20099 (88.57%) Loss: 2.153544 LR: 0.00000718 [17:04:53] Epoch: 1 Batch: 17802/20099 (88.57%) Loss: 2.114238 LR: 0.00000717 [17:04:55] Epoch: 1 Batch: 17803/20099 (88.58%) Loss: 1.773811 LR: 0.00000717 [17:04:57] Epoch: 1 Batch: 17804/20099 (88.58%) Loss: 2.132750 LR: 0.00000717 [17:04:58] Epoch: 1 Batch: 17805/20099 (88.59%) Loss: 2.149646 LR: 0.00000717 [17:05:00] Epoch: 1 Batch: 17806/20099 (88.59%) Loss: 2.162595 LR: 0.00000717 [17:05:02] Epoch: 1 Batch: 17807/20099 (88.60%) Loss: 2.044490 LR: 0.00000717 [17:05:04] Epoch: 1 Batch: 17808/20099 (88.60%) Loss: 2.073534 LR: 0.00000717 [17:05:06] Epoch: 1 Batch: 17809/20099 (88.61%) Loss: 2.203354 LR: 0.00000716 [17:05:08] Epoch: 1 Batch: 17810/20099 (88.61%) Loss: 2.009174 LR: 0.00000716 [17:05:10] Epoch: 1 Batch: 17811/20099 (88.62%) Loss: 1.987364 LR: 0.00000716 [17:05:11] Epoch: 1 Batch: 17812/20099 (88.62%) Loss: 2.108571 LR: 0.00000716 [17:05:13] Epoch: 1 Batch: 17813/20099 (88.63%) Loss: 2.216749 LR: 0.00000716 [17:05:15] Epoch: 1 Batch: 17814/20099 (88.63%) Loss: 2.198654 LR: 0.00000716 [17:05:17] Epoch: 1 Batch: 17815/20099 (88.64%) Loss: 1.938093 LR: 0.00000716 [17:05:19] Epoch: 1 Batch: 
17816/20099 (88.64%) Loss: 2.048977 LR: 0.00000716 [17:05:21] Epoch: 1 Batch: 17817/20099 (88.65%) Loss: 2.127401 LR: 0.00000716 [17:05:23] Epoch: 1 Batch: 17818/20099 (88.65%) Loss: 1.921334 LR: 0.00000716 [17:05:25] Epoch: 1 Batch: 17819/20099 (88.66%) Loss: 2.101003 LR: 0.00000716 [17:05:26] Epoch: 1 Batch: 17820/20099 (88.66%) Loss: 2.377211 LR: 0.00000716 [17:05:28] Epoch: 1 Batch: 17821/20099 (88.67%) Loss: 2.196701 LR: 0.00000716 [17:05:30] Epoch: 1 Batch: 17822/20099 (88.67%) Loss: 2.033611 LR: 0.00000716 [17:05:32] Epoch: 1 Batch: 17823/20099 (88.68%) Loss: 2.112613 LR: 0.00000715 [17:05:34] Epoch: 1 Batch: 17824/20099 (88.68%) Loss: 1.988445 LR: 0.00000715 [17:05:36] Epoch: 1 Batch: 17825/20099 (88.69%) Loss: 2.464946 LR: 0.00000715 [17:05:38] Epoch: 1 Batch: 17826/20099 (88.69%) Loss: 1.981993 LR: 0.00000715 [17:05:39] Epoch: 1 Batch: 17827/20099 (88.70%) Loss: 1.897996 LR: 0.00000715 [17:05:41] Epoch: 1 Batch: 17828/20099 (88.70%) Loss: 1.987734 LR: 0.00000715 [17:05:43] Epoch: 1 Batch: 17829/20099 (88.71%) Loss: 2.091252 LR: 0.00000715 [17:05:45] Epoch: 1 Batch: 17830/20099 (88.71%) Loss: 2.062995 LR: 0.00000714 [17:05:47] Epoch: 1 Batch: 17831/20099 (88.72%) Loss: 1.685725 LR: 0.00000714 [17:05:49] Epoch: 1 Batch: 17832/20099 (88.72%) Loss: 2.255077 LR: 0.00000714 [17:05:50] Epoch: 1 Batch: 17833/20099 (88.73%) Loss: 2.176946 LR: 0.00000714 [17:05:52] Epoch: 1 Batch: 17834/20099 (88.73%) Loss: 2.283120 LR: 0.00000714 [17:05:54] Epoch: 1 Batch: 17835/20099 (88.74%) Loss: 2.065267 LR: 0.00000714 [17:05:56] Epoch: 1 Batch: 17836/20099 (88.74%) Loss: 2.273965 LR: 0.00000714 [17:05:58] Epoch: 1 Batch: 17837/20099 (88.75%) Loss: 1.603566 LR: 0.00000714 [17:06:00] Epoch: 1 Batch: 17838/20099 (88.75%) Loss: 2.361848 LR: 0.00000714 [17:06:02] Epoch: 1 Batch: 17839/20099 (88.76%) Loss: 1.719803 LR: 0.00000714 [17:06:03] Epoch: 1 Batch: 17840/20099 (88.76%) Loss: 2.097347 LR: 0.00000714 [17:06:05] Epoch: 1 Batch: 17841/20099 (88.77%) Loss: 2.188777 LR: 
0.00000714 [17:06:07] Epoch: 1 Batch: 17842/20099 (88.77%) Loss: 2.086007 LR: 0.00000714 [17:06:09] Epoch: 1 Batch: 17843/20099 (88.78%) Loss: 1.674153 LR: 0.00000714 [17:06:11] Epoch: 1 Batch: 17844/20099 (88.78%) Loss: 1.934477 LR: 0.00000713 [17:06:13] Epoch: 1 Batch: 17845/20099 (88.79%) Loss: 2.114703 LR: 0.00000713 [17:06:15] Epoch: 1 Batch: 17846/20099 (88.79%) Loss: 1.869865 LR: 0.00000713 [17:06:16] Epoch: 1 Batch: 17847/20099 (88.80%) Loss: 2.153941 LR: 0.00000713 [17:06:18] Epoch: 1 Batch: 17848/20099 (88.80%) Loss: 1.988660 LR: 0.00000713 [17:06:20] Epoch: 1 Batch: 17849/20099 (88.81%) Loss: 2.098267 LR: 0.00000713 [17:06:22] Epoch: 1 Batch: 17850/20099 (88.81%) Loss: 2.158161 LR: 0.00000713 [17:06:24] Epoch: 1 Batch: 17851/20099 (88.82%) Loss: 2.178480 LR: 0.00000712 [17:06:26] Epoch: 1 Batch: 17852/20099 (88.82%) Loss: 2.169351 LR: 0.00000712 [17:06:27] Epoch: 1 Batch: 17853/20099 (88.83%) Loss: 1.938846 LR: 0.00000712 [17:06:29] Epoch: 1 Batch: 17854/20099 (88.83%) Loss: 1.951990 LR: 0.00000712 [17:06:31] Epoch: 1 Batch: 17855/20099 (88.84%) Loss: 2.364931 LR: 0.00000712 [17:06:33] Epoch: 1 Batch: 17856/20099 (88.84%) Loss: 2.037882 LR: 0.00000712 [17:06:35] Epoch: 1 Batch: 17857/20099 (88.85%) Loss: 2.028296 LR: 0.00000712 [17:06:37] Epoch: 1 Batch: 17858/20099 (88.85%) Loss: 2.349697 LR: 0.00000711 [17:06:39] Epoch: 1 Batch: 17859/20099 (88.86%) Loss: 2.146251 LR: 0.00000711 [17:06:40] Epoch: 1 Batch: 17860/20099 (88.86%) Loss: 2.281940 LR: 0.00000711 [17:06:42] Epoch: 1 Batch: 17861/20099 (88.87%) Loss: 2.065648 LR: 0.00000711 [17:06:44] Epoch: 1 Batch: 17862/20099 (88.87%) Loss: 2.080222 LR: 0.00000711 [17:06:46] Epoch: 1 Batch: 17863/20099 (88.88%) Loss: 1.964322 LR: 0.00000711 [17:06:48] Epoch: 1 Batch: 17864/20099 (88.88%) Loss: 2.045818 LR: 0.00000711 [17:06:50] Epoch: 1 Batch: 17865/20099 (88.89%) Loss: 2.146172 LR: 0.00000711 [17:06:52] Epoch: 1 Batch: 17866/20099 (88.89%) Loss: 2.075050 LR: 0.00000711 [17:06:53] Epoch: 1 Batch: 17867/20099 
(88.89%) Loss: 1.876192 LR: 0.00000711 [17:06:55] Epoch: 1 Batch: 17868/20099 (88.90%) Loss: 1.633713 LR: 0.00000711 [17:06:57] Epoch: 1 Batch: 17869/20099 (88.90%) Loss: 1.890222 LR: 0.00000711 [17:06:59] Epoch: 1 Batch: 17870/20099 (88.91%) Loss: 2.202323 LR: 0.00000711 [17:07:01] Epoch: 1 Batch: 17871/20099 (88.91%) Loss: 2.102167 LR: 0.00000711 [17:07:03] Epoch: 1 Batch: 17872/20099 (88.92%) Loss: 1.839270 LR: 0.00000710 [17:07:05] Epoch: 1 Batch: 17873/20099 (88.92%) Loss: 2.015069 LR: 0.00000710 [17:07:06] Epoch: 1 Batch: 17874/20099 (88.93%) Loss: 2.213577 LR: 0.00000710 [17:07:08] Epoch: 1 Batch: 17875/20099 (88.93%) Loss: 2.445965 LR: 0.00000710 [17:07:10] Epoch: 1 Batch: 17876/20099 (88.94%) Loss: 2.367826 LR: 0.00000710 [17:07:12] Epoch: 1 Batch: 17877/20099 (88.94%) Loss: 1.769125 LR: 0.00000710 [17:07:14] Epoch: 1 Batch: 17878/20099 (88.95%) Loss: 2.318062 LR: 0.00000710 [17:07:16] Epoch: 1 Batch: 17879/20099 (88.95%) Loss: 2.199411 LR: 0.00000709 [17:07:18] Epoch: 1 Batch: 17880/20099 (88.96%) Loss: 2.319661 LR: 0.00000709 [17:07:19] Epoch: 1 Batch: 17881/20099 (88.96%) Loss: 1.950304 LR: 0.00000709 [17:07:21] Epoch: 1 Batch: 17882/20099 (88.97%) Loss: 2.120035 LR: 0.00000709 [17:07:23] Epoch: 1 Batch: 17883/20099 (88.97%) Loss: 1.683664 LR: 0.00000709 [17:07:25] Epoch: 1 Batch: 17884/20099 (88.98%) Loss: 2.361920 LR: 0.00000709 [17:07:27] Epoch: 1 Batch: 17885/20099 (88.98%) Loss: 2.172902 LR: 0.00000709 [17:07:29] Epoch: 1 Batch: 17886/20099 (88.99%) Loss: 2.611073 LR: 0.00000709 [17:07:31] Epoch: 1 Batch: 17887/20099 (88.99%) Loss: 1.832123 LR: 0.00000709 [17:07:32] Epoch: 1 Batch: 17888/20099 (89.00%) Loss: 2.060222 LR: 0.00000709 [17:07:34] Epoch: 1 Batch: 17889/20099 (89.00%) Loss: 2.115753 LR: 0.00000709 [17:07:36] Epoch: 1 Batch: 17890/20099 (89.01%) Loss: 2.018043 LR: 0.00000709 [17:07:38] Epoch: 1 Batch: 17891/20099 (89.01%) Loss: 2.035562 LR: 0.00000709 [17:07:40] Epoch: 1 Batch: 17892/20099 (89.02%) Loss: 1.761590 LR: 0.00000709 [17:07:42] 
Epoch: 1 Batch: 17893/20099 (89.02%) Loss: 2.229523 LR: 0.00000708 [17:07:44] Epoch: 1 Batch: 17894/20099 (89.03%) Loss: 2.173752 LR: 0.00000708 [17:07:45] Epoch: 1 Batch: 17895/20099 (89.03%) Loss: 2.024402 LR: 0.00000708 [17:07:47] Epoch: 1 Batch: 17896/20099 (89.04%) Loss: 1.905504 LR: 0.00000708 [17:07:49] Epoch: 1 Batch: 17897/20099 (89.04%) Loss: 2.049084 LR: 0.00000708 [17:07:51] Epoch: 1 Batch: 17898/20099 (89.05%) Loss: 2.113774 LR: 0.00000708 [17:07:53] Epoch: 1 Batch: 17899/20099 (89.05%) Loss: 2.029503 LR: 0.00000708 [17:07:55] Epoch: 1 Batch: 17900/20099 (89.06%) Loss: 2.032397 LR: 0.00000707 [17:07:57] Epoch: 1 Batch: 17901/20099 (89.06%) Loss: 1.882246 LR: 0.00000707 [17:07:58] Epoch: 1 Batch: 17902/20099 (89.07%) Loss: 2.137533 LR: 0.00000707 [17:08:00] Epoch: 1 Batch: 17903/20099 (89.07%) Loss: 2.252595 LR: 0.00000707 [17:08:02] Epoch: 1 Batch: 17904/20099 (89.08%) Loss: 1.929169 LR: 0.00000707 [17:08:04] Epoch: 1 Batch: 17905/20099 (89.08%) Loss: 1.988222 LR: 0.00000707 [17:08:06] Epoch: 1 Batch: 17906/20099 (89.09%) Loss: 2.147563 LR: 0.00000707 [17:08:08] Epoch: 1 Batch: 17907/20099 (89.09%) Loss: 2.078092 LR: 0.00000707 [17:08:10] Epoch: 1 Batch: 17908/20099 (89.10%) Loss: 1.783320 LR: 0.00000707 [17:08:11] Epoch: 1 Batch: 17909/20099 (89.10%) Loss: 1.861536 LR: 0.00000707 [17:08:13] Epoch: 1 Batch: 17910/20099 (89.11%) Loss: 2.264801 LR: 0.00000707 [17:08:15] Epoch: 1 Batch: 17911/20099 (89.11%) Loss: 2.007893 LR: 0.00000707 [17:08:17] Epoch: 1 Batch: 17912/20099 (89.12%) Loss: 2.245257 LR: 0.00000707 [17:08:19] Epoch: 1 Batch: 17913/20099 (89.12%) Loss: 2.510348 LR: 0.00000707 [17:08:21] Epoch: 1 Batch: 17914/20099 (89.13%) Loss: 2.187956 LR: 0.00000706 [17:08:23] Epoch: 1 Batch: 17915/20099 (89.13%) Loss: 2.200603 LR: 0.00000706 [17:08:24] Epoch: 1 Batch: 17916/20099 (89.14%) Loss: 2.246451 LR: 0.00000706 [17:08:26] Epoch: 1 Batch: 17917/20099 (89.14%) Loss: 1.981159 LR: 0.00000706 [17:08:28] Epoch: 1 Batch: 17918/20099 (89.15%) Loss: 
2.116902 LR: 0.00000706 [17:08:30] Epoch: 1 Batch: 17919/20099 (89.15%) Loss: 2.226731 LR: 0.00000706 [17:08:32] Epoch: 1 Batch: 17920/20099 (89.16%) Loss: 2.142377 LR: 0.00000706 [17:08:34] Epoch: 1 Batch: 17921/20099 (89.16%) Loss: 1.907858 LR: 0.00000705 [17:08:36] Epoch: 1 Batch: 17922/20099 (89.17%) Loss: 1.893241 LR: 0.00000705 [17:08:37] Epoch: 1 Batch: 17923/20099 (89.17%) Loss: 1.731871 LR: 0.00000705 [17:08:39] Epoch: 1 Batch: 17924/20099 (89.18%) Loss: 1.828278 LR: 0.00000705 [17:08:41] Epoch: 1 Batch: 17925/20099 (89.18%) Loss: 2.215572 LR: 0.00000705 [17:08:43] Epoch: 1 Batch: 17926/20099 (89.19%) Loss: 1.900466 LR: 0.00000705 [17:08:45] Epoch: 1 Batch: 17927/20099 (89.19%) Loss: 1.966420 LR: 0.00000705 [17:08:47] Epoch: 1 Batch: 17928/20099 (89.20%) Loss: 1.700089 LR: 0.00000705 [17:08:49] Epoch: 1 Batch: 17929/20099 (89.20%) Loss: 1.965002 LR: 0.00000705 [17:08:50] Epoch: 1 Batch: 17930/20099 (89.21%) Loss: 1.595601 LR: 0.00000705 [17:08:52] Epoch: 1 Batch: 17931/20099 (89.21%) Loss: 2.120149 LR: 0.00000705 [17:08:54] Epoch: 1 Batch: 17932/20099 (89.22%) Loss: 2.189367 LR: 0.00000705 [17:08:56] Epoch: 1 Batch: 17933/20099 (89.22%) Loss: 2.095162 LR: 0.00000705 [17:08:58] Epoch: 1 Batch: 17934/20099 (89.23%) Loss: 1.639656 LR: 0.00000705 [17:09:00] Epoch: 1 Batch: 17935/20099 (89.23%) Loss: 1.996701 LR: 0.00000704 [17:09:02] Epoch: 1 Batch: 17936/20099 (89.24%) Loss: 2.109386 LR: 0.00000704 [17:09:03] Epoch: 1 Batch: 17937/20099 (89.24%) Loss: 2.138952 LR: 0.00000704 [17:09:05] Epoch: 1 Batch: 17938/20099 (89.25%) Loss: 1.821160 LR: 0.00000704 [17:09:07] Epoch: 1 Batch: 17939/20099 (89.25%) Loss: 1.806112 LR: 0.00000704 [17:09:09] Epoch: 1 Batch: 17940/20099 (89.26%) Loss: 2.380305 LR: 0.00000704 [17:09:11] Epoch: 1 Batch: 17941/20099 (89.26%) Loss: 1.861227 LR: 0.00000704 [17:09:13] Epoch: 1 Batch: 17942/20099 (89.27%) Loss: 1.916560 LR: 0.00000703 [17:09:15] Epoch: 1 Batch: 17943/20099 (89.27%) Loss: 2.018984 LR: 0.00000703 [17:09:16] Epoch: 1 
Batch: 17944/20099 (89.28%) Loss: 2.181723 LR: 0.00000703 [17:09:18] Epoch: 1 Batch: 17945/20099 (89.28%) Loss: 1.963849 LR: 0.00000703 [17:09:20] Epoch: 1 Batch: 17946/20099 (89.29%) Loss: 2.141358 LR: 0.00000703 [17:09:22] Epoch: 1 Batch: 17947/20099 (89.29%) Loss: 2.262803 LR: 0.00000703 [17:09:24] Epoch: 1 Batch: 17948/20099 (89.30%) Loss: 1.765720 LR: 0.00000703 [17:09:26] Epoch: 1 Batch: 17949/20099 (89.30%) Loss: 1.969283 LR: 0.00000703 [17:09:28] Epoch: 1 Batch: 17950/20099 (89.31%) Loss: 2.067737 LR: 0.00000703 [17:09:29] Epoch: 1 Batch: 17951/20099 (89.31%) Loss: 1.729749 LR: 0.00000703 [17:09:31] Epoch: 1 Batch: 17952/20099 (89.32%) Loss: 2.096179 LR: 0.00000703 [17:09:33] Epoch: 1 Batch: 17953/20099 (89.32%) Loss: 2.010196 LR: 0.00000703 [17:09:35] Epoch: 1 Batch: 17954/20099 (89.33%) Loss: 2.233782 LR: 0.00000703 [17:09:37] Epoch: 1 Batch: 17955/20099 (89.33%) Loss: 1.900432 LR: 0.00000703 [17:09:39] Epoch: 1 Batch: 17956/20099 (89.34%) Loss: 2.063324 LR: 0.00000702 [17:09:40] Epoch: 1 Batch: 17957/20099 (89.34%) Loss: 1.771828 LR: 0.00000702 [17:09:42] Epoch: 1 Batch: 17958/20099 (89.35%) Loss: 2.336871 LR: 0.00000702 [17:09:44] Epoch: 1 Batch: 17959/20099 (89.35%) Loss: 2.374394 LR: 0.00000702 [17:09:46] Epoch: 1 Batch: 17960/20099 (89.36%) Loss: 2.073898 LR: 0.00000702 [17:09:48] Epoch: 1 Batch: 17961/20099 (89.36%) Loss: 2.097205 LR: 0.00000702 [17:09:50] Epoch: 1 Batch: 17962/20099 (89.37%) Loss: 2.179702 LR: 0.00000702 [17:09:52] Epoch: 1 Batch: 17963/20099 (89.37%) Loss: 2.099352 LR: 0.00000701 [17:09:53] Epoch: 1 Batch: 17964/20099 (89.38%) Loss: 1.899663 LR: 0.00000701 [17:09:55] Epoch: 1 Batch: 17965/20099 (89.38%) Loss: 2.231411 LR: 0.00000701 [17:09:57] Epoch: 1 Batch: 17966/20099 (89.39%) Loss: 2.054854 LR: 0.00000701 [17:09:59] Epoch: 1 Batch: 17967/20099 (89.39%) Loss: 2.403244 LR: 0.00000701 [17:10:01] Epoch: 1 Batch: 17968/20099 (89.40%) Loss: 1.937529 LR: 0.00000701 [17:10:03] Epoch: 1 Batch: 17969/20099 (89.40%) Loss: 2.144378 LR: 
0.00000701 [17:10:05] Epoch: 1 Batch: 17970/20099 (89.41%) Loss: 2.156329 LR: 0.00000701 [17:10:06] Epoch: 1 Batch: 17971/20099 (89.41%) Loss: 2.115625 LR: 0.00000701 [17:10:08] Epoch: 1 Batch: 17972/20099 (89.42%) Loss: 1.965944 LR: 0.00000701 [17:10:10] Epoch: 1 Batch: 17973/20099 (89.42%) Loss: 2.171092 LR: 0.00000701 [17:10:12] Epoch: 1 Batch: 17974/20099 (89.43%) Loss: 1.909823 LR: 0.00000701 [17:10:14] Epoch: 1 Batch: 17975/20099 (89.43%) Loss: 2.093071 LR: 0.00000701 [17:10:16] Epoch: 1 Batch: 17976/20099 (89.44%) Loss: 1.913140 LR: 0.00000701 [17:10:17] Epoch: 1 Batch: 17977/20099 (89.44%) Loss: 2.326950 LR: 0.00000700 [17:10:19] Epoch: 1 Batch: 17978/20099 (89.45%) Loss: 2.089114 LR: 0.00000700 [17:10:21] Epoch: 1 Batch: 17979/20099 (89.45%) Loss: 2.009624 LR: 0.00000700 [17:10:23] Epoch: 1 Batch: 17980/20099 (89.46%) Loss: 1.958285 LR: 0.00000700 [17:10:25] Epoch: 1 Batch: 17981/20099 (89.46%) Loss: 2.094510 LR: 0.00000700 [17:10:27] Epoch: 1 Batch: 17982/20099 (89.47%) Loss: 1.758223 LR: 0.00000700 [17:10:29] Epoch: 1 Batch: 17983/20099 (89.47%) Loss: 2.000707 LR: 0.00000700 [17:10:30] Epoch: 1 Batch: 17984/20099 (89.48%) Loss: 1.948707 LR: 0.00000700 [17:10:32] Epoch: 1 Batch: 17985/20099 (89.48%) Loss: 2.217270 LR: 0.00000700 [17:10:34] Epoch: 1 Batch: 17986/20099 (89.49%) Loss: 2.099501 LR: 0.00000700 [17:10:36] Epoch: 1 Batch: 17987/20099 (89.49%) Loss: 2.207113 LR: 0.00000700 [17:10:38] Epoch: 1 Batch: 17988/20099 (89.50%) Loss: 2.160863 LR: 0.00000700 [17:10:40] Epoch: 1 Batch: 17989/20099 (89.50%) Loss: 2.235569 LR: 0.00000700 [17:10:42] Epoch: 1 Batch: 17990/20099 (89.51%) Loss: 1.662771 LR: 0.00000700 [17:10:43] Epoch: 1 Batch: 17991/20099 (89.51%) Loss: 2.004698 LR: 0.00000699 [17:10:45] Epoch: 1 Batch: 17992/20099 (89.52%) Loss: 2.117875 LR: 0.00000699 [17:10:47] Epoch: 1 Batch: 17993/20099 (89.52%) Loss: 1.828228 LR: 0.00000699 [17:10:49] Epoch: 1 Batch: 17994/20099 (89.53%) Loss: 2.016907 LR: 0.00000699 [17:10:51] Epoch: 1 Batch: 17995/20099 
(89.53%) Loss: 2.215260 LR: 0.00000699 [17:10:53] Epoch: 1 Batch: 17996/20099 (89.54%) Loss: 1.861059 LR: 0.00000699 [17:10:55] Epoch: 1 Batch: 17997/20099 (89.54%) Loss: 2.173875 LR: 0.00000699 [17:10:56] Epoch: 1 Batch: 17998/20099 (89.55%) Loss: 2.036344 LR: 0.00000698 [17:10:58] Epoch: 1 Batch: 17999/20099 (89.55%) Loss: 1.817745 LR: 0.00000698 [17:11:00] >> Evaluating batch 0 [17:11:01] >> Evaluating batch 1 [17:11:02] >> Evaluating batch 2 [17:11:03] >> Evaluating batch 3 [17:11:04] >> Evaluating batch 4 [17:11:06] >> Evaluating batch 5 [17:11:07] >> Evaluating batch 6 [17:11:08] >> Evaluating batch 7 [17:11:09] >> Evaluating batch 8 [17:11:10] >> Evaluating batch 9 [17:11:11] >> Evaluating batch 10 [17:11:12] >> Evaluating batch 11 [17:11:13] >> Evaluating batch 12 [17:11:14] >> Evaluating batch 13 [17:11:15] >> Evaluating batch 14 [17:11:16] >> Evaluating batch 15 [17:11:17] >> Evaluating batch 16 [17:11:17] Epoch: 1 Step: 18000/20099 Evaluation: [17:11:17] Avg Loss Since Last Eval: 2.0751 Val Loss: 2.1479 Validation loss delta: 0.0002 Perplexity: 8.5672 LR: 0.00000698 [17:11:21] >> Cleaned up old temp checkpoint: epoch1_step16000 [17:11:21] >> Temp checkpoint saved: epoch1_step18000, size: 0.1693 GB [17:11:25] >> Checkpoint saved: epoch1_step18000, size: 0.1693 GB [17:11:25] Epoch: 1 Batch: 18000/20099 (89.56%) Loss: 2.004705 LR: 0.00000698 [17:11:27] Epoch: 1 Batch: 18001/20099 (89.56%) Loss: 2.205158 LR: 0.00000698 [17:11:28] Epoch: 1 Batch: 18002/20099 (89.57%) Loss: 2.016294 LR: 0.00000698 [17:11:30] Epoch: 1 Batch: 18003/20099 (89.57%) Loss: 1.971382 LR: 0.00000698 [17:11:32] Epoch: 1 Batch: 18004/20099 (89.58%) Loss: 1.886335 LR: 0.00000698 [17:11:34] Epoch: 1 Batch: 18005/20099 (89.58%) Loss: 1.924114 LR: 0.00000698 [17:11:36] Epoch: 1 Batch: 18006/20099 (89.59%) Loss: 1.895831 LR: 0.00000698 [17:11:38] Epoch: 1 Batch: 18007/20099 (89.59%) Loss: 1.896155 LR: 0.00000698 [17:11:39] Epoch: 1 Batch: 18008/20099 (89.60%) Loss: 2.063445 LR: 0.00000698 
[17:11:41] Epoch: 1 Batch: 18009/20099 (89.60%) Loss: 1.721303 LR: 0.00000698 [17:11:43] Epoch: 1 Batch: 18010/20099 (89.61%) Loss: 2.263575 LR: 0.00000698 [17:11:45] Epoch: 1 Batch: 18011/20099 (89.61%) Loss: 2.260409 LR: 0.00000698 [17:11:47] Epoch: 1 Batch: 18012/20099 (89.62%) Loss: 2.441817 LR: 0.00000697 [17:11:49] Epoch: 1 Batch: 18013/20099 (89.62%) Loss: 2.200558 LR: 0.00000697 [17:11:51] Epoch: 1 Batch: 18014/20099 (89.63%) Loss: 2.163806 LR: 0.00000697 [17:11:53] Epoch: 1 Batch: 18015/20099 (89.63%) Loss: 2.083430 LR: 0.00000697 [17:11:55] Epoch: 1 Batch: 18016/20099 (89.64%) Loss: 2.095789 LR: 0.00000697 [17:11:57] Epoch: 1 Batch: 18017/20099 (89.64%) Loss: 2.156080 LR: 0.00000697 [17:11:59] Epoch: 1 Batch: 18018/20099 (89.65%) Loss: 1.983117 LR: 0.00000697 [17:12:00] Epoch: 1 Batch: 18019/20099 (89.65%) Loss: 2.137215 LR: 0.00000696 [17:12:02] Epoch: 1 Batch: 18020/20099 (89.66%) Loss: 2.316269 LR: 0.00000696 [17:12:04] Epoch: 1 Batch: 18021/20099 (89.66%) Loss: 2.132652 LR: 0.00000696 [17:12:06] Epoch: 1 Batch: 18022/20099 (89.67%) Loss: 2.201175 LR: 0.00000696 [17:12:08] Epoch: 1 Batch: 18023/20099 (89.67%) Loss: 2.304817 LR: 0.00000696 [17:12:10] Epoch: 1 Batch: 18024/20099 (89.68%) Loss: 1.968195 LR: 0.00000696 [17:12:12] Epoch: 1 Batch: 18025/20099 (89.68%) Loss: 2.180826 LR: 0.00000696 [17:12:14] Epoch: 1 Batch: 18026/20099 (89.69%) Loss: 1.883281 LR: 0.00000696 [17:12:15] Epoch: 1 Batch: 18027/20099 (89.69%) Loss: 2.215320 LR: 0.00000696 [17:12:17] Epoch: 1 Batch: 18028/20099 (89.70%) Loss: 2.149981 LR: 0.00000696 [17:12:19] Epoch: 1 Batch: 18029/20099 (89.70%) Loss: 2.217269 LR: 0.00000696 [17:12:21] Epoch: 1 Batch: 18030/20099 (89.71%) Loss: 1.900046 LR: 0.00000696 [17:12:23] Epoch: 1 Batch: 18031/20099 (89.71%) Loss: 2.091825 LR: 0.00000696 [17:12:25] Epoch: 1 Batch: 18032/20099 (89.72%) Loss: 2.042341 LR: 0.00000696 [17:12:26] Epoch: 1 Batch: 18033/20099 (89.72%) Loss: 2.202927 LR: 0.00000695 [17:12:28] Epoch: 1 Batch: 18034/20099 (89.73%) 
Loss: 1.931984 LR: 0.00000695 [17:12:30] Epoch: 1 Batch: 18035/20099 (89.73%) Loss: 2.253047 LR: 0.00000695 [17:12:32] Epoch: 1 Batch: 18036/20099 (89.74%) Loss: 2.353540 LR: 0.00000695 [17:12:34] Epoch: 1 Batch: 18037/20099 (89.74%) Loss: 2.253605 LR: 0.00000695 [17:12:36] Epoch: 1 Batch: 18038/20099 (89.75%) Loss: 2.340044 LR: 0.00000695 [17:12:37] Epoch: 1 Batch: 18039/20099 (89.75%) Loss: 2.103763 LR: 0.00000695 [17:12:39] Epoch: 1 Batch: 18040/20099 (89.76%) Loss: 1.990899 LR: 0.00000694 [17:12:41] Epoch: 1 Batch: 18041/20099 (89.76%) Loss: 2.345346 LR: 0.00000694 [17:12:43] Epoch: 1 Batch: 18042/20099 (89.77%) Loss: 2.633908 LR: 0.00000694 [17:12:45] Epoch: 1 Batch: 18043/20099 (89.77%) Loss: 1.917744 LR: 0.00000694 [17:12:47] Epoch: 1 Batch: 18044/20099 (89.78%) Loss: 2.171669 LR: 0.00000694 [17:12:49] Epoch: 1 Batch: 18045/20099 (89.78%) Loss: 2.174442 LR: 0.00000694 [17:12:50] Epoch: 1 Batch: 18046/20099 (89.79%) Loss: 2.062431 LR: 0.00000694 [17:12:52] Epoch: 1 Batch: 18047/20099 (89.79%) Loss: 2.188600 LR: 0.00000694 [17:12:54] Epoch: 1 Batch: 18048/20099 (89.80%) Loss: 2.104252 LR: 0.00000694 [17:12:56] Epoch: 1 Batch: 18049/20099 (89.80%) Loss: 2.041443 LR: 0.00000694 [17:12:58] Epoch: 1 Batch: 18050/20099 (89.81%) Loss: 2.104284 LR: 0.00000694 [17:13:00] Epoch: 1 Batch: 18051/20099 (89.81%) Loss: 2.017040 LR: 0.00000694 [17:13:02] Epoch: 1 Batch: 18052/20099 (89.82%) Loss: 2.321356 LR: 0.00000694 [17:13:03] Epoch: 1 Batch: 18053/20099 (89.82%) Loss: 2.158650 LR: 0.00000694 [17:13:05] Epoch: 1 Batch: 18054/20099 (89.83%) Loss: 2.085393 LR: 0.00000693 [17:13:07] Epoch: 1 Batch: 18055/20099 (89.83%) Loss: 1.976495 LR: 0.00000693 [17:13:09] Epoch: 1 Batch: 18056/20099 (89.84%) Loss: 2.142013 LR: 0.00000693 [17:13:11] Epoch: 1 Batch: 18057/20099 (89.84%) Loss: 2.144199 LR: 0.00000693 [17:13:13] Epoch: 1 Batch: 18058/20099 (89.85%) Loss: 1.637200 LR: 0.00000693 [17:13:15] Epoch: 1 Batch: 18059/20099 (89.85%) Loss: 2.053703 LR: 0.00000693 [17:13:16] Epoch: 1 
Batch: 18060/20099 (89.86%) Loss: 2.196311 LR: 0.00000693 [17:13:18] Epoch: 1 Batch: 18061/20099 (89.86%) Loss: 2.102333 LR: 0.00000693 [17:13:20] Epoch: 1 Batch: 18062/20099 (89.87%) Loss: 2.406469 LR: 0.00000693 [17:13:22] Epoch: 1 Batch: 18063/20099 (89.87%) Loss: 1.898297 LR: 0.00000693 [17:13:24] Epoch: 1 Batch: 18064/20099 (89.88%) Loss: 1.802793 LR: 0.00000693 [17:13:26] Epoch: 1 Batch: 18065/20099 (89.88%) Loss: 2.227165 LR: 0.00000693 [17:13:28] Epoch: 1 Batch: 18066/20099 (89.89%) Loss: 2.096587 LR: 0.00000693 [17:13:30] Epoch: 1 Batch: 18067/20099 (89.89%) Loss: 2.089689 LR: 0.00000693 [17:13:32] Epoch: 1 Batch: 18068/20099 (89.90%) Loss: 2.231888 LR: 0.00000692 [17:13:33] Epoch: 1 Batch: 18069/20099 (89.90%) Loss: 2.012552 LR: 0.00000692 [17:13:35] Epoch: 1 Batch: 18070/20099 (89.90%) Loss: 2.365656 LR: 0.00000692 [17:13:37] Epoch: 1 Batch: 18071/20099 (89.91%) Loss: 2.353521 LR: 0.00000692 [17:13:39] Epoch: 1 Batch: 18072/20099 (89.91%) Loss: 2.417355 LR: 0.00000692 [17:13:41] Epoch: 1 Batch: 18073/20099 (89.92%) Loss: 2.153215 LR: 0.00000692 [17:13:43] Epoch: 1 Batch: 18074/20099 (89.92%) Loss: 2.378214 LR: 0.00000692 [17:13:45] Epoch: 1 Batch: 18075/20099 (89.93%) Loss: 1.890756 LR: 0.00000691 [17:13:46] Epoch: 1 Batch: 18076/20099 (89.93%) Loss: 2.140655 LR: 0.00000691 [17:13:48] Epoch: 1 Batch: 18077/20099 (89.94%) Loss: 2.364367 LR: 0.00000691 [17:13:50] Epoch: 1 Batch: 18078/20099 (89.94%) Loss: 1.870022 LR: 0.00000691 [17:13:52] Epoch: 1 Batch: 18079/20099 (89.95%) Loss: 2.101709 LR: 0.00000691 [17:13:54] Epoch: 1 Batch: 18080/20099 (89.95%) Loss: 2.057960 LR: 0.00000691 [17:13:56] Epoch: 1 Batch: 18081/20099 (89.96%) Loss: 2.152176 LR: 0.00000691 [17:13:58] Epoch: 1 Batch: 18082/20099 (89.96%) Loss: 1.915035 LR: 0.00000691 [17:13:59] Epoch: 1 Batch: 18083/20099 (89.97%) Loss: 2.147628 LR: 0.00000691 [17:14:01] Epoch: 1 Batch: 18084/20099 (89.97%) Loss: 2.180744 LR: 0.00000691 [17:14:03] Epoch: 1 Batch: 18085/20099 (89.98%) Loss: 2.220371 LR: 
0.00000691 [17:14:05] Epoch: 1 Batch: 18086/20099 (89.98%) Loss: 2.095800 LR: 0.00000691 [17:14:07] Epoch: 1 Batch: 18087/20099 (89.99%) Loss: 2.179121 LR: 0.00000691 [17:14:09] Epoch: 1 Batch: 18088/20099 (89.99%) Loss: 1.686664 LR: 0.00000691 [17:14:11] Epoch: 1 Batch: 18089/20099 (90.00%) Loss: 2.466885 LR: 0.00000690 [17:14:12] Epoch: 1 Batch: 18090/20099 (90.00%) Loss: 2.092649 LR: 0.00000690 [17:14:14] Epoch: 1 Batch: 18091/20099 (90.01%) Loss: 1.834007 LR: 0.00000690 [17:14:16] Epoch: 1 Batch: 18092/20099 (90.01%) Loss: 2.004737 LR: 0.00000690 [17:14:18] Epoch: 1 Batch: 18093/20099 (90.02%) Loss: 2.088456 LR: 0.00000690 [17:14:20] Epoch: 1 Batch: 18094/20099 (90.02%) Loss: 1.900372 LR: 0.00000690 [17:14:22] Epoch: 1 Batch: 18095/20099 (90.03%) Loss: 2.085242 LR: 0.00000690 [17:14:24] Epoch: 1 Batch: 18096/20099 (90.03%) Loss: 2.080117 LR: 0.00000689 [17:14:25] Epoch: 1 Batch: 18097/20099 (90.04%) Loss: 2.095410 LR: 0.00000689 [17:14:27] Epoch: 1 Batch: 18098/20099 (90.04%) Loss: 1.875842 LR: 0.00000689 [17:14:29] Epoch: 1 Batch: 18099/20099 (90.05%) Loss: 2.062492 LR: 0.00000689 [17:14:31] Epoch: 1 Batch: 18100/20099 (90.05%) Loss: 1.912337 LR: 0.00000689 [17:14:33] Epoch: 1 Batch: 18101/20099 (90.06%) Loss: 2.026867 LR: 0.00000689 [17:14:35] Epoch: 1 Batch: 18102/20099 (90.06%) Loss: 2.027820 LR: 0.00000689 [17:14:37] Epoch: 1 Batch: 18103/20099 (90.07%) Loss: 2.213296 LR: 0.00000689 [17:14:38] Epoch: 1 Batch: 18104/20099 (90.07%) Loss: 1.902693 LR: 0.00000689 [17:14:40] Epoch: 1 Batch: 18105/20099 (90.08%) Loss: 2.221216 LR: 0.00000689 [17:14:42] Epoch: 1 Batch: 18106/20099 (90.08%) Loss: 2.161276 LR: 0.00000689 [17:14:44] Epoch: 1 Batch: 18107/20099 (90.09%) Loss: 2.094465 LR: 0.00000689 [17:14:46] Epoch: 1 Batch: 18108/20099 (90.09%) Loss: 2.090825 LR: 0.00000689 [17:14:48] Epoch: 1 Batch: 18109/20099 (90.10%) Loss: 2.232857 LR: 0.00000689 [17:14:49] Epoch: 1 Batch: 18110/20099 (90.10%) Loss: 2.170771 LR: 0.00000688 [17:14:51] Epoch: 1 Batch: 18111/20099 
(90.11%) Loss: 2.149915 LR: 0.00000688 [17:14:53] Epoch: 1 Batch: 18112/20099 (90.11%) Loss: 2.066336 LR: 0.00000688 [17:14:55] Epoch: 1 Batch: 18113/20099 (90.12%) Loss: 2.395573 LR: 0.00000688 [17:14:57] Epoch: 1 Batch: 18114/20099 (90.12%) Loss: 2.088757 LR: 0.00000688 [17:14:59] Epoch: 1 Batch: 18115/20099 (90.13%) Loss: 2.069268 LR: 0.00000688 [17:15:01] Epoch: 1 Batch: 18116/20099 (90.13%) Loss: 2.037172 LR: 0.00000688 [17:15:02] Epoch: 1 Batch: 18117/20099 (90.14%) Loss: 2.035388 LR: 0.00000688 [17:15:04] Epoch: 1 Batch: 18118/20099 (90.14%) Loss: 2.022200 LR: 0.00000688 [17:15:06] Epoch: 1 Batch: 18119/20099 (90.15%) Loss: 1.911481 LR: 0.00000688 [17:15:08] Epoch: 1 Batch: 18120/20099 (90.15%) Loss: 2.212917 LR: 0.00000688 [17:15:10] Epoch: 1 Batch: 18121/20099 (90.16%) Loss: 2.009455 LR: 0.00000688 [17:15:12] Epoch: 1 Batch: 18122/20099 (90.16%) Loss: 2.066230 LR: 0.00000688 [17:15:13] Epoch: 1 Batch: 18123/20099 (90.17%) Loss: 2.180559 LR: 0.00000688 [17:15:15] Epoch: 1 Batch: 18124/20099 (90.17%) Loss: 2.044644 LR: 0.00000687 [17:15:17] Epoch: 1 Batch: 18125/20099 (90.18%) Loss: 1.972743 LR: 0.00000687 [17:15:19] Epoch: 1 Batch: 18126/20099 (90.18%) Loss: 1.908948 LR: 0.00000687 [17:15:21] Epoch: 1 Batch: 18127/20099 (90.19%) Loss: 2.022417 LR: 0.00000687 [17:15:23] Epoch: 1 Batch: 18128/20099 (90.19%) Loss: 2.231846 LR: 0.00000687 [17:15:25] Epoch: 1 Batch: 18129/20099 (90.20%) Loss: 2.077019 LR: 0.00000687 [17:15:26] Epoch: 1 Batch: 18130/20099 (90.20%) Loss: 1.983520 LR: 0.00000687 [17:15:28] Epoch: 1 Batch: 18131/20099 (90.21%) Loss: 2.256837 LR: 0.00000686 [17:15:30] Epoch: 1 Batch: 18132/20099 (90.21%) Loss: 1.957459 LR: 0.00000686 [17:15:32] Epoch: 1 Batch: 18133/20099 (90.22%) Loss: 2.282062 LR: 0.00000686 [17:15:34] Epoch: 1 Batch: 18134/20099 (90.22%) Loss: 2.317946 LR: 0.00000686 [17:15:36] Epoch: 1 Batch: 18135/20099 (90.23%) Loss: 1.969807 LR: 0.00000686 [17:15:38] Epoch: 1 Batch: 18136/20099 (90.23%) Loss: 1.639954 LR: 0.00000686 [17:15:39] 
Epoch: 1 Batch: 18137/20099 (90.24%) Loss: 2.172732 LR: 0.00000686 [17:15:41] Epoch: 1 Batch: 18138/20099 (90.24%) Loss: 2.030382 LR: 0.00000686 [17:15:43] Epoch: 1 Batch: 18139/20099 (90.25%) Loss: 2.183272 LR: 0.00000686 [17:15:45] Epoch: 1 Batch: 18140/20099 (90.25%) Loss: 2.279426 LR: 0.00000686 [17:15:47] Epoch: 1 Batch: 18141/20099 (90.26%) Loss: 1.901422 LR: 0.00000686 [17:15:49] Epoch: 1 Batch: 18142/20099 (90.26%) Loss: 2.196802 LR: 0.00000686 [17:15:51] Epoch: 1 Batch: 18143/20099 (90.27%) Loss: 2.221325 LR: 0.00000686 [17:15:53] Epoch: 1 Batch: 18144/20099 (90.27%) Loss: 2.332128 LR: 0.00000686 [17:15:54] Epoch: 1 Batch: 18145/20099 (90.28%) Loss: 1.914622 LR: 0.00000685 [17:15:56] Epoch: 1 Batch: 18146/20099 (90.28%) Loss: 2.454961 LR: 0.00000685 [17:15:58] Epoch: 1 Batch: 18147/20099 (90.29%) Loss: 2.219870 LR: 0.00000685 [17:16:00] Epoch: 1 Batch: 18148/20099 (90.29%) Loss: 2.045647 LR: 0.00000685 [17:16:02] Epoch: 1 Batch: 18149/20099 (90.30%) Loss: 2.206931 LR: 0.00000685 [17:16:04] Epoch: 1 Batch: 18150/20099 (90.30%) Loss: 2.210404 LR: 0.00000685 [17:16:05] Epoch: 1 Batch: 18151/20099 (90.31%) Loss: 2.217705 LR: 0.00000685 [17:16:07] Epoch: 1 Batch: 18152/20099 (90.31%) Loss: 2.103740 LR: 0.00000685 [17:16:09] Epoch: 1 Batch: 18153/20099 (90.32%) Loss: 1.846803 LR: 0.00000685 [17:16:11] Epoch: 1 Batch: 18154/20099 (90.32%) Loss: 2.026046 LR: 0.00000685 [17:16:13] Epoch: 1 Batch: 18155/20099 (90.33%) Loss: 2.066137 LR: 0.00000685 [17:16:15] Epoch: 1 Batch: 18156/20099 (90.33%) Loss: 2.143274 LR: 0.00000685 [17:16:17] Epoch: 1 Batch: 18157/20099 (90.34%) Loss: 2.185949 LR: 0.00000685 [17:16:18] Epoch: 1 Batch: 18158/20099 (90.34%) Loss: 1.726983 LR: 0.00000685 [17:16:20] Epoch: 1 Batch: 18159/20099 (90.35%) Loss: 2.124674 LR: 0.00000684 [17:16:22] Epoch: 1 Batch: 18160/20099 (90.35%) Loss: 1.914822 LR: 0.00000684 [17:16:24] Epoch: 1 Batch: 18161/20099 (90.36%) Loss: 2.093619 LR: 0.00000684 [17:16:26] Epoch: 1 Batch: 18162/20099 (90.36%) Loss: 
1.973718 LR: 0.00000684 [17:16:28] Epoch: 1 Batch: 18163/20099 (90.37%) Loss: 2.176095 LR: 0.00000684 [17:16:29] Epoch: 1 Batch: 18164/20099 (90.37%) Loss: 2.025962 LR: 0.00000684 [17:16:31] Epoch: 1 Batch: 18165/20099 (90.38%) Loss: 2.021997 LR: 0.00000684 [17:16:33] Epoch: 1 Batch: 18166/20099 (90.38%) Loss: 1.903714 LR: 0.00000683 [17:16:35] Epoch: 1 Batch: 18167/20099 (90.39%) Loss: 2.163500 LR: 0.00000683 [17:16:37] Epoch: 1 Batch: 18168/20099 (90.39%) Loss: 2.056345 LR: 0.00000683 [17:16:39] Epoch: 1 Batch: 18169/20099 (90.40%) Loss: 1.826703 LR: 0.00000683 [17:16:41] Epoch: 1 Batch: 18170/20099 (90.40%) Loss: 1.704078 LR: 0.00000683 [17:16:42] Epoch: 1 Batch: 18171/20099 (90.41%) Loss: 1.863752 LR: 0.00000683 [17:16:44] Epoch: 1 Batch: 18172/20099 (90.41%) Loss: 2.031860 LR: 0.00000683 [17:16:46] Epoch: 1 Batch: 18173/20099 (90.42%) Loss: 1.657881 LR: 0.00000683 [17:16:48] Epoch: 1 Batch: 18174/20099 (90.42%) Loss: 2.188251 LR: 0.00000683 [17:16:50] Epoch: 1 Batch: 18175/20099 (90.43%) Loss: 2.008827 LR: 0.00000683 [17:16:52] Epoch: 1 Batch: 18176/20099 (90.43%) Loss: 1.992982 LR: 0.00000683 [17:16:53] Epoch: 1 Batch: 18177/20099 (90.44%) Loss: 2.056496 LR: 0.00000683 [17:16:55] Epoch: 1 Batch: 18178/20099 (90.44%) Loss: 2.167754 LR: 0.00000683 [17:16:57] Epoch: 1 Batch: 18179/20099 (90.45%) Loss: 2.165286 LR: 0.00000683 [17:16:59] Epoch: 1 Batch: 18180/20099 (90.45%) Loss: 2.116682 LR: 0.00000682 [17:17:01] Epoch: 1 Batch: 18181/20099 (90.46%) Loss: 2.274207 LR: 0.00000682 [17:17:03] Epoch: 1 Batch: 18182/20099 (90.46%) Loss: 2.332131 LR: 0.00000682 [17:17:05] Epoch: 1 Batch: 18183/20099 (90.47%) Loss: 2.094793 LR: 0.00000682 [17:17:06] Epoch: 1 Batch: 18184/20099 (90.47%) Loss: 1.919041 LR: 0.00000682 [17:17:08] Epoch: 1 Batch: 18185/20099 (90.48%) Loss: 1.875718 LR: 0.00000682 [17:17:10] Epoch: 1 Batch: 18186/20099 (90.48%) Loss: 1.858859 LR: 0.00000682 [17:17:12] Epoch: 1 Batch: 18187/20099 (90.49%) Loss: 2.229933 LR: 0.00000682 [17:17:14] Epoch: 1 
Batch: 18188/20099 (90.49%) Loss: 1.889033 LR: 0.00000682 [17:17:16] Epoch: 1 Batch: 18189/20099 (90.50%) Loss: 2.172183 LR: 0.00000682 [17:17:18] Epoch: 1 Batch: 18190/20099 (90.50%) Loss: 1.922470 LR: 0.00000682 [17:17:19] Epoch: 1 Batch: 18191/20099 (90.51%) Loss: 2.084778 LR: 0.00000682 [17:17:21] Epoch: 1 Batch: 18192/20099 (90.51%) Loss: 1.843529 LR: 0.00000682 [17:17:23] Epoch: 1 Batch: 18193/20099 (90.52%) Loss: 2.037268 LR: 0.00000682 [17:17:25] Epoch: 1 Batch: 18194/20099 (90.52%) Loss: 2.015238 LR: 0.00000681 [17:17:27] Epoch: 1 Batch: 18195/20099 (90.53%) Loss: 1.959667 LR: 0.00000681 [17:17:29] Epoch: 1 Batch: 18196/20099 (90.53%) Loss: 2.263777 LR: 0.00000681 [17:17:31] Epoch: 1 Batch: 18197/20099 (90.54%) Loss: 2.256994 LR: 0.00000681 [17:17:32] Epoch: 1 Batch: 18198/20099 (90.54%) Loss: 2.079698 LR: 0.00000681 [17:17:34] Epoch: 1 Batch: 18199/20099 (90.55%) Loss: 2.081349 LR: 0.00000681 [17:17:40] >> Cleaned up old temp checkpoint: epoch1_step16200 [17:17:40] >> Temp checkpoint saved: epoch1_step18200, size: 0.1693 GB [17:17:40] Epoch: 1 Batch: 18200/20099 (90.55%) Loss: 2.523502 LR: 0.00000681 [17:17:42] Epoch: 1 Batch: 18201/20099 (90.56%) Loss: 2.030546 LR: 0.00000680 [17:17:44] Epoch: 1 Batch: 18202/20099 (90.56%) Loss: 2.279252 LR: 0.00000680 [17:17:45] Epoch: 1 Batch: 18203/20099 (90.57%) Loss: 2.193962 LR: 0.00000680 [17:17:47] Epoch: 1 Batch: 18204/20099 (90.57%) Loss: 2.022379 LR: 0.00000680 [17:17:49] Epoch: 1 Batch: 18205/20099 (90.58%) Loss: 2.336188 LR: 0.00000680 [17:17:51] Epoch: 1 Batch: 18206/20099 (90.58%) Loss: 2.039440 LR: 0.00000680 [17:17:53] Epoch: 1 Batch: 18207/20099 (90.59%) Loss: 1.648145 LR: 0.00000680 [17:17:55] Epoch: 1 Batch: 18208/20099 (90.59%) Loss: 2.088834 LR: 0.00000680 [17:17:57] Epoch: 1 Batch: 18209/20099 (90.60%) Loss: 2.494401 LR: 0.00000680 [17:17:58] Epoch: 1 Batch: 18210/20099 (90.60%) Loss: 2.227100 LR: 0.00000680 [17:18:00] Epoch: 1 Batch: 18211/20099 (90.61%) Loss: 2.416119 LR: 0.00000680 [17:18:02] 
Epoch: 1 Batch: 18212/20099 (90.61%) Loss: 2.062824 LR: 0.00000680 [17:18:04] Epoch: 1 Batch: 18213/20099 (90.62%) Loss: 1.908146 LR: 0.00000680 [17:18:06] Epoch: 1 Batch: 18214/20099 (90.62%) Loss: 2.121631 LR: 0.00000680 [17:18:08] Epoch: 1 Batch: 18215/20099 (90.63%) Loss: 1.950527 LR: 0.00000679 [17:18:10] Epoch: 1 Batch: 18216/20099 (90.63%) Loss: 1.918663 LR: 0.00000679 [17:18:11] Epoch: 1 Batch: 18217/20099 (90.64%) Loss: 1.962832 LR: 0.00000679 [17:18:13] Epoch: 1 Batch: 18218/20099 (90.64%) Loss: 1.997926 LR: 0.00000679 [17:18:15] Epoch: 1 Batch: 18219/20099 (90.65%) Loss: 1.877568 LR: 0.00000679 [17:18:17] Epoch: 1 Batch: 18220/20099 (90.65%) Loss: 2.161923 LR: 0.00000679 [17:18:19] Epoch: 1 Batch: 18221/20099 (90.66%) Loss: 2.402639 LR: 0.00000679 [17:18:21] Epoch: 1 Batch: 18222/20099 (90.66%) Loss: 2.319711 LR: 0.00000679 [17:18:23] Epoch: 1 Batch: 18223/20099 (90.67%) Loss: 2.000299 LR: 0.00000679 [17:18:24] Epoch: 1 Batch: 18224/20099 (90.67%) Loss: 2.109338 LR: 0.00000679 [17:18:26] Epoch: 1 Batch: 18225/20099 (90.68%) Loss: 1.997355 LR: 0.00000679 [17:18:28] Epoch: 1 Batch: 18226/20099 (90.68%) Loss: 2.038446 LR: 0.00000679 [17:18:30] Epoch: 1 Batch: 18227/20099 (90.69%) Loss: 1.971798 LR: 0.00000679 [17:18:32] Epoch: 1 Batch: 18228/20099 (90.69%) Loss: 1.743580 LR: 0.00000679 [17:18:34] Epoch: 1 Batch: 18229/20099 (90.70%) Loss: 2.134880 LR: 0.00000678 [17:18:36] Epoch: 1 Batch: 18230/20099 (90.70%) Loss: 2.037946 LR: 0.00000678 [17:18:37] Epoch: 1 Batch: 18231/20099 (90.71%) Loss: 2.280535 LR: 0.00000678 [17:18:39] Epoch: 1 Batch: 18232/20099 (90.71%) Loss: 2.319334 LR: 0.00000678 [17:18:41] Epoch: 1 Batch: 18233/20099 (90.72%) Loss: 2.229506 LR: 0.00000678 [17:18:43] Epoch: 1 Batch: 18234/20099 (90.72%) Loss: 2.069950 LR: 0.00000678 [17:18:45] Epoch: 1 Batch: 18235/20099 (90.73%) Loss: 2.044002 LR: 0.00000678 [17:18:47] Epoch: 1 Batch: 18236/20099 (90.73%) Loss: 2.199251 LR: 0.00000678 [17:18:49] Epoch: 1 Batch: 18237/20099 (90.74%) Loss: 
2.085798 LR: 0.00000678 [17:18:51] Epoch: 1 Batch: 18238/20099 (90.74%) Loss: 2.021058 LR: 0.00000678 [17:18:52] Epoch: 1 Batch: 18239/20099 (90.75%) Loss: 2.407196 LR: 0.00000678 [17:18:54] Epoch: 1 Batch: 18240/20099 (90.75%) Loss: 2.024598 LR: 0.00000678 [17:18:56] Epoch: 1 Batch: 18241/20099 (90.76%) Loss: 2.081717 LR: 0.00000678 [17:18:58] Epoch: 1 Batch: 18242/20099 (90.76%) Loss: 1.911353 LR: 0.00000678 [17:19:00] Epoch: 1 Batch: 18243/20099 (90.77%) Loss: 2.043551 LR: 0.00000677 [17:19:02] Epoch: 1 Batch: 18244/20099 (90.77%) Loss: 2.257400 LR: 0.00000677 [17:19:03] Epoch: 1 Batch: 18245/20099 (90.78%) Loss: 2.075047 LR: 0.00000677 [17:19:05] Epoch: 1 Batch: 18246/20099 (90.78%) Loss: 2.165115 LR: 0.00000677 [17:19:07] Epoch: 1 Batch: 18247/20099 (90.79%) Loss: 2.416836 LR: 0.00000677 [17:19:09] Epoch: 1 Batch: 18248/20099 (90.79%) Loss: 2.076576 LR: 0.00000677 [17:19:11] Epoch: 1 Batch: 18249/20099 (90.80%) Loss: 2.016102 LR: 0.00000677 [17:19:13] Epoch: 1 Batch: 18250/20099 (90.80%) Loss: 2.277062 LR: 0.00000676 [17:19:15] Epoch: 1 Batch: 18251/20099 (90.81%) Loss: 2.048624 LR: 0.00000676 [17:19:16] Epoch: 1 Batch: 18252/20099 (90.81%) Loss: 2.034063 LR: 0.00000676 [17:19:18] Epoch: 1 Batch: 18253/20099 (90.82%) Loss: 2.170323 LR: 0.00000676 [17:19:20] Epoch: 1 Batch: 18254/20099 (90.82%) Loss: 2.185003 LR: 0.00000676 [17:19:22] Epoch: 1 Batch: 18255/20099 (90.83%) Loss: 2.027947 LR: 0.00000676 [17:19:24] Epoch: 1 Batch: 18256/20099 (90.83%) Loss: 2.100636 LR: 0.00000676 [17:19:26] Epoch: 1 Batch: 18257/20099 (90.84%) Loss: 2.147670 LR: 0.00000676 [17:19:28] Epoch: 1 Batch: 18258/20099 (90.84%) Loss: 2.421621 LR: 0.00000676 [17:19:29] Epoch: 1 Batch: 18259/20099 (90.85%) Loss: 2.165501 LR: 0.00000676 [17:19:31] Epoch: 1 Batch: 18260/20099 (90.85%) Loss: 2.042848 LR: 0.00000676 [17:19:33] Epoch: 1 Batch: 18261/20099 (90.86%) Loss: 2.172343 LR: 0.00000676 [17:19:35] Epoch: 1 Batch: 18262/20099 (90.86%) Loss: 2.005523 LR: 0.00000676 [17:19:37] Epoch: 1 
Batch: 18263/20099 (90.87%) Loss: 2.114500 LR: 0.00000676 [17:19:39] Epoch: 1 Batch: 18264/20099 (90.87%) Loss: 1.780577 LR: 0.00000675 [17:19:41] Epoch: 1 Batch: 18265/20099 (90.88%) Loss: 2.120799 LR: 0.00000675 [17:19:42] Epoch: 1 Batch: 18266/20099 (90.88%) Loss: 2.259013 LR: 0.00000675 [17:19:44] Epoch: 1 Batch: 18267/20099 (90.89%) Loss: 2.079130 LR: 0.00000675 [17:19:46] Epoch: 1 Batch: 18268/20099 (90.89%) Loss: 2.325318 LR: 0.00000675 [17:19:48] Epoch: 1 Batch: 18269/20099 (90.90%) Loss: 2.008859 LR: 0.00000675 [17:19:50] Epoch: 1 Batch: 18270/20099 (90.90%) Loss: 2.069320 LR: 0.00000675 [17:19:52] Epoch: 1 Batch: 18271/20099 (90.91%) Loss: 2.120573 LR: 0.00000675 [17:19:53] Epoch: 1 Batch: 18272/20099 (90.91%) Loss: 1.977761 LR: 0.00000675 [17:19:55] Epoch: 1 Batch: 18273/20099 (90.91%) Loss: 2.163534 LR: 0.00000675 [17:19:57] Epoch: 1 Batch: 18274/20099 (90.92%) Loss: 1.993819 LR: 0.00000675 [17:19:59] Epoch: 1 Batch: 18275/20099 (90.92%) Loss: 1.991727 LR: 0.00000675 [17:20:01] Epoch: 1 Batch: 18276/20099 (90.93%) Loss: 1.737078 LR: 0.00000675 [17:20:03] Epoch: 1 Batch: 18277/20099 (90.93%) Loss: 2.163844 LR: 0.00000675 [17:20:05] Epoch: 1 Batch: 18278/20099 (90.94%) Loss: 2.635512 LR: 0.00000674 [17:20:06] Epoch: 1 Batch: 18279/20099 (90.94%) Loss: 1.964423 LR: 0.00000674 [17:20:08] Epoch: 1 Batch: 18280/20099 (90.95%) Loss: 2.352328 LR: 0.00000674 [17:20:10] Epoch: 1 Batch: 18281/20099 (90.95%) Loss: 2.581562 LR: 0.00000674 [17:20:12] Epoch: 1 Batch: 18282/20099 (90.96%) Loss: 2.068626 LR: 0.00000674 [17:20:14] Epoch: 1 Batch: 18283/20099 (90.96%) Loss: 2.196370 LR: 0.00000674 [17:20:16] Epoch: 1 Batch: 18284/20099 (90.97%) Loss: 2.100296 LR: 0.00000674 [17:20:18] Epoch: 1 Batch: 18285/20099 (90.97%) Loss: 1.817886 LR: 0.00000674 [17:20:19] Epoch: 1 Batch: 18286/20099 (90.98%) Loss: 2.005213 LR: 0.00000674 [17:20:21] Epoch: 1 Batch: 18287/20099 (90.98%) Loss: 2.257515 LR: 0.00000674 [17:20:23] Epoch: 1 Batch: 18288/20099 (90.99%) Loss: 2.070512 LR: 
0.00000674 [17:20:25] Epoch: 1 Batch: 18289/20099 (90.99%) Loss: 2.233605 LR: 0.00000674 [17:20:27] Epoch: 1 Batch: 18290/20099 (91.00%) Loss: 1.956137 LR: 0.00000674 [17:20:29] Epoch: 1 Batch: 18291/20099 (91.00%) Loss: 1.892496 LR: 0.00000674 [17:20:31] Epoch: 1 Batch: 18292/20099 (91.01%) Loss: 2.244969 LR: 0.00000673 [17:20:33] Epoch: 1 Batch: 18293/20099 (91.01%) Loss: 2.247425 LR: 0.00000673 [17:20:34] Epoch: 1 Batch: 18294/20099 (91.02%) Loss: 2.115074 LR: 0.00000673 [17:20:36] Epoch: 1 Batch: 18295/20099 (91.02%) Loss: 1.971923 LR: 0.00000673 [17:20:38] Epoch: 1 Batch: 18296/20099 (91.03%) Loss: 2.238026 LR: 0.00000673 [17:20:40] Epoch: 1 Batch: 18297/20099 (91.03%) Loss: 2.173589 LR: 0.00000673 [17:20:42] Epoch: 1 Batch: 18298/20099 (91.04%) Loss: 2.057692 LR: 0.00000673 [17:20:44] Epoch: 1 Batch: 18299/20099 (91.04%) Loss: 1.731912 LR: 0.00000672 [17:20:46] Epoch: 1 Batch: 18300/20099 (91.05%) Loss: 2.360095 LR: 0.00000672 [17:20:47] Epoch: 1 Batch: 18301/20099 (91.05%) Loss: 2.147245 LR: 0.00000672 [17:20:49] Epoch: 1 Batch: 18302/20099 (91.06%) Loss: 2.054597 LR: 0.00000672 [17:20:51] Epoch: 1 Batch: 18303/20099 (91.06%) Loss: 1.745960 LR: 0.00000672 [17:20:53] Epoch: 1 Batch: 18304/20099 (91.07%) Loss: 2.321170 LR: 0.00000672 [17:20:55] Epoch: 1 Batch: 18305/20099 (91.07%) Loss: 1.976469 LR: 0.00000672 [17:20:57] Epoch: 1 Batch: 18306/20099 (91.08%) Loss: 1.805069 LR: 0.00000672 [17:20:59] Epoch: 1 Batch: 18307/20099 (91.08%) Loss: 1.898974 LR: 0.00000672 [17:21:00] Epoch: 1 Batch: 18308/20099 (91.09%) Loss: 2.018485 LR: 0.00000672 [17:21:02] Epoch: 1 Batch: 18309/20099 (91.09%) Loss: 2.045979 LR: 0.00000672 [17:21:04] Epoch: 1 Batch: 18310/20099 (91.10%) Loss: 2.436414 LR: 0.00000672 [17:21:06] Epoch: 1 Batch: 18311/20099 (91.10%) Loss: 1.983081 LR: 0.00000672 [17:21:08] Epoch: 1 Batch: 18312/20099 (91.11%) Loss: 1.722857 LR: 0.00000672 [17:21:10] Epoch: 1 Batch: 18313/20099 (91.11%) Loss: 1.791212 LR: 0.00000671 [17:21:12] Epoch: 1 Batch: 18314/20099 
(91.12%) Loss: 2.322313 LR: 0.00000671 [17:21:13] Epoch: 1 Batch: 18315/20099 (91.12%) Loss: 2.223823 LR: 0.00000671 [17:21:15] Epoch: 1 Batch: 18316/20099 (91.13%) Loss: 2.138594 LR: 0.00000671 [17:21:17] Epoch: 1 Batch: 18317/20099 (91.13%) Loss: 2.237443 LR: 0.00000671 [17:21:19] Epoch: 1 Batch: 18318/20099 (91.14%) Loss: 2.198127 LR: 0.00000671 [17:21:21] Epoch: 1 Batch: 18319/20099 (91.14%) Loss: 2.305162 LR: 0.00000671 [17:21:23] Epoch: 1 Batch: 18320/20099 (91.15%) Loss: 2.279042 LR: 0.00000671 [17:21:25] Epoch: 1 Batch: 18321/20099 (91.15%) Loss: 2.049140 LR: 0.00000671 [17:21:27] Epoch: 1 Batch: 18322/20099 (91.16%) Loss: 1.891444 LR: 0.00000671 [17:21:28] Epoch: 1 Batch: 18323/20099 (91.16%) Loss: 1.871263 LR: 0.00000671 [17:21:30] Epoch: 1 Batch: 18324/20099 (91.17%) Loss: 2.406608 LR: 0.00000671 [17:21:32] Epoch: 1 Batch: 18325/20099 (91.17%) Loss: 1.669776 LR: 0.00000671 [17:21:34] Epoch: 1 Batch: 18326/20099 (91.18%) Loss: 2.239606 LR: 0.00000671 [17:21:36] Epoch: 1 Batch: 18327/20099 (91.18%) Loss: 1.950848 LR: 0.00000670 [17:21:38] Epoch: 1 Batch: 18328/20099 (91.19%) Loss: 1.796362 LR: 0.00000670 [17:21:40] Epoch: 1 Batch: 18329/20099 (91.19%) Loss: 1.811717 LR: 0.00000670 [17:21:41] Epoch: 1 Batch: 18330/20099 (91.20%) Loss: 2.029404 LR: 0.00000670 [17:21:43] Epoch: 1 Batch: 18331/20099 (91.20%) Loss: 2.034078 LR: 0.00000670 [17:21:45] Epoch: 1 Batch: 18332/20099 (91.21%) Loss: 1.942078 LR: 0.00000670 [17:21:47] Epoch: 1 Batch: 18333/20099 (91.21%) Loss: 2.045277 LR: 0.00000670 [17:21:49] Epoch: 1 Batch: 18334/20099 (91.22%) Loss: 2.144887 LR: 0.00000670 [17:21:51] Epoch: 1 Batch: 18335/20099 (91.22%) Loss: 2.002846 LR: 0.00000670 [17:21:53] Epoch: 1 Batch: 18336/20099 (91.23%) Loss: 1.899409 LR: 0.00000670 [17:21:54] Epoch: 1 Batch: 18337/20099 (91.23%) Loss: 1.897315 LR: 0.00000670 [17:21:56] Epoch: 1 Batch: 18338/20099 (91.24%) Loss: 2.173879 LR: 0.00000670 [17:21:58] Epoch: 1 Batch: 18339/20099 (91.24%) Loss: 2.136481 LR: 0.00000670 [17:22:00] 
Epoch: 1 Batch: 18340/20099 (91.25%) Loss: 2.128942 LR: 0.00000670 [17:22:02] Epoch: 1 Batch: 18341/20099 (91.25%) Loss: 2.503619 LR: 0.00000669 [17:22:04] Epoch: 1 Batch: 18342/20099 (91.26%) Loss: 2.130661 LR: 0.00000669 [17:22:06] Epoch: 1 Batch: 18343/20099 (91.26%) Loss: 2.113306 LR: 0.00000669 [17:22:07] Epoch: 1 Batch: 18344/20099 (91.27%) Loss: 2.172758 LR: 0.00000669 [17:22:09] Epoch: 1 Batch: 18345/20099 (91.27%) Loss: 1.736252 LR: 0.00000669 [17:22:11] Epoch: 1 Batch: 18346/20099 (91.28%) Loss: 2.086050 LR: 0.00000669 [17:22:13] Epoch: 1 Batch: 18347/20099 (91.28%) Loss: 1.931801 LR: 0.00000669 [17:22:15] Epoch: 1 Batch: 18348/20099 (91.29%) Loss: 1.932867 LR: 0.00000669 [17:22:17] Epoch: 1 Batch: 18349/20099 (91.29%) Loss: 2.122732 LR: 0.00000669 [17:22:19] Epoch: 1 Batch: 18350/20099 (91.30%) Loss: 1.977980 LR: 0.00000669 [17:22:20] Epoch: 1 Batch: 18351/20099 (91.30%) Loss: 2.219812 LR: 0.00000669 [17:22:22] Epoch: 1 Batch: 18352/20099 (91.31%) Loss: 2.149518 LR: 0.00000669 [17:22:24] Epoch: 1 Batch: 18353/20099 (91.31%) Loss: 2.171475 LR: 0.00000669 [17:22:26] Epoch: 1 Batch: 18354/20099 (91.32%) Loss: 1.790500 LR: 0.00000669 [17:22:28] Epoch: 1 Batch: 18355/20099 (91.32%) Loss: 1.642383 LR: 0.00000668 [17:22:30] Epoch: 1 Batch: 18356/20099 (91.33%) Loss: 1.978318 LR: 0.00000668 [17:22:31] Epoch: 1 Batch: 18357/20099 (91.33%) Loss: 1.906888 LR: 0.00000668 [17:22:33] Epoch: 1 Batch: 18358/20099 (91.34%) Loss: 2.045049 LR: 0.00000668 [17:22:35] Epoch: 1 Batch: 18359/20099 (91.34%) Loss: 2.393996 LR: 0.00000668 [17:22:37] Epoch: 1 Batch: 18360/20099 (91.35%) Loss: 2.415705 LR: 0.00000668 [17:22:39] Epoch: 1 Batch: 18361/20099 (91.35%) Loss: 1.838671 LR: 0.00000668 [17:22:41] Epoch: 1 Batch: 18362/20099 (91.36%) Loss: 1.911799 LR: 0.00000668 [17:22:43] Epoch: 1 Batch: 18363/20099 (91.36%) Loss: 1.934760 LR: 0.00000668 [17:22:44] Epoch: 1 Batch: 18364/20099 (91.37%) Loss: 1.930239 LR: 0.00000668 [17:22:46] Epoch: 1 Batch: 18365/20099 (91.37%) Loss: 
2.068558 LR: 0.00000668 [17:22:48] Epoch: 1 Batch: 18366/20099 (91.38%) Loss: 2.095570 LR: 0.00000668 [17:22:50] Epoch: 1 Batch: 18367/20099 (91.38%) Loss: 2.100580 LR: 0.00000668 [17:22:52] Epoch: 1 Batch: 18368/20099 (91.39%) Loss: 1.865878 LR: 0.00000668 [17:22:54] Epoch: 1 Batch: 18369/20099 (91.39%) Loss: 2.188638 LR: 0.00000667 [17:22:55] Epoch: 1 Batch: 18370/20099 (91.40%) Loss: 2.192326 LR: 0.00000667 [17:22:57] Epoch: 1 Batch: 18371/20099 (91.40%) Loss: 2.295754 LR: 0.00000667 [17:22:59] Epoch: 1 Batch: 18372/20099 (91.41%) Loss: 1.689284 LR: 0.00000667 [17:23:01] Epoch: 1 Batch: 18373/20099 (91.41%) Loss: 2.270655 LR: 0.00000667 [17:23:03] Epoch: 1 Batch: 18374/20099 (91.42%) Loss: 2.085134 LR: 0.00000667 [17:23:05] Epoch: 1 Batch: 18375/20099 (91.42%) Loss: 2.369422 LR: 0.00000667 [17:23:07] Epoch: 1 Batch: 18376/20099 (91.43%) Loss: 1.928385 LR: 0.00000666 [17:23:09] Epoch: 1 Batch: 18377/20099 (91.43%) Loss: 2.055173 LR: 0.00000666 [17:23:10] Epoch: 1 Batch: 18378/20099 (91.44%) Loss: 2.043360 LR: 0.00000666 [17:23:12] Epoch: 1 Batch: 18379/20099 (91.44%) Loss: 2.200934 LR: 0.00000666 [17:23:14] Epoch: 1 Batch: 18380/20099 (91.45%) Loss: 2.039237 LR: 0.00000666 [17:23:16] Epoch: 1 Batch: 18381/20099 (91.45%) Loss: 2.202638 LR: 0.00000666 [17:23:18] Epoch: 1 Batch: 18382/20099 (91.46%) Loss: 2.025159 LR: 0.00000666 [17:23:20] Epoch: 1 Batch: 18383/20099 (91.46%) Loss: 2.141991 LR: 0.00000666 [17:23:22] Epoch: 1 Batch: 18384/20099 (91.47%) Loss: 2.219597 LR: 0.00000666 [17:23:24] Epoch: 1 Batch: 18385/20099 (91.47%) Loss: 1.981840 LR: 0.00000666 [17:23:25] Epoch: 1 Batch: 18386/20099 (91.48%) Loss: 2.142974 LR: 0.00000666 [17:23:27] Epoch: 1 Batch: 18387/20099 (91.48%) Loss: 2.116699 LR: 0.00000666 [17:23:29] Epoch: 1 Batch: 18388/20099 (91.49%) Loss: 1.905943 LR: 0.00000666 [17:23:31] Epoch: 1 Batch: 18389/20099 (91.49%) Loss: 1.965544 LR: 0.00000666 [17:23:33] Epoch: 1 Batch: 18390/20099 (91.50%) Loss: 2.064021 LR: 0.00000665 [17:23:35] Epoch: 1 
Batch: 18391/20099 (91.50%) Loss: 2.184979 LR: 0.00000665 [17:23:37] Epoch: 1 Batch: 18392/20099 (91.51%) Loss: 1.990743 LR: 0.00000665 [17:23:38] Epoch: 1 Batch: 18393/20099 (91.51%) Loss: 1.844092 LR: 0.00000665 [17:23:40] Epoch: 1 Batch: 18394/20099 (91.52%) Loss: 2.140414 LR: 0.00000665 [17:23:42] Epoch: 1 Batch: 18395/20099 (91.52%) Loss: 1.959634 LR: 0.00000665 [17:23:44] Epoch: 1 Batch: 18396/20099 (91.53%) Loss: 2.201645 LR: 0.00000665 [17:23:46] Epoch: 1 Batch: 18397/20099 (91.53%) Loss: 2.017674 LR: 0.00000665 [17:23:48] Epoch: 1 Batch: 18398/20099 (91.54%) Loss: 1.887501 LR: 0.00000665 [17:23:50] Epoch: 1 Batch: 18399/20099 (91.54%) Loss: 1.976553 LR: 0.00000665 [17:23:55] >> Cleaned up old temp checkpoint: epoch1_step16400 [17:23:55] >> Temp checkpoint saved: epoch1_step18400, size: 0.1693 GB [17:23:55] Epoch: 1 Batch: 18400/20099 (91.55%) Loss: 2.249673 LR: 0.00000665 [17:23:57] Epoch: 1 Batch: 18401/20099 (91.55%) Loss: 2.009481 LR: 0.00000665 [17:23:59] Epoch: 1 Batch: 18402/20099 (91.56%) Loss: 2.090874 LR: 0.00000665 [17:24:00] Epoch: 1 Batch: 18403/20099 (91.56%) Loss: 1.690160 LR: 0.00000665 [17:24:02] Epoch: 1 Batch: 18404/20099 (91.57%) Loss: 1.949294 LR: 0.00000664 [17:24:04] Epoch: 1 Batch: 18405/20099 (91.57%) Loss: 1.991221 LR: 0.00000664 [17:24:06] Epoch: 1 Batch: 18406/20099 (91.58%) Loss: 2.097525 LR: 0.00000664 [17:24:08] Epoch: 1 Batch: 18407/20099 (91.58%) Loss: 2.059528 LR: 0.00000664 [17:24:10] Epoch: 1 Batch: 18408/20099 (91.59%) Loss: 2.054844 LR: 0.00000664 [17:24:12] Epoch: 1 Batch: 18409/20099 (91.59%) Loss: 1.852958 LR: 0.00000664 [17:24:13] Epoch: 1 Batch: 18410/20099 (91.60%) Loss: 2.152265 LR: 0.00000664 [17:24:15] Epoch: 1 Batch: 18411/20099 (91.60%) Loss: 1.833290 LR: 0.00000664 [17:24:17] Epoch: 1 Batch: 18412/20099 (91.61%) Loss: 2.328901 LR: 0.00000664 [17:24:19] Epoch: 1 Batch: 18413/20099 (91.61%) Loss: 2.168030 LR: 0.00000664 [17:24:21] Epoch: 1 Batch: 18414/20099 (91.62%) Loss: 1.804657 LR: 0.00000664 [17:24:23] 
Epoch: 1 Batch: 18415/20099 (91.62%) Loss: 2.174877 LR: 0.00000664 [17:24:25] Epoch: 1 Batch: 18416/20099 (91.63%) Loss: 2.063121 LR: 0.00000664 [17:24:27] Epoch: 1 Batch: 18417/20099 (91.63%) Loss: 2.119265 LR: 0.00000664 [17:24:28] Epoch: 1 Batch: 18418/20099 (91.64%) Loss: 2.232717 LR: 0.00000663 [17:24:30] Epoch: 1 Batch: 18419/20099 (91.64%) Loss: 2.120742 LR: 0.00000663 [17:24:32] Epoch: 1 Batch: 18420/20099 (91.65%) Loss: 2.199342 LR: 0.00000663 [17:24:34] Epoch: 1 Batch: 18421/20099 (91.65%) Loss: 2.155524 LR: 0.00000663 [17:24:36] Epoch: 1 Batch: 18422/20099 (91.66%) Loss: 1.961869 LR: 0.00000663 [17:24:38] Epoch: 1 Batch: 18423/20099 (91.66%) Loss: 1.773931 LR: 0.00000663 [17:24:40] Epoch: 1 Batch: 18424/20099 (91.67%) Loss: 2.082604 LR: 0.00000663 [17:24:42] Epoch: 1 Batch: 18425/20099 (91.67%) Loss: 1.982359 LR: 0.00000663 [17:24:43] Epoch: 1 Batch: 18426/20099 (91.68%) Loss: 1.930046 LR: 0.00000663 [17:24:45] Epoch: 1 Batch: 18427/20099 (91.68%) Loss: 2.135198 LR: 0.00000663 [17:24:47] Epoch: 1 Batch: 18428/20099 (91.69%) Loss: 1.780113 LR: 0.00000663 [17:24:49] Epoch: 1 Batch: 18429/20099 (91.69%) Loss: 2.316741 LR: 0.00000663 [17:24:51] Epoch: 1 Batch: 18430/20099 (91.70%) Loss: 2.105590 LR: 0.00000663 [17:24:53] Epoch: 1 Batch: 18431/20099 (91.70%) Loss: 2.206619 LR: 0.00000663 [17:24:55] Epoch: 1 Batch: 18432/20099 (91.71%) Loss: 2.088426 LR: 0.00000662 [17:24:56] Epoch: 1 Batch: 18433/20099 (91.71%) Loss: 1.767898 LR: 0.00000662 [17:24:58] Epoch: 1 Batch: 18434/20099 (91.72%) Loss: 1.984496 LR: 0.00000662 [17:25:00] Epoch: 1 Batch: 18435/20099 (91.72%) Loss: 1.812868 LR: 0.00000662 [17:25:02] Epoch: 1 Batch: 18436/20099 (91.73%) Loss: 2.266970 LR: 0.00000662 [17:25:04] Epoch: 1 Batch: 18437/20099 (91.73%) Loss: 2.125788 LR: 0.00000662 [17:25:06] Epoch: 1 Batch: 18438/20099 (91.74%) Loss: 2.031799 LR: 0.00000662 [17:25:08] Epoch: 1 Batch: 18439/20099 (91.74%) Loss: 2.156986 LR: 0.00000662 [17:25:09] Epoch: 1 Batch: 18440/20099 (91.75%) Loss: 
2.113350 LR: 0.00000662 [17:25:11] Epoch: 1 Batch: 18441/20099 (91.75%) Loss: 2.007675 LR: 0.00000662 [17:25:13] Epoch: 1 Batch: 18442/20099 (91.76%) Loss: 2.235642 LR: 0.00000662 [17:25:15] Epoch: 1 Batch: 18443/20099 (91.76%) Loss: 2.294758 LR: 0.00000662 [17:25:17] Epoch: 1 Batch: 18444/20099 (91.77%) Loss: 2.291122 LR: 0.00000662 [17:25:19] Epoch: 1 Batch: 18445/20099 (91.77%) Loss: 1.818213 LR: 0.00000662 [17:25:21] Epoch: 1 Batch: 18446/20099 (91.78%) Loss: 2.424020 LR: 0.00000661 [17:25:22] Epoch: 1 Batch: 18447/20099 (91.78%) Loss: 2.229614 LR: 0.00000661 [17:25:24] Epoch: 1 Batch: 18448/20099 (91.79%) Loss: 2.290539 LR: 0.00000661 [17:25:26] Epoch: 1 Batch: 18449/20099 (91.79%) Loss: 2.015176 LR: 0.00000661 [17:25:28] Epoch: 1 Batch: 18450/20099 (91.80%) Loss: 1.939888 LR: 0.00000661 [17:25:30] Epoch: 1 Batch: 18451/20099 (91.80%) Loss: 2.155565 LR: 0.00000661 [17:25:32] Epoch: 1 Batch: 18452/20099 (91.81%) Loss: 2.225537 LR: 0.00000661 [17:25:33] Epoch: 1 Batch: 18453/20099 (91.81%) Loss: 2.078254 LR: 0.00000661 [17:25:35] Epoch: 1 Batch: 18454/20099 (91.82%) Loss: 2.303090 LR: 0.00000661 [17:25:37] Epoch: 1 Batch: 18455/20099 (91.82%) Loss: 2.247781 LR: 0.00000661 [17:25:39] Epoch: 1 Batch: 18456/20099 (91.83%) Loss: 2.372938 LR: 0.00000661 [17:25:41] Epoch: 1 Batch: 18457/20099 (91.83%) Loss: 2.056391 LR: 0.00000661 [17:25:43] Epoch: 1 Batch: 18458/20099 (91.84%) Loss: 2.242462 LR: 0.00000661 [17:25:45] Epoch: 1 Batch: 18459/20099 (91.84%) Loss: 2.127981 LR: 0.00000661 [17:25:46] Epoch: 1 Batch: 18460/20099 (91.85%) Loss: 2.217747 LR: 0.00000660 [17:25:48] Epoch: 1 Batch: 18461/20099 (91.85%) Loss: 2.080657 LR: 0.00000660 [17:25:50] Epoch: 1 Batch: 18462/20099 (91.86%) Loss: 2.056182 LR: 0.00000660 [17:25:52] Epoch: 1 Batch: 18463/20099 (91.86%) Loss: 2.178015 LR: 0.00000660 [17:25:54] Epoch: 1 Batch: 18464/20099 (91.87%) Loss: 2.217439 LR: 0.00000660 [17:25:56] Epoch: 1 Batch: 18465/20099 (91.87%) Loss: 1.958147 LR: 0.00000660 [17:25:58] Epoch: 1 
Batch: 18466/20099 (91.88%) Loss: 1.988809 LR: 0.00000660 [17:25:59] Epoch: 1 Batch: 18467/20099 (91.88%) Loss: 2.021551 LR: 0.00000660 [17:26:01] Epoch: 1 Batch: 18468/20099 (91.89%) Loss: 2.137855 LR: 0.00000660 [17:26:03] Epoch: 1 Batch: 18469/20099 (91.89%) Loss: 2.297027 LR: 0.00000660 [17:26:05] Epoch: 1 Batch: 18470/20099 (91.90%) Loss: 1.779277 LR: 0.00000660 [17:26:07] Epoch: 1 Batch: 18471/20099 (91.90%) Loss: 2.017470 LR: 0.00000660 [17:26:09] Epoch: 1 Batch: 18472/20099 (91.91%) Loss: 2.142831 LR: 0.00000660 [17:26:11] Epoch: 1 Batch: 18473/20099 (91.91%) Loss: 2.102731 LR: 0.00000660 [17:26:13] Epoch: 1 Batch: 18474/20099 (91.92%) Loss: 2.043601 LR: 0.00000659 [17:26:14] Epoch: 1 Batch: 18475/20099 (91.92%) Loss: 1.955654 LR: 0.00000659 [17:26:16] Epoch: 1 Batch: 18476/20099 (91.92%) Loss: 2.155729 LR: 0.00000659 [17:26:18] Epoch: 1 Batch: 18477/20099 (91.93%) Loss: 2.310088 LR: 0.00000659 [17:26:20] Epoch: 1 Batch: 18478/20099 (91.93%) Loss: 2.086790 LR: 0.00000659 [17:26:22] Epoch: 1 Batch: 18479/20099 (91.94%) Loss: 1.949475 LR: 0.00000659 [17:26:24] Epoch: 1 Batch: 18480/20099 (91.94%) Loss: 2.324889 LR: 0.00000659 [17:26:26] Epoch: 1 Batch: 18481/20099 (91.95%) Loss: 2.137190 LR: 0.00000659 [17:26:27] Epoch: 1 Batch: 18482/20099 (91.95%) Loss: 2.033202 LR: 0.00000659 [17:26:29] Epoch: 1 Batch: 18483/20099 (91.96%) Loss: 1.879902 LR: 0.00000659 [17:26:31] Epoch: 1 Batch: 18484/20099 (91.96%) Loss: 1.733433 LR: 0.00000659 [17:26:33] Epoch: 1 Batch: 18485/20099 (91.97%) Loss: 1.865685 LR: 0.00000659 [17:26:35] Epoch: 1 Batch: 18486/20099 (91.97%) Loss: 2.006053 LR: 0.00000659 [17:26:37] Epoch: 1 Batch: 18487/20099 (91.98%) Loss: 2.027108 LR: 0.00000659 [17:26:39] Epoch: 1 Batch: 18488/20099 (91.98%) Loss: 2.125189 LR: 0.00000658 [17:26:40] Epoch: 1 Batch: 18489/20099 (91.99%) Loss: 1.993268 LR: 0.00000658 [17:26:42] Epoch: 1 Batch: 18490/20099 (91.99%) Loss: 2.025223 LR: 0.00000658 [17:26:44] Epoch: 1 Batch: 18491/20099 (92.00%) Loss: 1.954008 LR: 
0.00000658 [17:26:46] Epoch: 1 Batch: 18492/20099 (92.00%) Loss: 2.049920 LR: 0.00000658 [17:26:48] Epoch: 1 Batch: 18493/20099 (92.01%) Loss: 2.060196 LR: 0.00000658 [17:26:50] Epoch: 1 Batch: 18494/20099 (92.01%) Loss: 2.221396 LR: 0.00000658 [17:26:52] Epoch: 1 Batch: 18495/20099 (92.02%) Loss: 1.949296 LR: 0.00000658 [17:26:54] Epoch: 1 Batch: 18496/20099 (92.02%) Loss: 2.216684 LR: 0.00000658 [17:26:55] Epoch: 1 Batch: 18497/20099 (92.03%) Loss: 1.797877 LR: 0.00000658 [17:26:57] Epoch: 1 Batch: 18498/20099 (92.03%) Loss: 2.202593 LR: 0.00000658 [17:26:59] Epoch: 1 Batch: 18499/20099 (92.04%) Loss: 1.974794 LR: 0.00000658 [17:27:01] >> Evaluating batch 0 [17:27:02] >> Evaluating batch 1 [17:27:03] >> Evaluating batch 2 [17:27:04] >> Evaluating batch 3 [17:27:05] >> Evaluating batch 4 [17:27:06] >> Evaluating batch 5 [17:27:07] >> Evaluating batch 6 [17:27:08] >> Evaluating batch 7 [17:27:10] >> Evaluating batch 8 [17:27:11] >> Evaluating batch 9 [17:27:12] >> Evaluating batch 10 [17:27:13] >> Evaluating batch 11 [17:27:14] >> Evaluating batch 12 [17:27:15] >> Evaluating batch 13 [17:27:16] >> Evaluating batch 14 [17:27:17] >> Evaluating batch 15 [17:27:18] >> Evaluating batch 16 [17:27:18] Epoch: 1 Step: 18500/20099 Evaluation: [17:27:18] Avg Loss Since Last Eval: 2.0856 Val Loss: 2.1458 Validation loss delta: -0.0021 Perplexity: 8.5491 LR: 0.00000658 [17:27:22] >> Checkpoint saved: epoch1_step18500, size: 0.1693 GB [17:27:22] Epoch: 1 Batch: 18500/20099 (92.04%) Loss: 2.070910 LR: 0.00000658 [17:27:24] Epoch: 1 Batch: 18501/20099 (92.05%) Loss: 2.200188 LR: 0.00000658 [17:27:25] Epoch: 1 Batch: 18502/20099 (92.05%) Loss: 1.681633 LR: 0.00000657 [17:27:27] Epoch: 1 Batch: 18503/20099 (92.06%) Loss: 2.008803 LR: 0.00000657 [17:27:29] Epoch: 1 Batch: 18504/20099 (92.06%) Loss: 2.209777 LR: 0.00000657 [17:27:31] Epoch: 1 Batch: 18505/20099 (92.07%) Loss: 2.010547 LR: 0.00000657 [17:27:33] Epoch: 1 Batch: 18506/20099 (92.07%) Loss: 2.352647 LR: 0.00000657 
[17:27:35] Epoch: 1 Batch: 18507/20099 (92.08%) Loss: 2.049237 LR: 0.00000657 [17:27:37] Epoch: 1 Batch: 18508/20099 (92.08%) Loss: 2.254377 LR: 0.00000657 [17:27:38] Epoch: 1 Batch: 18509/20099 (92.09%) Loss: 1.905163 LR: 0.00000657 [17:27:40] Epoch: 1 Batch: 18510/20099 (92.09%) Loss: 2.159025 LR: 0.00000657 [17:27:42] Epoch: 1 Batch: 18511/20099 (92.10%) Loss: 2.104522 LR: 0.00000657 [17:27:44] Epoch: 1 Batch: 18512/20099 (92.10%) Loss: 2.090782 LR: 0.00000657 [17:27:46] Epoch: 1 Batch: 18513/20099 (92.11%) Loss: 1.868900 LR: 0.00000657 [17:27:48] Epoch: 1 Batch: 18514/20099 (92.11%) Loss: 2.178253 LR: 0.00000657 [17:27:50] Epoch: 1 Batch: 18515/20099 (92.12%) Loss: 1.943151 LR: 0.00000657 [17:27:52] Epoch: 1 Batch: 18516/20099 (92.12%) Loss: 2.097134 LR: 0.00000656 [17:27:53] Epoch: 1 Batch: 18517/20099 (92.13%) Loss: 2.108878 LR: 0.00000656 [17:27:55] Epoch: 1 Batch: 18518/20099 (92.13%) Loss: 1.919691 LR: 0.00000656 [17:27:57] Epoch: 1 Batch: 18519/20099 (92.14%) Loss: 2.186897 LR: 0.00000656 [17:27:59] Epoch: 1 Batch: 18520/20099 (92.14%) Loss: 2.186325 LR: 0.00000656 [17:28:01] Epoch: 1 Batch: 18521/20099 (92.15%) Loss: 2.180423 LR: 0.00000656 [17:28:03] Epoch: 1 Batch: 18522/20099 (92.15%) Loss: 1.992442 LR: 0.00000656 [17:28:05] Epoch: 1 Batch: 18523/20099 (92.16%) Loss: 2.119346 LR: 0.00000656 [17:28:07] Epoch: 1 Batch: 18524/20099 (92.16%) Loss: 2.086420 LR: 0.00000656 [17:28:08] Epoch: 1 Batch: 18525/20099 (92.17%) Loss: 2.232122 LR: 0.00000656 [17:28:10] Epoch: 1 Batch: 18526/20099 (92.17%) Loss: 2.112742 LR: 0.00000656 [17:28:12] Epoch: 1 Batch: 18527/20099 (92.18%) Loss: 2.139407 LR: 0.00000656 [17:28:14] Epoch: 1 Batch: 18528/20099 (92.18%) Loss: 1.721443 LR: 0.00000656 [17:28:16] Epoch: 1 Batch: 18529/20099 (92.19%) Loss: 2.120173 LR: 0.00000656 [17:28:18] Epoch: 1 Batch: 18530/20099 (92.19%) Loss: 1.952200 LR: 0.00000655 [17:28:20] Epoch: 1 Batch: 18531/20099 (92.20%) Loss: 2.082420 LR: 0.00000655 [17:28:21] Epoch: 1 Batch: 18532/20099 (92.20%) 
Loss: 1.943797 LR: 0.00000655 [17:28:23] Epoch: 1 Batch: 18533/20099 (92.21%) Loss: 2.337121 LR: 0.00000655 [17:28:25] Epoch: 1 Batch: 18534/20099 (92.21%) Loss: 2.154261 LR: 0.00000655 [17:28:27] Epoch: 1 Batch: 18535/20099 (92.22%) Loss: 1.921984 LR: 0.00000655 [17:28:29] Epoch: 1 Batch: 18536/20099 (92.22%) Loss: 2.241860 LR: 0.00000655 [17:28:31] Epoch: 1 Batch: 18537/20099 (92.23%) Loss: 2.158810 LR: 0.00000655 [17:28:32] Epoch: 1 Batch: 18538/20099 (92.23%) Loss: 2.122480 LR: 0.00000655 [17:28:34] Epoch: 1 Batch: 18539/20099 (92.24%) Loss: 2.175948 LR: 0.00000655 [17:28:36] Epoch: 1 Batch: 18540/20099 (92.24%) Loss: 2.282676 LR: 0.00000655 [17:28:38] Epoch: 1 Batch: 18541/20099 (92.25%) Loss: 1.977975 LR: 0.00000655 [17:28:40] Epoch: 1 Batch: 18542/20099 (92.25%) Loss: 2.406963 LR: 0.00000655 [17:28:42] Epoch: 1 Batch: 18543/20099 (92.26%) Loss: 2.249956 LR: 0.00000655 [17:28:44] Epoch: 1 Batch: 18544/20099 (92.26%) Loss: 2.353301 LR: 0.00000654 [17:28:45] Epoch: 1 Batch: 18545/20099 (92.27%) Loss: 1.787629 LR: 0.00000654 [17:28:47] Epoch: 1 Batch: 18546/20099 (92.27%) Loss: 2.286444 LR: 0.00000654 [17:28:49] Epoch: 1 Batch: 18547/20099 (92.28%) Loss: 1.700065 LR: 0.00000654 [17:28:51] Epoch: 1 Batch: 18548/20099 (92.28%) Loss: 1.937444 LR: 0.00000654 [17:28:53] Epoch: 1 Batch: 18549/20099 (92.29%) Loss: 1.863976 LR: 0.00000654 [17:28:55] Epoch: 1 Batch: 18550/20099 (92.29%) Loss: 1.998828 LR: 0.00000654 [17:28:56] Epoch: 1 Batch: 18551/20099 (92.30%) Loss: 2.319018 LR: 0.00000654 [17:28:58] Epoch: 1 Batch: 18552/20099 (92.30%) Loss: 2.218042 LR: 0.00000654 [17:29:00] Epoch: 1 Batch: 18553/20099 (92.31%) Loss: 2.173389 LR: 0.00000654 [17:29:02] Epoch: 1 Batch: 18554/20099 (92.31%) Loss: 1.863856 LR: 0.00000654 [17:29:04] Epoch: 1 Batch: 18555/20099 (92.32%) Loss: 2.091495 LR: 0.00000654 [17:29:06] Epoch: 1 Batch: 18556/20099 (92.32%) Loss: 1.889516 LR: 0.00000654 [17:29:08] Epoch: 1 Batch: 18557/20099 (92.33%) Loss: 1.814129 LR: 0.00000654 [17:29:09] Epoch: 1 
Batch: 18558/20099 (92.33%) Loss: 2.095965 LR: 0.00000653 [17:29:11] Epoch: 1 Batch: 18559/20099 (92.34%) Loss: 2.098662 LR: 0.00000653 [17:29:13] Epoch: 1 Batch: 18560/20099 (92.34%) Loss: 2.084881 LR: 0.00000653 [17:29:15] Epoch: 1 Batch: 18561/20099 (92.35%) Loss: 1.856266 LR: 0.00000653 [17:29:17] Epoch: 1 Batch: 18562/20099 (92.35%) Loss: 2.208614 LR: 0.00000653 [17:29:19] Epoch: 1 Batch: 18563/20099 (92.36%) Loss: 2.038973 LR: 0.00000653 [17:29:21] Epoch: 1 Batch: 18564/20099 (92.36%) Loss: 2.149385 LR: 0.00000653 [17:29:23] Epoch: 1 Batch: 18565/20099 (92.37%) Loss: 1.968051 LR: 0.00000653 [17:29:24] Epoch: 1 Batch: 18566/20099 (92.37%) Loss: 1.950612 LR: 0.00000653 [17:29:26] Epoch: 1 Batch: 18567/20099 (92.38%) Loss: 2.178420 LR: 0.00000653 [17:29:28] Epoch: 1 Batch: 18568/20099 (92.38%) Loss: 2.039620 LR: 0.00000653 [17:29:30] Epoch: 1 Batch: 18569/20099 (92.39%) Loss: 2.145299 LR: 0.00000653 [17:29:32] Epoch: 1 Batch: 18570/20099 (92.39%) Loss: 2.082948 LR: 0.00000653 [17:29:34] Epoch: 1 Batch: 18571/20099 (92.40%) Loss: 2.105550 LR: 0.00000653 [17:29:36] Epoch: 1 Batch: 18572/20099 (92.40%) Loss: 2.055319 LR: 0.00000652 [17:29:37] Epoch: 1 Batch: 18573/20099 (92.41%) Loss: 1.954497 LR: 0.00000652 [17:29:39] Epoch: 1 Batch: 18574/20099 (92.41%) Loss: 2.095882 LR: 0.00000652 [17:29:41] Epoch: 1 Batch: 18575/20099 (92.42%) Loss: 2.423534 LR: 0.00000652 [17:29:43] Epoch: 1 Batch: 18576/20099 (92.42%) Loss: 1.885827 LR: 0.00000652 [17:29:45] Epoch: 1 Batch: 18577/20099 (92.43%) Loss: 2.016953 LR: 0.00000652 [17:29:47] Epoch: 1 Batch: 18578/20099 (92.43%) Loss: 1.802325 LR: 0.00000652 [17:29:49] Epoch: 1 Batch: 18579/20099 (92.44%) Loss: 2.585333 LR: 0.00000652 [17:29:50] Epoch: 1 Batch: 18580/20099 (92.44%) Loss: 2.004861 LR: 0.00000652 [17:29:52] Epoch: 1 Batch: 18581/20099 (92.45%) Loss: 2.285186 LR: 0.00000652 [17:29:54] Epoch: 1 Batch: 18582/20099 (92.45%) Loss: 2.204913 LR: 0.00000652 [17:29:56] Epoch: 1 Batch: 18583/20099 (92.46%) Loss: 2.206007 LR: 
0.00000652 [17:29:58] Epoch: 1 Batch: 18584/20099 (92.46%) Loss: 2.077108 LR: 0.00000652 [17:30:00] Epoch: 1 Batch: 18585/20099 (92.47%) Loss: 2.128728 LR: 0.00000652 [17:30:01] Epoch: 1 Batch: 18586/20099 (92.47%) Loss: 2.184425 LR: 0.00000651 [17:30:03] Epoch: 1 Batch: 18587/20099 (92.48%) Loss: 2.174702 LR: 0.00000651 [17:30:05] Epoch: 1 Batch: 18588/20099 (92.48%) Loss: 1.979606 LR: 0.00000651 [17:30:07] Epoch: 1 Batch: 18589/20099 (92.49%) Loss: 1.969262 LR: 0.00000651 [17:30:09] Epoch: 1 Batch: 18590/20099 (92.49%) Loss: 2.291821 LR: 0.00000651 [17:30:11] Epoch: 1 Batch: 18591/20099 (92.50%) Loss: 2.020004 LR: 0.00000651 [17:30:13] Epoch: 1 Batch: 18592/20099 (92.50%) Loss: 2.005043 LR: 0.00000651 [17:30:14] Epoch: 1 Batch: 18593/20099 (92.51%) Loss: 1.991050 LR: 0.00000651 [17:30:16] Epoch: 1 Batch: 18594/20099 (92.51%) Loss: 2.192938 LR: 0.00000651 [17:30:18] Epoch: 1 Batch: 18595/20099 (92.52%) Loss: 1.842924 LR: 0.00000651 [17:30:20] Epoch: 1 Batch: 18596/20099 (92.52%) Loss: 1.967454 LR: 0.00000651 [17:30:22] Epoch: 1 Batch: 18597/20099 (92.53%) Loss: 1.984706 LR: 0.00000651 [17:30:24] Epoch: 1 Batch: 18598/20099 (92.53%) Loss: 2.036189 LR: 0.00000651 [17:30:25] Epoch: 1 Batch: 18599/20099 (92.54%) Loss: 2.076400 LR: 0.00000651 [17:30:31] >> Cleaned up old temp checkpoint: epoch1_step16600 [17:30:31] >> Temp checkpoint saved: epoch1_step18600, size: 0.1693 GB [17:30:31] Epoch: 1 Batch: 18600/20099 (92.54%) Loss: 2.136930 LR: 0.00000650 [17:30:33] Epoch: 1 Batch: 18601/20099 (92.55%) Loss: 2.203590 LR: 0.00000650 [17:30:35] Epoch: 1 Batch: 18602/20099 (92.55%) Loss: 2.242135 LR: 0.00000650 [17:30:36] Epoch: 1 Batch: 18603/20099 (92.56%) Loss: 2.149366 LR: 0.00000650 [17:30:38] Epoch: 1 Batch: 18604/20099 (92.56%) Loss: 2.079739 LR: 0.00000650 [17:30:40] Epoch: 1 Batch: 18605/20099 (92.57%) Loss: 2.105687 LR: 0.00000650 [17:30:42] Epoch: 1 Batch: 18606/20099 (92.57%) Loss: 2.024431 LR: 0.00000650 [17:30:44] Epoch: 1 Batch: 18607/20099 (92.58%) Loss: 
2.034639 LR: 0.00000650 [17:30:46] Epoch: 1 Batch: 18608/20099 (92.58%) Loss: 2.312706 LR: 0.00000650 [17:30:47] Epoch: 1 Batch: 18609/20099 (92.59%) Loss: 1.852452 LR: 0.00000650 [17:30:49] Epoch: 1 Batch: 18610/20099 (92.59%) Loss: 1.923555 LR: 0.00000650 [17:30:51] Epoch: 1 Batch: 18611/20099 (92.60%) Loss: 2.000809 LR: 0.00000650 [17:30:53] Epoch: 1 Batch: 18612/20099 (92.60%) Loss: 1.888199 LR: 0.00000650 [17:30:55] Epoch: 1 Batch: 18613/20099 (92.61%) Loss: 2.238435 LR: 0.00000650 [17:30:57] Epoch: 1 Batch: 18614/20099 (92.61%) Loss: 1.866315 LR: 0.00000650 [17:30:59] Epoch: 1 Batch: 18615/20099 (92.62%) Loss: 1.947032 LR: 0.00000650 [17:31:00] Epoch: 1 Batch: 18616/20099 (92.62%) Loss: 2.433438 LR: 0.00000650 [17:31:02] Epoch: 1 Batch: 18617/20099 (92.63%) Loss: 2.131267 LR: 0.00000650 [17:31:04] Epoch: 1 Batch: 18618/20099 (92.63%) Loss: 1.948051 LR: 0.00000650 [17:31:06] Epoch: 1 Batch: 18619/20099 (92.64%) Loss: 2.178027 LR: 0.00000650 [17:31:08] Epoch: 1 Batch: 18620/20099 (92.64%) Loss: 2.228802 LR: 0.00000650 [17:31:10] Epoch: 1 Batch: 18621/20099 (92.65%) Loss: 1.898891 LR: 0.00000649 [17:31:12] Epoch: 1 Batch: 18622/20099 (92.65%) Loss: 2.019291 LR: 0.00000649 [17:31:14] Epoch: 1 Batch: 18623/20099 (92.66%) Loss: 2.387329 LR: 0.00000649 [17:31:15] Epoch: 1 Batch: 18624/20099 (92.66%) Loss: 2.464012 LR: 0.00000649 [17:31:17] Epoch: 1 Batch: 18625/20099 (92.67%) Loss: 1.971589 LR: 0.00000649 [17:31:19] Epoch: 1 Batch: 18626/20099 (92.67%) Loss: 2.178151 LR: 0.00000649 [17:31:21] Epoch: 1 Batch: 18627/20099 (92.68%) Loss: 2.074365 LR: 0.00000649 [17:31:23] Epoch: 1 Batch: 18628/20099 (92.68%) Loss: 2.064335 LR: 0.00000649 [17:31:25] Epoch: 1 Batch: 18629/20099 (92.69%) Loss: 2.118189 LR: 0.00000649 [17:31:27] Epoch: 1 Batch: 18630/20099 (92.69%) Loss: 2.140453 LR: 0.00000649 [17:31:29] Epoch: 1 Batch: 18631/20099 (92.70%) Loss: 2.093363 LR: 0.00000649 [17:31:30] Epoch: 1 Batch: 18632/20099 (92.70%) Loss: 2.368889 LR: 0.00000649 [17:31:32] Epoch: 1 
Batch: 18633/20099 (92.71%) Loss: 2.021134 LR: 0.00000649 [17:31:34] Epoch: 1 Batch: 18634/20099 (92.71%) Loss: 2.218708 LR: 0.00000649 [17:31:36] Epoch: 1 Batch: 18635/20099 (92.72%) Loss: 1.892248 LR: 0.00000648 [17:31:38] Epoch: 1 Batch: 18636/20099 (92.72%) Loss: 2.027182 LR: 0.00000648 [17:31:40] Epoch: 1 Batch: 18637/20099 (92.73%) Loss: 2.000261 LR: 0.00000648 [17:31:41] Epoch: 1 Batch: 18638/20099 (92.73%) Loss: 2.136497 LR: 0.00000648 [17:31:43] Epoch: 1 Batch: 18639/20099 (92.74%) Loss: 2.118199 LR: 0.00000648 [17:31:45] Epoch: 1 Batch: 18640/20099 (92.74%) Loss: 2.016747 LR: 0.00000648 [17:31:47] Epoch: 1 Batch: 18641/20099 (92.75%) Loss: 1.917366 LR: 0.00000648 [17:31:49] Epoch: 1 Batch: 18642/20099 (92.75%) Loss: 2.190498 LR: 0.00000648 [17:31:51] Epoch: 1 Batch: 18643/20099 (92.76%) Loss: 2.409042 LR: 0.00000648 [17:31:53] Epoch: 1 Batch: 18644/20099 (92.76%) Loss: 2.184630 LR: 0.00000648 [17:31:54] Epoch: 1 Batch: 18645/20099 (92.77%) Loss: 1.783897 LR: 0.00000648 [17:31:56] Epoch: 1 Batch: 18646/20099 (92.77%) Loss: 1.832634 LR: 0.00000648 [17:31:58] Epoch: 1 Batch: 18647/20099 (92.78%) Loss: 2.293603 LR: 0.00000648 [17:32:00] Epoch: 1 Batch: 18648/20099 (92.78%) Loss: 2.066734 LR: 0.00000648 [17:32:02] Epoch: 1 Batch: 18649/20099 (92.79%) Loss: 2.010708 LR: 0.00000647 [17:32:04] Epoch: 1 Batch: 18650/20099 (92.79%) Loss: 1.835790 LR: 0.00000647 [17:32:05] Epoch: 1 Batch: 18651/20099 (92.80%) Loss: 2.315195 LR: 0.00000647 [17:32:07] Epoch: 1 Batch: 18652/20099 (92.80%) Loss: 1.996179 LR: 0.00000647 [17:32:09] Epoch: 1 Batch: 18653/20099 (92.81%) Loss: 2.290437 LR: 0.00000647 [17:32:11] Epoch: 1 Batch: 18654/20099 (92.81%) Loss: 2.172856 LR: 0.00000647 [17:32:13] Epoch: 1 Batch: 18655/20099 (92.82%) Loss: 2.273805 LR: 0.00000647 [17:32:15] Epoch: 1 Batch: 18656/20099 (92.82%) Loss: 1.965993 LR: 0.00000647 [17:32:16] Epoch: 1 Batch: 18657/20099 (92.83%) Loss: 2.040334 LR: 0.00000647 [17:32:18] Epoch: 1 Batch: 18658/20099 (92.83%) Loss: 2.064215 LR: 
0.00000647 [17:32:20] Epoch: 1 Batch: 18659/20099 (92.84%) Loss: 1.930113 LR: 0.00000647 [17:32:22] Epoch: 1 Batch: 18660/20099 (92.84%) Loss: 1.890174 LR: 0.00000647 [17:32:24] Epoch: 1 Batch: 18661/20099 (92.85%) Loss: 1.977024 LR: 0.00000647 [17:32:26] Epoch: 1 Batch: 18662/20099 (92.85%) Loss: 2.100531 LR: 0.00000647 [17:32:28] Epoch: 1 Batch: 18663/20099 (92.86%) Loss: 2.075784 LR: 0.00000646 [17:32:29] Epoch: 1 Batch: 18664/20099 (92.86%) Loss: 2.078973 LR: 0.00000646 [17:32:31] Epoch: 1 Batch: 18665/20099 (92.87%) Loss: 1.921682 LR: 0.00000646 [17:32:33] Epoch: 1 Batch: 18666/20099 (92.87%) Loss: 2.073140 LR: 0.00000646 [17:32:35] Epoch: 1 Batch: 18667/20099 (92.88%) Loss: 2.015704 LR: 0.00000646 [17:32:37] Epoch: 1 Batch: 18668/20099 (92.88%) Loss: 2.001980 LR: 0.00000646 [17:32:39] Epoch: 1 Batch: 18669/20099 (92.89%) Loss: 2.378247 LR: 0.00000646 [17:32:41] Epoch: 1 Batch: 18670/20099 (92.89%) Loss: 1.904986 LR: 0.00000646 [17:32:42] Epoch: 1 Batch: 18671/20099 (92.90%) Loss: 2.058527 LR: 0.00000646 [17:32:44] Epoch: 1 Batch: 18672/20099 (92.90%) Loss: 2.001500 LR: 0.00000646 [17:32:46] Epoch: 1 Batch: 18673/20099 (92.91%) Loss: 2.045680 LR: 0.00000646 [17:32:48] Epoch: 1 Batch: 18674/20099 (92.91%) Loss: 2.374646 LR: 0.00000646 [17:32:50] Epoch: 1 Batch: 18675/20099 (92.92%) Loss: 2.226331 LR: 0.00000646 [17:32:52] Epoch: 1 Batch: 18676/20099 (92.92%) Loss: 1.977802 LR: 0.00000646 [17:32:54] Epoch: 1 Batch: 18677/20099 (92.93%) Loss: 1.942435 LR: 0.00000645 [17:32:55] Epoch: 1 Batch: 18678/20099 (92.93%) Loss: 1.825245 LR: 0.00000645 [17:32:57] Epoch: 1 Batch: 18679/20099 (92.93%) Loss: 2.064878 LR: 0.00000645 [17:32:59] Epoch: 1 Batch: 18680/20099 (92.94%) Loss: 2.237486 LR: 0.00000645 [17:33:01] Epoch: 1 Batch: 18681/20099 (92.94%) Loss: 2.093668 LR: 0.00000645 [17:33:03] Epoch: 1 Batch: 18682/20099 (92.95%) Loss: 2.129313 LR: 0.00000645 [17:33:05] Epoch: 1 Batch: 18683/20099 (92.95%) Loss: 2.305095 LR: 0.00000645 [17:33:07] Epoch: 1 Batch: 18684/20099 
(92.96%) Loss: 2.067386 LR: 0.00000645 [17:33:09] Epoch: 1 Batch: 18685/20099 (92.96%) Loss: 1.971169 LR: 0.00000645 [17:33:10] Epoch: 1 Batch: 18686/20099 (92.97%) Loss: 2.109677 LR: 0.00000645 [17:33:12] Epoch: 1 Batch: 18687/20099 (92.97%) Loss: 2.283536 LR: 0.00000645 [17:33:14] Epoch: 1 Batch: 18688/20099 (92.98%) Loss: 2.154419 LR: 0.00000645 [17:33:16] Epoch: 1 Batch: 18689/20099 (92.98%) Loss: 2.238497 LR: 0.00000645 [17:33:18] Epoch: 1 Batch: 18690/20099 (92.99%) Loss: 2.078596 LR: 0.00000645 [17:33:20] Epoch: 1 Batch: 18691/20099 (92.99%) Loss: 2.267323 LR: 0.00000645 [17:33:22] Epoch: 1 Batch: 18692/20099 (93.00%) Loss: 2.092633 LR: 0.00000645 [17:33:23] Epoch: 1 Batch: 18693/20099 (93.00%) Loss: 2.002461 LR: 0.00000645 [17:33:25] Epoch: 1 Batch: 18694/20099 (93.01%) Loss: 1.712382 LR: 0.00000645 [17:33:27] Epoch: 1 Batch: 18695/20099 (93.01%) Loss: 1.858646 LR: 0.00000645 [17:33:29] Epoch: 1 Batch: 18696/20099 (93.02%) Loss: 2.499231 LR: 0.00000645 [17:33:31] Epoch: 1 Batch: 18697/20099 (93.02%) Loss: 2.223319 LR: 0.00000645 [17:33:33] Epoch: 1 Batch: 18698/20099 (93.03%) Loss: 2.077700 LR: 0.00000644 [17:33:35] Epoch: 1 Batch: 18699/20099 (93.03%) Loss: 2.001270 LR: 0.00000644 [17:33:36] Epoch: 1 Batch: 18700/20099 (93.04%) Loss: 1.892269 LR: 0.00000644 [17:33:38] Epoch: 1 Batch: 18701/20099 (93.04%) Loss: 2.018689 LR: 0.00000644 [17:33:40] Epoch: 1 Batch: 18702/20099 (93.05%) Loss: 2.082895 LR: 0.00000644 [17:33:42] Epoch: 1 Batch: 18703/20099 (93.05%) Loss: 1.855661 LR: 0.00000644 [17:33:44] Epoch: 1 Batch: 18704/20099 (93.06%) Loss: 2.001112 LR: 0.00000644 [17:33:46] Epoch: 1 Batch: 18705/20099 (93.06%) Loss: 1.860882 LR: 0.00000644 [17:33:48] Epoch: 1 Batch: 18706/20099 (93.07%) Loss: 2.058739 LR: 0.00000644 [17:33:49] Epoch: 1 Batch: 18707/20099 (93.07%) Loss: 2.156561 LR: 0.00000644 [17:33:51] Epoch: 1 Batch: 18708/20099 (93.08%) Loss: 2.047520 LR: 0.00000644 [17:33:53] Epoch: 1 Batch: 18709/20099 (93.08%) Loss: 2.263521 LR: 0.00000644 [17:33:55] 
Epoch: 1 Batch: 18710/20099 (93.09%) Loss: 2.085829 LR: 0.00000644 [17:33:57] Epoch: 1 Batch: 18711/20099 (93.09%) Loss: 2.424526 LR: 0.00000644 [17:33:59] Epoch: 1 Batch: 18712/20099 (93.10%) Loss: 1.983364 LR: 0.00000643 [17:34:01] Epoch: 1 Batch: 18713/20099 (93.10%) Loss: 2.018081 LR: 0.00000643 [17:34:02] Epoch: 1 Batch: 18714/20099 (93.11%) Loss: 2.169566 LR: 0.00000643 [17:34:04] Epoch: 1 Batch: 18715/20099 (93.11%) Loss: 1.968771 LR: 0.00000643 [17:34:06] Epoch: 1 Batch: 18716/20099 (93.12%) Loss: 2.036135 LR: 0.00000643 [17:34:08] Epoch: 1 Batch: 18717/20099 (93.12%) Loss: 1.847755 LR: 0.00000643 [17:34:10] Epoch: 1 Batch: 18718/20099 (93.13%) Loss: 2.001440 LR: 0.00000643 [17:34:12] Epoch: 1 Batch: 18719/20099 (93.13%) Loss: 2.224553 LR: 0.00000643 [17:34:14] Epoch: 1 Batch: 18720/20099 (93.14%) Loss: 2.151979 LR: 0.00000643 [17:34:15] Epoch: 1 Batch: 18721/20099 (93.14%) Loss: 2.276133 LR: 0.00000643 [17:34:17] Epoch: 1 Batch: 18722/20099 (93.15%) Loss: 2.107471 LR: 0.00000643 [17:34:19] Epoch: 1 Batch: 18723/20099 (93.15%) Loss: 1.984312 LR: 0.00000643 [17:34:21] Epoch: 1 Batch: 18724/20099 (93.16%) Loss: 2.004476 LR: 0.00000643 [17:34:23] Epoch: 1 Batch: 18725/20099 (93.16%) Loss: 2.042790 LR: 0.00000643 [17:34:25] Epoch: 1 Batch: 18726/20099 (93.17%) Loss: 2.188302 LR: 0.00000642 [17:34:26] Epoch: 1 Batch: 18727/20099 (93.17%) Loss: 2.103880 LR: 0.00000642 [17:34:28] Epoch: 1 Batch: 18728/20099 (93.18%) Loss: 1.967489 LR: 0.00000642 [17:34:30] Epoch: 1 Batch: 18729/20099 (93.18%) Loss: 2.156760 LR: 0.00000642 [17:34:32] Epoch: 1 Batch: 18730/20099 (93.19%) Loss: 1.999638 LR: 0.00000642 [17:34:34] Epoch: 1 Batch: 18731/20099 (93.19%) Loss: 2.132552 LR: 0.00000642 [17:34:36] Epoch: 1 Batch: 18732/20099 (93.20%) Loss: 2.038106 LR: 0.00000642 [17:34:38] Epoch: 1 Batch: 18733/20099 (93.20%) Loss: 2.134585 LR: 0.00000642 [17:34:39] Epoch: 1 Batch: 18734/20099 (93.21%) Loss: 2.178454 LR: 0.00000642 [17:34:41] Epoch: 1 Batch: 18735/20099 (93.21%) Loss: 
1.941187 LR: 0.00000642 [17:34:43] Epoch: 1 Batch: 18736/20099 (93.22%) Loss: 1.861298 LR: 0.00000642 [17:34:45] Epoch: 1 Batch: 18737/20099 (93.22%) Loss: 2.228157 LR: 0.00000642 [17:34:47] Epoch: 1 Batch: 18738/20099 (93.23%) Loss: 2.259131 LR: 0.00000642 [17:34:49] Epoch: 1 Batch: 18739/20099 (93.23%) Loss: 1.827313 LR: 0.00000642 [17:34:51] Epoch: 1 Batch: 18740/20099 (93.24%) Loss: 2.070163 LR: 0.00000642 [17:34:52] Epoch: 1 Batch: 18741/20099 (93.24%) Loss: 2.206600 LR: 0.00000642 [17:34:54] Epoch: 1 Batch: 18742/20099 (93.25%) Loss: 1.767934 LR: 0.00000642 [17:34:56] Epoch: 1 Batch: 18743/20099 (93.25%) Loss: 1.962018 LR: 0.00000642 [17:34:58] Epoch: 1 Batch: 18744/20099 (93.26%) Loss: 1.959214 LR: 0.00000642 [17:35:00] Epoch: 1 Batch: 18745/20099 (93.26%) Loss: 2.339392 LR: 0.00000642 [17:35:02] Epoch: 1 Batch: 18746/20099 (93.27%) Loss: 2.058571 LR: 0.00000642 [17:35:04] Epoch: 1 Batch: 18747/20099 (93.27%) Loss: 2.025589 LR: 0.00000641 [17:35:05] Epoch: 1 Batch: 18748/20099 (93.28%) Loss: 2.037528 LR: 0.00000641 [17:35:07] Epoch: 1 Batch: 18749/20099 (93.28%) Loss: 2.028756 LR: 0.00000641 [17:35:09] Epoch: 1 Batch: 18750/20099 (93.29%) Loss: 2.050617 LR: 0.00000641 [17:35:11] Epoch: 1 Batch: 18751/20099 (93.29%) Loss: 1.990655 LR: 0.00000641 [17:35:13] Epoch: 1 Batch: 18752/20099 (93.30%) Loss: 2.074683 LR: 0.00000641 [17:35:15] Epoch: 1 Batch: 18753/20099 (93.30%) Loss: 2.065112 LR: 0.00000641 [17:35:17] Epoch: 1 Batch: 18754/20099 (93.31%) Loss: 2.352881 LR: 0.00000641 [17:35:18] Epoch: 1 Batch: 18755/20099 (93.31%) Loss: 2.209114 LR: 0.00000641 [17:35:20] Epoch: 1 Batch: 18756/20099 (93.32%) Loss: 2.051444 LR: 0.00000641 [17:35:22] Epoch: 1 Batch: 18757/20099 (93.32%) Loss: 2.088859 LR: 0.00000641 [17:35:24] Epoch: 1 Batch: 18758/20099 (93.33%) Loss: 2.426763 LR: 0.00000641 [17:35:26] Epoch: 1 Batch: 18759/20099 (93.33%) Loss: 2.168723 LR: 0.00000641 [17:35:28] Epoch: 1 Batch: 18760/20099 (93.34%) Loss: 2.182183 LR: 0.00000641 [17:35:30] Epoch: 1 
Batch: 18761/20099 (93.34%) Loss: 2.304244 LR: 0.00000640 [17:35:31] Epoch: 1 Batch: 18762/20099 (93.35%) Loss: 2.311611 LR: 0.00000640 [17:35:33] Epoch: 1 Batch: 18763/20099 (93.35%) Loss: 2.066550 LR: 0.00000640 [17:35:35] Epoch: 1 Batch: 18764/20099 (93.36%) Loss: 2.306504 LR: 0.00000640 [17:35:37] Epoch: 1 Batch: 18765/20099 (93.36%) Loss: 2.212830 LR: 0.00000640 [17:35:39] Epoch: 1 Batch: 18766/20099 (93.37%) Loss: 1.993317 LR: 0.00000640 [17:35:41] Epoch: 1 Batch: 18767/20099 (93.37%) Loss: 2.210887 LR: 0.00000640 [17:35:42] Epoch: 1 Batch: 18768/20099 (93.38%) Loss: 1.817197 LR: 0.00000640 [17:35:44] Epoch: 1 Batch: 18769/20099 (93.38%) Loss: 2.325694 LR: 0.00000640 [17:35:46] Epoch: 1 Batch: 18770/20099 (93.39%) Loss: 2.396560 LR: 0.00000640 [17:35:48] Epoch: 1 Batch: 18771/20099 (93.39%) Loss: 2.282889 LR: 0.00000640 [17:35:50] Epoch: 1 Batch: 18772/20099 (93.40%) Loss: 2.000047 LR: 0.00000640 [17:35:52] Epoch: 1 Batch: 18773/20099 (93.40%) Loss: 2.224808 LR: 0.00000640 [17:35:54] Epoch: 1 Batch: 18774/20099 (93.41%) Loss: 2.046538 LR: 0.00000640 [17:35:55] Epoch: 1 Batch: 18775/20099 (93.41%) Loss: 2.110552 LR: 0.00000639 [17:35:57] Epoch: 1 Batch: 18776/20099 (93.42%) Loss: 2.005676 LR: 0.00000639 [17:35:59] Epoch: 1 Batch: 18777/20099 (93.42%) Loss: 1.748749 LR: 0.00000639 [17:36:01] Epoch: 1 Batch: 18778/20099 (93.43%) Loss: 2.110223 LR: 0.00000639 [17:36:03] Epoch: 1 Batch: 18779/20099 (93.43%) Loss: 1.993221 LR: 0.00000639 [17:36:05] Epoch: 1 Batch: 18780/20099 (93.44%) Loss: 1.864843 LR: 0.00000639 [17:36:07] Epoch: 1 Batch: 18781/20099 (93.44%) Loss: 1.971040 LR: 0.00000639 [17:36:08] Epoch: 1 Batch: 18782/20099 (93.45%) Loss: 2.099641 LR: 0.00000639 [17:36:10] Epoch: 1 Batch: 18783/20099 (93.45%) Loss: 2.127069 LR: 0.00000639 [17:36:12] Epoch: 1 Batch: 18784/20099 (93.46%) Loss: 2.238376 LR: 0.00000639 [17:36:14] Epoch: 1 Batch: 18785/20099 (93.46%) Loss: 2.208950 LR: 0.00000639 [17:36:16] Epoch: 1 Batch: 18786/20099 (93.47%) Loss: 2.097956 LR: 
0.00000639 [17:36:18] Epoch: 1 Batch: 18787/20099 (93.47%) Loss: 2.375259 LR: 0.00000639 [17:36:20] Epoch: 1 Batch: 18788/20099 (93.48%) Loss: 2.211732 LR: 0.00000639 [17:36:21] Epoch: 1 Batch: 18789/20099 (93.48%) Loss: 2.009293 LR: 0.00000639 [17:36:23] Epoch: 1 Batch: 18790/20099 (93.49%) Loss: 2.040404 LR: 0.00000639 [17:36:25] Epoch: 1 Batch: 18791/20099 (93.49%) Loss: 2.039186 LR: 0.00000639 [17:36:27] Epoch: 1 Batch: 18792/20099 (93.50%) Loss: 1.941736 LR: 0.00000639 [17:36:29] Epoch: 1 Batch: 18793/20099 (93.50%) Loss: 2.182559 LR: 0.00000639 [17:36:31] Epoch: 1 Batch: 18794/20099 (93.51%) Loss: 1.916400 LR: 0.00000639 [17:36:33] Epoch: 1 Batch: 18795/20099 (93.51%) Loss: 2.411017 LR: 0.00000639 [17:36:35] Epoch: 1 Batch: 18796/20099 (93.52%) Loss: 2.071110 LR: 0.00000638 [17:36:36] Epoch: 1 Batch: 18797/20099 (93.52%) Loss: 2.140448 LR: 0.00000638 [17:36:38] Epoch: 1 Batch: 18798/20099 (93.53%) Loss: 2.302137 LR: 0.00000638 [17:36:40] Epoch: 1 Batch: 18799/20099 (93.53%) Loss: 2.134970 LR: 0.00000638 [17:36:46] >> Cleaned up old temp checkpoint: epoch1_step16800 [17:36:46] >> Temp checkpoint saved: epoch1_step18800, size: 0.1693 GB [17:36:46] Epoch: 1 Batch: 18800/20099 (93.54%) Loss: 1.885792 LR: 0.00000638 [17:36:48] Epoch: 1 Batch: 18801/20099 (93.54%) Loss: 1.947433 LR: 0.00000638 [17:36:49] Epoch: 1 Batch: 18802/20099 (93.55%) Loss: 1.903567 LR: 0.00000638 [17:36:51] Epoch: 1 Batch: 18803/20099 (93.55%) Loss: 2.278106 LR: 0.00000638 [17:36:53] Epoch: 1 Batch: 18804/20099 (93.56%) Loss: 2.265444 LR: 0.00000638 [17:36:55] Epoch: 1 Batch: 18805/20099 (93.56%) Loss: 2.026125 LR: 0.00000638 [17:36:57] Epoch: 1 Batch: 18806/20099 (93.57%) Loss: 2.132472 LR: 0.00000638 [17:36:59] Epoch: 1 Batch: 18807/20099 (93.57%) Loss: 1.751912 LR: 0.00000638 [17:37:01] Epoch: 1 Batch: 18808/20099 (93.58%) Loss: 2.314101 LR: 0.00000638 [17:37:02] Epoch: 1 Batch: 18809/20099 (93.58%) Loss: 2.162289 LR: 0.00000638 [17:37:04] Epoch: 1 Batch: 18810/20099 (93.59%) Loss: 
2.132331 LR: 0.00000637 [17:37:06] Epoch: 1 Batch: 18811/20099 (93.59%) Loss: 1.926652 LR: 0.00000637 [17:37:08] Epoch: 1 Batch: 18812/20099 (93.60%) Loss: 2.130433 LR: 0.00000637 [17:37:10] Epoch: 1 Batch: 18813/20099 (93.60%) Loss: 2.084897 LR: 0.00000637 [17:37:12] Epoch: 1 Batch: 18814/20099 (93.61%) Loss: 2.275780 LR: 0.00000637 [17:37:14] Epoch: 1 Batch: 18815/20099 (93.61%) Loss: 2.090987 LR: 0.00000637 [17:37:16] Epoch: 1 Batch: 18816/20099 (93.62%) Loss: 2.123373 LR: 0.00000637 [17:37:17] Epoch: 1 Batch: 18817/20099 (93.62%) Loss: 1.644003 LR: 0.00000637 [17:37:19] Epoch: 1 Batch: 18818/20099 (93.63%) Loss: 2.177297 LR: 0.00000637 [17:37:21] Epoch: 1 Batch: 18819/20099 (93.63%) Loss: 1.641241 LR: 0.00000637 [17:37:23] Epoch: 1 Batch: 18820/20099 (93.64%) Loss: 2.119945 LR: 0.00000637 [17:37:25] Epoch: 1 Batch: 18821/20099 (93.64%) Loss: 2.314338 LR: 0.00000637 [17:37:27] Epoch: 1 Batch: 18822/20099 (93.65%) Loss: 1.986167 LR: 0.00000637 [17:37:29] Epoch: 1 Batch: 18823/20099 (93.65%) Loss: 2.032900 LR: 0.00000637 [17:37:30] Epoch: 1 Batch: 18824/20099 (93.66%) Loss: 1.994736 LR: 0.00000637 [17:37:32] Epoch: 1 Batch: 18825/20099 (93.66%) Loss: 2.139333 LR: 0.00000637 [17:37:34] Epoch: 1 Batch: 18826/20099 (93.67%) Loss: 2.302745 LR: 0.00000637 [17:37:36] Epoch: 1 Batch: 18827/20099 (93.67%) Loss: 2.265903 LR: 0.00000637 [17:37:38] Epoch: 1 Batch: 18828/20099 (93.68%) Loss: 1.683404 LR: 0.00000637 [17:37:40] Epoch: 1 Batch: 18829/20099 (93.68%) Loss: 2.141759 LR: 0.00000637 [17:37:41] Epoch: 1 Batch: 18830/20099 (93.69%) Loss: 2.068971 LR: 0.00000637 [17:37:43] Epoch: 1 Batch: 18831/20099 (93.69%) Loss: 2.064252 LR: 0.00000636 [17:37:45] Epoch: 1 Batch: 18832/20099 (93.70%) Loss: 1.994169 LR: 0.00000636 [17:37:47] Epoch: 1 Batch: 18833/20099 (93.70%) Loss: 2.008592 LR: 0.00000636 [17:37:49] Epoch: 1 Batch: 18834/20099 (93.71%) Loss: 2.055545 LR: 0.00000636 [17:37:51] Epoch: 1 Batch: 18835/20099 (93.71%) Loss: 2.405630 LR: 0.00000636 [17:37:53] Epoch: 1 
Batch: 18836/20099 (93.72%) Loss: 1.993490 LR: 0.00000636 [17:37:54] Epoch: 1 Batch: 18837/20099 (93.72%) Loss: 2.127676 LR: 0.00000636 [17:37:56] Epoch: 1 Batch: 18838/20099 (93.73%) Loss: 2.072916 LR: 0.00000636 [17:37:58] Epoch: 1 Batch: 18839/20099 (93.73%) Loss: 2.077196 LR: 0.00000636 [17:38:00] Epoch: 1 Batch: 18840/20099 (93.74%) Loss: 2.022184 LR: 0.00000636 [17:38:02] Epoch: 1 Batch: 18841/20099 (93.74%) Loss: 2.049360 LR: 0.00000636 [17:38:04] Epoch: 1 Batch: 18842/20099 (93.75%) Loss: 1.737040 LR: 0.00000636 [17:38:05] Epoch: 1 Batch: 18843/20099 (93.75%) Loss: 2.266951 LR: 0.00000636 [17:38:07] Epoch: 1 Batch: 18844/20099 (93.76%) Loss: 2.244268 LR: 0.00000636 [17:38:09] Epoch: 1 Batch: 18845/20099 (93.76%) Loss: 2.109494 LR: 0.00000635 [17:38:11] Epoch: 1 Batch: 18846/20099 (93.77%) Loss: 2.008463 LR: 0.00000635 [17:38:13] Epoch: 1 Batch: 18847/20099 (93.77%) Loss: 2.133718 LR: 0.00000635 [17:38:15] Epoch: 1 Batch: 18848/20099 (93.78%) Loss: 1.984733 LR: 0.00000635 [17:38:17] Epoch: 1 Batch: 18849/20099 (93.78%) Loss: 2.152451 LR: 0.00000635 [17:38:18] Epoch: 1 Batch: 18850/20099 (93.79%) Loss: 1.904969 LR: 0.00000635 [17:38:20] Epoch: 1 Batch: 18851/20099 (93.79%) Loss: 2.103430 LR: 0.00000635 [17:38:22] Epoch: 1 Batch: 18852/20099 (93.80%) Loss: 1.832710 LR: 0.00000635 [17:38:24] Epoch: 1 Batch: 18853/20099 (93.80%) Loss: 2.296971 LR: 0.00000635 [17:38:26] Epoch: 1 Batch: 18854/20099 (93.81%) Loss: 2.229144 LR: 0.00000635 [17:38:28] Epoch: 1 Batch: 18855/20099 (93.81%) Loss: 2.040869 LR: 0.00000635 [17:38:30] Epoch: 1 Batch: 18856/20099 (93.82%) Loss: 1.851839 LR: 0.00000635 [17:38:31] Epoch: 1 Batch: 18857/20099 (93.82%) Loss: 2.076534 LR: 0.00000635 [17:38:33] Epoch: 1 Batch: 18858/20099 (93.83%) Loss: 1.885270 LR: 0.00000635 [17:38:35] Epoch: 1 Batch: 18859/20099 (93.83%) Loss: 2.301797 LR: 0.00000635 [17:38:37] Epoch: 1 Batch: 18860/20099 (93.84%) Loss: 2.399562 LR: 0.00000635 [17:38:39] Epoch: 1 Batch: 18861/20099 (93.84%) Loss: 2.322425 LR: 
0.00000635 [17:38:41] Epoch: 1 Batch: 18862/20099 (93.85%) Loss: 2.041503 LR: 0.00000635 [17:38:43] Epoch: 1 Batch: 18863/20099 (93.85%) Loss: 1.829449 LR: 0.00000635 [17:38:45] Epoch: 1 Batch: 18864/20099 (93.86%) Loss: 2.224522 LR: 0.00000635 [17:38:46] Epoch: 1 Batch: 18865/20099 (93.86%) Loss: 2.050713 LR: 0.00000635 [17:38:48] Epoch: 1 Batch: 18866/20099 (93.87%) Loss: 2.088889 LR: 0.00000634 [17:38:50] Epoch: 1 Batch: 18867/20099 (93.87%) Loss: 1.903591 LR: 0.00000634 [17:38:52] Epoch: 1 Batch: 18868/20099 (93.88%) Loss: 1.837787 LR: 0.00000634 [17:38:54] Epoch: 1 Batch: 18869/20099 (93.88%) Loss: 1.899860 LR: 0.00000634 [17:38:56] Epoch: 1 Batch: 18870/20099 (93.89%) Loss: 2.153050 LR: 0.00000634 [17:38:58] Epoch: 1 Batch: 18871/20099 (93.89%) Loss: 2.115349 LR: 0.00000634 [17:38:59] Epoch: 1 Batch: 18872/20099 (93.90%) Loss: 2.281894 LR: 0.00000634 [17:39:01] Epoch: 1 Batch: 18873/20099 (93.90%) Loss: 2.020916 LR: 0.00000634 [17:39:03] Epoch: 1 Batch: 18874/20099 (93.91%) Loss: 2.364813 LR: 0.00000634 [17:39:05] Epoch: 1 Batch: 18875/20099 (93.91%) Loss: 2.139720 LR: 0.00000634 [17:39:07] Epoch: 1 Batch: 18876/20099 (93.92%) Loss: 2.006542 LR: 0.00000634 [17:39:09] Epoch: 1 Batch: 18877/20099 (93.92%) Loss: 2.313743 LR: 0.00000634 [17:39:11] Epoch: 1 Batch: 18878/20099 (93.93%) Loss: 2.136689 LR: 0.00000634 [17:39:12] Epoch: 1 Batch: 18879/20099 (93.93%) Loss: 2.237786 LR: 0.00000634 [17:39:14] Epoch: 1 Batch: 18880/20099 (93.94%) Loss: 2.011243 LR: 0.00000634 [17:39:16] Epoch: 1 Batch: 18881/20099 (93.94%) Loss: 1.684474 LR: 0.00000634 [17:39:18] Epoch: 1 Batch: 18882/20099 (93.94%) Loss: 2.020290 LR: 0.00000634 [17:39:20] Epoch: 1 Batch: 18883/20099 (93.95%) Loss: 2.218421 LR: 0.00000634 [17:39:22] Epoch: 1 Batch: 18884/20099 (93.95%) Loss: 2.297864 LR: 0.00000634 [17:39:24] Epoch: 1 Batch: 18885/20099 (93.96%) Loss: 2.290954 LR: 0.00000634 [17:39:25] Epoch: 1 Batch: 18886/20099 (93.96%) Loss: 2.012318 LR: 0.00000634 [17:39:27] Epoch: 1 Batch: 18887/20099 
(93.97%) Loss: 2.335898 LR: 0.00000633 [17:39:29] Epoch: 1 Batch: 18888/20099 (93.97%) Loss: 2.256416 LR: 0.00000633 [17:39:31] Epoch: 1 Batch: 18889/20099 (93.98%) Loss: 2.014206 LR: 0.00000633 [17:39:33] Epoch: 1 Batch: 18890/20099 (93.98%) Loss: 2.244164 LR: 0.00000633 [17:39:35] Epoch: 1 Batch: 18891/20099 (93.99%) Loss: 2.372243 LR: 0.00000633 [17:39:37] Epoch: 1 Batch: 18892/20099 (93.99%) Loss: 1.874442 LR: 0.00000633 [17:39:38] Epoch: 1 Batch: 18893/20099 (94.00%) Loss: 2.349739 LR: 0.00000633 [17:39:40] Epoch: 1 Batch: 18894/20099 (94.00%) Loss: 2.053530 LR: 0.00000633 [17:39:42] Epoch: 1 Batch: 18895/20099 (94.01%) Loss: 2.091776 LR: 0.00000633 [17:39:44] Epoch: 1 Batch: 18896/20099 (94.01%) Loss: 2.131095 LR: 0.00000633 [17:39:46] Epoch: 1 Batch: 18897/20099 (94.02%) Loss: 1.939365 LR: 0.00000633 [17:39:48] Epoch: 1 Batch: 18898/20099 (94.02%) Loss: 2.188623 LR: 0.00000633 [17:39:50] Epoch: 1 Batch: 18899/20099 (94.03%) Loss: 2.270235 LR: 0.00000633 [17:39:51] Epoch: 1 Batch: 18900/20099 (94.03%) Loss: 2.249335 LR: 0.00000633 [17:39:53] Epoch: 1 Batch: 18901/20099 (94.04%) Loss: 1.662834 LR: 0.00000632 [17:39:55] Epoch: 1 Batch: 18902/20099 (94.04%) Loss: 2.107272 LR: 0.00000632 [17:39:57] Epoch: 1 Batch: 18903/20099 (94.05%) Loss: 2.056877 LR: 0.00000632 [17:39:59] Epoch: 1 Batch: 18904/20099 (94.05%) Loss: 2.058735 LR: 0.00000632 [17:40:01] Epoch: 1 Batch: 18905/20099 (94.06%) Loss: 2.231776 LR: 0.00000632 [17:40:02] Epoch: 1 Batch: 18906/20099 (94.06%) Loss: 2.030151 LR: 0.00000632 [17:40:04] Epoch: 1 Batch: 18907/20099 (94.07%) Loss: 2.122410 LR: 0.00000632 [17:40:06] Epoch: 1 Batch: 18908/20099 (94.07%) Loss: 2.169614 LR: 0.00000632 [17:40:08] Epoch: 1 Batch: 18909/20099 (94.08%) Loss: 2.206417 LR: 0.00000632 [17:40:10] Epoch: 1 Batch: 18910/20099 (94.08%) Loss: 2.175643 LR: 0.00000632 [17:40:12] Epoch: 1 Batch: 18911/20099 (94.09%) Loss: 1.995345 LR: 0.00000632 [17:40:14] Epoch: 1 Batch: 18912/20099 (94.09%) Loss: 2.025333 LR: 0.00000632 [17:40:15] 
Epoch: 1 Batch: 18913/20099 (94.10%) Loss: 1.953658 LR: 0.00000632 [17:40:17] Epoch: 1 Batch: 18914/20099 (94.10%) Loss: 1.967561 LR: 0.00000632 [17:40:19] Epoch: 1 Batch: 18915/20099 (94.11%) Loss: 2.115386 LR: 0.00000632 [17:40:21] Epoch: 1 Batch: 18916/20099 (94.11%) Loss: 2.210395 LR: 0.00000632 [17:40:23] Epoch: 1 Batch: 18917/20099 (94.12%) Loss: 2.008346 LR: 0.00000632 [17:40:25] Epoch: 1 Batch: 18918/20099 (94.12%) Loss: 2.211252 LR: 0.00000632 [17:40:27] Epoch: 1 Batch: 18919/20099 (94.13%) Loss: 2.087225 LR: 0.00000632 [17:40:28] Epoch: 1 Batch: 18920/20099 (94.13%) Loss: 1.850089 LR: 0.00000632 [17:40:30] Epoch: 1 Batch: 18921/20099 (94.14%) Loss: 2.062256 LR: 0.00000632 [17:40:32] Epoch: 1 Batch: 18922/20099 (94.14%) Loss: 1.976392 LR: 0.00000631 [17:40:34] Epoch: 1 Batch: 18923/20099 (94.15%) Loss: 2.228108 LR: 0.00000631 [17:40:36] Epoch: 1 Batch: 18924/20099 (94.15%) Loss: 2.288451 LR: 0.00000631 [17:40:38] Epoch: 1 Batch: 18925/20099 (94.16%) Loss: 2.137476 LR: 0.00000631 [17:40:39] Epoch: 1 Batch: 18926/20099 (94.16%) Loss: 1.900648 LR: 0.00000631 [17:40:41] Epoch: 1 Batch: 18927/20099 (94.17%) Loss: 2.049162 LR: 0.00000631 [17:40:43] Epoch: 1 Batch: 18928/20099 (94.17%) Loss: 1.781119 LR: 0.00000631 [17:40:45] Epoch: 1 Batch: 18929/20099 (94.18%) Loss: 1.956563 LR: 0.00000631 [17:40:47] Epoch: 1 Batch: 18930/20099 (94.18%) Loss: 2.024128 LR: 0.00000631 [17:40:49] Epoch: 1 Batch: 18931/20099 (94.19%) Loss: 1.839046 LR: 0.00000631 [17:40:51] Epoch: 1 Batch: 18932/20099 (94.19%) Loss: 2.398629 LR: 0.00000631 [17:40:52] Epoch: 1 Batch: 18933/20099 (94.20%) Loss: 1.855192 LR: 0.00000631 [17:40:54] Epoch: 1 Batch: 18934/20099 (94.20%) Loss: 1.978988 LR: 0.00000631 [17:40:56] Epoch: 1 Batch: 18935/20099 (94.21%) Loss: 1.873926 LR: 0.00000631 [17:40:58] Epoch: 1 Batch: 18936/20099 (94.21%) Loss: 2.253057 LR: 0.00000631 [17:41:00] Epoch: 1 Batch: 18937/20099 (94.22%) Loss: 2.187789 LR: 0.00000631 [17:41:02] Epoch: 1 Batch: 18938/20099 (94.22%) Loss: 
2.210035 LR: 0.00000631 [17:41:03] Epoch: 1 Batch: 18939/20099 (94.23%) Loss: 2.040642 LR: 0.00000631 [17:41:05] Epoch: 1 Batch: 18940/20099 (94.23%) Loss: 2.014747 LR: 0.00000631 [17:41:07] Epoch: 1 Batch: 18941/20099 (94.24%) Loss: 1.973518 LR: 0.00000631 [17:41:09] Epoch: 1 Batch: 18942/20099 (94.24%) Loss: 2.036056 LR: 0.00000631 [17:41:11] Epoch: 1 Batch: 18943/20099 (94.25%) Loss: 2.435516 LR: 0.00000630 [17:41:13] Epoch: 1 Batch: 18944/20099 (94.25%) Loss: 2.070271 LR: 0.00000630 [17:41:15] Epoch: 1 Batch: 18945/20099 (94.26%) Loss: 2.029995 LR: 0.00000630 [17:41:16] Epoch: 1 Batch: 18946/20099 (94.26%) Loss: 2.152287 LR: 0.00000630 [17:41:18] Epoch: 1 Batch: 18947/20099 (94.27%) Loss: 1.930840 LR: 0.00000630 [17:41:20] Epoch: 1 Batch: 18948/20099 (94.27%) Loss: 2.202269 LR: 0.00000630 [17:41:22] Epoch: 1 Batch: 18949/20099 (94.28%) Loss: 1.884151 LR: 0.00000630 [17:41:24] Epoch: 1 Batch: 18950/20099 (94.28%) Loss: 2.040607 LR: 0.00000630 [17:41:26] Epoch: 1 Batch: 18951/20099 (94.29%) Loss: 2.072490 LR: 0.00000630 [17:41:27] Epoch: 1 Batch: 18952/20099 (94.29%) Loss: 1.976477 LR: 0.00000630 [17:41:29] Epoch: 1 Batch: 18953/20099 (94.30%) Loss: 2.136546 LR: 0.00000630 [17:41:31] Epoch: 1 Batch: 18954/20099 (94.30%) Loss: 1.837832 LR: 0.00000630 [17:41:33] Epoch: 1 Batch: 18955/20099 (94.31%) Loss: 2.195533 LR: 0.00000630 [17:41:35] Epoch: 1 Batch: 18956/20099 (94.31%) Loss: 2.130661 LR: 0.00000630 [17:41:37] Epoch: 1 Batch: 18957/20099 (94.32%) Loss: 2.150196 LR: 0.00000629 [17:41:39] Epoch: 1 Batch: 18958/20099 (94.32%) Loss: 2.515425 LR: 0.00000629 [17:41:40] Epoch: 1 Batch: 18959/20099 (94.33%) Loss: 2.175138 LR: 0.00000629 [17:41:42] Epoch: 1 Batch: 18960/20099 (94.33%) Loss: 2.110434 LR: 0.00000629 [17:41:44] Epoch: 1 Batch: 18961/20099 (94.34%) Loss: 2.090422 LR: 0.00000629 [17:41:46] Epoch: 1 Batch: 18962/20099 (94.34%) Loss: 2.086721 LR: 0.00000629 [17:41:48] Epoch: 1 Batch: 18963/20099 (94.35%) Loss: 2.134450 LR: 0.00000629 [17:41:50] Epoch: 1 
Batch: 18964/20099 (94.35%) Loss: 2.224017 LR: 0.00000629 [17:41:52] Epoch: 1 Batch: 18965/20099 (94.36%) Loss: 1.878811 LR: 0.00000629 [17:41:53] Epoch: 1 Batch: 18966/20099 (94.36%) Loss: 2.216249 LR: 0.00000629 [17:41:55] Epoch: 1 Batch: 18967/20099 (94.37%) Loss: 2.389329 LR: 0.00000629 [17:41:57] Epoch: 1 Batch: 18968/20099 (94.37%) Loss: 1.848604 LR: 0.00000629 [17:41:59] Epoch: 1 Batch: 18969/20099 (94.38%) Loss: 2.376028 LR: 0.00000629 [17:42:01] Epoch: 1 Batch: 18970/20099 (94.38%) Loss: 1.992151 LR: 0.00000629 [17:42:03] Epoch: 1 Batch: 18971/20099 (94.39%) Loss: 2.107484 LR: 0.00000629 [17:42:04] Epoch: 1 Batch: 18972/20099 (94.39%) Loss: 2.242568 LR: 0.00000629 [17:42:06] Epoch: 1 Batch: 18973/20099 (94.40%) Loss: 1.898398 LR: 0.00000629 [17:42:08] Epoch: 1 Batch: 18974/20099 (94.40%) Loss: 1.725408 LR: 0.00000629 [17:42:10] Epoch: 1 Batch: 18975/20099 (94.41%) Loss: 1.672631 LR: 0.00000629 [17:42:12] Epoch: 1 Batch: 18976/20099 (94.41%) Loss: 1.989157 LR: 0.00000629 [17:42:14] Epoch: 1 Batch: 18977/20099 (94.42%) Loss: 2.038745 LR: 0.00000629 [17:42:16] Epoch: 1 Batch: 18978/20099 (94.42%) Loss: 2.059676 LR: 0.00000628 [17:42:17] Epoch: 1 Batch: 18979/20099 (94.43%) Loss: 2.221294 LR: 0.00000628 [17:42:19] Epoch: 1 Batch: 18980/20099 (94.43%) Loss: 2.347026 LR: 0.00000628 [17:42:21] Epoch: 1 Batch: 18981/20099 (94.44%) Loss: 2.304212 LR: 0.00000628 [17:42:23] Epoch: 1 Batch: 18982/20099 (94.44%) Loss: 2.377649 LR: 0.00000628 [17:42:25] Epoch: 1 Batch: 18983/20099 (94.45%) Loss: 1.896579 LR: 0.00000628 [17:42:27] Epoch: 1 Batch: 18984/20099 (94.45%) Loss: 1.970669 LR: 0.00000628 [17:42:29] Epoch: 1 Batch: 18985/20099 (94.46%) Loss: 2.072384 LR: 0.00000628 [17:42:30] Epoch: 1 Batch: 18986/20099 (94.46%) Loss: 2.213712 LR: 0.00000628 [17:42:32] Epoch: 1 Batch: 18987/20099 (94.47%) Loss: 1.848153 LR: 0.00000628 [17:42:34] Epoch: 1 Batch: 18988/20099 (94.47%) Loss: 2.251995 LR: 0.00000628 [17:42:36] Epoch: 1 Batch: 18989/20099 (94.48%) Loss: 2.018346 LR: 
0.00000628 [17:42:38] Epoch: 1 Batch: 18990/20099 (94.48%) Loss: 2.179210 LR: 0.00000628 [17:42:40] Epoch: 1 Batch: 18991/20099 (94.49%) Loss: 2.129859 LR: 0.00000628 [17:42:42] Epoch: 1 Batch: 18992/20099 (94.49%) Loss: 1.910518 LR: 0.00000628 [17:42:43] Epoch: 1 Batch: 18993/20099 (94.50%) Loss: 2.022862 LR: 0.00000628 [17:42:45] Epoch: 1 Batch: 18994/20099 (94.50%) Loss: 2.017751 LR: 0.00000628 [17:42:47] Epoch: 1 Batch: 18995/20099 (94.51%) Loss: 2.208992 LR: 0.00000628 [17:42:49] Epoch: 1 Batch: 18996/20099 (94.51%) Loss: 2.124212 LR: 0.00000628 [17:42:51] Epoch: 1 Batch: 18997/20099 (94.52%) Loss: 2.006035 LR: 0.00000628 [17:42:53] Epoch: 1 Batch: 18998/20099 (94.52%) Loss: 2.086472 LR: 0.00000628 [17:42:55] Epoch: 1 Batch: 18999/20099 (94.53%) Loss: 2.119793 LR: 0.00000627 [17:42:56] >> Evaluating batch 0 [17:42:58] >> Evaluating batch 1 [17:42:59] >> Evaluating batch 2 [17:43:00] >> Evaluating batch 3 [17:43:01] >> Evaluating batch 4 [17:43:02] >> Evaluating batch 5 [17:43:03] >> Evaluating batch 6 [17:43:04] >> Evaluating batch 7 [17:43:05] >> Evaluating batch 8 [17:43:06] >> Evaluating batch 9 [17:43:07] >> Evaluating batch 10 [17:43:08] >> Evaluating batch 11 [17:43:09] >> Evaluating batch 12 [17:43:10] >> Evaluating batch 13 [17:43:11] >> Evaluating batch 14 [17:43:12] >> Evaluating batch 15 [17:43:13] >> Evaluating batch 16 [17:43:14] Epoch: 1 Step: 19000/20099 Evaluation: [17:43:14] Avg Loss Since Last Eval: 2.0872 Val Loss: 2.1460 Validation loss delta: 0.0001 Perplexity: 8.5502 LR: 0.00000627 [17:43:17] >> Cleaned up old temp checkpoint: epoch1_step17000 [17:43:17] >> Temp checkpoint saved: epoch1_step19000, size: 0.1693 GB [17:43:21] >> Checkpoint saved: epoch1_step19000, size: 0.1693 GB [17:43:21] Epoch: 1 Batch: 19000/20099 (94.53%) Loss: 2.124066 LR: 0.00000627 [17:43:23] Epoch: 1 Batch: 19001/20099 (94.54%) Loss: 1.977629 LR: 0.00000627 [17:43:25] Epoch: 1 Batch: 19002/20099 (94.54%) Loss: 2.163679 LR: 0.00000627 [17:43:27] Epoch: 1 Batch: 
19003/20099 (94.55%) Loss: 2.211445 LR: 0.00000627 [17:43:29] Epoch: 1 Batch: 19004/20099 (94.55%) Loss: 2.461773 LR: 0.00000627 [17:43:30] Epoch: 1 Batch: 19005/20099 (94.56%) Loss: 2.194065 LR: 0.00000627 [17:43:32] Epoch: 1 Batch: 19006/20099 (94.56%) Loss: 2.215211 LR: 0.00000627 [17:43:34] Epoch: 1 Batch: 19007/20099 (94.57%) Loss: 2.036795 LR: 0.00000627 [17:43:36] Epoch: 1 Batch: 19008/20099 (94.57%) Loss: 2.430990 LR: 0.00000627 [17:43:38] Epoch: 1 Batch: 19009/20099 (94.58%) Loss: 2.280551 LR: 0.00000627 [17:43:40] Epoch: 1 Batch: 19010/20099 (94.58%) Loss: 1.902504 LR: 0.00000627 [17:43:42] Epoch: 1 Batch: 19011/20099 (94.59%) Loss: 2.033902 LR: 0.00000627 [17:43:44] Epoch: 1 Batch: 19012/20099 (94.59%) Loss: 2.101706 LR: 0.00000627 [17:43:46] Epoch: 1 Batch: 19013/20099 (94.60%) Loss: 2.397848 LR: 0.00000627 [17:43:47] Epoch: 1 Batch: 19014/20099 (94.60%) Loss: 2.104726 LR: 0.00000627 [17:43:49] Epoch: 1 Batch: 19015/20099 (94.61%) Loss: 2.174184 LR: 0.00000627 [17:43:51] Epoch: 1 Batch: 19016/20099 (94.61%) Loss: 1.822861 LR: 0.00000627 [17:43:53] Epoch: 1 Batch: 19017/20099 (94.62%) Loss: 1.988816 LR: 0.00000627 [17:43:55] Epoch: 1 Batch: 19018/20099 (94.62%) Loss: 2.130287 LR: 0.00000627 [17:43:57] Epoch: 1 Batch: 19019/20099 (94.63%) Loss: 2.064925 LR: 0.00000627 [17:43:59] Epoch: 1 Batch: 19020/20099 (94.63%) Loss: 1.901406 LR: 0.00000626 [17:44:01] Epoch: 1 Batch: 19021/20099 (94.64%) Loss: 1.936075 LR: 0.00000626 [17:44:03] Epoch: 1 Batch: 19022/20099 (94.64%) Loss: 1.969674 LR: 0.00000626 [17:44:04] Epoch: 1 Batch: 19023/20099 (94.65%) Loss: 2.344706 LR: 0.00000626 [17:44:06] Epoch: 1 Batch: 19024/20099 (94.65%) Loss: 1.944178 LR: 0.00000626 [17:44:08] Epoch: 1 Batch: 19025/20099 (94.66%) Loss: 2.168022 LR: 0.00000626 [17:44:10] Epoch: 1 Batch: 19026/20099 (94.66%) Loss: 1.888553 LR: 0.00000626 [17:44:12] Epoch: 1 Batch: 19027/20099 (94.67%) Loss: 2.058875 LR: 0.00000626 [17:44:14] Epoch: 1 Batch: 19028/20099 (94.67%) Loss: 1.962032 LR: 
0.00000626 [17:44:16] Epoch: 1 Batch: 19029/20099 (94.68%) Loss: 1.925310 LR: 0.00000626 [17:44:17] Epoch: 1 Batch: 19030/20099 (94.68%) Loss: 2.049188 LR: 0.00000626 [17:44:19] Epoch: 1 Batch: 19031/20099 (94.69%) Loss: 1.792070 LR: 0.00000626 [17:44:21] Epoch: 1 Batch: 19032/20099 (94.69%) Loss: 1.949693 LR: 0.00000626 [17:44:23] Epoch: 1 Batch: 19033/20099 (94.70%) Loss: 2.184498 LR: 0.00000626 [17:44:25] Epoch: 1 Batch: 19034/20099 (94.70%) Loss: 2.177499 LR: 0.00000626 [17:44:27] Epoch: 1 Batch: 19035/20099 (94.71%) Loss: 1.747552 LR: 0.00000626 [17:44:28] Epoch: 1 Batch: 19036/20099 (94.71%) Loss: 2.124082 LR: 0.00000626 [17:44:30] Epoch: 1 Batch: 19037/20099 (94.72%) Loss: 2.147720 LR: 0.00000626 [17:44:32] Epoch: 1 Batch: 19038/20099 (94.72%) Loss: 1.956648 LR: 0.00000626 [17:44:34] Epoch: 1 Batch: 19039/20099 (94.73%) Loss: 2.151826 LR: 0.00000626 [17:44:36] Epoch: 1 Batch: 19040/20099 (94.73%) Loss: 2.102213 LR: 0.00000626 [17:44:38] Epoch: 1 Batch: 19041/20099 (94.74%) Loss: 1.980709 LR: 0.00000625 [17:44:40] Epoch: 1 Batch: 19042/20099 (94.74%) Loss: 2.130724 LR: 0.00000625 [17:44:41] Epoch: 1 Batch: 19043/20099 (94.75%) Loss: 1.990044 LR: 0.00000625 [17:44:43] Epoch: 1 Batch: 19044/20099 (94.75%) Loss: 2.223107 LR: 0.00000625 [17:44:45] Epoch: 1 Batch: 19045/20099 (94.76%) Loss: 1.653735 LR: 0.00000625 [17:44:47] Epoch: 1 Batch: 19046/20099 (94.76%) Loss: 2.267890 LR: 0.00000625 [17:44:49] Epoch: 1 Batch: 19047/20099 (94.77%) Loss: 1.986172 LR: 0.00000625 [17:44:51] Epoch: 1 Batch: 19048/20099 (94.77%) Loss: 1.642796 LR: 0.00000625 [17:44:52] Epoch: 1 Batch: 19049/20099 (94.78%) Loss: 2.098203 LR: 0.00000625 [17:44:54] Epoch: 1 Batch: 19050/20099 (94.78%) Loss: 2.115925 LR: 0.00000625 [17:44:56] Epoch: 1 Batch: 19051/20099 (94.79%) Loss: 2.317911 LR: 0.00000625 [17:44:58] Epoch: 1 Batch: 19052/20099 (94.79%) Loss: 1.670208 LR: 0.00000625 [17:45:00] Epoch: 1 Batch: 19053/20099 (94.80%) Loss: 2.267307 LR: 0.00000625 [17:45:02] Epoch: 1 Batch: 19054/20099 
(94.80%) Loss: 2.235002 LR: 0.00000625 [17:45:04] Epoch: 1 Batch: 19055/20099 (94.81%) Loss: 1.947238 LR: 0.00000625 [17:45:05] Epoch: 1 Batch: 19056/20099 (94.81%) Loss: 2.132453 LR: 0.00000625 [17:45:07] Epoch: 1 Batch: 19057/20099 (94.82%) Loss: 2.061967 LR: 0.00000625 [17:45:09] Epoch: 1 Batch: 19058/20099 (94.82%) Loss: 2.113499 LR: 0.00000625 [17:45:11] Epoch: 1 Batch: 19059/20099 (94.83%) Loss: 1.874085 LR: 0.00000625 [17:45:13] Epoch: 1 Batch: 19060/20099 (94.83%) Loss: 2.107150 LR: 0.00000625 [17:45:15] Epoch: 1 Batch: 19061/20099 (94.84%) Loss: 2.134414 LR: 0.00000625 [17:45:17] Epoch: 1 Batch: 19062/20099 (94.84%) Loss: 2.422300 LR: 0.00000624 [17:45:18] Epoch: 1 Batch: 19063/20099 (94.85%) Loss: 1.945963 LR: 0.00000624 [17:45:20] Epoch: 1 Batch: 19064/20099 (94.85%) Loss: 2.076646 LR: 0.00000624 [17:45:22] Epoch: 1 Batch: 19065/20099 (94.86%) Loss: 1.970679 LR: 0.00000624 [17:45:24] Epoch: 1 Batch: 19066/20099 (94.86%) Loss: 1.979329 LR: 0.00000624 [17:45:26] Epoch: 1 Batch: 19067/20099 (94.87%) Loss: 2.095322 LR: 0.00000624 [17:45:28] Epoch: 1 Batch: 19068/20099 (94.87%) Loss: 1.995820 LR: 0.00000624 [17:45:30] Epoch: 1 Batch: 19069/20099 (94.88%) Loss: 2.166484 LR: 0.00000624 [17:45:31] Epoch: 1 Batch: 19070/20099 (94.88%) Loss: 1.972367 LR: 0.00000624 [17:45:33] Epoch: 1 Batch: 19071/20099 (94.89%) Loss: 1.820091 LR: 0.00000624 [17:45:35] Epoch: 1 Batch: 19072/20099 (94.89%) Loss: 1.982157 LR: 0.00000624 [17:45:37] Epoch: 1 Batch: 19073/20099 (94.90%) Loss: 2.092489 LR: 0.00000624 [17:45:39] Epoch: 1 Batch: 19074/20099 (94.90%) Loss: 1.747104 LR: 0.00000624 [17:45:41] Epoch: 1 Batch: 19075/20099 (94.91%) Loss: 2.365318 LR: 0.00000624 [17:45:43] Epoch: 1 Batch: 19076/20099 (94.91%) Loss: 1.736719 LR: 0.00000624 [17:45:45] Epoch: 1 Batch: 19077/20099 (94.92%) Loss: 2.252074 LR: 0.00000624 [17:45:46] Epoch: 1 Batch: 19078/20099 (94.92%) Loss: 2.164671 LR: 0.00000624 [17:45:48] Epoch: 1 Batch: 19079/20099 (94.93%) Loss: 2.262967 LR: 0.00000624 [17:45:50] 
Epoch: 1 Batch: 19080/20099 (94.93%) Loss: 2.020811 LR: 0.00000624 [17:45:52] Epoch: 1 Batch: 19081/20099 (94.94%) Loss: 2.084850 LR: 0.00000624 [17:45:54] Epoch: 1 Batch: 19082/20099 (94.94%) Loss: 2.289929 LR: 0.00000624 [17:45:56] Epoch: 1 Batch: 19083/20099 (94.95%) Loss: 1.871007 LR: 0.00000623 [17:45:58] Epoch: 1 Batch: 19084/20099 (94.95%) Loss: 2.352017 LR: 0.00000623 [17:45:59] Epoch: 1 Batch: 19085/20099 (94.95%) Loss: 2.158286 LR: 0.00000623 [17:46:01] Epoch: 1 Batch: 19086/20099 (94.96%) Loss: 2.090886 LR: 0.00000623 [17:46:03] Epoch: 1 Batch: 19087/20099 (94.96%) Loss: 2.222639 LR: 0.00000623 [17:46:05] Epoch: 1 Batch: 19088/20099 (94.97%) Loss: 2.144529 LR: 0.00000623 [17:46:07] Epoch: 1 Batch: 19089/20099 (94.97%) Loss: 2.232392 LR: 0.00000623 [17:46:09] Epoch: 1 Batch: 19090/20099 (94.98%) Loss: 2.151242 LR: 0.00000623 [17:46:11] Epoch: 1 Batch: 19091/20099 (94.98%) Loss: 2.151306 LR: 0.00000623 [17:46:12] Epoch: 1 Batch: 19092/20099 (94.99%) Loss: 2.221907 LR: 0.00000623 [17:46:14] Epoch: 1 Batch: 19093/20099 (94.99%) Loss: 1.769434 LR: 0.00000623 [17:46:16] Epoch: 1 Batch: 19094/20099 (95.00%) Loss: 1.890808 LR: 0.00000623 [17:46:18] Epoch: 1 Batch: 19095/20099 (95.00%) Loss: 2.033147 LR: 0.00000623 [17:46:20] Epoch: 1 Batch: 19096/20099 (95.01%) Loss: 2.116609 LR: 0.00000623 [17:46:22] Epoch: 1 Batch: 19097/20099 (95.01%) Loss: 2.210115 LR: 0.00000623 [17:46:24] Epoch: 1 Batch: 19098/20099 (95.02%) Loss: 2.256449 LR: 0.00000623 [17:46:25] Epoch: 1 Batch: 19099/20099 (95.02%) Loss: 2.313223 LR: 0.00000623 [17:46:27] Epoch: 1 Batch: 19100/20099 (95.03%) Loss: 1.688814 LR: 0.00000623 [17:46:29] Epoch: 1 Batch: 19101/20099 (95.03%) Loss: 2.271181 LR: 0.00000623 [17:46:31] Epoch: 1 Batch: 19102/20099 (95.04%) Loss: 1.967984 LR: 0.00000623 [17:46:33] Epoch: 1 Batch: 19103/20099 (95.04%) Loss: 1.996438 LR: 0.00000623 [17:46:35] Epoch: 1 Batch: 19104/20099 (95.05%) Loss: 1.968906 LR: 0.00000622 [17:46:37] Epoch: 1 Batch: 19105/20099 (95.05%) Loss: 
2.072537 LR: 0.00000622 [17:46:38] Epoch: 1 Batch: 19106/20099 (95.06%) Loss: 2.073584 LR: 0.00000622 [17:46:40] Epoch: 1 Batch: 19107/20099 (95.06%) Loss: 1.953782 LR: 0.00000622 [17:46:42] Epoch: 1 Batch: 19108/20099 (95.07%) Loss: 2.123389 LR: 0.00000622 [17:46:44] Epoch: 1 Batch: 19109/20099 (95.07%) Loss: 1.984866 LR: 0.00000622 [17:46:46] Epoch: 1 Batch: 19110/20099 (95.08%) Loss: 2.157506 LR: 0.00000622 [17:46:48] Epoch: 1 Batch: 19111/20099 (95.08%) Loss: 2.095464 LR: 0.00000622 [17:46:50] Epoch: 1 Batch: 19112/20099 (95.09%) Loss: 1.908317 LR: 0.00000622 [17:46:51] Epoch: 1 Batch: 19113/20099 (95.09%) Loss: 2.032139 LR: 0.00000622 [17:46:53] Epoch: 1 Batch: 19114/20099 (95.10%) Loss: 2.219545 LR: 0.00000622 [17:46:55] Epoch: 1 Batch: 19115/20099 (95.10%) Loss: 2.160140 LR: 0.00000622 [17:46:57] Epoch: 1 Batch: 19116/20099 (95.11%) Loss: 2.153012 LR: 0.00000622 [17:46:59] Epoch: 1 Batch: 19117/20099 (95.11%) Loss: 2.140130 LR: 0.00000622 [17:47:01] Epoch: 1 Batch: 19118/20099 (95.12%) Loss: 2.097713 LR: 0.00000622 [17:47:03] Epoch: 1 Batch: 19119/20099 (95.12%) Loss: 2.416894 LR: 0.00000622 [17:47:04] Epoch: 1 Batch: 19120/20099 (95.13%) Loss: 1.848419 LR: 0.00000622 [17:47:06] Epoch: 1 Batch: 19121/20099 (95.13%) Loss: 2.056425 LR: 0.00000622 [17:47:08] Epoch: 1 Batch: 19122/20099 (95.14%) Loss: 2.144402 LR: 0.00000622 [17:47:10] Epoch: 1 Batch: 19123/20099 (95.14%) Loss: 1.922931 LR: 0.00000622 [17:47:12] Epoch: 1 Batch: 19124/20099 (95.15%) Loss: 2.279506 LR: 0.00000622 [17:47:14] Epoch: 1 Batch: 19125/20099 (95.15%) Loss: 2.003331 LR: 0.00000621 [17:47:16] Epoch: 1 Batch: 19126/20099 (95.16%) Loss: 1.974100 LR: 0.00000621 [17:47:17] Epoch: 1 Batch: 19127/20099 (95.16%) Loss: 1.791069 LR: 0.00000621 [17:47:19] Epoch: 1 Batch: 19128/20099 (95.17%) Loss: 2.304812 LR: 0.00000621 [17:47:21] Epoch: 1 Batch: 19129/20099 (95.17%) Loss: 2.248230 LR: 0.00000621 [17:47:23] Epoch: 1 Batch: 19130/20099 (95.18%) Loss: 2.053443 LR: 0.00000621 [17:47:25] Epoch: 1 
Batch: 19131/20099 (95.18%) Loss: 2.002259 LR: 0.00000621 [17:47:27] Epoch: 1 Batch: 19132/20099 (95.19%) Loss: 1.951243 LR: 0.00000621 [17:47:29] Epoch: 1 Batch: 19133/20099 (95.19%) Loss: 2.085312 LR: 0.00000621 [17:47:30] Epoch: 1 Batch: 19134/20099 (95.20%) Loss: 2.450269 LR: 0.00000621 [17:47:32] Epoch: 1 Batch: 19135/20099 (95.20%) Loss: 2.315492 LR: 0.00000621 [17:47:34] Epoch: 1 Batch: 19136/20099 (95.21%) Loss: 1.946923 LR: 0.00000621 [17:47:36] Epoch: 1 Batch: 19137/20099 (95.21%) Loss: 2.330607 LR: 0.00000621 [17:47:38] Epoch: 1 Batch: 19138/20099 (95.22%) Loss: 2.296384 LR: 0.00000621 [17:47:40] Epoch: 1 Batch: 19139/20099 (95.22%) Loss: 2.013048 LR: 0.00000621 [17:47:42] Epoch: 1 Batch: 19140/20099 (95.23%) Loss: 2.147447 LR: 0.00000621 [17:47:43] Epoch: 1 Batch: 19141/20099 (95.23%) Loss: 1.794285 LR: 0.00000621 [17:47:45] Epoch: 1 Batch: 19142/20099 (95.24%) Loss: 2.182471 LR: 0.00000621 [17:47:47] Epoch: 1 Batch: 19143/20099 (95.24%) Loss: 1.964452 LR: 0.00000621 [17:47:49] Epoch: 1 Batch: 19144/20099 (95.25%) Loss: 1.981396 LR: 0.00000621 [17:47:51] Epoch: 1 Batch: 19145/20099 (95.25%) Loss: 1.793979 LR: 0.00000621 [17:47:53] Epoch: 1 Batch: 19146/20099 (95.26%) Loss: 1.831638 LR: 0.00000621 [17:47:54] Epoch: 1 Batch: 19147/20099 (95.26%) Loss: 2.008935 LR: 0.00000621 [17:47:56] Epoch: 1 Batch: 19148/20099 (95.27%) Loss: 1.768438 LR: 0.00000621 [17:47:58] Epoch: 1 Batch: 19149/20099 (95.27%) Loss: 2.036927 LR: 0.00000621 [17:48:00] Epoch: 1 Batch: 19150/20099 (95.28%) Loss: 2.165780 LR: 0.00000621 [17:48:02] Epoch: 1 Batch: 19151/20099 (95.28%) Loss: 1.613072 LR: 0.00000621 [17:48:04] Epoch: 1 Batch: 19152/20099 (95.29%) Loss: 1.954875 LR: 0.00000621 [17:48:06] Epoch: 1 Batch: 19153/20099 (95.29%) Loss: 1.835941 LR: 0.00000620 [17:48:07] Epoch: 1 Batch: 19154/20099 (95.30%) Loss: 2.073872 LR: 0.00000620 [17:48:09] Epoch: 1 Batch: 19155/20099 (95.30%) Loss: 2.034414 LR: 0.00000620 [17:48:11] Epoch: 1 Batch: 19156/20099 (95.31%) Loss: 1.940453 LR: 
0.00000620 [17:48:13] Epoch: 1 Batch: 19157/20099 (95.31%) Loss: 1.988105 LR: 0.00000620 [17:48:15] Epoch: 1 Batch: 19158/20099 (95.32%) Loss: 2.059215 LR: 0.00000620 [17:48:17] Epoch: 1 Batch: 19159/20099 (95.32%) Loss: 2.075120 LR: 0.00000620 [17:48:19] Epoch: 1 Batch: 19160/20099 (95.33%) Loss: 1.976802 LR: 0.00000620 [17:48:20] Epoch: 1 Batch: 19161/20099 (95.33%) Loss: 1.928421 LR: 0.00000620 [17:48:22] Epoch: 1 Batch: 19162/20099 (95.34%) Loss: 2.027040 LR: 0.00000620 [17:48:24] Epoch: 1 Batch: 19163/20099 (95.34%) Loss: 1.999191 LR: 0.00000620 [17:48:26] Epoch: 1 Batch: 19164/20099 (95.35%) Loss: 2.044357 LR: 0.00000620 [17:48:28] Epoch: 1 Batch: 19165/20099 (95.35%) Loss: 2.076180 LR: 0.00000620 [17:48:30] Epoch: 1 Batch: 19166/20099 (95.36%) Loss: 2.222180 LR: 0.00000620 [17:48:31] Epoch: 1 Batch: 19167/20099 (95.36%) Loss: 1.950299 LR: 0.00000620 [17:48:33] Epoch: 1 Batch: 19168/20099 (95.37%) Loss: 2.166123 LR: 0.00000620 [17:48:35] Epoch: 1 Batch: 19169/20099 (95.37%) Loss: 2.267328 LR: 0.00000620 [17:48:37] Epoch: 1 Batch: 19170/20099 (95.38%) Loss: 1.901120 LR: 0.00000620 [17:48:39] Epoch: 1 Batch: 19171/20099 (95.38%) Loss: 2.203068 LR: 0.00000620 [17:48:41] Epoch: 1 Batch: 19172/20099 (95.39%) Loss: 1.718506 LR: 0.00000620 [17:48:43] Epoch: 1 Batch: 19173/20099 (95.39%) Loss: 2.164622 LR: 0.00000620 [17:48:44] Epoch: 1 Batch: 19174/20099 (95.40%) Loss: 2.154397 LR: 0.00000619 [17:48:46] Epoch: 1 Batch: 19175/20099 (95.40%) Loss: 2.189280 LR: 0.00000619 [17:48:48] Epoch: 1 Batch: 19176/20099 (95.41%) Loss: 1.990188 LR: 0.00000619 [17:48:50] Epoch: 1 Batch: 19177/20099 (95.41%) Loss: 2.105336 LR: 0.00000619 [17:48:52] Epoch: 1 Batch: 19178/20099 (95.42%) Loss: 2.048081 LR: 0.00000619 [17:48:54] Epoch: 1 Batch: 19179/20099 (95.42%) Loss: 2.012219 LR: 0.00000619 [17:48:56] Epoch: 1 Batch: 19180/20099 (95.43%) Loss: 2.290995 LR: 0.00000619 [17:48:57] Epoch: 1 Batch: 19181/20099 (95.43%) Loss: 2.003184 LR: 0.00000619 [17:48:59] Epoch: 1 Batch: 19182/20099 
(95.44%) Loss: 2.192874 LR: 0.00000619 [17:49:01] Epoch: 1 Batch: 19183/20099 (95.44%) Loss: 2.045545 LR: 0.00000619 [17:49:03] Epoch: 1 Batch: 19184/20099 (95.45%) Loss: 1.915342 LR: 0.00000619 [17:49:05] Epoch: 1 Batch: 19185/20099 (95.45%) Loss: 1.938540 LR: 0.00000619 [17:49:07] Epoch: 1 Batch: 19186/20099 (95.46%) Loss: 2.322129 LR: 0.00000619 [17:49:09] Epoch: 1 Batch: 19187/20099 (95.46%) Loss: 2.157093 LR: 0.00000619 [17:49:10] Epoch: 1 Batch: 19188/20099 (95.47%) Loss: 2.030738 LR: 0.00000619 [17:49:12] Epoch: 1 Batch: 19189/20099 (95.47%) Loss: 1.976158 LR: 0.00000619 [17:49:14] Epoch: 1 Batch: 19190/20099 (95.48%) Loss: 2.063291 LR: 0.00000619 [17:49:16] Epoch: 1 Batch: 19191/20099 (95.48%) Loss: 1.848789 LR: 0.00000619 [17:49:18] Epoch: 1 Batch: 19192/20099 (95.49%) Loss: 1.696146 LR: 0.00000619 [17:49:20] Epoch: 1 Batch: 19193/20099 (95.49%) Loss: 2.008295 LR: 0.00000619 [17:49:22] Epoch: 1 Batch: 19194/20099 (95.50%) Loss: 2.154993 LR: 0.00000619 [17:49:23] Epoch: 1 Batch: 19195/20099 (95.50%) Loss: 2.204476 LR: 0.00000619 [17:49:25] Epoch: 1 Batch: 19196/20099 (95.51%) Loss: 2.223096 LR: 0.00000619 [17:49:27] Epoch: 1 Batch: 19197/20099 (95.51%) Loss: 2.149396 LR: 0.00000619 [17:49:29] Epoch: 1 Batch: 19198/20099 (95.52%) Loss: 2.151521 LR: 0.00000619 [17:49:31] Epoch: 1 Batch: 19199/20099 (95.52%) Loss: 1.866635 LR: 0.00000619 [17:49:36] >> Cleaned up old temp checkpoint: epoch1_step17200 [17:49:36] >> Temp checkpoint saved: epoch1_step19200, size: 0.1693 GB [17:49:36] Epoch: 1 Batch: 19200/20099 (95.53%) Loss: 2.144485 LR: 0.00000619 [17:49:38] Epoch: 1 Batch: 19201/20099 (95.53%) Loss: 2.156549 LR: 0.00000619 [17:49:40] Epoch: 1 Batch: 19202/20099 (95.54%) Loss: 2.143527 LR: 0.00000618 [17:49:42] Epoch: 1 Batch: 19203/20099 (95.54%) Loss: 2.074331 LR: 0.00000618 [17:49:44] Epoch: 1 Batch: 19204/20099 (95.55%) Loss: 1.984075 LR: 0.00000618 [17:49:45] Epoch: 1 Batch: 19205/20099 (95.55%) Loss: 1.979135 LR: 0.00000618 [17:49:47] Epoch: 1 Batch: 
19206/20099 (95.56%) Loss: 2.152142 LR: 0.00000618 [17:49:49] Epoch: 1 Batch: 19207/20099 (95.56%) Loss: 2.554687 LR: 0.00000618 [17:49:51] Epoch: 1 Batch: 19208/20099 (95.57%) Loss: 2.116932 LR: 0.00000618 [17:49:53] Epoch: 1 Batch: 19209/20099 (95.57%) Loss: 2.299011 LR: 0.00000618 [17:49:55] Epoch: 1 Batch: 19210/20099 (95.58%) Loss: 2.098840 LR: 0.00000618 [17:49:57] Epoch: 1 Batch: 19211/20099 (95.58%) Loss: 2.005692 LR: 0.00000618 [17:49:59] Epoch: 1 Batch: 19212/20099 (95.59%) Loss: 1.906707 LR: 0.00000618 [17:50:00] Epoch: 1 Batch: 19213/20099 (95.59%) Loss: 2.253526 LR: 0.00000618 [17:50:02] Epoch: 1 Batch: 19214/20099 (95.60%) Loss: 2.148888 LR: 0.00000618 [17:50:04] Epoch: 1 Batch: 19215/20099 (95.60%) Loss: 2.282751 LR: 0.00000618 [17:50:06] Epoch: 1 Batch: 19216/20099 (95.61%) Loss: 2.112956 LR: 0.00000618 [17:50:08] Epoch: 1 Batch: 19217/20099 (95.61%) Loss: 2.117719 LR: 0.00000618 [17:50:10] Epoch: 1 Batch: 19218/20099 (95.62%) Loss: 2.298433 LR: 0.00000618 [17:50:12] Epoch: 1 Batch: 19219/20099 (95.62%) Loss: 2.285566 LR: 0.00000618 [17:50:14] Epoch: 1 Batch: 19220/20099 (95.63%) Loss: 2.140619 LR: 0.00000618 [17:50:15] Epoch: 1 Batch: 19221/20099 (95.63%) Loss: 2.106320 LR: 0.00000618 [17:50:17] Epoch: 1 Batch: 19222/20099 (95.64%) Loss: 2.344803 LR: 0.00000618 [17:50:19] Epoch: 1 Batch: 19223/20099 (95.64%) Loss: 1.940233 LR: 0.00000617 [17:50:21] Epoch: 1 Batch: 19224/20099 (95.65%) Loss: 1.980187 LR: 0.00000617 [17:50:23] Epoch: 1 Batch: 19225/20099 (95.65%) Loss: 2.072789 LR: 0.00000617 [17:50:25] Epoch: 1 Batch: 19226/20099 (95.66%) Loss: 2.085811 LR: 0.00000617 [17:50:27] Epoch: 1 Batch: 19227/20099 (95.66%) Loss: 2.177317 LR: 0.00000617 [17:50:28] Epoch: 1 Batch: 19228/20099 (95.67%) Loss: 2.180286 LR: 0.00000617 [17:50:30] Epoch: 1 Batch: 19229/20099 (95.67%) Loss: 1.923912 LR: 0.00000617 [17:50:32] Epoch: 1 Batch: 19230/20099 (95.68%) Loss: 2.132566 LR: 0.00000617 [17:50:34] Epoch: 1 Batch: 19231/20099 (95.68%) Loss: 2.087453 LR: 
0.00000617 [17:50:36] Epoch: 1 Batch: 19232/20099 (95.69%) Loss: 1.976746 LR: 0.00000617 [17:50:38] Epoch: 1 Batch: 19233/20099 (95.69%) Loss: 1.997505 LR: 0.00000617 [17:50:40] Epoch: 1 Batch: 19234/20099 (95.70%) Loss: 2.364505 LR: 0.00000617 [17:50:41] Epoch: 1 Batch: 19235/20099 (95.70%) Loss: 2.005272 LR: 0.00000617 [17:50:43] Epoch: 1 Batch: 19236/20099 (95.71%) Loss: 2.084250 LR: 0.00000617 [17:50:45] Epoch: 1 Batch: 19237/20099 (95.71%) Loss: 2.101813 LR: 0.00000617 [17:50:47] Epoch: 1 Batch: 19238/20099 (95.72%) Loss: 1.898337 LR: 0.00000617 [17:50:49] Epoch: 1 Batch: 19239/20099 (95.72%) Loss: 1.864245 LR: 0.00000617 [17:50:51] Epoch: 1 Batch: 19240/20099 (95.73%) Loss: 1.887712 LR: 0.00000617 [17:50:52] Epoch: 1 Batch: 19241/20099 (95.73%) Loss: 2.212617 LR: 0.00000617 [17:50:54] Epoch: 1 Batch: 19242/20099 (95.74%) Loss: 2.278148 LR: 0.00000617 [17:50:56] Epoch: 1 Batch: 19243/20099 (95.74%) Loss: 2.362340 LR: 0.00000617 [17:50:58] Epoch: 1 Batch: 19244/20099 (95.75%) Loss: 2.214197 LR: 0.00000617 [17:51:00] Epoch: 1 Batch: 19245/20099 (95.75%) Loss: 1.943503 LR: 0.00000617 [17:51:02] Epoch: 1 Batch: 19246/20099 (95.76%) Loss: 2.040317 LR: 0.00000617 [17:51:04] Epoch: 1 Batch: 19247/20099 (95.76%) Loss: 2.189617 LR: 0.00000617 [17:51:05] Epoch: 1 Batch: 19248/20099 (95.77%) Loss: 2.145845 LR: 0.00000617 [17:51:07] Epoch: 1 Batch: 19249/20099 (95.77%) Loss: 2.211088 LR: 0.00000617 [17:51:09] Epoch: 1 Batch: 19250/20099 (95.78%) Loss: 2.423914 LR: 0.00000617 [17:51:11] Epoch: 1 Batch: 19251/20099 (95.78%) Loss: 2.307972 LR: 0.00000616 [17:51:13] Epoch: 1 Batch: 19252/20099 (95.79%) Loss: 2.269443 LR: 0.00000616 [17:51:15] Epoch: 1 Batch: 19253/20099 (95.79%) Loss: 2.143629 LR: 0.00000616 [17:51:17] Epoch: 1 Batch: 19254/20099 (95.80%) Loss: 2.064335 LR: 0.00000616 [17:51:18] Epoch: 1 Batch: 19255/20099 (95.80%) Loss: 2.286610 LR: 0.00000616 [17:51:20] Epoch: 1 Batch: 19256/20099 (95.81%) Loss: 1.904684 LR: 0.00000616 [17:51:22] Epoch: 1 Batch: 19257/20099 
(95.81%) Loss: 1.972625 LR: 0.00000616 [17:51:24] Epoch: 1 Batch: 19258/20099 (95.82%) Loss: 1.896065 LR: 0.00000616 [17:51:26] Epoch: 1 Batch: 19259/20099 (95.82%) Loss: 2.162983 LR: 0.00000616 [17:51:28] Epoch: 1 Batch: 19260/20099 (95.83%) Loss: 2.011559 LR: 0.00000616 [17:51:30] Epoch: 1 Batch: 19261/20099 (95.83%) Loss: 1.939342 LR: 0.00000616 [17:51:31] Epoch: 1 Batch: 19262/20099 (95.84%) Loss: 2.360000 LR: 0.00000616 [17:51:33] Epoch: 1 Batch: 19263/20099 (95.84%) Loss: 2.007145 LR: 0.00000616 [17:51:35] Epoch: 1 Batch: 19264/20099 (95.85%) Loss: 2.060386 LR: 0.00000616 [17:51:37] Epoch: 1 Batch: 19265/20099 (95.85%) Loss: 2.213036 LR: 0.00000616 [17:51:39] Epoch: 1 Batch: 19266/20099 (95.86%) Loss: 2.171942 LR: 0.00000616 [17:51:41] Epoch: 1 Batch: 19267/20099 (95.86%) Loss: 1.997608 LR: 0.00000616 [17:51:43] Epoch: 1 Batch: 19268/20099 (95.87%) Loss: 2.044609 LR: 0.00000616 [17:51:44] Epoch: 1 Batch: 19269/20099 (95.87%) Loss: 2.135028 LR: 0.00000616 [17:51:46] Epoch: 1 Batch: 19270/20099 (95.88%) Loss: 2.251157 LR: 0.00000616 [17:51:48] Epoch: 1 Batch: 19271/20099 (95.88%) Loss: 2.073727 LR: 0.00000616 [17:51:50] Epoch: 1 Batch: 19272/20099 (95.89%) Loss: 1.850901 LR: 0.00000616 [17:51:52] Epoch: 1 Batch: 19273/20099 (95.89%) Loss: 2.174431 LR: 0.00000616 [17:51:54] Epoch: 1 Batch: 19274/20099 (95.90%) Loss: 1.966506 LR: 0.00000616 [17:51:56] Epoch: 1 Batch: 19275/20099 (95.90%) Loss: 1.960494 LR: 0.00000616 [17:51:58] Epoch: 1 Batch: 19276/20099 (95.91%) Loss: 1.920515 LR: 0.00000616 [17:51:59] Epoch: 1 Batch: 19277/20099 (95.91%) Loss: 2.121280 LR: 0.00000616 [17:52:01] Epoch: 1 Batch: 19278/20099 (95.92%) Loss: 2.005651 LR: 0.00000616 [17:52:03] Epoch: 1 Batch: 19279/20099 (95.92%) Loss: 2.349174 LR: 0.00000615 [17:52:05] Epoch: 1 Batch: 19280/20099 (95.93%) Loss: 2.098175 LR: 0.00000615 [17:52:07] Epoch: 1 Batch: 19281/20099 (95.93%) Loss: 2.254025 LR: 0.00000615 [17:52:09] Epoch: 1 Batch: 19282/20099 (95.94%) Loss: 1.946934 LR: 0.00000615 [17:52:10] 
Epoch: 1 Batch: 19283/20099 (95.94%) Loss: 2.241768 LR: 0.00000615 [17:52:12] Epoch: 1 Batch: 19284/20099 (95.95%) Loss: 1.918355 LR: 0.00000615 [17:52:14] Epoch: 1 Batch: 19285/20099 (95.95%) Loss: 2.156464 LR: 0.00000615 [17:52:16] Epoch: 1 Batch: 19286/20099 (95.96%) Loss: 1.896860 LR: 0.00000615 [17:52:18] Epoch: 1 Batch: 19287/20099 (95.96%) Loss: 1.747467 LR: 0.00000615 [17:52:20] Epoch: 1 Batch: 19288/20099 (95.96%) Loss: 2.037144 LR: 0.00000615 [17:52:22] Epoch: 1 Batch: 19289/20099 (95.97%) Loss: 1.990064 LR: 0.00000615 [17:52:23] Epoch: 1 Batch: 19290/20099 (95.97%) Loss: 2.401813 LR: 0.00000615 [17:52:25] Epoch: 1 Batch: 19291/20099 (95.98%) Loss: 2.035743 LR: 0.00000615 [17:52:27] Epoch: 1 Batch: 19292/20099 (95.98%) Loss: 1.954222 LR: 0.00000615 [17:52:29] Epoch: 1 Batch: 19293/20099 (95.99%) Loss: 2.066465 LR: 0.00000615 [17:52:31] Epoch: 1 Batch: 19294/20099 (95.99%) Loss: 1.846586 LR: 0.00000615 [17:52:33] Epoch: 1 Batch: 19295/20099 (96.00%) Loss: 2.185046 LR: 0.00000615 [17:52:35] Epoch: 1 Batch: 19296/20099 (96.00%) Loss: 2.247917 LR: 0.00000615 [17:52:36] Epoch: 1 Batch: 19297/20099 (96.01%) Loss: 2.504989 LR: 0.00000615 [17:52:38] Epoch: 1 Batch: 19298/20099 (96.01%) Loss: 1.830147 LR: 0.00000615 [17:52:40] Epoch: 1 Batch: 19299/20099 (96.02%) Loss: 1.966463 LR: 0.00000615 [17:52:42] Epoch: 1 Batch: 19300/20099 (96.02%) Loss: 2.045630 LR: 0.00000615 [17:52:44] Epoch: 1 Batch: 19301/20099 (96.03%) Loss: 2.121202 LR: 0.00000615 [17:52:46] Epoch: 1 Batch: 19302/20099 (96.03%) Loss: 2.001701 LR: 0.00000615 [17:52:47] Epoch: 1 Batch: 19303/20099 (96.04%) Loss: 2.119357 LR: 0.00000615 [17:52:49] Epoch: 1 Batch: 19304/20099 (96.04%) Loss: 2.324713 LR: 0.00000615 [17:52:51] Epoch: 1 Batch: 19305/20099 (96.05%) Loss: 2.090404 LR: 0.00000615 [17:52:53] Epoch: 1 Batch: 19306/20099 (96.05%) Loss: 2.235808 LR: 0.00000615 [17:52:55] Epoch: 1 Batch: 19307/20099 (96.06%) Loss: 2.131451 LR: 0.00000614 [17:52:57] Epoch: 1 Batch: 19308/20099 (96.06%) Loss: 
1.998028 LR: 0.00000614 [17:52:59] Epoch: 1 Batch: 19309/20099 (96.07%) Loss: 2.403607 LR: 0.00000614 [17:53:00] Epoch: 1 Batch: 19310/20099 (96.07%) Loss: 2.035325 LR: 0.00000614 [17:53:02] Epoch: 1 Batch: 19311/20099 (96.08%) Loss: 2.004435 LR: 0.00000614 [17:53:04] Epoch: 1 Batch: 19312/20099 (96.08%) Loss: 2.523798 LR: 0.00000614 [17:53:06] Epoch: 1 Batch: 19313/20099 (96.09%) Loss: 2.101306 LR: 0.00000614 [17:53:08] Epoch: 1 Batch: 19314/20099 (96.09%) Loss: 2.082710 LR: 0.00000614 [17:53:10] Epoch: 1 Batch: 19315/20099 (96.10%) Loss: 2.158450 LR: 0.00000614 [17:53:12] Epoch: 1 Batch: 19316/20099 (96.10%) Loss: 2.085894 LR: 0.00000614 [17:53:13] Epoch: 1 Batch: 19317/20099 (96.11%) Loss: 1.920276 LR: 0.00000614 [17:53:15] Epoch: 1 Batch: 19318/20099 (96.11%) Loss: 2.350251 LR: 0.00000614 [17:53:17] Epoch: 1 Batch: 19319/20099 (96.12%) Loss: 2.002548 LR: 0.00000614 [17:53:19] Epoch: 1 Batch: 19320/20099 (96.12%) Loss: 2.005158 LR: 0.00000614 [17:53:21] Epoch: 1 Batch: 19321/20099 (96.13%) Loss: 1.957619 LR: 0.00000614 [17:53:23] Epoch: 1 Batch: 19322/20099 (96.13%) Loss: 2.168481 LR: 0.00000614 [17:53:25] Epoch: 1 Batch: 19323/20099 (96.14%) Loss: 1.947701 LR: 0.00000614 [17:53:26] Epoch: 1 Batch: 19324/20099 (96.14%) Loss: 2.057435 LR: 0.00000614 [17:53:28] Epoch: 1 Batch: 19325/20099 (96.15%) Loss: 1.992551 LR: 0.00000614 [17:53:30] Epoch: 1 Batch: 19326/20099 (96.15%) Loss: 2.271888 LR: 0.00000614 [17:53:32] Epoch: 1 Batch: 19327/20099 (96.16%) Loss: 1.961425 LR: 0.00000614 [17:53:34] Epoch: 1 Batch: 19328/20099 (96.16%) Loss: 2.078290 LR: 0.00000614 [17:53:36] Epoch: 1 Batch: 19329/20099 (96.17%) Loss: 2.182646 LR: 0.00000614 [17:53:38] Epoch: 1 Batch: 19330/20099 (96.17%) Loss: 2.046505 LR: 0.00000614 [17:53:40] Epoch: 1 Batch: 19331/20099 (96.18%) Loss: 2.068397 LR: 0.00000614 [17:53:41] Epoch: 1 Batch: 19332/20099 (96.18%) Loss: 2.285892 LR: 0.00000614 [17:53:43] Epoch: 1 Batch: 19333/20099 (96.19%) Loss: 1.986217 LR: 0.00000614 [17:53:45] Epoch: 1 
Batch: 19334/20099 (96.19%) Loss: 2.115404 LR: 0.00000614 [17:53:47] Epoch: 1 Batch: 19335/20099 (96.20%) Loss: 2.298965 LR: 0.00000613 [17:53:49] Epoch: 1 Batch: 19336/20099 (96.20%) Loss: 1.903817 LR: 0.00000613 [17:53:51] Epoch: 1 Batch: 19337/20099 (96.21%) Loss: 2.047595 LR: 0.00000613 [17:53:53] Epoch: 1 Batch: 19338/20099 (96.21%) Loss: 2.126400 LR: 0.00000613 [17:53:54] Epoch: 1 Batch: 19339/20099 (96.22%) Loss: 1.660000 LR: 0.00000613 [17:53:56] Epoch: 1 Batch: 19340/20099 (96.22%) Loss: 2.151557 LR: 0.00000613 [17:53:58] Epoch: 1 Batch: 19341/20099 (96.23%) Loss: 2.040665 LR: 0.00000613 [17:54:00] Epoch: 1 Batch: 19342/20099 (96.23%) Loss: 2.017295 LR: 0.00000613 [17:54:02] Epoch: 1 Batch: 19343/20099 (96.24%) Loss: 2.097999 LR: 0.00000613 [17:54:04] Epoch: 1 Batch: 19344/20099 (96.24%) Loss: 2.218729 LR: 0.00000613 [17:54:06] Epoch: 1 Batch: 19345/20099 (96.25%) Loss: 1.557062 LR: 0.00000613 [17:54:07] Epoch: 1 Batch: 19346/20099 (96.25%) Loss: 2.061547 LR: 0.00000613 [17:54:09] Epoch: 1 Batch: 19347/20099 (96.26%) Loss: 2.107486 LR: 0.00000613 [17:54:11] Epoch: 1 Batch: 19348/20099 (96.26%) Loss: 1.981693 LR: 0.00000613 [17:54:13] Epoch: 1 Batch: 19349/20099 (96.27%) Loss: 2.163957 LR: 0.00000613 [17:54:15] Epoch: 1 Batch: 19350/20099 (96.27%) Loss: 1.940536 LR: 0.00000613 [17:54:17] Epoch: 1 Batch: 19351/20099 (96.28%) Loss: 2.279514 LR: 0.00000613 [17:54:19] Epoch: 1 Batch: 19352/20099 (96.28%) Loss: 2.062138 LR: 0.00000613 [17:54:20] Epoch: 1 Batch: 19353/20099 (96.29%) Loss: 2.368973 LR: 0.00000613 [17:54:22] Epoch: 1 Batch: 19354/20099 (96.29%) Loss: 2.137545 LR: 0.00000613 [17:54:24] Epoch: 1 Batch: 19355/20099 (96.30%) Loss: 1.982227 LR: 0.00000613 [17:54:26] Epoch: 1 Batch: 19356/20099 (96.30%) Loss: 2.155360 LR: 0.00000613 [17:54:28] Epoch: 1 Batch: 19357/20099 (96.31%) Loss: 1.993412 LR: 0.00000613 [17:54:30] Epoch: 1 Batch: 19358/20099 (96.31%) Loss: 2.258946 LR: 0.00000613 [17:54:31] Epoch: 1 Batch: 19359/20099 (96.32%) Loss: 2.113856 LR: 
0.00000613 [17:54:33] Epoch: 1 Batch: 19360/20099 (96.32%) Loss: 1.843676 LR: 0.00000613 [17:54:35] Epoch: 1 Batch: 19361/20099 (96.33%) Loss: 2.309748 LR: 0.00000613 [17:54:37] Epoch: 1 Batch: 19362/20099 (96.33%) Loss: 1.844963 LR: 0.00000613 [17:54:39] Epoch: 1 Batch: 19363/20099 (96.34%) Loss: 2.246712 LR: 0.00000612 [17:54:41] Epoch: 1 Batch: 19364/20099 (96.34%) Loss: 2.020667 LR: 0.00000612 [17:54:43] Epoch: 1 Batch: 19365/20099 (96.35%) Loss: 2.043519 LR: 0.00000612 [17:54:44] Epoch: 1 Batch: 19366/20099 (96.35%) Loss: 2.245112 LR: 0.00000612 [17:54:46] Epoch: 1 Batch: 19367/20099 (96.36%) Loss: 1.809615 LR: 0.00000612 [17:54:48] Epoch: 1 Batch: 19368/20099 (96.36%) Loss: 2.149504 LR: 0.00000612 [17:54:50] Epoch: 1 Batch: 19369/20099 (96.37%) Loss: 2.042169 LR: 0.00000612 [17:54:52] Epoch: 1 Batch: 19370/20099 (96.37%) Loss: 2.340504 LR: 0.00000612 [17:54:54] Epoch: 1 Batch: 19371/20099 (96.38%) Loss: 1.903045 LR: 0.00000612 [17:54:56] Epoch: 1 Batch: 19372/20099 (96.38%) Loss: 2.288682 LR: 0.00000612 [17:54:57] Epoch: 1 Batch: 19373/20099 (96.39%) Loss: 2.155350 LR: 0.00000612 [17:54:59] Epoch: 1 Batch: 19374/20099 (96.39%) Loss: 1.990450 LR: 0.00000612 [17:55:01] Epoch: 1 Batch: 19375/20099 (96.40%) Loss: 1.896698 LR: 0.00000612 [17:55:03] Epoch: 1 Batch: 19376/20099 (96.40%) Loss: 1.763433 LR: 0.00000612 [17:55:05] Epoch: 1 Batch: 19377/20099 (96.41%) Loss: 1.811717 LR: 0.00000612 [17:55:07] Epoch: 1 Batch: 19378/20099 (96.41%) Loss: 1.978618 LR: 0.00000612 [17:55:09] Epoch: 1 Batch: 19379/20099 (96.42%) Loss: 1.934530 LR: 0.00000612 [17:55:10] Epoch: 1 Batch: 19380/20099 (96.42%) Loss: 2.182753 LR: 0.00000612 [17:55:12] Epoch: 1 Batch: 19381/20099 (96.43%) Loss: 1.943883 LR: 0.00000612 [17:55:14] Epoch: 1 Batch: 19382/20099 (96.43%) Loss: 1.996518 LR: 0.00000612 [17:55:16] Epoch: 1 Batch: 19383/20099 (96.44%) Loss: 2.188907 LR: 0.00000612 [17:55:18] Epoch: 1 Batch: 19384/20099 (96.44%) Loss: 2.021890 LR: 0.00000612 [17:55:20] Epoch: 1 Batch: 19385/20099 
(96.45%) Loss: 2.134569 LR: 0.00000612 [17:55:22] Epoch: 1 Batch: 19386/20099 (96.45%) Loss: 1.697063 LR: 0.00000612 [17:55:23] Epoch: 1 Batch: 19387/20099 (96.46%) Loss: 2.006622 LR: 0.00000612 [17:55:25] Epoch: 1 Batch: 19388/20099 (96.46%) Loss: 2.066455 LR: 0.00000612 [17:55:27] Epoch: 1 Batch: 19389/20099 (96.47%) Loss: 1.704357 LR: 0.00000612 [17:55:29] Epoch: 1 Batch: 19390/20099 (96.47%) Loss: 2.009863 LR: 0.00000612 [17:55:31] Epoch: 1 Batch: 19391/20099 (96.48%) Loss: 2.196047 LR: 0.00000611 [17:55:33] Epoch: 1 Batch: 19392/20099 (96.48%) Loss: 2.179289 LR: 0.00000611 [17:55:35] Epoch: 1 Batch: 19393/20099 (96.49%) Loss: 2.236217 LR: 0.00000611 [17:55:36] Epoch: 1 Batch: 19394/20099 (96.49%) Loss: 2.035829 LR: 0.00000611 [17:55:38] Epoch: 1 Batch: 19395/20099 (96.50%) Loss: 1.961829 LR: 0.00000611 [17:55:40] Epoch: 1 Batch: 19396/20099 (96.50%) Loss: 2.051625 LR: 0.00000611 [17:55:42] Epoch: 1 Batch: 19397/20099 (96.51%) Loss: 2.077054 LR: 0.00000611 [17:55:44] Epoch: 1 Batch: 19398/20099 (96.51%) Loss: 1.812675 LR: 0.00000611 [17:55:46] Epoch: 1 Batch: 19399/20099 (96.52%) Loss: 2.121755 LR: 0.00000611 [17:55:51] >> Cleaned up old temp checkpoint: epoch1_step17400 [17:55:51] >> Temp checkpoint saved: epoch1_step19400, size: 0.1693 GB [17:55:51] Epoch: 1 Batch: 19400/20099 (96.52%) Loss: 1.925099 LR: 0.00000611 [17:55:53] Epoch: 1 Batch: 19401/20099 (96.53%) Loss: 1.917139 LR: 0.00000611 [17:55:55] Epoch: 1 Batch: 19402/20099 (96.53%) Loss: 2.258086 LR: 0.00000611 [17:55:57] Epoch: 1 Batch: 19403/20099 (96.54%) Loss: 2.195222 LR: 0.00000611 [17:55:59] Epoch: 1 Batch: 19404/20099 (96.54%) Loss: 2.020311 LR: 0.00000611 [17:56:00] Epoch: 1 Batch: 19405/20099 (96.55%) Loss: 1.865507 LR: 0.00000611 [17:56:02] Epoch: 1 Batch: 19406/20099 (96.55%) Loss: 1.990460 LR: 0.00000611 [17:56:04] Epoch: 1 Batch: 19407/20099 (96.56%) Loss: 2.076969 LR: 0.00000611 [17:56:06] Epoch: 1 Batch: 19408/20099 (96.56%) Loss: 2.015680 LR: 0.00000611 [17:56:08] Epoch: 1 Batch: 
19409/20099 (96.57%) Loss: 2.140987 LR: 0.00000611 [17:56:10] Epoch: 1 Batch: 19410/20099 (96.57%) Loss: 2.205754 LR: 0.00000611 [17:56:12] Epoch: 1 Batch: 19411/20099 (96.58%) Loss: 2.132425 LR: 0.00000611 [17:56:13] Epoch: 1 Batch: 19412/20099 (96.58%) Loss: 1.985133 LR: 0.00000611 [17:56:15] Epoch: 1 Batch: 19413/20099 (96.59%) Loss: 2.017500 LR: 0.00000611 [17:56:17] Epoch: 1 Batch: 19414/20099 (96.59%) Loss: 2.086883 LR: 0.00000611 [17:56:19] Epoch: 1 Batch: 19415/20099 (96.60%) Loss: 1.951419 LR: 0.00000611 [17:56:21] Epoch: 1 Batch: 19416/20099 (96.60%) Loss: 2.102908 LR: 0.00000611 [17:56:23] Epoch: 1 Batch: 19417/20099 (96.61%) Loss: 2.181990 LR: 0.00000611 [17:56:25] Epoch: 1 Batch: 19418/20099 (96.61%) Loss: 2.164812 LR: 0.00000611 [17:56:26] Epoch: 1 Batch: 19419/20099 (96.62%) Loss: 1.808563 LR: 0.00000611 [17:56:28] Epoch: 1 Batch: 19420/20099 (96.62%) Loss: 2.159036 LR: 0.00000611 [17:56:30] Epoch: 1 Batch: 19421/20099 (96.63%) Loss: 2.462580 LR: 0.00000611 [17:56:32] Epoch: 1 Batch: 19422/20099 (96.63%) Loss: 2.323354 LR: 0.00000611 [17:56:34] Epoch: 1 Batch: 19423/20099 (96.64%) Loss: 2.112163 LR: 0.00000611 [17:56:36] Epoch: 1 Batch: 19424/20099 (96.64%) Loss: 2.001533 LR: 0.00000611 [17:56:38] Epoch: 1 Batch: 19425/20099 (96.65%) Loss: 1.966708 LR: 0.00000611 [17:56:40] Epoch: 1 Batch: 19426/20099 (96.65%) Loss: 1.804164 LR: 0.00000610 [17:56:41] Epoch: 1 Batch: 19427/20099 (96.66%) Loss: 2.171651 LR: 0.00000610 [17:56:43] Epoch: 1 Batch: 19428/20099 (96.66%) Loss: 1.965465 LR: 0.00000610 [17:56:45] Epoch: 1 Batch: 19429/20099 (96.67%) Loss: 2.064736 LR: 0.00000610 [17:56:47] Epoch: 1 Batch: 19430/20099 (96.67%) Loss: 2.281380 LR: 0.00000610 [17:56:49] Epoch: 1 Batch: 19431/20099 (96.68%) Loss: 2.105890 LR: 0.00000610 [17:56:51] Epoch: 1 Batch: 19432/20099 (96.68%) Loss: 2.204804 LR: 0.00000610 [17:56:53] Epoch: 1 Batch: 19433/20099 (96.69%) Loss: 1.852398 LR: 0.00000610 [17:56:54] Epoch: 1 Batch: 19434/20099 (96.69%) Loss: 2.217866 LR: 
0.00000610 [17:56:56] Epoch: 1 Batch: 19435/20099 (96.70%) Loss: 1.799012 LR: 0.00000610 [17:56:58] Epoch: 1 Batch: 19436/20099 (96.70%) Loss: 2.312256 LR: 0.00000610 [17:57:00] Epoch: 1 Batch: 19437/20099 (96.71%) Loss: 2.063627 LR: 0.00000610 [17:57:02] Epoch: 1 Batch: 19438/20099 (96.71%) Loss: 2.141188 LR: 0.00000610 [17:57:04] Epoch: 1 Batch: 19439/20099 (96.72%) Loss: 2.190587 LR: 0.00000610 [17:57:06] Epoch: 1 Batch: 19440/20099 (96.72%) Loss: 2.201427 LR: 0.00000610 [17:57:07] Epoch: 1 Batch: 19441/20099 (96.73%) Loss: 2.004861 LR: 0.00000610 [17:57:09] Epoch: 1 Batch: 19442/20099 (96.73%) Loss: 1.692448 LR: 0.00000610 [17:57:11] Epoch: 1 Batch: 19443/20099 (96.74%) Loss: 1.792293 LR: 0.00000610 [17:57:13] Epoch: 1 Batch: 19444/20099 (96.74%) Loss: 2.099373 LR: 0.00000610 [17:57:15] Epoch: 1 Batch: 19445/20099 (96.75%) Loss: 1.901873 LR: 0.00000610 [17:57:17] Epoch: 1 Batch: 19446/20099 (96.75%) Loss: 1.975660 LR: 0.00000610 [17:57:19] Epoch: 1 Batch: 19447/20099 (96.76%) Loss: 1.867064 LR: 0.00000610 [17:57:20] Epoch: 1 Batch: 19448/20099 (96.76%) Loss: 2.299204 LR: 0.00000610 [17:57:22] Epoch: 1 Batch: 19449/20099 (96.77%) Loss: 2.042770 LR: 0.00000610 [17:57:24] Epoch: 1 Batch: 19450/20099 (96.77%) Loss: 2.163883 LR: 0.00000610 [17:57:26] Epoch: 1 Batch: 19451/20099 (96.78%) Loss: 2.170896 LR: 0.00000610 [17:57:28] Epoch: 1 Batch: 19452/20099 (96.78%) Loss: 1.844895 LR: 0.00000610 [17:57:30] Epoch: 1 Batch: 19453/20099 (96.79%) Loss: 1.967607 LR: 0.00000610 [17:57:32] Epoch: 1 Batch: 19454/20099 (96.79%) Loss: 2.094273 LR: 0.00000609 [17:57:33] Epoch: 1 Batch: 19455/20099 (96.80%) Loss: 1.927925 LR: 0.00000609 [17:57:35] Epoch: 1 Batch: 19456/20099 (96.80%) Loss: 1.993478 LR: 0.00000609 [17:57:37] Epoch: 1 Batch: 19457/20099 (96.81%) Loss: 1.928728 LR: 0.00000609 [17:57:39] Epoch: 1 Batch: 19458/20099 (96.81%) Loss: 2.242888 LR: 0.00000609 [17:57:41] Epoch: 1 Batch: 19459/20099 (96.82%) Loss: 2.047242 LR: 0.00000609 [17:57:43] Epoch: 1 Batch: 19460/20099 
(96.82%) Loss: 1.965682 LR: 0.00000609 [17:57:45] Epoch: 1 Batch: 19461/20099 (96.83%) Loss: 2.219653 LR: 0.00000609 [17:57:46] Epoch: 1 Batch: 19462/20099 (96.83%) Loss: 2.083118 LR: 0.00000609 [17:57:48] Epoch: 1 Batch: 19463/20099 (96.84%) Loss: 2.296796 LR: 0.00000609 [17:57:50] Epoch: 1 Batch: 19464/20099 (96.84%) Loss: 2.037698 LR: 0.00000609 [17:57:52] Epoch: 1 Batch: 19465/20099 (96.85%) Loss: 2.283341 LR: 0.00000609 [17:57:54] Epoch: 1 Batch: 19466/20099 (96.85%) Loss: 2.258079 LR: 0.00000609 [17:57:56] Epoch: 1 Batch: 19467/20099 (96.86%) Loss: 1.868088 LR: 0.00000609 [17:57:58] Epoch: 1 Batch: 19468/20099 (96.86%) Loss: 1.925231 LR: 0.00000609 [17:57:59] Epoch: 1 Batch: 19469/20099 (96.87%) Loss: 2.134203 LR: 0.00000609 [17:58:01] Epoch: 1 Batch: 19470/20099 (96.87%) Loss: 2.055877 LR: 0.00000609 [17:58:03] Epoch: 1 Batch: 19471/20099 (96.88%) Loss: 2.046229 LR: 0.00000609 [17:58:05] Epoch: 1 Batch: 19472/20099 (96.88%) Loss: 2.131074 LR: 0.00000609 [17:58:07] Epoch: 1 Batch: 19473/20099 (96.89%) Loss: 2.177550 LR: 0.00000609 [17:58:09] Epoch: 1 Batch: 19474/20099 (96.89%) Loss: 1.931331 LR: 0.00000609 [17:58:11] Epoch: 1 Batch: 19475/20099 (96.90%) Loss: 1.712581 LR: 0.00000609 [17:58:12] Epoch: 1 Batch: 19476/20099 (96.90%) Loss: 2.054656 LR: 0.00000609 [17:58:14] Epoch: 1 Batch: 19477/20099 (96.91%) Loss: 1.903771 LR: 0.00000609 [17:58:16] Epoch: 1 Batch: 19478/20099 (96.91%) Loss: 1.763790 LR: 0.00000609 [17:58:18] Epoch: 1 Batch: 19479/20099 (96.92%) Loss: 1.917998 LR: 0.00000609 [17:58:20] Epoch: 1 Batch: 19480/20099 (96.92%) Loss: 1.862601 LR: 0.00000609 [17:58:22] Epoch: 1 Batch: 19481/20099 (96.93%) Loss: 1.981473 LR: 0.00000609 [17:58:24] Epoch: 1 Batch: 19482/20099 (96.93%) Loss: 2.192888 LR: 0.00000609 [17:58:26] Epoch: 1 Batch: 19483/20099 (96.94%) Loss: 2.191119 LR: 0.00000609 [17:58:27] Epoch: 1 Batch: 19484/20099 (96.94%) Loss: 2.047935 LR: 0.00000609 [17:58:29] Epoch: 1 Batch: 19485/20099 (96.95%) Loss: 2.176638 LR: 0.00000609 [17:58:31] 
Epoch: 1 Batch: 19486/20099 (96.95%) Loss: 2.109624 LR: 0.00000609 [17:58:33] Epoch: 1 Batch: 19487/20099 (96.96%) Loss: 1.885157 LR: 0.00000609 [17:58:35] Epoch: 1 Batch: 19488/20099 (96.96%) Loss: 2.149888 LR: 0.00000609 [17:58:37] Epoch: 1 Batch: 19489/20099 (96.97%) Loss: 2.200167 LR: 0.00000609 [17:58:39] Epoch: 1 Batch: 19490/20099 (96.97%) Loss: 1.930757 LR: 0.00000609 [17:58:40] Epoch: 1 Batch: 19491/20099 (96.97%) Loss: 1.809559 LR: 0.00000609 [17:58:42] Epoch: 1 Batch: 19492/20099 (96.98%) Loss: 1.936587 LR: 0.00000609 [17:58:44] Epoch: 1 Batch: 19493/20099 (96.98%) Loss: 1.901129 LR: 0.00000609 [17:58:46] Epoch: 1 Batch: 19494/20099 (96.99%) Loss: 2.301649 LR: 0.00000609 [17:58:48] Epoch: 1 Batch: 19495/20099 (96.99%) Loss: 2.171221 LR: 0.00000609 [17:58:50] Epoch: 1 Batch: 19496/20099 (97.00%) Loss: 2.007379 LR: 0.00000608 [17:58:52] Epoch: 1 Batch: 19497/20099 (97.00%) Loss: 1.939808 LR: 0.00000608 [17:58:53] Epoch: 1 Batch: 19498/20099 (97.01%) Loss: 2.022080 LR: 0.00000608 [17:58:55] Epoch: 1 Batch: 19499/20099 (97.01%) Loss: 2.302041 LR: 0.00000608 [17:58:57] >> Evaluating batch 0 [17:58:58] >> Evaluating batch 1 [17:58:59] >> Evaluating batch 2 [17:59:01] >> Evaluating batch 3 [17:59:02] >> Evaluating batch 4 [17:59:03] >> Evaluating batch 5 [17:59:04] >> Evaluating batch 6 [17:59:05] >> Evaluating batch 7 [17:59:06] >> Evaluating batch 8 [17:59:07] >> Evaluating batch 9 [17:59:08] >> Evaluating batch 10 [17:59:09] >> Evaluating batch 11 [17:59:10] >> Evaluating batch 12 [17:59:11] >> Evaluating batch 13 [17:59:12] >> Evaluating batch 14 [17:59:13] >> Evaluating batch 15 [17:59:14] >> Evaluating batch 16 [17:59:15] Epoch: 1 Step: 19500/20099 Evaluation: [17:59:15] Avg Loss Since Last Eval: 2.0715 Val Loss: 2.1466 Validation loss delta: 0.0007 Perplexity: 8.5557 LR: 0.00000608 [17:59:18] >> Checkpoint saved: epoch1_step19500, size: 0.1693 GB [17:59:18] Epoch: 1 Batch: 19500/20099 (97.02%) Loss: 2.199610 LR: 0.00000608 [17:59:20] Epoch: 1 Batch: 
19501/20099 (97.02%) Loss: 2.145238 LR: 0.00000608 [17:59:22] Epoch: 1 Batch: 19502/20099 (97.03%) Loss: 2.047187 LR: 0.00000608 [17:59:24] Epoch: 1 Batch: 19503/20099 (97.03%) Loss: 1.855285 LR: 0.00000608 [17:59:25] Epoch: 1 Batch: 19504/20099 (97.04%) Loss: 1.993897 LR: 0.00000608 [17:59:27] Epoch: 1 Batch: 19505/20099 (97.04%) Loss: 2.341565 LR: 0.00000608 [17:59:29] Epoch: 1 Batch: 19506/20099 (97.05%) Loss: 2.201092 LR: 0.00000608 [17:59:31] Epoch: 1 Batch: 19507/20099 (97.05%) Loss: 2.396380 LR: 0.00000608 [17:59:33] Epoch: 1 Batch: 19508/20099 (97.06%) Loss: 2.121985 LR: 0.00000608 [17:59:35] Epoch: 1 Batch: 19509/20099 (97.06%) Loss: 2.030664 LR: 0.00000608 [17:59:37] Epoch: 1 Batch: 19510/20099 (97.07%) Loss: 2.153253 LR: 0.00000608 [17:59:38] Epoch: 1 Batch: 19511/20099 (97.07%) Loss: 2.185047 LR: 0.00000608 [17:59:40] Epoch: 1 Batch: 19512/20099 (97.08%) Loss: 2.211670 LR: 0.00000608 [17:59:42] Epoch: 1 Batch: 19513/20099 (97.08%) Loss: 1.995257 LR: 0.00000608 [17:59:44] Epoch: 1 Batch: 19514/20099 (97.09%) Loss: 2.130027 LR: 0.00000608 [17:59:46] Epoch: 1 Batch: 19515/20099 (97.09%) Loss: 1.905598 LR: 0.00000608 [17:59:48] Epoch: 1 Batch: 19516/20099 (97.10%) Loss: 1.985156 LR: 0.00000608 [17:59:50] Epoch: 1 Batch: 19517/20099 (97.10%) Loss: 2.293996 LR: 0.00000608 [17:59:51] Epoch: 1 Batch: 19518/20099 (97.11%) Loss: 2.030399 LR: 0.00000608 [17:59:53] Epoch: 1 Batch: 19519/20099 (97.11%) Loss: 2.061881 LR: 0.00000608 [17:59:55] Epoch: 1 Batch: 19520/20099 (97.12%) Loss: 2.167645 LR: 0.00000608 [17:59:57] Epoch: 1 Batch: 19521/20099 (97.12%) Loss: 1.952795 LR: 0.00000608 [17:59:59] Epoch: 1 Batch: 19522/20099 (97.13%) Loss: 2.161867 LR: 0.00000608 [18:00:01] Epoch: 1 Batch: 19523/20099 (97.13%) Loss: 1.998260 LR: 0.00000608 [18:00:03] Epoch: 1 Batch: 19524/20099 (97.14%) Loss: 2.282427 LR: 0.00000608 [18:00:05] Epoch: 1 Batch: 19525/20099 (97.14%) Loss: 2.044101 LR: 0.00000608 [18:00:06] Epoch: 1 Batch: 19526/20099 (97.15%) Loss: 2.195108 LR: 
0.00000608 [18:00:08] Epoch: 1 Batch: 19527/20099 (97.15%) Loss: 1.862700 LR: 0.00000608 [18:00:10] Epoch: 1 Batch: 19528/20099 (97.16%) Loss: 2.144134 LR: 0.00000608 [18:00:12] Epoch: 1 Batch: 19529/20099 (97.16%) Loss: 2.076379 LR: 0.00000608 [18:00:14] Epoch: 1 Batch: 19530/20099 (97.17%) Loss: 2.001679 LR: 0.00000608 [18:00:16] Epoch: 1 Batch: 19531/20099 (97.17%) Loss: 2.028637 LR: 0.00000607 [18:00:18] Epoch: 1 Batch: 19532/20099 (97.18%) Loss: 2.187317 LR: 0.00000607 [18:00:19] Epoch: 1 Batch: 19533/20099 (97.18%) Loss: 2.084725 LR: 0.00000607 [18:00:21] Epoch: 1 Batch: 19534/20099 (97.19%) Loss: 2.155938 LR: 0.00000607 [18:00:23] Epoch: 1 Batch: 19535/20099 (97.19%) Loss: 2.056368 LR: 0.00000607 [18:00:25] Epoch: 1 Batch: 19536/20099 (97.20%) Loss: 2.011721 LR: 0.00000607 [18:00:27] Epoch: 1 Batch: 19537/20099 (97.20%) Loss: 2.235137 LR: 0.00000607 [18:00:29] Epoch: 1 Batch: 19538/20099 (97.21%) Loss: 2.068829 LR: 0.00000607 [18:00:31] Epoch: 1 Batch: 19539/20099 (97.21%) Loss: 2.052560 LR: 0.00000607 [18:00:32] Epoch: 1 Batch: 19540/20099 (97.22%) Loss: 2.337172 LR: 0.00000607 [18:00:34] Epoch: 1 Batch: 19541/20099 (97.22%) Loss: 2.164357 LR: 0.00000607 [18:00:36] Epoch: 1 Batch: 19542/20099 (97.23%) Loss: 2.013916 LR: 0.00000607 [18:00:38] Epoch: 1 Batch: 19543/20099 (97.23%) Loss: 2.471647 LR: 0.00000607 [18:00:40] Epoch: 1 Batch: 19544/20099 (97.24%) Loss: 2.270936 LR: 0.00000607 [18:00:42] Epoch: 1 Batch: 19545/20099 (97.24%) Loss: 2.379933 LR: 0.00000607 [18:00:44] Epoch: 1 Batch: 19546/20099 (97.25%) Loss: 2.250376 LR: 0.00000607 [18:00:45] Epoch: 1 Batch: 19547/20099 (97.25%) Loss: 2.149836 LR: 0.00000607 [18:00:47] Epoch: 1 Batch: 19548/20099 (97.26%) Loss: 2.164184 LR: 0.00000607 [18:00:49] Epoch: 1 Batch: 19549/20099 (97.26%) Loss: 2.052629 LR: 0.00000607 [18:00:51] Epoch: 1 Batch: 19550/20099 (97.27%) Loss: 2.255353 LR: 0.00000607 [18:00:53] Epoch: 1 Batch: 19551/20099 (97.27%) Loss: 2.041041 LR: 0.00000607 [18:00:55] Epoch: 1 Batch: 19552/20099 
(97.28%) Loss: 2.272614 LR: 0.00000607 [18:00:56] Epoch: 1 Batch: 19553/20099 (97.28%) Loss: 2.087205 LR: 0.00000607 [18:00:58] Epoch: 1 Batch: 19554/20099 (97.29%) Loss: 2.354961 LR: 0.00000607 [18:01:00] Epoch: 1 Batch: 19555/20099 (97.29%) Loss: 1.965375 LR: 0.00000607 [18:01:02] Epoch: 1 Batch: 19556/20099 (97.30%) Loss: 1.888774 LR: 0.00000607 [18:01:04] Epoch: 1 Batch: 19557/20099 (97.30%) Loss: 2.132423 LR: 0.00000607 [18:01:06] Epoch: 1 Batch: 19558/20099 (97.31%) Loss: 2.215467 LR: 0.00000607 [18:01:08] Epoch: 1 Batch: 19559/20099 (97.31%) Loss: 1.979129 LR: 0.00000607 [18:01:09] Epoch: 1 Batch: 19560/20099 (97.32%) Loss: 1.986181 LR: 0.00000607 [18:01:11] Epoch: 1 Batch: 19561/20099 (97.32%) Loss: 1.943447 LR: 0.00000607 [18:01:13] Epoch: 1 Batch: 19562/20099 (97.33%) Loss: 2.131091 LR: 0.00000607 [18:01:15] Epoch: 1 Batch: 19563/20099 (97.33%) Loss: 2.090927 LR: 0.00000607 [18:01:17] Epoch: 1 Batch: 19564/20099 (97.34%) Loss: 2.122889 LR: 0.00000607 [18:01:19] Epoch: 1 Batch: 19565/20099 (97.34%) Loss: 1.901665 LR: 0.00000607 [18:01:21] Epoch: 1 Batch: 19566/20099 (97.35%) Loss: 2.179532 LR: 0.00000607 [18:01:22] Epoch: 1 Batch: 19567/20099 (97.35%) Loss: 2.119359 LR: 0.00000607 [18:01:24] Epoch: 1 Batch: 19568/20099 (97.36%) Loss: 1.945777 LR: 0.00000607 [18:01:26] Epoch: 1 Batch: 19569/20099 (97.36%) Loss: 2.149792 LR: 0.00000607 [18:01:28] Epoch: 1 Batch: 19570/20099 (97.37%) Loss: 2.278142 LR: 0.00000607 [18:01:30] Epoch: 1 Batch: 19571/20099 (97.37%) Loss: 2.122362 LR: 0.00000607 [18:01:32] Epoch: 1 Batch: 19572/20099 (97.38%) Loss: 1.948481 LR: 0.00000607 [18:01:34] Epoch: 1 Batch: 19573/20099 (97.38%) Loss: 2.017097 LR: 0.00000606 [18:01:35] Epoch: 1 Batch: 19574/20099 (97.39%) Loss: 2.121473 LR: 0.00000606 [18:01:37] Epoch: 1 Batch: 19575/20099 (97.39%) Loss: 1.868038 LR: 0.00000606 [18:01:39] Epoch: 1 Batch: 19576/20099 (97.40%) Loss: 2.128089 LR: 0.00000606 [18:01:41] Epoch: 1 Batch: 19577/20099 (97.40%) Loss: 2.029241 LR: 0.00000606 [18:01:43] 
Epoch: 1 Batch: 19578/20099 (97.41%) Loss: 2.116193 LR: 0.00000606 [18:01:45] Epoch: 1 Batch: 19579/20099 (97.41%) Loss: 1.792562 LR: 0.00000606 [18:01:47] Epoch: 1 Batch: 19580/20099 (97.42%) Loss: 2.079478 LR: 0.00000606 [18:01:48] Epoch: 1 Batch: 19581/20099 (97.42%) Loss: 1.954494 LR: 0.00000606 [18:01:50] Epoch: 1 Batch: 19582/20099 (97.43%) Loss: 2.049985 LR: 0.00000606 [18:01:52] Epoch: 1 Batch: 19583/20099 (97.43%) Loss: 1.880618 LR: 0.00000606 [18:01:54] Epoch: 1 Batch: 19584/20099 (97.44%) Loss: 2.280022 LR: 0.00000606 [18:01:56] Epoch: 1 Batch: 19585/20099 (97.44%) Loss: 2.034087 LR: 0.00000606 [18:01:58] Epoch: 1 Batch: 19586/20099 (97.45%) Loss: 2.295334 LR: 0.00000606 [18:02:00] Epoch: 1 Batch: 19587/20099 (97.45%) Loss: 1.760272 LR: 0.00000606 [18:02:02] Epoch: 1 Batch: 19588/20099 (97.46%) Loss: 1.954733 LR: 0.00000606 [18:02:03] Epoch: 1 Batch: 19589/20099 (97.46%) Loss: 1.967489 LR: 0.00000606 [18:02:05] Epoch: 1 Batch: 19590/20099 (97.47%) Loss: 2.323667 LR: 0.00000606 [18:02:07] Epoch: 1 Batch: 19591/20099 (97.47%) Loss: 2.148960 LR: 0.00000606 [18:02:09] Epoch: 1 Batch: 19592/20099 (97.48%) Loss: 1.635573 LR: 0.00000606 [18:02:11] Epoch: 1 Batch: 19593/20099 (97.48%) Loss: 2.146079 LR: 0.00000606 [18:02:13] Epoch: 1 Batch: 19594/20099 (97.49%) Loss: 2.035960 LR: 0.00000606 [18:02:15] Epoch: 1 Batch: 19595/20099 (97.49%) Loss: 1.550639 LR: 0.00000606 [18:02:16] Epoch: 1 Batch: 19596/20099 (97.50%) Loss: 2.150113 LR: 0.00000606 [18:02:18] Epoch: 1 Batch: 19597/20099 (97.50%) Loss: 1.886929 LR: 0.00000606 [18:02:20] Epoch: 1 Batch: 19598/20099 (97.51%) Loss: 2.179450 LR: 0.00000606 [18:02:22] Epoch: 1 Batch: 19599/20099 (97.51%) Loss: 2.228694 LR: 0.00000606 [18:02:28] >> Cleaned up old temp checkpoint: epoch1_step17600 [18:02:28] >> Temp checkpoint saved: epoch1_step19600, size: 0.1693 GB [18:02:28] Epoch: 1 Batch: 19600/20099 (97.52%) Loss: 2.046846 LR: 0.00000606 [18:02:30] Epoch: 1 Batch: 19601/20099 (97.52%) Loss: 2.107209 LR: 0.00000606 
[18:02:31] Epoch: 1 Batch: 19602/20099 (97.53%) Loss: 2.263826 LR: 0.00000606 [18:02:33] Epoch: 1 Batch: 19603/20099 (97.53%) Loss: 2.066991 LR: 0.00000606 [18:02:35] Epoch: 1 Batch: 19604/20099 (97.54%) Loss: 2.019565 LR: 0.00000606 [18:02:37] Epoch: 1 Batch: 19605/20099 (97.54%) Loss: 1.901450 LR: 0.00000606 [18:02:39] Epoch: 1 Batch: 19606/20099 (97.55%) Loss: 2.159849 LR: 0.00000606 [18:02:41] Epoch: 1 Batch: 19607/20099 (97.55%) Loss: 1.895479 LR: 0.00000606 [18:02:42] Epoch: 1 Batch: 19608/20099 (97.56%) Loss: 2.216279 LR: 0.00000606 [18:02:44] Epoch: 1 Batch: 19609/20099 (97.56%) Loss: 2.306649 LR: 0.00000606 [18:02:46] Epoch: 1 Batch: 19610/20099 (97.57%) Loss: 2.418483 LR: 0.00000606 [18:02:48] Epoch: 1 Batch: 19611/20099 (97.57%) Loss: 2.172929 LR: 0.00000606 [18:02:50] Epoch: 1 Batch: 19612/20099 (97.58%) Loss: 2.161781 LR: 0.00000606 [18:02:52] Epoch: 1 Batch: 19613/20099 (97.58%) Loss: 1.833497 LR: 0.00000606 [18:02:54] Epoch: 1 Batch: 19614/20099 (97.59%) Loss: 1.979941 LR: 0.00000606 [18:02:56] Epoch: 1 Batch: 19615/20099 (97.59%) Loss: 1.928433 LR: 0.00000605 [18:02:57] Epoch: 1 Batch: 19616/20099 (97.60%) Loss: 2.227236 LR: 0.00000605 [18:02:59] Epoch: 1 Batch: 19617/20099 (97.60%) Loss: 2.179509 LR: 0.00000605 [18:03:01] Epoch: 1 Batch: 19618/20099 (97.61%) Loss: 2.473178 LR: 0.00000605 [18:03:03] Epoch: 1 Batch: 19619/20099 (97.61%) Loss: 2.369631 LR: 0.00000605 [18:03:05] Epoch: 1 Batch: 19620/20099 (97.62%) Loss: 2.237721 LR: 0.00000605 [18:03:07] Epoch: 1 Batch: 19621/20099 (97.62%) Loss: 2.056774 LR: 0.00000605 [18:03:09] Epoch: 1 Batch: 19622/20099 (97.63%) Loss: 2.352124 LR: 0.00000605 [18:03:11] Epoch: 1 Batch: 19623/20099 (97.63%) Loss: 1.850369 LR: 0.00000605 [18:03:12] Epoch: 1 Batch: 19624/20099 (97.64%) Loss: 2.108496 LR: 0.00000605 [18:03:14] Epoch: 1 Batch: 19625/20099 (97.64%) Loss: 1.915843 LR: 0.00000605 [18:03:16] Epoch: 1 Batch: 19626/20099 (97.65%) Loss: 1.643032 LR: 0.00000605 [18:03:18] Epoch: 1 Batch: 19627/20099 (97.65%) 
Loss: 2.248806 LR: 0.00000605 [18:03:20] Epoch: 1 Batch: 19628/20099 (97.66%) Loss: 1.956812 LR: 0.00000605 [18:03:22] Epoch: 1 Batch: 19629/20099 (97.66%) Loss: 2.260164 LR: 0.00000605 [18:03:24] Epoch: 1 Batch: 19630/20099 (97.67%) Loss: 2.263349 LR: 0.00000605 [18:03:25] Epoch: 1 Batch: 19631/20099 (97.67%) Loss: 2.224092 LR: 0.00000605 [18:03:27] Epoch: 1 Batch: 19632/20099 (97.68%) Loss: 2.232020 LR: 0.00000605 [18:03:29] Epoch: 1 Batch: 19633/20099 (97.68%) Loss: 2.182219 LR: 0.00000605 [18:03:31] Epoch: 1 Batch: 19634/20099 (97.69%) Loss: 2.031585 LR: 0.00000605 [18:03:33] Epoch: 1 Batch: 19635/20099 (97.69%) Loss: 1.928462 LR: 0.00000605 [18:03:35] Epoch: 1 Batch: 19636/20099 (97.70%) Loss: 2.070324 LR: 0.00000605 [18:03:37] Epoch: 1 Batch: 19637/20099 (97.70%) Loss: 2.301719 LR: 0.00000605 [18:03:38] Epoch: 1 Batch: 19638/20099 (97.71%) Loss: 2.178490 LR: 0.00000605 [18:03:40] Epoch: 1 Batch: 19639/20099 (97.71%) Loss: 2.415533 LR: 0.00000605 [18:03:42] Epoch: 1 Batch: 19640/20099 (97.72%) Loss: 2.146120 LR: 0.00000605 [18:03:44] Epoch: 1 Batch: 19641/20099 (97.72%) Loss: 2.192451 LR: 0.00000605 [18:03:46] Epoch: 1 Batch: 19642/20099 (97.73%) Loss: 2.035509 LR: 0.00000605 [18:03:48] Epoch: 1 Batch: 19643/20099 (97.73%) Loss: 2.034524 LR: 0.00000605 [18:03:49] Epoch: 1 Batch: 19644/20099 (97.74%) Loss: 2.241922 LR: 0.00000605 [18:03:51] Epoch: 1 Batch: 19645/20099 (97.74%) Loss: 2.071701 LR: 0.00000605 [18:03:53] Epoch: 1 Batch: 19646/20099 (97.75%) Loss: 2.121493 LR: 0.00000605 [18:03:55] Epoch: 1 Batch: 19647/20099 (97.75%) Loss: 2.149298 LR: 0.00000605 [18:03:57] Epoch: 1 Batch: 19648/20099 (97.76%) Loss: 2.009726 LR: 0.00000605 [18:03:59] Epoch: 1 Batch: 19649/20099 (97.76%) Loss: 2.193168 LR: 0.00000605 [18:04:01] Epoch: 1 Batch: 19650/20099 (97.77%) Loss: 2.124324 LR: 0.00000605 [18:04:02] Epoch: 1 Batch: 19651/20099 (97.77%) Loss: 2.043886 LR: 0.00000605 [18:04:04] Epoch: 1 Batch: 19652/20099 (97.78%) Loss: 2.154180 LR: 0.00000605 [18:04:06] Epoch: 1 
Batch: 19653/20099 (97.78%) Loss: 2.118556 LR: 0.00000605 [18:04:08] Epoch: 1 Batch: 19654/20099 (97.79%) Loss: 2.494402 LR: 0.00000605 [18:04:10] Epoch: 1 Batch: 19655/20099 (97.79%) Loss: 2.005381 LR: 0.00000605 [18:04:12] Epoch: 1 Batch: 19656/20099 (97.80%) Loss: 2.073069 LR: 0.00000605 [18:04:14] Epoch: 1 Batch: 19657/20099 (97.80%) Loss: 1.904256 LR: 0.00000604 [18:04:15] Epoch: 1 Batch: 19658/20099 (97.81%) Loss: 2.419137 LR: 0.00000604 [18:04:17] Epoch: 1 Batch: 19659/20099 (97.81%) Loss: 2.302383 LR: 0.00000604 [18:04:19] Epoch: 1 Batch: 19660/20099 (97.82%) Loss: 2.082259 LR: 0.00000604 [18:04:21] Epoch: 1 Batch: 19661/20099 (97.82%) Loss: 2.089575 LR: 0.00000604 [18:04:23] Epoch: 1 Batch: 19662/20099 (97.83%) Loss: 2.005141 LR: 0.00000604 [18:04:25] Epoch: 1 Batch: 19663/20099 (97.83%) Loss: 2.157150 LR: 0.00000604 [18:04:27] Epoch: 1 Batch: 19664/20099 (97.84%) Loss: 2.117649 LR: 0.00000604 [18:04:28] Epoch: 1 Batch: 19665/20099 (97.84%) Loss: 2.233055 LR: 0.00000604 [18:04:30] Epoch: 1 Batch: 19666/20099 (97.85%) Loss: 2.041726 LR: 0.00000604 [18:04:32] Epoch: 1 Batch: 19667/20099 (97.85%) Loss: 1.895770 LR: 0.00000604 [18:04:34] Epoch: 1 Batch: 19668/20099 (97.86%) Loss: 2.282933 LR: 0.00000604 [18:04:36] Epoch: 1 Batch: 19669/20099 (97.86%) Loss: 2.356988 LR: 0.00000604 [18:04:38] Epoch: 1 Batch: 19670/20099 (97.87%) Loss: 2.037096 LR: 0.00000604 [18:04:40] Epoch: 1 Batch: 19671/20099 (97.87%) Loss: 2.109510 LR: 0.00000604 [18:04:41] Epoch: 1 Batch: 19672/20099 (97.88%) Loss: 2.202351 LR: 0.00000604 [18:04:43] Epoch: 1 Batch: 19673/20099 (97.88%) Loss: 2.075749 LR: 0.00000604 [18:04:45] Epoch: 1 Batch: 19674/20099 (97.89%) Loss: 2.060717 LR: 0.00000604 [18:04:47] Epoch: 1 Batch: 19675/20099 (97.89%) Loss: 2.055551 LR: 0.00000604 [18:04:49] Epoch: 1 Batch: 19676/20099 (97.90%) Loss: 2.157984 LR: 0.00000604 [18:04:51] Epoch: 1 Batch: 19677/20099 (97.90%) Loss: 1.892918 LR: 0.00000604 [18:04:53] Epoch: 1 Batch: 19678/20099 (97.91%) Loss: 1.907358 LR: 
0.00000604 [18:04:54] Epoch: 1 Batch: 19679/20099 (97.91%) Loss: 2.129120 LR: 0.00000604 [18:04:56] Epoch: 1 Batch: 19680/20099 (97.92%) Loss: 1.932591 LR: 0.00000604 [18:04:58] Epoch: 1 Batch: 19681/20099 (97.92%) Loss: 2.256490 LR: 0.00000604 [18:05:00] Epoch: 1 Batch: 19682/20099 (97.93%) Loss: 1.997917 LR: 0.00000604 [18:05:02] Epoch: 1 Batch: 19683/20099 (97.93%) Loss: 1.831437 LR: 0.00000604 [18:05:04] Epoch: 1 Batch: 19684/20099 (97.94%) Loss: 2.303206 LR: 0.00000604 [18:05:06] Epoch: 1 Batch: 19685/20099 (97.94%) Loss: 2.407388 LR: 0.00000604 [18:05:08] Epoch: 1 Batch: 19686/20099 (97.95%) Loss: 2.134957 LR: 0.00000604 [18:05:09] Epoch: 1 Batch: 19687/20099 (97.95%) Loss: 2.168730 LR: 0.00000604 [18:05:11] Epoch: 1 Batch: 19688/20099 (97.96%) Loss: 2.110125 LR: 0.00000604 [18:05:13] Epoch: 1 Batch: 19689/20099 (97.96%) Loss: 1.937375 LR: 0.00000604 [18:05:15] Epoch: 1 Batch: 19690/20099 (97.97%) Loss: 1.886409 LR: 0.00000604 [18:05:17] Epoch: 1 Batch: 19691/20099 (97.97%) Loss: 1.953246 LR: 0.00000604 [18:05:19] Epoch: 1 Batch: 19692/20099 (97.98%) Loss: 2.011682 LR: 0.00000604 [18:05:21] Epoch: 1 Batch: 19693/20099 (97.98%) Loss: 2.048472 LR: 0.00000604 [18:05:22] Epoch: 1 Batch: 19694/20099 (97.98%) Loss: 2.216143 LR: 0.00000604 [18:05:24] Epoch: 1 Batch: 19695/20099 (97.99%) Loss: 2.253417 LR: 0.00000604 [18:05:26] Epoch: 1 Batch: 19696/20099 (97.99%) Loss: 2.285841 LR: 0.00000604 [18:05:28] Epoch: 1 Batch: 19697/20099 (98.00%) Loss: 2.406462 LR: 0.00000604 [18:05:30] Epoch: 1 Batch: 19698/20099 (98.00%) Loss: 2.118507 LR: 0.00000604 [18:05:32] Epoch: 1 Batch: 19699/20099 (98.01%) Loss: 2.098961 LR: 0.00000604 [18:05:34] Epoch: 1 Batch: 19700/20099 (98.01%) Loss: 1.905943 LR: 0.00000604 [18:05:35] Epoch: 1 Batch: 19701/20099 (98.02%) Loss: 2.091667 LR: 0.00000604 [18:05:37] Epoch: 1 Batch: 19702/20099 (98.02%) Loss: 2.173095 LR: 0.00000604 [18:05:39] Epoch: 1 Batch: 19703/20099 (98.03%) Loss: 2.439407 LR: 0.00000604 [18:05:41] Epoch: 1 Batch: 19704/20099 
(98.03%) Loss: 2.156247 LR: 0.00000604 [18:05:43] Epoch: 1 Batch: 19705/20099 (98.04%) Loss: 1.948297 LR: 0.00000604 [18:05:45] Epoch: 1 Batch: 19706/20099 (98.04%) Loss: 2.166853 LR: 0.00000604 [18:05:47] Epoch: 1 Batch: 19707/20099 (98.05%) Loss: 1.843661 LR: 0.00000604 [18:05:48] Epoch: 1 Batch: 19708/20099 (98.05%) Loss: 1.994346 LR: 0.00000604 [18:05:50] Epoch: 1 Batch: 19709/20099 (98.06%) Loss: 2.203892 LR: 0.00000604 [18:05:52] Epoch: 1 Batch: 19710/20099 (98.06%) Loss: 1.927269 LR: 0.00000604 [18:05:54] Epoch: 1 Batch: 19711/20099 (98.07%) Loss: 2.060558 LR: 0.00000604 [18:05:56] Epoch: 1 Batch: 19712/20099 (98.07%) Loss: 2.082078 LR: 0.00000604 [18:05:58] Epoch: 1 Batch: 19713/20099 (98.08%) Loss: 2.192109 LR: 0.00000603 [18:06:00] Epoch: 1 Batch: 19714/20099 (98.08%) Loss: 2.029804 LR: 0.00000603 [18:06:01] Epoch: 1 Batch: 19715/20099 (98.09%) Loss: 1.949035 LR: 0.00000603 [18:06:03] Epoch: 1 Batch: 19716/20099 (98.09%) Loss: 2.221664 LR: 0.00000603 [18:06:05] Epoch: 1 Batch: 19717/20099 (98.10%) Loss: 1.803337 LR: 0.00000603 [18:06:07] Epoch: 1 Batch: 19718/20099 (98.10%) Loss: 1.843779 LR: 0.00000603 [18:06:09] Epoch: 1 Batch: 19719/20099 (98.11%) Loss: 2.468314 LR: 0.00000603 [18:06:11] Epoch: 1 Batch: 19720/20099 (98.11%) Loss: 2.296453 LR: 0.00000603 [18:06:13] Epoch: 1 Batch: 19721/20099 (98.12%) Loss: 1.811277 LR: 0.00000603 [18:06:14] Epoch: 1 Batch: 19722/20099 (98.12%) Loss: 1.913281 LR: 0.00000603 [18:06:16] Epoch: 1 Batch: 19723/20099 (98.13%) Loss: 1.965908 LR: 0.00000603 [18:06:18] Epoch: 1 Batch: 19724/20099 (98.13%) Loss: 2.256288 LR: 0.00000603 [18:06:20] Epoch: 1 Batch: 19725/20099 (98.14%) Loss: 2.020911 LR: 0.00000603 [18:06:22] Epoch: 1 Batch: 19726/20099 (98.14%) Loss: 2.113656 LR: 0.00000603 [18:06:24] Epoch: 1 Batch: 19727/20099 (98.15%) Loss: 2.008913 LR: 0.00000603 [18:06:25] Epoch: 1 Batch: 19728/20099 (98.15%) Loss: 2.094574 LR: 0.00000603 [18:06:27] Epoch: 1 Batch: 19729/20099 (98.16%) Loss: 2.083511 LR: 0.00000603 [18:06:29] 
Epoch: 1 Batch: 19730/20099 (98.16%) Loss: 2.115383 LR: 0.00000603 [18:06:31] Epoch: 1 Batch: 19731/20099 (98.17%) Loss: 2.062526 LR: 0.00000603 [18:06:33] Epoch: 1 Batch: 19732/20099 (98.17%) Loss: 1.893872 LR: 0.00000603 [18:06:35] Epoch: 1 Batch: 19733/20099 (98.18%) Loss: 2.380313 LR: 0.00000603 [18:06:37] Epoch: 1 Batch: 19734/20099 (98.18%) Loss: 2.102864 LR: 0.00000603 [18:06:38] Epoch: 1 Batch: 19735/20099 (98.19%) Loss: 1.706490 LR: 0.00000603 [18:06:40] Epoch: 1 Batch: 19736/20099 (98.19%) Loss: 2.039380 LR: 0.00000603 [18:06:42] Epoch: 1 Batch: 19737/20099 (98.20%) Loss: 2.046010 LR: 0.00000603 [18:06:44] Epoch: 1 Batch: 19738/20099 (98.20%) Loss: 2.072414 LR: 0.00000603 [18:06:46] Epoch: 1 Batch: 19739/20099 (98.21%) Loss: 2.169982 LR: 0.00000603 [18:06:48] Epoch: 1 Batch: 19740/20099 (98.21%) Loss: 2.043205 LR: 0.00000603 [18:06:50] Epoch: 1 Batch: 19741/20099 (98.22%) Loss: 2.296820 LR: 0.00000603 [18:06:51] Epoch: 1 Batch: 19742/20099 (98.22%) Loss: 2.313753 LR: 0.00000603 [18:06:53] Epoch: 1 Batch: 19743/20099 (98.23%) Loss: 1.911609 LR: 0.00000603 [18:06:55] Epoch: 1 Batch: 19744/20099 (98.23%) Loss: 2.019235 LR: 0.00000603 [18:06:57] Epoch: 1 Batch: 19745/20099 (98.24%) Loss: 2.137435 LR: 0.00000603 [18:06:59] Epoch: 1 Batch: 19746/20099 (98.24%) Loss: 1.996792 LR: 0.00000603 [18:07:01] Epoch: 1 Batch: 19747/20099 (98.25%) Loss: 1.922562 LR: 0.00000603 [18:07:03] Epoch: 1 Batch: 19748/20099 (98.25%) Loss: 1.859964 LR: 0.00000603 [18:07:04] Epoch: 1 Batch: 19749/20099 (98.26%) Loss: 1.914468 LR: 0.00000603 [18:07:06] Epoch: 1 Batch: 19750/20099 (98.26%) Loss: 2.225158 LR: 0.00000603 [18:07:08] Epoch: 1 Batch: 19751/20099 (98.27%) Loss: 1.813281 LR: 0.00000603 [18:07:10] Epoch: 1 Batch: 19752/20099 (98.27%) Loss: 1.793062 LR: 0.00000603 [18:07:12] Epoch: 1 Batch: 19753/20099 (98.28%) Loss: 2.447685 LR: 0.00000603 [18:07:14] Epoch: 1 Batch: 19754/20099 (98.28%) Loss: 2.159777 LR: 0.00000603 [18:07:15] Epoch: 1 Batch: 19755/20099 (98.29%) Loss: 
2.067502 LR: 0.00000603 [18:07:17] Epoch: 1 Batch: 19756/20099 (98.29%) Loss: 2.146427 LR: 0.00000603 [18:07:19] Epoch: 1 Batch: 19757/20099 (98.30%) Loss: 2.219773 LR: 0.00000603 [18:07:21] Epoch: 1 Batch: 19758/20099 (98.30%) Loss: 1.832261 LR: 0.00000603 [18:07:23] Epoch: 1 Batch: 19759/20099 (98.31%) Loss: 2.321684 LR: 0.00000603 [18:07:25] Epoch: 1 Batch: 19760/20099 (98.31%) Loss: 1.928580 LR: 0.00000603 [18:07:27] Epoch: 1 Batch: 19761/20099 (98.32%) Loss: 1.878603 LR: 0.00000603 [18:07:28] Epoch: 1 Batch: 19762/20099 (98.32%) Loss: 2.133772 LR: 0.00000603 [18:07:30] Epoch: 1 Batch: 19763/20099 (98.33%) Loss: 2.173616 LR: 0.00000603 [18:07:32] Epoch: 1 Batch: 19764/20099 (98.33%) Loss: 1.866152 LR: 0.00000603 [18:07:34] Epoch: 1 Batch: 19765/20099 (98.34%) Loss: 2.231261 LR: 0.00000603 [18:07:36] Epoch: 1 Batch: 19766/20099 (98.34%) Loss: 2.208231 LR: 0.00000603 [18:07:38] Epoch: 1 Batch: 19767/20099 (98.35%) Loss: 2.668456 LR: 0.00000603 [18:07:40] Epoch: 1 Batch: 19768/20099 (98.35%) Loss: 2.096473 LR: 0.00000603 [18:07:41] Epoch: 1 Batch: 19769/20099 (98.36%) Loss: 2.187817 LR: 0.00000603 [18:07:43] Epoch: 1 Batch: 19770/20099 (98.36%) Loss: 2.127483 LR: 0.00000603 [18:07:45] Epoch: 1 Batch: 19771/20099 (98.37%) Loss: 2.229463 LR: 0.00000603 [18:07:47] Epoch: 1 Batch: 19772/20099 (98.37%) Loss: 2.026065 LR: 0.00000603 [18:07:49] Epoch: 1 Batch: 19773/20099 (98.38%) Loss: 2.241276 LR: 0.00000603 [18:07:51] Epoch: 1 Batch: 19774/20099 (98.38%) Loss: 2.317104 LR: 0.00000603 [18:07:53] Epoch: 1 Batch: 19775/20099 (98.39%) Loss: 2.058139 LR: 0.00000603 [18:07:54] Epoch: 1 Batch: 19776/20099 (98.39%) Loss: 1.926757 LR: 0.00000602 [18:07:56] Epoch: 1 Batch: 19777/20099 (98.40%) Loss: 2.175685 LR: 0.00000602 [18:07:58] Epoch: 1 Batch: 19778/20099 (98.40%) Loss: 2.164976 LR: 0.00000602 [18:08:00] Epoch: 1 Batch: 19779/20099 (98.41%) Loss: 1.738712 LR: 0.00000602 [18:08:02] Epoch: 1 Batch: 19780/20099 (98.41%) Loss: 1.832962 LR: 0.00000602 [18:08:04] Epoch: 1 
Batch: 19781/20099 (98.42%) Loss: 2.064605 LR: 0.00000602 [18:08:06] Epoch: 1 Batch: 19782/20099 (98.42%) Loss: 1.983401 LR: 0.00000602 [18:08:07] Epoch: 1 Batch: 19783/20099 (98.43%) Loss: 2.163121 LR: 0.00000602 [18:08:09] Epoch: 1 Batch: 19784/20099 (98.43%) Loss: 2.137851 LR: 0.00000602 [18:08:11] Epoch: 1 Batch: 19785/20099 (98.44%) Loss: 2.007523 LR: 0.00000602 [18:08:13] Epoch: 1 Batch: 19786/20099 (98.44%) Loss: 1.790818 LR: 0.00000602 [18:08:15] Epoch: 1 Batch: 19787/20099 (98.45%) Loss: 2.076005 LR: 0.00000602 [18:08:17] Epoch: 1 Batch: 19788/20099 (98.45%) Loss: 1.591262 LR: 0.00000602 [18:08:19] Epoch: 1 Batch: 19789/20099 (98.46%) Loss: 2.062265 LR: 0.00000602 [18:08:20] Epoch: 1 Batch: 19790/20099 (98.46%) Loss: 2.057992 LR: 0.00000602 [18:08:22] Epoch: 1 Batch: 19791/20099 (98.47%) Loss: 1.813899 LR: 0.00000602 [18:08:24] Epoch: 1 Batch: 19792/20099 (98.47%) Loss: 1.949007 LR: 0.00000602 [18:08:26] Epoch: 1 Batch: 19793/20099 (98.48%) Loss: 1.768726 LR: 0.00000602 [18:08:28] Epoch: 1 Batch: 19794/20099 (98.48%) Loss: 1.849072 LR: 0.00000602 [18:08:30] Epoch: 1 Batch: 19795/20099 (98.49%) Loss: 2.052927 LR: 0.00000602 [18:08:32] Epoch: 1 Batch: 19796/20099 (98.49%) Loss: 1.931696 LR: 0.00000602 [18:08:33] Epoch: 1 Batch: 19797/20099 (98.50%) Loss: 2.181660 LR: 0.00000602 [18:08:35] Epoch: 1 Batch: 19798/20099 (98.50%) Loss: 2.170081 LR: 0.00000602 [18:08:37] Epoch: 1 Batch: 19799/20099 (98.51%) Loss: 2.017310 LR: 0.00000602 [18:08:43] >> Cleaned up old temp checkpoint: epoch1_step17800 [18:08:43] >> Temp checkpoint saved: epoch1_step19800, size: 0.1693 GB [18:08:43] Epoch: 1 Batch: 19800/20099 (98.51%) Loss: 2.051925 LR: 0.00000602 [18:08:44] Epoch: 1 Batch: 19801/20099 (98.52%) Loss: 2.289562 LR: 0.00000602 [18:08:46] Epoch: 1 Batch: 19802/20099 (98.52%) Loss: 1.942138 LR: 0.00000602 [18:08:48] Epoch: 1 Batch: 19803/20099 (98.53%) Loss: 2.093971 LR: 0.00000602 [18:08:50] Epoch: 1 Batch: 19804/20099 (98.53%) Loss: 1.984400 LR: 0.00000602 [18:08:52] 
Epoch: 1 Batch: 19805/20099 (98.54%) Loss: 1.872719 LR: 0.00000602 [18:08:54] Epoch: 1 Batch: 19806/20099 (98.54%) Loss: 2.373830 LR: 0.00000602 [18:08:56] Epoch: 1 Batch: 19807/20099 (98.55%) Loss: 2.054818 LR: 0.00000602 [18:08:57] Epoch: 1 Batch: 19808/20099 (98.55%) Loss: 1.820733 LR: 0.00000602 [18:08:59] Epoch: 1 Batch: 19809/20099 (98.56%) Loss: 1.875537 LR: 0.00000602 [18:09:01] Epoch: 1 Batch: 19810/20099 (98.56%) Loss: 1.896393 LR: 0.00000602 [18:09:03] Epoch: 1 Batch: 19811/20099 (98.57%) Loss: 2.518976 LR: 0.00000602 [18:09:05] Epoch: 1 Batch: 19812/20099 (98.57%) Loss: 2.415454 LR: 0.00000602 [18:09:07] Epoch: 1 Batch: 19813/20099 (98.58%) Loss: 2.033613 LR: 0.00000602 [18:09:09] Epoch: 1 Batch: 19814/20099 (98.58%) Loss: 2.292053 LR: 0.00000602 [18:09:10] Epoch: 1 Batch: 19815/20099 (98.59%) Loss: 1.899283 LR: 0.00000602 [18:09:12] Epoch: 1 Batch: 19816/20099 (98.59%) Loss: 2.255978 LR: 0.00000602 [18:09:14] Epoch: 1 Batch: 19817/20099 (98.60%) Loss: 1.930538 LR: 0.00000602 [18:09:16] Epoch: 1 Batch: 19818/20099 (98.60%) Loss: 2.062823 LR: 0.00000602 [18:09:18] Epoch: 1 Batch: 19819/20099 (98.61%) Loss: 2.554799 LR: 0.00000602 [18:09:20] Epoch: 1 Batch: 19820/20099 (98.61%) Loss: 2.461972 LR: 0.00000602 [18:09:22] Epoch: 1 Batch: 19821/20099 (98.62%) Loss: 2.327936 LR: 0.00000602 [18:09:24] Epoch: 1 Batch: 19822/20099 (98.62%) Loss: 1.930173 LR: 0.00000602 [18:09:25] Epoch: 1 Batch: 19823/20099 (98.63%) Loss: 2.103911 LR: 0.00000602 [18:09:27] Epoch: 1 Batch: 19824/20099 (98.63%) Loss: 1.908548 LR: 0.00000602 [18:09:29] Epoch: 1 Batch: 19825/20099 (98.64%) Loss: 2.288671 LR: 0.00000602 [18:09:31] Epoch: 1 Batch: 19826/20099 (98.64%) Loss: 2.085632 LR: 0.00000602 [18:09:33] Epoch: 1 Batch: 19827/20099 (98.65%) Loss: 2.184224 LR: 0.00000602 [18:09:35] Epoch: 1 Batch: 19828/20099 (98.65%) Loss: 2.046353 LR: 0.00000602 [18:09:37] Epoch: 1 Batch: 19829/20099 (98.66%) Loss: 2.053692 LR: 0.00000602 [18:09:38] Epoch: 1 Batch: 19830/20099 (98.66%) Loss: 
1.720436 LR: 0.00000602 [18:09:40] Epoch: 1 Batch: 19831/20099 (98.67%) Loss: 2.022283 LR: 0.00000602 [18:09:42] Epoch: 1 Batch: 19832/20099 (98.67%) Loss: 2.128328 LR: 0.00000602 [18:09:44] Epoch: 1 Batch: 19833/20099 (98.68%) Loss: 1.924510 LR: 0.00000602 [18:09:46] Epoch: 1 Batch: 19834/20099 (98.68%) Loss: 2.167423 LR: 0.00000602 [18:09:48] Epoch: 1 Batch: 19835/20099 (98.69%) Loss: 2.149616 LR: 0.00000602 [18:09:50] Epoch: 1 Batch: 19836/20099 (98.69%) Loss: 2.190154 LR: 0.00000602 [18:09:51] Epoch: 1 Batch: 19837/20099 (98.70%) Loss: 2.251785 LR: 0.00000602 [18:09:53] Epoch: 1 Batch: 19838/20099 (98.70%) Loss: 1.972065 LR: 0.00000602 [18:09:55] Epoch: 1 Batch: 19839/20099 (98.71%) Loss: 2.179947 LR: 0.00000602 [18:09:57] Epoch: 1 Batch: 19840/20099 (98.71%) Loss: 1.809257 LR: 0.00000602 [18:09:59] Epoch: 1 Batch: 19841/20099 (98.72%) Loss: 1.841779 LR: 0.00000602 [18:10:01] Epoch: 1 Batch: 19842/20099 (98.72%) Loss: 1.967659 LR: 0.00000602 [18:10:02] Epoch: 1 Batch: 19843/20099 (98.73%) Loss: 2.036695 LR: 0.00000602 [18:10:04] Epoch: 1 Batch: 19844/20099 (98.73%) Loss: 2.005346 LR: 0.00000602 [18:10:06] Epoch: 1 Batch: 19845/20099 (98.74%) Loss: 2.079934 LR: 0.00000602 [18:10:08] Epoch: 1 Batch: 19846/20099 (98.74%) Loss: 2.151863 LR: 0.00000602 [18:10:10] Epoch: 1 Batch: 19847/20099 (98.75%) Loss: 2.093193 LR: 0.00000602 [18:10:12] Epoch: 1 Batch: 19848/20099 (98.75%) Loss: 1.804417 LR: 0.00000602 [18:10:14] Epoch: 1 Batch: 19849/20099 (98.76%) Loss: 2.175230 LR: 0.00000602 [18:10:15] Epoch: 1 Batch: 19850/20099 (98.76%) Loss: 2.240636 LR: 0.00000602 [18:10:17] Epoch: 1 Batch: 19851/20099 (98.77%) Loss: 2.130238 LR: 0.00000602 [18:10:19] Epoch: 1 Batch: 19852/20099 (98.77%) Loss: 2.086620 LR: 0.00000602 [18:10:21] Epoch: 1 Batch: 19853/20099 (98.78%) Loss: 2.367168 LR: 0.00000601 [18:10:23] Epoch: 1 Batch: 19854/20099 (98.78%) Loss: 1.819766 LR: 0.00000601 [18:10:25] Epoch: 1 Batch: 19855/20099 (98.79%) Loss: 2.123184 LR: 0.00000601 [18:10:27] Epoch: 1 
Batch: 19856/20099 (98.79%) Loss: 2.113590 LR: 0.00000601 [18:10:28] Epoch: 1 Batch: 19857/20099 (98.80%) Loss: 2.118157 LR: 0.00000601 [18:10:30] Epoch: 1 Batch: 19858/20099 (98.80%) Loss: 1.897917 LR: 0.00000601 [18:10:32] Epoch: 1 Batch: 19859/20099 (98.81%) Loss: 2.366243 LR: 0.00000601 [18:10:34] Epoch: 1 Batch: 19860/20099 (98.81%) Loss: 1.888029 LR: 0.00000601 [18:10:36] Epoch: 1 Batch: 19861/20099 (98.82%) Loss: 1.926271 LR: 0.00000601 [18:10:38] Epoch: 1 Batch: 19862/20099 (98.82%) Loss: 2.253131 LR: 0.00000601 [18:10:40] Epoch: 1 Batch: 19863/20099 (98.83%) Loss: 2.081148 LR: 0.00000601 [18:10:41] Epoch: 1 Batch: 19864/20099 (98.83%) Loss: 2.161395 LR: 0.00000601 [18:10:43] Epoch: 1 Batch: 19865/20099 (98.84%) Loss: 2.206704 LR: 0.00000601 [18:10:45] Epoch: 1 Batch: 19866/20099 (98.84%) Loss: 2.363537 LR: 0.00000601 [18:10:47] Epoch: 1 Batch: 19867/20099 (98.85%) Loss: 2.181461 LR: 0.00000601 [18:10:49] Epoch: 1 Batch: 19868/20099 (98.85%) Loss: 2.418191 LR: 0.00000601 [18:10:51] Epoch: 1 Batch: 19869/20099 (98.86%) Loss: 2.045240 LR: 0.00000601 [18:10:53] Epoch: 1 Batch: 19870/20099 (98.86%) Loss: 2.261252 LR: 0.00000601 [18:10:55] Epoch: 1 Batch: 19871/20099 (98.87%) Loss: 2.352535 LR: 0.00000601 [18:10:56] Epoch: 1 Batch: 19872/20099 (98.87%) Loss: 2.135545 LR: 0.00000601 [18:10:58] Epoch: 1 Batch: 19873/20099 (98.88%) Loss: 1.981014 LR: 0.00000601 [18:11:00] Epoch: 1 Batch: 19874/20099 (98.88%) Loss: 2.387724 LR: 0.00000601 [18:11:02] Epoch: 1 Batch: 19875/20099 (98.89%) Loss: 1.827366 LR: 0.00000601 [18:11:04] Epoch: 1 Batch: 19876/20099 (98.89%) Loss: 2.058333 LR: 0.00000601 [18:11:06] Epoch: 1 Batch: 19877/20099 (98.90%) Loss: 2.179856 LR: 0.00000601 [18:11:08] Epoch: 1 Batch: 19878/20099 (98.90%) Loss: 2.273023 LR: 0.00000601 [18:11:09] Epoch: 1 Batch: 19879/20099 (98.91%) Loss: 2.151248 LR: 0.00000601 [18:11:11] Epoch: 1 Batch: 19880/20099 (98.91%) Loss: 2.390322 LR: 0.00000601 [18:11:13] Epoch: 1 Batch: 19881/20099 (98.92%) Loss: 1.923298 LR: 
0.00000601 [18:11:15] Epoch: 1 Batch: 19882/20099 (98.92%) Loss: 2.094088 LR: 0.00000601 [18:11:17] Epoch: 1 Batch: 19883/20099 (98.93%) Loss: 2.342150 LR: 0.00000601 [18:11:19] Epoch: 1 Batch: 19884/20099 (98.93%) Loss: 1.903969 LR: 0.00000601 [18:11:21] Epoch: 1 Batch: 19885/20099 (98.94%) Loss: 2.450585 LR: 0.00000601 [18:11:23] Epoch: 1 Batch: 19886/20099 (98.94%) Loss: 2.299467 LR: 0.00000601 [18:11:24] Epoch: 1 Batch: 19887/20099 (98.95%) Loss: 2.011108 LR: 0.00000601 [18:11:26] Epoch: 1 Batch: 19888/20099 (98.95%) Loss: 2.216524 LR: 0.00000601 [18:11:28] Epoch: 1 Batch: 19889/20099 (98.96%) Loss: 2.243650 LR: 0.00000601 [18:11:30] Epoch: 1 Batch: 19890/20099 (98.96%) Loss: 2.025542 LR: 0.00000601 [18:11:32] Epoch: 1 Batch: 19891/20099 (98.97%) Loss: 2.224040 LR: 0.00000601 [18:11:34] Epoch: 1 Batch: 19892/20099 (98.97%) Loss: 2.126376 LR: 0.00000601 [18:11:35] Epoch: 1 Batch: 19893/20099 (98.98%) Loss: 2.143669 LR: 0.00000601 [18:11:37] Epoch: 1 Batch: 19894/20099 (98.98%) Loss: 1.997896 LR: 0.00000601 [18:11:39] Epoch: 1 Batch: 19895/20099 (98.99%) Loss: 2.360316 LR: 0.00000601 [18:11:41] Epoch: 1 Batch: 19896/20099 (98.99%) Loss: 1.992538 LR: 0.00000601 [18:11:43] Epoch: 1 Batch: 19897/20099 (98.99%) Loss: 1.909878 LR: 0.00000601 [18:11:45] Epoch: 1 Batch: 19898/20099 (99.00%) Loss: 2.474989 LR: 0.00000601 [18:11:47] Epoch: 1 Batch: 19899/20099 (99.00%) Loss: 2.139086 LR: 0.00000601 [18:11:48] Epoch: 1 Batch: 19900/20099 (99.01%) Loss: 2.263630 LR: 0.00000601 [18:11:50] Epoch: 1 Batch: 19901/20099 (99.01%) Loss: 2.060228 LR: 0.00000601 [18:11:52] Epoch: 1 Batch: 19902/20099 (99.02%) Loss: 2.183789 LR: 0.00000601 [18:11:54] Epoch: 1 Batch: 19903/20099 (99.02%) Loss: 2.224228 LR: 0.00000601 [18:11:56] Epoch: 1 Batch: 19904/20099 (99.03%) Loss: 2.058662 LR: 0.00000601 [18:11:58] Epoch: 1 Batch: 19905/20099 (99.03%) Loss: 2.343738 LR: 0.00000601 [18:12:00] Epoch: 1 Batch: 19906/20099 (99.04%) Loss: 1.906108 LR: 0.00000601 [18:12:02] Epoch: 1 Batch: 19907/20099 
(99.04%) Loss: 1.927742 LR: 0.00000601 [18:12:03] Epoch: 1 Batch: 19908/20099 (99.05%) Loss: 2.235109 LR: 0.00000601 [18:12:05] Epoch: 1 Batch: 19909/20099 (99.05%) Loss: 2.001861 LR: 0.00000601 [18:12:07] Epoch: 1 Batch: 19910/20099 (99.06%) Loss: 2.194060 LR: 0.00000601 [18:12:09] Epoch: 1 Batch: 19911/20099 (99.06%) Loss: 1.997604 LR: 0.00000601 [18:12:11] Epoch: 1 Batch: 19912/20099 (99.07%) Loss: 1.491943 LR: 0.00000601 [18:12:13] Epoch: 1 Batch: 19913/20099 (99.07%) Loss: 2.068338 LR: 0.00000601 [18:12:14] Epoch: 1 Batch: 19914/20099 (99.08%) Loss: 2.129529 LR: 0.00000601 [18:12:16] Epoch: 1 Batch: 19915/20099 (99.08%) Loss: 1.912435 LR: 0.00000601 [18:12:18] Epoch: 1 Batch: 19916/20099 (99.09%) Loss: 2.082454 LR: 0.00000601 [18:12:20] Epoch: 1 Batch: 19917/20099 (99.09%) Loss: 1.930790 LR: 0.00000601 [18:12:22] Epoch: 1 Batch: 19918/20099 (99.10%) Loss: 1.794702 LR: 0.00000601 [18:12:24] Epoch: 1 Batch: 19919/20099 (99.10%) Loss: 2.064025 LR: 0.00000601 [18:12:26] Epoch: 1 Batch: 19920/20099 (99.11%) Loss: 1.681873 LR: 0.00000601 [18:12:27] Epoch: 1 Batch: 19921/20099 (99.11%) Loss: 2.124971 LR: 0.00000601 [18:12:29] Epoch: 1 Batch: 19922/20099 (99.12%) Loss: 2.194179 LR: 0.00000601 [18:12:31] Epoch: 1 Batch: 19923/20099 (99.12%) Loss: 1.765832 LR: 0.00000601 [18:12:33] Epoch: 1 Batch: 19924/20099 (99.13%) Loss: 2.186311 LR: 0.00000601 [18:12:35] Epoch: 1 Batch: 19925/20099 (99.13%) Loss: 2.228449 LR: 0.00000601 [18:12:37] Epoch: 1 Batch: 19926/20099 (99.14%) Loss: 1.966350 LR: 0.00000601 [18:12:39] Epoch: 1 Batch: 19927/20099 (99.14%) Loss: 2.091926 LR: 0.00000601 [18:12:40] Epoch: 1 Batch: 19928/20099 (99.15%) Loss: 2.112078 LR: 0.00000601 [18:12:42] Epoch: 1 Batch: 19929/20099 (99.15%) Loss: 2.223415 LR: 0.00000601 [18:12:44] Epoch: 1 Batch: 19930/20099 (99.16%) Loss: 2.091291 LR: 0.00000601 [18:12:46] Epoch: 1 Batch: 19931/20099 (99.16%) Loss: 1.978330 LR: 0.00000601 [18:12:48] Epoch: 1 Batch: 19932/20099 (99.17%) Loss: 1.728313 LR: 0.00000601 [18:12:50] 
Epoch: 1 Batch: 19933/20099 (99.17%) Loss: 2.364385 LR: 0.00000601 [18:12:52] Epoch: 1 Batch: 19934/20099 (99.18%) Loss: 2.016558 LR: 0.00000601 [18:12:53] Epoch: 1 Batch: 19935/20099 (99.18%) Loss: 2.216303 LR: 0.00000601 [18:12:55] Epoch: 1 Batch: 19936/20099 (99.19%) Loss: 2.105827 LR: 0.00000601 [18:12:57] Epoch: 1 Batch: 19937/20099 (99.19%) Loss: 2.393059 LR: 0.00000601 [18:12:59] Epoch: 1 Batch: 19938/20099 (99.20%) Loss: 2.184569 LR: 0.00000601 [18:13:01] Epoch: 1 Batch: 19939/20099 (99.20%) Loss: 2.019128 LR: 0.00000601 [18:13:03] Epoch: 1 Batch: 19940/20099 (99.21%) Loss: 2.103800 LR: 0.00000601 [18:13:05] Epoch: 1 Batch: 19941/20099 (99.21%) Loss: 2.226369 LR: 0.00000601 [18:13:06] Epoch: 1 Batch: 19942/20099 (99.22%) Loss: 2.003755 LR: 0.00000601 [18:13:08] Epoch: 1 Batch: 19943/20099 (99.22%) Loss: 2.065042 LR: 0.00000601 [18:13:10] Epoch: 1 Batch: 19944/20099 (99.23%) Loss: 2.323166 LR: 0.00000601 [18:13:12] Epoch: 1 Batch: 19945/20099 (99.23%) Loss: 1.949398 LR: 0.00000601 [18:13:14] Epoch: 1 Batch: 19946/20099 (99.24%) Loss: 2.002255 LR: 0.00000601 [18:13:16] Epoch: 1 Batch: 19947/20099 (99.24%) Loss: 2.056298 LR: 0.00000601 [18:13:17] Epoch: 1 Batch: 19948/20099 (99.25%) Loss: 2.044523 LR: 0.00000601 [18:13:19] Epoch: 1 Batch: 19949/20099 (99.25%) Loss: 2.300149 LR: 0.00000601 [18:13:21] Epoch: 1 Batch: 19950/20099 (99.26%) Loss: 2.122986 LR: 0.00000601 [18:13:23] Epoch: 1 Batch: 19951/20099 (99.26%) Loss: 2.088663 LR: 0.00000601 [18:13:25] Epoch: 1 Batch: 19952/20099 (99.27%) Loss: 2.109361 LR: 0.00000601 [18:13:27] Epoch: 1 Batch: 19953/20099 (99.27%) Loss: 2.695279 LR: 0.00000601 [18:13:29] Epoch: 1 Batch: 19954/20099 (99.28%) Loss: 2.065400 LR: 0.00000601 [18:13:30] Epoch: 1 Batch: 19955/20099 (99.28%) Loss: 2.190050 LR: 0.00000601 [18:13:32] Epoch: 1 Batch: 19956/20099 (99.29%) Loss: 2.142442 LR: 0.00000601 [18:13:34] Epoch: 1 Batch: 19957/20099 (99.29%) Loss: 1.986890 LR: 0.00000601 [18:13:36] Epoch: 1 Batch: 19958/20099 (99.30%) Loss: 
1.784241 LR: 0.00000600 [18:13:38] Epoch: 1 Batch: 19959/20099 (99.30%) Loss: 2.032623 LR: 0.00000600 [18:13:40] Epoch: 1 Batch: 19960/20099 (99.31%) Loss: 2.015134 LR: 0.00000600 [18:13:42] Epoch: 1 Batch: 19961/20099 (99.31%) Loss: 1.981108 LR: 0.00000600 [18:13:43] Epoch: 1 Batch: 19962/20099 (99.32%) Loss: 1.877672 LR: 0.00000600 [18:13:45] Epoch: 1 Batch: 19963/20099 (99.32%) Loss: 1.990401 LR: 0.00000600 [18:13:47] Epoch: 1 Batch: 19964/20099 (99.33%) Loss: 2.051068 LR: 0.00000600 [18:13:49] Epoch: 1 Batch: 19965/20099 (99.33%) Loss: 2.449492 LR: 0.00000600 [18:13:51] Epoch: 1 Batch: 19966/20099 (99.34%) Loss: 2.365175 LR: 0.00000600 [18:13:53] Epoch: 1 Batch: 19967/20099 (99.34%) Loss: 1.904671 LR: 0.00000600 [18:13:55] Epoch: 1 Batch: 19968/20099 (99.35%) Loss: 1.993899 LR: 0.00000600 [18:13:56] Epoch: 1 Batch: 19969/20099 (99.35%) Loss: 1.825162 LR: 0.00000600 [18:13:58] Epoch: 1 Batch: 19970/20099 (99.36%) Loss: 2.120810 LR: 0.00000600 [18:14:00] Epoch: 1 Batch: 19971/20099 (99.36%) Loss: 2.126808 LR: 0.00000600 [18:14:02] Epoch: 1 Batch: 19972/20099 (99.37%) Loss: 2.258442 LR: 0.00000600 [18:14:04] Epoch: 1 Batch: 19973/20099 (99.37%) Loss: 1.917770 LR: 0.00000600 [18:14:06] Epoch: 1 Batch: 19974/20099 (99.38%) Loss: 1.709959 LR: 0.00000600 [18:14:08] Epoch: 1 Batch: 19975/20099 (99.38%) Loss: 2.094954 LR: 0.00000600 [18:14:09] Epoch: 1 Batch: 19976/20099 (99.39%) Loss: 1.951792 LR: 0.00000600 [18:14:11] Epoch: 1 Batch: 19977/20099 (99.39%) Loss: 1.990114 LR: 0.00000600 [18:14:13] Epoch: 1 Batch: 19978/20099 (99.40%) Loss: 2.174391 LR: 0.00000600 [18:14:15] Epoch: 1 Batch: 19979/20099 (99.40%) Loss: 1.989349 LR: 0.00000600 [18:14:17] Epoch: 1 Batch: 19980/20099 (99.41%) Loss: 2.005177 LR: 0.00000600 [18:14:19] Epoch: 1 Batch: 19981/20099 (99.41%) Loss: 2.212886 LR: 0.00000600 [18:14:21] Epoch: 1 Batch: 19982/20099 (99.42%) Loss: 1.738271 LR: 0.00000600 [18:14:22] Epoch: 1 Batch: 19983/20099 (99.42%) Loss: 1.654482 LR: 0.00000600 [18:14:24] Epoch: 1 
Batch: 19984/20099 (99.43%) Loss: 1.832176 LR: 0.00000600 [18:14:26] Epoch: 1 Batch: 19985/20099 (99.43%) Loss: 1.949587 LR: 0.00000600 [18:14:28] Epoch: 1 Batch: 19986/20099 (99.44%) Loss: 2.049768 LR: 0.00000600 [18:14:30] Epoch: 1 Batch: 19987/20099 (99.44%) Loss: 2.207282 LR: 0.00000600 [18:14:32] Epoch: 1 Batch: 19988/20099 (99.45%) Loss: 2.254671 LR: 0.00000600 [18:14:33] Epoch: 1 Batch: 19989/20099 (99.45%) Loss: 1.989869 LR: 0.00000600 [18:14:35] Epoch: 1 Batch: 19990/20099 (99.46%) Loss: 2.359429 LR: 0.00000600 [18:14:37] Epoch: 1 Batch: 19991/20099 (99.46%) Loss: 2.252242 LR: 0.00000600 [18:14:39] Epoch: 1 Batch: 19992/20099 (99.47%) Loss: 2.124191 LR: 0.00000600 [18:14:41] Epoch: 1 Batch: 19993/20099 (99.47%) Loss: 2.152019 LR: 0.00000600 [18:14:43] Epoch: 1 Batch: 19994/20099 (99.48%) Loss: 2.158744 LR: 0.00000600 [18:14:45] Epoch: 1 Batch: 19995/20099 (99.48%) Loss: 2.042598 LR: 0.00000600 [18:14:46] Epoch: 1 Batch: 19996/20099 (99.49%) Loss: 2.274831 LR: 0.00000600 [18:14:48] Epoch: 1 Batch: 19997/20099 (99.49%) Loss: 2.210019 LR: 0.00000600 [18:14:50] Epoch: 1 Batch: 19998/20099 (99.50%) Loss: 2.021503 LR: 0.00000600 [18:14:52] Epoch: 1 Batch: 19999/20099 (99.50%) Loss: 2.177226 LR: 0.00000600 [18:14:54] >> Evaluating batch 0 [18:14:55] >> Evaluating batch 1 [18:14:56] >> Evaluating batch 2 [18:14:57] >> Evaluating batch 3 [18:14:58] >> Evaluating batch 4 [18:14:59] >> Evaluating batch 5 [18:15:00] >> Evaluating batch 6 [18:15:01] >> Evaluating batch 7 [18:15:02] >> Evaluating batch 8 [18:15:04] >> Evaluating batch 9 [18:15:05] >> Evaluating batch 10 [18:15:06] >> Evaluating batch 11 [18:15:07] >> Evaluating batch 12 [18:15:08] >> Evaluating batch 13 [18:15:09] >> Evaluating batch 14 [18:15:10] >> Evaluating batch 15 [18:15:11] >> Evaluating batch 16 [18:15:11] Epoch: 1 Step: 20000/20099 Evaluation: [18:15:11] Avg Loss Since Last Eval: 2.0924 Val Loss: 2.1444 Validation loss delta: -0.0022 Perplexity: 8.5367 LR: 0.00000600 [18:15:15] >> Cleaned up 
old temp checkpoint: epoch1_step18000 [18:15:15] >> Temp checkpoint saved: epoch1_step20000, size: 0.1693 GB [18:15:18] >> Checkpoint saved: epoch1_step20000, size: 0.1693 GB [18:15:18] Epoch: 1 Batch: 20000/20099 (99.51%) Loss: 1.895693 LR: 0.00000600 [18:15:20] Epoch: 1 Batch: 20001/20099 (99.51%) Loss: 2.118155 LR: 0.00000600 [18:15:22] Epoch: 1 Batch: 20002/20099 (99.52%) Loss: 1.900028 LR: 0.00000600 [18:15:24] Epoch: 1 Batch: 20003/20099 (99.52%) Loss: 1.885696 LR: 0.00000600 [18:15:26] Epoch: 1 Batch: 20004/20099 (99.53%) Loss: 2.298121 LR: 0.00000600 [18:15:27] Epoch: 1 Batch: 20005/20099 (99.53%) Loss: 2.129913 LR: 0.00000600 [18:15:29] Epoch: 1 Batch: 20006/20099 (99.54%) Loss: 2.128814 LR: 0.00000600 [18:15:31] Epoch: 1 Batch: 20007/20099 (99.54%) Loss: 2.134611 LR: 0.00000600 [18:15:33] Epoch: 1 Batch: 20008/20099 (99.55%) Loss: 1.945534 LR: 0.00000600 [18:15:35] Epoch: 1 Batch: 20009/20099 (99.55%) Loss: 2.313602 LR: 0.00000600 [18:15:37] Epoch: 1 Batch: 20010/20099 (99.56%) Loss: 2.076821 LR: 0.00000600 [18:15:39] Epoch: 1 Batch: 20011/20099 (99.56%) Loss: 2.217935 LR: 0.00000600 [18:15:41] Epoch: 1 Batch: 20012/20099 (99.57%) Loss: 1.920306 LR: 0.00000600 [18:15:43] Epoch: 1 Batch: 20013/20099 (99.57%) Loss: 2.286664 LR: 0.00000600 [18:15:45] Epoch: 1 Batch: 20014/20099 (99.58%) Loss: 1.826551 LR: 0.00000600 [18:15:46] Epoch: 1 Batch: 20015/20099 (99.58%) Loss: 2.038226 LR: 0.00000600 [18:15:48] Epoch: 1 Batch: 20016/20099 (99.59%) Loss: 2.308253 LR: 0.00000600 [18:15:50] Epoch: 1 Batch: 20017/20099 (99.59%) Loss: 2.430830 LR: 0.00000600 [18:15:52] Epoch: 1 Batch: 20018/20099 (99.60%) Loss: 2.111906 LR: 0.00000600 [18:15:54] Epoch: 1 Batch: 20019/20099 (99.60%) Loss: 1.901409 LR: 0.00000600 [18:15:56] Epoch: 1 Batch: 20020/20099 (99.61%) Loss: 1.697669 LR: 0.00000600 [18:15:58] Epoch: 1 Batch: 20021/20099 (99.61%) Loss: 1.972427 LR: 0.00000600 [18:16:00] Epoch: 1 Batch: 20022/20099 (99.62%) Loss: 2.280633 LR: 0.00000600 [18:16:01] Epoch: 1 Batch: 
20023/20099 (99.62%) Loss: 2.210963 LR: 0.00000600 [18:16:03] Epoch: 1 Batch: 20024/20099 (99.63%) Loss: 2.193432 LR: 0.00000600 [18:16:05] Epoch: 1 Batch: 20025/20099 (99.63%) Loss: 2.002573 LR: 0.00000600 [18:16:07] Epoch: 1 Batch: 20026/20099 (99.64%) Loss: 2.376453 LR: 0.00000600 [18:16:09] Epoch: 1 Batch: 20027/20099 (99.64%) Loss: 2.042660 LR: 0.00000600 [18:16:11] Epoch: 1 Batch: 20028/20099 (99.65%) Loss: 1.959887 LR: 0.00000600 [18:16:13] Epoch: 1 Batch: 20029/20099 (99.65%) Loss: 2.093671 LR: 0.00000600 [18:16:14] Epoch: 1 Batch: 20030/20099 (99.66%) Loss: 2.195789 LR: 0.00000600 [18:16:16] Epoch: 1 Batch: 20031/20099 (99.66%) Loss: 2.164001 LR: 0.00000600 [18:16:18] Epoch: 1 Batch: 20032/20099 (99.67%) Loss: 1.986731 LR: 0.00000600 [18:16:20] Epoch: 1 Batch: 20033/20099 (99.67%) Loss: 2.271636 LR: 0.00000600 [18:16:22] Epoch: 1 Batch: 20034/20099 (99.68%) Loss: 2.018275 LR: 0.00000600 [18:16:24] Epoch: 1 Batch: 20035/20099 (99.68%) Loss: 2.099380 LR: 0.00000600 [18:16:25] Epoch: 1 Batch: 20036/20099 (99.69%) Loss: 2.018587 LR: 0.00000600 [18:16:27] Epoch: 1 Batch: 20037/20099 (99.69%) Loss: 2.019433 LR: 0.00000600 [18:16:29] Epoch: 1 Batch: 20038/20099 (99.70%) Loss: 2.112030 LR: 0.00000600 [18:16:31] Epoch: 1 Batch: 20039/20099 (99.70%) Loss: 2.167210 LR: 0.00000600 [18:16:33] Epoch: 1 Batch: 20040/20099 (99.71%) Loss: 2.285786 LR: 0.00000600 [18:16:35] Epoch: 1 Batch: 20041/20099 (99.71%) Loss: 2.189596 LR: 0.00000600 [18:16:37] Epoch: 1 Batch: 20042/20099 (99.72%) Loss: 2.201477 LR: 0.00000600 [18:16:38] Epoch: 1 Batch: 20043/20099 (99.72%) Loss: 1.802700 LR: 0.00000600 [18:16:40] Epoch: 1 Batch: 20044/20099 (99.73%) Loss: 1.943753 LR: 0.00000600 [18:16:42] Epoch: 1 Batch: 20045/20099 (99.73%) Loss: 1.957204 LR: 0.00000600 [18:16:44] Epoch: 1 Batch: 20046/20099 (99.74%) Loss: 2.256362 LR: 0.00000600 [18:16:46] Epoch: 1 Batch: 20047/20099 (99.74%) Loss: 2.068656 LR: 0.00000600 [18:16:48] Epoch: 1 Batch: 20048/20099 (99.75%) Loss: 1.874844 LR: 
0.00000600 [18:16:49] Epoch: 1 Batch: 20049/20099 (99.75%) Loss: 1.974484 LR: 0.00000600 [18:16:51] Epoch: 1 Batch: 20050/20099 (99.76%) Loss: 2.000124 LR: 0.00000600 [18:16:53] Epoch: 1 Batch: 20051/20099 (99.76%) Loss: 2.147340 LR: 0.00000600 [18:16:55] Epoch: 1 Batch: 20052/20099 (99.77%) Loss: 2.334250 LR: 0.00000600 [18:16:57] Epoch: 1 Batch: 20053/20099 (99.77%) Loss: 2.299127 LR: 0.00000600 [18:16:59] Epoch: 1 Batch: 20054/20099 (99.78%) Loss: 1.934283 LR: 0.00000600 [18:17:01] Epoch: 1 Batch: 20055/20099 (99.78%) Loss: 1.882757 LR: 0.00000600 [18:17:02] Epoch: 1 Batch: 20056/20099 (99.79%) Loss: 2.052042 LR: 0.00000600 [18:17:04] Epoch: 1 Batch: 20057/20099 (99.79%) Loss: 2.233625 LR: 0.00000600 [18:17:06] Epoch: 1 Batch: 20058/20099 (99.80%) Loss: 2.161074 LR: 0.00000600 [18:17:08] Epoch: 1 Batch: 20059/20099 (99.80%) Loss: 2.036236 LR: 0.00000600 [18:17:10] Epoch: 1 Batch: 20060/20099 (99.81%) Loss: 1.957123 LR: 0.00000600 [18:17:12] Epoch: 1 Batch: 20061/20099 (99.81%) Loss: 1.790294 LR: 0.00000600 [18:17:14] Epoch: 1 Batch: 20062/20099 (99.82%) Loss: 2.163150 LR: 0.00000600 [18:17:15] Epoch: 1 Batch: 20063/20099 (99.82%) Loss: 2.443207 LR: 0.00000600 [18:17:17] Epoch: 1 Batch: 20064/20099 (99.83%) Loss: 1.768087 LR: 0.00000600 [18:17:19] Epoch: 1 Batch: 20065/20099 (99.83%) Loss: 2.161706 LR: 0.00000600 [18:17:21] Epoch: 1 Batch: 20066/20099 (99.84%) Loss: 1.910872 LR: 0.00000600 [18:17:23] Epoch: 1 Batch: 20067/20099 (99.84%) Loss: 2.215328 LR: 0.00000600 [18:17:25] Epoch: 1 Batch: 20068/20099 (99.85%) Loss: 2.215004 LR: 0.00000600 [18:17:27] Epoch: 1 Batch: 20069/20099 (99.85%) Loss: 2.040601 LR: 0.00000600 [18:17:28] Epoch: 1 Batch: 20070/20099 (99.86%) Loss: 1.849070 LR: 0.00000600 [18:17:30] Epoch: 1 Batch: 20071/20099 (99.86%) Loss: 2.029732 LR: 0.00000600 [18:17:32] Epoch: 1 Batch: 20072/20099 (99.87%) Loss: 2.323634 LR: 0.00000600 [18:17:34] Epoch: 1 Batch: 20073/20099 (99.87%) Loss: 2.117156 LR: 0.00000600 [18:17:36] Epoch: 1 Batch: 20074/20099 
(99.88%) Loss: 1.723173 LR: 0.00000600 [18:17:38] Epoch: 1 Batch: 20075/20099 (99.88%) Loss: 1.981044 LR: 0.00000600 [18:17:40] Epoch: 1 Batch: 20076/20099 (99.89%) Loss: 2.037143 LR: 0.00000600 [18:17:41] Epoch: 1 Batch: 20077/20099 (99.89%) Loss: 1.958102 LR: 0.00000600 [18:17:43] Epoch: 1 Batch: 20078/20099 (99.90%) Loss: 1.980074 LR: 0.00000600 [18:17:45] Epoch: 1 Batch: 20079/20099 (99.90%) Loss: 2.210082 LR: 0.00000600 [18:17:47] Epoch: 1 Batch: 20080/20099 (99.91%) Loss: 2.048256 LR: 0.00000600 [18:17:49] Epoch: 1 Batch: 20081/20099 (99.91%) Loss: 2.119569 LR: 0.00000600 [18:17:51] Epoch: 1 Batch: 20082/20099 (99.92%) Loss: 1.897750 LR: 0.00000600 [18:17:53] Epoch: 1 Batch: 20083/20099 (99.92%) Loss: 2.091305 LR: 0.00000600 [18:17:54] Epoch: 1 Batch: 20084/20099 (99.93%) Loss: 2.280691 LR: 0.00000600 [18:17:56] Epoch: 1 Batch: 20085/20099 (99.93%) Loss: 2.147128 LR: 0.00000600 [18:17:58] Epoch: 1 Batch: 20086/20099 (99.94%) Loss: 2.006140 LR: 0.00000600 [18:18:00] Epoch: 1 Batch: 20087/20099 (99.94%) Loss: 2.240511 LR: 0.00000600 [18:18:02] Epoch: 1 Batch: 20088/20099 (99.95%) Loss: 1.995956 LR: 0.00000600 [18:18:04] Epoch: 1 Batch: 20089/20099 (99.95%) Loss: 2.086977 LR: 0.00000600 [18:18:05] Epoch: 1 Batch: 20090/20099 (99.96%) Loss: 1.993428 LR: 0.00000600 [18:18:07] Epoch: 1 Batch: 20091/20099 (99.96%) Loss: 2.169419 LR: 0.00000600 [18:18:09] Epoch: 1 Batch: 20092/20099 (99.97%) Loss: 2.396623 LR: 0.00000600 [18:18:11] Epoch: 1 Batch: 20093/20099 (99.97%) Loss: 1.850003 LR: 0.00000600 [18:18:13] Epoch: 1 Batch: 20094/20099 (99.98%) Loss: 1.873596 LR: 0.00000600 [18:18:15] Epoch: 1 Batch: 20095/20099 (99.98%) Loss: 1.948931 LR: 0.00000600 [18:18:17] Epoch: 1 Batch: 20096/20099 (99.99%) Loss: 2.426093 LR: 0.00000600 [18:18:18] Epoch: 1 Batch: 20097/20099 (99.99%) Loss: 2.223984 LR: 0.00000600 [18:18:20] Epoch: 1 Batch: 20098/20099 (100.00%) Loss: 1.950486 LR: 0.00000600 [18:18:21] Epoch: 1 Batch: 20099/20099 (100.00%) Loss: 2.265170 LR: 0.00000600 
[18:18:21] CPU usage: 64.4%, RAM usage: 30.6% [18:18:21] Memory cleanup after epoch 1 [18:18:23] CPU usage: 53.0%, RAM usage: 30.6% [18:18:23] Epoch 1 average loss: 0.8819 [18:18:23] >> Evaluating batch 0 [18:18:24] >> Evaluating batch 1 [18:18:25] >> Evaluating batch 2 [18:18:26] >> Evaluating batch 3 [18:18:27] >> Evaluating batch 4 [18:18:28] >> Evaluating batch 5 [18:18:29] >> Evaluating batch 6 [18:18:31] >> Evaluating batch 7 [18:18:32] >> Evaluating batch 8 [18:18:33] >> Evaluating batch 9 [18:18:34] >> Evaluating batch 10 [18:18:35] >> Evaluating batch 11 [18:18:36] >> Evaluating batch 12 [18:18:37] >> Evaluating batch 13 [18:18:38] >> Evaluating batch 14 [18:18:39] >> Evaluating batch 15 [18:18:40] >> Evaluating batch 16 [18:18:40] Epoch: 1 Step: 20099/20099 Evaluation: [18:18:40] Val Loss: 2.1445 Perplexity: 8.5377 LR: 0.00000600 [18:18:40] Epoch 1 completed in 16320.25 seconds [18:18:44] >> Checkpoint saved: epoch1_complete, size: 0.1690 GB [18:18:48] >> Cleaned up old temp checkpoint: epoch1_step18200 [18:18:48] >> Temp checkpoint saved: epoch1_step20099, size: 0.1690 GB [18:18:48] Training complete. 
[20:41:35] 2025-08-24 [20:41:36] Tesla T4 [20:41:36]
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Active memory         |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Requested memory      |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| GPU reserved memory   |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Non-releasable memory |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Allocations           |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Active allocs         |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize allocations  |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize GPU segments |       0    |       0    |       0    |       0    |
|===========================================================================|
[20:41:36] CPU usage: 94.8%, RAM usage: 27.2% [20:41:36] Running with the following configuration: [20:41:36] model_name: 
/content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B [20:41:36] tokenizer: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B [20:41:36] output_dir: /content/drive/MyDrive/llm/Discord-Hermes-3-8B [20:41:36] train_path: /content/drive/MyDrive/data/None156_fix.csv [20:41:36] checkpoint: /content/drive/MyDrive/llm/Discord-Hermes-3-8B/epoch1_step19000 [20:41:36] lr: 3e-05 [20:41:36] lr_floor: 6e-06 [20:41:37] epochs: 1 [20:41:37] batch_size: 5 [20:41:37] accum_steps: 7 [20:41:37] val_batch_size: 6 [20:41:37] max_val_size: 100 [20:41:37] max_length: 150 [20:41:37] save_temp_frequency: 200 [20:41:37] save_frequency: 500 [20:41:37] eval_frequency: 500 [20:41:37] save_pattern: y [20:41:37] quantization: y [20:41:37] quantization_bits: 4 [20:41:37] lora: y [20:41:37] frozen_lora_path: None [20:41:37] lora_rank: 16 [20:41:37] lora_alpha: 32 [20:41:37] lora_dropout: 0.1 [20:41:37] optimizer_weight_decay: 0.0 [20:41:37] warmup_type: cosine [20:41:37] warmup_ratio: 0.08 [20:41:37] warmup_steps: 550 [20:41:37] shuffle: y [20:41:37] csv_column: text [20:41:37] new_run: n [20:41:37] label_smoothing: 0.05 [20:41:37] SEED: 1 [20:41:37] Using device: cuda [20:41:37] Resuming from temp checkpoint: /content/drive/MyDrive/llm/Discord-Hermes-3-8B/epoch1_step19000 [20:48:09] Embeddings shape after: torch.Size([128256, 4096]) [20:48:19] Loaded trainable LoRA adapter from /content/drive/MyDrive/llm/Discord-Hermes-3-8B/epoch1_step19000 [20:48:19] Trainable LoRA 'default': [20:48:19] task_type: CAUSAL_LM [20:48:19] peft_type: PeftType.LORA [20:48:19] auto_mapping: None [20:48:19] base_model_name_or_path: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B [20:48:19] revision: None [20:48:19] inference_mode: False [20:48:19] r: 16 [20:48:19] target_modules: {'q_proj', 'k_proj', 'v_proj', 'o_proj'} [20:48:19] exclude_modules: None [20:48:19] lora_alpha: 32 [20:48:19] lora_dropout: 0.1 [20:48:19] fan_in_fan_out: False [20:48:19] bias: none [20:48:19] use_rslora: 
True [20:48:19] modules_to_save: None [20:48:19] init_lora_weights: True [20:48:19] layers_to_transform: None [20:48:19] layers_pattern: None [20:48:19] rank_pattern: {} [20:48:19] alpha_pattern: {} [20:48:19] megatron_config: None [20:48:19] megatron_core: megatron.core [20:48:19] trainable_token_indices: None [20:48:19] loftq_config: {} [20:48:19] eva_config: None [20:48:19] corda_config: None [20:48:19] use_dora: False [20:48:19] use_qalora: False [20:48:19] qalora_group_size: 16 [20:48:19] layer_replication: None [20:48:19] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) [20:48:19] lora_bias: False [20:48:19] target_parameters: None [20:48:19] _custom_modules: None [20:48:19] Embeddings shape after: torch.Size([128256, 4096]) [20:48:34] Resumed from epoch 1, step 19001, file 1 [20:48:34] Starting from CSV file... [20:48:39] Splitting data into chunks of 11000... [20:48:39] Using 7 processes across 10 chunks [20:48:39] Using saved train/val split from checkpoint. [20:48:39] Resuming scheduler with warmup steps: 229, total steps: 2871 [20:48:39] Initializing scheduler with cosine schedule with warmup, warmup steps 550, total steps: 2871 [20:48:39] Train/Val split: 100492 train, 100 val samples. 
[20:48:48] Model: PeftModelForCausalLM [20:48:48] Model config: LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": 128040, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pretraining_tp": 1, "quantization_config": { "_load_in_4bit": true, "_load_in_8bit": false, "bnb_4bit_compute_dtype": "float16", "bnb_4bit_quant_storage": "uint8", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "llm_int8_enable_fp32_cpu_offload": false, "llm_int8_has_fp16_weight": false, "llm_int8_skip_modules": [ "lm_head" ], "llm_int8_threshold": 6.0, "load_in_4bit": true, "load_in_8bit": false, "quant_method": "bitsandbytes" }, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.55.2", "use_cache": true, "vocab_size": 128256 } [20:48:48] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 [20:48:48] Optimizer: PagedAdamW ( Parameter Group 0 alpha: 0.0 betas: (0.9, 0.95) eps: 1e-08 initial_lr: 3e-05 lr: 0.0 t_alpha: None t_beta3: None weight_decay: 0.0 ) [20:48:48] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 [20:48:48] Scheduler: [20:48:48] Training on 100492 training samples, 100 validation samples [20:48:48] Average tokens per sample: 150.00 [20:48:48] Estimated epoch time: ~301.73 min [20:48:48] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | 
cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 5986 MiB | 7004 MiB | 335397 MiB | 329410 MiB | |---------------------------------------------------------------------------| | Active memory | 5986 MiB | 7004 MiB | 335397 MiB | 329410 MiB | |---------------------------------------------------------------------------| | Requested memory | 5983 MiB | 7000 MiB | 335022 MiB | 329039 MiB | |---------------------------------------------------------------------------| | GPU reserved memory | 7248 MiB | 7248 MiB | 7248 MiB | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 1261 MiB | 5879 MiB | 328754 MiB | 327493 MiB | |---------------------------------------------------------------------------| | Allocations | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | Active allocs | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | GPU reserved segments | 185 | 185 | 185 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 36 | 36 | 13826 | 13790 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [20:48:48] Restoring shuffle indices from training state for epoch 1 [20:48:48] CPU usage: 42.2%, RAM usage: 36.8% [20:48:49] Epoch 1 learning rate: 0.0 [20:48:49] Starting epoch 1 [20:49:50] Batch 19001: input_ids shape torch.Size([5, 150]), attention_mask shape torch.Size([5, 
150]) [20:49:51] Epoch: 1 Batch: 19001/20099 (94.54%) Loss: 1.978112 LR: 0.00000000 [20:49:53] Epoch: 1 Batch: 19002/20099 (94.54%) Loss: 2.163214 LR: 0.00000000 [20:49:55] Epoch: 1 Batch: 19003/20099 (94.55%) Loss: 2.210306 LR: 0.00000000 [20:49:56] Epoch: 1 Batch: 19004/20099 (94.55%) Loss: 2.462724 LR: 0.00000000 [20:49:58] Epoch: 1 Batch: 19005/20099 (94.56%) Loss: 2.192734 LR: 0.00000000 [20:50:00] Epoch: 1 Batch: 19006/20099 (94.56%) Loss: 2.213096 LR: 0.00000000 [20:50:01] Epoch: 1 Batch: 19007/20099 (94.57%) Loss: 2.035861 LR: 0.00000627 [20:50:03] Epoch: 1 Batch: 19008/20099 (94.57%) Loss: 2.430056 LR: 0.00000627 [20:50:05] Epoch: 1 Batch: 19009/20099 (94.58%) Loss: 2.280874 LR: 0.00000627 [20:50:06] Epoch: 1 Batch: 19010/20099 (94.58%) Loss: 1.900641 LR: 0.00000627 [20:50:08] Epoch: 1 Batch: 19011/20099 (94.59%) Loss: 2.030517 LR: 0.00000627 [20:50:10] Epoch: 1 Batch: 19012/20099 (94.59%) Loss: 2.103296 LR: 0.00000627 [20:50:12] Epoch: 1 Batch: 19013/20099 (94.60%) Loss: 2.395484 LR: 0.00000627 [20:50:13] Epoch: 1 Batch: 19014/20099 (94.60%) Loss: 2.102773 LR: 0.00000627 [20:50:15] Epoch: 1 Batch: 19015/20099 (94.61%) Loss: 2.169385 LR: 0.00000627 [20:50:17] Epoch: 1 Batch: 19016/20099 (94.61%) Loss: 1.819225 LR: 0.00000627 [20:50:18] Epoch: 1 Batch: 19017/20099 (94.62%) Loss: 1.983545 LR: 0.00000627 [20:50:20] Epoch: 1 Batch: 19018/20099 (94.62%) Loss: 2.136305 LR: 0.00000627 [20:50:22] Epoch: 1 Batch: 19019/20099 (94.63%) Loss: 2.064129 LR: 0.00000627 [20:50:24] Epoch: 1 Batch: 19020/20099 (94.63%) Loss: 1.902418 LR: 0.00000627 [20:50:25] Epoch: 1 Batch: 19021/20099 (94.64%) Loss: 1.938429 LR: 0.00000626 [20:50:27] Epoch: 1 Batch: 19022/20099 (94.64%) Loss: 1.968538 LR: 0.00000626 [20:50:29] Epoch: 1 Batch: 19023/20099 (94.65%) Loss: 2.344552 LR: 0.00000626 [20:50:31] Epoch: 1 Batch: 19024/20099 (94.65%) Loss: 1.947167 LR: 0.00000626 [20:50:32] Epoch: 1 Batch: 19025/20099 (94.66%) Loss: 2.168672 LR: 0.00000626 [20:50:34] Epoch: 1 Batch: 19026/20099 
(94.66%) Loss: 1.889120 LR: 0.00000626 [20:50:36] Epoch: 1 Batch: 19027/20099 (94.67%) Loss: 2.061914 LR: 0.00000626 [20:50:38] Epoch: 1 Batch: 19028/20099 (94.67%) Loss: 1.961316 LR: 0.00000626 [20:50:39] Epoch: 1 Batch: 19029/20099 (94.68%) Loss: 1.922917 LR: 0.00000626 [20:50:41] Epoch: 1 Batch: 19030/20099 (94.68%) Loss: 2.051198 LR: 0.00000626 [20:50:43] Epoch: 1 Batch: 19031/20099 (94.69%) Loss: 1.791793 LR: 0.00000626 [20:50:45] Epoch: 1 Batch: 19032/20099 (94.69%) Loss: 1.950352 LR: 0.00000626 [20:50:46] Epoch: 1 Batch: 19033/20099 (94.70%) Loss: 2.182775 LR: 0.00000626 [20:50:48] Epoch: 1 Batch: 19034/20099 (94.70%) Loss: 2.177331 LR: 0.00000626 [20:50:50] Epoch: 1 Batch: 19035/20099 (94.71%) Loss: 1.749549 LR: 0.00000626 [20:50:51] Epoch: 1 Batch: 19036/20099 (94.71%) Loss: 2.126160 LR: 0.00000626 [20:50:53] Epoch: 1 Batch: 19037/20099 (94.72%) Loss: 2.146404 LR: 0.00000626 [20:50:55] Epoch: 1 Batch: 19038/20099 (94.72%) Loss: 1.955635 LR: 0.00000626 [20:50:56] Epoch: 1 Batch: 19039/20099 (94.73%) Loss: 2.152267 LR: 0.00000626 [20:50:58] Epoch: 1 Batch: 19040/20099 (94.73%) Loss: 2.099615 LR: 0.00000626 [20:51:00] Epoch: 1 Batch: 19041/20099 (94.74%) Loss: 1.976633 LR: 0.00000626 [20:51:02] Epoch: 1 Batch: 19042/20099 (94.74%) Loss: 2.129047 LR: 0.00000625 [20:51:03] Epoch: 1 Batch: 19043/20099 (94.75%) Loss: 1.988193 LR: 0.00000625 [20:51:05] Epoch: 1 Batch: 19044/20099 (94.75%) Loss: 2.221761 LR: 0.00000625 [20:51:07] Epoch: 1 Batch: 19045/20099 (94.76%) Loss: 1.653421 LR: 0.00000625 [20:51:08] Epoch: 1 Batch: 19046/20099 (94.76%) Loss: 2.268929 LR: 0.00000625 [20:51:10] Epoch: 1 Batch: 19047/20099 (94.77%) Loss: 1.984380 LR: 0.00000625 [20:51:12] Epoch: 1 Batch: 19048/20099 (94.77%) Loss: 1.642195 LR: 0.00000625 [20:51:13] Epoch: 1 Batch: 19049/20099 (94.78%) Loss: 2.097610 LR: 0.00000625 [20:51:15] Epoch: 1 Batch: 19050/20099 (94.78%) Loss: 2.115116 LR: 0.00000625 [20:51:17] Epoch: 1 Batch: 19051/20099 (94.79%) Loss: 2.317827 LR: 0.00000625 [20:51:18] 
Epoch: 1 Batch: 19052/20099 (94.79%) Loss: 1.668196 LR: 0.00000625 [20:51:20] Epoch: 1 Batch: 19053/20099 (94.80%) Loss: 2.269184 LR: 0.00000625 [20:51:22] Epoch: 1 Batch: 19054/20099 (94.80%) Loss: 2.229957 LR: 0.00000625 [20:51:23] Epoch: 1 Batch: 19055/20099 (94.81%) Loss: 1.947311 LR: 0.00000625 [20:51:25] Epoch: 1 Batch: 19056/20099 (94.81%) Loss: 2.133463 LR: 0.00000625 [20:51:27] Epoch: 1 Batch: 19057/20099 (94.82%) Loss: 2.061038 LR: 0.00000625 [20:51:28] Epoch: 1 Batch: 19058/20099 (94.82%) Loss: 2.114843 LR: 0.00000625 [20:51:30] Epoch: 1 Batch: 19059/20099 (94.83%) Loss: 1.873330 LR: 0.00000625 [20:51:32] Epoch: 1 Batch: 19060/20099 (94.83%) Loss: 2.105238 LR: 0.00000625 [20:51:33] Epoch: 1 Batch: 19061/20099 (94.84%) Loss: 2.132580 LR: 0.00000625 [20:51:35] Epoch: 1 Batch: 19062/20099 (94.84%) Loss: 2.423683 LR: 0.00000625 [20:51:37] Epoch: 1 Batch: 19063/20099 (94.85%) Loss: 1.946827 LR: 0.00000624 [20:51:38] Epoch: 1 Batch: 19064/20099 (94.85%) Loss: 2.077487 LR: 0.00000624 [20:51:40] Epoch: 1 Batch: 19065/20099 (94.86%) Loss: 1.968953 LR: 0.00000624 [20:51:42] Epoch: 1 Batch: 19066/20099 (94.86%) Loss: 1.977991 LR: 0.00000624 [20:51:43] Epoch: 1 Batch: 19067/20099 (94.87%) Loss: 2.096603 LR: 0.00000624 [20:51:45] Epoch: 1 Batch: 19068/20099 (94.87%) Loss: 1.995829 LR: 0.00000624 [20:51:47] Epoch: 1 Batch: 19069/20099 (94.88%) Loss: 2.165722 LR: 0.00000624 [20:51:48] Epoch: 1 Batch: 19070/20099 (94.88%) Loss: 1.969299 LR: 0.00000624 [20:51:50] Epoch: 1 Batch: 19071/20099 (94.89%) Loss: 1.821863 LR: 0.00000624 [20:51:52] Epoch: 1 Batch: 19072/20099 (94.89%) Loss: 1.980339 LR: 0.00000624 [20:51:54] Epoch: 1 Batch: 19073/20099 (94.90%) Loss: 2.093713 LR: 0.00000624 [20:51:55] Epoch: 1 Batch: 19074/20099 (94.90%) Loss: 1.747308 LR: 0.00000624 [20:51:57] Epoch: 1 Batch: 19075/20099 (94.91%) Loss: 2.366522 LR: 0.00000624 [20:51:59] Epoch: 1 Batch: 19076/20099 (94.91%) Loss: 1.737646 LR: 0.00000624 [20:52:00] Epoch: 1 Batch: 19077/20099 (94.92%) Loss: 
2.251599 LR: 0.00000624 [20:52:02] Epoch: 1 Batch: 19078/20099 (94.92%) Loss: 2.161386 LR: 0.00000624 [20:52:04] Epoch: 1 Batch: 19079/20099 (94.93%) Loss: 2.263873 LR: 0.00000624 [20:52:06] Epoch: 1 Batch: 19080/20099 (94.93%) Loss: 2.019010 LR: 0.00000624 [20:52:07] Epoch: 1 Batch: 19081/20099 (94.94%) Loss: 2.086088 LR: 0.00000624 [20:52:09] Epoch: 1 Batch: 19082/20099 (94.94%) Loss: 2.288847 LR: 0.00000624 [20:52:11] Epoch: 1 Batch: 19083/20099 (94.95%) Loss: 1.870238 LR: 0.00000624 [20:52:12] Epoch: 1 Batch: 19084/20099 (94.95%) Loss: 2.354767 LR: 0.00000623 [20:52:14] Epoch: 1 Batch: 19085/20099 (94.95%) Loss: 2.159588 LR: 0.00000623 [20:52:16] Epoch: 1 Batch: 19086/20099 (94.96%) Loss: 2.087382 LR: 0.00000623 [20:52:17] Epoch: 1 Batch: 19087/20099 (94.96%) Loss: 2.222804 LR: 0.00000623 [20:52:19] Epoch: 1 Batch: 19088/20099 (94.97%) Loss: 2.145258 LR: 0.00000623 [20:52:21] Epoch: 1 Batch: 19089/20099 (94.97%) Loss: 2.231341 LR: 0.00000623 [20:52:23] Epoch: 1 Batch: 19090/20099 (94.98%) Loss: 2.151014 LR: 0.00000623 [20:52:24] Epoch: 1 Batch: 19091/20099 (94.98%) Loss: 2.149684 LR: 0.00000623 [20:52:26] Epoch: 1 Batch: 19092/20099 (94.99%) Loss: 2.225381 LR: 0.00000623 [20:52:28] Epoch: 1 Batch: 19093/20099 (94.99%) Loss: 1.769606 LR: 0.00000623 [20:52:29] Epoch: 1 Batch: 19094/20099 (95.00%) Loss: 1.890630 LR: 0.00000623 [20:52:31] Epoch: 1 Batch: 19095/20099 (95.00%) Loss: 2.034448 LR: 0.00000623 [20:52:33] Epoch: 1 Batch: 19096/20099 (95.01%) Loss: 2.114579 LR: 0.00000623 [20:52:34] Epoch: 1 Batch: 19097/20099 (95.01%) Loss: 2.209268 LR: 0.00000623 [20:52:36] Epoch: 1 Batch: 19098/20099 (95.02%) Loss: 2.254925 LR: 0.00000623 [20:52:38] Epoch: 1 Batch: 19099/20099 (95.02%) Loss: 2.315436 LR: 0.00000623 [20:52:39] Epoch: 1 Batch: 19100/20099 (95.03%) Loss: 1.686467 LR: 0.00000623 [20:52:41] Epoch: 1 Batch: 19101/20099 (95.03%) Loss: 2.270308 LR: 0.00000623 [20:52:43] Epoch: 1 Batch: 19102/20099 (95.04%) Loss: 1.968389 LR: 0.00000623 [20:52:45] Epoch: 1 
Batch: 19103/20099 (95.04%) Loss: 1.994247 LR: 0.00000623 [20:52:46] Epoch: 1 Batch: 19104/20099 (95.05%) Loss: 1.968544 LR: 0.00000623 [20:52:48] Epoch: 1 Batch: 19105/20099 (95.05%) Loss: 2.068643 LR: 0.00000622 [20:52:50] Epoch: 1 Batch: 19106/20099 (95.06%) Loss: 2.073535 LR: 0.00000622 [20:52:51] Epoch: 1 Batch: 19107/20099 (95.06%) Loss: 1.955161 LR: 0.00000622 [20:52:53] Epoch: 1 Batch: 19108/20099 (95.07%) Loss: 2.120068 LR: 0.00000622 [20:52:55] Epoch: 1 Batch: 19109/20099 (95.07%) Loss: 1.983937 LR: 0.00000622 [20:52:56] Epoch: 1 Batch: 19110/20099 (95.08%) Loss: 2.158452 LR: 0.00000622 [20:52:58] Epoch: 1 Batch: 19111/20099 (95.08%) Loss: 2.097769 LR: 0.00000622 [20:53:00] Epoch: 1 Batch: 19112/20099 (95.09%) Loss: 1.908696 LR: 0.00000622 [20:53:01] Epoch: 1 Batch: 19113/20099 (95.09%) Loss: 2.033146 LR: 0.00000622 [20:53:03] Epoch: 1 Batch: 19114/20099 (95.10%) Loss: 2.218906 LR: 0.00000622 [20:53:05] Epoch: 1 Batch: 19115/20099 (95.10%) Loss: 2.161279 LR: 0.00000622 [20:53:07] Epoch: 1 Batch: 19116/20099 (95.11%) Loss: 2.150926 LR: 0.00000622 [20:53:08] Epoch: 1 Batch: 19117/20099 (95.11%) Loss: 2.138609 LR: 0.00000622 [20:53:10] Epoch: 1 Batch: 19118/20099 (95.12%) Loss: 2.097715 LR: 0.00000622 [20:53:12] Epoch: 1 Batch: 19119/20099 (95.12%) Loss: 2.417540 LR: 0.00000622 [20:53:13] Epoch: 1 Batch: 19120/20099 (95.13%) Loss: 1.849390 LR: 0.00000622 [20:53:15] Epoch: 1 Batch: 19121/20099 (95.13%) Loss: 2.056543 LR: 0.00000622 [20:53:17] Epoch: 1 Batch: 19122/20099 (95.14%) Loss: 2.141981 LR: 0.00000622 [20:53:18] Epoch: 1 Batch: 19123/20099 (95.14%) Loss: 1.924123 LR: 0.00000622 [20:53:20] Epoch: 1 Batch: 19124/20099 (95.15%) Loss: 2.281999 LR: 0.00000622 [20:53:22] Epoch: 1 Batch: 19125/20099 (95.15%) Loss: 2.005001 LR: 0.00000622 [20:53:23] Epoch: 1 Batch: 19126/20099 (95.16%) Loss: 1.968428 LR: 0.00000621 [20:53:25] Epoch: 1 Batch: 19127/20099 (95.16%) Loss: 1.788190 LR: 0.00000621 [20:53:27] Epoch: 1 Batch: 19128/20099 (95.17%) Loss: 2.302800 LR: 
0.00000621 [20:53:29] Epoch: 1 Batch: 19129/20099 (95.17%) Loss: 2.249286 LR: 0.00000621 [20:53:30] Epoch: 1 Batch: 19130/20099 (95.18%) Loss: 2.052105 LR: 0.00000621 [20:53:32] Epoch: 1 Batch: 19131/20099 (95.18%) Loss: 2.001174 LR: 0.00000621 [20:53:34] Epoch: 1 Batch: 19132/20099 (95.19%) Loss: 1.953700 LR: 0.00000621 [20:53:35] Epoch: 1 Batch: 19133/20099 (95.19%) Loss: 2.086836 LR: 0.00000621 [20:53:37] Epoch: 1 Batch: 19134/20099 (95.20%) Loss: 2.452835 LR: 0.00000621 [20:53:39] Epoch: 1 Batch: 19135/20099 (95.20%) Loss: 2.316084 LR: 0.00000621 [20:53:40] Epoch: 1 Batch: 19136/20099 (95.21%) Loss: 1.945849 LR: 0.00000621 [20:53:42] Epoch: 1 Batch: 19137/20099 (95.21%) Loss: 2.327879 LR: 0.00000621 [20:53:44] Epoch: 1 Batch: 19138/20099 (95.22%) Loss: 2.295895 LR: 0.00000621 [20:53:45] Epoch: 1 Batch: 19139/20099 (95.22%) Loss: 2.014965 LR: 0.00000621 [20:53:47] Epoch: 1 Batch: 19140/20099 (95.23%) Loss: 2.146734 LR: 0.00000621 [20:53:49] Epoch: 1 Batch: 19141/20099 (95.23%) Loss: 1.795921 LR: 0.00000621 [20:53:51] Epoch: 1 Batch: 19142/20099 (95.24%) Loss: 2.182985 LR: 0.00000621 [20:53:52] Epoch: 1 Batch: 19143/20099 (95.24%) Loss: 1.968587 LR: 0.00000621 [20:53:54] Epoch: 1 Batch: 19144/20099 (95.25%) Loss: 1.981478 LR: 0.00000621 [20:53:56] Epoch: 1 Batch: 19145/20099 (95.25%) Loss: 1.792846 LR: 0.00000621 [20:53:57] Epoch: 1 Batch: 19146/20099 (95.26%) Loss: 1.831909 LR: 0.00000621 [20:53:59] Epoch: 1 Batch: 19147/20099 (95.26%) Loss: 2.008496 LR: 0.00000621 [20:54:01] Epoch: 1 Batch: 19148/20099 (95.27%) Loss: 1.771042 LR: 0.00000621 [20:54:02] Epoch: 1 Batch: 19149/20099 (95.27%) Loss: 2.035701 LR: 0.00000621 [20:54:04] Epoch: 1 Batch: 19150/20099 (95.28%) Loss: 2.169387 LR: 0.00000621 [20:54:06] Epoch: 1 Batch: 19151/20099 (95.28%) Loss: 1.612649 LR: 0.00000621 [20:54:08] Epoch: 1 Batch: 19152/20099 (95.29%) Loss: 1.955926 LR: 0.00000621 [20:54:09] Epoch: 1 Batch: 19153/20099 (95.29%) Loss: 1.834377 LR: 0.00000621 [20:54:11] Epoch: 1 Batch: 19154/20099 
(95.30%) Loss: 2.075702 LR: 0.00000620 [20:54:13] Epoch: 1 Batch: 19155/20099 (95.30%) Loss: 2.035671 LR: 0.00000620 [20:54:14] Epoch: 1 Batch: 19156/20099 (95.31%) Loss: 1.941365 LR: 0.00000620 [20:54:16] Epoch: 1 Batch: 19157/20099 (95.31%) Loss: 1.987090 LR: 0.00000620 [20:54:18] Epoch: 1 Batch: 19158/20099 (95.32%) Loss: 2.059231 LR: 0.00000620 [20:54:19] Epoch: 1 Batch: 19159/20099 (95.32%) Loss: 2.076622 LR: 0.00000620 [20:54:21] Epoch: 1 Batch: 19160/20099 (95.33%) Loss: 1.975419 LR: 0.00000620 [20:54:23] Epoch: 1 Batch: 19161/20099 (95.33%) Loss: 1.929149 LR: 0.00000620 [20:54:25] Epoch: 1 Batch: 19162/20099 (95.34%) Loss: 2.027461 LR: 0.00000620 [20:54:26] Epoch: 1 Batch: 19163/20099 (95.34%) Loss: 1.998302 LR: 0.00000620 [20:54:28] Epoch: 1 Batch: 19164/20099 (95.35%) Loss: 2.040854 LR: 0.00000620 [20:54:30] Epoch: 1 Batch: 19165/20099 (95.35%) Loss: 2.074735 LR: 0.00000620 [20:54:31] Epoch: 1 Batch: 19166/20099 (95.36%) Loss: 2.222879 LR: 0.00000620 [20:54:33] Epoch: 1 Batch: 19167/20099 (95.36%) Loss: 1.950391 LR: 0.00000620 [20:54:35] Epoch: 1 Batch: 19168/20099 (95.37%) Loss: 2.169336 LR: 0.00000620 [20:54:36] Epoch: 1 Batch: 19169/20099 (95.37%) Loss: 2.268463 LR: 0.00000620 [20:54:38] Epoch: 1 Batch: 19170/20099 (95.38%) Loss: 1.903302 LR: 0.00000620 [20:54:40] Epoch: 1 Batch: 19171/20099 (95.38%) Loss: 2.200964 LR: 0.00000620 [20:54:41] Epoch: 1 Batch: 19172/20099 (95.39%) Loss: 1.718770 LR: 0.00000620 [20:54:43] Epoch: 1 Batch: 19173/20099 (95.39%) Loss: 2.163008 LR: 0.00000620 [20:54:45] Epoch: 1 Batch: 19174/20099 (95.40%) Loss: 2.153473 LR: 0.00000620 [20:54:47] Epoch: 1 Batch: 19175/20099 (95.40%) Loss: 2.190738 LR: 0.00000619 [20:54:48] Epoch: 1 Batch: 19176/20099 (95.41%) Loss: 1.991787 LR: 0.00000619 [20:54:50] Epoch: 1 Batch: 19177/20099 (95.41%) Loss: 2.105792 LR: 0.00000619 [20:54:52] Epoch: 1 Batch: 19178/20099 (95.42%) Loss: 2.050585 LR: 0.00000619 [20:54:53] Epoch: 1 Batch: 19179/20099 (95.42%) Loss: 2.012550 LR: 0.00000619 [20:54:55] 
Epoch: 1 Batch: 19180/20099 (95.43%) Loss: 2.289478 LR: 0.00000619 [20:54:57] Epoch: 1 Batch: 19181/20099 (95.43%) Loss: 2.000117 LR: 0.00000619 [20:54:58] Epoch: 1 Batch: 19182/20099 (95.44%) Loss: 2.192591 LR: 0.00000619 [20:55:00] Epoch: 1 Batch: 19183/20099 (95.44%) Loss: 2.047304 LR: 0.00000619 [20:55:02] Epoch: 1 Batch: 19184/20099 (95.45%) Loss: 1.916289 LR: 0.00000619 [20:55:04] Epoch: 1 Batch: 19185/20099 (95.45%) Loss: 1.935992 LR: 0.00000619 [20:55:05] Epoch: 1 Batch: 19186/20099 (95.46%) Loss: 2.320133 LR: 0.00000619 [20:55:07] Epoch: 1 Batch: 19187/20099 (95.46%) Loss: 2.155389 LR: 0.00000619 [20:55:09] Epoch: 1 Batch: 19188/20099 (95.47%) Loss: 2.032508 LR: 0.00000619 [20:55:10] Epoch: 1 Batch: 19189/20099 (95.47%) Loss: 1.975265 LR: 0.00000619 [20:55:12] Epoch: 1 Batch: 19190/20099 (95.48%) Loss: 2.061513 LR: 0.00000619 [20:55:14] Epoch: 1 Batch: 19191/20099 (95.48%) Loss: 1.848305 LR: 0.00000619 [20:55:15] Epoch: 1 Batch: 19192/20099 (95.49%) Loss: 1.694508 LR: 0.00000619 [20:55:17] Epoch: 1 Batch: 19193/20099 (95.49%) Loss: 2.009422 LR: 0.00000619 [20:55:19] Epoch: 1 Batch: 19194/20099 (95.50%) Loss: 2.153341 LR: 0.00000619 [20:55:21] Epoch: 1 Batch: 19195/20099 (95.50%) Loss: 2.206749 LR: 0.00000619 [20:55:22] Epoch: 1 Batch: 19196/20099 (95.51%) Loss: 2.221833 LR: 0.00000619 [20:55:24] Epoch: 1 Batch: 19197/20099 (95.51%) Loss: 2.147700 LR: 0.00000619 [20:55:26] Epoch: 1 Batch: 19198/20099 (95.52%) Loss: 2.150656 LR: 0.00000619 [20:55:27] Epoch: 1 Batch: 19199/20099 (95.52%) Loss: 1.865967 LR: 0.00000619 [20:56:01] >> Cleaned up old temp checkpoint: epoch1_step17600 [20:56:01] >> Temp checkpoint saved: epoch1_step19200, size: 0.1693 GB [20:56:01] Epoch: 1 Batch: 19200/20099 (95.53%) Loss: 2.142875 LR: 0.00000619 [20:56:02] Epoch: 1 Batch: 19201/20099 (95.53%) Loss: 2.152092 LR: 0.00000619 [20:56:04] Epoch: 1 Batch: 19202/20099 (95.54%) Loss: 2.140530 LR: 0.00000619 [20:56:06] Epoch: 1 Batch: 19203/20099 (95.54%) Loss: 2.072032 LR: 0.00000618 
[20:56:07] Epoch: 1 Batch: 19204/20099 (95.55%) Loss: 1.984781 LR: 0.00000618 [20:56:09] Epoch: 1 Batch: 19205/20099 (95.55%) Loss: 1.979917 LR: 0.00000618 [20:56:13] Epoch: 1 Batch: 19206/20099 (95.56%) Loss: 2.151002 LR: 0.00000618 [20:56:14] Epoch: 1 Batch: 19207/20099 (95.56%) Loss: 2.552400 LR: 0.00000618 [20:56:16] Epoch: 1 Batch: 19208/20099 (95.57%) Loss: 2.114375 LR: 0.00000618 [20:56:18] Epoch: 1 Batch: 19209/20099 (95.57%) Loss: 2.299017 LR: 0.00000618 [20:56:19] Epoch: 1 Batch: 19210/20099 (95.58%) Loss: 2.098571 LR: 0.00000618 [20:56:21] Epoch: 1 Batch: 19211/20099 (95.58%) Loss: 2.009327 LR: 0.00000618 [20:56:23] Epoch: 1 Batch: 19212/20099 (95.59%) Loss: 1.907591 LR: 0.00000618 [20:56:25] Epoch: 1 Batch: 19213/20099 (95.59%) Loss: 2.253966 LR: 0.00000618 [20:56:26] Epoch: 1 Batch: 19214/20099 (95.60%) Loss: 2.149771 LR: 0.00000618 [20:56:28] Epoch: 1 Batch: 19215/20099 (95.60%) Loss: 2.279134 LR: 0.00000618 [20:56:30] Epoch: 1 Batch: 19216/20099 (95.61%) Loss: 2.113948 LR: 0.00000618 [20:56:32] Epoch: 1 Batch: 19217/20099 (95.61%) Loss: 2.117103 LR: 0.00000618 [20:56:33] Epoch: 1 Batch: 19218/20099 (95.62%) Loss: 2.296872 LR: 0.00000618 [20:56:35] Epoch: 1 Batch: 19219/20099 (95.62%) Loss: 2.284608 LR: 0.00000618 [20:56:37] Epoch: 1 Batch: 19220/20099 (95.63%) Loss: 2.141250 LR: 0.00000618 [20:56:39] Epoch: 1 Batch: 19221/20099 (95.63%) Loss: 2.107011 LR: 0.00000618 [20:56:40] Epoch: 1 Batch: 19222/20099 (95.64%) Loss: 2.347450 LR: 0.00000618 [20:56:42] Epoch: 1 Batch: 19223/20099 (95.64%) Loss: 1.942029 LR: 0.00000618 [20:56:44] Epoch: 1 Batch: 19224/20099 (95.65%) Loss: 1.978559 LR: 0.00000617 [20:56:46] Epoch: 1 Batch: 19225/20099 (95.65%) Loss: 2.072915 LR: 0.00000617 [20:56:47] Epoch: 1 Batch: 19226/20099 (95.66%) Loss: 2.084349 LR: 0.00000617 [20:56:49] Epoch: 1 Batch: 19227/20099 (95.66%) Loss: 2.176692 LR: 0.00000617 [20:56:51] Epoch: 1 Batch: 19228/20099 (95.67%) Loss: 2.179019 LR: 0.00000617 [20:56:53] Epoch: 1 Batch: 19229/20099 (95.67%) 
Loss: 1.923511 LR: 0.00000617 [20:56:54] Epoch: 1 Batch: 19230/20099 (95.68%) Loss: 2.132443 LR: 0.00000617 [20:56:56] Epoch: 1 Batch: 19231/20099 (95.68%) Loss: 2.086932 LR: 0.00000617 [20:56:58] Epoch: 1 Batch: 19232/20099 (95.69%) Loss: 1.977108 LR: 0.00000617 [20:56:59] Epoch: 1 Batch: 19233/20099 (95.69%) Loss: 1.994963 LR: 0.00000617 [20:57:01] Epoch: 1 Batch: 19234/20099 (95.70%) Loss: 2.359763 LR: 0.00000617 [20:57:03] Epoch: 1 Batch: 19235/20099 (95.70%) Loss: 2.006109 LR: 0.00000617 [20:57:05] Epoch: 1 Batch: 19236/20099 (95.71%) Loss: 2.083440 LR: 0.00000617 [20:57:06] Epoch: 1 Batch: 19237/20099 (95.71%) Loss: 2.103277 LR: 0.00000617 [20:57:08] Epoch: 1 Batch: 19238/20099 (95.72%) Loss: 1.896535 LR: 0.00000617 [20:57:10] Epoch: 1 Batch: 19239/20099 (95.72%) Loss: 1.865528 LR: 0.00000617 [20:57:11] Epoch: 1 Batch: 19240/20099 (95.73%) Loss: 1.888546 LR: 0.00000617 [20:57:13] Epoch: 1 Batch: 19241/20099 (95.73%) Loss: 2.211688 LR: 0.00000617 [20:57:15] Epoch: 1 Batch: 19242/20099 (95.74%) Loss: 2.278568 LR: 0.00000617 [20:57:16] Epoch: 1 Batch: 19243/20099 (95.74%) Loss: 2.366391 LR: 0.00000617 [20:57:18] Epoch: 1 Batch: 19244/20099 (95.75%) Loss: 2.212700 LR: 0.00000617 [20:57:20] Epoch: 1 Batch: 19245/20099 (95.75%) Loss: 1.944273 LR: 0.00000617 [20:57:21] Epoch: 1 Batch: 19246/20099 (95.76%) Loss: 2.041165 LR: 0.00000617 [20:57:23] Epoch: 1 Batch: 19247/20099 (95.76%) Loss: 2.191855 LR: 0.00000617 [20:57:25] Epoch: 1 Batch: 19248/20099 (95.77%) Loss: 2.144728 LR: 0.00000617 [20:57:26] Epoch: 1 Batch: 19249/20099 (95.77%) Loss: 2.210048 LR: 0.00000617 [20:57:28] Epoch: 1 Batch: 19250/20099 (95.78%) Loss: 2.424622 LR: 0.00000617 [20:57:30] Epoch: 1 Batch: 19251/20099 (95.78%) Loss: 2.307872 LR: 0.00000617 [20:57:31] Epoch: 1 Batch: 19252/20099 (95.79%) Loss: 2.268333 LR: 0.00000616 [20:57:33] Epoch: 1 Batch: 19253/20099 (95.79%) Loss: 2.144069 LR: 0.00000616 [20:57:35] Epoch: 1 Batch: 19254/20099 (95.80%) Loss: 2.063195 LR: 0.00000616 [20:57:36] Epoch: 1 
Batch: 19255/20099 (95.80%) Loss: 2.288700 LR: 0.00000616 [20:57:38] Epoch: 1 Batch: 19256/20099 (95.81%) Loss: 1.901940 LR: 0.00000616 [20:57:40] Epoch: 1 Batch: 19257/20099 (95.81%) Loss: 1.972109 LR: 0.00000616 [20:57:41] Epoch: 1 Batch: 19258/20099 (95.82%) Loss: 1.897694 LR: 0.00000616 [20:57:43] Epoch: 1 Batch: 19259/20099 (95.82%) Loss: 2.158423 LR: 0.00000616 [20:57:45] Epoch: 1 Batch: 19260/20099 (95.83%) Loss: 2.009321 LR: 0.00000616 [20:57:46] Epoch: 1 Batch: 19261/20099 (95.83%) Loss: 1.939477 LR: 0.00000616 [20:57:48] Epoch: 1 Batch: 19262/20099 (95.84%) Loss: 2.358209 LR: 0.00000616 [20:57:50] Epoch: 1 Batch: 19263/20099 (95.84%) Loss: 2.007946 LR: 0.00000616 [20:57:51] Epoch: 1 Batch: 19264/20099 (95.85%) Loss: 2.058913 LR: 0.00000616 [20:57:53] Epoch: 1 Batch: 19265/20099 (95.85%) Loss: 2.212206 LR: 0.00000616 [20:57:55] Epoch: 1 Batch: 19266/20099 (95.86%) Loss: 2.173413 LR: 0.00000616 [20:57:57] Epoch: 1 Batch: 19267/20099 (95.86%) Loss: 1.998665 LR: 0.00000616 [20:57:58] Epoch: 1 Batch: 19268/20099 (95.87%) Loss: 2.045244 LR: 0.00000616 [20:58:00] Epoch: 1 Batch: 19269/20099 (95.87%) Loss: 2.134426 LR: 0.00000616 [20:58:02] Epoch: 1 Batch: 19270/20099 (95.88%) Loss: 2.253838 LR: 0.00000616 [20:58:03] Epoch: 1 Batch: 19271/20099 (95.88%) Loss: 2.073817 LR: 0.00000616 [20:58:05] Epoch: 1 Batch: 19272/20099 (95.89%) Loss: 1.851133 LR: 0.00000616 [20:58:07] Epoch: 1 Batch: 19273/20099 (95.89%) Loss: 2.171045 LR: 0.00000616 [20:58:08] Epoch: 1 Batch: 19274/20099 (95.90%) Loss: 1.961958 LR: 0.00000616 [20:58:10] Epoch: 1 Batch: 19275/20099 (95.90%) Loss: 1.956166 LR: 0.00000616 [20:58:12] Epoch: 1 Batch: 19276/20099 (95.91%) Loss: 1.917084 LR: 0.00000616 [20:58:14] Epoch: 1 Batch: 19277/20099 (95.91%) Loss: 2.122412 LR: 0.00000616 [20:58:15] Epoch: 1 Batch: 19278/20099 (95.92%) Loss: 2.005834 LR: 0.00000616 [20:58:17] Epoch: 1 Batch: 19279/20099 (95.92%) Loss: 2.346806 LR: 0.00000616 [20:58:19] Epoch: 1 Batch: 19280/20099 (95.93%) Loss: 2.100619 LR: 
0.00000615 [20:58:20] Epoch: 1 Batch: 19281/20099 (95.93%) Loss: 2.253731 LR: 0.00000615 [20:58:22] Epoch: 1 Batch: 19282/20099 (95.94%) Loss: 1.946220 LR: 0.00000615 [20:58:24] Epoch: 1 Batch: 19283/20099 (95.94%) Loss: 2.240571 LR: 0.00000615 [20:58:26] Epoch: 1 Batch: 19284/20099 (95.95%) Loss: 1.921088 LR: 0.00000615 [20:58:27] Epoch: 1 Batch: 19285/20099 (95.95%) Loss: 2.157561 LR: 0.00000615 [20:58:29] Epoch: 1 Batch: 19286/20099 (95.96%) Loss: 1.897148 LR: 0.00000615 [20:58:31] Epoch: 1 Batch: 19287/20099 (95.96%) Loss: 1.748106 LR: 0.00000615 [20:58:32] Epoch: 1 Batch: 19288/20099 (95.96%) Loss: 2.034826 LR: 0.00000615 [20:58:34] Epoch: 1 Batch: 19289/20099 (95.97%) Loss: 1.990942 LR: 0.00000615 [20:58:36] Epoch: 1 Batch: 19290/20099 (95.97%) Loss: 2.399631 LR: 0.00000615 [20:58:37] Epoch: 1 Batch: 19291/20099 (95.98%) Loss: 2.035029 LR: 0.00000615 [20:58:39] Epoch: 1 Batch: 19292/20099 (95.98%) Loss: 1.952772 LR: 0.00000615 [20:58:41] Epoch: 1 Batch: 19293/20099 (95.99%) Loss: 2.066490 LR: 0.00000615 [20:58:43] Epoch: 1 Batch: 19294/20099 (95.99%) Loss: 1.847305 LR: 0.00000615 [20:58:44] Epoch: 1 Batch: 19295/20099 (96.00%) Loss: 2.187312 LR: 0.00000615 [20:58:46] Epoch: 1 Batch: 19296/20099 (96.00%) Loss: 2.249157 LR: 0.00000615 [20:58:48] Epoch: 1 Batch: 19297/20099 (96.01%) Loss: 2.504088 LR: 0.00000615 [20:58:49] Epoch: 1 Batch: 19298/20099 (96.01%) Loss: 1.829207 LR: 0.00000615 [20:58:51] Epoch: 1 Batch: 19299/20099 (96.02%) Loss: 1.963589 LR: 0.00000615 [20:58:53] Epoch: 1 Batch: 19300/20099 (96.02%) Loss: 2.047724 LR: 0.00000615 [20:58:54] Epoch: 1 Batch: 19301/20099 (96.03%) Loss: 2.120921 LR: 0.00000615 [20:58:56] Epoch: 1 Batch: 19302/20099 (96.03%) Loss: 2.004623 LR: 0.00000615 [20:58:58] Epoch: 1 Batch: 19303/20099 (96.04%) Loss: 2.120356 LR: 0.00000615 [20:58:59] Epoch: 1 Batch: 19304/20099 (96.04%) Loss: 2.324379 LR: 0.00000615 [20:59:01] Epoch: 1 Batch: 19305/20099 (96.05%) Loss: 2.088772 LR: 0.00000615 [20:59:03] Epoch: 1 Batch: 19306/20099 
(96.05%) Loss: 2.235310 LR: 0.00000615 [20:59:04] Epoch: 1 Batch: 19307/20099 (96.06%) Loss: 2.132212 LR: 0.00000615 [20:59:06] Epoch: 1 Batch: 19308/20099 (96.06%) Loss: 1.994984 LR: 0.00000614 [20:59:08] Epoch: 1 Batch: 19309/20099 (96.07%) Loss: 2.398373 LR: 0.00000614 [20:59:10] Epoch: 1 Batch: 19310/20099 (96.07%) Loss: 2.033854 LR: 0.00000614 [20:59:11] Epoch: 1 Batch: 19311/20099 (96.08%) Loss: 2.002549 LR: 0.00000614 [20:59:13] Epoch: 1 Batch: 19312/20099 (96.08%) Loss: 2.526049 LR: 0.00000614 [20:59:15] Epoch: 1 Batch: 19313/20099 (96.09%) Loss: 2.100870 LR: 0.00000614 [20:59:16] Epoch: 1 Batch: 19314/20099 (96.09%) Loss: 2.079069 LR: 0.00000614 [20:59:18] Epoch: 1 Batch: 19315/20099 (96.10%) Loss: 2.153771 LR: 0.00000614 [20:59:20] Epoch: 1 Batch: 19316/20099 (96.10%) Loss: 2.083469 LR: 0.00000614 [20:59:21] Epoch: 1 Batch: 19317/20099 (96.11%) Loss: 1.919357 LR: 0.00000614 [20:59:23] Epoch: 1 Batch: 19318/20099 (96.11%) Loss: 2.351133 LR: 0.00000614 [20:59:25] Epoch: 1 Batch: 19319/20099 (96.12%) Loss: 2.002998 LR: 0.00000614 [20:59:26] Epoch: 1 Batch: 19320/20099 (96.12%) Loss: 1.999341 LR: 0.00000614 [20:59:28] Epoch: 1 Batch: 19321/20099 (96.13%) Loss: 1.955538 LR: 0.00000614 [20:59:30] Epoch: 1 Batch: 19322/20099 (96.13%) Loss: 2.167962 LR: 0.00000614 [20:59:31] Epoch: 1 Batch: 19323/20099 (96.14%) Loss: 1.948067 LR: 0.00000614 [20:59:33] Epoch: 1 Batch: 19324/20099 (96.14%) Loss: 2.053443 LR: 0.00000614 [20:59:35] Epoch: 1 Batch: 19325/20099 (96.15%) Loss: 1.994024 LR: 0.00000614 [20:59:37] Epoch: 1 Batch: 19326/20099 (96.15%) Loss: 2.267633 LR: 0.00000614 [20:59:38] Epoch: 1 Batch: 19327/20099 (96.16%) Loss: 1.964378 LR: 0.00000614 [20:59:40] Epoch: 1 Batch: 19328/20099 (96.16%) Loss: 2.074259 LR: 0.00000614 [20:59:42] Epoch: 1 Batch: 19329/20099 (96.17%) Loss: 2.180869 LR: 0.00000614 [20:59:43] Epoch: 1 Batch: 19330/20099 (96.17%) Loss: 2.047415 LR: 0.00000614 [20:59:45] Epoch: 1 Batch: 19331/20099 (96.18%) Loss: 2.065015 LR: 0.00000614 [20:59:47] 
Epoch: 1 Batch: 19332/20099 (96.18%) Loss: 2.287655 LR: 0.00000614 [20:59:48] Epoch: 1 Batch: 19333/20099 (96.19%) Loss: 1.989528 LR: 0.00000614 [20:59:50] Epoch: 1 Batch: 19334/20099 (96.19%) Loss: 2.117703 LR: 0.00000614 [20:59:52] Epoch: 1 Batch: 19335/20099 (96.20%) Loss: 2.298967 LR: 0.00000614 [20:59:54] Epoch: 1 Batch: 19336/20099 (96.20%) Loss: 1.903924 LR: 0.00000613 [20:59:55] Epoch: 1 Batch: 19337/20099 (96.21%) Loss: 2.048628 LR: 0.00000613 [20:59:57] Epoch: 1 Batch: 19338/20099 (96.21%) Loss: 2.127208 LR: 0.00000613 [20:59:59] Epoch: 1 Batch: 19339/20099 (96.22%) Loss: 1.658756 LR: 0.00000613 [21:00:00] Epoch: 1 Batch: 19340/20099 (96.22%) Loss: 2.150151 LR: 0.00000613 [21:00:02] Epoch: 1 Batch: 19341/20099 (96.23%) Loss: 2.042761 LR: 0.00000613 [21:00:04] Epoch: 1 Batch: 19342/20099 (96.23%) Loss: 2.015319 LR: 0.00000613 [21:00:05] Epoch: 1 Batch: 19343/20099 (96.24%) Loss: 2.097989 LR: 0.00000613 [21:00:07] Epoch: 1 Batch: 19344/20099 (96.24%) Loss: 2.220733 LR: 0.00000613 [21:00:09] Epoch: 1 Batch: 19345/20099 (96.25%) Loss: 1.554869 LR: 0.00000613 [21:00:11] Epoch: 1 Batch: 19346/20099 (96.25%) Loss: 2.061192 LR: 0.00000613 [21:00:12] Epoch: 1 Batch: 19347/20099 (96.26%) Loss: 2.108182 LR: 0.00000613 [21:00:14] Epoch: 1 Batch: 19348/20099 (96.26%) Loss: 1.979931 LR: 0.00000613 [21:00:16] Epoch: 1 Batch: 19349/20099 (96.27%) Loss: 2.162380 LR: 0.00000613 [21:00:17] Epoch: 1 Batch: 19350/20099 (96.27%) Loss: 1.940903 LR: 0.00000613 [21:00:19] Epoch: 1 Batch: 19351/20099 (96.28%) Loss: 2.282442 LR: 0.00000613 [21:00:21] Epoch: 1 Batch: 19352/20099 (96.28%) Loss: 2.060704 LR: 0.00000613 [21:00:22] Epoch: 1 Batch: 19353/20099 (96.29%) Loss: 2.369459 LR: 0.00000613 [21:00:24] Epoch: 1 Batch: 19354/20099 (96.29%) Loss: 2.137497 LR: 0.00000613 [21:00:26] Epoch: 1 Batch: 19355/20099 (96.30%) Loss: 1.979867 LR: 0.00000613 [21:00:27] Epoch: 1 Batch: 19356/20099 (96.30%) Loss: 2.156448 LR: 0.00000613 [21:00:29] Epoch: 1 Batch: 19357/20099 (96.31%) Loss: 
1.996593 LR: 0.00000613 [21:00:31] Epoch: 1 Batch: 19358/20099 (96.31%) Loss: 2.260717 LR: 0.00000613 [21:00:33] Epoch: 1 Batch: 19359/20099 (96.32%) Loss: 2.114215 LR: 0.00000613 [21:00:34] Epoch: 1 Batch: 19360/20099 (96.32%) Loss: 1.845825 LR: 0.00000613 [21:00:36] Epoch: 1 Batch: 19361/20099 (96.33%) Loss: 2.307250 LR: 0.00000613 [21:00:38] Epoch: 1 Batch: 19362/20099 (96.33%) Loss: 1.844648 LR: 0.00000613 [21:00:39] Epoch: 1 Batch: 19363/20099 (96.34%) Loss: 2.248711 LR: 0.00000613 [21:00:41] Epoch: 1 Batch: 19364/20099 (96.34%) Loss: 2.023023 LR: 0.00000612 [21:00:43] Epoch: 1 Batch: 19365/20099 (96.35%) Loss: 2.043382 LR: 0.00000612 [21:00:45] Epoch: 1 Batch: 19366/20099 (96.35%) Loss: 2.246357 LR: 0.00000612 [21:00:46] Epoch: 1 Batch: 19367/20099 (96.36%) Loss: 1.808795 LR: 0.00000612 [21:00:48] Epoch: 1 Batch: 19368/20099 (96.36%) Loss: 2.152886 LR: 0.00000612 [21:00:50] Epoch: 1 Batch: 19369/20099 (96.37%) Loss: 2.042317 LR: 0.00000612 [21:00:51] Epoch: 1 Batch: 19370/20099 (96.37%) Loss: 2.341267 LR: 0.00000612 [21:00:53] Epoch: 1 Batch: 19371/20099 (96.38%) Loss: 1.901681 LR: 0.00000612 [21:00:55] Epoch: 1 Batch: 19372/20099 (96.38%) Loss: 2.290436 LR: 0.00000612 [21:00:56] Epoch: 1 Batch: 19373/20099 (96.39%) Loss: 2.157007 LR: 0.00000612 [21:00:58] Epoch: 1 Batch: 19374/20099 (96.39%) Loss: 1.990457 LR: 0.00000612 [21:01:00] Epoch: 1 Batch: 19375/20099 (96.40%) Loss: 1.900661 LR: 0.00000612 [21:01:02] Epoch: 1 Batch: 19376/20099 (96.40%) Loss: 1.762744 LR: 0.00000612 [21:01:03] Epoch: 1 Batch: 19377/20099 (96.41%) Loss: 1.808325 LR: 0.00000612 [21:01:05] Epoch: 1 Batch: 19378/20099 (96.41%) Loss: 1.975487 LR: 0.00000612 [21:01:07] Epoch: 1 Batch: 19379/20099 (96.42%) Loss: 1.929963 LR: 0.00000612 [21:01:08] Epoch: 1 Batch: 19380/20099 (96.42%) Loss: 2.183448 LR: 0.00000612 [21:01:10] Epoch: 1 Batch: 19381/20099 (96.43%) Loss: 1.944941 LR: 0.00000612 [21:01:12] Epoch: 1 Batch: 19382/20099 (96.43%) Loss: 1.996176 LR: 0.00000612 [21:01:13] Epoch: 1 
Batch: 19383/20099 (96.44%) Loss: 2.188856 LR: 0.00000612 [21:01:15] Epoch: 1 Batch: 19384/20099 (96.44%) Loss: 2.023151 LR: 0.00000612 [21:01:17] Epoch: 1 Batch: 19385/20099 (96.45%) Loss: 2.136082 LR: 0.00000612 [21:01:19] Epoch: 1 Batch: 19386/20099 (96.45%) Loss: 1.696579 LR: 0.00000612 [21:01:20] Epoch: 1 Batch: 19387/20099 (96.46%) Loss: 2.007573 LR: 0.00000612 [21:01:22] Epoch: 1 Batch: 19388/20099 (96.46%) Loss: 2.065821 LR: 0.00000612 [21:01:24] Epoch: 1 Batch: 19389/20099 (96.47%) Loss: 1.703727 LR: 0.00000612 [21:01:25] Epoch: 1 Batch: 19390/20099 (96.47%) Loss: 2.011709 LR: 0.00000612 [21:01:27] Epoch: 1 Batch: 19391/20099 (96.48%) Loss: 2.194217 LR: 0.00000612 [21:01:29] Epoch: 1 Batch: 19392/20099 (96.48%) Loss: 2.177925 LR: 0.00000611 [21:01:30] Epoch: 1 Batch: 19393/20099 (96.49%) Loss: 2.240837 LR: 0.00000611 [21:01:32] Epoch: 1 Batch: 19394/20099 (96.49%) Loss: 2.036586 LR: 0.00000611 [21:01:34] Epoch: 1 Batch: 19395/20099 (96.50%) Loss: 1.963877 LR: 0.00000611 [21:01:35] Epoch: 1 Batch: 19396/20099 (96.50%) Loss: 2.052092 LR: 0.00000611 [21:01:37] Epoch: 1 Batch: 19397/20099 (96.51%) Loss: 2.076968 LR: 0.00000611 [21:01:39] Epoch: 1 Batch: 19398/20099 (96.51%) Loss: 1.817231 LR: 0.00000611 [21:01:41] Epoch: 1 Batch: 19399/20099 (96.52%) Loss: 2.123400 LR: 0.00000611 [21:02:14] >> Temp checkpoint saved: epoch1_step19400, size: 0.1693 GB [21:02:14] Epoch: 1 Batch: 19400/20099 (96.52%) Loss: 1.924161 LR: 0.00000611 [21:02:16] Epoch: 1 Batch: 19401/20099 (96.53%) Loss: 1.918088 LR: 0.00000611 [21:02:17] Epoch: 1 Batch: 19402/20099 (96.53%) Loss: 2.260812 LR: 0.00000611 [21:02:19] Epoch: 1 Batch: 19403/20099 (96.54%) Loss: 2.193776 LR: 0.00000611 [21:02:21] Epoch: 1 Batch: 19404/20099 (96.54%) Loss: 2.020071 LR: 0.00000611 [21:02:22] Epoch: 1 Batch: 19405/20099 (96.55%) Loss: 1.868532 LR: 0.00000611 [21:02:24] Epoch: 1 Batch: 19406/20099 (96.55%) Loss: 1.988458 LR: 0.00000611 [21:02:28] Epoch: 1 Batch: 19407/20099 (96.56%) Loss: 2.079351 LR: 
0.00000611 [21:02:29] Epoch: 1 Batch: 19408/20099 (96.56%) Loss: 2.015919 LR: 0.00000611 [21:02:31] Epoch: 1 Batch: 19409/20099 (96.57%) Loss: 2.139397 LR: 0.00000611 [21:02:33] Epoch: 1 Batch: 19410/20099 (96.57%) Loss: 2.206385 LR: 0.00000611 [21:02:35] Epoch: 1 Batch: 19411/20099 (96.58%) Loss: 2.133883 LR: 0.00000611 [21:02:36] Epoch: 1 Batch: 19412/20099 (96.58%) Loss: 1.984125 LR: 0.00000611 [21:02:38] Epoch: 1 Batch: 19413/20099 (96.59%) Loss: 2.020163 LR: 0.00000611 [21:02:40] Epoch: 1 Batch: 19414/20099 (96.59%) Loss: 2.087989 LR: 0.00000611 [21:02:41] Epoch: 1 Batch: 19415/20099 (96.60%) Loss: 1.952047 LR: 0.00000611 [21:02:43] Epoch: 1 Batch: 19416/20099 (96.60%) Loss: 2.104268 LR: 0.00000611 [21:02:45] Epoch: 1 Batch: 19417/20099 (96.61%) Loss: 2.183434 LR: 0.00000611 [21:02:47] Epoch: 1 Batch: 19418/20099 (96.61%) Loss: 2.168390 LR: 0.00000611 [21:02:49] Epoch: 1 Batch: 19419/20099 (96.62%) Loss: 1.809287 LR: 0.00000611 [21:02:50] Epoch: 1 Batch: 19420/20099 (96.62%) Loss: 2.158170 LR: 0.00000611 [21:02:52] Epoch: 1 Batch: 19421/20099 (96.63%) Loss: 2.462716 LR: 0.00000611 [21:02:54] Epoch: 1 Batch: 19422/20099 (96.63%) Loss: 2.324638 LR: 0.00000611 [21:02:56] Epoch: 1 Batch: 19423/20099 (96.64%) Loss: 2.118031 LR: 0.00000611 [21:02:57] Epoch: 1 Batch: 19424/20099 (96.64%) Loss: 2.003569 LR: 0.00000611 [21:02:59] Epoch: 1 Batch: 19425/20099 (96.65%) Loss: 1.966407 LR: 0.00000611 [21:03:01] Epoch: 1 Batch: 19426/20099 (96.65%) Loss: 1.801728 LR: 0.00000611 [21:03:03] Epoch: 1 Batch: 19427/20099 (96.66%) Loss: 2.171199 LR: 0.00000610 [21:03:04] Epoch: 1 Batch: 19428/20099 (96.66%) Loss: 1.966954 LR: 0.00000610 [21:03:06] Epoch: 1 Batch: 19429/20099 (96.67%) Loss: 2.069332 LR: 0.00000610 [21:03:08] Epoch: 1 Batch: 19430/20099 (96.67%) Loss: 2.282226 LR: 0.00000610 [21:03:09] Epoch: 1 Batch: 19431/20099 (96.68%) Loss: 2.108295 LR: 0.00000610 [21:03:11] Epoch: 1 Batch: 19432/20099 (96.68%) Loss: 2.206185 LR: 0.00000610 [21:03:13] Epoch: 1 Batch: 19433/20099 
(96.69%) Loss: 1.853494 LR: 0.00000610 [21:03:15] Epoch: 1 Batch: 19434/20099 (96.69%) Loss: 2.217936 LR: 0.00000610 [21:03:16] Epoch: 1 Batch: 19435/20099 (96.70%) Loss: 1.800403 LR: 0.00000610 [21:03:18] Epoch: 1 Batch: 19436/20099 (96.70%) Loss: 2.313794 LR: 0.00000610 [21:03:20] Epoch: 1 Batch: 19437/20099 (96.71%) Loss: 2.067330 LR: 0.00000610 [21:03:21] Epoch: 1 Batch: 19438/20099 (96.71%) Loss: 2.140300 LR: 0.00000610 [21:03:23] Epoch: 1 Batch: 19439/20099 (96.72%) Loss: 2.192342 LR: 0.00000610 [21:03:25] Epoch: 1 Batch: 19440/20099 (96.72%) Loss: 2.201669 LR: 0.00000610 [21:03:26] Epoch: 1 Batch: 19441/20099 (96.73%) Loss: 2.007512 LR: 0.00000610 [21:03:28] Epoch: 1 Batch: 19442/20099 (96.73%) Loss: 1.690128 LR: 0.00000610 [21:03:30] Epoch: 1 Batch: 19443/20099 (96.74%) Loss: 1.790415 LR: 0.00000610 [21:03:31] Epoch: 1 Batch: 19444/20099 (96.74%) Loss: 2.098956 LR: 0.00000610 [21:03:33] Epoch: 1 Batch: 19445/20099 (96.75%) Loss: 1.904441 LR: 0.00000610 [21:03:35] Epoch: 1 Batch: 19446/20099 (96.75%) Loss: 1.975704 LR: 0.00000610 [21:03:36] Epoch: 1 Batch: 19447/20099 (96.76%) Loss: 1.869539 LR: 0.00000610 [21:03:38] Epoch: 1 Batch: 19448/20099 (96.76%) Loss: 2.298446 LR: 0.00000610 [21:03:40] Epoch: 1 Batch: 19449/20099 (96.77%) Loss: 2.042427 LR: 0.00000610 [21:03:41] Epoch: 1 Batch: 19450/20099 (96.77%) Loss: 2.165522 LR: 0.00000610 [21:03:43] Epoch: 1 Batch: 19451/20099 (96.78%) Loss: 2.169574 LR: 0.00000610 [21:03:45] Epoch: 1 Batch: 19452/20099 (96.78%) Loss: 1.844660 LR: 0.00000610 [21:03:46] Epoch: 1 Batch: 19453/20099 (96.79%) Loss: 1.967025 LR: 0.00000610 [21:03:48] Epoch: 1 Batch: 19454/20099 (96.79%) Loss: 2.094159 LR: 0.00000610 [21:03:50] Epoch: 1 Batch: 19455/20099 (96.80%) Loss: 1.926441 LR: 0.00000609 [21:03:51] Epoch: 1 Batch: 19456/20099 (96.80%) Loss: 1.998559 LR: 0.00000609 [21:03:53] Epoch: 1 Batch: 19457/20099 (96.81%) Loss: 1.929996 LR: 0.00000609 [21:03:55] Epoch: 1 Batch: 19458/20099 (96.81%) Loss: 2.244255 LR: 0.00000609 [21:03:56] 
Epoch: 1 Batch: 19459/20099 (96.82%) Loss: 2.048966 LR: 0.00000609 [21:03:58] Epoch: 1 Batch: 19460/20099 (96.82%) Loss: 1.966728 LR: 0.00000609 [21:04:00] Epoch: 1 Batch: 19461/20099 (96.83%) Loss: 2.219174 LR: 0.00000609 [21:04:02] Epoch: 1 Batch: 19462/20099 (96.83%) Loss: 2.083934 LR: 0.00000609 [21:04:03] Epoch: 1 Batch: 19463/20099 (96.84%) Loss: 2.297644 LR: 0.00000609 [21:04:05] Epoch: 1 Batch: 19464/20099 (96.84%) Loss: 2.037646 LR: 0.00000609 [21:04:07] Epoch: 1 Batch: 19465/20099 (96.85%) Loss: 2.284017 LR: 0.00000609 [21:04:08] Epoch: 1 Batch: 19466/20099 (96.85%) Loss: 2.258305 LR: 0.00000609 [21:04:10] Epoch: 1 Batch: 19467/20099 (96.86%) Loss: 1.868079 LR: 0.00000609 [21:04:12] Epoch: 1 Batch: 19468/20099 (96.86%) Loss: 1.929782 LR: 0.00000609 [21:04:13] Epoch: 1 Batch: 19469/20099 (96.87%) Loss: 2.134636 LR: 0.00000609 [21:04:15] Epoch: 1 Batch: 19470/20099 (96.87%) Loss: 2.052505 LR: 0.00000609 [21:04:17] Epoch: 1 Batch: 19471/20099 (96.88%) Loss: 2.046832 LR: 0.00000609 [21:04:19] Epoch: 1 Batch: 19472/20099 (96.88%) Loss: 2.131657 LR: 0.00000609 [21:04:20] Epoch: 1 Batch: 19473/20099 (96.89%) Loss: 2.178327 LR: 0.00000609 [21:04:22] Epoch: 1 Batch: 19474/20099 (96.89%) Loss: 1.931745 LR: 0.00000609 [21:04:24] Epoch: 1 Batch: 19475/20099 (96.90%) Loss: 1.709950 LR: 0.00000609 [21:04:25] Epoch: 1 Batch: 19476/20099 (96.90%) Loss: 2.056588 LR: 0.00000609 [21:04:27] Epoch: 1 Batch: 19477/20099 (96.91%) Loss: 1.903806 LR: 0.00000609 [21:04:29] Epoch: 1 Batch: 19478/20099 (96.91%) Loss: 1.764072 LR: 0.00000609 [21:04:30] Epoch: 1 Batch: 19479/20099 (96.92%) Loss: 1.917783 LR: 0.00000609 [21:04:32] Epoch: 1 Batch: 19480/20099 (96.92%) Loss: 1.864176 LR: 0.00000609 [21:04:34] Epoch: 1 Batch: 19481/20099 (96.93%) Loss: 1.983515 LR: 0.00000609 [21:04:36] Epoch: 1 Batch: 19482/20099 (96.93%) Loss: 2.193128 LR: 0.00000609 [21:04:37] Epoch: 1 Batch: 19483/20099 (96.94%) Loss: 2.191280 LR: 0.00000609 [21:04:39] Epoch: 1 Batch: 19484/20099 (96.94%) Loss: 
2.047396 LR: 0.00000609 [21:04:41] Epoch: 1 Batch: 19485/20099 (96.95%) Loss: 2.175007 LR: 0.00000609 [21:04:42] Epoch: 1 Batch: 19486/20099 (96.95%) Loss: 2.109110 LR: 0.00000609 [21:04:44] Epoch: 1 Batch: 19487/20099 (96.96%) Loss: 1.885035 LR: 0.00000609 [21:04:46] Epoch: 1 Batch: 19488/20099 (96.96%) Loss: 2.148341 LR: 0.00000609 [21:04:47] Epoch: 1 Batch: 19489/20099 (96.97%) Loss: 2.202434 LR: 0.00000609 [21:04:49] Epoch: 1 Batch: 19490/20099 (96.97%) Loss: 1.931261 LR: 0.00000609 [21:04:51] Epoch: 1 Batch: 19491/20099 (96.97%) Loss: 1.811763 LR: 0.00000609 [21:04:53] Epoch: 1 Batch: 19492/20099 (96.98%) Loss: 1.939105 LR: 0.00000609 [21:04:54] Epoch: 1 Batch: 19493/20099 (96.98%) Loss: 1.899246 LR: 0.00000609 [21:04:56] Epoch: 1 Batch: 19494/20099 (96.99%) Loss: 2.303148 LR: 0.00000609 [21:04:58] Epoch: 1 Batch: 19495/20099 (96.99%) Loss: 2.167841 LR: 0.00000609 [21:04:59] Epoch: 1 Batch: 19496/20099 (97.00%) Loss: 2.007464 LR: 0.00000609 [21:05:01] Epoch: 1 Batch: 19497/20099 (97.00%) Loss: 1.941152 LR: 0.00000608 [21:05:03] Epoch: 1 Batch: 19498/20099 (97.01%) Loss: 2.023023 LR: 0.00000608 [21:05:05] Epoch: 1 Batch: 19499/20099 (97.01%) Loss: 2.300960 LR: 0.00000608 [21:05:06] >> Evaluating batch 0 [21:05:07] >> Evaluating batch 1 [21:05:08] >> Evaluating batch 2 [21:05:09] >> Evaluating batch 3 [21:05:10] >> Evaluating batch 4 [21:05:11] >> Evaluating batch 5 [21:05:12] >> Evaluating batch 6 [21:05:13] >> Evaluating batch 7 [21:05:14] >> Evaluating batch 8 [21:05:15] >> Evaluating batch 9 [21:05:16] >> Evaluating batch 10 [21:05:17] >> Evaluating batch 11 [21:05:18] >> Evaluating batch 12 [21:05:19] >> Evaluating batch 13 [21:05:20] >> Evaluating batch 14 [21:05:21] >> Evaluating batch 15 [21:05:21] >> Evaluating batch 16 [21:05:22] Epoch: 1 Step: 19500/20099 Evaluation: [21:05:22] Avg Loss Since Last Eval: 0.0531 Val Loss: 2.1462 Validation loss delta: 2.1462 Perplexity: 8.5523 LR: 0.00000608 [21:05:26] >> Checkpoint saved: epoch1_step19500, size: 
0.1693 GB [21:05:26] Epoch: 1 Batch: 19500/20099 (97.02%) Loss: 2.200761 LR: 0.00000608 [21:05:28] Epoch: 1 Batch: 19501/20099 (97.02%) Loss: 2.148013 LR: 0.00000608 [21:05:29] Epoch: 1 Batch: 19502/20099 (97.03%) Loss: 2.046340 LR: 0.00000608 [21:05:31] Epoch: 1 Batch: 19503/20099 (97.03%) Loss: 1.856627 LR: 0.00000608 [21:05:33] Epoch: 1 Batch: 19504/20099 (97.04%) Loss: 1.993814 LR: 0.00000608 [21:05:34] Epoch: 1 Batch: 19505/20099 (97.04%) Loss: 2.341378 LR: 0.00000608 [21:05:36] Epoch: 1 Batch: 19506/20099 (97.05%) Loss: 2.199477 LR: 0.00000608 [21:05:38] Epoch: 1 Batch: 19507/20099 (97.05%) Loss: 2.394095 LR: 0.00000608 [21:05:39] Epoch: 1 Batch: 19508/20099 (97.06%) Loss: 2.121704 LR: 0.00000608 [21:05:41] Epoch: 1 Batch: 19509/20099 (97.06%) Loss: 2.031300 LR: 0.00000608 [21:05:43] Epoch: 1 Batch: 19510/20099 (97.07%) Loss: 2.154776 LR: 0.00000608 [21:05:45] Epoch: 1 Batch: 19511/20099 (97.07%) Loss: 2.184183 LR: 0.00000608 [21:05:46] Epoch: 1 Batch: 19512/20099 (97.08%) Loss: 2.213635 LR: 0.00000608 [21:05:48] Epoch: 1 Batch: 19513/20099 (97.08%) Loss: 1.994876 LR: 0.00000608 [21:05:50] Epoch: 1 Batch: 19514/20099 (97.09%) Loss: 2.129902 LR: 0.00000608 [21:05:51] Epoch: 1 Batch: 19515/20099 (97.09%) Loss: 1.901718 LR: 0.00000608 [21:05:53] Epoch: 1 Batch: 19516/20099 (97.10%) Loss: 1.983424 LR: 0.00000608 [21:05:55] Epoch: 1 Batch: 19517/20099 (97.10%) Loss: 2.291611 LR: 0.00000608 [21:05:57] Epoch: 1 Batch: 19518/20099 (97.11%) Loss: 2.029920 LR: 0.00000608 [21:05:58] Epoch: 1 Batch: 19519/20099 (97.11%) Loss: 2.062734 LR: 0.00000608 [21:06:00] Epoch: 1 Batch: 19520/20099 (97.12%) Loss: 2.167249 LR: 0.00000608 [21:06:02] Epoch: 1 Batch: 19521/20099 (97.12%) Loss: 1.950860 LR: 0.00000608 [21:06:03] Epoch: 1 Batch: 19522/20099 (97.13%) Loss: 2.158612 LR: 0.00000608 [21:06:05] Epoch: 1 Batch: 19523/20099 (97.13%) Loss: 1.995965 LR: 0.00000608 [21:06:07] Epoch: 1 Batch: 19524/20099 (97.14%) Loss: 2.281001 LR: 0.00000608 [21:06:09] Epoch: 1 Batch: 19525/20099 
(97.14%) Loss: 2.043555 LR: 0.00000608 [21:06:10] Epoch: 1 Batch: 19526/20099 (97.15%) Loss: 2.196559 LR: 0.00000608 [21:06:12] Epoch: 1 Batch: 19527/20099 (97.15%) Loss: 1.863253 LR: 0.00000608 [21:06:14] Epoch: 1 Batch: 19528/20099 (97.16%) Loss: 2.143423 LR: 0.00000608 [21:06:15] Epoch: 1 Batch: 19529/20099 (97.16%) Loss: 2.078881 LR: 0.00000608 [21:06:17] Epoch: 1 Batch: 19530/20099 (97.17%) Loss: 2.001253 LR: 0.00000608 [21:06:19] Epoch: 1 Batch: 19531/20099 (97.17%) Loss: 2.029536 LR: 0.00000608 [21:06:20] Epoch: 1 Batch: 19532/20099 (97.18%) Loss: 2.184704 LR: 0.00000607 [21:06:22] Epoch: 1 Batch: 19533/20099 (97.18%) Loss: 2.083115 LR: 0.00000607 [21:06:24] Epoch: 1 Batch: 19534/20099 (97.19%) Loss: 2.157794 LR: 0.00000607 [21:06:25] Epoch: 1 Batch: 19535/20099 (97.19%) Loss: 2.055910 LR: 0.00000607 [21:06:27] Epoch: 1 Batch: 19536/20099 (97.20%) Loss: 2.010116 LR: 0.00000607 [21:06:29] Epoch: 1 Batch: 19537/20099 (97.20%) Loss: 2.234168 LR: 0.00000607 [21:06:31] Epoch: 1 Batch: 19538/20099 (97.21%) Loss: 2.070967 LR: 0.00000607 [21:06:32] Epoch: 1 Batch: 19539/20099 (97.21%) Loss: 2.053724 LR: 0.00000607 [21:06:34] Epoch: 1 Batch: 19540/20099 (97.22%) Loss: 2.333262 LR: 0.00000607 [21:06:36] Epoch: 1 Batch: 19541/20099 (97.22%) Loss: 2.163237 LR: 0.00000607 [21:06:37] Epoch: 1 Batch: 19542/20099 (97.23%) Loss: 2.011131 LR: 0.00000607 [21:06:39] Epoch: 1 Batch: 19543/20099 (97.23%) Loss: 2.468477 LR: 0.00000607 [21:06:41] Epoch: 1 Batch: 19544/20099 (97.24%) Loss: 2.267995 LR: 0.00000607 [21:06:42] Epoch: 1 Batch: 19545/20099 (97.24%) Loss: 2.379897 LR: 0.00000607 [21:06:44] Epoch: 1 Batch: 19546/20099 (97.25%) Loss: 2.250490 LR: 0.00000607 [21:06:46] Epoch: 1 Batch: 19547/20099 (97.25%) Loss: 2.152672 LR: 0.00000607 [21:06:47] Epoch: 1 Batch: 19548/20099 (97.26%) Loss: 2.165462 LR: 0.00000607 [21:06:49] Epoch: 1 Batch: 19549/20099 (97.26%) Loss: 2.052712 LR: 0.00000607 [21:06:51] Epoch: 1 Batch: 19550/20099 (97.27%) Loss: 2.255512 LR: 0.00000607 [21:06:52] 
Epoch: 1 Batch: 19551/20099 (97.27%) Loss: 2.036903 LR: 0.00000607 [21:06:54] Epoch: 1 Batch: 19552/20099 (97.28%) Loss: 2.273328 LR: 0.00000607 [21:06:56] Epoch: 1 Batch: 19553/20099 (97.28%) Loss: 2.087305 LR: 0.00000607 [21:06:58] Epoch: 1 Batch: 19554/20099 (97.29%) Loss: 2.354399 LR: 0.00000607 [21:06:59] Epoch: 1 Batch: 19555/20099 (97.29%) Loss: 1.965425 LR: 0.00000607 [21:07:01] Epoch: 1 Batch: 19556/20099 (97.30%) Loss: 1.888842 LR: 0.00000607 [21:07:03] Epoch: 1 Batch: 19557/20099 (97.30%) Loss: 2.133152 LR: 0.00000607 [21:07:04] Epoch: 1 Batch: 19558/20099 (97.31%) Loss: 2.215588 LR: 0.00000607 [21:07:06] Epoch: 1 Batch: 19559/20099 (97.31%) Loss: 1.980094 LR: 0.00000607 [21:07:08] Epoch: 1 Batch: 19560/20099 (97.32%) Loss: 1.983996 LR: 0.00000607 [21:07:09] Epoch: 1 Batch: 19561/20099 (97.32%) Loss: 1.945713 LR: 0.00000607 [21:07:11] Epoch: 1 Batch: 19562/20099 (97.33%) Loss: 2.131228 LR: 0.00000607 [21:07:13] Epoch: 1 Batch: 19563/20099 (97.33%) Loss: 2.089635 LR: 0.00000607 [21:07:14] Epoch: 1 Batch: 19564/20099 (97.34%) Loss: 2.124650 LR: 0.00000607 [21:07:16] Epoch: 1 Batch: 19565/20099 (97.34%) Loss: 1.903804 LR: 0.00000607 [21:07:18] Epoch: 1 Batch: 19566/20099 (97.35%) Loss: 2.179860 LR: 0.00000607 [21:07:20] Epoch: 1 Batch: 19567/20099 (97.35%) Loss: 2.118630 LR: 0.00000607 [21:07:21] Epoch: 1 Batch: 19568/20099 (97.36%) Loss: 1.946252 LR: 0.00000607 [21:07:23] Epoch: 1 Batch: 19569/20099 (97.36%) Loss: 2.150693 LR: 0.00000607 [21:07:25] Epoch: 1 Batch: 19570/20099 (97.37%) Loss: 2.279199 LR: 0.00000607 [21:07:26] Epoch: 1 Batch: 19571/20099 (97.37%) Loss: 2.124003 LR: 0.00000607 [21:07:28] Epoch: 1 Batch: 19572/20099 (97.38%) Loss: 1.950289 LR: 0.00000607 [21:07:30] Epoch: 1 Batch: 19573/20099 (97.38%) Loss: 2.014793 LR: 0.00000607 [21:07:31] Epoch: 1 Batch: 19574/20099 (97.39%) Loss: 2.119580 LR: 0.00000606 [21:07:33] Epoch: 1 Batch: 19575/20099 (97.39%) Loss: 1.867238 LR: 0.00000606 [21:07:35] Epoch: 1 Batch: 19576/20099 (97.40%) Loss: 
2.126518 LR: 0.00000606 [21:07:37] Epoch: 1 Batch: 19577/20099 (97.40%) Loss: 2.029558 LR: 0.00000606 [21:07:38] Epoch: 1 Batch: 19578/20099 (97.41%) Loss: 2.114726 LR: 0.00000606 [21:07:40] Epoch: 1 Batch: 19579/20099 (97.41%) Loss: 1.791034 LR: 0.00000606 [21:07:42] Epoch: 1 Batch: 19580/20099 (97.42%) Loss: 2.079845 LR: 0.00000606 [21:07:43] Epoch: 1 Batch: 19581/20099 (97.42%) Loss: 1.952391 LR: 0.00000606 [21:07:45] Epoch: 1 Batch: 19582/20099 (97.43%) Loss: 2.053125 LR: 0.00000606 [21:07:47] Epoch: 1 Batch: 19583/20099 (97.43%) Loss: 1.879852 LR: 0.00000606 [21:07:48] Epoch: 1 Batch: 19584/20099 (97.44%) Loss: 2.277481 LR: 0.00000606 [21:07:50] Epoch: 1 Batch: 19585/20099 (97.44%) Loss: 2.036930 LR: 0.00000606 [21:07:52] Epoch: 1 Batch: 19586/20099 (97.45%) Loss: 2.296636 LR: 0.00000606 [21:07:54] Epoch: 1 Batch: 19587/20099 (97.45%) Loss: 1.760554 LR: 0.00000606 [21:07:55] Epoch: 1 Batch: 19588/20099 (97.46%) Loss: 1.951828 LR: 0.00000606 [21:07:57] Epoch: 1 Batch: 19589/20099 (97.46%) Loss: 1.969662 LR: 0.00000606 [21:07:59] Epoch: 1 Batch: 19590/20099 (97.47%) Loss: 2.324870 LR: 0.00000606 [21:08:00] Epoch: 1 Batch: 19591/20099 (97.47%) Loss: 2.147874 LR: 0.00000606 [21:08:02] Epoch: 1 Batch: 19592/20099 (97.48%) Loss: 1.635293 LR: 0.00000606 [21:08:04] Epoch: 1 Batch: 19593/20099 (97.48%) Loss: 2.144351 LR: 0.00000606 [21:08:05] Epoch: 1 Batch: 19594/20099 (97.49%) Loss: 2.038568 LR: 0.00000606 [21:08:07] Epoch: 1 Batch: 19595/20099 (97.49%) Loss: 1.552045 LR: 0.00000606 [21:08:09] Epoch: 1 Batch: 19596/20099 (97.50%) Loss: 2.147465 LR: 0.00000606 [21:08:11] Epoch: 1 Batch: 19597/20099 (97.50%) Loss: 1.891292 LR: 0.00000606 [21:08:12] Epoch: 1 Batch: 19598/20099 (97.51%) Loss: 2.180317 LR: 0.00000606 [21:08:14] Epoch: 1 Batch: 19599/20099 (97.51%) Loss: 2.229229 LR: 0.00000606 [21:08:19] >> Temp checkpoint saved: epoch1_step19600, size: 0.1693 GB [21:08:19] Epoch: 1 Batch: 19600/20099 (97.52%) Loss: 2.047410 LR: 0.00000606 [21:08:21] Epoch: 1 Batch: 
19601/20099 (97.52%) Loss: 2.106380 LR: 0.00000606 [21:08:23] Epoch: 1 Batch: 19602/20099 (97.53%) Loss: 2.263475 LR: 0.00000606 [21:08:24] Epoch: 1 Batch: 19603/20099 (97.53%) Loss: 2.067778 LR: 0.00000606 [21:08:26] Epoch: 1 Batch: 19604/20099 (97.54%) Loss: 2.020451 LR: 0.00000606 [21:08:28] Epoch: 1 Batch: 19605/20099 (97.54%) Loss: 1.901429 LR: 0.00000606 [21:08:29] Epoch: 1 Batch: 19606/20099 (97.55%) Loss: 2.157249 LR: 0.00000606 [21:08:31] Epoch: 1 Batch: 19607/20099 (97.55%) Loss: 1.895662 LR: 0.00000606 [21:08:33] Epoch: 1 Batch: 19608/20099 (97.56%) Loss: 2.216561 LR: 0.00000606 [21:08:35] Epoch: 1 Batch: 19609/20099 (97.56%) Loss: 2.307118 LR: 0.00000606 [21:08:36] Epoch: 1 Batch: 19610/20099 (97.57%) Loss: 2.419635 LR: 0.00000606 [21:08:38] Epoch: 1 Batch: 19611/20099 (97.57%) Loss: 2.175984 LR: 0.00000606 [21:08:40] Epoch: 1 Batch: 19612/20099 (97.58%) Loss: 2.160920 LR: 0.00000606 [21:08:41] Epoch: 1 Batch: 19613/20099 (97.58%) Loss: 1.833313 LR: 0.00000606 [21:08:43] Epoch: 1 Batch: 19614/20099 (97.59%) Loss: 1.977506 LR: 0.00000606 [21:08:45] Epoch: 1 Batch: 19615/20099 (97.59%) Loss: 1.926411 LR: 0.00000606 [21:08:46] Epoch: 1 Batch: 19616/20099 (97.60%) Loss: 2.224730 LR: 0.00000605 [21:08:48] Epoch: 1 Batch: 19617/20099 (97.60%) Loss: 2.179622 LR: 0.00000605 [21:08:50] Epoch: 1 Batch: 19618/20099 (97.61%) Loss: 2.471264 LR: 0.00000605 [21:08:52] Epoch: 1 Batch: 19619/20099 (97.61%) Loss: 2.367577 LR: 0.00000605 [21:08:53] Epoch: 1 Batch: 19620/20099 (97.62%) Loss: 2.237221 LR: 0.00000605 [21:08:55] Epoch: 1 Batch: 19621/20099 (97.62%) Loss: 2.056040 LR: 0.00000605 [21:08:57] Epoch: 1 Batch: 19622/20099 (97.63%) Loss: 2.350324 LR: 0.00000605 [21:08:58] Epoch: 1 Batch: 19623/20099 (97.63%) Loss: 1.847331 LR: 0.00000605 [21:09:00] Epoch: 1 Batch: 19624/20099 (97.64%) Loss: 2.109378 LR: 0.00000605 [21:09:02] Epoch: 1 Batch: 19625/20099 (97.64%) Loss: 1.914697 LR: 0.00000605 [21:09:04] Epoch: 1 Batch: 19626/20099 (97.65%) Loss: 1.644272 LR: 
0.00000605 [21:09:05] Epoch: 1 Batch: 19627/20099 (97.65%) Loss: 2.252537 LR: 0.00000605 [21:09:07] Epoch: 1 Batch: 19628/20099 (97.66%) Loss: 1.957116 LR: 0.00000605 [21:09:09] Epoch: 1 Batch: 19629/20099 (97.66%) Loss: 2.261369 LR: 0.00000605 [21:09:10] Epoch: 1 Batch: 19630/20099 (97.67%) Loss: 2.267442 LR: 0.00000605 [21:09:12] Epoch: 1 Batch: 19631/20099 (97.67%) Loss: 2.226410 LR: 0.00000605 [21:09:14] Epoch: 1 Batch: 19632/20099 (97.68%) Loss: 2.233835 LR: 0.00000605 [21:09:15] Epoch: 1 Batch: 19633/20099 (97.68%) Loss: 2.181283 LR: 0.00000605 [21:09:17] Epoch: 1 Batch: 19634/20099 (97.69%) Loss: 2.031026 LR: 0.00000605 [21:09:19] Epoch: 1 Batch: 19635/20099 (97.69%) Loss: 1.927615 LR: 0.00000605 [21:09:21] Epoch: 1 Batch: 19636/20099 (97.70%) Loss: 2.067996 LR: 0.00000605 [21:09:22] Epoch: 1 Batch: 19637/20099 (97.70%) Loss: 2.301143 LR: 0.00000605 [21:09:24] Epoch: 1 Batch: 19638/20099 (97.71%) Loss: 2.177565 LR: 0.00000605 [21:09:26] Epoch: 1 Batch: 19639/20099 (97.71%) Loss: 2.415480 LR: 0.00000605 [21:09:27] Epoch: 1 Batch: 19640/20099 (97.72%) Loss: 2.148036 LR: 0.00000605 [21:09:29] Epoch: 1 Batch: 19641/20099 (97.72%) Loss: 2.194405 LR: 0.00000605 [21:09:31] Epoch: 1 Batch: 19642/20099 (97.73%) Loss: 2.033323 LR: 0.00000605 [21:09:32] Epoch: 1 Batch: 19643/20099 (97.73%) Loss: 2.031634 LR: 0.00000605 [21:09:34] Epoch: 1 Batch: 19644/20099 (97.74%) Loss: 2.242031 LR: 0.00000605 [21:09:36] Epoch: 1 Batch: 19645/20099 (97.74%) Loss: 2.073355 LR: 0.00000605 [21:09:37] Epoch: 1 Batch: 19646/20099 (97.75%) Loss: 2.119832 LR: 0.00000605 [21:09:39] Epoch: 1 Batch: 19647/20099 (97.75%) Loss: 2.147381 LR: 0.00000605 [21:09:41] Epoch: 1 Batch: 19648/20099 (97.76%) Loss: 2.008725 LR: 0.00000605 [21:09:43] Epoch: 1 Batch: 19649/20099 (97.76%) Loss: 2.193848 LR: 0.00000605 [21:09:44] Epoch: 1 Batch: 19650/20099 (97.77%) Loss: 2.126035 LR: 0.00000605 [21:09:46] Epoch: 1 Batch: 19651/20099 (97.77%) Loss: 2.037643 LR: 0.00000605 [21:09:48] Epoch: 1 Batch: 19652/20099 
(97.78%) Loss: 2.156614 LR: 0.00000605 [21:09:49] Epoch: 1 Batch: 19653/20099 (97.78%) Loss: 2.118502 LR: 0.00000605 [21:09:51] Epoch: 1 Batch: 19654/20099 (97.79%) Loss: 2.496036 LR: 0.00000605 [21:09:53] Epoch: 1 Batch: 19655/20099 (97.79%) Loss: 2.002283 LR: 0.00000605 [21:09:54] Epoch: 1 Batch: 19656/20099 (97.80%) Loss: 2.070154 LR: 0.00000605 [21:09:56] Epoch: 1 Batch: 19657/20099 (97.80%) Loss: 1.906476 LR: 0.00000605 [21:09:58] Epoch: 1 Batch: 19658/20099 (97.81%) Loss: 2.419716 LR: 0.00000604 [21:09:59] Epoch: 1 Batch: 19659/20099 (97.81%) Loss: 2.301998 LR: 0.00000604 [21:10:01] Epoch: 1 Batch: 19660/20099 (97.82%) Loss: 2.082315 LR: 0.00000604 [21:10:03] Epoch: 1 Batch: 19661/20099 (97.82%) Loss: 2.089788 LR: 0.00000604 [21:10:04] Epoch: 1 Batch: 19662/20099 (97.83%) Loss: 2.007788 LR: 0.00000604 [21:10:06] Epoch: 1 Batch: 19663/20099 (97.83%) Loss: 2.155777 LR: 0.00000604 [21:10:08] Epoch: 1 Batch: 19664/20099 (97.84%) Loss: 2.118854 LR: 0.00000604 [21:10:10] Epoch: 1 Batch: 19665/20099 (97.84%) Loss: 2.233336 LR: 0.00000604 [21:10:11] Epoch: 1 Batch: 19666/20099 (97.85%) Loss: 2.043792 LR: 0.00000604 [21:10:13] Epoch: 1 Batch: 19667/20099 (97.85%) Loss: 1.893848 LR: 0.00000604 [21:10:15] Epoch: 1 Batch: 19668/20099 (97.86%) Loss: 2.282824 LR: 0.00000604 [21:10:16] Epoch: 1 Batch: 19669/20099 (97.86%) Loss: 2.357982 LR: 0.00000604 [21:10:18] Epoch: 1 Batch: 19670/20099 (97.87%) Loss: 2.036724 LR: 0.00000604 [21:10:20] Epoch: 1 Batch: 19671/20099 (97.87%) Loss: 2.109029 LR: 0.00000604 [21:10:21] Epoch: 1 Batch: 19672/20099 (97.88%) Loss: 2.200966 LR: 0.00000604 [21:10:23] Epoch: 1 Batch: 19673/20099 (97.88%) Loss: 2.075505 LR: 0.00000604 [21:10:25] Epoch: 1 Batch: 19674/20099 (97.89%) Loss: 2.063268 LR: 0.00000604 [21:10:27] Epoch: 1 Batch: 19675/20099 (97.89%) Loss: 2.054620 LR: 0.00000604 [21:10:28] Epoch: 1 Batch: 19676/20099 (97.90%) Loss: 2.158583 LR: 0.00000604 [21:10:30] Epoch: 1 Batch: 19677/20099 (97.90%) Loss: 1.894716 LR: 0.00000604 [21:10:32] 
Epoch: 1 Batch: 19678/20099 (97.91%) Loss: 1.905698 LR: 0.00000604 [21:10:34] Epoch: 1 Batch: 19679/20099 (97.91%) Loss: 2.126866 LR: 0.00000604 [21:10:35] Epoch: 1 Batch: 19680/20099 (97.92%) Loss: 1.933876 LR: 0.00000604 [21:10:37] Epoch: 1 Batch: 19681/20099 (97.92%) Loss: 2.253431 LR: 0.00000604 [21:10:39] Epoch: 1 Batch: 19682/20099 (97.93%) Loss: 1.996334 LR: 0.00000604 [21:10:40] Epoch: 1 Batch: 19683/20099 (97.93%) Loss: 1.831973 LR: 0.00000604 [21:10:42] Epoch: 1 Batch: 19684/20099 (97.94%) Loss: 2.302943 LR: 0.00000604 [21:10:44] Epoch: 1 Batch: 19685/20099 (97.94%) Loss: 2.411334 LR: 0.00000604 [21:10:45] Epoch: 1 Batch: 19686/20099 (97.95%) Loss: 2.135217 LR: 0.00000604 [21:10:47] Epoch: 1 Batch: 19687/20099 (97.95%) Loss: 2.168799 LR: 0.00000604 [21:10:49] Epoch: 1 Batch: 19688/20099 (97.96%) Loss: 2.111564 LR: 0.00000604 [21:10:51] Epoch: 1 Batch: 19689/20099 (97.96%) Loss: 1.937084 LR: 0.00000604 [21:10:52] Epoch: 1 Batch: 19690/20099 (97.97%) Loss: 1.887712 LR: 0.00000604 [21:10:54] Epoch: 1 Batch: 19691/20099 (97.97%) Loss: 1.951787 LR: 0.00000604 [21:10:56] Epoch: 1 Batch: 19692/20099 (97.98%) Loss: 2.012900 LR: 0.00000604 [21:10:57] Epoch: 1 Batch: 19693/20099 (97.98%) Loss: 2.049324 LR: 0.00000604 [21:10:59] Epoch: 1 Batch: 19694/20099 (97.98%) Loss: 2.214893 LR: 0.00000604 [21:11:01] Epoch: 1 Batch: 19695/20099 (97.99%) Loss: 2.252084 LR: 0.00000604 [21:11:02] Epoch: 1 Batch: 19696/20099 (97.99%) Loss: 2.282241 LR: 0.00000604 [21:11:04] Epoch: 1 Batch: 19697/20099 (98.00%) Loss: 2.403964 LR: 0.00000604 [21:11:06] Epoch: 1 Batch: 19698/20099 (98.00%) Loss: 2.118081 LR: 0.00000604 [21:11:07] Epoch: 1 Batch: 19699/20099 (98.01%) Loss: 2.099753 LR: 0.00000604 [21:11:09] Epoch: 1 Batch: 19700/20099 (98.01%) Loss: 1.905315 LR: 0.00000604 [21:11:11] Epoch: 1 Batch: 19701/20099 (98.02%) Loss: 2.092693 LR: 0.00000604 [21:11:13] Epoch: 1 Batch: 19702/20099 (98.02%) Loss: 2.173392 LR: 0.00000604 [21:11:14] Epoch: 1 Batch: 19703/20099 (98.03%) Loss: 
2.438276 LR: 0.00000604 [21:11:16] Epoch: 1 Batch: 19704/20099 (98.03%) Loss: 2.157546 LR: 0.00000604 [21:11:18] Epoch: 1 Batch: 19705/20099 (98.04%) Loss: 1.947139 LR: 0.00000604 [21:11:19] Epoch: 1 Batch: 19706/20099 (98.04%) Loss: 2.170629 LR: 0.00000604 [21:11:21] Epoch: 1 Batch: 19707/20099 (98.05%) Loss: 1.843884 LR: 0.00000604 [21:11:23] Epoch: 1 Batch: 19708/20099 (98.05%) Loss: 1.996712 LR: 0.00000604 [21:11:25] Epoch: 1 Batch: 19709/20099 (98.06%) Loss: 2.203949 LR: 0.00000604 [21:11:26] Epoch: 1 Batch: 19710/20099 (98.06%) Loss: 1.928943 LR: 0.00000604 [21:11:28] Epoch: 1 Batch: 19711/20099 (98.07%) Loss: 2.063842 LR: 0.00000604 [21:11:30] Epoch: 1 Batch: 19712/20099 (98.07%) Loss: 2.082536 LR: 0.00000604 [21:11:31] Epoch: 1 Batch: 19713/20099 (98.08%) Loss: 2.192016 LR: 0.00000604 [21:11:33] Epoch: 1 Batch: 19714/20099 (98.08%) Loss: 2.026328 LR: 0.00000603 [21:11:35] Epoch: 1 Batch: 19715/20099 (98.09%) Loss: 1.948423 LR: 0.00000603 [21:11:36] Epoch: 1 Batch: 19716/20099 (98.09%) Loss: 2.223531 LR: 0.00000603 [21:11:38] Epoch: 1 Batch: 19717/20099 (98.10%) Loss: 1.804175 LR: 0.00000603 [21:11:40] Epoch: 1 Batch: 19718/20099 (98.10%) Loss: 1.847138 LR: 0.00000603 [21:11:42] Epoch: 1 Batch: 19719/20099 (98.11%) Loss: 2.468766 LR: 0.00000603 [21:11:43] Epoch: 1 Batch: 19720/20099 (98.11%) Loss: 2.296887 LR: 0.00000603 [21:11:45] Epoch: 1 Batch: 19721/20099 (98.12%) Loss: 1.811715 LR: 0.00000603 [21:11:47] Epoch: 1 Batch: 19722/20099 (98.12%) Loss: 1.916123 LR: 0.00000603 [21:11:48] Epoch: 1 Batch: 19723/20099 (98.13%) Loss: 1.967202 LR: 0.00000603 [21:11:50] Epoch: 1 Batch: 19724/20099 (98.13%) Loss: 2.255395 LR: 0.00000603 [21:11:52] Epoch: 1 Batch: 19725/20099 (98.14%) Loss: 2.021192 LR: 0.00000603 [21:11:53] Epoch: 1 Batch: 19726/20099 (98.14%) Loss: 2.113619 LR: 0.00000603 [21:11:55] Epoch: 1 Batch: 19727/20099 (98.15%) Loss: 2.008102 LR: 0.00000603 [21:11:57] Epoch: 1 Batch: 19728/20099 (98.15%) Loss: 2.094108 LR: 0.00000603 [21:11:58] Epoch: 1 
Batch: 19729/20099 (98.16%) Loss: 2.080100 LR: 0.00000603 [21:12:00] Epoch: 1 Batch: 19730/20099 (98.16%) Loss: 2.113569 LR: 0.00000603 [21:12:02] Epoch: 1 Batch: 19731/20099 (98.17%) Loss: 2.062838 LR: 0.00000603 [21:12:03] Epoch: 1 Batch: 19732/20099 (98.17%) Loss: 1.894602 LR: 0.00000603 [21:12:05] Epoch: 1 Batch: 19733/20099 (98.18%) Loss: 2.380128 LR: 0.00000603 [21:12:07] Epoch: 1 Batch: 19734/20099 (98.18%) Loss: 2.104017 LR: 0.00000603 [21:12:09] Epoch: 1 Batch: 19735/20099 (98.19%) Loss: 1.709727 LR: 0.00000603 [21:12:10] Epoch: 1 Batch: 19736/20099 (98.19%) Loss: 2.038914 LR: 0.00000603 [21:12:12] Epoch: 1 Batch: 19737/20099 (98.20%) Loss: 2.046742 LR: 0.00000603 [21:12:14] Epoch: 1 Batch: 19738/20099 (98.20%) Loss: 2.071620 LR: 0.00000603 [21:12:15] Epoch: 1 Batch: 19739/20099 (98.21%) Loss: 2.168152 LR: 0.00000603 [21:12:17] Epoch: 1 Batch: 19740/20099 (98.21%) Loss: 2.043953 LR: 0.00000603 [21:12:19] Epoch: 1 Batch: 19741/20099 (98.22%) Loss: 2.293775 LR: 0.00000603 [21:12:20] Epoch: 1 Batch: 19742/20099 (98.22%) Loss: 2.314892 LR: 0.00000603 [21:12:22] Epoch: 1 Batch: 19743/20099 (98.23%) Loss: 1.911069 LR: 0.00000603 [21:12:24] Epoch: 1 Batch: 19744/20099 (98.23%) Loss: 2.020998 LR: 0.00000603 [21:12:26] Epoch: 1 Batch: 19745/20099 (98.24%) Loss: 2.140975 LR: 0.00000603 [21:12:27] Epoch: 1 Batch: 19746/20099 (98.24%) Loss: 1.997423 LR: 0.00000603 [21:12:29] Epoch: 1 Batch: 19747/20099 (98.25%) Loss: 1.922877 LR: 0.00000603 [21:12:31] Epoch: 1 Batch: 19748/20099 (98.25%) Loss: 1.861700 LR: 0.00000603 [21:12:32] Epoch: 1 Batch: 19749/20099 (98.26%) Loss: 1.913440 LR: 0.00000603 [21:12:34] Epoch: 1 Batch: 19750/20099 (98.26%) Loss: 2.223578 LR: 0.00000603 [21:12:36] Epoch: 1 Batch: 19751/20099 (98.27%) Loss: 1.813420 LR: 0.00000603 [21:12:37] Epoch: 1 Batch: 19752/20099 (98.27%) Loss: 1.790297 LR: 0.00000603 [21:12:39] Epoch: 1 Batch: 19753/20099 (98.28%) Loss: 2.447745 LR: 0.00000603 [21:12:41] Epoch: 1 Batch: 19754/20099 (98.28%) Loss: 2.159322 LR: 
0.00000603 [21:12:43] Epoch: 1 Batch: 19755/20099 (98.29%) Loss: 2.067868 LR: 0.00000603 [21:12:44] Epoch: 1 Batch: 19756/20099 (98.29%) Loss: 2.145474 LR: 0.00000603 [21:12:46] Epoch: 1 Batch: 19757/20099 (98.30%) Loss: 2.218757 LR: 0.00000603 [21:12:48] Epoch: 1 Batch: 19758/20099 (98.30%) Loss: 1.833680 LR: 0.00000603 [21:12:49] Epoch: 1 Batch: 19759/20099 (98.31%) Loss: 2.323237 LR: 0.00000603 [21:12:51] Epoch: 1 Batch: 19760/20099 (98.31%) Loss: 1.930634 LR: 0.00000603 [21:12:53] Epoch: 1 Batch: 19761/20099 (98.32%) Loss: 1.875503 LR: 0.00000603 [21:12:54] Epoch: 1 Batch: 19762/20099 (98.32%) Loss: 2.133965 LR: 0.00000603 [21:12:56] Epoch: 1 Batch: 19763/20099 (98.33%) Loss: 2.174304 LR: 0.00000603 [21:12:58] Epoch: 1 Batch: 19764/20099 (98.33%) Loss: 1.866125 LR: 0.00000603 [21:12:59] Epoch: 1 Batch: 19765/20099 (98.34%) Loss: 2.230114 LR: 0.00000603 [21:13:01] Epoch: 1 Batch: 19766/20099 (98.34%) Loss: 2.207559 LR: 0.00000603 [21:13:03] Epoch: 1 Batch: 19767/20099 (98.35%) Loss: 2.667779 LR: 0.00000603 [21:13:05] Epoch: 1 Batch: 19768/20099 (98.35%) Loss: 2.096428 LR: 0.00000603 [21:13:06] Epoch: 1 Batch: 19769/20099 (98.36%) Loss: 2.190620 LR: 0.00000603 [21:13:08] Epoch: 1 Batch: 19770/20099 (98.36%) Loss: 2.128263 LR: 0.00000603 [21:13:10] Epoch: 1 Batch: 19771/20099 (98.37%) Loss: 2.224972 LR: 0.00000603 [21:13:11] Epoch: 1 Batch: 19772/20099 (98.37%) Loss: 2.028450 LR: 0.00000603 [21:13:13] Epoch: 1 Batch: 19773/20099 (98.38%) Loss: 2.242065 LR: 0.00000603 [21:13:15] Epoch: 1 Batch: 19774/20099 (98.38%) Loss: 2.319241 LR: 0.00000603 [21:13:16] Epoch: 1 Batch: 19775/20099 (98.39%) Loss: 2.059998 LR: 0.00000603 [21:13:18] Epoch: 1 Batch: 19776/20099 (98.39%) Loss: 1.928091 LR: 0.00000603 [21:13:20] Epoch: 1 Batch: 19777/20099 (98.40%) Loss: 2.175483 LR: 0.00000602 [21:13:22] Epoch: 1 Batch: 19778/20099 (98.40%) Loss: 2.162094 LR: 0.00000602 [21:13:23] Epoch: 1 Batch: 19779/20099 (98.41%) Loss: 1.738840 LR: 0.00000602 [21:13:25] Epoch: 1 Batch: 19780/20099 
(98.41%) Loss: 1.834609 LR: 0.00000602 [21:13:27] Epoch: 1 Batch: 19781/20099 (98.42%) Loss: 2.065798 LR: 0.00000602 [21:13:28] Epoch: 1 Batch: 19782/20099 (98.42%) Loss: 1.981241 LR: 0.00000602 [21:13:30] Epoch: 1 Batch: 19783/20099 (98.43%) Loss: 2.163119 LR: 0.00000602 [21:13:32] Epoch: 1 Batch: 19784/20099 (98.43%) Loss: 2.136414 LR: 0.00000602 [21:13:33] Epoch: 1 Batch: 19785/20099 (98.44%) Loss: 2.007050 LR: 0.00000602 [21:13:35] Epoch: 1 Batch: 19786/20099 (98.44%) Loss: 1.792248 LR: 0.00000602 [21:13:37] Epoch: 1 Batch: 19787/20099 (98.45%) Loss: 2.074148 LR: 0.00000602 [21:13:38] Epoch: 1 Batch: 19788/20099 (98.45%) Loss: 1.591648 LR: 0.00000602 [21:13:40] Epoch: 1 Batch: 19789/20099 (98.46%) Loss: 2.062532 LR: 0.00000602 [21:13:42] Epoch: 1 Batch: 19790/20099 (98.46%) Loss: 2.058855 LR: 0.00000602 [21:13:44] Epoch: 1 Batch: 19791/20099 (98.47%) Loss: 1.814199 LR: 0.00000602 [21:13:45] Epoch: 1 Batch: 19792/20099 (98.47%) Loss: 1.950415 LR: 0.00000602 [21:13:47] Epoch: 1 Batch: 19793/20099 (98.48%) Loss: 1.771776 LR: 0.00000602 [21:13:49] Epoch: 1 Batch: 19794/20099 (98.48%) Loss: 1.851476 LR: 0.00000602 [21:13:50] Epoch: 1 Batch: 19795/20099 (98.49%) Loss: 2.052677 LR: 0.00000602 [21:13:52] Epoch: 1 Batch: 19796/20099 (98.49%) Loss: 1.928611 LR: 0.00000602 [21:13:54] Epoch: 1 Batch: 19797/20099 (98.50%) Loss: 2.180693 LR: 0.00000602 [21:13:55] Epoch: 1 Batch: 19798/20099 (98.50%) Loss: 2.166965 LR: 0.00000602 [21:13:57] Epoch: 1 Batch: 19799/20099 (98.51%) Loss: 2.018901 LR: 0.00000602 [21:14:03] >> Cleaned up old temp checkpoint: epoch1_step17800 [21:14:03] >> Temp checkpoint saved: epoch1_step19800, size: 0.1693 GB [21:14:03] Epoch: 1 Batch: 19800/20099 (98.51%) Loss: 2.052021 LR: 0.00000602 [21:14:04] Epoch: 1 Batch: 19801/20099 (98.52%) Loss: 2.292603 LR: 0.00000602 [21:14:06] Epoch: 1 Batch: 19802/20099 (98.52%) Loss: 1.940827 LR: 0.00000602 [21:14:08] Epoch: 1 Batch: 19803/20099 (98.53%) Loss: 2.090777 LR: 0.00000602 [21:14:09] Epoch: 1 Batch: 
19804/20099 (98.53%) Loss: 1.982296 LR: 0.00000602 [21:14:11] Epoch: 1 Batch: 19805/20099 (98.54%) Loss: 1.870217 LR: 0.00000602 [21:14:13] Epoch: 1 Batch: 19806/20099 (98.54%) Loss: 2.375092 LR: 0.00000602 [21:14:14] Epoch: 1 Batch: 19807/20099 (98.55%) Loss: 2.053111 LR: 0.00000602 [21:14:16] Epoch: 1 Batch: 19808/20099 (98.55%) Loss: 1.820624 LR: 0.00000602 [21:14:18] Epoch: 1 Batch: 19809/20099 (98.56%) Loss: 1.875758 LR: 0.00000602 [21:14:19] Epoch: 1 Batch: 19810/20099 (98.56%) Loss: 1.896281 LR: 0.00000602 [21:14:21] Epoch: 1 Batch: 19811/20099 (98.57%) Loss: 2.519021 LR: 0.00000602 [21:14:23] Epoch: 1 Batch: 19812/20099 (98.57%) Loss: 2.417110 LR: 0.00000602 [21:14:25] Epoch: 1 Batch: 19813/20099 (98.58%) Loss: 2.032626 LR: 0.00000602 [21:14:26] Epoch: 1 Batch: 19814/20099 (98.58%) Loss: 2.289120 LR: 0.00000602 [21:14:28] Epoch: 1 Batch: 19815/20099 (98.59%) Loss: 1.898158 LR: 0.00000602 [21:14:30] Epoch: 1 Batch: 19816/20099 (98.59%) Loss: 2.255989 LR: 0.00000602 [21:14:31] Epoch: 1 Batch: 19817/20099 (98.60%) Loss: 1.928620 LR: 0.00000602 [21:14:33] Epoch: 1 Batch: 19818/20099 (98.60%) Loss: 2.066255 LR: 0.00000602 [21:14:35] Epoch: 1 Batch: 19819/20099 (98.61%) Loss: 2.556113 LR: 0.00000602 [21:14:37] Epoch: 1 Batch: 19820/20099 (98.61%) Loss: 2.461286 LR: 0.00000602 [21:14:38] Epoch: 1 Batch: 19821/20099 (98.62%) Loss: 2.330388 LR: 0.00000602 [21:14:40] Epoch: 1 Batch: 19822/20099 (98.62%) Loss: 1.928169 LR: 0.00000602 [21:14:42] Epoch: 1 Batch: 19823/20099 (98.63%) Loss: 2.100055 LR: 0.00000602 [21:14:43] Epoch: 1 Batch: 19824/20099 (98.63%) Loss: 1.911302 LR: 0.00000602 [21:14:45] Epoch: 1 Batch: 19825/20099 (98.64%) Loss: 2.287809 LR: 0.00000602 [21:14:47] Epoch: 1 Batch: 19826/20099 (98.64%) Loss: 2.085364 LR: 0.00000602 [21:14:48] Epoch: 1 Batch: 19827/20099 (98.65%) Loss: 2.183035 LR: 0.00000602 [21:14:50] Epoch: 1 Batch: 19828/20099 (98.65%) Loss: 2.047106 LR: 0.00000602 [21:14:52] Epoch: 1 Batch: 19829/20099 (98.66%) Loss: 2.056818 LR: 
0.00000602 [21:14:54] Epoch: 1 Batch: 19830/20099 (98.66%) Loss: 1.722013 LR: 0.00000602 [21:14:55] Epoch: 1 Batch: 19831/20099 (98.67%) Loss: 2.022016 LR: 0.00000602 [21:14:57] Epoch: 1 Batch: 19832/20099 (98.67%) Loss: 2.126859 LR: 0.00000602 [21:14:59] Epoch: 1 Batch: 19833/20099 (98.68%) Loss: 1.922251 LR: 0.00000602 [21:15:00] Epoch: 1 Batch: 19834/20099 (98.68%) Loss: 2.169086 LR: 0.00000602 [21:15:02] Epoch: 1 Batch: 19835/20099 (98.69%) Loss: 2.153125 LR: 0.00000602 [21:15:04] Epoch: 1 Batch: 19836/20099 (98.69%) Loss: 2.191390 LR: 0.00000602 [21:15:05] Epoch: 1 Batch: 19837/20099 (98.70%) Loss: 2.248435 LR: 0.00000602 [21:15:07] Epoch: 1 Batch: 19838/20099 (98.70%) Loss: 1.974731 LR: 0.00000602 [21:15:09] Epoch: 1 Batch: 19839/20099 (98.71%) Loss: 2.181959 LR: 0.00000602 [21:15:11] Epoch: 1 Batch: 19840/20099 (98.71%) Loss: 1.811167 LR: 0.00000602 [21:15:12] Epoch: 1 Batch: 19841/20099 (98.72%) Loss: 1.840478 LR: 0.00000602 [21:15:14] Epoch: 1 Batch: 19842/20099 (98.72%) Loss: 1.963462 LR: 0.00000602 [21:15:16] Epoch: 1 Batch: 19843/20099 (98.73%) Loss: 2.038458 LR: 0.00000602 [21:15:17] Epoch: 1 Batch: 19844/20099 (98.73%) Loss: 2.004476 LR: 0.00000602 [21:15:19] Epoch: 1 Batch: 19845/20099 (98.74%) Loss: 2.078323 LR: 0.00000602 [21:15:21] Epoch: 1 Batch: 19846/20099 (98.74%) Loss: 2.155626 LR: 0.00000602 [21:15:22] Epoch: 1 Batch: 19847/20099 (98.75%) Loss: 2.095434 LR: 0.00000602 [21:15:24] Epoch: 1 Batch: 19848/20099 (98.75%) Loss: 1.807209 LR: 0.00000602 [21:15:26] Epoch: 1 Batch: 19849/20099 (98.76%) Loss: 2.173926 LR: 0.00000602 [21:15:27] Epoch: 1 Batch: 19850/20099 (98.76%) Loss: 2.241233 LR: 0.00000602 [21:15:29] Epoch: 1 Batch: 19851/20099 (98.77%) Loss: 2.129119 LR: 0.00000602 [21:15:31] Epoch: 1 Batch: 19852/20099 (98.77%) Loss: 2.088399 LR: 0.00000602 [21:15:32] Epoch: 1 Batch: 19853/20099 (98.78%) Loss: 2.365876 LR: 0.00000602 [21:15:34] Epoch: 1 Batch: 19854/20099 (98.78%) Loss: 1.820472 LR: 0.00000601 [21:15:36] Epoch: 1 Batch: 19855/20099 
(98.79%) Loss: 2.125359 LR: 0.00000601 [21:15:37] Epoch: 1 Batch: 19856/20099 (98.79%) Loss: 2.111696 LR: 0.00000601 [21:15:39] Epoch: 1 Batch: 19857/20099 (98.80%) Loss: 2.118651 LR: 0.00000601 [21:15:41] Epoch: 1 Batch: 19858/20099 (98.80%) Loss: 1.898112 LR: 0.00000601 [21:15:43] Epoch: 1 Batch: 19859/20099 (98.81%) Loss: 2.364972 LR: 0.00000601 [21:15:44] Epoch: 1 Batch: 19860/20099 (98.81%) Loss: 1.888286 LR: 0.00000601 [21:15:46] Epoch: 1 Batch: 19861/20099 (98.82%) Loss: 1.925153 LR: 0.00000601 [21:15:48] Epoch: 1 Batch: 19862/20099 (98.82%) Loss: 2.251948 LR: 0.00000601 [21:15:49] Epoch: 1 Batch: 19863/20099 (98.83%) Loss: 2.079187 LR: 0.00000601 [21:15:51] Epoch: 1 Batch: 19864/20099 (98.83%) Loss: 2.162825 LR: 0.00000601 [21:15:53] Epoch: 1 Batch: 19865/20099 (98.84%) Loss: 2.204412 LR: 0.00000601 [21:15:54] Epoch: 1 Batch: 19866/20099 (98.84%) Loss: 2.361954 LR: 0.00000601 [21:15:56] Epoch: 1 Batch: 19867/20099 (98.85%) Loss: 2.182274 LR: 0.00000601 [21:15:58] Epoch: 1 Batch: 19868/20099 (98.85%) Loss: 2.418038 LR: 0.00000601 [21:15:59] Epoch: 1 Batch: 19869/20099 (98.86%) Loss: 2.044775 LR: 0.00000601 [21:16:01] Epoch: 1 Batch: 19870/20099 (98.86%) Loss: 2.260160 LR: 0.00000601 [21:16:03] Epoch: 1 Batch: 19871/20099 (98.87%) Loss: 2.349758 LR: 0.00000601 [21:16:05] Epoch: 1 Batch: 19872/20099 (98.87%) Loss: 2.136977 LR: 0.00000601 [21:16:06] Epoch: 1 Batch: 19873/20099 (98.88%) Loss: 1.982842 LR: 0.00000601 [21:16:08] Epoch: 1 Batch: 19874/20099 (98.88%) Loss: 2.386414 LR: 0.00000601 [21:16:10] Epoch: 1 Batch: 19875/20099 (98.89%) Loss: 1.828253 LR: 0.00000601 [21:16:11] Epoch: 1 Batch: 19876/20099 (98.89%) Loss: 2.059540 LR: 0.00000601 [21:16:13] Epoch: 1 Batch: 19877/20099 (98.90%) Loss: 2.177837 LR: 0.00000601 [21:16:15] Epoch: 1 Batch: 19878/20099 (98.90%) Loss: 2.270956 LR: 0.00000601 [21:16:16] Epoch: 1 Batch: 19879/20099 (98.91%) Loss: 2.152571 LR: 0.00000601 [21:16:18] Epoch: 1 Batch: 19880/20099 (98.91%) Loss: 2.391748 LR: 0.00000601 [21:16:20] 
Epoch: 1 Batch: 19881/20099 (98.92%) Loss: 1.922534 LR: 0.00000601 [21:16:21] Epoch: 1 Batch: 19882/20099 (98.92%) Loss: 2.095635 LR: 0.00000601 [21:16:23] Epoch: 1 Batch: 19883/20099 (98.93%) Loss: 2.339785 LR: 0.00000601 [21:16:25] Epoch: 1 Batch: 19884/20099 (98.93%) Loss: 1.905582 LR: 0.00000601 [21:16:27] Epoch: 1 Batch: 19885/20099 (98.94%) Loss: 2.450078 LR: 0.00000601 [21:16:28] Epoch: 1 Batch: 19886/20099 (98.94%) Loss: 2.299301 LR: 0.00000601 [21:16:30] Epoch: 1 Batch: 19887/20099 (98.95%) Loss: 2.012414 LR: 0.00000601 [21:16:32] Epoch: 1 Batch: 19888/20099 (98.95%) Loss: 2.217902 LR: 0.00000601 [21:16:33] Epoch: 1 Batch: 19889/20099 (98.96%) Loss: 2.243109 LR: 0.00000601 [21:16:35] Epoch: 1 Batch: 19890/20099 (98.96%) Loss: 2.026502 LR: 0.00000601 [21:16:37] Epoch: 1 Batch: 19891/20099 (98.97%) Loss: 2.223279 LR: 0.00000601 [21:16:38] Epoch: 1 Batch: 19892/20099 (98.97%) Loss: 2.128953 LR: 0.00000601 [21:16:40] Epoch: 1 Batch: 19893/20099 (98.98%) Loss: 2.140095 LR: 0.00000601 [21:16:42] Epoch: 1 Batch: 19894/20099 (98.98%) Loss: 1.992040 LR: 0.00000601 [21:16:43] Epoch: 1 Batch: 19895/20099 (98.99%) Loss: 2.362668 LR: 0.00000601 [21:16:45] Epoch: 1 Batch: 19896/20099 (98.99%) Loss: 1.991993 LR: 0.00000601 [21:16:47] Epoch: 1 Batch: 19897/20099 (98.99%) Loss: 1.908153 LR: 0.00000601 [21:16:48] Epoch: 1 Batch: 19898/20099 (99.00%) Loss: 2.475658 LR: 0.00000601 [21:16:50] Epoch: 1 Batch: 19899/20099 (99.00%) Loss: 2.139417 LR: 0.00000601 [21:16:52] Epoch: 1 Batch: 19900/20099 (99.01%) Loss: 2.265611 LR: 0.00000601 [21:16:54] Epoch: 1 Batch: 19901/20099 (99.01%) Loss: 2.066012 LR: 0.00000601 [21:16:55] Epoch: 1 Batch: 19902/20099 (99.02%) Loss: 2.183114 LR: 0.00000601 [21:16:57] Epoch: 1 Batch: 19903/20099 (99.02%) Loss: 2.225366 LR: 0.00000601 [21:16:59] Epoch: 1 Batch: 19904/20099 (99.03%) Loss: 2.059197 LR: 0.00000601 [21:17:00] Epoch: 1 Batch: 19905/20099 (99.03%) Loss: 2.343226 LR: 0.00000601 [21:17:02] Epoch: 1 Batch: 19906/20099 (99.04%) Loss: 
1.905977 LR: 0.00000601 [21:17:04] Epoch: 1 Batch: 19907/20099 (99.04%) Loss: 1.926025 LR: 0.00000601 [21:17:05] Epoch: 1 Batch: 19908/20099 (99.05%) Loss: 2.233013 LR: 0.00000601 [21:17:07] Epoch: 1 Batch: 19909/20099 (99.05%) Loss: 2.001720 LR: 0.00000601 [21:17:09] Epoch: 1 Batch: 19910/20099 (99.06%) Loss: 2.192508 LR: 0.00000601 [21:17:10] Epoch: 1 Batch: 19911/20099 (99.06%) Loss: 1.998633 LR: 0.00000601 [21:17:12] Epoch: 1 Batch: 19912/20099 (99.07%) Loss: 1.491950 LR: 0.00000601 [21:17:14] Epoch: 1 Batch: 19913/20099 (99.07%) Loss: 2.068771 LR: 0.00000601 [21:17:16] Epoch: 1 Batch: 19914/20099 (99.08%) Loss: 2.128163 LR: 0.00000601 [21:17:17] Epoch: 1 Batch: 19915/20099 (99.08%) Loss: 1.915859 LR: 0.00000601 [21:17:19] Epoch: 1 Batch: 19916/20099 (99.09%) Loss: 2.079754 LR: 0.00000601 [21:17:21] Epoch: 1 Batch: 19917/20099 (99.09%) Loss: 1.931251 LR: 0.00000601 [21:17:22] Epoch: 1 Batch: 19918/20099 (99.10%) Loss: 1.794970 LR: 0.00000601 [21:17:24] Epoch: 1 Batch: 19919/20099 (99.10%) Loss: 2.064368 LR: 0.00000601 [21:17:26] Epoch: 1 Batch: 19920/20099 (99.11%) Loss: 1.681525 LR: 0.00000601 [21:17:27] Epoch: 1 Batch: 19921/20099 (99.11%) Loss: 2.125411 LR: 0.00000601 [21:17:29] Epoch: 1 Batch: 19922/20099 (99.12%) Loss: 2.191348 LR: 0.00000601 [21:17:31] Epoch: 1 Batch: 19923/20099 (99.12%) Loss: 1.762524 LR: 0.00000601 [21:17:32] Epoch: 1 Batch: 19924/20099 (99.13%) Loss: 2.188381 LR: 0.00000601 [21:17:34] Epoch: 1 Batch: 19925/20099 (99.13%) Loss: 2.229745 LR: 0.00000601 [21:17:36] Epoch: 1 Batch: 19926/20099 (99.14%) Loss: 1.964939 LR: 0.00000601 [21:17:38] Epoch: 1 Batch: 19927/20099 (99.14%) Loss: 2.091248 LR: 0.00000601 [21:17:39] Epoch: 1 Batch: 19928/20099 (99.15%) Loss: 2.110506 LR: 0.00000601 [21:17:41] Epoch: 1 Batch: 19929/20099 (99.15%) Loss: 2.221666 LR: 0.00000601 [21:17:43] Epoch: 1 Batch: 19930/20099 (99.16%) Loss: 2.092208 LR: 0.00000601 [21:17:44] Epoch: 1 Batch: 19931/20099 (99.16%) Loss: 1.979654 LR: 0.00000601 [21:17:46] Epoch: 1 
Batch: 19932/20099 (99.17%) Loss: 1.729502 LR: 0.00000601 [21:17:48] Epoch: 1 Batch: 19933/20099 (99.17%) Loss: 2.357080 LR: 0.00000601 [21:17:49] Epoch: 1 Batch: 19934/20099 (99.18%) Loss: 2.016843 LR: 0.00000601 [21:17:51] Epoch: 1 Batch: 19935/20099 (99.18%) Loss: 2.214905 LR: 0.00000601 [21:17:53] Epoch: 1 Batch: 19936/20099 (99.19%) Loss: 2.107324 LR: 0.00000601 [21:17:54] Epoch: 1 Batch: 19937/20099 (99.19%) Loss: 2.392010 LR: 0.00000601 [21:17:56] Epoch: 1 Batch: 19938/20099 (99.20%) Loss: 2.181722 LR: 0.00000601 [21:17:58] Epoch: 1 Batch: 19939/20099 (99.20%) Loss: 2.021723 LR: 0.00000601 [21:18:00] Epoch: 1 Batch: 19940/20099 (99.21%) Loss: 2.103037 LR: 0.00000601 [21:18:01] Epoch: 1 Batch: 19941/20099 (99.21%) Loss: 2.227875 LR: 0.00000601 [21:18:03] Epoch: 1 Batch: 19942/20099 (99.22%) Loss: 2.004650 LR: 0.00000601 [21:18:05] Epoch: 1 Batch: 19943/20099 (99.22%) Loss: 2.066424 LR: 0.00000601 [21:18:06] Epoch: 1 Batch: 19944/20099 (99.23%) Loss: 2.322552 LR: 0.00000601 [21:18:08] Epoch: 1 Batch: 19945/20099 (99.23%) Loss: 1.947948 LR: 0.00000601 [21:18:10] Epoch: 1 Batch: 19946/20099 (99.24%) Loss: 2.001580 LR: 0.00000601 [21:18:11] Epoch: 1 Batch: 19947/20099 (99.24%) Loss: 2.056804 LR: 0.00000601 [21:18:13] Epoch: 1 Batch: 19948/20099 (99.25%) Loss: 2.046386 LR: 0.00000601 [21:18:15] Epoch: 1 Batch: 19949/20099 (99.25%) Loss: 2.299993 LR: 0.00000601 [21:18:16] Epoch: 1 Batch: 19950/20099 (99.26%) Loss: 2.123426 LR: 0.00000601 [21:18:18] Epoch: 1 Batch: 19951/20099 (99.26%) Loss: 2.092831 LR: 0.00000601 [21:18:20] Epoch: 1 Batch: 19952/20099 (99.27%) Loss: 2.107985 LR: 0.00000601 [21:18:22] Epoch: 1 Batch: 19953/20099 (99.27%) Loss: 2.699091 LR: 0.00000601 [21:18:23] Epoch: 1 Batch: 19954/20099 (99.28%) Loss: 2.065516 LR: 0.00000601 [21:18:25] Epoch: 1 Batch: 19955/20099 (99.28%) Loss: 2.189439 LR: 0.00000601 [21:18:27] Epoch: 1 Batch: 19956/20099 (99.29%) Loss: 2.139875 LR: 0.00000601 [21:18:28] Epoch: 1 Batch: 19957/20099 (99.29%) Loss: 1.987597 LR: 
0.00000601 [21:18:30] Epoch: 1 Batch: 19958/20099 (99.30%) Loss: 1.785277 LR: 0.00000601 [21:18:32] Epoch: 1 Batch: 19959/20099 (99.30%) Loss: 2.030840 LR: 0.00000600 [21:18:33] Epoch: 1 Batch: 19960/20099 (99.31%) Loss: 2.016146 LR: 0.00000600 [21:18:35] Epoch: 1 Batch: 19961/20099 (99.31%) Loss: 1.983338 LR: 0.00000600 [21:18:37] Epoch: 1 Batch: 19962/20099 (99.32%) Loss: 1.879156 LR: 0.00000600 [21:18:38] Epoch: 1 Batch: 19963/20099 (99.32%) Loss: 1.990597 LR: 0.00000600 [21:18:40] Epoch: 1 Batch: 19964/20099 (99.33%) Loss: 2.050892 LR: 0.00000600 [21:18:42] Epoch: 1 Batch: 19965/20099 (99.33%) Loss: 2.450272 LR: 0.00000600 [21:18:44] Epoch: 1 Batch: 19966/20099 (99.34%) Loss: 2.364906 LR: 0.00000600 [21:18:45] Epoch: 1 Batch: 19967/20099 (99.34%) Loss: 1.904831 LR: 0.00000600 [21:18:47] Epoch: 1 Batch: 19968/20099 (99.35%) Loss: 1.992085 LR: 0.00000600 [21:18:49] Epoch: 1 Batch: 19969/20099 (99.35%) Loss: 1.824592 LR: 0.00000600 [21:18:50] Epoch: 1 Batch: 19970/20099 (99.36%) Loss: 2.119126 LR: 0.00000600 [21:18:52] Epoch: 1 Batch: 19971/20099 (99.36%) Loss: 2.127517 LR: 0.00000600 [21:18:54] Epoch: 1 Batch: 19972/20099 (99.37%) Loss: 2.261355 LR: 0.00000600 [21:18:55] Epoch: 1 Batch: 19973/20099 (99.37%) Loss: 1.915739 LR: 0.00000600 [21:18:57] Epoch: 1 Batch: 19974/20099 (99.38%) Loss: 1.710819 LR: 0.00000600 [21:18:59] Epoch: 1 Batch: 19975/20099 (99.38%) Loss: 2.095576 LR: 0.00000600 [21:19:01] Epoch: 1 Batch: 19976/20099 (99.39%) Loss: 1.952948 LR: 0.00000600 [21:19:02] Epoch: 1 Batch: 19977/20099 (99.39%) Loss: 1.988061 LR: 0.00000600 [21:19:04] Epoch: 1 Batch: 19978/20099 (99.40%) Loss: 2.172117 LR: 0.00000600 [21:19:06] Epoch: 1 Batch: 19979/20099 (99.40%) Loss: 1.987535 LR: 0.00000600 [21:19:07] Epoch: 1 Batch: 19980/20099 (99.41%) Loss: 2.006175 LR: 0.00000600 [21:19:09] Epoch: 1 Batch: 19981/20099 (99.41%) Loss: 2.217468 LR: 0.00000600 [21:19:11] Epoch: 1 Batch: 19982/20099 (99.42%) Loss: 1.738266 LR: 0.00000600 [21:19:12] Epoch: 1 Batch: 19983/20099 
(99.42%) Loss: 1.652343 LR: 0.00000600 [21:19:14] Epoch: 1 Batch: 19984/20099 (99.43%) Loss: 1.832038 LR: 0.00000600 [21:19:16] Epoch: 1 Batch: 19985/20099 (99.43%) Loss: 1.951116 LR: 0.00000600 [21:19:17] Epoch: 1 Batch: 19986/20099 (99.44%) Loss: 2.049422 LR: 0.00000600 [21:19:19] Epoch: 1 Batch: 19987/20099 (99.44%) Loss: 2.207121 LR: 0.00000600 [21:19:21] Epoch: 1 Batch: 19988/20099 (99.45%) Loss: 2.255623 LR: 0.00000600 [21:19:23] Epoch: 1 Batch: 19989/20099 (99.45%) Loss: 1.987094 LR: 0.00000600 [21:19:24] Epoch: 1 Batch: 19990/20099 (99.46%) Loss: 2.357598 LR: 0.00000600 [21:19:26] Epoch: 1 Batch: 19991/20099 (99.46%) Loss: 2.250891 LR: 0.00000600 [21:19:28] Epoch: 1 Batch: 19992/20099 (99.47%) Loss: 2.123541 LR: 0.00000600 [21:19:29] Epoch: 1 Batch: 19993/20099 (99.47%) Loss: 2.153937 LR: 0.00000600 [21:19:31] Epoch: 1 Batch: 19994/20099 (99.48%) Loss: 2.160149 LR: 0.00000600 [21:19:33] Epoch: 1 Batch: 19995/20099 (99.48%) Loss: 2.040557 LR: 0.00000600 [21:19:34] Epoch: 1 Batch: 19996/20099 (99.49%) Loss: 2.279503 LR: 0.00000600 [21:19:36] Epoch: 1 Batch: 19997/20099 (99.49%) Loss: 2.209139 LR: 0.00000600 [21:19:38] Epoch: 1 Batch: 19998/20099 (99.50%) Loss: 2.021509 LR: 0.00000600 [21:19:40] Epoch: 1 Batch: 19999/20099 (99.50%) Loss: 2.177733 LR: 0.00000600 [21:19:41] >> Evaluating batch 0 [21:19:42] >> Evaluating batch 1 [21:19:43] >> Evaluating batch 2 [21:19:44] >> Evaluating batch 3 [21:19:45] >> Evaluating batch 4 [21:19:46] >> Evaluating batch 5 [21:19:47] >> Evaluating batch 6 [21:19:48] >> Evaluating batch 7 [21:19:49] >> Evaluating batch 8 [21:19:50] >> Evaluating batch 9 [21:19:51] >> Evaluating batch 10 [21:19:52] >> Evaluating batch 11 [21:19:53] >> Evaluating batch 12 [21:19:54] >> Evaluating batch 13 [21:19:55] >> Evaluating batch 14 [21:19:55] >> Evaluating batch 15 [21:19:56] >> Evaluating batch 16 [21:19:57] Epoch: 1 Step: 20000/20099 Evaluation: [21:19:57] Avg Loss Since Last Eval: 2.0924 Val Loss: 2.1448 Validation loss delta: -0.0014 
Perplexity: 8.5400 LR: 0.00000600 [21:20:01] >> Cleaned up old temp checkpoint: epoch1_step18000 [21:20:01] >> Temp checkpoint saved: epoch1_step20000, size: 0.1693 GB [21:20:04] >> Checkpoint saved: epoch1_step20000, size: 0.1693 GB [21:20:04] Epoch: 1 Batch: 20000/20099 (99.51%) Loss: 1.894609 LR: 0.00000600 [21:20:06] Epoch: 1 Batch: 20001/20099 (99.51%) Loss: 2.119290 LR: 0.00000600 [21:20:08] Epoch: 1 Batch: 20002/20099 (99.52%) Loss: 1.900173 LR: 0.00000600 [21:20:09] Epoch: 1 Batch: 20003/20099 (99.52%) Loss: 1.886672 LR: 0.00000600 [21:20:11] Epoch: 1 Batch: 20004/20099 (99.53%) Loss: 2.296856 LR: 0.00000600 [21:20:13] Epoch: 1 Batch: 20005/20099 (99.53%) Loss: 2.127817 LR: 0.00000600 [21:20:14] Epoch: 1 Batch: 20006/20099 (99.54%) Loss: 2.125939 LR: 0.00000600 [21:20:16] Epoch: 1 Batch: 20007/20099 (99.54%) Loss: 2.132201 LR: 0.00000600 [21:20:18] Epoch: 1 Batch: 20008/20099 (99.55%) Loss: 1.948098 LR: 0.00000600 [21:20:19] Epoch: 1 Batch: 20009/20099 (99.55%) Loss: 2.311985 LR: 0.00000600 [21:20:21] Epoch: 1 Batch: 20010/20099 (99.56%) Loss: 2.074785 LR: 0.00000600 [21:20:23] Epoch: 1 Batch: 20011/20099 (99.56%) Loss: 2.217137 LR: 0.00000600 [21:20:25] Epoch: 1 Batch: 20012/20099 (99.57%) Loss: 1.919242 LR: 0.00000600 [21:20:26] Epoch: 1 Batch: 20013/20099 (99.57%) Loss: 2.283978 LR: 0.00000600 [21:20:28] Epoch: 1 Batch: 20014/20099 (99.58%) Loss: 1.825777 LR: 0.00000600 [21:20:30] Epoch: 1 Batch: 20015/20099 (99.58%) Loss: 2.039642 LR: 0.00000600 [21:20:31] Epoch: 1 Batch: 20016/20099 (99.59%) Loss: 2.310894 LR: 0.00000600 [21:20:33] Epoch: 1 Batch: 20017/20099 (99.59%) Loss: 2.431560 LR: 0.00000600 [21:20:35] Epoch: 1 Batch: 20018/20099 (99.60%) Loss: 2.110026 LR: 0.00000600 [21:20:37] Epoch: 1 Batch: 20019/20099 (99.60%) Loss: 1.900888 LR: 0.00000600 [21:20:38] Epoch: 1 Batch: 20020/20099 (99.61%) Loss: 1.696613 LR: 0.00000600 [21:20:40] Epoch: 1 Batch: 20021/20099 (99.61%) Loss: 1.971084 LR: 0.00000600 [21:20:42] Epoch: 1 Batch: 20022/20099 (99.62%) 
Loss: 2.279451 LR: 0.00000600 [21:20:43] Epoch: 1 Batch: 20023/20099 (99.62%) Loss: 2.208556 LR: 0.00000600 [21:20:45] Epoch: 1 Batch: 20024/20099 (99.63%) Loss: 2.193953 LR: 0.00000600 [21:20:47] Epoch: 1 Batch: 20025/20099 (99.63%) Loss: 2.005806 LR: 0.00000600 [21:20:49] Epoch: 1 Batch: 20026/20099 (99.64%) Loss: 2.376236 LR: 0.00000600 [21:20:50] Epoch: 1 Batch: 20027/20099 (99.64%) Loss: 2.040633 LR: 0.00000600 [21:20:52] Epoch: 1 Batch: 20028/20099 (99.65%) Loss: 1.958661 LR: 0.00000600 [21:20:54] Epoch: 1 Batch: 20029/20099 (99.65%) Loss: 2.091048 LR: 0.00000600 [21:20:55] Epoch: 1 Batch: 20030/20099 (99.66%) Loss: 2.196643 LR: 0.00000600 [21:20:57] Epoch: 1 Batch: 20031/20099 (99.66%) Loss: 2.165956 LR: 0.00000600 [21:20:59] Epoch: 1 Batch: 20032/20099 (99.67%) Loss: 1.989054 LR: 0.00000600 [21:21:00] Epoch: 1 Batch: 20033/20099 (99.67%) Loss: 2.271550 LR: 0.00000600 [21:21:02] Epoch: 1 Batch: 20034/20099 (99.68%) Loss: 2.020090 LR: 0.00000600 [21:21:04] Epoch: 1 Batch: 20035/20099 (99.68%) Loss: 2.100436 LR: 0.00000600 [21:21:06] Epoch: 1 Batch: 20036/20099 (99.69%) Loss: 2.018391 LR: 0.00000600 [21:21:07] Epoch: 1 Batch: 20037/20099 (99.69%) Loss: 2.015633 LR: 0.00000600 [21:21:09] Epoch: 1 Batch: 20038/20099 (99.70%) Loss: 2.111664 LR: 0.00000600 [21:21:11] Epoch: 1 Batch: 20039/20099 (99.70%) Loss: 2.169130 LR: 0.00000600 [21:21:12] Epoch: 1 Batch: 20040/20099 (99.71%) Loss: 2.284732 LR: 0.00000600 [21:21:14] Epoch: 1 Batch: 20041/20099 (99.71%) Loss: 2.193870 LR: 0.00000600 [21:21:16] Epoch: 1 Batch: 20042/20099 (99.72%) Loss: 2.200043 LR: 0.00000600 [21:21:17] Epoch: 1 Batch: 20043/20099 (99.72%) Loss: 1.802087 LR: 0.00000600 [21:21:19] Epoch: 1 Batch: 20044/20099 (99.73%) Loss: 1.943886 LR: 0.00000600 [21:21:21] Epoch: 1 Batch: 20045/20099 (99.73%) Loss: 1.958809 LR: 0.00000600 [21:21:22] Epoch: 1 Batch: 20046/20099 (99.74%) Loss: 2.254942 LR: 0.00000600 [21:21:24] Epoch: 1 Batch: 20047/20099 (99.74%) Loss: 2.071559 LR: 0.00000600 [21:21:26] Epoch: 1 
Batch: 20048/20099 (99.75%) Loss: 1.877109 LR: 0.00000600 [21:21:27] Epoch: 1 Batch: 20049/20099 (99.75%) Loss: 1.975144 LR: 0.00000600 [21:21:29] Epoch: 1 Batch: 20050/20099 (99.76%) Loss: 2.000994 LR: 0.00000600 [21:21:31] Epoch: 1 Batch: 20051/20099 (99.76%) Loss: 2.148230 LR: 0.00000600 [21:21:33] Epoch: 1 Batch: 20052/20099 (99.77%) Loss: 2.332328 LR: 0.00000600 [21:21:34] Epoch: 1 Batch: 20053/20099 (99.77%) Loss: 2.297668 LR: 0.00000600 [21:21:36] Epoch: 1 Batch: 20054/20099 (99.78%) Loss: 1.931857 LR: 0.00000600 [21:21:38] Epoch: 1 Batch: 20055/20099 (99.78%) Loss: 1.881793 LR: 0.00000600 [21:21:39] Epoch: 1 Batch: 20056/20099 (99.79%) Loss: 2.050407 LR: 0.00000600 [21:21:41] Epoch: 1 Batch: 20057/20099 (99.79%) Loss: 2.232180 LR: 0.00000600 [21:21:43] Epoch: 1 Batch: 20058/20099 (99.80%) Loss: 2.163333 LR: 0.00000600 [21:21:44] Epoch: 1 Batch: 20059/20099 (99.80%) Loss: 2.033811 LR: 0.00000600 [21:21:46] Epoch: 1 Batch: 20060/20099 (99.81%) Loss: 1.958224 LR: 0.00000600 [21:21:48] Epoch: 1 Batch: 20061/20099 (99.81%) Loss: 1.790466 LR: 0.00000600 [21:21:49] Epoch: 1 Batch: 20062/20099 (99.82%) Loss: 2.163512 LR: 0.00000600 [21:21:51] Epoch: 1 Batch: 20063/20099 (99.82%) Loss: 2.442458 LR: 0.00000600 [21:21:53] Epoch: 1 Batch: 20064/20099 (99.83%) Loss: 1.767469 LR: 0.00000600 [21:21:55] Epoch: 1 Batch: 20065/20099 (99.83%) Loss: 2.162555 LR: 0.00000600 [21:21:56] Epoch: 1 Batch: 20066/20099 (99.84%) Loss: 1.913892 LR: 0.00000600 [21:21:58] Epoch: 1 Batch: 20067/20099 (99.84%) Loss: 2.216797 LR: 0.00000600 [21:22:00] Epoch: 1 Batch: 20068/20099 (99.85%) Loss: 2.215182 LR: 0.00000600 [21:22:01] Epoch: 1 Batch: 20069/20099 (99.85%) Loss: 2.041317 LR: 0.00000600 [21:22:03] Epoch: 1 Batch: 20070/20099 (99.86%) Loss: 1.849959 LR: 0.00000600 [21:22:05] Epoch: 1 Batch: 20071/20099 (99.86%) Loss: 2.030911 LR: 0.00000600 [21:22:06] Epoch: 1 Batch: 20072/20099 (99.87%) Loss: 2.323207 LR: 0.00000600 [21:22:08] Epoch: 1 Batch: 20073/20099 (99.87%) Loss: 2.118185 LR: 
0.00000600 [21:22:10] Epoch: 1 Batch: 20074/20099 (99.88%) Loss: 1.719002 LR: 0.00000600 [21:22:11] Epoch: 1 Batch: 20075/20099 (99.88%) Loss: 1.980797 LR: 0.00000600 [21:22:13] Epoch: 1 Batch: 20076/20099 (99.89%) Loss: 2.036379 LR: 0.00000600 [21:22:15] Epoch: 1 Batch: 20077/20099 (99.89%) Loss: 1.961114 LR: 0.00000600 [21:22:17] Epoch: 1 Batch: 20078/20099 (99.90%) Loss: 1.979412 LR: 0.00000600 [21:22:18] Epoch: 1 Batch: 20079/20099 (99.90%) Loss: 2.213207 LR: 0.00000600 [21:22:20] Epoch: 1 Batch: 20080/20099 (99.91%) Loss: 2.047645 LR: 0.00000600 [21:22:22] Epoch: 1 Batch: 20081/20099 (99.91%) Loss: 2.121185 LR: 0.00000600 [21:22:23] Epoch: 1 Batch: 20082/20099 (99.92%) Loss: 1.899416 LR: 0.00000600 [21:22:25] Epoch: 1 Batch: 20083/20099 (99.92%) Loss: 2.089800 LR: 0.00000600 [21:22:27] Epoch: 1 Batch: 20084/20099 (99.93%) Loss: 2.278829 LR: 0.00000600 [21:22:28] Epoch: 1 Batch: 20085/20099 (99.93%) Loss: 2.147457 LR: 0.00000600 [21:22:30] Epoch: 1 Batch: 20086/20099 (99.94%) Loss: 2.007672 LR: 0.00000600 [21:22:32] Epoch: 1 Batch: 20087/20099 (99.94%) Loss: 2.243100 LR: 0.00000600 [21:22:34] Epoch: 1 Batch: 20088/20099 (99.95%) Loss: 1.995599 LR: 0.00000600 [21:22:35] Epoch: 1 Batch: 20089/20099 (99.95%) Loss: 2.086331 LR: 0.00000600 [21:22:37] Epoch: 1 Batch: 20090/20099 (99.96%) Loss: 1.993614 LR: 0.00000600 [21:22:39] Epoch: 1 Batch: 20091/20099 (99.96%) Loss: 2.170582 LR: 0.00000600 [21:22:40] Epoch: 1 Batch: 20092/20099 (99.97%) Loss: 2.397893 LR: 0.00000600 [21:22:42] Epoch: 1 Batch: 20093/20099 (99.97%) Loss: 1.849714 LR: 0.00000600 [21:22:44] Epoch: 1 Batch: 20094/20099 (99.98%) Loss: 1.872685 LR: 0.00000600 [21:22:46] Epoch: 1 Batch: 20095/20099 (99.98%) Loss: 1.949284 LR: 0.00000600 [21:22:47] Epoch: 1 Batch: 20096/20099 (99.99%) Loss: 2.428811 LR: 0.00000600 [21:22:49] Epoch: 1 Batch: 20097/20099 (99.99%) Loss: 2.222315 LR: 0.00000600 [21:22:51] Epoch: 1 Batch: 20098/20099 (100.00%) Loss: 1.950915 LR: 0.00000600 [21:22:52] Epoch: 1 Batch: 
20099/20099 (100.00%) Loss: 2.265149 LR: 0.00000600 [21:22:52] CPU usage: 64.4%, RAM usage: 43.5% [21:22:52] Memory cleanup after epoch 1 [21:22:53] CPU usage: 54.0%, RAM usage: 43.5% [21:22:53] Epoch 1 average loss: 0.1139 [21:22:53] >> Evaluating batch 0 [21:22:54] >> Evaluating batch 1 [21:22:55] >> Evaluating batch 2 [21:22:56] >> Evaluating batch 3 [21:22:57] >> Evaluating batch 4 [21:22:58] >> Evaluating batch 5 [21:22:59] >> Evaluating batch 6 [21:23:00] >> Evaluating batch 7 [21:23:01] >> Evaluating batch 8 [21:23:02] >> Evaluating batch 9 [21:23:03] >> Evaluating batch 10 [21:23:04] >> Evaluating batch 11 [21:23:05] >> Evaluating batch 12 [21:23:06] >> Evaluating batch 13 [21:23:06] >> Evaluating batch 14 [21:23:07] >> Evaluating batch 15 [21:23:08] >> Evaluating batch 16 [21:23:09] Epoch: 1 Step: 20099/20099 Evaluation: [21:23:09] Val Loss: 2.1447 Perplexity: 8.5396 LR: 0.00000600 [21:23:09] Epoch 1 completed in 2060.51 seconds [21:23:13] >> Checkpoint saved: epoch1_complete, size: 0.1690 GB [21:23:16] >> Cleaned up old temp checkpoint: epoch1_step18200 [21:23:16] >> Temp checkpoint saved: epoch1_step20099, size: 0.1690 GB [21:23:17] Training complete.