From dcac1c41d3fd440f781fcf1d7daa74fd5bcee28c Mon Sep 17 00:00:00 2001
From: ModelHub XC
Date: Sat, 18 Apr 2026 01:19:21 +0800
Subject: [PATCH] Initialize project; model provided by the ModelHub XC community
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Model: mookiezii/Discord-Hermes-3-8B
Source: Original Platform
---
 .gitattributes                                |    41 +
 README.md                                     |    75 +
 additional_chat_templates/tool_use.jinja      |   152 +
 ...ord-Hermes-3-8B-interface-screenshot-2.png |     3 +
 ...scord-Hermes-3-8B-interface-screenshot.png | Bin 0 -> 91533 bytes
 chat_template.jinja                           |     6 +
 config.json                                   |    35 +
 generation_config.json                        |     9 +
 interface.py                                  |  1161 +
 model-00001-of-00004.safetensors              |     3 +
 model-00002-of-00004.safetensors              |     3 +
 model-00003-of-00004.safetensors              |     3 +
 model-00004-of-00004.safetensors              |     3 +
 model.safetensors.index.json                  |   299 +
 special_tokens_map.json                       |    23 +
 tokenizer.json                                |     3 +
 tokenizer_config.json                         |  2063 ++
 train.log                                     | 24158 ++++++++++++++++
 18 files changed, 28040 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 README.md
 create mode 100644 additional_chat_templates/tool_use.jinja
 create mode 100644 assets/Discord-Hermes-3-8B-interface-screenshot-2.png
 create mode 100644 assets/Discord-Hermes-3-8B-interface-screenshot.png
 create mode 100644 chat_template.jinja
 create mode 100644 config.json
 create mode 100644 generation_config.json
 create mode 100644 interface.py
 create mode 100644 model-00001-of-00004.safetensors
 create mode 100644 model-00002-of-00004.safetensors
 create mode 100644 model-00003-of-00004.safetensors
 create mode 100644 model-00004-of-00004.safetensors
 create mode 100644 model.safetensors.index.json
 create mode 100644 special_tokens_map.json
 create mode 100644 tokenizer.json
 create mode 100644 tokenizer_config.json
 create mode 100644 train.log

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..32765cd
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,41 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+Discord-Hermes-3-8B-inferface-screenshot-2.png filter=lfs diff=lfs merge=lfs -text
+assests/Discord-Hermes-3-8B-inferface-screenshot-2.png filter=lfs diff=lfs merge=lfs -text
+assests/Discord-Hermes-3-8B-inferface-screenshot.png filter=lfs diff=lfs merge=lfs -text
+assests/Discord-Hermes-3-8B-interface-screenshot-2.png filter=lfs diff=lfs merge=lfs -text
+assets/Discord-Hermes-3-8B-interface-screenshot-2.png filter=lfs diff=lfs merge=lfs -text
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..4893b55
--- /dev/null
+++ b/README.md
@@ -0,0 +1,75 @@
+---
+tags:
+  - transformers
+  - causal-lm
+  - text-generation
+  - instruct
+  - chat
+  - fine-tuned
+  - merged-lora
+  - llama-3
+  - hermes
+  - discord-dataset
+  - conversational-ai
+  - chatml
+  - pytorch
+  - open-weights
+  - 8b-parameters
+model-index:
+  - name: mookiezii/Discord-Hermes-3-8B
+    results: []
+base_model:
+  - NousResearch/Hermes-3-Llama-3.1-8B
+datasets:
+  - mookiezi/Discord-Dialogues
+library_name: transformers
+license: llama3
+---
+
+# Discord-Hermes-3-8B
+
+## Model Description
+
+This is a fine-tuned version of [NousResearch/Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B), trained on a curated dataset built from a mixture of enhanced and original samples from [mookiezi/Discord-Dialogues](https://huggingface.co/datasets/mookiezi/Discord-Dialogues).
+
+The training hyper-parameters and log are available [here](https://huggingface.co/mookiezii/Discord-Hermes-3-8B/raw/main/train.log).
+
+---
+
+### Sample Outputs
+
+⚠️ **Disclaimer:** The first image is fictional dialogue.
+It should not be interpreted as political advice, a statement of fact, or an endorsement of any kind.
+
+![Discord-Hermes-3-8B Interface Screenshot](assets/Discord-Hermes-3-8B-interface-screenshot.png)
+
+![Discord-Hermes-3-8B Interface Screenshot 2](assets/Discord-Hermes-3-8B-interface-screenshot-2.png)
+
+---
+
+## Interfacing
+
+An optimized Python script for interfacing is available [here](https://huggingface.co/mookiezii/Discord-Hermes-3-8B/blob/main/interface.py).
+ +### Windows + +```powershell +py interface.py +```` + +### macOS + +```bash +python3 interface.py +``` + +### Linux + +```bash +python3 interface.py +``` + + +[​](https://20000.online/micae) +[​](https://20000.online/openmicae) +[​](https://20000.online/discord-dialogues) diff --git a/additional_chat_templates/tool_use.jinja b/additional_chat_templates/tool_use.jinja new file mode 100644 index 0000000..149250b --- /dev/null +++ b/additional_chat_templates/tool_use.jinja @@ -0,0 +1,152 @@ +{%- macro json_to_python_type(json_spec) %} +{%- set basic_type_map = { + "string": "str", + "number": "float", + "integer": "int", + "boolean": "bool" +} %} + +{%- if basic_type_map[json_spec.type] is defined %} + {{- basic_type_map[json_spec.type] }} +{%- elif json_spec.type == "array" %} + {{- "list[" + json_to_python_type(json_spec|items) + "]"}} +{%- elif json_spec.type == "object" %} + {%- if json_spec.additionalProperties is defined %} + {{- "dict[str, " + json_to_python_type(json_spec.additionalProperties) + ']'}} + {%- else %} + {{- "dict" }} + {%- endif %} +{%- elif json_spec.type is iterable %} + {{- "Union[" }} + {%- for t in json_spec.type %} + {{- json_to_python_type({"type": t}) }} + {%- if not loop.last %} + {{- "," }} + {%- endif %} + {%- endfor %} + {{- "]" }} +{%- else %} + {{- "Any" }} +{%- endif %} +{%- endmacro %} + + +{{- bos_token }} +{{- '<|im_start|>system +' }} +{{- "You are a function calling AI model. You are provided with function signatures within XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. 
Here are the available tools: " }} +{%- for tool in tools %} + {%- if tool.function is defined %} + {%- set tool = tool.function %} + {%- endif %} + {{- '{"type": "function", "function": ' }} + {{- '{"name": "' + tool.name + '", ' }} + {{- '"description": "' + tool.name + '(' }} + {%- for param_name, param_fields in tool.parameters.properties|items %} + {{- param_name + ": " + json_to_python_type(param_fields) }} + {%- if not loop.last %} + {{- ", " }} + {%- endif %} + {%- endfor %} + {{- ")" }} + {%- if tool.return is defined %} + {{- " -> " + json_to_python_type(tool.return) }} + {%- endif %} + {{- " - " + tool.description + " + +" }} + {%- for param_name, param_fields in tool.parameters.properties|items %} + {%- if loop.first %} + {{- " Args: +" }} + {%- endif %} + {{- " " + param_name + "(" + json_to_python_type(param_fields) + "): " + param_fields.description|trim }} + {%- endfor %} + {%- if tool.return is defined and tool.return.description is defined %} + {{- " + Returns: + " + tool.return.description }} + {%- endif %} + {{- '"' }} + {{- ', "parameters": ' }} + {%- if tool.parameters.properties | length == 0 %} + {{- "{}" }} + {%- else %} + {{- tool.parameters|tojson }} + {%- endif %} + {{- "}" }} + {%- if not loop.last %} + {{- " +" }} + {%- endif %} +{%- endfor %} +{{- " " }} +{{- 'Use the following pydantic model json schema for each tool call you will make: {"properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"], "title": "FunctionCall", "type": "object"}} +' }} +{{- "For each function call return a json object with function name and arguments within XML tags as follows: +" }} +{{- " +" }} +{{- '{"name": , "arguments": } +' }} +{{- '<|im_end|> +' }} +{%- for message in messages %} + {%- if message.role == "user" or message.role == "system" or (message.role == "assistant" and message.tool_calls is not defined) %} + {{- '<|im_start|>' + message.role + ' +' + 
message.content + '<|im_end|>' + ' +' }} + {%- elif message.role == "assistant" %} + {{- '<|im_start|>' + message.role }} + {%- for tool_call in message.tool_calls %} + {{- ' + +' }} {%- if tool_call.function is defined %} + {%- set tool_call = tool_call.function %} + {%- endif %} + {{- '{' }} + {{- '"name": "' }} + {{- tool_call.name }} + {{- '"' }} + {{- ', '}} + {%- if tool_call.arguments is defined %} + {{- '"arguments": ' }} + {%- if tool_call.arguments is string %} + {{- tool_call.arguments }} + {%- else %} + {{- tool_call.arguments|tojson }} + {%- endif %} + {%- endif %} + {{- '}' }} + {{- ' +' }} + {%- endfor %} + {{- '<|im_end|> +' }} + {%- elif message.role == "tool" %} + {%- if loop.previtem and loop.previtem.role != "tool" %} + {{- '<|im_start|>tool +' }} + {%- endif %} + {{- ' +' }} + {{- message.content }} + {%- if not loop.last %} + {{- ' + +' }} + {%- else %} + {{- ' +' }} + {%- endif %} + {%- if not loop.last and loop.nextitem.role != "tool" %} + {{- '<|im_end|>' }} + {%- elif loop.last %} + {{- '<|im_end|>' }} + {%- endif %} + {%- endif %} +{%- endfor %} +{%- if add_generation_prompt %} + {{- '<|im_start|>assistant +' }} +{%- endif %} diff --git a/assets/Discord-Hermes-3-8B-interface-screenshot-2.png b/assets/Discord-Hermes-3-8B-interface-screenshot-2.png new file mode 100644 index 0000000..ad41a49 --- /dev/null +++ b/assets/Discord-Hermes-3-8B-interface-screenshot-2.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0ba4a17ccf0347f350b7bc6abf5e7f467de39d89043eba12a96b83a659be89f4 +size 272378 diff --git a/assets/Discord-Hermes-3-8B-interface-screenshot.png b/assets/Discord-Hermes-3-8B-interface-screenshot.png new file mode 100644 index 0000000000000000000000000000000000000000..70529971be1b4d66004dd94900fce46854a0a876 GIT binary patch literal 91533 zcmeFZbyQYeyDz*zNZdkPwjW78DQ!#6n6+L_tdF?ovuhy1Ppn z&b3hY+56e=KHnH;eB&G6AMZUL#{>7exz}8CUUSat7uS5Ma7UI1p9&vA5TYB`rIZll 
z7z2V}nd0HVJNx~aQ}9156G=Hq1jz~|*we>`-!bfzWUpfMG_ZNWFARp)mE;h_^*n-j zKSq!pc*}blK^!h4$gCcM2)#uRa_i_qMG?4g-1_=GI|MmVjsC-U&3b|w-o&-PAt#MH zgGY(ac7kWv8g9gZ+>nyI>)bm(;Nq-oREoR2bmIrR_B2k8{iCb*q>_`=B{Lg2^m;z^ zljOWIqFGA9KkZLC2nGt6OKQN6(po2NM28z2=vJdw77JVC0CljXaVP8`;{9n z_ZI1|*Q&EzNP0q68auTuYJ10X%42Bl9h1Cgbn#P62?;6mWq;aK0Q=~tYRtz0=u6%E zBnkS;Le4UxFEWOs%TjC-N0$Sz@&3!@!l&2)2@-8)@uoTHMHyd+5z?3w@4~`bceb^Q z?5wPuWu&zBw{&||vNa?W5v11}fp-M*o}%CN(M2(CHN5rMix;gcC7XJJE~jMaT=9@$ zO3J7CR;3sSO+ZL#afX9=b@2w--l@|uPo7M4T#WYIP|@ly-M1Zy?`Cpsrg{8~jz~J$ zx1%HO{=O?|bHp+K9zx{0h(lw~W<$7+k5gZ+DX$G9X&TuDUDx9^=3UFPN6qKuDfF|v zJj&_y$SagCla#dFc^DDV$njb7*sM@epZ@+}t=AlnQk1IG#U(%6Lm1O_Q|8-+*vBPb zP9;w@V^4dKPA&DnVAC%fwOlt9v8lZ`ggI(Rg%1q>`Z`K0$qXWBm!zADf%75$8N}r-Fc! zl2gRu_H@b4u9ep^{A=rQLuSpw`5rPe=V#h7Bbi$aXafrq@t^CmBb!?j59%^h?weG# zX3$rjmzTMI6W>SnnZ$7Uqk3BY6SYgFc9Vj)&r@tJIavxHvgL%oR}TtIFN{5EjAc7! zwLZ5thmc0@#YDZmMd7kP&3yqvO&RqvL&dYN|?~@7mayZ!^6I*0+Y2rfp?j zX~jQZv&Og0={L9dSKarq(xO4iT_*{Va?2|TEUX0jFE5NO%8N z(ol0_4KDPaz5ks8OJr|(BH@D^b#{z6eXYl?!WaEx8C2*U6?u}dFKfyrsIz6h6AkkA zr}I&h!V;;fvJqlBMyId8Oh{1G!#{+DYz=vd?-e@uIu$uC&$MD73b`&k&CRh(T4jn6 zF+^E?@k{-bI~>|^OFys+TNPd{^*>#W5hi&R{j9$1;LJOx0c@nz+4ZVdi|Fz~7ZLdz zjtLbebQsJ&+&w%Iu!f|O(WPm`f5`v2-K1KUw6f|0!DH~XLRD3Ds?sL(4Le2}S?-G$ z=DD&*lhMf(I~lhk2Ya!VO$pTpvGwmb!7HfyjaC z%o%K5UEQf)1$I*kIy#GvCyu_Bm(x-dV|%?_KJ48)&#IWxbu`}*~3 zZrAnc)Z7PGsNh~{&7K*rd7Kut1f3S62`Ray5)u-|BTU|rq;_7q=6~XhQ*%&--etIf zZ-riewsi*5(A@0Uljks}dauYa&2G2)>1l4aUl!rf*aT$t%`w8%*~rTJ`m4Y|{0I@3 zFW(M6e`edBP0euDr#$+K%M{PfQ&@6dJ^a;YrJWlT6huM=`wt#I{?#JKh06xjc;}dz z1??D_nBu$VsMoXP?Cr1Y?eAxqw7k<88{;?rJpyu;5+RaJwv$);eT<;u#dZ*QMJ zckbNq*2-A({_*3->zkS;UjOb>AF6e8XWA3EgRXPz-=hUj`{EG!goYB^3(=lBb?UP9 zaJk0z!NGw7DYk0~9aU4Ozik@L*Pdr8sj6q0n2yB=Irt>7P+ZnSpxd+TXOwjAp#zn0 zzA;UDx=dT|-RsxPnzmgZNeA_VO|yn6E{aKJJgM)u;ROE1LtMu?lEwTm2|))<88 ze(kY%@$%)EYDj2kD4TAn78_a6+a}{rA3r`1*Y&(dDm{WpkB$sFK#9lwR=?3x9kd0? 
z_;#gfR+;4_Q(EJO6-n~j?DMxj^ZYPD(%mPw!==8?EPmk{k_%w znHzt%y1zT&KeswEhP>MZp@;|_7gfxjflJQs5HC1?cf4o1Q!uL+Z;zk( zJ-2Hk_f9`Q{Q(n^uvGEfmse_wyHs#8cC)`@>$L0XU9?$@g-rO*8A2tpSDyML*RDkh z(gou7&L)z?t!1c%$h1i$PiIaiPnc%0<5c|;Q(P)F@Gy+)-jPrwohGHV4?LkEJ=f(j zLrP1^f^c}TvHeKSu`3euI0~DAr~Ny4U0yjc0THyxO#Q-yh#t8xU-UkOJiRNOjOH1H zq45IdlB1>%u5Xd>kn`bvp{L^X9+Xkd4;5q9%7`t><5c&fJ-nP3mEX_D4s*Yo%Q>BB zAs@3V@@X&^{bg6{k7J+I72JSn{m-uN3HI!?D&qOaD zA9}T*JV1^%sV(OA)E(Oga+E3a=Rb33h2D%JxmHD2F-DT8XvSXkRaOTJDem277NPlZ zdRs+&Z0<`5-sXvG8F7m&mt$;m6VIEB^~ft&u-EKAI^M12?OEdJo;}$5z+S+l6!8)l z*Gjoq7ay32c+)qA<~!&h%XPV9iaN@V1YUwHjawbQwJCIuwGe{{q)Ix?D-D{-GRY=n8WVxQH#}dKbNjhMRCa=|)lgT(? zqkriGrB2H0O_vo*`|SjBvV&|a_@J z$Cv2PkW8q(B(b=%^z6;Mv-La&kHu;CsrlaRQzY>k<*XX4KeYZ*c(R4sn9-oF*8zVoFfb+E%S-C-^ai$fn&{i8a~wo-~8&^l&~ zWpyzvUr#NiI2<5-rG86$^@JMDwZ~2ZIC-<7Tq9M(NgI_Zm77^G*;4E@Tym;4H%tA* zb7`0Ma6YT*#J?roKXsS zPoy`e4Pjs)R>9<$B72JWFMi^=(7}qu=TB7ArS!8#Cxa$uOtBBW{O0@I!^%o##Q^O`-u{Ij2(ogJ6w-d2?DDYG5!s3wlXqu5M{YFW6zKoyMh4U*P2 zQ>lCRD7krf;@kI&LLzufPOPr2#kA#F4eb!8W9J}JK-x`0dCPs`ZHRI7S+`xeArlzJPwN-z&@5`1V z{hpYsW1->(&C!Aa4zr|uR)f!4#C8cq-G14ZT5?A$ueFGe?Jkv-jWm8dyE}UNC__rn zY+1k@@3Sg%=;7etkO?AVr!ul8>LLHVMs z7jRP$E^lm1oVjom{Imooy}Y~-?~DviQt1aBFBRI=4iEOT4C_u*S646p?&JXu?D}-= zBE2kig}ZmL5a+}FO+{s8-{@$X{>D-v`)Tj@@6TnMbq1=%pusUsrf9uAm$`@s8#3f> z<0?lt`j4zm`2RX$j@HC~g1j~&QiecG>C`6)orQ~~jbey`y^ynv%+u)NOwB1ODsGJiaXt?Z#|4b}890sY?d|%O7Fk={FSmwZGeUoslOt+uY|J$J zM&|6|QowuM?bm$8OL?)GFIFp~HE;Q?-r2eJ@EV=&CfCw7{Oq=eLbgty+XRpdVG z%XHhcv|sF}^z`(!6fNv2apNar*LYmwwrNlkcqV^y(2aUSu?@C6tz06UV+MwX+l0aB zP+e#4#IA?8~j(`wObKJ!xCFTpe3kJ^qRTbE) zM*$tI=&e;~e;h`n?q`DzDZsOC8#;{mWsxHC}blYSUr1 z_1LYu{cl^79FiMmw2Pw@CsEM1P)7}kGD5c ze6TY&-CtmrCX(72_|>Gv;QL#yW~;WHjk@eYr=<|M*HoU)lP6Cw^7q0xooGZ|NiSZ! 
z2xEZ_@chH=2%ooa-)gutG&D@NClezKXU}4M`t(V?z?S(>J0O|w&9e&SEZxFHmf_hM z`S`jZ!IAgcISqJSQAOoRLIR_+o10;kACX#EkNyT>O zd0euR`hPYB4n8r^mm^0M?QFafsH2v{SU zd6qS!H9;;%X)(#C_ru{Cm4!Xo8tj!-Rf?*rI4>LQ2ln2uYhF$59NpVqOF#2U#uxoH zv2D2W2|;t{!%kUsb&AibtQT|rl72XbWT_c@px&VCB&{nRl+73z{4YOtIKuXAh zqcG4$X+nn@^r^N2H2KyeSACz&xGrej>-YRMe{WFm=%X8PBcw;jzpW_&`*Ku!oIygi z#{5tj{RR1V2PZ7ya>)#C)@q4%bm^7=LaM#}L%NgdpRp6WxdB&Jd-%AXsr}kqZ`& z=g;Ci`iyV_a(op}iaM6X*-RuOVl3V{E=`OIL+H?T!jzN4!&j0EfYTnl7`}1x~o)T)Vj$wM|TH{GPCv0 z&{z-8@O^5%q!G;(TlMqU%VoJ9vCcnKMT7m~x$v%6@r{<1vy~eFtGKSrlb|$9QWv*~ z+qg7Ro*JWDS#?%4Om~B)l%p@mdg<~y{%iEBv6VNZ9P*U8Qi z#mM;EO%g0`u2m&P+R~&5;3He{djXnL*oc>fFx>?2nV)4yscw?(Pxn{Z zlt!b&KKbR2Ovdmc3hDKhIHPxS#WQv8J3sv_?=I^jf{E?z%tFUe_o<=a6i=@_31EHt z{<1G!=4O2n{gW3_o&d&iLntGieen!j1a^Urh?)Tze-tML7Y*oD&*@OLxNg?`g(>|v zM!Kk;Z+W|Aa@kxSxc5<8^ z$jXIK5p{+{u3;kn0-1@X*o5>7mGm+CG~#dSXb3T$xl7LADWvP(Wq!Igo6aw^ zO2S7cD0~#^=VQ=#tg@yJC0m7I&ni{)z9HS?TGiCh3!S@IZBb)rXJTRG(y# zFquzy^cu1Qt3bm)2?24{Ku--RZDG2d>KFC8nj zL$hatR)*vxpXPKhFIm@cp=6PD>H)=RF+v>;&d-0z>^3njl+yH)J4WWHtCx6*7? 
zZZ>fy@J5^bD1BKJfo_bBgD&!1IU-1$^1u(6En=L_axbZ8SF`2cYfdGpYX6D!nf@Kx zlgM(H^V#w$dg(VWgzT=CT-CGt3B1-5?fH#HMPa&}1(@HJC6x+Hyg9WQlg0fh^Rv^x z*PlPqn<7)YHU80p^iPKJFAn^x)%|0^D)g9{b&Jc(<0qAtbZ~I^8v7Vn0^o=f-b)P- z*#5KU&OIi70UbG;Hb`ILy4J*R)rbf?%wnaf_T4|v#?MbaRu@{f zuv4h~v|er@yzy#OtW{cZ?g|UW)hGF61 z{`K|O!}pnFo|SJ7c@Yy6hvHE4AdohgtY9Fjsb^GFR5;d9Vd*ZtrH3&q8{6&0(p}4Y zrS1h3gnk+svEp7jKib>1e?1&5zW3$Jm*_sw5YCW4*6*VTRVI+*ghdHs68Hx2L6 zQG*Cx^EXnmvft{%m}<;&3ZwmXGQwfT%=COs4hSHm70Pb*M3dAx;n_yT_V5b5=ODSb z{OU1bzI5p(4j}~tJOc_COqwCRn}B5MN12z`gZ2->_wU~iXfn1dATurBAnlH-N^1J0 z`qv8p@>?VPyI=qO?5Q(nm|H~Fo&onYwcQjeuFGfHKV5A3Pgv7A25L?i&}lc_-HRXl z``BP|c z`#T%qw%zp1%*8Wj&TzY|-Vcxdqo`qd-;TU;;{>hH zIpiDc8&2O-|IG~hPf3(!=c1Li-sX063M#Uppf<7xIP*Ey&5q6JfcM+iB~Jzvi!%`|aDe z7~K_W5)usH{@gbQoHzuF~UxMv9YnY27&f8Ym&FH z_&m!sWC5@j1|Qdth#Ci>;Ip9M(8~ADDBVeEs}*tIvM%0R{pLI!@1NnO$p3|#`X ztup>eD=Vw;hyGQMJ}D44_)RHOcz#4-J;iY0^%cWn5MM!l84;F1% z0|SG!s}tQ;ALbI{;^Mk|)`TnI5%s?Iv?kpm58qcia5zNQl*bMC0r>lTJ9zvUYkYDN z6XDP~fYH;cwpfk?S~PSe5PbIW7$bqnii| z%SY8g(arv`FCH+~W680eN}|S2u+ZhlKKe$m3PTxmCq_sx=$O-F&%mKPP%D8m?a6Ca zw;)3^i^Q&nmWRD~;RC=~BDr0^@zuqEr%!DfW?<1jb^3J7&`(k_GMwR6Sx3(%vz*cS zp05-<#?RJtt-N+gBIjI|N;XfDllub=^7!%N^1eQ5SPF7#-@kzvL(03mHzVaOKv0J0 zT6Yz_Le)exp_P$g4gj$N-A>C6HZn}R(`f-f2aqsb)+_f;ogA`j5k=R1K$bsP_#=BP zYEx8FpNEA#f@hEf><`0=;>$w&PduC#F5n)+Jy8uZ_*urEFquHQwVD6PmAAt~6+GRW z!wOl3gao4FweJ8NL4oSv@<`Q6JyR&Cm;Fvd9@pjMfFxk;r}`S{RvN!J#ac7*YoRa8 zdBuQ6#QEffJJHr|?>a*$^YOaqf}+1F@L4%rQV!d@aersSf#glx1u5vjlajo$FFWr#RdgjcW}soT&Jhh zV^@WoN*tCUM#k=a10;Xt)^r&yDA)f|y0)3TZ)^;j1#HFa$as%QY{?>Ltu8zC8JZj2 zKK~M)&$+KZtawvq8P?==cIgmeX+gKHb6YyWi#zCjWkC2|8{^_%4_AZG- z6tFNcb8{=LyY4&K0la5t*T~L~=h9tzXH~zueXzSk4wzoK77S0!IyyRg-(DQa^)kMX zZIkeAx*`9GZ=bnrKdm5k%`8DgR8*U67}gmkPEKN2t1i7^yEpox$Z1IjI5T}<{i5@Y z^%i<_E=M)Jz4GAH1z*UpBG1J1RzqJDZ%;;T!FT|@Bnt#z_ncb8%;a`=*})Uog!jOr z(rl2Gkbs%0nwIjq4+t^n$vYaSCl?lir((V6va+&JG6$R?6*mCeBpAzXFs_4(!yVKY z&~Iu+s{Cz+JT~>gni5@c-ePZSWo!Ox%kanu21&)x;enI=T-QR1u8BE!MD)-PDY=&c 
zao~nv;ozvPA{L+jvK5V{f(0KpWi#Oic=ad!Asqf`!})79_?xKz!*o}TiKRfwqQOp@ z*5$CN@(aT|p}m5K`;>-wnYq*A0#QpqKq~#=ts*dpfe}Q4`Xtg4fH%d5GP@+aoGD%b zml-Y|ryX)5e)IBz&3L>&f`BtGV0V%lDStFQ{cTqxPv*O(RRt$k+dXylJ290rYPvNw z3w@mAbgrTaQ{N@)r`tgAxh_X^vnxY(BwEn=Z4=^Mt(+X~5$is~?40*7NcHrEt8HdB zt4ZMs#l}P#bm*NGc@oX1ahMKP$+tGAF>Xm>Ji|Rvo~XjQHRwdo_T))(G+#!E-}AOa z#h7saTYY&HkKex+ zAb76))%dJ+D?>Gh?g?#E7JQSI+9zjN<4ki-7R9KA9E}n-p0$1tH#GXuaQror^?IiK zd=KGfAT($DwGeMGeuW3&k=P$BUz@TCOuitp?g}op;g9K#5873gdW0#;g-#(F$Xmrc8xMbvV)QH=VN6u<-LW15!HYt4>oaXOQ{@{rt zSswPqqvOQ~r`|OOOO9$|UoiYhr>a%mzbp{i>FiRWpS7HFL}?AoU|)Ce6r(ZbxgsVu zz1Jq287=aZY=5`AF$gSrR3&tid+q|a&x5BC1Nm&yj3DzeOo3m%oaA%iUOX5-eb;@T z*MgaOIWWXeCZjIv?6u*6blR)6&m_yMh;=#4db1__w%7Fg^N5+L`Ib}AKp^2&)q9O4a=t|~*5AKZ($u_tH|#3+h? zY<_RMd4u|OPtVCB+MVW@c<|Ms_BsM0>FwqimZ5^vk_`<3dvm%YF5TJ)5}bTteY5}A zkmZ22J~)tgkA}hc=Ejn@0CB6?NL$Jov%sL3A&U)7Do#!ap0j+^;JtLW z&w8=I*bN1>Quh9T4X+#j*OOK%=ePRK*YJ`EAHQ(I(6n*wS~ocrRUinxjNII-xn*kT zC`U?Ttj;9IGKz|pG{l(RyonF`$7SR1Z#9KNJ_w25|YB^5NTqvSSDKTo<9ZczJ-QqALG!wcrOLJqTK)Saf87Cqb)!7{I2zdp;ub6RMi z%sd8i^@tLKT8` zlLDbl@D4~78Ij@d-h}{?1~e1*`7O`Ga8!l*v{DtBDg9UcUYg)Cz)c#=R ziz_UoN=iy<4n6y6=GMQ0zqvZy31=s#T*qJ|E(gQe3grF07}7q6)Wp7aIy#0i_3a zn!R3Mw>Qx|d-g21-Q;alPJ)f@hMe4MFw4!fCHf;L|7SQG*vqB+fc5-&8P{L)7a18L zGfjK_uhgl6flx@dBVvKM2m}z+(Qy|d@i_^T%ylE9q-S)Zy=h#4fi7}#hC`EK&wo`% z=XU#MKb zZtd=-E}BE$XA3M6CHMURDQW2jo)!@%)zr?3$d3hNM7-kN)S(G1&F1-#^9g$FWZM-1 zTD077C~?zSC~5qi(AZEvF08)s@_Yk8D=cp`g=#W>nf%W6roB)M>Iu1i-MW;d2~3)T zju5tA92a_p`i8!mgaYByWE7j1mj^(dPGrkZe*LDVnwxU>e@J89diz?CPfbq!)6zM> zA%TM7aF2+H2oyl(;Xh~10#WiJE2|wD?c(~xzd)z`g=9or;%0Sr%&38;p{2#D^>91g zu|;=D#|Skm`c{}O#=8<-|5sqI7V=0CTT!rQvY0pIIcIBOW0Unwt>sdn4?Hv@a6kod zd3ICH%WG>z%%R>W?f|%{zcg5)o@Yrr=sZRQjEKx!M*8;ge`zcoSeyfR#?go4ixDb* z;9OGoE+!_pvGHc!Zs7k$qvjkk!Bj~*8do~@KGMoxdtkOH0tt|A z#TSNoecD}RT1)?EsNzdbiT^WpbBO$d-ORf_+bkCK5@%>{e>f{I-x9@535sIP4SH@W&?e|Lbx*z*2&X=!N|c6MnaBWADOQ_Jh6) zxBx-Rx7Vhqvorp~W}7d@`nYwvvf#ohuD}9i)(OlQv-94^- z=K1sI3K>IWH}2e#i52sRwn|q9)2{8r&2avm|1K*oIJ>%5wzr>#M84bos)CxX@mESn 
zsISV%;J~_%C9MdU5qbaNg8@MMlnm)>O`*LjS4eX<_cj)f_KUtpv)D{9jTjki0U8q^ zC^sf8b&j3A$%t#!iX6|x$fyC-zUV%XFHyf2kW=4{zoI-aEU*`NOqxG;PJzLE9Hc;l zEcN$=g&JtI=aCh_uJ{+9y#%9_Nizi2IB?ga)&kWQeBWw<$R~jGO5LoCEUf)(?0JPM z7;MeTH*ejlvCO+ACl^e5=1c?q!7^?hdHf-;V&TBK1ol{v6%BB}Cs zH5D{qN&dA5Z0ik_IqehCPHNd7StR)(Yn8ry@n=QKVZvH0I2x}jn%==)Nrpg)?7M9RGr!tXe(WA6j zIF~L%9V+vma9TUV!@=pdyi~#97`O@TO5gdQE{C!D;}+*}=Xk>xDzWymv z(z{Dph3(-0J>j!~{{D4D+CkJY^f_Y%Dw`s|4b!y?6nXREt$zj5GBJ98TmPz&a!_rl zvYV)BzgCS{n}?)NHp)6fq-mPyT#GL1Ed{cCPJ}ZD-TGvkoGm*=<i^PO&944$ zYxTda)&I6u|Jz#q{}*d@4i+}{r{;$Pg}mTliRr7Tr~nRiKu)mY59Uh^`ax578--XM zpl19BsH0HXKHRDkzYMiUQ@~ese()&iVE`%5%THB^-r2G_Rtfg213P6n47nlkx@AoZkO@9g4R8v%gwH z5bVN0>t6*0@VhY&h>I8`pT25?>V~@Y)V~@B&+<3k3g@VKM#+=D(+gU|Ww0}O#lDyE zh5fF=^hyH2FWAUiXAZR8!0?1Y@4k$IPe1@JJXFa8&bxQp6?{UaTcZR38$mrLXI1@d z-7E?IzS(WN-Ozu z>maj8N%@_i)nJ{9J={x1NnjjeIyU$ex7UF?5ZPfMFPSvMQ=^135|F?G2ttF4g76d# z&1YA(|MwBxTJg*O!AfX;(2?E#s#XLtDiV_Iwb(hfUNpbDUmZZAbno6paHpw?PO?tvXlPslfBR}xZa&D-C;%ZR z4RahS#xNxY-PhzUvamc42*As$z5L+EIYImB=g9N#-*0F2Sgdv^%Rjx7d>Y)8AV4(p zXQTRGo#jBGN>{qJrpYcyhhZRmYHt+d64SX~dkMumkw4zzf<)?Aqz#GbILaH7gaQA5 z3z`@Z0N{}T5wsb7{QaFM*d6~A+YL~C1&tMz5XarYtoQ}Yic^lj6V!}YUx(U-LtNuO zSj$*3gO{QE6eZTWxrLWo9nB#I@)|>CkBi2YGs|%%fn)C!y8%G@(IeC$LQySE|3E8s%jwhj|B?5-# zkN+`mKI^VBl^sklh;Bt~Z6X9@lLu?Ff1#d*+_{6$x$ zghP|1!NkN%OP-Isre56OJ}Y5at#$S!71q#mQ>==QT0U6Uo<4uo&gd7^u}bHw9^aWn zdOQ`c6dRxO!kJ85%)2C6*SYH3s4&zt7B8K|lk?$xj#qHa(yr5;pWFNz4NS+@W%(e~j2!3s2(z{D96XeSO_0@1B< z$#YW}WQB_ybKsn1;pD6Xn@N_}p(l;FCk+fVEXO4?XTrMxD4fnHUd0FN#NFaj=vFBnYO6*49?DG8@S>RkV}_1iuGap{Z&aLSp%MNNpZ4 z{`CUHiqXO%AOy&KLP->sTQ19f0a#uorO>{I`8*7AAjd*-Ru&1!@d}^rPWODh0Bidr zNnf0tdXp@5>+Mye_en{DrC}+ec_Iad;xXD&xtJ+A=G_Ku3DOkeo+2>qLC+zfr(as% zWx6@n|3<&c59p@~n+awC0gCdCz7CWbpXR`qI6)(TK+z}g21q$zf8x~6S^Jt1wX&Tv zX6ELol@q*~Kqf&%q<8Kwf|t^3tBQUZH1l}*STPQ4hGRH{<-q%m{FH|o{tcu48)w-w zkS(w9^2UGpataXU7O;huk^Dd(0va?mGxKRPtp>wMU*`(#KugR=J9Fq?KX3pImuwDR 
zGc;s^-9T)AU3q=3J3||WMM?_m;9#??%JfR*eQv>0$87+OT5aeytR0O%cy)h1;%z`|@h07Oa?a6p!%z&weq+d?D;Sr+xKvec%W~!2J)6%hv85K+ZiRS+Yy+r;nV>Q#S%l@o8cT! z2JT}b=d=pPS$0R~GOAK_#n&T5sp0nIx?Z#@eYrO4XqW^PQl=F-QT*x~s&v}2e#@yP zF}5!0xkx?0k zB~Dgb$sew=k?qZyh+ngN$llnUQP(J?U2CHOA&Q_t1ku^w)g@+>x;&lXyHO4CO`ge*BnMb)_W|+m&eVltO$3enci~hNM&RhYK{jD7E_v)(0zDu z>L(G-R&^~V!ZzfAi)8ZhC2(KBt+1OMJ`iF5LdrKs{M*JVa7Q}J#EWrHU?ClggTdce)r8&+nbx=X|Fmc&ro!4E zu4pBuBYddGFuS^{0A(a)uSg%4{i3qp9Kb~s=ImREDKTSl>7{D?h-iL`KbALFK>Qhu zYb#Hbr&mu+p#ky7&`e(5bbtSBQw%7e>dasOsO-EByHw03~LxnmT9{9E&O`08D&C+ebm82Fmk;W_c|zZN>Mc&s|_}_u#kHA zkp0WoukoD+!d;Y5|A5lzQ1y?lr%<|!yX`k;_O za)VlnVB5w?YzqJekaf`n0V!V^IQQAvSv~HE%m?k~1Fi>gn>0s1g*pqgIUhgb*-d`8 z`D7&`B9aNV0~#@R3IZ}V31{d0#05xNZN}S8uc(7>6NEXv91T~Kz}mM;z-@lD6mbM?=MhB(0m&u9$hyV*^&2|nF{Y*CMhu(U?Wlj zgB|!z-hchG*I1$`NfulmHcQ46mLb1?8R46G90gLLmCb%`V z3T&m%-+s%0VlKeGz>?nam|0rlyyiz32GZU|Hnvw#kbD;onm8vYcn@SsZ8}JBf4@H5 zk!$1@1A)nQb^Hbf1AvMj{rzkx#4*j$fNz3ottC7Ow1{{RHI29c$*ehW2%^b&Qn-$7 z?<9Co7#X46w3gGs9s_I$C3wz$E;xc7=Wr`ISM`l>@9S5u&OwzGq(KuS{`6!dVHn47 zu0xLnTu-I1^!nMNk){gP9JJOYqzullfRiY!@g%BHskyiw;kIWx33yf!-3 zk{~0iag4chVPj(hf{Z440X$=?P;!%79~%|L3ZY9V1jRZ>uR=woJ;LiCjV2@}#w&eV z2KkP*|VnWYhs)-&*8k1(+VEpr8P)X10NYIO=fq5m*^4 z9CXWoaOVQi6ZP->EcFluMci+GeqOlixMLA50PZ?i$WQ}ew15pJ_;%{xi!66Uq|~53 zI$9^H$vAws2W)N#%;)GqHC^4MtR_&xDAc_9TwY)*NpO#A&E9gq9_wTgei$yNjez8Yz@U`r-BeR_p~6xKX{{rU;8ThO&&pGMDoq2{x&ZcJ^T0bn~kGvo3t zl?mPO?a8tJrui7i&oVD@2yX2Y5~v&kj_0^fpN>Pvpw4q#dUxygV%YQ`-er`PPeUsI zhC_Q+=^h|Az_~I2v5?{Y%_09zGlxPm$cY}IQ3EUd<%ruM8D~nZM|}Fw<-fnPrEI4& zLqStB=*t(VyvIScD3}X(T>eL_(6Zp510H@Eg1z@U*(a446THbct}L zP`fA1l4IVj2jaMYKu5CXh9cDCf*&3rh2`8D7u0Lw;o(V0OJjq!r*;Xw$sh2**!{8R z&yj5ZXZRrXk5q&UDKK<(S+C}-{J`)*_KNIpt}tqDRa|Pz>tKw%_~ZsnGrzWky$+`A zGjv{jq@G(8`UOC;=SdvE~C4IH{DUF zIk}?X5*I6WN;Ic*p)pbQQ@qN9ti?X3*14A~d+YT)3mmMy7w}z!`5|VrFR{aaS1twL zTKJKYtssd#yz@|hWH3IYd*R|y)24OCf3T|`77baj?KEIMcE&!2SQu2{h9CM($sbS! 
zH-R$-_X>7yctOJT&9OIs*pd%Q|kxqgH;*`~;A z?mqpoStESyFhJjvxKgdJYNiC&{NSnL;oJ8gwQWl1%Dd%h%KZp5_55jDHk-U*MGbv`+#SjWpD_yVcFx?(^uG&z!h&g&$`3%Pr}jq#f#JTubYzh_{ii zPTd#=lfsX6@h#Z}5@@;M`=QHPm$hu?WpBKT3EzxsRPyzAs~cLySkL`6mbT)fnpSg0 z8Qa}$y`0ya_D)wU2I8*l)SKjpQJ%5WBoUKCSCl@^*mg+r*yV|y$NbG|I%h4Ld+3~! z+7xv96dmnjof z{``4cQ9t&-&+LA#e{Z%8!|9mAn%;d3E=r$Q`KDZP1J0>gxrKg9ZmeoxvPSpfm%kg+ z+qa8J<3D&9uDPA4Xnrgw5NnUQA+KHjD9q~^vSqp+B9hvCcpN8+MEP25?duhj<-u#z zD=+DeMgK}X7kX;3H}r)ghPU?o*5_3O8P*N%rVmt0?SOJ8o%3p*pPe$%B>lJLObT$g(h9o(S|FUp50)XR^=lep-W>~W`g*#|{P zTzmmZ(~rGhFQoqoD(TYb#x%B?3)S_%rPRoW}m zBdHW~LtIpJD`gYwct&P8CKRvEct)0bNr<(!mklf^rR4k>dWS&ztDO8QR~fc$gko$l zj2tFp6&zQtdS@B7G&Zt?{&_*I!Id7u>(jKjNWLkXO0;aoN^Nv$i-C10#WQ?&Qe^q< zA>N*#CDEO(n2H?&!OTDv#e2xIoCgM$PrN9-gy*QvD@x8Pi#)yBZcY} zG}K|JG*LjFzJ=kl$FSU9$`D~4zuv|QVYOXY`^wN^7LAxj$>6X0maVALQLes}D9p&e99VZku z=#zF8Q#nz(v(IL{*XEFROnY_I_P;E~G@tTUf}<()Pd>xo52*UPUj3y*{(~*~ zF#~Jn@Y2%I10!r499zEpd$3l5Gr!P*8tm(E{NG`bYxLgU-eP1@M4T43 z0Ca%)8%1YXjzVksHp$&;vb70`W1e{#L5m%~nrSwRL0mpqnO+|{Cc!>4EQ|1Z`u;FbyuZY;j96Xo#{kCe1bchOT@rj z2V4)ga)XMsH=MdBqMB+8dheh-CRig-9P&$_UtM>Rihy}+FlHxWEz>Q;%}Ipr$AAe` z5Ksu(o&%lZ8K8@;t*v3eejHOa-Fq|NU7Rn3mh}EAz6caDRq*65-N+dndY`52kPs9x zqF)OeygrBw&R~OS`}y-Hd#?_hMN?QPq@LC#ET&eJ`UfH2`n;L$+&^tL_f6X{{(gT^ zyUnw@?M1=GffeWLPO+_U0OIS{@c{M^y}Z7@*LB)!$vjgdF(SV%m0|+!eS}t{gZT|u z(F(vN3UH9aix;OLeHK`SWBS;1iYefPKqx0WDnz&w)0^H1<#PIfe8C8g3Pzw$#VLJ? zUPZZDH0`W0kW^G80iR2IX^28Am{ZxZ1>GbO? 
z0wn~Ag@uI!b~%tDi!1|dFo3i%L^}0-^TQkuDAiGis&lz-T4kQ1a2myul4xI~AxLWu z>p=Yaj~RHxOUWhMF$m3Nk0KzLXI z5068`#zb?Lsd>M)wr^TmP`dpcI1}j{THOb=K$Vch3>Lfm%za1=i4+;9&U`(;^{|9H z1T;98(9W^3;dJfx8mVf~Ga|tMO zVdmmegu)@(Z3uM^wREkp*C$TBD6FPG+2(#g0$QWtpX(|=)8!W z6B5z@=y4J1$;c2Tb#)fB5N=e6HNp_i9Dx-P75Eq#*}IZ|h~53-_^P9BXh;{z)EHql zpq~cq_!FGycpXGmu+kQ87e^?mIWYjMkon{6Gx1;mPa^p256@~qzf#5 z3+D_65%x}P2rVnr0KNv{6u!h!s5#H!z;?>!c7yZ9PyaG#ZX5-T=C0zWyCECva1iFL zd-o#79Dtt~6}dTussS)r8vzphFZSL$oa?^-AO4^;Bx#>^oDGzg5JF`Xm8d8?8c1aC z(WGP~Qdt#3R?420j7Tr^5t!Bi^WU}hB5W~gkhT6tKrVlV8ReXqneiYVy`#{W}v&D92){qB}@!NN&C;z zS@+Gk?{Q?zA`YNVYbZl`;kYzy3pHg92`vviHzQ9>3-YGSosaD zPZW9HVLG$guV@Rph9S7QxVq$H!22g;IFcci6BeB%o20)5+>f_M!^X92Dk;aVhlbG?j{}kV-6hc`69;+>Kc*B0BZ41WCv*WaI3t9 zwIdmp+RPLQHgX&|P;cxf1+OzAwZ_vf69^_Vk?R!-+l{|2?}3&%PnkKjECw$W&WMSO zG(~wAz_36KZN`R0paDW>M^DCQJ9puq01I^XcRqahU{T~&H92qsgQ5>N7Sj!Y71w-s zYv4Nw%c?mSt^&B25bquiRcww33;ci!?CI}+AG2~|E`n2Y&q&;)Bek=$Mo-=z5K<{t z!@kLQi@N061cCK2Cx4&#i2s42bc3xyP!x>Y9t#qcV|Y3uj( z;a(u82Pd;q+kVDVQ_2WM7>WW z<~smq55i3^ZLC=~Ew!d2+|T$5#iUiNhux9)LD8iv#`4Ll1#64lYz>U+3KQ}4u5Vnc zCE%yd7n!i+u)3b|sZ-@U#)9L&{6{z?we98v?js82;itCvjLgi2UVE#p;0&CHFzV_TaKqE_vyYXo}ivUw9USOPwg7Zcx_9cd1+}v zdEn&qwT5_^D|v3JgF{29yws~K?XTbR+82I7-7M4;>P|h=*%^%J2&6Oc-OorH(#T|s zUM4`r>iV|IsyF+XECWthmdBCdAE=C~#sBH7jYfemll|+Ip-g)o6BBcI>RPC}datRs zDzzu`l5fmQg4ubn`@PqafGWCt?rUydUT}DLetQA^AA3G_jR0NnfQoZH<>a|@cR{b` z=jYeL96l-GVO=Gka*y4=!#+Q={)l#yx(xmAze7Iz%=-eFSpyld`>gsF_)x^kv=BWSv!WqB8x6t zc6FvEuj%kk{O8g1&n)+t_>9QtgO+R^etv!g+XrIyA?|>%FcUUV?e9S`XDzeu;^SM^z?0wx7I>Hx zN`xwI;^tn44FR~)%|ahP={ z1gQmIpfV@++?zkg44jy>bXG=gz)@;gH7~}!U!OQpUOLj=H@WSPu`g#gSho0=J<83M zMk9fd@pIIeXXCUN)6z4wDc0888+P-w8YZX=`escpj?EZlBIUY|>}^cHQAq4>KV>s( zl4pGQlU0ghkm`v1W%fMpFWK&azWpCBhg6X#J#4QyJCC|1w7YC*LSS5V8#Yk5jW>yj z&A#klv;3U>MJzFXO8iaKU_a5DhzK6Du7Whg{i${e5;ZKD**Vk1#_dSigHmLgAFa!*;Pwj=itXY?OVv3ly)=Za5F+E_VVRR z)duk}rAU0U+%@M0P{Z0u|66G_l}Z7X`uqMOBb0v6s3gl0s+k+Ip@%MBY`@xn*s`ed zOU09`mSd@%y6hI;h0mDO!K&*{j`n*@2hxhByIzfS-M28WswvgwU%G7BLCot`&}OSA 
z(AIC};Gl5csHssgXv}1-tgOUHO(U&XleO#c=GMw5BVG4~PkzEuVlUnOs)Kbik(QTt zVFmrxmtrNzqAI+W@El4r4u+NQKi~-o@NJlA5YKmTG%hds1#p@KfhZpQ@+IxuWOWvV zFT0IfBqqui zpVZ^W*B3vdH}sU#Dmy@qDEZlG-18wL{TUry+Qd{=W2|g2)oC7z)62{9ogY51#LlZS z!)+C&58pblX3gNkv4)!yEmHmtKXy}`_(WOa=BBq&nyfCgja+^H+(jjA{eji1S+3un zDBJ7}+Dhcqs5%1#qmq2;_nf#-JuB1Sq{REm@`nV36BzF!PDIX^<)WtDjJefFkF(g@ z%ZEcto4~XP)~^+CUl7#_2Ahwq%yyVk9bd`5a`-l#uBZbGoBiawj_1xG1(>&WO?~%U z`gqUc+!k#$P}dFwSy3M>56iA9aeC8UNJa74zkk4b;2`GvoIk3w?g;E!1=dA4KNEjj zhv{Dw2o-qMN}M`K#fNkYC1uMQ9vRc&k*u*k`nihJE;%C?X;C-*6>OYHnyq1S+veifN)m~1l(B&fDBmsnRcr*IqbQ+YnsQ}9ww(6 zvByNIF40k7gtM39T+hKzpHEKPt-b%p&8GdPt2_f&;sAfL39gg^t#2?gmSy4xEe(?gt7*E$w5p1j zmv%%{dY1(m&vK^9(X2{)f9*i$(&aa53hBUy06C?CGw8D9qR*kB+lmT4SQ=&A?kw^v zMO)A;ZoP*oYjSz+3(g~^dn+0m!s}Ak-MK-XXImK<_}Z_CWgU6T7q$l|yB~SKGu(3! z>hgogtQgL5?c8T!;A&UuCwMrzI{UODb=rGM(ATpRr}ngd3MDaD`t5KbgIS2X#0-599WxCy@<3PnF)&C=nw$ILkf}L#iznBv>az%Xe86>}-<=+V6f5Q!_T(rTzTDoFIdXI3o;nnI4 z&Rx4!f%>+Tfg!8a6xOC5^Ok)pB`z#IhnR>N?<(0HP3uttNt~F6&)KYH1{h81Z zJVb}`S^b#*;#{bt)$WCnCYf70PL>Dl0zVk?Syvhm%OuAPDJ_O@q<9D7Y+ve|aMNQ? 
z_ACa&@FT1j7{M|Hg8@wKHuoYo%#_YUhX~-xEUjUVG$qkV4Ts=@0b~aF#5K&Z7cE+J z?KM`DaMcovk>$B4v4TtM%r7c*Ed3X<#f+YM`+ zX>2gxoiH3z@D2O_n2>Sm8)kJNXIzJ*7JCJXwdSs_h-4Zy?#O72FtCFgqep-t3v^ZI zVYbc3P(bv`U@a57n)R-Wv6Fy(jt$DB4Yc7^gq8d=6Iw@RID0&5H3t$FyyIVxu-0Vy zqesudHyk7hVljys8Hoddhf893l2_a{nYs8g?4Ymzs88EKOu*0ru3}^B>vYD|t$UC{ zVkq)#<={L8yvFa*HzoF`$F_uV6oB#(5%Ebco!pSA!U|q+!{QdP?1Nn_>|mic^9HI# zM0&+PI2$Nn99&A#IJadPE zosS|3@;QhZfTr-Ytlxf?1GEjV$B(yNzkc0)1{LMMp-V5~kH>h2hNy#-vL32n)Y_}p zuit{Ir>vxe1LPg!h!cb|qO43Lao%X8C%zRsY;kjQ+oO}aLM9Lvm|<_I^5HMKdF$4^ z!J7XSNW{Z;Wn!q^q`hF>lU?U;;MUGZCk*yUJ`@S+Omvnb@|aTssM&1?hqv5 zA77vQXL<;a@#yVAWCT;5u^62~b zOY^Efc$*^N15@QZ=Qz+l!q97_DL23|M`_IQ7kuHL2v!YFj)@8Q1$!Z1hw}++d~2KC zbtAWL?iF^NP2Ya*%}4HEtc&7yN?WYsOk{Z?fUNyCKf~py^4%bcAVdrIXW2e){s|=- zg4}W*B`Q0c9PH>_)gf)OK-g$=FLq<^@3}bcAy1y zXBF(Q(DGY#M})nGeLEaSS^*Z7=<*98NQSE)ejx!3s%Or$|J)bK4J{OuMBjm;&9FA~MO96EM~5@m1zh{~t;5aYE($~yc1nD{ z=6I~as&Vnhb9OqE_N8`!Q26}$`}Q}8jA`!aaRpGE(CG>n8(ubCJCCwSv&6*j>1iFq zAx9sQHohq?k1ZPyjTNgex#Q_`sPezPy^OG#-7u~D*;mgDG3-I`ksBQ6>`7lJ9bEb# z33zR#{eNoIW76vl^MI@okex_@$*Kt2K4EmYxYUY7CRo%b%R6GGz9X|a-#{Qut;&Pl zDzUcrs@S65_IC5atjg-@q{t}d;`NX=go)16G@KK2Kd8?H$BP9-NAK_NoA3ut1Bc^1 zB&g~|njj?wAOVCQyUn{#6O&e;OvIcRnr4NwXFC<2%VC|gWSOmHynTgmVltw(@H-p7wk z!~o7;EPigh;64710IF4Qpc_5EYa>|r{boMAVq$EN1mDZU5x45w*F6t5%|0XheHf?2 zNY0aN7G`NFf6wVqsryj=w@oF=50vI;ksdsH^nJXsm%ZWTK7+5j0PPCRS{_*7VQZ>Y zDxH%R3jtl`-YY~hEHf$fXBMRrvU8u0@t0HU;CD5tMJMlO{>sPAj>b9_#ZF5y$k0A z5Ys2xcRF?r9qybP(VW|i8*>%38?0rZMaJkSA~x0++TX*cd3@w_nqy`a6xovpH}LOlTNU6)PZ`Q6bQgdz-9zb9=8R4 z2H40|Wp#D#LU5wM9Sjc-cdFFS&`a~V4@^l$&)(44{R{nOQ<av5%1-CpQq{)_`jrNfE10^vb5)}^ay2wpn}+mTAo?p zFz`wDRiW(nw#)Q z8aPy8D%5C3RcgJAWx-F`Fse|KYQIr3g|bbbN6(S#j6Q#;cgrwinxUh!h4wEVg(Rh= zqmXVjhug0^Odk1JF`3Bgp%QvWDKsb5sPSN|{ELBduHyK#DBYlKo%a{fx>aHBygNffBNJbX;N^yQF}x}FhNrV+#NLgI>9Aa! 
z8!5;;A}&xEA$GNQt8C+j{uMJ^cP0)j)x63OHpUh;u~m9T$Lobf&WH^~Q{0%s%x7?& z)k3sUkcP>Jw(&`fR`+1XKygj*y+U%nI8%M~)IX`|L`yv^a*7TWDJiLQx%*zeWyASX zCv_rn(#vD1DUaxFSUr~9mpEEJ*TkM6gCoEHz#;qE`)i-#*f(2SXf)j9k)vnpV|6TC zQc_#vk(5iPlCWO+*t&t0bD?vN@aa2VujgB{n3B~kNMl}Q^CQ*ERdP;d)%SHb zM&fuGZJUg9F3vZB^*tdOdLeVW9vrPSX}PdqZ7|k?h#su>QATuC(KF)>OtMYcSv+*Mqtuzj*61iuK2x2k|an z9aOjj_0`h~!NmLZC)L+WuGncs@BT$1YXq9@%9d4#n^P^3o~5TW1(%8pMG4Ps-Dn_v zjNYC}IXCFNEyLB1@-{23fxsdvn|wuF79RiwifN0&r>); zl|yxeA(~-U%w$8^A;)`J>Jy8K?f0_pT$kR!+1PPr$?kF22WN+SZSWhWISrP(i_ZQb zJ$~SaBHQLV43~SUo$h>>=9e2D^L4xRc0MVu(_%5lVY?-pj%;vh$u$}rdeV5lqdn_+ zu1YPZrg!E}J|?-ro*Pn2r+!eiov~lW?#5PpD~zyWLZ|d5XQ?J6NACxb)TB z4ofrYPA|LOwDkaO)6H9RuDRd09HX~QmCE#R&Z5K1JD6O-Z{m1v$wFy*5qH{{EhCrN zbW+Uq{Q>P=JOc`pcRI6N10Bixgw9#g_UvqX$D<^5k?Nh`$l_H??~K|}ChpoQTuNW@ zw*JN+npw}1)nyZu7RX{5CkEGff^>6XrYB2l4YR}gr}wr_-Os%rt09pPn#g}LT&&GQc#yIA`a)4cVge?5l*a#=Lg6b*g(pFanrLQ_&J zMXv`^=_U#W7bw#BeXa4qg6Q2R6RpVgaqmW@xWw=*FN>75-x3t(kNr3O?nUjf|I_yL z2pxhDeq`86LJA9PtoAeb{`p_#ugTaFj(@b8k?cn_?MKc4@{RLNOCbUPl>UZ$wv{JpmKdMWAc*&kp%!J%jEHF4AAqt@omyV72CFq6hilKA4!GQuvG6Gocwa zpFY6JNv&x&xELCOW<+`(_Y>G(o+$%#4~oRMO}H-x&yiq`?H1 zHj&*09@F0k?_3uk`h?;kBEpOjedIg^(%+>lq`HR~_L8(nPEJ(r=@&H;RCWYQyXnUr zn4kGO4Er_a!>Td_xHN`Uo8do;GHgpsMcA`v5~x5(!9z$1K$XiXD*l9k;946oD*kus zyL)QSh|SSs$Gl+R9_M8ox-jMQ65R@9+37C#Gdxt?Vj4QYu!`&fxb-7|Z1RZ3kFT33l40dX z?d>y2oCLQ#szGA1BY5=`EN{|8OmY8dC)I8i5Ey-A?Y)RYP!XOvxJnK2@_+H{6JgBr zsj%?Mm{r3o@&VxE^-q(2Zh1`hPX+#(>=#M_Qd#)(v!8CAWs8>_$UcBy@I@gmVyK!Z z{)WnmE$}MYG1}YON<)E%*KdlucLyl$5cr1I)|&qkxZ;r@EG0OvF_>8efqQ|Rv1>Ko z|IxAqInJhV;gk?VpQMYQi7U*{Nk5}Z_^z3`I?O!Bq=7(5)f>sPn&0eR_so&&Yw~*o zfh1eaeQ>2 z$;rv}wT6rIkmI92c>DruS7B9M6ocb*iH(j9momK9!B@_pguEr18?9X&=vcs1Fg3tP zRR%r0YZJtP!s^b22mZE0VxhVLb#j3?7)0=X=WTHO7rkrR=4<+qz0htL+t`EJ--?{H z_T}>noNs*3`t6WIQ=B|J^?tc@{^Lnq5d;TzVB zTjiE)=!e4$2W~-bic#2IUR(ti7qyq;*B>lY@Q>K_L5Bfdz?bb1Xm=YZL%7r~YFUCu z`_lTw7#V#ttzSV>k#qAa1RY*}emz#bxbgebhuvW+idIHZQ*)M{GZ_O@9O`5>g`yUN}T4w zqyk?ZMS|aGDLm2ZQwOf;+Wbs5Z`RN#xqZ8SdZw{oIkc3*vyLKB{&uU``Sb48bg8 
zkWYLa2^?7-CZG50;LVdKPr|IAE_$n?gTpM0DZXR}^OVh8ceMSKNjXO!u;}LdYu5`e zops|_y4qR7iPC!GJnNm58!b+STrPJM7?-S)^uFpv;S^tVQ05lzJp`jvxJbM`a)V~k z(r-u0y3A~B9UVFByO#YKc6iBkJMrRSG5JX|^U4Te^H*smsVA}<8`;mDJD2n^4vvl4 z;H&Jygd4oZwE$LyP`qDl{6f~Iq_i!=>!QHK(SgG?)O-2nIR={LKp|KY0=gFzQ0|%HqS@LIGEU*U_`_P zfCyt3;O6?49!vO)qiLXu7e#f?0!&QaVgGA6_ zmN5&rBH=_3u|L7>lhp%cXACw9L83S687=J@8A8l?!=9;b?1cIWHG^Nv%TGbTcA+ff zsHLRlFVo9p_3Thy4&YL7R3zaes0hHG+r)VSTuOS-wQqvx^IkiS5Y!>`HH%IsW~{)C zP#O^@Ie306M)RYKAkpw7NkmRA4t79jn^O4Go{CuVVGzYyifE6p4!~DLfd`M5C1}Bp zASpJ4Rn%&LzUT)b5u%D9+$jn{;o!E3LN)tAE`Be}x(Q4#XPu;1PK%Qwjle?KBcU$c zg#lX_KhYoJD!?L%{hijXXX)uIn9aBg+;?*$s$#rNfQB!yYA7&a1(h8-st{R8lGj$y z(vU-irvhZEIeCXZiP1PAyq!360Hp*g8fv&ESeFwVomCGf1y1&@D5s@iP6OT4Tsr~N z%+l1Th`oOZdJHrOQeowv&@Oz`juTR+dV(-Sm=hL4Qou>3G8YThf)(WD?cH`{@A(gl zkZD2!421(uK+q8S=8Ym+o8}|sDW;bZ%Y!Ukup;H4Bxu8PWNCe;y{Cs4)+uF?$~!UI znMoe90~Z5{2XOI2SFj1REiX?`}&4-*P6ER3$%4GjPM>3#3sJrzw;!hlAK+6{n{g!GTi7LKy&L*Z302s*8<(CClN?o5FV>JvQ z%xgw*I~2lsNL_gSNbg&8r&!pCtoqbbG#3-8Ydk@Jlf!*aT3^xTg^-e64jm;t8OMIv z98H53?0%KZQ~0ED5}+-7hg;oWBMp``E7-+Y-nG}pB9Mg%a%dtCEoel-)1Vcx9s&?R z%mwI~1+-9QV(yew-yRAH2gNsQwORYhW0;*obYT%fK5U4B>;fSW_q;T0rJ>0wNxZ5G zzb27T5g0<1w~}e(;(kAYs~^B8-*q8>l|g;#Iy4Li4<4jK>`AMM`dEKs|IcJV)dyXB zdTK5QoNxyHZ5kKogq{N^S69C8Lp(q_av!alt}nR;je$l#5gAVga0fc6^&d7tiCXU5 z!n@{SJQXM2#`eZH7v-YUIMnT3PopCs0V1$U_=tWJdR;g2!C>H@P!&*DK%x+# z^}w9E8HF{4Z_Ywi55^>z+bklsO^S*ygc_?gZ0SK`%mbC~I$qlZ6PmM$2Ir#9s#EC) z91|xHhP@k49Qj-T2|hqr%v_NR`*Lb}(6p7gCRvh_8 zp=-~|EB*01TF|RSk33K11IsjHg3 z&vuO6@$sB7V9gsvSQ6HpYT3EZdyDIo)+murBofe)riBlC9ts#MFwmg~hDQ!LJ9xAG zWtg52J_UrzF0QUO(S?Du?g>ITl*~e6`F!WzECTM8i~YI@T{uL?n34+3^*iRGCD+Yl9e< z28NiNGDX{av$&>$+d!Q1$Y(4`HNQfX&|A0aU@5CUmIbjo=F+z=lQ8ts9g>4L9$-ZO zsh|Lm5C%{p7C$_pITPA3J2E)<04WhLO>iS)rSNc1NY15B^Hk2&Pbo(~0+uDW&B+R#2hD@+0sHQfKLyO9?9N&}=7^tz;Kd!!()vCGO*k zc#ipY9@@Z&rE{n!Hmd-aUcKjp3X_{Ogwb0dLy(k&qeUCYpRR=?OmOSQ197)hcP);W zxKiM*&xf^C&-7JylnLMPfJaZWIgFoYd?4(&H8^HeLGh87&htNmJUv%UpbU_dB-Xq? 
zZ8AV;Zhh)n^7if9b_i$G27ZMc8oS==P7+sJ4JIRKvf%Fbyu?~^5b|^!GIaeh_7hUk z(y$dfFW{$u-nl1o4`yP>TR{%lHmui^>dv`q@nSdWn>acFl|KRl13_DS50ejOwVn7| z09gd<(B*F^nME=OJ@He<6Aj?Zacoq96O|C>4OB}8Utd!}bp|l;>8Yo@$|W%x^G>yr zSwXbNB|5*ng`XVGF~mlLj~bOB4MTUIr-OtfkO!Iw^?0bg(0UiaaD=Fyt3CkBInaLe z9C~~7+LMcmR(NYCK^NKJv|o=APeV}d#u9?%y!SiYGP0{p{} ztU=^us(MU}qV+3p2=_7qMpVH-1Hfo9MCxKQ=B9POI%_?v#S3;)LR9nGSI!f^2vz_1 z-tg>R4++ti%-4nH&ay2G&T8v+IBtltoU;`Ry!XWA)R{2GG~U1k{dcE;1sR-O6opyc zIiK1A*;f_eO#aTXHdpngX*2FFOm6nwU3P@dfG8(&K@;-^laqL=+8 z0RLHiMmO;Q?ctT8rA=?NXSACliQCx?rO?~>k`6VF&`|MN&l65jn^id*A}^QMdS^j7 zTCq0e=enr56kF;`jUqIR4b2pa-`9cC+w!|7O5W}|8%d!khzZ-3nD6gn=c`}TZdZ*&(E-4bo8J5Lwa^sgPvn|;nv)wbxP(o%=M~fQchKkT?uJ& znb|iRSDrZ?`=v;@Nvm;tlu!n(#n;^_XL$H|=)d?4?VilLJ0A1X_Sc+hAJpHgeb34& zT->XDwL$u{3jLAgjvGC?q@R6MyQe3kv51o1V6lF3(byrw%B;bwT~x@547POb6S^we zlP`1h>G2J-cR5X$rS9uDsbAbV(%u?8ctvYH9MPpfjjT{y7A7E z;d&hf!-#i}rmr4Z+Py5ma?vA8H7ECT_G^9Tm9-TWM-K;Ep0s@MJjbhk>`GoD>jA-g zD|S3`8;lISEn`Z>!N$W7R?U4B1{Mv&-uWfk-|BV*SDDHVKdImv@=0$ZCm|(ae2G7xdzzq6)A2eIs7~h%ec1C%={S{%}&4vc=Kota-qtE*pHS z%5~QfMGqjs$e##RbK}6rj%-PButmy>e&Yv#4 zAj(q3pyOz>Jm>rBFf(Uf_k$g**FU~?_FTPtkp|Pws83UlXD54ja`d1Dj|<6WCN~hP z<-6s>%in+O+-=p7)$v}wCo*|EMM%SXwPR)0ec4E=u_Yt#*v+#Ud)IU+W_Z7fT$q); z{1VzoMMW`6*g$=N`sCSDYL}m`iq$&XtF-osoYorEn`PVT*+VD7&*)29-OlEGq%sv7 zqQO`6qkG%@k-Mx}Sdv~yzr3_&-d~JyD+g(6B|k1(u(JDa-`A(a8{Bvq3T?nDMwAtA zcdHo$A#w@;eN=-el;5lV_~S0fH>--pR|uvm9NXdmgZO zB9k$k>=_C2ODT3!*lDR?ilzPkR85N_h5-;PH5;lRLvO0oHmuL=lf$WdWjzuqD* zA)7>4V36~)z}QlG3J$Z1l9JeJW2RKZ#l^|y?V7_0*@q8Ouu0rwq25M%4-&CxXxMk0 zy9Xg&KAH{_Qs^NhB`-iLRefAlc`oJ2)Q>)!6kczC%@wp$n{FC+NIslcUPW}r=iXjA zUlFd5a9N9UU~DXpsE?q2c#aZmV084(QpZ07`9u)l0=HCG@Wa#ui-2zs8=d$o)hkiW z6CeR4oZ1P~NRaAs!AvrGVrRIHw8rp`cZ?ai_}{g##%xP=)$vFodt%^r5`J)G4qWDz!+?YDnapxro7EqHwS z5d(wm=%R5EyW$UEcZBOJJkOkUAkJ(hkuBI+$**|0<@hU~M)+_(@bja?M1btkD8gU{ z>8AJze7M8WO0AIS=g=^SIzgT%eWe5=Fdbah@K>)6nVau(m>!fd`|-7O&<_2^YwO`l z#CzC4fS>>%6j>DLfM-Xd5XHuuXuvwcJcA3ZfBWb%7_VT~iu&OEo+qAxff_xp+@*ux zLr+Rw`qKrV&s%lT4thkxX6lBU9AVhKR 
zM9)}0%*Mtx4y=ZFi=ywN!8FZ1er~!$d|jA=O&^jA3I}WQ*d19g+mXa)zznwJ?Pd9# ziHhnk=;eunDP-p~xC>$4B~kzs<8_OG(o+ww%u-KdpbRv?DRNWId;A;L8dZ&-(wsBC z802$5as*r|5n_i%CbWoYK%H|$P7XD<33t$>yKmX>>7!HI{$PL^?Q5ty(N3UR0)RmS zH#X6EnR!^7JjjR`VYw^9HTt5QRlGh6Hwo9q5H}p`YU1nbX{@Y>uLzts7PZ1Re zmKflk%}_ogLgG&nND^#cY=9Ub62nTxfe8z8KijUnWZtFaG;ahf817GUS~sp3f=>4+ zep@JRdyJo_rJy}x)x&=-qOUB22R2GGPyAAp&t))EkYPe|4d;eExMp|Juv5X=3+yvo zlI+l#zeiT0jiGn>dt@af*`K#}6voJ(FxcY;JVpv>9EcC3{V5rJvXjr#YVxY9mBBs< z0V#`89I5~F@*$8f)s8F^HZEY^Hi#@&P3-2GQbt7V(F~(7uI?v6f zQz|E)dksfs3Yb-IjA=W&ZHL^^MlJq`lY##M?qJnJYRbH`MtRTQz$`kgJ53ZKLjiXY zg60iu}Io;sobM*Hyq?$l79q1h!k6DOy6ruDuMaveo#I=Zna zXZA6d`&i-k_oyFJQ)b%S@x=!Y3x4P`G~QnIK~>_R9$~4i-9jJWO#i7vdoX`RmKYY!qK3&1y8V z=wi8G_dA$MtU@Lyh1rlQ#7vCfvD*Ooq$QG0n~Y2azJ8KCMr^|%KmfTIu3vYtKnoT` zj}m|YbJTYvD%K4WG)2YL@E-Yb_Ez1$0Yk7f+}7ZWUj+ykz8MbJ4Bmb-5OSDKZZ!s$ zL`_Zo8VFigHE|CHE^Zmo2hA*?CuuB`6e8KyEq!sj^^F;Vs%pD6G{w!7r|OR3S&Hn1 z3IJ;i{g)2usmCJRUCgq1ZWy|g)fWQ+5+!B~^ewEz==*r^Y4xc(&rAhhY;XGyRP~Hl z{fNt}b4af<4&>>I6b@Jkw0azwZ2o1_Si|{zp7XPBHjgzD1>aP#H{;Ew*69SC{!E6{ zD=eob16%k6gV_Z_Gz3l3nOZYGsoUS|GvGT;I3cfXM)ZkFgqU$+1n?TN$h@)dv346w zu%jmQal-ARb-dOvsrAX_!1o0{yt_e{CGbD|L*UDtUfVC~gHTw^q{I5tkU~nX6F4Bz z7cs)Ew#}OUdyMMaK5@^wrSop#pDvixXuzwOKVN;K$$p<-5p8r;!ql3_Mc;eaj4Z1& zG95Z&MpK@62a1gJM)FS7MTxL*KMyh5#tOG9?OhJjRxq%f_ra!J$yyr_} zzEF6ZeTjPOa`B7(NV zDh8Z;tlr^KQ6HftQ8dyZAx*$K;=mgwo(6#Ao|HR=)DLP_f_&c!%qJJ8k+pX3o;{wR z7mY5<2V@E_MGxU}kfl}<`T*GX9Y5J=Ek%HA!L<){4Dr#<8x^=(BV;)B6t1%17XgE6 z0_Vb=AHjL=!K|;(l<$kORi7z=wtR1jO&`C7n6e`!b+RZSZpb*)s6x=tWhoH=!<7K{ zEIAHoSOgtscT`iT7E7CYoTyix+hvR@Z5Ba8f(0pd!5tB0rr-1vULdfX%x(-kU`N@t zMA_$itGn@!OPR_IKa4d#Ve74t>598nEo8A5drKp)QG8#_&&tUH6PdV!RG5sbRyFaTOBvF|o7pYz>$nYSSRaVFKWy%{qtA}rtXk;h&$HWD^9*l0bjoIU zv&1s9W~=|<*v%`a*ZAyZcKdCvd4klDXsN5?S2tDR@_;BL2~jqxOrkRqpXri!=ZCk9 zqLR`Ugdsr2P0TuX&}Iq~nLXnJ+8rOJrW)t?P|Bu#570vrU#lUfPVRk1#8N3o#>6~= z`XWv3d$5NX4~%V%+fLym#CMtuG`&|mefm%6wDFE`R5~kQ@Rb8-tA)ys(IC1^xV>J) z_2KK4gVilu@liT~6OU+xe0VGQum4Ewt6)n+@I<^Gm|Q*-7Mwgj1)*q_5wT1uLk!x3 
z!bW<}w7yh1$&bbxR$x(yA&Ux{CrqE1!ZHyKLcmEw&089X>ttf$!Q*^l3S2K%y2tAK zm#FJ?;K7xH#0v3_BvY9v0S@-6EB`8QRI+#wDIZ&ii7#w@KDQ5s#hfe?^bWx;D(lX@ zmhX@w_<_V=1qX;N9UYb%H2$ga_*Y*ql)L9-C-sjUaj}EY$30FJqf27(qA?x$XV7GK zjG3s}nC(S%g~s27u1MZ)<;nBvt;Nw*3_X#)k$DJEy9Q?PWWlTcy{m6pzJIkBMfVn* zPxxObU3PYZqtNTuudg^fJ`-69oSF})#0#a!SifR$-rK<;OH$9KUA*{{I%1ZKQZaaf zP-?BFW~1Mmod7o78TOcG9K?->tS*`0K39>R80_Vd_J)@#Ec@zy*u@xz;k-tc4k_Rh zyEDtyT4FFPSY%1M$_V_2|~ zP3W2g;du^7>~^Sj$k(b{2H{ZbL}PR9jA+K>@|6y0th#8`qE%lsMA>aTzW1x+#{1IA z^s)Z{MzvvADtL91LYBjfetm??)0oCc_SL1)wt)_)tDq-*zcXu^8n{2$gt*1Dpu^9S z6RY5ln_I;lX2qE5!T~!7h;0xn?@{RLz+9lqFI<1<;nPI&HucNMJk@cgN7J&-8qpVL z38@gBzLQH6kzArlWX;Mpemng9<=q+QBm37mOtCG{d=+#~yW?%ot6);%AZwrKado(=|_{42y~mwLS3HC}JYt`Bmpo=V0b zE#6wf{KQJW-mlj&He?jOw8Fs0M+kn}lxMs4u5nOL`w*)m(-~;LzQ%LaS03KyWx6L* zEO&Y=2)ws)I0JAQM zZ>x=?nUM2+G1+z$dCWakzFQeNb*BdN+^Zu*4>ULT&?R^<{HUxvVXflos^Iy6isHXu z%D*pSS^gQicLyBOYK#-^W!OINZ)9(NxMh=7+mS_^w+2tVyDl^_-nwi0=R>Z5dfi(n zO<6ox9p!pAbASKv;fmo_jJh!EGNaZOWZEe&eCRM-_V3hRxPgH1O4$yuP@blXwuOc^ zPft&{6y;06xt2?J{Mtr`$z_#WyZq(Zx5EsTvK#y!P$k6fJeHF3n%Qlj=PM&cOU#jtqP2J9$;VIb z4nWC_)t%`{@uXL*tXeD zF17_<{93s3N6N02Jx>3510To0^(9TnGuX>X-BDK7#K^X?c;K3v)znprghZ0zK!44; zhG^C7u;a`WiLR|dO>kOrM<%oF*dYUf@x-fijTXI4wK1&mn>KB0OrUE|vv{7<^xlO; zgEyFOph!Rw-Zm5>7X6IbsJk$Tk_jVM@3ufgUo9e#i^X% zu_M~FIax+`8+`j!)a>e~=g(94-QSZACfTAb3)1o5fAb58{A(8SAHHu{BpF8nx^CTi zeX%|Ytry-{>oET_j|4Oo?=ON>Og4MN`Z?xrL8>VPBYn|$k$MQmUFW+iA`H$6+I9q3 zl15qJmaDx1$Of?1lLB^gv*0P@qHuHTO0>@^ok=+}ufCbA-@?)N2(xFjQA0=e?d}uq zBPwOIO2&Ap>I)6o(T)?*B4lNt=o*c%-p6=UI=K0PzIL5?oqV9^T}Txh_KYaL5x3N@ zCle>ih8QQ9wNr!ip8nkw%-K9d=?*(SdG&9ElR3d^gd5Sr-SB>Tx+s7)KzCZmMt1*q zj*JXwHK5QC@1PtX*ME>&61PI#it#j7(`yE3uUW8j(!v!Pn3^{DJttx#jDo1Nu)?~} z*i8@S5w~t)iU@~(Fey%_16TgvBmBh6Ez#7Q+Qv;Fq>ZoxEKek+Qt$v*P8W}18JnE^ zik^=(wy3B`itHdhNuWW4f3kvvkD=-WdD5cxTq09_CsvN4ZO4>*C<(O=SM9+3E0i zC50}~4H(TodBRQdAFfPo+fm56W>sijy^om1=GdO&669Hrze>WUA%PBCMtocsIT5{z z3G7n=c+&TInD=_5O{ROj0ikCrYI;;B`3F7c)7Em2)8}Zu7j~Eyg7XtG{&yCirotm8 
zsjbb~m(sze%g;n?ydGFUvV0sjh2KUhk&mkq*&aVs|FBjld)>h#@TQcYdoN@GD$FMDraJ+_Pss zd5LUzwdjb6a>0s)mDLS|9OR!BcO^`X^~<2Stg8|0+g=#Z-J>ImtmFJHW03hk{qavE z+z*7oKNhnWhOZP*mWkTgHyuFfi30jOS`1Bk&Ym)E7wIP^{lwhS1rG|NXkFj;yfc>LJ06}cYl9-wfw_$X^!SFxX(a|na#9V_ll7#M*2 z0icGUF~;d+X=ZY5k&i^!^#{-qcEh!RfD_}OvmsY&u;ehq{uxV+yYQ0-4@Ki0S_qcf zn30=$Fz2HX=lsJy5ZUg>BFwm9gEPoNq6oWi?||PCs7N<(jqD(G5lk-@>_+4G`kMi0 zG}%alWUZ}@*SiVOjY0wE65g?;pbLfZ%KwHeYQ$s-LHDWVF@?h_(QV+CR#aEtfd~fu5^Y}~qh}>Us13(8SgzTVGu3f*r zAZYgI)|@Gpg@)Q>Us`?N*>^u=F<+BYQbJh+3d^@R9cN;dp?5F|qMs<01X-e!zp&>h zsB?r^hh=+y8|VhP5w36d7Od3B0*PSxeY%V`_Ic_@*$aO)@~yubdA@ej5%t$GA3-SM zgYzQ^fXNwAcq87|JuF)Q9m@lpK3I0ZQ4NHp&t17oey#Ry>s~d^CU=4Ii(Ba=qA-F> z&l*jFV5rOgl-53U2?bBZ9c1aDKBUH_uFhl$Q3^(d29yUu8L8imjFR6J3-Um60%r}d zz^jk9LP!X<>Dskx-)YD3C&L^hI3xUzpPR{)I+lH$!IE;k#!H z*LeDo*dnA!5DQNa;o5l(uEW+EkrWd^TULlEXJlq}{pH;u+G4;?o`Ol%i2k;MyGub} zI(ho^EiwB^p&c<>66XmC^>)X<^PZibfd8EO@BL+)^uFPxA9Ye0U4}i{F?_(E)S~(! z%l9^8SskqrkM9bdF=vWPYq9=_BL-_oCuw?}=c(Lsiqpe0DfF=+vTm>5TaUl0ew1re z;trZ&R(MK+dBKK5Bl~jrsAfQ;8-oWS%DnA*`yAcjccnK z6-M(Wg3h-wv<8fZW!h=msh&R*mQvuv zeQePtMS0hA_40H1YdhPNW|$ivu}R#r6tYR`bT7|+>u%>A^3AL{Xxkp&XR5<|)#^?f z$Z%&e(ydf!^IN{Esoz8HkqOHyq2-kElxY{a5o#y##sllyA1ycd)OhFCv%nH-87=F) z#hs5UWqxHe91q~ZF_(hzq0oC1M+vl z!zrrYqkkZbKL{j}wRn-{NKM%@8*{^|gBNt;@WP41;}j~61F9dv*;!xfd1z44Rv>x% zLhZDIHkkXi+!YDo#ff<@$IbkiJl*bz`%5eJ%TrMtm^5Xc-VSTHS*qk#y2E*sV~K*D zRikWOkE6#aG1;I6~W3**(o-UE}CLTR!quKXmGqu3(LQJ+rH$`pV#FCQzc9B!U7uD_{7@>3Lt1 z5tHp%Oz}~oiPE0wee>Y;*gPv_U)7=zdkNKBHnd!M1H7{~KQ9`j4sgqE4^1e`8BhH1 zq+3)pGKjwZ%gV23;+Np)MH%;(h4Ra8D+=pQT0FE#;~;&EucG-vL_$Rcx5h1p($h@k>7@xvb3 zH#d#ZP}r7s->wYU(0X!0VY_(1GS{ zMigJgEW|=Q70y9Q%IS_xy}n5hwM!=)%gsJNvuiXg?4$vov8-is%SN*X#v?0imvS>6 zZVDaz@HBnSIW#Lq(SOTl-nDYV>DO@sRT(dd+7-L%*nPH}mb12O)A5-^)86W{bx?Vx zFg%{=n&U}^X$3)+Qn3eom2wfO3sQdlZna6)Rg?iG+f5~0l-v7q%wH}WTz;q0Q0hU- zB7wMl+uX`#6rRGsLi(1n(01*JggM#n_Oj10?h8l$n;58nL0@1Y&CZy5A(_(uS0d$q z`+h4&LcDJ2jp&7VJibzaHYs9!20iR9RQII)qvsPvnt9v^!t9eDv({SP3F 
zfd5W2T#{fC>r|?1CR+}c_OZvisL-`R2eSyu4Emezn-s-h(IbfsDtf4+IEaC9wt ziV8PloLJy24V#%9a8zRawqVYIk|%0!5a^zt{R9~V{Gv;F#))ezFE1VZg6>r6ll5&E8d6z+Tmmw$HjWFUT?_6Y)E>=XmPlbfe+o5FI2oKC6Cf4-h; zdi43=F93^PQD%h3CLg zltyGokmZmOUa7BPsPmii$d?zrp9;F2Hy1vwj=(ep1eUIn=h^hiqlDvy-Wm<8g+W?R zOd43uc%vaSvIUK^qk>c%kiL+X4 z-%m9Ghnyrij8|vP-UhTqLV1zyw_PRC;4aV@)2DKzqH612=Ls`v6!Z@w&qXWk$8b72 z9cZX3Kly94(7Ak{ALvZ@^Lt7DjbHw^@Bagy`GWbe z(?4u%W8>y+Y(=@ao)3;?IW#vk3{-T*%uF^8ozdfWl3h$GSw#EcRG6vgmignVbzSbX z?dZnGo=&#n)2jeFH|5hXP$+nL=^*hZ6_=LLY^Zv(SsV5lj&|uf+6tzl+rMPY3Kv-X zNERI&7?Ju}m$H;?H#MZ#uvD~am2N8vpiBF-h~gq{Tct3H`9l*f_fo%IR0!Fe8(#P9HuW8#nJ=p5@N^+;;Sw?%_jPkx{o~gAcYhWQdx~{{nv} zO3gD4iiD4b$6GzU*FL_QnFZ$)G#oOmO#${F{(NbAkN(=I(9q;$yZyp5<6Jjn+|7&CheP8G;S+R<`X_ zzt{b6D-{Jexn`iG%d_4*b`g!Rp`W*$V|P}FiM>`<7HQYp!6$uCnwD~Hp3pB@5we9t zc?Tkbd-d}wc}DwgXF59Gl7D&&V-ZGg|NGw|)oE+>$Q~${cQ>;LX1oKQjHt!*js)M3 z$8>bDDFnYnN?FM=mwMfF(XOTRw@DQ7!(|R*^~$^f%BR<2sN?DT=*}=SpaNVp*FiOX z)|~zSu=Xb4RIhE@_>xd5LuJTVnn=-T%2XLjgOq5?P?}_(NkpkgMT62HM52V0dCXiC z$vn>)R_1yB&KvFLd7tlj{_pX=|8F0TeQcYxto2*>egCfOyw2gwXvao%f9b_hTCaxt z_B2+-Z0#KKEPt1hd^+-NquQ*Tz_4@0{mvTgio+W)u=rt1)hO|mbiKeVqb{sttVCau zn!2_1SBB|MG6|x%j7{Rh_`P|fhUsxGO5~;kg1dHY)r)ue%)sa3bA+ML zeUUG?i^}#A(wUhSja_jx6q?$^^fV()`#95O{$eH+N-*q285!9n%FNWeYZ;>|8>(Zs zI9A8FnkCu(K#lHLF5=x&b(7tdVQqh~Ps8PM9SS8TWTRtsoSR;Ggy6}j>vKN3?P9^B zcJe%;KMWdJ+363eJ!me_m^skZN@sGECCl(;OmuX?&1Sw=uUwR9jV$_X24+VXfg7ae z^%sVM|NC8?4e|u^s2NX44x~V!)=rA9&Ba?ldT*ws$wA*!kvO}WTW;v3lV89wKPUDG zy{#($Z`HG1he{3KYbPwHccp=K9_TTvU~hlw#q4EXt#TF~H4pv$MPYmEPT*W`q2pP` zVh2$qti1=I7v4c&H+w=Enh)9+ev%JXZuaAkpbO4xDPMA+ix2eyBm*a(J{;I=#H_-& z5XxR)p+%5bQCtxJiSpq7{re632Owx9`ZIfA=@PZG*Ud0vr8Au{@(F-lS3vk#JWPzh zQRT$M>VzYS7vtZvMzhmWiv@vqRWL%u9Xq58WnS4hlWY3t`73oxRsNqqudaE_s}hm6 z_@^axF^CNR2@Rq<^>FrbphQlwxcKeqHMsmSU@&vb9ezR0JL_>!`j8X1~TJuj~<>M^Id z95xNmeLPrMDWaKgeK1_Rl zlq;B7y4IPqCt7Z(bwzXak}#>1d)-?S1>~tcgH^Tb>l@R}+&0Y$SIj?{v1|D1((eBK zv}O|!P7BinBF(|o=?T=@3E&B37`S4D}Lt=$w(zwV~o-CXOnNA*Rr 
zqt}YU5zK5iqto>(YvMZVM`H(x8(&e9qMfgppG`~9y6%tUps+AMb}OcOzfM@bqvx^;+I{2h zCu&s5;ITu7L$68e4^W4`G|cJ~lfb?^)l>-HKZ)J$$M@rJo1ScWz%E z7Sh(D{VDW8Ml3s|xx~d-eX?=qM+fI66v$Sjer}c?;=3Dbc-K^N>6oi$Q|5$56Iy5A zezk`X*0ct_)}@^0*lo*x0m#8NZQYfT8gCne`Wkos@wi*T;vCMO`a6YAXp37z@nqN2 z(SaRLBm7SY*J>&z6?IeVPEJxSAqlcSB)))Q{I^0Rd#yG^HoN=Pv_vUrX-~6J&ERmC zGK;Q-Mr~N#MO!6MvV&9dmR?_Zkm%Xso2qL1Lg9}^AlgEE-T2p(al5VU=ZHWVBD%T# z<$*}B+TKgDC(^duXYN~KX#P*?l_PxugLmHSU9@myyx!HTX@&+URFF+3P>T1t4x^VG zEi&%AZhFDm_kA}l1y3z>q&ciF{k4W_;7ey7zNHELlwSFxqVKztIof~A>v z`ZuiEdb=@0i&0iRbMweR{mwCtpe!{M@Xgw932vKs|35+NRB1)l`RSeg3X+kfBCmA0JRbBM^$Zo^%A#?YjeUd7< zLD9T1D)$!on^$C&U6@oE&JW7#-tI)Yc-AP&oK!I}uf%r&c|HMsnmGhw^ZYq_avq6b zoVNvHpBEA>m?Pzw>Q)Jx1VD%_ zJ z1ZV!#NAPH6Wrf7%SF-j^bm33nyP+%I+&3`r2%-kUjcd6lUUwhab^GH?Yo%&*5!&wV zPR#foO2R{7`;}p1OaOVc+{CF`DRpH^7u_DlG`6*EgrfjS@4_T)Ip&+7gmLKY#ujd- zUwU=novO!(86@2^+K1ran=5`Q60-^-4?*@L^v);-%wds@=Gr)CEcD+AoDtjC8=Zxwlyj#6%lL2?qiZ?^f)b4MydEhlSkwNrT;krY zPt5u-SSNY}QtA)ed^XUC-@8F#2iC`hC8`{#F@R*;uUFW*Y+tx@d^BmufzH z7*hn*3Unla(a|d@+Em(hzA48YC`mZW!}@fNz_-`zdH^h{^OOVc|2MD$QHI zu-@gDs|(tbos&Zz3?Rgmuy%VfIQpEQf6zh?gV%HbD7CREV3Q(E@l{hF;XV+ENz|bp zDqIYHmr~^YWu1uT2Qyo6&&e~Pl(5N1!@HYk!O5*%DToy=>10@lJRs>jwVyuAVd|d(S7oEM!gHH1nO*g60YxEBt{;MXEiEl+U2cpLn@T;jCK0=`{mUJi?}-yt zH*vHFUiT52c{r4qmb-lOm(#IF&fDyE$X&3wwN`t5x&?BsaT5CB4*6zZ{=lq#+#g1k*hKoM+Gt(mr5CpUL93DeB z4mds@K7PC&lP?UwYGm_u;IjrkzYi`0Br1=dJYj#f%jk)+q{{g9jKU}4Vqz+`wy~tR zCB7(NY7)?@$T6tMzNoNNc$L?~v+vfCiM#+|NI>*TtQP?hH9RSROif{os!JTc#n&Gten1Rn;9)>L6pRWfFLKr8&Cir+6*B9O?(Bc-aqP}|= z13H1W)eM{rR3MO`+JZ6@QSHRc#Po!y|DjZD0U5U+h=>aEBH;+44ZVf5gOh1bN_i!^0r=;G6?b2VP}bEj!k-VippMdwe$}WlZh4)fpVU$8mU_$MWXA8im8Wi+#09vmyIxIsP87 zdPNYRx@)4d@4rZA#q^eK38SM+(NG_S9<8W!ZU?0 z&tS3L4P)~ttXEc4Oh;1zP}0uCN`i+-U#PK&s6#*q#=OnLV0Mh?WB$zd@(H1J=w9jC zMPm1GiPyl~{U(qu!g76Cr+Ecb4Nu78fL%`x$E9V1@4MA;)SQr=aX^nEWs`=2X(_g) z0yxoBfv+Gg0KH3!vBRAtt{SHzz^oxbj#!nRoAiR1)$^TC7jgLb?>) zo}#DVs@wr1C*3qoPs|L;BGWh{BO{$$8PJ-F?5kez2TQie0RJSe8s@M%YQhDUzW6&m 
z>S%v*``s9ewW$%hMrkFf9Bgc_aL{IYrPW-y_HcHsy0>c`({>?IZxSO3y^KJh6H+Su zL=TCym`g~Hzu!)Yj*V>s44M;tshC9cHa$@0O9%Z9 zQBGi1PvjIqSRq3Uzp=jW=JH)xiXiAsbys?1CP$mSz7!Uk75hCjYr7;$F5Kj#z^Rbp zj=r05Dmle<+fL+lSnGoeLmZ%pf~^S*I(xjU-iotcdlXhiE(_m-kx)7unw0Y3VI=SeqIT76+e!bQDUZ9<)ublK2;-(^=o%G zS@zJWl)d^nA}SC_{YU}ZX(bO61Jo?!4jLaj(ovi{biy6H7tFWeKVdnc_ZS3gM5ftG z$Adu>1c(y%aZXuB3BN1dlI2b+6NV-xCIe}uPUC~=Kmk@FpOD`<3;#uQKtzH^qH_Qy zfDy4LZihS$ZF~6iM4z51B*lrWvvXYX3)Koq&`zkQ}XXA~8YY^g}wbi)&Jll>9G+!#fAV*qcxfwmj z)d@{yubU6s6feHN$K7W$g{|0Ss0$Qu8dIHT2#D@Gv0vq4#J-T?+hoXsn;_9H z<7fku=mEUF`povd*GuCAmGDb`{yx-(2y(zdDy?-#CAYY4Qt-Q^Szl8&?3_B-A}i;S zeE;0~*TL4j4iAg@Q10O7c^1HT#0-txOp#D9oXG|Pca-Ja8&9mNG-`IJvV+-){go?8 zvIUo8JkG7B9{oDO!CDmaW^~rlrZUC|DI&P0v-A?3#FDP2-7tK;3>n7jm>6RA1uMR< z2SroXA%Jin$6F#N6%a5HS*l*8i8&jX8MiYV2|`V#g#=AE>Ux0qVi<(Ujo=eez0t2!NxQVc^zJhj>-C<3#Xds43bdtP2cMpMdhLvKO z%WLQEJ0W^Cq4fz!hT8SEufe~;=RgczahHK=2U3+UBqGE?y1Ra=+s+bsPUSop&Xjd6 za9p3O^VQz_o7Y3%0SipD$`#-9E?!R!`Bc^MKD8kC>9(_^p2Ew@2D)YxnewYB>gf2N zbI!psIbu*?&?u${(Xse*J+Z2|%cX)<$fR?lZdK%BmAW+2FmSoGG2>?Th)>2rTa4(9 z;i#6CpHKCmmw<>Lx8I)_gbH^GOG|4*5gajo!w||750B+3y3YFsPiSjz7EK)uq}<0V z*0uH9Z6vMYdFje4jj;9jeKZpa;Tn2FHlg?)lu?p&-KfvS;i%ZQC9cM@(n7C;yt|*0 zgMEFlXYv=B+PexdMflM{H_w|J3yTX*a*H=pH;qhHr78iS$SGwi9oy96FoYpv2F&(A ziEB%E7e=zTdeUZOb+3+{=Q|!K5-2RkFgCtVVKBc~*UJ^|iuS*rU1S0K=1k@-onpqF zw{yMS3-@$`D{T|CKtbULl-+QLZc3f#rvA8@_xc;mq$0kShxI>Q-e>c80%o)n^IGSH zb1oI8e1=En#@>J0(D~+4;ZO5PH}6(4-+ca~H>EsYg3-mMdpS-6im^0Ac44Q!a_984 zPq%i6tKGSlZ2p2G-*=HgN9lT@Fa}rLam^_ za{2U-DCIMaYhz8UpA73KYoW}U4mtH_bG)LKGRkaFy#=AiR}MXeuD?es;+!TI#W}}94&7uljLR;`s~cF@9*PXRVzb|7R^&_=BE-mZIfx$6}Kxvu93@)kVqQ z!&W19J)RABm%5jF^zdC?IDCuV_pel*+h}>#?I=Fs;+ywfapX)U%S73!AGdA=>UV3a zM{3bMR-(0#uy2_K@J_l=N-BMAa~7xwWx zHaPg^Rdee)$Y2&>{`|YCNfov8D4ILU2a*&dAkcY??h74<%atkX*yg>56%_@qHqhKq zeRlo+w?hjro(q`x@k;R7R+^m1G}~6m!y7nJgw@(kovPA(s2~0|q1~1|g+Sr~MwXDT z))DQC7+tkOT#NOu)xWfh{W@^lQ^`n?Zc{(fw=-_K>!fg98>w+^*mm~x4c`?aCbc^$wl~ zk$+DUtAFr+aEk#fYL%d(wupFu1tjGM5CE<-sv*WyK+^s6u*g-@Rd{ 
zuyE5re*+sIQ%|qC?=)2pV-;HZ&uPxi?G7Wi%=~33F1Nx@SvV94ln-%#S5#ziqa7c$ zZ=V>xnQEqIuy0=xtmtww_AU$#S=)J&qroYE3x+Cj;)8MZOli|oYAO13UR(t70ld=J z;}XvY2}hVRbA_RiPQCGv6%;N8uC8lrcmH0t%qPXXMtJkm9|or*!;;QjnSIG6it~Fg z(DRtplG%0PKEFHvY~FwD&bc4|3v#4S6L=EbUAUo`bsj*E(v&v~Jtgujxt}&O)$`@+ z*Mv(OH}BX%(z`jKYS5fz?iGpZxYYcHf7)Fbi8p~I>brDyAuwN(r1OI=GaYOOc8fsiJn=m0;F6h>%8YPe5?MOn5IYft1zua=YJ*9yN9j;Pyv#LN`n1f&rl zt6nPBgl#D1(x*RF!p_76%>ufC+-BclcpESf|4~InIv@a;Pm`>xIA3BB11cr4Pi*X* zeKVaDHLm~Ia%xM0J96yQDPa`LXgSW|0N0PvBB5}PcVH?{JjT&Zqnjsbb*R0zLp7&) z1bw1{!a9&u0Vxym#Er!(YI1_sDIWmf^aYkEq)o;ZStl%f5H~OeMv{{EftMnHZJDRnlOsMV5T7?c-oXk?fPtsF0^w(7*gAYu%U zPyB(OALdI0A%g`6+zGt_yQx1$jdZ9~A(nrTe zrd!&VkA$p%o2k5|<#xCUl2Q`H1TCbGl= zkNhfHb^lN0>xLdC6j+;;)eeADRi@LZMElS>oWWRZwCRrMeSd#1%*(F$T1ozM0WLOO zv6^f;wYGuwrC$JVJh*?~1rQ9brF=*+iS7n2=}~Pi^^*8}5_SphtLf$C9I(^}frTU; z2%Euk7#e5Wm)jOIGc%ZXH*5Q~2%Z6}9M-ycAH+H_{W_hymsgsBDLPfmLJA9;cg#kK z;6^I_0gUi@BVr{)Eda5EL}D?Z4YiQS z6>_;)S+Cu?MGww&#O$^Yq`sX8toyy69|z)sfBu_VA!{C}{`)k#uXjj1QeK)%lHFoD zN?L37ImF|s({79OX1?#0W*d+=679VMAY-08_>Re8@|GuVOO(*v|tR?ZM)!LGFFU6Qi;$T;&eX{+p&C z_Zrss?5@##ueWrNOAjC3-{fAe;ra9`D}Kg1{aqc2wGUEfS9zj9jh|_ReNfx7$2ZZ)`bHy|QgGElEuO zq!)v>lk?FbyK~*n-%O;tVpasDv6-!(W}|f-UyF&}a7FFVOvcoV6q%M2`@eu&klecq*Xcd7Ft zFC|aKlo*;~uW$5}IzKH9{M zI{ga@KcrAM?Ffz%8jyNm2{NR0{iYjdn^yVd!qR#XV($k zN{(3{K3N0CoFSf@UhiU?`5z={~Zstz(wlucmEc{l$CzSo4VeaJ%!F;tsE)W<~~& zIIs#ferL0PW6vZYmqdFrMlVA(D7PnQyG$OYdb_vC>8$4OTv5|Fb*$5J{!_Q{5Zx$>9>x^a2wU%pIZtbe=&Gux!MoD25oH}DFoOf7u;tJuIsYg@;Q zkd-|?$9$-BPJCg<9qC(MCoH&JKedXM4}x*1kUdir`n?8dkGL-Wh)GV&rR0mmm+{}86^3b*n3zIbTywc1O{ zSiwvd)`H6DyJF$A$=bMr2`N^0^}b%ctX`?}W_CO$@Ta5EeD?e}-8YTff4Bhu%d^W! 
z1Y}-d%R2o5)~!xN`5gG-1^wq5s>NwQro_7r&ZLPodphz+uce?ME{+M->^U-j{;6?@ z%Ey}3d*b%h{L-%OT9JVQ#z%zf3w4j?zZ(Umo^?I`f0a*VHZy2aiF1BjE-@9s7-$XU zH2efk#+mX$#9l?8WBqVW=#Ecu7P?(`$4si#`Jo%4zU=H{8-MfdZYn6yxl$f2r*9arNg3?`K zv9TJEc)<<#3lOnv!x$B0APKht(=d&r`2vj|F}R=>VI$jVZH=#20oF^(a;&Z1P|+r%G@>mJb~u7X`0DdHZf$_l|PjIV0~ zKKj{E{(%3gproV%tCOAL;`x|X=|~b6QU+fp6E{EV_W@iG5H5B zE-nX$)8>l1H%(5OTu`i%0tmk7n$bNV8{fygHL;c0l>p?>iF5l0*k9@MPcQ7#UTR0j zLxkTZ)S4O_TRWY9{|^lP$tSxWU7&|K&Bu43_OnUSIdzKKS6(b8CWiVW(^!m_fT0d?KKEYnx3Ba7}0tAk6M$rZUmNXBr+G{ADG72v|`Y- zC$c=rlsIO=(r1 zl@HH>!q(#r_`9{!x%?7#d&U6zN*itj)1LSO62GJ>+v&{*E((i^DyyjEvBki)%5{9) z0c+?i=Am7`_+=bcddUoPmc9SW70}WGIC55`Ij7}HlCwcU^WE{ZTjlyu#My&jmT*DB z#fE49{l|}Q!UP0pPz{6ctEOsy5hqbV)dgDrik|m#I`e!Ok$CPpmXui z9L6f)vvKa#m7k|sK!LnA1NuG-kgK5{Gp_td=RB6L7okE!X88!B3VQVlsmR#XcRyqL zFKyZeM*u|)4R%avj~+ix3-kJW+~T^A7uUxaagYHDq@h6(5o9!sT=5&}r+Y(R!sVNe z3>Z+Hqo_u~+lS9I5GH{UzL;b>0!qgT_Tl8_lTd;}UVRN4!t2kGL+VoW~wnlO3dHQg6f1n^@4dRoB!<>&fc|7#6=|NpxNo&_j$%I6#HqQbY5 zbhDq_zZSKBTJS%2n}4Z^|NQu0JM@SEirU+Np9c7<5!XA%4OWnry|UsSaP2w@Y@~pi z=}z{^#f1Gd0NzFkSiJOx-MQCPG1Abm)}B+o~ZMf-b%vBjJRllqU7^qS3@HLHk zGYcp@TR96%sFf?W9rVeNldL>(GTKk-W}c>QdL{4f0yaJenWY22)|~r&5`WVD`>n)f z+();W@%gedQdWTy5n>=}_Q*Kn? 
zloIF?Bb}WA=&bI4b};~~@t}lNzzTt>%yWr1%Tr9uw^9Ov<2gh6`x^|5rR|<%n#$QZ zk9038_~LRhe6;(4@@X@J&(B)(AJ{ig_HN&yL-*xvjkYzv*wQGo6gm?F!A-vk6o=lO zIC^O(i<_&m=j=YGTXZzVay?!&i?rps>wFH2F53N;Fa3CatB*H@Vx4luUsynaWhR{= zBKK`_N@H+2Y(p4BDu0)~0ltK+Wm?J|5>y}%D00F@+ngr8T=n8^krHSfn;7^A9JT1{ zHwAMg`b1`4kB5s_5a>(OulU?`itJ z#8yV5tC{wqk+9bivLe(pMdiPpKX0zJ`RK&Bz%(4m9QxNK%8Z_Vk&NegRYpA{BEk>b`g1qL~HOIc*-# zPnju3D&9Fg*(qFNQu8Hfj(X|(lkzldotoCEGcr!V{OTC};a;e$5U^d8S+}vVmx(#I z15{WF%huDBN$9F0?94`oJ7_5e6^U#i99t}P(u%az7g|UJczL-2MI&IE*v^&rP>gKe ztZ>vpDp1#+rY^2#0p;kGEBl-$x_YVg{H4Mgy`kj&vepKyoMCdO(r_(HhCy3+Y}wFv z4Y`ovzH4F|v%rGQ@)5BJ z2GNJ&U%ccs-#^q_@fowrQzRgzj)9Ydg8_-8`wkmIbOee@wxz-XM(QTb!n3oa&Zpm7*HO5z|agnS) z7@p)~PP=~l_G5(60*(rtHMJ#SUg87i$yM9*ZxAjurq76{$pgluBl#a;yIHFd_<#I+ zj>ixi56|Yf`m0HZX!zRKoZAWn<#9{PxmB``@x&^&ZUiX|xNUo)>{C%txIxU^zF&9* zDU7Bkh~4nKNkB=as8EwLYd|qu+^zOtBt#6%*Zi)IUn&>7r?eK1#t{pM^6L5X=jZag zg-pG+2nybZKi5y2D*%6pi>y}o@JFXkYs6gOZ`;3^@E*32MTNWpw;nz=-mYxa>*oM4 z`S6G{vizwbn=jhj{8_vX94{1G=@|a%RAW|6d?pf2wsNwtun?T%SX3$~4}~C| zTjn+J^5VKM7`K>83JhGZ;n8A%W~P$(7pPtqwIZUTb~@%)C4m4^8oIh_f5`?r**rm9 ziuBCn03j7dT^TXuk+OD|6WdSjtE629ZHbs%6GKWEFk|>eOr_(lBiZRcl}YjaaAos? ziwwAGYao$=`vG16EDS8D#V7*+9s|zyrM}M@^O6?oqB_zvcXgrL@;~Uud&vv^7ZBcn zV{chFIo;_P*%1di9hiVwG-N(+XK1|9qXZ{GBv>nH0=@CQF)h%S30WBxDG4ec*M(q{5EoC zQ4Jrj(S%B|q`W-;rtgsu$BC{uKE2J4{QN{>Vq#zxfHyu%tR!AGB*oX#wyTQ#Ez_yW zE?Y8p7nJ~*J;!&Zs7(zrdx0nk%HFFtZ+<YRVt~G}wtVNFH4s0=Euo&c17Ao6tW;<)Z)fQJMDCRQHkS&z^Ip z=H*k&tf|v4ZrX49W#MZeq-=a7a zapX|SE=CaeCiS?}}*- zE7}uQ7MOSJ9tixC21Z%0-UK1WU7WAaAdS*+Tldiq1EY*5?@abDOJ#hJxM;KL$s0Fx zMufWOp`b^eIROpV2uzorzFVi-wkBpzOQTff{mIycmtkwx6x?rl#TP^VzT9fR<(K1+ zh%uie8^4PHQRGqhEPuY5eeb~j6!;<@)!zzPOi2(H{0mQGarYf7e~g{m#NTT=@tD2! 
zviYyG#dbB^Mn`)nj^;c~^ODu716fe`9w54G{RERb7I$q7;Kz?lT+{ zsv-Aogw1mJ+YQ-P$1!BB{kD5ya86I(@1mKEH^uy~>dJm>5XCAu@?5z0P#=%NRa=H7 zMrk+vmAvYe&pUV{cEeWdYNEVs$;SB1iQtmf@!8GS*|}2h#?_o-wAUImr{m6??Vq_> zRCm4aK@gBgnO2?AOM|0jCq3BR7JsdsprPQoSGHl1KeeRt|0tJZ8gZ+mrNoPmnkLtp zEoR!qm)+mLpd#x{C_+{2j&ECdJ?r!Cqu%3-H+>H&>WtR^DXuD1CmJkL&O!E|nr6u8 z(k}Z+Yg6)z$HaZ-(bfVQ@_CG3Z`qVvb>VoWKv_rNN&F8*f%cW0E>fy>+sG2XC zUTe8@X{s7_<3Etbfp$B2ihRiMo&ZgXp%zu-#fr`q3zCXGv4VM;+K+nv>=QI08TA<( zdBmb(R;4|h>JR;xbZ9QB=uh7jvHgE{UIo7JDSOm{q5vH{`c&%F2VKp&5wxMeJrYbc zPn>ua74@Kyk9-0q03^(xu0;y&68yQRn zG8I`a36;om;KMZUyvmqvv4V(5P{ZGm8RU)WUr3gWOeOZ4U& z0Bcnr12T#WJXg-l=Ph86M6xWQ;wPC7;^O5)u$dY^4Q4d5<6KG&6)#;9g1=j5SvO{T z#CeJ2eGzK4v2hkM+K@Ki+^4Yg|1b7e6;xz!|HTVpm<2f+35I|ZThE{*34z250RN^Y z_Tss74UpAeL&&55C9?5a1XP0J<4+nI8Lbx=*TlW+E}x7R@IZL+#an_<)NIS`0zUK! z)Qk&V03*VP9;Vl`25t~914IE}u~u$+G>EzlDBkvq1w4dV4BrEiOadVR=F+Ck{}zcn zC|w!({}v8hzwJLkn;$c+;8(VO_wJ+1Sd$1=-~%4>vcwf4o&rxbFE+t_Qk zpn6R(nSjSz3#P_Hp%l=yn&3A_OTUzb^+C01(qUXAcXt%D+8<}|Twef=*5(WVY)NQ! z9Bp(X;SgRM&itbi^%$OlL|{7%1>$@m_df_`J9QZDXIUNADtq^4te-gcuvqt>z)F)V zFnJ^7Szf=)gKVqkiOcD`+<^3<2ssD8WXFepCNSEKeaat(yKAYoNwKl)SdIgs_i}QS z01)AeB9BCLOzA1{vg1L=-flH45!|eo%0se30VIJL#Y}wJV!?badDF$1wq4`)e?Sqz z5uI-I0PKhYE_diD9FWddemoClBtNd-&E4JcL5Jm3EUREuFkdG}SHdDJ>xh3QdWo2bzUYGyvD7r}OGN!DW z(s0y@`2JM7c4XjY@&91JU1cR(-e?|gVkg045^<{IuB5SvNjB)__h3OuL*cvJ!iw}m zc7cqGtH`;VU_D;m8yg$D7qF=bwK#;C>;$#l|2)o)MbdBB_QjY3?>}%LAy5qS-lGsH z{Q7kP+a;!U5`OMYQMr7Xu>0!G=FlTl3y`TtB-)M<)E{9k-rhGLU@41{!Q>l$)0*e` ze2{JkXDax4oPpbxJzVxim(I3w;xDInCBqs_5qc1?Q0ZYr9OCD&1oLvfxU>Im`n~DV z2Smi7&s;4k%8Y&aqr)2f+J1u?l>a0%0c44-$SVaa83uy_cmY7DZCc6n0k)12J2s%c zrW82&=0EZs{?E)*52o+OF=R)#gP7lv&e(%(Xhnh7Gt@Z_9OjX>X8}y!u;VWIpd(EG zhbZNNivPzbr8hB3BVKTjB>`GL3mkkjUvIj}f`&8_;9v*7eY@Bpg%2kh`fAYK?O)V@ z)4z1p_Ju}e?;AhOR$Ns_8$%*9I*oC8$S(nRb*l!69dB^1S})%9>^|JSu$hD2yrDl5 z^6Bh}ms21}Do+Is_8n@OF8XX}(yCg^yZ98bV6>NGx2p*y!E{Yp(gtvPD7EWk*cs6Vf za_2nXga6PZGM;k3#>aM_+Kuv8w+%+f^hNZWxsvU+E 
zzWogDRXt+M#vUGWJ;O;GJTt!}#MGG``VCfq>c=k2L3*6&o{(YVPKZ4bV__(a{mq>+0gg?vB4!foIleU2+ z*YQcXfbXYCW4TG3qtMW&v zZo(}Kj&iu4aA;Gyhl(pb8FxXeFnW!FMfUs>fVQl@53Y6Za|=9|v5+%uy6SP;&S$4< z(&}xhhs-WrIp7iV=2u`fSv&6S8@9;%ZYWL&T)TOx;Y+3?cj$mMS8gcWD8&`d#uoY| zKa-h}L%?>(aABPl^Yz@j*ZQcDXGDaqRdnG+1zzaVdg^gU$Lqwz=;z9+yr9|otE7~X z#p@GDd3g;xpEda0r%*mj_tH{6pV=AsCVq$1b)2sT#dxSI=lLWz+Yvjij0Z`IMc?L*l>&Y z|G_JFQhseZ;9aJ1Qq^CnA!Q_EAgcDdP34z@i!Ozq9huX?IE##5bMEnB8|_RJbCMn+a|KhIDOU6+w9 zaTH2PUEPoRtGhv0RUYzS`rkasz6;ILNdJt~|I9%C6&?MTAxHbK;FP~?Vb!|` z7|03lY5=G1i=dE;$ajKw&@F^mP;C+HjZ-HwOftFBI>u=xxNH^+Fzd2}=9|&G?OGZ6S0AvvQ0S;Q! z%E4V*Qx8vsoTB|N#2`;2R<@|S$jwqKW{JSh4Us8*#`USZr~-0!_axJzoxqRZ>!wug z@(j zd_?D~mWi3PEB&$zlm{O!6>q@XI^-|Qiwoci^w`wYG=zU-Sn#hxmE8B|K&2NVhoSUJ zXWcN={`kC68J|Lb01IY5R+xVx?as0lBgK5^l0-nJAXd|3h`nAfDft(XSpt(=AiqJ? zZ!^q`pi)@}XnFhg?XMFO{>t&>r5c7{9*U&^V`6&OJupB%4+AaJ<|674{(CK!N*sj9 zJdz*3KdrZKUPi2}NV>+kbFUGru^+_E>Rqc=BAU;oSH#2-X&NrKcc)y*=En{giQmf767y@!AF3e(c9E_nwYKDrcte>mW& zo8e!Fi(miPzoz;dONrKq&xP>r*i-KhXrAv<)p61mi_wr5-hQx_Ps6{a_h|QXq35^t zJ+>_1qwJ9^*qZsRZNyDdZsW>SJ)T8FQ;89UHrwhG+EV34&M=AZs9*K<%NJt6nvc#w zN4Cc#eA{~JUzZA0`qj3F%mIz2&yU8@jG+?R4a=h{GHQ_~2*{=3C^HFFRQ zbRPZckz68BfxgHyAI11`^lqRvn*3^QY_urlxQ|#iSccD9`ks|~c?5ptFU3}?^do}) zKB&Kvk0J4U1VzvEy95-cLVV?*-(S{ya8f*DaNd(_8mLL<&t}afdw^97II* zuFOlgEpW$pgl)L4us@X0?r?8~3V0QmobK*57P-76jYPn*bmD>VuaxjzWmOKTe!kAW zs#n^)=Pao2al9*}k#ragEp7!-Rk|;0h31wd;M66hrINg4)C>%sut?EH=Jvsh46Nqd zR)iIuK}a7F6(iX31Kgc)P=)60f-7rxcQ>(fX~yo@hc>L!kwQ%1j`$cE8lrIt*t8J| z%@|z3LYEIEUT=T@UUl`A5S;5wmuXy#VME3@m_e9j=XQ*0>N(v6J$RF@T!inP0-lTO zjZPep1u+(*ZG2H&aA=^qUO=g-tvyxalbFGE{Nbmcj@lM81cHcA9inf^-56BWHEiS; z0HIBFQ$=f`wOLJkRCXK_o<*`0GkI+s7hD+AC_{aXcY-)_hf*tvh@9f$6%^OdCQB>qiTnr-Md^F=`?ymLSZ+ ziJ<1f7--b60cJxpM2dZEeBZhvgTok+f$54l0Z|l%$RaK9Q?Q7RP{4wLhpVqyZ|apijQTJd<@zMYQC&lbtAS)h2)^V7o^hoFKh zr>F^G1#~#DY;@i^GpxgB1MmH`X1WVtF(Vx4WghM@_=-az3k1uMhkf(i@+@;|sMtg@S09BlV#r;JL$qKY8%yLE!Y)kU|PY z9&$n7O7RE7?hYNGD=1CM^^udI2QZE!UX55Tkl!^z%&4HQUUGuk-`96dDF{Te8yaEL 
z8&8$Q+Qe*A2bYHgU%_4+pUMru{D7kF!*UWfh(dF#r!(ni%)1T#)YiORWn~VwHnJ#^ zvzF~=B>avKaQnk@{f(eG5_Z!Iy(}#SIq-;4y$P8!@vp;#1)Pi{Gi(el*!{eO;e$je z7m!~-7IuU~a6H6Wyys!F*NYEy1tvq_dFONgJS^HI~qI}WT)?%6YD2`iG$IjgrNo6+lI!* zYhJSBUpj_;CQh6>#fZBfd>nCe47n=w>P>iA&e7Wt44gqO1JVB?+4KVlVUzVU6V9iM z<+V=DZmHi%4`be~EvPDy&nGuqq_*Bsd#-6sZTvBWu5nAd&;#6bu4(d&rWpWZ>Q2x`<2zP46JH*^C#7! zQ~#(D2?qnauOd`K{@Pp)iV)#Qg*A&dUy`!6=s?y_a7HxBK6ZMX2^Y{%M8U*4KVM-`nJV)yIXLA#OL zgY|*da6ZH<*4SMKv}ld{LzA)=p{g)dR}g zU*W5`U47_aWYIp_WiAK#85xI`+G?As&Of!l3I6nPxj0qjs={Bde3TIiutChx`d5$r zuOEz|W~LO%jmx(hQqqO{tkTYSa@5s}`(>DUC11aH(TdGytp18@M!a7TuZgqeu=D+B z(NZD**v^i#zwg=6Pp`^!WXM$R>9VfYIWdznHRGV5mPT)r#>k-Do%OECxZ+Iij+94u zY35aq?2kqYKc|JIY1GWRH5Y{+ihN`KIEHIktkU}}I$nvt9PTnZu#ju|IGD?dX@-x| z>rj%<+2}0l!8Psbe4VNJ2R z(G)woI!tYT*2Nc^J^iw_ZM}1gZEjTf$a{{TOxO=&-z`NHMP^T?~p?M~D{Y8ovi*;+~Y+4_w$9!f73KXT>I zuC!3@O|K2>+tIVzw^Qwy(qhfhpx1iDrYZKs?}ZdcP3Ml+_Tfs5sxLRj{h9I9x!6-v zdJ8AS1h&~Pn7(Suc)@W8Mw|AR@ zZU}d(Rl%ppC_?$u&Y9lW2j)Yw!*81s`g1O-D;F%AmQb6KxoXwoRbl?CC-kF6oM3rw z1$W4yVEb3Am#wMp-mBZn(fe~%-#LSa@3yN5hKDq|t!SaGnX0N44c0Q6+b9|#qw5B( zel1zFP;MZVmVen8cjpW4t&%N8?5)!mMmKvv{792Mr-fbF>J|v^?dEgzkKP!PO`<&QR++B*CBbXE3rZVIz3L<+*W)W z_~y-<%!%w6Fh>;PKE7z;y$@C>irlr^wmsO=(z{PH&Z`tS#eG*cObnRs7uTE?sLFUBb^|qsF);4Q4&F_jANyq~V8CK8` zZr!>??e=+?5dR>D%oRh|PQK<(_rQJp2x9LXh_wZp2OA`u&p#H~LV7B{nlA9p`tS#< zr_jQhf}LoNh`|B@h#~nwYnoA8TdM{>EtwM#7B>u`(FARS_APe^_+SOBfsXh}ATdak z*aDyhL%zUXN5|xMEl5Z?k!y0t!-K&2rIKM^HzFh*5Z}#geNsG;bk-2+0MgRpUT!~6 zBuxt*eZ2vq5BeAx4=RnwlwB$u3=v<0TRX_;UH5&R932VE6)y~h2KjRWfevdD^hHYE zXW|kQvJv|oIv8FnBh_*z*S#j2_IFo>UNWw4rllYQmbBI|AOfQO0WspB5((>alN*Ql z0yQqURvkNWVm)Dlqoaox)m?PaUy?5m@SG=9B*MgS<8#=L%H$kqxu9&En;h+1Adlq8 ztel)YFwhkh&cO|m`0oJRT+YI>+OQw=@tSz@(=_WHn)BTqmv zB0)y5OM15R(!-#;05#y4D-Yy%3piCICF?6@FY81^j-d2gS-t}aq^zt5R8$_qFME}Q zc>tk{zpanR8$-l=7-HBDBDRCHm9QZqba)L7jgZ2d&uvZ~KfV^Dt>uWe0I7JEG6DwG z%E(N9WhbW;NDo81eSRh!-7bqgMl|`DMLr-Ddt}rKrcDn;X4lv2^;ONGBkA!pdutif zNTbDzv=q{4=~CR`#RDxEaqB+*3>^9~ENWlR%l7u}?)$a1Yi#VAEOp8qimJ6xyGvf~ 
zf4x~d{zpdzh`;NQGNo9FLKcm+t+7-|*$%7|aP(n;6dqdN_5H8N@;jBrPhcQaV3Bep zyuJy!O-y;Ko5^cU#0KR1 zr^i(qAIUWl;~pIM?{*xCO>GR_;v=V*hE{>SUKfrD)}_^u?8heYp}qsbUmw`%Q#of@ zSS%sw-W7@M*=-jEFXhj7VAbDJ!oFpnmy@kTJo0W&n^Kz!c4lTT4QIEk6GjAZZNF9J z68#2#UvRMj2`U|@4o~h`eSLE*b9T|!l+UNPTHn5Au*KiQV+kzdUV)AcZLO&tw;>f) z&x-(UJb*G6ZsM!guFdSI^`L%FfIe63#WLbc4>O5odl05RKuRi-Vm>iMZL^j{OcGqG zoz){V3r@oiY>XJGClWXLOf$Iy5Q3ZpwjZpkRxL&ggU%db_NzcawA^GA zb<+ITw^d;1NlgoPfhQJWi{oLwMY{1?$!M&i%a#lg0ge!5GHPc;ZL+RdQQTz5kEAnv zk&bsVPUn&g4u?$aHJ%>*g>-Z<`4JY@d2WRU0RodgMAj017>M+sweLN5?p#v2i}0_# zw_*NM<2*Cvc~5b!yE_q?nJ;OylM9`_K}?nH(7_G|({MjbZYh8-_N%J8!Gac%QY zRpYNz$Ar#DKXha`Ou{os#x4=l;2{9Qz|e=))zKKH$1U$)-+h>HRI5b68gHdxaV*nPMvK#kHOv4SGV1QR_rVbLla3*%u)npJZ+li$l))yU zJ(bkR=8Up%A2B|qwHbO&Y}1c6>J*m-sE_$y~8Jc;S;Iyz^*^ju}|Z_F%G*Sg(9Z49;UFLFz-~a=PJY*&RnJ&Kg8SC z-F)Wi=*IPuVyo9PvIpeGTe#b|@wU@0IGSj_Ca?VcHKh>O3S(c%*{fzM@6rzn%MM40 z$s8}Y9#`fjTn>pOeY!)rUbGbH@!^!{+bnxZaAz?M*~Zy$VJ8$(i<7>#s9ziy8xSn~ zwvb{fJNu!VzFqc(ZKF%lRIh37V?(2K~&i-ppRSS99DL%M7xL-1x35*QfTX9#V*7T`OW@=rEGgb%!Hpni^>_ z9poipM2aT`aY0^}n<`6Gj<48^pw(*8t286`SvVbr_WtD{RdfCN=|xL2p2^LuqBQKV z7EESfzD1t#cR4fVXv*as1)Ft$OE?Faj+nZCr$4K{^6u0`W<{FZAPr6E*Zh5lb@?2d zn_0(y9^6`Ei?mdut)C@MQr?IG!4{<$J==VWGLn^hhGdH@s z`^UuCp64%$v~_AbZ#SD>yXJDl?bkqC{N%}#oXN?_JU#$!?ttViX2!w8vy?R*Ft}ur+$;D+AuFV5601j8xY@#IJNG zxScJ&_J1mS@3@@%zJL6vkc@^%On^aXo%L{BfR*>6Oe z)*7qY#B044!Yt78L4_cZuyv}IZG1P@9THpw73fUShowgHRxC>+>IOaKt$6+uIiNos; zA2Ku-FvInS@)185W${524k1IRoC7`C7|wS@#0^e3VSWbkQv%upM0ct>5hGjF;Fv9f z4k3E6E+*AtQ+r$6M7Yl)1-g)J&>RF}@TQ~}?>(QI6cZu%zP8q}%^8&ZGcV3lfx8T7 z(e9(jtFWfauh9-z?5v-v{S+m09=9 zm|#E?P7nmM?)*WbSxR#-PZ!|6^FH-+Ok`j7f9$QqLpg9aYIyL>KW#+WegSm|xeTrh z7dQ8x=+j5fo~aX&^b#2oeotm{nTF z6~Thkr{Uw9mFX;>NbHo?Lqcw&**IJ5`W$&94THYG1(Z-!#fEvo(v&dxs_qXPYRZ8H zi#nV-yDnP+jQg2awsg{73XY!$)w}@L6U6WVOS~Q$8VL-8YW;e*drw+73{e4@THqULHOv$TwU`)F;qtwWsg=B+7E~u&%;jRbGpucG)~)Ljfysy zfpqZVdqBQ(W41ng@SA%A_T_O%afuLF;0h@gs;w5s-$m;jOjdC`9iaxM8tT?}g!c+R z{|Q(M?m~S)d?Ih}$UhLy^tBgPU^#LVP%jmU0JBh1xs9YI%cBi}wHf%Gpt2E8FbGw- 
zTxk6lbhNLX2(?9pM8I!iVZ)cjdH3(vS-RqbgyYrl($Q&h82~enpFh`tB9@mV{?w=` z808NQ5!#u|5CZ2;#T$$Xd)Y`m-QAc+z(!EVdK3y{B3C8Z6e35szZN?Sq!3oFr!c(l z$&Zd9Q~raSrMFzx-|;^vS(-=^G^WM9mBs%Nzxw>6Ewl0N=aVjaj_#6TE28m$s+x1H zA86hO`RyH~lyl|pRvW#fUNzbM5L3y^uK{Sy>OE^rnB?Ke@_WN1pb)jgp%Z#Z*lyx^ zw=WzcFeYfg3#Y6f;5h@pVG6t}%oO!MTG_PG7CavqS|2cey@pf{I9uJ&FpN;PlEib2 zNYCJTNErTM1z1PIoK?G@F%L3-*!patq-HLMNIeX?m#99+$F=(vMQqxFaNp-K((72L z)s5Y+t^MvSF;Ev90+FF9)@C?CDgjeNE5_2hn&1WG3#zJhD2eSRdlb;Qh&DLR`7ca| zh5Y?#yG1@>{i#>WB`a{9ji#==y>wW-p%WlnjD(A-eGj;SM4sx68x>Hk#EzJMY%0yf6~Igg-;)Oe!h4~$ zQ#^C@y|+OwJ~Ke{^#lYIM7QezmXd}!eQyqGU~mZvO#cq#C2Tx-vM%#7Os;_)asr`} zaQ-fXN4?h|8k%dQvM*)S>!d;3-yT?7Q~>p&_zlVcJ2CFC^*N`#9kKsh2k31L)@4Z>(J zAJ$(&Hf%z}^2%knSra5?B0H5JE?VRe#63*oT7_tFCSbzJ*WtZjjSaaMPjlF)c%qSY zbH4qqLO8o&cF+YkRC#FHM?8IxuWXif9M}ZT0n7Md{pEY1{{Ri{TM4BS5j2Tr7z0p;rVb z3Tkzo#Q0VcDA6(7%f5Xd9_Ae{{uKtC?K;LU811gIi~GH!Q%8_=TC~TNFovS4#X6@SQqS? z@#REX%8(+lva-4&DjDDL!~BR~?CNjB+Ac58BJYt-;~PXI0l>x|_RPCtRHati6V9fd ze|5N*#MZ3-_CI&D4kd*}PH*TrZqp94ylN&^PR_ztPII=e=4*XQ{LL2Lm%j@zk?-5w zeS@V!o_xgoBGdwxjBo%nZH zTzl!W<4*hL;ZAXR-;#A=Op69rpSS% zkVwXH9Imz#DvNEl_j>(jWuuJrys$hI7k(gf@RCw%e9RIX*Vu8Vn&b@k zc}9=kSU;bt1D&rX>OCadopl<@>{rTIHNIvYZ6AOA>XyXwG*Kq!Z<8m+&*RWygySpw zmJUsfk7RAjTa~9>=KGS)tA8Ow@oEO0JbS=l z*43joF=t(%ML3C6nkx1*4fagwt;q>hx*-54=|){>WRJKTUzjqHA7@U;hx;5bT)o(r zB0qjH?FJP0j;)~G-fC2GRDo4&|7c(_U#X4c8sB{Lwiuh_NALUZ`BZ0fhQ~yO2VTt0 zS2dRe%tQcqbGVvgBTWx~xLgO;2L$h@-6uve_L*5~I-r8zc54L{HGcY$``2`q@rE9+ zRxHKAXRRbF#uw}MRQDzQ?l$>eJdWVB{jv2L7vH@rORT^6y01&g-nos=({9FM@%;jI zGGI)N*V@xO!wjTvm0{M~B*;je4MyU;hUxXMH3#2W^Aoq*E%IOy_73;`i@!=kl{dxt zmc_SU+0psF^4Hq^c|0?2{rjdmz1i-1m2C+XrjJ{H|60ke+-}3pgtQu-?qK&jzc-Rb zqDSlbKS&o_bLmU2J_#&2WIMX({bjC7tA6zf?_{3ag|jb__OwBuW?Jxf6|W|*A&;`$ z+uWj?I&*6QM;_S`{V46m7L`%++3q!SB$qPhe}mMdsFq?1$jM)QEFLdcl3Na#AAlK^ z+e?Lll^x4c_;coTX}@)Py8@;d*N{Xn?A6w2^F)v%uCeoUu*xP>_zWuOa-p ziO^Tnh_h(m9yXz8sHi@{Sy(xF-`B9 zf`G&h5eMl2oRy*BUQhVkpg^7Nx=b&9F?c;?M3C-dj2ocJiV?LZ2k@d=4VNY&%rw8C zfQ$rpZgi*wWeuGykz@;hJd# 
zvTS^aSp`uDhllI<>@hdpM1Z48OZX0>kcC!JlZD=$mpT7=M$mHVkd$>sD9YDr>*N#n zuW%o~TYmmvLR#7e*jmw`vLk}ial(>YC6{M=;hyS?V6kDw+`AtUW`Zc2#q^H9hBldr zf~R%W^u*G4m~qc&>Zm)coSMJ)un|eX1d_7&-Ar0DG90IgkV*lTBnWSZ4$+q_w2#NI zdH}Tv43}PJL^D9lSulu+a~s&9J+o0KEP=0OHQ<}tQ8vmC+G-zzMBxT5pq0& zBX{upTUv~C=t1x#V@G>UIGZIiBRXlX585lO>GTXwq!95l30v7cZN~C4T%Q!UE09P~ zsb}QNNY03f1Yc38F^&iUBKW4IKD_r`xF@05saaVbUlHF&2__n0M;0rttt~~&od8N00`Q? z|2%R=U7Z~u6)_<>2G@ecpHXFZ0D|8awvMiCV!Zk)KS9o|8>>nUPLD!X6U-lxjDla- zA?EOiM0@oWI|%#4>XzIJCL~ldp4~q;zfI9d0@E3mahKEgvtt@CcKJJahk?Z_%L^>g zzZ!(u|1VQmdu9pi&m9=fRqMY@9w+soFB4GpHF$CUI$R)9EiBPR6H^&7cAw>6@dz_L zgWE-G`dxz~IJSRAGfhsi7xYYLdw*yC?m7ETSDQW~M;o6GbbCkfT(H@RFyypJjTgfE zl>SU;R+-Ig%M+BGRQMU!D@5)cZ5J!t6kjBe9s*zV_eP4^}Hd+CZX8@GrbHF`NQO>9F?~ zKIY50dj)WXbAuI@Hm=AdV}Ee2ckp0aZe!Z#&2KTrh^YZjk6tL7~rL546Aj4E@={+|Ji0_G2d zX&BKm^_R|%@Fmwm1+aeo`k|>9c>iIR8iwc+v9M+u8Jd88GRzlz51YjEv|q8^P3&K=n4U9$~~ZMF!qwDk>@yu!1W*9FHy)m7GjceHU1y zM2t2av>f0P?1T>p_BPG0chg6PzY7fnFX(s-ln7zI&M57D1dFX5(y8HSEFtBP#ubmL zD$WQ-lN72KO6LZcl52N{u7i|W3CUP6M(f7XkEAGw?24wm69xoF+*?P2KNhHrb(WP? 
literal 0 HcmV?d00001 diff --git a/chat_template.jinja b/chat_template.jinja new file mode 100644 index 0000000..744756d --- /dev/null +++ b/chat_template.jinja @@ -0,0 +1,6 @@ +{{bos_token}}{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system +You are a helpful assistant.<|im_end|> +' }}{% endif %}{{'<|im_start|>' + message['role'] + ' +' + message['content'] + '<|im_end|>' + ' +'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant +' }}{% endif %} \ No newline at end of file diff --git a/config.json b/config.json new file mode
100644 index 0000000..e252e6b --- /dev/null +++ b/config.json @@ -0,0 +1,35 @@ +{ + "architectures": [ + "LlamaForCausalLM" + ], + "attention_bias": false, + "attention_dropout": 0.0, + "bos_token_id": 128000, + "eos_token_id": 128040, + "head_dim": 128, + "hidden_act": "silu", + "hidden_size": 4096, + "initializer_range": 0.02, + "intermediate_size": 14336, + "max_position_embeddings": 131072, + "mlp_bias": false, + "model_type": "llama", + "num_attention_heads": 32, + "num_hidden_layers": 32, + "num_key_value_heads": 8, + "pretraining_tp": 1, + "rms_norm_eps": 1e-05, + "rope_scaling": { + "factor": 8.0, + "high_freq_factor": 4.0, + "low_freq_factor": 1.0, + "original_max_position_embeddings": 8192, + "rope_type": "llama3" + }, + "rope_theta": 500000.0, + "tie_word_embeddings": false, + "torch_dtype": "float16", + "transformers_version": "4.55.2", + "use_cache": true, + "vocab_size": 128256 +} diff --git a/generation_config.json b/generation_config.json new file mode 100644 index 0000000..e394340 --- /dev/null +++ b/generation_config.json @@ -0,0 +1,9 @@ +{ + "_from_model_config": true, + "bos_token_id": 128000, + "do_sample": true, + "eos_token_id": 128040, + "temperature": 0.6, + "top_p": 0.9, + "transformers_version": "4.55.2" +} diff --git a/interface.py b/interface.py new file mode 100644 index 0000000..3387b29 --- /dev/null +++ b/interface.py @@ -0,0 +1,1161 @@ +""" +interface.py - Discord-Hermes Model Chat Interface + +Description: + This script provides an interactive CLI for chatting with Hugging Face language models, + designed for Discord-style conversational flow. It supports both Transformers and GGUF + (llama.cpp) models, with optional LoRA adapter loading and bitsandbytes quantization. 
+ +Key Features & Quick Controls: + Keyboard Shortcuts: + • Enter → submit current input + • Ctrl+T → insert newline into the current prompt (multi-line messages) + • Ctrl+C → cancel mid-generation or mid-stream and keep partial output + • Ctrl+Z → exit cleanly, freeing VRAM and memory + + Commands: + • /clear (/c, /reset, !clear, !c) → reset all context + • /back (/b, !b) → undo last user+assistant exchange and preview history + • /h [N] (!h) → enable history using last N exchanges (default: all) + • /d → disable history + + • On-the-fly parameter tuning: + /min N → set min new tokens + /max N → set max new tokens + /temp X → set temperature + /p X → set top-p + /k N → set top-k + + • Randomization: + /r → randomize params + /rh → randomize with high variance + + • /stop → toggle stopping further extension mid-generation + + Generation Flow: + • Extension flow: prefer short replies (min–max tokens) but extend until EOS for natural endings + • Configurable prompt modes (system prompt, assistant prompt, blank mode) + + Advanced: + • LoRA stacking: frozen base adapter + active adapter support + • Supports GGUF (llama.cpp) with selectable chat templates (--gguf-chat-format) + • Optional code detection and filtering with auto-reprompt + +Arguments: + -m, --model Model path or Hugging Face repo ID (default: mookiezii/Discord-Hermes-3-8B) + -q, --quant Quantization mode: 4 or 8 (default: off). 
Use `-q` (no value) for 4-bit, or `-q 8` for 8-bit + -fl, --frozen-lora Model path or Hugging Face repo ID of the base LoRa adapter to load and freeze + -c, --checkpoint Model path or Hugging Face repo ID of the LoRa adapter to load + -chs, --checkpoint-subfolder Subfolder of the path or Hugging Face repo ID of the LoRa adapter to load + --deephermes Enable DeepHermes formatting instead of ChatML + --gguf Use GGUF model format with llama.cpp backend + --gguf-chat-format Chat format for GGUF models (default: "chatml") + --blank Raw user input only, no prompts/system context + -asc, --assistant-system-combo Include both system and assistant system prompts + -as, --assistant-system Use assistant system prompt instead of standard + --just-system-prompt Use only the system prompt with user input + --no-system-prompt Do not include system prompt + --no-assistant-prompt Do not include assistant prompt + --code-check Enable code detection and filtering via classifier + -au, --auto Run preset inputs (hello → what do you do → wow tell me more) 5 times with /clear in between, then exit + +Usage (quick help): + python interface.py -h + +USAGE / RECIPES: + Basic (Transformers, full precision): + python interface.py -m mookiezii/Discord-Hermes-3-8B + + Quantization (Transformers): + # 4-bit: + python interface.py -m repo -q + # 8-bit: + python interface.py -m repo -q 8 + # full precision: + python interface.py -m repo + + GGUF (llama.cpp backend): + python interface.py --gguf -m /path/to/model.gguf --gguf-chat-format chatml + # alternate chat template: + python interface.py --gguf -m /path/to/model.gguf --gguf-chat-format alpaca + + LoRA (frozen base + active adapter): + python interface.py -m base/model \ + -fl path/to/frozen_base_lora \ + -c path/to/active_adapter --checkpoint-subfolder adapter_subdir + + Prompt modes: + # Raw user input, no system/assistant prompts: + python interface.py --blank + # Assistant system prompt instead of standard: + python interface.py 
--assistant-system + # System + assistant system combined: + python interface.py --assistant-system-combo + # Just the system prompt (no assistant prompt preface): + python interface.py --just-system-prompt + # Strip system prompt entirely: + python interface.py --no-system-prompt + # Strip assistant prompt entirely: + python interface.py --no-assistant-prompt + + Format toggle: + # Use DeepHermes formatting instead of ChatML: + python interface.py --deephermes + + Auto run demo + exit: + python interface.py --auto +""" + +# MIT License +# +# Copyright (c) 2025 mookziei +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+ +#!/usr/bin/env python3 +import os +os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True" +import torch +import gc +from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline, AutoConfig +from peft import PeftModel +from huggingface_hub import login +import argparse +import logging +import re +from transformers import TextStreamer +from datetime import datetime +from prompt_toolkit import PromptSession +from prompt_toolkit.key_binding import KeyBindings +from huggingface_hub import login +import random +from transformers import LogitsProcessor +from transformers import LogitsProcessorList +import signal +os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1" +gc.collect() +torch.cuda.empty_cache() +torch.cuda.reset_peak_memory_stats() + +parser = argparse.ArgumentParser(description="HuggingFace Model Chat Interface") + +parser.add_argument("-au", "--auto", action="store_true", + help="Run preset inputs (hello → what do you do → wow tell me more) 5 times with /clear in between, then exit") +parser.add_argument("-m", "--model", default="mookiezii/Discord-Hermes-3-8B", + help="Model path or Hugging Face repo ID") +parser.add_argument( + "-q", "--quant", + nargs="?", + choices=("4", "8"), + const="4", + help="Quantization mode: 4 or 8. Omit this flag for full precision. Using `-q` alone selects 4-bit." 
+) +parser.add_argument("-fl", "--frozen-lora", + help="Model path or Hugging Face repo ID of the base LoRa adapter to load and freeze") +parser.add_argument("-c", "--checkpoint", + help="Model path or Hugging Face repo ID of the LoRa adapter to load") +parser.add_argument("-chs", "--checkpoint-subfolder", + help="Subfolder of the path or Hugging Face repo ID of the LoRa adapter to load") +parser.add_argument("--deephermes", action="store_true", + help="Enable DeepHermes formatting instead of ChatML") +parser.add_argument("--gguf", action="store_true", + help="Use GGUF model format (llama.cpp backend)") +parser.add_argument("--gguf-chat-format", default="chatml", + help='Chat format for GGUF models (default: "chatml")') +parser.add_argument("--blank", action="store_true", + help="Use only raw user input (no prompts/system context)") +parser.add_argument("-asc", "--assistant-system-combo", action="store_true", + help="Include both system and assistant system prompts") +parser.add_argument("-as", "--assistant-system", action="store_true", + help="Use assistant system prompt instead of standard system prompt") +parser.add_argument("--just-system-prompt", action="store_true", + help="Use only the system prompt with user input") +parser.add_argument("--no-system-prompt", action="store_true", + help="Do not include system prompt") +parser.add_argument("--no-assistant-prompt", action="store_true", + help="Do not include assistant prompt") +parser.add_argument("--code-check", action="store_true", + help="Enable code detection and filtering via classifier") + +args = parser.parse_args() + +# Apply args to variables +DEEPHERMES = args.deephermes +GGUF = args.gguf +GGUF_CHAT_FORMAT = args.gguf_chat_format +BLANK = args.blank +ASSISTANT_SYSTEM_COMBO = args.assistant_system_combo +ASSISTANT_SYSTEM = args.assistant_system +JUST_SYSTEM_PROMPT = args.just_system_prompt +NO_SYSTEM_PROMPT = args.no_system_prompt +NO_ASSISTANT_PROMPT = args.no_assistant_prompt +CODE_CHECK = 
args.code_check +QUANTIZATION = args.quant or "off" +USE_QUANT = QUANTIZATION in ("4", "8") +# ================================ +base_model_name = args.model +FROZEN_LORA_PATH = args.frozen_lora +checkpoint_path = args.checkpoint or "" +checkpoint_subfolder = args.checkpoint_subfolder +USE_BASE_MODEL_ONLY = not (FROZEN_LORA_PATH or checkpoint_path) +# ================================ +PLAIN_SYSTEM_PROMPT = """""" +ASSISTANT_SYSTEM_PROMPT = """""" +# ================================ +if args.auto: + MIN_NEW_TOKENS = 4 + MAX_NEW_TOKENS = 60 + TEMPERATURE = 0.6 + TOP_P = 0.9 + TOP_K = 55 +else: + MIN_NEW_TOKENS = random.randint(3, 5) + MAX_NEW_TOKENS = random.randint(40, 75) + TEMPERATURE = random.uniform(0.5, 0.9) + TOP_P = random.uniform(0.7, 0.9) + TOP_K = random.randint(40, 50) +# ================================ + +STOP_EXTENSION = False + +# --- Prefer 40–75 tokens, but extend until <|im_end|> (or eos) --- +def generate_prefer_short_then_extend( + model, + tokenizer, + input_ids, + attention_mask=None, + first_min=MIN_NEW_TOKENS, + first_max=MAX_NEW_TOKENS, + extend_step=64, + hard_cap=1024, + **gen_kwargs, +): + import random, torch + + # Resolve EOS ids (<|im_end|> for ChatML/Hermes, or tokenizer.eos_token_id) + eos_ids = [] + if tokenizer.eos_token_id is not None: + eos_ids.append(tokenizer.eos_token_id) + try: + im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>") + if im_end_id is not None: + eos_ids.append(im_end_id) + except Exception: + pass + eos_ids = [eid for eid in eos_ids if eid is not None] + if not eos_ids: + raise ValueError("No EOS token id found. 
Make sure '<|im_end|>' exists or set tokenizer.eos_token.") + + # Strip keys we set manually so no duplication happens + for key in ["max_new_tokens", "eos_token_id", "do_sample", "pad_token_id"]: + gen_kwargs.pop(key, None) + + start_len = input_ids.shape[1] + seq = input_ids + attn = attention_mask if attention_mask is not None else torch.ones_like(input_ids) + + budget = random.randint(first_min, first_max) + + while True: + out = model.generate( + input_ids=seq, + attention_mask=attn, + max_new_tokens=budget, + eos_token_id=eos_ids, # stop on any resolved EOS id, not only the first + pad_token_id=tokenizer.pad_token_id or eos_ids[0], + return_dict_in_generate=True, + **gen_kwargs, + ) + seq = out.sequences + attn = torch.ones_like(seq) + + new_tokens = seq[0, start_len:].tolist() + if any(t in eos_ids for t in new_tokens): + break + + if len(new_tokens) >= hard_cap: + break + + if globals().get("STOP_EXTENSION", False): + break + + budget = extend_step + + return seq + + + +if not checkpoint_path and not FROZEN_LORA_PATH: # checkpoint_path is "" (never None) when no adapter is given + USE_BASE_MODEL_ONLY = True + checkpoint_path = "" +else: + USE_BASE_MODEL_ONLY = False + +if ASSISTANT_SYSTEM: + if DEEPHERMES: + SYSTEM_PROMPT = f"<|start_header_id|>assistant<|end_header_id|>\n{PLAIN_SYSTEM_PROMPT}" + else: + SYSTEM_PROMPT = f"<|im_start|>assistant\n{PLAIN_SYSTEM_PROMPT}<|im_end|>" +elif ASSISTANT_SYSTEM_COMBO: + if DEEPHERMES: + SYSTEM_PROMPT = f"<|start_header_id|>system<|end_header_id|>\n{PLAIN_SYSTEM_PROMPT}<|start_header_id|>assistant<|end_header_id|>\n{ASSISTANT_SYSTEM_PROMPT}" + else: + SYSTEM_PROMPT = f"<|im_start|>system\n{PLAIN_SYSTEM_PROMPT}<|im_end|>\n<|im_start|>assistant\n{ASSISTANT_SYSTEM_PROMPT}<|im_end|>" +else: + if DEEPHERMES: + SYSTEM_PROMPT = f"<|start_header_id|>system<|end_header_id|>\n{PLAIN_SYSTEM_PROMPT}" + else: + SYSTEM_PROMPT = f"<|im_start|>system\n{PLAIN_SYSTEM_PROMPT}<|im_end|>" + + +def handle_sigstp(signum, frame): + print("\n\033[1;91m[Received Ctrl+Z — exiting.]\033[0m") + torch.cuda.empty_cache() + exit(0) + +signal.signal(signal.SIGTSTP, handle_sigstp)
+ +class RecentTokenBlocker(LogitsProcessor): + def __init__(self, window_size: int, eos_id: int | None = None): + self.window_size = window_size + self.eos_id = eos_id + + def __call__(self, input_ids, scores): + # Penalize last N tokens (use a big negative instead of -inf) + recent_tokens = input_ids[0, -self.window_size:].tolist() + if recent_tokens: + for token_id in set(recent_tokens): + if 0 <= token_id < scores.shape[-1]: + scores[0, token_id] = -1e9 + + # Sanitize logits + scores = torch.nan_to_num(scores, neginf=-1e9) + + # If everything is blocked, fallback to EOS + if torch.all(scores[0] <= -9e8): + scores[0].fill_(-1e9) + if self.eos_id is not None and 0 <= self.eos_id < scores.shape[-1]: + scores[0, self.eos_id] = 0 + + return scores + +processor_list = LogitsProcessorList() +processor_list.append(RecentTokenBlocker(window_size=3)) + +def randomize(): + global MIN_NEW_TOKENS, MAX_NEW_TOKENS, TEMPERATURE, TOP_P, TOP_K + MIN_NEW_TOKENS = random.randint(3, 5) + MAX_NEW_TOKENS = random.randint(40, 75) + TEMPERATURE = random.uniform(0.5, 0.9) + TOP_P = random.uniform(0.7, 0.9) + TOP_K = random.randint(40, 75) + + return + +def randomize_high_variance(): + global MIN_NEW_TOKENS, MAX_NEW_TOKENS, TEMPERATURE, TOP_P, TOP_K + MIN_NEW_TOKENS = random.randint(3, 20) + MAX_NEW_TOKENS = random.randint(40, 150) + TEMPERATURE = random.uniform(0.2, 0.9) + TOP_P = random.uniform(0.7, 1) + TOP_K = random.randint(40, 150) + + return + +logging.getLogger("transformers").setLevel(logging.ERROR) + +bnb_config4bit = BitsAndBytesConfig( + load_in_4bit=True, + bnb_4bit_use_double_quant=True, + bnb_4bit_compute_dtype=torch.float16, + bnb_4bit_quant_type="nf4", +) +bnb_config8bit = BitsAndBytesConfig(load_in_8bit=True) + +bnb_config = None +if USE_QUANT: + bnb_config = bnb_config4bit if QUANTIZATION == "4" else bnb_config8bit + +bindings = KeyBindings() +session = PromptSession(key_bindings=bindings) + +@bindings.add('enter') +def _(event): + 
event.app.exit(result=event.app.current_buffer.text) + +@bindings.add('c-t') +def _(event): + event.app.current_buffer.insert_text('\n') + +tokenizer = None # define early for fallback + +if GGUF: + from llama_cpp import Llama + model = Llama( + model_path=base_model_name, + n_gpu_layers=999, # same as -ngl 999 + n_ctx=4096, # same as -c 4096 + n_batch=2048, # same as -b 2048 + n_threads=12, # same as -t 12 + chat_format=GGUF_CHAT_FORMAT, + use_mmap=True, # fastest load + logits_all=False + ) + tokenizer = None +else: + tokenizer = AutoTokenizer.from_pretrained(base_model_name) + + # ---- normalize config BEFORE ANY model load ---- + cfg = AutoConfig.from_pretrained(base_model_name) + if getattr(cfg, "hidden_act", None) == "swiglu": + cfg.hidden_act = "silu" + + if USE_BASE_MODEL_ONLY: + if USE_QUANT and bnb_config is not None: + model = AutoModelForCausalLM.from_pretrained( + base_model_name, + config=cfg, + torch_dtype=torch.float16, + quantization_config=bnb_config, + device_map="auto", + ) + else: + model = AutoModelForCausalLM.from_pretrained( + base_model_name, + config=cfg, + torch_dtype=torch.float16, + device_map="auto", + ) + model.resize_token_embeddings(len(tokenizer)) + + else: + # Load base first (for LoRA stack) + if USE_QUANT and bnb_config is not None: + base_model = AutoModelForCausalLM.from_pretrained( + base_model_name, + config=cfg, + torch_dtype=torch.float16, + quantization_config=bnb_config, + device_map="auto", + ) + else: + base_model = AutoModelForCausalLM.from_pretrained( + base_model_name, + config=cfg, + torch_dtype=torch.float16, + device_map="auto", + ) + base_model.resize_token_embeddings(len(tokenizer)) + + # Load frozen adapter first (if present) + if FROZEN_LORA_PATH: + base_model = PeftModel.from_pretrained( + base_model, + FROZEN_LORA_PATH, + is_trainable=False + ) + print(f"Loaded frozen LoRA adapter from {FROZEN_LORA_PATH}") + + # Active adapter (if provided) + if checkpoint_path: + model = PeftModel.from_pretrained( + 
base_model, + checkpoint_path, + subfolder=checkpoint_subfolder if checkpoint_subfolder else None, + ) + else: + model = base_model + +class CyanPromptStreamer(TextStreamer): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.first = True + self.output = "" + self._buffer = [] + + # Receive raw, possibly-unstable pieces → just buffer + def on_text(self, text, **kwargs): + self._buffer.append(text) + + # Receive stable text spans → clean once, then print + def on_finalized_text(self, text, **kwargs): + text = "".join(self._buffer) + text + self._buffer.clear() + + text = strip_begin_of_text(text) + text = clean_special_tokens(text) + #text = filter_emojis_keep_first_single(text) + #text = remove_all_emojis(text) + + self.output += text + if self.first: + #print("\033[1;96m> \033[0m", end="", flush=True) + self.first = False + print(text, end="", flush=True) + + def on_final_text(self, text, **kwargs): + # final flush if anything remains + if self._buffer: + self.on_finalized_text("", **kwargs) + self.output += text + print(text, end="", flush=True) + + def end(self): + super().end() + self.first = True + +def filter_emojis_keep_first_single(text): + # Emoji regex pattern (single emojis) + emoji_pattern = re.compile( + "[" + "\U0001F600-\U0001F64F" + "\U0001F300-\U0001F5FF" + "\U0001F680-\U0001F6FF" + "\U0001F1E0-\U0001F1FF" + "\U00002702-\U000027B0" + "\U000024C2-\U0001F251" + "\U0001F900-\U0001F9FF" + "\U0001FA70-\U0001FAFF" + "\U0001F018-\U0001F270" + "\U0001F650-\U0001F67F" + "\U00002600-\U000026FF" + "\U00002300-\U000023FF" + "]+", + flags=re.UNICODE + ) + + # Pattern for emoji groups: 2 or more consecutive emojis + emoji_group_pattern = re.compile(r'(' + emoji_pattern.pattern + r'){2,}', flags=re.UNICODE) + + # Remove all emoji groups first (clusters of 2+ emojis) + text = emoji_group_pattern.sub("", text) + + # Now remove all single emojis except the first one + first_emoji_found = False + def replace_single_emoji(match): + 
nonlocal first_emoji_found + if not first_emoji_found: + first_emoji_found = True + return match.group(0) # keep first emoji + return "" # remove all others + + filtered_text = emoji_pattern.sub(replace_single_emoji, text) + + return filtered_text + +def remove_all_emojis(text): + emoji_pattern = re.compile( + "[" + "\U0001F600-\U0001F64F" + "\U0001F300-\U0001F5FF" + "\U0001F680-\U0001F6FF" + "\U0001F1E0-\U0001F1FF" + "\U00002702-\U000027B0" + "\U000024C2-\U0001F251" + "\U0001F900-\U0001F9FF" + "\U0001FA70-\U0001FAFF" + "\U0001F018-\U0001F270" + "\U0001F650-\U0001F67F" + "\U00002600-\U000026FF" + "\U00002300-\U000023FF" + "]+", + flags=re.UNICODE + ) + + return emoji_pattern.sub(r'.', text) + +def show_params(): + print( + f"\033[1;92m[Params]" + f" min={MIN_NEW_TOKENS} | max={MAX_NEW_TOKENS} |" + f" temp={TEMPERATURE:.2f} | p={TOP_P:.2f} | k={TOP_K}\033[0m" + ) + +if CODE_CHECK: + classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli") + +if checkpoint_path: + folder_name = f"{base_model_name.replace('/', '_')}_{os.path.basename(checkpoint_path).replace('/', '_')}" +else: + folder_name = base_model_name.replace('/', '_') +output_dir = os.path.join(os.getcwd(), "interface_output", folder_name) +os.makedirs(output_dir, exist_ok=True) +base_filename = f"{folder_name}.txt" +output_path = os.path.join(output_dir, base_filename) + +if args.auto: + base_filename = "auto.txt" + output_path = os.path.join(output_dir, base_filename) +else: + base_filename = f"{folder_name}.txt" + output_path = os.path.join(output_dir, base_filename) + + if os.path.exists(output_path): + i = 1 + while True: + numbered = os.path.join(output_dir, f"{folder_name}_{i}.txt") + if not os.path.exists(numbered): + output_path = numbered + break + i += 1 + +if tokenizer: + if DEEPHERMES: + reserved_special_ids = [ + tid for tok, tid in tokenizer.get_vocab().items() + if re.match(r"<\|reserved_special_token_\d+\|>", tok) + ] + im_end_token_id = 
[tokenizer.convert_tokens_to_ids("<|eot_id|>")] + reserved_special_ids + else: + im_end_token_id = tokenizer.convert_tokens_to_ids("<|im_end|>") +else: + im_end_token_id = None + +def get_generation_params(): + return { + "do_sample": True, + "eos_token_id": ( + ([im_end_token_id] if isinstance(im_end_token_id, int) else (im_end_token_id or [])) + + ([tokenizer.eos_token_id] if tokenizer and tokenizer.eos_token_id is not None else []) + ) or None, + "bos_token_id": tokenizer.bos_token_id if tokenizer else None, + "min_new_tokens": MIN_NEW_TOKENS, + "max_new_tokens": MAX_NEW_TOKENS, + "temperature": TEMPERATURE, + "top_p": TOP_P, + "top_k": TOP_K, + "no_repeat_ngram_size": 3, + "repetition_penalty": 1.2, + #"pad_token_id": tokenizer.eos_token_id if tokenizer else None, + "logits_processor": processor_list, + } + +chat_history = [] + +def cleanup_and_exit(): + print("\n\033[1;91m[Cleaning up — freeing VRAM]\033[0m") + gc.collect() + torch.cuda.empty_cache() + torch.cuda.reset_peak_memory_stats() + raise SystemExit(0) + +def log(text): + now = datetime.now().strftime("%Y-%m-%d %H:%M:%S") + print(colorize(text)) + with open(output_path, "a", encoding="utf-8") as f: + f.write(f"[{now}]\n{text}\n\n────────────────────────────────────────────────────────────────────────────\n\n") + +def justlog(text): + now = datetime.now().strftime("%Y-%m-%d %H:%M:%S") + with open(output_path, "a", encoding="utf-8") as f: + f.write(f"[{now}]\n{text}\n\n────────────────────────────────────────────────────────────────────────────\n\n") + +def clean_special_tokens(text: str) -> str: + text = re.sub(r"<\|im_start\|>", "", text) + text = re.sub(r"<\|im_end\|>", "", text) + text = re.sub(r"<\|eot_id\|>", "", text) + text = re.sub(r"<\|reserved_special_token_\d+\|>", "", text) + text = re.sub(r"\n+\s*(<\|start_header_id\|>assistant<\|end_header_id\|>)", r"\n\1", text) + return text + +def count_tokens(s): + if tokenizer: + return len(tokenizer.encode(s)) + else: + return len(s.split()) # 
fallback for GGUF + +def colorize(text): + text = text.replace("<|im_start|>system", "\033[1;93m<|im_start|>system\033[0m") + text = text.replace("<|im_start|>user", "\033[1;96m<|im_start|>user\033[0m") + text = text.replace("<|im_start|>assistant", "\033[1;95m<|im_start|>assistant\033[0m") + text = text.replace("<|im_end|>", "\033[1;1;90m<|im_end|>\033[0m") + return text + +def extract_assistant_reply(full_text): + if DEEPHERMES: + start_token = "<|start_header_id|>assistant<|end_header_id|>" + start_idx = full_text.rfind(start_token) + if start_idx == -1: + return "" + start_idx += len(start_token) + + # Find the earliest end among <|eot_id|> or any <|reserved_special_token_N|> + end_pattern = re.compile(r"<\|eot_id\|>|<\|reserved_special_token_\d+\|>") + m = end_pattern.search(full_text, start_idx) + end_idx = m.start() if m else len(full_text) # ← fallback to EOF + else: + start_token = "<|im_start|>assistant" + end_token = "<|im_end|>" + start_idx = full_text.rfind(start_token) + if start_idx == -1: + return "" + start_idx += len(start_token) + end_idx = full_text.find(end_token, start_idx) + if end_idx == -1: + end_idx = len(full_text) # ← fallback to EOF + + return full_text[start_idx:end_idx].strip() + +def lowercase_lines_and_sentences(text: str) -> str: + def should_skip(word: str) -> bool: + return len(word) >= 2 and word.isupper() + + # Lowercase start of every line (unless all caps and >= 2 chars) + def lower_line_start(line: str) -> str: + if not line.strip(): # covers empty or whitespace-only + return '' + parts = line.split(maxsplit=1) + first_word = parts[0] + if should_skip(first_word): + return line + # Rebuild line if split > 1, else just lower first char + if len(parts) > 1: + rest = parts[1] + return first_word[:1].lower() + first_word[1:] + ' ' + rest + return line[:1].lower() + line[1:] + + # Lowercase start of every sentence after . ? ! 
+ def lower_after_punct(match): + punct_space = match.group(1) + rest = match.group(2) + first_word = rest.split()[0] + if should_skip(first_word): + return punct_space + rest + return punct_space + rest[:1].lower() + rest[1:] + + return re.sub(r'([.?!]\s+)(\w+)', lower_after_punct, text) + +def strip_begin_of_text(s): + prefix = "<|begin_of_text|>" + if s.startswith(prefix): + return s[len(prefix):] + return s + +def strip_code_word(text: str) -> str: + import re + words_to_remove = [ + 'api', 'program', 'programmer', 'script', 'scripts', 'code', 'coded', 'codeblock', 'coding', 'log', 'logs', 'snippet', 'function', + 'script', 'algorithm', 'web', 'library', 'python', 'bot' , 'nodejs' + ] + pattern = r'\b(?:' + '|'.join(map(re.escape, words_to_remove)) + r')\b' + return re.sub(pattern, '', text, flags=re.IGNORECASE) + +def classify(text, threshold=0.8): + if not text.strip(): + return False, {} + candidate_labels = ["code", "chat"] + result = classifier(text, candidate_labels) + scores = dict(zip(result['labels'], result['scores'])) + is_code = scores.get("code", 0) > scores.get("chat", 0) and scores["code"] > threshold + return is_code, scores + + +# Initialize dummy values for current_prompt and user_input for the first generation_params +current_prompt = "" +user_input = "" +generation_params = get_generation_params() + +if args.auto: + with open(output_path, "a", encoding="utf-8") as f: + now = datetime.now().strftime("%Y-%m-%d %H:%M:%S") + f.write(f"[{now}]") + f.write(f"# model: {base_model_name}\n") + f.write(f"# frozen-lora: {FROZEN_LORA_PATH or 'None'}\n") + f.write(f"# checkpoint: {checkpoint_path or 'None'}\n") + f.write(f"# args: {vars(args)}\n") + f.write(f"# generation_params: {generation_params}\n\n────────────────────────────────────────────────────────────────────────────\n\n") + +else: + with open(output_path, "w", encoding="utf-8") as f: + now = datetime.now().strftime("%Y-%m-%d %H:%M:%S") + f.write(f"[{now}]") + f.write(f"# model: 
{base_model_name}\n") + f.write(f"# frozen-lora: {FROZEN_LORA_PATH or 'None'}\n") + f.write(f"# checkpoint: {checkpoint_path or 'None'}\n") + f.write(f"# args: {vars(args)}\n") + f.write(f"# generation_params: {generation_params}\n\n────────────────────────────────────────────────────────────────────────────\n\n") + +max_retries = 5 +history_enabled = True +cot_max_exchanges = 1000 +cached_history_size = 1000 + +if not GGUF: + model.eval() + aa = getattr(model, "active_adapter", None) + if aa is not None: + print(aa) + +else: + print(f"🚀 Model launched ({base_model_name})") + + +turn = 0 + +if args.auto: + cycle = ["hello", "what do you do", "wow tell me more", "/clear"] + auto_inputs = cycle * 5 # repeat cycle 5 times +else: + auto_inputs = [] + +#============= +# START LOOP +#============= + +while True: + generation_params = get_generation_params() + show_params() + + # print("\033[1;96m> ", end="", flush=True) + + if auto_inputs: + user_input = auto_inputs.pop(0) + print(f"\033[1;96m> {user_input}\033[0m") # show it like a typed input + else: + user_input = session.prompt(multiline=True) + user_input = user_input.strip() + + cmd = user_input.lower() + + # --- clear/reset first (includes /c and !c aliases) --- + if cmd in ["/clear", "!clear", "/reset", "!reset", "/c", "!c"]: + chat_history.clear() + gc.collect() + torch.cuda.empty_cache() + print("\033[1;93m[Chat history cleared.]\033[0m") + justlog("[Chat history cleared.]") + # If we're in auto mode and this was the final queued item, exit now.
+ if args.auto and not auto_inputs: + print("\n\033[1;92m[Auto mode finished — exiting.]\033[0m") + cleanup_and_exit() + continue + + # --- /back (/b): undo last exchange and reprint tail --- + if cmd in ["/back", "!back", "/b", "!b"]: + if chat_history: + u, a = chat_history.pop() + turn = max(0, turn - 1) + print("\033[1;93m[Undo] Removed last user+assistant exchange.\033[0m") + justlog("[Undo] Removed last user+assistant exchange.") + + TAIL = 3 # how many exchanges to preview + tail = chat_history[-TAIL:] + if tail: + if DEEPHERMES: + preview = "\n".join( + f"<|start_header_id|>user<|end_header_id|>\n{uu}\n" + f"<|start_header_id|>assistant<|end_header_id|>\n{aa}<|eot_id|>" + for uu, aa in tail + ) + else: + preview = "\n".join( + f"<|im_start|>user\n{uu}<|im_end|>\n" + f"<|im_start|>assistant\n{aa}<|im_end|>" + for uu, aa in tail + ) + print(colorize(preview)) + else: + print("\033[1;90m[Chat history is now empty.]\033[0m") + else: + print("\033[1;90m[Nothing to undo.]\033[0m") + continue + + # --- /h (history on, optional exchange count) --- + m = re.match(r"^(/h|!h)(\s+(\d+))?$", cmd) + if m: + history_enabled = True + cot_max_exchanges = int(m.group(3)) if m.group(3) else len(chat_history) + print(f"\033[1;94m[History] Using last {cot_max_exchanges} exchanges.\033[0m") + justlog(f"[Chat history] Using last {cot_max_exchanges} exchanges.") + continue + + # --- /d (history off) --- + if cmd in ["/d", "!d"]: + history_enabled = False + cot_max_exchanges = 0 + print("\033[1;94m[Chat history disabled.]\033[0m") + justlog("[Chat history disabled.]") + continue + + # === Multi-param updates: allow "/k 40 /t 1 /p .7 /min 1 /max 50" in one line === + def apply_param_change(kind: str, val_s: str | None) -> None: + """ + Return: None. Applies a single param change (or toggle) in-place, printing the outcome. + Logic: Parses kind/value, clamps where needed, updates globals (min/max/temp/p/k or r/rh/stop).
+ Allowances: Accepts floats like '.7'; /r, /rh, /stop take no value; whitespace is allowed. + """ + global MIN_NEW_TOKENS, MAX_NEW_TOKENS, TEMPERATURE, TOP_P, TOP_K, STOP_EXTENSION + + if kind in ("r", "rh", "stop"): + if kind == "r": + randomize() + print("\033[1;96m[Randomized Parameters]\033[0m") + elif kind == "rh": + randomize_high_variance() + print("\033[1;96m[Randomized Parameters: High Variance]\033[0m") + else: + STOP_EXTENSION = not STOP_EXTENSION + print(f"\033[1;94m[Stop extension {'ENABLED' if STOP_EXTENSION else 'DISABLED'}]\033[0m") + return + + if val_s is None or not val_s.strip(): + # No numeric given → just show current params + show_params() + return + + # Normalize numbers like ".7" to "0.7" + if val_s.startswith("."): + val_s = "0" + val_s + + try: + num = float(val_s) + except ValueError: + print("\033[1;91m[Error] Invalid number.\033[0m") + return + + if kind == "min": + MIN_NEW_TOKENS = max(0, int(num)) + if MAX_NEW_TOKENS < MIN_NEW_TOKENS: + MAX_NEW_TOKENS = MIN_NEW_TOKENS + print(f"\033[1;96m[min → {MIN_NEW_TOKENS}]\033[0m") + + elif kind == "max": + MAX_NEW_TOKENS = max(1, int(num)) + if MIN_NEW_TOKENS > MAX_NEW_TOKENS: + MIN_NEW_TOKENS = MAX_NEW_TOKENS + print(f"\033[1;96m[max → {MAX_NEW_TOKENS}]\033[0m") + + elif kind in ("temp", "t"): + TEMPERATURE = max(0.01, min(float(num), 2.0)) + print(f"\033[1;96m[temp → {TEMPERATURE:.2f}]\033[0m") + + elif kind == "p": + TOP_P = max(0.05, min(float(num), 1.0)) + print(f"\033[1;96m[p → {TOP_P:.2f}]\033[0m") + + elif kind == "k": + TOP_K = max(0, int(num)) + print(f"\033[1;96m[k → {TOP_K}]\033[0m") + + + def parse_and_apply_multi_params(cmd: str) -> bool: + """ + Return: True if at least one /param token was found and applied, else False. + Logic: Scans the whole line for repeated tokens /(min|max|temp|t|p|k|r|rh|stop) with optional numbers; applies in order. + Allowances: Free spacing; supports floats like '.7'; mixed toggles (/r /rh /stop) with numeric params in one command. 
+ """ + # find all occurrences anywhere in the line + pattern = re.compile( + r'/(min|max|temp|t|p|k|r|rh|stop)(?:\s+([+-]?(?:\d+(?:\.\d+)?|\.\d+)))?', + flags=re.IGNORECASE + ) + found = False + for kind, val in pattern.findall(cmd): + found = True + apply_param_change(kind.lower(), val if val else None) + return found + + + if parse_and_apply_multi_params(cmd): + generation_params = get_generation_params() + continue + + limited_history_for_prompt = chat_history[-cached_history_size:] + + if history_enabled and cot_max_exchanges > 0: + limited_history = limited_history_for_prompt[-cot_max_exchanges:] + if DEEPHERMES: + context = "\n".join( + f"<|start_header_id|>user<|end_header_id|>\n{u}\n<|start_header_id|>assistant<|end_header_id|>\n{a}" + for u, a in limited_history + ) + "\n" + else: + context = "\n".join( + f"<|im_start|>user\n{u}<|im_end|>\n<|im_start|>assistant\n{a}<|im_end|>" + for u, a in limited_history + ) + "\n" + else: + context = "" + + if user_input.startswith("<|im_start|>"): + current_prompt = f"{user_input}\n" + elif DEEPHERMES: + current_prompt = f"{SYSTEM_PROMPT}{context}<|start_header_id|>user<|end_header_id|>\n{user_input}\n<|start_header_id|>assistant<|end_header_id|>\n" + else: + if turn == 0: + current_prompt = f"{SYSTEM_PROMPT}{context}<|im_start|>user\n{user_input}<|im_end|>\n<|im_start|>assistant\n" + else: + current_prompt = f"{SYSTEM_PROMPT}\n{context}<|im_start|>user\n{user_input}<|im_end|>\n<|im_start|>assistant\n" + + if BLANK: + current_prompt = user_input + + if JUST_SYSTEM_PROMPT: + if DEEPHERMES: + current_prompt = f"{SYSTEM_PROMPT}{user_input}" + else: + current_prompt = f"{SYSTEM_PROMPT}\n{user_input}" + + if NO_SYSTEM_PROMPT: + if DEEPHERMES: + current_prompt = f"{context}<|start_header_id|>user<|end_header_id|>\n{user_input}\n<|start_header_id|>assistant<|end_header_id|>\n" + else: + current_prompt = f"{context}<|im_start|>user\n{user_input}<|im_end|>\n<|im_start|>assistant\n" + + if NO_ASSISTANT_PROMPT: + if DEEPHERMES: 
+ current_prompt = f"{SYSTEM_PROMPT}{context}<|start_header_id|>user<|end_header_id|>\n{user_input}" + else: + current_prompt = f"{SYSTEM_PROMPT}\n{context}<|im_start|>user\n{user_input}" + + while count_tokens(current_prompt) > 3000 and chat_history: + chat_history.pop(0) + limited_history_for_prompt = chat_history[-cached_history_size:] + if history_enabled and cot_max_exchanges > 0: + limited_history = limited_history_for_prompt[-cot_max_exchanges:] + context = "\n".join( + f"<|im_start|>user\n{u}<|im_end|>\n<|im_start|>assistant\n{a}<|im_end|>" + for u, a in limited_history + ) + "\n" + else: + context = "" + current_prompt = f"{SYSTEM_PROMPT}\n{context}<|im_start|>user\n{user_input}<|im_end|>\n<|im_start|>assistant\n" + + if CODE_CHECK: + retry_count = 0 + reply = "" + + while retry_count < max_retries: + if GGUF: + output_stream = model.create_chat_completion( + messages=[ + {"role": "system", "content": SYSTEM_PROMPT}, + {"role": "user", "content": user_input} + ], + temperature=TEMPERATURE, + top_p=TOP_P, + top_k=TOP_K, + max_tokens=MAX_NEW_TOKENS, + stream=True + ) + + text = "" + for chunk in output_stream: + delta = chunk["choices"][0]["delta"].get("content", "") + print(delta, end="", flush=True) + text += delta + else: + inputs = tokenizer(current_prompt, return_tensors="pt").to(model.device) + streamer = CyanPromptStreamer(tokenizer, skip_prompt=True, skip_special_tokens=False) + with torch.no_grad(): + output_ids = model.generate( + **inputs, + **generation_params, + streamer=streamer + ) + text = tokenizer.decode(output_ids[0], skip_special_tokens=False) + + reply = extract_assistant_reply(text) + stripedreply = strip_code_word(reply) + text = strip_begin_of_text(text) + + if not BLANK: + is_code, scores = classify(stripedreply) + if is_code: + retry_count += 1 + log(f"\033[1;91mClassifier scores: code={scores.get('code', 0):.3f}, chat={scores.get('chat', 0):.3f}\033[0m") + log(f"\033[1;91mAttempt {retry_count} reply:\n{text}\n\033[0m") +
log(f"\033[1;91mAttempt {retry_count} stripped reply:\n{reply}\n\033[0m") + log(f"\033[1;92;102m[Trying to code. Reprompting... {retry_count}/{max_retries}]\033[0m") + with open(output_path, "a", encoding="utf-8") as f: + f.write(f"[Trying to code. Reprompting... {retry_count}/{max_retries}]\n") + else: + break + + final_check, final_scores = classify(text) + if retry_count == max_retries and final_check: + log(f"\033[1;91mFinal classifier scores: code={final_scores.get('code', 0):.3f}, chat={final_scores.get('chat', 0):.3f}\033[0m") + polite_message = "Sorry, I’m not able to provide code snippets." + log(polite_message) + with open(output_path, "a", encoding="utf-8") as f: + f.write(f"{polite_message}\n") + if history_enabled: + chat_history.append((user_input, polite_message)) + else: + if GGUF: + output_stream = model.create_chat_completion( + messages=[ + {"role": "system", "content": SYSTEM_PROMPT}, + {"role": "user", "content": user_input} + ], + temperature=TEMPERATURE, + top_p=TOP_P, + top_k=TOP_K, + max_tokens=MAX_NEW_TOKENS, + stream=True # Enable streaming + ) + + text = "" + for chunk in output_stream: + delta = chunk["choices"][0]["delta"].get("content", "") + print(delta, end="", flush=True) # Optional: stream live to console + text += delta + else: + inputs = tokenizer(current_prompt, return_tensors="pt").to(model.device) + streamer = CyanPromptStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True, clean_up_tokenization_spaces=True) + with torch.no_grad(): + try: + print("\033[1;35m> \033[0m ", end="", flush=True) + output_ids = generate_prefer_short_then_extend( + model, tokenizer, + **inputs, + **generation_params, + first_min=MIN_NEW_TOKENS, + first_max=MAX_NEW_TOKENS, + extend_step=64, + hard_cap=1024, + streamer=streamer + ) + except KeyboardInterrupt: + streamer.stopped = True + print("\n\033[1;91m[Generation interrupted by Ctrl+C — using partial output.]\033[0m") + + text = tokenizer.decode(output_ids[0], skip_special_tokens=False) + 
+ originalreply = extract_assistant_reply(text) + text = lowercase_lines_and_sentences(text) + reply = lowercase_lines_and_sentences(originalreply) + stripedreply = strip_code_word(reply) + text = strip_begin_of_text(text) + logtext = text + text = colorize(text) + consoletext = clean_special_tokens(text) + #text = filter_emojis_keep_first_single(text) + #text = remove_all_emojis(text) + print(f"\n{consoletext}") + justlog(logtext) + if history_enabled: + if not GGUF: + chat_history.append((user_input, reply if reply.strip() else streamer.output.strip())) + else: + chat_history.append((user_input, reply.strip())) + turn += 1 + if args.auto and not auto_inputs: + print("\n\033[1;92m[Auto mode finished — exiting.]\033[0m") + break \ No newline at end of file diff --git a/model-00001-of-00004.safetensors b/model-00001-of-00004.safetensors new file mode 100644 index 0000000..ec8ca75 --- /dev/null +++ b/model-00001-of-00004.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:40c29af1aff3cec59b6397174593e1f637284c49c6560a5c46340bf89367ec89 +size 4976698592 diff --git a/model-00002-of-00004.safetensors b/model-00002-of-00004.safetensors new file mode 100644 index 0000000..a91c959 --- /dev/null +++ b/model-00002-of-00004.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:27f65f3f1040dd14157fe52974feed29cff58d37bb69dbf46d2bac33d10af425 +size 4999802616 diff --git a/model-00003-of-00004.safetensors b/model-00003-of-00004.safetensors new file mode 100644 index 0000000..ec8e274 --- /dev/null +++ b/model-00003-of-00004.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dbc2d2c777353a804dc48fec6a65ea66df17b695adf4098e1fa4dba89295548e +size 4915916080 diff --git a/model-00004-of-00004.safetensors b/model-00004-of-00004.safetensors new file mode 100644 index 0000000..03368f2 --- /dev/null +++ b/model-00004-of-00004.safetensors @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:c60541e47b07cd290e8ea3ae0806694d79639722ee1d41bf0948abb6d5162823 +size 1168138808 diff --git a/model.safetensors.index.json b/model.safetensors.index.json new file mode 100644 index 0000000..5c64f1e --- /dev/null +++ b/model.safetensors.index.json @@ -0,0 +1,299 @@ +{ + "metadata": { + "total_parameters": 8030261248, + "total_size": 16060522496 + }, + "weight_map": { + "lm_head.weight": "model-00004-of-00004.safetensors", + "model.embed_tokens.weight": "model-00001-of-00004.safetensors", + "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", + 
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.12.self_attn.o_proj.weight": 
"model-00002-of-00004.safetensors", + "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors", + 
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.18.mlp.down_proj.weight": 
"model-00002-of-00004.safetensors", + "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", + 
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.22.self_attn.k_proj.weight": 
"model-00003-of-00004.safetensors", + "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", + 
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.28.input_layernorm.weight": 
"model-00003-of-00004.safetensors", + "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", + 
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", + "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors", + "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors", + "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", + "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", + "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.4.post_attention_layernorm.weight": 
"model-00001-of-00004.safetensors", + "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.7.mlp.gate_proj.weight": 
"model-00001-of-00004.safetensors", + "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", + "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", + "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", + "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", + "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", + "model.norm.weight": 
"model-00004-of-00004.safetensors" + } +} diff --git a/special_tokens_map.json b/special_tokens_map.json new file mode 100644 index 0000000..df2a79d --- /dev/null +++ b/special_tokens_map.json @@ -0,0 +1,23 @@ +{ + "bos_token": { + "content": "<|begin_of_text|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false + }, + "eos_token": { + "content": "<|im_end|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false + }, + "pad_token": { + "content": "<|im_end|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false + } +} diff --git a/tokenizer.json b/tokenizer.json new file mode 100644 index 0000000..9b7e7b9 --- /dev/null +++ b/tokenizer.json @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:14b5e679cb69af62e14c3b98d346177bd4137d882a44f87dec9efec982b01a05 +size 17209403 diff --git a/tokenizer_config.json b/tokenizer_config.json new file mode 100644 index 0000000..60f3c6f --- /dev/null +++ b/tokenizer_config.json @@ -0,0 +1,2063 @@ +{ + "added_tokens_decoder": { + "128000": { + "content": "<|begin_of_text|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128001": { + "content": "<|end_of_text|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128002": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128003": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128004": { + "content": "<|finetune_right_pad_id|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128005": { + "content": "<|reserved_special_token_2|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + 
"special": true + }, + "128006": { + "content": "<|start_header_id|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128007": { + "content": "<|end_header_id|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128008": { + "content": "<|eom_id|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128009": { + "content": "<|eot_id|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128010": { + "content": "<|python_tag|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128011": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128012": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128013": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128014": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128015": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128016": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128017": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128018": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128019": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + 
"single_word": false, + "special": false + }, + "128020": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128021": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128022": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128023": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128024": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128025": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128026": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128027": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128028": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128029": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128030": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128031": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128032": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128033": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": 
false + }, + "128034": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128035": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128036": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128037": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128038": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128039": { + "content": "<|im_start|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "128040": { + "content": "<|im_end|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128041": { + "content": "<|reserved_special_token_33|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128042": { + "content": "<|reserved_special_token_34|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128043": { + "content": "<|reserved_special_token_35|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128044": { + "content": "<|reserved_special_token_36|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128045": { + "content": "<|reserved_special_token_37|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128046": { + "content": "<|reserved_special_token_38|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + 
"special": true + }, + "128047": { + "content": "<|reserved_special_token_39|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128048": { + "content": "<|reserved_special_token_40|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128049": { + "content": "<|reserved_special_token_41|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128050": { + "content": "<|reserved_special_token_42|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128051": { + "content": "<|reserved_special_token_43|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128052": { + "content": "<|reserved_special_token_44|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128053": { + "content": "<|reserved_special_token_45|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128054": { + "content": "<|reserved_special_token_46|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128055": { + "content": "<|reserved_special_token_47|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128056": { + "content": "<|reserved_special_token_48|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128057": { + "content": "<|reserved_special_token_49|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128058": { + "content": "<|reserved_special_token_50|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + 
"single_word": false, + "special": true + }, + "128059": { + "content": "<|reserved_special_token_51|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128060": { + "content": "<|reserved_special_token_52|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128061": { + "content": "<|reserved_special_token_53|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128062": { + "content": "<|reserved_special_token_54|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128063": { + "content": "<|reserved_special_token_55|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128064": { + "content": "<|reserved_special_token_56|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128065": { + "content": "<|reserved_special_token_57|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128066": { + "content": "<|reserved_special_token_58|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128067": { + "content": "<|reserved_special_token_59|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128068": { + "content": "<|reserved_special_token_60|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128069": { + "content": "<|reserved_special_token_61|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128070": { + "content": "<|reserved_special_token_62|>", + "lstrip": false, + "normalized": false, + 
"rstrip": false, + "single_word": false, + "special": true + }, + "128071": { + "content": "<|reserved_special_token_63|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128072": { + "content": "<|reserved_special_token_64|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128073": { + "content": "<|reserved_special_token_65|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128074": { + "content": "<|reserved_special_token_66|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128075": { + "content": "<|reserved_special_token_67|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128076": { + "content": "<|reserved_special_token_68|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128077": { + "content": "<|reserved_special_token_69|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128078": { + "content": "<|reserved_special_token_70|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128079": { + "content": "<|reserved_special_token_71|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128080": { + "content": "<|reserved_special_token_72|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128081": { + "content": "<|reserved_special_token_73|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128082": { + "content": "<|reserved_special_token_74|>", + "lstrip": false, + 
"normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128083": { + "content": "<|reserved_special_token_75|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128084": { + "content": "<|reserved_special_token_76|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128085": { + "content": "<|reserved_special_token_77|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128086": { + "content": "<|reserved_special_token_78|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128087": { + "content": "<|reserved_special_token_79|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128088": { + "content": "<|reserved_special_token_80|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128089": { + "content": "<|reserved_special_token_81|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128090": { + "content": "<|reserved_special_token_82|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128091": { + "content": "<|reserved_special_token_83|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128092": { + "content": "<|reserved_special_token_84|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128093": { + "content": "<|reserved_special_token_85|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128094": { + "content": "<|reserved_special_token_86|>", + 
"lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128095": { + "content": "<|reserved_special_token_87|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128096": { + "content": "<|reserved_special_token_88|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128097": { + "content": "<|reserved_special_token_89|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128098": { + "content": "<|reserved_special_token_90|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128099": { + "content": "<|reserved_special_token_91|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128100": { + "content": "<|reserved_special_token_92|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128101": { + "content": "<|reserved_special_token_93|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128102": { + "content": "<|reserved_special_token_94|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128103": { + "content": "<|reserved_special_token_95|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128104": { + "content": "<|reserved_special_token_96|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128105": { + "content": "<|reserved_special_token_97|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128106": { + "content": 
"<|reserved_special_token_98|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128107": { + "content": "<|reserved_special_token_99|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128108": { + "content": "<|reserved_special_token_100|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128109": { + "content": "<|reserved_special_token_101|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128110": { + "content": "<|reserved_special_token_102|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128111": { + "content": "<|reserved_special_token_103|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128112": { + "content": "<|reserved_special_token_104|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128113": { + "content": "<|reserved_special_token_105|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128114": { + "content": "<|reserved_special_token_106|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128115": { + "content": "<|reserved_special_token_107|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128116": { + "content": "<|reserved_special_token_108|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128117": { + "content": "<|reserved_special_token_109|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, 
+ "128118": { + "content": "<|reserved_special_token_110|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128119": { + "content": "<|reserved_special_token_111|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128120": { + "content": "<|reserved_special_token_112|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128121": { + "content": "<|reserved_special_token_113|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128122": { + "content": "<|reserved_special_token_114|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128123": { + "content": "<|reserved_special_token_115|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128124": { + "content": "<|reserved_special_token_116|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128125": { + "content": "<|reserved_special_token_117|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128126": { + "content": "<|reserved_special_token_118|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128127": { + "content": "<|reserved_special_token_119|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128128": { + "content": "<|reserved_special_token_120|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128129": { + "content": "<|reserved_special_token_121|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": 
false, + "special": true + }, + "128130": { + "content": "<|reserved_special_token_122|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128131": { + "content": "<|reserved_special_token_123|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128132": { + "content": "<|reserved_special_token_124|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128133": { + "content": "<|reserved_special_token_125|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128134": { + "content": "<|reserved_special_token_126|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128135": { + "content": "<|reserved_special_token_127|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128136": { + "content": "<|reserved_special_token_128|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128137": { + "content": "<|reserved_special_token_129|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128138": { + "content": "<|reserved_special_token_130|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128139": { + "content": "<|reserved_special_token_131|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128140": { + "content": "<|reserved_special_token_132|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128141": { + "content": "<|reserved_special_token_133|>", + "lstrip": false, + "normalized": false, + 
"rstrip": false, + "single_word": false, + "special": true + }, + "128142": { + "content": "<|reserved_special_token_134|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128143": { + "content": "<|reserved_special_token_135|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128144": { + "content": "<|reserved_special_token_136|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128145": { + "content": "<|reserved_special_token_137|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128146": { + "content": "<|reserved_special_token_138|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128147": { + "content": "<|reserved_special_token_139|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128148": { + "content": "<|reserved_special_token_140|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128149": { + "content": "<|reserved_special_token_141|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128150": { + "content": "<|reserved_special_token_142|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128151": { + "content": "<|reserved_special_token_143|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128152": { + "content": "<|reserved_special_token_144|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128153": { + "content": "<|reserved_special_token_145|>", + "lstrip": 
false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128154": { + "content": "<|reserved_special_token_146|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128155": { + "content": "<|reserved_special_token_147|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128156": { + "content": "<|reserved_special_token_148|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128157": { + "content": "<|reserved_special_token_149|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128158": { + "content": "<|reserved_special_token_150|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128159": { + "content": "<|reserved_special_token_151|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128160": { + "content": "<|reserved_special_token_152|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128161": { + "content": "<|reserved_special_token_153|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128162": { + "content": "<|reserved_special_token_154|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128163": { + "content": "<|reserved_special_token_155|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128164": { + "content": "<|reserved_special_token_156|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128165": { + "content": 
"<|reserved_special_token_157|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128166": { + "content": "<|reserved_special_token_158|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128167": { + "content": "<|reserved_special_token_159|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128168": { + "content": "<|reserved_special_token_160|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128169": { + "content": "<|reserved_special_token_161|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128170": { + "content": "<|reserved_special_token_162|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128171": { + "content": "<|reserved_special_token_163|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128172": { + "content": "<|reserved_special_token_164|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128173": { + "content": "<|reserved_special_token_165|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128174": { + "content": "<|reserved_special_token_166|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128175": { + "content": "<|reserved_special_token_167|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128176": { + "content": "<|reserved_special_token_168|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + 
}, + "128177": { + "content": "<|reserved_special_token_169|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128178": { + "content": "<|reserved_special_token_170|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128179": { + "content": "<|reserved_special_token_171|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128180": { + "content": "<|reserved_special_token_172|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128181": { + "content": "<|reserved_special_token_173|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128182": { + "content": "<|reserved_special_token_174|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128183": { + "content": "<|reserved_special_token_175|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128184": { + "content": "<|reserved_special_token_176|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128185": { + "content": "<|reserved_special_token_177|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128186": { + "content": "<|reserved_special_token_178|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128187": { + "content": "<|reserved_special_token_179|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128188": { + "content": "<|reserved_special_token_180|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + 
"single_word": false, + "special": true + }, + "128189": { + "content": "<|reserved_special_token_181|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128190": { + "content": "<|reserved_special_token_182|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128191": { + "content": "<|reserved_special_token_183|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128192": { + "content": "<|reserved_special_token_184|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128193": { + "content": "<|reserved_special_token_185|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128194": { + "content": "<|reserved_special_token_186|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128195": { + "content": "<|reserved_special_token_187|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128196": { + "content": "<|reserved_special_token_188|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128197": { + "content": "<|reserved_special_token_189|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128198": { + "content": "<|reserved_special_token_190|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128199": { + "content": "<|reserved_special_token_191|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128200": { + "content": "<|reserved_special_token_192|>", + "lstrip": false, + "normalized": 
false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128201": { + "content": "<|reserved_special_token_193|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128202": { + "content": "<|reserved_special_token_194|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128203": { + "content": "<|reserved_special_token_195|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128204": { + "content": "<|reserved_special_token_196|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128205": { + "content": "<|reserved_special_token_197|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128206": { + "content": "<|reserved_special_token_198|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128207": { + "content": "<|reserved_special_token_199|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128208": { + "content": "<|reserved_special_token_200|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128209": { + "content": "<|reserved_special_token_201|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128210": { + "content": "<|reserved_special_token_202|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128211": { + "content": "<|reserved_special_token_203|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128212": { + "content": "<|reserved_special_token_204|>", + 
"lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128213": { + "content": "<|reserved_special_token_205|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128214": { + "content": "<|reserved_special_token_206|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128215": { + "content": "<|reserved_special_token_207|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128216": { + "content": "<|reserved_special_token_208|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128217": { + "content": "<|reserved_special_token_209|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128218": { + "content": "<|reserved_special_token_210|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128219": { + "content": "<|reserved_special_token_211|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128220": { + "content": "<|reserved_special_token_212|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128221": { + "content": "<|reserved_special_token_213|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128222": { + "content": "<|reserved_special_token_214|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128223": { + "content": "<|reserved_special_token_215|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128224": { + "content": 
"<|reserved_special_token_216|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128225": { + "content": "<|reserved_special_token_217|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128226": { + "content": "<|reserved_special_token_218|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128227": { + "content": "<|reserved_special_token_219|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128228": { + "content": "<|reserved_special_token_220|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128229": { + "content": "<|reserved_special_token_221|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128230": { + "content": "<|reserved_special_token_222|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128231": { + "content": "<|reserved_special_token_223|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128232": { + "content": "<|reserved_special_token_224|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128233": { + "content": "<|reserved_special_token_225|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128234": { + "content": "<|reserved_special_token_226|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128235": { + "content": "<|reserved_special_token_227|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + 
}, + "128236": { + "content": "<|reserved_special_token_228|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128237": { + "content": "<|reserved_special_token_229|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128238": { + "content": "<|reserved_special_token_230|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128239": { + "content": "<|reserved_special_token_231|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128240": { + "content": "<|reserved_special_token_232|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128241": { + "content": "<|reserved_special_token_233|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128242": { + "content": "<|reserved_special_token_234|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128243": { + "content": "<|reserved_special_token_235|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128244": { + "content": "<|reserved_special_token_236|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128245": { + "content": "<|reserved_special_token_237|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128246": { + "content": "<|reserved_special_token_238|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128247": { + "content": "<|reserved_special_token_239|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + 
"single_word": false, + "special": true + }, + "128248": { + "content": "<|reserved_special_token_240|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128249": { + "content": "<|reserved_special_token_241|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128250": { + "content": "<|reserved_special_token_242|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128251": { + "content": "<|reserved_special_token_243|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128252": { + "content": "<|reserved_special_token_244|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128253": { + "content": "<|reserved_special_token_245|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128254": { + "content": "<|reserved_special_token_246|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128255": { + "content": "<|reserved_special_token_247|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + } + }, + "bos_token": "<|begin_of_text|>", + "clean_up_tokenization_spaces": true, + "eos_token": "<|im_end|>", + "extra_special_tokens": {}, + "model_input_names": [ + "input_ids", + "attention_mask" + ], + "model_max_length": 131072, + "pad_token": "<|im_end|>", + "tokenizer_class": "PreTrainedTokenizerFast" +} diff --git a/train.log b/train.log new file mode 100644 index 0000000..1965705 --- /dev/null +++ b/train.log @@ -0,0 +1,24158 @@ +[05:25:24] 2025-08-23 +[05:25:24] Tesla T4 +[05:25:24] CPU usage: 95.4%, RAM usage: 23.8% +[05:25:24] Running with the following configuration: +[05:25:24] 
model_name: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B +[05:25:24] tokenizer: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B +[05:25:24] output_dir: /content/drive/MyDrive/llm/Discord-Hermes-3-8B +[05:25:24] train_path: /content/drive/MyDrive/data/None156_fix.csv +[05:25:24] checkpoint: +[05:25:24] lr: 3e-05 +[05:25:24] lr_floor: 6e-06 +[05:25:24] epochs: 1 +[05:25:24] batch_size: 5 +[05:25:24] accum_steps: 7 +[05:25:24] val_batch_size: 6 +[05:25:24] max_val_size: 100 +[05:25:24] max_length: 150 +[05:25:24] save_temp_frequency: 200 +[05:25:24] save_frequency: 500 +[05:25:24] eval_frequency: 500 +[05:25:24] save_pattern: y +[05:25:24] quantization: y +[05:25:24] quantization_bits: 4 +[05:25:24] lora: y +[05:25:24] frozen_lora_path: None +[05:25:24] lora_rank: 16 +[05:25:24] lora_alpha: 32 +[05:25:24] lora_dropout: 0.1 +[05:25:24] optimizer_weight_decay: 0.0 +[05:25:24] warmup_type: cosine +[05:25:24] warmup_ratio: 0.08 +[05:25:24] warmup_steps: 550 +[05:25:24] shuffle: y +[05:25:24] csv_column: text +[05:25:24] new_run: n +[05:25:24] label_smoothing: 0.05 +[05:25:24] SEED: 1 +[05:25:24] Using device: cuda +[05:28:08] LoRA configuration: +[05:28:08] task_type: TaskType.CAUSAL_LM +[05:28:08] peft_type: PeftType.LORA +[05:28:08] auto_mapping: None +[05:28:08] base_model_name_or_path: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B +[05:28:08] revision: None +[05:28:08] inference_mode: False +[05:28:08] r: 16 +[05:28:08] target_modules: {'k_proj', 'q_proj', 'v_proj', 'o_proj'} +[05:28:08] exclude_modules: None +[05:28:08] lora_alpha: 32 +[05:28:08] lora_dropout: 0.1 +[05:28:08] fan_in_fan_out: False +[05:28:08] bias: none +[05:28:08] use_rslora: True +[05:28:08] modules_to_save: None +[05:28:08] init_lora_weights: True +[05:28:08] layers_to_transform: None +[05:28:08] layers_pattern: None +[05:28:08] rank_pattern: {} +[05:28:08] alpha_pattern: {} +[05:28:08] megatron_config: None +[05:28:08] megatron_core: megatron.core 
+[05:28:08] trainable_token_indices: None +[05:28:08] loftq_config: {} +[05:28:08] eva_config: None +[05:28:08] corda_config: None +[05:28:08] use_dora: False +[05:28:08] use_qalora: False +[05:28:08] qalora_group_size: 16 +[05:28:08] layer_replication: None +[05:28:08] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) +[05:28:08] lora_bias: False +[05:28:08] target_parameters: None +[05:28:08] _custom_modules: None +[05:28:08] TRAINING: base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:08] TRAINING: base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:08] TRAINING: base_model.model.model.layers.0.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.0.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.0.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.0.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: 
base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.1.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.2.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: 
base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.3.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.4.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: 
base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.5.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.6.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: 
base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.7.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.8.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: 
base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.9.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.10.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) 
+[05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.11.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.12.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.k_proj.lora_B.default.weight - shape: 
torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.13.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.14.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: 
base_model.model.model.layers.15.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.15.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.16.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) 
+[05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.17.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.18.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.k_proj.lora_A.default.weight - shape: 
torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.19.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.20.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: 
base_model.model.model.layers.21.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.21.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:09] TRAINING: base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:09] TRAINING: base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.22.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.22.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.22.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.22.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) 
+[05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.23.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.24.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.q_proj.lora_B.default.weight - shape: 
torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.25.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.26.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: 
base_model.model.model.layers.27.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.27.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.28.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) 
+[05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.29.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.30.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.q_proj.lora_A.default.weight - shape: 
torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) +[05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) +[05:28:10] TRAINING: base_model.model.model.layers.31.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) +[05:28:10] +Total Parameters: 4,554,231,808 +[05:28:10] Trainable Parameters: 13,631,488 +[05:28:10] Trainable %: 0.2993% +[05:28:10] base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.0.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.0.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.0.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.0.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] 
base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.1.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.1.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.1.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.1.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.2.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.2.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.2.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.2.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, 
device=cuda:0 +[05:28:10] base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.3.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.3.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.3.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.3.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.4.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.4.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.4.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.4.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] 
base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.5.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.5.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.5.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.5.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.6.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.6.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.6.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.6.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, 
device=cuda:0 +[05:28:10] base_model.model.model.layers.7.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.7.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.7.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.7.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.8.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.8.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.8.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.8.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] 
base_model.model.model.layers.9.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.9.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.9.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.9.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.10.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.10.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.10.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.10.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.11.self_attn.k_proj.lora_A.default.weight: 
dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.11.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.11.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.11.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:10] base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.12.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.12.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.12.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.12.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.13.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] 
base_model.model.model.layers.13.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.13.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.13.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.14.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.14.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.14.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.14.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.15.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.15.self_attn.k_proj.lora_B.default.weight: 
dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.15.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.15.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.16.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.16.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.16.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.16.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.17.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.17.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] 
base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.17.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.17.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.18.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.18.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.18.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.18.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.19.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.19.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.weight: 
dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.19.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.19.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.20.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.20.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.20.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.20.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.21.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.21.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] 
base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.21.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.21.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.22.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.22.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.22.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.22.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.23.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.23.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.weight: 
dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.23.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.23.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.24.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.24.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.24.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.24.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.24.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.24.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.24.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.24.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.25.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.25.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.25.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.25.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.25.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.25.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] 
base_model.model.model.layers.25.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.25.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.26.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.26.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.26.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.26.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.26.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.26.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.26.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.26.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.27.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.27.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.27.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.27.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.27.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.27.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.27.self_attn.o_proj.lora_A.default.weight: 
dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.27.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.28.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.28.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.28.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.28.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.28.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.28.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.28.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.28.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.29.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.29.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.29.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.29.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.29.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.29.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.29.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] 
base_model.model.model.layers.29.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.30.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.30.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.30.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.30.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.30.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.30.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.30.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.30.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.31.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.31.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.31.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.31.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.31.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.31.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 +[05:28:11] base_model.model.model.layers.31.self_attn.o_proj.lora_B.default.weight: 
dtype=torch.float32, device=cuda:0 +[05:28:11] Starting from CSV file... +[05:28:14] Splitting data into chunks of 11000... +[05:28:14] Using 7 processes across 10 chunks +[05:28:15] Creating new train/val split. +[05:28:15] Initializing scheduler with cosine schedule with warmup, warmup steps 550, total steps: 2871 +[05:28:15] Train/Val split: 100492 train, 100 val samples. +[05:28:26] Model: PeftModelForCausalLM +[05:28:26] Model config: LlamaConfig { + "architectures": [ + "LlamaForCausalLM" + ], + "attention_bias": false, + "attention_dropout": 0.0, + "bos_token_id": 128000, + "eos_token_id": 128040, + "head_dim": 128, + "hidden_act": "silu", + "hidden_size": 4096, + "initializer_range": 0.02, + "intermediate_size": 14336, + "max_position_embeddings": 131072, + "mlp_bias": false, + "model_type": "llama", + "num_attention_heads": 32, + "num_hidden_layers": 32, + "num_key_value_heads": 8, + "pretraining_tp": 1, + "quantization_config": { + "_load_in_4bit": true, + "_load_in_8bit": false, + "bnb_4bit_compute_dtype": "float16", + "bnb_4bit_quant_storage": "uint8", + "bnb_4bit_quant_type": "nf4", + "bnb_4bit_use_double_quant": true, + "llm_int8_enable_fp32_cpu_offload": false, + "llm_int8_has_fp16_weight": false, + "llm_int8_skip_modules": [ + "lm_head" + ], + "llm_int8_threshold": 6.0, + "load_in_4bit": true, + "load_in_8bit": false, + "quant_method": "bitsandbytes" + }, + "rms_norm_eps": 1e-05, + "rope_scaling": { + "factor": 8.0, + "high_freq_factor": 4.0, + "low_freq_factor": 1.0, + "original_max_position_embeddings": 8192, + "rope_type": "llama3" + }, + "rope_theta": 500000.0, + "tie_word_embeddings": false, + "torch_dtype": "float16", + "transformers_version": "4.55.2", + "use_cache": true, + "vocab_size": 128256 +} + +[05:28:26] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 +[05:28:26] +Optimizer: PagedAdamW ( +Parameter Group 0 + alpha: 0.0 + betas: (0.9, 0.95) + eps: 1e-08 + initial_lr: 3e-05 + lr: 0.0 + t_alpha: None + t_beta3: None + 
weight_decay: 0.0 +) +[05:28:26] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 +[05:28:26] Scheduler: +[05:28:26] Training on 100492 training samples, 100 validation samples +[05:28:26] Average tokens per sample: 150.00 +[05:28:26] Estimated epoch time: ~345.37 min +[05:28:26] +|===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 8060 MiB | 8993 MiB | 411433 MiB | 403373 MiB | +|---------------------------------------------------------------------------| +| Active memory | 8060 MiB | 8993 MiB | 411433 MiB | 403373 MiB | +|---------------------------------------------------------------------------| +| Requested memory | 8057 MiB | 8990 MiB | 411312 MiB | 403255 MiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 11050 MiB | 11050 MiB | 11050 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 2987 MiB | 5879 MiB | 402509 MiB | 399522 MiB | +|---------------------------------------------------------------------------| +| Allocations | 1738 | 1816 | 32748 | 31010 | +|---------------------------------------------------------------------------| +| Active allocs | 1738 | 1816 | 32748 | 31010 | +|---------------------------------------------------------------------------| +| GPU reserved segments | 84 | 84 | 84 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 96 | 96 | 13657 | 13561 | 
+|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +[05:28:26] Shuffling indices for epoch 1 with seed 1 +[05:28:26] CPU usage: 63.3%, RAM usage: 38.5% +[05:28:27] Epoch 1 learning rate: 0.0 +[05:28:27] Starting epoch 1 +[05:28:27] Batch 1: input_ids shape torch.Size([5, 150]), attention_mask shape torch.Size([5, 150]) +[05:28:30] Epoch: 1 Batch: 1/20099 (0.00%) Loss: 6.962016 LR: 0.00000000 +[05:28:32] Epoch: 1 Batch: 2/20099 (0.01%) Loss: 6.844860 LR: 0.00000000 +[05:28:35] Epoch: 1 Batch: 3/20099 (0.01%) Loss: 6.896306 LR: 0.00000000 +[05:28:38] Epoch: 1 Batch: 4/20099 (0.02%) Loss: 6.635071 LR: 0.00000000 +[05:28:41] Epoch: 1 Batch: 5/20099 (0.02%) Loss: 7.060008 LR: 0.00000000 +[05:28:44] Epoch: 1 Batch: 6/20099 (0.03%) Loss: 7.106397 LR: 0.00000000 +[05:28:47] Epoch: 1 Batch: 7/20099 (0.03%) Loss: 6.829549 LR: 0.00000005 +[05:28:49] Epoch: 1 Batch: 8/20099 (0.04%) Loss: 6.929935 LR: 0.00000005 +[05:28:52] Epoch: 1 Batch: 9/20099 (0.04%) Loss: 6.916800 LR: 0.00000005 +[05:28:55] Epoch: 1 Batch: 10/20099 (0.05%) Loss: 7.042025 LR: 0.00000005 +[05:28:58] Epoch: 1 Batch: 11/20099 (0.05%) Loss: 7.277932 LR: 0.00000005 +[05:29:01] Epoch: 1 Batch: 12/20099 (0.06%) Loss: 6.509862 LR: 0.00000005 +[05:29:04] Epoch: 1 Batch: 13/20099 (0.06%) Loss: 6.894521 LR: 0.00000005 +[05:29:06] Epoch: 1 Batch: 14/20099 (0.07%) Loss: 6.918648 LR: 0.00000011 +[05:29:09] Epoch: 1 Batch: 15/20099 (0.07%) Loss: 6.698418 LR: 0.00000011 +[05:29:12] Epoch: 1 Batch: 16/20099 (0.08%) Loss: 6.752904 LR: 0.00000011 +[05:29:15] Epoch: 1 Batch: 17/20099 (0.08%) Loss: 6.594116 LR: 0.00000011 +[05:29:18] Epoch: 1 Batch: 18/20099 (0.09%) Loss: 7.069793 LR: 0.00000011 +[05:29:21] Epoch: 1 Batch: 19/20099 (0.09%) Loss: 6.812260 LR: 
0.00000011 +[05:29:24] Epoch: 1 Batch: 20/20099 (0.10%) Loss: 6.718220 LR: 0.00000011 +[05:29:27] Epoch: 1 Batch: 21/20099 (0.10%) Loss: 6.796827 LR: 0.00000016 +[05:29:30] Epoch: 1 Batch: 22/20099 (0.11%) Loss: 6.767028 LR: 0.00000016 +[05:29:33] Epoch: 1 Batch: 23/20099 (0.11%) Loss: 6.944226 LR: 0.00000016 +[05:29:36] Epoch: 1 Batch: 24/20099 (0.12%) Loss: 7.006477 LR: 0.00000016 +[05:29:38] Epoch: 1 Batch: 25/20099 (0.12%) Loss: 6.673720 LR: 0.00000016 +[05:29:41] Epoch: 1 Batch: 26/20099 (0.13%) Loss: 6.881500 LR: 0.00000016 +[05:29:44] Epoch: 1 Batch: 27/20099 (0.13%) Loss: 7.197294 LR: 0.00000016 +[05:29:47] Epoch: 1 Batch: 28/20099 (0.14%) Loss: 6.888683 LR: 0.00000022 +[05:29:50] Epoch: 1 Batch: 29/20099 (0.14%) Loss: 6.827078 LR: 0.00000022 +[05:29:53] Epoch: 1 Batch: 30/20099 (0.15%) Loss: 6.696512 LR: 0.00000022 +[05:29:56] Epoch: 1 Batch: 31/20099 (0.15%) Loss: 7.060051 LR: 0.00000022 +[05:29:59] Epoch: 1 Batch: 32/20099 (0.16%) Loss: 6.822063 LR: 0.00000022 +[05:30:02] Epoch: 1 Batch: 33/20099 (0.16%) Loss: 6.663684 LR: 0.00000022 +[05:30:05] Epoch: 1 Batch: 34/20099 (0.17%) Loss: 6.978521 LR: 0.00000022 +[05:30:08] Epoch: 1 Batch: 35/20099 (0.17%) Loss: 6.726797 LR: 0.00000027 +[05:30:11] Epoch: 1 Batch: 36/20099 (0.18%) Loss: 6.278772 LR: 0.00000027 +[05:30:14] Epoch: 1 Batch: 37/20099 (0.18%) Loss: 7.222415 LR: 0.00000027 +[05:30:17] Epoch: 1 Batch: 38/20099 (0.19%) Loss: 6.191016 LR: 0.00000027 +[05:30:20] Epoch: 1 Batch: 39/20099 (0.19%) Loss: 7.164665 LR: 0.00000027 +[05:30:23] Epoch: 1 Batch: 40/20099 (0.20%) Loss: 6.672603 LR: 0.00000027 +[05:30:26] Epoch: 1 Batch: 41/20099 (0.20%) Loss: 6.795218 LR: 0.00000027 +[05:30:29] Epoch: 1 Batch: 42/20099 (0.21%) Loss: 6.736525 LR: 0.00000033 +[05:30:32] Epoch: 1 Batch: 43/20099 (0.21%) Loss: 7.278892 LR: 0.00000033 +[05:30:35] Epoch: 1 Batch: 44/20099 (0.22%) Loss: 6.782112 LR: 0.00000033 +[05:30:38] Epoch: 1 Batch: 45/20099 (0.22%) Loss: 6.758891 LR: 0.00000033 +[05:30:41] Epoch: 1 Batch: 46/20099 
(0.23%) Loss: 6.826479 LR: 0.00000033 +[05:30:44] Epoch: 1 Batch: 47/20099 (0.23%) Loss: 6.418891 LR: 0.00000033 +[05:30:47] Epoch: 1 Batch: 48/20099 (0.24%) Loss: 6.771644 LR: 0.00000033 +[05:30:50] Epoch: 1 Batch: 49/20099 (0.24%) Loss: 6.725226 LR: 0.00000038 +[05:30:53] Epoch: 1 Batch: 50/20099 (0.25%) Loss: 7.178868 LR: 0.00000038 +[05:30:56] Epoch: 1 Batch: 51/20099 (0.25%) Loss: 6.895179 LR: 0.00000038 +[05:30:59] Epoch: 1 Batch: 52/20099 (0.26%) Loss: 6.431315 LR: 0.00000038 +[05:31:02] Epoch: 1 Batch: 53/20099 (0.26%) Loss: 6.736614 LR: 0.00000038 +[05:31:05] Epoch: 1 Batch: 54/20099 (0.27%) Loss: 6.815797 LR: 0.00000038 +[05:31:08] Epoch: 1 Batch: 55/20099 (0.27%) Loss: 6.614000 LR: 0.00000038 +[05:31:11] Epoch: 1 Batch: 56/20099 (0.28%) Loss: 6.972163 LR: 0.00000044 +[05:31:14] Epoch: 1 Batch: 57/20099 (0.28%) Loss: 6.965045 LR: 0.00000044 +[05:31:17] Epoch: 1 Batch: 58/20099 (0.29%) Loss: 7.045127 LR: 0.00000044 +[05:31:20] Epoch: 1 Batch: 59/20099 (0.29%) Loss: 6.983873 LR: 0.00000044 +[05:31:23] Epoch: 1 Batch: 60/20099 (0.30%) Loss: 6.855827 LR: 0.00000044 +[05:31:26] Epoch: 1 Batch: 61/20099 (0.30%) Loss: 6.770656 LR: 0.00000044 +[05:31:29] Epoch: 1 Batch: 62/20099 (0.31%) Loss: 6.871757 LR: 0.00000044 +[05:31:33] Epoch: 1 Batch: 63/20099 (0.31%) Loss: 6.716748 LR: 0.00000049 +[05:31:36] Epoch: 1 Batch: 64/20099 (0.32%) Loss: 6.837216 LR: 0.00000049 +[05:31:39] Epoch: 1 Batch: 65/20099 (0.32%) Loss: 7.155727 LR: 0.00000049 +[05:31:42] Epoch: 1 Batch: 66/20099 (0.33%) Loss: 6.878810 LR: 0.00000049 +[05:31:45] Epoch: 1 Batch: 67/20099 (0.33%) Loss: 6.990387 LR: 0.00000049 +[05:31:48] Epoch: 1 Batch: 68/20099 (0.34%) Loss: 6.873594 LR: 0.00000049 +[05:31:51] Epoch: 1 Batch: 69/20099 (0.34%) Loss: 6.576877 LR: 0.00000049 +[05:31:54] Epoch: 1 Batch: 70/20099 (0.35%) Loss: 6.757781 LR: 0.00000055 +[05:31:57] Epoch: 1 Batch: 71/20099 (0.35%) Loss: 6.604476 LR: 0.00000055 +[05:32:00] Epoch: 1 Batch: 72/20099 (0.36%) Loss: 6.932801 LR: 0.00000055 +[05:32:03] 
Epoch: 1 Batch: 73/20099 (0.36%) Loss: 6.624389 LR: 0.00000055 +[05:32:06] Epoch: 1 Batch: 74/20099 (0.37%) Loss: 6.993600 LR: 0.00000055 +[05:32:09] Epoch: 1 Batch: 75/20099 (0.37%) Loss: 6.991932 LR: 0.00000055 +[05:32:12] Epoch: 1 Batch: 76/20099 (0.38%) Loss: 6.722564 LR: 0.00000055 +[05:32:15] Epoch: 1 Batch: 77/20099 (0.38%) Loss: 7.035213 LR: 0.00000060 +[05:32:19] Epoch: 1 Batch: 78/20099 (0.39%) Loss: 6.918113 LR: 0.00000060 +[05:32:22] Epoch: 1 Batch: 79/20099 (0.39%) Loss: 6.459154 LR: 0.00000060 +[05:32:25] Epoch: 1 Batch: 80/20099 (0.40%) Loss: 6.856339 LR: 0.00000060 +[05:32:28] Epoch: 1 Batch: 81/20099 (0.40%) Loss: 6.846845 LR: 0.00000060 +[05:32:31] Epoch: 1 Batch: 82/20099 (0.41%) Loss: 6.864122 LR: 0.00000060 +[05:32:34] Epoch: 1 Batch: 83/20099 (0.41%) Loss: 6.803262 LR: 0.00000060 +[05:32:37] Epoch: 1 Batch: 84/20099 (0.42%) Loss: 6.907727 LR: 0.00000065 +[05:32:40] Epoch: 1 Batch: 85/20099 (0.42%) Loss: 6.762291 LR: 0.00000065 +[05:32:44] Epoch: 1 Batch: 86/20099 (0.43%) Loss: 6.806541 LR: 0.00000065 +[05:32:47] Epoch: 1 Batch: 87/20099 (0.43%) Loss: 6.475651 LR: 0.00000065 +[05:32:50] Epoch: 1 Batch: 88/20099 (0.44%) Loss: 6.600870 LR: 0.00000065 +[05:32:53] Epoch: 1 Batch: 89/20099 (0.44%) Loss: 6.621047 LR: 0.00000065 +[05:32:56] Epoch: 1 Batch: 90/20099 (0.45%) Loss: 6.845604 LR: 0.00000065 +[05:32:59] Epoch: 1 Batch: 91/20099 (0.45%) Loss: 6.564170 LR: 0.00000071 +[05:33:02] Epoch: 1 Batch: 92/20099 (0.46%) Loss: 6.514664 LR: 0.00000071 +[05:33:06] Epoch: 1 Batch: 93/20099 (0.46%) Loss: 6.489383 LR: 0.00000071 +[05:33:09] Epoch: 1 Batch: 94/20099 (0.47%) Loss: 6.591108 LR: 0.00000071 +[05:33:12] Epoch: 1 Batch: 95/20099 (0.47%) Loss: 6.543421 LR: 0.00000071 +[05:33:15] Epoch: 1 Batch: 96/20099 (0.48%) Loss: 6.640646 LR: 0.00000071 +[05:33:18] Epoch: 1 Batch: 97/20099 (0.48%) Loss: 6.666123 LR: 0.00000071 +[05:33:21] Epoch: 1 Batch: 98/20099 (0.49%) Loss: 6.663676 LR: 0.00000076 +[05:33:24] Epoch: 1 Batch: 99/20099 (0.49%) Loss: 7.137365 
LR: 0.00000076 +[05:33:27] Epoch: 1 Batch: 100/20099 (0.50%) Loss: 6.723222 LR: 0.00000076 +[05:33:30] Epoch: 1 Batch: 101/20099 (0.50%) Loss: 6.652589 LR: 0.00000076 +[05:33:33] Epoch: 1 Batch: 102/20099 (0.51%) Loss: 6.725030 LR: 0.00000076 +[05:33:36] Epoch: 1 Batch: 103/20099 (0.51%) Loss: 6.637894 LR: 0.00000076 +[05:33:40] Epoch: 1 Batch: 104/20099 (0.52%) Loss: 6.814937 LR: 0.00000076 +[05:33:43] Epoch: 1 Batch: 105/20099 (0.52%) Loss: 6.486751 LR: 0.00000082 +[05:33:46] Epoch: 1 Batch: 106/20099 (0.53%) Loss: 6.497531 LR: 0.00000082 +[05:33:49] Epoch: 1 Batch: 107/20099 (0.53%) Loss: 6.509476 LR: 0.00000082 +[05:33:52] Epoch: 1 Batch: 108/20099 (0.54%) Loss: 6.648261 LR: 0.00000082 +[05:33:55] Epoch: 1 Batch: 109/20099 (0.54%) Loss: 6.239590 LR: 0.00000082 +[05:33:58] Epoch: 1 Batch: 110/20099 (0.55%) Loss: 6.686769 LR: 0.00000082 +[05:34:01] Epoch: 1 Batch: 111/20099 (0.55%) Loss: 6.750135 LR: 0.00000082 +[05:34:04] Epoch: 1 Batch: 112/20099 (0.56%) Loss: 7.052633 LR: 0.00000087 +[05:34:08] Epoch: 1 Batch: 113/20099 (0.56%) Loss: 6.514356 LR: 0.00000087 +[05:34:11] Epoch: 1 Batch: 114/20099 (0.57%) Loss: 6.400832 LR: 0.00000087 +[05:34:14] Epoch: 1 Batch: 115/20099 (0.57%) Loss: 6.413975 LR: 0.00000087 +[05:34:17] Epoch: 1 Batch: 116/20099 (0.58%) Loss: 6.430995 LR: 0.00000087 +[05:34:20] Epoch: 1 Batch: 117/20099 (0.58%) Loss: 6.863123 LR: 0.00000087 +[05:34:23] Epoch: 1 Batch: 118/20099 (0.59%) Loss: 6.755737 LR: 0.00000087 +[05:34:26] Epoch: 1 Batch: 119/20099 (0.59%) Loss: 6.394816 LR: 0.00000093 +[05:34:29] Epoch: 1 Batch: 120/20099 (0.60%) Loss: 6.555242 LR: 0.00000093 +[05:34:32] Epoch: 1 Batch: 121/20099 (0.60%) Loss: 6.644229 LR: 0.00000093 +[05:34:36] Epoch: 1 Batch: 122/20099 (0.61%) Loss: 6.372416 LR: 0.00000093 +[05:34:39] Epoch: 1 Batch: 123/20099 (0.61%) Loss: 6.253324 LR: 0.00000093 +[05:34:42] Epoch: 1 Batch: 124/20099 (0.62%) Loss: 6.435319 LR: 0.00000093 +[05:34:45] Epoch: 1 Batch: 125/20099 (0.62%) Loss: 6.674481 LR: 0.00000093 
+[05:34:48] Epoch: 1 Batch: 126/20099 (0.63%) Loss: 6.503188 LR: 0.00000098 +[05:34:51] Epoch: 1 Batch: 127/20099 (0.63%) Loss: 6.616235 LR: 0.00000098 +[05:34:54] Epoch: 1 Batch: 128/20099 (0.64%) Loss: 6.335257 LR: 0.00000098 +[05:34:57] Epoch: 1 Batch: 129/20099 (0.64%) Loss: 6.502963 LR: 0.00000098 +[05:35:00] Epoch: 1 Batch: 130/20099 (0.65%) Loss: 6.429163 LR: 0.00000098 +[05:35:04] Epoch: 1 Batch: 131/20099 (0.65%) Loss: 6.404885 LR: 0.00000098 +[05:35:07] Epoch: 1 Batch: 132/20099 (0.66%) Loss: 6.518302 LR: 0.00000098 +[05:35:10] Epoch: 1 Batch: 133/20099 (0.66%) Loss: 6.721867 LR: 0.00000104 +[05:35:13] Epoch: 1 Batch: 134/20099 (0.67%) Loss: 6.334186 LR: 0.00000104 +[05:35:16] Epoch: 1 Batch: 135/20099 (0.67%) Loss: 6.711968 LR: 0.00000104 +[05:35:19] Epoch: 1 Batch: 136/20099 (0.68%) Loss: 6.439227 LR: 0.00000104 +[05:35:22] Epoch: 1 Batch: 137/20099 (0.68%) Loss: 6.490163 LR: 0.00000104 +[05:35:25] Epoch: 1 Batch: 138/20099 (0.69%) Loss: 6.182080 LR: 0.00000104 +[05:35:28] Epoch: 1 Batch: 139/20099 (0.69%) Loss: 6.645146 LR: 0.00000104 +[05:35:32] Epoch: 1 Batch: 140/20099 (0.70%) Loss: 6.454726 LR: 0.00000109 +[05:35:35] Epoch: 1 Batch: 141/20099 (0.70%) Loss: 6.389693 LR: 0.00000109 +[05:35:38] Epoch: 1 Batch: 142/20099 (0.71%) Loss: 6.573570 LR: 0.00000109 +[05:35:41] Epoch: 1 Batch: 143/20099 (0.71%) Loss: 6.421232 LR: 0.00000109 +[05:35:44] Epoch: 1 Batch: 144/20099 (0.72%) Loss: 6.541822 LR: 0.00000109 +[05:35:47] Epoch: 1 Batch: 145/20099 (0.72%) Loss: 6.197926 LR: 0.00000109 +[05:35:50] Epoch: 1 Batch: 146/20099 (0.73%) Loss: 6.510123 LR: 0.00000109 +[05:35:53] Epoch: 1 Batch: 147/20099 (0.73%) Loss: 6.573685 LR: 0.00000115 +[05:35:56] Epoch: 1 Batch: 148/20099 (0.74%) Loss: 6.656434 LR: 0.00000115 +[05:36:00] Epoch: 1 Batch: 149/20099 (0.74%) Loss: 6.376676 LR: 0.00000115 +[05:36:03] Epoch: 1 Batch: 150/20099 (0.75%) Loss: 6.215777 LR: 0.00000115 +[05:36:06] Epoch: 1 Batch: 151/20099 (0.75%) Loss: 6.363476 LR: 0.00000115 +[05:36:09] Epoch: 1 
Batch: 152/20099 (0.76%) Loss: 6.620334 LR: 0.00000115 +[05:36:12] Epoch: 1 Batch: 153/20099 (0.76%) Loss: 6.468755 LR: 0.00000115 +[05:36:15] Epoch: 1 Batch: 154/20099 (0.77%) Loss: 6.322797 LR: 0.00000120 +[05:36:18] Epoch: 1 Batch: 155/20099 (0.77%) Loss: 6.676209 LR: 0.00000120 +[05:36:21] Epoch: 1 Batch: 156/20099 (0.78%) Loss: 6.065999 LR: 0.00000120 +[05:36:24] Epoch: 1 Batch: 157/20099 (0.78%) Loss: 6.779493 LR: 0.00000120 +[05:36:28] Epoch: 1 Batch: 158/20099 (0.79%) Loss: 6.407558 LR: 0.00000120 +[05:36:31] Epoch: 1 Batch: 159/20099 (0.79%) Loss: 6.315716 LR: 0.00000120 +[05:36:34] Epoch: 1 Batch: 160/20099 (0.80%) Loss: 6.164629 LR: 0.00000120 +[05:36:37] Epoch: 1 Batch: 161/20099 (0.80%) Loss: 6.435280 LR: 0.00000125 +[05:36:40] Epoch: 1 Batch: 162/20099 (0.81%) Loss: 6.606733 LR: 0.00000125 +[05:36:43] Epoch: 1 Batch: 163/20099 (0.81%) Loss: 6.095863 LR: 0.00000125 +[05:36:46] Epoch: 1 Batch: 164/20099 (0.82%) Loss: 6.013640 LR: 0.00000125 +[05:36:49] Epoch: 1 Batch: 165/20099 (0.82%) Loss: 6.524360 LR: 0.00000125 +[05:36:52] Epoch: 1 Batch: 166/20099 (0.83%) Loss: 6.084081 LR: 0.00000125 +[05:36:56] Epoch: 1 Batch: 167/20099 (0.83%) Loss: 6.428363 LR: 0.00000125 +[05:36:59] Epoch: 1 Batch: 168/20099 (0.84%) Loss: 6.342831 LR: 0.00000131 +[05:37:02] Epoch: 1 Batch: 169/20099 (0.84%) Loss: 6.097524 LR: 0.00000131 +[05:37:05] Epoch: 1 Batch: 170/20099 (0.85%) Loss: 6.208304 LR: 0.00000131 +[05:37:08] Epoch: 1 Batch: 171/20099 (0.85%) Loss: 6.293879 LR: 0.00000131 +[05:37:11] Epoch: 1 Batch: 172/20099 (0.86%) Loss: 6.507888 LR: 0.00000131 +[05:37:14] Epoch: 1 Batch: 173/20099 (0.86%) Loss: 6.448502 LR: 0.00000131 +[05:37:17] Epoch: 1 Batch: 174/20099 (0.87%) Loss: 6.333720 LR: 0.00000131 +[05:37:20] Epoch: 1 Batch: 175/20099 (0.87%) Loss: 5.815619 LR: 0.00000136 +[05:37:24] Epoch: 1 Batch: 176/20099 (0.88%) Loss: 5.965688 LR: 0.00000136 +[05:37:27] Epoch: 1 Batch: 177/20099 (0.88%) Loss: 6.505473 LR: 0.00000136 +[05:37:30] Epoch: 1 Batch: 178/20099 
(0.89%) Loss: 6.120909 LR: 0.00000136 +[05:37:33] Epoch: 1 Batch: 179/20099 (0.89%) Loss: 6.296700 LR: 0.00000136 +[05:37:36] Epoch: 1 Batch: 180/20099 (0.90%) Loss: 6.364547 LR: 0.00000136 +[05:37:39] Epoch: 1 Batch: 181/20099 (0.90%) Loss: 6.454642 LR: 0.00000136 +[05:37:42] Epoch: 1 Batch: 182/20099 (0.91%) Loss: 6.215357 LR: 0.00000142 +[05:37:45] Epoch: 1 Batch: 183/20099 (0.91%) Loss: 6.249943 LR: 0.00000142 +[05:37:48] Epoch: 1 Batch: 184/20099 (0.92%) Loss: 6.188128 LR: 0.00000142 +[05:37:52] Epoch: 1 Batch: 185/20099 (0.92%) Loss: 6.247033 LR: 0.00000142 +[05:37:55] Epoch: 1 Batch: 186/20099 (0.93%) Loss: 6.516192 LR: 0.00000142 +[05:37:58] Epoch: 1 Batch: 187/20099 (0.93%) Loss: 6.236046 LR: 0.00000142 +[05:38:01] Epoch: 1 Batch: 188/20099 (0.94%) Loss: 5.909803 LR: 0.00000142 +[05:38:04] Epoch: 1 Batch: 189/20099 (0.94%) Loss: 6.254087 LR: 0.00000147 +[05:38:07] Epoch: 1 Batch: 190/20099 (0.95%) Loss: 6.308969 LR: 0.00000147 +[05:38:10] Epoch: 1 Batch: 191/20099 (0.95%) Loss: 6.102310 LR: 0.00000147 +[05:38:13] Epoch: 1 Batch: 192/20099 (0.96%) Loss: 5.751007 LR: 0.00000147 +[05:38:16] Epoch: 1 Batch: 193/20099 (0.96%) Loss: 6.190647 LR: 0.00000147 +[05:38:20] Epoch: 1 Batch: 194/20099 (0.97%) Loss: 5.969916 LR: 0.00000147 +[05:38:23] Epoch: 1 Batch: 195/20099 (0.97%) Loss: 6.395854 LR: 0.00000147 +[05:38:26] Epoch: 1 Batch: 196/20099 (0.98%) Loss: 6.385301 LR: 0.00000153 +[05:38:29] Epoch: 1 Batch: 197/20099 (0.98%) Loss: 5.855777 LR: 0.00000153 +[05:38:32] Epoch: 1 Batch: 198/20099 (0.99%) Loss: 5.975441 LR: 0.00000153 +[05:38:35] Epoch: 1 Batch: 199/20099 (0.99%) Loss: 6.063916 LR: 0.00000153 +[05:38:42] >> Temp checkpoint saved: epoch1_step200, size: 0.1693 GB +[05:38:42] Epoch: 1 Batch: 200/20099 (1.00%) Loss: 5.698818 LR: 0.00000153 +[05:38:45] Epoch: 1 Batch: 201/20099 (1.00%) Loss: 6.142428 LR: 0.00000153 +[05:38:48] Epoch: 1 Batch: 202/20099 (1.01%) Loss: 6.221570 LR: 0.00000153 +[05:38:51] Epoch: 1 Batch: 203/20099 (1.01%) Loss: 6.094434 LR: 
0.00000158 +[05:38:54] Epoch: 1 Batch: 204/20099 (1.01%) Loss: 6.072093 LR: 0.00000158 +[05:38:57] Epoch: 1 Batch: 205/20099 (1.02%) Loss: 6.169077 LR: 0.00000158 +[05:39:01] Epoch: 1 Batch: 206/20099 (1.02%) Loss: 6.286780 LR: 0.00000158 +[05:39:04] Epoch: 1 Batch: 207/20099 (1.03%) Loss: 6.379415 LR: 0.00000158 +[05:39:07] Epoch: 1 Batch: 208/20099 (1.03%) Loss: 6.125323 LR: 0.00000158 +[05:39:10] Epoch: 1 Batch: 209/20099 (1.04%) Loss: 6.244865 LR: 0.00000158 +[05:39:13] Epoch: 1 Batch: 210/20099 (1.04%) Loss: 6.171168 LR: 0.00000164 +[05:39:16] Epoch: 1 Batch: 211/20099 (1.05%) Loss: 6.062247 LR: 0.00000164 +[05:39:19] Epoch: 1 Batch: 212/20099 (1.05%) Loss: 5.894752 LR: 0.00000164 +[05:39:22] Epoch: 1 Batch: 213/20099 (1.06%) Loss: 6.162818 LR: 0.00000164 +[05:39:25] Epoch: 1 Batch: 214/20099 (1.06%) Loss: 5.833673 LR: 0.00000164 +[05:39:29] Epoch: 1 Batch: 215/20099 (1.07%) Loss: 6.039549 LR: 0.00000164 +[05:39:32] Epoch: 1 Batch: 216/20099 (1.07%) Loss: 5.770587 LR: 0.00000164 +[05:39:35] Epoch: 1 Batch: 217/20099 (1.08%) Loss: 5.787209 LR: 0.00000169 +[05:39:38] Epoch: 1 Batch: 218/20099 (1.08%) Loss: 5.869362 LR: 0.00000169 +[05:39:41] Epoch: 1 Batch: 219/20099 (1.09%) Loss: 5.754841 LR: 0.00000169 +[05:39:44] Epoch: 1 Batch: 220/20099 (1.09%) Loss: 5.975692 LR: 0.00000169 +[05:39:47] Epoch: 1 Batch: 221/20099 (1.10%) Loss: 6.191529 LR: 0.00000169 +[05:39:50] Epoch: 1 Batch: 222/20099 (1.10%) Loss: 5.850366 LR: 0.00000169 +[05:39:53] Epoch: 1 Batch: 223/20099 (1.11%) Loss: 5.753388 LR: 0.00000169 +[05:39:57] Epoch: 1 Batch: 224/20099 (1.11%) Loss: 5.992358 LR: 0.00000175 +[05:40:00] Epoch: 1 Batch: 225/20099 (1.12%) Loss: 6.066387 LR: 0.00000175 +[05:40:03] Epoch: 1 Batch: 226/20099 (1.12%) Loss: 5.618329 LR: 0.00000175 +[05:40:06] Epoch: 1 Batch: 227/20099 (1.13%) Loss: 6.136591 LR: 0.00000175 +[05:40:09] Epoch: 1 Batch: 228/20099 (1.13%) Loss: 6.188446 LR: 0.00000175 +[05:40:12] Epoch: 1 Batch: 229/20099 (1.14%) Loss: 6.199264 LR: 0.00000175 +[05:40:15] 
Epoch: 1 Batch: 230/20099 (1.14%) Loss: 5.657817 LR: 0.00000175 +[05:40:18] Epoch: 1 Batch: 231/20099 (1.15%) Loss: 5.634850 LR: 0.00000180 +[05:40:21] Epoch: 1 Batch: 232/20099 (1.15%) Loss: 5.713538 LR: 0.00000180 +[05:40:24] Epoch: 1 Batch: 233/20099 (1.16%) Loss: 5.442693 LR: 0.00000180 +[05:40:27] Epoch: 1 Batch: 234/20099 (1.16%) Loss: 5.468944 LR: 0.00000180 +[05:40:31] Epoch: 1 Batch: 235/20099 (1.17%) Loss: 5.735326 LR: 0.00000180 +[05:40:34] Epoch: 1 Batch: 236/20099 (1.17%) Loss: 5.961711 LR: 0.00000180 +[05:40:37] Epoch: 1 Batch: 237/20099 (1.18%) Loss: 5.912184 LR: 0.00000180 +[05:40:40] Epoch: 1 Batch: 238/20099 (1.18%) Loss: 5.762606 LR: 0.00000185 +[05:40:43] Epoch: 1 Batch: 239/20099 (1.19%) Loss: 5.803808 LR: 0.00000185 +[05:40:46] Epoch: 1 Batch: 240/20099 (1.19%) Loss: 5.470053 LR: 0.00000185 +[05:40:49] Epoch: 1 Batch: 241/20099 (1.20%) Loss: 6.077574 LR: 0.00000185 +[05:40:52] Epoch: 1 Batch: 242/20099 (1.20%) Loss: 5.852667 LR: 0.00000185 +[05:40:55] Epoch: 1 Batch: 243/20099 (1.21%) Loss: 5.655321 LR: 0.00000185 +[05:40:58] Epoch: 1 Batch: 244/20099 (1.21%) Loss: 5.610448 LR: 0.00000185 +[05:41:02] Epoch: 1 Batch: 245/20099 (1.22%) Loss: 5.818143 LR: 0.00000191 +[05:41:05] Epoch: 1 Batch: 246/20099 (1.22%) Loss: 5.807350 LR: 0.00000191 +[05:41:08] Epoch: 1 Batch: 247/20099 (1.23%) Loss: 5.619134 LR: 0.00000191 +[05:41:11] Epoch: 1 Batch: 248/20099 (1.23%) Loss: 5.859716 LR: 0.00000191 +[05:41:14] Epoch: 1 Batch: 249/20099 (1.24%) Loss: 5.755089 LR: 0.00000191 +[05:41:17] Epoch: 1 Batch: 250/20099 (1.24%) Loss: 5.934231 LR: 0.00000191 +[05:41:20] Epoch: 1 Batch: 251/20099 (1.25%) Loss: 5.471463 LR: 0.00000191 +[05:41:23] Epoch: 1 Batch: 252/20099 (1.25%) Loss: 5.868921 LR: 0.00000196 +[05:41:26] Epoch: 1 Batch: 253/20099 (1.26%) Loss: 5.466228 LR: 0.00000196 +[05:41:29] Epoch: 1 Batch: 254/20099 (1.26%) Loss: 5.504995 LR: 0.00000196 +[05:41:33] Epoch: 1 Batch: 255/20099 (1.27%) Loss: 5.788034 LR: 0.00000196 +[05:41:36] Epoch: 1 Batch: 
256/20099 (1.27%) Loss: 5.667752 LR: 0.00000196 +[05:41:39] Epoch: 1 Batch: 257/20099 (1.28%) Loss: 5.331821 LR: 0.00000196 +[05:41:42] Epoch: 1 Batch: 258/20099 (1.28%) Loss: 5.674621 LR: 0.00000196 +[05:41:45] Epoch: 1 Batch: 259/20099 (1.29%) Loss: 5.657183 LR: 0.00000202 +[05:41:48] Epoch: 1 Batch: 260/20099 (1.29%) Loss: 5.716599 LR: 0.00000202 +[05:41:51] Epoch: 1 Batch: 261/20099 (1.30%) Loss: 5.072574 LR: 0.00000202 +[05:41:54] Epoch: 1 Batch: 262/20099 (1.30%) Loss: 5.573281 LR: 0.00000202 +[05:41:57] Epoch: 1 Batch: 263/20099 (1.31%) Loss: 5.770137 LR: 0.00000202 +[05:42:00] Epoch: 1 Batch: 264/20099 (1.31%) Loss: 5.341028 LR: 0.00000202 +[05:42:04] Epoch: 1 Batch: 265/20099 (1.32%) Loss: 5.450570 LR: 0.00000202 +[05:42:07] Epoch: 1 Batch: 266/20099 (1.32%) Loss: 5.163337 LR: 0.00000207 +[05:42:10] Epoch: 1 Batch: 267/20099 (1.33%) Loss: 5.341553 LR: 0.00000207 +[05:42:13] Epoch: 1 Batch: 268/20099 (1.33%) Loss: 5.937451 LR: 0.00000207 +[05:42:16] Epoch: 1 Batch: 269/20099 (1.34%) Loss: 5.385732 LR: 0.00000207 +[05:42:19] Epoch: 1 Batch: 270/20099 (1.34%) Loss: 5.646508 LR: 0.00000207 +[05:42:22] Epoch: 1 Batch: 271/20099 (1.35%) Loss: 5.687840 LR: 0.00000207 +[05:42:25] Epoch: 1 Batch: 272/20099 (1.35%) Loss: 5.411791 LR: 0.00000207 +[05:42:29] Epoch: 1 Batch: 273/20099 (1.36%) Loss: 5.368639 LR: 0.00000213 +[05:42:32] Epoch: 1 Batch: 274/20099 (1.36%) Loss: 5.434786 LR: 0.00000213 +[05:42:35] Epoch: 1 Batch: 275/20099 (1.37%) Loss: 5.362719 LR: 0.00000213 +[05:42:38] Epoch: 1 Batch: 276/20099 (1.37%) Loss: 5.543468 LR: 0.00000213 +[05:42:41] Epoch: 1 Batch: 277/20099 (1.38%) Loss: 5.233136 LR: 0.00000213 +[05:42:44] Epoch: 1 Batch: 278/20099 (1.38%) Loss: 5.227110 LR: 0.00000213 +[05:42:47] Epoch: 1 Batch: 279/20099 (1.39%) Loss: 5.556291 LR: 0.00000213 +[05:42:50] Epoch: 1 Batch: 280/20099 (1.39%) Loss: 5.393062 LR: 0.00000218 +[05:42:53] Epoch: 1 Batch: 281/20099 (1.40%) Loss: 5.804086 LR: 0.00000218 +[05:42:57] Epoch: 1 Batch: 282/20099 (1.40%) Loss: 
5.275065 LR: 0.00000218 +[05:43:00] Epoch: 1 Batch: 283/20099 (1.41%) Loss: 5.461874 LR: 0.00000218 +[05:43:03] Epoch: 1 Batch: 284/20099 (1.41%) Loss: 5.395660 LR: 0.00000218 +[05:43:06] Epoch: 1 Batch: 285/20099 (1.42%) Loss: 5.600826 LR: 0.00000218 +[05:43:09] Epoch: 1 Batch: 286/20099 (1.42%) Loss: 5.379600 LR: 0.00000218 +[05:43:12] Epoch: 1 Batch: 287/20099 (1.43%) Loss: 5.399331 LR: 0.00000224 +[05:43:15] Epoch: 1 Batch: 288/20099 (1.43%) Loss: 5.327848 LR: 0.00000224 +[05:43:18] Epoch: 1 Batch: 289/20099 (1.44%) Loss: 5.226300 LR: 0.00000224 +[05:43:21] Epoch: 1 Batch: 290/20099 (1.44%) Loss: 5.193249 LR: 0.00000224 +[05:43:25] Epoch: 1 Batch: 291/20099 (1.45%) Loss: 4.893579 LR: 0.00000224 +[05:43:28] Epoch: 1 Batch: 292/20099 (1.45%) Loss: 5.189096 LR: 0.00000224 +[05:43:31] Epoch: 1 Batch: 293/20099 (1.46%) Loss: 5.457005 LR: 0.00000224 +[05:43:34] Epoch: 1 Batch: 294/20099 (1.46%) Loss: 5.486538 LR: 0.00000229 +[05:43:37] Epoch: 1 Batch: 295/20099 (1.47%) Loss: 5.685827 LR: 0.00000229 +[05:43:40] Epoch: 1 Batch: 296/20099 (1.47%) Loss: 5.005714 LR: 0.00000229 +[05:43:43] Epoch: 1 Batch: 297/20099 (1.48%) Loss: 5.061221 LR: 0.00000229 +[05:43:46] Epoch: 1 Batch: 298/20099 (1.48%) Loss: 5.398332 LR: 0.00000229 +[05:43:49] Epoch: 1 Batch: 299/20099 (1.49%) Loss: 5.169147 LR: 0.00000229 +[05:43:53] Epoch: 1 Batch: 300/20099 (1.49%) Loss: 5.401959 LR: 0.00000229 +[05:43:56] Epoch: 1 Batch: 301/20099 (1.50%) Loss: 5.609097 LR: 0.00000235 +[05:43:59] Epoch: 1 Batch: 302/20099 (1.50%) Loss: 5.090550 LR: 0.00000235 +[05:44:02] Epoch: 1 Batch: 303/20099 (1.51%) Loss: 5.435633 LR: 0.00000235 +[05:44:05] Epoch: 1 Batch: 304/20099 (1.51%) Loss: 4.977289 LR: 0.00000235 +[05:44:08] Epoch: 1 Batch: 305/20099 (1.52%) Loss: 5.232569 LR: 0.00000235 +[05:44:11] Epoch: 1 Batch: 306/20099 (1.52%) Loss: 5.072268 LR: 0.00000235 +[05:44:14] Epoch: 1 Batch: 307/20099 (1.53%) Loss: 5.037082 LR: 0.00000235 +[05:44:17] Epoch: 1 Batch: 308/20099 (1.53%) Loss: 5.163113 LR: 0.00000240 
+[05:44:21] Epoch: 1 Batch: 309/20099 (1.54%) Loss: 5.293965 LR: 0.00000240 +[05:44:24] Epoch: 1 Batch: 310/20099 (1.54%) Loss: 5.232081 LR: 0.00000240 +[05:44:27] Epoch: 1 Batch: 311/20099 (1.55%) Loss: 4.839522 LR: 0.00000240 +[05:44:30] Epoch: 1 Batch: 312/20099 (1.55%) Loss: 5.036047 LR: 0.00000240 +[05:44:33] Epoch: 1 Batch: 313/20099 (1.56%) Loss: 4.877650 LR: 0.00000240 +[05:44:36] Epoch: 1 Batch: 314/20099 (1.56%) Loss: 4.930508 LR: 0.00000240 +[05:44:39] Epoch: 1 Batch: 315/20099 (1.57%) Loss: 5.141826 LR: 0.00000245 +[05:44:42] Epoch: 1 Batch: 316/20099 (1.57%) Loss: 5.309392 LR: 0.00000245 +[05:44:45] Epoch: 1 Batch: 317/20099 (1.58%) Loss: 4.945790 LR: 0.00000245 +[05:44:48] Epoch: 1 Batch: 318/20099 (1.58%) Loss: 5.153427 LR: 0.00000245 +[05:44:51] Epoch: 1 Batch: 319/20099 (1.59%) Loss: 5.296544 LR: 0.00000245 +[05:44:55] Epoch: 1 Batch: 320/20099 (1.59%) Loss: 5.330044 LR: 0.00000245 +[05:44:58] Epoch: 1 Batch: 321/20099 (1.60%) Loss: 5.185388 LR: 0.00000245 +[05:45:01] Epoch: 1 Batch: 322/20099 (1.60%) Loss: 5.081245 LR: 0.00000251 +[05:45:04] Epoch: 1 Batch: 323/20099 (1.61%) Loss: 5.322606 LR: 0.00000251 +[05:45:07] Epoch: 1 Batch: 324/20099 (1.61%) Loss: 5.183798 LR: 0.00000251 +[05:45:10] Epoch: 1 Batch: 325/20099 (1.62%) Loss: 4.819690 LR: 0.00000251 +[05:45:13] Epoch: 1 Batch: 326/20099 (1.62%) Loss: 4.931559 LR: 0.00000251 +[05:45:16] Epoch: 1 Batch: 327/20099 (1.63%) Loss: 4.904405 LR: 0.00000251 +[05:45:19] Epoch: 1 Batch: 328/20099 (1.63%) Loss: 5.280621 LR: 0.00000251 +[05:45:22] Epoch: 1 Batch: 329/20099 (1.64%) Loss: 4.951993 LR: 0.00000256 +[05:45:26] Epoch: 1 Batch: 330/20099 (1.64%) Loss: 4.881390 LR: 0.00000256 +[05:45:29] Epoch: 1 Batch: 331/20099 (1.65%) Loss: 4.972852 LR: 0.00000256 +[05:45:32] Epoch: 1 Batch: 332/20099 (1.65%) Loss: 5.029356 LR: 0.00000256 +[05:45:35] Epoch: 1 Batch: 333/20099 (1.66%) Loss: 5.228640 LR: 0.00000256 +[05:45:38] Epoch: 1 Batch: 334/20099 (1.66%) Loss: 5.045745 LR: 0.00000256 +[05:45:41] Epoch: 1 
Batch: 335/20099 (1.67%) Loss: 4.942456 LR: 0.00000256 +[05:45:44] Epoch: 1 Batch: 336/20099 (1.67%) Loss: 4.788242 LR: 0.00000262 +[05:45:47] Epoch: 1 Batch: 337/20099 (1.68%) Loss: 5.103614 LR: 0.00000262 +[05:45:50] Epoch: 1 Batch: 338/20099 (1.68%) Loss: 4.654943 LR: 0.00000262 +[05:45:53] Epoch: 1 Batch: 339/20099 (1.69%) Loss: 4.922127 LR: 0.00000262 +[05:45:56] Epoch: 1 Batch: 340/20099 (1.69%) Loss: 4.512710 LR: 0.00000262 +[05:46:00] Epoch: 1 Batch: 341/20099 (1.70%) Loss: 4.882350 LR: 0.00000262 +[05:46:03] Epoch: 1 Batch: 342/20099 (1.70%) Loss: 5.083026 LR: 0.00000262 +[05:46:06] Epoch: 1 Batch: 343/20099 (1.71%) Loss: 4.741843 LR: 0.00000267 +[05:46:09] Epoch: 1 Batch: 344/20099 (1.71%) Loss: 4.899370 LR: 0.00000267 +[05:46:12] Epoch: 1 Batch: 345/20099 (1.72%) Loss: 4.738027 LR: 0.00000267 +[05:46:15] Epoch: 1 Batch: 346/20099 (1.72%) Loss: 4.711547 LR: 0.00000267 +[05:46:18] Epoch: 1 Batch: 347/20099 (1.73%) Loss: 4.708997 LR: 0.00000267 +[05:46:21] Epoch: 1 Batch: 348/20099 (1.73%) Loss: 4.782561 LR: 0.00000267 +[05:46:24] Epoch: 1 Batch: 349/20099 (1.74%) Loss: 4.748567 LR: 0.00000267 +[05:46:28] Epoch: 1 Batch: 350/20099 (1.74%) Loss: 4.783836 LR: 0.00000273 +[05:46:31] Epoch: 1 Batch: 351/20099 (1.75%) Loss: 5.047218 LR: 0.00000273 +[05:46:34] Epoch: 1 Batch: 352/20099 (1.75%) Loss: 4.825919 LR: 0.00000273 +[05:46:37] Epoch: 1 Batch: 353/20099 (1.76%) Loss: 4.717120 LR: 0.00000273 +[05:46:40] Epoch: 1 Batch: 354/20099 (1.76%) Loss: 4.809447 LR: 0.00000273 +[05:46:43] Epoch: 1 Batch: 355/20099 (1.77%) Loss: 4.627695 LR: 0.00000273 +[05:46:46] Epoch: 1 Batch: 356/20099 (1.77%) Loss: 4.865065 LR: 0.00000273 +[05:46:49] Epoch: 1 Batch: 357/20099 (1.78%) Loss: 4.727468 LR: 0.00000278 +[05:46:52] Epoch: 1 Batch: 358/20099 (1.78%) Loss: 4.718815 LR: 0.00000278 +[05:46:55] Epoch: 1 Batch: 359/20099 (1.79%) Loss: 4.850678 LR: 0.00000278 +[05:46:59] Epoch: 1 Batch: 360/20099 (1.79%) Loss: 4.290783 LR: 0.00000278 +[05:47:02] Epoch: 1 Batch: 361/20099 
(1.80%) Loss: 4.616312 LR: 0.00000278 +[05:47:05] Epoch: 1 Batch: 362/20099 (1.80%) Loss: 4.740779 LR: 0.00000278 +[05:47:08] Epoch: 1 Batch: 363/20099 (1.81%) Loss: 4.519742 LR: 0.00000278 +[05:47:11] Epoch: 1 Batch: 364/20099 (1.81%) Loss: 4.583066 LR: 0.00000284 +[05:47:14] Epoch: 1 Batch: 365/20099 (1.82%) Loss: 4.539240 LR: 0.00000284 +[05:47:17] Epoch: 1 Batch: 366/20099 (1.82%) Loss: 4.715234 LR: 0.00000284 +[05:47:20] Epoch: 1 Batch: 367/20099 (1.83%) Loss: 4.348344 LR: 0.00000284 +[05:47:23] Epoch: 1 Batch: 368/20099 (1.83%) Loss: 4.668050 LR: 0.00000284 +[05:47:26] Epoch: 1 Batch: 369/20099 (1.84%) Loss: 4.624444 LR: 0.00000284 +[05:47:30] Epoch: 1 Batch: 370/20099 (1.84%) Loss: 4.446140 LR: 0.00000284 +[05:47:33] Epoch: 1 Batch: 371/20099 (1.85%) Loss: 4.270877 LR: 0.00000289 +[05:47:36] Epoch: 1 Batch: 372/20099 (1.85%) Loss: 4.583157 LR: 0.00000289 +[05:47:39] Epoch: 1 Batch: 373/20099 (1.86%) Loss: 4.234210 LR: 0.00000289 +[05:47:42] Epoch: 1 Batch: 374/20099 (1.86%) Loss: 4.338867 LR: 0.00000289 +[05:47:45] Epoch: 1 Batch: 375/20099 (1.87%) Loss: 4.833863 LR: 0.00000289 +[05:47:48] Epoch: 1 Batch: 376/20099 (1.87%) Loss: 4.351973 LR: 0.00000289 +[05:47:51] Epoch: 1 Batch: 377/20099 (1.88%) Loss: 4.211730 LR: 0.00000289 +[05:47:54] Epoch: 1 Batch: 378/20099 (1.88%) Loss: 4.438265 LR: 0.00000295 +[05:47:58] Epoch: 1 Batch: 379/20099 (1.89%) Loss: 4.377117 LR: 0.00000295 +[05:48:01] Epoch: 1 Batch: 380/20099 (1.89%) Loss: 4.738757 LR: 0.00000295 +[05:48:04] Epoch: 1 Batch: 381/20099 (1.90%) Loss: 4.620075 LR: 0.00000295 +[05:48:07] Epoch: 1 Batch: 382/20099 (1.90%) Loss: 4.514782 LR: 0.00000295 +[05:48:10] Epoch: 1 Batch: 383/20099 (1.91%) Loss: 4.726908 LR: 0.00000295 +[05:48:13] Epoch: 1 Batch: 384/20099 (1.91%) Loss: 4.545939 LR: 0.00000295 +[05:48:16] Epoch: 1 Batch: 385/20099 (1.92%) Loss: 4.612621 LR: 0.00000300 +[05:48:19] Epoch: 1 Batch: 386/20099 (1.92%) Loss: 4.335743 LR: 0.00000300 +[05:48:22] Epoch: 1 Batch: 387/20099 (1.93%) Loss: 4.399743 
LR: 0.00000300 +[05:48:25] Epoch: 1 Batch: 388/20099 (1.93%) Loss: 4.705808 LR: 0.00000300 +[05:48:29] Epoch: 1 Batch: 389/20099 (1.94%) Loss: 4.504914 LR: 0.00000300 +[05:48:32] Epoch: 1 Batch: 390/20099 (1.94%) Loss: 4.027596 LR: 0.00000300 +[05:48:35] Epoch: 1 Batch: 391/20099 (1.95%) Loss: 4.647719 LR: 0.00000300 +[05:48:38] Epoch: 1 Batch: 392/20099 (1.95%) Loss: 4.645146 LR: 0.00000305 +[05:48:41] Epoch: 1 Batch: 393/20099 (1.96%) Loss: 4.740805 LR: 0.00000305 +[05:48:44] Epoch: 1 Batch: 394/20099 (1.96%) Loss: 4.275738 LR: 0.00000305 +[05:48:47] Epoch: 1 Batch: 395/20099 (1.97%) Loss: 4.541855 LR: 0.00000305 +[05:48:50] Epoch: 1 Batch: 396/20099 (1.97%) Loss: 4.124485 LR: 0.00000305 +[05:48:53] Epoch: 1 Batch: 397/20099 (1.98%) Loss: 4.208072 LR: 0.00000305 +[05:48:57] Epoch: 1 Batch: 398/20099 (1.98%) Loss: 4.563438 LR: 0.00000305 +[05:49:00] Epoch: 1 Batch: 399/20099 (1.99%) Loss: 4.288823 LR: 0.00000311 +[05:49:06] >> Temp checkpoint saved: epoch1_step400, size: 0.1693 GB +[05:49:06] Epoch: 1 Batch: 400/20099 (1.99%) Loss: 4.443245 LR: 0.00000311 +[05:49:10] Epoch: 1 Batch: 401/20099 (2.00%) Loss: 4.263463 LR: 0.00000311 +[05:49:13] Epoch: 1 Batch: 402/20099 (2.00%) Loss: 4.325703 LR: 0.00000311 +[05:49:16] Epoch: 1 Batch: 403/20099 (2.01%) Loss: 4.370763 LR: 0.00000311 +[05:49:19] Epoch: 1 Batch: 404/20099 (2.01%) Loss: 4.374426 LR: 0.00000311 +[05:49:22] Epoch: 1 Batch: 405/20099 (2.02%) Loss: 4.432939 LR: 0.00000311 +[05:49:25] Epoch: 1 Batch: 406/20099 (2.02%) Loss: 4.191067 LR: 0.00000316 +[05:49:28] Epoch: 1 Batch: 407/20099 (2.02%) Loss: 4.505256 LR: 0.00000316 +[05:49:31] Epoch: 1 Batch: 408/20099 (2.03%) Loss: 4.575447 LR: 0.00000316 +[05:49:34] Epoch: 1 Batch: 409/20099 (2.03%) Loss: 4.091455 LR: 0.00000316 +[05:49:38] Epoch: 1 Batch: 410/20099 (2.04%) Loss: 4.182844 LR: 0.00000316 +[05:49:41] Epoch: 1 Batch: 411/20099 (2.04%) Loss: 4.224022 LR: 0.00000316 +[05:49:44] Epoch: 1 Batch: 412/20099 (2.05%) Loss: 4.192635 LR: 0.00000316 +[05:49:47] 
Epoch: 1 Batch: 413/20099 (2.05%) Loss: 4.140214 LR: 0.00000322 +[05:49:50] Epoch: 1 Batch: 414/20099 (2.06%) Loss: 4.216347 LR: 0.00000322 +[05:49:53] Epoch: 1 Batch: 415/20099 (2.06%) Loss: 4.621737 LR: 0.00000322 +[05:49:56] Epoch: 1 Batch: 416/20099 (2.07%) Loss: 4.252255 LR: 0.00000322 +[05:49:59] Epoch: 1 Batch: 417/20099 (2.07%) Loss: 4.198958 LR: 0.00000322 +[05:50:03] Epoch: 1 Batch: 418/20099 (2.08%) Loss: 4.394632 LR: 0.00000322 +[05:50:06] Epoch: 1 Batch: 419/20099 (2.08%) Loss: 4.236249 LR: 0.00000322 +[05:50:09] Epoch: 1 Batch: 420/20099 (2.09%) Loss: 4.355175 LR: 0.00000327 +[05:50:12] Epoch: 1 Batch: 421/20099 (2.09%) Loss: 4.291812 LR: 0.00000327 +[05:50:15] Epoch: 1 Batch: 422/20099 (2.10%) Loss: 4.109209 LR: 0.00000327 +[05:50:18] Epoch: 1 Batch: 423/20099 (2.10%) Loss: 4.256128 LR: 0.00000327 +[05:50:21] Epoch: 1 Batch: 424/20099 (2.11%) Loss: 4.256746 LR: 0.00000327 +[05:50:24] Epoch: 1 Batch: 425/20099 (2.11%) Loss: 4.010420 LR: 0.00000327 +[05:50:27] Epoch: 1 Batch: 426/20099 (2.12%) Loss: 4.185977 LR: 0.00000327 +[05:50:30] Epoch: 1 Batch: 427/20099 (2.12%) Loss: 3.734584 LR: 0.00000333 +[05:50:33] Epoch: 1 Batch: 428/20099 (2.13%) Loss: 4.120882 LR: 0.00000333 +[05:50:37] Epoch: 1 Batch: 429/20099 (2.13%) Loss: 4.035900 LR: 0.00000333 +[05:50:40] Epoch: 1 Batch: 430/20099 (2.14%) Loss: 4.096785 LR: 0.00000333 +[05:50:43] Epoch: 1 Batch: 431/20099 (2.14%) Loss: 4.371582 LR: 0.00000333 +[05:50:46] Epoch: 1 Batch: 432/20099 (2.15%) Loss: 4.406343 LR: 0.00000333 +[05:50:49] Epoch: 1 Batch: 433/20099 (2.15%) Loss: 4.217954 LR: 0.00000333 +[05:50:52] Epoch: 1 Batch: 434/20099 (2.16%) Loss: 4.006398 LR: 0.00000338 +[05:50:55] Epoch: 1 Batch: 435/20099 (2.16%) Loss: 4.312137 LR: 0.00000338 +[05:50:58] Epoch: 1 Batch: 436/20099 (2.17%) Loss: 3.935061 LR: 0.00000338 +[05:51:01] Epoch: 1 Batch: 437/20099 (2.17%) Loss: 3.984499 LR: 0.00000338 +[05:51:04] Epoch: 1 Batch: 438/20099 (2.18%) Loss: 4.066708 LR: 0.00000338 +[05:51:08] Epoch: 1 Batch: 
439/20099 (2.18%) Loss: 4.183898 LR: 0.00000338 +[05:51:11] Epoch: 1 Batch: 440/20099 (2.19%) Loss: 3.720214 LR: 0.00000338 +[05:51:14] Epoch: 1 Batch: 441/20099 (2.19%) Loss: 3.827372 LR: 0.00000344 +[05:51:17] Epoch: 1 Batch: 442/20099 (2.20%) Loss: 4.160211 LR: 0.00000344 +[05:51:20] Epoch: 1 Batch: 443/20099 (2.20%) Loss: 4.374070 LR: 0.00000344 +[05:51:23] Epoch: 1 Batch: 444/20099 (2.21%) Loss: 3.992518 LR: 0.00000344 +[05:51:26] Epoch: 1 Batch: 445/20099 (2.21%) Loss: 4.103231 LR: 0.00000344 +[05:51:29] Epoch: 1 Batch: 446/20099 (2.22%) Loss: 3.821749 LR: 0.00000344 +[05:51:32] Epoch: 1 Batch: 447/20099 (2.22%) Loss: 3.976046 LR: 0.00000344 +[05:51:35] Epoch: 1 Batch: 448/20099 (2.23%) Loss: 4.146168 LR: 0.00000349 +[05:51:38] Epoch: 1 Batch: 449/20099 (2.23%) Loss: 4.201299 LR: 0.00000349 +[05:51:42] Epoch: 1 Batch: 450/20099 (2.24%) Loss: 4.064378 LR: 0.00000349 +[05:51:45] Epoch: 1 Batch: 451/20099 (2.24%) Loss: 3.729725 LR: 0.00000349 +[05:51:48] Epoch: 1 Batch: 452/20099 (2.25%) Loss: 3.675411 LR: 0.00000349 +[05:51:51] Epoch: 1 Batch: 453/20099 (2.25%) Loss: 4.076932 LR: 0.00000349 +[05:51:54] Epoch: 1 Batch: 454/20099 (2.26%) Loss: 4.141027 LR: 0.00000349 +[05:51:57] Epoch: 1 Batch: 455/20099 (2.26%) Loss: 4.166919 LR: 0.00000355 +[05:52:00] Epoch: 1 Batch: 456/20099 (2.27%) Loss: 3.875535 LR: 0.00000355 +[05:52:03] Epoch: 1 Batch: 457/20099 (2.27%) Loss: 4.099027 LR: 0.00000355 +[05:52:06] Epoch: 1 Batch: 458/20099 (2.28%) Loss: 3.543368 LR: 0.00000355 +[05:52:09] Epoch: 1 Batch: 459/20099 (2.28%) Loss: 3.953135 LR: 0.00000355 +[05:52:13] Epoch: 1 Batch: 460/20099 (2.29%) Loss: 4.167485 LR: 0.00000355 +[05:52:16] Epoch: 1 Batch: 461/20099 (2.29%) Loss: 4.143532 LR: 0.00000355 +[05:52:19] Epoch: 1 Batch: 462/20099 (2.30%) Loss: 3.822133 LR: 0.00000360 +[05:52:22] Epoch: 1 Batch: 463/20099 (2.30%) Loss: 3.708073 LR: 0.00000360 +[05:52:25] Epoch: 1 Batch: 464/20099 (2.31%) Loss: 3.909542 LR: 0.00000360 +[05:52:28] Epoch: 1 Batch: 465/20099 (2.31%) Loss: 
3.827154 LR: 0.00000360 +[05:52:31] Epoch: 1 Batch: 466/20099 (2.32%) Loss: 3.733705 LR: 0.00000360 +[05:52:34] Epoch: 1 Batch: 467/20099 (2.32%) Loss: 3.790383 LR: 0.00000360 +[05:52:37] Epoch: 1 Batch: 468/20099 (2.33%) Loss: 3.943520 LR: 0.00000360 +[05:52:41] Epoch: 1 Batch: 469/20099 (2.33%) Loss: 3.766414 LR: 0.00000365 +[05:52:44] Epoch: 1 Batch: 470/20099 (2.34%) Loss: 3.857356 LR: 0.00000365 +[05:52:47] Epoch: 1 Batch: 471/20099 (2.34%) Loss: 3.848726 LR: 0.00000365 +[05:52:50] Epoch: 1 Batch: 472/20099 (2.35%) Loss: 3.865954 LR: 0.00000365 +[05:52:53] Epoch: 1 Batch: 473/20099 (2.35%) Loss: 3.860591 LR: 0.00000365 +[05:52:56] Epoch: 1 Batch: 474/20099 (2.36%) Loss: 3.921723 LR: 0.00000365 +[05:52:59] Epoch: 1 Batch: 475/20099 (2.36%) Loss: 3.815340 LR: 0.00000365 +[05:53:02] Epoch: 1 Batch: 476/20099 (2.37%) Loss: 3.966082 LR: 0.00000371 +[05:53:05] Epoch: 1 Batch: 477/20099 (2.37%) Loss: 3.753700 LR: 0.00000371 +[05:53:08] Epoch: 1 Batch: 478/20099 (2.38%) Loss: 3.675669 LR: 0.00000371 +[05:53:12] Epoch: 1 Batch: 479/20099 (2.38%) Loss: 3.459110 LR: 0.00000371 +[05:53:15] Epoch: 1 Batch: 480/20099 (2.39%) Loss: 4.243398 LR: 0.00000371 +[05:53:18] Epoch: 1 Batch: 481/20099 (2.39%) Loss: 3.737950 LR: 0.00000371 +[05:53:21] Epoch: 1 Batch: 482/20099 (2.40%) Loss: 4.179872 LR: 0.00000371 +[05:53:24] Epoch: 1 Batch: 483/20099 (2.40%) Loss: 3.865076 LR: 0.00000376 +[05:53:27] Epoch: 1 Batch: 484/20099 (2.41%) Loss: 3.965529 LR: 0.00000376 +[05:53:30] Epoch: 1 Batch: 485/20099 (2.41%) Loss: 3.849352 LR: 0.00000376 +[05:53:33] Epoch: 1 Batch: 486/20099 (2.42%) Loss: 3.759498 LR: 0.00000376 +[05:53:36] Epoch: 1 Batch: 487/20099 (2.42%) Loss: 3.589682 LR: 0.00000376 +[05:53:39] Epoch: 1 Batch: 488/20099 (2.43%) Loss: 3.498409 LR: 0.00000376 +[05:53:42] Epoch: 1 Batch: 489/20099 (2.43%) Loss: 3.753106 LR: 0.00000376 +[05:53:46] Epoch: 1 Batch: 490/20099 (2.44%) Loss: 3.818756 LR: 0.00000382 +[05:53:49] Epoch: 1 Batch: 491/20099 (2.44%) Loss: 3.818244 LR: 0.00000382 
+[05:53:52] Epoch: 1 Batch: 492/20099 (2.45%) Loss: 3.464826 LR: 0.00000382 +[05:53:55] Epoch: 1 Batch: 493/20099 (2.45%) Loss: 3.757796 LR: 0.00000382 +[05:53:58] Epoch: 1 Batch: 494/20099 (2.46%) Loss: 3.976522 LR: 0.00000382 +[05:54:01] Epoch: 1 Batch: 495/20099 (2.46%) Loss: 3.548073 LR: 0.00000382 +[05:54:04] Epoch: 1 Batch: 496/20099 (2.47%) Loss: 3.892557 LR: 0.00000382 +[05:54:07] Epoch: 1 Batch: 497/20099 (2.47%) Loss: 3.668852 LR: 0.00000387 +[05:54:10] Epoch: 1 Batch: 498/20099 (2.48%) Loss: 3.730822 LR: 0.00000387 +[05:54:13] Epoch: 1 Batch: 499/20099 (2.48%) Loss: 3.480257 LR: 0.00000387 +[05:54:17] >> Evaluating batch 0 +[05:54:18] >> Evaluating batch 1 +[05:54:19] >> Evaluating batch 2 +[05:54:21] >> Evaluating batch 3 +[05:54:22] >> Evaluating batch 4 +[05:54:23] >> Evaluating batch 5 +[05:54:24] >> Evaluating batch 6 +[05:54:26] >> Evaluating batch 7 +[05:54:27] >> Evaluating batch 8 +[05:54:28] >> Evaluating batch 9 +[05:54:29] >> Evaluating batch 10 +[05:54:30] >> Evaluating batch 11 +[05:54:32] >> Evaluating batch 12 +[05:54:33] >> Evaluating batch 13 +[05:54:34] >> Evaluating batch 14 +[05:54:35] >> Evaluating batch 15 +[05:54:36] >> Evaluating batch 16 +[05:54:37] Epoch: 1 Step: 500/20099 Evaluation: +[05:54:37] Avg Loss Since Last Eval: 5.5317 Val Loss: 3.7841 Validation loss delta: 3.7841 Perplexity: 43.9965 LR: 0.00000387 +[05:54:41] >> Checkpoint saved: epoch1_step500, size: 0.1693 GB +[05:54:41] Epoch: 1 Batch: 500/20099 (2.49%) Loss: 3.659668 LR: 0.00000387 +[05:54:44] Epoch: 1 Batch: 501/20099 (2.49%) Loss: 3.431229 LR: 0.00000387 +[05:54:47] Epoch: 1 Batch: 502/20099 (2.50%) Loss: 3.719763 LR: 0.00000387 +[05:54:50] Epoch: 1 Batch: 503/20099 (2.50%) Loss: 3.788880 LR: 0.00000387 +[05:54:53] Epoch: 1 Batch: 504/20099 (2.51%) Loss: 3.466397 LR: 0.00000393 +[05:54:56] Epoch: 1 Batch: 505/20099 (2.51%) Loss: 3.665978 LR: 0.00000393 +[05:54:59] Epoch: 1 Batch: 506/20099 (2.52%) Loss: 4.039702 LR: 0.00000393 +[05:55:02] Epoch: 1 Batch: 
507/20099 (2.52%) Loss: 3.902894 LR: 0.00000393 +[05:55:06] Epoch: 1 Batch: 508/20099 (2.53%) Loss: 3.589540 LR: 0.00000393 +[05:55:09] Epoch: 1 Batch: 509/20099 (2.53%) Loss: 3.595153 LR: 0.00000393 +[05:55:12] Epoch: 1 Batch: 510/20099 (2.54%) Loss: 3.821929 LR: 0.00000393 +[05:55:15] Epoch: 1 Batch: 511/20099 (2.54%) Loss: 3.638913 LR: 0.00000398 +[05:55:18] Epoch: 1 Batch: 512/20099 (2.55%) Loss: 3.772271 LR: 0.00000398 +[05:55:21] Epoch: 1 Batch: 513/20099 (2.55%) Loss: 3.613601 LR: 0.00000398 +[05:55:24] Epoch: 1 Batch: 514/20099 (2.56%) Loss: 3.359339 LR: 0.00000398 +[05:55:28] Epoch: 1 Batch: 515/20099 (2.56%) Loss: 3.825066 LR: 0.00000398 +[05:55:31] Epoch: 1 Batch: 516/20099 (2.57%) Loss: 3.490818 LR: 0.00000398 +[05:55:34] Epoch: 1 Batch: 517/20099 (2.57%) Loss: 3.395434 LR: 0.00000398 +[05:55:37] Epoch: 1 Batch: 518/20099 (2.58%) Loss: 3.687841 LR: 0.00000404 +[05:55:40] Epoch: 1 Batch: 519/20099 (2.58%) Loss: 3.573826 LR: 0.00000404 +[05:55:43] Epoch: 1 Batch: 520/20099 (2.59%) Loss: 3.504808 LR: 0.00000404 +[05:55:46] Epoch: 1 Batch: 521/20099 (2.59%) Loss: 3.787109 LR: 0.00000404 +[05:55:49] Epoch: 1 Batch: 522/20099 (2.60%) Loss: 3.473059 LR: 0.00000404 +[05:55:52] Epoch: 1 Batch: 523/20099 (2.60%) Loss: 3.585751 LR: 0.00000404 +[05:55:56] Epoch: 1 Batch: 524/20099 (2.61%) Loss: 3.363418 LR: 0.00000404 +[05:55:59] Epoch: 1 Batch: 525/20099 (2.61%) Loss: 3.883132 LR: 0.00000409 +[05:56:02] Epoch: 1 Batch: 526/20099 (2.62%) Loss: 3.766091 LR: 0.00000409 +[05:56:05] Epoch: 1 Batch: 527/20099 (2.62%) Loss: 3.396730 LR: 0.00000409 +[05:56:08] Epoch: 1 Batch: 528/20099 (2.63%) Loss: 3.667768 LR: 0.00000409 +[05:56:11] Epoch: 1 Batch: 529/20099 (2.63%) Loss: 3.520267 LR: 0.00000409 +[05:56:14] Epoch: 1 Batch: 530/20099 (2.64%) Loss: 3.588992 LR: 0.00000409 +[05:56:17] Epoch: 1 Batch: 531/20099 (2.64%) Loss: 3.276070 LR: 0.00000409 +[05:56:20] Epoch: 1 Batch: 532/20099 (2.65%) Loss: 3.699071 LR: 0.00000415 +[05:56:23] Epoch: 1 Batch: 533/20099 (2.65%) Loss: 
3.508219 LR: 0.00000415 +[05:56:26] Epoch: 1 Batch: 534/20099 (2.66%) Loss: 3.550948 LR: 0.00000415 +[05:56:30] Epoch: 1 Batch: 535/20099 (2.66%) Loss: 3.305432 LR: 0.00000415 +[05:56:33] Epoch: 1 Batch: 536/20099 (2.67%) Loss: 3.604021 LR: 0.00000415 +[05:56:36] Epoch: 1 Batch: 537/20099 (2.67%) Loss: 3.365321 LR: 0.00000415 +[05:56:39] Epoch: 1 Batch: 538/20099 (2.68%) Loss: 3.414226 LR: 0.00000415 +[05:56:42] Epoch: 1 Batch: 539/20099 (2.68%) Loss: 3.377056 LR: 0.00000420 +[05:56:45] Epoch: 1 Batch: 540/20099 (2.69%) Loss: 3.160712 LR: 0.00000420 +[05:56:48] Epoch: 1 Batch: 541/20099 (2.69%) Loss: 3.483096 LR: 0.00000420 +[05:56:51] Epoch: 1 Batch: 542/20099 (2.70%) Loss: 3.367512 LR: 0.00000420 +[05:56:54] Epoch: 1 Batch: 543/20099 (2.70%) Loss: 3.590770 LR: 0.00000420 +[05:56:58] Epoch: 1 Batch: 544/20099 (2.71%) Loss: 3.066118 LR: 0.00000420 +[05:57:01] Epoch: 1 Batch: 545/20099 (2.71%) Loss: 3.460900 LR: 0.00000420 +[05:57:04] Epoch: 1 Batch: 546/20099 (2.72%) Loss: 3.363395 LR: 0.00000425 +[05:57:07] Epoch: 1 Batch: 547/20099 (2.72%) Loss: 3.327603 LR: 0.00000425 +[05:57:10] Epoch: 1 Batch: 548/20099 (2.73%) Loss: 3.557308 LR: 0.00000425 +[05:57:13] Epoch: 1 Batch: 549/20099 (2.73%) Loss: 3.242236 LR: 0.00000425 +[05:57:16] Epoch: 1 Batch: 550/20099 (2.74%) Loss: 3.535053 LR: 0.00000425 +[05:57:19] Epoch: 1 Batch: 551/20099 (2.74%) Loss: 3.438156 LR: 0.00000425 +[05:57:22] Epoch: 1 Batch: 552/20099 (2.75%) Loss: 3.394970 LR: 0.00000425 +[05:57:26] Epoch: 1 Batch: 553/20099 (2.75%) Loss: 3.225433 LR: 0.00000431 +[05:57:29] Epoch: 1 Batch: 554/20099 (2.76%) Loss: 3.360871 LR: 0.00000431 +[05:57:32] Epoch: 1 Batch: 555/20099 (2.76%) Loss: 2.998131 LR: 0.00000431 +[05:57:35] Epoch: 1 Batch: 556/20099 (2.77%) Loss: 3.220312 LR: 0.00000431 +[05:57:38] Epoch: 1 Batch: 557/20099 (2.77%) Loss: 3.574907 LR: 0.00000431 +[05:57:41] Epoch: 1 Batch: 558/20099 (2.78%) Loss: 3.541529 LR: 0.00000431 +[05:57:44] Epoch: 1 Batch: 559/20099 (2.78%) Loss: 3.331823 LR: 0.00000431 
+[05:57:47] Epoch: 1 Batch: 560/20099 (2.79%) Loss: 3.114966 LR: 0.00000436 +[05:57:50] Epoch: 1 Batch: 561/20099 (2.79%) Loss: 3.313049 LR: 0.00000436 +[05:57:54] Epoch: 1 Batch: 562/20099 (2.80%) Loss: 3.215311 LR: 0.00000436 +[05:57:57] Epoch: 1 Batch: 563/20099 (2.80%) Loss: 3.518518 LR: 0.00000436 +[05:58:00] Epoch: 1 Batch: 564/20099 (2.81%) Loss: 3.285318 LR: 0.00000436 +[05:58:03] Epoch: 1 Batch: 565/20099 (2.81%) Loss: 3.470620 LR: 0.00000436 +[05:58:06] Epoch: 1 Batch: 566/20099 (2.82%) Loss: 3.350331 LR: 0.00000436 +[05:58:09] Epoch: 1 Batch: 567/20099 (2.82%) Loss: 3.215801 LR: 0.00000442 +[05:58:12] Epoch: 1 Batch: 568/20099 (2.83%) Loss: 3.126339 LR: 0.00000442 +[05:58:15] Epoch: 1 Batch: 569/20099 (2.83%) Loss: 3.139811 LR: 0.00000442 +[05:58:18] Epoch: 1 Batch: 570/20099 (2.84%) Loss: 3.255645 LR: 0.00000442 +[05:58:21] Epoch: 1 Batch: 571/20099 (2.84%) Loss: 3.393592 LR: 0.00000442 +[05:58:25] Epoch: 1 Batch: 572/20099 (2.85%) Loss: 3.235650 LR: 0.00000442 +[05:58:28] Epoch: 1 Batch: 573/20099 (2.85%) Loss: 3.407495 LR: 0.00000442 +[05:58:31] Epoch: 1 Batch: 574/20099 (2.86%) Loss: 3.135778 LR: 0.00000447 +[05:58:34] Epoch: 1 Batch: 575/20099 (2.86%) Loss: 3.040358 LR: 0.00000447 +[05:58:37] Epoch: 1 Batch: 576/20099 (2.87%) Loss: 3.381332 LR: 0.00000447 +[05:58:40] Epoch: 1 Batch: 577/20099 (2.87%) Loss: 3.146860 LR: 0.00000447 +[05:58:43] Epoch: 1 Batch: 578/20099 (2.88%) Loss: 3.093589 LR: 0.00000447 +[05:58:46] Epoch: 1 Batch: 579/20099 (2.88%) Loss: 3.534211 LR: 0.00000447 +[05:58:49] Epoch: 1 Batch: 580/20099 (2.89%) Loss: 3.272237 LR: 0.00000447 +[05:58:53] Epoch: 1 Batch: 581/20099 (2.89%) Loss: 3.417892 LR: 0.00000453 +[05:58:56] Epoch: 1 Batch: 582/20099 (2.90%) Loss: 3.313116 LR: 0.00000453 +[05:58:59] Epoch: 1 Batch: 583/20099 (2.90%) Loss: 3.206202 LR: 0.00000453 +[05:59:02] Epoch: 1 Batch: 584/20099 (2.91%) Loss: 3.550137 LR: 0.00000453 +[05:59:05] Epoch: 1 Batch: 585/20099 (2.91%) Loss: 3.333817 LR: 0.00000453 +[05:59:08] Epoch: 1 
Batch: 586/20099 (2.92%) Loss: 3.080060 LR: 0.00000453 +[05:59:11] Epoch: 1 Batch: 587/20099 (2.92%) Loss: 3.173259 LR: 0.00000453 +[05:59:14] Epoch: 1 Batch: 588/20099 (2.93%) Loss: 3.314359 LR: 0.00000458 +[05:59:17] Epoch: 1 Batch: 589/20099 (2.93%) Loss: 2.940089 LR: 0.00000458 +[05:59:21] Epoch: 1 Batch: 590/20099 (2.94%) Loss: 3.244025 LR: 0.00000458 +[05:59:24] Epoch: 1 Batch: 591/20099 (2.94%) Loss: 3.239339 LR: 0.00000458 +[05:59:27] Epoch: 1 Batch: 592/20099 (2.95%) Loss: 3.177510 LR: 0.00000458 +[05:59:30] Epoch: 1 Batch: 593/20099 (2.95%) Loss: 3.410569 LR: 0.00000458 +[05:59:33] Epoch: 1 Batch: 594/20099 (2.96%) Loss: 3.361750 LR: 0.00000458 +[05:59:36] Epoch: 1 Batch: 595/20099 (2.96%) Loss: 3.321055 LR: 0.00000464 +[05:59:39] Epoch: 1 Batch: 596/20099 (2.97%) Loss: 3.248768 LR: 0.00000464 +[05:59:42] Epoch: 1 Batch: 597/20099 (2.97%) Loss: 3.478450 LR: 0.00000464 +[05:59:45] Epoch: 1 Batch: 598/20099 (2.98%) Loss: 3.254317 LR: 0.00000464 +[05:59:48] Epoch: 1 Batch: 599/20099 (2.98%) Loss: 3.094679 LR: 0.00000464 +[05:59:55] >> Temp checkpoint saved: epoch1_step600, size: 0.1693 GB +[05:59:55] Epoch: 1 Batch: 600/20099 (2.99%) Loss: 3.077919 LR: 0.00000464 +[05:59:58] Epoch: 1 Batch: 601/20099 (2.99%) Loss: 3.388025 LR: 0.00000464 +[06:00:01] Epoch: 1 Batch: 602/20099 (3.00%) Loss: 3.208430 LR: 0.00000469 +[06:00:05] Epoch: 1 Batch: 603/20099 (3.00%) Loss: 3.214555 LR: 0.00000469 +[06:00:08] Epoch: 1 Batch: 604/20099 (3.01%) Loss: 3.274143 LR: 0.00000469 +[06:00:11] Epoch: 1 Batch: 605/20099 (3.01%) Loss: 3.238697 LR: 0.00000469 +[06:00:14] Epoch: 1 Batch: 606/20099 (3.02%) Loss: 3.252021 LR: 0.00000469 +[06:00:17] Epoch: 1 Batch: 607/20099 (3.02%) Loss: 3.050976 LR: 0.00000469 +[06:00:20] Epoch: 1 Batch: 608/20099 (3.03%) Loss: 3.174529 LR: 0.00000469 +[06:00:23] Epoch: 1 Batch: 609/20099 (3.03%) Loss: 2.937750 LR: 0.00000475 +[06:00:26] Epoch: 1 Batch: 610/20099 (3.03%) Loss: 3.077062 LR: 0.00000475 +[06:00:29] Epoch: 1 Batch: 611/20099 (3.04%) 
Loss: 3.319816 LR: 0.00000475 +[06:00:33] Epoch: 1 Batch: 612/20099 (3.04%) Loss: 2.946783 LR: 0.00000475 +[06:00:36] Epoch: 1 Batch: 613/20099 (3.05%) Loss: 3.323491 LR: 0.00000475 +[06:00:39] Epoch: 1 Batch: 614/20099 (3.05%) Loss: 3.172926 LR: 0.00000475 +[06:00:42] Epoch: 1 Batch: 615/20099 (3.06%) Loss: 3.302462 LR: 0.00000475 +[06:00:45] Epoch: 1 Batch: 616/20099 (3.06%) Loss: 3.105341 LR: 0.00000480 +[06:00:48] Epoch: 1 Batch: 617/20099 (3.07%) Loss: 3.070046 LR: 0.00000480 +[06:00:51] Epoch: 1 Batch: 618/20099 (3.07%) Loss: 3.223153 LR: 0.00000480 +[06:00:54] Epoch: 1 Batch: 619/20099 (3.08%) Loss: 2.849803 LR: 0.00000480 +[06:00:57] Epoch: 1 Batch: 620/20099 (3.08%) Loss: 3.203351 LR: 0.00000480 +[06:01:01] Epoch: 1 Batch: 621/20099 (3.09%) Loss: 3.106402 LR: 0.00000480 +[06:01:04] Epoch: 1 Batch: 622/20099 (3.09%) Loss: 3.263969 LR: 0.00000480 +[06:01:07] Epoch: 1 Batch: 623/20099 (3.10%) Loss: 3.093187 LR: 0.00000485 +[06:01:10] Epoch: 1 Batch: 624/20099 (3.10%) Loss: 3.134565 LR: 0.00000485 +[06:01:13] Epoch: 1 Batch: 625/20099 (3.11%) Loss: 2.915504 LR: 0.00000485 +[06:01:16] Epoch: 1 Batch: 626/20099 (3.11%) Loss: 3.422499 LR: 0.00000485 +[06:01:19] Epoch: 1 Batch: 627/20099 (3.12%) Loss: 3.200302 LR: 0.00000485 +[06:01:22] Epoch: 1 Batch: 628/20099 (3.12%) Loss: 3.059123 LR: 0.00000485 +[06:01:25] Epoch: 1 Batch: 629/20099 (3.13%) Loss: 2.926637 LR: 0.00000485 +[06:01:28] Epoch: 1 Batch: 630/20099 (3.13%) Loss: 3.096199 LR: 0.00000491 +[06:01:32] Epoch: 1 Batch: 631/20099 (3.14%) Loss: 3.423839 LR: 0.00000491 +[06:01:35] Epoch: 1 Batch: 632/20099 (3.14%) Loss: 3.104832 LR: 0.00000491 +[06:01:38] Epoch: 1 Batch: 633/20099 (3.15%) Loss: 3.059201 LR: 0.00000491 +[06:01:41] Epoch: 1 Batch: 634/20099 (3.15%) Loss: 2.907817 LR: 0.00000491 +[06:01:44] Epoch: 1 Batch: 635/20099 (3.16%) Loss: 3.105647 LR: 0.00000491 +[06:01:47] Epoch: 1 Batch: 636/20099 (3.16%) Loss: 3.200073 LR: 0.00000491 +[06:01:50] Epoch: 1 Batch: 637/20099 (3.17%) Loss: 3.079901 LR: 
0.00000496 +[06:01:53] Epoch: 1 Batch: 638/20099 (3.17%) Loss: 2.781568 LR: 0.00000496 +[06:01:56] Epoch: 1 Batch: 639/20099 (3.18%) Loss: 2.994633 LR: 0.00000496 +[06:01:59] Epoch: 1 Batch: 640/20099 (3.18%) Loss: 2.662927 LR: 0.00000496 +[06:02:03] Epoch: 1 Batch: 641/20099 (3.19%) Loss: 3.026238 LR: 0.00000496 +[06:02:06] Epoch: 1 Batch: 642/20099 (3.19%) Loss: 2.726641 LR: 0.00000496 +[06:02:09] Epoch: 1 Batch: 643/20099 (3.20%) Loss: 2.948033 LR: 0.00000496 +[06:02:12] Epoch: 1 Batch: 644/20099 (3.20%) Loss: 3.041286 LR: 0.00000502 +[06:02:15] Epoch: 1 Batch: 645/20099 (3.21%) Loss: 3.011386 LR: 0.00000502 +[06:02:18] Epoch: 1 Batch: 646/20099 (3.21%) Loss: 3.173436 LR: 0.00000502 +[06:02:21] Epoch: 1 Batch: 647/20099 (3.22%) Loss: 3.107981 LR: 0.00000502 +[06:02:24] Epoch: 1 Batch: 648/20099 (3.22%) Loss: 3.088532 LR: 0.00000502 +[06:02:27] Epoch: 1 Batch: 649/20099 (3.23%) Loss: 3.018124 LR: 0.00000502 +[06:02:31] Epoch: 1 Batch: 650/20099 (3.23%) Loss: 3.039707 LR: 0.00000502 +[06:02:34] Epoch: 1 Batch: 651/20099 (3.24%) Loss: 3.113390 LR: 0.00000507 +[06:02:37] Epoch: 1 Batch: 652/20099 (3.24%) Loss: 2.892727 LR: 0.00000507 +[06:02:40] Epoch: 1 Batch: 653/20099 (3.25%) Loss: 2.966950 LR: 0.00000507 +[06:02:43] Epoch: 1 Batch: 654/20099 (3.25%) Loss: 3.125466 LR: 0.00000507 +[06:02:46] Epoch: 1 Batch: 655/20099 (3.26%) Loss: 2.866787 LR: 0.00000507 +[06:02:49] Epoch: 1 Batch: 656/20099 (3.26%) Loss: 2.985016 LR: 0.00000507 +[06:02:52] Epoch: 1 Batch: 657/20099 (3.27%) Loss: 3.099442 LR: 0.00000507 +[06:02:55] Epoch: 1 Batch: 658/20099 (3.27%) Loss: 2.795891 LR: 0.00000513 +[06:02:59] Epoch: 1 Batch: 659/20099 (3.28%) Loss: 3.027641 LR: 0.00000513 +[06:03:02] Epoch: 1 Batch: 660/20099 (3.28%) Loss: 3.005656 LR: 0.00000513 +[06:03:05] Epoch: 1 Batch: 661/20099 (3.29%) Loss: 2.552632 LR: 0.00000513 +[06:03:08] Epoch: 1 Batch: 662/20099 (3.29%) Loss: 2.904753 LR: 0.00000513 +[06:03:11] Epoch: 1 Batch: 663/20099 (3.30%) Loss: 2.797255 LR: 0.00000513 +[06:03:14] 
Epoch: 1 Batch: 664/20099 (3.30%) Loss: 2.999705 LR: 0.00000513 +[06:03:17] Epoch: 1 Batch: 665/20099 (3.31%) Loss: 2.763514 LR: 0.00000518 +[06:03:20] Epoch: 1 Batch: 666/20099 (3.31%) Loss: 2.862915 LR: 0.00000518 +[06:03:23] Epoch: 1 Batch: 667/20099 (3.32%) Loss: 2.854605 LR: 0.00000518 +[06:03:26] Epoch: 1 Batch: 668/20099 (3.32%) Loss: 2.644818 LR: 0.00000518 +[06:03:30] Epoch: 1 Batch: 669/20099 (3.33%) Loss: 2.947203 LR: 0.00000518 +[06:03:33] Epoch: 1 Batch: 670/20099 (3.33%) Loss: 3.058627 LR: 0.00000518 +[06:03:36] Epoch: 1 Batch: 671/20099 (3.34%) Loss: 2.843078 LR: 0.00000518 +[06:03:39] Epoch: 1 Batch: 672/20099 (3.34%) Loss: 2.764021 LR: 0.00000524 +[06:03:42] Epoch: 1 Batch: 673/20099 (3.35%) Loss: 3.055920 LR: 0.00000524 +[06:03:45] Epoch: 1 Batch: 674/20099 (3.35%) Loss: 2.888914 LR: 0.00000524 +[06:03:48] Epoch: 1 Batch: 675/20099 (3.36%) Loss: 2.999079 LR: 0.00000524 +[06:03:51] Epoch: 1 Batch: 676/20099 (3.36%) Loss: 2.743417 LR: 0.00000524 +[06:03:54] Epoch: 1 Batch: 677/20099 (3.37%) Loss: 2.781025 LR: 0.00000524 +[06:03:57] Epoch: 1 Batch: 678/20099 (3.37%) Loss: 3.001202 LR: 0.00000524 +[06:04:01] Epoch: 1 Batch: 679/20099 (3.38%) Loss: 3.256280 LR: 0.00000529 +[06:04:04] Epoch: 1 Batch: 680/20099 (3.38%) Loss: 3.016867 LR: 0.00000529 +[06:04:07] Epoch: 1 Batch: 681/20099 (3.39%) Loss: 3.040658 LR: 0.00000529 +[06:04:10] Epoch: 1 Batch: 682/20099 (3.39%) Loss: 2.886739 LR: 0.00000529 +[06:04:13] Epoch: 1 Batch: 683/20099 (3.40%) Loss: 3.096670 LR: 0.00000529 +[06:04:16] Epoch: 1 Batch: 684/20099 (3.40%) Loss: 3.169854 LR: 0.00000529 +[06:04:19] Epoch: 1 Batch: 685/20099 (3.41%) Loss: 2.877370 LR: 0.00000529 +[06:04:22] Epoch: 1 Batch: 686/20099 (3.41%) Loss: 2.966127 LR: 0.00000535 +[06:04:25] Epoch: 1 Batch: 687/20099 (3.42%) Loss: 2.683146 LR: 0.00000535 +[06:04:28] Epoch: 1 Batch: 688/20099 (3.42%) Loss: 2.782723 LR: 0.00000535 +[06:04:31] Epoch: 1 Batch: 689/20099 (3.43%) Loss: 3.031456 LR: 0.00000535 +[06:04:35] Epoch: 1 Batch: 
690/20099 (3.43%) Loss: 2.642422 LR: 0.00000535 +[06:04:38] Epoch: 1 Batch: 691/20099 (3.44%) Loss: 2.506506 LR: 0.00000535 +[06:04:41] Epoch: 1 Batch: 692/20099 (3.44%) Loss: 2.810854 LR: 0.00000535 +[06:04:44] Epoch: 1 Batch: 693/20099 (3.45%) Loss: 2.898553 LR: 0.00000540 +[06:04:47] Epoch: 1 Batch: 694/20099 (3.45%) Loss: 2.893824 LR: 0.00000540 +[06:04:50] Epoch: 1 Batch: 695/20099 (3.46%) Loss: 2.800242 LR: 0.00000540 +[06:04:53] Epoch: 1 Batch: 696/20099 (3.46%) Loss: 2.776359 LR: 0.00000540 +[06:04:56] Epoch: 1 Batch: 697/20099 (3.47%) Loss: 2.696438 LR: 0.00000540 +[06:04:59] Epoch: 1 Batch: 698/20099 (3.47%) Loss: 2.688920 LR: 0.00000540 +[06:05:02] Epoch: 1 Batch: 699/20099 (3.48%) Loss: 2.761857 LR: 0.00000540 +[06:05:06] Epoch: 1 Batch: 700/20099 (3.48%) Loss: 2.526166 LR: 0.00000545 +[06:05:09] Epoch: 1 Batch: 701/20099 (3.49%) Loss: 2.949073 LR: 0.00000545 +[06:05:12] Epoch: 1 Batch: 702/20099 (3.49%) Loss: 3.025122 LR: 0.00000545 +[06:05:15] Epoch: 1 Batch: 703/20099 (3.50%) Loss: 2.769567 LR: 0.00000545 +[06:05:18] Epoch: 1 Batch: 704/20099 (3.50%) Loss: 2.675805 LR: 0.00000545 +[06:05:21] Epoch: 1 Batch: 705/20099 (3.51%) Loss: 2.773868 LR: 0.00000545 +[06:05:24] Epoch: 1 Batch: 706/20099 (3.51%) Loss: 2.931582 LR: 0.00000545 +[06:05:27] Epoch: 1 Batch: 707/20099 (3.52%) Loss: 2.909219 LR: 0.00000551 +[06:05:30] Epoch: 1 Batch: 708/20099 (3.52%) Loss: 2.736878 LR: 0.00000551 +[06:05:33] Epoch: 1 Batch: 709/20099 (3.53%) Loss: 2.723336 LR: 0.00000551 +[06:05:36] Epoch: 1 Batch: 710/20099 (3.53%) Loss: 2.804015 LR: 0.00000551 +[06:05:40] Epoch: 1 Batch: 711/20099 (3.54%) Loss: 2.915295 LR: 0.00000551 +[06:05:43] Epoch: 1 Batch: 712/20099 (3.54%) Loss: 3.158375 LR: 0.00000551 +[06:05:46] Epoch: 1 Batch: 713/20099 (3.55%) Loss: 3.050549 LR: 0.00000551 +[06:05:49] Epoch: 1 Batch: 714/20099 (3.55%) Loss: 2.436827 LR: 0.00000556 +[06:05:52] Epoch: 1 Batch: 715/20099 (3.56%) Loss: 2.664726 LR: 0.00000556 +[06:05:55] Epoch: 1 Batch: 716/20099 (3.56%) Loss: 
2.712883 LR: 0.00000556 +[06:05:58] Epoch: 1 Batch: 717/20099 (3.57%) Loss: 2.872016 LR: 0.00000556 +[06:06:01] Epoch: 1 Batch: 718/20099 (3.57%) Loss: 2.960576 LR: 0.00000556 +[06:06:04] Epoch: 1 Batch: 719/20099 (3.58%) Loss: 2.906193 LR: 0.00000556 +[06:06:07] Epoch: 1 Batch: 720/20099 (3.58%) Loss: 2.900452 LR: 0.00000556 +[06:06:11] Epoch: 1 Batch: 721/20099 (3.59%) Loss: 2.983211 LR: 0.00000562 +[06:06:14] Epoch: 1 Batch: 722/20099 (3.59%) Loss: 2.976666 LR: 0.00000562 +[06:06:17] Epoch: 1 Batch: 723/20099 (3.60%) Loss: 2.955793 LR: 0.00000562 +[06:06:20] Epoch: 1 Batch: 724/20099 (3.60%) Loss: 2.357826 LR: 0.00000562 +[06:06:23] Epoch: 1 Batch: 725/20099 (3.61%) Loss: 2.686409 LR: 0.00000562 +[06:06:26] Epoch: 1 Batch: 726/20099 (3.61%) Loss: 2.638599 LR: 0.00000562 +[06:06:29] Epoch: 1 Batch: 727/20099 (3.62%) Loss: 2.642083 LR: 0.00000562 +[06:06:32] Epoch: 1 Batch: 728/20099 (3.62%) Loss: 2.543625 LR: 0.00000567 +[06:06:35] Epoch: 1 Batch: 729/20099 (3.63%) Loss: 2.819079 LR: 0.00000567 +[06:06:38] Epoch: 1 Batch: 730/20099 (3.63%) Loss: 2.456854 LR: 0.00000567 +[06:06:42] Epoch: 1 Batch: 731/20099 (3.64%) Loss: 2.797595 LR: 0.00000567 +[06:06:45] Epoch: 1 Batch: 732/20099 (3.64%) Loss: 2.716734 LR: 0.00000567 +[06:06:48] Epoch: 1 Batch: 733/20099 (3.65%) Loss: 2.592807 LR: 0.00000567 +[06:06:51] Epoch: 1 Batch: 734/20099 (3.65%) Loss: 2.724958 LR: 0.00000567 +[06:06:54] Epoch: 1 Batch: 735/20099 (3.66%) Loss: 3.005465 LR: 0.00000573 +[06:06:57] Epoch: 1 Batch: 736/20099 (3.66%) Loss: 2.725872 LR: 0.00000573 +[06:07:00] Epoch: 1 Batch: 737/20099 (3.67%) Loss: 3.063807 LR: 0.00000573 +[06:07:03] Epoch: 1 Batch: 738/20099 (3.67%) Loss: 2.776735 LR: 0.00000573 +[06:07:06] Epoch: 1 Batch: 739/20099 (3.68%) Loss: 3.007910 LR: 0.00000573 +[06:07:09] Epoch: 1 Batch: 740/20099 (3.68%) Loss: 2.573530 LR: 0.00000573 +[06:07:13] Epoch: 1 Batch: 741/20099 (3.69%) Loss: 2.598355 LR: 0.00000573 +[06:07:16] Epoch: 1 Batch: 742/20099 (3.69%) Loss: 2.466495 LR: 0.00000578 
+[06:07:19] Epoch: 1 Batch: 743/20099 (3.70%) Loss: 3.053300 LR: 0.00000578 +[06:07:22] Epoch: 1 Batch: 744/20099 (3.70%) Loss: 2.677148 LR: 0.00000578 +[06:07:25] Epoch: 1 Batch: 745/20099 (3.71%) Loss: 2.579324 LR: 0.00000578 +[06:07:28] Epoch: 1 Batch: 746/20099 (3.71%) Loss: 2.494274 LR: 0.00000578 +[06:07:31] Epoch: 1 Batch: 747/20099 (3.72%) Loss: 2.617231 LR: 0.00000578 +[06:07:34] Epoch: 1 Batch: 748/20099 (3.72%) Loss: 2.687482 LR: 0.00000578 +[06:07:37] Epoch: 1 Batch: 749/20099 (3.73%) Loss: 2.780354 LR: 0.00000584 +[06:07:40] Epoch: 1 Batch: 750/20099 (3.73%) Loss: 2.774773 LR: 0.00000584 +[06:07:44] Epoch: 1 Batch: 751/20099 (3.74%) Loss: 2.791642 LR: 0.00000584 +[06:07:47] Epoch: 1 Batch: 752/20099 (3.74%) Loss: 2.910693 LR: 0.00000584 +[06:07:50] Epoch: 1 Batch: 753/20099 (3.75%) Loss: 2.732524 LR: 0.00000584 +[06:07:53] Epoch: 1 Batch: 754/20099 (3.75%) Loss: 2.674719 LR: 0.00000584 +[06:07:56] Epoch: 1 Batch: 755/20099 (3.76%) Loss: 2.846109 LR: 0.00000584 +[06:07:59] Epoch: 1 Batch: 756/20099 (3.76%) Loss: 2.975952 LR: 0.00000589 +[06:08:02] Epoch: 1 Batch: 757/20099 (3.77%) Loss: 2.805113 LR: 0.00000589 +[06:08:05] Epoch: 1 Batch: 758/20099 (3.77%) Loss: 2.681490 LR: 0.00000589 +[06:08:08] Epoch: 1 Batch: 759/20099 (3.78%) Loss: 2.692689 LR: 0.00000589 +[06:08:11] Epoch: 1 Batch: 760/20099 (3.78%) Loss: 2.645676 LR: 0.00000589 +[06:08:15] Epoch: 1 Batch: 761/20099 (3.79%) Loss: 2.765140 LR: 0.00000589 +[06:08:18] Epoch: 1 Batch: 762/20099 (3.79%) Loss: 2.692878 LR: 0.00000589 +[06:08:21] Epoch: 1 Batch: 763/20099 (3.80%) Loss: 3.188788 LR: 0.00000595 +[06:08:24] Epoch: 1 Batch: 764/20099 (3.80%) Loss: 2.714193 LR: 0.00000595 +[06:08:27] Epoch: 1 Batch: 765/20099 (3.81%) Loss: 2.796363 LR: 0.00000595 +[06:08:30] Epoch: 1 Batch: 766/20099 (3.81%) Loss: 2.631731 LR: 0.00000595 +[06:08:33] Epoch: 1 Batch: 767/20099 (3.82%) Loss: 2.876368 LR: 0.00000595 +[06:08:36] Epoch: 1 Batch: 768/20099 (3.82%) Loss: 2.613647 LR: 0.00000595 +[06:08:39] Epoch: 1 
Batch: 769/20099 (3.83%) Loss: 2.565985 LR: 0.00000595 +[06:08:42] Epoch: 1 Batch: 770/20099 (3.83%) Loss: 2.686233 LR: 0.00000600 +[06:08:46] Epoch: 1 Batch: 771/20099 (3.84%) Loss: 2.977858 LR: 0.00000600 +[06:08:49] Epoch: 1 Batch: 772/20099 (3.84%) Loss: 2.741403 LR: 0.00000600 +[06:08:52] Epoch: 1 Batch: 773/20099 (3.85%) Loss: 2.547160 LR: 0.00000600 +[06:08:55] Epoch: 1 Batch: 774/20099 (3.85%) Loss: 2.780986 LR: 0.00000600 +[06:08:58] Epoch: 1 Batch: 775/20099 (3.86%) Loss: 2.665013 LR: 0.00000600 +[06:09:01] Epoch: 1 Batch: 776/20099 (3.86%) Loss: 2.602629 LR: 0.00000600 +[06:09:04] Epoch: 1 Batch: 777/20099 (3.87%) Loss: 2.866360 LR: 0.00000605 +[06:09:07] Epoch: 1 Batch: 778/20099 (3.87%) Loss: 2.461978 LR: 0.00000605 +[06:09:10] Epoch: 1 Batch: 779/20099 (3.88%) Loss: 2.697537 LR: 0.00000605 +[06:09:13] Epoch: 1 Batch: 780/20099 (3.88%) Loss: 2.482233 LR: 0.00000605 +[06:09:17] Epoch: 1 Batch: 781/20099 (3.89%) Loss: 2.558990 LR: 0.00000605 +[06:09:20] Epoch: 1 Batch: 782/20099 (3.89%) Loss: 2.437068 LR: 0.00000605 +[06:09:23] Epoch: 1 Batch: 783/20099 (3.90%) Loss: 2.735448 LR: 0.00000605 +[06:09:26] Epoch: 1 Batch: 784/20099 (3.90%) Loss: 2.429077 LR: 0.00000611 +[06:09:29] Epoch: 1 Batch: 785/20099 (3.91%) Loss: 2.583948 LR: 0.00000611 +[06:09:32] Epoch: 1 Batch: 786/20099 (3.91%) Loss: 2.514871 LR: 0.00000611 +[06:09:35] Epoch: 1 Batch: 787/20099 (3.92%) Loss: 2.677977 LR: 0.00000611 +[06:09:38] Epoch: 1 Batch: 788/20099 (3.92%) Loss: 2.808684 LR: 0.00000611 +[06:09:41] Epoch: 1 Batch: 789/20099 (3.93%) Loss: 2.763461 LR: 0.00000611 +[06:09:44] Epoch: 1 Batch: 790/20099 (3.93%) Loss: 2.608593 LR: 0.00000611 +[06:09:48] Epoch: 1 Batch: 791/20099 (3.94%) Loss: 2.535972 LR: 0.00000616 +[06:09:51] Epoch: 1 Batch: 792/20099 (3.94%) Loss: 2.715879 LR: 0.00000616 +[06:09:54] Epoch: 1 Batch: 793/20099 (3.95%) Loss: 2.616222 LR: 0.00000616 +[06:09:57] Epoch: 1 Batch: 794/20099 (3.95%) Loss: 2.713726 LR: 0.00000616 +[06:10:00] Epoch: 1 Batch: 795/20099 
(3.96%) Loss: 2.650722 LR: 0.00000616 +[06:10:03] Epoch: 1 Batch: 796/20099 (3.96%) Loss: 2.865614 LR: 0.00000616 +[06:10:06] Epoch: 1 Batch: 797/20099 (3.97%) Loss: 2.704346 LR: 0.00000616 +[06:10:09] Epoch: 1 Batch: 798/20099 (3.97%) Loss: 2.617279 LR: 0.00000622 +[06:10:12] Epoch: 1 Batch: 799/20099 (3.98%) Loss: 2.568028 LR: 0.00000622 +[06:10:19] >> Temp checkpoint saved: epoch1_step800, size: 0.1693 GB +[06:10:19] Epoch: 1 Batch: 800/20099 (3.98%) Loss: 2.402861 LR: 0.00000622 +[06:10:22] Epoch: 1 Batch: 801/20099 (3.99%) Loss: 2.842114 LR: 0.00000622 +[06:10:25] Epoch: 1 Batch: 802/20099 (3.99%) Loss: 2.828298 LR: 0.00000622 +[06:10:28] Epoch: 1 Batch: 803/20099 (4.00%) Loss: 2.641461 LR: 0.00000622 +[06:10:32] Epoch: 1 Batch: 804/20099 (4.00%) Loss: 2.578912 LR: 0.00000622 +[06:10:35] Epoch: 1 Batch: 805/20099 (4.01%) Loss: 2.651098 LR: 0.00000627 +[06:10:38] Epoch: 1 Batch: 806/20099 (4.01%) Loss: 2.744098 LR: 0.00000627 +[06:10:41] Epoch: 1 Batch: 807/20099 (4.02%) Loss: 2.703344 LR: 0.00000627 +[06:10:44] Epoch: 1 Batch: 808/20099 (4.02%) Loss: 2.505684 LR: 0.00000627 +[06:10:47] Epoch: 1 Batch: 809/20099 (4.03%) Loss: 2.504888 LR: 0.00000627 +[06:10:50] Epoch: 1 Batch: 810/20099 (4.03%) Loss: 2.287298 LR: 0.00000627 +[06:10:53] Epoch: 1 Batch: 811/20099 (4.04%) Loss: 2.504086 LR: 0.00000627 +[06:10:57] Epoch: 1 Batch: 812/20099 (4.04%) Loss: 2.532554 LR: 0.00000633 +[06:11:00] Epoch: 1 Batch: 813/20099 (4.04%) Loss: 2.721575 LR: 0.00000633 +[06:11:03] Epoch: 1 Batch: 814/20099 (4.05%) Loss: 2.276974 LR: 0.00000633 +[06:11:06] Epoch: 1 Batch: 815/20099 (4.05%) Loss: 2.621229 LR: 0.00000633 +[06:11:09] Epoch: 1 Batch: 816/20099 (4.06%) Loss: 2.589428 LR: 0.00000633 +[06:11:12] Epoch: 1 Batch: 817/20099 (4.06%) Loss: 2.452340 LR: 0.00000633 +[06:11:15] Epoch: 1 Batch: 818/20099 (4.07%) Loss: 2.821325 LR: 0.00000633 +[06:11:18] Epoch: 1 Batch: 819/20099 (4.07%) Loss: 2.840100 LR: 0.00000638 +[06:11:21] Epoch: 1 Batch: 820/20099 (4.08%) Loss: 2.325849 LR: 
0.00000638 +[06:11:24] Epoch: 1 Batch: 821/20099 (4.08%) Loss: 2.388970 LR: 0.00000638 +[06:11:28] Epoch: 1 Batch: 822/20099 (4.09%) Loss: 2.550123 LR: 0.00000638 +[06:11:31] Epoch: 1 Batch: 823/20099 (4.09%) Loss: 2.782383 LR: 0.00000638 +[06:11:34] Epoch: 1 Batch: 824/20099 (4.10%) Loss: 2.451442 LR: 0.00000638 +[06:11:37] Epoch: 1 Batch: 825/20099 (4.10%) Loss: 2.583817 LR: 0.00000638 +[06:11:40] Epoch: 1 Batch: 826/20099 (4.11%) Loss: 2.412059 LR: 0.00000644 +[06:11:43] Epoch: 1 Batch: 827/20099 (4.11%) Loss: 2.490955 LR: 0.00000644 +[06:11:46] Epoch: 1 Batch: 828/20099 (4.12%) Loss: 2.698726 LR: 0.00000644 +[06:11:49] Epoch: 1 Batch: 829/20099 (4.12%) Loss: 2.521588 LR: 0.00000644 +[06:11:52] Epoch: 1 Batch: 830/20099 (4.13%) Loss: 2.423469 LR: 0.00000644 +[06:11:55] Epoch: 1 Batch: 831/20099 (4.13%) Loss: 2.723634 LR: 0.00000644 +[06:11:58] Epoch: 1 Batch: 832/20099 (4.14%) Loss: 2.751270 LR: 0.00000644 +[06:12:01] Epoch: 1 Batch: 833/20099 (4.14%) Loss: 2.730820 LR: 0.00000649 +[06:12:05] Epoch: 1 Batch: 834/20099 (4.15%) Loss: 2.670857 LR: 0.00000649 +[06:12:08] Epoch: 1 Batch: 835/20099 (4.15%) Loss: 2.753617 LR: 0.00000649 +[06:12:11] Epoch: 1 Batch: 836/20099 (4.16%) Loss: 2.547131 LR: 0.00000649 +[06:12:14] Epoch: 1 Batch: 837/20099 (4.16%) Loss: 2.463307 LR: 0.00000649 +[06:12:17] Epoch: 1 Batch: 838/20099 (4.17%) Loss: 2.306565 LR: 0.00000649 +[06:12:20] Epoch: 1 Batch: 839/20099 (4.17%) Loss: 2.564209 LR: 0.00000649 +[06:12:23] Epoch: 1 Batch: 840/20099 (4.18%) Loss: 2.574380 LR: 0.00000655 +[06:12:26] Epoch: 1 Batch: 841/20099 (4.18%) Loss: 2.604923 LR: 0.00000655 +[06:12:29] Epoch: 1 Batch: 842/20099 (4.19%) Loss: 2.540481 LR: 0.00000655 +[06:12:32] Epoch: 1 Batch: 843/20099 (4.19%) Loss: 2.644381 LR: 0.00000655 +[06:12:35] Epoch: 1 Batch: 844/20099 (4.20%) Loss: 2.717461 LR: 0.00000655 +[06:12:39] Epoch: 1 Batch: 845/20099 (4.20%) Loss: 2.635366 LR: 0.00000655 +[06:12:42] Epoch: 1 Batch: 846/20099 (4.21%) Loss: 2.562326 LR: 0.00000655 +[06:12:45] 
Epoch: 1 Batch: 847/20099 (4.21%) Loss: 2.681325 LR: 0.00000660 +[06:12:48] Epoch: 1 Batch: 848/20099 (4.22%) Loss: 2.723640 LR: 0.00000660 +[06:12:51] Epoch: 1 Batch: 849/20099 (4.22%) Loss: 2.795785 LR: 0.00000660 +[06:12:54] Epoch: 1 Batch: 850/20099 (4.23%) Loss: 2.624772 LR: 0.00000660 +[06:12:57] Epoch: 1 Batch: 851/20099 (4.23%) Loss: 2.405778 LR: 0.00000660 +[06:13:00] Epoch: 1 Batch: 852/20099 (4.24%) Loss: 2.557772 LR: 0.00000660 +[06:13:03] Epoch: 1 Batch: 853/20099 (4.24%) Loss: 2.468818 LR: 0.00000660 +[06:13:06] Epoch: 1 Batch: 854/20099 (4.25%) Loss: 2.674399 LR: 0.00000665 +[06:13:09] Epoch: 1 Batch: 855/20099 (4.25%) Loss: 2.687448 LR: 0.00000665 +[06:13:13] Epoch: 1 Batch: 856/20099 (4.26%) Loss: 2.744894 LR: 0.00000665 +[06:13:16] Epoch: 1 Batch: 857/20099 (4.26%) Loss: 2.650777 LR: 0.00000665 +[06:13:19] Epoch: 1 Batch: 858/20099 (4.27%) Loss: 2.825450 LR: 0.00000665 +[06:13:22] Epoch: 1 Batch: 859/20099 (4.27%) Loss: 2.514328 LR: 0.00000665 +[06:13:25] Epoch: 1 Batch: 860/20099 (4.28%) Loss: 2.751596 LR: 0.00000665 +[06:13:28] Epoch: 1 Batch: 861/20099 (4.28%) Loss: 2.488051 LR: 0.00000671 +[06:13:31] Epoch: 1 Batch: 862/20099 (4.29%) Loss: 2.606173 LR: 0.00000671 +[06:13:34] Epoch: 1 Batch: 863/20099 (4.29%) Loss: 2.521894 LR: 0.00000671 +[06:13:37] Epoch: 1 Batch: 864/20099 (4.30%) Loss: 2.455069 LR: 0.00000671 +[06:13:40] Epoch: 1 Batch: 865/20099 (4.30%) Loss: 2.490532 LR: 0.00000671 +[06:13:44] Epoch: 1 Batch: 866/20099 (4.31%) Loss: 2.348190 LR: 0.00000671 +[06:13:47] Epoch: 1 Batch: 867/20099 (4.31%) Loss: 2.403929 LR: 0.00000671 +[06:13:50] Epoch: 1 Batch: 868/20099 (4.32%) Loss: 2.724191 LR: 0.00000676 +[06:13:53] Epoch: 1 Batch: 869/20099 (4.32%) Loss: 2.716354 LR: 0.00000676 +[06:13:56] Epoch: 1 Batch: 870/20099 (4.33%) Loss: 2.639460 LR: 0.00000676 +[06:13:59] Epoch: 1 Batch: 871/20099 (4.33%) Loss: 2.227056 LR: 0.00000676 +[06:14:02] Epoch: 1 Batch: 872/20099 (4.34%) Loss: 2.472200 LR: 0.00000676 +[06:14:05] Epoch: 1 Batch: 
873/20099 (4.34%) Loss: 2.620000 LR: 0.00000676 +[06:14:08] Epoch: 1 Batch: 874/20099 (4.35%) Loss: 2.679246 LR: 0.00000676 +[06:14:12] Epoch: 1 Batch: 875/20099 (4.35%) Loss: 2.609647 LR: 0.00000682 +[06:14:15] Epoch: 1 Batch: 876/20099 (4.36%) Loss: 2.515188 LR: 0.00000682 +[06:14:18] Epoch: 1 Batch: 877/20099 (4.36%) Loss: 2.560900 LR: 0.00000682 +[06:14:21] Epoch: 1 Batch: 878/20099 (4.37%) Loss: 2.392089 LR: 0.00000682 +[06:14:24] Epoch: 1 Batch: 879/20099 (4.37%) Loss: 2.570833 LR: 0.00000682 +[06:14:27] Epoch: 1 Batch: 880/20099 (4.38%) Loss: 2.411783 LR: 0.00000682 +[06:14:30] Epoch: 1 Batch: 881/20099 (4.38%) Loss: 2.602786 LR: 0.00000682 +[06:14:33] Epoch: 1 Batch: 882/20099 (4.39%) Loss: 2.482776 LR: 0.00000687 +[06:14:36] Epoch: 1 Batch: 883/20099 (4.39%) Loss: 2.470782 LR: 0.00000687 +[06:14:40] Epoch: 1 Batch: 884/20099 (4.40%) Loss: 2.475675 LR: 0.00000687 +[06:14:43] Epoch: 1 Batch: 885/20099 (4.40%) Loss: 2.568896 LR: 0.00000687 +[06:14:46] Epoch: 1 Batch: 886/20099 (4.41%) Loss: 2.462074 LR: 0.00000687 +[06:14:49] Epoch: 1 Batch: 887/20099 (4.41%) Loss: 2.576078 LR: 0.00000687 +[06:14:52] Epoch: 1 Batch: 888/20099 (4.42%) Loss: 2.787576 LR: 0.00000687 +[06:14:55] Epoch: 1 Batch: 889/20099 (4.42%) Loss: 2.327943 LR: 0.00000693 +[06:14:58] Epoch: 1 Batch: 890/20099 (4.43%) Loss: 2.285853 LR: 0.00000693 +[06:15:01] Epoch: 1 Batch: 891/20099 (4.43%) Loss: 2.261620 LR: 0.00000693 +[06:15:04] Epoch: 1 Batch: 892/20099 (4.44%) Loss: 2.802789 LR: 0.00000693 +[06:15:07] Epoch: 1 Batch: 893/20099 (4.44%) Loss: 2.366830 LR: 0.00000693 +[06:15:11] Epoch: 1 Batch: 894/20099 (4.45%) Loss: 2.537253 LR: 0.00000693 +[06:15:14] Epoch: 1 Batch: 895/20099 (4.45%) Loss: 2.551683 LR: 0.00000693 +[06:15:17] Epoch: 1 Batch: 896/20099 (4.46%) Loss: 2.565464 LR: 0.00000698 +[06:15:20] Epoch: 1 Batch: 897/20099 (4.46%) Loss: 2.883792 LR: 0.00000698 +[06:15:23] Epoch: 1 Batch: 898/20099 (4.47%) Loss: 2.671345 LR: 0.00000698 +[06:15:26] Epoch: 1 Batch: 899/20099 (4.47%) Loss: 
2.635837 LR: 0.00000698 +[06:15:29] Epoch: 1 Batch: 900/20099 (4.48%) Loss: 2.613115 LR: 0.00000698 +[06:15:32] Epoch: 1 Batch: 901/20099 (4.48%) Loss: 2.249015 LR: 0.00000698 +[06:15:35] Epoch: 1 Batch: 902/20099 (4.49%) Loss: 2.572182 LR: 0.00000698 +[06:15:38] Epoch: 1 Batch: 903/20099 (4.49%) Loss: 2.426290 LR: 0.00000704 +[06:15:41] Epoch: 1 Batch: 904/20099 (4.50%) Loss: 2.459203 LR: 0.00000704 +[06:15:44] Epoch: 1 Batch: 905/20099 (4.50%) Loss: 2.464326 LR: 0.00000704 +[06:15:48] Epoch: 1 Batch: 906/20099 (4.51%) Loss: 2.679364 LR: 0.00000704 +[06:15:51] Epoch: 1 Batch: 907/20099 (4.51%) Loss: 2.706455 LR: 0.00000704 +[06:15:54] Epoch: 1 Batch: 908/20099 (4.52%) Loss: 2.667820 LR: 0.00000704 +[06:15:57] Epoch: 1 Batch: 909/20099 (4.52%) Loss: 2.573327 LR: 0.00000704 +[06:16:00] Epoch: 1 Batch: 910/20099 (4.53%) Loss: 2.173689 LR: 0.00000709 +[06:16:03] Epoch: 1 Batch: 911/20099 (4.53%) Loss: 2.395568 LR: 0.00000709 +[06:16:06] Epoch: 1 Batch: 912/20099 (4.54%) Loss: 2.938343 LR: 0.00000709 +[06:16:09] Epoch: 1 Batch: 913/20099 (4.54%) Loss: 2.319208 LR: 0.00000709 +[06:16:12] Epoch: 1 Batch: 914/20099 (4.55%) Loss: 2.700063 LR: 0.00000709 +[06:16:15] Epoch: 1 Batch: 915/20099 (4.55%) Loss: 2.650727 LR: 0.00000709 +[06:16:19] Epoch: 1 Batch: 916/20099 (4.56%) Loss: 2.206372 LR: 0.00000709 +[06:16:22] Epoch: 1 Batch: 917/20099 (4.56%) Loss: 2.546712 LR: 0.00000715 +[06:16:25] Epoch: 1 Batch: 918/20099 (4.57%) Loss: 2.473594 LR: 0.00000715 +[06:16:28] Epoch: 1 Batch: 919/20099 (4.57%) Loss: 2.478794 LR: 0.00000715 +[06:16:31] Epoch: 1 Batch: 920/20099 (4.58%) Loss: 2.519163 LR: 0.00000715 +[06:16:34] Epoch: 1 Batch: 921/20099 (4.58%) Loss: 2.779296 LR: 0.00000715 +[06:16:37] Epoch: 1 Batch: 922/20099 (4.59%) Loss: 2.533009 LR: 0.00000715 +[06:16:40] Epoch: 1 Batch: 923/20099 (4.59%) Loss: 2.532431 LR: 0.00000715 +[06:16:43] Epoch: 1 Batch: 924/20099 (4.60%) Loss: 2.097837 LR: 0.00000720 +[06:16:46] Epoch: 1 Batch: 925/20099 (4.60%) Loss: 2.507565 LR: 0.00000720 
+[06:16:49] Epoch: 1 Batch: 926/20099 (4.61%) Loss: 2.726522 LR: 0.00000720 +[06:16:52] Epoch: 1 Batch: 927/20099 (4.61%) Loss: 2.686650 LR: 0.00000720 +[06:16:55] Epoch: 1 Batch: 928/20099 (4.62%) Loss: 2.418660 LR: 0.00000720 +[06:16:59] Epoch: 1 Batch: 929/20099 (4.62%) Loss: 2.385518 LR: 0.00000720 +[06:17:02] Epoch: 1 Batch: 930/20099 (4.63%) Loss: 2.404970 LR: 0.00000720 +[06:17:05] Epoch: 1 Batch: 931/20099 (4.63%) Loss: 2.696613 LR: 0.00000725 +[06:17:08] Epoch: 1 Batch: 932/20099 (4.64%) Loss: 2.703056 LR: 0.00000725 +[06:17:11] Epoch: 1 Batch: 933/20099 (4.64%) Loss: 2.434398 LR: 0.00000725 +[06:17:14] Epoch: 1 Batch: 934/20099 (4.65%) Loss: 2.638575 LR: 0.00000725 +[06:17:17] Epoch: 1 Batch: 935/20099 (4.65%) Loss: 2.449628 LR: 0.00000725 +[06:17:20] Epoch: 1 Batch: 936/20099 (4.66%) Loss: 2.620136 LR: 0.00000725 +[06:17:23] Epoch: 1 Batch: 937/20099 (4.66%) Loss: 2.590441 LR: 0.00000725 +[06:17:26] Epoch: 1 Batch: 938/20099 (4.67%) Loss: 2.460018 LR: 0.00000731 +[06:17:29] Epoch: 1 Batch: 939/20099 (4.67%) Loss: 2.603896 LR: 0.00000731 +[06:17:33] Epoch: 1 Batch: 940/20099 (4.68%) Loss: 2.745907 LR: 0.00000731 +[06:17:36] Epoch: 1 Batch: 941/20099 (4.68%) Loss: 2.647418 LR: 0.00000731 +[06:17:39] Epoch: 1 Batch: 942/20099 (4.69%) Loss: 2.458124 LR: 0.00000731 +[06:17:42] Epoch: 1 Batch: 943/20099 (4.69%) Loss: 2.558774 LR: 0.00000731 +[06:17:45] Epoch: 1 Batch: 944/20099 (4.70%) Loss: 2.440393 LR: 0.00000731 +[06:17:48] Epoch: 1 Batch: 945/20099 (4.70%) Loss: 2.493258 LR: 0.00000736 +[06:17:51] Epoch: 1 Batch: 946/20099 (4.71%) Loss: 2.502746 LR: 0.00000736 +[06:17:54] Epoch: 1 Batch: 947/20099 (4.71%) Loss: 2.610931 LR: 0.00000736 +[06:17:57] Epoch: 1 Batch: 948/20099 (4.72%) Loss: 2.715745 LR: 0.00000736 +[06:18:00] Epoch: 1 Batch: 949/20099 (4.72%) Loss: 2.582851 LR: 0.00000736 +[06:18:04] Epoch: 1 Batch: 950/20099 (4.73%) Loss: 2.658231 LR: 0.00000736 +[06:18:07] Epoch: 1 Batch: 951/20099 (4.73%) Loss: 2.459612 LR: 0.00000736 +[06:18:10] Epoch: 1 
Batch: 952/20099 (4.74%) Loss: 2.494330 LR: 0.00000742 +[06:18:13] Epoch: 1 Batch: 953/20099 (4.74%) Loss: 2.352329 LR: 0.00000742 +[06:18:16] Epoch: 1 Batch: 954/20099 (4.75%) Loss: 2.416545 LR: 0.00000742 +[06:18:19] Epoch: 1 Batch: 955/20099 (4.75%) Loss: 2.123285 LR: 0.00000742 +[06:18:22] Epoch: 1 Batch: 956/20099 (4.76%) Loss: 2.506108 LR: 0.00000742 +[06:18:25] Epoch: 1 Batch: 957/20099 (4.76%) Loss: 2.554704 LR: 0.00000742 +[06:18:28] Epoch: 1 Batch: 958/20099 (4.77%) Loss: 2.394995 LR: 0.00000742 +[06:18:31] Epoch: 1 Batch: 959/20099 (4.77%) Loss: 2.385046 LR: 0.00000747 +[06:18:35] Epoch: 1 Batch: 960/20099 (4.78%) Loss: 2.492765 LR: 0.00000747 +[06:18:38] Epoch: 1 Batch: 961/20099 (4.78%) Loss: 2.551732 LR: 0.00000747 +[06:18:41] Epoch: 1 Batch: 962/20099 (4.79%) Loss: 2.243909 LR: 0.00000747 +[06:18:44] Epoch: 1 Batch: 963/20099 (4.79%) Loss: 2.670889 LR: 0.00000747 +[06:18:47] Epoch: 1 Batch: 964/20099 (4.80%) Loss: 2.778051 LR: 0.00000747 +[06:18:50] Epoch: 1 Batch: 965/20099 (4.80%) Loss: 2.470696 LR: 0.00000747 +[06:18:53] Epoch: 1 Batch: 966/20099 (4.81%) Loss: 2.696412 LR: 0.00000753 +[06:18:56] Epoch: 1 Batch: 967/20099 (4.81%) Loss: 2.851357 LR: 0.00000753 +[06:18:59] Epoch: 1 Batch: 968/20099 (4.82%) Loss: 2.447409 LR: 0.00000753 +[06:19:02] Epoch: 1 Batch: 969/20099 (4.82%) Loss: 2.442802 LR: 0.00000753 +[06:19:06] Epoch: 1 Batch: 970/20099 (4.83%) Loss: 2.610682 LR: 0.00000753 +[06:19:09] Epoch: 1 Batch: 971/20099 (4.83%) Loss: 2.440674 LR: 0.00000753 +[06:19:12] Epoch: 1 Batch: 972/20099 (4.84%) Loss: 2.639163 LR: 0.00000753 +[06:19:15] Epoch: 1 Batch: 973/20099 (4.84%) Loss: 2.355131 LR: 0.00000758 +[06:19:18] Epoch: 1 Batch: 974/20099 (4.85%) Loss: 2.356697 LR: 0.00000758 +[06:19:21] Epoch: 1 Batch: 975/20099 (4.85%) Loss: 2.390357 LR: 0.00000758 +[06:19:24] Epoch: 1 Batch: 976/20099 (4.86%) Loss: 2.958416 LR: 0.00000758 +[06:19:27] Epoch: 1 Batch: 977/20099 (4.86%) Loss: 2.671299 LR: 0.00000758 +[06:19:30] Epoch: 1 Batch: 978/20099 
(4.87%) Loss: 2.239478 LR: 0.00000758 +[06:19:33] Epoch: 1 Batch: 979/20099 (4.87%) Loss: 2.714630 LR: 0.00000758 +[06:19:37] Epoch: 1 Batch: 980/20099 (4.88%) Loss: 2.516395 LR: 0.00000764 +[06:19:40] Epoch: 1 Batch: 981/20099 (4.88%) Loss: 2.523391 LR: 0.00000764 +[06:19:43] Epoch: 1 Batch: 982/20099 (4.89%) Loss: 2.685921 LR: 0.00000764 +[06:19:46] Epoch: 1 Batch: 983/20099 (4.89%) Loss: 2.260486 LR: 0.00000764 +[06:19:49] Epoch: 1 Batch: 984/20099 (4.90%) Loss: 2.365876 LR: 0.00000764 +[06:19:52] Epoch: 1 Batch: 985/20099 (4.90%) Loss: 2.634164 LR: 0.00000764 +[06:19:55] Epoch: 1 Batch: 986/20099 (4.91%) Loss: 2.481252 LR: 0.00000764 +[06:19:58] Epoch: 1 Batch: 987/20099 (4.91%) Loss: 2.762477 LR: 0.00000769 +[06:20:01] Epoch: 1 Batch: 988/20099 (4.92%) Loss: 2.392833 LR: 0.00000769 +[06:20:05] Epoch: 1 Batch: 989/20099 (4.92%) Loss: 2.242428 LR: 0.00000769 +[06:20:08] Epoch: 1 Batch: 990/20099 (4.93%) Loss: 2.609648 LR: 0.00000769 +[06:20:11] Epoch: 1 Batch: 991/20099 (4.93%) Loss: 2.314681 LR: 0.00000769 +[06:20:14] Epoch: 1 Batch: 992/20099 (4.94%) Loss: 2.377580 LR: 0.00000769 +[06:20:17] Epoch: 1 Batch: 993/20099 (4.94%) Loss: 2.282904 LR: 0.00000769 +[06:20:20] Epoch: 1 Batch: 994/20099 (4.95%) Loss: 2.415827 LR: 0.00000775 +[06:20:23] Epoch: 1 Batch: 995/20099 (4.95%) Loss: 2.501282 LR: 0.00000775 +[06:20:26] Epoch: 1 Batch: 996/20099 (4.96%) Loss: 2.216219 LR: 0.00000775 +[06:20:29] Epoch: 1 Batch: 997/20099 (4.96%) Loss: 2.484162 LR: 0.00000775 +[06:20:32] Epoch: 1 Batch: 998/20099 (4.97%) Loss: 2.314183 LR: 0.00000775 +[06:20:35] Epoch: 1 Batch: 999/20099 (4.97%) Loss: 2.462252 LR: 0.00000775 +[06:20:39] >> Evaluating batch 0 +[06:20:40] >> Evaluating batch 1 +[06:20:41] >> Evaluating batch 2 +[06:20:43] >> Evaluating batch 3 +[06:20:44] >> Evaluating batch 4 +[06:20:45] >> Evaluating batch 5 +[06:20:46] >> Evaluating batch 6 +[06:20:48] >> Evaluating batch 7 +[06:20:49] >> Evaluating batch 8 +[06:20:50] >> Evaluating batch 9 +[06:20:51] >> Evaluating 
batch 10 +[06:20:53] >> Evaluating batch 11 +[06:20:54] >> Evaluating batch 12 +[06:20:55] >> Evaluating batch 13 +[06:20:56] >> Evaluating batch 14 +[06:20:57] >> Evaluating batch 15 +[06:20:58] >> Evaluating batch 16 +[06:20:59] Epoch: 1 Step: 1000/20099 Evaluation: +[06:20:59] Avg Loss Since Last Eval: 2.8438 Val Loss: 2.5492 Validation loss delta: -1.2349 Perplexity: 12.7973 LR: 0.00000775 +[06:21:03] >> Temp checkpoint saved: epoch1_step1000, size: 0.1693 GB +[06:21:06] >> Checkpoint saved: epoch1_step1000, size: 0.1693 GB +[06:21:06] Epoch: 1 Batch: 1000/20099 (4.98%) Loss: 2.268009 LR: 0.00000775 +[06:21:09] Epoch: 1 Batch: 1001/20099 (4.98%) Loss: 2.465096 LR: 0.00000780 +[06:21:12] Epoch: 1 Batch: 1002/20099 (4.99%) Loss: 2.413087 LR: 0.00000780 +[06:21:16] Epoch: 1 Batch: 1003/20099 (4.99%) Loss: 2.499773 LR: 0.00000780 +[06:21:19] Epoch: 1 Batch: 1004/20099 (5.00%) Loss: 2.516705 LR: 0.00000780 +[06:21:22] Epoch: 1 Batch: 1005/20099 (5.00%) Loss: 2.315004 LR: 0.00000780 +[06:21:25] Epoch: 1 Batch: 1006/20099 (5.01%) Loss: 2.431981 LR: 0.00000780 +[06:21:28] Epoch: 1 Batch: 1007/20099 (5.01%) Loss: 2.782194 LR: 0.00000780 +[06:21:31] Epoch: 1 Batch: 1008/20099 (5.02%) Loss: 2.353743 LR: 0.00000785 +[06:21:34] Epoch: 1 Batch: 1009/20099 (5.02%) Loss: 2.532976 LR: 0.00000785 +[06:21:37] Epoch: 1 Batch: 1010/20099 (5.03%) Loss: 2.326507 LR: 0.00000785 +[06:21:41] Epoch: 1 Batch: 1011/20099 (5.03%) Loss: 2.380943 LR: 0.00000785 +[06:21:44] Epoch: 1 Batch: 1012/20099 (5.04%) Loss: 2.446947 LR: 0.00000785 +[06:21:47] Epoch: 1 Batch: 1013/20099 (5.04%) Loss: 2.245902 LR: 0.00000785 +[06:21:50] Epoch: 1 Batch: 1014/20099 (5.05%) Loss: 2.602089 LR: 0.00000785 +[06:21:53] Epoch: 1 Batch: 1015/20099 (5.05%) Loss: 2.296535 LR: 0.00000791 +[06:21:56] Epoch: 1 Batch: 1016/20099 (5.05%) Loss: 2.479060 LR: 0.00000791 +[06:21:59] Epoch: 1 Batch: 1017/20099 (5.06%) Loss: 2.798729 LR: 0.00000791 +[06:22:02] Epoch: 1 Batch: 1018/20099 (5.06%) Loss: 2.501838 LR: 0.00000791 
+[06:22:06] Epoch: 1 Batch: 1019/20099 (5.07%) Loss: 2.458307 LR: 0.00000791 +[06:22:09] Epoch: 1 Batch: 1020/20099 (5.07%) Loss: 2.589815 LR: 0.00000791 +[06:22:12] Epoch: 1 Batch: 1021/20099 (5.08%) Loss: 2.443780 LR: 0.00000791 +[06:22:15] Epoch: 1 Batch: 1022/20099 (5.08%) Loss: 2.565748 LR: 0.00000796 +[06:22:18] Epoch: 1 Batch: 1023/20099 (5.09%) Loss: 2.536574 LR: 0.00000796 +[06:22:21] Epoch: 1 Batch: 1024/20099 (5.09%) Loss: 2.380493 LR: 0.00000796 +[06:22:24] Epoch: 1 Batch: 1025/20099 (5.10%) Loss: 2.784717 LR: 0.00000796 +[06:22:27] Epoch: 1 Batch: 1026/20099 (5.10%) Loss: 2.414336 LR: 0.00000796 +[06:22:30] Epoch: 1 Batch: 1027/20099 (5.11%) Loss: 2.274538 LR: 0.00000796 +[06:22:33] Epoch: 1 Batch: 1028/20099 (5.11%) Loss: 2.227084 LR: 0.00000796 +[06:22:36] Epoch: 1 Batch: 1029/20099 (5.12%) Loss: 2.215168 LR: 0.00000802 +[06:22:39] Epoch: 1 Batch: 1030/20099 (5.12%) Loss: 2.564003 LR: 0.00000802 +[06:22:42] Epoch: 1 Batch: 1031/20099 (5.13%) Loss: 2.357389 LR: 0.00000802 +[06:22:46] Epoch: 1 Batch: 1032/20099 (5.13%) Loss: 2.582675 LR: 0.00000802 +[06:22:49] Epoch: 1 Batch: 1033/20099 (5.14%) Loss: 2.620351 LR: 0.00000802 +[06:22:52] Epoch: 1 Batch: 1034/20099 (5.14%) Loss: 2.459991 LR: 0.00000802 +[06:22:55] Epoch: 1 Batch: 1035/20099 (5.15%) Loss: 2.560182 LR: 0.00000802 +[06:22:58] Epoch: 1 Batch: 1036/20099 (5.15%) Loss: 2.530527 LR: 0.00000807 +[06:23:01] Epoch: 1 Batch: 1037/20099 (5.16%) Loss: 2.416432 LR: 0.00000807 +[06:23:04] Epoch: 1 Batch: 1038/20099 (5.16%) Loss: 2.545779 LR: 0.00000807 +[06:23:07] Epoch: 1 Batch: 1039/20099 (5.17%) Loss: 2.685722 LR: 0.00000807 +[06:23:10] Epoch: 1 Batch: 1040/20099 (5.17%) Loss: 2.668758 LR: 0.00000807 +[06:23:13] Epoch: 1 Batch: 1041/20099 (5.18%) Loss: 2.590490 LR: 0.00000807 +[06:23:17] Epoch: 1 Batch: 1042/20099 (5.18%) Loss: 2.365594 LR: 0.00000807 +[06:23:20] Epoch: 1 Batch: 1043/20099 (5.19%) Loss: 2.601564 LR: 0.00000813 +[06:23:23] Epoch: 1 Batch: 1044/20099 (5.19%) Loss: 2.401983 LR: 
0.00000813 +[06:23:26] Epoch: 1 Batch: 1045/20099 (5.20%) Loss: 2.252738 LR: 0.00000813 +[06:23:29] Epoch: 1 Batch: 1046/20099 (5.20%) Loss: 2.220903 LR: 0.00000813 +[06:23:32] Epoch: 1 Batch: 1047/20099 (5.21%) Loss: 2.507816 LR: 0.00000813 +[06:23:35] Epoch: 1 Batch: 1048/20099 (5.21%) Loss: 2.420840 LR: 0.00000813 +[06:23:38] Epoch: 1 Batch: 1049/20099 (5.22%) Loss: 2.372026 LR: 0.00000813 +[06:23:41] Epoch: 1 Batch: 1050/20099 (5.22%) Loss: 2.279827 LR: 0.00000818 +[06:23:44] Epoch: 1 Batch: 1051/20099 (5.23%) Loss: 2.455801 LR: 0.00000818 +[06:23:48] Epoch: 1 Batch: 1052/20099 (5.23%) Loss: 2.369296 LR: 0.00000818 +[06:23:51] Epoch: 1 Batch: 1053/20099 (5.24%) Loss: 2.392564 LR: 0.00000818 +[06:23:54] Epoch: 1 Batch: 1054/20099 (5.24%) Loss: 2.121730 LR: 0.00000818 +[06:23:57] Epoch: 1 Batch: 1055/20099 (5.25%) Loss: 2.413616 LR: 0.00000818 +[06:24:00] Epoch: 1 Batch: 1056/20099 (5.25%) Loss: 2.601921 LR: 0.00000818 +[06:24:03] Epoch: 1 Batch: 1057/20099 (5.26%) Loss: 2.185557 LR: 0.00000824 +[06:24:06] Epoch: 1 Batch: 1058/20099 (5.26%) Loss: 2.342689 LR: 0.00000824 +[06:24:09] Epoch: 1 Batch: 1059/20099 (5.27%) Loss: 2.199738 LR: 0.00000824 +[06:24:12] Epoch: 1 Batch: 1060/20099 (5.27%) Loss: 2.046974 LR: 0.00000824 +[06:24:15] Epoch: 1 Batch: 1061/20099 (5.28%) Loss: 2.426784 LR: 0.00000824 +[06:24:18] Epoch: 1 Batch: 1062/20099 (5.28%) Loss: 2.600775 LR: 0.00000824 +[06:24:22] Epoch: 1 Batch: 1063/20099 (5.29%) Loss: 2.172648 LR: 0.00000824 +[06:24:25] Epoch: 1 Batch: 1064/20099 (5.29%) Loss: 2.719525 LR: 0.00000829 +[06:24:28] Epoch: 1 Batch: 1065/20099 (5.30%) Loss: 2.245898 LR: 0.00000829 +[06:24:31] Epoch: 1 Batch: 1066/20099 (5.30%) Loss: 2.280223 LR: 0.00000829 +[06:24:34] Epoch: 1 Batch: 1067/20099 (5.31%) Loss: 2.671634 LR: 0.00000829 +[06:24:37] Epoch: 1 Batch: 1068/20099 (5.31%) Loss: 2.602126 LR: 0.00000829 +[06:24:40] Epoch: 1 Batch: 1069/20099 (5.32%) Loss: 2.225140 LR: 0.00000829 +[06:24:43] Epoch: 1 Batch: 1070/20099 (5.32%) Loss: 2.321930 
LR: 0.00000829 +[06:24:46] Epoch: 1 Batch: 1071/20099 (5.33%) Loss: 2.459015 LR: 0.00000835 +[06:24:49] Epoch: 1 Batch: 1072/20099 (5.33%) Loss: 2.347937 LR: 0.00000835 +[06:24:53] Epoch: 1 Batch: 1073/20099 (5.34%) Loss: 2.132351 LR: 0.00000835 +[06:24:56] Epoch: 1 Batch: 1074/20099 (5.34%) Loss: 2.659304 LR: 0.00000835 +[06:24:59] Epoch: 1 Batch: 1075/20099 (5.35%) Loss: 2.534127 LR: 0.00000835 +[06:25:02] Epoch: 1 Batch: 1076/20099 (5.35%) Loss: 2.407731 LR: 0.00000835 +[06:25:05] Epoch: 1 Batch: 1077/20099 (5.36%) Loss: 2.341429 LR: 0.00000835 +[06:25:08] Epoch: 1 Batch: 1078/20099 (5.36%) Loss: 2.475575 LR: 0.00000840 +[06:25:11] Epoch: 1 Batch: 1079/20099 (5.37%) Loss: 2.302447 LR: 0.00000840 +[06:25:14] Epoch: 1 Batch: 1080/20099 (5.37%) Loss: 2.210517 LR: 0.00000840 +[06:25:17] Epoch: 1 Batch: 1081/20099 (5.38%) Loss: 2.746410 LR: 0.00000840 +[06:25:20] Epoch: 1 Batch: 1082/20099 (5.38%) Loss: 2.703728 LR: 0.00000840 +[06:25:23] Epoch: 1 Batch: 1083/20099 (5.39%) Loss: 2.560720 LR: 0.00000840 +[06:25:27] Epoch: 1 Batch: 1084/20099 (5.39%) Loss: 2.882574 LR: 0.00000840 +[06:25:30] Epoch: 1 Batch: 1085/20099 (5.40%) Loss: 2.218088 LR: 0.00000845 +[06:25:33] Epoch: 1 Batch: 1086/20099 (5.40%) Loss: 2.527889 LR: 0.00000845 +[06:25:36] Epoch: 1 Batch: 1087/20099 (5.41%) Loss: 2.355284 LR: 0.00000845 +[06:25:39] Epoch: 1 Batch: 1088/20099 (5.41%) Loss: 2.381893 LR: 0.00000845 +[06:25:42] Epoch: 1 Batch: 1089/20099 (5.42%) Loss: 2.128335 LR: 0.00000845 +[06:25:45] Epoch: 1 Batch: 1090/20099 (5.42%) Loss: 2.270531 LR: 0.00000845 +[06:25:48] Epoch: 1 Batch: 1091/20099 (5.43%) Loss: 2.466590 LR: 0.00000845 +[06:25:51] Epoch: 1 Batch: 1092/20099 (5.43%) Loss: 2.468090 LR: 0.00000851 +[06:25:54] Epoch: 1 Batch: 1093/20099 (5.44%) Loss: 2.313470 LR: 0.00000851 +[06:25:58] Epoch: 1 Batch: 1094/20099 (5.44%) Loss: 2.570095 LR: 0.00000851 +[06:26:01] Epoch: 1 Batch: 1095/20099 (5.45%) Loss: 2.401191 LR: 0.00000851 +[06:26:04] Epoch: 1 Batch: 1096/20099 (5.45%) Loss: 
2.873330 LR: 0.00000851 +[06:26:07] Epoch: 1 Batch: 1097/20099 (5.46%) Loss: 2.265061 LR: 0.00000851 +[06:26:10] Epoch: 1 Batch: 1098/20099 (5.46%) Loss: 2.257049 LR: 0.00000851 +[06:26:13] Epoch: 1 Batch: 1099/20099 (5.47%) Loss: 2.407581 LR: 0.00000856 +[06:26:16] Epoch: 1 Batch: 1100/20099 (5.47%) Loss: 2.699013 LR: 0.00000856 +[06:26:19] Epoch: 1 Batch: 1101/20099 (5.48%) Loss: 2.583417 LR: 0.00000856 +[06:26:22] Epoch: 1 Batch: 1102/20099 (5.48%) Loss: 2.178537 LR: 0.00000856 +[06:26:25] Epoch: 1 Batch: 1103/20099 (5.49%) Loss: 2.603993 LR: 0.00000856 +[06:26:29] Epoch: 1 Batch: 1104/20099 (5.49%) Loss: 2.798329 LR: 0.00000856 +[06:26:32] Epoch: 1 Batch: 1105/20099 (5.50%) Loss: 2.329470 LR: 0.00000856 +[06:26:35] Epoch: 1 Batch: 1106/20099 (5.50%) Loss: 2.213904 LR: 0.00000862 +[06:26:38] Epoch: 1 Batch: 1107/20099 (5.51%) Loss: 2.261133 LR: 0.00000862 +[06:26:41] Epoch: 1 Batch: 1108/20099 (5.51%) Loss: 2.428382 LR: 0.00000862 +[06:26:44] Epoch: 1 Batch: 1109/20099 (5.52%) Loss: 2.406951 LR: 0.00000862 +[06:26:47] Epoch: 1 Batch: 1110/20099 (5.52%) Loss: 2.318517 LR: 0.00000862 +[06:26:50] Epoch: 1 Batch: 1111/20099 (5.53%) Loss: 2.386443 LR: 0.00000862 +[06:26:53] Epoch: 1 Batch: 1112/20099 (5.53%) Loss: 2.553387 LR: 0.00000862 +[06:26:57] Epoch: 1 Batch: 1113/20099 (5.54%) Loss: 2.267224 LR: 0.00000867 +[06:27:00] Epoch: 1 Batch: 1114/20099 (5.54%) Loss: 2.337773 LR: 0.00000867 +[06:27:03] Epoch: 1 Batch: 1115/20099 (5.55%) Loss: 2.511516 LR: 0.00000867 +[06:27:06] Epoch: 1 Batch: 1116/20099 (5.55%) Loss: 2.156609 LR: 0.00000867 +[06:27:09] Epoch: 1 Batch: 1117/20099 (5.56%) Loss: 2.514179 LR: 0.00000867 +[06:27:12] Epoch: 1 Batch: 1118/20099 (5.56%) Loss: 2.929897 LR: 0.00000867 +[06:27:15] Epoch: 1 Batch: 1119/20099 (5.57%) Loss: 2.228798 LR: 0.00000867 +[06:27:18] Epoch: 1 Batch: 1120/20099 (5.57%) Loss: 2.461570 LR: 0.00000873 +[06:27:21] Epoch: 1 Batch: 1121/20099 (5.58%) Loss: 2.428960 LR: 0.00000873 +[06:27:24] Epoch: 1 Batch: 1122/20099 (5.58%) 
Loss: 2.169969 LR: 0.00000873 +[06:27:28] Epoch: 1 Batch: 1123/20099 (5.59%) Loss: 2.601725 LR: 0.00000873 +[06:27:31] Epoch: 1 Batch: 1124/20099 (5.59%) Loss: 2.426909 LR: 0.00000873 +[06:27:34] Epoch: 1 Batch: 1125/20099 (5.60%) Loss: 2.380077 LR: 0.00000873 +[06:27:37] Epoch: 1 Batch: 1126/20099 (5.60%) Loss: 2.342493 LR: 0.00000873 +[06:27:40] Epoch: 1 Batch: 1127/20099 (5.61%) Loss: 2.250722 LR: 0.00000878 +[06:27:43] Epoch: 1 Batch: 1128/20099 (5.61%) Loss: 2.638403 LR: 0.00000878 +[06:27:46] Epoch: 1 Batch: 1129/20099 (5.62%) Loss: 2.463369 LR: 0.00000878 +[06:27:49] Epoch: 1 Batch: 1130/20099 (5.62%) Loss: 2.428862 LR: 0.00000878 +[06:27:52] Epoch: 1 Batch: 1131/20099 (5.63%) Loss: 2.133033 LR: 0.00000878 +[06:27:55] Epoch: 1 Batch: 1132/20099 (5.63%) Loss: 2.573893 LR: 0.00000878 +[06:27:58] Epoch: 1 Batch: 1133/20099 (5.64%) Loss: 2.405326 LR: 0.00000878 +[06:28:02] Epoch: 1 Batch: 1134/20099 (5.64%) Loss: 2.285279 LR: 0.00000884 +[06:28:05] Epoch: 1 Batch: 1135/20099 (5.65%) Loss: 2.563377 LR: 0.00000884 +[06:28:08] Epoch: 1 Batch: 1136/20099 (5.65%) Loss: 2.409335 LR: 0.00000884 +[06:28:11] Epoch: 1 Batch: 1137/20099 (5.66%) Loss: 2.674809 LR: 0.00000884 +[06:28:14] Epoch: 1 Batch: 1138/20099 (5.66%) Loss: 2.217234 LR: 0.00000884 +[06:28:17] Epoch: 1 Batch: 1139/20099 (5.67%) Loss: 2.504634 LR: 0.00000884 +[06:28:20] Epoch: 1 Batch: 1140/20099 (5.67%) Loss: 2.380763 LR: 0.00000884 +[06:28:23] Epoch: 1 Batch: 1141/20099 (5.68%) Loss: 2.478846 LR: 0.00000889 +[06:28:26] Epoch: 1 Batch: 1142/20099 (5.68%) Loss: 2.219178 LR: 0.00000889 +[06:28:29] Epoch: 1 Batch: 1143/20099 (5.69%) Loss: 2.479153 LR: 0.00000889 +[06:28:33] Epoch: 1 Batch: 1144/20099 (5.69%) Loss: 2.338852 LR: 0.00000889 +[06:28:36] Epoch: 1 Batch: 1145/20099 (5.70%) Loss: 2.547462 LR: 0.00000889 +[06:28:39] Epoch: 1 Batch: 1146/20099 (5.70%) Loss: 2.326259 LR: 0.00000889 +[06:28:42] Epoch: 1 Batch: 1147/20099 (5.71%) Loss: 2.430275 LR: 0.00000889 +[06:28:45] Epoch: 1 Batch: 1148/20099 
(5.71%) Loss: 2.331192 LR: 0.00000895 +[06:28:48] Epoch: 1 Batch: 1149/20099 (5.72%) Loss: 2.377501 LR: 0.00000895 +[06:28:51] Epoch: 1 Batch: 1150/20099 (5.72%) Loss: 2.405118 LR: 0.00000895 +[06:28:54] Epoch: 1 Batch: 1151/20099 (5.73%) Loss: 2.438580 LR: 0.00000895 +[06:28:57] Epoch: 1 Batch: 1152/20099 (5.73%) Loss: 2.582524 LR: 0.00000895 +[06:29:00] Epoch: 1 Batch: 1153/20099 (5.74%) Loss: 2.605482 LR: 0.00000895 +[06:29:04] Epoch: 1 Batch: 1154/20099 (5.74%) Loss: 2.351916 LR: 0.00000895 +[06:29:07] Epoch: 1 Batch: 1155/20099 (5.75%) Loss: 2.359187 LR: 0.00000900 +[06:29:10] Epoch: 1 Batch: 1156/20099 (5.75%) Loss: 2.781204 LR: 0.00000900 +[06:29:13] Epoch: 1 Batch: 1157/20099 (5.76%) Loss: 2.216732 LR: 0.00000900 +[06:29:16] Epoch: 1 Batch: 1158/20099 (5.76%) Loss: 2.617371 LR: 0.00000900 +[06:29:19] Epoch: 1 Batch: 1159/20099 (5.77%) Loss: 2.614164 LR: 0.00000900 +[06:29:22] Epoch: 1 Batch: 1160/20099 (5.77%) Loss: 2.227229 LR: 0.00000900 +[06:29:25] Epoch: 1 Batch: 1161/20099 (5.78%) Loss: 2.316306 LR: 0.00000900 +[06:29:28] Epoch: 1 Batch: 1162/20099 (5.78%) Loss: 2.546376 LR: 0.00000905 +[06:29:31] Epoch: 1 Batch: 1163/20099 (5.79%) Loss: 2.547140 LR: 0.00000905 +[06:29:35] Epoch: 1 Batch: 1164/20099 (5.79%) Loss: 2.212639 LR: 0.00000905 +[06:29:38] Epoch: 1 Batch: 1165/20099 (5.80%) Loss: 2.607040 LR: 0.00000905 +[06:29:41] Epoch: 1 Batch: 1166/20099 (5.80%) Loss: 2.266511 LR: 0.00000905 +[06:29:44] Epoch: 1 Batch: 1167/20099 (5.81%) Loss: 2.461068 LR: 0.00000905 +[06:29:47] Epoch: 1 Batch: 1168/20099 (5.81%) Loss: 2.310920 LR: 0.00000905 +[06:29:50] Epoch: 1 Batch: 1169/20099 (5.82%) Loss: 2.496668 LR: 0.00000911 +[06:29:53] Epoch: 1 Batch: 1170/20099 (5.82%) Loss: 2.436123 LR: 0.00000911 +[06:29:56] Epoch: 1 Batch: 1171/20099 (5.83%) Loss: 2.411027 LR: 0.00000911 +[06:29:59] Epoch: 1 Batch: 1172/20099 (5.83%) Loss: 2.337048 LR: 0.00000911 +[06:30:02] Epoch: 1 Batch: 1173/20099 (5.84%) Loss: 2.440300 LR: 0.00000911 +[06:30:05] Epoch: 1 Batch: 
1174/20099 (5.84%) Loss: 2.199517 LR: 0.00000911 +[06:30:09] Epoch: 1 Batch: 1175/20099 (5.85%) Loss: 1.955714 LR: 0.00000911 +[06:30:12] Epoch: 1 Batch: 1176/20099 (5.85%) Loss: 2.494298 LR: 0.00000916 +[06:30:15] Epoch: 1 Batch: 1177/20099 (5.86%) Loss: 2.460342 LR: 0.00000916 +[06:30:18] Epoch: 1 Batch: 1178/20099 (5.86%) Loss: 2.436229 LR: 0.00000916 +[06:30:21] Epoch: 1 Batch: 1179/20099 (5.87%) Loss: 2.279537 LR: 0.00000916 +[06:30:24] Epoch: 1 Batch: 1180/20099 (5.87%) Loss: 2.419563 LR: 0.00000916 +[06:30:27] Epoch: 1 Batch: 1181/20099 (5.88%) Loss: 2.602052 LR: 0.00000916 +[06:30:30] Epoch: 1 Batch: 1182/20099 (5.88%) Loss: 2.335544 LR: 0.00000916 +[06:30:33] Epoch: 1 Batch: 1183/20099 (5.89%) Loss: 2.433874 LR: 0.00000922 +[06:30:36] Epoch: 1 Batch: 1184/20099 (5.89%) Loss: 2.620772 LR: 0.00000922 +[06:30:40] Epoch: 1 Batch: 1185/20099 (5.90%) Loss: 2.515781 LR: 0.00000922 +[06:30:43] Epoch: 1 Batch: 1186/20099 (5.90%) Loss: 2.311657 LR: 0.00000922 +[06:30:46] Epoch: 1 Batch: 1187/20099 (5.91%) Loss: 1.989614 LR: 0.00000922 +[06:30:49] Epoch: 1 Batch: 1188/20099 (5.91%) Loss: 2.473170 LR: 0.00000922 +[06:30:52] Epoch: 1 Batch: 1189/20099 (5.92%) Loss: 2.462648 LR: 0.00000922 +[06:30:55] Epoch: 1 Batch: 1190/20099 (5.92%) Loss: 2.405720 LR: 0.00000927 +[06:30:58] Epoch: 1 Batch: 1191/20099 (5.93%) Loss: 2.298663 LR: 0.00000927 +[06:31:01] Epoch: 1 Batch: 1192/20099 (5.93%) Loss: 2.480774 LR: 0.00000927 +[06:31:04] Epoch: 1 Batch: 1193/20099 (5.94%) Loss: 2.388686 LR: 0.00000927 +[06:31:07] Epoch: 1 Batch: 1194/20099 (5.94%) Loss: 2.372922 LR: 0.00000927 +[06:31:11] Epoch: 1 Batch: 1195/20099 (5.95%) Loss: 2.544777 LR: 0.00000927 +[06:31:14] Epoch: 1 Batch: 1196/20099 (5.95%) Loss: 2.319907 LR: 0.00000927 +[06:31:17] Epoch: 1 Batch: 1197/20099 (5.96%) Loss: 2.452294 LR: 0.00000933 +[06:31:20] Epoch: 1 Batch: 1198/20099 (5.96%) Loss: 2.323024 LR: 0.00000933 +[06:31:23] Epoch: 1 Batch: 1199/20099 (5.97%) Loss: 2.385090 LR: 0.00000933 +[06:31:30] >> Temp 
checkpoint saved: epoch1_step1200, size: 0.1693 GB +[06:31:30] Epoch: 1 Batch: 1200/20099 (5.97%) Loss: 2.602888 LR: 0.00000933 +[06:31:33] Epoch: 1 Batch: 1201/20099 (5.98%) Loss: 2.248256 LR: 0.00000933 +[06:31:36] Epoch: 1 Batch: 1202/20099 (5.98%) Loss: 2.364904 LR: 0.00000933 +[06:31:39] Epoch: 1 Batch: 1203/20099 (5.99%) Loss: 2.542409 LR: 0.00000933 +[06:31:42] Epoch: 1 Batch: 1204/20099 (5.99%) Loss: 2.485518 LR: 0.00000938 +[06:31:45] Epoch: 1 Batch: 1205/20099 (6.00%) Loss: 2.025431 LR: 0.00000938 +[06:31:48] Epoch: 1 Batch: 1206/20099 (6.00%) Loss: 2.454522 LR: 0.00000938 +[06:31:51] Epoch: 1 Batch: 1207/20099 (6.01%) Loss: 2.370275 LR: 0.00000938 +[06:31:55] Epoch: 1 Batch: 1208/20099 (6.01%) Loss: 2.348686 LR: 0.00000938 +[06:31:58] Epoch: 1 Batch: 1209/20099 (6.02%) Loss: 2.653940 LR: 0.00000938 +[06:32:01] Epoch: 1 Batch: 1210/20099 (6.02%) Loss: 2.748654 LR: 0.00000938 +[06:32:04] Epoch: 1 Batch: 1211/20099 (6.03%) Loss: 2.277048 LR: 0.00000944 +[06:32:07] Epoch: 1 Batch: 1212/20099 (6.03%) Loss: 2.287073 LR: 0.00000944 +[06:32:10] Epoch: 1 Batch: 1213/20099 (6.04%) Loss: 2.637357 LR: 0.00000944 +[06:32:13] Epoch: 1 Batch: 1214/20099 (6.04%) Loss: 2.642890 LR: 0.00000944 +[06:32:16] Epoch: 1 Batch: 1215/20099 (6.05%) Loss: 2.137106 LR: 0.00000944 +[06:32:20] Epoch: 1 Batch: 1216/20099 (6.05%) Loss: 2.257788 LR: 0.00000944 +[06:32:23] Epoch: 1 Batch: 1217/20099 (6.06%) Loss: 2.415434 LR: 0.00000944 +[06:32:26] Epoch: 1 Batch: 1218/20099 (6.06%) Loss: 2.546329 LR: 0.00000949 +[06:32:29] Epoch: 1 Batch: 1219/20099 (6.06%) Loss: 2.511510 LR: 0.00000949 +[06:32:32] Epoch: 1 Batch: 1220/20099 (6.07%) Loss: 2.402707 LR: 0.00000949 +[06:32:35] Epoch: 1 Batch: 1221/20099 (6.07%) Loss: 2.452036 LR: 0.00000949 +[06:32:38] Epoch: 1 Batch: 1222/20099 (6.08%) Loss: 2.347884 LR: 0.00000949 +[06:32:41] Epoch: 1 Batch: 1223/20099 (6.08%) Loss: 2.600413 LR: 0.00000949 +[06:32:44] Epoch: 1 Batch: 1224/20099 (6.09%) Loss: 2.360252 LR: 0.00000949 +[06:32:47] Epoch: 1 
Batch: 1225/20099 (6.09%) Loss: 2.286991 LR: 0.00000955 +[06:32:51] Epoch: 1 Batch: 1226/20099 (6.10%) Loss: 2.675958 LR: 0.00000955 +[06:32:54] Epoch: 1 Batch: 1227/20099 (6.10%) Loss: 2.584281 LR: 0.00000955 +[06:32:57] Epoch: 1 Batch: 1228/20099 (6.11%) Loss: 2.296438 LR: 0.00000955 +[06:33:00] Epoch: 1 Batch: 1229/20099 (6.11%) Loss: 2.184522 LR: 0.00000955 +[06:33:03] Epoch: 1 Batch: 1230/20099 (6.12%) Loss: 2.369483 LR: 0.00000955 +[06:33:06] Epoch: 1 Batch: 1231/20099 (6.12%) Loss: 2.530273 LR: 0.00000955 +[06:33:09] Epoch: 1 Batch: 1232/20099 (6.13%) Loss: 2.454349 LR: 0.00000960 +[06:33:12] Epoch: 1 Batch: 1233/20099 (6.13%) Loss: 2.505024 LR: 0.00000960 +[06:33:15] Epoch: 1 Batch: 1234/20099 (6.14%) Loss: 2.236042 LR: 0.00000960 +[06:33:18] Epoch: 1 Batch: 1235/20099 (6.14%) Loss: 2.161332 LR: 0.00000960 +[06:33:22] Epoch: 1 Batch: 1236/20099 (6.15%) Loss: 2.311042 LR: 0.00000960 +[06:33:25] Epoch: 1 Batch: 1237/20099 (6.15%) Loss: 2.431950 LR: 0.00000960 +[06:33:28] Epoch: 1 Batch: 1238/20099 (6.16%) Loss: 2.237216 LR: 0.00000960 +[06:33:31] Epoch: 1 Batch: 1239/20099 (6.16%) Loss: 2.607191 LR: 0.00000965 +[06:33:34] Epoch: 1 Batch: 1240/20099 (6.17%) Loss: 2.439565 LR: 0.00000965 +[06:33:37] Epoch: 1 Batch: 1241/20099 (6.17%) Loss: 2.361653 LR: 0.00000965 +[06:33:40] Epoch: 1 Batch: 1242/20099 (6.18%) Loss: 2.527033 LR: 0.00000965 +[06:33:43] Epoch: 1 Batch: 1243/20099 (6.18%) Loss: 2.046207 LR: 0.00000965 +[06:33:46] Epoch: 1 Batch: 1244/20099 (6.19%) Loss: 2.371460 LR: 0.00000965 +[06:33:49] Epoch: 1 Batch: 1245/20099 (6.19%) Loss: 2.475160 LR: 0.00000965 +[06:33:52] Epoch: 1 Batch: 1246/20099 (6.20%) Loss: 2.439467 LR: 0.00000971 +[06:33:56] Epoch: 1 Batch: 1247/20099 (6.20%) Loss: 2.404128 LR: 0.00000971 +[06:33:59] Epoch: 1 Batch: 1248/20099 (6.21%) Loss: 2.501898 LR: 0.00000971 +[06:34:02] Epoch: 1 Batch: 1249/20099 (6.21%) Loss: 2.381163 LR: 0.00000971 +[06:34:05] Epoch: 1 Batch: 1250/20099 (6.22%) Loss: 2.260645 LR: 0.00000971 +[06:34:08] Epoch: 
1 Batch: 1251/20099 (6.22%) Loss: 2.446442 LR: 0.00000971 +[06:34:11] Epoch: 1 Batch: 1252/20099 (6.23%) Loss: 2.256139 LR: 0.00000971 +[06:34:14] Epoch: 1 Batch: 1253/20099 (6.23%) Loss: 2.539400 LR: 0.00000976 +[06:34:17] Epoch: 1 Batch: 1254/20099 (6.24%) Loss: 2.235198 LR: 0.00000976 +[06:34:20] Epoch: 1 Batch: 1255/20099 (6.24%) Loss: 2.437637 LR: 0.00000976 +[06:34:23] Epoch: 1 Batch: 1256/20099 (6.25%) Loss: 2.549773 LR: 0.00000976 +[06:34:27] Epoch: 1 Batch: 1257/20099 (6.25%) Loss: 2.362587 LR: 0.00000976 +[06:34:30] Epoch: 1 Batch: 1258/20099 (6.26%) Loss: 2.502442 LR: 0.00000976 +[06:34:33] Epoch: 1 Batch: 1259/20099 (6.26%) Loss: 2.442385 LR: 0.00000976 +[06:34:36] Epoch: 1 Batch: 1260/20099 (6.27%) Loss: 2.605734 LR: 0.00000982 +[06:34:39] Epoch: 1 Batch: 1261/20099 (6.27%) Loss: 2.201965 LR: 0.00000982 +[06:34:42] Epoch: 1 Batch: 1262/20099 (6.28%) Loss: 2.348424 LR: 0.00000982 +[06:34:45] Epoch: 1 Batch: 1263/20099 (6.28%) Loss: 2.392138 LR: 0.00000982 +[06:34:48] Epoch: 1 Batch: 1264/20099 (6.29%) Loss: 2.438533 LR: 0.00000982 +[06:34:51] Epoch: 1 Batch: 1265/20099 (6.29%) Loss: 2.438781 LR: 0.00000982 +[06:34:54] Epoch: 1 Batch: 1266/20099 (6.30%) Loss: 2.175991 LR: 0.00000982 +[06:34:58] Epoch: 1 Batch: 1267/20099 (6.30%) Loss: 2.521098 LR: 0.00000987 +[06:35:01] Epoch: 1 Batch: 1268/20099 (6.31%) Loss: 2.255584 LR: 0.00000987 +[06:35:04] Epoch: 1 Batch: 1269/20099 (6.31%) Loss: 2.443235 LR: 0.00000987 +[06:35:07] Epoch: 1 Batch: 1270/20099 (6.32%) Loss: 2.291856 LR: 0.00000987 +[06:35:10] Epoch: 1 Batch: 1271/20099 (6.32%) Loss: 2.544887 LR: 0.00000987 +[06:35:13] Epoch: 1 Batch: 1272/20099 (6.33%) Loss: 2.293662 LR: 0.00000987 +[06:35:16] Epoch: 1 Batch: 1273/20099 (6.33%) Loss: 1.993520 LR: 0.00000987 +[06:35:19] Epoch: 1 Batch: 1274/20099 (6.34%) Loss: 2.308265 LR: 0.00000993 +[06:35:22] Epoch: 1 Batch: 1275/20099 (6.34%) Loss: 2.471623 LR: 0.00000993 +[06:35:25] Epoch: 1 Batch: 1276/20099 (6.35%) Loss: 2.757651 LR: 0.00000993 +[06:35:29] 
Epoch: 1 Batch: 1277/20099 (6.35%) Loss: 2.543737 LR: 0.00000993 +[06:35:32] Epoch: 1 Batch: 1278/20099 (6.36%) Loss: 2.274960 LR: 0.00000993 +[06:35:35] Epoch: 1 Batch: 1279/20099 (6.36%) Loss: 2.403479 LR: 0.00000993 +[06:35:38] Epoch: 1 Batch: 1280/20099 (6.37%) Loss: 2.312597 LR: 0.00000993 +[06:35:41] Epoch: 1 Batch: 1281/20099 (6.37%) Loss: 2.502779 LR: 0.00000998 +[06:35:44] Epoch: 1 Batch: 1282/20099 (6.38%) Loss: 1.982061 LR: 0.00000998 +[06:35:47] Epoch: 1 Batch: 1283/20099 (6.38%) Loss: 2.280499 LR: 0.00000998 +[06:35:50] Epoch: 1 Batch: 1284/20099 (6.39%) Loss: 2.177249 LR: 0.00000998 +[06:35:53] Epoch: 1 Batch: 1285/20099 (6.39%) Loss: 2.269976 LR: 0.00000998 +[06:35:56] Epoch: 1 Batch: 1286/20099 (6.40%) Loss: 2.230238 LR: 0.00000998 +[06:35:59] Epoch: 1 Batch: 1287/20099 (6.40%) Loss: 2.454524 LR: 0.00000998 +[06:36:03] Epoch: 1 Batch: 1288/20099 (6.41%) Loss: 2.176792 LR: 0.00001004 +[06:36:06] Epoch: 1 Batch: 1289/20099 (6.41%) Loss: 2.427468 LR: 0.00001004 +[06:36:09] Epoch: 1 Batch: 1290/20099 (6.42%) Loss: 2.108545 LR: 0.00001004 +[06:36:12] Epoch: 1 Batch: 1291/20099 (6.42%) Loss: 2.582648 LR: 0.00001004 +[06:36:15] Epoch: 1 Batch: 1292/20099 (6.43%) Loss: 2.188144 LR: 0.00001004 +[06:36:18] Epoch: 1 Batch: 1293/20099 (6.43%) Loss: 2.269563 LR: 0.00001004 +[06:36:21] Epoch: 1 Batch: 1294/20099 (6.44%) Loss: 2.632709 LR: 0.00001004 +[06:36:24] Epoch: 1 Batch: 1295/20099 (6.44%) Loss: 2.584979 LR: 0.00001009 +[06:36:27] Epoch: 1 Batch: 1296/20099 (6.45%) Loss: 2.552597 LR: 0.00001009 +[06:36:30] Epoch: 1 Batch: 1297/20099 (6.45%) Loss: 2.693159 LR: 0.00001009 +[06:36:33] Epoch: 1 Batch: 1298/20099 (6.46%) Loss: 2.456577 LR: 0.00001009 +[06:36:37] Epoch: 1 Batch: 1299/20099 (6.46%) Loss: 2.555768 LR: 0.00001009 +[06:36:40] Epoch: 1 Batch: 1300/20099 (6.47%) Loss: 2.107139 LR: 0.00001009 +[06:36:43] Epoch: 1 Batch: 1301/20099 (6.47%) Loss: 2.286432 LR: 0.00001009 +[06:36:46] Epoch: 1 Batch: 1302/20099 (6.48%) Loss: 2.321225 LR: 0.00001015 
+[06:36:49] Epoch: 1 Batch: 1303/20099 (6.48%) Loss: 2.383250 LR: 0.00001015 +[06:36:52] Epoch: 1 Batch: 1304/20099 (6.49%) Loss: 2.301107 LR: 0.00001015 +[06:36:55] Epoch: 1 Batch: 1305/20099 (6.49%) Loss: 2.723021 LR: 0.00001015 +[06:36:58] Epoch: 1 Batch: 1306/20099 (6.50%) Loss: 2.344350 LR: 0.00001015 +[06:37:01] Epoch: 1 Batch: 1307/20099 (6.50%) Loss: 2.362641 LR: 0.00001015 +[06:37:04] Epoch: 1 Batch: 1308/20099 (6.51%) Loss: 2.252395 LR: 0.00001015 +[06:37:08] Epoch: 1 Batch: 1309/20099 (6.51%) Loss: 2.355865 LR: 0.00001020 +[06:37:11] Epoch: 1 Batch: 1310/20099 (6.52%) Loss: 2.530144 LR: 0.00001020 +[06:37:14] Epoch: 1 Batch: 1311/20099 (6.52%) Loss: 2.416257 LR: 0.00001020 +[06:37:17] Epoch: 1 Batch: 1312/20099 (6.53%) Loss: 2.434113 LR: 0.00001020 +[06:37:20] Epoch: 1 Batch: 1313/20099 (6.53%) Loss: 1.982383 LR: 0.00001020 +[06:37:23] Epoch: 1 Batch: 1314/20099 (6.54%) Loss: 2.381495 LR: 0.00001020 +[06:37:26] Epoch: 1 Batch: 1315/20099 (6.54%) Loss: 2.059135 LR: 0.00001020 +[06:37:29] Epoch: 1 Batch: 1316/20099 (6.55%) Loss: 1.941844 LR: 0.00001025 +[06:37:32] Epoch: 1 Batch: 1317/20099 (6.55%) Loss: 2.455489 LR: 0.00001025 +[06:37:35] Epoch: 1 Batch: 1318/20099 (6.56%) Loss: 2.204604 LR: 0.00001025 +[06:37:38] Epoch: 1 Batch: 1319/20099 (6.56%) Loss: 2.355816 LR: 0.00001025 +[06:37:42] Epoch: 1 Batch: 1320/20099 (6.57%) Loss: 2.332306 LR: 0.00001025 +[06:37:45] Epoch: 1 Batch: 1321/20099 (6.57%) Loss: 2.272191 LR: 0.00001025 +[06:37:48] Epoch: 1 Batch: 1322/20099 (6.58%) Loss: 2.254807 LR: 0.00001025 +[06:37:51] Epoch: 1 Batch: 1323/20099 (6.58%) Loss: 2.258155 LR: 0.00001031 +[06:37:54] Epoch: 1 Batch: 1324/20099 (6.59%) Loss: 2.370725 LR: 0.00001031 +[06:37:57] Epoch: 1 Batch: 1325/20099 (6.59%) Loss: 2.446596 LR: 0.00001031 +[06:38:00] Epoch: 1 Batch: 1326/20099 (6.60%) Loss: 2.211038 LR: 0.00001031 +[06:38:03] Epoch: 1 Batch: 1327/20099 (6.60%) Loss: 2.337549 LR: 0.00001031 +[06:38:06] Epoch: 1 Batch: 1328/20099 (6.61%) Loss: 2.289629 LR: 
0.00001031 +[06:38:09] Epoch: 1 Batch: 1329/20099 (6.61%) Loss: 2.338742 LR: 0.00001031 +[06:38:13] Epoch: 1 Batch: 1330/20099 (6.62%) Loss: 2.628507 LR: 0.00001036 +[06:38:16] Epoch: 1 Batch: 1331/20099 (6.62%) Loss: 2.208860 LR: 0.00001036 +[06:38:19] Epoch: 1 Batch: 1332/20099 (6.63%) Loss: 2.243638 LR: 0.00001036 +[06:38:22] Epoch: 1 Batch: 1333/20099 (6.63%) Loss: 2.684297 LR: 0.00001036 +[06:38:25] Epoch: 1 Batch: 1334/20099 (6.64%) Loss: 2.256164 LR: 0.00001036 +[06:38:28] Epoch: 1 Batch: 1335/20099 (6.64%) Loss: 2.408877 LR: 0.00001036 +[06:38:31] Epoch: 1 Batch: 1336/20099 (6.65%) Loss: 2.266587 LR: 0.00001036 +[06:38:34] Epoch: 1 Batch: 1337/20099 (6.65%) Loss: 1.974915 LR: 0.00001042 +[06:38:37] Epoch: 1 Batch: 1338/20099 (6.66%) Loss: 2.213343 LR: 0.00001042 +[06:38:41] Epoch: 1 Batch: 1339/20099 (6.66%) Loss: 2.329974 LR: 0.00001042 +[06:38:44] Epoch: 1 Batch: 1340/20099 (6.67%) Loss: 2.386693 LR: 0.00001042 +[06:38:47] Epoch: 1 Batch: 1341/20099 (6.67%) Loss: 2.384928 LR: 0.00001042 +[06:38:50] Epoch: 1 Batch: 1342/20099 (6.68%) Loss: 2.319753 LR: 0.00001042 +[06:38:53] Epoch: 1 Batch: 1343/20099 (6.68%) Loss: 2.035163 LR: 0.00001042 +[06:38:56] Epoch: 1 Batch: 1344/20099 (6.69%) Loss: 2.347615 LR: 0.00001047 +[06:38:59] Epoch: 1 Batch: 1345/20099 (6.69%) Loss: 2.542047 LR: 0.00001047 +[06:39:02] Epoch: 1 Batch: 1346/20099 (6.70%) Loss: 2.217592 LR: 0.00001047 +[06:39:05] Epoch: 1 Batch: 1347/20099 (6.70%) Loss: 2.158413 LR: 0.00001047 +[06:39:08] Epoch: 1 Batch: 1348/20099 (6.71%) Loss: 2.359446 LR: 0.00001047 +[06:39:12] Epoch: 1 Batch: 1349/20099 (6.71%) Loss: 2.324864 LR: 0.00001047 +[06:39:15] Epoch: 1 Batch: 1350/20099 (6.72%) Loss: 2.537016 LR: 0.00001047 +[06:39:18] Epoch: 1 Batch: 1351/20099 (6.72%) Loss: 2.143765 LR: 0.00001053 +[06:39:21] Epoch: 1 Batch: 1352/20099 (6.73%) Loss: 2.270648 LR: 0.00001053 +[06:39:24] Epoch: 1 Batch: 1353/20099 (6.73%) Loss: 2.378229 LR: 0.00001053 +[06:39:27] Epoch: 1 Batch: 1354/20099 (6.74%) Loss: 2.306212 
LR: 0.00001053 +[06:39:30] Epoch: 1 Batch: 1355/20099 (6.74%) Loss: 2.260682 LR: 0.00001053 +[06:39:33] Epoch: 1 Batch: 1356/20099 (6.75%) Loss: 2.290469 LR: 0.00001053 +[06:39:36] Epoch: 1 Batch: 1357/20099 (6.75%) Loss: 2.127180 LR: 0.00001053 +[06:39:39] Epoch: 1 Batch: 1358/20099 (6.76%) Loss: 1.924264 LR: 0.00001058 +[06:39:43] Epoch: 1 Batch: 1359/20099 (6.76%) Loss: 2.220931 LR: 0.00001058 +[06:39:46] Epoch: 1 Batch: 1360/20099 (6.77%) Loss: 2.320542 LR: 0.00001058 +[06:39:49] Epoch: 1 Batch: 1361/20099 (6.77%) Loss: 2.457225 LR: 0.00001058 +[06:39:52] Epoch: 1 Batch: 1362/20099 (6.78%) Loss: 2.309032 LR: 0.00001058 +[06:39:55] Epoch: 1 Batch: 1363/20099 (6.78%) Loss: 2.455758 LR: 0.00001058 +[06:39:58] Epoch: 1 Batch: 1364/20099 (6.79%) Loss: 2.299111 LR: 0.00001058 +[06:40:01] Epoch: 1 Batch: 1365/20099 (6.79%) Loss: 2.207558 LR: 0.00001064 +[06:40:04] Epoch: 1 Batch: 1366/20099 (6.80%) Loss: 2.328998 LR: 0.00001064 +[06:40:07] Epoch: 1 Batch: 1367/20099 (6.80%) Loss: 2.279822 LR: 0.00001064 +[06:40:10] Epoch: 1 Batch: 1368/20099 (6.81%) Loss: 2.323619 LR: 0.00001064 +[06:40:13] Epoch: 1 Batch: 1369/20099 (6.81%) Loss: 2.546458 LR: 0.00001064 +[06:40:16] Epoch: 1 Batch: 1370/20099 (6.82%) Loss: 2.667234 LR: 0.00001064 +[06:40:20] Epoch: 1 Batch: 1371/20099 (6.82%) Loss: 2.670646 LR: 0.00001064 +[06:40:23] Epoch: 1 Batch: 1372/20099 (6.83%) Loss: 2.473140 LR: 0.00001069 +[06:40:26] Epoch: 1 Batch: 1373/20099 (6.83%) Loss: 2.103649 LR: 0.00001069 +[06:40:29] Epoch: 1 Batch: 1374/20099 (6.84%) Loss: 2.178277 LR: 0.00001069 +[06:40:32] Epoch: 1 Batch: 1375/20099 (6.84%) Loss: 2.793400 LR: 0.00001069 +[06:40:35] Epoch: 1 Batch: 1376/20099 (6.85%) Loss: 2.210613 LR: 0.00001069 +[06:40:38] Epoch: 1 Batch: 1377/20099 (6.85%) Loss: 2.590357 LR: 0.00001069 +[06:40:41] Epoch: 1 Batch: 1378/20099 (6.86%) Loss: 2.690386 LR: 0.00001069 +[06:40:44] Epoch: 1 Batch: 1379/20099 (6.86%) Loss: 2.589483 LR: 0.00001075 +[06:40:47] Epoch: 1 Batch: 1380/20099 (6.87%) Loss: 
2.299586 LR: 0.00001075 +[06:40:51] Epoch: 1 Batch: 1381/20099 (6.87%) Loss: 2.515357 LR: 0.00001075 +[06:40:54] Epoch: 1 Batch: 1382/20099 (6.88%) Loss: 2.652035 LR: 0.00001075 +[06:40:57] Epoch: 1 Batch: 1383/20099 (6.88%) Loss: 1.913829 LR: 0.00001075 +[06:41:00] Epoch: 1 Batch: 1384/20099 (6.89%) Loss: 2.533218 LR: 0.00001075 +[06:41:03] Epoch: 1 Batch: 1385/20099 (6.89%) Loss: 2.327100 LR: 0.00001075 +[06:41:06] Epoch: 1 Batch: 1386/20099 (6.90%) Loss: 2.287391 LR: 0.00001080 +[06:41:09] Epoch: 1 Batch: 1387/20099 (6.90%) Loss: 2.145640 LR: 0.00001080 +[06:41:12] Epoch: 1 Batch: 1388/20099 (6.91%) Loss: 2.271015 LR: 0.00001080 +[06:41:15] Epoch: 1 Batch: 1389/20099 (6.91%) Loss: 2.267975 LR: 0.00001080 +[06:41:18] Epoch: 1 Batch: 1390/20099 (6.92%) Loss: 2.553367 LR: 0.00001080 +[06:41:22] Epoch: 1 Batch: 1391/20099 (6.92%) Loss: 2.401686 LR: 0.00001080 +[06:41:25] Epoch: 1 Batch: 1392/20099 (6.93%) Loss: 2.445266 LR: 0.00001080 +[06:41:28] Epoch: 1 Batch: 1393/20099 (6.93%) Loss: 2.359893 LR: 0.00001085 +[06:41:31] Epoch: 1 Batch: 1394/20099 (6.94%) Loss: 2.159885 LR: 0.00001085 +[06:41:34] Epoch: 1 Batch: 1395/20099 (6.94%) Loss: 2.557048 LR: 0.00001085 +[06:41:37] Epoch: 1 Batch: 1396/20099 (6.95%) Loss: 2.612354 LR: 0.00001085 +[06:41:40] Epoch: 1 Batch: 1397/20099 (6.95%) Loss: 2.457088 LR: 0.00001085 +[06:41:43] Epoch: 1 Batch: 1398/20099 (6.96%) Loss: 2.703312 LR: 0.00001085 +[06:41:46] Epoch: 1 Batch: 1399/20099 (6.96%) Loss: 2.248981 LR: 0.00001085 +[06:41:53] >> Temp checkpoint saved: epoch1_step1400, size: 0.1693 GB +[06:41:53] Epoch: 1 Batch: 1400/20099 (6.97%) Loss: 2.377447 LR: 0.00001091 +[06:41:56] Epoch: 1 Batch: 1401/20099 (6.97%) Loss: 2.581002 LR: 0.00001091 +[06:41:59] Epoch: 1 Batch: 1402/20099 (6.98%) Loss: 2.312774 LR: 0.00001091 +[06:42:02] Epoch: 1 Batch: 1403/20099 (6.98%) Loss: 2.354105 LR: 0.00001091 +[06:42:05] Epoch: 1 Batch: 1404/20099 (6.99%) Loss: 2.454820 LR: 0.00001091 +[06:42:08] Epoch: 1 Batch: 1405/20099 (6.99%) Loss: 
1.947499 LR: 0.00001091 +[06:42:12] Epoch: 1 Batch: 1406/20099 (7.00%) Loss: 2.740079 LR: 0.00001091 +[06:42:15] Epoch: 1 Batch: 1407/20099 (7.00%) Loss: 2.408769 LR: 0.00001096 +[06:42:18] Epoch: 1 Batch: 1408/20099 (7.01%) Loss: 2.120155 LR: 0.00001096 +[06:42:21] Epoch: 1 Batch: 1409/20099 (7.01%) Loss: 2.305241 LR: 0.00001096 +[06:42:24] Epoch: 1 Batch: 1410/20099 (7.02%) Loss: 2.000194 LR: 0.00001096 +[06:42:27] Epoch: 1 Batch: 1411/20099 (7.02%) Loss: 2.381540 LR: 0.00001096 +[06:42:30] Epoch: 1 Batch: 1412/20099 (7.03%) Loss: 2.338565 LR: 0.00001096 +[06:42:33] Epoch: 1 Batch: 1413/20099 (7.03%) Loss: 2.278925 LR: 0.00001096 +[06:42:37] Epoch: 1 Batch: 1414/20099 (7.04%) Loss: 2.461100 LR: 0.00001102 +[06:42:40] Epoch: 1 Batch: 1415/20099 (7.04%) Loss: 2.416693 LR: 0.00001102 +[06:42:43] Epoch: 1 Batch: 1416/20099 (7.05%) Loss: 2.642253 LR: 0.00001102 +[06:42:46] Epoch: 1 Batch: 1417/20099 (7.05%) Loss: 2.188291 LR: 0.00001102 +[06:42:49] Epoch: 1 Batch: 1418/20099 (7.06%) Loss: 2.598341 LR: 0.00001102 +[06:42:52] Epoch: 1 Batch: 1419/20099 (7.06%) Loss: 2.553317 LR: 0.00001102 +[06:42:55] Epoch: 1 Batch: 1420/20099 (7.07%) Loss: 2.200788 LR: 0.00001102 +[06:42:58] Epoch: 1 Batch: 1421/20099 (7.07%) Loss: 2.358859 LR: 0.00001107 +[06:43:01] Epoch: 1 Batch: 1422/20099 (7.07%) Loss: 2.376953 LR: 0.00001107 +[06:43:04] Epoch: 1 Batch: 1423/20099 (7.08%) Loss: 2.430365 LR: 0.00001107 +[06:43:08] Epoch: 1 Batch: 1424/20099 (7.08%) Loss: 2.331782 LR: 0.00001107 +[06:43:11] Epoch: 1 Batch: 1425/20099 (7.09%) Loss: 2.395746 LR: 0.00001107 +[06:43:14] Epoch: 1 Batch: 1426/20099 (7.09%) Loss: 2.209054 LR: 0.00001107 +[06:43:17] Epoch: 1 Batch: 1427/20099 (7.10%) Loss: 2.355356 LR: 0.00001107 +[06:43:20] Epoch: 1 Batch: 1428/20099 (7.10%) Loss: 2.361518 LR: 0.00001113 +[06:43:23] Epoch: 1 Batch: 1429/20099 (7.11%) Loss: 2.321869 LR: 0.00001113 +[06:43:26] Epoch: 1 Batch: 1430/20099 (7.11%) Loss: 2.313463 LR: 0.00001113 +[06:43:29] Epoch: 1 Batch: 1431/20099 (7.12%) 
Loss: 2.292091 LR: 0.00001113 +[06:43:32] Epoch: 1 Batch: 1432/20099 (7.12%) Loss: 2.081513 LR: 0.00001113 +[06:43:35] Epoch: 1 Batch: 1433/20099 (7.13%) Loss: 2.506223 LR: 0.00001113 +[06:43:38] Epoch: 1 Batch: 1434/20099 (7.13%) Loss: 2.399392 LR: 0.00001113 +[06:43:41] Epoch: 1 Batch: 1435/20099 (7.14%) Loss: 2.423990 LR: 0.00001118 +[06:43:45] Epoch: 1 Batch: 1436/20099 (7.14%) Loss: 2.565485 LR: 0.00001118 +[06:43:48] Epoch: 1 Batch: 1437/20099 (7.15%) Loss: 2.371872 LR: 0.00001118 +[06:43:51] Epoch: 1 Batch: 1438/20099 (7.15%) Loss: 2.236276 LR: 0.00001118 +[06:43:54] Epoch: 1 Batch: 1439/20099 (7.16%) Loss: 2.274421 LR: 0.00001118 +[06:43:57] Epoch: 1 Batch: 1440/20099 (7.16%) Loss: 2.341462 LR: 0.00001118 +[06:44:00] Epoch: 1 Batch: 1441/20099 (7.17%) Loss: 2.330958 LR: 0.00001118 +[06:44:03] Epoch: 1 Batch: 1442/20099 (7.17%) Loss: 2.468582 LR: 0.00001124 +[06:44:06] Epoch: 1 Batch: 1443/20099 (7.18%) Loss: 1.979831 LR: 0.00001124 +[06:44:09] Epoch: 1 Batch: 1444/20099 (7.18%) Loss: 2.286430 LR: 0.00001124 +[06:44:12] Epoch: 1 Batch: 1445/20099 (7.19%) Loss: 1.929819 LR: 0.00001124 +[06:44:16] Epoch: 1 Batch: 1446/20099 (7.19%) Loss: 2.218487 LR: 0.00001124 +[06:44:19] Epoch: 1 Batch: 1447/20099 (7.20%) Loss: 2.496862 LR: 0.00001124 +[06:44:22] Epoch: 1 Batch: 1448/20099 (7.20%) Loss: 2.568574 LR: 0.00001124 +[06:44:25] Epoch: 1 Batch: 1449/20099 (7.21%) Loss: 2.309966 LR: 0.00001129 +[06:44:28] Epoch: 1 Batch: 1450/20099 (7.21%) Loss: 2.352157 LR: 0.00001129 +[06:44:31] Epoch: 1 Batch: 1451/20099 (7.22%) Loss: 2.550408 LR: 0.00001129 +[06:44:34] Epoch: 1 Batch: 1452/20099 (7.22%) Loss: 2.310077 LR: 0.00001129 +[06:44:37] Epoch: 1 Batch: 1453/20099 (7.23%) Loss: 2.359107 LR: 0.00001129 +[06:44:40] Epoch: 1 Batch: 1454/20099 (7.23%) Loss: 2.288473 LR: 0.00001129 +[06:44:44] Epoch: 1 Batch: 1455/20099 (7.24%) Loss: 2.505878 LR: 0.00001129 +[06:44:47] Epoch: 1 Batch: 1456/20099 (7.24%) Loss: 2.398846 LR: 0.00001135 +[06:44:50] Epoch: 1 Batch: 1457/20099 
(7.25%) Loss: 2.437691 LR: 0.00001135 +[06:44:53] Epoch: 1 Batch: 1458/20099 (7.25%) Loss: 2.500881 LR: 0.00001135 +[06:44:56] Epoch: 1 Batch: 1459/20099 (7.26%) Loss: 2.546563 LR: 0.00001135 +[06:44:59] Epoch: 1 Batch: 1460/20099 (7.26%) Loss: 2.515837 LR: 0.00001135 +[06:45:02] Epoch: 1 Batch: 1461/20099 (7.27%) Loss: 2.236443 LR: 0.00001135 +[06:45:05] Epoch: 1 Batch: 1462/20099 (7.27%) Loss: 2.471182 LR: 0.00001135 +[06:45:08] Epoch: 1 Batch: 1463/20099 (7.28%) Loss: 2.366965 LR: 0.00001140 +[06:45:11] Epoch: 1 Batch: 1464/20099 (7.28%) Loss: 2.186170 LR: 0.00001140 +[06:45:15] Epoch: 1 Batch: 1465/20099 (7.29%) Loss: 2.634935 LR: 0.00001140 +[06:45:18] Epoch: 1 Batch: 1466/20099 (7.29%) Loss: 2.204321 LR: 0.00001140 +[06:45:21] Epoch: 1 Batch: 1467/20099 (7.30%) Loss: 2.414025 LR: 0.00001140 +[06:45:24] Epoch: 1 Batch: 1468/20099 (7.30%) Loss: 2.527127 LR: 0.00001140 +[06:45:27] Epoch: 1 Batch: 1469/20099 (7.31%) Loss: 2.164903 LR: 0.00001140 +[06:45:30] Epoch: 1 Batch: 1470/20099 (7.31%) Loss: 2.337158 LR: 0.00001145 +[06:45:33] Epoch: 1 Batch: 1471/20099 (7.32%) Loss: 2.225142 LR: 0.00001145 +[06:45:36] Epoch: 1 Batch: 1472/20099 (7.32%) Loss: 2.291492 LR: 0.00001145 +[06:45:39] Epoch: 1 Batch: 1473/20099 (7.33%) Loss: 2.551913 LR: 0.00001145 +[06:45:42] Epoch: 1 Batch: 1474/20099 (7.33%) Loss: 2.183409 LR: 0.00001145 +[06:45:45] Epoch: 1 Batch: 1475/20099 (7.34%) Loss: 2.149106 LR: 0.00001145 +[06:45:48] Epoch: 1 Batch: 1476/20099 (7.34%) Loss: 2.543639 LR: 0.00001145 +[06:45:52] Epoch: 1 Batch: 1477/20099 (7.35%) Loss: 2.304509 LR: 0.00001151 +[06:45:55] Epoch: 1 Batch: 1478/20099 (7.35%) Loss: 2.353556 LR: 0.00001151 +[06:45:58] Epoch: 1 Batch: 1479/20099 (7.36%) Loss: 2.590397 LR: 0.00001151 +[06:46:01] Epoch: 1 Batch: 1480/20099 (7.36%) Loss: 2.676801 LR: 0.00001151 +[06:46:04] Epoch: 1 Batch: 1481/20099 (7.37%) Loss: 2.001445 LR: 0.00001151 +[06:46:07] Epoch: 1 Batch: 1482/20099 (7.37%) Loss: 2.574193 LR: 0.00001151 +[06:46:10] Epoch: 1 Batch: 
1483/20099 (7.38%) Loss: 2.075574 LR: 0.00001151 +[06:46:13] Epoch: 1 Batch: 1484/20099 (7.38%) Loss: 2.409600 LR: 0.00001156 +[06:46:16] Epoch: 1 Batch: 1485/20099 (7.39%) Loss: 2.394545 LR: 0.00001156 +[06:46:19] Epoch: 1 Batch: 1486/20099 (7.39%) Loss: 2.402910 LR: 0.00001156 +[06:46:22] Epoch: 1 Batch: 1487/20099 (7.40%) Loss: 2.227754 LR: 0.00001156 +[06:46:26] Epoch: 1 Batch: 1488/20099 (7.40%) Loss: 2.157104 LR: 0.00001156 +[06:46:29] Epoch: 1 Batch: 1489/20099 (7.41%) Loss: 2.407466 LR: 0.00001156 +[06:46:32] Epoch: 1 Batch: 1490/20099 (7.41%) Loss: 2.390683 LR: 0.00001156 +[06:46:35] Epoch: 1 Batch: 1491/20099 (7.42%) Loss: 2.129061 LR: 0.00001162 +[06:46:38] Epoch: 1 Batch: 1492/20099 (7.42%) Loss: 2.019541 LR: 0.00001162 +[06:46:41] Epoch: 1 Batch: 1493/20099 (7.43%) Loss: 2.308099 LR: 0.00001162 +[06:46:44] Epoch: 1 Batch: 1494/20099 (7.43%) Loss: 2.048505 LR: 0.00001162 +[06:46:47] Epoch: 1 Batch: 1495/20099 (7.44%) Loss: 2.472794 LR: 0.00001162 +[06:46:50] Epoch: 1 Batch: 1496/20099 (7.44%) Loss: 2.386614 LR: 0.00001162 +[06:46:53] Epoch: 1 Batch: 1497/20099 (7.45%) Loss: 2.411885 LR: 0.00001162 +[06:46:57] Epoch: 1 Batch: 1498/20099 (7.45%) Loss: 2.209210 LR: 0.00001167 +[06:47:00] Epoch: 1 Batch: 1499/20099 (7.46%) Loss: 2.161344 LR: 0.00001167 +[06:47:03] >> Evaluating batch 0 +[06:47:04] >> Evaluating batch 1 +[06:47:05] >> Evaluating batch 2 +[06:47:07] >> Evaluating batch 3 +[06:47:08] >> Evaluating batch 4 +[06:47:09] >> Evaluating batch 5 +[06:47:10] >> Evaluating batch 6 +[06:47:12] >> Evaluating batch 7 +[06:47:13] >> Evaluating batch 8 +[06:47:14] >> Evaluating batch 9 +[06:47:15] >> Evaluating batch 10 +[06:47:17] >> Evaluating batch 11 +[06:47:18] >> Evaluating batch 12 +[06:47:19] >> Evaluating batch 13 +[06:47:20] >> Evaluating batch 14 +[06:47:21] >> Evaluating batch 15 +[06:47:22] >> Evaluating batch 16 +[06:47:23] Epoch: 1 Step: 1500/20099 Evaluation: +[06:47:23] [1mAvg Loss Since Last Eval: 2.3877 Val Loss: 2.4055 Validation loss 
delta: -0.1438 Perplexity: 11.0837 LR: 0.00001167 +[06:47:27] >> Checkpoint saved: epoch1_step1500, size: 0.1693 GB +[06:47:27] Epoch: 1 Batch: 1500/20099 (7.46%) Loss: 2.417127 LR: 0.00001167 +[06:47:30] Epoch: 1 Batch: 1501/20099 (7.47%) Loss: 2.424207 LR: 0.00001167 +[06:47:33] Epoch: 1 Batch: 1502/20099 (7.47%) Loss: 2.784582 LR: 0.00001167 +[06:47:36] Epoch: 1 Batch: 1503/20099 (7.48%) Loss: 2.551846 LR: 0.00001167 +[06:47:39] Epoch: 1 Batch: 1504/20099 (7.48%) Loss: 2.496992 LR: 0.00001167 +[06:47:42] Epoch: 1 Batch: 1505/20099 (7.49%) Loss: 2.135157 LR: 0.00001173 +[06:47:45] Epoch: 1 Batch: 1506/20099 (7.49%) Loss: 2.426564 LR: 0.00001173 +[06:47:48] Epoch: 1 Batch: 1507/20099 (7.50%) Loss: 2.139047 LR: 0.00001173 +[06:47:51] Epoch: 1 Batch: 1508/20099 (7.50%) Loss: 2.304374 LR: 0.00001173 +[06:47:55] Epoch: 1 Batch: 1509/20099 (7.51%) Loss: 2.576912 LR: 0.00001173 +[06:47:58] Epoch: 1 Batch: 1510/20099 (7.51%) Loss: 2.379657 LR: 0.00001173 +[06:48:01] Epoch: 1 Batch: 1511/20099 (7.52%) Loss: 2.058109 LR: 0.00001173 +[06:48:04] Epoch: 1 Batch: 1512/20099 (7.52%) Loss: 2.568503 LR: 0.00001178 +[06:48:07] Epoch: 1 Batch: 1513/20099 (7.53%) Loss: 2.528423 LR: 0.00001178 +[06:48:10] Epoch: 1 Batch: 1514/20099 (7.53%) Loss: 2.435926 LR: 0.00001178 +[06:48:13] Epoch: 1 Batch: 1515/20099 (7.54%) Loss: 2.467796 LR: 0.00001178 +[06:48:16] Epoch: 1 Batch: 1516/20099 (7.54%) Loss: 2.436476 LR: 0.00001178 +[06:48:19] Epoch: 1 Batch: 1517/20099 (7.55%) Loss: 2.317236 LR: 0.00001178 +[06:48:23] Epoch: 1 Batch: 1518/20099 (7.55%) Loss: 2.113345 LR: 0.00001178 +[06:48:26] Epoch: 1 Batch: 1519/20099 (7.56%) Loss: 2.191586 LR: 0.00001184 +[06:48:29] Epoch: 1 Batch: 1520/20099 (7.56%) Loss: 2.143368 LR: 0.00001184 +[06:48:32] Epoch: 1 Batch: 1521/20099 (7.57%) Loss: 2.187930 LR: 0.00001184 +[06:48:35] Epoch: 1 Batch: 1522/20099 (7.57%) Loss: 2.517765 LR: 0.00001184 +[06:48:38] Epoch: 1 Batch: 1523/20099 (7.58%) Loss: 2.277214 LR: 0.00001184 +[06:48:41] Epoch: 1 Batch: 
1524/20099 (7.58%) Loss: 2.627007 LR: 0.00001184 +[06:48:44] Epoch: 1 Batch: 1525/20099 (7.59%) Loss: 2.509440 LR: 0.00001184 +[06:48:47] Epoch: 1 Batch: 1526/20099 (7.59%) Loss: 2.297620 LR: 0.00001189 +[06:48:50] Epoch: 1 Batch: 1527/20099 (7.60%) Loss: 2.228506 LR: 0.00001189 +[06:48:53] Epoch: 1 Batch: 1528/20099 (7.60%) Loss: 2.365267 LR: 0.00001189 +[06:48:57] Epoch: 1 Batch: 1529/20099 (7.61%) Loss: 2.378869 LR: 0.00001189 +[06:49:00] Epoch: 1 Batch: 1530/20099 (7.61%) Loss: 2.581509 LR: 0.00001189 +[06:49:03] Epoch: 1 Batch: 1531/20099 (7.62%) Loss: 2.241691 LR: 0.00001189 +[06:49:06] Epoch: 1 Batch: 1532/20099 (7.62%) Loss: 2.350325 LR: 0.00001189 +[06:49:09] Epoch: 1 Batch: 1533/20099 (7.63%) Loss: 2.250481 LR: 0.00001195 +[06:49:12] Epoch: 1 Batch: 1534/20099 (7.63%) Loss: 2.032722 LR: 0.00001195 +[06:49:15] Epoch: 1 Batch: 1535/20099 (7.64%) Loss: 2.060870 LR: 0.00001195 +[06:49:18] Epoch: 1 Batch: 1536/20099 (7.64%) Loss: 2.344065 LR: 0.00001195 +[06:49:21] Epoch: 1 Batch: 1537/20099 (7.65%) Loss: 2.419174 LR: 0.00001195 +[06:49:24] Epoch: 1 Batch: 1538/20099 (7.65%) Loss: 2.419722 LR: 0.00001195 +[06:49:28] Epoch: 1 Batch: 1539/20099 (7.66%) Loss: 2.392572 LR: 0.00001195 +[06:49:31] Epoch: 1 Batch: 1540/20099 (7.66%) Loss: 2.334383 LR: 0.00001200 +[06:49:34] Epoch: 1 Batch: 1541/20099 (7.67%) Loss: 2.313348 LR: 0.00001200 +[06:49:37] Epoch: 1 Batch: 1542/20099 (7.67%) Loss: 2.380042 LR: 0.00001200 +[06:49:40] Epoch: 1 Batch: 1543/20099 (7.68%) Loss: 2.461760 LR: 0.00001200 +[06:49:43] Epoch: 1 Batch: 1544/20099 (7.68%) Loss: 2.149881 LR: 0.00001200 +[06:49:46] Epoch: 1 Batch: 1545/20099 (7.69%) Loss: 2.511050 LR: 0.00001200 +[06:49:49] Epoch: 1 Batch: 1546/20099 (7.69%) Loss: 2.311547 LR: 0.00001200 +[06:49:52] Epoch: 1 Batch: 1547/20099 (7.70%) Loss: 2.359907 LR: 0.00001205 +[06:49:55] Epoch: 1 Batch: 1548/20099 (7.70%) Loss: 2.299057 LR: 0.00001205 +[06:49:59] Epoch: 1 Batch: 1549/20099 (7.71%) Loss: 2.622379 LR: 0.00001205 +[06:50:02] Epoch: 1 
Batch: 1550/20099 (7.71%) Loss: 2.213366 LR: 0.00001205 +[06:50:05] Epoch: 1 Batch: 1551/20099 (7.72%) Loss: 1.909202 LR: 0.00001205 +[06:50:08] Epoch: 1 Batch: 1552/20099 (7.72%) Loss: 2.435736 LR: 0.00001205 +[06:50:11] Epoch: 1 Batch: 1553/20099 (7.73%) Loss: 2.653101 LR: 0.00001205 +[06:50:14] Epoch: 1 Batch: 1554/20099 (7.73%) Loss: 2.311483 LR: 0.00001211 +[06:50:17] Epoch: 1 Batch: 1555/20099 (7.74%) Loss: 2.180030 LR: 0.00001211 +[06:50:20] Epoch: 1 Batch: 1556/20099 (7.74%) Loss: 2.183302 LR: 0.00001211 +[06:50:23] Epoch: 1 Batch: 1557/20099 (7.75%) Loss: 1.936146 LR: 0.00001211 +[06:50:26] Epoch: 1 Batch: 1558/20099 (7.75%) Loss: 2.386286 LR: 0.00001211 +[06:50:30] Epoch: 1 Batch: 1559/20099 (7.76%) Loss: 2.194422 LR: 0.00001211 +[06:50:33] Epoch: 1 Batch: 1560/20099 (7.76%) Loss: 2.626185 LR: 0.00001211 +[06:50:36] Epoch: 1 Batch: 1561/20099 (7.77%) Loss: 2.297762 LR: 0.00001216 +[06:50:39] Epoch: 1 Batch: 1562/20099 (7.77%) Loss: 2.223624 LR: 0.00001216 +[06:50:42] Epoch: 1 Batch: 1563/20099 (7.78%) Loss: 2.231692 LR: 0.00001216 +[06:50:45] Epoch: 1 Batch: 1564/20099 (7.78%) Loss: 2.398568 LR: 0.00001216 +[06:50:48] Epoch: 1 Batch: 1565/20099 (7.79%) Loss: 2.477699 LR: 0.00001216 +[06:50:51] Epoch: 1 Batch: 1566/20099 (7.79%) Loss: 2.346787 LR: 0.00001216 +[06:50:54] Epoch: 1 Batch: 1567/20099 (7.80%) Loss: 2.458641 LR: 0.00001216 +[06:50:58] Epoch: 1 Batch: 1568/20099 (7.80%) Loss: 2.256150 LR: 0.00001222 +[06:51:01] Epoch: 1 Batch: 1569/20099 (7.81%) Loss: 2.306362 LR: 0.00001222 +[06:51:04] Epoch: 1 Batch: 1570/20099 (7.81%) Loss: 2.053053 LR: 0.00001222 +[06:51:07] Epoch: 1 Batch: 1571/20099 (7.82%) Loss: 2.450536 LR: 0.00001222 +[06:51:10] Epoch: 1 Batch: 1572/20099 (7.82%) Loss: 2.270018 LR: 0.00001222 +[06:51:13] Epoch: 1 Batch: 1573/20099 (7.83%) Loss: 2.215135 LR: 0.00001222 +[06:51:16] Epoch: 1 Batch: 1574/20099 (7.83%) Loss: 2.505959 LR: 0.00001222 +[06:51:19] Epoch: 1 Batch: 1575/20099 (7.84%) Loss: 2.287878 LR: 0.00001227 +[06:51:22] Epoch: 
1 Batch: 1576/20099 (7.84%) Loss: 2.136587 LR: 0.00001227 +[06:51:25] Epoch: 1 Batch: 1577/20099 (7.85%) Loss: 2.470381 LR: 0.00001227 +[06:51:29] Epoch: 1 Batch: 1578/20099 (7.85%) Loss: 2.281889 LR: 0.00001227 +[06:51:32] Epoch: 1 Batch: 1579/20099 (7.86%) Loss: 2.124431 LR: 0.00001227 +[06:51:35] Epoch: 1 Batch: 1580/20099 (7.86%) Loss: 2.418377 LR: 0.00001227 +[06:51:38] Epoch: 1 Batch: 1581/20099 (7.87%) Loss: 2.271000 LR: 0.00001227 +[06:51:41] Epoch: 1 Batch: 1582/20099 (7.87%) Loss: 2.287450 LR: 0.00001233 +[06:51:44] Epoch: 1 Batch: 1583/20099 (7.88%) Loss: 2.340910 LR: 0.00001233 +[06:51:47] Epoch: 1 Batch: 1584/20099 (7.88%) Loss: 2.590541 LR: 0.00001233 +[06:51:50] Epoch: 1 Batch: 1585/20099 (7.89%) Loss: 2.343602 LR: 0.00001233 +[06:51:53] Epoch: 1 Batch: 1586/20099 (7.89%) Loss: 2.606321 LR: 0.00001233 +[06:51:56] Epoch: 1 Batch: 1587/20099 (7.90%) Loss: 2.511487 LR: 0.00001233 +[06:52:00] Epoch: 1 Batch: 1588/20099 (7.90%) Loss: 2.258364 LR: 0.00001233 +[06:52:03] Epoch: 1 Batch: 1589/20099 (7.91%) Loss: 2.827784 LR: 0.00001238 +[06:52:06] Epoch: 1 Batch: 1590/20099 (7.91%) Loss: 2.164371 LR: 0.00001238 +[06:52:09] Epoch: 1 Batch: 1591/20099 (7.92%) Loss: 2.174267 LR: 0.00001238 +[06:52:12] Epoch: 1 Batch: 1592/20099 (7.92%) Loss: 2.491860 LR: 0.00001238 +[06:52:15] Epoch: 1 Batch: 1593/20099 (7.93%) Loss: 2.291612 LR: 0.00001238 +[06:52:18] Epoch: 1 Batch: 1594/20099 (7.93%) Loss: 2.423944 LR: 0.00001238 +[06:52:21] Epoch: 1 Batch: 1595/20099 (7.94%) Loss: 2.093985 LR: 0.00001238 +[06:52:24] Epoch: 1 Batch: 1596/20099 (7.94%) Loss: 2.354590 LR: 0.00001244 +[06:52:27] Epoch: 1 Batch: 1597/20099 (7.95%) Loss: 2.373621 LR: 0.00001244 +[06:52:31] Epoch: 1 Batch: 1598/20099 (7.95%) Loss: 2.445650 LR: 0.00001244 +[06:52:34] Epoch: 1 Batch: 1599/20099 (7.96%) Loss: 2.492216 LR: 0.00001244 +[06:52:40] >> Temp checkpoint saved: epoch1_step1600, size: 0.1693 GB +[06:52:40] Epoch: 1 Batch: 1600/20099 (7.96%) Loss: 2.247180 LR: 0.00001244 +[06:52:44] Epoch: 1 
Batch: 1601/20099 (7.97%) Loss: 2.532938 LR: 0.00001244 +[06:52:47] Epoch: 1 Batch: 1602/20099 (7.97%) Loss: 2.000120 LR: 0.00001244 +[06:52:50] Epoch: 1 Batch: 1603/20099 (7.98%) Loss: 2.299903 LR: 0.00001249 +[06:52:53] Epoch: 1 Batch: 1604/20099 (7.98%) Loss: 2.182782 LR: 0.00001249 +[06:52:56] Epoch: 1 Batch: 1605/20099 (7.99%) Loss: 2.360156 LR: 0.00001249 +[06:52:59] Epoch: 1 Batch: 1606/20099 (7.99%) Loss: 2.619795 LR: 0.00001249 +[06:53:02] Epoch: 1 Batch: 1607/20099 (8.00%) Loss: 2.502345 LR: 0.00001249 +[06:53:05] Epoch: 1 Batch: 1608/20099 (8.00%) Loss: 2.409710 LR: 0.00001249 +[06:53:08] Epoch: 1 Batch: 1609/20099 (8.01%) Loss: 2.194351 LR: 0.00001249 +[06:53:12] Epoch: 1 Batch: 1610/20099 (8.01%) Loss: 2.379601 LR: 0.00001255 +[06:53:15] Epoch: 1 Batch: 1611/20099 (8.02%) Loss: 2.438886 LR: 0.00001255 +[06:53:18] Epoch: 1 Batch: 1612/20099 (8.02%) Loss: 2.301873 LR: 0.00001255 +[06:53:21] Epoch: 1 Batch: 1613/20099 (8.03%) Loss: 2.217442 LR: 0.00001255 +[06:53:24] Epoch: 1 Batch: 1614/20099 (8.03%) Loss: 2.153318 LR: 0.00001255 +[06:53:27] Epoch: 1 Batch: 1615/20099 (8.04%) Loss: 2.458114 LR: 0.00001255 +[06:53:30] Epoch: 1 Batch: 1616/20099 (8.04%) Loss: 2.129455 LR: 0.00001255 +[06:53:33] Epoch: 1 Batch: 1617/20099 (8.05%) Loss: 2.181606 LR: 0.00001260 +[06:53:36] Epoch: 1 Batch: 1618/20099 (8.05%) Loss: 2.135801 LR: 0.00001260 +[06:53:40] Epoch: 1 Batch: 1619/20099 (8.06%) Loss: 2.172067 LR: 0.00001260 +[06:53:43] Epoch: 1 Batch: 1620/20099 (8.06%) Loss: 2.315743 LR: 0.00001260 +[06:53:46] Epoch: 1 Batch: 1621/20099 (8.07%) Loss: 2.308822 LR: 0.00001260 +[06:53:49] Epoch: 1 Batch: 1622/20099 (8.07%) Loss: 2.357031 LR: 0.00001260 +[06:53:52] Epoch: 1 Batch: 1623/20099 (8.08%) Loss: 2.424401 LR: 0.00001260 +[06:53:55] Epoch: 1 Batch: 1624/20099 (8.08%) Loss: 2.049492 LR: 0.00001265 +[06:53:58] Epoch: 1 Batch: 1625/20099 (8.08%) Loss: 2.459597 LR: 0.00001265 +[06:54:01] Epoch: 1 Batch: 1626/20099 (8.09%) Loss: 2.223011 LR: 0.00001265 +[06:54:04] Epoch: 
1 Batch: 1627/20099 (8.09%) Loss: 2.541628 LR: 0.00001265 +[06:54:07] Epoch: 1 Batch: 1628/20099 (8.10%) Loss: 2.281045 LR: 0.00001265 +[06:54:10] Epoch: 1 Batch: 1629/20099 (8.10%) Loss: 2.351700 LR: 0.00001265 +[06:54:13] Epoch: 1 Batch: 1630/20099 (8.11%) Loss: 2.367495 LR: 0.00001265 +[06:54:17] Epoch: 1 Batch: 1631/20099 (8.11%) Loss: 2.382348 LR: 0.00001271 +[06:54:20] Epoch: 1 Batch: 1632/20099 (8.12%) Loss: 2.244730 LR: 0.00001271 +[06:54:23] Epoch: 1 Batch: 1633/20099 (8.12%) Loss: 2.424586 LR: 0.00001271 +[06:54:26] Epoch: 1 Batch: 1634/20099 (8.13%) Loss: 2.215753 LR: 0.00001271 +[06:54:29] Epoch: 1 Batch: 1635/20099 (8.13%) Loss: 2.221030 LR: 0.00001271 +[06:54:32] Epoch: 1 Batch: 1636/20099 (8.14%) Loss: 2.340493 LR: 0.00001271 +[06:54:35] Epoch: 1 Batch: 1637/20099 (8.14%) Loss: 2.121684 LR: 0.00001271 +[06:54:38] Epoch: 1 Batch: 1638/20099 (8.15%) Loss: 2.093879 LR: 0.00001276 +[06:54:41] Epoch: 1 Batch: 1639/20099 (8.15%) Loss: 2.374841 LR: 0.00001276 +[06:54:44] Epoch: 1 Batch: 1640/20099 (8.16%) Loss: 2.215573 LR: 0.00001276 +[06:54:48] Epoch: 1 Batch: 1641/20099 (8.16%) Loss: 2.306739 LR: 0.00001276 +[06:54:51] Epoch: 1 Batch: 1642/20099 (8.17%) Loss: 2.601887 LR: 0.00001276 +[06:54:54] Epoch: 1 Batch: 1643/20099 (8.17%) Loss: 2.668187 LR: 0.00001276 +[06:54:57] Epoch: 1 Batch: 1644/20099 (8.18%) Loss: 2.091061 LR: 0.00001276 +[06:55:00] Epoch: 1 Batch: 1645/20099 (8.18%) Loss: 2.246003 LR: 0.00001282 +[06:55:03] Epoch: 1 Batch: 1646/20099 (8.19%) Loss: 2.541567 LR: 0.00001282 +[06:55:06] Epoch: 1 Batch: 1647/20099 (8.19%) Loss: 2.412294 LR: 0.00001282 +[06:55:09] Epoch: 1 Batch: 1648/20099 (8.20%) Loss: 2.392308 LR: 0.00001282 +[06:55:12] Epoch: 1 Batch: 1649/20099 (8.20%) Loss: 2.200082 LR: 0.00001282 +[06:55:15] Epoch: 1 Batch: 1650/20099 (8.21%) Loss: 2.262511 LR: 0.00001282 +[06:55:19] Epoch: 1 Batch: 1651/20099 (8.21%) Loss: 2.407450 LR: 0.00001282 +[06:55:22] Epoch: 1 Batch: 1652/20099 (8.22%) Loss: 2.109709 LR: 0.00001287 +[06:55:25] 
Epoch: 1 Batch: 1653/20099 (8.22%) Loss: 2.426110 LR: 0.00001287 +[06:55:28] Epoch: 1 Batch: 1654/20099 (8.23%) Loss: 2.084368 LR: 0.00001287 +[06:55:31] Epoch: 1 Batch: 1655/20099 (8.23%) Loss: 2.338767 LR: 0.00001287 +[06:55:34] Epoch: 1 Batch: 1656/20099 (8.24%) Loss: 2.367495 LR: 0.00001287 +[06:55:37] Epoch: 1 Batch: 1657/20099 (8.24%) Loss: 2.529631 LR: 0.00001287 +[06:55:40] Epoch: 1 Batch: 1658/20099 (8.25%) Loss: 2.194630 LR: 0.00001287 +[06:55:43] Epoch: 1 Batch: 1659/20099 (8.25%) Loss: 1.926474 LR: 0.00001293 +[06:55:46] Epoch: 1 Batch: 1660/20099 (8.26%) Loss: 2.714239 LR: 0.00001293 +[06:55:50] Epoch: 1 Batch: 1661/20099 (8.26%) Loss: 2.178670 LR: 0.00001293 +[06:55:53] Epoch: 1 Batch: 1662/20099 (8.27%) Loss: 2.215367 LR: 0.00001293 +[06:55:56] Epoch: 1 Batch: 1663/20099 (8.27%) Loss: 2.620052 LR: 0.00001293 +[06:55:59] Epoch: 1 Batch: 1664/20099 (8.28%) Loss: 2.282206 LR: 0.00001293 +[06:56:02] Epoch: 1 Batch: 1665/20099 (8.28%) Loss: 2.213902 LR: 0.00001293 +[06:56:05] Epoch: 1 Batch: 1666/20099 (8.29%) Loss: 2.659980 LR: 0.00001298 +[06:56:08] Epoch: 1 Batch: 1667/20099 (8.29%) Loss: 2.686398 LR: 0.00001298 +[06:56:11] Epoch: 1 Batch: 1668/20099 (8.30%) Loss: 2.345804 LR: 0.00001298 +[06:56:14] Epoch: 1 Batch: 1669/20099 (8.30%) Loss: 2.009180 LR: 0.00001298 +[06:56:17] Epoch: 1 Batch: 1670/20099 (8.31%) Loss: 2.521678 LR: 0.00001298 +[06:56:20] Epoch: 1 Batch: 1671/20099 (8.31%) Loss: 2.400159 LR: 0.00001298 +[06:56:23] Epoch: 1 Batch: 1672/20099 (8.32%) Loss: 1.950377 LR: 0.00001298 +[06:56:27] Epoch: 1 Batch: 1673/20099 (8.32%) Loss: 2.502210 LR: 0.00001304 +[06:56:30] Epoch: 1 Batch: 1674/20099 (8.33%) Loss: 2.126189 LR: 0.00001304 +[06:56:33] Epoch: 1 Batch: 1675/20099 (8.33%) Loss: 2.321570 LR: 0.00001304 +[06:56:36] Epoch: 1 Batch: 1676/20099 (8.34%) Loss: 2.253106 LR: 0.00001304 +[06:56:39] Epoch: 1 Batch: 1677/20099 (8.34%) Loss: 2.454096 LR: 0.00001304 +[06:56:42] Epoch: 1 Batch: 1678/20099 (8.35%) Loss: 2.856981 LR: 0.00001304 
+[06:56:45] Epoch: 1 Batch: 1679/20099 (8.35%) Loss: 2.030300 LR: 0.00001304 +[06:56:48] Epoch: 1 Batch: 1680/20099 (8.36%) Loss: 2.440923 LR: 0.00001309 +[06:56:51] Epoch: 1 Batch: 1681/20099 (8.36%) Loss: 2.089742 LR: 0.00001309 +[06:56:54] Epoch: 1 Batch: 1682/20099 (8.37%) Loss: 2.201564 LR: 0.00001309 +[06:56:58] Epoch: 1 Batch: 1683/20099 (8.37%) Loss: 2.214243 LR: 0.00001309 +[06:57:01] Epoch: 1 Batch: 1684/20099 (8.38%) Loss: 2.417551 LR: 0.00001309 +[06:57:04] Epoch: 1 Batch: 1685/20099 (8.38%) Loss: 2.351271 LR: 0.00001309 +[06:57:07] Epoch: 1 Batch: 1686/20099 (8.39%) Loss: 2.183200 LR: 0.00001309 +[06:57:10] Epoch: 1 Batch: 1687/20099 (8.39%) Loss: 2.589788 LR: 0.00001315 +[06:57:13] Epoch: 1 Batch: 1688/20099 (8.40%) Loss: 2.247130 LR: 0.00001315 +[06:57:16] Epoch: 1 Batch: 1689/20099 (8.40%) Loss: 2.245478 LR: 0.00001315 +[06:57:19] Epoch: 1 Batch: 1690/20099 (8.41%) Loss: 2.343208 LR: 0.00001315 +[06:57:22] Epoch: 1 Batch: 1691/20099 (8.41%) Loss: 2.327599 LR: 0.00001315 +[06:57:25] Epoch: 1 Batch: 1692/20099 (8.42%) Loss: 2.333018 LR: 0.00001315 +[06:57:29] Epoch: 1 Batch: 1693/20099 (8.42%) Loss: 2.346905 LR: 0.00001315 +[06:57:32] Epoch: 1 Batch: 1694/20099 (8.43%) Loss: 2.419308 LR: 0.00001320 +[06:57:35] Epoch: 1 Batch: 1695/20099 (8.43%) Loss: 1.861662 LR: 0.00001320 +[06:57:38] Epoch: 1 Batch: 1696/20099 (8.44%) Loss: 2.298707 LR: 0.00001320 +[06:57:41] Epoch: 1 Batch: 1697/20099 (8.44%) Loss: 2.162012 LR: 0.00001320 +[06:57:44] Epoch: 1 Batch: 1698/20099 (8.45%) Loss: 2.299673 LR: 0.00001320 +[06:57:47] Epoch: 1 Batch: 1699/20099 (8.45%) Loss: 2.138160 LR: 0.00001320 +[06:57:50] Epoch: 1 Batch: 1700/20099 (8.46%) Loss: 2.188778 LR: 0.00001320 +[06:57:53] Epoch: 1 Batch: 1701/20099 (8.46%) Loss: 2.437657 LR: 0.00001325 +[06:57:57] Epoch: 1 Batch: 1702/20099 (8.47%) Loss: 2.233498 LR: 0.00001325 +[06:58:00] Epoch: 1 Batch: 1703/20099 (8.47%) Loss: 2.208044 LR: 0.00001325 +[06:58:03] Epoch: 1 Batch: 1704/20099 (8.48%) Loss: 2.111284 LR: 
0.00001325 +[06:58:06] Epoch: 1 Batch: 1705/20099 (8.48%) Loss: 2.283358 LR: 0.00001325 +[06:58:09] Epoch: 1 Batch: 1706/20099 (8.49%) Loss: 2.416652 LR: 0.00001325 +[06:58:12] Epoch: 1 Batch: 1707/20099 (8.49%) Loss: 2.492845 LR: 0.00001325 +[06:58:15] Epoch: 1 Batch: 1708/20099 (8.50%) Loss: 1.951697 LR: 0.00001331 +[06:58:18] Epoch: 1 Batch: 1709/20099 (8.50%) Loss: 2.192699 LR: 0.00001331 +[06:58:21] Epoch: 1 Batch: 1710/20099 (8.51%) Loss: 2.461122 LR: 0.00001331 +[06:58:24] Epoch: 1 Batch: 1711/20099 (8.51%) Loss: 1.974747 LR: 0.00001331 +[06:58:28] Epoch: 1 Batch: 1712/20099 (8.52%) Loss: 2.429329 LR: 0.00001331 +[06:58:31] Epoch: 1 Batch: 1713/20099 (8.52%) Loss: 2.223940 LR: 0.00001331 +[06:58:34] Epoch: 1 Batch: 1714/20099 (8.53%) Loss: 2.348989 LR: 0.00001331 +[06:58:37] Epoch: 1 Batch: 1715/20099 (8.53%) Loss: 2.370657 LR: 0.00001336 +[06:58:40] Epoch: 1 Batch: 1716/20099 (8.54%) Loss: 2.358052 LR: 0.00001336 +[06:58:43] Epoch: 1 Batch: 1717/20099 (8.54%) Loss: 2.290869 LR: 0.00001336 +[06:58:46] Epoch: 1 Batch: 1718/20099 (8.55%) Loss: 2.238388 LR: 0.00001336 +[06:58:49] Epoch: 1 Batch: 1719/20099 (8.55%) Loss: 2.261013 LR: 0.00001336 +[06:58:52] Epoch: 1 Batch: 1720/20099 (8.56%) Loss: 2.559283 LR: 0.00001336 +[06:58:55] Epoch: 1 Batch: 1721/20099 (8.56%) Loss: 2.203809 LR: 0.00001336 +[06:58:59] Epoch: 1 Batch: 1722/20099 (8.57%) Loss: 2.541089 LR: 0.00001342 +[06:59:02] Epoch: 1 Batch: 1723/20099 (8.57%) Loss: 2.299240 LR: 0.00001342 +[06:59:05] Epoch: 1 Batch: 1724/20099 (8.58%) Loss: 2.328654 LR: 0.00001342 +[06:59:08] Epoch: 1 Batch: 1725/20099 (8.58%) Loss: 2.136559 LR: 0.00001342 +[06:59:11] Epoch: 1 Batch: 1726/20099 (8.59%) Loss: 2.256575 LR: 0.00001342 +[06:59:14] Epoch: 1 Batch: 1727/20099 (8.59%) Loss: 2.251299 LR: 0.00001342 +[06:59:17] Epoch: 1 Batch: 1728/20099 (8.60%) Loss: 2.466022 LR: 0.00001342 +[06:59:20] Epoch: 1 Batch: 1729/20099 (8.60%) Loss: 2.383904 LR: 0.00001347 +[06:59:23] Epoch: 1 Batch: 1730/20099 (8.61%) Loss: 2.336459 
LR: 0.00001347 +[06:59:26] Epoch: 1 Batch: 1731/20099 (8.61%) Loss: 2.207716 LR: 0.00001347 +[06:59:30] Epoch: 1 Batch: 1732/20099 (8.62%) Loss: 2.193822 LR: 0.00001347 +[06:59:33] Epoch: 1 Batch: 1733/20099 (8.62%) Loss: 2.472135 LR: 0.00001347 +[06:59:36] Epoch: 1 Batch: 1734/20099 (8.63%) Loss: 2.323449 LR: 0.00001347 +[06:59:39] Epoch: 1 Batch: 1735/20099 (8.63%) Loss: 2.349814 LR: 0.00001347 +[06:59:42] Epoch: 1 Batch: 1736/20099 (8.64%) Loss: 2.389352 LR: 0.00001353 +[06:59:45] Epoch: 1 Batch: 1737/20099 (8.64%) Loss: 2.172447 LR: 0.00001353 +[06:59:48] Epoch: 1 Batch: 1738/20099 (8.65%) Loss: 2.547144 LR: 0.00001353 +[06:59:51] Epoch: 1 Batch: 1739/20099 (8.65%) Loss: 2.121384 LR: 0.00001353 +[06:59:54] Epoch: 1 Batch: 1740/20099 (8.66%) Loss: 2.347766 LR: 0.00001353 +[06:59:57] Epoch: 1 Batch: 1741/20099 (8.66%) Loss: 2.570089 LR: 0.00001353 +[07:00:00] Epoch: 1 Batch: 1742/20099 (8.67%) Loss: 2.158255 LR: 0.00001353 +[07:00:04] Epoch: 1 Batch: 1743/20099 (8.67%) Loss: 2.358286 LR: 0.00001358 +[07:00:07] Epoch: 1 Batch: 1744/20099 (8.68%) Loss: 2.560031 LR: 0.00001358 +[07:00:10] Epoch: 1 Batch: 1745/20099 (8.68%) Loss: 2.156942 LR: 0.00001358 +[07:00:13] Epoch: 1 Batch: 1746/20099 (8.69%) Loss: 2.125696 LR: 0.00001358 +[07:00:16] Epoch: 1 Batch: 1747/20099 (8.69%) Loss: 2.457773 LR: 0.00001358 +[07:00:19] Epoch: 1 Batch: 1748/20099 (8.70%) Loss: 2.443941 LR: 0.00001358 +[07:00:22] Epoch: 1 Batch: 1749/20099 (8.70%) Loss: 2.541251 LR: 0.00001358 +[07:00:25] Epoch: 1 Batch: 1750/20099 (8.71%) Loss: 2.464676 LR: 0.00001364 +[07:00:28] Epoch: 1 Batch: 1751/20099 (8.71%) Loss: 2.066006 LR: 0.00001364 +[07:00:31] Epoch: 1 Batch: 1752/20099 (8.72%) Loss: 2.263027 LR: 0.00001364 +[07:00:35] Epoch: 1 Batch: 1753/20099 (8.72%) Loss: 2.372286 LR: 0.00001364 +[07:00:38] Epoch: 1 Batch: 1754/20099 (8.73%) Loss: 2.431624 LR: 0.00001364 +[07:00:41] Epoch: 1 Batch: 1755/20099 (8.73%) Loss: 2.099194 LR: 0.00001364 +[07:00:44] Epoch: 1 Batch: 1756/20099 (8.74%) Loss: 
1.992789 LR: 0.00001364 +[07:00:47] Epoch: 1 Batch: 1757/20099 (8.74%) Loss: 2.504426 LR: 0.00001369 +[07:00:50] Epoch: 1 Batch: 1758/20099 (8.75%) Loss: 2.034373 LR: 0.00001369 +[07:00:53] Epoch: 1 Batch: 1759/20099 (8.75%) Loss: 2.576954 LR: 0.00001369 +[07:00:56] Epoch: 1 Batch: 1760/20099 (8.76%) Loss: 2.412631 LR: 0.00001369 +[07:00:59] Epoch: 1 Batch: 1761/20099 (8.76%) Loss: 2.325705 LR: 0.00001369 +[07:01:02] Epoch: 1 Batch: 1762/20099 (8.77%) Loss: 2.370502 LR: 0.00001369 +[07:01:05] Epoch: 1 Batch: 1763/20099 (8.77%) Loss: 2.317315 LR: 0.00001369 +[07:01:09] Epoch: 1 Batch: 1764/20099 (8.78%) Loss: 2.312242 LR: 0.00001375 +[07:01:12] Epoch: 1 Batch: 1765/20099 (8.78%) Loss: 1.719209 LR: 0.00001375 +[07:01:15] Epoch: 1 Batch: 1766/20099 (8.79%) Loss: 2.155427 LR: 0.00001375 +[07:01:18] Epoch: 1 Batch: 1767/20099 (8.79%) Loss: 2.381479 LR: 0.00001375 +[07:01:21] Epoch: 1 Batch: 1768/20099 (8.80%) Loss: 2.558931 LR: 0.00001375 +[07:01:24] Epoch: 1 Batch: 1769/20099 (8.80%) Loss: 2.362370 LR: 0.00001375 +[07:01:27] Epoch: 1 Batch: 1770/20099 (8.81%) Loss: 2.087512 LR: 0.00001375 +[07:01:30] Epoch: 1 Batch: 1771/20099 (8.81%) Loss: 2.326218 LR: 0.00001380 +[07:01:33] Epoch: 1 Batch: 1772/20099 (8.82%) Loss: 2.741281 LR: 0.00001380 +[07:01:36] Epoch: 1 Batch: 1773/20099 (8.82%) Loss: 2.302479 LR: 0.00001380 +[07:01:40] Epoch: 1 Batch: 1774/20099 (8.83%) Loss: 2.098899 LR: 0.00001380 +[07:01:43] Epoch: 1 Batch: 1775/20099 (8.83%) Loss: 2.131225 LR: 0.00001380 +[07:01:46] Epoch: 1 Batch: 1776/20099 (8.84%) Loss: 2.246016 LR: 0.00001380 +[07:01:49] Epoch: 1 Batch: 1777/20099 (8.84%) Loss: 2.279077 LR: 0.00001380 +[07:01:52] Epoch: 1 Batch: 1778/20099 (8.85%) Loss: 2.300801 LR: 0.00001385 +[07:01:55] Epoch: 1 Batch: 1779/20099 (8.85%) Loss: 2.395876 LR: 0.00001385 +[07:01:58] Epoch: 1 Batch: 1780/20099 (8.86%) Loss: 2.285399 LR: 0.00001385 +[07:02:01] Epoch: 1 Batch: 1781/20099 (8.86%) Loss: 1.918569 LR: 0.00001385 +[07:02:04] Epoch: 1 Batch: 1782/20099 (8.87%) 
Loss: 2.197295 LR: 0.00001385 +[07:02:07] Epoch: 1 Batch: 1783/20099 (8.87%) Loss: 2.136642 LR: 0.00001385 +[07:02:11] Epoch: 1 Batch: 1784/20099 (8.88%) Loss: 2.576044 LR: 0.00001385 +[07:02:14] Epoch: 1 Batch: 1785/20099 (8.88%) Loss: 2.111222 LR: 0.00001391 +[07:02:17] Epoch: 1 Batch: 1786/20099 (8.89%) Loss: 2.556108 LR: 0.00001391 +[07:02:20] Epoch: 1 Batch: 1787/20099 (8.89%) Loss: 2.209382 LR: 0.00001391 +[07:02:23] Epoch: 1 Batch: 1788/20099 (8.90%) Loss: 2.082262 LR: 0.00001391 +[07:02:26] Epoch: 1 Batch: 1789/20099 (8.90%) Loss: 2.190976 LR: 0.00001391 +[07:02:29] Epoch: 1 Batch: 1790/20099 (8.91%) Loss: 2.325526 LR: 0.00001391 +[07:02:32] Epoch: 1 Batch: 1791/20099 (8.91%) Loss: 2.398899 LR: 0.00001391 +[07:02:35] Epoch: 1 Batch: 1792/20099 (8.92%) Loss: 2.487328 LR: 0.00001396 +[07:02:38] Epoch: 1 Batch: 1793/20099 (8.92%) Loss: 2.455072 LR: 0.00001396 +[07:02:41] Epoch: 1 Batch: 1794/20099 (8.93%) Loss: 2.228589 LR: 0.00001396 +[07:02:45] Epoch: 1 Batch: 1795/20099 (8.93%) Loss: 2.301514 LR: 0.00001396 +[07:02:48] Epoch: 1 Batch: 1796/20099 (8.94%) Loss: 2.215296 LR: 0.00001396 +[07:02:51] Epoch: 1 Batch: 1797/20099 (8.94%) Loss: 2.391502 LR: 0.00001396 +[07:02:54] Epoch: 1 Batch: 1798/20099 (8.95%) Loss: 2.454530 LR: 0.00001396 +[07:02:57] Epoch: 1 Batch: 1799/20099 (8.95%) Loss: 2.125805 LR: 0.00001402 +[07:03:04] >> Temp checkpoint saved: epoch1_step1800, size: 0.1693 GB +[07:03:04] Epoch: 1 Batch: 1800/20099 (8.96%) Loss: 2.341751 LR: 0.00001402 +[07:03:07] Epoch: 1 Batch: 1801/20099 (8.96%) Loss: 2.512991 LR: 0.00001402 +[07:03:10] Epoch: 1 Batch: 1802/20099 (8.97%) Loss: 2.313782 LR: 0.00001402 +[07:03:13] Epoch: 1 Batch: 1803/20099 (8.97%) Loss: 2.262982 LR: 0.00001402 +[07:03:16] Epoch: 1 Batch: 1804/20099 (8.98%) Loss: 2.229398 LR: 0.00001402 +[07:03:19] Epoch: 1 Batch: 1805/20099 (8.98%) Loss: 2.300263 LR: 0.00001402 +[07:03:22] Epoch: 1 Batch: 1806/20099 (8.99%) Loss: 2.642084 LR: 0.00001407 +[07:03:25] Epoch: 1 Batch: 1807/20099 (8.99%) 
Loss: 2.262362 LR: 0.00001407 +[07:03:29] Epoch: 1 Batch: 1808/20099 (9.00%) Loss: 2.222048 LR: 0.00001407 +[07:03:32] Epoch: 1 Batch: 1809/20099 (9.00%) Loss: 2.258570 LR: 0.00001407 +[07:03:35] Epoch: 1 Batch: 1810/20099 (9.01%) Loss: 2.291123 LR: 0.00001407 +[07:03:38] Epoch: 1 Batch: 1811/20099 (9.01%) Loss: 2.038574 LR: 0.00001407 +[07:03:41] Epoch: 1 Batch: 1812/20099 (9.02%) Loss: 2.163481 LR: 0.00001407 +[07:03:44] Epoch: 1 Batch: 1813/20099 (9.02%) Loss: 2.190791 LR: 0.00001413 +[07:03:47] Epoch: 1 Batch: 1814/20099 (9.03%) Loss: 2.122209 LR: 0.00001413 +[07:03:50] Epoch: 1 Batch: 1815/20099 (9.03%) Loss: 2.445512 LR: 0.00001413 +[07:03:54] Epoch: 1 Batch: 1816/20099 (9.04%) Loss: 2.413850 LR: 0.00001413 +[07:03:57] Epoch: 1 Batch: 1817/20099 (9.04%) Loss: 2.119793 LR: 0.00001413 +[07:04:00] Epoch: 1 Batch: 1818/20099 (9.05%) Loss: 2.062216 LR: 0.00001413 +[07:04:03] Epoch: 1 Batch: 1819/20099 (9.05%) Loss: 2.429849 LR: 0.00001413 +[07:04:06] Epoch: 1 Batch: 1820/20099 (9.06%) Loss: 2.576694 LR: 0.00001418 +[07:04:09] Epoch: 1 Batch: 1821/20099 (9.06%) Loss: 2.103817 LR: 0.00001418 +[07:04:12] Epoch: 1 Batch: 1822/20099 (9.07%) Loss: 2.055585 LR: 0.00001418 +[07:04:15] Epoch: 1 Batch: 1823/20099 (9.07%) Loss: 2.230421 LR: 0.00001418 +[07:04:18] Epoch: 1 Batch: 1824/20099 (9.08%) Loss: 2.327197 LR: 0.00001418 +[07:04:21] Epoch: 1 Batch: 1825/20099 (9.08%) Loss: 2.046785 LR: 0.00001418 +[07:04:24] Epoch: 1 Batch: 1826/20099 (9.09%) Loss: 2.214765 LR: 0.00001418 +[07:04:28] Epoch: 1 Batch: 1827/20099 (9.09%) Loss: 2.434702 LR: 0.00001424 +[07:04:31] Epoch: 1 Batch: 1828/20099 (9.09%) Loss: 2.425454 LR: 0.00001424 +[07:04:34] Epoch: 1 Batch: 1829/20099 (9.10%) Loss: 2.109859 LR: 0.00001424 +[07:04:37] Epoch: 1 Batch: 1830/20099 (9.10%) Loss: 2.155876 LR: 0.00001424 +[07:04:40] Epoch: 1 Batch: 1831/20099 (9.11%) Loss: 2.234667 LR: 0.00001424 +[07:04:43] Epoch: 1 Batch: 1832/20099 (9.11%) Loss: 2.315950 LR: 0.00001424 +[07:04:46] Epoch: 1 Batch: 1833/20099 
(9.12%) Loss: 2.404123 LR: 0.00001424 +[07:04:49] Epoch: 1 Batch: 1834/20099 (9.12%) Loss: 2.113973 LR: 0.00001429 +[07:04:52] Epoch: 1 Batch: 1835/20099 (9.13%) Loss: 2.457240 LR: 0.00001429 +[07:04:55] Epoch: 1 Batch: 1836/20099 (9.13%) Loss: 2.279351 LR: 0.00001429 +[07:04:58] Epoch: 1 Batch: 1837/20099 (9.14%) Loss: 2.269922 LR: 0.00001429 +[07:05:02] Epoch: 1 Batch: 1838/20099 (9.14%) Loss: 2.245026 LR: 0.00001429 +[07:05:05] Epoch: 1 Batch: 1839/20099 (9.15%) Loss: 2.328990 LR: 0.00001429 +[07:05:08] Epoch: 1 Batch: 1840/20099 (9.15%) Loss: 2.190562 LR: 0.00001429 +[07:05:11] Epoch: 1 Batch: 1841/20099 (9.16%) Loss: 2.382811 LR: 0.00001435 +[07:05:14] Epoch: 1 Batch: 1842/20099 (9.16%) Loss: 2.476586 LR: 0.00001435 +[07:05:17] Epoch: 1 Batch: 1843/20099 (9.17%) Loss: 2.254930 LR: 0.00001435 +[07:05:20] Epoch: 1 Batch: 1844/20099 (9.17%) Loss: 2.147118 LR: 0.00001435 +[07:05:23] Epoch: 1 Batch: 1845/20099 (9.18%) Loss: 2.080910 LR: 0.00001435 +[07:05:26] Epoch: 1 Batch: 1846/20099 (9.18%) Loss: 2.573818 LR: 0.00001435 +[07:05:29] Epoch: 1 Batch: 1847/20099 (9.19%) Loss: 2.532523 LR: 0.00001435 +[07:05:32] Epoch: 1 Batch: 1848/20099 (9.19%) Loss: 2.189087 LR: 0.00001440 +[07:05:36] Epoch: 1 Batch: 1849/20099 (9.20%) Loss: 2.281926 LR: 0.00001440 +[07:05:39] Epoch: 1 Batch: 1850/20099 (9.20%) Loss: 2.312815 LR: 0.00001440 +[07:05:42] Epoch: 1 Batch: 1851/20099 (9.21%) Loss: 2.219538 LR: 0.00001440 +[07:05:45] Epoch: 1 Batch: 1852/20099 (9.21%) Loss: 2.326258 LR: 0.00001440 +[07:05:48] Epoch: 1 Batch: 1853/20099 (9.22%) Loss: 2.339379 LR: 0.00001440 +[07:05:51] Epoch: 1 Batch: 1854/20099 (9.22%) Loss: 2.097604 LR: 0.00001440 +[07:05:54] Epoch: 1 Batch: 1855/20099 (9.23%) Loss: 2.401152 LR: 0.00001445 +[07:05:57] Epoch: 1 Batch: 1856/20099 (9.23%) Loss: 2.210540 LR: 0.00001445 +[07:06:00] Epoch: 1 Batch: 1857/20099 (9.24%) Loss: 2.132291 LR: 0.00001445 +[07:06:03] Epoch: 1 Batch: 1858/20099 (9.24%) Loss: 2.284000 LR: 0.00001445 +[07:06:06] Epoch: 1 Batch: 
1859/20099 (9.25%) Loss: 2.308446 LR: 0.00001445 +[07:06:10] Epoch: 1 Batch: 1860/20099 (9.25%) Loss: 2.151280 LR: 0.00001445 +[07:06:13] Epoch: 1 Batch: 1861/20099 (9.26%) Loss: 2.206615 LR: 0.00001445 +[07:06:16] Epoch: 1 Batch: 1862/20099 (9.26%) Loss: 2.298809 LR: 0.00001451 +[07:06:19] Epoch: 1 Batch: 1863/20099 (9.27%) Loss: 2.447143 LR: 0.00001451 +[07:06:22] Epoch: 1 Batch: 1864/20099 (9.27%) Loss: 2.307387 LR: 0.00001451 +[07:06:25] Epoch: 1 Batch: 1865/20099 (9.28%) Loss: 2.228246 LR: 0.00001451 +[07:06:28] Epoch: 1 Batch: 1866/20099 (9.28%) Loss: 2.326654 LR: 0.00001451 +[07:06:31] Epoch: 1 Batch: 1867/20099 (9.29%) Loss: 2.345151 LR: 0.00001451 +[07:06:34] Epoch: 1 Batch: 1868/20099 (9.29%) Loss: 2.162492 LR: 0.00001451 +[07:06:37] Epoch: 1 Batch: 1869/20099 (9.30%) Loss: 2.187411 LR: 0.00001456 +[07:06:41] Epoch: 1 Batch: 1870/20099 (9.30%) Loss: 1.998963 LR: 0.00001456 +[07:06:44] Epoch: 1 Batch: 1871/20099 (9.31%) Loss: 2.070997 LR: 0.00001456 +[07:06:47] Epoch: 1 Batch: 1872/20099 (9.31%) Loss: 2.109891 LR: 0.00001456 +[07:06:50] Epoch: 1 Batch: 1873/20099 (9.32%) Loss: 2.566212 LR: 0.00001456 +[07:06:53] Epoch: 1 Batch: 1874/20099 (9.32%) Loss: 2.506237 LR: 0.00001456 +[07:06:56] Epoch: 1 Batch: 1875/20099 (9.33%) Loss: 2.253242 LR: 0.00001456 +[07:06:59] Epoch: 1 Batch: 1876/20099 (9.33%) Loss: 2.282872 LR: 0.00001462 +[07:07:02] Epoch: 1 Batch: 1877/20099 (9.34%) Loss: 2.292516 LR: 0.00001462 +[07:07:05] Epoch: 1 Batch: 1878/20099 (9.34%) Loss: 2.371445 LR: 0.00001462 +[07:07:08] Epoch: 1 Batch: 1879/20099 (9.35%) Loss: 2.220976 LR: 0.00001462 +[07:07:12] Epoch: 1 Batch: 1880/20099 (9.35%) Loss: 2.249311 LR: 0.00001462 +[07:07:15] Epoch: 1 Batch: 1881/20099 (9.36%) Loss: 2.150406 LR: 0.00001462 +[07:07:18] Epoch: 1 Batch: 1882/20099 (9.36%) Loss: 2.197207 LR: 0.00001462 +[07:07:21] Epoch: 1 Batch: 1883/20099 (9.37%) Loss: 2.382979 LR: 0.00001467 +[07:07:24] Epoch: 1 Batch: 1884/20099 (9.37%) Loss: 2.472743 LR: 0.00001467 +[07:07:27] Epoch: 1 
Batch: 1885/20099 (9.38%) Loss: 2.445419 LR: 0.00001467 +[07:07:30] Epoch: 1 Batch: 1886/20099 (9.38%) Loss: 2.178793 LR: 0.00001467 +[07:07:33] Epoch: 1 Batch: 1887/20099 (9.39%) Loss: 2.276334 LR: 0.00001467 +[07:07:36] Epoch: 1 Batch: 1888/20099 (9.39%) Loss: 2.561960 LR: 0.00001467 +[07:07:39] Epoch: 1 Batch: 1889/20099 (9.40%) Loss: 2.424026 LR: 0.00001467 +[07:07:42] Epoch: 1 Batch: 1890/20099 (9.40%) Loss: 2.245950 LR: 0.00001473 +[07:07:46] Epoch: 1 Batch: 1891/20099 (9.41%) Loss: 2.089722 LR: 0.00001473 +[07:07:49] Epoch: 1 Batch: 1892/20099 (9.41%) Loss: 2.382690 LR: 0.00001473 +[07:07:52] Epoch: 1 Batch: 1893/20099 (9.42%) Loss: 2.443253 LR: 0.00001473 +[07:07:55] Epoch: 1 Batch: 1894/20099 (9.42%) Loss: 2.415367 LR: 0.00001473 +[07:07:58] Epoch: 1 Batch: 1895/20099 (9.43%) Loss: 2.261578 LR: 0.00001473 +[07:08:01] Epoch: 1 Batch: 1896/20099 (9.43%) Loss: 2.233251 LR: 0.00001473 +[07:08:04] Epoch: 1 Batch: 1897/20099 (9.44%) Loss: 2.558352 LR: 0.00001478 +[07:08:07] Epoch: 1 Batch: 1898/20099 (9.44%) Loss: 2.526720 LR: 0.00001478 +[07:08:10] Epoch: 1 Batch: 1899/20099 (9.45%) Loss: 2.429466 LR: 0.00001478 +[07:08:14] Epoch: 1 Batch: 1900/20099 (9.45%) Loss: 2.250235 LR: 0.00001478 +[07:08:17] Epoch: 1 Batch: 1901/20099 (9.46%) Loss: 2.151051 LR: 0.00001478 +[07:08:20] Epoch: 1 Batch: 1902/20099 (9.46%) Loss: 2.287206 LR: 0.00001478 +[07:08:23] Epoch: 1 Batch: 1903/20099 (9.47%) Loss: 2.545099 LR: 0.00001478 +[07:08:26] Epoch: 1 Batch: 1904/20099 (9.47%) Loss: 2.251592 LR: 0.00001484 +[07:08:29] Epoch: 1 Batch: 1905/20099 (9.48%) Loss: 2.395315 LR: 0.00001484 +[07:08:32] Epoch: 1 Batch: 1906/20099 (9.48%) Loss: 2.164914 LR: 0.00001484 +[07:08:35] Epoch: 1 Batch: 1907/20099 (9.49%) Loss: 2.374604 LR: 0.00001484 +[07:08:38] Epoch: 1 Batch: 1908/20099 (9.49%) Loss: 2.218908 LR: 0.00001484 +[07:08:41] Epoch: 1 Batch: 1909/20099 (9.50%) Loss: 2.395016 LR: 0.00001484 +[07:08:45] Epoch: 1 Batch: 1910/20099 (9.50%) Loss: 2.199760 LR: 0.00001484 +[07:08:48] Epoch: 
1 Batch: 1911/20099 (9.51%) Loss: 2.411818 LR: 0.00001489 +[07:08:51] Epoch: 1 Batch: 1912/20099 (9.51%) Loss: 2.293213 LR: 0.00001489 +[07:08:54] Epoch: 1 Batch: 1913/20099 (9.52%) Loss: 2.413023 LR: 0.00001489 +[07:08:57] Epoch: 1 Batch: 1914/20099 (9.52%) Loss: 2.367072 LR: 0.00001489 +[07:09:00] Epoch: 1 Batch: 1915/20099 (9.53%) Loss: 2.226970 LR: 0.00001489 +[07:09:03] Epoch: 1 Batch: 1916/20099 (9.53%) Loss: 2.410823 LR: 0.00001489 +[07:09:06] Epoch: 1 Batch: 1917/20099 (9.54%) Loss: 2.256713 LR: 0.00001489 +[07:09:09] Epoch: 1 Batch: 1918/20099 (9.54%) Loss: 2.358310 LR: 0.00001495 +[07:09:12] Epoch: 1 Batch: 1919/20099 (9.55%) Loss: 1.987475 LR: 0.00001495 +[07:09:16] Epoch: 1 Batch: 1920/20099 (9.55%) Loss: 2.478616 LR: 0.00001495 +[07:09:19] Epoch: 1 Batch: 1921/20099 (9.56%) Loss: 2.320477 LR: 0.00001495 +[07:09:22] Epoch: 1 Batch: 1922/20099 (9.56%) Loss: 2.144688 LR: 0.00001495 +[07:09:25] Epoch: 1 Batch: 1923/20099 (9.57%) Loss: 2.321565 LR: 0.00001495 +[07:09:28] Epoch: 1 Batch: 1924/20099 (9.57%) Loss: 2.427891 LR: 0.00001495 +[07:09:31] Epoch: 1 Batch: 1925/20099 (9.58%) Loss: 2.240398 LR: 0.00001500 +[07:09:34] Epoch: 1 Batch: 1926/20099 (9.58%) Loss: 2.220269 LR: 0.00001500 +[07:09:37] Epoch: 1 Batch: 1927/20099 (9.59%) Loss: 2.657623 LR: 0.00001500 +[07:09:40] Epoch: 1 Batch: 1928/20099 (9.59%) Loss: 2.314806 LR: 0.00001500 +[07:09:43] Epoch: 1 Batch: 1929/20099 (9.60%) Loss: 2.336476 LR: 0.00001500 +[07:09:47] Epoch: 1 Batch: 1930/20099 (9.60%) Loss: 2.257707 LR: 0.00001500 +[07:09:50] Epoch: 1 Batch: 1931/20099 (9.61%) Loss: 2.289458 LR: 0.00001500 +[07:09:53] Epoch: 1 Batch: 1932/20099 (9.61%) Loss: 2.247865 LR: 0.00001505 +[07:09:56] Epoch: 1 Batch: 1933/20099 (9.62%) Loss: 2.364190 LR: 0.00001505 +[07:09:59] Epoch: 1 Batch: 1934/20099 (9.62%) Loss: 2.475781 LR: 0.00001505 +[07:10:02] Epoch: 1 Batch: 1935/20099 (9.63%) Loss: 2.419861 LR: 0.00001505 +[07:10:05] Epoch: 1 Batch: 1936/20099 (9.63%) Loss: 1.885737 LR: 0.00001505 +[07:10:08] 
Epoch: 1 Batch: 1937/20099 (9.64%) Loss: 2.308288 LR: 0.00001505 +[07:10:11] Epoch: 1 Batch: 1938/20099 (9.64%) Loss: 2.256571 LR: 0.00001505 +[07:10:14] Epoch: 1 Batch: 1939/20099 (9.65%) Loss: 2.459883 LR: 0.00001511 +[07:10:18] Epoch: 1 Batch: 1940/20099 (9.65%) Loss: 2.213403 LR: 0.00001511 +[07:10:21] Epoch: 1 Batch: 1941/20099 (9.66%) Loss: 2.323805 LR: 0.00001511 +[07:10:24] Epoch: 1 Batch: 1942/20099 (9.66%) Loss: 2.292919 LR: 0.00001511 +[07:10:27] Epoch: 1 Batch: 1943/20099 (9.67%) Loss: 2.143442 LR: 0.00001511 +[07:10:30] Epoch: 1 Batch: 1944/20099 (9.67%) Loss: 2.303315 LR: 0.00001511 +[07:10:33] Epoch: 1 Batch: 1945/20099 (9.68%) Loss: 2.344183 LR: 0.00001511 +[07:10:36] Epoch: 1 Batch: 1946/20099 (9.68%) Loss: 2.213863 LR: 0.00001516 +[07:10:39] Epoch: 1 Batch: 1947/20099 (9.69%) Loss: 2.570961 LR: 0.00001516 +[07:10:42] Epoch: 1 Batch: 1948/20099 (9.69%) Loss: 2.227100 LR: 0.00001516 +[07:10:46] Epoch: 1 Batch: 1949/20099 (9.70%) Loss: 2.507065 LR: 0.00001516 +[07:10:49] Epoch: 1 Batch: 1950/20099 (9.70%) Loss: 2.187437 LR: 0.00001516 +[07:10:52] Epoch: 1 Batch: 1951/20099 (9.71%) Loss: 2.349329 LR: 0.00001516 +[07:10:55] Epoch: 1 Batch: 1952/20099 (9.71%) Loss: 1.983500 LR: 0.00001516 +[07:10:58] Epoch: 1 Batch: 1953/20099 (9.72%) Loss: 2.451755 LR: 0.00001522 +[07:11:01] Epoch: 1 Batch: 1954/20099 (9.72%) Loss: 2.312938 LR: 0.00001522 +[07:11:04] Epoch: 1 Batch: 1955/20099 (9.73%) Loss: 2.413391 LR: 0.00001522 +[07:11:07] Epoch: 1 Batch: 1956/20099 (9.73%) Loss: 2.302065 LR: 0.00001522 +[07:11:10] Epoch: 1 Batch: 1957/20099 (9.74%) Loss: 2.063798 LR: 0.00001522 +[07:11:13] Epoch: 1 Batch: 1958/20099 (9.74%) Loss: 2.387277 LR: 0.00001522 +[07:11:17] Epoch: 1 Batch: 1959/20099 (9.75%) Loss: 2.412908 LR: 0.00001522 +[07:11:20] Epoch: 1 Batch: 1960/20099 (9.75%) Loss: 1.994870 LR: 0.00001527 +[07:11:23] Epoch: 1 Batch: 1961/20099 (9.76%) Loss: 2.167708 LR: 0.00001527 +[07:11:26] Epoch: 1 Batch: 1962/20099 (9.76%) Loss: 2.282966 LR: 0.00001527 
+[07:11:29] Epoch: 1 Batch: 1963/20099 (9.77%) Loss: 2.119739 LR: 0.00001527 +[07:11:32] Epoch: 1 Batch: 1964/20099 (9.77%) Loss: 2.065388 LR: 0.00001527 +[07:11:35] Epoch: 1 Batch: 1965/20099 (9.78%) Loss: 2.202756 LR: 0.00001527 +[07:11:38] Epoch: 1 Batch: 1966/20099 (9.78%) Loss: 2.430023 LR: 0.00001527 +[07:11:41] Epoch: 1 Batch: 1967/20099 (9.79%) Loss: 2.057037 LR: 0.00001533 +[07:11:44] Epoch: 1 Batch: 1968/20099 (9.79%) Loss: 2.221644 LR: 0.00001533 +[07:11:48] Epoch: 1 Batch: 1969/20099 (9.80%) Loss: 2.140297 LR: 0.00001533 +[07:11:51] Epoch: 1 Batch: 1970/20099 (9.80%) Loss: 2.416026 LR: 0.00001533 +[07:11:54] Epoch: 1 Batch: 1971/20099 (9.81%) Loss: 2.322754 LR: 0.00001533 +[07:11:57] Epoch: 1 Batch: 1972/20099 (9.81%) Loss: 2.536763 LR: 0.00001533 +[07:12:00] Epoch: 1 Batch: 1973/20099 (9.82%) Loss: 2.140145 LR: 0.00001533 +[07:12:03] Epoch: 1 Batch: 1974/20099 (9.82%) Loss: 2.217020 LR: 0.00001538 +[07:12:06] Epoch: 1 Batch: 1975/20099 (9.83%) Loss: 2.259129 LR: 0.00001538 +[07:12:09] Epoch: 1 Batch: 1976/20099 (9.83%) Loss: 2.245807 LR: 0.00001538 +[07:12:12] Epoch: 1 Batch: 1977/20099 (9.84%) Loss: 2.007086 LR: 0.00001538 +[07:12:15] Epoch: 1 Batch: 1978/20099 (9.84%) Loss: 2.148416 LR: 0.00001538 +[07:12:19] Epoch: 1 Batch: 1979/20099 (9.85%) Loss: 2.503799 LR: 0.00001538 +[07:12:22] Epoch: 1 Batch: 1980/20099 (9.85%) Loss: 2.147196 LR: 0.00001538 +[07:12:25] Epoch: 1 Batch: 1981/20099 (9.86%) Loss: 2.274340 LR: 0.00001544 +[07:12:28] Epoch: 1 Batch: 1982/20099 (9.86%) Loss: 2.227834 LR: 0.00001544 +[07:12:31] Epoch: 1 Batch: 1983/20099 (9.87%) Loss: 2.402157 LR: 0.00001544 +[07:12:34] Epoch: 1 Batch: 1984/20099 (9.87%) Loss: 2.370559 LR: 0.00001544 +[07:12:37] Epoch: 1 Batch: 1985/20099 (9.88%) Loss: 2.419552 LR: 0.00001544 +[07:12:40] Epoch: 1 Batch: 1986/20099 (9.88%) Loss: 2.235141 LR: 0.00001544 +[07:12:43] Epoch: 1 Batch: 1987/20099 (9.89%) Loss: 2.274122 LR: 0.00001544 +[07:12:46] Epoch: 1 Batch: 1988/20099 (9.89%) Loss: 2.277702 LR: 
0.00001549 +[07:12:49] Epoch: 1 Batch: 1989/20099 (9.90%) Loss: 2.263847 LR: 0.00001549 +[07:12:52] Epoch: 1 Batch: 1990/20099 (9.90%) Loss: 2.325329 LR: 0.00001549 +[07:12:56] Epoch: 1 Batch: 1991/20099 (9.91%) Loss: 2.405195 LR: 0.00001549 +[07:12:59] Epoch: 1 Batch: 1992/20099 (9.91%) Loss: 2.181181 LR: 0.00001549 +[07:13:02] Epoch: 1 Batch: 1993/20099 (9.92%) Loss: 2.070408 LR: 0.00001549 +[07:13:05] Epoch: 1 Batch: 1994/20099 (9.92%) Loss: 2.018552 LR: 0.00001549 +[07:13:08] Epoch: 1 Batch: 1995/20099 (9.93%) Loss: 2.307298 LR: 0.00001555 +[07:13:11] Epoch: 1 Batch: 1996/20099 (9.93%) Loss: 2.326211 LR: 0.00001555 +[07:13:14] Epoch: 1 Batch: 1997/20099 (9.94%) Loss: 2.252757 LR: 0.00001555 +[07:13:17] Epoch: 1 Batch: 1998/20099 (9.94%) Loss: 2.487104 LR: 0.00001555 +[07:13:20] Epoch: 1 Batch: 1999/20099 (9.95%) Loss: 2.298489 LR: 0.00001555 +[07:13:23] >> Evaluating batch 0 +[07:13:25] >> Evaluating batch 1 +[07:13:26] >> Evaluating batch 2 +[07:13:27] >> Evaluating batch 3 +[07:13:29] >> Evaluating batch 4 +[07:13:30] >> Evaluating batch 5 +[07:13:31] >> Evaluating batch 6 +[07:13:32] >> Evaluating batch 7 +[07:13:34] >> Evaluating batch 8 +[07:13:35] >> Evaluating batch 9 +[07:13:36] >> Evaluating batch 10 +[07:13:37] >> Evaluating batch 11 +[07:13:38] >> Evaluating batch 12 +[07:13:40] >> Evaluating batch 13 +[07:13:41] >> Evaluating batch 14 +[07:13:42] >> Evaluating batch 15 +[07:13:43] >> Evaluating batch 16 +[07:13:44] Epoch: 1 Step: 2000/20099 Evaluation: +[07:13:44] Avg Loss Since Last Eval: 2.3065 Val Loss: 2.3562 Validation loss delta: -0.0493 Perplexity: 10.5509 LR: 0.00001555 +[07:13:47] >> Temp checkpoint saved: epoch1_step2000, size: 0.1693 GB +[07:13:51] >> Checkpoint saved: epoch1_step2000, size: 0.1693 GB +[07:13:51] Epoch: 1 Batch: 2000/20099 (9.95%) Loss: 2.367077 LR: 0.00001555 +[07:13:54] Epoch: 1 Batch: 2001/20099 (9.96%) Loss: 2.289066 LR: 0.00001555 +[07:13:57] Epoch: 1 Batch: 2002/20099 (9.96%) Loss: 2.346720 LR: 0.00001560 
+[07:14:00] Epoch: 1 Batch: 2003/20099 (9.97%) Loss: 2.406989 LR: 0.00001560 +[07:14:03] Epoch: 1 Batch: 2004/20099 (9.97%) Loss: 2.170361 LR: 0.00001560 +[07:14:07] Epoch: 1 Batch: 2005/20099 (9.98%) Loss: 2.354332 LR: 0.00001560 +[07:14:10] Epoch: 1 Batch: 2006/20099 (9.98%) Loss: 2.293551 LR: 0.00001560 +[07:14:13] Epoch: 1 Batch: 2007/20099 (9.99%) Loss: 2.191210 LR: 0.00001560 +[07:14:16] Epoch: 1 Batch: 2008/20099 (9.99%) Loss: 2.334704 LR: 0.00001560 +[07:14:19] Epoch: 1 Batch: 2009/20099 (10.00%) Loss: 2.313212 LR: 0.00001565 +[07:14:22] Epoch: 1 Batch: 2010/20099 (10.00%) Loss: 2.346992 LR: 0.00001565 +[07:14:26] Epoch: 1 Batch: 2011/20099 (10.01%) Loss: 2.237567 LR: 0.00001565 +[07:14:29] Epoch: 1 Batch: 2012/20099 (10.01%) Loss: 2.464631 LR: 0.00001565 +[07:14:32] Epoch: 1 Batch: 2013/20099 (10.02%) Loss: 2.245076 LR: 0.00001565 +[07:14:35] Epoch: 1 Batch: 2014/20099 (10.02%) Loss: 2.339341 LR: 0.00001565 +[07:14:38] Epoch: 1 Batch: 2015/20099 (10.03%) Loss: 2.270372 LR: 0.00001565 +[07:14:41] Epoch: 1 Batch: 2016/20099 (10.03%) Loss: 2.128101 LR: 0.00001571 +[07:14:44] Epoch: 1 Batch: 2017/20099 (10.04%) Loss: 2.159590 LR: 0.00001571 +[07:14:47] Epoch: 1 Batch: 2018/20099 (10.04%) Loss: 2.211290 LR: 0.00001571 +[07:14:51] Epoch: 1 Batch: 2019/20099 (10.05%) Loss: 2.021578 LR: 0.00001571 +[07:14:54] Epoch: 1 Batch: 2020/20099 (10.05%) Loss: 2.454699 LR: 0.00001571 +[07:14:57] Epoch: 1 Batch: 2021/20099 (10.06%) Loss: 2.202665 LR: 0.00001571 +[07:15:00] Epoch: 1 Batch: 2022/20099 (10.06%) Loss: 2.377779 LR: 0.00001571 +[07:15:03] Epoch: 1 Batch: 2023/20099 (10.07%) Loss: 2.165132 LR: 0.00001576 +[07:15:06] Epoch: 1 Batch: 2024/20099 (10.07%) Loss: 1.950633 LR: 0.00001576 +[07:15:09] Epoch: 1 Batch: 2025/20099 (10.08%) Loss: 2.504355 LR: 0.00001576 +[07:15:12] Epoch: 1 Batch: 2026/20099 (10.08%) Loss: 2.504117 LR: 0.00001576 +[07:15:15] Epoch: 1 Batch: 2027/20099 (10.09%) Loss: 2.179287 LR: 0.00001576 +[07:15:18] Epoch: 1 Batch: 2028/20099 (10.09%) Loss: 
2.175775 LR: 0.00001576 +[07:15:21] Epoch: 1 Batch: 2029/20099 (10.10%) Loss: 2.215324 LR: 0.00001576 +[07:15:24] Epoch: 1 Batch: 2030/20099 (10.10%) Loss: 2.251256 LR: 0.00001582 +[07:15:27] Epoch: 1 Batch: 2031/20099 (10.10%) Loss: 2.517844 LR: 0.00001582 +[07:15:31] Epoch: 1 Batch: 2032/20099 (10.11%) Loss: 2.377371 LR: 0.00001582 +[07:15:34] Epoch: 1 Batch: 2033/20099 (10.11%) Loss: 2.566766 LR: 0.00001582 +[07:15:37] Epoch: 1 Batch: 2034/20099 (10.12%) Loss: 2.504982 LR: 0.00001582 +[07:15:40] Epoch: 1 Batch: 2035/20099 (10.12%) Loss: 2.155178 LR: 0.00001582 +[07:15:43] Epoch: 1 Batch: 2036/20099 (10.13%) Loss: 2.274602 LR: 0.00001582 +[07:15:46] Epoch: 1 Batch: 2037/20099 (10.13%) Loss: 2.522890 LR: 0.00001587 +[07:15:49] Epoch: 1 Batch: 2038/20099 (10.14%) Loss: 2.423954 LR: 0.00001587 +[07:15:52] Epoch: 1 Batch: 2039/20099 (10.14%) Loss: 2.036903 LR: 0.00001587 +[07:15:55] Epoch: 1 Batch: 2040/20099 (10.15%) Loss: 2.170370 LR: 0.00001587 +[07:15:59] Epoch: 1 Batch: 2041/20099 (10.15%) Loss: 2.848930 LR: 0.00001587 +[07:16:02] Epoch: 1 Batch: 2042/20099 (10.16%) Loss: 2.229634 LR: 0.00001587 +[07:16:05] Epoch: 1 Batch: 2043/20099 (10.16%) Loss: 2.516218 LR: 0.00001587 +[07:16:08] Epoch: 1 Batch: 2044/20099 (10.17%) Loss: 2.308128 LR: 0.00001593 +[07:16:11] Epoch: 1 Batch: 2045/20099 (10.17%) Loss: 2.207214 LR: 0.00001593 +[07:16:14] Epoch: 1 Batch: 2046/20099 (10.18%) Loss: 2.056960 LR: 0.00001593 +[07:16:17] Epoch: 1 Batch: 2047/20099 (10.18%) Loss: 2.162518 LR: 0.00001593 +[07:16:20] Epoch: 1 Batch: 2048/20099 (10.19%) Loss: 2.189251 LR: 0.00001593 +[07:16:23] Epoch: 1 Batch: 2049/20099 (10.19%) Loss: 2.296892 LR: 0.00001593 +[07:16:26] Epoch: 1 Batch: 2050/20099 (10.20%) Loss: 2.230157 LR: 0.00001593 +[07:16:30] Epoch: 1 Batch: 2051/20099 (10.20%) Loss: 2.306894 LR: 0.00001598 +[07:16:33] Epoch: 1 Batch: 2052/20099 (10.21%) Loss: 2.121476 LR: 0.00001598 +[07:16:36] Epoch: 1 Batch: 2053/20099 (10.21%) Loss: 2.277861 LR: 0.00001598 +[07:16:39] Epoch: 1 
Batch: 2054/20099 (10.22%) Loss: 2.022494 LR: 0.00001598 +[07:16:42] Epoch: 1 Batch: 2055/20099 (10.22%) Loss: 2.184910 LR: 0.00001598 +[07:16:45] Epoch: 1 Batch: 2056/20099 (10.23%) Loss: 2.358737 LR: 0.00001598 +[07:16:48] Epoch: 1 Batch: 2057/20099 (10.23%) Loss: 2.182059 LR: 0.00001598 +[07:16:51] Epoch: 1 Batch: 2058/20099 (10.24%) Loss: 2.207781 LR: 0.00001604 +[07:16:54] Epoch: 1 Batch: 2059/20099 (10.24%) Loss: 2.553614 LR: 0.00001604 +[07:16:57] Epoch: 1 Batch: 2060/20099 (10.25%) Loss: 2.409334 LR: 0.00001604 +[07:17:00] Epoch: 1 Batch: 2061/20099 (10.25%) Loss: 2.074232 LR: 0.00001604 +[07:17:03] Epoch: 1 Batch: 2062/20099 (10.26%) Loss: 2.401018 LR: 0.00001604 +[07:17:07] Epoch: 1 Batch: 2063/20099 (10.26%) Loss: 1.950542 LR: 0.00001604 +[07:17:10] Epoch: 1 Batch: 2064/20099 (10.27%) Loss: 2.088585 LR: 0.00001604 +[07:17:13] Epoch: 1 Batch: 2065/20099 (10.27%) Loss: 2.173294 LR: 0.00001609 +[07:17:16] Epoch: 1 Batch: 2066/20099 (10.28%) Loss: 2.343125 LR: 0.00001609 +[07:17:19] Epoch: 1 Batch: 2067/20099 (10.28%) Loss: 2.096139 LR: 0.00001609 +[07:17:22] Epoch: 1 Batch: 2068/20099 (10.29%) Loss: 2.302466 LR: 0.00001609 +[07:17:25] Epoch: 1 Batch: 2069/20099 (10.29%) Loss: 2.327072 LR: 0.00001609 +[07:17:28] Epoch: 1 Batch: 2070/20099 (10.30%) Loss: 2.444797 LR: 0.00001609 +[07:17:31] Epoch: 1 Batch: 2071/20099 (10.30%) Loss: 2.269427 LR: 0.00001609 +[07:17:34] Epoch: 1 Batch: 2072/20099 (10.31%) Loss: 2.038048 LR: 0.00001615 +[07:17:37] Epoch: 1 Batch: 2073/20099 (10.31%) Loss: 2.369176 LR: 0.00001615 +[07:17:41] Epoch: 1 Batch: 2074/20099 (10.32%) Loss: 2.255075 LR: 0.00001615 +[07:17:44] Epoch: 1 Batch: 2075/20099 (10.32%) Loss: 2.614571 LR: 0.00001615 +[07:17:47] Epoch: 1 Batch: 2076/20099 (10.33%) Loss: 2.272285 LR: 0.00001615 +[07:17:50] Epoch: 1 Batch: 2077/20099 (10.33%) Loss: 2.226606 LR: 0.00001615 +[07:17:53] Epoch: 1 Batch: 2078/20099 (10.34%) Loss: 2.376482 LR: 0.00001615 +[07:17:56] Epoch: 1 Batch: 2079/20099 (10.34%) Loss: 1.969734 LR: 
0.00001620 +[07:17:59] Epoch: 1 Batch: 2080/20099 (10.35%) Loss: 1.986150 LR: 0.00001620 +[07:18:02] Epoch: 1 Batch: 2081/20099 (10.35%) Loss: 2.335512 LR: 0.00001620 +[07:18:05] Epoch: 1 Batch: 2082/20099 (10.36%) Loss: 1.875843 LR: 0.00001620 +[07:18:09] Epoch: 1 Batch: 2083/20099 (10.36%) Loss: 2.496465 LR: 0.00001620 +[07:18:12] Epoch: 1 Batch: 2084/20099 (10.37%) Loss: 2.302438 LR: 0.00001620 +[07:18:15] Epoch: 1 Batch: 2085/20099 (10.37%) Loss: 2.189569 LR: 0.00001620 +[07:18:18] Epoch: 1 Batch: 2086/20099 (10.38%) Loss: 2.421917 LR: 0.00001625 +[07:18:21] Epoch: 1 Batch: 2087/20099 (10.38%) Loss: 1.799346 LR: 0.00001625 +[07:18:24] Epoch: 1 Batch: 2088/20099 (10.39%) Loss: 2.194513 LR: 0.00001625 +[07:18:27] Epoch: 1 Batch: 2089/20099 (10.39%) Loss: 2.130016 LR: 0.00001625 +[07:18:30] Epoch: 1 Batch: 2090/20099 (10.40%) Loss: 2.320693 LR: 0.00001625 +[07:18:33] Epoch: 1 Batch: 2091/20099 (10.40%) Loss: 2.367608 LR: 0.00001625 +[07:18:36] Epoch: 1 Batch: 2092/20099 (10.41%) Loss: 2.367485 LR: 0.00001625 +[07:18:39] Epoch: 1 Batch: 2093/20099 (10.41%) Loss: 2.298373 LR: 0.00001631 +[07:18:43] Epoch: 1 Batch: 2094/20099 (10.42%) Loss: 2.453069 LR: 0.00001631 +[07:18:46] Epoch: 1 Batch: 2095/20099 (10.42%) Loss: 2.164415 LR: 0.00001631 +[07:18:49] Epoch: 1 Batch: 2096/20099 (10.43%) Loss: 2.093382 LR: 0.00001631 +[07:18:52] Epoch: 1 Batch: 2097/20099 (10.43%) Loss: 2.318124 LR: 0.00001631 +[07:18:55] Epoch: 1 Batch: 2098/20099 (10.44%) Loss: 2.397333 LR: 0.00001631 +[07:18:58] Epoch: 1 Batch: 2099/20099 (10.44%) Loss: 2.171014 LR: 0.00001631 +[07:19:01] Epoch: 1 Batch: 2100/20099 (10.45%) Loss: 2.496403 LR: 0.00001636 +[07:19:04] Epoch: 1 Batch: 2101/20099 (10.45%) Loss: 2.354991 LR: 0.00001636 +[07:19:07] Epoch: 1 Batch: 2102/20099 (10.46%) Loss: 2.253316 LR: 0.00001636 +[07:19:10] Epoch: 1 Batch: 2103/20099 (10.46%) Loss: 2.191739 LR: 0.00001636 +[07:19:14] Epoch: 1 Batch: 2104/20099 (10.47%) Loss: 2.212520 LR: 0.00001636 +[07:19:17] Epoch: 1 Batch: 2105/20099 
(10.47%) Loss: 2.186094 LR: 0.00001636 +[07:19:20] Epoch: 1 Batch: 2106/20099 (10.48%) Loss: 2.211991 LR: 0.00001636 +[07:19:23] Epoch: 1 Batch: 2107/20099 (10.48%) Loss: 2.064953 LR: 0.00001642 +[07:19:26] Epoch: 1 Batch: 2108/20099 (10.49%) Loss: 2.128211 LR: 0.00001642 +[07:19:29] Epoch: 1 Batch: 2109/20099 (10.49%) Loss: 2.538609 LR: 0.00001642 +[07:19:32] Epoch: 1 Batch: 2110/20099 (10.50%) Loss: 2.340076 LR: 0.00001642 +[07:19:35] Epoch: 1 Batch: 2111/20099 (10.50%) Loss: 2.291218 LR: 0.00001642 +[07:19:38] Epoch: 1 Batch: 2112/20099 (10.51%) Loss: 2.457956 LR: 0.00001642 +[07:19:41] Epoch: 1 Batch: 2113/20099 (10.51%) Loss: 2.098350 LR: 0.00001642 +[07:19:45] Epoch: 1 Batch: 2114/20099 (10.52%) Loss: 1.829670 LR: 0.00001647 +[07:19:48] Epoch: 1 Batch: 2115/20099 (10.52%) Loss: 2.291657 LR: 0.00001647 +[07:19:51] Epoch: 1 Batch: 2116/20099 (10.53%) Loss: 2.336934 LR: 0.00001647 +[07:19:54] Epoch: 1 Batch: 2117/20099 (10.53%) Loss: 2.084483 LR: 0.00001647 +[07:19:57] Epoch: 1 Batch: 2118/20099 (10.54%) Loss: 2.006647 LR: 0.00001647 +[07:20:00] Epoch: 1 Batch: 2119/20099 (10.54%) Loss: 2.328546 LR: 0.00001647 +[07:20:03] Epoch: 1 Batch: 2120/20099 (10.55%) Loss: 2.558205 LR: 0.00001647 +[07:20:06] Epoch: 1 Batch: 2121/20099 (10.55%) Loss: 2.269551 LR: 0.00001653 +[07:20:09] Epoch: 1 Batch: 2122/20099 (10.56%) Loss: 2.243447 LR: 0.00001653 +[07:20:12] Epoch: 1 Batch: 2123/20099 (10.56%) Loss: 2.311573 LR: 0.00001653 +[07:20:16] Epoch: 1 Batch: 2124/20099 (10.57%) Loss: 2.446999 LR: 0.00001653 +[07:20:19] Epoch: 1 Batch: 2125/20099 (10.57%) Loss: 2.761786 LR: 0.00001653 +[07:20:22] Epoch: 1 Batch: 2126/20099 (10.58%) Loss: 2.040054 LR: 0.00001653 +[07:20:25] Epoch: 1 Batch: 2127/20099 (10.58%) Loss: 2.214284 LR: 0.00001653 +[07:20:28] Epoch: 1 Batch: 2128/20099 (10.59%) Loss: 2.196260 LR: 0.00001658 +[07:20:31] Epoch: 1 Batch: 2129/20099 (10.59%) Loss: 2.150673 LR: 0.00001658 +[07:20:34] Epoch: 1 Batch: 2130/20099 (10.60%) Loss: 2.281082 LR: 0.00001658 
+[07:20:37] Epoch: 1 Batch: 2131/20099 (10.60%) Loss: 2.436777 LR: 0.00001658 +[07:20:40] Epoch: 1 Batch: 2132/20099 (10.61%) Loss: 2.156753 LR: 0.00001658 +[07:20:44] Epoch: 1 Batch: 2133/20099 (10.61%) Loss: 2.560733 LR: 0.00001658 +[07:20:47] Epoch: 1 Batch: 2134/20099 (10.62%) Loss: 2.051600 LR: 0.00001658 +[07:20:50] Epoch: 1 Batch: 2135/20099 (10.62%) Loss: 2.256183 LR: 0.00001664 +[07:20:53] Epoch: 1 Batch: 2136/20099 (10.63%) Loss: 2.390585 LR: 0.00001664 +[07:20:56] Epoch: 1 Batch: 2137/20099 (10.63%) Loss: 2.358550 LR: 0.00001664 +[07:20:59] Epoch: 1 Batch: 2138/20099 (10.64%) Loss: 2.356432 LR: 0.00001664 +[07:21:02] Epoch: 1 Batch: 2139/20099 (10.64%) Loss: 2.176764 LR: 0.00001664 +[07:21:05] Epoch: 1 Batch: 2140/20099 (10.65%) Loss: 2.446532 LR: 0.00001664 +[07:21:08] Epoch: 1 Batch: 2141/20099 (10.65%) Loss: 2.544462 LR: 0.00001664 +[07:21:11] Epoch: 1 Batch: 2142/20099 (10.66%) Loss: 2.407151 LR: 0.00001669 +[07:21:14] Epoch: 1 Batch: 2143/20099 (10.66%) Loss: 2.003327 LR: 0.00001669 +[07:21:18] Epoch: 1 Batch: 2144/20099 (10.67%) Loss: 2.359697 LR: 0.00001669 +[07:21:21] Epoch: 1 Batch: 2145/20099 (10.67%) Loss: 2.304569 LR: 0.00001669 +[07:21:24] Epoch: 1 Batch: 2146/20099 (10.68%) Loss: 2.254885 LR: 0.00001669 +[07:21:27] Epoch: 1 Batch: 2147/20099 (10.68%) Loss: 2.095551 LR: 0.00001669 +[07:21:30] Epoch: 1 Batch: 2148/20099 (10.69%) Loss: 2.650084 LR: 0.00001669 +[07:21:33] Epoch: 1 Batch: 2149/20099 (10.69%) Loss: 2.251336 LR: 0.00001675 +[07:21:36] Epoch: 1 Batch: 2150/20099 (10.70%) Loss: 2.250636 LR: 0.00001675 +[07:21:39] Epoch: 1 Batch: 2151/20099 (10.70%) Loss: 2.266400 LR: 0.00001675 +[07:21:42] Epoch: 1 Batch: 2152/20099 (10.71%) Loss: 2.093622 LR: 0.00001675 +[07:21:45] Epoch: 1 Batch: 2153/20099 (10.71%) Loss: 2.440011 LR: 0.00001675 +[07:21:49] Epoch: 1 Batch: 2154/20099 (10.72%) Loss: 2.376338 LR: 0.00001675 +[07:21:52] Epoch: 1 Batch: 2155/20099 (10.72%) Loss: 2.452884 LR: 0.00001675 +[07:21:55] Epoch: 1 Batch: 2156/20099 (10.73%) 
Loss: 2.323530 LR: 0.00001680 +[07:21:58] Epoch: 1 Batch: 2157/20099 (10.73%) Loss: 2.107119 LR: 0.00001680 +[07:22:01] Epoch: 1 Batch: 2158/20099 (10.74%) Loss: 2.306903 LR: 0.00001680 +[07:22:04] Epoch: 1 Batch: 2159/20099 (10.74%) Loss: 2.185366 LR: 0.00001680 +[07:22:07] Epoch: 1 Batch: 2160/20099 (10.75%) Loss: 2.334271 LR: 0.00001680 +[07:22:10] Epoch: 1 Batch: 2161/20099 (10.75%) Loss: 2.367483 LR: 0.00001680 +[07:22:13] Epoch: 1 Batch: 2162/20099 (10.76%) Loss: 2.345570 LR: 0.00001680 +[07:22:16] Epoch: 1 Batch: 2163/20099 (10.76%) Loss: 2.131888 LR: 0.00001685 +[07:22:20] Epoch: 1 Batch: 2164/20099 (10.77%) Loss: 2.099236 LR: 0.00001685 +[07:22:23] Epoch: 1 Batch: 2165/20099 (10.77%) Loss: 2.371895 LR: 0.00001685 +[07:22:26] Epoch: 1 Batch: 2166/20099 (10.78%) Loss: 2.291843 LR: 0.00001685 +[07:22:29] Epoch: 1 Batch: 2167/20099 (10.78%) Loss: 2.518139 LR: 0.00001685 +[07:22:32] Epoch: 1 Batch: 2168/20099 (10.79%) Loss: 2.352367 LR: 0.00001685 +[07:22:35] Epoch: 1 Batch: 2169/20099 (10.79%) Loss: 2.293696 LR: 0.00001685 +[07:22:38] Epoch: 1 Batch: 2170/20099 (10.80%) Loss: 1.994458 LR: 0.00001691 +[07:22:41] Epoch: 1 Batch: 2171/20099 (10.80%) Loss: 2.634981 LR: 0.00001691 +[07:22:44] Epoch: 1 Batch: 2172/20099 (10.81%) Loss: 2.382989 LR: 0.00001691 +[07:22:47] Epoch: 1 Batch: 2173/20099 (10.81%) Loss: 2.678243 LR: 0.00001691 +[07:22:50] Epoch: 1 Batch: 2174/20099 (10.82%) Loss: 2.622238 LR: 0.00001691 +[07:22:54] Epoch: 1 Batch: 2175/20099 (10.82%) Loss: 1.999560 LR: 0.00001691 +[07:22:57] Epoch: 1 Batch: 2176/20099 (10.83%) Loss: 2.402190 LR: 0.00001691 +[07:23:00] Epoch: 1 Batch: 2177/20099 (10.83%) Loss: 2.448068 LR: 0.00001696 +[07:23:03] Epoch: 1 Batch: 2178/20099 (10.84%) Loss: 2.263721 LR: 0.00001696 +[07:23:06] Epoch: 1 Batch: 2179/20099 (10.84%) Loss: 2.273844 LR: 0.00001696 +[07:23:09] Epoch: 1 Batch: 2180/20099 (10.85%) Loss: 2.442904 LR: 0.00001696 +[07:23:12] Epoch: 1 Batch: 2181/20099 (10.85%) Loss: 2.109273 LR: 0.00001696 +[07:23:15] Epoch: 
1 Batch: 2182/20099 (10.86%) Loss: 2.325187 LR: 0.00001696 +[07:23:18] Epoch: 1 Batch: 2183/20099 (10.86%) Loss: 2.457445 LR: 0.00001696 +[07:23:21] Epoch: 1 Batch: 2184/20099 (10.87%) Loss: 2.405094 LR: 0.00001702 +[07:23:25] Epoch: 1 Batch: 2185/20099 (10.87%) Loss: 2.134115 LR: 0.00001702 +[07:23:28] Epoch: 1 Batch: 2186/20099 (10.88%) Loss: 2.234102 LR: 0.00001702 +[07:23:31] Epoch: 1 Batch: 2187/20099 (10.88%) Loss: 2.289825 LR: 0.00001702 +[07:23:34] Epoch: 1 Batch: 2188/20099 (10.89%) Loss: 2.237949 LR: 0.00001702 +[07:23:37] Epoch: 1 Batch: 2189/20099 (10.89%) Loss: 2.244594 LR: 0.00001702 +[07:23:40] Epoch: 1 Batch: 2190/20099 (10.90%) Loss: 2.029689 LR: 0.00001702 +[07:23:43] Epoch: 1 Batch: 2191/20099 (10.90%) Loss: 2.468111 LR: 0.00001707 +[07:23:46] Epoch: 1 Batch: 2192/20099 (10.91%) Loss: 2.265834 LR: 0.00001707 +[07:23:49] Epoch: 1 Batch: 2193/20099 (10.91%) Loss: 2.122946 LR: 0.00001707 +[07:23:52] Epoch: 1 Batch: 2194/20099 (10.92%) Loss: 2.312151 LR: 0.00001707 +[07:23:55] Epoch: 1 Batch: 2195/20099 (10.92%) Loss: 2.127033 LR: 0.00001707 +[07:23:59] Epoch: 1 Batch: 2196/20099 (10.93%) Loss: 2.280644 LR: 0.00001707 +[07:24:02] Epoch: 1 Batch: 2197/20099 (10.93%) Loss: 2.274489 LR: 0.00001707 +[07:24:05] Epoch: 1 Batch: 2198/20099 (10.94%) Loss: 2.363740 LR: 0.00001713 +[07:24:08] Epoch: 1 Batch: 2199/20099 (10.94%) Loss: 2.431030 LR: 0.00001713 +[07:24:15] >> Cleaned up old temp checkpoint: epoch1_step200 +[07:24:15] >> Temp checkpoint saved: epoch1_step2200, size: 0.1693 GB +[07:24:15] Epoch: 1 Batch: 2200/20099 (10.95%) Loss: 2.350767 LR: 0.00001713 +[07:24:18] Epoch: 1 Batch: 2201/20099 (10.95%) Loss: 2.173109 LR: 0.00001713 +[07:24:21] Epoch: 1 Batch: 2202/20099 (10.96%) Loss: 2.202811 LR: 0.00001713 +[07:24:24] Epoch: 1 Batch: 2203/20099 (10.96%) Loss: 2.522493 LR: 0.00001713 +[07:24:27] Epoch: 1 Batch: 2204/20099 (10.97%) Loss: 2.108222 LR: 0.00001713 +[07:24:30] Epoch: 1 Batch: 2205/20099 (10.97%) Loss: 2.107733 LR: 0.00001718 +[07:24:33] 
Epoch: 1 Batch: 2206/20099 (10.98%) Loss: 2.077242 LR: 0.00001718 +[07:24:36] Epoch: 1 Batch: 2207/20099 (10.98%) Loss: 2.172688 LR: 0.00001718 +[07:24:39] Epoch: 1 Batch: 2208/20099 (10.99%) Loss: 2.121405 LR: 0.00001718 +[07:24:43] Epoch: 1 Batch: 2209/20099 (10.99%) Loss: 2.000204 LR: 0.00001718 +[07:24:46] Epoch: 1 Batch: 2210/20099 (11.00%) Loss: 2.235373 LR: 0.00001718 +[07:24:49] Epoch: 1 Batch: 2211/20099 (11.00%) Loss: 2.288319 LR: 0.00001718 +[07:24:52] Epoch: 1 Batch: 2212/20099 (11.01%) Loss: 2.127893 LR: 0.00001724 +[07:24:55] Epoch: 1 Batch: 2213/20099 (11.01%) Loss: 2.637189 LR: 0.00001724 +[07:24:58] Epoch: 1 Batch: 2214/20099 (11.02%) Loss: 2.517625 LR: 0.00001724 +[07:25:01] Epoch: 1 Batch: 2215/20099 (11.02%) Loss: 2.274584 LR: 0.00001724 +[07:25:05] Epoch: 1 Batch: 2216/20099 (11.03%) Loss: 2.178013 LR: 0.00001724 +[07:25:08] Epoch: 1 Batch: 2217/20099 (11.03%) Loss: 2.557345 LR: 0.00001724 +[07:25:11] Epoch: 1 Batch: 2218/20099 (11.04%) Loss: 2.386651 LR: 0.00001724 +[07:25:14] Epoch: 1 Batch: 2219/20099 (11.04%) Loss: 2.297599 LR: 0.00001729 +[07:25:17] Epoch: 1 Batch: 2220/20099 (11.05%) Loss: 2.398722 LR: 0.00001729 +[07:25:20] Epoch: 1 Batch: 2221/20099 (11.05%) Loss: 2.039381 LR: 0.00001729 +[07:25:23] Epoch: 1 Batch: 2222/20099 (11.06%) Loss: 2.445382 LR: 0.00001729 +[07:25:26] Epoch: 1 Batch: 2223/20099 (11.06%) Loss: 2.404015 LR: 0.00001729 +[07:25:29] Epoch: 1 Batch: 2224/20099 (11.07%) Loss: 2.334263 LR: 0.00001729 +[07:25:32] Epoch: 1 Batch: 2225/20099 (11.07%) Loss: 2.011540 LR: 0.00001729 +[07:25:35] Epoch: 1 Batch: 2226/20099 (11.08%) Loss: 2.267080 LR: 0.00001735 +[07:25:38] Epoch: 1 Batch: 2227/20099 (11.08%) Loss: 2.219900 LR: 0.00001735 +[07:25:42] Epoch: 1 Batch: 2228/20099 (11.09%) Loss: 1.820150 LR: 0.00001735 +[07:25:45] Epoch: 1 Batch: 2229/20099 (11.09%) Loss: 2.405012 LR: 0.00001735 +[07:25:48] Epoch: 1 Batch: 2230/20099 (11.10%) Loss: 2.415494 LR: 0.00001735 +[07:25:51] Epoch: 1 Batch: 2231/20099 (11.10%) Loss: 
2.179748 LR: 0.00001735 +[07:25:54] Epoch: 1 Batch: 2232/20099 (11.11%) Loss: 2.336345 LR: 0.00001735 +[07:25:58] Epoch: 1 Batch: 2233/20099 (11.11%) Loss: 2.455586 LR: 0.00001740 +[07:26:01] Epoch: 1 Batch: 2234/20099 (11.11%) Loss: 2.206902 LR: 0.00001740 +[07:26:04] Epoch: 1 Batch: 2235/20099 (11.12%) Loss: 2.518841 LR: 0.00001740 +[07:26:07] Epoch: 1 Batch: 2236/20099 (11.12%) Loss: 2.418238 LR: 0.00001740 +[07:26:10] Epoch: 1 Batch: 2237/20099 (11.13%) Loss: 2.120307 LR: 0.00001740 +[07:26:13] Epoch: 1 Batch: 2238/20099 (11.13%) Loss: 2.317286 LR: 0.00001740 +[07:26:16] Epoch: 1 Batch: 2239/20099 (11.14%) Loss: 2.217853 LR: 0.00001740 +[07:26:19] Epoch: 1 Batch: 2240/20099 (11.14%) Loss: 2.056600 LR: 0.00001745 +[07:26:22] Epoch: 1 Batch: 2241/20099 (11.15%) Loss: 2.349730 LR: 0.00001745 +[07:26:25] Epoch: 1 Batch: 2242/20099 (11.15%) Loss: 2.209155 LR: 0.00001745 +[07:26:29] Epoch: 1 Batch: 2243/20099 (11.16%) Loss: 2.266071 LR: 0.00001745 +[07:26:32] Epoch: 1 Batch: 2244/20099 (11.16%) Loss: 2.289361 LR: 0.00001745 +[07:26:35] Epoch: 1 Batch: 2245/20099 (11.17%) Loss: 2.041457 LR: 0.00001745 +[07:26:38] Epoch: 1 Batch: 2246/20099 (11.17%) Loss: 2.333389 LR: 0.00001745 +[07:26:41] Epoch: 1 Batch: 2247/20099 (11.18%) Loss: 2.399965 LR: 0.00001751 +[07:26:44] Epoch: 1 Batch: 2248/20099 (11.18%) Loss: 2.248127 LR: 0.00001751 +[07:26:47] Epoch: 1 Batch: 2249/20099 (11.19%) Loss: 2.158881 LR: 0.00001751 +[07:26:50] Epoch: 1 Batch: 2250/20099 (11.19%) Loss: 2.291950 LR: 0.00001751 +[07:26:53] Epoch: 1 Batch: 2251/20099 (11.20%) Loss: 2.079212 LR: 0.00001751 +[07:26:56] Epoch: 1 Batch: 2252/20099 (11.20%) Loss: 2.137505 LR: 0.00001751 +[07:26:59] Epoch: 1 Batch: 2253/20099 (11.21%) Loss: 2.358327 LR: 0.00001751 +[07:27:03] Epoch: 1 Batch: 2254/20099 (11.21%) Loss: 2.137950 LR: 0.00001756 +[07:27:06] Epoch: 1 Batch: 2255/20099 (11.22%) Loss: 2.569938 LR: 0.00001756 +[07:27:09] Epoch: 1 Batch: 2256/20099 (11.22%) Loss: 2.263758 LR: 0.00001756 +[07:27:12] Epoch: 1 
Batch: 2257/20099 (11.23%) Loss: 2.289107 LR: 0.00001756 +[07:27:15] Epoch: 1 Batch: 2258/20099 (11.23%) Loss: 2.245047 LR: 0.00001756 +[07:27:18] Epoch: 1 Batch: 2259/20099 (11.24%) Loss: 2.197377 LR: 0.00001756 +[07:27:21] Epoch: 1 Batch: 2260/20099 (11.24%) Loss: 1.934034 LR: 0.00001756 +[07:27:24] Epoch: 1 Batch: 2261/20099 (11.25%) Loss: 2.095039 LR: 0.00001762 +[07:27:27] Epoch: 1 Batch: 2262/20099 (11.25%) Loss: 2.252041 LR: 0.00001762 +[07:27:30] Epoch: 1 Batch: 2263/20099 (11.26%) Loss: 2.201217 LR: 0.00001762 +[07:27:34] Epoch: 1 Batch: 2264/20099 (11.26%) Loss: 2.315679 LR: 0.00001762 +[07:27:37] Epoch: 1 Batch: 2265/20099 (11.27%) Loss: 2.194971 LR: 0.00001762 +[07:27:40] Epoch: 1 Batch: 2266/20099 (11.27%) Loss: 2.411655 LR: 0.00001762 +[07:27:43] Epoch: 1 Batch: 2267/20099 (11.28%) Loss: 2.503161 LR: 0.00001762 +[07:27:46] Epoch: 1 Batch: 2268/20099 (11.28%) Loss: 2.113563 LR: 0.00001767 +[07:27:49] Epoch: 1 Batch: 2269/20099 (11.29%) Loss: 1.854295 LR: 0.00001767 +[07:27:52] Epoch: 1 Batch: 2270/20099 (11.29%) Loss: 2.340843 LR: 0.00001767 +[07:27:55] Epoch: 1 Batch: 2271/20099 (11.30%) Loss: 2.493485 LR: 0.00001767 +[07:27:58] Epoch: 1 Batch: 2272/20099 (11.30%) Loss: 2.432115 LR: 0.00001767 +[07:28:01] Epoch: 1 Batch: 2273/20099 (11.31%) Loss: 2.258824 LR: 0.00001767 +[07:28:04] Epoch: 1 Batch: 2274/20099 (11.31%) Loss: 2.418018 LR: 0.00001767 +[07:28:08] Epoch: 1 Batch: 2275/20099 (11.32%) Loss: 2.187926 LR: 0.00001773 +[07:28:11] Epoch: 1 Batch: 2276/20099 (11.32%) Loss: 1.895923 LR: 0.00001773 +[07:28:14] Epoch: 1 Batch: 2277/20099 (11.33%) Loss: 2.061088 LR: 0.00001773 +[07:28:17] Epoch: 1 Batch: 2278/20099 (11.33%) Loss: 2.244972 LR: 0.00001773 +[07:28:20] Epoch: 1 Batch: 2279/20099 (11.34%) Loss: 2.213605 LR: 0.00001773 +[07:28:23] Epoch: 1 Batch: 2280/20099 (11.34%) Loss: 2.434781 LR: 0.00001773 +[07:28:26] Epoch: 1 Batch: 2281/20099 (11.35%) Loss: 2.034761 LR: 0.00001773 +[07:28:29] Epoch: 1 Batch: 2282/20099 (11.35%) Loss: 2.375212 LR: 
0.00001778 +[07:28:32] Epoch: 1 Batch: 2283/20099 (11.36%) Loss: 2.267987 LR: 0.00001778 +[07:28:35] Epoch: 1 Batch: 2284/20099 (11.36%) Loss: 2.209011 LR: 0.00001778 +[07:28:38] Epoch: 1 Batch: 2285/20099 (11.37%) Loss: 2.319532 LR: 0.00001778 +[07:28:41] Epoch: 1 Batch: 2286/20099 (11.37%) Loss: 1.637609 LR: 0.00001778 +[07:28:45] Epoch: 1 Batch: 2287/20099 (11.38%) Loss: 2.268727 LR: 0.00001778 +[07:28:48] Epoch: 1 Batch: 2288/20099 (11.38%) Loss: 2.260520 LR: 0.00001778 +[07:28:51] Epoch: 1 Batch: 2289/20099 (11.39%) Loss: 2.483506 LR: 0.00001784 +[07:28:54] Epoch: 1 Batch: 2290/20099 (11.39%) Loss: 2.231959 LR: 0.00001784 +[07:28:57] Epoch: 1 Batch: 2291/20099 (11.40%) Loss: 2.560749 LR: 0.00001784 +[07:29:00] Epoch: 1 Batch: 2292/20099 (11.40%) Loss: 2.193611 LR: 0.00001784 +[07:29:03] Epoch: 1 Batch: 2293/20099 (11.41%) Loss: 2.286368 LR: 0.00001784 +[07:29:06] Epoch: 1 Batch: 2294/20099 (11.41%) Loss: 2.169125 LR: 0.00001784 +[07:29:09] Epoch: 1 Batch: 2295/20099 (11.42%) Loss: 2.442679 LR: 0.00001784 +[07:29:12] Epoch: 1 Batch: 2296/20099 (11.42%) Loss: 2.262251 LR: 0.00001789 +[07:29:15] Epoch: 1 Batch: 2297/20099 (11.43%) Loss: 2.266653 LR: 0.00001789 +[07:29:19] Epoch: 1 Batch: 2298/20099 (11.43%) Loss: 2.445362 LR: 0.00001789 +[07:29:22] Epoch: 1 Batch: 2299/20099 (11.44%) Loss: 2.358265 LR: 0.00001789 +[07:29:25] Epoch: 1 Batch: 2300/20099 (11.44%) Loss: 2.353177 LR: 0.00001789 +[07:29:28] Epoch: 1 Batch: 2301/20099 (11.45%) Loss: 2.074112 LR: 0.00001789 +[07:29:31] Epoch: 1 Batch: 2302/20099 (11.45%) Loss: 2.028365 LR: 0.00001789 +[07:29:34] Epoch: 1 Batch: 2303/20099 (11.46%) Loss: 2.506512 LR: 0.00001795 +[07:29:37] Epoch: 1 Batch: 2304/20099 (11.46%) Loss: 2.077575 LR: 0.00001795 +[07:29:40] Epoch: 1 Batch: 2305/20099 (11.47%) Loss: 2.112476 LR: 0.00001795 +[07:29:43] Epoch: 1 Batch: 2306/20099 (11.47%) Loss: 2.293051 LR: 0.00001795 +[07:29:46] Epoch: 1 Batch: 2307/20099 (11.48%) Loss: 2.301848 LR: 0.00001795 +[07:29:49] Epoch: 1 Batch: 2308/20099 
(11.48%) Loss: 2.101388 LR: 0.00001795 +[07:29:53] Epoch: 1 Batch: 2309/20099 (11.49%) Loss: 2.236845 LR: 0.00001795 +[07:29:56] Epoch: 1 Batch: 2310/20099 (11.49%) Loss: 2.530491 LR: 0.00001800 +[07:29:59] Epoch: 1 Batch: 2311/20099 (11.50%) Loss: 2.431580 LR: 0.00001800 +[07:30:02] Epoch: 1 Batch: 2312/20099 (11.50%) Loss: 2.377620 LR: 0.00001800 +[07:30:05] Epoch: 1 Batch: 2313/20099 (11.51%) Loss: 2.392697 LR: 0.00001800 +[07:30:08] Epoch: 1 Batch: 2314/20099 (11.51%) Loss: 2.497182 LR: 0.00001800 +[07:30:11] Epoch: 1 Batch: 2315/20099 (11.52%) Loss: 2.419215 LR: 0.00001800 +[07:30:14] Epoch: 1 Batch: 2316/20099 (11.52%) Loss: 2.185939 LR: 0.00001800 +[07:30:17] Epoch: 1 Batch: 2317/20099 (11.53%) Loss: 2.259358 LR: 0.00001805 +[07:30:20] Epoch: 1 Batch: 2318/20099 (11.53%) Loss: 2.219774 LR: 0.00001805 +[07:30:24] Epoch: 1 Batch: 2319/20099 (11.54%) Loss: 2.392250 LR: 0.00001805 +[07:30:27] Epoch: 1 Batch: 2320/20099 (11.54%) Loss: 2.345712 LR: 0.00001805 +[07:30:30] Epoch: 1 Batch: 2321/20099 (11.55%) Loss: 2.163954 LR: 0.00001805 +[07:30:33] Epoch: 1 Batch: 2322/20099 (11.55%) Loss: 2.197498 LR: 0.00001805 +[07:30:36] Epoch: 1 Batch: 2323/20099 (11.56%) Loss: 2.287405 LR: 0.00001805 +[07:30:39] Epoch: 1 Batch: 2324/20099 (11.56%) Loss: 2.259254 LR: 0.00001811 +[07:30:42] Epoch: 1 Batch: 2325/20099 (11.57%) Loss: 2.182793 LR: 0.00001811 +[07:30:45] Epoch: 1 Batch: 2326/20099 (11.57%) Loss: 2.276844 LR: 0.00001811 +[07:30:48] Epoch: 1 Batch: 2327/20099 (11.58%) Loss: 1.962260 LR: 0.00001811 +[07:30:51] Epoch: 1 Batch: 2328/20099 (11.58%) Loss: 2.307644 LR: 0.00001811 +[07:30:54] Epoch: 1 Batch: 2329/20099 (11.59%) Loss: 2.300001 LR: 0.00001811 +[07:30:58] Epoch: 1 Batch: 2330/20099 (11.59%) Loss: 2.341674 LR: 0.00001811 +[07:31:01] Epoch: 1 Batch: 2331/20099 (11.60%) Loss: 2.235319 LR: 0.00001816 +[07:31:04] Epoch: 1 Batch: 2332/20099 (11.60%) Loss: 2.292845 LR: 0.00001816 +[07:31:07] Epoch: 1 Batch: 2333/20099 (11.61%) Loss: 2.275460 LR: 0.00001816 
+[07:31:10] Epoch: 1 Batch: 2334/20099 (11.61%) Loss: 2.094487 LR: 0.00001816 +[07:31:13] Epoch: 1 Batch: 2335/20099 (11.62%) Loss: 2.381038 LR: 0.00001816 +[07:31:16] Epoch: 1 Batch: 2336/20099 (11.62%) Loss: 2.430778 LR: 0.00001816 +[07:31:19] Epoch: 1 Batch: 2337/20099 (11.63%) Loss: 2.397788 LR: 0.00001816 +[07:31:22] Epoch: 1 Batch: 2338/20099 (11.63%) Loss: 2.337412 LR: 0.00001822 +[07:31:25] Epoch: 1 Batch: 2339/20099 (11.64%) Loss: 2.185457 LR: 0.00001822 +[07:31:29] Epoch: 1 Batch: 2340/20099 (11.64%) Loss: 2.143036 LR: 0.00001822 +[07:31:32] Epoch: 1 Batch: 2341/20099 (11.65%) Loss: 1.926982 LR: 0.00001822 +[07:31:35] Epoch: 1 Batch: 2342/20099 (11.65%) Loss: 2.271755 LR: 0.00001822 +[07:31:38] Epoch: 1 Batch: 2343/20099 (11.66%) Loss: 2.345785 LR: 0.00001822 +[07:31:41] Epoch: 1 Batch: 2344/20099 (11.66%) Loss: 2.166161 LR: 0.00001822 +[07:31:44] Epoch: 1 Batch: 2345/20099 (11.67%) Loss: 2.046388 LR: 0.00001827 +[07:31:47] Epoch: 1 Batch: 2346/20099 (11.67%) Loss: 2.263599 LR: 0.00001827 +[07:31:50] Epoch: 1 Batch: 2347/20099 (11.68%) Loss: 2.067366 LR: 0.00001827 +[07:31:53] Epoch: 1 Batch: 2348/20099 (11.68%) Loss: 2.274108 LR: 0.00001827 +[07:31:56] Epoch: 1 Batch: 2349/20099 (11.69%) Loss: 2.387689 LR: 0.00001827 +[07:32:00] Epoch: 1 Batch: 2350/20099 (11.69%) Loss: 2.247333 LR: 0.00001827 +[07:32:03] Epoch: 1 Batch: 2351/20099 (11.70%) Loss: 2.300290 LR: 0.00001827 +[07:32:06] Epoch: 1 Batch: 2352/20099 (11.70%) Loss: 2.393127 LR: 0.00001833 +[07:32:09] Epoch: 1 Batch: 2353/20099 (11.71%) Loss: 2.277577 LR: 0.00001833 +[07:32:12] Epoch: 1 Batch: 2354/20099 (11.71%) Loss: 2.340128 LR: 0.00001833 +[07:32:15] Epoch: 1 Batch: 2355/20099 (11.72%) Loss: 2.167192 LR: 0.00001833 +[07:32:18] Epoch: 1 Batch: 2356/20099 (11.72%) Loss: 2.153863 LR: 0.00001833 +[07:32:21] Epoch: 1 Batch: 2357/20099 (11.73%) Loss: 2.420443 LR: 0.00001833 +[07:32:24] Epoch: 1 Batch: 2358/20099 (11.73%) Loss: 2.433759 LR: 0.00001833 +[07:32:27] Epoch: 1 Batch: 2359/20099 (11.74%) 
Loss: 2.406386 LR: 0.00001838 +[07:32:31] Epoch: 1 Batch: 2360/20099 (11.74%) Loss: 2.473785 LR: 0.00001838 +[07:32:34] Epoch: 1 Batch: 2361/20099 (11.75%) Loss: 2.288433 LR: 0.00001838 +[07:32:37] Epoch: 1 Batch: 2362/20099 (11.75%) Loss: 2.087006 LR: 0.00001838 +[07:32:40] Epoch: 1 Batch: 2363/20099 (11.76%) Loss: 2.089127 LR: 0.00001838 +[07:32:43] Epoch: 1 Batch: 2364/20099 (11.76%) Loss: 2.444918 LR: 0.00001838 +[07:32:46] Epoch: 1 Batch: 2365/20099 (11.77%) Loss: 2.234603 LR: 0.00001838 +[07:32:49] Epoch: 1 Batch: 2366/20099 (11.77%) Loss: 2.328384 LR: 0.00001844 +[07:32:52] Epoch: 1 Batch: 2367/20099 (11.78%) Loss: 2.102800 LR: 0.00001844 +[07:32:55] Epoch: 1 Batch: 2368/20099 (11.78%) Loss: 2.038360 LR: 0.00001844 +[07:32:59] Epoch: 1 Batch: 2369/20099 (11.79%) Loss: 2.287847 LR: 0.00001844 +[07:33:02] Epoch: 1 Batch: 2370/20099 (11.79%) Loss: 2.127258 LR: 0.00001844 +[07:33:05] Epoch: 1 Batch: 2371/20099 (11.80%) Loss: 2.355172 LR: 0.00001844 +[07:33:08] Epoch: 1 Batch: 2372/20099 (11.80%) Loss: 2.116716 LR: 0.00001844 +[07:33:11] Epoch: 1 Batch: 2373/20099 (11.81%) Loss: 2.226033 LR: 0.00001849 +[07:33:14] Epoch: 1 Batch: 2374/20099 (11.81%) Loss: 2.184504 LR: 0.00001849 +[07:33:17] Epoch: 1 Batch: 2375/20099 (11.82%) Loss: 2.260288 LR: 0.00001849 +[07:33:20] Epoch: 1 Batch: 2376/20099 (11.82%) Loss: 2.255987 LR: 0.00001849 +[07:33:23] Epoch: 1 Batch: 2377/20099 (11.83%) Loss: 2.411364 LR: 0.00001849 +[07:33:27] Epoch: 1 Batch: 2378/20099 (11.83%) Loss: 2.199879 LR: 0.00001849 +[07:33:30] Epoch: 1 Batch: 2379/20099 (11.84%) Loss: 2.195881 LR: 0.00001849 +[07:33:33] Epoch: 1 Batch: 2380/20099 (11.84%) Loss: 1.816497 LR: 0.00001855 +[07:33:36] Epoch: 1 Batch: 2381/20099 (11.85%) Loss: 2.161429 LR: 0.00001855 +[07:33:39] Epoch: 1 Batch: 2382/20099 (11.85%) Loss: 2.385235 LR: 0.00001855 +[07:33:42] Epoch: 1 Batch: 2383/20099 (11.86%) Loss: 2.092165 LR: 0.00001855 +[07:33:45] Epoch: 1 Batch: 2384/20099 (11.86%) Loss: 2.221914 LR: 0.00001855 +[07:33:48] Epoch: 
1 Batch: 2385/20099 (11.87%) Loss: 2.307705 LR: 0.00001855 +[07:33:51] Epoch: 1 Batch: 2386/20099 (11.87%) Loss: 2.318867 LR: 0.00001855 +[07:33:54] Epoch: 1 Batch: 2387/20099 (11.88%) Loss: 2.426428 LR: 0.00001860 +[07:33:58] Epoch: 1 Batch: 2388/20099 (11.88%) Loss: 2.140994 LR: 0.00001860 +[07:34:01] Epoch: 1 Batch: 2389/20099 (11.89%) Loss: 2.290293 LR: 0.00001860 +[07:34:04] Epoch: 1 Batch: 2390/20099 (11.89%) Loss: 2.007528 LR: 0.00001860 +[07:34:07] Epoch: 1 Batch: 2391/20099 (11.90%) Loss: 2.167539 LR: 0.00001860 +[07:34:10] Epoch: 1 Batch: 2392/20099 (11.90%) Loss: 2.326761 LR: 0.00001860 +[07:34:13] Epoch: 1 Batch: 2393/20099 (11.91%) Loss: 1.986004 LR: 0.00001860 +[07:34:16] Epoch: 1 Batch: 2394/20099 (11.91%) Loss: 1.992854 LR: 0.00001865 +[07:34:19] Epoch: 1 Batch: 2395/20099 (11.92%) Loss: 2.356692 LR: 0.00001865 +[07:34:22] Epoch: 1 Batch: 2396/20099 (11.92%) Loss: 2.322082 LR: 0.00001865 +[07:34:25] Epoch: 1 Batch: 2397/20099 (11.93%) Loss: 2.363139 LR: 0.00001865 +[07:34:29] Epoch: 1 Batch: 2398/20099 (11.93%) Loss: 2.349685 LR: 0.00001865 +[07:34:32] Epoch: 1 Batch: 2399/20099 (11.94%) Loss: 2.484805 LR: 0.00001865 +[07:34:38] >> Cleaned up old temp checkpoint: epoch1_step400 +[07:34:38] >> Temp checkpoint saved: epoch1_step2400, size: 0.1693 GB +[07:34:38] Epoch: 1 Batch: 2400/20099 (11.94%) Loss: 2.348801 LR: 0.00001865 +[07:34:42] Epoch: 1 Batch: 2401/20099 (11.95%) Loss: 2.220187 LR: 0.00001871 +[07:34:45] Epoch: 1 Batch: 2402/20099 (11.95%) Loss: 2.024014 LR: 0.00001871 +[07:34:48] Epoch: 1 Batch: 2403/20099 (11.96%) Loss: 2.001157 LR: 0.00001871 +[07:34:51] Epoch: 1 Batch: 2404/20099 (11.96%) Loss: 2.390901 LR: 0.00001871 +[07:34:54] Epoch: 1 Batch: 2405/20099 (11.97%) Loss: 2.469045 LR: 0.00001871 +[07:34:57] Epoch: 1 Batch: 2406/20099 (11.97%) Loss: 2.221735 LR: 0.00001871 +[07:35:00] Epoch: 1 Batch: 2407/20099 (11.98%) Loss: 2.217654 LR: 0.00001871 +[07:35:03] Epoch: 1 Batch: 2408/20099 (11.98%) Loss: 2.452870 LR: 0.00001876 +[07:35:06] 
Epoch: 1 Batch: 2409/20099 (11.99%) Loss: 2.304440 LR: 0.00001876 +[07:35:10] Epoch: 1 Batch: 2410/20099 (11.99%) Loss: 2.290774 LR: 0.00001876 +[07:35:13] Epoch: 1 Batch: 2411/20099 (12.00%) Loss: 2.192205 LR: 0.00001876 +[07:35:16] Epoch: 1 Batch: 2412/20099 (12.00%) Loss: 2.633580 LR: 0.00001876 +[07:35:19] Epoch: 1 Batch: 2413/20099 (12.01%) Loss: 2.602895 LR: 0.00001876 +[07:35:22] Epoch: 1 Batch: 2414/20099 (12.01%) Loss: 2.314735 LR: 0.00001876 +[07:35:25] Epoch: 1 Batch: 2415/20099 (12.02%) Loss: 2.255551 LR: 0.00001882 +[07:35:28] Epoch: 1 Batch: 2416/20099 (12.02%) Loss: 2.021905 LR: 0.00001882 +[07:35:31] Epoch: 1 Batch: 2417/20099 (12.03%) Loss: 2.523207 LR: 0.00001882 +[07:35:35] Epoch: 1 Batch: 2418/20099 (12.03%) Loss: 2.228493 LR: 0.00001882 +[07:35:38] Epoch: 1 Batch: 2419/20099 (12.04%) Loss: 2.212708 LR: 0.00001882 +[07:35:41] Epoch: 1 Batch: 2420/20099 (12.04%) Loss: 2.421712 LR: 0.00001882 +[07:35:44] Epoch: 1 Batch: 2421/20099 (12.05%) Loss: 2.050515 LR: 0.00001882 +[07:35:47] Epoch: 1 Batch: 2422/20099 (12.05%) Loss: 1.923726 LR: 0.00001887 +[07:35:50] Epoch: 1 Batch: 2423/20099 (12.06%) Loss: 1.975281 LR: 0.00001887 +[07:35:53] Epoch: 1 Batch: 2424/20099 (12.06%) Loss: 2.134868 LR: 0.00001887 +[07:35:56] Epoch: 1 Batch: 2425/20099 (12.07%) Loss: 2.547057 LR: 0.00001887 +[07:35:59] Epoch: 1 Batch: 2426/20099 (12.07%) Loss: 2.008806 LR: 0.00001887 +[07:36:02] Epoch: 1 Batch: 2427/20099 (12.08%) Loss: 2.321616 LR: 0.00001887 +[07:36:05] Epoch: 1 Batch: 2428/20099 (12.08%) Loss: 2.218908 LR: 0.00001887 +[07:36:09] Epoch: 1 Batch: 2429/20099 (12.09%) Loss: 2.134288 LR: 0.00001893 +[07:36:12] Epoch: 1 Batch: 2430/20099 (12.09%) Loss: 2.059369 LR: 0.00001893 +[07:36:15] Epoch: 1 Batch: 2431/20099 (12.10%) Loss: 2.198492 LR: 0.00001893 +[07:36:18] Epoch: 1 Batch: 2432/20099 (12.10%) Loss: 2.346263 LR: 0.00001893 +[07:36:21] Epoch: 1 Batch: 2433/20099 (12.11%) Loss: 2.100940 LR: 0.00001893 +[07:36:24] Epoch: 1 Batch: 2434/20099 (12.11%) Loss: 
1.863545 LR: 0.00001893 +[07:36:27] Epoch: 1 Batch: 2435/20099 (12.12%) Loss: 2.209511 LR: 0.00001893 +[07:36:30] Epoch: 1 Batch: 2436/20099 (12.12%) Loss: 2.450082 LR: 0.00001898 +[07:36:33] Epoch: 1 Batch: 2437/20099 (12.12%) Loss: 2.427570 LR: 0.00001898 +[07:36:36] Epoch: 1 Batch: 2438/20099 (12.13%) Loss: 2.369405 LR: 0.00001898 +[07:36:39] Epoch: 1 Batch: 2439/20099 (12.13%) Loss: 2.299678 LR: 0.00001898 +[07:36:43] Epoch: 1 Batch: 2440/20099 (12.14%) Loss: 2.358124 LR: 0.00001898 +[07:36:46] Epoch: 1 Batch: 2441/20099 (12.14%) Loss: 2.086754 LR: 0.00001898 +[07:36:49] Epoch: 1 Batch: 2442/20099 (12.15%) Loss: 2.013250 LR: 0.00001898 +[07:36:52] Epoch: 1 Batch: 2443/20099 (12.15%) Loss: 2.108417 LR: 0.00001904 +[07:36:55] Epoch: 1 Batch: 2444/20099 (12.16%) Loss: 2.580287 LR: 0.00001904 +[07:36:58] Epoch: 1 Batch: 2445/20099 (12.16%) Loss: 2.278261 LR: 0.00001904 +[07:37:01] Epoch: 1 Batch: 2446/20099 (12.17%) Loss: 2.355254 LR: 0.00001904 +[07:37:04] Epoch: 1 Batch: 2447/20099 (12.17%) Loss: 2.183835 LR: 0.00001904 +[07:37:07] Epoch: 1 Batch: 2448/20099 (12.18%) Loss: 2.211163 LR: 0.00001904 +[07:37:11] Epoch: 1 Batch: 2449/20099 (12.18%) Loss: 2.198703 LR: 0.00001904 +[07:37:14] Epoch: 1 Batch: 2450/20099 (12.19%) Loss: 2.058736 LR: 0.00001909 +[07:37:17] Epoch: 1 Batch: 2451/20099 (12.19%) Loss: 2.130414 LR: 0.00001909 +[07:37:20] Epoch: 1 Batch: 2452/20099 (12.20%) Loss: 2.057219 LR: 0.00001909 +[07:37:23] Epoch: 1 Batch: 2453/20099 (12.20%) Loss: 2.139417 LR: 0.00001909 +[07:37:26] Epoch: 1 Batch: 2454/20099 (12.21%) Loss: 1.942473 LR: 0.00001909 +[07:37:29] Epoch: 1 Batch: 2455/20099 (12.21%) Loss: 2.254039 LR: 0.00001909 +[07:37:32] Epoch: 1 Batch: 2456/20099 (12.22%) Loss: 2.186723 LR: 0.00001909 +[07:37:35] Epoch: 1 Batch: 2457/20099 (12.22%) Loss: 2.691996 LR: 0.00001915 +[07:37:38] Epoch: 1 Batch: 2458/20099 (12.23%) Loss: 2.700078 LR: 0.00001915 +[07:37:42] Epoch: 1 Batch: 2459/20099 (12.23%) Loss: 2.402087 LR: 0.00001915 +[07:37:45] Epoch: 1 
Batch: 2460/20099 (12.24%) Loss: 2.241405 LR: 0.00001915 +[07:37:48] Epoch: 1 Batch: 2461/20099 (12.24%) Loss: 2.268862 LR: 0.00001915 +[07:37:51] Epoch: 1 Batch: 2462/20099 (12.25%) Loss: 2.337632 LR: 0.00001915 +[07:37:54] Epoch: 1 Batch: 2463/20099 (12.25%) Loss: 2.109412 LR: 0.00001915 +[07:37:57] Epoch: 1 Batch: 2464/20099 (12.26%) Loss: 2.486362 LR: 0.00001920 +[07:38:00] Epoch: 1 Batch: 2465/20099 (12.26%) Loss: 2.306170 LR: 0.00001920 +[07:38:03] Epoch: 1 Batch: 2466/20099 (12.27%) Loss: 2.304505 LR: 0.00001920 +[07:38:06] Epoch: 1 Batch: 2467/20099 (12.27%) Loss: 2.122912 LR: 0.00001920 +[07:38:09] Epoch: 1 Batch: 2468/20099 (12.28%) Loss: 2.344028 LR: 0.00001920 +[07:38:12] Epoch: 1 Batch: 2469/20099 (12.28%) Loss: 2.247477 LR: 0.00001920 +[07:38:16] Epoch: 1 Batch: 2470/20099 (12.29%) Loss: 2.147029 LR: 0.00001920 +[07:38:19] Epoch: 1 Batch: 2471/20099 (12.29%) Loss: 2.391531 LR: 0.00001925 +[07:38:22] Epoch: 1 Batch: 2472/20099 (12.30%) Loss: 2.184744 LR: 0.00001925 +[07:38:25] Epoch: 1 Batch: 2473/20099 (12.30%) Loss: 2.191040 LR: 0.00001925 +[07:38:28] Epoch: 1 Batch: 2474/20099 (12.31%) Loss: 2.332690 LR: 0.00001925 +[07:38:31] Epoch: 1 Batch: 2475/20099 (12.31%) Loss: 2.463895 LR: 0.00001925 +[07:38:34] Epoch: 1 Batch: 2476/20099 (12.32%) Loss: 1.856904 LR: 0.00001925 +[07:38:37] Epoch: 1 Batch: 2477/20099 (12.32%) Loss: 2.316909 LR: 0.00001925 +[07:38:40] Epoch: 1 Batch: 2478/20099 (12.33%) Loss: 2.293367 LR: 0.00001931 +[07:38:43] Epoch: 1 Batch: 2479/20099 (12.33%) Loss: 2.164303 LR: 0.00001931 +[07:38:46] Epoch: 1 Batch: 2480/20099 (12.34%) Loss: 2.276574 LR: 0.00001931 +[07:38:50] Epoch: 1 Batch: 2481/20099 (12.34%) Loss: 2.073965 LR: 0.00001931 +[07:38:53] Epoch: 1 Batch: 2482/20099 (12.35%) Loss: 2.147082 LR: 0.00001931 +[07:38:56] Epoch: 1 Batch: 2483/20099 (12.35%) Loss: 2.377464 LR: 0.00001931 +[07:38:59] Epoch: 1 Batch: 2484/20099 (12.36%) Loss: 1.895467 LR: 0.00001931 +[07:39:02] Epoch: 1 Batch: 2485/20099 (12.36%) Loss: 2.226432 LR: 
0.00001936 +[07:39:05] Epoch: 1 Batch: 2486/20099 (12.37%) Loss: 1.859377 LR: 0.00001936 +[07:39:08] Epoch: 1 Batch: 2487/20099 (12.37%) Loss: 2.297233 LR: 0.00001936 +[07:39:11] Epoch: 1 Batch: 2488/20099 (12.38%) Loss: 2.406906 LR: 0.00001936 +[07:39:14] Epoch: 1 Batch: 2489/20099 (12.38%) Loss: 1.951037 LR: 0.00001936 +[07:39:17] Epoch: 1 Batch: 2490/20099 (12.39%) Loss: 1.933472 LR: 0.00001936 +[07:39:20] Epoch: 1 Batch: 2491/20099 (12.39%) Loss: 2.001258 LR: 0.00001936 +[07:39:24] Epoch: 1 Batch: 2492/20099 (12.40%) Loss: 2.424253 LR: 0.00001942 +[07:39:27] Epoch: 1 Batch: 2493/20099 (12.40%) Loss: 2.236502 LR: 0.00001942 +[07:39:30] Epoch: 1 Batch: 2494/20099 (12.41%) Loss: 2.374649 LR: 0.00001942 +[07:39:33] Epoch: 1 Batch: 2495/20099 (12.41%) Loss: 2.228652 LR: 0.00001942 +[07:39:36] Epoch: 1 Batch: 2496/20099 (12.42%) Loss: 2.513165 LR: 0.00001942 +[07:39:39] Epoch: 1 Batch: 2497/20099 (12.42%) Loss: 2.257178 LR: 0.00001942 +[07:39:42] Epoch: 1 Batch: 2498/20099 (12.43%) Loss: 2.273577 LR: 0.00001942 +[07:39:45] Epoch: 1 Batch: 2499/20099 (12.43%) Loss: 1.954978 LR: 0.00001947 +[07:39:48] >> Evaluating batch 0 +[07:39:50] >> Evaluating batch 1 +[07:39:51] >> Evaluating batch 2 +[07:39:52] >> Evaluating batch 3 +[07:39:53] >> Evaluating batch 4 +[07:39:55] >> Evaluating batch 5 +[07:39:56] >> Evaluating batch 6 +[07:39:57] >> Evaluating batch 7 +[07:39:59] >> Evaluating batch 8 +[07:40:00] >> Evaluating batch 9 +[07:40:01] >> Evaluating batch 10 +[07:40:02] >> Evaluating batch 11 +[07:40:03] >> Evaluating batch 12 +[07:40:04] >> Evaluating batch 13 +[07:40:06] >> Evaluating batch 14 +[07:40:07] >> Evaluating batch 15 +[07:40:08] >> Evaluating batch 16 +[07:40:09] Epoch: 1 Step: 2500/20099 Evaluation: +[07:40:09] [1mAvg Loss Since Last Eval: 2.2613 Val Loss: 2.3137 Validation loss delta: -0.0425 Perplexity: 10.1120 LR: 0.00001947 +[07:40:12] >> Checkpoint saved: epoch1_step2500, size: 0.1693 GB +[07:40:12] Epoch: 1 Batch: 2500/20099 (12.44%) Loss: 1.880281 
LR: 0.00001947 +[07:40:15] Epoch: 1 Batch: 2501/20099 (12.44%) Loss: 2.622879 LR: 0.00001947 +[07:40:18] Epoch: 1 Batch: 2502/20099 (12.45%) Loss: 2.188593 LR: 0.00001947 +[07:40:21] Epoch: 1 Batch: 2503/20099 (12.45%) Loss: 2.152611 LR: 0.00001947 +[07:40:25] Epoch: 1 Batch: 2504/20099 (12.46%) Loss: 2.273496 LR: 0.00001947 +[07:40:28] Epoch: 1 Batch: 2505/20099 (12.46%) Loss: 2.543478 LR: 0.00001947 +[07:40:31] Epoch: 1 Batch: 2506/20099 (12.47%) Loss: 2.349094 LR: 0.00001953 +[07:40:34] Epoch: 1 Batch: 2507/20099 (12.47%) Loss: 1.993045 LR: 0.00001953 +[07:40:37] Epoch: 1 Batch: 2508/20099 (12.48%) Loss: 2.237552 LR: 0.00001953 +[07:40:40] Epoch: 1 Batch: 2509/20099 (12.48%) Loss: 2.067101 LR: 0.00001953 +[07:40:43] Epoch: 1 Batch: 2510/20099 (12.49%) Loss: 2.039281 LR: 0.00001953 +[07:40:46] Epoch: 1 Batch: 2511/20099 (12.49%) Loss: 2.329012 LR: 0.00001953 +[07:40:49] Epoch: 1 Batch: 2512/20099 (12.50%) Loss: 2.287578 LR: 0.00001953 +[07:40:53] Epoch: 1 Batch: 2513/20099 (12.50%) Loss: 2.235556 LR: 0.00001958 +[07:40:56] Epoch: 1 Batch: 2514/20099 (12.51%) Loss: 2.187831 LR: 0.00001958 +[07:40:59] Epoch: 1 Batch: 2515/20099 (12.51%) Loss: 2.352833 LR: 0.00001958 +[07:41:02] Epoch: 1 Batch: 2516/20099 (12.52%) Loss: 2.101795 LR: 0.00001958 +[07:41:05] Epoch: 1 Batch: 2517/20099 (12.52%) Loss: 2.415463 LR: 0.00001958 +[07:41:08] Epoch: 1 Batch: 2518/20099 (12.53%) Loss: 2.008004 LR: 0.00001958 +[07:41:11] Epoch: 1 Batch: 2519/20099 (12.53%) Loss: 2.271945 LR: 0.00001958 +[07:41:14] Epoch: 1 Batch: 2520/20099 (12.54%) Loss: 2.556609 LR: 0.00001964 +[07:41:17] Epoch: 1 Batch: 2521/20099 (12.54%) Loss: 2.310710 LR: 0.00001964 +[07:41:20] Epoch: 1 Batch: 2522/20099 (12.55%) Loss: 2.175994 LR: 0.00001964 +[07:41:24] Epoch: 1 Batch: 2523/20099 (12.55%) Loss: 2.208343 LR: 0.00001964 +[07:41:27] Epoch: 1 Batch: 2524/20099 (12.56%) Loss: 1.982280 LR: 0.00001964 +[07:41:30] Epoch: 1 Batch: 2525/20099 (12.56%) Loss: 2.159120 LR: 0.00001964 +[07:41:33] Epoch: 1 Batch: 
2526/20099 (12.57%) Loss: 2.208865 LR: 0.00001964 +[07:41:36] Epoch: 1 Batch: 2527/20099 (12.57%) Loss: 2.455609 LR: 0.00001969 +[07:41:39] Epoch: 1 Batch: 2528/20099 (12.58%) Loss: 2.260881 LR: 0.00001969 +[07:41:42] Epoch: 1 Batch: 2529/20099 (12.58%) Loss: 2.494386 LR: 0.00001969 +[07:41:45] Epoch: 1 Batch: 2530/20099 (12.59%) Loss: 2.384629 LR: 0.00001969 +[07:41:48] Epoch: 1 Batch: 2531/20099 (12.59%) Loss: 2.269327 LR: 0.00001969 +[07:41:51] Epoch: 1 Batch: 2532/20099 (12.60%) Loss: 2.471129 LR: 0.00001969 +[07:41:54] Epoch: 1 Batch: 2533/20099 (12.60%) Loss: 2.383344 LR: 0.00001969 +[07:41:58] Epoch: 1 Batch: 2534/20099 (12.61%) Loss: 2.177846 LR: 0.00001975 +[07:42:01] Epoch: 1 Batch: 2535/20099 (12.61%) Loss: 2.138136 LR: 0.00001975 +[07:42:04] Epoch: 1 Batch: 2536/20099 (12.62%) Loss: 2.294704 LR: 0.00001975 +[07:42:07] Epoch: 1 Batch: 2537/20099 (12.62%) Loss: 2.065362 LR: 0.00001975 +[07:42:10] Epoch: 1 Batch: 2538/20099 (12.63%) Loss: 1.994961 LR: 0.00001975 +[07:42:13] Epoch: 1 Batch: 2539/20099 (12.63%) Loss: 2.059072 LR: 0.00001975 +[07:42:16] Epoch: 1 Batch: 2540/20099 (12.64%) Loss: 2.085220 LR: 0.00001975 +[07:42:19] Epoch: 1 Batch: 2541/20099 (12.64%) Loss: 2.270836 LR: 0.00001980 +[07:42:22] Epoch: 1 Batch: 2542/20099 (12.65%) Loss: 2.057643 LR: 0.00001980 +[07:42:26] Epoch: 1 Batch: 2543/20099 (12.65%) Loss: 2.272408 LR: 0.00001980 +[07:42:29] Epoch: 1 Batch: 2544/20099 (12.66%) Loss: 2.227843 LR: 0.00001980 +[07:42:32] Epoch: 1 Batch: 2545/20099 (12.66%) Loss: 2.214902 LR: 0.00001980 +[07:42:35] Epoch: 1 Batch: 2546/20099 (12.67%) Loss: 2.252584 LR: 0.00001980 +[07:42:38] Epoch: 1 Batch: 2547/20099 (12.67%) Loss: 2.219244 LR: 0.00001980 +[07:42:41] Epoch: 1 Batch: 2548/20099 (12.68%) Loss: 2.209745 LR: 0.00001985 +[07:42:44] Epoch: 1 Batch: 2549/20099 (12.68%) Loss: 1.773001 LR: 0.00001985 +[07:42:47] Epoch: 1 Batch: 2550/20099 (12.69%) Loss: 2.134299 LR: 0.00001985 +[07:42:50] Epoch: 1 Batch: 2551/20099 (12.69%) Loss: 2.155014 LR: 0.00001985 
+[07:42:53] Epoch: 1 Batch: 2552/20099 (12.70%) Loss: 2.349919 LR: 0.00001985 +[07:42:57] Epoch: 1 Batch: 2553/20099 (12.70%) Loss: 2.046849 LR: 0.00001985 +[07:43:00] Epoch: 1 Batch: 2554/20099 (12.71%) Loss: 2.180592 LR: 0.00001985 +[07:43:03] Epoch: 1 Batch: 2555/20099 (12.71%) Loss: 2.131764 LR: 0.00001991 +[07:43:06] Epoch: 1 Batch: 2556/20099 (12.72%) Loss: 2.106577 LR: 0.00001991 +[07:43:09] Epoch: 1 Batch: 2557/20099 (12.72%) Loss: 2.235081 LR: 0.00001991 +[07:43:12] Epoch: 1 Batch: 2558/20099 (12.73%) Loss: 2.340556 LR: 0.00001991 +[07:43:15] Epoch: 1 Batch: 2559/20099 (12.73%) Loss: 2.236594 LR: 0.00001991 +[07:43:18] Epoch: 1 Batch: 2560/20099 (12.74%) Loss: 2.321872 LR: 0.00001991 +[07:43:21] Epoch: 1 Batch: 2561/20099 (12.74%) Loss: 2.266024 LR: 0.00001991 +[07:43:25] Epoch: 1 Batch: 2562/20099 (12.75%) Loss: 2.563259 LR: 0.00001996 +[07:43:28] Epoch: 1 Batch: 2563/20099 (12.75%) Loss: 1.950840 LR: 0.00001996 +[07:43:31] Epoch: 1 Batch: 2564/20099 (12.76%) Loss: 2.162748 LR: 0.00001996 +[07:43:34] Epoch: 1 Batch: 2565/20099 (12.76%) Loss: 2.317983 LR: 0.00001996 +[07:43:37] Epoch: 1 Batch: 2566/20099 (12.77%) Loss: 2.003156 LR: 0.00001996 +[07:43:40] Epoch: 1 Batch: 2567/20099 (12.77%) Loss: 2.436775 LR: 0.00001996 +[07:43:43] Epoch: 1 Batch: 2568/20099 (12.78%) Loss: 2.107519 LR: 0.00001996 +[07:43:46] Epoch: 1 Batch: 2569/20099 (12.78%) Loss: 2.134382 LR: 0.00002002 +[07:43:49] Epoch: 1 Batch: 2570/20099 (12.79%) Loss: 2.186508 LR: 0.00002002 +[07:43:52] Epoch: 1 Batch: 2571/20099 (12.79%) Loss: 2.253652 LR: 0.00002002 +[07:43:55] Epoch: 1 Batch: 2572/20099 (12.80%) Loss: 2.809451 LR: 0.00002002 +[07:43:59] Epoch: 1 Batch: 2573/20099 (12.80%) Loss: 2.073937 LR: 0.00002002 +[07:44:02] Epoch: 1 Batch: 2574/20099 (12.81%) Loss: 2.352082 LR: 0.00002002 +[07:44:05] Epoch: 1 Batch: 2575/20099 (12.81%) Loss: 2.506079 LR: 0.00002002 +[07:44:08] Epoch: 1 Batch: 2576/20099 (12.82%) Loss: 1.922283 LR: 0.00002007 +[07:44:11] Epoch: 1 Batch: 2577/20099 (12.82%) 
Loss: 2.225896 LR: 0.00002007 +[07:44:14] Epoch: 1 Batch: 2578/20099 (12.83%) Loss: 2.489269 LR: 0.00002007 +[07:44:17] Epoch: 1 Batch: 2579/20099 (12.83%) Loss: 2.252371 LR: 0.00002007 +[07:44:20] Epoch: 1 Batch: 2580/20099 (12.84%) Loss: 2.202752 LR: 0.00002007 +[07:44:23] Epoch: 1 Batch: 2581/20099 (12.84%) Loss: 2.412083 LR: 0.00002007 +[07:44:26] Epoch: 1 Batch: 2582/20099 (12.85%) Loss: 2.132069 LR: 0.00002007 +[07:44:29] Epoch: 1 Batch: 2583/20099 (12.85%) Loss: 2.249569 LR: 0.00002013 +[07:44:33] Epoch: 1 Batch: 2584/20099 (12.86%) Loss: 2.245947 LR: 0.00002013 +[07:44:36] Epoch: 1 Batch: 2585/20099 (12.86%) Loss: 2.046800 LR: 0.00002013 +[07:44:39] Epoch: 1 Batch: 2586/20099 (12.87%) Loss: 2.369754 LR: 0.00002013 +[07:44:42] Epoch: 1 Batch: 2587/20099 (12.87%) Loss: 2.082038 LR: 0.00002013 +[07:44:45] Epoch: 1 Batch: 2588/20099 (12.88%) Loss: 2.102934 LR: 0.00002013 +[07:44:48] Epoch: 1 Batch: 2589/20099 (12.88%) Loss: 2.290942 LR: 0.00002013 +[07:44:51] Epoch: 1 Batch: 2590/20099 (12.89%) Loss: 2.064359 LR: 0.00002018 +[07:44:54] Epoch: 1 Batch: 2591/20099 (12.89%) Loss: 2.291740 LR: 0.00002018 +[07:44:57] Epoch: 1 Batch: 2592/20099 (12.90%) Loss: 2.424448 LR: 0.00002018 +[07:45:00] Epoch: 1 Batch: 2593/20099 (12.90%) Loss: 2.222500 LR: 0.00002018 +[07:45:03] Epoch: 1 Batch: 2594/20099 (12.91%) Loss: 2.369810 LR: 0.00002018 +[07:45:07] Epoch: 1 Batch: 2595/20099 (12.91%) Loss: 2.205030 LR: 0.00002018 +[07:45:10] Epoch: 1 Batch: 2596/20099 (12.92%) Loss: 2.168958 LR: 0.00002018 +[07:45:13] Epoch: 1 Batch: 2597/20099 (12.92%) Loss: 2.357138 LR: 0.00002024 +[07:45:16] Epoch: 1 Batch: 2598/20099 (12.93%) Loss: 2.020350 LR: 0.00002024 +[07:45:19] Epoch: 1 Batch: 2599/20099 (12.93%) Loss: 2.445168 LR: 0.00002024 +[07:45:25] >> Cleaned up old temp checkpoint: epoch1_step600 +[07:45:25] >> Temp checkpoint saved: epoch1_step2600, size: 0.1693 GB +[07:45:25] Epoch: 1 Batch: 2600/20099 (12.94%) Loss: 2.208510 LR: 0.00002024 +[07:45:28] Epoch: 1 Batch: 2601/20099 
(12.94%) Loss: 2.012844 LR: 0.00002024 +[07:45:32] Epoch: 1 Batch: 2602/20099 (12.95%) Loss: 2.261114 LR: 0.00002024 +[07:45:35] Epoch: 1 Batch: 2603/20099 (12.95%) Loss: 2.419029 LR: 0.00002024 +[07:45:38] Epoch: 1 Batch: 2604/20099 (12.96%) Loss: 2.110422 LR: 0.00002029 +[07:45:41] Epoch: 1 Batch: 2605/20099 (12.96%) Loss: 2.012567 LR: 0.00002029 +[07:45:44] Epoch: 1 Batch: 2606/20099 (12.97%) Loss: 2.501838 LR: 0.00002029 +[07:45:47] Epoch: 1 Batch: 2607/20099 (12.97%) Loss: 2.441295 LR: 0.00002029 +[07:45:50] Epoch: 1 Batch: 2608/20099 (12.98%) Loss: 2.383824 LR: 0.00002029 +[07:45:53] Epoch: 1 Batch: 2609/20099 (12.98%) Loss: 2.580420 LR: 0.00002029 +[07:45:56] Epoch: 1 Batch: 2610/20099 (12.99%) Loss: 2.024242 LR: 0.00002029 +[07:46:00] Epoch: 1 Batch: 2611/20099 (12.99%) Loss: 2.403764 LR: 0.00002035 +[07:46:03] Epoch: 1 Batch: 2612/20099 (13.00%) Loss: 2.495763 LR: 0.00002035 +[07:46:06] Epoch: 1 Batch: 2613/20099 (13.00%) Loss: 2.580287 LR: 0.00002035 +[07:46:09] Epoch: 1 Batch: 2614/20099 (13.01%) Loss: 2.290182 LR: 0.00002035 +[07:46:12] Epoch: 1 Batch: 2615/20099 (13.01%) Loss: 2.170666 LR: 0.00002035 +[07:46:15] Epoch: 1 Batch: 2616/20099 (13.02%) Loss: 2.341111 LR: 0.00002035 +[07:46:18] Epoch: 1 Batch: 2617/20099 (13.02%) Loss: 2.180949 LR: 0.00002035 +[07:46:22] Epoch: 1 Batch: 2618/20099 (13.03%) Loss: 2.314270 LR: 0.00002040 +[07:46:25] Epoch: 1 Batch: 2619/20099 (13.03%) Loss: 2.223565 LR: 0.00002040 +[07:46:28] Epoch: 1 Batch: 2620/20099 (13.04%) Loss: 2.106771 LR: 0.00002040 +[07:46:31] Epoch: 1 Batch: 2621/20099 (13.04%) Loss: 2.467380 LR: 0.00002040 +[07:46:34] Epoch: 1 Batch: 2622/20099 (13.05%) Loss: 2.123027 LR: 0.00002040 +[07:46:37] Epoch: 1 Batch: 2623/20099 (13.05%) Loss: 2.067979 LR: 0.00002040 +[07:46:40] Epoch: 1 Batch: 2624/20099 (13.06%) Loss: 2.287547 LR: 0.00002040 +[07:46:43] Epoch: 1 Batch: 2625/20099 (13.06%) Loss: 2.406898 LR: 0.00002045 +[07:46:46] Epoch: 1 Batch: 2626/20099 (13.07%) Loss: 2.072085 LR: 0.00002045 
+[07:46:49] Epoch: 1 Batch: 2627/20099 (13.07%) Loss: 2.242967 LR: 0.00002045 +[07:46:52] Epoch: 1 Batch: 2628/20099 (13.08%) Loss: 2.475592 LR: 0.00002045 +[07:46:55] Epoch: 1 Batch: 2629/20099 (13.08%) Loss: 2.160910 LR: 0.00002045 +[07:46:59] Epoch: 1 Batch: 2630/20099 (13.09%) Loss: 2.247187 LR: 0.00002045 +[07:47:02] Epoch: 1 Batch: 2631/20099 (13.09%) Loss: 2.235039 LR: 0.00002045 +[07:47:05] Epoch: 1 Batch: 2632/20099 (13.10%) Loss: 2.220374 LR: 0.00002051 +[07:47:08] Epoch: 1 Batch: 2633/20099 (13.10%) Loss: 2.355858 LR: 0.00002051 +[07:47:11] Epoch: 1 Batch: 2634/20099 (13.11%) Loss: 2.269606 LR: 0.00002051 +[07:47:14] Epoch: 1 Batch: 2635/20099 (13.11%) Loss: 1.968300 LR: 0.00002051 +[07:47:17] Epoch: 1 Batch: 2636/20099 (13.12%) Loss: 2.219812 LR: 0.00002051 +[07:47:20] Epoch: 1 Batch: 2637/20099 (13.12%) Loss: 2.475409 LR: 0.00002051 +[07:47:23] Epoch: 1 Batch: 2638/20099 (13.13%) Loss: 2.160435 LR: 0.00002051 +[07:47:26] Epoch: 1 Batch: 2639/20099 (13.13%) Loss: 2.444203 LR: 0.00002056 +[07:47:30] Epoch: 1 Batch: 2640/20099 (13.13%) Loss: 2.511827 LR: 0.00002056 +[07:47:33] Epoch: 1 Batch: 2641/20099 (13.14%) Loss: 2.304080 LR: 0.00002056 +[07:47:36] Epoch: 1 Batch: 2642/20099 (13.14%) Loss: 2.085004 LR: 0.00002056 +[07:47:39] Epoch: 1 Batch: 2643/20099 (13.15%) Loss: 2.062993 LR: 0.00002056 +[07:47:42] Epoch: 1 Batch: 2644/20099 (13.15%) Loss: 2.345502 LR: 0.00002056 +[07:47:45] Epoch: 1 Batch: 2645/20099 (13.16%) Loss: 2.099865 LR: 0.00002056 +[07:47:48] Epoch: 1 Batch: 2646/20099 (13.16%) Loss: 2.371133 LR: 0.00002062 +[07:47:51] Epoch: 1 Batch: 2647/20099 (13.17%) Loss: 2.429337 LR: 0.00002062 +[07:47:54] Epoch: 1 Batch: 2648/20099 (13.17%) Loss: 2.274230 LR: 0.00002062 +[07:47:58] Epoch: 1 Batch: 2649/20099 (13.18%) Loss: 1.978810 LR: 0.00002062 +[07:48:01] Epoch: 1 Batch: 2650/20099 (13.18%) Loss: 2.109786 LR: 0.00002062 +[07:48:04] Epoch: 1 Batch: 2651/20099 (13.19%) Loss: 2.008283 LR: 0.00002062 +[07:48:07] Epoch: 1 Batch: 2652/20099 (13.19%) 
Loss: 2.265197 LR: 0.00002062 +[07:48:10] Epoch: 1 Batch: 2653/20099 (13.20%) Loss: 2.443243 LR: 0.00002067 +[07:48:13] Epoch: 1 Batch: 2654/20099 (13.20%) Loss: 2.253790 LR: 0.00002067 +[07:48:16] Epoch: 1 Batch: 2655/20099 (13.21%) Loss: 2.528709 LR: 0.00002067 +[07:48:19] Epoch: 1 Batch: 2656/20099 (13.21%) Loss: 2.291509 LR: 0.00002067 +[07:48:22] Epoch: 1 Batch: 2657/20099 (13.22%) Loss: 2.435505 LR: 0.00002067 +[07:48:25] Epoch: 1 Batch: 2658/20099 (13.22%) Loss: 1.871415 LR: 0.00002067 +[07:48:28] Epoch: 1 Batch: 2659/20099 (13.23%) Loss: 2.278696 LR: 0.00002067 +[07:48:32] Epoch: 1 Batch: 2660/20099 (13.23%) Loss: 2.397355 LR: 0.00002073 +[07:48:35] Epoch: 1 Batch: 2661/20099 (13.24%) Loss: 2.375643 LR: 0.00002073 +[07:48:38] Epoch: 1 Batch: 2662/20099 (13.24%) Loss: 2.244741 LR: 0.00002073 +[07:48:41] Epoch: 1 Batch: 2663/20099 (13.25%) Loss: 2.164840 LR: 0.00002073 +[07:48:44] Epoch: 1 Batch: 2664/20099 (13.25%) Loss: 2.340595 LR: 0.00002073 +[07:48:47] Epoch: 1 Batch: 2665/20099 (13.26%) Loss: 2.006605 LR: 0.00002073 +[07:48:50] Epoch: 1 Batch: 2666/20099 (13.26%) Loss: 2.313323 LR: 0.00002073 +[07:48:53] Epoch: 1 Batch: 2667/20099 (13.27%) Loss: 2.654842 LR: 0.00002078 +[07:48:56] Epoch: 1 Batch: 2668/20099 (13.27%) Loss: 2.200256 LR: 0.00002078 +[07:48:59] Epoch: 1 Batch: 2669/20099 (13.28%) Loss: 2.245870 LR: 0.00002078 +[07:49:02] Epoch: 1 Batch: 2670/20099 (13.28%) Loss: 2.003902 LR: 0.00002078 +[07:49:05] Epoch: 1 Batch: 2671/20099 (13.29%) Loss: 2.092023 LR: 0.00002078 +[07:49:09] Epoch: 1 Batch: 2672/20099 (13.29%) Loss: 2.572376 LR: 0.00002078 +[07:49:12] Epoch: 1 Batch: 2673/20099 (13.30%) Loss: 2.351670 LR: 0.00002078 +[07:49:15] Epoch: 1 Batch: 2674/20099 (13.30%) Loss: 2.517550 LR: 0.00002084 +[07:49:18] Epoch: 1 Batch: 2675/20099 (13.31%) Loss: 2.202765 LR: 0.00002084 +[07:49:21] Epoch: 1 Batch: 2676/20099 (13.31%) Loss: 2.282797 LR: 0.00002084 +[07:49:24] Epoch: 1 Batch: 2677/20099 (13.32%) Loss: 1.944969 LR: 0.00002084 +[07:49:27] Epoch: 
1 Batch: 2678/20099 (13.32%) Loss: 2.231028 LR: 0.00002084 +[07:49:30] Epoch: 1 Batch: 2679/20099 (13.33%) Loss: 2.165832 LR: 0.00002084 +[07:49:33] Epoch: 1 Batch: 2680/20099 (13.33%) Loss: 2.280382 LR: 0.00002084 +[07:49:36] Epoch: 1 Batch: 2681/20099 (13.34%) Loss: 2.464732 LR: 0.00002089 +[07:49:39] Epoch: 1 Batch: 2682/20099 (13.34%) Loss: 2.038962 LR: 0.00002089 +[07:49:43] Epoch: 1 Batch: 2683/20099 (13.35%) Loss: 2.323652 LR: 0.00002089 +[07:49:46] Epoch: 1 Batch: 2684/20099 (13.35%) Loss: 2.176493 LR: 0.00002089 +[07:49:49] Epoch: 1 Batch: 2685/20099 (13.36%) Loss: 2.384225 LR: 0.00002089 +[07:49:52] Epoch: 1 Batch: 2686/20099 (13.36%) Loss: 2.051665 LR: 0.00002089 +[07:49:55] Epoch: 1 Batch: 2687/20099 (13.37%) Loss: 2.213538 LR: 0.00002089 +[07:49:58] Epoch: 1 Batch: 2688/20099 (13.37%) Loss: 2.230741 LR: 0.00002095 +[07:50:01] Epoch: 1 Batch: 2689/20099 (13.38%) Loss: 2.491953 LR: 0.00002095 +[07:50:04] Epoch: 1 Batch: 2690/20099 (13.38%) Loss: 2.449439 LR: 0.00002095 +[07:50:07] Epoch: 1 Batch: 2691/20099 (13.39%) Loss: 2.606206 LR: 0.00002095 +[07:50:10] Epoch: 1 Batch: 2692/20099 (13.39%) Loss: 2.250755 LR: 0.00002095 +[07:50:14] Epoch: 1 Batch: 2693/20099 (13.40%) Loss: 2.297725 LR: 0.00002095 +[07:50:17] Epoch: 1 Batch: 2694/20099 (13.40%) Loss: 2.000909 LR: 0.00002095 +[07:50:20] Epoch: 1 Batch: 2695/20099 (13.41%) Loss: 2.391862 LR: 0.00002100 +[07:50:23] Epoch: 1 Batch: 2696/20099 (13.41%) Loss: 2.287190 LR: 0.00002100 +[07:50:26] Epoch: 1 Batch: 2697/20099 (13.42%) Loss: 1.829051 LR: 0.00002100 +[07:50:29] Epoch: 1 Batch: 2698/20099 (13.42%) Loss: 2.148665 LR: 0.00002100 +[07:50:32] Epoch: 1 Batch: 2699/20099 (13.43%) Loss: 2.105326 LR: 0.00002100 +[07:50:35] Epoch: 1 Batch: 2700/20099 (13.43%) Loss: 2.097973 LR: 0.00002100 +[07:50:38] Epoch: 1 Batch: 2701/20099 (13.44%) Loss: 2.359345 LR: 0.00002100 +[07:50:42] Epoch: 1 Batch: 2702/20099 (13.44%) Loss: 2.257281 LR: 0.00002105 +[07:50:45] Epoch: 1 Batch: 2703/20099 (13.45%) Loss: 2.278080 LR: 
0.00002105 +[07:50:48] Epoch: 1 Batch: 2704/20099 (13.45%) Loss: 2.133782 LR: 0.00002105 +[07:50:51] Epoch: 1 Batch: 2705/20099 (13.46%) Loss: 2.203111 LR: 0.00002105 +[07:50:54] Epoch: 1 Batch: 2706/20099 (13.46%) Loss: 2.109122 LR: 0.00002105 +[07:50:57] Epoch: 1 Batch: 2707/20099 (13.47%) Loss: 1.915862 LR: 0.00002105 +[07:51:00] Epoch: 1 Batch: 2708/20099 (13.47%) Loss: 2.037068 LR: 0.00002105 +[07:51:03] Epoch: 1 Batch: 2709/20099 (13.48%) Loss: 2.181608 LR: 0.00002111 +[07:51:06] Epoch: 1 Batch: 2710/20099 (13.48%) Loss: 2.143762 LR: 0.00002111 +[07:51:09] Epoch: 1 Batch: 2711/20099 (13.49%) Loss: 2.348612 LR: 0.00002111 +[07:51:13] Epoch: 1 Batch: 2712/20099 (13.49%) Loss: 2.495041 LR: 0.00002111 +[07:51:16] Epoch: 1 Batch: 2713/20099 (13.50%) Loss: 2.307232 LR: 0.00002111 +[07:51:19] Epoch: 1 Batch: 2714/20099 (13.50%) Loss: 2.399476 LR: 0.00002111 +[07:51:22] Epoch: 1 Batch: 2715/20099 (13.51%) Loss: 2.226197 LR: 0.00002111 +[07:51:25] Epoch: 1 Batch: 2716/20099 (13.51%) Loss: 2.291373 LR: 0.00002116 +[07:51:28] Epoch: 1 Batch: 2717/20099 (13.52%) Loss: 2.015525 LR: 0.00002116 +[07:51:31] Epoch: 1 Batch: 2718/20099 (13.52%) Loss: 1.918024 LR: 0.00002116 +[07:51:34] Epoch: 1 Batch: 2719/20099 (13.53%) Loss: 2.219763 LR: 0.00002116 +[07:51:37] Epoch: 1 Batch: 2720/20099 (13.53%) Loss: 2.398365 LR: 0.00002116 +[07:51:40] Epoch: 1 Batch: 2721/20099 (13.54%) Loss: 2.209027 LR: 0.00002116 +[07:51:43] Epoch: 1 Batch: 2722/20099 (13.54%) Loss: 2.170711 LR: 0.00002116 +[07:51:47] Epoch: 1 Batch: 2723/20099 (13.55%) Loss: 2.405605 LR: 0.00002122 +[07:51:50] Epoch: 1 Batch: 2724/20099 (13.55%) Loss: 2.248603 LR: 0.00002122 +[07:51:53] Epoch: 1 Batch: 2725/20099 (13.56%) Loss: 2.190042 LR: 0.00002122 +[07:51:56] Epoch: 1 Batch: 2726/20099 (13.56%) Loss: 2.159958 LR: 0.00002122 +[07:51:59] Epoch: 1 Batch: 2727/20099 (13.57%) Loss: 2.454524 LR: 0.00002122 +[07:52:02] Epoch: 1 Batch: 2728/20099 (13.57%) Loss: 1.998534 LR: 0.00002122 +[07:52:05] Epoch: 1 Batch: 2729/20099 
(13.58%) Loss: 2.232819 LR: 0.00002122 +[07:52:08] Epoch: 1 Batch: 2730/20099 (13.58%) Loss: 2.087619 LR: 0.00002127 +[07:52:11] Epoch: 1 Batch: 2731/20099 (13.59%) Loss: 2.258963 LR: 0.00002127 +[07:52:14] Epoch: 1 Batch: 2732/20099 (13.59%) Loss: 2.311458 LR: 0.00002127 +[07:52:18] Epoch: 1 Batch: 2733/20099 (13.60%) Loss: 2.223551 LR: 0.00002127 +[07:52:21] Epoch: 1 Batch: 2734/20099 (13.60%) Loss: 2.236923 LR: 0.00002127 +[07:52:24] Epoch: 1 Batch: 2735/20099 (13.61%) Loss: 2.401893 LR: 0.00002127 +[07:52:27] Epoch: 1 Batch: 2736/20099 (13.61%) Loss: 2.172214 LR: 0.00002127 +[07:52:30] Epoch: 1 Batch: 2737/20099 (13.62%) Loss: 2.204510 LR: 0.00002133 +[07:52:33] Epoch: 1 Batch: 2738/20099 (13.62%) Loss: 2.373618 LR: 0.00002133 +[07:52:36] Epoch: 1 Batch: 2739/20099 (13.63%) Loss: 2.328069 LR: 0.00002133 +[07:52:39] Epoch: 1 Batch: 2740/20099 (13.63%) Loss: 1.896794 LR: 0.00002133 +[07:52:42] Epoch: 1 Batch: 2741/20099 (13.64%) Loss: 1.869381 LR: 0.00002133 +[07:52:45] Epoch: 1 Batch: 2742/20099 (13.64%) Loss: 2.323018 LR: 0.00002133 +[07:52:49] Epoch: 1 Batch: 2743/20099 (13.65%) Loss: 2.542007 LR: 0.00002133 +[07:52:52] Epoch: 1 Batch: 2744/20099 (13.65%) Loss: 1.870534 LR: 0.00002138 +[07:52:55] Epoch: 1 Batch: 2745/20099 (13.66%) Loss: 2.082040 LR: 0.00002138 +[07:52:58] Epoch: 1 Batch: 2746/20099 (13.66%) Loss: 2.339970 LR: 0.00002138 +[07:53:01] Epoch: 1 Batch: 2747/20099 (13.67%) Loss: 2.134591 LR: 0.00002138 +[07:53:04] Epoch: 1 Batch: 2748/20099 (13.67%) Loss: 1.761796 LR: 0.00002138 +[07:53:07] Epoch: 1 Batch: 2749/20099 (13.68%) Loss: 1.914184 LR: 0.00002138 +[07:53:10] Epoch: 1 Batch: 2750/20099 (13.68%) Loss: 2.099195 LR: 0.00002138 +[07:53:13] Epoch: 1 Batch: 2751/20099 (13.69%) Loss: 2.210208 LR: 0.00002144 +[07:53:16] Epoch: 1 Batch: 2752/20099 (13.69%) Loss: 2.047792 LR: 0.00002144 +[07:53:20] Epoch: 1 Batch: 2753/20099 (13.70%) Loss: 2.244542 LR: 0.00002144 +[07:53:23] Epoch: 1 Batch: 2754/20099 (13.70%) Loss: 2.641780 LR: 0.00002144 
+[07:53:26] Epoch: 1 Batch: 2755/20099 (13.71%) Loss: 2.192464 LR: 0.00002144 +[07:53:29] Epoch: 1 Batch: 2756/20099 (13.71%) Loss: 2.056824 LR: 0.00002144 +[07:53:32] Epoch: 1 Batch: 2757/20099 (13.72%) Loss: 2.375796 LR: 0.00002144 +[07:53:35] Epoch: 1 Batch: 2758/20099 (13.72%) Loss: 2.305646 LR: 0.00002149 +[07:53:38] Epoch: 1 Batch: 2759/20099 (13.73%) Loss: 2.180172 LR: 0.00002149 +[07:53:41] Epoch: 1 Batch: 2760/20099 (13.73%) Loss: 1.986772 LR: 0.00002149 +[07:53:44] Epoch: 1 Batch: 2761/20099 (13.74%) Loss: 2.284252 LR: 0.00002149 +[07:53:47] Epoch: 1 Batch: 2762/20099 (13.74%) Loss: 2.370033 LR: 0.00002149 +[07:53:50] Epoch: 1 Batch: 2763/20099 (13.75%) Loss: 2.307525 LR: 0.00002149 +[07:53:54] Epoch: 1 Batch: 2764/20099 (13.75%) Loss: 2.395034 LR: 0.00002149 +[07:53:57] Epoch: 1 Batch: 2765/20099 (13.76%) Loss: 2.218702 LR: 0.00002155 +[07:54:00] Epoch: 1 Batch: 2766/20099 (13.76%) Loss: 2.697784 LR: 0.00002155 +[07:54:03] Epoch: 1 Batch: 2767/20099 (13.77%) Loss: 2.060065 LR: 0.00002155 +[07:54:06] Epoch: 1 Batch: 2768/20099 (13.77%) Loss: 2.117900 LR: 0.00002155 +[07:54:09] Epoch: 1 Batch: 2769/20099 (13.78%) Loss: 2.128484 LR: 0.00002155 +[07:54:12] Epoch: 1 Batch: 2770/20099 (13.78%) Loss: 2.082245 LR: 0.00002155 +[07:54:15] Epoch: 1 Batch: 2771/20099 (13.79%) Loss: 2.275504 LR: 0.00002155 +[07:54:18] Epoch: 1 Batch: 2772/20099 (13.79%) Loss: 2.718981 LR: 0.00002160 +[07:54:21] Epoch: 1 Batch: 2773/20099 (13.80%) Loss: 2.143050 LR: 0.00002160 +[07:54:25] Epoch: 1 Batch: 2774/20099 (13.80%) Loss: 2.243861 LR: 0.00002160 +[07:54:28] Epoch: 1 Batch: 2775/20099 (13.81%) Loss: 2.412583 LR: 0.00002160 +[07:54:31] Epoch: 1 Batch: 2776/20099 (13.81%) Loss: 2.707017 LR: 0.00002160 +[07:54:34] Epoch: 1 Batch: 2777/20099 (13.82%) Loss: 2.227385 LR: 0.00002160 +[07:54:37] Epoch: 1 Batch: 2778/20099 (13.82%) Loss: 1.718606 LR: 0.00002160 +[07:54:40] Epoch: 1 Batch: 2779/20099 (13.83%) Loss: 2.032504 LR: 0.00002165 +[07:54:43] Epoch: 1 Batch: 2780/20099 (13.83%) 
Loss: 1.887750 LR: 0.00002165 +[07:54:46] Epoch: 1 Batch: 2781/20099 (13.84%) Loss: 2.128351 LR: 0.00002165 +[07:54:49] Epoch: 1 Batch: 2782/20099 (13.84%) Loss: 2.230880 LR: 0.00002165 +[07:54:52] Epoch: 1 Batch: 2783/20099 (13.85%) Loss: 2.528112 LR: 0.00002165 +[07:54:56] Epoch: 1 Batch: 2784/20099 (13.85%) Loss: 2.040090 LR: 0.00002165 +[07:54:59] Epoch: 1 Batch: 2785/20099 (13.86%) Loss: 1.841525 LR: 0.00002165 +[07:55:02] Epoch: 1 Batch: 2786/20099 (13.86%) Loss: 2.275285 LR: 0.00002171 +[07:55:05] Epoch: 1 Batch: 2787/20099 (13.87%) Loss: 2.262744 LR: 0.00002171 +[07:55:08] Epoch: 1 Batch: 2788/20099 (13.87%) Loss: 2.221883 LR: 0.00002171 +[07:55:11] Epoch: 1 Batch: 2789/20099 (13.88%) Loss: 2.145683 LR: 0.00002171 +[07:55:14] Epoch: 1 Batch: 2790/20099 (13.88%) Loss: 2.409032 LR: 0.00002171 +[07:55:17] Epoch: 1 Batch: 2791/20099 (13.89%) Loss: 1.984984 LR: 0.00002171 +[07:55:20] Epoch: 1 Batch: 2792/20099 (13.89%) Loss: 2.118602 LR: 0.00002171 +[07:55:23] Epoch: 1 Batch: 2793/20099 (13.90%) Loss: 2.057848 LR: 0.00002176 +[07:55:27] Epoch: 1 Batch: 2794/20099 (13.90%) Loss: 2.103943 LR: 0.00002176 +[07:55:30] Epoch: 1 Batch: 2795/20099 (13.91%) Loss: 2.324187 LR: 0.00002176 +[07:55:33] Epoch: 1 Batch: 2796/20099 (13.91%) Loss: 2.271766 LR: 0.00002176 +[07:55:36] Epoch: 1 Batch: 2797/20099 (13.92%) Loss: 2.061742 LR: 0.00002176 +[07:55:39] Epoch: 1 Batch: 2798/20099 (13.92%) Loss: 2.285283 LR: 0.00002176 +[07:55:42] Epoch: 1 Batch: 2799/20099 (13.93%) Loss: 1.941206 LR: 0.00002176 +[07:55:49] >> Cleaned up old temp checkpoint: epoch1_step800 +[07:55:49] >> Temp checkpoint saved: epoch1_step2800, size: 0.1693 GB +[07:55:49] Epoch: 1 Batch: 2800/20099 (13.93%) Loss: 2.074402 LR: 0.00002182 +[07:55:52] Epoch: 1 Batch: 2801/20099 (13.94%) Loss: 2.216730 LR: 0.00002182 +[07:55:55] Epoch: 1 Batch: 2802/20099 (13.94%) Loss: 1.978696 LR: 0.00002182 +[07:55:58] Epoch: 1 Batch: 2803/20099 (13.95%) Loss: 2.133315 LR: 0.00002182 +[07:56:01] Epoch: 1 Batch: 2804/20099 
(13.95%) Loss: 2.203534 LR: 0.00002182 +[07:56:04] Epoch: 1 Batch: 2805/20099 (13.96%) Loss: 2.201557 LR: 0.00002182 +[07:56:07] Epoch: 1 Batch: 2806/20099 (13.96%) Loss: 2.272289 LR: 0.00002182 +[07:56:10] Epoch: 1 Batch: 2807/20099 (13.97%) Loss: 1.932803 LR: 0.00002187 +[07:56:13] Epoch: 1 Batch: 2808/20099 (13.97%) Loss: 2.064082 LR: 0.00002187 +[07:56:16] Epoch: 1 Batch: 2809/20099 (13.98%) Loss: 2.396595 LR: 0.00002187 +[07:56:20] Epoch: 1 Batch: 2810/20099 (13.98%) Loss: 2.264002 LR: 0.00002187 +[07:56:23] Epoch: 1 Batch: 2811/20099 (13.99%) Loss: 2.237241 LR: 0.00002187 +[07:56:26] Epoch: 1 Batch: 2812/20099 (13.99%) Loss: 2.449508 LR: 0.00002187 +[07:56:29] Epoch: 1 Batch: 2813/20099 (14.00%) Loss: 2.318754 LR: 0.00002187 +[07:56:32] Epoch: 1 Batch: 2814/20099 (14.00%) Loss: 2.151048 LR: 0.00002193 +[07:56:35] Epoch: 1 Batch: 2815/20099 (14.01%) Loss: 2.491852 LR: 0.00002193 +[07:56:38] Epoch: 1 Batch: 2816/20099 (14.01%) Loss: 2.441684 LR: 0.00002193 +[07:56:41] Epoch: 1 Batch: 2817/20099 (14.02%) Loss: 2.199344 LR: 0.00002193 +[07:56:45] Epoch: 1 Batch: 2818/20099 (14.02%) Loss: 2.334011 LR: 0.00002193 +[07:56:48] Epoch: 1 Batch: 2819/20099 (14.03%) Loss: 2.330544 LR: 0.00002193 +[07:56:51] Epoch: 1 Batch: 2820/20099 (14.03%) Loss: 2.371519 LR: 0.00002193 +[07:56:54] Epoch: 1 Batch: 2821/20099 (14.04%) Loss: 2.314243 LR: 0.00002198 +[07:56:57] Epoch: 1 Batch: 2822/20099 (14.04%) Loss: 1.991649 LR: 0.00002198 +[07:57:00] Epoch: 1 Batch: 2823/20099 (14.05%) Loss: 2.590214 LR: 0.00002198 +[07:57:03] Epoch: 1 Batch: 2824/20099 (14.05%) Loss: 2.155244 LR: 0.00002198 +[07:57:06] Epoch: 1 Batch: 2825/20099 (14.06%) Loss: 2.428171 LR: 0.00002198 +[07:57:09] Epoch: 1 Batch: 2826/20099 (14.06%) Loss: 2.234841 LR: 0.00002198 +[07:57:12] Epoch: 1 Batch: 2827/20099 (14.07%) Loss: 2.186695 LR: 0.00002198 +[07:57:15] Epoch: 1 Batch: 2828/20099 (14.07%) Loss: 2.221848 LR: 0.00002204 +[07:57:19] Epoch: 1 Batch: 2829/20099 (14.08%) Loss: 2.134188 LR: 0.00002204 
+[07:57:22] Epoch: 1 Batch: 2830/20099 (14.08%) Loss: 2.363022 LR: 0.00002204 +[07:57:25] Epoch: 1 Batch: 2831/20099 (14.09%) Loss: 2.335325 LR: 0.00002204 +[07:57:28] Epoch: 1 Batch: 2832/20099 (14.09%) Loss: 2.131914 LR: 0.00002204 +[07:57:31] Epoch: 1 Batch: 2833/20099 (14.10%) Loss: 2.116859 LR: 0.00002204 +[07:57:34] Epoch: 1 Batch: 2834/20099 (14.10%) Loss: 2.146293 LR: 0.00002204 +[07:57:37] Epoch: 1 Batch: 2835/20099 (14.11%) Loss: 2.133831 LR: 0.00002209 +[07:57:40] Epoch: 1 Batch: 2836/20099 (14.11%) Loss: 2.429434 LR: 0.00002209 +[07:57:43] Epoch: 1 Batch: 2837/20099 (14.12%) Loss: 2.120774 LR: 0.00002209 +[07:57:46] Epoch: 1 Batch: 2838/20099 (14.12%) Loss: 2.289357 LR: 0.00002209 +[07:57:49] Epoch: 1 Batch: 2839/20099 (14.13%) Loss: 1.966570 LR: 0.00002209 +[07:57:52] Epoch: 1 Batch: 2840/20099 (14.13%) Loss: 2.048846 LR: 0.00002209 +[07:57:56] Epoch: 1 Batch: 2841/20099 (14.14%) Loss: 2.211694 LR: 0.00002209 +[07:57:59] Epoch: 1 Batch: 2842/20099 (14.14%) Loss: 2.254627 LR: 0.00002215 +[07:58:02] Epoch: 1 Batch: 2843/20099 (14.14%) Loss: 1.959458 LR: 0.00002215 +[07:58:05] Epoch: 1 Batch: 2844/20099 (14.15%) Loss: 2.438670 LR: 0.00002215 +[07:58:08] Epoch: 1 Batch: 2845/20099 (14.15%) Loss: 2.239449 LR: 0.00002215 +[07:58:11] Epoch: 1 Batch: 2846/20099 (14.16%) Loss: 2.154443 LR: 0.00002215 +[07:58:14] Epoch: 1 Batch: 2847/20099 (14.16%) Loss: 2.198065 LR: 0.00002215 +[07:58:17] Epoch: 1 Batch: 2848/20099 (14.17%) Loss: 2.110764 LR: 0.00002215 +[07:58:20] Epoch: 1 Batch: 2849/20099 (14.17%) Loss: 2.203395 LR: 0.00002220 +[07:58:23] Epoch: 1 Batch: 2850/20099 (14.18%) Loss: 2.414010 LR: 0.00002220 +[07:58:26] Epoch: 1 Batch: 2851/20099 (14.18%) Loss: 2.278963 LR: 0.00002220 +[07:58:30] Epoch: 1 Batch: 2852/20099 (14.19%) Loss: 2.276326 LR: 0.00002220 +[07:58:33] Epoch: 1 Batch: 2853/20099 (14.19%) Loss: 2.265434 LR: 0.00002220 +[07:58:36] Epoch: 1 Batch: 2854/20099 (14.20%) Loss: 2.591926 LR: 0.00002220 +[07:58:39] Epoch: 1 Batch: 2855/20099 (14.20%) 
Loss: 2.349315 LR: 0.00002220 +[07:58:42] Epoch: 1 Batch: 2856/20099 (14.21%) Loss: 2.321206 LR: 0.00002225 +[07:58:45] Epoch: 1 Batch: 2857/20099 (14.21%) Loss: 2.135187 LR: 0.00002225 +[07:58:48] Epoch: 1 Batch: 2858/20099 (14.22%) Loss: 2.291490 LR: 0.00002225 +[07:58:51] Epoch: 1 Batch: 2859/20099 (14.22%) Loss: 2.337298 LR: 0.00002225 +[07:58:54] Epoch: 1 Batch: 2860/20099 (14.23%) Loss: 2.165537 LR: 0.00002225 +[07:58:58] Epoch: 1 Batch: 2861/20099 (14.23%) Loss: 2.049564 LR: 0.00002225 +[07:59:01] Epoch: 1 Batch: 2862/20099 (14.24%) Loss: 2.104748 LR: 0.00002225 +[07:59:04] Epoch: 1 Batch: 2863/20099 (14.24%) Loss: 2.023598 LR: 0.00002231 +[07:59:07] Epoch: 1 Batch: 2864/20099 (14.25%) Loss: 2.191615 LR: 0.00002231 +[07:59:10] Epoch: 1 Batch: 2865/20099 (14.25%) Loss: 2.136958 LR: 0.00002231 +[07:59:13] Epoch: 1 Batch: 2866/20099 (14.26%) Loss: 2.416388 LR: 0.00002231 +[07:59:16] Epoch: 1 Batch: 2867/20099 (14.26%) Loss: 2.275473 LR: 0.00002231 +[07:59:19] Epoch: 1 Batch: 2868/20099 (14.27%) Loss: 2.362564 LR: 0.00002231 +[07:59:22] Epoch: 1 Batch: 2869/20099 (14.27%) Loss: 2.022331 LR: 0.00002231 +[07:59:25] Epoch: 1 Batch: 2870/20099 (14.28%) Loss: 2.331536 LR: 0.00002236 +[07:59:29] Epoch: 1 Batch: 2871/20099 (14.28%) Loss: 2.291268 LR: 0.00002236 +[07:59:32] Epoch: 1 Batch: 2872/20099 (14.29%) Loss: 2.214578 LR: 0.00002236 +[07:59:35] Epoch: 1 Batch: 2873/20099 (14.29%) Loss: 2.098819 LR: 0.00002236 +[07:59:38] Epoch: 1 Batch: 2874/20099 (14.30%) Loss: 2.101243 LR: 0.00002236 +[07:59:41] Epoch: 1 Batch: 2875/20099 (14.30%) Loss: 2.150649 LR: 0.00002236 +[07:59:44] Epoch: 1 Batch: 2876/20099 (14.31%) Loss: 2.081349 LR: 0.00002236 +[07:59:47] Epoch: 1 Batch: 2877/20099 (14.31%) Loss: 2.144913 LR: 0.00002242 +[07:59:50] Epoch: 1 Batch: 2878/20099 (14.32%) Loss: 2.161219 LR: 0.00002242 +[07:59:53] Epoch: 1 Batch: 2879/20099 (14.32%) Loss: 2.172865 LR: 0.00002242 +[07:59:57] Epoch: 1 Batch: 2880/20099 (14.33%) Loss: 2.208618 LR: 0.00002242 +[08:00:00] Epoch: 
1 Batch: 2881/20099 (14.33%) Loss: 2.449243 LR: 0.00002242 +[08:00:03] Epoch: 1 Batch: 2882/20099 (14.34%) Loss: 2.434333 LR: 0.00002242 +[08:00:06] Epoch: 1 Batch: 2883/20099 (14.34%) Loss: 2.009742 LR: 0.00002242 +[08:00:09] Epoch: 1 Batch: 2884/20099 (14.35%) Loss: 2.188139 LR: 0.00002247 +[08:00:12] Epoch: 1 Batch: 2885/20099 (14.35%) Loss: 2.161915 LR: 0.00002247 +[08:00:15] Epoch: 1 Batch: 2886/20099 (14.36%) Loss: 2.368684 LR: 0.00002247 +[08:00:18] Epoch: 1 Batch: 2887/20099 (14.36%) Loss: 2.430614 LR: 0.00002247 +[08:00:21] Epoch: 1 Batch: 2888/20099 (14.37%) Loss: 2.164064 LR: 0.00002247 +[08:00:24] Epoch: 1 Batch: 2889/20099 (14.37%) Loss: 1.899476 LR: 0.00002247 +[08:00:28] Epoch: 1 Batch: 2890/20099 (14.38%) Loss: 2.319108 LR: 0.00002247 +[08:00:31] Epoch: 1 Batch: 2891/20099 (14.38%) Loss: 2.166673 LR: 0.00002253 +[08:00:34] Epoch: 1 Batch: 2892/20099 (14.39%) Loss: 2.624513 LR: 0.00002253 +[08:00:37] Epoch: 1 Batch: 2893/20099 (14.39%) Loss: 1.969318 LR: 0.00002253 +[08:00:40] Epoch: 1 Batch: 2894/20099 (14.40%) Loss: 2.149759 LR: 0.00002253 +[08:00:43] Epoch: 1 Batch: 2895/20099 (14.40%) Loss: 2.501909 LR: 0.00002253 +[08:00:46] Epoch: 1 Batch: 2896/20099 (14.41%) Loss: 1.987996 LR: 0.00002253 +[08:00:49] Epoch: 1 Batch: 2897/20099 (14.41%) Loss: 2.171886 LR: 0.00002253 +[08:00:52] Epoch: 1 Batch: 2898/20099 (14.42%) Loss: 2.277207 LR: 0.00002258 +[08:00:55] Epoch: 1 Batch: 2899/20099 (14.42%) Loss: 2.072195 LR: 0.00002258 +[08:00:59] Epoch: 1 Batch: 2900/20099 (14.43%) Loss: 2.324095 LR: 0.00002258 +[08:01:02] Epoch: 1 Batch: 2901/20099 (14.43%) Loss: 2.525599 LR: 0.00002258 +[08:01:05] Epoch: 1 Batch: 2902/20099 (14.44%) Loss: 2.392134 LR: 0.00002258 +[08:01:08] Epoch: 1 Batch: 2903/20099 (14.44%) Loss: 2.368020 LR: 0.00002258 +[08:01:11] Epoch: 1 Batch: 2904/20099 (14.45%) Loss: 2.184041 LR: 0.00002258 +[08:01:14] Epoch: 1 Batch: 2905/20099 (14.45%) Loss: 2.019115 LR: 0.00002264 +[08:01:17] Epoch: 1 Batch: 2906/20099 (14.46%) Loss: 2.530500 LR: 
0.00002264 +[08:01:20] Epoch: 1 Batch: 2907/20099 (14.46%) Loss: 2.055346 LR: 0.00002264 +[08:01:23] Epoch: 1 Batch: 2908/20099 (14.47%) Loss: 2.305311 LR: 0.00002264 +[08:01:26] Epoch: 1 Batch: 2909/20099 (14.47%) Loss: 2.222674 LR: 0.00002264 +[08:01:29] Epoch: 1 Batch: 2910/20099 (14.48%) Loss: 2.381155 LR: 0.00002264 +[08:01:33] Epoch: 1 Batch: 2911/20099 (14.48%) Loss: 2.002766 LR: 0.00002264 +[08:01:36] Epoch: 1 Batch: 2912/20099 (14.49%) Loss: 2.297015 LR: 0.00002269 +[08:01:39] Epoch: 1 Batch: 2913/20099 (14.49%) Loss: 2.391950 LR: 0.00002269 +[08:01:42] Epoch: 1 Batch: 2914/20099 (14.50%) Loss: 2.234587 LR: 0.00002269 +[08:01:45] Epoch: 1 Batch: 2915/20099 (14.50%) Loss: 2.193214 LR: 0.00002269 +[08:01:48] Epoch: 1 Batch: 2916/20099 (14.51%) Loss: 2.243914 LR: 0.00002269 +[08:01:51] Epoch: 1 Batch: 2917/20099 (14.51%) Loss: 2.007367 LR: 0.00002269 +[08:01:54] Epoch: 1 Batch: 2918/20099 (14.52%) Loss: 2.456401 LR: 0.00002269 +[08:01:57] Epoch: 1 Batch: 2919/20099 (14.52%) Loss: 2.158502 LR: 0.00002275 +[08:02:00] Epoch: 1 Batch: 2920/20099 (14.53%) Loss: 2.009751 LR: 0.00002275 +[08:02:04] Epoch: 1 Batch: 2921/20099 (14.53%) Loss: 2.180648 LR: 0.00002275 +[08:02:07] Epoch: 1 Batch: 2922/20099 (14.54%) Loss: 2.284039 LR: 0.00002275 +[08:02:10] Epoch: 1 Batch: 2923/20099 (14.54%) Loss: 2.372457 LR: 0.00002275 +[08:02:13] Epoch: 1 Batch: 2924/20099 (14.55%) Loss: 2.034812 LR: 0.00002275 +[08:02:16] Epoch: 1 Batch: 2925/20099 (14.55%) Loss: 2.400895 LR: 0.00002275 +[08:02:19] Epoch: 1 Batch: 2926/20099 (14.56%) Loss: 2.173874 LR: 0.00002280 +[08:02:22] Epoch: 1 Batch: 2927/20099 (14.56%) Loss: 2.110225 LR: 0.00002280 +[08:02:25] Epoch: 1 Batch: 2928/20099 (14.57%) Loss: 2.331111 LR: 0.00002280 +[08:02:28] Epoch: 1 Batch: 2929/20099 (14.57%) Loss: 2.254573 LR: 0.00002280 +[08:02:31] Epoch: 1 Batch: 2930/20099 (14.58%) Loss: 2.267711 LR: 0.00002280 +[08:02:35] Epoch: 1 Batch: 2931/20099 (14.58%) Loss: 2.115487 LR: 0.00002280 +[08:02:38] Epoch: 1 Batch: 2932/20099 
(14.59%) Loss: 2.129659 LR: 0.00002280 +[08:02:41] Epoch: 1 Batch: 2933/20099 (14.59%) Loss: 2.432295 LR: 0.00002285 +[08:02:44] Epoch: 1 Batch: 2934/20099 (14.60%) Loss: 2.134562 LR: 0.00002285 +[08:02:47] Epoch: 1 Batch: 2935/20099 (14.60%) Loss: 2.170224 LR: 0.00002285 +[08:02:50] Epoch: 1 Batch: 2936/20099 (14.61%) Loss: 2.265427 LR: 0.00002285 +[08:02:53] Epoch: 1 Batch: 2937/20099 (14.61%) Loss: 2.466420 LR: 0.00002285 +[08:02:56] Epoch: 1 Batch: 2938/20099 (14.62%) Loss: 2.210351 LR: 0.00002285 +[08:02:59] Epoch: 1 Batch: 2939/20099 (14.62%) Loss: 2.013543 LR: 0.00002285 +[08:03:02] Epoch: 1 Batch: 2940/20099 (14.63%) Loss: 2.379362 LR: 0.00002291 +[08:03:06] Epoch: 1 Batch: 2941/20099 (14.63%) Loss: 2.364670 LR: 0.00002291 +[08:03:09] Epoch: 1 Batch: 2942/20099 (14.64%) Loss: 2.077369 LR: 0.00002291 +[08:03:12] Epoch: 1 Batch: 2943/20099 (14.64%) Loss: 2.151177 LR: 0.00002291 +[08:03:15] Epoch: 1 Batch: 2944/20099 (14.65%) Loss: 2.240447 LR: 0.00002291 +[08:03:18] Epoch: 1 Batch: 2945/20099 (14.65%) Loss: 2.334965 LR: 0.00002291 +[08:03:21] Epoch: 1 Batch: 2946/20099 (14.66%) Loss: 2.241404 LR: 0.00002291 +[08:03:24] Epoch: 1 Batch: 2947/20099 (14.66%) Loss: 2.446580 LR: 0.00002296 +[08:03:27] Epoch: 1 Batch: 2948/20099 (14.67%) Loss: 1.974125 LR: 0.00002296 +[08:03:30] Epoch: 1 Batch: 2949/20099 (14.67%) Loss: 2.128999 LR: 0.00002296 +[08:03:33] Epoch: 1 Batch: 2950/20099 (14.68%) Loss: 2.310823 LR: 0.00002296 +[08:03:37] Epoch: 1 Batch: 2951/20099 (14.68%) Loss: 2.171002 LR: 0.00002296 +[08:03:40] Epoch: 1 Batch: 2952/20099 (14.69%) Loss: 2.358540 LR: 0.00002296 +[08:03:43] Epoch: 1 Batch: 2953/20099 (14.69%) Loss: 2.278703 LR: 0.00002296 +[08:03:46] Epoch: 1 Batch: 2954/20099 (14.70%) Loss: 2.152119 LR: 0.00002302 +[08:03:49] Epoch: 1 Batch: 2955/20099 (14.70%) Loss: 1.958947 LR: 0.00002302 +[08:03:52] Epoch: 1 Batch: 2956/20099 (14.71%) Loss: 2.007500 LR: 0.00002302 +[08:03:55] Epoch: 1 Batch: 2957/20099 (14.71%) Loss: 2.125084 LR: 0.00002302 
+[08:03:58] Epoch: 1 Batch: 2958/20099 (14.72%) Loss: 2.329364 LR: 0.00002302 +[08:04:01] Epoch: 1 Batch: 2959/20099 (14.72%) Loss: 2.509832 LR: 0.00002302 +[08:04:04] Epoch: 1 Batch: 2960/20099 (14.73%) Loss: 2.273805 LR: 0.00002302 +[08:04:07] Epoch: 1 Batch: 2961/20099 (14.73%) Loss: 2.199972 LR: 0.00002307 +[08:04:11] Epoch: 1 Batch: 2962/20099 (14.74%) Loss: 2.454372 LR: 0.00002307 +[08:04:14] Epoch: 1 Batch: 2963/20099 (14.74%) Loss: 2.294977 LR: 0.00002307 +[08:04:17] Epoch: 1 Batch: 2964/20099 (14.75%) Loss: 2.127792 LR: 0.00002307 +[08:04:20] Epoch: 1 Batch: 2965/20099 (14.75%) Loss: 2.076711 LR: 0.00002307 +[08:04:23] Epoch: 1 Batch: 2966/20099 (14.76%) Loss: 2.312637 LR: 0.00002307 +[08:04:26] Epoch: 1 Batch: 2967/20099 (14.76%) Loss: 2.162895 LR: 0.00002307 +[08:04:29] Epoch: 1 Batch: 2968/20099 (14.77%) Loss: 2.339951 LR: 0.00002313 +[08:04:32] Epoch: 1 Batch: 2969/20099 (14.77%) Loss: 2.236085 LR: 0.00002313 +[08:04:35] Epoch: 1 Batch: 2970/20099 (14.78%) Loss: 2.096571 LR: 0.00002313 +[08:04:38] Epoch: 1 Batch: 2971/20099 (14.78%) Loss: 2.269050 LR: 0.00002313 +[08:04:42] Epoch: 1 Batch: 2972/20099 (14.79%) Loss: 2.295001 LR: 0.00002313 +[08:04:45] Epoch: 1 Batch: 2973/20099 (14.79%) Loss: 2.220366 LR: 0.00002313 +[08:04:48] Epoch: 1 Batch: 2974/20099 (14.80%) Loss: 2.489001 LR: 0.00002313 +[08:04:51] Epoch: 1 Batch: 2975/20099 (14.80%) Loss: 2.146025 LR: 0.00002318 +[08:04:54] Epoch: 1 Batch: 2976/20099 (14.81%) Loss: 2.411311 LR: 0.00002318 +[08:04:57] Epoch: 1 Batch: 2977/20099 (14.81%) Loss: 2.590576 LR: 0.00002318 +[08:05:00] Epoch: 1 Batch: 2978/20099 (14.82%) Loss: 2.181749 LR: 0.00002318 +[08:05:03] Epoch: 1 Batch: 2979/20099 (14.82%) Loss: 2.212873 LR: 0.00002318 +[08:05:06] Epoch: 1 Batch: 2980/20099 (14.83%) Loss: 2.031998 LR: 0.00002318 +[08:05:09] Epoch: 1 Batch: 2981/20099 (14.83%) Loss: 2.415733 LR: 0.00002318 +[08:05:12] Epoch: 1 Batch: 2982/20099 (14.84%) Loss: 2.192715 LR: 0.00002324 +[08:05:16] Epoch: 1 Batch: 2983/20099 (14.84%) 
Loss: 2.078628 LR: 0.00002324 +[08:05:19] Epoch: 1 Batch: 2984/20099 (14.85%) Loss: 2.042099 LR: 0.00002324 +[08:05:22] Epoch: 1 Batch: 2985/20099 (14.85%) Loss: 2.371159 LR: 0.00002324 +[08:05:25] Epoch: 1 Batch: 2986/20099 (14.86%) Loss: 2.199560 LR: 0.00002324 +[08:05:28] Epoch: 1 Batch: 2987/20099 (14.86%) Loss: 2.232897 LR: 0.00002324 +[08:05:31] Epoch: 1 Batch: 2988/20099 (14.87%) Loss: 2.544611 LR: 0.00002324 +[08:05:34] Epoch: 1 Batch: 2989/20099 (14.87%) Loss: 2.012506 LR: 0.00002329 +[08:05:37] Epoch: 1 Batch: 2990/20099 (14.88%) Loss: 2.353424 LR: 0.00002329 +[08:05:40] Epoch: 1 Batch: 2991/20099 (14.88%) Loss: 1.983872 LR: 0.00002329 +[08:05:43] Epoch: 1 Batch: 2992/20099 (14.89%) Loss: 2.500118 LR: 0.00002329 +[08:05:46] Epoch: 1 Batch: 2993/20099 (14.89%) Loss: 2.224709 LR: 0.00002329 +[08:05:50] Epoch: 1 Batch: 2994/20099 (14.90%) Loss: 2.136726 LR: 0.00002329 +[08:05:53] Epoch: 1 Batch: 2995/20099 (14.90%) Loss: 2.312960 LR: 0.00002329 +[08:05:56] Epoch: 1 Batch: 2996/20099 (14.91%) Loss: 2.221241 LR: 0.00002335 +[08:05:59] Epoch: 1 Batch: 2997/20099 (14.91%) Loss: 2.242181 LR: 0.00002335 +[08:06:02] Epoch: 1 Batch: 2998/20099 (14.92%) Loss: 2.293796 LR: 0.00002335 +[08:06:05] Epoch: 1 Batch: 2999/20099 (14.92%) Loss: 2.156148 LR: 0.00002335 +[08:06:08] >> Evaluating batch 0 +[08:06:09] >> Evaluating batch 1 +[08:06:11] >> Evaluating batch 2 +[08:06:12] >> Evaluating batch 3 +[08:06:13] >> Evaluating batch 4 +[08:06:15] >> Evaluating batch 5 +[08:06:16] >> Evaluating batch 6 +[08:06:17] >> Evaluating batch 7 +[08:06:18] >> Evaluating batch 8 +[08:06:20] >> Evaluating batch 9 +[08:06:21] >> Evaluating batch 10 +[08:06:22] >> Evaluating batch 11 +[08:06:23] >> Evaluating batch 12 +[08:06:24] >> Evaluating batch 13 +[08:06:26] >> Evaluating batch 14 +[08:06:27] >> Evaluating batch 15 +[08:06:28] >> Evaluating batch 16 +[08:06:29] Epoch: 1 Step: 3000/20099 Evaluation: +[08:06:29] Avg Loss Since Last Eval: 2.2333 Val Loss: 2.2879 Validation loss
delta: -0.0258 Perplexity: 9.8541 LR: 0.00002335 +[08:06:32] >> Cleaned up old temp checkpoint: epoch1_step1000 +[08:06:32] >> Temp checkpoint saved: epoch1_step3000, size: 0.1693 GB +[08:06:36] >> Checkpoint saved: epoch1_step3000, size: 0.1693 GB +[08:06:36] Epoch: 1 Batch: 3000/20099 (14.93%) Loss: 2.199824 LR: 0.00002335 +[08:06:39] Epoch: 1 Batch: 3001/20099 (14.93%) Loss: 2.307619 LR: 0.00002335 +[08:06:42] Epoch: 1 Batch: 3002/20099 (14.94%) Loss: 2.225192 LR: 0.00002335 +[08:06:45] Epoch: 1 Batch: 3003/20099 (14.94%) Loss: 2.267109 LR: 0.00002340 +[08:06:48] Epoch: 1 Batch: 3004/20099 (14.95%) Loss: 2.125650 LR: 0.00002340 +[08:06:51] Epoch: 1 Batch: 3005/20099 (14.95%) Loss: 2.208072 LR: 0.00002340 +[08:06:54] Epoch: 1 Batch: 3006/20099 (14.96%) Loss: 2.304340 LR: 0.00002340 +[08:06:57] Epoch: 1 Batch: 3007/20099 (14.96%) Loss: 1.962974 LR: 0.00002340 +[08:07:01] Epoch: 1 Batch: 3008/20099 (14.97%) Loss: 2.343878 LR: 0.00002340 +[08:07:04] Epoch: 1 Batch: 3009/20099 (14.97%) Loss: 2.250842 LR: 0.00002340 +[08:07:07] Epoch: 1 Batch: 3010/20099 (14.98%) Loss: 2.288443 LR: 0.00002345 +[08:07:10] Epoch: 1 Batch: 3011/20099 (14.98%) Loss: 2.469611 LR: 0.00002345 +[08:07:13] Epoch: 1 Batch: 3012/20099 (14.99%) Loss: 1.936887 LR: 0.00002345 +[08:07:16] Epoch: 1 Batch: 3013/20099 (14.99%) Loss: 2.221314 LR: 0.00002345 +[08:07:20] Epoch: 1 Batch: 3014/20099 (15.00%) Loss: 2.451274 LR: 0.00002345 +[08:07:23] Epoch: 1 Batch: 3015/20099 (15.00%) Loss: 2.328000 LR: 0.00002345 +[08:07:26] Epoch: 1 Batch: 3016/20099 (15.01%) Loss: 2.393557 LR: 0.00002345 +[08:07:29] Epoch: 1 Batch: 3017/20099 (15.01%) Loss: 2.428119 LR: 0.00002351 +[08:07:32] Epoch: 1 Batch: 3018/20099 (15.02%) Loss: 2.113833 LR: 0.00002351 +[08:07:35] Epoch: 1 Batch: 3019/20099 (15.02%) Loss: 2.153996 LR: 0.00002351 +[08:07:38] Epoch: 1 Batch: 3020/20099 (15.03%) Loss: 2.380019 LR: 0.00002351 +[08:07:41] Epoch: 1 Batch: 3021/20099 (15.03%) Loss: 2.205778 LR: 0.00002351 +[08:07:44] Epoch: 1 Batch: 
3022/20099 (15.04%) Loss: 2.283350 LR: 0.00002351 +[08:07:47] Epoch: 1 Batch: 3023/20099 (15.04%) Loss: 2.049090 LR: 0.00002351 +[08:07:50] Epoch: 1 Batch: 3024/20099 (15.05%) Loss: 2.165942 LR: 0.00002356 +[08:07:54] Epoch: 1 Batch: 3025/20099 (15.05%) Loss: 2.226965 LR: 0.00002356 +[08:07:57] Epoch: 1 Batch: 3026/20099 (15.06%) Loss: 2.109134 LR: 0.00002356 +[08:08:00] Epoch: 1 Batch: 3027/20099 (15.06%) Loss: 2.382126 LR: 0.00002356 +[08:08:03] Epoch: 1 Batch: 3028/20099 (15.07%) Loss: 2.447420 LR: 0.00002356 +[08:08:06] Epoch: 1 Batch: 3029/20099 (15.07%) Loss: 2.300135 LR: 0.00002356 +[08:08:09] Epoch: 1 Batch: 3030/20099 (15.08%) Loss: 2.153233 LR: 0.00002356 +[08:08:12] Epoch: 1 Batch: 3031/20099 (15.08%) Loss: 2.217435 LR: 0.00002362 +[08:08:15] Epoch: 1 Batch: 3032/20099 (15.09%) Loss: 2.362185 LR: 0.00002362 +[08:08:18] Epoch: 1 Batch: 3033/20099 (15.09%) Loss: 2.509098 LR: 0.00002362 +[08:08:21] Epoch: 1 Batch: 3034/20099 (15.10%) Loss: 2.390411 LR: 0.00002362 +[08:08:24] Epoch: 1 Batch: 3035/20099 (15.10%) Loss: 2.469917 LR: 0.00002362 +[08:08:27] Epoch: 1 Batch: 3036/20099 (15.11%) Loss: 2.149001 LR: 0.00002362 +[08:08:31] Epoch: 1 Batch: 3037/20099 (15.11%) Loss: 2.111990 LR: 0.00002362 +[08:08:34] Epoch: 1 Batch: 3038/20099 (15.12%) Loss: 2.271682 LR: 0.00002367 +[08:08:37] Epoch: 1 Batch: 3039/20099 (15.12%) Loss: 2.540505 LR: 0.00002367 +[08:08:40] Epoch: 1 Batch: 3040/20099 (15.13%) Loss: 2.242901 LR: 0.00002367 +[08:08:43] Epoch: 1 Batch: 3041/20099 (15.13%) Loss: 2.068046 LR: 0.00002367 +[08:08:46] Epoch: 1 Batch: 3042/20099 (15.14%) Loss: 1.705473 LR: 0.00002367 +[08:08:49] Epoch: 1 Batch: 3043/20099 (15.14%) Loss: 2.042801 LR: 0.00002367 +[08:08:52] Epoch: 1 Batch: 3044/20099 (15.15%) Loss: 2.387302 LR: 0.00002367 +[08:08:55] Epoch: 1 Batch: 3045/20099 (15.15%) Loss: 2.112935 LR: 0.00002373 +[08:08:59] Epoch: 1 Batch: 3046/20099 (15.15%) Loss: 2.307239 LR: 0.00002373 +[08:09:02] Epoch: 1 Batch: 3047/20099 (15.16%) Loss: 2.386851 LR: 0.00002373 
+[08:09:05] Epoch: 1 Batch: 3048/20099 (15.16%) Loss: 2.135274 LR: 0.00002373 +[08:09:08] Epoch: 1 Batch: 3049/20099 (15.17%) Loss: 2.035246 LR: 0.00002373 +[08:09:11] Epoch: 1 Batch: 3050/20099 (15.17%) Loss: 2.354911 LR: 0.00002373 +[08:09:14] Epoch: 1 Batch: 3051/20099 (15.18%) Loss: 2.044442 LR: 0.00002373 +[08:09:17] Epoch: 1 Batch: 3052/20099 (15.18%) Loss: 2.397319 LR: 0.00002378 +[08:09:20] Epoch: 1 Batch: 3053/20099 (15.19%) Loss: 2.390864 LR: 0.00002378 +[08:09:23] Epoch: 1 Batch: 3054/20099 (15.19%) Loss: 2.260608 LR: 0.00002378 +[08:09:26] Epoch: 1 Batch: 3055/20099 (15.20%) Loss: 1.911923 LR: 0.00002378 +[08:09:29] Epoch: 1 Batch: 3056/20099 (15.20%) Loss: 2.285467 LR: 0.00002378 +[08:09:33] Epoch: 1 Batch: 3057/20099 (15.21%) Loss: 1.924426 LR: 0.00002378 +[08:09:36] Epoch: 1 Batch: 3058/20099 (15.21%) Loss: 2.149227 LR: 0.00002378 +[08:09:39] Epoch: 1 Batch: 3059/20099 (15.22%) Loss: 2.127029 LR: 0.00002384 +[08:09:42] Epoch: 1 Batch: 3060/20099 (15.22%) Loss: 2.166077 LR: 0.00002384 +[08:09:45] Epoch: 1 Batch: 3061/20099 (15.23%) Loss: 2.244812 LR: 0.00002384 +[08:09:48] Epoch: 1 Batch: 3062/20099 (15.23%) Loss: 1.982675 LR: 0.00002384 +[08:09:51] Epoch: 1 Batch: 3063/20099 (15.24%) Loss: 2.080415 LR: 0.00002384 +[08:09:54] Epoch: 1 Batch: 3064/20099 (15.24%) Loss: 2.552805 LR: 0.00002384 +[08:09:57] Epoch: 1 Batch: 3065/20099 (15.25%) Loss: 2.230253 LR: 0.00002384 +[08:10:00] Epoch: 1 Batch: 3066/20099 (15.25%) Loss: 2.175795 LR: 0.00002389 +[08:10:03] Epoch: 1 Batch: 3067/20099 (15.26%) Loss: 2.346294 LR: 0.00002389 +[08:10:06] Epoch: 1 Batch: 3068/20099 (15.26%) Loss: 2.244041 LR: 0.00002389 +[08:10:10] Epoch: 1 Batch: 3069/20099 (15.27%) Loss: 2.349112 LR: 0.00002389 +[08:10:13] Epoch: 1 Batch: 3070/20099 (15.27%) Loss: 2.367264 LR: 0.00002389 +[08:10:16] Epoch: 1 Batch: 3071/20099 (15.28%) Loss: 2.385255 LR: 0.00002389 +[08:10:19] Epoch: 1 Batch: 3072/20099 (15.28%) Loss: 2.295088 LR: 0.00002389 +[08:10:22] Epoch: 1 Batch: 3073/20099 (15.29%) 
Loss: 2.194893 LR: 0.00002395 +[08:10:25] Epoch: 1 Batch: 3074/20099 (15.29%) Loss: 2.194145 LR: 0.00002395 +[08:10:28] Epoch: 1 Batch: 3075/20099 (15.30%) Loss: 2.514250 LR: 0.00002395 +[08:10:31] Epoch: 1 Batch: 3076/20099 (15.30%) Loss: 2.096946 LR: 0.00002395 +[08:10:34] Epoch: 1 Batch: 3077/20099 (15.31%) Loss: 2.318950 LR: 0.00002395 +[08:10:37] Epoch: 1 Batch: 3078/20099 (15.31%) Loss: 1.818635 LR: 0.00002395 +[08:10:40] Epoch: 1 Batch: 3079/20099 (15.32%) Loss: 2.224263 LR: 0.00002395 +[08:10:44] Epoch: 1 Batch: 3080/20099 (15.32%) Loss: 2.088881 LR: 0.00002400 +[08:10:47] Epoch: 1 Batch: 3081/20099 (15.33%) Loss: 1.978843 LR: 0.00002400 +[08:10:50] Epoch: 1 Batch: 3082/20099 (15.33%) Loss: 2.362289 LR: 0.00002400 +[08:10:53] Epoch: 1 Batch: 3083/20099 (15.34%) Loss: 2.130124 LR: 0.00002400 +[08:10:56] Epoch: 1 Batch: 3084/20099 (15.34%) Loss: 1.874016 LR: 0.00002400 +[08:10:59] Epoch: 1 Batch: 3085/20099 (15.35%) Loss: 2.318212 LR: 0.00002400 +[08:11:02] Epoch: 1 Batch: 3086/20099 (15.35%) Loss: 2.272377 LR: 0.00002400 +[08:11:05] Epoch: 1 Batch: 3087/20099 (15.36%) Loss: 2.353722 LR: 0.00002405 +[08:11:08] Epoch: 1 Batch: 3088/20099 (15.36%) Loss: 2.172667 LR: 0.00002405 +[08:11:11] Epoch: 1 Batch: 3089/20099 (15.37%) Loss: 1.596799 LR: 0.00002405 +[08:11:14] Epoch: 1 Batch: 3090/20099 (15.37%) Loss: 2.430948 LR: 0.00002405 +[08:11:18] Epoch: 1 Batch: 3091/20099 (15.38%) Loss: 2.144435 LR: 0.00002405 +[08:11:21] Epoch: 1 Batch: 3092/20099 (15.38%) Loss: 2.083289 LR: 0.00002405 +[08:11:24] Epoch: 1 Batch: 3093/20099 (15.39%) Loss: 2.255977 LR: 0.00002405 +[08:11:27] Epoch: 1 Batch: 3094/20099 (15.39%) Loss: 2.125531 LR: 0.00002411 +[08:11:30] Epoch: 1 Batch: 3095/20099 (15.40%) Loss: 2.282505 LR: 0.00002411 +[08:11:33] Epoch: 1 Batch: 3096/20099 (15.40%) Loss: 2.220950 LR: 0.00002411 +[08:11:36] Epoch: 1 Batch: 3097/20099 (15.41%) Loss: 2.466730 LR: 0.00002411 +[08:11:39] Epoch: 1 Batch: 3098/20099 (15.41%) Loss: 2.058371 LR: 0.00002411 +[08:11:42] Epoch: 
1 Batch: 3099/20099 (15.42%) Loss: 1.885173 LR: 0.00002411 +[08:11:45] Epoch: 1 Batch: 3100/20099 (15.42%) Loss: 1.973068 LR: 0.00002411 +[08:11:48] Epoch: 1 Batch: 3101/20099 (15.43%) Loss: 2.161209 LR: 0.00002416 +[08:11:52] Epoch: 1 Batch: 3102/20099 (15.43%) Loss: 2.293001 LR: 0.00002416 +[08:11:55] Epoch: 1 Batch: 3103/20099 (15.44%) Loss: 2.125030 LR: 0.00002416 +[08:11:58] Epoch: 1 Batch: 3104/20099 (15.44%) Loss: 2.247079 LR: 0.00002416 +[08:12:01] Epoch: 1 Batch: 3105/20099 (15.45%) Loss: 2.196653 LR: 0.00002416 +[08:12:04] Epoch: 1 Batch: 3106/20099 (15.45%) Loss: 2.399161 LR: 0.00002416 +[08:12:07] Epoch: 1 Batch: 3107/20099 (15.46%) Loss: 2.207570 LR: 0.00002416 +[08:12:10] Epoch: 1 Batch: 3108/20099 (15.46%) Loss: 2.015174 LR: 0.00002422 +[08:12:13] Epoch: 1 Batch: 3109/20099 (15.47%) Loss: 2.462596 LR: 0.00002422 +[08:12:16] Epoch: 1 Batch: 3110/20099 (15.47%) Loss: 2.028267 LR: 0.00002422 +[08:12:19] Epoch: 1 Batch: 3111/20099 (15.48%) Loss: 2.070783 LR: 0.00002422 +[08:12:23] Epoch: 1 Batch: 3112/20099 (15.48%) Loss: 2.072525 LR: 0.00002422 +[08:12:26] Epoch: 1 Batch: 3113/20099 (15.49%) Loss: 2.551221 LR: 0.00002422 +[08:12:29] Epoch: 1 Batch: 3114/20099 (15.49%) Loss: 1.897044 LR: 0.00002422 +[08:12:32] Epoch: 1 Batch: 3115/20099 (15.50%) Loss: 2.038540 LR: 0.00002427 +[08:12:35] Epoch: 1 Batch: 3116/20099 (15.50%) Loss: 2.208086 LR: 0.00002427 +[08:12:38] Epoch: 1 Batch: 3117/20099 (15.51%) Loss: 2.125078 LR: 0.00002427 +[08:12:41] Epoch: 1 Batch: 3118/20099 (15.51%) Loss: 2.307572 LR: 0.00002427 +[08:12:44] Epoch: 1 Batch: 3119/20099 (15.52%) Loss: 2.427942 LR: 0.00002427 +[08:12:47] Epoch: 1 Batch: 3120/20099 (15.52%) Loss: 2.024485 LR: 0.00002427 +[08:12:50] Epoch: 1 Batch: 3121/20099 (15.53%) Loss: 2.248472 LR: 0.00002427 +[08:12:53] Epoch: 1 Batch: 3122/20099 (15.53%) Loss: 2.021040 LR: 0.00002433 +[08:12:57] Epoch: 1 Batch: 3123/20099 (15.54%) Loss: 2.305498 LR: 0.00002433 +[08:13:00] Epoch: 1 Batch: 3124/20099 (15.54%) Loss: 2.127471 LR: 
0.00002433 +[08:13:03] Epoch: 1 Batch: 3125/20099 (15.55%) Loss: 2.138189 LR: 0.00002433 +[08:13:06] Epoch: 1 Batch: 3126/20099 (15.55%) Loss: 2.219580 LR: 0.00002433 +[08:13:09] Epoch: 1 Batch: 3127/20099 (15.56%) Loss: 2.137914 LR: 0.00002433 +[08:13:12] Epoch: 1 Batch: 3128/20099 (15.56%) Loss: 2.376242 LR: 0.00002433 +[08:13:15] Epoch: 1 Batch: 3129/20099 (15.57%) Loss: 2.276161 LR: 0.00002438 +[08:13:18] Epoch: 1 Batch: 3130/20099 (15.57%) Loss: 2.274749 LR: 0.00002438 +[08:13:21] Epoch: 1 Batch: 3131/20099 (15.58%) Loss: 2.294725 LR: 0.00002438 +[08:13:24] Epoch: 1 Batch: 3132/20099 (15.58%) Loss: 2.339610 LR: 0.00002438 +[08:13:28] Epoch: 1 Batch: 3133/20099 (15.59%) Loss: 2.106100 LR: 0.00002438 +[08:13:31] Epoch: 1 Batch: 3134/20099 (15.59%) Loss: 2.219691 LR: 0.00002438 +[08:13:34] Epoch: 1 Batch: 3135/20099 (15.60%) Loss: 2.061401 LR: 0.00002438 +[08:13:37] Epoch: 1 Batch: 3136/20099 (15.60%) Loss: 2.333879 LR: 0.00002444 +[08:13:40] Epoch: 1 Batch: 3137/20099 (15.61%) Loss: 2.106811 LR: 0.00002444 +[08:13:43] Epoch: 1 Batch: 3138/20099 (15.61%) Loss: 2.165758 LR: 0.00002444 +[08:13:46] Epoch: 1 Batch: 3139/20099 (15.62%) Loss: 2.062826 LR: 0.00002444 +[08:13:49] Epoch: 1 Batch: 3140/20099 (15.62%) Loss: 2.316462 LR: 0.00002444 +[08:13:52] Epoch: 1 Batch: 3141/20099 (15.63%) Loss: 2.588897 LR: 0.00002444 +[08:13:55] Epoch: 1 Batch: 3142/20099 (15.63%) Loss: 2.086696 LR: 0.00002444 +[08:13:59] Epoch: 1 Batch: 3143/20099 (15.64%) Loss: 2.052127 LR: 0.00002449 +[08:14:02] Epoch: 1 Batch: 3144/20099 (15.64%) Loss: 2.120579 LR: 0.00002449 +[08:14:05] Epoch: 1 Batch: 3145/20099 (15.65%) Loss: 2.370177 LR: 0.00002449 +[08:14:08] Epoch: 1 Batch: 3146/20099 (15.65%) Loss: 2.325802 LR: 0.00002449 +[08:14:11] Epoch: 1 Batch: 3147/20099 (15.66%) Loss: 2.037673 LR: 0.00002449 +[08:14:14] Epoch: 1 Batch: 3148/20099 (15.66%) Loss: 2.185206 LR: 0.00002449 +[08:14:17] Epoch: 1 Batch: 3149/20099 (15.67%) Loss: 2.031933 LR: 0.00002449 +[08:14:20] Epoch: 1 Batch: 3150/20099 
(15.67%) Loss: 2.014289 LR: 0.00002455 +[08:14:23] Epoch: 1 Batch: 3151/20099 (15.68%) Loss: 2.119134 LR: 0.00002455 +[08:14:26] Epoch: 1 Batch: 3152/20099 (15.68%) Loss: 2.184337 LR: 0.00002455 +[08:14:30] Epoch: 1 Batch: 3153/20099 (15.69%) Loss: 2.316284 LR: 0.00002455 +[08:14:33] Epoch: 1 Batch: 3154/20099 (15.69%) Loss: 2.250019 LR: 0.00002455 +[08:14:36] Epoch: 1 Batch: 3155/20099 (15.70%) Loss: 2.089617 LR: 0.00002455 +[08:14:39] Epoch: 1 Batch: 3156/20099 (15.70%) Loss: 2.554235 LR: 0.00002455 +[08:14:42] Epoch: 1 Batch: 3157/20099 (15.71%) Loss: 2.118400 LR: 0.00002460 +[08:14:45] Epoch: 1 Batch: 3158/20099 (15.71%) Loss: 2.266126 LR: 0.00002460 +[08:14:48] Epoch: 1 Batch: 3159/20099 (15.72%) Loss: 2.194458 LR: 0.00002460 +[08:14:51] Epoch: 1 Batch: 3160/20099 (15.72%) Loss: 2.185789 LR: 0.00002460 +[08:14:54] Epoch: 1 Batch: 3161/20099 (15.73%) Loss: 2.032192 LR: 0.00002460 +[08:14:58] Epoch: 1 Batch: 3162/20099 (15.73%) Loss: 2.178215 LR: 0.00002460 +[08:15:01] Epoch: 1 Batch: 3163/20099 (15.74%) Loss: 2.117066 LR: 0.00002460 +[08:15:04] Epoch: 1 Batch: 3164/20099 (15.74%) Loss: 2.210169 LR: 0.00002465 +[08:15:07] Epoch: 1 Batch: 3165/20099 (15.75%) Loss: 2.384284 LR: 0.00002465 +[08:15:10] Epoch: 1 Batch: 3166/20099 (15.75%) Loss: 2.226386 LR: 0.00002465 +[08:15:13] Epoch: 1 Batch: 3167/20099 (15.76%) Loss: 1.815578 LR: 0.00002465 +[08:15:16] Epoch: 1 Batch: 3168/20099 (15.76%) Loss: 2.404747 LR: 0.00002465 +[08:15:19] Epoch: 1 Batch: 3169/20099 (15.77%) Loss: 2.367325 LR: 0.00002465 +[08:15:22] Epoch: 1 Batch: 3170/20099 (15.77%) Loss: 2.200128 LR: 0.00002465 +[08:15:25] Epoch: 1 Batch: 3171/20099 (15.78%) Loss: 2.167218 LR: 0.00002471 +[08:15:29] Epoch: 1 Batch: 3172/20099 (15.78%) Loss: 1.992056 LR: 0.00002471 +[08:15:32] Epoch: 1 Batch: 3173/20099 (15.79%) Loss: 2.458081 LR: 0.00002471 +[08:15:35] Epoch: 1 Batch: 3174/20099 (15.79%) Loss: 1.964258 LR: 0.00002471 +[08:15:38] Epoch: 1 Batch: 3175/20099 (15.80%) Loss: 2.072972 LR: 0.00002471 
+[08:15:41] Epoch: 1 Batch: 3176/20099 (15.80%) Loss: 2.016938 LR: 0.00002471 +[08:15:44] Epoch: 1 Batch: 3177/20099 (15.81%) Loss: 2.427017 LR: 0.00002471 +[08:15:47] Epoch: 1 Batch: 3178/20099 (15.81%) Loss: 2.418317 LR: 0.00002476 +[08:15:50] Epoch: 1 Batch: 3179/20099 (15.82%) Loss: 2.087599 LR: 0.00002476 +[08:15:53] Epoch: 1 Batch: 3180/20099 (15.82%) Loss: 2.328664 LR: 0.00002476 +[08:15:56] Epoch: 1 Batch: 3181/20099 (15.83%) Loss: 1.829847 LR: 0.00002476 +[08:15:59] Epoch: 1 Batch: 3182/20099 (15.83%) Loss: 2.416254 LR: 0.00002476 +[08:16:03] Epoch: 1 Batch: 3183/20099 (15.84%) Loss: 2.260193 LR: 0.00002476 +[08:16:06] Epoch: 1 Batch: 3184/20099 (15.84%) Loss: 2.273007 LR: 0.00002476 +[08:16:09] Epoch: 1 Batch: 3185/20099 (15.85%) Loss: 2.192671 LR: 0.00002482 +[08:16:12] Epoch: 1 Batch: 3186/20099 (15.85%) Loss: 2.409903 LR: 0.00002482 +[08:16:15] Epoch: 1 Batch: 3187/20099 (15.86%) Loss: 2.251062 LR: 0.00002482 +[08:16:18] Epoch: 1 Batch: 3188/20099 (15.86%) Loss: 2.050327 LR: 0.00002482 +[08:16:21] Epoch: 1 Batch: 3189/20099 (15.87%) Loss: 2.240115 LR: 0.00002482 +[08:16:24] Epoch: 1 Batch: 3190/20099 (15.87%) Loss: 2.325598 LR: 0.00002482 +[08:16:27] Epoch: 1 Batch: 3191/20099 (15.88%) Loss: 1.820946 LR: 0.00002482 +[08:16:30] Epoch: 1 Batch: 3192/20099 (15.88%) Loss: 2.296700 LR: 0.00002487 +[08:16:33] Epoch: 1 Batch: 3193/20099 (15.89%) Loss: 2.241825 LR: 0.00002487 +[08:16:37] Epoch: 1 Batch: 3194/20099 (15.89%) Loss: 2.213295 LR: 0.00002487 +[08:16:40] Epoch: 1 Batch: 3195/20099 (15.90%) Loss: 2.393660 LR: 0.00002487 +[08:16:43] Epoch: 1 Batch: 3196/20099 (15.90%) Loss: 2.503264 LR: 0.00002487 +[08:16:46] Epoch: 1 Batch: 3197/20099 (15.91%) Loss: 2.088830 LR: 0.00002487 +[08:16:49] Epoch: 1 Batch: 3198/20099 (15.91%) Loss: 2.161597 LR: 0.00002487 +[08:16:52] Epoch: 1 Batch: 3199/20099 (15.92%) Loss: 2.496126 LR: 0.00002493 +[08:16:59] >> Cleaned up old temp checkpoint: epoch1_step1200 +[08:16:59] >> Temp checkpoint saved: epoch1_step3200, size: 
0.1693 GB +[08:16:59] Epoch: 1 Batch: 3200/20099 (15.92%) Loss: 2.228915 LR: 0.00002493 +[08:17:02] Epoch: 1 Batch: 3201/20099 (15.93%) Loss: 1.972764 LR: 0.00002493 +[08:17:05] Epoch: 1 Batch: 3202/20099 (15.93%) Loss: 2.296057 LR: 0.00002493 +[08:17:08] Epoch: 1 Batch: 3203/20099 (15.94%) Loss: 2.171824 LR: 0.00002493 +[08:17:11] Epoch: 1 Batch: 3204/20099 (15.94%) Loss: 2.219774 LR: 0.00002493 +[08:17:14] Epoch: 1 Batch: 3205/20099 (15.95%) Loss: 2.027104 LR: 0.00002493 +[08:17:17] Epoch: 1 Batch: 3206/20099 (15.95%) Loss: 2.078878 LR: 0.00002498 +[08:17:20] Epoch: 1 Batch: 3207/20099 (15.96%) Loss: 2.287446 LR: 0.00002498 +[08:17:23] Epoch: 1 Batch: 3208/20099 (15.96%) Loss: 2.366476 LR: 0.00002498 +[08:17:26] Epoch: 1 Batch: 3209/20099 (15.97%) Loss: 2.205447 LR: 0.00002498 +[08:17:30] Epoch: 1 Batch: 3210/20099 (15.97%) Loss: 2.126209 LR: 0.00002498 +[08:17:33] Epoch: 1 Batch: 3211/20099 (15.98%) Loss: 2.137474 LR: 0.00002498 +[08:17:36] Epoch: 1 Batch: 3212/20099 (15.98%) Loss: 2.043741 LR: 0.00002498 +[08:17:39] Epoch: 1 Batch: 3213/20099 (15.99%) Loss: 2.211807 LR: 0.00002504 +[08:17:42] Epoch: 1 Batch: 3214/20099 (15.99%) Loss: 1.988102 LR: 0.00002504 +[08:17:45] Epoch: 1 Batch: 3215/20099 (16.00%) Loss: 2.343780 LR: 0.00002504 +[08:17:48] Epoch: 1 Batch: 3216/20099 (16.00%) Loss: 2.451816 LR: 0.00002504 +[08:17:51] Epoch: 1 Batch: 3217/20099 (16.01%) Loss: 2.351874 LR: 0.00002504 +[08:17:54] Epoch: 1 Batch: 3218/20099 (16.01%) Loss: 2.324056 LR: 0.00002504 +[08:17:58] Epoch: 1 Batch: 3219/20099 (16.02%) Loss: 1.826231 LR: 0.00002504 +[08:18:01] Epoch: 1 Batch: 3220/20099 (16.02%) Loss: 2.196962 LR: 0.00002509 +[08:18:04] Epoch: 1 Batch: 3221/20099 (16.03%) Loss: 2.185053 LR: 0.00002509 +[08:18:07] Epoch: 1 Batch: 3222/20099 (16.03%) Loss: 2.171045 LR: 0.00002509 +[08:18:10] Epoch: 1 Batch: 3223/20099 (16.04%) Loss: 2.294142 LR: 0.00002509 +[08:18:13] Epoch: 1 Batch: 3224/20099 (16.04%) Loss: 2.149269 LR: 0.00002509 +[08:18:16] Epoch: 1 Batch: 3225/20099 
(16.05%) Loss: 2.056018 LR: 0.00002509 +[08:18:19] Epoch: 1 Batch: 3226/20099 (16.05%) Loss: 1.805209 LR: 0.00002509 +[08:18:22] Epoch: 1 Batch: 3227/20099 (16.06%) Loss: 2.427785 LR: 0.00002515 +[08:18:25] Epoch: 1 Batch: 3228/20099 (16.06%) Loss: 2.267134 LR: 0.00002515 +[08:18:28] Epoch: 1 Batch: 3229/20099 (16.07%) Loss: 2.239864 LR: 0.00002515 +[08:18:32] Epoch: 1 Batch: 3230/20099 (16.07%) Loss: 2.706387 LR: 0.00002515 +[08:18:35] Epoch: 1 Batch: 3231/20099 (16.08%) Loss: 2.232271 LR: 0.00002515 +[08:18:38] Epoch: 1 Batch: 3232/20099 (16.08%) Loss: 2.197361 LR: 0.00002515 +[08:18:41] Epoch: 1 Batch: 3233/20099 (16.09%) Loss: 2.401169 LR: 0.00002515 +[08:18:44] Epoch: 1 Batch: 3234/20099 (16.09%) Loss: 2.203218 LR: 0.00002520 +[08:18:47] Epoch: 1 Batch: 3235/20099 (16.10%) Loss: 2.177223 LR: 0.00002520 +[08:18:50] Epoch: 1 Batch: 3236/20099 (16.10%) Loss: 2.145628 LR: 0.00002520 +[08:18:53] Epoch: 1 Batch: 3237/20099 (16.11%) Loss: 2.135652 LR: 0.00002520 +[08:18:56] Epoch: 1 Batch: 3238/20099 (16.11%) Loss: 2.304856 LR: 0.00002520 +[08:18:59] Epoch: 1 Batch: 3239/20099 (16.12%) Loss: 2.502804 LR: 0.00002520 +[08:19:02] Epoch: 1 Batch: 3240/20099 (16.12%) Loss: 1.978871 LR: 0.00002520 +[08:19:06] Epoch: 1 Batch: 3241/20099 (16.13%) Loss: 2.669682 LR: 0.00002525 +[08:19:09] Epoch: 1 Batch: 3242/20099 (16.13%) Loss: 2.258694 LR: 0.00002525 +[08:19:12] Epoch: 1 Batch: 3243/20099 (16.14%) Loss: 2.310532 LR: 0.00002525 +[08:19:15] Epoch: 1 Batch: 3244/20099 (16.14%) Loss: 2.148279 LR: 0.00002525 +[08:19:18] Epoch: 1 Batch: 3245/20099 (16.15%) Loss: 2.139233 LR: 0.00002525 +[08:19:21] Epoch: 1 Batch: 3246/20099 (16.15%) Loss: 2.169776 LR: 0.00002525 +[08:19:24] Epoch: 1 Batch: 3247/20099 (16.16%) Loss: 2.459082 LR: 0.00002525 +[08:19:27] Epoch: 1 Batch: 3248/20099 (16.16%) Loss: 2.473167 LR: 0.00002531 +[08:19:30] Epoch: 1 Batch: 3249/20099 (16.16%) Loss: 2.211891 LR: 0.00002531 +[08:19:33] Epoch: 1 Batch: 3250/20099 (16.17%) Loss: 2.473466 LR: 0.00002531 
+[08:19:36] Epoch: 1 Batch: 3251/20099 (16.17%) Loss: 2.144229 LR: 0.00002531 +[08:19:40] Epoch: 1 Batch: 3252/20099 (16.18%) Loss: 1.884536 LR: 0.00002531 +[08:19:43] Epoch: 1 Batch: 3253/20099 (16.18%) Loss: 2.202050 LR: 0.00002531 +[08:19:46] Epoch: 1 Batch: 3254/20099 (16.19%) Loss: 1.972501 LR: 0.00002531 +[08:19:49] Epoch: 1 Batch: 3255/20099 (16.19%) Loss: 2.304303 LR: 0.00002536 +[08:19:52] Epoch: 1 Batch: 3256/20099 (16.20%) Loss: 2.319896 LR: 0.00002536 +[08:19:55] Epoch: 1 Batch: 3257/20099 (16.20%) Loss: 1.875923 LR: 0.00002536 +[08:19:58] Epoch: 1 Batch: 3258/20099 (16.21%) Loss: 2.310294 LR: 0.00002536 +[08:20:01] Epoch: 1 Batch: 3259/20099 (16.21%) Loss: 2.229076 LR: 0.00002536 +[08:20:04] Epoch: 1 Batch: 3260/20099 (16.22%) Loss: 2.146106 LR: 0.00002536 +[08:20:07] Epoch: 1 Batch: 3261/20099 (16.22%) Loss: 2.059092 LR: 0.00002536 +[08:20:10] Epoch: 1 Batch: 3262/20099 (16.23%) Loss: 1.875203 LR: 0.00002542 +[08:20:14] Epoch: 1 Batch: 3263/20099 (16.23%) Loss: 2.673416 LR: 0.00002542 +[08:20:17] Epoch: 1 Batch: 3264/20099 (16.24%) Loss: 2.305681 LR: 0.00002542 +[08:20:20] Epoch: 1 Batch: 3265/20099 (16.24%) Loss: 2.082320 LR: 0.00002542 +[08:20:23] Epoch: 1 Batch: 3266/20099 (16.25%) Loss: 2.252251 LR: 0.00002542 +[08:20:26] Epoch: 1 Batch: 3267/20099 (16.25%) Loss: 2.418186 LR: 0.00002542 +[08:20:29] Epoch: 1 Batch: 3268/20099 (16.26%) Loss: 2.332343 LR: 0.00002542 +[08:20:32] Epoch: 1 Batch: 3269/20099 (16.26%) Loss: 2.209190 LR: 0.00002547 +[08:20:35] Epoch: 1 Batch: 3270/20099 (16.27%) Loss: 2.297785 LR: 0.00002547 +[08:20:38] Epoch: 1 Batch: 3271/20099 (16.27%) Loss: 1.944709 LR: 0.00002547 +[08:20:41] Epoch: 1 Batch: 3272/20099 (16.28%) Loss: 2.040693 LR: 0.00002547 +[08:20:45] Epoch: 1 Batch: 3273/20099 (16.28%) Loss: 2.262236 LR: 0.00002547 +[08:20:48] Epoch: 1 Batch: 3274/20099 (16.29%) Loss: 2.418095 LR: 0.00002547 +[08:20:51] Epoch: 1 Batch: 3275/20099 (16.29%) Loss: 2.186109 LR: 0.00002547 +[08:20:54] Epoch: 1 Batch: 3276/20099 (16.30%) 
Loss: 2.256920 LR: 0.00002553 +[08:20:57] Epoch: 1 Batch: 3277/20099 (16.30%) Loss: 2.164545 LR: 0.00002553 +[08:21:00] Epoch: 1 Batch: 3278/20099 (16.31%) Loss: 2.238378 LR: 0.00002553 +[08:21:03] Epoch: 1 Batch: 3279/20099 (16.31%) Loss: 1.811388 LR: 0.00002553 +[08:21:06] Epoch: 1 Batch: 3280/20099 (16.32%) Loss: 2.417852 LR: 0.00002553 +[08:21:09] Epoch: 1 Batch: 3281/20099 (16.32%) Loss: 2.371376 LR: 0.00002553 +[08:21:12] Epoch: 1 Batch: 3282/20099 (16.33%) Loss: 2.424272 LR: 0.00002553 +[08:21:15] Epoch: 1 Batch: 3283/20099 (16.33%) Loss: 2.481582 LR: 0.00002558 +[08:21:19] Epoch: 1 Batch: 3284/20099 (16.34%) Loss: 2.331858 LR: 0.00002558 +[08:21:22] Epoch: 1 Batch: 3285/20099 (16.34%) Loss: 2.479770 LR: 0.00002558 +[08:21:25] Epoch: 1 Batch: 3286/20099 (16.35%) Loss: 2.208421 LR: 0.00002558 +[08:21:28] Epoch: 1 Batch: 3287/20099 (16.35%) Loss: 2.593747 LR: 0.00002558 +[08:21:31] Epoch: 1 Batch: 3288/20099 (16.36%) Loss: 2.121921 LR: 0.00002558 +[08:21:34] Epoch: 1 Batch: 3289/20099 (16.36%) Loss: 2.235387 LR: 0.00002558 +[08:21:37] Epoch: 1 Batch: 3290/20099 (16.37%) Loss: 2.439532 LR: 0.00002564 +[08:21:40] Epoch: 1 Batch: 3291/20099 (16.37%) Loss: 1.818057 LR: 0.00002564 +[08:21:43] Epoch: 1 Batch: 3292/20099 (16.38%) Loss: 2.206335 LR: 0.00002564 +[08:21:46] Epoch: 1 Batch: 3293/20099 (16.38%) Loss: 1.876814 LR: 0.00002564 +[08:21:50] Epoch: 1 Batch: 3294/20099 (16.39%) Loss: 2.249139 LR: 0.00002564 +[08:21:53] Epoch: 1 Batch: 3295/20099 (16.39%) Loss: 2.262602 LR: 0.00002564 +[08:21:56] Epoch: 1 Batch: 3296/20099 (16.40%) Loss: 2.210190 LR: 0.00002564 +[08:21:59] Epoch: 1 Batch: 3297/20099 (16.40%) Loss: 2.296537 LR: 0.00002569 +[08:22:02] Epoch: 1 Batch: 3298/20099 (16.41%) Loss: 2.151994 LR: 0.00002569 +[08:22:05] Epoch: 1 Batch: 3299/20099 (16.41%) Loss: 2.066785 LR: 0.00002569 +[08:22:08] Epoch: 1 Batch: 3300/20099 (16.42%) Loss: 2.600168 LR: 0.00002569 +[08:22:11] Epoch: 1 Batch: 3301/20099 (16.42%) Loss: 2.211089 LR: 0.00002569 +[08:22:14] Epoch: 
1 Batch: 3302/20099 (16.43%) Loss: 2.063712 LR: 0.00002569 +[08:22:17] Epoch: 1 Batch: 3303/20099 (16.43%) Loss: 2.173796 LR: 0.00002569 +[08:22:21] Epoch: 1 Batch: 3304/20099 (16.44%) Loss: 2.278502 LR: 0.00002575 +[08:22:24] Epoch: 1 Batch: 3305/20099 (16.44%) Loss: 1.950620 LR: 0.00002575 +[08:22:27] Epoch: 1 Batch: 3306/20099 (16.45%) Loss: 2.147433 LR: 0.00002575 +[08:22:30] Epoch: 1 Batch: 3307/20099 (16.45%) Loss: 1.689042 LR: 0.00002575 +[08:22:33] Epoch: 1 Batch: 3308/20099 (16.46%) Loss: 2.330981 LR: 0.00002575 +[08:22:36] Epoch: 1 Batch: 3309/20099 (16.46%) Loss: 2.535165 LR: 0.00002575 +[08:22:39] Epoch: 1 Batch: 3310/20099 (16.47%) Loss: 2.450668 LR: 0.00002575 +[08:22:42] Epoch: 1 Batch: 3311/20099 (16.47%) Loss: 2.302010 LR: 0.00002580 +[08:22:45] Epoch: 1 Batch: 3312/20099 (16.48%) Loss: 1.969719 LR: 0.00002580 +[08:22:48] Epoch: 1 Batch: 3313/20099 (16.48%) Loss: 2.336655 LR: 0.00002580 +[08:22:51] Epoch: 1 Batch: 3314/20099 (16.49%) Loss: 1.979051 LR: 0.00002580 +[08:22:55] Epoch: 1 Batch: 3315/20099 (16.49%) Loss: 2.140113 LR: 0.00002580 +[08:22:58] Epoch: 1 Batch: 3316/20099 (16.50%) Loss: 2.601040 LR: 0.00002580 +[08:23:01] Epoch: 1 Batch: 3317/20099 (16.50%) Loss: 2.401024 LR: 0.00002580 +[08:23:04] Epoch: 1 Batch: 3318/20099 (16.51%) Loss: 2.063126 LR: 0.00002585 +[08:23:07] Epoch: 1 Batch: 3319/20099 (16.51%) Loss: 2.297027 LR: 0.00002585 +[08:23:10] Epoch: 1 Batch: 3320/20099 (16.52%) Loss: 2.338523 LR: 0.00002585 +[08:23:13] Epoch: 1 Batch: 3321/20099 (16.52%) Loss: 2.481900 LR: 0.00002585 +[08:23:16] Epoch: 1 Batch: 3322/20099 (16.53%) Loss: 2.266592 LR: 0.00002585 +[08:23:19] Epoch: 1 Batch: 3323/20099 (16.53%) Loss: 2.355905 LR: 0.00002585 +[08:23:22] Epoch: 1 Batch: 3324/20099 (16.54%) Loss: 2.291657 LR: 0.00002585 +[08:23:26] Epoch: 1 Batch: 3325/20099 (16.54%) Loss: 1.713855 LR: 0.00002591 +[08:23:29] Epoch: 1 Batch: 3326/20099 (16.55%) Loss: 2.226881 LR: 0.00002591 +[08:23:32] Epoch: 1 Batch: 3327/20099 (16.55%) Loss: 2.023180 LR: 
0.00002591 +[08:23:35] Epoch: 1 Batch: 3328/20099 (16.56%) Loss: 2.457410 LR: 0.00002591 +[08:23:38] Epoch: 1 Batch: 3329/20099 (16.56%) Loss: 2.049907 LR: 0.00002591 +[08:23:41] Epoch: 1 Batch: 3330/20099 (16.57%) Loss: 2.331043 LR: 0.00002591 +[08:23:44] Epoch: 1 Batch: 3331/20099 (16.57%) Loss: 2.070821 LR: 0.00002591 +[08:23:47] Epoch: 1 Batch: 3332/20099 (16.58%) Loss: 2.320089 LR: 0.00002596 +[08:23:50] Epoch: 1 Batch: 3333/20099 (16.58%) Loss: 2.089374 LR: 0.00002596 +[08:23:54] Epoch: 1 Batch: 3334/20099 (16.59%) Loss: 2.141730 LR: 0.00002596 +[08:23:57] Epoch: 1 Batch: 3335/20099 (16.59%) Loss: 2.202426 LR: 0.00002596 +[08:24:00] Epoch: 1 Batch: 3336/20099 (16.60%) Loss: 1.980498 LR: 0.00002596 +[08:24:03] Epoch: 1 Batch: 3337/20099 (16.60%) Loss: 2.036700 LR: 0.00002596 +[08:24:06] Epoch: 1 Batch: 3338/20099 (16.61%) Loss: 2.421759 LR: 0.00002596 +[08:24:09] Epoch: 1 Batch: 3339/20099 (16.61%) Loss: 1.834824 LR: 0.00002602 +[08:24:12] Epoch: 1 Batch: 3340/20099 (16.62%) Loss: 2.006632 LR: 0.00002602 +[08:24:15] Epoch: 1 Batch: 3341/20099 (16.62%) Loss: 2.230856 LR: 0.00002602 +[08:24:18] Epoch: 1 Batch: 3342/20099 (16.63%) Loss: 2.102738 LR: 0.00002602 +[08:24:21] Epoch: 1 Batch: 3343/20099 (16.63%) Loss: 1.945235 LR: 0.00002602 +[08:24:24] Epoch: 1 Batch: 3344/20099 (16.64%) Loss: 2.309559 LR: 0.00002602 +[08:24:28] Epoch: 1 Batch: 3345/20099 (16.64%) Loss: 2.515952 LR: 0.00002602 +[08:24:31] Epoch: 1 Batch: 3346/20099 (16.65%) Loss: 2.193229 LR: 0.00002607 +[08:24:34] Epoch: 1 Batch: 3347/20099 (16.65%) Loss: 2.544253 LR: 0.00002607 +[08:24:37] Epoch: 1 Batch: 3348/20099 (16.66%) Loss: 2.102910 LR: 0.00002607 +[08:24:40] Epoch: 1 Batch: 3349/20099 (16.66%) Loss: 2.459631 LR: 0.00002607 +[08:24:43] Epoch: 1 Batch: 3350/20099 (16.67%) Loss: 2.047219 LR: 0.00002607 +[08:24:46] Epoch: 1 Batch: 3351/20099 (16.67%) Loss: 2.400079 LR: 0.00002607 +[08:24:49] Epoch: 1 Batch: 3352/20099 (16.68%) Loss: 2.102782 LR: 0.00002607 +[08:24:52] Epoch: 1 Batch: 3353/20099 
(16.68%) Loss: 2.526941 LR: 0.00002613 +[08:24:55] Epoch: 1 Batch: 3354/20099 (16.69%) Loss: 1.998634 LR: 0.00002613 +[08:24:59] Epoch: 1 Batch: 3355/20099 (16.69%) Loss: 2.194685 LR: 0.00002613 +[08:25:02] Epoch: 1 Batch: 3356/20099 (16.70%) Loss: 1.973290 LR: 0.00002613 +[08:25:05] Epoch: 1 Batch: 3357/20099 (16.70%) Loss: 2.035192 LR: 0.00002613 +[08:25:08] Epoch: 1 Batch: 3358/20099 (16.71%) Loss: 2.322349 LR: 0.00002613 +[08:25:11] Epoch: 1 Batch: 3359/20099 (16.71%) Loss: 2.231432 LR: 0.00002613 +[08:25:14] Epoch: 1 Batch: 3360/20099 (16.72%) Loss: 2.061768 LR: 0.00002618 +[08:25:17] Epoch: 1 Batch: 3361/20099 (16.72%) Loss: 2.419883 LR: 0.00002618 +[08:25:20] Epoch: 1 Batch: 3362/20099 (16.73%) Loss: 2.345036 LR: 0.00002618 +[08:25:23] Epoch: 1 Batch: 3363/20099 (16.73%) Loss: 2.449249 LR: 0.00002618 +[08:25:26] Epoch: 1 Batch: 3364/20099 (16.74%) Loss: 2.317705 LR: 0.00002618 +[08:25:29] Epoch: 1 Batch: 3365/20099 (16.74%) Loss: 1.984886 LR: 0.00002618 +[08:25:33] Epoch: 1 Batch: 3366/20099 (16.75%) Loss: 2.351197 LR: 0.00002618 +[08:25:36] Epoch: 1 Batch: 3367/20099 (16.75%) Loss: 2.322683 LR: 0.00002624 +[08:25:39] Epoch: 1 Batch: 3368/20099 (16.76%) Loss: 2.232392 LR: 0.00002624 +[08:25:42] Epoch: 1 Batch: 3369/20099 (16.76%) Loss: 1.976066 LR: 0.00002624 +[08:25:45] Epoch: 1 Batch: 3370/20099 (16.77%) Loss: 2.188760 LR: 0.00002624 +[08:25:48] Epoch: 1 Batch: 3371/20099 (16.77%) Loss: 2.123841 LR: 0.00002624 +[08:25:51] Epoch: 1 Batch: 3372/20099 (16.78%) Loss: 2.338250 LR: 0.00002624 +[08:25:54] Epoch: 1 Batch: 3373/20099 (16.78%) Loss: 2.265271 LR: 0.00002624 +[08:25:57] Epoch: 1 Batch: 3374/20099 (16.79%) Loss: 2.126385 LR: 0.00002629 +[08:26:00] Epoch: 1 Batch: 3375/20099 (16.79%) Loss: 2.009998 LR: 0.00002629 +[08:26:03] Epoch: 1 Batch: 3376/20099 (16.80%) Loss: 2.227751 LR: 0.00002629 +[08:26:07] Epoch: 1 Batch: 3377/20099 (16.80%) Loss: 2.316501 LR: 0.00002629 +[08:26:10] Epoch: 1 Batch: 3378/20099 (16.81%) Loss: 2.394395 LR: 0.00002629 
+[08:26:13] Epoch: 1 Batch: 3379/20099 (16.81%) Loss: 2.102838 LR: 0.00002629 +[08:26:16] Epoch: 1 Batch: 3380/20099 (16.82%) Loss: 1.937382 LR: 0.00002629 +[08:26:19] Epoch: 1 Batch: 3381/20099 (16.82%) Loss: 2.141023 LR: 0.00002635 +[08:26:22] Epoch: 1 Batch: 3382/20099 (16.83%) Loss: 2.279950 LR: 0.00002635 +[08:26:25] Epoch: 1 Batch: 3383/20099 (16.83%) Loss: 2.024928 LR: 0.00002635 +[08:26:28] Epoch: 1 Batch: 3384/20099 (16.84%) Loss: 2.300538 LR: 0.00002635 +[08:26:31] Epoch: 1 Batch: 3385/20099 (16.84%) Loss: 2.206520 LR: 0.00002635 +[08:26:34] Epoch: 1 Batch: 3386/20099 (16.85%) Loss: 2.182676 LR: 0.00002635 +[08:26:37] Epoch: 1 Batch: 3387/20099 (16.85%) Loss: 2.080597 LR: 0.00002635 +[08:26:41] Epoch: 1 Batch: 3388/20099 (16.86%) Loss: 2.116941 LR: 0.00002640 +[08:26:44] Epoch: 1 Batch: 3389/20099 (16.86%) Loss: 2.289006 LR: 0.00002640 +[08:26:47] Epoch: 1 Batch: 3390/20099 (16.87%) Loss: 1.891817 LR: 0.00002640 +[08:26:50] Epoch: 1 Batch: 3391/20099 (16.87%) Loss: 2.406143 LR: 0.00002640 +[08:26:53] Epoch: 1 Batch: 3392/20099 (16.88%) Loss: 2.174690 LR: 0.00002640 +[08:26:56] Epoch: 1 Batch: 3393/20099 (16.88%) Loss: 2.200736 LR: 0.00002640 +[08:26:59] Epoch: 1 Batch: 3394/20099 (16.89%) Loss: 2.087096 LR: 0.00002640 +[08:27:02] Epoch: 1 Batch: 3395/20099 (16.89%) Loss: 2.103908 LR: 0.00002645 +[08:27:05] Epoch: 1 Batch: 3396/20099 (16.90%) Loss: 2.120263 LR: 0.00002645 +[08:27:08] Epoch: 1 Batch: 3397/20099 (16.90%) Loss: 2.283164 LR: 0.00002645 +[08:27:11] Epoch: 1 Batch: 3398/20099 (16.91%) Loss: 2.216897 LR: 0.00002645 +[08:27:15] Epoch: 1 Batch: 3399/20099 (16.91%) Loss: 2.064074 LR: 0.00002645 +[08:27:21] >> Cleaned up old temp checkpoint: epoch1_step1400 +[08:27:21] >> Temp checkpoint saved: epoch1_step3400, size: 0.1693 GB +[08:27:21] Epoch: 1 Batch: 3400/20099 (16.92%) Loss: 2.163091 LR: 0.00002645 +[08:27:24] Epoch: 1 Batch: 3401/20099 (16.92%) Loss: 2.023928 LR: 0.00002645 +[08:27:27] Epoch: 1 Batch: 3402/20099 (16.93%) Loss: 2.126615 LR: 
0.00002651 +[08:27:30] Epoch: 1 Batch: 3403/20099 (16.93%) Loss: 2.255748 LR: 0.00002651 +[08:27:34] Epoch: 1 Batch: 3404/20099 (16.94%) Loss: 2.220340 LR: 0.00002651 +[08:27:37] Epoch: 1 Batch: 3405/20099 (16.94%) Loss: 2.249306 LR: 0.00002651 +[08:27:40] Epoch: 1 Batch: 3406/20099 (16.95%) Loss: 2.319127 LR: 0.00002651 +[08:27:43] Epoch: 1 Batch: 3407/20099 (16.95%) Loss: 2.173804 LR: 0.00002651 +[08:27:46] Epoch: 1 Batch: 3408/20099 (16.96%) Loss: 2.145165 LR: 0.00002651 +[08:27:49] Epoch: 1 Batch: 3409/20099 (16.96%) Loss: 2.518950 LR: 0.00002656 +[08:27:52] Epoch: 1 Batch: 3410/20099 (16.97%) Loss: 2.032890 LR: 0.00002656 +[08:27:55] Epoch: 1 Batch: 3411/20099 (16.97%) Loss: 2.138523 LR: 0.00002656 +[08:27:58] Epoch: 1 Batch: 3412/20099 (16.98%) Loss: 2.271917 LR: 0.00002656 +[08:28:02] Epoch: 1 Batch: 3413/20099 (16.98%) Loss: 2.348805 LR: 0.00002656 +[08:28:05] Epoch: 1 Batch: 3414/20099 (16.99%) Loss: 2.060218 LR: 0.00002656 +[08:28:08] Epoch: 1 Batch: 3415/20099 (16.99%) Loss: 2.193175 LR: 0.00002656 +[08:28:11] Epoch: 1 Batch: 3416/20099 (17.00%) Loss: 2.023857 LR: 0.00002662 +[08:28:14] Epoch: 1 Batch: 3417/20099 (17.00%) Loss: 2.104334 LR: 0.00002662 +[08:28:17] Epoch: 1 Batch: 3418/20099 (17.01%) Loss: 2.076061 LR: 0.00002662 +[08:28:20] Epoch: 1 Batch: 3419/20099 (17.01%) Loss: 1.965384 LR: 0.00002662 +[08:28:23] Epoch: 1 Batch: 3420/20099 (17.02%) Loss: 2.225835 LR: 0.00002662 +[08:28:26] Epoch: 1 Batch: 3421/20099 (17.02%) Loss: 2.579302 LR: 0.00002662 +[08:28:29] Epoch: 1 Batch: 3422/20099 (17.03%) Loss: 2.214364 LR: 0.00002662 +[08:28:33] Epoch: 1 Batch: 3423/20099 (17.03%) Loss: 2.255745 LR: 0.00002667 +[08:28:36] Epoch: 1 Batch: 3424/20099 (17.04%) Loss: 2.385896 LR: 0.00002667 +[08:28:39] Epoch: 1 Batch: 3425/20099 (17.04%) Loss: 2.004862 LR: 0.00002667 +[08:28:42] Epoch: 1 Batch: 3426/20099 (17.05%) Loss: 2.230822 LR: 0.00002667 +[08:28:45] Epoch: 1 Batch: 3427/20099 (17.05%) Loss: 2.253181 LR: 0.00002667 +[08:28:48] Epoch: 1 Batch: 3428/20099 
(17.06%) Loss: 2.088153 LR: 0.00002667 +[08:28:51] Epoch: 1 Batch: 3429/20099 (17.06%) Loss: 2.213341 LR: 0.00002667 +[08:28:54] Epoch: 1 Batch: 3430/20099 (17.07%) Loss: 2.231649 LR: 0.00002673 +[08:28:57] Epoch: 1 Batch: 3431/20099 (17.07%) Loss: 2.022143 LR: 0.00002673 +[08:29:00] Epoch: 1 Batch: 3432/20099 (17.08%) Loss: 2.461801 LR: 0.00002673 +[08:29:03] Epoch: 1 Batch: 3433/20099 (17.08%) Loss: 2.276209 LR: 0.00002673 +[08:29:06] Epoch: 1 Batch: 3434/20099 (17.09%) Loss: 2.406996 LR: 0.00002673 +[08:29:10] Epoch: 1 Batch: 3435/20099 (17.09%) Loss: 2.326365 LR: 0.00002673 +[08:29:13] Epoch: 1 Batch: 3436/20099 (17.10%) Loss: 2.338133 LR: 0.00002673 +[08:29:16] Epoch: 1 Batch: 3437/20099 (17.10%) Loss: 2.194034 LR: 0.00002678 +[08:29:19] Epoch: 1 Batch: 3438/20099 (17.11%) Loss: 2.070997 LR: 0.00002678 +[08:29:22] Epoch: 1 Batch: 3439/20099 (17.11%) Loss: 2.341562 LR: 0.00002678 +[08:29:25] Epoch: 1 Batch: 3440/20099 (17.12%) Loss: 2.157019 LR: 0.00002678 +[08:29:28] Epoch: 1 Batch: 3441/20099 (17.12%) Loss: 2.277182 LR: 0.00002678 +[08:29:31] Epoch: 1 Batch: 3442/20099 (17.13%) Loss: 2.191656 LR: 0.00002678 +[08:29:34] Epoch: 1 Batch: 3443/20099 (17.13%) Loss: 2.241832 LR: 0.00002678 +[08:29:37] Epoch: 1 Batch: 3444/20099 (17.14%) Loss: 1.945580 LR: 0.00002684 +[08:29:41] Epoch: 1 Batch: 3445/20099 (17.14%) Loss: 2.155689 LR: 0.00002684 +[08:29:44] Epoch: 1 Batch: 3446/20099 (17.15%) Loss: 2.011464 LR: 0.00002684 +[08:29:47] Epoch: 1 Batch: 3447/20099 (17.15%) Loss: 2.031164 LR: 0.00002684 +[08:29:50] Epoch: 1 Batch: 3448/20099 (17.16%) Loss: 2.129925 LR: 0.00002684 +[08:29:53] Epoch: 1 Batch: 3449/20099 (17.16%) Loss: 2.334049 LR: 0.00002684 +[08:29:56] Epoch: 1 Batch: 3450/20099 (17.17%) Loss: 2.164114 LR: 0.00002684 +[08:29:59] Epoch: 1 Batch: 3451/20099 (17.17%) Loss: 2.384614 LR: 0.00002689 +[08:30:02] Epoch: 1 Batch: 3452/20099 (17.17%) Loss: 2.267121 LR: 0.00002689 +[08:30:05] Epoch: 1 Batch: 3453/20099 (17.18%) Loss: 2.200549 LR: 0.00002689 
+[08:30:08] Epoch: 1 Batch: 3454/20099 (17.18%) Loss: 2.282138 LR: 0.00002689 +[08:30:12] Epoch: 1 Batch: 3455/20099 (17.19%) Loss: 2.204434 LR: 0.00002689 +[08:30:15] Epoch: 1 Batch: 3456/20099 (17.19%) Loss: 2.276602 LR: 0.00002689 +[08:30:18] Epoch: 1 Batch: 3457/20099 (17.20%) Loss: 2.142602 LR: 0.00002689 +[08:30:21] Epoch: 1 Batch: 3458/20099 (17.20%) Loss: 2.232658 LR: 0.00002695 +[08:30:24] Epoch: 1 Batch: 3459/20099 (17.21%) Loss: 2.103098 LR: 0.00002695 +[08:30:27] Epoch: 1 Batch: 3460/20099 (17.21%) Loss: 2.373849 LR: 0.00002695 +[08:30:30] Epoch: 1 Batch: 3461/20099 (17.22%) Loss: 2.300773 LR: 0.00002695 +[08:30:33] Epoch: 1 Batch: 3462/20099 (17.22%) Loss: 2.227243 LR: 0.00002695 +[08:30:36] Epoch: 1 Batch: 3463/20099 (17.23%) Loss: 2.085287 LR: 0.00002695 +[08:30:40] Epoch: 1 Batch: 3464/20099 (17.23%) Loss: 2.427595 LR: 0.00002695 +[08:30:43] Epoch: 1 Batch: 3465/20099 (17.24%) Loss: 2.321514 LR: 0.00002700 +[08:30:46] Epoch: 1 Batch: 3466/20099 (17.24%) Loss: 2.048918 LR: 0.00002700 +[08:30:49] Epoch: 1 Batch: 3467/20099 (17.25%) Loss: 1.975017 LR: 0.00002700 +[08:30:52] Epoch: 1 Batch: 3468/20099 (17.25%) Loss: 2.305634 LR: 0.00002700 +[08:30:55] Epoch: 1 Batch: 3469/20099 (17.26%) Loss: 1.966069 LR: 0.00002700 +[08:30:58] Epoch: 1 Batch: 3470/20099 (17.26%) Loss: 2.245541 LR: 0.00002700 +[08:31:01] Epoch: 1 Batch: 3471/20099 (17.27%) Loss: 2.365441 LR: 0.00002700 +[08:31:04] Epoch: 1 Batch: 3472/20099 (17.27%) Loss: 2.173981 LR: 0.00002705 +[08:31:07] Epoch: 1 Batch: 3473/20099 (17.28%) Loss: 2.285876 LR: 0.00002705 +[08:31:11] Epoch: 1 Batch: 3474/20099 (17.28%) Loss: 2.070543 LR: 0.00002705 +[08:31:14] Epoch: 1 Batch: 3475/20099 (17.29%) Loss: 2.197250 LR: 0.00002705 +[08:31:17] Epoch: 1 Batch: 3476/20099 (17.29%) Loss: 2.447327 LR: 0.00002705 +[08:31:20] Epoch: 1 Batch: 3477/20099 (17.30%) Loss: 2.422841 LR: 0.00002705 +[08:31:23] Epoch: 1 Batch: 3478/20099 (17.30%) Loss: 2.286703 LR: 0.00002705 +[08:31:26] Epoch: 1 Batch: 3479/20099 (17.31%) 
Loss: 2.126195 LR: 0.00002711 +[08:31:29] Epoch: 1 Batch: 3480/20099 (17.31%) Loss: 2.198842 LR: 0.00002711 +[08:31:32] Epoch: 1 Batch: 3481/20099 (17.32%) Loss: 2.157564 LR: 0.00002711 +[08:31:35] Epoch: 1 Batch: 3482/20099 (17.32%) Loss: 2.286287 LR: 0.00002711 +[08:31:38] Epoch: 1 Batch: 3483/20099 (17.33%) Loss: 2.130831 LR: 0.00002711 +[08:31:42] Epoch: 1 Batch: 3484/20099 (17.33%) Loss: 2.437206 LR: 0.00002711 +[08:31:45] Epoch: 1 Batch: 3485/20099 (17.34%) Loss: 2.132667 LR: 0.00002711 +[08:31:48] Epoch: 1 Batch: 3486/20099 (17.34%) Loss: 1.964328 LR: 0.00002716 +[08:31:51] Epoch: 1 Batch: 3487/20099 (17.35%) Loss: 2.415962 LR: 0.00002716 +[08:31:54] Epoch: 1 Batch: 3488/20099 (17.35%) Loss: 2.293418 LR: 0.00002716 +[08:31:57] Epoch: 1 Batch: 3489/20099 (17.36%) Loss: 2.191009 LR: 0.00002716 +[08:32:00] Epoch: 1 Batch: 3490/20099 (17.36%) Loss: 2.340241 LR: 0.00002716 +[08:32:03] Epoch: 1 Batch: 3491/20099 (17.37%) Loss: 1.755109 LR: 0.00002716 +[08:32:06] Epoch: 1 Batch: 3492/20099 (17.37%) Loss: 2.239558 LR: 0.00002716 +[08:32:09] Epoch: 1 Batch: 3493/20099 (17.38%) Loss: 2.075412 LR: 0.00002722 +[08:32:13] Epoch: 1 Batch: 3494/20099 (17.38%) Loss: 2.220302 LR: 0.00002722 +[08:32:16] Epoch: 1 Batch: 3495/20099 (17.39%) Loss: 2.169812 LR: 0.00002722 +[08:32:19] Epoch: 1 Batch: 3496/20099 (17.39%) Loss: 2.315007 LR: 0.00002722 +[08:32:22] Epoch: 1 Batch: 3497/20099 (17.40%) Loss: 2.352706 LR: 0.00002722 +[08:32:25] Epoch: 1 Batch: 3498/20099 (17.40%) Loss: 2.461675 LR: 0.00002722 +[08:32:28] Epoch: 1 Batch: 3499/20099 (17.41%) Loss: 2.138924 LR: 0.00002722 +[08:32:31] >> Evaluating batch 0 +[08:32:32] >> Evaluating batch 1 +[08:32:34] >> Evaluating batch 2 +[08:32:35] >> Evaluating batch 3 +[08:32:36] >> Evaluating batch 4 +[08:32:38] >> Evaluating batch 5 +[08:32:39] >> Evaluating batch 6 +[08:32:40] >> Evaluating batch 7 +[08:32:41] >> Evaluating batch 8 +[08:32:43] >> Evaluating batch 9 +[08:32:44] >> Evaluating batch 10 +[08:32:45] >> Evaluating batch 11 
+[08:32:46] >> Evaluating batch 12 +[08:32:47] >> Evaluating batch 13 +[08:32:49] >> Evaluating batch 14 +[08:32:50] >> Evaluating batch 15 +[08:32:51] >> Evaluating batch 16 +[08:32:52] Epoch: 1 Step: 3500/20099 Evaluation: +[08:32:52] Avg Loss Since Last Eval: 2.2110 Val Loss: 2.2777 Validation loss delta: -0.0102 Perplexity: 9.7542 LR: 0.00002727 +[08:32:55] >> Checkpoint saved: epoch1_step3500, size: 0.1693 GB +[08:32:55] Epoch: 1 Batch: 3500/20099 (17.41%) Loss: 1.961526 LR: 0.00002727 +[08:32:58] Epoch: 1 Batch: 3501/20099 (17.42%) Loss: 2.266328 LR: 0.00002727 +[08:33:01] Epoch: 1 Batch: 3502/20099 (17.42%) Loss: 2.196041 LR: 0.00002727 +[08:33:04] Epoch: 1 Batch: 3503/20099 (17.43%) Loss: 2.096919 LR: 0.00002727 +[08:33:07] Epoch: 1 Batch: 3504/20099 (17.43%) Loss: 2.426582 LR: 0.00002727 +[08:33:11] Epoch: 1 Batch: 3505/20099 (17.44%) Loss: 2.102571 LR: 0.00002727 +[08:33:14] Epoch: 1 Batch: 3506/20099 (17.44%) Loss: 2.533707 LR: 0.00002727 +[08:33:17] Epoch: 1 Batch: 3507/20099 (17.45%) Loss: 2.205536 LR: 0.00002733 +[08:33:20] Epoch: 1 Batch: 3508/20099 (17.45%) Loss: 2.166482 LR: 0.00002733 +[08:33:23] Epoch: 1 Batch: 3509/20099 (17.46%) Loss: 2.032693 LR: 0.00002733 +[08:33:26] Epoch: 1 Batch: 3510/20099 (17.46%) Loss: 2.225319 LR: 0.00002733 +[08:33:29] Epoch: 1 Batch: 3511/20099 (17.47%) Loss: 2.155451 LR: 0.00002733 +[08:33:32] Epoch: 1 Batch: 3512/20099 (17.47%) Loss: 2.523999 LR: 0.00002733 +[08:33:35] Epoch: 1 Batch: 3513/20099 (17.48%) Loss: 2.510327 LR: 0.00002733 +[08:33:39] Epoch: 1 Batch: 3514/20099 (17.48%) Loss: 2.168895 LR: 0.00002738 +[08:33:42] Epoch: 1 Batch: 3515/20099 (17.49%) Loss: 2.217379 LR: 0.00002738 +[08:33:45] Epoch: 1 Batch: 3516/20099 (17.49%) Loss: 2.223370 LR: 0.00002738 +[08:33:48] Epoch: 1 Batch: 3517/20099 (17.50%) Loss: 2.411501 LR: 0.00002738 +[08:33:51] Epoch: 1 Batch: 3518/20099 (17.50%) Loss: 2.277023 LR: 0.00002738 +[08:33:54] Epoch: 1 Batch: 3519/20099 (17.51%) Loss: 2.253070 LR: 0.00002738 +[08:33:57] Epoch: 
1 Batch: 3520/20099 (17.51%) Loss: 2.365365 LR: 0.00002738 +[08:34:00] Epoch: 1 Batch: 3521/20099 (17.52%) Loss: 2.062301 LR: 0.00002744 +[08:34:03] Epoch: 1 Batch: 3522/20099 (17.52%) Loss: 2.325712 LR: 0.00002744 +[08:34:06] Epoch: 1 Batch: 3523/20099 (17.53%) Loss: 2.194599 LR: 0.00002744 +[08:34:09] Epoch: 1 Batch: 3524/20099 (17.53%) Loss: 2.195588 LR: 0.00002744 +[08:34:13] Epoch: 1 Batch: 3525/20099 (17.54%) Loss: 2.122439 LR: 0.00002744 +[08:34:16] Epoch: 1 Batch: 3526/20099 (17.54%) Loss: 1.791992 LR: 0.00002744 +[08:34:19] Epoch: 1 Batch: 3527/20099 (17.55%) Loss: 2.177729 LR: 0.00002744 +[08:34:22] Epoch: 1 Batch: 3528/20099 (17.55%) Loss: 2.482922 LR: 0.00002749 +[08:34:25] Epoch: 1 Batch: 3529/20099 (17.56%) Loss: 1.960946 LR: 0.00002749 +[08:34:28] Epoch: 1 Batch: 3530/20099 (17.56%) Loss: 2.314354 LR: 0.00002749 +[08:34:31] Epoch: 1 Batch: 3531/20099 (17.57%) Loss: 2.328574 LR: 0.00002749 +[08:34:34] Epoch: 1 Batch: 3532/20099 (17.57%) Loss: 2.135657 LR: 0.00002749 +[08:34:37] Epoch: 1 Batch: 3533/20099 (17.58%) Loss: 2.472409 LR: 0.00002749 +[08:34:40] Epoch: 1 Batch: 3534/20099 (17.58%) Loss: 2.359417 LR: 0.00002749 +[08:34:43] Epoch: 1 Batch: 3535/20099 (17.59%) Loss: 2.323080 LR: 0.00002755 +[08:34:46] Epoch: 1 Batch: 3536/20099 (17.59%) Loss: 2.200904 LR: 0.00002755 +[08:34:50] Epoch: 1 Batch: 3537/20099 (17.60%) Loss: 2.231810 LR: 0.00002755 +[08:34:53] Epoch: 1 Batch: 3538/20099 (17.60%) Loss: 2.079761 LR: 0.00002755 +[08:34:56] Epoch: 1 Batch: 3539/20099 (17.61%) Loss: 2.232421 LR: 0.00002755 +[08:34:59] Epoch: 1 Batch: 3540/20099 (17.61%) Loss: 2.176439 LR: 0.00002755 +[08:35:02] Epoch: 1 Batch: 3541/20099 (17.62%) Loss: 2.377147 LR: 0.00002755 +[08:35:05] Epoch: 1 Batch: 3542/20099 (17.62%) Loss: 2.218146 LR: 0.00002760 +[08:35:08] Epoch: 1 Batch: 3543/20099 (17.63%) Loss: 2.210729 LR: 0.00002760 +[08:35:11] Epoch: 1 Batch: 3544/20099 (17.63%) Loss: 2.145888 LR: 0.00002760 +[08:35:14] Epoch: 1 Batch: 3545/20099 (17.64%) Loss: 2.117176 LR: 
0.00002760 +[08:35:17] Epoch: 1 Batch: 3546/20099 (17.64%) Loss: 2.465259 LR: 0.00002760 +[08:35:21] Epoch: 1 Batch: 3547/20099 (17.65%) Loss: 2.492796 LR: 0.00002760 +[08:35:24] Epoch: 1 Batch: 3548/20099 (17.65%) Loss: 2.256935 LR: 0.00002760 +[08:35:27] Epoch: 1 Batch: 3549/20099 (17.66%) Loss: 2.043155 LR: 0.00002765 +[08:35:30] Epoch: 1 Batch: 3550/20099 (17.66%) Loss: 2.035642 LR: 0.00002765 +[08:35:33] Epoch: 1 Batch: 3551/20099 (17.67%) Loss: 2.090890 LR: 0.00002765 +[08:35:36] Epoch: 1 Batch: 3552/20099 (17.67%) Loss: 2.321067 LR: 0.00002765 +[08:35:39] Epoch: 1 Batch: 3553/20099 (17.68%) Loss: 2.322880 LR: 0.00002765 +[08:35:42] Epoch: 1 Batch: 3554/20099 (17.68%) Loss: 2.140146 LR: 0.00002765 +[08:35:45] Epoch: 1 Batch: 3555/20099 (17.69%) Loss: 2.107869 LR: 0.00002765 +[08:35:48] Epoch: 1 Batch: 3556/20099 (17.69%) Loss: 2.110251 LR: 0.00002771 +[08:35:52] Epoch: 1 Batch: 3557/20099 (17.70%) Loss: 2.161535 LR: 0.00002771 +[08:35:55] Epoch: 1 Batch: 3558/20099 (17.70%) Loss: 2.504559 LR: 0.00002771 +[08:35:58] Epoch: 1 Batch: 3559/20099 (17.71%) Loss: 2.041262 LR: 0.00002771 +[08:36:01] Epoch: 1 Batch: 3560/20099 (17.71%) Loss: 2.214695 LR: 0.00002771 +[08:36:04] Epoch: 1 Batch: 3561/20099 (17.72%) Loss: 2.279027 LR: 0.00002771 +[08:36:07] Epoch: 1 Batch: 3562/20099 (17.72%) Loss: 2.267825 LR: 0.00002771 +[08:36:10] Epoch: 1 Batch: 3563/20099 (17.73%) Loss: 2.147718 LR: 0.00002776 +[08:36:13] Epoch: 1 Batch: 3564/20099 (17.73%) Loss: 2.181548 LR: 0.00002776 +[08:36:16] Epoch: 1 Batch: 3565/20099 (17.74%) Loss: 2.372869 LR: 0.00002776 +[08:36:19] Epoch: 1 Batch: 3566/20099 (17.74%) Loss: 2.099928 LR: 0.00002776 +[08:36:23] Epoch: 1 Batch: 3567/20099 (17.75%) Loss: 2.453443 LR: 0.00002776 +[08:36:26] Epoch: 1 Batch: 3568/20099 (17.75%) Loss: 2.126630 LR: 0.00002776 +[08:36:29] Epoch: 1 Batch: 3569/20099 (17.76%) Loss: 2.460560 LR: 0.00002776 +[08:36:32] Epoch: 1 Batch: 3570/20099 (17.76%) Loss: 2.082667 LR: 0.00002782 +[08:36:35] Epoch: 1 Batch: 3571/20099 
(17.77%) Loss: 2.111930 LR: 0.00002782 +[08:36:38] Epoch: 1 Batch: 3572/20099 (17.77%) Loss: 2.264828 LR: 0.00002782 +[08:36:41] Epoch: 1 Batch: 3573/20099 (17.78%) Loss: 1.947830 LR: 0.00002782 +[08:36:44] Epoch: 1 Batch: 3574/20099 (17.78%) Loss: 1.985795 LR: 0.00002782 +[08:36:47] Epoch: 1 Batch: 3575/20099 (17.79%) Loss: 2.176628 LR: 0.00002782 +[08:36:51] Epoch: 1 Batch: 3576/20099 (17.79%) Loss: 2.327776 LR: 0.00002782 +[08:36:54] Epoch: 1 Batch: 3577/20099 (17.80%) Loss: 2.254191 LR: 0.00002787 +[08:36:57] Epoch: 1 Batch: 3578/20099 (17.80%) Loss: 2.335636 LR: 0.00002787 +[08:37:00] Epoch: 1 Batch: 3579/20099 (17.81%) Loss: 2.275903 LR: 0.00002787 +[08:37:03] Epoch: 1 Batch: 3580/20099 (17.81%) Loss: 2.105923 LR: 0.00002787 +[08:37:06] Epoch: 1 Batch: 3581/20099 (17.82%) Loss: 2.149241 LR: 0.00002787 +[08:37:09] Epoch: 1 Batch: 3582/20099 (17.82%) Loss: 2.136858 LR: 0.00002787 +[08:37:12] Epoch: 1 Batch: 3583/20099 (17.83%) Loss: 2.060277 LR: 0.00002787 +[08:37:15] Epoch: 1 Batch: 3584/20099 (17.83%) Loss: 2.352275 LR: 0.00002793 +[08:37:18] Epoch: 1 Batch: 3585/20099 (17.84%) Loss: 2.103818 LR: 0.00002793 +[08:37:22] Epoch: 1 Batch: 3586/20099 (17.84%) Loss: 1.906251 LR: 0.00002793 +[08:37:25] Epoch: 1 Batch: 3587/20099 (17.85%) Loss: 1.985630 LR: 0.00002793 +[08:37:28] Epoch: 1 Batch: 3588/20099 (17.85%) Loss: 1.891139 LR: 0.00002793 +[08:37:31] Epoch: 1 Batch: 3589/20099 (17.86%) Loss: 2.194587 LR: 0.00002793 +[08:37:34] Epoch: 1 Batch: 3590/20099 (17.86%) Loss: 2.317896 LR: 0.00002793 +[08:37:37] Epoch: 1 Batch: 3591/20099 (17.87%) Loss: 2.317786 LR: 0.00002798 +[08:37:40] Epoch: 1 Batch: 3592/20099 (17.87%) Loss: 2.268287 LR: 0.00002798 +[08:37:43] Epoch: 1 Batch: 3593/20099 (17.88%) Loss: 2.287639 LR: 0.00002798 +[08:37:46] Epoch: 1 Batch: 3594/20099 (17.88%) Loss: 2.286015 LR: 0.00002798 +[08:37:49] Epoch: 1 Batch: 3595/20099 (17.89%) Loss: 2.098863 LR: 0.00002798 +[08:37:52] Epoch: 1 Batch: 3596/20099 (17.89%) Loss: 2.543015 LR: 0.00002798 
+[08:37:56] Epoch: 1 Batch: 3597/20099 (17.90%) Loss: 2.137972 LR: 0.00002798 +[08:37:59] Epoch: 1 Batch: 3598/20099 (17.90%) Loss: 2.305627 LR: 0.00002804 +[08:38:02] Epoch: 1 Batch: 3599/20099 (17.91%) Loss: 1.991279 LR: 0.00002804 +[08:38:08] >> Cleaned up old temp checkpoint: epoch1_step1600 +[08:38:08] >> Temp checkpoint saved: epoch1_step3600, size: 0.1693 GB +[08:38:08] Epoch: 1 Batch: 3600/20099 (17.91%) Loss: 2.372720 LR: 0.00002804 +[08:38:11] Epoch: 1 Batch: 3601/20099 (17.92%) Loss: 2.200416 LR: 0.00002804 +[08:38:14] Epoch: 1 Batch: 3602/20099 (17.92%) Loss: 2.152371 LR: 0.00002804 +[08:38:18] Epoch: 1 Batch: 3603/20099 (17.93%) Loss: 2.417607 LR: 0.00002804 +[08:38:21] Epoch: 1 Batch: 3604/20099 (17.93%) Loss: 2.007164 LR: 0.00002804 +[08:38:24] Epoch: 1 Batch: 3605/20099 (17.94%) Loss: 2.052018 LR: 0.00002809 +[08:38:27] Epoch: 1 Batch: 3606/20099 (17.94%) Loss: 2.255626 LR: 0.00002809 +[08:38:30] Epoch: 1 Batch: 3607/20099 (17.95%) Loss: 2.078324 LR: 0.00002809 +[08:38:33] Epoch: 1 Batch: 3608/20099 (17.95%) Loss: 2.353887 LR: 0.00002809 +[08:38:36] Epoch: 1 Batch: 3609/20099 (17.96%) Loss: 2.111553 LR: 0.00002809 +[08:38:39] Epoch: 1 Batch: 3610/20099 (17.96%) Loss: 2.155862 LR: 0.00002809 +[08:38:42] Epoch: 1 Batch: 3611/20099 (17.97%) Loss: 2.094332 LR: 0.00002809 +[08:38:46] Epoch: 1 Batch: 3612/20099 (17.97%) Loss: 2.108253 LR: 0.00002815 +[08:38:49] Epoch: 1 Batch: 3613/20099 (17.98%) Loss: 2.145724 LR: 0.00002815 +[08:38:52] Epoch: 1 Batch: 3614/20099 (17.98%) Loss: 2.034326 LR: 0.00002815 +[08:38:55] Epoch: 1 Batch: 3615/20099 (17.99%) Loss: 2.328555 LR: 0.00002815 +[08:38:58] Epoch: 1 Batch: 3616/20099 (17.99%) Loss: 2.187956 LR: 0.00002815 +[08:39:01] Epoch: 1 Batch: 3617/20099 (18.00%) Loss: 2.051411 LR: 0.00002815 +[08:39:04] Epoch: 1 Batch: 3618/20099 (18.00%) Loss: 2.378642 LR: 0.00002815 +[08:39:07] Epoch: 1 Batch: 3619/20099 (18.01%) Loss: 1.882419 LR: 0.00002820 +[08:39:10] Epoch: 1 Batch: 3620/20099 (18.01%) Loss: 2.101921 LR: 
0.00002820 +[08:39:14] Epoch: 1 Batch: 3621/20099 (18.02%) Loss: 2.392888 LR: 0.00002820 +[08:39:17] Epoch: 1 Batch: 3622/20099 (18.02%) Loss: 2.501123 LR: 0.00002820 +[08:39:20] Epoch: 1 Batch: 3623/20099 (18.03%) Loss: 2.198987 LR: 0.00002820 +[08:39:23] Epoch: 1 Batch: 3624/20099 (18.03%) Loss: 2.289235 LR: 0.00002820 +[08:39:26] Epoch: 1 Batch: 3625/20099 (18.04%) Loss: 2.110739 LR: 0.00002820 +[08:39:29] Epoch: 1 Batch: 3626/20099 (18.04%) Loss: 2.327968 LR: 0.00002825 +[08:39:32] Epoch: 1 Batch: 3627/20099 (18.05%) Loss: 2.459016 LR: 0.00002825 +[08:39:35] Epoch: 1 Batch: 3628/20099 (18.05%) Loss: 2.353927 LR: 0.00002825 +[08:39:38] Epoch: 1 Batch: 3629/20099 (18.06%) Loss: 2.427294 LR: 0.00002825 +[08:39:41] Epoch: 1 Batch: 3630/20099 (18.06%) Loss: 2.247803 LR: 0.00002825 +[08:39:44] Epoch: 1 Batch: 3631/20099 (18.07%) Loss: 2.417314 LR: 0.00002825 +[08:39:47] Epoch: 1 Batch: 3632/20099 (18.07%) Loss: 2.167754 LR: 0.00002825 +[08:39:50] Epoch: 1 Batch: 3633/20099 (18.08%) Loss: 2.014553 LR: 0.00002831 +[08:39:54] Epoch: 1 Batch: 3634/20099 (18.08%) Loss: 2.246857 LR: 0.00002831 +[08:39:57] Epoch: 1 Batch: 3635/20099 (18.09%) Loss: 2.058805 LR: 0.00002831 +[08:40:00] Epoch: 1 Batch: 3636/20099 (18.09%) Loss: 2.035503 LR: 0.00002831 +[08:40:03] Epoch: 1 Batch: 3637/20099 (18.10%) Loss: 2.360140 LR: 0.00002831 +[08:40:06] Epoch: 1 Batch: 3638/20099 (18.10%) Loss: 2.077742 LR: 0.00002831 +[08:40:09] Epoch: 1 Batch: 3639/20099 (18.11%) Loss: 1.983696 LR: 0.00002831 +[08:40:12] Epoch: 1 Batch: 3640/20099 (18.11%) Loss: 2.150242 LR: 0.00002836 +[08:40:15] Epoch: 1 Batch: 3641/20099 (18.12%) Loss: 2.213603 LR: 0.00002836 +[08:40:18] Epoch: 1 Batch: 3642/20099 (18.12%) Loss: 2.275053 LR: 0.00002836 +[08:40:21] Epoch: 1 Batch: 3643/20099 (18.13%) Loss: 2.254088 LR: 0.00002836 +[08:40:25] Epoch: 1 Batch: 3644/20099 (18.13%) Loss: 2.269475 LR: 0.00002836 +[08:40:28] Epoch: 1 Batch: 3645/20099 (18.14%) Loss: 2.385344 LR: 0.00002836 +[08:40:31] Epoch: 1 Batch: 3646/20099 
(18.14%) Loss: 1.821224 LR: 0.00002836 +[08:40:34] Epoch: 1 Batch: 3647/20099 (18.15%) Loss: 2.016310 LR: 0.00002842 +[08:40:37] Epoch: 1 Batch: 3648/20099 (18.15%) Loss: 2.147260 LR: 0.00002842 +[08:40:40] Epoch: 1 Batch: 3649/20099 (18.16%) Loss: 1.803504 LR: 0.00002842 +[08:40:43] Epoch: 1 Batch: 3650/20099 (18.16%) Loss: 2.314947 LR: 0.00002842 +[08:40:46] Epoch: 1 Batch: 3651/20099 (18.17%) Loss: 2.308720 LR: 0.00002842 +[08:40:49] Epoch: 1 Batch: 3652/20099 (18.17%) Loss: 2.323877 LR: 0.00002842 +[08:40:52] Epoch: 1 Batch: 3653/20099 (18.18%) Loss: 1.987878 LR: 0.00002842 +[08:40:56] Epoch: 1 Batch: 3654/20099 (18.18%) Loss: 2.099923 LR: 0.00002847 +[08:40:59] Epoch: 1 Batch: 3655/20099 (18.18%) Loss: 2.449993 LR: 0.00002847 +[08:41:02] Epoch: 1 Batch: 3656/20099 (18.19%) Loss: 2.368257 LR: 0.00002847 +[08:41:05] Epoch: 1 Batch: 3657/20099 (18.19%) Loss: 2.326651 LR: 0.00002847 +[08:41:08] Epoch: 1 Batch: 3658/20099 (18.20%) Loss: 2.275696 LR: 0.00002847 +[08:41:11] Epoch: 1 Batch: 3659/20099 (18.20%) Loss: 2.118002 LR: 0.00002847 +[08:41:14] Epoch: 1 Batch: 3660/20099 (18.21%) Loss: 2.024580 LR: 0.00002847 +[08:41:17] Epoch: 1 Batch: 3661/20099 (18.21%) Loss: 2.297718 LR: 0.00002853 +[08:41:20] Epoch: 1 Batch: 3662/20099 (18.22%) Loss: 1.941885 LR: 0.00002853 +[08:41:23] Epoch: 1 Batch: 3663/20099 (18.22%) Loss: 2.290888 LR: 0.00002853 +[08:41:26] Epoch: 1 Batch: 3664/20099 (18.23%) Loss: 2.052295 LR: 0.00002853 +[08:41:30] Epoch: 1 Batch: 3665/20099 (18.23%) Loss: 1.931201 LR: 0.00002853 +[08:41:33] Epoch: 1 Batch: 3666/20099 (18.24%) Loss: 2.030326 LR: 0.00002853 +[08:41:36] Epoch: 1 Batch: 3667/20099 (18.24%) Loss: 2.121090 LR: 0.00002853 +[08:41:39] Epoch: 1 Batch: 3668/20099 (18.25%) Loss: 2.183436 LR: 0.00002858 +[08:41:42] Epoch: 1 Batch: 3669/20099 (18.25%) Loss: 1.922950 LR: 0.00002858 +[08:41:45] Epoch: 1 Batch: 3670/20099 (18.26%) Loss: 1.827330 LR: 0.00002858 +[08:41:48] Epoch: 1 Batch: 3671/20099 (18.26%) Loss: 2.095283 LR: 0.00002858 
+[08:41:51] Epoch: 1 Batch: 3672/20099 (18.27%) Loss: 1.987642 LR: 0.00002858 +[08:41:54] Epoch: 1 Batch: 3673/20099 (18.27%) Loss: 1.969862 LR: 0.00002858 +[08:41:57] Epoch: 1 Batch: 3674/20099 (18.28%) Loss: 2.192002 LR: 0.00002858 +[08:42:00] Epoch: 1 Batch: 3675/20099 (18.28%) Loss: 2.521352 LR: 0.00002864 +[08:42:03] Epoch: 1 Batch: 3676/20099 (18.29%) Loss: 2.245274 LR: 0.00002864 +[08:42:07] Epoch: 1 Batch: 3677/20099 (18.29%) Loss: 2.674772 LR: 0.00002864 +[08:42:10] Epoch: 1 Batch: 3678/20099 (18.30%) Loss: 1.738716 LR: 0.00002864 +[08:42:13] Epoch: 1 Batch: 3679/20099 (18.30%) Loss: 2.361116 LR: 0.00002864 +[08:42:16] Epoch: 1 Batch: 3680/20099 (18.31%) Loss: 2.155463 LR: 0.00002864 +[08:42:19] Epoch: 1 Batch: 3681/20099 (18.31%) Loss: 2.296482 LR: 0.00002864 +[08:42:22] Epoch: 1 Batch: 3682/20099 (18.32%) Loss: 2.320742 LR: 0.00002869 +[08:42:25] Epoch: 1 Batch: 3683/20099 (18.32%) Loss: 2.229099 LR: 0.00002869 +[08:42:28] Epoch: 1 Batch: 3684/20099 (18.33%) Loss: 2.411473 LR: 0.00002869 +[08:42:31] Epoch: 1 Batch: 3685/20099 (18.33%) Loss: 2.094179 LR: 0.00002869 +[08:42:34] Epoch: 1 Batch: 3686/20099 (18.34%) Loss: 2.516968 LR: 0.00002869 +[08:42:37] Epoch: 1 Batch: 3687/20099 (18.34%) Loss: 2.385446 LR: 0.00002869 +[08:42:40] Epoch: 1 Batch: 3688/20099 (18.35%) Loss: 1.980304 LR: 0.00002869 +[08:42:44] Epoch: 1 Batch: 3689/20099 (18.35%) Loss: 2.132445 LR: 0.00002875 +[08:42:47] Epoch: 1 Batch: 3690/20099 (18.36%) Loss: 2.081540 LR: 0.00002875 +[08:42:50] Epoch: 1 Batch: 3691/20099 (18.36%) Loss: 2.219523 LR: 0.00002875 +[08:42:53] Epoch: 1 Batch: 3692/20099 (18.37%) Loss: 2.106692 LR: 0.00002875 +[08:42:56] Epoch: 1 Batch: 3693/20099 (18.37%) Loss: 2.179696 LR: 0.00002875 +[08:42:59] Epoch: 1 Batch: 3694/20099 (18.38%) Loss: 2.006476 LR: 0.00002875 +[08:43:02] Epoch: 1 Batch: 3695/20099 (18.38%) Loss: 2.374117 LR: 0.00002875 +[08:43:05] Epoch: 1 Batch: 3696/20099 (18.39%) Loss: 2.191715 LR: 0.00002880 +[08:43:08] Epoch: 1 Batch: 3697/20099 (18.39%) 
Loss: 2.280511 LR: 0.00002880 +[08:43:11] Epoch: 1 Batch: 3698/20099 (18.40%) Loss: 2.162292 LR: 0.00002880 +[08:43:15] Epoch: 1 Batch: 3699/20099 (18.40%) Loss: 2.306903 LR: 0.00002880 +[08:43:18] Epoch: 1 Batch: 3700/20099 (18.41%) Loss: 2.083425 LR: 0.00002880 +[08:43:21] Epoch: 1 Batch: 3701/20099 (18.41%) Loss: 2.255505 LR: 0.00002880 +[08:43:24] Epoch: 1 Batch: 3702/20099 (18.42%) Loss: 2.036491 LR: 0.00002880 +[08:43:27] Epoch: 1 Batch: 3703/20099 (18.42%) Loss: 1.856236 LR: 0.00002885 +[08:43:30] Epoch: 1 Batch: 3704/20099 (18.43%) Loss: 2.274361 LR: 0.00002885 +[08:43:33] Epoch: 1 Batch: 3705/20099 (18.43%) Loss: 2.052067 LR: 0.00002885 +[08:43:36] Epoch: 1 Batch: 3706/20099 (18.44%) Loss: 2.219908 LR: 0.00002885 +[08:43:39] Epoch: 1 Batch: 3707/20099 (18.44%) Loss: 2.150232 LR: 0.00002885 +[08:43:42] Epoch: 1 Batch: 3708/20099 (18.45%) Loss: 2.270745 LR: 0.00002885 +[08:43:45] Epoch: 1 Batch: 3709/20099 (18.45%) Loss: 2.215064 LR: 0.00002885 +[08:43:49] Epoch: 1 Batch: 3710/20099 (18.46%) Loss: 2.150714 LR: 0.00002891 +[08:43:52] Epoch: 1 Batch: 3711/20099 (18.46%) Loss: 2.217515 LR: 0.00002891 +[08:43:55] Epoch: 1 Batch: 3712/20099 (18.47%) Loss: 2.041106 LR: 0.00002891 +[08:43:58] Epoch: 1 Batch: 3713/20099 (18.47%) Loss: 2.185126 LR: 0.00002891 +[08:44:01] Epoch: 1 Batch: 3714/20099 (18.48%) Loss: 1.983349 LR: 0.00002891 +[08:44:04] Epoch: 1 Batch: 3715/20099 (18.48%) Loss: 2.292759 LR: 0.00002891 +[08:44:07] Epoch: 1 Batch: 3716/20099 (18.49%) Loss: 2.557661 LR: 0.00002891 +[08:44:10] Epoch: 1 Batch: 3717/20099 (18.49%) Loss: 2.389632 LR: 0.00002896 +[08:44:13] Epoch: 1 Batch: 3718/20099 (18.50%) Loss: 2.261550 LR: 0.00002896 +[08:44:16] Epoch: 1 Batch: 3719/20099 (18.50%) Loss: 2.357910 LR: 0.00002896 +[08:44:19] Epoch: 1 Batch: 3720/20099 (18.51%) Loss: 2.354188 LR: 0.00002896 +[08:44:23] Epoch: 1 Batch: 3721/20099 (18.51%) Loss: 2.413951 LR: 0.00002896 +[08:44:26] Epoch: 1 Batch: 3722/20099 (18.52%) Loss: 2.348244 LR: 0.00002896 +[08:44:29] Epoch: 
1 Batch: 3723/20099 (18.52%) Loss: 2.052775 LR: 0.00002896 +[08:44:32] Epoch: 1 Batch: 3724/20099 (18.53%) Loss: 2.066980 LR: 0.00002902 +[08:44:35] Epoch: 1 Batch: 3725/20099 (18.53%) Loss: 2.157805 LR: 0.00002902 +[08:44:38] Epoch: 1 Batch: 3726/20099 (18.54%) Loss: 2.375665 LR: 0.00002902 +[08:44:41] Epoch: 1 Batch: 3727/20099 (18.54%) Loss: 2.223972 LR: 0.00002902 +[08:44:44] Epoch: 1 Batch: 3728/20099 (18.55%) Loss: 2.517500 LR: 0.00002902 +[08:44:47] Epoch: 1 Batch: 3729/20099 (18.55%) Loss: 1.913212 LR: 0.00002902 +[08:44:50] Epoch: 1 Batch: 3730/20099 (18.56%) Loss: 2.251338 LR: 0.00002902 +[08:44:54] Epoch: 1 Batch: 3731/20099 (18.56%) Loss: 2.292226 LR: 0.00002907 +[08:44:57] Epoch: 1 Batch: 3732/20099 (18.57%) Loss: 2.232478 LR: 0.00002907 +[08:45:00] Epoch: 1 Batch: 3733/20099 (18.57%) Loss: 2.127770 LR: 0.00002907 +[08:45:03] Epoch: 1 Batch: 3734/20099 (18.58%) Loss: 1.975544 LR: 0.00002907 +[08:45:06] Epoch: 1 Batch: 3735/20099 (18.58%) Loss: 2.209234 LR: 0.00002907 +[08:45:09] Epoch: 1 Batch: 3736/20099 (18.59%) Loss: 2.556951 LR: 0.00002907 +[08:45:12] Epoch: 1 Batch: 3737/20099 (18.59%) Loss: 2.449658 LR: 0.00002907 +[08:45:15] Epoch: 1 Batch: 3738/20099 (18.60%) Loss: 2.121218 LR: 0.00002913 +[08:45:18] Epoch: 1 Batch: 3739/20099 (18.60%) Loss: 2.174524 LR: 0.00002913 +[08:45:21] Epoch: 1 Batch: 3740/20099 (18.61%) Loss: 2.473674 LR: 0.00002913 +[08:45:25] Epoch: 1 Batch: 3741/20099 (18.61%) Loss: 1.983352 LR: 0.00002913 +[08:45:28] Epoch: 1 Batch: 3742/20099 (18.62%) Loss: 2.176258 LR: 0.00002913 +[08:45:31] Epoch: 1 Batch: 3743/20099 (18.62%) Loss: 2.259927 LR: 0.00002913 +[08:45:34] Epoch: 1 Batch: 3744/20099 (18.63%) Loss: 2.249662 LR: 0.00002913 +[08:45:37] Epoch: 1 Batch: 3745/20099 (18.63%) Loss: 2.053046 LR: 0.00002918 +[08:45:40] Epoch: 1 Batch: 3746/20099 (18.64%) Loss: 2.430856 LR: 0.00002918 +[08:45:43] Epoch: 1 Batch: 3747/20099 (18.64%) Loss: 2.135508 LR: 0.00002918 +[08:45:46] Epoch: 1 Batch: 3748/20099 (18.65%) Loss: 2.294399 LR: 
0.00002918 +[08:45:49] Epoch: 1 Batch: 3749/20099 (18.65%) Loss: 2.292766 LR: 0.00002918 +[08:45:52] Epoch: 1 Batch: 3750/20099 (18.66%) Loss: 2.424328 LR: 0.00002918 +[08:45:56] Epoch: 1 Batch: 3751/20099 (18.66%) Loss: 2.243640 LR: 0.00002918 +[08:45:59] Epoch: 1 Batch: 3752/20099 (18.67%) Loss: 2.251503 LR: 0.00002924 +[08:46:02] Epoch: 1 Batch: 3753/20099 (18.67%) Loss: 2.150180 LR: 0.00002924 +[08:46:05] Epoch: 1 Batch: 3754/20099 (18.68%) Loss: 2.662363 LR: 0.00002924 +[08:46:08] Epoch: 1 Batch: 3755/20099 (18.68%) Loss: 1.748996 LR: 0.00002924 +[08:46:11] Epoch: 1 Batch: 3756/20099 (18.69%) Loss: 2.200341 LR: 0.00002924 +[08:46:14] Epoch: 1 Batch: 3757/20099 (18.69%) Loss: 2.118017 LR: 0.00002924 +[08:46:17] Epoch: 1 Batch: 3758/20099 (18.70%) Loss: 2.150166 LR: 0.00002924 +[08:46:20] Epoch: 1 Batch: 3759/20099 (18.70%) Loss: 2.028834 LR: 0.00002929 +[08:46:23] Epoch: 1 Batch: 3760/20099 (18.71%) Loss: 1.854575 LR: 0.00002929 +[08:46:26] Epoch: 1 Batch: 3761/20099 (18.71%) Loss: 2.464490 LR: 0.00002929 +[08:46:30] Epoch: 1 Batch: 3762/20099 (18.72%) Loss: 2.352511 LR: 0.00002929 +[08:46:33] Epoch: 1 Batch: 3763/20099 (18.72%) Loss: 2.200231 LR: 0.00002929 +[08:46:36] Epoch: 1 Batch: 3764/20099 (18.73%) Loss: 2.044948 LR: 0.00002929 +[08:46:39] Epoch: 1 Batch: 3765/20099 (18.73%) Loss: 2.441801 LR: 0.00002929 +[08:46:42] Epoch: 1 Batch: 3766/20099 (18.74%) Loss: 2.169284 LR: 0.00002935 +[08:46:45] Epoch: 1 Batch: 3767/20099 (18.74%) Loss: 2.479515 LR: 0.00002935 +[08:46:48] Epoch: 1 Batch: 3768/20099 (18.75%) Loss: 2.173170 LR: 0.00002935 +[08:46:51] Epoch: 1 Batch: 3769/20099 (18.75%) Loss: 2.112743 LR: 0.00002935 +[08:46:54] Epoch: 1 Batch: 3770/20099 (18.76%) Loss: 2.054328 LR: 0.00002935 +[08:46:57] Epoch: 1 Batch: 3771/20099 (18.76%) Loss: 2.099656 LR: 0.00002935 +[08:47:01] Epoch: 1 Batch: 3772/20099 (18.77%) Loss: 1.925619 LR: 0.00002935 +[08:47:04] Epoch: 1 Batch: 3773/20099 (18.77%) Loss: 2.109031 LR: 0.00002940 +[08:47:07] Epoch: 1 Batch: 3774/20099 
(18.78%) Loss: 1.979051 LR: 0.00002940 +[08:47:10] Epoch: 1 Batch: 3775/20099 (18.78%) Loss: 2.282022 LR: 0.00002940 +[08:47:13] Epoch: 1 Batch: 3776/20099 (18.79%) Loss: 2.165964 LR: 0.00002940 +[08:47:16] Epoch: 1 Batch: 3777/20099 (18.79%) Loss: 2.226140 LR: 0.00002940 +[08:47:19] Epoch: 1 Batch: 3778/20099 (18.80%) Loss: 2.232293 LR: 0.00002940 +[08:47:22] Epoch: 1 Batch: 3779/20099 (18.80%) Loss: 2.193266 LR: 0.00002940 +[08:47:25] Epoch: 1 Batch: 3780/20099 (18.81%) Loss: 2.190737 LR: 0.00002945 +[08:47:28] Epoch: 1 Batch: 3781/20099 (18.81%) Loss: 2.046825 LR: 0.00002945 +[08:47:31] Epoch: 1 Batch: 3782/20099 (18.82%) Loss: 2.502385 LR: 0.00002945 +[08:47:35] Epoch: 1 Batch: 3783/20099 (18.82%) Loss: 2.216633 LR: 0.00002945 +[08:47:38] Epoch: 1 Batch: 3784/20099 (18.83%) Loss: 2.179853 LR: 0.00002945 +[08:47:41] Epoch: 1 Batch: 3785/20099 (18.83%) Loss: 2.507049 LR: 0.00002945 +[08:47:44] Epoch: 1 Batch: 3786/20099 (18.84%) Loss: 2.297454 LR: 0.00002945 +[08:47:47] Epoch: 1 Batch: 3787/20099 (18.84%) Loss: 2.421426 LR: 0.00002951 +[08:47:50] Epoch: 1 Batch: 3788/20099 (18.85%) Loss: 1.943087 LR: 0.00002951 +[08:47:53] Epoch: 1 Batch: 3789/20099 (18.85%) Loss: 2.537128 LR: 0.00002951 +[08:47:56] Epoch: 1 Batch: 3790/20099 (18.86%) Loss: 2.257926 LR: 0.00002951 +[08:47:59] Epoch: 1 Batch: 3791/20099 (18.86%) Loss: 2.148468 LR: 0.00002951 +[08:48:02] Epoch: 1 Batch: 3792/20099 (18.87%) Loss: 2.151384 LR: 0.00002951 +[08:48:05] Epoch: 1 Batch: 3793/20099 (18.87%) Loss: 2.077953 LR: 0.00002951 +[08:48:09] Epoch: 1 Batch: 3794/20099 (18.88%) Loss: 2.009767 LR: 0.00002956 +[08:48:12] Epoch: 1 Batch: 3795/20099 (18.88%) Loss: 2.151670 LR: 0.00002956 +[08:48:15] Epoch: 1 Batch: 3796/20099 (18.89%) Loss: 2.232823 LR: 0.00002956 +[08:48:18] Epoch: 1 Batch: 3797/20099 (18.89%) Loss: 2.090296 LR: 0.00002956 +[08:48:21] Epoch: 1 Batch: 3798/20099 (18.90%) Loss: 2.334928 LR: 0.00002956 +[08:48:24] Epoch: 1 Batch: 3799/20099 (18.90%) Loss: 2.277787 LR: 0.00002956 
+[08:48:31] >> Cleaned up old temp checkpoint: epoch1_step1800 +[08:48:31] >> Temp checkpoint saved: epoch1_step3800, size: 0.1693 GB +[08:48:31] Epoch: 1 Batch: 3800/20099 (18.91%) Loss: 2.539415 LR: 0.00002956 +[08:48:34] Epoch: 1 Batch: 3801/20099 (18.91%) Loss: 1.990244 LR: 0.00002962 +[08:48:37] Epoch: 1 Batch: 3802/20099 (18.92%) Loss: 2.327858 LR: 0.00002962 +[08:48:40] Epoch: 1 Batch: 3803/20099 (18.92%) Loss: 2.119348 LR: 0.00002962 +[08:48:43] Epoch: 1 Batch: 3804/20099 (18.93%) Loss: 2.100741 LR: 0.00002962 +[08:48:46] Epoch: 1 Batch: 3805/20099 (18.93%) Loss: 2.375851 LR: 0.00002962 +[08:48:49] Epoch: 1 Batch: 3806/20099 (18.94%) Loss: 2.230961 LR: 0.00002962 +[08:48:52] Epoch: 1 Batch: 3807/20099 (18.94%) Loss: 2.093738 LR: 0.00002962 +[08:48:55] Epoch: 1 Batch: 3808/20099 (18.95%) Loss: 1.882820 LR: 0.00002967 +[08:48:58] Epoch: 1 Batch: 3809/20099 (18.95%) Loss: 2.269152 LR: 0.00002967 +[08:49:02] Epoch: 1 Batch: 3810/20099 (18.96%) Loss: 2.537046 LR: 0.00002967 +[08:49:05] Epoch: 1 Batch: 3811/20099 (18.96%) Loss: 2.094864 LR: 0.00002967 +[08:49:08] Epoch: 1 Batch: 3812/20099 (18.97%) Loss: 2.309602 LR: 0.00002967 +[08:49:11] Epoch: 1 Batch: 3813/20099 (18.97%) Loss: 2.361408 LR: 0.00002967 +[08:49:14] Epoch: 1 Batch: 3814/20099 (18.98%) Loss: 2.239440 LR: 0.00002967 +[08:49:17] Epoch: 1 Batch: 3815/20099 (18.98%) Loss: 2.462315 LR: 0.00002973 +[08:49:20] Epoch: 1 Batch: 3816/20099 (18.99%) Loss: 2.221870 LR: 0.00002973 +[08:49:23] Epoch: 1 Batch: 3817/20099 (18.99%) Loss: 2.282412 LR: 0.00002973 +[08:49:27] Epoch: 1 Batch: 3818/20099 (19.00%) Loss: 2.381946 LR: 0.00002973 +[08:49:30] Epoch: 1 Batch: 3819/20099 (19.00%) Loss: 2.620139 LR: 0.00002973 +[08:49:33] Epoch: 1 Batch: 3820/20099 (19.01%) Loss: 2.212627 LR: 0.00002973 +[08:49:36] Epoch: 1 Batch: 3821/20099 (19.01%) Loss: 2.237298 LR: 0.00002973 +[08:49:39] Epoch: 1 Batch: 3822/20099 (19.02%) Loss: 2.162826 LR: 0.00002978 +[08:49:42] Epoch: 1 Batch: 3823/20099 (19.02%) Loss: 2.199271 LR: 
0.00002978 +[08:49:45] Epoch: 1 Batch: 3824/20099 (19.03%) Loss: 1.933123 LR: 0.00002978 +[08:49:48] Epoch: 1 Batch: 3825/20099 (19.03%) Loss: 2.414501 LR: 0.00002978 +[08:49:51] Epoch: 1 Batch: 3826/20099 (19.04%) Loss: 2.186663 LR: 0.00002978 +[08:49:54] Epoch: 1 Batch: 3827/20099 (19.04%) Loss: 2.290898 LR: 0.00002978 +[08:49:57] Epoch: 1 Batch: 3828/20099 (19.05%) Loss: 2.356161 LR: 0.00002978 +[08:50:00] Epoch: 1 Batch: 3829/20099 (19.05%) Loss: 2.225232 LR: 0.00002984 +[08:50:04] Epoch: 1 Batch: 3830/20099 (19.06%) Loss: 2.139328 LR: 0.00002984 +[08:50:07] Epoch: 1 Batch: 3831/20099 (19.06%) Loss: 2.191448 LR: 0.00002984 +[08:50:10] Epoch: 1 Batch: 3832/20099 (19.07%) Loss: 2.520070 LR: 0.00002984 +[08:50:13] Epoch: 1 Batch: 3833/20099 (19.07%) Loss: 2.179572 LR: 0.00002984 +[08:50:16] Epoch: 1 Batch: 3834/20099 (19.08%) Loss: 2.178042 LR: 0.00002984 +[08:50:19] Epoch: 1 Batch: 3835/20099 (19.08%) Loss: 2.235148 LR: 0.00002984 +[08:50:22] Epoch: 1 Batch: 3836/20099 (19.09%) Loss: 2.144053 LR: 0.00002989 +[08:50:25] Epoch: 1 Batch: 3837/20099 (19.09%) Loss: 1.648186 LR: 0.00002989 +[08:50:28] Epoch: 1 Batch: 3838/20099 (19.10%) Loss: 2.362682 LR: 0.00002989 +[08:50:31] Epoch: 1 Batch: 3839/20099 (19.10%) Loss: 2.243945 LR: 0.00002989 +[08:50:35] Epoch: 1 Batch: 3840/20099 (19.11%) Loss: 1.998917 LR: 0.00002989 +[08:50:38] Epoch: 1 Batch: 3841/20099 (19.11%) Loss: 2.216210 LR: 0.00002989 +[08:50:41] Epoch: 1 Batch: 3842/20099 (19.12%) Loss: 2.220915 LR: 0.00002989 +[08:50:44] Epoch: 1 Batch: 3843/20099 (19.12%) Loss: 1.960943 LR: 0.00002995 +[08:50:47] Epoch: 1 Batch: 3844/20099 (19.13%) Loss: 2.140455 LR: 0.00002995 +[08:50:50] Epoch: 1 Batch: 3845/20099 (19.13%) Loss: 2.120445 LR: 0.00002995 +[08:50:53] Epoch: 1 Batch: 3846/20099 (19.14%) Loss: 2.359638 LR: 0.00002995 +[08:50:56] Epoch: 1 Batch: 3847/20099 (19.14%) Loss: 2.635872 LR: 0.00002995 +[08:50:59] Epoch: 1 Batch: 3848/20099 (19.15%) Loss: 2.234388 LR: 0.00002995 +[08:51:02] Epoch: 1 Batch: 3849/20099 
(19.15%) Loss: 1.966524 LR: 0.00002995 +[08:51:05] Epoch: 1 Batch: 3850/20099 (19.16%) Loss: 2.364397 LR: 0.00003000 +[08:51:09] Epoch: 1 Batch: 3851/20099 (19.16%) Loss: 2.200590 LR: 0.00003000 +[08:51:12] Epoch: 1 Batch: 3852/20099 (19.17%) Loss: 1.904665 LR: 0.00003000 +[08:51:15] Epoch: 1 Batch: 3853/20099 (19.17%) Loss: 1.898541 LR: 0.00003000 +[08:51:18] Epoch: 1 Batch: 3854/20099 (19.18%) Loss: 2.095768 LR: 0.00003000 +[08:51:21] Epoch: 1 Batch: 3855/20099 (19.18%) Loss: 2.233748 LR: 0.00003000 +[08:51:24] Epoch: 1 Batch: 3856/20099 (19.19%) Loss: 2.043198 LR: 0.00003000 +[08:51:27] Epoch: 1 Batch: 3857/20099 (19.19%) Loss: 2.194845 LR: 0.00003000 +[08:51:30] Epoch: 1 Batch: 3858/20099 (19.19%) Loss: 2.262131 LR: 0.00003000 +[08:51:33] Epoch: 1 Batch: 3859/20099 (19.20%) Loss: 2.312472 LR: 0.00003000 +[08:51:36] Epoch: 1 Batch: 3860/20099 (19.20%) Loss: 2.156317 LR: 0.00003000 +[08:51:40] Epoch: 1 Batch: 3861/20099 (19.21%) Loss: 1.864562 LR: 0.00003000 +[08:51:43] Epoch: 1 Batch: 3862/20099 (19.21%) Loss: 2.658740 LR: 0.00003000 +[08:51:46] Epoch: 1 Batch: 3863/20099 (19.22%) Loss: 2.308361 LR: 0.00003000 +[08:51:49] Epoch: 1 Batch: 3864/20099 (19.22%) Loss: 1.825405 LR: 0.00003000 +[08:51:52] Epoch: 1 Batch: 3865/20099 (19.23%) Loss: 2.038861 LR: 0.00003000 +[08:51:55] Epoch: 1 Batch: 3866/20099 (19.23%) Loss: 2.258087 LR: 0.00003000 +[08:51:58] Epoch: 1 Batch: 3867/20099 (19.24%) Loss: 2.416586 LR: 0.00003000 +[08:52:01] Epoch: 1 Batch: 3868/20099 (19.24%) Loss: 2.051997 LR: 0.00003000 +[08:52:04] Epoch: 1 Batch: 3869/20099 (19.25%) Loss: 2.440117 LR: 0.00003000 +[08:52:07] Epoch: 1 Batch: 3870/20099 (19.25%) Loss: 2.371255 LR: 0.00003000 +[08:52:10] Epoch: 1 Batch: 3871/20099 (19.26%) Loss: 2.437447 LR: 0.00003000 +[08:52:14] Epoch: 1 Batch: 3872/20099 (19.26%) Loss: 2.179188 LR: 0.00003000 +[08:52:17] Epoch: 1 Batch: 3873/20099 (19.27%) Loss: 2.308589 LR: 0.00003000 +[08:52:20] Epoch: 1 Batch: 3874/20099 (19.27%) Loss: 2.440221 LR: 0.00003000 
+[08:52:23] Epoch: 1 Batch: 3875/20099 (19.28%) Loss: 2.405983 LR: 0.00003000 +[08:52:26] Epoch: 1 Batch: 3876/20099 (19.28%) Loss: 2.218790 LR: 0.00003000 +[08:52:29] Epoch: 1 Batch: 3877/20099 (19.29%) Loss: 2.439365 LR: 0.00003000 +[08:52:32] Epoch: 1 Batch: 3878/20099 (19.29%) Loss: 2.406316 LR: 0.00003000 +[08:52:35] Epoch: 1 Batch: 3879/20099 (19.30%) Loss: 2.026322 LR: 0.00003000 +[08:52:38] Epoch: 1 Batch: 3880/20099 (19.30%) Loss: 2.109701 LR: 0.00003000 +[08:52:41] Epoch: 1 Batch: 3881/20099 (19.31%) Loss: 2.286036 LR: 0.00003000 +[08:52:45] Epoch: 1 Batch: 3882/20099 (19.31%) Loss: 2.217481 LR: 0.00003000 +[08:52:48] Epoch: 1 Batch: 3883/20099 (19.32%) Loss: 2.191971 LR: 0.00003000 +[08:52:51] Epoch: 1 Batch: 3884/20099 (19.32%) Loss: 2.050911 LR: 0.00003000 +[08:52:54] Epoch: 1 Batch: 3885/20099 (19.33%) Loss: 2.280048 LR: 0.00003000 +[08:52:57] Epoch: 1 Batch: 3886/20099 (19.33%) Loss: 2.158287 LR: 0.00003000 +[08:53:00] Epoch: 1 Batch: 3887/20099 (19.34%) Loss: 2.189884 LR: 0.00003000 +[08:53:03] Epoch: 1 Batch: 3888/20099 (19.34%) Loss: 2.277434 LR: 0.00003000 +[08:53:06] Epoch: 1 Batch: 3889/20099 (19.35%) Loss: 2.182162 LR: 0.00003000 +[08:53:09] Epoch: 1 Batch: 3890/20099 (19.35%) Loss: 2.280168 LR: 0.00003000 +[08:53:12] Epoch: 1 Batch: 3891/20099 (19.36%) Loss: 2.149066 LR: 0.00003000 +[08:53:16] Epoch: 1 Batch: 3892/20099 (19.36%) Loss: 2.212634 LR: 0.00003000 +[08:53:19] Epoch: 1 Batch: 3893/20099 (19.37%) Loss: 2.237472 LR: 0.00003000 +[08:53:22] Epoch: 1 Batch: 3894/20099 (19.37%) Loss: 2.222408 LR: 0.00003000 +[08:53:25] Epoch: 1 Batch: 3895/20099 (19.38%) Loss: 2.097622 LR: 0.00003000 +[08:53:28] Epoch: 1 Batch: 3896/20099 (19.38%) Loss: 2.108954 LR: 0.00003000 +[08:53:31] Epoch: 1 Batch: 3897/20099 (19.39%) Loss: 2.246134 LR: 0.00003000 +[08:53:34] Epoch: 1 Batch: 3898/20099 (19.39%) Loss: 2.177431 LR: 0.00003000 +[08:53:37] Epoch: 1 Batch: 3899/20099 (19.40%) Loss: 1.998008 LR: 0.00003000 +[08:53:40] Epoch: 1 Batch: 3900/20099 (19.40%) 
Loss: 2.092345 LR: 0.00003000 +[08:53:43] Epoch: 1 Batch: 3901/20099 (19.41%) Loss: 2.453894 LR: 0.00003000 +[08:53:46] Epoch: 1 Batch: 3902/20099 (19.41%) Loss: 2.193978 LR: 0.00003000 +[08:53:50] Epoch: 1 Batch: 3903/20099 (19.42%) Loss: 2.346276 LR: 0.00003000 +[08:53:53] Epoch: 1 Batch: 3904/20099 (19.42%) Loss: 2.043164 LR: 0.00003000 +[08:53:56] Epoch: 1 Batch: 3905/20099 (19.43%) Loss: 1.883669 LR: 0.00003000 +[08:53:59] Epoch: 1 Batch: 3906/20099 (19.43%) Loss: 2.210085 LR: 0.00003000 +[08:54:02] Epoch: 1 Batch: 3907/20099 (19.44%) Loss: 2.113711 LR: 0.00003000 +[08:54:05] Epoch: 1 Batch: 3908/20099 (19.44%) Loss: 2.153842 LR: 0.00003000 +[08:54:08] Epoch: 1 Batch: 3909/20099 (19.45%) Loss: 2.385109 LR: 0.00003000 +[08:54:11] Epoch: 1 Batch: 3910/20099 (19.45%) Loss: 2.434619 LR: 0.00003000 +[08:54:14] Epoch: 1 Batch: 3911/20099 (19.46%) Loss: 2.042595 LR: 0.00003000 +[08:54:17] Epoch: 1 Batch: 3912/20099 (19.46%) Loss: 2.297205 LR: 0.00003000 +[08:54:21] Epoch: 1 Batch: 3913/20099 (19.47%) Loss: 1.919553 LR: 0.00003000 +[08:54:24] Epoch: 1 Batch: 3914/20099 (19.47%) Loss: 2.164845 LR: 0.00003000 +[08:54:27] Epoch: 1 Batch: 3915/20099 (19.48%) Loss: 2.108092 LR: 0.00003000 +[08:54:30] Epoch: 1 Batch: 3916/20099 (19.48%) Loss: 2.016833 LR: 0.00003000 +[08:54:33] Epoch: 1 Batch: 3917/20099 (19.49%) Loss: 2.137065 LR: 0.00003000 +[08:54:36] Epoch: 1 Batch: 3918/20099 (19.49%) Loss: 2.298762 LR: 0.00003000 +[08:54:39] Epoch: 1 Batch: 3919/20099 (19.50%) Loss: 2.226798 LR: 0.00003000 +[08:54:42] Epoch: 1 Batch: 3920/20099 (19.50%) Loss: 2.254846 LR: 0.00003000 +[08:54:45] Epoch: 1 Batch: 3921/20099 (19.51%) Loss: 2.273290 LR: 0.00003000 +[08:54:48] Epoch: 1 Batch: 3922/20099 (19.51%) Loss: 2.436063 LR: 0.00003000 +[08:54:51] Epoch: 1 Batch: 3923/20099 (19.52%) Loss: 2.335819 LR: 0.00003000 +[08:54:55] Epoch: 1 Batch: 3924/20099 (19.52%) Loss: 2.320823 LR: 0.00003000 +[08:54:58] Epoch: 1 Batch: 3925/20099 (19.53%) Loss: 2.088118 LR: 0.00003000 +[08:55:01] Epoch: 
1 Batch: 3926/20099 (19.53%) Loss: 2.095462 LR: 0.00003000 +[08:55:04] Epoch: 1 Batch: 3927/20099 (19.54%) Loss: 2.344166 LR: 0.00003000 +[08:55:07] Epoch: 1 Batch: 3928/20099 (19.54%) Loss: 2.399682 LR: 0.00003000 +[08:55:10] Epoch: 1 Batch: 3929/20099 (19.55%) Loss: 2.357320 LR: 0.00003000 +[08:55:13] Epoch: 1 Batch: 3930/20099 (19.55%) Loss: 2.176040 LR: 0.00003000 +[08:55:16] Epoch: 1 Batch: 3931/20099 (19.56%) Loss: 2.224869 LR: 0.00003000 +[08:55:19] Epoch: 1 Batch: 3932/20099 (19.56%) Loss: 2.163687 LR: 0.00003000 +[08:55:22] Epoch: 1 Batch: 3933/20099 (19.57%) Loss: 2.272918 LR: 0.00003000 +[08:55:26] Epoch: 1 Batch: 3934/20099 (19.57%) Loss: 2.407977 LR: 0.00003000 +[08:55:29] Epoch: 1 Batch: 3935/20099 (19.58%) Loss: 1.891371 LR: 0.00003000 +[08:55:32] Epoch: 1 Batch: 3936/20099 (19.58%) Loss: 2.398843 LR: 0.00003000 +[08:55:35] Epoch: 1 Batch: 3937/20099 (19.59%) Loss: 2.009269 LR: 0.00003000 +[08:55:38] Epoch: 1 Batch: 3938/20099 (19.59%) Loss: 2.259749 LR: 0.00003000 +[08:55:41] Epoch: 1 Batch: 3939/20099 (19.60%) Loss: 2.069403 LR: 0.00003000 +[08:55:44] Epoch: 1 Batch: 3940/20099 (19.60%) Loss: 2.485107 LR: 0.00003000 +[08:55:47] Epoch: 1 Batch: 3941/20099 (19.61%) Loss: 2.187631 LR: 0.00003000 +[08:55:50] Epoch: 1 Batch: 3942/20099 (19.61%) Loss: 2.339424 LR: 0.00003000 +[08:55:53] Epoch: 1 Batch: 3943/20099 (19.62%) Loss: 2.199047 LR: 0.00003000 +[08:55:57] Epoch: 1 Batch: 3944/20099 (19.62%) Loss: 2.386613 LR: 0.00003000 +[08:56:00] Epoch: 1 Batch: 3945/20099 (19.63%) Loss: 1.938118 LR: 0.00003000 +[08:56:03] Epoch: 1 Batch: 3946/20099 (19.63%) Loss: 2.171975 LR: 0.00003000 +[08:56:06] Epoch: 1 Batch: 3947/20099 (19.64%) Loss: 2.305034 LR: 0.00003000 +[08:56:09] Epoch: 1 Batch: 3948/20099 (19.64%) Loss: 2.423848 LR: 0.00003000 +[08:56:12] Epoch: 1 Batch: 3949/20099 (19.65%) Loss: 1.990211 LR: 0.00003000 +[08:56:15] Epoch: 1 Batch: 3950/20099 (19.65%) Loss: 2.271347 LR: 0.00003000 +[08:56:18] Epoch: 1 Batch: 3951/20099 (19.66%) Loss: 2.170313 LR: 
0.00003000 +[08:56:21] Epoch: 1 Batch: 3952/20099 (19.66%) Loss: 2.259197 LR: 0.00003000 +[08:56:24] Epoch: 1 Batch: 3953/20099 (19.67%) Loss: 2.065250 LR: 0.00003000 +[08:56:27] Epoch: 1 Batch: 3954/20099 (19.67%) Loss: 2.168954 LR: 0.00003000 +[08:56:31] Epoch: 1 Batch: 3955/20099 (19.68%) Loss: 2.025030 LR: 0.00003000 +[08:56:34] Epoch: 1 Batch: 3956/20099 (19.68%) Loss: 2.002827 LR: 0.00003000 +[08:56:37] Epoch: 1 Batch: 3957/20099 (19.69%) Loss: 2.305538 LR: 0.00003000 +[08:56:40] Epoch: 1 Batch: 3958/20099 (19.69%) Loss: 2.046114 LR: 0.00003000 +[08:56:43] Epoch: 1 Batch: 3959/20099 (19.70%) Loss: 2.208117 LR: 0.00003000 +[08:56:46] Epoch: 1 Batch: 3960/20099 (19.70%) Loss: 2.387245 LR: 0.00003000 +[08:56:49] Epoch: 1 Batch: 3961/20099 (19.71%) Loss: 2.148747 LR: 0.00003000 +[08:56:52] Epoch: 1 Batch: 3962/20099 (19.71%) Loss: 2.382910 LR: 0.00003000 +[08:56:55] Epoch: 1 Batch: 3963/20099 (19.72%) Loss: 1.934010 LR: 0.00003000 +[08:56:58] Epoch: 1 Batch: 3964/20099 (19.72%) Loss: 2.243827 LR: 0.00003000 +[08:57:02] Epoch: 1 Batch: 3965/20099 (19.73%) Loss: 2.402542 LR: 0.00003000 +[08:57:05] Epoch: 1 Batch: 3966/20099 (19.73%) Loss: 2.236204 LR: 0.00003000 +[08:57:08] Epoch: 1 Batch: 3967/20099 (19.74%) Loss: 2.381816 LR: 0.00003000 +[08:57:11] Epoch: 1 Batch: 3968/20099 (19.74%) Loss: 2.417387 LR: 0.00003000 +[08:57:14] Epoch: 1 Batch: 3969/20099 (19.75%) Loss: 1.858077 LR: 0.00003000 +[08:57:17] Epoch: 1 Batch: 3970/20099 (19.75%) Loss: 1.801800 LR: 0.00003000 +[08:57:20] Epoch: 1 Batch: 3971/20099 (19.76%) Loss: 2.072035 LR: 0.00003000 +[08:57:23] Epoch: 1 Batch: 3972/20099 (19.76%) Loss: 2.193757 LR: 0.00003000 +[08:57:26] Epoch: 1 Batch: 3973/20099 (19.77%) Loss: 2.321878 LR: 0.00003000 +[08:57:29] Epoch: 1 Batch: 3974/20099 (19.77%) Loss: 2.119226 LR: 0.00003000 +[08:57:33] Epoch: 1 Batch: 3975/20099 (19.78%) Loss: 2.380112 LR: 0.00003000 +[08:57:36] Epoch: 1 Batch: 3976/20099 (19.78%) Loss: 2.360560 LR: 0.00003000 +[08:57:39] Epoch: 1 Batch: 3977/20099 
(19.79%) Loss: 1.954153 LR: 0.00003000 +[08:57:42] Epoch: 1 Batch: 3978/20099 (19.79%) Loss: 2.244420 LR: 0.00003000 +[08:57:45] Epoch: 1 Batch: 3979/20099 (19.80%) Loss: 2.289345 LR: 0.00003000 +[08:57:48] Epoch: 1 Batch: 3980/20099 (19.80%) Loss: 2.409555 LR: 0.00003000 +[08:57:51] Epoch: 1 Batch: 3981/20099 (19.81%) Loss: 2.168973 LR: 0.00003000 +[08:57:54] Epoch: 1 Batch: 3982/20099 (19.81%) Loss: 2.279854 LR: 0.00003000 +[08:57:57] Epoch: 1 Batch: 3983/20099 (19.82%) Loss: 2.380676 LR: 0.00003000 +[08:58:00] Epoch: 1 Batch: 3984/20099 (19.82%) Loss: 2.071845 LR: 0.00003000 +[08:58:04] Epoch: 1 Batch: 3985/20099 (19.83%) Loss: 2.416183 LR: 0.00003000 +[08:58:07] Epoch: 1 Batch: 3986/20099 (19.83%) Loss: 2.147555 LR: 0.00003000 +[08:58:10] Epoch: 1 Batch: 3987/20099 (19.84%) Loss: 1.923975 LR: 0.00003000 +[08:58:13] Epoch: 1 Batch: 3988/20099 (19.84%) Loss: 2.187119 LR: 0.00003000 +[08:58:16] Epoch: 1 Batch: 3989/20099 (19.85%) Loss: 2.210663 LR: 0.00003000 +[08:58:19] Epoch: 1 Batch: 3990/20099 (19.85%) Loss: 2.222629 LR: 0.00003000 +[08:58:22] Epoch: 1 Batch: 3991/20099 (19.86%) Loss: 2.158592 LR: 0.00003000 +[08:58:25] Epoch: 1 Batch: 3992/20099 (19.86%) Loss: 1.971459 LR: 0.00003000 +[08:58:28] Epoch: 1 Batch: 3993/20099 (19.87%) Loss: 2.283452 LR: 0.00003000 +[08:58:31] Epoch: 1 Batch: 3994/20099 (19.87%) Loss: 2.040454 LR: 0.00003000 +[08:58:34] Epoch: 1 Batch: 3995/20099 (19.88%) Loss: 2.261510 LR: 0.00003000 +[08:58:38] Epoch: 1 Batch: 3996/20099 (19.88%) Loss: 2.243987 LR: 0.00003000 +[08:58:41] Epoch: 1 Batch: 3997/20099 (19.89%) Loss: 2.128390 LR: 0.00003000 +[08:58:44] Epoch: 1 Batch: 3998/20099 (19.89%) Loss: 2.338022 LR: 0.00003000 +[08:58:47] Epoch: 1 Batch: 3999/20099 (19.90%) Loss: 2.137212 LR: 0.00003000 +[08:58:50] >> Evaluating batch 0 +[08:58:51] >> Evaluating batch 1 +[08:58:53] >> Evaluating batch 2 +[08:58:54] >> Evaluating batch 3 +[08:58:55] >> Evaluating batch 4 +[08:58:56] >> Evaluating batch 5 +[08:58:58] >> Evaluating batch 6 
+[08:58:59] >> Evaluating batch 7 +[08:59:00] >> Evaluating batch 8 +[08:59:01] >> Evaluating batch 9 +[08:59:03] >> Evaluating batch 10 +[08:59:04] >> Evaluating batch 11 +[08:59:05] >> Evaluating batch 12 +[08:59:06] >> Evaluating batch 13 +[08:59:07] >> Evaluating batch 14 +[08:59:08] >> Evaluating batch 15 +[08:59:10] >> Evaluating batch 16 +[08:59:11] Epoch: 1 Step: 4000/20099 Evaluation: +[08:59:11] Avg Loss Since Last Eval: 2.2078 Val Loss: 2.2596 Validation loss delta: -0.0181 Perplexity: 9.5792 LR: 0.00003000 +[08:59:14] >> Cleaned up old temp checkpoint: epoch1_step2000 +[08:59:14] >> Temp checkpoint saved: epoch1_step4000, size: 0.1693 GB +[08:59:18] >> Checkpoint saved: epoch1_step4000, size: 0.1693 GB +[08:59:18] Epoch: 1 Batch: 4000/20099 (19.90%) Loss: 2.230767 LR: 0.00003000 +[08:59:21] Epoch: 1 Batch: 4001/20099 (19.91%) Loss: 2.365759 LR: 0.00003000 +[08:59:24] Epoch: 1 Batch: 4002/20099 (19.91%) Loss: 2.155231 LR: 0.00003000 +[08:59:27] Epoch: 1 Batch: 4003/20099 (19.92%) Loss: 2.146534 LR: 0.00003000 +[08:59:30] Epoch: 1 Batch: 4004/20099 (19.92%) Loss: 2.157895 LR: 0.00002999 +[08:59:33] Epoch: 1 Batch: 4005/20099 (19.93%) Loss: 2.153976 LR: 0.00002999 +[08:59:36] Epoch: 1 Batch: 4006/20099 (19.93%) Loss: 2.220458 LR: 0.00002999 +[08:59:39] Epoch: 1 Batch: 4007/20099 (19.94%) Loss: 2.197265 LR: 0.00002999 +[08:59:42] Epoch: 1 Batch: 4008/20099 (19.94%) Loss: 2.098452 LR: 0.00002999 +[08:59:46] Epoch: 1 Batch: 4009/20099 (19.95%) Loss: 2.141283 LR: 0.00002999 +[08:59:49] Epoch: 1 Batch: 4010/20099 (19.95%) Loss: 1.806260 LR: 0.00002999 +[08:59:52] Epoch: 1 Batch: 4011/20099 (19.96%) Loss: 2.075292 LR: 0.00002999 +[08:59:55] Epoch: 1 Batch: 4012/20099 (19.96%) Loss: 2.124531 LR: 0.00002999 +[08:59:58] Epoch: 1 Batch: 4013/20099 (19.97%) Loss: 2.037990 LR: 0.00002999 +[09:00:01] Epoch: 1 Batch: 4014/20099 (19.97%) Loss: 2.214685 LR: 0.00002999 +[09:00:05] Epoch: 1 Batch: 4015/20099 (19.98%) Loss: 2.198604 LR: 0.00002999 +[09:00:08] Epoch: 1
Batch: 4016/20099 (19.98%) Loss: 2.152951 LR: 0.00002999 +[09:00:11] Epoch: 1 Batch: 4017/20099 (19.99%) Loss: 2.282963 LR: 0.00002999 +[09:00:14] Epoch: 1 Batch: 4018/20099 (19.99%) Loss: 2.183164 LR: 0.00002999 +[09:00:17] Epoch: 1 Batch: 4019/20099 (20.00%) Loss: 2.054929 LR: 0.00002999 +[09:00:20] Epoch: 1 Batch: 4020/20099 (20.00%) Loss: 2.209050 LR: 0.00002999 +[09:00:23] Epoch: 1 Batch: 4021/20099 (20.01%) Loss: 2.128426 LR: 0.00002999 +[09:00:26] Epoch: 1 Batch: 4022/20099 (20.01%) Loss: 2.433947 LR: 0.00002999 +[09:00:29] Epoch: 1 Batch: 4023/20099 (20.02%) Loss: 2.161413 LR: 0.00002999 +[09:00:32] Epoch: 1 Batch: 4024/20099 (20.02%) Loss: 2.048685 LR: 0.00002999 +[09:00:35] Epoch: 1 Batch: 4025/20099 (20.03%) Loss: 2.127099 LR: 0.00002999 +[09:00:38] Epoch: 1 Batch: 4026/20099 (20.03%) Loss: 2.125098 LR: 0.00002999 +[09:00:42] Epoch: 1 Batch: 4027/20099 (20.04%) Loss: 2.297527 LR: 0.00002999 +[09:00:45] Epoch: 1 Batch: 4028/20099 (20.04%) Loss: 2.158046 LR: 0.00002999 +[09:00:48] Epoch: 1 Batch: 4029/20099 (20.05%) Loss: 1.937330 LR: 0.00002999 +[09:00:51] Epoch: 1 Batch: 4030/20099 (20.05%) Loss: 2.435694 LR: 0.00002999 +[09:00:54] Epoch: 1 Batch: 4031/20099 (20.06%) Loss: 2.049549 LR: 0.00002999 +[09:00:57] Epoch: 1 Batch: 4032/20099 (20.06%) Loss: 2.282878 LR: 0.00002999 +[09:01:00] Epoch: 1 Batch: 4033/20099 (20.07%) Loss: 1.783038 LR: 0.00002999 +[09:01:03] Epoch: 1 Batch: 4034/20099 (20.07%) Loss: 2.097534 LR: 0.00002999 +[09:01:06] Epoch: 1 Batch: 4035/20099 (20.08%) Loss: 1.880652 LR: 0.00002999 +[09:01:09] Epoch: 1 Batch: 4036/20099 (20.08%) Loss: 2.199209 LR: 0.00002999 +[09:01:12] Epoch: 1 Batch: 4037/20099 (20.09%) Loss: 2.250202 LR: 0.00002999 +[09:01:15] Epoch: 1 Batch: 4038/20099 (20.09%) Loss: 2.112329 LR: 0.00002999 +[09:01:19] Epoch: 1 Batch: 4039/20099 (20.10%) Loss: 2.110555 LR: 0.00002999 +[09:01:22] Epoch: 1 Batch: 4040/20099 (20.10%) Loss: 2.203888 LR: 0.00002999 +[09:01:25] Epoch: 1 Batch: 4041/20099 (20.11%) Loss: 2.109310 LR: 
0.00002999 +[09:01:28] Epoch: 1 Batch: 4042/20099 (20.11%) Loss: 2.089413 LR: 0.00002999 +[09:01:31] Epoch: 1 Batch: 4043/20099 (20.12%) Loss: 1.872151 LR: 0.00002999 +[09:01:34] Epoch: 1 Batch: 4044/20099 (20.12%) Loss: 2.067805 LR: 0.00002999 +[09:01:37] Epoch: 1 Batch: 4045/20099 (20.13%) Loss: 2.148703 LR: 0.00002999 +[09:01:40] Epoch: 1 Batch: 4046/20099 (20.13%) Loss: 2.296954 LR: 0.00002999 +[09:01:43] Epoch: 1 Batch: 4047/20099 (20.14%) Loss: 1.896643 LR: 0.00002999 +[09:01:47] Epoch: 1 Batch: 4048/20099 (20.14%) Loss: 1.918306 LR: 0.00002999 +[09:01:50] Epoch: 1 Batch: 4049/20099 (20.15%) Loss: 2.143274 LR: 0.00002999 +[09:01:53] Epoch: 1 Batch: 4050/20099 (20.15%) Loss: 2.257886 LR: 0.00002999 +[09:01:56] Epoch: 1 Batch: 4051/20099 (20.16%) Loss: 2.061531 LR: 0.00002999 +[09:01:59] Epoch: 1 Batch: 4052/20099 (20.16%) Loss: 2.403210 LR: 0.00002999 +[09:02:02] Epoch: 1 Batch: 4053/20099 (20.17%) Loss: 2.145045 LR: 0.00002999 +[09:02:05] Epoch: 1 Batch: 4054/20099 (20.17%) Loss: 2.205540 LR: 0.00002999 +[09:02:08] Epoch: 1 Batch: 4055/20099 (20.18%) Loss: 2.172061 LR: 0.00002999 +[09:02:11] Epoch: 1 Batch: 4056/20099 (20.18%) Loss: 2.575297 LR: 0.00002999 +[09:02:14] Epoch: 1 Batch: 4057/20099 (20.19%) Loss: 2.080708 LR: 0.00002999 +[09:02:18] Epoch: 1 Batch: 4058/20099 (20.19%) Loss: 2.249847 LR: 0.00002999 +[09:02:21] Epoch: 1 Batch: 4059/20099 (20.20%) Loss: 2.053966 LR: 0.00002999 +[09:02:24] Epoch: 1 Batch: 4060/20099 (20.20%) Loss: 2.025671 LR: 0.00002999 +[09:02:27] Epoch: 1 Batch: 4061/20099 (20.20%) Loss: 2.445138 LR: 0.00002999 +[09:02:30] Epoch: 1 Batch: 4062/20099 (20.21%) Loss: 1.991379 LR: 0.00002999 +[09:02:33] Epoch: 1 Batch: 4063/20099 (20.21%) Loss: 2.220240 LR: 0.00002999 +[09:02:36] Epoch: 1 Batch: 4064/20099 (20.22%) Loss: 1.978880 LR: 0.00002999 +[09:02:39] Epoch: 1 Batch: 4065/20099 (20.22%) Loss: 2.067166 LR: 0.00002999 +[09:02:42] Epoch: 1 Batch: 4066/20099 (20.23%) Loss: 2.386489 LR: 0.00002999 +[09:02:45] Epoch: 1 Batch: 4067/20099 
(20.23%) Loss: 2.406748 LR: 0.00002999 +[09:02:48] Epoch: 1 Batch: 4068/20099 (20.24%) Loss: 1.887735 LR: 0.00002999 +[09:02:52] Epoch: 1 Batch: 4069/20099 (20.24%) Loss: 2.367457 LR: 0.00002999 +[09:02:55] Epoch: 1 Batch: 4070/20099 (20.25%) Loss: 2.404861 LR: 0.00002999 +[09:02:58] Epoch: 1 Batch: 4071/20099 (20.25%) Loss: 1.999785 LR: 0.00002999 +[09:03:01] Epoch: 1 Batch: 4072/20099 (20.26%) Loss: 2.249867 LR: 0.00002999 +[09:03:04] Epoch: 1 Batch: 4073/20099 (20.26%) Loss: 2.075081 LR: 0.00002999 +[09:03:07] Epoch: 1 Batch: 4074/20099 (20.27%) Loss: 2.149020 LR: 0.00002999 +[09:03:10] Epoch: 1 Batch: 4075/20099 (20.27%) Loss: 2.059026 LR: 0.00002999 +[09:03:13] Epoch: 1 Batch: 4076/20099 (20.28%) Loss: 2.341281 LR: 0.00002999 +[09:03:16] Epoch: 1 Batch: 4077/20099 (20.28%) Loss: 1.938484 LR: 0.00002999 +[09:03:19] Epoch: 1 Batch: 4078/20099 (20.29%) Loss: 2.411069 LR: 0.00002999 +[09:03:22] Epoch: 1 Batch: 4079/20099 (20.29%) Loss: 2.407255 LR: 0.00002999 +[09:03:26] Epoch: 1 Batch: 4080/20099 (20.30%) Loss: 2.353181 LR: 0.00002999 +[09:03:29] Epoch: 1 Batch: 4081/20099 (20.30%) Loss: 1.941622 LR: 0.00002999 +[09:03:32] Epoch: 1 Batch: 4082/20099 (20.31%) Loss: 1.916857 LR: 0.00002999 +[09:03:35] Epoch: 1 Batch: 4083/20099 (20.31%) Loss: 2.263108 LR: 0.00002999 +[09:03:38] Epoch: 1 Batch: 4084/20099 (20.32%) Loss: 1.844754 LR: 0.00002999 +[09:03:41] Epoch: 1 Batch: 4085/20099 (20.32%) Loss: 2.114269 LR: 0.00002999 +[09:03:44] Epoch: 1 Batch: 4086/20099 (20.33%) Loss: 2.249472 LR: 0.00002999 +[09:03:47] Epoch: 1 Batch: 4087/20099 (20.33%) Loss: 2.391635 LR: 0.00002999 +[09:03:50] Epoch: 1 Batch: 4088/20099 (20.34%) Loss: 2.290164 LR: 0.00002999 +[09:03:53] Epoch: 1 Batch: 4089/20099 (20.34%) Loss: 2.277984 LR: 0.00002999 +[09:03:57] Epoch: 1 Batch: 4090/20099 (20.35%) Loss: 2.199185 LR: 0.00002999 +[09:04:00] Epoch: 1 Batch: 4091/20099 (20.35%) Loss: 2.028072 LR: 0.00002999 +[09:04:03] Epoch: 1 Batch: 4092/20099 (20.36%) Loss: 2.390983 LR: 0.00002999 
+[09:04:06] Epoch: 1 Batch: 4093/20099 (20.36%) Loss: 2.337943 LR: 0.00002999 +[09:04:09] Epoch: 1 Batch: 4094/20099 (20.37%) Loss: 2.323091 LR: 0.00002999 +[09:04:12] Epoch: 1 Batch: 4095/20099 (20.37%) Loss: 2.060595 LR: 0.00002999 +[09:04:15] Epoch: 1 Batch: 4096/20099 (20.38%) Loss: 2.221461 LR: 0.00002999 +[09:04:18] Epoch: 1 Batch: 4097/20099 (20.38%) Loss: 1.858494 LR: 0.00002999 +[09:04:21] Epoch: 1 Batch: 4098/20099 (20.39%) Loss: 2.214817 LR: 0.00002999 +[09:04:24] Epoch: 1 Batch: 4099/20099 (20.39%) Loss: 2.093318 LR: 0.00002999 +[09:04:27] Epoch: 1 Batch: 4100/20099 (20.40%) Loss: 1.903260 LR: 0.00002999 +[09:04:30] Epoch: 1 Batch: 4101/20099 (20.40%) Loss: 2.383768 LR: 0.00002999 +[09:04:34] Epoch: 1 Batch: 4102/20099 (20.41%) Loss: 2.129053 LR: 0.00002999 +[09:04:37] Epoch: 1 Batch: 4103/20099 (20.41%) Loss: 1.893385 LR: 0.00002999 +[09:04:40] Epoch: 1 Batch: 4104/20099 (20.42%) Loss: 2.302868 LR: 0.00002999 +[09:04:43] Epoch: 1 Batch: 4105/20099 (20.42%) Loss: 2.183161 LR: 0.00002999 +[09:04:46] Epoch: 1 Batch: 4106/20099 (20.43%) Loss: 2.235359 LR: 0.00002999 +[09:04:49] Epoch: 1 Batch: 4107/20099 (20.43%) Loss: 2.209957 LR: 0.00002999 +[09:04:52] Epoch: 1 Batch: 4108/20099 (20.44%) Loss: 2.151106 LR: 0.00002999 +[09:04:55] Epoch: 1 Batch: 4109/20099 (20.44%) Loss: 2.377003 LR: 0.00002998 +[09:04:58] Epoch: 1 Batch: 4110/20099 (20.45%) Loss: 2.182909 LR: 0.00002998 +[09:05:01] Epoch: 1 Batch: 4111/20099 (20.45%) Loss: 2.103906 LR: 0.00002998 +[09:05:04] Epoch: 1 Batch: 4112/20099 (20.46%) Loss: 2.067913 LR: 0.00002998 +[09:05:07] Epoch: 1 Batch: 4113/20099 (20.46%) Loss: 2.156087 LR: 0.00002998 +[09:05:11] Epoch: 1 Batch: 4114/20099 (20.47%) Loss: 2.289936 LR: 0.00002998 +[09:05:14] Epoch: 1 Batch: 4115/20099 (20.47%) Loss: 2.310542 LR: 0.00002998 +[09:05:17] Epoch: 1 Batch: 4116/20099 (20.48%) Loss: 2.070861 LR: 0.00002998 +[09:05:20] Epoch: 1 Batch: 4117/20099 (20.48%) Loss: 2.178672 LR: 0.00002998 +[09:05:23] Epoch: 1 Batch: 4118/20099 (20.49%) 
Loss: 2.034380 LR: 0.00002998 +[09:05:26] Epoch: 1 Batch: 4119/20099 (20.49%) Loss: 2.095064 LR: 0.00002998 +[09:05:29] Epoch: 1 Batch: 4120/20099 (20.50%) Loss: 2.139366 LR: 0.00002998 +[09:05:32] Epoch: 1 Batch: 4121/20099 (20.50%) Loss: 2.403356 LR: 0.00002998 +[09:05:35] Epoch: 1 Batch: 4122/20099 (20.51%) Loss: 2.171491 LR: 0.00002998 +[09:05:38] Epoch: 1 Batch: 4123/20099 (20.51%) Loss: 2.228421 LR: 0.00002998 +[09:05:42] Epoch: 1 Batch: 4124/20099 (20.52%) Loss: 2.023410 LR: 0.00002998 +[09:05:45] Epoch: 1 Batch: 4125/20099 (20.52%) Loss: 2.132067 LR: 0.00002998 +[09:05:48] Epoch: 1 Batch: 4126/20099 (20.53%) Loss: 2.325770 LR: 0.00002998 +[09:05:51] Epoch: 1 Batch: 4127/20099 (20.53%) Loss: 2.295927 LR: 0.00002998 +[09:05:54] Epoch: 1 Batch: 4128/20099 (20.54%) Loss: 2.182862 LR: 0.00002998 +[09:05:57] Epoch: 1 Batch: 4129/20099 (20.54%) Loss: 2.215382 LR: 0.00002998 +[09:06:00] Epoch: 1 Batch: 4130/20099 (20.55%) Loss: 2.455392 LR: 0.00002998 +[09:06:03] Epoch: 1 Batch: 4131/20099 (20.55%) Loss: 2.277093 LR: 0.00002998 +[09:06:06] Epoch: 1 Batch: 4132/20099 (20.56%) Loss: 2.348808 LR: 0.00002998 +[09:06:09] Epoch: 1 Batch: 4133/20099 (20.56%) Loss: 2.165989 LR: 0.00002998 +[09:06:13] Epoch: 1 Batch: 4134/20099 (20.57%) Loss: 2.398511 LR: 0.00002998 +[09:06:16] Epoch: 1 Batch: 4135/20099 (20.57%) Loss: 2.184958 LR: 0.00002998 +[09:06:19] Epoch: 1 Batch: 4136/20099 (20.58%) Loss: 2.131309 LR: 0.00002998 +[09:06:22] Epoch: 1 Batch: 4137/20099 (20.58%) Loss: 2.152016 LR: 0.00002998 +[09:06:25] Epoch: 1 Batch: 4138/20099 (20.59%) Loss: 2.340721 LR: 0.00002998 +[09:06:28] Epoch: 1 Batch: 4139/20099 (20.59%) Loss: 2.109335 LR: 0.00002998 +[09:06:31] Epoch: 1 Batch: 4140/20099 (20.60%) Loss: 2.196358 LR: 0.00002998 +[09:06:34] Epoch: 1 Batch: 4141/20099 (20.60%) Loss: 2.239304 LR: 0.00002998 +[09:06:37] Epoch: 1 Batch: 4142/20099 (20.61%) Loss: 2.187442 LR: 0.00002998 +[09:06:41] Epoch: 1 Batch: 4143/20099 (20.61%) Loss: 2.099458 LR: 0.00002998 +[09:06:44] Epoch: 
1 Batch: 4144/20099 (20.62%) Loss: 1.951378 LR: 0.00002998 +[09:06:47] Epoch: 1 Batch: 4145/20099 (20.62%) Loss: 2.278064 LR: 0.00002998 +[09:06:50] Epoch: 1 Batch: 4146/20099 (20.63%) Loss: 2.409459 LR: 0.00002998 +[09:06:53] Epoch: 1 Batch: 4147/20099 (20.63%) Loss: 1.883501 LR: 0.00002998 +[09:06:56] Epoch: 1 Batch: 4148/20099 (20.64%) Loss: 2.003777 LR: 0.00002998 +[09:06:59] Epoch: 1 Batch: 4149/20099 (20.64%) Loss: 1.894684 LR: 0.00002998 +[09:07:02] Epoch: 1 Batch: 4150/20099 (20.65%) Loss: 1.986508 LR: 0.00002998 +[09:07:05] Epoch: 1 Batch: 4151/20099 (20.65%) Loss: 2.177313 LR: 0.00002998 +[09:07:08] Epoch: 1 Batch: 4152/20099 (20.66%) Loss: 1.931077 LR: 0.00002998 +[09:07:12] Epoch: 1 Batch: 4153/20099 (20.66%) Loss: 2.804565 LR: 0.00002998 +[09:07:15] Epoch: 1 Batch: 4154/20099 (20.67%) Loss: 2.128333 LR: 0.00002998 +[09:07:18] Epoch: 1 Batch: 4155/20099 (20.67%) Loss: 2.248257 LR: 0.00002998 +[09:07:21] Epoch: 1 Batch: 4156/20099 (20.68%) Loss: 2.071353 LR: 0.00002998 +[09:07:24] Epoch: 1 Batch: 4157/20099 (20.68%) Loss: 1.646418 LR: 0.00002998 +[09:07:27] Epoch: 1 Batch: 4158/20099 (20.69%) Loss: 2.037481 LR: 0.00002998 +[09:07:30] Epoch: 1 Batch: 4159/20099 (20.69%) Loss: 1.863584 LR: 0.00002998 +[09:07:33] Epoch: 1 Batch: 4160/20099 (20.70%) Loss: 2.244012 LR: 0.00002998 +[09:07:36] Epoch: 1 Batch: 4161/20099 (20.70%) Loss: 2.529523 LR: 0.00002998 +[09:07:39] Epoch: 1 Batch: 4162/20099 (20.71%) Loss: 1.983195 LR: 0.00002998 +[09:07:42] Epoch: 1 Batch: 4163/20099 (20.71%) Loss: 2.311235 LR: 0.00002998 +[09:07:45] Epoch: 1 Batch: 4164/20099 (20.72%) Loss: 2.240770 LR: 0.00002998 +[09:07:49] Epoch: 1 Batch: 4165/20099 (20.72%) Loss: 2.230162 LR: 0.00002998 +[09:07:52] Epoch: 1 Batch: 4166/20099 (20.73%) Loss: 1.741855 LR: 0.00002998 +[09:07:55] Epoch: 1 Batch: 4167/20099 (20.73%) Loss: 2.071361 LR: 0.00002998 +[09:07:58] Epoch: 1 Batch: 4168/20099 (20.74%) Loss: 2.383783 LR: 0.00002998 +[09:08:01] Epoch: 1 Batch: 4169/20099 (20.74%) Loss: 2.259821 LR: 
0.00002998 +[09:08:04] Epoch: 1 Batch: 4170/20099 (20.75%) Loss: 1.746626 LR: 0.00002998 +[09:08:07] Epoch: 1 Batch: 4171/20099 (20.75%) Loss: 2.307424 LR: 0.00002998 +[09:08:10] Epoch: 1 Batch: 4172/20099 (20.76%) Loss: 2.076166 LR: 0.00002998 +[09:08:13] Epoch: 1 Batch: 4173/20099 (20.76%) Loss: 2.459747 LR: 0.00002998 +[09:08:16] Epoch: 1 Batch: 4174/20099 (20.77%) Loss: 1.771962 LR: 0.00002998 +[09:08:20] Epoch: 1 Batch: 4175/20099 (20.77%) Loss: 2.142812 LR: 0.00002998 +[09:08:23] Epoch: 1 Batch: 4176/20099 (20.78%) Loss: 2.127935 LR: 0.00002998 +[09:08:26] Epoch: 1 Batch: 4177/20099 (20.78%) Loss: 2.409044 LR: 0.00002998 +[09:08:29] Epoch: 1 Batch: 4178/20099 (20.79%) Loss: 1.902766 LR: 0.00002998 +[09:08:32] Epoch: 1 Batch: 4179/20099 (20.79%) Loss: 2.125066 LR: 0.00002998 +[09:08:35] Epoch: 1 Batch: 4180/20099 (20.80%) Loss: 2.021751 LR: 0.00002998 +[09:08:38] Epoch: 1 Batch: 4181/20099 (20.80%) Loss: 2.104440 LR: 0.00002998 +[09:08:41] Epoch: 1 Batch: 4182/20099 (20.81%) Loss: 2.334563 LR: 0.00002998 +[09:08:44] Epoch: 1 Batch: 4183/20099 (20.81%) Loss: 2.211609 LR: 0.00002998 +[09:08:47] Epoch: 1 Batch: 4184/20099 (20.82%) Loss: 2.310794 LR: 0.00002998 +[09:08:50] Epoch: 1 Batch: 4185/20099 (20.82%) Loss: 1.583148 LR: 0.00002998 +[09:08:53] Epoch: 1 Batch: 4186/20099 (20.83%) Loss: 1.950957 LR: 0.00002997 +[09:08:57] Epoch: 1 Batch: 4187/20099 (20.83%) Loss: 2.178679 LR: 0.00002997 +[09:09:00] Epoch: 1 Batch: 4188/20099 (20.84%) Loss: 1.968034 LR: 0.00002997 +[09:09:03] Epoch: 1 Batch: 4189/20099 (20.84%) Loss: 2.173266 LR: 0.00002997 +[09:09:06] Epoch: 1 Batch: 4190/20099 (20.85%) Loss: 2.194826 LR: 0.00002997 +[09:09:09] Epoch: 1 Batch: 4191/20099 (20.85%) Loss: 2.312870 LR: 0.00002997 +[09:09:12] Epoch: 1 Batch: 4192/20099 (20.86%) Loss: 2.147857 LR: 0.00002997 +[09:09:15] Epoch: 1 Batch: 4193/20099 (20.86%) Loss: 2.148641 LR: 0.00002997 +[09:09:18] Epoch: 1 Batch: 4194/20099 (20.87%) Loss: 2.161398 LR: 0.00002997 +[09:09:21] Epoch: 1 Batch: 4195/20099 
(20.87%) Loss: 2.468514 LR: 0.00002997 +[09:09:24] Epoch: 1 Batch: 4196/20099 (20.88%) Loss: 2.226539 LR: 0.00002997 +[09:09:27] Epoch: 1 Batch: 4197/20099 (20.88%) Loss: 2.495526 LR: 0.00002997 +[09:09:31] Epoch: 1 Batch: 4198/20099 (20.89%) Loss: 1.778994 LR: 0.00002997 +[09:09:34] Epoch: 1 Batch: 4199/20099 (20.89%) Loss: 2.475960 LR: 0.00002997 +[09:09:40] >> Cleaned up old temp checkpoint: epoch1_step2200 +[09:09:40] >> Temp checkpoint saved: epoch1_step4200, size: 0.1693 GB +[09:09:40] Epoch: 1 Batch: 4200/20099 (20.90%) Loss: 2.259036 LR: 0.00002997 +[09:09:43] Epoch: 1 Batch: 4201/20099 (20.90%) Loss: 2.205713 LR: 0.00002997 +[09:09:46] Epoch: 1 Batch: 4202/20099 (20.91%) Loss: 1.897110 LR: 0.00002997 +[09:09:49] Epoch: 1 Batch: 4203/20099 (20.91%) Loss: 1.987788 LR: 0.00002997 +[09:09:52] Epoch: 1 Batch: 4204/20099 (20.92%) Loss: 2.488675 LR: 0.00002997 +[09:09:56] Epoch: 1 Batch: 4205/20099 (20.92%) Loss: 2.079707 LR: 0.00002997 +[09:09:59] Epoch: 1 Batch: 4206/20099 (20.93%) Loss: 2.319595 LR: 0.00002997 +[09:10:02] Epoch: 1 Batch: 4207/20099 (20.93%) Loss: 2.207680 LR: 0.00002997 +[09:10:05] Epoch: 1 Batch: 4208/20099 (20.94%) Loss: 2.013397 LR: 0.00002997 +[09:10:08] Epoch: 1 Batch: 4209/20099 (20.94%) Loss: 2.176493 LR: 0.00002997 +[09:10:11] Epoch: 1 Batch: 4210/20099 (20.95%) Loss: 2.297882 LR: 0.00002997 +[09:10:14] Epoch: 1 Batch: 4211/20099 (20.95%) Loss: 2.000204 LR: 0.00002997 +[09:10:17] Epoch: 1 Batch: 4212/20099 (20.96%) Loss: 2.221789 LR: 0.00002997 +[09:10:20] Epoch: 1 Batch: 4213/20099 (20.96%) Loss: 2.264577 LR: 0.00002997 +[09:10:24] Epoch: 1 Batch: 4214/20099 (20.97%) Loss: 2.222815 LR: 0.00002997 +[09:10:27] Epoch: 1 Batch: 4215/20099 (20.97%) Loss: 2.096981 LR: 0.00002997 +[09:10:30] Epoch: 1 Batch: 4216/20099 (20.98%) Loss: 2.488047 LR: 0.00002997 +[09:10:33] Epoch: 1 Batch: 4217/20099 (20.98%) Loss: 2.280287 LR: 0.00002997 +[09:10:36] Epoch: 1 Batch: 4218/20099 (20.99%) Loss: 2.116813 LR: 0.00002997 +[09:10:39] Epoch: 1 Batch: 
4219/20099 (20.99%) Loss: 2.105318 LR: 0.00002997 +[09:10:42] Epoch: 1 Batch: 4220/20099 (21.00%) Loss: 2.303334 LR: 0.00002997 +[09:10:45] Epoch: 1 Batch: 4221/20099 (21.00%) Loss: 2.217862 LR: 0.00002997 +[09:10:48] Epoch: 1 Batch: 4222/20099 (21.01%) Loss: 2.151740 LR: 0.00002997 +[09:10:52] Epoch: 1 Batch: 4223/20099 (21.01%) Loss: 2.230169 LR: 0.00002997 +[09:10:55] Epoch: 1 Batch: 4224/20099 (21.02%) Loss: 1.890332 LR: 0.00002997 +[09:10:58] Epoch: 1 Batch: 4225/20099 (21.02%) Loss: 2.341240 LR: 0.00002997 +[09:11:01] Epoch: 1 Batch: 4226/20099 (21.03%) Loss: 2.544625 LR: 0.00002997 +[09:11:04] Epoch: 1 Batch: 4227/20099 (21.03%) Loss: 2.078583 LR: 0.00002997 +[09:11:07] Epoch: 1 Batch: 4228/20099 (21.04%) Loss: 2.007075 LR: 0.00002997 +[09:11:10] Epoch: 1 Batch: 4229/20099 (21.04%) Loss: 2.048175 LR: 0.00002997 +[09:11:13] Epoch: 1 Batch: 4230/20099 (21.05%) Loss: 2.161504 LR: 0.00002997 +[09:11:16] Epoch: 1 Batch: 4231/20099 (21.05%) Loss: 2.206236 LR: 0.00002997 +[09:11:19] Epoch: 1 Batch: 4232/20099 (21.06%) Loss: 2.155538 LR: 0.00002997 +[09:11:22] Epoch: 1 Batch: 4233/20099 (21.06%) Loss: 2.459120 LR: 0.00002997 +[09:11:26] Epoch: 1 Batch: 4234/20099 (21.07%) Loss: 1.907223 LR: 0.00002997 +[09:11:29] Epoch: 1 Batch: 4235/20099 (21.07%) Loss: 2.381596 LR: 0.00002997 +[09:11:32] Epoch: 1 Batch: 4236/20099 (21.08%) Loss: 2.195115 LR: 0.00002997 +[09:11:35] Epoch: 1 Batch: 4237/20099 (21.08%) Loss: 1.958473 LR: 0.00002997 +[09:11:38] Epoch: 1 Batch: 4238/20099 (21.09%) Loss: 2.104866 LR: 0.00002997 +[09:11:41] Epoch: 1 Batch: 4239/20099 (21.09%) Loss: 2.036722 LR: 0.00002997 +[09:11:44] Epoch: 1 Batch: 4240/20099 (21.10%) Loss: 2.038196 LR: 0.00002997 +[09:11:47] Epoch: 1 Batch: 4241/20099 (21.10%) Loss: 2.020663 LR: 0.00002997 +[09:11:50] Epoch: 1 Batch: 4242/20099 (21.11%) Loss: 2.325535 LR: 0.00002997 +[09:11:53] Epoch: 1 Batch: 4243/20099 (21.11%) Loss: 2.306229 LR: 0.00002997 +[09:11:56] Epoch: 1 Batch: 4244/20099 (21.12%) Loss: 1.979847 LR: 0.00002997 
+[09:12:00] Epoch: 1 Batch: 4245/20099 (21.12%) Loss: 2.148159 LR: 0.00002997 +[09:12:03] Epoch: 1 Batch: 4246/20099 (21.13%) Loss: 2.126872 LR: 0.00002997 +[09:12:06] Epoch: 1 Batch: 4247/20099 (21.13%) Loss: 2.306468 LR: 0.00002997 +[09:12:09] Epoch: 1 Batch: 4248/20099 (21.14%) Loss: 2.133780 LR: 0.00002997 +[09:12:12] Epoch: 1 Batch: 4249/20099 (21.14%) Loss: 2.219174 LR: 0.00002996 +[09:12:15] Epoch: 1 Batch: 4250/20099 (21.15%) Loss: 2.538643 LR: 0.00002996 +[09:12:18] Epoch: 1 Batch: 4251/20099 (21.15%) Loss: 2.379699 LR: 0.00002996 +[09:12:21] Epoch: 1 Batch: 4252/20099 (21.16%) Loss: 2.149000 LR: 0.00002996 +[09:12:24] Epoch: 1 Batch: 4253/20099 (21.16%) Loss: 1.929791 LR: 0.00002996 +[09:12:27] Epoch: 1 Batch: 4254/20099 (21.17%) Loss: 2.711968 LR: 0.00002996 +[09:12:30] Epoch: 1 Batch: 4255/20099 (21.17%) Loss: 2.219484 LR: 0.00002996 +[09:12:34] Epoch: 1 Batch: 4256/20099 (21.18%) Loss: 2.181377 LR: 0.00002996 +[09:12:37] Epoch: 1 Batch: 4257/20099 (21.18%) Loss: 2.115895 LR: 0.00002996 +[09:12:40] Epoch: 1 Batch: 4258/20099 (21.19%) Loss: 2.068900 LR: 0.00002996 +[09:12:43] Epoch: 1 Batch: 4259/20099 (21.19%) Loss: 2.314376 LR: 0.00002996 +[09:12:46] Epoch: 1 Batch: 4260/20099 (21.20%) Loss: 2.208560 LR: 0.00002996 +[09:12:49] Epoch: 1 Batch: 4261/20099 (21.20%) Loss: 2.317142 LR: 0.00002996 +[09:12:52] Epoch: 1 Batch: 4262/20099 (21.21%) Loss: 2.045093 LR: 0.00002996 +[09:12:55] Epoch: 1 Batch: 4263/20099 (21.21%) Loss: 2.006933 LR: 0.00002996 +[09:12:58] Epoch: 1 Batch: 4264/20099 (21.21%) Loss: 2.137494 LR: 0.00002996 +[09:13:01] Epoch: 1 Batch: 4265/20099 (21.22%) Loss: 2.223497 LR: 0.00002996 +[09:13:04] Epoch: 1 Batch: 4266/20099 (21.22%) Loss: 2.173101 LR: 0.00002996 +[09:13:07] Epoch: 1 Batch: 4267/20099 (21.23%) Loss: 2.159361 LR: 0.00002996 +[09:13:11] Epoch: 1 Batch: 4268/20099 (21.23%) Loss: 2.252352 LR: 0.00002996 +[09:13:14] Epoch: 1 Batch: 4269/20099 (21.24%) Loss: 2.341697 LR: 0.00002996 +[09:13:17] Epoch: 1 Batch: 4270/20099 (21.24%) 
Loss: 2.381249 LR: 0.00002996 +[09:13:20] Epoch: 1 Batch: 4271/20099 (21.25%) Loss: 2.287217 LR: 0.00002996 +[09:13:23] Epoch: 1 Batch: 4272/20099 (21.25%) Loss: 2.047531 LR: 0.00002996 +[09:13:26] Epoch: 1 Batch: 4273/20099 (21.26%) Loss: 2.165573 LR: 0.00002996 +[09:13:29] Epoch: 1 Batch: 4274/20099 (21.26%) Loss: 2.083696 LR: 0.00002996 +[09:13:32] Epoch: 1 Batch: 4275/20099 (21.27%) Loss: 2.101045 LR: 0.00002996 +[09:13:35] Epoch: 1 Batch: 4276/20099 (21.27%) Loss: 1.866181 LR: 0.00002996 +[09:13:38] Epoch: 1 Batch: 4277/20099 (21.28%) Loss: 2.118667 LR: 0.00002996 +[09:13:42] Epoch: 1 Batch: 4278/20099 (21.28%) Loss: 2.220998 LR: 0.00002996 +[09:13:45] Epoch: 1 Batch: 4279/20099 (21.29%) Loss: 1.847115 LR: 0.00002996 +[09:13:48] Epoch: 1 Batch: 4280/20099 (21.29%) Loss: 2.511682 LR: 0.00002996 +[09:13:51] Epoch: 1 Batch: 4281/20099 (21.30%) Loss: 2.290359 LR: 0.00002996 +[09:13:54] Epoch: 1 Batch: 4282/20099 (21.30%) Loss: 2.527822 LR: 0.00002996 +[09:13:57] Epoch: 1 Batch: 4283/20099 (21.31%) Loss: 2.245917 LR: 0.00002996 +[09:14:00] Epoch: 1 Batch: 4284/20099 (21.31%) Loss: 2.115796 LR: 0.00002996 +[09:14:03] Epoch: 1 Batch: 4285/20099 (21.32%) Loss: 2.122059 LR: 0.00002996 +[09:14:06] Epoch: 1 Batch: 4286/20099 (21.32%) Loss: 2.275984 LR: 0.00002996 +[09:14:09] Epoch: 1 Batch: 4287/20099 (21.33%) Loss: 2.073956 LR: 0.00002996 +[09:14:13] Epoch: 1 Batch: 4288/20099 (21.33%) Loss: 2.290770 LR: 0.00002996 +[09:14:16] Epoch: 1 Batch: 4289/20099 (21.34%) Loss: 2.517737 LR: 0.00002996 +[09:14:19] Epoch: 1 Batch: 4290/20099 (21.34%) Loss: 2.596697 LR: 0.00002996 +[09:14:22] Epoch: 1 Batch: 4291/20099 (21.35%) Loss: 1.943339 LR: 0.00002996 +[09:14:25] Epoch: 1 Batch: 4292/20099 (21.35%) Loss: 1.910061 LR: 0.00002996 +[09:14:28] Epoch: 1 Batch: 4293/20099 (21.36%) Loss: 2.100767 LR: 0.00002996 +[09:14:31] Epoch: 1 Batch: 4294/20099 (21.36%) Loss: 2.154790 LR: 0.00002996 +[09:14:34] Epoch: 1 Batch: 4295/20099 (21.37%) Loss: 2.241067 LR: 0.00002996 +[09:14:37] Epoch: 
1 Batch: 4296/20099 (21.37%) Loss: 2.315400 LR: 0.00002996 +[09:14:40] Epoch: 1 Batch: 4297/20099 (21.38%) Loss: 2.173643 LR: 0.00002996 +[09:14:44] Epoch: 1 Batch: 4298/20099 (21.38%) Loss: 2.125968 LR: 0.00002996 +[09:14:47] Epoch: 1 Batch: 4299/20099 (21.39%) Loss: 2.567380 LR: 0.00002996 +[09:14:50] Epoch: 1 Batch: 4300/20099 (21.39%) Loss: 2.074193 LR: 0.00002996 +[09:14:53] Epoch: 1 Batch: 4301/20099 (21.40%) Loss: 1.839112 LR: 0.00002996 +[09:14:56] Epoch: 1 Batch: 4302/20099 (21.40%) Loss: 2.374906 LR: 0.00002996 +[09:14:59] Epoch: 1 Batch: 4303/20099 (21.41%) Loss: 2.215290 LR: 0.00002996 +[09:15:02] Epoch: 1 Batch: 4304/20099 (21.41%) Loss: 2.164253 LR: 0.00002996 +[09:15:05] Epoch: 1 Batch: 4305/20099 (21.42%) Loss: 2.269198 LR: 0.00002995 +[09:15:08] Epoch: 1 Batch: 4306/20099 (21.42%) Loss: 2.016145 LR: 0.00002995 +[09:15:11] Epoch: 1 Batch: 4307/20099 (21.43%) Loss: 2.021875 LR: 0.00002995 +[09:15:15] Epoch: 1 Batch: 4308/20099 (21.43%) Loss: 2.131211 LR: 0.00002995 +[09:15:18] Epoch: 1 Batch: 4309/20099 (21.44%) Loss: 1.902801 LR: 0.00002995 +[09:15:21] Epoch: 1 Batch: 4310/20099 (21.44%) Loss: 1.967012 LR: 0.00002995 +[09:15:24] Epoch: 1 Batch: 4311/20099 (21.45%) Loss: 1.967475 LR: 0.00002995 +[09:15:27] Epoch: 1 Batch: 4312/20099 (21.45%) Loss: 2.333742 LR: 0.00002995 +[09:15:30] Epoch: 1 Batch: 4313/20099 (21.46%) Loss: 2.112101 LR: 0.00002995 +[09:15:33] Epoch: 1 Batch: 4314/20099 (21.46%) Loss: 2.282474 LR: 0.00002995 +[09:15:36] Epoch: 1 Batch: 4315/20099 (21.47%) Loss: 2.228675 LR: 0.00002995 +[09:15:39] Epoch: 1 Batch: 4316/20099 (21.47%) Loss: 2.393400 LR: 0.00002995 +[09:15:42] Epoch: 1 Batch: 4317/20099 (21.48%) Loss: 2.304865 LR: 0.00002995 +[09:15:45] Epoch: 1 Batch: 4318/20099 (21.48%) Loss: 2.301402 LR: 0.00002995 +[09:15:49] Epoch: 1 Batch: 4319/20099 (21.49%) Loss: 2.161675 LR: 0.00002995 +[09:15:52] Epoch: 1 Batch: 4320/20099 (21.49%) Loss: 1.954080 LR: 0.00002995 +[09:15:55] Epoch: 1 Batch: 4321/20099 (21.50%) Loss: 2.191655 LR: 
0.00002995 +[09:15:58] Epoch: 1 Batch: 4322/20099 (21.50%) Loss: 2.077162 LR: 0.00002995 +[09:16:01] Epoch: 1 Batch: 4323/20099 (21.51%) Loss: 2.095188 LR: 0.00002995 +[09:16:13] 2025-08-23 +[09:16:14] Tesla T4 +[09:16:14] +|===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| Active memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| Requested memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| GPU reserved memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| Allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Active allocs | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| GPU reserved segments | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | 
+|===========================================================================| + +[09:16:14] CPU usage: 59.3%, RAM usage: 27.2% +[09:16:14] Running with the following configuration: +[09:16:14] model_name: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B +[09:16:14] tokenizer: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B +[09:16:14] output_dir: /content/drive/MyDrive/llm/Discord-Hermes-3-8B +[09:16:14] train_path: /content/drive/MyDrive/data/None156_fix.csv +[09:16:14] checkpoint: /content/drive/MyDrive/llm/Discord-Hermes-3-8B/temp/epoch1_step4000 +[09:16:14] lr: 3e-05 +[09:16:14] lr_floor: 6e-06 +[09:16:14] epochs: 1 +[09:16:14] batch_size: 5 +[09:16:14] accum_steps: 7 +[09:16:14] val_batch_size: 6 +[09:16:14] max_val_size: 100 +[09:16:14] max_length: 150 +[09:16:14] save_temp_frequency: 200 +[09:16:14] save_frequency: 500 +[09:16:14] eval_frequency: 500 +[09:16:14] save_pattern: y +[09:16:14] quantization: y +[09:16:14] quantization_bits: 4 +[09:16:14] lora: y +[09:16:14] frozen_lora_path: None +[09:16:14] lora_rank: 16 +[09:16:14] lora_alpha: 32 +[09:16:14] lora_dropout: 0.1 +[09:16:14] optimizer_weight_decay: 0.0 +[09:16:14] warmup_type: cosine +[09:16:14] warmup_ratio: 0.08 +[09:16:14] warmup_steps: 550 +[09:16:14] shuffle: y +[09:16:14] csv_column: text +[09:16:14] new_run: n +[09:16:14] label_smoothing: 0.05 +[09:16:14] SEED: 1 +[09:16:14] Using device: cuda +[09:16:14] Resuming from temp checkpoint: /content/drive/MyDrive/llm/Discord-Hermes-3-8B/temp/epoch1_step4000 +[09:21:55] Embeddings shape after: torch.Size([128256, 4096]) +[09:22:00] Loaded trainable LoRA adapter from /content/drive/MyDrive/llm/Discord-Hermes-3-8B/temp/epoch1_step4000 +[09:22:00] Trainable LoRA 'default': +[09:22:00] task_type: CAUSAL_LM +[09:22:00] peft_type: PeftType.LORA +[09:22:00] auto_mapping: None +[09:22:00] base_model_name_or_path: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B +[09:22:00] revision: None +[09:22:01] inference_mode: 
False +[09:22:01] r: 16 +[09:22:01] target_modules: {'v_proj', 'q_proj', 'o_proj', 'k_proj'} +[09:22:01] exclude_modules: None +[09:22:01] lora_alpha: 32 +[09:22:01] lora_dropout: 0.1 +[09:22:01] fan_in_fan_out: False +[09:22:01] bias: none +[09:22:01] use_rslora: True +[09:22:01] modules_to_save: None +[09:22:01] init_lora_weights: True +[09:22:01] layers_to_transform: None +[09:22:01] layers_pattern: None +[09:22:01] rank_pattern: {} +[09:22:01] alpha_pattern: {} +[09:22:01] megatron_config: None +[09:22:01] megatron_core: megatron.core +[09:22:01] trainable_token_indices: None +[09:22:01] loftq_config: {} +[09:22:01] eva_config: None +[09:22:01] corda_config: None +[09:22:01] use_dora: False +[09:22:01] use_qalora: False +[09:22:01] qalora_group_size: 16 +[09:22:01] layer_replication: None +[09:22:01] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) +[09:22:01] lora_bias: False +[09:22:01] target_parameters: None +[09:22:01] _custom_modules: None +[09:22:01] Embeddings shape after: torch.Size([128256, 4096]) +[09:22:06] Resumed from epoch 1, step 4001, file 1 +[09:22:06] Starting from CSV file... +[09:22:09] Splitting data into chunks of 11000... +[09:22:09] Using 7 processes across 10 chunks +[09:22:10] Using saved train/val split from checkpoint. +[09:22:10] Resuming scheduler with warmup steps: 229, total steps: 2871 +[09:22:10] Initializing scheduler with cosine schedule with warmup, warmup steps 550, total steps: 2871 +[09:22:10] Train/Val split: 100492 train, 100 val samples. 
+[09:22:19] Model: PeftModelForCausalLM +[09:22:19] Model config: LlamaConfig { + "architectures": [ + "LlamaForCausalLM" + ], + "attention_bias": false, + "attention_dropout": 0.0, + "bos_token_id": 128000, + "eos_token_id": 128040, + "head_dim": 128, + "hidden_act": "silu", + "hidden_size": 4096, + "initializer_range": 0.02, + "intermediate_size": 14336, + "max_position_embeddings": 131072, + "mlp_bias": false, + "model_type": "llama", + "num_attention_heads": 32, + "num_hidden_layers": 32, + "num_key_value_heads": 8, + "pretraining_tp": 1, + "quantization_config": { + "_load_in_4bit": true, + "_load_in_8bit": false, + "bnb_4bit_compute_dtype": "float16", + "bnb_4bit_quant_storage": "uint8", + "bnb_4bit_quant_type": "nf4", + "bnb_4bit_use_double_quant": true, + "llm_int8_enable_fp32_cpu_offload": false, + "llm_int8_has_fp16_weight": false, + "llm_int8_skip_modules": [ + "lm_head" + ], + "llm_int8_threshold": 6.0, + "load_in_4bit": true, + "load_in_8bit": false, + "quant_method": "bitsandbytes" + }, + "rms_norm_eps": 1e-05, + "rope_scaling": { + "factor": 8.0, + "high_freq_factor": 4.0, + "low_freq_factor": 1.0, + "original_max_position_embeddings": 8192, + "rope_type": "llama3" + }, + "rope_theta": 500000.0, + "tie_word_embeddings": false, + "torch_dtype": "float16", + "transformers_version": "4.55.2", + "use_cache": true, + "vocab_size": 128256 +} + +[09:22:19] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 +[09:22:19] +Optimizer: PagedAdamW ( +Parameter Group 0 + alpha: 0.0 + betas: (0.9, 0.95) + eps: 1e-08 + initial_lr: 3e-05 + lr: 0.0 + t_alpha: None + t_beta3: None + weight_decay: 0.0 +) +[09:22:19] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 +[09:22:19] Scheduler: +[09:22:19] Training on 100492 training samples, 100 validation samples +[09:22:19] Average tokens per sample: 150.00 +[09:22:19] Estimated epoch time: ~308.02 min +[09:22:19] +|===========================================================================| +| PyTorch 
CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 5986 MiB | 7004 MiB | 335397 MiB | 329410 MiB | +|---------------------------------------------------------------------------| +| Active memory | 5986 MiB | 7004 MiB | 335397 MiB | 329410 MiB | +|---------------------------------------------------------------------------| +| Requested memory | 5983 MiB | 7000 MiB | 335022 MiB | 329039 MiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 7248 MiB | 7248 MiB | 7248 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 1261 MiB | 5879 MiB | 328754 MiB | 327493 MiB | +|---------------------------------------------------------------------------| +| Allocations | 2762 | 2840 | 33883 | 31121 | +|---------------------------------------------------------------------------| +| Active allocs | 2762 | 2840 | 33883 | 31121 | +|---------------------------------------------------------------------------| +| GPU reserved segments | 185 | 185 | 185 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 36 | 36 | 13826 | 13790 | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +[09:22:19] Restoring shuffle indices from training state for epoch 1 +[09:22:19] CPU usage: 46.3%, RAM usage: 37.1% 
+[09:22:20] Epoch 1 learning rate: 0.0 +[09:22:20] Starting epoch 1 +[09:22:32] Batch 4001: input_ids shape torch.Size([5, 140]), attention_mask shape torch.Size([5, 140]) +[09:22:34] Epoch: 1 Batch: 4001/20099 (19.91%) Loss: 2.366458 LR: 0.00000000 +[09:22:36] Epoch: 1 Batch: 4002/20099 (19.91%) Loss: 2.154430 LR: 0.00000000 +[09:22:37] Epoch: 1 Batch: 4003/20099 (19.92%) Loss: 2.147170 LR: 0.00000000 +[09:22:39] Epoch: 1 Batch: 4004/20099 (19.92%) Loss: 2.156662 LR: 0.00000000 +[09:22:41] Epoch: 1 Batch: 4005/20099 (19.93%) Loss: 2.155010 LR: 0.00000000 +[09:22:42] Epoch: 1 Batch: 4006/20099 (19.93%) Loss: 2.219366 LR: 0.00000000 +[09:22:44] Epoch: 1 Batch: 4007/20099 (19.94%) Loss: 2.197301 LR: 0.00002999 +[09:22:46] Epoch: 1 Batch: 4008/20099 (19.94%) Loss: 2.097772 LR: 0.00002999 +[09:22:47] Epoch: 1 Batch: 4009/20099 (19.95%) Loss: 2.143481 LR: 0.00002999 +[09:22:49] Epoch: 1 Batch: 4010/20099 (19.95%) Loss: 1.808341 LR: 0.00002999 +[09:22:51] Epoch: 1 Batch: 4011/20099 (19.96%) Loss: 2.078780 LR: 0.00002999 +[09:22:53] Epoch: 1 Batch: 4012/20099 (19.96%) Loss: 2.126783 LR: 0.00002999 +[09:22:54] Epoch: 1 Batch: 4013/20099 (19.97%) Loss: 2.034043 LR: 0.00002999 +[09:22:56] Epoch: 1 Batch: 4014/20099 (19.97%) Loss: 2.210934 LR: 0.00002999 +[09:22:58] Epoch: 1 Batch: 4015/20099 (19.98%) Loss: 2.196439 LR: 0.00002999 +[09:23:00] Epoch: 1 Batch: 4016/20099 (19.98%) Loss: 2.149901 LR: 0.00002999 +[09:23:02] Epoch: 1 Batch: 4017/20099 (19.99%) Loss: 2.283413 LR: 0.00002999 +[09:23:03] Epoch: 1 Batch: 4018/20099 (19.99%) Loss: 2.181970 LR: 0.00002999 +[09:23:05] Epoch: 1 Batch: 4019/20099 (20.00%) Loss: 2.062837 LR: 0.00002999 +[09:23:07] Epoch: 1 Batch: 4020/20099 (20.00%) Loss: 2.212120 LR: 0.00002999 +[09:23:09] Epoch: 1 Batch: 4021/20099 (20.01%) Loss: 2.131150 LR: 0.00002999 +[09:23:11] Epoch: 1 Batch: 4022/20099 (20.01%) Loss: 2.434605 LR: 0.00002999 +[09:23:13] Epoch: 1 Batch: 4023/20099 (20.02%) Loss: 2.162192 LR: 0.00002999 +[09:23:14] Epoch: 1 Batch: 
4024/20099 (20.02%) Loss: 2.050821 LR: 0.00002999 +[09:23:16] Epoch: 1 Batch: 4025/20099 (20.03%) Loss: 2.139873 LR: 0.00002999 +[09:23:18] Epoch: 1 Batch: 4026/20099 (20.03%) Loss: 2.136459 LR: 0.00002999 +[09:23:20] Epoch: 1 Batch: 4027/20099 (20.04%) Loss: 2.295009 LR: 0.00002999 +[09:23:22] Epoch: 1 Batch: 4028/20099 (20.04%) Loss: 2.159074 LR: 0.00002999 +[09:23:24] Epoch: 1 Batch: 4029/20099 (20.05%) Loss: 1.936129 LR: 0.00002999 +[09:23:25] Epoch: 1 Batch: 4030/20099 (20.05%) Loss: 2.432516 LR: 0.00002999 +[09:23:27] Epoch: 1 Batch: 4031/20099 (20.06%) Loss: 2.049529 LR: 0.00002999 +[09:23:29] Epoch: 1 Batch: 4032/20099 (20.06%) Loss: 2.282998 LR: 0.00002999 +[09:23:31] Epoch: 1 Batch: 4033/20099 (20.07%) Loss: 1.783732 LR: 0.00002999 +[09:23:33] Epoch: 1 Batch: 4034/20099 (20.07%) Loss: 2.098824 LR: 0.00002999 +[09:23:35] Epoch: 1 Batch: 4035/20099 (20.08%) Loss: 1.877168 LR: 0.00002999 +[09:23:36] Epoch: 1 Batch: 4036/20099 (20.08%) Loss: 2.196977 LR: 0.00002999 +[09:23:38] Epoch: 1 Batch: 4037/20099 (20.09%) Loss: 2.253958 LR: 0.00002999 +[09:23:40] Epoch: 1 Batch: 4038/20099 (20.09%) Loss: 2.112927 LR: 0.00002999 +[09:23:42] Epoch: 1 Batch: 4039/20099 (20.10%) Loss: 2.108996 LR: 0.00002999 +[09:23:43] Epoch: 1 Batch: 4040/20099 (20.10%) Loss: 2.197201 LR: 0.00002999 +[09:23:45] Epoch: 1 Batch: 4041/20099 (20.11%) Loss: 2.118268 LR: 0.00002999 +[09:23:47] Epoch: 1 Batch: 4042/20099 (20.11%) Loss: 2.102075 LR: 0.00002999 +[09:23:49] Epoch: 1 Batch: 4043/20099 (20.12%) Loss: 1.874926 LR: 0.00002999 +[09:23:50] Epoch: 1 Batch: 4044/20099 (20.12%) Loss: 2.071547 LR: 0.00002999 +[09:23:52] Epoch: 1 Batch: 4045/20099 (20.13%) Loss: 2.150452 LR: 0.00002999 +[09:23:54] Epoch: 1 Batch: 4046/20099 (20.13%) Loss: 2.297747 LR: 0.00002999 +[09:23:56] Epoch: 1 Batch: 4047/20099 (20.14%) Loss: 1.896263 LR: 0.00002999 +[09:23:57] Epoch: 1 Batch: 4048/20099 (20.14%) Loss: 1.922141 LR: 0.00002999 +[09:23:59] Epoch: 1 Batch: 4049/20099 (20.15%) Loss: 2.147628 LR: 0.00002999 
+[09:24:01] Epoch: 1 Batch: 4050/20099 (20.15%) Loss: 2.255849 LR: 0.00002999 +[09:24:02] Epoch: 1 Batch: 4051/20099 (20.16%) Loss: 2.060654 LR: 0.00002999 +[09:24:04] Epoch: 1 Batch: 4052/20099 (20.16%) Loss: 2.400548 LR: 0.00002999 +[09:24:06] Epoch: 1 Batch: 4053/20099 (20.17%) Loss: 2.142119 LR: 0.00002999 +[09:24:08] Epoch: 1 Batch: 4054/20099 (20.17%) Loss: 2.210209 LR: 0.00002999 +[09:24:09] Epoch: 1 Batch: 4055/20099 (20.18%) Loss: 2.173558 LR: 0.00002999 +[09:24:11] Epoch: 1 Batch: 4056/20099 (20.18%) Loss: 2.590345 LR: 0.00002999 +[09:24:13] Epoch: 1 Batch: 4057/20099 (20.19%) Loss: 2.077080 LR: 0.00002999 +[09:24:15] Epoch: 1 Batch: 4058/20099 (20.19%) Loss: 2.250375 LR: 0.00002999 +[09:24:17] Epoch: 1 Batch: 4059/20099 (20.20%) Loss: 2.055619 LR: 0.00002999 +[09:24:18] Epoch: 1 Batch: 4060/20099 (20.20%) Loss: 2.027817 LR: 0.00002999 +[09:24:20] Epoch: 1 Batch: 4061/20099 (20.20%) Loss: 2.447200 LR: 0.00002999 +[09:24:22] Epoch: 1 Batch: 4062/20099 (20.21%) Loss: 1.974106 LR: 0.00002999 +[09:24:24] Epoch: 1 Batch: 4063/20099 (20.21%) Loss: 2.211970 LR: 0.00002999 +[09:24:25] Epoch: 1 Batch: 4064/20099 (20.22%) Loss: 1.975977 LR: 0.00002999 +[09:24:27] Epoch: 1 Batch: 4065/20099 (20.22%) Loss: 2.059137 LR: 0.00002999 +[09:24:29] Epoch: 1 Batch: 4066/20099 (20.23%) Loss: 2.382083 LR: 0.00002999 +[09:24:31] Epoch: 1 Batch: 4067/20099 (20.23%) Loss: 2.404890 LR: 0.00002999 +[09:24:32] Epoch: 1 Batch: 4068/20099 (20.24%) Loss: 1.885523 LR: 0.00002999 +[09:24:34] Epoch: 1 Batch: 4069/20099 (20.24%) Loss: 2.361925 LR: 0.00002999 +[09:24:36] Epoch: 1 Batch: 4070/20099 (20.25%) Loss: 2.402880 LR: 0.00002999 +[09:24:38] Epoch: 1 Batch: 4071/20099 (20.25%) Loss: 1.990436 LR: 0.00002999 +[09:24:40] Epoch: 1 Batch: 4072/20099 (20.26%) Loss: 2.249188 LR: 0.00002999 +[09:24:41] Epoch: 1 Batch: 4073/20099 (20.26%) Loss: 2.073479 LR: 0.00002999 +[09:24:43] Epoch: 1 Batch: 4074/20099 (20.27%) Loss: 2.147360 LR: 0.00002999 +[09:24:45] Epoch: 1 Batch: 4075/20099 (20.27%) 
Loss: 2.059235 LR: 0.00002999 +[09:24:47] Epoch: 1 Batch: 4076/20099 (20.28%) Loss: 2.341625 LR: 0.00002999 +[09:24:49] Epoch: 1 Batch: 4077/20099 (20.28%) Loss: 1.937874 LR: 0.00002999 +[09:24:50] Epoch: 1 Batch: 4078/20099 (20.29%) Loss: 2.413152 LR: 0.00002999 +[09:24:52] Epoch: 1 Batch: 4079/20099 (20.29%) Loss: 2.403069 LR: 0.00002999 +[09:24:54] Epoch: 1 Batch: 4080/20099 (20.30%) Loss: 2.349684 LR: 0.00002999 +[09:24:56] Epoch: 1 Batch: 4081/20099 (20.30%) Loss: 1.943343 LR: 0.00002999 +[09:24:58] Epoch: 1 Batch: 4082/20099 (20.31%) Loss: 1.914011 LR: 0.00002999 +[09:24:59] Epoch: 1 Batch: 4083/20099 (20.31%) Loss: 2.262524 LR: 0.00002999 +[09:25:01] Epoch: 1 Batch: 4084/20099 (20.32%) Loss: 1.840936 LR: 0.00002999 +[09:25:03] Epoch: 1 Batch: 4085/20099 (20.32%) Loss: 2.109431 LR: 0.00002999 +[09:25:05] Epoch: 1 Batch: 4086/20099 (20.33%) Loss: 2.250267 LR: 0.00002999 +[09:25:07] Epoch: 1 Batch: 4087/20099 (20.33%) Loss: 2.383049 LR: 0.00002999 +[09:25:08] Epoch: 1 Batch: 4088/20099 (20.34%) Loss: 2.286646 LR: 0.00002999 +[09:25:10] Epoch: 1 Batch: 4089/20099 (20.34%) Loss: 2.274777 LR: 0.00002999 +[09:25:12] Epoch: 1 Batch: 4090/20099 (20.35%) Loss: 2.193700 LR: 0.00002999 +[09:25:14] Epoch: 1 Batch: 4091/20099 (20.35%) Loss: 2.031152 LR: 0.00002999 +[09:25:15] Epoch: 1 Batch: 4092/20099 (20.36%) Loss: 2.384892 LR: 0.00002999 +[09:25:17] Epoch: 1 Batch: 4093/20099 (20.36%) Loss: 2.331565 LR: 0.00002999 +[09:25:19] Epoch: 1 Batch: 4094/20099 (20.37%) Loss: 2.319333 LR: 0.00002999 +[09:25:21] Epoch: 1 Batch: 4095/20099 (20.37%) Loss: 2.055644 LR: 0.00002999 +[09:25:22] Epoch: 1 Batch: 4096/20099 (20.38%) Loss: 2.220170 LR: 0.00002999 +[09:25:24] Epoch: 1 Batch: 4097/20099 (20.38%) Loss: 1.857415 LR: 0.00002999 +[09:25:26] Epoch: 1 Batch: 4098/20099 (20.39%) Loss: 2.209712 LR: 0.00002999 +[09:25:28] Epoch: 1 Batch: 4099/20099 (20.39%) Loss: 2.097533 LR: 0.00002999 +[09:25:29] Epoch: 1 Batch: 4100/20099 (20.40%) Loss: 1.895439 LR: 0.00002999 +[09:25:31] Epoch: 
1 Batch: 4101/20099 (20.40%) Loss: 2.386527 LR: 0.00002999 +[09:25:33] Epoch: 1 Batch: 4102/20099 (20.41%) Loss: 2.126809 LR: 0.00002999 +[09:25:35] Epoch: 1 Batch: 4103/20099 (20.41%) Loss: 1.895291 LR: 0.00002999 +[09:25:36] Epoch: 1 Batch: 4104/20099 (20.42%) Loss: 2.300444 LR: 0.00002999 +[09:25:38] Epoch: 1 Batch: 4105/20099 (20.42%) Loss: 2.172960 LR: 0.00002999 +[09:25:40] Epoch: 1 Batch: 4106/20099 (20.43%) Loss: 2.231460 LR: 0.00002999 +[09:25:42] Epoch: 1 Batch: 4107/20099 (20.43%) Loss: 2.201310 LR: 0.00002999 +[09:25:44] Epoch: 1 Batch: 4108/20099 (20.44%) Loss: 2.143344 LR: 0.00002999 +[09:25:45] Epoch: 1 Batch: 4109/20099 (20.44%) Loss: 2.373152 LR: 0.00002999 +[09:25:47] Epoch: 1 Batch: 4110/20099 (20.45%) Loss: 2.182037 LR: 0.00002999 +[09:25:49] Epoch: 1 Batch: 4111/20099 (20.45%) Loss: 2.102974 LR: 0.00002999 +[09:25:51] Epoch: 1 Batch: 4112/20099 (20.46%) Loss: 2.067826 LR: 0.00002998 +[09:25:52] Epoch: 1 Batch: 4113/20099 (20.46%) Loss: 2.168070 LR: 0.00002998 +[09:25:54] Epoch: 1 Batch: 4114/20099 (20.47%) Loss: 2.286877 LR: 0.00002998 +[09:25:56] Epoch: 1 Batch: 4115/20099 (20.47%) Loss: 2.317633 LR: 0.00002998 +[09:25:58] Epoch: 1 Batch: 4116/20099 (20.48%) Loss: 2.073340 LR: 0.00002998 +[09:26:00] Epoch: 1 Batch: 4117/20099 (20.48%) Loss: 2.177355 LR: 0.00002998 +[09:26:01] Epoch: 1 Batch: 4118/20099 (20.49%) Loss: 2.029275 LR: 0.00002998 +[09:26:03] Epoch: 1 Batch: 4119/20099 (20.49%) Loss: 2.100441 LR: 0.00002998 +[09:26:05] Epoch: 1 Batch: 4120/20099 (20.50%) Loss: 2.140800 LR: 0.00002998 +[09:26:07] Epoch: 1 Batch: 4121/20099 (20.50%) Loss: 2.403699 LR: 0.00002998 +[09:26:08] Epoch: 1 Batch: 4122/20099 (20.51%) Loss: 2.171704 LR: 0.00002998 +[09:26:10] Epoch: 1 Batch: 4123/20099 (20.51%) Loss: 2.229477 LR: 0.00002998 +[09:26:12] Epoch: 1 Batch: 4124/20099 (20.52%) Loss: 2.029306 LR: 0.00002998 +[09:26:14] Epoch: 1 Batch: 4125/20099 (20.52%) Loss: 2.129517 LR: 0.00002998 +[09:26:16] Epoch: 1 Batch: 4126/20099 (20.53%) Loss: 2.322975 LR: 
0.00002998 +[09:26:17] Epoch: 1 Batch: 4127/20099 (20.53%) Loss: 2.291036 LR: 0.00002998 +[09:26:19] Epoch: 1 Batch: 4128/20099 (20.54%) Loss: 2.168188 LR: 0.00002998 +[09:26:21] Epoch: 1 Batch: 4129/20099 (20.54%) Loss: 2.214102 LR: 0.00002998 +[09:26:23] Epoch: 1 Batch: 4130/20099 (20.55%) Loss: 2.447123 LR: 0.00002998 +[09:26:25] Epoch: 1 Batch: 4131/20099 (20.55%) Loss: 2.279592 LR: 0.00002998 +[09:26:26] Epoch: 1 Batch: 4132/20099 (20.56%) Loss: 2.349950 LR: 0.00002998 +[09:26:28] Epoch: 1 Batch: 4133/20099 (20.56%) Loss: 2.163515 LR: 0.00002998 +[09:26:30] Epoch: 1 Batch: 4134/20099 (20.57%) Loss: 2.402847 LR: 0.00002998 +[09:26:32] Epoch: 1 Batch: 4135/20099 (20.57%) Loss: 2.186323 LR: 0.00002998 +[09:26:33] Epoch: 1 Batch: 4136/20099 (20.58%) Loss: 2.135204 LR: 0.00002998 +[09:26:35] Epoch: 1 Batch: 4137/20099 (20.58%) Loss: 2.153203 LR: 0.00002998 +[09:26:37] Epoch: 1 Batch: 4138/20099 (20.59%) Loss: 2.337076 LR: 0.00002998 +[09:26:39] Epoch: 1 Batch: 4139/20099 (20.59%) Loss: 2.117932 LR: 0.00002998 +[09:26:41] Epoch: 1 Batch: 4140/20099 (20.60%) Loss: 2.210425 LR: 0.00002998 +[09:26:42] Epoch: 1 Batch: 4141/20099 (20.60%) Loss: 2.249429 LR: 0.00002998 +[09:26:44] Epoch: 1 Batch: 4142/20099 (20.61%) Loss: 2.196202 LR: 0.00002998 +[09:26:46] Epoch: 1 Batch: 4143/20099 (20.61%) Loss: 2.108394 LR: 0.00002998 +[09:26:48] Epoch: 1 Batch: 4144/20099 (20.62%) Loss: 1.956349 LR: 0.00002998 +[09:26:49] Epoch: 1 Batch: 4145/20099 (20.62%) Loss: 2.279192 LR: 0.00002998 +[09:26:51] Epoch: 1 Batch: 4146/20099 (20.63%) Loss: 2.406589 LR: 0.00002998 +[09:26:53] Epoch: 1 Batch: 4147/20099 (20.63%) Loss: 1.886248 LR: 0.00002998 +[09:26:55] Epoch: 1 Batch: 4148/20099 (20.64%) Loss: 2.000163 LR: 0.00002998 +[09:26:56] Epoch: 1 Batch: 4149/20099 (20.64%) Loss: 1.887361 LR: 0.00002998 +[09:26:58] Epoch: 1 Batch: 4150/20099 (20.65%) Loss: 1.984324 LR: 0.00002998 +[09:27:00] Epoch: 1 Batch: 4151/20099 (20.65%) Loss: 2.179041 LR: 0.00002998 +[09:27:02] Epoch: 1 Batch: 4152/20099 
(20.66%) Loss: 1.930087 LR: 0.00002998 +[09:27:04] Epoch: 1 Batch: 4153/20099 (20.66%) Loss: 2.794794 LR: 0.00002998 +[09:27:05] Epoch: 1 Batch: 4154/20099 (20.67%) Loss: 2.126315 LR: 0.00002998 +[09:27:07] Epoch: 1 Batch: 4155/20099 (20.67%) Loss: 2.256185 LR: 0.00002998 +[09:27:09] Epoch: 1 Batch: 4156/20099 (20.68%) Loss: 2.078226 LR: 0.00002998 +[09:27:11] Epoch: 1 Batch: 4157/20099 (20.68%) Loss: 1.652144 LR: 0.00002998 +[09:27:12] Epoch: 1 Batch: 4158/20099 (20.69%) Loss: 2.046209 LR: 0.00002998 +[09:27:14] Epoch: 1 Batch: 4159/20099 (20.69%) Loss: 1.866931 LR: 0.00002998 +[09:27:16] Epoch: 1 Batch: 4160/20099 (20.70%) Loss: 2.249739 LR: 0.00002998 +[09:27:18] Epoch: 1 Batch: 4161/20099 (20.70%) Loss: 2.535205 LR: 0.00002998 +[09:27:19] Epoch: 1 Batch: 4162/20099 (20.71%) Loss: 1.987071 LR: 0.00002998 +[09:27:21] Epoch: 1 Batch: 4163/20099 (20.71%) Loss: 2.310194 LR: 0.00002998 +[09:27:23] Epoch: 1 Batch: 4164/20099 (20.72%) Loss: 2.246069 LR: 0.00002998 +[09:27:25] Epoch: 1 Batch: 4165/20099 (20.72%) Loss: 2.236383 LR: 0.00002998 +[09:27:26] Epoch: 1 Batch: 4166/20099 (20.73%) Loss: 1.744655 LR: 0.00002998 +[09:27:28] Epoch: 1 Batch: 4167/20099 (20.73%) Loss: 2.082060 LR: 0.00002998 +[09:27:30] Epoch: 1 Batch: 4168/20099 (20.74%) Loss: 2.383420 LR: 0.00002998 +[09:27:32] Epoch: 1 Batch: 4169/20099 (20.74%) Loss: 2.261171 LR: 0.00002998 +[09:27:33] Epoch: 1 Batch: 4170/20099 (20.75%) Loss: 1.748674 LR: 0.00002998 +[09:27:35] Epoch: 1 Batch: 4171/20099 (20.75%) Loss: 2.313165 LR: 0.00002998 +[09:27:37] Epoch: 1 Batch: 4172/20099 (20.76%) Loss: 2.079988 LR: 0.00002998 +[09:27:39] Epoch: 1 Batch: 4173/20099 (20.76%) Loss: 2.461200 LR: 0.00002998 +[09:27:41] Epoch: 1 Batch: 4174/20099 (20.77%) Loss: 1.769288 LR: 0.00002998 +[09:27:42] Epoch: 1 Batch: 4175/20099 (20.77%) Loss: 2.145441 LR: 0.00002998 +[09:27:44] Epoch: 1 Batch: 4176/20099 (20.78%) Loss: 2.132227 LR: 0.00002998 +[09:27:46] Epoch: 1 Batch: 4177/20099 (20.78%) Loss: 2.423449 LR: 0.00002998 
+[09:27:48] Epoch: 1 Batch: 4178/20099 (20.79%) Loss: 1.908737 LR: 0.00002998 +[09:27:49] Epoch: 1 Batch: 4179/20099 (20.79%) Loss: 2.124351 LR: 0.00002998 +[09:27:51] Epoch: 1 Batch: 4180/20099 (20.80%) Loss: 2.022864 LR: 0.00002998 +[09:27:53] Epoch: 1 Batch: 4181/20099 (20.80%) Loss: 2.109617 LR: 0.00002998 +[09:27:55] Epoch: 1 Batch: 4182/20099 (20.81%) Loss: 2.346232 LR: 0.00002998 +[09:27:56] Epoch: 1 Batch: 4183/20099 (20.81%) Loss: 2.208992 LR: 0.00002998 +[09:27:58] Epoch: 1 Batch: 4184/20099 (20.82%) Loss: 2.314929 LR: 0.00002998 +[09:28:00] Epoch: 1 Batch: 4185/20099 (20.82%) Loss: 1.593492 LR: 0.00002998 +[09:28:02] Epoch: 1 Batch: 4186/20099 (20.83%) Loss: 1.952594 LR: 0.00002998 +[09:28:04] Epoch: 1 Batch: 4187/20099 (20.83%) Loss: 2.185266 LR: 0.00002998 +[09:28:05] Epoch: 1 Batch: 4188/20099 (20.84%) Loss: 1.977003 LR: 0.00002998 +[09:28:07] Epoch: 1 Batch: 4189/20099 (20.84%) Loss: 2.172154 LR: 0.00002997 +[09:28:09] Epoch: 1 Batch: 4190/20099 (20.85%) Loss: 2.205625 LR: 0.00002997 +[09:28:11] Epoch: 1 Batch: 4191/20099 (20.85%) Loss: 2.313609 LR: 0.00002997 +[09:28:13] Epoch: 1 Batch: 4192/20099 (20.86%) Loss: 2.151612 LR: 0.00002997 +[09:28:14] Epoch: 1 Batch: 4193/20099 (20.86%) Loss: 2.159230 LR: 0.00002997 +[09:28:16] Epoch: 1 Batch: 4194/20099 (20.87%) Loss: 2.171582 LR: 0.00002997 +[09:28:18] Epoch: 1 Batch: 4195/20099 (20.87%) Loss: 2.472545 LR: 0.00002997 +[09:28:20] Epoch: 1 Batch: 4196/20099 (20.88%) Loss: 2.231366 LR: 0.00002997 +[09:28:21] Epoch: 1 Batch: 4197/20099 (20.88%) Loss: 2.498285 LR: 0.00002997 +[09:28:23] Epoch: 1 Batch: 4198/20099 (20.89%) Loss: 1.781743 LR: 0.00002997 +[09:28:25] Epoch: 1 Batch: 4199/20099 (20.89%) Loss: 2.474333 LR: 0.00002997 +[09:28:42] >> Temp checkpoint saved: epoch1_step4200, size: 0.1693 GB +[09:28:42] Epoch: 1 Batch: 4200/20099 (20.90%) Loss: 2.258252 LR: 0.00002997 +[09:28:44] Epoch: 1 Batch: 4201/20099 (20.90%) Loss: 2.217326 LR: 0.00002997 +[09:28:46] Epoch: 1 Batch: 4202/20099 (20.91%) Loss: 
1.906879 LR: 0.00002997 +[09:28:47] Epoch: 1 Batch: 4203/20099 (20.91%) Loss: 1.988648 LR: 0.00002997 +[09:28:49] Epoch: 1 Batch: 4204/20099 (20.92%) Loss: 2.493227 LR: 0.00002997 +[09:28:52] Epoch: 1 Batch: 4205/20099 (20.92%) Loss: 2.081568 LR: 0.00002997 +[09:28:54] Epoch: 1 Batch: 4206/20099 (20.93%) Loss: 2.326968 LR: 0.00002997 +[09:28:56] Epoch: 1 Batch: 4207/20099 (20.93%) Loss: 2.204204 LR: 0.00002997 +[09:28:58] Epoch: 1 Batch: 4208/20099 (20.94%) Loss: 2.013638 LR: 0.00002997 +[09:29:00] Epoch: 1 Batch: 4209/20099 (20.94%) Loss: 2.173445 LR: 0.00002997 +[09:29:01] Epoch: 1 Batch: 4210/20099 (20.95%) Loss: 2.294368 LR: 0.00002997 +[09:29:03] Epoch: 1 Batch: 4211/20099 (20.95%) Loss: 2.000285 LR: 0.00002997 +[09:29:05] Epoch: 1 Batch: 4212/20099 (20.96%) Loss: 2.212265 LR: 0.00002997 +[09:29:07] Epoch: 1 Batch: 4213/20099 (20.96%) Loss: 2.269067 LR: 0.00002997 +[09:29:09] Epoch: 1 Batch: 4214/20099 (20.97%) Loss: 2.209731 LR: 0.00002997 +[09:29:11] Epoch: 1 Batch: 4215/20099 (20.97%) Loss: 2.092431 LR: 0.00002997 +[09:29:12] Epoch: 1 Batch: 4216/20099 (20.98%) Loss: 2.491425 LR: 0.00002997 +[09:29:14] Epoch: 1 Batch: 4217/20099 (20.98%) Loss: 2.283398 LR: 0.00002997 +[09:29:16] Epoch: 1 Batch: 4218/20099 (20.99%) Loss: 2.120367 LR: 0.00002997 +[09:29:18] Epoch: 1 Batch: 4219/20099 (20.99%) Loss: 2.099170 LR: 0.00002997 +[09:29:20] Epoch: 1 Batch: 4220/20099 (21.00%) Loss: 2.306964 LR: 0.00002997 +[09:29:22] Epoch: 1 Batch: 4221/20099 (21.00%) Loss: 2.225070 LR: 0.00002997 +[09:29:24] Epoch: 1 Batch: 4222/20099 (21.01%) Loss: 2.166511 LR: 0.00002997 +[09:29:26] Epoch: 1 Batch: 4223/20099 (21.01%) Loss: 2.240740 LR: 0.00002997 +[09:29:27] Epoch: 1 Batch: 4224/20099 (21.02%) Loss: 1.895833 LR: 0.00002997 +[09:29:29] Epoch: 1 Batch: 4225/20099 (21.02%) Loss: 2.339063 LR: 0.00002997 +[09:29:31] Epoch: 1 Batch: 4226/20099 (21.03%) Loss: 2.540421 LR: 0.00002997 +[09:29:33] Epoch: 1 Batch: 4227/20099 (21.03%) Loss: 2.084920 LR: 0.00002997 +[09:29:35] Epoch: 1 
Batch: 4228/20099 (21.04%) Loss: 2.009688 LR: 0.00002997 +[09:29:36] Epoch: 1 Batch: 4229/20099 (21.04%) Loss: 2.044715 LR: 0.00002997 +[09:29:38] Epoch: 1 Batch: 4230/20099 (21.05%) Loss: 2.163644 LR: 0.00002997 +[09:29:40] Epoch: 1 Batch: 4231/20099 (21.05%) Loss: 2.207278 LR: 0.00002997 +[09:29:42] Epoch: 1 Batch: 4232/20099 (21.06%) Loss: 2.156269 LR: 0.00002997 +[09:29:43] Epoch: 1 Batch: 4233/20099 (21.06%) Loss: 2.457905 LR: 0.00002997 +[09:29:45] Epoch: 1 Batch: 4234/20099 (21.07%) Loss: 1.904993 LR: 0.00002997 +[09:29:47] Epoch: 1 Batch: 4235/20099 (21.07%) Loss: 2.386817 LR: 0.00002997 +[09:29:49] Epoch: 1 Batch: 4236/20099 (21.08%) Loss: 2.200943 LR: 0.00002997 +[09:29:50] Epoch: 1 Batch: 4237/20099 (21.08%) Loss: 1.952594 LR: 0.00002997 +[09:29:52] Epoch: 1 Batch: 4238/20099 (21.09%) Loss: 2.115129 LR: 0.00002997 +[09:29:54] Epoch: 1 Batch: 4239/20099 (21.09%) Loss: 2.038809 LR: 0.00002997 +[09:29:56] Epoch: 1 Batch: 4240/20099 (21.10%) Loss: 2.038389 LR: 0.00002997 +[09:29:57] Epoch: 1 Batch: 4241/20099 (21.10%) Loss: 2.027530 LR: 0.00002997 +[09:29:59] Epoch: 1 Batch: 4242/20099 (21.11%) Loss: 2.317637 LR: 0.00002997 +[09:30:01] Epoch: 1 Batch: 4243/20099 (21.11%) Loss: 2.314612 LR: 0.00002997 +[09:30:03] Epoch: 1 Batch: 4244/20099 (21.12%) Loss: 1.980255 LR: 0.00002997 +[09:30:04] Epoch: 1 Batch: 4245/20099 (21.12%) Loss: 2.140796 LR: 0.00002997 +[09:30:06] Epoch: 1 Batch: 4246/20099 (21.13%) Loss: 2.130179 LR: 0.00002997 +[09:30:08] Epoch: 1 Batch: 4247/20099 (21.13%) Loss: 2.303821 LR: 0.00002997 +[09:30:10] Epoch: 1 Batch: 4248/20099 (21.14%) Loss: 2.132404 LR: 0.00002997 +[09:30:11] Epoch: 1 Batch: 4249/20099 (21.14%) Loss: 2.222065 LR: 0.00002997 +[09:30:13] Epoch: 1 Batch: 4250/20099 (21.15%) Loss: 2.548959 LR: 0.00002997 +[09:30:15] Epoch: 1 Batch: 4251/20099 (21.15%) Loss: 2.378686 LR: 0.00002997 +[09:30:17] Epoch: 1 Batch: 4252/20099 (21.16%) Loss: 2.146301 LR: 0.00002996 +[09:30:18] Epoch: 1 Batch: 4253/20099 (21.16%) Loss: 1.932536 LR: 
0.00002996 +[09:30:20] Epoch: 1 Batch: 4254/20099 (21.17%) Loss: 2.704013 LR: 0.00002996 +[09:30:22] Epoch: 1 Batch: 4255/20099 (21.17%) Loss: 2.219529 LR: 0.00002996 +[09:30:24] Epoch: 1 Batch: 4256/20099 (21.18%) Loss: 2.186404 LR: 0.00002996 +[09:30:25] Epoch: 1 Batch: 4257/20099 (21.18%) Loss: 2.122137 LR: 0.00002996 +[09:30:27] Epoch: 1 Batch: 4258/20099 (21.19%) Loss: 2.077111 LR: 0.00002996 +[09:30:29] Epoch: 1 Batch: 4259/20099 (21.19%) Loss: 2.326340 LR: 0.00002996 +[09:30:31] Epoch: 1 Batch: 4260/20099 (21.20%) Loss: 2.210035 LR: 0.00002996 +[09:30:32] Epoch: 1 Batch: 4261/20099 (21.20%) Loss: 2.320212 LR: 0.00002996 +[09:30:34] Epoch: 1 Batch: 4262/20099 (21.21%) Loss: 2.049881 LR: 0.00002996 +[09:30:36] Epoch: 1 Batch: 4263/20099 (21.21%) Loss: 2.009214 LR: 0.00002996 +[09:30:38] Epoch: 1 Batch: 4264/20099 (21.21%) Loss: 2.148879 LR: 0.00002996 +[09:30:40] Epoch: 1 Batch: 4265/20099 (21.22%) Loss: 2.223458 LR: 0.00002996 +[09:30:41] Epoch: 1 Batch: 4266/20099 (21.22%) Loss: 2.177021 LR: 0.00002996 +[09:30:43] Epoch: 1 Batch: 4267/20099 (21.23%) Loss: 2.162271 LR: 0.00002996 +[09:30:45] Epoch: 1 Batch: 4268/20099 (21.23%) Loss: 2.253125 LR: 0.00002996 +[09:30:47] Epoch: 1 Batch: 4269/20099 (21.24%) Loss: 2.341412 LR: 0.00002996 +[09:30:49] Epoch: 1 Batch: 4270/20099 (21.24%) Loss: 2.386431 LR: 0.00002996 +[09:30:50] Epoch: 1 Batch: 4271/20099 (21.25%) Loss: 2.283540 LR: 0.00002996 +[09:30:52] Epoch: 1 Batch: 4272/20099 (21.25%) Loss: 2.050317 LR: 0.00002996 +[09:30:54] Epoch: 1 Batch: 4273/20099 (21.26%) Loss: 2.161394 LR: 0.00002996 +[09:30:56] Epoch: 1 Batch: 4274/20099 (21.26%) Loss: 2.082860 LR: 0.00002996 +[09:30:58] Epoch: 1 Batch: 4275/20099 (21.27%) Loss: 2.101066 LR: 0.00002996 +[09:30:59] Epoch: 1 Batch: 4276/20099 (21.27%) Loss: 1.863072 LR: 0.00002996 +[09:31:01] Epoch: 1 Batch: 4277/20099 (21.28%) Loss: 2.119669 LR: 0.00002996 +[09:31:03] Epoch: 1 Batch: 4278/20099 (21.28%) Loss: 2.214416 LR: 0.00002996 +[09:31:05] Epoch: 1 Batch: 4279/20099 
(21.29%) Loss: 1.846871 LR: 0.00002996 +[09:31:07] Epoch: 1 Batch: 4280/20099 (21.29%) Loss: 2.516695 LR: 0.00002996 +[09:31:08] Epoch: 1 Batch: 4281/20099 (21.30%) Loss: 2.295454 LR: 0.00002996 +[09:31:10] Epoch: 1 Batch: 4282/20099 (21.30%) Loss: 2.537032 LR: 0.00002996 +[09:31:12] Epoch: 1 Batch: 4283/20099 (21.31%) Loss: 2.249225 LR: 0.00002996 +[09:31:14] Epoch: 1 Batch: 4284/20099 (21.31%) Loss: 2.117881 LR: 0.00002996 +[09:31:15] Epoch: 1 Batch: 4285/20099 (21.32%) Loss: 2.106057 LR: 0.00002996 +[09:31:17] Epoch: 1 Batch: 4286/20099 (21.32%) Loss: 2.270016 LR: 0.00002996 +[09:31:19] Epoch: 1 Batch: 4287/20099 (21.33%) Loss: 2.063246 LR: 0.00002996 +[09:31:21] Epoch: 1 Batch: 4288/20099 (21.33%) Loss: 2.291669 LR: 0.00002996 +[09:31:22] Epoch: 1 Batch: 4289/20099 (21.34%) Loss: 2.507775 LR: 0.00002996 +[09:31:24] Epoch: 1 Batch: 4290/20099 (21.34%) Loss: 2.583970 LR: 0.00002996 +[09:31:26] Epoch: 1 Batch: 4291/20099 (21.35%) Loss: 1.926546 LR: 0.00002996 +[09:31:28] Epoch: 1 Batch: 4292/20099 (21.35%) Loss: 1.906059 LR: 0.00002996 +[09:31:29] Epoch: 1 Batch: 4293/20099 (21.36%) Loss: 2.087105 LR: 0.00002996 +[09:31:31] Epoch: 1 Batch: 4294/20099 (21.36%) Loss: 2.145736 LR: 0.00002996 +[09:31:33] Epoch: 1 Batch: 4295/20099 (21.37%) Loss: 2.237744 LR: 0.00002996 +[09:31:35] Epoch: 1 Batch: 4296/20099 (21.37%) Loss: 2.306480 LR: 0.00002996 +[09:31:36] Epoch: 1 Batch: 4297/20099 (21.38%) Loss: 2.151994 LR: 0.00002996 +[09:31:38] Epoch: 1 Batch: 4298/20099 (21.38%) Loss: 2.125349 LR: 0.00002996 +[09:31:40] Epoch: 1 Batch: 4299/20099 (21.39%) Loss: 2.565513 LR: 0.00002996 +[09:31:42] Epoch: 1 Batch: 4300/20099 (21.39%) Loss: 2.073684 LR: 0.00002996 +[09:31:44] Epoch: 1 Batch: 4301/20099 (21.40%) Loss: 1.843676 LR: 0.00002996 +[09:31:45] Epoch: 1 Batch: 4302/20099 (21.40%) Loss: 2.379328 LR: 0.00002996 +[09:31:47] Epoch: 1 Batch: 4303/20099 (21.41%) Loss: 2.206618 LR: 0.00002996 +[09:31:49] Epoch: 1 Batch: 4304/20099 (21.41%) Loss: 2.164945 LR: 0.00002996 
+[09:31:51] Epoch: 1 Batch: 4305/20099 (21.42%) Loss: 2.261166 LR: 0.00002996 +[09:31:52] Epoch: 1 Batch: 4306/20099 (21.42%) Loss: 2.004408 LR: 0.00002996 +[09:31:54] Epoch: 1 Batch: 4307/20099 (21.43%) Loss: 2.015667 LR: 0.00002996 +[09:31:56] Epoch: 1 Batch: 4308/20099 (21.43%) Loss: 2.121432 LR: 0.00002995 +[09:31:58] Epoch: 1 Batch: 4309/20099 (21.44%) Loss: 1.881698 LR: 0.00002995 +[09:32:00] Epoch: 1 Batch: 4310/20099 (21.44%) Loss: 1.956137 LR: 0.00002995 +[09:32:01] Epoch: 1 Batch: 4311/20099 (21.45%) Loss: 1.947452 LR: 0.00002995 +[09:32:03] Epoch: 1 Batch: 4312/20099 (21.45%) Loss: 2.335690 LR: 0.00002995 +[09:32:05] Epoch: 1 Batch: 4313/20099 (21.46%) Loss: 2.090460 LR: 0.00002995 +[09:32:07] Epoch: 1 Batch: 4314/20099 (21.46%) Loss: 2.267272 LR: 0.00002995 +[09:32:09] Epoch: 1 Batch: 4315/20099 (21.47%) Loss: 2.220538 LR: 0.00002995 +[09:32:10] Epoch: 1 Batch: 4316/20099 (21.47%) Loss: 2.372441 LR: 0.00002995 +[09:32:12] Epoch: 1 Batch: 4317/20099 (21.48%) Loss: 2.301841 LR: 0.00002995 +[09:32:14] Epoch: 1 Batch: 4318/20099 (21.48%) Loss: 2.286405 LR: 0.00002995 +[09:32:16] Epoch: 1 Batch: 4319/20099 (21.49%) Loss: 2.136866 LR: 0.00002995 +[09:32:17] Epoch: 1 Batch: 4320/20099 (21.49%) Loss: 1.948267 LR: 0.00002995 +[09:32:19] Epoch: 1 Batch: 4321/20099 (21.50%) Loss: 2.191081 LR: 0.00002995 +[09:32:21] Epoch: 1 Batch: 4322/20099 (21.50%) Loss: 2.072840 LR: 0.00002995 +[09:32:23] Epoch: 1 Batch: 4323/20099 (21.51%) Loss: 2.094398 LR: 0.00002995 +[09:32:25] Epoch: 1 Batch: 4324/20099 (21.51%) Loss: 2.349399 LR: 0.00002995 +[09:32:26] Epoch: 1 Batch: 4325/20099 (21.52%) Loss: 2.408298 LR: 0.00002995 +[09:32:28] Epoch: 1 Batch: 4326/20099 (21.52%) Loss: 2.215339 LR: 0.00002995 +[09:32:30] Epoch: 1 Batch: 4327/20099 (21.53%) Loss: 2.190173 LR: 0.00002995 +[09:32:32] Epoch: 1 Batch: 4328/20099 (21.53%) Loss: 2.054258 LR: 0.00002995 +[09:32:33] Epoch: 1 Batch: 4329/20099 (21.54%) Loss: 2.155009 LR: 0.00002995 +[09:32:35] Epoch: 1 Batch: 4330/20099 (21.54%) 
Loss: 2.393301 LR: 0.00002995 +[09:32:37] Epoch: 1 Batch: 4331/20099 (21.55%) Loss: 2.053289 LR: 0.00002995 +[09:32:39] Epoch: 1 Batch: 4332/20099 (21.55%) Loss: 2.140115 LR: 0.00002995 +[09:32:41] Epoch: 1 Batch: 4333/20099 (21.56%) Loss: 1.904420 LR: 0.00002995 +[09:32:42] Epoch: 1 Batch: 4334/20099 (21.56%) Loss: 2.380860 LR: 0.00002995 +[09:32:44] Epoch: 1 Batch: 4335/20099 (21.57%) Loss: 2.395250 LR: 0.00002995 +[09:32:46] Epoch: 1 Batch: 4336/20099 (21.57%) Loss: 2.006323 LR: 0.00002995 +[09:32:48] Epoch: 1 Batch: 4337/20099 (21.58%) Loss: 1.960952 LR: 0.00002995 +[09:32:49] Epoch: 1 Batch: 4338/20099 (21.58%) Loss: 2.175974 LR: 0.00002995 +[09:32:51] Epoch: 1 Batch: 4339/20099 (21.59%) Loss: 1.958958 LR: 0.00002995 +[09:32:53] Epoch: 1 Batch: 4340/20099 (21.59%) Loss: 2.040513 LR: 0.00002995 +[09:32:55] Epoch: 1 Batch: 4341/20099 (21.60%) Loss: 2.163926 LR: 0.00002995 +[09:32:57] Epoch: 1 Batch: 4342/20099 (21.60%) Loss: 2.496896 LR: 0.00002995 +[09:32:58] Epoch: 1 Batch: 4343/20099 (21.61%) Loss: 2.300030 LR: 0.00002995 +[09:33:00] Epoch: 1 Batch: 4344/20099 (21.61%) Loss: 2.464136 LR: 0.00002995 +[09:33:02] Epoch: 1 Batch: 4345/20099 (21.62%) Loss: 2.069293 LR: 0.00002995 +[09:33:04] Epoch: 1 Batch: 4346/20099 (21.62%) Loss: 2.187393 LR: 0.00002995 +[09:33:05] Epoch: 1 Batch: 4347/20099 (21.63%) Loss: 2.306781 LR: 0.00002995 +[09:33:07] Epoch: 1 Batch: 4348/20099 (21.63%) Loss: 2.354031 LR: 0.00002995 +[09:33:09] Epoch: 1 Batch: 4349/20099 (21.64%) Loss: 2.517716 LR: 0.00002995 +[09:33:11] Epoch: 1 Batch: 4350/20099 (21.64%) Loss: 2.225712 LR: 0.00002994 +[09:33:13] Epoch: 1 Batch: 4351/20099 (21.65%) Loss: 2.159921 LR: 0.00002994 +[09:33:14] Epoch: 1 Batch: 4352/20099 (21.65%) Loss: 2.430187 LR: 0.00002994 +[09:33:16] Epoch: 1 Batch: 4353/20099 (21.66%) Loss: 2.081634 LR: 0.00002994 +[09:33:18] Epoch: 1 Batch: 4354/20099 (21.66%) Loss: 2.167129 LR: 0.00002994 +[09:33:20] Epoch: 1 Batch: 4355/20099 (21.67%) Loss: 1.975859 LR: 0.00002994 +[09:33:21] Epoch: 
1 Batch: 4356/20099 (21.67%) Loss: 2.054198 LR: 0.00002994 +[09:33:23] Epoch: 1 Batch: 4357/20099 (21.68%) Loss: 1.911408 LR: 0.00002994 +[09:33:25] Epoch: 1 Batch: 4358/20099 (21.68%) Loss: 2.143560 LR: 0.00002994 +[09:33:27] Epoch: 1 Batch: 4359/20099 (21.69%) Loss: 1.897688 LR: 0.00002994 +[09:33:29] Epoch: 1 Batch: 4360/20099 (21.69%) Loss: 2.172303 LR: 0.00002994 +[09:33:30] Epoch: 1 Batch: 4361/20099 (21.70%) Loss: 2.082796 LR: 0.00002994 +[09:33:32] Epoch: 1 Batch: 4362/20099 (21.70%) Loss: 2.255810 LR: 0.00002994 +[09:33:34] Epoch: 1 Batch: 4363/20099 (21.71%) Loss: 2.193030 LR: 0.00002994 +[09:33:36] Epoch: 1 Batch: 4364/20099 (21.71%) Loss: 2.429543 LR: 0.00002994 +[09:33:37] Epoch: 1 Batch: 4365/20099 (21.72%) Loss: 2.062505 LR: 0.00002994 +[09:33:39] Epoch: 1 Batch: 4366/20099 (21.72%) Loss: 2.008330 LR: 0.00002994 +[09:33:41] Epoch: 1 Batch: 4367/20099 (21.73%) Loss: 2.009021 LR: 0.00002994 +[09:33:43] Epoch: 1 Batch: 4368/20099 (21.73%) Loss: 2.295634 LR: 0.00002994 +[09:33:45] Epoch: 1 Batch: 4369/20099 (21.74%) Loss: 2.109503 LR: 0.00002994 +[09:33:46] Epoch: 1 Batch: 4370/20099 (21.74%) Loss: 2.626593 LR: 0.00002994 +[09:33:48] Epoch: 1 Batch: 4371/20099 (21.75%) Loss: 2.263075 LR: 0.00002994 +[09:33:50] Epoch: 1 Batch: 4372/20099 (21.75%) Loss: 1.986575 LR: 0.00002994 +[09:33:52] Epoch: 1 Batch: 4373/20099 (21.76%) Loss: 2.441669 LR: 0.00002994 +[09:33:53] Epoch: 1 Batch: 4374/20099 (21.76%) Loss: 2.191575 LR: 0.00002994 +[09:33:55] Epoch: 1 Batch: 4375/20099 (21.77%) Loss: 2.324395 LR: 0.00002994 +[09:33:57] Epoch: 1 Batch: 4376/20099 (21.77%) Loss: 2.465048 LR: 0.00002994 +[09:33:59] Epoch: 1 Batch: 4377/20099 (21.78%) Loss: 2.180017 LR: 0.00002994 +[09:34:01] Epoch: 1 Batch: 4378/20099 (21.78%) Loss: 2.246031 LR: 0.00002994 +[09:34:02] Epoch: 1 Batch: 4379/20099 (21.79%) Loss: 2.357298 LR: 0.00002994 +[09:34:04] Epoch: 1 Batch: 4380/20099 (21.79%) Loss: 2.220786 LR: 0.00002994 +[09:34:06] Epoch: 1 Batch: 4381/20099 (21.80%) Loss: 1.845129 LR: 
0.00002994 +[09:34:08] Epoch: 1 Batch: 4382/20099 (21.80%) Loss: 2.698543 LR: 0.00002994 +[09:34:09] Epoch: 1 Batch: 4383/20099 (21.81%) Loss: 2.326270 LR: 0.00002994 +[09:34:11] Epoch: 1 Batch: 4384/20099 (21.81%) Loss: 2.411268 LR: 0.00002994 +[09:34:13] Epoch: 1 Batch: 4385/20099 (21.82%) Loss: 2.225788 LR: 0.00002994 +[09:34:15] Epoch: 1 Batch: 4386/20099 (21.82%) Loss: 2.450372 LR: 0.00002994 +[09:34:16] Epoch: 1 Batch: 4387/20099 (21.83%) Loss: 2.288461 LR: 0.00002994 +[09:34:18] Epoch: 1 Batch: 4388/20099 (21.83%) Loss: 2.105160 LR: 0.00002994 +[09:34:20] Epoch: 1 Batch: 4389/20099 (21.84%) Loss: 2.119358 LR: 0.00002994 +[09:34:22] Epoch: 1 Batch: 4390/20099 (21.84%) Loss: 2.374735 LR: 0.00002994 +[09:34:23] Epoch: 1 Batch: 4391/20099 (21.85%) Loss: 2.110728 LR: 0.00002994 +[09:34:25] Epoch: 1 Batch: 4392/20099 (21.85%) Loss: 1.998562 LR: 0.00002993 +[09:34:27] Epoch: 1 Batch: 4393/20099 (21.86%) Loss: 2.378952 LR: 0.00002993 +[09:34:29] Epoch: 1 Batch: 4394/20099 (21.86%) Loss: 2.152264 LR: 0.00002993 +[09:34:31] Epoch: 1 Batch: 4395/20099 (21.87%) Loss: 2.426760 LR: 0.00002993 +[09:34:32] Epoch: 1 Batch: 4396/20099 (21.87%) Loss: 2.111558 LR: 0.00002993 +[09:34:34] Epoch: 1 Batch: 4397/20099 (21.88%) Loss: 2.266134 LR: 0.00002993 +[09:34:36] Epoch: 1 Batch: 4398/20099 (21.88%) Loss: 2.289913 LR: 0.00002993 +[09:34:38] Epoch: 1 Batch: 4399/20099 (21.89%) Loss: 2.324952 LR: 0.00002993 +[09:34:43] >> Cleaned up old temp checkpoint: epoch1_step2400 +[09:34:43] >> Temp checkpoint saved: epoch1_step4400, size: 0.1693 GB +[09:34:43] Epoch: 1 Batch: 4400/20099 (21.89%) Loss: 2.206054 LR: 0.00002993 +[09:34:45] Epoch: 1 Batch: 4401/20099 (21.90%) Loss: 2.121865 LR: 0.00002993 +[09:34:47] Epoch: 1 Batch: 4402/20099 (21.90%) Loss: 2.185474 LR: 0.00002993 +[09:34:49] Epoch: 1 Batch: 4403/20099 (21.91%) Loss: 2.184878 LR: 0.00002993 +[09:34:50] Epoch: 1 Batch: 4404/20099 (21.91%) Loss: 2.138655 LR: 0.00002993 +[09:34:52] Epoch: 1 Batch: 4405/20099 (21.92%) Loss: 
2.327180 LR: 0.00002993 +[09:34:54] Epoch: 1 Batch: 4406/20099 (21.92%) Loss: 2.359757 LR: 0.00002993 +[09:34:56] Epoch: 1 Batch: 4407/20099 (21.93%) Loss: 2.339371 LR: 0.00002993 +[09:34:57] Epoch: 1 Batch: 4408/20099 (21.93%) Loss: 2.117037 LR: 0.00002993 +[09:34:59] Epoch: 1 Batch: 4409/20099 (21.94%) Loss: 2.204730 LR: 0.00002993 +[09:35:01] Epoch: 1 Batch: 4410/20099 (21.94%) Loss: 2.286284 LR: 0.00002993 +[09:35:03] Epoch: 1 Batch: 4411/20099 (21.95%) Loss: 1.945703 LR: 0.00002993 +[09:35:04] Epoch: 1 Batch: 4412/20099 (21.95%) Loss: 2.093357 LR: 0.00002993 +[09:35:06] Epoch: 1 Batch: 4413/20099 (21.96%) Loss: 2.060310 LR: 0.00002993 +[09:35:08] Epoch: 1 Batch: 4414/20099 (21.96%) Loss: 1.806569 LR: 0.00002993 +[09:35:10] Epoch: 1 Batch: 4415/20099 (21.97%) Loss: 2.104291 LR: 0.00002993 +[09:35:12] Epoch: 1 Batch: 4416/20099 (21.97%) Loss: 2.155068 LR: 0.00002993 +[09:35:14] Epoch: 1 Batch: 4417/20099 (21.98%) Loss: 2.191786 LR: 0.00002993 +[09:35:15] Epoch: 1 Batch: 4418/20099 (21.98%) Loss: 2.304741 LR: 0.00002993 +[09:35:17] Epoch: 1 Batch: 4419/20099 (21.99%) Loss: 2.053975 LR: 0.00002993 +[09:35:19] Epoch: 1 Batch: 4420/20099 (21.99%) Loss: 2.253861 LR: 0.00002993 +[09:35:21] Epoch: 1 Batch: 4421/20099 (22.00%) Loss: 2.168701 LR: 0.00002993 +[09:35:23] Epoch: 1 Batch: 4422/20099 (22.00%) Loss: 2.424770 LR: 0.00002993 +[09:35:24] Epoch: 1 Batch: 4423/20099 (22.01%) Loss: 2.042711 LR: 0.00002993 +[09:35:26] Epoch: 1 Batch: 4424/20099 (22.01%) Loss: 2.167008 LR: 0.00002993 +[09:35:28] Epoch: 1 Batch: 4425/20099 (22.02%) Loss: 2.327759 LR: 0.00002993 +[09:35:30] Epoch: 1 Batch: 4426/20099 (22.02%) Loss: 2.086819 LR: 0.00002993 +[09:35:31] Epoch: 1 Batch: 4427/20099 (22.03%) Loss: 2.284888 LR: 0.00002993 +[09:35:33] Epoch: 1 Batch: 4428/20099 (22.03%) Loss: 2.297292 LR: 0.00002993 +[09:35:35] Epoch: 1 Batch: 4429/20099 (22.04%) Loss: 2.115367 LR: 0.00002993 +[09:35:37] Epoch: 1 Batch: 4430/20099 (22.04%) Loss: 1.863683 LR: 0.00002993 +[09:35:38] Epoch: 1 
Batch: 4431/20099 (22.05%) Loss: 2.439022 LR: 0.00002993 +[09:35:40] Epoch: 1 Batch: 4432/20099 (22.05%) Loss: 2.518859 LR: 0.00002993 +[09:35:42] Epoch: 1 Batch: 4433/20099 (22.06%) Loss: 2.156759 LR: 0.00002993 +[09:35:44] Epoch: 1 Batch: 4434/20099 (22.06%) Loss: 2.174412 LR: 0.00002992 +[09:35:46] Epoch: 1 Batch: 4435/20099 (22.07%) Loss: 2.162699 LR: 0.00002992 +[09:35:47] Epoch: 1 Batch: 4436/20099 (22.07%) Loss: 2.155162 LR: 0.00002992 +[09:35:49] Epoch: 1 Batch: 4437/20099 (22.08%) Loss: 2.214235 LR: 0.00002992 +[09:35:51] Epoch: 1 Batch: 4438/20099 (22.08%) Loss: 2.099998 LR: 0.00002992 +[09:35:53] Epoch: 1 Batch: 4439/20099 (22.09%) Loss: 2.228888 LR: 0.00002992 +[09:35:54] Epoch: 1 Batch: 4440/20099 (22.09%) Loss: 2.161440 LR: 0.00002992 +[09:35:56] Epoch: 1 Batch: 4441/20099 (22.10%) Loss: 2.240600 LR: 0.00002992 +[09:35:58] Epoch: 1 Batch: 4442/20099 (22.10%) Loss: 1.745911 LR: 0.00002992 +[09:36:00] Epoch: 1 Batch: 4443/20099 (22.11%) Loss: 2.077041 LR: 0.00002992 +[09:36:01] Epoch: 1 Batch: 4444/20099 (22.11%) Loss: 1.929353 LR: 0.00002992 +[09:36:03] Epoch: 1 Batch: 4445/20099 (22.12%) Loss: 2.288770 LR: 0.00002992 +[09:36:05] Epoch: 1 Batch: 4446/20099 (22.12%) Loss: 1.893214 LR: 0.00002992 +[09:36:07] Epoch: 1 Batch: 4447/20099 (22.13%) Loss: 1.982065 LR: 0.00002992 +[09:36:08] Epoch: 1 Batch: 4448/20099 (22.13%) Loss: 2.151330 LR: 0.00002992 +[09:36:10] Epoch: 1 Batch: 4449/20099 (22.14%) Loss: 2.103332 LR: 0.00002992 +[09:36:12] Epoch: 1 Batch: 4450/20099 (22.14%) Loss: 2.271091 LR: 0.00002992 +[09:36:14] Epoch: 1 Batch: 4451/20099 (22.15%) Loss: 2.141728 LR: 0.00002992 +[09:36:16] Epoch: 1 Batch: 4452/20099 (22.15%) Loss: 2.021437 LR: 0.00002992 +[09:36:17] Epoch: 1 Batch: 4453/20099 (22.16%) Loss: 1.957507 LR: 0.00002992 +[09:36:19] Epoch: 1 Batch: 4454/20099 (22.16%) Loss: 2.156494 LR: 0.00002992 +[09:36:21] Epoch: 1 Batch: 4455/20099 (22.17%) Loss: 2.340196 LR: 0.00002992 +[09:36:23] Epoch: 1 Batch: 4456/20099 (22.17%) Loss: 1.897669 LR: 
0.00002992 +[09:36:24] Epoch: 1 Batch: 4457/20099 (22.18%) Loss: 2.221617 LR: 0.00002992 +[09:36:26] Epoch: 1 Batch: 4458/20099 (22.18%) Loss: 2.199193 LR: 0.00002992 +[09:36:28] Epoch: 1 Batch: 4459/20099 (22.19%) Loss: 2.315684 LR: 0.00002992 +[09:36:30] Epoch: 1 Batch: 4460/20099 (22.19%) Loss: 2.362048 LR: 0.00002992 +[09:36:32] Epoch: 1 Batch: 4461/20099 (22.20%) Loss: 2.210023 LR: 0.00002992 +[09:36:33] Epoch: 1 Batch: 4462/20099 (22.20%) Loss: 2.147963 LR: 0.00002992 +[09:36:35] Epoch: 1 Batch: 4463/20099 (22.21%) Loss: 1.976379 LR: 0.00002992 +[09:36:37] Epoch: 1 Batch: 4464/20099 (22.21%) Loss: 2.246976 LR: 0.00002992 +[09:36:39] Epoch: 1 Batch: 4465/20099 (22.22%) Loss: 2.083721 LR: 0.00002992 +[09:36:40] Epoch: 1 Batch: 4466/20099 (22.22%) Loss: 2.039861 LR: 0.00002992 +[09:36:42] Epoch: 1 Batch: 4467/20099 (22.22%) Loss: 2.250999 LR: 0.00002992 +[09:36:44] Epoch: 1 Batch: 4468/20099 (22.23%) Loss: 1.960066 LR: 0.00002992 +[09:36:46] Epoch: 1 Batch: 4469/20099 (22.23%) Loss: 2.599974 LR: 0.00002991 +[09:36:48] Epoch: 1 Batch: 4470/20099 (22.24%) Loss: 2.125574 LR: 0.00002991 +[09:36:49] Epoch: 1 Batch: 4471/20099 (22.24%) Loss: 2.701041 LR: 0.00002991 +[09:36:51] Epoch: 1 Batch: 4472/20099 (22.25%) Loss: 2.034093 LR: 0.00002991 +[09:36:53] Epoch: 1 Batch: 4473/20099 (22.25%) Loss: 2.413813 LR: 0.00002991 +[09:36:55] Epoch: 1 Batch: 4474/20099 (22.26%) Loss: 2.219178 LR: 0.00002991 +[09:36:57] Epoch: 1 Batch: 4475/20099 (22.26%) Loss: 2.104830 LR: 0.00002991 +[09:36:58] Epoch: 1 Batch: 4476/20099 (22.27%) Loss: 2.444024 LR: 0.00002991 +[09:37:00] Epoch: 1 Batch: 4477/20099 (22.27%) Loss: 2.259976 LR: 0.00002991 +[09:37:02] Epoch: 1 Batch: 4478/20099 (22.28%) Loss: 2.164045 LR: 0.00002991 +[09:37:04] Epoch: 1 Batch: 4479/20099 (22.28%) Loss: 2.011524 LR: 0.00002991 +[09:37:05] Epoch: 1 Batch: 4480/20099 (22.29%) Loss: 2.277252 LR: 0.00002991 +[09:37:07] Epoch: 1 Batch: 4481/20099 (22.29%) Loss: 2.357504 LR: 0.00002991 +[09:37:09] Epoch: 1 Batch: 4482/20099 
(22.30%) Loss: 2.282357 LR: 0.00002991 +[09:37:11] Epoch: 1 Batch: 4483/20099 (22.30%) Loss: 1.967951 LR: 0.00002991 +[09:37:13] Epoch: 1 Batch: 4484/20099 (22.31%) Loss: 2.122877 LR: 0.00002991 +[09:37:14] Epoch: 1 Batch: 4485/20099 (22.31%) Loss: 1.947138 LR: 0.00002991 +[09:37:16] Epoch: 1 Batch: 4486/20099 (22.32%) Loss: 2.280461 LR: 0.00002991 +[09:37:18] Epoch: 1 Batch: 4487/20099 (22.32%) Loss: 2.120796 LR: 0.00002991 +[09:37:20] Epoch: 1 Batch: 4488/20099 (22.33%) Loss: 2.518332 LR: 0.00002991 +[09:37:22] Epoch: 1 Batch: 4489/20099 (22.33%) Loss: 2.352842 LR: 0.00002991 +[09:37:23] Epoch: 1 Batch: 4490/20099 (22.34%) Loss: 2.109434 LR: 0.00002991 +[09:37:25] Epoch: 1 Batch: 4491/20099 (22.34%) Loss: 2.188045 LR: 0.00002991 +[09:37:27] Epoch: 1 Batch: 4492/20099 (22.35%) Loss: 2.203532 LR: 0.00002991 +[09:37:29] Epoch: 1 Batch: 4493/20099 (22.35%) Loss: 2.153884 LR: 0.00002991 +[09:37:30] Epoch: 1 Batch: 4494/20099 (22.36%) Loss: 2.190680 LR: 0.00002991 +[09:37:32] Epoch: 1 Batch: 4495/20099 (22.36%) Loss: 2.276166 LR: 0.00002991 +[09:37:34] Epoch: 1 Batch: 4496/20099 (22.37%) Loss: 2.142515 LR: 0.00002991 +[09:37:36] Epoch: 1 Batch: 4497/20099 (22.37%) Loss: 2.343031 LR: 0.00002991 +[09:37:37] Epoch: 1 Batch: 4498/20099 (22.38%) Loss: 2.166052 LR: 0.00002991 +[09:37:39] Epoch: 1 Batch: 4499/20099 (22.38%) Loss: 2.138952 LR: 0.00002991 +[09:37:41] >> Evaluating batch 0 +[09:37:42] >> Evaluating batch 1 +[09:37:43] >> Evaluating batch 2 +[09:37:44] >> Evaluating batch 3 +[09:37:45] >> Evaluating batch 4 +[09:37:46] >> Evaluating batch 5 +[09:37:47] >> Evaluating batch 6 +[09:37:48] >> Evaluating batch 7 +[09:37:49] >> Evaluating batch 8 +[09:37:50] >> Evaluating batch 9 +[09:37:51] >> Evaluating batch 10 +[09:37:52] >> Evaluating batch 11 +[09:37:53] >> Evaluating batch 12 +[09:37:54] >> Evaluating batch 13 +[09:37:55] >> Evaluating batch 14 +[09:37:56] >> Evaluating batch 15 +[09:37:57] >> Evaluating batch 16 +[09:37:58] Epoch: 1 Step: 4500/20099 Evaluation: 
+[09:37:58] Avg Loss Since Last Eval: 0.2422 Val Loss: 2.2500 Validation loss delta: 2.2500 Perplexity: 9.4881 LR: 0.00002991 +[09:38:01] >> Checkpoint saved: epoch1_step4500, size: 0.1693 GB +[09:38:01] Epoch: 1 Batch: 4500/20099 (22.39%) Loss: 2.454383 LR: 0.00002991 +[09:38:03] Epoch: 1 Batch: 4501/20099 (22.39%) Loss: 2.127920 LR: 0.00002991 +[09:38:05] Epoch: 1 Batch: 4502/20099 (22.40%) Loss: 2.181894 LR: 0.00002991 +[09:38:06] Epoch: 1 Batch: 4503/20099 (22.40%) Loss: 2.267566 LR: 0.00002991 +[09:38:08] Epoch: 1 Batch: 4504/20099 (22.41%) Loss: 2.116218 LR: 0.00002991 +[09:38:10] Epoch: 1 Batch: 4505/20099 (22.41%) Loss: 2.189499 LR: 0.00002991 +[09:38:12] Epoch: 1 Batch: 4506/20099 (22.42%) Loss: 2.080168 LR: 0.00002991 +[09:38:13] Epoch: 1 Batch: 4507/20099 (22.42%) Loss: 2.237687 LR: 0.00002991 +[09:38:15] Epoch: 1 Batch: 4508/20099 (22.43%) Loss: 2.397248 LR: 0.00002991 +[09:38:17] Epoch: 1 Batch: 4509/20099 (22.43%) Loss: 2.352885 LR: 0.00002991 +[09:38:19] Epoch: 1 Batch: 4510/20099 (22.44%) Loss: 2.073200 LR: 0.00002991 +[09:38:21] Epoch: 1 Batch: 4511/20099 (22.44%) Loss: 2.116466 LR: 0.00002990 +[09:38:22] Epoch: 1 Batch: 4512/20099 (22.45%) Loss: 2.144737 LR: 0.00002990 +[09:38:24] Epoch: 1 Batch: 4513/20099 (22.45%) Loss: 2.067570 LR: 0.00002990 +[09:38:26] Epoch: 1 Batch: 4514/20099 (22.46%) Loss: 2.282880 LR: 0.00002990 +[09:38:28] Epoch: 1 Batch: 4515/20099 (22.46%) Loss: 2.065158 LR: 0.00002990 +[09:38:30] Epoch: 1 Batch: 4516/20099 (22.47%) Loss: 2.179939 LR: 0.00002990 +[09:38:31] Epoch: 1 Batch: 4517/20099 (22.47%) Loss: 2.038934 LR: 0.00002990 +[09:38:33] Epoch: 1 Batch: 4518/20099 (22.48%) Loss: 1.920319 LR: 0.00002990 +[09:38:35] Epoch: 1 Batch: 4519/20099 (22.48%) Loss: 2.322344 LR: 0.00002990 +[09:38:37] Epoch: 1 Batch: 4520/20099 (22.49%) Loss: 2.012674 LR: 0.00002990 +[09:38:39] Epoch: 1 Batch: 4521/20099 (22.49%) Loss: 2.306949 LR: 0.00002990 +[09:38:40] Epoch: 1 Batch: 4522/20099 (22.50%) Loss: 2.251603 LR: 0.00002990 
+[09:38:42] Epoch: 1 Batch: 4523/20099 (22.50%) Loss: 2.471321 LR: 0.00002990 +[09:38:44] Epoch: 1 Batch: 4524/20099 (22.51%) Loss: 2.025124 LR: 0.00002990 +[09:38:46] Epoch: 1 Batch: 4525/20099 (22.51%) Loss: 2.160552 LR: 0.00002990 +[09:38:48] Epoch: 1 Batch: 4526/20099 (22.52%) Loss: 2.231827 LR: 0.00002990 +[09:38:49] Epoch: 1 Batch: 4527/20099 (22.52%) Loss: 2.199973 LR: 0.00002990 +[09:38:51] Epoch: 1 Batch: 4528/20099 (22.53%) Loss: 2.129307 LR: 0.00002990 +[09:38:53] Epoch: 1 Batch: 4529/20099 (22.53%) Loss: 2.103114 LR: 0.00002990 +[09:38:55] Epoch: 1 Batch: 4530/20099 (22.54%) Loss: 2.203794 LR: 0.00002990 +[09:38:56] Epoch: 1 Batch: 4531/20099 (22.54%) Loss: 2.448875 LR: 0.00002990 +[09:38:58] Epoch: 1 Batch: 4532/20099 (22.55%) Loss: 2.394624 LR: 0.00002990 +[09:39:00] Epoch: 1 Batch: 4533/20099 (22.55%) Loss: 2.171654 LR: 0.00002990 +[09:39:02] Epoch: 1 Batch: 4534/20099 (22.56%) Loss: 2.588477 LR: 0.00002990 +[09:39:04] Epoch: 1 Batch: 4535/20099 (22.56%) Loss: 2.148917 LR: 0.00002990 +[09:39:05] Epoch: 1 Batch: 4536/20099 (22.57%) Loss: 2.274196 LR: 0.00002990 +[09:39:07] Epoch: 1 Batch: 4537/20099 (22.57%) Loss: 1.942928 LR: 0.00002990 +[09:39:09] Epoch: 1 Batch: 4538/20099 (22.58%) Loss: 2.302236 LR: 0.00002990 +[09:39:11] Epoch: 1 Batch: 4539/20099 (22.58%) Loss: 2.179597 LR: 0.00002989 +[09:39:12] Epoch: 1 Batch: 4540/20099 (22.59%) Loss: 2.244758 LR: 0.00002989 +[09:39:14] Epoch: 1 Batch: 4541/20099 (22.59%) Loss: 2.412387 LR: 0.00002989 +[09:39:16] Epoch: 1 Batch: 4542/20099 (22.60%) Loss: 2.156938 LR: 0.00002989 +[09:39:18] Epoch: 1 Batch: 4543/20099 (22.60%) Loss: 2.259222 LR: 0.00002989 +[09:39:19] Epoch: 1 Batch: 4544/20099 (22.61%) Loss: 1.942501 LR: 0.00002989 +[09:39:21] Epoch: 1 Batch: 4545/20099 (22.61%) Loss: 2.215254 LR: 0.00002989 +[09:39:23] Epoch: 1 Batch: 4546/20099 (22.62%) Loss: 2.195856 LR: 0.00002989 +[09:39:25] Epoch: 1 Batch: 4547/20099 (22.62%) Loss: 2.046872 LR: 0.00002989 +[09:39:26] Epoch: 1 Batch: 4548/20099 (22.63%) 
Loss: 2.068745 LR: 0.00002989 +[09:39:28] Epoch: 1 Batch: 4549/20099 (22.63%) Loss: 2.048756 LR: 0.00002989 +[09:39:30] Epoch: 1 Batch: 4550/20099 (22.64%) Loss: 2.158098 LR: 0.00002989 +[09:39:32] Epoch: 1 Batch: 4551/20099 (22.64%) Loss: 2.373278 LR: 0.00002989 +[09:39:34] Epoch: 1 Batch: 4552/20099 (22.65%) Loss: 2.208424 LR: 0.00002989 +[09:39:35] Epoch: 1 Batch: 4553/20099 (22.65%) Loss: 1.723308 LR: 0.00002989 +[09:39:37] Epoch: 1 Batch: 4554/20099 (22.66%) Loss: 2.233638 LR: 0.00002989 +[09:39:39] Epoch: 1 Batch: 4555/20099 (22.66%) Loss: 2.200091 LR: 0.00002989 +[09:39:41] Epoch: 1 Batch: 4556/20099 (22.67%) Loss: 2.019238 LR: 0.00002989 +[09:39:42] Epoch: 1 Batch: 4557/20099 (22.67%) Loss: 2.027067 LR: 0.00002989 +[09:39:44] Epoch: 1 Batch: 4558/20099 (22.68%) Loss: 2.246422 LR: 0.00002989 +[09:39:46] Epoch: 1 Batch: 4559/20099 (22.68%) Loss: 2.019165 LR: 0.00002989 +[09:39:48] Epoch: 1 Batch: 4560/20099 (22.69%) Loss: 2.247591 LR: 0.00002989 +[09:39:49] Epoch: 1 Batch: 4561/20099 (22.69%) Loss: 2.628521 LR: 0.00002989 +[09:39:51] Epoch: 1 Batch: 4562/20099 (22.70%) Loss: 2.084549 LR: 0.00002989 +[09:39:53] Epoch: 1 Batch: 4563/20099 (22.70%) Loss: 2.271124 LR: 0.00002989 +[09:39:55] Epoch: 1 Batch: 4564/20099 (22.71%) Loss: 2.149997 LR: 0.00002989 +[09:39:57] Epoch: 1 Batch: 4565/20099 (22.71%) Loss: 2.408524 LR: 0.00002989 +[09:39:58] Epoch: 1 Batch: 4566/20099 (22.72%) Loss: 2.233233 LR: 0.00002989 +[09:40:00] Epoch: 1 Batch: 4567/20099 (22.72%) Loss: 2.063513 LR: 0.00002989 +[09:40:02] Epoch: 1 Batch: 4568/20099 (22.73%) Loss: 2.072395 LR: 0.00002989 +[09:40:04] Epoch: 1 Batch: 4569/20099 (22.73%) Loss: 2.322791 LR: 0.00002989 +[09:40:05] Epoch: 1 Batch: 4570/20099 (22.74%) Loss: 2.111361 LR: 0.00002989 +[09:40:07] Epoch: 1 Batch: 4571/20099 (22.74%) Loss: 2.174940 LR: 0.00002989 +[09:40:09] Epoch: 1 Batch: 4572/20099 (22.75%) Loss: 2.269704 LR: 0.00002989 +[09:40:11] Epoch: 1 Batch: 4573/20099 (22.75%) Loss: 2.210771 LR: 0.00002989 +[09:40:12] Epoch: 
1 Batch: 4574/20099 (22.76%) Loss: 2.143299 LR: 0.00002988 +[09:40:14] Epoch: 1 Batch: 4575/20099 (22.76%) Loss: 2.013377 LR: 0.00002988 +[09:40:16] Epoch: 1 Batch: 4576/20099 (22.77%) Loss: 2.094445 LR: 0.00002988 +[09:40:18] Epoch: 1 Batch: 4577/20099 (22.77%) Loss: 2.169196 LR: 0.00002988 +[09:40:20] Epoch: 1 Batch: 4578/20099 (22.78%) Loss: 1.770027 LR: 0.00002988 +[09:40:21] Epoch: 1 Batch: 4579/20099 (22.78%) Loss: 2.170108 LR: 0.00002988 +[09:40:23] Epoch: 1 Batch: 4580/20099 (22.79%) Loss: 2.074645 LR: 0.00002988 +[09:40:25] Epoch: 1 Batch: 4581/20099 (22.79%) Loss: 2.170211 LR: 0.00002988 +[09:40:27] Epoch: 1 Batch: 4582/20099 (22.80%) Loss: 2.067807 LR: 0.00002988 +[09:40:28] Epoch: 1 Batch: 4583/20099 (22.80%) Loss: 2.415698 LR: 0.00002988 +[09:40:30] Epoch: 1 Batch: 4584/20099 (22.81%) Loss: 2.208312 LR: 0.00002988 +[09:40:32] Epoch: 1 Batch: 4585/20099 (22.81%) Loss: 2.042493 LR: 0.00002988 +[09:40:34] Epoch: 1 Batch: 4586/20099 (22.82%) Loss: 2.180415 LR: 0.00002988 +[09:40:36] Epoch: 1 Batch: 4587/20099 (22.82%) Loss: 1.968171 LR: 0.00002988 +[09:40:37] Epoch: 1 Batch: 4588/20099 (22.83%) Loss: 1.975385 LR: 0.00002988 +[09:40:39] Epoch: 1 Batch: 4589/20099 (22.83%) Loss: 2.021766 LR: 0.00002988 +[09:40:41] Epoch: 1 Batch: 4590/20099 (22.84%) Loss: 2.146341 LR: 0.00002988 +[09:40:43] Epoch: 1 Batch: 4591/20099 (22.84%) Loss: 2.325340 LR: 0.00002988 +[09:40:44] Epoch: 1 Batch: 4592/20099 (22.85%) Loss: 2.230047 LR: 0.00002988 +[09:40:46] Epoch: 1 Batch: 4593/20099 (22.85%) Loss: 2.374275 LR: 0.00002988 +[09:40:48] Epoch: 1 Batch: 4594/20099 (22.86%) Loss: 1.915366 LR: 0.00002988 +[09:40:50] Epoch: 1 Batch: 4595/20099 (22.86%) Loss: 2.180977 LR: 0.00002988 +[09:40:52] Epoch: 1 Batch: 4596/20099 (22.87%) Loss: 2.130810 LR: 0.00002988 +[09:40:53] Epoch: 1 Batch: 4597/20099 (22.87%) Loss: 2.190001 LR: 0.00002988 +[09:40:55] Epoch: 1 Batch: 4598/20099 (22.88%) Loss: 2.060367 LR: 0.00002988 +[09:40:57] Epoch: 1 Batch: 4599/20099 (22.88%) Loss: 2.095263 LR: 
0.00002988 +[09:41:02] >> Cleaned up old temp checkpoint: epoch1_step2600 +[09:41:02] >> Temp checkpoint saved: epoch1_step4600, size: 0.1693 GB +[09:41:02] Epoch: 1 Batch: 4600/20099 (22.89%) Loss: 2.269129 LR: 0.00002988 +[09:41:04] Epoch: 1 Batch: 4601/20099 (22.89%) Loss: 2.309718 LR: 0.00002988 +[09:41:06] Epoch: 1 Batch: 4602/20099 (22.90%) Loss: 2.260203 LR: 0.00002987 +[09:41:08] Epoch: 1 Batch: 4603/20099 (22.90%) Loss: 2.116080 LR: 0.00002987 +[09:41:09] Epoch: 1 Batch: 4604/20099 (22.91%) Loss: 1.988276 LR: 0.00002987 +[09:41:11] Epoch: 1 Batch: 4605/20099 (22.91%) Loss: 2.198561 LR: 0.00002987 +[09:41:13] Epoch: 1 Batch: 4606/20099 (22.92%) Loss: 2.081283 LR: 0.00002987 +[09:41:15] Epoch: 1 Batch: 4607/20099 (22.92%) Loss: 2.234585 LR: 0.00002987 +[09:41:16] Epoch: 1 Batch: 4608/20099 (22.93%) Loss: 1.730787 LR: 0.00002987 +[09:41:18] Epoch: 1 Batch: 4609/20099 (22.93%) Loss: 2.339286 LR: 0.00002987 +[09:41:20] Epoch: 1 Batch: 4610/20099 (22.94%) Loss: 2.162195 LR: 0.00002987 +[09:41:22] Epoch: 1 Batch: 4611/20099 (22.94%) Loss: 2.292299 LR: 0.00002987 +[09:41:24] Epoch: 1 Batch: 4612/20099 (22.95%) Loss: 2.217103 LR: 0.00002987 +[09:41:25] Epoch: 1 Batch: 4613/20099 (22.95%) Loss: 2.329199 LR: 0.00002987 +[09:41:27] Epoch: 1 Batch: 4614/20099 (22.96%) Loss: 2.286993 LR: 0.00002987 +[09:41:29] Epoch: 1 Batch: 4615/20099 (22.96%) Loss: 2.024486 LR: 0.00002987 +[09:41:31] Epoch: 1 Batch: 4616/20099 (22.97%) Loss: 2.329528 LR: 0.00002987 +[09:41:33] Epoch: 1 Batch: 4617/20099 (22.97%) Loss: 2.410837 LR: 0.00002987 +[09:41:34] Epoch: 1 Batch: 4618/20099 (22.98%) Loss: 1.902958 LR: 0.00002987 +[09:41:36] Epoch: 1 Batch: 4619/20099 (22.98%) Loss: 2.231718 LR: 0.00002987 +[09:41:38] Epoch: 1 Batch: 4620/20099 (22.99%) Loss: 2.043199 LR: 0.00002987 +[09:41:40] Epoch: 1 Batch: 4621/20099 (22.99%) Loss: 2.352170 LR: 0.00002987 +[09:41:42] Epoch: 1 Batch: 4622/20099 (23.00%) Loss: 2.009378 LR: 0.00002987 +[09:41:43] Epoch: 1 Batch: 4623/20099 (23.00%) Loss: 
2.010901 LR: 0.00002987 +[09:41:45] Epoch: 1 Batch: 4624/20099 (23.01%) Loss: 2.450607 LR: 0.00002987 +[09:41:47] Epoch: 1 Batch: 4625/20099 (23.01%) Loss: 2.079332 LR: 0.00002987 +[09:41:49] Epoch: 1 Batch: 4626/20099 (23.02%) Loss: 2.083146 LR: 0.00002987 +[09:41:51] Epoch: 1 Batch: 4627/20099 (23.02%) Loss: 2.213307 LR: 0.00002987 +[09:41:52] Epoch: 1 Batch: 4628/20099 (23.03%) Loss: 1.885389 LR: 0.00002987 +[09:41:54] Epoch: 1 Batch: 4629/20099 (23.03%) Loss: 2.558436 LR: 0.00002987 +[09:41:56] Epoch: 1 Batch: 4630/20099 (23.04%) Loss: 2.228308 LR: 0.00002986 +[09:41:58] Epoch: 1 Batch: 4631/20099 (23.04%) Loss: 2.147811 LR: 0.00002986 +[09:41:59] Epoch: 1 Batch: 4632/20099 (23.05%) Loss: 2.159683 LR: 0.00002986 +[09:42:01] Epoch: 1 Batch: 4633/20099 (23.05%) Loss: 1.921819 LR: 0.00002986 +[09:42:03] Epoch: 1 Batch: 4634/20099 (23.06%) Loss: 2.113861 LR: 0.00002986 +[09:42:05] Epoch: 1 Batch: 4635/20099 (23.06%) Loss: 2.055442 LR: 0.00002986 +[09:42:06] Epoch: 1 Batch: 4636/20099 (23.07%) Loss: 2.114245 LR: 0.00002986 +[09:42:08] Epoch: 1 Batch: 4637/20099 (23.07%) Loss: 2.250477 LR: 0.00002986 +[09:42:10] Epoch: 1 Batch: 4638/20099 (23.08%) Loss: 2.310852 LR: 0.00002986 +[09:42:12] Epoch: 1 Batch: 4639/20099 (23.08%) Loss: 2.505158 LR: 0.00002986 +[09:42:14] Epoch: 1 Batch: 4640/20099 (23.09%) Loss: 1.916171 LR: 0.00002986 +[09:42:15] Epoch: 1 Batch: 4641/20099 (23.09%) Loss: 2.351977 LR: 0.00002986 +[09:42:17] Epoch: 1 Batch: 4642/20099 (23.10%) Loss: 2.255134 LR: 0.00002986 +[09:42:19] Epoch: 1 Batch: 4643/20099 (23.10%) Loss: 2.263360 LR: 0.00002986 +[09:42:21] Epoch: 1 Batch: 4644/20099 (23.11%) Loss: 2.145698 LR: 0.00002986 +[09:42:22] Epoch: 1 Batch: 4645/20099 (23.11%) Loss: 2.179838 LR: 0.00002986 +[09:42:24] Epoch: 1 Batch: 4646/20099 (23.12%) Loss: 1.850737 LR: 0.00002986 +[09:42:26] Epoch: 1 Batch: 4647/20099 (23.12%) Loss: 1.976613 LR: 0.00002986 +[09:42:28] Epoch: 1 Batch: 4648/20099 (23.13%) Loss: 2.214621 LR: 0.00002986 +[09:42:29] Epoch: 1 
Batch: 4649/20099 (23.13%) Loss: 2.298293 LR: 0.00002986 +[09:42:31] Epoch: 1 Batch: 4650/20099 (23.14%) Loss: 2.193688 LR: 0.00002986 +[09:42:33] Epoch: 1 Batch: 4651/20099 (23.14%) Loss: 2.437580 LR: 0.00002986 +[09:42:35] Epoch: 1 Batch: 4652/20099 (23.15%) Loss: 1.912781 LR: 0.00002986 +[09:42:37] Epoch: 1 Batch: 4653/20099 (23.15%) Loss: 2.243409 LR: 0.00002986 +[09:42:39] Epoch: 1 Batch: 4654/20099 (23.16%) Loss: 2.156494 LR: 0.00002986 +[09:42:41] Epoch: 1 Batch: 4655/20099 (23.16%) Loss: 2.050530 LR: 0.00002986 +[09:42:42] Epoch: 1 Batch: 4656/20099 (23.17%) Loss: 2.467556 LR: 0.00002986 +[09:42:44] Epoch: 1 Batch: 4657/20099 (23.17%) Loss: 2.182628 LR: 0.00002986 +[09:42:46] Epoch: 1 Batch: 4658/20099 (23.18%) Loss: 2.271367 LR: 0.00002985 +[09:42:48] Epoch: 1 Batch: 4659/20099 (23.18%) Loss: 2.114659 LR: 0.00002985 +[09:42:49] Epoch: 1 Batch: 4660/20099 (23.19%) Loss: 2.167618 LR: 0.00002985 +[09:42:51] Epoch: 1 Batch: 4661/20099 (23.19%) Loss: 2.036290 LR: 0.00002985 +[09:42:53] Epoch: 1 Batch: 4662/20099 (23.20%) Loss: 2.112420 LR: 0.00002985 +[09:42:55] Epoch: 1 Batch: 4663/20099 (23.20%) Loss: 1.946655 LR: 0.00002985 +[09:42:56] Epoch: 1 Batch: 4664/20099 (23.21%) Loss: 2.213091 LR: 0.00002985 +[09:42:58] Epoch: 1 Batch: 4665/20099 (23.21%) Loss: 2.138493 LR: 0.00002985 +[09:43:00] Epoch: 1 Batch: 4666/20099 (23.22%) Loss: 2.187932 LR: 0.00002985 +[09:43:02] Epoch: 1 Batch: 4667/20099 (23.22%) Loss: 2.044093 LR: 0.00002985 +[09:43:04] Epoch: 1 Batch: 4668/20099 (23.23%) Loss: 2.185463 LR: 0.00002985 +[09:43:05] Epoch: 1 Batch: 4669/20099 (23.23%) Loss: 2.178218 LR: 0.00002985 +[09:43:07] Epoch: 1 Batch: 4670/20099 (23.23%) Loss: 2.079967 LR: 0.00002985 +[09:43:09] Epoch: 1 Batch: 4671/20099 (23.24%) Loss: 2.226505 LR: 0.00002985 +[09:43:11] Epoch: 1 Batch: 4672/20099 (23.24%) Loss: 2.244752 LR: 0.00002985 +[09:43:12] Epoch: 1 Batch: 4673/20099 (23.25%) Loss: 2.147533 LR: 0.00002985 +[09:43:14] Epoch: 1 Batch: 4674/20099 (23.25%) Loss: 2.224766 LR: 
0.00002985 +[09:43:16] Epoch: 1 Batch: 4675/20099 (23.26%) Loss: 2.086420 LR: 0.00002985 +[09:43:18] Epoch: 1 Batch: 4676/20099 (23.26%) Loss: 2.109352 LR: 0.00002985 +[09:43:20] Epoch: 1 Batch: 4677/20099 (23.27%) Loss: 2.338684 LR: 0.00002985 +[09:43:21] Epoch: 1 Batch: 4678/20099 (23.27%) Loss: 2.494125 LR: 0.00002985 +[09:43:23] Epoch: 1 Batch: 4679/20099 (23.28%) Loss: 2.199632 LR: 0.00002985 +[09:43:25] Epoch: 1 Batch: 4680/20099 (23.28%) Loss: 2.023668 LR: 0.00002985 +[09:43:27] Epoch: 1 Batch: 4681/20099 (23.29%) Loss: 2.287734 LR: 0.00002985 +[09:43:28] Epoch: 1 Batch: 4682/20099 (23.29%) Loss: 2.280823 LR: 0.00002985 +[09:43:30] Epoch: 1 Batch: 4683/20099 (23.30%) Loss: 2.178694 LR: 0.00002985 +[09:43:32] Epoch: 1 Batch: 4684/20099 (23.30%) Loss: 2.008847 LR: 0.00002985 +[09:43:34] Epoch: 1 Batch: 4685/20099 (23.31%) Loss: 2.332745 LR: 0.00002985 +[09:43:36] Epoch: 1 Batch: 4686/20099 (23.31%) Loss: 2.348091 LR: 0.00002984 +[09:43:37] Epoch: 1 Batch: 4687/20099 (23.32%) Loss: 2.242439 LR: 0.00002984 +[09:43:39] Epoch: 1 Batch: 4688/20099 (23.32%) Loss: 2.014596 LR: 0.00002984 +[09:43:41] Epoch: 1 Batch: 4689/20099 (23.33%) Loss: 2.448067 LR: 0.00002984 +[09:43:43] Epoch: 1 Batch: 4690/20099 (23.33%) Loss: 2.475633 LR: 0.00002984 +[09:43:44] Epoch: 1 Batch: 4691/20099 (23.34%) Loss: 1.969159 LR: 0.00002984 +[09:43:46] Epoch: 1 Batch: 4692/20099 (23.34%) Loss: 2.314331 LR: 0.00002984 +[09:43:48] Epoch: 1 Batch: 4693/20099 (23.35%) Loss: 2.375246 LR: 0.00002984 +[09:43:50] Epoch: 1 Batch: 4694/20099 (23.35%) Loss: 2.186830 LR: 0.00002984 +[09:43:52] Epoch: 1 Batch: 4695/20099 (23.36%) Loss: 2.099122 LR: 0.00002984 +[09:43:53] Epoch: 1 Batch: 4696/20099 (23.36%) Loss: 1.980689 LR: 0.00002984 +[09:43:55] Epoch: 1 Batch: 4697/20099 (23.37%) Loss: 2.153680 LR: 0.00002984 +[09:43:57] Epoch: 1 Batch: 4698/20099 (23.37%) Loss: 2.370480 LR: 0.00002984 +[09:43:59] Epoch: 1 Batch: 4699/20099 (23.38%) Loss: 1.981578 LR: 0.00002984 +[09:44:00] Epoch: 1 Batch: 4700/20099 
(23.38%) Loss: 2.353047 LR: 0.00002984 +[09:44:02] Epoch: 1 Batch: 4701/20099 (23.39%) Loss: 2.345898 LR: 0.00002984 +[09:44:04] Epoch: 1 Batch: 4702/20099 (23.39%) Loss: 2.365562 LR: 0.00002984 +[09:44:06] Epoch: 1 Batch: 4703/20099 (23.40%) Loss: 2.004191 LR: 0.00002984 +[09:44:08] Epoch: 1 Batch: 4704/20099 (23.40%) Loss: 2.045257 LR: 0.00002984 +[09:44:09] Epoch: 1 Batch: 4705/20099 (23.41%) Loss: 2.118621 LR: 0.00002984 +[09:44:11] Epoch: 1 Batch: 4706/20099 (23.41%) Loss: 2.024510 LR: 0.00002984 +[09:44:13] Epoch: 1 Batch: 4707/20099 (23.42%) Loss: 2.193663 LR: 0.00002984 +[09:44:15] Epoch: 1 Batch: 4708/20099 (23.42%) Loss: 2.161496 LR: 0.00002984 +[09:44:16] Epoch: 1 Batch: 4709/20099 (23.43%) Loss: 2.476473 LR: 0.00002984 +[09:44:18] Epoch: 1 Batch: 4710/20099 (23.43%) Loss: 2.065486 LR: 0.00002984 +[09:44:20] Epoch: 1 Batch: 4711/20099 (23.44%) Loss: 2.375472 LR: 0.00002984 +[09:44:22] Epoch: 1 Batch: 4712/20099 (23.44%) Loss: 2.074227 LR: 0.00002984 +[09:44:24] Epoch: 1 Batch: 4713/20099 (23.45%) Loss: 2.139662 LR: 0.00002984 +[09:44:25] Epoch: 1 Batch: 4714/20099 (23.45%) Loss: 2.442742 LR: 0.00002983 +[09:44:27] Epoch: 1 Batch: 4715/20099 (23.46%) Loss: 2.216808 LR: 0.00002983 +[09:44:29] Epoch: 1 Batch: 4716/20099 (23.46%) Loss: 2.207420 LR: 0.00002983 +[09:44:31] Epoch: 1 Batch: 4717/20099 (23.47%) Loss: 2.341451 LR: 0.00002983 +[09:44:32] Epoch: 1 Batch: 4718/20099 (23.47%) Loss: 2.085963 LR: 0.00002983 +[09:44:34] Epoch: 1 Batch: 4719/20099 (23.48%) Loss: 2.002911 LR: 0.00002983 +[09:44:36] Epoch: 1 Batch: 4720/20099 (23.48%) Loss: 2.052259 LR: 0.00002983 +[09:44:38] Epoch: 1 Batch: 4721/20099 (23.49%) Loss: 2.232872 LR: 0.00002983 +[09:44:40] Epoch: 1 Batch: 4722/20099 (23.49%) Loss: 2.272426 LR: 0.00002983 +[09:44:41] Epoch: 1 Batch: 4723/20099 (23.50%) Loss: 2.392010 LR: 0.00002983 +[09:44:43] Epoch: 1 Batch: 4724/20099 (23.50%) Loss: 2.312881 LR: 0.00002983 +[09:44:45] Epoch: 1 Batch: 4725/20099 (23.51%) Loss: 2.126988 LR: 0.00002983 
+[09:44:47] Epoch: 1 Batch: 4726/20099 (23.51%) Loss: 2.308162 LR: 0.00002983 +[09:44:49] Epoch: 1 Batch: 4727/20099 (23.52%) Loss: 2.126079 LR: 0.00002983 +[09:44:50] Epoch: 1 Batch: 4728/20099 (23.52%) Loss: 2.483215 LR: 0.00002983 +[09:44:52] Epoch: 1 Batch: 4729/20099 (23.53%) Loss: 2.604293 LR: 0.00002983 +[09:44:54] Epoch: 1 Batch: 4730/20099 (23.53%) Loss: 2.077269 LR: 0.00002983 +[09:44:56] Epoch: 1 Batch: 4731/20099 (23.54%) Loss: 2.156270 LR: 0.00002983 +[09:44:57] Epoch: 1 Batch: 4732/20099 (23.54%) Loss: 2.085979 LR: 0.00002983 +[09:44:59] Epoch: 1 Batch: 4733/20099 (23.55%) Loss: 2.005166 LR: 0.00002983 +[09:45:01] Epoch: 1 Batch: 4734/20099 (23.55%) Loss: 2.049402 LR: 0.00002983 +[09:45:03] Epoch: 1 Batch: 4735/20099 (23.56%) Loss: 2.309577 LR: 0.00002983 +[09:45:05] Epoch: 1 Batch: 4736/20099 (23.56%) Loss: 2.121827 LR: 0.00002983 +[09:45:06] Epoch: 1 Batch: 4737/20099 (23.57%) Loss: 2.042012 LR: 0.00002983 +[09:45:08] Epoch: 1 Batch: 4738/20099 (23.57%) Loss: 2.298476 LR: 0.00002983 +[09:45:10] Epoch: 1 Batch: 4739/20099 (23.58%) Loss: 2.254707 LR: 0.00002983 +[09:45:12] Epoch: 1 Batch: 4740/20099 (23.58%) Loss: 2.127829 LR: 0.00002983 +[09:45:13] Epoch: 1 Batch: 4741/20099 (23.59%) Loss: 1.957008 LR: 0.00002983 +[09:45:15] Epoch: 1 Batch: 4742/20099 (23.59%) Loss: 2.167894 LR: 0.00002982 +[09:45:17] Epoch: 1 Batch: 4743/20099 (23.60%) Loss: 2.224962 LR: 0.00002982 +[09:45:19] Epoch: 1 Batch: 4744/20099 (23.60%) Loss: 2.002913 LR: 0.00002982 +[09:45:20] Epoch: 1 Batch: 4745/20099 (23.61%) Loss: 1.833344 LR: 0.00002982 +[09:45:22] Epoch: 1 Batch: 4746/20099 (23.61%) Loss: 2.104826 LR: 0.00002982 +[09:45:24] Epoch: 1 Batch: 4747/20099 (23.62%) Loss: 2.210449 LR: 0.00002982 +[09:45:26] Epoch: 1 Batch: 4748/20099 (23.62%) Loss: 2.180137 LR: 0.00002982 +[09:45:28] Epoch: 1 Batch: 4749/20099 (23.63%) Loss: 2.117358 LR: 0.00002982 +[09:45:29] Epoch: 1 Batch: 4750/20099 (23.63%) Loss: 2.180726 LR: 0.00002982 +[09:45:31] Epoch: 1 Batch: 4751/20099 (23.64%) 
Loss: 2.083993 LR: 0.00002982 +[09:45:33] Epoch: 1 Batch: 4752/20099 (23.64%) Loss: 2.193973 LR: 0.00002982 +[09:45:35] Epoch: 1 Batch: 4753/20099 (23.65%) Loss: 2.095832 LR: 0.00002982 +[09:45:36] Epoch: 1 Batch: 4754/20099 (23.65%) Loss: 2.288699 LR: 0.00002982 +[09:45:38] Epoch: 1 Batch: 4755/20099 (23.66%) Loss: 2.304881 LR: 0.00002982 +[09:45:40] Epoch: 1 Batch: 4756/20099 (23.66%) Loss: 2.284542 LR: 0.00002982 +[09:45:42] Epoch: 1 Batch: 4757/20099 (23.67%) Loss: 1.846016 LR: 0.00002982 +[09:45:44] Epoch: 1 Batch: 4758/20099 (23.67%) Loss: 2.118920 LR: 0.00002982 +[09:45:45] Epoch: 1 Batch: 4759/20099 (23.68%) Loss: 2.262972 LR: 0.00002982 +[09:45:47] Epoch: 1 Batch: 4760/20099 (23.68%) Loss: 1.974369 LR: 0.00002982 +[09:45:49] Epoch: 1 Batch: 4761/20099 (23.69%) Loss: 2.066820 LR: 0.00002982 +[09:45:51] Epoch: 1 Batch: 4762/20099 (23.69%) Loss: 2.296504 LR: 0.00002982 +[09:45:52] Epoch: 1 Batch: 4763/20099 (23.70%) Loss: 2.245446 LR: 0.00002981 +[09:45:54] Epoch: 1 Batch: 4764/20099 (23.70%) Loss: 2.107984 LR: 0.00002981 +[09:45:56] Epoch: 1 Batch: 4765/20099 (23.71%) Loss: 2.210339 LR: 0.00002981 +[09:45:58] Epoch: 1 Batch: 4766/20099 (23.71%) Loss: 2.317290 LR: 0.00002981 +[09:46:00] Epoch: 1 Batch: 4767/20099 (23.72%) Loss: 1.968884 LR: 0.00002981 +[09:46:01] Epoch: 1 Batch: 4768/20099 (23.72%) Loss: 2.279551 LR: 0.00002981 +[09:46:03] Epoch: 1 Batch: 4769/20099 (23.73%) Loss: 2.090513 LR: 0.00002981 +[09:46:05] Epoch: 1 Batch: 4770/20099 (23.73%) Loss: 1.964401 LR: 0.00002981 +[09:46:07] Epoch: 1 Batch: 4771/20099 (23.74%) Loss: 2.168427 LR: 0.00002981 +[09:46:08] Epoch: 1 Batch: 4772/20099 (23.74%) Loss: 2.142965 LR: 0.00002981 +[09:46:10] Epoch: 1 Batch: 4773/20099 (23.75%) Loss: 2.212569 LR: 0.00002981 +[09:46:12] Epoch: 1 Batch: 4774/20099 (23.75%) Loss: 2.291496 LR: 0.00002981 +[09:46:14] Epoch: 1 Batch: 4775/20099 (23.76%) Loss: 2.010428 LR: 0.00002981 +[09:46:15] Epoch: 1 Batch: 4776/20099 (23.76%) Loss: 2.372914 LR: 0.00002981 +[09:46:17] Epoch: 
1 Batch: 4777/20099 (23.77%) Loss: 2.031020 LR: 0.00002981 +[09:46:19] Epoch: 1 Batch: 4778/20099 (23.77%) Loss: 2.287892 LR: 0.00002981 +[09:46:21] Epoch: 1 Batch: 4779/20099 (23.78%) Loss: 2.512156 LR: 0.00002981 +[09:46:23] Epoch: 1 Batch: 4780/20099 (23.78%) Loss: 2.507593 LR: 0.00002981 +[09:46:24] Epoch: 1 Batch: 4781/20099 (23.79%) Loss: 2.205787 LR: 0.00002981 +[09:46:26] Epoch: 1 Batch: 4782/20099 (23.79%) Loss: 2.325956 LR: 0.00002981 +[09:46:28] Epoch: 1 Batch: 4783/20099 (23.80%) Loss: 2.132880 LR: 0.00002981 +[09:46:30] Epoch: 1 Batch: 4784/20099 (23.80%) Loss: 1.950488 LR: 0.00002981 +[09:46:31] Epoch: 1 Batch: 4785/20099 (23.81%) Loss: 2.384472 LR: 0.00002981 +[09:46:33] Epoch: 1 Batch: 4786/20099 (23.81%) Loss: 2.183739 LR: 0.00002981 +[09:46:35] Epoch: 1 Batch: 4787/20099 (23.82%) Loss: 2.289413 LR: 0.00002981 +[09:46:37] Epoch: 1 Batch: 4788/20099 (23.82%) Loss: 2.112224 LR: 0.00002981 +[09:46:38] Epoch: 1 Batch: 4789/20099 (23.83%) Loss: 2.148224 LR: 0.00002981 +[09:46:40] Epoch: 1 Batch: 4790/20099 (23.83%) Loss: 1.919504 LR: 0.00002981 +[09:46:42] Epoch: 1 Batch: 4791/20099 (23.84%) Loss: 1.892247 LR: 0.00002980 +[09:46:44] Epoch: 1 Batch: 4792/20099 (23.84%) Loss: 2.480908 LR: 0.00002980 +[09:46:46] Epoch: 1 Batch: 4793/20099 (23.85%) Loss: 2.184773 LR: 0.00002980 +[09:46:47] Epoch: 1 Batch: 4794/20099 (23.85%) Loss: 2.260722 LR: 0.00002980 +[09:46:49] Epoch: 1 Batch: 4795/20099 (23.86%) Loss: 2.007028 LR: 0.00002980 +[09:46:51] Epoch: 1 Batch: 4796/20099 (23.86%) Loss: 2.119010 LR: 0.00002980 +[09:46:53] Epoch: 1 Batch: 4797/20099 (23.87%) Loss: 2.108781 LR: 0.00002980 +[09:46:54] Epoch: 1 Batch: 4798/20099 (23.87%) Loss: 2.499474 LR: 0.00002980 +[09:46:56] Epoch: 1 Batch: 4799/20099 (23.88%) Loss: 1.998094 LR: 0.00002980 +[09:47:01] >> Cleaned up old temp checkpoint: epoch1_step2800 +[09:47:02] >> Temp checkpoint saved: epoch1_step4800, size: 0.1693 GB +[09:47:02] Epoch: 1 Batch: 4800/20099 (23.88%) Loss: 2.234776 LR: 0.00002980 +[09:47:03] 
Epoch: 1 Batch: 4801/20099 (23.89%) Loss: 2.206125 LR: 0.00002980 +[09:47:05] Epoch: 1 Batch: 4802/20099 (23.89%) Loss: 2.121066 LR: 0.00002980 +[09:47:07] Epoch: 1 Batch: 4803/20099 (23.90%) Loss: 2.238003 LR: 0.00002980 +[09:47:09] Epoch: 1 Batch: 4804/20099 (23.90%) Loss: 2.296652 LR: 0.00002980 +[09:47:10] Epoch: 1 Batch: 4805/20099 (23.91%) Loss: 2.048527 LR: 0.00002980 +[09:47:12] Epoch: 1 Batch: 4806/20099 (23.91%) Loss: 2.165964 LR: 0.00002980 +[09:47:14] Epoch: 1 Batch: 4807/20099 (23.92%) Loss: 2.177756 LR: 0.00002980 +[09:47:16] Epoch: 1 Batch: 4808/20099 (23.92%) Loss: 2.034861 LR: 0.00002980 +[09:47:17] Epoch: 1 Batch: 4809/20099 (23.93%) Loss: 2.069503 LR: 0.00002980 +[09:47:19] Epoch: 1 Batch: 4810/20099 (23.93%) Loss: 2.477509 LR: 0.00002980 +[09:47:21] Epoch: 1 Batch: 4811/20099 (23.94%) Loss: 2.272393 LR: 0.00002980 +[09:47:23] Epoch: 1 Batch: 4812/20099 (23.94%) Loss: 2.246731 LR: 0.00002979 +[09:47:25] Epoch: 1 Batch: 4813/20099 (23.95%) Loss: 2.428300 LR: 0.00002979 +[09:47:26] Epoch: 1 Batch: 4814/20099 (23.95%) Loss: 2.057619 LR: 0.00002979 +[09:47:28] Epoch: 1 Batch: 4815/20099 (23.96%) Loss: 1.895146 LR: 0.00002979 +[09:47:30] Epoch: 1 Batch: 4816/20099 (23.96%) Loss: 2.123725 LR: 0.00002979 +[09:47:32] Epoch: 1 Batch: 4817/20099 (23.97%) Loss: 2.273620 LR: 0.00002979 +[09:47:34] Epoch: 1 Batch: 4818/20099 (23.97%) Loss: 1.933514 LR: 0.00002979 +[09:47:35] Epoch: 1 Batch: 4819/20099 (23.98%) Loss: 2.012297 LR: 0.00002979 +[09:47:37] Epoch: 1 Batch: 4820/20099 (23.98%) Loss: 2.113417 LR: 0.00002979 +[09:47:39] Epoch: 1 Batch: 4821/20099 (23.99%) Loss: 2.290759 LR: 0.00002979 +[09:47:41] Epoch: 1 Batch: 4822/20099 (23.99%) Loss: 2.162830 LR: 0.00002979 +[09:47:43] Epoch: 1 Batch: 4823/20099 (24.00%) Loss: 2.284759 LR: 0.00002979 +[09:47:44] Epoch: 1 Batch: 4824/20099 (24.00%) Loss: 2.338061 LR: 0.00002979 +[09:47:46] Epoch: 1 Batch: 4825/20099 (24.01%) Loss: 2.307732 LR: 0.00002979 +[09:47:48] Epoch: 1 Batch: 4826/20099 (24.01%) Loss: 
2.196670 LR: 0.00002979 +[09:47:50] Epoch: 1 Batch: 4827/20099 (24.02%) Loss: 1.956112 LR: 0.00002979 +[09:47:51] Epoch: 1 Batch: 4828/20099 (24.02%) Loss: 2.328331 LR: 0.00002979 +[09:47:53] Epoch: 1 Batch: 4829/20099 (24.03%) Loss: 2.213278 LR: 0.00002979 +[09:47:55] Epoch: 1 Batch: 4830/20099 (24.03%) Loss: 2.139114 LR: 0.00002979 +[09:47:57] Epoch: 1 Batch: 4831/20099 (24.04%) Loss: 2.410370 LR: 0.00002979 +[09:47:59] Epoch: 1 Batch: 4832/20099 (24.04%) Loss: 2.069990 LR: 0.00002979 +[09:48:00] Epoch: 1 Batch: 4833/20099 (24.05%) Loss: 1.910782 LR: 0.00002979 +[09:48:02] Epoch: 1 Batch: 4834/20099 (24.05%) Loss: 2.107382 LR: 0.00002979 +[09:48:04] Epoch: 1 Batch: 4835/20099 (24.06%) Loss: 1.790853 LR: 0.00002979 +[09:48:06] Epoch: 1 Batch: 4836/20099 (24.06%) Loss: 2.184691 LR: 0.00002979 +[09:48:07] Epoch: 1 Batch: 4837/20099 (24.07%) Loss: 2.124120 LR: 0.00002979 +[09:48:09] Epoch: 1 Batch: 4838/20099 (24.07%) Loss: 2.043215 LR: 0.00002979 +[09:48:11] Epoch: 1 Batch: 4839/20099 (24.08%) Loss: 2.001926 LR: 0.00002979 +[09:48:13] Epoch: 1 Batch: 4840/20099 (24.08%) Loss: 2.147848 LR: 0.00002978 +[09:48:14] Epoch: 1 Batch: 4841/20099 (24.09%) Loss: 2.076287 LR: 0.00002978 +[09:48:16] Epoch: 1 Batch: 4842/20099 (24.09%) Loss: 2.354243 LR: 0.00002978 +[09:48:18] Epoch: 1 Batch: 4843/20099 (24.10%) Loss: 1.913371 LR: 0.00002978 +[09:48:20] Epoch: 1 Batch: 4844/20099 (24.10%) Loss: 2.029375 LR: 0.00002978 +[09:48:21] Epoch: 1 Batch: 4845/20099 (24.11%) Loss: 2.222288 LR: 0.00002978 +[09:48:23] Epoch: 1 Batch: 4846/20099 (24.11%) Loss: 1.961665 LR: 0.00002978 +[09:48:25] Epoch: 1 Batch: 4847/20099 (24.12%) Loss: 1.968897 LR: 0.00002978 +[09:48:27] Epoch: 1 Batch: 4848/20099 (24.12%) Loss: 2.322304 LR: 0.00002978 +[09:48:29] Epoch: 1 Batch: 4849/20099 (24.13%) Loss: 2.185101 LR: 0.00002978 +[09:48:30] Epoch: 1 Batch: 4850/20099 (24.13%) Loss: 1.924018 LR: 0.00002978 +[09:48:32] Epoch: 1 Batch: 4851/20099 (24.14%) Loss: 2.011701 LR: 0.00002978 +[09:48:34] Epoch: 1 
Batch: 4852/20099 (24.14%) Loss: 2.338996 LR: 0.00002978 +[09:48:36] Epoch: 1 Batch: 4853/20099 (24.15%) Loss: 1.962860 LR: 0.00002978 +[09:48:37] Epoch: 1 Batch: 4854/20099 (24.15%) Loss: 2.232194 LR: 0.00002978 +[09:48:39] Epoch: 1 Batch: 4855/20099 (24.16%) Loss: 2.169894 LR: 0.00002978 +[09:48:41] Epoch: 1 Batch: 4856/20099 (24.16%) Loss: 2.543146 LR: 0.00002978 +[09:48:43] Epoch: 1 Batch: 4857/20099 (24.17%) Loss: 2.323641 LR: 0.00002978 +[09:48:44] Epoch: 1 Batch: 4858/20099 (24.17%) Loss: 2.171200 LR: 0.00002978 +[09:48:46] Epoch: 1 Batch: 4859/20099 (24.18%) Loss: 2.190246 LR: 0.00002978 +[09:48:48] Epoch: 1 Batch: 4860/20099 (24.18%) Loss: 2.058274 LR: 0.00002978 +[09:48:50] Epoch: 1 Batch: 4861/20099 (24.19%) Loss: 2.211266 LR: 0.00002977 +[09:48:52] Epoch: 1 Batch: 4862/20099 (24.19%) Loss: 2.154627 LR: 0.00002977 +[09:48:53] Epoch: 1 Batch: 4863/20099 (24.20%) Loss: 2.021559 LR: 0.00002977 +[09:48:55] Epoch: 1 Batch: 4864/20099 (24.20%) Loss: 2.290583 LR: 0.00002977 +[09:48:57] Epoch: 1 Batch: 4865/20099 (24.21%) Loss: 2.144314 LR: 0.00002977 +[09:48:59] Epoch: 1 Batch: 4866/20099 (24.21%) Loss: 2.115013 LR: 0.00002977 +[09:49:00] Epoch: 1 Batch: 4867/20099 (24.22%) Loss: 2.199650 LR: 0.00002977 +[09:49:02] Epoch: 1 Batch: 4868/20099 (24.22%) Loss: 2.124646 LR: 0.00002977 +[09:49:04] Epoch: 1 Batch: 4869/20099 (24.23%) Loss: 2.118107 LR: 0.00002977 +[09:49:06] Epoch: 1 Batch: 4870/20099 (24.23%) Loss: 2.336019 LR: 0.00002977 +[09:49:08] Epoch: 1 Batch: 4871/20099 (24.24%) Loss: 1.979587 LR: 0.00002977 +[09:49:09] Epoch: 1 Batch: 4872/20099 (24.24%) Loss: 2.203210 LR: 0.00002977 +[09:49:11] Epoch: 1 Batch: 4873/20099 (24.24%) Loss: 1.865507 LR: 0.00002977 +[09:49:13] Epoch: 1 Batch: 4874/20099 (24.25%) Loss: 2.165278 LR: 0.00002977 +[09:49:15] Epoch: 1 Batch: 4875/20099 (24.25%) Loss: 2.092434 LR: 0.00002977 +[09:49:16] Epoch: 1 Batch: 4876/20099 (24.26%) Loss: 2.198335 LR: 0.00002977 +[09:49:18] Epoch: 1 Batch: 4877/20099 (24.26%) Loss: 2.177116 LR: 
0.00002977 +[09:49:20] Epoch: 1 Batch: 4878/20099 (24.27%) Loss: 2.178542 LR: 0.00002977 +[09:49:22] Epoch: 1 Batch: 4879/20099 (24.27%) Loss: 2.228541 LR: 0.00002977 +[09:49:24] Epoch: 1 Batch: 4880/20099 (24.28%) Loss: 2.155006 LR: 0.00002977 +[09:49:25] Epoch: 1 Batch: 4881/20099 (24.28%) Loss: 1.878647 LR: 0.00002977 +[09:49:27] Epoch: 1 Batch: 4882/20099 (24.29%) Loss: 2.224158 LR: 0.00002976 +[09:49:29] Epoch: 1 Batch: 4883/20099 (24.29%) Loss: 2.217337 LR: 0.00002976 +[09:49:31] Epoch: 1 Batch: 4884/20099 (24.30%) Loss: 2.022690 LR: 0.00002976 +[09:49:32] Epoch: 1 Batch: 4885/20099 (24.30%) Loss: 2.302140 LR: 0.00002976 +[09:49:34] Epoch: 1 Batch: 4886/20099 (24.31%) Loss: 2.273205 LR: 0.00002976 +[09:49:36] Epoch: 1 Batch: 4887/20099 (24.31%) Loss: 2.121745 LR: 0.00002976 +[09:49:38] Epoch: 1 Batch: 4888/20099 (24.32%) Loss: 2.112606 LR: 0.00002976 +[09:49:40] Epoch: 1 Batch: 4889/20099 (24.32%) Loss: 2.243021 LR: 0.00002976 +[09:49:41] Epoch: 1 Batch: 4890/20099 (24.33%) Loss: 1.975548 LR: 0.00002976 +[09:49:43] Epoch: 1 Batch: 4891/20099 (24.33%) Loss: 2.205206 LR: 0.00002976 +[09:49:45] Epoch: 1 Batch: 4892/20099 (24.34%) Loss: 2.295355 LR: 0.00002976 +[09:49:47] Epoch: 1 Batch: 4893/20099 (24.34%) Loss: 2.207078 LR: 0.00002976 +[09:49:48] Epoch: 1 Batch: 4894/20099 (24.35%) Loss: 2.348968 LR: 0.00002976 +[09:49:50] Epoch: 1 Batch: 4895/20099 (24.35%) Loss: 2.314091 LR: 0.00002976 +[09:49:52] Epoch: 1 Batch: 4896/20099 (24.36%) Loss: 2.473916 LR: 0.00002976 +[09:49:54] Epoch: 1 Batch: 4897/20099 (24.36%) Loss: 2.078672 LR: 0.00002976 +[09:49:55] Epoch: 1 Batch: 4898/20099 (24.37%) Loss: 2.011466 LR: 0.00002976 +[09:49:57] Epoch: 1 Batch: 4899/20099 (24.37%) Loss: 2.016005 LR: 0.00002976 +[09:49:59] Epoch: 1 Batch: 4900/20099 (24.38%) Loss: 1.973158 LR: 0.00002976 +[09:50:01] Epoch: 1 Batch: 4901/20099 (24.38%) Loss: 2.144464 LR: 0.00002976 +[09:50:03] Epoch: 1 Batch: 4902/20099 (24.39%) Loss: 2.426229 LR: 0.00002976 +[09:50:04] Epoch: 1 Batch: 4903/20099 
(24.39%) Loss: 1.862036 LR: 0.00002975 +[09:50:06] Epoch: 1 Batch: 4904/20099 (24.40%) Loss: 1.879345 LR: 0.00002975 +[09:50:08] Epoch: 1 Batch: 4905/20099 (24.40%) Loss: 1.882080 LR: 0.00002975 +[09:50:10] Epoch: 1 Batch: 4906/20099 (24.41%) Loss: 2.442721 LR: 0.00002975 +[09:50:11] Epoch: 1 Batch: 4907/20099 (24.41%) Loss: 2.149083 LR: 0.00002975 +[09:50:13] Epoch: 1 Batch: 4908/20099 (24.42%) Loss: 1.910943 LR: 0.00002975 +[09:50:15] Epoch: 1 Batch: 4909/20099 (24.42%) Loss: 2.110845 LR: 0.00002975 +[09:50:17] Epoch: 1 Batch: 4910/20099 (24.43%) Loss: 2.311897 LR: 0.00002975 +[09:50:19] Epoch: 1 Batch: 4911/20099 (24.43%) Loss: 2.044370 LR: 0.00002975 +[09:50:20] Epoch: 1 Batch: 4912/20099 (24.44%) Loss: 2.345976 LR: 0.00002975 +[09:50:22] Epoch: 1 Batch: 4913/20099 (24.44%) Loss: 2.064775 LR: 0.00002975 +[09:50:24] Epoch: 1 Batch: 4914/20099 (24.45%) Loss: 2.294205 LR: 0.00002975 +[09:50:26] Epoch: 1 Batch: 4915/20099 (24.45%) Loss: 2.234475 LR: 0.00002975 +[09:50:27] Epoch: 1 Batch: 4916/20099 (24.46%) Loss: 2.220117 LR: 0.00002975 +[09:50:29] Epoch: 1 Batch: 4917/20099 (24.46%) Loss: 2.357362 LR: 0.00002975 +[09:50:31] Epoch: 1 Batch: 4918/20099 (24.47%) Loss: 2.092838 LR: 0.00002975 +[09:50:33] Epoch: 1 Batch: 4919/20099 (24.47%) Loss: 2.442264 LR: 0.00002975 +[09:50:34] Epoch: 1 Batch: 4920/20099 (24.48%) Loss: 2.331126 LR: 0.00002975 +[09:50:36] Epoch: 1 Batch: 4921/20099 (24.48%) Loss: 2.399190 LR: 0.00002975 +[09:50:38] Epoch: 1 Batch: 4922/20099 (24.49%) Loss: 1.937255 LR: 0.00002975 +[09:50:40] Epoch: 1 Batch: 4923/20099 (24.49%) Loss: 2.503200 LR: 0.00002975 +[09:50:41] Epoch: 1 Batch: 4924/20099 (24.50%) Loss: 1.809917 LR: 0.00002974 +[09:50:43] Epoch: 1 Batch: 4925/20099 (24.50%) Loss: 2.186947 LR: 0.00002974 +[09:50:45] Epoch: 1 Batch: 4926/20099 (24.51%) Loss: 1.942776 LR: 0.00002974 +[09:50:47] Epoch: 1 Batch: 4927/20099 (24.51%) Loss: 2.111140 LR: 0.00002974 +[09:50:49] Epoch: 1 Batch: 4928/20099 (24.52%) Loss: 2.427361 LR: 0.00002974 
+[09:50:50] Epoch: 1 Batch: 4929/20099 (24.52%) Loss: 2.201350 LR: 0.00002974 +[09:50:52] Epoch: 1 Batch: 4930/20099 (24.53%) Loss: 2.427022 LR: 0.00002974 +[09:50:54] Epoch: 1 Batch: 4931/20099 (24.53%) Loss: 2.087469 LR: 0.00002974 +[09:50:56] Epoch: 1 Batch: 4932/20099 (24.54%) Loss: 2.282948 LR: 0.00002974 +[09:50:57] Epoch: 1 Batch: 4933/20099 (24.54%) Loss: 1.880638 LR: 0.00002974 +[09:50:59] Epoch: 1 Batch: 4934/20099 (24.55%) Loss: 2.480005 LR: 0.00002974 +[09:51:01] Epoch: 1 Batch: 4935/20099 (24.55%) Loss: 2.043077 LR: 0.00002974 +[09:51:03] Epoch: 1 Batch: 4936/20099 (24.56%) Loss: 2.374222 LR: 0.00002974 +[09:51:04] Epoch: 1 Batch: 4937/20099 (24.56%) Loss: 2.139943 LR: 0.00002974 +[09:51:06] Epoch: 1 Batch: 4938/20099 (24.57%) Loss: 2.200003 LR: 0.00002974 +[09:51:08] Epoch: 1 Batch: 4939/20099 (24.57%) Loss: 2.139958 LR: 0.00002974 +[09:51:10] Epoch: 1 Batch: 4940/20099 (24.58%) Loss: 1.947821 LR: 0.00002974 +[09:51:12] Epoch: 1 Batch: 4941/20099 (24.58%) Loss: 2.384009 LR: 0.00002974 +[09:51:13] Epoch: 1 Batch: 4942/20099 (24.59%) Loss: 2.242365 LR: 0.00002974 +[09:51:15] Epoch: 1 Batch: 4943/20099 (24.59%) Loss: 2.346899 LR: 0.00002974 +[09:51:17] Epoch: 1 Batch: 4944/20099 (24.60%) Loss: 2.159861 LR: 0.00002974 +[09:51:19] Epoch: 1 Batch: 4945/20099 (24.60%) Loss: 2.574744 LR: 0.00002973 +[09:51:20] Epoch: 1 Batch: 4946/20099 (24.61%) Loss: 2.327347 LR: 0.00002973 +[09:51:22] Epoch: 1 Batch: 4947/20099 (24.61%) Loss: 2.064087 LR: 0.00002973 +[09:51:24] Epoch: 1 Batch: 4948/20099 (24.62%) Loss: 1.977720 LR: 0.00002973 +[09:51:26] Epoch: 1 Batch: 4949/20099 (24.62%) Loss: 2.397459 LR: 0.00002973 +[09:51:27] Epoch: 1 Batch: 4950/20099 (24.63%) Loss: 1.952505 LR: 0.00002973 +[09:51:29] Epoch: 1 Batch: 4951/20099 (24.63%) Loss: 2.244450 LR: 0.00002973 +[09:51:31] Epoch: 1 Batch: 4952/20099 (24.64%) Loss: 1.979291 LR: 0.00002973 +[09:51:33] Epoch: 1 Batch: 4953/20099 (24.64%) Loss: 2.129127 LR: 0.00002973 +[09:51:35] Epoch: 1 Batch: 4954/20099 (24.65%) 
Loss: 2.237111 LR: 0.00002973 +[09:51:36] Epoch: 1 Batch: 4955/20099 (24.65%) Loss: 2.307721 LR: 0.00002973 +[09:51:38] Epoch: 1 Batch: 4956/20099 (24.66%) Loss: 1.912464 LR: 0.00002973 +[09:51:40] Epoch: 1 Batch: 4957/20099 (24.66%) Loss: 2.204460 LR: 0.00002973 +[09:51:42] Epoch: 1 Batch: 4958/20099 (24.67%) Loss: 2.192246 LR: 0.00002973 +[09:51:43] Epoch: 1 Batch: 4959/20099 (24.67%) Loss: 2.090888 LR: 0.00002973 +[09:51:45] Epoch: 1 Batch: 4960/20099 (24.68%) Loss: 2.210060 LR: 0.00002973 +[09:51:47] Epoch: 1 Batch: 4961/20099 (24.68%) Loss: 2.251124 LR: 0.00002973 +[09:51:49] Epoch: 1 Batch: 4962/20099 (24.69%) Loss: 1.944263 LR: 0.00002973 +[09:51:50] Epoch: 1 Batch: 4963/20099 (24.69%) Loss: 1.952312 LR: 0.00002973 +[09:51:52] Epoch: 1 Batch: 4964/20099 (24.70%) Loss: 2.266460 LR: 0.00002973 +[09:51:54] Epoch: 1 Batch: 4965/20099 (24.70%) Loss: 2.101196 LR: 0.00002973 +[09:51:56] Epoch: 1 Batch: 4966/20099 (24.71%) Loss: 2.283311 LR: 0.00002972 +[09:51:58] Epoch: 1 Batch: 4967/20099 (24.71%) Loss: 2.093925 LR: 0.00002972 +[09:51:59] Epoch: 1 Batch: 4968/20099 (24.72%) Loss: 1.764931 LR: 0.00002972 +[09:52:01] Epoch: 1 Batch: 4969/20099 (24.72%) Loss: 2.086683 LR: 0.00002972 +[09:52:03] Epoch: 1 Batch: 4970/20099 (24.73%) Loss: 1.978712 LR: 0.00002972 +[09:52:05] Epoch: 1 Batch: 4971/20099 (24.73%) Loss: 2.248721 LR: 0.00002972 +[09:52:06] Epoch: 1 Batch: 4972/20099 (24.74%) Loss: 2.351784 LR: 0.00002972 +[09:52:08] Epoch: 1 Batch: 4973/20099 (24.74%) Loss: 2.048282 LR: 0.00002972 +[09:52:10] Epoch: 1 Batch: 4974/20099 (24.75%) Loss: 2.035323 LR: 0.00002972 +[09:52:12] Epoch: 1 Batch: 4975/20099 (24.75%) Loss: 1.936232 LR: 0.00002972 +[09:52:13] Epoch: 1 Batch: 4976/20099 (24.76%) Loss: 2.073332 LR: 0.00002972 +[09:52:15] Epoch: 1 Batch: 4977/20099 (24.76%) Loss: 2.321735 LR: 0.00002972 +[09:52:17] Epoch: 1 Batch: 4978/20099 (24.77%) Loss: 2.575166 LR: 0.00002972 +[09:52:19] Epoch: 1 Batch: 4979/20099 (24.77%) Loss: 2.337676 LR: 0.00002972 +[09:52:21] Epoch: 
1 Batch: 4980/20099 (24.78%) Loss: 1.990798 LR: 0.00002972 +[09:52:22] Epoch: 1 Batch: 4981/20099 (24.78%) Loss: 2.023738 LR: 0.00002972 +[09:52:24] Epoch: 1 Batch: 4982/20099 (24.79%) Loss: 2.327463 LR: 0.00002972 +[09:52:26] Epoch: 1 Batch: 4983/20099 (24.79%) Loss: 2.047676 LR: 0.00002972 +[09:52:28] Epoch: 1 Batch: 4984/20099 (24.80%) Loss: 2.054253 LR: 0.00002972 +[09:52:29] Epoch: 1 Batch: 4985/20099 (24.80%) Loss: 2.588379 LR: 0.00002972 +[09:52:31] Epoch: 1 Batch: 4986/20099 (24.81%) Loss: 2.171120 LR: 0.00002972 +[09:52:33] Epoch: 1 Batch: 4987/20099 (24.81%) Loss: 2.266971 LR: 0.00002971 +[09:52:35] Epoch: 1 Batch: 4988/20099 (24.82%) Loss: 2.649441 LR: 0.00002971 +[09:52:36] Epoch: 1 Batch: 4989/20099 (24.82%) Loss: 1.894354 LR: 0.00002971 +[09:52:38] Epoch: 1 Batch: 4990/20099 (24.83%) Loss: 2.225730 LR: 0.00002971 +[09:52:40] Epoch: 1 Batch: 4991/20099 (24.83%) Loss: 2.412319 LR: 0.00002971 +[09:52:42] Epoch: 1 Batch: 4992/20099 (24.84%) Loss: 2.241832 LR: 0.00002971 +[09:52:44] Epoch: 1 Batch: 4993/20099 (24.84%) Loss: 2.143329 LR: 0.00002971 +[09:52:45] Epoch: 1 Batch: 4994/20099 (24.85%) Loss: 1.718128 LR: 0.00002971 +[09:52:47] Epoch: 1 Batch: 4995/20099 (24.85%) Loss: 2.106905 LR: 0.00002971 +[09:52:49] Epoch: 1 Batch: 4996/20099 (24.86%) Loss: 2.067069 LR: 0.00002971 +[09:52:51] Epoch: 1 Batch: 4997/20099 (24.86%) Loss: 2.272020 LR: 0.00002971 +[09:52:52] Epoch: 1 Batch: 4998/20099 (24.87%) Loss: 2.179208 LR: 0.00002971 +[09:52:54] Epoch: 1 Batch: 4999/20099 (24.87%) Loss: 2.087494 LR: 0.00002971 +[09:52:56] >> Evaluating batch 0 +[09:52:57] >> Evaluating batch 1 +[09:52:58] >> Evaluating batch 2 +[09:52:59] >> Evaluating batch 3 +[09:53:00] >> Evaluating batch 4 +[09:53:01] >> Evaluating batch 5 +[09:53:02] >> Evaluating batch 6 +[09:53:03] >> Evaluating batch 7 +[09:53:04] >> Evaluating batch 8 +[09:53:05] >> Evaluating batch 9 +[09:53:06] >> Evaluating batch 10 +[09:53:07] >> Evaluating batch 11 +[09:53:08] >> Evaluating batch 12 +[09:53:09] 
>> Evaluating batch 13 +[09:53:10] >> Evaluating batch 14 +[09:53:11] >> Evaluating batch 15 +[09:53:12] >> Evaluating batch 16 +[09:53:12] Epoch: 1 Step: 5000/20099 Evaluation: +[09:53:12] Avg Loss Since Last Eval: 2.1737 Val Loss: 2.2270 Validation loss delta: -0.0230 Perplexity: 9.2722 LR: 0.00002971 +[09:53:16] >> Cleaned up old temp checkpoint: epoch1_step3000 +[09:53:16] >> Temp checkpoint saved: epoch1_step5000, size: 0.1693 GB +[09:53:20] >> Checkpoint saved: epoch1_step5000, size: 0.1693 GB +[09:53:20] Epoch: 1 Batch: 5000/20099 (24.88%) Loss: 2.851571 LR: 0.00002971 +[09:53:21] Epoch: 1 Batch: 5001/20099 (24.88%) Loss: 2.080209 LR: 0.00002971 +[09:53:23] Epoch: 1 Batch: 5002/20099 (24.89%) Loss: 2.366229 LR: 0.00002971 +[09:53:25] Epoch: 1 Batch: 5003/20099 (24.89%) Loss: 2.242831 LR: 0.00002971 +[09:53:27] Epoch: 1 Batch: 5004/20099 (24.90%) Loss: 2.142959 LR: 0.00002971 +[09:53:28] Epoch: 1 Batch: 5005/20099 (24.90%) Loss: 2.267247 LR: 0.00002971 +[09:53:30] Epoch: 1 Batch: 5006/20099 (24.91%) Loss: 2.479731 LR: 0.00002971 +[09:53:32] Epoch: 1 Batch: 5007/20099 (24.91%) Loss: 2.229644 LR: 0.00002971 +[09:53:34] Epoch: 1 Batch: 5008/20099 (24.92%) Loss: 2.243711 LR: 0.00002970 +[09:53:36] Epoch: 1 Batch: 5009/20099 (24.92%) Loss: 2.176639 LR: 0.00002970 +[09:53:37] Epoch: 1 Batch: 5010/20099 (24.93%) Loss: 2.079576 LR: 0.00002970 +[09:53:39] Epoch: 1 Batch: 5011/20099 (24.93%) Loss: 2.405908 LR: 0.00002970 +[09:53:41] Epoch: 1 Batch: 5012/20099 (24.94%) Loss: 2.203678 LR: 0.00002970 +[09:53:43] Epoch: 1 Batch: 5013/20099 (24.94%) Loss: 1.990335 LR: 0.00002970 +[09:53:45] Epoch: 1 Batch: 5014/20099 (24.95%) Loss: 2.036045 LR: 0.00002970 +[09:53:47] Epoch: 1 Batch: 5015/20099 (24.95%) Loss: 2.311205 LR: 0.00002970 +[09:53:48] Epoch: 1 Batch: 5016/20099 (24.96%) Loss: 2.173803 LR: 0.00002970 +[09:53:50] Epoch: 1 Batch: 5017/20099 (24.96%) Loss: 2.137241 LR: 0.00002970 +[09:53:52] Epoch: 1 Batch: 5018/20099 (24.97%) Loss: 1.999016 LR: 0.00002970 
+[09:53:54] Epoch: 1 Batch: 5019/20099 (24.97%) Loss: 2.367273 LR: 0.00002970 +[09:53:56] Epoch: 1 Batch: 5020/20099 (24.98%) Loss: 2.432936 LR: 0.00002970 +[09:53:57] Epoch: 1 Batch: 5021/20099 (24.98%) Loss: 2.315578 LR: 0.00002970 +[09:53:59] Epoch: 1 Batch: 5022/20099 (24.99%) Loss: 2.121822 LR: 0.00002969 +[09:54:01] Epoch: 1 Batch: 5023/20099 (24.99%) Loss: 2.458246 LR: 0.00002969 +[09:54:03] Epoch: 1 Batch: 5024/20099 (25.00%) Loss: 2.106646 LR: 0.00002969 +[09:54:05] Epoch: 1 Batch: 5025/20099 (25.00%) Loss: 2.111447 LR: 0.00002969 +[09:54:06] Epoch: 1 Batch: 5026/20099 (25.01%) Loss: 1.845444 LR: 0.00002969 +[09:54:08] Epoch: 1 Batch: 5027/20099 (25.01%) Loss: 2.172496 LR: 0.00002969 +[09:54:10] Epoch: 1 Batch: 5028/20099 (25.02%) Loss: 1.979561 LR: 0.00002969 +[09:54:12] Epoch: 1 Batch: 5029/20099 (25.02%) Loss: 2.240930 LR: 0.00002969 +[09:54:14] Epoch: 1 Batch: 5030/20099 (25.03%) Loss: 2.196163 LR: 0.00002969 +[09:54:15] Epoch: 1 Batch: 5031/20099 (25.03%) Loss: 2.227931 LR: 0.00002969 +[09:54:17] Epoch: 1 Batch: 5032/20099 (25.04%) Loss: 1.886937 LR: 0.00002969 +[09:54:19] Epoch: 1 Batch: 5033/20099 (25.04%) Loss: 2.261586 LR: 0.00002969 +[09:54:21] Epoch: 1 Batch: 5034/20099 (25.05%) Loss: 1.973758 LR: 0.00002969 +[09:54:22] Epoch: 1 Batch: 5035/20099 (25.05%) Loss: 2.348380 LR: 0.00002969 +[09:54:24] Epoch: 1 Batch: 5036/20099 (25.06%) Loss: 2.284104 LR: 0.00002969 +[09:54:26] Epoch: 1 Batch: 5037/20099 (25.06%) Loss: 2.303903 LR: 0.00002969 +[09:54:28] Epoch: 1 Batch: 5038/20099 (25.07%) Loss: 2.127284 LR: 0.00002969 +[09:54:29] Epoch: 1 Batch: 5039/20099 (25.07%) Loss: 2.591542 LR: 0.00002969 +[09:54:31] Epoch: 1 Batch: 5040/20099 (25.08%) Loss: 2.263111 LR: 0.00002969 +[09:54:33] Epoch: 1 Batch: 5041/20099 (25.08%) Loss: 2.448599 LR: 0.00002969 +[09:54:35] Epoch: 1 Batch: 5042/20099 (25.09%) Loss: 1.946489 LR: 0.00002969 +[09:54:36] Epoch: 1 Batch: 5043/20099 (25.09%) Loss: 2.416937 LR: 0.00002968 +[09:54:38] Epoch: 1 Batch: 5044/20099 (25.10%) 
Loss: 1.884483 LR: 0.00002968 +[09:54:40] Epoch: 1 Batch: 5045/20099 (25.10%) Loss: 2.525615 LR: 0.00002968 +[09:54:42] Epoch: 1 Batch: 5046/20099 (25.11%) Loss: 2.123149 LR: 0.00002968 +[09:54:43] Epoch: 1 Batch: 5047/20099 (25.11%) Loss: 2.089194 LR: 0.00002968 +[09:54:45] Epoch: 1 Batch: 5048/20099 (25.12%) Loss: 1.821545 LR: 0.00002968 +[09:54:47] Epoch: 1 Batch: 5049/20099 (25.12%) Loss: 2.329400 LR: 0.00002968 +[09:54:49] Epoch: 1 Batch: 5050/20099 (25.13%) Loss: 2.261026 LR: 0.00002968 +[09:54:50] Epoch: 1 Batch: 5051/20099 (25.13%) Loss: 1.971361 LR: 0.00002968 +[09:54:52] Epoch: 1 Batch: 5052/20099 (25.14%) Loss: 1.957452 LR: 0.00002968 +[09:54:54] Epoch: 1 Batch: 5053/20099 (25.14%) Loss: 2.236181 LR: 0.00002968 +[09:54:56] Epoch: 1 Batch: 5054/20099 (25.15%) Loss: 2.456327 LR: 0.00002968 +[09:54:58] Epoch: 1 Batch: 5055/20099 (25.15%) Loss: 2.365149 LR: 0.00002968 +[09:54:59] Epoch: 1 Batch: 5056/20099 (25.16%) Loss: 2.120660 LR: 0.00002968 +[09:55:01] Epoch: 1 Batch: 5057/20099 (25.16%) Loss: 2.354949 LR: 0.00002968 +[09:55:03] Epoch: 1 Batch: 5058/20099 (25.17%) Loss: 2.457166 LR: 0.00002968 +[09:55:05] Epoch: 1 Batch: 5059/20099 (25.17%) Loss: 2.345314 LR: 0.00002968 +[09:55:06] Epoch: 1 Batch: 5060/20099 (25.18%) Loss: 2.072136 LR: 0.00002968 +[09:55:08] Epoch: 1 Batch: 5061/20099 (25.18%) Loss: 2.036778 LR: 0.00002968 +[09:55:10] Epoch: 1 Batch: 5062/20099 (25.19%) Loss: 2.226122 LR: 0.00002968 +[09:55:12] Epoch: 1 Batch: 5063/20099 (25.19%) Loss: 2.248617 LR: 0.00002968 +[09:55:14] Epoch: 1 Batch: 5064/20099 (25.20%) Loss: 1.989989 LR: 0.00002967 +[09:55:15] Epoch: 1 Batch: 5065/20099 (25.20%) Loss: 2.217492 LR: 0.00002967 +[09:55:17] Epoch: 1 Batch: 5066/20099 (25.21%) Loss: 2.263133 LR: 0.00002967 +[09:55:19] Epoch: 1 Batch: 5067/20099 (25.21%) Loss: 2.183441 LR: 0.00002967 +[09:55:21] Epoch: 1 Batch: 5068/20099 (25.22%) Loss: 2.024618 LR: 0.00002967 +[09:55:22] Epoch: 1 Batch: 5069/20099 (25.22%) Loss: 2.032421 LR: 0.00002967 +[09:55:24] Epoch: 
1 Batch: 5070/20099 (25.23%) Loss: 2.105435 LR: 0.00002967 +[09:55:26] Epoch: 1 Batch: 5071/20099 (25.23%) Loss: 2.280354 LR: 0.00002967 +[09:55:28] Epoch: 1 Batch: 5072/20099 (25.24%) Loss: 2.200377 LR: 0.00002967 +[09:55:30] Epoch: 1 Batch: 5073/20099 (25.24%) Loss: 2.267823 LR: 0.00002967 +[09:55:31] Epoch: 1 Batch: 5074/20099 (25.25%) Loss: 2.109293 LR: 0.00002967 +[09:55:33] Epoch: 1 Batch: 5075/20099 (25.25%) Loss: 2.211303 LR: 0.00002967 +[09:55:35] Epoch: 1 Batch: 5076/20099 (25.25%) Loss: 2.619929 LR: 0.00002967 +[09:55:37] Epoch: 1 Batch: 5077/20099 (25.26%) Loss: 2.137373 LR: 0.00002967 +[09:55:39] Epoch: 1 Batch: 5078/20099 (25.26%) Loss: 2.380644 LR: 0.00002966 +[09:55:40] Epoch: 1 Batch: 5079/20099 (25.27%) Loss: 2.055312 LR: 0.00002966 +[09:55:42] Epoch: 1 Batch: 5080/20099 (25.27%) Loss: 2.142936 LR: 0.00002966 +[09:55:44] Epoch: 1 Batch: 5081/20099 (25.28%) Loss: 2.181513 LR: 0.00002966 +[09:55:46] Epoch: 1 Batch: 5082/20099 (25.28%) Loss: 2.027374 LR: 0.00002966 +[09:55:47] Epoch: 1 Batch: 5083/20099 (25.29%) Loss: 2.072664 LR: 0.00002966 +[09:55:49] Epoch: 1 Batch: 5084/20099 (25.29%) Loss: 2.156169 LR: 0.00002966 +[09:55:51] Epoch: 1 Batch: 5085/20099 (25.30%) Loss: 2.111713 LR: 0.00002966 +[09:55:53] Epoch: 1 Batch: 5086/20099 (25.30%) Loss: 2.183945 LR: 0.00002966 +[09:55:54] Epoch: 1 Batch: 5087/20099 (25.31%) Loss: 1.907758 LR: 0.00002966 +[09:55:56] Epoch: 1 Batch: 5088/20099 (25.31%) Loss: 2.264148 LR: 0.00002966 +[09:55:58] Epoch: 1 Batch: 5089/20099 (25.32%) Loss: 2.085764 LR: 0.00002966 +[09:56:00] Epoch: 1 Batch: 5090/20099 (25.32%) Loss: 1.935993 LR: 0.00002966 +[09:56:01] Epoch: 1 Batch: 5091/20099 (25.33%) Loss: 2.302396 LR: 0.00002966 +[09:56:03] Epoch: 1 Batch: 5092/20099 (25.33%) Loss: 2.428800 LR: 0.00002966 +[09:56:05] Epoch: 1 Batch: 5093/20099 (25.34%) Loss: 2.317728 LR: 0.00002966 +[09:56:07] Epoch: 1 Batch: 5094/20099 (25.34%) Loss: 2.200238 LR: 0.00002966 +[09:56:09] Epoch: 1 Batch: 5095/20099 (25.35%) Loss: 1.937538 LR: 
0.00002966 +[09:56:10] Epoch: 1 Batch: 5096/20099 (25.35%) Loss: 2.151785 LR: 0.00002966 +[09:56:12] Epoch: 1 Batch: 5097/20099 (25.36%) Loss: 2.227997 LR: 0.00002966 +[09:56:14] Epoch: 1 Batch: 5098/20099 (25.36%) Loss: 2.187951 LR: 0.00002966 +[09:56:16] Epoch: 1 Batch: 5099/20099 (25.37%) Loss: 2.208586 LR: 0.00002965 +[09:56:17] Epoch: 1 Batch: 5100/20099 (25.37%) Loss: 2.386372 LR: 0.00002965 +[09:56:19] Epoch: 1 Batch: 5101/20099 (25.38%) Loss: 1.812694 LR: 0.00002965 +[09:56:21] Epoch: 1 Batch: 5102/20099 (25.38%) Loss: 2.130867 LR: 0.00002965 +[09:56:23] Epoch: 1 Batch: 5103/20099 (25.39%) Loss: 2.153519 LR: 0.00002965 +[09:56:24] Epoch: 1 Batch: 5104/20099 (25.39%) Loss: 2.100061 LR: 0.00002965 +[09:56:26] Epoch: 1 Batch: 5105/20099 (25.40%) Loss: 2.073378 LR: 0.00002965 +[09:56:28] Epoch: 1 Batch: 5106/20099 (25.40%) Loss: 1.987383 LR: 0.00002965 +[09:56:30] Epoch: 1 Batch: 5107/20099 (25.41%) Loss: 2.197016 LR: 0.00002965 +[09:56:31] Epoch: 1 Batch: 5108/20099 (25.41%) Loss: 1.898695 LR: 0.00002965 +[09:56:33] Epoch: 1 Batch: 5109/20099 (25.42%) Loss: 2.173894 LR: 0.00002965 +[09:56:35] Epoch: 1 Batch: 5110/20099 (25.42%) Loss: 2.258354 LR: 0.00002965 +[09:56:37] Epoch: 1 Batch: 5111/20099 (25.43%) Loss: 2.318998 LR: 0.00002965 +[09:56:39] Epoch: 1 Batch: 5112/20099 (25.43%) Loss: 2.032122 LR: 0.00002965 +[09:56:40] Epoch: 1 Batch: 5113/20099 (25.44%) Loss: 2.317402 LR: 0.00002965 +[09:56:42] Epoch: 1 Batch: 5114/20099 (25.44%) Loss: 2.298309 LR: 0.00002965 +[09:56:44] Epoch: 1 Batch: 5115/20099 (25.45%) Loss: 2.197459 LR: 0.00002965 +[09:56:46] Epoch: 1 Batch: 5116/20099 (25.45%) Loss: 2.058699 LR: 0.00002965 +[09:56:47] Epoch: 1 Batch: 5117/20099 (25.46%) Loss: 2.573050 LR: 0.00002965 +[09:56:49] Epoch: 1 Batch: 5118/20099 (25.46%) Loss: 2.187925 LR: 0.00002965 +[09:56:51] Epoch: 1 Batch: 5119/20099 (25.47%) Loss: 2.668014 LR: 0.00002965 +[09:56:53] Epoch: 1 Batch: 5120/20099 (25.47%) Loss: 2.150378 LR: 0.00002964 +[09:56:55] Epoch: 1 Batch: 5121/20099 
(25.48%) Loss: 2.171249 LR: 0.00002964 +[09:56:56] Epoch: 1 Batch: 5122/20099 (25.48%) Loss: 2.185718 LR: 0.00002964 +[09:56:58] Epoch: 1 Batch: 5123/20099 (25.49%) Loss: 2.049951 LR: 0.00002964 +[09:57:00] Epoch: 1 Batch: 5124/20099 (25.49%) Loss: 2.215984 LR: 0.00002964 +[09:57:02] Epoch: 1 Batch: 5125/20099 (25.50%) Loss: 1.875645 LR: 0.00002964 +[09:57:03] Epoch: 1 Batch: 5126/20099 (25.50%) Loss: 2.088014 LR: 0.00002964 +[09:57:05] Epoch: 1 Batch: 5127/20099 (25.51%) Loss: 2.194373 LR: 0.00002964 +[09:57:07] Epoch: 1 Batch: 5128/20099 (25.51%) Loss: 2.093822 LR: 0.00002964 +[09:57:09] Epoch: 1 Batch: 5129/20099 (25.52%) Loss: 2.440821 LR: 0.00002964 +[09:57:10] Epoch: 1 Batch: 5130/20099 (25.52%) Loss: 2.309804 LR: 0.00002964 +[09:57:12] Epoch: 1 Batch: 5131/20099 (25.53%) Loss: 2.474787 LR: 0.00002964 +[09:57:14] Epoch: 1 Batch: 5132/20099 (25.53%) Loss: 2.245017 LR: 0.00002964 +[09:57:16] Epoch: 1 Batch: 5133/20099 (25.54%) Loss: 2.397236 LR: 0.00002964 +[09:57:18] Epoch: 1 Batch: 5134/20099 (25.54%) Loss: 1.975077 LR: 0.00002963 +[09:57:19] Epoch: 1 Batch: 5135/20099 (25.55%) Loss: 2.271425 LR: 0.00002963 +[09:57:21] Epoch: 1 Batch: 5136/20099 (25.55%) Loss: 1.811434 LR: 0.00002963 +[09:57:23] Epoch: 1 Batch: 5137/20099 (25.56%) Loss: 1.876786 LR: 0.00002963 +[09:57:25] Epoch: 1 Batch: 5138/20099 (25.56%) Loss: 2.212978 LR: 0.00002963 +[09:57:27] Epoch: 1 Batch: 5139/20099 (25.57%) Loss: 2.047060 LR: 0.00002963 +[09:57:28] Epoch: 1 Batch: 5140/20099 (25.57%) Loss: 2.023449 LR: 0.00002963 +[09:57:30] Epoch: 1 Batch: 5141/20099 (25.58%) Loss: 2.472535 LR: 0.00002963 +[09:57:32] Epoch: 1 Batch: 5142/20099 (25.58%) Loss: 1.974768 LR: 0.00002963 +[09:57:34] Epoch: 1 Batch: 5143/20099 (25.59%) Loss: 2.002954 LR: 0.00002963 +[09:57:35] Epoch: 1 Batch: 5144/20099 (25.59%) Loss: 2.232331 LR: 0.00002963 +[09:57:37] Epoch: 1 Batch: 5145/20099 (25.60%) Loss: 2.243229 LR: 0.00002963 +[09:57:39] Epoch: 1 Batch: 5146/20099 (25.60%) Loss: 2.211474 LR: 0.00002963 
+[09:57:41] Epoch: 1 Batch: 5147/20099 (25.61%) Loss: 2.213857 LR: 0.00002963 +[09:57:42] Epoch: 1 Batch: 5148/20099 (25.61%) Loss: 2.011610 LR: 0.00002963 +[09:57:44] Epoch: 1 Batch: 5149/20099 (25.62%) Loss: 2.352600 LR: 0.00002963 +[09:57:46] Epoch: 1 Batch: 5150/20099 (25.62%) Loss: 2.182811 LR: 0.00002963 +[09:57:48] Epoch: 1 Batch: 5151/20099 (25.63%) Loss: 2.296450 LR: 0.00002963 +[09:57:50] Epoch: 1 Batch: 5152/20099 (25.63%) Loss: 2.089911 LR: 0.00002963 +[09:57:51] Epoch: 1 Batch: 5153/20099 (25.64%) Loss: 2.424341 LR: 0.00002963 +[09:57:53] Epoch: 1 Batch: 5154/20099 (25.64%) Loss: 2.280419 LR: 0.00002963 +[09:57:55] Epoch: 1 Batch: 5155/20099 (25.65%) Loss: 2.134317 LR: 0.00002962 +[09:57:57] Epoch: 1 Batch: 5156/20099 (25.65%) Loss: 2.361571 LR: 0.00002962 +[09:57:58] Epoch: 1 Batch: 5157/20099 (25.66%) Loss: 2.201919 LR: 0.00002962 +[09:58:00] Epoch: 1 Batch: 5158/20099 (25.66%) Loss: 2.307642 LR: 0.00002962 +[09:58:02] Epoch: 1 Batch: 5159/20099 (25.67%) Loss: 2.037541 LR: 0.00002962 +[09:58:04] Epoch: 1 Batch: 5160/20099 (25.67%) Loss: 2.114485 LR: 0.00002962 +[09:58:05] Epoch: 1 Batch: 5161/20099 (25.68%) Loss: 1.782122 LR: 0.00002962 +[09:58:07] Epoch: 1 Batch: 5162/20099 (25.68%) Loss: 2.386678 LR: 0.00002962 +[09:58:09] Epoch: 1 Batch: 5163/20099 (25.69%) Loss: 2.247995 LR: 0.00002962 +[09:58:11] Epoch: 1 Batch: 5164/20099 (25.69%) Loss: 2.036732 LR: 0.00002962 +[09:58:13] Epoch: 1 Batch: 5165/20099 (25.70%) Loss: 1.930676 LR: 0.00002962 +[09:58:14] Epoch: 1 Batch: 5166/20099 (25.70%) Loss: 2.182351 LR: 0.00002962 +[09:58:16] Epoch: 1 Batch: 5167/20099 (25.71%) Loss: 2.266179 LR: 0.00002962 +[09:58:18] Epoch: 1 Batch: 5168/20099 (25.71%) Loss: 2.083990 LR: 0.00002962 +[09:58:20] Epoch: 1 Batch: 5169/20099 (25.72%) Loss: 2.239344 LR: 0.00002961 +[09:58:21] Epoch: 1 Batch: 5170/20099 (25.72%) Loss: 2.294771 LR: 0.00002961 +[09:58:23] Epoch: 1 Batch: 5171/20099 (25.73%) Loss: 2.059346 LR: 0.00002961 +[09:58:25] Epoch: 1 Batch: 5172/20099 (25.73%) 
Loss: 2.218330 LR: 0.00002961 +[09:58:27] Epoch: 1 Batch: 5173/20099 (25.74%) Loss: 1.920479 LR: 0.00002961 +[09:58:28] Epoch: 1 Batch: 5174/20099 (25.74%) Loss: 1.964068 LR: 0.00002961 +[09:58:30] Epoch: 1 Batch: 5175/20099 (25.75%) Loss: 2.025846 LR: 0.00002961 +[09:58:32] Epoch: 1 Batch: 5176/20099 (25.75%) Loss: 2.012708 LR: 0.00002961 +[09:58:34] Epoch: 1 Batch: 5177/20099 (25.76%) Loss: 2.184777 LR: 0.00002961 +[09:58:36] Epoch: 1 Batch: 5178/20099 (25.76%) Loss: 1.947737 LR: 0.00002961 +[09:58:37] Epoch: 1 Batch: 5179/20099 (25.77%) Loss: 2.226052 LR: 0.00002961 +[09:58:39] Epoch: 1 Batch: 5180/20099 (25.77%) Loss: 2.344881 LR: 0.00002961 +[09:58:41] Epoch: 1 Batch: 5181/20099 (25.78%) Loss: 2.275828 LR: 0.00002961 +[09:58:43] Epoch: 1 Batch: 5182/20099 (25.78%) Loss: 1.784848 LR: 0.00002961 +[09:58:44] Epoch: 1 Batch: 5183/20099 (25.79%) Loss: 2.316425 LR: 0.00002961 +[09:58:46] Epoch: 1 Batch: 5184/20099 (25.79%) Loss: 2.111327 LR: 0.00002961 +[09:58:48] Epoch: 1 Batch: 5185/20099 (25.80%) Loss: 2.050776 LR: 0.00002961 +[09:58:50] Epoch: 1 Batch: 5186/20099 (25.80%) Loss: 2.236385 LR: 0.00002961 +[09:58:51] Epoch: 1 Batch: 5187/20099 (25.81%) Loss: 2.197124 LR: 0.00002961 +[09:58:53] Epoch: 1 Batch: 5188/20099 (25.81%) Loss: 2.071490 LR: 0.00002961 +[09:58:55] Epoch: 1 Batch: 5189/20099 (25.82%) Loss: 2.126563 LR: 0.00002961 +[09:58:57] Epoch: 1 Batch: 5190/20099 (25.82%) Loss: 2.282122 LR: 0.00002960 +[09:58:58] Epoch: 1 Batch: 5191/20099 (25.83%) Loss: 2.160692 LR: 0.00002960 +[09:59:00] Epoch: 1 Batch: 5192/20099 (25.83%) Loss: 2.069715 LR: 0.00002960 +[09:59:02] Epoch: 1 Batch: 5193/20099 (25.84%) Loss: 2.334690 LR: 0.00002960 +[09:59:04] Epoch: 1 Batch: 5194/20099 (25.84%) Loss: 1.808087 LR: 0.00002960 +[09:59:06] Epoch: 1 Batch: 5195/20099 (25.85%) Loss: 2.366622 LR: 0.00002960 +[09:59:07] Epoch: 1 Batch: 5196/20099 (25.85%) Loss: 2.141907 LR: 0.00002960 +[09:59:09] Epoch: 1 Batch: 5197/20099 (25.86%) Loss: 2.319940 LR: 0.00002960 +[09:59:11] Epoch: 
1 Batch: 5198/20099 (25.86%) Loss: 1.934475 LR: 0.00002960 +[09:59:13] Epoch: 1 Batch: 5199/20099 (25.87%) Loss: 2.054247 LR: 0.00002960 +[09:59:18] >> Cleaned up old temp checkpoint: epoch1_step3200 +[09:59:18] >> Temp checkpoint saved: epoch1_step5200, size: 0.1693 GB +[09:59:18] Epoch: 1 Batch: 5200/20099 (25.87%) Loss: 2.111468 LR: 0.00002960 +[09:59:20] Epoch: 1 Batch: 5201/20099 (25.88%) Loss: 2.183207 LR: 0.00002960 +[09:59:21] Epoch: 1 Batch: 5202/20099 (25.88%) Loss: 2.348523 LR: 0.00002960 +[09:59:23] Epoch: 1 Batch: 5203/20099 (25.89%) Loss: 2.546628 LR: 0.00002960 +[09:59:25] Epoch: 1 Batch: 5204/20099 (25.89%) Loss: 2.124391 LR: 0.00002959 +[09:59:27] Epoch: 1 Batch: 5205/20099 (25.90%) Loss: 1.827406 LR: 0.00002959 +[09:59:28] Epoch: 1 Batch: 5206/20099 (25.90%) Loss: 2.222822 LR: 0.00002959 +[09:59:30] Epoch: 1 Batch: 5207/20099 (25.91%) Loss: 2.085008 LR: 0.00002959 +[09:59:32] Epoch: 1 Batch: 5208/20099 (25.91%) Loss: 1.961774 LR: 0.00002959 +[09:59:34] Epoch: 1 Batch: 5209/20099 (25.92%) Loss: 2.072930 LR: 0.00002959 +[09:59:36] Epoch: 1 Batch: 5210/20099 (25.92%) Loss: 2.404287 LR: 0.00002959 +[09:59:37] Epoch: 1 Batch: 5211/20099 (25.93%) Loss: 2.546427 LR: 0.00002959 +[09:59:39] Epoch: 1 Batch: 5212/20099 (25.93%) Loss: 2.233519 LR: 0.00002959 +[09:59:41] Epoch: 1 Batch: 5213/20099 (25.94%) Loss: 2.341598 LR: 0.00002959 +[09:59:43] Epoch: 1 Batch: 5214/20099 (25.94%) Loss: 2.159814 LR: 0.00002959 +[09:59:45] Epoch: 1 Batch: 5215/20099 (25.95%) Loss: 2.433942 LR: 0.00002959 +[09:59:46] Epoch: 1 Batch: 5216/20099 (25.95%) Loss: 2.086906 LR: 0.00002959 +[09:59:48] Epoch: 1 Batch: 5217/20099 (25.96%) Loss: 2.333353 LR: 0.00002959 +[09:59:50] Epoch: 1 Batch: 5218/20099 (25.96%) Loss: 2.006044 LR: 0.00002958 +[09:59:52] Epoch: 1 Batch: 5219/20099 (25.97%) Loss: 2.050009 LR: 0.00002958 +[09:59:54] Epoch: 1 Batch: 5220/20099 (25.97%) Loss: 2.103528 LR: 0.00002958 +[09:59:55] Epoch: 1 Batch: 5221/20099 (25.98%) Loss: 1.934553 LR: 0.00002958 +[09:59:57] 
Epoch: 1 Batch: 5222/20099 (25.98%) Loss: 2.176851 LR: 0.00002958 +[09:59:59] Epoch: 1 Batch: 5223/20099 (25.99%) Loss: 2.023637 LR: 0.00002958 +[10:00:01] Epoch: 1 Batch: 5224/20099 (25.99%) Loss: 2.082035 LR: 0.00002958 +[10:00:03] Epoch: 1 Batch: 5225/20099 (26.00%) Loss: 1.821024 LR: 0.00002958 +[10:00:04] Epoch: 1 Batch: 5226/20099 (26.00%) Loss: 1.931556 LR: 0.00002958 +[10:00:06] Epoch: 1 Batch: 5227/20099 (26.01%) Loss: 2.502036 LR: 0.00002958 +[10:00:08] Epoch: 1 Batch: 5228/20099 (26.01%) Loss: 2.239546 LR: 0.00002958 +[10:00:10] Epoch: 1 Batch: 5229/20099 (26.02%) Loss: 1.865717 LR: 0.00002958 +[10:00:11] Epoch: 1 Batch: 5230/20099 (26.02%) Loss: 2.224649 LR: 0.00002958 +[10:00:13] Epoch: 1 Batch: 5231/20099 (26.03%) Loss: 2.182599 LR: 0.00002958 +[10:00:15] Epoch: 1 Batch: 5232/20099 (26.03%) Loss: 2.027246 LR: 0.00002958 +[10:00:17] Epoch: 1 Batch: 5233/20099 (26.04%) Loss: 1.965098 LR: 0.00002958 +[10:00:19] Epoch: 1 Batch: 5234/20099 (26.04%) Loss: 1.760204 LR: 0.00002958 +[10:00:20] Epoch: 1 Batch: 5235/20099 (26.05%) Loss: 1.967444 LR: 0.00002958 +[10:00:22] Epoch: 1 Batch: 5236/20099 (26.05%) Loss: 2.337430 LR: 0.00002958 +[10:00:24] Epoch: 1 Batch: 5237/20099 (26.06%) Loss: 2.282742 LR: 0.00002958 +[10:00:26] Epoch: 1 Batch: 5238/20099 (26.06%) Loss: 1.893889 LR: 0.00002958 +[10:00:27] Epoch: 1 Batch: 5239/20099 (26.07%) Loss: 1.951274 LR: 0.00002957 +[10:00:29] Epoch: 1 Batch: 5240/20099 (26.07%) Loss: 1.945030 LR: 0.00002957 +[10:00:31] Epoch: 1 Batch: 5241/20099 (26.08%) Loss: 2.054807 LR: 0.00002957 +[10:00:33] Epoch: 1 Batch: 5242/20099 (26.08%) Loss: 2.125057 LR: 0.00002957 +[10:00:34] Epoch: 1 Batch: 5243/20099 (26.09%) Loss: 2.172578 LR: 0.00002957 +[10:00:36] Epoch: 1 Batch: 5244/20099 (26.09%) Loss: 2.004131 LR: 0.00002957 +[10:00:38] Epoch: 1 Batch: 5245/20099 (26.10%) Loss: 2.060345 LR: 0.00002957 +[10:00:40] Epoch: 1 Batch: 5246/20099 (26.10%) Loss: 2.444358 LR: 0.00002957 +[10:00:41] Epoch: 1 Batch: 5247/20099 (26.11%) Loss: 
2.455721 LR: 0.00002957 +[10:00:43] Epoch: 1 Batch: 5248/20099 (26.11%) Loss: 2.502295 LR: 0.00002957 +[10:00:45] Epoch: 1 Batch: 5249/20099 (26.12%) Loss: 2.249175 LR: 0.00002957 +[10:00:47] Epoch: 1 Batch: 5250/20099 (26.12%) Loss: 2.195855 LR: 0.00002957 +[10:00:48] Epoch: 1 Batch: 5251/20099 (26.13%) Loss: 2.052966 LR: 0.00002957 +[10:00:50] Epoch: 1 Batch: 5252/20099 (26.13%) Loss: 1.988862 LR: 0.00002957 +[10:00:52] Epoch: 1 Batch: 5253/20099 (26.14%) Loss: 2.194725 LR: 0.00002956 +[10:00:54] Epoch: 1 Batch: 5254/20099 (26.14%) Loss: 2.220252 LR: 0.00002956 +[10:00:55] Epoch: 1 Batch: 5255/20099 (26.15%) Loss: 2.084338 LR: 0.00002956 +[10:00:57] Epoch: 1 Batch: 5256/20099 (26.15%) Loss: 2.389321 LR: 0.00002956 +[10:00:59] Epoch: 1 Batch: 5257/20099 (26.16%) Loss: 2.290904 LR: 0.00002956 +[10:01:01] Epoch: 1 Batch: 5258/20099 (26.16%) Loss: 1.912925 LR: 0.00002956 +[10:01:03] Epoch: 1 Batch: 5259/20099 (26.17%) Loss: 2.081119 LR: 0.00002956 +[10:01:04] Epoch: 1 Batch: 5260/20099 (26.17%) Loss: 2.257213 LR: 0.00002956 +[10:01:06] Epoch: 1 Batch: 5261/20099 (26.18%) Loss: 1.953340 LR: 0.00002956 +[10:01:08] Epoch: 1 Batch: 5262/20099 (26.18%) Loss: 1.955008 LR: 0.00002956 +[10:01:10] Epoch: 1 Batch: 5263/20099 (26.19%) Loss: 2.049940 LR: 0.00002956 +[10:01:11] Epoch: 1 Batch: 5264/20099 (26.19%) Loss: 2.308121 LR: 0.00002956 +[10:01:13] Epoch: 1 Batch: 5265/20099 (26.20%) Loss: 2.042370 LR: 0.00002956 +[10:01:15] Epoch: 1 Batch: 5266/20099 (26.20%) Loss: 1.847723 LR: 0.00002956 +[10:01:17] Epoch: 1 Batch: 5267/20099 (26.21%) Loss: 2.209192 LR: 0.00002955 +[10:01:19] Epoch: 1 Batch: 5268/20099 (26.21%) Loss: 2.036092 LR: 0.00002955 +[10:01:20] Epoch: 1 Batch: 5269/20099 (26.22%) Loss: 1.951100 LR: 0.00002955 +[10:01:22] Epoch: 1 Batch: 5270/20099 (26.22%) Loss: 2.438736 LR: 0.00002955 +[10:01:24] Epoch: 1 Batch: 5271/20099 (26.23%) Loss: 2.131855 LR: 0.00002955 +[10:01:26] Epoch: 1 Batch: 5272/20099 (26.23%) Loss: 2.211555 LR: 0.00002955 +[10:01:27] Epoch: 1 
Batch: 5273/20099 (26.24%) Loss: 2.227537 LR: 0.00002955 +[10:01:29] Epoch: 1 Batch: 5274/20099 (26.24%) Loss: 2.000927 LR: 0.00002955 +[10:01:31] Epoch: 1 Batch: 5275/20099 (26.25%) Loss: 2.245870 LR: 0.00002955 +[10:01:33] Epoch: 1 Batch: 5276/20099 (26.25%) Loss: 2.149190 LR: 0.00002955 +[10:01:35] Epoch: 1 Batch: 5277/20099 (26.26%) Loss: 2.182068 LR: 0.00002955 +[10:01:37] Epoch: 1 Batch: 5278/20099 (26.26%) Loss: 2.064885 LR: 0.00002955 +[10:01:38] Epoch: 1 Batch: 5279/20099 (26.26%) Loss: 2.357294 LR: 0.00002955 +[10:01:40] Epoch: 1 Batch: 5280/20099 (26.27%) Loss: 2.309474 LR: 0.00002955 +[10:01:42] Epoch: 1 Batch: 5281/20099 (26.27%) Loss: 2.137645 LR: 0.00002955 +[10:01:44] Epoch: 1 Batch: 5282/20099 (26.28%) Loss: 1.903790 LR: 0.00002955 +[10:01:46] Epoch: 1 Batch: 5283/20099 (26.28%) Loss: 2.483090 LR: 0.00002955 +[10:01:47] Epoch: 1 Batch: 5284/20099 (26.29%) Loss: 2.257658 LR: 0.00002955 +[10:01:49] Epoch: 1 Batch: 5285/20099 (26.29%) Loss: 2.112091 LR: 0.00002955 +[10:01:51] Epoch: 1 Batch: 5286/20099 (26.30%) Loss: 2.232180 LR: 0.00002955 +[10:01:53] Epoch: 1 Batch: 5287/20099 (26.30%) Loss: 2.244846 LR: 0.00002955 +[10:01:54] Epoch: 1 Batch: 5288/20099 (26.31%) Loss: 2.131731 LR: 0.00002954 +[10:01:56] Epoch: 1 Batch: 5289/20099 (26.31%) Loss: 2.124147 LR: 0.00002954 +[10:01:58] Epoch: 1 Batch: 5290/20099 (26.32%) Loss: 1.862711 LR: 0.00002954 +[10:02:00] Epoch: 1 Batch: 5291/20099 (26.32%) Loss: 2.122579 LR: 0.00002954 +[10:02:02] Epoch: 1 Batch: 5292/20099 (26.33%) Loss: 2.122437 LR: 0.00002954 +[10:02:03] Epoch: 1 Batch: 5293/20099 (26.33%) Loss: 2.008774 LR: 0.00002954 +[10:02:05] Epoch: 1 Batch: 5294/20099 (26.34%) Loss: 2.153089 LR: 0.00002954 +[10:02:07] Epoch: 1 Batch: 5295/20099 (26.34%) Loss: 2.453319 LR: 0.00002954 +[10:02:09] Epoch: 1 Batch: 5296/20099 (26.35%) Loss: 2.107199 LR: 0.00002954 +[10:02:10] Epoch: 1 Batch: 5297/20099 (26.35%) Loss: 2.016767 LR: 0.00002954 +[10:02:12] Epoch: 1 Batch: 5298/20099 (26.36%) Loss: 2.049711 LR: 
0.00002954 +[10:02:14] Epoch: 1 Batch: 5299/20099 (26.36%) Loss: 2.304215 LR: 0.00002954 +[10:02:16] Epoch: 1 Batch: 5300/20099 (26.37%) Loss: 2.367209 LR: 0.00002954 +[10:02:18] Epoch: 1 Batch: 5301/20099 (26.37%) Loss: 2.152944 LR: 0.00002954 +[10:02:19] Epoch: 1 Batch: 5302/20099 (26.38%) Loss: 1.905111 LR: 0.00002953 +[10:02:21] Epoch: 1 Batch: 5303/20099 (26.38%) Loss: 1.983787 LR: 0.00002953 +[10:02:23] Epoch: 1 Batch: 5304/20099 (26.39%) Loss: 2.155028 LR: 0.00002953 +[10:02:25] Epoch: 1 Batch: 5305/20099 (26.39%) Loss: 2.216731 LR: 0.00002953 +[10:02:26] Epoch: 1 Batch: 5306/20099 (26.40%) Loss: 1.968447 LR: 0.00002953 +[10:02:28] Epoch: 1 Batch: 5307/20099 (26.40%) Loss: 2.258006 LR: 0.00002953 +[10:02:30] Epoch: 1 Batch: 5308/20099 (26.41%) Loss: 2.181214 LR: 0.00002953 +[10:02:32] Epoch: 1 Batch: 5309/20099 (26.41%) Loss: 2.443493 LR: 0.00002953 +[10:02:34] Epoch: 1 Batch: 5310/20099 (26.42%) Loss: 1.945972 LR: 0.00002953 +[10:02:35] Epoch: 1 Batch: 5311/20099 (26.42%) Loss: 1.978797 LR: 0.00002953 +[10:02:37] Epoch: 1 Batch: 5312/20099 (26.43%) Loss: 2.087245 LR: 0.00002953 +[10:02:39] Epoch: 1 Batch: 5313/20099 (26.43%) Loss: 2.147776 LR: 0.00002953 +[10:02:41] Epoch: 1 Batch: 5314/20099 (26.44%) Loss: 2.105486 LR: 0.00002953 +[10:02:42] Epoch: 1 Batch: 5315/20099 (26.44%) Loss: 1.979147 LR: 0.00002953 +[10:02:44] Epoch: 1 Batch: 5316/20099 (26.45%) Loss: 2.090837 LR: 0.00002952 +[10:02:46] Epoch: 1 Batch: 5317/20099 (26.45%) Loss: 2.036646 LR: 0.00002952 +[10:02:48] Epoch: 1 Batch: 5318/20099 (26.46%) Loss: 1.980001 LR: 0.00002952 +[10:02:50] Epoch: 1 Batch: 5319/20099 (26.46%) Loss: 2.267281 LR: 0.00002952 +[10:02:51] Epoch: 1 Batch: 5320/20099 (26.47%) Loss: 2.169286 LR: 0.00002952 +[10:02:53] Epoch: 1 Batch: 5321/20099 (26.47%) Loss: 2.157197 LR: 0.00002952 +[10:02:55] Epoch: 1 Batch: 5322/20099 (26.48%) Loss: 2.132870 LR: 0.00002952 +[10:02:57] Epoch: 1 Batch: 5323/20099 (26.48%) Loss: 2.362358 LR: 0.00002952 +[10:02:58] Epoch: 1 Batch: 5324/20099 
(26.49%) Loss: 1.845561 LR: 0.00002952 +[10:03:00] Epoch: 1 Batch: 5325/20099 (26.49%) Loss: 2.125608 LR: 0.00002952 +[10:03:02] Epoch: 1 Batch: 5326/20099 (26.50%) Loss: 2.014180 LR: 0.00002952 +[10:03:04] Epoch: 1 Batch: 5327/20099 (26.50%) Loss: 2.529214 LR: 0.00002952 +[10:03:05] Epoch: 1 Batch: 5328/20099 (26.51%) Loss: 2.150317 LR: 0.00002952 +[10:03:07] Epoch: 1 Batch: 5329/20099 (26.51%) Loss: 1.665474 LR: 0.00002952 +[10:03:09] Epoch: 1 Batch: 5330/20099 (26.52%) Loss: 2.055614 LR: 0.00002951 +[10:03:11] Epoch: 1 Batch: 5331/20099 (26.52%) Loss: 2.175996 LR: 0.00002951 +[10:03:13] Epoch: 1 Batch: 5332/20099 (26.53%) Loss: 2.573268 LR: 0.00002951 +[10:03:14] Epoch: 1 Batch: 5333/20099 (26.53%) Loss: 1.926118 LR: 0.00002951 +[10:03:16] Epoch: 1 Batch: 5334/20099 (26.54%) Loss: 2.246243 LR: 0.00002951 +[10:03:18] Epoch: 1 Batch: 5335/20099 (26.54%) Loss: 2.248676 LR: 0.00002951 +[10:03:20] Epoch: 1 Batch: 5336/20099 (26.55%) Loss: 2.077011 LR: 0.00002951 +[10:03:21] Epoch: 1 Batch: 5337/20099 (26.55%) Loss: 2.333693 LR: 0.00002951 +[10:03:23] Epoch: 1 Batch: 5338/20099 (26.56%) Loss: 1.980316 LR: 0.00002951 +[10:03:25] Epoch: 1 Batch: 5339/20099 (26.56%) Loss: 2.185232 LR: 0.00002951 +[10:03:27] Epoch: 1 Batch: 5340/20099 (26.57%) Loss: 2.047994 LR: 0.00002951 +[10:03:29] Epoch: 1 Batch: 5341/20099 (26.57%) Loss: 2.406561 LR: 0.00002951 +[10:03:30] Epoch: 1 Batch: 5342/20099 (26.58%) Loss: 2.263602 LR: 0.00002951 +[10:03:32] Epoch: 1 Batch: 5343/20099 (26.58%) Loss: 2.113905 LR: 0.00002951 +[10:03:34] Epoch: 1 Batch: 5344/20099 (26.59%) Loss: 2.165080 LR: 0.00002950 +[10:03:36] Epoch: 1 Batch: 5345/20099 (26.59%) Loss: 2.105507 LR: 0.00002950 +[10:03:37] Epoch: 1 Batch: 5346/20099 (26.60%) Loss: 2.162409 LR: 0.00002950 +[10:03:39] Epoch: 1 Batch: 5347/20099 (26.60%) Loss: 2.254974 LR: 0.00002950 +[10:03:41] Epoch: 1 Batch: 5348/20099 (26.61%) Loss: 2.146637 LR: 0.00002950 +[10:03:43] Epoch: 1 Batch: 5349/20099 (26.61%) Loss: 1.983313 LR: 0.00002950 
+[10:03:45] Epoch: 1 Batch: 5350/20099 (26.62%) Loss: 2.384114 LR: 0.00002950 +[10:03:46] Epoch: 1 Batch: 5351/20099 (26.62%) Loss: 2.380051 LR: 0.00002950 +[10:03:48] Epoch: 1 Batch: 5352/20099 (26.63%) Loss: 2.154150 LR: 0.00002950 +[10:03:50] Epoch: 1 Batch: 5353/20099 (26.63%) Loss: 2.188178 LR: 0.00002950 +[10:03:52] Epoch: 1 Batch: 5354/20099 (26.64%) Loss: 1.660404 LR: 0.00002950 +[10:03:53] Epoch: 1 Batch: 5355/20099 (26.64%) Loss: 2.377430 LR: 0.00002950 +[10:03:55] Epoch: 1 Batch: 5356/20099 (26.65%) Loss: 2.008257 LR: 0.00002950 +[10:03:57] Epoch: 1 Batch: 5357/20099 (26.65%) Loss: 2.257199 LR: 0.00002950 +[10:03:59] Epoch: 1 Batch: 5358/20099 (26.66%) Loss: 2.353332 LR: 0.00002950 +[10:04:01] Epoch: 1 Batch: 5359/20099 (26.66%) Loss: 2.207362 LR: 0.00002950 +[10:04:02] Epoch: 1 Batch: 5360/20099 (26.67%) Loss: 2.388850 LR: 0.00002950 +[10:04:04] Epoch: 1 Batch: 5361/20099 (26.67%) Loss: 2.460552 LR: 0.00002950 +[10:04:06] Epoch: 1 Batch: 5362/20099 (26.68%) Loss: 2.069199 LR: 0.00002950 +[10:04:08] Epoch: 1 Batch: 5363/20099 (26.68%) Loss: 1.785211 LR: 0.00002950 +[10:04:09] Epoch: 1 Batch: 5364/20099 (26.69%) Loss: 1.869507 LR: 0.00002950 +[10:04:11] Epoch: 1 Batch: 5365/20099 (26.69%) Loss: 2.315555 LR: 0.00002949 +[10:04:13] Epoch: 1 Batch: 5366/20099 (26.70%) Loss: 2.072086 LR: 0.00002949 +[10:04:15] Epoch: 1 Batch: 5367/20099 (26.70%) Loss: 2.025519 LR: 0.00002949 +[10:04:17] Epoch: 1 Batch: 5368/20099 (26.71%) Loss: 2.026887 LR: 0.00002949 +[10:04:18] Epoch: 1 Batch: 5369/20099 (26.71%) Loss: 2.092586 LR: 0.00002949 +[10:04:20] Epoch: 1 Batch: 5370/20099 (26.72%) Loss: 1.903172 LR: 0.00002949 +[10:04:22] Epoch: 1 Batch: 5371/20099 (26.72%) Loss: 2.265740 LR: 0.00002949 +[10:04:24] Epoch: 1 Batch: 5372/20099 (26.73%) Loss: 2.345257 LR: 0.00002949 +[10:04:25] Epoch: 1 Batch: 5373/20099 (26.73%) Loss: 2.166030 LR: 0.00002949 +[10:04:27] Epoch: 1 Batch: 5374/20099 (26.74%) Loss: 2.028642 LR: 0.00002949 +[10:04:29] Epoch: 1 Batch: 5375/20099 (26.74%) 
Loss: 2.054534 LR: 0.00002949 +[10:04:31] Epoch: 1 Batch: 5376/20099 (26.75%) Loss: 2.196232 LR: 0.00002949 +[10:04:32] Epoch: 1 Batch: 5377/20099 (26.75%) Loss: 2.086753 LR: 0.00002949 +[10:04:34] Epoch: 1 Batch: 5378/20099 (26.76%) Loss: 1.820545 LR: 0.00002949 +[10:04:36] Epoch: 1 Batch: 5379/20099 (26.76%) Loss: 2.331662 LR: 0.00002948 +[10:04:38] Epoch: 1 Batch: 5380/20099 (26.77%) Loss: 2.286243 LR: 0.00002948 +[10:04:40] Epoch: 1 Batch: 5381/20099 (26.77%) Loss: 1.825334 LR: 0.00002948 +[10:04:41] Epoch: 1 Batch: 5382/20099 (26.78%) Loss: 2.274067 LR: 0.00002948 +[10:04:43] Epoch: 1 Batch: 5383/20099 (26.78%) Loss: 2.003365 LR: 0.00002948 +[10:04:45] Epoch: 1 Batch: 5384/20099 (26.79%) Loss: 2.140905 LR: 0.00002948 +[10:04:47] Epoch: 1 Batch: 5385/20099 (26.79%) Loss: 2.342805 LR: 0.00002948 +[10:04:48] Epoch: 1 Batch: 5386/20099 (26.80%) Loss: 2.162837 LR: 0.00002948 +[10:04:50] Epoch: 1 Batch: 5387/20099 (26.80%) Loss: 2.209598 LR: 0.00002948 +[10:04:52] Epoch: 1 Batch: 5388/20099 (26.81%) Loss: 2.322072 LR: 0.00002948 +[10:04:54] Epoch: 1 Batch: 5389/20099 (26.81%) Loss: 1.984072 LR: 0.00002948 +[10:04:55] Epoch: 1 Batch: 5390/20099 (26.82%) Loss: 2.172188 LR: 0.00002948 +[10:04:57] Epoch: 1 Batch: 5391/20099 (26.82%) Loss: 1.974128 LR: 0.00002948 +[10:04:59] Epoch: 1 Batch: 5392/20099 (26.83%) Loss: 1.827538 LR: 0.00002948 +[10:05:01] Epoch: 1 Batch: 5393/20099 (26.83%) Loss: 2.208696 LR: 0.00002947 +[10:05:03] Epoch: 1 Batch: 5394/20099 (26.84%) Loss: 1.988352 LR: 0.00002947 +[10:05:04] Epoch: 1 Batch: 5395/20099 (26.84%) Loss: 1.917780 LR: 0.00002947 +[10:05:06] Epoch: 1 Batch: 5396/20099 (26.85%) Loss: 2.378491 LR: 0.00002947 +[10:05:08] Epoch: 1 Batch: 5397/20099 (26.85%) Loss: 1.977208 LR: 0.00002947 +[10:05:10] Epoch: 1 Batch: 5398/20099 (26.86%) Loss: 2.216182 LR: 0.00002947 +[10:05:11] Epoch: 1 Batch: 5399/20099 (26.86%) Loss: 2.188872 LR: 0.00002947 +[10:05:17] >> Cleaned up old temp checkpoint: epoch1_step3400 +[10:05:17] >> Temp checkpoint 
saved: epoch1_step5400, size: 0.1693 GB +[10:05:17] Epoch: 1 Batch: 5400/20099 (26.87%) Loss: 2.098447 LR: 0.00002947 +[10:05:19] Epoch: 1 Batch: 5401/20099 (26.87%) Loss: 2.155879 LR: 0.00002947 +[10:05:20] Epoch: 1 Batch: 5402/20099 (26.88%) Loss: 2.188242 LR: 0.00002947 +[10:05:22] Epoch: 1 Batch: 5403/20099 (26.88%) Loss: 2.194258 LR: 0.00002947 +[10:05:24] Epoch: 1 Batch: 5404/20099 (26.89%) Loss: 2.280442 LR: 0.00002947 +[10:05:26] Epoch: 1 Batch: 5405/20099 (26.89%) Loss: 2.206422 LR: 0.00002947 +[10:05:27] Epoch: 1 Batch: 5406/20099 (26.90%) Loss: 2.300056 LR: 0.00002947 +[10:05:29] Epoch: 1 Batch: 5407/20099 (26.90%) Loss: 2.122420 LR: 0.00002946 +[10:05:31] Epoch: 1 Batch: 5408/20099 (26.91%) Loss: 2.074892 LR: 0.00002946 +[10:05:33] Epoch: 1 Batch: 5409/20099 (26.91%) Loss: 2.045554 LR: 0.00002946 +[10:05:34] Epoch: 1 Batch: 5410/20099 (26.92%) Loss: 1.814148 LR: 0.00002946 +[10:05:36] Epoch: 1 Batch: 5411/20099 (26.92%) Loss: 2.350235 LR: 0.00002946 +[10:05:38] Epoch: 1 Batch: 5412/20099 (26.93%) Loss: 2.165095 LR: 0.00002946 +[10:05:40] Epoch: 1 Batch: 5413/20099 (26.93%) Loss: 2.094590 LR: 0.00002946 +[10:05:42] Epoch: 1 Batch: 5414/20099 (26.94%) Loss: 2.133256 LR: 0.00002946 +[10:05:43] Epoch: 1 Batch: 5415/20099 (26.94%) Loss: 1.834228 LR: 0.00002946 +[10:05:45] Epoch: 1 Batch: 5416/20099 (26.95%) Loss: 2.399768 LR: 0.00002946 +[10:05:47] Epoch: 1 Batch: 5417/20099 (26.95%) Loss: 2.149693 LR: 0.00002946 +[10:05:49] Epoch: 1 Batch: 5418/20099 (26.96%) Loss: 2.088020 LR: 0.00002946 +[10:05:50] Epoch: 1 Batch: 5419/20099 (26.96%) Loss: 1.924644 LR: 0.00002946 +[10:05:52] Epoch: 1 Batch: 5420/20099 (26.97%) Loss: 2.209120 LR: 0.00002946 +[10:05:54] Epoch: 1 Batch: 5421/20099 (26.97%) Loss: 1.832779 LR: 0.00002945 +[10:05:56] Epoch: 1 Batch: 5422/20099 (26.98%) Loss: 2.265687 LR: 0.00002945 +[10:05:58] Epoch: 1 Batch: 5423/20099 (26.98%) Loss: 2.064221 LR: 0.00002945 +[10:05:59] Epoch: 1 Batch: 5424/20099 (26.99%) Loss: 2.100106 LR: 0.00002945 
+[10:06:01] Epoch: 1 Batch: 5425/20099 (26.99%) Loss: 2.208666 LR: 0.00002945 +[10:06:03] Epoch: 1 Batch: 5426/20099 (27.00%) Loss: 2.229734 LR: 0.00002945 +[10:06:05] Epoch: 1 Batch: 5427/20099 (27.00%) Loss: 1.895799 LR: 0.00002945 +[10:06:06] Epoch: 1 Batch: 5428/20099 (27.01%) Loss: 2.145902 LR: 0.00002945 +[10:06:08] Epoch: 1 Batch: 5429/20099 (27.01%) Loss: 2.554694 LR: 0.00002945 +[10:06:10] Epoch: 1 Batch: 5430/20099 (27.02%) Loss: 2.137693 LR: 0.00002945 +[10:06:12] Epoch: 1 Batch: 5431/20099 (27.02%) Loss: 2.247862 LR: 0.00002945 +[10:06:14] Epoch: 1 Batch: 5432/20099 (27.03%) Loss: 2.299312 LR: 0.00002945 +[10:06:15] Epoch: 1 Batch: 5433/20099 (27.03%) Loss: 2.245755 LR: 0.00002945 +[10:06:17] Epoch: 1 Batch: 5434/20099 (27.04%) Loss: 2.007467 LR: 0.00002945 +[10:06:19] Epoch: 1 Batch: 5435/20099 (27.04%) Loss: 2.135342 LR: 0.00002944 +[10:06:21] Epoch: 1 Batch: 5436/20099 (27.05%) Loss: 2.060696 LR: 0.00002944 +[10:06:22] Epoch: 1 Batch: 5437/20099 (27.05%) Loss: 2.141539 LR: 0.00002944 +[10:06:24] Epoch: 1 Batch: 5438/20099 (27.06%) Loss: 2.083541 LR: 0.00002944 +[10:06:26] Epoch: 1 Batch: 5439/20099 (27.06%) Loss: 2.125227 LR: 0.00002944 +[10:06:28] Epoch: 1 Batch: 5440/20099 (27.07%) Loss: 2.110991 LR: 0.00002944 +[10:06:29] Epoch: 1 Batch: 5441/20099 (27.07%) Loss: 2.269356 LR: 0.00002944 +[10:06:31] Epoch: 1 Batch: 5442/20099 (27.08%) Loss: 1.770213 LR: 0.00002944 +[10:06:33] Epoch: 1 Batch: 5443/20099 (27.08%) Loss: 1.907457 LR: 0.00002944 +[10:06:35] Epoch: 1 Batch: 5444/20099 (27.09%) Loss: 2.083388 LR: 0.00002944 +[10:06:37] Epoch: 1 Batch: 5445/20099 (27.09%) Loss: 2.446862 LR: 0.00002944 +[10:06:38] Epoch: 1 Batch: 5446/20099 (27.10%) Loss: 1.941616 LR: 0.00002944 +[10:06:40] Epoch: 1 Batch: 5447/20099 (27.10%) Loss: 2.333537 LR: 0.00002944 +[10:06:42] Epoch: 1 Batch: 5448/20099 (27.11%) Loss: 1.777848 LR: 0.00002944 +[10:06:44] Epoch: 1 Batch: 5449/20099 (27.11%) Loss: 2.244641 LR: 0.00002943 +[10:06:45] Epoch: 1 Batch: 5450/20099 (27.12%) 
Loss: 2.130341 LR: 0.00002943 +[10:06:47] Epoch: 1 Batch: 5451/20099 (27.12%) Loss: 2.365151 LR: 0.00002943 +[10:06:49] Epoch: 1 Batch: 5452/20099 (27.13%) Loss: 2.368598 LR: 0.00002943 +[10:06:51] Epoch: 1 Batch: 5453/20099 (27.13%) Loss: 2.452541 LR: 0.00002943 +[10:06:53] Epoch: 1 Batch: 5454/20099 (27.14%) Loss: 1.995470 LR: 0.00002943 +[10:06:54] Epoch: 1 Batch: 5455/20099 (27.14%) Loss: 2.118965 LR: 0.00002943 +[10:06:56] Epoch: 1 Batch: 5456/20099 (27.15%) Loss: 2.170623 LR: 0.00002943 +[10:06:58] Epoch: 1 Batch: 5457/20099 (27.15%) Loss: 2.311609 LR: 0.00002943 +[10:07:00] Epoch: 1 Batch: 5458/20099 (27.16%) Loss: 1.917262 LR: 0.00002943 +[10:07:01] Epoch: 1 Batch: 5459/20099 (27.16%) Loss: 2.218131 LR: 0.00002943 +[10:07:03] Epoch: 1 Batch: 5460/20099 (27.17%) Loss: 2.243514 LR: 0.00002943 +[10:07:05] Epoch: 1 Batch: 5461/20099 (27.17%) Loss: 2.038677 LR: 0.00002943 +[10:07:07] Epoch: 1 Batch: 5462/20099 (27.18%) Loss: 2.286114 LR: 0.00002943 +[10:07:08] Epoch: 1 Batch: 5463/20099 (27.18%) Loss: 1.758963 LR: 0.00002942 +[10:07:10] Epoch: 1 Batch: 5464/20099 (27.19%) Loss: 2.262538 LR: 0.00002942 +[10:07:12] Epoch: 1 Batch: 5465/20099 (27.19%) Loss: 2.106436 LR: 0.00002942 +[10:07:14] Epoch: 1 Batch: 5466/20099 (27.20%) Loss: 2.104243 LR: 0.00002942 +[10:07:16] Epoch: 1 Batch: 5467/20099 (27.20%) Loss: 1.982912 LR: 0.00002942 +[10:07:17] Epoch: 1 Batch: 5468/20099 (27.21%) Loss: 2.145750 LR: 0.00002942 +[10:07:19] Epoch: 1 Batch: 5469/20099 (27.21%) Loss: 2.372891 LR: 0.00002942 +[10:07:21] Epoch: 1 Batch: 5470/20099 (27.22%) Loss: 2.088860 LR: 0.00002942 +[10:07:23] Epoch: 1 Batch: 5471/20099 (27.22%) Loss: 2.470699 LR: 0.00002942 +[10:07:24] Epoch: 1 Batch: 5472/20099 (27.23%) Loss: 2.321858 LR: 0.00002942 +[10:07:26] Epoch: 1 Batch: 5473/20099 (27.23%) Loss: 1.932436 LR: 0.00002942 +[10:07:28] Epoch: 1 Batch: 5474/20099 (27.24%) Loss: 2.066908 LR: 0.00002942 +[10:07:30] Epoch: 1 Batch: 5475/20099 (27.24%) Loss: 2.051298 LR: 0.00002942 +[10:07:32] Epoch: 
1 Batch: 5476/20099 (27.25%) Loss: 2.022308 LR: 0.00002942 +[10:07:33] Epoch: 1 Batch: 5477/20099 (27.25%) Loss: 2.050642 LR: 0.00002941 +[10:07:35] Epoch: 1 Batch: 5478/20099 (27.26%) Loss: 2.252359 LR: 0.00002941 +[10:07:37] Epoch: 1 Batch: 5479/20099 (27.26%) Loss: 2.110823 LR: 0.00002941 +[10:07:39] Epoch: 1 Batch: 5480/20099 (27.27%) Loss: 2.200112 LR: 0.00002941 +[10:07:40] Epoch: 1 Batch: 5481/20099 (27.27%) Loss: 2.105589 LR: 0.00002941 +[10:07:42] Epoch: 1 Batch: 5482/20099 (27.27%) Loss: 2.304595 LR: 0.00002941 +[10:07:44] Epoch: 1 Batch: 5483/20099 (27.28%) Loss: 2.365151 LR: 0.00002941 +[10:07:46] Epoch: 1 Batch: 5484/20099 (27.28%) Loss: 2.251320 LR: 0.00002941 +[10:07:47] Epoch: 1 Batch: 5485/20099 (27.29%) Loss: 2.566852 LR: 0.00002941 +[10:07:49] Epoch: 1 Batch: 5486/20099 (27.29%) Loss: 2.216344 LR: 0.00002941 +[10:07:51] Epoch: 1 Batch: 5487/20099 (27.30%) Loss: 2.187454 LR: 0.00002941 +[10:07:53] Epoch: 1 Batch: 5488/20099 (27.30%) Loss: 2.055529 LR: 0.00002941 +[10:07:55] Epoch: 1 Batch: 5489/20099 (27.31%) Loss: 2.494184 LR: 0.00002941 +[10:07:56] Epoch: 1 Batch: 5490/20099 (27.31%) Loss: 2.404665 LR: 0.00002941 +[10:07:58] Epoch: 1 Batch: 5491/20099 (27.32%) Loss: 2.371977 LR: 0.00002940 +[10:08:00] Epoch: 1 Batch: 5492/20099 (27.32%) Loss: 2.162741 LR: 0.00002940 +[10:08:02] Epoch: 1 Batch: 5493/20099 (27.33%) Loss: 2.171967 LR: 0.00002940 +[10:08:03] Epoch: 1 Batch: 5494/20099 (27.33%) Loss: 2.256630 LR: 0.00002940 +[10:08:05] Epoch: 1 Batch: 5495/20099 (27.34%) Loss: 2.235033 LR: 0.00002940 +[10:08:07] Epoch: 1 Batch: 5496/20099 (27.34%) Loss: 2.151681 LR: 0.00002940 +[10:08:09] Epoch: 1 Batch: 5497/20099 (27.35%) Loss: 2.196597 LR: 0.00002940 +[10:08:11] Epoch: 1 Batch: 5498/20099 (27.35%) Loss: 2.422774 LR: 0.00002940 +[10:08:12] Epoch: 1 Batch: 5499/20099 (27.36%) Loss: 2.383378 LR: 0.00002940 +[10:08:14] >> Evaluating batch 0 +[10:08:15] >> Evaluating batch 1 +[10:08:16] >> Evaluating batch 2 +[10:08:17] >> Evaluating batch 3 
+[10:08:18] >> Evaluating batch 4 +[10:08:19] >> Evaluating batch 5 +[10:08:20] >> Evaluating batch 6 +[10:08:21] >> Evaluating batch 7 +[10:08:22] >> Evaluating batch 8 +[10:08:23] >> Evaluating batch 9 +[10:08:24] >> Evaluating batch 10 +[10:08:25] >> Evaluating batch 11 +[10:08:26] >> Evaluating batch 12 +[10:08:27] >> Evaluating batch 13 +[10:08:28] >> Evaluating batch 14 +[10:08:29] >> Evaluating batch 15 +[10:08:30] >> Evaluating batch 16 +[10:08:31] Epoch: 1 Step: 5500/20099 Evaluation: +[10:08:31] Avg Loss Since Last Eval: 2.1600 Val Loss: 2.2059 Validation loss delta: -0.0211 Perplexity: 9.0784 LR: 0.00002940 +[10:08:34] >> Checkpoint saved: epoch1_step5500, size: 0.1693 GB +[10:08:34] Epoch: 1 Batch: 5500/20099 (27.36%) Loss: 2.135148 LR: 0.00002940 +[10:08:36] Epoch: 1 Batch: 5501/20099 (27.37%) Loss: 2.050394 LR: 0.00002940 +[10:08:38] Epoch: 1 Batch: 5502/20099 (27.37%) Loss: 2.382435 LR: 0.00002940 +[10:08:39] Epoch: 1 Batch: 5503/20099 (27.38%) Loss: 2.240859 LR: 0.00002940 +[10:08:41] Epoch: 1 Batch: 5504/20099 (27.38%) Loss: 2.169502 LR: 0.00002940 +[10:08:43] Epoch: 1 Batch: 5505/20099 (27.39%) Loss: 2.138979 LR: 0.00002939 +[10:08:45] Epoch: 1 Batch: 5506/20099 (27.39%) Loss: 2.000754 LR: 0.00002939 +[10:08:47] Epoch: 1 Batch: 5507/20099 (27.40%) Loss: 2.019054 LR: 0.00002939 +[10:08:48] Epoch: 1 Batch: 5508/20099 (27.40%) Loss: 1.561484 LR: 0.00002939 +[10:08:50] Epoch: 1 Batch: 5509/20099 (27.41%) Loss: 2.237347 LR: 0.00002939 +[10:08:52] Epoch: 1 Batch: 5510/20099 (27.41%) Loss: 2.209234 LR: 0.00002939 +[10:08:54] Epoch: 1 Batch: 5511/20099 (27.42%) Loss: 1.989541 LR: 0.00002939 +[10:08:55] Epoch: 1 Batch: 5512/20099 (27.42%) Loss: 2.012194 LR: 0.00002939 +[10:08:57] Epoch: 1 Batch: 5513/20099 (27.43%) Loss: 2.156382 LR: 0.00002939 +[10:08:59] Epoch: 1 Batch: 5514/20099 (27.43%) Loss: 2.312996 LR: 0.00002939 +[10:09:01] Epoch: 1 Batch: 5515/20099 (27.44%) Loss: 2.292919 LR: 0.00002939 +[10:09:03] Epoch: 1 Batch: 5516/20099 (27.44%) Loss: 
2.045623 LR: 0.00002939 +[10:09:04] Epoch: 1 Batch: 5517/20099 (27.45%) Loss: 2.201769 LR: 0.00002939 +[10:09:06] Epoch: 1 Batch: 5518/20099 (27.45%) Loss: 2.080319 LR: 0.00002939 +[10:09:08] Epoch: 1 Batch: 5519/20099 (27.46%) Loss: 2.302050 LR: 0.00002938 +[10:09:10] Epoch: 1 Batch: 5520/20099 (27.46%) Loss: 2.330819 LR: 0.00002938 +[10:09:12] Epoch: 1 Batch: 5521/20099 (27.47%) Loss: 1.991016 LR: 0.00002938 +[10:09:13] Epoch: 1 Batch: 5522/20099 (27.47%) Loss: 2.126603 LR: 0.00002938 +[10:09:15] Epoch: 1 Batch: 5523/20099 (27.48%) Loss: 1.951139 LR: 0.00002938 +[10:09:17] Epoch: 1 Batch: 5524/20099 (27.48%) Loss: 2.101253 LR: 0.00002938 +[10:09:19] Epoch: 1 Batch: 5525/20099 (27.49%) Loss: 2.094637 LR: 0.00002938 +[10:09:21] Epoch: 1 Batch: 5526/20099 (27.49%) Loss: 2.074740 LR: 0.00002938 +[10:09:22] Epoch: 1 Batch: 5527/20099 (27.50%) Loss: 2.380671 LR: 0.00002938 +[10:09:24] Epoch: 1 Batch: 5528/20099 (27.50%) Loss: 2.139341 LR: 0.00002938 +[10:09:26] Epoch: 1 Batch: 5529/20099 (27.51%) Loss: 1.942733 LR: 0.00002938 +[10:09:28] Epoch: 1 Batch: 5530/20099 (27.51%) Loss: 2.229770 LR: 0.00002938 +[10:09:29] Epoch: 1 Batch: 5531/20099 (27.52%) Loss: 1.756896 LR: 0.00002938 +[10:09:31] Epoch: 1 Batch: 5532/20099 (27.52%) Loss: 2.112085 LR: 0.00002938 +[10:09:33] Epoch: 1 Batch: 5533/20099 (27.53%) Loss: 2.402185 LR: 0.00002937 +[10:09:35] Epoch: 1 Batch: 5534/20099 (27.53%) Loss: 2.091736 LR: 0.00002937 +[10:09:36] Epoch: 1 Batch: 5535/20099 (27.54%) Loss: 2.148349 LR: 0.00002937 +[10:09:38] Epoch: 1 Batch: 5536/20099 (27.54%) Loss: 2.043629 LR: 0.00002937 +[10:09:40] Epoch: 1 Batch: 5537/20099 (27.55%) Loss: 2.483744 LR: 0.00002937 +[10:09:42] Epoch: 1 Batch: 5538/20099 (27.55%) Loss: 1.855629 LR: 0.00002937 +[10:09:43] Epoch: 1 Batch: 5539/20099 (27.56%) Loss: 2.267557 LR: 0.00002937 +[10:09:45] Epoch: 1 Batch: 5540/20099 (27.56%) Loss: 2.241838 LR: 0.00002937 +[10:09:47] Epoch: 1 Batch: 5541/20099 (27.57%) Loss: 2.420246 LR: 0.00002937 +[10:09:49] Epoch: 1 
Batch: 5542/20099 (27.57%) Loss: 2.143618 LR: 0.00002937 +[10:09:50] Epoch: 1 Batch: 5543/20099 (27.58%) Loss: 1.988189 LR: 0.00002937 +[10:09:52] Epoch: 1 Batch: 5544/20099 (27.58%) Loss: 2.405669 LR: 0.00002937 +[10:09:54] Epoch: 1 Batch: 5545/20099 (27.59%) Loss: 2.077928 LR: 0.00002937 +[10:09:56] Epoch: 1 Batch: 5546/20099 (27.59%) Loss: 2.467732 LR: 0.00002937 +[10:09:57] Epoch: 1 Batch: 5547/20099 (27.60%) Loss: 2.068269 LR: 0.00002936 +[10:09:59] Epoch: 1 Batch: 5548/20099 (27.60%) Loss: 2.303580 LR: 0.00002936 +[10:10:01] Epoch: 1 Batch: 5549/20099 (27.61%) Loss: 1.947271 LR: 0.00002936 +[10:10:03] Epoch: 1 Batch: 5550/20099 (27.61%) Loss: 2.277041 LR: 0.00002936 +[10:10:05] Epoch: 1 Batch: 5551/20099 (27.62%) Loss: 2.111787 LR: 0.00002936 +[10:10:06] Epoch: 1 Batch: 5552/20099 (27.62%) Loss: 2.299436 LR: 0.00002936 +[10:10:08] Epoch: 1 Batch: 5553/20099 (27.63%) Loss: 2.185243 LR: 0.00002936 +[10:10:10] Epoch: 1 Batch: 5554/20099 (27.63%) Loss: 1.949589 LR: 0.00002936 +[10:10:12] Epoch: 1 Batch: 5555/20099 (27.64%) Loss: 2.251114 LR: 0.00002936 +[10:10:13] Epoch: 1 Batch: 5556/20099 (27.64%) Loss: 1.953321 LR: 0.00002936 +[10:10:15] Epoch: 1 Batch: 5557/20099 (27.65%) Loss: 1.925995 LR: 0.00002936 +[10:10:17] Epoch: 1 Batch: 5558/20099 (27.65%) Loss: 1.872311 LR: 0.00002936 +[10:10:19] Epoch: 1 Batch: 5559/20099 (27.66%) Loss: 2.090401 LR: 0.00002936 +[10:10:21] Epoch: 1 Batch: 5560/20099 (27.66%) Loss: 2.046410 LR: 0.00002936 +[10:10:22] Epoch: 1 Batch: 5561/20099 (27.67%) Loss: 2.167044 LR: 0.00002935 +[10:10:24] Epoch: 1 Batch: 5562/20099 (27.67%) Loss: 2.346011 LR: 0.00002935 +[10:10:26] Epoch: 1 Batch: 5563/20099 (27.68%) Loss: 2.250489 LR: 0.00002935 +[10:10:28] Epoch: 1 Batch: 5564/20099 (27.68%) Loss: 2.084104 LR: 0.00002935 +[10:10:29] Epoch: 1 Batch: 5565/20099 (27.69%) Loss: 1.955862 LR: 0.00002935 +[10:10:31] Epoch: 1 Batch: 5566/20099 (27.69%) Loss: 2.513808 LR: 0.00002935 +[10:10:33] Epoch: 1 Batch: 5567/20099 (27.70%) Loss: 2.211601 LR: 
0.00002935 +[10:10:35] Epoch: 1 Batch: 5568/20099 (27.70%) Loss: 2.165299 LR: 0.00002935 +[10:10:36] Epoch: 1 Batch: 5569/20099 (27.71%) Loss: 2.359883 LR: 0.00002935 +[10:10:38] Epoch: 1 Batch: 5570/20099 (27.71%) Loss: 1.889619 LR: 0.00002935 +[10:10:40] Epoch: 1 Batch: 5571/20099 (27.72%) Loss: 2.032293 LR: 0.00002935 +[10:10:42] Epoch: 1 Batch: 5572/20099 (27.72%) Loss: 1.867296 LR: 0.00002935 +[10:10:44] Epoch: 1 Batch: 5573/20099 (27.73%) Loss: 2.366440 LR: 0.00002935 +[10:10:45] Epoch: 1 Batch: 5574/20099 (27.73%) Loss: 1.978371 LR: 0.00002935 +[10:10:47] Epoch: 1 Batch: 5575/20099 (27.74%) Loss: 2.119112 LR: 0.00002934 +[10:10:49] Epoch: 1 Batch: 5576/20099 (27.74%) Loss: 2.201457 LR: 0.00002934 +[10:10:51] Epoch: 1 Batch: 5577/20099 (27.75%) Loss: 2.335431 LR: 0.00002934 +[10:10:53] Epoch: 1 Batch: 5578/20099 (27.75%) Loss: 2.025382 LR: 0.00002934 +[10:10:54] Epoch: 1 Batch: 5579/20099 (27.76%) Loss: 2.122647 LR: 0.00002934 +[10:10:56] Epoch: 1 Batch: 5580/20099 (27.76%) Loss: 1.848374 LR: 0.00002934 +[10:10:58] Epoch: 1 Batch: 5581/20099 (27.77%) Loss: 2.231189 LR: 0.00002934 +[10:11:00] Epoch: 1 Batch: 5582/20099 (27.77%) Loss: 2.164372 LR: 0.00002934 +[10:11:01] Epoch: 1 Batch: 5583/20099 (27.78%) Loss: 2.229421 LR: 0.00002934 +[10:11:03] Epoch: 1 Batch: 5584/20099 (27.78%) Loss: 2.617900 LR: 0.00002934 +[10:11:05] Epoch: 1 Batch: 5585/20099 (27.79%) Loss: 2.127577 LR: 0.00002934 +[10:11:07] Epoch: 1 Batch: 5586/20099 (27.79%) Loss: 2.381475 LR: 0.00002934 +[10:11:09] Epoch: 1 Batch: 5587/20099 (27.80%) Loss: 2.089231 LR: 0.00002934 +[10:11:10] Epoch: 1 Batch: 5588/20099 (27.80%) Loss: 2.022465 LR: 0.00002934 +[10:11:12] Epoch: 1 Batch: 5589/20099 (27.81%) Loss: 1.922452 LR: 0.00002933 +[10:11:14] Epoch: 1 Batch: 5590/20099 (27.81%) Loss: 1.970361 LR: 0.00002933 +[10:11:16] Epoch: 1 Batch: 5591/20099 (27.82%) Loss: 2.031234 LR: 0.00002933 +[10:11:17] Epoch: 1 Batch: 5592/20099 (27.82%) Loss: 2.230311 LR: 0.00002933 +[10:11:19] Epoch: 1 Batch: 5593/20099 
(27.83%) Loss: 1.999375 LR: 0.00002933 +[10:11:21] Epoch: 1 Batch: 5594/20099 (27.83%) Loss: 1.812546 LR: 0.00002933 +[10:11:23] Epoch: 1 Batch: 5595/20099 (27.84%) Loss: 2.361866 LR: 0.00002933 +[10:11:25] Epoch: 1 Batch: 5596/20099 (27.84%) Loss: 2.086416 LR: 0.00002932 +[10:11:26] Epoch: 1 Batch: 5597/20099 (27.85%) Loss: 2.132535 LR: 0.00002932 +[10:11:28] Epoch: 1 Batch: 5598/20099 (27.85%) Loss: 2.180308 LR: 0.00002932 +[10:11:30] Epoch: 1 Batch: 5599/20099 (27.86%) Loss: 2.239886 LR: 0.00002932 +[10:11:35] >> Cleaned up old temp checkpoint: epoch1_step3600 +[10:11:35] >> Temp checkpoint saved: epoch1_step5600, size: 0.1693 GB +[10:11:35] Epoch: 1 Batch: 5600/20099 (27.86%) Loss: 1.894940 LR: 0.00002932 +[10:11:37] Epoch: 1 Batch: 5601/20099 (27.87%) Loss: 2.414137 LR: 0.00002932 +[10:11:39] Epoch: 1 Batch: 5602/20099 (27.87%) Loss: 2.052530 LR: 0.00002932 +[10:11:40] Epoch: 1 Batch: 5603/20099 (27.88%) Loss: 2.082198 LR: 0.00002932 +[10:11:42] Epoch: 1 Batch: 5604/20099 (27.88%) Loss: 1.907498 LR: 0.00002932 +[10:11:44] Epoch: 1 Batch: 5605/20099 (27.89%) Loss: 1.994053 LR: 0.00002932 +[10:11:46] Epoch: 1 Batch: 5606/20099 (27.89%) Loss: 2.284043 LR: 0.00002932 +[10:11:47] Epoch: 1 Batch: 5607/20099 (27.90%) Loss: 1.969285 LR: 0.00002932 +[10:11:49] Epoch: 1 Batch: 5608/20099 (27.90%) Loss: 2.355072 LR: 0.00002932 +[10:11:51] Epoch: 1 Batch: 5609/20099 (27.91%) Loss: 2.029380 LR: 0.00002932 +[10:11:53] Epoch: 1 Batch: 5610/20099 (27.91%) Loss: 2.153576 LR: 0.00002931 +[10:11:55] Epoch: 1 Batch: 5611/20099 (27.92%) Loss: 2.069564 LR: 0.00002931 +[10:11:56] Epoch: 1 Batch: 5612/20099 (27.92%) Loss: 2.014031 LR: 0.00002931 +[10:11:58] Epoch: 1 Batch: 5613/20099 (27.93%) Loss: 1.939979 LR: 0.00002931 +[10:12:00] Epoch: 1 Batch: 5614/20099 (27.93%) Loss: 2.041470 LR: 0.00002931 +[10:12:02] Epoch: 1 Batch: 5615/20099 (27.94%) Loss: 2.142931 LR: 0.00002931 +[10:12:03] Epoch: 1 Batch: 5616/20099 (27.94%) Loss: 2.008445 LR: 0.00002931 +[10:12:05] Epoch: 1 Batch: 
5617/20099 (27.95%) Loss: 2.145051 LR: 0.00002931 +[10:12:07] Epoch: 1 Batch: 5618/20099 (27.95%) Loss: 2.155002 LR: 0.00002931 +[10:12:09] Epoch: 1 Batch: 5619/20099 (27.96%) Loss: 1.986511 LR: 0.00002931 +[10:12:11] Epoch: 1 Batch: 5620/20099 (27.96%) Loss: 2.127598 LR: 0.00002931 +[10:12:12] Epoch: 1 Batch: 5621/20099 (27.97%) Loss: 1.629275 LR: 0.00002931 +[10:12:14] Epoch: 1 Batch: 5622/20099 (27.97%) Loss: 2.163292 LR: 0.00002931 +[10:12:16] Epoch: 1 Batch: 5623/20099 (27.98%) Loss: 2.116549 LR: 0.00002931 +[10:12:18] Epoch: 1 Batch: 5624/20099 (27.98%) Loss: 2.386443 LR: 0.00002930 +[10:12:20] Epoch: 1 Batch: 5625/20099 (27.99%) Loss: 2.358570 LR: 0.00002930 +[10:12:21] Epoch: 1 Batch: 5626/20099 (27.99%) Loss: 2.030927 LR: 0.00002930 +[10:12:23] Epoch: 1 Batch: 5627/20099 (28.00%) Loss: 2.140235 LR: 0.00002930 +[10:12:25] Epoch: 1 Batch: 5628/20099 (28.00%) Loss: 2.399337 LR: 0.00002930 +[10:12:27] Epoch: 1 Batch: 5629/20099 (28.01%) Loss: 2.120539 LR: 0.00002930 +[10:12:28] Epoch: 1 Batch: 5630/20099 (28.01%) Loss: 2.024320 LR: 0.00002930 +[10:12:30] Epoch: 1 Batch: 5631/20099 (28.02%) Loss: 2.009178 LR: 0.00002930 +[10:12:32] Epoch: 1 Batch: 5632/20099 (28.02%) Loss: 2.371442 LR: 0.00002930 +[10:12:34] Epoch: 1 Batch: 5633/20099 (28.03%) Loss: 2.062345 LR: 0.00002930 +[10:12:35] Epoch: 1 Batch: 5634/20099 (28.03%) Loss: 2.068314 LR: 0.00002930 +[10:12:37] Epoch: 1 Batch: 5635/20099 (28.04%) Loss: 2.095057 LR: 0.00002930 +[10:12:39] Epoch: 1 Batch: 5636/20099 (28.04%) Loss: 2.359134 LR: 0.00002930 +[10:12:41] Epoch: 1 Batch: 5637/20099 (28.05%) Loss: 2.065557 LR: 0.00002930 +[10:12:43] Epoch: 1 Batch: 5638/20099 (28.05%) Loss: 2.151374 LR: 0.00002929 +[10:12:44] Epoch: 1 Batch: 5639/20099 (28.06%) Loss: 2.017007 LR: 0.00002929 +[10:12:46] Epoch: 1 Batch: 5640/20099 (28.06%) Loss: 2.320213 LR: 0.00002929 +[10:12:48] Epoch: 1 Batch: 5641/20099 (28.07%) Loss: 2.116705 LR: 0.00002929 +[10:12:50] Epoch: 1 Batch: 5642/20099 (28.07%) Loss: 1.983116 LR: 0.00002929 
+[10:12:51] Epoch: 1 Batch: 5643/20099 (28.08%) Loss: 1.936975 LR: 0.00002929 +[10:12:53] Epoch: 1 Batch: 5644/20099 (28.08%) Loss: 1.957347 LR: 0.00002929 +[10:12:55] Epoch: 1 Batch: 5645/20099 (28.09%) Loss: 2.084813 LR: 0.00002929 +[10:12:57] Epoch: 1 Batch: 5646/20099 (28.09%) Loss: 2.364551 LR: 0.00002929 +[10:12:58] Epoch: 1 Batch: 5647/20099 (28.10%) Loss: 2.107544 LR: 0.00002929 +[10:13:00] Epoch: 1 Batch: 5648/20099 (28.10%) Loss: 1.762895 LR: 0.00002929 +[10:13:02] Epoch: 1 Batch: 5649/20099 (28.11%) Loss: 1.863264 LR: 0.00002929 +[10:13:04] Epoch: 1 Batch: 5650/20099 (28.11%) Loss: 2.132744 LR: 0.00002929 +[10:13:05] Epoch: 1 Batch: 5651/20099 (28.12%) Loss: 2.136455 LR: 0.00002929 +[10:13:07] Epoch: 1 Batch: 5652/20099 (28.12%) Loss: 1.998704 LR: 0.00002928 +[10:13:09] Epoch: 1 Batch: 5653/20099 (28.13%) Loss: 2.460695 LR: 0.00002928 +[10:13:11] Epoch: 1 Batch: 5654/20099 (28.13%) Loss: 2.044469 LR: 0.00002928 +[10:13:13] Epoch: 1 Batch: 5655/20099 (28.14%) Loss: 2.005651 LR: 0.00002928 +[10:13:14] Epoch: 1 Batch: 5656/20099 (28.14%) Loss: 1.927574 LR: 0.00002928 +[10:13:16] Epoch: 1 Batch: 5657/20099 (28.15%) Loss: 2.589249 LR: 0.00002928 +[10:13:18] Epoch: 1 Batch: 5658/20099 (28.15%) Loss: 2.029427 LR: 0.00002928 +[10:13:20] Epoch: 1 Batch: 5659/20099 (28.16%) Loss: 2.304166 LR: 0.00002928 +[10:13:21] Epoch: 1 Batch: 5660/20099 (28.16%) Loss: 2.391147 LR: 0.00002928 +[10:13:23] Epoch: 1 Batch: 5661/20099 (28.17%) Loss: 1.880856 LR: 0.00002928 +[10:13:25] Epoch: 1 Batch: 5662/20099 (28.17%) Loss: 2.195905 LR: 0.00002928 +[10:13:27] Epoch: 1 Batch: 5663/20099 (28.18%) Loss: 2.084331 LR: 0.00002928 +[10:13:28] Epoch: 1 Batch: 5664/20099 (28.18%) Loss: 2.301641 LR: 0.00002928 +[10:13:30] Epoch: 1 Batch: 5665/20099 (28.19%) Loss: 2.381791 LR: 0.00002928 +[10:13:32] Epoch: 1 Batch: 5666/20099 (28.19%) Loss: 2.360234 LR: 0.00002927 +[10:13:34] Epoch: 1 Batch: 5667/20099 (28.20%) Loss: 1.816981 LR: 0.00002927 +[10:13:36] Epoch: 1 Batch: 5668/20099 (28.20%) 
Loss: 2.135162 LR: 0.00002927 +[10:13:37] Epoch: 1 Batch: 5669/20099 (28.21%) Loss: 2.242070 LR: 0.00002927 +[10:13:39] Epoch: 1 Batch: 5670/20099 (28.21%) Loss: 2.489749 LR: 0.00002927 +[10:13:41] Epoch: 1 Batch: 5671/20099 (28.22%) Loss: 1.994850 LR: 0.00002927 +[10:13:43] Epoch: 1 Batch: 5672/20099 (28.22%) Loss: 2.039171 LR: 0.00002927 +[10:13:45] Epoch: 1 Batch: 5673/20099 (28.23%) Loss: 2.356518 LR: 0.00002926 +[10:13:46] Epoch: 1 Batch: 5674/20099 (28.23%) Loss: 1.880168 LR: 0.00002926 +[10:13:48] Epoch: 1 Batch: 5675/20099 (28.24%) Loss: 2.207188 LR: 0.00002926 +[10:13:50] Epoch: 1 Batch: 5676/20099 (28.24%) Loss: 2.171134 LR: 0.00002926 +[10:13:52] Epoch: 1 Batch: 5677/20099 (28.25%) Loss: 2.059070 LR: 0.00002926 +[10:13:53] Epoch: 1 Batch: 5678/20099 (28.25%) Loss: 1.854773 LR: 0.00002926 +[10:13:55] Epoch: 1 Batch: 5679/20099 (28.26%) Loss: 2.116316 LR: 0.00002926 +[10:13:57] Epoch: 1 Batch: 5680/20099 (28.26%) Loss: 1.921187 LR: 0.00002926 +[10:13:59] Epoch: 1 Batch: 5681/20099 (28.27%) Loss: 2.105031 LR: 0.00002926 +[10:14:01] Epoch: 1 Batch: 5682/20099 (28.27%) Loss: 2.381476 LR: 0.00002926 +[10:14:02] Epoch: 1 Batch: 5683/20099 (28.28%) Loss: 1.901219 LR: 0.00002926 +[10:14:04] Epoch: 1 Batch: 5684/20099 (28.28%) Loss: 1.869229 LR: 0.00002926 +[10:14:06] Epoch: 1 Batch: 5685/20099 (28.28%) Loss: 2.090598 LR: 0.00002926 +[10:14:08] Epoch: 1 Batch: 5686/20099 (28.29%) Loss: 2.170013 LR: 0.00002926 +[10:14:09] Epoch: 1 Batch: 5687/20099 (28.29%) Loss: 2.098230 LR: 0.00002925 +[10:14:11] Epoch: 1 Batch: 5688/20099 (28.30%) Loss: 2.066526 LR: 0.00002925 +[10:14:13] Epoch: 1 Batch: 5689/20099 (28.30%) Loss: 1.836978 LR: 0.00002925 +[10:14:15] Epoch: 1 Batch: 5690/20099 (28.31%) Loss: 2.182305 LR: 0.00002925 +[10:14:16] Epoch: 1 Batch: 5691/20099 (28.31%) Loss: 1.683793 LR: 0.00002925 +[10:14:18] Epoch: 1 Batch: 5692/20099 (28.32%) Loss: 2.207219 LR: 0.00002925 +[10:14:20] Epoch: 1 Batch: 5693/20099 (28.32%) Loss: 2.218802 LR: 0.00002925 +[10:14:22] Epoch: 
1 Batch: 5694/20099 (28.33%) Loss: 1.837239 LR: 0.00002925 +[10:14:24] Epoch: 1 Batch: 5695/20099 (28.33%) Loss: 2.094657 LR: 0.00002925 +[10:14:25] Epoch: 1 Batch: 5696/20099 (28.34%) Loss: 2.188740 LR: 0.00002925 +[10:14:27] Epoch: 1 Batch: 5697/20099 (28.34%) Loss: 2.120753 LR: 0.00002925 +[10:14:29] Epoch: 1 Batch: 5698/20099 (28.35%) Loss: 2.462973 LR: 0.00002925 +[10:14:31] Epoch: 1 Batch: 5699/20099 (28.35%) Loss: 2.105696 LR: 0.00002925 +[10:14:32] Epoch: 1 Batch: 5700/20099 (28.36%) Loss: 2.024656 LR: 0.00002925 +[10:14:34] Epoch: 1 Batch: 5701/20099 (28.36%) Loss: 2.146820 LR: 0.00002924 +[10:14:36] Epoch: 1 Batch: 5702/20099 (28.37%) Loss: 1.938148 LR: 0.00002924 +[10:14:38] Epoch: 1 Batch: 5703/20099 (28.37%) Loss: 2.097735 LR: 0.00002924 +[10:14:39] Epoch: 1 Batch: 5704/20099 (28.38%) Loss: 2.119946 LR: 0.00002924 +[10:14:41] Epoch: 1 Batch: 5705/20099 (28.38%) Loss: 2.287447 LR: 0.00002924 +[10:14:43] Epoch: 1 Batch: 5706/20099 (28.39%) Loss: 2.182202 LR: 0.00002924 +[10:14:45] Epoch: 1 Batch: 5707/20099 (28.39%) Loss: 2.140720 LR: 0.00002924 +[10:14:47] Epoch: 1 Batch: 5708/20099 (28.40%) Loss: 1.705332 LR: 0.00002924 +[10:14:48] Epoch: 1 Batch: 5709/20099 (28.40%) Loss: 2.296562 LR: 0.00002924 +[10:14:50] Epoch: 1 Batch: 5710/20099 (28.41%) Loss: 2.086496 LR: 0.00002924 +[10:14:52] Epoch: 1 Batch: 5711/20099 (28.41%) Loss: 1.921529 LR: 0.00002924 +[10:14:54] Epoch: 1 Batch: 5712/20099 (28.42%) Loss: 2.034347 LR: 0.00002924 +[10:14:55] Epoch: 1 Batch: 5713/20099 (28.42%) Loss: 2.295450 LR: 0.00002924 +[10:14:57] Epoch: 1 Batch: 5714/20099 (28.43%) Loss: 1.983602 LR: 0.00002924 +[10:14:59] Epoch: 1 Batch: 5715/20099 (28.43%) Loss: 2.069678 LR: 0.00002923 +[10:15:01] Epoch: 1 Batch: 5716/20099 (28.44%) Loss: 2.047155 LR: 0.00002923 +[10:15:02] Epoch: 1 Batch: 5717/20099 (28.44%) Loss: 1.722468 LR: 0.00002923 +[10:15:04] Epoch: 1 Batch: 5718/20099 (28.45%) Loss: 2.402497 LR: 0.00002923 +[10:15:06] Epoch: 1 Batch: 5719/20099 (28.45%) Loss: 2.098434 LR: 
0.00002923 +[10:15:08] Epoch: 1 Batch: 5720/20099 (28.46%) Loss: 2.099648 LR: 0.00002923 +[10:15:09] Epoch: 1 Batch: 5721/20099 (28.46%) Loss: 2.055037 LR: 0.00002923 +[10:15:11] Epoch: 1 Batch: 5722/20099 (28.47%) Loss: 2.094568 LR: 0.00002922 +[10:15:13] Epoch: 1 Batch: 5723/20099 (28.47%) Loss: 2.250470 LR: 0.00002922 +[10:15:15] Epoch: 1 Batch: 5724/20099 (28.48%) Loss: 1.822640 LR: 0.00002922 +[10:15:16] Epoch: 1 Batch: 5725/20099 (28.48%) Loss: 2.073017 LR: 0.00002922 +[10:15:18] Epoch: 1 Batch: 5726/20099 (28.49%) Loss: 1.950014 LR: 0.00002922 +[10:15:20] Epoch: 1 Batch: 5727/20099 (28.49%) Loss: 2.179752 LR: 0.00002922 +[10:15:22] Epoch: 1 Batch: 5728/20099 (28.50%) Loss: 2.176908 LR: 0.00002922 +[10:15:24] Epoch: 1 Batch: 5729/20099 (28.50%) Loss: 2.224230 LR: 0.00002922 +[10:15:25] Epoch: 1 Batch: 5730/20099 (28.51%) Loss: 2.104697 LR: 0.00002922 +[10:15:27] Epoch: 1 Batch: 5731/20099 (28.51%) Loss: 2.072458 LR: 0.00002922 +[10:15:29] Epoch: 1 Batch: 5732/20099 (28.52%) Loss: 2.285072 LR: 0.00002922 +[10:15:31] Epoch: 1 Batch: 5733/20099 (28.52%) Loss: 2.084134 LR: 0.00002922 +[10:15:32] Epoch: 1 Batch: 5734/20099 (28.53%) Loss: 2.014092 LR: 0.00002922 +[10:15:34] Epoch: 1 Batch: 5735/20099 (28.53%) Loss: 2.317812 LR: 0.00002922 +[10:15:36] Epoch: 1 Batch: 5736/20099 (28.54%) Loss: 2.211993 LR: 0.00002921 +[10:15:38] Epoch: 1 Batch: 5737/20099 (28.54%) Loss: 2.171583 LR: 0.00002921 +[10:15:40] Epoch: 1 Batch: 5738/20099 (28.55%) Loss: 1.997504 LR: 0.00002921 +[10:15:41] Epoch: 1 Batch: 5739/20099 (28.55%) Loss: 2.006678 LR: 0.00002921 +[10:15:43] Epoch: 1 Batch: 5740/20099 (28.56%) Loss: 2.001716 LR: 0.00002921 +[10:15:45] Epoch: 1 Batch: 5741/20099 (28.56%) Loss: 1.693074 LR: 0.00002921 +[10:15:47] Epoch: 1 Batch: 5742/20099 (28.57%) Loss: 2.229803 LR: 0.00002921 +[10:15:48] Epoch: 1 Batch: 5743/20099 (28.57%) Loss: 1.969671 LR: 0.00002921 +[10:15:50] Epoch: 1 Batch: 5744/20099 (28.58%) Loss: 2.116443 LR: 0.00002921 +[10:15:52] Epoch: 1 Batch: 5745/20099 
(28.58%) Loss: 2.381115 LR: 0.00002921 +[10:15:54] Epoch: 1 Batch: 5746/20099 (28.59%) Loss: 2.061568 LR: 0.00002921 +[10:15:55] Epoch: 1 Batch: 5747/20099 (28.59%) Loss: 2.302366 LR: 0.00002921 +[10:15:57] Epoch: 1 Batch: 5748/20099 (28.60%) Loss: 2.256037 LR: 0.00002921 +[10:15:59] Epoch: 1 Batch: 5749/20099 (28.60%) Loss: 2.017595 LR: 0.00002921 +[10:16:01] Epoch: 1 Batch: 5750/20099 (28.61%) Loss: 2.151429 LR: 0.00002920 +[10:16:03] Epoch: 1 Batch: 5751/20099 (28.61%) Loss: 2.444093 LR: 0.00002920 +[10:16:04] Epoch: 1 Batch: 5752/20099 (28.62%) Loss: 2.104065 LR: 0.00002920 +[10:16:06] Epoch: 1 Batch: 5753/20099 (28.62%) Loss: 2.111111 LR: 0.00002920 +[10:16:08] Epoch: 1 Batch: 5754/20099 (28.63%) Loss: 2.157824 LR: 0.00002920 +[10:16:10] Epoch: 1 Batch: 5755/20099 (28.63%) Loss: 2.135009 LR: 0.00002920 +[10:16:11] Epoch: 1 Batch: 5756/20099 (28.64%) Loss: 2.191044 LR: 0.00002920 +[10:16:13] Epoch: 1 Batch: 5757/20099 (28.64%) Loss: 1.912259 LR: 0.00002920 +[10:16:15] Epoch: 1 Batch: 5758/20099 (28.65%) Loss: 2.119370 LR: 0.00002920 +[10:16:17] Epoch: 1 Batch: 5759/20099 (28.65%) Loss: 1.945659 LR: 0.00002920 +[10:16:19] Epoch: 1 Batch: 5760/20099 (28.66%) Loss: 2.149083 LR: 0.00002920 +[10:16:20] Epoch: 1 Batch: 5761/20099 (28.66%) Loss: 2.184052 LR: 0.00002920 +[10:16:22] Epoch: 1 Batch: 5762/20099 (28.67%) Loss: 2.345678 LR: 0.00002920 +[10:16:24] Epoch: 1 Batch: 5763/20099 (28.67%) Loss: 1.962343 LR: 0.00002920 +[10:16:26] Epoch: 1 Batch: 5764/20099 (28.68%) Loss: 2.100049 LR: 0.00002919 +[10:16:27] Epoch: 1 Batch: 5765/20099 (28.68%) Loss: 2.187373 LR: 0.00002919 +[10:16:29] Epoch: 1 Batch: 5766/20099 (28.69%) Loss: 1.905284 LR: 0.00002919 +[10:16:31] Epoch: 1 Batch: 5767/20099 (28.69%) Loss: 2.311063 LR: 0.00002919 +[10:16:33] Epoch: 1 Batch: 5768/20099 (28.70%) Loss: 1.823513 LR: 0.00002919 +[10:16:35] Epoch: 1 Batch: 5769/20099 (28.70%) Loss: 2.005320 LR: 0.00002919 +[10:16:36] Epoch: 1 Batch: 5770/20099 (28.71%) Loss: 2.136724 LR: 0.00002919 
+[10:16:38] Epoch: 1 Batch: 5771/20099 (28.71%) Loss: 1.905490 LR: 0.00002918 +[10:16:40] Epoch: 1 Batch: 5772/20099 (28.72%) Loss: 2.270180 LR: 0.00002918 +[10:16:42] Epoch: 1 Batch: 5773/20099 (28.72%) Loss: 2.004126 LR: 0.00002918 +[10:16:43] Epoch: 1 Batch: 5774/20099 (28.73%) Loss: 2.245483 LR: 0.00002918 +[10:16:45] Epoch: 1 Batch: 5775/20099 (28.73%) Loss: 1.817795 LR: 0.00002918 +[10:16:47] Epoch: 1 Batch: 5776/20099 (28.74%) Loss: 2.132286 LR: 0.00002918 +[10:16:49] Epoch: 1 Batch: 5777/20099 (28.74%) Loss: 2.200933 LR: 0.00002918 +[10:16:50] Epoch: 1 Batch: 5778/20099 (28.75%) Loss: 2.035585 LR: 0.00002918 +[10:16:52] Epoch: 1 Batch: 5779/20099 (28.75%) Loss: 2.103627 LR: 0.00002918 +[10:16:54] Epoch: 1 Batch: 5780/20099 (28.76%) Loss: 2.333654 LR: 0.00002918 +[10:16:56] Epoch: 1 Batch: 5781/20099 (28.76%) Loss: 1.916925 LR: 0.00002918 +[10:16:58] Epoch: 1 Batch: 5782/20099 (28.77%) Loss: 2.065562 LR: 0.00002918 +[10:16:59] Epoch: 1 Batch: 5783/20099 (28.77%) Loss: 2.384601 LR: 0.00002918 +[10:17:01] Epoch: 1 Batch: 5784/20099 (28.78%) Loss: 2.215942 LR: 0.00002918 +[10:17:03] Epoch: 1 Batch: 5785/20099 (28.78%) Loss: 2.281406 LR: 0.00002917 +[10:17:05] Epoch: 1 Batch: 5786/20099 (28.79%) Loss: 2.056295 LR: 0.00002917 +[10:17:06] Epoch: 1 Batch: 5787/20099 (28.79%) Loss: 2.142128 LR: 0.00002917 +[10:17:08] Epoch: 1 Batch: 5788/20099 (28.80%) Loss: 2.244149 LR: 0.00002917 +[10:17:10] Epoch: 1 Batch: 5789/20099 (28.80%) Loss: 1.840826 LR: 0.00002917 +[10:17:12] Epoch: 1 Batch: 5790/20099 (28.81%) Loss: 2.145611 LR: 0.00002917 +[10:17:14] Epoch: 1 Batch: 5791/20099 (28.81%) Loss: 2.051666 LR: 0.00002917 +[10:17:15] Epoch: 1 Batch: 5792/20099 (28.82%) Loss: 2.233139 LR: 0.00002917 +[10:17:17] Epoch: 1 Batch: 5793/20099 (28.82%) Loss: 2.360253 LR: 0.00002917 +[10:17:19] Epoch: 1 Batch: 5794/20099 (28.83%) Loss: 2.025477 LR: 0.00002917 +[10:17:21] Epoch: 1 Batch: 5795/20099 (28.83%) Loss: 2.015267 LR: 0.00002917 +[10:17:22] Epoch: 1 Batch: 5796/20099 (28.84%) 
Loss: 2.010987 LR: 0.00002917 +[10:17:24] Epoch: 1 Batch: 5797/20099 (28.84%) Loss: 2.410330 LR: 0.00002917 +[10:17:26] Epoch: 1 Batch: 5798/20099 (28.85%) Loss: 2.023354 LR: 0.00002917 +[10:17:28] Epoch: 1 Batch: 5799/20099 (28.85%) Loss: 2.053431 LR: 0.00002916 +[10:17:33] >> Cleaned up old temp checkpoint: epoch1_step3800 +[10:17:33] >> Temp checkpoint saved: epoch1_step5800, size: 0.1693 GB +[10:17:33] Epoch: 1 Batch: 5800/20099 (28.86%) Loss: 2.252732 LR: 0.00002916 +[10:17:35] Epoch: 1 Batch: 5801/20099 (28.86%) Loss: 1.946936 LR: 0.00002916 +[10:17:37] Epoch: 1 Batch: 5802/20099 (28.87%) Loss: 2.212658 LR: 0.00002916 +[10:17:38] Epoch: 1 Batch: 5803/20099 (28.87%) Loss: 2.017823 LR: 0.00002916 +[10:17:40] Epoch: 1 Batch: 5804/20099 (28.88%) Loss: 1.862233 LR: 0.00002916 +[10:17:42] Epoch: 1 Batch: 5805/20099 (28.88%) Loss: 2.395719 LR: 0.00002916 +[10:17:44] Epoch: 1 Batch: 5806/20099 (28.89%) Loss: 2.251369 LR: 0.00002915 +[10:17:45] Epoch: 1 Batch: 5807/20099 (28.89%) Loss: 2.091102 LR: 0.00002915 +[10:17:47] Epoch: 1 Batch: 5808/20099 (28.90%) Loss: 1.773430 LR: 0.00002915 +[10:17:49] Epoch: 1 Batch: 5809/20099 (28.90%) Loss: 2.162878 LR: 0.00002915 +[10:17:51] Epoch: 1 Batch: 5810/20099 (28.91%) Loss: 2.302252 LR: 0.00002915 +[10:17:52] Epoch: 1 Batch: 5811/20099 (28.91%) Loss: 2.141431 LR: 0.00002915 +[10:17:54] Epoch: 1 Batch: 5812/20099 (28.92%) Loss: 2.142077 LR: 0.00002915 +[10:17:56] Epoch: 1 Batch: 5813/20099 (28.92%) Loss: 1.947053 LR: 0.00002915 +[10:17:58] Epoch: 1 Batch: 5814/20099 (28.93%) Loss: 1.919007 LR: 0.00002915 +[10:18:00] Epoch: 1 Batch: 5815/20099 (28.93%) Loss: 2.215216 LR: 0.00002915 +[10:18:01] Epoch: 1 Batch: 5816/20099 (28.94%) Loss: 1.954203 LR: 0.00002915 +[10:18:03] Epoch: 1 Batch: 5817/20099 (28.94%) Loss: 2.168072 LR: 0.00002915 +[10:18:05] Epoch: 1 Batch: 5818/20099 (28.95%) Loss: 1.922810 LR: 0.00002915 +[10:18:07] Epoch: 1 Batch: 5819/20099 (28.95%) Loss: 2.528810 LR: 0.00002915 +[10:18:08] Epoch: 1 Batch: 5820/20099 
(28.96%) Loss: 2.311135 LR: 0.00002914 +[10:18:10] Epoch: 1 Batch: 5821/20099 (28.96%) Loss: 2.044465 LR: 0.00002914 +[10:18:12] Epoch: 1 Batch: 5822/20099 (28.97%) Loss: 2.218964 LR: 0.00002914 +[10:18:14] Epoch: 1 Batch: 5823/20099 (28.97%) Loss: 2.233440 LR: 0.00002914 +[10:18:16] Epoch: 1 Batch: 5824/20099 (28.98%) Loss: 1.923606 LR: 0.00002914 +[10:18:17] Epoch: 1 Batch: 5825/20099 (28.98%) Loss: 1.863114 LR: 0.00002914 +[10:18:19] Epoch: 1 Batch: 5826/20099 (28.99%) Loss: 2.214758 LR: 0.00002914 +[10:18:21] Epoch: 1 Batch: 5827/20099 (28.99%) Loss: 2.340299 LR: 0.00002914 +[10:18:23] Epoch: 1 Batch: 5828/20099 (29.00%) Loss: 1.868752 LR: 0.00002914 +[10:18:25] Epoch: 1 Batch: 5829/20099 (29.00%) Loss: 1.971392 LR: 0.00002914 +[10:18:26] Epoch: 1 Batch: 5830/20099 (29.01%) Loss: 2.342893 LR: 0.00002914 +[10:18:28] Epoch: 1 Batch: 5831/20099 (29.01%) Loss: 1.891534 LR: 0.00002914 +[10:18:30] Epoch: 1 Batch: 5832/20099 (29.02%) Loss: 2.075529 LR: 0.00002914 +[10:18:32] Epoch: 1 Batch: 5833/20099 (29.02%) Loss: 2.150151 LR: 0.00002914 +[10:18:33] Epoch: 1 Batch: 5834/20099 (29.03%) Loss: 1.898393 LR: 0.00002913 +[10:18:35] Epoch: 1 Batch: 5835/20099 (29.03%) Loss: 2.163815 LR: 0.00002913 +[10:18:37] Epoch: 1 Batch: 5836/20099 (29.04%) Loss: 1.882176 LR: 0.00002913 +[10:18:39] Epoch: 1 Batch: 5837/20099 (29.04%) Loss: 2.119623 LR: 0.00002913 +[10:18:40] Epoch: 1 Batch: 5838/20099 (29.05%) Loss: 2.211334 LR: 0.00002913 +[10:18:42] Epoch: 1 Batch: 5839/20099 (29.05%) Loss: 1.980462 LR: 0.00002913 +[10:18:44] Epoch: 1 Batch: 5840/20099 (29.06%) Loss: 2.149172 LR: 0.00002913 +[10:18:46] Epoch: 1 Batch: 5841/20099 (29.06%) Loss: 2.283388 LR: 0.00002912 +[10:18:47] Epoch: 1 Batch: 5842/20099 (29.07%) Loss: 2.013088 LR: 0.00002912 +[10:18:49] Epoch: 1 Batch: 5843/20099 (29.07%) Loss: 1.766804 LR: 0.00002912 +[10:18:51] Epoch: 1 Batch: 5844/20099 (29.08%) Loss: 1.829578 LR: 0.00002912 +[10:18:53] Epoch: 1 Batch: 5845/20099 (29.08%) Loss: 1.938738 LR: 0.00002912 
+[10:18:54] Epoch: 1 Batch: 5846/20099 (29.09%) Loss: 1.896629 LR: 0.00002912 +[10:18:56] Epoch: 1 Batch: 5847/20099 (29.09%) Loss: 2.148381 LR: 0.00002912 +[10:18:58] Epoch: 1 Batch: 5848/20099 (29.10%) Loss: 2.214371 LR: 0.00002912 +[10:19:00] Epoch: 1 Batch: 5849/20099 (29.10%) Loss: 2.221948 LR: 0.00002912 +[10:19:02] Epoch: 1 Batch: 5850/20099 (29.11%) Loss: 2.188726 LR: 0.00002912 +[10:19:03] Epoch: 1 Batch: 5851/20099 (29.11%) Loss: 2.305455 LR: 0.00002912 +[10:19:05] Epoch: 1 Batch: 5852/20099 (29.12%) Loss: 2.255434 LR: 0.00002912 +[10:19:07] Epoch: 1 Batch: 5853/20099 (29.12%) Loss: 2.107186 LR: 0.00002912 +[10:19:09] Epoch: 1 Batch: 5854/20099 (29.13%) Loss: 2.080858 LR: 0.00002912 +[10:19:10] Epoch: 1 Batch: 5855/20099 (29.13%) Loss: 2.363265 LR: 0.00002911 +[10:19:12] Epoch: 1 Batch: 5856/20099 (29.14%) Loss: 2.025679 LR: 0.00002911 +[10:19:14] Epoch: 1 Batch: 5857/20099 (29.14%) Loss: 2.018208 LR: 0.00002911 +[10:19:16] Epoch: 1 Batch: 5858/20099 (29.15%) Loss: 1.968245 LR: 0.00002911 +[10:19:17] Epoch: 1 Batch: 5859/20099 (29.15%) Loss: 1.882569 LR: 0.00002911 +[10:19:19] Epoch: 1 Batch: 5860/20099 (29.16%) Loss: 1.901354 LR: 0.00002911 +[10:19:21] Epoch: 1 Batch: 5861/20099 (29.16%) Loss: 1.999199 LR: 0.00002911 +[10:19:23] Epoch: 1 Batch: 5862/20099 (29.17%) Loss: 2.280003 LR: 0.00002911 +[10:19:25] Epoch: 1 Batch: 5863/20099 (29.17%) Loss: 2.045243 LR: 0.00002911 +[10:19:26] Epoch: 1 Batch: 5864/20099 (29.18%) Loss: 2.170026 LR: 0.00002911 +[10:19:28] Epoch: 1 Batch: 5865/20099 (29.18%) Loss: 2.148227 LR: 0.00002911 +[10:19:30] Epoch: 1 Batch: 5866/20099 (29.19%) Loss: 2.093106 LR: 0.00002911 +[10:19:32] Epoch: 1 Batch: 5867/20099 (29.19%) Loss: 2.228858 LR: 0.00002911 +[10:19:33] Epoch: 1 Batch: 5868/20099 (29.20%) Loss: 2.187702 LR: 0.00002911 +[10:19:36] Epoch: 1 Batch: 5869/20099 (29.20%) Loss: 1.842423 LR: 0.00002910 +[10:19:37] Epoch: 1 Batch: 5870/20099 (29.21%) Loss: 2.046632 LR: 0.00002910 +[10:19:39] Epoch: 1 Batch: 5871/20099 (29.21%) 
Loss: 2.246473 LR: 0.00002910 +[10:19:41] Epoch: 1 Batch: 5872/20099 (29.22%) Loss: 2.055605 LR: 0.00002910 +[10:19:43] Epoch: 1 Batch: 5873/20099 (29.22%) Loss: 1.927635 LR: 0.00002910 +[10:19:44] Epoch: 1 Batch: 5874/20099 (29.23%) Loss: 2.328952 LR: 0.00002910 +[10:19:46] Epoch: 1 Batch: 5875/20099 (29.23%) Loss: 2.221496 LR: 0.00002910 +[10:19:48] Epoch: 1 Batch: 5876/20099 (29.24%) Loss: 1.976502 LR: 0.00002909 +[10:19:50] Epoch: 1 Batch: 5877/20099 (29.24%) Loss: 2.202083 LR: 0.00002909 +[10:19:52] Epoch: 1 Batch: 5878/20099 (29.25%) Loss: 2.386108 LR: 0.00002909 +[10:19:53] Epoch: 1 Batch: 5879/20099 (29.25%) Loss: 2.067835 LR: 0.00002909 +[10:19:55] Epoch: 1 Batch: 5880/20099 (29.26%) Loss: 2.139376 LR: 0.00002909 +[10:19:57] Epoch: 1 Batch: 5881/20099 (29.26%) Loss: 1.653525 LR: 0.00002909 +[10:19:59] Epoch: 1 Batch: 5882/20099 (29.27%) Loss: 2.199294 LR: 0.00002909 +[10:20:01] Epoch: 1 Batch: 5883/20099 (29.27%) Loss: 2.005581 LR: 0.00002909 +[10:20:02] Epoch: 1 Batch: 5884/20099 (29.28%) Loss: 2.264829 LR: 0.00002909 +[10:20:04] Epoch: 1 Batch: 5885/20099 (29.28%) Loss: 2.250932 LR: 0.00002909 +[10:20:06] Epoch: 1 Batch: 5886/20099 (29.29%) Loss: 2.299768 LR: 0.00002909 +[10:20:08] Epoch: 1 Batch: 5887/20099 (29.29%) Loss: 2.241748 LR: 0.00002909 +[10:20:09] Epoch: 1 Batch: 5888/20099 (29.29%) Loss: 1.947544 LR: 0.00002909 +[10:20:11] Epoch: 1 Batch: 5889/20099 (29.30%) Loss: 2.396574 LR: 0.00002909 +[10:20:13] Epoch: 1 Batch: 5890/20099 (29.30%) Loss: 2.279308 LR: 0.00002908 +[10:20:15] Epoch: 1 Batch: 5891/20099 (29.31%) Loss: 2.005969 LR: 0.00002908 +[10:20:16] Epoch: 1 Batch: 5892/20099 (29.31%) Loss: 1.939330 LR: 0.00002908 +[10:20:18] Epoch: 1 Batch: 5893/20099 (29.32%) Loss: 2.073083 LR: 0.00002908 +[10:20:20] Epoch: 1 Batch: 5894/20099 (29.32%) Loss: 2.035916 LR: 0.00002908 +[10:20:22] Epoch: 1 Batch: 5895/20099 (29.33%) Loss: 2.042533 LR: 0.00002908 +[10:20:24] Epoch: 1 Batch: 5896/20099 (29.33%) Loss: 2.032497 LR: 0.00002908 +[10:20:25] Epoch: 
1 Batch: 5897/20099 (29.34%) Loss: 2.209742 LR: 0.00002907 +[10:20:27] Epoch: 1 Batch: 5898/20099 (29.34%) Loss: 2.225151 LR: 0.00002907 +[10:20:29] Epoch: 1 Batch: 5899/20099 (29.35%) Loss: 2.253237 LR: 0.00002907 +[10:20:31] Epoch: 1 Batch: 5900/20099 (29.35%) Loss: 2.321652 LR: 0.00002907 +[10:20:32] Epoch: 1 Batch: 5901/20099 (29.36%) Loss: 1.963192 LR: 0.00002907 +[10:20:34] Epoch: 1 Batch: 5902/20099 (29.36%) Loss: 2.073853 LR: 0.00002907 +[10:20:36] Epoch: 1 Batch: 5903/20099 (29.37%) Loss: 2.155034 LR: 0.00002907 +[10:20:38] Epoch: 1 Batch: 5904/20099 (29.37%) Loss: 2.205460 LR: 0.00002907 +[10:20:39] Epoch: 1 Batch: 5905/20099 (29.38%) Loss: 2.440878 LR: 0.00002907 +[10:20:41] Epoch: 1 Batch: 5906/20099 (29.38%) Loss: 2.047995 LR: 0.00002907 +[10:20:43] Epoch: 1 Batch: 5907/20099 (29.39%) Loss: 2.002074 LR: 0.00002907 +[10:20:45] Epoch: 1 Batch: 5908/20099 (29.39%) Loss: 2.339496 LR: 0.00002907 +[10:20:46] Epoch: 1 Batch: 5909/20099 (29.40%) Loss: 2.004176 LR: 0.00002907 +[10:20:48] Epoch: 1 Batch: 5910/20099 (29.40%) Loss: 1.947823 LR: 0.00002907 +[10:20:50] Epoch: 1 Batch: 5911/20099 (29.41%) Loss: 2.054211 LR: 0.00002906 +[10:20:52] Epoch: 1 Batch: 5912/20099 (29.41%) Loss: 2.130546 LR: 0.00002906 +[10:20:53] Epoch: 1 Batch: 5913/20099 (29.42%) Loss: 2.271972 LR: 0.00002906 +[10:20:55] Epoch: 1 Batch: 5914/20099 (29.42%) Loss: 2.181969 LR: 0.00002906 +[10:20:57] Epoch: 1 Batch: 5915/20099 (29.43%) Loss: 2.134753 LR: 0.00002906 +[10:20:59] Epoch: 1 Batch: 5916/20099 (29.43%) Loss: 2.048719 LR: 0.00002906 +[10:21:00] Epoch: 1 Batch: 5917/20099 (29.44%) Loss: 1.804802 LR: 0.00002906 +[10:21:02] Epoch: 1 Batch: 5918/20099 (29.44%) Loss: 1.939336 LR: 0.00002906 +[10:21:04] Epoch: 1 Batch: 5919/20099 (29.45%) Loss: 2.106017 LR: 0.00002906 +[10:21:06] Epoch: 1 Batch: 5920/20099 (29.45%) Loss: 2.244561 LR: 0.00002906 +[10:21:07] Epoch: 1 Batch: 5921/20099 (29.46%) Loss: 2.299437 LR: 0.00002906 +[10:21:09] Epoch: 1 Batch: 5922/20099 (29.46%) Loss: 2.322139 LR: 
0.00002906 +[10:21:11] Epoch: 1 Batch: 5923/20099 (29.47%) Loss: 2.005537 LR: 0.00002906 +[10:21:13] Epoch: 1 Batch: 5924/20099 (29.47%) Loss: 2.342740 LR: 0.00002906 +[10:21:15] Epoch: 1 Batch: 5925/20099 (29.48%) Loss: 2.270243 LR: 0.00002905 +[10:21:16] Epoch: 1 Batch: 5926/20099 (29.48%) Loss: 2.175947 LR: 0.00002905 +[10:21:18] Epoch: 1 Batch: 5927/20099 (29.49%) Loss: 2.267172 LR: 0.00002905 +[10:21:20] Epoch: 1 Batch: 5928/20099 (29.49%) Loss: 2.077535 LR: 0.00002905 +[10:21:22] Epoch: 1 Batch: 5929/20099 (29.50%) Loss: 2.302945 LR: 0.00002905 +[10:21:23] Epoch: 1 Batch: 5930/20099 (29.50%) Loss: 2.194579 LR: 0.00002905 +[10:21:25] Epoch: 1 Batch: 5931/20099 (29.51%) Loss: 2.130224 LR: 0.00002905 +[10:21:27] Epoch: 1 Batch: 5932/20099 (29.51%) Loss: 2.220417 LR: 0.00002904 +[10:21:29] Epoch: 1 Batch: 5933/20099 (29.52%) Loss: 2.043969 LR: 0.00002904 +[10:21:30] Epoch: 1 Batch: 5934/20099 (29.52%) Loss: 2.344446 LR: 0.00002904 +[10:21:32] Epoch: 1 Batch: 5935/20099 (29.53%) Loss: 1.893997 LR: 0.00002904 +[10:21:34] Epoch: 1 Batch: 5936/20099 (29.53%) Loss: 2.081080 LR: 0.00002904 +[10:21:36] Epoch: 1 Batch: 5937/20099 (29.54%) Loss: 2.073641 LR: 0.00002904 +[10:21:38] Epoch: 1 Batch: 5938/20099 (29.54%) Loss: 2.170569 LR: 0.00002904 +[10:21:39] Epoch: 1 Batch: 5939/20099 (29.55%) Loss: 1.994800 LR: 0.00002904 +[10:21:41] Epoch: 1 Batch: 5940/20099 (29.55%) Loss: 1.990452 LR: 0.00002904 +[10:21:43] Epoch: 1 Batch: 5941/20099 (29.56%) Loss: 2.211219 LR: 0.00002904 +[10:21:45] Epoch: 1 Batch: 5942/20099 (29.56%) Loss: 2.173639 LR: 0.00002904 +[10:21:46] Epoch: 1 Batch: 5943/20099 (29.57%) Loss: 2.154146 LR: 0.00002904 +[10:21:48] Epoch: 1 Batch: 5944/20099 (29.57%) Loss: 2.159600 LR: 0.00002904 +[10:21:50] Epoch: 1 Batch: 5945/20099 (29.58%) Loss: 2.287150 LR: 0.00002904 +[10:21:52] Epoch: 1 Batch: 5946/20099 (29.58%) Loss: 2.143012 LR: 0.00002903 +[10:21:53] Epoch: 1 Batch: 5947/20099 (29.59%) Loss: 1.946534 LR: 0.00002903 +[10:21:55] Epoch: 1 Batch: 5948/20099 
(29.59%) Loss: 2.436218 LR: 0.00002903 +[10:21:57] Epoch: 1 Batch: 5949/20099 (29.60%) Loss: 1.906746 LR: 0.00002903 +[10:21:59] Epoch: 1 Batch: 5950/20099 (29.60%) Loss: 2.194941 LR: 0.00002903 +[10:22:00] Epoch: 1 Batch: 5951/20099 (29.61%) Loss: 2.156676 LR: 0.00002903 +[10:22:02] Epoch: 1 Batch: 5952/20099 (29.61%) Loss: 2.106206 LR: 0.00002903 +[10:22:04] Epoch: 1 Batch: 5953/20099 (29.62%) Loss: 2.352821 LR: 0.00002902 +[10:22:06] Epoch: 1 Batch: 5954/20099 (29.62%) Loss: 1.678390 LR: 0.00002902 +[10:22:08] Epoch: 1 Batch: 5955/20099 (29.63%) Loss: 2.142422 LR: 0.00002902 +[10:22:09] Epoch: 1 Batch: 5956/20099 (29.63%) Loss: 2.047773 LR: 0.00002902 +[10:22:11] Epoch: 1 Batch: 5957/20099 (29.64%) Loss: 2.518769 LR: 0.00002902 +[10:22:13] Epoch: 1 Batch: 5958/20099 (29.64%) Loss: 2.099605 LR: 0.00002902 +[10:22:15] Epoch: 1 Batch: 5959/20099 (29.65%) Loss: 2.049442 LR: 0.00002902 +[10:22:16] Epoch: 1 Batch: 5960/20099 (29.65%) Loss: 2.196264 LR: 0.00002902 +[10:22:18] Epoch: 1 Batch: 5961/20099 (29.66%) Loss: 2.447105 LR: 0.00002902 +[10:22:20] Epoch: 1 Batch: 5962/20099 (29.66%) Loss: 2.079891 LR: 0.00002902 +[10:22:22] Epoch: 1 Batch: 5963/20099 (29.67%) Loss: 2.270235 LR: 0.00002902 +[10:22:24] Epoch: 1 Batch: 5964/20099 (29.67%) Loss: 2.104357 LR: 0.00002902 +[10:22:25] Epoch: 1 Batch: 5965/20099 (29.68%) Loss: 2.012653 LR: 0.00002902 +[10:22:27] Epoch: 1 Batch: 5966/20099 (29.68%) Loss: 2.114477 LR: 0.00002902 +[10:22:29] Epoch: 1 Batch: 5967/20099 (29.69%) Loss: 2.193624 LR: 0.00002901 +[10:22:31] Epoch: 1 Batch: 5968/20099 (29.69%) Loss: 2.212496 LR: 0.00002901 +[10:22:32] Epoch: 1 Batch: 5969/20099 (29.70%) Loss: 2.412482 LR: 0.00002901 +[10:22:34] Epoch: 1 Batch: 5970/20099 (29.70%) Loss: 2.063107 LR: 0.00002901 +[10:22:36] Epoch: 1 Batch: 5971/20099 (29.71%) Loss: 2.264232 LR: 0.00002901 +[10:22:38] Epoch: 1 Batch: 5972/20099 (29.71%) Loss: 2.159037 LR: 0.00002901 +[10:22:40] Epoch: 1 Batch: 5973/20099 (29.72%) Loss: 2.496321 LR: 0.00002901 
+[10:22:41] Epoch: 1 Batch: 5974/20099 (29.72%) Loss: 2.077712 LR: 0.00002900 +[10:22:43] Epoch: 1 Batch: 5975/20099 (29.73%) Loss: 2.139822 LR: 0.00002900 +[10:22:45] Epoch: 1 Batch: 5976/20099 (29.73%) Loss: 2.321510 LR: 0.00002900 +[10:22:47] Epoch: 1 Batch: 5977/20099 (29.74%) Loss: 2.046504 LR: 0.00002900 +[10:22:48] Epoch: 1 Batch: 5978/20099 (29.74%) Loss: 2.135754 LR: 0.00002900 +[10:22:50] Epoch: 1 Batch: 5979/20099 (29.75%) Loss: 2.259355 LR: 0.00002900 +[10:22:52] Epoch: 1 Batch: 5980/20099 (29.75%) Loss: 2.108790 LR: 0.00002900 +[10:22:54] Epoch: 1 Batch: 5981/20099 (29.76%) Loss: 2.185145 LR: 0.00002900 +[10:22:56] Epoch: 1 Batch: 5982/20099 (29.76%) Loss: 1.689453 LR: 0.00002900 +[10:22:57] Epoch: 1 Batch: 5983/20099 (29.77%) Loss: 2.188216 LR: 0.00002900 +[10:22:59] Epoch: 1 Batch: 5984/20099 (29.77%) Loss: 1.843337 LR: 0.00002900 +[10:23:01] Epoch: 1 Batch: 5985/20099 (29.78%) Loss: 2.168924 LR: 0.00002900 +[10:23:03] Epoch: 1 Batch: 5986/20099 (29.78%) Loss: 2.213278 LR: 0.00002900 +[10:23:04] Epoch: 1 Batch: 5987/20099 (29.79%) Loss: 2.233690 LR: 0.00002900 +[10:23:06] Epoch: 1 Batch: 5988/20099 (29.79%) Loss: 2.156695 LR: 0.00002899 +[10:23:08] Epoch: 1 Batch: 5989/20099 (29.80%) Loss: 2.337240 LR: 0.00002899 +[10:23:10] Epoch: 1 Batch: 5990/20099 (29.80%) Loss: 1.816768 LR: 0.00002899 +[10:23:12] Epoch: 1 Batch: 5991/20099 (29.81%) Loss: 2.117988 LR: 0.00002899 +[10:23:13] Epoch: 1 Batch: 5992/20099 (29.81%) Loss: 2.315064 LR: 0.00002899 +[10:23:15] Epoch: 1 Batch: 5993/20099 (29.82%) Loss: 2.368414 LR: 0.00002899 +[10:23:17] Epoch: 1 Batch: 5994/20099 (29.82%) Loss: 2.126934 LR: 0.00002899 +[10:23:19] Epoch: 1 Batch: 5995/20099 (29.83%) Loss: 2.240390 LR: 0.00002899 +[10:23:20] Epoch: 1 Batch: 5996/20099 (29.83%) Loss: 2.075534 LR: 0.00002899 +[10:23:22] Epoch: 1 Batch: 5997/20099 (29.84%) Loss: 2.140616 LR: 0.00002899 +[10:23:24] Epoch: 1 Batch: 5998/20099 (29.84%) Loss: 2.043946 LR: 0.00002899 +[10:23:26] Epoch: 1 Batch: 5999/20099 (29.85%) 
Loss: 1.942797 LR: 0.00002899 +[10:23:28] >> Evaluating batch 0 +[10:23:29] >> Evaluating batch 1 +[10:23:30] >> Evaluating batch 2 +[10:23:31] >> Evaluating batch 3 +[10:23:32] >> Evaluating batch 4 +[10:23:33] >> Evaluating batch 5 +[10:23:34] >> Evaluating batch 6 +[10:23:35] >> Evaluating batch 7 +[10:23:36] >> Evaluating batch 8 +[10:23:37] >> Evaluating batch 9 +[10:23:38] >> Evaluating batch 10 +[10:23:39] >> Evaluating batch 11 +[10:23:40] >> Evaluating batch 12 +[10:23:41] >> Evaluating batch 13 +[10:23:42] >> Evaluating batch 14 +[10:23:43] >> Evaluating batch 15 +[10:23:44] >> Evaluating batch 16 +[10:23:44] Epoch: 1 Step: 6000/20099 Evaluation: +[10:23:44] Avg Loss Since Last Eval: 2.1212 Val Loss: 2.1983 Validation loss delta: -0.0075 Perplexity: 9.0101 LR: 0.00002899 +[10:23:48] >> Cleaned up old temp checkpoint: epoch1_step4000 +[10:23:48] >> Temp checkpoint saved: epoch1_step6000, size: 0.1693 GB +[10:23:51] >> Checkpoint saved: epoch1_step6000, size: 0.1693 GB +[10:23:51] Epoch: 1 Batch: 6000/20099 (29.85%) Loss: 2.167515 LR: 0.00002899 +[10:23:53] Epoch: 1 Batch: 6001/20099 (29.86%) Loss: 2.266573 LR: 0.00002899 +[10:23:55] Epoch: 1 Batch: 6002/20099 (29.86%) Loss: 2.149224 LR: 0.00002898 +[10:23:57] Epoch: 1 Batch: 6003/20099 (29.87%) Loss: 2.416745 LR: 0.00002898 +[10:23:58] Epoch: 1 Batch: 6004/20099 (29.87%) Loss: 1.998012 LR: 0.00002898 +[10:24:00] Epoch: 1 Batch: 6005/20099 (29.88%) Loss: 2.026496 LR: 0.00002898 +[10:24:02] Epoch: 1 Batch: 6006/20099 (29.88%) Loss: 1.998003 LR: 0.00002898 +[10:24:04] Epoch: 1 Batch: 6007/20099 (29.89%) Loss: 2.318587 LR: 0.00002898 +[10:24:05] Epoch: 1 Batch: 6008/20099 (29.89%) Loss: 2.047056 LR: 0.00002898 +[10:24:07] Epoch: 1 Batch: 6009/20099 (29.90%) Loss: 2.286715 LR: 0.00002897 +[10:24:09] Epoch: 1 Batch: 6010/20099 (29.90%) Loss: 2.034576 LR: 0.00002897 +[10:24:11] Epoch: 1 Batch: 6011/20099 (29.91%) Loss: 2.146474 LR: 0.00002897 +[10:24:13] Epoch: 1 Batch: 6012/20099 (29.91%) Loss: 2.176541 LR: 
0.00002897 +[10:24:14] Epoch: 1 Batch: 6013/20099 (29.92%) Loss: 2.397283 LR: 0.00002897 +[10:24:16] Epoch: 1 Batch: 6014/20099 (29.92%) Loss: 2.033884 LR: 0.00002897 +[10:24:18] Epoch: 1 Batch: 6015/20099 (29.93%) Loss: 2.410838 LR: 0.00002897 +[10:24:20] Epoch: 1 Batch: 6016/20099 (29.93%) Loss: 2.079157 LR: 0.00002897 +[10:24:22] Epoch: 1 Batch: 6017/20099 (29.94%) Loss: 2.436302 LR: 0.00002897 +[10:24:24] Epoch: 1 Batch: 6018/20099 (29.94%) Loss: 2.068577 LR: 0.00002897 +[10:24:26] Epoch: 1 Batch: 6019/20099 (29.95%) Loss: 2.201342 LR: 0.00002897 +[10:24:27] Epoch: 1 Batch: 6020/20099 (29.95%) Loss: 1.943850 LR: 0.00002897 +[10:24:29] Epoch: 1 Batch: 6021/20099 (29.96%) Loss: 1.916546 LR: 0.00002897 +[10:24:31] Epoch: 1 Batch: 6022/20099 (29.96%) Loss: 1.992570 LR: 0.00002897 +[10:24:33] Epoch: 1 Batch: 6023/20099 (29.97%) Loss: 2.136137 LR: 0.00002896 +[10:24:35] Epoch: 1 Batch: 6024/20099 (29.97%) Loss: 2.231624 LR: 0.00002896 +[10:24:36] Epoch: 1 Batch: 6025/20099 (29.98%) Loss: 1.956048 LR: 0.00002896 +[10:24:38] Epoch: 1 Batch: 6026/20099 (29.98%) Loss: 2.345159 LR: 0.00002896 +[10:24:40] Epoch: 1 Batch: 6027/20099 (29.99%) Loss: 2.176679 LR: 0.00002896 +[10:24:42] Epoch: 1 Batch: 6028/20099 (29.99%) Loss: 2.260270 LR: 0.00002896 +[10:24:43] Epoch: 1 Batch: 6029/20099 (30.00%) Loss: 2.069808 LR: 0.00002896 +[10:24:45] Epoch: 1 Batch: 6030/20099 (30.00%) Loss: 2.154341 LR: 0.00002895 +[10:24:47] Epoch: 1 Batch: 6031/20099 (30.01%) Loss: 2.073474 LR: 0.00002895 +[10:24:49] Epoch: 1 Batch: 6032/20099 (30.01%) Loss: 1.865573 LR: 0.00002895 +[10:24:50] Epoch: 1 Batch: 6033/20099 (30.02%) Loss: 2.182718 LR: 0.00002895 +[10:24:52] Epoch: 1 Batch: 6034/20099 (30.02%) Loss: 2.185627 LR: 0.00002895 +[10:24:54] Epoch: 1 Batch: 6035/20099 (30.03%) Loss: 2.200272 LR: 0.00002895 +[10:24:56] Epoch: 1 Batch: 6036/20099 (30.03%) Loss: 2.067059 LR: 0.00002895 +[10:24:58] Epoch: 1 Batch: 6037/20099 (30.04%) Loss: 2.193210 LR: 0.00002895 +[10:24:59] Epoch: 1 Batch: 6038/20099 
(30.04%) Loss: 2.150224 LR: 0.00002895 +[10:25:01] Epoch: 1 Batch: 6039/20099 (30.05%) Loss: 2.230484 LR: 0.00002895 +[10:25:03] Epoch: 1 Batch: 6040/20099 (30.05%) Loss: 1.854348 LR: 0.00002895 +[10:25:04] Epoch: 1 Batch: 6041/20099 (30.06%) Loss: 1.890896 LR: 0.00002895 +[10:25:06] Epoch: 1 Batch: 6042/20099 (30.06%) Loss: 2.412341 LR: 0.00002895 +[10:25:08] Epoch: 1 Batch: 6043/20099 (30.07%) Loss: 2.204033 LR: 0.00002895 +[10:25:10] Epoch: 1 Batch: 6044/20099 (30.07%) Loss: 2.274420 LR: 0.00002894 +[10:25:11] Epoch: 1 Batch: 6045/20099 (30.08%) Loss: 2.042023 LR: 0.00002894 +[10:25:13] Epoch: 1 Batch: 6046/20099 (30.08%) Loss: 2.320745 LR: 0.00002894 +[10:25:15] Epoch: 1 Batch: 6047/20099 (30.09%) Loss: 2.461384 LR: 0.00002894 +[10:25:17] Epoch: 1 Batch: 6048/20099 (30.09%) Loss: 2.203166 LR: 0.00002894 +[10:25:18] Epoch: 1 Batch: 6049/20099 (30.10%) Loss: 2.169242 LR: 0.00002894 +[10:25:20] Epoch: 1 Batch: 6050/20099 (30.10%) Loss: 2.332826 LR: 0.00002894 +[10:25:22] Epoch: 1 Batch: 6051/20099 (30.11%) Loss: 2.099708 LR: 0.00002893 +[10:25:24] Epoch: 1 Batch: 6052/20099 (30.11%) Loss: 2.024796 LR: 0.00002893 +[10:25:25] Epoch: 1 Batch: 6053/20099 (30.12%) Loss: 1.987003 LR: 0.00002893 +[10:25:27] Epoch: 1 Batch: 6054/20099 (30.12%) Loss: 2.101177 LR: 0.00002893 +[10:25:29] Epoch: 1 Batch: 6055/20099 (30.13%) Loss: 2.091612 LR: 0.00002893 +[10:25:31] Epoch: 1 Batch: 6056/20099 (30.13%) Loss: 2.057204 LR: 0.00002893 +[10:25:33] Epoch: 1 Batch: 6057/20099 (30.14%) Loss: 1.815431 LR: 0.00002893 +[10:25:34] Epoch: 1 Batch: 6058/20099 (30.14%) Loss: 2.234263 LR: 0.00002893 +[10:25:36] Epoch: 1 Batch: 6059/20099 (30.15%) Loss: 2.188094 LR: 0.00002893 +[10:25:38] Epoch: 1 Batch: 6060/20099 (30.15%) Loss: 2.166759 LR: 0.00002893 +[10:25:40] Epoch: 1 Batch: 6061/20099 (30.16%) Loss: 2.220728 LR: 0.00002893 +[10:25:41] Epoch: 1 Batch: 6062/20099 (30.16%) Loss: 2.046496 LR: 0.00002893 +[10:25:43] Epoch: 1 Batch: 6063/20099 (30.17%) Loss: 2.167997 LR: 0.00002893 
+[10:25:45] Epoch: 1 Batch: 6064/20099 (30.17%) Loss: 1.983816 LR: 0.00002893 +[10:25:47] Epoch: 1 Batch: 6065/20099 (30.18%) Loss: 2.131745 LR: 0.00002892 +[10:25:49] Epoch: 1 Batch: 6066/20099 (30.18%) Loss: 2.143176 LR: 0.00002892 +[10:25:50] Epoch: 1 Batch: 6067/20099 (30.19%) Loss: 1.867975 LR: 0.00002892 +[10:25:52] Epoch: 1 Batch: 6068/20099 (30.19%) Loss: 1.938297 LR: 0.00002892 +[10:25:54] Epoch: 1 Batch: 6069/20099 (30.20%) Loss: 2.180942 LR: 0.00002892 +[10:25:56] Epoch: 1 Batch: 6070/20099 (30.20%) Loss: 2.065716 LR: 0.00002892 +[10:25:58] Epoch: 1 Batch: 6071/20099 (30.21%) Loss: 2.264986 LR: 0.00002892 +[10:25:59] Epoch: 1 Batch: 6072/20099 (30.21%) Loss: 2.029092 LR: 0.00002891 +[10:26:01] Epoch: 1 Batch: 6073/20099 (30.22%) Loss: 2.211565 LR: 0.00002891 +[10:26:03] Epoch: 1 Batch: 6074/20099 (30.22%) Loss: 1.899963 LR: 0.00002891 +[10:26:05] Epoch: 1 Batch: 6075/20099 (30.23%) Loss: 2.088056 LR: 0.00002891 +[10:26:06] Epoch: 1 Batch: 6076/20099 (30.23%) Loss: 2.019875 LR: 0.00002891 +[10:26:08] Epoch: 1 Batch: 6077/20099 (30.24%) Loss: 2.153631 LR: 0.00002891 +[10:26:10] Epoch: 1 Batch: 6078/20099 (30.24%) Loss: 2.272575 LR: 0.00002891 +[10:26:12] Epoch: 1 Batch: 6079/20099 (30.25%) Loss: 1.961095 LR: 0.00002891 +[10:26:13] Epoch: 1 Batch: 6080/20099 (30.25%) Loss: 2.013952 LR: 0.00002891 +[10:26:15] Epoch: 1 Batch: 6081/20099 (30.26%) Loss: 2.137667 LR: 0.00002891 +[10:26:17] Epoch: 1 Batch: 6082/20099 (30.26%) Loss: 2.041891 LR: 0.00002891 +[10:26:19] Epoch: 1 Batch: 6083/20099 (30.27%) Loss: 1.925700 LR: 0.00002891 +[10:26:21] Epoch: 1 Batch: 6084/20099 (30.27%) Loss: 2.056208 LR: 0.00002891 +[10:26:22] Epoch: 1 Batch: 6085/20099 (30.28%) Loss: 2.049197 LR: 0.00002891 +[10:26:24] Epoch: 1 Batch: 6086/20099 (30.28%) Loss: 2.100750 LR: 0.00002890 +[10:26:26] Epoch: 1 Batch: 6087/20099 (30.29%) Loss: 2.081678 LR: 0.00002890 +[10:26:28] Epoch: 1 Batch: 6088/20099 (30.29%) Loss: 2.102439 LR: 0.00002890 +[10:26:29] Epoch: 1 Batch: 6089/20099 (30.30%) 
Loss: 2.008194 LR: 0.00002890 +[10:26:31] Epoch: 1 Batch: 6090/20099 (30.30%) Loss: 2.301480 LR: 0.00002890 +[10:26:33] Epoch: 1 Batch: 6091/20099 (30.30%) Loss: 2.023216 LR: 0.00002890 +[10:26:35] Epoch: 1 Batch: 6092/20099 (30.31%) Loss: 1.994119 LR: 0.00002890 +[10:26:36] Epoch: 1 Batch: 6093/20099 (30.31%) Loss: 2.303020 LR: 0.00002889 +[10:26:38] Epoch: 1 Batch: 6094/20099 (30.32%) Loss: 2.020190 LR: 0.00002889 +[10:26:40] Epoch: 1 Batch: 6095/20099 (30.32%) Loss: 2.220544 LR: 0.00002889 +[10:26:42] Epoch: 1 Batch: 6096/20099 (30.33%) Loss: 2.031635 LR: 0.00002889 +[10:26:44] Epoch: 1 Batch: 6097/20099 (30.33%) Loss: 2.173140 LR: 0.00002889 +[10:26:45] Epoch: 1 Batch: 6098/20099 (30.34%) Loss: 1.903060 LR: 0.00002889 +[10:26:47] Epoch: 1 Batch: 6099/20099 (30.34%) Loss: 2.013709 LR: 0.00002889 +[10:26:49] Epoch: 1 Batch: 6100/20099 (30.35%) Loss: 2.122766 LR: 0.00002889 +[10:26:51] Epoch: 1 Batch: 6101/20099 (30.35%) Loss: 2.296837 LR: 0.00002889 +[10:26:52] Epoch: 1 Batch: 6102/20099 (30.36%) Loss: 1.961914 LR: 0.00002889 +[10:26:54] Epoch: 1 Batch: 6103/20099 (30.36%) Loss: 2.200015 LR: 0.00002889 +[10:26:56] Epoch: 1 Batch: 6104/20099 (30.37%) Loss: 1.954469 LR: 0.00002889 +[10:26:58] Epoch: 1 Batch: 6105/20099 (30.37%) Loss: 1.752756 LR: 0.00002889 +[10:26:59] Epoch: 1 Batch: 6106/20099 (30.38%) Loss: 2.327977 LR: 0.00002889 +[10:27:01] Epoch: 1 Batch: 6107/20099 (30.38%) Loss: 2.040295 LR: 0.00002888 +[10:27:03] Epoch: 1 Batch: 6108/20099 (30.39%) Loss: 2.174480 LR: 0.00002888 +[10:27:05] Epoch: 1 Batch: 6109/20099 (30.39%) Loss: 2.002710 LR: 0.00002888 +[10:27:06] Epoch: 1 Batch: 6110/20099 (30.40%) Loss: 2.304384 LR: 0.00002888 +[10:27:08] Epoch: 1 Batch: 6111/20099 (30.40%) Loss: 1.993517 LR: 0.00002888 +[10:27:10] Epoch: 1 Batch: 6112/20099 (30.41%) Loss: 2.279287 LR: 0.00002888 +[10:27:12] Epoch: 1 Batch: 6113/20099 (30.41%) Loss: 2.003711 LR: 0.00002888 +[10:27:14] Epoch: 1 Batch: 6114/20099 (30.42%) Loss: 2.131424 LR: 0.00002887 +[10:27:15] Epoch: 
1 Batch: 6115/20099 (30.42%) Loss: 2.435085 LR: 0.00002887 +[10:27:17] Epoch: 1 Batch: 6116/20099 (30.43%) Loss: 2.330194 LR: 0.00002887 +[10:27:19] Epoch: 1 Batch: 6117/20099 (30.43%) Loss: 2.479560 LR: 0.00002887 +[10:27:21] Epoch: 1 Batch: 6118/20099 (30.44%) Loss: 2.068970 LR: 0.00002887 +[10:27:22] Epoch: 1 Batch: 6119/20099 (30.44%) Loss: 1.789693 LR: 0.00002887 +[10:27:24] Epoch: 1 Batch: 6120/20099 (30.45%) Loss: 1.911541 LR: 0.00002887 +[10:27:26] Epoch: 1 Batch: 6121/20099 (30.45%) Loss: 2.033945 LR: 0.00002886 +[10:27:28] Epoch: 1 Batch: 6122/20099 (30.46%) Loss: 2.032663 LR: 0.00002886 +[10:27:29] Epoch: 1 Batch: 6123/20099 (30.46%) Loss: 2.106970 LR: 0.00002886 +[10:27:31] Epoch: 1 Batch: 6124/20099 (30.47%) Loss: 2.449900 LR: 0.00002886 +[10:27:33] Epoch: 1 Batch: 6125/20099 (30.47%) Loss: 2.245286 LR: 0.00002886 +[10:27:35] Epoch: 1 Batch: 6126/20099 (30.48%) Loss: 2.485668 LR: 0.00002886 +[10:27:36] Epoch: 1 Batch: 6127/20099 (30.48%) Loss: 2.077368 LR: 0.00002886 +[10:27:38] Epoch: 1 Batch: 6128/20099 (30.49%) Loss: 2.203731 LR: 0.00002886 +[10:27:40] Epoch: 1 Batch: 6129/20099 (30.49%) Loss: 1.978845 LR: 0.00002886 +[10:27:42] Epoch: 1 Batch: 6130/20099 (30.50%) Loss: 1.775719 LR: 0.00002886 +[10:27:43] Epoch: 1 Batch: 6131/20099 (30.50%) Loss: 2.194798 LR: 0.00002886 +[10:27:45] Epoch: 1 Batch: 6132/20099 (30.51%) Loss: 2.162802 LR: 0.00002886 +[10:27:47] Epoch: 1 Batch: 6133/20099 (30.51%) Loss: 2.121477 LR: 0.00002886 +[10:27:49] Epoch: 1 Batch: 6134/20099 (30.52%) Loss: 1.615075 LR: 0.00002886 +[10:27:50] Epoch: 1 Batch: 6135/20099 (30.52%) Loss: 2.252779 LR: 0.00002885 +[10:27:52] Epoch: 1 Batch: 6136/20099 (30.53%) Loss: 2.148409 LR: 0.00002885 +[10:27:54] Epoch: 1 Batch: 6137/20099 (30.53%) Loss: 2.126829 LR: 0.00002885 +[10:27:56] Epoch: 1 Batch: 6138/20099 (30.54%) Loss: 2.179602 LR: 0.00002885 +[10:27:58] Epoch: 1 Batch: 6139/20099 (30.54%) Loss: 2.238281 LR: 0.00002885 +[10:27:59] Epoch: 1 Batch: 6140/20099 (30.55%) Loss: 1.929934 LR: 
0.00002885 +[10:28:01] Epoch: 1 Batch: 6141/20099 (30.55%) Loss: 2.285249 LR: 0.00002885 +[10:28:03] Epoch: 1 Batch: 6142/20099 (30.56%) Loss: 2.460362 LR: 0.00002884 +[10:28:05] Epoch: 1 Batch: 6143/20099 (30.56%) Loss: 2.125650 LR: 0.00002884 +[10:28:06] Epoch: 1 Batch: 6144/20099 (30.57%) Loss: 2.370881 LR: 0.00002884 +[10:28:08] Epoch: 1 Batch: 6145/20099 (30.57%) Loss: 2.352486 LR: 0.00002884 +[10:28:10] Epoch: 1 Batch: 6146/20099 (30.58%) Loss: 2.066581 LR: 0.00002884 +[10:28:12] Epoch: 1 Batch: 6147/20099 (30.58%) Loss: 1.935142 LR: 0.00002884 +[10:28:13] Epoch: 1 Batch: 6148/20099 (30.59%) Loss: 2.087194 LR: 0.00002884 +[10:28:15] Epoch: 1 Batch: 6149/20099 (30.59%) Loss: 2.274426 LR: 0.00002884 +[10:28:17] Epoch: 1 Batch: 6150/20099 (30.60%) Loss: 2.526196 LR: 0.00002884 +[10:28:19] Epoch: 1 Batch: 6151/20099 (30.60%) Loss: 2.145799 LR: 0.00002884 +[10:28:21] Epoch: 1 Batch: 6152/20099 (30.61%) Loss: 2.190077 LR: 0.00002884 +[10:28:22] Epoch: 1 Batch: 6153/20099 (30.61%) Loss: 2.363593 LR: 0.00002884 +[10:28:24] Epoch: 1 Batch: 6154/20099 (30.62%) Loss: 2.063462 LR: 0.00002884 +[10:28:26] Epoch: 1 Batch: 6155/20099 (30.62%) Loss: 2.350155 LR: 0.00002884 +[10:28:28] Epoch: 1 Batch: 6156/20099 (30.63%) Loss: 2.093012 LR: 0.00002883 +[10:28:29] Epoch: 1 Batch: 6157/20099 (30.63%) Loss: 2.314228 LR: 0.00002883 +[10:28:31] Epoch: 1 Batch: 6158/20099 (30.64%) Loss: 2.166234 LR: 0.00002883 +[10:28:33] Epoch: 1 Batch: 6159/20099 (30.64%) Loss: 1.984644 LR: 0.00002883 +[10:28:35] Epoch: 1 Batch: 6160/20099 (30.65%) Loss: 2.021545 LR: 0.00002883 +[10:28:36] Epoch: 1 Batch: 6161/20099 (30.65%) Loss: 2.351858 LR: 0.00002883 +[10:28:38] Epoch: 1 Batch: 6162/20099 (30.66%) Loss: 2.069036 LR: 0.00002883 +[10:28:40] Epoch: 1 Batch: 6163/20099 (30.66%) Loss: 2.073274 LR: 0.00002882 +[10:28:42] Epoch: 1 Batch: 6164/20099 (30.67%) Loss: 2.378255 LR: 0.00002882 +[10:28:44] Epoch: 1 Batch: 6165/20099 (30.67%) Loss: 2.071101 LR: 0.00002882 +[10:28:45] Epoch: 1 Batch: 6166/20099 
(30.68%) Loss: 1.766687 LR: 0.00002882 +[10:28:47] Epoch: 1 Batch: 6167/20099 (30.68%) Loss: 2.323901 LR: 0.00002882 +[10:28:49] Epoch: 1 Batch: 6168/20099 (30.69%) Loss: 2.124365 LR: 0.00002882 +[10:28:51] Epoch: 1 Batch: 6169/20099 (30.69%) Loss: 2.150475 LR: 0.00002882 +[10:28:52] Epoch: 1 Batch: 6170/20099 (30.70%) Loss: 2.207050 LR: 0.00002882 +[10:28:54] Epoch: 1 Batch: 6171/20099 (30.70%) Loss: 1.978800 LR: 0.00002882 +[10:28:56] Epoch: 1 Batch: 6172/20099 (30.71%) Loss: 2.047195 LR: 0.00002882 +[10:28:58] Epoch: 1 Batch: 6173/20099 (30.71%) Loss: 2.019106 LR: 0.00002882 +[10:28:59] Epoch: 1 Batch: 6174/20099 (30.72%) Loss: 2.172266 LR: 0.00002882 +[10:29:01] Epoch: 1 Batch: 6175/20099 (30.72%) Loss: 2.364386 LR: 0.00002882 +[10:29:03] Epoch: 1 Batch: 6176/20099 (30.73%) Loss: 2.210007 LR: 0.00002882 +[10:29:05] Epoch: 1 Batch: 6177/20099 (30.73%) Loss: 2.127088 LR: 0.00002881 +[10:29:07] Epoch: 1 Batch: 6178/20099 (30.74%) Loss: 2.232116 LR: 0.00002881 +[10:29:08] Epoch: 1 Batch: 6179/20099 (30.74%) Loss: 2.096040 LR: 0.00002881 +[10:29:10] Epoch: 1 Batch: 6180/20099 (30.75%) Loss: 2.052990 LR: 0.00002881 +[10:29:12] Epoch: 1 Batch: 6181/20099 (30.75%) Loss: 1.966865 LR: 0.00002881 +[10:29:14] Epoch: 1 Batch: 6182/20099 (30.76%) Loss: 2.361166 LR: 0.00002881 +[10:29:15] Epoch: 1 Batch: 6183/20099 (30.76%) Loss: 2.414168 LR: 0.00002881 +[10:29:17] Epoch: 1 Batch: 6184/20099 (30.77%) Loss: 1.966624 LR: 0.00002880 +[10:29:19] Epoch: 1 Batch: 6185/20099 (30.77%) Loss: 2.071851 LR: 0.00002880 +[10:29:21] Epoch: 1 Batch: 6186/20099 (30.78%) Loss: 2.094296 LR: 0.00002880 +[10:29:22] Epoch: 1 Batch: 6187/20099 (30.78%) Loss: 1.959520 LR: 0.00002880 +[10:29:24] Epoch: 1 Batch: 6188/20099 (30.79%) Loss: 2.277769 LR: 0.00002880 +[10:29:26] Epoch: 1 Batch: 6189/20099 (30.79%) Loss: 2.161672 LR: 0.00002880 +[10:29:28] Epoch: 1 Batch: 6190/20099 (30.80%) Loss: 2.199385 LR: 0.00002880 +[10:29:30] Epoch: 1 Batch: 6191/20099 (30.80%) Loss: 2.426378 LR: 0.00002879 
+[10:29:31] Epoch: 1 Batch: 6192/20099 (30.81%) Loss: 2.205956 LR: 0.00002879 +[10:29:33] Epoch: 1 Batch: 6193/20099 (30.81%) Loss: 2.113696 LR: 0.00002879 +[10:29:35] Epoch: 1 Batch: 6194/20099 (30.82%) Loss: 2.186690 LR: 0.00002879 +[10:29:37] Epoch: 1 Batch: 6195/20099 (30.82%) Loss: 2.268899 LR: 0.00002879 +[10:29:38] Epoch: 1 Batch: 6196/20099 (30.83%) Loss: 2.119167 LR: 0.00002879 +[10:29:40] Epoch: 1 Batch: 6197/20099 (30.83%) Loss: 2.273023 LR: 0.00002879 +[10:29:42] Epoch: 1 Batch: 6198/20099 (30.84%) Loss: 2.194120 LR: 0.00002879 +[10:29:44] Epoch: 1 Batch: 6199/20099 (30.84%) Loss: 2.007127 LR: 0.00002879 +[10:29:49] >> Cleaned up old temp checkpoint: epoch1_step4200 +[10:29:49] >> Temp checkpoint saved: epoch1_step6200, size: 0.1693 GB +[10:29:49] Epoch: 1 Batch: 6200/20099 (30.85%) Loss: 2.352658 LR: 0.00002879 +[10:29:51] Epoch: 1 Batch: 6201/20099 (30.85%) Loss: 2.122938 LR: 0.00002879 +[10:29:53] Epoch: 1 Batch: 6202/20099 (30.86%) Loss: 2.133700 LR: 0.00002879 +[10:29:54] Epoch: 1 Batch: 6203/20099 (30.86%) Loss: 1.997478 LR: 0.00002879 +[10:29:56] Epoch: 1 Batch: 6204/20099 (30.87%) Loss: 2.240467 LR: 0.00002879 +[10:29:58] Epoch: 1 Batch: 6205/20099 (30.87%) Loss: 2.375379 LR: 0.00002878 +[10:30:00] Epoch: 1 Batch: 6206/20099 (30.88%) Loss: 2.101030 LR: 0.00002878 +[10:30:01] Epoch: 1 Batch: 6207/20099 (30.88%) Loss: 2.297934 LR: 0.00002878 +[10:30:03] Epoch: 1 Batch: 6208/20099 (30.89%) Loss: 2.024060 LR: 0.00002878 +[10:30:05] Epoch: 1 Batch: 6209/20099 (30.89%) Loss: 2.430643 LR: 0.00002878 +[10:30:07] Epoch: 1 Batch: 6210/20099 (30.90%) Loss: 2.557595 LR: 0.00002878 +[10:30:08] Epoch: 1 Batch: 6211/20099 (30.90%) Loss: 2.022629 LR: 0.00002878 +[10:30:10] Epoch: 1 Batch: 6212/20099 (30.91%) Loss: 2.093209 LR: 0.00002877 +[10:30:12] Epoch: 1 Batch: 6213/20099 (30.91%) Loss: 2.188879 LR: 0.00002877 +[10:30:14] Epoch: 1 Batch: 6214/20099 (30.92%) Loss: 2.198479 LR: 0.00002877 +[10:30:16] Epoch: 1 Batch: 6215/20099 (30.92%) Loss: 2.367657 LR: 
0.00002877 +[10:30:17] Epoch: 1 Batch: 6216/20099 (30.93%) Loss: 2.026569 LR: 0.00002877 +[10:30:19] Epoch: 1 Batch: 6217/20099 (30.93%) Loss: 1.990440 LR: 0.00002877 +[10:30:21] Epoch: 1 Batch: 6218/20099 (30.94%) Loss: 2.307656 LR: 0.00002877 +[10:30:23] Epoch: 1 Batch: 6219/20099 (30.94%) Loss: 2.302414 LR: 0.00002877 +[10:30:25] Epoch: 1 Batch: 6220/20099 (30.95%) Loss: 2.042426 LR: 0.00002877 +[10:30:26] Epoch: 1 Batch: 6221/20099 (30.95%) Loss: 1.956993 LR: 0.00002877 +[10:30:28] Epoch: 1 Batch: 6222/20099 (30.96%) Loss: 2.070917 LR: 0.00002877 +[10:30:30] Epoch: 1 Batch: 6223/20099 (30.96%) Loss: 2.398350 LR: 0.00002877 +[10:30:32] Epoch: 1 Batch: 6224/20099 (30.97%) Loss: 2.162161 LR: 0.00002877 +[10:30:33] Epoch: 1 Batch: 6225/20099 (30.97%) Loss: 2.424959 LR: 0.00002877 +[10:30:35] Epoch: 1 Batch: 6226/20099 (30.98%) Loss: 2.075073 LR: 0.00002876 +[10:30:37] Epoch: 1 Batch: 6227/20099 (30.98%) Loss: 2.176429 LR: 0.00002876 +[10:30:39] Epoch: 1 Batch: 6228/20099 (30.99%) Loss: 2.264375 LR: 0.00002876 +[10:30:41] Epoch: 1 Batch: 6229/20099 (30.99%) Loss: 2.318801 LR: 0.00002876 +[10:30:42] Epoch: 1 Batch: 6230/20099 (31.00%) Loss: 2.320192 LR: 0.00002876 +[10:30:44] Epoch: 1 Batch: 6231/20099 (31.00%) Loss: 2.111840 LR: 0.00002876 +[10:30:46] Epoch: 1 Batch: 6232/20099 (31.01%) Loss: 1.939478 LR: 0.00002876 +[10:30:48] Epoch: 1 Batch: 6233/20099 (31.01%) Loss: 1.553004 LR: 0.00002875 +[10:30:49] Epoch: 1 Batch: 6234/20099 (31.02%) Loss: 2.190462 LR: 0.00002875 +[10:30:51] Epoch: 1 Batch: 6235/20099 (31.02%) Loss: 2.097749 LR: 0.00002875 +[10:30:53] Epoch: 1 Batch: 6236/20099 (31.03%) Loss: 2.135648 LR: 0.00002875 +[10:30:55] Epoch: 1 Batch: 6237/20099 (31.03%) Loss: 1.849052 LR: 0.00002875 +[10:30:56] Epoch: 1 Batch: 6238/20099 (31.04%) Loss: 2.221046 LR: 0.00002875 +[10:30:58] Epoch: 1 Batch: 6239/20099 (31.04%) Loss: 1.806323 LR: 0.00002875 +[10:31:00] Epoch: 1 Batch: 6240/20099 (31.05%) Loss: 2.211417 LR: 0.00002874 +[10:31:02] Epoch: 1 Batch: 6241/20099 
(31.05%) Loss: 1.989772 LR: 0.00002874 +[10:31:04] Epoch: 1 Batch: 6242/20099 (31.06%) Loss: 2.258232 LR: 0.00002874 +[10:31:05] Epoch: 1 Batch: 6243/20099 (31.06%) Loss: 1.868869 LR: 0.00002874 +[10:31:07] Epoch: 1 Batch: 6244/20099 (31.07%) Loss: 2.063636 LR: 0.00002874 +[10:31:09] Epoch: 1 Batch: 6245/20099 (31.07%) Loss: 2.168203 LR: 0.00002874 +[10:31:11] Epoch: 1 Batch: 6246/20099 (31.08%) Loss: 2.266906 LR: 0.00002874 +[10:31:12] Epoch: 1 Batch: 6247/20099 (31.08%) Loss: 2.119305 LR: 0.00002874 +[10:31:14] Epoch: 1 Batch: 6248/20099 (31.09%) Loss: 1.839048 LR: 0.00002874 +[10:31:16] Epoch: 1 Batch: 6249/20099 (31.09%) Loss: 2.341777 LR: 0.00002874 +[10:31:18] Epoch: 1 Batch: 6250/20099 (31.10%) Loss: 1.923596 LR: 0.00002874 +[10:31:19] Epoch: 1 Batch: 6251/20099 (31.10%) Loss: 2.086246 LR: 0.00002874 +[10:31:21] Epoch: 1 Batch: 6252/20099 (31.11%) Loss: 1.946492 LR: 0.00002874 +[10:31:23] Epoch: 1 Batch: 6253/20099 (31.11%) Loss: 1.811921 LR: 0.00002874 +[10:31:25] Epoch: 1 Batch: 6254/20099 (31.12%) Loss: 2.099774 LR: 0.00002873 +[10:31:27] Epoch: 1 Batch: 6255/20099 (31.12%) Loss: 2.013122 LR: 0.00002873 +[10:31:28] Epoch: 1 Batch: 6256/20099 (31.13%) Loss: 2.168018 LR: 0.00002873 +[10:31:30] Epoch: 1 Batch: 6257/20099 (31.13%) Loss: 2.296320 LR: 0.00002873 +[10:31:32] Epoch: 1 Batch: 6258/20099 (31.14%) Loss: 1.782577 LR: 0.00002873 +[10:31:34] Epoch: 1 Batch: 6259/20099 (31.14%) Loss: 1.936415 LR: 0.00002873 +[10:31:35] Epoch: 1 Batch: 6260/20099 (31.15%) Loss: 1.783918 LR: 0.00002873 +[10:31:37] Epoch: 1 Batch: 6261/20099 (31.15%) Loss: 2.352529 LR: 0.00002872 +[10:31:39] Epoch: 1 Batch: 6262/20099 (31.16%) Loss: 2.102598 LR: 0.00002872 +[10:31:41] Epoch: 1 Batch: 6263/20099 (31.16%) Loss: 2.024643 LR: 0.00002872 +[10:31:42] Epoch: 1 Batch: 6264/20099 (31.17%) Loss: 2.136469 LR: 0.00002872 +[10:31:44] Epoch: 1 Batch: 6265/20099 (31.17%) Loss: 2.013725 LR: 0.00002872 +[10:31:46] Epoch: 1 Batch: 6266/20099 (31.18%) Loss: 2.118818 LR: 0.00002872 
+[10:31:48] Epoch: 1 Batch: 6267/20099 (31.18%) Loss: 2.116645 LR: 0.00002872 +[10:31:49] Epoch: 1 Batch: 6268/20099 (31.19%) Loss: 2.264665 LR: 0.00002872 +[10:31:51] Epoch: 1 Batch: 6269/20099 (31.19%) Loss: 2.431829 LR: 0.00002872 +[10:31:53] Epoch: 1 Batch: 6270/20099 (31.20%) Loss: 2.005041 LR: 0.00002872 +[10:31:55] Epoch: 1 Batch: 6271/20099 (31.20%) Loss: 2.347154 LR: 0.00002872 +[10:31:57] Epoch: 1 Batch: 6272/20099 (31.21%) Loss: 2.042758 LR: 0.00002872 +[10:31:58] Epoch: 1 Batch: 6273/20099 (31.21%) Loss: 2.224375 LR: 0.00002872 +[10:32:00] Epoch: 1 Batch: 6274/20099 (31.22%) Loss: 2.099962 LR: 0.00002872 +[10:32:02] Epoch: 1 Batch: 6275/20099 (31.22%) Loss: 2.175937 LR: 0.00002871 +[10:32:04] Epoch: 1 Batch: 6276/20099 (31.23%) Loss: 2.144987 LR: 0.00002871 +[10:32:05] Epoch: 1 Batch: 6277/20099 (31.23%) Loss: 2.311742 LR: 0.00002871 +[10:32:07] Epoch: 1 Batch: 6278/20099 (31.24%) Loss: 2.154278 LR: 0.00002871 +[10:32:09] Epoch: 1 Batch: 6279/20099 (31.24%) Loss: 2.148815 LR: 0.00002871 +[10:32:11] Epoch: 1 Batch: 6280/20099 (31.25%) Loss: 2.019048 LR: 0.00002871 +[10:32:12] Epoch: 1 Batch: 6281/20099 (31.25%) Loss: 2.227001 LR: 0.00002871 +[10:32:14] Epoch: 1 Batch: 6282/20099 (31.26%) Loss: 1.992877 LR: 0.00002870 +[10:32:16] Epoch: 1 Batch: 6283/20099 (31.26%) Loss: 2.038112 LR: 0.00002870 +[10:32:18] Epoch: 1 Batch: 6284/20099 (31.27%) Loss: 2.358537 LR: 0.00002870 +[10:32:20] Epoch: 1 Batch: 6285/20099 (31.27%) Loss: 2.396007 LR: 0.00002870 +[10:32:21] Epoch: 1 Batch: 6286/20099 (31.28%) Loss: 2.252377 LR: 0.00002870 +[10:32:23] Epoch: 1 Batch: 6287/20099 (31.28%) Loss: 1.995181 LR: 0.00002870 +[10:32:25] Epoch: 1 Batch: 6288/20099 (31.29%) Loss: 2.208895 LR: 0.00002870 +[10:32:27] Epoch: 1 Batch: 6289/20099 (31.29%) Loss: 2.161045 LR: 0.00002869 +[10:32:28] Epoch: 1 Batch: 6290/20099 (31.30%) Loss: 1.909012 LR: 0.00002869 +[10:32:30] Epoch: 1 Batch: 6291/20099 (31.30%) Loss: 2.003023 LR: 0.00002869 +[10:32:32] Epoch: 1 Batch: 6292/20099 (31.31%) 
Loss: 2.053637 LR: 0.00002869 +[10:32:34] Epoch: 1 Batch: 6293/20099 (31.31%) Loss: 1.581449 LR: 0.00002869 +[10:32:36] Epoch: 1 Batch: 6294/20099 (31.31%) Loss: 1.993044 LR: 0.00002869 +[10:32:37] Epoch: 1 Batch: 6295/20099 (31.32%) Loss: 2.364690 LR: 0.00002869 +[10:32:39] Epoch: 1 Batch: 6296/20099 (31.32%) Loss: 2.251909 LR: 0.00002869 +[10:32:41] Epoch: 1 Batch: 6297/20099 (31.33%) Loss: 2.124489 LR: 0.00002869 +[10:32:43] Epoch: 1 Batch: 6298/20099 (31.33%) Loss: 2.271361 LR: 0.00002869 +[10:32:44] Epoch: 1 Batch: 6299/20099 (31.34%) Loss: 2.161395 LR: 0.00002869 +[10:32:46] Epoch: 1 Batch: 6300/20099 (31.34%) Loss: 1.885215 LR: 0.00002869 +[10:32:48] Epoch: 1 Batch: 6301/20099 (31.35%) Loss: 2.243901 LR: 0.00002869 +[10:32:50] Epoch: 1 Batch: 6302/20099 (31.35%) Loss: 2.308875 LR: 0.00002869 +[10:32:52] Epoch: 1 Batch: 6303/20099 (31.36%) Loss: 2.255338 LR: 0.00002868 +[10:32:54] Epoch: 1 Batch: 6304/20099 (31.36%) Loss: 2.232868 LR: 0.00002868 +[10:32:55] Epoch: 1 Batch: 6305/20099 (31.37%) Loss: 2.008337 LR: 0.00002868 +[10:32:57] Epoch: 1 Batch: 6306/20099 (31.37%) Loss: 2.292962 LR: 0.00002868 +[10:32:59] Epoch: 1 Batch: 6307/20099 (31.38%) Loss: 2.177768 LR: 0.00002868 +[10:33:01] Epoch: 1 Batch: 6308/20099 (31.38%) Loss: 1.999588 LR: 0.00002868 +[10:33:02] Epoch: 1 Batch: 6309/20099 (31.39%) Loss: 2.037834 LR: 0.00002868 +[10:33:04] Epoch: 1 Batch: 6310/20099 (31.39%) Loss: 1.953870 LR: 0.00002867 +[10:33:06] Epoch: 1 Batch: 6311/20099 (31.40%) Loss: 2.386093 LR: 0.00002867 +[10:33:08] Epoch: 1 Batch: 6312/20099 (31.40%) Loss: 2.085163 LR: 0.00002867 +[10:33:10] Epoch: 1 Batch: 6313/20099 (31.41%) Loss: 2.273416 LR: 0.00002867 +[10:33:11] Epoch: 1 Batch: 6314/20099 (31.41%) Loss: 1.964740 LR: 0.00002867 +[10:33:13] Epoch: 1 Batch: 6315/20099 (31.42%) Loss: 1.837167 LR: 0.00002867 +[10:33:15] Epoch: 1 Batch: 6316/20099 (31.42%) Loss: 2.140056 LR: 0.00002867 +[10:33:17] Epoch: 1 Batch: 6317/20099 (31.43%) Loss: 1.926631 LR: 0.00002866 +[10:33:18] Epoch: 
1 Batch: 6318/20099 (31.43%) Loss: 2.346955 LR: 0.00002866 +[10:33:20] Epoch: 1 Batch: 6319/20099 (31.44%) Loss: 2.188176 LR: 0.00002866 +[10:33:22] Epoch: 1 Batch: 6320/20099 (31.44%) Loss: 2.186530 LR: 0.00002866 +[10:33:24] Epoch: 1 Batch: 6321/20099 (31.45%) Loss: 2.317046 LR: 0.00002866 +[10:33:26] Epoch: 1 Batch: 6322/20099 (31.45%) Loss: 2.321129 LR: 0.00002866 +[10:33:27] Epoch: 1 Batch: 6323/20099 (31.46%) Loss: 2.320998 LR: 0.00002866 +[10:33:29] Epoch: 1 Batch: 6324/20099 (31.46%) Loss: 2.068393 LR: 0.00002866 +[10:33:31] Epoch: 1 Batch: 6325/20099 (31.47%) Loss: 2.342798 LR: 0.00002866 +[10:33:33] Epoch: 1 Batch: 6326/20099 (31.47%) Loss: 2.376051 LR: 0.00002866 +[10:33:34] Epoch: 1 Batch: 6327/20099 (31.48%) Loss: 1.967569 LR: 0.00002866 +[10:33:36] Epoch: 1 Batch: 6328/20099 (31.48%) Loss: 2.309822 LR: 0.00002866 +[10:33:38] Epoch: 1 Batch: 6329/20099 (31.49%) Loss: 2.247426 LR: 0.00002866 +[10:33:40] Epoch: 1 Batch: 6330/20099 (31.49%) Loss: 2.406678 LR: 0.00002866 +[10:33:42] Epoch: 1 Batch: 6331/20099 (31.50%) Loss: 2.164763 LR: 0.00002865 +[10:33:43] Epoch: 1 Batch: 6332/20099 (31.50%) Loss: 1.999369 LR: 0.00002865 +[10:33:45] Epoch: 1 Batch: 6333/20099 (31.51%) Loss: 1.852582 LR: 0.00002865 +[10:33:47] Epoch: 1 Batch: 6334/20099 (31.51%) Loss: 2.117928 LR: 0.00002865 +[10:33:49] Epoch: 1 Batch: 6335/20099 (31.52%) Loss: 2.210070 LR: 0.00002865 +[10:33:50] Epoch: 1 Batch: 6336/20099 (31.52%) Loss: 2.194678 LR: 0.00002865 +[10:33:52] Epoch: 1 Batch: 6337/20099 (31.53%) Loss: 2.138809 LR: 0.00002865 +[10:33:54] Epoch: 1 Batch: 6338/20099 (31.53%) Loss: 2.578362 LR: 0.00002864 +[10:33:56] Epoch: 1 Batch: 6339/20099 (31.54%) Loss: 2.613313 LR: 0.00002864 +[10:33:57] Epoch: 1 Batch: 6340/20099 (31.54%) Loss: 1.962745 LR: 0.00002864 +[10:33:59] Epoch: 1 Batch: 6341/20099 (31.55%) Loss: 2.238034 LR: 0.00002864 +[10:34:01] Epoch: 1 Batch: 6342/20099 (31.55%) Loss: 1.965877 LR: 0.00002864 +[10:34:03] Epoch: 1 Batch: 6343/20099 (31.56%) Loss: 2.031760 LR: 
0.00002864 +[10:34:05] Epoch: 1 Batch: 6344/20099 (31.56%) Loss: 2.168831 LR: 0.00002864 +[10:34:06] Epoch: 1 Batch: 6345/20099 (31.57%) Loss: 2.137528 LR: 0.00002863 +[10:34:08] Epoch: 1 Batch: 6346/20099 (31.57%) Loss: 2.020970 LR: 0.00002863 +[10:34:10] Epoch: 1 Batch: 6347/20099 (31.58%) Loss: 2.623054 LR: 0.00002863 +[10:34:12] Epoch: 1 Batch: 6348/20099 (31.58%) Loss: 2.391135 LR: 0.00002863 +[10:34:13] Epoch: 1 Batch: 6349/20099 (31.59%) Loss: 1.864197 LR: 0.00002863 +[10:34:15] Epoch: 1 Batch: 6350/20099 (31.59%) Loss: 2.221427 LR: 0.00002863 +[10:34:17] Epoch: 1 Batch: 6351/20099 (31.60%) Loss: 2.290756 LR: 0.00002863 +[10:34:19] Epoch: 1 Batch: 6352/20099 (31.60%) Loss: 2.254968 LR: 0.00002863 +[10:34:21] Epoch: 1 Batch: 6353/20099 (31.61%) Loss: 2.176170 LR: 0.00002863 +[10:34:22] Epoch: 1 Batch: 6354/20099 (31.61%) Loss: 2.136479 LR: 0.00002863 +[10:34:24] Epoch: 1 Batch: 6355/20099 (31.62%) Loss: 2.019973 LR: 0.00002863 +[10:34:26] Epoch: 1 Batch: 6356/20099 (31.62%) Loss: 2.168294 LR: 0.00002863 +[10:34:28] Epoch: 1 Batch: 6357/20099 (31.63%) Loss: 2.092223 LR: 0.00002863 +[10:34:29] Epoch: 1 Batch: 6358/20099 (31.63%) Loss: 1.790054 LR: 0.00002863 +[10:34:31] Epoch: 1 Batch: 6359/20099 (31.64%) Loss: 2.244169 LR: 0.00002862 +[10:34:33] Epoch: 1 Batch: 6360/20099 (31.64%) Loss: 1.939675 LR: 0.00002862 +[10:34:35] Epoch: 1 Batch: 6361/20099 (31.65%) Loss: 2.093519 LR: 0.00002862 +[10:34:36] Epoch: 1 Batch: 6362/20099 (31.65%) Loss: 2.149931 LR: 0.00002862 +[10:34:38] Epoch: 1 Batch: 6363/20099 (31.66%) Loss: 2.302253 LR: 0.00002862 +[10:34:40] Epoch: 1 Batch: 6364/20099 (31.66%) Loss: 2.047087 LR: 0.00002862 +[10:34:42] Epoch: 1 Batch: 6365/20099 (31.67%) Loss: 1.813463 LR: 0.00002862 +[10:34:43] Epoch: 1 Batch: 6366/20099 (31.67%) Loss: 2.200158 LR: 0.00002861 +[10:34:45] Epoch: 1 Batch: 6367/20099 (31.68%) Loss: 2.218111 LR: 0.00002861 +[10:34:47] Epoch: 1 Batch: 6368/20099 (31.68%) Loss: 2.087201 LR: 0.00002861 +[10:34:49] Epoch: 1 Batch: 6369/20099 
(31.69%) Loss: 2.247863 LR: 0.00002861 +[10:34:50] Epoch: 1 Batch: 6370/20099 (31.69%) Loss: 2.013531 LR: 0.00002861 +[10:34:52] Epoch: 1 Batch: 6371/20099 (31.70%) Loss: 2.202727 LR: 0.00002861 +[10:34:54] Epoch: 1 Batch: 6372/20099 (31.70%) Loss: 2.069596 LR: 0.00002861 +[10:34:56] Epoch: 1 Batch: 6373/20099 (31.71%) Loss: 1.978574 LR: 0.00002860 +[10:34:58] Epoch: 1 Batch: 6374/20099 (31.71%) Loss: 2.135419 LR: 0.00002860 +[10:34:59] Epoch: 1 Batch: 6375/20099 (31.72%) Loss: 2.421329 LR: 0.00002860 +[10:35:01] Epoch: 1 Batch: 6376/20099 (31.72%) Loss: 1.557088 LR: 0.00002860 +[10:35:03] Epoch: 1 Batch: 6377/20099 (31.73%) Loss: 2.361428 LR: 0.00002860 +[10:35:05] Epoch: 1 Batch: 6378/20099 (31.73%) Loss: 1.848278 LR: 0.00002860 +[10:35:06] Epoch: 1 Batch: 6379/20099 (31.74%) Loss: 2.046473 LR: 0.00002860 +[10:35:08] Epoch: 1 Batch: 6380/20099 (31.74%) Loss: 2.112885 LR: 0.00002860 +[10:35:10] Epoch: 1 Batch: 6381/20099 (31.75%) Loss: 2.250193 LR: 0.00002860 +[10:35:12] Epoch: 1 Batch: 6382/20099 (31.75%) Loss: 1.942080 LR: 0.00002860 +[10:35:13] Epoch: 1 Batch: 6383/20099 (31.76%) Loss: 2.205992 LR: 0.00002860 +[10:35:15] Epoch: 1 Batch: 6384/20099 (31.76%) Loss: 1.865969 LR: 0.00002860 +[10:35:17] Epoch: 1 Batch: 6385/20099 (31.77%) Loss: 2.087124 LR: 0.00002860 +[10:35:19] Epoch: 1 Batch: 6386/20099 (31.77%) Loss: 1.854963 LR: 0.00002860 +[10:35:20] Epoch: 1 Batch: 6387/20099 (31.78%) Loss: 2.257592 LR: 0.00002859 +[10:35:22] Epoch: 1 Batch: 6388/20099 (31.78%) Loss: 2.037284 LR: 0.00002859 +[10:35:24] Epoch: 1 Batch: 6389/20099 (31.79%) Loss: 2.206249 LR: 0.00002859 +[10:35:26] Epoch: 1 Batch: 6390/20099 (31.79%) Loss: 2.296614 LR: 0.00002859 +[10:35:27] Epoch: 1 Batch: 6391/20099 (31.80%) Loss: 2.156594 LR: 0.00002859 +[10:35:29] Epoch: 1 Batch: 6392/20099 (31.80%) Loss: 1.779805 LR: 0.00002859 +[10:35:31] Epoch: 1 Batch: 6393/20099 (31.81%) Loss: 2.363652 LR: 0.00002859 +[10:35:33] Epoch: 1 Batch: 6394/20099 (31.81%) Loss: 2.080235 LR: 0.00002858 
+[10:35:35] Epoch: 1 Batch: 6395/20099 (31.82%) Loss: 2.042125 LR: 0.00002858 +[10:35:36] Epoch: 1 Batch: 6396/20099 (31.82%) Loss: 2.227843 LR: 0.00002858 +[10:35:38] Epoch: 1 Batch: 6397/20099 (31.83%) Loss: 2.112696 LR: 0.00002858 +[10:35:40] Epoch: 1 Batch: 6398/20099 (31.83%) Loss: 1.746638 LR: 0.00002858 +[10:35:42] Epoch: 1 Batch: 6399/20099 (31.84%) Loss: 2.035920 LR: 0.00002858 +[10:35:47] >> Cleaned up old temp checkpoint: epoch1_step4400 +[10:35:47] >> Temp checkpoint saved: epoch1_step6400, size: 0.1693 GB +[10:35:47] Epoch: 1 Batch: 6400/20099 (31.84%) Loss: 2.187665 LR: 0.00002858 +[10:35:49] Epoch: 1 Batch: 6401/20099 (31.85%) Loss: 2.415897 LR: 0.00002857 +[10:35:50] Epoch: 1 Batch: 6402/20099 (31.85%) Loss: 2.008123 LR: 0.00002857 +[10:35:52] Epoch: 1 Batch: 6403/20099 (31.86%) Loss: 2.250744 LR: 0.00002857 +[10:35:54] Epoch: 1 Batch: 6404/20099 (31.86%) Loss: 1.907071 LR: 0.00002857 +[10:35:56] Epoch: 1 Batch: 6405/20099 (31.87%) Loss: 2.142601 LR: 0.00002857 +[10:35:57] Epoch: 1 Batch: 6406/20099 (31.87%) Loss: 2.119431 LR: 0.00002857 +[10:35:59] Epoch: 1 Batch: 6407/20099 (31.88%) Loss: 2.164829 LR: 0.00002857 +[10:36:01] Epoch: 1 Batch: 6408/20099 (31.88%) Loss: 1.941119 LR: 0.00002857 +[10:36:03] Epoch: 1 Batch: 6409/20099 (31.89%) Loss: 2.189453 LR: 0.00002857 +[10:36:04] Epoch: 1 Batch: 6410/20099 (31.89%) Loss: 1.862437 LR: 0.00002857 +[10:36:06] Epoch: 1 Batch: 6411/20099 (31.90%) Loss: 2.442967 LR: 0.00002857 +[10:36:08] Epoch: 1 Batch: 6412/20099 (31.90%) Loss: 2.326790 LR: 0.00002857 +[10:36:10] Epoch: 1 Batch: 6413/20099 (31.91%) Loss: 1.938196 LR: 0.00002857 +[10:36:12] Epoch: 1 Batch: 6414/20099 (31.91%) Loss: 2.344465 LR: 0.00002857 +[10:36:13] Epoch: 1 Batch: 6415/20099 (31.92%) Loss: 2.083340 LR: 0.00002856 +[10:36:15] Epoch: 1 Batch: 6416/20099 (31.92%) Loss: 2.089156 LR: 0.00002856 +[10:36:17] Epoch: 1 Batch: 6417/20099 (31.93%) Loss: 2.032650 LR: 0.00002856 +[10:36:19] Epoch: 1 Batch: 6418/20099 (31.93%) Loss: 1.982558 LR: 
0.00002856 +[10:36:21] Epoch: 1 Batch: 6419/20099 (31.94%) Loss: 2.167651 LR: 0.00002856 +[10:36:22] Epoch: 1 Batch: 6420/20099 (31.94%) Loss: 2.191182 LR: 0.00002856 +[10:36:24] Epoch: 1 Batch: 6421/20099 (31.95%) Loss: 2.059162 LR: 0.00002856 +[10:36:26] Epoch: 1 Batch: 6422/20099 (31.95%) Loss: 2.069561 LR: 0.00002855 +[10:36:28] Epoch: 1 Batch: 6423/20099 (31.96%) Loss: 1.951817 LR: 0.00002855 +[10:36:29] Epoch: 1 Batch: 6424/20099 (31.96%) Loss: 2.182806 LR: 0.00002855 +[10:36:31] Epoch: 1 Batch: 6425/20099 (31.97%) Loss: 2.182603 LR: 0.00002855 +[10:36:33] Epoch: 1 Batch: 6426/20099 (31.97%) Loss: 2.202991 LR: 0.00002855 +[10:36:35] Epoch: 1 Batch: 6427/20099 (31.98%) Loss: 2.128460 LR: 0.00002855 +[10:36:37] Epoch: 1 Batch: 6428/20099 (31.98%) Loss: 2.210492 LR: 0.00002855 +[10:36:38] Epoch: 1 Batch: 6429/20099 (31.99%) Loss: 2.253300 LR: 0.00002854 +[10:36:40] Epoch: 1 Batch: 6430/20099 (31.99%) Loss: 2.055878 LR: 0.00002854 +[10:36:42] Epoch: 1 Batch: 6431/20099 (32.00%) Loss: 2.252514 LR: 0.00002854 +[10:36:44] Epoch: 1 Batch: 6432/20099 (32.00%) Loss: 2.128476 LR: 0.00002854 +[10:36:45] Epoch: 1 Batch: 6433/20099 (32.01%) Loss: 2.277943 LR: 0.00002854 +[10:36:47] Epoch: 1 Batch: 6434/20099 (32.01%) Loss: 2.148726 LR: 0.00002854 +[10:36:49] Epoch: 1 Batch: 6435/20099 (32.02%) Loss: 2.372304 LR: 0.00002854 +[10:36:51] Epoch: 1 Batch: 6436/20099 (32.02%) Loss: 2.173939 LR: 0.00002853 +[10:36:53] Epoch: 1 Batch: 6437/20099 (32.03%) Loss: 2.139204 LR: 0.00002853 +[10:36:55] Epoch: 1 Batch: 6438/20099 (32.03%) Loss: 2.104537 LR: 0.00002853 +[10:36:56] Epoch: 1 Batch: 6439/20099 (32.04%) Loss: 1.990730 LR: 0.00002853 +[10:36:58] Epoch: 1 Batch: 6440/20099 (32.04%) Loss: 1.683298 LR: 0.00002853 +[10:37:00] Epoch: 1 Batch: 6441/20099 (32.05%) Loss: 1.856986 LR: 0.00002853 +[10:37:02] Epoch: 1 Batch: 6442/20099 (32.05%) Loss: 1.720581 LR: 0.00002853 +[10:37:03] Epoch: 1 Batch: 6443/20099 (32.06%) Loss: 2.269967 LR: 0.00002853 +[10:37:05] Epoch: 1 Batch: 6444/20099 
(32.06%) Loss: 2.088237 LR: 0.00002853 +[10:37:07] Epoch: 1 Batch: 6445/20099 (32.07%) Loss: 2.325061 LR: 0.00002853 +[10:37:09] Epoch: 1 Batch: 6446/20099 (32.07%) Loss: 2.199887 LR: 0.00002853 +[10:37:10] Epoch: 1 Batch: 6447/20099 (32.08%) Loss: 1.890098 LR: 0.00002853 +[10:37:12] Epoch: 1 Batch: 6448/20099 (32.08%) Loss: 1.856905 LR: 0.00002853 +[10:37:14] Epoch: 1 Batch: 6449/20099 (32.09%) Loss: 1.878806 LR: 0.00002853 +[10:37:16] Epoch: 1 Batch: 6450/20099 (32.09%) Loss: 2.107252 LR: 0.00002852 +[10:37:17] Epoch: 1 Batch: 6451/20099 (32.10%) Loss: 2.153322 LR: 0.00002852 +[10:37:19] Epoch: 1 Batch: 6452/20099 (32.10%) Loss: 1.989208 LR: 0.00002852 +[10:37:21] Epoch: 1 Batch: 6453/20099 (32.11%) Loss: 2.167279 LR: 0.00002852 +[10:37:23] Epoch: 1 Batch: 6454/20099 (32.11%) Loss: 2.041933 LR: 0.00002852 +[10:37:24] Epoch: 1 Batch: 6455/20099 (32.12%) Loss: 2.164305 LR: 0.00002852 +[10:37:26] Epoch: 1 Batch: 6456/20099 (32.12%) Loss: 2.242924 LR: 0.00002852 +[10:37:28] Epoch: 1 Batch: 6457/20099 (32.13%) Loss: 2.059285 LR: 0.00002851 +[10:37:30] Epoch: 1 Batch: 6458/20099 (32.13%) Loss: 2.238825 LR: 0.00002851 +[10:37:31] Epoch: 1 Batch: 6459/20099 (32.14%) Loss: 1.898645 LR: 0.00002851 +[10:37:33] Epoch: 1 Batch: 6460/20099 (32.14%) Loss: 1.892266 LR: 0.00002851 +[10:37:35] Epoch: 1 Batch: 6461/20099 (32.15%) Loss: 2.213252 LR: 0.00002851 +[10:37:37] Epoch: 1 Batch: 6462/20099 (32.15%) Loss: 2.130011 LR: 0.00002851 +[10:37:39] Epoch: 1 Batch: 6463/20099 (32.16%) Loss: 2.394291 LR: 0.00002851 +[10:37:40] Epoch: 1 Batch: 6464/20099 (32.16%) Loss: 2.277079 LR: 0.00002850 +[10:37:42] Epoch: 1 Batch: 6465/20099 (32.17%) Loss: 2.149283 LR: 0.00002850 +[10:37:44] Epoch: 1 Batch: 6466/20099 (32.17%) Loss: 2.530308 LR: 0.00002850 +[10:37:46] Epoch: 1 Batch: 6467/20099 (32.18%) Loss: 2.000461 LR: 0.00002850 +[10:37:48] Epoch: 1 Batch: 6468/20099 (32.18%) Loss: 2.242094 LR: 0.00002850 +[10:37:49] Epoch: 1 Batch: 6469/20099 (32.19%) Loss: 2.252054 LR: 0.00002850 
+[10:37:51] Epoch: 1 Batch: 6470/20099 (32.19%) Loss: 2.180910 LR: 0.00002850 +[10:37:53] Epoch: 1 Batch: 6471/20099 (32.20%) Loss: 1.886440 LR: 0.00002849 +[10:37:55] Epoch: 1 Batch: 6472/20099 (32.20%) Loss: 1.936881 LR: 0.00002849 +[10:37:56] Epoch: 1 Batch: 6473/20099 (32.21%) Loss: 2.202753 LR: 0.00002849 +[10:37:58] Epoch: 1 Batch: 6474/20099 (32.21%) Loss: 2.116051 LR: 0.00002849 +[10:38:00] Epoch: 1 Batch: 6475/20099 (32.22%) Loss: 2.321758 LR: 0.00002849 +[10:38:02] Epoch: 1 Batch: 6476/20099 (32.22%) Loss: 2.130818 LR: 0.00002849 +[10:38:04] Epoch: 1 Batch: 6477/20099 (32.23%) Loss: 2.130774 LR: 0.00002849 +[10:38:05] Epoch: 1 Batch: 6478/20099 (32.23%) Loss: 2.111354 LR: 0.00002849 +[10:38:07] Epoch: 1 Batch: 6479/20099 (32.24%) Loss: 1.938881 LR: 0.00002849 +[10:38:09] Epoch: 1 Batch: 6480/20099 (32.24%) Loss: 2.136102 LR: 0.00002849 +[10:38:11] Epoch: 1 Batch: 6481/20099 (32.25%) Loss: 2.278056 LR: 0.00002849 +[10:38:12] Epoch: 1 Batch: 6482/20099 (32.25%) Loss: 2.088788 LR: 0.00002849 +[10:38:14] Epoch: 1 Batch: 6483/20099 (32.26%) Loss: 1.988611 LR: 0.00002849 +[10:38:16] Epoch: 1 Batch: 6484/20099 (32.26%) Loss: 2.331734 LR: 0.00002849 +[10:38:18] Epoch: 1 Batch: 6485/20099 (32.27%) Loss: 1.954788 LR: 0.00002848 +[10:38:19] Epoch: 1 Batch: 6486/20099 (32.27%) Loss: 2.239268 LR: 0.00002848 +[10:38:21] Epoch: 1 Batch: 6487/20099 (32.28%) Loss: 2.367664 LR: 0.00002848 +[10:38:23] Epoch: 1 Batch: 6488/20099 (32.28%) Loss: 2.037768 LR: 0.00002848 +[10:38:25] Epoch: 1 Batch: 6489/20099 (32.29%) Loss: 2.313503 LR: 0.00002848 +[10:38:27] Epoch: 1 Batch: 6490/20099 (32.29%) Loss: 2.131440 LR: 0.00002848 +[10:38:28] Epoch: 1 Batch: 6491/20099 (32.30%) Loss: 2.190133 LR: 0.00002848 +[10:38:30] Epoch: 1 Batch: 6492/20099 (32.30%) Loss: 2.212343 LR: 0.00002847 +[10:38:32] Epoch: 1 Batch: 6493/20099 (32.31%) Loss: 2.175993 LR: 0.00002847 +[10:38:34] Epoch: 1 Batch: 6494/20099 (32.31%) Loss: 1.995020 LR: 0.00002847 +[10:38:35] Epoch: 1 Batch: 6495/20099 (32.32%) 
Loss: 2.087795 LR: 0.00002847 +[10:38:37] Epoch: 1 Batch: 6496/20099 (32.32%) Loss: 2.122383 LR: 0.00002847 +[10:38:39] Epoch: 1 Batch: 6497/20099 (32.32%) Loss: 2.111411 LR: 0.00002847 +[10:38:41] Epoch: 1 Batch: 6498/20099 (32.33%) Loss: 2.365207 LR: 0.00002847 +[10:38:43] Epoch: 1 Batch: 6499/20099 (32.33%) Loss: 1.932493 LR: 0.00002846 +[10:38:44] >> Evaluating batch 0 +[10:38:45] >> Evaluating batch 1 +[10:38:46] >> Evaluating batch 2 +[10:38:47] >> Evaluating batch 3 +[10:38:48] >> Evaluating batch 4 +[10:38:49] >> Evaluating batch 5 +[10:38:51] >> Evaluating batch 6 +[10:38:51] >> Evaluating batch 7 +[10:38:53] >> Evaluating batch 8 +[10:38:54] >> Evaluating batch 9 +[10:38:54] >> Evaluating batch 10 +[10:38:55] >> Evaluating batch 11 +[10:38:56] >> Evaluating batch 12 +[10:38:57] >> Evaluating batch 13 +[10:38:58] >> Evaluating batch 14 +[10:38:59] >> Evaluating batch 15 +[10:39:00] >> Evaluating batch 16 +[10:39:01] Epoch: 1 Step: 6500/20099 Evaluation: +[10:39:01] Avg Loss Since Last Eval: 2.1318 Val Loss: 2.1890 Validation loss delta: -0.0093 Perplexity: 8.9265 LR: 0.00002846 +[10:39:04] >> Checkpoint saved: epoch1_step6500, size: 0.1693 GB +[10:39:04] Epoch: 1 Batch: 6500/20099 (32.34%) Loss: 1.906406 LR: 0.00002846 +[10:39:06] Epoch: 1 Batch: 6501/20099 (32.34%) Loss: 2.115054 LR: 0.00002846 +[10:39:08] Epoch: 1 Batch: 6502/20099 (32.35%) Loss: 2.105612 LR: 0.00002846 +[10:39:10] Epoch: 1 Batch: 6503/20099 (32.35%) Loss: 2.404300 LR: 0.00002846 +[10:39:11] Epoch: 1 Batch: 6504/20099 (32.36%) Loss: 1.993336 LR: 0.00002846 +[10:39:13] Epoch: 1 Batch: 6505/20099 (32.36%) Loss: 2.086851 LR: 0.00002846 +[10:39:15] Epoch: 1 Batch: 6506/20099 (32.37%) Loss: 2.032112 LR: 0.00002846 +[10:39:17] Epoch: 1 Batch: 6507/20099 (32.37%) Loss: 2.123315 LR: 0.00002846 +[10:39:18] Epoch: 1 Batch: 6508/20099 (32.38%) Loss: 2.132798 LR: 0.00002846 +[10:39:20] Epoch: 1 Batch: 6509/20099 (32.38%) Loss: 2.028927 LR: 0.00002846 +[10:39:22] Epoch: 1 Batch: 6510/20099 
(32.39%) Loss: 2.413198 LR: 0.00002846 +[10:39:24] Epoch: 1 Batch: 6511/20099 (32.39%) Loss: 1.752986 LR: 0.00002846 +[10:39:26] Epoch: 1 Batch: 6512/20099 (32.40%) Loss: 1.807659 LR: 0.00002846 +[10:39:27] Epoch: 1 Batch: 6513/20099 (32.40%) Loss: 2.421151 LR: 0.00002845 +[10:39:29] Epoch: 1 Batch: 6514/20099 (32.41%) Loss: 2.422933 LR: 0.00002845 +[10:39:31] Epoch: 1 Batch: 6515/20099 (32.41%) Loss: 1.952537 LR: 0.00002845 +[10:39:33] Epoch: 1 Batch: 6516/20099 (32.42%) Loss: 2.209727 LR: 0.00002845 +[10:39:35] Epoch: 1 Batch: 6517/20099 (32.42%) Loss: 2.500434 LR: 0.00002845 +[10:39:36] Epoch: 1 Batch: 6518/20099 (32.43%) Loss: 1.948252 LR: 0.00002845 +[10:39:38] Epoch: 1 Batch: 6519/20099 (32.43%) Loss: 2.600738 LR: 0.00002845 +[10:39:40] Epoch: 1 Batch: 6520/20099 (32.44%) Loss: 2.185647 LR: 0.00002844 +[10:39:42] Epoch: 1 Batch: 6521/20099 (32.44%) Loss: 2.219620 LR: 0.00002844 +[10:39:44] Epoch: 1 Batch: 6522/20099 (32.45%) Loss: 2.090808 LR: 0.00002844 +[10:39:45] Epoch: 1 Batch: 6523/20099 (32.45%) Loss: 1.949762 LR: 0.00002844 +[10:39:47] Epoch: 1 Batch: 6524/20099 (32.46%) Loss: 1.940869 LR: 0.00002844 +[10:39:49] Epoch: 1 Batch: 6525/20099 (32.46%) Loss: 2.095965 LR: 0.00002844 +[10:39:51] Epoch: 1 Batch: 6526/20099 (32.47%) Loss: 2.292775 LR: 0.00002844 +[10:39:52] Epoch: 1 Batch: 6527/20099 (32.47%) Loss: 1.980376 LR: 0.00002843 +[10:39:54] Epoch: 1 Batch: 6528/20099 (32.48%) Loss: 2.125344 LR: 0.00002843 +[10:39:56] Epoch: 1 Batch: 6529/20099 (32.48%) Loss: 2.118387 LR: 0.00002843 +[10:39:58] Epoch: 1 Batch: 6530/20099 (32.49%) Loss: 1.950618 LR: 0.00002843 +[10:39:59] Epoch: 1 Batch: 6531/20099 (32.49%) Loss: 2.460605 LR: 0.00002843 +[10:40:01] Epoch: 1 Batch: 6532/20099 (32.50%) Loss: 2.112350 LR: 0.00002843 +[10:40:03] Epoch: 1 Batch: 6533/20099 (32.50%) Loss: 2.153263 LR: 0.00002843 +[10:40:05] Epoch: 1 Batch: 6534/20099 (32.51%) Loss: 2.217041 LR: 0.00002842 +[10:40:07] Epoch: 1 Batch: 6535/20099 (32.51%) Loss: 2.092172 LR: 0.00002842 
+[10:40:08] Epoch: 1 Batch: 6536/20099 (32.52%) Loss: 2.099350 LR: 0.00002842 +[10:40:10] Epoch: 1 Batch: 6537/20099 (32.52%) Loss: 2.239337 LR: 0.00002842 +[10:40:12] Epoch: 1 Batch: 6538/20099 (32.53%) Loss: 2.060982 LR: 0.00002842 +[10:40:14] Epoch: 1 Batch: 6539/20099 (32.53%) Loss: 2.290468 LR: 0.00002842 +[10:40:15] Epoch: 1 Batch: 6540/20099 (32.54%) Loss: 1.859637 LR: 0.00002842 +[10:40:17] Epoch: 1 Batch: 6541/20099 (32.54%) Loss: 2.027823 LR: 0.00002842 +[10:40:19] Epoch: 1 Batch: 6542/20099 (32.55%) Loss: 2.298292 LR: 0.00002842 +[10:40:21] Epoch: 1 Batch: 6543/20099 (32.55%) Loss: 2.235002 LR: 0.00002842 +[10:40:22] Epoch: 1 Batch: 6544/20099 (32.56%) Loss: 1.882437 LR: 0.00002842 +[10:40:24] Epoch: 1 Batch: 6545/20099 (32.56%) Loss: 2.103991 LR: 0.00002842 +[10:40:26] Epoch: 1 Batch: 6546/20099 (32.57%) Loss: 2.411402 LR: 0.00002842 +[10:40:28] Epoch: 1 Batch: 6547/20099 (32.57%) Loss: 2.240455 LR: 0.00002842 +[10:40:29] Epoch: 1 Batch: 6548/20099 (32.58%) Loss: 2.197745 LR: 0.00002841 +[10:40:31] Epoch: 1 Batch: 6549/20099 (32.58%) Loss: 2.008924 LR: 0.00002841 +[10:40:33] Epoch: 1 Batch: 6550/20099 (32.59%) Loss: 2.037074 LR: 0.00002841 +[10:40:35] Epoch: 1 Batch: 6551/20099 (32.59%) Loss: 2.100135 LR: 0.00002841 +[10:40:37] Epoch: 1 Batch: 6552/20099 (32.60%) Loss: 2.291214 LR: 0.00002841 +[10:40:38] Epoch: 1 Batch: 6553/20099 (32.60%) Loss: 2.208898 LR: 0.00002841 +[10:40:40] Epoch: 1 Batch: 6554/20099 (32.61%) Loss: 2.073964 LR: 0.00002841 +[10:40:42] Epoch: 1 Batch: 6555/20099 (32.61%) Loss: 2.003874 LR: 0.00002840 +[10:40:44] Epoch: 1 Batch: 6556/20099 (32.62%) Loss: 1.905044 LR: 0.00002840 +[10:40:45] Epoch: 1 Batch: 6557/20099 (32.62%) Loss: 2.297339 LR: 0.00002840 +[10:40:47] Epoch: 1 Batch: 6558/20099 (32.63%) Loss: 1.921412 LR: 0.00002840 +[10:40:49] Epoch: 1 Batch: 6559/20099 (32.63%) Loss: 2.214042 LR: 0.00002840 +[10:40:51] Epoch: 1 Batch: 6560/20099 (32.64%) Loss: 2.128930 LR: 0.00002840 +[10:40:52] Epoch: 1 Batch: 6561/20099 (32.64%) 
Loss: 2.101876 LR: 0.00002840 +[10:40:54] Epoch: 1 Batch: 6562/20099 (32.65%) Loss: 2.177011 LR: 0.00002839 +[10:40:56] Epoch: 1 Batch: 6563/20099 (32.65%) Loss: 2.139613 LR: 0.00002839 +[10:40:58] Epoch: 1 Batch: 6564/20099 (32.66%) Loss: 1.944555 LR: 0.00002839 +[10:40:59] Epoch: 1 Batch: 6565/20099 (32.66%) Loss: 2.068617 LR: 0.00002839 +[10:41:01] Epoch: 1 Batch: 6566/20099 (32.67%) Loss: 2.116636 LR: 0.00002839 +[10:41:03] Epoch: 1 Batch: 6567/20099 (32.67%) Loss: 1.933752 LR: 0.00002839 +[10:41:05] Epoch: 1 Batch: 6568/20099 (32.68%) Loss: 2.152174 LR: 0.00002839 +[10:41:07] Epoch: 1 Batch: 6569/20099 (32.68%) Loss: 2.178444 LR: 0.00002838 +[10:41:08] Epoch: 1 Batch: 6570/20099 (32.69%) Loss: 1.901858 LR: 0.00002838 +[10:41:10] Epoch: 1 Batch: 6571/20099 (32.69%) Loss: 1.981895 LR: 0.00002838 +[10:41:12] Epoch: 1 Batch: 6572/20099 (32.70%) Loss: 2.245258 LR: 0.00002838 +[10:41:14] Epoch: 1 Batch: 6573/20099 (32.70%) Loss: 2.014955 LR: 0.00002838 +[10:41:15] Epoch: 1 Batch: 6574/20099 (32.71%) Loss: 2.200868 LR: 0.00002838 +[10:41:17] Epoch: 1 Batch: 6575/20099 (32.71%) Loss: 2.078364 LR: 0.00002838 +[10:41:19] Epoch: 1 Batch: 6576/20099 (32.72%) Loss: 1.947287 LR: 0.00002837 +[10:41:21] Epoch: 1 Batch: 6577/20099 (32.72%) Loss: 2.259489 LR: 0.00002837 +[10:41:23] Epoch: 1 Batch: 6578/20099 (32.73%) Loss: 2.029839 LR: 0.00002837 +[10:41:24] Epoch: 1 Batch: 6579/20099 (32.73%) Loss: 2.028806 LR: 0.00002837 +[10:41:26] Epoch: 1 Batch: 6580/20099 (32.74%) Loss: 2.167898 LR: 0.00002837 +[10:41:28] Epoch: 1 Batch: 6581/20099 (32.74%) Loss: 1.935447 LR: 0.00002837 +[10:41:30] Epoch: 1 Batch: 6582/20099 (32.75%) Loss: 2.283997 LR: 0.00002837 +[10:41:31] Epoch: 1 Batch: 6583/20099 (32.75%) Loss: 2.034259 LR: 0.00002837 +[10:41:33] Epoch: 1 Batch: 6584/20099 (32.76%) Loss: 2.320268 LR: 0.00002837 +[10:41:35] Epoch: 1 Batch: 6585/20099 (32.76%) Loss: 2.041673 LR: 0.00002837 +[10:41:37] Epoch: 1 Batch: 6586/20099 (32.77%) Loss: 2.107000 LR: 0.00002837 +[10:41:39] Epoch: 
1 Batch: 6587/20099 (32.77%) Loss: 1.924889 LR: 0.00002837 +[10:41:40] Epoch: 1 Batch: 6588/20099 (32.78%) Loss: 2.210462 LR: 0.00002837 +[10:41:42] Epoch: 1 Batch: 6589/20099 (32.78%) Loss: 2.043254 LR: 0.00002837 +[10:41:44] Epoch: 1 Batch: 6590/20099 (32.79%) Loss: 2.043365 LR: 0.00002836 +[10:41:46] Epoch: 1 Batch: 6591/20099 (32.79%) Loss: 2.307430 LR: 0.00002836 +[10:41:47] Epoch: 1 Batch: 6592/20099 (32.80%) Loss: 2.112752 LR: 0.00002836 +[10:41:49] Epoch: 1 Batch: 6593/20099 (32.80%) Loss: 2.149111 LR: 0.00002836 +[10:41:51] Epoch: 1 Batch: 6594/20099 (32.81%) Loss: 2.112580 LR: 0.00002836 +[10:41:53] Epoch: 1 Batch: 6595/20099 (32.81%) Loss: 2.191059 LR: 0.00002836 +[10:41:55] Epoch: 1 Batch: 6596/20099 (32.82%) Loss: 1.983547 LR: 0.00002836 +[10:41:56] Epoch: 1 Batch: 6597/20099 (32.82%) Loss: 2.120425 LR: 0.00002835 +[10:41:58] Epoch: 1 Batch: 6598/20099 (32.83%) Loss: 2.133031 LR: 0.00002835 +[10:42:00] Epoch: 1 Batch: 6599/20099 (32.83%) Loss: 2.486008 LR: 0.00002835 +[10:42:05] >> Cleaned up old temp checkpoint: epoch1_step4600 +[10:42:05] >> Temp checkpoint saved: epoch1_step6600, size: 0.1693 GB +[10:42:05] Epoch: 1 Batch: 6600/20099 (32.84%) Loss: 2.077664 LR: 0.00002835 +[10:42:07] Epoch: 1 Batch: 6601/20099 (32.84%) Loss: 2.100567 LR: 0.00002835 +[10:42:09] Epoch: 1 Batch: 6602/20099 (32.85%) Loss: 2.053918 LR: 0.00002835 +[10:42:11] Epoch: 1 Batch: 6603/20099 (32.85%) Loss: 2.239495 LR: 0.00002835 +[10:42:12] Epoch: 1 Batch: 6604/20099 (32.86%) Loss: 1.784944 LR: 0.00002834 +[10:42:14] Epoch: 1 Batch: 6605/20099 (32.86%) Loss: 2.002553 LR: 0.00002834 +[10:42:16] Epoch: 1 Batch: 6606/20099 (32.87%) Loss: 1.921628 LR: 0.00002834 +[10:42:18] Epoch: 1 Batch: 6607/20099 (32.87%) Loss: 2.328188 LR: 0.00002834 +[10:42:19] Epoch: 1 Batch: 6608/20099 (32.88%) Loss: 2.131901 LR: 0.00002834 +[10:42:21] Epoch: 1 Batch: 6609/20099 (32.88%) Loss: 2.255655 LR: 0.00002834 +[10:42:23] Epoch: 1 Batch: 6610/20099 (32.89%) Loss: 2.048466 LR: 0.00002834 +[10:42:25] 
Epoch: 1 Batch: 6611/20099 (32.89%) Loss: 2.332816 LR: 0.00002833 +[10:42:27] Epoch: 1 Batch: 6612/20099 (32.90%) Loss: 2.326630 LR: 0.00002833 +[10:42:28] Epoch: 1 Batch: 6613/20099 (32.90%) Loss: 1.857444 LR: 0.00002833 +[10:42:30] Epoch: 1 Batch: 6614/20099 (32.91%) Loss: 2.402572 LR: 0.00002833 +[10:42:32] Epoch: 1 Batch: 6615/20099 (32.91%) Loss: 2.235574 LR: 0.00002833 +[10:42:34] Epoch: 1 Batch: 6616/20099 (32.92%) Loss: 2.412613 LR: 0.00002833 +[10:42:35] Epoch: 1 Batch: 6617/20099 (32.92%) Loss: 2.018722 LR: 0.00002833 +[10:42:37] Epoch: 1 Batch: 6618/20099 (32.93%) Loss: 2.160610 LR: 0.00002833 +[10:42:39] Epoch: 1 Batch: 6619/20099 (32.93%) Loss: 2.160319 LR: 0.00002833 +[10:42:41] Epoch: 1 Batch: 6620/20099 (32.94%) Loss: 1.734548 LR: 0.00002833 +[10:42:43] Epoch: 1 Batch: 6621/20099 (32.94%) Loss: 2.235257 LR: 0.00002833 +[10:42:44] Epoch: 1 Batch: 6622/20099 (32.95%) Loss: 1.993770 LR: 0.00002833 +[10:42:46] Epoch: 1 Batch: 6623/20099 (32.95%) Loss: 1.936388 LR: 0.00002833 +[10:42:48] Epoch: 1 Batch: 6624/20099 (32.96%) Loss: 1.903807 LR: 0.00002833 +[10:42:50] Epoch: 1 Batch: 6625/20099 (32.96%) Loss: 2.203893 LR: 0.00002832 +[10:42:52] Epoch: 1 Batch: 6626/20099 (32.97%) Loss: 2.055523 LR: 0.00002832 +[10:42:53] Epoch: 1 Batch: 6627/20099 (32.97%) Loss: 1.940781 LR: 0.00002832 +[10:42:55] Epoch: 1 Batch: 6628/20099 (32.98%) Loss: 2.008262 LR: 0.00002832 +[10:42:57] Epoch: 1 Batch: 6629/20099 (32.98%) Loss: 2.238813 LR: 0.00002832 +[10:42:59] Epoch: 1 Batch: 6630/20099 (32.99%) Loss: 2.199565 LR: 0.00002832 +[10:43:00] Epoch: 1 Batch: 6631/20099 (32.99%) Loss: 2.255496 LR: 0.00002832 +[10:43:02] Epoch: 1 Batch: 6632/20099 (33.00%) Loss: 2.135784 LR: 0.00002831 +[10:43:04] Epoch: 1 Batch: 6633/20099 (33.00%) Loss: 2.007067 LR: 0.00002831 +[10:43:06] Epoch: 1 Batch: 6634/20099 (33.01%) Loss: 1.923503 LR: 0.00002831 +[10:43:07] Epoch: 1 Batch: 6635/20099 (33.01%) Loss: 1.967523 LR: 0.00002831 +[10:43:09] Epoch: 1 Batch: 6636/20099 (33.02%) Loss: 
1.924851 LR: 0.00002831 +[10:43:11] Epoch: 1 Batch: 6637/20099 (33.02%) Loss: 2.058989 LR: 0.00002831 +[10:43:13] Epoch: 1 Batch: 6638/20099 (33.03%) Loss: 2.161019 LR: 0.00002831 +[10:43:14] Epoch: 1 Batch: 6639/20099 (33.03%) Loss: 2.068487 LR: 0.00002830 +[10:43:16] Epoch: 1 Batch: 6640/20099 (33.04%) Loss: 1.973351 LR: 0.00002830 +[10:43:18] Epoch: 1 Batch: 6641/20099 (33.04%) Loss: 2.189079 LR: 0.00002830 +[10:43:20] Epoch: 1 Batch: 6642/20099 (33.05%) Loss: 2.149225 LR: 0.00002830 +[10:43:22] Epoch: 1 Batch: 6643/20099 (33.05%) Loss: 2.203907 LR: 0.00002830 +[10:43:23] Epoch: 1 Batch: 6644/20099 (33.06%) Loss: 2.114188 LR: 0.00002830 +[10:43:25] Epoch: 1 Batch: 6645/20099 (33.06%) Loss: 2.052547 LR: 0.00002830 +[10:43:27] Epoch: 1 Batch: 6646/20099 (33.07%) Loss: 2.102687 LR: 0.00002829 +[10:43:29] Epoch: 1 Batch: 6647/20099 (33.07%) Loss: 1.964479 LR: 0.00002829 +[10:43:30] Epoch: 1 Batch: 6648/20099 (33.08%) Loss: 2.102710 LR: 0.00002829 +[10:43:32] Epoch: 1 Batch: 6649/20099 (33.08%) Loss: 1.992023 LR: 0.00002829 +[10:43:34] Epoch: 1 Batch: 6650/20099 (33.09%) Loss: 2.317252 LR: 0.00002829 +[10:43:36] Epoch: 1 Batch: 6651/20099 (33.09%) Loss: 2.404772 LR: 0.00002829 +[10:43:37] Epoch: 1 Batch: 6652/20099 (33.10%) Loss: 2.115197 LR: 0.00002829 +[10:43:39] Epoch: 1 Batch: 6653/20099 (33.10%) Loss: 1.821781 LR: 0.00002828 +[10:43:41] Epoch: 1 Batch: 6654/20099 (33.11%) Loss: 2.248160 LR: 0.00002828 +[10:43:43] Epoch: 1 Batch: 6655/20099 (33.11%) Loss: 1.996402 LR: 0.00002828 +[10:43:44] Epoch: 1 Batch: 6656/20099 (33.12%) Loss: 2.160591 LR: 0.00002828 +[10:43:46] Epoch: 1 Batch: 6657/20099 (33.12%) Loss: 2.193472 LR: 0.00002828 +[10:43:48] Epoch: 1 Batch: 6658/20099 (33.13%) Loss: 2.338682 LR: 0.00002828 +[10:43:50] Epoch: 1 Batch: 6659/20099 (33.13%) Loss: 1.997308 LR: 0.00002828 +[10:43:51] Epoch: 1 Batch: 6660/20099 (33.14%) Loss: 2.039129 LR: 0.00002828 +[10:43:53] Epoch: 1 Batch: 6661/20099 (33.14%) Loss: 1.947159 LR: 0.00002828 +[10:43:55] Epoch: 1 
Batch: 6662/20099 (33.15%) Loss: 2.079448 LR: 0.00002828 +[10:43:57] Epoch: 1 Batch: 6663/20099 (33.15%) Loss: 2.110144 LR: 0.00002828 +[10:43:59] Epoch: 1 Batch: 6664/20099 (33.16%) Loss: 1.904817 LR: 0.00002828 +[10:44:00] Epoch: 1 Batch: 6665/20099 (33.16%) Loss: 2.131608 LR: 0.00002828 +[10:44:02] Epoch: 1 Batch: 6666/20099 (33.17%) Loss: 2.256192 LR: 0.00002828 +[10:44:04] Epoch: 1 Batch: 6667/20099 (33.17%) Loss: 2.011547 LR: 0.00002827 +[10:44:06] Epoch: 1 Batch: 6668/20099 (33.18%) Loss: 2.144122 LR: 0.00002827 +[10:44:07] Epoch: 1 Batch: 6669/20099 (33.18%) Loss: 2.060750 LR: 0.00002827 +[10:44:09] Epoch: 1 Batch: 6670/20099 (33.19%) Loss: 2.306222 LR: 0.00002827 +[10:44:11] Epoch: 1 Batch: 6671/20099 (33.19%) Loss: 1.846326 LR: 0.00002827 +[10:44:13] Epoch: 1 Batch: 6672/20099 (33.20%) Loss: 2.433973 LR: 0.00002827 +[10:44:14] Epoch: 1 Batch: 6673/20099 (33.20%) Loss: 2.035185 LR: 0.00002827 +[10:44:16] Epoch: 1 Batch: 6674/20099 (33.21%) Loss: 2.191068 LR: 0.00002826 +[10:44:18] Epoch: 1 Batch: 6675/20099 (33.21%) Loss: 2.229950 LR: 0.00002826 +[10:44:20] Epoch: 1 Batch: 6676/20099 (33.22%) Loss: 2.320967 LR: 0.00002826 +[10:44:22] Epoch: 1 Batch: 6677/20099 (33.22%) Loss: 2.071143 LR: 0.00002826 +[10:44:23] Epoch: 1 Batch: 6678/20099 (33.23%) Loss: 2.149494 LR: 0.00002826 +[10:44:25] Epoch: 1 Batch: 6679/20099 (33.23%) Loss: 1.826397 LR: 0.00002826 +[10:44:27] Epoch: 1 Batch: 6680/20099 (33.24%) Loss: 2.332940 LR: 0.00002826 +[10:44:29] Epoch: 1 Batch: 6681/20099 (33.24%) Loss: 1.933102 LR: 0.00002825 +[10:44:30] Epoch: 1 Batch: 6682/20099 (33.25%) Loss: 2.096231 LR: 0.00002825 +[10:44:32] Epoch: 1 Batch: 6683/20099 (33.25%) Loss: 2.218872 LR: 0.00002825 +[10:44:34] Epoch: 1 Batch: 6684/20099 (33.26%) Loss: 2.036228 LR: 0.00002825 +[10:44:36] Epoch: 1 Batch: 6685/20099 (33.26%) Loss: 2.056673 LR: 0.00002825 +[10:44:38] Epoch: 1 Batch: 6686/20099 (33.27%) Loss: 1.921984 LR: 0.00002825 +[10:44:39] Epoch: 1 Batch: 6687/20099 (33.27%) Loss: 2.154208 LR: 
0.00002825 +[10:44:41] Epoch: 1 Batch: 6688/20099 (33.28%) Loss: 2.052454 LR: 0.00002824 +[10:44:43] Epoch: 1 Batch: 6689/20099 (33.28%) Loss: 2.151461 LR: 0.00002824 +[10:44:45] Epoch: 1 Batch: 6690/20099 (33.29%) Loss: 1.881965 LR: 0.00002824 +[10:44:46] Epoch: 1 Batch: 6691/20099 (33.29%) Loss: 2.102429 LR: 0.00002824 +[10:44:48] Epoch: 1 Batch: 6692/20099 (33.30%) Loss: 2.124313 LR: 0.00002824 +[10:44:50] Epoch: 1 Batch: 6693/20099 (33.30%) Loss: 2.127237 LR: 0.00002824 +[10:44:52] Epoch: 1 Batch: 6694/20099 (33.31%) Loss: 2.158774 LR: 0.00002824 +[10:44:53] Epoch: 1 Batch: 6695/20099 (33.31%) Loss: 2.227474 LR: 0.00002823 +[10:44:55] Epoch: 1 Batch: 6696/20099 (33.32%) Loss: 2.249902 LR: 0.00002823 +[10:44:57] Epoch: 1 Batch: 6697/20099 (33.32%) Loss: 2.158015 LR: 0.00002823 +[10:44:59] Epoch: 1 Batch: 6698/20099 (33.33%) Loss: 2.000971 LR: 0.00002823 +[10:45:00] Epoch: 1 Batch: 6699/20099 (33.33%) Loss: 2.155462 LR: 0.00002823 +[10:45:02] Epoch: 1 Batch: 6700/20099 (33.33%) Loss: 2.166986 LR: 0.00002823 +[10:45:04] Epoch: 1 Batch: 6701/20099 (33.34%) Loss: 2.025448 LR: 0.00002823 +[10:45:06] Epoch: 1 Batch: 6702/20099 (33.34%) Loss: 1.943710 LR: 0.00002822 +[10:45:08] Epoch: 1 Batch: 6703/20099 (33.35%) Loss: 1.955895 LR: 0.00002822 +[10:45:09] Epoch: 1 Batch: 6704/20099 (33.35%) Loss: 2.171408 LR: 0.00002822 +[10:45:11] Epoch: 1 Batch: 6705/20099 (33.36%) Loss: 2.305253 LR: 0.00002822 +[10:45:13] Epoch: 1 Batch: 6706/20099 (33.36%) Loss: 2.059647 LR: 0.00002822 +[10:45:15] Epoch: 1 Batch: 6707/20099 (33.37%) Loss: 2.167451 LR: 0.00002822 +[10:45:16] Epoch: 1 Batch: 6708/20099 (33.37%) Loss: 2.019367 LR: 0.00002822 +[10:45:18] Epoch: 1 Batch: 6709/20099 (33.38%) Loss: 2.093004 LR: 0.00002822 +[10:45:20] Epoch: 1 Batch: 6710/20099 (33.38%) Loss: 2.353024 LR: 0.00002822 +[10:45:22] Epoch: 1 Batch: 6711/20099 (33.39%) Loss: 1.773209 LR: 0.00002822 +[10:45:23] Epoch: 1 Batch: 6712/20099 (33.39%) Loss: 2.224622 LR: 0.00002822 +[10:45:25] Epoch: 1 Batch: 6713/20099 
(33.40%) Loss: 2.114805 LR: 0.00002822 +[10:45:27] Epoch: 1 Batch: 6714/20099 (33.40%) Loss: 1.997008 LR: 0.00002822 +[10:45:29] Epoch: 1 Batch: 6715/20099 (33.41%) Loss: 2.039809 LR: 0.00002822 +[10:45:31] Epoch: 1 Batch: 6716/20099 (33.41%) Loss: 2.115780 LR: 0.00002821 +[10:45:32] Epoch: 1 Batch: 6717/20099 (33.42%) Loss: 2.182086 LR: 0.00002821 +[10:45:34] Epoch: 1 Batch: 6718/20099 (33.42%) Loss: 1.905338 LR: 0.00002821 +[10:45:36] Epoch: 1 Batch: 6719/20099 (33.43%) Loss: 2.180464 LR: 0.00002821 +[10:45:38] Epoch: 1 Batch: 6720/20099 (33.43%) Loss: 2.033190 LR: 0.00002821 +[10:45:39] Epoch: 1 Batch: 6721/20099 (33.44%) Loss: 2.240808 LR: 0.00002821 +[10:45:41] Epoch: 1 Batch: 6722/20099 (33.44%) Loss: 2.051046 LR: 0.00002821 +[10:45:43] Epoch: 1 Batch: 6723/20099 (33.45%) Loss: 2.138796 LR: 0.00002820 +[10:45:45] Epoch: 1 Batch: 6724/20099 (33.45%) Loss: 2.237592 LR: 0.00002820 +[10:45:46] Epoch: 1 Batch: 6725/20099 (33.46%) Loss: 2.235295 LR: 0.00002820 +[10:45:48] Epoch: 1 Batch: 6726/20099 (33.46%) Loss: 2.074175 LR: 0.00002820 +[10:45:50] Epoch: 1 Batch: 6727/20099 (33.47%) Loss: 2.199253 LR: 0.00002820 +[10:45:52] Epoch: 1 Batch: 6728/20099 (33.47%) Loss: 2.063665 LR: 0.00002820 +[10:45:53] Epoch: 1 Batch: 6729/20099 (33.48%) Loss: 2.025707 LR: 0.00002820 +[10:45:55] Epoch: 1 Batch: 6730/20099 (33.48%) Loss: 2.133513 LR: 0.00002819 +[10:45:57] Epoch: 1 Batch: 6731/20099 (33.49%) Loss: 1.898714 LR: 0.00002819 +[10:45:59] Epoch: 1 Batch: 6732/20099 (33.49%) Loss: 1.956254 LR: 0.00002819 +[10:46:00] Epoch: 1 Batch: 6733/20099 (33.50%) Loss: 1.740474 LR: 0.00002819 +[10:46:02] Epoch: 1 Batch: 6734/20099 (33.50%) Loss: 2.131116 LR: 0.00002819 +[10:46:04] Epoch: 1 Batch: 6735/20099 (33.51%) Loss: 2.084637 LR: 0.00002819 +[10:46:06] Epoch: 1 Batch: 6736/20099 (33.51%) Loss: 2.142046 LR: 0.00002819 +[10:46:08] Epoch: 1 Batch: 6737/20099 (33.52%) Loss: 2.080365 LR: 0.00002818 +[10:46:09] Epoch: 1 Batch: 6738/20099 (33.52%) Loss: 2.154455 LR: 0.00002818 
+[10:46:11] Epoch: 1 Batch: 6739/20099 (33.53%) Loss: 2.193942 LR: 0.00002818 +[10:46:13] Epoch: 1 Batch: 6740/20099 (33.53%) Loss: 2.127239 LR: 0.00002818 +[10:46:15] Epoch: 1 Batch: 6741/20099 (33.54%) Loss: 1.630982 LR: 0.00002818 +[10:46:16] Epoch: 1 Batch: 6742/20099 (33.54%) Loss: 1.819384 LR: 0.00002818 +[10:46:18] Epoch: 1 Batch: 6743/20099 (33.55%) Loss: 2.069460 LR: 0.00002818 +[10:46:20] Epoch: 1 Batch: 6744/20099 (33.55%) Loss: 2.100762 LR: 0.00002817 +[10:46:22] Epoch: 1 Batch: 6745/20099 (33.56%) Loss: 2.050283 LR: 0.00002817 +[10:46:23] Epoch: 1 Batch: 6746/20099 (33.56%) Loss: 1.934958 LR: 0.00002817 +[10:46:25] Epoch: 1 Batch: 6747/20099 (33.57%) Loss: 2.189257 LR: 0.00002817 +[10:46:27] Epoch: 1 Batch: 6748/20099 (33.57%) Loss: 1.998366 LR: 0.00002817 +[10:46:29] Epoch: 1 Batch: 6749/20099 (33.58%) Loss: 2.401262 LR: 0.00002817 +[10:46:30] Epoch: 1 Batch: 6750/20099 (33.58%) Loss: 1.994730 LR: 0.00002817 +[10:46:32] Epoch: 1 Batch: 6751/20099 (33.59%) Loss: 2.028171 LR: 0.00002816 +[10:46:34] Epoch: 1 Batch: 6752/20099 (33.59%) Loss: 2.305369 LR: 0.00002816 +[10:46:36] Epoch: 1 Batch: 6753/20099 (33.60%) Loss: 2.020905 LR: 0.00002816 +[10:46:38] Epoch: 1 Batch: 6754/20099 (33.60%) Loss: 2.215989 LR: 0.00002816 +[10:46:39] Epoch: 1 Batch: 6755/20099 (33.61%) Loss: 2.009247 LR: 0.00002816 +[10:46:41] Epoch: 1 Batch: 6756/20099 (33.61%) Loss: 2.024789 LR: 0.00002816 +[10:46:43] Epoch: 1 Batch: 6757/20099 (33.62%) Loss: 2.032604 LR: 0.00002816 +[10:46:45] Epoch: 1 Batch: 6758/20099 (33.62%) Loss: 2.034793 LR: 0.00002816 +[10:46:46] Epoch: 1 Batch: 6759/20099 (33.63%) Loss: 2.084465 LR: 0.00002816 +[10:46:48] Epoch: 1 Batch: 6760/20099 (33.63%) Loss: 2.211863 LR: 0.00002816 +[10:46:50] Epoch: 1 Batch: 6761/20099 (33.64%) Loss: 1.915698 LR: 0.00002816 +[10:46:52] Epoch: 1 Batch: 6762/20099 (33.64%) Loss: 2.201440 LR: 0.00002816 +[10:46:53] Epoch: 1 Batch: 6763/20099 (33.65%) Loss: 2.025680 LR: 0.00002816 +[10:46:55] Epoch: 1 Batch: 6764/20099 (33.65%) 
Loss: 2.166673 LR: 0.00002816 +[10:46:57] Epoch: 1 Batch: 6765/20099 (33.66%) Loss: 2.119070 LR: 0.00002815 +[10:46:59] Epoch: 1 Batch: 6766/20099 (33.66%) Loss: 2.078631 LR: 0.00002815 +[10:47:01] Epoch: 1 Batch: 6767/20099 (33.67%) Loss: 1.665108 LR: 0.00002815 +[10:47:02] Epoch: 1 Batch: 6768/20099 (33.67%) Loss: 2.020810 LR: 0.00002815 +[10:47:04] Epoch: 1 Batch: 6769/20099 (33.68%) Loss: 2.119768 LR: 0.00002815 +[10:47:06] Epoch: 1 Batch: 6770/20099 (33.68%) Loss: 2.115186 LR: 0.00002815 +[10:47:08] Epoch: 1 Batch: 6771/20099 (33.69%) Loss: 2.082371 LR: 0.00002815 +[10:47:09] Epoch: 1 Batch: 6772/20099 (33.69%) Loss: 2.187905 LR: 0.00002814 +[10:47:11] Epoch: 1 Batch: 6773/20099 (33.70%) Loss: 2.165078 LR: 0.00002814 +[10:47:13] Epoch: 1 Batch: 6774/20099 (33.70%) Loss: 2.191298 LR: 0.00002814 +[10:47:15] Epoch: 1 Batch: 6775/20099 (33.71%) Loss: 2.090792 LR: 0.00002814 +[10:47:17] Epoch: 1 Batch: 6776/20099 (33.71%) Loss: 1.858158 LR: 0.00002814 +[10:47:18] Epoch: 1 Batch: 6777/20099 (33.72%) Loss: 2.024042 LR: 0.00002814 +[10:47:20] Epoch: 1 Batch: 6778/20099 (33.72%) Loss: 2.247656 LR: 0.00002814 +[10:47:22] Epoch: 1 Batch: 6779/20099 (33.73%) Loss: 1.992156 LR: 0.00002813 +[10:47:24] Epoch: 1 Batch: 6780/20099 (33.73%) Loss: 2.091400 LR: 0.00002813 +[10:47:25] Epoch: 1 Batch: 6781/20099 (33.74%) Loss: 1.944572 LR: 0.00002813 +[10:47:27] Epoch: 1 Batch: 6782/20099 (33.74%) Loss: 2.358613 LR: 0.00002813 +[10:47:29] Epoch: 1 Batch: 6783/20099 (33.75%) Loss: 2.177873 LR: 0.00002813 +[10:47:31] Epoch: 1 Batch: 6784/20099 (33.75%) Loss: 2.180829 LR: 0.00002813 +[10:47:33] Epoch: 1 Batch: 6785/20099 (33.76%) Loss: 2.116718 LR: 0.00002813 +[10:47:34] Epoch: 1 Batch: 6786/20099 (33.76%) Loss: 1.862131 LR: 0.00002812 +[10:47:36] Epoch: 1 Batch: 6787/20099 (33.77%) Loss: 2.143174 LR: 0.00002812 +[10:47:38] Epoch: 1 Batch: 6788/20099 (33.77%) Loss: 2.031924 LR: 0.00002812 +[10:47:40] Epoch: 1 Batch: 6789/20099 (33.78%) Loss: 2.071654 LR: 0.00002812 +[10:47:41] Epoch: 
1 Batch: 6790/20099 (33.78%) Loss: 2.098736 LR: 0.00002812 +[10:47:43] Epoch: 1 Batch: 6791/20099 (33.79%) Loss: 2.129359 LR: 0.00002812 +[10:47:45] Epoch: 1 Batch: 6792/20099 (33.79%) Loss: 2.024835 LR: 0.00002812 +[10:47:47] Epoch: 1 Batch: 6793/20099 (33.80%) Loss: 1.890738 LR: 0.00002811 +[10:47:49] Epoch: 1 Batch: 6794/20099 (33.80%) Loss: 2.226524 LR: 0.00002811 +[10:47:50] Epoch: 1 Batch: 6795/20099 (33.81%) Loss: 2.303231 LR: 0.00002811 +[10:47:52] Epoch: 1 Batch: 6796/20099 (33.81%) Loss: 2.004296 LR: 0.00002811 +[10:47:54] Epoch: 1 Batch: 6797/20099 (33.82%) Loss: 2.066525 LR: 0.00002811 +[10:47:56] Epoch: 1 Batch: 6798/20099 (33.82%) Loss: 2.254001 LR: 0.00002811 +[10:47:57] Epoch: 1 Batch: 6799/20099 (33.83%) Loss: 2.140169 LR: 0.00002811 +[10:48:03] >> Cleaned up old temp checkpoint: epoch1_step4800 +[10:48:03] >> Temp checkpoint saved: epoch1_step6800, size: 0.1693 GB +[10:48:03] Epoch: 1 Batch: 6800/20099 (33.83%) Loss: 1.916113 LR: 0.00002810 +[10:48:04] Epoch: 1 Batch: 6801/20099 (33.84%) Loss: 2.143852 LR: 0.00002810 +[10:48:06] Epoch: 1 Batch: 6802/20099 (33.84%) Loss: 2.223588 LR: 0.00002810 +[10:48:08] Epoch: 1 Batch: 6803/20099 (33.85%) Loss: 1.996708 LR: 0.00002810 +[10:48:10] Epoch: 1 Batch: 6804/20099 (33.85%) Loss: 2.069145 LR: 0.00002810 +[10:48:11] Epoch: 1 Batch: 6805/20099 (33.86%) Loss: 2.025539 LR: 0.00002810 +[10:48:13] Epoch: 1 Batch: 6806/20099 (33.86%) Loss: 2.089552 LR: 0.00002810 +[10:48:15] Epoch: 1 Batch: 6807/20099 (33.87%) Loss: 2.364429 LR: 0.00002810 +[10:48:17] Epoch: 1 Batch: 6808/20099 (33.87%) Loss: 2.181350 LR: 0.00002810 +[10:48:19] Epoch: 1 Batch: 6809/20099 (33.88%) Loss: 1.795850 LR: 0.00002810 +[10:48:20] Epoch: 1 Batch: 6810/20099 (33.88%) Loss: 2.077196 LR: 0.00002810 +[10:48:22] Epoch: 1 Batch: 6811/20099 (33.89%) Loss: 2.209964 LR: 0.00002810 +[10:48:24] Epoch: 1 Batch: 6812/20099 (33.89%) Loss: 1.967180 LR: 0.00002810 +[10:48:26] Epoch: 1 Batch: 6813/20099 (33.90%) Loss: 2.117794 LR: 0.00002810 +[10:48:27] 
Epoch: 1 Batch: 6814/20099 (33.90%) Loss: 1.884259 LR: 0.00002809 +[10:48:29] Epoch: 1 Batch: 6815/20099 (33.91%) Loss: 2.238241 LR: 0.00002809 +[10:48:31] Epoch: 1 Batch: 6816/20099 (33.91%) Loss: 2.001045 LR: 0.00002809 +[10:48:33] Epoch: 1 Batch: 6817/20099 (33.92%) Loss: 2.030901 LR: 0.00002809 +[10:48:35] Epoch: 1 Batch: 6818/20099 (33.92%) Loss: 2.379946 LR: 0.00002809 +[10:48:36] Epoch: 1 Batch: 6819/20099 (33.93%) Loss: 2.121236 LR: 0.00002809 +[10:48:38] Epoch: 1 Batch: 6820/20099 (33.93%) Loss: 1.918446 LR: 0.00002809 +[10:48:40] Epoch: 1 Batch: 6821/20099 (33.94%) Loss: 2.046328 LR: 0.00002808 +[10:48:42] Epoch: 1 Batch: 6822/20099 (33.94%) Loss: 2.333735 LR: 0.00002808 +[10:48:44] Epoch: 1 Batch: 6823/20099 (33.95%) Loss: 2.291618 LR: 0.00002808 +[10:48:45] Epoch: 1 Batch: 6824/20099 (33.95%) Loss: 2.176229 LR: 0.00002808 +[10:48:47] Epoch: 1 Batch: 6825/20099 (33.96%) Loss: 1.687084 LR: 0.00002808 +[10:48:49] Epoch: 1 Batch: 6826/20099 (33.96%) Loss: 1.904592 LR: 0.00002808 +[10:48:51] Epoch: 1 Batch: 6827/20099 (33.97%) Loss: 2.436557 LR: 0.00002808 +[10:48:52] Epoch: 1 Batch: 6828/20099 (33.97%) Loss: 2.187032 LR: 0.00002807 +[10:48:54] Epoch: 1 Batch: 6829/20099 (33.98%) Loss: 2.110093 LR: 0.00002807 +[10:48:56] Epoch: 1 Batch: 6830/20099 (33.98%) Loss: 2.027519 LR: 0.00002807 +[10:48:58] Epoch: 1 Batch: 6831/20099 (33.99%) Loss: 2.005858 LR: 0.00002807 +[10:48:59] Epoch: 1 Batch: 6832/20099 (33.99%) Loss: 2.150229 LR: 0.00002807 +[10:49:01] Epoch: 1 Batch: 6833/20099 (34.00%) Loss: 1.793528 LR: 0.00002807 +[10:49:03] Epoch: 1 Batch: 6834/20099 (34.00%) Loss: 2.023657 LR: 0.00002807 +[10:49:05] Epoch: 1 Batch: 6835/20099 (34.01%) Loss: 2.396286 LR: 0.00002806 +[10:49:07] Epoch: 1 Batch: 6836/20099 (34.01%) Loss: 1.939944 LR: 0.00002806 +[10:49:08] Epoch: 1 Batch: 6837/20099 (34.02%) Loss: 2.156412 LR: 0.00002806 +[10:49:10] Epoch: 1 Batch: 6838/20099 (34.02%) Loss: 2.326754 LR: 0.00002806 +[10:49:12] Epoch: 1 Batch: 6839/20099 (34.03%) Loss: 
2.351211 LR: 0.00002806 +[10:49:14] Epoch: 1 Batch: 6840/20099 (34.03%) Loss: 2.169238 LR: 0.00002806 +[10:49:15] Epoch: 1 Batch: 6841/20099 (34.04%) Loss: 2.003825 LR: 0.00002806 +[10:49:17] Epoch: 1 Batch: 6842/20099 (34.04%) Loss: 2.041799 LR: 0.00002805 +[10:49:19] Epoch: 1 Batch: 6843/20099 (34.05%) Loss: 2.146740 LR: 0.00002805 +[10:49:21] Epoch: 1 Batch: 6844/20099 (34.05%) Loss: 2.254851 LR: 0.00002805 +[10:49:22] Epoch: 1 Batch: 6845/20099 (34.06%) Loss: 2.354460 LR: 0.00002805 +[10:49:24] Epoch: 1 Batch: 6846/20099 (34.06%) Loss: 2.311606 LR: 0.00002805 +[10:49:26] Epoch: 1 Batch: 6847/20099 (34.07%) Loss: 2.343667 LR: 0.00002805 +[10:49:28] Epoch: 1 Batch: 6848/20099 (34.07%) Loss: 1.940030 LR: 0.00002805 +[10:49:30] Epoch: 1 Batch: 6849/20099 (34.08%) Loss: 2.089580 LR: 0.00002804 +[10:49:31] Epoch: 1 Batch: 6850/20099 (34.08%) Loss: 1.973213 LR: 0.00002804 +[10:49:33] Epoch: 1 Batch: 6851/20099 (34.09%) Loss: 1.910539 LR: 0.00002804 +[10:49:35] Epoch: 1 Batch: 6852/20099 (34.09%) Loss: 2.203053 LR: 0.00002804 +[10:49:37] Epoch: 1 Batch: 6853/20099 (34.10%) Loss: 2.191430 LR: 0.00002804 +[10:49:38] Epoch: 1 Batch: 6854/20099 (34.10%) Loss: 1.961695 LR: 0.00002804 +[10:49:40] Epoch: 1 Batch: 6855/20099 (34.11%) Loss: 2.207938 LR: 0.00002804 +[10:49:42] Epoch: 1 Batch: 6856/20099 (34.11%) Loss: 2.093933 LR: 0.00002803 +[10:49:44] Epoch: 1 Batch: 6857/20099 (34.12%) Loss: 2.106593 LR: 0.00002803 +[10:49:45] Epoch: 1 Batch: 6858/20099 (34.12%) Loss: 2.213724 LR: 0.00002803 +[10:49:47] Epoch: 1 Batch: 6859/20099 (34.13%) Loss: 1.953931 LR: 0.00002803 +[10:49:49] Epoch: 1 Batch: 6860/20099 (34.13%) Loss: 2.033996 LR: 0.00002803 +[10:49:51] Epoch: 1 Batch: 6861/20099 (34.14%) Loss: 1.984051 LR: 0.00002803 +[10:49:52] Epoch: 1 Batch: 6862/20099 (34.14%) Loss: 2.107455 LR: 0.00002803 +[10:49:54] Epoch: 1 Batch: 6863/20099 (34.15%) Loss: 2.359044 LR: 0.00002802 +[10:49:56] Epoch: 1 Batch: 6864/20099 (34.15%) Loss: 1.987946 LR: 0.00002802 +[10:49:58] Epoch: 1 
Batch: 6865/20099 (34.16%) Loss: 2.032412 LR: 0.00002802 +[10:50:00] Epoch: 1 Batch: 6866/20099 (34.16%) Loss: 2.114138 LR: 0.00002802 +[10:50:01] Epoch: 1 Batch: 6867/20099 (34.17%) Loss: 2.018090 LR: 0.00002802 +[10:50:03] Epoch: 1 Batch: 6868/20099 (34.17%) Loss: 2.134087 LR: 0.00002802 +[10:50:05] Epoch: 1 Batch: 6869/20099 (34.18%) Loss: 2.134937 LR: 0.00002802 +[10:50:07] Epoch: 1 Batch: 6870/20099 (34.18%) Loss: 2.071761 LR: 0.00002802 +[10:50:08] Epoch: 1 Batch: 6871/20099 (34.19%) Loss: 2.151066 LR: 0.00002802 +[10:50:10] Epoch: 1 Batch: 6872/20099 (34.19%) Loss: 2.300032 LR: 0.00002802 +[10:50:12] Epoch: 1 Batch: 6873/20099 (34.20%) Loss: 2.009768 LR: 0.00002802 +[10:50:14] Epoch: 1 Batch: 6874/20099 (34.20%) Loss: 2.201602 LR: 0.00002802 +[10:50:16] Epoch: 1 Batch: 6875/20099 (34.21%) Loss: 1.852738 LR: 0.00002802 +[10:50:17] Epoch: 1 Batch: 6876/20099 (34.21%) Loss: 2.064631 LR: 0.00002802 +[10:50:19] Epoch: 1 Batch: 6877/20099 (34.22%) Loss: 1.971257 LR: 0.00002801 +[10:50:21] Epoch: 1 Batch: 6878/20099 (34.22%) Loss: 2.184020 LR: 0.00002801 +[10:50:23] Epoch: 1 Batch: 6879/20099 (34.23%) Loss: 2.165136 LR: 0.00002801 +[10:50:24] Epoch: 1 Batch: 6880/20099 (34.23%) Loss: 1.912142 LR: 0.00002801 +[10:50:26] Epoch: 1 Batch: 6881/20099 (34.24%) Loss: 2.175558 LR: 0.00002801 +[10:50:28] Epoch: 1 Batch: 6882/20099 (34.24%) Loss: 2.092246 LR: 0.00002801 +[10:50:30] Epoch: 1 Batch: 6883/20099 (34.25%) Loss: 1.799839 LR: 0.00002801 +[10:50:32] Epoch: 1 Batch: 6884/20099 (34.25%) Loss: 2.187440 LR: 0.00002800 +[10:50:33] Epoch: 1 Batch: 6885/20099 (34.26%) Loss: 2.044364 LR: 0.00002800 +[10:50:35] Epoch: 1 Batch: 6886/20099 (34.26%) Loss: 2.156611 LR: 0.00002800 +[10:50:37] Epoch: 1 Batch: 6887/20099 (34.27%) Loss: 2.077038 LR: 0.00002800 +[10:50:39] Epoch: 1 Batch: 6888/20099 (34.27%) Loss: 1.979192 LR: 0.00002800 +[10:50:40] Epoch: 1 Batch: 6889/20099 (34.28%) Loss: 2.001452 LR: 0.00002800 +[10:50:42] Epoch: 1 Batch: 6890/20099 (34.28%) Loss: 1.879841 LR: 
0.00002800 +[10:50:44] Epoch: 1 Batch: 6891/20099 (34.29%) Loss: 2.188141 LR: 0.00002799 +[10:50:46] Epoch: 1 Batch: 6892/20099 (34.29%) Loss: 2.392734 LR: 0.00002799 +[10:50:48] Epoch: 1 Batch: 6893/20099 (34.30%) Loss: 1.992314 LR: 0.00002799 +[10:50:49] Epoch: 1 Batch: 6894/20099 (34.30%) Loss: 2.172193 LR: 0.00002799 +[10:50:51] Epoch: 1 Batch: 6895/20099 (34.31%) Loss: 2.109421 LR: 0.00002799 +[10:50:53] Epoch: 1 Batch: 6896/20099 (34.31%) Loss: 2.304380 LR: 0.00002799 +[10:50:55] Epoch: 1 Batch: 6897/20099 (34.32%) Loss: 2.193593 LR: 0.00002799 +[10:50:56] Epoch: 1 Batch: 6898/20099 (34.32%) Loss: 2.299893 LR: 0.00002798 +[10:50:58] Epoch: 1 Batch: 6899/20099 (34.33%) Loss: 2.264403 LR: 0.00002798 +[10:51:00] Epoch: 1 Batch: 6900/20099 (34.33%) Loss: 2.196179 LR: 0.00002798 +[10:51:02] Epoch: 1 Batch: 6901/20099 (34.34%) Loss: 2.425621 LR: 0.00002798 +[10:51:04] Epoch: 1 Batch: 6902/20099 (34.34%) Loss: 2.013336 LR: 0.00002798 +[10:51:05] Epoch: 1 Batch: 6903/20099 (34.34%) Loss: 1.577217 LR: 0.00002798 +[10:51:07] Epoch: 1 Batch: 6904/20099 (34.35%) Loss: 2.074481 LR: 0.00002798 +[10:51:09] Epoch: 1 Batch: 6905/20099 (34.35%) Loss: 2.095217 LR: 0.00002797 +[10:51:11] Epoch: 1 Batch: 6906/20099 (34.36%) Loss: 2.205564 LR: 0.00002797 +[10:51:12] Epoch: 1 Batch: 6907/20099 (34.36%) Loss: 2.179969 LR: 0.00002797 +[10:51:14] Epoch: 1 Batch: 6908/20099 (34.37%) Loss: 2.279891 LR: 0.00002797 +[10:51:16] Epoch: 1 Batch: 6909/20099 (34.37%) Loss: 2.016210 LR: 0.00002797 +[10:51:18] Epoch: 1 Batch: 6910/20099 (34.38%) Loss: 2.243501 LR: 0.00002797 +[10:51:20] Epoch: 1 Batch: 6911/20099 (34.38%) Loss: 1.659151 LR: 0.00002797 +[10:51:21] Epoch: 1 Batch: 6912/20099 (34.39%) Loss: 2.060833 LR: 0.00002796 +[10:51:23] Epoch: 1 Batch: 6913/20099 (34.39%) Loss: 2.150305 LR: 0.00002796 +[10:51:25] Epoch: 1 Batch: 6914/20099 (34.40%) Loss: 2.047707 LR: 0.00002796 +[10:51:27] Epoch: 1 Batch: 6915/20099 (34.40%) Loss: 1.850793 LR: 0.00002796 +[10:51:28] Epoch: 1 Batch: 6916/20099 
(34.41%) Loss: 2.710672 LR: 0.00002796 +[10:51:30] Epoch: 1 Batch: 6917/20099 (34.41%) Loss: 2.245318 LR: 0.00002796 +[10:51:32] Epoch: 1 Batch: 6918/20099 (34.42%) Loss: 1.938993 LR: 0.00002796 +[10:51:34] Epoch: 1 Batch: 6919/20099 (34.42%) Loss: 2.023917 LR: 0.00002795 +[10:51:36] Epoch: 1 Batch: 6920/20099 (34.43%) Loss: 2.147976 LR: 0.00002795 +[10:51:37] Epoch: 1 Batch: 6921/20099 (34.43%) Loss: 2.366758 LR: 0.00002795 +[10:51:39] Epoch: 1 Batch: 6922/20099 (34.44%) Loss: 1.996173 LR: 0.00002795 +[10:51:41] Epoch: 1 Batch: 6923/20099 (34.44%) Loss: 2.230927 LR: 0.00002795 +[10:51:43] Epoch: 1 Batch: 6924/20099 (34.45%) Loss: 2.254483 LR: 0.00002795 +[10:51:44] Epoch: 1 Batch: 6925/20099 (34.45%) Loss: 2.145182 LR: 0.00002795 +[10:51:46] Epoch: 1 Batch: 6926/20099 (34.46%) Loss: 2.287155 LR: 0.00002794 +[10:51:48] Epoch: 1 Batch: 6927/20099 (34.46%) Loss: 2.104568 LR: 0.00002794 +[10:51:50] Epoch: 1 Batch: 6928/20099 (34.47%) Loss: 2.149989 LR: 0.00002794 +[10:51:52] Epoch: 1 Batch: 6929/20099 (34.47%) Loss: 2.280102 LR: 0.00002794 +[10:51:53] Epoch: 1 Batch: 6930/20099 (34.48%) Loss: 2.323362 LR: 0.00002794 +[10:51:55] Epoch: 1 Batch: 6931/20099 (34.48%) Loss: 2.245917 LR: 0.00002794 +[10:51:57] Epoch: 1 Batch: 6932/20099 (34.49%) Loss: 2.261348 LR: 0.00002794 +[10:51:59] Epoch: 1 Batch: 6933/20099 (34.49%) Loss: 2.051615 LR: 0.00002793 +[10:52:00] Epoch: 1 Batch: 6934/20099 (34.50%) Loss: 2.283620 LR: 0.00002793 +[10:52:02] Epoch: 1 Batch: 6935/20099 (34.50%) Loss: 2.344166 LR: 0.00002793 +[10:52:04] Epoch: 1 Batch: 6936/20099 (34.51%) Loss: 1.978947 LR: 0.00002793 +[10:52:06] Epoch: 1 Batch: 6937/20099 (34.51%) Loss: 2.123052 LR: 0.00002793 +[10:52:08] Epoch: 1 Batch: 6938/20099 (34.52%) Loss: 1.977782 LR: 0.00002793 +[10:52:09] Epoch: 1 Batch: 6939/20099 (34.52%) Loss: 2.010353 LR: 0.00002793 +[10:52:11] Epoch: 1 Batch: 6940/20099 (34.53%) Loss: 1.855951 LR: 0.00002792 +[10:52:13] Epoch: 1 Batch: 6941/20099 (34.53%) Loss: 2.188379 LR: 0.00002792 
+[10:52:15] Epoch: 1 Batch: 6942/20099 (34.54%) Loss: 2.217560 LR: 0.00002792 +[10:52:16] Epoch: 1 Batch: 6943/20099 (34.54%) Loss: 1.815972 LR: 0.00002792 +[10:52:18] Epoch: 1 Batch: 6944/20099 (34.55%) Loss: 2.162040 LR: 0.00002792 +[10:52:20] Epoch: 1 Batch: 6945/20099 (34.55%) Loss: 1.597113 LR: 0.00002792 +[10:52:22] Epoch: 1 Batch: 6946/20099 (34.56%) Loss: 1.869903 LR: 0.00002792 +[10:52:24] Epoch: 1 Batch: 6947/20099 (34.56%) Loss: 2.060245 LR: 0.00002792 +[10:52:25] Epoch: 1 Batch: 6948/20099 (34.57%) Loss: 2.069923 LR: 0.00002792 +[10:52:27] Epoch: 1 Batch: 6949/20099 (34.57%) Loss: 2.200412 LR: 0.00002792 +[10:52:29] Epoch: 1 Batch: 6950/20099 (34.58%) Loss: 2.258299 LR: 0.00002792 +[10:52:31] Epoch: 1 Batch: 6951/20099 (34.58%) Loss: 1.949165 LR: 0.00002792 +[10:52:32] Epoch: 1 Batch: 6952/20099 (34.59%) Loss: 2.380992 LR: 0.00002792 +[10:52:34] Epoch: 1 Batch: 6953/20099 (34.59%) Loss: 1.863948 LR: 0.00002792 +[10:52:36] Epoch: 1 Batch: 6954/20099 (34.60%) Loss: 1.946181 LR: 0.00002791 +[10:52:38] Epoch: 1 Batch: 6955/20099 (34.60%) Loss: 2.207716 LR: 0.00002791 +[10:52:39] Epoch: 1 Batch: 6956/20099 (34.61%) Loss: 1.759644 LR: 0.00002791 +[10:52:41] Epoch: 1 Batch: 6957/20099 (34.61%) Loss: 2.237288 LR: 0.00002791 +[10:52:43] Epoch: 1 Batch: 6958/20099 (34.62%) Loss: 2.050950 LR: 0.00002791 +[10:52:45] Epoch: 1 Batch: 6959/20099 (34.62%) Loss: 2.205784 LR: 0.00002791 +[10:52:47] Epoch: 1 Batch: 6960/20099 (34.63%) Loss: 2.168074 LR: 0.00002791 +[10:52:48] Epoch: 1 Batch: 6961/20099 (34.63%) Loss: 2.183275 LR: 0.00002790 +[10:52:50] Epoch: 1 Batch: 6962/20099 (34.64%) Loss: 1.963000 LR: 0.00002790 +[10:52:52] Epoch: 1 Batch: 6963/20099 (34.64%) Loss: 2.407468 LR: 0.00002790 +[10:52:54] Epoch: 1 Batch: 6964/20099 (34.65%) Loss: 2.218308 LR: 0.00002790 +[10:52:55] Epoch: 1 Batch: 6965/20099 (34.65%) Loss: 1.960445 LR: 0.00002790 +[10:52:57] Epoch: 1 Batch: 6966/20099 (34.66%) Loss: 1.576009 LR: 0.00002790 +[10:52:59] Epoch: 1 Batch: 6967/20099 (34.66%) 
Loss: 2.246544 LR: 0.00002790 +[10:53:01] Epoch: 1 Batch: 6968/20099 (34.67%) Loss: 1.661664 LR: 0.00002789 +[10:53:02] Epoch: 1 Batch: 6969/20099 (34.67%) Loss: 2.027601 LR: 0.00002789 +[10:53:04] Epoch: 1 Batch: 6970/20099 (34.68%) Loss: 2.167160 LR: 0.00002789 +[10:53:06] Epoch: 1 Batch: 6971/20099 (34.68%) Loss: 1.954417 LR: 0.00002789 +[10:53:08] Epoch: 1 Batch: 6972/20099 (34.69%) Loss: 2.024255 LR: 0.00002789 +[10:53:09] Epoch: 1 Batch: 6973/20099 (34.69%) Loss: 2.134949 LR: 0.00002789 +[10:53:11] Epoch: 1 Batch: 6974/20099 (34.70%) Loss: 2.255246 LR: 0.00002789 +[10:53:13] Epoch: 1 Batch: 6975/20099 (34.70%) Loss: 2.150893 LR: 0.00002788 +[10:53:15] Epoch: 1 Batch: 6976/20099 (34.71%) Loss: 2.372547 LR: 0.00002788 +[10:53:17] Epoch: 1 Batch: 6977/20099 (34.71%) Loss: 1.949844 LR: 0.00002788 +[10:53:18] Epoch: 1 Batch: 6978/20099 (34.72%) Loss: 2.090329 LR: 0.00002788 +[10:53:20] Epoch: 1 Batch: 6979/20099 (34.72%) Loss: 2.095672 LR: 0.00002788 +[10:53:22] Epoch: 1 Batch: 6980/20099 (34.73%) Loss: 1.742736 LR: 0.00002788 +[10:53:24] Epoch: 1 Batch: 6981/20099 (34.73%) Loss: 2.488661 LR: 0.00002788 +[10:53:25] Epoch: 1 Batch: 6982/20099 (34.74%) Loss: 2.191320 LR: 0.00002787 +[10:53:27] Epoch: 1 Batch: 6983/20099 (34.74%) Loss: 1.962778 LR: 0.00002787 +[10:53:29] Epoch: 1 Batch: 6984/20099 (34.75%) Loss: 2.308599 LR: 0.00002787 +[10:53:31] Epoch: 1 Batch: 6985/20099 (34.75%) Loss: 1.972839 LR: 0.00002787 +[10:53:32] Epoch: 1 Batch: 6986/20099 (34.76%) Loss: 2.201016 LR: 0.00002787 +[10:53:34] Epoch: 1 Batch: 6987/20099 (34.76%) Loss: 2.228927 LR: 0.00002787 +[10:53:36] Epoch: 1 Batch: 6988/20099 (34.77%) Loss: 1.921659 LR: 0.00002787 +[10:53:38] Epoch: 1 Batch: 6989/20099 (34.77%) Loss: 2.453832 LR: 0.00002786 +[10:53:40] Epoch: 1 Batch: 6990/20099 (34.78%) Loss: 2.125905 LR: 0.00002786 +[10:53:41] Epoch: 1 Batch: 6991/20099 (34.78%) Loss: 2.264550 LR: 0.00002786 +[10:53:43] Epoch: 1 Batch: 6992/20099 (34.79%) Loss: 2.086047 LR: 0.00002786 +[10:53:45] Epoch: 
1 Batch: 6993/20099 (34.79%) Loss: 2.448757 LR: 0.00002786 +[10:53:47] Epoch: 1 Batch: 6994/20099 (34.80%) Loss: 2.140265 LR: 0.00002786 +[10:53:48] Epoch: 1 Batch: 6995/20099 (34.80%) Loss: 2.169662 LR: 0.00002786 +[10:53:50] Epoch: 1 Batch: 6996/20099 (34.81%) Loss: 2.112179 LR: 0.00002785 +[10:53:52] Epoch: 1 Batch: 6997/20099 (34.81%) Loss: 2.188232 LR: 0.00002785 +[10:53:54] Epoch: 1 Batch: 6998/20099 (34.82%) Loss: 1.933957 LR: 0.00002785 +[10:53:55] Epoch: 1 Batch: 6999/20099 (34.82%) Loss: 2.103078 LR: 0.00002785 +[10:53:57] >> Evaluating batch 0 +[10:53:58] >> Evaluating batch 1 +[10:53:59] >> Evaluating batch 2 +[10:54:00] >> Evaluating batch 3 +[10:54:01] >> Evaluating batch 4 +[10:54:02] >> Evaluating batch 5 +[10:54:03] >> Evaluating batch 6 +[10:54:04] >> Evaluating batch 7 +[10:54:05] >> Evaluating batch 8 +[10:54:06] >> Evaluating batch 9 +[10:54:07] >> Evaluating batch 10 +[10:54:08] >> Evaluating batch 11 +[10:54:09] >> Evaluating batch 12 +[10:54:10] >> Evaluating batch 13 +[10:54:11] >> Evaluating batch 14 +[10:54:12] >> Evaluating batch 15 +[10:54:13] >> Evaluating batch 16 +[10:54:14] Epoch: 1 Step: 7000/20099 Evaluation: +[10:54:14] Avg Loss Since Last Eval: 2.1061 Val Loss: 2.1842 Validation loss delta: -0.0048 Perplexity: 8.8833 LR: 0.00002785 +[10:54:17] >> Cleaned up old temp checkpoint: epoch1_step5000 +[10:54:17] >> Temp checkpoint saved: epoch1_step7000, size: 0.1693 GB +[10:54:21] >> Checkpoint saved: epoch1_step7000, size: 0.1693 GB +[10:54:21] Epoch: 1 Batch: 7000/20099 (34.83%) Loss: 2.139073 LR: 0.00002785 +[10:54:23] Epoch: 1 Batch: 7001/20099 (34.83%) Loss: 2.306963 LR: 0.00002785 +[10:54:24] Epoch: 1 Batch: 7002/20099 (34.84%) Loss: 1.889974 LR: 0.00002785 +[10:54:26] Epoch: 1 Batch: 7003/20099 (34.84%) Loss: 2.004917 LR: 0.00002784 +[10:54:28] Epoch: 1 Batch: 7004/20099 (34.85%) Loss: 2.138768 LR: 0.00002784 +[10:54:30] Epoch: 1 Batch: 7005/20099 (34.85%) Loss: 2.122389 LR: 0.00002784 +[10:54:31] Epoch: 1 Batch: 7006/20099 
(34.86%) Loss: 1.983937 LR: 0.00002784 +[10:54:33] Epoch: 1 Batch: 7007/20099 (34.86%) Loss: 2.372482 LR: 0.00002784 +[10:54:35] Epoch: 1 Batch: 7008/20099 (34.87%) Loss: 2.001665 LR: 0.00002784 +[10:54:37] Epoch: 1 Batch: 7009/20099 (34.87%) Loss: 2.015689 LR: 0.00002784 +[10:54:38] Epoch: 1 Batch: 7010/20099 (34.88%) Loss: 2.157980 LR: 0.00002783 +[10:54:40] Epoch: 1 Batch: 7011/20099 (34.88%) Loss: 2.048911 LR: 0.00002783 +[10:54:42] Epoch: 1 Batch: 7012/20099 (34.89%) Loss: 2.007964 LR: 0.00002783 +[10:54:44] Epoch: 1 Batch: 7013/20099 (34.89%) Loss: 2.043484 LR: 0.00002783 +[10:54:46] Epoch: 1 Batch: 7014/20099 (34.90%) Loss: 2.087451 LR: 0.00002783 +[10:54:47] Epoch: 1 Batch: 7015/20099 (34.90%) Loss: 2.170595 LR: 0.00002783 +[10:54:49] Epoch: 1 Batch: 7016/20099 (34.91%) Loss: 2.005560 LR: 0.00002783 +[10:54:51] Epoch: 1 Batch: 7017/20099 (34.91%) Loss: 1.908896 LR: 0.00002782 +[10:54:53] Epoch: 1 Batch: 7018/20099 (34.92%) Loss: 2.347799 LR: 0.00002782 +[10:54:55] Epoch: 1 Batch: 7019/20099 (34.92%) Loss: 2.300037 LR: 0.00002782 +[10:54:57] Epoch: 1 Batch: 7020/20099 (34.93%) Loss: 1.861715 LR: 0.00002782 +[10:54:59] Epoch: 1 Batch: 7021/20099 (34.93%) Loss: 2.111417 LR: 0.00002782 +[10:55:00] Epoch: 1 Batch: 7022/20099 (34.94%) Loss: 2.148019 LR: 0.00002782 +[10:55:02] Epoch: 1 Batch: 7023/20099 (34.94%) Loss: 2.026584 LR: 0.00002782 +[10:55:04] Epoch: 1 Batch: 7024/20099 (34.95%) Loss: 2.224730 LR: 0.00002781 +[10:55:06] Epoch: 1 Batch: 7025/20099 (34.95%) Loss: 1.952418 LR: 0.00002781 +[10:55:08] Epoch: 1 Batch: 7026/20099 (34.96%) Loss: 1.974889 LR: 0.00002781 +[10:55:09] Epoch: 1 Batch: 7027/20099 (34.96%) Loss: 2.100460 LR: 0.00002781 +[10:55:11] Epoch: 1 Batch: 7028/20099 (34.97%) Loss: 2.271579 LR: 0.00002781 +[10:55:13] Epoch: 1 Batch: 7029/20099 (34.97%) Loss: 1.739239 LR: 0.00002781 +[10:55:15] Epoch: 1 Batch: 7030/20099 (34.98%) Loss: 1.909335 LR: 0.00002781 +[10:55:16] Epoch: 1 Batch: 7031/20099 (34.98%) Loss: 2.047011 LR: 0.00002780 
+[10:55:18] Epoch: 1 Batch: 7032/20099 (34.99%) Loss: 2.159661 LR: 0.00002780 +[10:55:20] Epoch: 1 Batch: 7033/20099 (34.99%) Loss: 1.770979 LR: 0.00002780 +[10:55:22] Epoch: 1 Batch: 7034/20099 (35.00%) Loss: 2.116041 LR: 0.00002780 +[10:55:24] Epoch: 1 Batch: 7035/20099 (35.00%) Loss: 2.067008 LR: 0.00002780 +[10:55:25] Epoch: 1 Batch: 7036/20099 (35.01%) Loss: 2.394377 LR: 0.00002780 +[10:55:27] Epoch: 1 Batch: 7037/20099 (35.01%) Loss: 2.131183 LR: 0.00002780 +[10:55:29] Epoch: 1 Batch: 7038/20099 (35.02%) Loss: 2.059942 LR: 0.00002780 +[10:55:30] Epoch: 1 Batch: 7039/20099 (35.02%) Loss: 2.249591 LR: 0.00002780 +[10:55:32] Epoch: 1 Batch: 7040/20099 (35.03%) Loss: 2.167532 LR: 0.00002780 +[10:55:34] Epoch: 1 Batch: 7041/20099 (35.03%) Loss: 1.932089 LR: 0.00002780 +[10:55:36] Epoch: 1 Batch: 7042/20099 (35.04%) Loss: 2.063604 LR: 0.00002780 +[10:55:37] Epoch: 1 Batch: 7043/20099 (35.04%) Loss: 1.649331 LR: 0.00002780 +[10:55:39] Epoch: 1 Batch: 7044/20099 (35.05%) Loss: 2.132253 LR: 0.00002780 +[10:55:41] Epoch: 1 Batch: 7045/20099 (35.05%) Loss: 2.121477 LR: 0.00002779 +[10:55:43] Epoch: 1 Batch: 7046/20099 (35.06%) Loss: 1.974774 LR: 0.00002779 +[10:55:44] Epoch: 1 Batch: 7047/20099 (35.06%) Loss: 2.217102 LR: 0.00002779 +[10:55:46] Epoch: 1 Batch: 7048/20099 (35.07%) Loss: 1.734862 LR: 0.00002779 +[10:55:48] Epoch: 1 Batch: 7049/20099 (35.07%) Loss: 2.375342 LR: 0.00002779 +[10:55:50] Epoch: 1 Batch: 7050/20099 (35.08%) Loss: 1.995915 LR: 0.00002779 +[10:55:51] Epoch: 1 Batch: 7051/20099 (35.08%) Loss: 1.824705 LR: 0.00002779 +[10:55:53] Epoch: 1 Batch: 7052/20099 (35.09%) Loss: 1.784909 LR: 0.00002778 +[10:55:55] Epoch: 1 Batch: 7053/20099 (35.09%) Loss: 2.123444 LR: 0.00002778 +[10:55:57] Epoch: 1 Batch: 7054/20099 (35.10%) Loss: 2.222812 LR: 0.00002778 +[10:55:59] Epoch: 1 Batch: 7055/20099 (35.10%) Loss: 1.838475 LR: 0.00002778 +[10:56:00] Epoch: 1 Batch: 7056/20099 (35.11%) Loss: 2.256785 LR: 0.00002778 +[10:56:02] Epoch: 1 Batch: 7057/20099 (35.11%) 
Loss: 2.107504 LR: 0.00002778 +[10:56:04] Epoch: 1 Batch: 7058/20099 (35.12%) Loss: 2.073534 LR: 0.00002778 +[10:56:06] Epoch: 1 Batch: 7059/20099 (35.12%) Loss: 2.106862 LR: 0.00002777 +[10:56:07] Epoch: 1 Batch: 7060/20099 (35.13%) Loss: 1.985534 LR: 0.00002777 +[10:56:09] Epoch: 1 Batch: 7061/20099 (35.13%) Loss: 2.102464 LR: 0.00002777 +[10:56:11] Epoch: 1 Batch: 7062/20099 (35.14%) Loss: 1.912967 LR: 0.00002777 +[10:56:13] Epoch: 1 Batch: 7063/20099 (35.14%) Loss: 2.228588 LR: 0.00002777 +[10:56:15] Epoch: 1 Batch: 7064/20099 (35.15%) Loss: 1.912664 LR: 0.00002777 +[10:56:16] Epoch: 1 Batch: 7065/20099 (35.15%) Loss: 1.971045 LR: 0.00002777 +[10:56:18] Epoch: 1 Batch: 7066/20099 (35.16%) Loss: 2.305587 LR: 0.00002776 +[10:56:20] Epoch: 1 Batch: 7067/20099 (35.16%) Loss: 1.702191 LR: 0.00002776 +[10:56:22] Epoch: 1 Batch: 7068/20099 (35.17%) Loss: 2.243290 LR: 0.00002776 +[10:56:24] Epoch: 1 Batch: 7069/20099 (35.17%) Loss: 2.336854 LR: 0.00002776 +[10:56:25] Epoch: 1 Batch: 7070/20099 (35.18%) Loss: 2.252550 LR: 0.00002776 +[10:56:27] Epoch: 1 Batch: 7071/20099 (35.18%) Loss: 2.029330 LR: 0.00002776 +[10:56:29] Epoch: 1 Batch: 7072/20099 (35.19%) Loss: 2.094667 LR: 0.00002776 +[10:56:31] Epoch: 1 Batch: 7073/20099 (35.19%) Loss: 2.279553 LR: 0.00002775 +[10:56:32] Epoch: 1 Batch: 7074/20099 (35.20%) Loss: 2.117043 LR: 0.00002775 +[10:56:34] Epoch: 1 Batch: 7075/20099 (35.20%) Loss: 2.046130 LR: 0.00002775 +[10:56:36] Epoch: 1 Batch: 7076/20099 (35.21%) Loss: 1.990312 LR: 0.00002775 +[10:56:38] Epoch: 1 Batch: 7077/20099 (35.21%) Loss: 1.988800 LR: 0.00002775 +[10:56:39] Epoch: 1 Batch: 7078/20099 (35.22%) Loss: 2.272196 LR: 0.00002775 +[10:56:41] Epoch: 1 Batch: 7079/20099 (35.22%) Loss: 2.018588 LR: 0.00002775 +[10:56:43] Epoch: 1 Batch: 7080/20099 (35.23%) Loss: 2.095214 LR: 0.00002774 +[10:56:45] Epoch: 1 Batch: 7081/20099 (35.23%) Loss: 2.179277 LR: 0.00002774 +[10:56:47] Epoch: 1 Batch: 7082/20099 (35.24%) Loss: 2.167888 LR: 0.00002774 +[10:56:48] Epoch: 
1 Batch: 7083/20099 (35.24%) Loss: 1.992153 LR: 0.00002774 +[10:56:50] Epoch: 1 Batch: 7084/20099 (35.25%) Loss: 1.986721 LR: 0.00002774 +[10:56:52] Epoch: 1 Batch: 7085/20099 (35.25%) Loss: 2.025734 LR: 0.00002774 +[10:56:54] Epoch: 1 Batch: 7086/20099 (35.26%) Loss: 2.181678 LR: 0.00002774 +[10:56:55] Epoch: 1 Batch: 7087/20099 (35.26%) Loss: 2.056132 LR: 0.00002773 +[10:56:57] Epoch: 1 Batch: 7088/20099 (35.27%) Loss: 1.950521 LR: 0.00002773 +[10:56:59] Epoch: 1 Batch: 7089/20099 (35.27%) Loss: 2.114528 LR: 0.00002773 +[10:57:01] Epoch: 1 Batch: 7090/20099 (35.28%) Loss: 1.854285 LR: 0.00002773 +[10:57:02] Epoch: 1 Batch: 7091/20099 (35.28%) Loss: 1.843466 LR: 0.00002773 +[10:57:04] Epoch: 1 Batch: 7092/20099 (35.29%) Loss: 2.045099 LR: 0.00002773 +[10:57:06] Epoch: 1 Batch: 7093/20099 (35.29%) Loss: 2.036808 LR: 0.00002773 +[10:57:08] Epoch: 1 Batch: 7094/20099 (35.30%) Loss: 2.525105 LR: 0.00002772 +[10:57:09] Epoch: 1 Batch: 7095/20099 (35.30%) Loss: 2.081077 LR: 0.00002772 +[10:57:11] Epoch: 1 Batch: 7096/20099 (35.31%) Loss: 2.155258 LR: 0.00002772 +[10:57:13] Epoch: 1 Batch: 7097/20099 (35.31%) Loss: 2.341044 LR: 0.00002772 +[10:57:15] Epoch: 1 Batch: 7098/20099 (35.32%) Loss: 2.094894 LR: 0.00002772 +[10:57:17] Epoch: 1 Batch: 7099/20099 (35.32%) Loss: 1.951580 LR: 0.00002772 +[10:57:18] Epoch: 1 Batch: 7100/20099 (35.33%) Loss: 2.193962 LR: 0.00002772 +[10:57:20] Epoch: 1 Batch: 7101/20099 (35.33%) Loss: 1.934027 LR: 0.00002771 +[10:57:22] Epoch: 1 Batch: 7102/20099 (35.34%) Loss: 2.139476 LR: 0.00002771 +[10:57:24] Epoch: 1 Batch: 7103/20099 (35.34%) Loss: 2.108750 LR: 0.00002771 +[10:57:25] Epoch: 1 Batch: 7104/20099 (35.35%) Loss: 2.147327 LR: 0.00002771 +[10:57:27] Epoch: 1 Batch: 7105/20099 (35.35%) Loss: 1.905008 LR: 0.00002771 +[10:57:29] Epoch: 1 Batch: 7106/20099 (35.35%) Loss: 1.925633 LR: 0.00002771 +[10:57:31] Epoch: 1 Batch: 7107/20099 (35.36%) Loss: 1.899914 LR: 0.00002771 +[10:57:33] Epoch: 1 Batch: 7108/20099 (35.36%) Loss: 1.969616 LR: 
0.00002770 +[10:57:34] Epoch: 1 Batch: 7109/20099 (35.37%) Loss: 2.318392 LR: 0.00002770 +[10:57:36] Epoch: 1 Batch: 7110/20099 (35.37%) Loss: 2.267425 LR: 0.00002770 +[10:57:38] Epoch: 1 Batch: 7111/20099 (35.38%) Loss: 2.382717 LR: 0.00002770 +[10:57:40] Epoch: 1 Batch: 7112/20099 (35.38%) Loss: 2.221552 LR: 0.00002770 +[10:57:41] Epoch: 1 Batch: 7113/20099 (35.39%) Loss: 2.019246 LR: 0.00002770 +[10:57:43] Epoch: 1 Batch: 7114/20099 (35.39%) Loss: 1.903459 LR: 0.00002770 +[10:57:45] Epoch: 1 Batch: 7115/20099 (35.40%) Loss: 2.048503 LR: 0.00002769 +[10:57:47] Epoch: 1 Batch: 7116/20099 (35.40%) Loss: 2.253531 LR: 0.00002769 +[10:57:49] Epoch: 1 Batch: 7117/20099 (35.41%) Loss: 2.416657 LR: 0.00002769 +[10:57:50] Epoch: 1 Batch: 7118/20099 (35.41%) Loss: 2.033681 LR: 0.00002769 +[10:57:52] Epoch: 1 Batch: 7119/20099 (35.42%) Loss: 2.137687 LR: 0.00002769 +[10:57:54] Epoch: 1 Batch: 7120/20099 (35.42%) Loss: 2.249152 LR: 0.00002769 +[10:57:56] Epoch: 1 Batch: 7121/20099 (35.43%) Loss: 2.268061 LR: 0.00002769 +[10:57:57] Epoch: 1 Batch: 7122/20099 (35.43%) Loss: 2.339057 LR: 0.00002768 +[10:57:59] Epoch: 1 Batch: 7123/20099 (35.44%) Loss: 2.389833 LR: 0.00002768 +[10:58:01] Epoch: 1 Batch: 7124/20099 (35.44%) Loss: 2.019301 LR: 0.00002768 +[10:58:03] Epoch: 1 Batch: 7125/20099 (35.45%) Loss: 2.174572 LR: 0.00002768 +[10:58:05] Epoch: 1 Batch: 7126/20099 (35.45%) Loss: 1.972755 LR: 0.00002768 +[10:58:06] Epoch: 1 Batch: 7127/20099 (35.46%) Loss: 2.265330 LR: 0.00002768 +[10:58:08] Epoch: 1 Batch: 7128/20099 (35.46%) Loss: 2.275131 LR: 0.00002768 +[10:58:10] Epoch: 1 Batch: 7129/20099 (35.47%) Loss: 2.137804 LR: 0.00002767 +[10:58:12] Epoch: 1 Batch: 7130/20099 (35.47%) Loss: 2.146187 LR: 0.00002767 +[10:58:13] Epoch: 1 Batch: 7131/20099 (35.48%) Loss: 2.232193 LR: 0.00002767 +[10:58:15] Epoch: 1 Batch: 7132/20099 (35.48%) Loss: 2.022522 LR: 0.00002767 +[10:58:17] Epoch: 1 Batch: 7133/20099 (35.49%) Loss: 2.149465 LR: 0.00002767 +[10:58:19] Epoch: 1 Batch: 7134/20099 
(35.49%) Loss: 2.056278 LR: 0.00002767 +[10:58:21] Epoch: 1 Batch: 7135/20099 (35.50%) Loss: 2.129737 LR: 0.00002767 +[10:58:22] Epoch: 1 Batch: 7136/20099 (35.50%) Loss: 2.168998 LR: 0.00002766 +[10:58:24] Epoch: 1 Batch: 7137/20099 (35.51%) Loss: 2.251946 LR: 0.00002766 +[10:58:26] Epoch: 1 Batch: 7138/20099 (35.51%) Loss: 2.353036 LR: 0.00002766 +[10:58:28] Epoch: 1 Batch: 7139/20099 (35.52%) Loss: 1.980910 LR: 0.00002766 +[10:58:30] Epoch: 1 Batch: 7140/20099 (35.52%) Loss: 2.152295 LR: 0.00002766 +[10:58:31] Epoch: 1 Batch: 7141/20099 (35.53%) Loss: 2.102435 LR: 0.00002766 +[10:58:33] Epoch: 1 Batch: 7142/20099 (35.53%) Loss: 2.027541 LR: 0.00002766 +[10:58:35] Epoch: 1 Batch: 7143/20099 (35.54%) Loss: 1.926253 LR: 0.00002765 +[10:58:37] Epoch: 1 Batch: 7144/20099 (35.54%) Loss: 1.732822 LR: 0.00002765 +[10:58:38] Epoch: 1 Batch: 7145/20099 (35.55%) Loss: 2.285007 LR: 0.00002765 +[10:58:40] Epoch: 1 Batch: 7146/20099 (35.55%) Loss: 1.920499 LR: 0.00002765 +[10:58:42] Epoch: 1 Batch: 7147/20099 (35.56%) Loss: 2.107876 LR: 0.00002765 +[10:58:44] Epoch: 1 Batch: 7148/20099 (35.56%) Loss: 1.908098 LR: 0.00002765 +[10:58:45] Epoch: 1 Batch: 7149/20099 (35.57%) Loss: 1.821089 LR: 0.00002765 +[10:58:47] Epoch: 1 Batch: 7150/20099 (35.57%) Loss: 2.180191 LR: 0.00002764 +[10:58:49] Epoch: 1 Batch: 7151/20099 (35.58%) Loss: 2.092648 LR: 0.00002764 +[10:58:51] Epoch: 1 Batch: 7152/20099 (35.58%) Loss: 2.314674 LR: 0.00002764 +[10:58:53] Epoch: 1 Batch: 7153/20099 (35.59%) Loss: 1.909270 LR: 0.00002764 +[10:58:54] Epoch: 1 Batch: 7154/20099 (35.59%) Loss: 2.135186 LR: 0.00002764 +[10:58:56] Epoch: 1 Batch: 7155/20099 (35.60%) Loss: 2.044431 LR: 0.00002764 +[10:58:58] Epoch: 1 Batch: 7156/20099 (35.60%) Loss: 2.152645 LR: 0.00002764 +[10:59:00] Epoch: 1 Batch: 7157/20099 (35.61%) Loss: 2.158270 LR: 0.00002763 +[10:59:01] Epoch: 1 Batch: 7158/20099 (35.61%) Loss: 2.233413 LR: 0.00002763 +[10:59:03] Epoch: 1 Batch: 7159/20099 (35.62%) Loss: 1.975279 LR: 0.00002763 
+[10:59:05] Epoch: 1 Batch: 7160/20099 (35.62%) Loss: 2.070082 LR: 0.00002763 +[10:59:07] Epoch: 1 Batch: 7161/20099 (35.63%) Loss: 1.578087 LR: 0.00002763 +[10:59:09] Epoch: 1 Batch: 7162/20099 (35.63%) Loss: 1.929736 LR: 0.00002763 +[10:59:10] Epoch: 1 Batch: 7163/20099 (35.64%) Loss: 2.040705 LR: 0.00002763 +[10:59:12] Epoch: 1 Batch: 7164/20099 (35.64%) Loss: 2.457897 LR: 0.00002762 +[10:59:14] Epoch: 1 Batch: 7165/20099 (35.65%) Loss: 2.153120 LR: 0.00002762 +[10:59:16] Epoch: 1 Batch: 7166/20099 (35.65%) Loss: 1.885421 LR: 0.00002762 +[10:59:17] Epoch: 1 Batch: 7167/20099 (35.66%) Loss: 2.289651 LR: 0.00002762 +[10:59:19] Epoch: 1 Batch: 7168/20099 (35.66%) Loss: 2.339476 LR: 0.00002762 +[10:59:21] Epoch: 1 Batch: 7169/20099 (35.67%) Loss: 2.266893 LR: 0.00002762 +[10:59:23] Epoch: 1 Batch: 7170/20099 (35.67%) Loss: 2.084042 LR: 0.00002762 +[10:59:25] Epoch: 1 Batch: 7171/20099 (35.68%) Loss: 2.002411 LR: 0.00002761 +[10:59:26] Epoch: 1 Batch: 7172/20099 (35.68%) Loss: 2.244063 LR: 0.00002761 +[10:59:28] Epoch: 1 Batch: 7173/20099 (35.69%) Loss: 2.313492 LR: 0.00002761 +[10:59:30] Epoch: 1 Batch: 7174/20099 (35.69%) Loss: 2.387916 LR: 0.00002761 +[10:59:32] Epoch: 1 Batch: 7175/20099 (35.70%) Loss: 2.186898 LR: 0.00002761 +[10:59:33] Epoch: 1 Batch: 7176/20099 (35.70%) Loss: 2.302961 LR: 0.00002761 +[10:59:35] Epoch: 1 Batch: 7177/20099 (35.71%) Loss: 2.247002 LR: 0.00002761 +[10:59:37] Epoch: 1 Batch: 7178/20099 (35.71%) Loss: 2.189175 LR: 0.00002760 +[10:59:39] Epoch: 1 Batch: 7179/20099 (35.72%) Loss: 2.017252 LR: 0.00002760 +[10:59:41] Epoch: 1 Batch: 7180/20099 (35.72%) Loss: 2.061703 LR: 0.00002760 +[10:59:42] Epoch: 1 Batch: 7181/20099 (35.73%) Loss: 2.148195 LR: 0.00002760 +[10:59:44] Epoch: 1 Batch: 7182/20099 (35.73%) Loss: 2.127609 LR: 0.00002760 +[10:59:46] Epoch: 1 Batch: 7183/20099 (35.74%) Loss: 2.223930 LR: 0.00002760 +[10:59:48] Epoch: 1 Batch: 7184/20099 (35.74%) Loss: 2.409929 LR: 0.00002760 +[10:59:50] Epoch: 1 Batch: 7185/20099 (35.75%) 
Loss: 2.328671 LR: 0.00002759 +[10:59:51] Epoch: 1 Batch: 7186/20099 (35.75%) Loss: 2.319633 LR: 0.00002759 +[10:59:53] Epoch: 1 Batch: 7187/20099 (35.76%) Loss: 2.014514 LR: 0.00002759 +[10:59:55] Epoch: 1 Batch: 7188/20099 (35.76%) Loss: 2.026465 LR: 0.00002759 +[10:59:57] Epoch: 1 Batch: 7189/20099 (35.77%) Loss: 1.832118 LR: 0.00002759 +[10:59:58] Epoch: 1 Batch: 7190/20099 (35.77%) Loss: 2.167995 LR: 0.00002759 +[11:00:00] Epoch: 1 Batch: 7191/20099 (35.78%) Loss: 2.097427 LR: 0.00002759 +[11:00:02] Epoch: 1 Batch: 7192/20099 (35.78%) Loss: 2.137363 LR: 0.00002758 +[11:00:04] Epoch: 1 Batch: 7193/20099 (35.79%) Loss: 2.246005 LR: 0.00002758 +[11:00:06] Epoch: 1 Batch: 7194/20099 (35.79%) Loss: 2.290567 LR: 0.00002758 +[11:00:07] Epoch: 1 Batch: 7195/20099 (35.80%) Loss: 2.265014 LR: 0.00002758 +[11:00:09] Epoch: 1 Batch: 7196/20099 (35.80%) Loss: 2.370922 LR: 0.00002758 +[11:00:11] Epoch: 1 Batch: 7197/20099 (35.81%) Loss: 2.177689 LR: 0.00002758 +[11:00:13] Epoch: 1 Batch: 7198/20099 (35.81%) Loss: 2.369873 LR: 0.00002758 +[11:00:14] Epoch: 1 Batch: 7199/20099 (35.82%) Loss: 2.153777 LR: 0.00002757 +[11:00:20] >> Cleaned up old temp checkpoint: epoch1_step5200 +[11:00:20] >> Temp checkpoint saved: epoch1_step7200, size: 0.1693 GB +[11:00:20] Epoch: 1 Batch: 7200/20099 (35.82%) Loss: 2.076853 LR: 0.00002757 +[11:00:21] Epoch: 1 Batch: 7201/20099 (35.83%) Loss: 1.911233 LR: 0.00002757 +[11:00:23] Epoch: 1 Batch: 7202/20099 (35.83%) Loss: 1.558824 LR: 0.00002757 +[11:00:25] Epoch: 1 Batch: 7203/20099 (35.84%) Loss: 2.433884 LR: 0.00002757 +[11:00:27] Epoch: 1 Batch: 7204/20099 (35.84%) Loss: 2.164688 LR: 0.00002757 +[11:00:28] Epoch: 1 Batch: 7205/20099 (35.85%) Loss: 1.967166 LR: 0.00002757 +[11:00:30] Epoch: 1 Batch: 7206/20099 (35.85%) Loss: 2.246242 LR: 0.00002756 +[11:00:32] Epoch: 1 Batch: 7207/20099 (35.86%) Loss: 2.368386 LR: 0.00002756 +[11:00:34] Epoch: 1 Batch: 7208/20099 (35.86%) Loss: 2.173307 LR: 0.00002756 +[11:00:36] Epoch: 1 Batch: 7209/20099 
(35.87%) Loss: 2.270287 LR: 0.00002756 +[11:00:37] Epoch: 1 Batch: 7210/20099 (35.87%) Loss: 1.827531 LR: 0.00002756 +[11:00:39] Epoch: 1 Batch: 7211/20099 (35.88%) Loss: 2.061102 LR: 0.00002756 +[11:00:41] Epoch: 1 Batch: 7212/20099 (35.88%) Loss: 2.073916 LR: 0.00002756 +[11:00:43] Epoch: 1 Batch: 7213/20099 (35.89%) Loss: 1.890602 LR: 0.00002756 +[11:00:44] Epoch: 1 Batch: 7214/20099 (35.89%) Loss: 1.724355 LR: 0.00002756 +[11:00:46] Epoch: 1 Batch: 7215/20099 (35.90%) Loss: 1.820392 LR: 0.00002756 +[11:00:48] Epoch: 1 Batch: 7216/20099 (35.90%) Loss: 2.252142 LR: 0.00002756 +[11:00:50] Epoch: 1 Batch: 7217/20099 (35.91%) Loss: 2.047825 LR: 0.00002756 +[11:00:52] Epoch: 1 Batch: 7218/20099 (35.91%) Loss: 1.849444 LR: 0.00002756 +[11:00:53] Epoch: 1 Batch: 7219/20099 (35.92%) Loss: 1.876080 LR: 0.00002756 +[11:00:55] Epoch: 1 Batch: 7220/20099 (35.92%) Loss: 2.430123 LR: 0.00002755 +[11:00:57] Epoch: 1 Batch: 7221/20099 (35.93%) Loss: 2.195468 LR: 0.00002755 +[11:00:59] Epoch: 1 Batch: 7222/20099 (35.93%) Loss: 2.169407 LR: 0.00002755 +[11:01:01] Epoch: 1 Batch: 7223/20099 (35.94%) Loss: 2.503976 LR: 0.00002755 +[11:01:02] Epoch: 1 Batch: 7224/20099 (35.94%) Loss: 2.027979 LR: 0.00002755 +[11:01:04] Epoch: 1 Batch: 7225/20099 (35.95%) Loss: 2.149176 LR: 0.00002755 +[11:01:06] Epoch: 1 Batch: 7226/20099 (35.95%) Loss: 2.015510 LR: 0.00002755 +[11:01:08] Epoch: 1 Batch: 7227/20099 (35.96%) Loss: 2.033658 LR: 0.00002754 +[11:01:10] Epoch: 1 Batch: 7228/20099 (35.96%) Loss: 1.807357 LR: 0.00002754 +[11:01:11] Epoch: 1 Batch: 7229/20099 (35.97%) Loss: 2.182154 LR: 0.00002754 +[11:01:13] Epoch: 1 Batch: 7230/20099 (35.97%) Loss: 2.117984 LR: 0.00002754 +[11:01:15] Epoch: 1 Batch: 7231/20099 (35.98%) Loss: 1.865798 LR: 0.00002754 +[11:01:17] Epoch: 1 Batch: 7232/20099 (35.98%) Loss: 1.999234 LR: 0.00002754 +[11:01:18] Epoch: 1 Batch: 7233/20099 (35.99%) Loss: 2.306871 LR: 0.00002754 +[11:01:20] Epoch: 1 Batch: 7234/20099 (35.99%) Loss: 2.072221 LR: 0.00002753 
+[11:01:22] Epoch: 1 Batch: 7235/20099 (36.00%) Loss: 2.104891 LR: 0.00002753 +[11:01:24] Epoch: 1 Batch: 7236/20099 (36.00%) Loss: 1.873299 LR: 0.00002753 +[11:01:25] Epoch: 1 Batch: 7237/20099 (36.01%) Loss: 2.234660 LR: 0.00002753 +[11:01:27] Epoch: 1 Batch: 7238/20099 (36.01%) Loss: 2.053027 LR: 0.00002753 +[11:01:29] Epoch: 1 Batch: 7239/20099 (36.02%) Loss: 2.114897 LR: 0.00002753 +[11:01:31] Epoch: 1 Batch: 7240/20099 (36.02%) Loss: 2.214556 LR: 0.00002753 +[11:01:32] Epoch: 1 Batch: 7241/20099 (36.03%) Loss: 2.174130 LR: 0.00002752 +[11:01:34] Epoch: 1 Batch: 7242/20099 (36.03%) Loss: 2.173335 LR: 0.00002752 +[11:01:36] Epoch: 1 Batch: 7243/20099 (36.04%) Loss: 2.311701 LR: 0.00002752 +[11:01:38] Epoch: 1 Batch: 7244/20099 (36.04%) Loss: 1.790541 LR: 0.00002752 +[11:01:40] Epoch: 1 Batch: 7245/20099 (36.05%) Loss: 2.153057 LR: 0.00002752 +[11:01:41] Epoch: 1 Batch: 7246/20099 (36.05%) Loss: 1.773124 LR: 0.00002752 +[11:01:43] Epoch: 1 Batch: 7247/20099 (36.06%) Loss: 2.099765 LR: 0.00002752 +[11:01:45] Epoch: 1 Batch: 7248/20099 (36.06%) Loss: 2.230945 LR: 0.00002751 +[11:01:47] Epoch: 1 Batch: 7249/20099 (36.07%) Loss: 1.850626 LR: 0.00002751 +[11:01:48] Epoch: 1 Batch: 7250/20099 (36.07%) Loss: 2.211106 LR: 0.00002751 +[11:01:50] Epoch: 1 Batch: 7251/20099 (36.08%) Loss: 1.983997 LR: 0.00002751 +[11:01:52] Epoch: 1 Batch: 7252/20099 (36.08%) Loss: 2.175603 LR: 0.00002751 +[11:01:54] Epoch: 1 Batch: 7253/20099 (36.09%) Loss: 2.614998 LR: 0.00002751 +[11:01:55] Epoch: 1 Batch: 7254/20099 (36.09%) Loss: 1.670992 LR: 0.00002751 +[11:01:57] Epoch: 1 Batch: 7255/20099 (36.10%) Loss: 1.889959 LR: 0.00002750 +[11:01:59] Epoch: 1 Batch: 7256/20099 (36.10%) Loss: 2.342557 LR: 0.00002750 +[11:02:01] Epoch: 1 Batch: 7257/20099 (36.11%) Loss: 1.913944 LR: 0.00002750 +[11:02:03] Epoch: 1 Batch: 7258/20099 (36.11%) Loss: 1.942025 LR: 0.00002750 +[11:02:04] Epoch: 1 Batch: 7259/20099 (36.12%) Loss: 2.215939 LR: 0.00002750 +[11:02:06] Epoch: 1 Batch: 7260/20099 (36.12%) 
Loss: 2.183671 LR: 0.00002750 +[11:02:08] Epoch: 1 Batch: 7261/20099 (36.13%) Loss: 1.947031 LR: 0.00002750 +[11:02:10] Epoch: 1 Batch: 7262/20099 (36.13%) Loss: 2.305274 LR: 0.00002749 +[11:02:11] Epoch: 1 Batch: 7263/20099 (36.14%) Loss: 2.355109 LR: 0.00002749 +[11:02:13] Epoch: 1 Batch: 7264/20099 (36.14%) Loss: 2.185870 LR: 0.00002749 +[11:02:15] Epoch: 1 Batch: 7265/20099 (36.15%) Loss: 1.741967 LR: 0.00002749 +[11:02:17] Epoch: 1 Batch: 7266/20099 (36.15%) Loss: 2.281250 LR: 0.00002749 +[11:02:19] Epoch: 1 Batch: 7267/20099 (36.16%) Loss: 2.231014 LR: 0.00002749 +[11:02:20] Epoch: 1 Batch: 7268/20099 (36.16%) Loss: 2.058611 LR: 0.00002749 +[11:02:22] Epoch: 1 Batch: 7269/20099 (36.17%) Loss: 1.955834 LR: 0.00002748 +[11:02:24] Epoch: 1 Batch: 7270/20099 (36.17%) Loss: 2.215246 LR: 0.00002748 +[11:02:26] Epoch: 1 Batch: 7271/20099 (36.18%) Loss: 2.024403 LR: 0.00002748 +[11:02:28] Epoch: 1 Batch: 7272/20099 (36.18%) Loss: 2.058369 LR: 0.00002748 +[11:02:29] Epoch: 1 Batch: 7273/20099 (36.19%) Loss: 2.076151 LR: 0.00002748 +[11:02:31] Epoch: 1 Batch: 7274/20099 (36.19%) Loss: 2.163307 LR: 0.00002748 +[11:02:33] Epoch: 1 Batch: 7275/20099 (36.20%) Loss: 2.101662 LR: 0.00002748 +[11:02:35] Epoch: 1 Batch: 7276/20099 (36.20%) Loss: 2.198657 LR: 0.00002747 +[11:02:36] Epoch: 1 Batch: 7277/20099 (36.21%) Loss: 2.193222 LR: 0.00002747 +[11:02:38] Epoch: 1 Batch: 7278/20099 (36.21%) Loss: 2.168489 LR: 0.00002747 +[11:02:40] Epoch: 1 Batch: 7279/20099 (36.22%) Loss: 2.083658 LR: 0.00002747 +[11:02:42] Epoch: 1 Batch: 7280/20099 (36.22%) Loss: 2.165561 LR: 0.00002747 +[11:02:44] Epoch: 1 Batch: 7281/20099 (36.23%) Loss: 2.142388 LR: 0.00002747 +[11:02:45] Epoch: 1 Batch: 7282/20099 (36.23%) Loss: 2.329437 LR: 0.00002747 +[11:02:47] Epoch: 1 Batch: 7283/20099 (36.24%) Loss: 1.897173 LR: 0.00002746 +[11:02:49] Epoch: 1 Batch: 7284/20099 (36.24%) Loss: 2.059953 LR: 0.00002746 +[11:02:51] Epoch: 1 Batch: 7285/20099 (36.25%) Loss: 2.178367 LR: 0.00002746 +[11:02:52] Epoch: 
1 Batch: 7286/20099 (36.25%) Loss: 2.035744 LR: 0.00002746 +[11:02:54] Epoch: 1 Batch: 7287/20099 (36.26%) Loss: 2.225251 LR: 0.00002746 +[11:02:56] Epoch: 1 Batch: 7288/20099 (36.26%) Loss: 2.046050 LR: 0.00002746 +[11:02:58] Epoch: 1 Batch: 7289/20099 (36.27%) Loss: 2.071908 LR: 0.00002746 +[11:02:59] Epoch: 1 Batch: 7290/20099 (36.27%) Loss: 1.964508 LR: 0.00002745 +[11:03:01] Epoch: 1 Batch: 7291/20099 (36.28%) Loss: 2.282103 LR: 0.00002745 +[11:03:03] Epoch: 1 Batch: 7292/20099 (36.28%) Loss: 1.729223 LR: 0.00002745 +[11:03:05] Epoch: 1 Batch: 7293/20099 (36.29%) Loss: 2.304574 LR: 0.00002745 +[11:03:06] Epoch: 1 Batch: 7294/20099 (36.29%) Loss: 2.084528 LR: 0.00002745 +[11:03:08] Epoch: 1 Batch: 7295/20099 (36.30%) Loss: 1.958475 LR: 0.00002745 +[11:03:10] Epoch: 1 Batch: 7296/20099 (36.30%) Loss: 2.322032 LR: 0.00002745 +[11:03:12] Epoch: 1 Batch: 7297/20099 (36.31%) Loss: 1.829064 LR: 0.00002744 +[11:03:13] Epoch: 1 Batch: 7298/20099 (36.31%) Loss: 2.395888 LR: 0.00002744 +[11:03:15] Epoch: 1 Batch: 7299/20099 (36.32%) Loss: 2.153566 LR: 0.00002744 +[11:03:17] Epoch: 1 Batch: 7300/20099 (36.32%) Loss: 2.315783 LR: 0.00002744 +[11:03:19] Epoch: 1 Batch: 7301/20099 (36.33%) Loss: 2.189029 LR: 0.00002744 +[11:03:20] Epoch: 1 Batch: 7302/20099 (36.33%) Loss: 2.030681 LR: 0.00002744 +[11:03:22] Epoch: 1 Batch: 7303/20099 (36.34%) Loss: 1.994963 LR: 0.00002744 +[11:03:24] Epoch: 1 Batch: 7304/20099 (36.34%) Loss: 2.298687 LR: 0.00002743 +[11:03:26] Epoch: 1 Batch: 7305/20099 (36.35%) Loss: 2.360773 LR: 0.00002743 +[11:03:28] Epoch: 1 Batch: 7306/20099 (36.35%) Loss: 2.041594 LR: 0.00002743 +[11:03:29] Epoch: 1 Batch: 7307/20099 (36.36%) Loss: 2.206077 LR: 0.00002743 +[11:03:31] Epoch: 1 Batch: 7308/20099 (36.36%) Loss: 1.931530 LR: 0.00002743 +[11:03:33] Epoch: 1 Batch: 7309/20099 (36.36%) Loss: 2.277756 LR: 0.00002743 +[11:03:35] Epoch: 1 Batch: 7310/20099 (36.37%) Loss: 2.103641 LR: 0.00002743 +[11:03:36] Epoch: 1 Batch: 7311/20099 (36.37%) Loss: 2.077133 LR: 
0.00002742 +[11:03:38] Epoch: 1 Batch: 7312/20099 (36.38%) Loss: 2.407587 LR: 0.00002742 +[11:03:40] Epoch: 1 Batch: 7313/20099 (36.38%) Loss: 2.292740 LR: 0.00002742 +[11:03:42] Epoch: 1 Batch: 7314/20099 (36.39%) Loss: 2.230324 LR: 0.00002742 +[11:03:43] Epoch: 1 Batch: 7315/20099 (36.39%) Loss: 2.192221 LR: 0.00002742 +[11:03:45] Epoch: 1 Batch: 7316/20099 (36.40%) Loss: 2.060912 LR: 0.00002742 +[11:03:47] Epoch: 1 Batch: 7317/20099 (36.40%) Loss: 1.764946 LR: 0.00002742 +[11:03:49] Epoch: 1 Batch: 7318/20099 (36.41%) Loss: 2.140236 LR: 0.00002741 +[11:03:51] Epoch: 1 Batch: 7319/20099 (36.41%) Loss: 2.143922 LR: 0.00002741 +[11:03:52] Epoch: 1 Batch: 7320/20099 (36.42%) Loss: 2.177565 LR: 0.00002741 +[11:03:54] Epoch: 1 Batch: 7321/20099 (36.42%) Loss: 2.258011 LR: 0.00002741 +[11:03:56] Epoch: 1 Batch: 7322/20099 (36.43%) Loss: 2.198504 LR: 0.00002741 +[11:03:58] Epoch: 1 Batch: 7323/20099 (36.43%) Loss: 2.121088 LR: 0.00002741 +[11:03:59] Epoch: 1 Batch: 7324/20099 (36.44%) Loss: 2.039883 LR: 0.00002741 +[11:04:01] Epoch: 1 Batch: 7325/20099 (36.44%) Loss: 2.040908 LR: 0.00002740 +[11:04:03] Epoch: 1 Batch: 7326/20099 (36.45%) Loss: 1.960421 LR: 0.00002740 +[11:04:05] Epoch: 1 Batch: 7327/20099 (36.45%) Loss: 1.960288 LR: 0.00002740 +[11:04:06] Epoch: 1 Batch: 7328/20099 (36.46%) Loss: 1.937103 LR: 0.00002740 +[11:04:08] Epoch: 1 Batch: 7329/20099 (36.46%) Loss: 2.225955 LR: 0.00002740 +[11:04:10] Epoch: 1 Batch: 7330/20099 (36.47%) Loss: 1.913225 LR: 0.00002740 +[11:04:12] Epoch: 1 Batch: 7331/20099 (36.47%) Loss: 2.049021 LR: 0.00002740 +[11:04:14] Epoch: 1 Batch: 7332/20099 (36.48%) Loss: 2.165673 LR: 0.00002739 +[11:04:15] Epoch: 1 Batch: 7333/20099 (36.48%) Loss: 2.459568 LR: 0.00002739 +[11:04:17] Epoch: 1 Batch: 7334/20099 (36.49%) Loss: 2.368166 LR: 0.00002739 +[11:04:19] Epoch: 1 Batch: 7335/20099 (36.49%) Loss: 1.881795 LR: 0.00002739 +[11:04:21] Epoch: 1 Batch: 7336/20099 (36.50%) Loss: 2.334785 LR: 0.00002739 +[11:04:23] Epoch: 1 Batch: 7337/20099 
(36.50%) Loss: 1.741207 LR: 0.00002739 +[11:04:24] Epoch: 1 Batch: 7338/20099 (36.51%) Loss: 2.043748 LR: 0.00002739 +[11:04:26] Epoch: 1 Batch: 7339/20099 (36.51%) Loss: 1.828566 LR: 0.00002738 +[11:04:28] Epoch: 1 Batch: 7340/20099 (36.52%) Loss: 2.568763 LR: 0.00002738 +[11:04:30] Epoch: 1 Batch: 7341/20099 (36.52%) Loss: 1.944180 LR: 0.00002738 +[11:04:31] Epoch: 1 Batch: 7342/20099 (36.53%) Loss: 2.065260 LR: 0.00002738 +[11:04:33] Epoch: 1 Batch: 7343/20099 (36.53%) Loss: 2.118607 LR: 0.00002738 +[11:04:35] Epoch: 1 Batch: 7344/20099 (36.54%) Loss: 1.989427 LR: 0.00002738 +[11:04:37] Epoch: 1 Batch: 7345/20099 (36.54%) Loss: 2.393122 LR: 0.00002738 +[11:04:39] Epoch: 1 Batch: 7346/20099 (36.55%) Loss: 2.249776 LR: 0.00002737 +[11:04:40] Epoch: 1 Batch: 7347/20099 (36.55%) Loss: 1.997816 LR: 0.00002737 +[11:04:42] Epoch: 1 Batch: 7348/20099 (36.56%) Loss: 1.891102 LR: 0.00002737 +[11:04:44] Epoch: 1 Batch: 7349/20099 (36.56%) Loss: 2.059733 LR: 0.00002737 +[11:04:46] Epoch: 1 Batch: 7350/20099 (36.57%) Loss: 2.374706 LR: 0.00002737 +[11:04:47] Epoch: 1 Batch: 7351/20099 (36.57%) Loss: 1.955818 LR: 0.00002737 +[11:04:49] Epoch: 1 Batch: 7352/20099 (36.58%) Loss: 2.103321 LR: 0.00002737 +[11:04:51] Epoch: 1 Batch: 7353/20099 (36.58%) Loss: 2.455685 LR: 0.00002736 +[11:04:53] Epoch: 1 Batch: 7354/20099 (36.59%) Loss: 2.132526 LR: 0.00002736 +[11:04:54] Epoch: 1 Batch: 7355/20099 (36.59%) Loss: 2.256558 LR: 0.00002736 +[11:04:56] Epoch: 1 Batch: 7356/20099 (36.60%) Loss: 2.008852 LR: 0.00002736 +[11:04:58] Epoch: 1 Batch: 7357/20099 (36.60%) Loss: 2.490409 LR: 0.00002736 +[11:05:00] Epoch: 1 Batch: 7358/20099 (36.61%) Loss: 2.042322 LR: 0.00002736 +[11:05:02] Epoch: 1 Batch: 7359/20099 (36.61%) Loss: 2.441477 LR: 0.00002736 +[11:05:03] Epoch: 1 Batch: 7360/20099 (36.62%) Loss: 2.073596 LR: 0.00002734 +[11:05:05] Epoch: 1 Batch: 7361/20099 (36.62%) Loss: 1.889905 LR: 0.00002734 +[11:05:07] Epoch: 1 Batch: 7362/20099 (36.63%) Loss: 2.150015 LR: 0.00002734 
+[11:05:09] Epoch: 1 Batch: 7363/20099 (36.63%) Loss: 2.060184 LR: 0.00002734 +[11:05:10] Epoch: 1 Batch: 7364/20099 (36.64%) Loss: 2.166988 LR: 0.00002734 +[11:05:12] Epoch: 1 Batch: 7365/20099 (36.64%) Loss: 2.058955 LR: 0.00002734 +[11:05:14] Epoch: 1 Batch: 7366/20099 (36.65%) Loss: 2.253374 LR: 0.00002734 +[11:05:16] Epoch: 1 Batch: 7367/20099 (36.65%) Loss: 2.017129 LR: 0.00002733 +[11:05:18] Epoch: 1 Batch: 7368/20099 (36.66%) Loss: 2.147307 LR: 0.00002733 +[11:05:19] Epoch: 1 Batch: 7369/20099 (36.66%) Loss: 2.314949 LR: 0.00002733 +[11:05:21] Epoch: 1 Batch: 7370/20099 (36.67%) Loss: 1.933322 LR: 0.00002733 +[11:05:23] Epoch: 1 Batch: 7371/20099 (36.67%) Loss: 2.037072 LR: 0.00002733 +[11:05:25] Epoch: 1 Batch: 7372/20099 (36.68%) Loss: 2.161930 LR: 0.00002733 +[11:05:26] Epoch: 1 Batch: 7373/20099 (36.68%) Loss: 1.884749 LR: 0.00002733 +[11:05:28] Epoch: 1 Batch: 7374/20099 (36.69%) Loss: 2.020182 LR: 0.00002732 +[11:05:30] Epoch: 1 Batch: 7375/20099 (36.69%) Loss: 2.079051 LR: 0.00002732 +[11:05:32] Epoch: 1 Batch: 7376/20099 (36.70%) Loss: 2.425427 LR: 0.00002732 +[11:05:33] Epoch: 1 Batch: 7377/20099 (36.70%) Loss: 2.373074 LR: 0.00002732 +[11:05:35] Epoch: 1 Batch: 7378/20099 (36.71%) Loss: 2.169395 LR: 0.00002732 +[11:05:37] Epoch: 1 Batch: 7379/20099 (36.71%) Loss: 1.989823 LR: 0.00002732 +[11:05:39] Epoch: 1 Batch: 7380/20099 (36.72%) Loss: 2.305950 LR: 0.00002732 +[11:05:41] Epoch: 1 Batch: 7381/20099 (36.72%) Loss: 2.165323 LR: 0.00002731 +[11:05:42] Epoch: 1 Batch: 7382/20099 (36.73%) Loss: 1.983045 LR: 0.00002731 +[11:05:44] Epoch: 1 Batch: 7383/20099 (36.73%) Loss: 2.156368 LR: 0.00002731 +[11:05:46] Epoch: 1 Batch: 7384/20099 (36.74%) Loss: 1.932139 LR: 0.00002731 +[11:05:48] Epoch: 1 Batch: 7385/20099 (36.74%) Loss: 2.303589 LR: 0.00002731 +[11:05:49] Epoch: 1 Batch: 7386/20099 (36.75%) Loss: 2.435396 LR: 0.00002731 +[11:05:51] Epoch: 1 Batch: 7387/20099 (36.75%) Loss: 2.065573 LR: 0.00002731 +[11:05:53] Epoch: 1 Batch: 7388/20099 (36.76%) 
Loss: 2.034019 LR: 0.00002730 +[11:05:55] Epoch: 1 Batch: 7389/20099 (36.76%) Loss: 1.919823 LR: 0.00002730 +[11:05:56] Epoch: 1 Batch: 7390/20099 (36.77%) Loss: 1.870006 LR: 0.00002730 +[11:05:58] Epoch: 1 Batch: 7391/20099 (36.77%) Loss: 1.962441 LR: 0.00002730 +[11:06:00] Epoch: 1 Batch: 7392/20099 (36.78%) Loss: 1.956198 LR: 0.00002730 +[11:06:02] Epoch: 1 Batch: 7393/20099 (36.78%) Loss: 2.429133 LR: 0.00002730 +[11:06:04] Epoch: 1 Batch: 7394/20099 (36.79%) Loss: 1.901583 LR: 0.00002730 +[11:06:05] Epoch: 1 Batch: 7395/20099 (36.79%) Loss: 2.145886 LR: 0.00002729 +[11:06:07] Epoch: 1 Batch: 7396/20099 (36.80%) Loss: 2.215379 LR: 0.00002729 +[11:06:09] Epoch: 1 Batch: 7397/20099 (36.80%) Loss: 1.915058 LR: 0.00002729 +[11:06:11] Epoch: 1 Batch: 7398/20099 (36.81%) Loss: 2.024350 LR: 0.00002729 +[11:06:12] Epoch: 1 Batch: 7399/20099 (36.81%) Loss: 2.156263 LR: 0.00002729 +[11:06:17] >> Cleaned up old temp checkpoint: epoch1_step5400 +[11:06:18] >> Temp checkpoint saved: epoch1_step7400, size: 0.1693 GB +[11:06:18] Epoch: 1 Batch: 7400/20099 (36.82%) Loss: 2.016092 LR: 0.00002729 +[11:06:19] Epoch: 1 Batch: 7401/20099 (36.82%) Loss: 2.337456 LR: 0.00002729 +[11:06:21] Epoch: 1 Batch: 7402/20099 (36.83%) Loss: 2.149627 LR: 0.00002728 +[11:06:23] Epoch: 1 Batch: 7403/20099 (36.83%) Loss: 2.359048 LR: 0.00002728 +[11:06:25] Epoch: 1 Batch: 7404/20099 (36.84%) Loss: 2.004697 LR: 0.00002728 +[11:06:26] Epoch: 1 Batch: 7405/20099 (36.84%) Loss: 2.060101 LR: 0.00002728 +[11:06:28] Epoch: 1 Batch: 7406/20099 (36.85%) Loss: 1.944252 LR: 0.00002728 +[11:06:30] Epoch: 1 Batch: 7407/20099 (36.85%) Loss: 2.026157 LR: 0.00002728 +[11:06:32] Epoch: 1 Batch: 7408/20099 (36.86%) Loss: 2.122943 LR: 0.00002728 +[11:06:33] Epoch: 1 Batch: 7409/20099 (36.86%) Loss: 2.046132 LR: 0.00002727 +[11:06:35] Epoch: 1 Batch: 7410/20099 (36.87%) Loss: 1.845685 LR: 0.00002727 +[11:06:37] Epoch: 1 Batch: 7411/20099 (36.87%) Loss: 1.954907 LR: 0.00002727 +[11:06:39] Epoch: 1 Batch: 7412/20099 
(36.88%) Loss: 2.159217 LR: 0.00002727 +[11:06:41] Epoch: 1 Batch: 7413/20099 (36.88%) Loss: 2.380630 LR: 0.00002727 +[11:06:42] Epoch: 1 Batch: 7414/20099 (36.89%) Loss: 2.428357 LR: 0.00002727 +[11:06:44] Epoch: 1 Batch: 7415/20099 (36.89%) Loss: 1.951288 LR: 0.00002727 +[11:06:46] Epoch: 1 Batch: 7416/20099 (36.90%) Loss: 2.245491 LR: 0.00002726 +[11:06:48] Epoch: 1 Batch: 7417/20099 (36.90%) Loss: 2.348791 LR: 0.00002726 +[11:06:50] Epoch: 1 Batch: 7418/20099 (36.91%) Loss: 2.536381 LR: 0.00002726 +[11:06:51] Epoch: 1 Batch: 7419/20099 (36.91%) Loss: 2.118656 LR: 0.00002726 +[11:06:53] Epoch: 1 Batch: 7420/20099 (36.92%) Loss: 2.311234 LR: 0.00002726 +[11:06:55] Epoch: 1 Batch: 7421/20099 (36.92%) Loss: 1.894537 LR: 0.00002726 +[11:06:57] Epoch: 1 Batch: 7422/20099 (36.93%) Loss: 2.101614 LR: 0.00002726 +[11:06:59] Epoch: 1 Batch: 7423/20099 (36.93%) Loss: 2.204845 LR: 0.00002725 +[11:07:00] Epoch: 1 Batch: 7424/20099 (36.94%) Loss: 1.968376 LR: 0.00002725 +[11:07:02] Epoch: 1 Batch: 7425/20099 (36.94%) Loss: 1.827072 LR: 0.00002725 +[11:07:04] Epoch: 1 Batch: 7426/20099 (36.95%) Loss: 2.372280 LR: 0.00002725 +[11:07:06] Epoch: 1 Batch: 7427/20099 (36.95%) Loss: 2.292456 LR: 0.00002725 +[11:07:07] Epoch: 1 Batch: 7428/20099 (36.96%) Loss: 2.074204 LR: 0.00002725 +[11:07:09] Epoch: 1 Batch: 7429/20099 (36.96%) Loss: 2.052638 LR: 0.00002725 +[11:07:11] Epoch: 1 Batch: 7430/20099 (36.97%) Loss: 2.054391 LR: 0.00002724 +[11:07:13] Epoch: 1 Batch: 7431/20099 (36.97%) Loss: 1.967737 LR: 0.00002724 +[11:07:15] Epoch: 1 Batch: 7432/20099 (36.98%) Loss: 1.993551 LR: 0.00002724 +[11:07:16] Epoch: 1 Batch: 7433/20099 (36.98%) Loss: 1.916423 LR: 0.00002724 +[11:07:18] Epoch: 1 Batch: 7434/20099 (36.99%) Loss: 2.107457 LR: 0.00002724 +[11:07:20] Epoch: 1 Batch: 7435/20099 (36.99%) Loss: 1.944660 LR: 0.00002724 +[11:07:22] Epoch: 1 Batch: 7436/20099 (37.00%) Loss: 1.902421 LR: 0.00002724 +[11:07:23] Epoch: 1 Batch: 7437/20099 (37.00%) Loss: 1.856221 LR: 0.00002723 
+[11:07:25] Epoch: 1 Batch: 7438/20099 (37.01%) Loss: 2.150042 LR: 0.00002723 +[11:07:27] Epoch: 1 Batch: 7439/20099 (37.01%) Loss: 2.070509 LR: 0.00002723 +[11:07:29] Epoch: 1 Batch: 7440/20099 (37.02%) Loss: 1.954005 LR: 0.00002723 +[11:07:30] Epoch: 1 Batch: 7441/20099 (37.02%) Loss: 2.303750 LR: 0.00002723 +[11:07:32] Epoch: 1 Batch: 7442/20099 (37.03%) Loss: 2.315698 LR: 0.00002723 +[11:07:34] Epoch: 1 Batch: 7443/20099 (37.03%) Loss: 1.845899 LR: 0.00002723 +[11:07:36] Epoch: 1 Batch: 7444/20099 (37.04%) Loss: 2.262138 LR: 0.00002722 +[11:07:37] Epoch: 1 Batch: 7445/20099 (37.04%) Loss: 2.173619 LR: 0.00002722 +[11:07:39] Epoch: 1 Batch: 7446/20099 (37.05%) Loss: 1.970432 LR: 0.00002722 +[11:07:41] Epoch: 1 Batch: 7447/20099 (37.05%) Loss: 1.899911 LR: 0.00002722 +[11:07:43] Epoch: 1 Batch: 7448/20099 (37.06%) Loss: 2.042604 LR: 0.00002722 +[11:07:44] Epoch: 1 Batch: 7449/20099 (37.06%) Loss: 1.956385 LR: 0.00002722 +[11:07:46] Epoch: 1 Batch: 7450/20099 (37.07%) Loss: 2.318150 LR: 0.00002722 +[11:07:48] Epoch: 1 Batch: 7451/20099 (37.07%) Loss: 2.254893 LR: 0.00002721 +[11:07:50] Epoch: 1 Batch: 7452/20099 (37.08%) Loss: 2.257835 LR: 0.00002721 +[11:07:52] Epoch: 1 Batch: 7453/20099 (37.08%) Loss: 2.116762 LR: 0.00002721 +[11:07:53] Epoch: 1 Batch: 7454/20099 (37.09%) Loss: 2.193834 LR: 0.00002721 +[11:07:55] Epoch: 1 Batch: 7455/20099 (37.09%) Loss: 1.999442 LR: 0.00002721 +[11:07:57] Epoch: 1 Batch: 7456/20099 (37.10%) Loss: 2.379193 LR: 0.00002721 +[11:07:59] Epoch: 1 Batch: 7457/20099 (37.10%) Loss: 1.932035 LR: 0.00002721 +[11:08:00] Epoch: 1 Batch: 7458/20099 (37.11%) Loss: 2.186571 LR: 0.00002720 +[11:08:02] Epoch: 1 Batch: 7459/20099 (37.11%) Loss: 2.167314 LR: 0.00002720 +[11:08:04] Epoch: 1 Batch: 7460/20099 (37.12%) Loss: 2.212732 LR: 0.00002720 +[11:08:06] Epoch: 1 Batch: 7461/20099 (37.12%) Loss: 2.367903 LR: 0.00002720 +[11:08:08] Epoch: 1 Batch: 7462/20099 (37.13%) Loss: 1.958687 LR: 0.00002720 +[11:08:09] Epoch: 1 Batch: 7463/20099 (37.13%) 
Loss: 1.872130 LR: 0.00002720 +[11:08:11] Epoch: 1 Batch: 7464/20099 (37.14%) Loss: 2.601503 LR: 0.00002720 +[11:08:13] Epoch: 1 Batch: 7465/20099 (37.14%) Loss: 2.295963 LR: 0.00002719 +[11:08:15] Epoch: 1 Batch: 7466/20099 (37.15%) Loss: 2.043216 LR: 0.00002719 +[11:08:17] Epoch: 1 Batch: 7467/20099 (37.15%) Loss: 2.102214 LR: 0.00002719 +[11:08:18] Epoch: 1 Batch: 7468/20099 (37.16%) Loss: 2.052921 LR: 0.00002719 +[11:08:20] Epoch: 1 Batch: 7469/20099 (37.16%) Loss: 1.852019 LR: 0.00002719 +[11:08:22] Epoch: 1 Batch: 7470/20099 (37.17%) Loss: 2.250102 LR: 0.00002719 +[11:08:24] Epoch: 1 Batch: 7471/20099 (37.17%) Loss: 1.948229 LR: 0.00002719 +[11:08:25] Epoch: 1 Batch: 7472/20099 (37.18%) Loss: 2.114219 LR: 0.00002718 +[11:08:27] Epoch: 1 Batch: 7473/20099 (37.18%) Loss: 2.275443 LR: 0.00002718 +[11:08:29] Epoch: 1 Batch: 7474/20099 (37.19%) Loss: 1.935437 LR: 0.00002718 +[11:08:31] Epoch: 1 Batch: 7475/20099 (37.19%) Loss: 2.201656 LR: 0.00002718 +[11:08:33] Epoch: 1 Batch: 7476/20099 (37.20%) Loss: 2.326701 LR: 0.00002718 +[11:08:34] Epoch: 1 Batch: 7477/20099 (37.20%) Loss: 2.116198 LR: 0.00002718 +[11:08:36] Epoch: 1 Batch: 7478/20099 (37.21%) Loss: 2.382480 LR: 0.00002718 +[11:08:38] Epoch: 1 Batch: 7479/20099 (37.21%) Loss: 2.061540 LR: 0.00002717 +[11:08:40] Epoch: 1 Batch: 7480/20099 (37.22%) Loss: 2.088738 LR: 0.00002717 +[11:08:41] Epoch: 1 Batch: 7481/20099 (37.22%) Loss: 1.951484 LR: 0.00002717 +[11:08:43] Epoch: 1 Batch: 7482/20099 (37.23%) Loss: 2.173810 LR: 0.00002717 +[11:08:45] Epoch: 1 Batch: 7483/20099 (37.23%) Loss: 1.944673 LR: 0.00002717 +[11:08:47] Epoch: 1 Batch: 7484/20099 (37.24%) Loss: 2.021167 LR: 0.00002717 +[11:08:48] Epoch: 1 Batch: 7485/20099 (37.24%) Loss: 2.296095 LR: 0.00002717 +[11:08:50] Epoch: 1 Batch: 7486/20099 (37.25%) Loss: 2.220359 LR: 0.00002716 +[11:08:52] Epoch: 1 Batch: 7487/20099 (37.25%) Loss: 2.195253 LR: 0.00002716 +[11:08:54] Epoch: 1 Batch: 7488/20099 (37.26%) Loss: 2.040570 LR: 0.00002716 +[11:08:56] Epoch: 
1 Batch: 7489/20099 (37.26%) Loss: 2.013252 LR: 0.00002716 +[11:08:57] Epoch: 1 Batch: 7490/20099 (37.27%) Loss: 2.316216 LR: 0.00002716 +[11:08:59] Epoch: 1 Batch: 7491/20099 (37.27%) Loss: 2.182035 LR: 0.00002716 +[11:09:01] Epoch: 1 Batch: 7492/20099 (37.28%) Loss: 1.897231 LR: 0.00002716 +[11:09:03] Epoch: 1 Batch: 7493/20099 (37.28%) Loss: 2.160485 LR: 0.00002715 +[11:09:04] Epoch: 1 Batch: 7494/20099 (37.29%) Loss: 2.211513 LR: 0.00002715 +[11:09:06] Epoch: 1 Batch: 7495/20099 (37.29%) Loss: 2.043427 LR: 0.00002715 +[11:09:08] Epoch: 1 Batch: 7496/20099 (37.30%) Loss: 2.153075 LR: 0.00002715 +[11:09:10] Epoch: 1 Batch: 7497/20099 (37.30%) Loss: 2.147776 LR: 0.00002715 +[11:09:11] Epoch: 1 Batch: 7498/20099 (37.31%) Loss: 2.262820 LR: 0.00002715 +[11:09:13] Epoch: 1 Batch: 7499/20099 (37.31%) Loss: 2.286182 LR: 0.00002715 +[11:09:15] >> Evaluating batch 0 +[11:09:16] >> Evaluating batch 1 +[11:09:17] >> Evaluating batch 2 +[11:09:18] >> Evaluating batch 3 +[11:09:19] >> Evaluating batch 4 +[11:09:20] >> Evaluating batch 5 +[11:09:21] >> Evaluating batch 6 +[11:09:22] >> Evaluating batch 7 +[11:09:23] >> Evaluating batch 8 +[11:09:24] >> Evaluating batch 9 +[11:09:25] >> Evaluating batch 10 +[11:09:26] >> Evaluating batch 11 +[11:09:27] >> Evaluating batch 12 +[11:09:28] >> Evaluating batch 13 +[11:09:29] >> Evaluating batch 14 +[11:09:30] >> Evaluating batch 15 +[11:09:31] >> Evaluating batch 16 +[11:09:31] Epoch: 1 Step: 7500/20099 Evaluation: +[11:09:31] Avg Loss Since Last Eval: 2.1102 Val Loss: 2.1859 Validation loss delta: 0.0017 Perplexity: 8.8985 LR: 0.00002714 +[11:09:35] >> Checkpoint saved: epoch1_step7500, size: 0.1693 GB +[11:09:35] Epoch: 1 Batch: 7500/20099 (37.32%) Loss: 2.090431 LR: 0.00002714 +[11:09:37] Epoch: 1 Batch: 7501/20099 (37.32%) Loss: 1.955598 LR: 0.00002714 +[11:09:39] Epoch: 1 Batch: 7502/20099 (37.33%) Loss: 2.319316 LR: 0.00002714 +[11:09:40] Epoch: 1 Batch: 7503/20099 (37.33%) Loss: 2.227690 LR: 0.00002714 +[11:09:42] Epoch: 
1 Batch: 7504/20099 (37.34%) Loss: 2.190015 LR: 0.00002714 +[11:09:44] Epoch: 1 Batch: 7505/20099 (37.34%) Loss: 2.040753 LR: 0.00002714 +[11:09:45] Epoch: 1 Batch: 7506/20099 (37.35%) Loss: 2.369993 LR: 0.00002714 +[11:09:47] Epoch: 1 Batch: 7507/20099 (37.35%) Loss: 1.926050 LR: 0.00002713 +[11:09:49] Epoch: 1 Batch: 7508/20099 (37.36%) Loss: 2.117378 LR: 0.00002713 +[11:09:51] Epoch: 1 Batch: 7509/20099 (37.36%) Loss: 2.227285 LR: 0.00002713 +[11:09:53] Epoch: 1 Batch: 7510/20099 (37.37%) Loss: 2.096868 LR: 0.00002713 +[11:09:54] Epoch: 1 Batch: 7511/20099 (37.37%) Loss: 2.033343 LR: 0.00002713 +[11:09:56] Epoch: 1 Batch: 7512/20099 (37.37%) Loss: 2.199976 LR: 0.00002713 +[11:09:58] Epoch: 1 Batch: 7513/20099 (37.38%) Loss: 2.188995 LR: 0.00002713 +[11:10:00] Epoch: 1 Batch: 7514/20099 (37.38%) Loss: 2.311489 LR: 0.00002712 +[11:10:02] Epoch: 1 Batch: 7515/20099 (37.39%) Loss: 2.041385 LR: 0.00002712 +[11:10:03] Epoch: 1 Batch: 7516/20099 (37.39%) Loss: 2.275685 LR: 0.00002712 +[11:10:05] Epoch: 1 Batch: 7517/20099 (37.40%) Loss: 2.287780 LR: 0.00002712 +[11:10:07] Epoch: 1 Batch: 7518/20099 (37.40%) Loss: 2.242068 LR: 0.00002712 +[11:10:09] Epoch: 1 Batch: 7519/20099 (37.41%) Loss: 2.205517 LR: 0.00002712 +[11:10:11] Epoch: 1 Batch: 7520/20099 (37.41%) Loss: 2.073525 LR: 0.00002712 +[11:10:12] Epoch: 1 Batch: 7521/20099 (37.42%) Loss: 2.185176 LR: 0.00002711 +[11:10:14] Epoch: 1 Batch: 7522/20099 (37.42%) Loss: 2.105930 LR: 0.00002711 +[11:10:16] Epoch: 1 Batch: 7523/20099 (37.43%) Loss: 1.708417 LR: 0.00002711 +[11:10:18] Epoch: 1 Batch: 7524/20099 (37.43%) Loss: 2.339002 LR: 0.00002711 +[11:10:19] Epoch: 1 Batch: 7525/20099 (37.44%) Loss: 2.114140 LR: 0.00002711 +[11:10:21] Epoch: 1 Batch: 7526/20099 (37.44%) Loss: 2.012085 LR: 0.00002711 +[11:10:23] Epoch: 1 Batch: 7527/20099 (37.45%) Loss: 2.172597 LR: 0.00002711 +[11:10:25] Epoch: 1 Batch: 7528/20099 (37.45%) Loss: 1.857878 LR: 0.00002710 +[11:10:27] Epoch: 1 Batch: 7529/20099 (37.46%) Loss: 1.825968 LR: 
0.00002710 +[11:10:28] Epoch: 1 Batch: 7530/20099 (37.46%) Loss: 2.057885 LR: 0.00002710 +[11:10:30] Epoch: 1 Batch: 7531/20099 (37.47%) Loss: 2.106778 LR: 0.00002710 +[11:10:32] Epoch: 1 Batch: 7532/20099 (37.47%) Loss: 2.127872 LR: 0.00002710 +[11:10:34] Epoch: 1 Batch: 7533/20099 (37.48%) Loss: 2.088772 LR: 0.00002710 +[11:10:35] Epoch: 1 Batch: 7534/20099 (37.48%) Loss: 2.615045 LR: 0.00002710 +[11:10:37] Epoch: 1 Batch: 7535/20099 (37.49%) Loss: 2.219585 LR: 0.00002708 +[11:10:39] Epoch: 1 Batch: 7536/20099 (37.49%) Loss: 2.349776 LR: 0.00002708 +[11:10:41] Epoch: 1 Batch: 7537/20099 (37.50%) Loss: 2.065005 LR: 0.00002708 +[11:10:43] Epoch: 1 Batch: 7538/20099 (37.50%) Loss: 1.936078 LR: 0.00002708 +[11:10:44] Epoch: 1 Batch: 7539/20099 (37.51%) Loss: 2.255954 LR: 0.00002708 +[11:10:46] Epoch: 1 Batch: 7540/20099 (37.51%) Loss: 2.014580 LR: 0.00002708 +[11:10:48] Epoch: 1 Batch: 7541/20099 (37.52%) Loss: 2.095209 LR: 0.00002708 +[11:10:50] Epoch: 1 Batch: 7542/20099 (37.52%) Loss: 2.124031 LR: 0.00002707 +[11:10:51] Epoch: 1 Batch: 7543/20099 (37.53%) Loss: 1.980767 LR: 0.00002707 +[11:10:53] Epoch: 1 Batch: 7544/20099 (37.53%) Loss: 2.080734 LR: 0.00002707 +[11:10:55] Epoch: 1 Batch: 7545/20099 (37.54%) Loss: 2.015906 LR: 0.00002707 +[11:10:57] Epoch: 1 Batch: 7546/20099 (37.54%) Loss: 2.184129 LR: 0.00002707 +[11:10:58] Epoch: 1 Batch: 7547/20099 (37.55%) Loss: 2.085321 LR: 0.00002707 +[11:11:00] Epoch: 1 Batch: 7548/20099 (37.55%) Loss: 2.022751 LR: 0.00002707 +[11:11:02] Epoch: 1 Batch: 7549/20099 (37.56%) Loss: 2.092518 LR: 0.00002706 +[11:11:04] Epoch: 1 Batch: 7550/20099 (37.56%) Loss: 2.152174 LR: 0.00002706 +[11:11:05] Epoch: 1 Batch: 7551/20099 (37.57%) Loss: 2.195509 LR: 0.00002706 +[11:11:07] Epoch: 1 Batch: 7552/20099 (37.57%) Loss: 2.014400 LR: 0.00002706 +[11:11:09] Epoch: 1 Batch: 7553/20099 (37.58%) Loss: 2.247091 LR: 0.00002706 +[11:11:11] Epoch: 1 Batch: 7554/20099 (37.58%) Loss: 2.333678 LR: 0.00002706 +[11:11:12] Epoch: 1 Batch: 7555/20099 
(37.59%) Loss: 2.180648 LR: 0.00002706 +[11:11:14] Epoch: 1 Batch: 7556/20099 (37.59%) Loss: 2.145382 LR: 0.00002705 +[11:11:16] Epoch: 1 Batch: 7557/20099 (37.60%) Loss: 2.159431 LR: 0.00002705 +[11:11:18] Epoch: 1 Batch: 7558/20099 (37.60%) Loss: 1.994300 LR: 0.00002705 +[11:11:20] Epoch: 1 Batch: 7559/20099 (37.61%) Loss: 1.882058 LR: 0.00002705 +[11:11:21] Epoch: 1 Batch: 7560/20099 (37.61%) Loss: 2.333821 LR: 0.00002705 +[11:11:23] Epoch: 1 Batch: 7561/20099 (37.62%) Loss: 2.053298 LR: 0.00002705 +[11:11:25] Epoch: 1 Batch: 7562/20099 (37.62%) Loss: 2.077059 LR: 0.00002705 +[11:11:27] Epoch: 1 Batch: 7563/20099 (37.63%) Loss: 2.191625 LR: 0.00002704 +[11:11:28] Epoch: 1 Batch: 7564/20099 (37.63%) Loss: 2.071985 LR: 0.00002704 +[11:11:30] Epoch: 1 Batch: 7565/20099 (37.64%) Loss: 2.012497 LR: 0.00002704 +[11:11:32] Epoch: 1 Batch: 7566/20099 (37.64%) Loss: 2.005894 LR: 0.00002704 +[11:11:34] Epoch: 1 Batch: 7567/20099 (37.65%) Loss: 1.900437 LR: 0.00002704 +[11:11:36] Epoch: 1 Batch: 7568/20099 (37.65%) Loss: 2.063943 LR: 0.00002704 +[11:11:37] Epoch: 1 Batch: 7569/20099 (37.66%) Loss: 2.102217 LR: 0.00002704 +[11:11:39] Epoch: 1 Batch: 7570/20099 (37.66%) Loss: 2.223043 LR: 0.00002703 +[11:11:41] Epoch: 1 Batch: 7571/20099 (37.67%) Loss: 2.417304 LR: 0.00002703 +[11:11:43] Epoch: 1 Batch: 7572/20099 (37.67%) Loss: 2.447787 LR: 0.00002703 +[11:11:45] Epoch: 1 Batch: 7573/20099 (37.68%) Loss: 2.257261 LR: 0.00002703 +[11:11:46] Epoch: 1 Batch: 7574/20099 (37.68%) Loss: 2.252428 LR: 0.00002703 +[11:11:48] Epoch: 1 Batch: 7575/20099 (37.69%) Loss: 2.164909 LR: 0.00002703 +[11:11:50] Epoch: 1 Batch: 7576/20099 (37.69%) Loss: 2.015445 LR: 0.00002703 +[11:11:52] Epoch: 1 Batch: 7577/20099 (37.70%) Loss: 2.096794 LR: 0.00002702 +[11:11:53] Epoch: 1 Batch: 7578/20099 (37.70%) Loss: 2.184376 LR: 0.00002702 +[11:11:55] Epoch: 1 Batch: 7579/20099 (37.71%) Loss: 2.165411 LR: 0.00002702 +[11:11:57] Epoch: 1 Batch: 7580/20099 (37.71%) Loss: 2.326453 LR: 0.00002702 
+[11:11:59] Epoch: 1 Batch: 7581/20099 (37.72%) Loss: 2.004523 LR: 0.00002702 +[11:12:01] Epoch: 1 Batch: 7582/20099 (37.72%) Loss: 2.204636 LR: 0.00002702 +[11:12:02] Epoch: 1 Batch: 7583/20099 (37.73%) Loss: 2.211407 LR: 0.00002702 +[11:12:04] Epoch: 1 Batch: 7584/20099 (37.73%) Loss: 2.026673 LR: 0.00002701 +[11:12:06] Epoch: 1 Batch: 7585/20099 (37.74%) Loss: 2.117855 LR: 0.00002701 +[11:12:08] Epoch: 1 Batch: 7586/20099 (37.74%) Loss: 2.178208 LR: 0.00002701 +[11:12:09] Epoch: 1 Batch: 7587/20099 (37.75%) Loss: 2.148467 LR: 0.00002701 +[11:12:11] Epoch: 1 Batch: 7588/20099 (37.75%) Loss: 2.113534 LR: 0.00002701 +[11:12:13] Epoch: 1 Batch: 7589/20099 (37.76%) Loss: 2.119609 LR: 0.00002701 +[11:12:15] Epoch: 1 Batch: 7590/20099 (37.76%) Loss: 2.158542 LR: 0.00002701 +[11:12:16] Epoch: 1 Batch: 7591/20099 (37.77%) Loss: 2.497913 LR: 0.00002700 +[11:12:18] Epoch: 1 Batch: 7592/20099 (37.77%) Loss: 2.291085 LR: 0.00002700 +[11:12:20] Epoch: 1 Batch: 7593/20099 (37.78%) Loss: 1.968984 LR: 0.00002700 +[11:12:22] Epoch: 1 Batch: 7594/20099 (37.78%) Loss: 2.307407 LR: 0.00002700 +[11:12:24] Epoch: 1 Batch: 7595/20099 (37.79%) Loss: 2.002225 LR: 0.00002700 +[11:12:25] Epoch: 1 Batch: 7596/20099 (37.79%) Loss: 2.268226 LR: 0.00002700 +[11:12:27] Epoch: 1 Batch: 7597/20099 (37.80%) Loss: 2.031996 LR: 0.00002700 +[11:12:29] Epoch: 1 Batch: 7598/20099 (37.80%) Loss: 2.295527 LR: 0.00002699 +[11:12:31] Epoch: 1 Batch: 7599/20099 (37.81%) Loss: 2.176536 LR: 0.00002699 +[11:12:36] >> Cleaned up old temp checkpoint: epoch1_step5600 +[11:12:36] >> Temp checkpoint saved: epoch1_step7600, size: 0.1693 GB +[11:12:36] Epoch: 1 Batch: 7600/20099 (37.81%) Loss: 2.354165 LR: 0.00002699 +[11:12:38] Epoch: 1 Batch: 7601/20099 (37.82%) Loss: 2.289201 LR: 0.00002699 +[11:12:39] Epoch: 1 Batch: 7602/20099 (37.82%) Loss: 2.250635 LR: 0.00002699 +[11:12:41] Epoch: 1 Batch: 7603/20099 (37.83%) Loss: 2.531839 LR: 0.00002699 +[11:12:43] Epoch: 1 Batch: 7604/20099 (37.83%) Loss: 1.831382 LR: 
0.00002699 +[11:12:45] Epoch: 1 Batch: 7605/20099 (37.84%) Loss: 2.020417 LR: 0.00002698 +[11:12:46] Epoch: 1 Batch: 7606/20099 (37.84%) Loss: 2.043750 LR: 0.00002698 +[11:12:48] Epoch: 1 Batch: 7607/20099 (37.85%) Loss: 2.200810 LR: 0.00002698 +[11:12:50] Epoch: 1 Batch: 7608/20099 (37.85%) Loss: 2.257697 LR: 0.00002698 +[11:12:52] Epoch: 1 Batch: 7609/20099 (37.86%) Loss: 1.891204 LR: 0.00002698 +[11:12:53] Epoch: 1 Batch: 7610/20099 (37.86%) Loss: 2.176152 LR: 0.00002698 +[11:12:55] Epoch: 1 Batch: 7611/20099 (37.87%) Loss: 2.197040 LR: 0.00002698 +[11:12:57] Epoch: 1 Batch: 7612/20099 (37.87%) Loss: 2.117419 LR: 0.00002697 +[11:12:59] Epoch: 1 Batch: 7613/20099 (37.88%) Loss: 2.373375 LR: 0.00002697 +[11:13:00] Epoch: 1 Batch: 7614/20099 (37.88%) Loss: 2.016345 LR: 0.00002697 +[11:13:02] Epoch: 1 Batch: 7615/20099 (37.89%) Loss: 1.747656 LR: 0.00002697 +[11:13:04] Epoch: 1 Batch: 7616/20099 (37.89%) Loss: 2.432808 LR: 0.00002697 +[11:13:06] Epoch: 1 Batch: 7617/20099 (37.90%) Loss: 2.040421 LR: 0.00002697 +[11:13:08] Epoch: 1 Batch: 7618/20099 (37.90%) Loss: 2.177718 LR: 0.00002697 +[11:13:10] Epoch: 1 Batch: 7619/20099 (37.91%) Loss: 2.337139 LR: 0.00002696 +[11:13:11] Epoch: 1 Batch: 7620/20099 (37.91%) Loss: 2.133511 LR: 0.00002696 +[11:13:13] Epoch: 1 Batch: 7621/20099 (37.92%) Loss: 1.979455 LR: 0.00002696 +[11:13:15] Epoch: 1 Batch: 7622/20099 (37.92%) Loss: 2.097342 LR: 0.00002696 +[11:13:17] Epoch: 1 Batch: 7623/20099 (37.93%) Loss: 2.086156 LR: 0.00002696 +[11:13:18] Epoch: 1 Batch: 7624/20099 (37.93%) Loss: 1.882397 LR: 0.00002696 +[11:13:20] Epoch: 1 Batch: 7625/20099 (37.94%) Loss: 2.079837 LR: 0.00002696 +[11:13:22] Epoch: 1 Batch: 7626/20099 (37.94%) Loss: 2.288219 LR: 0.00002695 +[11:13:24] Epoch: 1 Batch: 7627/20099 (37.95%) Loss: 2.173043 LR: 0.00002695 +[11:13:26] Epoch: 1 Batch: 7628/20099 (37.95%) Loss: 1.995374 LR: 0.00002695 +[11:13:27] Epoch: 1 Batch: 7629/20099 (37.96%) Loss: 2.305907 LR: 0.00002695 +[11:13:29] Epoch: 1 Batch: 7630/20099 
(37.96%) Loss: 2.060481 LR: 0.00002695 +[11:13:31] Epoch: 1 Batch: 7631/20099 (37.97%) Loss: 2.371988 LR: 0.00002695 +[11:13:33] Epoch: 1 Batch: 7632/20099 (37.97%) Loss: 1.937420 LR: 0.00002695 +[11:13:34] Epoch: 1 Batch: 7633/20099 (37.98%) Loss: 1.954404 LR: 0.00002693 +[11:13:36] Epoch: 1 Batch: 7634/20099 (37.98%) Loss: 1.744549 LR: 0.00002693 +[11:13:38] Epoch: 1 Batch: 7635/20099 (37.99%) Loss: 2.177539 LR: 0.00002693 +[11:13:40] Epoch: 1 Batch: 7636/20099 (37.99%) Loss: 2.483406 LR: 0.00002693 +[11:13:42] Epoch: 1 Batch: 7637/20099 (38.00%) Loss: 2.232841 LR: 0.00002693 +[11:13:43] Epoch: 1 Batch: 7638/20099 (38.00%) Loss: 2.291785 LR: 0.00002693 +[11:13:45] Epoch: 1 Batch: 7639/20099 (38.01%) Loss: 2.215655 LR: 0.00002693 +[11:13:47] Epoch: 1 Batch: 7640/20099 (38.01%) Loss: 2.175951 LR: 0.00002692 +[11:13:49] Epoch: 1 Batch: 7641/20099 (38.02%) Loss: 1.911804 LR: 0.00002692 +[11:13:50] Epoch: 1 Batch: 7642/20099 (38.02%) Loss: 2.115698 LR: 0.00002692 +[11:13:52] Epoch: 1 Batch: 7643/20099 (38.03%) Loss: 1.978925 LR: 0.00002692 +[11:13:54] Epoch: 1 Batch: 7644/20099 (38.03%) Loss: 1.972691 LR: 0.00002692 +[11:13:56] Epoch: 1 Batch: 7645/20099 (38.04%) Loss: 1.885351 LR: 0.00002692 +[11:13:57] Epoch: 1 Batch: 7646/20099 (38.04%) Loss: 2.350190 LR: 0.00002692 +[11:13:59] Epoch: 1 Batch: 7647/20099 (38.05%) Loss: 1.906459 LR: 0.00002691 +[11:14:01] Epoch: 1 Batch: 7648/20099 (38.05%) Loss: 1.952194 LR: 0.00002691 +[11:14:03] Epoch: 1 Batch: 7649/20099 (38.06%) Loss: 1.868584 LR: 0.00002691 +[11:14:04] Epoch: 1 Batch: 7650/20099 (38.06%) Loss: 2.126197 LR: 0.00002691 +[11:14:06] Epoch: 1 Batch: 7651/20099 (38.07%) Loss: 1.895261 LR: 0.00002691 +[11:14:08] Epoch: 1 Batch: 7652/20099 (38.07%) Loss: 1.957092 LR: 0.00002691 +[11:14:10] Epoch: 1 Batch: 7653/20099 (38.08%) Loss: 2.063932 LR: 0.00002691 +[11:14:12] Epoch: 1 Batch: 7654/20099 (38.08%) Loss: 2.307883 LR: 0.00002690 +[11:14:13] Epoch: 1 Batch: 7655/20099 (38.09%) Loss: 2.189895 LR: 0.00002690 
+[11:14:15] Epoch: 1 Batch: 7656/20099 (38.09%) Loss: 2.029241 LR: 0.00002690 +[11:14:17] Epoch: 1 Batch: 7657/20099 (38.10%) Loss: 2.154032 LR: 0.00002690 +[11:14:19] Epoch: 1 Batch: 7658/20099 (38.10%) Loss: 2.189305 LR: 0.00002690 +[11:14:20] Epoch: 1 Batch: 7659/20099 (38.11%) Loss: 2.065312 LR: 0.00002690 +[11:14:22] Epoch: 1 Batch: 7660/20099 (38.11%) Loss: 1.970588 LR: 0.00002690 +[11:14:24] Epoch: 1 Batch: 7661/20099 (38.12%) Loss: 2.056466 LR: 0.00002689 +[11:14:26] Epoch: 1 Batch: 7662/20099 (38.12%) Loss: 2.047432 LR: 0.00002689 +[11:14:28] Epoch: 1 Batch: 7663/20099 (38.13%) Loss: 2.239151 LR: 0.00002689 +[11:14:29] Epoch: 1 Batch: 7664/20099 (38.13%) Loss: 2.113254 LR: 0.00002689 +[11:14:31] Epoch: 1 Batch: 7665/20099 (38.14%) Loss: 2.092138 LR: 0.00002689 +[11:14:33] Epoch: 1 Batch: 7666/20099 (38.14%) Loss: 2.156407 LR: 0.00002689 +[11:14:35] Epoch: 1 Batch: 7667/20099 (38.15%) Loss: 2.110400 LR: 0.00002689 +[11:14:36] Epoch: 1 Batch: 7668/20099 (38.15%) Loss: 2.103272 LR: 0.00002688 +[11:14:38] Epoch: 1 Batch: 7669/20099 (38.16%) Loss: 2.102816 LR: 0.00002688 +[11:14:40] Epoch: 1 Batch: 7670/20099 (38.16%) Loss: 1.998699 LR: 0.00002688 +[11:14:42] Epoch: 1 Batch: 7671/20099 (38.17%) Loss: 2.225369 LR: 0.00002688 +[11:14:44] Epoch: 1 Batch: 7672/20099 (38.17%) Loss: 2.252201 LR: 0.00002688 +[11:14:45] Epoch: 1 Batch: 7673/20099 (38.18%) Loss: 2.306614 LR: 0.00002688 +[11:14:47] Epoch: 1 Batch: 7674/20099 (38.18%) Loss: 2.108495 LR: 0.00002688 +[11:14:49] Epoch: 1 Batch: 7675/20099 (38.19%) Loss: 2.256114 LR: 0.00002687 +[11:14:51] Epoch: 1 Batch: 7676/20099 (38.19%) Loss: 2.040156 LR: 0.00002687 +[11:14:52] Epoch: 1 Batch: 7677/20099 (38.20%) Loss: 2.105298 LR: 0.00002687 +[11:14:54] Epoch: 1 Batch: 7678/20099 (38.20%) Loss: 2.163300 LR: 0.00002687 +[11:14:56] Epoch: 1 Batch: 7679/20099 (38.21%) Loss: 2.217394 LR: 0.00002687 +[11:14:58] Epoch: 1 Batch: 7680/20099 (38.21%) Loss: 2.301324 LR: 0.00002687 +[11:15:00] Epoch: 1 Batch: 7681/20099 (38.22%) 
Loss: 2.013013 LR: 0.00002687 +[11:15:01] Epoch: 1 Batch: 7682/20099 (38.22%) Loss: 2.197766 LR: 0.00002686 +[11:15:03] Epoch: 1 Batch: 7683/20099 (38.23%) Loss: 2.236234 LR: 0.00002686 +[11:15:05] Epoch: 1 Batch: 7684/20099 (38.23%) Loss: 1.973678 LR: 0.00002686 +[11:15:07] Epoch: 1 Batch: 7685/20099 (38.24%) Loss: 2.045532 LR: 0.00002686 +[11:15:08] Epoch: 1 Batch: 7686/20099 (38.24%) Loss: 2.117994 LR: 0.00002686 +[11:15:10] Epoch: 1 Batch: 7687/20099 (38.25%) Loss: 2.127974 LR: 0.00002686 +[11:15:12] Epoch: 1 Batch: 7688/20099 (38.25%) Loss: 1.829855 LR: 0.00002686 +[11:15:14] Epoch: 1 Batch: 7689/20099 (38.26%) Loss: 2.152765 LR: 0.00002685 +[11:15:15] Epoch: 1 Batch: 7690/20099 (38.26%) Loss: 2.344557 LR: 0.00002685 +[11:15:17] Epoch: 1 Batch: 7691/20099 (38.27%) Loss: 2.016577 LR: 0.00002685 +[11:15:19] Epoch: 1 Batch: 7692/20099 (38.27%) Loss: 2.115147 LR: 0.00002685 +[11:15:21] Epoch: 1 Batch: 7693/20099 (38.28%) Loss: 1.937482 LR: 0.00002685 +[11:15:23] Epoch: 1 Batch: 7694/20099 (38.28%) Loss: 2.255248 LR: 0.00002685 +[11:15:24] Epoch: 1 Batch: 7695/20099 (38.29%) Loss: 2.292229 LR: 0.00002685 +[11:15:26] Epoch: 1 Batch: 7696/20099 (38.29%) Loss: 2.018835 LR: 0.00002684 +[11:15:28] Epoch: 1 Batch: 7697/20099 (38.30%) Loss: 1.983269 LR: 0.00002684 +[11:15:29] Epoch: 1 Batch: 7698/20099 (38.30%) Loss: 2.233195 LR: 0.00002684 +[11:15:31] Epoch: 1 Batch: 7699/20099 (38.31%) Loss: 2.044190 LR: 0.00002684 +[11:15:33] Epoch: 1 Batch: 7700/20099 (38.31%) Loss: 2.261748 LR: 0.00002684 +[11:15:35] Epoch: 1 Batch: 7701/20099 (38.32%) Loss: 2.040953 LR: 0.00002684 +[11:15:37] Epoch: 1 Batch: 7702/20099 (38.32%) Loss: 2.151722 LR: 0.00002684 +[11:15:38] Epoch: 1 Batch: 7703/20099 (38.33%) Loss: 1.923068 LR: 0.00002683 +[11:15:40] Epoch: 1 Batch: 7704/20099 (38.33%) Loss: 2.095229 LR: 0.00002683 +[11:15:42] Epoch: 1 Batch: 7705/20099 (38.34%) Loss: 2.336534 LR: 0.00002683 +[11:15:44] Epoch: 1 Batch: 7706/20099 (38.34%) Loss: 2.340981 LR: 0.00002683 +[11:15:45] Epoch: 
1 Batch: 7707/20099 (38.35%) Loss: 2.182798 LR: 0.00002683 +[11:15:47] Epoch: 1 Batch: 7708/20099 (38.35%) Loss: 2.232115 LR: 0.00002683 +[11:15:49] Epoch: 1 Batch: 7709/20099 (38.36%) Loss: 2.011777 LR: 0.00002683 +[11:15:51] Epoch: 1 Batch: 7710/20099 (38.36%) Loss: 1.989652 LR: 0.00002681 +[11:15:52] Epoch: 1 Batch: 7711/20099 (38.37%) Loss: 2.324297 LR: 0.00002681 +[11:15:54] Epoch: 1 Batch: 7712/20099 (38.37%) Loss: 2.290851 LR: 0.00002681 +[11:15:56] Epoch: 1 Batch: 7713/20099 (38.38%) Loss: 2.030918 LR: 0.00002681 +[11:15:58] Epoch: 1 Batch: 7714/20099 (38.38%) Loss: 2.061622 LR: 0.00002681 +[11:16:00] Epoch: 1 Batch: 7715/20099 (38.38%) Loss: 2.236552 LR: 0.00002681 +[11:16:01] Epoch: 1 Batch: 7716/20099 (38.39%) Loss: 1.739819 LR: 0.00002681 +[11:16:03] Epoch: 1 Batch: 7717/20099 (38.39%) Loss: 2.128646 LR: 0.00002680 +[11:16:05] Epoch: 1 Batch: 7718/20099 (38.40%) Loss: 2.327111 LR: 0.00002680 +[11:16:07] Epoch: 1 Batch: 7719/20099 (38.40%) Loss: 2.171966 LR: 0.00002680 +[11:16:08] Epoch: 1 Batch: 7720/20099 (38.41%) Loss: 2.210466 LR: 0.00002680 +[11:16:10] Epoch: 1 Batch: 7721/20099 (38.41%) Loss: 1.985090 LR: 0.00002680 +[11:16:12] Epoch: 1 Batch: 7722/20099 (38.42%) Loss: 2.067893 LR: 0.00002680 +[11:16:14] Epoch: 1 Batch: 7723/20099 (38.42%) Loss: 2.017352 LR: 0.00002680 +[11:16:16] Epoch: 1 Batch: 7724/20099 (38.43%) Loss: 2.128472 LR: 0.00002679 +[11:16:17] Epoch: 1 Batch: 7725/20099 (38.43%) Loss: 2.106545 LR: 0.00002679 +[11:16:19] Epoch: 1 Batch: 7726/20099 (38.44%) Loss: 2.108230 LR: 0.00002679 +[11:16:21] Epoch: 1 Batch: 7727/20099 (38.44%) Loss: 2.128987 LR: 0.00002679 +[11:16:23] Epoch: 1 Batch: 7728/20099 (38.45%) Loss: 2.421197 LR: 0.00002679 +[11:16:24] Epoch: 1 Batch: 7729/20099 (38.45%) Loss: 1.945420 LR: 0.00002679 +[11:16:26] Epoch: 1 Batch: 7730/20099 (38.46%) Loss: 1.992838 LR: 0.00002679 +[11:16:28] Epoch: 1 Batch: 7731/20099 (38.46%) Loss: 2.107953 LR: 0.00002678 +[11:16:30] Epoch: 1 Batch: 7732/20099 (38.47%) Loss: 2.341775 LR: 
0.00002678 +[11:16:31] Epoch: 1 Batch: 7733/20099 (38.47%) Loss: 1.967906 LR: 0.00002678 +[11:16:33] Epoch: 1 Batch: 7734/20099 (38.48%) Loss: 1.797901 LR: 0.00002678 +[11:16:35] Epoch: 1 Batch: 7735/20099 (38.48%) Loss: 2.176594 LR: 0.00002678 +[11:16:37] Epoch: 1 Batch: 7736/20099 (38.49%) Loss: 2.273418 LR: 0.00002678 +[11:16:39] Epoch: 1 Batch: 7737/20099 (38.49%) Loss: 2.470681 LR: 0.00002678 +[11:16:40] Epoch: 1 Batch: 7738/20099 (38.50%) Loss: 1.914682 LR: 0.00002677 +[11:16:42] Epoch: 1 Batch: 7739/20099 (38.50%) Loss: 2.044192 LR: 0.00002677 +[11:16:44] Epoch: 1 Batch: 7740/20099 (38.51%) Loss: 2.113921 LR: 0.00002677 +[11:16:46] Epoch: 1 Batch: 7741/20099 (38.51%) Loss: 1.914415 LR: 0.00002677 +[11:16:47] Epoch: 1 Batch: 7742/20099 (38.52%) Loss: 1.695788 LR: 0.00002677 +[11:16:49] Epoch: 1 Batch: 7743/20099 (38.52%) Loss: 2.205287 LR: 0.00002677 +[11:16:51] Epoch: 1 Batch: 7744/20099 (38.53%) Loss: 2.122564 LR: 0.00002677 +[11:16:53] Epoch: 1 Batch: 7745/20099 (38.53%) Loss: 2.351593 LR: 0.00002676 +[11:16:55] Epoch: 1 Batch: 7746/20099 (38.54%) Loss: 2.250907 LR: 0.00002676 +[11:16:56] Epoch: 1 Batch: 7747/20099 (38.54%) Loss: 2.305131 LR: 0.00002676 +[11:16:58] Epoch: 1 Batch: 7748/20099 (38.55%) Loss: 1.760430 LR: 0.00002676 +[11:17:00] Epoch: 1 Batch: 7749/20099 (38.55%) Loss: 2.165574 LR: 0.00002676 +[11:17:02] Epoch: 1 Batch: 7750/20099 (38.56%) Loss: 2.042888 LR: 0.00002676 +[11:17:03] Epoch: 1 Batch: 7751/20099 (38.56%) Loss: 1.792091 LR: 0.00002676 +[11:17:05] Epoch: 1 Batch: 7752/20099 (38.57%) Loss: 2.066135 LR: 0.00002675 +[11:17:07] Epoch: 1 Batch: 7753/20099 (38.57%) Loss: 2.257123 LR: 0.00002675 +[11:17:09] Epoch: 1 Batch: 7754/20099 (38.58%) Loss: 2.224986 LR: 0.00002675 +[11:17:11] Epoch: 1 Batch: 7755/20099 (38.58%) Loss: 2.054968 LR: 0.00002675 +[11:17:12] Epoch: 1 Batch: 7756/20099 (38.59%) Loss: 2.109709 LR: 0.00002675 +[11:17:14] Epoch: 1 Batch: 7757/20099 (38.59%) Loss: 2.237526 LR: 0.00002675 +[11:17:16] Epoch: 1 Batch: 7758/20099 
(38.60%) Loss: 2.232387 LR: 0.00002675 +[11:17:18] Epoch: 1 Batch: 7759/20099 (38.60%) Loss: 2.118845 LR: 0.00002674 +[11:17:19] Epoch: 1 Batch: 7760/20099 (38.61%) Loss: 1.826317 LR: 0.00002674 +[11:17:21] Epoch: 1 Batch: 7761/20099 (38.61%) Loss: 2.525434 LR: 0.00002674 +[11:17:23] Epoch: 1 Batch: 7762/20099 (38.62%) Loss: 1.906866 LR: 0.00002674 +[11:17:25] Epoch: 1 Batch: 7763/20099 (38.62%) Loss: 2.129358 LR: 0.00002674 +[11:17:27] Epoch: 1 Batch: 7764/20099 (38.63%) Loss: 1.898523 LR: 0.00002674 +[11:17:28] Epoch: 1 Batch: 7765/20099 (38.63%) Loss: 2.124898 LR: 0.00002674 +[11:17:30] Epoch: 1 Batch: 7766/20099 (38.64%) Loss: 2.183623 LR: 0.00002673 +[11:17:32] Epoch: 1 Batch: 7767/20099 (38.64%) Loss: 1.816592 LR: 0.00002673 +[11:17:34] Epoch: 1 Batch: 7768/20099 (38.65%) Loss: 2.031799 LR: 0.00002673 +[11:17:36] Epoch: 1 Batch: 7769/20099 (38.65%) Loss: 2.115023 LR: 0.00002673 +[11:17:37] Epoch: 1 Batch: 7770/20099 (38.66%) Loss: 2.185224 LR: 0.00002673 +[11:17:39] Epoch: 1 Batch: 7771/20099 (38.66%) Loss: 2.177919 LR: 0.00002673 +[11:17:41] Epoch: 1 Batch: 7772/20099 (38.67%) Loss: 2.140144 LR: 0.00002673 +[11:17:43] Epoch: 1 Batch: 7773/20099 (38.67%) Loss: 2.077932 LR: 0.00002671 +[11:17:44] Epoch: 1 Batch: 7774/20099 (38.68%) Loss: 2.299306 LR: 0.00002671 +[11:17:46] Epoch: 1 Batch: 7775/20099 (38.68%) Loss: 1.926958 LR: 0.00002671 +[11:17:48] Epoch: 1 Batch: 7776/20099 (38.69%) Loss: 2.182030 LR: 0.00002671 +[11:17:50] Epoch: 1 Batch: 7777/20099 (38.69%) Loss: 2.339509 LR: 0.00002671 +[11:17:52] Epoch: 1 Batch: 7778/20099 (38.70%) Loss: 2.231410 LR: 0.00002671 +[11:17:53] Epoch: 1 Batch: 7779/20099 (38.70%) Loss: 2.143681 LR: 0.00002671 +[11:17:55] Epoch: 1 Batch: 7780/20099 (38.71%) Loss: 2.115634 LR: 0.00002670 +[11:17:57] Epoch: 1 Batch: 7781/20099 (38.71%) Loss: 2.243329 LR: 0.00002670 +[11:17:59] Epoch: 1 Batch: 7782/20099 (38.72%) Loss: 2.130012 LR: 0.00002670 +[11:18:00] Epoch: 1 Batch: 7783/20099 (38.72%) Loss: 2.231580 LR: 0.00002670 
+[11:18:02] Epoch: 1 Batch: 7784/20099 (38.73%) Loss: 2.209065 LR: 0.00002670 +[11:18:04] Epoch: 1 Batch: 7785/20099 (38.73%) Loss: 2.192298 LR: 0.00002670 +[11:18:06] Epoch: 1 Batch: 7786/20099 (38.74%) Loss: 1.941680 LR: 0.00002670 +[11:18:08] Epoch: 1 Batch: 7787/20099 (38.74%) Loss: 2.118325 LR: 0.00002669 +[11:18:09] Epoch: 1 Batch: 7788/20099 (38.75%) Loss: 1.906937 LR: 0.00002669 +[11:18:11] Epoch: 1 Batch: 7789/20099 (38.75%) Loss: 2.189586 LR: 0.00002669 +[11:18:13] Epoch: 1 Batch: 7790/20099 (38.76%) Loss: 2.018956 LR: 0.00002669 +[11:18:15] Epoch: 1 Batch: 7791/20099 (38.76%) Loss: 2.315284 LR: 0.00002669 +[11:18:16] Epoch: 1 Batch: 7792/20099 (38.77%) Loss: 2.052695 LR: 0.00002669 +[11:18:18] Epoch: 1 Batch: 7793/20099 (38.77%) Loss: 2.364179 LR: 0.00002669 +[11:18:20] Epoch: 1 Batch: 7794/20099 (38.78%) Loss: 1.939288 LR: 0.00002668 +[11:18:22] Epoch: 1 Batch: 7795/20099 (38.78%) Loss: 2.011673 LR: 0.00002668 +[11:18:23] Epoch: 1 Batch: 7796/20099 (38.79%) Loss: 2.031419 LR: 0.00002668 +[11:18:25] Epoch: 1 Batch: 7797/20099 (38.79%) Loss: 1.954742 LR: 0.00002668 +[11:18:27] Epoch: 1 Batch: 7798/20099 (38.80%) Loss: 2.065381 LR: 0.00002668 +[11:18:29] Epoch: 1 Batch: 7799/20099 (38.80%) Loss: 2.067803 LR: 0.00002668 +[11:18:34] >> Cleaned up old temp checkpoint: epoch1_step5800 +[11:18:34] >> Temp checkpoint saved: epoch1_step7800, size: 0.1693 GB +[11:18:34] Epoch: 1 Batch: 7800/20099 (38.81%) Loss: 2.115914 LR: 0.00002668 +[11:18:36] Epoch: 1 Batch: 7801/20099 (38.81%) Loss: 2.126686 LR: 0.00002667 +[11:18:38] Epoch: 1 Batch: 7802/20099 (38.82%) Loss: 2.041991 LR: 0.00002667 +[11:18:39] Epoch: 1 Batch: 7803/20099 (38.82%) Loss: 2.205456 LR: 0.00002667 +[11:18:41] Epoch: 1 Batch: 7804/20099 (38.83%) Loss: 2.034886 LR: 0.00002667 +[11:18:43] Epoch: 1 Batch: 7805/20099 (38.83%) Loss: 2.365794 LR: 0.00002667 +[11:18:45] Epoch: 1 Batch: 7806/20099 (38.84%) Loss: 1.890295 LR: 0.00002667 +[11:18:46] Epoch: 1 Batch: 7807/20099 (38.84%) Loss: 1.768340 LR: 
0.00002667 +[11:18:48] Epoch: 1 Batch: 7808/20099 (38.85%) Loss: 2.187560 LR: 0.00002666 +[11:18:50] Epoch: 1 Batch: 7809/20099 (38.85%) Loss: 2.189487 LR: 0.00002666 +[11:18:52] Epoch: 1 Batch: 7810/20099 (38.86%) Loss: 1.897124 LR: 0.00002666 +[11:18:54] Epoch: 1 Batch: 7811/20099 (38.86%) Loss: 2.242054 LR: 0.00002666 +[11:18:55] Epoch: 1 Batch: 7812/20099 (38.87%) Loss: 2.129681 LR: 0.00002666 +[11:18:57] Epoch: 1 Batch: 7813/20099 (38.87%) Loss: 2.272071 LR: 0.00002666 +[11:18:59] Epoch: 1 Batch: 7814/20099 (38.88%) Loss: 2.046514 LR: 0.00002666 +[11:19:01] Epoch: 1 Batch: 7815/20099 (38.88%) Loss: 2.298006 LR: 0.00002665 +[11:19:02] Epoch: 1 Batch: 7816/20099 (38.89%) Loss: 2.187032 LR: 0.00002665 +[11:19:04] Epoch: 1 Batch: 7817/20099 (38.89%) Loss: 2.112337 LR: 0.00002665 +[11:19:06] Epoch: 1 Batch: 7818/20099 (38.90%) Loss: 2.272477 LR: 0.00002665 +[11:19:08] Epoch: 1 Batch: 7819/20099 (38.90%) Loss: 1.936588 LR: 0.00002665 +[11:19:10] Epoch: 1 Batch: 7820/20099 (38.91%) Loss: 2.085606 LR: 0.00002665 +[11:19:11] Epoch: 1 Batch: 7821/20099 (38.91%) Loss: 2.086196 LR: 0.00002665 +[11:19:13] Epoch: 1 Batch: 7822/20099 (38.92%) Loss: 2.235284 LR: 0.00002664 +[11:19:15] Epoch: 1 Batch: 7823/20099 (38.92%) Loss: 2.347824 LR: 0.00002664 +[11:19:17] Epoch: 1 Batch: 7824/20099 (38.93%) Loss: 1.966078 LR: 0.00002664 +[11:19:19] Epoch: 1 Batch: 7825/20099 (38.93%) Loss: 2.103088 LR: 0.00002664 +[11:19:20] Epoch: 1 Batch: 7826/20099 (38.94%) Loss: 2.293581 LR: 0.00002664 +[11:19:22] Epoch: 1 Batch: 7827/20099 (38.94%) Loss: 2.153210 LR: 0.00002664 +[11:19:24] Epoch: 1 Batch: 7828/20099 (38.95%) Loss: 2.238383 LR: 0.00002664 +[11:19:26] Epoch: 1 Batch: 7829/20099 (38.95%) Loss: 2.332803 LR: 0.00002662 +[11:19:27] Epoch: 1 Batch: 7830/20099 (38.96%) Loss: 2.453391 LR: 0.00002662 +[11:19:29] Epoch: 1 Batch: 7831/20099 (38.96%) Loss: 2.456487 LR: 0.00002662 +[11:19:31] Epoch: 1 Batch: 7832/20099 (38.97%) Loss: 2.249483 LR: 0.00002662 +[11:19:33] Epoch: 1 Batch: 7833/20099 
(38.97%) Loss: 2.124620 LR: 0.00002662 +[11:19:35] Epoch: 1 Batch: 7834/20099 (38.98%) Loss: 2.441478 LR: 0.00002662 +[11:19:36] Epoch: 1 Batch: 7835/20099 (38.98%) Loss: 2.256517 LR: 0.00002662 +[11:19:38] Epoch: 1 Batch: 7836/20099 (38.99%) Loss: 2.031235 LR: 0.00002661 +[11:19:40] Epoch: 1 Batch: 7837/20099 (38.99%) Loss: 2.143024 LR: 0.00002661 +[11:19:42] Epoch: 1 Batch: 7838/20099 (39.00%) Loss: 2.305421 LR: 0.00002661 +[11:19:43] Epoch: 1 Batch: 7839/20099 (39.00%) Loss: 2.128336 LR: 0.00002661 +[11:19:45] Epoch: 1 Batch: 7840/20099 (39.01%) Loss: 2.151582 LR: 0.00002661 +[11:19:47] Epoch: 1 Batch: 7841/20099 (39.01%) Loss: 2.327061 LR: 0.00002661 +[11:19:49] Epoch: 1 Batch: 7842/20099 (39.02%) Loss: 2.014147 LR: 0.00002661 +[11:19:50] Epoch: 1 Batch: 7843/20099 (39.02%) Loss: 1.615473 LR: 0.00002660 +[11:19:52] Epoch: 1 Batch: 7844/20099 (39.03%) Loss: 2.020171 LR: 0.00002660 +[11:19:54] Epoch: 1 Batch: 7845/20099 (39.03%) Loss: 2.134998 LR: 0.00002660 +[11:19:56] Epoch: 1 Batch: 7846/20099 (39.04%) Loss: 2.113968 LR: 0.00002660 +[11:19:57] Epoch: 1 Batch: 7847/20099 (39.04%) Loss: 2.185345 LR: 0.00002660 +[11:19:59] Epoch: 1 Batch: 7848/20099 (39.05%) Loss: 2.058257 LR: 0.00002660 +[11:20:01] Epoch: 1 Batch: 7849/20099 (39.05%) Loss: 1.962369 LR: 0.00002660 +[11:20:03] Epoch: 1 Batch: 7850/20099 (39.06%) Loss: 1.902711 LR: 0.00002659 +[11:20:05] Epoch: 1 Batch: 7851/20099 (39.06%) Loss: 2.346581 LR: 0.00002659 +[11:20:06] Epoch: 1 Batch: 7852/20099 (39.07%) Loss: 2.122932 LR: 0.00002659 +[11:20:08] Epoch: 1 Batch: 7853/20099 (39.07%) Loss: 1.886862 LR: 0.00002659 +[11:20:10] Epoch: 1 Batch: 7854/20099 (39.08%) Loss: 1.563274 LR: 0.00002659 +[11:20:12] Epoch: 1 Batch: 7855/20099 (39.08%) Loss: 2.306425 LR: 0.00002659 +[11:20:13] Epoch: 1 Batch: 7856/20099 (39.09%) Loss: 2.044250 LR: 0.00002659 +[11:20:15] Epoch: 1 Batch: 7857/20099 (39.09%) Loss: 1.984739 LR: 0.00002658 +[11:20:17] Epoch: 1 Batch: 7858/20099 (39.10%) Loss: 2.333704 LR: 0.00002658 
+[11:20:19] Epoch: 1 Batch: 7859/20099 (39.10%) Loss: 1.998548 LR: 0.00002658 +[11:20:20] Epoch: 1 Batch: 7860/20099 (39.11%) Loss: 2.130264 LR: 0.00002658 +[11:20:22] Epoch: 1 Batch: 7861/20099 (39.11%) Loss: 2.080041 LR: 0.00002658 +[11:20:24] Epoch: 1 Batch: 7862/20099 (39.12%) Loss: 1.956181 LR: 0.00002658 +[11:20:26] Epoch: 1 Batch: 7863/20099 (39.12%) Loss: 2.147521 LR: 0.00002658 +[11:20:27] Epoch: 1 Batch: 7864/20099 (39.13%) Loss: 2.209058 LR: 0.00002657 +[11:20:29] Epoch: 1 Batch: 7865/20099 (39.13%) Loss: 2.139018 LR: 0.00002657 +[11:20:31] Epoch: 1 Batch: 7866/20099 (39.14%) Loss: 1.920696 LR: 0.00002657 +[11:20:33] Epoch: 1 Batch: 7867/20099 (39.14%) Loss: 2.386090 LR: 0.00002657 +[11:20:35] Epoch: 1 Batch: 7868/20099 (39.15%) Loss: 2.193243 LR: 0.00002657 +[11:20:36] Epoch: 1 Batch: 7869/20099 (39.15%) Loss: 2.337170 LR: 0.00002657 +[11:20:38] Epoch: 1 Batch: 7870/20099 (39.16%) Loss: 2.318821 LR: 0.00002657 +[11:20:40] Epoch: 1 Batch: 7871/20099 (39.16%) Loss: 2.121712 LR: 0.00002656 +[11:20:42] Epoch: 1 Batch: 7872/20099 (39.17%) Loss: 2.198399 LR: 0.00002656 +[11:20:44] Epoch: 1 Batch: 7873/20099 (39.17%) Loss: 2.248979 LR: 0.00002656 +[11:20:45] Epoch: 1 Batch: 7874/20099 (39.18%) Loss: 2.472112 LR: 0.00002656 +[11:20:47] Epoch: 1 Batch: 7875/20099 (39.18%) Loss: 2.220920 LR: 0.00002656 +[11:20:49] Epoch: 1 Batch: 7876/20099 (39.19%) Loss: 1.916760 LR: 0.00002656 +[11:20:50] Epoch: 1 Batch: 7877/20099 (39.19%) Loss: 1.990875 LR: 0.00002656 +[11:20:52] Epoch: 1 Batch: 7878/20099 (39.20%) Loss: 2.146078 LR: 0.00002655 +[11:20:54] Epoch: 1 Batch: 7879/20099 (39.20%) Loss: 2.118501 LR: 0.00002655 +[11:20:56] Epoch: 1 Batch: 7880/20099 (39.21%) Loss: 2.079660 LR: 0.00002655 +[11:20:58] Epoch: 1 Batch: 7881/20099 (39.21%) Loss: 1.988912 LR: 0.00002655 +[11:20:59] Epoch: 1 Batch: 7882/20099 (39.22%) Loss: 2.106959 LR: 0.00002655 +[11:21:01] Epoch: 1 Batch: 7883/20099 (39.22%) Loss: 2.144059 LR: 0.00002655 +[11:21:03] Epoch: 1 Batch: 7884/20099 (39.23%) 
Loss: 2.136100 LR: 0.00002655 +[11:21:05] Epoch: 1 Batch: 7885/20099 (39.23%) Loss: 1.815180 LR: 0.00002653 +[11:21:07] Epoch: 1 Batch: 7886/20099 (39.24%) Loss: 2.022646 LR: 0.00002653 +[11:21:08] Epoch: 1 Batch: 7887/20099 (39.24%) Loss: 2.227417 LR: 0.00002653 +[11:21:10] Epoch: 1 Batch: 7888/20099 (39.25%) Loss: 2.230236 LR: 0.00002653 +[11:21:12] Epoch: 1 Batch: 7889/20099 (39.25%) Loss: 2.082716 LR: 0.00002653 +[11:21:14] Epoch: 1 Batch: 7890/20099 (39.26%) Loss: 1.944577 LR: 0.00002653 +[11:21:15] Epoch: 1 Batch: 7891/20099 (39.26%) Loss: 1.998958 LR: 0.00002653 +[11:21:17] Epoch: 1 Batch: 7892/20099 (39.27%) Loss: 1.879050 LR: 0.00002652 +[11:21:19] Epoch: 1 Batch: 7893/20099 (39.27%) Loss: 2.318759 LR: 0.00002652 +[11:21:21] Epoch: 1 Batch: 7894/20099 (39.28%) Loss: 2.119075 LR: 0.00002652 +[11:21:22] Epoch: 1 Batch: 7895/20099 (39.28%) Loss: 2.212215 LR: 0.00002652 +[11:21:24] Epoch: 1 Batch: 7896/20099 (39.29%) Loss: 2.008276 LR: 0.00002652 +[11:21:26] Epoch: 1 Batch: 7897/20099 (39.29%) Loss: 2.590654 LR: 0.00002652 +[11:21:28] Epoch: 1 Batch: 7898/20099 (39.30%) Loss: 2.244771 LR: 0.00002652 +[11:21:30] Epoch: 1 Batch: 7899/20099 (39.30%) Loss: 2.133977 LR: 0.00002651 +[11:21:31] Epoch: 1 Batch: 7900/20099 (39.31%) Loss: 1.898094 LR: 0.00002651 +[11:21:33] Epoch: 1 Batch: 7901/20099 (39.31%) Loss: 2.083942 LR: 0.00002651 +[11:21:35] Epoch: 1 Batch: 7902/20099 (39.32%) Loss: 2.161458 LR: 0.00002651 +[11:21:37] Epoch: 1 Batch: 7903/20099 (39.32%) Loss: 2.050329 LR: 0.00002651 +[11:21:38] Epoch: 1 Batch: 7904/20099 (39.33%) Loss: 2.101810 LR: 0.00002651 +[11:21:40] Epoch: 1 Batch: 7905/20099 (39.33%) Loss: 2.136550 LR: 0.00002651 +[11:21:42] Epoch: 1 Batch: 7906/20099 (39.34%) Loss: 1.999566 LR: 0.00002650 +[11:21:44] Epoch: 1 Batch: 7907/20099 (39.34%) Loss: 2.013700 LR: 0.00002650 +[11:21:46] Epoch: 1 Batch: 7908/20099 (39.35%) Loss: 2.074641 LR: 0.00002650 +[11:21:47] Epoch: 1 Batch: 7909/20099 (39.35%) Loss: 2.247553 LR: 0.00002650 +[11:21:49] Epoch: 
1 Batch: 7910/20099 (39.36%) Loss: 2.295721 LR: 0.00002650 +[11:21:51] Epoch: 1 Batch: 7911/20099 (39.36%) Loss: 1.777516 LR: 0.00002650 +[11:21:53] Epoch: 1 Batch: 7912/20099 (39.37%) Loss: 2.124319 LR: 0.00002650 +[11:21:54] Epoch: 1 Batch: 7913/20099 (39.37%) Loss: 2.235009 LR: 0.00002649 +[11:21:56] Epoch: 1 Batch: 7914/20099 (39.38%) Loss: 2.439484 LR: 0.00002649 +[11:21:58] Epoch: 1 Batch: 7915/20099 (39.38%) Loss: 2.001752 LR: 0.00002649 +[11:22:00] Epoch: 1 Batch: 7916/20099 (39.39%) Loss: 2.219399 LR: 0.00002649 +[11:22:01] Epoch: 1 Batch: 7917/20099 (39.39%) Loss: 2.147407 LR: 0.00002649 +[11:22:03] Epoch: 1 Batch: 7918/20099 (39.39%) Loss: 1.916561 LR: 0.00002649 +[11:22:05] Epoch: 1 Batch: 7919/20099 (39.40%) Loss: 2.049500 LR: 0.00002649 +[11:22:07] Epoch: 1 Batch: 7920/20099 (39.40%) Loss: 2.436647 LR: 0.00002648 +[11:22:08] Epoch: 1 Batch: 7921/20099 (39.41%) Loss: 1.780668 LR: 0.00002648 +[11:22:10] Epoch: 1 Batch: 7922/20099 (39.41%) Loss: 2.211989 LR: 0.00002648 +[11:22:12] Epoch: 1 Batch: 7923/20099 (39.42%) Loss: 2.239027 LR: 0.00002648 +[11:22:14] Epoch: 1 Batch: 7924/20099 (39.42%) Loss: 2.043938 LR: 0.00002648 +[11:22:16] Epoch: 1 Batch: 7925/20099 (39.43%) Loss: 2.089024 LR: 0.00002648 +[11:22:17] Epoch: 1 Batch: 7926/20099 (39.43%) Loss: 2.102420 LR: 0.00002648 +[11:22:19] Epoch: 1 Batch: 7927/20099 (39.44%) Loss: 1.861673 LR: 0.00002647 +[11:22:21] Epoch: 1 Batch: 7928/20099 (39.44%) Loss: 2.268308 LR: 0.00002647 +[11:22:23] Epoch: 1 Batch: 7929/20099 (39.45%) Loss: 2.277569 LR: 0.00002647 +[11:22:24] Epoch: 1 Batch: 7930/20099 (39.45%) Loss: 2.441023 LR: 0.00002647 +[11:22:26] Epoch: 1 Batch: 7931/20099 (39.46%) Loss: 2.122533 LR: 0.00002647 +[11:22:28] Epoch: 1 Batch: 7932/20099 (39.46%) Loss: 2.250842 LR: 0.00002647 +[11:22:30] Epoch: 1 Batch: 7933/20099 (39.47%) Loss: 2.016479 LR: 0.00002647 +[11:22:31] Epoch: 1 Batch: 7934/20099 (39.47%) Loss: 2.052250 LR: 0.00002645 +[11:22:33] Epoch: 1 Batch: 7935/20099 (39.48%) Loss: 1.908977 LR: 
0.00002645 +[11:22:35] Epoch: 1 Batch: 7936/20099 (39.48%) Loss: 2.215253 LR: 0.00002645 +[11:22:37] Epoch: 1 Batch: 7937/20099 (39.49%) Loss: 2.316883 LR: 0.00002645 +[11:22:38] Epoch: 1 Batch: 7938/20099 (39.49%) Loss: 2.153031 LR: 0.00002645 +[11:22:40] Epoch: 1 Batch: 7939/20099 (39.50%) Loss: 2.322775 LR: 0.00002645 +[11:22:42] Epoch: 1 Batch: 7940/20099 (39.50%) Loss: 2.214974 LR: 0.00002645 +[11:22:44] Epoch: 1 Batch: 7941/20099 (39.51%) Loss: 2.323170 LR: 0.00002644 +[11:22:46] Epoch: 1 Batch: 7942/20099 (39.51%) Loss: 2.104364 LR: 0.00002644 +[11:22:47] Epoch: 1 Batch: 7943/20099 (39.52%) Loss: 2.429692 LR: 0.00002644 +[11:22:49] Epoch: 1 Batch: 7944/20099 (39.52%) Loss: 2.290454 LR: 0.00002644 +[11:22:51] Epoch: 1 Batch: 7945/20099 (39.53%) Loss: 1.939322 LR: 0.00002644 +[11:22:53] Epoch: 1 Batch: 7946/20099 (39.53%) Loss: 2.148483 LR: 0.00002644 +[11:22:54] Epoch: 1 Batch: 7947/20099 (39.54%) Loss: 2.123789 LR: 0.00002644 +[11:22:56] Epoch: 1 Batch: 7948/20099 (39.54%) Loss: 2.104358 LR: 0.00002643 +[11:22:58] Epoch: 1 Batch: 7949/20099 (39.55%) Loss: 2.010888 LR: 0.00002643 +[11:23:00] Epoch: 1 Batch: 7950/20099 (39.55%) Loss: 1.731723 LR: 0.00002643 +[11:23:01] Epoch: 1 Batch: 7951/20099 (39.56%) Loss: 2.140817 LR: 0.00002643 +[11:23:03] Epoch: 1 Batch: 7952/20099 (39.56%) Loss: 1.997992 LR: 0.00002643 +[11:23:05] Epoch: 1 Batch: 7953/20099 (39.57%) Loss: 2.066802 LR: 0.00002643 +[11:23:07] Epoch: 1 Batch: 7954/20099 (39.57%) Loss: 2.127962 LR: 0.00002643 +[11:23:09] Epoch: 1 Batch: 7955/20099 (39.58%) Loss: 1.935018 LR: 0.00002642 +[11:23:10] Epoch: 1 Batch: 7956/20099 (39.58%) Loss: 1.752723 LR: 0.00002642 +[11:23:12] Epoch: 1 Batch: 7957/20099 (39.59%) Loss: 2.100379 LR: 0.00002642 +[11:23:14] Epoch: 1 Batch: 7958/20099 (39.59%) Loss: 1.952143 LR: 0.00002642 +[11:23:16] Epoch: 1 Batch: 7959/20099 (39.60%) Loss: 2.283680 LR: 0.00002642 +[11:23:17] Epoch: 1 Batch: 7960/20099 (39.60%) Loss: 1.815256 LR: 0.00002642 +[11:23:19] Epoch: 1 Batch: 7961/20099 
(39.61%) Loss: 2.116079 LR: 0.00002642 +[11:23:21] Epoch: 1 Batch: 7962/20099 (39.61%) Loss: 2.321756 LR: 0.00002641 +[11:23:23] Epoch: 1 Batch: 7963/20099 (39.62%) Loss: 2.199976 LR: 0.00002641 +[11:23:24] Epoch: 1 Batch: 7964/20099 (39.62%) Loss: 1.828357 LR: 0.00002641 +[11:23:26] Epoch: 1 Batch: 7965/20099 (39.63%) Loss: 2.064555 LR: 0.00002641 +[11:23:28] Epoch: 1 Batch: 7966/20099 (39.63%) Loss: 2.031368 LR: 0.00002641 +[11:23:30] Epoch: 1 Batch: 7967/20099 (39.64%) Loss: 1.970766 LR: 0.00002641 +[11:23:32] Epoch: 1 Batch: 7968/20099 (39.64%) Loss: 1.974787 LR: 0.00002641 +[11:23:33] Epoch: 1 Batch: 7969/20099 (39.65%) Loss: 2.084501 LR: 0.00002640 +[11:23:35] Epoch: 1 Batch: 7970/20099 (39.65%) Loss: 2.006256 LR: 0.00002640 +[11:23:37] Epoch: 1 Batch: 7971/20099 (39.66%) Loss: 2.127401 LR: 0.00002640 +[11:23:39] Epoch: 1 Batch: 7972/20099 (39.66%) Loss: 1.973046 LR: 0.00002640 +[11:23:40] Epoch: 1 Batch: 7973/20099 (39.67%) Loss: 2.268507 LR: 0.00002640 +[11:23:42] Epoch: 1 Batch: 7974/20099 (39.67%) Loss: 2.264230 LR: 0.00002640 +[11:23:44] Epoch: 1 Batch: 7975/20099 (39.68%) Loss: 2.085159 LR: 0.00002640 +[11:23:46] Epoch: 1 Batch: 7976/20099 (39.68%) Loss: 2.067251 LR: 0.00002638 +[11:23:47] Epoch: 1 Batch: 7977/20099 (39.69%) Loss: 2.344509 LR: 0.00002638 +[11:23:49] Epoch: 1 Batch: 7978/20099 (39.69%) Loss: 1.954948 LR: 0.00002638 +[11:23:51] Epoch: 1 Batch: 7979/20099 (39.70%) Loss: 2.029190 LR: 0.00002638 +[11:23:53] Epoch: 1 Batch: 7980/20099 (39.70%) Loss: 1.721715 LR: 0.00002638 +[11:23:54] Epoch: 1 Batch: 7981/20099 (39.71%) Loss: 1.828877 LR: 0.00002638 +[11:23:56] Epoch: 1 Batch: 7982/20099 (39.71%) Loss: 2.269747 LR: 0.00002638 +[11:23:58] Epoch: 1 Batch: 7983/20099 (39.72%) Loss: 2.153070 LR: 0.00002637 +[11:24:00] Epoch: 1 Batch: 7984/20099 (39.72%) Loss: 1.960694 LR: 0.00002637 +[11:24:02] Epoch: 1 Batch: 7985/20099 (39.73%) Loss: 1.951195 LR: 0.00002637 +[11:24:03] Epoch: 1 Batch: 7986/20099 (39.73%) Loss: 2.157287 LR: 0.00002637 
+[11:24:05] Epoch: 1 Batch: 7987/20099 (39.74%) Loss: 2.299076 LR: 0.00002637 +[11:24:07] Epoch: 1 Batch: 7988/20099 (39.74%) Loss: 2.195517 LR: 0.00002637 +[11:24:09] Epoch: 1 Batch: 7989/20099 (39.75%) Loss: 2.025500 LR: 0.00002637 +[11:24:11] Epoch: 1 Batch: 7990/20099 (39.75%) Loss: 2.088178 LR: 0.00002636 +[11:24:12] Epoch: 1 Batch: 7991/20099 (39.76%) Loss: 1.961171 LR: 0.00002636 +[11:24:14] Epoch: 1 Batch: 7992/20099 (39.76%) Loss: 2.201713 LR: 0.00002636 +[11:24:16] Epoch: 1 Batch: 7993/20099 (39.77%) Loss: 2.556838 LR: 0.00002636 +[11:24:18] Epoch: 1 Batch: 7994/20099 (39.77%) Loss: 2.164869 LR: 0.00002636 +[11:24:19] Epoch: 1 Batch: 7995/20099 (39.78%) Loss: 2.379468 LR: 0.00002636 +[11:24:21] Epoch: 1 Batch: 7996/20099 (39.78%) Loss: 2.032998 LR: 0.00002636 +[11:24:23] Epoch: 1 Batch: 7997/20099 (39.79%) Loss: 2.118078 LR: 0.00002635 +[11:24:25] Epoch: 1 Batch: 7998/20099 (39.79%) Loss: 2.011512 LR: 0.00002635 +[11:24:26] Epoch: 1 Batch: 7999/20099 (39.80%) Loss: 1.957702 LR: 0.00002635 +[11:24:28] >> Evaluating batch 0 +[11:24:29] >> Evaluating batch 1 +[11:24:30] >> Evaluating batch 2 +[11:24:31] >> Evaluating batch 3 +[11:24:32] >> Evaluating batch 4 +[11:24:33] >> Evaluating batch 5 +[11:24:34] >> Evaluating batch 6 +[11:24:35] >> Evaluating batch 7 +[11:24:36] >> Evaluating batch 8 +[11:24:37] >> Evaluating batch 9 +[11:24:38] >> Evaluating batch 10 +[11:24:39] >> Evaluating batch 11 +[11:24:40] >> Evaluating batch 12 +[11:24:41] >> Evaluating batch 13 +[11:24:42] >> Evaluating batch 14 +[11:24:43] >> Evaluating batch 15 +[11:24:44] >> Evaluating batch 16 +[11:24:45] Epoch: 1 Step: 8000/20099 Evaluation: +[11:24:45] Avg Loss Since Last Eval: 2.1222 Val Loss: 2.1770 Validation loss delta: -0.0089 Perplexity: 8.8199 LR: 0.00002635 +[11:24:48] >> Cleaned up old temp checkpoint: epoch1_step6000 +[11:24:48] >> Temp checkpoint saved: epoch1_step8000, size: 0.1693 GB +[11:24:52] >> Checkpoint saved: epoch1_step8000, size: 0.1693 GB +[11:24:52] Epoch: 1
Batch: 8000/20099 (39.80%) Loss: 2.122792 LR: 0.00002635 +[11:24:54] Epoch: 1 Batch: 8001/20099 (39.81%) Loss: 2.217061 LR: 0.00002635 +[11:24:55] Epoch: 1 Batch: 8002/20099 (39.81%) Loss: 2.246052 LR: 0.00002635 +[11:24:57] Epoch: 1 Batch: 8003/20099 (39.82%) Loss: 2.086280 LR: 0.00002635 +[11:24:59] Epoch: 1 Batch: 8004/20099 (39.82%) Loss: 2.259737 LR: 0.00002634 +[11:25:01] Epoch: 1 Batch: 8005/20099 (39.83%) Loss: 1.744634 LR: 0.00002634 +[11:25:02] Epoch: 1 Batch: 8006/20099 (39.83%) Loss: 2.032975 LR: 0.00002634 +[11:25:04] Epoch: 1 Batch: 8007/20099 (39.84%) Loss: 2.513135 LR: 0.00002634 +[11:25:06] Epoch: 1 Batch: 8008/20099 (39.84%) Loss: 1.886194 LR: 0.00002634 +[11:25:08] Epoch: 1 Batch: 8009/20099 (39.85%) Loss: 1.899353 LR: 0.00002634 +[11:25:09] Epoch: 1 Batch: 8010/20099 (39.85%) Loss: 2.004407 LR: 0.00002634 +[11:25:11] Epoch: 1 Batch: 8011/20099 (39.86%) Loss: 2.047608 LR: 0.00002633 +[11:25:13] Epoch: 1 Batch: 8012/20099 (39.86%) Loss: 2.348198 LR: 0.00002633 +[11:25:15] Epoch: 1 Batch: 8013/20099 (39.87%) Loss: 2.061751 LR: 0.00002633 +[11:25:17] Epoch: 1 Batch: 8014/20099 (39.87%) Loss: 1.890765 LR: 0.00002633 +[11:25:19] Epoch: 1 Batch: 8015/20099 (39.88%) Loss: 2.150518 LR: 0.00002633 +[11:25:20] Epoch: 1 Batch: 8016/20099 (39.88%) Loss: 1.787401 LR: 0.00002633 +[11:25:22] Epoch: 1 Batch: 8017/20099 (39.89%) Loss: 2.108136 LR: 0.00002633 +[11:25:24] Epoch: 1 Batch: 8018/20099 (39.89%) Loss: 1.952225 LR: 0.00002631 +[11:25:26] Epoch: 1 Batch: 8019/20099 (39.90%) Loss: 1.983482 LR: 0.00002631 +[11:25:28] Epoch: 1 Batch: 8020/20099 (39.90%) Loss: 1.809058 LR: 0.00002631 +[11:25:30] Epoch: 1 Batch: 8021/20099 (39.91%) Loss: 2.216883 LR: 0.00002631 +[11:25:31] Epoch: 1 Batch: 8022/20099 (39.91%) Loss: 2.234991 LR: 0.00002631 +[11:25:33] Epoch: 1 Batch: 8023/20099 (39.92%) Loss: 2.233574 LR: 0.00002631 +[11:25:35] Epoch: 1 Batch: 8024/20099 (39.92%) Loss: 1.993183 LR: 0.00002631 +[11:25:37] Epoch: 1 Batch: 8025/20099 (39.93%) Loss: 2.165363 LR: 
0.00002630 +[11:25:39] Epoch: 1 Batch: 8026/20099 (39.93%) Loss: 2.140855 LR: 0.00002630 +[11:25:40] Epoch: 1 Batch: 8027/20099 (39.94%) Loss: 1.866997 LR: 0.00002630 +[11:25:42] Epoch: 1 Batch: 8028/20099 (39.94%) Loss: 2.229012 LR: 0.00002630 +[11:25:44] Epoch: 1 Batch: 8029/20099 (39.95%) Loss: 1.786212 LR: 0.00002630 +[11:25:46] Epoch: 1 Batch: 8030/20099 (39.95%) Loss: 1.847127 LR: 0.00002630 +[11:25:47] Epoch: 1 Batch: 8031/20099 (39.96%) Loss: 2.050258 LR: 0.00002630 +[11:25:49] Epoch: 1 Batch: 8032/20099 (39.96%) Loss: 2.103346 LR: 0.00002629 +[11:25:51] Epoch: 1 Batch: 8033/20099 (39.97%) Loss: 1.928375 LR: 0.00002629 +[11:25:53] Epoch: 1 Batch: 8034/20099 (39.97%) Loss: 2.268705 LR: 0.00002629 +[11:25:55] Epoch: 1 Batch: 8035/20099 (39.98%) Loss: 2.334401 LR: 0.00002629 +[11:25:56] Epoch: 1 Batch: 8036/20099 (39.98%) Loss: 1.729475 LR: 0.00002629 +[11:25:58] Epoch: 1 Batch: 8037/20099 (39.99%) Loss: 2.173978 LR: 0.00002629 +[11:26:00] Epoch: 1 Batch: 8038/20099 (39.99%) Loss: 2.058457 LR: 0.00002629 +[11:26:02] Epoch: 1 Batch: 8039/20099 (40.00%) Loss: 2.165681 LR: 0.00002628 +[11:26:03] Epoch: 1 Batch: 8040/20099 (40.00%) Loss: 2.022991 LR: 0.00002628 +[11:26:05] Epoch: 1 Batch: 8041/20099 (40.01%) Loss: 2.176553 LR: 0.00002628 +[11:26:07] Epoch: 1 Batch: 8042/20099 (40.01%) Loss: 2.100810 LR: 0.00002628 +[11:26:08] Epoch: 1 Batch: 8043/20099 (40.02%) Loss: 2.089385 LR: 0.00002628 +[11:26:10] Epoch: 1 Batch: 8044/20099 (40.02%) Loss: 2.513356 LR: 0.00002628 +[11:26:12] Epoch: 1 Batch: 8045/20099 (40.03%) Loss: 1.872199 LR: 0.00002628 +[11:26:14] Epoch: 1 Batch: 8046/20099 (40.03%) Loss: 1.952038 LR: 0.00002627 +[11:26:15] Epoch: 1 Batch: 8047/20099 (40.04%) Loss: 2.257120 LR: 0.00002627 +[11:26:17] Epoch: 1 Batch: 8048/20099 (40.04%) Loss: 2.036979 LR: 0.00002627 +[11:26:19] Epoch: 1 Batch: 8049/20099 (40.05%) Loss: 2.051506 LR: 0.00002627 +[11:26:21] Epoch: 1 Batch: 8050/20099 (40.05%) Loss: 2.206504 LR: 0.00002627 +[11:26:22] Epoch: 1 Batch: 8051/20099 
(40.06%) Loss: 2.153022 LR: 0.00002627 +[11:26:24] Epoch: 1 Batch: 8052/20099 (40.06%) Loss: 1.803505 LR: 0.00002627 +[11:26:26] Epoch: 1 Batch: 8053/20099 (40.07%) Loss: 2.082860 LR: 0.00002626 +[11:26:28] Epoch: 1 Batch: 8054/20099 (40.07%) Loss: 2.173228 LR: 0.00002626 +[11:26:30] Epoch: 1 Batch: 8055/20099 (40.08%) Loss: 2.058554 LR: 0.00002626 +[11:26:31] Epoch: 1 Batch: 8056/20099 (40.08%) Loss: 2.128071 LR: 0.00002626 +[11:26:33] Epoch: 1 Batch: 8057/20099 (40.09%) Loss: 2.181391 LR: 0.00002626 +[11:26:35] Epoch: 1 Batch: 8058/20099 (40.09%) Loss: 2.119464 LR: 0.00002626 +[11:26:37] Epoch: 1 Batch: 8059/20099 (40.10%) Loss: 2.252860 LR: 0.00002626 +[11:26:38] Epoch: 1 Batch: 8060/20099 (40.10%) Loss: 2.078083 LR: 0.00002624 +[11:26:40] Epoch: 1 Batch: 8061/20099 (40.11%) Loss: 2.229984 LR: 0.00002624 +[11:26:42] Epoch: 1 Batch: 8062/20099 (40.11%) Loss: 1.809499 LR: 0.00002624 +[11:26:44] Epoch: 1 Batch: 8063/20099 (40.12%) Loss: 2.104297 LR: 0.00002624 +[11:26:46] Epoch: 1 Batch: 8064/20099 (40.12%) Loss: 2.020247 LR: 0.00002624 +[11:26:47] Epoch: 1 Batch: 8065/20099 (40.13%) Loss: 2.086458 LR: 0.00002624 +[11:26:49] Epoch: 1 Batch: 8066/20099 (40.13%) Loss: 1.952986 LR: 0.00002624 +[11:26:51] Epoch: 1 Batch: 8067/20099 (40.14%) Loss: 2.029030 LR: 0.00002623 +[11:26:53] Epoch: 1 Batch: 8068/20099 (40.14%) Loss: 2.143623 LR: 0.00002623 +[11:26:55] Epoch: 1 Batch: 8069/20099 (40.15%) Loss: 2.064872 LR: 0.00002623 +[11:26:56] Epoch: 1 Batch: 8070/20099 (40.15%) Loss: 2.193104 LR: 0.00002623 +[11:26:58] Epoch: 1 Batch: 8071/20099 (40.16%) Loss: 2.252611 LR: 0.00002623 +[11:27:00] Epoch: 1 Batch: 8072/20099 (40.16%) Loss: 2.139862 LR: 0.00002623 +[11:27:02] Epoch: 1 Batch: 8073/20099 (40.17%) Loss: 1.989737 LR: 0.00002623 +[11:27:03] Epoch: 1 Batch: 8074/20099 (40.17%) Loss: 2.038685 LR: 0.00002622 +[11:27:05] Epoch: 1 Batch: 8075/20099 (40.18%) Loss: 2.078929 LR: 0.00002622 +[11:27:07] Epoch: 1 Batch: 8076/20099 (40.18%) Loss: 2.279980 LR: 0.00002622 
+[11:27:09] Epoch: 1 Batch: 8077/20099 (40.19%) Loss: 2.045713 LR: 0.00002622 +[11:27:11] Epoch: 1 Batch: 8078/20099 (40.19%) Loss: 2.248418 LR: 0.00002622 +[11:27:12] Epoch: 1 Batch: 8079/20099 (40.20%) Loss: 2.075420 LR: 0.00002622 +[11:27:14] Epoch: 1 Batch: 8080/20099 (40.20%) Loss: 2.288369 LR: 0.00002622 +[11:27:16] Epoch: 1 Batch: 8081/20099 (40.21%) Loss: 1.927092 LR: 0.00002621 +[11:27:18] Epoch: 1 Batch: 8082/20099 (40.21%) Loss: 2.067911 LR: 0.00002621 +[11:27:19] Epoch: 1 Batch: 8083/20099 (40.22%) Loss: 2.386840 LR: 0.00002621 +[11:27:21] Epoch: 1 Batch: 8084/20099 (40.22%) Loss: 2.434735 LR: 0.00002621 +[11:27:23] Epoch: 1 Batch: 8085/20099 (40.23%) Loss: 2.088951 LR: 0.00002621 +[11:27:25] Epoch: 1 Batch: 8086/20099 (40.23%) Loss: 2.365932 LR: 0.00002621 +[11:27:27] Epoch: 1 Batch: 8087/20099 (40.24%) Loss: 2.069150 LR: 0.00002621 +[11:27:28] Epoch: 1 Batch: 8088/20099 (40.24%) Loss: 2.295676 LR: 0.00002620 +[11:27:30] Epoch: 1 Batch: 8089/20099 (40.25%) Loss: 2.266593 LR: 0.00002620 +[11:27:32] Epoch: 1 Batch: 8090/20099 (40.25%) Loss: 2.131532 LR: 0.00002620 +[11:27:34] Epoch: 1 Batch: 8091/20099 (40.26%) Loss: 2.172271 LR: 0.00002620 +[11:27:35] Epoch: 1 Batch: 8092/20099 (40.26%) Loss: 2.058846 LR: 0.00002620 +[11:27:37] Epoch: 1 Batch: 8093/20099 (40.27%) Loss: 2.207103 LR: 0.00002620 +[11:27:39] Epoch: 1 Batch: 8094/20099 (40.27%) Loss: 1.966424 LR: 0.00002620 +[11:27:41] Epoch: 1 Batch: 8095/20099 (40.28%) Loss: 2.181124 LR: 0.00002618 +[11:27:42] Epoch: 1 Batch: 8096/20099 (40.28%) Loss: 1.904945 LR: 0.00002618 +[11:27:44] Epoch: 1 Batch: 8097/20099 (40.29%) Loss: 2.003220 LR: 0.00002618 +[11:27:46] Epoch: 1 Batch: 8098/20099 (40.29%) Loss: 2.040104 LR: 0.00002618 +[11:27:48] Epoch: 1 Batch: 8099/20099 (40.30%) Loss: 2.242136 LR: 0.00002618 +[11:27:49] Epoch: 1 Batch: 8100/20099 (40.30%) Loss: 1.977487 LR: 0.00002618 +[11:27:51] Epoch: 1 Batch: 8101/20099 (40.31%) Loss: 1.898668 LR: 0.00002618 +[11:27:53] Epoch: 1 Batch: 8102/20099 (40.31%) 
Loss: 1.885321 LR: 0.00002617 +[11:27:55] Epoch: 1 Batch: 8103/20099 (40.32%) Loss: 2.075983 LR: 0.00002617 +[11:27:57] Epoch: 1 Batch: 8104/20099 (40.32%) Loss: 2.069250 LR: 0.00002617 +[11:27:58] Epoch: 1 Batch: 8105/20099 (40.33%) Loss: 2.148896 LR: 0.00002617 +[11:28:00] Epoch: 1 Batch: 8106/20099 (40.33%) Loss: 1.813303 LR: 0.00002617 +[11:28:02] Epoch: 1 Batch: 8107/20099 (40.34%) Loss: 2.052743 LR: 0.00002617 +[11:28:04] Epoch: 1 Batch: 8108/20099 (40.34%) Loss: 2.371491 LR: 0.00002617 +[11:28:05] Epoch: 1 Batch: 8109/20099 (40.35%) Loss: 2.201392 LR: 0.00002616 +[11:28:07] Epoch: 1 Batch: 8110/20099 (40.35%) Loss: 2.194938 LR: 0.00002616 +[11:28:09] Epoch: 1 Batch: 8111/20099 (40.36%) Loss: 2.204846 LR: 0.00002616 +[11:28:11] Epoch: 1 Batch: 8112/20099 (40.36%) Loss: 1.876159 LR: 0.00002616 +[11:28:12] Epoch: 1 Batch: 8113/20099 (40.37%) Loss: 2.226472 LR: 0.00002616 +[11:28:14] Epoch: 1 Batch: 8114/20099 (40.37%) Loss: 2.090899 LR: 0.00002616 +[11:28:16] Epoch: 1 Batch: 8115/20099 (40.38%) Loss: 2.026616 LR: 0.00002616 +[11:28:18] Epoch: 1 Batch: 8116/20099 (40.38%) Loss: 2.109978 LR: 0.00002615 +[11:28:20] Epoch: 1 Batch: 8117/20099 (40.39%) Loss: 1.979089 LR: 0.00002615 +[11:28:21] Epoch: 1 Batch: 8118/20099 (40.39%) Loss: 2.150599 LR: 0.00002615 +[11:28:23] Epoch: 1 Batch: 8119/20099 (40.40%) Loss: 1.914079 LR: 0.00002615 +[11:28:25] Epoch: 1 Batch: 8120/20099 (40.40%) Loss: 2.147960 LR: 0.00002615 +[11:28:27] Epoch: 1 Batch: 8121/20099 (40.40%) Loss: 2.144169 LR: 0.00002615 +[11:28:28] Epoch: 1 Batch: 8122/20099 (40.41%) Loss: 2.365145 LR: 0.00002615 +[11:28:30] Epoch: 1 Batch: 8123/20099 (40.41%) Loss: 2.087547 LR: 0.00002614 +[11:28:32] Epoch: 1 Batch: 8124/20099 (40.42%) Loss: 2.122096 LR: 0.00002614 +[11:28:34] Epoch: 1 Batch: 8125/20099 (40.42%) Loss: 2.033706 LR: 0.00002614 +[11:28:36] Epoch: 1 Batch: 8126/20099 (40.43%) Loss: 2.131450 LR: 0.00002614 +[11:28:37] Epoch: 1 Batch: 8127/20099 (40.43%) Loss: 1.983948 LR: 0.00002614 +[11:28:39] Epoch: 
1 Batch: 8128/20099 (40.44%) Loss: 2.322608 LR: 0.00002614 +[11:28:41] Epoch: 1 Batch: 8129/20099 (40.44%) Loss: 2.063937 LR: 0.00002614 +[11:28:43] Epoch: 1 Batch: 8130/20099 (40.45%) Loss: 1.881704 LR: 0.00002612 +[11:28:44] Epoch: 1 Batch: 8131/20099 (40.45%) Loss: 2.292980 LR: 0.00002612 +[11:28:46] Epoch: 1 Batch: 8132/20099 (40.46%) Loss: 1.766855 LR: 0.00002612 +[11:28:48] Epoch: 1 Batch: 8133/20099 (40.46%) Loss: 1.939925 LR: 0.00002612 +[11:28:50] Epoch: 1 Batch: 8134/20099 (40.47%) Loss: 2.133668 LR: 0.00002612 +[11:28:51] Epoch: 1 Batch: 8135/20099 (40.47%) Loss: 2.061893 LR: 0.00002612 +[11:28:53] Epoch: 1 Batch: 8136/20099 (40.48%) Loss: 1.912529 LR: 0.00002612 +[11:28:55] Epoch: 1 Batch: 8137/20099 (40.48%) Loss: 2.062526 LR: 0.00002611 +[11:28:57] Epoch: 1 Batch: 8138/20099 (40.49%) Loss: 2.072294 LR: 0.00002611 +[11:28:59] Epoch: 1 Batch: 8139/20099 (40.49%) Loss: 2.097504 LR: 0.00002611 +[11:29:00] Epoch: 1 Batch: 8140/20099 (40.50%) Loss: 1.816803 LR: 0.00002611 +[11:29:02] Epoch: 1 Batch: 8141/20099 (40.50%) Loss: 2.211808 LR: 0.00002611 +[11:29:04] Epoch: 1 Batch: 8142/20099 (40.51%) Loss: 1.892917 LR: 0.00002611 +[11:29:06] Epoch: 1 Batch: 8143/20099 (40.51%) Loss: 2.049631 LR: 0.00002611 +[11:29:07] Epoch: 1 Batch: 8144/20099 (40.52%) Loss: 1.988649 LR: 0.00002610 +[11:29:09] Epoch: 1 Batch: 8145/20099 (40.52%) Loss: 2.092923 LR: 0.00002610 +[11:29:11] Epoch: 1 Batch: 8146/20099 (40.53%) Loss: 1.924506 LR: 0.00002610 +[11:29:13] Epoch: 1 Batch: 8147/20099 (40.53%) Loss: 2.405868 LR: 0.00002610 +[11:29:15] Epoch: 1 Batch: 8148/20099 (40.54%) Loss: 2.132190 LR: 0.00002610 +[11:29:16] Epoch: 1 Batch: 8149/20099 (40.54%) Loss: 2.085325 LR: 0.00002610 +[11:29:18] Epoch: 1 Batch: 8150/20099 (40.55%) Loss: 2.080643 LR: 0.00002610 +[11:29:20] Epoch: 1 Batch: 8151/20099 (40.55%) Loss: 2.010659 LR: 0.00002609 +[11:29:22] Epoch: 1 Batch: 8152/20099 (40.56%) Loss: 2.264739 LR: 0.00002609 +[11:29:23] Epoch: 1 Batch: 8153/20099 (40.56%) Loss: 2.431526 LR: 
0.00002609 +[11:29:25] Epoch: 1 Batch: 8154/20099 (40.57%) Loss: 2.350100 LR: 0.00002609 +[11:29:27] Epoch: 1 Batch: 8155/20099 (40.57%) Loss: 2.158111 LR: 0.00002609 +[11:29:29] Epoch: 1 Batch: 8156/20099 (40.58%) Loss: 2.015496 LR: 0.00002609 +[11:29:31] Epoch: 1 Batch: 8157/20099 (40.58%) Loss: 2.108106 LR: 0.00002609 +[11:29:32] Epoch: 1 Batch: 8158/20099 (40.59%) Loss: 2.238278 LR: 0.00002608 +[11:29:34] Epoch: 1 Batch: 8159/20099 (40.59%) Loss: 1.924238 LR: 0.00002608 +[11:29:36] Epoch: 1 Batch: 8160/20099 (40.60%) Loss: 2.282678 LR: 0.00002608 +[11:29:38] Epoch: 1 Batch: 8161/20099 (40.60%) Loss: 2.084300 LR: 0.00002608 +[11:29:39] Epoch: 1 Batch: 8162/20099 (40.61%) Loss: 2.397682 LR: 0.00002608 +[11:29:41] Epoch: 1 Batch: 8163/20099 (40.61%) Loss: 1.724172 LR: 0.00002608 +[11:29:43] Epoch: 1 Batch: 8164/20099 (40.62%) Loss: 1.999162 LR: 0.00002608 +[11:29:45] Epoch: 1 Batch: 8165/20099 (40.62%) Loss: 2.063351 LR: 0.00002606 +[11:29:47] Epoch: 1 Batch: 8166/20099 (40.63%) Loss: 1.918376 LR: 0.00002606 +[11:29:48] Epoch: 1 Batch: 8167/20099 (40.63%) Loss: 2.213048 LR: 0.00002606 +[11:29:50] Epoch: 1 Batch: 8168/20099 (40.64%) Loss: 1.793094 LR: 0.00002606 +[11:29:52] Epoch: 1 Batch: 8169/20099 (40.64%) Loss: 2.151948 LR: 0.00002606 +[11:29:54] Epoch: 1 Batch: 8170/20099 (40.65%) Loss: 1.930595 LR: 0.00002606 +[11:29:55] Epoch: 1 Batch: 8171/20099 (40.65%) Loss: 1.909399 LR: 0.00002606 +[11:29:57] Epoch: 1 Batch: 8172/20099 (40.66%) Loss: 2.279598 LR: 0.00002605 +[11:29:59] Epoch: 1 Batch: 8173/20099 (40.66%) Loss: 2.417250 LR: 0.00002605 +[11:30:01] Epoch: 1 Batch: 8174/20099 (40.67%) Loss: 2.091132 LR: 0.00002605 +[11:30:03] Epoch: 1 Batch: 8175/20099 (40.67%) Loss: 2.024986 LR: 0.00002605 +[11:30:04] Epoch: 1 Batch: 8176/20099 (40.68%) Loss: 2.324004 LR: 0.00002605 +[11:30:06] Epoch: 1 Batch: 8177/20099 (40.68%) Loss: 2.306357 LR: 0.00002605 +[11:30:08] Epoch: 1 Batch: 8178/20099 (40.69%) Loss: 2.103740 LR: 0.00002605 +[11:30:10] Epoch: 1 Batch: 8179/20099 
(40.69%) Loss: 1.767395 LR: 0.00002604 +[11:30:11] Epoch: 1 Batch: 8180/20099 (40.70%) Loss: 2.132862 LR: 0.00002604 +[11:30:13] Epoch: 1 Batch: 8181/20099 (40.70%) Loss: 2.320781 LR: 0.00002604 +[11:30:15] Epoch: 1 Batch: 8182/20099 (40.71%) Loss: 2.084222 LR: 0.00002604 +[11:30:17] Epoch: 1 Batch: 8183/20099 (40.71%) Loss: 2.149271 LR: 0.00002604 +[11:30:19] Epoch: 1 Batch: 8184/20099 (40.72%) Loss: 2.026205 LR: 0.00002604 +[11:30:20] Epoch: 1 Batch: 8185/20099 (40.72%) Loss: 2.202920 LR: 0.00002604 +[11:30:22] Epoch: 1 Batch: 8186/20099 (40.73%) Loss: 1.989897 LR: 0.00002603 +[11:30:24] Epoch: 1 Batch: 8187/20099 (40.73%) Loss: 2.411265 LR: 0.00002603 +[11:30:26] Epoch: 1 Batch: 8188/20099 (40.74%) Loss: 2.127685 LR: 0.00002603 +[11:30:27] Epoch: 1 Batch: 8189/20099 (40.74%) Loss: 2.115659 LR: 0.00002603 +[11:30:29] Epoch: 1 Batch: 8190/20099 (40.75%) Loss: 2.319483 LR: 0.00002603 +[11:30:31] Epoch: 1 Batch: 8191/20099 (40.75%) Loss: 2.101728 LR: 0.00002603 +[11:30:33] Epoch: 1 Batch: 8192/20099 (40.76%) Loss: 2.121945 LR: 0.00002603 +[11:30:35] Epoch: 1 Batch: 8193/20099 (40.76%) Loss: 2.246704 LR: 0.00002602 +[11:30:36] Epoch: 1 Batch: 8194/20099 (40.77%) Loss: 1.910924 LR: 0.00002602 +[11:30:38] Epoch: 1 Batch: 8195/20099 (40.77%) Loss: 2.356366 LR: 0.00002602 +[11:30:40] Epoch: 1 Batch: 8196/20099 (40.78%) Loss: 2.065781 LR: 0.00002602 +[11:30:42] Epoch: 1 Batch: 8197/20099 (40.78%) Loss: 2.082463 LR: 0.00002602 +[11:30:43] Epoch: 1 Batch: 8198/20099 (40.79%) Loss: 2.227404 LR: 0.00002602 +[11:30:45] Epoch: 1 Batch: 8199/20099 (40.79%) Loss: 2.247691 LR: 0.00002602 +[11:30:50] >> Cleaned up old temp checkpoint: epoch1_step6200 +[11:30:50] >> Temp checkpoint saved: epoch1_step8200, size: 0.1693 GB +[11:30:50] Epoch: 1 Batch: 8200/20099 (40.80%) Loss: 2.103532 LR: 0.00002600 +[11:30:52] Epoch: 1 Batch: 8201/20099 (40.80%) Loss: 2.047515 LR: 0.00002600 +[11:30:54] Epoch: 1 Batch: 8202/20099 (40.81%) Loss: 2.223495 LR: 0.00002600 +[11:30:56] Epoch: 1 Batch: 
8203/20099 (40.81%) Loss: 2.347199 LR: 0.00002600 +[11:30:58] Epoch: 1 Batch: 8204/20099 (40.82%) Loss: 2.335638 LR: 0.00002600 +[11:30:59] Epoch: 1 Batch: 8205/20099 (40.82%) Loss: 2.047778 LR: 0.00002600 +[11:31:01] Epoch: 1 Batch: 8206/20099 (40.83%) Loss: 2.174614 LR: 0.00002600 +[11:31:03] Epoch: 1 Batch: 8207/20099 (40.83%) Loss: 1.986754 LR: 0.00002599 +[11:31:05] Epoch: 1 Batch: 8208/20099 (40.84%) Loss: 2.258337 LR: 0.00002599 +[11:31:06] Epoch: 1 Batch: 8209/20099 (40.84%) Loss: 2.233785 LR: 0.00002599 +[11:31:08] Epoch: 1 Batch: 8210/20099 (40.85%) Loss: 1.935499 LR: 0.00002599 +[11:31:10] Epoch: 1 Batch: 8211/20099 (40.85%) Loss: 2.332697 LR: 0.00002599 +[11:31:12] Epoch: 1 Batch: 8212/20099 (40.86%) Loss: 2.397747 LR: 0.00002599 +[11:31:14] Epoch: 1 Batch: 8213/20099 (40.86%) Loss: 2.253101 LR: 0.00002599 +[11:31:15] Epoch: 1 Batch: 8214/20099 (40.87%) Loss: 1.852451 LR: 0.00002598 +[11:31:17] Epoch: 1 Batch: 8215/20099 (40.87%) Loss: 2.131278 LR: 0.00002598 +[11:31:19] Epoch: 1 Batch: 8216/20099 (40.88%) Loss: 2.265303 LR: 0.00002598 +[11:31:21] Epoch: 1 Batch: 8217/20099 (40.88%) Loss: 2.155893 LR: 0.00002598 +[11:31:23] Epoch: 1 Batch: 8218/20099 (40.89%) Loss: 2.215322 LR: 0.00002598 +[11:31:24] Epoch: 1 Batch: 8219/20099 (40.89%) Loss: 2.125850 LR: 0.00002598 +[11:31:26] Epoch: 1 Batch: 8220/20099 (40.90%) Loss: 1.943813 LR: 0.00002598 +[11:31:28] Epoch: 1 Batch: 8221/20099 (40.90%) Loss: 2.177730 LR: 0.00002597 +[11:31:30] Epoch: 1 Batch: 8222/20099 (40.91%) Loss: 1.846109 LR: 0.00002597 +[11:31:31] Epoch: 1 Batch: 8223/20099 (40.91%) Loss: 2.375459 LR: 0.00002597 +[11:31:33] Epoch: 1 Batch: 8224/20099 (40.92%) Loss: 1.996266 LR: 0.00002597 +[11:31:35] Epoch: 1 Batch: 8225/20099 (40.92%) Loss: 2.110236 LR: 0.00002597 +[11:31:37] Epoch: 1 Batch: 8226/20099 (40.93%) Loss: 1.869027 LR: 0.00002597 +[11:31:39] Epoch: 1 Batch: 8227/20099 (40.93%) Loss: 2.108586 LR: 0.00002597 +[11:31:40] Epoch: 1 Batch: 8228/20099 (40.94%) Loss: 2.110737 LR: 0.00002596 
+[11:31:42] Epoch: 1 Batch: 8229/20099 (40.94%) Loss: 2.118512 LR: 0.00002596 +[11:31:44] Epoch: 1 Batch: 8230/20099 (40.95%) Loss: 2.045492 LR: 0.00002596 +[11:31:46] Epoch: 1 Batch: 8231/20099 (40.95%) Loss: 2.283268 LR: 0.00002596 +[11:31:47] Epoch: 1 Batch: 8232/20099 (40.96%) Loss: 2.253147 LR: 0.00002596 +[11:31:49] Epoch: 1 Batch: 8233/20099 (40.96%) Loss: 2.287370 LR: 0.00002596 +[11:31:51] Epoch: 1 Batch: 8234/20099 (40.97%) Loss: 1.911391 LR: 0.00002596 +[11:31:53] Epoch: 1 Batch: 8235/20099 (40.97%) Loss: 2.213205 LR: 0.00002594 +[11:31:54] Epoch: 1 Batch: 8236/20099 (40.98%) Loss: 2.301231 LR: 0.00002594 +[11:31:56] Epoch: 1 Batch: 8237/20099 (40.98%) Loss: 2.063960 LR: 0.00002594 +[11:31:58] Epoch: 1 Batch: 8238/20099 (40.99%) Loss: 2.050057 LR: 0.00002594 +[11:32:00] Epoch: 1 Batch: 8239/20099 (40.99%) Loss: 2.095547 LR: 0.00002594 +[11:32:02] Epoch: 1 Batch: 8240/20099 (41.00%) Loss: 1.928670 LR: 0.00002594 +[11:32:03] Epoch: 1 Batch: 8241/20099 (41.00%) Loss: 1.848796 LR: 0.00002594 +[11:32:05] Epoch: 1 Batch: 8242/20099 (41.01%) Loss: 2.150547 LR: 0.00002593 +[11:32:07] Epoch: 1 Batch: 8243/20099 (41.01%) Loss: 1.700732 LR: 0.00002593 +[11:32:09] Epoch: 1 Batch: 8244/20099 (41.02%) Loss: 1.973993 LR: 0.00002593 +[11:32:10] Epoch: 1 Batch: 8245/20099 (41.02%) Loss: 2.132689 LR: 0.00002593 +[11:32:12] Epoch: 1 Batch: 8246/20099 (41.03%) Loss: 2.147711 LR: 0.00002593 +[11:32:14] Epoch: 1 Batch: 8247/20099 (41.03%) Loss: 2.245638 LR: 0.00002593 +[11:32:16] Epoch: 1 Batch: 8248/20099 (41.04%) Loss: 2.066290 LR: 0.00002593 +[11:32:18] Epoch: 1 Batch: 8249/20099 (41.04%) Loss: 2.386791 LR: 0.00002592 +[11:32:19] Epoch: 1 Batch: 8250/20099 (41.05%) Loss: 1.859469 LR: 0.00002592 +[11:32:21] Epoch: 1 Batch: 8251/20099 (41.05%) Loss: 1.961264 LR: 0.00002592 +[11:32:23] Epoch: 1 Batch: 8252/20099 (41.06%) Loss: 1.790578 LR: 0.00002592 +[11:32:25] Epoch: 1 Batch: 8253/20099 (41.06%) Loss: 2.089749 LR: 0.00002592 +[11:32:26] Epoch: 1 Batch: 8254/20099 (41.07%) 
Loss: 2.168762 LR: 0.00002592 +[11:32:28] Epoch: 1 Batch: 8255/20099 (41.07%) Loss: 2.102539 LR: 0.00002592 +[11:32:30] Epoch: 1 Batch: 8256/20099 (41.08%) Loss: 2.217268 LR: 0.00002591 +[11:32:32] Epoch: 1 Batch: 8257/20099 (41.08%) Loss: 1.979180 LR: 0.00002591 +[11:32:34] Epoch: 1 Batch: 8258/20099 (41.09%) Loss: 2.183273 LR: 0.00002591 +[11:32:35] Epoch: 1 Batch: 8259/20099 (41.09%) Loss: 2.354937 LR: 0.00002591 +[11:32:37] Epoch: 1 Batch: 8260/20099 (41.10%) Loss: 2.051922 LR: 0.00002591 +[11:32:39] Epoch: 1 Batch: 8261/20099 (41.10%) Loss: 1.968598 LR: 0.00002591 +[11:32:41] Epoch: 1 Batch: 8262/20099 (41.11%) Loss: 2.441431 LR: 0.00002591 +[11:32:43] Epoch: 1 Batch: 8263/20099 (41.11%) Loss: 2.032303 LR: 0.00002590 +[11:32:44] Epoch: 1 Batch: 8264/20099 (41.12%) Loss: 2.134115 LR: 0.00002590 +[11:32:46] Epoch: 1 Batch: 8265/20099 (41.12%) Loss: 2.029630 LR: 0.00002590 +[11:32:48] Epoch: 1 Batch: 8266/20099 (41.13%) Loss: 1.927977 LR: 0.00002590 +[11:32:50] Epoch: 1 Batch: 8267/20099 (41.13%) Loss: 2.078717 LR: 0.00002590 +[11:32:51] Epoch: 1 Batch: 8268/20099 (41.14%) Loss: 2.343168 LR: 0.00002590 +[11:32:53] Epoch: 1 Batch: 8269/20099 (41.14%) Loss: 2.276478 LR: 0.00002590 +[11:32:55] Epoch: 1 Batch: 8270/20099 (41.15%) Loss: 1.739197 LR: 0.00002588 +[11:32:57] Epoch: 1 Batch: 8271/20099 (41.15%) Loss: 2.223879 LR: 0.00002588 +[11:32:59] Epoch: 1 Batch: 8272/20099 (41.16%) Loss: 2.110296 LR: 0.00002588 +[11:33:00] Epoch: 1 Batch: 8273/20099 (41.16%) Loss: 2.004871 LR: 0.00002588 +[11:33:02] Epoch: 1 Batch: 8274/20099 (41.17%) Loss: 1.948287 LR: 0.00002588 +[11:33:04] Epoch: 1 Batch: 8275/20099 (41.17%) Loss: 2.202878 LR: 0.00002588 +[11:33:06] Epoch: 1 Batch: 8276/20099 (41.18%) Loss: 2.047437 LR: 0.00002588 +[11:33:08] Epoch: 1 Batch: 8277/20099 (41.18%) Loss: 1.949892 LR: 0.00002587 +[11:33:09] Epoch: 1 Batch: 8278/20099 (41.19%) Loss: 2.106364 LR: 0.00002587 +[11:33:11] Epoch: 1 Batch: 8279/20099 (41.19%) Loss: 2.011910 LR: 0.00002587 +[11:33:13] Epoch: 
1 Batch: 8280/20099 (41.20%) Loss: 2.104411 LR: 0.00002587 +[11:33:15] Epoch: 1 Batch: 8281/20099 (41.20%) Loss: 2.362146 LR: 0.00002587 +[11:33:16] Epoch: 1 Batch: 8282/20099 (41.21%) Loss: 2.035570 LR: 0.00002587 +[11:33:18] Epoch: 1 Batch: 8283/20099 (41.21%) Loss: 2.079381 LR: 0.00002587 +[11:33:20] Epoch: 1 Batch: 8284/20099 (41.22%) Loss: 2.328632 LR: 0.00002586 +[11:33:22] Epoch: 1 Batch: 8285/20099 (41.22%) Loss: 2.030283 LR: 0.00002586 +[11:33:23] Epoch: 1 Batch: 8286/20099 (41.23%) Loss: 2.045771 LR: 0.00002586 +[11:33:25] Epoch: 1 Batch: 8287/20099 (41.23%) Loss: 1.794908 LR: 0.00002586 +[11:33:27] Epoch: 1 Batch: 8288/20099 (41.24%) Loss: 2.279110 LR: 0.00002586 +[11:33:29] Epoch: 1 Batch: 8289/20099 (41.24%) Loss: 2.111758 LR: 0.00002586 +[11:33:31] Epoch: 1 Batch: 8290/20099 (41.25%) Loss: 2.477107 LR: 0.00002586 +[11:33:32] Epoch: 1 Batch: 8291/20099 (41.25%) Loss: 1.909185 LR: 0.00002585 +[11:33:34] Epoch: 1 Batch: 8292/20099 (41.26%) Loss: 1.998699 LR: 0.00002585 +[11:33:36] Epoch: 1 Batch: 8293/20099 (41.26%) Loss: 1.997839 LR: 0.00002585 +[11:33:38] Epoch: 1 Batch: 8294/20099 (41.27%) Loss: 2.187543 LR: 0.00002585 +[11:33:40] Epoch: 1 Batch: 8295/20099 (41.27%) Loss: 2.319999 LR: 0.00002585 +[11:33:41] Epoch: 1 Batch: 8296/20099 (41.28%) Loss: 2.363299 LR: 0.00002585 +[11:33:43] Epoch: 1 Batch: 8297/20099 (41.28%) Loss: 1.838425 LR: 0.00002585 +[11:33:45] Epoch: 1 Batch: 8298/20099 (41.29%) Loss: 2.144073 LR: 0.00002583 +[11:33:47] Epoch: 1 Batch: 8299/20099 (41.29%) Loss: 1.776664 LR: 0.00002583 +[11:33:48] Epoch: 1 Batch: 8300/20099 (41.30%) Loss: 2.348075 LR: 0.00002583 +[11:33:50] Epoch: 1 Batch: 8301/20099 (41.30%) Loss: 2.438646 LR: 0.00002583 +[11:33:52] Epoch: 1 Batch: 8302/20099 (41.31%) Loss: 2.087900 LR: 0.00002583 +[11:33:54] Epoch: 1 Batch: 8303/20099 (41.31%) Loss: 2.235232 LR: 0.00002583 +[11:33:56] Epoch: 1 Batch: 8304/20099 (41.32%) Loss: 2.217824 LR: 0.00002583 +[11:33:57] Epoch: 1 Batch: 8305/20099 (41.32%) Loss: 2.130119 LR: 
0.00002582 +[11:33:59] Epoch: 1 Batch: 8306/20099 (41.33%) Loss: 1.960494 LR: 0.00002582 +[11:34:01] Epoch: 1 Batch: 8307/20099 (41.33%) Loss: 1.944493 LR: 0.00002582 +[11:34:03] Epoch: 1 Batch: 8308/20099 (41.34%) Loss: 2.021588 LR: 0.00002582 +[11:34:04] Epoch: 1 Batch: 8309/20099 (41.34%) Loss: 2.081196 LR: 0.00002582 +[11:34:06] Epoch: 1 Batch: 8310/20099 (41.35%) Loss: 2.019321 LR: 0.00002582 +[11:34:08] Epoch: 1 Batch: 8311/20099 (41.35%) Loss: 2.207712 LR: 0.00002582 +[11:34:10] Epoch: 1 Batch: 8312/20099 (41.36%) Loss: 2.043094 LR: 0.00002581 +[11:34:12] Epoch: 1 Batch: 8313/20099 (41.36%) Loss: 1.935134 LR: 0.00002581 +[11:34:13] Epoch: 1 Batch: 8314/20099 (41.37%) Loss: 2.127048 LR: 0.00002581 +[11:34:15] Epoch: 1 Batch: 8315/20099 (41.37%) Loss: 2.136548 LR: 0.00002581 +[11:34:17] Epoch: 1 Batch: 8316/20099 (41.38%) Loss: 2.417741 LR: 0.00002581 +[11:34:19] Epoch: 1 Batch: 8317/20099 (41.38%) Loss: 1.796977 LR: 0.00002581 +[11:34:20] Epoch: 1 Batch: 8318/20099 (41.39%) Loss: 2.099362 LR: 0.00002581 +[11:34:22] Epoch: 1 Batch: 8319/20099 (41.39%) Loss: 2.100381 LR: 0.00002580 +[11:34:24] Epoch: 1 Batch: 8320/20099 (41.40%) Loss: 2.255765 LR: 0.00002580 +[11:34:26] Epoch: 1 Batch: 8321/20099 (41.40%) Loss: 1.894671 LR: 0.00002580 +[11:34:28] Epoch: 1 Batch: 8322/20099 (41.41%) Loss: 2.047995 LR: 0.00002580 +[11:34:29] Epoch: 1 Batch: 8323/20099 (41.41%) Loss: 2.025865 LR: 0.00002580 +[11:34:31] Epoch: 1 Batch: 8324/20099 (41.41%) Loss: 2.054902 LR: 0.00002580 +[11:34:33] Epoch: 1 Batch: 8325/20099 (41.42%) Loss: 2.210465 LR: 0.00002580 +[11:34:35] Epoch: 1 Batch: 8326/20099 (41.42%) Loss: 2.111426 LR: 0.00002578 +[11:34:36] Epoch: 1 Batch: 8327/20099 (41.43%) Loss: 1.940457 LR: 0.00002578 +[11:34:38] Epoch: 1 Batch: 8328/20099 (41.43%) Loss: 2.161345 LR: 0.00002578 +[11:34:40] Epoch: 1 Batch: 8329/20099 (41.44%) Loss: 1.977554 LR: 0.00002578 +[11:34:42] Epoch: 1 Batch: 8330/20099 (41.44%) Loss: 2.280603 LR: 0.00002578 +[11:34:44] Epoch: 1 Batch: 8331/20099 
(41.45%) Loss: 2.198880 LR: 0.00002578 +[11:34:45] Epoch: 1 Batch: 8332/20099 (41.45%) Loss: 2.234639 LR: 0.00002578 +[11:34:47] Epoch: 1 Batch: 8333/20099 (41.46%) Loss: 2.143510 LR: 0.00002577 +[11:34:49] Epoch: 1 Batch: 8334/20099 (41.46%) Loss: 2.069878 LR: 0.00002577 +[11:34:51] Epoch: 1 Batch: 8335/20099 (41.47%) Loss: 1.938427 LR: 0.00002577 +[11:34:52] Epoch: 1 Batch: 8336/20099 (41.47%) Loss: 1.909236 LR: 0.00002577 +[11:34:54] Epoch: 1 Batch: 8337/20099 (41.48%) Loss: 2.240653 LR: 0.00002577 +[11:34:56] Epoch: 1 Batch: 8338/20099 (41.48%) Loss: 2.166502 LR: 0.00002577 +[11:34:58] Epoch: 1 Batch: 8339/20099 (41.49%) Loss: 1.838065 LR: 0.00002577 +[11:35:00] Epoch: 1 Batch: 8340/20099 (41.49%) Loss: 2.241515 LR: 0.00002576 +[11:35:01] Epoch: 1 Batch: 8341/20099 (41.50%) Loss: 2.158964 LR: 0.00002576 +[11:35:03] Epoch: 1 Batch: 8342/20099 (41.50%) Loss: 2.317140 LR: 0.00002576 +[11:35:05] Epoch: 1 Batch: 8343/20099 (41.51%) Loss: 1.841415 LR: 0.00002576 +[11:35:07] Epoch: 1 Batch: 8344/20099 (41.51%) Loss: 2.312675 LR: 0.00002576 +[11:35:08] Epoch: 1 Batch: 8345/20099 (41.52%) Loss: 2.040964 LR: 0.00002576 +[11:35:10] Epoch: 1 Batch: 8346/20099 (41.52%) Loss: 1.698634 LR: 0.00002576 +[11:35:12] Epoch: 1 Batch: 8347/20099 (41.53%) Loss: 1.931397 LR: 0.00002575 +[11:35:14] Epoch: 1 Batch: 8348/20099 (41.53%) Loss: 2.189879 LR: 0.00002575 +[11:35:16] Epoch: 1 Batch: 8349/20099 (41.54%) Loss: 2.087610 LR: 0.00002575 +[11:35:17] Epoch: 1 Batch: 8350/20099 (41.54%) Loss: 2.166534 LR: 0.00002575 +[11:35:19] Epoch: 1 Batch: 8351/20099 (41.55%) Loss: 1.921308 LR: 0.00002575 +[11:35:21] Epoch: 1 Batch: 8352/20099 (41.55%) Loss: 1.497729 LR: 0.00002575 +[11:35:23] Epoch: 1 Batch: 8353/20099 (41.56%) Loss: 2.069600 LR: 0.00002575 +[11:35:25] Epoch: 1 Batch: 8354/20099 (41.56%) Loss: 2.120371 LR: 0.00002573 +[11:35:26] Epoch: 1 Batch: 8355/20099 (41.57%) Loss: 2.131090 LR: 0.00002573 +[11:35:28] Epoch: 1 Batch: 8356/20099 (41.57%) Loss: 2.483771 LR: 0.00002573 
+[11:35:30] Epoch: 1 Batch: 8357/20099 (41.58%) Loss: 2.265750 LR: 0.00002573 +[11:35:32] Epoch: 1 Batch: 8358/20099 (41.58%) Loss: 2.274986 LR: 0.00002573 +[11:35:33] Epoch: 1 Batch: 8359/20099 (41.59%) Loss: 2.300013 LR: 0.00002573 +[11:35:35] Epoch: 1 Batch: 8360/20099 (41.59%) Loss: 2.208018 LR: 0.00002573 +[11:35:37] Epoch: 1 Batch: 8361/20099 (41.60%) Loss: 2.316407 LR: 0.00002572 +[11:35:39] Epoch: 1 Batch: 8362/20099 (41.60%) Loss: 2.380417 LR: 0.00002572 +[11:35:41] Epoch: 1 Batch: 8363/20099 (41.61%) Loss: 2.100675 LR: 0.00002572 +[11:35:42] Epoch: 1 Batch: 8364/20099 (41.61%) Loss: 2.304286 LR: 0.00002572 +[11:35:44] Epoch: 1 Batch: 8365/20099 (41.62%) Loss: 2.019868 LR: 0.00002572 +[11:35:46] Epoch: 1 Batch: 8366/20099 (41.62%) Loss: 1.876629 LR: 0.00002572 +[11:35:48] Epoch: 1 Batch: 8367/20099 (41.63%) Loss: 2.073965 LR: 0.00002572 +[11:35:49] Epoch: 1 Batch: 8368/20099 (41.63%) Loss: 2.215035 LR: 0.00002571 +[11:35:51] Epoch: 1 Batch: 8369/20099 (41.64%) Loss: 1.950903 LR: 0.00002571 +[11:35:53] Epoch: 1 Batch: 8370/20099 (41.64%) Loss: 2.225978 LR: 0.00002571 +[11:35:55] Epoch: 1 Batch: 8371/20099 (41.65%) Loss: 2.049452 LR: 0.00002571 +[11:35:57] Epoch: 1 Batch: 8372/20099 (41.65%) Loss: 2.098077 LR: 0.00002571 +[11:35:58] Epoch: 1 Batch: 8373/20099 (41.66%) Loss: 2.496556 LR: 0.00002571 +[11:36:00] Epoch: 1 Batch: 8374/20099 (41.66%) Loss: 2.193581 LR: 0.00002571 +[11:36:02] Epoch: 1 Batch: 8375/20099 (41.67%) Loss: 2.276154 LR: 0.00002570 +[11:36:04] Epoch: 1 Batch: 8376/20099 (41.67%) Loss: 2.194686 LR: 0.00002570 +[11:36:05] Epoch: 1 Batch: 8377/20099 (41.68%) Loss: 2.273958 LR: 0.00002570 +[11:36:07] Epoch: 1 Batch: 8378/20099 (41.68%) Loss: 2.111023 LR: 0.00002570 +[11:36:09] Epoch: 1 Batch: 8379/20099 (41.69%) Loss: 1.934123 LR: 0.00002570 +[11:36:11] Epoch: 1 Batch: 8380/20099 (41.69%) Loss: 2.115332 LR: 0.00002570 +[11:36:12] Epoch: 1 Batch: 8381/20099 (41.70%) Loss: 2.146709 LR: 0.00002570 +[11:36:14] Epoch: 1 Batch: 8382/20099 (41.70%) 
Loss: 2.267504 LR: 0.00002569 +[11:36:16] Epoch: 1 Batch: 8383/20099 (41.71%) Loss: 2.260318 LR: 0.00002569 +[11:36:18] Epoch: 1 Batch: 8384/20099 (41.71%) Loss: 2.566031 LR: 0.00002569 +[11:36:20] Epoch: 1 Batch: 8385/20099 (41.72%) Loss: 2.083444 LR: 0.00002569 +[11:36:21] Epoch: 1 Batch: 8386/20099 (41.72%) Loss: 2.202050 LR: 0.00002569 +[11:36:23] Epoch: 1 Batch: 8387/20099 (41.73%) Loss: 1.852112 LR: 0.00002569 +[11:36:25] Epoch: 1 Batch: 8388/20099 (41.73%) Loss: 1.968212 LR: 0.00002569 +[11:36:27] Epoch: 1 Batch: 8389/20099 (41.74%) Loss: 2.033763 LR: 0.00002567 +[11:36:29] Epoch: 1 Batch: 8390/20099 (41.74%) Loss: 2.027413 LR: 0.00002567 +[11:36:30] Epoch: 1 Batch: 8391/20099 (41.75%) Loss: 2.031100 LR: 0.00002567 +[11:36:32] Epoch: 1 Batch: 8392/20099 (41.75%) Loss: 2.170081 LR: 0.00002567 +[11:36:34] Epoch: 1 Batch: 8393/20099 (41.76%) Loss: 2.148518 LR: 0.00002567 +[11:36:36] Epoch: 1 Batch: 8394/20099 (41.76%) Loss: 2.034337 LR: 0.00002567 +[11:36:37] Epoch: 1 Batch: 8395/20099 (41.77%) Loss: 2.391926 LR: 0.00002567 +[11:36:39] Epoch: 1 Batch: 8396/20099 (41.77%) Loss: 2.413854 LR: 0.00002566 +[11:36:41] Epoch: 1 Batch: 8397/20099 (41.78%) Loss: 2.125856 LR: 0.00002566 +[11:36:43] Epoch: 1 Batch: 8398/20099 (41.78%) Loss: 2.021636 LR: 0.00002566 +[11:36:45] Epoch: 1 Batch: 8399/20099 (41.79%) Loss: 1.933128 LR: 0.00002566 +[11:36:50] >> Cleaned up old temp checkpoint: epoch1_step6400 +[11:36:50] >> Temp checkpoint saved: epoch1_step8400, size: 0.1693 GB +[11:36:50] Epoch: 1 Batch: 8400/20099 (41.79%) Loss: 2.135387 LR: 0.00002566 +[11:36:52] Epoch: 1 Batch: 8401/20099 (41.80%) Loss: 2.146113 LR: 0.00002566 +[11:36:53] Epoch: 1 Batch: 8402/20099 (41.80%) Loss: 1.927920 LR: 0.00002566 +[11:36:55] Epoch: 1 Batch: 8403/20099 (41.81%) Loss: 2.080199 LR: 0.00002565 +[11:36:57] Epoch: 1 Batch: 8404/20099 (41.81%) Loss: 2.045110 LR: 0.00002565 +[11:36:59] Epoch: 1 Batch: 8405/20099 (41.82%) Loss: 1.747438 LR: 0.00002565 +[11:37:00] Epoch: 1 Batch: 8406/20099 
(41.82%) Loss: 2.329950 LR: 0.00002565 +[11:37:02] Epoch: 1 Batch: 8407/20099 (41.83%) Loss: 1.695412 LR: 0.00002565 +[11:37:04] Epoch: 1 Batch: 8408/20099 (41.83%) Loss: 2.111331 LR: 0.00002565 +[11:37:06] Epoch: 1 Batch: 8409/20099 (41.84%) Loss: 2.257595 LR: 0.00002565 +[11:37:08] Epoch: 1 Batch: 8410/20099 (41.84%) Loss: 2.316241 LR: 0.00002564 +[11:37:09] Epoch: 1 Batch: 8411/20099 (41.85%) Loss: 1.885411 LR: 0.00002564 +[11:37:11] Epoch: 1 Batch: 8412/20099 (41.85%) Loss: 2.204722 LR: 0.00002564 +[11:37:13] Epoch: 1 Batch: 8413/20099 (41.86%) Loss: 2.527086 LR: 0.00002564 +[11:37:15] Epoch: 1 Batch: 8414/20099 (41.86%) Loss: 1.930007 LR: 0.00002564 +[11:37:17] Epoch: 1 Batch: 8415/20099 (41.87%) Loss: 2.023625 LR: 0.00002564 +[11:37:18] Epoch: 1 Batch: 8416/20099 (41.87%) Loss: 2.303977 LR: 0.00002564 +[11:37:20] Epoch: 1 Batch: 8417/20099 (41.88%) Loss: 2.179839 LR: 0.00002562 +[11:37:22] Epoch: 1 Batch: 8418/20099 (41.88%) Loss: 2.179608 LR: 0.00002562 +[11:37:24] Epoch: 1 Batch: 8419/20099 (41.89%) Loss: 2.093322 LR: 0.00002562 +[11:37:26] Epoch: 1 Batch: 8420/20099 (41.89%) Loss: 2.109544 LR: 0.00002562 +[11:37:27] Epoch: 1 Batch: 8421/20099 (41.90%) Loss: 2.187616 LR: 0.00002562 +[11:37:29] Epoch: 1 Batch: 8422/20099 (41.90%) Loss: 1.768944 LR: 0.00002562 +[11:37:31] Epoch: 1 Batch: 8423/20099 (41.91%) Loss: 2.315255 LR: 0.00002562 +[11:37:33] Epoch: 1 Batch: 8424/20099 (41.91%) Loss: 2.246185 LR: 0.00002561 +[11:37:35] Epoch: 1 Batch: 8425/20099 (41.92%) Loss: 2.195369 LR: 0.00002561 +[11:37:36] Epoch: 1 Batch: 8426/20099 (41.92%) Loss: 2.090518 LR: 0.00002561 +[11:37:38] Epoch: 1 Batch: 8427/20099 (41.93%) Loss: 1.947975 LR: 0.00002561 +[11:37:40] Epoch: 1 Batch: 8428/20099 (41.93%) Loss: 1.971010 LR: 0.00002561 +[11:37:42] Epoch: 1 Batch: 8429/20099 (41.94%) Loss: 2.299153 LR: 0.00002561 +[11:37:43] Epoch: 1 Batch: 8430/20099 (41.94%) Loss: 2.261442 LR: 0.00002561 +[11:37:45] Epoch: 1 Batch: 8431/20099 (41.95%) Loss: 1.807941 LR: 0.00002560 
+[11:37:47] Epoch: 1 Batch: 8432/20099 (41.95%) Loss: 2.288074 LR: 0.00002560 +[11:37:49] Epoch: 1 Batch: 8433/20099 (41.96%) Loss: 1.866800 LR: 0.00002560 +[11:37:50] Epoch: 1 Batch: 8434/20099 (41.96%) Loss: 2.068439 LR: 0.00002560 +[11:37:52] Epoch: 1 Batch: 8435/20099 (41.97%) Loss: 1.894861 LR: 0.00002560 +[11:37:54] Epoch: 1 Batch: 8436/20099 (41.97%) Loss: 2.248547 LR: 0.00002560 +[11:37:56] Epoch: 1 Batch: 8437/20099 (41.98%) Loss: 2.037016 LR: 0.00002560 +[11:37:58] Epoch: 1 Batch: 8438/20099 (41.98%) Loss: 2.171897 LR: 0.00002558 +[11:37:59] Epoch: 1 Batch: 8439/20099 (41.99%) Loss: 1.719218 LR: 0.00002558 +[11:38:01] Epoch: 1 Batch: 8440/20099 (41.99%) Loss: 2.205342 LR: 0.00002558 +[11:38:03] Epoch: 1 Batch: 8441/20099 (42.00%) Loss: 2.079671 LR: 0.00002558 +[11:38:05] Epoch: 1 Batch: 8442/20099 (42.00%) Loss: 2.129334 LR: 0.00002558 +[11:38:06] Epoch: 1 Batch: 8443/20099 (42.01%) Loss: 1.992893 LR: 0.00002558 +[11:38:08] Epoch: 1 Batch: 8444/20099 (42.01%) Loss: 2.037917 LR: 0.00002558 +[11:38:10] Epoch: 1 Batch: 8445/20099 (42.02%) Loss: 1.692624 LR: 0.00002557 +[11:38:12] Epoch: 1 Batch: 8446/20099 (42.02%) Loss: 2.036050 LR: 0.00002557 +[11:38:13] Epoch: 1 Batch: 8447/20099 (42.03%) Loss: 2.151200 LR: 0.00002557 +[11:38:15] Epoch: 1 Batch: 8448/20099 (42.03%) Loss: 2.420434 LR: 0.00002557 +[11:38:17] Epoch: 1 Batch: 8449/20099 (42.04%) Loss: 2.120061 LR: 0.00002557 +[11:38:19] Epoch: 1 Batch: 8450/20099 (42.04%) Loss: 2.041980 LR: 0.00002557 +[11:38:21] Epoch: 1 Batch: 8451/20099 (42.05%) Loss: 2.481949 LR: 0.00002557 +[11:38:22] Epoch: 1 Batch: 8452/20099 (42.05%) Loss: 2.106501 LR: 0.00002556 +[11:38:24] Epoch: 1 Batch: 8453/20099 (42.06%) Loss: 1.986683 LR: 0.00002556 +[11:38:26] Epoch: 1 Batch: 8454/20099 (42.06%) Loss: 2.070791 LR: 0.00002556 +[11:38:28] Epoch: 1 Batch: 8455/20099 (42.07%) Loss: 1.831435 LR: 0.00002556 +[11:38:30] Epoch: 1 Batch: 8456/20099 (42.07%) Loss: 2.198513 LR: 0.00002556 +[11:38:31] Epoch: 1 Batch: 8457/20099 (42.08%) 
Loss: 2.219378 LR: 0.00002556 +[11:38:33] Epoch: 1 Batch: 8458/20099 (42.08%) Loss: 2.016115 LR: 0.00002556 +[11:38:35] Epoch: 1 Batch: 8459/20099 (42.09%) Loss: 2.088208 LR: 0.00002555 +[11:38:37] Epoch: 1 Batch: 8460/20099 (42.09%) Loss: 2.444727 LR: 0.00002555 +[11:38:38] Epoch: 1 Batch: 8461/20099 (42.10%) Loss: 1.970284 LR: 0.00002555 +[11:38:40] Epoch: 1 Batch: 8462/20099 (42.10%) Loss: 1.962031 LR: 0.00002555 +[11:38:42] Epoch: 1 Batch: 8463/20099 (42.11%) Loss: 2.339386 LR: 0.00002555 +[11:38:44] Epoch: 1 Batch: 8464/20099 (42.11%) Loss: 2.011597 LR: 0.00002555 +[11:38:46] Epoch: 1 Batch: 8465/20099 (42.12%) Loss: 2.159337 LR: 0.00002555 +[11:38:47] Epoch: 1 Batch: 8466/20099 (42.12%) Loss: 1.575574 LR: 0.00002553 +[11:38:49] Epoch: 1 Batch: 8467/20099 (42.13%) Loss: 1.765136 LR: 0.00002553 +[11:38:51] Epoch: 1 Batch: 8468/20099 (42.13%) Loss: 2.105543 LR: 0.00002553 +[11:38:53] Epoch: 1 Batch: 8469/20099 (42.14%) Loss: 2.307961 LR: 0.00002553 +[11:38:54] Epoch: 1 Batch: 8470/20099 (42.14%) Loss: 2.099694 LR: 0.00002553 +[11:38:56] Epoch: 1 Batch: 8471/20099 (42.15%) Loss: 2.275409 LR: 0.00002553 +[11:38:58] Epoch: 1 Batch: 8472/20099 (42.15%) Loss: 2.397203 LR: 0.00002553 +[11:39:00] Epoch: 1 Batch: 8473/20099 (42.16%) Loss: 2.136112 LR: 0.00002552 +[11:39:02] Epoch: 1 Batch: 8474/20099 (42.16%) Loss: 2.058021 LR: 0.00002552 +[11:39:03] Epoch: 1 Batch: 8475/20099 (42.17%) Loss: 2.098475 LR: 0.00002552 +[11:39:05] Epoch: 1 Batch: 8476/20099 (42.17%) Loss: 2.087186 LR: 0.00002552 +[11:39:07] Epoch: 1 Batch: 8477/20099 (42.18%) Loss: 1.549926 LR: 0.00002552 +[11:39:09] Epoch: 1 Batch: 8478/20099 (42.18%) Loss: 2.008648 LR: 0.00002552 +[11:39:10] Epoch: 1 Batch: 8479/20099 (42.19%) Loss: 2.333879 LR: 0.00002552 +[11:39:12] Epoch: 1 Batch: 8480/20099 (42.19%) Loss: 2.150656 LR: 0.00002551 +[11:39:14] Epoch: 1 Batch: 8481/20099 (42.20%) Loss: 2.289738 LR: 0.00002551 +[11:39:16] Epoch: 1 Batch: 8482/20099 (42.20%) Loss: 2.222258 LR: 0.00002551 +[11:39:18] Epoch: 
1 Batch: 8483/20099 (42.21%) Loss: 2.175740 LR: 0.00002551 +[11:39:19] Epoch: 1 Batch: 8484/20099 (42.21%) Loss: 2.311911 LR: 0.00002551 +[11:39:21] Epoch: 1 Batch: 8485/20099 (42.22%) Loss: 1.973220 LR: 0.00002551 +[11:39:23] Epoch: 1 Batch: 8486/20099 (42.22%) Loss: 2.091789 LR: 0.00002551 +[11:39:25] Epoch: 1 Batch: 8487/20099 (42.23%) Loss: 2.123166 LR: 0.00002550 +[11:39:26] Epoch: 1 Batch: 8488/20099 (42.23%) Loss: 1.958299 LR: 0.00002550 +[11:39:28] Epoch: 1 Batch: 8489/20099 (42.24%) Loss: 2.122168 LR: 0.00002550 +[11:39:30] Epoch: 1 Batch: 8490/20099 (42.24%) Loss: 2.057658 LR: 0.00002550 +[11:39:32] Epoch: 1 Batch: 8491/20099 (42.25%) Loss: 2.021213 LR: 0.00002550 +[11:39:33] Epoch: 1 Batch: 8492/20099 (42.25%) Loss: 2.194717 LR: 0.00002550 +[11:39:35] Epoch: 1 Batch: 8493/20099 (42.26%) Loss: 2.023645 LR: 0.00002550 +[11:39:37] Epoch: 1 Batch: 8494/20099 (42.26%) Loss: 2.238142 LR: 0.00002548 +[11:39:39] Epoch: 1 Batch: 8495/20099 (42.27%) Loss: 2.036112 LR: 0.00002548 +[11:39:41] Epoch: 1 Batch: 8496/20099 (42.27%) Loss: 2.241148 LR: 0.00002548 +[11:39:42] Epoch: 1 Batch: 8497/20099 (42.28%) Loss: 1.967391 LR: 0.00002548 +[11:39:44] Epoch: 1 Batch: 8498/20099 (42.28%) Loss: 2.068941 LR: 0.00002548 +[11:39:46] Epoch: 1 Batch: 8499/20099 (42.29%) Loss: 2.199587 LR: 0.00002548 +[11:39:48] >> Evaluating batch 0 +[11:39:49] >> Evaluating batch 1 +[11:39:50] >> Evaluating batch 2 +[11:39:51] >> Evaluating batch 3 +[11:39:52] >> Evaluating batch 4 +[11:39:53] >> Evaluating batch 5 +[11:39:54] >> Evaluating batch 6 +[11:39:55] >> Evaluating batch 7 +[11:39:56] >> Evaluating batch 8 +[11:39:57] >> Evaluating batch 9 +[11:39:58] >> Evaluating batch 10 +[11:39:59] >> Evaluating batch 11 +[11:40:00] >> Evaluating batch 12 +[11:40:01] >> Evaluating batch 13 +[11:40:02] >> Evaluating batch 14 +[11:40:02] >> Evaluating batch 15 +[11:40:03] >> Evaluating batch 16 +[11:40:04] Epoch: 1 Step: 8500/20099 Evaluation: +[11:40:04] Avg Loss Since Last Eval: 2.1053 Val Loss:
2.1758 Validation loss delta: -0.0012 Perplexity: 8.8091 LR: 0.00002548 +[11:40:08] >> Checkpoint saved: epoch1_step8500, size: 0.1693 GB +[11:40:08] Epoch: 1 Batch: 8500/20099 (42.29%) Loss: 2.226770 LR: 0.00002548 +[11:40:09] Epoch: 1 Batch: 8501/20099 (42.30%) Loss: 2.055083 LR: 0.00002547 +[11:40:11] Epoch: 1 Batch: 8502/20099 (42.30%) Loss: 2.110502 LR: 0.00002547 +[11:40:13] Epoch: 1 Batch: 8503/20099 (42.31%) Loss: 2.077309 LR: 0.00002547 +[11:40:15] Epoch: 1 Batch: 8504/20099 (42.31%) Loss: 1.603479 LR: 0.00002547 +[11:40:16] Epoch: 1 Batch: 8505/20099 (42.32%) Loss: 2.289458 LR: 0.00002547 +[11:40:18] Epoch: 1 Batch: 8506/20099 (42.32%) Loss: 2.025167 LR: 0.00002547 +[11:40:20] Epoch: 1 Batch: 8507/20099 (42.33%) Loss: 1.626361 LR: 0.00002547 +[11:40:22] Epoch: 1 Batch: 8508/20099 (42.33%) Loss: 1.991895 LR: 0.00002546 +[11:40:23] Epoch: 1 Batch: 8509/20099 (42.34%) Loss: 2.229913 LR: 0.00002546 +[11:40:25] Epoch: 1 Batch: 8510/20099 (42.34%) Loss: 1.980468 LR: 0.00002546 +[11:40:27] Epoch: 1 Batch: 8511/20099 (42.35%) Loss: 2.208723 LR: 0.00002546 +[11:40:29] Epoch: 1 Batch: 8512/20099 (42.35%) Loss: 2.088488 LR: 0.00002546 +[11:40:31] Epoch: 1 Batch: 8513/20099 (42.36%) Loss: 2.047435 LR: 0.00002546 +[11:40:33] Epoch: 1 Batch: 8514/20099 (42.36%) Loss: 2.281674 LR: 0.00002546 +[11:40:34] Epoch: 1 Batch: 8515/20099 (42.37%) Loss: 1.859825 LR: 0.00002545 +[11:40:36] Epoch: 1 Batch: 8516/20099 (42.37%) Loss: 2.385668 LR: 0.00002545 +[11:40:38] Epoch: 1 Batch: 8517/20099 (42.38%) Loss: 1.971903 LR: 0.00002545 +[11:40:40] Epoch: 1 Batch: 8518/20099 (42.38%) Loss: 1.972402 LR: 0.00002545 +[11:40:42] Epoch: 1 Batch: 8519/20099 (42.39%) Loss: 1.664523 LR: 0.00002545 +[11:40:43] Epoch: 1 Batch: 8520/20099 (42.39%) Loss: 1.535823 LR: 0.00002545 +[11:40:45] Epoch: 1 Batch: 8521/20099 (42.40%) Loss: 1.912231 LR: 0.00002545 +[11:40:47] Epoch: 1 Batch: 8522/20099 (42.40%) Loss: 2.273691 LR: 0.00002543 +[11:40:49] Epoch: 1 Batch: 8523/20099 (42.41%) Loss: 2.132195 LR: 
0.00002543 +[11:40:51] Epoch: 1 Batch: 8524/20099 (42.41%) Loss: 2.121899 LR: 0.00002543 +[11:40:52] Epoch: 1 Batch: 8525/20099 (42.42%) Loss: 2.294261 LR: 0.00002543 +[11:40:54] Epoch: 1 Batch: 8526/20099 (42.42%) Loss: 2.072601 LR: 0.00002543 +[11:40:56] Epoch: 1 Batch: 8527/20099 (42.42%) Loss: 2.205577 LR: 0.00002543 +[11:40:58] Epoch: 1 Batch: 8528/20099 (42.43%) Loss: 1.931711 LR: 0.00002543 +[11:40:59] Epoch: 1 Batch: 8529/20099 (42.43%) Loss: 2.159572 LR: 0.00002542 +[11:41:01] Epoch: 1 Batch: 8530/20099 (42.44%) Loss: 1.947933 LR: 0.00002542 +[11:41:03] Epoch: 1 Batch: 8531/20099 (42.44%) Loss: 2.356861 LR: 0.00002542 +[11:41:05] Epoch: 1 Batch: 8532/20099 (42.45%) Loss: 2.129709 LR: 0.00002542 +[11:41:06] Epoch: 1 Batch: 8533/20099 (42.45%) Loss: 2.025892 LR: 0.00002542 +[11:41:08] Epoch: 1 Batch: 8534/20099 (42.46%) Loss: 2.348033 LR: 0.00002542 +[11:41:10] Epoch: 1 Batch: 8535/20099 (42.46%) Loss: 2.088488 LR: 0.00002542 +[11:41:12] Epoch: 1 Batch: 8536/20099 (42.47%) Loss: 2.191310 LR: 0.00002541 +[11:41:13] Epoch: 1 Batch: 8537/20099 (42.47%) Loss: 2.397079 LR: 0.00002541 +[11:41:15] Epoch: 1 Batch: 8538/20099 (42.48%) Loss: 2.136764 LR: 0.00002541 +[11:41:17] Epoch: 1 Batch: 8539/20099 (42.48%) Loss: 2.045951 LR: 0.00002541 +[11:41:19] Epoch: 1 Batch: 8540/20099 (42.49%) Loss: 2.173259 LR: 0.00002541 +[11:41:21] Epoch: 1 Batch: 8541/20099 (42.49%) Loss: 2.107357 LR: 0.00002541 +[11:41:22] Epoch: 1 Batch: 8542/20099 (42.50%) Loss: 2.171087 LR: 0.00002541 +[11:41:24] Epoch: 1 Batch: 8543/20099 (42.50%) Loss: 1.919832 LR: 0.00002539 +[11:41:26] Epoch: 1 Batch: 8544/20099 (42.51%) Loss: 2.023543 LR: 0.00002539 +[11:41:28] Epoch: 1 Batch: 8545/20099 (42.51%) Loss: 2.144823 LR: 0.00002539 +[11:41:29] Epoch: 1 Batch: 8546/20099 (42.52%) Loss: 1.947392 LR: 0.00002539 +[11:41:31] Epoch: 1 Batch: 8547/20099 (42.52%) Loss: 2.120265 LR: 0.00002539 +[11:41:33] Epoch: 1 Batch: 8548/20099 (42.53%) Loss: 2.246254 LR: 0.00002539 +[11:41:35] Epoch: 1 Batch: 8549/20099 
(42.53%) Loss: 2.358902 LR: 0.00002539 +[11:41:36] Epoch: 1 Batch: 8550/20099 (42.54%) Loss: 2.277602 LR: 0.00002538 +[11:41:38] Epoch: 1 Batch: 8551/20099 (42.54%) Loss: 2.060599 LR: 0.00002538 +[11:41:40] Epoch: 1 Batch: 8552/20099 (42.55%) Loss: 1.908423 LR: 0.00002538 +[11:41:42] Epoch: 1 Batch: 8553/20099 (42.55%) Loss: 1.776831 LR: 0.00002538 +[11:41:44] Epoch: 1 Batch: 8554/20099 (42.56%) Loss: 2.154840 LR: 0.00002538 +[11:41:45] Epoch: 1 Batch: 8555/20099 (42.56%) Loss: 2.037282 LR: 0.00002538 +[11:41:47] Epoch: 1 Batch: 8556/20099 (42.57%) Loss: 2.033813 LR: 0.00002538 +[11:41:49] Epoch: 1 Batch: 8557/20099 (42.57%) Loss: 2.467761 LR: 0.00002537 +[11:41:51] Epoch: 1 Batch: 8558/20099 (42.58%) Loss: 1.687747 LR: 0.00002537 +[11:41:52] Epoch: 1 Batch: 8559/20099 (42.58%) Loss: 2.349386 LR: 0.00002537 +[11:41:54] Epoch: 1 Batch: 8560/20099 (42.59%) Loss: 2.244714 LR: 0.00002537 +[11:41:56] Epoch: 1 Batch: 8561/20099 (42.59%) Loss: 2.199562 LR: 0.00002537 +[11:41:58] Epoch: 1 Batch: 8562/20099 (42.60%) Loss: 1.936094 LR: 0.00002537 +[11:42:00] Epoch: 1 Batch: 8563/20099 (42.60%) Loss: 1.812430 LR: 0.00002537 +[11:42:01] Epoch: 1 Batch: 8564/20099 (42.61%) Loss: 2.126463 LR: 0.00002536 +[11:42:03] Epoch: 1 Batch: 8565/20099 (42.61%) Loss: 2.266934 LR: 0.00002536 +[11:42:05] Epoch: 1 Batch: 8566/20099 (42.62%) Loss: 2.285975 LR: 0.00002536 +[11:42:07] Epoch: 1 Batch: 8567/20099 (42.62%) Loss: 2.363725 LR: 0.00002536 +[11:42:09] Epoch: 1 Batch: 8568/20099 (42.63%) Loss: 1.907679 LR: 0.00002536 +[11:42:10] Epoch: 1 Batch: 8569/20099 (42.63%) Loss: 2.078296 LR: 0.00002536 +[11:42:12] Epoch: 1 Batch: 8570/20099 (42.64%) Loss: 1.833547 LR: 0.00002536 +[11:42:14] Epoch: 1 Batch: 8571/20099 (42.64%) Loss: 2.315106 LR: 0.00002534 +[11:42:16] Epoch: 1 Batch: 8572/20099 (42.65%) Loss: 2.023692 LR: 0.00002534 +[11:42:17] Epoch: 1 Batch: 8573/20099 (42.65%) Loss: 2.152697 LR: 0.00002534 +[11:42:19] Epoch: 1 Batch: 8574/20099 (42.66%) Loss: 2.287580 LR: 0.00002534 
+[11:42:21] Epoch: 1 Batch: 8575/20099 (42.66%) Loss: 2.201092 LR: 0.00002534 +[11:42:23] Epoch: 1 Batch: 8576/20099 (42.67%) Loss: 1.964422 LR: 0.00002534 +[11:42:24] Epoch: 1 Batch: 8577/20099 (42.67%) Loss: 1.683542 LR: 0.00002534 +[11:42:26] Epoch: 1 Batch: 8578/20099 (42.68%) Loss: 1.966780 LR: 0.00002533 +[11:42:28] Epoch: 1 Batch: 8579/20099 (42.68%) Loss: 1.915959 LR: 0.00002533 +[11:42:30] Epoch: 1 Batch: 8580/20099 (42.69%) Loss: 1.956283 LR: 0.00002533 +[11:42:32] Epoch: 1 Batch: 8581/20099 (42.69%) Loss: 2.216053 LR: 0.00002533 +[11:42:33] Epoch: 1 Batch: 8582/20099 (42.70%) Loss: 2.547195 LR: 0.00002533 +[11:42:35] Epoch: 1 Batch: 8583/20099 (42.70%) Loss: 2.067507 LR: 0.00002533 +[11:42:37] Epoch: 1 Batch: 8584/20099 (42.71%) Loss: 1.890233 LR: 0.00002533 +[11:42:39] Epoch: 1 Batch: 8585/20099 (42.71%) Loss: 2.322392 LR: 0.00002532 +[11:42:40] Epoch: 1 Batch: 8586/20099 (42.72%) Loss: 2.068478 LR: 0.00002532 +[11:42:42] Epoch: 1 Batch: 8587/20099 (42.72%) Loss: 2.115974 LR: 0.00002532 +[11:42:44] Epoch: 1 Batch: 8588/20099 (42.73%) Loss: 2.226666 LR: 0.00002532 +[11:42:46] Epoch: 1 Batch: 8589/20099 (42.73%) Loss: 2.116151 LR: 0.00002532 +[11:42:47] Epoch: 1 Batch: 8590/20099 (42.74%) Loss: 2.033291 LR: 0.00002532 +[11:42:49] Epoch: 1 Batch: 8591/20099 (42.74%) Loss: 2.053727 LR: 0.00002532 +[11:42:51] Epoch: 1 Batch: 8592/20099 (42.75%) Loss: 2.059513 LR: 0.00002530 +[11:42:53] Epoch: 1 Batch: 8593/20099 (42.75%) Loss: 1.838191 LR: 0.00002530 +[11:42:55] Epoch: 1 Batch: 8594/20099 (42.76%) Loss: 2.229143 LR: 0.00002530 +[11:42:56] Epoch: 1 Batch: 8595/20099 (42.76%) Loss: 2.222639 LR: 0.00002530 +[11:42:58] Epoch: 1 Batch: 8596/20099 (42.77%) Loss: 2.135159 LR: 0.00002530 +[11:43:00] Epoch: 1 Batch: 8597/20099 (42.77%) Loss: 1.994958 LR: 0.00002530 +[11:43:02] Epoch: 1 Batch: 8598/20099 (42.78%) Loss: 1.803015 LR: 0.00002530 +[11:43:03] Epoch: 1 Batch: 8599/20099 (42.78%) Loss: 1.964717 LR: 0.00002529 +[11:43:09] >> Cleaned up old temp checkpoint: 
epoch1_step6600 +[11:43:09] >> Temp checkpoint saved: epoch1_step8600, size: 0.1693 GB +[11:43:09] Epoch: 1 Batch: 8600/20099 (42.79%) Loss: 2.135351 LR: 0.00002529 +[11:43:10] Epoch: 1 Batch: 8601/20099 (42.79%) Loss: 2.388689 LR: 0.00002529 +[11:43:12] Epoch: 1 Batch: 8602/20099 (42.80%) Loss: 2.108501 LR: 0.00002529 +[11:43:14] Epoch: 1 Batch: 8603/20099 (42.80%) Loss: 2.207276 LR: 0.00002529 +[11:43:16] Epoch: 1 Batch: 8604/20099 (42.81%) Loss: 2.053855 LR: 0.00002529 +[11:43:17] Epoch: 1 Batch: 8605/20099 (42.81%) Loss: 2.025039 LR: 0.00002529 +[11:43:19] Epoch: 1 Batch: 8606/20099 (42.82%) Loss: 2.252290 LR: 0.00002528 +[11:43:21] Epoch: 1 Batch: 8607/20099 (42.82%) Loss: 2.091642 LR: 0.00002528 +[11:43:23] Epoch: 1 Batch: 8608/20099 (42.83%) Loss: 2.059448 LR: 0.00002528 +[11:43:25] Epoch: 1 Batch: 8609/20099 (42.83%) Loss: 1.999621 LR: 0.00002528 +[11:43:26] Epoch: 1 Batch: 8610/20099 (42.84%) Loss: 2.349713 LR: 0.00002528 +[11:43:28] Epoch: 1 Batch: 8611/20099 (42.84%) Loss: 2.089671 LR: 0.00002528 +[11:43:30] Epoch: 1 Batch: 8612/20099 (42.85%) Loss: 1.988560 LR: 0.00002528 +[11:43:32] Epoch: 1 Batch: 8613/20099 (42.85%) Loss: 2.185002 LR: 0.00002527 +[11:43:34] Epoch: 1 Batch: 8614/20099 (42.86%) Loss: 2.090698 LR: 0.00002527 +[11:43:35] Epoch: 1 Batch: 8615/20099 (42.86%) Loss: 1.904257 LR: 0.00002527 +[11:43:37] Epoch: 1 Batch: 8616/20099 (42.87%) Loss: 2.108423 LR: 0.00002527 +[11:43:39] Epoch: 1 Batch: 8617/20099 (42.87%) Loss: 2.218575 LR: 0.00002527 +[11:43:41] Epoch: 1 Batch: 8618/20099 (42.88%) Loss: 2.102868 LR: 0.00002527 +[11:43:43] Epoch: 1 Batch: 8619/20099 (42.88%) Loss: 2.217443 LR: 0.00002527 +[11:43:45] Epoch: 1 Batch: 8620/20099 (42.89%) Loss: 1.877322 LR: 0.00002525 +[11:43:46] Epoch: 1 Batch: 8621/20099 (42.89%) Loss: 1.822494 LR: 0.00002525 +[11:43:48] Epoch: 1 Batch: 8622/20099 (42.90%) Loss: 1.955180 LR: 0.00002525 +[11:43:50] Epoch: 1 Batch: 8623/20099 (42.90%) Loss: 2.144392 LR: 0.00002525 +[11:43:52] Epoch: 1 Batch: 8624/20099 
(42.91%) Loss: 2.146796 LR: 0.00002525 +[11:43:53] Epoch: 1 Batch: 8625/20099 (42.91%) Loss: 2.081444 LR: 0.00002525 +[11:43:55] Epoch: 1 Batch: 8626/20099 (42.92%) Loss: 2.069887 LR: 0.00002525 +[11:43:57] Epoch: 1 Batch: 8627/20099 (42.92%) Loss: 2.028244 LR: 0.00002524 +[11:43:59] Epoch: 1 Batch: 8628/20099 (42.93%) Loss: 2.045522 LR: 0.00002524 +[11:44:01] Epoch: 1 Batch: 8629/20099 (42.93%) Loss: 1.949482 LR: 0.00002524 +[11:44:02] Epoch: 1 Batch: 8630/20099 (42.94%) Loss: 2.041935 LR: 0.00002524 +[11:44:04] Epoch: 1 Batch: 8631/20099 (42.94%) Loss: 2.258480 LR: 0.00002524 +[11:44:06] Epoch: 1 Batch: 8632/20099 (42.95%) Loss: 1.848267 LR: 0.00002524 +[11:44:08] Epoch: 1 Batch: 8633/20099 (42.95%) Loss: 2.201160 LR: 0.00002524 +[11:44:09] Epoch: 1 Batch: 8634/20099 (42.96%) Loss: 1.978869 LR: 0.00002523 +[11:44:11] Epoch: 1 Batch: 8635/20099 (42.96%) Loss: 2.139172 LR: 0.00002523 +[11:44:13] Epoch: 1 Batch: 8636/20099 (42.97%) Loss: 2.281006 LR: 0.00002523 +[11:44:15] Epoch: 1 Batch: 8637/20099 (42.97%) Loss: 2.140579 LR: 0.00002523 +[11:44:16] Epoch: 1 Batch: 8638/20099 (42.98%) Loss: 1.941023 LR: 0.00002523 +[11:44:18] Epoch: 1 Batch: 8639/20099 (42.98%) Loss: 1.930530 LR: 0.00002523 +[11:44:20] Epoch: 1 Batch: 8640/20099 (42.99%) Loss: 1.938033 LR: 0.00002523 +[11:44:22] Epoch: 1 Batch: 8641/20099 (42.99%) Loss: 2.143999 LR: 0.00002521 +[11:44:23] Epoch: 1 Batch: 8642/20099 (43.00%) Loss: 2.091579 LR: 0.00002521 +[11:44:25] Epoch: 1 Batch: 8643/20099 (43.00%) Loss: 2.251325 LR: 0.00002521 +[11:44:27] Epoch: 1 Batch: 8644/20099 (43.01%) Loss: 2.210966 LR: 0.00002521 +[11:44:29] Epoch: 1 Batch: 8645/20099 (43.01%) Loss: 2.099001 LR: 0.00002521 +[11:44:30] Epoch: 1 Batch: 8646/20099 (43.02%) Loss: 2.158655 LR: 0.00002521 +[11:44:32] Epoch: 1 Batch: 8647/20099 (43.02%) Loss: 2.184009 LR: 0.00002521 +[11:44:34] Epoch: 1 Batch: 8648/20099 (43.03%) Loss: 2.182019 LR: 0.00002520 +[11:44:36] Epoch: 1 Batch: 8649/20099 (43.03%) Loss: 1.969491 LR: 0.00002520 
+[11:44:38] Epoch: 1 Batch: 8650/20099 (43.04%) Loss: 2.021269 LR: 0.00002520 +[11:44:39] Epoch: 1 Batch: 8651/20099 (43.04%) Loss: 1.944142 LR: 0.00002520 +[11:44:41] Epoch: 1 Batch: 8652/20099 (43.05%) Loss: 1.787288 LR: 0.00002520 +[11:44:43] Epoch: 1 Batch: 8653/20099 (43.05%) Loss: 2.004835 LR: 0.00002520 +[11:44:45] Epoch: 1 Batch: 8654/20099 (43.06%) Loss: 2.123668 LR: 0.00002520 +[11:44:47] Epoch: 1 Batch: 8655/20099 (43.06%) Loss: 2.364437 LR: 0.00002519 +[11:44:48] Epoch: 1 Batch: 8656/20099 (43.07%) Loss: 1.824974 LR: 0.00002519 +[11:44:50] Epoch: 1 Batch: 8657/20099 (43.07%) Loss: 2.137780 LR: 0.00002519 +[11:44:52] Epoch: 1 Batch: 8658/20099 (43.08%) Loss: 2.115452 LR: 0.00002519 +[11:44:54] Epoch: 1 Batch: 8659/20099 (43.08%) Loss: 2.278077 LR: 0.00002519 +[11:44:55] Epoch: 1 Batch: 8660/20099 (43.09%) Loss: 2.202779 LR: 0.00002519 +[11:44:57] Epoch: 1 Batch: 8661/20099 (43.09%) Loss: 2.177789 LR: 0.00002519 +[11:44:59] Epoch: 1 Batch: 8662/20099 (43.10%) Loss: 2.160023 LR: 0.00002518 +[11:45:01] Epoch: 1 Batch: 8663/20099 (43.10%) Loss: 2.188057 LR: 0.00002518 +[11:45:03] Epoch: 1 Batch: 8664/20099 (43.11%) Loss: 1.868831 LR: 0.00002518 +[11:45:04] Epoch: 1 Batch: 8665/20099 (43.11%) Loss: 2.217101 LR: 0.00002518 +[11:45:06] Epoch: 1 Batch: 8666/20099 (43.12%) Loss: 2.151749 LR: 0.00002518 +[11:45:08] Epoch: 1 Batch: 8667/20099 (43.12%) Loss: 2.255677 LR: 0.00002518 +[11:45:10] Epoch: 1 Batch: 8668/20099 (43.13%) Loss: 1.944455 LR: 0.00002518 +[11:45:11] Epoch: 1 Batch: 8669/20099 (43.13%) Loss: 2.068547 LR: 0.00002516 +[11:45:13] Epoch: 1 Batch: 8670/20099 (43.14%) Loss: 2.213587 LR: 0.00002516 +[11:45:15] Epoch: 1 Batch: 8671/20099 (43.14%) Loss: 2.036845 LR: 0.00002516 +[11:45:17] Epoch: 1 Batch: 8672/20099 (43.15%) Loss: 2.061576 LR: 0.00002516 +[11:45:19] Epoch: 1 Batch: 8673/20099 (43.15%) Loss: 2.257074 LR: 0.00002516 +[11:45:20] Epoch: 1 Batch: 8674/20099 (43.16%) Loss: 2.286549 LR: 0.00002516 +[11:45:22] Epoch: 1 Batch: 8675/20099 (43.16%) 
Loss: 2.290716 LR: 0.00002516 +[11:45:24] Epoch: 1 Batch: 8676/20099 (43.17%) Loss: 2.055077 LR: 0.00002515 +[11:45:26] Epoch: 1 Batch: 8677/20099 (43.17%) Loss: 2.116516 LR: 0.00002515 +[11:45:27] Epoch: 1 Batch: 8678/20099 (43.18%) Loss: 1.939814 LR: 0.00002515 +[11:45:29] Epoch: 1 Batch: 8679/20099 (43.18%) Loss: 2.190419 LR: 0.00002515 +[11:45:31] Epoch: 1 Batch: 8680/20099 (43.19%) Loss: 1.872654 LR: 0.00002515 +[11:45:33] Epoch: 1 Batch: 8681/20099 (43.19%) Loss: 1.911647 LR: 0.00002515 +[11:45:34] Epoch: 1 Batch: 8682/20099 (43.20%) Loss: 1.829846 LR: 0.00002515 +[11:45:36] Epoch: 1 Batch: 8683/20099 (43.20%) Loss: 1.950848 LR: 0.00002514 +[11:45:38] Epoch: 1 Batch: 8684/20099 (43.21%) Loss: 2.157645 LR: 0.00002514 +[11:45:40] Epoch: 1 Batch: 8685/20099 (43.21%) Loss: 2.157362 LR: 0.00002514 +[11:45:41] Epoch: 1 Batch: 8686/20099 (43.22%) Loss: 2.128485 LR: 0.00002514 +[11:45:43] Epoch: 1 Batch: 8687/20099 (43.22%) Loss: 2.075567 LR: 0.00002514 +[11:45:45] Epoch: 1 Batch: 8688/20099 (43.23%) Loss: 2.206054 LR: 0.00002514 +[11:45:47] Epoch: 1 Batch: 8689/20099 (43.23%) Loss: 2.205368 LR: 0.00002514 +[11:45:49] Epoch: 1 Batch: 8690/20099 (43.24%) Loss: 1.940943 LR: 0.00002512 +[11:45:50] Epoch: 1 Batch: 8691/20099 (43.24%) Loss: 2.050334 LR: 0.00002512 +[11:45:52] Epoch: 1 Batch: 8692/20099 (43.25%) Loss: 2.087475 LR: 0.00002512 +[11:45:54] Epoch: 1 Batch: 8693/20099 (43.25%) Loss: 1.955823 LR: 0.00002512 +[11:45:56] Epoch: 1 Batch: 8694/20099 (43.26%) Loss: 2.039918 LR: 0.00002512 +[11:45:57] Epoch: 1 Batch: 8695/20099 (43.26%) Loss: 2.041171 LR: 0.00002512 +[11:45:59] Epoch: 1 Batch: 8696/20099 (43.27%) Loss: 2.069250 LR: 0.00002512 +[11:46:01] Epoch: 1 Batch: 8697/20099 (43.27%) Loss: 1.921136 LR: 0.00002511 +[11:46:03] Epoch: 1 Batch: 8698/20099 (43.28%) Loss: 2.109964 LR: 0.00002511 +[11:46:05] Epoch: 1 Batch: 8699/20099 (43.28%) Loss: 2.087691 LR: 0.00002511 +[11:46:06] Epoch: 1 Batch: 8700/20099 (43.29%) Loss: 2.057903 LR: 0.00002511 +[11:46:08] Epoch: 
1 Batch: 8701/20099 (43.29%) Loss: 1.950335 LR: 0.00002511 +[11:46:10] Epoch: 1 Batch: 8702/20099 (43.30%) Loss: 1.958429 LR: 0.00002511 +[11:46:12] Epoch: 1 Batch: 8703/20099 (43.30%) Loss: 1.818969 LR: 0.00002511 +[11:46:14] Epoch: 1 Batch: 8704/20099 (43.31%) Loss: 1.957181 LR: 0.00002510 +[11:46:16] Epoch: 1 Batch: 8705/20099 (43.31%) Loss: 1.840253 LR: 0.00002510 +[11:46:17] Epoch: 1 Batch: 8706/20099 (43.32%) Loss: 2.141576 LR: 0.00002510 +[11:46:19] Epoch: 1 Batch: 8707/20099 (43.32%) Loss: 2.141484 LR: 0.00002510 +[11:46:21] Epoch: 1 Batch: 8708/20099 (43.33%) Loss: 1.966695 LR: 0.00002510 +[11:46:23] Epoch: 1 Batch: 8709/20099 (43.33%) Loss: 2.340643 LR: 0.00002510 +[11:46:24] Epoch: 1 Batch: 8710/20099 (43.34%) Loss: 2.470619 LR: 0.00002510 +[11:46:26] Epoch: 1 Batch: 8711/20099 (43.34%) Loss: 2.368766 LR: 0.00002508 +[11:46:28] Epoch: 1 Batch: 8712/20099 (43.35%) Loss: 1.845962 LR: 0.00002508 +[11:46:30] Epoch: 1 Batch: 8713/20099 (43.35%) Loss: 2.269329 LR: 0.00002508 +[11:46:32] Epoch: 1 Batch: 8714/20099 (43.36%) Loss: 2.501289 LR: 0.00002508 +[11:46:33] Epoch: 1 Batch: 8715/20099 (43.36%) Loss: 2.355699 LR: 0.00002508 +[11:46:35] Epoch: 1 Batch: 8716/20099 (43.37%) Loss: 1.818942 LR: 0.00002508 +[11:46:37] Epoch: 1 Batch: 8717/20099 (43.37%) Loss: 2.133856 LR: 0.00002508 +[11:46:39] Epoch: 1 Batch: 8718/20099 (43.38%) Loss: 2.445891 LR: 0.00002507 +[11:46:41] Epoch: 1 Batch: 8719/20099 (43.38%) Loss: 2.286847 LR: 0.00002507 +[11:46:42] Epoch: 1 Batch: 8720/20099 (43.39%) Loss: 1.817318 LR: 0.00002507 +[11:46:44] Epoch: 1 Batch: 8721/20099 (43.39%) Loss: 1.773886 LR: 0.00002507 +[11:46:46] Epoch: 1 Batch: 8722/20099 (43.40%) Loss: 2.110995 LR: 0.00002507 +[11:46:48] Epoch: 1 Batch: 8723/20099 (43.40%) Loss: 2.274652 LR: 0.00002507 +[11:46:49] Epoch: 1 Batch: 8724/20099 (43.41%) Loss: 1.998864 LR: 0.00002507 +[11:46:51] Epoch: 1 Batch: 8725/20099 (43.41%) Loss: 2.252484 LR: 0.00002506 +[11:46:53] Epoch: 1 Batch: 8726/20099 (43.42%) Loss: 1.807805 LR: 
0.00002506 +[11:46:55] Epoch: 1 Batch: 8727/20099 (43.42%) Loss: 2.328614 LR: 0.00002506 +[11:46:57] Epoch: 1 Batch: 8728/20099 (43.43%) Loss: 1.953470 LR: 0.00002506 +[11:46:58] Epoch: 1 Batch: 8729/20099 (43.43%) Loss: 2.032003 LR: 0.00002506 +[11:47:00] Epoch: 1 Batch: 8730/20099 (43.43%) Loss: 2.107406 LR: 0.00002506 +[11:47:02] Epoch: 1 Batch: 8731/20099 (43.44%) Loss: 2.098711 LR: 0.00002506 +[11:47:04] Epoch: 1 Batch: 8732/20099 (43.44%) Loss: 2.235532 LR: 0.00002504 +[11:47:06] Epoch: 1 Batch: 8733/20099 (43.45%) Loss: 1.808293 LR: 0.00002504 +[11:47:07] Epoch: 1 Batch: 8734/20099 (43.45%) Loss: 1.774312 LR: 0.00002504 +[11:47:09] Epoch: 1 Batch: 8735/20099 (43.46%) Loss: 2.037095 LR: 0.00002504 +[11:47:11] Epoch: 1 Batch: 8736/20099 (43.46%) Loss: 2.064451 LR: 0.00002504 +[11:47:13] Epoch: 1 Batch: 8737/20099 (43.47%) Loss: 2.264861 LR: 0.00002504 +[11:47:14] Epoch: 1 Batch: 8738/20099 (43.47%) Loss: 2.002427 LR: 0.00002504 +[11:47:16] Epoch: 1 Batch: 8739/20099 (43.48%) Loss: 1.738102 LR: 0.00002503 +[11:47:18] Epoch: 1 Batch: 8740/20099 (43.48%) Loss: 2.282757 LR: 0.00002503 +[11:47:20] Epoch: 1 Batch: 8741/20099 (43.49%) Loss: 1.955977 LR: 0.00002503 +[11:47:22] Epoch: 1 Batch: 8742/20099 (43.49%) Loss: 2.112873 LR: 0.00002503 +[11:47:23] Epoch: 1 Batch: 8743/20099 (43.50%) Loss: 2.100660 LR: 0.00002503 +[11:47:25] Epoch: 1 Batch: 8744/20099 (43.50%) Loss: 1.953080 LR: 0.00002503 +[11:47:27] Epoch: 1 Batch: 8745/20099 (43.51%) Loss: 2.181740 LR: 0.00002503 +[11:47:29] Epoch: 1 Batch: 8746/20099 (43.51%) Loss: 2.088629 LR: 0.00002502 +[11:47:31] Epoch: 1 Batch: 8747/20099 (43.52%) Loss: 2.210272 LR: 0.00002502 +[11:47:32] Epoch: 1 Batch: 8748/20099 (43.52%) Loss: 2.431772 LR: 0.00002502 +[11:47:34] Epoch: 1 Batch: 8749/20099 (43.53%) Loss: 2.076193 LR: 0.00002502 +[11:47:36] Epoch: 1 Batch: 8750/20099 (43.53%) Loss: 1.753880 LR: 0.00002502 +[11:47:38] Epoch: 1 Batch: 8751/20099 (43.54%) Loss: 2.023881 LR: 0.00002502 +[11:47:39] Epoch: 1 Batch: 8752/20099 
(43.54%) Loss: 1.874739 LR: 0.00002502 +[11:47:41] Epoch: 1 Batch: 8753/20099 (43.55%) Loss: 2.292072 LR: 0.00002500 +[11:47:43] Epoch: 1 Batch: 8754/20099 (43.55%) Loss: 2.109941 LR: 0.00002500 +[11:47:45] Epoch: 1 Batch: 8755/20099 (43.56%) Loss: 2.108124 LR: 0.00002500 +[11:47:46] Epoch: 1 Batch: 8756/20099 (43.56%) Loss: 2.006287 LR: 0.00002500 +[11:47:48] Epoch: 1 Batch: 8757/20099 (43.57%) Loss: 2.134070 LR: 0.00002500 +[11:47:50] Epoch: 1 Batch: 8758/20099 (43.57%) Loss: 2.188874 LR: 0.00002500 +[11:47:52] Epoch: 1 Batch: 8759/20099 (43.58%) Loss: 2.424749 LR: 0.00002500 +[11:47:54] Epoch: 1 Batch: 8760/20099 (43.58%) Loss: 2.178039 LR: 0.00002499 +[11:47:55] Epoch: 1 Batch: 8761/20099 (43.59%) Loss: 2.448673 LR: 0.00002499 +[11:47:57] Epoch: 1 Batch: 8762/20099 (43.59%) Loss: 2.244437 LR: 0.00002499 +[11:47:59] Epoch: 1 Batch: 8763/20099 (43.60%) Loss: 2.066433 LR: 0.00002499 +[11:48:01] Epoch: 1 Batch: 8764/20099 (43.60%) Loss: 2.194649 LR: 0.00002499 +[11:48:02] Epoch: 1 Batch: 8765/20099 (43.61%) Loss: 2.255410 LR: 0.00002499 +[11:48:04] Epoch: 1 Batch: 8766/20099 (43.61%) Loss: 2.361702 LR: 0.00002499 +[11:48:06] Epoch: 1 Batch: 8767/20099 (43.62%) Loss: 2.179531 LR: 0.00002498 +[11:48:08] Epoch: 1 Batch: 8768/20099 (43.62%) Loss: 2.216789 LR: 0.00002498 +[11:48:10] Epoch: 1 Batch: 8769/20099 (43.63%) Loss: 2.302819 LR: 0.00002498 +[11:48:11] Epoch: 1 Batch: 8770/20099 (43.63%) Loss: 2.389370 LR: 0.00002498 +[11:48:13] Epoch: 1 Batch: 8771/20099 (43.64%) Loss: 2.379965 LR: 0.00002498 +[11:48:15] Epoch: 1 Batch: 8772/20099 (43.64%) Loss: 1.903313 LR: 0.00002498 +[11:48:17] Epoch: 1 Batch: 8773/20099 (43.65%) Loss: 1.905012 LR: 0.00002498 +[11:48:19] Epoch: 1 Batch: 8774/20099 (43.65%) Loss: 1.881031 LR: 0.00002497 +[11:48:20] Epoch: 1 Batch: 8775/20099 (43.66%) Loss: 2.346161 LR: 0.00002497 +[11:48:22] Epoch: 1 Batch: 8776/20099 (43.66%) Loss: 1.985290 LR: 0.00002497 +[11:48:24] Epoch: 1 Batch: 8777/20099 (43.67%) Loss: 2.154877 LR: 0.00002497 
+[11:48:26] Epoch: 1 Batch: 8778/20099 (43.67%) Loss: 2.086773 LR: 0.00002497 +[11:48:27] Epoch: 1 Batch: 8779/20099 (43.68%) Loss: 1.896952 LR: 0.00002497 +[11:48:29] Epoch: 1 Batch: 8780/20099 (43.68%) Loss: 1.734218 LR: 0.00002497 +[11:48:31] Epoch: 1 Batch: 8781/20099 (43.69%) Loss: 1.619685 LR: 0.00002495 +[11:48:33] Epoch: 1 Batch: 8782/20099 (43.69%) Loss: 1.964832 LR: 0.00002495 +[11:48:35] Epoch: 1 Batch: 8783/20099 (43.70%) Loss: 2.174592 LR: 0.00002495 +[11:48:36] Epoch: 1 Batch: 8784/20099 (43.70%) Loss: 2.134486 LR: 0.00002495 +[11:48:38] Epoch: 1 Batch: 8785/20099 (43.71%) Loss: 2.167810 LR: 0.00002495 +[11:48:40] Epoch: 1 Batch: 8786/20099 (43.71%) Loss: 2.100261 LR: 0.00002495 +[11:48:42] Epoch: 1 Batch: 8787/20099 (43.72%) Loss: 2.004875 LR: 0.00002495 +[11:48:43] Epoch: 1 Batch: 8788/20099 (43.72%) Loss: 2.284613 LR: 0.00002494 +[11:48:45] Epoch: 1 Batch: 8789/20099 (43.73%) Loss: 2.216136 LR: 0.00002494 +[11:48:47] Epoch: 1 Batch: 8790/20099 (43.73%) Loss: 2.170727 LR: 0.00002494 +[11:48:49] Epoch: 1 Batch: 8791/20099 (43.74%) Loss: 2.414905 LR: 0.00002494 +[11:48:51] Epoch: 1 Batch: 8792/20099 (43.74%) Loss: 2.294760 LR: 0.00002494 +[11:48:52] Epoch: 1 Batch: 8793/20099 (43.75%) Loss: 2.299910 LR: 0.00002494 +[11:48:54] Epoch: 1 Batch: 8794/20099 (43.75%) Loss: 1.928282 LR: 0.00002494 +[11:48:56] Epoch: 1 Batch: 8795/20099 (43.76%) Loss: 2.045822 LR: 0.00002493 +[11:48:58] Epoch: 1 Batch: 8796/20099 (43.76%) Loss: 2.228258 LR: 0.00002493 +[11:48:59] Epoch: 1 Batch: 8797/20099 (43.77%) Loss: 2.168900 LR: 0.00002493 +[11:49:01] Epoch: 1 Batch: 8798/20099 (43.77%) Loss: 2.177686 LR: 0.00002493 +[11:49:03] Epoch: 1 Batch: 8799/20099 (43.78%) Loss: 1.878343 LR: 0.00002493 +[11:49:08] >> Cleaned up old temp checkpoint: epoch1_step6800 +[11:49:08] >> Temp checkpoint saved: epoch1_step8800, size: 0.1693 GB +[11:49:08] Epoch: 1 Batch: 8800/20099 (43.78%) Loss: 1.866010 LR: 0.00002493 +[11:49:10] Epoch: 1 Batch: 8801/20099 (43.79%) Loss: 1.960122 LR: 
0.00002493 +[11:49:12] Epoch: 1 Batch: 8802/20099 (43.79%) Loss: 2.231789 LR: 0.00002491 +[11:49:14] Epoch: 1 Batch: 8803/20099 (43.80%) Loss: 1.864262 LR: 0.00002491 +[11:49:15] Epoch: 1 Batch: 8804/20099 (43.80%) Loss: 2.279302 LR: 0.00002491 +[11:49:17] Epoch: 1 Batch: 8805/20099 (43.81%) Loss: 2.153176 LR: 0.00002491 +[11:49:19] Epoch: 1 Batch: 8806/20099 (43.81%) Loss: 2.209081 LR: 0.00002491 +[11:49:21] Epoch: 1 Batch: 8807/20099 (43.82%) Loss: 2.357876 LR: 0.00002491 +[11:49:22] Epoch: 1 Batch: 8808/20099 (43.82%) Loss: 1.957975 LR: 0.00002491 +[11:49:24] Epoch: 1 Batch: 8809/20099 (43.83%) Loss: 2.060785 LR: 0.00002490 +[11:49:26] Epoch: 1 Batch: 8810/20099 (43.83%) Loss: 1.981337 LR: 0.00002490 +[11:49:28] Epoch: 1 Batch: 8811/20099 (43.84%) Loss: 2.171979 LR: 0.00002490 +[11:49:30] Epoch: 1 Batch: 8812/20099 (43.84%) Loss: 1.936823 LR: 0.00002490 +[11:49:31] Epoch: 1 Batch: 8813/20099 (43.85%) Loss: 2.091846 LR: 0.00002490 +[11:49:33] Epoch: 1 Batch: 8814/20099 (43.85%) Loss: 2.210981 LR: 0.00002490 +[11:49:35] Epoch: 1 Batch: 8815/20099 (43.86%) Loss: 2.059821 LR: 0.00002490 +[11:49:37] Epoch: 1 Batch: 8816/20099 (43.86%) Loss: 2.251494 LR: 0.00002489 +[11:49:39] Epoch: 1 Batch: 8817/20099 (43.87%) Loss: 1.927938 LR: 0.00002489 +[11:49:41] Epoch: 1 Batch: 8818/20099 (43.87%) Loss: 2.082364 LR: 0.00002489 +[11:49:42] Epoch: 1 Batch: 8819/20099 (43.88%) Loss: 2.126484 LR: 0.00002489 +[11:49:44] Epoch: 1 Batch: 8820/20099 (43.88%) Loss: 2.276657 LR: 0.00002489 +[11:49:46] Epoch: 1 Batch: 8821/20099 (43.89%) Loss: 2.066762 LR: 0.00002489 +[11:49:48] Epoch: 1 Batch: 8822/20099 (43.89%) Loss: 2.221160 LR: 0.00002489 +[11:49:50] Epoch: 1 Batch: 8823/20099 (43.90%) Loss: 2.168954 LR: 0.00002487 +[11:49:51] Epoch: 1 Batch: 8824/20099 (43.90%) Loss: 2.188328 LR: 0.00002487 +[11:49:53] Epoch: 1 Batch: 8825/20099 (43.91%) Loss: 2.223566 LR: 0.00002487 +[11:49:55] Epoch: 1 Batch: 8826/20099 (43.91%) Loss: 1.842831 LR: 0.00002487 +[11:49:57] Epoch: 1 Batch: 8827/20099 
(43.92%) Loss: 2.191789 LR: 0.00002487 +[11:49:58] Epoch: 1 Batch: 8828/20099 (43.92%) Loss: 2.634541 LR: 0.00002487 +[11:50:00] Epoch: 1 Batch: 8829/20099 (43.93%) Loss: 2.167196 LR: 0.00002487 +[11:50:02] Epoch: 1 Batch: 8830/20099 (43.93%) Loss: 2.050665 LR: 0.00002486 +[11:50:04] Epoch: 1 Batch: 8831/20099 (43.94%) Loss: 2.094174 LR: 0.00002486 +[11:50:05] Epoch: 1 Batch: 8832/20099 (43.94%) Loss: 2.352609 LR: 0.00002486 +[11:50:07] Epoch: 1 Batch: 8833/20099 (43.95%) Loss: 2.318770 LR: 0.00002486 +[11:50:09] Epoch: 1 Batch: 8834/20099 (43.95%) Loss: 2.289995 LR: 0.00002486 +[11:50:11] Epoch: 1 Batch: 8835/20099 (43.96%) Loss: 2.046527 LR: 0.00002486 +[11:50:13] Epoch: 1 Batch: 8836/20099 (43.96%) Loss: 2.343850 LR: 0.00002486 +[11:50:14] Epoch: 1 Batch: 8837/20099 (43.97%) Loss: 1.936934 LR: 0.00002485 +[11:50:16] Epoch: 1 Batch: 8838/20099 (43.97%) Loss: 2.000253 LR: 0.00002485 +[11:50:18] Epoch: 1 Batch: 8839/20099 (43.98%) Loss: 2.412932 LR: 0.00002485 +[11:50:20] Epoch: 1 Batch: 8840/20099 (43.98%) Loss: 2.070726 LR: 0.00002485 +[11:50:21] Epoch: 1 Batch: 8841/20099 (43.99%) Loss: 1.701976 LR: 0.00002485 +[11:50:23] Epoch: 1 Batch: 8842/20099 (43.99%) Loss: 2.069255 LR: 0.00002485 +[11:50:25] Epoch: 1 Batch: 8843/20099 (44.00%) Loss: 2.259437 LR: 0.00002485 +[11:50:27] Epoch: 1 Batch: 8844/20099 (44.00%) Loss: 2.449068 LR: 0.00002483 +[11:50:28] Epoch: 1 Batch: 8845/20099 (44.01%) Loss: 2.214760 LR: 0.00002483 +[11:50:30] Epoch: 1 Batch: 8846/20099 (44.01%) Loss: 2.054098 LR: 0.00002483 +[11:50:32] Epoch: 1 Batch: 8847/20099 (44.02%) Loss: 2.061645 LR: 0.00002483 +[11:50:34] Epoch: 1 Batch: 8848/20099 (44.02%) Loss: 2.222070 LR: 0.00002483 +[11:50:36] Epoch: 1 Batch: 8849/20099 (44.03%) Loss: 1.882869 LR: 0.00002483 +[11:50:37] Epoch: 1 Batch: 8850/20099 (44.03%) Loss: 2.003189 LR: 0.00002483 +[11:50:39] Epoch: 1 Batch: 8851/20099 (44.04%) Loss: 2.097128 LR: 0.00002482 +[11:50:41] Epoch: 1 Batch: 8852/20099 (44.04%) Loss: 2.241874 LR: 0.00002482 
+[11:50:43] Epoch: 1 Batch: 8853/20099 (44.05%) Loss: 2.036433 LR: 0.00002482 +[11:50:45] Epoch: 1 Batch: 8854/20099 (44.05%) Loss: 1.945629 LR: 0.00002482 +[11:50:46] Epoch: 1 Batch: 8855/20099 (44.06%) Loss: 2.118990 LR: 0.00002482 +[11:50:48] Epoch: 1 Batch: 8856/20099 (44.06%) Loss: 2.318518 LR: 0.00002482 +[11:50:50] Epoch: 1 Batch: 8857/20099 (44.07%) Loss: 2.109678 LR: 0.00002482 +[11:50:52] Epoch: 1 Batch: 8858/20099 (44.07%) Loss: 1.893293 LR: 0.00002481 +[11:50:53] Epoch: 1 Batch: 8859/20099 (44.08%) Loss: 2.191021 LR: 0.00002481 +[11:50:55] Epoch: 1 Batch: 8860/20099 (44.08%) Loss: 1.980836 LR: 0.00002481 +[11:50:57] Epoch: 1 Batch: 8861/20099 (44.09%) Loss: 2.129941 LR: 0.00002481 +[11:50:59] Epoch: 1 Batch: 8862/20099 (44.09%) Loss: 2.392464 LR: 0.00002481 +[11:51:01] Epoch: 1 Batch: 8863/20099 (44.10%) Loss: 1.517700 LR: 0.00002481 +[11:51:02] Epoch: 1 Batch: 8864/20099 (44.10%) Loss: 2.092232 LR: 0.00002481 +[11:51:04] Epoch: 1 Batch: 8865/20099 (44.11%) Loss: 2.601226 LR: 0.00002479 +[11:51:06] Epoch: 1 Batch: 8866/20099 (44.11%) Loss: 2.259182 LR: 0.00002479 +[11:51:08] Epoch: 1 Batch: 8867/20099 (44.12%) Loss: 1.996094 LR: 0.00002479 +[11:51:09] Epoch: 1 Batch: 8868/20099 (44.12%) Loss: 2.057159 LR: 0.00002479 +[11:51:11] Epoch: 1 Batch: 8869/20099 (44.13%) Loss: 2.101749 LR: 0.00002479 +[11:51:13] Epoch: 1 Batch: 8870/20099 (44.13%) Loss: 2.284791 LR: 0.00002479 +[11:51:15] Epoch: 1 Batch: 8871/20099 (44.14%) Loss: 2.145237 LR: 0.00002479 +[11:51:17] Epoch: 1 Batch: 8872/20099 (44.14%) Loss: 2.443478 LR: 0.00002478 +[11:51:18] Epoch: 1 Batch: 8873/20099 (44.15%) Loss: 2.003298 LR: 0.00002478 +[11:51:20] Epoch: 1 Batch: 8874/20099 (44.15%) Loss: 2.197456 LR: 0.00002478 +[11:51:22] Epoch: 1 Batch: 8875/20099 (44.16%) Loss: 2.143882 LR: 0.00002478 +[11:51:24] Epoch: 1 Batch: 8876/20099 (44.16%) Loss: 2.336657 LR: 0.00002478 +[11:51:25] Epoch: 1 Batch: 8877/20099 (44.17%) Loss: 2.509401 LR: 0.00002478 +[11:51:27] Epoch: 1 Batch: 8878/20099 (44.17%) 
Loss: 1.780570 LR: 0.00002478 +[11:51:29] Epoch: 1 Batch: 8879/20099 (44.18%) Loss: 2.063086 LR: 0.00002477 +[11:51:31] Epoch: 1 Batch: 8880/20099 (44.18%) Loss: 1.829238 LR: 0.00002477 +[11:51:33] Epoch: 1 Batch: 8881/20099 (44.19%) Loss: 2.067754 LR: 0.00002477 +[11:51:34] Epoch: 1 Batch: 8882/20099 (44.19%) Loss: 2.306305 LR: 0.00002477 +[11:51:36] Epoch: 1 Batch: 8883/20099 (44.20%) Loss: 2.325176 LR: 0.00002477 +[11:51:38] Epoch: 1 Batch: 8884/20099 (44.20%) Loss: 1.910734 LR: 0.00002477 +[11:51:40] Epoch: 1 Batch: 8885/20099 (44.21%) Loss: 1.956098 LR: 0.00002477 +[11:51:41] Epoch: 1 Batch: 8886/20099 (44.21%) Loss: 2.054062 LR: 0.00002475 +[11:51:43] Epoch: 1 Batch: 8887/20099 (44.22%) Loss: 2.056872 LR: 0.00002475 +[11:51:45] Epoch: 1 Batch: 8888/20099 (44.22%) Loss: 2.105043 LR: 0.00002475 +[11:51:47] Epoch: 1 Batch: 8889/20099 (44.23%) Loss: 1.798053 LR: 0.00002475 +[11:51:48] Epoch: 1 Batch: 8890/20099 (44.23%) Loss: 2.134282 LR: 0.00002475 +[11:51:50] Epoch: 1 Batch: 8891/20099 (44.24%) Loss: 2.263365 LR: 0.00002475 +[11:51:52] Epoch: 1 Batch: 8892/20099 (44.24%) Loss: 2.095472 LR: 0.00002475 +[11:51:54] Epoch: 1 Batch: 8893/20099 (44.25%) Loss: 2.099180 LR: 0.00002474 +[11:51:56] Epoch: 1 Batch: 8894/20099 (44.25%) Loss: 2.139844 LR: 0.00002474 +[11:51:57] Epoch: 1 Batch: 8895/20099 (44.26%) Loss: 2.176537 LR: 0.00002474 +[11:51:59] Epoch: 1 Batch: 8896/20099 (44.26%) Loss: 1.992026 LR: 0.00002474 +[11:52:01] Epoch: 1 Batch: 8897/20099 (44.27%) Loss: 2.300092 LR: 0.00002474 +[11:52:03] Epoch: 1 Batch: 8898/20099 (44.27%) Loss: 1.934599 LR: 0.00002474 +[11:52:05] Epoch: 1 Batch: 8899/20099 (44.28%) Loss: 2.067076 LR: 0.00002474 +[11:52:06] Epoch: 1 Batch: 8900/20099 (44.28%) Loss: 1.996957 LR: 0.00002472 +[11:52:08] Epoch: 1 Batch: 8901/20099 (44.29%) Loss: 2.001204 LR: 0.00002472 +[11:52:10] Epoch: 1 Batch: 8902/20099 (44.29%) Loss: 2.185377 LR: 0.00002472 +[11:52:12] Epoch: 1 Batch: 8903/20099 (44.30%) Loss: 2.248745 LR: 0.00002472 +[11:52:13] Epoch: 
1 Batch: 8904/20099 (44.30%) Loss: 1.955698 LR: 0.00002472 +[11:52:15] Epoch: 1 Batch: 8905/20099 (44.31%) Loss: 2.046822 LR: 0.00002472 +[11:52:17] Epoch: 1 Batch: 8906/20099 (44.31%) Loss: 2.110507 LR: 0.00002472 +[11:52:19] Epoch: 1 Batch: 8907/20099 (44.32%) Loss: 1.889066 LR: 0.00002471 +[11:52:21] Epoch: 1 Batch: 8908/20099 (44.32%) Loss: 2.348019 LR: 0.00002471 +[11:52:22] Epoch: 1 Batch: 8909/20099 (44.33%) Loss: 1.888197 LR: 0.00002471 +[11:52:24] Epoch: 1 Batch: 8910/20099 (44.33%) Loss: 1.944966 LR: 0.00002471 +[11:52:26] Epoch: 1 Batch: 8911/20099 (44.34%) Loss: 2.411380 LR: 0.00002471 +[11:52:28] Epoch: 1 Batch: 8912/20099 (44.34%) Loss: 2.101868 LR: 0.00002471 +[11:52:30] Epoch: 1 Batch: 8913/20099 (44.35%) Loss: 1.949051 LR: 0.00002471 +[11:52:31] Epoch: 1 Batch: 8914/20099 (44.35%) Loss: 2.021697 LR: 0.00002470 +[11:52:33] Epoch: 1 Batch: 8915/20099 (44.36%) Loss: 1.925864 LR: 0.00002470 +[11:52:35] Epoch: 1 Batch: 8916/20099 (44.36%) Loss: 2.163466 LR: 0.00002470 +[11:52:37] Epoch: 1 Batch: 8917/20099 (44.37%) Loss: 2.105380 LR: 0.00002470 +[11:52:38] Epoch: 1 Batch: 8918/20099 (44.37%) Loss: 2.020347 LR: 0.00002470 +[11:52:40] Epoch: 1 Batch: 8919/20099 (44.38%) Loss: 1.891574 LR: 0.00002470 +[11:52:42] Epoch: 1 Batch: 8920/20099 (44.38%) Loss: 2.121549 LR: 0.00002470 +[11:52:44] Epoch: 1 Batch: 8921/20099 (44.39%) Loss: 2.127363 LR: 0.00002468 +[11:52:46] Epoch: 1 Batch: 8922/20099 (44.39%) Loss: 2.315091 LR: 0.00002468 +[11:52:47] Epoch: 1 Batch: 8923/20099 (44.40%) Loss: 2.200363 LR: 0.00002468 +[11:52:49] Epoch: 1 Batch: 8924/20099 (44.40%) Loss: 2.146644 LR: 0.00002468 +[11:52:51] Epoch: 1 Batch: 8925/20099 (44.41%) Loss: 2.151454 LR: 0.00002468 +[11:52:53] Epoch: 1 Batch: 8926/20099 (44.41%) Loss: 1.891682 LR: 0.00002468 +[11:52:54] Epoch: 1 Batch: 8927/20099 (44.42%) Loss: 2.181899 LR: 0.00002468 +[11:52:56] Epoch: 1 Batch: 8928/20099 (44.42%) Loss: 2.444313 LR: 0.00002467 +[11:52:58] Epoch: 1 Batch: 8929/20099 (44.43%) Loss: 2.092938 LR: 
0.00002467 +[11:53:00] Epoch: 1 Batch: 8930/20099 (44.43%) Loss: 2.003423 LR: 0.00002467 +[11:53:02] Epoch: 1 Batch: 8931/20099 (44.44%) Loss: 2.027616 LR: 0.00002467 +[11:53:03] Epoch: 1 Batch: 8932/20099 (44.44%) Loss: 1.957065 LR: 0.00002467 +[11:53:05] Epoch: 1 Batch: 8933/20099 (44.44%) Loss: 1.879801 LR: 0.00002467 +[11:53:07] Epoch: 1 Batch: 8934/20099 (44.45%) Loss: 2.005494 LR: 0.00002467 +[11:53:09] Epoch: 1 Batch: 8935/20099 (44.45%) Loss: 2.281109 LR: 0.00002466 +[11:53:10] Epoch: 1 Batch: 8936/20099 (44.46%) Loss: 2.214734 LR: 0.00002466 +[11:53:12] Epoch: 1 Batch: 8937/20099 (44.46%) Loss: 1.818481 LR: 0.00002466 +[11:53:14] Epoch: 1 Batch: 8938/20099 (44.47%) Loss: 2.112454 LR: 0.00002466 +[11:53:16] Epoch: 1 Batch: 8939/20099 (44.47%) Loss: 2.012050 LR: 0.00002466 +[11:53:18] Epoch: 1 Batch: 8940/20099 (44.48%) Loss: 2.226337 LR: 0.00002466 +[11:53:19] Epoch: 1 Batch: 8941/20099 (44.48%) Loss: 2.296197 LR: 0.00002466 +[11:53:21] Epoch: 1 Batch: 8942/20099 (44.49%) Loss: 2.230952 LR: 0.00002464 +[11:53:23] Epoch: 1 Batch: 8943/20099 (44.49%) Loss: 2.047608 LR: 0.00002464 +[11:53:25] Epoch: 1 Batch: 8944/20099 (44.50%) Loss: 2.265190 LR: 0.00002464 +[11:53:26] Epoch: 1 Batch: 8945/20099 (44.50%) Loss: 2.256366 LR: 0.00002464 +[11:53:28] Epoch: 1 Batch: 8946/20099 (44.51%) Loss: 2.191411 LR: 0.00002464 +[11:53:30] Epoch: 1 Batch: 8947/20099 (44.51%) Loss: 2.253154 LR: 0.00002464 +[11:53:32] Epoch: 1 Batch: 8948/20099 (44.52%) Loss: 2.075844 LR: 0.00002464 +[11:53:34] Epoch: 1 Batch: 8949/20099 (44.52%) Loss: 1.998888 LR: 0.00002463 +[11:53:35] Epoch: 1 Batch: 8950/20099 (44.53%) Loss: 2.170465 LR: 0.00002463 +[11:53:37] Epoch: 1 Batch: 8951/20099 (44.53%) Loss: 1.914535 LR: 0.00002463 +[11:53:39] Epoch: 1 Batch: 8952/20099 (44.54%) Loss: 2.146004 LR: 0.00002463 +[11:53:41] Epoch: 1 Batch: 8953/20099 (44.54%) Loss: 2.403942 LR: 0.00002463 +[11:53:42] Epoch: 1 Batch: 8954/20099 (44.55%) Loss: 1.942431 LR: 0.00002463 +[11:53:44] Epoch: 1 Batch: 8955/20099 
(44.55%) Loss: 2.117085 LR: 0.00002463 +[11:53:46] Epoch: 1 Batch: 8956/20099 (44.56%) Loss: 2.200315 LR: 0.00002462 +[11:53:48] Epoch: 1 Batch: 8957/20099 (44.56%) Loss: 1.910949 LR: 0.00002462 +[11:53:50] Epoch: 1 Batch: 8958/20099 (44.57%) Loss: 2.144685 LR: 0.00002462 +[11:53:51] Epoch: 1 Batch: 8959/20099 (44.57%) Loss: 1.921150 LR: 0.00002462 +[11:53:53] Epoch: 1 Batch: 8960/20099 (44.58%) Loss: 2.101825 LR: 0.00002462 +[11:53:55] Epoch: 1 Batch: 8961/20099 (44.58%) Loss: 2.204791 LR: 0.00002462 +[11:53:57] Epoch: 1 Batch: 8962/20099 (44.59%) Loss: 1.871185 LR: 0.00002462 +[11:53:58] Epoch: 1 Batch: 8963/20099 (44.59%) Loss: 2.129901 LR: 0.00002460 +[11:54:00] Epoch: 1 Batch: 8964/20099 (44.60%) Loss: 1.850264 LR: 0.00002460 +[11:54:02] Epoch: 1 Batch: 8965/20099 (44.60%) Loss: 1.969214 LR: 0.00002460 +[11:54:04] Epoch: 1 Batch: 8966/20099 (44.61%) Loss: 1.965101 LR: 0.00002460 +[11:54:06] Epoch: 1 Batch: 8967/20099 (44.61%) Loss: 2.511655 LR: 0.00002460 +[11:54:07] Epoch: 1 Batch: 8968/20099 (44.62%) Loss: 2.234533 LR: 0.00002460 +[11:54:09] Epoch: 1 Batch: 8969/20099 (44.62%) Loss: 2.165569 LR: 0.00002460 +[11:54:11] Epoch: 1 Batch: 8970/20099 (44.63%) Loss: 2.074373 LR: 0.00002459 +[11:54:13] Epoch: 1 Batch: 8971/20099 (44.63%) Loss: 2.078547 LR: 0.00002459 +[11:54:14] Epoch: 1 Batch: 8972/20099 (44.64%) Loss: 2.041807 LR: 0.00002459 +[11:54:16] Epoch: 1 Batch: 8973/20099 (44.64%) Loss: 2.260077 LR: 0.00002459 +[11:54:18] Epoch: 1 Batch: 8974/20099 (44.65%) Loss: 2.033007 LR: 0.00002459 +[11:54:20] Epoch: 1 Batch: 8975/20099 (44.65%) Loss: 2.001691 LR: 0.00002459 +[11:54:22] Epoch: 1 Batch: 8976/20099 (44.66%) Loss: 2.134772 LR: 0.00002459 +[11:54:23] Epoch: 1 Batch: 8977/20099 (44.66%) Loss: 2.076771 LR: 0.00002458 +[11:54:25] Epoch: 1 Batch: 8978/20099 (44.67%) Loss: 2.277897 LR: 0.00002458 +[11:54:27] Epoch: 1 Batch: 8979/20099 (44.67%) Loss: 2.089426 LR: 0.00002458 +[11:54:29] Epoch: 1 Batch: 8980/20099 (44.68%) Loss: 1.932505 LR: 0.00002458 
+[11:54:30] Epoch: 1 Batch: 8981/20099 (44.68%) Loss: 1.973094 LR: 0.00002458 +[11:54:32] Epoch: 1 Batch: 8982/20099 (44.69%) Loss: 2.451916 LR: 0.00002458 +[11:54:34] Epoch: 1 Batch: 8983/20099 (44.69%) Loss: 1.879475 LR: 0.00002458 +[11:54:36] Epoch: 1 Batch: 8984/20099 (44.70%) Loss: 2.189509 LR: 0.00002456 +[11:54:37] Epoch: 1 Batch: 8985/20099 (44.70%) Loss: 2.055868 LR: 0.00002456 +[11:54:39] Epoch: 1 Batch: 8986/20099 (44.71%) Loss: 1.995506 LR: 0.00002456 +[11:54:41] Epoch: 1 Batch: 8987/20099 (44.71%) Loss: 2.308263 LR: 0.00002456 +[11:54:43] Epoch: 1 Batch: 8988/20099 (44.72%) Loss: 2.395047 LR: 0.00002456 +[11:54:45] Epoch: 1 Batch: 8989/20099 (44.72%) Loss: 2.275849 LR: 0.00002456 +[11:54:46] Epoch: 1 Batch: 8990/20099 (44.73%) Loss: 2.255444 LR: 0.00002456 +[11:54:48] Epoch: 1 Batch: 8991/20099 (44.73%) Loss: 1.982450 LR: 0.00002455 +[11:54:50] Epoch: 1 Batch: 8992/20099 (44.74%) Loss: 1.975723 LR: 0.00002455 +[11:54:52] Epoch: 1 Batch: 8993/20099 (44.74%) Loss: 1.913492 LR: 0.00002455 +[11:54:53] Epoch: 1 Batch: 8994/20099 (44.75%) Loss: 2.001206 LR: 0.00002455 +[11:54:55] Epoch: 1 Batch: 8995/20099 (44.75%) Loss: 1.979897 LR: 0.00002455 +[11:54:57] Epoch: 1 Batch: 8996/20099 (44.76%) Loss: 1.974810 LR: 0.00002455 +[11:54:59] Epoch: 1 Batch: 8997/20099 (44.76%) Loss: 1.948730 LR: 0.00002455 +[11:55:01] Epoch: 1 Batch: 8998/20099 (44.77%) Loss: 2.000144 LR: 0.00002454 +[11:55:02] Epoch: 1 Batch: 8999/20099 (44.77%) Loss: 1.873268 LR: 0.00002454 +[11:55:04] >> Evaluating batch 0 +[11:55:05] >> Evaluating batch 1 +[11:55:06] >> Evaluating batch 2 +[11:55:07] >> Evaluating batch 3 +[11:55:08] >> Evaluating batch 4 +[11:55:09] >> Evaluating batch 5 +[11:55:10] >> Evaluating batch 6 +[11:55:11] >> Evaluating batch 7 +[11:55:12] >> Evaluating batch 8 +[11:55:13] >> Evaluating batch 9 +[11:55:14] >> Evaluating batch 10 +[11:55:15] >> Evaluating batch 11 +[11:55:16] >> Evaluating batch 12 +[11:55:17] >> Evaluating batch 13 +[11:55:18] >> Evaluating batch 14 
+[11:55:19] >> Evaluating batch 15 +[11:55:20] >> Evaluating batch 16 +[11:55:21] Epoch: 1 Step: 9000/20099 Evaluation: +[11:55:21] Avg Loss Since Last Eval: 2.0990 Val Loss: 2.1710 Validation loss delta: -0.0048 Perplexity: 8.7673 LR: 0.00002454 +[11:55:24] >> Cleaned up old temp checkpoint: epoch1_step7000 +[11:55:24] >> Temp checkpoint saved: epoch1_step9000, size: 0.1693 GB +[11:55:28] >> Checkpoint saved: epoch1_step9000, size: 0.1693 GB +[11:55:28] Epoch: 1 Batch: 9000/20099 (44.78%) Loss: 2.165499 LR: 0.00002454 +[11:55:29] Epoch: 1 Batch: 9001/20099 (44.78%) Loss: 2.278482 LR: 0.00002454 +[11:55:31] Epoch: 1 Batch: 9002/20099 (44.79%) Loss: 2.161115 LR: 0.00002454 +[11:55:33] Epoch: 1 Batch: 9003/20099 (44.79%) Loss: 2.146694 LR: 0.00002454 +[11:55:35] Epoch: 1 Batch: 9004/20099 (44.80%) Loss: 2.146910 LR: 0.00002454 +[11:55:36] Epoch: 1 Batch: 9005/20099 (44.80%) Loss: 2.334079 LR: 0.00002452 +[11:55:38] Epoch: 1 Batch: 9006/20099 (44.81%) Loss: 2.376534 LR: 0.00002452 +[11:55:40] Epoch: 1 Batch: 9007/20099 (44.81%) Loss: 1.890614 LR: 0.00002452 +[11:55:42] Epoch: 1 Batch: 9008/20099 (44.82%) Loss: 2.210827 LR: 0.00002452 +[11:55:43] Epoch: 1 Batch: 9009/20099 (44.82%) Loss: 2.231585 LR: 0.00002452 +[11:55:45] Epoch: 1 Batch: 9010/20099 (44.83%) Loss: 2.145108 LR: 0.00002452 +[11:55:47] Epoch: 1 Batch: 9011/20099 (44.83%) Loss: 1.832391 LR: 0.00002452 +[11:55:49] Epoch: 1 Batch: 9012/20099 (44.84%) Loss: 1.946634 LR: 0.00002451 +[11:55:51] Epoch: 1 Batch: 9013/20099 (44.84%) Loss: 2.395166 LR: 0.00002451 +[11:55:53] Epoch: 1 Batch: 9014/20099 (44.85%) Loss: 2.298743 LR: 0.00002451 +[11:55:54] Epoch: 1 Batch: 9015/20099 (44.85%) Loss: 2.102147 LR: 0.00002451 +[11:55:56] Epoch: 1 Batch: 9016/20099 (44.86%) Loss: 2.206362 LR: 0.00002451 +[11:55:58] Epoch: 1 Batch: 9017/20099 (44.86%) Loss: 2.058571 LR: 0.00002451 +[11:56:00] Epoch: 1 Batch: 9018/20099 (44.87%) Loss: 2.002407 LR: 0.00002451 +[11:56:02] Epoch: 1 Batch: 9019/20099 (44.87%) Loss: 2.114236 LR:
0.00002449 +[11:56:04] Epoch: 1 Batch: 9020/20099 (44.88%) Loss: 2.479812 LR: 0.00002449 +[11:56:05] Epoch: 1 Batch: 9021/20099 (44.88%) Loss: 2.144647 LR: 0.00002449 +[11:56:07] Epoch: 1 Batch: 9022/20099 (44.89%) Loss: 2.088941 LR: 0.00002449 +[11:56:09] Epoch: 1 Batch: 9023/20099 (44.89%) Loss: 2.132734 LR: 0.00002449 +[11:56:11] Epoch: 1 Batch: 9024/20099 (44.90%) Loss: 2.223024 LR: 0.00002449 +[11:56:13] Epoch: 1 Batch: 9025/20099 (44.90%) Loss: 2.143334 LR: 0.00002449 +[11:56:14] Epoch: 1 Batch: 9026/20099 (44.91%) Loss: 2.165899 LR: 0.00002448 +[11:56:16] Epoch: 1 Batch: 9027/20099 (44.91%) Loss: 2.148203 LR: 0.00002448 +[11:56:18] Epoch: 1 Batch: 9028/20099 (44.92%) Loss: 2.087659 LR: 0.00002448 +[11:56:20] Epoch: 1 Batch: 9029/20099 (44.92%) Loss: 1.814174 LR: 0.00002448 +[11:56:22] Epoch: 1 Batch: 9030/20099 (44.93%) Loss: 2.120022 LR: 0.00002448 +[11:56:23] Epoch: 1 Batch: 9031/20099 (44.93%) Loss: 2.401445 LR: 0.00002448 +[11:56:25] Epoch: 1 Batch: 9032/20099 (44.94%) Loss: 2.321194 LR: 0.00002448 +[11:56:27] Epoch: 1 Batch: 9033/20099 (44.94%) Loss: 1.973994 LR: 0.00002447 +[11:56:29] Epoch: 1 Batch: 9034/20099 (44.95%) Loss: 2.105116 LR: 0.00002447 +[11:56:30] Epoch: 1 Batch: 9035/20099 (44.95%) Loss: 2.152263 LR: 0.00002447 +[11:56:32] Epoch: 1 Batch: 9036/20099 (44.96%) Loss: 2.249472 LR: 0.00002447 +[11:56:34] Epoch: 1 Batch: 9037/20099 (44.96%) Loss: 2.199401 LR: 0.00002447 +[11:56:36] Epoch: 1 Batch: 9038/20099 (44.97%) Loss: 2.043688 LR: 0.00002447 +[11:56:37] Epoch: 1 Batch: 9039/20099 (44.97%) Loss: 1.975628 LR: 0.00002447 +[11:56:39] Epoch: 1 Batch: 9040/20099 (44.98%) Loss: 2.091162 LR: 0.00002445 +[11:56:41] Epoch: 1 Batch: 9041/20099 (44.98%) Loss: 1.974990 LR: 0.00002445 +[11:56:43] Epoch: 1 Batch: 9042/20099 (44.99%) Loss: 2.250166 LR: 0.00002445 +[11:56:44] Epoch: 1 Batch: 9043/20099 (44.99%) Loss: 2.365336 LR: 0.00002445 +[11:56:46] Epoch: 1 Batch: 9044/20099 (45.00%) Loss: 1.874790 LR: 0.00002445 +[11:56:48] Epoch: 1 Batch: 9045/20099 
(45.00%) Loss: 2.029709 LR: 0.00002445 +[11:56:50] Epoch: 1 Batch: 9046/20099 (45.01%) Loss: 2.314759 LR: 0.00002445 +[11:56:51] Epoch: 1 Batch: 9047/20099 (45.01%) Loss: 1.972628 LR: 0.00002444 +[11:56:53] Epoch: 1 Batch: 9048/20099 (45.02%) Loss: 2.057288 LR: 0.00002444 +[11:56:55] Epoch: 1 Batch: 9049/20099 (45.02%) Loss: 2.040596 LR: 0.00002444 +[11:56:57] Epoch: 1 Batch: 9050/20099 (45.03%) Loss: 2.315883 LR: 0.00002444 +[11:56:58] Epoch: 1 Batch: 9051/20099 (45.03%) Loss: 2.318029 LR: 0.00002444 +[11:57:00] Epoch: 1 Batch: 9052/20099 (45.04%) Loss: 2.473105 LR: 0.00002444 +[11:57:02] Epoch: 1 Batch: 9053/20099 (45.04%) Loss: 2.361902 LR: 0.00002444 +[11:57:04] Epoch: 1 Batch: 9054/20099 (45.05%) Loss: 2.113984 LR: 0.00002443 +[11:57:05] Epoch: 1 Batch: 9055/20099 (45.05%) Loss: 1.990285 LR: 0.00002443 +[11:57:07] Epoch: 1 Batch: 9056/20099 (45.06%) Loss: 1.943011 LR: 0.00002443 +[11:57:09] Epoch: 1 Batch: 9057/20099 (45.06%) Loss: 2.206657 LR: 0.00002443 +[11:57:11] Epoch: 1 Batch: 9058/20099 (45.07%) Loss: 2.286159 LR: 0.00002443 +[11:57:13] Epoch: 1 Batch: 9059/20099 (45.07%) Loss: 2.356273 LR: 0.00002443 +[11:57:14] Epoch: 1 Batch: 9060/20099 (45.08%) Loss: 2.252020 LR: 0.00002443 +[11:57:16] Epoch: 1 Batch: 9061/20099 (45.08%) Loss: 2.236052 LR: 0.00002441 +[11:57:18] Epoch: 1 Batch: 9062/20099 (45.09%) Loss: 2.375777 LR: 0.00002441 +[11:57:20] Epoch: 1 Batch: 9063/20099 (45.09%) Loss: 2.265572 LR: 0.00002441 +[11:57:21] Epoch: 1 Batch: 9064/20099 (45.10%) Loss: 2.420368 LR: 0.00002441 +[11:57:23] Epoch: 1 Batch: 9065/20099 (45.10%) Loss: 2.115771 LR: 0.00002441 +[11:57:25] Epoch: 1 Batch: 9066/20099 (45.11%) Loss: 2.537005 LR: 0.00002441 +[11:57:27] Epoch: 1 Batch: 9067/20099 (45.11%) Loss: 1.812806 LR: 0.00002441 +[11:57:29] Epoch: 1 Batch: 9068/20099 (45.12%) Loss: 1.776789 LR: 0.00002440 +[11:57:30] Epoch: 1 Batch: 9069/20099 (45.12%) Loss: 2.371753 LR: 0.00002440 +[11:57:32] Epoch: 1 Batch: 9070/20099 (45.13%) Loss: 2.009654 LR: 0.00002440 
+[11:57:34] Epoch: 1 Batch: 9071/20099 (45.13%) Loss: 1.730688 LR: 0.00002440 +[11:57:36] Epoch: 1 Batch: 9072/20099 (45.14%) Loss: 1.795186 LR: 0.00002440 +[11:57:38] Epoch: 1 Batch: 9073/20099 (45.14%) Loss: 2.191907 LR: 0.00002440 +[11:57:39] Epoch: 1 Batch: 9074/20099 (45.15%) Loss: 1.856454 LR: 0.00002440 +[11:57:41] Epoch: 1 Batch: 9075/20099 (45.15%) Loss: 2.242178 LR: 0.00002438 +[11:57:43] Epoch: 1 Batch: 9076/20099 (45.16%) Loss: 2.029832 LR: 0.00002438 +[11:57:45] Epoch: 1 Batch: 9077/20099 (45.16%) Loss: 1.987526 LR: 0.00002438 +[11:57:46] Epoch: 1 Batch: 9078/20099 (45.17%) Loss: 2.181558 LR: 0.00002438 +[11:57:48] Epoch: 1 Batch: 9079/20099 (45.17%) Loss: 2.033348 LR: 0.00002438 +[11:57:50] Epoch: 1 Batch: 9080/20099 (45.18%) Loss: 2.167500 LR: 0.00002438 +[11:57:52] Epoch: 1 Batch: 9081/20099 (45.18%) Loss: 2.109411 LR: 0.00002438 +[11:57:54] Epoch: 1 Batch: 9082/20099 (45.19%) Loss: 2.050435 LR: 0.00002437 +[11:57:55] Epoch: 1 Batch: 9083/20099 (45.19%) Loss: 2.111998 LR: 0.00002437 +[11:57:57] Epoch: 1 Batch: 9084/20099 (45.20%) Loss: 2.231886 LR: 0.00002437 +[11:57:59] Epoch: 1 Batch: 9085/20099 (45.20%) Loss: 1.954940 LR: 0.00002437 +[11:58:01] Epoch: 1 Batch: 9086/20099 (45.21%) Loss: 2.389026 LR: 0.00002437 +[11:58:02] Epoch: 1 Batch: 9087/20099 (45.21%) Loss: 2.310574 LR: 0.00002437 +[11:58:04] Epoch: 1 Batch: 9088/20099 (45.22%) Loss: 2.196440 LR: 0.00002437 +[11:58:06] Epoch: 1 Batch: 9089/20099 (45.22%) Loss: 2.168095 LR: 0.00002436 +[11:58:08] Epoch: 1 Batch: 9090/20099 (45.23%) Loss: 2.001562 LR: 0.00002436 +[11:58:09] Epoch: 1 Batch: 9091/20099 (45.23%) Loss: 2.129727 LR: 0.00002436 +[11:58:11] Epoch: 1 Batch: 9092/20099 (45.24%) Loss: 2.124660 LR: 0.00002436 +[11:58:13] Epoch: 1 Batch: 9093/20099 (45.24%) Loss: 1.932333 LR: 0.00002436 +[11:58:15] Epoch: 1 Batch: 9094/20099 (45.25%) Loss: 1.928227 LR: 0.00002436 +[11:58:16] Epoch: 1 Batch: 9095/20099 (45.25%) Loss: 2.066949 LR: 0.00002436 +[11:58:18] Epoch: 1 Batch: 9096/20099 (45.26%) 
Loss: 2.249023 LR: 0.00002434 +[11:58:20] Epoch: 1 Batch: 9097/20099 (45.26%) Loss: 2.394020 LR: 0.00002434 +[11:58:22] Epoch: 1 Batch: 9098/20099 (45.27%) Loss: 2.157460 LR: 0.00002434 +[11:58:23] Epoch: 1 Batch: 9099/20099 (45.27%) Loss: 2.152714 LR: 0.00002434 +[11:58:25] Epoch: 1 Batch: 9100/20099 (45.28%) Loss: 2.190256 LR: 0.00002434 +[11:58:27] Epoch: 1 Batch: 9101/20099 (45.28%) Loss: 1.802599 LR: 0.00002434 +[11:58:29] Epoch: 1 Batch: 9102/20099 (45.29%) Loss: 2.163137 LR: 0.00002434 +[11:58:31] Epoch: 1 Batch: 9103/20099 (45.29%) Loss: 2.383504 LR: 0.00002433 +[11:58:32] Epoch: 1 Batch: 9104/20099 (45.30%) Loss: 2.511687 LR: 0.00002433 +[11:58:34] Epoch: 1 Batch: 9105/20099 (45.30%) Loss: 2.400532 LR: 0.00002433 +[11:58:36] Epoch: 1 Batch: 9106/20099 (45.31%) Loss: 2.127068 LR: 0.00002433 +[11:58:38] Epoch: 1 Batch: 9107/20099 (45.31%) Loss: 2.294084 LR: 0.00002433 +[11:58:39] Epoch: 1 Batch: 9108/20099 (45.32%) Loss: 2.340208 LR: 0.00002433 +[11:58:41] Epoch: 1 Batch: 9109/20099 (45.32%) Loss: 2.100180 LR: 0.00002433 +[11:58:43] Epoch: 1 Batch: 9110/20099 (45.33%) Loss: 2.044102 LR: 0.00002432 +[11:58:45] Epoch: 1 Batch: 9111/20099 (45.33%) Loss: 2.411339 LR: 0.00002432 +[11:58:46] Epoch: 1 Batch: 9112/20099 (45.34%) Loss: 1.948151 LR: 0.00002432 +[11:58:48] Epoch: 1 Batch: 9113/20099 (45.34%) Loss: 1.971307 LR: 0.00002432 +[11:58:50] Epoch: 1 Batch: 9114/20099 (45.35%) Loss: 2.114316 LR: 0.00002432 +[11:58:52] Epoch: 1 Batch: 9115/20099 (45.35%) Loss: 2.291146 LR: 0.00002432 +[11:58:54] Epoch: 1 Batch: 9116/20099 (45.36%) Loss: 1.938184 LR: 0.00002432 +[11:58:55] Epoch: 1 Batch: 9117/20099 (45.36%) Loss: 2.010793 LR: 0.00002430 +[11:58:57] Epoch: 1 Batch: 9118/20099 (45.37%) Loss: 2.530650 LR: 0.00002430 +[11:58:59] Epoch: 1 Batch: 9119/20099 (45.37%) Loss: 2.032656 LR: 0.00002430 +[11:59:01] Epoch: 1 Batch: 9120/20099 (45.38%) Loss: 1.895960 LR: 0.00002430 +[11:59:03] Epoch: 1 Batch: 9121/20099 (45.38%) Loss: 2.166624 LR: 0.00002430 +[11:59:04] Epoch: 
1 Batch: 9122/20099 (45.39%) Loss: 2.375689 LR: 0.00002430 +[11:59:06] Epoch: 1 Batch: 9123/20099 (45.39%) Loss: 2.254987 LR: 0.00002430 +[11:59:08] Epoch: 1 Batch: 9124/20099 (45.40%) Loss: 1.886475 LR: 0.00002429 +[11:59:10] Epoch: 1 Batch: 9125/20099 (45.40%) Loss: 2.137772 LR: 0.00002429 +[11:59:11] Epoch: 1 Batch: 9126/20099 (45.41%) Loss: 1.990760 LR: 0.00002429 +[11:59:13] Epoch: 1 Batch: 9127/20099 (45.41%) Loss: 2.231516 LR: 0.00002429 +[11:59:15] Epoch: 1 Batch: 9128/20099 (45.42%) Loss: 2.216008 LR: 0.00002429 +[11:59:17] Epoch: 1 Batch: 9129/20099 (45.42%) Loss: 2.165371 LR: 0.00002429 +[11:59:19] Epoch: 1 Batch: 9130/20099 (45.43%) Loss: 2.222523 LR: 0.00002429 +[11:59:20] Epoch: 1 Batch: 9131/20099 (45.43%) Loss: 2.101853 LR: 0.00002427 +[11:59:22] Epoch: 1 Batch: 9132/20099 (45.44%) Loss: 2.110201 LR: 0.00002427 +[11:59:24] Epoch: 1 Batch: 9133/20099 (45.44%) Loss: 1.856743 LR: 0.00002427 +[11:59:26] Epoch: 1 Batch: 9134/20099 (45.45%) Loss: 2.193601 LR: 0.00002427 +[11:59:27] Epoch: 1 Batch: 9135/20099 (45.45%) Loss: 2.028407 LR: 0.00002427 +[11:59:29] Epoch: 1 Batch: 9136/20099 (45.45%) Loss: 2.129206 LR: 0.00002427 +[11:59:31] Epoch: 1 Batch: 9137/20099 (45.46%) Loss: 2.009054 LR: 0.00002427 +[11:59:33] Epoch: 1 Batch: 9138/20099 (45.46%) Loss: 2.036807 LR: 0.00002426 +[11:59:35] Epoch: 1 Batch: 9139/20099 (45.47%) Loss: 2.248283 LR: 0.00002426 +[11:59:36] Epoch: 1 Batch: 9140/20099 (45.47%) Loss: 1.868717 LR: 0.00002426 +[11:59:38] Epoch: 1 Batch: 9141/20099 (45.48%) Loss: 2.260092 LR: 0.00002426 +[11:59:40] Epoch: 1 Batch: 9142/20099 (45.48%) Loss: 1.954775 LR: 0.00002426 +[11:59:42] Epoch: 1 Batch: 9143/20099 (45.49%) Loss: 2.091927 LR: 0.00002426 +[11:59:43] Epoch: 1 Batch: 9144/20099 (45.49%) Loss: 2.083263 LR: 0.00002426 +[11:59:45] Epoch: 1 Batch: 9145/20099 (45.50%) Loss: 2.049387 LR: 0.00002425 +[11:59:47] Epoch: 1 Batch: 9146/20099 (45.50%) Loss: 2.151494 LR: 0.00002425 +[11:59:49] Epoch: 1 Batch: 9147/20099 (45.51%) Loss: 2.144167 LR: 
0.00002425 +[11:59:51] Epoch: 1 Batch: 9148/20099 (45.51%) Loss: 2.074472 LR: 0.00002425 +[11:59:52] Epoch: 1 Batch: 9149/20099 (45.52%) Loss: 1.789467 LR: 0.00002425 +[11:59:54] Epoch: 1 Batch: 9150/20099 (45.52%) Loss: 1.994728 LR: 0.00002425 +[11:59:56] Epoch: 1 Batch: 9151/20099 (45.53%) Loss: 2.204883 LR: 0.00002425 +[11:59:58] Epoch: 1 Batch: 9152/20099 (45.53%) Loss: 2.147173 LR: 0.00002423 +[11:59:59] Epoch: 1 Batch: 9153/20099 (45.54%) Loss: 2.161818 LR: 0.00002423 +[12:00:01] Epoch: 1 Batch: 9154/20099 (45.54%) Loss: 2.088622 LR: 0.00002423 +[12:00:03] Epoch: 1 Batch: 9155/20099 (45.55%) Loss: 2.293149 LR: 0.00002423 +[12:00:05] Epoch: 1 Batch: 9156/20099 (45.55%) Loss: 2.192732 LR: 0.00002423 +[12:00:06] Epoch: 1 Batch: 9157/20099 (45.56%) Loss: 1.871873 LR: 0.00002423 +[12:00:08] Epoch: 1 Batch: 9158/20099 (45.56%) Loss: 2.209268 LR: 0.00002423 +[12:00:10] Epoch: 1 Batch: 9159/20099 (45.57%) Loss: 1.788508 LR: 0.00002422 +[12:00:12] Epoch: 1 Batch: 9160/20099 (45.57%) Loss: 2.424423 LR: 0.00002422 +[12:00:14] Epoch: 1 Batch: 9161/20099 (45.58%) Loss: 2.040257 LR: 0.00002422 +[12:00:15] Epoch: 1 Batch: 9162/20099 (45.58%) Loss: 1.821468 LR: 0.00002422 +[12:00:17] Epoch: 1 Batch: 9163/20099 (45.59%) Loss: 2.084630 LR: 0.00002422 +[12:00:19] Epoch: 1 Batch: 9164/20099 (45.59%) Loss: 2.240560 LR: 0.00002422 +[12:00:21] Epoch: 1 Batch: 9165/20099 (45.60%) Loss: 2.363619 LR: 0.00002422 +[12:00:22] Epoch: 1 Batch: 9166/20099 (45.60%) Loss: 2.216919 LR: 0.00002421 +[12:00:24] Epoch: 1 Batch: 9167/20099 (45.61%) Loss: 2.069092 LR: 0.00002421 +[12:00:26] Epoch: 1 Batch: 9168/20099 (45.61%) Loss: 2.055189 LR: 0.00002421 +[12:00:28] Epoch: 1 Batch: 9169/20099 (45.62%) Loss: 1.851123 LR: 0.00002421 +[12:00:29] Epoch: 1 Batch: 9170/20099 (45.62%) Loss: 1.812312 LR: 0.00002421 +[12:00:31] Epoch: 1 Batch: 9171/20099 (45.63%) Loss: 2.076532 LR: 0.00002421 +[12:00:33] Epoch: 1 Batch: 9172/20099 (45.63%) Loss: 2.441097 LR: 0.00002421 +[12:00:35] Epoch: 1 Batch: 9173/20099 
(45.64%) Loss: 2.180741 LR: 0.00002419 +[12:00:36] Epoch: 1 Batch: 9174/20099 (45.64%) Loss: 1.986938 LR: 0.00002419 +[12:00:38] Epoch: 1 Batch: 9175/20099 (45.65%) Loss: 2.007969 LR: 0.00002419 +[12:00:40] Epoch: 1 Batch: 9176/20099 (45.65%) Loss: 2.021558 LR: 0.00002419 +[12:00:42] Epoch: 1 Batch: 9177/20099 (45.66%) Loss: 2.153283 LR: 0.00002419 +[12:00:44] Epoch: 1 Batch: 9178/20099 (45.66%) Loss: 2.142602 LR: 0.00002419 +[12:00:45] Epoch: 1 Batch: 9179/20099 (45.67%) Loss: 1.904850 LR: 0.00002419 +[12:00:47] Epoch: 1 Batch: 9180/20099 (45.67%) Loss: 2.087038 LR: 0.00002418 +[12:00:49] Epoch: 1 Batch: 9181/20099 (45.68%) Loss: 2.271775 LR: 0.00002418 +[12:00:51] Epoch: 1 Batch: 9182/20099 (45.68%) Loss: 2.017069 LR: 0.00002418 +[12:00:52] Epoch: 1 Batch: 9183/20099 (45.69%) Loss: 1.889721 LR: 0.00002418 +[12:00:54] Epoch: 1 Batch: 9184/20099 (45.69%) Loss: 2.052370 LR: 0.00002418 +[12:00:56] Epoch: 1 Batch: 9185/20099 (45.70%) Loss: 2.325043 LR: 0.00002418 +[12:00:58] Epoch: 1 Batch: 9186/20099 (45.70%) Loss: 2.159525 LR: 0.00002418 +[12:00:59] Epoch: 1 Batch: 9187/20099 (45.71%) Loss: 2.120241 LR: 0.00002416 +[12:01:01] Epoch: 1 Batch: 9188/20099 (45.71%) Loss: 2.193137 LR: 0.00002416 +[12:01:03] Epoch: 1 Batch: 9189/20099 (45.72%) Loss: 2.097411 LR: 0.00002416 +[12:01:05] Epoch: 1 Batch: 9190/20099 (45.72%) Loss: 2.206403 LR: 0.00002416 +[12:01:06] Epoch: 1 Batch: 9191/20099 (45.73%) Loss: 2.289990 LR: 0.00002416 +[12:01:08] Epoch: 1 Batch: 9192/20099 (45.73%) Loss: 2.521374 LR: 0.00002416 +[12:01:10] Epoch: 1 Batch: 9193/20099 (45.74%) Loss: 1.924747 LR: 0.00002416 +[12:01:12] Epoch: 1 Batch: 9194/20099 (45.74%) Loss: 1.856146 LR: 0.00002415 +[12:01:14] Epoch: 1 Batch: 9195/20099 (45.75%) Loss: 1.888569 LR: 0.00002415 +[12:01:15] Epoch: 1 Batch: 9196/20099 (45.75%) Loss: 2.146529 LR: 0.00002415 +[12:01:17] Epoch: 1 Batch: 9197/20099 (45.76%) Loss: 2.126666 LR: 0.00002415 +[12:01:19] Epoch: 1 Batch: 9198/20099 (45.76%) Loss: 2.299310 LR: 0.00002415 
+[12:01:21] Epoch: 1 Batch: 9199/20099 (45.77%) Loss: 1.915786 LR: 0.00002415 +[12:01:26] >> Cleaned up old temp checkpoint: epoch1_step7200 +[12:01:26] >> Temp checkpoint saved: epoch1_step9200, size: 0.1693 GB +[12:01:26] Epoch: 1 Batch: 9200/20099 (45.77%) Loss: 2.045567 LR: 0.00002415 +[12:01:28] Epoch: 1 Batch: 9201/20099 (45.78%) Loss: 2.350906 LR: 0.00002414 +[12:01:29] Epoch: 1 Batch: 9202/20099 (45.78%) Loss: 2.213795 LR: 0.00002414 +[12:01:31] Epoch: 1 Batch: 9203/20099 (45.79%) Loss: 2.392748 LR: 0.00002414 +[12:01:33] Epoch: 1 Batch: 9204/20099 (45.79%) Loss: 2.083737 LR: 0.00002414 +[12:01:35] Epoch: 1 Batch: 9205/20099 (45.80%) Loss: 2.033147 LR: 0.00002414 +[12:01:36] Epoch: 1 Batch: 9206/20099 (45.80%) Loss: 1.970197 LR: 0.00002414 +[12:01:38] Epoch: 1 Batch: 9207/20099 (45.81%) Loss: 2.187941 LR: 0.00002414 +[12:01:40] Epoch: 1 Batch: 9208/20099 (45.81%) Loss: 1.886406 LR: 0.00002412 +[12:01:42] Epoch: 1 Batch: 9209/20099 (45.82%) Loss: 1.894311 LR: 0.00002412 +[12:01:44] Epoch: 1 Batch: 9210/20099 (45.82%) Loss: 2.360817 LR: 0.00002412 +[12:01:45] Epoch: 1 Batch: 9211/20099 (45.83%) Loss: 2.104266 LR: 0.00002412 +[12:01:47] Epoch: 1 Batch: 9212/20099 (45.83%) Loss: 2.170833 LR: 0.00002412 +[12:01:49] Epoch: 1 Batch: 9213/20099 (45.84%) Loss: 1.718122 LR: 0.00002412 +[12:01:51] Epoch: 1 Batch: 9214/20099 (45.84%) Loss: 2.211202 LR: 0.00002412 +[12:01:53] Epoch: 1 Batch: 9215/20099 (45.85%) Loss: 1.995834 LR: 0.00002411 +[12:01:54] Epoch: 1 Batch: 9216/20099 (45.85%) Loss: 2.062163 LR: 0.00002411 +[12:01:56] Epoch: 1 Batch: 9217/20099 (45.86%) Loss: 2.307980 LR: 0.00002411 +[12:01:58] Epoch: 1 Batch: 9218/20099 (45.86%) Loss: 2.267162 LR: 0.00002411 +[12:02:00] Epoch: 1 Batch: 9219/20099 (45.87%) Loss: 2.297253 LR: 0.00002411 +[12:02:02] Epoch: 1 Batch: 9220/20099 (45.87%) Loss: 2.223510 LR: 0.00002411 +[12:02:03] Epoch: 1 Batch: 9221/20099 (45.88%) Loss: 2.280790 LR: 0.00002411 +[12:02:05] Epoch: 1 Batch: 9222/20099 (45.88%) Loss: 2.110057 LR: 
0.00002409 +[12:02:07] Epoch: 1 Batch: 9223/20099 (45.89%) Loss: 2.110988 LR: 0.00002409 +[12:02:09] Epoch: 1 Batch: 9224/20099 (45.89%) Loss: 2.348033 LR: 0.00002409 +[12:02:10] Epoch: 1 Batch: 9225/20099 (45.90%) Loss: 2.063286 LR: 0.00002409 +[12:02:12] Epoch: 1 Batch: 9226/20099 (45.90%) Loss: 2.061849 LR: 0.00002409 +[12:02:14] Epoch: 1 Batch: 9227/20099 (45.91%) Loss: 2.228657 LR: 0.00002409 +[12:02:16] Epoch: 1 Batch: 9228/20099 (45.91%) Loss: 1.970988 LR: 0.00002409 +[12:02:18] Epoch: 1 Batch: 9229/20099 (45.92%) Loss: 2.355350 LR: 0.00002408 +[12:02:19] Epoch: 1 Batch: 9230/20099 (45.92%) Loss: 1.792391 LR: 0.00002408 +[12:02:21] Epoch: 1 Batch: 9231/20099 (45.93%) Loss: 2.221451 LR: 0.00002408 +[12:02:23] Epoch: 1 Batch: 9232/20099 (45.93%) Loss: 1.916058 LR: 0.00002408 +[12:02:25] Epoch: 1 Batch: 9233/20099 (45.94%) Loss: 2.318926 LR: 0.00002408 +[12:02:26] Epoch: 1 Batch: 9234/20099 (45.94%) Loss: 2.429273 LR: 0.00002408 +[12:02:28] Epoch: 1 Batch: 9235/20099 (45.95%) Loss: 2.250731 LR: 0.00002408 +[12:02:30] Epoch: 1 Batch: 9236/20099 (45.95%) Loss: 2.355707 LR: 0.00002407 +[12:02:32] Epoch: 1 Batch: 9237/20099 (45.96%) Loss: 2.218588 LR: 0.00002407 +[12:02:34] Epoch: 1 Batch: 9238/20099 (45.96%) Loss: 2.134538 LR: 0.00002407 +[12:02:35] Epoch: 1 Batch: 9239/20099 (45.97%) Loss: 2.071525 LR: 0.00002407 +[12:02:37] Epoch: 1 Batch: 9240/20099 (45.97%) Loss: 2.082251 LR: 0.00002407 +[12:02:39] Epoch: 1 Batch: 9241/20099 (45.98%) Loss: 2.028720 LR: 0.00002407 +[12:02:41] Epoch: 1 Batch: 9242/20099 (45.98%) Loss: 1.975001 LR: 0.00002407 +[12:02:42] Epoch: 1 Batch: 9243/20099 (45.99%) Loss: 1.874352 LR: 0.00002405 +[12:02:44] Epoch: 1 Batch: 9244/20099 (45.99%) Loss: 2.165920 LR: 0.00002405 +[12:02:46] Epoch: 1 Batch: 9245/20099 (46.00%) Loss: 2.311582 LR: 0.00002405 +[12:02:48] Epoch: 1 Batch: 9246/20099 (46.00%) Loss: 2.124339 LR: 0.00002405 +[12:02:49] Epoch: 1 Batch: 9247/20099 (46.01%) Loss: 2.177079 LR: 0.00002405 +[12:02:51] Epoch: 1 Batch: 9248/20099 
(46.01%) Loss: 2.211617 LR: 0.00002405 +[12:02:53] Epoch: 1 Batch: 9249/20099 (46.02%) Loss: 2.010850 LR: 0.00002405 +[12:02:55] Epoch: 1 Batch: 9250/20099 (46.02%) Loss: 2.163287 LR: 0.00002404 +[12:02:56] Epoch: 1 Batch: 9251/20099 (46.03%) Loss: 2.267598 LR: 0.00002404 +[12:02:58] Epoch: 1 Batch: 9252/20099 (46.03%) Loss: 2.186189 LR: 0.00002404 +[12:03:00] Epoch: 1 Batch: 9253/20099 (46.04%) Loss: 1.706822 LR: 0.00002404 +[12:03:02] Epoch: 1 Batch: 9254/20099 (46.04%) Loss: 2.185171 LR: 0.00002404 +[12:03:04] Epoch: 1 Batch: 9255/20099 (46.05%) Loss: 2.306379 LR: 0.00002404 +[12:03:05] Epoch: 1 Batch: 9256/20099 (46.05%) Loss: 2.212572 LR: 0.00002404 +[12:03:07] Epoch: 1 Batch: 9257/20099 (46.06%) Loss: 2.298540 LR: 0.00002402 +[12:03:09] Epoch: 1 Batch: 9258/20099 (46.06%) Loss: 1.952045 LR: 0.00002402 +[12:03:11] Epoch: 1 Batch: 9259/20099 (46.07%) Loss: 1.860819 LR: 0.00002402 +[12:03:12] Epoch: 1 Batch: 9260/20099 (46.07%) Loss: 2.198509 LR: 0.00002402 +[12:03:14] Epoch: 1 Batch: 9261/20099 (46.08%) Loss: 2.318628 LR: 0.00002402 +[12:03:16] Epoch: 1 Batch: 9262/20099 (46.08%) Loss: 2.221910 LR: 0.00002402 +[12:03:18] Epoch: 1 Batch: 9263/20099 (46.09%) Loss: 1.937892 LR: 0.00002402 +[12:03:20] Epoch: 1 Batch: 9264/20099 (46.09%) Loss: 2.266819 LR: 0.00002401 +[12:03:21] Epoch: 1 Batch: 9265/20099 (46.10%) Loss: 2.493267 LR: 0.00002401 +[12:03:23] Epoch: 1 Batch: 9266/20099 (46.10%) Loss: 2.041617 LR: 0.00002401 +[12:03:25] Epoch: 1 Batch: 9267/20099 (46.11%) Loss: 2.036446 LR: 0.00002401 +[12:03:27] Epoch: 1 Batch: 9268/20099 (46.11%) Loss: 1.676574 LR: 0.00002401 +[12:03:29] Epoch: 1 Batch: 9269/20099 (46.12%) Loss: 2.131135 LR: 0.00002401 +[12:03:30] Epoch: 1 Batch: 9270/20099 (46.12%) Loss: 1.910474 LR: 0.00002401 +[12:03:32] Epoch: 1 Batch: 9271/20099 (46.13%) Loss: 2.321613 LR: 0.00002400 +[12:03:34] Epoch: 1 Batch: 9272/20099 (46.13%) Loss: 2.359130 LR: 0.00002400 +[12:03:36] Epoch: 1 Batch: 9273/20099 (46.14%) Loss: 1.900019 LR: 0.00002400 
+[12:03:37] Epoch: 1 Batch: 9274/20099 (46.14%) Loss: 2.227680 LR: 0.00002400 +[12:03:39] Epoch: 1 Batch: 9275/20099 (46.15%) Loss: 2.249901 LR: 0.00002400 +[12:03:41] Epoch: 1 Batch: 9276/20099 (46.15%) Loss: 2.093754 LR: 0.00002400 +[12:03:43] Epoch: 1 Batch: 9277/20099 (46.16%) Loss: 2.144181 LR: 0.00002400 +[12:03:45] Epoch: 1 Batch: 9278/20099 (46.16%) Loss: 2.114960 LR: 0.00002398 +[12:03:46] Epoch: 1 Batch: 9279/20099 (46.17%) Loss: 2.346670 LR: 0.00002398 +[12:03:48] Epoch: 1 Batch: 9280/20099 (46.17%) Loss: 1.994242 LR: 0.00002398 +[12:03:50] Epoch: 1 Batch: 9281/20099 (46.18%) Loss: 2.514149 LR: 0.00002398 +[12:03:52] Epoch: 1 Batch: 9282/20099 (46.18%) Loss: 1.919082 LR: 0.00002398 +[12:03:53] Epoch: 1 Batch: 9283/20099 (46.19%) Loss: 2.035897 LR: 0.00002398 +[12:03:55] Epoch: 1 Batch: 9284/20099 (46.19%) Loss: 2.066009 LR: 0.00002398 +[12:03:57] Epoch: 1 Batch: 9285/20099 (46.20%) Loss: 2.431469 LR: 0.00002397 +[12:03:59] Epoch: 1 Batch: 9286/20099 (46.20%) Loss: 2.062090 LR: 0.00002397 +[12:04:01] Epoch: 1 Batch: 9287/20099 (46.21%) Loss: 2.238910 LR: 0.00002397 +[12:04:02] Epoch: 1 Batch: 9288/20099 (46.21%) Loss: 1.806733 LR: 0.00002397 +[12:04:04] Epoch: 1 Batch: 9289/20099 (46.22%) Loss: 2.155999 LR: 0.00002397 +[12:04:06] Epoch: 1 Batch: 9290/20099 (46.22%) Loss: 2.078401 LR: 0.00002397 +[12:04:08] Epoch: 1 Batch: 9291/20099 (46.23%) Loss: 2.058513 LR: 0.00002397 +[12:04:09] Epoch: 1 Batch: 9292/20099 (46.23%) Loss: 2.334511 LR: 0.00002395 +[12:04:11] Epoch: 1 Batch: 9293/20099 (46.24%) Loss: 1.956556 LR: 0.00002395 +[12:04:13] Epoch: 1 Batch: 9294/20099 (46.24%) Loss: 1.847939 LR: 0.00002395 +[12:04:15] Epoch: 1 Batch: 9295/20099 (46.25%) Loss: 2.074734 LR: 0.00002395 +[12:04:16] Epoch: 1 Batch: 9296/20099 (46.25%) Loss: 1.866462 LR: 0.00002395 +[12:04:18] Epoch: 1 Batch: 9297/20099 (46.26%) Loss: 2.245913 LR: 0.00002395 +[12:04:20] Epoch: 1 Batch: 9298/20099 (46.26%) Loss: 2.010411 LR: 0.00002395 +[12:04:22] Epoch: 1 Batch: 9299/20099 (46.27%) 
Loss: 2.289441 LR: 0.00002394 +[12:04:23] Epoch: 1 Batch: 9300/20099 (46.27%) Loss: 2.108822 LR: 0.00002394 +[12:04:25] Epoch: 1 Batch: 9301/20099 (46.28%) Loss: 1.973079 LR: 0.00002394 +[12:04:27] Epoch: 1 Batch: 9302/20099 (46.28%) Loss: 2.071766 LR: 0.00002394 +[12:04:29] Epoch: 1 Batch: 9303/20099 (46.29%) Loss: 2.116529 LR: 0.00002394 +[12:04:30] Epoch: 1 Batch: 9304/20099 (46.29%) Loss: 2.091579 LR: 0.00002394 +[12:04:32] Epoch: 1 Batch: 9305/20099 (46.30%) Loss: 2.317658 LR: 0.00002394 +[12:04:34] Epoch: 1 Batch: 9306/20099 (46.30%) Loss: 2.132154 LR: 0.00002392 +[12:04:36] Epoch: 1 Batch: 9307/20099 (46.31%) Loss: 2.106025 LR: 0.00002392 +[12:04:38] Epoch: 1 Batch: 9308/20099 (46.31%) Loss: 2.300197 LR: 0.00002392 +[12:04:39] Epoch: 1 Batch: 9309/20099 (46.32%) Loss: 2.103794 LR: 0.00002392 +[12:04:41] Epoch: 1 Batch: 9310/20099 (46.32%) Loss: 2.171246 LR: 0.00002392 +[12:04:43] Epoch: 1 Batch: 9311/20099 (46.33%) Loss: 2.476990 LR: 0.00002392 +[12:04:45] Epoch: 1 Batch: 9312/20099 (46.33%) Loss: 1.959389 LR: 0.00002392 +[12:04:46] Epoch: 1 Batch: 9313/20099 (46.34%) Loss: 1.983668 LR: 0.00002391 +[12:04:48] Epoch: 1 Batch: 9314/20099 (46.34%) Loss: 2.283330 LR: 0.00002391 +[12:04:50] Epoch: 1 Batch: 9315/20099 (46.35%) Loss: 2.449835 LR: 0.00002391 +[12:04:52] Epoch: 1 Batch: 9316/20099 (46.35%) Loss: 2.171149 LR: 0.00002391 +[12:04:54] Epoch: 1 Batch: 9317/20099 (46.36%) Loss: 2.098539 LR: 0.00002391 +[12:04:55] Epoch: 1 Batch: 9318/20099 (46.36%) Loss: 1.925911 LR: 0.00002391 +[12:04:57] Epoch: 1 Batch: 9319/20099 (46.37%) Loss: 1.927752 LR: 0.00002391 +[12:04:59] Epoch: 1 Batch: 9320/20099 (46.37%) Loss: 2.331660 LR: 0.00002390 +[12:05:01] Epoch: 1 Batch: 9321/20099 (46.38%) Loss: 2.120948 LR: 0.00002390 +[12:05:02] Epoch: 1 Batch: 9322/20099 (46.38%) Loss: 2.236708 LR: 0.00002390 +[12:05:04] Epoch: 1 Batch: 9323/20099 (46.39%) Loss: 2.142719 LR: 0.00002390 +[12:05:06] Epoch: 1 Batch: 9324/20099 (46.39%) Loss: 2.357676 LR: 0.00002390 +[12:05:08] Epoch: 
1 Batch: 9325/20099 (46.40%) Loss: 2.300825 LR: 0.00002390 +[12:05:09] Epoch: 1 Batch: 9326/20099 (46.40%) Loss: 2.067228 LR: 0.00002390 +[12:05:11] Epoch: 1 Batch: 9327/20099 (46.41%) Loss: 2.114865 LR: 0.00002388 +[12:05:13] Epoch: 1 Batch: 9328/20099 (46.41%) Loss: 1.999580 LR: 0.00002388 +[12:05:15] Epoch: 1 Batch: 9329/20099 (46.42%) Loss: 2.046137 LR: 0.00002388 +[12:05:16] Epoch: 1 Batch: 9330/20099 (46.42%) Loss: 2.062184 LR: 0.00002388 +[12:05:18] Epoch: 1 Batch: 9331/20099 (46.43%) Loss: 2.246281 LR: 0.00002388 +[12:05:20] Epoch: 1 Batch: 9332/20099 (46.43%) Loss: 2.071417 LR: 0.00002388 +[12:05:22] Epoch: 1 Batch: 9333/20099 (46.44%) Loss: 2.187849 LR: 0.00002388 +[12:05:24] Epoch: 1 Batch: 9334/20099 (46.44%) Loss: 2.107283 LR: 0.00002387 +[12:05:25] Epoch: 1 Batch: 9335/20099 (46.45%) Loss: 2.274931 LR: 0.00002387 +[12:05:27] Epoch: 1 Batch: 9336/20099 (46.45%) Loss: 2.032984 LR: 0.00002387 +[12:05:29] Epoch: 1 Batch: 9337/20099 (46.46%) Loss: 2.207665 LR: 0.00002387 +[12:05:31] Epoch: 1 Batch: 9338/20099 (46.46%) Loss: 2.033588 LR: 0.00002387 +[12:05:33] Epoch: 1 Batch: 9339/20099 (46.46%) Loss: 2.478738 LR: 0.00002387 +[12:05:34] Epoch: 1 Batch: 9340/20099 (46.47%) Loss: 2.209998 LR: 0.00002387 +[12:05:36] Epoch: 1 Batch: 9341/20099 (46.47%) Loss: 2.214007 LR: 0.00002385 +[12:05:38] Epoch: 1 Batch: 9342/20099 (46.48%) Loss: 2.003210 LR: 0.00002385 +[12:05:40] Epoch: 1 Batch: 9343/20099 (46.48%) Loss: 2.172973 LR: 0.00002385 +[12:05:41] Epoch: 1 Batch: 9344/20099 (46.49%) Loss: 2.219859 LR: 0.00002385 +[12:05:43] Epoch: 1 Batch: 9345/20099 (46.49%) Loss: 2.347208 LR: 0.00002385 +[12:05:45] Epoch: 1 Batch: 9346/20099 (46.50%) Loss: 2.064275 LR: 0.00002385 +[12:05:47] Epoch: 1 Batch: 9347/20099 (46.50%) Loss: 2.346556 LR: 0.00002385 +[12:05:48] Epoch: 1 Batch: 9348/20099 (46.51%) Loss: 1.972494 LR: 0.00002384 +[12:05:50] Epoch: 1 Batch: 9349/20099 (46.51%) Loss: 2.125959 LR: 0.00002384 +[12:05:52] Epoch: 1 Batch: 9350/20099 (46.52%) Loss: 2.037594 LR: 
0.00002384 +[12:05:54] Epoch: 1 Batch: 9351/20099 (46.52%) Loss: 1.807078 LR: 0.00002384 +[12:05:56] Epoch: 1 Batch: 9352/20099 (46.53%) Loss: 2.009830 LR: 0.00002384 +[12:05:57] Epoch: 1 Batch: 9353/20099 (46.53%) Loss: 1.947050 LR: 0.00002384 +[12:05:59] Epoch: 1 Batch: 9354/20099 (46.54%) Loss: 1.837435 LR: 0.00002384 +[12:06:01] Epoch: 1 Batch: 9355/20099 (46.54%) Loss: 2.100842 LR: 0.00002383 +[12:06:03] Epoch: 1 Batch: 9356/20099 (46.55%) Loss: 2.099627 LR: 0.00002383 +[12:06:04] Epoch: 1 Batch: 9357/20099 (46.55%) Loss: 2.143074 LR: 0.00002383 +[12:06:06] Epoch: 1 Batch: 9358/20099 (46.56%) Loss: 2.658697 LR: 0.00002383 +[12:06:08] Epoch: 1 Batch: 9359/20099 (46.56%) Loss: 2.196622 LR: 0.00002383 +[12:06:10] Epoch: 1 Batch: 9360/20099 (46.57%) Loss: 2.005906 LR: 0.00002383 +[12:06:12] Epoch: 1 Batch: 9361/20099 (46.57%) Loss: 1.869178 LR: 0.00002383 +[12:06:13] Epoch: 1 Batch: 9362/20099 (46.58%) Loss: 2.191496 LR: 0.00002381 +[12:06:15] Epoch: 1 Batch: 9363/20099 (46.58%) Loss: 2.167088 LR: 0.00002381 +[12:06:17] Epoch: 1 Batch: 9364/20099 (46.59%) Loss: 2.201173 LR: 0.00002381 +[12:06:19] Epoch: 1 Batch: 9365/20099 (46.59%) Loss: 1.829090 LR: 0.00002381 +[12:06:20] Epoch: 1 Batch: 9366/20099 (46.60%) Loss: 1.850945 LR: 0.00002381 +[12:06:22] Epoch: 1 Batch: 9367/20099 (46.60%) Loss: 2.201318 LR: 0.00002381 +[12:06:24] Epoch: 1 Batch: 9368/20099 (46.61%) Loss: 2.059328 LR: 0.00002381 +[12:06:26] Epoch: 1 Batch: 9369/20099 (46.61%) Loss: 2.258810 LR: 0.00002380 +[12:06:27] Epoch: 1 Batch: 9370/20099 (46.62%) Loss: 2.085174 LR: 0.00002380 +[12:06:29] Epoch: 1 Batch: 9371/20099 (46.62%) Loss: 1.890519 LR: 0.00002380 +[12:06:31] Epoch: 1 Batch: 9372/20099 (46.63%) Loss: 2.116861 LR: 0.00002380 +[12:06:33] Epoch: 1 Batch: 9373/20099 (46.63%) Loss: 2.003023 LR: 0.00002380 +[12:06:35] Epoch: 1 Batch: 9374/20099 (46.64%) Loss: 2.249454 LR: 0.00002380 +[12:06:36] Epoch: 1 Batch: 9375/20099 (46.64%) Loss: 2.286725 LR: 0.00002380 +[12:06:38] Epoch: 1 Batch: 9376/20099 
(46.65%) Loss: 2.281140 LR: 0.00002378 +[12:06:40] Epoch: 1 Batch: 9377/20099 (46.65%) Loss: 2.191887 LR: 0.00002378 +[12:06:42] Epoch: 1 Batch: 9378/20099 (46.66%) Loss: 2.014700 LR: 0.00002378 +[12:06:43] Epoch: 1 Batch: 9379/20099 (46.66%) Loss: 2.123560 LR: 0.00002378 +[12:06:45] Epoch: 1 Batch: 9380/20099 (46.67%) Loss: 1.912226 LR: 0.00002378 +[12:06:47] Epoch: 1 Batch: 9381/20099 (46.67%) Loss: 1.930715 LR: 0.00002378 +[12:06:49] Epoch: 1 Batch: 9382/20099 (46.68%) Loss: 1.778153 LR: 0.00002378 +[12:06:51] Epoch: 1 Batch: 9383/20099 (46.68%) Loss: 2.215627 LR: 0.00002377 +[12:06:52] Epoch: 1 Batch: 9384/20099 (46.69%) Loss: 2.076908 LR: 0.00002377 +[12:06:54] Epoch: 1 Batch: 9385/20099 (46.69%) Loss: 2.466665 LR: 0.00002377 +[12:06:56] Epoch: 1 Batch: 9386/20099 (46.70%) Loss: 2.248737 LR: 0.00002377 +[12:06:58] Epoch: 1 Batch: 9387/20099 (46.70%) Loss: 2.063239 LR: 0.00002377 +[12:07:00] Epoch: 1 Batch: 9388/20099 (46.71%) Loss: 1.952344 LR: 0.00002377 +[12:07:01] Epoch: 1 Batch: 9389/20099 (46.71%) Loss: 1.833552 LR: 0.00002377 +[12:07:03] Epoch: 1 Batch: 9390/20099 (46.72%) Loss: 2.184552 LR: 0.00002375 +[12:07:05] Epoch: 1 Batch: 9391/20099 (46.72%) Loss: 1.941707 LR: 0.00002375 +[12:07:07] Epoch: 1 Batch: 9392/20099 (46.73%) Loss: 1.916327 LR: 0.00002375 +[12:07:08] Epoch: 1 Batch: 9393/20099 (46.73%) Loss: 2.278915 LR: 0.00002375 +[12:07:10] Epoch: 1 Batch: 9394/20099 (46.74%) Loss: 2.134921 LR: 0.00002375 +[12:07:12] Epoch: 1 Batch: 9395/20099 (46.74%) Loss: 2.005348 LR: 0.00002375 +[12:07:14] Epoch: 1 Batch: 9396/20099 (46.75%) Loss: 1.964191 LR: 0.00002375 +[12:07:16] Epoch: 1 Batch: 9397/20099 (46.75%) Loss: 2.199244 LR: 0.00002374 +[12:07:17] Epoch: 1 Batch: 9398/20099 (46.76%) Loss: 2.275865 LR: 0.00002374 +[12:07:19] Epoch: 1 Batch: 9399/20099 (46.76%) Loss: 2.423033 LR: 0.00002374 +[12:07:24] >> Cleaned up old temp checkpoint: epoch1_step7400 +[12:07:24] >> Temp checkpoint saved: epoch1_step9400, size: 0.1693 GB +[12:07:24] Epoch: 1 Batch: 
9400/20099 (46.77%) Loss: 1.907833 LR: 0.00002374 +[12:07:26] Epoch: 1 Batch: 9401/20099 (46.77%) Loss: 2.258406 LR: 0.00002374 +[12:07:28] Epoch: 1 Batch: 9402/20099 (46.78%) Loss: 1.967403 LR: 0.00002374 +[12:07:30] Epoch: 1 Batch: 9403/20099 (46.78%) Loss: 2.172712 LR: 0.00002374 +[12:07:32] Epoch: 1 Batch: 9404/20099 (46.79%) Loss: 2.106892 LR: 0.00002373 +[12:07:33] Epoch: 1 Batch: 9405/20099 (46.79%) Loss: 2.208883 LR: 0.00002373 +[12:07:35] Epoch: 1 Batch: 9406/20099 (46.80%) Loss: 2.230259 LR: 0.00002373 +[12:07:37] Epoch: 1 Batch: 9407/20099 (46.80%) Loss: 2.010746 LR: 0.00002373 +[12:07:39] Epoch: 1 Batch: 9408/20099 (46.81%) Loss: 2.086753 LR: 0.00002373 +[12:07:40] Epoch: 1 Batch: 9409/20099 (46.81%) Loss: 1.925590 LR: 0.00002373 +[12:07:42] Epoch: 1 Batch: 9410/20099 (46.82%) Loss: 2.383957 LR: 0.00002373 +[12:07:44] Epoch: 1 Batch: 9411/20099 (46.82%) Loss: 2.261527 LR: 0.00002371 +[12:07:46] Epoch: 1 Batch: 9412/20099 (46.83%) Loss: 2.158074 LR: 0.00002371 +[12:07:47] Epoch: 1 Batch: 9413/20099 (46.83%) Loss: 1.955886 LR: 0.00002371 +[12:07:49] Epoch: 1 Batch: 9414/20099 (46.84%) Loss: 2.327398 LR: 0.00002371 +[12:07:51] Epoch: 1 Batch: 9415/20099 (46.84%) Loss: 2.030664 LR: 0.00002371 +[12:07:53] Epoch: 1 Batch: 9416/20099 (46.85%) Loss: 2.160621 LR: 0.00002371 +[12:07:55] Epoch: 1 Batch: 9417/20099 (46.85%) Loss: 2.019266 LR: 0.00002371 +[12:07:56] Epoch: 1 Batch: 9418/20099 (46.86%) Loss: 2.069087 LR: 0.00002370 +[12:07:58] Epoch: 1 Batch: 9419/20099 (46.86%) Loss: 2.036530 LR: 0.00002370 +[12:08:00] Epoch: 1 Batch: 9420/20099 (46.87%) Loss: 2.098251 LR: 0.00002370 +[12:08:02] Epoch: 1 Batch: 9421/20099 (46.87%) Loss: 2.012729 LR: 0.00002370 +[12:08:04] Epoch: 1 Batch: 9422/20099 (46.88%) Loss: 2.169810 LR: 0.00002370 +[12:08:05] Epoch: 1 Batch: 9423/20099 (46.88%) Loss: 1.990564 LR: 0.00002370 +[12:08:07] Epoch: 1 Batch: 9424/20099 (46.89%) Loss: 2.126844 LR: 0.00002370 +[12:08:09] Epoch: 1 Batch: 9425/20099 (46.89%) Loss: 2.010858 LR: 0.00002368 
+[12:08:11] Epoch: 1 Batch: 9426/20099 (46.90%) Loss: 1.741487 LR: 0.00002368 +[12:08:13] Epoch: 1 Batch: 9427/20099 (46.90%) Loss: 2.058780 LR: 0.00002368 +[12:08:14] Epoch: 1 Batch: 9428/20099 (46.91%) Loss: 2.176776 LR: 0.00002368 +[12:08:16] Epoch: 1 Batch: 9429/20099 (46.91%) Loss: 2.404131 LR: 0.00002368 +[12:08:18] Epoch: 1 Batch: 9430/20099 (46.92%) Loss: 2.564886 LR: 0.00002368 +[12:08:20] Epoch: 1 Batch: 9431/20099 (46.92%) Loss: 2.164956 LR: 0.00002368 +[12:08:21] Epoch: 1 Batch: 9432/20099 (46.93%) Loss: 2.497317 LR: 0.00002367 +[12:08:23] Epoch: 1 Batch: 9433/20099 (46.93%) Loss: 2.002699 LR: 0.00002367 +[12:08:25] Epoch: 1 Batch: 9434/20099 (46.94%) Loss: 2.178479 LR: 0.00002367 +[12:08:27] Epoch: 1 Batch: 9435/20099 (46.94%) Loss: 1.958166 LR: 0.00002367 +[12:08:29] Epoch: 1 Batch: 9436/20099 (46.95%) Loss: 2.052859 LR: 0.00002367 +[12:08:30] Epoch: 1 Batch: 9437/20099 (46.95%) Loss: 1.992962 LR: 0.00002367 +[12:08:32] Epoch: 1 Batch: 9438/20099 (46.96%) Loss: 1.706176 LR: 0.00002367 +[12:08:34] Epoch: 1 Batch: 9439/20099 (46.96%) Loss: 1.873890 LR: 0.00002365 +[12:08:36] Epoch: 1 Batch: 9440/20099 (46.97%) Loss: 1.904483 LR: 0.00002365 +[12:08:37] Epoch: 1 Batch: 9441/20099 (46.97%) Loss: 2.244889 LR: 0.00002365 +[12:08:39] Epoch: 1 Batch: 9442/20099 (46.98%) Loss: 1.769275 LR: 0.00002365 +[12:08:41] Epoch: 1 Batch: 9443/20099 (46.98%) Loss: 1.941209 LR: 0.00002365 +[12:08:43] Epoch: 1 Batch: 9444/20099 (46.99%) Loss: 1.993742 LR: 0.00002365 +[12:08:44] Epoch: 1 Batch: 9445/20099 (46.99%) Loss: 2.206442 LR: 0.00002365 +[12:08:46] Epoch: 1 Batch: 9446/20099 (47.00%) Loss: 1.817183 LR: 0.00002364 +[12:08:48] Epoch: 1 Batch: 9447/20099 (47.00%) Loss: 1.941155 LR: 0.00002364 +[12:08:50] Epoch: 1 Batch: 9448/20099 (47.01%) Loss: 2.283155 LR: 0.00002364 +[12:08:52] Epoch: 1 Batch: 9449/20099 (47.01%) Loss: 2.183802 LR: 0.00002364 +[12:08:53] Epoch: 1 Batch: 9450/20099 (47.02%) Loss: 2.200827 LR: 0.00002364 +[12:08:55] Epoch: 1 Batch: 9451/20099 (47.02%) 
Loss: 1.961768 LR: 0.00002364 +[12:08:57] Epoch: 1 Batch: 9452/20099 (47.03%) Loss: 2.129909 LR: 0.00002364 +[12:08:59] Epoch: 1 Batch: 9453/20099 (47.03%) Loss: 1.939364 LR: 0.00002363 +[12:09:00] Epoch: 1 Batch: 9454/20099 (47.04%) Loss: 1.984640 LR: 0.00002363 +[12:09:02] Epoch: 1 Batch: 9455/20099 (47.04%) Loss: 1.668297 LR: 0.00002363 +[12:09:04] Epoch: 1 Batch: 9456/20099 (47.05%) Loss: 1.881298 LR: 0.00002363 +[12:09:06] Epoch: 1 Batch: 9457/20099 (47.05%) Loss: 2.266949 LR: 0.00002363 +[12:09:07] Epoch: 1 Batch: 9458/20099 (47.06%) Loss: 2.198484 LR: 0.00002363 +[12:09:09] Epoch: 1 Batch: 9459/20099 (47.06%) Loss: 2.332202 LR: 0.00002363 +[12:09:11] Epoch: 1 Batch: 9460/20099 (47.07%) Loss: 1.882725 LR: 0.00002361 +[12:09:13] Epoch: 1 Batch: 9461/20099 (47.07%) Loss: 2.210719 LR: 0.00002361 +[12:09:14] Epoch: 1 Batch: 9462/20099 (47.08%) Loss: 2.068842 LR: 0.00002361 +[12:09:16] Epoch: 1 Batch: 9463/20099 (47.08%) Loss: 2.213252 LR: 0.00002361 +[12:09:18] Epoch: 1 Batch: 9464/20099 (47.09%) Loss: 2.200275 LR: 0.00002361 +[12:09:20] Epoch: 1 Batch: 9465/20099 (47.09%) Loss: 2.345363 LR: 0.00002361 +[12:09:22] Epoch: 1 Batch: 9466/20099 (47.10%) Loss: 1.978821 LR: 0.00002361 +[12:09:23] Epoch: 1 Batch: 9467/20099 (47.10%) Loss: 2.023939 LR: 0.00002360 +[12:09:25] Epoch: 1 Batch: 9468/20099 (47.11%) Loss: 1.989781 LR: 0.00002360 +[12:09:27] Epoch: 1 Batch: 9469/20099 (47.11%) Loss: 2.047773 LR: 0.00002360 +[12:09:29] Epoch: 1 Batch: 9470/20099 (47.12%) Loss: 2.062806 LR: 0.00002360 +[12:09:30] Epoch: 1 Batch: 9471/20099 (47.12%) Loss: 2.282166 LR: 0.00002360 +[12:09:32] Epoch: 1 Batch: 9472/20099 (47.13%) Loss: 2.183056 LR: 0.00002360 +[12:09:34] Epoch: 1 Batch: 9473/20099 (47.13%) Loss: 2.243507 LR: 0.00002360 +[12:09:36] Epoch: 1 Batch: 9474/20099 (47.14%) Loss: 2.210069 LR: 0.00002358 +[12:09:37] Epoch: 1 Batch: 9475/20099 (47.14%) Loss: 1.780221 LR: 0.00002358 +[12:09:39] Epoch: 1 Batch: 9476/20099 (47.15%) Loss: 1.994505 LR: 0.00002358 +[12:09:41] Epoch: 
1 Batch: 9477/20099 (47.15%) Loss: 1.933585 LR: 0.00002358 +[12:09:43] Epoch: 1 Batch: 9478/20099 (47.16%) Loss: 1.911352 LR: 0.00002358 +[12:09:44] Epoch: 1 Batch: 9479/20099 (47.16%) Loss: 2.143031 LR: 0.00002358 +[12:09:46] Epoch: 1 Batch: 9480/20099 (47.17%) Loss: 2.011117 LR: 0.00002358 +[12:09:48] Epoch: 1 Batch: 9481/20099 (47.17%) Loss: 2.197650 LR: 0.00002357 +[12:09:50] Epoch: 1 Batch: 9482/20099 (47.18%) Loss: 2.188788 LR: 0.00002357 +[12:09:52] Epoch: 1 Batch: 9483/20099 (47.18%) Loss: 2.122593 LR: 0.00002357 +[12:09:53] Epoch: 1 Batch: 9484/20099 (47.19%) Loss: 2.077100 LR: 0.00002357 +[12:09:55] Epoch: 1 Batch: 9485/20099 (47.19%) Loss: 2.461115 LR: 0.00002357 +[12:09:57] Epoch: 1 Batch: 9486/20099 (47.20%) Loss: 1.677493 LR: 0.00002357 +[12:09:59] Epoch: 1 Batch: 9487/20099 (47.20%) Loss: 2.160422 LR: 0.00002357 +[12:10:00] Epoch: 1 Batch: 9488/20099 (47.21%) Loss: 2.077689 LR: 0.00002355 +[12:10:02] Epoch: 1 Batch: 9489/20099 (47.21%) Loss: 2.167521 LR: 0.00002355 +[12:10:04] Epoch: 1 Batch: 9490/20099 (47.22%) Loss: 2.092480 LR: 0.00002355 +[12:10:06] Epoch: 1 Batch: 9491/20099 (47.22%) Loss: 2.101011 LR: 0.00002355 +[12:10:07] Epoch: 1 Batch: 9492/20099 (47.23%) Loss: 2.152921 LR: 0.00002355 +[12:10:09] Epoch: 1 Batch: 9493/20099 (47.23%) Loss: 1.897252 LR: 0.00002355 +[12:10:11] Epoch: 1 Batch: 9494/20099 (47.24%) Loss: 2.191504 LR: 0.00002355 +[12:10:13] Epoch: 1 Batch: 9495/20099 (47.24%) Loss: 2.150588 LR: 0.00002354 +[12:10:14] Epoch: 1 Batch: 9496/20099 (47.25%) Loss: 2.163707 LR: 0.00002354 +[12:10:16] Epoch: 1 Batch: 9497/20099 (47.25%) Loss: 2.080167 LR: 0.00002354 +[12:10:18] Epoch: 1 Batch: 9498/20099 (47.26%) Loss: 2.012486 LR: 0.00002354 +[12:10:20] Epoch: 1 Batch: 9499/20099 (47.26%) Loss: 2.247128 LR: 0.00002354 +[12:10:22] >> Evaluating batch 0 +[12:10:23] >> Evaluating batch 1 +[12:10:24] >> Evaluating batch 2 +[12:10:25] >> Evaluating batch 3 +[12:10:26] >> Evaluating batch 4 +[12:10:27] >> Evaluating batch 5 +[12:10:28] >> 
Evaluating batch 6 +[12:10:29] >> Evaluating batch 7 +[12:10:30] >> Evaluating batch 8 +[12:10:31] >> Evaluating batch 9 +[12:10:32] >> Evaluating batch 10 +[12:10:33] >> Evaluating batch 11 +[12:10:34] >> Evaluating batch 12 +[12:10:35] >> Evaluating batch 13 +[12:10:36] >> Evaluating batch 14 +[12:10:36] >> Evaluating batch 15 +[12:10:37] >> Evaluating batch 16 +[12:10:38] Epoch: 1 Step: 9500/20099 Evaluation: +[12:10:38] Avg Loss Since Last Eval: 2.1206 Val Loss: 2.1674 Validation loss delta: -0.0036 Perplexity: 8.7354 LR: 0.00002354 +[12:10:42] >> Checkpoint saved: epoch1_step9500, size: 0.1693 GB +[12:10:42] Epoch: 1 Batch: 9500/20099 (47.27%) Loss: 2.148778 LR: 0.00002354 +[12:10:43] Epoch: 1 Batch: 9501/20099 (47.27%) Loss: 2.160522 LR: 0.00002354 +[12:10:45] Epoch: 1 Batch: 9502/20099 (47.28%) Loss: 2.229930 LR: 0.00002353 +[12:10:47] Epoch: 1 Batch: 9503/20099 (47.28%) Loss: 1.917577 LR: 0.00002353 +[12:10:49] Epoch: 1 Batch: 9504/20099 (47.29%) Loss: 2.067605 LR: 0.00002353 +[12:10:50] Epoch: 1 Batch: 9505/20099 (47.29%) Loss: 1.998001 LR: 0.00002353 +[12:10:52] Epoch: 1 Batch: 9506/20099 (47.30%) Loss: 2.256693 LR: 0.00002353 +[12:10:54] Epoch: 1 Batch: 9507/20099 (47.30%) Loss: 2.133910 LR: 0.00002353 +[12:10:56] Epoch: 1 Batch: 9508/20099 (47.31%) Loss: 2.324447 LR: 0.00002353 +[12:10:57] Epoch: 1 Batch: 9509/20099 (47.31%) Loss: 1.982158 LR: 0.00002351 +[12:10:59] Epoch: 1 Batch: 9510/20099 (47.32%) Loss: 2.174435 LR: 0.00002351 +[12:11:01] Epoch: 1 Batch: 9511/20099 (47.32%) Loss: 2.153279 LR: 0.00002351 +[12:11:03] Epoch: 1 Batch: 9512/20099 (47.33%) Loss: 1.974454 LR: 0.00002351 +[12:11:05] Epoch: 1 Batch: 9513/20099 (47.33%) Loss: 1.937890 LR: 0.00002351 +[12:11:06] Epoch: 1 Batch: 9514/20099 (47.34%) Loss: 2.187612 LR: 0.00002351 +[12:11:08] Epoch: 1 Batch: 9515/20099 (47.34%) Loss: 1.954257 LR: 0.00002351 +[12:11:10] Epoch: 1 Batch: 9516/20099 (47.35%) Loss: 2.273385 LR: 0.00002350 +[12:11:12] Epoch: 1 Batch: 9517/20099 (47.35%) Loss:
2.387912 LR: 0.00002350 +[12:11:14] Epoch: 1 Batch: 9518/20099 (47.36%) Loss: 1.922419 LR: 0.00002350 +[12:11:15] Epoch: 1 Batch: 9519/20099 (47.36%) Loss: 2.176181 LR: 0.00002350 +[12:11:17] Epoch: 1 Batch: 9520/20099 (47.37%) Loss: 2.347123 LR: 0.00002350 +[12:11:19] Epoch: 1 Batch: 9521/20099 (47.37%) Loss: 2.261538 LR: 0.00002350 +[12:11:21] Epoch: 1 Batch: 9522/20099 (47.38%) Loss: 1.714682 LR: 0.00002350 +[12:11:23] Epoch: 1 Batch: 9523/20099 (47.38%) Loss: 1.612215 LR: 0.00002348 +[12:11:24] Epoch: 1 Batch: 9524/20099 (47.39%) Loss: 1.889985 LR: 0.00002348 +[12:11:26] Epoch: 1 Batch: 9525/20099 (47.39%) Loss: 2.145338 LR: 0.00002348 +[12:11:28] Epoch: 1 Batch: 9526/20099 (47.40%) Loss: 2.152972 LR: 0.00002348 +[12:11:30] Epoch: 1 Batch: 9527/20099 (47.40%) Loss: 1.798252 LR: 0.00002348 +[12:11:31] Epoch: 1 Batch: 9528/20099 (47.41%) Loss: 2.168269 LR: 0.00002348 +[12:11:33] Epoch: 1 Batch: 9529/20099 (47.41%) Loss: 1.879148 LR: 0.00002348 +[12:11:35] Epoch: 1 Batch: 9530/20099 (47.42%) Loss: 1.927739 LR: 0.00002347 +[12:11:37] Epoch: 1 Batch: 9531/20099 (47.42%) Loss: 2.188153 LR: 0.00002347 +[12:11:39] Epoch: 1 Batch: 9532/20099 (47.43%) Loss: 2.266184 LR: 0.00002347 +[12:11:40] Epoch: 1 Batch: 9533/20099 (47.43%) Loss: 2.034377 LR: 0.00002347 +[12:11:42] Epoch: 1 Batch: 9534/20099 (47.44%) Loss: 2.031463 LR: 0.00002347 +[12:11:44] Epoch: 1 Batch: 9535/20099 (47.44%) Loss: 2.222976 LR: 0.00002347 +[12:11:46] Epoch: 1 Batch: 9536/20099 (47.45%) Loss: 1.937987 LR: 0.00002347 +[12:11:47] Epoch: 1 Batch: 9537/20099 (47.45%) Loss: 2.139163 LR: 0.00002345 +[12:11:49] Epoch: 1 Batch: 9538/20099 (47.46%) Loss: 2.161989 LR: 0.00002345 +[12:11:51] Epoch: 1 Batch: 9539/20099 (47.46%) Loss: 1.978923 LR: 0.00002345 +[12:11:53] Epoch: 1 Batch: 9540/20099 (47.47%) Loss: 2.172172 LR: 0.00002345 +[12:11:54] Epoch: 1 Batch: 9541/20099 (47.47%) Loss: 2.032835 LR: 0.00002345 +[12:11:56] Epoch: 1 Batch: 9542/20099 (47.47%) Loss: 2.350900 LR: 0.00002345 +[12:11:58] Epoch: 1 
Batch: 9543/20099 (47.48%) Loss: 1.959613 LR: 0.00002345 +[12:12:00] Epoch: 1 Batch: 9544/20099 (47.48%) Loss: 2.127566 LR: 0.00002344 +[12:12:02] Epoch: 1 Batch: 9545/20099 (47.49%) Loss: 1.962626 LR: 0.00002344 +[12:12:03] Epoch: 1 Batch: 9546/20099 (47.49%) Loss: 2.079700 LR: 0.00002344 +[12:12:05] Epoch: 1 Batch: 9547/20099 (47.50%) Loss: 1.808757 LR: 0.00002344 +[12:12:07] Epoch: 1 Batch: 9548/20099 (47.50%) Loss: 1.974911 LR: 0.00002344 +[12:12:09] Epoch: 1 Batch: 9549/20099 (47.51%) Loss: 1.945500 LR: 0.00002344 +[12:12:10] Epoch: 1 Batch: 9550/20099 (47.51%) Loss: 2.254160 LR: 0.00002344 +[12:12:12] Epoch: 1 Batch: 9551/20099 (47.52%) Loss: 2.463157 LR: 0.00002342 +[12:12:14] Epoch: 1 Batch: 9552/20099 (47.52%) Loss: 2.336564 LR: 0.00002342 +[12:12:16] Epoch: 1 Batch: 9553/20099 (47.53%) Loss: 2.147993 LR: 0.00002342 +[12:12:17] Epoch: 1 Batch: 9554/20099 (47.53%) Loss: 2.160455 LR: 0.00002342 +[12:12:19] Epoch: 1 Batch: 9555/20099 (47.54%) Loss: 1.993869 LR: 0.00002342 +[12:12:21] Epoch: 1 Batch: 9556/20099 (47.54%) Loss: 2.200224 LR: 0.00002342 +[12:12:23] Epoch: 1 Batch: 9557/20099 (47.55%) Loss: 2.165503 LR: 0.00002342 +[12:12:24] Epoch: 1 Batch: 9558/20099 (47.55%) Loss: 1.916026 LR: 0.00002341 +[12:12:26] Epoch: 1 Batch: 9559/20099 (47.56%) Loss: 1.905039 LR: 0.00002341 +[12:12:28] Epoch: 1 Batch: 9560/20099 (47.56%) Loss: 2.041066 LR: 0.00002341 +[12:12:30] Epoch: 1 Batch: 9561/20099 (47.57%) Loss: 2.348471 LR: 0.00002341 +[12:12:32] Epoch: 1 Batch: 9562/20099 (47.57%) Loss: 1.924349 LR: 0.00002341 +[12:12:33] Epoch: 1 Batch: 9563/20099 (47.58%) Loss: 1.907700 LR: 0.00002341 +[12:12:35] Epoch: 1 Batch: 9564/20099 (47.58%) Loss: 1.937733 LR: 0.00002341 +[12:12:37] Epoch: 1 Batch: 9565/20099 (47.59%) Loss: 1.994278 LR: 0.00002339 +[12:12:39] Epoch: 1 Batch: 9566/20099 (47.59%) Loss: 2.114252 LR: 0.00002339 +[12:12:40] Epoch: 1 Batch: 9567/20099 (47.60%) Loss: 2.197126 LR: 0.00002339 +[12:12:42] Epoch: 1 Batch: 9568/20099 (47.60%) Loss: 2.332936 LR: 
0.00002339 +[12:12:44] Epoch: 1 Batch: 9569/20099 (47.61%) Loss: 2.410905 LR: 0.00002339 +[12:12:46] Epoch: 1 Batch: 9570/20099 (47.61%) Loss: 2.191952 LR: 0.00002339 +[12:12:48] Epoch: 1 Batch: 9571/20099 (47.62%) Loss: 1.933559 LR: 0.00002339 +[12:12:49] Epoch: 1 Batch: 9572/20099 (47.62%) Loss: 2.111486 LR: 0.00002338 +[12:12:51] Epoch: 1 Batch: 9573/20099 (47.63%) Loss: 2.042972 LR: 0.00002338 +[12:12:53] Epoch: 1 Batch: 9574/20099 (47.63%) Loss: 2.065355 LR: 0.00002338 +[12:12:55] Epoch: 1 Batch: 9575/20099 (47.64%) Loss: 2.442487 LR: 0.00002338 +[12:12:56] Epoch: 1 Batch: 9576/20099 (47.64%) Loss: 2.338300 LR: 0.00002338 +[12:12:58] Epoch: 1 Batch: 9577/20099 (47.65%) Loss: 2.454665 LR: 0.00002338 +[12:13:00] Epoch: 1 Batch: 9578/20099 (47.65%) Loss: 1.964149 LR: 0.00002338 +[12:13:02] Epoch: 1 Batch: 9579/20099 (47.66%) Loss: 1.956479 LR: 0.00002337 +[12:13:04] Epoch: 1 Batch: 9580/20099 (47.66%) Loss: 1.883209 LR: 0.00002337 +[12:13:05] Epoch: 1 Batch: 9581/20099 (47.67%) Loss: 2.015712 LR: 0.00002337 +[12:13:07] Epoch: 1 Batch: 9582/20099 (47.67%) Loss: 2.117326 LR: 0.00002337 +[12:13:09] Epoch: 1 Batch: 9583/20099 (47.68%) Loss: 2.064885 LR: 0.00002337 +[12:13:11] Epoch: 1 Batch: 9584/20099 (47.68%) Loss: 2.009439 LR: 0.00002337 +[12:13:13] Epoch: 1 Batch: 9585/20099 (47.69%) Loss: 2.061685 LR: 0.00002337 +[12:13:14] Epoch: 1 Batch: 9586/20099 (47.69%) Loss: 2.136891 LR: 0.00002335 +[12:13:16] Epoch: 1 Batch: 9587/20099 (47.70%) Loss: 1.811215 LR: 0.00002335 +[12:13:18] Epoch: 1 Batch: 9588/20099 (47.70%) Loss: 2.181927 LR: 0.00002335 +[12:13:20] Epoch: 1 Batch: 9589/20099 (47.71%) Loss: 1.950021 LR: 0.00002335 +[12:13:21] Epoch: 1 Batch: 9590/20099 (47.71%) Loss: 2.129468 LR: 0.00002335 +[12:13:23] Epoch: 1 Batch: 9591/20099 (47.72%) Loss: 2.275191 LR: 0.00002335 +[12:13:25] Epoch: 1 Batch: 9592/20099 (47.72%) Loss: 2.161346 LR: 0.00002335 +[12:13:27] Epoch: 1 Batch: 9593/20099 (47.73%) Loss: 2.101499 LR: 0.00002334 +[12:13:29] Epoch: 1 Batch: 9594/20099 
(47.73%) Loss: 2.289225 LR: 0.00002334 +[12:13:30] Epoch: 1 Batch: 9595/20099 (47.74%) Loss: 2.224034 LR: 0.00002334 +[12:13:32] Epoch: 1 Batch: 9596/20099 (47.74%) Loss: 1.720661 LR: 0.00002334 +[12:13:34] Epoch: 1 Batch: 9597/20099 (47.75%) Loss: 2.065393 LR: 0.00002334 +[12:13:36] Epoch: 1 Batch: 9598/20099 (47.75%) Loss: 2.297238 LR: 0.00002334 +[12:13:37] Epoch: 1 Batch: 9599/20099 (47.76%) Loss: 1.973967 LR: 0.00002334 +[12:13:43] >> Cleaned up old temp checkpoint: epoch1_step7600 +[12:13:43] >> Temp checkpoint saved: epoch1_step9600, size: 0.1693 GB +[12:13:43] Epoch: 1 Batch: 9600/20099 (47.76%) Loss: 2.267563 LR: 0.00002332 +[12:13:45] Epoch: 1 Batch: 9601/20099 (47.77%) Loss: 2.149307 LR: 0.00002332 +[12:13:46] Epoch: 1 Batch: 9602/20099 (47.77%) Loss: 1.933253 LR: 0.00002332 +[12:13:48] Epoch: 1 Batch: 9603/20099 (47.78%) Loss: 2.150386 LR: 0.00002332 +[12:13:50] Epoch: 1 Batch: 9604/20099 (47.78%) Loss: 2.231473 LR: 0.00002332 +[12:13:52] Epoch: 1 Batch: 9605/20099 (47.79%) Loss: 2.124032 LR: 0.00002332 +[12:13:53] Epoch: 1 Batch: 9606/20099 (47.79%) Loss: 1.964501 LR: 0.00002332 +[12:13:55] Epoch: 1 Batch: 9607/20099 (47.80%) Loss: 1.723553 LR: 0.00002331 +[12:13:57] Epoch: 1 Batch: 9608/20099 (47.80%) Loss: 2.318446 LR: 0.00002331 +[12:13:59] Epoch: 1 Batch: 9609/20099 (47.81%) Loss: 1.971294 LR: 0.00002331 +[12:14:01] Epoch: 1 Batch: 9610/20099 (47.81%) Loss: 1.709712 LR: 0.00002331 +[12:14:02] Epoch: 1 Batch: 9611/20099 (47.82%) Loss: 2.116408 LR: 0.00002331 +[12:14:04] Epoch: 1 Batch: 9612/20099 (47.82%) Loss: 2.173805 LR: 0.00002331 +[12:14:06] Epoch: 1 Batch: 9613/20099 (47.83%) Loss: 2.080377 LR: 0.00002331 +[12:14:08] Epoch: 1 Batch: 9614/20099 (47.83%) Loss: 2.019376 LR: 0.00002329 +[12:14:10] Epoch: 1 Batch: 9615/20099 (47.84%) Loss: 1.795697 LR: 0.00002329 +[12:14:11] Epoch: 1 Batch: 9616/20099 (47.84%) Loss: 2.285605 LR: 0.00002329 +[12:14:13] Epoch: 1 Batch: 9617/20099 (47.85%) Loss: 2.366087 LR: 0.00002329 +[12:14:15] Epoch: 1 Batch: 
9618/20099 (47.85%) Loss: 2.025467 LR: 0.00002329 +[12:14:17] Epoch: 1 Batch: 9619/20099 (47.86%) Loss: 1.881422 LR: 0.00002329 +[12:14:19] Epoch: 1 Batch: 9620/20099 (47.86%) Loss: 2.139440 LR: 0.00002329 +[12:14:20] Epoch: 1 Batch: 9621/20099 (47.87%) Loss: 2.177006 LR: 0.00002328 +[12:14:22] Epoch: 1 Batch: 9622/20099 (47.87%) Loss: 2.354422 LR: 0.00002328 +[12:14:24] Epoch: 1 Batch: 9623/20099 (47.88%) Loss: 2.086661 LR: 0.00002328 +[12:14:26] Epoch: 1 Batch: 9624/20099 (47.88%) Loss: 2.100587 LR: 0.00002328 +[12:14:27] Epoch: 1 Batch: 9625/20099 (47.89%) Loss: 2.220459 LR: 0.00002328 +[12:14:29] Epoch: 1 Batch: 9626/20099 (47.89%) Loss: 2.049002 LR: 0.00002328 +[12:14:31] Epoch: 1 Batch: 9627/20099 (47.90%) Loss: 2.007426 LR: 0.00002328 +[12:14:33] Epoch: 1 Batch: 9628/20099 (47.90%) Loss: 2.145896 LR: 0.00002326 +[12:14:35] Epoch: 1 Batch: 9629/20099 (47.91%) Loss: 2.151596 LR: 0.00002326 +[12:14:36] Epoch: 1 Batch: 9630/20099 (47.91%) Loss: 2.106221 LR: 0.00002326 +[12:14:38] Epoch: 1 Batch: 9631/20099 (47.92%) Loss: 2.197315 LR: 0.00002326 +[12:14:40] Epoch: 1 Batch: 9632/20099 (47.92%) Loss: 1.722337 LR: 0.00002326 +[12:14:42] Epoch: 1 Batch: 9633/20099 (47.93%) Loss: 2.180326 LR: 0.00002326 +[12:14:43] Epoch: 1 Batch: 9634/20099 (47.93%) Loss: 2.322172 LR: 0.00002326 +[12:14:45] Epoch: 1 Batch: 9635/20099 (47.94%) Loss: 2.113185 LR: 0.00002325 +[12:14:47] Epoch: 1 Batch: 9636/20099 (47.94%) Loss: 2.203243 LR: 0.00002325 +[12:14:49] Epoch: 1 Batch: 9637/20099 (47.95%) Loss: 2.201151 LR: 0.00002325 +[12:14:50] Epoch: 1 Batch: 9638/20099 (47.95%) Loss: 2.063866 LR: 0.00002325 +[12:14:52] Epoch: 1 Batch: 9639/20099 (47.96%) Loss: 2.158506 LR: 0.00002325 +[12:14:54] Epoch: 1 Batch: 9640/20099 (47.96%) Loss: 2.308621 LR: 0.00002325 +[12:14:56] Epoch: 1 Batch: 9641/20099 (47.97%) Loss: 2.239919 LR: 0.00002325 +[12:14:58] Epoch: 1 Batch: 9642/20099 (47.97%) Loss: 2.150860 LR: 0.00002323 +[12:14:59] Epoch: 1 Batch: 9643/20099 (47.98%) Loss: 2.101541 LR: 0.00002323 
+[12:15:01] Epoch: 1 Batch: 9644/20099 (47.98%) Loss: 2.161454 LR: 0.00002323 +[12:15:03] Epoch: 1 Batch: 9645/20099 (47.99%) Loss: 2.299078 LR: 0.00002323 +[12:15:05] Epoch: 1 Batch: 9646/20099 (47.99%) Loss: 2.198099 LR: 0.00002323 +[12:15:06] Epoch: 1 Batch: 9647/20099 (48.00%) Loss: 1.828958 LR: 0.00002323 +[12:15:08] Epoch: 1 Batch: 9648/20099 (48.00%) Loss: 2.449581 LR: 0.00002323 +[12:15:10] Epoch: 1 Batch: 9649/20099 (48.01%) Loss: 1.972852 LR: 0.00002322 +[12:15:12] Epoch: 1 Batch: 9650/20099 (48.01%) Loss: 2.285273 LR: 0.00002322 +[12:15:13] Epoch: 1 Batch: 9651/20099 (48.02%) Loss: 2.017434 LR: 0.00002322 +[12:15:15] Epoch: 1 Batch: 9652/20099 (48.02%) Loss: 1.872800 LR: 0.00002322 +[12:15:17] Epoch: 1 Batch: 9653/20099 (48.03%) Loss: 2.480857 LR: 0.00002322 +[12:15:19] Epoch: 1 Batch: 9654/20099 (48.03%) Loss: 2.159918 LR: 0.00002322 +[12:15:20] Epoch: 1 Batch: 9655/20099 (48.04%) Loss: 2.037303 LR: 0.00002322 +[12:15:22] Epoch: 1 Batch: 9656/20099 (48.04%) Loss: 2.211561 LR: 0.00002321 +[12:15:24] Epoch: 1 Batch: 9657/20099 (48.05%) Loss: 2.079923 LR: 0.00002321 +[12:15:26] Epoch: 1 Batch: 9658/20099 (48.05%) Loss: 2.046684 LR: 0.00002321 +[12:15:28] Epoch: 1 Batch: 9659/20099 (48.06%) Loss: 2.084955 LR: 0.00002321 +[12:15:29] Epoch: 1 Batch: 9660/20099 (48.06%) Loss: 2.267996 LR: 0.00002321 +[12:15:31] Epoch: 1 Batch: 9661/20099 (48.07%) Loss: 2.004087 LR: 0.00002321 +[12:15:33] Epoch: 1 Batch: 9662/20099 (48.07%) Loss: 2.089333 LR: 0.00002321 +[12:15:35] Epoch: 1 Batch: 9663/20099 (48.08%) Loss: 1.987405 LR: 0.00002319 +[12:15:36] Epoch: 1 Batch: 9664/20099 (48.08%) Loss: 2.298083 LR: 0.00002319 +[12:15:38] Epoch: 1 Batch: 9665/20099 (48.09%) Loss: 1.799335 LR: 0.00002319 +[12:15:40] Epoch: 1 Batch: 9666/20099 (48.09%) Loss: 2.069806 LR: 0.00002319 +[12:15:42] Epoch: 1 Batch: 9667/20099 (48.10%) Loss: 1.828833 LR: 0.00002319 +[12:15:44] Epoch: 1 Batch: 9668/20099 (48.10%) Loss: 1.984497 LR: 0.00002319 +[12:15:45] Epoch: 1 Batch: 9669/20099 (48.11%) 
Loss: 2.270133 LR: 0.00002319 +[12:15:47] Epoch: 1 Batch: 9670/20099 (48.11%) Loss: 2.070780 LR: 0.00002318 +[12:15:49] Epoch: 1 Batch: 9671/20099 (48.12%) Loss: 2.201030 LR: 0.00002318 +[12:15:51] Epoch: 1 Batch: 9672/20099 (48.12%) Loss: 2.148485 LR: 0.00002318 +[12:15:53] Epoch: 1 Batch: 9673/20099 (48.13%) Loss: 1.987496 LR: 0.00002318 +[12:15:54] Epoch: 1 Batch: 9674/20099 (48.13%) Loss: 1.993322 LR: 0.00002318 +[12:15:56] Epoch: 1 Batch: 9675/20099 (48.14%) Loss: 2.198798 LR: 0.00002318 +[12:15:58] Epoch: 1 Batch: 9676/20099 (48.14%) Loss: 2.220004 LR: 0.00002318 +[12:16:00] Epoch: 1 Batch: 9677/20099 (48.15%) Loss: 2.133226 LR: 0.00002316 +[12:16:01] Epoch: 1 Batch: 9678/20099 (48.15%) Loss: 1.875969 LR: 0.00002316 +[12:16:03] Epoch: 1 Batch: 9679/20099 (48.16%) Loss: 1.812950 LR: 0.00002316 +[12:16:05] Epoch: 1 Batch: 9680/20099 (48.16%) Loss: 2.078358 LR: 0.00002316 +[12:16:07] Epoch: 1 Batch: 9681/20099 (48.17%) Loss: 2.231493 LR: 0.00002316 +[12:16:09] Epoch: 1 Batch: 9682/20099 (48.17%) Loss: 1.975408 LR: 0.00002316 +[12:16:10] Epoch: 1 Batch: 9683/20099 (48.18%) Loss: 2.030170 LR: 0.00002316 +[12:16:12] Epoch: 1 Batch: 9684/20099 (48.18%) Loss: 2.245378 LR: 0.00002315 +[12:16:14] Epoch: 1 Batch: 9685/20099 (48.19%) Loss: 1.982608 LR: 0.00002315 +[12:16:16] Epoch: 1 Batch: 9686/20099 (48.19%) Loss: 2.060401 LR: 0.00002315 +[12:16:17] Epoch: 1 Batch: 9687/20099 (48.20%) Loss: 2.172085 LR: 0.00002315 +[12:16:19] Epoch: 1 Batch: 9688/20099 (48.20%) Loss: 2.174625 LR: 0.00002315 +[12:16:21] Epoch: 1 Batch: 9689/20099 (48.21%) Loss: 2.418902 LR: 0.00002315 +[12:16:23] Epoch: 1 Batch: 9690/20099 (48.21%) Loss: 1.924290 LR: 0.00002315 +[12:16:25] Epoch: 1 Batch: 9691/20099 (48.22%) Loss: 2.034927 LR: 0.00002313 +[12:16:26] Epoch: 1 Batch: 9692/20099 (48.22%) Loss: 1.946664 LR: 0.00002313 +[12:16:28] Epoch: 1 Batch: 9693/20099 (48.23%) Loss: 2.019085 LR: 0.00002313 +[12:16:30] Epoch: 1 Batch: 9694/20099 (48.23%) Loss: 2.365286 LR: 0.00002313 +[12:16:32] Epoch: 
1 Batch: 9695/20099 (48.24%) Loss: 2.205871 LR: 0.00002313 +[12:16:34] Epoch: 1 Batch: 9696/20099 (48.24%) Loss: 1.919348 LR: 0.00002313 +[12:16:35] Epoch: 1 Batch: 9697/20099 (48.25%) Loss: 2.209333 LR: 0.00002313 +[12:16:37] Epoch: 1 Batch: 9698/20099 (48.25%) Loss: 2.201395 LR: 0.00002312 +[12:16:39] Epoch: 1 Batch: 9699/20099 (48.26%) Loss: 2.264983 LR: 0.00002312 +[12:16:41] Epoch: 1 Batch: 9700/20099 (48.26%) Loss: 2.277491 LR: 0.00002312 +[12:16:42] Epoch: 1 Batch: 9701/20099 (48.27%) Loss: 2.393465 LR: 0.00002312 +[12:16:44] Epoch: 1 Batch: 9702/20099 (48.27%) Loss: 2.392360 LR: 0.00002312 +[12:16:46] Epoch: 1 Batch: 9703/20099 (48.28%) Loss: 1.981941 LR: 0.00002312 +[12:16:48] Epoch: 1 Batch: 9704/20099 (48.28%) Loss: 1.734196 LR: 0.00002312 +[12:16:50] Epoch: 1 Batch: 9705/20099 (48.29%) Loss: 2.170461 LR: 0.00002310 +[12:16:51] Epoch: 1 Batch: 9706/20099 (48.29%) Loss: 2.171437 LR: 0.00002310 +[12:16:53] Epoch: 1 Batch: 9707/20099 (48.30%) Loss: 2.106994 LR: 0.00002310 +[12:16:55] Epoch: 1 Batch: 9708/20099 (48.30%) Loss: 2.176276 LR: 0.00002310 +[12:16:57] Epoch: 1 Batch: 9709/20099 (48.31%) Loss: 2.345944 LR: 0.00002310 +[12:16:58] Epoch: 1 Batch: 9710/20099 (48.31%) Loss: 2.146114 LR: 0.00002310 +[12:17:00] Epoch: 1 Batch: 9711/20099 (48.32%) Loss: 1.729410 LR: 0.00002310 +[12:17:02] Epoch: 1 Batch: 9712/20099 (48.32%) Loss: 1.806854 LR: 0.00002309 +[12:17:04] Epoch: 1 Batch: 9713/20099 (48.33%) Loss: 2.167802 LR: 0.00002309 +[12:17:06] Epoch: 1 Batch: 9714/20099 (48.33%) Loss: 2.027606 LR: 0.00002309 +[12:17:07] Epoch: 1 Batch: 9715/20099 (48.34%) Loss: 2.408265 LR: 0.00002309 +[12:17:09] Epoch: 1 Batch: 9716/20099 (48.34%) Loss: 2.060499 LR: 0.00002309 +[12:17:11] Epoch: 1 Batch: 9717/20099 (48.35%) Loss: 1.888999 LR: 0.00002309 +[12:17:13] Epoch: 1 Batch: 9718/20099 (48.35%) Loss: 2.045839 LR: 0.00002309 +[12:17:14] Epoch: 1 Batch: 9719/20099 (48.36%) Loss: 2.117521 LR: 0.00002307 +[12:17:16] Epoch: 1 Batch: 9720/20099 (48.36%) Loss: 2.208103 LR: 
0.00002307 +[12:17:18] Epoch: 1 Batch: 9721/20099 (48.37%) Loss: 2.088445 LR: 0.00002307 +[12:17:20] Epoch: 1 Batch: 9722/20099 (48.37%) Loss: 2.064506 LR: 0.00002307 +[12:17:22] Epoch: 1 Batch: 9723/20099 (48.38%) Loss: 2.199306 LR: 0.00002307 +[12:17:23] Epoch: 1 Batch: 9724/20099 (48.38%) Loss: 2.076916 LR: 0.00002307 +[12:17:25] Epoch: 1 Batch: 9725/20099 (48.39%) Loss: 2.051095 LR: 0.00002307 +[12:17:27] Epoch: 1 Batch: 9726/20099 (48.39%) Loss: 2.058751 LR: 0.00002306 +[12:17:29] Epoch: 1 Batch: 9727/20099 (48.40%) Loss: 1.904768 LR: 0.00002306 +[12:17:30] Epoch: 1 Batch: 9728/20099 (48.40%) Loss: 2.165364 LR: 0.00002306 +[12:17:32] Epoch: 1 Batch: 9729/20099 (48.41%) Loss: 2.338597 LR: 0.00002306 +[12:17:34] Epoch: 1 Batch: 9730/20099 (48.41%) Loss: 2.171343 LR: 0.00002306 +[12:17:36] Epoch: 1 Batch: 9731/20099 (48.42%) Loss: 2.113727 LR: 0.00002306 +[12:17:38] Epoch: 1 Batch: 9732/20099 (48.42%) Loss: 2.090168 LR: 0.00002306 +[12:17:39] Epoch: 1 Batch: 9733/20099 (48.43%) Loss: 2.442111 LR: 0.00002304 +[12:17:41] Epoch: 1 Batch: 9734/20099 (48.43%) Loss: 1.821194 LR: 0.00002304 +[12:17:43] Epoch: 1 Batch: 9735/20099 (48.44%) Loss: 2.033279 LR: 0.00002304 +[12:17:45] Epoch: 1 Batch: 9736/20099 (48.44%) Loss: 1.940520 LR: 0.00002304 +[12:17:46] Epoch: 1 Batch: 9737/20099 (48.45%) Loss: 2.163819 LR: 0.00002304 +[12:17:48] Epoch: 1 Batch: 9738/20099 (48.45%) Loss: 1.867843 LR: 0.00002304 +[12:17:50] Epoch: 1 Batch: 9739/20099 (48.46%) Loss: 2.108232 LR: 0.00002304 +[12:17:52] Epoch: 1 Batch: 9740/20099 (48.46%) Loss: 2.200374 LR: 0.00002303 +[12:17:54] Epoch: 1 Batch: 9741/20099 (48.47%) Loss: 1.966208 LR: 0.00002303 +[12:17:55] Epoch: 1 Batch: 9742/20099 (48.47%) Loss: 2.421994 LR: 0.00002303 +[12:17:57] Epoch: 1 Batch: 9743/20099 (48.48%) Loss: 2.133201 LR: 0.00002303 +[12:17:59] Epoch: 1 Batch: 9744/20099 (48.48%) Loss: 1.802374 LR: 0.00002303 +[12:18:01] Epoch: 1 Batch: 9745/20099 (48.48%) Loss: 2.092486 LR: 0.00002303 +[12:18:02] Epoch: 1 Batch: 9746/20099 
(48.49%) Loss: 2.217163 LR: 0.00002303 +[12:18:04] Epoch: 1 Batch: 9747/20099 (48.49%) Loss: 1.827517 LR: 0.00002301 +[12:18:06] Epoch: 1 Batch: 9748/20099 (48.50%) Loss: 1.746747 LR: 0.00002301 +[12:18:08] Epoch: 1 Batch: 9749/20099 (48.50%) Loss: 2.012774 LR: 0.00002301 +[12:18:10] Epoch: 1 Batch: 9750/20099 (48.51%) Loss: 2.018434 LR: 0.00002301 +[12:18:11] Epoch: 1 Batch: 9751/20099 (48.51%) Loss: 2.181140 LR: 0.00002301 +[12:18:13] Epoch: 1 Batch: 9752/20099 (48.52%) Loss: 2.155650 LR: 0.00002301 +[12:18:15] Epoch: 1 Batch: 9753/20099 (48.52%) Loss: 2.316419 LR: 0.00002301 +[12:18:17] Epoch: 1 Batch: 9754/20099 (48.53%) Loss: 2.318587 LR: 0.00002300 +[12:18:18] Epoch: 1 Batch: 9755/20099 (48.53%) Loss: 2.075574 LR: 0.00002300 +[12:18:20] Epoch: 1 Batch: 9756/20099 (48.54%) Loss: 2.383450 LR: 0.00002300 +[12:18:22] Epoch: 1 Batch: 9757/20099 (48.54%) Loss: 2.413986 LR: 0.00002300 +[12:18:24] Epoch: 1 Batch: 9758/20099 (48.55%) Loss: 2.051620 LR: 0.00002300 +[12:18:26] Epoch: 1 Batch: 9759/20099 (48.55%) Loss: 2.214840 LR: 0.00002300 +[12:18:27] Epoch: 1 Batch: 9760/20099 (48.56%) Loss: 2.176710 LR: 0.00002300 +[12:18:29] Epoch: 1 Batch: 9761/20099 (48.56%) Loss: 2.137896 LR: 0.00002298 +[12:18:31] Epoch: 1 Batch: 9762/20099 (48.57%) Loss: 2.158567 LR: 0.00002298 +[12:18:33] Epoch: 1 Batch: 9763/20099 (48.57%) Loss: 2.033225 LR: 0.00002298 +[12:18:34] Epoch: 1 Batch: 9764/20099 (48.58%) Loss: 1.970266 LR: 0.00002298 +[12:18:36] Epoch: 1 Batch: 9765/20099 (48.58%) Loss: 2.411299 LR: 0.00002298 +[12:18:38] Epoch: 1 Batch: 9766/20099 (48.59%) Loss: 2.049512 LR: 0.00002298 +[12:18:40] Epoch: 1 Batch: 9767/20099 (48.59%) Loss: 2.095679 LR: 0.00002298 +[12:18:42] Epoch: 1 Batch: 9768/20099 (48.60%) Loss: 1.835089 LR: 0.00002297 +[12:18:43] Epoch: 1 Batch: 9769/20099 (48.60%) Loss: 2.011151 LR: 0.00002297 +[12:18:45] Epoch: 1 Batch: 9770/20099 (48.61%) Loss: 2.147013 LR: 0.00002297 +[12:18:47] Epoch: 1 Batch: 9771/20099 (48.61%) Loss: 2.064754 LR: 0.00002297 
+[12:18:49] Epoch: 1 Batch: 9772/20099 (48.62%) Loss: 2.096372 LR: 0.00002297 +[12:18:50] Epoch: 1 Batch: 9773/20099 (48.62%) Loss: 2.115852 LR: 0.00002297 +[12:18:52] Epoch: 1 Batch: 9774/20099 (48.63%) Loss: 2.359218 LR: 0.00002297 +[12:18:54] Epoch: 1 Batch: 9775/20099 (48.63%) Loss: 2.321289 LR: 0.00002296 +[12:18:56] Epoch: 1 Batch: 9776/20099 (48.64%) Loss: 2.155989 LR: 0.00002296 +[12:18:58] Epoch: 1 Batch: 9777/20099 (48.64%) Loss: 2.083322 LR: 0.00002296 +[12:18:59] Epoch: 1 Batch: 9778/20099 (48.65%) Loss: 1.921148 LR: 0.00002296 +[12:19:01] Epoch: 1 Batch: 9779/20099 (48.65%) Loss: 2.186870 LR: 0.00002296 +[12:19:03] Epoch: 1 Batch: 9780/20099 (48.66%) Loss: 1.827183 LR: 0.00002296 +[12:19:05] Epoch: 1 Batch: 9781/20099 (48.66%) Loss: 2.243311 LR: 0.00002296 +[12:19:07] Epoch: 1 Batch: 9782/20099 (48.67%) Loss: 2.056193 LR: 0.00002294 +[12:19:08] Epoch: 1 Batch: 9783/20099 (48.67%) Loss: 2.430165 LR: 0.00002294 +[12:19:10] Epoch: 1 Batch: 9784/20099 (48.68%) Loss: 2.256974 LR: 0.00002294 +[12:19:12] Epoch: 1 Batch: 9785/20099 (48.68%) Loss: 2.357994 LR: 0.00002294 +[12:19:14] Epoch: 1 Batch: 9786/20099 (48.69%) Loss: 2.115239 LR: 0.00002294 +[12:19:15] Epoch: 1 Batch: 9787/20099 (48.69%) Loss: 1.998582 LR: 0.00002294 +[12:19:17] Epoch: 1 Batch: 9788/20099 (48.70%) Loss: 2.065826 LR: 0.00002294 +[12:19:19] Epoch: 1 Batch: 9789/20099 (48.70%) Loss: 1.978796 LR: 0.00002293 +[12:19:21] Epoch: 1 Batch: 9790/20099 (48.71%) Loss: 2.354888 LR: 0.00002293 +[12:19:23] Epoch: 1 Batch: 9791/20099 (48.71%) Loss: 2.205518 LR: 0.00002293 +[12:19:24] Epoch: 1 Batch: 9792/20099 (48.72%) Loss: 2.107474 LR: 0.00002293 +[12:19:26] Epoch: 1 Batch: 9793/20099 (48.72%) Loss: 2.061578 LR: 0.00002293 +[12:19:28] Epoch: 1 Batch: 9794/20099 (48.73%) Loss: 2.280634 LR: 0.00002293 +[12:19:30] Epoch: 1 Batch: 9795/20099 (48.73%) Loss: 1.974273 LR: 0.00002293 +[12:19:31] Epoch: 1 Batch: 9796/20099 (48.74%) Loss: 2.213149 LR: 0.00002291 +[12:19:33] Epoch: 1 Batch: 9797/20099 (48.74%) 
Loss: 1.736914 LR: 0.00002291 +[12:19:35] Epoch: 1 Batch: 9798/20099 (48.75%) Loss: 2.270214 LR: 0.00002291 +[12:19:37] Epoch: 1 Batch: 9799/20099 (48.75%) Loss: 1.936389 LR: 0.00002291 +[12:19:42] >> Cleaned up old temp checkpoint: epoch1_step7800 +[12:19:42] >> Temp checkpoint saved: epoch1_step9800, size: 0.1693 GB +[12:19:42] Epoch: 1 Batch: 9800/20099 (48.76%) Loss: 2.016015 LR: 0.00002291 +[12:19:44] Epoch: 1 Batch: 9801/20099 (48.76%) Loss: 2.411626 LR: 0.00002291 +[12:19:45] Epoch: 1 Batch: 9802/20099 (48.77%) Loss: 2.200298 LR: 0.00002291 +[12:19:47] Epoch: 1 Batch: 9803/20099 (48.77%) Loss: 2.159188 LR: 0.00002290 +[12:19:49] Epoch: 1 Batch: 9804/20099 (48.78%) Loss: 2.147984 LR: 0.00002290 +[12:19:51] Epoch: 1 Batch: 9805/20099 (48.78%) Loss: 2.002583 LR: 0.00002290 +[12:19:53] Epoch: 1 Batch: 9806/20099 (48.79%) Loss: 1.908840 LR: 0.00002290 +[12:19:54] Epoch: 1 Batch: 9807/20099 (48.79%) Loss: 2.326114 LR: 0.00002290 +[12:19:56] Epoch: 1 Batch: 9808/20099 (48.80%) Loss: 2.094183 LR: 0.00002290 +[12:19:58] Epoch: 1 Batch: 9809/20099 (48.80%) Loss: 2.083468 LR: 0.00002290 +[12:20:00] Epoch: 1 Batch: 9810/20099 (48.81%) Loss: 2.147694 LR: 0.00002288 +[12:20:01] Epoch: 1 Batch: 9811/20099 (48.81%) Loss: 2.243286 LR: 0.00002288 +[12:20:03] Epoch: 1 Batch: 9812/20099 (48.82%) Loss: 1.999692 LR: 0.00002288 +[12:20:05] Epoch: 1 Batch: 9813/20099 (48.82%) Loss: 2.109024 LR: 0.00002288 +[12:20:07] Epoch: 1 Batch: 9814/20099 (48.83%) Loss: 1.773208 LR: 0.00002288 +[12:20:09] Epoch: 1 Batch: 9815/20099 (48.83%) Loss: 2.136113 LR: 0.00002288 +[12:20:10] Epoch: 1 Batch: 9816/20099 (48.84%) Loss: 2.292672 LR: 0.00002288 +[12:20:12] Epoch: 1 Batch: 9817/20099 (48.84%) Loss: 1.942911 LR: 0.00002287 +[12:20:14] Epoch: 1 Batch: 9818/20099 (48.85%) Loss: 2.249252 LR: 0.00002287 +[12:20:16] Epoch: 1 Batch: 9819/20099 (48.85%) Loss: 1.976670 LR: 0.00002287 +[12:20:17] Epoch: 1 Batch: 9820/20099 (48.86%) Loss: 2.062133 LR: 0.00002287 +[12:20:19] Epoch: 1 Batch: 9821/20099 
(48.86%) Loss: 1.932003 LR: 0.00002287 +[12:20:21] Epoch: 1 Batch: 9822/20099 (48.87%) Loss: 1.942094 LR: 0.00002287 +[12:20:23] Epoch: 1 Batch: 9823/20099 (48.87%) Loss: 2.287345 LR: 0.00002287 +[12:20:25] Epoch: 1 Batch: 9824/20099 (48.88%) Loss: 2.110444 LR: 0.00002285 +[12:20:26] Epoch: 1 Batch: 9825/20099 (48.88%) Loss: 2.512805 LR: 0.00002285 +[12:20:28] Epoch: 1 Batch: 9826/20099 (48.89%) Loss: 2.175451 LR: 0.00002285 +[12:20:30] Epoch: 1 Batch: 9827/20099 (48.89%) Loss: 2.239279 LR: 0.00002285 +[12:20:32] Epoch: 1 Batch: 9828/20099 (48.90%) Loss: 2.071054 LR: 0.00002285 +[12:20:34] Epoch: 1 Batch: 9829/20099 (48.90%) Loss: 1.947576 LR: 0.00002285 +[12:20:35] Epoch: 1 Batch: 9830/20099 (48.91%) Loss: 2.226377 LR: 0.00002285 +[12:20:37] Epoch: 1 Batch: 9831/20099 (48.91%) Loss: 2.072515 LR: 0.00002284 +[12:20:39] Epoch: 1 Batch: 9832/20099 (48.92%) Loss: 2.272774 LR: 0.00002284 +[12:20:41] Epoch: 1 Batch: 9833/20099 (48.92%) Loss: 1.917523 LR: 0.00002284 +[12:20:42] Epoch: 1 Batch: 9834/20099 (48.93%) Loss: 2.144718 LR: 0.00002284 +[12:20:44] Epoch: 1 Batch: 9835/20099 (48.93%) Loss: 2.122226 LR: 0.00002284 +[12:20:46] Epoch: 1 Batch: 9836/20099 (48.94%) Loss: 1.940105 LR: 0.00002284 +[12:20:48] Epoch: 1 Batch: 9837/20099 (48.94%) Loss: 2.038232 LR: 0.00002284 +[12:20:50] Epoch: 1 Batch: 9838/20099 (48.95%) Loss: 2.260568 LR: 0.00002282 +[12:20:51] Epoch: 1 Batch: 9839/20099 (48.95%) Loss: 2.095588 LR: 0.00002282 +[12:20:53] Epoch: 1 Batch: 9840/20099 (48.96%) Loss: 2.042872 LR: 0.00002282 +[12:20:55] Epoch: 1 Batch: 9841/20099 (48.96%) Loss: 2.052678 LR: 0.00002282 +[12:20:57] Epoch: 1 Batch: 9842/20099 (48.97%) Loss: 1.987655 LR: 0.00002282 +[12:20:58] Epoch: 1 Batch: 9843/20099 (48.97%) Loss: 2.371362 LR: 0.00002282 +[12:21:00] Epoch: 1 Batch: 9844/20099 (48.98%) Loss: 2.022165 LR: 0.00002282 +[12:21:02] Epoch: 1 Batch: 9845/20099 (48.98%) Loss: 2.156317 LR: 0.00002281 +[12:21:04] Epoch: 1 Batch: 9846/20099 (48.99%) Loss: 1.563977 LR: 0.00002281 
+[12:21:05] Epoch: 1 Batch: 9847/20099 (48.99%) Loss: 2.358469 LR: 0.00002281 +[12:21:07] Epoch: 1 Batch: 9848/20099 (49.00%) Loss: 2.133670 LR: 0.00002281 +[12:21:09] Epoch: 1 Batch: 9849/20099 (49.00%) Loss: 2.069508 LR: 0.00002281 +[12:21:11] Epoch: 1 Batch: 9850/20099 (49.01%) Loss: 2.234110 LR: 0.00002281 +[12:21:12] Epoch: 1 Batch: 9851/20099 (49.01%) Loss: 2.294077 LR: 0.00002281 +[12:21:14] Epoch: 1 Batch: 9852/20099 (49.02%) Loss: 1.927423 LR: 0.00002279 +[12:21:16] Epoch: 1 Batch: 9853/20099 (49.02%) Loss: 1.878186 LR: 0.00002279 +[12:21:18] Epoch: 1 Batch: 9854/20099 (49.03%) Loss: 1.916600 LR: 0.00002279 +[12:21:20] Epoch: 1 Batch: 9855/20099 (49.03%) Loss: 2.251221 LR: 0.00002279 +[12:21:21] Epoch: 1 Batch: 9856/20099 (49.04%) Loss: 1.692846 LR: 0.00002279 +[12:21:23] Epoch: 1 Batch: 9857/20099 (49.04%) Loss: 2.115594 LR: 0.00002279 +[12:21:25] Epoch: 1 Batch: 9858/20099 (49.05%) Loss: 2.146458 LR: 0.00002279 +[12:21:27] Epoch: 1 Batch: 9859/20099 (49.05%) Loss: 2.249320 LR: 0.00002278 +[12:21:28] Epoch: 1 Batch: 9860/20099 (49.06%) Loss: 2.135123 LR: 0.00002278 +[12:21:30] Epoch: 1 Batch: 9861/20099 (49.06%) Loss: 2.327817 LR: 0.00002278 +[12:21:32] Epoch: 1 Batch: 9862/20099 (49.07%) Loss: 2.357085 LR: 0.00002278 +[12:21:34] Epoch: 1 Batch: 9863/20099 (49.07%) Loss: 2.281836 LR: 0.00002278 +[12:21:35] Epoch: 1 Batch: 9864/20099 (49.08%) Loss: 2.090391 LR: 0.00002278 +[12:21:37] Epoch: 1 Batch: 9865/20099 (49.08%) Loss: 1.902724 LR: 0.00002278 +[12:21:39] Epoch: 1 Batch: 9866/20099 (49.09%) Loss: 2.225040 LR: 0.00002276 +[12:21:41] Epoch: 1 Batch: 9867/20099 (49.09%) Loss: 2.325153 LR: 0.00002276 +[12:21:43] Epoch: 1 Batch: 9868/20099 (49.10%) Loss: 2.182718 LR: 0.00002276 +[12:21:44] Epoch: 1 Batch: 9869/20099 (49.10%) Loss: 2.372121 LR: 0.00002276 +[12:21:46] Epoch: 1 Batch: 9870/20099 (49.11%) Loss: 1.800923 LR: 0.00002276 +[12:21:48] Epoch: 1 Batch: 9871/20099 (49.11%) Loss: 2.132318 LR: 0.00002276 +[12:21:50] Epoch: 1 Batch: 9872/20099 (49.12%) 
Loss: 2.151332 LR: 0.00002276 +[12:21:52] Epoch: 1 Batch: 9873/20099 (49.12%) Loss: 2.112836 LR: 0.00002275 +[12:21:53] Epoch: 1 Batch: 9874/20099 (49.13%) Loss: 2.205324 LR: 0.00002275 +[12:21:55] Epoch: 1 Batch: 9875/20099 (49.13%) Loss: 2.133307 LR: 0.00002275 +[12:21:57] Epoch: 1 Batch: 9876/20099 (49.14%) Loss: 1.732636 LR: 0.00002275 +[12:21:59] Epoch: 1 Batch: 9877/20099 (49.14%) Loss: 2.102807 LR: 0.00002275 +[12:22:00] Epoch: 1 Batch: 9878/20099 (49.15%) Loss: 1.943948 LR: 0.00002275 +[12:22:02] Epoch: 1 Batch: 9879/20099 (49.15%) Loss: 1.635261 LR: 0.00002275 +[12:22:04] Epoch: 1 Batch: 9880/20099 (49.16%) Loss: 2.257117 LR: 0.00002273 +[12:22:06] Epoch: 1 Batch: 9881/20099 (49.16%) Loss: 2.192789 LR: 0.00002273 +[12:22:08] Epoch: 1 Batch: 9882/20099 (49.17%) Loss: 2.070760 LR: 0.00002273 +[12:22:09] Epoch: 1 Batch: 9883/20099 (49.17%) Loss: 2.363024 LR: 0.00002273 +[12:22:11] Epoch: 1 Batch: 9884/20099 (49.18%) Loss: 1.965975 LR: 0.00002273 +[12:22:13] Epoch: 1 Batch: 9885/20099 (49.18%) Loss: 1.731550 LR: 0.00002273 +[12:22:15] Epoch: 1 Batch: 9886/20099 (49.19%) Loss: 1.811229 LR: 0.00002273 +[12:22:16] Epoch: 1 Batch: 9887/20099 (49.19%) Loss: 2.041632 LR: 0.00002272 +[12:22:18] Epoch: 1 Batch: 9888/20099 (49.20%) Loss: 2.246872 LR: 0.00002272 +[12:22:20] Epoch: 1 Batch: 9889/20099 (49.20%) Loss: 2.099231 LR: 0.00002272 +[12:22:22] Epoch: 1 Batch: 9890/20099 (49.21%) Loss: 2.284229 LR: 0.00002272 +[12:22:24] Epoch: 1 Batch: 9891/20099 (49.21%) Loss: 2.306265 LR: 0.00002272 +[12:22:25] Epoch: 1 Batch: 9892/20099 (49.22%) Loss: 2.028678 LR: 0.00002272 +[12:22:27] Epoch: 1 Batch: 9893/20099 (49.22%) Loss: 2.245099 LR: 0.00002272 +[12:22:29] Epoch: 1 Batch: 9894/20099 (49.23%) Loss: 1.961019 LR: 0.00002270 +[12:22:31] Epoch: 1 Batch: 9895/20099 (49.23%) Loss: 2.126318 LR: 0.00002270 +[12:22:33] Epoch: 1 Batch: 9896/20099 (49.24%) Loss: 2.201535 LR: 0.00002270 +[12:22:34] Epoch: 1 Batch: 9897/20099 (49.24%) Loss: 2.273367 LR: 0.00002270 +[12:22:36] Epoch: 
1 Batch: 9898/20099 (49.25%) Loss: 2.095722 LR: 0.00002270 +[12:22:38] Epoch: 1 Batch: 9899/20099 (49.25%) Loss: 1.849874 LR: 0.00002270 +[12:22:40] Epoch: 1 Batch: 9900/20099 (49.26%) Loss: 2.240939 LR: 0.00002270 +[12:22:41] Epoch: 1 Batch: 9901/20099 (49.26%) Loss: 2.136930 LR: 0.00002269 +[12:22:43] Epoch: 1 Batch: 9902/20099 (49.27%) Loss: 2.058798 LR: 0.00002269 +[12:22:45] Epoch: 1 Batch: 9903/20099 (49.27%) Loss: 2.109435 LR: 0.00002269 +[12:22:47] Epoch: 1 Batch: 9904/20099 (49.28%) Loss: 2.137810 LR: 0.00002269 +[12:22:49] Epoch: 1 Batch: 9905/20099 (49.28%) Loss: 2.239021 LR: 0.00002269 +[12:22:50] Epoch: 1 Batch: 9906/20099 (49.29%) Loss: 2.143491 LR: 0.00002269 +[12:22:52] Epoch: 1 Batch: 9907/20099 (49.29%) Loss: 1.801488 LR: 0.00002269 +[12:22:54] Epoch: 1 Batch: 9908/20099 (49.30%) Loss: 2.369668 LR: 0.00002267 +[12:22:56] Epoch: 1 Batch: 9909/20099 (49.30%) Loss: 2.411505 LR: 0.00002267 +[12:22:57] Epoch: 1 Batch: 9910/20099 (49.31%) Loss: 1.957450 LR: 0.00002267 +[12:22:59] Epoch: 1 Batch: 9911/20099 (49.31%) Loss: 2.069764 LR: 0.00002267 +[12:23:01] Epoch: 1 Batch: 9912/20099 (49.32%) Loss: 2.108681 LR: 0.00002267 +[12:23:03] Epoch: 1 Batch: 9913/20099 (49.32%) Loss: 2.093893 LR: 0.00002267 +[12:23:04] Epoch: 1 Batch: 9914/20099 (49.33%) Loss: 2.083066 LR: 0.00002267 +[12:23:06] Epoch: 1 Batch: 9915/20099 (49.33%) Loss: 2.181031 LR: 0.00002266 +[12:23:08] Epoch: 1 Batch: 9916/20099 (49.34%) Loss: 2.053276 LR: 0.00002266 +[12:23:10] Epoch: 1 Batch: 9917/20099 (49.34%) Loss: 2.078971 LR: 0.00002266 +[12:23:12] Epoch: 1 Batch: 9918/20099 (49.35%) Loss: 1.892227 LR: 0.00002266 +[12:23:13] Epoch: 1 Batch: 9919/20099 (49.35%) Loss: 2.243524 LR: 0.00002266 +[12:23:15] Epoch: 1 Batch: 9920/20099 (49.36%) Loss: 2.262689 LR: 0.00002266 +[12:23:17] Epoch: 1 Batch: 9921/20099 (49.36%) Loss: 1.984151 LR: 0.00002266 +[12:23:19] Epoch: 1 Batch: 9922/20099 (49.37%) Loss: 2.415529 LR: 0.00002264 +[12:23:20] Epoch: 1 Batch: 9923/20099 (49.37%) Loss: 2.083714 LR: 
0.00002264 +[12:23:22] Epoch: 1 Batch: 9924/20099 (49.38%) Loss: 2.027617 LR: 0.00002264 +[12:23:24] Epoch: 1 Batch: 9925/20099 (49.38%) Loss: 2.233963 LR: 0.00002264 +[12:23:26] Epoch: 1 Batch: 9926/20099 (49.39%) Loss: 1.963231 LR: 0.00002264 +[12:23:27] Epoch: 1 Batch: 9927/20099 (49.39%) Loss: 1.851027 LR: 0.00002264 +[12:23:29] Epoch: 1 Batch: 9928/20099 (49.40%) Loss: 1.839346 LR: 0.00002264 +[12:23:31] Epoch: 1 Batch: 9929/20099 (49.40%) Loss: 2.120089 LR: 0.00002263 +[12:23:33] Epoch: 1 Batch: 9930/20099 (49.41%) Loss: 2.069928 LR: 0.00002263 +[12:23:34] Epoch: 1 Batch: 9931/20099 (49.41%) Loss: 2.037401 LR: 0.00002263 +[12:23:36] Epoch: 1 Batch: 9932/20099 (49.42%) Loss: 2.198977 LR: 0.00002263 +[12:23:38] Epoch: 1 Batch: 9933/20099 (49.42%) Loss: 1.827817 LR: 0.00002263 +[12:23:40] Epoch: 1 Batch: 9934/20099 (49.43%) Loss: 2.114602 LR: 0.00002263 +[12:23:42] Epoch: 1 Batch: 9935/20099 (49.43%) Loss: 2.091907 LR: 0.00002263 +[12:23:44] Epoch: 1 Batch: 9936/20099 (49.44%) Loss: 1.939982 LR: 0.00002261 +[12:23:45] Epoch: 1 Batch: 9937/20099 (49.44%) Loss: 2.099046 LR: 0.00002261 +[12:23:47] Epoch: 1 Batch: 9938/20099 (49.45%) Loss: 2.214905 LR: 0.00002261 +[12:23:49] Epoch: 1 Batch: 9939/20099 (49.45%) Loss: 2.014893 LR: 0.00002261 +[12:23:51] Epoch: 1 Batch: 9940/20099 (49.46%) Loss: 2.194372 LR: 0.00002261 +[12:23:52] Epoch: 1 Batch: 9941/20099 (49.46%) Loss: 1.907268 LR: 0.00002261 +[12:23:54] Epoch: 1 Batch: 9942/20099 (49.47%) Loss: 2.273577 LR: 0.00002261 +[12:23:56] Epoch: 1 Batch: 9943/20099 (49.47%) Loss: 2.323204 LR: 0.00002260 +[12:23:58] Epoch: 1 Batch: 9944/20099 (49.48%) Loss: 2.168359 LR: 0.00002260 +[12:23:59] Epoch: 1 Batch: 9945/20099 (49.48%) Loss: 2.140655 LR: 0.00002260 +[12:24:01] Epoch: 1 Batch: 9946/20099 (49.49%) Loss: 2.177887 LR: 0.00002260 +[12:24:03] Epoch: 1 Batch: 9947/20099 (49.49%) Loss: 1.952934 LR: 0.00002260 +[12:24:05] Epoch: 1 Batch: 9948/20099 (49.49%) Loss: 2.297557 LR: 0.00002260 +[12:24:07] Epoch: 1 Batch: 9949/20099 
(49.50%) Loss: 2.125679 LR: 0.00002260 +[12:24:08] Epoch: 1 Batch: 9950/20099 (49.50%) Loss: 2.127336 LR: 0.00002258 +[12:24:10] Epoch: 1 Batch: 9951/20099 (49.51%) Loss: 2.047636 LR: 0.00002258 +[12:24:12] Epoch: 1 Batch: 9952/20099 (49.51%) Loss: 2.244343 LR: 0.00002258 +[12:24:14] Epoch: 1 Batch: 9953/20099 (49.52%) Loss: 2.117914 LR: 0.00002258 +[12:24:15] Epoch: 1 Batch: 9954/20099 (49.52%) Loss: 1.971771 LR: 0.00002258 +[12:24:17] Epoch: 1 Batch: 9955/20099 (49.53%) Loss: 1.890862 LR: 0.00002258 +[12:24:19] Epoch: 1 Batch: 9956/20099 (49.53%) Loss: 1.977502 LR: 0.00002258 +[12:24:21] Epoch: 1 Batch: 9957/20099 (49.54%) Loss: 2.069806 LR: 0.00002257 +[12:24:23] Epoch: 1 Batch: 9958/20099 (49.54%) Loss: 2.217018 LR: 0.00002257 +[12:24:24] Epoch: 1 Batch: 9959/20099 (49.55%) Loss: 2.205698 LR: 0.00002257 +[12:24:26] Epoch: 1 Batch: 9960/20099 (49.55%) Loss: 2.070184 LR: 0.00002257 +[12:24:28] Epoch: 1 Batch: 9961/20099 (49.56%) Loss: 2.001893 LR: 0.00002257 +[12:24:30] Epoch: 1 Batch: 9962/20099 (49.56%) Loss: 2.035520 LR: 0.00002257 +[12:24:31] Epoch: 1 Batch: 9963/20099 (49.57%) Loss: 2.195385 LR: 0.00002257 +[12:24:33] Epoch: 1 Batch: 9964/20099 (49.57%) Loss: 2.038348 LR: 0.00002255 +[12:24:35] Epoch: 1 Batch: 9965/20099 (49.58%) Loss: 2.079867 LR: 0.00002255 +[12:24:37] Epoch: 1 Batch: 9966/20099 (49.58%) Loss: 2.221630 LR: 0.00002255 +[12:24:39] Epoch: 1 Batch: 9967/20099 (49.59%) Loss: 2.020751 LR: 0.00002255 +[12:24:40] Epoch: 1 Batch: 9968/20099 (49.59%) Loss: 2.108041 LR: 0.00002255 +[12:24:42] Epoch: 1 Batch: 9969/20099 (49.60%) Loss: 2.047154 LR: 0.00002255 +[12:24:44] Epoch: 1 Batch: 9970/20099 (49.60%) Loss: 1.993837 LR: 0.00002255 +[12:24:46] Epoch: 1 Batch: 9971/20099 (49.61%) Loss: 2.235640 LR: 0.00002254 +[12:24:47] Epoch: 1 Batch: 9972/20099 (49.61%) Loss: 2.206338 LR: 0.00002254 +[12:24:49] Epoch: 1 Batch: 9973/20099 (49.62%) Loss: 2.227196 LR: 0.00002254 +[12:24:51] Epoch: 1 Batch: 9974/20099 (49.62%) Loss: 2.172691 LR: 0.00002254 
+[12:24:53] Epoch: 1 Batch: 9975/20099 (49.63%) Loss: 2.372286 LR: 0.00002254 +[12:24:55] Epoch: 1 Batch: 9976/20099 (49.63%) Loss: 2.059055 LR: 0.00002254 +[12:24:56] Epoch: 1 Batch: 9977/20099 (49.64%) Loss: 2.028141 LR: 0.00002254 +[12:24:58] Epoch: 1 Batch: 9978/20099 (49.64%) Loss: 1.804816 LR: 0.00002252 +[12:25:00] Epoch: 1 Batch: 9979/20099 (49.65%) Loss: 2.351411 LR: 0.00002252 +[12:25:02] Epoch: 1 Batch: 9980/20099 (49.65%) Loss: 1.909527 LR: 0.00002252 +[12:25:03] Epoch: 1 Batch: 9981/20099 (49.66%) Loss: 2.141121 LR: 0.00002252 +[12:25:05] Epoch: 1 Batch: 9982/20099 (49.66%) Loss: 2.252161 LR: 0.00002252 +[12:25:07] Epoch: 1 Batch: 9983/20099 (49.67%) Loss: 2.205798 LR: 0.00002252 +[12:25:09] Epoch: 1 Batch: 9984/20099 (49.67%) Loss: 1.934633 LR: 0.00002252 +[12:25:11] Epoch: 1 Batch: 9985/20099 (49.68%) Loss: 2.144827 LR: 0.00002251 +[12:25:12] Epoch: 1 Batch: 9986/20099 (49.68%) Loss: 2.156864 LR: 0.00002251 +[12:25:14] Epoch: 1 Batch: 9987/20099 (49.69%) Loss: 2.352743 LR: 0.00002251 +[12:25:16] Epoch: 1 Batch: 9988/20099 (49.69%) Loss: 2.014923 LR: 0.00002251 +[12:25:18] Epoch: 1 Batch: 9989/20099 (49.70%) Loss: 1.912745 LR: 0.00002251 +[12:25:20] Epoch: 1 Batch: 9990/20099 (49.70%) Loss: 2.250184 LR: 0.00002251 +[12:25:21] Epoch: 1 Batch: 9991/20099 (49.71%) Loss: 2.038514 LR: 0.00002251 +[12:25:23] Epoch: 1 Batch: 9992/20099 (49.71%) Loss: 2.391297 LR: 0.00002249 +[12:25:25] Epoch: 1 Batch: 9993/20099 (49.72%) Loss: 2.104748 LR: 0.00002249 +[12:25:27] Epoch: 1 Batch: 9994/20099 (49.72%) Loss: 2.165908 LR: 0.00002249 +[12:25:28] Epoch: 1 Batch: 9995/20099 (49.73%) Loss: 2.200402 LR: 0.00002249 +[12:25:30] Epoch: 1 Batch: 9996/20099 (49.73%) Loss: 1.926325 LR: 0.00002249 +[12:25:32] Epoch: 1 Batch: 9997/20099 (49.74%) Loss: 1.948569 LR: 0.00002249 +[12:25:34] Epoch: 1 Batch: 9998/20099 (49.74%) Loss: 2.033269 LR: 0.00002249 +[12:25:36] Epoch: 1 Batch: 9999/20099 (49.75%) Loss: 2.029323 LR: 0.00002248 +[12:25:37] >> Evaluating batch 0 +[12:25:38] >> 
Evaluating batch 1 +[12:25:39] >> Evaluating batch 2 +[12:25:41] >> Evaluating batch 3 +[12:25:42] >> Evaluating batch 4 +[12:25:43] >> Evaluating batch 5 +[12:25:44] >> Evaluating batch 6 +[12:25:45] >> Evaluating batch 7 +[12:25:46] >> Evaluating batch 8 +[12:25:47] >> Evaluating batch 9 +[12:25:48] >> Evaluating batch 10 +[12:25:48] >> Evaluating batch 11 +[12:25:49] >> Evaluating batch 12 +[12:25:50] >> Evaluating batch 13 +[12:25:51] >> Evaluating batch 14 +[12:25:52] >> Evaluating batch 15 +[12:25:53] >> Evaluating batch 16 +[12:25:54] Epoch: 1 Step: 10000/20099 Evaluation: +[12:25:54] Avg Loss Since Last Eval: 2.1046 Val Loss: 2.1679 Validation loss delta: 0.0005 Perplexity: 8.7398 LR: 0.00002248 +[12:25:57] >> Cleaned up old temp checkpoint: epoch1_step8000 +[12:25:57] >> Temp checkpoint saved: epoch1_step10000, size: 0.1693 GB +[12:26:01] >> Checkpoint saved: epoch1_step10000, size: 0.1693 GB +[12:26:01] Epoch: 1 Batch: 10000/20099 (49.75%) Loss: 2.020817 LR: 0.00002248 +[12:26:03] Epoch: 1 Batch: 10001/20099 (49.76%) Loss: 2.300226 LR: 0.00002248 +[12:26:04] Epoch: 1 Batch: 10002/20099 (49.76%) Loss: 1.933957 LR: 0.00002248 +[12:26:06] Epoch: 1 Batch: 10003/20099 (49.77%) Loss: 2.006570 LR: 0.00002248 +[12:26:08] Epoch: 1 Batch: 10004/20099 (49.77%) Loss: 2.299121 LR: 0.00002248 +[12:26:10] Epoch: 1 Batch: 10005/20099 (49.78%) Loss: 2.041704 LR: 0.00002248 +[12:26:11] Epoch: 1 Batch: 10006/20099 (49.78%) Loss: 1.936397 LR: 0.00002246 +[12:26:13] Epoch: 1 Batch: 10007/20099 (49.79%) Loss: 2.199860 LR: 0.00002246 +[12:26:15] Epoch: 1 Batch: 10008/20099 (49.79%) Loss: 2.537230 LR: 0.00002246 +[12:26:17] Epoch: 1 Batch: 10009/20099 (49.80%) Loss: 2.018312 LR: 0.00002246 +[12:26:19] Epoch: 1 Batch: 10010/20099 (49.80%) Loss: 1.829368 LR: 0.00002246 +[12:26:20] Epoch: 1 Batch: 10011/20099 (49.81%) Loss: 2.122959 LR: 0.00002246 +[12:26:22] Epoch: 1 Batch: 10012/20099 (49.81%) Loss: 2.072715 LR: 0.00002246 +[12:26:24] Epoch: 1 Batch: 10013/20099 (49.82%) Loss: 
2.366633 LR: 0.00002245 +[12:26:26] Epoch: 1 Batch: 10014/20099 (49.82%) Loss: 2.430928 LR: 0.00002245 +[12:26:28] Epoch: 1 Batch: 10015/20099 (49.83%) Loss: 2.023671 LR: 0.00002245 +[12:26:30] Epoch: 1 Batch: 10016/20099 (49.83%) Loss: 2.012231 LR: 0.00002245 +[12:26:31] Epoch: 1 Batch: 10017/20099 (49.84%) Loss: 1.937811 LR: 0.00002245 +[12:26:33] Epoch: 1 Batch: 10018/20099 (49.84%) Loss: 1.717094 LR: 0.00002245 +[12:26:35] Epoch: 1 Batch: 10019/20099 (49.85%) Loss: 2.190111 LR: 0.00002245 +[12:26:37] Epoch: 1 Batch: 10020/20099 (49.85%) Loss: 2.041499 LR: 0.00002243 +[12:26:39] Epoch: 1 Batch: 10021/20099 (49.86%) Loss: 1.731227 LR: 0.00002243 +[12:26:41] Epoch: 1 Batch: 10022/20099 (49.86%) Loss: 2.117833 LR: 0.00002243 +[12:26:42] Epoch: 1 Batch: 10023/20099 (49.87%) Loss: 2.296460 LR: 0.00002243 +[12:26:44] Epoch: 1 Batch: 10024/20099 (49.87%) Loss: 2.022303 LR: 0.00002243 +[12:26:46] Epoch: 1 Batch: 10025/20099 (49.88%) Loss: 2.330329 LR: 0.00002243 +[12:26:48] Epoch: 1 Batch: 10026/20099 (49.88%) Loss: 2.141462 LR: 0.00002243 +[12:26:50] Epoch: 1 Batch: 10027/20099 (49.89%) Loss: 2.020880 LR: 0.00002242 +[12:26:51] Epoch: 1 Batch: 10028/20099 (49.89%) Loss: 1.975474 LR: 0.00002242 +[12:26:53] Epoch: 1 Batch: 10029/20099 (49.90%) Loss: 2.150536 LR: 0.00002242 +[12:26:55] Epoch: 1 Batch: 10030/20099 (49.90%) Loss: 2.582999 LR: 0.00002242 +[12:26:57] Epoch: 1 Batch: 10031/20099 (49.91%) Loss: 2.003338 LR: 0.00002242 +[12:26:58] Epoch: 1 Batch: 10032/20099 (49.91%) Loss: 1.901347 LR: 0.00002242 +[12:27:00] Epoch: 1 Batch: 10033/20099 (49.92%) Loss: 2.195032 LR: 0.00002242 +[12:27:02] Epoch: 1 Batch: 10034/20099 (49.92%) Loss: 1.930283 LR: 0.00002240 +[12:27:04] Epoch: 1 Batch: 10035/20099 (49.93%) Loss: 2.292161 LR: 0.00002240 +[12:27:05] Epoch: 1 Batch: 10036/20099 (49.93%) Loss: 2.193601 LR: 0.00002240 +[12:27:07] Epoch: 1 Batch: 10037/20099 (49.94%) Loss: 2.003621 LR: 0.00002240 +[12:27:09] Epoch: 1 Batch: 10038/20099 (49.94%) Loss: 2.116432 LR: 0.00002240 
+[12:27:11] Epoch: 1 Batch: 10039/20099 (49.95%) Loss: 2.233969 LR: 0.00002240 +[12:27:12] Epoch: 1 Batch: 10040/20099 (49.95%) Loss: 2.130067 LR: 0.00002240 +[12:27:14] Epoch: 1 Batch: 10041/20099 (49.96%) Loss: 2.127692 LR: 0.00002239 +[12:27:16] Epoch: 1 Batch: 10042/20099 (49.96%) Loss: 2.080173 LR: 0.00002239 +[12:27:18] Epoch: 1 Batch: 10043/20099 (49.97%) Loss: 2.220584 LR: 0.00002239 +[12:27:19] Epoch: 1 Batch: 10044/20099 (49.97%) Loss: 2.201909 LR: 0.00002239 +[12:27:21] Epoch: 1 Batch: 10045/20099 (49.98%) Loss: 2.285996 LR: 0.00002239 +[12:27:23] Epoch: 1 Batch: 10046/20099 (49.98%) Loss: 1.757471 LR: 0.00002239 +[12:27:25] Epoch: 1 Batch: 10047/20099 (49.99%) Loss: 1.897918 LR: 0.00002239 +[12:27:26] Epoch: 1 Batch: 10048/20099 (49.99%) Loss: 2.383520 LR: 0.00002237 +[12:27:28] Epoch: 1 Batch: 10049/20099 (50.00%) Loss: 2.235969 LR: 0.00002237 +[12:27:30] Epoch: 1 Batch: 10050/20099 (50.00%) Loss: 2.355113 LR: 0.00002237 +[12:27:32] Epoch: 1 Batch: 10051/20099 (50.01%) Loss: 2.123668 LR: 0.00002237 +[12:27:33] Epoch: 1 Batch: 10052/20099 (50.01%) Loss: 2.219716 LR: 0.00002237 +[12:27:35] Epoch: 1 Batch: 10053/20099 (50.02%) Loss: 2.195567 LR: 0.00002237 +[12:27:37] Epoch: 1 Batch: 10054/20099 (50.02%) Loss: 2.123614 LR: 0.00002237 +[12:27:39] Epoch: 1 Batch: 10055/20099 (50.03%) Loss: 2.161660 LR: 0.00002236 +[12:27:40] Epoch: 1 Batch: 10056/20099 (50.03%) Loss: 2.014030 LR: 0.00002236 +[12:27:42] Epoch: 1 Batch: 10057/20099 (50.04%) Loss: 2.172013 LR: 0.00002236 +[12:27:44] Epoch: 1 Batch: 10058/20099 (50.04%) Loss: 2.135809 LR: 0.00002236 +[12:27:46] Epoch: 1 Batch: 10059/20099 (50.05%) Loss: 2.165059 LR: 0.00002236 +[12:27:48] Epoch: 1 Batch: 10060/20099 (50.05%) Loss: 2.107183 LR: 0.00002236 +[12:27:49] Epoch: 1 Batch: 10061/20099 (50.06%) Loss: 1.806012 LR: 0.00002236 +[12:27:51] Epoch: 1 Batch: 10062/20099 (50.06%) Loss: 2.121375 LR: 0.00002234 +[12:27:53] Epoch: 1 Batch: 10063/20099 (50.07%) Loss: 2.103945 LR: 0.00002234 +[12:27:55] Epoch: 1 
Batch: 10064/20099 (50.07%) Loss: 2.362548 LR: 0.00002234 +[12:27:56] Epoch: 1 Batch: 10065/20099 (50.08%) Loss: 2.002727 LR: 0.00002234 +[12:27:58] Epoch: 1 Batch: 10066/20099 (50.08%) Loss: 2.065569 LR: 0.00002234 +[12:28:00] Epoch: 1 Batch: 10067/20099 (50.09%) Loss: 1.845837 LR: 0.00002234 +[12:28:02] Epoch: 1 Batch: 10068/20099 (50.09%) Loss: 2.142133 LR: 0.00002234 +[12:28:04] Epoch: 1 Batch: 10069/20099 (50.10%) Loss: 1.983291 LR: 0.00002233 +[12:28:05] Epoch: 1 Batch: 10070/20099 (50.10%) Loss: 2.150580 LR: 0.00002233 +[12:28:07] Epoch: 1 Batch: 10071/20099 (50.11%) Loss: 2.093424 LR: 0.00002233 +[12:28:09] Epoch: 1 Batch: 10072/20099 (50.11%) Loss: 1.796980 LR: 0.00002233 +[12:28:11] Epoch: 1 Batch: 10073/20099 (50.12%) Loss: 2.195577 LR: 0.00002233 +[12:28:13] Epoch: 1 Batch: 10074/20099 (50.12%) Loss: 1.864250 LR: 0.00002233 +[12:28:14] Epoch: 1 Batch: 10075/20099 (50.13%) Loss: 2.233644 LR: 0.00002233 +[12:28:16] Epoch: 1 Batch: 10076/20099 (50.13%) Loss: 2.288469 LR: 0.00002231 +[12:28:18] Epoch: 1 Batch: 10077/20099 (50.14%) Loss: 1.999655 LR: 0.00002231 +[12:28:20] Epoch: 1 Batch: 10078/20099 (50.14%) Loss: 2.230029 LR: 0.00002231 +[12:28:21] Epoch: 1 Batch: 10079/20099 (50.15%) Loss: 2.279404 LR: 0.00002231 +[12:28:23] Epoch: 1 Batch: 10080/20099 (50.15%) Loss: 2.342122 LR: 0.00002231 +[12:28:25] Epoch: 1 Batch: 10081/20099 (50.16%) Loss: 2.029478 LR: 0.00002231 +[12:28:27] Epoch: 1 Batch: 10082/20099 (50.16%) Loss: 1.861949 LR: 0.00002231 +[12:28:29] Epoch: 1 Batch: 10083/20099 (50.17%) Loss: 2.106226 LR: 0.00002230 +[12:28:30] Epoch: 1 Batch: 10084/20099 (50.17%) Loss: 2.390975 LR: 0.00002230 +[12:28:32] Epoch: 1 Batch: 10085/20099 (50.18%) Loss: 2.051450 LR: 0.00002230 +[12:28:34] Epoch: 1 Batch: 10086/20099 (50.18%) Loss: 2.119925 LR: 0.00002230 +[12:28:36] Epoch: 1 Batch: 10087/20099 (50.19%) Loss: 2.282066 LR: 0.00002230 +[12:28:37] Epoch: 1 Batch: 10088/20099 (50.19%) Loss: 2.175742 LR: 0.00002230 +[12:28:39] Epoch: 1 Batch: 10089/20099 
(50.20%) Loss: 2.280556 LR: 0.00002230 +[12:28:41] Epoch: 1 Batch: 10090/20099 (50.20%) Loss: 2.259068 LR: 0.00002228 +[12:28:43] Epoch: 1 Batch: 10091/20099 (50.21%) Loss: 2.128651 LR: 0.00002228 +[12:28:44] Epoch: 1 Batch: 10092/20099 (50.21%) Loss: 1.985488 LR: 0.00002228 +[12:28:46] Epoch: 1 Batch: 10093/20099 (50.22%) Loss: 2.261419 LR: 0.00002228 +[12:28:48] Epoch: 1 Batch: 10094/20099 (50.22%) Loss: 2.019954 LR: 0.00002228 +[12:28:50] Epoch: 1 Batch: 10095/20099 (50.23%) Loss: 2.046975 LR: 0.00002228 +[12:28:52] Epoch: 1 Batch: 10096/20099 (50.23%) Loss: 1.763218 LR: 0.00002228 +[12:28:53] Epoch: 1 Batch: 10097/20099 (50.24%) Loss: 2.122265 LR: 0.00002227 +[12:28:55] Epoch: 1 Batch: 10098/20099 (50.24%) Loss: 1.782828 LR: 0.00002227 +[12:28:57] Epoch: 1 Batch: 10099/20099 (50.25%) Loss: 1.971214 LR: 0.00002227 +[12:28:59] Epoch: 1 Batch: 10100/20099 (50.25%) Loss: 2.056597 LR: 0.00002227 +[12:29:00] Epoch: 1 Batch: 10101/20099 (50.26%) Loss: 2.010042 LR: 0.00002227 +[12:29:02] Epoch: 1 Batch: 10102/20099 (50.26%) Loss: 2.241540 LR: 0.00002227 +[12:29:04] Epoch: 1 Batch: 10103/20099 (50.27%) Loss: 2.240625 LR: 0.00002227 +[12:29:06] Epoch: 1 Batch: 10104/20099 (50.27%) Loss: 2.231999 LR: 0.00002225 +[12:29:07] Epoch: 1 Batch: 10105/20099 (50.28%) Loss: 2.539127 LR: 0.00002225 +[12:29:09] Epoch: 1 Batch: 10106/20099 (50.28%) Loss: 1.882254 LR: 0.00002225 +[12:29:11] Epoch: 1 Batch: 10107/20099 (50.29%) Loss: 2.028766 LR: 0.00002225 +[12:29:13] Epoch: 1 Batch: 10108/20099 (50.29%) Loss: 2.158724 LR: 0.00002225 +[12:29:15] Epoch: 1 Batch: 10109/20099 (50.30%) Loss: 2.302498 LR: 0.00002225 +[12:29:16] Epoch: 1 Batch: 10110/20099 (50.30%) Loss: 2.019878 LR: 0.00002225 +[12:29:18] Epoch: 1 Batch: 10111/20099 (50.31%) Loss: 1.955251 LR: 0.00002224 +[12:29:20] Epoch: 1 Batch: 10112/20099 (50.31%) Loss: 2.132048 LR: 0.00002224 +[12:29:22] Epoch: 1 Batch: 10113/20099 (50.32%) Loss: 2.217074 LR: 0.00002224 +[12:29:23] Epoch: 1 Batch: 10114/20099 (50.32%) Loss: 1.868025 
LR: 0.00002224 +[12:29:25] Epoch: 1 Batch: 10115/20099 (50.33%) Loss: 2.040937 LR: 0.00002224 +[12:29:27] Epoch: 1 Batch: 10116/20099 (50.33%) Loss: 2.269646 LR: 0.00002224 +[12:29:29] Epoch: 1 Batch: 10117/20099 (50.34%) Loss: 2.136592 LR: 0.00002224 +[12:29:30] Epoch: 1 Batch: 10118/20099 (50.34%) Loss: 2.108685 LR: 0.00002222 +[12:29:32] Epoch: 1 Batch: 10119/20099 (50.35%) Loss: 1.879045 LR: 0.00002222 +[12:29:34] Epoch: 1 Batch: 10120/20099 (50.35%) Loss: 2.104494 LR: 0.00002222 +[12:29:36] Epoch: 1 Batch: 10121/20099 (50.36%) Loss: 1.849973 LR: 0.00002222 +[12:29:38] Epoch: 1 Batch: 10122/20099 (50.36%) Loss: 2.382658 LR: 0.00002222 +[12:29:39] Epoch: 1 Batch: 10123/20099 (50.37%) Loss: 2.118007 LR: 0.00002222 +[12:29:41] Epoch: 1 Batch: 10124/20099 (50.37%) Loss: 2.072966 LR: 0.00002222 +[12:29:43] Epoch: 1 Batch: 10125/20099 (50.38%) Loss: 2.115799 LR: 0.00002220 +[12:29:45] Epoch: 1 Batch: 10126/20099 (50.38%) Loss: 2.274395 LR: 0.00002220 +[12:29:46] Epoch: 1 Batch: 10127/20099 (50.39%) Loss: 2.256159 LR: 0.00002220 +[12:29:48] Epoch: 1 Batch: 10128/20099 (50.39%) Loss: 2.059817 LR: 0.00002220 +[12:29:50] Epoch: 1 Batch: 10129/20099 (50.40%) Loss: 1.961134 LR: 0.00002220 +[12:29:52] Epoch: 1 Batch: 10130/20099 (50.40%) Loss: 2.038703 LR: 0.00002220 +[12:29:54] Epoch: 1 Batch: 10131/20099 (50.41%) Loss: 2.154069 LR: 0.00002220 +[12:29:55] Epoch: 1 Batch: 10132/20099 (50.41%) Loss: 2.056169 LR: 0.00002219 +[12:29:57] Epoch: 1 Batch: 10133/20099 (50.42%) Loss: 2.065122 LR: 0.00002219 +[12:29:59] Epoch: 1 Batch: 10134/20099 (50.42%) Loss: 2.293813 LR: 0.00002219 +[12:30:01] Epoch: 1 Batch: 10135/20099 (50.43%) Loss: 1.643738 LR: 0.00002219 +[12:30:03] Epoch: 1 Batch: 10136/20099 (50.43%) Loss: 1.933113 LR: 0.00002219 +[12:30:04] Epoch: 1 Batch: 10137/20099 (50.44%) Loss: 1.971386 LR: 0.00002219 +[12:30:06] Epoch: 1 Batch: 10138/20099 (50.44%) Loss: 2.006056 LR: 0.00002219 +[12:30:08] Epoch: 1 Batch: 10139/20099 (50.45%) Loss: 1.959049 LR: 0.00002217 
+[12:30:10] Epoch: 1 Batch: 10140/20099 (50.45%) Loss: 2.219636 LR: 0.00002217 +[12:30:11] Epoch: 1 Batch: 10141/20099 (50.46%) Loss: 1.906939 LR: 0.00002217 +[12:30:13] Epoch: 1 Batch: 10142/20099 (50.46%) Loss: 2.206557 LR: 0.00002217 +[12:30:15] Epoch: 1 Batch: 10143/20099 (50.47%) Loss: 1.996764 LR: 0.00002217 +[12:30:17] Epoch: 1 Batch: 10144/20099 (50.47%) Loss: 2.203351 LR: 0.00002217 +[12:30:19] Epoch: 1 Batch: 10145/20099 (50.48%) Loss: 2.415311 LR: 0.00002217 +[12:30:20] Epoch: 1 Batch: 10146/20099 (50.48%) Loss: 1.836269 LR: 0.00002216 +[12:30:22] Epoch: 1 Batch: 10147/20099 (50.49%) Loss: 2.120124 LR: 0.00002216 +[12:30:24] Epoch: 1 Batch: 10148/20099 (50.49%) Loss: 2.116754 LR: 0.00002216 +[12:30:26] Epoch: 1 Batch: 10149/20099 (50.50%) Loss: 2.237048 LR: 0.00002216 +[12:30:27] Epoch: 1 Batch: 10150/20099 (50.50%) Loss: 2.060260 LR: 0.00002216 +[12:30:29] Epoch: 1 Batch: 10151/20099 (50.51%) Loss: 2.203682 LR: 0.00002216 +[12:30:31] Epoch: 1 Batch: 10152/20099 (50.51%) Loss: 1.953529 LR: 0.00002216 +[12:30:33] Epoch: 1 Batch: 10153/20099 (50.51%) Loss: 2.310438 LR: 0.00002214 +[12:30:35] Epoch: 1 Batch: 10154/20099 (50.52%) Loss: 2.018197 LR: 0.00002214 +[12:30:36] Epoch: 1 Batch: 10155/20099 (50.52%) Loss: 2.127926 LR: 0.00002214 +[12:30:38] Epoch: 1 Batch: 10156/20099 (50.53%) Loss: 2.195174 LR: 0.00002214 +[12:30:40] Epoch: 1 Batch: 10157/20099 (50.53%) Loss: 2.381013 LR: 0.00002214 +[12:30:42] Epoch: 1 Batch: 10158/20099 (50.54%) Loss: 1.954852 LR: 0.00002214 +[12:30:43] Epoch: 1 Batch: 10159/20099 (50.54%) Loss: 2.530523 LR: 0.00002214 +[12:30:45] Epoch: 1 Batch: 10160/20099 (50.55%) Loss: 2.454019 LR: 0.00002213 +[12:30:47] Epoch: 1 Batch: 10161/20099 (50.55%) Loss: 2.033713 LR: 0.00002213 +[12:30:49] Epoch: 1 Batch: 10162/20099 (50.56%) Loss: 1.986884 LR: 0.00002213 +[12:30:51] Epoch: 1 Batch: 10163/20099 (50.56%) Loss: 2.241462 LR: 0.00002213 +[12:30:52] Epoch: 1 Batch: 10164/20099 (50.57%) Loss: 2.111174 LR: 0.00002213 +[12:30:54] Epoch: 1 
Batch: 10165/20099 (50.57%) Loss: 2.218716 LR: 0.00002213 +[12:30:56] Epoch: 1 Batch: 10166/20099 (50.58%) Loss: 2.135056 LR: 0.00002213 +[12:30:58] Epoch: 1 Batch: 10167/20099 (50.58%) Loss: 1.764461 LR: 0.00002211 +[12:30:59] Epoch: 1 Batch: 10168/20099 (50.59%) Loss: 2.258940 LR: 0.00002211 +[12:31:01] Epoch: 1 Batch: 10169/20099 (50.59%) Loss: 2.044407 LR: 0.00002211 +[12:31:03] Epoch: 1 Batch: 10170/20099 (50.60%) Loss: 2.063094 LR: 0.00002211 +[12:31:05] Epoch: 1 Batch: 10171/20099 (50.60%) Loss: 1.949757 LR: 0.00002211 +[12:31:06] Epoch: 1 Batch: 10172/20099 (50.61%) Loss: 2.082707 LR: 0.00002211 +[12:31:08] Epoch: 1 Batch: 10173/20099 (50.61%) Loss: 2.113373 LR: 0.00002211 +[12:31:10] Epoch: 1 Batch: 10174/20099 (50.62%) Loss: 2.191147 LR: 0.00002210 +[12:31:12] Epoch: 1 Batch: 10175/20099 (50.62%) Loss: 2.331698 LR: 0.00002210 +[12:31:14] Epoch: 1 Batch: 10176/20099 (50.63%) Loss: 2.304794 LR: 0.00002210 +[12:31:15] Epoch: 1 Batch: 10177/20099 (50.63%) Loss: 2.242102 LR: 0.00002210 +[12:31:17] Epoch: 1 Batch: 10178/20099 (50.64%) Loss: 2.312136 LR: 0.00002210 +[12:31:19] Epoch: 1 Batch: 10179/20099 (50.64%) Loss: 2.077174 LR: 0.00002210 +[12:31:21] Epoch: 1 Batch: 10180/20099 (50.65%) Loss: 2.076869 LR: 0.00002210 +[12:31:22] Epoch: 1 Batch: 10181/20099 (50.65%) Loss: 2.181428 LR: 0.00002208 +[12:31:24] Epoch: 1 Batch: 10182/20099 (50.66%) Loss: 2.125516 LR: 0.00002208 +[12:31:26] Epoch: 1 Batch: 10183/20099 (50.66%) Loss: 2.280057 LR: 0.00002208 +[12:31:28] Epoch: 1 Batch: 10184/20099 (50.67%) Loss: 2.203995 LR: 0.00002208 +[12:31:30] Epoch: 1 Batch: 10185/20099 (50.67%) Loss: 1.967923 LR: 0.00002208 +[12:31:31] Epoch: 1 Batch: 10186/20099 (50.68%) Loss: 1.853456 LR: 0.00002208 +[12:31:33] Epoch: 1 Batch: 10187/20099 (50.68%) Loss: 2.328783 LR: 0.00002208 +[12:31:35] Epoch: 1 Batch: 10188/20099 (50.69%) Loss: 2.249911 LR: 0.00002207 +[12:31:37] Epoch: 1 Batch: 10189/20099 (50.69%) Loss: 1.974304 LR: 0.00002207 +[12:31:38] Epoch: 1 Batch: 10190/20099 
(50.70%) Loss: 2.268555 LR: 0.00002207 +[12:31:40] Epoch: 1 Batch: 10191/20099 (50.70%) Loss: 2.112212 LR: 0.00002207 +[12:31:42] Epoch: 1 Batch: 10192/20099 (50.71%) Loss: 1.898471 LR: 0.00002207 +[12:31:44] Epoch: 1 Batch: 10193/20099 (50.71%) Loss: 2.288927 LR: 0.00002207 +[12:31:46] Epoch: 1 Batch: 10194/20099 (50.72%) Loss: 2.493155 LR: 0.00002207 +[12:31:47] Epoch: 1 Batch: 10195/20099 (50.72%) Loss: 2.076202 LR: 0.00002205 +[12:31:49] Epoch: 1 Batch: 10196/20099 (50.73%) Loss: 2.083090 LR: 0.00002205 +[12:31:51] Epoch: 1 Batch: 10197/20099 (50.73%) Loss: 2.000123 LR: 0.00002205 +[12:31:53] Epoch: 1 Batch: 10198/20099 (50.74%) Loss: 1.844868 LR: 0.00002205 +[12:31:54] Epoch: 1 Batch: 10199/20099 (50.74%) Loss: 2.204826 LR: 0.00002205 +[12:32:00] >> Cleaned up old temp checkpoint: epoch1_step8200 +[12:32:00] >> Temp checkpoint saved: epoch1_step10200, size: 0.1693 GB +[12:32:00] Epoch: 1 Batch: 10200/20099 (50.75%) Loss: 1.822114 LR: 0.00002205 +[12:32:02] Epoch: 1 Batch: 10201/20099 (50.75%) Loss: 1.940187 LR: 0.00002205 +[12:32:03] Epoch: 1 Batch: 10202/20099 (50.76%) Loss: 1.559281 LR: 0.00002204 +[12:32:05] Epoch: 1 Batch: 10203/20099 (50.76%) Loss: 2.064336 LR: 0.00002204 +[12:32:07] Epoch: 1 Batch: 10204/20099 (50.77%) Loss: 2.008357 LR: 0.00002204 +[12:32:09] Epoch: 1 Batch: 10205/20099 (50.77%) Loss: 2.183630 LR: 0.00002204 +[12:32:10] Epoch: 1 Batch: 10206/20099 (50.78%) Loss: 2.010585 LR: 0.00002204 +[12:32:12] Epoch: 1 Batch: 10207/20099 (50.78%) Loss: 1.887463 LR: 0.00002204 +[12:32:14] Epoch: 1 Batch: 10208/20099 (50.79%) Loss: 2.308248 LR: 0.00002204 +[12:32:16] Epoch: 1 Batch: 10209/20099 (50.79%) Loss: 2.325475 LR: 0.00002202 +[12:32:18] Epoch: 1 Batch: 10210/20099 (50.80%) Loss: 1.898462 LR: 0.00002202 +[12:32:19] Epoch: 1 Batch: 10211/20099 (50.80%) Loss: 2.313471 LR: 0.00002202 +[12:32:21] Epoch: 1 Batch: 10212/20099 (50.81%) Loss: 2.315353 LR: 0.00002202 +[12:32:23] Epoch: 1 Batch: 10213/20099 (50.81%) Loss: 2.193127 LR: 0.00002202 
+[12:32:25] Epoch: 1 Batch: 10214/20099 (50.82%) Loss: 2.156697 LR: 0.00002202 +[12:32:27] Epoch: 1 Batch: 10215/20099 (50.82%) Loss: 2.251478 LR: 0.00002202 +[12:32:28] Epoch: 1 Batch: 10216/20099 (50.83%) Loss: 2.043884 LR: 0.00002201 +[12:32:30] Epoch: 1 Batch: 10217/20099 (50.83%) Loss: 2.243955 LR: 0.00002201 +[12:32:32] Epoch: 1 Batch: 10218/20099 (50.84%) Loss: 1.720449 LR: 0.00002201 +[12:32:34] Epoch: 1 Batch: 10219/20099 (50.84%) Loss: 2.133950 LR: 0.00002201 +[12:32:36] Epoch: 1 Batch: 10220/20099 (50.85%) Loss: 1.886939 LR: 0.00002201 +[12:32:37] Epoch: 1 Batch: 10221/20099 (50.85%) Loss: 2.048932 LR: 0.00002201 +[12:32:39] Epoch: 1 Batch: 10222/20099 (50.86%) Loss: 2.002116 LR: 0.00002201 +[12:32:41] Epoch: 1 Batch: 10223/20099 (50.86%) Loss: 2.014609 LR: 0.00002199 +[12:32:43] Epoch: 1 Batch: 10224/20099 (50.87%) Loss: 1.996554 LR: 0.00002199 +[12:32:44] Epoch: 1 Batch: 10225/20099 (50.87%) Loss: 2.276937 LR: 0.00002199 +[12:32:46] Epoch: 1 Batch: 10226/20099 (50.88%) Loss: 2.177308 LR: 0.00002199 +[12:32:48] Epoch: 1 Batch: 10227/20099 (50.88%) Loss: 2.173513 LR: 0.00002199 +[12:32:50] Epoch: 1 Batch: 10228/20099 (50.89%) Loss: 2.040606 LR: 0.00002199 +[12:32:52] Epoch: 1 Batch: 10229/20099 (50.89%) Loss: 2.269064 LR: 0.00002199 +[12:32:53] Epoch: 1 Batch: 10230/20099 (50.90%) Loss: 2.346204 LR: 0.00002198 +[12:32:55] Epoch: 1 Batch: 10231/20099 (50.90%) Loss: 2.217344 LR: 0.00002198 +[12:32:57] Epoch: 1 Batch: 10232/20099 (50.91%) Loss: 2.331136 LR: 0.00002198 +[12:32:59] Epoch: 1 Batch: 10233/20099 (50.91%) Loss: 2.364319 LR: 0.00002198 +[12:33:01] Epoch: 1 Batch: 10234/20099 (50.92%) Loss: 2.142001 LR: 0.00002198 +[12:33:02] Epoch: 1 Batch: 10235/20099 (50.92%) Loss: 2.154401 LR: 0.00002198 +[12:33:04] Epoch: 1 Batch: 10236/20099 (50.93%) Loss: 2.088140 LR: 0.00002198 +[12:33:06] Epoch: 1 Batch: 10237/20099 (50.93%) Loss: 1.892968 LR: 0.00002196 +[12:33:08] Epoch: 1 Batch: 10238/20099 (50.94%) Loss: 2.155104 LR: 0.00002196 +[12:33:09] Epoch: 1 
Batch: 10239/20099 (50.94%) Loss: 2.206733 LR: 0.00002196 +[12:33:11] Epoch: 1 Batch: 10240/20099 (50.95%) Loss: 1.854820 LR: 0.00002196 +[12:33:13] Epoch: 1 Batch: 10241/20099 (50.95%) Loss: 2.110201 LR: 0.00002196 +[12:33:15] Epoch: 1 Batch: 10242/20099 (50.96%) Loss: 2.182367 LR: 0.00002196 +[12:33:16] Epoch: 1 Batch: 10243/20099 (50.96%) Loss: 2.123262 LR: 0.00002196 +[12:33:18] Epoch: 1 Batch: 10244/20099 (50.97%) Loss: 2.082731 LR: 0.00002195 +[12:33:20] Epoch: 1 Batch: 10245/20099 (50.97%) Loss: 2.214356 LR: 0.00002195 +[12:33:22] Epoch: 1 Batch: 10246/20099 (50.98%) Loss: 2.228073 LR: 0.00002195 +[12:33:23] Epoch: 1 Batch: 10247/20099 (50.98%) Loss: 2.368850 LR: 0.00002195 +[12:33:25] Epoch: 1 Batch: 10248/20099 (50.99%) Loss: 1.842714 LR: 0.00002195 +[12:33:27] Epoch: 1 Batch: 10249/20099 (50.99%) Loss: 2.141127 LR: 0.00002195 +[12:33:29] Epoch: 1 Batch: 10250/20099 (51.00%) Loss: 2.095977 LR: 0.00002195 +[12:33:31] Epoch: 1 Batch: 10251/20099 (51.00%) Loss: 2.031660 LR: 0.00002193 +[12:33:32] Epoch: 1 Batch: 10252/20099 (51.01%) Loss: 2.070159 LR: 0.00002193 +[12:33:34] Epoch: 1 Batch: 10253/20099 (51.01%) Loss: 2.254473 LR: 0.00002193 +[12:33:36] Epoch: 1 Batch: 10254/20099 (51.02%) Loss: 1.907931 LR: 0.00002193 +[12:33:38] Epoch: 1 Batch: 10255/20099 (51.02%) Loss: 2.152711 LR: 0.00002193 +[12:33:39] Epoch: 1 Batch: 10256/20099 (51.03%) Loss: 2.090043 LR: 0.00002193 +[12:33:41] Epoch: 1 Batch: 10257/20099 (51.03%) Loss: 1.957006 LR: 0.00002193 +[12:33:43] Epoch: 1 Batch: 10258/20099 (51.04%) Loss: 2.317679 LR: 0.00002191 +[12:33:45] Epoch: 1 Batch: 10259/20099 (51.04%) Loss: 2.309538 LR: 0.00002191 +[12:33:46] Epoch: 1 Batch: 10260/20099 (51.05%) Loss: 2.059008 LR: 0.00002191 +[12:33:48] Epoch: 1 Batch: 10261/20099 (51.05%) Loss: 2.061834 LR: 0.00002191 +[12:33:50] Epoch: 1 Batch: 10262/20099 (51.06%) Loss: 2.037491 LR: 0.00002191 +[12:33:52] Epoch: 1 Batch: 10263/20099 (51.06%) Loss: 2.262705 LR: 0.00002191 +[12:33:54] Epoch: 1 Batch: 10264/20099 
(51.07%) Loss: 2.015163 LR: 0.00002191 +[12:33:55] Epoch: 1 Batch: 10265/20099 (51.07%) Loss: 2.084115 LR: 0.00002190 +[12:33:57] Epoch: 1 Batch: 10266/20099 (51.08%) Loss: 2.281603 LR: 0.00002190 +[12:33:59] Epoch: 1 Batch: 10267/20099 (51.08%) Loss: 1.949597 LR: 0.00002190 +[12:34:01] Epoch: 1 Batch: 10268/20099 (51.09%) Loss: 2.048435 LR: 0.00002190 +[12:34:03] Epoch: 1 Batch: 10269/20099 (51.09%) Loss: 2.297588 LR: 0.00002190 +[12:34:04] Epoch: 1 Batch: 10270/20099 (51.10%) Loss: 2.230838 LR: 0.00002190 +[12:34:06] Epoch: 1 Batch: 10271/20099 (51.10%) Loss: 2.125212 LR: 0.00002190 +[12:34:08] Epoch: 1 Batch: 10272/20099 (51.11%) Loss: 2.234109 LR: 0.00002188 +[12:34:10] Epoch: 1 Batch: 10273/20099 (51.11%) Loss: 2.192349 LR: 0.00002188 +[12:34:11] Epoch: 1 Batch: 10274/20099 (51.12%) Loss: 2.434985 LR: 0.00002188 +[12:34:13] Epoch: 1 Batch: 10275/20099 (51.12%) Loss: 2.104143 LR: 0.00002188 +[12:34:15] Epoch: 1 Batch: 10276/20099 (51.13%) Loss: 2.197866 LR: 0.00002188 +[12:34:17] Epoch: 1 Batch: 10277/20099 (51.13%) Loss: 1.888504 LR: 0.00002188 +[12:34:19] Epoch: 1 Batch: 10278/20099 (51.14%) Loss: 1.848029 LR: 0.00002188 +[12:34:20] Epoch: 1 Batch: 10279/20099 (51.14%) Loss: 2.215764 LR: 0.00002187 +[12:34:22] Epoch: 1 Batch: 10280/20099 (51.15%) Loss: 2.130677 LR: 0.00002187 +[12:34:24] Epoch: 1 Batch: 10281/20099 (51.15%) Loss: 2.071054 LR: 0.00002187 +[12:34:26] Epoch: 1 Batch: 10282/20099 (51.16%) Loss: 1.814594 LR: 0.00002187 +[12:34:27] Epoch: 1 Batch: 10283/20099 (51.16%) Loss: 2.212279 LR: 0.00002187 +[12:34:29] Epoch: 1 Batch: 10284/20099 (51.17%) Loss: 2.164334 LR: 0.00002187 +[12:34:31] Epoch: 1 Batch: 10285/20099 (51.17%) Loss: 2.106360 LR: 0.00002187 +[12:34:33] Epoch: 1 Batch: 10286/20099 (51.18%) Loss: 2.326935 LR: 0.00002185 +[12:34:35] Epoch: 1 Batch: 10287/20099 (51.18%) Loss: 2.045877 LR: 0.00002185 +[12:34:36] Epoch: 1 Batch: 10288/20099 (51.19%) Loss: 2.022226 LR: 0.00002185 +[12:34:38] Epoch: 1 Batch: 10289/20099 (51.19%) Loss: 1.972576 
LR: 0.00002185 +[12:34:40] Epoch: 1 Batch: 10290/20099 (51.20%) Loss: 1.823125 LR: 0.00002185 +[12:34:42] Epoch: 1 Batch: 10291/20099 (51.20%) Loss: 2.004529 LR: 0.00002185 +[12:34:44] Epoch: 1 Batch: 10292/20099 (51.21%) Loss: 2.006238 LR: 0.00002185 +[12:34:45] Epoch: 1 Batch: 10293/20099 (51.21%) Loss: 2.182030 LR: 0.00002184 +[12:34:47] Epoch: 1 Batch: 10294/20099 (51.22%) Loss: 2.212804 LR: 0.00002184 +[12:34:49] Epoch: 1 Batch: 10295/20099 (51.22%) Loss: 1.853108 LR: 0.00002184 +[12:34:51] Epoch: 1 Batch: 10296/20099 (51.23%) Loss: 2.284854 LR: 0.00002184 +[12:34:52] Epoch: 1 Batch: 10297/20099 (51.23%) Loss: 2.055839 LR: 0.00002184 +[12:34:54] Epoch: 1 Batch: 10298/20099 (51.24%) Loss: 2.110389 LR: 0.00002184 +[12:34:56] Epoch: 1 Batch: 10299/20099 (51.24%) Loss: 2.157324 LR: 0.00002184 +[12:34:58] Epoch: 1 Batch: 10300/20099 (51.25%) Loss: 1.768042 LR: 0.00002182 +[12:35:00] Epoch: 1 Batch: 10301/20099 (51.25%) Loss: 2.050135 LR: 0.00002182 +[12:35:01] Epoch: 1 Batch: 10302/20099 (51.26%) Loss: 2.194338 LR: 0.00002182 +[12:35:03] Epoch: 1 Batch: 10303/20099 (51.26%) Loss: 2.065015 LR: 0.00002182 +[12:35:05] Epoch: 1 Batch: 10304/20099 (51.27%) Loss: 2.077620 LR: 0.00002182 +[12:35:07] Epoch: 1 Batch: 10305/20099 (51.27%) Loss: 2.238293 LR: 0.00002182 +[12:35:08] Epoch: 1 Batch: 10306/20099 (51.28%) Loss: 2.112545 LR: 0.00002182 +[12:35:10] Epoch: 1 Batch: 10307/20099 (51.28%) Loss: 1.879103 LR: 0.00002181 +[12:35:12] Epoch: 1 Batch: 10308/20099 (51.29%) Loss: 2.225071 LR: 0.00002181 +[12:35:14] Epoch: 1 Batch: 10309/20099 (51.29%) Loss: 2.115066 LR: 0.00002181 +[12:35:16] Epoch: 1 Batch: 10310/20099 (51.30%) Loss: 1.897396 LR: 0.00002181 +[12:35:17] Epoch: 1 Batch: 10311/20099 (51.30%) Loss: 2.047663 LR: 0.00002181 +[12:35:19] Epoch: 1 Batch: 10312/20099 (51.31%) Loss: 1.946905 LR: 0.00002181 +[12:35:21] Epoch: 1 Batch: 10313/20099 (51.31%) Loss: 1.810385 LR: 0.00002181 +[12:35:23] Epoch: 1 Batch: 10314/20099 (51.32%) Loss: 2.035806 LR: 0.00002179 
+[12:35:24] Epoch: 1 Batch: 10315/20099 (51.32%) Loss: 2.311561 LR: 0.00002179 +[12:35:26] Epoch: 1 Batch: 10316/20099 (51.33%) Loss: 2.125880 LR: 0.00002179 +[12:35:28] Epoch: 1 Batch: 10317/20099 (51.33%) Loss: 2.453233 LR: 0.00002179 +[12:35:30] Epoch: 1 Batch: 10318/20099 (51.34%) Loss: 2.120188 LR: 0.00002179 +[12:35:32] Epoch: 1 Batch: 10319/20099 (51.34%) Loss: 2.069199 LR: 0.00002179 +[12:35:33] Epoch: 1 Batch: 10320/20099 (51.35%) Loss: 2.189836 LR: 0.00002179 +[12:35:35] Epoch: 1 Batch: 10321/20099 (51.35%) Loss: 2.321538 LR: 0.00002178 +[12:35:37] Epoch: 1 Batch: 10322/20099 (51.36%) Loss: 1.995060 LR: 0.00002178 +[12:35:39] Epoch: 1 Batch: 10323/20099 (51.36%) Loss: 1.925357 LR: 0.00002178 +[12:35:40] Epoch: 1 Batch: 10324/20099 (51.37%) Loss: 2.095595 LR: 0.00002178 +[12:35:42] Epoch: 1 Batch: 10325/20099 (51.37%) Loss: 1.985288 LR: 0.00002178 +[12:35:44] Epoch: 1 Batch: 10326/20099 (51.38%) Loss: 1.760827 LR: 0.00002178 +[12:35:46] Epoch: 1 Batch: 10327/20099 (51.38%) Loss: 2.014076 LR: 0.00002178 +[12:35:48] Epoch: 1 Batch: 10328/20099 (51.39%) Loss: 2.013000 LR: 0.00002176 +[12:35:49] Epoch: 1 Batch: 10329/20099 (51.39%) Loss: 2.127565 LR: 0.00002176 +[12:35:51] Epoch: 1 Batch: 10330/20099 (51.40%) Loss: 2.404293 LR: 0.00002176 +[12:35:53] Epoch: 1 Batch: 10331/20099 (51.40%) Loss: 2.002283 LR: 0.00002176 +[12:35:55] Epoch: 1 Batch: 10332/20099 (51.41%) Loss: 1.853055 LR: 0.00002176 +[12:35:56] Epoch: 1 Batch: 10333/20099 (51.41%) Loss: 2.123050 LR: 0.00002176 +[12:35:58] Epoch: 1 Batch: 10334/20099 (51.42%) Loss: 2.211385 LR: 0.00002176 +[12:36:00] Epoch: 1 Batch: 10335/20099 (51.42%) Loss: 2.326090 LR: 0.00002175 +[12:36:02] Epoch: 1 Batch: 10336/20099 (51.43%) Loss: 2.022330 LR: 0.00002175 +[12:36:04] Epoch: 1 Batch: 10337/20099 (51.43%) Loss: 1.975691 LR: 0.00002175 +[12:36:05] Epoch: 1 Batch: 10338/20099 (51.44%) Loss: 2.206053 LR: 0.00002175 +[12:36:07] Epoch: 1 Batch: 10339/20099 (51.44%) Loss: 2.078422 LR: 0.00002175 +[12:36:09] Epoch: 1 
Batch: 10340/20099 (51.45%) Loss: 1.879647 LR: 0.00002175 +[12:36:11] Epoch: 1 Batch: 10341/20099 (51.45%) Loss: 2.153389 LR: 0.00002175 +[12:36:12] Epoch: 1 Batch: 10342/20099 (51.46%) Loss: 2.249233 LR: 0.00002173 +[12:36:14] Epoch: 1 Batch: 10343/20099 (51.46%) Loss: 1.876883 LR: 0.00002173 +[12:36:16] Epoch: 1 Batch: 10344/20099 (51.47%) Loss: 1.958269 LR: 0.00002173 +[12:36:18] Epoch: 1 Batch: 10345/20099 (51.47%) Loss: 1.858542 LR: 0.00002173 +[12:36:20] Epoch: 1 Batch: 10346/20099 (51.48%) Loss: 1.934578 LR: 0.00002173 +[12:36:21] Epoch: 1 Batch: 10347/20099 (51.48%) Loss: 1.950093 LR: 0.00002173 +[12:36:23] Epoch: 1 Batch: 10348/20099 (51.49%) Loss: 1.777596 LR: 0.00002173 +[12:36:25] Epoch: 1 Batch: 10349/20099 (51.49%) Loss: 2.050186 LR: 0.00002171 +[12:36:27] Epoch: 1 Batch: 10350/20099 (51.50%) Loss: 2.041305 LR: 0.00002171 +[12:36:28] Epoch: 1 Batch: 10351/20099 (51.50%) Loss: 2.356239 LR: 0.00002171 +[12:36:30] Epoch: 1 Batch: 10352/20099 (51.51%) Loss: 2.107040 LR: 0.00002171 +[12:36:32] Epoch: 1 Batch: 10353/20099 (51.51%) Loss: 2.032933 LR: 0.00002171 +[12:36:34] Epoch: 1 Batch: 10354/20099 (51.52%) Loss: 2.254959 LR: 0.00002171 +[12:36:36] Epoch: 1 Batch: 10355/20099 (51.52%) Loss: 2.237024 LR: 0.00002171 +[12:36:37] Epoch: 1 Batch: 10356/20099 (51.52%) Loss: 2.105427 LR: 0.00002170 +[12:36:39] Epoch: 1 Batch: 10357/20099 (51.53%) Loss: 2.006105 LR: 0.00002170 +[12:36:41] Epoch: 1 Batch: 10358/20099 (51.53%) Loss: 2.003927 LR: 0.00002170 +[12:36:43] Epoch: 1 Batch: 10359/20099 (51.54%) Loss: 2.092793 LR: 0.00002170 +[12:36:44] Epoch: 1 Batch: 10360/20099 (51.54%) Loss: 2.287009 LR: 0.00002170 +[12:36:46] Epoch: 1 Batch: 10361/20099 (51.55%) Loss: 2.411765 LR: 0.00002170 +[12:36:48] Epoch: 1 Batch: 10362/20099 (51.55%) Loss: 2.061428 LR: 0.00002170 +[12:36:50] Epoch: 1 Batch: 10363/20099 (51.56%) Loss: 2.038290 LR: 0.00002168 +[12:36:51] Epoch: 1 Batch: 10364/20099 (51.56%) Loss: 2.104352 LR: 0.00002168 +[12:36:53] Epoch: 1 Batch: 10365/20099 
(51.57%) Loss: 2.272573 LR: 0.00002168 +[12:36:55] Epoch: 1 Batch: 10366/20099 (51.57%) Loss: 2.434746 LR: 0.00002168 +[12:36:57] Epoch: 1 Batch: 10367/20099 (51.58%) Loss: 2.014027 LR: 0.00002168 +[12:36:59] Epoch: 1 Batch: 10368/20099 (51.58%) Loss: 2.176905 LR: 0.00002168 +[12:37:00] Epoch: 1 Batch: 10369/20099 (51.59%) Loss: 2.013161 LR: 0.00002168 +[12:37:02] Epoch: 1 Batch: 10370/20099 (51.59%) Loss: 2.147414 LR: 0.00002167 +[12:37:04] Epoch: 1 Batch: 10371/20099 (51.60%) Loss: 2.207065 LR: 0.00002167 +[12:37:06] Epoch: 1 Batch: 10372/20099 (51.60%) Loss: 2.296807 LR: 0.00002167 +[12:37:07] Epoch: 1 Batch: 10373/20099 (51.61%) Loss: 2.158292 LR: 0.00002167 +[12:37:09] Epoch: 1 Batch: 10374/20099 (51.61%) Loss: 2.218572 LR: 0.00002167 +[12:37:11] Epoch: 1 Batch: 10375/20099 (51.62%) Loss: 1.735222 LR: 0.00002167 +[12:37:13] Epoch: 1 Batch: 10376/20099 (51.62%) Loss: 2.286353 LR: 0.00002167 +[12:37:15] Epoch: 1 Batch: 10377/20099 (51.63%) Loss: 1.821790 LR: 0.00002165 +[12:37:16] Epoch: 1 Batch: 10378/20099 (51.63%) Loss: 2.128585 LR: 0.00002165 +[12:37:18] Epoch: 1 Batch: 10379/20099 (51.64%) Loss: 2.153456 LR: 0.00002165 +[12:37:20] Epoch: 1 Batch: 10380/20099 (51.64%) Loss: 1.952369 LR: 0.00002165 +[12:37:22] Epoch: 1 Batch: 10381/20099 (51.65%) Loss: 2.398804 LR: 0.00002165 +[12:37:23] Epoch: 1 Batch: 10382/20099 (51.65%) Loss: 1.960120 LR: 0.00002165 +[12:37:25] Epoch: 1 Batch: 10383/20099 (51.66%) Loss: 2.037566 LR: 0.00002165 +[12:37:27] Epoch: 1 Batch: 10384/20099 (51.66%) Loss: 1.743765 LR: 0.00002164 +[12:37:29] Epoch: 1 Batch: 10385/20099 (51.67%) Loss: 2.090971 LR: 0.00002164 +[12:37:30] Epoch: 1 Batch: 10386/20099 (51.67%) Loss: 1.996231 LR: 0.00002164 +[12:37:32] Epoch: 1 Batch: 10387/20099 (51.68%) Loss: 2.025589 LR: 0.00002164 +[12:37:34] Epoch: 1 Batch: 10388/20099 (51.68%) Loss: 2.106706 LR: 0.00002164 +[12:37:36] Epoch: 1 Batch: 10389/20099 (51.69%) Loss: 2.175152 LR: 0.00002164 +[12:37:38] Epoch: 1 Batch: 10390/20099 (51.69%) Loss: 2.141534 
LR: 0.00002164 +[12:37:39] Epoch: 1 Batch: 10391/20099 (51.70%) Loss: 2.125543 LR: 0.00002162 +[12:37:41] Epoch: 1 Batch: 10392/20099 (51.70%) Loss: 2.015568 LR: 0.00002162 +[12:37:43] Epoch: 1 Batch: 10393/20099 (51.71%) Loss: 1.880584 LR: 0.00002162 +[12:37:45] Epoch: 1 Batch: 10394/20099 (51.71%) Loss: 2.028400 LR: 0.00002162 +[12:37:46] Epoch: 1 Batch: 10395/20099 (51.72%) Loss: 2.278065 LR: 0.00002162 +[12:37:48] Epoch: 1 Batch: 10396/20099 (51.72%) Loss: 2.100892 LR: 0.00002162 +[12:37:50] Epoch: 1 Batch: 10397/20099 (51.73%) Loss: 2.186100 LR: 0.00002162 +[12:37:52] Epoch: 1 Batch: 10398/20099 (51.73%) Loss: 2.276024 LR: 0.00002161 +[12:37:54] Epoch: 1 Batch: 10399/20099 (51.74%) Loss: 2.008871 LR: 0.00002161 +[12:37:59] >> Cleaned up old temp checkpoint: epoch1_step8400 +[12:37:59] >> Temp checkpoint saved: epoch1_step10400, size: 0.1693 GB +[12:37:59] Epoch: 1 Batch: 10400/20099 (51.74%) Loss: 2.455716 LR: 0.00002161 +[12:38:01] Epoch: 1 Batch: 10401/20099 (51.75%) Loss: 2.046143 LR: 0.00002161 +[12:38:02] Epoch: 1 Batch: 10402/20099 (51.75%) Loss: 2.060118 LR: 0.00002161 +[12:38:04] Epoch: 1 Batch: 10403/20099 (51.76%) Loss: 1.947907 LR: 0.00002161 +[12:38:06] Epoch: 1 Batch: 10404/20099 (51.76%) Loss: 2.440352 LR: 0.00002161 +[12:38:08] Epoch: 1 Batch: 10405/20099 (51.77%) Loss: 2.098964 LR: 0.00002159 +[12:38:09] Epoch: 1 Batch: 10406/20099 (51.77%) Loss: 2.178732 LR: 0.00002159 +[12:38:11] Epoch: 1 Batch: 10407/20099 (51.78%) Loss: 2.139475 LR: 0.00002159 +[12:38:13] Epoch: 1 Batch: 10408/20099 (51.78%) Loss: 1.922473 LR: 0.00002159 +[12:38:15] Epoch: 1 Batch: 10409/20099 (51.79%) Loss: 1.991336 LR: 0.00002159 +[12:38:16] Epoch: 1 Batch: 10410/20099 (51.79%) Loss: 2.100329 LR: 0.00002159 +[12:38:18] Epoch: 1 Batch: 10411/20099 (51.80%) Loss: 2.065415 LR: 0.00002159 +[12:38:20] Epoch: 1 Batch: 10412/20099 (51.80%) Loss: 2.133376 LR: 0.00002158 +[12:38:22] Epoch: 1 Batch: 10413/20099 (51.81%) Loss: 2.259407 LR: 0.00002158 +[12:38:24] Epoch: 1 Batch: 
10414/20099 (51.81%) Loss: 2.072116 LR: 0.00002158 +[12:38:25] Epoch: 1 Batch: 10415/20099 (51.82%) Loss: 2.089845 LR: 0.00002158 +[12:38:27] Epoch: 1 Batch: 10416/20099 (51.82%) Loss: 2.112533 LR: 0.00002158 +[12:38:29] Epoch: 1 Batch: 10417/20099 (51.83%) Loss: 2.063540 LR: 0.00002158 +[12:38:31] Epoch: 1 Batch: 10418/20099 (51.83%) Loss: 2.163007 LR: 0.00002158 +[12:38:33] Epoch: 1 Batch: 10419/20099 (51.84%) Loss: 1.749084 LR: 0.00002156 +[12:38:34] Epoch: 1 Batch: 10420/20099 (51.84%) Loss: 2.050811 LR: 0.00002156 +[12:38:36] Epoch: 1 Batch: 10421/20099 (51.85%) Loss: 1.822824 LR: 0.00002156 +[12:38:38] Epoch: 1 Batch: 10422/20099 (51.85%) Loss: 1.853037 LR: 0.00002156 +[12:38:40] Epoch: 1 Batch: 10423/20099 (51.86%) Loss: 2.047303 LR: 0.00002156 +[12:38:42] Epoch: 1 Batch: 10424/20099 (51.86%) Loss: 2.075544 LR: 0.00002156 +[12:38:43] Epoch: 1 Batch: 10425/20099 (51.87%) Loss: 2.101726 LR: 0.00002156 +[12:38:45] Epoch: 1 Batch: 10426/20099 (51.87%) Loss: 1.991239 LR: 0.00002154 +[12:38:47] Epoch: 1 Batch: 10427/20099 (51.88%) Loss: 1.990588 LR: 0.00002154 +[12:38:49] Epoch: 1 Batch: 10428/20099 (51.88%) Loss: 1.989367 LR: 0.00002154 +[12:38:51] Epoch: 1 Batch: 10429/20099 (51.89%) Loss: 1.881629 LR: 0.00002154 +[12:38:52] Epoch: 1 Batch: 10430/20099 (51.89%) Loss: 1.713953 LR: 0.00002154 +[12:38:54] Epoch: 1 Batch: 10431/20099 (51.90%) Loss: 1.925071 LR: 0.00002154 +[12:38:56] Epoch: 1 Batch: 10432/20099 (51.90%) Loss: 1.847067 LR: 0.00002154 +[12:38:58] Epoch: 1 Batch: 10433/20099 (51.91%) Loss: 1.998695 LR: 0.00002153 +[12:38:59] Epoch: 1 Batch: 10434/20099 (51.91%) Loss: 2.079140 LR: 0.00002153 +[12:39:01] Epoch: 1 Batch: 10435/20099 (51.92%) Loss: 1.877827 LR: 0.00002153 +[12:39:03] Epoch: 1 Batch: 10436/20099 (51.92%) Loss: 1.870346 LR: 0.00002153 +[12:39:05] Epoch: 1 Batch: 10437/20099 (51.93%) Loss: 1.988331 LR: 0.00002153 +[12:39:06] Epoch: 1 Batch: 10438/20099 (51.93%) Loss: 2.239874 LR: 0.00002153 +[12:39:08] Epoch: 1 Batch: 10439/20099 (51.94%) 
Loss: 1.920635 LR: 0.00002153 +[12:39:10] Epoch: 1 Batch: 10440/20099 (51.94%) Loss: 2.222629 LR: 0.00002151 +[12:39:12] Epoch: 1 Batch: 10441/20099 (51.95%) Loss: 2.209567 LR: 0.00002151 +[12:39:13] Epoch: 1 Batch: 10442/20099 (51.95%) Loss: 1.917642 LR: 0.00002151 +[12:39:15] Epoch: 1 Batch: 10443/20099 (51.96%) Loss: 2.217182 LR: 0.00002151 +[12:39:17] Epoch: 1 Batch: 10444/20099 (51.96%) Loss: 1.961433 LR: 0.00002151 +[12:39:19] Epoch: 1 Batch: 10445/20099 (51.97%) Loss: 2.043659 LR: 0.00002151 +[12:39:20] Epoch: 1 Batch: 10446/20099 (51.97%) Loss: 2.149435 LR: 0.00002151 +[12:39:22] Epoch: 1 Batch: 10447/20099 (51.98%) Loss: 2.133778 LR: 0.00002150 +[12:39:24] Epoch: 1 Batch: 10448/20099 (51.98%) Loss: 2.019760 LR: 0.00002150 +[12:39:26] Epoch: 1 Batch: 10449/20099 (51.99%) Loss: 2.259640 LR: 0.00002150 +[12:39:28] Epoch: 1 Batch: 10450/20099 (51.99%) Loss: 2.212079 LR: 0.00002150 +[12:39:29] Epoch: 1 Batch: 10451/20099 (52.00%) Loss: 1.876225 LR: 0.00002150 +[12:39:31] Epoch: 1 Batch: 10452/20099 (52.00%) Loss: 2.136570 LR: 0.00002150 +[12:39:33] Epoch: 1 Batch: 10453/20099 (52.01%) Loss: 2.260218 LR: 0.00002150 +[12:39:35] Epoch: 1 Batch: 10454/20099 (52.01%) Loss: 1.968714 LR: 0.00002148 +[12:39:36] Epoch: 1 Batch: 10455/20099 (52.02%) Loss: 2.147729 LR: 0.00002148 +[12:39:38] Epoch: 1 Batch: 10456/20099 (52.02%) Loss: 2.196300 LR: 0.00002148 +[12:39:40] Epoch: 1 Batch: 10457/20099 (52.03%) Loss: 1.866500 LR: 0.00002148 +[12:39:42] Epoch: 1 Batch: 10458/20099 (52.03%) Loss: 2.109551 LR: 0.00002148 +[12:39:43] Epoch: 1 Batch: 10459/20099 (52.04%) Loss: 2.244141 LR: 0.00002148 +[12:39:45] Epoch: 1 Batch: 10460/20099 (52.04%) Loss: 1.886133 LR: 0.00002148 +[12:39:47] Epoch: 1 Batch: 10461/20099 (52.05%) Loss: 2.196221 LR: 0.00002147 +[12:39:49] Epoch: 1 Batch: 10462/20099 (52.05%) Loss: 2.192221 LR: 0.00002147 +[12:39:51] Epoch: 1 Batch: 10463/20099 (52.06%) Loss: 2.087475 LR: 0.00002147 +[12:39:52] Epoch: 1 Batch: 10464/20099 (52.06%) Loss: 2.141432 LR: 
0.00002147 +[12:39:54] Epoch: 1 Batch: 10465/20099 (52.07%) Loss: 2.258586 LR: 0.00002147 +[12:39:56] Epoch: 1 Batch: 10466/20099 (52.07%) Loss: 1.916189 LR: 0.00002147 +[12:39:58] Epoch: 1 Batch: 10467/20099 (52.08%) Loss: 2.090657 LR: 0.00002147 +[12:39:59] Epoch: 1 Batch: 10468/20099 (52.08%) Loss: 2.074265 LR: 0.00002145 +[12:40:01] Epoch: 1 Batch: 10469/20099 (52.09%) Loss: 2.372570 LR: 0.00002145 +[12:40:03] Epoch: 1 Batch: 10470/20099 (52.09%) Loss: 2.115500 LR: 0.00002145 +[12:40:05] Epoch: 1 Batch: 10471/20099 (52.10%) Loss: 2.135011 LR: 0.00002145 +[12:40:06] Epoch: 1 Batch: 10472/20099 (52.10%) Loss: 1.984574 LR: 0.00002145 +[12:40:08] Epoch: 1 Batch: 10473/20099 (52.11%) Loss: 2.235359 LR: 0.00002145 +[12:40:10] Epoch: 1 Batch: 10474/20099 (52.11%) Loss: 2.126053 LR: 0.00002145 +[12:40:12] Epoch: 1 Batch: 10475/20099 (52.12%) Loss: 1.923983 LR: 0.00002144 +[12:40:14] Epoch: 1 Batch: 10476/20099 (52.12%) Loss: 1.912723 LR: 0.00002144 +[12:40:15] Epoch: 1 Batch: 10477/20099 (52.13%) Loss: 2.071522 LR: 0.00002144 +[12:40:17] Epoch: 1 Batch: 10478/20099 (52.13%) Loss: 1.900655 LR: 0.00002144 +[12:40:19] Epoch: 1 Batch: 10479/20099 (52.14%) Loss: 2.166101 LR: 0.00002144 +[12:40:21] Epoch: 1 Batch: 10480/20099 (52.14%) Loss: 2.098050 LR: 0.00002144 +[12:40:23] Epoch: 1 Batch: 10481/20099 (52.15%) Loss: 1.937467 LR: 0.00002144 +[12:40:24] Epoch: 1 Batch: 10482/20099 (52.15%) Loss: 2.072272 LR: 0.00002142 +[12:40:26] Epoch: 1 Batch: 10483/20099 (52.16%) Loss: 2.163943 LR: 0.00002142 +[12:40:28] Epoch: 1 Batch: 10484/20099 (52.16%) Loss: 1.805678 LR: 0.00002142 +[12:40:30] Epoch: 1 Batch: 10485/20099 (52.17%) Loss: 1.812973 LR: 0.00002142 +[12:40:31] Epoch: 1 Batch: 10486/20099 (52.17%) Loss: 1.626138 LR: 0.00002142 +[12:40:33] Epoch: 1 Batch: 10487/20099 (52.18%) Loss: 2.248865 LR: 0.00002142 +[12:40:35] Epoch: 1 Batch: 10488/20099 (52.18%) Loss: 2.256725 LR: 0.00002142 +[12:40:37] Epoch: 1 Batch: 10489/20099 (52.19%) Loss: 2.222520 LR: 0.00002140 +[12:40:39] 
Epoch: 1 Batch: 10490/20099 (52.19%) Loss: 2.116989 LR: 0.00002140 +[12:40:40] Epoch: 1 Batch: 10491/20099 (52.20%) Loss: 2.273890 LR: 0.00002140 +[12:40:42] Epoch: 1 Batch: 10492/20099 (52.20%) Loss: 1.862279 LR: 0.00002140 +[12:40:44] Epoch: 1 Batch: 10493/20099 (52.21%) Loss: 2.256407 LR: 0.00002140 +[12:40:46] Epoch: 1 Batch: 10494/20099 (52.21%) Loss: 2.065521 LR: 0.00002140 +[12:40:47] Epoch: 1 Batch: 10495/20099 (52.22%) Loss: 2.184956 LR: 0.00002140 +[12:40:49] Epoch: 1 Batch: 10496/20099 (52.22%) Loss: 2.301841 LR: 0.00002139 +[12:40:51] Epoch: 1 Batch: 10497/20099 (52.23%) Loss: 2.170638 LR: 0.00002139 +[12:40:53] Epoch: 1 Batch: 10498/20099 (52.23%) Loss: 1.901645 LR: 0.00002139 +[12:40:54] Epoch: 1 Batch: 10499/20099 (52.24%) Loss: 2.013383 LR: 0.00002139 +[12:40:56] >> Evaluating batch 0 +[12:40:58] >> Evaluating batch 1 +[12:40:59] >> Evaluating batch 2 +[12:41:00] >> Evaluating batch 3 +[12:41:01] >> Evaluating batch 4 +[12:41:02] >> Evaluating batch 5 +[12:41:03] >> Evaluating batch 6 +[12:41:04] >> Evaluating batch 7 +[12:41:05] >> Evaluating batch 8 +[12:41:06] >> Evaluating batch 9 +[12:41:07] >> Evaluating batch 10 +[12:41:08] >> Evaluating batch 11 +[12:41:09] >> Evaluating batch 12 +[12:41:09] >> Evaluating batch 13 +[12:41:10] >> Evaluating batch 14 +[12:41:11] >> Evaluating batch 15 +[12:41:12] >> Evaluating batch 16 +[12:41:13] Epoch: 1 Step: 10500/20099 Evaluation: +[12:41:13] [1mAvg Loss Since Last Eval: 2.0980 Val Loss: 2.1666 Validation loss delta: -0.0013 Perplexity: 8.7286 LR: 0.00002139 +[12:41:16] >> Checkpoint saved: epoch1_step10500, size: 0.1693 GB +[12:41:16] Epoch: 1 Batch: 10500/20099 (52.24%) Loss: 2.105795 LR: 0.00002139 +[12:41:18] Epoch: 1 Batch: 10501/20099 (52.25%) Loss: 2.275038 LR: 0.00002139 +[12:41:20] Epoch: 1 Batch: 10502/20099 (52.25%) Loss: 2.063590 LR: 0.00002139 +[12:41:22] Epoch: 1 Batch: 10503/20099 (52.26%) Loss: 1.944690 LR: 0.00002137 +[12:41:23] Epoch: 1 Batch: 10504/20099 (52.26%) Loss: 2.197471 LR: 
0.00002137 +[12:41:25] Epoch: 1 Batch: 10505/20099 (52.27%) Loss: 2.005412 LR: 0.00002137 +[12:41:27] Epoch: 1 Batch: 10506/20099 (52.27%) Loss: 2.247791 LR: 0.00002137 +[12:41:29] Epoch: 1 Batch: 10507/20099 (52.28%) Loss: 2.047730 LR: 0.00002137 +[12:41:30] Epoch: 1 Batch: 10508/20099 (52.28%) Loss: 2.246016 LR: 0.00002137 +[12:41:32] Epoch: 1 Batch: 10509/20099 (52.29%) Loss: 2.154473 LR: 0.00002137 +[12:41:34] Epoch: 1 Batch: 10510/20099 (52.29%) Loss: 2.257043 LR: 0.00002136 +[12:41:36] Epoch: 1 Batch: 10511/20099 (52.30%) Loss: 1.845307 LR: 0.00002136 +[12:41:38] Epoch: 1 Batch: 10512/20099 (52.30%) Loss: 2.389839 LR: 0.00002136 +[12:41:39] Epoch: 1 Batch: 10513/20099 (52.31%) Loss: 1.959780 LR: 0.00002136 +[12:41:41] Epoch: 1 Batch: 10514/20099 (52.31%) Loss: 2.152979 LR: 0.00002136 +[12:41:43] Epoch: 1 Batch: 10515/20099 (52.32%) Loss: 2.144530 LR: 0.00002136 +[12:41:45] Epoch: 1 Batch: 10516/20099 (52.32%) Loss: 2.195328 LR: 0.00002136 +[12:41:47] Epoch: 1 Batch: 10517/20099 (52.33%) Loss: 2.063839 LR: 0.00002134 +[12:41:48] Epoch: 1 Batch: 10518/20099 (52.33%) Loss: 2.272826 LR: 0.00002134 +[12:41:50] Epoch: 1 Batch: 10519/20099 (52.34%) Loss: 2.413317 LR: 0.00002134 +[12:41:52] Epoch: 1 Batch: 10520/20099 (52.34%) Loss: 2.135377 LR: 0.00002134 +[12:41:54] Epoch: 1 Batch: 10521/20099 (52.35%) Loss: 2.441047 LR: 0.00002134 +[12:41:56] Epoch: 1 Batch: 10522/20099 (52.35%) Loss: 1.931668 LR: 0.00002134 +[12:41:57] Epoch: 1 Batch: 10523/20099 (52.36%) Loss: 1.971772 LR: 0.00002134 +[12:41:59] Epoch: 1 Batch: 10524/20099 (52.36%) Loss: 1.909199 LR: 0.00002133 +[12:42:01] Epoch: 1 Batch: 10525/20099 (52.37%) Loss: 2.177586 LR: 0.00002133 +[12:42:03] Epoch: 1 Batch: 10526/20099 (52.37%) Loss: 2.091140 LR: 0.00002133 +[12:42:05] Epoch: 1 Batch: 10527/20099 (52.38%) Loss: 2.057131 LR: 0.00002133 +[12:42:06] Epoch: 1 Batch: 10528/20099 (52.38%) Loss: 2.353148 LR: 0.00002133 +[12:42:08] Epoch: 1 Batch: 10529/20099 (52.39%) Loss: 1.966397 LR: 0.00002133 +[12:42:10] 
Epoch: 1 Batch: 10530/20099 (52.39%) Loss: 2.049764 LR: 0.00002133 +[12:42:12] Epoch: 1 Batch: 10531/20099 (52.40%) Loss: 2.055216 LR: 0.00002131 +[12:42:13] Epoch: 1 Batch: 10532/20099 (52.40%) Loss: 2.429504 LR: 0.00002131 +[12:42:15] Epoch: 1 Batch: 10533/20099 (52.41%) Loss: 1.938302 LR: 0.00002131 +[12:42:17] Epoch: 1 Batch: 10534/20099 (52.41%) Loss: 1.732248 LR: 0.00002131 +[12:42:19] Epoch: 1 Batch: 10535/20099 (52.42%) Loss: 2.298864 LR: 0.00002131 +[12:42:21] Epoch: 1 Batch: 10536/20099 (52.42%) Loss: 1.861883 LR: 0.00002131 +[12:42:22] Epoch: 1 Batch: 10537/20099 (52.43%) Loss: 2.062312 LR: 0.00002131 +[12:42:24] Epoch: 1 Batch: 10538/20099 (52.43%) Loss: 2.034230 LR: 0.00002129 +[12:42:26] Epoch: 1 Batch: 10539/20099 (52.44%) Loss: 2.069561 LR: 0.00002129 +[12:42:28] Epoch: 1 Batch: 10540/20099 (52.44%) Loss: 2.116052 LR: 0.00002129 +[12:42:29] Epoch: 1 Batch: 10541/20099 (52.45%) Loss: 1.911066 LR: 0.00002129 +[12:42:31] Epoch: 1 Batch: 10542/20099 (52.45%) Loss: 1.891479 LR: 0.00002129 +[12:42:33] Epoch: 1 Batch: 10543/20099 (52.46%) Loss: 2.043854 LR: 0.00002129 +[12:42:35] Epoch: 1 Batch: 10544/20099 (52.46%) Loss: 2.031075 LR: 0.00002129 +[12:42:37] Epoch: 1 Batch: 10545/20099 (52.47%) Loss: 2.132545 LR: 0.00002128 +[12:42:38] Epoch: 1 Batch: 10546/20099 (52.47%) Loss: 2.175620 LR: 0.00002128 +[12:42:40] Epoch: 1 Batch: 10547/20099 (52.48%) Loss: 2.359708 LR: 0.00002128 +[12:42:42] Epoch: 1 Batch: 10548/20099 (52.48%) Loss: 1.879638 LR: 0.00002128 +[12:42:44] Epoch: 1 Batch: 10549/20099 (52.49%) Loss: 1.887316 LR: 0.00002128 +[12:42:45] Epoch: 1 Batch: 10550/20099 (52.49%) Loss: 2.167636 LR: 0.00002128 +[12:42:47] Epoch: 1 Batch: 10551/20099 (52.50%) Loss: 2.388280 LR: 0.00002128 +[12:42:49] Epoch: 1 Batch: 10552/20099 (52.50%) Loss: 2.395353 LR: 0.00002126 +[12:42:51] Epoch: 1 Batch: 10553/20099 (52.51%) Loss: 2.104136 LR: 0.00002126 +[12:42:52] Epoch: 1 Batch: 10554/20099 (52.51%) Loss: 2.270736 LR: 0.00002126 +[12:42:54] Epoch: 1 Batch: 
10555/20099 (52.52%) Loss: 2.248169 LR: 0.00002126 +[12:42:56] Epoch: 1 Batch: 10556/20099 (52.52%) Loss: 2.393737 LR: 0.00002126 +[12:42:58] Epoch: 1 Batch: 10557/20099 (52.53%) Loss: 2.041635 LR: 0.00002126 +[12:42:59] Epoch: 1 Batch: 10558/20099 (52.53%) Loss: 2.362715 LR: 0.00002126 +[12:43:01] Epoch: 1 Batch: 10559/20099 (52.53%) Loss: 2.434280 LR: 0.00002125 +[12:43:03] Epoch: 1 Batch: 10560/20099 (52.54%) Loss: 1.520128 LR: 0.00002125 +[12:43:05] Epoch: 1 Batch: 10561/20099 (52.54%) Loss: 2.163376 LR: 0.00002125 +[12:43:07] Epoch: 1 Batch: 10562/20099 (52.55%) Loss: 1.914386 LR: 0.00002125 +[12:43:08] Epoch: 1 Batch: 10563/20099 (52.55%) Loss: 1.814104 LR: 0.00002125 +[12:43:10] Epoch: 1 Batch: 10564/20099 (52.56%) Loss: 2.010937 LR: 0.00002125 +[12:43:12] Epoch: 1 Batch: 10565/20099 (52.56%) Loss: 2.246521 LR: 0.00002125 +[12:43:14] Epoch: 1 Batch: 10566/20099 (52.57%) Loss: 2.330012 LR: 0.00002123 +[12:43:15] Epoch: 1 Batch: 10567/20099 (52.57%) Loss: 2.393854 LR: 0.00002123 +[12:43:17] Epoch: 1 Batch: 10568/20099 (52.58%) Loss: 2.026956 LR: 0.00002123 +[12:43:19] Epoch: 1 Batch: 10569/20099 (52.58%) Loss: 2.285920 LR: 0.00002123 +[12:43:21] Epoch: 1 Batch: 10570/20099 (52.59%) Loss: 1.929876 LR: 0.00002123 +[12:43:23] Epoch: 1 Batch: 10571/20099 (52.59%) Loss: 2.071024 LR: 0.00002123 +[12:43:24] Epoch: 1 Batch: 10572/20099 (52.60%) Loss: 2.281179 LR: 0.00002123 +[12:43:26] Epoch: 1 Batch: 10573/20099 (52.60%) Loss: 2.082563 LR: 0.00002122 +[12:43:28] Epoch: 1 Batch: 10574/20099 (52.61%) Loss: 2.056453 LR: 0.00002122 +[12:43:30] Epoch: 1 Batch: 10575/20099 (52.61%) Loss: 2.150899 LR: 0.00002122 +[12:43:32] Epoch: 1 Batch: 10576/20099 (52.62%) Loss: 1.877838 LR: 0.00002122 +[12:43:33] Epoch: 1 Batch: 10577/20099 (52.62%) Loss: 2.239994 LR: 0.00002122 +[12:43:35] Epoch: 1 Batch: 10578/20099 (52.63%) Loss: 2.245412 LR: 0.00002122 +[12:43:37] Epoch: 1 Batch: 10579/20099 (52.63%) Loss: 2.212244 LR: 0.00002122 +[12:43:39] Epoch: 1 Batch: 10580/20099 (52.64%) 
Loss: 2.048193 LR: 0.00002120 +[12:43:40] Epoch: 1 Batch: 10581/20099 (52.64%) Loss: 2.089339 LR: 0.00002120 +[12:43:42] Epoch: 1 Batch: 10582/20099 (52.65%) Loss: 1.980010 LR: 0.00002120 +[12:43:44] Epoch: 1 Batch: 10583/20099 (52.65%) Loss: 2.193273 LR: 0.00002120 +[12:43:46] Epoch: 1 Batch: 10584/20099 (52.66%) Loss: 2.294270 LR: 0.00002120 +[12:43:48] Epoch: 1 Batch: 10585/20099 (52.66%) Loss: 2.241586 LR: 0.00002120 +[12:43:49] Epoch: 1 Batch: 10586/20099 (52.67%) Loss: 2.049581 LR: 0.00002120 +[12:43:51] Epoch: 1 Batch: 10587/20099 (52.67%) Loss: 2.226557 LR: 0.00002119 +[12:43:53] Epoch: 1 Batch: 10588/20099 (52.68%) Loss: 2.230226 LR: 0.00002119 +[12:43:55] Epoch: 1 Batch: 10589/20099 (52.68%) Loss: 2.352484 LR: 0.00002119 +[12:43:56] Epoch: 1 Batch: 10590/20099 (52.69%) Loss: 2.317206 LR: 0.00002119 +[12:43:58] Epoch: 1 Batch: 10591/20099 (52.69%) Loss: 1.944342 LR: 0.00002119 +[12:44:00] Epoch: 1 Batch: 10592/20099 (52.70%) Loss: 2.009456 LR: 0.00002119 +[12:44:02] Epoch: 1 Batch: 10593/20099 (52.70%) Loss: 2.398356 LR: 0.00002119 +[12:44:04] Epoch: 1 Batch: 10594/20099 (52.71%) Loss: 2.173478 LR: 0.00002117 +[12:44:05] Epoch: 1 Batch: 10595/20099 (52.71%) Loss: 1.951501 LR: 0.00002117 +[12:44:07] Epoch: 1 Batch: 10596/20099 (52.72%) Loss: 2.375187 LR: 0.00002117 +[12:44:09] Epoch: 1 Batch: 10597/20099 (52.72%) Loss: 1.947006 LR: 0.00002117 +[12:44:11] Epoch: 1 Batch: 10598/20099 (52.73%) Loss: 2.068319 LR: 0.00002117 +[12:44:12] Epoch: 1 Batch: 10599/20099 (52.73%) Loss: 2.098319 LR: 0.00002117 +[12:44:18] >> Cleaned up old temp checkpoint: epoch1_step8600 +[12:44:18] >> Temp checkpoint saved: epoch1_step10600, size: 0.1693 GB +[12:44:18] Epoch: 1 Batch: 10600/20099 (52.74%) Loss: 2.075042 LR: 0.00002117 +[12:44:20] Epoch: 1 Batch: 10601/20099 (52.74%) Loss: 2.206616 LR: 0.00002115 +[12:44:21] Epoch: 1 Batch: 10602/20099 (52.75%) Loss: 2.080123 LR: 0.00002115 +[12:44:23] Epoch: 1 Batch: 10603/20099 (52.75%) Loss: 1.952322 LR: 0.00002115 +[12:44:25] 
Epoch: 1 Batch: 10604/20099 (52.76%) Loss: 1.956652 LR: 0.00002115 +[12:44:27] Epoch: 1 Batch: 10605/20099 (52.76%) Loss: 2.211545 LR: 0.00002115 +[12:44:28] Epoch: 1 Batch: 10606/20099 (52.77%) Loss: 2.256493 LR: 0.00002115 +[12:44:30] Epoch: 1 Batch: 10607/20099 (52.77%) Loss: 2.131943 LR: 0.00002115 +[12:44:32] Epoch: 1 Batch: 10608/20099 (52.78%) Loss: 2.085885 LR: 0.00002114 +[12:44:34] Epoch: 1 Batch: 10609/20099 (52.78%) Loss: 1.617042 LR: 0.00002114 +[12:44:35] Epoch: 1 Batch: 10610/20099 (52.79%) Loss: 2.083308 LR: 0.00002114 +[12:44:37] Epoch: 1 Batch: 10611/20099 (52.79%) Loss: 2.060690 LR: 0.00002114 +[12:44:39] Epoch: 1 Batch: 10612/20099 (52.80%) Loss: 2.462697 LR: 0.00002114 +[12:44:41] Epoch: 1 Batch: 10613/20099 (52.80%) Loss: 2.174522 LR: 0.00002114 +[12:44:43] Epoch: 1 Batch: 10614/20099 (52.81%) Loss: 2.113145 LR: 0.00002114 +[12:44:45] Epoch: 1 Batch: 10615/20099 (52.81%) Loss: 1.968632 LR: 0.00002112 +[12:44:46] Epoch: 1 Batch: 10616/20099 (52.82%) Loss: 2.278334 LR: 0.00002112 +[12:44:48] Epoch: 1 Batch: 10617/20099 (52.82%) Loss: 2.030211 LR: 0.00002112 +[12:44:50] Epoch: 1 Batch: 10618/20099 (52.83%) Loss: 1.802056 LR: 0.00002112 +[12:44:52] Epoch: 1 Batch: 10619/20099 (52.83%) Loss: 1.638935 LR: 0.00002112 +[12:44:54] Epoch: 1 Batch: 10620/20099 (52.84%) Loss: 2.331700 LR: 0.00002112 +[12:44:55] Epoch: 1 Batch: 10621/20099 (52.84%) Loss: 2.048728 LR: 0.00002112 +[12:44:57] Epoch: 1 Batch: 10622/20099 (52.85%) Loss: 2.018522 LR: 0.00002111 +[12:44:59] Epoch: 1 Batch: 10623/20099 (52.85%) Loss: 1.997976 LR: 0.00002111 +[12:45:01] Epoch: 1 Batch: 10624/20099 (52.86%) Loss: 2.419809 LR: 0.00002111 +[12:45:03] Epoch: 1 Batch: 10625/20099 (52.86%) Loss: 2.160825 LR: 0.00002111 +[12:45:04] Epoch: 1 Batch: 10626/20099 (52.87%) Loss: 2.323203 LR: 0.00002111 +[12:45:06] Epoch: 1 Batch: 10627/20099 (52.87%) Loss: 2.052153 LR: 0.00002111 +[12:45:08] Epoch: 1 Batch: 10628/20099 (52.88%) Loss: 2.125892 LR: 0.00002111 +[12:45:10] Epoch: 1 Batch: 
10629/20099 (52.88%) Loss: 2.063779 LR: 0.00002109 +[12:45:11] Epoch: 1 Batch: 10630/20099 (52.89%) Loss: 2.175258 LR: 0.00002109 +[12:45:13] Epoch: 1 Batch: 10631/20099 (52.89%) Loss: 2.022120 LR: 0.00002109 +[12:45:15] Epoch: 1 Batch: 10632/20099 (52.90%) Loss: 1.751135 LR: 0.00002109 +[12:45:17] Epoch: 1 Batch: 10633/20099 (52.90%) Loss: 1.834156 LR: 0.00002109 +[12:45:18] Epoch: 1 Batch: 10634/20099 (52.91%) Loss: 2.208569 LR: 0.00002109 +[12:45:20] Epoch: 1 Batch: 10635/20099 (52.91%) Loss: 1.928240 LR: 0.00002109 +[12:45:22] Epoch: 1 Batch: 10636/20099 (52.92%) Loss: 1.833798 LR: 0.00002108 +[12:45:24] Epoch: 1 Batch: 10637/20099 (52.92%) Loss: 2.220286 LR: 0.00002108 +[12:45:26] Epoch: 1 Batch: 10638/20099 (52.93%) Loss: 2.276832 LR: 0.00002108 +[12:45:27] Epoch: 1 Batch: 10639/20099 (52.93%) Loss: 2.098781 LR: 0.00002108 +[12:45:29] Epoch: 1 Batch: 10640/20099 (52.94%) Loss: 2.087804 LR: 0.00002108 +[12:45:31] Epoch: 1 Batch: 10641/20099 (52.94%) Loss: 1.818310 LR: 0.00002108 +[12:45:33] Epoch: 1 Batch: 10642/20099 (52.95%) Loss: 1.790209 LR: 0.00002108 +[12:45:34] Epoch: 1 Batch: 10643/20099 (52.95%) Loss: 2.164132 LR: 0.00002106 +[12:45:36] Epoch: 1 Batch: 10644/20099 (52.96%) Loss: 2.268338 LR: 0.00002106 +[12:45:38] Epoch: 1 Batch: 10645/20099 (52.96%) Loss: 2.132754 LR: 0.00002106 +[12:45:40] Epoch: 1 Batch: 10646/20099 (52.97%) Loss: 2.184847 LR: 0.00002106 +[12:45:41] Epoch: 1 Batch: 10647/20099 (52.97%) Loss: 1.946147 LR: 0.00002106 +[12:45:43] Epoch: 1 Batch: 10648/20099 (52.98%) Loss: 2.215905 LR: 0.00002106 +[12:45:45] Epoch: 1 Batch: 10649/20099 (52.98%) Loss: 1.898659 LR: 0.00002106 +[12:45:47] Epoch: 1 Batch: 10650/20099 (52.99%) Loss: 2.292680 LR: 0.00002104 +[12:45:48] Epoch: 1 Batch: 10651/20099 (52.99%) Loss: 2.154249 LR: 0.00002104 +[12:45:50] Epoch: 1 Batch: 10652/20099 (53.00%) Loss: 1.712703 LR: 0.00002104 +[12:45:52] Epoch: 1 Batch: 10653/20099 (53.00%) Loss: 2.110464 LR: 0.00002104 +[12:45:54] Epoch: 1 Batch: 10654/20099 (53.01%) 
Loss: 2.321514 LR: 0.00002104 +[12:45:56] Epoch: 1 Batch: 10655/20099 (53.01%) Loss: 2.167873 LR: 0.00002104 +[12:45:57] Epoch: 1 Batch: 10656/20099 (53.02%) Loss: 2.115510 LR: 0.00002104 +[12:45:59] Epoch: 1 Batch: 10657/20099 (53.02%) Loss: 2.180236 LR: 0.00002103 +[12:46:01] Epoch: 1 Batch: 10658/20099 (53.03%) Loss: 2.313697 LR: 0.00002103 +[12:46:03] Epoch: 1 Batch: 10659/20099 (53.03%) Loss: 1.973081 LR: 0.00002103 +[12:46:04] Epoch: 1 Batch: 10660/20099 (53.04%) Loss: 2.190076 LR: 0.00002103 +[12:46:06] Epoch: 1 Batch: 10661/20099 (53.04%) Loss: 2.046026 LR: 0.00002103 +[12:46:08] Epoch: 1 Batch: 10662/20099 (53.05%) Loss: 2.080287 LR: 0.00002103 +[12:46:10] Epoch: 1 Batch: 10663/20099 (53.05%) Loss: 2.397389 LR: 0.00002103 +[12:46:12] Epoch: 1 Batch: 10664/20099 (53.06%) Loss: 2.026066 LR: 0.00002101 +[12:46:13] Epoch: 1 Batch: 10665/20099 (53.06%) Loss: 2.055689 LR: 0.00002101 +[12:46:15] Epoch: 1 Batch: 10666/20099 (53.07%) Loss: 1.660775 LR: 0.00002101 +[12:46:17] Epoch: 1 Batch: 10667/20099 (53.07%) Loss: 2.244256 LR: 0.00002101 +[12:46:19] Epoch: 1 Batch: 10668/20099 (53.08%) Loss: 2.269628 LR: 0.00002101 +[12:46:20] Epoch: 1 Batch: 10669/20099 (53.08%) Loss: 1.926209 LR: 0.00002101 +[12:46:22] Epoch: 1 Batch: 10670/20099 (53.09%) Loss: 1.825374 LR: 0.00002101 +[12:46:24] Epoch: 1 Batch: 10671/20099 (53.09%) Loss: 2.194442 LR: 0.00002100 +[12:46:26] Epoch: 1 Batch: 10672/20099 (53.10%) Loss: 1.913076 LR: 0.00002100 +[12:46:28] Epoch: 1 Batch: 10673/20099 (53.10%) Loss: 1.792434 LR: 0.00002100 +[12:46:29] Epoch: 1 Batch: 10674/20099 (53.11%) Loss: 2.406994 LR: 0.00002100 +[12:46:31] Epoch: 1 Batch: 10675/20099 (53.11%) Loss: 1.953329 LR: 0.00002100 +[12:46:33] Epoch: 1 Batch: 10676/20099 (53.12%) Loss: 2.127552 LR: 0.00002100 +[12:46:35] Epoch: 1 Batch: 10677/20099 (53.12%) Loss: 2.553423 LR: 0.00002100 +[12:46:36] Epoch: 1 Batch: 10678/20099 (53.13%) Loss: 1.972576 LR: 0.00002098 +[12:46:38] Epoch: 1 Batch: 10679/20099 (53.13%) Loss: 1.927000 LR: 
0.00002098 +[12:46:40] Epoch: 1 Batch: 10680/20099 (53.14%) Loss: 2.224816 LR: 0.00002098 +[12:46:42] Epoch: 1 Batch: 10681/20099 (53.14%) Loss: 2.034566 LR: 0.00002098 +[12:46:44] Epoch: 1 Batch: 10682/20099 (53.15%) Loss: 2.160526 LR: 0.00002098 +[12:46:45] Epoch: 1 Batch: 10683/20099 (53.15%) Loss: 1.799345 LR: 0.00002098 +[12:46:47] Epoch: 1 Batch: 10684/20099 (53.16%) Loss: 2.122212 LR: 0.00002098 +[12:46:49] Epoch: 1 Batch: 10685/20099 (53.16%) Loss: 2.136188 LR: 0.00002097 +[12:46:51] Epoch: 1 Batch: 10686/20099 (53.17%) Loss: 2.000205 LR: 0.00002097 +[12:46:52] Epoch: 1 Batch: 10687/20099 (53.17%) Loss: 2.287966 LR: 0.00002097 +[12:46:54] Epoch: 1 Batch: 10688/20099 (53.18%) Loss: 1.848741 LR: 0.00002097 +[12:46:56] Epoch: 1 Batch: 10689/20099 (53.18%) Loss: 1.975384 LR: 0.00002097 +[12:46:58] Epoch: 1 Batch: 10690/20099 (53.19%) Loss: 2.278547 LR: 0.00002097 +[12:47:00] Epoch: 1 Batch: 10691/20099 (53.19%) Loss: 1.784178 LR: 0.00002097 +[12:47:01] Epoch: 1 Batch: 10692/20099 (53.20%) Loss: 2.199654 LR: 0.00002095 +[12:47:03] Epoch: 1 Batch: 10693/20099 (53.20%) Loss: 2.181619 LR: 0.00002095 +[12:47:05] Epoch: 1 Batch: 10694/20099 (53.21%) Loss: 1.924939 LR: 0.00002095 +[12:47:07] Epoch: 1 Batch: 10695/20099 (53.21%) Loss: 2.323013 LR: 0.00002095 +[12:47:09] Epoch: 1 Batch: 10696/20099 (53.22%) Loss: 2.152698 LR: 0.00002095 +[12:47:10] Epoch: 1 Batch: 10697/20099 (53.22%) Loss: 2.163315 LR: 0.00002095 +[12:47:12] Epoch: 1 Batch: 10698/20099 (53.23%) Loss: 1.933052 LR: 0.00002095 +[12:47:14] Epoch: 1 Batch: 10699/20099 (53.23%) Loss: 2.181442 LR: 0.00002093 +[12:47:16] Epoch: 1 Batch: 10700/20099 (53.24%) Loss: 1.976923 LR: 0.00002093 +[12:47:17] Epoch: 1 Batch: 10701/20099 (53.24%) Loss: 2.120158 LR: 0.00002093 +[12:47:19] Epoch: 1 Batch: 10702/20099 (53.25%) Loss: 2.115959 LR: 0.00002093 +[12:47:21] Epoch: 1 Batch: 10703/20099 (53.25%) Loss: 1.841247 LR: 0.00002093 +[12:47:23] Epoch: 1 Batch: 10704/20099 (53.26%) Loss: 2.033932 LR: 0.00002093 +[12:47:24] 
Epoch: 1 Batch: 10705/20099 (53.26%) Loss: 2.184541 LR: 0.00002093 +[12:47:26] Epoch: 1 Batch: 10706/20099 (53.27%) Loss: 1.874903 LR: 0.00002092 +[12:47:28] Epoch: 1 Batch: 10707/20099 (53.27%) Loss: 2.007438 LR: 0.00002092 +[12:47:30] Epoch: 1 Batch: 10708/20099 (53.28%) Loss: 2.101531 LR: 0.00002092 +[12:47:32] Epoch: 1 Batch: 10709/20099 (53.28%) Loss: 1.618119 LR: 0.00002092 +[12:47:33] Epoch: 1 Batch: 10710/20099 (53.29%) Loss: 2.381801 LR: 0.00002092 +[12:47:35] Epoch: 1 Batch: 10711/20099 (53.29%) Loss: 2.005429 LR: 0.00002092 +[12:47:37] Epoch: 1 Batch: 10712/20099 (53.30%) Loss: 2.273021 LR: 0.00002092 +[12:47:39] Epoch: 1 Batch: 10713/20099 (53.30%) Loss: 2.112838 LR: 0.00002090 +[12:47:40] Epoch: 1 Batch: 10714/20099 (53.31%) Loss: 2.137327 LR: 0.00002090 +[12:47:42] Epoch: 1 Batch: 10715/20099 (53.31%) Loss: 2.367983 LR: 0.00002090 +[12:47:44] Epoch: 1 Batch: 10716/20099 (53.32%) Loss: 2.020501 LR: 0.00002090 +[12:47:46] Epoch: 1 Batch: 10717/20099 (53.32%) Loss: 2.379011 LR: 0.00002090 +[12:47:47] Epoch: 1 Batch: 10718/20099 (53.33%) Loss: 2.596502 LR: 0.00002090 +[12:47:49] Epoch: 1 Batch: 10719/20099 (53.33%) Loss: 2.408472 LR: 0.00002090 +[12:47:51] Epoch: 1 Batch: 10720/20099 (53.34%) Loss: 2.328966 LR: 0.00002089 +[12:47:53] Epoch: 1 Batch: 10721/20099 (53.34%) Loss: 2.346813 LR: 0.00002089 +[12:47:55] Epoch: 1 Batch: 10722/20099 (53.35%) Loss: 2.313595 LR: 0.00002089 +[12:47:56] Epoch: 1 Batch: 10723/20099 (53.35%) Loss: 1.997594 LR: 0.00002089 +[12:47:58] Epoch: 1 Batch: 10724/20099 (53.36%) Loss: 1.845039 LR: 0.00002089 +[12:48:00] Epoch: 1 Batch: 10725/20099 (53.36%) Loss: 2.309136 LR: 0.00002089 +[12:48:02] Epoch: 1 Batch: 10726/20099 (53.37%) Loss: 2.211992 LR: 0.00002089 +[12:48:03] Epoch: 1 Batch: 10727/20099 (53.37%) Loss: 2.270927 LR: 0.00002087 +[12:48:05] Epoch: 1 Batch: 10728/20099 (53.38%) Loss: 2.228172 LR: 0.00002087 +[12:48:07] Epoch: 1 Batch: 10729/20099 (53.38%) Loss: 1.949476 LR: 0.00002087 +[12:48:09] Epoch: 1 Batch: 
10730/20099 (53.39%) Loss: 1.903700 LR: 0.00002087 +[12:48:10] Epoch: 1 Batch: 10731/20099 (53.39%) Loss: 1.821935 LR: 0.00002087 +[12:48:12] Epoch: 1 Batch: 10732/20099 (53.40%) Loss: 2.257895 LR: 0.00002087 +[12:48:14] Epoch: 1 Batch: 10733/20099 (53.40%) Loss: 1.963958 LR: 0.00002087 +[12:48:16] Epoch: 1 Batch: 10734/20099 (53.41%) Loss: 1.836389 LR: 0.00002086 +[12:48:18] Epoch: 1 Batch: 10735/20099 (53.41%) Loss: 2.030680 LR: 0.00002086 +[12:48:19] Epoch: 1 Batch: 10736/20099 (53.42%) Loss: 2.476615 LR: 0.00002086 +[12:48:21] Epoch: 1 Batch: 10737/20099 (53.42%) Loss: 2.171477 LR: 0.00002086 +[12:48:23] Epoch: 1 Batch: 10738/20099 (53.43%) Loss: 1.956544 LR: 0.00002086 +[12:48:25] Epoch: 1 Batch: 10739/20099 (53.43%) Loss: 2.374203 LR: 0.00002086 +[12:48:26] Epoch: 1 Batch: 10740/20099 (53.44%) Loss: 2.087577 LR: 0.00002086 +[12:48:28] Epoch: 1 Batch: 10741/20099 (53.44%) Loss: 1.889120 LR: 0.00002084 +[12:48:30] Epoch: 1 Batch: 10742/20099 (53.45%) Loss: 2.110517 LR: 0.00002084 +[12:48:32] Epoch: 1 Batch: 10743/20099 (53.45%) Loss: 2.400672 LR: 0.00002084 +[12:48:33] Epoch: 1 Batch: 10744/20099 (53.46%) Loss: 2.330121 LR: 0.00002084 +[12:48:35] Epoch: 1 Batch: 10745/20099 (53.46%) Loss: 1.643324 LR: 0.00002084 +[12:48:37] Epoch: 1 Batch: 10746/20099 (53.47%) Loss: 2.102662 LR: 0.00002084 +[12:48:39] Epoch: 1 Batch: 10747/20099 (53.47%) Loss: 1.811929 LR: 0.00002084 +[12:48:41] Epoch: 1 Batch: 10748/20099 (53.48%) Loss: 2.175655 LR: 0.00002082 +[12:48:42] Epoch: 1 Batch: 10749/20099 (53.48%) Loss: 2.143772 LR: 0.00002082 +[12:48:44] Epoch: 1 Batch: 10750/20099 (53.49%) Loss: 2.044207 LR: 0.00002082 +[12:48:46] Epoch: 1 Batch: 10751/20099 (53.49%) Loss: 2.135013 LR: 0.00002082 +[12:48:48] Epoch: 1 Batch: 10752/20099 (53.50%) Loss: 2.156989 LR: 0.00002082 +[12:48:49] Epoch: 1 Batch: 10753/20099 (53.50%) Loss: 2.117356 LR: 0.00002082 +[12:48:51] Epoch: 1 Batch: 10754/20099 (53.51%) Loss: 2.051813 LR: 0.00002082 +[12:48:53] Epoch: 1 Batch: 10755/20099 (53.51%) 
Loss: 2.020932 LR: 0.00002081 +[12:48:55] Epoch: 1 Batch: 10756/20099 (53.52%) Loss: 2.186322 LR: 0.00002081 +[12:48:56] Epoch: 1 Batch: 10757/20099 (53.52%) Loss: 2.331009 LR: 0.00002081 +[12:48:58] Epoch: 1 Batch: 10758/20099 (53.53%) Loss: 1.995811 LR: 0.00002081 +[12:49:00] Epoch: 1 Batch: 10759/20099 (53.53%) Loss: 2.185558 LR: 0.00002081 +[12:49:02] Epoch: 1 Batch: 10760/20099 (53.54%) Loss: 2.387489 LR: 0.00002081 +[12:49:03] Epoch: 1 Batch: 10761/20099 (53.54%) Loss: 2.165603 LR: 0.00002081 +[12:49:05] Epoch: 1 Batch: 10762/20099 (53.54%) Loss: 1.952454 LR: 0.00002079 +[12:49:07] Epoch: 1 Batch: 10763/20099 (53.55%) Loss: 2.130037 LR: 0.00002079 +[12:49:09] Epoch: 1 Batch: 10764/20099 (53.55%) Loss: 1.813900 LR: 0.00002079 +[12:49:11] Epoch: 1 Batch: 10765/20099 (53.56%) Loss: 2.413622 LR: 0.00002079 +[12:49:12] Epoch: 1 Batch: 10766/20099 (53.56%) Loss: 2.248875 LR: 0.00002079 +[12:49:14] Epoch: 1 Batch: 10767/20099 (53.57%) Loss: 1.901267 LR: 0.00002079 +[12:49:16] Epoch: 1 Batch: 10768/20099 (53.57%) Loss: 2.084355 LR: 0.00002079 +[12:49:18] Epoch: 1 Batch: 10769/20099 (53.58%) Loss: 2.017617 LR: 0.00002078 +[12:49:19] Epoch: 1 Batch: 10770/20099 (53.58%) Loss: 1.900827 LR: 0.00002078 +[12:49:21] Epoch: 1 Batch: 10771/20099 (53.59%) Loss: 2.212478 LR: 0.00002078 +[12:49:23] Epoch: 1 Batch: 10772/20099 (53.59%) Loss: 2.380015 LR: 0.00002078 +[12:49:25] Epoch: 1 Batch: 10773/20099 (53.60%) Loss: 2.048027 LR: 0.00002078 +[12:49:26] Epoch: 1 Batch: 10774/20099 (53.60%) Loss: 2.302220 LR: 0.00002078 +[12:49:28] Epoch: 1 Batch: 10775/20099 (53.61%) Loss: 2.057735 LR: 0.00002078 +[12:49:30] Epoch: 1 Batch: 10776/20099 (53.61%) Loss: 2.207933 LR: 0.00002076 +[12:49:32] Epoch: 1 Batch: 10777/20099 (53.62%) Loss: 2.253211 LR: 0.00002076 +[12:49:34] Epoch: 1 Batch: 10778/20099 (53.62%) Loss: 2.018359 LR: 0.00002076 +[12:49:35] Epoch: 1 Batch: 10779/20099 (53.63%) Loss: 1.940859 LR: 0.00002076 +[12:49:37] Epoch: 1 Batch: 10780/20099 (53.63%) Loss: 2.188012 LR: 
0.00002076 +[12:49:39] Epoch: 1 Batch: 10781/20099 (53.64%) Loss: 2.132989 LR: 0.00002076 +[12:49:41] Epoch: 1 Batch: 10782/20099 (53.64%) Loss: 1.936145 LR: 0.00002076 +[12:49:42] Epoch: 1 Batch: 10783/20099 (53.65%) Loss: 2.326366 LR: 0.00002074 +[12:49:44] Epoch: 1 Batch: 10784/20099 (53.65%) Loss: 2.318922 LR: 0.00002074 +[12:49:46] Epoch: 1 Batch: 10785/20099 (53.66%) Loss: 1.943102 LR: 0.00002074 +[12:49:48] Epoch: 1 Batch: 10786/20099 (53.66%) Loss: 2.393302 LR: 0.00002074 +[12:49:49] Epoch: 1 Batch: 10787/20099 (53.67%) Loss: 2.160732 LR: 0.00002074 +[12:49:51] Epoch: 1 Batch: 10788/20099 (53.67%) Loss: 2.270257 LR: 0.00002074 +[12:49:53] Epoch: 1 Batch: 10789/20099 (53.68%) Loss: 2.126182 LR: 0.00002074 +[12:49:55] Epoch: 1 Batch: 10790/20099 (53.68%) Loss: 2.112015 LR: 0.00002073 +[12:49:57] Epoch: 1 Batch: 10791/20099 (53.69%) Loss: 2.461033 LR: 0.00002073 +[12:49:58] Epoch: 1 Batch: 10792/20099 (53.69%) Loss: 2.194836 LR: 0.00002073 +[12:50:00] Epoch: 1 Batch: 10793/20099 (53.70%) Loss: 1.873349 LR: 0.00002073 +[12:50:02] Epoch: 1 Batch: 10794/20099 (53.70%) Loss: 2.039122 LR: 0.00002073 +[12:50:04] Epoch: 1 Batch: 10795/20099 (53.71%) Loss: 2.284618 LR: 0.00002073 +[12:50:05] Epoch: 1 Batch: 10796/20099 (53.71%) Loss: 2.086792 LR: 0.00002073 +[12:50:07] Epoch: 1 Batch: 10797/20099 (53.72%) Loss: 1.961278 LR: 0.00002071 +[12:50:09] Epoch: 1 Batch: 10798/20099 (53.72%) Loss: 2.005591 LR: 0.00002071 +[12:50:11] Epoch: 1 Batch: 10799/20099 (53.73%) Loss: 1.929565 LR: 0.00002071 +[12:50:16] >> Cleaned up old temp checkpoint: epoch1_step8800 +[12:50:16] >> Temp checkpoint saved: epoch1_step10800, size: 0.1693 GB +[12:50:16] Epoch: 1 Batch: 10800/20099 (53.73%) Loss: 1.819307 LR: 0.00002071 +[12:50:18] Epoch: 1 Batch: 10801/20099 (53.74%) Loss: 1.821550 LR: 0.00002071 +[12:50:20] Epoch: 1 Batch: 10802/20099 (53.74%) Loss: 2.019773 LR: 0.00002071 +[12:50:21] Epoch: 1 Batch: 10803/20099 (53.75%) Loss: 1.947206 LR: 0.00002071 +[12:50:23] Epoch: 1 Batch: 
10804/20099 (53.75%) Loss: 2.187789 LR: 0.00002070 +[12:50:25] Epoch: 1 Batch: 10805/20099 (53.76%) Loss: 2.151719 LR: 0.00002070 +[12:50:27] Epoch: 1 Batch: 10806/20099 (53.76%) Loss: 2.017769 LR: 0.00002070 +[12:50:28] Epoch: 1 Batch: 10807/20099 (53.77%) Loss: 2.210691 LR: 0.00002070 +[12:50:30] Epoch: 1 Batch: 10808/20099 (53.77%) Loss: 2.168642 LR: 0.00002070 +[12:50:32] Epoch: 1 Batch: 10809/20099 (53.78%) Loss: 2.335950 LR: 0.00002070 +[12:50:34] Epoch: 1 Batch: 10810/20099 (53.78%) Loss: 1.898599 LR: 0.00002070 +[12:50:36] Epoch: 1 Batch: 10811/20099 (53.79%) Loss: 2.110321 LR: 0.00002068 +[12:50:37] Epoch: 1 Batch: 10812/20099 (53.79%) Loss: 1.691694 LR: 0.00002068 +[12:50:39] Epoch: 1 Batch: 10813/20099 (53.80%) Loss: 2.041438 LR: 0.00002068 +[12:50:41] Epoch: 1 Batch: 10814/20099 (53.80%) Loss: 2.167322 LR: 0.00002068 +[12:50:43] Epoch: 1 Batch: 10815/20099 (53.81%) Loss: 2.219054 LR: 0.00002068 +[12:50:44] Epoch: 1 Batch: 10816/20099 (53.81%) Loss: 1.837562 LR: 0.00002068 +[12:50:46] Epoch: 1 Batch: 10817/20099 (53.82%) Loss: 2.027601 LR: 0.00002068 +[12:50:48] Epoch: 1 Batch: 10818/20099 (53.82%) Loss: 2.234107 LR: 0.00002067 +[12:50:50] Epoch: 1 Batch: 10819/20099 (53.83%) Loss: 2.177175 LR: 0.00002067 +[12:50:52] Epoch: 1 Batch: 10820/20099 (53.83%) Loss: 2.175289 LR: 0.00002067 +[12:50:53] Epoch: 1 Batch: 10821/20099 (53.84%) Loss: 1.905036 LR: 0.00002067 +[12:50:55] Epoch: 1 Batch: 10822/20099 (53.84%) Loss: 2.244673 LR: 0.00002067 +[12:50:57] Epoch: 1 Batch: 10823/20099 (53.85%) Loss: 2.343349 LR: 0.00002067 +[12:50:59] Epoch: 1 Batch: 10824/20099 (53.85%) Loss: 2.195063 LR: 0.00002067 +[12:51:01] Epoch: 1 Batch: 10825/20099 (53.86%) Loss: 1.869355 LR: 0.00002065 +[12:51:02] Epoch: 1 Batch: 10826/20099 (53.86%) Loss: 2.035278 LR: 0.00002065 +[12:51:04] Epoch: 1 Batch: 10827/20099 (53.87%) Loss: 2.130636 LR: 0.00002065 +[12:51:06] Epoch: 1 Batch: 10828/20099 (53.87%) Loss: 2.308060 LR: 0.00002065 +[12:51:08] Epoch: 1 Batch: 10829/20099 (53.88%) 
Loss: 2.007224 LR: 0.00002065 +[12:51:10] Epoch: 1 Batch: 10830/20099 (53.88%) Loss: 2.131921 LR: 0.00002065 +[12:51:11] Epoch: 1 Batch: 10831/20099 (53.89%) Loss: 1.977515 LR: 0.00002065 +[12:51:13] Epoch: 1 Batch: 10832/20099 (53.89%) Loss: 2.120875 LR: 0.00002063 +[12:51:15] Epoch: 1 Batch: 10833/20099 (53.90%) Loss: 1.940558 LR: 0.00002063 +[12:51:17] Epoch: 1 Batch: 10834/20099 (53.90%) Loss: 2.297158 LR: 0.00002063 +[12:51:18] Epoch: 1 Batch: 10835/20099 (53.91%) Loss: 2.140698 LR: 0.00002063 +[12:51:20] Epoch: 1 Batch: 10836/20099 (53.91%) Loss: 2.169636 LR: 0.00002063 +[12:51:22] Epoch: 1 Batch: 10837/20099 (53.92%) Loss: 2.180158 LR: 0.00002063 +[12:51:24] Epoch: 1 Batch: 10838/20099 (53.92%) Loss: 1.755092 LR: 0.00002063 +[12:51:26] Epoch: 1 Batch: 10839/20099 (53.93%) Loss: 2.131763 LR: 0.00002062 +[12:51:27] Epoch: 1 Batch: 10840/20099 (53.93%) Loss: 2.110748 LR: 0.00002062 +[12:51:29] Epoch: 1 Batch: 10841/20099 (53.94%) Loss: 1.852887 LR: 0.00002062 +[12:51:31] Epoch: 1 Batch: 10842/20099 (53.94%) Loss: 1.944155 LR: 0.00002062 +[12:51:33] Epoch: 1 Batch: 10843/20099 (53.95%) Loss: 2.325693 LR: 0.00002062 +[12:51:34] Epoch: 1 Batch: 10844/20099 (53.95%) Loss: 2.081689 LR: 0.00002062 +[12:51:36] Epoch: 1 Batch: 10845/20099 (53.96%) Loss: 2.141023 LR: 0.00002062 +[12:51:38] Epoch: 1 Batch: 10846/20099 (53.96%) Loss: 2.120570 LR: 0.00002060 +[12:51:40] Epoch: 1 Batch: 10847/20099 (53.97%) Loss: 2.159397 LR: 0.00002060 +[12:51:41] Epoch: 1 Batch: 10848/20099 (53.97%) Loss: 2.091609 LR: 0.00002060 +[12:51:43] Epoch: 1 Batch: 10849/20099 (53.98%) Loss: 2.384986 LR: 0.00002060 +[12:51:45] Epoch: 1 Batch: 10850/20099 (53.98%) Loss: 1.948924 LR: 0.00002060 +[12:51:47] Epoch: 1 Batch: 10851/20099 (53.99%) Loss: 2.267084 LR: 0.00002060 +[12:51:48] Epoch: 1 Batch: 10852/20099 (53.99%) Loss: 1.990253 LR: 0.00002060 +[12:51:50] Epoch: 1 Batch: 10853/20099 (54.00%) Loss: 2.027387 LR: 0.00002059 +[12:51:52] Epoch: 1 Batch: 10854/20099 (54.00%) Loss: 2.333314 LR: 
0.00002059 +[12:51:54] Epoch: 1 Batch: 10855/20099 (54.01%) Loss: 2.013413 LR: 0.00002059 +[12:51:55] Epoch: 1 Batch: 10856/20099 (54.01%) Loss: 2.113566 LR: 0.00002059 +[12:51:57] Epoch: 1 Batch: 10857/20099 (54.02%) Loss: 2.074779 LR: 0.00002059 +[12:51:59] Epoch: 1 Batch: 10858/20099 (54.02%) Loss: 2.128145 LR: 0.00002059 +[12:52:01] Epoch: 1 Batch: 10859/20099 (54.03%) Loss: 2.240045 LR: 0.00002059 +[12:52:03] Epoch: 1 Batch: 10860/20099 (54.03%) Loss: 2.349508 LR: 0.00002057 +[12:52:04] Epoch: 1 Batch: 10861/20099 (54.04%) Loss: 1.896391 LR: 0.00002057 +[12:52:06] Epoch: 1 Batch: 10862/20099 (54.04%) Loss: 2.019345 LR: 0.00002057 +[12:52:08] Epoch: 1 Batch: 10863/20099 (54.05%) Loss: 2.171817 LR: 0.00002057 +[12:52:10] Epoch: 1 Batch: 10864/20099 (54.05%) Loss: 2.012825 LR: 0.00002057 +[12:52:11] Epoch: 1 Batch: 10865/20099 (54.06%) Loss: 2.094913 LR: 0.00002057 +[12:52:13] Epoch: 1 Batch: 10866/20099 (54.06%) Loss: 2.136393 LR: 0.00002057 +[12:52:15] Epoch: 1 Batch: 10867/20099 (54.07%) Loss: 2.088082 LR: 0.00002055 +[12:52:17] Epoch: 1 Batch: 10868/20099 (54.07%) Loss: 2.078241 LR: 0.00002055 +[12:52:19] Epoch: 1 Batch: 10869/20099 (54.08%) Loss: 2.215929 LR: 0.00002055 +[12:52:20] Epoch: 1 Batch: 10870/20099 (54.08%) Loss: 1.865729 LR: 0.00002055 +[12:52:22] Epoch: 1 Batch: 10871/20099 (54.09%) Loss: 2.041482 LR: 0.00002055 +[12:52:24] Epoch: 1 Batch: 10872/20099 (54.09%) Loss: 1.845818 LR: 0.00002055 +[12:52:26] Epoch: 1 Batch: 10873/20099 (54.10%) Loss: 2.021316 LR: 0.00002055 +[12:52:27] Epoch: 1 Batch: 10874/20099 (54.10%) Loss: 2.199267 LR: 0.00002054 +[12:52:29] Epoch: 1 Batch: 10875/20099 (54.11%) Loss: 2.328201 LR: 0.00002054 +[12:52:31] Epoch: 1 Batch: 10876/20099 (54.11%) Loss: 2.651031 LR: 0.00002054 +[12:52:33] Epoch: 1 Batch: 10877/20099 (54.12%) Loss: 2.141410 LR: 0.00002054 +[12:52:35] Epoch: 1 Batch: 10878/20099 (54.12%) Loss: 2.127559 LR: 0.00002054 +[12:52:36] Epoch: 1 Batch: 10879/20099 (54.13%) Loss: 2.148687 LR: 0.00002054 +[12:52:38] 
Epoch: 1 Batch: 10880/20099 (54.13%) Loss: 2.084854 LR: 0.00002054 +[12:52:40] Epoch: 1 Batch: 10881/20099 (54.14%) Loss: 2.224936 LR: 0.00002052 +[12:52:42] Epoch: 1 Batch: 10882/20099 (54.14%) Loss: 2.264191 LR: 0.00002052 +[12:52:44] Epoch: 1 Batch: 10883/20099 (54.15%) Loss: 2.153655 LR: 0.00002052 +[12:52:45] Epoch: 1 Batch: 10884/20099 (54.15%) Loss: 1.872532 LR: 0.00002052 +[12:52:47] Epoch: 1 Batch: 10885/20099 (54.16%) Loss: 2.124489 LR: 0.00002052 +[12:52:49] Epoch: 1 Batch: 10886/20099 (54.16%) Loss: 2.109140 LR: 0.00002052 +[12:52:51] Epoch: 1 Batch: 10887/20099 (54.17%) Loss: 2.249701 LR: 0.00002052 +[12:52:52] Epoch: 1 Batch: 10888/20099 (54.17%) Loss: 1.965649 LR: 0.00002051 +[12:52:54] Epoch: 1 Batch: 10889/20099 (54.18%) Loss: 2.166732 LR: 0.00002051 +[12:52:56] Epoch: 1 Batch: 10890/20099 (54.18%) Loss: 2.067620 LR: 0.00002051 +[12:52:58] Epoch: 1 Batch: 10891/20099 (54.19%) Loss: 2.222866 LR: 0.00002051 +[12:53:00] Epoch: 1 Batch: 10892/20099 (54.19%) Loss: 2.466458 LR: 0.00002051 +[12:53:01] Epoch: 1 Batch: 10893/20099 (54.20%) Loss: 2.396294 LR: 0.00002051 +[12:53:03] Epoch: 1 Batch: 10894/20099 (54.20%) Loss: 2.106723 LR: 0.00002051 +[12:53:05] Epoch: 1 Batch: 10895/20099 (54.21%) Loss: 2.147239 LR: 0.00002049 +[12:53:07] Epoch: 1 Batch: 10896/20099 (54.21%) Loss: 1.933804 LR: 0.00002049 +[12:53:08] Epoch: 1 Batch: 10897/20099 (54.22%) Loss: 2.118937 LR: 0.00002049 +[12:53:10] Epoch: 1 Batch: 10898/20099 (54.22%) Loss: 2.151721 LR: 0.00002049 +[12:53:12] Epoch: 1 Batch: 10899/20099 (54.23%) Loss: 1.964587 LR: 0.00002049 +[12:53:14] Epoch: 1 Batch: 10900/20099 (54.23%) Loss: 1.531577 LR: 0.00002049 +[12:53:16] Epoch: 1 Batch: 10901/20099 (54.24%) Loss: 1.969033 LR: 0.00002049 +[12:53:17] Epoch: 1 Batch: 10902/20099 (54.24%) Loss: 2.265129 LR: 0.00002048 +[12:53:19] Epoch: 1 Batch: 10903/20099 (54.25%) Loss: 2.118177 LR: 0.00002048 +[12:53:21] Epoch: 1 Batch: 10904/20099 (54.25%) Loss: 2.092270 LR: 0.00002048 +[12:53:23] Epoch: 1 Batch: 
10905/20099 (54.26%) Loss: 2.032341 LR: 0.00002048 +[12:53:24] Epoch: 1 Batch: 10906/20099 (54.26%) Loss: 2.141401 LR: 0.00002048 +[12:53:26] Epoch: 1 Batch: 10907/20099 (54.27%) Loss: 2.235370 LR: 0.00002048 +[12:53:28] Epoch: 1 Batch: 10908/20099 (54.27%) Loss: 2.304580 LR: 0.00002048 +[12:53:30] Epoch: 1 Batch: 10909/20099 (54.28%) Loss: 2.143189 LR: 0.00002046 +[12:53:32] Epoch: 1 Batch: 10910/20099 (54.28%) Loss: 2.042841 LR: 0.00002046 +[12:53:33] Epoch: 1 Batch: 10911/20099 (54.29%) Loss: 2.195048 LR: 0.00002046 +[12:53:35] Epoch: 1 Batch: 10912/20099 (54.29%) Loss: 2.178170 LR: 0.00002046 +[12:53:37] Epoch: 1 Batch: 10913/20099 (54.30%) Loss: 2.033402 LR: 0.00002046 +[12:53:39] Epoch: 1 Batch: 10914/20099 (54.30%) Loss: 2.036011 LR: 0.00002046 +[12:53:40] Epoch: 1 Batch: 10915/20099 (54.31%) Loss: 2.324113 LR: 0.00002046 +[12:53:42] Epoch: 1 Batch: 10916/20099 (54.31%) Loss: 1.816749 LR: 0.00002044 +[12:53:44] Epoch: 1 Batch: 10917/20099 (54.32%) Loss: 2.272579 LR: 0.00002044 +[12:53:46] Epoch: 1 Batch: 10918/20099 (54.32%) Loss: 1.984329 LR: 0.00002044 +[12:53:48] Epoch: 1 Batch: 10919/20099 (54.33%) Loss: 1.858022 LR: 0.00002044 +[12:53:49] Epoch: 1 Batch: 10920/20099 (54.33%) Loss: 2.263437 LR: 0.00002044 +[12:53:51] Epoch: 1 Batch: 10921/20099 (54.34%) Loss: 2.109052 LR: 0.00002044 +[12:53:53] Epoch: 1 Batch: 10922/20099 (54.34%) Loss: 2.086841 LR: 0.00002044 +[12:53:55] Epoch: 1 Batch: 10923/20099 (54.35%) Loss: 1.939013 LR: 0.00002043 +[12:53:56] Epoch: 1 Batch: 10924/20099 (54.35%) Loss: 2.228199 LR: 0.00002043 +[12:53:58] Epoch: 1 Batch: 10925/20099 (54.36%) Loss: 1.914180 LR: 0.00002043 +[12:54:00] Epoch: 1 Batch: 10926/20099 (54.36%) Loss: 1.675138 LR: 0.00002043 +[12:54:02] Epoch: 1 Batch: 10927/20099 (54.37%) Loss: 1.973565 LR: 0.00002043 +[12:54:04] Epoch: 1 Batch: 10928/20099 (54.37%) Loss: 2.030506 LR: 0.00002043 +[12:54:05] Epoch: 1 Batch: 10929/20099 (54.38%) Loss: 2.205430 LR: 0.00002043 +[12:54:07] Epoch: 1 Batch: 10930/20099 (54.38%) 
Loss: 2.018112 LR: 0.00002041 +[12:54:09] Epoch: 1 Batch: 10931/20099 (54.39%) Loss: 1.843793 LR: 0.00002041 +[12:54:11] Epoch: 1 Batch: 10932/20099 (54.39%) Loss: 1.886478 LR: 0.00002041 +[12:54:12] Epoch: 1 Batch: 10933/20099 (54.40%) Loss: 1.724491 LR: 0.00002041 +[12:54:14] Epoch: 1 Batch: 10934/20099 (54.40%) Loss: 2.348602 LR: 0.00002041 +[12:54:16] Epoch: 1 Batch: 10935/20099 (54.41%) Loss: 2.241485 LR: 0.00002041 +[12:54:18] Epoch: 1 Batch: 10936/20099 (54.41%) Loss: 2.258204 LR: 0.00002041 +[12:54:19] Epoch: 1 Batch: 10937/20099 (54.42%) Loss: 2.001014 LR: 0.00002040 +[12:54:21] Epoch: 1 Batch: 10938/20099 (54.42%) Loss: 1.879351 LR: 0.00002040 +[12:54:23] Epoch: 1 Batch: 10939/20099 (54.43%) Loss: 2.183980 LR: 0.00002040 +[12:54:25] Epoch: 1 Batch: 10940/20099 (54.43%) Loss: 2.391249 LR: 0.00002040 +[12:54:26] Epoch: 1 Batch: 10941/20099 (54.44%) Loss: 2.181504 LR: 0.00002040 +[12:54:28] Epoch: 1 Batch: 10942/20099 (54.44%) Loss: 2.069286 LR: 0.00002040 +[12:54:30] Epoch: 1 Batch: 10943/20099 (54.45%) Loss: 2.399700 LR: 0.00002040 +[12:54:32] Epoch: 1 Batch: 10944/20099 (54.45%) Loss: 1.727585 LR: 0.00002038 +[12:54:34] Epoch: 1 Batch: 10945/20099 (54.46%) Loss: 2.166899 LR: 0.00002038 +[12:54:35] Epoch: 1 Batch: 10946/20099 (54.46%) Loss: 1.769444 LR: 0.00002038 +[12:54:37] Epoch: 1 Batch: 10947/20099 (54.47%) Loss: 2.151026 LR: 0.00002038 +[12:54:39] Epoch: 1 Batch: 10948/20099 (54.47%) Loss: 1.948795 LR: 0.00002038 +[12:54:41] Epoch: 1 Batch: 10949/20099 (54.48%) Loss: 2.120847 LR: 0.00002038 +[12:54:42] Epoch: 1 Batch: 10950/20099 (54.48%) Loss: 1.909524 LR: 0.00002038 +[12:54:44] Epoch: 1 Batch: 10951/20099 (54.49%) Loss: 2.077949 LR: 0.00002036 +[12:54:46] Epoch: 1 Batch: 10952/20099 (54.49%) Loss: 2.279987 LR: 0.00002036 +[12:54:48] Epoch: 1 Batch: 10953/20099 (54.50%) Loss: 2.042197 LR: 0.00002036 +[12:54:49] Epoch: 1 Batch: 10954/20099 (54.50%) Loss: 1.842382 LR: 0.00002036 +[12:54:51] Epoch: 1 Batch: 10955/20099 (54.51%) Loss: 2.163868 LR: 
0.00002036 +[12:54:53] Epoch: 1 Batch: 10956/20099 (54.51%) Loss: 2.246133 LR: 0.00002036 +[12:54:55] Epoch: 1 Batch: 10957/20099 (54.52%) Loss: 2.158983 LR: 0.00002036 +[12:54:56] Epoch: 1 Batch: 10958/20099 (54.52%) Loss: 2.022065 LR: 0.00002035 +[12:54:58] Epoch: 1 Batch: 10959/20099 (54.53%) Loss: 1.863288 LR: 0.00002035 +[12:55:00] Epoch: 1 Batch: 10960/20099 (54.53%) Loss: 2.299350 LR: 0.00002035 +[12:55:02] Epoch: 1 Batch: 10961/20099 (54.54%) Loss: 2.102475 LR: 0.00002035 +[12:55:03] Epoch: 1 Batch: 10962/20099 (54.54%) Loss: 1.928282 LR: 0.00002035 +[12:55:05] Epoch: 1 Batch: 10963/20099 (54.55%) Loss: 2.067193 LR: 0.00002035 +[12:55:07] Epoch: 1 Batch: 10964/20099 (54.55%) Loss: 2.089906 LR: 0.00002035 +[12:55:09] Epoch: 1 Batch: 10965/20099 (54.55%) Loss: 2.396143 LR: 0.00002033 +[12:55:11] Epoch: 1 Batch: 10966/20099 (54.56%) Loss: 1.918280 LR: 0.00002033 +[12:55:12] Epoch: 1 Batch: 10967/20099 (54.56%) Loss: 1.938540 LR: 0.00002033 +[12:55:14] Epoch: 1 Batch: 10968/20099 (54.57%) Loss: 1.981141 LR: 0.00002033 +[12:55:16] Epoch: 1 Batch: 10969/20099 (54.57%) Loss: 2.198460 LR: 0.00002033 +[12:55:18] Epoch: 1 Batch: 10970/20099 (54.58%) Loss: 2.246327 LR: 0.00002033 +[12:55:19] Epoch: 1 Batch: 10971/20099 (54.58%) Loss: 2.056145 LR: 0.00002033 +[12:55:21] Epoch: 1 Batch: 10972/20099 (54.59%) Loss: 2.054509 LR: 0.00002032 +[12:55:23] Epoch: 1 Batch: 10973/20099 (54.59%) Loss: 1.984092 LR: 0.00002032 +[12:55:25] Epoch: 1 Batch: 10974/20099 (54.60%) Loss: 2.310192 LR: 0.00002032 +[12:55:26] Epoch: 1 Batch: 10975/20099 (54.60%) Loss: 2.190665 LR: 0.00002032 +[12:55:28] Epoch: 1 Batch: 10976/20099 (54.61%) Loss: 2.027637 LR: 0.00002032 +[12:55:30] Epoch: 1 Batch: 10977/20099 (54.61%) Loss: 2.070393 LR: 0.00002032 +[12:55:32] Epoch: 1 Batch: 10978/20099 (54.62%) Loss: 2.377033 LR: 0.00002032 +[12:55:34] Epoch: 1 Batch: 10979/20099 (54.62%) Loss: 2.145426 LR: 0.00002030 +[12:55:35] Epoch: 1 Batch: 10980/20099 (54.63%) Loss: 2.078538 LR: 0.00002030 +[12:55:37] 
Epoch: 1 Batch: 10981/20099 (54.63%) Loss: 1.838984 LR: 0.00002030 +[12:55:39] Epoch: 1 Batch: 10982/20099 (54.64%) Loss: 2.122934 LR: 0.00002030 +[12:55:41] Epoch: 1 Batch: 10983/20099 (54.64%) Loss: 1.911107 LR: 0.00002030 +[12:55:42] Epoch: 1 Batch: 10984/20099 (54.65%) Loss: 2.082071 LR: 0.00002030 +[12:55:44] Epoch: 1 Batch: 10985/20099 (54.65%) Loss: 2.021607 LR: 0.00002030 +[12:55:46] Epoch: 1 Batch: 10986/20099 (54.66%) Loss: 2.545644 LR: 0.00002028 +[12:55:48] Epoch: 1 Batch: 10987/20099 (54.66%) Loss: 2.058690 LR: 0.00002028 +[12:55:49] Epoch: 1 Batch: 10988/20099 (54.67%) Loss: 2.463280 LR: 0.00002028 +[12:55:51] Epoch: 1 Batch: 10989/20099 (54.67%) Loss: 1.885264 LR: 0.00002028 +[12:55:53] Epoch: 1 Batch: 10990/20099 (54.68%) Loss: 2.087889 LR: 0.00002028 +[12:55:55] Epoch: 1 Batch: 10991/20099 (54.68%) Loss: 2.044692 LR: 0.00002028 +[12:55:57] Epoch: 1 Batch: 10992/20099 (54.69%) Loss: 1.828521 LR: 0.00002028 +[12:55:58] Epoch: 1 Batch: 10993/20099 (54.69%) Loss: 2.448980 LR: 0.00002027 +[12:56:00] Epoch: 1 Batch: 10994/20099 (54.70%) Loss: 2.105200 LR: 0.00002027 +[12:56:02] Epoch: 1 Batch: 10995/20099 (54.70%) Loss: 2.133803 LR: 0.00002027 +[12:56:04] Epoch: 1 Batch: 10996/20099 (54.71%) Loss: 2.262781 LR: 0.00002027 +[12:56:05] Epoch: 1 Batch: 10997/20099 (54.71%) Loss: 2.354441 LR: 0.00002027 +[12:56:07] Epoch: 1 Batch: 10998/20099 (54.72%) Loss: 2.009531 LR: 0.00002027 +[12:56:09] Epoch: 1 Batch: 10999/20099 (54.72%) Loss: 1.820278 LR: 0.00002027 +[12:56:11] >> Evaluating batch 0 +[12:56:12] >> Evaluating batch 1 +[12:56:13] >> Evaluating batch 2 +[12:56:14] >> Evaluating batch 3 +[12:56:15] >> Evaluating batch 4 +[12:56:16] >> Evaluating batch 5 +[12:56:17] >> Evaluating batch 6 +[12:56:18] >> Evaluating batch 7 +[12:56:19] >> Evaluating batch 8 +[12:56:20] >> Evaluating batch 9 +[12:56:21] >> Evaluating batch 10 +[12:56:22] >> Evaluating batch 11 +[12:56:23] >> Evaluating batch 12 +[12:56:24] >> Evaluating batch 13 +[12:56:25] >> Evaluating 
batch 14 +[12:56:26] >> Evaluating batch 15 +[12:56:27] >> Evaluating batch 16 +[12:56:27] Epoch: 1 Step: 11000/20099 Evaluation: +[12:56:27] Avg Loss Since Last Eval: 2.1042 Val Loss: 2.1628 Validation loss delta: -0.0038 Perplexity: 8.6958 LR: 0.00002025 +[12:56:31] >> Cleaned up old temp checkpoint: epoch1_step9000 +[12:56:31] >> Temp checkpoint saved: epoch1_step11000, size: 0.1693 GB +[12:56:34] >> Checkpoint saved: epoch1_step11000, size: 0.1693 GB +[12:56:34] Epoch: 1 Batch: 11000/20099 (54.73%) Loss: 2.003237 LR: 0.00002025 +[12:56:36] Epoch: 1 Batch: 11001/20099 (54.73%) Loss: 2.308950 LR: 0.00002025 +[12:56:38] Epoch: 1 Batch: 11002/20099 (54.74%) Loss: 2.340276 LR: 0.00002025 +[12:56:39] Epoch: 1 Batch: 11003/20099 (54.74%) Loss: 2.191643 LR: 0.00002025 +[12:56:41] Epoch: 1 Batch: 11004/20099 (54.75%) Loss: 2.259829 LR: 0.00002025 +[12:56:43] Epoch: 1 Batch: 11005/20099 (54.75%) Loss: 2.240568 LR: 0.00002025 +[12:56:45] Epoch: 1 Batch: 11006/20099 (54.76%) Loss: 2.257064 LR: 0.00002025 +[12:56:46] Epoch: 1 Batch: 11007/20099 (54.76%) Loss: 1.995762 LR: 0.00002024 +[12:56:48] Epoch: 1 Batch: 11008/20099 (54.77%) Loss: 2.042869 LR: 0.00002024 +[12:56:50] Epoch: 1 Batch: 11009/20099 (54.77%) Loss: 2.224186 LR: 0.00002024 +[12:56:52] Epoch: 1 Batch: 11010/20099 (54.78%) Loss: 2.002099 LR: 0.00002024 +[12:56:54] Epoch: 1 Batch: 11011/20099 (54.78%) Loss: 2.174895 LR: 0.00002024 +[12:56:55] Epoch: 1 Batch: 11012/20099 (54.79%) Loss: 2.068383 LR: 0.00002024 +[12:56:57] Epoch: 1 Batch: 11013/20099 (54.79%) Loss: 2.219602 LR: 0.00002024 +[12:56:59] Epoch: 1 Batch: 11014/20099 (54.80%) Loss: 1.975746 LR: 0.00002022 +[12:57:01] Epoch: 1 Batch: 11015/20099 (54.80%) Loss: 1.967935 LR: 0.00002022 +[12:57:03] Epoch: 1 Batch: 11016/20099 (54.81%) Loss: 2.617079 LR: 0.00002022 +[12:57:05] Epoch: 1 Batch: 11017/20099 (54.81%) Loss: 2.042877 LR: 0.00002022 +[12:57:07] Epoch: 1 Batch: 11018/20099 (54.82%) Loss: 2.182997 LR: 0.00002022 +[12:57:08] Epoch: 1 Batch: 
11019/20099 (54.82%) Loss: 2.382670 LR: 0.00002022 +[12:57:10] Epoch: 1 Batch: 11020/20099 (54.83%) Loss: 1.951826 LR: 0.00002022 +[12:57:12] Epoch: 1 Batch: 11021/20099 (54.83%) Loss: 2.009771 LR: 0.00002020 +[12:57:14] Epoch: 1 Batch: 11022/20099 (54.84%) Loss: 2.014898 LR: 0.00002020 +[12:57:16] Epoch: 1 Batch: 11023/20099 (54.84%) Loss: 2.069910 LR: 0.00002020 +[12:57:17] Epoch: 1 Batch: 11024/20099 (54.85%) Loss: 2.233220 LR: 0.00002020 +[12:57:19] Epoch: 1 Batch: 11025/20099 (54.85%) Loss: 2.269468 LR: 0.00002020 +[12:57:21] Epoch: 1 Batch: 11026/20099 (54.86%) Loss: 1.951767 LR: 0.00002020 +[12:57:23] Epoch: 1 Batch: 11027/20099 (54.86%) Loss: 2.151754 LR: 0.00002020 +[12:57:25] Epoch: 1 Batch: 11028/20099 (54.87%) Loss: 1.900542 LR: 0.00002019 +[12:57:26] Epoch: 1 Batch: 11029/20099 (54.87%) Loss: 2.391329 LR: 0.00002019 +[12:57:28] Epoch: 1 Batch: 11030/20099 (54.88%) Loss: 2.099194 LR: 0.00002019 +[12:57:30] Epoch: 1 Batch: 11031/20099 (54.88%) Loss: 2.077719 LR: 0.00002019 +[12:57:32] Epoch: 1 Batch: 11032/20099 (54.89%) Loss: 2.219657 LR: 0.00002019 +[12:57:33] Epoch: 1 Batch: 11033/20099 (54.89%) Loss: 2.145596 LR: 0.00002019 +[12:57:35] Epoch: 1 Batch: 11034/20099 (54.90%) Loss: 2.373609 LR: 0.00002019 +[12:57:37] Epoch: 1 Batch: 11035/20099 (54.90%) Loss: 2.361026 LR: 0.00002017 +[12:57:39] Epoch: 1 Batch: 11036/20099 (54.91%) Loss: 2.190966 LR: 0.00002017 +[12:57:40] Epoch: 1 Batch: 11037/20099 (54.91%) Loss: 2.200602 LR: 0.00002017 +[12:57:42] Epoch: 1 Batch: 11038/20099 (54.92%) Loss: 2.111797 LR: 0.00002017 +[12:57:44] Epoch: 1 Batch: 11039/20099 (54.92%) Loss: 2.082245 LR: 0.00002017 +[12:57:46] Epoch: 1 Batch: 11040/20099 (54.93%) Loss: 2.305699 LR: 0.00002017 +[12:57:47] Epoch: 1 Batch: 11041/20099 (54.93%) Loss: 1.972590 LR: 0.00002017 +[12:57:49] Epoch: 1 Batch: 11042/20099 (54.94%) Loss: 2.228224 LR: 0.00002016 +[12:57:51] Epoch: 1 Batch: 11043/20099 (54.94%) Loss: 2.070248 LR: 0.00002016 +[12:57:53] Epoch: 1 Batch: 11044/20099 (54.95%) 
Loss: 1.884289 LR: 0.00002016 +[12:57:54] Epoch: 1 Batch: 11045/20099 (54.95%) Loss: 2.221781 LR: 0.00002016 +[12:57:56] Epoch: 1 Batch: 11046/20099 (54.96%) Loss: 2.121635 LR: 0.00002016 +[12:57:58] Epoch: 1 Batch: 11047/20099 (54.96%) Loss: 1.905028 LR: 0.00002016 +[12:58:00] Epoch: 1 Batch: 11048/20099 (54.97%) Loss: 2.205162 LR: 0.00002016 +[12:58:01] Epoch: 1 Batch: 11049/20099 (54.97%) Loss: 2.015265 LR: 0.00002014 +[12:58:03] Epoch: 1 Batch: 11050/20099 (54.98%) Loss: 2.093370 LR: 0.00002014 +[12:58:05] Epoch: 1 Batch: 11051/20099 (54.98%) Loss: 1.964112 LR: 0.00002014 +[12:58:07] Epoch: 1 Batch: 11052/20099 (54.99%) Loss: 2.073983 LR: 0.00002014 +[12:58:08] Epoch: 1 Batch: 11053/20099 (54.99%) Loss: 2.235200 LR: 0.00002014 +[12:58:10] Epoch: 1 Batch: 11054/20099 (55.00%) Loss: 2.126853 LR: 0.00002014 +[12:58:12] Epoch: 1 Batch: 11055/20099 (55.00%) Loss: 1.940715 LR: 0.00002014 +[12:58:14] Epoch: 1 Batch: 11056/20099 (55.01%) Loss: 2.302220 LR: 0.00002012 +[12:58:15] Epoch: 1 Batch: 11057/20099 (55.01%) Loss: 2.257996 LR: 0.00002012 +[12:58:17] Epoch: 1 Batch: 11058/20099 (55.02%) Loss: 2.036028 LR: 0.00002012 +[12:58:19] Epoch: 1 Batch: 11059/20099 (55.02%) Loss: 2.094869 LR: 0.00002012 +[12:58:21] Epoch: 1 Batch: 11060/20099 (55.03%) Loss: 2.028501 LR: 0.00002012 +[12:58:23] Epoch: 1 Batch: 11061/20099 (55.03%) Loss: 2.153646 LR: 0.00002012 +[12:58:24] Epoch: 1 Batch: 11062/20099 (55.04%) Loss: 2.220414 LR: 0.00002012 +[12:58:26] Epoch: 1 Batch: 11063/20099 (55.04%) Loss: 1.973165 LR: 0.00002011 +[12:58:28] Epoch: 1 Batch: 11064/20099 (55.05%) Loss: 2.121934 LR: 0.00002011 +[12:58:30] Epoch: 1 Batch: 11065/20099 (55.05%) Loss: 2.138520 LR: 0.00002011 +[12:58:32] Epoch: 1 Batch: 11066/20099 (55.06%) Loss: 1.816774 LR: 0.00002011 +[12:58:33] Epoch: 1 Batch: 11067/20099 (55.06%) Loss: 2.173776 LR: 0.00002011 +[12:58:35] Epoch: 1 Batch: 11068/20099 (55.07%) Loss: 2.165769 LR: 0.00002011 +[12:58:37] Epoch: 1 Batch: 11069/20099 (55.07%) Loss: 1.904986 LR: 
0.00002011 +[12:58:39] Epoch: 1 Batch: 11070/20099 (55.08%) Loss: 1.904932 LR: 0.00002009 +[12:58:40] Epoch: 1 Batch: 11071/20099 (55.08%) Loss: 2.408404 LR: 0.00002009 +[12:58:42] Epoch: 1 Batch: 11072/20099 (55.09%) Loss: 2.205010 LR: 0.00002009 +[12:58:44] Epoch: 1 Batch: 11073/20099 (55.09%) Loss: 2.246984 LR: 0.00002009 +[12:58:46] Epoch: 1 Batch: 11074/20099 (55.10%) Loss: 2.066077 LR: 0.00002009 +[12:58:48] Epoch: 1 Batch: 11075/20099 (55.10%) Loss: 1.998656 LR: 0.00002009 +[12:58:49] Epoch: 1 Batch: 11076/20099 (55.11%) Loss: 1.978814 LR: 0.00002009 +[12:58:51] Epoch: 1 Batch: 11077/20099 (55.11%) Loss: 2.224337 LR: 0.00002008 +[12:58:53] Epoch: 1 Batch: 11078/20099 (55.12%) Loss: 2.059154 LR: 0.00002008 +[12:58:55] Epoch: 1 Batch: 11079/20099 (55.12%) Loss: 2.179935 LR: 0.00002008 +[12:58:57] Epoch: 1 Batch: 11080/20099 (55.13%) Loss: 1.995695 LR: 0.00002008 +[12:58:58] Epoch: 1 Batch: 11081/20099 (55.13%) Loss: 2.192569 LR: 0.00002008 +[12:59:00] Epoch: 1 Batch: 11082/20099 (55.14%) Loss: 2.558090 LR: 0.00002008 +[12:59:02] Epoch: 1 Batch: 11083/20099 (55.14%) Loss: 2.396944 LR: 0.00002008 +[12:59:04] Epoch: 1 Batch: 11084/20099 (55.15%) Loss: 1.866625 LR: 0.00002006 +[12:59:05] Epoch: 1 Batch: 11085/20099 (55.15%) Loss: 2.335765 LR: 0.00002006 +[12:59:07] Epoch: 1 Batch: 11086/20099 (55.16%) Loss: 2.153575 LR: 0.00002006 +[12:59:09] Epoch: 1 Batch: 11087/20099 (55.16%) Loss: 1.952592 LR: 0.00002006 +[12:59:11] Epoch: 1 Batch: 11088/20099 (55.17%) Loss: 1.908183 LR: 0.00002006 +[12:59:13] Epoch: 1 Batch: 11089/20099 (55.17%) Loss: 2.148547 LR: 0.00002006 +[12:59:14] Epoch: 1 Batch: 11090/20099 (55.18%) Loss: 2.395633 LR: 0.00002006 +[12:59:16] Epoch: 1 Batch: 11091/20099 (55.18%) Loss: 2.063512 LR: 0.00002004 +[12:59:18] Epoch: 1 Batch: 11092/20099 (55.19%) Loss: 2.178868 LR: 0.00002004 +[12:59:20] Epoch: 1 Batch: 11093/20099 (55.19%) Loss: 2.262750 LR: 0.00002004 +[12:59:21] Epoch: 1 Batch: 11094/20099 (55.20%) Loss: 2.070873 LR: 0.00002004 +[12:59:23] 
Epoch: 1 Batch: 11095/20099 (55.20%) Loss: 1.986290 LR: 0.00002004 +[12:59:25] Epoch: 1 Batch: 11096/20099 (55.21%) Loss: 2.253832 LR: 0.00002004 +[12:59:27] Epoch: 1 Batch: 11097/20099 (55.21%) Loss: 2.280985 LR: 0.00002004 +[12:59:29] Epoch: 1 Batch: 11098/20099 (55.22%) Loss: 2.011451 LR: 0.00002003 +[12:59:30] Epoch: 1 Batch: 11099/20099 (55.22%) Loss: 2.089214 LR: 0.00002003 +[12:59:32] Epoch: 1 Batch: 11100/20099 (55.23%) Loss: 2.316898 LR: 0.00002003 +[12:59:34] Epoch: 1 Batch: 11101/20099 (55.23%) Loss: 2.210807 LR: 0.00002003 +[12:59:36] Epoch: 1 Batch: 11102/20099 (55.24%) Loss: 2.144978 LR: 0.00002003 +[12:59:37] Epoch: 1 Batch: 11103/20099 (55.24%) Loss: 2.026101 LR: 0.00002003 +[12:59:39] Epoch: 1 Batch: 11104/20099 (55.25%) Loss: 2.057931 LR: 0.00002003 +[12:59:41] Epoch: 1 Batch: 11105/20099 (55.25%) Loss: 2.043174 LR: 0.00002001 +[12:59:43] Epoch: 1 Batch: 11106/20099 (55.26%) Loss: 2.276022 LR: 0.00002001 +[12:59:44] Epoch: 1 Batch: 11107/20099 (55.26%) Loss: 1.948438 LR: 0.00002001 +[12:59:46] Epoch: 1 Batch: 11108/20099 (55.27%) Loss: 2.036401 LR: 0.00002001 +[12:59:48] Epoch: 1 Batch: 11109/20099 (55.27%) Loss: 2.175477 LR: 0.00002001 +[12:59:50] Epoch: 1 Batch: 11110/20099 (55.28%) Loss: 2.109460 LR: 0.00002001 +[12:59:52] Epoch: 1 Batch: 11111/20099 (55.28%) Loss: 2.004047 LR: 0.00002001 +[12:59:53] Epoch: 1 Batch: 11112/20099 (55.29%) Loss: 2.030140 LR: 0.00002000 +[12:59:55] Epoch: 1 Batch: 11113/20099 (55.29%) Loss: 1.864693 LR: 0.00002000 +[12:59:57] Epoch: 1 Batch: 11114/20099 (55.30%) Loss: 2.121825 LR: 0.00002000 +[12:59:59] Epoch: 1 Batch: 11115/20099 (55.30%) Loss: 2.045478 LR: 0.00002000 +[13:00:00] Epoch: 1 Batch: 11116/20099 (55.31%) Loss: 2.128784 LR: 0.00002000 +[13:00:02] Epoch: 1 Batch: 11117/20099 (55.31%) Loss: 2.218340 LR: 0.00002000 +[13:00:04] Epoch: 1 Batch: 11118/20099 (55.32%) Loss: 2.019497 LR: 0.00002000 +[13:00:06] Epoch: 1 Batch: 11119/20099 (55.32%) Loss: 2.090980 LR: 0.00001998 +[13:00:07] Epoch: 1 Batch: 
11120/20099 (55.33%) Loss: 2.147621 LR: 0.00001998 +[13:00:09] Epoch: 1 Batch: 11121/20099 (55.33%) Loss: 2.086133 LR: 0.00001998 +[13:00:11] Epoch: 1 Batch: 11122/20099 (55.34%) Loss: 2.183997 LR: 0.00001998 +[13:00:13] Epoch: 1 Batch: 11123/20099 (55.34%) Loss: 1.818453 LR: 0.00001998 +[13:00:15] Epoch: 1 Batch: 11124/20099 (55.35%) Loss: 2.156151 LR: 0.00001998 +[13:00:16] Epoch: 1 Batch: 11125/20099 (55.35%) Loss: 2.234179 LR: 0.00001998 +[13:00:18] Epoch: 1 Batch: 11126/20099 (55.36%) Loss: 2.183157 LR: 0.00001996 +[13:00:20] Epoch: 1 Batch: 11127/20099 (55.36%) Loss: 1.940661 LR: 0.00001996 +[13:00:22] Epoch: 1 Batch: 11128/20099 (55.37%) Loss: 2.240952 LR: 0.00001996 +[13:00:23] Epoch: 1 Batch: 11129/20099 (55.37%) Loss: 2.066563 LR: 0.00001996 +[13:00:25] Epoch: 1 Batch: 11130/20099 (55.38%) Loss: 2.337873 LR: 0.00001996 +[13:00:27] Epoch: 1 Batch: 11131/20099 (55.38%) Loss: 2.140023 LR: 0.00001996 +[13:00:29] Epoch: 1 Batch: 11132/20099 (55.39%) Loss: 2.247560 LR: 0.00001996 +[13:00:31] Epoch: 1 Batch: 11133/20099 (55.39%) Loss: 2.199916 LR: 0.00001995 +[13:00:32] Epoch: 1 Batch: 11134/20099 (55.40%) Loss: 2.093566 LR: 0.00001995 +[13:00:34] Epoch: 1 Batch: 11135/20099 (55.40%) Loss: 2.023887 LR: 0.00001995 +[13:00:36] Epoch: 1 Batch: 11136/20099 (55.41%) Loss: 2.160091 LR: 0.00001995 +[13:00:38] Epoch: 1 Batch: 11137/20099 (55.41%) Loss: 2.267032 LR: 0.00001995 +[13:00:39] Epoch: 1 Batch: 11138/20099 (55.42%) Loss: 2.388919 LR: 0.00001995 +[13:00:41] Epoch: 1 Batch: 11139/20099 (55.42%) Loss: 1.908056 LR: 0.00001995 +[13:00:43] Epoch: 1 Batch: 11140/20099 (55.43%) Loss: 1.968200 LR: 0.00001993 +[13:00:45] Epoch: 1 Batch: 11141/20099 (55.43%) Loss: 1.949701 LR: 0.00001993 +[13:00:46] Epoch: 1 Batch: 11142/20099 (55.44%) Loss: 2.065496 LR: 0.00001993 +[13:00:48] Epoch: 1 Batch: 11143/20099 (55.44%) Loss: 2.346295 LR: 0.00001993 +[13:00:50] Epoch: 1 Batch: 11144/20099 (55.45%) Loss: 2.009866 LR: 0.00001993 +[13:00:52] Epoch: 1 Batch: 11145/20099 (55.45%) 
Loss: 2.117960 LR: 0.00001993 +[13:00:53] Epoch: 1 Batch: 11146/20099 (55.46%) Loss: 2.427787 LR: 0.00001993 +[13:00:55] Epoch: 1 Batch: 11147/20099 (55.46%) Loss: 2.204567 LR: 0.00001992 +[13:00:57] Epoch: 1 Batch: 11148/20099 (55.47%) Loss: 2.061039 LR: 0.00001992 +[13:00:59] Epoch: 1 Batch: 11149/20099 (55.47%) Loss: 1.913082 LR: 0.00001992 +[13:01:01] Epoch: 1 Batch: 11150/20099 (55.48%) Loss: 2.002297 LR: 0.00001992 +[13:01:02] Epoch: 1 Batch: 11151/20099 (55.48%) Loss: 1.983355 LR: 0.00001992 +[13:01:04] Epoch: 1 Batch: 11152/20099 (55.49%) Loss: 2.175663 LR: 0.00001992 +[13:01:06] Epoch: 1 Batch: 11153/20099 (55.49%) Loss: 1.883452 LR: 0.00001992 +[13:01:08] Epoch: 1 Batch: 11154/20099 (55.50%) Loss: 2.180163 LR: 0.00001990 +[13:01:09] Epoch: 1 Batch: 11155/20099 (55.50%) Loss: 2.135711 LR: 0.00001990 +[13:01:11] Epoch: 1 Batch: 11156/20099 (55.51%) Loss: 2.044790 LR: 0.00001990 +[13:01:13] Epoch: 1 Batch: 11157/20099 (55.51%) Loss: 1.998878 LR: 0.00001990 +[13:01:15] Epoch: 1 Batch: 11158/20099 (55.52%) Loss: 2.262176 LR: 0.00001990 +[13:01:16] Epoch: 1 Batch: 11159/20099 (55.52%) Loss: 2.102956 LR: 0.00001990 +[13:01:18] Epoch: 1 Batch: 11160/20099 (55.53%) Loss: 1.890763 LR: 0.00001990 +[13:01:20] Epoch: 1 Batch: 11161/20099 (55.53%) Loss: 1.760657 LR: 0.00001988 +[13:01:22] Epoch: 1 Batch: 11162/20099 (55.54%) Loss: 1.982925 LR: 0.00001988 +[13:01:24] Epoch: 1 Batch: 11163/20099 (55.54%) Loss: 1.647432 LR: 0.00001988 +[13:01:25] Epoch: 1 Batch: 11164/20099 (55.55%) Loss: 2.235788 LR: 0.00001988 +[13:01:27] Epoch: 1 Batch: 11165/20099 (55.55%) Loss: 1.916987 LR: 0.00001988 +[13:01:29] Epoch: 1 Batch: 11166/20099 (55.56%) Loss: 2.334794 LR: 0.00001988 +[13:01:31] Epoch: 1 Batch: 11167/20099 (55.56%) Loss: 1.626364 LR: 0.00001988 +[13:01:32] Epoch: 1 Batch: 11168/20099 (55.56%) Loss: 1.880565 LR: 0.00001987 +[13:01:34] Epoch: 1 Batch: 11169/20099 (55.57%) Loss: 2.009051 LR: 0.00001987 +[13:01:36] Epoch: 1 Batch: 11170/20099 (55.57%) Loss: 2.065865 LR: 
0.00001987 +[13:01:38] Epoch: 1 Batch: 11171/20099 (55.58%) Loss: 2.123891 LR: 0.00001987 +[13:01:39] Epoch: 1 Batch: 11172/20099 (55.58%) Loss: 2.121518 LR: 0.00001987 +[13:01:41] Epoch: 1 Batch: 11173/20099 (55.59%) Loss: 1.805938 LR: 0.00001987 +[13:01:43] Epoch: 1 Batch: 11174/20099 (55.59%) Loss: 2.215779 LR: 0.00001987 +[13:01:45] Epoch: 1 Batch: 11175/20099 (55.60%) Loss: 1.877665 LR: 0.00001985 +[13:01:47] Epoch: 1 Batch: 11176/20099 (55.60%) Loss: 2.289865 LR: 0.00001985 +[13:01:48] Epoch: 1 Batch: 11177/20099 (55.61%) Loss: 2.250286 LR: 0.00001985 +[13:01:50] Epoch: 1 Batch: 11178/20099 (55.61%) Loss: 1.934063 LR: 0.00001985 +[13:01:52] Epoch: 1 Batch: 11179/20099 (55.62%) Loss: 2.162903 LR: 0.00001985 +[13:01:54] Epoch: 1 Batch: 11180/20099 (55.62%) Loss: 2.218387 LR: 0.00001985 +[13:01:55] Epoch: 1 Batch: 11181/20099 (55.63%) Loss: 2.202329 LR: 0.00001985 +[13:01:57] Epoch: 1 Batch: 11182/20099 (55.63%) Loss: 1.900783 LR: 0.00001984 +[13:01:59] Epoch: 1 Batch: 11183/20099 (55.64%) Loss: 2.277054 LR: 0.00001984 +[13:02:01] Epoch: 1 Batch: 11184/20099 (55.64%) Loss: 2.131933 LR: 0.00001984 +[13:02:03] Epoch: 1 Batch: 11185/20099 (55.65%) Loss: 1.944875 LR: 0.00001984 +[13:02:04] Epoch: 1 Batch: 11186/20099 (55.65%) Loss: 1.939227 LR: 0.00001984 +[13:02:06] Epoch: 1 Batch: 11187/20099 (55.66%) Loss: 1.760072 LR: 0.00001984 +[13:02:08] Epoch: 1 Batch: 11188/20099 (55.66%) Loss: 2.318941 LR: 0.00001984 +[13:02:10] Epoch: 1 Batch: 11189/20099 (55.67%) Loss: 1.939994 LR: 0.00001982 +[13:02:11] Epoch: 1 Batch: 11190/20099 (55.67%) Loss: 1.928573 LR: 0.00001982 +[13:02:13] Epoch: 1 Batch: 11191/20099 (55.68%) Loss: 2.368747 LR: 0.00001982 +[13:02:15] Epoch: 1 Batch: 11192/20099 (55.68%) Loss: 2.016939 LR: 0.00001982 +[13:02:17] Epoch: 1 Batch: 11193/20099 (55.69%) Loss: 2.142023 LR: 0.00001982 +[13:02:18] Epoch: 1 Batch: 11194/20099 (55.69%) Loss: 1.963130 LR: 0.00001982 +[13:02:20] Epoch: 1 Batch: 11195/20099 (55.70%) Loss: 2.013131 LR: 0.00001982 +[13:02:22] 
Epoch: 1 Batch: 11196/20099 (55.70%) Loss: 2.361329 LR: 0.00001980 +[13:02:24] Epoch: 1 Batch: 11197/20099 (55.71%) Loss: 2.094740 LR: 0.00001980 +[13:02:26] Epoch: 1 Batch: 11198/20099 (55.71%) Loss: 1.927511 LR: 0.00001980 +[13:02:27] Epoch: 1 Batch: 11199/20099 (55.72%) Loss: 2.178116 LR: 0.00001980 +[13:02:33] >> Cleaned up old temp checkpoint: epoch1_step9200 +[13:02:33] >> Temp checkpoint saved: epoch1_step11200, size: 0.1693 GB +[13:02:33] Epoch: 1 Batch: 11200/20099 (55.72%) Loss: 2.117837 LR: 0.00001980 +[13:02:35] Epoch: 1 Batch: 11201/20099 (55.73%) Loss: 2.048815 LR: 0.00001980 +[13:02:37] Epoch: 1 Batch: 11202/20099 (55.73%) Loss: 2.210438 LR: 0.00001980 +[13:02:38] Epoch: 1 Batch: 11203/20099 (55.74%) Loss: 1.654237 LR: 0.00001979 +[13:02:40] Epoch: 1 Batch: 11204/20099 (55.74%) Loss: 1.887697 LR: 0.00001979 +[13:02:42] Epoch: 1 Batch: 11205/20099 (55.75%) Loss: 2.311118 LR: 0.00001979 +[13:02:44] Epoch: 1 Batch: 11206/20099 (55.75%) Loss: 2.246371 LR: 0.00001979 +[13:02:45] Epoch: 1 Batch: 11207/20099 (55.76%) Loss: 2.026492 LR: 0.00001979 +[13:02:47] Epoch: 1 Batch: 11208/20099 (55.76%) Loss: 2.118731 LR: 0.00001979 +[13:02:49] Epoch: 1 Batch: 11209/20099 (55.77%) Loss: 2.043592 LR: 0.00001979 +[13:02:51] Epoch: 1 Batch: 11210/20099 (55.77%) Loss: 2.081089 LR: 0.00001977 +[13:02:52] Epoch: 1 Batch: 11211/20099 (55.78%) Loss: 2.146350 LR: 0.00001977 +[13:02:54] Epoch: 1 Batch: 11212/20099 (55.78%) Loss: 2.350594 LR: 0.00001977 +[13:02:56] Epoch: 1 Batch: 11213/20099 (55.79%) Loss: 2.168718 LR: 0.00001977 +[13:02:58] Epoch: 1 Batch: 11214/20099 (55.79%) Loss: 2.336211 LR: 0.00001977 +[13:03:00] Epoch: 1 Batch: 11215/20099 (55.80%) Loss: 2.022901 LR: 0.00001977 +[13:03:01] Epoch: 1 Batch: 11216/20099 (55.80%) Loss: 1.980048 LR: 0.00001977 +[13:03:03] Epoch: 1 Batch: 11217/20099 (55.81%) Loss: 2.107671 LR: 0.00001976 +[13:03:05] Epoch: 1 Batch: 11218/20099 (55.81%) Loss: 2.236177 LR: 0.00001976 +[13:03:07] Epoch: 1 Batch: 11219/20099 (55.82%) Loss: 
2.207035 LR: 0.00001976 +[13:03:09] Epoch: 1 Batch: 11220/20099 (55.82%) Loss: 2.209307 LR: 0.00001976 +[13:03:10] Epoch: 1 Batch: 11221/20099 (55.83%) Loss: 1.631344 LR: 0.00001976 +[13:03:12] Epoch: 1 Batch: 11222/20099 (55.83%) Loss: 2.269216 LR: 0.00001976 +[13:03:14] Epoch: 1 Batch: 11223/20099 (55.84%) Loss: 1.989925 LR: 0.00001976 +[13:03:16] Epoch: 1 Batch: 11224/20099 (55.84%) Loss: 2.149505 LR: 0.00001974 +[13:03:18] Epoch: 1 Batch: 11225/20099 (55.85%) Loss: 2.109759 LR: 0.00001974 +[13:03:19] Epoch: 1 Batch: 11226/20099 (55.85%) Loss: 2.013200 LR: 0.00001974 +[13:03:21] Epoch: 1 Batch: 11227/20099 (55.86%) Loss: 2.192127 LR: 0.00001974 +[13:03:23] Epoch: 1 Batch: 11228/20099 (55.86%) Loss: 2.141660 LR: 0.00001974 +[13:03:25] Epoch: 1 Batch: 11229/20099 (55.87%) Loss: 2.195828 LR: 0.00001974 +[13:03:27] Epoch: 1 Batch: 11230/20099 (55.87%) Loss: 1.978537 LR: 0.00001974 +[13:03:28] Epoch: 1 Batch: 11231/20099 (55.88%) Loss: 2.440528 LR: 0.00001972 +[13:03:30] Epoch: 1 Batch: 11232/20099 (55.88%) Loss: 2.089115 LR: 0.00001972 +[13:03:32] Epoch: 1 Batch: 11233/20099 (55.89%) Loss: 2.221148 LR: 0.00001972 +[13:03:34] Epoch: 1 Batch: 11234/20099 (55.89%) Loss: 2.076248 LR: 0.00001972 +[13:03:36] Epoch: 1 Batch: 11235/20099 (55.90%) Loss: 2.048286 LR: 0.00001972 +[13:03:37] Epoch: 1 Batch: 11236/20099 (55.90%) Loss: 2.131194 LR: 0.00001972 +[13:03:39] Epoch: 1 Batch: 11237/20099 (55.91%) Loss: 1.896971 LR: 0.00001972 +[13:03:41] Epoch: 1 Batch: 11238/20099 (55.91%) Loss: 1.969922 LR: 0.00001971 +[13:03:43] Epoch: 1 Batch: 11239/20099 (55.92%) Loss: 2.010126 LR: 0.00001971 +[13:03:44] Epoch: 1 Batch: 11240/20099 (55.92%) Loss: 1.920749 LR: 0.00001971 +[13:03:46] Epoch: 1 Batch: 11241/20099 (55.93%) Loss: 2.067146 LR: 0.00001971 +[13:03:48] Epoch: 1 Batch: 11242/20099 (55.93%) Loss: 2.185040 LR: 0.00001971 +[13:03:50] Epoch: 1 Batch: 11243/20099 (55.94%) Loss: 1.874555 LR: 0.00001971 +[13:03:51] Epoch: 1 Batch: 11244/20099 (55.94%) Loss: 2.063280 LR: 0.00001971 
+[13:03:53] Epoch: 1 Batch: 11245/20099 (55.95%) Loss: 2.348310 LR: 0.00001969 +[13:03:55] Epoch: 1 Batch: 11246/20099 (55.95%) Loss: 2.320829 LR: 0.00001969 +[13:03:57] Epoch: 1 Batch: 11247/20099 (55.96%) Loss: 2.435837 LR: 0.00001969 +[13:03:58] Epoch: 1 Batch: 11248/20099 (55.96%) Loss: 1.768005 LR: 0.00001969 +[13:04:00] Epoch: 1 Batch: 11249/20099 (55.97%) Loss: 2.292635 LR: 0.00001969 +[13:04:02] Epoch: 1 Batch: 11250/20099 (55.97%) Loss: 1.930801 LR: 0.00001969 +[13:04:04] Epoch: 1 Batch: 11251/20099 (55.98%) Loss: 2.461676 LR: 0.00001969 +[13:04:06] Epoch: 1 Batch: 11252/20099 (55.98%) Loss: 2.003017 LR: 0.00001968 +[13:04:07] Epoch: 1 Batch: 11253/20099 (55.99%) Loss: 2.077899 LR: 0.00001968 +[13:04:09] Epoch: 1 Batch: 11254/20099 (55.99%) Loss: 2.114132 LR: 0.00001968 +[13:04:11] Epoch: 1 Batch: 11255/20099 (56.00%) Loss: 2.282373 LR: 0.00001968 +[13:04:13] Epoch: 1 Batch: 11256/20099 (56.00%) Loss: 2.036855 LR: 0.00001968 +[13:04:14] Epoch: 1 Batch: 11257/20099 (56.01%) Loss: 2.101515 LR: 0.00001968 +[13:04:16] Epoch: 1 Batch: 11258/20099 (56.01%) Loss: 2.050449 LR: 0.00001968 +[13:04:18] Epoch: 1 Batch: 11259/20099 (56.02%) Loss: 1.911826 LR: 0.00001966 +[13:04:20] Epoch: 1 Batch: 11260/20099 (56.02%) Loss: 1.956179 LR: 0.00001966 +[13:04:21] Epoch: 1 Batch: 11261/20099 (56.03%) Loss: 2.082642 LR: 0.00001966 +[13:04:23] Epoch: 1 Batch: 11262/20099 (56.03%) Loss: 2.022380 LR: 0.00001966 +[13:04:25] Epoch: 1 Batch: 11263/20099 (56.04%) Loss: 2.073771 LR: 0.00001966 +[13:04:27] Epoch: 1 Batch: 11264/20099 (56.04%) Loss: 2.438944 LR: 0.00001966 +[13:04:29] Epoch: 1 Batch: 11265/20099 (56.05%) Loss: 1.959609 LR: 0.00001966 +[13:04:30] Epoch: 1 Batch: 11266/20099 (56.05%) Loss: 2.177157 LR: 0.00001964 +[13:04:32] Epoch: 1 Batch: 11267/20099 (56.06%) Loss: 2.029596 LR: 0.00001964 +[13:04:34] Epoch: 1 Batch: 11268/20099 (56.06%) Loss: 1.837508 LR: 0.00001964 +[13:04:36] Epoch: 1 Batch: 11269/20099 (56.07%) Loss: 2.378138 LR: 0.00001964 +[13:04:37] Epoch: 1 
Batch: 11270/20099 (56.07%) Loss: 2.074793 LR: 0.00001964 +[13:04:39] Epoch: 1 Batch: 11271/20099 (56.08%) Loss: 2.302495 LR: 0.00001964 +[13:04:41] Epoch: 1 Batch: 11272/20099 (56.08%) Loss: 2.327219 LR: 0.00001964 +[13:04:43] Epoch: 1 Batch: 11273/20099 (56.09%) Loss: 1.883996 LR: 0.00001963 +[13:04:45] Epoch: 1 Batch: 11274/20099 (56.09%) Loss: 2.045943 LR: 0.00001963 +[13:04:46] Epoch: 1 Batch: 11275/20099 (56.10%) Loss: 2.204832 LR: 0.00001963 +[13:04:48] Epoch: 1 Batch: 11276/20099 (56.10%) Loss: 2.277378 LR: 0.00001963 +[13:04:50] Epoch: 1 Batch: 11277/20099 (56.11%) Loss: 2.065509 LR: 0.00001963 +[13:04:52] Epoch: 1 Batch: 11278/20099 (56.11%) Loss: 1.962020 LR: 0.00001963 +[13:04:54] Epoch: 1 Batch: 11279/20099 (56.12%) Loss: 1.993627 LR: 0.00001963 +[13:04:55] Epoch: 1 Batch: 11280/20099 (56.12%) Loss: 2.045679 LR: 0.00001961 +[13:04:57] Epoch: 1 Batch: 11281/20099 (56.13%) Loss: 2.177164 LR: 0.00001961 +[13:04:59] Epoch: 1 Batch: 11282/20099 (56.13%) Loss: 2.080282 LR: 0.00001961 +[13:05:01] Epoch: 1 Batch: 11283/20099 (56.14%) Loss: 2.360803 LR: 0.00001961 +[13:05:02] Epoch: 1 Batch: 11284/20099 (56.14%) Loss: 2.004181 LR: 0.00001961 +[13:05:04] Epoch: 1 Batch: 11285/20099 (56.15%) Loss: 1.995249 LR: 0.00001961 +[13:05:06] Epoch: 1 Batch: 11286/20099 (56.15%) Loss: 2.136674 LR: 0.00001961 +[13:05:08] Epoch: 1 Batch: 11287/20099 (56.16%) Loss: 1.978145 LR: 0.00001960 +[13:05:10] Epoch: 1 Batch: 11288/20099 (56.16%) Loss: 2.072719 LR: 0.00001960 +[13:05:11] Epoch: 1 Batch: 11289/20099 (56.17%) Loss: 1.981132 LR: 0.00001960 +[13:05:13] Epoch: 1 Batch: 11290/20099 (56.17%) Loss: 2.090764 LR: 0.00001960 +[13:05:15] Epoch: 1 Batch: 11291/20099 (56.18%) Loss: 2.086682 LR: 0.00001960 +[13:05:17] Epoch: 1 Batch: 11292/20099 (56.18%) Loss: 2.070847 LR: 0.00001960 +[13:05:18] Epoch: 1 Batch: 11293/20099 (56.19%) Loss: 1.809889 LR: 0.00001960 +[13:05:20] Epoch: 1 Batch: 11294/20099 (56.19%) Loss: 1.885626 LR: 0.00001958 +[13:05:22] Epoch: 1 Batch: 11295/20099 
(56.20%) Loss: 1.965294 LR: 0.00001958 +[13:05:24] Epoch: 1 Batch: 11296/20099 (56.20%) Loss: 2.066365 LR: 0.00001958 +[13:05:25] Epoch: 1 Batch: 11297/20099 (56.21%) Loss: 2.211788 LR: 0.00001958 +[13:05:27] Epoch: 1 Batch: 11298/20099 (56.21%) Loss: 1.753140 LR: 0.00001958 +[13:05:29] Epoch: 1 Batch: 11299/20099 (56.22%) Loss: 1.986456 LR: 0.00001958 +[13:05:31] Epoch: 1 Batch: 11300/20099 (56.22%) Loss: 2.001939 LR: 0.00001958 +[13:05:33] Epoch: 1 Batch: 11301/20099 (56.23%) Loss: 2.047039 LR: 0.00001956 +[13:05:34] Epoch: 1 Batch: 11302/20099 (56.23%) Loss: 2.100166 LR: 0.00001956 +[13:05:36] Epoch: 1 Batch: 11303/20099 (56.24%) Loss: 2.227453 LR: 0.00001956 +[13:05:38] Epoch: 1 Batch: 11304/20099 (56.24%) Loss: 2.009810 LR: 0.00001956 +[13:05:40] Epoch: 1 Batch: 11305/20099 (56.25%) Loss: 1.931912 LR: 0.00001956 +[13:05:41] Epoch: 1 Batch: 11306/20099 (56.25%) Loss: 2.284395 LR: 0.00001956 +[13:05:43] Epoch: 1 Batch: 11307/20099 (56.26%) Loss: 2.050078 LR: 0.00001956 +[13:05:45] Epoch: 1 Batch: 11308/20099 (56.26%) Loss: 2.096411 LR: 0.00001955 +[13:05:47] Epoch: 1 Batch: 11309/20099 (56.27%) Loss: 2.051000 LR: 0.00001955 +[13:05:48] Epoch: 1 Batch: 11310/20099 (56.27%) Loss: 2.103501 LR: 0.00001955 +[13:05:50] Epoch: 1 Batch: 11311/20099 (56.28%) Loss: 1.910689 LR: 0.00001955 +[13:05:52] Epoch: 1 Batch: 11312/20099 (56.28%) Loss: 2.224805 LR: 0.00001955 +[13:05:54] Epoch: 1 Batch: 11313/20099 (56.29%) Loss: 2.295284 LR: 0.00001955 +[13:05:56] Epoch: 1 Batch: 11314/20099 (56.29%) Loss: 2.096960 LR: 0.00001955 +[13:05:57] Epoch: 1 Batch: 11315/20099 (56.30%) Loss: 2.190556 LR: 0.00001953 +[13:05:59] Epoch: 1 Batch: 11316/20099 (56.30%) Loss: 2.132529 LR: 0.00001953 +[13:06:01] Epoch: 1 Batch: 11317/20099 (56.31%) Loss: 2.180504 LR: 0.00001953 +[13:06:03] Epoch: 1 Batch: 11318/20099 (56.31%) Loss: 2.085730 LR: 0.00001953 +[13:06:04] Epoch: 1 Batch: 11319/20099 (56.32%) Loss: 1.908525 LR: 0.00001953 +[13:06:06] Epoch: 1 Batch: 11320/20099 (56.32%) Loss: 2.427936 
LR: 0.00001953 +[13:06:08] Epoch: 1 Batch: 11321/20099 (56.33%) Loss: 2.184149 LR: 0.00001953 +[13:06:10] Epoch: 1 Batch: 11322/20099 (56.33%) Loss: 2.034099 LR: 0.00001951 +[13:06:11] Epoch: 1 Batch: 11323/20099 (56.34%) Loss: 2.310478 LR: 0.00001951 +[13:06:13] Epoch: 1 Batch: 11324/20099 (56.34%) Loss: 2.122055 LR: 0.00001951 +[13:06:15] Epoch: 1 Batch: 11325/20099 (56.35%) Loss: 2.224063 LR: 0.00001951 +[13:06:17] Epoch: 1 Batch: 11326/20099 (56.35%) Loss: 2.144893 LR: 0.00001951 +[13:06:19] Epoch: 1 Batch: 11327/20099 (56.36%) Loss: 2.175828 LR: 0.00001951 +[13:06:20] Epoch: 1 Batch: 11328/20099 (56.36%) Loss: 1.744901 LR: 0.00001951 +[13:06:22] Epoch: 1 Batch: 11329/20099 (56.37%) Loss: 2.340363 LR: 0.00001950 +[13:06:24] Epoch: 1 Batch: 11330/20099 (56.37%) Loss: 2.148187 LR: 0.00001950 +[13:06:26] Epoch: 1 Batch: 11331/20099 (56.38%) Loss: 1.705023 LR: 0.00001950 +[13:06:27] Epoch: 1 Batch: 11332/20099 (56.38%) Loss: 2.412832 LR: 0.00001950 +[13:06:29] Epoch: 1 Batch: 11333/20099 (56.39%) Loss: 1.643580 LR: 0.00001950 +[13:06:31] Epoch: 1 Batch: 11334/20099 (56.39%) Loss: 1.913682 LR: 0.00001950 +[13:06:33] Epoch: 1 Batch: 11335/20099 (56.40%) Loss: 2.036838 LR: 0.00001950 +[13:06:34] Epoch: 1 Batch: 11336/20099 (56.40%) Loss: 2.181314 LR: 0.00001948 +[13:06:36] Epoch: 1 Batch: 11337/20099 (56.41%) Loss: 2.526333 LR: 0.00001948 +[13:06:38] Epoch: 1 Batch: 11338/20099 (56.41%) Loss: 2.379275 LR: 0.00001948 +[13:06:40] Epoch: 1 Batch: 11339/20099 (56.42%) Loss: 2.461579 LR: 0.00001948 +[13:06:42] Epoch: 1 Batch: 11340/20099 (56.42%) Loss: 2.352740 LR: 0.00001948 +[13:06:43] Epoch: 1 Batch: 11341/20099 (56.43%) Loss: 1.739417 LR: 0.00001948 +[13:06:45] Epoch: 1 Batch: 11342/20099 (56.43%) Loss: 2.320500 LR: 0.00001948 +[13:06:47] Epoch: 1 Batch: 11343/20099 (56.44%) Loss: 2.353551 LR: 0.00001947 +[13:06:49] Epoch: 1 Batch: 11344/20099 (56.44%) Loss: 2.305062 LR: 0.00001947 +[13:06:50] Epoch: 1 Batch: 11345/20099 (56.45%) Loss: 2.190596 LR: 0.00001947 
+[13:06:52] Epoch: 1 Batch: 11346/20099 (56.45%) Loss: 2.044444 LR: 0.00001947 +[13:06:54] Epoch: 1 Batch: 11347/20099 (56.46%) Loss: 2.023049 LR: 0.00001947 +[13:06:56] Epoch: 1 Batch: 11348/20099 (56.46%) Loss: 2.206058 LR: 0.00001947 +[13:06:57] Epoch: 1 Batch: 11349/20099 (56.47%) Loss: 1.973364 LR: 0.00001947 +[13:06:59] Epoch: 1 Batch: 11350/20099 (56.47%) Loss: 2.224528 LR: 0.00001945 +[13:07:01] Epoch: 1 Batch: 11351/20099 (56.48%) Loss: 2.414248 LR: 0.00001945 +[13:07:03] Epoch: 1 Batch: 11352/20099 (56.48%) Loss: 2.240918 LR: 0.00001945 +[13:07:05] Epoch: 1 Batch: 11353/20099 (56.49%) Loss: 2.201759 LR: 0.00001945 +[13:07:06] Epoch: 1 Batch: 11354/20099 (56.49%) Loss: 1.890977 LR: 0.00001945 +[13:07:08] Epoch: 1 Batch: 11355/20099 (56.50%) Loss: 2.151474 LR: 0.00001945 +[13:07:10] Epoch: 1 Batch: 11356/20099 (56.50%) Loss: 2.077891 LR: 0.00001945 +[13:07:12] Epoch: 1 Batch: 11357/20099 (56.51%) Loss: 2.250790 LR: 0.00001943 +[13:07:14] Epoch: 1 Batch: 11358/20099 (56.51%) Loss: 2.115627 LR: 0.00001943 +[13:07:15] Epoch: 1 Batch: 11359/20099 (56.52%) Loss: 2.121855 LR: 0.00001943 +[13:07:17] Epoch: 1 Batch: 11360/20099 (56.52%) Loss: 2.146965 LR: 0.00001943 +[13:07:19] Epoch: 1 Batch: 11361/20099 (56.53%) Loss: 2.337461 LR: 0.00001943 +[13:07:21] Epoch: 1 Batch: 11362/20099 (56.53%) Loss: 2.336686 LR: 0.00001943 +[13:07:22] Epoch: 1 Batch: 11363/20099 (56.54%) Loss: 2.493858 LR: 0.00001943 +[13:07:24] Epoch: 1 Batch: 11364/20099 (56.54%) Loss: 2.311725 LR: 0.00001942 +[13:07:26] Epoch: 1 Batch: 11365/20099 (56.55%) Loss: 1.889207 LR: 0.00001942 +[13:07:28] Epoch: 1 Batch: 11366/20099 (56.55%) Loss: 1.879188 LR: 0.00001942 +[13:07:30] Epoch: 1 Batch: 11367/20099 (56.56%) Loss: 2.244578 LR: 0.00001942 +[13:07:31] Epoch: 1 Batch: 11368/20099 (56.56%) Loss: 2.114572 LR: 0.00001942 +[13:07:33] Epoch: 1 Batch: 11369/20099 (56.57%) Loss: 1.945596 LR: 0.00001942 +[13:07:35] Epoch: 1 Batch: 11370/20099 (56.57%) Loss: 2.061101 LR: 0.00001942 +[13:07:37] Epoch: 1 
Batch: 11371/20099 (56.57%) Loss: 2.125188 LR: 0.00001940 +[13:07:38] Epoch: 1 Batch: 11372/20099 (56.58%) Loss: 2.432108 LR: 0.00001940 +[13:07:40] Epoch: 1 Batch: 11373/20099 (56.58%) Loss: 2.489977 LR: 0.00001940 +[13:07:42] Epoch: 1 Batch: 11374/20099 (56.59%) Loss: 2.029814 LR: 0.00001940 +[13:07:44] Epoch: 1 Batch: 11375/20099 (56.59%) Loss: 1.594001 LR: 0.00001940 +[13:07:46] Epoch: 1 Batch: 11376/20099 (56.60%) Loss: 2.023249 LR: 0.00001940 +[13:07:47] Epoch: 1 Batch: 11377/20099 (56.60%) Loss: 2.049863 LR: 0.00001940 +[13:07:49] Epoch: 1 Batch: 11378/20099 (56.61%) Loss: 1.976772 LR: 0.00001939 +[13:07:51] Epoch: 1 Batch: 11379/20099 (56.61%) Loss: 1.836416 LR: 0.00001939 +[13:07:53] Epoch: 1 Batch: 11380/20099 (56.62%) Loss: 1.959018 LR: 0.00001939 +[13:07:55] Epoch: 1 Batch: 11381/20099 (56.62%) Loss: 1.980948 LR: 0.00001939 +[13:07:56] Epoch: 1 Batch: 11382/20099 (56.63%) Loss: 2.062345 LR: 0.00001939 +[13:07:58] Epoch: 1 Batch: 11383/20099 (56.63%) Loss: 2.187510 LR: 0.00001939 +[13:08:00] Epoch: 1 Batch: 11384/20099 (56.64%) Loss: 2.155882 LR: 0.00001939 +[13:08:02] Epoch: 1 Batch: 11385/20099 (56.64%) Loss: 2.204037 LR: 0.00001937 +[13:08:03] Epoch: 1 Batch: 11386/20099 (56.65%) Loss: 2.077801 LR: 0.00001937 +[13:08:05] Epoch: 1 Batch: 11387/20099 (56.65%) Loss: 1.888232 LR: 0.00001937 +[13:08:07] Epoch: 1 Batch: 11388/20099 (56.66%) Loss: 1.948736 LR: 0.00001937 +[13:08:09] Epoch: 1 Batch: 11389/20099 (56.66%) Loss: 2.230695 LR: 0.00001937 +[13:08:11] Epoch: 1 Batch: 11390/20099 (56.67%) Loss: 2.056886 LR: 0.00001937 +[13:08:12] Epoch: 1 Batch: 11391/20099 (56.67%) Loss: 2.056435 LR: 0.00001937 +[13:08:14] Epoch: 1 Batch: 11392/20099 (56.68%) Loss: 2.213975 LR: 0.00001935 +[13:08:16] Epoch: 1 Batch: 11393/20099 (56.68%) Loss: 1.933760 LR: 0.00001935 +[13:08:18] Epoch: 1 Batch: 11394/20099 (56.69%) Loss: 1.984713 LR: 0.00001935 +[13:08:20] Epoch: 1 Batch: 11395/20099 (56.69%) Loss: 2.134214 LR: 0.00001935 +[13:08:21] Epoch: 1 Batch: 11396/20099 
(56.70%) Loss: 1.752936 LR: 0.00001935 +[13:08:23] Epoch: 1 Batch: 11397/20099 (56.70%) Loss: 1.968265 LR: 0.00001935 +[13:08:25] Epoch: 1 Batch: 11398/20099 (56.71%) Loss: 2.066082 LR: 0.00001935 +[13:08:27] Epoch: 1 Batch: 11399/20099 (56.71%) Loss: 2.216040 LR: 0.00001934 +[13:08:32] >> Cleaned up old temp checkpoint: epoch1_step9400 +[13:08:32] >> Temp checkpoint saved: epoch1_step11400, size: 0.1693 GB +[13:08:32] Epoch: 1 Batch: 11400/20099 (56.72%) Loss: 2.242309 LR: 0.00001934 +[13:08:34] Epoch: 1 Batch: 11401/20099 (56.72%) Loss: 1.944973 LR: 0.00001934 +[13:08:35] Epoch: 1 Batch: 11402/20099 (56.73%) Loss: 2.280672 LR: 0.00001934 +[13:08:37] Epoch: 1 Batch: 11403/20099 (56.73%) Loss: 1.662863 LR: 0.00001934 +[13:08:39] Epoch: 1 Batch: 11404/20099 (56.74%) Loss: 2.062529 LR: 0.00001934 +[13:08:41] Epoch: 1 Batch: 11405/20099 (56.74%) Loss: 2.198876 LR: 0.00001934 +[13:08:42] Epoch: 1 Batch: 11406/20099 (56.75%) Loss: 1.877169 LR: 0.00001932 +[13:08:44] Epoch: 1 Batch: 11407/20099 (56.75%) Loss: 2.145426 LR: 0.00001932 +[13:08:46] Epoch: 1 Batch: 11408/20099 (56.76%) Loss: 2.060930 LR: 0.00001932 +[13:08:48] Epoch: 1 Batch: 11409/20099 (56.76%) Loss: 2.292199 LR: 0.00001932 +[13:08:50] Epoch: 1 Batch: 11410/20099 (56.77%) Loss: 2.269112 LR: 0.00001932 +[13:08:51] Epoch: 1 Batch: 11411/20099 (56.77%) Loss: 1.956730 LR: 0.00001932 +[13:08:53] Epoch: 1 Batch: 11412/20099 (56.78%) Loss: 2.187746 LR: 0.00001932 +[13:08:55] Epoch: 1 Batch: 11413/20099 (56.78%) Loss: 1.871574 LR: 0.00001930 +[13:08:57] Epoch: 1 Batch: 11414/20099 (56.79%) Loss: 2.281445 LR: 0.00001930 +[13:08:59] Epoch: 1 Batch: 11415/20099 (56.79%) Loss: 2.081698 LR: 0.00001930 +[13:09:00] Epoch: 1 Batch: 11416/20099 (56.80%) Loss: 2.137792 LR: 0.00001930 +[13:09:02] Epoch: 1 Batch: 11417/20099 (56.80%) Loss: 2.271646 LR: 0.00001930 +[13:09:04] Epoch: 1 Batch: 11418/20099 (56.81%) Loss: 1.861923 LR: 0.00001930 +[13:09:06] Epoch: 1 Batch: 11419/20099 (56.81%) Loss: 1.984867 LR: 0.00001930 
+[13:09:08] Epoch: 1 Batch: 11420/20099 (56.82%) Loss: 2.365150 LR: 0.00001929 +[13:09:09] Epoch: 1 Batch: 11421/20099 (56.82%) Loss: 2.036678 LR: 0.00001929 +[13:09:11] Epoch: 1 Batch: 11422/20099 (56.83%) Loss: 2.108927 LR: 0.00001929 +[13:09:13] Epoch: 1 Batch: 11423/20099 (56.83%) Loss: 2.426346 LR: 0.00001929 +[13:09:15] Epoch: 1 Batch: 11424/20099 (56.84%) Loss: 2.120196 LR: 0.00001929 +[13:09:17] Epoch: 1 Batch: 11425/20099 (56.84%) Loss: 2.055756 LR: 0.00001929 +[13:09:18] Epoch: 1 Batch: 11426/20099 (56.85%) Loss: 1.953622 LR: 0.00001929 +[13:09:20] Epoch: 1 Batch: 11427/20099 (56.85%) Loss: 1.957633 LR: 0.00001927 +[13:09:22] Epoch: 1 Batch: 11428/20099 (56.86%) Loss: 2.009292 LR: 0.00001927 +[13:09:24] Epoch: 1 Batch: 11429/20099 (56.86%) Loss: 2.059504 LR: 0.00001927 +[13:09:26] Epoch: 1 Batch: 11430/20099 (56.87%) Loss: 2.192499 LR: 0.00001927 +[13:09:27] Epoch: 1 Batch: 11431/20099 (56.87%) Loss: 2.103674 LR: 0.00001927 +[13:09:29] Epoch: 1 Batch: 11432/20099 (56.88%) Loss: 2.478964 LR: 0.00001927 +[13:09:31] Epoch: 1 Batch: 11433/20099 (56.88%) Loss: 1.848549 LR: 0.00001927 +[13:09:33] Epoch: 1 Batch: 11434/20099 (56.89%) Loss: 2.293578 LR: 0.00001926 +[13:09:34] Epoch: 1 Batch: 11435/20099 (56.89%) Loss: 2.033075 LR: 0.00001926 +[13:09:36] Epoch: 1 Batch: 11436/20099 (56.90%) Loss: 2.013788 LR: 0.00001926 +[13:09:38] Epoch: 1 Batch: 11437/20099 (56.90%) Loss: 1.565673 LR: 0.00001926 +[13:09:40] Epoch: 1 Batch: 11438/20099 (56.91%) Loss: 2.020387 LR: 0.00001926 +[13:09:41] Epoch: 1 Batch: 11439/20099 (56.91%) Loss: 2.269745 LR: 0.00001926 +[13:09:43] Epoch: 1 Batch: 11440/20099 (56.92%) Loss: 1.980197 LR: 0.00001926 +[13:09:45] Epoch: 1 Batch: 11441/20099 (56.92%) Loss: 2.186114 LR: 0.00001924 +[13:09:47] Epoch: 1 Batch: 11442/20099 (56.93%) Loss: 2.224105 LR: 0.00001924 +[13:09:48] Epoch: 1 Batch: 11443/20099 (56.93%) Loss: 2.105724 LR: 0.00001924 +[13:09:50] Epoch: 1 Batch: 11444/20099 (56.94%) Loss: 2.224580 LR: 0.00001924 +[13:09:52] Epoch: 1 
Batch: 11445/20099 (56.94%) Loss: 2.038245 LR: 0.00001924 +[13:09:54] Epoch: 1 Batch: 11446/20099 (56.95%) Loss: 1.757391 LR: 0.00001924 +[13:09:56] Epoch: 1 Batch: 11447/20099 (56.95%) Loss: 1.946046 LR: 0.00001924 +[13:09:57] Epoch: 1 Batch: 11448/20099 (56.96%) Loss: 2.174122 LR: 0.00001922 +[13:09:59] Epoch: 1 Batch: 11449/20099 (56.96%) Loss: 2.244912 LR: 0.00001922 +[13:10:01] Epoch: 1 Batch: 11450/20099 (56.97%) Loss: 2.034544 LR: 0.00001922 +[13:10:03] Epoch: 1 Batch: 11451/20099 (56.97%) Loss: 1.958884 LR: 0.00001922 +[13:10:04] Epoch: 1 Batch: 11452/20099 (56.98%) Loss: 2.295491 LR: 0.00001922 +[13:10:06] Epoch: 1 Batch: 11453/20099 (56.98%) Loss: 1.851385 LR: 0.00001922 +[13:10:08] Epoch: 1 Batch: 11454/20099 (56.99%) Loss: 1.863563 LR: 0.00001922 +[13:10:10] Epoch: 1 Batch: 11455/20099 (56.99%) Loss: 1.760401 LR: 0.00001921 +[13:10:12] Epoch: 1 Batch: 11456/20099 (57.00%) Loss: 2.162434 LR: 0.00001921 +[13:10:13] Epoch: 1 Batch: 11457/20099 (57.00%) Loss: 1.903727 LR: 0.00001921 +[13:10:15] Epoch: 1 Batch: 11458/20099 (57.01%) Loss: 2.464973 LR: 0.00001921 +[13:10:17] Epoch: 1 Batch: 11459/20099 (57.01%) Loss: 2.344601 LR: 0.00001921 +[13:10:19] Epoch: 1 Batch: 11460/20099 (57.02%) Loss: 2.198756 LR: 0.00001921 +[13:10:20] Epoch: 1 Batch: 11461/20099 (57.02%) Loss: 2.290895 LR: 0.00001921 +[13:10:22] Epoch: 1 Batch: 11462/20099 (57.03%) Loss: 2.205675 LR: 0.00001919 +[13:10:24] Epoch: 1 Batch: 11463/20099 (57.03%) Loss: 1.817565 LR: 0.00001919 +[13:10:26] Epoch: 1 Batch: 11464/20099 (57.04%) Loss: 2.213937 LR: 0.00001919 +[13:10:27] Epoch: 1 Batch: 11465/20099 (57.04%) Loss: 2.115468 LR: 0.00001919 +[13:10:29] Epoch: 1 Batch: 11466/20099 (57.05%) Loss: 1.782922 LR: 0.00001919 +[13:10:31] Epoch: 1 Batch: 11467/20099 (57.05%) Loss: 2.203735 LR: 0.00001919 +[13:10:33] Epoch: 1 Batch: 11468/20099 (57.06%) Loss: 1.944154 LR: 0.00001919 +[13:10:35] Epoch: 1 Batch: 11469/20099 (57.06%) Loss: 1.941014 LR: 0.00001918 +[13:10:36] Epoch: 1 Batch: 11470/20099 
(57.07%) Loss: 1.933968 LR: 0.00001918 +[13:10:38] Epoch: 1 Batch: 11471/20099 (57.07%) Loss: 2.072848 LR: 0.00001918 +[13:10:40] Epoch: 1 Batch: 11472/20099 (57.08%) Loss: 1.899228 LR: 0.00001918 +[13:10:42] Epoch: 1 Batch: 11473/20099 (57.08%) Loss: 2.209858 LR: 0.00001918 +[13:10:43] Epoch: 1 Batch: 11474/20099 (57.09%) Loss: 2.413316 LR: 0.00001918 +[13:10:45] Epoch: 1 Batch: 11475/20099 (57.09%) Loss: 2.120059 LR: 0.00001918 +[13:10:47] Epoch: 1 Batch: 11476/20099 (57.10%) Loss: 2.231450 LR: 0.00001916 +[13:10:49] Epoch: 1 Batch: 11477/20099 (57.10%) Loss: 1.839533 LR: 0.00001916 +[13:10:51] Epoch: 1 Batch: 11478/20099 (57.11%) Loss: 1.803795 LR: 0.00001916 +[13:10:52] Epoch: 1 Batch: 11479/20099 (57.11%) Loss: 1.992642 LR: 0.00001916 +[13:10:54] Epoch: 1 Batch: 11480/20099 (57.12%) Loss: 2.046949 LR: 0.00001916 +[13:10:56] Epoch: 1 Batch: 11481/20099 (57.12%) Loss: 2.093298 LR: 0.00001916 +[13:10:58] Epoch: 1 Batch: 11482/20099 (57.13%) Loss: 2.037118 LR: 0.00001916 +[13:10:59] Epoch: 1 Batch: 11483/20099 (57.13%) Loss: 2.064366 LR: 0.00001914 +[13:11:01] Epoch: 1 Batch: 11484/20099 (57.14%) Loss: 2.094176 LR: 0.00001914 +[13:11:03] Epoch: 1 Batch: 11485/20099 (57.14%) Loss: 1.955821 LR: 0.00001914 +[13:11:05] Epoch: 1 Batch: 11486/20099 (57.15%) Loss: 1.956538 LR: 0.00001914 +[13:11:07] Epoch: 1 Batch: 11487/20099 (57.15%) Loss: 2.043616 LR: 0.00001914 +[13:11:08] Epoch: 1 Batch: 11488/20099 (57.16%) Loss: 2.080916 LR: 0.00001914 +[13:11:10] Epoch: 1 Batch: 11489/20099 (57.16%) Loss: 2.071719 LR: 0.00001914 +[13:11:12] Epoch: 1 Batch: 11490/20099 (57.17%) Loss: 2.172902 LR: 0.00001913 +[13:11:14] Epoch: 1 Batch: 11491/20099 (57.17%) Loss: 1.306231 LR: 0.00001913 +[13:11:15] Epoch: 1 Batch: 11492/20099 (57.18%) Loss: 1.795804 LR: 0.00001913 +[13:11:17] Epoch: 1 Batch: 11493/20099 (57.18%) Loss: 2.290836 LR: 0.00001913 +[13:11:19] Epoch: 1 Batch: 11494/20099 (57.19%) Loss: 2.129313 LR: 0.00001913 +[13:11:21] Epoch: 1 Batch: 11495/20099 (57.19%) Loss: 2.286456 
LR: 0.00001913 +[13:11:23] Epoch: 1 Batch: 11496/20099 (57.20%) Loss: 2.032724 LR: 0.00001913 +[13:11:24] Epoch: 1 Batch: 11497/20099 (57.20%) Loss: 1.884629 LR: 0.00001911 +[13:11:26] Epoch: 1 Batch: 11498/20099 (57.21%) Loss: 2.237799 LR: 0.00001911 +[13:11:28] Epoch: 1 Batch: 11499/20099 (57.21%) Loss: 2.120321 LR: 0.00001911 +[13:11:30] >> Evaluating batch 0 +[13:11:31] >> Evaluating batch 1 +[13:11:32] >> Evaluating batch 2 +[13:11:33] >> Evaluating batch 3 +[13:11:34] >> Evaluating batch 4 +[13:11:35] >> Evaluating batch 5 +[13:11:36] >> Evaluating batch 6 +[13:11:37] >> Evaluating batch 7 +[13:11:38] >> Evaluating batch 8 +[13:11:39] >> Evaluating batch 9 +[13:11:40] >> Evaluating batch 10 +[13:11:41] >> Evaluating batch 11 +[13:11:42] >> Evaluating batch 12 +[13:11:43] >> Evaluating batch 13 +[13:11:44] >> Evaluating batch 14 +[13:11:45] >> Evaluating batch 15 +[13:11:46] >> Evaluating batch 16 +[13:11:46] Epoch: 1 Step: 11500/20099 Evaluation: +[13:11:46] Avg Loss Since Last Eval: 2.0994 Val Loss: 2.1593 Validation loss delta: -0.0035 Perplexity: 8.6652 LR: 0.00001911 +[13:11:50] >> Checkpoint saved: epoch1_step11500, size: 0.1693 GB +[13:11:50] Epoch: 1 Batch: 11500/20099 (57.22%) Loss: 2.133309 LR: 0.00001911 +[13:11:52] Epoch: 1 Batch: 11501/20099 (57.22%) Loss: 1.876649 LR: 0.00001911 +[13:11:54] Epoch: 1 Batch: 11502/20099 (57.23%) Loss: 1.810048 LR: 0.00001911 +[13:11:55] Epoch: 1 Batch: 11503/20099 (57.23%) Loss: 1.999212 LR: 0.00001911 +[13:11:57] Epoch: 1 Batch: 11504/20099 (57.24%) Loss: 2.283310 LR: 0.00001909 +[13:11:59] Epoch: 1 Batch: 11505/20099 (57.24%) Loss: 2.253425 LR: 0.00001909 +[13:12:01] Epoch: 1 Batch: 11506/20099 (57.25%) Loss: 2.120220 LR: 0.00001909 +[13:12:02] Epoch: 1 Batch: 11507/20099 (57.25%) Loss: 2.219967 LR: 0.00001909 +[13:12:04] Epoch: 1 Batch: 11508/20099 (57.26%) Loss: 2.036957 LR: 0.00001909 +[13:12:06] Epoch: 1 Batch: 11509/20099 (57.26%) Loss: 2.228568 LR: 0.00001909 +[13:12:08] Epoch: 1 Batch: 11510/20099 
(57.27%) Loss: 2.105385 LR: 0.00001909 +[13:12:10] Epoch: 1 Batch: 11511/20099 (57.27%) Loss: 2.020589 LR: 0.00001908 +[13:12:11] Epoch: 1 Batch: 11512/20099 (57.28%) Loss: 2.192149 LR: 0.00001908 +[13:12:13] Epoch: 1 Batch: 11513/20099 (57.28%) Loss: 2.099419 LR: 0.00001908 +[13:12:15] Epoch: 1 Batch: 11514/20099 (57.29%) Loss: 2.081747 LR: 0.00001908 +[13:12:17] Epoch: 1 Batch: 11515/20099 (57.29%) Loss: 2.161253 LR: 0.00001908 +[13:12:18] Epoch: 1 Batch: 11516/20099 (57.30%) Loss: 1.832454 LR: 0.00001908 +[13:12:20] Epoch: 1 Batch: 11517/20099 (57.30%) Loss: 2.114930 LR: 0.00001908 +[13:12:22] Epoch: 1 Batch: 11518/20099 (57.31%) Loss: 2.044088 LR: 0.00001906 +[13:12:24] Epoch: 1 Batch: 11519/20099 (57.31%) Loss: 2.297429 LR: 0.00001906 +[13:12:26] Epoch: 1 Batch: 11520/20099 (57.32%) Loss: 1.994377 LR: 0.00001906 +[13:12:27] Epoch: 1 Batch: 11521/20099 (57.32%) Loss: 2.248152 LR: 0.00001906 +[13:12:29] Epoch: 1 Batch: 11522/20099 (57.33%) Loss: 2.106401 LR: 0.00001906 +[13:12:31] Epoch: 1 Batch: 11523/20099 (57.33%) Loss: 2.226057 LR: 0.00001906 +[13:12:33] Epoch: 1 Batch: 11524/20099 (57.34%) Loss: 1.804335 LR: 0.00001906 +[13:12:35] Epoch: 1 Batch: 11525/20099 (57.34%) Loss: 2.431558 LR: 0.00001905 +[13:12:36] Epoch: 1 Batch: 11526/20099 (57.35%) Loss: 2.196276 LR: 0.00001905 +[13:12:38] Epoch: 1 Batch: 11527/20099 (57.35%) Loss: 2.252018 LR: 0.00001905 +[13:12:40] Epoch: 1 Batch: 11528/20099 (57.36%) Loss: 2.260310 LR: 0.00001905 +[13:12:42] Epoch: 1 Batch: 11529/20099 (57.36%) Loss: 2.215660 LR: 0.00001905 +[13:12:43] Epoch: 1 Batch: 11530/20099 (57.37%) Loss: 2.292562 LR: 0.00001905 +[13:12:45] Epoch: 1 Batch: 11531/20099 (57.37%) Loss: 2.105291 LR: 0.00001905 +[13:12:47] Epoch: 1 Batch: 11532/20099 (57.38%) Loss: 2.257446 LR: 0.00001903 +[13:12:49] Epoch: 1 Batch: 11533/20099 (57.38%) Loss: 2.469804 LR: 0.00001903 +[13:12:50] Epoch: 1 Batch: 11534/20099 (57.39%) Loss: 2.128227 LR: 0.00001903 +[13:12:52] Epoch: 1 Batch: 11535/20099 (57.39%) Loss: 2.174700 
LR: 0.00001903 +[13:12:54] Epoch: 1 Batch: 11536/20099 (57.40%) Loss: 1.709875 LR: 0.00001903 +[13:12:56] Epoch: 1 Batch: 11537/20099 (57.40%) Loss: 2.152450 LR: 0.00001903 +[13:12:58] Epoch: 1 Batch: 11538/20099 (57.41%) Loss: 2.329833 LR: 0.00001903 +[13:12:59] Epoch: 1 Batch: 11539/20099 (57.41%) Loss: 2.260255 LR: 0.00001901 +[13:13:01] Epoch: 1 Batch: 11540/20099 (57.42%) Loss: 1.657529 LR: 0.00001901 +[13:13:03] Epoch: 1 Batch: 11541/20099 (57.42%) Loss: 1.797801 LR: 0.00001901 +[13:13:05] Epoch: 1 Batch: 11542/20099 (57.43%) Loss: 1.814854 LR: 0.00001901 +[13:13:06] Epoch: 1 Batch: 11543/20099 (57.43%) Loss: 1.984634 LR: 0.00001901 +[13:13:08] Epoch: 1 Batch: 11544/20099 (57.44%) Loss: 2.216083 LR: 0.00001901 +[13:13:10] Epoch: 1 Batch: 11545/20099 (57.44%) Loss: 2.156146 LR: 0.00001901 +[13:13:12] Epoch: 1 Batch: 11546/20099 (57.45%) Loss: 2.170917 LR: 0.00001900 +[13:13:13] Epoch: 1 Batch: 11547/20099 (57.45%) Loss: 1.994955 LR: 0.00001900 +[13:13:15] Epoch: 1 Batch: 11548/20099 (57.46%) Loss: 2.226553 LR: 0.00001900 +[13:13:17] Epoch: 1 Batch: 11549/20099 (57.46%) Loss: 2.305378 LR: 0.00001900 +[13:13:19] Epoch: 1 Batch: 11550/20099 (57.47%) Loss: 1.807726 LR: 0.00001900 +[13:13:20] Epoch: 1 Batch: 11551/20099 (57.47%) Loss: 2.128574 LR: 0.00001900 +[13:13:22] Epoch: 1 Batch: 11552/20099 (57.48%) Loss: 2.223904 LR: 0.00001900 +[13:13:24] Epoch: 1 Batch: 11553/20099 (57.48%) Loss: 2.077194 LR: 0.00001898 +[13:13:26] Epoch: 1 Batch: 11554/20099 (57.49%) Loss: 2.172306 LR: 0.00001898 +[13:13:28] Epoch: 1 Batch: 11555/20099 (57.49%) Loss: 2.135675 LR: 0.00001898 +[13:13:29] Epoch: 1 Batch: 11556/20099 (57.50%) Loss: 2.065298 LR: 0.00001898 +[13:13:31] Epoch: 1 Batch: 11557/20099 (57.50%) Loss: 2.146181 LR: 0.00001898 +[13:13:33] Epoch: 1 Batch: 11558/20099 (57.51%) Loss: 1.915734 LR: 0.00001898 +[13:13:35] Epoch: 1 Batch: 11559/20099 (57.51%) Loss: 2.116762 LR: 0.00001898 +[13:13:36] Epoch: 1 Batch: 11560/20099 (57.52%) Loss: 2.252793 LR: 0.00001897 
+[13:13:38] Epoch: 1 Batch: 11561/20099 (57.52%) Loss: 1.907568 LR: 0.00001897 +[13:13:40] Epoch: 1 Batch: 11562/20099 (57.53%) Loss: 2.268930 LR: 0.00001897 +[13:13:42] Epoch: 1 Batch: 11563/20099 (57.53%) Loss: 2.167648 LR: 0.00001897 +[13:13:43] Epoch: 1 Batch: 11564/20099 (57.54%) Loss: 1.898059 LR: 0.00001897 +[13:13:45] Epoch: 1 Batch: 11565/20099 (57.54%) Loss: 2.067606 LR: 0.00001897 +[13:13:47] Epoch: 1 Batch: 11566/20099 (57.55%) Loss: 2.174708 LR: 0.00001897 +[13:13:49] Epoch: 1 Batch: 11567/20099 (57.55%) Loss: 2.463333 LR: 0.00001895 +[13:13:51] Epoch: 1 Batch: 11568/20099 (57.56%) Loss: 2.193546 LR: 0.00001895 +[13:13:52] Epoch: 1 Batch: 11569/20099 (57.56%) Loss: 2.292379 LR: 0.00001895 +[13:13:54] Epoch: 1 Batch: 11570/20099 (57.57%) Loss: 2.283432 LR: 0.00001895 +[13:13:56] Epoch: 1 Batch: 11571/20099 (57.57%) Loss: 2.025387 LR: 0.00001895 +[13:13:58] Epoch: 1 Batch: 11572/20099 (57.58%) Loss: 2.224511 LR: 0.00001895 +[13:13:59] Epoch: 1 Batch: 11573/20099 (57.58%) Loss: 1.927526 LR: 0.00001895 +[13:14:01] Epoch: 1 Batch: 11574/20099 (57.58%) Loss: 2.233892 LR: 0.00001893 +[13:14:03] Epoch: 1 Batch: 11575/20099 (57.59%) Loss: 2.209314 LR: 0.00001893 +[13:14:05] Epoch: 1 Batch: 11576/20099 (57.59%) Loss: 2.044917 LR: 0.00001893 +[13:14:06] Epoch: 1 Batch: 11577/20099 (57.60%) Loss: 2.163279 LR: 0.00001893 +[13:14:08] Epoch: 1 Batch: 11578/20099 (57.60%) Loss: 2.140867 LR: 0.00001893 +[13:14:10] Epoch: 1 Batch: 11579/20099 (57.61%) Loss: 1.925324 LR: 0.00001893 +[13:14:12] Epoch: 1 Batch: 11580/20099 (57.61%) Loss: 2.232747 LR: 0.00001893 +[13:14:14] Epoch: 1 Batch: 11581/20099 (57.62%) Loss: 2.407112 LR: 0.00001892 +[13:14:15] Epoch: 1 Batch: 11582/20099 (57.62%) Loss: 2.089918 LR: 0.00001892 +[13:14:17] Epoch: 1 Batch: 11583/20099 (57.63%) Loss: 2.156255 LR: 0.00001892 +[13:14:19] Epoch: 1 Batch: 11584/20099 (57.63%) Loss: 2.085344 LR: 0.00001892 +[13:14:21] Epoch: 1 Batch: 11585/20099 (57.64%) Loss: 2.082570 LR: 0.00001892 +[13:14:22] Epoch: 1 
Batch: 11586/20099 (57.64%) Loss: 1.584989 LR: 0.00001892 +[13:14:24] Epoch: 1 Batch: 11587/20099 (57.65%) Loss: 2.285981 LR: 0.00001892 +[13:14:26] Epoch: 1 Batch: 11588/20099 (57.65%) Loss: 2.106760 LR: 0.00001890 +[13:14:28] Epoch: 1 Batch: 11589/20099 (57.66%) Loss: 2.158588 LR: 0.00001890 +[13:14:29] Epoch: 1 Batch: 11590/20099 (57.66%) Loss: 2.029811 LR: 0.00001890 +[13:14:31] Epoch: 1 Batch: 11591/20099 (57.67%) Loss: 2.087928 LR: 0.00001890 +[13:14:33] Epoch: 1 Batch: 11592/20099 (57.67%) Loss: 2.069029 LR: 0.00001890 +[13:14:35] Epoch: 1 Batch: 11593/20099 (57.68%) Loss: 1.928134 LR: 0.00001890 +[13:14:37] Epoch: 1 Batch: 11594/20099 (57.68%) Loss: 2.050188 LR: 0.00001890 +[13:14:38] Epoch: 1 Batch: 11595/20099 (57.69%) Loss: 2.157734 LR: 0.00001888 +[13:14:40] Epoch: 1 Batch: 11596/20099 (57.69%) Loss: 2.152465 LR: 0.00001888 +[13:14:42] Epoch: 1 Batch: 11597/20099 (57.70%) Loss: 2.212037 LR: 0.00001888 +[13:14:44] Epoch: 1 Batch: 11598/20099 (57.70%) Loss: 2.062583 LR: 0.00001888 +[13:14:45] Epoch: 1 Batch: 11599/20099 (57.71%) Loss: 2.122757 LR: 0.00001888 +[13:14:51] >> Cleaned up old temp checkpoint: epoch1_step9600 +[13:14:51] >> Temp checkpoint saved: epoch1_step11600, size: 0.1693 GB +[13:14:51] Epoch: 1 Batch: 11600/20099 (57.71%) Loss: 2.018181 LR: 0.00001888 +[13:14:53] Epoch: 1 Batch: 11601/20099 (57.72%) Loss: 2.156212 LR: 0.00001888 +[13:14:54] Epoch: 1 Batch: 11602/20099 (57.72%) Loss: 2.090179 LR: 0.00001887 +[13:14:56] Epoch: 1 Batch: 11603/20099 (57.73%) Loss: 1.807906 LR: 0.00001887 +[13:14:58] Epoch: 1 Batch: 11604/20099 (57.73%) Loss: 2.004309 LR: 0.00001887 +[13:15:00] Epoch: 1 Batch: 11605/20099 (57.74%) Loss: 2.192232 LR: 0.00001887 +[13:15:01] Epoch: 1 Batch: 11606/20099 (57.74%) Loss: 1.983692 LR: 0.00001887 +[13:15:03] Epoch: 1 Batch: 11607/20099 (57.75%) Loss: 1.958194 LR: 0.00001887 +[13:15:05] Epoch: 1 Batch: 11608/20099 (57.75%) Loss: 1.943734 LR: 0.00001887 +[13:15:07] Epoch: 1 Batch: 11609/20099 (57.76%) Loss: 1.857016 LR: 
0.00001885 +[13:15:08] Epoch: 1 Batch: 11610/20099 (57.76%) Loss: 2.328492 LR: 0.00001885 +[13:15:10] Epoch: 1 Batch: 11611/20099 (57.77%) Loss: 2.152304 LR: 0.00001885 +[13:15:12] Epoch: 1 Batch: 11612/20099 (57.77%) Loss: 2.180845 LR: 0.00001885 +[13:15:14] Epoch: 1 Batch: 11613/20099 (57.78%) Loss: 2.161557 LR: 0.00001885 +[13:15:16] Epoch: 1 Batch: 11614/20099 (57.78%) Loss: 2.205563 LR: 0.00001885 +[13:15:17] Epoch: 1 Batch: 11615/20099 (57.79%) Loss: 2.190918 LR: 0.00001885 +[13:15:19] Epoch: 1 Batch: 11616/20099 (57.79%) Loss: 1.940868 LR: 0.00001884 +[13:15:21] Epoch: 1 Batch: 11617/20099 (57.80%) Loss: 1.854971 LR: 0.00001884 +[13:15:23] Epoch: 1 Batch: 11618/20099 (57.80%) Loss: 2.152514 LR: 0.00001884 +[13:15:25] Epoch: 1 Batch: 11619/20099 (57.81%) Loss: 1.921189 LR: 0.00001884 +[13:15:26] Epoch: 1 Batch: 11620/20099 (57.81%) Loss: 2.131121 LR: 0.00001884 +[13:15:28] Epoch: 1 Batch: 11621/20099 (57.82%) Loss: 2.126243 LR: 0.00001884 +[13:15:30] Epoch: 1 Batch: 11622/20099 (57.82%) Loss: 1.669983 LR: 0.00001884 +[13:15:32] Epoch: 1 Batch: 11623/20099 (57.83%) Loss: 2.320068 LR: 0.00001882 +[13:15:34] Epoch: 1 Batch: 11624/20099 (57.83%) Loss: 1.949655 LR: 0.00001882 +[13:15:35] Epoch: 1 Batch: 11625/20099 (57.84%) Loss: 2.391958 LR: 0.00001882 +[13:15:37] Epoch: 1 Batch: 11626/20099 (57.84%) Loss: 1.898051 LR: 0.00001882 +[13:15:39] Epoch: 1 Batch: 11627/20099 (57.85%) Loss: 2.187440 LR: 0.00001882 +[13:15:41] Epoch: 1 Batch: 11628/20099 (57.85%) Loss: 2.429462 LR: 0.00001882 +[13:15:42] Epoch: 1 Batch: 11629/20099 (57.86%) Loss: 1.917211 LR: 0.00001882 +[13:15:44] Epoch: 1 Batch: 11630/20099 (57.86%) Loss: 2.011681 LR: 0.00001880 +[13:15:46] Epoch: 1 Batch: 11631/20099 (57.87%) Loss: 2.390054 LR: 0.00001880 +[13:15:48] Epoch: 1 Batch: 11632/20099 (57.87%) Loss: 1.939198 LR: 0.00001880 +[13:15:50] Epoch: 1 Batch: 11633/20099 (57.88%) Loss: 1.925243 LR: 0.00001880 +[13:15:51] Epoch: 1 Batch: 11634/20099 (57.88%) Loss: 1.961159 LR: 0.00001880 +[13:15:53] 
Epoch: 1 Batch: 11635/20099 (57.89%) Loss: 2.227958 LR: 0.00001880 +[13:15:55] Epoch: 1 Batch: 11636/20099 (57.89%) Loss: 2.033913 LR: 0.00001880 +[13:15:57] Epoch: 1 Batch: 11637/20099 (57.90%) Loss: 2.320921 LR: 0.00001879 +[13:15:58] Epoch: 1 Batch: 11638/20099 (57.90%) Loss: 2.222155 LR: 0.00001879 +[13:16:00] Epoch: 1 Batch: 11639/20099 (57.91%) Loss: 2.124020 LR: 0.00001879 +[13:16:02] Epoch: 1 Batch: 11640/20099 (57.91%) Loss: 1.926155 LR: 0.00001879 +[13:16:04] Epoch: 1 Batch: 11641/20099 (57.92%) Loss: 1.899636 LR: 0.00001879 +[13:16:05] Epoch: 1 Batch: 11642/20099 (57.92%) Loss: 2.078834 LR: 0.00001879 +[13:16:07] Epoch: 1 Batch: 11643/20099 (57.93%) Loss: 2.304538 LR: 0.00001879 +[13:16:09] Epoch: 1 Batch: 11644/20099 (57.93%) Loss: 2.042716 LR: 0.00001877 +[13:16:11] Epoch: 1 Batch: 11645/20099 (57.94%) Loss: 2.162959 LR: 0.00001877 +[13:16:12] Epoch: 1 Batch: 11646/20099 (57.94%) Loss: 2.268860 LR: 0.00001877 +[13:16:14] Epoch: 1 Batch: 11647/20099 (57.95%) Loss: 2.438877 LR: 0.00001877 +[13:16:16] Epoch: 1 Batch: 11648/20099 (57.95%) Loss: 2.039470 LR: 0.00001877 +[13:16:18] Epoch: 1 Batch: 11649/20099 (57.96%) Loss: 2.131665 LR: 0.00001877 +[13:16:20] Epoch: 1 Batch: 11650/20099 (57.96%) Loss: 1.800963 LR: 0.00001877 +[13:16:21] Epoch: 1 Batch: 11651/20099 (57.97%) Loss: 1.808833 LR: 0.00001875 +[13:16:23] Epoch: 1 Batch: 11652/20099 (57.97%) Loss: 2.138390 LR: 0.00001875 +[13:16:25] Epoch: 1 Batch: 11653/20099 (57.98%) Loss: 2.073210 LR: 0.00001875 +[13:16:27] Epoch: 1 Batch: 11654/20099 (57.98%) Loss: 1.930101 LR: 0.00001875 +[13:16:28] Epoch: 1 Batch: 11655/20099 (57.99%) Loss: 2.105002 LR: 0.00001875 +[13:16:30] Epoch: 1 Batch: 11656/20099 (57.99%) Loss: 1.822292 LR: 0.00001875 +[13:16:32] Epoch: 1 Batch: 11657/20099 (58.00%) Loss: 1.911726 LR: 0.00001875 +[13:16:34] Epoch: 1 Batch: 11658/20099 (58.00%) Loss: 2.213386 LR: 0.00001874 +[13:16:35] Epoch: 1 Batch: 11659/20099 (58.01%) Loss: 2.286058 LR: 0.00001874 +[13:16:37] Epoch: 1 Batch: 
11660/20099 (58.01%) Loss: 2.118092 LR: 0.00001874 +[13:16:39] Epoch: 1 Batch: 11661/20099 (58.02%) Loss: 1.874805 LR: 0.00001874 +[13:16:41] Epoch: 1 Batch: 11662/20099 (58.02%) Loss: 1.997573 LR: 0.00001874 +[13:16:43] Epoch: 1 Batch: 11663/20099 (58.03%) Loss: 1.860676 LR: 0.00001874 +[13:16:44] Epoch: 1 Batch: 11664/20099 (58.03%) Loss: 2.221928 LR: 0.00001874 +[13:16:46] Epoch: 1 Batch: 11665/20099 (58.04%) Loss: 2.073497 LR: 0.00001872 +[13:16:48] Epoch: 1 Batch: 11666/20099 (58.04%) Loss: 1.703162 LR: 0.00001872 +[13:16:50] Epoch: 1 Batch: 11667/20099 (58.05%) Loss: 2.015299 LR: 0.00001872 +[13:16:51] Epoch: 1 Batch: 11668/20099 (58.05%) Loss: 2.110135 LR: 0.00001872 +[13:16:53] Epoch: 1 Batch: 11669/20099 (58.06%) Loss: 2.044107 LR: 0.00001872 +[13:16:55] Epoch: 1 Batch: 11670/20099 (58.06%) Loss: 2.275591 LR: 0.00001872 +[13:16:57] Epoch: 1 Batch: 11671/20099 (58.07%) Loss: 2.307717 LR: 0.00001872 +[13:16:58] Epoch: 1 Batch: 11672/20099 (58.07%) Loss: 1.963907 LR: 0.00001871 +[13:17:00] Epoch: 1 Batch: 11673/20099 (58.08%) Loss: 2.024804 LR: 0.00001871 +[13:17:02] Epoch: 1 Batch: 11674/20099 (58.08%) Loss: 2.347996 LR: 0.00001871 +[13:17:04] Epoch: 1 Batch: 11675/20099 (58.09%) Loss: 2.136356 LR: 0.00001871 +[13:17:06] Epoch: 1 Batch: 11676/20099 (58.09%) Loss: 2.091377 LR: 0.00001871 +[13:17:07] Epoch: 1 Batch: 11677/20099 (58.10%) Loss: 2.078589 LR: 0.00001871 +[13:17:09] Epoch: 1 Batch: 11678/20099 (58.10%) Loss: 2.266065 LR: 0.00001871 +[13:17:11] Epoch: 1 Batch: 11679/20099 (58.11%) Loss: 1.983041 LR: 0.00001869 +[13:17:13] Epoch: 1 Batch: 11680/20099 (58.11%) Loss: 1.999972 LR: 0.00001869 +[13:17:14] Epoch: 1 Batch: 11681/20099 (58.12%) Loss: 2.151552 LR: 0.00001869 +[13:17:16] Epoch: 1 Batch: 11682/20099 (58.12%) Loss: 2.087489 LR: 0.00001869 +[13:17:18] Epoch: 1 Batch: 11683/20099 (58.13%) Loss: 1.714874 LR: 0.00001869 +[13:17:20] Epoch: 1 Batch: 11684/20099 (58.13%) Loss: 1.909625 LR: 0.00001869 +[13:17:22] Epoch: 1 Batch: 11685/20099 (58.14%) 
Loss: 2.236058 LR: 0.00001869 +[13:17:23] Epoch: 1 Batch: 11686/20099 (58.14%) Loss: 1.885413 LR: 0.00001867 +[13:17:25] Epoch: 1 Batch: 11687/20099 (58.15%) Loss: 2.283195 LR: 0.00001867 +[13:17:27] Epoch: 1 Batch: 11688/20099 (58.15%) Loss: 1.883298 LR: 0.00001867 +[13:17:29] Epoch: 1 Batch: 11689/20099 (58.16%) Loss: 2.055175 LR: 0.00001867 +[13:17:31] Epoch: 1 Batch: 11690/20099 (58.16%) Loss: 1.967116 LR: 0.00001867 +[13:17:32] Epoch: 1 Batch: 11691/20099 (58.17%) Loss: 1.977898 LR: 0.00001867 +[13:17:34] Epoch: 1 Batch: 11692/20099 (58.17%) Loss: 1.840670 LR: 0.00001867 +[13:17:36] Epoch: 1 Batch: 11693/20099 (58.18%) Loss: 2.174229 LR: 0.00001866 +[13:17:38] Epoch: 1 Batch: 11694/20099 (58.18%) Loss: 2.390308 LR: 0.00001866 +[13:17:39] Epoch: 1 Batch: 11695/20099 (58.19%) Loss: 1.990697 LR: 0.00001866 +[13:17:41] Epoch: 1 Batch: 11696/20099 (58.19%) Loss: 2.169670 LR: 0.00001866 +[13:17:43] Epoch: 1 Batch: 11697/20099 (58.20%) Loss: 2.138491 LR: 0.00001866 +[13:17:45] Epoch: 1 Batch: 11698/20099 (58.20%) Loss: 2.178364 LR: 0.00001866 +[13:17:46] Epoch: 1 Batch: 11699/20099 (58.21%) Loss: 1.960036 LR: 0.00001866 +[13:17:48] Epoch: 1 Batch: 11700/20099 (58.21%) Loss: 2.222705 LR: 0.00001864 +[13:17:50] Epoch: 1 Batch: 11701/20099 (58.22%) Loss: 2.192986 LR: 0.00001864 +[13:17:52] Epoch: 1 Batch: 11702/20099 (58.22%) Loss: 2.476934 LR: 0.00001864 +[13:17:54] Epoch: 1 Batch: 11703/20099 (58.23%) Loss: 2.118600 LR: 0.00001864 +[13:17:55] Epoch: 1 Batch: 11704/20099 (58.23%) Loss: 2.107777 LR: 0.00001864 +[13:17:57] Epoch: 1 Batch: 11705/20099 (58.24%) Loss: 1.888525 LR: 0.00001864 +[13:17:59] Epoch: 1 Batch: 11706/20099 (58.24%) Loss: 2.284290 LR: 0.00001864 +[13:18:01] Epoch: 1 Batch: 11707/20099 (58.25%) Loss: 1.869882 LR: 0.00001863 +[13:18:02] Epoch: 1 Batch: 11708/20099 (58.25%) Loss: 2.150631 LR: 0.00001863 +[13:18:04] Epoch: 1 Batch: 11709/20099 (58.26%) Loss: 2.212356 LR: 0.00001863 +[13:18:06] Epoch: 1 Batch: 11710/20099 (58.26%) Loss: 2.236769 LR: 
0.00001863 +[13:18:08] Epoch: 1 Batch: 11711/20099 (58.27%) Loss: 1.888924 LR: 0.00001863 +[13:18:10] Epoch: 1 Batch: 11712/20099 (58.27%) Loss: 2.198228 LR: 0.00001863 +[13:18:11] Epoch: 1 Batch: 11713/20099 (58.28%) Loss: 2.288591 LR: 0.00001863 +[13:18:13] Epoch: 1 Batch: 11714/20099 (58.28%) Loss: 2.357502 LR: 0.00001861 +[13:18:15] Epoch: 1 Batch: 11715/20099 (58.29%) Loss: 2.147627 LR: 0.00001861 +[13:18:17] Epoch: 1 Batch: 11716/20099 (58.29%) Loss: 2.183411 LR: 0.00001861 +[13:18:18] Epoch: 1 Batch: 11717/20099 (58.30%) Loss: 2.368242 LR: 0.00001861 +[13:18:20] Epoch: 1 Batch: 11718/20099 (58.30%) Loss: 2.001774 LR: 0.00001861 +[13:18:22] Epoch: 1 Batch: 11719/20099 (58.31%) Loss: 2.183414 LR: 0.00001861 +[13:18:24] Epoch: 1 Batch: 11720/20099 (58.31%) Loss: 2.261663 LR: 0.00001861 +[13:18:26] Epoch: 1 Batch: 11721/20099 (58.32%) Loss: 2.404812 LR: 0.00001859 +[13:18:27] Epoch: 1 Batch: 11722/20099 (58.32%) Loss: 1.867872 LR: 0.00001859 +[13:18:29] Epoch: 1 Batch: 11723/20099 (58.33%) Loss: 2.090527 LR: 0.00001859 +[13:18:31] Epoch: 1 Batch: 11724/20099 (58.33%) Loss: 2.006189 LR: 0.00001859 +[13:18:33] Epoch: 1 Batch: 11725/20099 (58.34%) Loss: 2.014920 LR: 0.00001859 +[13:18:34] Epoch: 1 Batch: 11726/20099 (58.34%) Loss: 2.088670 LR: 0.00001859 +[13:18:36] Epoch: 1 Batch: 11727/20099 (58.35%) Loss: 2.055283 LR: 0.00001859 +[13:18:38] Epoch: 1 Batch: 11728/20099 (58.35%) Loss: 2.098110 LR: 0.00001858 +[13:18:40] Epoch: 1 Batch: 11729/20099 (58.36%) Loss: 1.762565 LR: 0.00001858 +[13:18:41] Epoch: 1 Batch: 11730/20099 (58.36%) Loss: 1.891574 LR: 0.00001858 +[13:18:43] Epoch: 1 Batch: 11731/20099 (58.37%) Loss: 2.306237 LR: 0.00001858 +[13:18:45] Epoch: 1 Batch: 11732/20099 (58.37%) Loss: 2.124381 LR: 0.00001858 +[13:18:47] Epoch: 1 Batch: 11733/20099 (58.38%) Loss: 1.964939 LR: 0.00001858 +[13:18:49] Epoch: 1 Batch: 11734/20099 (58.38%) Loss: 1.894481 LR: 0.00001858 +[13:18:50] Epoch: 1 Batch: 11735/20099 (58.39%) Loss: 2.205078 LR: 0.00001856 +[13:18:52] 
Epoch: 1 Batch: 11736/20099 (58.39%) Loss: 1.677717 LR: 0.00001856 +[13:18:54] Epoch: 1 Batch: 11737/20099 (58.40%) Loss: 2.454359 LR: 0.00001856 +[13:18:56] Epoch: 1 Batch: 11738/20099 (58.40%) Loss: 2.226118 LR: 0.00001856 +[13:18:57] Epoch: 1 Batch: 11739/20099 (58.41%) Loss: 2.097765 LR: 0.00001856 +[13:18:59] Epoch: 1 Batch: 11740/20099 (58.41%) Loss: 1.938450 LR: 0.00001856 +[13:19:01] Epoch: 1 Batch: 11741/20099 (58.42%) Loss: 1.986032 LR: 0.00001856 +[13:19:03] Epoch: 1 Batch: 11742/20099 (58.42%) Loss: 2.220109 LR: 0.00001854 +[13:19:05] Epoch: 1 Batch: 11743/20099 (58.43%) Loss: 1.878649 LR: 0.00001854 +[13:19:06] Epoch: 1 Batch: 11744/20099 (58.43%) Loss: 1.951298 LR: 0.00001854 +[13:19:08] Epoch: 1 Batch: 11745/20099 (58.44%) Loss: 1.909854 LR: 0.00001854 +[13:19:10] Epoch: 1 Batch: 11746/20099 (58.44%) Loss: 2.061608 LR: 0.00001854 +[13:19:12] Epoch: 1 Batch: 11747/20099 (58.45%) Loss: 2.377390 LR: 0.00001854 +[13:19:13] Epoch: 1 Batch: 11748/20099 (58.45%) Loss: 2.206749 LR: 0.00001854 +[13:19:15] Epoch: 1 Batch: 11749/20099 (58.46%) Loss: 2.091898 LR: 0.00001853 +[13:19:17] Epoch: 1 Batch: 11750/20099 (58.46%) Loss: 2.100125 LR: 0.00001853 +[13:19:19] Epoch: 1 Batch: 11751/20099 (58.47%) Loss: 1.944708 LR: 0.00001853 +[13:19:20] Epoch: 1 Batch: 11752/20099 (58.47%) Loss: 1.943774 LR: 0.00001853 +[13:19:22] Epoch: 1 Batch: 11753/20099 (58.48%) Loss: 2.211108 LR: 0.00001853 +[13:19:24] Epoch: 1 Batch: 11754/20099 (58.48%) Loss: 2.144678 LR: 0.00001853 +[13:19:26] Epoch: 1 Batch: 11755/20099 (58.49%) Loss: 2.209696 LR: 0.00001853 +[13:19:28] Epoch: 1 Batch: 11756/20099 (58.49%) Loss: 2.116935 LR: 0.00001851 +[13:19:29] Epoch: 1 Batch: 11757/20099 (58.50%) Loss: 1.694196 LR: 0.00001851 +[13:19:31] Epoch: 1 Batch: 11758/20099 (58.50%) Loss: 2.308311 LR: 0.00001851 +[13:19:33] Epoch: 1 Batch: 11759/20099 (58.51%) Loss: 2.216196 LR: 0.00001851 +[13:19:35] Epoch: 1 Batch: 11760/20099 (58.51%) Loss: 2.125787 LR: 0.00001851 +[13:19:36] Epoch: 1 Batch: 
11761/20099 (58.52%) Loss: 2.083488 LR: 0.00001851 +[13:19:38] Epoch: 1 Batch: 11762/20099 (58.52%) Loss: 2.249752 LR: 0.00001851 +[13:19:40] Epoch: 1 Batch: 11763/20099 (58.53%) Loss: 2.244584 LR: 0.00001850 +[13:19:42] Epoch: 1 Batch: 11764/20099 (58.53%) Loss: 2.289947 LR: 0.00001850 +[13:19:44] Epoch: 1 Batch: 11765/20099 (58.54%) Loss: 2.301536 LR: 0.00001850 +[13:40:38] 2025-08-23 +[13:40:38] Tesla T4 +[13:40:38] +|===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| Active memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| Requested memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| GPU reserved memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| Allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Active allocs | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| GPU reserved segments | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize 
allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +[13:40:38] CPU usage: 38.0%, RAM usage: 26.7% +[13:40:38] Running with the following configuration: +[13:40:38] model_name: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B +[13:40:38] tokenizer: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B +[13:40:38] output_dir: /content/drive/MyDrive/llm/Discord-Hermes-3-8B +[13:40:38] train_path: /content/drive/MyDrive/data/None156_fix.csv +[13:40:38] checkpoint: /content/drive/MyDrive/llm/Discord-Hermes-3-8B/temp/epoch1_step11600 +[13:40:38] lr: 3e-05 +[13:40:38] lr_floor: 6e-06 +[13:40:38] epochs: 1 +[13:40:38] batch_size: 5 +[13:40:38] accum_steps: 7 +[13:40:38] val_batch_size: 6 +[13:40:38] max_val_size: 100 +[13:40:38] max_length: 150 +[13:40:38] save_temp_frequency: 200 +[13:40:38] save_frequency: 500 +[13:40:38] eval_frequency: 500 +[13:40:38] save_pattern: y +[13:40:38] quantization: y +[13:40:38] quantization_bits: 4 +[13:40:38] lora: y +[13:40:38] frozen_lora_path: None +[13:40:38] lora_rank: 16 +[13:40:38] lora_alpha: 32 +[13:40:38] lora_dropout: 0.1 +[13:40:38] optimizer_weight_decay: 0.0 +[13:40:38] warmup_type: cosine +[13:40:38] warmup_ratio: 0.08 +[13:40:38] warmup_steps: 550 +[13:40:38] shuffle: y +[13:40:38] csv_column: text +[13:40:38] new_run: n +[13:40:38] label_smoothing: 0.05 +[13:40:38] SEED: 1 +[13:40:38] Using device: cuda +[13:40:39] Resuming from temp checkpoint: /content/drive/MyDrive/llm/Discord-Hermes-3-8B/temp/epoch1_step11600 +[13:46:13] Embeddings shape after: torch.Size([128256, 4096]) +[13:46:19] Loaded trainable LoRA adapter from /content/drive/MyDrive/llm/Discord-Hermes-3-8B/temp/epoch1_step11600 +[13:46:19] Trainable LoRA 'default': +[13:46:19] task_type: CAUSAL_LM +[13:46:19] peft_type: PeftType.LORA +[13:46:19] auto_mapping: 
None +[13:46:19] base_model_name_or_path: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B +[13:46:19] revision: None +[13:46:19] inference_mode: False +[13:46:19] r: 16 +[13:46:19] target_modules: {'q_proj', 'k_proj', 'o_proj', 'v_proj'} +[13:46:19] exclude_modules: None +[13:46:19] lora_alpha: 32 +[13:46:19] lora_dropout: 0.1 +[13:46:19] fan_in_fan_out: False +[13:46:19] bias: none +[13:46:19] use_rslora: True +[13:46:19] modules_to_save: None +[13:46:19] init_lora_weights: True +[13:46:19] layers_to_transform: None +[13:46:19] layers_pattern: None +[13:46:19] rank_pattern: {} +[13:46:19] alpha_pattern: {} +[13:46:19] megatron_config: None +[13:46:19] megatron_core: megatron.core +[13:46:19] trainable_token_indices: None +[13:46:19] loftq_config: {} +[13:46:19] eva_config: None +[13:46:19] corda_config: None +[13:46:19] use_dora: False +[13:46:19] use_qalora: False +[13:46:19] qalora_group_size: 16 +[13:46:20] layer_replication: None +[13:46:20] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) +[13:46:20] lora_bias: False +[13:46:20] target_parameters: None +[13:46:20] _custom_modules: None +[13:46:20] Embeddings shape after: torch.Size([128256, 4096]) +[13:46:27] Resumed from epoch 1, step 11601, file 1 +[13:46:27] Starting from CSV file... +[13:46:30] Splitting data into chunks of 11000... +[13:46:30] Using 7 processes across 10 chunks +[13:46:31] Using saved train/val split from checkpoint. +[13:46:31] Resuming scheduler with warmup steps: 229, total steps: 2871 +[13:46:31] Initializing scheduler with cosine schedule with warmup, warmup steps 550, total steps: 2871 +[13:46:31] Train/Val split: 100492 train, 100 val samples. 
+[13:46:40] Model: PeftModelForCausalLM +[13:46:40] Model config: LlamaConfig { + "architectures": [ + "LlamaForCausalLM" + ], + "attention_bias": false, + "attention_dropout": 0.0, + "bos_token_id": 128000, + "eos_token_id": 128040, + "head_dim": 128, + "hidden_act": "silu", + "hidden_size": 4096, + "initializer_range": 0.02, + "intermediate_size": 14336, + "max_position_embeddings": 131072, + "mlp_bias": false, + "model_type": "llama", + "num_attention_heads": 32, + "num_hidden_layers": 32, + "num_key_value_heads": 8, + "pretraining_tp": 1, + "quantization_config": { + "_load_in_4bit": true, + "_load_in_8bit": false, + "bnb_4bit_compute_dtype": "float16", + "bnb_4bit_quant_storage": "uint8", + "bnb_4bit_quant_type": "nf4", + "bnb_4bit_use_double_quant": true, + "llm_int8_enable_fp32_cpu_offload": false, + "llm_int8_has_fp16_weight": false, + "llm_int8_skip_modules": [ + "lm_head" + ], + "llm_int8_threshold": 6.0, + "load_in_4bit": true, + "load_in_8bit": false, + "quant_method": "bitsandbytes" + }, + "rms_norm_eps": 1e-05, + "rope_scaling": { + "factor": 8.0, + "high_freq_factor": 4.0, + "low_freq_factor": 1.0, + "original_max_position_embeddings": 8192, + "rope_type": "llama3" + }, + "rope_theta": 500000.0, + "tie_word_embeddings": false, + "torch_dtype": "float16", + "transformers_version": "4.55.2", + "use_cache": true, + "vocab_size": 128256 +} + +[13:46:40] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 +[13:46:40] +Optimizer: PagedAdamW ( +Parameter Group 0 + alpha: 0.0 + betas: (0.9, 0.95) + eps: 1e-08 + initial_lr: 3e-05 + lr: 0.0 + t_alpha: None + t_beta3: None + weight_decay: 0.0 +) +[13:46:40] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 +[13:46:40] Scheduler: +[13:46:40] Training on 100492 training samples, 100 validation samples +[13:46:40] Average tokens per sample: 150.00 +[13:46:40] Estimated epoch time: ~296.87 min +[13:46:40] +|===========================================================================| +| PyTorch 
CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 5986 MiB | 7004 MiB | 335397 MiB | 329410 MiB | +|---------------------------------------------------------------------------| +| Active memory | 5986 MiB | 7004 MiB | 335397 MiB | 329410 MiB | +|---------------------------------------------------------------------------| +| Requested memory | 5983 MiB | 7000 MiB | 335022 MiB | 329039 MiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 7248 MiB | 7248 MiB | 7248 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 1261 MiB | 5879 MiB | 328754 MiB | 327493 MiB | +|---------------------------------------------------------------------------| +| Allocations | 2762 | 2840 | 33883 | 31121 | +|---------------------------------------------------------------------------| +| Active allocs | 2762 | 2840 | 33883 | 31121 | +|---------------------------------------------------------------------------| +| GPU reserved segments | 185 | 185 | 185 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 36 | 36 | 13826 | 13790 | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +[13:46:40] Restoring shuffle indices from training state for epoch 1 +[13:46:40] CPU usage: 44.7%, RAM usage: 37.4% 
+[13:46:41] Epoch 1 learning rate: 0.0 +[13:46:41] Starting epoch 1 +[13:47:20] Batch 11601: input_ids shape torch.Size([5, 150]), attention_mask shape torch.Size([5, 150]) +[13:47:21] Epoch: 1 Batch: 11601/20099 (57.72%) Loss: 2.155287 LR: 0.00000000 +[13:47:23] Epoch: 1 Batch: 11602/20099 (57.72%) Loss: 2.090240 LR: 0.00000000 +[13:47:24] Epoch: 1 Batch: 11603/20099 (57.73%) Loss: 1.809998 LR: 0.00000000 +[13:47:26] Epoch: 1 Batch: 11604/20099 (57.73%) Loss: 2.006388 LR: 0.00000000 +[13:47:28] Epoch: 1 Batch: 11605/20099 (57.74%) Loss: 2.192882 LR: 0.00000000 +[13:47:29] Epoch: 1 Batch: 11606/20099 (57.74%) Loss: 1.982967 LR: 0.00000000 +[13:47:31] Epoch: 1 Batch: 11607/20099 (57.75%) Loss: 1.957612 LR: 0.00001887 +[13:47:33] Epoch: 1 Batch: 11608/20099 (57.75%) Loss: 1.946005 LR: 0.00001887 +[13:47:34] Epoch: 1 Batch: 11609/20099 (57.76%) Loss: 1.855440 LR: 0.00001887 +[13:47:36] Epoch: 1 Batch: 11610/20099 (57.76%) Loss: 2.329100 LR: 0.00001887 +[13:47:37] Epoch: 1 Batch: 11611/20099 (57.77%) Loss: 2.153529 LR: 0.00001887 +[13:47:39] Epoch: 1 Batch: 11612/20099 (57.77%) Loss: 2.180310 LR: 0.00001887 +[13:47:41] Epoch: 1 Batch: 11613/20099 (57.78%) Loss: 2.159098 LR: 0.00001887 +[13:47:42] Epoch: 1 Batch: 11614/20099 (57.78%) Loss: 2.207454 LR: 0.00001885 +[13:47:44] Epoch: 1 Batch: 11615/20099 (57.79%) Loss: 2.190587 LR: 0.00001885 +[13:47:46] Epoch: 1 Batch: 11616/20099 (57.79%) Loss: 1.937835 LR: 0.00001885 +[13:47:47] Epoch: 1 Batch: 11617/20099 (57.80%) Loss: 1.851808 LR: 0.00001885 +[13:47:49] Epoch: 1 Batch: 11618/20099 (57.80%) Loss: 2.149576 LR: 0.00001885 +[13:47:51] Epoch: 1 Batch: 11619/20099 (57.81%) Loss: 1.917542 LR: 0.00001885 +[13:47:52] Epoch: 1 Batch: 11620/20099 (57.81%) Loss: 2.133489 LR: 0.00001885 +[13:47:54] Epoch: 1 Batch: 11621/20099 (57.82%) Loss: 2.123605 LR: 0.00001884 +[13:47:56] Epoch: 1 Batch: 11622/20099 (57.82%) Loss: 1.669149 LR: 0.00001884 +[13:47:57] Epoch: 1 Batch: 11623/20099 (57.83%) Loss: 2.318624 LR: 0.00001884 
+[13:47:59] Epoch: 1 Batch: 11624/20099 (57.83%) Loss: 1.944835 LR: 0.00001884 +[13:48:01] Epoch: 1 Batch: 11625/20099 (57.84%) Loss: 2.391906 LR: 0.00001884 +[13:48:02] Epoch: 1 Batch: 11626/20099 (57.84%) Loss: 1.898487 LR: 0.00001884 +[13:48:04] Epoch: 1 Batch: 11627/20099 (57.85%) Loss: 2.187089 LR: 0.00001884 +[13:48:06] Epoch: 1 Batch: 11628/20099 (57.85%) Loss: 2.429490 LR: 0.00001882 +[13:48:07] Epoch: 1 Batch: 11629/20099 (57.86%) Loss: 1.914874 LR: 0.00001882 +[13:48:09] Epoch: 1 Batch: 11630/20099 (57.86%) Loss: 2.009294 LR: 0.00001882 +[13:48:11] Epoch: 1 Batch: 11631/20099 (57.87%) Loss: 2.392295 LR: 0.00001882 +[13:48:12] Epoch: 1 Batch: 11632/20099 (57.87%) Loss: 1.945846 LR: 0.00001882 +[13:48:14] Epoch: 1 Batch: 11633/20099 (57.88%) Loss: 1.926804 LR: 0.00001882 +[13:48:16] Epoch: 1 Batch: 11634/20099 (57.88%) Loss: 1.954943 LR: 0.00001882 +[13:48:17] Epoch: 1 Batch: 11635/20099 (57.89%) Loss: 2.227548 LR: 0.00001880 +[13:48:19] Epoch: 1 Batch: 11636/20099 (57.89%) Loss: 2.034768 LR: 0.00001880 +[13:48:21] Epoch: 1 Batch: 11637/20099 (57.90%) Loss: 2.322190 LR: 0.00001880 +[13:48:22] Epoch: 1 Batch: 11638/20099 (57.90%) Loss: 2.219972 LR: 0.00001880 +[13:48:24] Epoch: 1 Batch: 11639/20099 (57.91%) Loss: 2.122686 LR: 0.00001880 +[13:48:26] Epoch: 1 Batch: 11640/20099 (57.91%) Loss: 1.926885 LR: 0.00001880 +[13:48:27] Epoch: 1 Batch: 11641/20099 (57.92%) Loss: 1.897896 LR: 0.00001880 +[13:48:29] Epoch: 1 Batch: 11642/20099 (57.92%) Loss: 2.077842 LR: 0.00001879 +[13:48:31] Epoch: 1 Batch: 11643/20099 (57.93%) Loss: 2.303964 LR: 0.00001879 +[13:48:33] Epoch: 1 Batch: 11644/20099 (57.93%) Loss: 2.046954 LR: 0.00001879 +[13:48:34] Epoch: 1 Batch: 11645/20099 (57.94%) Loss: 2.160060 LR: 0.00001879 +[13:48:36] Epoch: 1 Batch: 11646/20099 (57.94%) Loss: 2.269464 LR: 0.00001879 +[13:48:38] Epoch: 1 Batch: 11647/20099 (57.95%) Loss: 2.438831 LR: 0.00001879 +[13:48:39] Epoch: 1 Batch: 11648/20099 (57.95%) Loss: 2.036758 LR: 0.00001879 +[13:48:41] Epoch: 1 
Batch: 11649/20099 (57.96%) Loss: 2.136044 LR: 0.00001877 +[13:48:43] Epoch: 1 Batch: 11650/20099 (57.96%) Loss: 1.804207 LR: 0.00001877 +[13:48:45] Epoch: 1 Batch: 11651/20099 (57.97%) Loss: 1.809004 LR: 0.00001877 +[13:48:46] Epoch: 1 Batch: 11652/20099 (57.97%) Loss: 2.137701 LR: 0.00001877 +[13:48:48] Epoch: 1 Batch: 11653/20099 (57.98%) Loss: 2.075732 LR: 0.00001877 +[13:48:50] Epoch: 1 Batch: 11654/20099 (57.98%) Loss: 1.932519 LR: 0.00001877 +[13:48:51] Epoch: 1 Batch: 11655/20099 (57.99%) Loss: 2.106810 LR: 0.00001877 +[13:48:53] Epoch: 1 Batch: 11656/20099 (57.99%) Loss: 1.824063 LR: 0.00001875 +[13:48:55] Epoch: 1 Batch: 11657/20099 (58.00%) Loss: 1.913222 LR: 0.00001875 +[13:48:57] Epoch: 1 Batch: 11658/20099 (58.00%) Loss: 2.220027 LR: 0.00001875 +[13:48:58] Epoch: 1 Batch: 11659/20099 (58.01%) Loss: 2.284807 LR: 0.00001875 +[13:49:00] Epoch: 1 Batch: 11660/20099 (58.01%) Loss: 2.115885 LR: 0.00001875 +[13:49:02] Epoch: 1 Batch: 11661/20099 (58.02%) Loss: 1.875328 LR: 0.00001875 +[13:49:04] Epoch: 1 Batch: 11662/20099 (58.02%) Loss: 2.000580 LR: 0.00001875 +[13:49:05] Epoch: 1 Batch: 11663/20099 (58.03%) Loss: 1.861473 LR: 0.00001874 +[13:49:07] Epoch: 1 Batch: 11664/20099 (58.03%) Loss: 2.220942 LR: 0.00001874 +[13:49:09] Epoch: 1 Batch: 11665/20099 (58.04%) Loss: 2.075192 LR: 0.00001874 +[13:49:11] Epoch: 1 Batch: 11666/20099 (58.04%) Loss: 1.703435 LR: 0.00001874 +[13:49:12] Epoch: 1 Batch: 11667/20099 (58.05%) Loss: 2.017027 LR: 0.00001874 +[13:49:14] Epoch: 1 Batch: 11668/20099 (58.05%) Loss: 2.112870 LR: 0.00001874 +[13:49:16] Epoch: 1 Batch: 11669/20099 (58.06%) Loss: 2.043498 LR: 0.00001874 +[13:49:18] Epoch: 1 Batch: 11670/20099 (58.06%) Loss: 2.282007 LR: 0.00001872 +[13:49:19] Epoch: 1 Batch: 11671/20099 (58.07%) Loss: 2.311927 LR: 0.00001872 +[13:49:21] Epoch: 1 Batch: 11672/20099 (58.07%) Loss: 1.960341 LR: 0.00001872 +[13:49:23] Epoch: 1 Batch: 11673/20099 (58.08%) Loss: 2.030680 LR: 0.00001872 +[13:49:25] Epoch: 1 Batch: 11674/20099 
(58.08%) Loss: 2.347935 LR: 0.00001872 +[13:49:27] Epoch: 1 Batch: 11675/20099 (58.09%) Loss: 2.133591 LR: 0.00001872 +[13:49:28] Epoch: 1 Batch: 11676/20099 (58.09%) Loss: 2.092982 LR: 0.00001872 +[13:49:30] Epoch: 1 Batch: 11677/20099 (58.10%) Loss: 2.079291 LR: 0.00001871 +[13:49:32] Epoch: 1 Batch: 11678/20099 (58.10%) Loss: 2.268747 LR: 0.00001871 +[13:49:34] Epoch: 1 Batch: 11679/20099 (58.11%) Loss: 1.979760 LR: 0.00001871 +[13:49:35] Epoch: 1 Batch: 11680/20099 (58.11%) Loss: 2.001689 LR: 0.00001871 +[13:49:37] Epoch: 1 Batch: 11681/20099 (58.12%) Loss: 2.151004 LR: 0.00001871 +[13:49:39] Epoch: 1 Batch: 11682/20099 (58.12%) Loss: 2.083557 LR: 0.00001871 +[13:49:41] Epoch: 1 Batch: 11683/20099 (58.13%) Loss: 1.713750 LR: 0.00001871 +[13:49:43] Epoch: 1 Batch: 11684/20099 (58.13%) Loss: 1.907785 LR: 0.00001869 +[13:49:44] Epoch: 1 Batch: 11685/20099 (58.14%) Loss: 2.232937 LR: 0.00001869 +[13:49:46] Epoch: 1 Batch: 11686/20099 (58.14%) Loss: 1.881616 LR: 0.00001869 +[13:49:48] Epoch: 1 Batch: 11687/20099 (58.15%) Loss: 2.284847 LR: 0.00001869 +[13:49:50] Epoch: 1 Batch: 11688/20099 (58.15%) Loss: 1.883412 LR: 0.00001869 +[13:49:51] Epoch: 1 Batch: 11689/20099 (58.16%) Loss: 2.058420 LR: 0.00001869 +[13:49:53] Epoch: 1 Batch: 11690/20099 (58.16%) Loss: 1.967052 LR: 0.00001869 +[13:49:55] Epoch: 1 Batch: 11691/20099 (58.17%) Loss: 1.979523 LR: 0.00001867 +[13:49:57] Epoch: 1 Batch: 11692/20099 (58.17%) Loss: 1.842944 LR: 0.00001867 +[13:49:58] Epoch: 1 Batch: 11693/20099 (58.18%) Loss: 2.180306 LR: 0.00001867 +[13:50:00] Epoch: 1 Batch: 11694/20099 (58.18%) Loss: 2.393631 LR: 0.00001867 +[13:50:02] Epoch: 1 Batch: 11695/20099 (58.19%) Loss: 1.988879 LR: 0.00001867 +[13:50:04] Epoch: 1 Batch: 11696/20099 (58.19%) Loss: 2.170368 LR: 0.00001867 +[13:50:06] Epoch: 1 Batch: 11697/20099 (58.20%) Loss: 2.136129 LR: 0.00001867 +[13:50:07] Epoch: 1 Batch: 11698/20099 (58.20%) Loss: 2.182470 LR: 0.00001866 +[13:50:09] Epoch: 1 Batch: 11699/20099 (58.21%) Loss: 1.959076 
LR: 0.00001866 +[13:50:11] Epoch: 1 Batch: 11700/20099 (58.21%) Loss: 2.224196 LR: 0.00001866 +[13:50:13] Epoch: 1 Batch: 11701/20099 (58.22%) Loss: 2.191238 LR: 0.00001866 +[13:50:14] Epoch: 1 Batch: 11702/20099 (58.22%) Loss: 2.472732 LR: 0.00001866 +[13:50:16] Epoch: 1 Batch: 11703/20099 (58.23%) Loss: 2.117027 LR: 0.00001866 +[13:50:18] Epoch: 1 Batch: 11704/20099 (58.23%) Loss: 2.105459 LR: 0.00001866 +[13:50:20] Epoch: 1 Batch: 11705/20099 (58.24%) Loss: 1.881214 LR: 0.00001864 +[13:50:22] Epoch: 1 Batch: 11706/20099 (58.24%) Loss: 2.279474 LR: 0.00001864 +[13:50:23] Epoch: 1 Batch: 11707/20099 (58.25%) Loss: 1.867320 LR: 0.00001864 +[13:50:25] Epoch: 1 Batch: 11708/20099 (58.25%) Loss: 2.149002 LR: 0.00001864 +[13:50:27] Epoch: 1 Batch: 11709/20099 (58.26%) Loss: 2.211083 LR: 0.00001864 +[13:50:29] Epoch: 1 Batch: 11710/20099 (58.26%) Loss: 2.241066 LR: 0.00001864 +[13:50:31] Epoch: 1 Batch: 11711/20099 (58.27%) Loss: 1.891470 LR: 0.00001864 +[13:50:32] Epoch: 1 Batch: 11712/20099 (58.27%) Loss: 2.196545 LR: 0.00001863 +[13:50:34] Epoch: 1 Batch: 11713/20099 (58.28%) Loss: 2.290085 LR: 0.00001863 +[13:50:36] Epoch: 1 Batch: 11714/20099 (58.28%) Loss: 2.354098 LR: 0.00001863 +[13:50:38] Epoch: 1 Batch: 11715/20099 (58.29%) Loss: 2.145191 LR: 0.00001863 +[13:50:39] Epoch: 1 Batch: 11716/20099 (58.29%) Loss: 2.179118 LR: 0.00001863 +[13:50:41] Epoch: 1 Batch: 11717/20099 (58.30%) Loss: 2.370042 LR: 0.00001863 +[13:50:43] Epoch: 1 Batch: 11718/20099 (58.30%) Loss: 2.002565 LR: 0.00001863 +[13:50:45] Epoch: 1 Batch: 11719/20099 (58.31%) Loss: 2.180166 LR: 0.00001861 +[13:50:47] Epoch: 1 Batch: 11720/20099 (58.31%) Loss: 2.261866 LR: 0.00001861 +[13:50:48] Epoch: 1 Batch: 11721/20099 (58.32%) Loss: 2.404797 LR: 0.00001861 +[13:50:50] Epoch: 1 Batch: 11722/20099 (58.32%) Loss: 1.867610 LR: 0.00001861 +[13:50:52] Epoch: 1 Batch: 11723/20099 (58.33%) Loss: 2.090361 LR: 0.00001861 +[13:50:54] Epoch: 1 Batch: 11724/20099 (58.33%) Loss: 2.007075 LR: 0.00001861 
+[13:50:56] Epoch: 1 Batch: 11725/20099 (58.34%) Loss: 2.007616 LR: 0.00001861 +[13:50:57] Epoch: 1 Batch: 11726/20099 (58.34%) Loss: 2.090174 LR: 0.00001859 +[13:50:59] Epoch: 1 Batch: 11727/20099 (58.35%) Loss: 2.055362 LR: 0.00001859 +[13:51:01] Epoch: 1 Batch: 11728/20099 (58.35%) Loss: 2.093300 LR: 0.00001859 +[13:51:03] Epoch: 1 Batch: 11729/20099 (58.36%) Loss: 1.763729 LR: 0.00001859 +[13:51:05] Epoch: 1 Batch: 11730/20099 (58.36%) Loss: 1.893010 LR: 0.00001859 +[13:51:06] Epoch: 1 Batch: 11731/20099 (58.37%) Loss: 2.306679 LR: 0.00001859 +[13:51:08] Epoch: 1 Batch: 11732/20099 (58.37%) Loss: 2.121930 LR: 0.00001859 +[13:51:10] Epoch: 1 Batch: 11733/20099 (58.38%) Loss: 1.966584 LR: 0.00001858 +[13:51:12] Epoch: 1 Batch: 11734/20099 (58.38%) Loss: 1.898839 LR: 0.00001858 +[13:51:14] Epoch: 1 Batch: 11735/20099 (58.39%) Loss: 2.202404 LR: 0.00001858 +[13:51:16] Epoch: 1 Batch: 11736/20099 (58.39%) Loss: 1.676832 LR: 0.00001858 +[13:51:17] Epoch: 1 Batch: 11737/20099 (58.40%) Loss: 2.452664 LR: 0.00001858 +[13:51:19] Epoch: 1 Batch: 11738/20099 (58.40%) Loss: 2.224465 LR: 0.00001858 +[13:51:21] Epoch: 1 Batch: 11739/20099 (58.41%) Loss: 2.098806 LR: 0.00001858 +[13:51:23] Epoch: 1 Batch: 11740/20099 (58.41%) Loss: 1.938314 LR: 0.00001856 +[13:51:25] Epoch: 1 Batch: 11741/20099 (58.42%) Loss: 1.984704 LR: 0.00001856 +[13:51:26] Epoch: 1 Batch: 11742/20099 (58.42%) Loss: 2.221113 LR: 0.00001856 +[13:51:28] Epoch: 1 Batch: 11743/20099 (58.43%) Loss: 1.881946 LR: 0.00001856 +[13:51:30] Epoch: 1 Batch: 11744/20099 (58.43%) Loss: 1.953294 LR: 0.00001856 +[13:51:32] Epoch: 1 Batch: 11745/20099 (58.44%) Loss: 1.913837 LR: 0.00001856 +[13:51:34] Epoch: 1 Batch: 11746/20099 (58.44%) Loss: 2.061452 LR: 0.00001856 +[13:51:36] Epoch: 1 Batch: 11747/20099 (58.45%) Loss: 2.374895 LR: 0.00001854 +[13:51:37] Epoch: 1 Batch: 11748/20099 (58.45%) Loss: 2.203031 LR: 0.00001854 +[13:51:39] Epoch: 1 Batch: 11749/20099 (58.46%) Loss: 2.091162 LR: 0.00001854 +[13:51:41] Epoch: 1 
Batch: 11750/20099 (58.46%) Loss: 2.098262 LR: 0.00001854 +[13:51:43] Epoch: 1 Batch: 11751/20099 (58.47%) Loss: 1.944414 LR: 0.00001854 +[13:51:45] Epoch: 1 Batch: 11752/20099 (58.47%) Loss: 1.941667 LR: 0.00001854 +[13:51:47] Epoch: 1 Batch: 11753/20099 (58.48%) Loss: 2.208131 LR: 0.00001854 +[13:51:48] Epoch: 1 Batch: 11754/20099 (58.48%) Loss: 2.142455 LR: 0.00001853 +[13:51:50] Epoch: 1 Batch: 11755/20099 (58.49%) Loss: 2.210806 LR: 0.00001853 +[13:51:52] Epoch: 1 Batch: 11756/20099 (58.49%) Loss: 2.112141 LR: 0.00001853 +[13:51:54] Epoch: 1 Batch: 11757/20099 (58.50%) Loss: 1.695814 LR: 0.00001853 +[13:51:56] Epoch: 1 Batch: 11758/20099 (58.50%) Loss: 2.308747 LR: 0.00001853 +[13:51:58] Epoch: 1 Batch: 11759/20099 (58.51%) Loss: 2.217951 LR: 0.00001853 +[13:51:59] Epoch: 1 Batch: 11760/20099 (58.51%) Loss: 2.123650 LR: 0.00001853 +[13:52:01] Epoch: 1 Batch: 11761/20099 (58.52%) Loss: 2.082167 LR: 0.00001851 +[13:52:03] Epoch: 1 Batch: 11762/20099 (58.52%) Loss: 2.251978 LR: 0.00001851 +[13:52:05] Epoch: 1 Batch: 11763/20099 (58.53%) Loss: 2.240474 LR: 0.00001851 +[13:52:07] Epoch: 1 Batch: 11764/20099 (58.53%) Loss: 2.287882 LR: 0.00001851 +[13:52:09] Epoch: 1 Batch: 11765/20099 (58.54%) Loss: 2.303154 LR: 0.00001851 +[13:52:10] Epoch: 1 Batch: 11766/20099 (58.54%) Loss: 2.067173 LR: 0.00001851 +[13:52:12] Epoch: 1 Batch: 11767/20099 (58.55%) Loss: 2.291340 LR: 0.00001851 +[13:52:14] Epoch: 1 Batch: 11768/20099 (58.55%) Loss: 1.983034 LR: 0.00001850 +[13:52:16] Epoch: 1 Batch: 11769/20099 (58.56%) Loss: 2.243315 LR: 0.00001850 +[13:52:18] Epoch: 1 Batch: 11770/20099 (58.56%) Loss: 2.237517 LR: 0.00001850 +[13:52:20] Epoch: 1 Batch: 11771/20099 (58.57%) Loss: 2.051685 LR: 0.00001850 +[13:52:21] Epoch: 1 Batch: 11772/20099 (58.57%) Loss: 1.980009 LR: 0.00001850 +[13:52:23] Epoch: 1 Batch: 11773/20099 (58.58%) Loss: 1.900794 LR: 0.00001850 +[13:52:25] Epoch: 1 Batch: 11774/20099 (58.58%) Loss: 2.083643 LR: 0.00001850 +[13:52:27] Epoch: 1 Batch: 11775/20099 
(58.59%) Loss: 2.554008 LR: 0.00001848 +[13:52:29] Epoch: 1 Batch: 11776/20099 (58.59%) Loss: 1.980413 LR: 0.00001848 +[13:52:31] Epoch: 1 Batch: 11777/20099 (58.59%) Loss: 2.092948 LR: 0.00001848 +[13:52:33] Epoch: 1 Batch: 11778/20099 (58.60%) Loss: 2.003721 LR: 0.00001848 +[13:52:35] Epoch: 1 Batch: 11779/20099 (58.60%) Loss: 2.252162 LR: 0.00001848 +[13:52:36] Epoch: 1 Batch: 11780/20099 (58.61%) Loss: 2.306731 LR: 0.00001848 +[13:52:38] Epoch: 1 Batch: 11781/20099 (58.61%) Loss: 2.096641 LR: 0.00001848 +[13:52:40] Epoch: 1 Batch: 11782/20099 (58.62%) Loss: 2.242378 LR: 0.00001846 +[13:52:42] Epoch: 1 Batch: 11783/20099 (58.62%) Loss: 2.262103 LR: 0.00001846 +[13:52:44] Epoch: 1 Batch: 11784/20099 (58.63%) Loss: 2.047008 LR: 0.00001846 +[13:52:46] Epoch: 1 Batch: 11785/20099 (58.63%) Loss: 2.046954 LR: 0.00001846 +[13:52:47] Epoch: 1 Batch: 11786/20099 (58.64%) Loss: 2.134771 LR: 0.00001846 +[13:52:49] Epoch: 1 Batch: 11787/20099 (58.64%) Loss: 2.003363 LR: 0.00001846 +[13:52:51] Epoch: 1 Batch: 11788/20099 (58.65%) Loss: 1.777327 LR: 0.00001846 +[13:52:53] Epoch: 1 Batch: 11789/20099 (58.65%) Loss: 2.146442 LR: 0.00001845 +[13:52:55] Epoch: 1 Batch: 11790/20099 (58.66%) Loss: 1.984076 LR: 0.00001845 +[13:52:57] Epoch: 1 Batch: 11791/20099 (58.66%) Loss: 2.324841 LR: 0.00001845 +[13:52:59] Epoch: 1 Batch: 11792/20099 (58.67%) Loss: 1.930925 LR: 0.00001845 +[13:53:00] Epoch: 1 Batch: 11793/20099 (58.67%) Loss: 2.045105 LR: 0.00001845 +[13:53:02] Epoch: 1 Batch: 11794/20099 (58.68%) Loss: 2.167869 LR: 0.00001845 +[13:53:04] Epoch: 1 Batch: 11795/20099 (58.68%) Loss: 2.091428 LR: 0.00001845 +[13:53:06] Epoch: 1 Batch: 11796/20099 (58.69%) Loss: 2.087180 LR: 0.00001843 +[13:53:08] Epoch: 1 Batch: 11797/20099 (58.69%) Loss: 2.019968 LR: 0.00001843 +[13:53:10] Epoch: 1 Batch: 11798/20099 (58.70%) Loss: 1.965273 LR: 0.00001843 +[13:53:12] Epoch: 1 Batch: 11799/20099 (58.70%) Loss: 2.102204 LR: 0.00001843 +[13:53:18] >> Cleaned up old temp checkpoint: epoch1_step6000 
+[13:53:18] >> Cleaned up old temp checkpoint: epoch1_step5800 +[13:53:18] >> Cleaned up old temp checkpoint: epoch1_step5600 +[13:53:18] >> Temp checkpoint saved: epoch1_step11800, size: 0.1693 GB +[13:53:18] Epoch: 1 Batch: 11800/20099 (58.71%) Loss: 1.970434 LR: 0.00001843 +[13:53:20] Epoch: 1 Batch: 11801/20099 (58.71%) Loss: 2.213262 LR: 0.00001843 +[13:53:22] Epoch: 1 Batch: 11802/20099 (58.72%) Loss: 2.448035 LR: 0.00001843 +[13:53:24] Epoch: 1 Batch: 11803/20099 (58.72%) Loss: 2.054489 LR: 0.00001841 +[13:53:26] Epoch: 1 Batch: 11804/20099 (58.73%) Loss: 2.035305 LR: 0.00001841 +[13:53:27] Epoch: 1 Batch: 11805/20099 (58.73%) Loss: 2.100226 LR: 0.00001841 +[13:53:29] Epoch: 1 Batch: 11806/20099 (58.74%) Loss: 2.253917 LR: 0.00001841 +[13:53:31] Epoch: 1 Batch: 11807/20099 (58.74%) Loss: 1.935932 LR: 0.00001841 +[13:53:33] Epoch: 1 Batch: 11808/20099 (58.75%) Loss: 2.095400 LR: 0.00001841 +[13:53:35] Epoch: 1 Batch: 11809/20099 (58.75%) Loss: 2.087065 LR: 0.00001841 +[13:53:37] Epoch: 1 Batch: 11810/20099 (58.76%) Loss: 2.004863 LR: 0.00001840 +[13:53:38] Epoch: 1 Batch: 11811/20099 (58.76%) Loss: 2.159311 LR: 0.00001840 +[13:53:40] Epoch: 1 Batch: 11812/20099 (58.77%) Loss: 2.260426 LR: 0.00001840 +[13:53:42] Epoch: 1 Batch: 11813/20099 (58.77%) Loss: 1.922394 LR: 0.00001840 +[13:53:44] Epoch: 1 Batch: 11814/20099 (58.78%) Loss: 2.360313 LR: 0.00001840 +[13:53:46] Epoch: 1 Batch: 11815/20099 (58.78%) Loss: 2.046540 LR: 0.00001840 +[13:53:48] Epoch: 1 Batch: 11816/20099 (58.79%) Loss: 2.258841 LR: 0.00001840 +[13:53:50] Epoch: 1 Batch: 11817/20099 (58.79%) Loss: 2.000324 LR: 0.00001838 +[13:53:51] Epoch: 1 Batch: 11818/20099 (58.80%) Loss: 2.274274 LR: 0.00001838 +[13:53:53] Epoch: 1 Batch: 11819/20099 (58.80%) Loss: 2.275008 LR: 0.00001838 +[13:53:55] Epoch: 1 Batch: 11820/20099 (58.81%) Loss: 2.180188 LR: 0.00001838 +[13:53:57] Epoch: 1 Batch: 11821/20099 (58.81%) Loss: 2.175413 LR: 0.00001838 +[13:53:59] Epoch: 1 Batch: 11822/20099 (58.82%) Loss: 1.957323 
LR: 0.00001838 +[13:54:01] Epoch: 1 Batch: 11823/20099 (58.82%) Loss: 2.111849 LR: 0.00001838 +[13:54:03] Epoch: 1 Batch: 11824/20099 (58.83%) Loss: 2.113971 LR: 0.00001837 +[13:54:04] Epoch: 1 Batch: 11825/20099 (58.83%) Loss: 2.256018 LR: 0.00001837 +[13:54:06] Epoch: 1 Batch: 11826/20099 (58.84%) Loss: 1.936416 LR: 0.00001837 +[13:54:08] Epoch: 1 Batch: 11827/20099 (58.84%) Loss: 1.841989 LR: 0.00001837 +[13:54:10] Epoch: 1 Batch: 11828/20099 (58.85%) Loss: 1.995320 LR: 0.00001837 +[13:54:12] Epoch: 1 Batch: 11829/20099 (58.85%) Loss: 2.261927 LR: 0.00001837 +[13:54:14] Epoch: 1 Batch: 11830/20099 (58.86%) Loss: 1.959313 LR: 0.00001837 +[13:54:15] Epoch: 1 Batch: 11831/20099 (58.86%) Loss: 2.275488 LR: 0.00001835 +[13:54:17] Epoch: 1 Batch: 11832/20099 (58.87%) Loss: 2.057869 LR: 0.00001835 +[13:54:19] Epoch: 1 Batch: 11833/20099 (58.87%) Loss: 2.074106 LR: 0.00001835 +[13:54:21] Epoch: 1 Batch: 11834/20099 (58.88%) Loss: 1.612940 LR: 0.00001835 +[13:54:23] Epoch: 1 Batch: 11835/20099 (58.88%) Loss: 2.115710 LR: 0.00001835 +[13:54:25] Epoch: 1 Batch: 11836/20099 (58.89%) Loss: 1.680971 LR: 0.00001835 +[13:54:26] Epoch: 1 Batch: 11837/20099 (58.89%) Loss: 2.159996 LR: 0.00001835 +[13:54:28] Epoch: 1 Batch: 11838/20099 (58.90%) Loss: 2.004302 LR: 0.00001833 +[13:54:30] Epoch: 1 Batch: 11839/20099 (58.90%) Loss: 2.202188 LR: 0.00001833 +[13:54:32] Epoch: 1 Batch: 11840/20099 (58.91%) Loss: 2.377111 LR: 0.00001833 +[13:54:34] Epoch: 1 Batch: 11841/20099 (58.91%) Loss: 2.104885 LR: 0.00001833 +[13:54:36] Epoch: 1 Batch: 11842/20099 (58.92%) Loss: 2.124622 LR: 0.00001833 +[13:54:38] Epoch: 1 Batch: 11843/20099 (58.92%) Loss: 2.091166 LR: 0.00001833 +[13:54:39] Epoch: 1 Batch: 11844/20099 (58.93%) Loss: 1.869989 LR: 0.00001833 +[13:54:41] Epoch: 1 Batch: 11845/20099 (58.93%) Loss: 1.897210 LR: 0.00001832 +[13:54:43] Epoch: 1 Batch: 11846/20099 (58.94%) Loss: 2.138158 LR: 0.00001832 +[13:54:45] Epoch: 1 Batch: 11847/20099 (58.94%) Loss: 1.898017 LR: 0.00001832 
+[13:54:47] Epoch: 1 Batch: 11848/20099 (58.95%) Loss: 2.277475 LR: 0.00001832 +[13:54:49] Epoch: 1 Batch: 11849/20099 (58.95%) Loss: 1.848698 LR: 0.00001832 +[13:54:51] Epoch: 1 Batch: 11850/20099 (58.96%) Loss: 2.144259 LR: 0.00001832 +[13:54:52] Epoch: 1 Batch: 11851/20099 (58.96%) Loss: 2.106738 LR: 0.00001832 +[13:54:54] Epoch: 1 Batch: 11852/20099 (58.97%) Loss: 2.360341 LR: 0.00001830 +[13:54:56] Epoch: 1 Batch: 11853/20099 (58.97%) Loss: 1.937472 LR: 0.00001830 +[13:54:58] Epoch: 1 Batch: 11854/20099 (58.98%) Loss: 2.291535 LR: 0.00001830 +[13:55:00] Epoch: 1 Batch: 11855/20099 (58.98%) Loss: 2.269319 LR: 0.00001830 +[13:55:02] Epoch: 1 Batch: 11856/20099 (58.99%) Loss: 2.086259 LR: 0.00001830 +[13:55:04] Epoch: 1 Batch: 11857/20099 (58.99%) Loss: 2.061155 LR: 0.00001830 +[13:55:05] Epoch: 1 Batch: 11858/20099 (59.00%) Loss: 2.321799 LR: 0.00001830 +[13:55:07] Epoch: 1 Batch: 11859/20099 (59.00%) Loss: 1.982731 LR: 0.00001828 +[13:55:09] Epoch: 1 Batch: 11860/20099 (59.01%) Loss: 2.286336 LR: 0.00001828 +[13:55:11] Epoch: 1 Batch: 11861/20099 (59.01%) Loss: 2.063646 LR: 0.00001828 +[13:55:13] Epoch: 1 Batch: 11862/20099 (59.02%) Loss: 1.992929 LR: 0.00001828 +[13:55:15] Epoch: 1 Batch: 11863/20099 (59.02%) Loss: 2.243889 LR: 0.00001828 +[13:55:17] Epoch: 1 Batch: 11864/20099 (59.03%) Loss: 2.358169 LR: 0.00001828 +[13:55:18] Epoch: 1 Batch: 11865/20099 (59.03%) Loss: 1.999358 LR: 0.00001828 +[13:55:20] Epoch: 1 Batch: 11866/20099 (59.04%) Loss: 2.339380 LR: 0.00001827 +[13:55:22] Epoch: 1 Batch: 11867/20099 (59.04%) Loss: 1.903591 LR: 0.00001827 +[13:55:24] Epoch: 1 Batch: 11868/20099 (59.05%) Loss: 2.077491 LR: 0.00001827 +[13:55:26] Epoch: 1 Batch: 11869/20099 (59.05%) Loss: 2.216758 LR: 0.00001827 +[13:55:28] Epoch: 1 Batch: 11870/20099 (59.06%) Loss: 2.027454 LR: 0.00001827 +[13:55:30] Epoch: 1 Batch: 11871/20099 (59.06%) Loss: 2.039042 LR: 0.00001827 +[13:55:32] Epoch: 1 Batch: 11872/20099 (59.07%) Loss: 1.807988 LR: 0.00001827 +[13:55:33] Epoch: 1 
Batch: 11873/20099 (59.07%) Loss: 1.829313 LR: 0.00001825 +[13:55:35] Epoch: 1 Batch: 11874/20099 (59.08%) Loss: 2.061653 LR: 0.00001825 +[13:55:37] Epoch: 1 Batch: 11875/20099 (59.08%) Loss: 2.062271 LR: 0.00001825 +[13:55:39] Epoch: 1 Batch: 11876/20099 (59.09%) Loss: 2.330869 LR: 0.00001825 +[13:55:41] Epoch: 1 Batch: 11877/20099 (59.09%) Loss: 2.580341 LR: 0.00001825 +[13:55:43] Epoch: 1 Batch: 11878/20099 (59.10%) Loss: 2.228382 LR: 0.00001825 +[13:55:45] Epoch: 1 Batch: 11879/20099 (59.10%) Loss: 2.078221 LR: 0.00001825 +[13:55:46] Epoch: 1 Batch: 11880/20099 (59.11%) Loss: 1.818790 LR: 0.00001824 +[13:55:48] Epoch: 1 Batch: 11881/20099 (59.11%) Loss: 2.021463 LR: 0.00001824 +[13:55:50] Epoch: 1 Batch: 11882/20099 (59.12%) Loss: 2.052936 LR: 0.00001824 +[13:55:52] Epoch: 1 Batch: 11883/20099 (59.12%) Loss: 2.255507 LR: 0.00001824 +[13:55:54] Epoch: 1 Batch: 11884/20099 (59.13%) Loss: 2.112234 LR: 0.00001824 +[13:55:56] Epoch: 1 Batch: 11885/20099 (59.13%) Loss: 2.133395 LR: 0.00001824 +[13:55:58] Epoch: 1 Batch: 11886/20099 (59.14%) Loss: 2.254321 LR: 0.00001824 +[13:55:59] Epoch: 1 Batch: 11887/20099 (59.14%) Loss: 2.129515 LR: 0.00001822 +[13:56:01] Epoch: 1 Batch: 11888/20099 (59.15%) Loss: 2.238394 LR: 0.00001822 +[13:56:03] Epoch: 1 Batch: 11889/20099 (59.15%) Loss: 2.329515 LR: 0.00001822 +[13:56:05] Epoch: 1 Batch: 11890/20099 (59.16%) Loss: 2.027393 LR: 0.00001822 +[13:56:07] Epoch: 1 Batch: 11891/20099 (59.16%) Loss: 2.009616 LR: 0.00001822 +[13:56:09] Epoch: 1 Batch: 11892/20099 (59.17%) Loss: 2.045273 LR: 0.00001822 +[13:56:11] Epoch: 1 Batch: 11893/20099 (59.17%) Loss: 2.198835 LR: 0.00001822 +[13:56:12] Epoch: 1 Batch: 11894/20099 (59.18%) Loss: 2.086552 LR: 0.00001820 +[13:56:14] Epoch: 1 Batch: 11895/20099 (59.18%) Loss: 2.019103 LR: 0.00001820 +[13:56:16] Epoch: 1 Batch: 11896/20099 (59.19%) Loss: 1.980255 LR: 0.00001820 +[13:56:18] Epoch: 1 Batch: 11897/20099 (59.19%) Loss: 2.073910 LR: 0.00001820 +[13:56:20] Epoch: 1 Batch: 11898/20099 
(59.20%) Loss: 2.262356 LR: 0.00001820 +[13:56:22] Epoch: 1 Batch: 11899/20099 (59.20%) Loss: 2.138031 LR: 0.00001820 +[13:56:24] Epoch: 1 Batch: 11900/20099 (59.21%) Loss: 2.154301 LR: 0.00001820 +[13:56:25] Epoch: 1 Batch: 11901/20099 (59.21%) Loss: 2.523156 LR: 0.00001819 +[13:56:27] Epoch: 1 Batch: 11902/20099 (59.22%) Loss: 2.112205 LR: 0.00001819 +[13:56:29] Epoch: 1 Batch: 11903/20099 (59.22%) Loss: 2.299864 LR: 0.00001819 +[13:56:31] Epoch: 1 Batch: 11904/20099 (59.23%) Loss: 1.917401 LR: 0.00001819 +[13:56:33] Epoch: 1 Batch: 11905/20099 (59.23%) Loss: 2.186485 LR: 0.00001819 +[13:56:35] Epoch: 1 Batch: 11906/20099 (59.24%) Loss: 2.154911 LR: 0.00001819 +[13:56:37] Epoch: 1 Batch: 11907/20099 (59.24%) Loss: 2.220390 LR: 0.00001819 +[13:56:38] Epoch: 1 Batch: 11908/20099 (59.25%) Loss: 2.214277 LR: 0.00001817 +[13:56:40] Epoch: 1 Batch: 11909/20099 (59.25%) Loss: 1.964027 LR: 0.00001817 +[13:56:42] Epoch: 1 Batch: 11910/20099 (59.26%) Loss: 2.199575 LR: 0.00001817 +[13:56:44] Epoch: 1 Batch: 11911/20099 (59.26%) Loss: 2.139123 LR: 0.00001817 +[13:56:46] Epoch: 1 Batch: 11912/20099 (59.27%) Loss: 2.092628 LR: 0.00001817 +[13:56:48] Epoch: 1 Batch: 11913/20099 (59.27%) Loss: 1.992154 LR: 0.00001817 +[13:56:49] Epoch: 1 Batch: 11914/20099 (59.28%) Loss: 1.985468 LR: 0.00001817 +[13:56:51] Epoch: 1 Batch: 11915/20099 (59.28%) Loss: 2.126505 LR: 0.00001815 +[13:56:53] Epoch: 1 Batch: 11916/20099 (59.29%) Loss: 2.304210 LR: 0.00001815 +[13:56:55] Epoch: 1 Batch: 11917/20099 (59.29%) Loss: 2.128174 LR: 0.00001815 +[13:56:57] Epoch: 1 Batch: 11918/20099 (59.30%) Loss: 1.923754 LR: 0.00001815 +[13:56:59] Epoch: 1 Batch: 11919/20099 (59.30%) Loss: 2.133491 LR: 0.00001815 +[13:57:01] Epoch: 1 Batch: 11920/20099 (59.31%) Loss: 2.022337 LR: 0.00001815 +[13:57:02] Epoch: 1 Batch: 11921/20099 (59.31%) Loss: 2.223453 LR: 0.00001815 +[13:57:04] Epoch: 1 Batch: 11922/20099 (59.32%) Loss: 1.858255 LR: 0.00001814 +[13:57:06] Epoch: 1 Batch: 11923/20099 (59.32%) Loss: 2.182527 
LR: 0.00001814 +[13:57:08] Epoch: 1 Batch: 11924/20099 (59.33%) Loss: 2.041735 LR: 0.00001814 +[13:57:10] Epoch: 1 Batch: 11925/20099 (59.33%) Loss: 2.060998 LR: 0.00001814 +[13:57:12] Epoch: 1 Batch: 11926/20099 (59.34%) Loss: 2.321335 LR: 0.00001814 +[13:57:14] Epoch: 1 Batch: 11927/20099 (59.34%) Loss: 2.147984 LR: 0.00001814 +[13:57:15] Epoch: 1 Batch: 11928/20099 (59.35%) Loss: 2.047800 LR: 0.00001814 +[13:57:17] Epoch: 1 Batch: 11929/20099 (59.35%) Loss: 2.068217 LR: 0.00001812 +[13:57:19] Epoch: 1 Batch: 11930/20099 (59.36%) Loss: 1.887088 LR: 0.00001812 +[13:57:21] Epoch: 1 Batch: 11931/20099 (59.36%) Loss: 1.949737 LR: 0.00001812 +[13:57:23] Epoch: 1 Batch: 11932/20099 (59.37%) Loss: 2.202947 LR: 0.00001812 +[13:57:25] Epoch: 1 Batch: 11933/20099 (59.37%) Loss: 2.046483 LR: 0.00001812 +[13:57:27] Epoch: 1 Batch: 11934/20099 (59.38%) Loss: 2.124260 LR: 0.00001812 +[13:57:28] Epoch: 1 Batch: 11935/20099 (59.38%) Loss: 2.043452 LR: 0.00001812 +[13:57:30] Epoch: 1 Batch: 11936/20099 (59.39%) Loss: 2.110117 LR: 0.00001811 +[13:57:32] Epoch: 1 Batch: 11937/20099 (59.39%) Loss: 2.115472 LR: 0.00001811 +[13:57:34] Epoch: 1 Batch: 11938/20099 (59.40%) Loss: 1.959133 LR: 0.00001811 +[13:57:36] Epoch: 1 Batch: 11939/20099 (59.40%) Loss: 1.955804 LR: 0.00001811 +[13:57:38] Epoch: 1 Batch: 11940/20099 (59.41%) Loss: 2.151593 LR: 0.00001811 +[13:57:40] Epoch: 1 Batch: 11941/20099 (59.41%) Loss: 2.126657 LR: 0.00001811 +[13:57:41] Epoch: 1 Batch: 11942/20099 (59.42%) Loss: 2.594750 LR: 0.00001811 +[13:57:43] Epoch: 1 Batch: 11943/20099 (59.42%) Loss: 2.271149 LR: 0.00001809 +[13:57:45] Epoch: 1 Batch: 11944/20099 (59.43%) Loss: 1.984614 LR: 0.00001809 +[13:57:47] Epoch: 1 Batch: 11945/20099 (59.43%) Loss: 2.174869 LR: 0.00001809 +[13:57:49] Epoch: 1 Batch: 11946/20099 (59.44%) Loss: 2.179909 LR: 0.00001809 +[13:57:51] Epoch: 1 Batch: 11947/20099 (59.44%) Loss: 2.016270 LR: 0.00001809 +[13:57:52] Epoch: 1 Batch: 11948/20099 (59.45%) Loss: 1.975372 LR: 0.00001809 
+[13:57:54] Epoch: 1 Batch: 11949/20099 (59.45%) Loss: 2.064921 LR: 0.00001809 +[13:57:56] Epoch: 1 Batch: 11950/20099 (59.46%) Loss: 1.995668 LR: 0.00001807 +[13:57:58] Epoch: 1 Batch: 11951/20099 (59.46%) Loss: 2.019344 LR: 0.00001807 +[13:58:00] Epoch: 1 Batch: 11952/20099 (59.47%) Loss: 2.037734 LR: 0.00001807 +[13:58:02] Epoch: 1 Batch: 11953/20099 (59.47%) Loss: 2.183320 LR: 0.00001807 +[13:58:04] Epoch: 1 Batch: 11954/20099 (59.48%) Loss: 1.976371 LR: 0.00001807 +[13:58:05] Epoch: 1 Batch: 11955/20099 (59.48%) Loss: 2.121747 LR: 0.00001807 +[13:58:07] Epoch: 1 Batch: 11956/20099 (59.49%) Loss: 1.964252 LR: 0.00001807 +[13:58:09] Epoch: 1 Batch: 11957/20099 (59.49%) Loss: 2.026970 LR: 0.00001806 +[13:58:11] Epoch: 1 Batch: 11958/20099 (59.50%) Loss: 2.056350 LR: 0.00001806 +[13:58:13] Epoch: 1 Batch: 11959/20099 (59.50%) Loss: 2.113556 LR: 0.00001806 +[13:58:15] Epoch: 1 Batch: 11960/20099 (59.51%) Loss: 2.200220 LR: 0.00001806 +[13:58:17] Epoch: 1 Batch: 11961/20099 (59.51%) Loss: 1.939128 LR: 0.00001806 +[13:58:18] Epoch: 1 Batch: 11962/20099 (59.52%) Loss: 1.860443 LR: 0.00001806 +[13:58:20] Epoch: 1 Batch: 11963/20099 (59.52%) Loss: 2.248807 LR: 0.00001806 +[13:58:22] Epoch: 1 Batch: 11964/20099 (59.53%) Loss: 1.981426 LR: 0.00001804 +[13:58:24] Epoch: 1 Batch: 11965/20099 (59.53%) Loss: 1.994280 LR: 0.00001804 +[13:58:26] Epoch: 1 Batch: 11966/20099 (59.54%) Loss: 1.915812 LR: 0.00001804 +[13:58:28] Epoch: 1 Batch: 11967/20099 (59.54%) Loss: 2.269165 LR: 0.00001804 +[13:58:29] Epoch: 1 Batch: 11968/20099 (59.55%) Loss: 2.000148 LR: 0.00001804 +[13:58:31] Epoch: 1 Batch: 11969/20099 (59.55%) Loss: 2.320022 LR: 0.00001804 +[13:58:33] Epoch: 1 Batch: 11970/20099 (59.56%) Loss: 2.303728 LR: 0.00001804 +[13:58:35] Epoch: 1 Batch: 11971/20099 (59.56%) Loss: 2.131265 LR: 0.00001802 +[13:58:37] Epoch: 1 Batch: 11972/20099 (59.57%) Loss: 2.039286 LR: 0.00001802 +[13:58:39] Epoch: 1 Batch: 11973/20099 (59.57%) Loss: 1.986131 LR: 0.00001802 +[13:58:41] Epoch: 1 
Batch: 11974/20099 (59.58%) Loss: 2.079265 LR: 0.00001802 +[13:58:42] Epoch: 1 Batch: 11975/20099 (59.58%) Loss: 1.635267 LR: 0.00001802 +[13:58:44] Epoch: 1 Batch: 11976/20099 (59.59%) Loss: 2.217071 LR: 0.00001802 +[13:58:46] Epoch: 1 Batch: 11977/20099 (59.59%) Loss: 1.918673 LR: 0.00001802 +[13:58:48] Epoch: 1 Batch: 11978/20099 (59.60%) Loss: 1.876424 LR: 0.00001801 +[13:58:50] Epoch: 1 Batch: 11979/20099 (59.60%) Loss: 1.929583 LR: 0.00001801 +[13:58:52] Epoch: 1 Batch: 11980/20099 (59.60%) Loss: 2.035302 LR: 0.00001801 +[13:58:54] Epoch: 1 Batch: 11981/20099 (59.61%) Loss: 2.284866 LR: 0.00001801 +[13:58:55] Epoch: 1 Batch: 11982/20099 (59.61%) Loss: 2.163074 LR: 0.00001801 +[13:58:57] Epoch: 1 Batch: 11983/20099 (59.62%) Loss: 2.277340 LR: 0.00001801 +[13:58:59] Epoch: 1 Batch: 11984/20099 (59.62%) Loss: 2.147599 LR: 0.00001801 +[13:59:01] Epoch: 1 Batch: 11985/20099 (59.63%) Loss: 2.097553 LR: 0.00001799 +[13:59:03] Epoch: 1 Batch: 11986/20099 (59.63%) Loss: 1.950933 LR: 0.00001799 +[13:59:05] Epoch: 1 Batch: 11987/20099 (59.64%) Loss: 2.431226 LR: 0.00001799 +[13:59:06] Epoch: 1 Batch: 11988/20099 (59.64%) Loss: 1.990898 LR: 0.00001799 +[13:59:08] Epoch: 1 Batch: 11989/20099 (59.65%) Loss: 1.940519 LR: 0.00001799 +[13:59:10] Epoch: 1 Batch: 11990/20099 (59.65%) Loss: 1.998959 LR: 0.00001799 +[13:59:12] Epoch: 1 Batch: 11991/20099 (59.66%) Loss: 2.113931 LR: 0.00001799 +[13:59:14] Epoch: 1 Batch: 11992/20099 (59.66%) Loss: 1.937516 LR: 0.00001798 +[13:59:16] Epoch: 1 Batch: 11993/20099 (59.67%) Loss: 2.254978 LR: 0.00001798 +[13:59:18] Epoch: 1 Batch: 11994/20099 (59.67%) Loss: 1.970295 LR: 0.00001798 +[13:59:19] Epoch: 1 Batch: 11995/20099 (59.68%) Loss: 2.147945 LR: 0.00001798 +[13:59:21] Epoch: 1 Batch: 11996/20099 (59.68%) Loss: 2.180451 LR: 0.00001798 +[13:59:23] Epoch: 1 Batch: 11997/20099 (59.69%) Loss: 2.271821 LR: 0.00001798 +[13:59:25] Epoch: 1 Batch: 11998/20099 (59.69%) Loss: 2.543804 LR: 0.00001798 +[13:59:27] Epoch: 1 Batch: 11999/20099 
(59.70%) Loss: 2.082809 LR: 0.00001796 +[13:59:29] >> Evaluating batch 0 +[13:59:30] >> Evaluating batch 1 +[13:59:31] >> Evaluating batch 2 +[13:59:32] >> Evaluating batch 3 +[13:59:33] >> Evaluating batch 4 +[13:59:34] >> Evaluating batch 5 +[13:59:35] >> Evaluating batch 6 +[13:59:36] >> Evaluating batch 7 +[13:59:37] >> Evaluating batch 8 +[13:59:39] >> Evaluating batch 9 +[13:59:40] >> Evaluating batch 10 +[13:59:41] >> Evaluating batch 11 +[13:59:42] >> Evaluating batch 12 +[13:59:43] >> Evaluating batch 13 +[13:59:44] >> Evaluating batch 14 +[13:59:45] >> Evaluating batch 15 +[13:59:46] >> Evaluating batch 16 +[13:59:46] Epoch: 1 Step: 12000/20099 Evaluation: +[13:59:46] Avg Loss Since Last Eval: 0.0699 Val Loss: 2.1585 Validation loss delta: 2.1585 Perplexity: 8.6583 LR: 0.00001796 +[13:59:50] >> Cleaned up old temp checkpoint: epoch1_step6200 +[13:59:50] >> Temp checkpoint saved: epoch1_step12000, size: 0.1693 GB +[13:59:54] >> Checkpoint saved: epoch1_step12000, size: 0.1693 GB +[13:59:54] Epoch: 1 Batch: 12000/20099 (59.70%) Loss: 1.929041 LR: 0.00001796 +[13:59:55] Epoch: 1 Batch: 12001/20099 (59.71%) Loss: 2.157771 LR: 0.00001796 +[13:59:57] Epoch: 1 Batch: 12002/20099 (59.71%) Loss: 2.243726 LR: 0.00001796 +[13:59:59] Epoch: 1 Batch: 12003/20099 (59.72%) Loss: 2.172165 LR: 0.00001796 +[14:00:01] Epoch: 1 Batch: 12004/20099 (59.72%) Loss: 2.085486 LR: 0.00001796 +[14:00:03] Epoch: 1 Batch: 12005/20099 (59.73%) Loss: 2.166671 LR: 0.00001796 +[14:00:05] Epoch: 1 Batch: 12006/20099 (59.73%) Loss: 1.988098 LR: 0.00001794 +[14:00:06] Epoch: 1 Batch: 12007/20099 (59.74%) Loss: 1.860830 LR: 0.00001794 +[14:00:08] Epoch: 1 Batch: 12008/20099 (59.74%) Loss: 1.766423 LR: 0.00001794 +[14:00:10] Epoch: 1 Batch: 12009/20099 (59.75%) Loss: 1.910781 LR: 0.00001794 +[14:00:12] Epoch: 1 Batch: 12010/20099 (59.75%) Loss: 1.559872 LR: 0.00001794 +[14:00:14] Epoch: 1 Batch: 12011/20099 (59.76%) Loss: 2.284399 LR: 0.00001794 +[14:00:16] Epoch: 1 Batch: 12012/20099 
(59.76%) Loss: 1.753412 LR: 0.00001794 +[14:00:17] Epoch: 1 Batch: 12013/20099 (59.77%) Loss: 2.212875 LR: 0.00001793 +[14:00:19] Epoch: 1 Batch: 12014/20099 (59.77%) Loss: 1.777774 LR: 0.00001793 +[14:00:21] Epoch: 1 Batch: 12015/20099 (59.78%) Loss: 2.272972 LR: 0.00001793 +[14:00:23] Epoch: 1 Batch: 12016/20099 (59.78%) Loss: 1.900482 LR: 0.00001793 +[14:00:25] Epoch: 1 Batch: 12017/20099 (59.79%) Loss: 2.150534 LR: 0.00001793 +[14:00:27] Epoch: 1 Batch: 12018/20099 (59.79%) Loss: 1.791320 LR: 0.00001793 +[14:00:29] Epoch: 1 Batch: 12019/20099 (59.80%) Loss: 2.091736 LR: 0.00001793 +[14:00:30] Epoch: 1 Batch: 12020/20099 (59.80%) Loss: 2.148026 LR: 0.00001791 +[14:00:32] Epoch: 1 Batch: 12021/20099 (59.81%) Loss: 2.089726 LR: 0.00001791 +[14:00:34] Epoch: 1 Batch: 12022/20099 (59.81%) Loss: 2.007583 LR: 0.00001791 +[14:00:36] Epoch: 1 Batch: 12023/20099 (59.82%) Loss: 1.939742 LR: 0.00001791 +[14:00:38] Epoch: 1 Batch: 12024/20099 (59.82%) Loss: 2.165577 LR: 0.00001791 +[14:00:40] Epoch: 1 Batch: 12025/20099 (59.83%) Loss: 2.382001 LR: 0.00001791 +[14:00:41] Epoch: 1 Batch: 12026/20099 (59.83%) Loss: 2.161133 LR: 0.00001791 +[14:00:43] Epoch: 1 Batch: 12027/20099 (59.84%) Loss: 1.955997 LR: 0.00001789 +[14:00:45] Epoch: 1 Batch: 12028/20099 (59.84%) Loss: 2.283062 LR: 0.00001789 +[14:00:47] Epoch: 1 Batch: 12029/20099 (59.85%) Loss: 2.266393 LR: 0.00001789 +[14:00:49] Epoch: 1 Batch: 12030/20099 (59.85%) Loss: 2.204633 LR: 0.00001789 +[14:00:51] Epoch: 1 Batch: 12031/20099 (59.86%) Loss: 1.977131 LR: 0.00001789 +[14:00:53] Epoch: 1 Batch: 12032/20099 (59.86%) Loss: 2.306788 LR: 0.00001789 +[14:00:55] Epoch: 1 Batch: 12033/20099 (59.87%) Loss: 1.971958 LR: 0.00001789 +[14:00:56] Epoch: 1 Batch: 12034/20099 (59.87%) Loss: 2.033112 LR: 0.00001788 +[14:00:58] Epoch: 1 Batch: 12035/20099 (59.88%) Loss: 2.095487 LR: 0.00001788 +[14:01:00] Epoch: 1 Batch: 12036/20099 (59.88%) Loss: 2.132129 LR: 0.00001788 +[14:01:02] Epoch: 1 Batch: 12037/20099 (59.89%) Loss: 2.054639 
LR: 0.00001788 +[14:01:04] Epoch: 1 Batch: 12038/20099 (59.89%) Loss: 2.019219 LR: 0.00001788 +[14:01:06] Epoch: 1 Batch: 12039/20099 (59.90%) Loss: 1.958479 LR: 0.00001788 +[14:01:08] Epoch: 1 Batch: 12040/20099 (59.90%) Loss: 2.274388 LR: 0.00001788 +[14:01:09] Epoch: 1 Batch: 12041/20099 (59.91%) Loss: 1.982106 LR: 0.00001786 +[14:01:11] Epoch: 1 Batch: 12042/20099 (59.91%) Loss: 1.871173 LR: 0.00001786 +[14:01:13] Epoch: 1 Batch: 12043/20099 (59.92%) Loss: 1.964772 LR: 0.00001786 +[14:01:15] Epoch: 1 Batch: 12044/20099 (59.92%) Loss: 1.877458 LR: 0.00001786 +[14:01:17] Epoch: 1 Batch: 12045/20099 (59.93%) Loss: 2.022984 LR: 0.00001786 +[14:01:19] Epoch: 1 Batch: 12046/20099 (59.93%) Loss: 2.178191 LR: 0.00001786 +[14:01:21] Epoch: 1 Batch: 12047/20099 (59.94%) Loss: 1.916015 LR: 0.00001786 +[14:01:22] Epoch: 1 Batch: 12048/20099 (59.94%) Loss: 2.158407 LR: 0.00001785 +[14:01:24] Epoch: 1 Batch: 12049/20099 (59.95%) Loss: 2.371899 LR: 0.00001785 +[14:01:26] Epoch: 1 Batch: 12050/20099 (59.95%) Loss: 2.184262 LR: 0.00001785 +[14:01:28] Epoch: 1 Batch: 12051/20099 (59.96%) Loss: 2.042937 LR: 0.00001785 +[14:01:30] Epoch: 1 Batch: 12052/20099 (59.96%) Loss: 2.038719 LR: 0.00001785 +[14:01:32] Epoch: 1 Batch: 12053/20099 (59.97%) Loss: 1.838948 LR: 0.00001785 +[14:01:34] Epoch: 1 Batch: 12054/20099 (59.97%) Loss: 2.180948 LR: 0.00001785 +[14:01:35] Epoch: 1 Batch: 12055/20099 (59.98%) Loss: 2.002589 LR: 0.00001783 +[14:01:37] Epoch: 1 Batch: 12056/20099 (59.98%) Loss: 2.118901 LR: 0.00001783 +[14:01:39] Epoch: 1 Batch: 12057/20099 (59.99%) Loss: 1.978682 LR: 0.00001783 +[14:01:41] Epoch: 1 Batch: 12058/20099 (59.99%) Loss: 1.822109 LR: 0.00001783 +[14:01:43] Epoch: 1 Batch: 12059/20099 (60.00%) Loss: 2.395901 LR: 0.00001783 +[14:01:45] Epoch: 1 Batch: 12060/20099 (60.00%) Loss: 2.370045 LR: 0.00001783 +[14:01:47] Epoch: 1 Batch: 12061/20099 (60.01%) Loss: 2.003750 LR: 0.00001783 +[14:01:48] Epoch: 1 Batch: 12062/20099 (60.01%) Loss: 2.226631 LR: 0.00001781 
+[14:01:50] Epoch: 1 Batch: 12063/20099 (60.02%) Loss: 1.888216 LR: 0.00001781 +[14:01:52] Epoch: 1 Batch: 12064/20099 (60.02%) Loss: 2.097620 LR: 0.00001781 +[14:01:54] Epoch: 1 Batch: 12065/20099 (60.03%) Loss: 2.136600 LR: 0.00001781 +[14:01:56] Epoch: 1 Batch: 12066/20099 (60.03%) Loss: 2.104551 LR: 0.00001781 +[14:01:58] Epoch: 1 Batch: 12067/20099 (60.04%) Loss: 2.217072 LR: 0.00001781 +[14:02:00] Epoch: 1 Batch: 12068/20099 (60.04%) Loss: 2.320764 LR: 0.00001781 +[14:02:01] Epoch: 1 Batch: 12069/20099 (60.05%) Loss: 2.117182 LR: 0.00001780 +[14:02:03] Epoch: 1 Batch: 12070/20099 (60.05%) Loss: 2.137215 LR: 0.00001780 +[14:02:05] Epoch: 1 Batch: 12071/20099 (60.06%) Loss: 1.923744 LR: 0.00001780 +[14:02:07] Epoch: 1 Batch: 12072/20099 (60.06%) Loss: 2.046069 LR: 0.00001780 +[14:02:09] Epoch: 1 Batch: 12073/20099 (60.07%) Loss: 1.992553 LR: 0.00001780 +[14:02:11] Epoch: 1 Batch: 12074/20099 (60.07%) Loss: 1.968565 LR: 0.00001780 +[14:02:13] Epoch: 1 Batch: 12075/20099 (60.08%) Loss: 2.076002 LR: 0.00001780 +[14:02:14] Epoch: 1 Batch: 12076/20099 (60.08%) Loss: 2.143752 LR: 0.00001778 +[14:02:16] Epoch: 1 Batch: 12077/20099 (60.09%) Loss: 2.152250 LR: 0.00001778 +[14:02:18] Epoch: 1 Batch: 12078/20099 (60.09%) Loss: 1.995527 LR: 0.00001778 +[14:02:20] Epoch: 1 Batch: 12079/20099 (60.10%) Loss: 2.102014 LR: 0.00001778 +[14:02:22] Epoch: 1 Batch: 12080/20099 (60.10%) Loss: 2.085536 LR: 0.00001778 +[14:02:24] Epoch: 1 Batch: 12081/20099 (60.11%) Loss: 1.962524 LR: 0.00001778 +[14:02:26] Epoch: 1 Batch: 12082/20099 (60.11%) Loss: 2.373902 LR: 0.00001778 +[14:02:27] Epoch: 1 Batch: 12083/20099 (60.12%) Loss: 2.111352 LR: 0.00001776 +[14:02:29] Epoch: 1 Batch: 12084/20099 (60.12%) Loss: 2.123293 LR: 0.00001776 +[14:02:31] Epoch: 1 Batch: 12085/20099 (60.13%) Loss: 2.019420 LR: 0.00001776 +[14:02:33] Epoch: 1 Batch: 12086/20099 (60.13%) Loss: 1.855317 LR: 0.00001776 +[14:02:35] Epoch: 1 Batch: 12087/20099 (60.14%) Loss: 2.234253 LR: 0.00001776 +[14:02:37] Epoch: 1 
Batch: 12088/20099 (60.14%) Loss: 1.910104 LR: 0.00001776 +[14:02:39] Epoch: 1 Batch: 12089/20099 (60.15%) Loss: 2.002606 LR: 0.00001776 +[14:02:40] Epoch: 1 Batch: 12090/20099 (60.15%) Loss: 2.108014 LR: 0.00001775 +[14:02:42] Epoch: 1 Batch: 12091/20099 (60.16%) Loss: 1.935029 LR: 0.00001775 +[14:02:44] Epoch: 1 Batch: 12092/20099 (60.16%) Loss: 2.170310 LR: 0.00001775 +[14:02:46] Epoch: 1 Batch: 12093/20099 (60.17%) Loss: 2.165373 LR: 0.00001775 +[14:02:48] Epoch: 1 Batch: 12094/20099 (60.17%) Loss: 2.054693 LR: 0.00001775 +[14:02:50] Epoch: 1 Batch: 12095/20099 (60.18%) Loss: 2.336131 LR: 0.00001775 +[14:02:52] Epoch: 1 Batch: 12096/20099 (60.18%) Loss: 1.823726 LR: 0.00001775 +[14:02:53] Epoch: 1 Batch: 12097/20099 (60.19%) Loss: 1.808364 LR: 0.00001773 +[14:02:55] Epoch: 1 Batch: 12098/20099 (60.19%) Loss: 2.032192 LR: 0.00001773 +[14:02:57] Epoch: 1 Batch: 12099/20099 (60.20%) Loss: 2.278768 LR: 0.00001773 +[14:02:59] Epoch: 1 Batch: 12100/20099 (60.20%) Loss: 2.032614 LR: 0.00001773 +[14:03:01] Epoch: 1 Batch: 12101/20099 (60.21%) Loss: 2.189554 LR: 0.00001773 +[14:03:03] Epoch: 1 Batch: 12102/20099 (60.21%) Loss: 2.166137 LR: 0.00001773 +[14:03:05] Epoch: 1 Batch: 12103/20099 (60.22%) Loss: 1.985962 LR: 0.00001773 +[14:03:06] Epoch: 1 Batch: 12104/20099 (60.22%) Loss: 2.370743 LR: 0.00001772 +[14:03:08] Epoch: 1 Batch: 12105/20099 (60.23%) Loss: 2.152474 LR: 0.00001772 +[14:03:10] Epoch: 1 Batch: 12106/20099 (60.23%) Loss: 2.177539 LR: 0.00001772 +[14:03:12] Epoch: 1 Batch: 12107/20099 (60.24%) Loss: 2.412341 LR: 0.00001772 +[14:03:14] Epoch: 1 Batch: 12108/20099 (60.24%) Loss: 2.162924 LR: 0.00001772 +[14:03:16] Epoch: 1 Batch: 12109/20099 (60.25%) Loss: 2.120435 LR: 0.00001772 +[14:03:18] Epoch: 1 Batch: 12110/20099 (60.25%) Loss: 1.977020 LR: 0.00001772 +[14:03:19] Epoch: 1 Batch: 12111/20099 (60.26%) Loss: 1.976855 LR: 0.00001770 +[14:03:21] Epoch: 1 Batch: 12112/20099 (60.26%) Loss: 2.391312 LR: 0.00001770 +[14:03:23] Epoch: 1 Batch: 12113/20099 
(60.27%) Loss: 1.970687 LR: 0.00001770 +[14:03:25] Epoch: 1 Batch: 12114/20099 (60.27%) Loss: 2.046840 LR: 0.00001770 +[14:03:27] Epoch: 1 Batch: 12115/20099 (60.28%) Loss: 2.235445 LR: 0.00001770 +[14:03:29] Epoch: 1 Batch: 12116/20099 (60.28%) Loss: 2.031919 LR: 0.00001770 +[14:03:31] Epoch: 1 Batch: 12117/20099 (60.29%) Loss: 2.057075 LR: 0.00001770 +[14:03:32] Epoch: 1 Batch: 12118/20099 (60.29%) Loss: 2.126101 LR: 0.00001768 +[14:03:34] Epoch: 1 Batch: 12119/20099 (60.30%) Loss: 1.894709 LR: 0.00001768 +[14:03:36] Epoch: 1 Batch: 12120/20099 (60.30%) Loss: 2.075543 LR: 0.00001768 +[14:03:38] Epoch: 1 Batch: 12121/20099 (60.31%) Loss: 2.001178 LR: 0.00001768 +[14:03:40] Epoch: 1 Batch: 12122/20099 (60.31%) Loss: 2.430954 LR: 0.00001768 +[14:03:42] Epoch: 1 Batch: 12123/20099 (60.32%) Loss: 2.018984 LR: 0.00001768 +[14:03:43] Epoch: 1 Batch: 12124/20099 (60.32%) Loss: 2.099712 LR: 0.00001768 +[14:03:45] Epoch: 1 Batch: 12125/20099 (60.33%) Loss: 2.469924 LR: 0.00001767 +[14:03:47] Epoch: 1 Batch: 12126/20099 (60.33%) Loss: 1.946759 LR: 0.00001767 +[14:03:49] Epoch: 1 Batch: 12127/20099 (60.34%) Loss: 2.037593 LR: 0.00001767 +[14:03:51] Epoch: 1 Batch: 12128/20099 (60.34%) Loss: 1.854365 LR: 0.00001767 +[14:03:53] Epoch: 1 Batch: 12129/20099 (60.35%) Loss: 2.147152 LR: 0.00001767 +[14:03:55] Epoch: 1 Batch: 12130/20099 (60.35%) Loss: 1.976815 LR: 0.00001767 +[14:03:56] Epoch: 1 Batch: 12131/20099 (60.36%) Loss: 1.794192 LR: 0.00001767 +[14:03:58] Epoch: 1 Batch: 12132/20099 (60.36%) Loss: 2.025391 LR: 0.00001765 +[14:04:00] Epoch: 1 Batch: 12133/20099 (60.37%) Loss: 1.827565 LR: 0.00001765 +[14:04:02] Epoch: 1 Batch: 12134/20099 (60.37%) Loss: 2.163708 LR: 0.00001765 +[14:04:04] Epoch: 1 Batch: 12135/20099 (60.38%) Loss: 2.140769 LR: 0.00001765 +[14:04:06] Epoch: 1 Batch: 12136/20099 (60.38%) Loss: 2.140808 LR: 0.00001765 +[14:04:08] Epoch: 1 Batch: 12137/20099 (60.39%) Loss: 2.053265 LR: 0.00001765 +[14:04:09] Epoch: 1 Batch: 12138/20099 (60.39%) Loss: 2.201859 
LR: 0.00001765 +[14:04:11] Epoch: 1 Batch: 12139/20099 (60.40%) Loss: 2.274992 LR: 0.00001763 +[14:04:13] Epoch: 1 Batch: 12140/20099 (60.40%) Loss: 2.145534 LR: 0.00001763 +[14:04:15] Epoch: 1 Batch: 12141/20099 (60.41%) Loss: 2.585270 LR: 0.00001763 +[14:04:17] Epoch: 1 Batch: 12142/20099 (60.41%) Loss: 1.744034 LR: 0.00001763 +[14:04:19] Epoch: 1 Batch: 12143/20099 (60.42%) Loss: 2.092237 LR: 0.00001763 +[14:04:20] Epoch: 1 Batch: 12144/20099 (60.42%) Loss: 2.004505 LR: 0.00001763 +[14:04:22] Epoch: 1 Batch: 12145/20099 (60.43%) Loss: 2.000364 LR: 0.00001763 +[14:04:24] Epoch: 1 Batch: 12146/20099 (60.43%) Loss: 2.124074 LR: 0.00001762 +[14:04:26] Epoch: 1 Batch: 12147/20099 (60.44%) Loss: 2.077605 LR: 0.00001762 +[14:04:28] Epoch: 1 Batch: 12148/20099 (60.44%) Loss: 1.789516 LR: 0.00001762 +[14:04:30] Epoch: 1 Batch: 12149/20099 (60.45%) Loss: 2.044131 LR: 0.00001762 +[14:04:32] Epoch: 1 Batch: 12150/20099 (60.45%) Loss: 2.014624 LR: 0.00001762 +[14:04:33] Epoch: 1 Batch: 12151/20099 (60.46%) Loss: 2.050602 LR: 0.00001762 +[14:04:35] Epoch: 1 Batch: 12152/20099 (60.46%) Loss: 2.448857 LR: 0.00001762 +[14:04:37] Epoch: 1 Batch: 12153/20099 (60.47%) Loss: 2.397296 LR: 0.00001760 +[14:04:39] Epoch: 1 Batch: 12154/20099 (60.47%) Loss: 2.037878 LR: 0.00001760 +[14:04:41] Epoch: 1 Batch: 12155/20099 (60.48%) Loss: 2.370075 LR: 0.00001760 +[14:04:43] Epoch: 1 Batch: 12156/20099 (60.48%) Loss: 2.226198 LR: 0.00001760 +[14:04:45] Epoch: 1 Batch: 12157/20099 (60.49%) Loss: 2.021324 LR: 0.00001760 +[14:04:46] Epoch: 1 Batch: 12158/20099 (60.49%) Loss: 2.083964 LR: 0.00001760 +[14:04:48] Epoch: 1 Batch: 12159/20099 (60.50%) Loss: 2.143790 LR: 0.00001760 +[14:04:50] Epoch: 1 Batch: 12160/20099 (60.50%) Loss: 2.346072 LR: 0.00001759 +[14:04:52] Epoch: 1 Batch: 12161/20099 (60.51%) Loss: 2.188300 LR: 0.00001759 +[14:04:54] Epoch: 1 Batch: 12162/20099 (60.51%) Loss: 2.151467 LR: 0.00001759 +[14:04:56] Epoch: 1 Batch: 12163/20099 (60.52%) Loss: 2.173041 LR: 0.00001759 
+[14:04:58] Epoch: 1 Batch: 12164/20099 (60.52%) Loss: 2.129467 LR: 0.00001759 +[14:04:59] Epoch: 1 Batch: 12165/20099 (60.53%) Loss: 2.196073 LR: 0.00001759 +[14:05:01] Epoch: 1 Batch: 12166/20099 (60.53%) Loss: 2.131527 LR: 0.00001759 +[14:05:03] Epoch: 1 Batch: 12167/20099 (60.54%) Loss: 2.312117 LR: 0.00001757 +[14:05:05] Epoch: 1 Batch: 12168/20099 (60.54%) Loss: 2.464015 LR: 0.00001757 +[14:05:07] Epoch: 1 Batch: 12169/20099 (60.55%) Loss: 2.232410 LR: 0.00001757 +[14:05:09] Epoch: 1 Batch: 12170/20099 (60.55%) Loss: 2.122999 LR: 0.00001757 +[14:05:11] Epoch: 1 Batch: 12171/20099 (60.56%) Loss: 1.865578 LR: 0.00001757 +[14:05:13] Epoch: 1 Batch: 12172/20099 (60.56%) Loss: 1.967662 LR: 0.00001757 +[14:05:14] Epoch: 1 Batch: 12173/20099 (60.57%) Loss: 2.085064 LR: 0.00001757 +[14:05:16] Epoch: 1 Batch: 12174/20099 (60.57%) Loss: 2.224516 LR: 0.00001755 +[14:05:18] Epoch: 1 Batch: 12175/20099 (60.58%) Loss: 2.155399 LR: 0.00001755 +[14:05:20] Epoch: 1 Batch: 12176/20099 (60.58%) Loss: 2.197881 LR: 0.00001755 +[14:05:22] Epoch: 1 Batch: 12177/20099 (60.59%) Loss: 1.926407 LR: 0.00001755 +[14:05:24] Epoch: 1 Batch: 12178/20099 (60.59%) Loss: 2.039332 LR: 0.00001755 +[14:05:26] Epoch: 1 Batch: 12179/20099 (60.60%) Loss: 2.085330 LR: 0.00001755 +[14:05:27] Epoch: 1 Batch: 12180/20099 (60.60%) Loss: 2.057239 LR: 0.00001755 +[14:05:29] Epoch: 1 Batch: 12181/20099 (60.61%) Loss: 1.901259 LR: 0.00001754 +[14:05:31] Epoch: 1 Batch: 12182/20099 (60.61%) Loss: 1.775051 LR: 0.00001754 +[14:05:33] Epoch: 1 Batch: 12183/20099 (60.61%) Loss: 2.312769 LR: 0.00001754 +[14:05:35] Epoch: 1 Batch: 12184/20099 (60.62%) Loss: 1.958318 LR: 0.00001754 +[14:05:37] Epoch: 1 Batch: 12185/20099 (60.62%) Loss: 1.867153 LR: 0.00001754 +[14:05:39] Epoch: 1 Batch: 12186/20099 (60.63%) Loss: 2.051325 LR: 0.00001754 +[14:05:40] Epoch: 1 Batch: 12187/20099 (60.63%) Loss: 2.164819 LR: 0.00001754 +[14:05:42] Epoch: 1 Batch: 12188/20099 (60.64%) Loss: 2.158725 LR: 0.00001752 +[14:05:44] Epoch: 1 
Batch: 12189/20099 (60.64%) Loss: 2.018093 LR: 0.00001752 +[14:05:46] Epoch: 1 Batch: 12190/20099 (60.65%) Loss: 2.090154 LR: 0.00001752 +[14:05:48] Epoch: 1 Batch: 12191/20099 (60.65%) Loss: 2.163364 LR: 0.00001752 +[14:05:50] Epoch: 1 Batch: 12192/20099 (60.66%) Loss: 1.835275 LR: 0.00001752 +[14:05:52] Epoch: 1 Batch: 12193/20099 (60.66%) Loss: 1.752116 LR: 0.00001752 +[14:05:53] Epoch: 1 Batch: 12194/20099 (60.67%) Loss: 2.345974 LR: 0.00001752 +[14:05:55] Epoch: 1 Batch: 12195/20099 (60.67%) Loss: 1.835155 LR: 0.00001750 +[14:05:57] Epoch: 1 Batch: 12196/20099 (60.68%) Loss: 2.089862 LR: 0.00001750 +[14:05:59] Epoch: 1 Batch: 12197/20099 (60.68%) Loss: 2.205111 LR: 0.00001750 +[14:06:01] Epoch: 1 Batch: 12198/20099 (60.69%) Loss: 2.030867 LR: 0.00001750 +[14:06:03] Epoch: 1 Batch: 12199/20099 (60.69%) Loss: 2.235126 LR: 0.00001750 +[14:06:08] >> Cleaned up old temp checkpoint: epoch1_step6400 +[14:06:08] >> Temp checkpoint saved: epoch1_step12200, size: 0.1693 GB +[14:06:08] Epoch: 1 Batch: 12200/20099 (60.70%) Loss: 2.129660 LR: 0.00001750 +[14:06:10] Epoch: 1 Batch: 12201/20099 (60.70%) Loss: 1.889544 LR: 0.00001750 +[14:06:12] Epoch: 1 Batch: 12202/20099 (60.71%) Loss: 2.000486 LR: 0.00001749 +[14:06:14] Epoch: 1 Batch: 12203/20099 (60.71%) Loss: 2.235077 LR: 0.00001749 +[14:06:16] Epoch: 1 Batch: 12204/20099 (60.72%) Loss: 1.891682 LR: 0.00001749 +[14:06:18] Epoch: 1 Batch: 12205/20099 (60.72%) Loss: 2.075753 LR: 0.00001749 +[14:06:19] Epoch: 1 Batch: 12206/20099 (60.73%) Loss: 2.140895 LR: 0.00001749 +[14:06:21] Epoch: 1 Batch: 12207/20099 (60.73%) Loss: 1.768916 LR: 0.00001749 +[14:06:23] Epoch: 1 Batch: 12208/20099 (60.74%) Loss: 2.109254 LR: 0.00001749 +[14:06:25] Epoch: 1 Batch: 12209/20099 (60.74%) Loss: 1.975311 LR: 0.00001747 +[14:06:27] Epoch: 1 Batch: 12210/20099 (60.75%) Loss: 2.150007 LR: 0.00001747 +[14:06:29] Epoch: 1 Batch: 12211/20099 (60.75%) Loss: 2.417830 LR: 0.00001747 +[14:06:30] Epoch: 1 Batch: 12212/20099 (60.76%) Loss: 2.271095 LR: 
0.00001747 +[14:06:32] Epoch: 1 Batch: 12213/20099 (60.76%) Loss: 2.014578 LR: 0.00001747 +[14:06:34] Epoch: 1 Batch: 12214/20099 (60.77%) Loss: 2.148334 LR: 0.00001747 +[14:06:36] Epoch: 1 Batch: 12215/20099 (60.77%) Loss: 2.027583 LR: 0.00001747 +[14:06:38] Epoch: 1 Batch: 12216/20099 (60.78%) Loss: 2.023675 LR: 0.00001746 +[14:06:40] Epoch: 1 Batch: 12217/20099 (60.78%) Loss: 2.166159 LR: 0.00001746 +[14:06:42] Epoch: 1 Batch: 12218/20099 (60.79%) Loss: 2.169972 LR: 0.00001746 +[14:06:44] Epoch: 1 Batch: 12219/20099 (60.79%) Loss: 2.137627 LR: 0.00001746 +[14:06:45] Epoch: 1 Batch: 12220/20099 (60.80%) Loss: 1.948411 LR: 0.00001746 +[14:06:47] Epoch: 1 Batch: 12221/20099 (60.80%) Loss: 1.911548 LR: 0.00001746 +[14:06:49] Epoch: 1 Batch: 12222/20099 (60.81%) Loss: 1.994660 LR: 0.00001746 +[14:06:51] Epoch: 1 Batch: 12223/20099 (60.81%) Loss: 2.241331 LR: 0.00001744 +[14:06:53] Epoch: 1 Batch: 12224/20099 (60.82%) Loss: 1.844083 LR: 0.00001744 +[14:06:55] Epoch: 1 Batch: 12225/20099 (60.82%) Loss: 2.234322 LR: 0.00001744 +[14:06:57] Epoch: 1 Batch: 12226/20099 (60.83%) Loss: 2.269020 LR: 0.00001744 +[14:06:58] Epoch: 1 Batch: 12227/20099 (60.83%) Loss: 1.904417 LR: 0.00001744 +[14:07:00] Epoch: 1 Batch: 12228/20099 (60.84%) Loss: 2.110585 LR: 0.00001744 +[14:07:02] Epoch: 1 Batch: 12229/20099 (60.84%) Loss: 2.271290 LR: 0.00001744 +[14:07:04] Epoch: 1 Batch: 12230/20099 (60.85%) Loss: 2.103977 LR: 0.00001742 +[14:07:06] Epoch: 1 Batch: 12231/20099 (60.85%) Loss: 2.149932 LR: 0.00001742 +[14:07:08] Epoch: 1 Batch: 12232/20099 (60.86%) Loss: 2.026397 LR: 0.00001742 +[14:07:10] Epoch: 1 Batch: 12233/20099 (60.86%) Loss: 2.040605 LR: 0.00001742 +[14:07:12] Epoch: 1 Batch: 12234/20099 (60.87%) Loss: 1.831052 LR: 0.00001742 +[14:07:14] Epoch: 1 Batch: 12235/20099 (60.87%) Loss: 2.258478 LR: 0.00001742 +[14:07:15] Epoch: 1 Batch: 12236/20099 (60.88%) Loss: 2.116243 LR: 0.00001742 +[14:07:17] Epoch: 1 Batch: 12237/20099 (60.88%) Loss: 2.301134 LR: 0.00001741 +[14:07:19] 
Epoch: 1 Batch: 12238/20099 (60.89%) Loss: 2.100321 LR: 0.00001741 +[14:07:21] Epoch: 1 Batch: 12239/20099 (60.89%) Loss: 1.976056 LR: 0.00001741 +[14:07:23] Epoch: 1 Batch: 12240/20099 (60.90%) Loss: 2.194976 LR: 0.00001741 +[14:07:25] Epoch: 1 Batch: 12241/20099 (60.90%) Loss: 2.168750 LR: 0.00001741 +[14:07:27] Epoch: 1 Batch: 12242/20099 (60.91%) Loss: 2.170334 LR: 0.00001741 +[14:07:28] Epoch: 1 Batch: 12243/20099 (60.91%) Loss: 1.903274 LR: 0.00001741 +[14:07:30] Epoch: 1 Batch: 12244/20099 (60.92%) Loss: 2.520241 LR: 0.00001739 +[14:07:32] Epoch: 1 Batch: 12245/20099 (60.92%) Loss: 2.004865 LR: 0.00001739 +[14:07:34] Epoch: 1 Batch: 12246/20099 (60.93%) Loss: 1.930843 LR: 0.00001739 +[14:07:36] Epoch: 1 Batch: 12247/20099 (60.93%) Loss: 2.264266 LR: 0.00001739 +[14:07:38] Epoch: 1 Batch: 12248/20099 (60.94%) Loss: 2.082570 LR: 0.00001739 +[14:07:40] Epoch: 1 Batch: 12249/20099 (60.94%) Loss: 2.085773 LR: 0.00001739 +[14:07:41] Epoch: 1 Batch: 12250/20099 (60.95%) Loss: 1.978746 LR: 0.00001739 +[14:07:43] Epoch: 1 Batch: 12251/20099 (60.95%) Loss: 1.813149 LR: 0.00001737 +[14:07:45] Epoch: 1 Batch: 12252/20099 (60.96%) Loss: 2.146427 LR: 0.00001737 +[14:07:47] Epoch: 1 Batch: 12253/20099 (60.96%) Loss: 2.369604 LR: 0.00001737 +[14:07:49] Epoch: 1 Batch: 12254/20099 (60.97%) Loss: 2.034620 LR: 0.00001737 +[14:07:51] Epoch: 1 Batch: 12255/20099 (60.97%) Loss: 1.722326 LR: 0.00001737 +[14:07:53] Epoch: 1 Batch: 12256/20099 (60.98%) Loss: 2.052079 LR: 0.00001737 +[14:07:54] Epoch: 1 Batch: 12257/20099 (60.98%) Loss: 2.218084 LR: 0.00001737 +[14:07:56] Epoch: 1 Batch: 12258/20099 (60.99%) Loss: 2.122533 LR: 0.00001736 +[14:07:58] Epoch: 1 Batch: 12259/20099 (60.99%) Loss: 2.028579 LR: 0.00001736 +[14:08:00] Epoch: 1 Batch: 12260/20099 (61.00%) Loss: 1.887762 LR: 0.00001736 +[14:08:02] Epoch: 1 Batch: 12261/20099 (61.00%) Loss: 2.286776 LR: 0.00001736 +[14:08:04] Epoch: 1 Batch: 12262/20099 (61.01%) Loss: 2.183238 LR: 0.00001736 +[14:08:06] Epoch: 1 Batch: 
12263/20099 (61.01%) Loss: 2.376193 LR: 0.00001736 +[14:08:07] Epoch: 1 Batch: 12264/20099 (61.02%) Loss: 2.083225 LR: 0.00001736 +[14:08:09] Epoch: 1 Batch: 12265/20099 (61.02%) Loss: 1.946620 LR: 0.00001734 +[14:08:11] Epoch: 1 Batch: 12266/20099 (61.03%) Loss: 1.990003 LR: 0.00001734 +[14:08:13] Epoch: 1 Batch: 12267/20099 (61.03%) Loss: 2.013268 LR: 0.00001734 +[14:08:15] Epoch: 1 Batch: 12268/20099 (61.04%) Loss: 2.114145 LR: 0.00001734 +[14:08:17] Epoch: 1 Batch: 12269/20099 (61.04%) Loss: 2.075821 LR: 0.00001734 +[14:08:19] Epoch: 1 Batch: 12270/20099 (61.05%) Loss: 1.953556 LR: 0.00001734 +[14:08:20] Epoch: 1 Batch: 12271/20099 (61.05%) Loss: 2.182477 LR: 0.00001734 +[14:08:22] Epoch: 1 Batch: 12272/20099 (61.06%) Loss: 2.005339 LR: 0.00001733 +[14:08:24] Epoch: 1 Batch: 12273/20099 (61.06%) Loss: 2.243406 LR: 0.00001733 +[14:08:26] Epoch: 1 Batch: 12274/20099 (61.07%) Loss: 2.024215 LR: 0.00001733 +[14:08:28] Epoch: 1 Batch: 12275/20099 (61.07%) Loss: 2.155385 LR: 0.00001733 +[14:08:30] Epoch: 1 Batch: 12276/20099 (61.08%) Loss: 2.214620 LR: 0.00001733 +[14:08:32] Epoch: 1 Batch: 12277/20099 (61.08%) Loss: 2.230641 LR: 0.00001733 +[14:08:34] Epoch: 1 Batch: 12278/20099 (61.09%) Loss: 2.073393 LR: 0.00001733 +[14:08:35] Epoch: 1 Batch: 12279/20099 (61.09%) Loss: 2.311084 LR: 0.00001731 +[14:08:37] Epoch: 1 Batch: 12280/20099 (61.10%) Loss: 2.190411 LR: 0.00001731 +[14:08:39] Epoch: 1 Batch: 12281/20099 (61.10%) Loss: 2.011444 LR: 0.00001731 +[14:08:41] Epoch: 1 Batch: 12282/20099 (61.11%) Loss: 1.952600 LR: 0.00001731 +[14:08:43] Epoch: 1 Batch: 12283/20099 (61.11%) Loss: 1.977123 LR: 0.00001731 +[14:08:45] Epoch: 1 Batch: 12284/20099 (61.12%) Loss: 1.981080 LR: 0.00001731 +[14:08:47] Epoch: 1 Batch: 12285/20099 (61.12%) Loss: 1.952946 LR: 0.00001731 +[14:08:48] Epoch: 1 Batch: 12286/20099 (61.13%) Loss: 1.601518 LR: 0.00001729 +[14:08:50] Epoch: 1 Batch: 12287/20099 (61.13%) Loss: 2.420287 LR: 0.00001729 +[14:08:52] Epoch: 1 Batch: 12288/20099 (61.14%) 
Loss: 2.311944 LR: 0.00001729 +[14:08:54] Epoch: 1 Batch: 12289/20099 (61.14%) Loss: 2.116939 LR: 0.00001729 +[14:08:56] Epoch: 1 Batch: 12290/20099 (61.15%) Loss: 2.360302 LR: 0.00001729 +[14:08:58] Epoch: 1 Batch: 12291/20099 (61.15%) Loss: 2.028761 LR: 0.00001729 +[14:09:00] Epoch: 1 Batch: 12292/20099 (61.16%) Loss: 1.989937 LR: 0.00001729 +[14:09:01] Epoch: 1 Batch: 12293/20099 (61.16%) Loss: 2.094577 LR: 0.00001728 +[14:09:03] Epoch: 1 Batch: 12294/20099 (61.17%) Loss: 2.171137 LR: 0.00001728 +[14:09:05] Epoch: 1 Batch: 12295/20099 (61.17%) Loss: 1.821798 LR: 0.00001728 +[14:09:07] Epoch: 1 Batch: 12296/20099 (61.18%) Loss: 2.183697 LR: 0.00001728 +[14:09:09] Epoch: 1 Batch: 12297/20099 (61.18%) Loss: 1.888962 LR: 0.00001728 +[14:09:11] Epoch: 1 Batch: 12298/20099 (61.19%) Loss: 2.053149 LR: 0.00001728 +[14:09:13] Epoch: 1 Batch: 12299/20099 (61.19%) Loss: 2.069914 LR: 0.00001728 +[14:09:14] Epoch: 1 Batch: 12300/20099 (61.20%) Loss: 1.943260 LR: 0.00001726 +[14:09:16] Epoch: 1 Batch: 12301/20099 (61.20%) Loss: 1.901483 LR: 0.00001726 +[14:09:18] Epoch: 1 Batch: 12302/20099 (61.21%) Loss: 2.171075 LR: 0.00001726 +[14:09:20] Epoch: 1 Batch: 12303/20099 (61.21%) Loss: 1.841258 LR: 0.00001726 +[14:09:22] Epoch: 1 Batch: 12304/20099 (61.22%) Loss: 2.150431 LR: 0.00001726 +[14:09:24] Epoch: 1 Batch: 12305/20099 (61.22%) Loss: 2.021732 LR: 0.00001726 +[14:09:26] Epoch: 1 Batch: 12306/20099 (61.23%) Loss: 2.056081 LR: 0.00001726 +[14:09:27] Epoch: 1 Batch: 12307/20099 (61.23%) Loss: 2.321287 LR: 0.00001725 +[14:09:29] Epoch: 1 Batch: 12308/20099 (61.24%) Loss: 2.158605 LR: 0.00001725 +[14:09:31] Epoch: 1 Batch: 12309/20099 (61.24%) Loss: 2.015778 LR: 0.00001725 +[14:09:33] Epoch: 1 Batch: 12310/20099 (61.25%) Loss: 2.183012 LR: 0.00001725 +[14:09:35] Epoch: 1 Batch: 12311/20099 (61.25%) Loss: 1.976186 LR: 0.00001725 +[14:09:37] Epoch: 1 Batch: 12312/20099 (61.26%) Loss: 1.936267 LR: 0.00001725 +[14:09:39] Epoch: 1 Batch: 12313/20099 (61.26%) Loss: 2.066044 LR: 
0.00001725 +[14:09:41] Epoch: 1 Batch: 12314/20099 (61.27%) Loss: 2.134550 LR: 0.00001723 +[14:09:42] Epoch: 1 Batch: 12315/20099 (61.27%) Loss: 2.259176 LR: 0.00001723 +[14:09:44] Epoch: 1 Batch: 12316/20099 (61.28%) Loss: 2.229385 LR: 0.00001723 +[14:09:46] Epoch: 1 Batch: 12317/20099 (61.28%) Loss: 2.068820 LR: 0.00001723 +[14:09:48] Epoch: 1 Batch: 12318/20099 (61.29%) Loss: 2.189961 LR: 0.00001723 +[14:09:50] Epoch: 1 Batch: 12319/20099 (61.29%) Loss: 2.024681 LR: 0.00001723 +[14:09:52] Epoch: 1 Batch: 12320/20099 (61.30%) Loss: 1.749451 LR: 0.00001723 +[14:09:54] Epoch: 1 Batch: 12321/20099 (61.30%) Loss: 2.235156 LR: 0.00001721 +[14:09:55] Epoch: 1 Batch: 12322/20099 (61.31%) Loss: 1.913757 LR: 0.00001721 +[14:09:57] Epoch: 1 Batch: 12323/20099 (61.31%) Loss: 2.117591 LR: 0.00001721 +[14:09:59] Epoch: 1 Batch: 12324/20099 (61.32%) Loss: 1.765773 LR: 0.00001721 +[14:10:01] Epoch: 1 Batch: 12325/20099 (61.32%) Loss: 2.133766 LR: 0.00001721 +[14:10:03] Epoch: 1 Batch: 12326/20099 (61.33%) Loss: 2.226673 LR: 0.00001721 +[14:10:05] Epoch: 1 Batch: 12327/20099 (61.33%) Loss: 2.112855 LR: 0.00001721 +[14:10:07] Epoch: 1 Batch: 12328/20099 (61.34%) Loss: 2.135373 LR: 0.00001720 +[14:10:08] Epoch: 1 Batch: 12329/20099 (61.34%) Loss: 2.166035 LR: 0.00001720 +[14:10:10] Epoch: 1 Batch: 12330/20099 (61.35%) Loss: 2.516033 LR: 0.00001720 +[14:10:12] Epoch: 1 Batch: 12331/20099 (61.35%) Loss: 2.023518 LR: 0.00001720 +[14:10:14] Epoch: 1 Batch: 12332/20099 (61.36%) Loss: 2.396083 LR: 0.00001720 +[14:10:16] Epoch: 1 Batch: 12333/20099 (61.36%) Loss: 1.858010 LR: 0.00001720 +[14:10:18] Epoch: 1 Batch: 12334/20099 (61.37%) Loss: 2.137988 LR: 0.00001720 +[14:10:20] Epoch: 1 Batch: 12335/20099 (61.37%) Loss: 2.153458 LR: 0.00001718 +[14:10:21] Epoch: 1 Batch: 12336/20099 (61.38%) Loss: 2.187206 LR: 0.00001718 +[14:10:23] Epoch: 1 Batch: 12337/20099 (61.38%) Loss: 1.915498 LR: 0.00001718 +[14:10:25] Epoch: 1 Batch: 12338/20099 (61.39%) Loss: 2.029223 LR: 0.00001718 +[14:10:27] 
Epoch: 1 Batch: 12339/20099 (61.39%) Loss: 1.861312 LR: 0.00001718 +[14:10:29] Epoch: 1 Batch: 12340/20099 (61.40%) Loss: 2.329096 LR: 0.00001718 +[14:10:31] Epoch: 1 Batch: 12341/20099 (61.40%) Loss: 2.247525 LR: 0.00001718 +[14:10:33] Epoch: 1 Batch: 12342/20099 (61.41%) Loss: 2.138009 LR: 0.00001716 +[14:10:34] Epoch: 1 Batch: 12343/20099 (61.41%) Loss: 1.934742 LR: 0.00001716 +[14:10:36] Epoch: 1 Batch: 12344/20099 (61.42%) Loss: 2.134487 LR: 0.00001716 +[14:10:38] Epoch: 1 Batch: 12345/20099 (61.42%) Loss: 1.961718 LR: 0.00001716 +[14:10:40] Epoch: 1 Batch: 12346/20099 (61.43%) Loss: 2.383892 LR: 0.00001716 +[14:10:42] Epoch: 1 Batch: 12347/20099 (61.43%) Loss: 2.035318 LR: 0.00001716 +[14:10:44] Epoch: 1 Batch: 12348/20099 (61.44%) Loss: 1.997882 LR: 0.00001716 +[14:10:46] Epoch: 1 Batch: 12349/20099 (61.44%) Loss: 2.055947 LR: 0.00001715 +[14:10:47] Epoch: 1 Batch: 12350/20099 (61.45%) Loss: 1.792543 LR: 0.00001715 +[14:10:49] Epoch: 1 Batch: 12351/20099 (61.45%) Loss: 2.228012 LR: 0.00001715 +[14:10:51] Epoch: 1 Batch: 12352/20099 (61.46%) Loss: 2.345986 LR: 0.00001715 +[14:10:53] Epoch: 1 Batch: 12353/20099 (61.46%) Loss: 2.046339 LR: 0.00001715 +[14:10:55] Epoch: 1 Batch: 12354/20099 (61.47%) Loss: 1.882083 LR: 0.00001715 +[14:10:57] Epoch: 1 Batch: 12355/20099 (61.47%) Loss: 1.998046 LR: 0.00001715 +[14:10:59] Epoch: 1 Batch: 12356/20099 (61.48%) Loss: 2.249279 LR: 0.00001713 +[14:11:00] Epoch: 1 Batch: 12357/20099 (61.48%) Loss: 1.971248 LR: 0.00001713 +[14:11:02] Epoch: 1 Batch: 12358/20099 (61.49%) Loss: 2.123055 LR: 0.00001713 +[14:11:04] Epoch: 1 Batch: 12359/20099 (61.49%) Loss: 2.101486 LR: 0.00001713 +[14:11:06] Epoch: 1 Batch: 12360/20099 (61.50%) Loss: 2.082289 LR: 0.00001713 +[14:11:08] Epoch: 1 Batch: 12361/20099 (61.50%) Loss: 2.331289 LR: 0.00001713 +[14:11:10] Epoch: 1 Batch: 12362/20099 (61.51%) Loss: 2.107315 LR: 0.00001713 +[14:11:11] Epoch: 1 Batch: 12363/20099 (61.51%) Loss: 2.284415 LR: 0.00001712 +[14:11:13] Epoch: 1 Batch: 
12364/20099 (61.52%) Loss: 1.960275 LR: 0.00001712 +[14:11:15] Epoch: 1 Batch: 12365/20099 (61.52%) Loss: 2.167596 LR: 0.00001712 +[14:11:17] Epoch: 1 Batch: 12366/20099 (61.53%) Loss: 2.081597 LR: 0.00001712 +[14:11:19] Epoch: 1 Batch: 12367/20099 (61.53%) Loss: 2.227604 LR: 0.00001712 +[14:11:21] Epoch: 1 Batch: 12368/20099 (61.54%) Loss: 2.049858 LR: 0.00001712 +[14:11:23] Epoch: 1 Batch: 12369/20099 (61.54%) Loss: 2.007095 LR: 0.00001712 +[14:11:24] Epoch: 1 Batch: 12370/20099 (61.55%) Loss: 2.088302 LR: 0.00001710 +[14:11:26] Epoch: 1 Batch: 12371/20099 (61.55%) Loss: 1.854296 LR: 0.00001710 +[14:11:28] Epoch: 1 Batch: 12372/20099 (61.56%) Loss: 2.133678 LR: 0.00001710 +[14:11:30] Epoch: 1 Batch: 12373/20099 (61.56%) Loss: 2.164837 LR: 0.00001710 +[14:11:32] Epoch: 1 Batch: 12374/20099 (61.57%) Loss: 2.241078 LR: 0.00001710 +[14:11:34] Epoch: 1 Batch: 12375/20099 (61.57%) Loss: 1.960070 LR: 0.00001710 +[14:11:36] Epoch: 1 Batch: 12376/20099 (61.58%) Loss: 2.052070 LR: 0.00001710 +[14:11:37] Epoch: 1 Batch: 12377/20099 (61.58%) Loss: 2.150925 LR: 0.00001708 +[14:11:39] Epoch: 1 Batch: 12378/20099 (61.59%) Loss: 2.073399 LR: 0.00001708 +[14:11:41] Epoch: 1 Batch: 12379/20099 (61.59%) Loss: 2.057556 LR: 0.00001708 +[14:11:43] Epoch: 1 Batch: 12380/20099 (61.60%) Loss: 2.113719 LR: 0.00001708 +[14:11:45] Epoch: 1 Batch: 12381/20099 (61.60%) Loss: 1.937099 LR: 0.00001708 +[14:11:47] Epoch: 1 Batch: 12382/20099 (61.61%) Loss: 2.139065 LR: 0.00001708 +[14:11:49] Epoch: 1 Batch: 12383/20099 (61.61%) Loss: 2.112404 LR: 0.00001708 +[14:11:50] Epoch: 1 Batch: 12384/20099 (61.62%) Loss: 2.174942 LR: 0.00001707 +[14:11:52] Epoch: 1 Batch: 12385/20099 (61.62%) Loss: 1.969639 LR: 0.00001707 +[14:11:54] Epoch: 1 Batch: 12386/20099 (61.62%) Loss: 1.988585 LR: 0.00001707 +[14:11:56] Epoch: 1 Batch: 12387/20099 (61.63%) Loss: 2.242295 LR: 0.00001707 +[14:11:58] Epoch: 1 Batch: 12388/20099 (61.63%) Loss: 2.712900 LR: 0.00001707 +[14:12:00] Epoch: 1 Batch: 12389/20099 (61.64%) 
Loss: 1.821630 LR: 0.00001707 +[14:12:02] Epoch: 1 Batch: 12390/20099 (61.64%) Loss: 2.124241 LR: 0.00001707 +[14:12:03] Epoch: 1 Batch: 12391/20099 (61.65%) Loss: 2.129233 LR: 0.00001705 +[14:12:05] Epoch: 1 Batch: 12392/20099 (61.65%) Loss: 2.216093 LR: 0.00001705 +[14:12:07] Epoch: 1 Batch: 12393/20099 (61.66%) Loss: 2.101609 LR: 0.00001705 +[14:12:09] Epoch: 1 Batch: 12394/20099 (61.66%) Loss: 2.177756 LR: 0.00001705 +[14:12:11] Epoch: 1 Batch: 12395/20099 (61.67%) Loss: 1.839501 LR: 0.00001705 +[14:12:13] Epoch: 1 Batch: 12396/20099 (61.67%) Loss: 2.259667 LR: 0.00001705 +[14:12:15] Epoch: 1 Batch: 12397/20099 (61.68%) Loss: 1.995819 LR: 0.00001705 +[14:12:16] Epoch: 1 Batch: 12398/20099 (61.68%) Loss: 2.227104 LR: 0.00001703 +[14:12:18] Epoch: 1 Batch: 12399/20099 (61.69%) Loss: 2.415015 LR: 0.00001703 +[14:12:24] >> Cleaned up old temp checkpoint: epoch1_step6600 +[14:12:24] >> Temp checkpoint saved: epoch1_step12400, size: 0.1693 GB +[14:12:24] Epoch: 1 Batch: 12400/20099 (61.69%) Loss: 2.087462 LR: 0.00001703 +[14:12:26] Epoch: 1 Batch: 12401/20099 (61.70%) Loss: 2.165205 LR: 0.00001703 +[14:12:28] Epoch: 1 Batch: 12402/20099 (61.70%) Loss: 2.255311 LR: 0.00001703 +[14:12:30] Epoch: 1 Batch: 12403/20099 (61.71%) Loss: 2.219689 LR: 0.00001703 +[14:12:31] Epoch: 1 Batch: 12404/20099 (61.71%) Loss: 1.792379 LR: 0.00001703 +[14:12:33] Epoch: 1 Batch: 12405/20099 (61.72%) Loss: 1.937288 LR: 0.00001702 +[14:12:35] Epoch: 1 Batch: 12406/20099 (61.72%) Loss: 2.165641 LR: 0.00001702 +[14:12:37] Epoch: 1 Batch: 12407/20099 (61.73%) Loss: 1.932520 LR: 0.00001702 +[14:12:39] Epoch: 1 Batch: 12408/20099 (61.73%) Loss: 2.291670 LR: 0.00001702 +[14:12:41] Epoch: 1 Batch: 12409/20099 (61.74%) Loss: 2.017411 LR: 0.00001702 +[14:12:42] Epoch: 1 Batch: 12410/20099 (61.74%) Loss: 2.170950 LR: 0.00001702 +[14:12:44] Epoch: 1 Batch: 12411/20099 (61.75%) Loss: 2.203439 LR: 0.00001702 +[14:12:46] Epoch: 1 Batch: 12412/20099 (61.75%) Loss: 2.158433 LR: 0.00001700 +[14:12:48] 
Epoch: 1 Batch: 12413/20099 (61.76%) Loss: 1.862632 LR: 0.00001700 +[14:12:50] Epoch: 1 Batch: 12414/20099 (61.76%) Loss: 2.106500 LR: 0.00001700 +[14:12:52] Epoch: 1 Batch: 12415/20099 (61.77%) Loss: 1.967963 LR: 0.00001700 +[14:12:54] Epoch: 1 Batch: 12416/20099 (61.77%) Loss: 2.075730 LR: 0.00001700 +[14:12:55] Epoch: 1 Batch: 12417/20099 (61.78%) Loss: 1.867535 LR: 0.00001700 +[14:12:57] Epoch: 1 Batch: 12418/20099 (61.78%) Loss: 2.073011 LR: 0.00001700 +[14:12:59] Epoch: 1 Batch: 12419/20099 (61.79%) Loss: 2.140473 LR: 0.00001699 +[14:13:01] Epoch: 1 Batch: 12420/20099 (61.79%) Loss: 1.883654 LR: 0.00001699 +[14:13:03] Epoch: 1 Batch: 12421/20099 (61.80%) Loss: 1.788712 LR: 0.00001699 +[14:13:05] Epoch: 1 Batch: 12422/20099 (61.80%) Loss: 2.033268 LR: 0.00001699 +[14:13:07] Epoch: 1 Batch: 12423/20099 (61.81%) Loss: 2.039468 LR: 0.00001699 +[14:13:08] Epoch: 1 Batch: 12424/20099 (61.81%) Loss: 2.231876 LR: 0.00001699 +[14:13:10] Epoch: 1 Batch: 12425/20099 (61.82%) Loss: 2.306885 LR: 0.00001699 +[14:13:12] Epoch: 1 Batch: 12426/20099 (61.82%) Loss: 2.272599 LR: 0.00001697 +[14:13:14] Epoch: 1 Batch: 12427/20099 (61.83%) Loss: 2.311101 LR: 0.00001697 +[14:13:16] Epoch: 1 Batch: 12428/20099 (61.83%) Loss: 2.061842 LR: 0.00001697 +[14:13:18] Epoch: 1 Batch: 12429/20099 (61.84%) Loss: 1.502964 LR: 0.00001697 +[14:13:20] Epoch: 1 Batch: 12430/20099 (61.84%) Loss: 2.062404 LR: 0.00001697 +[14:13:21] Epoch: 1 Batch: 12431/20099 (61.85%) Loss: 1.749862 LR: 0.00001697 +[14:13:23] Epoch: 1 Batch: 12432/20099 (61.85%) Loss: 1.992285 LR: 0.00001697 +[14:13:25] Epoch: 1 Batch: 12433/20099 (61.86%) Loss: 1.931303 LR: 0.00001695 +[14:13:27] Epoch: 1 Batch: 12434/20099 (61.86%) Loss: 2.504176 LR: 0.00001695 +[14:13:29] Epoch: 1 Batch: 12435/20099 (61.87%) Loss: 1.968347 LR: 0.00001695 +[14:13:31] Epoch: 1 Batch: 12436/20099 (61.87%) Loss: 1.581026 LR: 0.00001695 +[14:13:32] Epoch: 1 Batch: 12437/20099 (61.88%) Loss: 2.041371 LR: 0.00001695 +[14:13:34] Epoch: 1 Batch: 
12438/20099 (61.88%) Loss: 2.134628 LR: 0.00001695 +[14:13:36] Epoch: 1 Batch: 12439/20099 (61.89%) Loss: 2.034686 LR: 0.00001695 +[14:13:38] Epoch: 1 Batch: 12440/20099 (61.89%) Loss: 1.903062 LR: 0.00001694 +[14:13:40] Epoch: 1 Batch: 12441/20099 (61.90%) Loss: 1.883896 LR: 0.00001694 +[14:13:42] Epoch: 1 Batch: 12442/20099 (61.90%) Loss: 2.332543 LR: 0.00001694 +[14:13:44] Epoch: 1 Batch: 12443/20099 (61.91%) Loss: 2.174825 LR: 0.00001694 +[14:13:45] Epoch: 1 Batch: 12444/20099 (61.91%) Loss: 2.151270 LR: 0.00001694 +[14:13:47] Epoch: 1 Batch: 12445/20099 (61.92%) Loss: 1.911283 LR: 0.00001694 +[14:13:49] Epoch: 1 Batch: 12446/20099 (61.92%) Loss: 2.189357 LR: 0.00001694 +[14:13:51] Epoch: 1 Batch: 12447/20099 (61.93%) Loss: 2.218433 LR: 0.00001692 +[14:13:53] Epoch: 1 Batch: 12448/20099 (61.93%) Loss: 2.211326 LR: 0.00001692 +[14:13:55] Epoch: 1 Batch: 12449/20099 (61.94%) Loss: 1.930006 LR: 0.00001692 +[14:13:57] Epoch: 1 Batch: 12450/20099 (61.94%) Loss: 1.699205 LR: 0.00001692 +[14:13:58] Epoch: 1 Batch: 12451/20099 (61.95%) Loss: 2.045350 LR: 0.00001692 +[14:14:00] Epoch: 1 Batch: 12452/20099 (61.95%) Loss: 2.184683 LR: 0.00001692 +[14:14:02] Epoch: 1 Batch: 12453/20099 (61.96%) Loss: 2.132371 LR: 0.00001692 +[14:14:04] Epoch: 1 Batch: 12454/20099 (61.96%) Loss: 2.234082 LR: 0.00001691 +[14:14:06] Epoch: 1 Batch: 12455/20099 (61.97%) Loss: 2.128109 LR: 0.00001691 +[14:14:08] Epoch: 1 Batch: 12456/20099 (61.97%) Loss: 2.013742 LR: 0.00001691 +[14:14:10] Epoch: 1 Batch: 12457/20099 (61.98%) Loss: 2.000071 LR: 0.00001691 +[14:14:11] Epoch: 1 Batch: 12458/20099 (61.98%) Loss: 2.215206 LR: 0.00001691 +[14:14:13] Epoch: 1 Batch: 12459/20099 (61.99%) Loss: 2.092292 LR: 0.00001691 +[14:14:15] Epoch: 1 Batch: 12460/20099 (61.99%) Loss: 2.036276 LR: 0.00001691 +[14:14:17] Epoch: 1 Batch: 12461/20099 (62.00%) Loss: 2.195232 LR: 0.00001689 +[14:14:19] Epoch: 1 Batch: 12462/20099 (62.00%) Loss: 1.989264 LR: 0.00001689 +[14:14:21] Epoch: 1 Batch: 12463/20099 (62.01%) 
Loss: 2.147491 LR: 0.00001689 +[14:14:22] Epoch: 1 Batch: 12464/20099 (62.01%) Loss: 2.230912 LR: 0.00001689 +[14:14:24] Epoch: 1 Batch: 12465/20099 (62.02%) Loss: 1.956182 LR: 0.00001689 +[14:14:26] Epoch: 1 Batch: 12466/20099 (62.02%) Loss: 1.994058 LR: 0.00001689 +[14:14:28] Epoch: 1 Batch: 12467/20099 (62.03%) Loss: 2.233642 LR: 0.00001689 +[14:14:30] Epoch: 1 Batch: 12468/20099 (62.03%) Loss: 2.216034 LR: 0.00001687 +[14:14:32] Epoch: 1 Batch: 12469/20099 (62.04%) Loss: 1.901636 LR: 0.00001687 +[14:14:34] Epoch: 1 Batch: 12470/20099 (62.04%) Loss: 2.061889 LR: 0.00001687 +[14:14:35] Epoch: 1 Batch: 12471/20099 (62.05%) Loss: 1.928025 LR: 0.00001687 +[14:14:37] Epoch: 1 Batch: 12472/20099 (62.05%) Loss: 2.115180 LR: 0.00001687 +[14:14:39] Epoch: 1 Batch: 12473/20099 (62.06%) Loss: 2.171680 LR: 0.00001687 +[14:14:41] Epoch: 1 Batch: 12474/20099 (62.06%) Loss: 1.871390 LR: 0.00001687 +[14:14:43] Epoch: 1 Batch: 12475/20099 (62.07%) Loss: 2.108838 LR: 0.00001686 +[14:14:45] Epoch: 1 Batch: 12476/20099 (62.07%) Loss: 1.932359 LR: 0.00001686 +[14:14:47] Epoch: 1 Batch: 12477/20099 (62.08%) Loss: 1.982171 LR: 0.00001686 +[14:14:48] Epoch: 1 Batch: 12478/20099 (62.08%) Loss: 1.989198 LR: 0.00001686 +[14:14:50] Epoch: 1 Batch: 12479/20099 (62.09%) Loss: 2.440098 LR: 0.00001686 +[14:14:52] Epoch: 1 Batch: 12480/20099 (62.09%) Loss: 2.278450 LR: 0.00001686 +[14:14:54] Epoch: 1 Batch: 12481/20099 (62.10%) Loss: 1.972771 LR: 0.00001686 +[14:14:56] Epoch: 1 Batch: 12482/20099 (62.10%) Loss: 2.254585 LR: 0.00001684 +[14:14:58] Epoch: 1 Batch: 12483/20099 (62.11%) Loss: 1.506684 LR: 0.00001684 +[14:15:00] Epoch: 1 Batch: 12484/20099 (62.11%) Loss: 2.111426 LR: 0.00001684 +[14:15:01] Epoch: 1 Batch: 12485/20099 (62.12%) Loss: 2.167999 LR: 0.00001684 +[14:15:03] Epoch: 1 Batch: 12486/20099 (62.12%) Loss: 2.215415 LR: 0.00001684 +[14:15:05] Epoch: 1 Batch: 12487/20099 (62.13%) Loss: 2.039410 LR: 0.00001684 +[14:15:07] Epoch: 1 Batch: 12488/20099 (62.13%) Loss: 1.945113 LR: 
0.00001684 +[14:15:09] Epoch: 1 Batch: 12489/20099 (62.14%) Loss: 2.236220 LR: 0.00001682 +[14:15:11] Epoch: 1 Batch: 12490/20099 (62.14%) Loss: 1.776360 LR: 0.00001682 +[14:15:13] Epoch: 1 Batch: 12491/20099 (62.15%) Loss: 2.113581 LR: 0.00001682 +[14:15:14] Epoch: 1 Batch: 12492/20099 (62.15%) Loss: 2.022442 LR: 0.00001682 +[14:15:16] Epoch: 1 Batch: 12493/20099 (62.16%) Loss: 2.055039 LR: 0.00001682 +[14:15:18] Epoch: 1 Batch: 12494/20099 (62.16%) Loss: 2.216140 LR: 0.00001682 +[14:15:20] Epoch: 1 Batch: 12495/20099 (62.17%) Loss: 1.930246 LR: 0.00001682 +[14:15:22] Epoch: 1 Batch: 12496/20099 (62.17%) Loss: 1.843150 LR: 0.00001681 +[14:15:24] Epoch: 1 Batch: 12497/20099 (62.18%) Loss: 2.334273 LR: 0.00001681 +[14:15:26] Epoch: 1 Batch: 12498/20099 (62.18%) Loss: 2.158778 LR: 0.00001681 +[14:15:27] Epoch: 1 Batch: 12499/20099 (62.19%) Loss: 2.173199 LR: 0.00001681 +[14:15:29] >> Evaluating batch 0 +[14:15:30] >> Evaluating batch 1 +[14:15:32] >> Evaluating batch 2 +[14:15:33] >> Evaluating batch 3 +[14:15:34] >> Evaluating batch 4 +[14:15:35] >> Evaluating batch 5 +[14:15:36] >> Evaluating batch 6 +[14:15:37] >> Evaluating batch 7 +[14:15:38] >> Evaluating batch 8 +[14:15:39] >> Evaluating batch 9 +[14:15:40] >> Evaluating batch 10 +[14:15:41] >> Evaluating batch 11 +[14:15:42] >> Evaluating batch 12 +[14:15:43] >> Evaluating batch 13 +[14:15:44] >> Evaluating batch 14 +[14:15:45] >> Evaluating batch 15 +[14:15:46] >> Evaluating batch 16 +[14:15:47] Epoch: 1 Step: 12500/20099 Evaluation: +[14:15:47] Avg Loss Since Last Eval: 2.0854 Val Loss: 2.1564 Validation loss delta: -0.0021 Perplexity: 8.6399 LR: 0.00001681 +[14:15:51] >> Checkpoint saved: epoch1_step12500, size: 0.1693 GB +[14:15:51] Epoch: 1 Batch: 12500/20099 (62.19%) Loss: 2.219299 LR: 0.00001681 +[14:15:52] Epoch: 1 Batch: 12501/20099 (62.20%) Loss: 2.284016 LR: 0.00001681 +[14:15:54] Epoch: 1 Batch: 12502/20099 (62.20%) Loss: 1.825728 LR: 0.00001681 +[14:15:56] Epoch: 1 Batch: 12503/20099 (62.21%) 
Loss: 2.208636 LR: 0.00001679 +[14:15:58] Epoch: 1 Batch: 12504/20099 (62.21%) Loss: 2.339548 LR: 0.00001679 +[14:16:00] Epoch: 1 Batch: 12505/20099 (62.22%) Loss: 2.153552 LR: 0.00001679 +[14:16:02] Epoch: 1 Batch: 12506/20099 (62.22%) Loss: 1.996155 LR: 0.00001679 +[14:16:03] Epoch: 1 Batch: 12507/20099 (62.23%) Loss: 2.367830 LR: 0.00001679 +[14:16:05] Epoch: 1 Batch: 12508/20099 (62.23%) Loss: 1.989859 LR: 0.00001679 +[14:16:07] Epoch: 1 Batch: 12509/20099 (62.24%) Loss: 1.975428 LR: 0.00001679 +[14:16:09] Epoch: 1 Batch: 12510/20099 (62.24%) Loss: 2.300255 LR: 0.00001678 +[14:16:11] Epoch: 1 Batch: 12511/20099 (62.25%) Loss: 2.000569 LR: 0.00001678 +[14:16:13] Epoch: 1 Batch: 12512/20099 (62.25%) Loss: 2.143499 LR: 0.00001678 +[14:16:14] Epoch: 1 Batch: 12513/20099 (62.26%) Loss: 1.843700 LR: 0.00001678 +[14:16:16] Epoch: 1 Batch: 12514/20099 (62.26%) Loss: 2.138958 LR: 0.00001678 +[14:16:18] Epoch: 1 Batch: 12515/20099 (62.27%) Loss: 2.074175 LR: 0.00001678 +[14:16:20] Epoch: 1 Batch: 12516/20099 (62.27%) Loss: 1.907137 LR: 0.00001678 +[14:16:22] Epoch: 1 Batch: 12517/20099 (62.28%) Loss: 2.646269 LR: 0.00001676 +[14:16:24] Epoch: 1 Batch: 12518/20099 (62.28%) Loss: 2.228563 LR: 0.00001676 +[14:16:26] Epoch: 1 Batch: 12519/20099 (62.29%) Loss: 2.292810 LR: 0.00001676 +[14:16:27] Epoch: 1 Batch: 12520/20099 (62.29%) Loss: 2.159517 LR: 0.00001676 +[14:16:29] Epoch: 1 Batch: 12521/20099 (62.30%) Loss: 2.058583 LR: 0.00001676 +[14:16:31] Epoch: 1 Batch: 12522/20099 (62.30%) Loss: 2.193661 LR: 0.00001676 +[14:16:33] Epoch: 1 Batch: 12523/20099 (62.31%) Loss: 2.123439 LR: 0.00001676 +[14:16:35] Epoch: 1 Batch: 12524/20099 (62.31%) Loss: 2.095974 LR: 0.00001674 +[14:16:37] Epoch: 1 Batch: 12525/20099 (62.32%) Loss: 2.113257 LR: 0.00001674 +[14:16:39] Epoch: 1 Batch: 12526/20099 (62.32%) Loss: 2.275215 LR: 0.00001674 +[14:16:40] Epoch: 1 Batch: 12527/20099 (62.33%) Loss: 2.001440 LR: 0.00001674 +[14:16:42] Epoch: 1 Batch: 12528/20099 (62.33%) Loss: 2.124734 LR: 
0.00001674 +[14:16:44] Epoch: 1 Batch: 12529/20099 (62.34%) Loss: 2.079690 LR: 0.00001674 +[14:16:46] Epoch: 1 Batch: 12530/20099 (62.34%) Loss: 1.884131 LR: 0.00001674 +[14:16:48] Epoch: 1 Batch: 12531/20099 (62.35%) Loss: 2.175298 LR: 0.00001673 +[14:16:50] Epoch: 1 Batch: 12532/20099 (62.35%) Loss: 2.158480 LR: 0.00001673 +[14:16:52] Epoch: 1 Batch: 12533/20099 (62.36%) Loss: 1.963490 LR: 0.00001673 +[14:16:53] Epoch: 1 Batch: 12534/20099 (62.36%) Loss: 1.890460 LR: 0.00001673 +[14:16:55] Epoch: 1 Batch: 12535/20099 (62.37%) Loss: 2.062577 LR: 0.00001673 +[14:16:57] Epoch: 1 Batch: 12536/20099 (62.37%) Loss: 2.133164 LR: 0.00001673 +[14:16:59] Epoch: 1 Batch: 12537/20099 (62.38%) Loss: 2.146369 LR: 0.00001673 +[14:17:01] Epoch: 1 Batch: 12538/20099 (62.38%) Loss: 2.362843 LR: 0.00001671 +[14:17:03] Epoch: 1 Batch: 12539/20099 (62.39%) Loss: 1.915859 LR: 0.00001671 +[14:17:05] Epoch: 1 Batch: 12540/20099 (62.39%) Loss: 2.055562 LR: 0.00001671 +[14:17:06] Epoch: 1 Batch: 12541/20099 (62.40%) Loss: 2.024007 LR: 0.00001671 +[14:17:08] Epoch: 1 Batch: 12542/20099 (62.40%) Loss: 2.038615 LR: 0.00001671 +[14:17:10] Epoch: 1 Batch: 12543/20099 (62.41%) Loss: 2.250983 LR: 0.00001671 +[14:17:12] Epoch: 1 Batch: 12544/20099 (62.41%) Loss: 1.592399 LR: 0.00001671 +[14:17:14] Epoch: 1 Batch: 12545/20099 (62.42%) Loss: 2.047400 LR: 0.00001670 +[14:17:16] Epoch: 1 Batch: 12546/20099 (62.42%) Loss: 1.981114 LR: 0.00001670 +[14:17:18] Epoch: 1 Batch: 12547/20099 (62.43%) Loss: 2.222521 LR: 0.00001670 +[14:17:19] Epoch: 1 Batch: 12548/20099 (62.43%) Loss: 2.149658 LR: 0.00001670 +[14:17:21] Epoch: 1 Batch: 12549/20099 (62.44%) Loss: 1.868765 LR: 0.00001670 +[14:17:23] Epoch: 1 Batch: 12550/20099 (62.44%) Loss: 2.036593 LR: 0.00001670 +[14:17:25] Epoch: 1 Batch: 12551/20099 (62.45%) Loss: 2.049429 LR: 0.00001670 +[14:17:27] Epoch: 1 Batch: 12552/20099 (62.45%) Loss: 2.091812 LR: 0.00001668 +[14:17:29] Epoch: 1 Batch: 12553/20099 (62.46%) Loss: 1.649682 LR: 0.00001668 +[14:17:31] 
Epoch: 1 Batch: 12554/20099 (62.46%) Loss: 1.861302 LR: 0.00001668 +[14:17:32] Epoch: 1 Batch: 12555/20099 (62.47%) Loss: 2.148660 LR: 0.00001668 +[14:17:34] Epoch: 1 Batch: 12556/20099 (62.47%) Loss: 1.928683 LR: 0.00001668 +[14:17:36] Epoch: 1 Batch: 12557/20099 (62.48%) Loss: 1.794629 LR: 0.00001668 +[14:17:38] Epoch: 1 Batch: 12558/20099 (62.48%) Loss: 2.201550 LR: 0.00001668 +[14:17:40] Epoch: 1 Batch: 12559/20099 (62.49%) Loss: 1.960785 LR: 0.00001666 +[14:17:42] Epoch: 1 Batch: 12560/20099 (62.49%) Loss: 2.259130 LR: 0.00001666 +[14:17:44] Epoch: 1 Batch: 12561/20099 (62.50%) Loss: 2.005169 LR: 0.00001666 +[14:17:45] Epoch: 1 Batch: 12562/20099 (62.50%) Loss: 1.770578 LR: 0.00001666 +[14:17:47] Epoch: 1 Batch: 12563/20099 (62.51%) Loss: 2.292722 LR: 0.00001666 +[14:17:49] Epoch: 1 Batch: 12564/20099 (62.51%) Loss: 2.287107 LR: 0.00001666 +[14:17:51] Epoch: 1 Batch: 12565/20099 (62.52%) Loss: 2.002618 LR: 0.00001666 +[14:17:53] Epoch: 1 Batch: 12566/20099 (62.52%) Loss: 1.897934 LR: 0.00001665 +[14:17:55] Epoch: 1 Batch: 12567/20099 (62.53%) Loss: 1.964326 LR: 0.00001665 +[14:17:57] Epoch: 1 Batch: 12568/20099 (62.53%) Loss: 1.693827 LR: 0.00001665 +[14:17:59] Epoch: 1 Batch: 12569/20099 (62.54%) Loss: 2.136894 LR: 0.00001665 +[14:18:00] Epoch: 1 Batch: 12570/20099 (62.54%) Loss: 2.475913 LR: 0.00001665 +[14:18:02] Epoch: 1 Batch: 12571/20099 (62.55%) Loss: 2.238304 LR: 0.00001665 +[14:18:04] Epoch: 1 Batch: 12572/20099 (62.55%) Loss: 2.079713 LR: 0.00001665 +[14:18:06] Epoch: 1 Batch: 12573/20099 (62.56%) Loss: 2.531006 LR: 0.00001663 +[14:18:08] Epoch: 1 Batch: 12574/20099 (62.56%) Loss: 2.276798 LR: 0.00001663 +[14:18:10] Epoch: 1 Batch: 12575/20099 (62.57%) Loss: 2.315984 LR: 0.00001663 +[14:18:12] Epoch: 1 Batch: 12576/20099 (62.57%) Loss: 2.081683 LR: 0.00001663 +[14:18:13] Epoch: 1 Batch: 12577/20099 (62.58%) Loss: 1.971057 LR: 0.00001663 +[14:18:15] Epoch: 1 Batch: 12578/20099 (62.58%) Loss: 1.856409 LR: 0.00001663 +[14:18:17] Epoch: 1 Batch: 
12579/20099 (62.59%) Loss: 1.957466 LR: 0.00001663 +[14:18:19] Epoch: 1 Batch: 12580/20099 (62.59%) Loss: 1.963899 LR: 0.00001661 +[14:18:21] Epoch: 1 Batch: 12581/20099 (62.60%) Loss: 1.858514 LR: 0.00001661 +[14:18:23] Epoch: 1 Batch: 12582/20099 (62.60%) Loss: 1.957968 LR: 0.00001661 +[14:18:25] Epoch: 1 Batch: 12583/20099 (62.61%) Loss: 1.873317 LR: 0.00001661 +[14:18:26] Epoch: 1 Batch: 12584/20099 (62.61%) Loss: 2.097559 LR: 0.00001661 +[14:18:28] Epoch: 1 Batch: 12585/20099 (62.62%) Loss: 2.003291 LR: 0.00001661 +[14:18:30] Epoch: 1 Batch: 12586/20099 (62.62%) Loss: 2.133249 LR: 0.00001661 +[14:18:32] Epoch: 1 Batch: 12587/20099 (62.63%) Loss: 2.315340 LR: 0.00001660 +[14:18:34] Epoch: 1 Batch: 12588/20099 (62.63%) Loss: 2.154790 LR: 0.00001660 +[14:18:36] Epoch: 1 Batch: 12589/20099 (62.63%) Loss: 1.974035 LR: 0.00001660 +[14:18:38] Epoch: 1 Batch: 12590/20099 (62.64%) Loss: 2.150758 LR: 0.00001660 +[14:18:40] Epoch: 1 Batch: 12591/20099 (62.64%) Loss: 2.122747 LR: 0.00001660 +[14:18:41] Epoch: 1 Batch: 12592/20099 (62.65%) Loss: 2.169953 LR: 0.00001660 +[14:18:43] Epoch: 1 Batch: 12593/20099 (62.65%) Loss: 2.065744 LR: 0.00001660 +[14:18:45] Epoch: 1 Batch: 12594/20099 (62.66%) Loss: 2.092122 LR: 0.00001658 +[14:18:47] Epoch: 1 Batch: 12595/20099 (62.66%) Loss: 2.115968 LR: 0.00001658 +[14:18:49] Epoch: 1 Batch: 12596/20099 (62.67%) Loss: 2.031101 LR: 0.00001658 +[14:18:51] Epoch: 1 Batch: 12597/20099 (62.67%) Loss: 2.207226 LR: 0.00001658 +[14:18:52] Epoch: 1 Batch: 12598/20099 (62.68%) Loss: 2.119432 LR: 0.00001658 +[14:18:54] Epoch: 1 Batch: 12599/20099 (62.68%) Loss: 2.127054 LR: 0.00001658 +[14:19:00] >> Cleaned up old temp checkpoint: epoch1_step6800 +[14:19:00] >> Temp checkpoint saved: epoch1_step12600, size: 0.1693 GB +[14:19:00] Epoch: 1 Batch: 12600/20099 (62.69%) Loss: 2.190987 LR: 0.00001658 +[14:19:02] Epoch: 1 Batch: 12601/20099 (62.69%) Loss: 1.814809 LR: 0.00001657 +[14:19:04] Epoch: 1 Batch: 12602/20099 (62.70%) Loss: 2.089878 LR: 
0.00001657 +[14:19:05] Epoch: 1 Batch: 12603/20099 (62.70%) Loss: 1.944906 LR: 0.00001657 +[14:19:07] Epoch: 1 Batch: 12604/20099 (62.71%) Loss: 2.099495 LR: 0.00001657 +[14:19:09] Epoch: 1 Batch: 12605/20099 (62.71%) Loss: 1.941844 LR: 0.00001657 +[14:19:11] Epoch: 1 Batch: 12606/20099 (62.72%) Loss: 2.405787 LR: 0.00001657 +[14:19:13] Epoch: 1 Batch: 12607/20099 (62.72%) Loss: 2.009480 LR: 0.00001657 +[14:19:15] Epoch: 1 Batch: 12608/20099 (62.73%) Loss: 2.019177 LR: 0.00001655 +[14:19:16] Epoch: 1 Batch: 12609/20099 (62.73%) Loss: 2.096102 LR: 0.00001655 +[14:19:18] Epoch: 1 Batch: 12610/20099 (62.74%) Loss: 1.915547 LR: 0.00001655 +[14:19:20] Epoch: 1 Batch: 12611/20099 (62.74%) Loss: 2.189138 LR: 0.00001655 +[14:19:22] Epoch: 1 Batch: 12612/20099 (62.75%) Loss: 2.205578 LR: 0.00001655 +[14:19:24] Epoch: 1 Batch: 12613/20099 (62.75%) Loss: 2.071659 LR: 0.00001655 +[14:19:26] Epoch: 1 Batch: 12614/20099 (62.76%) Loss: 2.342597 LR: 0.00001655 +[14:19:27] Epoch: 1 Batch: 12615/20099 (62.76%) Loss: 2.257440 LR: 0.00001653 +[14:19:29] Epoch: 1 Batch: 12616/20099 (62.77%) Loss: 2.033727 LR: 0.00001653 +[14:19:31] Epoch: 1 Batch: 12617/20099 (62.77%) Loss: 2.365439 LR: 0.00001653 +[14:19:33] Epoch: 1 Batch: 12618/20099 (62.78%) Loss: 2.430846 LR: 0.00001653 +[14:19:35] Epoch: 1 Batch: 12619/20099 (62.78%) Loss: 2.059263 LR: 0.00001653 +[14:19:37] Epoch: 1 Batch: 12620/20099 (62.79%) Loss: 2.015301 LR: 0.00001653 +[14:19:39] Epoch: 1 Batch: 12621/20099 (62.79%) Loss: 1.804606 LR: 0.00001653 +[14:19:40] Epoch: 1 Batch: 12622/20099 (62.80%) Loss: 1.973134 LR: 0.00001652 +[14:19:42] Epoch: 1 Batch: 12623/20099 (62.80%) Loss: 2.125205 LR: 0.00001652 +[14:19:44] Epoch: 1 Batch: 12624/20099 (62.81%) Loss: 2.435292 LR: 0.00001652 +[14:19:46] Epoch: 1 Batch: 12625/20099 (62.81%) Loss: 1.987108 LR: 0.00001652 +[14:19:48] Epoch: 1 Batch: 12626/20099 (62.82%) Loss: 2.008645 LR: 0.00001652 +[14:19:50] Epoch: 1 Batch: 12627/20099 (62.82%) Loss: 1.955027 LR: 0.00001652 +[14:19:52] 
Epoch: 1 Batch: 12628/20099 (62.83%) Loss: 2.236176 LR: 0.00001652 +[14:19:54] Epoch: 1 Batch: 12629/20099 (62.83%) Loss: 1.941162 LR: 0.00001650 +[14:19:55] Epoch: 1 Batch: 12630/20099 (62.84%) Loss: 2.277886 LR: 0.00001650 +[14:19:57] Epoch: 1 Batch: 12631/20099 (62.84%) Loss: 1.924298 LR: 0.00001650 +[14:19:59] Epoch: 1 Batch: 12632/20099 (62.85%) Loss: 2.139210 LR: 0.00001650 +[14:20:01] Epoch: 1 Batch: 12633/20099 (62.85%) Loss: 2.066837 LR: 0.00001650 +[14:20:03] Epoch: 1 Batch: 12634/20099 (62.86%) Loss: 2.125645 LR: 0.00001650 +[14:20:05] Epoch: 1 Batch: 12635/20099 (62.86%) Loss: 2.086339 LR: 0.00001650 +[14:20:07] Epoch: 1 Batch: 12636/20099 (62.87%) Loss: 1.964285 LR: 0.00001649 +[14:20:08] Epoch: 1 Batch: 12637/20099 (62.87%) Loss: 1.520764 LR: 0.00001649 +[14:20:10] Epoch: 1 Batch: 12638/20099 (62.88%) Loss: 2.206492 LR: 0.00001649 +[14:20:12] Epoch: 1 Batch: 12639/20099 (62.88%) Loss: 1.992594 LR: 0.00001649 +[14:20:14] Epoch: 1 Batch: 12640/20099 (62.89%) Loss: 1.990512 LR: 0.00001649 +[14:20:16] Epoch: 1 Batch: 12641/20099 (62.89%) Loss: 1.522527 LR: 0.00001649 +[14:20:18] Epoch: 1 Batch: 12642/20099 (62.90%) Loss: 1.964320 LR: 0.00001649 +[14:20:19] Epoch: 1 Batch: 12643/20099 (62.90%) Loss: 2.071163 LR: 0.00001647 +[14:20:21] Epoch: 1 Batch: 12644/20099 (62.91%) Loss: 2.200923 LR: 0.00001647 +[14:20:23] Epoch: 1 Batch: 12645/20099 (62.91%) Loss: 2.470009 LR: 0.00001647 +[14:20:25] Epoch: 1 Batch: 12646/20099 (62.92%) Loss: 2.014628 LR: 0.00001647 +[14:20:27] Epoch: 1 Batch: 12647/20099 (62.92%) Loss: 2.145019 LR: 0.00001647 +[14:20:29] Epoch: 1 Batch: 12648/20099 (62.93%) Loss: 2.132402 LR: 0.00001647 +[14:20:30] Epoch: 1 Batch: 12649/20099 (62.93%) Loss: 2.267000 LR: 0.00001647 +[14:20:32] Epoch: 1 Batch: 12650/20099 (62.94%) Loss: 2.201242 LR: 0.00001645 +[14:20:34] Epoch: 1 Batch: 12651/20099 (62.94%) Loss: 2.326022 LR: 0.00001645 +[14:20:36] Epoch: 1 Batch: 12652/20099 (62.95%) Loss: 2.139207 LR: 0.00001645 +[14:20:38] Epoch: 1 Batch: 
12653/20099 (62.95%) Loss: 1.999581 LR: 0.00001645 +[14:20:40] Epoch: 1 Batch: 12654/20099 (62.96%) Loss: 2.027912 LR: 0.00001645 +[14:20:42] Epoch: 1 Batch: 12655/20099 (62.96%) Loss: 2.121173 LR: 0.00001645 +[14:20:43] Epoch: 1 Batch: 12656/20099 (62.97%) Loss: 2.226741 LR: 0.00001645 +[14:20:45] Epoch: 1 Batch: 12657/20099 (62.97%) Loss: 2.179336 LR: 0.00001644 +[14:20:47] Epoch: 1 Batch: 12658/20099 (62.98%) Loss: 2.550044 LR: 0.00001644 +[14:20:49] Epoch: 1 Batch: 12659/20099 (62.98%) Loss: 2.160189 LR: 0.00001644 +[14:20:51] Epoch: 1 Batch: 12660/20099 (62.99%) Loss: 2.258188 LR: 0.00001644 +[14:20:53] Epoch: 1 Batch: 12661/20099 (62.99%) Loss: 2.256711 LR: 0.00001644 +[14:20:55] Epoch: 1 Batch: 12662/20099 (63.00%) Loss: 2.088911 LR: 0.00001644 +[14:20:56] Epoch: 1 Batch: 12663/20099 (63.00%) Loss: 2.102660 LR: 0.00001644 +[14:20:58] Epoch: 1 Batch: 12664/20099 (63.01%) Loss: 1.774547 LR: 0.00001642 +[14:21:00] Epoch: 1 Batch: 12665/20099 (63.01%) Loss: 2.453672 LR: 0.00001642 +[14:21:02] Epoch: 1 Batch: 12666/20099 (63.02%) Loss: 2.270105 LR: 0.00001642 +[14:21:04] Epoch: 1 Batch: 12667/20099 (63.02%) Loss: 2.440426 LR: 0.00001642 +[14:21:06] Epoch: 1 Batch: 12668/20099 (63.03%) Loss: 2.125851 LR: 0.00001642 +[14:21:08] Epoch: 1 Batch: 12669/20099 (63.03%) Loss: 2.025991 LR: 0.00001642 +[14:21:09] Epoch: 1 Batch: 12670/20099 (63.04%) Loss: 2.187680 LR: 0.00001642 +[14:21:11] Epoch: 1 Batch: 12671/20099 (63.04%) Loss: 2.219559 LR: 0.00001640 +[14:21:13] Epoch: 1 Batch: 12672/20099 (63.05%) Loss: 2.344424 LR: 0.00001640 +[14:21:15] Epoch: 1 Batch: 12673/20099 (63.05%) Loss: 2.126011 LR: 0.00001640 +[14:21:17] Epoch: 1 Batch: 12674/20099 (63.06%) Loss: 2.289833 LR: 0.00001640 +[14:21:19] Epoch: 1 Batch: 12675/20099 (63.06%) Loss: 2.125325 LR: 0.00001640 +[14:21:20] Epoch: 1 Batch: 12676/20099 (63.07%) Loss: 2.256634 LR: 0.00001640 +[14:21:22] Epoch: 1 Batch: 12677/20099 (63.07%) Loss: 1.956988 LR: 0.00001640 +[14:21:24] Epoch: 1 Batch: 12678/20099 (63.08%) 
Loss: 1.869049 LR: 0.00001639 +[14:21:26] Epoch: 1 Batch: 12679/20099 (63.08%) Loss: 2.478871 LR: 0.00001639 +[14:21:28] Epoch: 1 Batch: 12680/20099 (63.09%) Loss: 2.156205 LR: 0.00001639 +[14:21:30] Epoch: 1 Batch: 12681/20099 (63.09%) Loss: 2.214976 LR: 0.00001639 +[14:21:32] Epoch: 1 Batch: 12682/20099 (63.10%) Loss: 2.244859 LR: 0.00001639 +[14:21:33] Epoch: 1 Batch: 12683/20099 (63.10%) Loss: 1.898811 LR: 0.00001639 +[14:21:35] Epoch: 1 Batch: 12684/20099 (63.11%) Loss: 2.265249 LR: 0.00001639 +[14:21:37] Epoch: 1 Batch: 12685/20099 (63.11%) Loss: 1.994962 LR: 0.00001637 +[14:21:39] Epoch: 1 Batch: 12686/20099 (63.12%) Loss: 2.169183 LR: 0.00001637 +[14:21:41] Epoch: 1 Batch: 12687/20099 (63.12%) Loss: 2.008988 LR: 0.00001637 +[14:21:43] Epoch: 1 Batch: 12688/20099 (63.13%) Loss: 1.982748 LR: 0.00001637 +[14:21:45] Epoch: 1 Batch: 12689/20099 (63.13%) Loss: 2.259931 LR: 0.00001637 +[14:21:46] Epoch: 1 Batch: 12690/20099 (63.14%) Loss: 2.115459 LR: 0.00001637 +[14:21:48] Epoch: 1 Batch: 12691/20099 (63.14%) Loss: 2.330428 LR: 0.00001637 +[14:21:50] Epoch: 1 Batch: 12692/20099 (63.15%) Loss: 2.033515 LR: 0.00001636 +[14:21:52] Epoch: 1 Batch: 12693/20099 (63.15%) Loss: 1.952426 LR: 0.00001636 +[14:21:54] Epoch: 1 Batch: 12694/20099 (63.16%) Loss: 2.330842 LR: 0.00001636 +[14:21:56] Epoch: 1 Batch: 12695/20099 (63.16%) Loss: 1.977513 LR: 0.00001636 +[14:21:57] Epoch: 1 Batch: 12696/20099 (63.17%) Loss: 2.067961 LR: 0.00001636 +[14:21:59] Epoch: 1 Batch: 12697/20099 (63.17%) Loss: 2.316957 LR: 0.00001636 +[14:22:01] Epoch: 1 Batch: 12698/20099 (63.18%) Loss: 2.323213 LR: 0.00001636 +[14:22:03] Epoch: 1 Batch: 12699/20099 (63.18%) Loss: 2.178513 LR: 0.00001634 +[14:22:05] Epoch: 1 Batch: 12700/20099 (63.19%) Loss: 2.035271 LR: 0.00001634 +[14:22:07] Epoch: 1 Batch: 12701/20099 (63.19%) Loss: 1.866002 LR: 0.00001634 +[14:22:09] Epoch: 1 Batch: 12702/20099 (63.20%) Loss: 2.121701 LR: 0.00001634 +[14:22:10] Epoch: 1 Batch: 12703/20099 (63.20%) Loss: 1.732798 LR: 
0.00001634 +[14:22:12] Epoch: 1 Batch: 12704/20099 (63.21%) Loss: 1.744851 LR: 0.00001634 +[14:22:14] Epoch: 1 Batch: 12705/20099 (63.21%) Loss: 2.185283 LR: 0.00001634 +[14:22:16] Epoch: 1 Batch: 12706/20099 (63.22%) Loss: 2.117529 LR: 0.00001632 +[14:22:18] Epoch: 1 Batch: 12707/20099 (63.22%) Loss: 2.103647 LR: 0.00001632 +[14:22:20] Epoch: 1 Batch: 12708/20099 (63.23%) Loss: 2.192468 LR: 0.00001632 +[14:22:22] Epoch: 1 Batch: 12709/20099 (63.23%) Loss: 2.200884 LR: 0.00001632 +[14:22:23] Epoch: 1 Batch: 12710/20099 (63.24%) Loss: 1.955021 LR: 0.00001632 +[14:22:25] Epoch: 1 Batch: 12711/20099 (63.24%) Loss: 2.234428 LR: 0.00001632 +[14:22:27] Epoch: 1 Batch: 12712/20099 (63.25%) Loss: 2.021751 LR: 0.00001632 +[14:22:29] Epoch: 1 Batch: 12713/20099 (63.25%) Loss: 2.008090 LR: 0.00001631 +[14:22:31] Epoch: 1 Batch: 12714/20099 (63.26%) Loss: 2.096052 LR: 0.00001631 +[14:22:33] Epoch: 1 Batch: 12715/20099 (63.26%) Loss: 1.916308 LR: 0.00001631 +[14:22:35] Epoch: 1 Batch: 12716/20099 (63.27%) Loss: 2.164582 LR: 0.00001631 +[14:22:36] Epoch: 1 Batch: 12717/20099 (63.27%) Loss: 2.407270 LR: 0.00001631 +[14:22:38] Epoch: 1 Batch: 12718/20099 (63.28%) Loss: 2.033816 LR: 0.00001631 +[14:22:40] Epoch: 1 Batch: 12719/20099 (63.28%) Loss: 2.221924 LR: 0.00001631 +[14:22:42] Epoch: 1 Batch: 12720/20099 (63.29%) Loss: 2.080385 LR: 0.00001629 +[14:22:44] Epoch: 1 Batch: 12721/20099 (63.29%) Loss: 1.983751 LR: 0.00001629 +[14:22:46] Epoch: 1 Batch: 12722/20099 (63.30%) Loss: 2.212434 LR: 0.00001629 +[14:22:47] Epoch: 1 Batch: 12723/20099 (63.30%) Loss: 2.200582 LR: 0.00001629 +[14:22:49] Epoch: 1 Batch: 12724/20099 (63.31%) Loss: 1.843708 LR: 0.00001629 +[14:22:51] Epoch: 1 Batch: 12725/20099 (63.31%) Loss: 2.129790 LR: 0.00001629 +[14:22:53] Epoch: 1 Batch: 12726/20099 (63.32%) Loss: 2.032133 LR: 0.00001629 +[14:22:55] Epoch: 1 Batch: 12727/20099 (63.32%) Loss: 2.097843 LR: 0.00001628 +[14:22:57] Epoch: 1 Batch: 12728/20099 (63.33%) Loss: 2.184883 LR: 0.00001628 +[14:22:58] 
Epoch: 1 Batch: 12729/20099 (63.33%) Loss: 2.137303 LR: 0.00001628 +[14:23:00] Epoch: 1 Batch: 12730/20099 (63.34%) Loss: 1.880498 LR: 0.00001628 +[14:23:02] Epoch: 1 Batch: 12731/20099 (63.34%) Loss: 2.004451 LR: 0.00001628 +[14:23:04] Epoch: 1 Batch: 12732/20099 (63.35%) Loss: 2.154609 LR: 0.00001628 +[14:23:06] Epoch: 1 Batch: 12733/20099 (63.35%) Loss: 2.079374 LR: 0.00001628 +[14:23:08] Epoch: 1 Batch: 12734/20099 (63.36%) Loss: 2.056169 LR: 0.00001626 +[14:23:10] Epoch: 1 Batch: 12735/20099 (63.36%) Loss: 2.086734 LR: 0.00001626 +[14:23:11] Epoch: 1 Batch: 12736/20099 (63.37%) Loss: 2.136561 LR: 0.00001626 +[14:23:13] Epoch: 1 Batch: 12737/20099 (63.37%) Loss: 2.094603 LR: 0.00001626 +[14:23:15] Epoch: 1 Batch: 12738/20099 (63.38%) Loss: 2.094899 LR: 0.00001626 +[14:23:17] Epoch: 1 Batch: 12739/20099 (63.38%) Loss: 1.916194 LR: 0.00001626 +[14:23:19] Epoch: 1 Batch: 12740/20099 (63.39%) Loss: 2.124086 LR: 0.00001626 +[14:23:21] Epoch: 1 Batch: 12741/20099 (63.39%) Loss: 2.010309 LR: 0.00001624 +[14:23:23] Epoch: 1 Batch: 12742/20099 (63.40%) Loss: 2.212326 LR: 0.00001624 +[14:23:24] Epoch: 1 Batch: 12743/20099 (63.40%) Loss: 2.244358 LR: 0.00001624 +[14:23:26] Epoch: 1 Batch: 12744/20099 (63.41%) Loss: 2.135125 LR: 0.00001624 +[14:23:28] Epoch: 1 Batch: 12745/20099 (63.41%) Loss: 2.010543 LR: 0.00001624 +[14:23:30] Epoch: 1 Batch: 12746/20099 (63.42%) Loss: 2.168868 LR: 0.00001624 +[14:23:32] Epoch: 1 Batch: 12747/20099 (63.42%) Loss: 1.841687 LR: 0.00001624 +[14:23:34] Epoch: 1 Batch: 12748/20099 (63.43%) Loss: 2.124093 LR: 0.00001623 +[14:23:36] Epoch: 1 Batch: 12749/20099 (63.43%) Loss: 2.096280 LR: 0.00001623 +[14:23:37] Epoch: 1 Batch: 12750/20099 (63.44%) Loss: 2.310754 LR: 0.00001623 +[14:23:39] Epoch: 1 Batch: 12751/20099 (63.44%) Loss: 1.908609 LR: 0.00001623 +[14:23:41] Epoch: 1 Batch: 12752/20099 (63.45%) Loss: 2.225609 LR: 0.00001623 +[14:23:43] Epoch: 1 Batch: 12753/20099 (63.45%) Loss: 1.968488 LR: 0.00001623 +[14:23:45] Epoch: 1 Batch: 
12754/20099 (63.46%) Loss: 2.021708 LR: 0.00001623 +[14:23:47] Epoch: 1 Batch: 12755/20099 (63.46%) Loss: 2.046255 LR: 0.00001621 +[14:23:49] Epoch: 1 Batch: 12756/20099 (63.47%) Loss: 1.846948 LR: 0.00001621 +[14:23:50] Epoch: 1 Batch: 12757/20099 (63.47%) Loss: 2.115986 LR: 0.00001621 +[14:23:52] Epoch: 1 Batch: 12758/20099 (63.48%) Loss: 2.196186 LR: 0.00001621 +[14:23:54] Epoch: 1 Batch: 12759/20099 (63.48%) Loss: 1.741677 LR: 0.00001621 +[14:23:56] Epoch: 1 Batch: 12760/20099 (63.49%) Loss: 2.196645 LR: 0.00001621 +[14:23:58] Epoch: 1 Batch: 12761/20099 (63.49%) Loss: 1.901577 LR: 0.00001621 +[14:24:00] Epoch: 1 Batch: 12762/20099 (63.50%) Loss: 2.095832 LR: 0.00001620 +[14:24:02] Epoch: 1 Batch: 12763/20099 (63.50%) Loss: 1.924193 LR: 0.00001620 +[14:24:03] Epoch: 1 Batch: 12764/20099 (63.51%) Loss: 2.095968 LR: 0.00001620 +[14:24:05] Epoch: 1 Batch: 12765/20099 (63.51%) Loss: 2.034623 LR: 0.00001620 +[14:24:07] Epoch: 1 Batch: 12766/20099 (63.52%) Loss: 2.076230 LR: 0.00001620 +[14:24:09] Epoch: 1 Batch: 12767/20099 (63.52%) Loss: 2.175267 LR: 0.00001620 +[14:24:11] Epoch: 1 Batch: 12768/20099 (63.53%) Loss: 2.023261 LR: 0.00001620 +[14:24:13] Epoch: 1 Batch: 12769/20099 (63.53%) Loss: 1.968342 LR: 0.00001618 +[14:24:15] Epoch: 1 Batch: 12770/20099 (63.54%) Loss: 2.123756 LR: 0.00001618 +[14:24:16] Epoch: 1 Batch: 12771/20099 (63.54%) Loss: 2.136152 LR: 0.00001618 +[14:24:18] Epoch: 1 Batch: 12772/20099 (63.55%) Loss: 2.077722 LR: 0.00001618 +[14:24:20] Epoch: 1 Batch: 12773/20099 (63.55%) Loss: 2.108462 LR: 0.00001618 +[14:24:22] Epoch: 1 Batch: 12774/20099 (63.56%) Loss: 1.651378 LR: 0.00001618 +[14:24:24] Epoch: 1 Batch: 12775/20099 (63.56%) Loss: 2.145134 LR: 0.00001618 +[14:24:26] Epoch: 1 Batch: 12776/20099 (63.57%) Loss: 2.026645 LR: 0.00001616 +[14:24:28] Epoch: 1 Batch: 12777/20099 (63.57%) Loss: 2.356749 LR: 0.00001616 +[14:24:29] Epoch: 1 Batch: 12778/20099 (63.58%) Loss: 1.906243 LR: 0.00001616 +[14:24:31] Epoch: 1 Batch: 12779/20099 (63.58%) 
Loss: 2.122055 LR: 0.00001616 +[14:24:33] Epoch: 1 Batch: 12780/20099 (63.59%) Loss: 1.957824 LR: 0.00001616 +[14:24:35] Epoch: 1 Batch: 12781/20099 (63.59%) Loss: 2.219341 LR: 0.00001616 +[14:24:37] Epoch: 1 Batch: 12782/20099 (63.60%) Loss: 2.145710 LR: 0.00001616 +[14:24:39] Epoch: 1 Batch: 12783/20099 (63.60%) Loss: 1.938868 LR: 0.00001615 +[14:24:41] Epoch: 1 Batch: 12784/20099 (63.61%) Loss: 2.068468 LR: 0.00001615 +[14:24:42] Epoch: 1 Batch: 12785/20099 (63.61%) Loss: 2.184168 LR: 0.00001615 +[14:24:44] Epoch: 1 Batch: 12786/20099 (63.62%) Loss: 2.350325 LR: 0.00001615 +[14:24:46] Epoch: 1 Batch: 12787/20099 (63.62%) Loss: 2.037389 LR: 0.00001615 +[14:24:48] Epoch: 1 Batch: 12788/20099 (63.63%) Loss: 2.173512 LR: 0.00001615 +[14:24:50] Epoch: 1 Batch: 12789/20099 (63.63%) Loss: 2.008999 LR: 0.00001615 +[14:24:52] Epoch: 1 Batch: 12790/20099 (63.64%) Loss: 2.434047 LR: 0.00001613 +[14:24:54] Epoch: 1 Batch: 12791/20099 (63.64%) Loss: 2.184029 LR: 0.00001613 +[14:24:55] Epoch: 1 Batch: 12792/20099 (63.64%) Loss: 2.138233 LR: 0.00001613 +[14:24:57] Epoch: 1 Batch: 12793/20099 (63.65%) Loss: 2.120486 LR: 0.00001613 +[14:24:59] Epoch: 1 Batch: 12794/20099 (63.65%) Loss: 2.178020 LR: 0.00001613 +[14:25:01] Epoch: 1 Batch: 12795/20099 (63.66%) Loss: 1.813330 LR: 0.00001613 +[14:25:03] Epoch: 1 Batch: 12796/20099 (63.66%) Loss: 2.247991 LR: 0.00001613 +[14:25:05] Epoch: 1 Batch: 12797/20099 (63.67%) Loss: 1.904433 LR: 0.00001612 +[14:25:07] Epoch: 1 Batch: 12798/20099 (63.67%) Loss: 2.020821 LR: 0.00001612 +[14:25:08] Epoch: 1 Batch: 12799/20099 (63.68%) Loss: 2.115403 LR: 0.00001612 +[14:25:14] >> Cleaned up old temp checkpoint: epoch1_step7000 +[14:25:14] >> Temp checkpoint saved: epoch1_step12800, size: 0.1693 GB +[14:25:14] Epoch: 1 Batch: 12800/20099 (63.68%) Loss: 1.950352 LR: 0.00001612 +[14:25:16] Epoch: 1 Batch: 12801/20099 (63.69%) Loss: 2.151952 LR: 0.00001612 +[14:25:18] Epoch: 1 Batch: 12802/20099 (63.69%) Loss: 2.129425 LR: 0.00001612 +[14:25:20] 
Epoch: 1 Batch: 12803/20099 (63.70%) Loss: 2.135623 LR: 0.00001612 +[14:25:21] Epoch: 1 Batch: 12804/20099 (63.70%) Loss: 1.922926 LR: 0.00001610 +[14:25:23] Epoch: 1 Batch: 12805/20099 (63.71%) Loss: 2.332348 LR: 0.00001610 +[14:25:25] Epoch: 1 Batch: 12806/20099 (63.71%) Loss: 2.241134 LR: 0.00001610 +[14:25:27] Epoch: 1 Batch: 12807/20099 (63.72%) Loss: 2.064121 LR: 0.00001610 +[14:25:29] Epoch: 1 Batch: 12808/20099 (63.72%) Loss: 2.114382 LR: 0.00001610 +[14:25:31] Epoch: 1 Batch: 12809/20099 (63.73%) Loss: 1.758492 LR: 0.00001610 +[14:25:32] Epoch: 1 Batch: 12810/20099 (63.73%) Loss: 2.312398 LR: 0.00001610 +[14:25:34] Epoch: 1 Batch: 12811/20099 (63.74%) Loss: 2.220335 LR: 0.00001608 +[14:25:36] Epoch: 1 Batch: 12812/20099 (63.74%) Loss: 1.997087 LR: 0.00001608 +[14:25:38] Epoch: 1 Batch: 12813/20099 (63.75%) Loss: 2.120683 LR: 0.00001608 +[14:25:40] Epoch: 1 Batch: 12814/20099 (63.75%) Loss: 2.244936 LR: 0.00001608 +[14:25:42] Epoch: 1 Batch: 12815/20099 (63.76%) Loss: 2.186012 LR: 0.00001608 +[14:25:44] Epoch: 1 Batch: 12816/20099 (63.76%) Loss: 1.997612 LR: 0.00001608 +[14:25:46] Epoch: 1 Batch: 12817/20099 (63.77%) Loss: 1.830277 LR: 0.00001608 +[14:25:47] Epoch: 1 Batch: 12818/20099 (63.77%) Loss: 2.095818 LR: 0.00001607 +[14:25:49] Epoch: 1 Batch: 12819/20099 (63.78%) Loss: 2.145198 LR: 0.00001607 +[14:25:51] Epoch: 1 Batch: 12820/20099 (63.78%) Loss: 1.888341 LR: 0.00001607 +[14:25:53] Epoch: 1 Batch: 12821/20099 (63.79%) Loss: 2.260203 LR: 0.00001607 +[14:25:55] Epoch: 1 Batch: 12822/20099 (63.79%) Loss: 2.401816 LR: 0.00001607 +[14:25:57] Epoch: 1 Batch: 12823/20099 (63.80%) Loss: 2.076410 LR: 0.00001607 +[14:25:59] Epoch: 1 Batch: 12824/20099 (63.80%) Loss: 2.062681 LR: 0.00001607 +[14:26:01] Epoch: 1 Batch: 12825/20099 (63.81%) Loss: 2.241721 LR: 0.00001605 +[14:26:02] Epoch: 1 Batch: 12826/20099 (63.81%) Loss: 2.073078 LR: 0.00001605 +[14:26:04] Epoch: 1 Batch: 12827/20099 (63.82%) Loss: 2.229571 LR: 0.00001605 +[14:26:06] Epoch: 1 Batch: 
12828/20099 (63.82%) Loss: 1.953658 LR: 0.00001605 +[14:26:08] Epoch: 1 Batch: 12829/20099 (63.83%) Loss: 2.260696 LR: 0.00001605 +[14:26:10] Epoch: 1 Batch: 12830/20099 (63.83%) Loss: 2.035530 LR: 0.00001605 +[14:26:12] Epoch: 1 Batch: 12831/20099 (63.84%) Loss: 2.105090 LR: 0.00001605 +[14:26:14] Epoch: 1 Batch: 12832/20099 (63.84%) Loss: 1.827815 LR: 0.00001604 +[14:26:15] Epoch: 1 Batch: 12833/20099 (63.85%) Loss: 2.270337 LR: 0.00001604 +[14:26:17] Epoch: 1 Batch: 12834/20099 (63.85%) Loss: 2.087409 LR: 0.00001604 +[14:26:19] Epoch: 1 Batch: 12835/20099 (63.86%) Loss: 1.937943 LR: 0.00001604 +[14:26:21] Epoch: 1 Batch: 12836/20099 (63.86%) Loss: 2.040845 LR: 0.00001604 +[14:26:23] Epoch: 1 Batch: 12837/20099 (63.87%) Loss: 1.866566 LR: 0.00001604 +[14:26:25] Epoch: 1 Batch: 12838/20099 (63.87%) Loss: 2.141048 LR: 0.00001604 +[14:26:27] Epoch: 1 Batch: 12839/20099 (63.88%) Loss: 1.821415 LR: 0.00001602 +[14:26:28] Epoch: 1 Batch: 12840/20099 (63.88%) Loss: 1.973950 LR: 0.00001602 +[14:26:30] Epoch: 1 Batch: 12841/20099 (63.89%) Loss: 1.858399 LR: 0.00001602 +[14:26:32] Epoch: 1 Batch: 12842/20099 (63.89%) Loss: 1.976620 LR: 0.00001602 +[14:26:34] Epoch: 1 Batch: 12843/20099 (63.90%) Loss: 2.477959 LR: 0.00001602 +[14:26:36] Epoch: 1 Batch: 12844/20099 (63.90%) Loss: 1.927150 LR: 0.00001602 +[14:26:38] Epoch: 1 Batch: 12845/20099 (63.91%) Loss: 2.199275 LR: 0.00001602 +[14:26:39] Epoch: 1 Batch: 12846/20099 (63.91%) Loss: 1.992364 LR: 0.00001600 +[14:26:41] Epoch: 1 Batch: 12847/20099 (63.92%) Loss: 2.237602 LR: 0.00001600 +[14:26:43] Epoch: 1 Batch: 12848/20099 (63.92%) Loss: 2.343164 LR: 0.00001600 +[14:26:45] Epoch: 1 Batch: 12849/20099 (63.93%) Loss: 1.969648 LR: 0.00001600 +[14:26:47] Epoch: 1 Batch: 12850/20099 (63.93%) Loss: 1.946840 LR: 0.00001600 +[14:26:49] Epoch: 1 Batch: 12851/20099 (63.94%) Loss: 2.138975 LR: 0.00001600 +[14:26:50] Epoch: 1 Batch: 12852/20099 (63.94%) Loss: 2.087366 LR: 0.00001600 +[14:26:52] Epoch: 1 Batch: 12853/20099 (63.95%) 
Loss: 2.083472 LR: 0.00001599 +[14:26:54] Epoch: 1 Batch: 12854/20099 (63.95%) Loss: 2.031334 LR: 0.00001599 +[14:26:56] Epoch: 1 Batch: 12855/20099 (63.96%) Loss: 2.124740 LR: 0.00001599 +[14:26:58] Epoch: 1 Batch: 12856/20099 (63.96%) Loss: 2.288210 LR: 0.00001599 +[14:27:00] Epoch: 1 Batch: 12857/20099 (63.97%) Loss: 2.129854 LR: 0.00001599 +[14:27:02] Epoch: 1 Batch: 12858/20099 (63.97%) Loss: 2.061570 LR: 0.00001599 +[14:27:03] Epoch: 1 Batch: 12859/20099 (63.98%) Loss: 1.956840 LR: 0.00001599 +[14:27:05] Epoch: 1 Batch: 12860/20099 (63.98%) Loss: 2.293208 LR: 0.00001597 +[14:27:07] Epoch: 1 Batch: 12861/20099 (63.99%) Loss: 2.134808 LR: 0.00001597 +[14:27:09] Epoch: 1 Batch: 12862/20099 (63.99%) Loss: 2.380448 LR: 0.00001597 +[14:27:11] Epoch: 1 Batch: 12863/20099 (64.00%) Loss: 2.204761 LR: 0.00001597 +[14:27:13] Epoch: 1 Batch: 12864/20099 (64.00%) Loss: 2.129557 LR: 0.00001597 +[14:27:15] Epoch: 1 Batch: 12865/20099 (64.01%) Loss: 2.140949 LR: 0.00001597 +[14:27:16] Epoch: 1 Batch: 12866/20099 (64.01%) Loss: 2.261990 LR: 0.00001597 +[14:27:18] Epoch: 1 Batch: 12867/20099 (64.02%) Loss: 2.202038 LR: 0.00001596 +[14:27:20] Epoch: 1 Batch: 12868/20099 (64.02%) Loss: 2.276654 LR: 0.00001596 +[14:27:22] Epoch: 1 Batch: 12869/20099 (64.03%) Loss: 2.156851 LR: 0.00001596 +[14:27:24] Epoch: 1 Batch: 12870/20099 (64.03%) Loss: 2.417508 LR: 0.00001596 +[14:27:26] Epoch: 1 Batch: 12871/20099 (64.04%) Loss: 1.763663 LR: 0.00001596 +[14:27:28] Epoch: 1 Batch: 12872/20099 (64.04%) Loss: 1.986901 LR: 0.00001596 +[14:27:29] Epoch: 1 Batch: 12873/20099 (64.05%) Loss: 2.106017 LR: 0.00001596 +[14:27:31] Epoch: 1 Batch: 12874/20099 (64.05%) Loss: 2.179945 LR: 0.00001594 +[14:27:33] Epoch: 1 Batch: 12875/20099 (64.06%) Loss: 2.217934 LR: 0.00001594 +[14:27:35] Epoch: 1 Batch: 12876/20099 (64.06%) Loss: 2.048631 LR: 0.00001594 +[14:27:37] Epoch: 1 Batch: 12877/20099 (64.07%) Loss: 2.201055 LR: 0.00001594 +[14:27:39] Epoch: 1 Batch: 12878/20099 (64.07%) Loss: 1.962034 LR: 
0.00001594 +[14:27:41] Epoch: 1 Batch: 12879/20099 (64.08%) Loss: 2.149259 LR: 0.00001594 +[14:27:43] Epoch: 1 Batch: 12880/20099 (64.08%) Loss: 1.800211 LR: 0.00001594 +[14:27:44] Epoch: 1 Batch: 12881/20099 (64.09%) Loss: 1.925407 LR: 0.00001592 +[14:27:46] Epoch: 1 Batch: 12882/20099 (64.09%) Loss: 1.945724 LR: 0.00001592 +[14:27:48] Epoch: 1 Batch: 12883/20099 (64.10%) Loss: 2.226109 LR: 0.00001592 +[14:27:50] Epoch: 1 Batch: 12884/20099 (64.10%) Loss: 2.018183 LR: 0.00001592 +[14:27:52] Epoch: 1 Batch: 12885/20099 (64.11%) Loss: 2.328751 LR: 0.00001592 +[14:27:54] Epoch: 1 Batch: 12886/20099 (64.11%) Loss: 2.061155 LR: 0.00001592 +[14:27:56] Epoch: 1 Batch: 12887/20099 (64.12%) Loss: 2.048187 LR: 0.00001592 +[14:27:57] Epoch: 1 Batch: 12888/20099 (64.12%) Loss: 2.156271 LR: 0.00001591 +[14:27:59] Epoch: 1 Batch: 12889/20099 (64.13%) Loss: 2.217962 LR: 0.00001591 +[14:28:01] Epoch: 1 Batch: 12890/20099 (64.13%) Loss: 2.020672 LR: 0.00001591 +[14:28:03] Epoch: 1 Batch: 12891/20099 (64.14%) Loss: 2.230306 LR: 0.00001591 +[14:28:05] Epoch: 1 Batch: 12892/20099 (64.14%) Loss: 2.068682 LR: 0.00001591 +[14:28:07] Epoch: 1 Batch: 12893/20099 (64.15%) Loss: 2.070997 LR: 0.00001591 +[14:28:09] Epoch: 1 Batch: 12894/20099 (64.15%) Loss: 2.132705 LR: 0.00001591 +[14:28:10] Epoch: 1 Batch: 12895/20099 (64.16%) Loss: 2.479273 LR: 0.00001589 +[14:28:12] Epoch: 1 Batch: 12896/20099 (64.16%) Loss: 2.023038 LR: 0.00001589 +[14:28:14] Epoch: 1 Batch: 12897/20099 (64.17%) Loss: 1.972834 LR: 0.00001589 +[14:28:16] Epoch: 1 Batch: 12898/20099 (64.17%) Loss: 2.335745 LR: 0.00001589 +[14:28:18] Epoch: 1 Batch: 12899/20099 (64.18%) Loss: 2.158777 LR: 0.00001589 +[14:28:20] Epoch: 1 Batch: 12900/20099 (64.18%) Loss: 2.017734 LR: 0.00001589 +[14:28:22] Epoch: 1 Batch: 12901/20099 (64.19%) Loss: 1.960034 LR: 0.00001589 +[14:28:23] Epoch: 1 Batch: 12902/20099 (64.19%) Loss: 2.156090 LR: 0.00001588 +[14:28:25] Epoch: 1 Batch: 12903/20099 (64.20%) Loss: 2.094389 LR: 0.00001588 +[14:28:27] 
Epoch: 1 Batch: 12904/20099 (64.20%) Loss: 2.578749 LR: 0.00001588 +[14:28:29] Epoch: 1 Batch: 12905/20099 (64.21%) Loss: 2.345489 LR: 0.00001588 +[14:28:31] Epoch: 1 Batch: 12906/20099 (64.21%) Loss: 2.068344 LR: 0.00001588 +[14:28:33] Epoch: 1 Batch: 12907/20099 (64.22%) Loss: 2.047159 LR: 0.00001588 +[14:28:35] Epoch: 1 Batch: 12908/20099 (64.22%) Loss: 2.216292 LR: 0.00001588 +[14:28:36] Epoch: 1 Batch: 12909/20099 (64.23%) Loss: 1.973738 LR: 0.00001586 +[14:28:38] Epoch: 1 Batch: 12910/20099 (64.23%) Loss: 1.978111 LR: 0.00001586 +[14:28:40] Epoch: 1 Batch: 12911/20099 (64.24%) Loss: 1.840107 LR: 0.00001586 +[14:28:42] Epoch: 1 Batch: 12912/20099 (64.24%) Loss: 2.234126 LR: 0.00001586 +[14:28:44] Epoch: 1 Batch: 12913/20099 (64.25%) Loss: 2.164028 LR: 0.00001586 +[14:28:46] Epoch: 1 Batch: 12914/20099 (64.25%) Loss: 2.159059 LR: 0.00001586 +[14:28:48] Epoch: 1 Batch: 12915/20099 (64.26%) Loss: 2.039332 LR: 0.00001586 +[14:28:49] Epoch: 1 Batch: 12916/20099 (64.26%) Loss: 2.195777 LR: 0.00001584 +[14:28:51] Epoch: 1 Batch: 12917/20099 (64.27%) Loss: 1.818622 LR: 0.00001584 +[14:28:53] Epoch: 1 Batch: 12918/20099 (64.27%) Loss: 2.101648 LR: 0.00001584 +[14:28:55] Epoch: 1 Batch: 12919/20099 (64.28%) Loss: 2.108256 LR: 0.00001584 +[14:28:57] Epoch: 1 Batch: 12920/20099 (64.28%) Loss: 2.023192 LR: 0.00001584 +[14:28:59] Epoch: 1 Batch: 12921/20099 (64.29%) Loss: 1.792125 LR: 0.00001584 +[14:29:00] Epoch: 1 Batch: 12922/20099 (64.29%) Loss: 2.260219 LR: 0.00001584 +[14:29:02] Epoch: 1 Batch: 12923/20099 (64.30%) Loss: 1.999249 LR: 0.00001583 +[14:29:04] Epoch: 1 Batch: 12924/20099 (64.30%) Loss: 2.289367 LR: 0.00001583 +[14:29:06] Epoch: 1 Batch: 12925/20099 (64.31%) Loss: 2.193008 LR: 0.00001583 +[14:29:08] Epoch: 1 Batch: 12926/20099 (64.31%) Loss: 2.249049 LR: 0.00001583 +[14:29:10] Epoch: 1 Batch: 12927/20099 (64.32%) Loss: 2.179411 LR: 0.00001583 +[14:29:12] Epoch: 1 Batch: 12928/20099 (64.32%) Loss: 2.246844 LR: 0.00001583 +[14:29:13] Epoch: 1 Batch: 
12929/20099 (64.33%) Loss: 2.254588 LR: 0.00001583 +[14:29:15] Epoch: 1 Batch: 12930/20099 (64.33%) Loss: 2.092018 LR: 0.00001581 +[14:29:17] Epoch: 1 Batch: 12931/20099 (64.34%) Loss: 1.964147 LR: 0.00001581 +[14:29:19] Epoch: 1 Batch: 12932/20099 (64.34%) Loss: 2.181491 LR: 0.00001581 +[14:29:21] Epoch: 1 Batch: 12933/20099 (64.35%) Loss: 2.067664 LR: 0.00001581 +[14:29:23] Epoch: 1 Batch: 12934/20099 (64.35%) Loss: 1.993499 LR: 0.00001581 +[14:29:25] Epoch: 1 Batch: 12935/20099 (64.36%) Loss: 2.233940 LR: 0.00001581 +[14:29:26] Epoch: 1 Batch: 12936/20099 (64.36%) Loss: 2.065438 LR: 0.00001581 +[14:29:28] Epoch: 1 Batch: 12937/20099 (64.37%) Loss: 2.069937 LR: 0.00001580 +[14:29:30] Epoch: 1 Batch: 12938/20099 (64.37%) Loss: 2.094052 LR: 0.00001580 +[14:29:32] Epoch: 1 Batch: 12939/20099 (64.38%) Loss: 1.999597 LR: 0.00001580 +[14:29:34] Epoch: 1 Batch: 12940/20099 (64.38%) Loss: 1.976787 LR: 0.00001580 +[14:29:36] Epoch: 1 Batch: 12941/20099 (64.39%) Loss: 1.953522 LR: 0.00001580 +[14:29:38] Epoch: 1 Batch: 12942/20099 (64.39%) Loss: 2.018340 LR: 0.00001580 +[14:29:39] Epoch: 1 Batch: 12943/20099 (64.40%) Loss: 1.953619 LR: 0.00001580 +[14:29:41] Epoch: 1 Batch: 12944/20099 (64.40%) Loss: 2.329332 LR: 0.00001578 +[14:29:43] Epoch: 1 Batch: 12945/20099 (64.41%) Loss: 2.074020 LR: 0.00001578 +[14:29:45] Epoch: 1 Batch: 12946/20099 (64.41%) Loss: 2.038601 LR: 0.00001578 +[14:29:47] Epoch: 1 Batch: 12947/20099 (64.42%) Loss: 2.007611 LR: 0.00001578 +[14:29:49] Epoch: 1 Batch: 12948/20099 (64.42%) Loss: 1.998048 LR: 0.00001578 +[14:29:51] Epoch: 1 Batch: 12949/20099 (64.43%) Loss: 1.945383 LR: 0.00001578 +[14:29:53] Epoch: 1 Batch: 12950/20099 (64.43%) Loss: 2.228230 LR: 0.00001578 +[14:29:54] Epoch: 1 Batch: 12951/20099 (64.44%) Loss: 1.999337 LR: 0.00001576 +[14:29:56] Epoch: 1 Batch: 12952/20099 (64.44%) Loss: 2.325129 LR: 0.00001576 +[14:29:58] Epoch: 1 Batch: 12953/20099 (64.45%) Loss: 1.928772 LR: 0.00001576 +[14:30:00] Epoch: 1 Batch: 12954/20099 (64.45%) 
Loss: 2.000386 LR: 0.00001576 +[14:30:02] Epoch: 1 Batch: 12955/20099 (64.46%) Loss: 2.365925 LR: 0.00001576 +[14:30:04] Epoch: 1 Batch: 12956/20099 (64.46%) Loss: 2.282686 LR: 0.00001576 +[14:30:06] Epoch: 1 Batch: 12957/20099 (64.47%) Loss: 1.856720 LR: 0.00001576 +[14:30:08] Epoch: 1 Batch: 12958/20099 (64.47%) Loss: 1.958438 LR: 0.00001575 +[14:30:09] Epoch: 1 Batch: 12959/20099 (64.48%) Loss: 2.171622 LR: 0.00001575 +[14:30:11] Epoch: 1 Batch: 12960/20099 (64.48%) Loss: 2.184100 LR: 0.00001575 +[14:30:13] Epoch: 1 Batch: 12961/20099 (64.49%) Loss: 2.203345 LR: 0.00001575 +[14:30:15] Epoch: 1 Batch: 12962/20099 (64.49%) Loss: 2.292236 LR: 0.00001575 +[14:30:17] Epoch: 1 Batch: 12963/20099 (64.50%) Loss: 1.878169 LR: 0.00001575 +[14:30:19] Epoch: 1 Batch: 12964/20099 (64.50%) Loss: 2.163909 LR: 0.00001575 +[14:30:21] Epoch: 1 Batch: 12965/20099 (64.51%) Loss: 2.144651 LR: 0.00001573 +[14:30:22] Epoch: 1 Batch: 12966/20099 (64.51%) Loss: 2.118259 LR: 0.00001573 +[14:30:24] Epoch: 1 Batch: 12967/20099 (64.52%) Loss: 2.302425 LR: 0.00001573 +[14:30:26] Epoch: 1 Batch: 12968/20099 (64.52%) Loss: 2.083566 LR: 0.00001573 +[14:30:28] Epoch: 1 Batch: 12969/20099 (64.53%) Loss: 2.034535 LR: 0.00001573 +[14:30:30] Epoch: 1 Batch: 12970/20099 (64.53%) Loss: 2.031750 LR: 0.00001573 +[14:30:32] Epoch: 1 Batch: 12971/20099 (64.54%) Loss: 2.162639 LR: 0.00001573 +[14:30:34] Epoch: 1 Batch: 12972/20099 (64.54%) Loss: 2.416115 LR: 0.00001572 +[14:30:35] Epoch: 1 Batch: 12973/20099 (64.55%) Loss: 2.023897 LR: 0.00001572 +[14:30:37] Epoch: 1 Batch: 12974/20099 (64.55%) Loss: 2.014349 LR: 0.00001572 +[14:30:39] Epoch: 1 Batch: 12975/20099 (64.56%) Loss: 2.078857 LR: 0.00001572 +[14:30:41] Epoch: 1 Batch: 12976/20099 (64.56%) Loss: 2.113276 LR: 0.00001572 +[14:30:43] Epoch: 1 Batch: 12977/20099 (64.57%) Loss: 1.958822 LR: 0.00001572 +[14:30:45] Epoch: 1 Batch: 12978/20099 (64.57%) Loss: 1.940669 LR: 0.00001572 +[14:30:47] Epoch: 1 Batch: 12979/20099 (64.58%) Loss: 2.317500 LR: 
0.00001570 +[14:30:48] Epoch: 1 Batch: 12980/20099 (64.58%) Loss: 2.242309 LR: 0.00001570 +[14:30:50] Epoch: 1 Batch: 12981/20099 (64.59%) Loss: 2.114102 LR: 0.00001570 +[14:30:52] Epoch: 1 Batch: 12982/20099 (64.59%) Loss: 1.841945 LR: 0.00001570 +[14:30:54] Epoch: 1 Batch: 12983/20099 (64.60%) Loss: 2.250595 LR: 0.00001570 +[14:30:56] Epoch: 1 Batch: 12984/20099 (64.60%) Loss: 1.975917 LR: 0.00001570 +[14:30:58] Epoch: 1 Batch: 12985/20099 (64.61%) Loss: 2.284597 LR: 0.00001570 +[14:30:59] Epoch: 1 Batch: 12986/20099 (64.61%) Loss: 2.287809 LR: 0.00001568 +[14:31:01] Epoch: 1 Batch: 12987/20099 (64.62%) Loss: 2.382632 LR: 0.00001568 +[14:31:03] Epoch: 1 Batch: 12988/20099 (64.62%) Loss: 2.066663 LR: 0.00001568 +[14:31:05] Epoch: 1 Batch: 12989/20099 (64.63%) Loss: 2.328650 LR: 0.00001568 +[14:31:07] Epoch: 1 Batch: 12990/20099 (64.63%) Loss: 2.069773 LR: 0.00001568 +[14:31:09] Epoch: 1 Batch: 12991/20099 (64.64%) Loss: 2.189937 LR: 0.00001568 +[14:31:11] Epoch: 1 Batch: 12992/20099 (64.64%) Loss: 2.081442 LR: 0.00001568 +[14:31:12] Epoch: 1 Batch: 12993/20099 (64.65%) Loss: 2.052189 LR: 0.00001567 +[14:31:14] Epoch: 1 Batch: 12994/20099 (64.65%) Loss: 2.345217 LR: 0.00001567 +[14:31:16] Epoch: 1 Batch: 12995/20099 (64.65%) Loss: 2.032706 LR: 0.00001567 +[14:31:18] Epoch: 1 Batch: 12996/20099 (64.66%) Loss: 2.087821 LR: 0.00001567 +[14:31:20] Epoch: 1 Batch: 12997/20099 (64.66%) Loss: 2.083219 LR: 0.00001567 +[14:31:22] Epoch: 1 Batch: 12998/20099 (64.67%) Loss: 2.241676 LR: 0.00001567 +[14:31:24] Epoch: 1 Batch: 12999/20099 (64.67%) Loss: 2.049608 LR: 0.00001567 +[14:31:25] >> Evaluating batch 0 +[14:31:27] >> Evaluating batch 1 +[14:31:28] >> Evaluating batch 2 +[14:31:29] >> Evaluating batch 3 +[14:31:30] >> Evaluating batch 4 +[14:31:31] >> Evaluating batch 5 +[14:31:32] >> Evaluating batch 6 +[14:31:33] >> Evaluating batch 7 +[14:31:34] >> Evaluating batch 8 +[14:31:35] >> Evaluating batch 9 +[14:31:36] >> Evaluating batch 10 +[14:31:37] >> Evaluating batch 
11 +[14:31:38] >> Evaluating batch 12 +[14:31:39] >> Evaluating batch 13 +[14:31:40] >> Evaluating batch 14 +[14:31:41] >> Evaluating batch 15 +[14:31:42] >> Evaluating batch 16 +[14:31:43] Epoch: 1 Step: 13000/20099 Evaluation: +[14:31:43] Avg Loss Since Last Eval: 2.1007 Val Loss: 2.1575 Validation loss delta: 0.0012 Perplexity: 8.6499 LR: 0.00001565 +[14:31:47] >> Cleaned up old temp checkpoint: epoch1_step7200 +[14:31:47] >> Temp checkpoint saved: epoch1_step13000, size: 0.1693 GB +[14:31:50] >> Checkpoint saved: epoch1_step13000, size: 0.1693 GB +[14:31:50] Epoch: 1 Batch: 13000/20099 (64.68%) Loss: 2.186511 LR: 0.00001565 +[14:31:52] Epoch: 1 Batch: 13001/20099 (64.68%) Loss: 2.033415 LR: 0.00001565 +[14:31:54] Epoch: 1 Batch: 13002/20099 (64.69%) Loss: 2.112689 LR: 0.00001565 +[14:31:56] Epoch: 1 Batch: 13003/20099 (64.69%) Loss: 2.029428 LR: 0.00001565 +[14:31:58] Epoch: 1 Batch: 13004/20099 (64.70%) Loss: 2.165260 LR: 0.00001565 +[14:32:00] Epoch: 1 Batch: 13005/20099 (64.70%) Loss: 1.916878 LR: 0.00001565 +[14:32:01] Epoch: 1 Batch: 13006/20099 (64.71%) Loss: 2.075658 LR: 0.00001565 +[14:32:03] Epoch: 1 Batch: 13007/20099 (64.71%) Loss: 2.195780 LR: 0.00001564 +[14:32:05] Epoch: 1 Batch: 13008/20099 (64.72%) Loss: 1.837087 LR: 0.00001564 +[14:32:07] Epoch: 1 Batch: 13009/20099 (64.72%) Loss: 2.286971 LR: 0.00001564 +[14:32:09] Epoch: 1 Batch: 13010/20099 (64.73%) Loss: 2.213863 LR: 0.00001564 +[14:32:11] Epoch: 1 Batch: 13011/20099 (64.73%) Loss: 2.055955 LR: 0.00001564 +[14:32:13] Epoch: 1 Batch: 13012/20099 (64.74%) Loss: 1.917463 LR: 0.00001564 +[14:32:15] Epoch: 1 Batch: 13013/20099 (64.74%) Loss: 2.159004 LR: 0.00001564 +[14:32:17] Epoch: 1 Batch: 13014/20099 (64.75%) Loss: 2.422572 LR: 0.00001562 +[14:32:18] Epoch: 1 Batch: 13015/20099 (64.75%) Loss: 2.172176 LR: 0.00001562 +[14:32:20] Epoch: 1 Batch: 13016/20099 (64.76%) Loss: 2.096167 LR: 0.00001562 +[14:32:22] Epoch: 1 Batch: 13017/20099 (64.76%) Loss: 1.715508 LR: 0.00001562 +[14:32:24] 
Epoch: 1 Batch: 13018/20099 (64.77%) Loss: 2.088180 LR: 0.00001562 +[14:32:26] Epoch: 1 Batch: 13019/20099 (64.77%) Loss: 2.170098 LR: 0.00001562 +[14:32:28] Epoch: 1 Batch: 13020/20099 (64.78%) Loss: 2.085781 LR: 0.00001562 +[14:32:30] Epoch: 1 Batch: 13021/20099 (64.78%) Loss: 2.156624 LR: 0.00001560 +[14:32:32] Epoch: 1 Batch: 13022/20099 (64.79%) Loss: 1.931699 LR: 0.00001560 +[14:32:34] Epoch: 1 Batch: 13023/20099 (64.79%) Loss: 2.058910 LR: 0.00001560 +[14:32:35] Epoch: 1 Batch: 13024/20099 (64.80%) Loss: 2.491917 LR: 0.00001560 +[14:32:37] Epoch: 1 Batch: 13025/20099 (64.80%) Loss: 2.247309 LR: 0.00001560 +[14:32:39] Epoch: 1 Batch: 13026/20099 (64.81%) Loss: 1.844331 LR: 0.00001560 +[14:32:41] Epoch: 1 Batch: 13027/20099 (64.81%) Loss: 1.976668 LR: 0.00001560 +[14:32:43] Epoch: 1 Batch: 13028/20099 (64.82%) Loss: 2.279679 LR: 0.00001559 +[14:32:45] Epoch: 1 Batch: 13029/20099 (64.82%) Loss: 2.088941 LR: 0.00001559 +[14:32:47] Epoch: 1 Batch: 13030/20099 (64.83%) Loss: 2.250738 LR: 0.00001559 +[14:32:48] Epoch: 1 Batch: 13031/20099 (64.83%) Loss: 2.222489 LR: 0.00001559 +[14:32:50] Epoch: 1 Batch: 13032/20099 (64.84%) Loss: 1.802041 LR: 0.00001559 +[14:32:52] Epoch: 1 Batch: 13033/20099 (64.84%) Loss: 2.233462 LR: 0.00001559 +[14:32:54] Epoch: 1 Batch: 13034/20099 (64.85%) Loss: 2.507017 LR: 0.00001559 +[14:32:56] Epoch: 1 Batch: 13035/20099 (64.85%) Loss: 2.402030 LR: 0.00001557 +[14:32:58] Epoch: 1 Batch: 13036/20099 (64.86%) Loss: 1.921838 LR: 0.00001557 +[14:32:59] Epoch: 1 Batch: 13037/20099 (64.86%) Loss: 1.977620 LR: 0.00001557 +[14:33:01] Epoch: 1 Batch: 13038/20099 (64.87%) Loss: 2.160607 LR: 0.00001557 +[14:33:03] Epoch: 1 Batch: 13039/20099 (64.87%) Loss: 2.067260 LR: 0.00001557 +[14:33:05] Epoch: 1 Batch: 13040/20099 (64.88%) Loss: 2.273299 LR: 0.00001557 +[14:33:07] Epoch: 1 Batch: 13041/20099 (64.88%) Loss: 1.867083 LR: 0.00001557 +[14:33:09] Epoch: 1 Batch: 13042/20099 (64.89%) Loss: 2.235672 LR: 0.00001556 +[14:33:11] Epoch: 1 Batch: 
13043/20099 (64.89%) Loss: 2.163190 LR: 0.00001556 +[14:33:12] Epoch: 1 Batch: 13044/20099 (64.90%) Loss: 2.124427 LR: 0.00001556 +[14:33:14] Epoch: 1 Batch: 13045/20099 (64.90%) Loss: 2.142574 LR: 0.00001556 +[14:33:16] Epoch: 1 Batch: 13046/20099 (64.91%) Loss: 1.950213 LR: 0.00001556 +[14:33:18] Epoch: 1 Batch: 13047/20099 (64.91%) Loss: 2.321145 LR: 0.00001556 +[14:33:20] Epoch: 1 Batch: 13048/20099 (64.92%) Loss: 1.954415 LR: 0.00001556 +[14:33:22] Epoch: 1 Batch: 13049/20099 (64.92%) Loss: 2.272018 LR: 0.00001554 +[14:33:23] Epoch: 1 Batch: 13050/20099 (64.93%) Loss: 2.270055 LR: 0.00001554 +[14:33:25] Epoch: 1 Batch: 13051/20099 (64.93%) Loss: 1.668542 LR: 0.00001554 +[14:33:27] Epoch: 1 Batch: 13052/20099 (64.94%) Loss: 2.115391 LR: 0.00001554 +[14:33:29] Epoch: 1 Batch: 13053/20099 (64.94%) Loss: 2.282138 LR: 0.00001554 +[14:33:31] Epoch: 1 Batch: 13054/20099 (64.95%) Loss: 1.971604 LR: 0.00001554 +[14:33:33] Epoch: 1 Batch: 13055/20099 (64.95%) Loss: 2.478104 LR: 0.00001554 +[14:33:35] Epoch: 1 Batch: 13056/20099 (64.96%) Loss: 1.961425 LR: 0.00001552 +[14:33:36] Epoch: 1 Batch: 13057/20099 (64.96%) Loss: 2.103701 LR: 0.00001552 +[14:33:38] Epoch: 1 Batch: 13058/20099 (64.97%) Loss: 2.378411 LR: 0.00001552 +[14:33:40] Epoch: 1 Batch: 13059/20099 (64.97%) Loss: 2.075052 LR: 0.00001552 +[14:33:42] Epoch: 1 Batch: 13060/20099 (64.98%) Loss: 2.053140 LR: 0.00001552 +[14:33:44] Epoch: 1 Batch: 13061/20099 (64.98%) Loss: 1.981675 LR: 0.00001552 +[14:33:46] Epoch: 1 Batch: 13062/20099 (64.99%) Loss: 2.080071 LR: 0.00001552 +[14:33:48] Epoch: 1 Batch: 13063/20099 (64.99%) Loss: 1.900683 LR: 0.00001551 +[14:33:49] Epoch: 1 Batch: 13064/20099 (65.00%) Loss: 2.163860 LR: 0.00001551 +[14:33:51] Epoch: 1 Batch: 13065/20099 (65.00%) Loss: 2.142650 LR: 0.00001551 +[14:33:53] Epoch: 1 Batch: 13066/20099 (65.01%) Loss: 2.375175 LR: 0.00001551 +[14:33:55] Epoch: 1 Batch: 13067/20099 (65.01%) Loss: 2.156680 LR: 0.00001551 +[14:33:57] Epoch: 1 Batch: 13068/20099 (65.02%) 
Loss: 2.159161 LR: 0.00001551 +[14:33:59] Epoch: 1 Batch: 13069/20099 (65.02%) Loss: 1.768512 LR: 0.00001551 +[14:34:01] Epoch: 1 Batch: 13070/20099 (65.03%) Loss: 1.899040 LR: 0.00001549 +[14:34:02] Epoch: 1 Batch: 13071/20099 (65.03%) Loss: 1.782808 LR: 0.00001549 +[14:34:04] Epoch: 1 Batch: 13072/20099 (65.04%) Loss: 1.832714 LR: 0.00001549 +[14:34:06] Epoch: 1 Batch: 13073/20099 (65.04%) Loss: 2.014294 LR: 0.00001549 +[14:34:08] Epoch: 1 Batch: 13074/20099 (65.05%) Loss: 2.095278 LR: 0.00001549 +[14:34:10] Epoch: 1 Batch: 13075/20099 (65.05%) Loss: 1.772875 LR: 0.00001549 +[14:34:12] Epoch: 1 Batch: 13076/20099 (65.06%) Loss: 1.903954 LR: 0.00001549 +[14:34:14] Epoch: 1 Batch: 13077/20099 (65.06%) Loss: 1.966766 LR: 0.00001548 +[14:34:15] Epoch: 1 Batch: 13078/20099 (65.07%) Loss: 2.247721 LR: 0.00001548 +[14:34:17] Epoch: 1 Batch: 13079/20099 (65.07%) Loss: 2.104852 LR: 0.00001548 +[14:34:19] Epoch: 1 Batch: 13080/20099 (65.08%) Loss: 2.059314 LR: 0.00001548 +[14:34:21] Epoch: 1 Batch: 13081/20099 (65.08%) Loss: 2.021200 LR: 0.00001548 +[14:34:23] Epoch: 1 Batch: 13082/20099 (65.09%) Loss: 2.138544 LR: 0.00001548 +[14:34:25] Epoch: 1 Batch: 13083/20099 (65.09%) Loss: 1.498118 LR: 0.00001548 +[14:34:27] Epoch: 1 Batch: 13084/20099 (65.10%) Loss: 1.830529 LR: 0.00001546 +[14:34:29] Epoch: 1 Batch: 13085/20099 (65.10%) Loss: 2.102938 LR: 0.00001546 +[14:34:30] Epoch: 1 Batch: 13086/20099 (65.11%) Loss: 2.249313 LR: 0.00001546 +[14:34:32] Epoch: 1 Batch: 13087/20099 (65.11%) Loss: 2.090057 LR: 0.00001546 +[14:34:34] Epoch: 1 Batch: 13088/20099 (65.12%) Loss: 2.228157 LR: 0.00001546 +[14:34:36] Epoch: 1 Batch: 13089/20099 (65.12%) Loss: 2.036174 LR: 0.00001546 +[14:34:38] Epoch: 1 Batch: 13090/20099 (65.13%) Loss: 1.994516 LR: 0.00001546 +[14:34:40] Epoch: 1 Batch: 13091/20099 (65.13%) Loss: 1.809638 LR: 0.00001545 +[14:34:42] Epoch: 1 Batch: 13092/20099 (65.14%) Loss: 1.995000 LR: 0.00001545 +[14:34:43] Epoch: 1 Batch: 13093/20099 (65.14%) Loss: 2.216943 LR: 
0.00001545 +[14:34:45] Epoch: 1 Batch: 13094/20099 (65.15%) Loss: 1.844049 LR: 0.00001545 +[14:34:47] Epoch: 1 Batch: 13095/20099 (65.15%) Loss: 2.117756 LR: 0.00001545 +[14:34:49] Epoch: 1 Batch: 13096/20099 (65.16%) Loss: 1.993351 LR: 0.00001545 +[14:34:51] Epoch: 1 Batch: 13097/20099 (65.16%) Loss: 2.306378 LR: 0.00001545 +[14:34:53] Epoch: 1 Batch: 13098/20099 (65.17%) Loss: 2.019214 LR: 0.00001543 +[14:34:55] Epoch: 1 Batch: 13099/20099 (65.17%) Loss: 2.094152 LR: 0.00001543 +[14:34:56] Epoch: 1 Batch: 13100/20099 (65.18%) Loss: 2.001420 LR: 0.00001543 +[14:34:58] Epoch: 1 Batch: 13101/20099 (65.18%) Loss: 1.913367 LR: 0.00001543 +[14:35:00] Epoch: 1 Batch: 13102/20099 (65.19%) Loss: 1.980994 LR: 0.00001543 +[14:35:02] Epoch: 1 Batch: 13103/20099 (65.19%) Loss: 2.107728 LR: 0.00001543 +[14:35:04] Epoch: 1 Batch: 13104/20099 (65.20%) Loss: 2.009214 LR: 0.00001543 +[14:35:06] Epoch: 1 Batch: 13105/20099 (65.20%) Loss: 1.908236 LR: 0.00001541 +[14:35:08] Epoch: 1 Batch: 13106/20099 (65.21%) Loss: 2.094634 LR: 0.00001541 +[14:35:09] Epoch: 1 Batch: 13107/20099 (65.21%) Loss: 2.001141 LR: 0.00001541 +[14:35:11] Epoch: 1 Batch: 13108/20099 (65.22%) Loss: 2.228169 LR: 0.00001541 +[14:35:13] Epoch: 1 Batch: 13109/20099 (65.22%) Loss: 2.373686 LR: 0.00001541 +[14:35:15] Epoch: 1 Batch: 13110/20099 (65.23%) Loss: 2.072931 LR: 0.00001541 +[14:35:17] Epoch: 1 Batch: 13111/20099 (65.23%) Loss: 2.016960 LR: 0.00001541 +[14:35:19] Epoch: 1 Batch: 13112/20099 (65.24%) Loss: 1.740198 LR: 0.00001540 +[14:35:20] Epoch: 1 Batch: 13113/20099 (65.24%) Loss: 2.035235 LR: 0.00001540 +[14:35:22] Epoch: 1 Batch: 13114/20099 (65.25%) Loss: 1.650130 LR: 0.00001540 +[14:35:24] Epoch: 1 Batch: 13115/20099 (65.25%) Loss: 2.219318 LR: 0.00001540 +[14:35:26] Epoch: 1 Batch: 13116/20099 (65.26%) Loss: 2.137151 LR: 0.00001540 +[14:35:28] Epoch: 1 Batch: 13117/20099 (65.26%) Loss: 2.159455 LR: 0.00001540 +[14:35:30] Epoch: 1 Batch: 13118/20099 (65.27%) Loss: 2.067186 LR: 0.00001540 +[14:35:32] 
Epoch: 1 Batch: 13119/20099 (65.27%) Loss: 2.298682 LR: 0.00001538 +[14:35:33] Epoch: 1 Batch: 13120/20099 (65.28%) Loss: 1.984197 LR: 0.00001538 +[14:35:35] Epoch: 1 Batch: 13121/20099 (65.28%) Loss: 2.287044 LR: 0.00001538 +[14:35:37] Epoch: 1 Batch: 13122/20099 (65.29%) Loss: 2.010224 LR: 0.00001538 +[14:35:39] Epoch: 1 Batch: 13123/20099 (65.29%) Loss: 2.355015 LR: 0.00001538 +[14:35:41] Epoch: 1 Batch: 13124/20099 (65.30%) Loss: 2.427701 LR: 0.00001538 +[14:35:43] Epoch: 1 Batch: 13125/20099 (65.30%) Loss: 2.299504 LR: 0.00001538 +[14:35:45] Epoch: 1 Batch: 13126/20099 (65.31%) Loss: 2.145265 LR: 0.00001537 +[14:35:46] Epoch: 1 Batch: 13127/20099 (65.31%) Loss: 1.982587 LR: 0.00001537 +[14:35:48] Epoch: 1 Batch: 13128/20099 (65.32%) Loss: 1.929771 LR: 0.00001537 +[14:35:50] Epoch: 1 Batch: 13129/20099 (65.32%) Loss: 2.201361 LR: 0.00001537 +[14:35:52] Epoch: 1 Batch: 13130/20099 (65.33%) Loss: 2.281821 LR: 0.00001537 +[14:35:54] Epoch: 1 Batch: 13131/20099 (65.33%) Loss: 2.027696 LR: 0.00001537 +[14:35:56] Epoch: 1 Batch: 13132/20099 (65.34%) Loss: 2.132603 LR: 0.00001537 +[14:35:58] Epoch: 1 Batch: 13133/20099 (65.34%) Loss: 1.877525 LR: 0.00001535 +[14:35:59] Epoch: 1 Batch: 13134/20099 (65.35%) Loss: 2.102915 LR: 0.00001535 +[14:36:01] Epoch: 1 Batch: 13135/20099 (65.35%) Loss: 2.101993 LR: 0.00001535 +[14:36:03] Epoch: 1 Batch: 13136/20099 (65.36%) Loss: 1.971623 LR: 0.00001535 +[14:36:05] Epoch: 1 Batch: 13137/20099 (65.36%) Loss: 2.100583 LR: 0.00001535 +[14:36:07] Epoch: 1 Batch: 13138/20099 (65.37%) Loss: 2.114788 LR: 0.00001535 +[14:36:09] Epoch: 1 Batch: 13139/20099 (65.37%) Loss: 2.067682 LR: 0.00001535 +[14:36:11] Epoch: 1 Batch: 13140/20099 (65.38%) Loss: 2.127916 LR: 0.00001533 +[14:36:12] Epoch: 1 Batch: 13141/20099 (65.38%) Loss: 2.190855 LR: 0.00001533 +[14:36:14] Epoch: 1 Batch: 13142/20099 (65.39%) Loss: 2.155585 LR: 0.00001533 +[14:36:16] Epoch: 1 Batch: 13143/20099 (65.39%) Loss: 1.791536 LR: 0.00001533 +[14:36:18] Epoch: 1 Batch: 
13144/20099 (65.40%) Loss: 1.876237 LR: 0.00001533 +[14:36:20] Epoch: 1 Batch: 13145/20099 (65.40%) Loss: 1.896307 LR: 0.00001533 +[14:36:22] Epoch: 1 Batch: 13146/20099 (65.41%) Loss: 2.112759 LR: 0.00001533 +[14:36:24] Epoch: 1 Batch: 13147/20099 (65.41%) Loss: 2.109152 LR: 0.00001532 +[14:36:25] Epoch: 1 Batch: 13148/20099 (65.42%) Loss: 2.049245 LR: 0.00001532 +[14:36:27] Epoch: 1 Batch: 13149/20099 (65.42%) Loss: 2.015254 LR: 0.00001532 +[14:36:29] Epoch: 1 Batch: 13150/20099 (65.43%) Loss: 2.186172 LR: 0.00001532 +[14:36:31] Epoch: 1 Batch: 13151/20099 (65.43%) Loss: 1.873394 LR: 0.00001532 +[14:36:33] Epoch: 1 Batch: 13152/20099 (65.44%) Loss: 2.154361 LR: 0.00001532 +[14:36:35] Epoch: 1 Batch: 13153/20099 (65.44%) Loss: 2.039216 LR: 0.00001532 +[14:36:37] Epoch: 1 Batch: 13154/20099 (65.45%) Loss: 1.883165 LR: 0.00001530 +[14:36:38] Epoch: 1 Batch: 13155/20099 (65.45%) Loss: 1.919197 LR: 0.00001530 +[14:36:40] Epoch: 1 Batch: 13156/20099 (65.46%) Loss: 2.253483 LR: 0.00001530 +[14:36:42] Epoch: 1 Batch: 13157/20099 (65.46%) Loss: 1.949835 LR: 0.00001530 +[14:36:44] Epoch: 1 Batch: 13158/20099 (65.47%) Loss: 2.171381 LR: 0.00001530 +[14:36:46] Epoch: 1 Batch: 13159/20099 (65.47%) Loss: 2.045382 LR: 0.00001530 +[14:36:48] Epoch: 1 Batch: 13160/20099 (65.48%) Loss: 2.004151 LR: 0.00001530 +[14:36:49] Epoch: 1 Batch: 13161/20099 (65.48%) Loss: 1.988157 LR: 0.00001529 +[14:36:51] Epoch: 1 Batch: 13162/20099 (65.49%) Loss: 2.272771 LR: 0.00001529 +[14:36:53] Epoch: 1 Batch: 13163/20099 (65.49%) Loss: 2.151926 LR: 0.00001529 +[14:36:55] Epoch: 1 Batch: 13164/20099 (65.50%) Loss: 1.997819 LR: 0.00001529 +[14:36:57] Epoch: 1 Batch: 13165/20099 (65.50%) Loss: 1.723411 LR: 0.00001529 +[14:36:59] Epoch: 1 Batch: 13166/20099 (65.51%) Loss: 1.979266 LR: 0.00001529 +[14:37:00] Epoch: 1 Batch: 13167/20099 (65.51%) Loss: 1.925596 LR: 0.00001529 +[14:37:02] Epoch: 1 Batch: 13168/20099 (65.52%) Loss: 2.063634 LR: 0.00001527 +[14:37:04] Epoch: 1 Batch: 13169/20099 (65.52%) 
Loss: 2.227344 LR: 0.00001527 +[14:37:06] Epoch: 1 Batch: 13170/20099 (65.53%) Loss: 2.047029 LR: 0.00001527 +[14:37:08] Epoch: 1 Batch: 13171/20099 (65.53%) Loss: 2.148620 LR: 0.00001527 +[14:37:10] Epoch: 1 Batch: 13172/20099 (65.54%) Loss: 2.307974 LR: 0.00001527 +[14:37:12] Epoch: 1 Batch: 13173/20099 (65.54%) Loss: 2.017819 LR: 0.00001527 +[14:37:13] Epoch: 1 Batch: 13174/20099 (65.55%) Loss: 2.151975 LR: 0.00001527 +[14:37:15] Epoch: 1 Batch: 13175/20099 (65.55%) Loss: 1.955115 LR: 0.00001526 +[14:37:17] Epoch: 1 Batch: 13176/20099 (65.56%) Loss: 1.775499 LR: 0.00001526 +[14:37:19] Epoch: 1 Batch: 13177/20099 (65.56%) Loss: 2.089624 LR: 0.00001526 +[14:37:21] Epoch: 1 Batch: 13178/20099 (65.57%) Loss: 2.186596 LR: 0.00001526 +[14:37:23] Epoch: 1 Batch: 13179/20099 (65.57%) Loss: 1.924040 LR: 0.00001526 +[14:37:25] Epoch: 1 Batch: 13180/20099 (65.58%) Loss: 1.902681 LR: 0.00001526 +[14:37:26] Epoch: 1 Batch: 13181/20099 (65.58%) Loss: 2.139262 LR: 0.00001526 +[14:37:28] Epoch: 1 Batch: 13182/20099 (65.59%) Loss: 2.223891 LR: 0.00001524 +[14:37:30] Epoch: 1 Batch: 13183/20099 (65.59%) Loss: 2.135888 LR: 0.00001524 +[14:37:32] Epoch: 1 Batch: 13184/20099 (65.60%) Loss: 2.089815 LR: 0.00001524 +[14:37:34] Epoch: 1 Batch: 13185/20099 (65.60%) Loss: 2.113931 LR: 0.00001524 +[14:37:36] Epoch: 1 Batch: 13186/20099 (65.61%) Loss: 2.242988 LR: 0.00001524 +[14:37:38] Epoch: 1 Batch: 13187/20099 (65.61%) Loss: 2.129860 LR: 0.00001524 +[14:37:40] Epoch: 1 Batch: 13188/20099 (65.62%) Loss: 2.267756 LR: 0.00001524 +[14:37:41] Epoch: 1 Batch: 13189/20099 (65.62%) Loss: 2.125868 LR: 0.00001522 +[14:37:43] Epoch: 1 Batch: 13190/20099 (65.63%) Loss: 1.897566 LR: 0.00001522 +[14:37:45] Epoch: 1 Batch: 13191/20099 (65.63%) Loss: 1.897878 LR: 0.00001522 +[14:37:47] Epoch: 1 Batch: 13192/20099 (65.64%) Loss: 2.139117 LR: 0.00001522 +[14:37:49] Epoch: 1 Batch: 13193/20099 (65.64%) Loss: 1.963516 LR: 0.00001522 +[14:37:51] Epoch: 1 Batch: 13194/20099 (65.65%) Loss: 2.012357 LR: 
0.00001522 +[14:37:53] Epoch: 1 Batch: 13195/20099 (65.65%) Loss: 2.088510 LR: 0.00001522 +[14:37:54] Epoch: 1 Batch: 13196/20099 (65.66%) Loss: 2.325826 LR: 0.00001521 +[14:37:56] Epoch: 1 Batch: 13197/20099 (65.66%) Loss: 1.917570 LR: 0.00001521 +[14:37:58] Epoch: 1 Batch: 13198/20099 (65.66%) Loss: 2.124802 LR: 0.00001521 +[14:38:00] Epoch: 1 Batch: 13199/20099 (65.67%) Loss: 2.210272 LR: 0.00001521 +[14:38:06] >> Cleaned up old temp checkpoint: epoch1_step7400 +[14:38:06] >> Temp checkpoint saved: epoch1_step13200, size: 0.1693 GB +[14:38:06] Epoch: 1 Batch: 13200/20099 (65.67%) Loss: 2.100329 LR: 0.00001521 +[14:38:07] Epoch: 1 Batch: 13201/20099 (65.68%) Loss: 2.164722 LR: 0.00001521 +[14:38:09] Epoch: 1 Batch: 13202/20099 (65.68%) Loss: 1.989659 LR: 0.00001521 +[14:38:11] Epoch: 1 Batch: 13203/20099 (65.69%) Loss: 2.001170 LR: 0.00001519 +[14:38:13] Epoch: 1 Batch: 13204/20099 (65.69%) Loss: 1.948820 LR: 0.00001519 +[14:38:15] Epoch: 1 Batch: 13205/20099 (65.70%) Loss: 2.014765 LR: 0.00001519 +[14:38:17] Epoch: 1 Batch: 13206/20099 (65.70%) Loss: 2.128066 LR: 0.00001519 +[14:38:18] Epoch: 1 Batch: 13207/20099 (65.71%) Loss: 2.099763 LR: 0.00001519 +[14:38:20] Epoch: 1 Batch: 13208/20099 (65.71%) Loss: 2.333603 LR: 0.00001519 +[14:38:22] Epoch: 1 Batch: 13209/20099 (65.72%) Loss: 2.169842 LR: 0.00001519 +[14:38:24] Epoch: 1 Batch: 13210/20099 (65.72%) Loss: 1.772969 LR: 0.00001518 +[14:38:26] Epoch: 1 Batch: 13211/20099 (65.73%) Loss: 2.020803 LR: 0.00001518 +[14:38:28] Epoch: 1 Batch: 13212/20099 (65.73%) Loss: 1.961950 LR: 0.00001518 +[14:38:30] Epoch: 1 Batch: 13213/20099 (65.74%) Loss: 2.306063 LR: 0.00001518 +[14:38:32] Epoch: 1 Batch: 13214/20099 (65.74%) Loss: 2.069408 LR: 0.00001518 +[14:38:34] Epoch: 1 Batch: 13215/20099 (65.75%) Loss: 2.162756 LR: 0.00001518 +[14:38:35] Epoch: 1 Batch: 13216/20099 (65.75%) Loss: 1.959623 LR: 0.00001518 +[14:38:37] Epoch: 1 Batch: 13217/20099 (65.76%) Loss: 2.033809 LR: 0.00001516 +[14:38:39] Epoch: 1 Batch: 
13218/20099 (65.76%) Loss: 2.224956 LR: 0.00001516 +[14:38:41] Epoch: 1 Batch: 13219/20099 (65.77%) Loss: 2.112262 LR: 0.00001516 +[14:38:43] Epoch: 1 Batch: 13220/20099 (65.77%) Loss: 2.081476 LR: 0.00001516 +[14:38:45] Epoch: 1 Batch: 13221/20099 (65.78%) Loss: 1.982050 LR: 0.00001516 +[14:38:47] Epoch: 1 Batch: 13222/20099 (65.78%) Loss: 2.275592 LR: 0.00001516 +[14:38:49] Epoch: 1 Batch: 13223/20099 (65.79%) Loss: 1.843499 LR: 0.00001516 +[14:38:50] Epoch: 1 Batch: 13224/20099 (65.79%) Loss: 2.497805 LR: 0.00001514 +[14:38:52] Epoch: 1 Batch: 13225/20099 (65.80%) Loss: 1.684174 LR: 0.00001514 +[14:38:54] Epoch: 1 Batch: 13226/20099 (65.80%) Loss: 2.239550 LR: 0.00001514 +[14:38:56] Epoch: 1 Batch: 13227/20099 (65.81%) Loss: 2.155401 LR: 0.00001514 +[14:38:58] Epoch: 1 Batch: 13228/20099 (65.81%) Loss: 1.905514 LR: 0.00001514 +[14:39:00] Epoch: 1 Batch: 13229/20099 (65.82%) Loss: 2.360881 LR: 0.00001514 +[14:39:02] Epoch: 1 Batch: 13230/20099 (65.82%) Loss: 2.545454 LR: 0.00001514 +[14:39:03] Epoch: 1 Batch: 13231/20099 (65.83%) Loss: 2.210158 LR: 0.00001513 +[14:39:05] Epoch: 1 Batch: 13232/20099 (65.83%) Loss: 2.140237 LR: 0.00001513 +[14:39:07] Epoch: 1 Batch: 13233/20099 (65.84%) Loss: 2.166386 LR: 0.00001513 +[14:39:09] Epoch: 1 Batch: 13234/20099 (65.84%) Loss: 2.046569 LR: 0.00001513 +[14:39:11] Epoch: 1 Batch: 13235/20099 (65.85%) Loss: 1.923811 LR: 0.00001513 +[14:39:13] Epoch: 1 Batch: 13236/20099 (65.85%) Loss: 2.382309 LR: 0.00001513 +[14:39:14] Epoch: 1 Batch: 13237/20099 (65.86%) Loss: 1.942813 LR: 0.00001513 +[14:39:16] Epoch: 1 Batch: 13238/20099 (65.86%) Loss: 1.900930 LR: 0.00001511 +[14:39:18] Epoch: 1 Batch: 13239/20099 (65.87%) Loss: 1.799536 LR: 0.00001511 +[14:39:20] Epoch: 1 Batch: 13240/20099 (65.87%) Loss: 2.316589 LR: 0.00001511 +[14:39:22] Epoch: 1 Batch: 13241/20099 (65.88%) Loss: 2.239168 LR: 0.00001511 +[14:39:24] Epoch: 1 Batch: 13242/20099 (65.88%) Loss: 2.129160 LR: 0.00001511 +[14:39:26] Epoch: 1 Batch: 13243/20099 (65.89%) 
Loss: 2.177190 LR: 0.00001511 +[14:39:27] Epoch: 1 Batch: 13244/20099 (65.89%) Loss: 2.101856 LR: 0.00001511 +[14:39:29] Epoch: 1 Batch: 13245/20099 (65.90%) Loss: 1.795748 LR: 0.00001510 +[14:39:31] Epoch: 1 Batch: 13246/20099 (65.90%) Loss: 1.982189 LR: 0.00001510 +[14:39:33] Epoch: 1 Batch: 13247/20099 (65.91%) Loss: 2.391403 LR: 0.00001510 +[14:39:35] Epoch: 1 Batch: 13248/20099 (65.91%) Loss: 2.282059 LR: 0.00001510 +[14:39:37] Epoch: 1 Batch: 13249/20099 (65.92%) Loss: 2.072038 LR: 0.00001510 +[14:39:39] Epoch: 1 Batch: 13250/20099 (65.92%) Loss: 2.127013 LR: 0.00001510 +[14:39:40] Epoch: 1 Batch: 13251/20099 (65.93%) Loss: 2.228413 LR: 0.00001510 +[14:39:42] Epoch: 1 Batch: 13252/20099 (65.93%) Loss: 2.234942 LR: 0.00001508 +[14:39:44] Epoch: 1 Batch: 13253/20099 (65.94%) Loss: 1.952884 LR: 0.00001508 +[14:39:46] Epoch: 1 Batch: 13254/20099 (65.94%) Loss: 2.025196 LR: 0.00001508 +[14:39:48] Epoch: 1 Batch: 13255/20099 (65.95%) Loss: 1.776057 LR: 0.00001508 +[14:39:50] Epoch: 1 Batch: 13256/20099 (65.95%) Loss: 2.288254 LR: 0.00001508 +[14:39:52] Epoch: 1 Batch: 13257/20099 (65.96%) Loss: 2.095778 LR: 0.00001508 +[14:39:53] Epoch: 1 Batch: 13258/20099 (65.96%) Loss: 1.798327 LR: 0.00001508 +[14:39:55] Epoch: 1 Batch: 13259/20099 (65.97%) Loss: 2.149096 LR: 0.00001507 +[14:39:57] Epoch: 1 Batch: 13260/20099 (65.97%) Loss: 1.935918 LR: 0.00001507 +[14:39:59] Epoch: 1 Batch: 13261/20099 (65.98%) Loss: 2.147200 LR: 0.00001507 +[14:40:01] Epoch: 1 Batch: 13262/20099 (65.98%) Loss: 2.083940 LR: 0.00001507 +[14:40:03] Epoch: 1 Batch: 13263/20099 (65.99%) Loss: 1.889767 LR: 0.00001507 +[14:40:05] Epoch: 1 Batch: 13264/20099 (65.99%) Loss: 1.968810 LR: 0.00001507 +[14:40:06] Epoch: 1 Batch: 13265/20099 (66.00%) Loss: 2.424130 LR: 0.00001507 +[14:40:08] Epoch: 1 Batch: 13266/20099 (66.00%) Loss: 1.941973 LR: 0.00001505 +[14:40:10] Epoch: 1 Batch: 13267/20099 (66.01%) Loss: 2.026785 LR: 0.00001505 +[14:40:12] Epoch: 1 Batch: 13268/20099 (66.01%) Loss: 2.004981 LR: 
0.00001505 +[14:40:14] Epoch: 1 Batch: 13269/20099 (66.02%) Loss: 1.995591 LR: 0.00001505 +[14:40:16] Epoch: 1 Batch: 13270/20099 (66.02%) Loss: 2.053522 LR: 0.00001505 +[14:40:18] Epoch: 1 Batch: 13271/20099 (66.03%) Loss: 2.215240 LR: 0.00001505 +[14:40:19] Epoch: 1 Batch: 13272/20099 (66.03%) Loss: 2.192187 LR: 0.00001505 +[14:40:21] Epoch: 1 Batch: 13273/20099 (66.04%) Loss: 2.298878 LR: 0.00001503 +[14:40:23] Epoch: 1 Batch: 13274/20099 (66.04%) Loss: 2.034941 LR: 0.00001503 +[14:40:25] Epoch: 1 Batch: 13275/20099 (66.05%) Loss: 1.986762 LR: 0.00001503 +[14:40:27] Epoch: 1 Batch: 13276/20099 (66.05%) Loss: 1.991832 LR: 0.00001503 +[14:40:29] Epoch: 1 Batch: 13277/20099 (66.06%) Loss: 2.264936 LR: 0.00001503 +[14:40:31] Epoch: 1 Batch: 13278/20099 (66.06%) Loss: 2.404581 LR: 0.00001503 +[14:40:32] Epoch: 1 Batch: 13279/20099 (66.07%) Loss: 1.853590 LR: 0.00001503 +[14:40:34] Epoch: 1 Batch: 13280/20099 (66.07%) Loss: 2.103401 LR: 0.00001502 +[14:40:36] Epoch: 1 Batch: 13281/20099 (66.08%) Loss: 2.077523 LR: 0.00001502 +[14:40:38] Epoch: 1 Batch: 13282/20099 (66.08%) Loss: 2.057067 LR: 0.00001502 +[14:40:40] Epoch: 1 Batch: 13283/20099 (66.09%) Loss: 1.807791 LR: 0.00001502 +[14:40:42] Epoch: 1 Batch: 13284/20099 (66.09%) Loss: 1.867887 LR: 0.00001502 +[14:40:44] Epoch: 1 Batch: 13285/20099 (66.10%) Loss: 2.179462 LR: 0.00001502 +[14:40:46] Epoch: 1 Batch: 13286/20099 (66.10%) Loss: 2.035606 LR: 0.00001502 +[14:40:47] Epoch: 1 Batch: 13287/20099 (66.11%) Loss: 1.995253 LR: 0.00001500 +[14:40:49] Epoch: 1 Batch: 13288/20099 (66.11%) Loss: 1.916707 LR: 0.00001500 +[14:40:51] Epoch: 1 Batch: 13289/20099 (66.12%) Loss: 1.984780 LR: 0.00001500 +[14:40:53] Epoch: 1 Batch: 13290/20099 (66.12%) Loss: 1.978468 LR: 0.00001500 +[14:40:55] Epoch: 1 Batch: 13291/20099 (66.13%) Loss: 2.129000 LR: 0.00001500 +[14:40:57] Epoch: 1 Batch: 13292/20099 (66.13%) Loss: 2.009662 LR: 0.00001500 +[14:40:58] Epoch: 1 Batch: 13293/20099 (66.14%) Loss: 2.284569 LR: 0.00001500 +[14:41:00] 
Epoch: 1 Batch: 13294/20099 (66.14%) Loss: 2.032733 LR: 0.00001499 +[14:41:02] Epoch: 1 Batch: 13295/20099 (66.15%) Loss: 2.197347 LR: 0.00001499 +[14:41:04] Epoch: 1 Batch: 13296/20099 (66.15%) Loss: 2.106954 LR: 0.00001499 +[14:41:06] Epoch: 1 Batch: 13297/20099 (66.16%) Loss: 1.755550 LR: 0.00001499 +[14:41:08] Epoch: 1 Batch: 13298/20099 (66.16%) Loss: 1.694667 LR: 0.00001499 +[14:41:10] Epoch: 1 Batch: 13299/20099 (66.17%) Loss: 2.054538 LR: 0.00001499 +[14:41:11] Epoch: 1 Batch: 13300/20099 (66.17%) Loss: 2.303781 LR: 0.00001499 +[14:41:13] Epoch: 1 Batch: 13301/20099 (66.18%) Loss: 2.338343 LR: 0.00001497 +[14:41:15] Epoch: 1 Batch: 13302/20099 (66.18%) Loss: 1.930414 LR: 0.00001497 +[14:41:17] Epoch: 1 Batch: 13303/20099 (66.19%) Loss: 2.263451 LR: 0.00001497 +[14:41:19] Epoch: 1 Batch: 13304/20099 (66.19%) Loss: 2.025279 LR: 0.00001497 +[14:41:21] Epoch: 1 Batch: 13305/20099 (66.20%) Loss: 1.967745 LR: 0.00001497 +[14:41:22] Epoch: 1 Batch: 13306/20099 (66.20%) Loss: 2.429386 LR: 0.00001497 +[14:41:24] Epoch: 1 Batch: 13307/20099 (66.21%) Loss: 2.230625 LR: 0.00001497 +[14:41:26] Epoch: 1 Batch: 13308/20099 (66.21%) Loss: 2.096977 LR: 0.00001496 +[14:41:28] Epoch: 1 Batch: 13309/20099 (66.22%) Loss: 2.336501 LR: 0.00001496 +[14:41:30] Epoch: 1 Batch: 13310/20099 (66.22%) Loss: 2.312783 LR: 0.00001496 +[14:41:32] Epoch: 1 Batch: 13311/20099 (66.23%) Loss: 2.071438 LR: 0.00001496 +[14:41:34] Epoch: 1 Batch: 13312/20099 (66.23%) Loss: 2.379458 LR: 0.00001496 +[14:41:35] Epoch: 1 Batch: 13313/20099 (66.24%) Loss: 2.158523 LR: 0.00001496 +[14:41:37] Epoch: 1 Batch: 13314/20099 (66.24%) Loss: 1.903454 LR: 0.00001496 +[14:41:39] Epoch: 1 Batch: 13315/20099 (66.25%) Loss: 1.839971 LR: 0.00001494 +[14:41:41] Epoch: 1 Batch: 13316/20099 (66.25%) Loss: 2.039931 LR: 0.00001494 +[14:41:43] Epoch: 1 Batch: 13317/20099 (66.26%) Loss: 1.767747 LR: 0.00001494 +[14:41:45] Epoch: 1 Batch: 13318/20099 (66.26%) Loss: 1.980087 LR: 0.00001494 +[14:41:47] Epoch: 1 Batch: 
13319/20099 (66.27%) Loss: 2.074606 LR: 0.00001494 +[14:41:48] Epoch: 1 Batch: 13320/20099 (66.27%) Loss: 2.111281 LR: 0.00001494 +[14:41:50] Epoch: 1 Batch: 13321/20099 (66.28%) Loss: 2.273560 LR: 0.00001494 +[14:41:52] Epoch: 1 Batch: 13322/20099 (66.28%) Loss: 2.181362 LR: 0.00001492 +[14:41:54] Epoch: 1 Batch: 13323/20099 (66.29%) Loss: 2.243356 LR: 0.00001492 +[14:41:56] Epoch: 1 Batch: 13324/20099 (66.29%) Loss: 2.121855 LR: 0.00001492 +[14:41:58] Epoch: 1 Batch: 13325/20099 (66.30%) Loss: 2.167868 LR: 0.00001492 +[14:42:00] Epoch: 1 Batch: 13326/20099 (66.30%) Loss: 1.829316 LR: 0.00001492 +[14:42:01] Epoch: 1 Batch: 13327/20099 (66.31%) Loss: 1.829332 LR: 0.00001492 +[14:42:03] Epoch: 1 Batch: 13328/20099 (66.31%) Loss: 2.210718 LR: 0.00001492 +[14:42:05] Epoch: 1 Batch: 13329/20099 (66.32%) Loss: 2.150987 LR: 0.00001491 +[14:42:07] Epoch: 1 Batch: 13330/20099 (66.32%) Loss: 2.288572 LR: 0.00001491 +[14:42:09] Epoch: 1 Batch: 13331/20099 (66.33%) Loss: 1.981042 LR: 0.00001491 +[14:42:11] Epoch: 1 Batch: 13332/20099 (66.33%) Loss: 2.221025 LR: 0.00001491 +[14:42:13] Epoch: 1 Batch: 13333/20099 (66.34%) Loss: 2.110627 LR: 0.00001491 +[14:42:14] Epoch: 1 Batch: 13334/20099 (66.34%) Loss: 2.168762 LR: 0.00001491 +[14:42:16] Epoch: 1 Batch: 13335/20099 (66.35%) Loss: 1.816602 LR: 0.00001491 +[14:42:18] Epoch: 1 Batch: 13336/20099 (66.35%) Loss: 2.223041 LR: 0.00001489 +[14:42:20] Epoch: 1 Batch: 13337/20099 (66.36%) Loss: 2.069530 LR: 0.00001489 +[14:42:22] Epoch: 1 Batch: 13338/20099 (66.36%) Loss: 2.031565 LR: 0.00001489 +[14:42:24] Epoch: 1 Batch: 13339/20099 (66.37%) Loss: 2.159914 LR: 0.00001489 +[14:42:26] Epoch: 1 Batch: 13340/20099 (66.37%) Loss: 2.183666 LR: 0.00001489 +[14:42:27] Epoch: 1 Batch: 13341/20099 (66.38%) Loss: 2.094838 LR: 0.00001489 +[14:42:29] Epoch: 1 Batch: 13342/20099 (66.38%) Loss: 2.000820 LR: 0.00001489 +[14:42:31] Epoch: 1 Batch: 13343/20099 (66.39%) Loss: 1.890665 LR: 0.00001488 +[14:42:33] Epoch: 1 Batch: 13344/20099 (66.39%) 
Loss: 2.388323 LR: 0.00001488 +[14:42:35] Epoch: 1 Batch: 13345/20099 (66.40%) Loss: 2.242734 LR: 0.00001488 +[14:42:37] Epoch: 1 Batch: 13346/20099 (66.40%) Loss: 1.852429 LR: 0.00001488 +[14:42:38] Epoch: 1 Batch: 13347/20099 (66.41%) Loss: 2.160745 LR: 0.00001488 +[14:42:40] Epoch: 1 Batch: 13348/20099 (66.41%) Loss: 2.057290 LR: 0.00001488 +[14:42:42] Epoch: 1 Batch: 13349/20099 (66.42%) Loss: 2.008163 LR: 0.00001488 +[14:42:44] Epoch: 1 Batch: 13350/20099 (66.42%) Loss: 1.995487 LR: 0.00001486 +[14:42:46] Epoch: 1 Batch: 13351/20099 (66.43%) Loss: 1.992618 LR: 0.00001486 +[14:42:48] Epoch: 1 Batch: 13352/20099 (66.43%) Loss: 1.784879 LR: 0.00001486 +[14:42:50] Epoch: 1 Batch: 13353/20099 (66.44%) Loss: 1.997287 LR: 0.00001486 +[14:42:51] Epoch: 1 Batch: 13354/20099 (66.44%) Loss: 2.358819 LR: 0.00001486 +[14:42:53] Epoch: 1 Batch: 13355/20099 (66.45%) Loss: 2.384898 LR: 0.00001486 +[14:42:55] Epoch: 1 Batch: 13356/20099 (66.45%) Loss: 1.859201 LR: 0.00001486 +[14:42:57] Epoch: 1 Batch: 13357/20099 (66.46%) Loss: 2.176084 LR: 0.00001485 +[14:42:59] Epoch: 1 Batch: 13358/20099 (66.46%) Loss: 2.058954 LR: 0.00001485 +[14:43:01] Epoch: 1 Batch: 13359/20099 (66.47%) Loss: 2.127850 LR: 0.00001485 +[14:43:02] Epoch: 1 Batch: 13360/20099 (66.47%) Loss: 2.171948 LR: 0.00001485 +[14:43:04] Epoch: 1 Batch: 13361/20099 (66.48%) Loss: 2.267786 LR: 0.00001485 +[14:43:06] Epoch: 1 Batch: 13362/20099 (66.48%) Loss: 2.175216 LR: 0.00001485 +[14:43:08] Epoch: 1 Batch: 13363/20099 (66.49%) Loss: 2.209704 LR: 0.00001485 +[14:43:10] Epoch: 1 Batch: 13364/20099 (66.49%) Loss: 2.323904 LR: 0.00001483 +[14:43:12] Epoch: 1 Batch: 13365/20099 (66.50%) Loss: 2.060765 LR: 0.00001483 +[14:43:14] Epoch: 1 Batch: 13366/20099 (66.50%) Loss: 2.176801 LR: 0.00001483 +[14:43:15] Epoch: 1 Batch: 13367/20099 (66.51%) Loss: 2.266035 LR: 0.00001483 +[14:43:17] Epoch: 1 Batch: 13368/20099 (66.51%) Loss: 2.115610 LR: 0.00001483 +[14:43:19] Epoch: 1 Batch: 13369/20099 (66.52%) Loss: 2.231431 LR: 
0.00001483 +[14:43:21] Epoch: 1 Batch: 13370/20099 (66.52%) Loss: 2.277613 LR: 0.00001483 +[14:43:23] Epoch: 1 Batch: 13371/20099 (66.53%) Loss: 2.158147 LR: 0.00001481 +[14:43:25] Epoch: 1 Batch: 13372/20099 (66.53%) Loss: 2.118618 LR: 0.00001481 +[14:43:27] Epoch: 1 Batch: 13373/20099 (66.54%) Loss: 1.765904 LR: 0.00001481 +[14:43:29] Epoch: 1 Batch: 13374/20099 (66.54%) Loss: 1.926446 LR: 0.00001481 +[14:43:30] Epoch: 1 Batch: 13375/20099 (66.55%) Loss: 2.176546 LR: 0.00001481 +[14:43:32] Epoch: 1 Batch: 13376/20099 (66.55%) Loss: 2.009362 LR: 0.00001481 +[14:43:34] Epoch: 1 Batch: 13377/20099 (66.56%) Loss: 1.979883 LR: 0.00001481 +[14:43:36] Epoch: 1 Batch: 13378/20099 (66.56%) Loss: 2.275742 LR: 0.00001480 +[14:43:38] Epoch: 1 Batch: 13379/20099 (66.57%) Loss: 2.205226 LR: 0.00001480 +[14:43:40] Epoch: 1 Batch: 13380/20099 (66.57%) Loss: 1.986157 LR: 0.00001480 +[14:43:42] Epoch: 1 Batch: 13381/20099 (66.58%) Loss: 2.037007 LR: 0.00001480 +[14:43:43] Epoch: 1 Batch: 13382/20099 (66.58%) Loss: 1.993940 LR: 0.00001480 +[14:43:45] Epoch: 1 Batch: 13383/20099 (66.59%) Loss: 1.849427 LR: 0.00001480 +[14:43:47] Epoch: 1 Batch: 13384/20099 (66.59%) Loss: 1.788118 LR: 0.00001480 +[14:43:49] Epoch: 1 Batch: 13385/20099 (66.60%) Loss: 2.277706 LR: 0.00001478 +[14:43:51] Epoch: 1 Batch: 13386/20099 (66.60%) Loss: 2.305454 LR: 0.00001478 +[14:43:53] Epoch: 1 Batch: 13387/20099 (66.61%) Loss: 2.110665 LR: 0.00001478 +[14:43:55] Epoch: 1 Batch: 13388/20099 (66.61%) Loss: 2.121770 LR: 0.00001478 +[14:43:56] Epoch: 1 Batch: 13389/20099 (66.62%) Loss: 2.005130 LR: 0.00001478 +[14:43:58] Epoch: 1 Batch: 13390/20099 (66.62%) Loss: 2.416057 LR: 0.00001478 +[14:44:00] Epoch: 1 Batch: 13391/20099 (66.63%) Loss: 2.057936 LR: 0.00001478 +[14:44:02] Epoch: 1 Batch: 13392/20099 (66.63%) Loss: 2.136729 LR: 0.00001477 +[14:44:04] Epoch: 1 Batch: 13393/20099 (66.64%) Loss: 2.028944 LR: 0.00001477 +[14:44:06] Epoch: 1 Batch: 13394/20099 (66.64%) Loss: 2.085351 LR: 0.00001477 +[14:44:07] 
Epoch: 1 Batch: 13395/20099 (66.65%) Loss: 2.093023 LR: 0.00001477 +[14:44:09] Epoch: 1 Batch: 13396/20099 (66.65%) Loss: 1.930779 LR: 0.00001477 +[14:44:11] Epoch: 1 Batch: 13397/20099 (66.66%) Loss: 2.247925 LR: 0.00001477 +[14:44:13] Epoch: 1 Batch: 13398/20099 (66.66%) Loss: 2.240807 LR: 0.00001477 +[14:44:15] Epoch: 1 Batch: 13399/20099 (66.67%) Loss: 2.086411 LR: 0.00001475 +[14:44:20] >> Cleaned up old temp checkpoint: epoch1_step7600 +[14:44:20] >> Temp checkpoint saved: epoch1_step13400, size: 0.1693 GB +[14:44:20] Epoch: 1 Batch: 13400/20099 (66.67%) Loss: 2.135127 LR: 0.00001475 +[14:44:22] Epoch: 1 Batch: 13401/20099 (66.67%) Loss: 1.793053 LR: 0.00001475 +[14:44:24] Epoch: 1 Batch: 13402/20099 (66.68%) Loss: 2.021641 LR: 0.00001475 +[14:44:26] Epoch: 1 Batch: 13403/20099 (66.68%) Loss: 2.265216 LR: 0.00001475 +[14:44:28] Epoch: 1 Batch: 13404/20099 (66.69%) Loss: 2.333460 LR: 0.00001475 +[14:44:30] Epoch: 1 Batch: 13405/20099 (66.69%) Loss: 2.192636 LR: 0.00001475 +[14:44:32] Epoch: 1 Batch: 13406/20099 (66.70%) Loss: 2.081510 LR: 0.00001474 +[14:44:33] Epoch: 1 Batch: 13407/20099 (66.70%) Loss: 2.191995 LR: 0.00001474 +[14:44:35] Epoch: 1 Batch: 13408/20099 (66.71%) Loss: 2.086858 LR: 0.00001474 +[14:44:37] Epoch: 1 Batch: 13409/20099 (66.71%) Loss: 2.146316 LR: 0.00001474 +[14:44:39] Epoch: 1 Batch: 13410/20099 (66.72%) Loss: 1.925659 LR: 0.00001474 +[14:44:41] Epoch: 1 Batch: 13411/20099 (66.72%) Loss: 2.127150 LR: 0.00001474 +[14:44:43] Epoch: 1 Batch: 13412/20099 (66.73%) Loss: 1.910194 LR: 0.00001474 +[14:44:45] Epoch: 1 Batch: 13413/20099 (66.73%) Loss: 2.036099 LR: 0.00001472 +[14:44:47] Epoch: 1 Batch: 13414/20099 (66.74%) Loss: 2.062120 LR: 0.00001472 +[14:44:49] Epoch: 1 Batch: 13415/20099 (66.74%) Loss: 2.202031 LR: 0.00001472 +[14:44:50] Epoch: 1 Batch: 13416/20099 (66.75%) Loss: 2.026764 LR: 0.00001472 +[14:44:52] Epoch: 1 Batch: 13417/20099 (66.75%) Loss: 2.123771 LR: 0.00001472 +[14:44:54] Epoch: 1 Batch: 13418/20099 (66.76%) Loss: 
1.684933 LR: 0.00001472 +[14:44:56] Epoch: 1 Batch: 13419/20099 (66.76%) Loss: 2.046165 LR: 0.00001472 +[14:44:58] Epoch: 1 Batch: 13420/20099 (66.77%) Loss: 2.168466 LR: 0.00001471 +[14:45:00] Epoch: 1 Batch: 13421/20099 (66.77%) Loss: 1.825064 LR: 0.00001471 +[14:45:02] Epoch: 1 Batch: 13422/20099 (66.78%) Loss: 2.148528 LR: 0.00001471 +[14:45:03] Epoch: 1 Batch: 13423/20099 (66.78%) Loss: 2.297801 LR: 0.00001471 +[14:45:05] Epoch: 1 Batch: 13424/20099 (66.79%) Loss: 2.136005 LR: 0.00001471 +[14:45:07] Epoch: 1 Batch: 13425/20099 (66.79%) Loss: 1.755471 LR: 0.00001471 +[14:45:09] Epoch: 1 Batch: 13426/20099 (66.80%) Loss: 2.222122 LR: 0.00001471 +[14:45:11] Epoch: 1 Batch: 13427/20099 (66.80%) Loss: 2.453722 LR: 0.00001469 +[14:45:13] Epoch: 1 Batch: 13428/20099 (66.81%) Loss: 2.042207 LR: 0.00001469 +[14:45:15] Epoch: 1 Batch: 13429/20099 (66.81%) Loss: 2.144678 LR: 0.00001469 +[14:45:16] Epoch: 1 Batch: 13430/20099 (66.82%) Loss: 2.148500 LR: 0.00001469 +[14:45:18] Epoch: 1 Batch: 13431/20099 (66.82%) Loss: 2.146038 LR: 0.00001469 +[14:45:20] Epoch: 1 Batch: 13432/20099 (66.83%) Loss: 2.038846 LR: 0.00001469 +[14:45:22] Epoch: 1 Batch: 13433/20099 (66.83%) Loss: 2.096416 LR: 0.00001469 +[14:45:24] Epoch: 1 Batch: 13434/20099 (66.84%) Loss: 2.089402 LR: 0.00001467 +[14:45:26] Epoch: 1 Batch: 13435/20099 (66.84%) Loss: 2.266926 LR: 0.00001467 +[14:45:28] Epoch: 1 Batch: 13436/20099 (66.85%) Loss: 1.930270 LR: 0.00001467 +[14:45:29] Epoch: 1 Batch: 13437/20099 (66.85%) Loss: 1.991151 LR: 0.00001467 +[14:45:31] Epoch: 1 Batch: 13438/20099 (66.86%) Loss: 2.178125 LR: 0.00001467 +[14:45:33] Epoch: 1 Batch: 13439/20099 (66.86%) Loss: 2.182286 LR: 0.00001467 +[14:45:35] Epoch: 1 Batch: 13440/20099 (66.87%) Loss: 1.984294 LR: 0.00001467 +[14:45:37] Epoch: 1 Batch: 13441/20099 (66.87%) Loss: 2.205653 LR: 0.00001466 +[14:45:39] Epoch: 1 Batch: 13442/20099 (66.88%) Loss: 1.954536 LR: 0.00001466 +[14:45:40] Epoch: 1 Batch: 13443/20099 (66.88%) Loss: 1.910935 LR: 0.00001466 
+[14:45:42] Epoch: 1 Batch: 13444/20099 (66.89%) Loss: 1.883409 LR: 0.00001466 +[14:45:44] Epoch: 1 Batch: 13445/20099 (66.89%) Loss: 1.968218 LR: 0.00001466 +[14:45:46] Epoch: 1 Batch: 13446/20099 (66.90%) Loss: 1.509426 LR: 0.00001466 +[14:45:48] Epoch: 1 Batch: 13447/20099 (66.90%) Loss: 2.001812 LR: 0.00001466 +[14:45:50] Epoch: 1 Batch: 13448/20099 (66.91%) Loss: 1.812986 LR: 0.00001464 +[14:45:52] Epoch: 1 Batch: 13449/20099 (66.91%) Loss: 2.101390 LR: 0.00001464 +[14:45:53] Epoch: 1 Batch: 13450/20099 (66.92%) Loss: 2.241686 LR: 0.00001464 +[14:45:55] Epoch: 1 Batch: 13451/20099 (66.92%) Loss: 2.485014 LR: 0.00001464 +[14:45:57] Epoch: 1 Batch: 13452/20099 (66.93%) Loss: 2.218925 LR: 0.00001464 +[14:45:59] Epoch: 1 Batch: 13453/20099 (66.93%) Loss: 2.250610 LR: 0.00001464 +[14:46:01] Epoch: 1 Batch: 13454/20099 (66.94%) Loss: 2.383035 LR: 0.00001464 +[14:46:03] Epoch: 1 Batch: 13455/20099 (66.94%) Loss: 2.035984 LR: 0.00001463 +[14:46:05] Epoch: 1 Batch: 13456/20099 (66.95%) Loss: 1.989036 LR: 0.00001463 +[14:46:06] Epoch: 1 Batch: 13457/20099 (66.95%) Loss: 1.995362 LR: 0.00001463 +[14:46:08] Epoch: 1 Batch: 13458/20099 (66.96%) Loss: 1.863683 LR: 0.00001463 +[14:46:10] Epoch: 1 Batch: 13459/20099 (66.96%) Loss: 1.934050 LR: 0.00001463 +[14:46:12] Epoch: 1 Batch: 13460/20099 (66.97%) Loss: 1.838286 LR: 0.00001463 +[14:46:14] Epoch: 1 Batch: 13461/20099 (66.97%) Loss: 2.080239 LR: 0.00001463 +[14:46:16] Epoch: 1 Batch: 13462/20099 (66.98%) Loss: 2.190958 LR: 0.00001461 +[14:46:18] Epoch: 1 Batch: 13463/20099 (66.98%) Loss: 2.205681 LR: 0.00001461 +[14:46:19] Epoch: 1 Batch: 13464/20099 (66.99%) Loss: 2.040700 LR: 0.00001461 +[14:46:21] Epoch: 1 Batch: 13465/20099 (66.99%) Loss: 2.091816 LR: 0.00001461 +[14:46:23] Epoch: 1 Batch: 13466/20099 (67.00%) Loss: 2.049945 LR: 0.00001461 +[14:46:25] Epoch: 1 Batch: 13467/20099 (67.00%) Loss: 1.877816 LR: 0.00001461 +[14:46:27] Epoch: 1 Batch: 13468/20099 (67.01%) Loss: 2.258307 LR: 0.00001461 +[14:46:29] Epoch: 1 
Batch: 13469/20099 (67.01%) Loss: 2.198229 LR: 0.00001460 +[14:46:30] Epoch: 1 Batch: 13470/20099 (67.02%) Loss: 1.899738 LR: 0.00001460 +[14:46:32] Epoch: 1 Batch: 13471/20099 (67.02%) Loss: 2.358918 LR: 0.00001460 +[14:46:34] Epoch: 1 Batch: 13472/20099 (67.03%) Loss: 2.227908 LR: 0.00001460 +[14:46:36] Epoch: 1 Batch: 13473/20099 (67.03%) Loss: 2.537945 LR: 0.00001460 +[14:46:38] Epoch: 1 Batch: 13474/20099 (67.04%) Loss: 1.979318 LR: 0.00001460 +[14:46:40] Epoch: 1 Batch: 13475/20099 (67.04%) Loss: 2.319332 LR: 0.00001460 +[14:46:42] Epoch: 1 Batch: 13476/20099 (67.05%) Loss: 1.680117 LR: 0.00001458 +[14:46:43] Epoch: 1 Batch: 13477/20099 (67.05%) Loss: 2.024430 LR: 0.00001458 +[14:46:45] Epoch: 1 Batch: 13478/20099 (67.06%) Loss: 2.212808 LR: 0.00001458 +[14:46:47] Epoch: 1 Batch: 13479/20099 (67.06%) Loss: 2.102658 LR: 0.00001458 +[14:46:49] Epoch: 1 Batch: 13480/20099 (67.07%) Loss: 1.983001 LR: 0.00001458 +[14:46:51] Epoch: 1 Batch: 13481/20099 (67.07%) Loss: 2.138290 LR: 0.00001458 +[14:46:53] Epoch: 1 Batch: 13482/20099 (67.08%) Loss: 2.020572 LR: 0.00001458 +[14:46:55] Epoch: 1 Batch: 13483/20099 (67.08%) Loss: 2.035801 LR: 0.00001456 +[14:46:56] Epoch: 1 Batch: 13484/20099 (67.09%) Loss: 1.725659 LR: 0.00001456 +[14:46:58] Epoch: 1 Batch: 13485/20099 (67.09%) Loss: 2.176112 LR: 0.00001456 +[14:47:00] Epoch: 1 Batch: 13486/20099 (67.10%) Loss: 1.992947 LR: 0.00001456 +[14:47:02] Epoch: 1 Batch: 13487/20099 (67.10%) Loss: 2.131181 LR: 0.00001456 +[14:47:04] Epoch: 1 Batch: 13488/20099 (67.11%) Loss: 1.562997 LR: 0.00001456 +[14:47:06] Epoch: 1 Batch: 13489/20099 (67.11%) Loss: 2.506859 LR: 0.00001456 +[14:47:08] Epoch: 1 Batch: 13490/20099 (67.12%) Loss: 2.072924 LR: 0.00001455 +[14:47:09] Epoch: 1 Batch: 13491/20099 (67.12%) Loss: 1.916165 LR: 0.00001455 +[14:47:11] Epoch: 1 Batch: 13492/20099 (67.13%) Loss: 2.225444 LR: 0.00001455 +[14:47:13] Epoch: 1 Batch: 13493/20099 (67.13%) Loss: 1.816761 LR: 0.00001455 +[14:47:15] Epoch: 1 Batch: 13494/20099 
(67.14%) Loss: 2.235599 LR: 0.00001455 +[14:47:17] Epoch: 1 Batch: 13495/20099 (67.14%) Loss: 2.225565 LR: 0.00001455 +[14:47:19] Epoch: 1 Batch: 13496/20099 (67.15%) Loss: 2.069945 LR: 0.00001455 +[14:47:21] Epoch: 1 Batch: 13497/20099 (67.15%) Loss: 1.985946 LR: 0.00001453 +[14:47:22] Epoch: 1 Batch: 13498/20099 (67.16%) Loss: 2.429569 LR: 0.00001453 +[14:47:24] Epoch: 1 Batch: 13499/20099 (67.16%) Loss: 1.888161 LR: 0.00001453 +[14:47:26] >> Evaluating batch 0 +[14:47:27] >> Evaluating batch 1 +[14:47:28] >> Evaluating batch 2 +[14:47:30] >> Evaluating batch 3 +[14:47:31] >> Evaluating batch 4 +[14:47:32] >> Evaluating batch 5 +[14:47:33] >> Evaluating batch 6 +[14:47:34] >> Evaluating batch 7 +[14:47:35] >> Evaluating batch 8 +[14:47:36] >> Evaluating batch 9 +[14:47:37] >> Evaluating batch 10 +[14:47:38] >> Evaluating batch 11 +[14:47:39] >> Evaluating batch 12 +[14:47:40] >> Evaluating batch 13 +[14:47:41] >> Evaluating batch 14 +[14:47:42] >> Evaluating batch 15 +[14:47:43] >> Evaluating batch 16 +[14:47:44] Epoch: 1 Step: 13500/20099 Evaluation: +[14:47:44] Avg Loss Since Last Eval: 2.0832 Val Loss: 2.1553 Validation loss delta: -0.0023 Perplexity: 8.6302 LR: 0.00001453 +[14:47:47] >> Checkpoint saved: epoch1_step13500, size: 0.1693 GB +[14:47:47] Epoch: 1 Batch: 13500/20099 (67.17%) Loss: 2.011903 LR: 0.00001453 +[14:47:49] Epoch: 1 Batch: 13501/20099 (67.17%) Loss: 2.321958 LR: 0.00001453 +[14:47:51] Epoch: 1 Batch: 13502/20099 (67.18%) Loss: 2.030170 LR: 0.00001453 +[14:47:53] Epoch: 1 Batch: 13503/20099 (67.18%) Loss: 2.166376 LR: 0.00001453 +[14:47:55] Epoch: 1 Batch: 13504/20099 (67.19%) Loss: 2.249407 LR: 0.00001452 +[14:47:56] Epoch: 1 Batch: 13505/20099 (67.19%) Loss: 2.082206 LR: 0.00001452 +[14:47:58] Epoch: 1 Batch: 13506/20099 (67.20%) Loss: 1.991330 LR: 0.00001452 +[14:48:00] Epoch: 1 Batch: 13507/20099 (67.20%) Loss: 2.177834 LR: 0.00001452 +[14:48:02] Epoch: 1 Batch: 13508/20099 (67.21%) Loss: 2.169148 LR: 0.00001452 +[14:48:04] Epoch: 1 
Batch: 13509/20099 (67.21%) Loss: 1.767752 LR: 0.00001452 +[14:48:06] Epoch: 1 Batch: 13510/20099 (67.22%) Loss: 2.297407 LR: 0.00001452 +[14:48:08] Epoch: 1 Batch: 13511/20099 (67.22%) Loss: 2.075797 LR: 0.00001450 +[14:48:09] Epoch: 1 Batch: 13512/20099 (67.23%) Loss: 1.794461 LR: 0.00001450 +[14:48:11] Epoch: 1 Batch: 13513/20099 (67.23%) Loss: 2.079753 LR: 0.00001450 +[14:48:13] Epoch: 1 Batch: 13514/20099 (67.24%) Loss: 1.902808 LR: 0.00001450 +[14:48:15] Epoch: 1 Batch: 13515/20099 (67.24%) Loss: 2.136451 LR: 0.00001450 +[14:48:17] Epoch: 1 Batch: 13516/20099 (67.25%) Loss: 2.154259 LR: 0.00001450 +[14:48:19] Epoch: 1 Batch: 13517/20099 (67.25%) Loss: 1.720858 LR: 0.00001450 +[14:48:21] Epoch: 1 Batch: 13518/20099 (67.26%) Loss: 1.927377 LR: 0.00001449 +[14:48:22] Epoch: 1 Batch: 13519/20099 (67.26%) Loss: 2.036101 LR: 0.00001449 +[14:48:24] Epoch: 1 Batch: 13520/20099 (67.27%) Loss: 1.910327 LR: 0.00001449 +[14:48:26] Epoch: 1 Batch: 13521/20099 (67.27%) Loss: 2.013096 LR: 0.00001449 +[14:48:28] Epoch: 1 Batch: 13522/20099 (67.28%) Loss: 2.146849 LR: 0.00001449 +[14:48:30] Epoch: 1 Batch: 13523/20099 (67.28%) Loss: 2.425998 LR: 0.00001449 +[14:48:32] Epoch: 1 Batch: 13524/20099 (67.29%) Loss: 2.205116 LR: 0.00001449 +[14:48:34] Epoch: 1 Batch: 13525/20099 (67.29%) Loss: 2.205664 LR: 0.00001447 +[14:48:35] Epoch: 1 Batch: 13526/20099 (67.30%) Loss: 2.172720 LR: 0.00001447 +[14:48:37] Epoch: 1 Batch: 13527/20099 (67.30%) Loss: 1.858198 LR: 0.00001447 +[14:48:39] Epoch: 1 Batch: 13528/20099 (67.31%) Loss: 1.943600 LR: 0.00001447 +[14:48:41] Epoch: 1 Batch: 13529/20099 (67.31%) Loss: 2.094165 LR: 0.00001447 +[14:48:43] Epoch: 1 Batch: 13530/20099 (67.32%) Loss: 2.073287 LR: 0.00001447 +[14:48:45] Epoch: 1 Batch: 13531/20099 (67.32%) Loss: 1.963931 LR: 0.00001447 +[14:48:47] Epoch: 1 Batch: 13532/20099 (67.33%) Loss: 1.961176 LR: 0.00001446 +[14:48:48] Epoch: 1 Batch: 13533/20099 (67.33%) Loss: 1.996915 LR: 0.00001446 +[14:48:50] Epoch: 1 Batch: 13534/20099 
(67.34%) Loss: 2.026828 LR: 0.00001446 +[14:48:52] Epoch: 1 Batch: 13535/20099 (67.34%) Loss: 2.232148 LR: 0.00001446 +[14:48:54] Epoch: 1 Batch: 13536/20099 (67.35%) Loss: 2.231326 LR: 0.00001446 +[14:48:56] Epoch: 1 Batch: 13537/20099 (67.35%) Loss: 2.224164 LR: 0.00001446 +[14:48:58] Epoch: 1 Batch: 13538/20099 (67.36%) Loss: 2.220720 LR: 0.00001446 +[14:49:00] Epoch: 1 Batch: 13539/20099 (67.36%) Loss: 2.199985 LR: 0.00001444 +[14:49:01] Epoch: 1 Batch: 13540/20099 (67.37%) Loss: 2.325209 LR: 0.00001444 +[14:49:03] Epoch: 1 Batch: 13541/20099 (67.37%) Loss: 1.984649 LR: 0.00001444 +[14:49:05] Epoch: 1 Batch: 13542/20099 (67.38%) Loss: 2.193214 LR: 0.00001444 +[14:49:07] Epoch: 1 Batch: 13543/20099 (67.38%) Loss: 2.160307 LR: 0.00001444 +[14:49:09] Epoch: 1 Batch: 13544/20099 (67.39%) Loss: 1.992991 LR: 0.00001444 +[14:49:11] Epoch: 1 Batch: 13545/20099 (67.39%) Loss: 1.855545 LR: 0.00001444 +[14:49:12] Epoch: 1 Batch: 13546/20099 (67.40%) Loss: 1.895160 LR: 0.00001442 +[14:49:14] Epoch: 1 Batch: 13547/20099 (67.40%) Loss: 2.112163 LR: 0.00001442 +[14:49:16] Epoch: 1 Batch: 13548/20099 (67.41%) Loss: 2.150685 LR: 0.00001442 +[14:49:18] Epoch: 1 Batch: 13549/20099 (67.41%) Loss: 2.261418 LR: 0.00001442 +[14:49:20] Epoch: 1 Batch: 13550/20099 (67.42%) Loss: 2.044626 LR: 0.00001442 +[14:49:22] Epoch: 1 Batch: 13551/20099 (67.42%) Loss: 2.172755 LR: 0.00001442 +[14:49:24] Epoch: 1 Batch: 13552/20099 (67.43%) Loss: 2.154099 LR: 0.00001442 +[14:49:25] Epoch: 1 Batch: 13553/20099 (67.43%) Loss: 1.877829 LR: 0.00001441 +[14:49:27] Epoch: 1 Batch: 13554/20099 (67.44%) Loss: 1.991456 LR: 0.00001441 +[14:49:29] Epoch: 1 Batch: 13555/20099 (67.44%) Loss: 2.375202 LR: 0.00001441 +[14:49:31] Epoch: 1 Batch: 13556/20099 (67.45%) Loss: 2.229606 LR: 0.00001441 +[14:49:33] Epoch: 1 Batch: 13557/20099 (67.45%) Loss: 2.085515 LR: 0.00001441 +[14:49:35] Epoch: 1 Batch: 13558/20099 (67.46%) Loss: 2.245805 LR: 0.00001441 +[14:49:37] Epoch: 1 Batch: 13559/20099 (67.46%) Loss: 1.872956 
LR: 0.00001441 +[14:49:38] Epoch: 1 Batch: 13560/20099 (67.47%) Loss: 1.738995 LR: 0.00001439 +[14:49:40] Epoch: 1 Batch: 13561/20099 (67.47%) Loss: 1.983373 LR: 0.00001439 +[14:49:42] Epoch: 1 Batch: 13562/20099 (67.48%) Loss: 1.810364 LR: 0.00001439 +[14:49:44] Epoch: 1 Batch: 13563/20099 (67.48%) Loss: 2.186716 LR: 0.00001439 +[14:49:46] Epoch: 1 Batch: 13564/20099 (67.49%) Loss: 2.173098 LR: 0.00001439 +[14:49:48] Epoch: 1 Batch: 13565/20099 (67.49%) Loss: 1.861415 LR: 0.00001439 +[14:49:50] Epoch: 1 Batch: 13566/20099 (67.50%) Loss: 1.851422 LR: 0.00001439 +[14:49:51] Epoch: 1 Batch: 13567/20099 (67.50%) Loss: 2.381083 LR: 0.00001438 +[14:49:53] Epoch: 1 Batch: 13568/20099 (67.51%) Loss: 2.073934 LR: 0.00001438 +[14:49:55] Epoch: 1 Batch: 13569/20099 (67.51%) Loss: 2.339108 LR: 0.00001438 +[14:49:57] Epoch: 1 Batch: 13570/20099 (67.52%) Loss: 2.024748 LR: 0.00001438 +[14:49:59] Epoch: 1 Batch: 13571/20099 (67.52%) Loss: 1.746899 LR: 0.00001438 +[14:50:01] Epoch: 1 Batch: 13572/20099 (67.53%) Loss: 2.386711 LR: 0.00001438 +[14:50:03] Epoch: 1 Batch: 13573/20099 (67.53%) Loss: 1.951219 LR: 0.00001438 +[14:50:05] Epoch: 1 Batch: 13574/20099 (67.54%) Loss: 1.961536 LR: 0.00001436 +[14:50:06] Epoch: 1 Batch: 13575/20099 (67.54%) Loss: 1.905236 LR: 0.00001436 +[14:50:08] Epoch: 1 Batch: 13576/20099 (67.55%) Loss: 2.264654 LR: 0.00001436 +[14:50:10] Epoch: 1 Batch: 13577/20099 (67.55%) Loss: 2.028520 LR: 0.00001436 +[14:50:12] Epoch: 1 Batch: 13578/20099 (67.56%) Loss: 2.112345 LR: 0.00001436 +[14:50:14] Epoch: 1 Batch: 13579/20099 (67.56%) Loss: 2.090103 LR: 0.00001436 +[14:50:16] Epoch: 1 Batch: 13580/20099 (67.57%) Loss: 1.986644 LR: 0.00001436 +[14:50:18] Epoch: 1 Batch: 13581/20099 (67.57%) Loss: 2.139053 LR: 0.00001435 +[14:50:19] Epoch: 1 Batch: 13582/20099 (67.58%) Loss: 1.886828 LR: 0.00001435 +[14:50:21] Epoch: 1 Batch: 13583/20099 (67.58%) Loss: 2.337843 LR: 0.00001435 +[14:50:23] Epoch: 1 Batch: 13584/20099 (67.59%) Loss: 1.729876 LR: 0.00001435 
+[14:50:25] Epoch: 1 Batch: 13585/20099 (67.59%) Loss: 1.969269 LR: 0.00001435 +[14:50:27] Epoch: 1 Batch: 13586/20099 (67.60%) Loss: 1.854407 LR: 0.00001435 +[14:50:29] Epoch: 1 Batch: 13587/20099 (67.60%) Loss: 2.180773 LR: 0.00001435 +[14:50:31] Epoch: 1 Batch: 13588/20099 (67.61%) Loss: 2.002723 LR: 0.00001433 +[14:50:32] Epoch: 1 Batch: 13589/20099 (67.61%) Loss: 2.225653 LR: 0.00001433 +[14:50:34] Epoch: 1 Batch: 13590/20099 (67.62%) Loss: 2.066202 LR: 0.00001433 +[14:50:36] Epoch: 1 Batch: 13591/20099 (67.62%) Loss: 2.287260 LR: 0.00001433 +[14:50:38] Epoch: 1 Batch: 13592/20099 (67.63%) Loss: 2.226359 LR: 0.00001433 +[14:50:40] Epoch: 1 Batch: 13593/20099 (67.63%) Loss: 1.839072 LR: 0.00001433 +[14:50:42] Epoch: 1 Batch: 13594/20099 (67.64%) Loss: 2.007494 LR: 0.00001433 +[14:50:44] Epoch: 1 Batch: 13595/20099 (67.64%) Loss: 2.212268 LR: 0.00001432 +[14:50:45] Epoch: 1 Batch: 13596/20099 (67.65%) Loss: 2.177581 LR: 0.00001432 +[14:50:47] Epoch: 1 Batch: 13597/20099 (67.65%) Loss: 2.124609 LR: 0.00001432 +[14:50:49] Epoch: 1 Batch: 13598/20099 (67.66%) Loss: 2.182082 LR: 0.00001432 +[14:50:51] Epoch: 1 Batch: 13599/20099 (67.66%) Loss: 2.164456 LR: 0.00001432 +[14:50:56] >> Cleaned up old temp checkpoint: epoch1_step11600 +[14:50:56] >> Temp checkpoint saved: epoch1_step13600, size: 0.1693 GB +[14:50:56] Epoch: 1 Batch: 13600/20099 (67.67%) Loss: 2.120978 LR: 0.00001432 +[14:50:58] Epoch: 1 Batch: 13601/20099 (67.67%) Loss: 2.012021 LR: 0.00001432 +[14:51:00] Epoch: 1 Batch: 13602/20099 (67.68%) Loss: 2.059035 LR: 0.00001430 +[14:51:02] Epoch: 1 Batch: 13603/20099 (67.68%) Loss: 2.229355 LR: 0.00001430 +[14:51:04] Epoch: 1 Batch: 13604/20099 (67.68%) Loss: 1.755328 LR: 0.00001430 +[14:51:06] Epoch: 1 Batch: 13605/20099 (67.69%) Loss: 2.164550 LR: 0.00001430 +[14:51:07] Epoch: 1 Batch: 13606/20099 (67.69%) Loss: 2.469192 LR: 0.00001430 +[14:51:09] Epoch: 1 Batch: 13607/20099 (67.70%) Loss: 1.831126 LR: 0.00001430 +[14:51:11] Epoch: 1 Batch: 13608/20099 
(67.70%) Loss: 2.212390 LR: 0.00001430 +[14:51:13] Epoch: 1 Batch: 13609/20099 (67.71%) Loss: 2.255259 LR: 0.00001429 +[14:51:15] Epoch: 1 Batch: 13610/20099 (67.71%) Loss: 2.067151 LR: 0.00001429 +[14:51:17] Epoch: 1 Batch: 13611/20099 (67.72%) Loss: 2.113679 LR: 0.00001429 +[14:51:19] Epoch: 1 Batch: 13612/20099 (67.72%) Loss: 2.222425 LR: 0.00001429 +[14:51:20] Epoch: 1 Batch: 13613/20099 (67.73%) Loss: 2.119242 LR: 0.00001429 +[14:51:22] Epoch: 1 Batch: 13614/20099 (67.73%) Loss: 2.195909 LR: 0.00001429 +[14:51:24] Epoch: 1 Batch: 13615/20099 (67.74%) Loss: 1.964242 LR: 0.00001429 +[14:51:26] Epoch: 1 Batch: 13616/20099 (67.74%) Loss: 1.973704 LR: 0.00001427 +[14:51:28] Epoch: 1 Batch: 13617/20099 (67.75%) Loss: 2.056378 LR: 0.00001427 +[14:51:30] Epoch: 1 Batch: 13618/20099 (67.75%) Loss: 2.030741 LR: 0.00001427 +[14:51:32] Epoch: 1 Batch: 13619/20099 (67.76%) Loss: 1.962627 LR: 0.00001427 +[14:51:33] Epoch: 1 Batch: 13620/20099 (67.76%) Loss: 2.364783 LR: 0.00001427 +[14:51:35] Epoch: 1 Batch: 13621/20099 (67.77%) Loss: 1.789416 LR: 0.00001427 +[14:51:37] Epoch: 1 Batch: 13622/20099 (67.77%) Loss: 2.385618 LR: 0.00001427 +[14:51:39] Epoch: 1 Batch: 13623/20099 (67.78%) Loss: 2.141251 LR: 0.00001425 +[14:51:41] Epoch: 1 Batch: 13624/20099 (67.78%) Loss: 2.158912 LR: 0.00001425 +[14:51:43] Epoch: 1 Batch: 13625/20099 (67.79%) Loss: 1.850044 LR: 0.00001425 +[14:51:45] Epoch: 1 Batch: 13626/20099 (67.79%) Loss: 2.111247 LR: 0.00001425 +[14:51:47] Epoch: 1 Batch: 13627/20099 (67.80%) Loss: 2.062116 LR: 0.00001425 +[14:51:48] Epoch: 1 Batch: 13628/20099 (67.80%) Loss: 1.899929 LR: 0.00001425 +[14:51:50] Epoch: 1 Batch: 13629/20099 (67.81%) Loss: 1.845110 LR: 0.00001425 +[14:51:52] Epoch: 1 Batch: 13630/20099 (67.81%) Loss: 2.233570 LR: 0.00001424 +[14:51:54] Epoch: 1 Batch: 13631/20099 (67.82%) Loss: 2.037664 LR: 0.00001424 +[14:51:56] Epoch: 1 Batch: 13632/20099 (67.82%) Loss: 1.876193 LR: 0.00001424 +[14:51:58] Epoch: 1 Batch: 13633/20099 (67.83%) Loss: 2.247371 
LR: 0.00001424 +[14:51:59] Epoch: 1 Batch: 13634/20099 (67.83%) Loss: 1.932721 LR: 0.00001424 +[14:52:01] Epoch: 1 Batch: 13635/20099 (67.84%) Loss: 2.182402 LR: 0.00001424 +[14:52:03] Epoch: 1 Batch: 13636/20099 (67.84%) Loss: 2.231509 LR: 0.00001424 +[14:52:05] Epoch: 1 Batch: 13637/20099 (67.85%) Loss: 2.347555 LR: 0.00001422 +[14:52:07] Epoch: 1 Batch: 13638/20099 (67.85%) Loss: 2.222641 LR: 0.00001422 +[14:52:09] Epoch: 1 Batch: 13639/20099 (67.86%) Loss: 2.050825 LR: 0.00001422 +[14:52:11] Epoch: 1 Batch: 13640/20099 (67.86%) Loss: 2.196037 LR: 0.00001422 +[14:52:12] Epoch: 1 Batch: 13641/20099 (67.87%) Loss: 2.027528 LR: 0.00001422 +[14:52:14] Epoch: 1 Batch: 13642/20099 (67.87%) Loss: 2.106459 LR: 0.00001422 +[14:52:16] Epoch: 1 Batch: 13643/20099 (67.88%) Loss: 2.113587 LR: 0.00001422 +[14:52:18] Epoch: 1 Batch: 13644/20099 (67.88%) Loss: 2.161985 LR: 0.00001421 +[14:52:20] Epoch: 1 Batch: 13645/20099 (67.89%) Loss: 1.704740 LR: 0.00001421 +[14:52:22] Epoch: 1 Batch: 13646/20099 (67.89%) Loss: 2.366832 LR: 0.00001421 +[14:52:23] Epoch: 1 Batch: 13647/20099 (67.90%) Loss: 1.881084 LR: 0.00001421 +[14:52:25] Epoch: 1 Batch: 13648/20099 (67.90%) Loss: 2.310054 LR: 0.00001421 +[14:52:27] Epoch: 1 Batch: 13649/20099 (67.91%) Loss: 1.958483 LR: 0.00001421 +[14:52:29] Epoch: 1 Batch: 13650/20099 (67.91%) Loss: 1.979843 LR: 0.00001421 +[14:52:31] Epoch: 1 Batch: 13651/20099 (67.92%) Loss: 2.077630 LR: 0.00001419 +[14:52:33] Epoch: 1 Batch: 13652/20099 (67.92%) Loss: 1.957179 LR: 0.00001419 +[14:52:35] Epoch: 1 Batch: 13653/20099 (67.93%) Loss: 1.926663 LR: 0.00001419 +[14:52:36] Epoch: 1 Batch: 13654/20099 (67.93%) Loss: 2.144798 LR: 0.00001419 +[14:52:38] Epoch: 1 Batch: 13655/20099 (67.94%) Loss: 2.027520 LR: 0.00001419 +[14:52:40] Epoch: 1 Batch: 13656/20099 (67.94%) Loss: 2.468672 LR: 0.00001419 +[14:52:42] Epoch: 1 Batch: 13657/20099 (67.95%) Loss: 2.135991 LR: 0.00001419 +[14:52:44] Epoch: 1 Batch: 13658/20099 (67.95%) Loss: 1.754703 LR: 0.00001418 
+[14:52:46] Epoch: 1 Batch: 13659/20099 (67.96%) Loss: 2.002926 LR: 0.00001418 +[14:52:48] Epoch: 1 Batch: 13660/20099 (67.96%) Loss: 1.969887 LR: 0.00001418 +[14:52:49] Epoch: 1 Batch: 13661/20099 (67.97%) Loss: 2.093292 LR: 0.00001418 +[14:52:51] Epoch: 1 Batch: 13662/20099 (67.97%) Loss: 2.084755 LR: 0.00001418 +[14:52:53] Epoch: 1 Batch: 13663/20099 (67.98%) Loss: 2.140959 LR: 0.00001418 +[14:52:55] Epoch: 1 Batch: 13664/20099 (67.98%) Loss: 2.028686 LR: 0.00001418 +[14:52:57] Epoch: 1 Batch: 13665/20099 (67.99%) Loss: 2.451407 LR: 0.00001416 +[14:52:59] Epoch: 1 Batch: 13666/20099 (67.99%) Loss: 1.885467 LR: 0.00001416 +[14:53:01] Epoch: 1 Batch: 13667/20099 (68.00%) Loss: 1.920986 LR: 0.00001416 +[14:53:02] Epoch: 1 Batch: 13668/20099 (68.00%) Loss: 2.137796 LR: 0.00001416 +[14:53:04] Epoch: 1 Batch: 13669/20099 (68.01%) Loss: 2.074468 LR: 0.00001416 +[14:53:06] Epoch: 1 Batch: 13670/20099 (68.01%) Loss: 2.191732 LR: 0.00001416 +[14:53:08] Epoch: 1 Batch: 13671/20099 (68.02%) Loss: 1.984421 LR: 0.00001416 +[14:53:10] Epoch: 1 Batch: 13672/20099 (68.02%) Loss: 2.029981 LR: 0.00001415 +[14:53:12] Epoch: 1 Batch: 13673/20099 (68.03%) Loss: 2.294815 LR: 0.00001415 +[14:53:14] Epoch: 1 Batch: 13674/20099 (68.03%) Loss: 1.852778 LR: 0.00001415 +[14:53:15] Epoch: 1 Batch: 13675/20099 (68.04%) Loss: 2.075808 LR: 0.00001415 +[14:53:17] Epoch: 1 Batch: 13676/20099 (68.04%) Loss: 1.900626 LR: 0.00001415 +[14:53:19] Epoch: 1 Batch: 13677/20099 (68.05%) Loss: 2.095630 LR: 0.00001415 +[14:53:21] Epoch: 1 Batch: 13678/20099 (68.05%) Loss: 1.917425 LR: 0.00001415 +[14:53:23] Epoch: 1 Batch: 13679/20099 (68.06%) Loss: 1.922134 LR: 0.00001413 +[14:53:25] Epoch: 1 Batch: 13680/20099 (68.06%) Loss: 2.161550 LR: 0.00001413 +[14:53:27] Epoch: 1 Batch: 13681/20099 (68.07%) Loss: 2.124061 LR: 0.00001413 +[14:53:29] Epoch: 1 Batch: 13682/20099 (68.07%) Loss: 1.953416 LR: 0.00001413 +[14:53:30] Epoch: 1 Batch: 13683/20099 (68.08%) Loss: 2.027341 LR: 0.00001413 +[14:53:32] Epoch: 1 
Batch: 13684/20099 (68.08%) Loss: 2.068595 LR: 0.00001413 +[14:53:34] Epoch: 1 Batch: 13685/20099 (68.09%) Loss: 2.387050 LR: 0.00001413 +[14:53:36] Epoch: 1 Batch: 13686/20099 (68.09%) Loss: 2.279773 LR: 0.00001412 +[14:53:38] Epoch: 1 Batch: 13687/20099 (68.10%) Loss: 2.070468 LR: 0.00001412 +[14:53:40] Epoch: 1 Batch: 13688/20099 (68.10%) Loss: 2.240777 LR: 0.00001412 +[14:53:41] Epoch: 1 Batch: 13689/20099 (68.11%) Loss: 2.114401 LR: 0.00001412 +[14:53:43] Epoch: 1 Batch: 13690/20099 (68.11%) Loss: 2.268743 LR: 0.00001412 +[14:53:45] Epoch: 1 Batch: 13691/20099 (68.12%) Loss: 1.938378 LR: 0.00001412 +[14:53:47] Epoch: 1 Batch: 13692/20099 (68.12%) Loss: 2.059533 LR: 0.00001412 +[14:53:49] Epoch: 1 Batch: 13693/20099 (68.13%) Loss: 1.824458 LR: 0.00001410 +[14:53:51] Epoch: 1 Batch: 13694/20099 (68.13%) Loss: 2.068025 LR: 0.00001410 +[14:53:53] Epoch: 1 Batch: 13695/20099 (68.14%) Loss: 1.837087 LR: 0.00001410 +[14:53:54] Epoch: 1 Batch: 13696/20099 (68.14%) Loss: 2.109556 LR: 0.00001410 +[14:53:56] Epoch: 1 Batch: 13697/20099 (68.15%) Loss: 2.235687 LR: 0.00001410 +[14:53:58] Epoch: 1 Batch: 13698/20099 (68.15%) Loss: 2.082028 LR: 0.00001410 +[14:54:00] Epoch: 1 Batch: 13699/20099 (68.16%) Loss: 2.357527 LR: 0.00001410 +[14:54:02] Epoch: 1 Batch: 13700/20099 (68.16%) Loss: 2.246573 LR: 0.00001409 +[14:54:04] Epoch: 1 Batch: 13701/20099 (68.17%) Loss: 1.960848 LR: 0.00001409 +[14:54:05] Epoch: 1 Batch: 13702/20099 (68.17%) Loss: 2.358168 LR: 0.00001409 +[14:54:07] Epoch: 1 Batch: 13703/20099 (68.18%) Loss: 1.882469 LR: 0.00001409 +[14:54:09] Epoch: 1 Batch: 13704/20099 (68.18%) Loss: 2.318020 LR: 0.00001409 +[14:54:11] Epoch: 1 Batch: 13705/20099 (68.19%) Loss: 2.261776 LR: 0.00001409 +[14:54:13] Epoch: 1 Batch: 13706/20099 (68.19%) Loss: 2.330215 LR: 0.00001409 +[14:54:15] Epoch: 1 Batch: 13707/20099 (68.20%) Loss: 1.975414 LR: 0.00001407 +[14:54:17] Epoch: 1 Batch: 13708/20099 (68.20%) Loss: 2.128087 LR: 0.00001407 +[14:54:18] Epoch: 1 Batch: 13709/20099 
(68.21%) Loss: 2.323610 LR: 0.00001407 +[14:54:20] Epoch: 1 Batch: 13710/20099 (68.21%) Loss: 2.080271 LR: 0.00001407 +[14:54:22] Epoch: 1 Batch: 13711/20099 (68.22%) Loss: 2.088404 LR: 0.00001407 +[14:54:24] Epoch: 1 Batch: 13712/20099 (68.22%) Loss: 2.126389 LR: 0.00001407 +[14:54:26] Epoch: 1 Batch: 13713/20099 (68.23%) Loss: 2.017226 LR: 0.00001407 +[14:54:28] Epoch: 1 Batch: 13714/20099 (68.23%) Loss: 1.807030 LR: 0.00001405 +[14:54:30] Epoch: 1 Batch: 13715/20099 (68.24%) Loss: 2.390823 LR: 0.00001405 +[14:54:32] Epoch: 1 Batch: 13716/20099 (68.24%) Loss: 2.393228 LR: 0.00001405 +[14:54:33] Epoch: 1 Batch: 13717/20099 (68.25%) Loss: 2.103113 LR: 0.00001405 +[14:54:35] Epoch: 1 Batch: 13718/20099 (68.25%) Loss: 1.985986 LR: 0.00001405 +[14:54:37] Epoch: 1 Batch: 13719/20099 (68.26%) Loss: 2.145407 LR: 0.00001405 +[14:54:39] Epoch: 1 Batch: 13720/20099 (68.26%) Loss: 2.058873 LR: 0.00001405 +[14:54:41] Epoch: 1 Batch: 13721/20099 (68.27%) Loss: 2.245711 LR: 0.00001404 +[14:54:43] Epoch: 1 Batch: 13722/20099 (68.27%) Loss: 1.697034 LR: 0.00001404 +[14:54:45] Epoch: 1 Batch: 13723/20099 (68.28%) Loss: 1.681452 LR: 0.00001404 +[14:54:46] Epoch: 1 Batch: 13724/20099 (68.28%) Loss: 2.234457 LR: 0.00001404 +[14:54:48] Epoch: 1 Batch: 13725/20099 (68.29%) Loss: 1.760089 LR: 0.00001404 +[14:54:50] Epoch: 1 Batch: 13726/20099 (68.29%) Loss: 1.962854 LR: 0.00001404 +[14:54:52] Epoch: 1 Batch: 13727/20099 (68.30%) Loss: 2.358553 LR: 0.00001404 +[14:54:54] Epoch: 1 Batch: 13728/20099 (68.30%) Loss: 2.088039 LR: 0.00001402 +[14:54:56] Epoch: 1 Batch: 13729/20099 (68.31%) Loss: 2.207064 LR: 0.00001402 +[14:54:58] Epoch: 1 Batch: 13730/20099 (68.31%) Loss: 2.032794 LR: 0.00001402 +[14:54:59] Epoch: 1 Batch: 13731/20099 (68.32%) Loss: 2.089985 LR: 0.00001402 +[14:55:01] Epoch: 1 Batch: 13732/20099 (68.32%) Loss: 2.225474 LR: 0.00001402 +[14:55:03] Epoch: 1 Batch: 13733/20099 (68.33%) Loss: 1.770898 LR: 0.00001402 +[14:55:05] Epoch: 1 Batch: 13734/20099 (68.33%) Loss: 1.936968 
LR: 0.00001402 +[14:55:07] Epoch: 1 Batch: 13735/20099 (68.34%) Loss: 2.163065 LR: 0.00001401 +[14:55:09] Epoch: 1 Batch: 13736/20099 (68.34%) Loss: 2.147904 LR: 0.00001401 +[14:55:10] Epoch: 1 Batch: 13737/20099 (68.35%) Loss: 2.140880 LR: 0.00001401 +[14:55:12] Epoch: 1 Batch: 13738/20099 (68.35%) Loss: 1.806151 LR: 0.00001401 +[14:55:14] Epoch: 1 Batch: 13739/20099 (68.36%) Loss: 2.156403 LR: 0.00001401 +[14:55:16] Epoch: 1 Batch: 13740/20099 (68.36%) Loss: 2.127851 LR: 0.00001401 +[14:55:18] Epoch: 1 Batch: 13741/20099 (68.37%) Loss: 2.045966 LR: 0.00001401 +[14:55:20] Epoch: 1 Batch: 13742/20099 (68.37%) Loss: 2.099993 LR: 0.00001399 +[14:55:22] Epoch: 1 Batch: 13743/20099 (68.38%) Loss: 2.091187 LR: 0.00001399 +[14:55:23] Epoch: 1 Batch: 13744/20099 (68.38%) Loss: 2.318160 LR: 0.00001399 +[14:55:25] Epoch: 1 Batch: 13745/20099 (68.39%) Loss: 2.193218 LR: 0.00001399 +[14:55:27] Epoch: 1 Batch: 13746/20099 (68.39%) Loss: 2.130143 LR: 0.00001399 +[14:55:29] Epoch: 1 Batch: 13747/20099 (68.40%) Loss: 1.705513 LR: 0.00001399 +[14:55:31] Epoch: 1 Batch: 13748/20099 (68.40%) Loss: 2.313535 LR: 0.00001399 +[14:55:33] Epoch: 1 Batch: 13749/20099 (68.41%) Loss: 2.025187 LR: 0.00001398 +[14:55:35] Epoch: 1 Batch: 13750/20099 (68.41%) Loss: 2.098824 LR: 0.00001398 +[14:55:36] Epoch: 1 Batch: 13751/20099 (68.42%) Loss: 2.084153 LR: 0.00001398 +[14:55:38] Epoch: 1 Batch: 13752/20099 (68.42%) Loss: 2.073747 LR: 0.00001398 +[14:55:40] Epoch: 1 Batch: 13753/20099 (68.43%) Loss: 1.941357 LR: 0.00001398 +[14:55:42] Epoch: 1 Batch: 13754/20099 (68.43%) Loss: 1.983290 LR: 0.00001398 +[14:55:44] Epoch: 1 Batch: 13755/20099 (68.44%) Loss: 2.288856 LR: 0.00001398 +[14:55:46] Epoch: 1 Batch: 13756/20099 (68.44%) Loss: 1.965196 LR: 0.00001396 +[14:55:48] Epoch: 1 Batch: 13757/20099 (68.45%) Loss: 2.063732 LR: 0.00001396 +[14:55:49] Epoch: 1 Batch: 13758/20099 (68.45%) Loss: 2.247014 LR: 0.00001396 +[14:55:51] Epoch: 1 Batch: 13759/20099 (68.46%) Loss: 2.273970 LR: 0.00001396 
+[14:55:53] Epoch: 1 Batch: 13760/20099 (68.46%) Loss: 2.195621 LR: 0.00001396 +[14:55:55] Epoch: 1 Batch: 13761/20099 (68.47%) Loss: 2.062136 LR: 0.00001396 +[14:55:57] Epoch: 1 Batch: 13762/20099 (68.47%) Loss: 2.281042 LR: 0.00001396 +[14:55:59] Epoch: 1 Batch: 13763/20099 (68.48%) Loss: 1.982923 LR: 0.00001395 +[14:56:01] Epoch: 1 Batch: 13764/20099 (68.48%) Loss: 2.148174 LR: 0.00001395 +[14:56:02] Epoch: 1 Batch: 13765/20099 (68.49%) Loss: 2.012049 LR: 0.00001395 +[14:56:04] Epoch: 1 Batch: 13766/20099 (68.49%) Loss: 2.263738 LR: 0.00001395 +[14:56:06] Epoch: 1 Batch: 13767/20099 (68.50%) Loss: 2.092298 LR: 0.00001395 +[14:56:08] Epoch: 1 Batch: 13768/20099 (68.50%) Loss: 2.319257 LR: 0.00001395 +[14:56:10] Epoch: 1 Batch: 13769/20099 (68.51%) Loss: 2.088193 LR: 0.00001395 +[14:56:12] Epoch: 1 Batch: 13770/20099 (68.51%) Loss: 2.257448 LR: 0.00001393 +[14:56:13] Epoch: 1 Batch: 13771/20099 (68.52%) Loss: 2.124279 LR: 0.00001393 +[14:56:15] Epoch: 1 Batch: 13772/20099 (68.52%) Loss: 1.984884 LR: 0.00001393 +[14:56:17] Epoch: 1 Batch: 13773/20099 (68.53%) Loss: 1.847061 LR: 0.00001393 +[14:56:19] Epoch: 1 Batch: 13774/20099 (68.53%) Loss: 1.938062 LR: 0.00001393 +[14:56:21] Epoch: 1 Batch: 13775/20099 (68.54%) Loss: 1.768648 LR: 0.00001393 +[14:56:23] Epoch: 1 Batch: 13776/20099 (68.54%) Loss: 2.260957 LR: 0.00001393 +[14:56:25] Epoch: 1 Batch: 13777/20099 (68.55%) Loss: 2.207091 LR: 0.00001392 +[14:56:26] Epoch: 1 Batch: 13778/20099 (68.55%) Loss: 2.214519 LR: 0.00001392 +[14:56:28] Epoch: 1 Batch: 13779/20099 (68.56%) Loss: 2.072462 LR: 0.00001392 +[14:56:30] Epoch: 1 Batch: 13780/20099 (68.56%) Loss: 2.201774 LR: 0.00001392 +[14:56:32] Epoch: 1 Batch: 13781/20099 (68.57%) Loss: 2.249969 LR: 0.00001392 +[14:56:34] Epoch: 1 Batch: 13782/20099 (68.57%) Loss: 2.246098 LR: 0.00001392 +[14:56:36] Epoch: 1 Batch: 13783/20099 (68.58%) Loss: 2.040993 LR: 0.00001392 +[14:56:38] Epoch: 1 Batch: 13784/20099 (68.58%) Loss: 2.540714 LR: 0.00001390 +[14:56:39] Epoch: 1 
Batch: 13785/20099 (68.59%) Loss: 2.161488 LR: 0.00001390 +[14:56:41] Epoch: 1 Batch: 13786/20099 (68.59%) Loss: 1.740087 LR: 0.00001390 +[14:56:43] Epoch: 1 Batch: 13787/20099 (68.60%) Loss: 2.128547 LR: 0.00001390 +[14:56:45] Epoch: 1 Batch: 13788/20099 (68.60%) Loss: 2.404901 LR: 0.00001390 +[14:56:47] Epoch: 1 Batch: 13789/20099 (68.61%) Loss: 2.168510 LR: 0.00001390 +[14:56:49] Epoch: 1 Batch: 13790/20099 (68.61%) Loss: 2.088853 LR: 0.00001390 +[14:56:51] Epoch: 1 Batch: 13791/20099 (68.62%) Loss: 2.316560 LR: 0.00001389 +[14:56:52] Epoch: 1 Batch: 13792/20099 (68.62%) Loss: 2.428486 LR: 0.00001389 +[14:56:54] Epoch: 1 Batch: 13793/20099 (68.63%) Loss: 2.024719 LR: 0.00001389 +[14:56:56] Epoch: 1 Batch: 13794/20099 (68.63%) Loss: 2.143664 LR: 0.00001389 +[14:56:58] Epoch: 1 Batch: 13795/20099 (68.64%) Loss: 1.964874 LR: 0.00001389 +[14:57:00] Epoch: 1 Batch: 13796/20099 (68.64%) Loss: 1.947215 LR: 0.00001389 +[14:57:02] Epoch: 1 Batch: 13797/20099 (68.65%) Loss: 1.684202 LR: 0.00001389 +[14:57:04] Epoch: 1 Batch: 13798/20099 (68.65%) Loss: 2.307091 LR: 0.00001387 +[14:57:05] Epoch: 1 Batch: 13799/20099 (68.66%) Loss: 1.540619 LR: 0.00001387 +[14:57:11] >> Cleaned up old temp checkpoint: epoch1_step11800 +[14:57:11] >> Temp checkpoint saved: epoch1_step13800, size: 0.1693 GB +[14:57:11] Epoch: 1 Batch: 13800/20099 (68.66%) Loss: 2.045623 LR: 0.00001387 +[14:57:13] Epoch: 1 Batch: 13801/20099 (68.67%) Loss: 1.870990 LR: 0.00001387 +[14:57:15] Epoch: 1 Batch: 13802/20099 (68.67%) Loss: 2.322865 LR: 0.00001387 +[14:57:17] Epoch: 1 Batch: 13803/20099 (68.68%) Loss: 2.127298 LR: 0.00001387 +[14:57:19] Epoch: 1 Batch: 13804/20099 (68.68%) Loss: 1.633534 LR: 0.00001387 +[14:57:21] Epoch: 1 Batch: 13805/20099 (68.69%) Loss: 2.104667 LR: 0.00001386 +[14:57:22] Epoch: 1 Batch: 13806/20099 (68.69%) Loss: 1.923604 LR: 0.00001386 +[14:57:24] Epoch: 1 Batch: 13807/20099 (68.69%) Loss: 1.892445 LR: 0.00001386 +[14:57:26] Epoch: 1 Batch: 13808/20099 (68.70%) Loss: 2.252549 LR: 
0.00001386 +[14:57:28] Epoch: 1 Batch: 13809/20099 (68.70%) Loss: 1.993925 LR: 0.00001386 +[14:57:30] Epoch: 1 Batch: 13810/20099 (68.71%) Loss: 2.261081 LR: 0.00001386 +[14:57:32] Epoch: 1 Batch: 13811/20099 (68.71%) Loss: 1.998562 LR: 0.00001386 +[14:57:34] Epoch: 1 Batch: 13812/20099 (68.72%) Loss: 2.079758 LR: 0.00001384 +[14:57:35] Epoch: 1 Batch: 13813/20099 (68.72%) Loss: 2.286893 LR: 0.00001384 +[14:57:37] Epoch: 1 Batch: 13814/20099 (68.73%) Loss: 2.151386 LR: 0.00001384 +[14:57:39] Epoch: 1 Batch: 13815/20099 (68.73%) Loss: 2.166206 LR: 0.00001384 +[14:57:41] Epoch: 1 Batch: 13816/20099 (68.74%) Loss: 1.689416 LR: 0.00001384 +[14:57:43] Epoch: 1 Batch: 13817/20099 (68.74%) Loss: 2.278618 LR: 0.00001384 +[14:57:45] Epoch: 1 Batch: 13818/20099 (68.75%) Loss: 2.127134 LR: 0.00001384 +[14:57:47] Epoch: 1 Batch: 13819/20099 (68.75%) Loss: 1.858003 LR: 0.00001383 +[14:57:48] Epoch: 1 Batch: 13820/20099 (68.76%) Loss: 2.344027 LR: 0.00001383 +[14:57:50] Epoch: 1 Batch: 13821/20099 (68.76%) Loss: 1.999196 LR: 0.00001383 +[14:57:52] Epoch: 1 Batch: 13822/20099 (68.77%) Loss: 1.814938 LR: 0.00001383 +[14:57:54] Epoch: 1 Batch: 13823/20099 (68.77%) Loss: 1.959158 LR: 0.00001383 +[14:57:56] Epoch: 1 Batch: 13824/20099 (68.78%) Loss: 2.004296 LR: 0.00001383 +[14:57:58] Epoch: 1 Batch: 13825/20099 (68.78%) Loss: 2.429563 LR: 0.00001383 +[14:58:00] Epoch: 1 Batch: 13826/20099 (68.79%) Loss: 2.287254 LR: 0.00001381 +[14:58:02] Epoch: 1 Batch: 13827/20099 (68.79%) Loss: 2.112763 LR: 0.00001381 +[14:58:03] Epoch: 1 Batch: 13828/20099 (68.80%) Loss: 2.053855 LR: 0.00001381 +[14:58:05] Epoch: 1 Batch: 13829/20099 (68.80%) Loss: 2.175066 LR: 0.00001381 +[14:58:07] Epoch: 1 Batch: 13830/20099 (68.81%) Loss: 2.278041 LR: 0.00001381 +[14:58:09] Epoch: 1 Batch: 13831/20099 (68.81%) Loss: 2.196589 LR: 0.00001381 +[14:58:11] Epoch: 1 Batch: 13832/20099 (68.82%) Loss: 1.957931 LR: 0.00001381 +[14:58:13] Epoch: 1 Batch: 13833/20099 (68.82%) Loss: 1.950059 LR: 0.00001380 +[14:58:15] 
Epoch: 1 Batch: 13834/20099 (68.83%) Loss: 1.985117 LR: 0.00001380 +[14:58:16] Epoch: 1 Batch: 13835/20099 (68.83%) Loss: 2.079342 LR: 0.00001380 +[14:58:18] Epoch: 1 Batch: 13836/20099 (68.84%) Loss: 2.113398 LR: 0.00001380 +[14:58:20] Epoch: 1 Batch: 13837/20099 (68.84%) Loss: 2.116506 LR: 0.00001380 +[14:58:22] Epoch: 1 Batch: 13838/20099 (68.85%) Loss: 2.005578 LR: 0.00001380 +[14:58:24] Epoch: 1 Batch: 13839/20099 (68.85%) Loss: 1.996784 LR: 0.00001380 +[14:58:26] Epoch: 1 Batch: 13840/20099 (68.86%) Loss: 1.978959 LR: 0.00001378 +[14:58:27] Epoch: 1 Batch: 13841/20099 (68.86%) Loss: 2.175993 LR: 0.00001378 +[14:58:29] Epoch: 1 Batch: 13842/20099 (68.87%) Loss: 2.131866 LR: 0.00001378 +[14:58:31] Epoch: 1 Batch: 13843/20099 (68.87%) Loss: 1.964678 LR: 0.00001378 +[14:58:33] Epoch: 1 Batch: 13844/20099 (68.88%) Loss: 2.249523 LR: 0.00001378 +[14:58:35] Epoch: 1 Batch: 13845/20099 (68.88%) Loss: 2.057565 LR: 0.00001378 +[14:58:37] Epoch: 1 Batch: 13846/20099 (68.89%) Loss: 1.956790 LR: 0.00001378 +[14:58:39] Epoch: 1 Batch: 13847/20099 (68.89%) Loss: 1.983870 LR: 0.00001376 +[14:58:40] Epoch: 1 Batch: 13848/20099 (68.90%) Loss: 2.136093 LR: 0.00001376 +[14:58:42] Epoch: 1 Batch: 13849/20099 (68.90%) Loss: 2.017327 LR: 0.00001376 +[14:58:44] Epoch: 1 Batch: 13850/20099 (68.91%) Loss: 1.756735 LR: 0.00001376 +[14:58:46] Epoch: 1 Batch: 13851/20099 (68.91%) Loss: 1.920734 LR: 0.00001376 +[14:58:48] Epoch: 1 Batch: 13852/20099 (68.92%) Loss: 2.206615 LR: 0.00001376 +[14:58:50] Epoch: 1 Batch: 13853/20099 (68.92%) Loss: 1.869215 LR: 0.00001376 +[14:58:51] Epoch: 1 Batch: 13854/20099 (68.93%) Loss: 1.911517 LR: 0.00001375 +[14:58:53] Epoch: 1 Batch: 13855/20099 (68.93%) Loss: 2.006258 LR: 0.00001375 +[14:58:55] Epoch: 1 Batch: 13856/20099 (68.94%) Loss: 2.350784 LR: 0.00001375 +[14:58:57] Epoch: 1 Batch: 13857/20099 (68.94%) Loss: 2.189626 LR: 0.00001375 +[14:58:59] Epoch: 1 Batch: 13858/20099 (68.95%) Loss: 2.003358 LR: 0.00001375 +[14:59:01] Epoch: 1 Batch: 
13859/20099 (68.95%) Loss: 2.038043 LR: 0.00001375 +[14:59:03] Epoch: 1 Batch: 13860/20099 (68.96%) Loss: 2.140952 LR: 0.00001375 +[14:59:04] Epoch: 1 Batch: 13861/20099 (68.96%) Loss: 2.136316 LR: 0.00001373 +[14:59:06] Epoch: 1 Batch: 13862/20099 (68.97%) Loss: 1.955398 LR: 0.00001373 +[14:59:08] Epoch: 1 Batch: 13863/20099 (68.97%) Loss: 2.319066 LR: 0.00001373 +[14:59:10] Epoch: 1 Batch: 13864/20099 (68.98%) Loss: 2.208076 LR: 0.00001373 +[14:59:12] Epoch: 1 Batch: 13865/20099 (68.98%) Loss: 1.976430 LR: 0.00001373 +[14:59:14] Epoch: 1 Batch: 13866/20099 (68.99%) Loss: 2.209623 LR: 0.00001373 +[14:59:15] Epoch: 1 Batch: 13867/20099 (68.99%) Loss: 2.050482 LR: 0.00001373 +[14:59:17] Epoch: 1 Batch: 13868/20099 (69.00%) Loss: 2.206521 LR: 0.00001372 +[14:59:19] Epoch: 1 Batch: 13869/20099 (69.00%) Loss: 2.253762 LR: 0.00001372 +[14:59:21] Epoch: 1 Batch: 13870/20099 (69.01%) Loss: 2.117237 LR: 0.00001372 +[14:59:23] Epoch: 1 Batch: 13871/20099 (69.01%) Loss: 2.280343 LR: 0.00001372 +[14:59:25] Epoch: 1 Batch: 13872/20099 (69.02%) Loss: 1.991402 LR: 0.00001372 +[14:59:26] Epoch: 1 Batch: 13873/20099 (69.02%) Loss: 2.061892 LR: 0.00001372 +[14:59:28] Epoch: 1 Batch: 13874/20099 (69.03%) Loss: 2.153712 LR: 0.00001372 +[14:59:30] Epoch: 1 Batch: 13875/20099 (69.03%) Loss: 2.342072 LR: 0.00001370 +[14:59:32] Epoch: 1 Batch: 13876/20099 (69.04%) Loss: 1.893811 LR: 0.00001370 +[14:59:34] Epoch: 1 Batch: 13877/20099 (69.04%) Loss: 2.078634 LR: 0.00001370 +[14:59:36] Epoch: 1 Batch: 13878/20099 (69.05%) Loss: 1.690002 LR: 0.00001370 +[14:59:38] Epoch: 1 Batch: 13879/20099 (69.05%) Loss: 2.056716 LR: 0.00001370 +[14:59:39] Epoch: 1 Batch: 13880/20099 (69.06%) Loss: 2.348188 LR: 0.00001370 +[14:59:41] Epoch: 1 Batch: 13881/20099 (69.06%) Loss: 2.045840 LR: 0.00001370 +[14:59:43] Epoch: 1 Batch: 13882/20099 (69.07%) Loss: 2.122275 LR: 0.00001369 +[14:59:45] Epoch: 1 Batch: 13883/20099 (69.07%) Loss: 1.711198 LR: 0.00001369 +[14:59:47] Epoch: 1 Batch: 13884/20099 (69.08%) 
Loss: 1.990469 LR: 0.00001369 +[14:59:49] Epoch: 1 Batch: 13885/20099 (69.08%) Loss: 1.910649 LR: 0.00001369 +[14:59:50] Epoch: 1 Batch: 13886/20099 (69.09%) Loss: 2.251243 LR: 0.00001369 +[14:59:52] Epoch: 1 Batch: 13887/20099 (69.09%) Loss: 2.018031 LR: 0.00001369 +[14:59:54] Epoch: 1 Batch: 13888/20099 (69.10%) Loss: 2.193894 LR: 0.00001369 +[14:59:56] Epoch: 1 Batch: 13889/20099 (69.10%) Loss: 2.106102 LR: 0.00001367 +[14:59:58] Epoch: 1 Batch: 13890/20099 (69.11%) Loss: 1.976237 LR: 0.00001367 +[15:00:00] Epoch: 1 Batch: 13891/20099 (69.11%) Loss: 2.154158 LR: 0.00001367 +[15:00:02] Epoch: 1 Batch: 13892/20099 (69.12%) Loss: 2.022938 LR: 0.00001367 +[15:00:03] Epoch: 1 Batch: 13893/20099 (69.12%) Loss: 2.183618 LR: 0.00001367 +[15:00:05] Epoch: 1 Batch: 13894/20099 (69.13%) Loss: 2.245978 LR: 0.00001367 +[15:00:07] Epoch: 1 Batch: 13895/20099 (69.13%) Loss: 2.279955 LR: 0.00001367 +[15:00:09] Epoch: 1 Batch: 13896/20099 (69.14%) Loss: 1.756827 LR: 0.00001366 +[15:00:11] Epoch: 1 Batch: 13897/20099 (69.14%) Loss: 2.220166 LR: 0.00001366 +[15:00:13] Epoch: 1 Batch: 13898/20099 (69.15%) Loss: 2.034070 LR: 0.00001366 +[15:00:15] Epoch: 1 Batch: 13899/20099 (69.15%) Loss: 2.619587 LR: 0.00001366 +[15:00:16] Epoch: 1 Batch: 13900/20099 (69.16%) Loss: 2.011238 LR: 0.00001366 +[15:00:18] Epoch: 1 Batch: 13901/20099 (69.16%) Loss: 2.284074 LR: 0.00001366 +[15:00:20] Epoch: 1 Batch: 13902/20099 (69.17%) Loss: 1.836890 LR: 0.00001366 +[15:00:22] Epoch: 1 Batch: 13903/20099 (69.17%) Loss: 2.065214 LR: 0.00001364 +[15:00:24] Epoch: 1 Batch: 13904/20099 (69.18%) Loss: 1.886945 LR: 0.00001364 +[15:00:26] Epoch: 1 Batch: 13905/20099 (69.18%) Loss: 2.364975 LR: 0.00001364 +[15:00:27] Epoch: 1 Batch: 13906/20099 (69.19%) Loss: 2.410313 LR: 0.00001364 +[15:00:29] Epoch: 1 Batch: 13907/20099 (69.19%) Loss: 2.017542 LR: 0.00001364 +[15:00:31] Epoch: 1 Batch: 13908/20099 (69.20%) Loss: 1.995131 LR: 0.00001364 +[15:00:33] Epoch: 1 Batch: 13909/20099 (69.20%) Loss: 2.106164 LR: 
0.00001364 +[15:00:35] Epoch: 1 Batch: 13910/20099 (69.21%) Loss: 2.051979 LR: 0.00001363 +[15:00:37] Epoch: 1 Batch: 13911/20099 (69.21%) Loss: 1.964107 LR: 0.00001363 +[15:00:39] Epoch: 1 Batch: 13912/20099 (69.22%) Loss: 1.752481 LR: 0.00001363 +[15:00:40] Epoch: 1 Batch: 13913/20099 (69.22%) Loss: 2.281965 LR: 0.00001363 +[15:00:42] Epoch: 1 Batch: 13914/20099 (69.23%) Loss: 2.153474 LR: 0.00001363 +[15:00:44] Epoch: 1 Batch: 13915/20099 (69.23%) Loss: 1.671374 LR: 0.00001363 +[15:00:46] Epoch: 1 Batch: 13916/20099 (69.24%) Loss: 1.830287 LR: 0.00001363 +[15:00:48] Epoch: 1 Batch: 13917/20099 (69.24%) Loss: 1.993374 LR: 0.00001361 +[15:00:50] Epoch: 1 Batch: 13918/20099 (69.25%) Loss: 2.106870 LR: 0.00001361 +[15:00:52] Epoch: 1 Batch: 13919/20099 (69.25%) Loss: 2.113884 LR: 0.00001361 +[15:00:53] Epoch: 1 Batch: 13920/20099 (69.26%) Loss: 1.861474 LR: 0.00001361 +[15:00:55] Epoch: 1 Batch: 13921/20099 (69.26%) Loss: 2.332723 LR: 0.00001361 +[15:00:57] Epoch: 1 Batch: 13922/20099 (69.27%) Loss: 2.108716 LR: 0.00001361 +[15:00:59] Epoch: 1 Batch: 13923/20099 (69.27%) Loss: 2.089876 LR: 0.00001361 +[15:01:01] Epoch: 1 Batch: 13924/20099 (69.28%) Loss: 2.180338 LR: 0.00001360 +[15:01:03] Epoch: 1 Batch: 13925/20099 (69.28%) Loss: 2.046123 LR: 0.00001360 +[15:01:05] Epoch: 1 Batch: 13926/20099 (69.29%) Loss: 1.975247 LR: 0.00001360 +[15:01:06] Epoch: 1 Batch: 13927/20099 (69.29%) Loss: 1.926643 LR: 0.00001360 +[15:01:08] Epoch: 1 Batch: 13928/20099 (69.30%) Loss: 1.974615 LR: 0.00001360 +[15:01:10] Epoch: 1 Batch: 13929/20099 (69.30%) Loss: 1.961394 LR: 0.00001360 +[15:01:12] Epoch: 1 Batch: 13930/20099 (69.31%) Loss: 2.276236 LR: 0.00001360 +[15:01:14] Epoch: 1 Batch: 13931/20099 (69.31%) Loss: 1.943706 LR: 0.00001358 +[15:01:16] Epoch: 1 Batch: 13932/20099 (69.32%) Loss: 2.236912 LR: 0.00001358 +[15:01:18] Epoch: 1 Batch: 13933/20099 (69.32%) Loss: 2.074392 LR: 0.00001358 +[15:01:19] Epoch: 1 Batch: 13934/20099 (69.33%) Loss: 1.899179 LR: 0.00001358 +[15:01:21] 
Epoch: 1 Batch: 13935/20099 (69.33%) Loss: 2.113750 LR: 0.00001358 +[15:01:23] Epoch: 1 Batch: 13936/20099 (69.34%) Loss: 2.259090 LR: 0.00001358 +[15:01:25] Epoch: 1 Batch: 13937/20099 (69.34%) Loss: 2.245796 LR: 0.00001358 +[15:01:27] Epoch: 1 Batch: 13938/20099 (69.35%) Loss: 1.974177 LR: 0.00001357 +[15:01:29] Epoch: 1 Batch: 13939/20099 (69.35%) Loss: 2.020723 LR: 0.00001357 +[15:01:31] Epoch: 1 Batch: 13940/20099 (69.36%) Loss: 1.994613 LR: 0.00001357 +[15:01:32] Epoch: 1 Batch: 13941/20099 (69.36%) Loss: 1.930880 LR: 0.00001357 +[15:01:34] Epoch: 1 Batch: 13942/20099 (69.37%) Loss: 2.120154 LR: 0.00001357 +[15:01:36] Epoch: 1 Batch: 13943/20099 (69.37%) Loss: 1.996972 LR: 0.00001357 +[15:01:38] Epoch: 1 Batch: 13944/20099 (69.38%) Loss: 1.996036 LR: 0.00001357 +[15:01:40] Epoch: 1 Batch: 13945/20099 (69.38%) Loss: 2.282368 LR: 0.00001355 +[15:01:42] Epoch: 1 Batch: 13946/20099 (69.39%) Loss: 2.060913 LR: 0.00001355 +[15:01:44] Epoch: 1 Batch: 13947/20099 (69.39%) Loss: 1.985770 LR: 0.00001355 +[15:01:46] Epoch: 1 Batch: 13948/20099 (69.40%) Loss: 2.014093 LR: 0.00001355 +[15:01:47] Epoch: 1 Batch: 13949/20099 (69.40%) Loss: 2.267414 LR: 0.00001355 +[15:01:49] Epoch: 1 Batch: 13950/20099 (69.41%) Loss: 2.219996 LR: 0.00001355 +[15:01:51] Epoch: 1 Batch: 13951/20099 (69.41%) Loss: 2.029548 LR: 0.00001355 +[15:01:53] Epoch: 1 Batch: 13952/20099 (69.42%) Loss: 1.994523 LR: 0.00001354 +[15:01:55] Epoch: 1 Batch: 13953/20099 (69.42%) Loss: 1.897132 LR: 0.00001354 +[15:01:57] Epoch: 1 Batch: 13954/20099 (69.43%) Loss: 2.203361 LR: 0.00001354 +[15:01:58] Epoch: 1 Batch: 13955/20099 (69.43%) Loss: 2.188126 LR: 0.00001354 +[15:02:00] Epoch: 1 Batch: 13956/20099 (69.44%) Loss: 2.133483 LR: 0.00001354 +[15:02:02] Epoch: 1 Batch: 13957/20099 (69.44%) Loss: 1.927952 LR: 0.00001354 +[15:02:04] Epoch: 1 Batch: 13958/20099 (69.45%) Loss: 1.900508 LR: 0.00001354 +[15:02:06] Epoch: 1 Batch: 13959/20099 (69.45%) Loss: 2.197140 LR: 0.00001352 +[15:02:08] Epoch: 1 Batch: 
13960/20099 (69.46%) Loss: 2.246276 LR: 0.00001352 +[15:02:10] Epoch: 1 Batch: 13961/20099 (69.46%) Loss: 2.024846 LR: 0.00001352 +[15:02:11] Epoch: 1 Batch: 13962/20099 (69.47%) Loss: 2.085478 LR: 0.00001352 +[15:02:13] Epoch: 1 Batch: 13963/20099 (69.47%) Loss: 1.895947 LR: 0.00001352 +[15:02:15] Epoch: 1 Batch: 13964/20099 (69.48%) Loss: 2.219170 LR: 0.00001352 +[15:02:17] Epoch: 1 Batch: 13965/20099 (69.48%) Loss: 2.010260 LR: 0.00001352 +[15:02:19] Epoch: 1 Batch: 13966/20099 (69.49%) Loss: 2.205330 LR: 0.00001351 +[15:02:21] Epoch: 1 Batch: 13967/20099 (69.49%) Loss: 2.051417 LR: 0.00001351 +[15:02:23] Epoch: 1 Batch: 13968/20099 (69.50%) Loss: 2.108162 LR: 0.00001351 +[15:02:24] Epoch: 1 Batch: 13969/20099 (69.50%) Loss: 2.276898 LR: 0.00001351 +[15:02:26] Epoch: 1 Batch: 13970/20099 (69.51%) Loss: 2.188663 LR: 0.00001351 +[15:02:28] Epoch: 1 Batch: 13971/20099 (69.51%) Loss: 1.983019 LR: 0.00001351 +[15:02:30] Epoch: 1 Batch: 13972/20099 (69.52%) Loss: 2.023133 LR: 0.00001351 +[15:02:32] Epoch: 1 Batch: 13973/20099 (69.52%) Loss: 2.302833 LR: 0.00001349 +[15:02:34] Epoch: 1 Batch: 13974/20099 (69.53%) Loss: 1.993385 LR: 0.00001349 +[15:02:36] Epoch: 1 Batch: 13975/20099 (69.53%) Loss: 2.238295 LR: 0.00001349 +[15:02:38] Epoch: 1 Batch: 13976/20099 (69.54%) Loss: 2.270355 LR: 0.00001349 +[15:02:39] Epoch: 1 Batch: 13977/20099 (69.54%) Loss: 2.008026 LR: 0.00001349 +[15:02:41] Epoch: 1 Batch: 13978/20099 (69.55%) Loss: 2.190813 LR: 0.00001349 +[15:02:43] Epoch: 1 Batch: 13979/20099 (69.55%) Loss: 2.022871 LR: 0.00001349 +[15:02:45] Epoch: 1 Batch: 13980/20099 (69.56%) Loss: 2.094099 LR: 0.00001348 +[15:02:47] Epoch: 1 Batch: 13981/20099 (69.56%) Loss: 2.178144 LR: 0.00001348 +[15:02:49] Epoch: 1 Batch: 13982/20099 (69.57%) Loss: 2.188421 LR: 0.00001348 +[15:02:51] Epoch: 1 Batch: 13983/20099 (69.57%) Loss: 1.880559 LR: 0.00001348 +[15:02:52] Epoch: 1 Batch: 13984/20099 (69.58%) Loss: 2.402621 LR: 0.00001348 +[15:02:54] Epoch: 1 Batch: 13985/20099 (69.58%) 
Loss: 1.970962 LR: 0.00001348 +[15:02:56] Epoch: 1 Batch: 13986/20099 (69.59%) Loss: 1.912417 LR: 0.00001348 +[15:02:58] Epoch: 1 Batch: 13987/20099 (69.59%) Loss: 2.249149 LR: 0.00001346 +[15:03:00] Epoch: 1 Batch: 13988/20099 (69.60%) Loss: 1.900277 LR: 0.00001346 +[15:03:02] Epoch: 1 Batch: 13989/20099 (69.60%) Loss: 2.099365 LR: 0.00001346 +[15:03:04] Epoch: 1 Batch: 13990/20099 (69.61%) Loss: 1.963787 LR: 0.00001346 +[15:03:05] Epoch: 1 Batch: 13991/20099 (69.61%) Loss: 2.024497 LR: 0.00001346 +[15:03:07] Epoch: 1 Batch: 13992/20099 (69.62%) Loss: 2.091093 LR: 0.00001346 +[15:03:09] Epoch: 1 Batch: 13993/20099 (69.62%) Loss: 2.018023 LR: 0.00001346 +[15:03:11] Epoch: 1 Batch: 13994/20099 (69.63%) Loss: 2.148852 LR: 0.00001345 +[15:03:13] Epoch: 1 Batch: 13995/20099 (69.63%) Loss: 2.228026 LR: 0.00001345 +[15:03:15] Epoch: 1 Batch: 13996/20099 (69.64%) Loss: 2.393276 LR: 0.00001345 +[15:03:17] Epoch: 1 Batch: 13997/20099 (69.64%) Loss: 1.635229 LR: 0.00001345 +[15:03:18] Epoch: 1 Batch: 13998/20099 (69.65%) Loss: 1.796764 LR: 0.00001345 +[15:03:20] Epoch: 1 Batch: 13999/20099 (69.65%) Loss: 2.049733 LR: 0.00001345 +[15:03:22] >> Evaluating batch 0 +[15:03:23] >> Evaluating batch 1 +[15:03:24] >> Evaluating batch 2 +[15:03:25] >> Evaluating batch 3 +[15:03:26] >> Evaluating batch 4 +[15:03:28] >> Evaluating batch 5 +[15:03:29] >> Evaluating batch 6 +[15:03:30] >> Evaluating batch 7 +[15:03:31] >> Evaluating batch 8 +[15:03:32] >> Evaluating batch 9 +[15:03:33] >> Evaluating batch 10 +[15:03:34] >> Evaluating batch 11 +[15:03:35] >> Evaluating batch 12 +[15:03:36] >> Evaluating batch 13 +[15:03:37] >> Evaluating batch 14 +[15:03:38] >> Evaluating batch 15 +[15:03:39] >> Evaluating batch 16 +[15:03:39] Epoch: 1 Step: 14000/20099 Evaluation: +[15:03:39] Avg Loss Since Last Eval: 2.0831 Val Loss: 2.1546 Validation loss delta: -0.0007 Perplexity: 8.6244 LR: 0.00001345 +[15:03:43] >> Cleaned up old temp checkpoint: epoch1_step12000 +[15:03:43] >> Temp checkpoint 
saved: epoch1_step14000, size: 0.1693 GB +[15:03:47] >> Checkpoint saved: epoch1_step14000, size: 0.1693 GB +[15:03:47] Epoch: 1 Batch: 14000/20099 (69.66%) Loss: 1.908236 LR: 0.00001345 +[15:03:48] Epoch: 1 Batch: 14001/20099 (69.66%) Loss: 2.152014 LR: 0.00001343 +[15:03:50] Epoch: 1 Batch: 14002/20099 (69.67%) Loss: 2.035866 LR: 0.00001343 +[15:03:52] Epoch: 1 Batch: 14003/20099 (69.67%) Loss: 2.169752 LR: 0.00001343 +[15:03:54] Epoch: 1 Batch: 14004/20099 (69.68%) Loss: 2.156988 LR: 0.00001343 +[15:03:56] Epoch: 1 Batch: 14005/20099 (69.68%) Loss: 2.053854 LR: 0.00001343 +[15:03:58] Epoch: 1 Batch: 14006/20099 (69.69%) Loss: 2.015075 LR: 0.00001343 +[15:03:59] Epoch: 1 Batch: 14007/20099 (69.69%) Loss: 2.143739 LR: 0.00001343 +[15:04:01] Epoch: 1 Batch: 14008/20099 (69.70%) Loss: 2.312611 LR: 0.00001342 +[15:04:03] Epoch: 1 Batch: 14009/20099 (69.70%) Loss: 1.889057 LR: 0.00001342 +[15:04:05] Epoch: 1 Batch: 14010/20099 (69.70%) Loss: 2.238124 LR: 0.00001342 +[15:04:07] Epoch: 1 Batch: 14011/20099 (69.71%) Loss: 1.765634 LR: 0.00001342 +[15:04:09] Epoch: 1 Batch: 14012/20099 (69.71%) Loss: 2.169719 LR: 0.00001342 +[15:04:11] Epoch: 1 Batch: 14013/20099 (69.72%) Loss: 2.335338 LR: 0.00001342 +[15:04:13] Epoch: 1 Batch: 14014/20099 (69.72%) Loss: 2.380353 LR: 0.00001342 +[15:04:15] Epoch: 1 Batch: 14015/20099 (69.73%) Loss: 2.028246 LR: 0.00001340 +[15:04:16] Epoch: 1 Batch: 14016/20099 (69.73%) Loss: 2.277111 LR: 0.00001340 +[15:04:18] Epoch: 1 Batch: 14017/20099 (69.74%) Loss: 2.014685 LR: 0.00001340 +[15:04:20] Epoch: 1 Batch: 14018/20099 (69.74%) Loss: 2.137202 LR: 0.00001340 +[15:04:22] Epoch: 1 Batch: 14019/20099 (69.75%) Loss: 2.016282 LR: 0.00001340 +[15:04:24] Epoch: 1 Batch: 14020/20099 (69.75%) Loss: 2.085164 LR: 0.00001340 +[15:04:26] Epoch: 1 Batch: 14021/20099 (69.76%) Loss: 2.132124 LR: 0.00001340 +[15:04:28] Epoch: 1 Batch: 14022/20099 (69.76%) Loss: 2.165897 LR: 0.00001339 +[15:04:30] Epoch: 1 Batch: 14023/20099 (69.77%) Loss: 2.016580 LR: 
0.00001339 +[15:04:32] Epoch: 1 Batch: 14024/20099 (69.77%) Loss: 1.937707 LR: 0.00001339 +[15:04:33] Epoch: 1 Batch: 14025/20099 (69.78%) Loss: 1.870736 LR: 0.00001339 +[15:04:35] Epoch: 1 Batch: 14026/20099 (69.78%) Loss: 1.949037 LR: 0.00001339 +[15:04:37] Epoch: 1 Batch: 14027/20099 (69.79%) Loss: 2.074316 LR: 0.00001339 +[15:04:39] Epoch: 1 Batch: 14028/20099 (69.79%) Loss: 2.092081 LR: 0.00001339 +[15:04:41] Epoch: 1 Batch: 14029/20099 (69.80%) Loss: 2.311319 LR: 0.00001337 +[15:04:43] Epoch: 1 Batch: 14030/20099 (69.80%) Loss: 1.931598 LR: 0.00001337 +[15:04:44] Epoch: 1 Batch: 14031/20099 (69.81%) Loss: 2.239501 LR: 0.00001337 +[15:04:46] Epoch: 1 Batch: 14032/20099 (69.81%) Loss: 2.190092 LR: 0.00001337 +[15:04:48] Epoch: 1 Batch: 14033/20099 (69.82%) Loss: 2.310144 LR: 0.00001337 +[15:04:50] Epoch: 1 Batch: 14034/20099 (69.82%) Loss: 2.056702 LR: 0.00001337 +[15:04:52] Epoch: 1 Batch: 14035/20099 (69.83%) Loss: 1.903889 LR: 0.00001337 +[15:04:54] Epoch: 1 Batch: 14036/20099 (69.83%) Loss: 1.891208 LR: 0.00001336 +[15:04:56] Epoch: 1 Batch: 14037/20099 (69.84%) Loss: 1.909328 LR: 0.00001336 +[15:04:57] Epoch: 1 Batch: 14038/20099 (69.84%) Loss: 2.251498 LR: 0.00001336 +[15:04:59] Epoch: 1 Batch: 14039/20099 (69.85%) Loss: 2.144134 LR: 0.00001336 +[15:05:01] Epoch: 1 Batch: 14040/20099 (69.85%) Loss: 2.473144 LR: 0.00001336 +[15:05:03] Epoch: 1 Batch: 14041/20099 (69.86%) Loss: 2.045105 LR: 0.00001336 +[15:05:05] Epoch: 1 Batch: 14042/20099 (69.86%) Loss: 1.967996 LR: 0.00001336 +[15:05:07] Epoch: 1 Batch: 14043/20099 (69.87%) Loss: 2.129719 LR: 0.00001334 +[15:05:08] Epoch: 1 Batch: 14044/20099 (69.87%) Loss: 2.128107 LR: 0.00001334 +[15:05:10] Epoch: 1 Batch: 14045/20099 (69.88%) Loss: 2.002133 LR: 0.00001334 +[15:05:12] Epoch: 1 Batch: 14046/20099 (69.88%) Loss: 2.099285 LR: 0.00001334 +[15:05:14] Epoch: 1 Batch: 14047/20099 (69.89%) Loss: 2.348757 LR: 0.00001334 +[15:05:16] Epoch: 1 Batch: 14048/20099 (69.89%) Loss: 2.361411 LR: 0.00001334 +[15:05:18] 
Epoch: 1 Batch: 14049/20099 (69.90%) Loss: 2.333449 LR: 0.00001334 +[15:05:20] Epoch: 1 Batch: 14050/20099 (69.90%) Loss: 2.141673 LR: 0.00001333 +[15:05:21] Epoch: 1 Batch: 14051/20099 (69.91%) Loss: 1.997753 LR: 0.00001333 +[15:05:23] Epoch: 1 Batch: 14052/20099 (69.91%) Loss: 2.026626 LR: 0.00001333 +[15:05:25] Epoch: 1 Batch: 14053/20099 (69.92%) Loss: 2.236732 LR: 0.00001333 +[15:05:27] Epoch: 1 Batch: 14054/20099 (69.92%) Loss: 1.996920 LR: 0.00001333 +[15:05:29] Epoch: 1 Batch: 14055/20099 (69.93%) Loss: 2.158952 LR: 0.00001333 +[15:05:31] Epoch: 1 Batch: 14056/20099 (69.93%) Loss: 2.174842 LR: 0.00001333 +[15:05:33] Epoch: 1 Batch: 14057/20099 (69.94%) Loss: 2.160324 LR: 0.00001331 +[15:05:34] Epoch: 1 Batch: 14058/20099 (69.94%) Loss: 1.935305 LR: 0.00001331 +[15:05:36] Epoch: 1 Batch: 14059/20099 (69.95%) Loss: 1.981442 LR: 0.00001331 +[15:05:38] Epoch: 1 Batch: 14060/20099 (69.95%) Loss: 2.069142 LR: 0.00001331 +[15:05:40] Epoch: 1 Batch: 14061/20099 (69.96%) Loss: 2.086893 LR: 0.00001331 +[15:05:42] Epoch: 1 Batch: 14062/20099 (69.96%) Loss: 2.513581 LR: 0.00001331 +[15:05:44] Epoch: 1 Batch: 14063/20099 (69.97%) Loss: 1.899521 LR: 0.00001331 +[15:05:46] Epoch: 1 Batch: 14064/20099 (69.97%) Loss: 1.893383 LR: 0.00001330 +[15:05:48] Epoch: 1 Batch: 14065/20099 (69.98%) Loss: 2.259633 LR: 0.00001330 +[15:05:49] Epoch: 1 Batch: 14066/20099 (69.98%) Loss: 1.960115 LR: 0.00001330 +[15:05:51] Epoch: 1 Batch: 14067/20099 (69.99%) Loss: 2.103124 LR: 0.00001330 +[15:05:53] Epoch: 1 Batch: 14068/20099 (69.99%) Loss: 2.135539 LR: 0.00001330 +[15:05:55] Epoch: 1 Batch: 14069/20099 (70.00%) Loss: 2.063297 LR: 0.00001330 +[15:05:57] Epoch: 1 Batch: 14070/20099 (70.00%) Loss: 2.176017 LR: 0.00001330 +[15:05:59] Epoch: 1 Batch: 14071/20099 (70.01%) Loss: 2.201392 LR: 0.00001328 +[15:06:01] Epoch: 1 Batch: 14072/20099 (70.01%) Loss: 2.330271 LR: 0.00001328 +[15:06:02] Epoch: 1 Batch: 14073/20099 (70.02%) Loss: 2.088560 LR: 0.00001328 +[15:06:04] Epoch: 1 Batch: 
14074/20099 (70.02%) Loss: 2.288650 LR: 0.00001328 +[15:06:06] Epoch: 1 Batch: 14075/20099 (70.03%) Loss: 2.207077 LR: 0.00001328 +[15:06:08] Epoch: 1 Batch: 14076/20099 (70.03%) Loss: 1.758030 LR: 0.00001328 +[15:06:10] Epoch: 1 Batch: 14077/20099 (70.04%) Loss: 2.220818 LR: 0.00001328 +[15:06:12] Epoch: 1 Batch: 14078/20099 (70.04%) Loss: 1.826088 LR: 0.00001327 +[15:06:14] Epoch: 1 Batch: 14079/20099 (70.05%) Loss: 2.040720 LR: 0.00001327 +[15:06:15] Epoch: 1 Batch: 14080/20099 (70.05%) Loss: 2.245892 LR: 0.00001327 +[15:06:17] Epoch: 1 Batch: 14081/20099 (70.06%) Loss: 2.259224 LR: 0.00001327 +[15:06:19] Epoch: 1 Batch: 14082/20099 (70.06%) Loss: 1.787365 LR: 0.00001327 +[15:06:21] Epoch: 1 Batch: 14083/20099 (70.07%) Loss: 2.131996 LR: 0.00001327 +[15:06:23] Epoch: 1 Batch: 14084/20099 (70.07%) Loss: 2.107537 LR: 0.00001327 +[15:06:25] Epoch: 1 Batch: 14085/20099 (70.08%) Loss: 2.117787 LR: 0.00001325 +[15:06:26] Epoch: 1 Batch: 14086/20099 (70.08%) Loss: 2.089099 LR: 0.00001325 +[15:06:28] Epoch: 1 Batch: 14087/20099 (70.09%) Loss: 1.963848 LR: 0.00001325 +[15:06:30] Epoch: 1 Batch: 14088/20099 (70.09%) Loss: 2.116444 LR: 0.00001325 +[15:06:32] Epoch: 1 Batch: 14089/20099 (70.10%) Loss: 1.703026 LR: 0.00001325 +[15:06:34] Epoch: 1 Batch: 14090/20099 (70.10%) Loss: 2.112913 LR: 0.00001325 +[15:06:36] Epoch: 1 Batch: 14091/20099 (70.11%) Loss: 1.945439 LR: 0.00001325 +[15:06:38] Epoch: 1 Batch: 14092/20099 (70.11%) Loss: 2.091409 LR: 0.00001324 +[15:06:39] Epoch: 1 Batch: 14093/20099 (70.12%) Loss: 2.248818 LR: 0.00001324 +[15:06:41] Epoch: 1 Batch: 14094/20099 (70.12%) Loss: 1.714032 LR: 0.00001324 +[15:06:43] Epoch: 1 Batch: 14095/20099 (70.13%) Loss: 1.971220 LR: 0.00001324 +[15:06:45] Epoch: 1 Batch: 14096/20099 (70.13%) Loss: 2.127012 LR: 0.00001324 +[15:06:47] Epoch: 1 Batch: 14097/20099 (70.14%) Loss: 2.014073 LR: 0.00001324 +[15:06:49] Epoch: 1 Batch: 14098/20099 (70.14%) Loss: 1.880741 LR: 0.00001324 +[15:06:51] Epoch: 1 Batch: 14099/20099 (70.15%) 
Loss: 2.292917 LR: 0.00001322 +[15:06:52] Epoch: 1 Batch: 14100/20099 (70.15%) Loss: 2.267997 LR: 0.00001322 +[15:06:54] Epoch: 1 Batch: 14101/20099 (70.16%) Loss: 2.133940 LR: 0.00001322 +[15:06:56] Epoch: 1 Batch: 14102/20099 (70.16%) Loss: 2.089356 LR: 0.00001322 +[15:06:58] Epoch: 1 Batch: 14103/20099 (70.17%) Loss: 2.067074 LR: 0.00001322 +[15:07:00] Epoch: 1 Batch: 14104/20099 (70.17%) Loss: 1.921925 LR: 0.00001322 +[15:07:02] Epoch: 1 Batch: 14105/20099 (70.18%) Loss: 2.198205 LR: 0.00001322 +[15:07:04] Epoch: 1 Batch: 14106/20099 (70.18%) Loss: 2.230950 LR: 0.00001321 +[15:07:05] Epoch: 1 Batch: 14107/20099 (70.19%) Loss: 2.091808 LR: 0.00001321 +[15:07:07] Epoch: 1 Batch: 14108/20099 (70.19%) Loss: 2.194324 LR: 0.00001321 +[15:07:09] Epoch: 1 Batch: 14109/20099 (70.20%) Loss: 2.120229 LR: 0.00001321 +[15:07:11] Epoch: 1 Batch: 14110/20099 (70.20%) Loss: 2.333757 LR: 0.00001321 +[15:07:13] Epoch: 1 Batch: 14111/20099 (70.21%) Loss: 2.429177 LR: 0.00001321 +[15:07:15] Epoch: 1 Batch: 14112/20099 (70.21%) Loss: 1.957863 LR: 0.00001321 +[15:07:16] Epoch: 1 Batch: 14113/20099 (70.22%) Loss: 1.870540 LR: 0.00001319 +[15:07:18] Epoch: 1 Batch: 14114/20099 (70.22%) Loss: 2.317012 LR: 0.00001319 +[15:07:20] Epoch: 1 Batch: 14115/20099 (70.23%) Loss: 1.740619 LR: 0.00001319 +[15:07:22] Epoch: 1 Batch: 14116/20099 (70.23%) Loss: 1.896868 LR: 0.00001319 +[15:07:24] Epoch: 1 Batch: 14117/20099 (70.24%) Loss: 1.537217 LR: 0.00001319 +[15:07:26] Epoch: 1 Batch: 14118/20099 (70.24%) Loss: 2.018700 LR: 0.00001319 +[15:07:28] Epoch: 1 Batch: 14119/20099 (70.25%) Loss: 2.039764 LR: 0.00001319 +[15:07:29] Epoch: 1 Batch: 14120/20099 (70.25%) Loss: 2.313422 LR: 0.00001318 +[15:07:31] Epoch: 1 Batch: 14121/20099 (70.26%) Loss: 2.352668 LR: 0.00001318 +[15:07:33] Epoch: 1 Batch: 14122/20099 (70.26%) Loss: 2.203261 LR: 0.00001318 +[15:07:35] Epoch: 1 Batch: 14123/20099 (70.27%) Loss: 1.856316 LR: 0.00001318 +[15:07:37] Epoch: 1 Batch: 14124/20099 (70.27%) Loss: 1.922009 LR: 
0.00001318 +[15:07:39] Epoch: 1 Batch: 14125/20099 (70.28%) Loss: 2.045663 LR: 0.00001318 +[15:07:41] Epoch: 1 Batch: 14126/20099 (70.28%) Loss: 2.360119 LR: 0.00001318 +[15:07:42] Epoch: 1 Batch: 14127/20099 (70.29%) Loss: 2.649317 LR: 0.00001316 +[15:07:44] Epoch: 1 Batch: 14128/20099 (70.29%) Loss: 2.030398 LR: 0.00001316 +[15:07:46] Epoch: 1 Batch: 14129/20099 (70.30%) Loss: 1.995903 LR: 0.00001316 +[15:07:48] Epoch: 1 Batch: 14130/20099 (70.30%) Loss: 1.951354 LR: 0.00001316 +[15:07:50] Epoch: 1 Batch: 14131/20099 (70.31%) Loss: 2.004289 LR: 0.00001316 +[15:07:52] Epoch: 1 Batch: 14132/20099 (70.31%) Loss: 2.029167 LR: 0.00001316 +[15:07:54] Epoch: 1 Batch: 14133/20099 (70.32%) Loss: 2.048864 LR: 0.00001316 +[15:07:55] Epoch: 1 Batch: 14134/20099 (70.32%) Loss: 2.329615 LR: 0.00001315 +[15:07:57] Epoch: 1 Batch: 14135/20099 (70.33%) Loss: 2.032752 LR: 0.00001315 +[15:07:59] Epoch: 1 Batch: 14136/20099 (70.33%) Loss: 1.876507 LR: 0.00001315 +[15:08:01] Epoch: 1 Batch: 14137/20099 (70.34%) Loss: 2.222470 LR: 0.00001315 +[15:08:03] Epoch: 1 Batch: 14138/20099 (70.34%) Loss: 2.221272 LR: 0.00001315 +[15:08:05] Epoch: 1 Batch: 14139/20099 (70.35%) Loss: 2.333714 LR: 0.00001315 +[15:08:06] Epoch: 1 Batch: 14140/20099 (70.35%) Loss: 2.099627 LR: 0.00001315 +[15:08:08] Epoch: 1 Batch: 14141/20099 (70.36%) Loss: 1.963567 LR: 0.00001313 +[15:08:10] Epoch: 1 Batch: 14142/20099 (70.36%) Loss: 2.235420 LR: 0.00001313 +[15:08:12] Epoch: 1 Batch: 14143/20099 (70.37%) Loss: 1.877184 LR: 0.00001313 +[15:08:14] Epoch: 1 Batch: 14144/20099 (70.37%) Loss: 2.290599 LR: 0.00001313 +[15:08:16] Epoch: 1 Batch: 14145/20099 (70.38%) Loss: 1.808970 LR: 0.00001313 +[15:08:18] Epoch: 1 Batch: 14146/20099 (70.38%) Loss: 1.987428 LR: 0.00001313 +[15:08:19] Epoch: 1 Batch: 14147/20099 (70.39%) Loss: 2.179288 LR: 0.00001313 +[15:08:21] Epoch: 1 Batch: 14148/20099 (70.39%) Loss: 2.232906 LR: 0.00001312 +[15:08:23] Epoch: 1 Batch: 14149/20099 (70.40%) Loss: 2.055832 LR: 0.00001312 +[15:08:25] 
Epoch: 1 Batch: 14150/20099 (70.40%) Loss: 2.206657 LR: 0.00001312 +[15:08:27] Epoch: 1 Batch: 14151/20099 (70.41%) Loss: 1.981790 LR: 0.00001312 +[15:08:29] Epoch: 1 Batch: 14152/20099 (70.41%) Loss: 2.080351 LR: 0.00001312 +[15:08:30] Epoch: 1 Batch: 14153/20099 (70.42%) Loss: 2.257615 LR: 0.00001312 +[15:08:32] Epoch: 1 Batch: 14154/20099 (70.42%) Loss: 2.161017 LR: 0.00001312 +[15:08:34] Epoch: 1 Batch: 14155/20099 (70.43%) Loss: 2.157845 LR: 0.00001310 +[15:08:36] Epoch: 1 Batch: 14156/20099 (70.43%) Loss: 2.169761 LR: 0.00001310 +[15:08:38] Epoch: 1 Batch: 14157/20099 (70.44%) Loss: 2.239894 LR: 0.00001310 +[15:08:40] Epoch: 1 Batch: 14158/20099 (70.44%) Loss: 2.090580 LR: 0.00001310 +[15:08:42] Epoch: 1 Batch: 14159/20099 (70.45%) Loss: 2.004393 LR: 0.00001310 +[15:08:43] Epoch: 1 Batch: 14160/20099 (70.45%) Loss: 2.234771 LR: 0.00001310 +[15:08:45] Epoch: 1 Batch: 14161/20099 (70.46%) Loss: 2.207394 LR: 0.00001310 +[15:08:47] Epoch: 1 Batch: 14162/20099 (70.46%) Loss: 1.983834 LR: 0.00001309 +[15:08:49] Epoch: 1 Batch: 14163/20099 (70.47%) Loss: 1.388820 LR: 0.00001309 +[15:08:51] Epoch: 1 Batch: 14164/20099 (70.47%) Loss: 2.037211 LR: 0.00001309 +[15:08:53] Epoch: 1 Batch: 14165/20099 (70.48%) Loss: 1.953824 LR: 0.00001309 +[15:08:55] Epoch: 1 Batch: 14166/20099 (70.48%) Loss: 1.882253 LR: 0.00001309 +[15:08:56] Epoch: 1 Batch: 14167/20099 (70.49%) Loss: 2.105921 LR: 0.00001309 +[15:08:58] Epoch: 1 Batch: 14168/20099 (70.49%) Loss: 2.074148 LR: 0.00001309 +[15:09:00] Epoch: 1 Batch: 14169/20099 (70.50%) Loss: 2.218823 LR: 0.00001307 +[15:09:02] Epoch: 1 Batch: 14170/20099 (70.50%) Loss: 2.048222 LR: 0.00001307 +[15:09:04] Epoch: 1 Batch: 14171/20099 (70.51%) Loss: 1.949757 LR: 0.00001307 +[15:09:06] Epoch: 1 Batch: 14172/20099 (70.51%) Loss: 2.160639 LR: 0.00001307 +[15:09:08] Epoch: 1 Batch: 14173/20099 (70.52%) Loss: 2.140779 LR: 0.00001307 +[15:09:09] Epoch: 1 Batch: 14174/20099 (70.52%) Loss: 2.154678 LR: 0.00001307 +[15:09:11] Epoch: 1 Batch: 
14175/20099 (70.53%) Loss: 2.054819 LR: 0.00001307 +[15:09:13] Epoch: 1 Batch: 14176/20099 (70.53%) Loss: 1.943763 LR: 0.00001306 +[15:09:15] Epoch: 1 Batch: 14177/20099 (70.54%) Loss: 2.023943 LR: 0.00001306 +[15:09:17] Epoch: 1 Batch: 14178/20099 (70.54%) Loss: 2.128215 LR: 0.00001306 +[15:09:19] Epoch: 1 Batch: 14179/20099 (70.55%) Loss: 2.233530 LR: 0.00001306 +[15:09:20] Epoch: 1 Batch: 14180/20099 (70.55%) Loss: 2.353640 LR: 0.00001306 +[15:09:22] Epoch: 1 Batch: 14181/20099 (70.56%) Loss: 2.047935 LR: 0.00001306 +[15:09:24] Epoch: 1 Batch: 14182/20099 (70.56%) Loss: 2.029359 LR: 0.00001306 +[15:09:26] Epoch: 1 Batch: 14183/20099 (70.57%) Loss: 2.107228 LR: 0.00001304 +[15:09:28] Epoch: 1 Batch: 14184/20099 (70.57%) Loss: 2.579060 LR: 0.00001304 +[15:09:30] Epoch: 1 Batch: 14185/20099 (70.58%) Loss: 2.080436 LR: 0.00001304 +[15:09:32] Epoch: 1 Batch: 14186/20099 (70.58%) Loss: 2.224449 LR: 0.00001304 +[15:09:33] Epoch: 1 Batch: 14187/20099 (70.59%) Loss: 2.035991 LR: 0.00001304 +[15:09:35] Epoch: 1 Batch: 14188/20099 (70.59%) Loss: 1.714143 LR: 0.00001304 +[15:09:37] Epoch: 1 Batch: 14189/20099 (70.60%) Loss: 2.115271 LR: 0.00001304 +[15:09:39] Epoch: 1 Batch: 14190/20099 (70.60%) Loss: 2.251832 LR: 0.00001303 +[15:09:41] Epoch: 1 Batch: 14191/20099 (70.61%) Loss: 2.064791 LR: 0.00001303 +[15:09:43] Epoch: 1 Batch: 14192/20099 (70.61%) Loss: 2.095898 LR: 0.00001303 +[15:09:44] Epoch: 1 Batch: 14193/20099 (70.62%) Loss: 1.999495 LR: 0.00001303 +[15:09:46] Epoch: 1 Batch: 14194/20099 (70.62%) Loss: 1.880523 LR: 0.00001303 +[15:09:48] Epoch: 1 Batch: 14195/20099 (70.63%) Loss: 2.378606 LR: 0.00001303 +[15:09:50] Epoch: 1 Batch: 14196/20099 (70.63%) Loss: 2.050813 LR: 0.00001303 +[15:09:52] Epoch: 1 Batch: 14197/20099 (70.64%) Loss: 2.286471 LR: 0.00001302 +[15:09:54] Epoch: 1 Batch: 14198/20099 (70.64%) Loss: 2.296535 LR: 0.00001302 +[15:09:56] Epoch: 1 Batch: 14199/20099 (70.65%) Loss: 2.148905 LR: 0.00001302 +[15:10:01] >> Cleaned up old temp checkpoint: 
epoch1_step12200 +[15:10:01] >> Temp checkpoint saved: epoch1_step14200, size: 0.1693 GB +[15:10:01] Epoch: 1 Batch: 14200/20099 (70.65%) Loss: 1.778147 LR: 0.00001302 +[15:10:03] Epoch: 1 Batch: 14201/20099 (70.66%) Loss: 2.048033 LR: 0.00001302 +[15:10:05] Epoch: 1 Batch: 14202/20099 (70.66%) Loss: 2.127092 LR: 0.00001302 +[15:10:06] Epoch: 1 Batch: 14203/20099 (70.67%) Loss: 2.125577 LR: 0.00001302 +[15:10:08] Epoch: 1 Batch: 14204/20099 (70.67%) Loss: 1.883456 LR: 0.00001300 +[15:10:10] Epoch: 1 Batch: 14205/20099 (70.68%) Loss: 1.932036 LR: 0.00001300 +[15:10:12] Epoch: 1 Batch: 14206/20099 (70.68%) Loss: 2.118212 LR: 0.00001300 +[15:10:14] Epoch: 1 Batch: 14207/20099 (70.69%) Loss: 1.956492 LR: 0.00001300 +[15:10:16] Epoch: 1 Batch: 14208/20099 (70.69%) Loss: 2.170266 LR: 0.00001300 +[15:10:18] Epoch: 1 Batch: 14209/20099 (70.70%) Loss: 2.233057 LR: 0.00001300 +[15:10:19] Epoch: 1 Batch: 14210/20099 (70.70%) Loss: 2.136323 LR: 0.00001300 +[15:10:21] Epoch: 1 Batch: 14211/20099 (70.71%) Loss: 2.140452 LR: 0.00001299 +[15:10:23] Epoch: 1 Batch: 14212/20099 (70.71%) Loss: 2.169427 LR: 0.00001299 +[15:10:25] Epoch: 1 Batch: 14213/20099 (70.71%) Loss: 1.829173 LR: 0.00001299 +[15:10:27] Epoch: 1 Batch: 14214/20099 (70.72%) Loss: 2.188499 LR: 0.00001299 +[15:10:29] Epoch: 1 Batch: 14215/20099 (70.72%) Loss: 2.253694 LR: 0.00001299 +[15:10:31] Epoch: 1 Batch: 14216/20099 (70.73%) Loss: 2.108526 LR: 0.00001299 +[15:10:33] Epoch: 1 Batch: 14217/20099 (70.73%) Loss: 2.022913 LR: 0.00001299 +[15:10:35] Epoch: 1 Batch: 14218/20099 (70.74%) Loss: 2.097582 LR: 0.00001297 +[15:10:36] Epoch: 1 Batch: 14219/20099 (70.74%) Loss: 1.958154 LR: 0.00001297 +[15:10:38] Epoch: 1 Batch: 14220/20099 (70.75%) Loss: 2.049472 LR: 0.00001297 +[15:10:40] Epoch: 1 Batch: 14221/20099 (70.75%) Loss: 2.116355 LR: 0.00001297 +[15:10:42] Epoch: 1 Batch: 14222/20099 (70.76%) Loss: 2.280712 LR: 0.00001297 +[15:10:44] Epoch: 1 Batch: 14223/20099 (70.76%) Loss: 2.074663 LR: 0.00001297 +[15:10:46] 
Epoch: 1 Batch: 14224/20099 (70.77%) Loss: 2.292792 LR: 0.00001297 +[15:10:48] Epoch: 1 Batch: 14225/20099 (70.77%) Loss: 2.048664 LR: 0.00001296 +[15:10:49] Epoch: 1 Batch: 14226/20099 (70.78%) Loss: 2.391509 LR: 0.00001296 +[15:10:51] Epoch: 1 Batch: 14227/20099 (70.78%) Loss: 2.012646 LR: 0.00001296 +[15:10:53] Epoch: 1 Batch: 14228/20099 (70.79%) Loss: 1.984705 LR: 0.00001296 +[15:10:55] Epoch: 1 Batch: 14229/20099 (70.79%) Loss: 2.099390 LR: 0.00001296 +[15:10:57] Epoch: 1 Batch: 14230/20099 (70.80%) Loss: 2.407599 LR: 0.00001296 +[15:10:59] Epoch: 1 Batch: 14231/20099 (70.80%) Loss: 2.017683 LR: 0.00001296 +[15:11:01] Epoch: 1 Batch: 14232/20099 (70.81%) Loss: 2.152608 LR: 0.00001294 +[15:11:02] Epoch: 1 Batch: 14233/20099 (70.81%) Loss: 2.183143 LR: 0.00001294 +[15:11:04] Epoch: 1 Batch: 14234/20099 (70.82%) Loss: 1.992147 LR: 0.00001294 +[15:11:06] Epoch: 1 Batch: 14235/20099 (70.82%) Loss: 2.051990 LR: 0.00001294 +[15:11:08] Epoch: 1 Batch: 14236/20099 (70.83%) Loss: 2.161864 LR: 0.00001294 +[15:11:10] Epoch: 1 Batch: 14237/20099 (70.83%) Loss: 1.988487 LR: 0.00001294 +[15:11:12] Epoch: 1 Batch: 14238/20099 (70.84%) Loss: 1.962685 LR: 0.00001294 +[15:11:13] Epoch: 1 Batch: 14239/20099 (70.84%) Loss: 2.360046 LR: 0.00001293 +[15:11:15] Epoch: 1 Batch: 14240/20099 (70.85%) Loss: 2.083917 LR: 0.00001293 +[15:11:17] Epoch: 1 Batch: 14241/20099 (70.85%) Loss: 2.101196 LR: 0.00001293 +[15:11:19] Epoch: 1 Batch: 14242/20099 (70.86%) Loss: 1.967901 LR: 0.00001293 +[15:11:21] Epoch: 1 Batch: 14243/20099 (70.86%) Loss: 2.010656 LR: 0.00001293 +[15:11:23] Epoch: 1 Batch: 14244/20099 (70.87%) Loss: 1.901682 LR: 0.00001293 +[15:11:24] Epoch: 1 Batch: 14245/20099 (70.87%) Loss: 2.088339 LR: 0.00001293 +[15:11:26] Epoch: 1 Batch: 14246/20099 (70.88%) Loss: 2.083033 LR: 0.00001291 +[15:11:28] Epoch: 1 Batch: 14247/20099 (70.88%) Loss: 1.934826 LR: 0.00001291 +[15:11:30] Epoch: 1 Batch: 14248/20099 (70.89%) Loss: 2.463536 LR: 0.00001291 +[15:11:32] Epoch: 1 Batch: 
14249/20099 (70.89%) Loss: 1.949071 LR: 0.00001291 +[15:11:34] Epoch: 1 Batch: 14250/20099 (70.90%) Loss: 2.160035 LR: 0.00001291 +[15:11:36] Epoch: 1 Batch: 14251/20099 (70.90%) Loss: 1.906379 LR: 0.00001291 +[15:11:37] Epoch: 1 Batch: 14252/20099 (70.91%) Loss: 2.040885 LR: 0.00001291 +[15:11:39] Epoch: 1 Batch: 14253/20099 (70.91%) Loss: 2.290811 LR: 0.00001290 +[15:11:41] Epoch: 1 Batch: 14254/20099 (70.92%) Loss: 2.044679 LR: 0.00001290 +[15:11:43] Epoch: 1 Batch: 14255/20099 (70.92%) Loss: 2.275305 LR: 0.00001290 +[15:11:45] Epoch: 1 Batch: 14256/20099 (70.93%) Loss: 2.160977 LR: 0.00001290 +[15:11:47] Epoch: 1 Batch: 14257/20099 (70.93%) Loss: 2.095740 LR: 0.00001290 +[15:11:49] Epoch: 1 Batch: 14258/20099 (70.94%) Loss: 2.264572 LR: 0.00001290 +[15:11:50] Epoch: 1 Batch: 14259/20099 (70.94%) Loss: 2.008916 LR: 0.00001290 +[15:11:52] Epoch: 1 Batch: 14260/20099 (70.95%) Loss: 1.997956 LR: 0.00001288 +[15:11:54] Epoch: 1 Batch: 14261/20099 (70.95%) Loss: 2.050543 LR: 0.00001288 +[15:11:56] Epoch: 1 Batch: 14262/20099 (70.96%) Loss: 2.298821 LR: 0.00001288 +[15:11:58] Epoch: 1 Batch: 14263/20099 (70.96%) Loss: 2.020747 LR: 0.00001288 +[15:12:00] Epoch: 1 Batch: 14264/20099 (70.97%) Loss: 2.071627 LR: 0.00001288 +[15:12:02] Epoch: 1 Batch: 14265/20099 (70.97%) Loss: 2.066666 LR: 0.00001288 +[15:12:03] Epoch: 1 Batch: 14266/20099 (70.98%) Loss: 2.313992 LR: 0.00001288 +[15:12:05] Epoch: 1 Batch: 14267/20099 (70.98%) Loss: 2.029183 LR: 0.00001287 +[15:12:07] Epoch: 1 Batch: 14268/20099 (70.99%) Loss: 1.924399 LR: 0.00001287 +[15:12:09] Epoch: 1 Batch: 14269/20099 (70.99%) Loss: 2.206744 LR: 0.00001287 +[15:12:11] Epoch: 1 Batch: 14270/20099 (71.00%) Loss: 2.231679 LR: 0.00001287 +[15:12:13] Epoch: 1 Batch: 14271/20099 (71.00%) Loss: 2.003874 LR: 0.00001287 +[15:12:15] Epoch: 1 Batch: 14272/20099 (71.01%) Loss: 1.761560 LR: 0.00001287 +[15:12:16] Epoch: 1 Batch: 14273/20099 (71.01%) Loss: 2.349317 LR: 0.00001287 +[15:12:18] Epoch: 1 Batch: 14274/20099 (71.02%) 
Loss: 2.185916 LR: 0.00001285 +[15:12:20] Epoch: 1 Batch: 14275/20099 (71.02%) Loss: 1.842154 LR: 0.00001285 +[15:12:22] Epoch: 1 Batch: 14276/20099 (71.03%) Loss: 2.250989 LR: 0.00001285 +[15:12:24] Epoch: 1 Batch: 14277/20099 (71.03%) Loss: 2.146939 LR: 0.00001285 +[15:12:26] Epoch: 1 Batch: 14278/20099 (71.04%) Loss: 2.069284 LR: 0.00001285 +[15:12:27] Epoch: 1 Batch: 14279/20099 (71.04%) Loss: 2.093804 LR: 0.00001285 +[15:12:29] Epoch: 1 Batch: 14280/20099 (71.05%) Loss: 2.183137 LR: 0.00001285 +[15:12:31] Epoch: 1 Batch: 14281/20099 (71.05%) Loss: 2.452441 LR: 0.00001284 +[15:12:33] Epoch: 1 Batch: 14282/20099 (71.06%) Loss: 2.143381 LR: 0.00001284 +[15:12:35] Epoch: 1 Batch: 14283/20099 (71.06%) Loss: 2.092828 LR: 0.00001284 +[15:12:37] Epoch: 1 Batch: 14284/20099 (71.07%) Loss: 1.878299 LR: 0.00001284 +[15:12:39] Epoch: 1 Batch: 14285/20099 (71.07%) Loss: 2.028181 LR: 0.00001284 +[15:12:40] Epoch: 1 Batch: 14286/20099 (71.08%) Loss: 2.117287 LR: 0.00001284 +[15:12:42] Epoch: 1 Batch: 14287/20099 (71.08%) Loss: 2.160241 LR: 0.00001284 +[15:12:44] Epoch: 1 Batch: 14288/20099 (71.09%) Loss: 2.094038 LR: 0.00001282 +[15:12:46] Epoch: 1 Batch: 14289/20099 (71.09%) Loss: 2.236572 LR: 0.00001282 +[15:12:48] Epoch: 1 Batch: 14290/20099 (71.10%) Loss: 2.046491 LR: 0.00001282 +[15:12:50] Epoch: 1 Batch: 14291/20099 (71.10%) Loss: 1.774092 LR: 0.00001282 +[15:12:52] Epoch: 1 Batch: 14292/20099 (71.11%) Loss: 2.364005 LR: 0.00001282 +[15:12:53] Epoch: 1 Batch: 14293/20099 (71.11%) Loss: 1.826458 LR: 0.00001282 +[15:12:55] Epoch: 1 Batch: 14294/20099 (71.12%) Loss: 2.015998 LR: 0.00001282 +[15:12:57] Epoch: 1 Batch: 14295/20099 (71.12%) Loss: 1.851637 LR: 0.00001281 +[15:12:59] Epoch: 1 Batch: 14296/20099 (71.13%) Loss: 2.108362 LR: 0.00001281 +[15:13:01] Epoch: 1 Batch: 14297/20099 (71.13%) Loss: 1.991532 LR: 0.00001281 +[15:13:03] Epoch: 1 Batch: 14298/20099 (71.14%) Loss: 2.288403 LR: 0.00001281 +[15:13:05] Epoch: 1 Batch: 14299/20099 (71.14%) Loss: 2.049167 LR: 
0.00001281 +[15:13:06] Epoch: 1 Batch: 14300/20099 (71.15%) Loss: 1.792877 LR: 0.00001281 +[15:13:08] Epoch: 1 Batch: 14301/20099 (71.15%) Loss: 2.212775 LR: 0.00001281 +[15:13:10] Epoch: 1 Batch: 14302/20099 (71.16%) Loss: 2.052789 LR: 0.00001279 +[15:13:12] Epoch: 1 Batch: 14303/20099 (71.16%) Loss: 2.148552 LR: 0.00001279 +[15:13:14] Epoch: 1 Batch: 14304/20099 (71.17%) Loss: 1.838241 LR: 0.00001279 +[15:13:16] Epoch: 1 Batch: 14305/20099 (71.17%) Loss: 2.006079 LR: 0.00001279 +[15:13:18] Epoch: 1 Batch: 14306/20099 (71.18%) Loss: 2.521352 LR: 0.00001279 +[15:13:19] Epoch: 1 Batch: 14307/20099 (71.18%) Loss: 1.997672 LR: 0.00001279 +[15:13:21] Epoch: 1 Batch: 14308/20099 (71.19%) Loss: 1.998363 LR: 0.00001279 +[15:13:23] Epoch: 1 Batch: 14309/20099 (71.19%) Loss: 2.151805 LR: 0.00001278 +[15:13:25] Epoch: 1 Batch: 14310/20099 (71.20%) Loss: 1.948065 LR: 0.00001278 +[15:13:27] Epoch: 1 Batch: 14311/20099 (71.20%) Loss: 1.877794 LR: 0.00001278 +[15:13:29] Epoch: 1 Batch: 14312/20099 (71.21%) Loss: 2.216712 LR: 0.00001278 +[15:13:31] Epoch: 1 Batch: 14313/20099 (71.21%) Loss: 2.215113 LR: 0.00001278 +[15:13:32] Epoch: 1 Batch: 14314/20099 (71.22%) Loss: 2.139597 LR: 0.00001278 +[15:13:34] Epoch: 1 Batch: 14315/20099 (71.22%) Loss: 2.149871 LR: 0.00001278 +[15:13:36] Epoch: 1 Batch: 14316/20099 (71.23%) Loss: 2.000152 LR: 0.00001277 +[15:13:38] Epoch: 1 Batch: 14317/20099 (71.23%) Loss: 2.012342 LR: 0.00001277 +[15:13:40] Epoch: 1 Batch: 14318/20099 (71.24%) Loss: 1.899079 LR: 0.00001277 +[15:13:42] Epoch: 1 Batch: 14319/20099 (71.24%) Loss: 2.537788 LR: 0.00001277 +[15:13:44] Epoch: 1 Batch: 14320/20099 (71.25%) Loss: 2.433259 LR: 0.00001277 +[15:13:45] Epoch: 1 Batch: 14321/20099 (71.25%) Loss: 2.167090 LR: 0.00001277 +[15:13:47] Epoch: 1 Batch: 14322/20099 (71.26%) Loss: 2.113279 LR: 0.00001277 +[15:13:49] Epoch: 1 Batch: 14323/20099 (71.26%) Loss: 2.197399 LR: 0.00001275 +[15:13:51] Epoch: 1 Batch: 14324/20099 (71.27%) Loss: 2.130492 LR: 0.00001275 +[15:13:53] 
Epoch: 1 Batch: 14325/20099 (71.27%) Loss: 2.047857 LR: 0.00001275 +[15:13:55] Epoch: 1 Batch: 14326/20099 (71.28%) Loss: 2.188257 LR: 0.00001275 +[15:13:57] Epoch: 1 Batch: 14327/20099 (71.28%) Loss: 2.095658 LR: 0.00001275 +[15:13:58] Epoch: 1 Batch: 14328/20099 (71.29%) Loss: 1.766085 LR: 0.00001275 +[15:14:00] Epoch: 1 Batch: 14329/20099 (71.29%) Loss: 1.965010 LR: 0.00001275 +[15:14:02] Epoch: 1 Batch: 14330/20099 (71.30%) Loss: 2.070580 LR: 0.00001274 +[15:14:04] Epoch: 1 Batch: 14331/20099 (71.30%) Loss: 1.739433 LR: 0.00001274 +[15:14:06] Epoch: 1 Batch: 14332/20099 (71.31%) Loss: 2.107355 LR: 0.00001274 +[15:14:08] Epoch: 1 Batch: 14333/20099 (71.31%) Loss: 1.887204 LR: 0.00001274 +[15:14:09] Epoch: 1 Batch: 14334/20099 (71.32%) Loss: 2.030982 LR: 0.00001274 +[15:14:11] Epoch: 1 Batch: 14335/20099 (71.32%) Loss: 2.009320 LR: 0.00001274 +[15:14:13] Epoch: 1 Batch: 14336/20099 (71.33%) Loss: 2.125716 LR: 0.00001274 +[15:14:15] Epoch: 1 Batch: 14337/20099 (71.33%) Loss: 2.191684 LR: 0.00001272 +[15:14:17] Epoch: 1 Batch: 14338/20099 (71.34%) Loss: 1.899550 LR: 0.00001272 +[15:14:19] Epoch: 1 Batch: 14339/20099 (71.34%) Loss: 2.223105 LR: 0.00001272 +[15:14:21] Epoch: 1 Batch: 14340/20099 (71.35%) Loss: 2.197577 LR: 0.00001272 +[15:14:22] Epoch: 1 Batch: 14341/20099 (71.35%) Loss: 2.256318 LR: 0.00001272 +[15:14:24] Epoch: 1 Batch: 14342/20099 (71.36%) Loss: 2.203864 LR: 0.00001272 +[15:14:26] Epoch: 1 Batch: 14343/20099 (71.36%) Loss: 2.034499 LR: 0.00001272 +[15:14:28] Epoch: 1 Batch: 14344/20099 (71.37%) Loss: 2.186865 LR: 0.00001271 +[15:14:30] Epoch: 1 Batch: 14345/20099 (71.37%) Loss: 2.044107 LR: 0.00001271 +[15:14:32] Epoch: 1 Batch: 14346/20099 (71.38%) Loss: 2.170029 LR: 0.00001271 +[15:14:34] Epoch: 1 Batch: 14347/20099 (71.38%) Loss: 2.129327 LR: 0.00001271 +[15:14:36] Epoch: 1 Batch: 14348/20099 (71.39%) Loss: 2.287020 LR: 0.00001271 +[15:14:38] Epoch: 1 Batch: 14349/20099 (71.39%) Loss: 1.977602 LR: 0.00001271 +[15:14:39] Epoch: 1 Batch: 
14350/20099 (71.40%) Loss: 1.947317 LR: 0.00001271 +[15:14:41] Epoch: 1 Batch: 14351/20099 (71.40%) Loss: 2.026627 LR: 0.00001269 +[15:14:43] Epoch: 1 Batch: 14352/20099 (71.41%) Loss: 2.179625 LR: 0.00001269 +[15:14:45] Epoch: 1 Batch: 14353/20099 (71.41%) Loss: 1.953543 LR: 0.00001269 +[15:14:47] Epoch: 1 Batch: 14354/20099 (71.42%) Loss: 2.018035 LR: 0.00001269 +[15:14:49] Epoch: 1 Batch: 14355/20099 (71.42%) Loss: 1.907887 LR: 0.00001269 +[15:14:51] Epoch: 1 Batch: 14356/20099 (71.43%) Loss: 2.303562 LR: 0.00001269 +[15:14:52] Epoch: 1 Batch: 14357/20099 (71.43%) Loss: 2.162208 LR: 0.00001269 +[15:14:54] Epoch: 1 Batch: 14358/20099 (71.44%) Loss: 2.359410 LR: 0.00001268 +[15:14:56] Epoch: 1 Batch: 14359/20099 (71.44%) Loss: 2.038847 LR: 0.00001268 +[15:14:58] Epoch: 1 Batch: 14360/20099 (71.45%) Loss: 2.267715 LR: 0.00001268 +[15:15:00] Epoch: 1 Batch: 14361/20099 (71.45%) Loss: 2.086479 LR: 0.00001268 +[15:15:02] Epoch: 1 Batch: 14362/20099 (71.46%) Loss: 2.276069 LR: 0.00001268 +[15:15:04] Epoch: 1 Batch: 14363/20099 (71.46%) Loss: 2.056138 LR: 0.00001268 +[15:15:05] Epoch: 1 Batch: 14364/20099 (71.47%) Loss: 2.281404 LR: 0.00001268 +[15:15:07] Epoch: 1 Batch: 14365/20099 (71.47%) Loss: 2.120492 LR: 0.00001266 +[15:15:09] Epoch: 1 Batch: 14366/20099 (71.48%) Loss: 2.051313 LR: 0.00001266 +[15:15:11] Epoch: 1 Batch: 14367/20099 (71.48%) Loss: 2.134567 LR: 0.00001266 +[15:15:13] Epoch: 1 Batch: 14368/20099 (71.49%) Loss: 1.789321 LR: 0.00001266 +[15:15:15] Epoch: 1 Batch: 14369/20099 (71.49%) Loss: 1.693445 LR: 0.00001266 +[15:15:17] Epoch: 1 Batch: 14370/20099 (71.50%) Loss: 2.067714 LR: 0.00001266 +[15:15:18] Epoch: 1 Batch: 14371/20099 (71.50%) Loss: 2.123215 LR: 0.00001266 +[15:15:20] Epoch: 1 Batch: 14372/20099 (71.51%) Loss: 1.870459 LR: 0.00001265 +[15:15:22] Epoch: 1 Batch: 14373/20099 (71.51%) Loss: 2.179322 LR: 0.00001265 +[15:15:24] Epoch: 1 Batch: 14374/20099 (71.52%) Loss: 2.337648 LR: 0.00001265 +[15:15:26] Epoch: 1 Batch: 14375/20099 (71.52%) 
Loss: 2.033265 LR: 0.00001265 +[15:15:28] Epoch: 1 Batch: 14376/20099 (71.53%) Loss: 1.904836 LR: 0.00001265 +[15:15:30] Epoch: 1 Batch: 14377/20099 (71.53%) Loss: 2.504828 LR: 0.00001265 +[15:15:31] Epoch: 1 Batch: 14378/20099 (71.54%) Loss: 2.086949 LR: 0.00001265 +[15:15:33] Epoch: 1 Batch: 14379/20099 (71.54%) Loss: 2.122542 LR: 0.00001263 +[15:15:35] Epoch: 1 Batch: 14380/20099 (71.55%) Loss: 2.071714 LR: 0.00001263 +[15:15:37] Epoch: 1 Batch: 14381/20099 (71.55%) Loss: 1.894771 LR: 0.00001263 +[15:15:39] Epoch: 1 Batch: 14382/20099 (71.56%) Loss: 2.319098 LR: 0.00001263 +[15:15:41] Epoch: 1 Batch: 14383/20099 (71.56%) Loss: 2.279652 LR: 0.00001263 +[15:15:42] Epoch: 1 Batch: 14384/20099 (71.57%) Loss: 2.373564 LR: 0.00001263 +[15:15:44] Epoch: 1 Batch: 14385/20099 (71.57%) Loss: 2.100822 LR: 0.00001263 +[15:15:46] Epoch: 1 Batch: 14386/20099 (71.58%) Loss: 1.988576 LR: 0.00001262 +[15:15:48] Epoch: 1 Batch: 14387/20099 (71.58%) Loss: 2.149220 LR: 0.00001262 +[15:15:50] Epoch: 1 Batch: 14388/20099 (71.59%) Loss: 2.156591 LR: 0.00001262 +[15:15:52] Epoch: 1 Batch: 14389/20099 (71.59%) Loss: 2.148806 LR: 0.00001262 +[15:15:54] Epoch: 1 Batch: 14390/20099 (71.60%) Loss: 2.266338 LR: 0.00001262 +[15:15:55] Epoch: 1 Batch: 14391/20099 (71.60%) Loss: 2.061398 LR: 0.00001262 +[15:15:57] Epoch: 1 Batch: 14392/20099 (71.61%) Loss: 1.843570 LR: 0.00001262 +[15:15:59] Epoch: 1 Batch: 14393/20099 (71.61%) Loss: 1.948949 LR: 0.00001261 +[15:16:01] Epoch: 1 Batch: 14394/20099 (71.62%) Loss: 1.814525 LR: 0.00001261 +[15:16:03] Epoch: 1 Batch: 14395/20099 (71.62%) Loss: 2.158657 LR: 0.00001261 +[15:16:05] Epoch: 1 Batch: 14396/20099 (71.63%) Loss: 1.749747 LR: 0.00001261 +[15:16:07] Epoch: 1 Batch: 14397/20099 (71.63%) Loss: 1.921659 LR: 0.00001261 +[15:16:08] Epoch: 1 Batch: 14398/20099 (71.64%) Loss: 2.144074 LR: 0.00001261 +[15:16:10] Epoch: 1 Batch: 14399/20099 (71.64%) Loss: 1.687020 LR: 0.00001261 +[15:16:16] >> Cleaned up old temp checkpoint: epoch1_step12400 
+[15:16:16] >> Temp checkpoint saved: epoch1_step14400, size: 0.1693 GB +[15:16:16] Epoch: 1 Batch: 14400/20099 (71.65%) Loss: 2.206450 LR: 0.00001259 +[15:16:18] Epoch: 1 Batch: 14401/20099 (71.65%) Loss: 2.075752 LR: 0.00001259 +[15:16:20] Epoch: 1 Batch: 14402/20099 (71.66%) Loss: 2.045844 LR: 0.00001259 +[15:16:22] Epoch: 1 Batch: 14403/20099 (71.66%) Loss: 2.143670 LR: 0.00001259 +[15:16:23] Epoch: 1 Batch: 14404/20099 (71.67%) Loss: 1.976221 LR: 0.00001259 +[15:16:25] Epoch: 1 Batch: 14405/20099 (71.67%) Loss: 1.876219 LR: 0.00001259 +[15:16:27] Epoch: 1 Batch: 14406/20099 (71.68%) Loss: 1.975319 LR: 0.00001259 +[15:16:29] Epoch: 1 Batch: 14407/20099 (71.68%) Loss: 2.163534 LR: 0.00001258 +[15:16:31] Epoch: 1 Batch: 14408/20099 (71.69%) Loss: 2.071994 LR: 0.00001258 +[15:16:33] Epoch: 1 Batch: 14409/20099 (71.69%) Loss: 1.934184 LR: 0.00001258 +[15:16:35] Epoch: 1 Batch: 14410/20099 (71.70%) Loss: 1.976240 LR: 0.00001258 +[15:16:36] Epoch: 1 Batch: 14411/20099 (71.70%) Loss: 2.033132 LR: 0.00001258 +[15:16:38] Epoch: 1 Batch: 14412/20099 (71.71%) Loss: 1.987811 LR: 0.00001258 +[15:16:40] Epoch: 1 Batch: 14413/20099 (71.71%) Loss: 2.220178 LR: 0.00001258 +[15:16:42] Epoch: 1 Batch: 14414/20099 (71.72%) Loss: 2.152871 LR: 0.00001256 +[15:16:44] Epoch: 1 Batch: 14415/20099 (71.72%) Loss: 2.332562 LR: 0.00001256 +[15:16:46] Epoch: 1 Batch: 14416/20099 (71.72%) Loss: 2.238581 LR: 0.00001256 +[15:16:48] Epoch: 1 Batch: 14417/20099 (71.73%) Loss: 2.089610 LR: 0.00001256 +[15:16:49] Epoch: 1 Batch: 14418/20099 (71.73%) Loss: 2.253574 LR: 0.00001256 +[15:16:51] Epoch: 1 Batch: 14419/20099 (71.74%) Loss: 1.877119 LR: 0.00001256 +[15:16:53] Epoch: 1 Batch: 14420/20099 (71.74%) Loss: 1.851880 LR: 0.00001256 +[15:16:55] Epoch: 1 Batch: 14421/20099 (71.75%) Loss: 2.165289 LR: 0.00001255 +[15:16:57] Epoch: 1 Batch: 14422/20099 (71.75%) Loss: 2.246713 LR: 0.00001255 +[15:16:59] Epoch: 1 Batch: 14423/20099 (71.76%) Loss: 2.126076 LR: 0.00001255 +[15:17:01] Epoch: 1 Batch: 
14424/20099 (71.76%) Loss: 2.044554 LR: 0.00001255 +[15:17:03] Epoch: 1 Batch: 14425/20099 (71.77%) Loss: 2.324728 LR: 0.00001255 +[15:17:04] Epoch: 1 Batch: 14426/20099 (71.77%) Loss: 1.812767 LR: 0.00001255 +[15:17:06] Epoch: 1 Batch: 14427/20099 (71.78%) Loss: 2.258412 LR: 0.00001255 +[15:17:08] Epoch: 1 Batch: 14428/20099 (71.78%) Loss: 2.184781 LR: 0.00001253 +[15:17:10] Epoch: 1 Batch: 14429/20099 (71.79%) Loss: 2.070400 LR: 0.00001253 +[15:17:12] Epoch: 1 Batch: 14430/20099 (71.79%) Loss: 1.893469 LR: 0.00001253 +[15:17:14] Epoch: 1 Batch: 14431/20099 (71.80%) Loss: 2.042439 LR: 0.00001253 +[15:17:16] Epoch: 1 Batch: 14432/20099 (71.80%) Loss: 1.971487 LR: 0.00001253 +[15:17:17] Epoch: 1 Batch: 14433/20099 (71.81%) Loss: 1.981019 LR: 0.00001253 +[15:17:19] Epoch: 1 Batch: 14434/20099 (71.81%) Loss: 2.101089 LR: 0.00001253 +[15:17:21] Epoch: 1 Batch: 14435/20099 (71.82%) Loss: 1.960382 LR: 0.00001252 +[15:17:23] Epoch: 1 Batch: 14436/20099 (71.82%) Loss: 2.217888 LR: 0.00001252 +[15:17:25] Epoch: 1 Batch: 14437/20099 (71.83%) Loss: 2.130462 LR: 0.00001252 +[15:17:27] Epoch: 1 Batch: 14438/20099 (71.83%) Loss: 2.106613 LR: 0.00001252 +[15:17:28] Epoch: 1 Batch: 14439/20099 (71.84%) Loss: 1.908011 LR: 0.00001252 +[15:17:30] Epoch: 1 Batch: 14440/20099 (71.84%) Loss: 2.220032 LR: 0.00001252 +[15:17:32] Epoch: 1 Batch: 14441/20099 (71.85%) Loss: 1.910267 LR: 0.00001252 +[15:17:34] Epoch: 1 Batch: 14442/20099 (71.85%) Loss: 2.261104 LR: 0.00001250 +[15:17:36] Epoch: 1 Batch: 14443/20099 (71.86%) Loss: 2.085515 LR: 0.00001250 +[15:17:38] Epoch: 1 Batch: 14444/20099 (71.86%) Loss: 1.943610 LR: 0.00001250 +[15:17:40] Epoch: 1 Batch: 14445/20099 (71.87%) Loss: 1.963847 LR: 0.00001250 +[15:17:41] Epoch: 1 Batch: 14446/20099 (71.87%) Loss: 2.349266 LR: 0.00001250 +[15:17:43] Epoch: 1 Batch: 14447/20099 (71.88%) Loss: 2.200140 LR: 0.00001250 +[15:17:45] Epoch: 1 Batch: 14448/20099 (71.88%) Loss: 1.996956 LR: 0.00001250 +[15:17:47] Epoch: 1 Batch: 14449/20099 (71.89%) 
Loss: 2.202397 LR: 0.00001249 +[15:17:49] Epoch: 1 Batch: 14450/20099 (71.89%) Loss: 2.148545 LR: 0.00001249 +[15:17:51] Epoch: 1 Batch: 14451/20099 (71.90%) Loss: 2.126232 LR: 0.00001249 +[15:17:53] Epoch: 1 Batch: 14452/20099 (71.90%) Loss: 2.233553 LR: 0.00001249 +[15:17:54] Epoch: 1 Batch: 14453/20099 (71.91%) Loss: 2.125915 LR: 0.00001249 +[15:17:56] Epoch: 1 Batch: 14454/20099 (71.91%) Loss: 2.064939 LR: 0.00001249 +[15:17:58] Epoch: 1 Batch: 14455/20099 (71.92%) Loss: 1.873284 LR: 0.00001249 +[15:18:00] Epoch: 1 Batch: 14456/20099 (71.92%) Loss: 2.141307 LR: 0.00001247 +[15:18:02] Epoch: 1 Batch: 14457/20099 (71.93%) Loss: 2.360401 LR: 0.00001247 +[15:18:04] Epoch: 1 Batch: 14458/20099 (71.93%) Loss: 2.310028 LR: 0.00001247 +[15:18:05] Epoch: 1 Batch: 14459/20099 (71.94%) Loss: 2.637458 LR: 0.00001247 +[15:18:07] Epoch: 1 Batch: 14460/20099 (71.94%) Loss: 2.149997 LR: 0.00001247 +[15:18:09] Epoch: 1 Batch: 14461/20099 (71.95%) Loss: 2.183871 LR: 0.00001247 +[15:18:11] Epoch: 1 Batch: 14462/20099 (71.95%) Loss: 1.826224 LR: 0.00001247 +[15:18:13] Epoch: 1 Batch: 14463/20099 (71.96%) Loss: 1.833122 LR: 0.00001246 +[15:18:15] Epoch: 1 Batch: 14464/20099 (71.96%) Loss: 2.202696 LR: 0.00001246 +[15:18:17] Epoch: 1 Batch: 14465/20099 (71.97%) Loss: 1.819483 LR: 0.00001246 +[15:18:18] Epoch: 1 Batch: 14466/20099 (71.97%) Loss: 2.076388 LR: 0.00001246 +[15:18:20] Epoch: 1 Batch: 14467/20099 (71.98%) Loss: 1.945291 LR: 0.00001246 +[15:18:22] Epoch: 1 Batch: 14468/20099 (71.98%) Loss: 2.335364 LR: 0.00001246 +[15:18:24] Epoch: 1 Batch: 14469/20099 (71.99%) Loss: 2.198541 LR: 0.00001246 +[15:18:26] Epoch: 1 Batch: 14470/20099 (71.99%) Loss: 2.079492 LR: 0.00001245 +[15:18:28] Epoch: 1 Batch: 14471/20099 (72.00%) Loss: 2.233108 LR: 0.00001245 +[15:18:29] Epoch: 1 Batch: 14472/20099 (72.00%) Loss: 1.769062 LR: 0.00001245 +[15:18:31] Epoch: 1 Batch: 14473/20099 (72.01%) Loss: 2.417563 LR: 0.00001245 +[15:18:33] Epoch: 1 Batch: 14474/20099 (72.01%) Loss: 2.011116 LR: 
0.00001245 +[15:18:35] Epoch: 1 Batch: 14475/20099 (72.02%) Loss: 2.102329 LR: 0.00001245 +[15:18:37] Epoch: 1 Batch: 14476/20099 (72.02%) Loss: 2.167774 LR: 0.00001245 +[15:18:39] Epoch: 1 Batch: 14477/20099 (72.03%) Loss: 1.933705 LR: 0.00001243 +[15:18:41] Epoch: 1 Batch: 14478/20099 (72.03%) Loss: 2.047865 LR: 0.00001243 +[15:18:42] Epoch: 1 Batch: 14479/20099 (72.04%) Loss: 2.353516 LR: 0.00001243 +[15:18:44] Epoch: 1 Batch: 14480/20099 (72.04%) Loss: 2.319725 LR: 0.00001243 +[15:18:46] Epoch: 1 Batch: 14481/20099 (72.05%) Loss: 1.982674 LR: 0.00001243 +[15:18:48] Epoch: 1 Batch: 14482/20099 (72.05%) Loss: 1.962979 LR: 0.00001243 +[15:18:50] Epoch: 1 Batch: 14483/20099 (72.06%) Loss: 1.914790 LR: 0.00001243 +[15:18:52] Epoch: 1 Batch: 14484/20099 (72.06%) Loss: 2.024591 LR: 0.00001242 +[15:18:54] Epoch: 1 Batch: 14485/20099 (72.07%) Loss: 2.169400 LR: 0.00001242 +[15:18:55] Epoch: 1 Batch: 14486/20099 (72.07%) Loss: 2.048958 LR: 0.00001242 +[15:18:57] Epoch: 1 Batch: 14487/20099 (72.08%) Loss: 2.480508 LR: 0.00001242 +[15:18:59] Epoch: 1 Batch: 14488/20099 (72.08%) Loss: 2.060916 LR: 0.00001242 +[15:19:01] Epoch: 1 Batch: 14489/20099 (72.09%) Loss: 2.125001 LR: 0.00001242 +[15:19:03] Epoch: 1 Batch: 14490/20099 (72.09%) Loss: 1.822491 LR: 0.00001242 +[15:19:05] Epoch: 1 Batch: 14491/20099 (72.10%) Loss: 1.838583 LR: 0.00001240 +[15:19:07] Epoch: 1 Batch: 14492/20099 (72.10%) Loss: 2.015318 LR: 0.00001240 +[15:19:09] Epoch: 1 Batch: 14493/20099 (72.11%) Loss: 2.206692 LR: 0.00001240 +[15:19:10] Epoch: 1 Batch: 14494/20099 (72.11%) Loss: 2.215018 LR: 0.00001240 +[15:19:12] Epoch: 1 Batch: 14495/20099 (72.12%) Loss: 2.295737 LR: 0.00001240 +[15:19:14] Epoch: 1 Batch: 14496/20099 (72.12%) Loss: 2.448688 LR: 0.00001240 +[15:19:16] Epoch: 1 Batch: 14497/20099 (72.13%) Loss: 2.364935 LR: 0.00001240 +[15:19:18] Epoch: 1 Batch: 14498/20099 (72.13%) Loss: 2.018556 LR: 0.00001239 +[15:19:20] Epoch: 1 Batch: 14499/20099 (72.14%) Loss: 1.936328 LR: 0.00001239 +[15:19:22] 
>> Evaluating batch 0 +[15:19:23] >> Evaluating batch 1 +[15:19:24] >> Evaluating batch 2 +[15:19:25] >> Evaluating batch 3 +[15:19:26] >> Evaluating batch 4 +[15:19:27] >> Evaluating batch 5 +[15:19:28] >> Evaluating batch 6 +[15:19:29] >> Evaluating batch 7 +[15:19:30] >> Evaluating batch 8 +[15:19:31] >> Evaluating batch 9 +[15:19:32] >> Evaluating batch 10 +[15:19:33] >> Evaluating batch 11 +[15:19:34] >> Evaluating batch 12 +[15:19:35] >> Evaluating batch 13 +[15:19:36] >> Evaluating batch 14 +[15:19:37] >> Evaluating batch 15 +[15:19:38] >> Evaluating batch 16 +[15:19:39] Epoch: 1 Step: 14500/20099 Evaluation: +[15:19:39] Avg Loss Since Last Eval: 2.0950 Val Loss: 2.1510 Validation loss delta: -0.0036 Perplexity: 8.5937 LR: 0.00001239 +[15:19:43] >> Checkpoint saved: epoch1_step14500, size: 0.1693 GB +[15:19:43] Epoch: 1 Batch: 14500/20099 (72.14%) Loss: 1.991810 LR: 0.00001239 +[15:19:44] Epoch: 1 Batch: 14501/20099 (72.15%) Loss: 2.164187 LR: 0.00001239 +[15:19:46] Epoch: 1 Batch: 14502/20099 (72.15%) Loss: 2.163255 LR: 0.00001239 +[15:19:48] Epoch: 1 Batch: 14503/20099 (72.16%) Loss: 2.463665 LR: 0.00001239 +[15:19:50] Epoch: 1 Batch: 14504/20099 (72.16%) Loss: 2.165258 LR: 0.00001239 +[15:19:52] Epoch: 1 Batch: 14505/20099 (72.17%) Loss: 2.323009 LR: 0.00001237 +[15:19:54] Epoch: 1 Batch: 14506/20099 (72.17%) Loss: 2.243193 LR: 0.00001237 +[15:19:56] Epoch: 1 Batch: 14507/20099 (72.18%) Loss: 1.835574 LR: 0.00001237 +[15:19:57] Epoch: 1 Batch: 14508/20099 (72.18%) Loss: 1.756479 LR: 0.00001237 +[15:19:59] Epoch: 1 Batch: 14509/20099 (72.19%) Loss: 1.705734 LR: 0.00001237 +[15:20:01] Epoch: 1 Batch: 14510/20099 (72.19%) Loss: 2.187381 LR: 0.00001237 +[15:20:03] Epoch: 1 Batch: 14511/20099 (72.20%) Loss: 2.145376 LR: 0.00001237 +[15:20:05] Epoch: 1 Batch: 14512/20099 (72.20%) Loss: 1.882414 LR: 0.00001236 +[15:20:07] Epoch: 1 Batch: 14513/20099 (72.21%) Loss: 2.215072 LR: 0.00001236 +[15:20:09] Epoch: 1 Batch: 14514/20099 (72.21%) Loss: 1.993399 LR:
0.00001236 +[15:20:10] Epoch: 1 Batch: 14515/20099 (72.22%) Loss: 1.847801 LR: 0.00001236 +[15:20:12] Epoch: 1 Batch: 14516/20099 (72.22%) Loss: 2.180828 LR: 0.00001236 +[15:20:14] Epoch: 1 Batch: 14517/20099 (72.23%) Loss: 2.125284 LR: 0.00001236 +[15:20:16] Epoch: 1 Batch: 14518/20099 (72.23%) Loss: 2.132117 LR: 0.00001236 +[15:20:18] Epoch: 1 Batch: 14519/20099 (72.24%) Loss: 2.158510 LR: 0.00001235 +[15:20:20] Epoch: 1 Batch: 14520/20099 (72.24%) Loss: 2.063774 LR: 0.00001235 +[15:20:22] Epoch: 1 Batch: 14521/20099 (72.25%) Loss: 2.124070 LR: 0.00001235 +[15:20:24] Epoch: 1 Batch: 14522/20099 (72.25%) Loss: 2.018030 LR: 0.00001235 +[15:20:25] Epoch: 1 Batch: 14523/20099 (72.26%) Loss: 2.184516 LR: 0.00001235 +[15:20:27] Epoch: 1 Batch: 14524/20099 (72.26%) Loss: 2.082471 LR: 0.00001235 +[15:20:29] Epoch: 1 Batch: 14525/20099 (72.27%) Loss: 2.160716 LR: 0.00001235 +[15:20:31] Epoch: 1 Batch: 14526/20099 (72.27%) Loss: 2.116430 LR: 0.00001233 +[15:20:33] Epoch: 1 Batch: 14527/20099 (72.28%) Loss: 1.868672 LR: 0.00001233 +[15:20:35] Epoch: 1 Batch: 14528/20099 (72.28%) Loss: 1.933285 LR: 0.00001233 +[15:20:37] Epoch: 1 Batch: 14529/20099 (72.29%) Loss: 2.187509 LR: 0.00001233 +[15:20:38] Epoch: 1 Batch: 14530/20099 (72.29%) Loss: 2.055129 LR: 0.00001233 +[15:20:40] Epoch: 1 Batch: 14531/20099 (72.30%) Loss: 2.005318 LR: 0.00001233 +[15:20:42] Epoch: 1 Batch: 14532/20099 (72.30%) Loss: 2.230756 LR: 0.00001233 +[15:20:44] Epoch: 1 Batch: 14533/20099 (72.31%) Loss: 1.957039 LR: 0.00001232 +[15:20:46] Epoch: 1 Batch: 14534/20099 (72.31%) Loss: 2.161413 LR: 0.00001232 +[15:20:48] Epoch: 1 Batch: 14535/20099 (72.32%) Loss: 1.921613 LR: 0.00001232 +[15:20:50] Epoch: 1 Batch: 14536/20099 (72.32%) Loss: 1.986036 LR: 0.00001232 +[15:20:51] Epoch: 1 Batch: 14537/20099 (72.33%) Loss: 2.228891 LR: 0.00001232 +[15:20:53] Epoch: 1 Batch: 14538/20099 (72.33%) Loss: 1.871427 LR: 0.00001232 +[15:20:55] Epoch: 1 Batch: 14539/20099 (72.34%) Loss: 2.155831 LR: 0.00001232 +[15:20:57] 
Epoch: 1 Batch: 14540/20099 (72.34%) Loss: 2.498863 LR: 0.00001230 +[15:20:59] Epoch: 1 Batch: 14541/20099 (72.35%) Loss: 1.964395 LR: 0.00001230 +[15:21:01] Epoch: 1 Batch: 14542/20099 (72.35%) Loss: 2.005301 LR: 0.00001230 +[15:21:03] Epoch: 1 Batch: 14543/20099 (72.36%) Loss: 2.308398 LR: 0.00001230 +[15:21:04] Epoch: 1 Batch: 14544/20099 (72.36%) Loss: 2.098806 LR: 0.00001230 +[15:21:06] Epoch: 1 Batch: 14545/20099 (72.37%) Loss: 1.818506 LR: 0.00001230 +[15:21:08] Epoch: 1 Batch: 14546/20099 (72.37%) Loss: 2.287256 LR: 0.00001230 +[15:21:10] Epoch: 1 Batch: 14547/20099 (72.38%) Loss: 2.301476 LR: 0.00001229 +[15:21:12] Epoch: 1 Batch: 14548/20099 (72.38%) Loss: 1.800382 LR: 0.00001229 +[15:21:14] Epoch: 1 Batch: 14549/20099 (72.39%) Loss: 2.378016 LR: 0.00001229 +[15:21:15] Epoch: 1 Batch: 14550/20099 (72.39%) Loss: 2.083975 LR: 0.00001229 +[15:21:17] Epoch: 1 Batch: 14551/20099 (72.40%) Loss: 2.112379 LR: 0.00001229 +[15:21:19] Epoch: 1 Batch: 14552/20099 (72.40%) Loss: 2.409041 LR: 0.00001229 +[15:21:21] Epoch: 1 Batch: 14553/20099 (72.41%) Loss: 2.221600 LR: 0.00001229 +[15:21:23] Epoch: 1 Batch: 14554/20099 (72.41%) Loss: 1.893570 LR: 0.00001227 +[15:21:25] Epoch: 1 Batch: 14555/20099 (72.42%) Loss: 2.271136 LR: 0.00001227 +[15:21:27] Epoch: 1 Batch: 14556/20099 (72.42%) Loss: 2.220961 LR: 0.00001227 +[15:21:28] Epoch: 1 Batch: 14557/20099 (72.43%) Loss: 2.177417 LR: 0.00001227 +[15:21:30] Epoch: 1 Batch: 14558/20099 (72.43%) Loss: 2.383300 LR: 0.00001227 +[15:21:32] Epoch: 1 Batch: 14559/20099 (72.44%) Loss: 2.106506 LR: 0.00001227 +[15:21:34] Epoch: 1 Batch: 14560/20099 (72.44%) Loss: 2.212078 LR: 0.00001227 +[15:21:36] Epoch: 1 Batch: 14561/20099 (72.45%) Loss: 2.100115 LR: 0.00001226 +[15:21:38] Epoch: 1 Batch: 14562/20099 (72.45%) Loss: 2.168281 LR: 0.00001226 +[15:21:40] Epoch: 1 Batch: 14563/20099 (72.46%) Loss: 2.038933 LR: 0.00001226 +[15:21:41] Epoch: 1 Batch: 14564/20099 (72.46%) Loss: 2.212020 LR: 0.00001226 +[15:21:43] Epoch: 1 Batch: 
14565/20099 (72.47%) Loss: 1.901061 LR: 0.00001226 +[15:21:45] Epoch: 1 Batch: 14566/20099 (72.47%) Loss: 2.051090 LR: 0.00001226 +[15:21:47] Epoch: 1 Batch: 14567/20099 (72.48%) Loss: 2.046336 LR: 0.00001226 +[15:21:49] Epoch: 1 Batch: 14568/20099 (72.48%) Loss: 2.211973 LR: 0.00001225 +[15:21:51] Epoch: 1 Batch: 14569/20099 (72.49%) Loss: 2.412043 LR: 0.00001225 +[15:21:53] Epoch: 1 Batch: 14570/20099 (72.49%) Loss: 1.976097 LR: 0.00001225 +[15:21:54] Epoch: 1 Batch: 14571/20099 (72.50%) Loss: 1.640873 LR: 0.00001225 +[15:21:56] Epoch: 1 Batch: 14572/20099 (72.50%) Loss: 1.940375 LR: 0.00001225 +[15:21:58] Epoch: 1 Batch: 14573/20099 (72.51%) Loss: 1.804585 LR: 0.00001225 +[15:22:00] Epoch: 1 Batch: 14574/20099 (72.51%) Loss: 1.994008 LR: 0.00001225 +[15:22:02] Epoch: 1 Batch: 14575/20099 (72.52%) Loss: 2.144055 LR: 0.00001223 +[15:22:04] Epoch: 1 Batch: 14576/20099 (72.52%) Loss: 1.995118 LR: 0.00001223 +[15:22:06] Epoch: 1 Batch: 14577/20099 (72.53%) Loss: 2.275714 LR: 0.00001223 +[15:22:07] Epoch: 1 Batch: 14578/20099 (72.53%) Loss: 1.918792 LR: 0.00001223 +[15:22:09] Epoch: 1 Batch: 14579/20099 (72.54%) Loss: 2.120080 LR: 0.00001223 +[15:22:11] Epoch: 1 Batch: 14580/20099 (72.54%) Loss: 1.992417 LR: 0.00001223 +[15:22:13] Epoch: 1 Batch: 14581/20099 (72.55%) Loss: 2.131125 LR: 0.00001223 +[15:22:15] Epoch: 1 Batch: 14582/20099 (72.55%) Loss: 2.312051 LR: 0.00001222 +[15:22:17] Epoch: 1 Batch: 14583/20099 (72.56%) Loss: 2.062566 LR: 0.00001222 +[15:22:19] Epoch: 1 Batch: 14584/20099 (72.56%) Loss: 1.993074 LR: 0.00001222 +[15:22:20] Epoch: 1 Batch: 14585/20099 (72.57%) Loss: 2.236779 LR: 0.00001222 +[15:22:22] Epoch: 1 Batch: 14586/20099 (72.57%) Loss: 2.443809 LR: 0.00001222 +[15:22:24] Epoch: 1 Batch: 14587/20099 (72.58%) Loss: 2.076302 LR: 0.00001222 +[15:22:26] Epoch: 1 Batch: 14588/20099 (72.58%) Loss: 2.030118 LR: 0.00001222 +[15:22:28] Epoch: 1 Batch: 14589/20099 (72.59%) Loss: 1.984947 LR: 0.00001220 +[15:22:30] Epoch: 1 Batch: 14590/20099 (72.59%) 
Loss: 2.191644 LR: 0.00001220 +[15:22:32] Epoch: 1 Batch: 14591/20099 (72.60%) Loss: 2.212530 LR: 0.00001220 +[15:22:33] Epoch: 1 Batch: 14592/20099 (72.60%) Loss: 2.199967 LR: 0.00001220 +[15:22:35] Epoch: 1 Batch: 14593/20099 (72.61%) Loss: 2.130109 LR: 0.00001220 +[15:22:37] Epoch: 1 Batch: 14594/20099 (72.61%) Loss: 1.889025 LR: 0.00001220 +[15:22:39] Epoch: 1 Batch: 14595/20099 (72.62%) Loss: 2.228910 LR: 0.00001220 +[15:22:41] Epoch: 1 Batch: 14596/20099 (72.62%) Loss: 2.041878 LR: 0.00001219 +[15:22:43] Epoch: 1 Batch: 14597/20099 (72.63%) Loss: 1.977821 LR: 0.00001219 +[15:22:45] Epoch: 1 Batch: 14598/20099 (72.63%) Loss: 2.112341 LR: 0.00001219 +[15:22:46] Epoch: 1 Batch: 14599/20099 (72.64%) Loss: 2.205645 LR: 0.00001219 +[15:22:52] >> Cleaned up old temp checkpoint: epoch1_step12600 +[15:22:52] >> Temp checkpoint saved: epoch1_step14600, size: 0.1693 GB +[15:22:52] Epoch: 1 Batch: 14600/20099 (72.64%) Loss: 1.845559 LR: 0.00001219 +[15:22:54] Epoch: 1 Batch: 14601/20099 (72.65%) Loss: 2.057917 LR: 0.00001219 +[15:22:56] Epoch: 1 Batch: 14602/20099 (72.65%) Loss: 2.533642 LR: 0.00001219 +[15:22:57] Epoch: 1 Batch: 14603/20099 (72.66%) Loss: 2.032584 LR: 0.00001217 +[15:22:59] Epoch: 1 Batch: 14604/20099 (72.66%) Loss: 2.102175 LR: 0.00001217 +[15:23:01] Epoch: 1 Batch: 14605/20099 (72.67%) Loss: 2.207043 LR: 0.00001217 +[15:23:03] Epoch: 1 Batch: 14606/20099 (72.67%) Loss: 1.926958 LR: 0.00001217 +[15:23:05] Epoch: 1 Batch: 14607/20099 (72.68%) Loss: 1.838336 LR: 0.00001217 +[15:23:07] Epoch: 1 Batch: 14608/20099 (72.68%) Loss: 1.911971 LR: 0.00001217 +[15:23:09] Epoch: 1 Batch: 14609/20099 (72.69%) Loss: 1.884308 LR: 0.00001217 +[15:23:10] Epoch: 1 Batch: 14610/20099 (72.69%) Loss: 1.935245 LR: 0.00001216 +[15:23:12] Epoch: 1 Batch: 14611/20099 (72.70%) Loss: 1.943741 LR: 0.00001216 +[15:23:14] Epoch: 1 Batch: 14612/20099 (72.70%) Loss: 2.016637 LR: 0.00001216 +[15:23:16] Epoch: 1 Batch: 14613/20099 (72.71%) Loss: 1.950716 LR: 0.00001216 +[15:23:18] 
Epoch: 1 Batch: 14614/20099 (72.71%) Loss: 2.190831 LR: 0.00001216 +[15:23:20] Epoch: 1 Batch: 14615/20099 (72.72%) Loss: 1.879681 LR: 0.00001216 +[15:23:22] Epoch: 1 Batch: 14616/20099 (72.72%) Loss: 2.331791 LR: 0.00001216 +[15:23:23] Epoch: 1 Batch: 14617/20099 (72.73%) Loss: 2.205222 LR: 0.00001215 +[15:23:25] Epoch: 1 Batch: 14618/20099 (72.73%) Loss: 2.156131 LR: 0.00001215 +[15:23:27] Epoch: 1 Batch: 14619/20099 (72.73%) Loss: 1.859800 LR: 0.00001215 +[15:23:29] Epoch: 1 Batch: 14620/20099 (72.74%) Loss: 2.412878 LR: 0.00001215 +[15:23:31] Epoch: 1 Batch: 14621/20099 (72.74%) Loss: 2.234488 LR: 0.00001215 +[15:23:33] Epoch: 1 Batch: 14622/20099 (72.75%) Loss: 2.142592 LR: 0.00001215 +[15:23:35] Epoch: 1 Batch: 14623/20099 (72.75%) Loss: 1.815431 LR: 0.00001215 +[15:23:37] Epoch: 1 Batch: 14624/20099 (72.76%) Loss: 2.023240 LR: 0.00001213 +[15:23:38] Epoch: 1 Batch: 14625/20099 (72.76%) Loss: 2.063927 LR: 0.00001213 +[15:23:40] Epoch: 1 Batch: 14626/20099 (72.77%) Loss: 1.755138 LR: 0.00001213 +[15:23:42] Epoch: 1 Batch: 14627/20099 (72.77%) Loss: 2.143165 LR: 0.00001213 +[15:23:44] Epoch: 1 Batch: 14628/20099 (72.78%) Loss: 2.154861 LR: 0.00001213 +[15:23:46] Epoch: 1 Batch: 14629/20099 (72.78%) Loss: 2.120452 LR: 0.00001213 +[15:23:48] Epoch: 1 Batch: 14630/20099 (72.79%) Loss: 1.985712 LR: 0.00001213 +[15:23:50] Epoch: 1 Batch: 14631/20099 (72.79%) Loss: 2.323375 LR: 0.00001212 +[15:23:51] Epoch: 1 Batch: 14632/20099 (72.80%) Loss: 1.974238 LR: 0.00001212 +[15:23:53] Epoch: 1 Batch: 14633/20099 (72.80%) Loss: 2.069423 LR: 0.00001212 +[15:23:55] Epoch: 1 Batch: 14634/20099 (72.81%) Loss: 1.919914 LR: 0.00001212 +[15:23:57] Epoch: 1 Batch: 14635/20099 (72.81%) Loss: 2.212505 LR: 0.00001212 +[15:23:59] Epoch: 1 Batch: 14636/20099 (72.82%) Loss: 1.883612 LR: 0.00001212 +[15:24:01] Epoch: 1 Batch: 14637/20099 (72.82%) Loss: 2.042296 LR: 0.00001212 +[15:24:03] Epoch: 1 Batch: 14638/20099 (72.83%) Loss: 2.043594 LR: 0.00001210 +[15:24:04] Epoch: 1 Batch: 
14639/20099 (72.83%) Loss: 1.896118 LR: 0.00001210 +[15:24:06] Epoch: 1 Batch: 14640/20099 (72.84%) Loss: 1.933711 LR: 0.00001210 +[15:24:08] Epoch: 1 Batch: 14641/20099 (72.84%) Loss: 1.864973 LR: 0.00001210 +[15:24:10] Epoch: 1 Batch: 14642/20099 (72.85%) Loss: 2.149244 LR: 0.00001210 +[15:24:12] Epoch: 1 Batch: 14643/20099 (72.85%) Loss: 2.395778 LR: 0.00001210 +[15:24:14] Epoch: 1 Batch: 14644/20099 (72.86%) Loss: 2.180894 LR: 0.00001210 +[15:24:16] Epoch: 1 Batch: 14645/20099 (72.86%) Loss: 2.107377 LR: 0.00001209 +[15:24:17] Epoch: 1 Batch: 14646/20099 (72.87%) Loss: 2.022335 LR: 0.00001209 +[15:24:19] Epoch: 1 Batch: 14647/20099 (72.87%) Loss: 2.150871 LR: 0.00001209 +[15:24:21] Epoch: 1 Batch: 14648/20099 (72.88%) Loss: 2.026461 LR: 0.00001209 +[15:24:23] Epoch: 1 Batch: 14649/20099 (72.88%) Loss: 1.902867 LR: 0.00001209 +[15:24:25] Epoch: 1 Batch: 14650/20099 (72.89%) Loss: 1.860460 LR: 0.00001209 +[15:24:27] Epoch: 1 Batch: 14651/20099 (72.89%) Loss: 2.294273 LR: 0.00001209 +[15:24:29] Epoch: 1 Batch: 14652/20099 (72.90%) Loss: 2.045142 LR: 0.00001208 +[15:24:30] Epoch: 1 Batch: 14653/20099 (72.90%) Loss: 2.006219 LR: 0.00001208 +[15:24:32] Epoch: 1 Batch: 14654/20099 (72.91%) Loss: 1.814377 LR: 0.00001208 +[15:24:34] Epoch: 1 Batch: 14655/20099 (72.91%) Loss: 2.342150 LR: 0.00001208 +[15:24:36] Epoch: 1 Batch: 14656/20099 (72.92%) Loss: 2.009468 LR: 0.00001208 +[15:24:38] Epoch: 1 Batch: 14657/20099 (72.92%) Loss: 2.218953 LR: 0.00001208 +[15:24:40] Epoch: 1 Batch: 14658/20099 (72.93%) Loss: 2.021503 LR: 0.00001208 +[15:24:42] Epoch: 1 Batch: 14659/20099 (72.93%) Loss: 2.121539 LR: 0.00001206 +[15:24:43] Epoch: 1 Batch: 14660/20099 (72.94%) Loss: 1.995151 LR: 0.00001206 +[15:24:45] Epoch: 1 Batch: 14661/20099 (72.94%) Loss: 1.991552 LR: 0.00001206 +[15:24:47] Epoch: 1 Batch: 14662/20099 (72.95%) Loss: 2.206068 LR: 0.00001206 +[15:24:49] Epoch: 1 Batch: 14663/20099 (72.95%) Loss: 2.110966 LR: 0.00001206 +[15:24:51] Epoch: 1 Batch: 14664/20099 (72.96%) 
Loss: 2.055425 LR: 0.00001206 +[15:24:53] Epoch: 1 Batch: 14665/20099 (72.96%) Loss: 2.027600 LR: 0.00001206 +[15:24:55] Epoch: 1 Batch: 14666/20099 (72.97%) Loss: 2.510704 LR: 0.00001205 +[15:24:56] Epoch: 1 Batch: 14667/20099 (72.97%) Loss: 2.203276 LR: 0.00001205 +[15:24:58] Epoch: 1 Batch: 14668/20099 (72.98%) Loss: 2.153906 LR: 0.00001205 +[15:25:00] Epoch: 1 Batch: 14669/20099 (72.98%) Loss: 2.305878 LR: 0.00001205 +[15:25:02] Epoch: 1 Batch: 14670/20099 (72.99%) Loss: 2.250226 LR: 0.00001205 +[15:25:04] Epoch: 1 Batch: 14671/20099 (72.99%) Loss: 2.112631 LR: 0.00001205 +[15:25:06] Epoch: 1 Batch: 14672/20099 (73.00%) Loss: 2.270169 LR: 0.00001205 +[15:25:08] Epoch: 1 Batch: 14673/20099 (73.00%) Loss: 2.122239 LR: 0.00001203 +[15:25:09] Epoch: 1 Batch: 14674/20099 (73.01%) Loss: 2.188517 LR: 0.00001203 +[15:25:11] Epoch: 1 Batch: 14675/20099 (73.01%) Loss: 2.006328 LR: 0.00001203 +[15:25:13] Epoch: 1 Batch: 14676/20099 (73.02%) Loss: 2.356757 LR: 0.00001203 +[15:25:15] Epoch: 1 Batch: 14677/20099 (73.02%) Loss: 2.269834 LR: 0.00001203 +[15:25:17] Epoch: 1 Batch: 14678/20099 (73.03%) Loss: 2.044632 LR: 0.00001203 +[15:25:19] Epoch: 1 Batch: 14679/20099 (73.03%) Loss: 2.291368 LR: 0.00001203 +[15:25:20] Epoch: 1 Batch: 14680/20099 (73.04%) Loss: 2.192049 LR: 0.00001202 +[15:25:22] Epoch: 1 Batch: 14681/20099 (73.04%) Loss: 2.262888 LR: 0.00001202 +[15:25:24] Epoch: 1 Batch: 14682/20099 (73.05%) Loss: 2.233484 LR: 0.00001202 +[15:25:26] Epoch: 1 Batch: 14683/20099 (73.05%) Loss: 2.256237 LR: 0.00001202 +[15:25:28] Epoch: 1 Batch: 14684/20099 (73.06%) Loss: 2.018841 LR: 0.00001202 +[15:25:30] Epoch: 1 Batch: 14685/20099 (73.06%) Loss: 2.179704 LR: 0.00001202 +[15:25:32] Epoch: 1 Batch: 14686/20099 (73.07%) Loss: 2.466531 LR: 0.00001202 +[15:25:33] Epoch: 1 Batch: 14687/20099 (73.07%) Loss: 2.204104 LR: 0.00001200 +[15:25:35] Epoch: 1 Batch: 14688/20099 (73.08%) Loss: 1.864255 LR: 0.00001200 +[15:25:37] Epoch: 1 Batch: 14689/20099 (73.08%) Loss: 2.025358 LR: 
0.00001200 +[15:25:39] Epoch: 1 Batch: 14690/20099 (73.09%) Loss: 2.007831 LR: 0.00001200 +[15:25:41] Epoch: 1 Batch: 14691/20099 (73.09%) Loss: 2.452326 LR: 0.00001200 +[15:25:43] Epoch: 1 Batch: 14692/20099 (73.10%) Loss: 2.217162 LR: 0.00001200 +[15:25:45] Epoch: 1 Batch: 14693/20099 (73.10%) Loss: 2.163694 LR: 0.00001200 +[15:25:46] Epoch: 1 Batch: 14694/20099 (73.11%) Loss: 2.180666 LR: 0.00001199 +[15:25:48] Epoch: 1 Batch: 14695/20099 (73.11%) Loss: 1.786565 LR: 0.00001199 +[15:25:50] Epoch: 1 Batch: 14696/20099 (73.12%) Loss: 2.239279 LR: 0.00001199 +[15:25:52] Epoch: 1 Batch: 14697/20099 (73.12%) Loss: 2.327783 LR: 0.00001199 +[15:25:54] Epoch: 1 Batch: 14698/20099 (73.13%) Loss: 2.160182 LR: 0.00001199 +[15:25:56] Epoch: 1 Batch: 14699/20099 (73.13%) Loss: 2.071859 LR: 0.00001199 +[15:25:58] Epoch: 1 Batch: 14700/20099 (73.14%) Loss: 2.109748 LR: 0.00001199 +[15:25:59] Epoch: 1 Batch: 14701/20099 (73.14%) Loss: 1.987413 LR: 0.00001198 +[15:26:01] Epoch: 1 Batch: 14702/20099 (73.15%) Loss: 2.063632 LR: 0.00001198 +[15:26:03] Epoch: 1 Batch: 14703/20099 (73.15%) Loss: 2.079232 LR: 0.00001198 +[15:26:05] Epoch: 1 Batch: 14704/20099 (73.16%) Loss: 2.107628 LR: 0.00001198 +[15:26:07] Epoch: 1 Batch: 14705/20099 (73.16%) Loss: 1.878389 LR: 0.00001198 +[15:26:09] Epoch: 1 Batch: 14706/20099 (73.17%) Loss: 2.106992 LR: 0.00001198 +[15:26:10] Epoch: 1 Batch: 14707/20099 (73.17%) Loss: 1.831939 LR: 0.00001198 +[15:26:12] Epoch: 1 Batch: 14708/20099 (73.18%) Loss: 2.020643 LR: 0.00001196 +[15:26:14] Epoch: 1 Batch: 14709/20099 (73.18%) Loss: 1.899216 LR: 0.00001196 +[15:26:16] Epoch: 1 Batch: 14710/20099 (73.19%) Loss: 1.819044 LR: 0.00001196 +[15:26:18] Epoch: 1 Batch: 14711/20099 (73.19%) Loss: 2.131608 LR: 0.00001196 +[15:26:20] Epoch: 1 Batch: 14712/20099 (73.20%) Loss: 2.186875 LR: 0.00001196 +[15:26:22] Epoch: 1 Batch: 14713/20099 (73.20%) Loss: 1.993829 LR: 0.00001196 +[15:26:23] Epoch: 1 Batch: 14714/20099 (73.21%) Loss: 1.985477 LR: 0.00001196 +[15:26:25] 
Epoch: 1 Batch: 14715/20099 (73.21%) Loss: 2.242743 LR: 0.00001195 +[15:26:27] Epoch: 1 Batch: 14716/20099 (73.22%) Loss: 2.094003 LR: 0.00001195 +[15:26:29] Epoch: 1 Batch: 14717/20099 (73.22%) Loss: 2.027370 LR: 0.00001195 +[15:26:31] Epoch: 1 Batch: 14718/20099 (73.23%) Loss: 2.048308 LR: 0.00001195 +[15:26:33] Epoch: 1 Batch: 14719/20099 (73.23%) Loss: 2.288171 LR: 0.00001195 +[15:26:35] Epoch: 1 Batch: 14720/20099 (73.24%) Loss: 1.997952 LR: 0.00001195 +[15:26:36] Epoch: 1 Batch: 14721/20099 (73.24%) Loss: 1.976528 LR: 0.00001195 +[15:26:38] Epoch: 1 Batch: 14722/20099 (73.25%) Loss: 2.072670 LR: 0.00001193 +[15:26:40] Epoch: 1 Batch: 14723/20099 (73.25%) Loss: 1.886161 LR: 0.00001193 +[15:26:42] Epoch: 1 Batch: 14724/20099 (73.26%) Loss: 2.258782 LR: 0.00001193 +[15:26:44] Epoch: 1 Batch: 14725/20099 (73.26%) Loss: 2.041959 LR: 0.00001193 +[15:26:46] Epoch: 1 Batch: 14726/20099 (73.27%) Loss: 2.078343 LR: 0.00001193 +[15:26:47] Epoch: 1 Batch: 14727/20099 (73.27%) Loss: 2.028849 LR: 0.00001193 +[15:26:49] Epoch: 1 Batch: 14728/20099 (73.28%) Loss: 2.250461 LR: 0.00001193 +[15:26:51] Epoch: 1 Batch: 14729/20099 (73.28%) Loss: 1.819268 LR: 0.00001192 +[15:26:53] Epoch: 1 Batch: 14730/20099 (73.29%) Loss: 2.272353 LR: 0.00001192 +[15:26:55] Epoch: 1 Batch: 14731/20099 (73.29%) Loss: 1.910786 LR: 0.00001192 +[15:26:57] Epoch: 1 Batch: 14732/20099 (73.30%) Loss: 2.077081 LR: 0.00001192 +[15:26:59] Epoch: 1 Batch: 14733/20099 (73.30%) Loss: 2.061259 LR: 0.00001192 +[15:27:00] Epoch: 1 Batch: 14734/20099 (73.31%) Loss: 1.848962 LR: 0.00001192 +[15:27:02] Epoch: 1 Batch: 14735/20099 (73.31%) Loss: 2.058107 LR: 0.00001192 +[15:27:04] Epoch: 1 Batch: 14736/20099 (73.32%) Loss: 2.471178 LR: 0.00001191 +[15:27:06] Epoch: 1 Batch: 14737/20099 (73.32%) Loss: 2.064875 LR: 0.00001191 +[15:27:08] Epoch: 1 Batch: 14738/20099 (73.33%) Loss: 2.044766 LR: 0.00001191 +[15:27:10] Epoch: 1 Batch: 14739/20099 (73.33%) Loss: 2.070641 LR: 0.00001191 +[15:27:12] Epoch: 1 Batch: 
14740/20099 (73.34%) Loss: 1.806723 LR: 0.00001191 +[15:27:13] Epoch: 1 Batch: 14741/20099 (73.34%) Loss: 2.153395 LR: 0.00001191 +[15:27:15] Epoch: 1 Batch: 14742/20099 (73.35%) Loss: 2.079314 LR: 0.00001191 +[15:27:17] Epoch: 1 Batch: 14743/20099 (73.35%) Loss: 2.326917 LR: 0.00001189 +[15:27:19] Epoch: 1 Batch: 14744/20099 (73.36%) Loss: 2.130505 LR: 0.00001189 +[15:27:21] Epoch: 1 Batch: 14745/20099 (73.36%) Loss: 2.326753 LR: 0.00001189 +[15:27:23] Epoch: 1 Batch: 14746/20099 (73.37%) Loss: 2.183417 LR: 0.00001189 +[15:27:25] Epoch: 1 Batch: 14747/20099 (73.37%) Loss: 1.856151 LR: 0.00001189 +[15:27:26] Epoch: 1 Batch: 14748/20099 (73.38%) Loss: 2.055672 LR: 0.00001189 +[15:27:28] Epoch: 1 Batch: 14749/20099 (73.38%) Loss: 2.316203 LR: 0.00001189 +[15:27:30] Epoch: 1 Batch: 14750/20099 (73.39%) Loss: 2.248447 LR: 0.00001188 +[15:27:32] Epoch: 1 Batch: 14751/20099 (73.39%) Loss: 1.879160 LR: 0.00001188 +[15:27:34] Epoch: 1 Batch: 14752/20099 (73.40%) Loss: 2.146756 LR: 0.00001188 +[15:27:36] Epoch: 1 Batch: 14753/20099 (73.40%) Loss: 2.407632 LR: 0.00001188 +[15:27:38] Epoch: 1 Batch: 14754/20099 (73.41%) Loss: 2.247459 LR: 0.00001188 +[15:27:39] Epoch: 1 Batch: 14755/20099 (73.41%) Loss: 2.083076 LR: 0.00001188 +[15:27:41] Epoch: 1 Batch: 14756/20099 (73.42%) Loss: 2.252095 LR: 0.00001188 +[15:27:43] Epoch: 1 Batch: 14757/20099 (73.42%) Loss: 2.056896 LR: 0.00001186 +[15:27:45] Epoch: 1 Batch: 14758/20099 (73.43%) Loss: 1.769943 LR: 0.00001186 +[15:27:47] Epoch: 1 Batch: 14759/20099 (73.43%) Loss: 2.107672 LR: 0.00001186 +[15:27:49] Epoch: 1 Batch: 14760/20099 (73.44%) Loss: 1.994403 LR: 0.00001186 +[15:27:50] Epoch: 1 Batch: 14761/20099 (73.44%) Loss: 1.793188 LR: 0.00001186 +[15:27:52] Epoch: 1 Batch: 14762/20099 (73.45%) Loss: 2.123430 LR: 0.00001186 +[15:27:54] Epoch: 1 Batch: 14763/20099 (73.45%) Loss: 1.763446 LR: 0.00001186 +[15:27:56] Epoch: 1 Batch: 14764/20099 (73.46%) Loss: 2.305965 LR: 0.00001185 +[15:27:58] Epoch: 1 Batch: 14765/20099 (73.46%) 
Loss: 1.941462 LR: 0.00001185 +[15:28:00] Epoch: 1 Batch: 14766/20099 (73.47%) Loss: 2.170876 LR: 0.00001185 +[15:28:02] Epoch: 1 Batch: 14767/20099 (73.47%) Loss: 2.104540 LR: 0.00001185 +[15:28:03] Epoch: 1 Batch: 14768/20099 (73.48%) Loss: 2.065100 LR: 0.00001185 +[15:28:05] Epoch: 1 Batch: 14769/20099 (73.48%) Loss: 2.126123 LR: 0.00001185 +[15:28:07] Epoch: 1 Batch: 14770/20099 (73.49%) Loss: 2.102377 LR: 0.00001185 +[15:28:09] Epoch: 1 Batch: 14771/20099 (73.49%) Loss: 2.071370 LR: 0.00001184 +[15:28:11] Epoch: 1 Batch: 14772/20099 (73.50%) Loss: 2.018467 LR: 0.00001184 +[15:28:13] Epoch: 1 Batch: 14773/20099 (73.50%) Loss: 2.278516 LR: 0.00001184 +[15:28:15] Epoch: 1 Batch: 14774/20099 (73.51%) Loss: 2.049923 LR: 0.00001184 +[15:28:17] Epoch: 1 Batch: 14775/20099 (73.51%) Loss: 2.177368 LR: 0.00001184 +[15:28:18] Epoch: 1 Batch: 14776/20099 (73.52%) Loss: 1.925876 LR: 0.00001184 +[15:28:20] Epoch: 1 Batch: 14777/20099 (73.52%) Loss: 2.301172 LR: 0.00001184 +[15:28:22] Epoch: 1 Batch: 14778/20099 (73.53%) Loss: 1.967881 LR: 0.00001182 +[15:28:24] Epoch: 1 Batch: 14779/20099 (73.53%) Loss: 1.908280 LR: 0.00001182 +[15:28:26] Epoch: 1 Batch: 14780/20099 (73.54%) Loss: 1.795962 LR: 0.00001182 +[15:28:28] Epoch: 1 Batch: 14781/20099 (73.54%) Loss: 2.360852 LR: 0.00001182 +[15:28:30] Epoch: 1 Batch: 14782/20099 (73.55%) Loss: 2.084901 LR: 0.00001182 +[15:28:31] Epoch: 1 Batch: 14783/20099 (73.55%) Loss: 2.293394 LR: 0.00001182 +[15:28:33] Epoch: 1 Batch: 14784/20099 (73.56%) Loss: 2.015037 LR: 0.00001182 +[15:28:35] Epoch: 1 Batch: 14785/20099 (73.56%) Loss: 1.998969 LR: 0.00001181 +[15:28:37] Epoch: 1 Batch: 14786/20099 (73.57%) Loss: 1.892595 LR: 0.00001181 +[15:28:39] Epoch: 1 Batch: 14787/20099 (73.57%) Loss: 2.071038 LR: 0.00001181 +[15:28:41] Epoch: 1 Batch: 14788/20099 (73.58%) Loss: 2.064728 LR: 0.00001181 +[15:28:42] Epoch: 1 Batch: 14789/20099 (73.58%) Loss: 2.148851 LR: 0.00001181 +[15:28:44] Epoch: 1 Batch: 14790/20099 (73.59%) Loss: 2.087311 LR: 
0.00001181 +[15:28:46] Epoch: 1 Batch: 14791/20099 (73.59%) Loss: 2.271152 LR: 0.00001181 +[15:28:48] Epoch: 1 Batch: 14792/20099 (73.60%) Loss: 2.218958 LR: 0.00001179 +[15:28:50] Epoch: 1 Batch: 14793/20099 (73.60%) Loss: 2.100421 LR: 0.00001179 +[15:28:52] Epoch: 1 Batch: 14794/20099 (73.61%) Loss: 2.152236 LR: 0.00001179 +[15:28:54] Epoch: 1 Batch: 14795/20099 (73.61%) Loss: 2.186857 LR: 0.00001179 +[15:28:55] Epoch: 1 Batch: 14796/20099 (73.62%) Loss: 1.746650 LR: 0.00001179 +[15:28:57] Epoch: 1 Batch: 14797/20099 (73.62%) Loss: 2.412846 LR: 0.00001179 +[15:28:59] Epoch: 1 Batch: 14798/20099 (73.63%) Loss: 2.076237 LR: 0.00001179 +[15:29:01] Epoch: 1 Batch: 14799/20099 (73.63%) Loss: 1.773370 LR: 0.00001178 +[15:29:06] >> Cleaned up old temp checkpoint: epoch1_step12800 +[15:29:06] >> Temp checkpoint saved: epoch1_step14800, size: 0.1693 GB +[15:29:06] Epoch: 1 Batch: 14800/20099 (73.64%) Loss: 2.057747 LR: 0.00001178 +[15:29:08] Epoch: 1 Batch: 14801/20099 (73.64%) Loss: 1.785506 LR: 0.00001178 +[15:29:10] Epoch: 1 Batch: 14802/20099 (73.65%) Loss: 2.156562 LR: 0.00001178 +[15:29:12] Epoch: 1 Batch: 14803/20099 (73.65%) Loss: 2.045387 LR: 0.00001178 +[15:29:14] Epoch: 1 Batch: 14804/20099 (73.66%) Loss: 2.207200 LR: 0.00001178 +[15:29:16] Epoch: 1 Batch: 14805/20099 (73.66%) Loss: 2.113934 LR: 0.00001178 +[15:29:17] Epoch: 1 Batch: 14806/20099 (73.67%) Loss: 2.321863 LR: 0.00001177 +[15:29:19] Epoch: 1 Batch: 14807/20099 (73.67%) Loss: 1.982444 LR: 0.00001177 +[15:29:21] Epoch: 1 Batch: 14808/20099 (73.68%) Loss: 2.057757 LR: 0.00001177 +[15:29:23] Epoch: 1 Batch: 14809/20099 (73.68%) Loss: 2.235214 LR: 0.00001177 +[15:29:25] Epoch: 1 Batch: 14810/20099 (73.69%) Loss: 1.831557 LR: 0.00001177 +[15:29:27] Epoch: 1 Batch: 14811/20099 (73.69%) Loss: 2.398069 LR: 0.00001177 +[15:29:29] Epoch: 1 Batch: 14812/20099 (73.70%) Loss: 2.089263 LR: 0.00001177 +[15:29:30] Epoch: 1 Batch: 14813/20099 (73.70%) Loss: 1.942188 LR: 0.00001175 +[15:29:32] Epoch: 1 Batch: 
14814/20099 (73.71%) Loss: 1.837707 LR: 0.00001175 +[15:29:34] Epoch: 1 Batch: 14815/20099 (73.71%) Loss: 1.964948 LR: 0.00001175 +[15:29:36] Epoch: 1 Batch: 14816/20099 (73.72%) Loss: 1.883831 LR: 0.00001175 +[15:29:38] Epoch: 1 Batch: 14817/20099 (73.72%) Loss: 2.600588 LR: 0.00001175 +[15:29:40] Epoch: 1 Batch: 14818/20099 (73.73%) Loss: 1.884473 LR: 0.00001175 +[15:29:42] Epoch: 1 Batch: 14819/20099 (73.73%) Loss: 2.001213 LR: 0.00001175 +[15:29:44] Epoch: 1 Batch: 14820/20099 (73.74%) Loss: 1.869462 LR: 0.00001174 +[15:29:45] Epoch: 1 Batch: 14821/20099 (73.74%) Loss: 2.137549 LR: 0.00001174 +[15:29:47] Epoch: 1 Batch: 14822/20099 (73.74%) Loss: 2.148617 LR: 0.00001174 +[15:29:49] Epoch: 1 Batch: 14823/20099 (73.75%) Loss: 1.938741 LR: 0.00001174 +[15:29:51] Epoch: 1 Batch: 14824/20099 (73.75%) Loss: 1.853939 LR: 0.00001174 +[15:29:53] Epoch: 1 Batch: 14825/20099 (73.76%) Loss: 2.227391 LR: 0.00001174 +[15:29:55] Epoch: 1 Batch: 14826/20099 (73.76%) Loss: 2.249324 LR: 0.00001174 +[15:29:57] Epoch: 1 Batch: 14827/20099 (73.77%) Loss: 2.202935 LR: 0.00001173 +[15:29:58] Epoch: 1 Batch: 14828/20099 (73.77%) Loss: 2.003791 LR: 0.00001173 +[15:30:00] Epoch: 1 Batch: 14829/20099 (73.78%) Loss: 1.994618 LR: 0.00001173 +[15:30:02] Epoch: 1 Batch: 14830/20099 (73.78%) Loss: 2.145823 LR: 0.00001173 +[15:30:04] Epoch: 1 Batch: 14831/20099 (73.79%) Loss: 2.304205 LR: 0.00001173 +[15:30:06] Epoch: 1 Batch: 14832/20099 (73.79%) Loss: 2.108578 LR: 0.00001173 +[15:30:08] Epoch: 1 Batch: 14833/20099 (73.80%) Loss: 1.930388 LR: 0.00001173 +[15:30:09] Epoch: 1 Batch: 14834/20099 (73.80%) Loss: 2.101443 LR: 0.00001171 +[15:30:11] Epoch: 1 Batch: 14835/20099 (73.81%) Loss: 1.978124 LR: 0.00001171 +[15:30:13] Epoch: 1 Batch: 14836/20099 (73.81%) Loss: 2.106428 LR: 0.00001171 +[15:30:15] Epoch: 1 Batch: 14837/20099 (73.82%) Loss: 2.214680 LR: 0.00001171 +[15:30:17] Epoch: 1 Batch: 14838/20099 (73.82%) Loss: 2.139306 LR: 0.00001171 +[15:30:19] Epoch: 1 Batch: 14839/20099 (73.83%) 
Loss: 2.192270 LR: 0.00001171 +[15:30:21] Epoch: 1 Batch: 14840/20099 (73.83%) Loss: 1.920345 LR: 0.00001171 +[15:30:22] Epoch: 1 Batch: 14841/20099 (73.84%) Loss: 2.089080 LR: 0.00001170 +[15:30:24] Epoch: 1 Batch: 14842/20099 (73.84%) Loss: 2.127104 LR: 0.00001170 +[15:30:26] Epoch: 1 Batch: 14843/20099 (73.85%) Loss: 2.033286 LR: 0.00001170 +[15:30:28] Epoch: 1 Batch: 14844/20099 (73.85%) Loss: 2.155840 LR: 0.00001170 +[15:30:30] Epoch: 1 Batch: 14845/20099 (73.86%) Loss: 2.099457 LR: 0.00001170 +[15:30:32] Epoch: 1 Batch: 14846/20099 (73.86%) Loss: 1.791582 LR: 0.00001170 +[15:30:34] Epoch: 1 Batch: 14847/20099 (73.87%) Loss: 1.853220 LR: 0.00001170 +[15:30:35] Epoch: 1 Batch: 14848/20099 (73.87%) Loss: 2.273280 LR: 0.00001168 +[15:30:37] Epoch: 1 Batch: 14849/20099 (73.88%) Loss: 2.051242 LR: 0.00001168 +[15:30:39] Epoch: 1 Batch: 14850/20099 (73.88%) Loss: 1.827104 LR: 0.00001168 +[15:30:41] Epoch: 1 Batch: 14851/20099 (73.89%) Loss: 2.169297 LR: 0.00001168 +[15:30:43] Epoch: 1 Batch: 14852/20099 (73.89%) Loss: 1.912151 LR: 0.00001168 +[15:30:45] Epoch: 1 Batch: 14853/20099 (73.90%) Loss: 1.831099 LR: 0.00001168 +[15:30:46] Epoch: 1 Batch: 14854/20099 (73.90%) Loss: 2.297000 LR: 0.00001168 +[15:30:48] Epoch: 1 Batch: 14855/20099 (73.91%) Loss: 2.252624 LR: 0.00001167 +[15:30:50] Epoch: 1 Batch: 14856/20099 (73.91%) Loss: 1.939414 LR: 0.00001167 +[15:30:52] Epoch: 1 Batch: 14857/20099 (73.92%) Loss: 2.165844 LR: 0.00001167 +[15:30:54] Epoch: 1 Batch: 14858/20099 (73.92%) Loss: 2.094969 LR: 0.00001167 +[15:30:56] Epoch: 1 Batch: 14859/20099 (73.93%) Loss: 2.158749 LR: 0.00001167 +[15:30:58] Epoch: 1 Batch: 14860/20099 (73.93%) Loss: 1.891902 LR: 0.00001167 +[15:30:59] Epoch: 1 Batch: 14861/20099 (73.94%) Loss: 2.246606 LR: 0.00001167 +[15:31:01] Epoch: 1 Batch: 14862/20099 (73.94%) Loss: 2.117472 LR: 0.00001166 +[15:31:03] Epoch: 1 Batch: 14863/20099 (73.95%) Loss: 1.879394 LR: 0.00001166 +[15:31:05] Epoch: 1 Batch: 14864/20099 (73.95%) Loss: 2.058429 LR: 
0.00001166 +[15:31:07] Epoch: 1 Batch: 14865/20099 (73.96%) Loss: 2.135745 LR: 0.00001166 +[15:31:09] Epoch: 1 Batch: 14866/20099 (73.96%) Loss: 2.000299 LR: 0.00001166 +[15:31:11] Epoch: 1 Batch: 14867/20099 (73.97%) Loss: 2.216433 LR: 0.00001166 +[15:31:12] Epoch: 1 Batch: 14868/20099 (73.97%) Loss: 1.979194 LR: 0.00001166 +[15:31:14] Epoch: 1 Batch: 14869/20099 (73.98%) Loss: 1.961370 LR: 0.00001164 +[15:31:16] Epoch: 1 Batch: 14870/20099 (73.98%) Loss: 2.199549 LR: 0.00001164 +[15:31:18] Epoch: 1 Batch: 14871/20099 (73.99%) Loss: 1.989530 LR: 0.00001164 +[15:31:20] Epoch: 1 Batch: 14872/20099 (73.99%) Loss: 2.185439 LR: 0.00001164 +[15:31:22] Epoch: 1 Batch: 14873/20099 (74.00%) Loss: 2.027701 LR: 0.00001164 +[15:31:23] Epoch: 1 Batch: 14874/20099 (74.00%) Loss: 2.135622 LR: 0.00001164 +[15:31:25] Epoch: 1 Batch: 14875/20099 (74.01%) Loss: 2.107593 LR: 0.00001164 +[15:31:27] Epoch: 1 Batch: 14876/20099 (74.01%) Loss: 2.260426 LR: 0.00001163 +[15:31:29] Epoch: 1 Batch: 14877/20099 (74.02%) Loss: 2.194685 LR: 0.00001163 +[15:31:31] Epoch: 1 Batch: 14878/20099 (74.02%) Loss: 1.847812 LR: 0.00001163 +[15:31:33] Epoch: 1 Batch: 14879/20099 (74.03%) Loss: 2.132424 LR: 0.00001163 +[15:31:35] Epoch: 1 Batch: 14880/20099 (74.03%) Loss: 2.380367 LR: 0.00001163 +[15:31:36] Epoch: 1 Batch: 14881/20099 (74.04%) Loss: 2.272047 LR: 0.00001163 +[15:31:38] Epoch: 1 Batch: 14882/20099 (74.04%) Loss: 2.066972 LR: 0.00001163 +[15:31:40] Epoch: 1 Batch: 14883/20099 (74.05%) Loss: 2.379069 LR: 0.00001162 +[15:31:42] Epoch: 1 Batch: 14884/20099 (74.05%) Loss: 1.938607 LR: 0.00001162 +[15:31:44] Epoch: 1 Batch: 14885/20099 (74.06%) Loss: 2.059341 LR: 0.00001162 +[15:31:46] Epoch: 1 Batch: 14886/20099 (74.06%) Loss: 2.521427 LR: 0.00001162 +[15:31:48] Epoch: 1 Batch: 14887/20099 (74.07%) Loss: 2.238933 LR: 0.00001162 +[15:31:50] Epoch: 1 Batch: 14888/20099 (74.07%) Loss: 2.149816 LR: 0.00001162 +[15:31:51] Epoch: 1 Batch: 14889/20099 (74.08%) Loss: 1.967731 LR: 0.00001162 +[15:31:53] 
Epoch: 1 Batch: 14890/20099 (74.08%) Loss: 2.126666 LR: 0.00001160 +[15:31:55] Epoch: 1 Batch: 14891/20099 (74.09%) Loss: 2.220259 LR: 0.00001160 +[15:31:57] Epoch: 1 Batch: 14892/20099 (74.09%) Loss: 1.747729 LR: 0.00001160 +[15:31:59] Epoch: 1 Batch: 14893/20099 (74.10%) Loss: 2.100382 LR: 0.00001160 +[15:32:01] Epoch: 1 Batch: 14894/20099 (74.10%) Loss: 2.461234 LR: 0.00001160 +[15:32:02] Epoch: 1 Batch: 14895/20099 (74.11%) Loss: 2.361595 LR: 0.00001160 +[15:32:04] Epoch: 1 Batch: 14896/20099 (74.11%) Loss: 2.251818 LR: 0.00001160 +[15:32:06] Epoch: 1 Batch: 14897/20099 (74.12%) Loss: 2.013610 LR: 0.00001159 +[15:32:08] Epoch: 1 Batch: 14898/20099 (74.12%) Loss: 1.991952 LR: 0.00001159 +[15:32:10] Epoch: 1 Batch: 14899/20099 (74.13%) Loss: 1.918209 LR: 0.00001159 +[15:32:12] Epoch: 1 Batch: 14900/20099 (74.13%) Loss: 1.890265 LR: 0.00001159 +[15:32:14] Epoch: 1 Batch: 14901/20099 (74.14%) Loss: 1.871738 LR: 0.00001159 +[15:32:15] Epoch: 1 Batch: 14902/20099 (74.14%) Loss: 1.873860 LR: 0.00001159 +[15:32:17] Epoch: 1 Batch: 14903/20099 (74.15%) Loss: 1.887889 LR: 0.00001159 +[15:32:19] Epoch: 1 Batch: 14904/20099 (74.15%) Loss: 2.057255 LR: 0.00001157 +[15:32:21] Epoch: 1 Batch: 14905/20099 (74.16%) Loss: 2.113943 LR: 0.00001157 +[15:32:23] Epoch: 1 Batch: 14906/20099 (74.16%) Loss: 1.885328 LR: 0.00001157 +[15:32:25] Epoch: 1 Batch: 14907/20099 (74.17%) Loss: 1.972797 LR: 0.00001157 +[15:32:26] Epoch: 1 Batch: 14908/20099 (74.17%) Loss: 2.294512 LR: 0.00001157 +[15:32:28] Epoch: 1 Batch: 14909/20099 (74.18%) Loss: 2.053163 LR: 0.00001157 +[15:32:30] Epoch: 1 Batch: 14910/20099 (74.18%) Loss: 1.957083 LR: 0.00001157 +[15:32:32] Epoch: 1 Batch: 14911/20099 (74.19%) Loss: 1.956885 LR: 0.00001156 +[15:32:34] Epoch: 1 Batch: 14912/20099 (74.19%) Loss: 1.599414 LR: 0.00001156 +[15:32:36] Epoch: 1 Batch: 14913/20099 (74.20%) Loss: 1.876906 LR: 0.00001156 +[15:32:38] Epoch: 1 Batch: 14914/20099 (74.20%) Loss: 2.217363 LR: 0.00001156 +[15:32:39] Epoch: 1 Batch: 
14915/20099 (74.21%) Loss: 2.208473 LR: 0.00001156 +[15:32:41] Epoch: 1 Batch: 14916/20099 (74.21%) Loss: 1.772821 LR: 0.00001156 +[15:32:43] Epoch: 1 Batch: 14917/20099 (74.22%) Loss: 1.953118 LR: 0.00001156 +[15:32:45] Epoch: 1 Batch: 14918/20099 (74.22%) Loss: 2.026433 LR: 0.00001155 +[15:32:47] Epoch: 1 Batch: 14919/20099 (74.23%) Loss: 2.237639 LR: 0.00001155 +[15:32:49] Epoch: 1 Batch: 14920/20099 (74.23%) Loss: 1.734126 LR: 0.00001155 +[15:32:51] Epoch: 1 Batch: 14921/20099 (74.24%) Loss: 1.669109 LR: 0.00001155 +[15:32:52] Epoch: 1 Batch: 14922/20099 (74.24%) Loss: 2.436272 LR: 0.00001155 +[15:32:54] Epoch: 1 Batch: 14923/20099 (74.25%) Loss: 2.062576 LR: 0.00001155 +[15:32:56] Epoch: 1 Batch: 14924/20099 (74.25%) Loss: 2.037621 LR: 0.00001155 +[15:32:58] Epoch: 1 Batch: 14925/20099 (74.26%) Loss: 2.063207 LR: 0.00001153 +[15:33:00] Epoch: 1 Batch: 14926/20099 (74.26%) Loss: 2.181958 LR: 0.00001153 +[15:33:02] Epoch: 1 Batch: 14927/20099 (74.27%) Loss: 2.065014 LR: 0.00001153 +[15:33:04] Epoch: 1 Batch: 14928/20099 (74.27%) Loss: 2.331077 LR: 0.00001153 +[15:33:05] Epoch: 1 Batch: 14929/20099 (74.28%) Loss: 2.108982 LR: 0.00001153 +[15:33:07] Epoch: 1 Batch: 14930/20099 (74.28%) Loss: 2.152534 LR: 0.00001153 +[15:33:09] Epoch: 1 Batch: 14931/20099 (74.29%) Loss: 2.086803 LR: 0.00001153 +[15:33:11] Epoch: 1 Batch: 14932/20099 (74.29%) Loss: 2.058345 LR: 0.00001152 +[15:33:13] Epoch: 1 Batch: 14933/20099 (74.30%) Loss: 1.886432 LR: 0.00001152 +[15:33:15] Epoch: 1 Batch: 14934/20099 (74.30%) Loss: 2.044586 LR: 0.00001152 +[15:33:16] Epoch: 1 Batch: 14935/20099 (74.31%) Loss: 2.283975 LR: 0.00001152 +[15:33:18] Epoch: 1 Batch: 14936/20099 (74.31%) Loss: 2.255137 LR: 0.00001152 +[15:33:20] Epoch: 1 Batch: 14937/20099 (74.32%) Loss: 2.152160 LR: 0.00001152 +[15:33:22] Epoch: 1 Batch: 14938/20099 (74.32%) Loss: 2.033334 LR: 0.00001152 +[15:33:24] Epoch: 1 Batch: 14939/20099 (74.33%) Loss: 2.029494 LR: 0.00001151 +[15:33:26] Epoch: 1 Batch: 14940/20099 (74.33%) 
Loss: 2.414512 LR: 0.00001151 +[15:33:28] Epoch: 1 Batch: 14941/20099 (74.34%) Loss: 1.913621 LR: 0.00001151 +[15:33:30] Epoch: 1 Batch: 14942/20099 (74.34%) Loss: 2.063816 LR: 0.00001151 +[15:33:32] Epoch: 1 Batch: 14943/20099 (74.35%) Loss: 2.190058 LR: 0.00001151 +[15:33:33] Epoch: 1 Batch: 14944/20099 (74.35%) Loss: 2.190831 LR: 0.00001151 +[15:33:35] Epoch: 1 Batch: 14945/20099 (74.36%) Loss: 2.061677 LR: 0.00001151 +[15:33:37] Epoch: 1 Batch: 14946/20099 (74.36%) Loss: 1.805814 LR: 0.00001149 +[15:33:39] Epoch: 1 Batch: 14947/20099 (74.37%) Loss: 2.225127 LR: 0.00001149 +[15:33:41] Epoch: 1 Batch: 14948/20099 (74.37%) Loss: 1.942101 LR: 0.00001149 +[15:33:43] Epoch: 1 Batch: 14949/20099 (74.38%) Loss: 2.080892 LR: 0.00001149 +[15:33:44] Epoch: 1 Batch: 14950/20099 (74.38%) Loss: 2.065711 LR: 0.00001149 +[15:33:46] Epoch: 1 Batch: 14951/20099 (74.39%) Loss: 1.685774 LR: 0.00001149 +[15:33:48] Epoch: 1 Batch: 14952/20099 (74.39%) Loss: 1.968872 LR: 0.00001149 +[15:33:50] Epoch: 1 Batch: 14953/20099 (74.40%) Loss: 1.959617 LR: 0.00001148 +[15:33:52] Epoch: 1 Batch: 14954/20099 (74.40%) Loss: 1.658932 LR: 0.00001148 +[15:33:54] Epoch: 1 Batch: 14955/20099 (74.41%) Loss: 2.344990 LR: 0.00001148 +[15:33:56] Epoch: 1 Batch: 14956/20099 (74.41%) Loss: 2.049994 LR: 0.00001148 +[15:33:57] Epoch: 1 Batch: 14957/20099 (74.42%) Loss: 1.879896 LR: 0.00001148 +[15:33:59] Epoch: 1 Batch: 14958/20099 (74.42%) Loss: 2.146871 LR: 0.00001148 +[15:34:01] Epoch: 1 Batch: 14959/20099 (74.43%) Loss: 2.046450 LR: 0.00001148 +[15:34:03] Epoch: 1 Batch: 14960/20099 (74.43%) Loss: 2.006572 LR: 0.00001146 +[15:34:05] Epoch: 1 Batch: 14961/20099 (74.44%) Loss: 1.916632 LR: 0.00001146 +[15:34:07] Epoch: 1 Batch: 14962/20099 (74.44%) Loss: 1.815775 LR: 0.00001146 +[15:34:09] Epoch: 1 Batch: 14963/20099 (74.45%) Loss: 2.026780 LR: 0.00001146 +[15:34:10] Epoch: 1 Batch: 14964/20099 (74.45%) Loss: 2.146287 LR: 0.00001146 +[15:34:12] Epoch: 1 Batch: 14965/20099 (74.46%) Loss: 1.947296 LR: 
0.00001146 +[15:34:14] Epoch: 1 Batch: 14966/20099 (74.46%) Loss: 1.876985 LR: 0.00001146 +[15:34:16] Epoch: 1 Batch: 14967/20099 (74.47%) Loss: 2.160566 LR: 0.00001145 +[15:34:18] Epoch: 1 Batch: 14968/20099 (74.47%) Loss: 2.108978 LR: 0.00001145 +[15:34:20] Epoch: 1 Batch: 14969/20099 (74.48%) Loss: 2.083555 LR: 0.00001145 +[15:34:22] Epoch: 1 Batch: 14970/20099 (74.48%) Loss: 2.002642 LR: 0.00001145 +[15:34:24] Epoch: 1 Batch: 14971/20099 (74.49%) Loss: 1.803746 LR: 0.00001145 +[15:34:25] Epoch: 1 Batch: 14972/20099 (74.49%) Loss: 2.032679 LR: 0.00001145 +[15:34:27] Epoch: 1 Batch: 14973/20099 (74.50%) Loss: 2.148899 LR: 0.00001145 +[15:34:29] Epoch: 1 Batch: 14974/20099 (74.50%) Loss: 1.702875 LR: 0.00001144 +[15:34:31] Epoch: 1 Batch: 14975/20099 (74.51%) Loss: 2.112844 LR: 0.00001144 +[15:34:33] Epoch: 1 Batch: 14976/20099 (74.51%) Loss: 2.228926 LR: 0.00001144 +[15:34:35] Epoch: 1 Batch: 14977/20099 (74.52%) Loss: 2.226591 LR: 0.00001144 +[15:34:37] Epoch: 1 Batch: 14978/20099 (74.52%) Loss: 2.350809 LR: 0.00001144 +[15:34:38] Epoch: 1 Batch: 14979/20099 (74.53%) Loss: 2.133690 LR: 0.00001144 +[15:34:40] Epoch: 1 Batch: 14980/20099 (74.53%) Loss: 2.304178 LR: 0.00001144 +[15:34:42] Epoch: 1 Batch: 14981/20099 (74.54%) Loss: 2.182967 LR: 0.00001142 +[15:34:44] Epoch: 1 Batch: 14982/20099 (74.54%) Loss: 1.884519 LR: 0.00001142 +[15:34:46] Epoch: 1 Batch: 14983/20099 (74.55%) Loss: 2.032850 LR: 0.00001142 +[15:34:48] Epoch: 1 Batch: 14984/20099 (74.55%) Loss: 2.089105 LR: 0.00001142 +[15:34:50] Epoch: 1 Batch: 14985/20099 (74.56%) Loss: 2.290029 LR: 0.00001142 +[15:34:51] Epoch: 1 Batch: 14986/20099 (74.56%) Loss: 2.208405 LR: 0.00001142 +[15:34:53] Epoch: 1 Batch: 14987/20099 (74.57%) Loss: 2.318029 LR: 0.00001142 +[15:34:55] Epoch: 1 Batch: 14988/20099 (74.57%) Loss: 2.018976 LR: 0.00001141 +[15:34:57] Epoch: 1 Batch: 14989/20099 (74.58%) Loss: 2.410890 LR: 0.00001141 +[15:34:59] Epoch: 1 Batch: 14990/20099 (74.58%) Loss: 1.792704 LR: 0.00001141 +[15:35:01] 
Epoch: 1 Batch: 14991/20099 (74.59%) Loss: 2.074270 LR: 0.00001141 +[15:35:02] Epoch: 1 Batch: 14992/20099 (74.59%) Loss: 1.945010 LR: 0.00001141 +[15:35:04] Epoch: 1 Batch: 14993/20099 (74.60%) Loss: 2.284334 LR: 0.00001141 +[15:35:06] Epoch: 1 Batch: 14994/20099 (74.60%) Loss: 1.616927 LR: 0.00001141 +[15:35:08] Epoch: 1 Batch: 14995/20099 (74.61%) Loss: 2.180976 LR: 0.00001140 +[15:35:10] Epoch: 1 Batch: 14996/20099 (74.61%) Loss: 1.950357 LR: 0.00001140 +[15:35:12] Epoch: 1 Batch: 14997/20099 (74.62%) Loss: 2.245338 LR: 0.00001140 +[15:35:14] Epoch: 1 Batch: 14998/20099 (74.62%) Loss: 2.255525 LR: 0.00001140 +[15:35:15] Epoch: 1 Batch: 14999/20099 (74.63%) Loss: 2.211165 LR: 0.00001140 +[15:35:17] >> Evaluating batch 0 +[15:35:18] >> Evaluating batch 1 +[15:35:19] >> Evaluating batch 2 +[15:35:21] >> Evaluating batch 3 +[15:35:22] >> Evaluating batch 4 +[15:35:23] >> Evaluating batch 5 +[15:35:24] >> Evaluating batch 6 +[15:35:25] >> Evaluating batch 7 +[15:35:26] >> Evaluating batch 8 +[15:35:27] >> Evaluating batch 9 +[15:35:28] >> Evaluating batch 10 +[15:35:29] >> Evaluating batch 11 +[15:35:30] >> Evaluating batch 12 +[15:35:31] >> Evaluating batch 13 +[15:35:32] >> Evaluating batch 14 +[15:35:33] >> Evaluating batch 15 +[15:35:34] >> Evaluating batch 16 +[15:35:35] Epoch: 1 Step: 15000/20099 Evaluation: +[15:35:35] [1mAvg Loss Since Last Eval: 2.0849 Val Loss: 2.1531 Validation loss delta: 0.0021 Perplexity: 8.6118 LR: 0.00001140 +[15:35:38] >> Cleaned up old temp checkpoint: epoch1_step13000 +[15:35:38] >> Temp checkpoint saved: epoch1_step15000, size: 0.1693 GB +[15:35:42] >> Checkpoint saved: epoch1_step15000, size: 0.1693 GB +[15:35:42] Epoch: 1 Batch: 15000/20099 (74.63%) Loss: 2.399714 LR: 0.00001140 +[15:35:43] Epoch: 1 Batch: 15001/20099 (74.64%) Loss: 2.268160 LR: 0.00001140 +[15:35:45] Epoch: 1 Batch: 15002/20099 (74.64%) Loss: 2.052238 LR: 0.00001138 +[15:35:47] Epoch: 1 Batch: 15003/20099 (74.65%) Loss: 2.290272 LR: 0.00001138 +[15:35:49] 
Epoch: 1 Batch: 15004/20099 (74.65%) Loss: 2.058256 LR: 0.00001138 +[15:35:51] Epoch: 1 Batch: 15005/20099 (74.66%) Loss: 2.200400 LR: 0.00001138 +[15:35:53] Epoch: 1 Batch: 15006/20099 (74.66%) Loss: 2.171251 LR: 0.00001138 +[15:35:54] Epoch: 1 Batch: 15007/20099 (74.67%) Loss: 2.048496 LR: 0.00001138 +[15:35:56] Epoch: 1 Batch: 15008/20099 (74.67%) Loss: 1.979047 LR: 0.00001138 +[15:35:58] Epoch: 1 Batch: 15009/20099 (74.68%) Loss: 2.077617 LR: 0.00001137 +[15:36:00] Epoch: 1 Batch: 15010/20099 (74.68%) Loss: 1.943466 LR: 0.00001137 +[15:36:02] Epoch: 1 Batch: 15011/20099 (74.69%) Loss: 2.078683 LR: 0.00001137 +[15:36:04] Epoch: 1 Batch: 15012/20099 (74.69%) Loss: 2.207379 LR: 0.00001137 +[15:36:06] Epoch: 1 Batch: 15013/20099 (74.70%) Loss: 2.096154 LR: 0.00001137 +[15:36:08] Epoch: 1 Batch: 15014/20099 (74.70%) Loss: 1.865391 LR: 0.00001137 +[15:36:10] Epoch: 1 Batch: 15015/20099 (74.71%) Loss: 2.009006 LR: 0.00001137 +[15:36:12] Epoch: 1 Batch: 15016/20099 (74.71%) Loss: 1.933682 LR: 0.00001136 +[15:36:14] Epoch: 1 Batch: 15017/20099 (74.72%) Loss: 1.910530 LR: 0.00001136 +[15:36:15] Epoch: 1 Batch: 15018/20099 (74.72%) Loss: 2.183299 LR: 0.00001136 +[15:36:17] Epoch: 1 Batch: 15019/20099 (74.73%) Loss: 2.156774 LR: 0.00001136 +[15:36:19] Epoch: 1 Batch: 15020/20099 (74.73%) Loss: 1.996060 LR: 0.00001136 +[15:36:21] Epoch: 1 Batch: 15021/20099 (74.74%) Loss: 2.121195 LR: 0.00001136 +[15:36:23] Epoch: 1 Batch: 15022/20099 (74.74%) Loss: 2.121837 LR: 0.00001136 +[15:36:25] Epoch: 1 Batch: 15023/20099 (74.75%) Loss: 2.208791 LR: 0.00001134 +[15:36:27] Epoch: 1 Batch: 15024/20099 (74.75%) Loss: 2.012165 LR: 0.00001134 +[15:36:28] Epoch: 1 Batch: 15025/20099 (74.75%) Loss: 2.246577 LR: 0.00001134 +[15:36:30] Epoch: 1 Batch: 15026/20099 (74.76%) Loss: 2.053092 LR: 0.00001134 +[15:36:32] Epoch: 1 Batch: 15027/20099 (74.76%) Loss: 2.093612 LR: 0.00001134 +[15:36:34] Epoch: 1 Batch: 15028/20099 (74.77%) Loss: 2.285298 LR: 0.00001134 +[15:36:36] Epoch: 1 Batch: 
15029/20099 (74.77%) Loss: 2.219091 LR: 0.00001134 +[15:36:38] Epoch: 1 Batch: 15030/20099 (74.78%) Loss: 2.310145 LR: 0.00001133 +[15:36:40] Epoch: 1 Batch: 15031/20099 (74.78%) Loss: 1.908726 LR: 0.00001133 +[15:36:41] Epoch: 1 Batch: 15032/20099 (74.79%) Loss: 2.225934 LR: 0.00001133 +[15:36:43] Epoch: 1 Batch: 15033/20099 (74.79%) Loss: 2.205931 LR: 0.00001133 +[15:36:45] Epoch: 1 Batch: 15034/20099 (74.80%) Loss: 2.026833 LR: 0.00001133 +[15:36:47] Epoch: 1 Batch: 15035/20099 (74.80%) Loss: 2.098872 LR: 0.00001133 +[15:36:49] Epoch: 1 Batch: 15036/20099 (74.81%) Loss: 1.963253 LR: 0.00001133 +[15:36:51] Epoch: 1 Batch: 15037/20099 (74.81%) Loss: 1.951711 LR: 0.00001132 +[15:36:52] Epoch: 1 Batch: 15038/20099 (74.82%) Loss: 1.882307 LR: 0.00001132 +[15:36:54] Epoch: 1 Batch: 15039/20099 (74.82%) Loss: 2.301706 LR: 0.00001132 +[15:36:56] Epoch: 1 Batch: 15040/20099 (74.83%) Loss: 2.209752 LR: 0.00001132 +[15:36:58] Epoch: 1 Batch: 15041/20099 (74.83%) Loss: 1.963155 LR: 0.00001132 +[15:37:00] Epoch: 1 Batch: 15042/20099 (74.84%) Loss: 2.436749 LR: 0.00001132 +[15:37:02] Epoch: 1 Batch: 15043/20099 (74.84%) Loss: 2.036539 LR: 0.00001132 +[15:37:04] Epoch: 1 Batch: 15044/20099 (74.85%) Loss: 2.107283 LR: 0.00001130 +[15:37:05] Epoch: 1 Batch: 15045/20099 (74.85%) Loss: 2.154724 LR: 0.00001130 +[15:37:07] Epoch: 1 Batch: 15046/20099 (74.86%) Loss: 1.883241 LR: 0.00001130 +[15:37:09] Epoch: 1 Batch: 15047/20099 (74.86%) Loss: 2.225297 LR: 0.00001130 +[15:37:11] Epoch: 1 Batch: 15048/20099 (74.87%) Loss: 2.056154 LR: 0.00001130 +[15:37:13] Epoch: 1 Batch: 15049/20099 (74.87%) Loss: 1.888702 LR: 0.00001130 +[15:37:15] Epoch: 1 Batch: 15050/20099 (74.88%) Loss: 2.342314 LR: 0.00001130 +[15:37:17] Epoch: 1 Batch: 15051/20099 (74.88%) Loss: 2.300777 LR: 0.00001129 +[15:37:18] Epoch: 1 Batch: 15052/20099 (74.89%) Loss: 2.206802 LR: 0.00001129 +[15:37:20] Epoch: 1 Batch: 15053/20099 (74.89%) Loss: 1.960949 LR: 0.00001129 +[15:37:22] Epoch: 1 Batch: 15054/20099 (74.90%) 
Loss: 2.306780 LR: 0.00001129 +[15:37:24] Epoch: 1 Batch: 15055/20099 (74.90%) Loss: 2.071290 LR: 0.00001129 +[15:37:26] Epoch: 1 Batch: 15056/20099 (74.91%) Loss: 1.726404 LR: 0.00001129 +[15:37:28] Epoch: 1 Batch: 15057/20099 (74.91%) Loss: 2.131804 LR: 0.00001129 +[15:37:30] Epoch: 1 Batch: 15058/20099 (74.92%) Loss: 2.316452 LR: 0.00001128 +[15:37:31] Epoch: 1 Batch: 15059/20099 (74.92%) Loss: 2.140406 LR: 0.00001128 +[15:37:33] Epoch: 1 Batch: 15060/20099 (74.93%) Loss: 2.020577 LR: 0.00001128 +[15:37:35] Epoch: 1 Batch: 15061/20099 (74.93%) Loss: 2.142670 LR: 0.00001128 +[15:37:37] Epoch: 1 Batch: 15062/20099 (74.94%) Loss: 2.216624 LR: 0.00001128 +[15:37:39] Epoch: 1 Batch: 15063/20099 (74.94%) Loss: 1.961823 LR: 0.00001128 +[15:37:41] Epoch: 1 Batch: 15064/20099 (74.95%) Loss: 2.321778 LR: 0.00001128 +[15:37:43] Epoch: 1 Batch: 15065/20099 (74.95%) Loss: 2.136083 LR: 0.00001126 +[15:37:44] Epoch: 1 Batch: 15066/20099 (74.96%) Loss: 1.960380 LR: 0.00001126 +[15:37:46] Epoch: 1 Batch: 15067/20099 (74.96%) Loss: 2.602595 LR: 0.00001126 +[15:37:48] Epoch: 1 Batch: 15068/20099 (74.97%) Loss: 2.209375 LR: 0.00001126 +[15:37:50] Epoch: 1 Batch: 15069/20099 (74.97%) Loss: 1.821879 LR: 0.00001126 +[15:37:52] Epoch: 1 Batch: 15070/20099 (74.98%) Loss: 1.969225 LR: 0.00001126 +[15:37:54] Epoch: 1 Batch: 15071/20099 (74.98%) Loss: 2.187813 LR: 0.00001126 +[15:37:56] Epoch: 1 Batch: 15072/20099 (74.99%) Loss: 2.255141 LR: 0.00001125 +[15:37:57] Epoch: 1 Batch: 15073/20099 (74.99%) Loss: 1.948574 LR: 0.00001125 +[15:37:59] Epoch: 1 Batch: 15074/20099 (75.00%) Loss: 2.046085 LR: 0.00001125 +[15:38:01] Epoch: 1 Batch: 15075/20099 (75.00%) Loss: 2.074902 LR: 0.00001125 +[15:38:03] Epoch: 1 Batch: 15076/20099 (75.01%) Loss: 2.158692 LR: 0.00001125 +[15:38:05] Epoch: 1 Batch: 15077/20099 (75.01%) Loss: 1.937281 LR: 0.00001125 +[15:38:07] Epoch: 1 Batch: 15078/20099 (75.02%) Loss: 2.248474 LR: 0.00001125 +[15:38:09] Epoch: 1 Batch: 15079/20099 (75.02%) Loss: 2.176357 LR: 
0.00001123 +[15:38:10] Epoch: 1 Batch: 15080/20099 (75.03%) Loss: 1.968749 LR: 0.00001123 +[15:38:12] Epoch: 1 Batch: 15081/20099 (75.03%) Loss: 2.186950 LR: 0.00001123 +[15:38:14] Epoch: 1 Batch: 15082/20099 (75.04%) Loss: 2.097704 LR: 0.00001123 +[15:38:16] Epoch: 1 Batch: 15083/20099 (75.04%) Loss: 1.596262 LR: 0.00001123 +[15:38:18] Epoch: 1 Batch: 15084/20099 (75.05%) Loss: 2.188342 LR: 0.00001123 +[15:38:20] Epoch: 1 Batch: 15085/20099 (75.05%) Loss: 2.295798 LR: 0.00001123 +[15:38:22] Epoch: 1 Batch: 15086/20099 (75.06%) Loss: 1.961109 LR: 0.00001122 +[15:38:23] Epoch: 1 Batch: 15087/20099 (75.06%) Loss: 1.931898 LR: 0.00001122 +[15:38:25] Epoch: 1 Batch: 15088/20099 (75.07%) Loss: 2.135632 LR: 0.00001122 +[15:38:27] Epoch: 1 Batch: 15089/20099 (75.07%) Loss: 2.054744 LR: 0.00001122 +[15:38:29] Epoch: 1 Batch: 15090/20099 (75.08%) Loss: 2.379980 LR: 0.00001122 +[15:38:31] Epoch: 1 Batch: 15091/20099 (75.08%) Loss: 2.355986 LR: 0.00001122 +[15:38:33] Epoch: 1 Batch: 15092/20099 (75.09%) Loss: 2.110699 LR: 0.00001122 +[15:38:35] Epoch: 1 Batch: 15093/20099 (75.09%) Loss: 1.870952 LR: 0.00001121 +[15:38:36] Epoch: 1 Batch: 15094/20099 (75.10%) Loss: 1.910584 LR: 0.00001121 +[15:38:38] Epoch: 1 Batch: 15095/20099 (75.10%) Loss: 2.638815 LR: 0.00001121 +[15:38:40] Epoch: 1 Batch: 15096/20099 (75.11%) Loss: 1.994974 LR: 0.00001121 +[15:38:42] Epoch: 1 Batch: 15097/20099 (75.11%) Loss: 2.007473 LR: 0.00001121 +[15:38:44] Epoch: 1 Batch: 15098/20099 (75.12%) Loss: 2.239409 LR: 0.00001121 +[15:38:46] Epoch: 1 Batch: 15099/20099 (75.12%) Loss: 2.126755 LR: 0.00001121 +[15:38:48] Epoch: 1 Batch: 15100/20099 (75.13%) Loss: 1.957310 LR: 0.00001119 +[15:38:49] Epoch: 1 Batch: 15101/20099 (75.13%) Loss: 1.907703 LR: 0.00001119 +[15:38:51] Epoch: 1 Batch: 15102/20099 (75.14%) Loss: 2.063211 LR: 0.00001119 +[15:38:53] Epoch: 1 Batch: 15103/20099 (75.14%) Loss: 2.293962 LR: 0.00001119 +[15:38:55] Epoch: 1 Batch: 15104/20099 (75.15%) Loss: 2.286539 LR: 0.00001119 +[15:38:57] 
Epoch: 1 Batch: 15105/20099 (75.15%) Loss: 1.864180 LR: 0.00001119 +[15:38:59] Epoch: 1 Batch: 15106/20099 (75.16%) Loss: 1.766424 LR: 0.00001119 +[15:39:00] Epoch: 1 Batch: 15107/20099 (75.16%) Loss: 2.404510 LR: 0.00001118 +[15:39:02] Epoch: 1 Batch: 15108/20099 (75.17%) Loss: 2.285655 LR: 0.00001118 +[15:39:04] Epoch: 1 Batch: 15109/20099 (75.17%) Loss: 1.667720 LR: 0.00001118 +[15:39:06] Epoch: 1 Batch: 15110/20099 (75.18%) Loss: 1.895698 LR: 0.00001118 +[15:39:08] Epoch: 1 Batch: 15111/20099 (75.18%) Loss: 2.287520 LR: 0.00001118 +[15:39:10] Epoch: 1 Batch: 15112/20099 (75.19%) Loss: 2.028039 LR: 0.00001118 +[15:39:12] Epoch: 1 Batch: 15113/20099 (75.19%) Loss: 2.111310 LR: 0.00001118 +[15:39:13] Epoch: 1 Batch: 15114/20099 (75.20%) Loss: 2.129694 LR: 0.00001117 +[15:39:15] Epoch: 1 Batch: 15115/20099 (75.20%) Loss: 2.164134 LR: 0.00001117 +[15:39:17] Epoch: 1 Batch: 15116/20099 (75.21%) Loss: 2.100262 LR: 0.00001117 +[15:39:19] Epoch: 1 Batch: 15117/20099 (75.21%) Loss: 2.176819 LR: 0.00001117 +[15:39:21] Epoch: 1 Batch: 15118/20099 (75.22%) Loss: 2.366236 LR: 0.00001117 +[15:39:23] Epoch: 1 Batch: 15119/20099 (75.22%) Loss: 2.155540 LR: 0.00001117 +[15:39:25] Epoch: 1 Batch: 15120/20099 (75.23%) Loss: 1.939144 LR: 0.00001117 +[15:39:26] Epoch: 1 Batch: 15121/20099 (75.23%) Loss: 2.072552 LR: 0.00001115 +[15:39:28] Epoch: 1 Batch: 15122/20099 (75.24%) Loss: 2.101533 LR: 0.00001115 +[15:39:30] Epoch: 1 Batch: 15123/20099 (75.24%) Loss: 2.016602 LR: 0.00001115 +[15:39:32] Epoch: 1 Batch: 15124/20099 (75.25%) Loss: 2.112617 LR: 0.00001115 +[15:39:34] Epoch: 1 Batch: 15125/20099 (75.25%) Loss: 2.362927 LR: 0.00001115 +[15:39:36] Epoch: 1 Batch: 15126/20099 (75.26%) Loss: 2.051871 LR: 0.00001115 +[15:39:38] Epoch: 1 Batch: 15127/20099 (75.26%) Loss: 2.078336 LR: 0.00001115 +[15:39:39] Epoch: 1 Batch: 15128/20099 (75.27%) Loss: 2.379160 LR: 0.00001114 +[15:39:41] Epoch: 1 Batch: 15129/20099 (75.27%) Loss: 2.372590 LR: 0.00001114 +[15:39:43] Epoch: 1 Batch: 
15130/20099 (75.28%) Loss: 2.012220 LR: 0.00001114 +[15:39:45] Epoch: 1 Batch: 15131/20099 (75.28%) Loss: 2.287977 LR: 0.00001114 +[15:39:47] Epoch: 1 Batch: 15132/20099 (75.29%) Loss: 2.316362 LR: 0.00001114 +[15:39:49] Epoch: 1 Batch: 15133/20099 (75.29%) Loss: 2.515020 LR: 0.00001114 +[15:39:51] Epoch: 1 Batch: 15134/20099 (75.30%) Loss: 2.013502 LR: 0.00001114 +[15:39:52] Epoch: 1 Batch: 15135/20099 (75.30%) Loss: 2.194878 LR: 0.00001113 +[15:39:54] Epoch: 1 Batch: 15136/20099 (75.31%) Loss: 2.347776 LR: 0.00001113 +[15:39:56] Epoch: 1 Batch: 15137/20099 (75.31%) Loss: 2.251693 LR: 0.00001113 +[15:39:58] Epoch: 1 Batch: 15138/20099 (75.32%) Loss: 2.229572 LR: 0.00001113 +[15:40:00] Epoch: 1 Batch: 15139/20099 (75.32%) Loss: 2.148108 LR: 0.00001113 +[15:40:02] Epoch: 1 Batch: 15140/20099 (75.33%) Loss: 1.839326 LR: 0.00001113 +[15:40:04] Epoch: 1 Batch: 15141/20099 (75.33%) Loss: 1.935631 LR: 0.00001113 +[15:40:05] Epoch: 1 Batch: 15142/20099 (75.34%) Loss: 2.234335 LR: 0.00001111 +[15:40:07] Epoch: 1 Batch: 15143/20099 (75.34%) Loss: 2.447953 LR: 0.00001111 +[15:40:09] Epoch: 1 Batch: 15144/20099 (75.35%) Loss: 2.291926 LR: 0.00001111 +[15:40:11] Epoch: 1 Batch: 15145/20099 (75.35%) Loss: 2.160417 LR: 0.00001111 +[15:40:13] Epoch: 1 Batch: 15146/20099 (75.36%) Loss: 2.205726 LR: 0.00001111 +[15:40:15] Epoch: 1 Batch: 15147/20099 (75.36%) Loss: 2.252620 LR: 0.00001111 +[15:40:17] Epoch: 1 Batch: 15148/20099 (75.37%) Loss: 2.209413 LR: 0.00001111 +[15:40:18] Epoch: 1 Batch: 15149/20099 (75.37%) Loss: 2.293726 LR: 0.00001110 +[15:40:20] Epoch: 1 Batch: 15150/20099 (75.38%) Loss: 1.894569 LR: 0.00001110 +[15:40:22] Epoch: 1 Batch: 15151/20099 (75.38%) Loss: 2.105061 LR: 0.00001110 +[15:40:24] Epoch: 1 Batch: 15152/20099 (75.39%) Loss: 2.204687 LR: 0.00001110 +[15:40:26] Epoch: 1 Batch: 15153/20099 (75.39%) Loss: 1.987600 LR: 0.00001110 +[15:40:28] Epoch: 1 Batch: 15154/20099 (75.40%) Loss: 2.045967 LR: 0.00001110 +[15:40:30] Epoch: 1 Batch: 15155/20099 (75.40%) 
Loss: 2.612744 LR: 0.00001110 +[15:40:31] Epoch: 1 Batch: 15156/20099 (75.41%) Loss: 2.007379 LR: 0.00001109 +[15:40:33] Epoch: 1 Batch: 15157/20099 (75.41%) Loss: 1.999689 LR: 0.00001109 +[15:40:35] Epoch: 1 Batch: 15158/20099 (75.42%) Loss: 2.134153 LR: 0.00001109 +[15:40:37] Epoch: 1 Batch: 15159/20099 (75.42%) Loss: 2.215169 LR: 0.00001109 +[15:40:39] Epoch: 1 Batch: 15160/20099 (75.43%) Loss: 2.095916 LR: 0.00001109 +[15:40:41] Epoch: 1 Batch: 15161/20099 (75.43%) Loss: 2.381615 LR: 0.00001109 +[15:40:43] Epoch: 1 Batch: 15162/20099 (75.44%) Loss: 2.090457 LR: 0.00001109 +[15:40:44] Epoch: 1 Batch: 15163/20099 (75.44%) Loss: 2.434157 LR: 0.00001107 +[15:40:46] Epoch: 1 Batch: 15164/20099 (75.45%) Loss: 2.006435 LR: 0.00001107 +[15:40:48] Epoch: 1 Batch: 15165/20099 (75.45%) Loss: 2.003097 LR: 0.00001107 +[15:40:50] Epoch: 1 Batch: 15166/20099 (75.46%) Loss: 2.111379 LR: 0.00001107 +[15:40:52] Epoch: 1 Batch: 15167/20099 (75.46%) Loss: 2.273332 LR: 0.00001107 +[15:40:54] Epoch: 1 Batch: 15168/20099 (75.47%) Loss: 1.987785 LR: 0.00001107 +[15:40:56] Epoch: 1 Batch: 15169/20099 (75.47%) Loss: 2.169081 LR: 0.00001107 +[15:40:57] Epoch: 1 Batch: 15170/20099 (75.48%) Loss: 2.141636 LR: 0.00001106 +[15:40:59] Epoch: 1 Batch: 15171/20099 (75.48%) Loss: 2.186105 LR: 0.00001106 +[15:41:01] Epoch: 1 Batch: 15172/20099 (75.49%) Loss: 1.819408 LR: 0.00001106 +[15:41:03] Epoch: 1 Batch: 15173/20099 (75.49%) Loss: 2.144169 LR: 0.00001106 +[15:41:05] Epoch: 1 Batch: 15174/20099 (75.50%) Loss: 2.090658 LR: 0.00001106 +[15:41:07] Epoch: 1 Batch: 15175/20099 (75.50%) Loss: 2.374142 LR: 0.00001106 +[15:41:09] Epoch: 1 Batch: 15176/20099 (75.51%) Loss: 2.120179 LR: 0.00001106 +[15:41:10] Epoch: 1 Batch: 15177/20099 (75.51%) Loss: 2.191149 LR: 0.00001105 +[15:41:12] Epoch: 1 Batch: 15178/20099 (75.52%) Loss: 2.171837 LR: 0.00001105 +[15:41:14] Epoch: 1 Batch: 15179/20099 (75.52%) Loss: 2.180700 LR: 0.00001105 +[15:41:16] Epoch: 1 Batch: 15180/20099 (75.53%) Loss: 1.980951 LR: 
0.00001105 +[15:41:18] Epoch: 1 Batch: 15181/20099 (75.53%) Loss: 2.029532 LR: 0.00001105 +[15:41:20] Epoch: 1 Batch: 15182/20099 (75.54%) Loss: 2.200535 LR: 0.00001105 +[15:41:22] Epoch: 1 Batch: 15183/20099 (75.54%) Loss: 2.114560 LR: 0.00001105 +[15:41:23] Epoch: 1 Batch: 15184/20099 (75.55%) Loss: 2.073947 LR: 0.00001103 +[15:41:25] Epoch: 1 Batch: 15185/20099 (75.55%) Loss: 2.123386 LR: 0.00001103 +[15:41:27] Epoch: 1 Batch: 15186/20099 (75.56%) Loss: 2.229039 LR: 0.00001103 +[15:41:29] Epoch: 1 Batch: 15187/20099 (75.56%) Loss: 2.150718 LR: 0.00001103 +[15:41:31] Epoch: 1 Batch: 15188/20099 (75.57%) Loss: 2.167921 LR: 0.00001103 +[15:41:33] Epoch: 1 Batch: 15189/20099 (75.57%) Loss: 2.082790 LR: 0.00001103 +[15:41:35] Epoch: 1 Batch: 15190/20099 (75.58%) Loss: 1.978480 LR: 0.00001103 +[15:41:36] Epoch: 1 Batch: 15191/20099 (75.58%) Loss: 2.058733 LR: 0.00001102 +[15:41:38] Epoch: 1 Batch: 15192/20099 (75.59%) Loss: 2.272103 LR: 0.00001102 +[15:41:40] Epoch: 1 Batch: 15193/20099 (75.59%) Loss: 2.259006 LR: 0.00001102 +[15:41:42] Epoch: 1 Batch: 15194/20099 (75.60%) Loss: 2.281334 LR: 0.00001102 +[15:41:44] Epoch: 1 Batch: 15195/20099 (75.60%) Loss: 2.152300 LR: 0.00001102 +[15:41:46] Epoch: 1 Batch: 15196/20099 (75.61%) Loss: 2.045691 LR: 0.00001102 +[15:41:48] Epoch: 1 Batch: 15197/20099 (75.61%) Loss: 2.313630 LR: 0.00001102 +[15:41:49] Epoch: 1 Batch: 15198/20099 (75.62%) Loss: 2.111108 LR: 0.00001101 +[15:41:51] Epoch: 1 Batch: 15199/20099 (75.62%) Loss: 1.731439 LR: 0.00001101 +[15:41:57] >> Cleaned up old temp checkpoint: epoch1_step13200 +[15:41:57] >> Temp checkpoint saved: epoch1_step15200, size: 0.1693 GB +[15:41:57] Epoch: 1 Batch: 15200/20099 (75.63%) Loss: 2.114590 LR: 0.00001101 +[15:41:59] Epoch: 1 Batch: 15201/20099 (75.63%) Loss: 2.403763 LR: 0.00001101 +[15:42:00] Epoch: 1 Batch: 15202/20099 (75.64%) Loss: 2.000744 LR: 0.00001101 +[15:42:02] Epoch: 1 Batch: 15203/20099 (75.64%) Loss: 1.870393 LR: 0.00001101 +[15:42:04] Epoch: 1 Batch: 
15204/20099 (75.65%) Loss: 2.191642 LR: 0.00001101 +[15:42:06] Epoch: 1 Batch: 15205/20099 (75.65%) Loss: 2.082978 LR: 0.00001100 +[15:42:08] Epoch: 1 Batch: 15206/20099 (75.66%) Loss: 2.329272 LR: 0.00001100 +[15:42:10] Epoch: 1 Batch: 15207/20099 (75.66%) Loss: 2.268017 LR: 0.00001100 +[15:42:11] Epoch: 1 Batch: 15208/20099 (75.67%) Loss: 2.089405 LR: 0.00001100 +[15:42:13] Epoch: 1 Batch: 15209/20099 (75.67%) Loss: 2.299042 LR: 0.00001100 +[15:42:15] Epoch: 1 Batch: 15210/20099 (75.68%) Loss: 1.971114 LR: 0.00001100 +[15:42:17] Epoch: 1 Batch: 15211/20099 (75.68%) Loss: 2.050303 LR: 0.00001100 +[15:42:19] Epoch: 1 Batch: 15212/20099 (75.69%) Loss: 1.794920 LR: 0.00001098 +[15:42:21] Epoch: 1 Batch: 15213/20099 (75.69%) Loss: 2.208009 LR: 0.00001098 +[15:42:23] Epoch: 1 Batch: 15214/20099 (75.70%) Loss: 2.036736 LR: 0.00001098 +[15:42:24] Epoch: 1 Batch: 15215/20099 (75.70%) Loss: 1.895869 LR: 0.00001098 +[15:42:26] Epoch: 1 Batch: 15216/20099 (75.71%) Loss: 2.079580 LR: 0.00001098 +[15:42:28] Epoch: 1 Batch: 15217/20099 (75.71%) Loss: 2.061784 LR: 0.00001098 +[15:42:30] Epoch: 1 Batch: 15218/20099 (75.72%) Loss: 2.211788 LR: 0.00001098 +[15:42:32] Epoch: 1 Batch: 15219/20099 (75.72%) Loss: 2.178365 LR: 0.00001097 +[15:42:34] Epoch: 1 Batch: 15220/20099 (75.73%) Loss: 2.153729 LR: 0.00001097 +[15:42:36] Epoch: 1 Batch: 15221/20099 (75.73%) Loss: 1.927995 LR: 0.00001097 +[15:42:38] Epoch: 1 Batch: 15222/20099 (75.74%) Loss: 1.944123 LR: 0.00001097 +[15:42:39] Epoch: 1 Batch: 15223/20099 (75.74%) Loss: 1.749714 LR: 0.00001097 +[15:42:41] Epoch: 1 Batch: 15224/20099 (75.75%) Loss: 1.868667 LR: 0.00001097 +[15:42:43] Epoch: 1 Batch: 15225/20099 (75.75%) Loss: 2.048191 LR: 0.00001097 +[15:42:45] Epoch: 1 Batch: 15226/20099 (75.76%) Loss: 1.875780 LR: 0.00001096 +[15:42:47] Epoch: 1 Batch: 15227/20099 (75.76%) Loss: 2.339383 LR: 0.00001096 +[15:42:49] Epoch: 1 Batch: 15228/20099 (75.76%) Loss: 2.075808 LR: 0.00001096 +[15:42:51] Epoch: 1 Batch: 15229/20099 (75.77%) 
Loss: 2.072983 LR: 0.00001096 +[15:42:52] Epoch: 1 Batch: 15230/20099 (75.77%) Loss: 2.109290 LR: 0.00001096 +[15:42:54] Epoch: 1 Batch: 15231/20099 (75.78%) Loss: 2.354695 LR: 0.00001096 +[15:42:56] Epoch: 1 Batch: 15232/20099 (75.78%) Loss: 1.915908 LR: 0.00001096 +[15:42:58] Epoch: 1 Batch: 15233/20099 (75.79%) Loss: 2.061069 LR: 0.00001094 +[15:43:00] Epoch: 1 Batch: 15234/20099 (75.79%) Loss: 2.223853 LR: 0.00001094 +[15:43:02] Epoch: 1 Batch: 15235/20099 (75.80%) Loss: 2.138276 LR: 0.00001094 +[15:43:04] Epoch: 1 Batch: 15236/20099 (75.80%) Loss: 2.059806 LR: 0.00001094 +[15:43:05] Epoch: 1 Batch: 15237/20099 (75.81%) Loss: 2.079848 LR: 0.00001094 +[15:43:07] Epoch: 1 Batch: 15238/20099 (75.81%) Loss: 1.966143 LR: 0.00001094 +[15:43:09] Epoch: 1 Batch: 15239/20099 (75.82%) Loss: 2.286017 LR: 0.00001094 +[15:43:11] Epoch: 1 Batch: 15240/20099 (75.82%) Loss: 1.834256 LR: 0.00001093 +[15:43:13] Epoch: 1 Batch: 15241/20099 (75.83%) Loss: 2.136604 LR: 0.00001093 +[15:43:15] Epoch: 1 Batch: 15242/20099 (75.83%) Loss: 2.097176 LR: 0.00001093 +[15:43:16] Epoch: 1 Batch: 15243/20099 (75.84%) Loss: 2.202451 LR: 0.00001093 +[15:43:18] Epoch: 1 Batch: 15244/20099 (75.84%) Loss: 2.220344 LR: 0.00001093 +[15:43:20] Epoch: 1 Batch: 15245/20099 (75.85%) Loss: 1.805936 LR: 0.00001093 +[15:43:22] Epoch: 1 Batch: 15246/20099 (75.85%) Loss: 2.165530 LR: 0.00001093 +[15:43:24] Epoch: 1 Batch: 15247/20099 (75.86%) Loss: 1.965137 LR: 0.00001092 +[15:43:26] Epoch: 1 Batch: 15248/20099 (75.86%) Loss: 1.925427 LR: 0.00001092 +[15:43:27] Epoch: 1 Batch: 15249/20099 (75.87%) Loss: 2.194438 LR: 0.00001092 +[15:43:29] Epoch: 1 Batch: 15250/20099 (75.87%) Loss: 2.178713 LR: 0.00001092 +[15:43:31] Epoch: 1 Batch: 15251/20099 (75.88%) Loss: 2.013268 LR: 0.00001092 +[15:43:33] Epoch: 1 Batch: 15252/20099 (75.88%) Loss: 2.151851 LR: 0.00001092 +[15:43:35] Epoch: 1 Batch: 15253/20099 (75.89%) Loss: 2.062140 LR: 0.00001092 +[15:43:37] Epoch: 1 Batch: 15254/20099 (75.89%) Loss: 2.106021 LR: 
0.00001090 +[15:43:39] Epoch: 1 Batch: 15255/20099 (75.90%) Loss: 2.180043 LR: 0.00001090 +[15:43:40] Epoch: 1 Batch: 15256/20099 (75.90%) Loss: 1.914078 LR: 0.00001090 +[15:43:42] Epoch: 1 Batch: 15257/20099 (75.91%) Loss: 2.175543 LR: 0.00001090 +[15:43:44] Epoch: 1 Batch: 15258/20099 (75.91%) Loss: 1.924527 LR: 0.00001090 +[15:43:46] Epoch: 1 Batch: 15259/20099 (75.92%) Loss: 2.287408 LR: 0.00001090 +[15:43:48] Epoch: 1 Batch: 15260/20099 (75.92%) Loss: 2.005952 LR: 0.00001090 +[15:43:50] Epoch: 1 Batch: 15261/20099 (75.93%) Loss: 1.861841 LR: 0.00001089 +[15:43:52] Epoch: 1 Batch: 15262/20099 (75.93%) Loss: 2.221993 LR: 0.00001089 +[15:43:53] Epoch: 1 Batch: 15263/20099 (75.94%) Loss: 2.232285 LR: 0.00001089 +[15:43:55] Epoch: 1 Batch: 15264/20099 (75.94%) Loss: 2.102482 LR: 0.00001089 +[15:43:57] Epoch: 1 Batch: 15265/20099 (75.95%) Loss: 2.280500 LR: 0.00001089 +[15:43:59] Epoch: 1 Batch: 15266/20099 (75.95%) Loss: 2.012672 LR: 0.00001089 +[15:44:01] Epoch: 1 Batch: 15267/20099 (75.96%) Loss: 1.807153 LR: 0.00001089 +[15:44:03] Epoch: 1 Batch: 15268/20099 (75.96%) Loss: 2.244059 LR: 0.00001088 +[15:44:05] Epoch: 1 Batch: 15269/20099 (75.97%) Loss: 2.238489 LR: 0.00001088 +[15:44:06] Epoch: 1 Batch: 15270/20099 (75.97%) Loss: 2.039292 LR: 0.00001088 +[15:44:08] Epoch: 1 Batch: 15271/20099 (75.98%) Loss: 2.141555 LR: 0.00001088 +[15:44:10] Epoch: 1 Batch: 15272/20099 (75.98%) Loss: 2.332460 LR: 0.00001088 +[15:44:12] Epoch: 1 Batch: 15273/20099 (75.99%) Loss: 2.246798 LR: 0.00001088 +[15:44:14] Epoch: 1 Batch: 15274/20099 (75.99%) Loss: 2.157036 LR: 0.00001088 +[15:44:16] Epoch: 1 Batch: 15275/20099 (76.00%) Loss: 2.055009 LR: 0.00001086 +[15:44:18] Epoch: 1 Batch: 15276/20099 (76.00%) Loss: 2.000980 LR: 0.00001086 +[15:44:20] Epoch: 1 Batch: 15277/20099 (76.01%) Loss: 2.221381 LR: 0.00001086 +[15:44:21] Epoch: 1 Batch: 15278/20099 (76.01%) Loss: 1.990639 LR: 0.00001086 +[15:44:23] Epoch: 1 Batch: 15279/20099 (76.02%) Loss: 1.966292 LR: 0.00001086 +[15:44:25] 
Epoch: 1 Batch: 15280/20099 (76.02%) Loss: 2.002801 LR: 0.00001086 +[15:44:27] Epoch: 1 Batch: 15281/20099 (76.03%) Loss: 1.885616 LR: 0.00001086 +[15:44:29] Epoch: 1 Batch: 15282/20099 (76.03%) Loss: 2.103451 LR: 0.00001085 +[15:44:31] Epoch: 1 Batch: 15283/20099 (76.04%) Loss: 1.893052 LR: 0.00001085 +[15:44:33] Epoch: 1 Batch: 15284/20099 (76.04%) Loss: 2.122659 LR: 0.00001085 +[15:44:34] Epoch: 1 Batch: 15285/20099 (76.05%) Loss: 1.978359 LR: 0.00001085 +[15:44:36] Epoch: 1 Batch: 15286/20099 (76.05%) Loss: 1.936689 LR: 0.00001085 +[15:44:38] Epoch: 1 Batch: 15287/20099 (76.06%) Loss: 1.928531 LR: 0.00001085 +[15:44:40] Epoch: 1 Batch: 15288/20099 (76.06%) Loss: 1.824268 LR: 0.00001085 +[15:44:42] Epoch: 1 Batch: 15289/20099 (76.07%) Loss: 2.047624 LR: 0.00001084 +[15:44:44] Epoch: 1 Batch: 15290/20099 (76.07%) Loss: 2.014981 LR: 0.00001084 +[15:44:46] Epoch: 1 Batch: 15291/20099 (76.08%) Loss: 1.934533 LR: 0.00001084 +[15:44:47] Epoch: 1 Batch: 15292/20099 (76.08%) Loss: 2.510976 LR: 0.00001084 +[15:44:49] Epoch: 1 Batch: 15293/20099 (76.09%) Loss: 2.062315 LR: 0.00001084 +[15:44:51] Epoch: 1 Batch: 15294/20099 (76.09%) Loss: 1.954675 LR: 0.00001084 +[15:44:53] Epoch: 1 Batch: 15295/20099 (76.10%) Loss: 1.884975 LR: 0.00001084 +[15:44:55] Epoch: 1 Batch: 15296/20099 (76.10%) Loss: 1.583232 LR: 0.00001082 +[15:44:57] Epoch: 1 Batch: 15297/20099 (76.11%) Loss: 1.930627 LR: 0.00001082 +[15:44:59] Epoch: 1 Batch: 15298/20099 (76.11%) Loss: 1.848381 LR: 0.00001082 +[15:45:00] Epoch: 1 Batch: 15299/20099 (76.12%) Loss: 1.943404 LR: 0.00001082 +[15:45:02] Epoch: 1 Batch: 15300/20099 (76.12%) Loss: 2.124398 LR: 0.00001082 +[15:45:04] Epoch: 1 Batch: 15301/20099 (76.13%) Loss: 2.343056 LR: 0.00001082 +[15:45:06] Epoch: 1 Batch: 15302/20099 (76.13%) Loss: 2.000821 LR: 0.00001082 +[15:45:08] Epoch: 1 Batch: 15303/20099 (76.14%) Loss: 1.924122 LR: 0.00001081 +[15:45:10] Epoch: 1 Batch: 15304/20099 (76.14%) Loss: 2.076907 LR: 0.00001081 +[15:45:11] Epoch: 1 Batch: 
15305/20099 (76.15%) Loss: 2.070025 LR: 0.00001081 +[15:45:13] Epoch: 1 Batch: 15306/20099 (76.15%) Loss: 2.202655 LR: 0.00001081 +[15:45:15] Epoch: 1 Batch: 15307/20099 (76.16%) Loss: 2.066310 LR: 0.00001081 +[15:45:17] Epoch: 1 Batch: 15308/20099 (76.16%) Loss: 1.984949 LR: 0.00001081 +[15:45:19] Epoch: 1 Batch: 15309/20099 (76.17%) Loss: 2.378587 LR: 0.00001081 +[15:45:21] Epoch: 1 Batch: 15310/20099 (76.17%) Loss: 2.128044 LR: 0.00001080 +[15:45:23] Epoch: 1 Batch: 15311/20099 (76.18%) Loss: 2.329441 LR: 0.00001080 +[15:45:24] Epoch: 1 Batch: 15312/20099 (76.18%) Loss: 2.148195 LR: 0.00001080 +[15:45:26] Epoch: 1 Batch: 15313/20099 (76.19%) Loss: 1.908280 LR: 0.00001080 +[15:45:28] Epoch: 1 Batch: 15314/20099 (76.19%) Loss: 2.214649 LR: 0.00001080 +[15:45:30] Epoch: 1 Batch: 15315/20099 (76.20%) Loss: 2.294306 LR: 0.00001080 +[15:45:32] Epoch: 1 Batch: 15316/20099 (76.20%) Loss: 1.850348 LR: 0.00001080 +[15:45:34] Epoch: 1 Batch: 15317/20099 (76.21%) Loss: 2.029615 LR: 0.00001079 +[15:45:36] Epoch: 1 Batch: 15318/20099 (76.21%) Loss: 1.921311 LR: 0.00001079 +[15:45:37] Epoch: 1 Batch: 15319/20099 (76.22%) Loss: 2.143719 LR: 0.00001079 +[15:45:39] Epoch: 1 Batch: 15320/20099 (76.22%) Loss: 2.155460 LR: 0.00001079 +[15:45:41] Epoch: 1 Batch: 15321/20099 (76.23%) Loss: 2.037797 LR: 0.00001079 +[15:45:43] Epoch: 1 Batch: 15322/20099 (76.23%) Loss: 2.003455 LR: 0.00001079 +[15:45:45] Epoch: 1 Batch: 15323/20099 (76.24%) Loss: 2.020525 LR: 0.00001079 +[15:45:47] Epoch: 1 Batch: 15324/20099 (76.24%) Loss: 1.798068 LR: 0.00001077 +[15:45:49] Epoch: 1 Batch: 15325/20099 (76.25%) Loss: 2.075258 LR: 0.00001077 +[15:45:50] Epoch: 1 Batch: 15326/20099 (76.25%) Loss: 1.986336 LR: 0.00001077 +[15:45:52] Epoch: 1 Batch: 15327/20099 (76.26%) Loss: 2.221890 LR: 0.00001077 +[15:45:54] Epoch: 1 Batch: 15328/20099 (76.26%) Loss: 2.216751 LR: 0.00001077 +[15:45:56] Epoch: 1 Batch: 15329/20099 (76.27%) Loss: 2.171168 LR: 0.00001077 +[15:45:58] Epoch: 1 Batch: 15330/20099 (76.27%) 
Loss: 2.033912 LR: 0.00001077 +[15:46:00] Epoch: 1 Batch: 15331/20099 (76.28%) Loss: 2.283942 LR: 0.00001076 +[15:46:02] Epoch: 1 Batch: 15332/20099 (76.28%) Loss: 1.884410 LR: 0.00001076 +[15:46:03] Epoch: 1 Batch: 15333/20099 (76.29%) Loss: 1.910356 LR: 0.00001076 +[15:46:05] Epoch: 1 Batch: 15334/20099 (76.29%) Loss: 1.671831 LR: 0.00001076 +[15:46:07] Epoch: 1 Batch: 15335/20099 (76.30%) Loss: 1.978758 LR: 0.00001076 +[15:46:09] Epoch: 1 Batch: 15336/20099 (76.30%) Loss: 2.178091 LR: 0.00001076 +[15:46:11] Epoch: 1 Batch: 15337/20099 (76.31%) Loss: 2.288015 LR: 0.00001076 +[15:46:13] Epoch: 1 Batch: 15338/20099 (76.31%) Loss: 1.961298 LR: 0.00001075 +[15:46:14] Epoch: 1 Batch: 15339/20099 (76.32%) Loss: 1.953100 LR: 0.00001075 +[15:46:16] Epoch: 1 Batch: 15340/20099 (76.32%) Loss: 2.260576 LR: 0.00001075 +[15:46:18] Epoch: 1 Batch: 15341/20099 (76.33%) Loss: 2.070876 LR: 0.00001075 +[15:46:20] Epoch: 1 Batch: 15342/20099 (76.33%) Loss: 2.089956 LR: 0.00001075 +[15:46:22] Epoch: 1 Batch: 15343/20099 (76.34%) Loss: 2.010200 LR: 0.00001075 +[15:46:24] Epoch: 1 Batch: 15344/20099 (76.34%) Loss: 1.958868 LR: 0.00001075 +[15:46:26] Epoch: 1 Batch: 15345/20099 (76.35%) Loss: 2.130991 LR: 0.00001073 +[15:46:27] Epoch: 1 Batch: 15346/20099 (76.35%) Loss: 2.244550 LR: 0.00001073 +[15:46:29] Epoch: 1 Batch: 15347/20099 (76.36%) Loss: 2.099769 LR: 0.00001073 +[15:46:31] Epoch: 1 Batch: 15348/20099 (76.36%) Loss: 2.222178 LR: 0.00001073 +[15:46:33] Epoch: 1 Batch: 15349/20099 (76.37%) Loss: 2.209086 LR: 0.00001073 +[15:46:35] Epoch: 1 Batch: 15350/20099 (76.37%) Loss: 2.145990 LR: 0.00001073 +[15:46:37] Epoch: 1 Batch: 15351/20099 (76.38%) Loss: 2.178428 LR: 0.00001073 +[15:46:39] Epoch: 1 Batch: 15352/20099 (76.38%) Loss: 2.200249 LR: 0.00001072 +[15:46:40] Epoch: 1 Batch: 15353/20099 (76.39%) Loss: 1.720559 LR: 0.00001072 +[15:46:42] Epoch: 1 Batch: 15354/20099 (76.39%) Loss: 1.886654 LR: 0.00001072 +[15:46:44] Epoch: 1 Batch: 15355/20099 (76.40%) Loss: 2.165847 LR: 
0.00001072 +[15:46:46] Epoch: 1 Batch: 15356/20099 (76.40%) Loss: 2.435343 LR: 0.00001072 +[15:46:48] Epoch: 1 Batch: 15357/20099 (76.41%) Loss: 1.827041 LR: 0.00001072 +[15:46:50] Epoch: 1 Batch: 15358/20099 (76.41%) Loss: 2.252287 LR: 0.00001072 +[15:46:51] Epoch: 1 Batch: 15359/20099 (76.42%) Loss: 1.803403 LR: 0.00001071 +[15:46:53] Epoch: 1 Batch: 15360/20099 (76.42%) Loss: 2.218782 LR: 0.00001071 +[15:46:55] Epoch: 1 Batch: 15361/20099 (76.43%) Loss: 1.906487 LR: 0.00001071 +[15:46:57] Epoch: 1 Batch: 15362/20099 (76.43%) Loss: 2.178492 LR: 0.00001071 +[15:46:59] Epoch: 1 Batch: 15363/20099 (76.44%) Loss: 1.837838 LR: 0.00001071 +[15:47:01] Epoch: 1 Batch: 15364/20099 (76.44%) Loss: 2.127402 LR: 0.00001071 +[15:47:03] Epoch: 1 Batch: 15365/20099 (76.45%) Loss: 2.148665 LR: 0.00001071 +[15:47:04] Epoch: 1 Batch: 15366/20099 (76.45%) Loss: 1.975127 LR: 0.00001070 +[15:47:06] Epoch: 1 Batch: 15367/20099 (76.46%) Loss: 2.035200 LR: 0.00001070 +[15:47:08] Epoch: 1 Batch: 15368/20099 (76.46%) Loss: 1.895653 LR: 0.00001070 +[15:47:10] Epoch: 1 Batch: 15369/20099 (76.47%) Loss: 2.330080 LR: 0.00001070 +[15:47:12] Epoch: 1 Batch: 15370/20099 (76.47%) Loss: 2.114565 LR: 0.00001070 +[15:47:14] Epoch: 1 Batch: 15371/20099 (76.48%) Loss: 2.262273 LR: 0.00001070 +[15:47:16] Epoch: 1 Batch: 15372/20099 (76.48%) Loss: 2.123597 LR: 0.00001070 +[15:47:17] Epoch: 1 Batch: 15373/20099 (76.49%) Loss: 2.242881 LR: 0.00001068 +[15:47:19] Epoch: 1 Batch: 15374/20099 (76.49%) Loss: 2.090059 LR: 0.00001068 +[15:47:21] Epoch: 1 Batch: 15375/20099 (76.50%) Loss: 1.761441 LR: 0.00001068 +[15:47:23] Epoch: 1 Batch: 15376/20099 (76.50%) Loss: 2.504808 LR: 0.00001068 +[15:47:25] Epoch: 1 Batch: 15377/20099 (76.51%) Loss: 1.810058 LR: 0.00001068 +[15:47:27] Epoch: 1 Batch: 15378/20099 (76.51%) Loss: 1.841511 LR: 0.00001068 +[15:47:28] Epoch: 1 Batch: 15379/20099 (76.52%) Loss: 2.082515 LR: 0.00001068 +[15:47:30] Epoch: 1 Batch: 15380/20099 (76.52%) Loss: 2.267555 LR: 0.00001067 +[15:47:32] 
Epoch: 1 Batch: 15381/20099 (76.53%) Loss: 2.239877 LR: 0.00001067 +[15:47:34] Epoch: 1 Batch: 15382/20099 (76.53%) Loss: 2.049970 LR: 0.00001067 +[15:47:36] Epoch: 1 Batch: 15383/20099 (76.54%) Loss: 1.867315 LR: 0.00001067 +[15:47:38] Epoch: 1 Batch: 15384/20099 (76.54%) Loss: 2.223927 LR: 0.00001067 +[15:47:40] Epoch: 1 Batch: 15385/20099 (76.55%) Loss: 1.880072 LR: 0.00001067 +[15:47:41] Epoch: 1 Batch: 15386/20099 (76.55%) Loss: 2.238926 LR: 0.00001067 +[15:47:43] Epoch: 1 Batch: 15387/20099 (76.56%) Loss: 1.876747 LR: 0.00001066 +[15:47:45] Epoch: 1 Batch: 15388/20099 (76.56%) Loss: 2.223061 LR: 0.00001066 +[15:47:47] Epoch: 1 Batch: 15389/20099 (76.57%) Loss: 2.282782 LR: 0.00001066 +[15:47:49] Epoch: 1 Batch: 15390/20099 (76.57%) Loss: 2.186871 LR: 0.00001066 +[15:47:51] Epoch: 1 Batch: 15391/20099 (76.58%) Loss: 2.287266 LR: 0.00001066 +[15:47:52] Epoch: 1 Batch: 15392/20099 (76.58%) Loss: 1.821151 LR: 0.00001066 +[15:47:54] Epoch: 1 Batch: 15393/20099 (76.59%) Loss: 1.907996 LR: 0.00001066 +[15:47:56] Epoch: 1 Batch: 15394/20099 (76.59%) Loss: 1.884081 LR: 0.00001064 +[15:47:58] Epoch: 1 Batch: 15395/20099 (76.60%) Loss: 2.300083 LR: 0.00001064 +[15:48:00] Epoch: 1 Batch: 15396/20099 (76.60%) Loss: 2.018670 LR: 0.00001064 +[15:48:02] Epoch: 1 Batch: 15397/20099 (76.61%) Loss: 2.046902 LR: 0.00001064 +[15:48:04] Epoch: 1 Batch: 15398/20099 (76.61%) Loss: 2.021716 LR: 0.00001064 +[15:48:05] Epoch: 1 Batch: 15399/20099 (76.62%) Loss: 1.923739 LR: 0.00001064 +[15:48:11] >> Cleaned up old temp checkpoint: epoch1_step13400 +[15:48:11] >> Temp checkpoint saved: epoch1_step15400, size: 0.1693 GB +[15:48:11] Epoch: 1 Batch: 15400/20099 (76.62%) Loss: 1.787343 LR: 0.00001064 +[15:48:13] Epoch: 1 Batch: 15401/20099 (76.63%) Loss: 2.077458 LR: 0.00001063 +[15:48:15] Epoch: 1 Batch: 15402/20099 (76.63%) Loss: 1.791352 LR: 0.00001063 +[15:48:17] Epoch: 1 Batch: 15403/20099 (76.64%) Loss: 2.085089 LR: 0.00001063 +[15:48:18] Epoch: 1 Batch: 15404/20099 (76.64%) Loss: 
2.108154 LR: 0.00001063 +[15:48:20] Epoch: 1 Batch: 15405/20099 (76.65%) Loss: 2.321947 LR: 0.00001063 +[15:48:22] Epoch: 1 Batch: 15406/20099 (76.65%) Loss: 2.095125 LR: 0.00001063 +[15:48:24] Epoch: 1 Batch: 15407/20099 (76.66%) Loss: 1.988294 LR: 0.00001063 +[15:48:26] Epoch: 1 Batch: 15408/20099 (76.66%) Loss: 2.074601 LR: 0.00001062 +[15:48:28] Epoch: 1 Batch: 15409/20099 (76.67%) Loss: 2.135297 LR: 0.00001062 +[15:48:30] Epoch: 1 Batch: 15410/20099 (76.67%) Loss: 2.262911 LR: 0.00001062 +[15:48:31] Epoch: 1 Batch: 15411/20099 (76.68%) Loss: 1.893217 LR: 0.00001062 +[15:48:33] Epoch: 1 Batch: 15412/20099 (76.68%) Loss: 2.199661 LR: 0.00001062 +[15:48:35] Epoch: 1 Batch: 15413/20099 (76.69%) Loss: 2.293222 LR: 0.00001062 +[15:48:37] Epoch: 1 Batch: 15414/20099 (76.69%) Loss: 2.060537 LR: 0.00001062 +[15:48:39] Epoch: 1 Batch: 15415/20099 (76.70%) Loss: 1.940843 LR: 0.00001061 +[15:48:41] Epoch: 1 Batch: 15416/20099 (76.70%) Loss: 1.998266 LR: 0.00001061 +[15:48:43] Epoch: 1 Batch: 15417/20099 (76.71%) Loss: 2.206871 LR: 0.00001061 +[15:48:44] Epoch: 1 Batch: 15418/20099 (76.71%) Loss: 2.037436 LR: 0.00001061 +[15:48:46] Epoch: 1 Batch: 15419/20099 (76.72%) Loss: 2.017669 LR: 0.00001061 +[15:48:48] Epoch: 1 Batch: 15420/20099 (76.72%) Loss: 1.915703 LR: 0.00001061 +[15:48:50] Epoch: 1 Batch: 15421/20099 (76.73%) Loss: 2.088032 LR: 0.00001061 +[15:48:52] Epoch: 1 Batch: 15422/20099 (76.73%) Loss: 1.935935 LR: 0.00001059 +[15:48:54] Epoch: 1 Batch: 15423/20099 (76.74%) Loss: 2.225381 LR: 0.00001059 +[15:48:56] Epoch: 1 Batch: 15424/20099 (76.74%) Loss: 2.021248 LR: 0.00001059 +[15:48:58] Epoch: 1 Batch: 15425/20099 (76.75%) Loss: 2.185432 LR: 0.00001059 +[15:48:59] Epoch: 1 Batch: 15426/20099 (76.75%) Loss: 2.328572 LR: 0.00001059 +[15:49:01] Epoch: 1 Batch: 15427/20099 (76.76%) Loss: 1.983002 LR: 0.00001059 +[15:49:03] Epoch: 1 Batch: 15428/20099 (76.76%) Loss: 1.960104 LR: 0.00001059 +[15:49:05] Epoch: 1 Batch: 15429/20099 (76.77%) Loss: 2.139424 LR: 0.00001058 
+[15:49:07] Epoch: 1 Batch: 15430/20099 (76.77%) Loss: 2.054262 LR: 0.00001058 +[15:49:09] Epoch: 1 Batch: 15431/20099 (76.77%) Loss: 2.104237 LR: 0.00001058 +[15:49:11] Epoch: 1 Batch: 15432/20099 (76.78%) Loss: 2.113359 LR: 0.00001058 +[15:49:12] Epoch: 1 Batch: 15433/20099 (76.78%) Loss: 2.174231 LR: 0.00001058 +[15:49:14] Epoch: 1 Batch: 15434/20099 (76.79%) Loss: 2.036298 LR: 0.00001058 +[15:49:16] Epoch: 1 Batch: 15435/20099 (76.79%) Loss: 2.316279 LR: 0.00001058 +[15:49:18] Epoch: 1 Batch: 15436/20099 (76.80%) Loss: 1.991347 LR: 0.00001057 +[15:49:20] Epoch: 1 Batch: 15437/20099 (76.80%) Loss: 2.198402 LR: 0.00001057 +[15:49:22] Epoch: 1 Batch: 15438/20099 (76.81%) Loss: 2.201159 LR: 0.00001057 +[15:49:23] Epoch: 1 Batch: 15439/20099 (76.81%) Loss: 1.845095 LR: 0.00001057 +[15:49:25] Epoch: 1 Batch: 15440/20099 (76.82%) Loss: 2.215903 LR: 0.00001057 +[15:49:27] Epoch: 1 Batch: 15441/20099 (76.82%) Loss: 2.004562 LR: 0.00001057 +[15:49:29] Epoch: 1 Batch: 15442/20099 (76.83%) Loss: 1.925024 LR: 0.00001057 +[15:49:31] Epoch: 1 Batch: 15443/20099 (76.83%) Loss: 2.285370 LR: 0.00001055 +[15:49:33] Epoch: 1 Batch: 15444/20099 (76.84%) Loss: 2.211355 LR: 0.00001055 +[15:49:35] Epoch: 1 Batch: 15445/20099 (76.84%) Loss: 1.970210 LR: 0.00001055 +[15:49:36] Epoch: 1 Batch: 15446/20099 (76.85%) Loss: 2.032976 LR: 0.00001055 +[15:49:38] Epoch: 1 Batch: 15447/20099 (76.85%) Loss: 1.953813 LR: 0.00001055 +[15:49:40] Epoch: 1 Batch: 15448/20099 (76.86%) Loss: 2.135137 LR: 0.00001055 +[15:49:42] Epoch: 1 Batch: 15449/20099 (76.86%) Loss: 1.843059 LR: 0.00001055 +[15:49:44] Epoch: 1 Batch: 15450/20099 (76.87%) Loss: 2.398548 LR: 0.00001054 +[15:49:46] Epoch: 1 Batch: 15451/20099 (76.87%) Loss: 2.227602 LR: 0.00001054 +[15:49:47] Epoch: 1 Batch: 15452/20099 (76.88%) Loss: 2.184901 LR: 0.00001054 +[15:49:49] Epoch: 1 Batch: 15453/20099 (76.88%) Loss: 2.113898 LR: 0.00001054 +[15:49:51] Epoch: 1 Batch: 15454/20099 (76.89%) Loss: 1.927786 LR: 0.00001054 +[15:49:53] Epoch: 1 
Batch: 15455/20099 (76.89%) Loss: 2.019750 LR: 0.00001054 +[15:49:55] Epoch: 1 Batch: 15456/20099 (76.90%) Loss: 2.172204 LR: 0.00001054 +[15:49:57] Epoch: 1 Batch: 15457/20099 (76.90%) Loss: 2.195533 LR: 0.00001053 +[15:49:59] Epoch: 1 Batch: 15458/20099 (76.91%) Loss: 2.417556 LR: 0.00001053 +[15:50:00] Epoch: 1 Batch: 15459/20099 (76.91%) Loss: 1.661147 LR: 0.00001053 +[15:50:02] Epoch: 1 Batch: 15460/20099 (76.92%) Loss: 1.673719 LR: 0.00001053 +[15:50:04] Epoch: 1 Batch: 15461/20099 (76.92%) Loss: 2.203580 LR: 0.00001053 +[15:50:06] Epoch: 1 Batch: 15462/20099 (76.93%) Loss: 2.148651 LR: 0.00001053 +[15:50:08] Epoch: 1 Batch: 15463/20099 (76.93%) Loss: 2.285357 LR: 0.00001053 +[15:50:10] Epoch: 1 Batch: 15464/20099 (76.94%) Loss: 2.070907 LR: 0.00001052 +[15:50:12] Epoch: 1 Batch: 15465/20099 (76.94%) Loss: 2.470788 LR: 0.00001052 +[15:50:13] Epoch: 1 Batch: 15466/20099 (76.95%) Loss: 2.489792 LR: 0.00001052 +[15:50:15] Epoch: 1 Batch: 15467/20099 (76.95%) Loss: 2.138826 LR: 0.00001052 +[15:50:17] Epoch: 1 Batch: 15468/20099 (76.96%) Loss: 2.191453 LR: 0.00001052 +[15:50:19] Epoch: 1 Batch: 15469/20099 (76.96%) Loss: 2.314897 LR: 0.00001052 +[15:50:21] Epoch: 1 Batch: 15470/20099 (76.97%) Loss: 2.428260 LR: 0.00001052 +[15:50:23] Epoch: 1 Batch: 15471/20099 (76.97%) Loss: 1.937745 LR: 0.00001050 +[15:50:25] Epoch: 1 Batch: 15472/20099 (76.98%) Loss: 2.076041 LR: 0.00001050 +[15:50:26] Epoch: 1 Batch: 15473/20099 (76.98%) Loss: 2.316175 LR: 0.00001050 +[15:50:28] Epoch: 1 Batch: 15474/20099 (76.99%) Loss: 1.933040 LR: 0.00001050 +[15:50:30] Epoch: 1 Batch: 15475/20099 (76.99%) Loss: 2.101529 LR: 0.00001050 +[15:50:32] Epoch: 1 Batch: 15476/20099 (77.00%) Loss: 1.771391 LR: 0.00001050 +[15:50:34] Epoch: 1 Batch: 15477/20099 (77.00%) Loss: 1.861154 LR: 0.00001050 +[15:50:36] Epoch: 1 Batch: 15478/20099 (77.01%) Loss: 2.233125 LR: 0.00001049 +[15:50:38] Epoch: 1 Batch: 15479/20099 (77.01%) Loss: 2.029874 LR: 0.00001049 +[15:50:39] Epoch: 1 Batch: 15480/20099 
(77.02%) Loss: 2.368866 LR: 0.00001049 +[15:50:41] Epoch: 1 Batch: 15481/20099 (77.02%) Loss: 2.043628 LR: 0.00001049 +[15:50:43] Epoch: 1 Batch: 15482/20099 (77.03%) Loss: 2.073314 LR: 0.00001049 +[15:50:45] Epoch: 1 Batch: 15483/20099 (77.03%) Loss: 1.944717 LR: 0.00001049 +[15:50:47] Epoch: 1 Batch: 15484/20099 (77.04%) Loss: 2.297977 LR: 0.00001049 +[15:50:49] Epoch: 1 Batch: 15485/20099 (77.04%) Loss: 2.014137 LR: 0.00001048 +[15:50:50] Epoch: 1 Batch: 15486/20099 (77.05%) Loss: 1.989508 LR: 0.00001048 +[15:50:52] Epoch: 1 Batch: 15487/20099 (77.05%) Loss: 2.271599 LR: 0.00001048 +[15:50:54] Epoch: 1 Batch: 15488/20099 (77.06%) Loss: 2.165126 LR: 0.00001048 +[15:50:56] Epoch: 1 Batch: 15489/20099 (77.06%) Loss: 2.003668 LR: 0.00001048 +[15:50:58] Epoch: 1 Batch: 15490/20099 (77.07%) Loss: 2.109466 LR: 0.00001048 +[15:51:00] Epoch: 1 Batch: 15491/20099 (77.07%) Loss: 2.016095 LR: 0.00001048 +[15:51:02] Epoch: 1 Batch: 15492/20099 (77.08%) Loss: 2.128454 LR: 0.00001047 +[15:51:03] Epoch: 1 Batch: 15493/20099 (77.08%) Loss: 2.162929 LR: 0.00001047 +[15:51:05] Epoch: 1 Batch: 15494/20099 (77.09%) Loss: 1.761993 LR: 0.00001047 +[15:51:07] Epoch: 1 Batch: 15495/20099 (77.09%) Loss: 2.107648 LR: 0.00001047 +[15:51:09] Epoch: 1 Batch: 15496/20099 (77.10%) Loss: 2.174953 LR: 0.00001047 +[15:51:11] Epoch: 1 Batch: 15497/20099 (77.10%) Loss: 2.359966 LR: 0.00001047 +[15:51:13] Epoch: 1 Batch: 15498/20099 (77.11%) Loss: 2.103602 LR: 0.00001047 +[15:51:15] Epoch: 1 Batch: 15499/20099 (77.11%) Loss: 2.105142 LR: 0.00001045 +[15:51:16] >> Evaluating batch 0 +[15:51:18] >> Evaluating batch 1 +[15:51:19] >> Evaluating batch 2 +[15:51:20] >> Evaluating batch 3 +[15:51:21] >> Evaluating batch 4 +[15:51:22] >> Evaluating batch 5 +[15:51:23] >> Evaluating batch 6 +[15:51:24] >> Evaluating batch 7 +[15:51:25] >> Evaluating batch 8 +[15:51:26] >> Evaluating batch 9 +[15:51:27] >> Evaluating batch 10 +[15:51:28] >> Evaluating batch 11 +[15:51:29] >> Evaluating batch 12 +[15:51:30] >> 
Evaluating batch 13 +[15:51:31] >> Evaluating batch 14 +[15:51:32] >> Evaluating batch 15 +[15:51:33] >> Evaluating batch 16 +[15:51:34] Epoch: 1 Step: 15500/20099 Evaluation: +[15:51:34] Avg Loss Since Last Eval: 2.0979 Val Loss: 2.1504 Validation loss delta: -0.0028 Perplexity: 8.5879 LR: 0.00001045 +[15:51:38] >> Checkpoint saved: epoch1_step15500, size: 0.1693 GB +[15:51:38] Epoch: 1 Batch: 15500/20099 (77.12%) Loss: 1.783126 LR: 0.00001045 +[15:51:40] Epoch: 1 Batch: 15501/20099 (77.12%) Loss: 2.151868 LR: 0.00001045 +[15:51:42] Epoch: 1 Batch: 15502/20099 (77.13%) Loss: 2.051015 LR: 0.00001045 +[15:51:44] Epoch: 1 Batch: 15503/20099 (77.13%) Loss: 1.664257 LR: 0.00001045 +[15:51:45] Epoch: 1 Batch: 15504/20099 (77.14%) Loss: 2.154980 LR: 0.00001045 +[15:51:47] Epoch: 1 Batch: 15505/20099 (77.14%) Loss: 2.316065 LR: 0.00001045 +[15:51:49] Epoch: 1 Batch: 15506/20099 (77.15%) Loss: 2.253338 LR: 0.00001044 +[15:51:51] Epoch: 1 Batch: 15507/20099 (77.15%) Loss: 2.069305 LR: 0.00001044 +[15:51:53] Epoch: 1 Batch: 15508/20099 (77.16%) Loss: 2.113972 LR: 0.00001044 +[15:51:55] Epoch: 1 Batch: 15509/20099 (77.16%) Loss: 2.306899 LR: 0.00001044 +[15:51:57] Epoch: 1 Batch: 15510/20099 (77.17%) Loss: 1.902904 LR: 0.00001044 +[15:51:58] Epoch: 1 Batch: 15511/20099 (77.17%) Loss: 1.918498 LR: 0.00001044 +[15:52:00] Epoch: 1 Batch: 15512/20099 (77.18%) Loss: 2.075801 LR: 0.00001044 +[15:52:02] Epoch: 1 Batch: 15513/20099 (77.18%) Loss: 1.901731 LR: 0.00001043 +[15:52:04] Epoch: 1 Batch: 15514/20099 (77.19%) Loss: 2.298077 LR: 0.00001043 +[15:52:06] Epoch: 1 Batch: 15515/20099 (77.19%) Loss: 2.278167 LR: 0.00001043 +[15:52:08] Epoch: 1 Batch: 15516/20099 (77.20%) Loss: 2.283703 LR: 0.00001043 +[15:52:10] Epoch: 1 Batch: 15517/20099 (77.20%) Loss: 2.006898 LR: 0.00001043 +[15:52:12] Epoch: 1 Batch: 15518/20099 (77.21%) Loss: 2.052295 LR: 0.00001043 +[15:52:13] Epoch: 1 Batch: 15519/20099 (77.21%) Loss: 1.961916 LR: 0.00001043 +[15:52:15] Epoch: 1 Batch: 15520/20099 
(77.22%) Loss: 1.953322 LR: 0.00001042 +[15:52:17] Epoch: 1 Batch: 15521/20099 (77.22%) Loss: 2.049419 LR: 0.00001042 +[15:52:19] Epoch: 1 Batch: 15522/20099 (77.23%) Loss: 1.989728 LR: 0.00001042 +[15:52:21] Epoch: 1 Batch: 15523/20099 (77.23%) Loss: 1.962202 LR: 0.00001042 +[15:52:23] Epoch: 1 Batch: 15524/20099 (77.24%) Loss: 2.300329 LR: 0.00001042 +[15:52:25] Epoch: 1 Batch: 15525/20099 (77.24%) Loss: 1.937537 LR: 0.00001042 +[15:52:26] Epoch: 1 Batch: 15526/20099 (77.25%) Loss: 2.042443 LR: 0.00001042 +[15:52:28] Epoch: 1 Batch: 15527/20099 (77.25%) Loss: 1.890192 LR: 0.00001040 +[15:52:30] Epoch: 1 Batch: 15528/20099 (77.26%) Loss: 2.058901 LR: 0.00001040 +[15:52:32] Epoch: 1 Batch: 15529/20099 (77.26%) Loss: 1.856785 LR: 0.00001040 +[15:52:34] Epoch: 1 Batch: 15530/20099 (77.27%) Loss: 2.109856 LR: 0.00001040 +[15:52:36] Epoch: 1 Batch: 15531/20099 (77.27%) Loss: 1.651166 LR: 0.00001040 +[15:52:38] Epoch: 1 Batch: 15532/20099 (77.28%) Loss: 2.019732 LR: 0.00001040 +[15:52:39] Epoch: 1 Batch: 15533/20099 (77.28%) Loss: 2.156766 LR: 0.00001040 +[15:52:41] Epoch: 1 Batch: 15534/20099 (77.29%) Loss: 1.967450 LR: 0.00001039 +[15:52:43] Epoch: 1 Batch: 15535/20099 (77.29%) Loss: 2.107119 LR: 0.00001039 +[15:52:45] Epoch: 1 Batch: 15536/20099 (77.30%) Loss: 2.122323 LR: 0.00001039 +[15:52:47] Epoch: 1 Batch: 15537/20099 (77.30%) Loss: 1.559146 LR: 0.00001039 +[15:52:49] Epoch: 1 Batch: 15538/20099 (77.31%) Loss: 2.045942 LR: 0.00001039 +[15:52:50] Epoch: 1 Batch: 15539/20099 (77.31%) Loss: 2.207802 LR: 0.00001039 +[15:52:52] Epoch: 1 Batch: 15540/20099 (77.32%) Loss: 1.936261 LR: 0.00001039 +[15:52:54] Epoch: 1 Batch: 15541/20099 (77.32%) Loss: 2.156393 LR: 0.00001038 +[15:52:56] Epoch: 1 Batch: 15542/20099 (77.33%) Loss: 2.043953 LR: 0.00001038 +[15:52:58] Epoch: 1 Batch: 15543/20099 (77.33%) Loss: 2.241662 LR: 0.00001038 +[15:53:00] Epoch: 1 Batch: 15544/20099 (77.34%) Loss: 1.952232 LR: 0.00001038 +[15:53:01] Epoch: 1 Batch: 15545/20099 (77.34%) Loss: 2.060880 
LR: 0.00001038 +[15:53:03] Epoch: 1 Batch: 15546/20099 (77.35%) Loss: 2.003650 LR: 0.00001038 +[15:53:05] Epoch: 1 Batch: 15547/20099 (77.35%) Loss: 2.081288 LR: 0.00001038 +[15:53:07] Epoch: 1 Batch: 15548/20099 (77.36%) Loss: 1.799410 LR: 0.00001036 +[15:53:09] Epoch: 1 Batch: 15549/20099 (77.36%) Loss: 2.099184 LR: 0.00001036 +[15:53:11] Epoch: 1 Batch: 15550/20099 (77.37%) Loss: 2.065982 LR: 0.00001036 +[15:53:13] Epoch: 1 Batch: 15551/20099 (77.37%) Loss: 1.933086 LR: 0.00001036 +[15:53:14] Epoch: 1 Batch: 15552/20099 (77.38%) Loss: 1.653019 LR: 0.00001036 +[15:53:16] Epoch: 1 Batch: 15553/20099 (77.38%) Loss: 1.851772 LR: 0.00001036 +[15:53:18] Epoch: 1 Batch: 15554/20099 (77.39%) Loss: 2.249006 LR: 0.00001036 +[15:53:20] Epoch: 1 Batch: 15555/20099 (77.39%) Loss: 2.035734 LR: 0.00001035 +[15:53:22] Epoch: 1 Batch: 15556/20099 (77.40%) Loss: 2.202520 LR: 0.00001035 +[15:53:24] Epoch: 1 Batch: 15557/20099 (77.40%) Loss: 1.954386 LR: 0.00001035 +[15:53:25] Epoch: 1 Batch: 15558/20099 (77.41%) Loss: 2.028545 LR: 0.00001035 +[15:53:27] Epoch: 1 Batch: 15559/20099 (77.41%) Loss: 2.326250 LR: 0.00001035 +[15:53:29] Epoch: 1 Batch: 15560/20099 (77.42%) Loss: 2.211926 LR: 0.00001035 +[15:53:31] Epoch: 1 Batch: 15561/20099 (77.42%) Loss: 1.985471 LR: 0.00001035 +[15:53:33] Epoch: 1 Batch: 15562/20099 (77.43%) Loss: 2.116374 LR: 0.00001034 +[15:53:35] Epoch: 1 Batch: 15563/20099 (77.43%) Loss: 2.589301 LR: 0.00001034 +[15:53:37] Epoch: 1 Batch: 15564/20099 (77.44%) Loss: 2.016425 LR: 0.00001034 +[15:53:38] Epoch: 1 Batch: 15565/20099 (77.44%) Loss: 1.549008 LR: 0.00001034 +[15:53:40] Epoch: 1 Batch: 15566/20099 (77.45%) Loss: 2.320504 LR: 0.00001034 +[15:53:42] Epoch: 1 Batch: 15567/20099 (77.45%) Loss: 2.432944 LR: 0.00001034 +[15:53:44] Epoch: 1 Batch: 15568/20099 (77.46%) Loss: 2.205656 LR: 0.00001034 +[15:53:46] Epoch: 1 Batch: 15569/20099 (77.46%) Loss: 2.157797 LR: 0.00001033 +[15:53:48] Epoch: 1 Batch: 15570/20099 (77.47%) Loss: 2.226482 LR: 0.00001033 
+[15:53:50] Epoch: 1 Batch: 15571/20099 (77.47%) Loss: 2.364875 LR: 0.00001033 +[15:53:51] Epoch: 1 Batch: 15572/20099 (77.48%) Loss: 2.060739 LR: 0.00001033 +[15:53:53] Epoch: 1 Batch: 15573/20099 (77.48%) Loss: 2.180540 LR: 0.00001033 +[15:53:55] Epoch: 1 Batch: 15574/20099 (77.49%) Loss: 1.971587 LR: 0.00001033 +[15:53:57] Epoch: 1 Batch: 15575/20099 (77.49%) Loss: 2.318553 LR: 0.00001033 +[15:53:59] Epoch: 1 Batch: 15576/20099 (77.50%) Loss: 2.319774 LR: 0.00001031 +[15:54:01] Epoch: 1 Batch: 15577/20099 (77.50%) Loss: 2.178058 LR: 0.00001031 +[15:54:02] Epoch: 1 Batch: 15578/20099 (77.51%) Loss: 1.992712 LR: 0.00001031 +[15:54:04] Epoch: 1 Batch: 15579/20099 (77.51%) Loss: 2.201994 LR: 0.00001031 +[15:54:06] Epoch: 1 Batch: 15580/20099 (77.52%) Loss: 2.176192 LR: 0.00001031 +[15:54:08] Epoch: 1 Batch: 15581/20099 (77.52%) Loss: 2.050178 LR: 0.00001031 +[15:54:10] Epoch: 1 Batch: 15582/20099 (77.53%) Loss: 1.971581 LR: 0.00001031 +[15:54:12] Epoch: 1 Batch: 15583/20099 (77.53%) Loss: 2.188823 LR: 0.00001030 +[15:54:14] Epoch: 1 Batch: 15584/20099 (77.54%) Loss: 2.261289 LR: 0.00001030 +[15:54:15] Epoch: 1 Batch: 15585/20099 (77.54%) Loss: 1.993728 LR: 0.00001030 +[15:54:17] Epoch: 1 Batch: 15586/20099 (77.55%) Loss: 1.967864 LR: 0.00001030 +[15:54:19] Epoch: 1 Batch: 15587/20099 (77.55%) Loss: 1.910121 LR: 0.00001030 +[15:54:21] Epoch: 1 Batch: 15588/20099 (77.56%) Loss: 2.221543 LR: 0.00001030 +[15:54:23] Epoch: 1 Batch: 15589/20099 (77.56%) Loss: 2.122075 LR: 0.00001030 +[15:54:25] Epoch: 1 Batch: 15590/20099 (77.57%) Loss: 2.145648 LR: 0.00001029 +[15:54:27] Epoch: 1 Batch: 15591/20099 (77.57%) Loss: 2.261636 LR: 0.00001029 +[15:54:28] Epoch: 1 Batch: 15592/20099 (77.58%) Loss: 1.870109 LR: 0.00001029 +[15:54:30] Epoch: 1 Batch: 15593/20099 (77.58%) Loss: 1.972451 LR: 0.00001029 +[15:54:32] Epoch: 1 Batch: 15594/20099 (77.59%) Loss: 2.236895 LR: 0.00001029 +[15:54:34] Epoch: 1 Batch: 15595/20099 (77.59%) Loss: 1.842286 LR: 0.00001029 +[15:54:36] Epoch: 1 
Batch: 15596/20099 (77.60%) Loss: 2.272804 LR: 0.00001029 +[15:54:38] Epoch: 1 Batch: 15597/20099 (77.60%) Loss: 2.117489 LR: 0.00001028 +[15:54:40] Epoch: 1 Batch: 15598/20099 (77.61%) Loss: 1.909019 LR: 0.00001028 +[15:54:41] Epoch: 1 Batch: 15599/20099 (77.61%) Loss: 2.129066 LR: 0.00001028 +[15:54:47] >> Cleaned up old temp checkpoint: epoch1_step13600 +[15:54:47] >> Temp checkpoint saved: epoch1_step15600, size: 0.1693 GB +[15:54:47] Epoch: 1 Batch: 15600/20099 (77.62%) Loss: 2.021706 LR: 0.00001028 +[15:54:49] Epoch: 1 Batch: 15601/20099 (77.62%) Loss: 1.960383 LR: 0.00001028 +[15:54:51] Epoch: 1 Batch: 15602/20099 (77.63%) Loss: 2.075302 LR: 0.00001028 +[15:54:52] Epoch: 1 Batch: 15603/20099 (77.63%) Loss: 1.948139 LR: 0.00001028 +[15:54:54] Epoch: 1 Batch: 15604/20099 (77.64%) Loss: 2.230962 LR: 0.00001027 +[15:54:56] Epoch: 1 Batch: 15605/20099 (77.64%) Loss: 1.932310 LR: 0.00001027 +[15:54:58] Epoch: 1 Batch: 15606/20099 (77.65%) Loss: 1.975339 LR: 0.00001027 +[15:55:00] Epoch: 1 Batch: 15607/20099 (77.65%) Loss: 2.238906 LR: 0.00001027 +[15:55:02] Epoch: 1 Batch: 15608/20099 (77.66%) Loss: 2.086926 LR: 0.00001027 +[15:55:04] Epoch: 1 Batch: 15609/20099 (77.66%) Loss: 2.194679 LR: 0.00001027 +[15:55:05] Epoch: 1 Batch: 15610/20099 (77.67%) Loss: 2.123814 LR: 0.00001027 +[15:55:07] Epoch: 1 Batch: 15611/20099 (77.67%) Loss: 2.171151 LR: 0.00001025 +[15:55:09] Epoch: 1 Batch: 15612/20099 (77.68%) Loss: 2.017267 LR: 0.00001025 +[15:55:11] Epoch: 1 Batch: 15613/20099 (77.68%) Loss: 1.959557 LR: 0.00001025 +[15:55:13] Epoch: 1 Batch: 15614/20099 (77.69%) Loss: 2.179789 LR: 0.00001025 +[15:55:15] Epoch: 1 Batch: 15615/20099 (77.69%) Loss: 2.149274 LR: 0.00001025 +[15:55:17] Epoch: 1 Batch: 15616/20099 (77.70%) Loss: 1.979704 LR: 0.00001025 +[15:55:18] Epoch: 1 Batch: 15617/20099 (77.70%) Loss: 2.094185 LR: 0.00001025 +[15:55:20] Epoch: 1 Batch: 15618/20099 (77.71%) Loss: 2.050099 LR: 0.00001024 +[15:55:22] Epoch: 1 Batch: 15619/20099 (77.71%) Loss: 2.118863 LR: 
0.00001024 +[15:55:24] Epoch: 1 Batch: 15620/20099 (77.72%) Loss: 2.302886 LR: 0.00001024 +[15:55:26] Epoch: 1 Batch: 15621/20099 (77.72%) Loss: 2.011087 LR: 0.00001024 +[15:55:28] Epoch: 1 Batch: 15622/20099 (77.73%) Loss: 2.249615 LR: 0.00001024 +[15:55:30] Epoch: 1 Batch: 15623/20099 (77.73%) Loss: 2.051770 LR: 0.00001024 +[15:55:32] Epoch: 1 Batch: 15624/20099 (77.74%) Loss: 2.355285 LR: 0.00001024 +[15:55:33] Epoch: 1 Batch: 15625/20099 (77.74%) Loss: 2.138605 LR: 0.00001023 +[15:55:35] Epoch: 1 Batch: 15626/20099 (77.75%) Loss: 2.001810 LR: 0.00001023 +[15:55:37] Epoch: 1 Batch: 15627/20099 (77.75%) Loss: 2.375710 LR: 0.00001023 +[15:55:39] Epoch: 1 Batch: 15628/20099 (77.76%) Loss: 1.858228 LR: 0.00001023 +[15:55:41] Epoch: 1 Batch: 15629/20099 (77.76%) Loss: 1.801430 LR: 0.00001023 +[15:55:43] Epoch: 1 Batch: 15630/20099 (77.77%) Loss: 1.992192 LR: 0.00001023 +[15:55:45] Epoch: 1 Batch: 15631/20099 (77.77%) Loss: 2.365761 LR: 0.00001023 +[15:55:46] Epoch: 1 Batch: 15632/20099 (77.78%) Loss: 2.119000 LR: 0.00001022 +[15:55:48] Epoch: 1 Batch: 15633/20099 (77.78%) Loss: 2.460329 LR: 0.00001022 +[15:55:50] Epoch: 1 Batch: 15634/20099 (77.78%) Loss: 1.991013 LR: 0.00001022 +[15:55:52] Epoch: 1 Batch: 15635/20099 (77.79%) Loss: 1.801908 LR: 0.00001022 +[15:55:54] Epoch: 1 Batch: 15636/20099 (77.79%) Loss: 2.100223 LR: 0.00001022 +[15:55:56] Epoch: 1 Batch: 15637/20099 (77.80%) Loss: 1.908920 LR: 0.00001022 +[15:55:57] Epoch: 1 Batch: 15638/20099 (77.80%) Loss: 2.105090 LR: 0.00001022 +[15:55:59] Epoch: 1 Batch: 15639/20099 (77.81%) Loss: 1.817190 LR: 0.00001020 +[15:56:01] Epoch: 1 Batch: 15640/20099 (77.81%) Loss: 1.937742 LR: 0.00001020 +[15:56:03] Epoch: 1 Batch: 15641/20099 (77.82%) Loss: 2.051777 LR: 0.00001020 +[15:56:05] Epoch: 1 Batch: 15642/20099 (77.82%) Loss: 1.970969 LR: 0.00001020 +[15:56:07] Epoch: 1 Batch: 15643/20099 (77.83%) Loss: 2.154370 LR: 0.00001020 +[15:56:09] Epoch: 1 Batch: 15644/20099 (77.83%) Loss: 2.030743 LR: 0.00001020 +[15:56:10] 
Epoch: 1 Batch: 15645/20099 (77.84%) Loss: 1.763569 LR: 0.00001020 +[15:56:12] Epoch: 1 Batch: 15646/20099 (77.84%) Loss: 1.837155 LR: 0.00001019 +[15:56:14] Epoch: 1 Batch: 15647/20099 (77.85%) Loss: 2.001074 LR: 0.00001019 +[15:56:16] Epoch: 1 Batch: 15648/20099 (77.85%) Loss: 1.983035 LR: 0.00001019 +[15:56:18] Epoch: 1 Batch: 15649/20099 (77.86%) Loss: 1.869951 LR: 0.00001019 +[15:56:20] Epoch: 1 Batch: 15650/20099 (77.86%) Loss: 2.091889 LR: 0.00001019 +[15:56:21] Epoch: 1 Batch: 15651/20099 (77.87%) Loss: 1.983964 LR: 0.00001019 +[15:56:23] Epoch: 1 Batch: 15652/20099 (77.87%) Loss: 2.137774 LR: 0.00001019 +[15:56:25] Epoch: 1 Batch: 15653/20099 (77.88%) Loss: 2.080971 LR: 0.00001018 +[15:56:27] Epoch: 1 Batch: 15654/20099 (77.88%) Loss: 1.870296 LR: 0.00001018 +[15:56:29] Epoch: 1 Batch: 15655/20099 (77.89%) Loss: 2.010511 LR: 0.00001018 +[15:56:31] Epoch: 1 Batch: 15656/20099 (77.89%) Loss: 1.821626 LR: 0.00001018 +[15:56:33] Epoch: 1 Batch: 15657/20099 (77.90%) Loss: 1.947908 LR: 0.00001018 +[15:56:34] Epoch: 1 Batch: 15658/20099 (77.90%) Loss: 2.111537 LR: 0.00001018 +[15:56:36] Epoch: 1 Batch: 15659/20099 (77.91%) Loss: 1.988392 LR: 0.00001018 +[15:56:38] Epoch: 1 Batch: 15660/20099 (77.91%) Loss: 2.302016 LR: 0.00001017 +[15:56:40] Epoch: 1 Batch: 15661/20099 (77.92%) Loss: 2.287156 LR: 0.00001017 +[15:56:42] Epoch: 1 Batch: 15662/20099 (77.92%) Loss: 2.209000 LR: 0.00001017 +[15:56:44] Epoch: 1 Batch: 15663/20099 (77.93%) Loss: 2.206934 LR: 0.00001017 +[15:56:45] Epoch: 1 Batch: 15664/20099 (77.93%) Loss: 1.912124 LR: 0.00001017 +[15:56:47] Epoch: 1 Batch: 15665/20099 (77.94%) Loss: 2.110021 LR: 0.00001017 +[15:56:49] Epoch: 1 Batch: 15666/20099 (77.94%) Loss: 2.036299 LR: 0.00001017 +[15:56:51] Epoch: 1 Batch: 15667/20099 (77.95%) Loss: 2.342015 LR: 0.00001015 +[15:56:53] Epoch: 1 Batch: 15668/20099 (77.95%) Loss: 2.244155 LR: 0.00001015 +[15:56:55] Epoch: 1 Batch: 15669/20099 (77.96%) Loss: 1.996850 LR: 0.00001015 +[15:56:57] Epoch: 1 Batch: 
15670/20099 (77.96%) Loss: 1.853235 LR: 0.00001015 +[15:56:58] Epoch: 1 Batch: 15671/20099 (77.97%) Loss: 2.167412 LR: 0.00001015 +[15:57:00] Epoch: 1 Batch: 15672/20099 (77.97%) Loss: 2.124855 LR: 0.00001015 +[15:57:02] Epoch: 1 Batch: 15673/20099 (77.98%) Loss: 1.925758 LR: 0.00001015 +[15:57:04] Epoch: 1 Batch: 15674/20099 (77.98%) Loss: 2.082839 LR: 0.00001014 +[15:57:06] Epoch: 1 Batch: 15675/20099 (77.99%) Loss: 1.947602 LR: 0.00001014 +[15:57:08] Epoch: 1 Batch: 15676/20099 (77.99%) Loss: 2.264728 LR: 0.00001014 +[15:57:10] Epoch: 1 Batch: 15677/20099 (78.00%) Loss: 2.192770 LR: 0.00001014 +[15:57:11] Epoch: 1 Batch: 15678/20099 (78.00%) Loss: 2.032751 LR: 0.00001014 +[15:57:13] Epoch: 1 Batch: 15679/20099 (78.01%) Loss: 2.001391 LR: 0.00001014 +[15:57:15] Epoch: 1 Batch: 15680/20099 (78.01%) Loss: 2.532889 LR: 0.00001014 +[15:57:17] Epoch: 1 Batch: 15681/20099 (78.02%) Loss: 2.396230 LR: 0.00001013 +[15:57:19] Epoch: 1 Batch: 15682/20099 (78.02%) Loss: 2.062712 LR: 0.00001013 +[15:57:21] Epoch: 1 Batch: 15683/20099 (78.03%) Loss: 2.077476 LR: 0.00001013 +[15:57:23] Epoch: 1 Batch: 15684/20099 (78.03%) Loss: 2.034358 LR: 0.00001013 +[15:57:25] Epoch: 1 Batch: 15685/20099 (78.04%) Loss: 2.145914 LR: 0.00001013 +[15:57:26] Epoch: 1 Batch: 15686/20099 (78.04%) Loss: 1.954157 LR: 0.00001013 +[15:57:28] Epoch: 1 Batch: 15687/20099 (78.05%) Loss: 1.810053 LR: 0.00001013 +[15:57:30] Epoch: 1 Batch: 15688/20099 (78.05%) Loss: 2.200597 LR: 0.00001012 +[15:57:32] Epoch: 1 Batch: 15689/20099 (78.06%) Loss: 2.030104 LR: 0.00001012 +[15:57:34] Epoch: 1 Batch: 15690/20099 (78.06%) Loss: 2.218868 LR: 0.00001012 +[15:57:36] Epoch: 1 Batch: 15691/20099 (78.07%) Loss: 1.740746 LR: 0.00001012 +[15:57:38] Epoch: 1 Batch: 15692/20099 (78.07%) Loss: 2.049696 LR: 0.00001012 +[15:57:39] Epoch: 1 Batch: 15693/20099 (78.08%) Loss: 2.253983 LR: 0.00001012 +[15:57:41] Epoch: 1 Batch: 15694/20099 (78.08%) Loss: 2.236936 LR: 0.00001012 +[15:57:43] Epoch: 1 Batch: 15695/20099 (78.09%) 
Loss: 2.352349 LR: 0.00001010 +[15:57:45] Epoch: 1 Batch: 15696/20099 (78.09%) Loss: 2.193680 LR: 0.00001010 +[15:57:47] Epoch: 1 Batch: 15697/20099 (78.10%) Loss: 1.813413 LR: 0.00001010 +[15:57:49] Epoch: 1 Batch: 15698/20099 (78.10%) Loss: 2.093519 LR: 0.00001010 +[15:57:51] Epoch: 1 Batch: 15699/20099 (78.11%) Loss: 2.182641 LR: 0.00001010 +[15:57:52] Epoch: 1 Batch: 15700/20099 (78.11%) Loss: 2.157641 LR: 0.00001010 +[15:57:54] Epoch: 1 Batch: 15701/20099 (78.12%) Loss: 2.059607 LR: 0.00001010 +[15:57:56] Epoch: 1 Batch: 15702/20099 (78.12%) Loss: 1.827855 LR: 0.00001009 +[15:57:58] Epoch: 1 Batch: 15703/20099 (78.13%) Loss: 2.143876 LR: 0.00001009 +[15:58:00] Epoch: 1 Batch: 15704/20099 (78.13%) Loss: 1.906261 LR: 0.00001009 +[15:58:02] Epoch: 1 Batch: 15705/20099 (78.14%) Loss: 1.935135 LR: 0.00001009 +[15:58:04] Epoch: 1 Batch: 15706/20099 (78.14%) Loss: 2.119717 LR: 0.00001009 +[15:58:05] Epoch: 1 Batch: 15707/20099 (78.15%) Loss: 2.268354 LR: 0.00001009 +[15:58:07] Epoch: 1 Batch: 15708/20099 (78.15%) Loss: 2.158185 LR: 0.00001009 +[15:58:09] Epoch: 1 Batch: 15709/20099 (78.16%) Loss: 2.276398 LR: 0.00001008 +[15:58:11] Epoch: 1 Batch: 15710/20099 (78.16%) Loss: 2.106095 LR: 0.00001008 +[15:58:13] Epoch: 1 Batch: 15711/20099 (78.17%) Loss: 1.940248 LR: 0.00001008 +[15:58:15] Epoch: 1 Batch: 15712/20099 (78.17%) Loss: 1.900149 LR: 0.00001008 +[15:58:16] Epoch: 1 Batch: 15713/20099 (78.18%) Loss: 1.855711 LR: 0.00001008 +[15:58:18] Epoch: 1 Batch: 15714/20099 (78.18%) Loss: 2.253801 LR: 0.00001008 +[15:58:20] Epoch: 1 Batch: 15715/20099 (78.19%) Loss: 2.010630 LR: 0.00001008 +[15:58:22] Epoch: 1 Batch: 15716/20099 (78.19%) Loss: 2.166205 LR: 0.00001007 +[15:58:24] Epoch: 1 Batch: 15717/20099 (78.20%) Loss: 2.149372 LR: 0.00001007 +[15:58:26] Epoch: 1 Batch: 15718/20099 (78.20%) Loss: 2.240678 LR: 0.00001007 +[15:58:28] Epoch: 1 Batch: 15719/20099 (78.21%) Loss: 2.367652 LR: 0.00001007 +[15:58:30] Epoch: 1 Batch: 15720/20099 (78.21%) Loss: 2.237463 LR: 
0.00001007 +[15:58:31] Epoch: 1 Batch: 15721/20099 (78.22%) Loss: 2.124147 LR: 0.00001007 +[15:58:33] Epoch: 1 Batch: 15722/20099 (78.22%) Loss: 2.136568 LR: 0.00001007 +[15:58:35] Epoch: 1 Batch: 15723/20099 (78.23%) Loss: 2.024474 LR: 0.00001006 +[15:58:37] Epoch: 1 Batch: 15724/20099 (78.23%) Loss: 1.914419 LR: 0.00001006 +[15:58:39] Epoch: 1 Batch: 15725/20099 (78.24%) Loss: 1.746320 LR: 0.00001006 +[15:58:41] Epoch: 1 Batch: 15726/20099 (78.24%) Loss: 2.078390 LR: 0.00001006 +[15:58:43] Epoch: 1 Batch: 15727/20099 (78.25%) Loss: 1.903078 LR: 0.00001006 +[15:58:44] Epoch: 1 Batch: 15728/20099 (78.25%) Loss: 2.181024 LR: 0.00001006 +[15:58:46] Epoch: 1 Batch: 15729/20099 (78.26%) Loss: 1.939851 LR: 0.00001006 +[15:58:48] Epoch: 1 Batch: 15730/20099 (78.26%) Loss: 2.060353 LR: 0.00001004 +[15:58:50] Epoch: 1 Batch: 15731/20099 (78.27%) Loss: 2.166804 LR: 0.00001004 +[15:58:52] Epoch: 1 Batch: 15732/20099 (78.27%) Loss: 2.285968 LR: 0.00001004 +[15:58:54] Epoch: 1 Batch: 15733/20099 (78.28%) Loss: 2.215719 LR: 0.00001004 +[15:58:56] Epoch: 1 Batch: 15734/20099 (78.28%) Loss: 1.830772 LR: 0.00001004 +[15:58:57] Epoch: 1 Batch: 15735/20099 (78.29%) Loss: 2.100947 LR: 0.00001004 +[15:58:59] Epoch: 1 Batch: 15736/20099 (78.29%) Loss: 1.913176 LR: 0.00001004 +[15:59:01] Epoch: 1 Batch: 15737/20099 (78.30%) Loss: 2.159969 LR: 0.00001003 +[15:59:03] Epoch: 1 Batch: 15738/20099 (78.30%) Loss: 2.170730 LR: 0.00001003 +[15:59:05] Epoch: 1 Batch: 15739/20099 (78.31%) Loss: 2.252765 LR: 0.00001003 +[15:59:07] Epoch: 1 Batch: 15740/20099 (78.31%) Loss: 2.307674 LR: 0.00001003 +[15:59:09] Epoch: 1 Batch: 15741/20099 (78.32%) Loss: 2.524321 LR: 0.00001003 +[15:59:10] Epoch: 1 Batch: 15742/20099 (78.32%) Loss: 2.226506 LR: 0.00001003 +[15:59:12] Epoch: 1 Batch: 15743/20099 (78.33%) Loss: 1.990787 LR: 0.00001003 +[15:59:14] Epoch: 1 Batch: 15744/20099 (78.33%) Loss: 2.087411 LR: 0.00001002 +[15:59:16] Epoch: 1 Batch: 15745/20099 (78.34%) Loss: 2.154553 LR: 0.00001002 +[15:59:18] 
Epoch: 1 Batch: 15746/20099 (78.34%) Loss: 1.968263 LR: 0.00001002 +[15:59:20] Epoch: 1 Batch: 15747/20099 (78.35%) Loss: 2.219302 LR: 0.00001002 +[15:59:21] Epoch: 1 Batch: 15748/20099 (78.35%) Loss: 2.055678 LR: 0.00001002 +[15:59:23] Epoch: 1 Batch: 15749/20099 (78.36%) Loss: 1.884024 LR: 0.00001002 +[15:59:25] Epoch: 1 Batch: 15750/20099 (78.36%) Loss: 2.137870 LR: 0.00001002 +[15:59:27] Epoch: 1 Batch: 15751/20099 (78.37%) Loss: 2.139441 LR: 0.00001001 +[15:59:29] Epoch: 1 Batch: 15752/20099 (78.37%) Loss: 1.967782 LR: 0.00001001 +[15:59:31] Epoch: 1 Batch: 15753/20099 (78.38%) Loss: 2.152537 LR: 0.00001001 +[15:59:33] Epoch: 1 Batch: 15754/20099 (78.38%) Loss: 2.031475 LR: 0.00001001 +[15:59:34] Epoch: 1 Batch: 15755/20099 (78.39%) Loss: 2.232095 LR: 0.00001001 +[15:59:36] Epoch: 1 Batch: 15756/20099 (78.39%) Loss: 2.274142 LR: 0.00001001 +[15:59:38] Epoch: 1 Batch: 15757/20099 (78.40%) Loss: 2.126869 LR: 0.00001001 +[15:59:40] Epoch: 1 Batch: 15758/20099 (78.40%) Loss: 2.427935 LR: 0.00001000 +[15:59:42] Epoch: 1 Batch: 15759/20099 (78.41%) Loss: 2.234783 LR: 0.00001000 +[15:59:44] Epoch: 1 Batch: 15760/20099 (78.41%) Loss: 2.258379 LR: 0.00001000 +[15:59:46] Epoch: 1 Batch: 15761/20099 (78.42%) Loss: 2.019770 LR: 0.00001000 +[15:59:47] Epoch: 1 Batch: 15762/20099 (78.42%) Loss: 1.767642 LR: 0.00001000 +[15:59:49] Epoch: 1 Batch: 15763/20099 (78.43%) Loss: 2.335553 LR: 0.00001000 +[15:59:51] Epoch: 1 Batch: 15764/20099 (78.43%) Loss: 1.960418 LR: 0.00001000 +[15:59:53] Epoch: 1 Batch: 15765/20099 (78.44%) Loss: 2.246634 LR: 0.00000998 +[15:59:55] Epoch: 1 Batch: 15766/20099 (78.44%) Loss: 1.892395 LR: 0.00000998 +[15:59:57] Epoch: 1 Batch: 15767/20099 (78.45%) Loss: 2.049098 LR: 0.00000998 +[15:59:58] Epoch: 1 Batch: 15768/20099 (78.45%) Loss: 2.130647 LR: 0.00000998 +[16:00:00] Epoch: 1 Batch: 15769/20099 (78.46%) Loss: 2.180283 LR: 0.00000998 +[16:00:02] Epoch: 1 Batch: 15770/20099 (78.46%) Loss: 2.019328 LR: 0.00000998 +[16:00:04] Epoch: 1 Batch: 
15771/20099 (78.47%) Loss: 2.058983 LR: 0.00000998 +[16:00:06] Epoch: 1 Batch: 15772/20099 (78.47%) Loss: 2.042424 LR: 0.00000997 +[16:00:08] Epoch: 1 Batch: 15773/20099 (78.48%) Loss: 1.757224 LR: 0.00000997 +[16:00:10] Epoch: 1 Batch: 15774/20099 (78.48%) Loss: 2.053060 LR: 0.00000997 +[16:00:11] Epoch: 1 Batch: 15775/20099 (78.49%) Loss: 2.046332 LR: 0.00000997 +[16:00:13] Epoch: 1 Batch: 15776/20099 (78.49%) Loss: 2.144575 LR: 0.00000997 +[16:00:15] Epoch: 1 Batch: 15777/20099 (78.50%) Loss: 2.469840 LR: 0.00000997 +[16:00:17] Epoch: 1 Batch: 15778/20099 (78.50%) Loss: 2.391606 LR: 0.00000997 +[16:00:19] Epoch: 1 Batch: 15779/20099 (78.51%) Loss: 2.216549 LR: 0.00000996 +[16:00:21] Epoch: 1 Batch: 15780/20099 (78.51%) Loss: 2.280471 LR: 0.00000996 +[16:00:23] Epoch: 1 Batch: 15781/20099 (78.52%) Loss: 1.925820 LR: 0.00000996 +[16:00:24] Epoch: 1 Batch: 15782/20099 (78.52%) Loss: 2.251534 LR: 0.00000996 +[16:00:26] Epoch: 1 Batch: 15783/20099 (78.53%) Loss: 2.329866 LR: 0.00000996 +[16:00:28] Epoch: 1 Batch: 15784/20099 (78.53%) Loss: 2.107868 LR: 0.00000996 +[16:00:30] Epoch: 1 Batch: 15785/20099 (78.54%) Loss: 2.016022 LR: 0.00000996 +[16:00:32] Epoch: 1 Batch: 15786/20099 (78.54%) Loss: 2.247635 LR: 0.00000995 +[16:00:34] Epoch: 1 Batch: 15787/20099 (78.55%) Loss: 2.103354 LR: 0.00000995 +[16:00:35] Epoch: 1 Batch: 15788/20099 (78.55%) Loss: 1.929298 LR: 0.00000995 +[16:00:37] Epoch: 1 Batch: 15789/20099 (78.56%) Loss: 1.694338 LR: 0.00000995 +[16:00:39] Epoch: 1 Batch: 15790/20099 (78.56%) Loss: 1.943704 LR: 0.00000995 +[16:00:41] Epoch: 1 Batch: 15791/20099 (78.57%) Loss: 2.047667 LR: 0.00000995 +[16:00:43] Epoch: 1 Batch: 15792/20099 (78.57%) Loss: 2.258962 LR: 0.00000995 +[16:00:45] Epoch: 1 Batch: 15793/20099 (78.58%) Loss: 2.104939 LR: 0.00000994 +[16:00:47] Epoch: 1 Batch: 15794/20099 (78.58%) Loss: 2.222811 LR: 0.00000994 +[16:00:48] Epoch: 1 Batch: 15795/20099 (78.59%) Loss: 2.096844 LR: 0.00000994 +[16:00:50] Epoch: 1 Batch: 15796/20099 (78.59%) 
Loss: 2.251136 LR: 0.00000994 +[16:00:52] Epoch: 1 Batch: 15797/20099 (78.60%) Loss: 2.254202 LR: 0.00000994 +[16:00:54] Epoch: 1 Batch: 15798/20099 (78.60%) Loss: 1.743074 LR: 0.00000994 +[16:00:56] Epoch: 1 Batch: 15799/20099 (78.61%) Loss: 1.841773 LR: 0.00000994 +[16:01:01] >> Cleaned up old temp checkpoint: epoch1_step13800 +[16:01:01] >> Temp checkpoint saved: epoch1_step15800, size: 0.1693 GB +[16:01:01] Epoch: 1 Batch: 15800/20099 (78.61%) Loss: 2.084735 LR: 0.00000992 +[16:01:03] Epoch: 1 Batch: 15801/20099 (78.62%) Loss: 2.225002 LR: 0.00000992 +[16:01:05] Epoch: 1 Batch: 15802/20099 (78.62%) Loss: 2.211079 LR: 0.00000992 +[16:01:07] Epoch: 1 Batch: 15803/20099 (78.63%) Loss: 2.129584 LR: 0.00000992 +[16:01:09] Epoch: 1 Batch: 15804/20099 (78.63%) Loss: 2.014624 LR: 0.00000992 +[16:01:10] Epoch: 1 Batch: 15805/20099 (78.64%) Loss: 1.876759 LR: 0.00000992 +[16:01:12] Epoch: 1 Batch: 15806/20099 (78.64%) Loss: 2.081866 LR: 0.00000992 +[16:01:14] Epoch: 1 Batch: 15807/20099 (78.65%) Loss: 2.182796 LR: 0.00000991 +[16:01:16] Epoch: 1 Batch: 15808/20099 (78.65%) Loss: 2.256665 LR: 0.00000991 +[16:01:18] Epoch: 1 Batch: 15809/20099 (78.66%) Loss: 2.227414 LR: 0.00000991 +[16:01:20] Epoch: 1 Batch: 15810/20099 (78.66%) Loss: 2.129448 LR: 0.00000991 +[16:01:22] Epoch: 1 Batch: 15811/20099 (78.67%) Loss: 2.371997 LR: 0.00000991 +[16:01:23] Epoch: 1 Batch: 15812/20099 (78.67%) Loss: 2.171572 LR: 0.00000991 +[16:01:25] Epoch: 1 Batch: 15813/20099 (78.68%) Loss: 1.746992 LR: 0.00000991 +[16:01:27] Epoch: 1 Batch: 15814/20099 (78.68%) Loss: 2.139758 LR: 0.00000990 +[16:01:29] Epoch: 1 Batch: 15815/20099 (78.69%) Loss: 1.975576 LR: 0.00000990 +[16:01:31] Epoch: 1 Batch: 15816/20099 (78.69%) Loss: 2.226619 LR: 0.00000990 +[16:01:33] Epoch: 1 Batch: 15817/20099 (78.70%) Loss: 2.162621 LR: 0.00000990 +[16:01:35] Epoch: 1 Batch: 15818/20099 (78.70%) Loss: 2.187351 LR: 0.00000990 +[16:01:37] Epoch: 1 Batch: 15819/20099 (78.71%) Loss: 2.131095 LR: 0.00000990 +[16:01:38] 
Epoch: 1 Batch: 15820/20099 (78.71%) Loss: 2.228522 LR: 0.00000990 +[16:01:40] Epoch: 1 Batch: 15821/20099 (78.72%) Loss: 2.126752 LR: 0.00000989 +[16:01:42] Epoch: 1 Batch: 15822/20099 (78.72%) Loss: 1.975162 LR: 0.00000989 +[16:01:44] Epoch: 1 Batch: 15823/20099 (78.73%) Loss: 1.967517 LR: 0.00000989 +[16:01:46] Epoch: 1 Batch: 15824/20099 (78.73%) Loss: 2.028698 LR: 0.00000989 +[16:01:48] Epoch: 1 Batch: 15825/20099 (78.74%) Loss: 2.126650 LR: 0.00000989 +[16:01:50] Epoch: 1 Batch: 15826/20099 (78.74%) Loss: 1.743927 LR: 0.00000989 +[16:01:51] Epoch: 1 Batch: 15827/20099 (78.75%) Loss: 1.986560 LR: 0.00000989 +[16:01:53] Epoch: 1 Batch: 15828/20099 (78.75%) Loss: 1.899283 LR: 0.00000988 +[16:01:55] Epoch: 1 Batch: 15829/20099 (78.76%) Loss: 1.950975 LR: 0.00000988 +[16:01:57] Epoch: 1 Batch: 15830/20099 (78.76%) Loss: 2.043473 LR: 0.00000988 +[16:01:59] Epoch: 1 Batch: 15831/20099 (78.77%) Loss: 2.089319 LR: 0.00000988 +[16:02:01] Epoch: 1 Batch: 15832/20099 (78.77%) Loss: 2.080405 LR: 0.00000988 +[16:02:03] Epoch: 1 Batch: 15833/20099 (78.78%) Loss: 2.248727 LR: 0.00000988 +[16:02:04] Epoch: 1 Batch: 15834/20099 (78.78%) Loss: 2.099190 LR: 0.00000988 +[16:02:06] Epoch: 1 Batch: 15835/20099 (78.79%) Loss: 1.972268 LR: 0.00000986 +[16:02:08] Epoch: 1 Batch: 15836/20099 (78.79%) Loss: 2.161817 LR: 0.00000986 +[16:02:10] Epoch: 1 Batch: 15837/20099 (78.79%) Loss: 1.865821 LR: 0.00000986 +[16:02:12] Epoch: 1 Batch: 15838/20099 (78.80%) Loss: 2.130437 LR: 0.00000986 +[16:02:14] Epoch: 1 Batch: 15839/20099 (78.80%) Loss: 2.028706 LR: 0.00000986 +[16:02:15] Epoch: 1 Batch: 15840/20099 (78.81%) Loss: 1.881472 LR: 0.00000986 +[16:02:17] Epoch: 1 Batch: 15841/20099 (78.81%) Loss: 2.373928 LR: 0.00000986 +[16:02:19] Epoch: 1 Batch: 15842/20099 (78.82%) Loss: 2.217161 LR: 0.00000985 +[16:02:21] Epoch: 1 Batch: 15843/20099 (78.82%) Loss: 2.125276 LR: 0.00000985 +[16:02:23] Epoch: 1 Batch: 15844/20099 (78.83%) Loss: 2.222782 LR: 0.00000985 +[16:02:25] Epoch: 1 Batch: 
15845/20099 (78.83%) Loss: 2.038830 LR: 0.00000985 +[16:02:26] Epoch: 1 Batch: 15846/20099 (78.84%) Loss: 2.082449 LR: 0.00000985 +[16:02:28] Epoch: 1 Batch: 15847/20099 (78.84%) Loss: 2.216962 LR: 0.00000985 +[16:02:30] Epoch: 1 Batch: 15848/20099 (78.85%) Loss: 2.353801 LR: 0.00000985 +[16:02:32] Epoch: 1 Batch: 15849/20099 (78.85%) Loss: 2.253825 LR: 0.00000984 +[16:02:34] Epoch: 1 Batch: 15850/20099 (78.86%) Loss: 2.062839 LR: 0.00000984 +[16:02:36] Epoch: 1 Batch: 15851/20099 (78.86%) Loss: 1.963298 LR: 0.00000984 +[16:02:38] Epoch: 1 Batch: 15852/20099 (78.87%) Loss: 2.176297 LR: 0.00000984 +[16:02:39] Epoch: 1 Batch: 15853/20099 (78.87%) Loss: 2.258205 LR: 0.00000984 +[16:02:41] Epoch: 1 Batch: 15854/20099 (78.88%) Loss: 2.276224 LR: 0.00000984 +[16:02:43] Epoch: 1 Batch: 15855/20099 (78.88%) Loss: 2.102357 LR: 0.00000984 +[16:02:45] Epoch: 1 Batch: 15856/20099 (78.89%) Loss: 2.392173 LR: 0.00000983 +[16:02:47] Epoch: 1 Batch: 15857/20099 (78.89%) Loss: 1.991498 LR: 0.00000983 +[16:02:49] Epoch: 1 Batch: 15858/20099 (78.90%) Loss: 1.934220 LR: 0.00000983 +[16:02:51] Epoch: 1 Batch: 15859/20099 (78.90%) Loss: 1.951105 LR: 0.00000983 +[16:02:52] Epoch: 1 Batch: 15860/20099 (78.91%) Loss: 2.150551 LR: 0.00000983 +[16:02:54] Epoch: 1 Batch: 15861/20099 (78.91%) Loss: 2.121039 LR: 0.00000983 +[16:02:56] Epoch: 1 Batch: 15862/20099 (78.92%) Loss: 1.890010 LR: 0.00000983 +[16:02:58] Epoch: 1 Batch: 15863/20099 (78.92%) Loss: 2.056733 LR: 0.00000982 +[16:03:00] Epoch: 1 Batch: 15864/20099 (78.93%) Loss: 2.076474 LR: 0.00000982 +[16:03:02] Epoch: 1 Batch: 15865/20099 (78.93%) Loss: 2.092302 LR: 0.00000982 +[16:03:04] Epoch: 1 Batch: 15866/20099 (78.94%) Loss: 2.051674 LR: 0.00000982 +[16:03:06] Epoch: 1 Batch: 15867/20099 (78.94%) Loss: 2.130412 LR: 0.00000982 +[16:03:07] Epoch: 1 Batch: 15868/20099 (78.95%) Loss: 2.017164 LR: 0.00000982 +[16:03:09] Epoch: 1 Batch: 15869/20099 (78.95%) Loss: 1.874221 LR: 0.00000982 +[16:03:11] Epoch: 1 Batch: 15870/20099 (78.96%) 
Loss: 2.037051 LR: 0.00000980 +[16:03:13] Epoch: 1 Batch: 15871/20099 (78.96%) Loss: 1.904336 LR: 0.00000980 +[16:03:15] Epoch: 1 Batch: 15872/20099 (78.97%) Loss: 2.045178 LR: 0.00000980 +[16:03:17] Epoch: 1 Batch: 15873/20099 (78.97%) Loss: 2.002966 LR: 0.00000980 +[16:03:19] Epoch: 1 Batch: 15874/20099 (78.98%) Loss: 1.994064 LR: 0.00000980 +[16:03:20] Epoch: 1 Batch: 15875/20099 (78.98%) Loss: 2.104956 LR: 0.00000980 +[16:03:22] Epoch: 1 Batch: 15876/20099 (78.99%) Loss: 2.284694 LR: 0.00000980 +[16:03:24] Epoch: 1 Batch: 15877/20099 (78.99%) Loss: 2.190358 LR: 0.00000979 +[16:03:26] Epoch: 1 Batch: 15878/20099 (79.00%) Loss: 2.130193 LR: 0.00000979 +[16:03:28] Epoch: 1 Batch: 15879/20099 (79.00%) Loss: 2.105683 LR: 0.00000979 +[16:03:30] Epoch: 1 Batch: 15880/20099 (79.01%) Loss: 2.223555 LR: 0.00000979 +[16:03:31] Epoch: 1 Batch: 15881/20099 (79.01%) Loss: 2.162377 LR: 0.00000979 +[16:03:33] Epoch: 1 Batch: 15882/20099 (79.02%) Loss: 2.021289 LR: 0.00000979 +[16:03:35] Epoch: 1 Batch: 15883/20099 (79.02%) Loss: 1.930312 LR: 0.00000979 +[16:03:37] Epoch: 1 Batch: 15884/20099 (79.03%) Loss: 1.863414 LR: 0.00000978 +[16:03:39] Epoch: 1 Batch: 15885/20099 (79.03%) Loss: 2.243502 LR: 0.00000978 +[16:03:41] Epoch: 1 Batch: 15886/20099 (79.04%) Loss: 2.284492 LR: 0.00000978 +[16:03:43] Epoch: 1 Batch: 15887/20099 (79.04%) Loss: 2.088390 LR: 0.00000978 +[16:03:44] Epoch: 1 Batch: 15888/20099 (79.05%) Loss: 1.904705 LR: 0.00000978 +[16:03:46] Epoch: 1 Batch: 15889/20099 (79.05%) Loss: 1.891292 LR: 0.00000978 +[16:03:48] Epoch: 1 Batch: 15890/20099 (79.06%) Loss: 2.113130 LR: 0.00000978 +[16:03:50] Epoch: 1 Batch: 15891/20099 (79.06%) Loss: 1.989501 LR: 0.00000977 +[16:03:52] Epoch: 1 Batch: 15892/20099 (79.07%) Loss: 2.019909 LR: 0.00000977 +[16:03:54] Epoch: 1 Batch: 15893/20099 (79.07%) Loss: 1.938616 LR: 0.00000977 +[16:03:55] Epoch: 1 Batch: 15894/20099 (79.08%) Loss: 2.098261 LR: 0.00000977 +[16:03:57] Epoch: 1 Batch: 15895/20099 (79.08%) Loss: 2.125155 LR: 
0.00000977 +[16:03:59] Epoch: 1 Batch: 15896/20099 (79.09%) Loss: 2.150758 LR: 0.00000977 +[16:04:01] Epoch: 1 Batch: 15897/20099 (79.09%) Loss: 2.092648 LR: 0.00000977 +[16:04:03] Epoch: 1 Batch: 15898/20099 (79.10%) Loss: 2.090314 LR: 0.00000976 +[16:04:05] Epoch: 1 Batch: 15899/20099 (79.10%) Loss: 2.051914 LR: 0.00000976 +[16:04:07] Epoch: 1 Batch: 15900/20099 (79.11%) Loss: 2.247695 LR: 0.00000976 +[16:04:08] Epoch: 1 Batch: 15901/20099 (79.11%) Loss: 2.150155 LR: 0.00000976 +[16:04:10] Epoch: 1 Batch: 15902/20099 (79.12%) Loss: 2.101501 LR: 0.00000976 +[16:04:12] Epoch: 1 Batch: 15903/20099 (79.12%) Loss: 2.013657 LR: 0.00000976 +[16:04:14] Epoch: 1 Batch: 15904/20099 (79.13%) Loss: 2.161249 LR: 0.00000976 +[16:04:16] Epoch: 1 Batch: 15905/20099 (79.13%) Loss: 2.048043 LR: 0.00000974 +[16:04:18] Epoch: 1 Batch: 15906/20099 (79.14%) Loss: 2.328536 LR: 0.00000974 +[16:04:20] Epoch: 1 Batch: 15907/20099 (79.14%) Loss: 2.120104 LR: 0.00000974 +[16:04:21] Epoch: 1 Batch: 15908/20099 (79.15%) Loss: 2.511087 LR: 0.00000974 +[16:04:23] Epoch: 1 Batch: 15909/20099 (79.15%) Loss: 1.976946 LR: 0.00000974 +[16:04:25] Epoch: 1 Batch: 15910/20099 (79.16%) Loss: 2.237674 LR: 0.00000974 +[16:04:27] Epoch: 1 Batch: 15911/20099 (79.16%) Loss: 2.052062 LR: 0.00000974 +[16:04:29] Epoch: 1 Batch: 15912/20099 (79.17%) Loss: 2.073648 LR: 0.00000973 +[16:04:31] Epoch: 1 Batch: 15913/20099 (79.17%) Loss: 2.119620 LR: 0.00000973 +[16:04:32] Epoch: 1 Batch: 15914/20099 (79.18%) Loss: 2.229075 LR: 0.00000973 +[16:04:34] Epoch: 1 Batch: 15915/20099 (79.18%) Loss: 2.061507 LR: 0.00000973 +[16:04:36] Epoch: 1 Batch: 15916/20099 (79.19%) Loss: 1.939970 LR: 0.00000973 +[16:04:38] Epoch: 1 Batch: 15917/20099 (79.19%) Loss: 1.960644 LR: 0.00000973 +[16:04:40] Epoch: 1 Batch: 15918/20099 (79.20%) Loss: 1.991987 LR: 0.00000973 +[16:04:42] Epoch: 1 Batch: 15919/20099 (79.20%) Loss: 1.927425 LR: 0.00000972 +[16:04:44] Epoch: 1 Batch: 15920/20099 (79.21%) Loss: 1.990829 LR: 0.00000972 +[16:04:45] 
Epoch: 1 Batch: 15921/20099 (79.21%) Loss: 2.029898 LR: 0.00000972 +[16:04:47] Epoch: 1 Batch: 15922/20099 (79.22%) Loss: 2.063139 LR: 0.00000972 +[16:04:49] Epoch: 1 Batch: 15923/20099 (79.22%) Loss: 2.181451 LR: 0.00000972 +[16:04:51] Epoch: 1 Batch: 15924/20099 (79.23%) Loss: 2.146837 LR: 0.00000972 +[16:04:53] Epoch: 1 Batch: 15925/20099 (79.23%) Loss: 2.129570 LR: 0.00000972 +[16:04:55] Epoch: 1 Batch: 15926/20099 (79.24%) Loss: 2.310096 LR: 0.00000971 +[16:04:57] Epoch: 1 Batch: 15927/20099 (79.24%) Loss: 2.133451 LR: 0.00000971 +[16:04:58] Epoch: 1 Batch: 15928/20099 (79.25%) Loss: 2.060733 LR: 0.00000971 +[16:05:00] Epoch: 1 Batch: 15929/20099 (79.25%) Loss: 2.009328 LR: 0.00000971 +[16:05:02] Epoch: 1 Batch: 15930/20099 (79.26%) Loss: 2.264488 LR: 0.00000971 +[16:05:04] Epoch: 1 Batch: 15931/20099 (79.26%) Loss: 1.887088 LR: 0.00000971 +[16:05:06] Epoch: 1 Batch: 15932/20099 (79.27%) Loss: 2.002310 LR: 0.00000971 +[16:05:08] Epoch: 1 Batch: 15933/20099 (79.27%) Loss: 1.818721 LR: 0.00000970 +[16:05:10] Epoch: 1 Batch: 15934/20099 (79.28%) Loss: 2.426514 LR: 0.00000970 +[16:05:11] Epoch: 1 Batch: 15935/20099 (79.28%) Loss: 2.045333 LR: 0.00000970 +[16:05:13] Epoch: 1 Batch: 15936/20099 (79.29%) Loss: 1.829441 LR: 0.00000970 +[16:05:15] Epoch: 1 Batch: 15937/20099 (79.29%) Loss: 2.552944 LR: 0.00000970 +[16:05:17] Epoch: 1 Batch: 15938/20099 (79.30%) Loss: 2.170409 LR: 0.00000970 +[16:05:19] Epoch: 1 Batch: 15939/20099 (79.30%) Loss: 1.718681 LR: 0.00000970 +[16:05:21] Epoch: 1 Batch: 15940/20099 (79.31%) Loss: 1.929232 LR: 0.00000969 +[16:05:23] Epoch: 1 Batch: 15941/20099 (79.31%) Loss: 2.056788 LR: 0.00000969 +[16:05:24] Epoch: 1 Batch: 15942/20099 (79.32%) Loss: 1.986933 LR: 0.00000969 +[16:05:26] Epoch: 1 Batch: 15943/20099 (79.32%) Loss: 1.996920 LR: 0.00000969 +[16:05:28] Epoch: 1 Batch: 15944/20099 (79.33%) Loss: 2.125483 LR: 0.00000969 +[16:05:30] Epoch: 1 Batch: 15945/20099 (79.33%) Loss: 1.982537 LR: 0.00000969 +[16:05:32] Epoch: 1 Batch: 
15946/20099 (79.34%) Loss: 1.807639 LR: 0.00000969 +[16:05:34] Epoch: 1 Batch: 15947/20099 (79.34%) Loss: 1.806106 LR: 0.00000967 +[16:05:36] Epoch: 1 Batch: 15948/20099 (79.35%) Loss: 2.000024 LR: 0.00000967 +[16:05:37] Epoch: 1 Batch: 15949/20099 (79.35%) Loss: 2.316201 LR: 0.00000967 +[16:05:39] Epoch: 1 Batch: 15950/20099 (79.36%) Loss: 2.027521 LR: 0.00000967 +[16:05:41] Epoch: 1 Batch: 15951/20099 (79.36%) Loss: 1.986259 LR: 0.00000967 +[16:05:43] Epoch: 1 Batch: 15952/20099 (79.37%) Loss: 2.517145 LR: 0.00000967 +[16:05:45] Epoch: 1 Batch: 15953/20099 (79.37%) Loss: 2.251321 LR: 0.00000967 +[16:05:47] Epoch: 1 Batch: 15954/20099 (79.38%) Loss: 1.892114 LR: 0.00000966 +[16:05:49] Epoch: 1 Batch: 15955/20099 (79.38%) Loss: 1.994374 LR: 0.00000966 +[16:05:50] Epoch: 1 Batch: 15956/20099 (79.39%) Loss: 1.884433 LR: 0.00000966 +[16:05:52] Epoch: 1 Batch: 15957/20099 (79.39%) Loss: 1.970425 LR: 0.00000966 +[16:05:54] Epoch: 1 Batch: 15958/20099 (79.40%) Loss: 1.951691 LR: 0.00000966 +[16:05:56] Epoch: 1 Batch: 15959/20099 (79.40%) Loss: 1.963602 LR: 0.00000966 +[16:05:58] Epoch: 1 Batch: 15960/20099 (79.41%) Loss: 1.896909 LR: 0.00000966 +[16:06:00] Epoch: 1 Batch: 15961/20099 (79.41%) Loss: 1.966817 LR: 0.00000965 +[16:06:02] Epoch: 1 Batch: 15962/20099 (79.42%) Loss: 2.094138 LR: 0.00000965 +[16:06:03] Epoch: 1 Batch: 15963/20099 (79.42%) Loss: 2.034080 LR: 0.00000965 +[16:06:05] Epoch: 1 Batch: 15964/20099 (79.43%) Loss: 2.104521 LR: 0.00000965 +[16:06:07] Epoch: 1 Batch: 15965/20099 (79.43%) Loss: 1.782545 LR: 0.00000965 +[16:06:09] Epoch: 1 Batch: 15966/20099 (79.44%) Loss: 2.009510 LR: 0.00000965 +[16:06:11] Epoch: 1 Batch: 15967/20099 (79.44%) Loss: 1.649070 LR: 0.00000965 +[16:06:13] Epoch: 1 Batch: 15968/20099 (79.45%) Loss: 2.149815 LR: 0.00000964 +[16:06:15] Epoch: 1 Batch: 15969/20099 (79.45%) Loss: 2.035090 LR: 0.00000964 +[16:06:16] Epoch: 1 Batch: 15970/20099 (79.46%) Loss: 2.236294 LR: 0.00000964 +[16:06:18] Epoch: 1 Batch: 15971/20099 (79.46%) 
Loss: 2.275693 LR: 0.00000964 +[16:06:20] Epoch: 1 Batch: 15972/20099 (79.47%) Loss: 2.172979 LR: 0.00000964 +[16:06:22] Epoch: 1 Batch: 15973/20099 (79.47%) Loss: 2.235153 LR: 0.00000964 +[16:06:24] Epoch: 1 Batch: 15974/20099 (79.48%) Loss: 1.606601 LR: 0.00000964 +[16:06:26] Epoch: 1 Batch: 15975/20099 (79.48%) Loss: 2.061748 LR: 0.00000963 +[16:06:28] Epoch: 1 Batch: 15976/20099 (79.49%) Loss: 1.980593 LR: 0.00000963 +[16:06:29] Epoch: 1 Batch: 15977/20099 (79.49%) Loss: 1.970255 LR: 0.00000963 +[16:06:31] Epoch: 1 Batch: 15978/20099 (79.50%) Loss: 2.027783 LR: 0.00000963 +[16:06:33] Epoch: 1 Batch: 15979/20099 (79.50%) Loss: 1.925441 LR: 0.00000963 +[16:06:35] Epoch: 1 Batch: 15980/20099 (79.51%) Loss: 2.073586 LR: 0.00000963 +[16:06:37] Epoch: 1 Batch: 15981/20099 (79.51%) Loss: 2.237588 LR: 0.00000963 +[16:06:39] Epoch: 1 Batch: 15982/20099 (79.52%) Loss: 2.538350 LR: 0.00000962 +[16:06:41] Epoch: 1 Batch: 15983/20099 (79.52%) Loss: 2.069956 LR: 0.00000962 +[16:06:42] Epoch: 1 Batch: 15984/20099 (79.53%) Loss: 2.007016 LR: 0.00000962 +[16:06:44] Epoch: 1 Batch: 15985/20099 (79.53%) Loss: 2.342731 LR: 0.00000962 +[16:06:46] Epoch: 1 Batch: 15986/20099 (79.54%) Loss: 2.064733 LR: 0.00000962 +[16:06:48] Epoch: 1 Batch: 15987/20099 (79.54%) Loss: 1.825910 LR: 0.00000962 +[16:06:50] Epoch: 1 Batch: 15988/20099 (79.55%) Loss: 2.457549 LR: 0.00000962 +[16:06:52] Epoch: 1 Batch: 15989/20099 (79.55%) Loss: 1.984759 LR: 0.00000960 +[16:06:54] Epoch: 1 Batch: 15990/20099 (79.56%) Loss: 2.153504 LR: 0.00000960 +[16:06:55] Epoch: 1 Batch: 15991/20099 (79.56%) Loss: 2.168320 LR: 0.00000960 +[16:06:57] Epoch: 1 Batch: 15992/20099 (79.57%) Loss: 2.223712 LR: 0.00000960 +[16:06:59] Epoch: 1 Batch: 15993/20099 (79.57%) Loss: 1.968870 LR: 0.00000960 +[16:07:01] Epoch: 1 Batch: 15994/20099 (79.58%) Loss: 2.223657 LR: 0.00000960 +[16:07:03] Epoch: 1 Batch: 15995/20099 (79.58%) Loss: 1.992247 LR: 0.00000960 +[16:07:05] Epoch: 1 Batch: 15996/20099 (79.59%) Loss: 2.353453 LR: 
0.00000959 +[16:07:06] Epoch: 1 Batch: 15997/20099 (79.59%) Loss: 1.974596 LR: 0.00000959 +[16:07:08] Epoch: 1 Batch: 15998/20099 (79.60%) Loss: 2.178932 LR: 0.00000959 +[16:07:10] Epoch: 1 Batch: 15999/20099 (79.60%) Loss: 1.863865 LR: 0.00000959 +[16:07:12] >> Evaluating batch 0 +[16:07:13] >> Evaluating batch 1 +[16:07:14] >> Evaluating batch 2 +[16:07:15] >> Evaluating batch 3 +[16:07:16] >> Evaluating batch 4 +[16:07:17] >> Evaluating batch 5 +[16:07:18] >> Evaluating batch 6 +[16:07:20] >> Evaluating batch 7 +[16:07:21] >> Evaluating batch 8 +[16:07:22] >> Evaluating batch 9 +[16:07:23] >> Evaluating batch 10 +[16:07:24] >> Evaluating batch 11 +[16:07:25] >> Evaluating batch 12 +[16:07:26] >> Evaluating batch 13 +[16:07:27] >> Evaluating batch 14 +[16:07:28] >> Evaluating batch 15 +[16:07:29] >> Evaluating batch 16 +[16:07:29] Epoch: 1 Step: 16000/20099 Evaluation: +[16:07:29] Avg Loss Since Last Eval: 2.0832 Val Loss: 2.1489 Validation loss delta: -0.0015 Perplexity: 8.5754 LR: 0.00000959 +[16:07:33] >> Cleaned up old temp checkpoint: epoch1_step14000 +[16:07:33] >> Temp checkpoint saved: epoch1_step16000, size: 0.1693 GB +[16:07:36] >> Checkpoint saved: epoch1_step16000, size: 0.1693 GB +[16:07:36] Epoch: 1 Batch: 16000/20099 (79.61%) Loss: 2.333603 LR: 0.00000959 +[16:07:38] Epoch: 1 Batch: 16001/20099 (79.61%) Loss: 2.075922 LR: 0.00000959 +[16:07:40] Epoch: 1 Batch: 16002/20099 (79.62%) Loss: 2.095192 LR: 0.00000959 +[16:07:42] Epoch: 1 Batch: 16003/20099 (79.62%) Loss: 2.162087 LR: 0.00000958 +[16:07:44] Epoch: 1 Batch: 16004/20099 (79.63%) Loss: 1.783127 LR: 0.00000958 +[16:07:45] Epoch: 1 Batch: 16005/20099 (79.63%) Loss: 2.001313 LR: 0.00000958 +[16:07:47] Epoch: 1 Batch: 16006/20099 (79.64%) Loss: 1.945339 LR: 0.00000958 +[16:07:49] Epoch: 1 Batch: 16007/20099 (79.64%) Loss: 1.991182 LR: 0.00000958 +[16:07:51] Epoch: 1 Batch: 16008/20099 (79.65%) Loss: 2.114540 LR: 0.00000958 +[16:07:53] Epoch: 1 Batch: 16009/20099 (79.65%) Loss: 2.072911 LR: 
0.00000958 +[16:07:55] Epoch: 1 Batch: 16010/20099 (79.66%) Loss: 2.069255 LR: 0.00000957 +[16:07:56] Epoch: 1 Batch: 16011/20099 (79.66%) Loss: 2.141101 LR: 0.00000957 +[16:07:58] Epoch: 1 Batch: 16012/20099 (79.67%) Loss: 2.186353 LR: 0.00000957 +[16:08:00] Epoch: 1 Batch: 16013/20099 (79.67%) Loss: 2.107273 LR: 0.00000957 +[16:08:02] Epoch: 1 Batch: 16014/20099 (79.68%) Loss: 1.867392 LR: 0.00000957 +[16:08:04] Epoch: 1 Batch: 16015/20099 (79.68%) Loss: 1.975651 LR: 0.00000957 +[16:08:06] Epoch: 1 Batch: 16016/20099 (79.69%) Loss: 1.994370 LR: 0.00000957 +[16:08:08] Epoch: 1 Batch: 16017/20099 (79.69%) Loss: 1.718414 LR: 0.00000956 +[16:08:09] Epoch: 1 Batch: 16018/20099 (79.70%) Loss: 1.784824 LR: 0.00000956 +[16:08:11] Epoch: 1 Batch: 16019/20099 (79.70%) Loss: 2.071734 LR: 0.00000956 +[16:08:13] Epoch: 1 Batch: 16020/20099 (79.71%) Loss: 2.116893 LR: 0.00000956 +[16:08:15] Epoch: 1 Batch: 16021/20099 (79.71%) Loss: 2.129067 LR: 0.00000956 +[16:08:17] Epoch: 1 Batch: 16022/20099 (79.72%) Loss: 1.583279 LR: 0.00000956 +[16:08:19] Epoch: 1 Batch: 16023/20099 (79.72%) Loss: 2.021614 LR: 0.00000956 +[16:08:21] Epoch: 1 Batch: 16024/20099 (79.73%) Loss: 2.063846 LR: 0.00000955 +[16:08:22] Epoch: 1 Batch: 16025/20099 (79.73%) Loss: 2.326497 LR: 0.00000955 +[16:08:24] Epoch: 1 Batch: 16026/20099 (79.74%) Loss: 2.180389 LR: 0.00000955 +[16:08:26] Epoch: 1 Batch: 16027/20099 (79.74%) Loss: 2.197096 LR: 0.00000955 +[16:08:28] Epoch: 1 Batch: 16028/20099 (79.75%) Loss: 2.300980 LR: 0.00000955 +[16:08:30] Epoch: 1 Batch: 16029/20099 (79.75%) Loss: 1.945229 LR: 0.00000955 +[16:08:32] Epoch: 1 Batch: 16030/20099 (79.76%) Loss: 2.389047 LR: 0.00000955 +[16:08:34] Epoch: 1 Batch: 16031/20099 (79.76%) Loss: 2.038697 LR: 0.00000953 +[16:08:35] Epoch: 1 Batch: 16032/20099 (79.77%) Loss: 2.304904 LR: 0.00000953 +[16:08:37] Epoch: 1 Batch: 16033/20099 (79.77%) Loss: 1.943884 LR: 0.00000953 +[16:08:39] Epoch: 1 Batch: 16034/20099 (79.78%) Loss: 2.231856 LR: 0.00000953 +[16:08:41] 
Epoch: 1 Batch: 16035/20099 (79.78%) Loss: 2.062818 LR: 0.00000953 +[16:08:43] Epoch: 1 Batch: 16036/20099 (79.79%) Loss: 2.463088 LR: 0.00000953 +[16:08:45] Epoch: 1 Batch: 16037/20099 (79.79%) Loss: 1.882139 LR: 0.00000953 +[16:08:47] Epoch: 1 Batch: 16038/20099 (79.80%) Loss: 2.138044 LR: 0.00000952 +[16:08:48] Epoch: 1 Batch: 16039/20099 (79.80%) Loss: 2.224810 LR: 0.00000952 +[16:08:50] Epoch: 1 Batch: 16040/20099 (79.80%) Loss: 2.064534 LR: 0.00000952 +[16:08:52] Epoch: 1 Batch: 16041/20099 (79.81%) Loss: 2.012225 LR: 0.00000952 +[16:08:54] Epoch: 1 Batch: 16042/20099 (79.81%) Loss: 2.265417 LR: 0.00000952 +[16:08:56] Epoch: 1 Batch: 16043/20099 (79.82%) Loss: 2.202663 LR: 0.00000952 +[16:08:58] Epoch: 1 Batch: 16044/20099 (79.82%) Loss: 1.857615 LR: 0.00000952 +[16:09:00] Epoch: 1 Batch: 16045/20099 (79.83%) Loss: 1.804510 LR: 0.00000951 +[16:09:01] Epoch: 1 Batch: 16046/20099 (79.83%) Loss: 1.948513 LR: 0.00000951 +[16:09:03] Epoch: 1 Batch: 16047/20099 (79.84%) Loss: 1.931455 LR: 0.00000951 +[16:09:05] Epoch: 1 Batch: 16048/20099 (79.84%) Loss: 2.257914 LR: 0.00000951 +[16:09:07] Epoch: 1 Batch: 16049/20099 (79.85%) Loss: 1.781874 LR: 0.00000951 +[16:09:09] Epoch: 1 Batch: 16050/20099 (79.85%) Loss: 2.147046 LR: 0.00000951 +[16:09:11] Epoch: 1 Batch: 16051/20099 (79.86%) Loss: 2.174601 LR: 0.00000951 +[16:09:12] Epoch: 1 Batch: 16052/20099 (79.86%) Loss: 1.810132 LR: 0.00000950 +[16:09:14] Epoch: 1 Batch: 16053/20099 (79.87%) Loss: 2.076145 LR: 0.00000950 +[16:09:16] Epoch: 1 Batch: 16054/20099 (79.87%) Loss: 2.177145 LR: 0.00000950 +[16:09:18] Epoch: 1 Batch: 16055/20099 (79.88%) Loss: 2.316991 LR: 0.00000950 +[16:09:20] Epoch: 1 Batch: 16056/20099 (79.88%) Loss: 2.006156 LR: 0.00000950 +[16:09:22] Epoch: 1 Batch: 16057/20099 (79.89%) Loss: 1.941637 LR: 0.00000950 +[16:09:24] Epoch: 1 Batch: 16058/20099 (79.89%) Loss: 1.860137 LR: 0.00000950 +[16:09:25] Epoch: 1 Batch: 16059/20099 (79.90%) Loss: 2.138763 LR: 0.00000949 +[16:09:27] Epoch: 1 Batch: 
16060/20099 (79.90%) Loss: 2.144622 LR: 0.00000949 +[16:09:29] Epoch: 1 Batch: 16061/20099 (79.91%) Loss: 1.771066 LR: 0.00000949 +[16:09:31] Epoch: 1 Batch: 16062/20099 (79.91%) Loss: 2.062634 LR: 0.00000949 +[16:09:33] Epoch: 1 Batch: 16063/20099 (79.92%) Loss: 1.971748 LR: 0.00000949 +[16:09:35] Epoch: 1 Batch: 16064/20099 (79.92%) Loss: 1.773437 LR: 0.00000949 +[16:09:37] Epoch: 1 Batch: 16065/20099 (79.93%) Loss: 1.965925 LR: 0.00000949 +[16:09:38] Epoch: 1 Batch: 16066/20099 (79.93%) Loss: 2.158885 LR: 0.00000948 +[16:09:40] Epoch: 1 Batch: 16067/20099 (79.94%) Loss: 1.698883 LR: 0.00000948 +[16:09:42] Epoch: 1 Batch: 16068/20099 (79.94%) Loss: 1.952510 LR: 0.00000948 +[16:09:44] Epoch: 1 Batch: 16069/20099 (79.95%) Loss: 2.044368 LR: 0.00000948 +[16:09:46] Epoch: 1 Batch: 16070/20099 (79.95%) Loss: 2.048138 LR: 0.00000948 +[16:09:48] Epoch: 1 Batch: 16071/20099 (79.96%) Loss: 2.140734 LR: 0.00000948 +[16:09:50] Epoch: 1 Batch: 16072/20099 (79.96%) Loss: 1.973428 LR: 0.00000948 +[16:09:51] Epoch: 1 Batch: 16073/20099 (79.97%) Loss: 1.903027 LR: 0.00000947 +[16:09:53] Epoch: 1 Batch: 16074/20099 (79.97%) Loss: 2.579132 LR: 0.00000947 +[16:09:55] Epoch: 1 Batch: 16075/20099 (79.98%) Loss: 2.154301 LR: 0.00000947 +[16:09:57] Epoch: 1 Batch: 16076/20099 (79.98%) Loss: 2.282586 LR: 0.00000947 +[16:09:59] Epoch: 1 Batch: 16077/20099 (79.99%) Loss: 2.173342 LR: 0.00000947 +[16:10:01] Epoch: 1 Batch: 16078/20099 (79.99%) Loss: 2.020382 LR: 0.00000947 +[16:10:03] Epoch: 1 Batch: 16079/20099 (80.00%) Loss: 2.067012 LR: 0.00000947 +[16:10:04] Epoch: 1 Batch: 16080/20099 (80.00%) Loss: 2.142215 LR: 0.00000945 +[16:10:06] Epoch: 1 Batch: 16081/20099 (80.01%) Loss: 2.184170 LR: 0.00000945 +[16:10:08] Epoch: 1 Batch: 16082/20099 (80.01%) Loss: 2.260066 LR: 0.00000945 +[16:10:10] Epoch: 1 Batch: 16083/20099 (80.02%) Loss: 2.211658 LR: 0.00000945 +[16:10:12] Epoch: 1 Batch: 16084/20099 (80.02%) Loss: 2.162451 LR: 0.00000945 +[16:10:14] Epoch: 1 Batch: 16085/20099 (80.03%) 
Loss: 2.212565 LR: 0.00000945 +[16:10:16] Epoch: 1 Batch: 16086/20099 (80.03%) Loss: 2.288130 LR: 0.00000945 +[16:10:17] Epoch: 1 Batch: 16087/20099 (80.04%) Loss: 2.032566 LR: 0.00000944 +[16:10:19] Epoch: 1 Batch: 16088/20099 (80.04%) Loss: 2.324108 LR: 0.00000944 +[16:10:21] Epoch: 1 Batch: 16089/20099 (80.05%) Loss: 1.565732 LR: 0.00000944 +[16:10:23] Epoch: 1 Batch: 16090/20099 (80.05%) Loss: 2.124652 LR: 0.00000944 +[16:10:25] Epoch: 1 Batch: 16091/20099 (80.06%) Loss: 2.125581 LR: 0.00000944 +[16:10:27] Epoch: 1 Batch: 16092/20099 (80.06%) Loss: 2.148273 LR: 0.00000944 +[16:10:29] Epoch: 1 Batch: 16093/20099 (80.07%) Loss: 2.351346 LR: 0.00000944 +[16:10:31] Epoch: 1 Batch: 16094/20099 (80.07%) Loss: 1.891081 LR: 0.00000943 +[16:10:32] Epoch: 1 Batch: 16095/20099 (80.08%) Loss: 1.921865 LR: 0.00000943 +[16:10:34] Epoch: 1 Batch: 16096/20099 (80.08%) Loss: 1.993951 LR: 0.00000943 +[16:10:36] Epoch: 1 Batch: 16097/20099 (80.09%) Loss: 2.046703 LR: 0.00000943 +[16:10:38] Epoch: 1 Batch: 16098/20099 (80.09%) Loss: 1.883911 LR: 0.00000943 +[16:10:40] Epoch: 1 Batch: 16099/20099 (80.10%) Loss: 2.296765 LR: 0.00000943 +[16:10:42] Epoch: 1 Batch: 16100/20099 (80.10%) Loss: 1.429716 LR: 0.00000943 +[16:10:44] Epoch: 1 Batch: 16101/20099 (80.11%) Loss: 2.021261 LR: 0.00000942 +[16:10:45] Epoch: 1 Batch: 16102/20099 (80.11%) Loss: 2.589377 LR: 0.00000942 +[16:10:47] Epoch: 1 Batch: 16103/20099 (80.12%) Loss: 2.082891 LR: 0.00000942 +[16:10:49] Epoch: 1 Batch: 16104/20099 (80.12%) Loss: 2.170861 LR: 0.00000942 +[16:10:51] Epoch: 1 Batch: 16105/20099 (80.13%) Loss: 2.072650 LR: 0.00000942 +[16:10:53] Epoch: 1 Batch: 16106/20099 (80.13%) Loss: 2.279993 LR: 0.00000942 +[16:10:55] Epoch: 1 Batch: 16107/20099 (80.14%) Loss: 1.920435 LR: 0.00000942 +[16:10:57] Epoch: 1 Batch: 16108/20099 (80.14%) Loss: 2.115345 LR: 0.00000941 +[16:10:58] Epoch: 1 Batch: 16109/20099 (80.15%) Loss: 2.205196 LR: 0.00000941 +[16:11:00] Epoch: 1 Batch: 16110/20099 (80.15%) Loss: 2.131766 LR: 
0.00000941 +[16:11:02] Epoch: 1 Batch: 16111/20099 (80.16%) Loss: 2.310471 LR: 0.00000941 +[16:11:04] Epoch: 1 Batch: 16112/20099 (80.16%) Loss: 2.138590 LR: 0.00000941 +[16:11:06] Epoch: 1 Batch: 16113/20099 (80.17%) Loss: 2.274571 LR: 0.00000941 +[16:11:08] Epoch: 1 Batch: 16114/20099 (80.17%) Loss: 2.001380 LR: 0.00000941 +[16:11:10] Epoch: 1 Batch: 16115/20099 (80.18%) Loss: 2.040736 LR: 0.00000940 +[16:11:11] Epoch: 1 Batch: 16116/20099 (80.18%) Loss: 2.132338 LR: 0.00000940 +[16:11:13] Epoch: 1 Batch: 16117/20099 (80.19%) Loss: 1.939666 LR: 0.00000940 +[16:11:15] Epoch: 1 Batch: 16118/20099 (80.19%) Loss: 1.968175 LR: 0.00000940 +[16:11:17] Epoch: 1 Batch: 16119/20099 (80.20%) Loss: 1.811564 LR: 0.00000940 +[16:11:19] Epoch: 1 Batch: 16120/20099 (80.20%) Loss: 1.948194 LR: 0.00000940 +[16:11:21] Epoch: 1 Batch: 16121/20099 (80.21%) Loss: 1.971432 LR: 0.00000940 +[16:11:23] Epoch: 1 Batch: 16122/20099 (80.21%) Loss: 1.957029 LR: 0.00000939 +[16:11:24] Epoch: 1 Batch: 16123/20099 (80.22%) Loss: 1.983742 LR: 0.00000939 +[16:11:26] Epoch: 1 Batch: 16124/20099 (80.22%) Loss: 2.321616 LR: 0.00000939 +[16:11:28] Epoch: 1 Batch: 16125/20099 (80.23%) Loss: 1.886446 LR: 0.00000939 +[16:11:30] Epoch: 1 Batch: 16126/20099 (80.23%) Loss: 1.948652 LR: 0.00000939 +[16:11:32] Epoch: 1 Batch: 16127/20099 (80.24%) Loss: 2.157378 LR: 0.00000939 +[16:11:34] Epoch: 1 Batch: 16128/20099 (80.24%) Loss: 2.149153 LR: 0.00000939 +[16:11:36] Epoch: 1 Batch: 16129/20099 (80.25%) Loss: 2.044570 LR: 0.00000938 +[16:11:37] Epoch: 1 Batch: 16130/20099 (80.25%) Loss: 1.874442 LR: 0.00000938 +[16:11:39] Epoch: 1 Batch: 16131/20099 (80.26%) Loss: 1.622912 LR: 0.00000938 +[16:11:41] Epoch: 1 Batch: 16132/20099 (80.26%) Loss: 1.899008 LR: 0.00000938 +[16:11:43] Epoch: 1 Batch: 16133/20099 (80.27%) Loss: 2.073575 LR: 0.00000938 +[16:11:45] Epoch: 1 Batch: 16134/20099 (80.27%) Loss: 2.205327 LR: 0.00000938 +[16:11:47] Epoch: 1 Batch: 16135/20099 (80.28%) Loss: 2.192777 LR: 0.00000938 +[16:11:49] 
Epoch: 1 Batch: 16136/20099 (80.28%) Loss: 1.954570 LR: 0.00000936 +[16:11:50] Epoch: 1 Batch: 16137/20099 (80.29%) Loss: 2.243796 LR: 0.00000936 +[16:11:52] Epoch: 1 Batch: 16138/20099 (80.29%) Loss: 2.008721 LR: 0.00000936 +[16:11:54] Epoch: 1 Batch: 16139/20099 (80.30%) Loss: 1.996799 LR: 0.00000936 +[16:11:56] Epoch: 1 Batch: 16140/20099 (80.30%) Loss: 1.844346 LR: 0.00000936 +[16:11:58] Epoch: 1 Batch: 16141/20099 (80.31%) Loss: 1.987541 LR: 0.00000936 +[16:12:00] Epoch: 1 Batch: 16142/20099 (80.31%) Loss: 2.058368 LR: 0.00000936 +[16:12:02] Epoch: 1 Batch: 16143/20099 (80.32%) Loss: 1.701446 LR: 0.00000935 +[16:12:03] Epoch: 1 Batch: 16144/20099 (80.32%) Loss: 2.061042 LR: 0.00000935 +[16:12:05] Epoch: 1 Batch: 16145/20099 (80.33%) Loss: 2.191151 LR: 0.00000935 +[16:12:07] Epoch: 1 Batch: 16146/20099 (80.33%) Loss: 1.921444 LR: 0.00000935 +[16:12:09] Epoch: 1 Batch: 16147/20099 (80.34%) Loss: 2.096420 LR: 0.00000935 +[16:12:11] Epoch: 1 Batch: 16148/20099 (80.34%) Loss: 2.063709 LR: 0.00000935 +[16:12:13] Epoch: 1 Batch: 16149/20099 (80.35%) Loss: 2.056097 LR: 0.00000935 +[16:12:15] Epoch: 1 Batch: 16150/20099 (80.35%) Loss: 1.774503 LR: 0.00000934 +[16:12:16] Epoch: 1 Batch: 16151/20099 (80.36%) Loss: 1.880729 LR: 0.00000934 +[16:12:18] Epoch: 1 Batch: 16152/20099 (80.36%) Loss: 2.170770 LR: 0.00000934 +[16:12:20] Epoch: 1 Batch: 16153/20099 (80.37%) Loss: 2.005351 LR: 0.00000934 +[16:12:22] Epoch: 1 Batch: 16154/20099 (80.37%) Loss: 2.300341 LR: 0.00000934 +[16:12:24] Epoch: 1 Batch: 16155/20099 (80.38%) Loss: 1.772797 LR: 0.00000934 +[16:12:26] Epoch: 1 Batch: 16156/20099 (80.38%) Loss: 2.207148 LR: 0.00000934 +[16:12:28] Epoch: 1 Batch: 16157/20099 (80.39%) Loss: 2.035293 LR: 0.00000933 +[16:12:29] Epoch: 1 Batch: 16158/20099 (80.39%) Loss: 2.267085 LR: 0.00000933 +[16:12:31] Epoch: 1 Batch: 16159/20099 (80.40%) Loss: 2.179423 LR: 0.00000933 +[16:12:33] Epoch: 1 Batch: 16160/20099 (80.40%) Loss: 2.379231 LR: 0.00000933 +[16:12:35] Epoch: 1 Batch: 
16161/20099 (80.41%) Loss: 1.900602 LR: 0.00000933 +[16:12:37] Epoch: 1 Batch: 16162/20099 (80.41%) Loss: 2.152182 LR: 0.00000933 +[16:12:39] Epoch: 1 Batch: 16163/20099 (80.42%) Loss: 2.059004 LR: 0.00000933 +[16:12:40] Epoch: 1 Batch: 16164/20099 (80.42%) Loss: 1.996798 LR: 0.00000932 +[16:12:42] Epoch: 1 Batch: 16165/20099 (80.43%) Loss: 2.285771 LR: 0.00000932 +[16:12:44] Epoch: 1 Batch: 16166/20099 (80.43%) Loss: 2.231897 LR: 0.00000932 +[16:12:46] Epoch: 1 Batch: 16167/20099 (80.44%) Loss: 1.965531 LR: 0.00000932 +[16:12:48] Epoch: 1 Batch: 16168/20099 (80.44%) Loss: 2.193839 LR: 0.00000932 +[16:12:50] Epoch: 1 Batch: 16169/20099 (80.45%) Loss: 1.935272 LR: 0.00000932 +[16:12:52] Epoch: 1 Batch: 16170/20099 (80.45%) Loss: 2.403994 LR: 0.00000932 +[16:12:53] Epoch: 1 Batch: 16171/20099 (80.46%) Loss: 1.953079 LR: 0.00000931 +[16:12:55] Epoch: 1 Batch: 16172/20099 (80.46%) Loss: 2.028410 LR: 0.00000931 +[16:12:57] Epoch: 1 Batch: 16173/20099 (80.47%) Loss: 2.233258 LR: 0.00000931 +[16:12:59] Epoch: 1 Batch: 16174/20099 (80.47%) Loss: 1.968730 LR: 0.00000931 +[16:13:01] Epoch: 1 Batch: 16175/20099 (80.48%) Loss: 2.085281 LR: 0.00000931 +[16:13:03] Epoch: 1 Batch: 16176/20099 (80.48%) Loss: 2.051264 LR: 0.00000931 +[16:13:05] Epoch: 1 Batch: 16177/20099 (80.49%) Loss: 2.580713 LR: 0.00000931 +[16:13:06] Epoch: 1 Batch: 16178/20099 (80.49%) Loss: 2.175680 LR: 0.00000930 +[16:13:08] Epoch: 1 Batch: 16179/20099 (80.50%) Loss: 2.186239 LR: 0.00000930 +[16:13:10] Epoch: 1 Batch: 16180/20099 (80.50%) Loss: 1.871091 LR: 0.00000930 +[16:13:12] Epoch: 1 Batch: 16181/20099 (80.51%) Loss: 1.917589 LR: 0.00000930 +[16:13:14] Epoch: 1 Batch: 16182/20099 (80.51%) Loss: 1.954893 LR: 0.00000930 +[16:13:16] Epoch: 1 Batch: 16183/20099 (80.52%) Loss: 1.950796 LR: 0.00000930 +[16:13:18] Epoch: 1 Batch: 16184/20099 (80.52%) Loss: 1.984521 LR: 0.00000930 +[16:13:19] Epoch: 1 Batch: 16185/20099 (80.53%) Loss: 2.008568 LR: 0.00000929 +[16:13:21] Epoch: 1 Batch: 16186/20099 (80.53%) 
Loss: 2.154390 LR: 0.00000929 +[16:13:23] Epoch: 1 Batch: 16187/20099 (80.54%) Loss: 1.896702 LR: 0.00000929 +[16:13:25] Epoch: 1 Batch: 16188/20099 (80.54%) Loss: 2.099654 LR: 0.00000929 +[16:13:27] Epoch: 1 Batch: 16189/20099 (80.55%) Loss: 2.251858 LR: 0.00000929 +[16:13:29] Epoch: 1 Batch: 16190/20099 (80.55%) Loss: 1.927732 LR: 0.00000929 +[16:13:31] Epoch: 1 Batch: 16191/20099 (80.56%) Loss: 2.076502 LR: 0.00000929 +[16:13:33] Epoch: 1 Batch: 16192/20099 (80.56%) Loss: 2.358695 LR: 0.00000927 +[16:13:34] Epoch: 1 Batch: 16193/20099 (80.57%) Loss: 2.171126 LR: 0.00000927 +[16:13:36] Epoch: 1 Batch: 16194/20099 (80.57%) Loss: 1.971061 LR: 0.00000927 +[16:13:38] Epoch: 1 Batch: 16195/20099 (80.58%) Loss: 2.059350 LR: 0.00000927 +[16:13:40] Epoch: 1 Batch: 16196/20099 (80.58%) Loss: 1.812565 LR: 0.00000927 +[16:13:42] Epoch: 1 Batch: 16197/20099 (80.59%) Loss: 1.741261 LR: 0.00000927 +[16:13:44] Epoch: 1 Batch: 16198/20099 (80.59%) Loss: 2.197853 LR: 0.00000927 +[16:13:46] Epoch: 1 Batch: 16199/20099 (80.60%) Loss: 2.152427 LR: 0.00000926 +[16:13:51] >> Cleaned up old temp checkpoint: epoch1_step14200 +[16:13:51] >> Temp checkpoint saved: epoch1_step16200, size: 0.1693 GB +[16:13:51] Epoch: 1 Batch: 16200/20099 (80.60%) Loss: 2.072310 LR: 0.00000926 +[16:13:53] Epoch: 1 Batch: 16201/20099 (80.61%) Loss: 2.030499 LR: 0.00000926 +[16:13:55] Epoch: 1 Batch: 16202/20099 (80.61%) Loss: 1.994692 LR: 0.00000926 +[16:13:56] Epoch: 1 Batch: 16203/20099 (80.62%) Loss: 2.025219 LR: 0.00000926 +[16:13:58] Epoch: 1 Batch: 16204/20099 (80.62%) Loss: 1.901455 LR: 0.00000926 +[16:14:00] Epoch: 1 Batch: 16205/20099 (80.63%) Loss: 1.995661 LR: 0.00000926 +[16:14:02] Epoch: 1 Batch: 16206/20099 (80.63%) Loss: 2.003099 LR: 0.00000925 +[16:14:04] Epoch: 1 Batch: 16207/20099 (80.64%) Loss: 2.203615 LR: 0.00000925 +[16:14:06] Epoch: 1 Batch: 16208/20099 (80.64%) Loss: 2.163456 LR: 0.00000925 +[16:14:08] Epoch: 1 Batch: 16209/20099 (80.65%) Loss: 2.332261 LR: 0.00000925 +[16:14:09] 
Epoch: 1 Batch: 16210/20099 (80.65%) Loss: 2.261419 LR: 0.00000925 +[16:14:11] Epoch: 1 Batch: 16211/20099 (80.66%) Loss: 2.134980 LR: 0.00000925 +[16:14:13] Epoch: 1 Batch: 16212/20099 (80.66%) Loss: 2.063556 LR: 0.00000925 +[16:14:15] Epoch: 1 Batch: 16213/20099 (80.67%) Loss: 2.075718 LR: 0.00000924 +[16:14:17] Epoch: 1 Batch: 16214/20099 (80.67%) Loss: 2.062649 LR: 0.00000924 +[16:14:19] Epoch: 1 Batch: 16215/20099 (80.68%) Loss: 2.315426 LR: 0.00000924 +[16:14:20] Epoch: 1 Batch: 16216/20099 (80.68%) Loss: 2.029032 LR: 0.00000924 +[16:14:22] Epoch: 1 Batch: 16217/20099 (80.69%) Loss: 2.137788 LR: 0.00000924 +[16:14:24] Epoch: 1 Batch: 16218/20099 (80.69%) Loss: 2.316128 LR: 0.00000924 +[16:14:26] Epoch: 1 Batch: 16219/20099 (80.70%) Loss: 2.284968 LR: 0.00000924 +[16:14:28] Epoch: 1 Batch: 16220/20099 (80.70%) Loss: 1.941548 LR: 0.00000923 +[16:14:30] Epoch: 1 Batch: 16221/20099 (80.71%) Loss: 1.917004 LR: 0.00000923 +[16:14:32] Epoch: 1 Batch: 16222/20099 (80.71%) Loss: 1.887786 LR: 0.00000923 +[16:14:33] Epoch: 1 Batch: 16223/20099 (80.72%) Loss: 1.907259 LR: 0.00000923 +[16:14:35] Epoch: 1 Batch: 16224/20099 (80.72%) Loss: 1.875565 LR: 0.00000923 +[16:14:37] Epoch: 1 Batch: 16225/20099 (80.73%) Loss: 2.085193 LR: 0.00000923 +[16:14:39] Epoch: 1 Batch: 16226/20099 (80.73%) Loss: 2.120503 LR: 0.00000923 +[16:14:41] Epoch: 1 Batch: 16227/20099 (80.74%) Loss: 2.110924 LR: 0.00000922 +[16:14:43] Epoch: 1 Batch: 16228/20099 (80.74%) Loss: 2.246830 LR: 0.00000922 +[16:14:45] Epoch: 1 Batch: 16229/20099 (80.75%) Loss: 2.101231 LR: 0.00000922 +[16:14:46] Epoch: 1 Batch: 16230/20099 (80.75%) Loss: 2.256769 LR: 0.00000922 +[16:14:48] Epoch: 1 Batch: 16231/20099 (80.76%) Loss: 2.027970 LR: 0.00000922 +[16:14:50] Epoch: 1 Batch: 16232/20099 (80.76%) Loss: 2.118880 LR: 0.00000922 +[16:14:52] Epoch: 1 Batch: 16233/20099 (80.77%) Loss: 1.991335 LR: 0.00000922 +[16:14:54] Epoch: 1 Batch: 16234/20099 (80.77%) Loss: 2.079563 LR: 0.00000921 +[16:14:56] Epoch: 1 Batch: 
16235/20099 (80.78%) Loss: 1.969099 LR: 0.00000921 +[16:14:58] Epoch: 1 Batch: 16236/20099 (80.78%) Loss: 2.250857 LR: 0.00000921 +[16:14:59] Epoch: 1 Batch: 16237/20099 (80.79%) Loss: 1.845540 LR: 0.00000921 +[16:15:01] Epoch: 1 Batch: 16238/20099 (80.79%) Loss: 2.079174 LR: 0.00000921 +[16:15:03] Epoch: 1 Batch: 16239/20099 (80.80%) Loss: 2.114981 LR: 0.00000921 +[16:15:05] Epoch: 1 Batch: 16240/20099 (80.80%) Loss: 2.210446 LR: 0.00000921 +[16:15:07] Epoch: 1 Batch: 16241/20099 (80.81%) Loss: 1.993339 LR: 0.00000920 +[16:15:09] Epoch: 1 Batch: 16242/20099 (80.81%) Loss: 2.245748 LR: 0.00000920 +[16:15:11] Epoch: 1 Batch: 16243/20099 (80.81%) Loss: 2.250192 LR: 0.00000920 +[16:15:12] Epoch: 1 Batch: 16244/20099 (80.82%) Loss: 2.057471 LR: 0.00000920 +[16:15:14] Epoch: 1 Batch: 16245/20099 (80.82%) Loss: 2.240060 LR: 0.00000920 +[16:15:16] Epoch: 1 Batch: 16246/20099 (80.83%) Loss: 2.008941 LR: 0.00000920 +[16:15:18] Epoch: 1 Batch: 16247/20099 (80.83%) Loss: 1.833347 LR: 0.00000920 +[16:15:20] Epoch: 1 Batch: 16248/20099 (80.84%) Loss: 1.843803 LR: 0.00000919 +[16:15:22] Epoch: 1 Batch: 16249/20099 (80.84%) Loss: 1.957973 LR: 0.00000919 +[16:15:23] Epoch: 1 Batch: 16250/20099 (80.85%) Loss: 2.161137 LR: 0.00000919 +[16:15:25] Epoch: 1 Batch: 16251/20099 (80.85%) Loss: 1.972811 LR: 0.00000919 +[16:15:27] Epoch: 1 Batch: 16252/20099 (80.86%) Loss: 1.905372 LR: 0.00000919 +[16:15:29] Epoch: 1 Batch: 16253/20099 (80.86%) Loss: 2.048338 LR: 0.00000919 +[16:15:31] Epoch: 1 Batch: 16254/20099 (80.87%) Loss: 2.161475 LR: 0.00000919 +[16:15:33] Epoch: 1 Batch: 16255/20099 (80.87%) Loss: 2.358382 LR: 0.00000917 +[16:15:35] Epoch: 1 Batch: 16256/20099 (80.88%) Loss: 1.970593 LR: 0.00000917 +[16:15:36] Epoch: 1 Batch: 16257/20099 (80.88%) Loss: 1.892983 LR: 0.00000917 +[16:15:38] Epoch: 1 Batch: 16258/20099 (80.89%) Loss: 2.253192 LR: 0.00000917 +[16:15:40] Epoch: 1 Batch: 16259/20099 (80.89%) Loss: 2.088091 LR: 0.00000917 +[16:15:42] Epoch: 1 Batch: 16260/20099 (80.90%) 
Loss: 1.883291 LR: 0.00000917 +[16:15:44] Epoch: 1 Batch: 16261/20099 (80.90%) Loss: 2.102073 LR: 0.00000917 +[16:15:46] Epoch: 1 Batch: 16262/20099 (80.91%) Loss: 2.053726 LR: 0.00000916 +[16:15:48] Epoch: 1 Batch: 16263/20099 (80.91%) Loss: 2.074721 LR: 0.00000916 +[16:15:49] Epoch: 1 Batch: 16264/20099 (80.92%) Loss: 2.085749 LR: 0.00000916 +[16:15:51] Epoch: 1 Batch: 16265/20099 (80.92%) Loss: 1.658970 LR: 0.00000916 +[16:15:53] Epoch: 1 Batch: 16266/20099 (80.93%) Loss: 1.906022 LR: 0.00000916 +[16:15:55] Epoch: 1 Batch: 16267/20099 (80.93%) Loss: 2.078442 LR: 0.00000916 +[16:15:57] Epoch: 1 Batch: 16268/20099 (80.94%) Loss: 1.869716 LR: 0.00000916 +[16:15:59] Epoch: 1 Batch: 16269/20099 (80.94%) Loss: 2.378378 LR: 0.00000915 +[16:16:01] Epoch: 1 Batch: 16270/20099 (80.95%) Loss: 2.135688 LR: 0.00000915 +[16:16:02] Epoch: 1 Batch: 16271/20099 (80.95%) Loss: 2.033135 LR: 0.00000915 +[16:16:04] Epoch: 1 Batch: 16272/20099 (80.96%) Loss: 1.955188 LR: 0.00000915 +[16:16:06] Epoch: 1 Batch: 16273/20099 (80.96%) Loss: 2.033279 LR: 0.00000915 +[16:16:08] Epoch: 1 Batch: 16274/20099 (80.97%) Loss: 2.132611 LR: 0.00000915 +[16:16:10] Epoch: 1 Batch: 16275/20099 (80.97%) Loss: 2.141267 LR: 0.00000915 +[16:16:12] Epoch: 1 Batch: 16276/20099 (80.98%) Loss: 1.830270 LR: 0.00000914 +[16:16:14] Epoch: 1 Batch: 16277/20099 (80.98%) Loss: 1.984531 LR: 0.00000914 +[16:16:15] Epoch: 1 Batch: 16278/20099 (80.99%) Loss: 2.256507 LR: 0.00000914 +[16:16:17] Epoch: 1 Batch: 16279/20099 (80.99%) Loss: 1.934224 LR: 0.00000914 +[16:16:19] Epoch: 1 Batch: 16280/20099 (81.00%) Loss: 1.647403 LR: 0.00000914 +[16:16:21] Epoch: 1 Batch: 16281/20099 (81.00%) Loss: 2.150187 LR: 0.00000914 +[16:16:23] Epoch: 1 Batch: 16282/20099 (81.01%) Loss: 2.173044 LR: 0.00000914 +[16:16:25] Epoch: 1 Batch: 16283/20099 (81.01%) Loss: 2.117081 LR: 0.00000913 +[16:16:27] Epoch: 1 Batch: 16284/20099 (81.02%) Loss: 1.832438 LR: 0.00000913 +[16:16:29] Epoch: 1 Batch: 16285/20099 (81.02%) Loss: 1.934216 LR: 
0.00000913 +[16:16:30] Epoch: 1 Batch: 16286/20099 (81.03%) Loss: 2.274875 LR: 0.00000913 +[16:16:32] Epoch: 1 Batch: 16287/20099 (81.03%) Loss: 2.230308 LR: 0.00000913 +[16:16:34] Epoch: 1 Batch: 16288/20099 (81.04%) Loss: 2.269819 LR: 0.00000913 +[16:16:36] Epoch: 1 Batch: 16289/20099 (81.04%) Loss: 2.151894 LR: 0.00000913 +[16:16:38] Epoch: 1 Batch: 16290/20099 (81.05%) Loss: 1.856832 LR: 0.00000912 +[16:16:40] Epoch: 1 Batch: 16291/20099 (81.05%) Loss: 2.420965 LR: 0.00000912 +[16:16:42] Epoch: 1 Batch: 16292/20099 (81.06%) Loss: 2.079112 LR: 0.00000912 +[16:16:43] Epoch: 1 Batch: 16293/20099 (81.06%) Loss: 2.132642 LR: 0.00000912 +[16:16:45] Epoch: 1 Batch: 16294/20099 (81.07%) Loss: 1.997705 LR: 0.00000912 +[16:16:47] Epoch: 1 Batch: 16295/20099 (81.07%) Loss: 2.125673 LR: 0.00000912 +[16:16:49] Epoch: 1 Batch: 16296/20099 (81.08%) Loss: 2.069116 LR: 0.00000912 +[16:16:51] Epoch: 1 Batch: 16297/20099 (81.08%) Loss: 1.888569 LR: 0.00000911 +[16:16:53] Epoch: 1 Batch: 16298/20099 (81.09%) Loss: 2.216491 LR: 0.00000911 +[16:16:55] Epoch: 1 Batch: 16299/20099 (81.09%) Loss: 1.793048 LR: 0.00000911 +[16:16:56] Epoch: 1 Batch: 16300/20099 (81.10%) Loss: 1.944720 LR: 0.00000911 +[16:16:58] Epoch: 1 Batch: 16301/20099 (81.10%) Loss: 2.169119 LR: 0.00000911 +[16:17:00] Epoch: 1 Batch: 16302/20099 (81.11%) Loss: 2.007722 LR: 0.00000911 +[16:17:02] Epoch: 1 Batch: 16303/20099 (81.11%) Loss: 2.056260 LR: 0.00000911 +[16:17:04] Epoch: 1 Batch: 16304/20099 (81.12%) Loss: 2.043446 LR: 0.00000910 +[16:17:06] Epoch: 1 Batch: 16305/20099 (81.12%) Loss: 2.317718 LR: 0.00000910 +[16:17:08] Epoch: 1 Batch: 16306/20099 (81.13%) Loss: 2.196620 LR: 0.00000910 +[16:17:09] Epoch: 1 Batch: 16307/20099 (81.13%) Loss: 2.117467 LR: 0.00000910 +[16:17:11] Epoch: 1 Batch: 16308/20099 (81.14%) Loss: 2.354849 LR: 0.00000910 +[16:17:13] Epoch: 1 Batch: 16309/20099 (81.14%) Loss: 1.910407 LR: 0.00000910 +[16:17:15] Epoch: 1 Batch: 16310/20099 (81.15%) Loss: 2.071437 LR: 0.00000910 +[16:17:17] 
Epoch: 1 Batch: 16311/20099 (81.15%) Loss: 2.010394 LR: 0.00000909 +[16:17:19] Epoch: 1 Batch: 16312/20099 (81.16%) Loss: 2.281152 LR: 0.00000909 +[16:17:21] Epoch: 1 Batch: 16313/20099 (81.16%) Loss: 1.898114 LR: 0.00000909 +[16:17:22] Epoch: 1 Batch: 16314/20099 (81.17%) Loss: 2.141768 LR: 0.00000909 +[16:17:24] Epoch: 1 Batch: 16315/20099 (81.17%) Loss: 1.999946 LR: 0.00000909 +[16:17:26] Epoch: 1 Batch: 16316/20099 (81.18%) Loss: 2.269006 LR: 0.00000909 +[16:17:28] Epoch: 1 Batch: 16317/20099 (81.18%) Loss: 1.954539 LR: 0.00000909 +[16:17:30] Epoch: 1 Batch: 16318/20099 (81.19%) Loss: 1.924524 LR: 0.00000908 +[16:17:32] Epoch: 1 Batch: 16319/20099 (81.19%) Loss: 2.186167 LR: 0.00000908 +[16:17:34] Epoch: 1 Batch: 16320/20099 (81.20%) Loss: 2.141123 LR: 0.00000908 +[16:17:35] Epoch: 1 Batch: 16321/20099 (81.20%) Loss: 2.383243 LR: 0.00000908 +[16:17:37] Epoch: 1 Batch: 16322/20099 (81.21%) Loss: 2.273573 LR: 0.00000908 +[16:17:39] Epoch: 1 Batch: 16323/20099 (81.21%) Loss: 2.262253 LR: 0.00000908 +[16:17:41] Epoch: 1 Batch: 16324/20099 (81.22%) Loss: 1.951034 LR: 0.00000908 +[16:17:43] Epoch: 1 Batch: 16325/20099 (81.22%) Loss: 2.372686 LR: 0.00000907 +[16:17:45] Epoch: 1 Batch: 16326/20099 (81.23%) Loss: 1.999699 LR: 0.00000907 +[16:17:47] Epoch: 1 Batch: 16327/20099 (81.23%) Loss: 1.915114 LR: 0.00000907 +[16:17:48] Epoch: 1 Batch: 16328/20099 (81.24%) Loss: 1.518602 LR: 0.00000907 +[16:17:50] Epoch: 1 Batch: 16329/20099 (81.24%) Loss: 1.936647 LR: 0.00000907 +[16:17:52] Epoch: 1 Batch: 16330/20099 (81.25%) Loss: 2.140012 LR: 0.00000907 +[16:17:54] Epoch: 1 Batch: 16331/20099 (81.25%) Loss: 2.056526 LR: 0.00000907 +[16:17:56] Epoch: 1 Batch: 16332/20099 (81.26%) Loss: 2.191687 LR: 0.00000905 +[16:17:58] Epoch: 1 Batch: 16333/20099 (81.26%) Loss: 2.051647 LR: 0.00000905 +[16:17:59] Epoch: 1 Batch: 16334/20099 (81.27%) Loss: 2.229664 LR: 0.00000905 +[16:18:01] Epoch: 1 Batch: 16335/20099 (81.27%) Loss: 2.010728 LR: 0.00000905 +[16:18:03] Epoch: 1 Batch: 
16336/20099 (81.28%) Loss: 2.141655 LR: 0.00000905 +[16:18:05] Epoch: 1 Batch: 16337/20099 (81.28%) Loss: 2.018472 LR: 0.00000905 +[16:18:07] Epoch: 1 Batch: 16338/20099 (81.29%) Loss: 2.302359 LR: 0.00000905 +[16:18:09] Epoch: 1 Batch: 16339/20099 (81.29%) Loss: 2.220205 LR: 0.00000904 +[16:18:11] Epoch: 1 Batch: 16340/20099 (81.30%) Loss: 2.022842 LR: 0.00000904 +[16:18:12] Epoch: 1 Batch: 16341/20099 (81.30%) Loss: 1.561133 LR: 0.00000904 +[16:18:14] Epoch: 1 Batch: 16342/20099 (81.31%) Loss: 2.295654 LR: 0.00000904 +[16:18:16] Epoch: 1 Batch: 16343/20099 (81.31%) Loss: 2.443317 LR: 0.00000904 +[16:18:18] Epoch: 1 Batch: 16344/20099 (81.32%) Loss: 2.069518 LR: 0.00000904 +[16:18:20] Epoch: 1 Batch: 16345/20099 (81.32%) Loss: 2.228383 LR: 0.00000904 +[16:18:22] Epoch: 1 Batch: 16346/20099 (81.33%) Loss: 1.787857 LR: 0.00000903 +[16:18:24] Epoch: 1 Batch: 16347/20099 (81.33%) Loss: 1.798540 LR: 0.00000903 +[16:18:25] Epoch: 1 Batch: 16348/20099 (81.34%) Loss: 2.076424 LR: 0.00000903 +[16:18:27] Epoch: 1 Batch: 16349/20099 (81.34%) Loss: 1.930049 LR: 0.00000903 +[16:18:29] Epoch: 1 Batch: 16350/20099 (81.35%) Loss: 2.189503 LR: 0.00000903 +[16:18:31] Epoch: 1 Batch: 16351/20099 (81.35%) Loss: 2.202222 LR: 0.00000903 +[16:18:33] Epoch: 1 Batch: 16352/20099 (81.36%) Loss: 1.936811 LR: 0.00000903 +[16:18:35] Epoch: 1 Batch: 16353/20099 (81.36%) Loss: 2.376939 LR: 0.00000902 +[16:18:36] Epoch: 1 Batch: 16354/20099 (81.37%) Loss: 2.293652 LR: 0.00000902 +[16:18:38] Epoch: 1 Batch: 16355/20099 (81.37%) Loss: 1.960539 LR: 0.00000902 +[16:18:40] Epoch: 1 Batch: 16356/20099 (81.38%) Loss: 2.094041 LR: 0.00000902 +[16:18:42] Epoch: 1 Batch: 16357/20099 (81.38%) Loss: 2.160471 LR: 0.00000902 +[16:18:44] Epoch: 1 Batch: 16358/20099 (81.39%) Loss: 1.955694 LR: 0.00000902 +[16:18:46] Epoch: 1 Batch: 16359/20099 (81.39%) Loss: 1.870648 LR: 0.00000902 +[16:18:48] Epoch: 1 Batch: 16360/20099 (81.40%) Loss: 2.089891 LR: 0.00000901 +[16:18:49] Epoch: 1 Batch: 16361/20099 (81.40%) 
Loss: 2.334042 LR: 0.00000901 +[16:18:51] Epoch: 1 Batch: 16362/20099 (81.41%) Loss: 2.328595 LR: 0.00000901 +[16:18:53] Epoch: 1 Batch: 16363/20099 (81.41%) Loss: 2.366523 LR: 0.00000901 +[16:18:55] Epoch: 1 Batch: 16364/20099 (81.42%) Loss: 2.426263 LR: 0.00000901 +[16:18:57] Epoch: 1 Batch: 16365/20099 (81.42%) Loss: 1.925551 LR: 0.00000901 +[16:18:59] Epoch: 1 Batch: 16366/20099 (81.43%) Loss: 1.943493 LR: 0.00000901 +[16:19:01] Epoch: 1 Batch: 16367/20099 (81.43%) Loss: 1.787126 LR: 0.00000900 +[16:19:02] Epoch: 1 Batch: 16368/20099 (81.44%) Loss: 1.962619 LR: 0.00000900 +[16:19:04] Epoch: 1 Batch: 16369/20099 (81.44%) Loss: 2.289970 LR: 0.00000900 +[16:19:06] Epoch: 1 Batch: 16370/20099 (81.45%) Loss: 2.190047 LR: 0.00000900 +[16:19:08] Epoch: 1 Batch: 16371/20099 (81.45%) Loss: 2.249807 LR: 0.00000900 +[16:19:10] Epoch: 1 Batch: 16372/20099 (81.46%) Loss: 2.336900 LR: 0.00000900 +[16:19:12] Epoch: 1 Batch: 16373/20099 (81.46%) Loss: 2.124334 LR: 0.00000900 +[16:19:14] Epoch: 1 Batch: 16374/20099 (81.47%) Loss: 2.361797 LR: 0.00000899 +[16:19:15] Epoch: 1 Batch: 16375/20099 (81.47%) Loss: 1.895668 LR: 0.00000899 +[16:19:17] Epoch: 1 Batch: 16376/20099 (81.48%) Loss: 2.109955 LR: 0.00000899 +[16:19:19] Epoch: 1 Batch: 16377/20099 (81.48%) Loss: 1.970118 LR: 0.00000899 +[16:19:21] Epoch: 1 Batch: 16378/20099 (81.49%) Loss: 2.160949 LR: 0.00000899 +[16:19:23] Epoch: 1 Batch: 16379/20099 (81.49%) Loss: 2.103943 LR: 0.00000899 +[16:19:25] Epoch: 1 Batch: 16380/20099 (81.50%) Loss: 1.970526 LR: 0.00000899 +[16:19:27] Epoch: 1 Batch: 16381/20099 (81.50%) Loss: 2.027892 LR: 0.00000898 +[16:19:29] Epoch: 1 Batch: 16382/20099 (81.51%) Loss: 2.165708 LR: 0.00000898 +[16:19:30] Epoch: 1 Batch: 16383/20099 (81.51%) Loss: 1.935363 LR: 0.00000898 +[16:19:32] Epoch: 1 Batch: 16384/20099 (81.52%) Loss: 2.043347 LR: 0.00000898 +[16:19:34] Epoch: 1 Batch: 16385/20099 (81.52%) Loss: 2.274018 LR: 0.00000898 +[16:19:36] Epoch: 1 Batch: 16386/20099 (81.53%) Loss: 2.229197 LR: 
0.00000898 +[16:19:38] Epoch: 1 Batch: 16387/20099 (81.53%) Loss: 2.355647 LR: 0.00000898 +[16:19:40] Epoch: 1 Batch: 16388/20099 (81.54%) Loss: 2.541529 LR: 0.00000897 +[16:19:42] Epoch: 1 Batch: 16389/20099 (81.54%) Loss: 2.181964 LR: 0.00000897 +[16:19:43] Epoch: 1 Batch: 16390/20099 (81.55%) Loss: 2.094627 LR: 0.00000897 +[16:19:45] Epoch: 1 Batch: 16391/20099 (81.55%) Loss: 1.928281 LR: 0.00000897 +[16:19:47] Epoch: 1 Batch: 16392/20099 (81.56%) Loss: 2.072373 LR: 0.00000897 +[16:19:49] Epoch: 1 Batch: 16393/20099 (81.56%) Loss: 2.252689 LR: 0.00000897 +[16:19:51] Epoch: 1 Batch: 16394/20099 (81.57%) Loss: 2.374795 LR: 0.00000897 +[16:19:53] Epoch: 1 Batch: 16395/20099 (81.57%) Loss: 2.069069 LR: 0.00000896 +[16:19:55] Epoch: 1 Batch: 16396/20099 (81.58%) Loss: 1.946969 LR: 0.00000896 +[16:19:56] Epoch: 1 Batch: 16397/20099 (81.58%) Loss: 2.038100 LR: 0.00000896 +[16:19:58] Epoch: 1 Batch: 16398/20099 (81.59%) Loss: 2.205416 LR: 0.00000896 +[16:20:00] Epoch: 1 Batch: 16399/20099 (81.59%) Loss: 2.535084 LR: 0.00000896 +[16:20:06] >> Cleaned up old temp checkpoint: epoch1_step14400 +[16:20:06] >> Temp checkpoint saved: epoch1_step16400, size: 0.1693 GB +[16:20:06] Epoch: 1 Batch: 16400/20099 (81.60%) Loss: 2.174673 LR: 0.00000896 +[16:20:08] Epoch: 1 Batch: 16401/20099 (81.60%) Loss: 1.592294 LR: 0.00000896 +[16:20:09] Epoch: 1 Batch: 16402/20099 (81.61%) Loss: 2.011690 LR: 0.00000895 +[16:20:11] Epoch: 1 Batch: 16403/20099 (81.61%) Loss: 1.990918 LR: 0.00000895 +[16:20:13] Epoch: 1 Batch: 16404/20099 (81.62%) Loss: 1.896723 LR: 0.00000895 +[16:20:15] Epoch: 1 Batch: 16405/20099 (81.62%) Loss: 2.256215 LR: 0.00000895 +[16:20:17] Epoch: 1 Batch: 16406/20099 (81.63%) Loss: 2.215314 LR: 0.00000895 +[16:20:19] Epoch: 1 Batch: 16407/20099 (81.63%) Loss: 2.201990 LR: 0.00000895 +[16:20:21] Epoch: 1 Batch: 16408/20099 (81.64%) Loss: 2.278979 LR: 0.00000895 +[16:20:22] Epoch: 1 Batch: 16409/20099 (81.64%) Loss: 1.909014 LR: 0.00000894 +[16:20:24] Epoch: 1 Batch: 
16410/20099 (81.65%) Loss: 2.186988 LR: 0.00000894 +[16:20:26] Epoch: 1 Batch: 16411/20099 (81.65%) Loss: 1.988971 LR: 0.00000894 +[16:20:28] Epoch: 1 Batch: 16412/20099 (81.66%) Loss: 2.406084 LR: 0.00000894 +[16:20:30] Epoch: 1 Batch: 16413/20099 (81.66%) Loss: 2.097692 LR: 0.00000894 +[16:20:32] Epoch: 1 Batch: 16414/20099 (81.67%) Loss: 2.235339 LR: 0.00000894 +[16:20:34] Epoch: 1 Batch: 16415/20099 (81.67%) Loss: 1.925044 LR: 0.00000894 +[16:20:36] Epoch: 1 Batch: 16416/20099 (81.68%) Loss: 2.077592 LR: 0.00000893 +[16:20:37] Epoch: 1 Batch: 16417/20099 (81.68%) Loss: 2.009226 LR: 0.00000893 +[16:20:39] Epoch: 1 Batch: 16418/20099 (81.69%) Loss: 2.015584 LR: 0.00000893 +[16:20:41] Epoch: 1 Batch: 16419/20099 (81.69%) Loss: 2.362475 LR: 0.00000893 +[16:20:43] Epoch: 1 Batch: 16420/20099 (81.70%) Loss: 2.175116 LR: 0.00000893 +[16:20:45] Epoch: 1 Batch: 16421/20099 (81.70%) Loss: 2.058089 LR: 0.00000893 +[16:20:47] Epoch: 1 Batch: 16422/20099 (81.71%) Loss: 2.083847 LR: 0.00000893 +[16:20:49] Epoch: 1 Batch: 16423/20099 (81.71%) Loss: 2.165125 LR: 0.00000892 +[16:20:51] Epoch: 1 Batch: 16424/20099 (81.72%) Loss: 1.835495 LR: 0.00000892 +[16:20:52] Epoch: 1 Batch: 16425/20099 (81.72%) Loss: 2.364249 LR: 0.00000892 +[16:20:54] Epoch: 1 Batch: 16426/20099 (81.73%) Loss: 2.174137 LR: 0.00000892 +[16:20:56] Epoch: 1 Batch: 16427/20099 (81.73%) Loss: 2.141110 LR: 0.00000892 +[16:20:58] Epoch: 1 Batch: 16428/20099 (81.74%) Loss: 2.007820 LR: 0.00000892 +[16:21:00] Epoch: 1 Batch: 16429/20099 (81.74%) Loss: 2.076276 LR: 0.00000892 +[16:21:02] Epoch: 1 Batch: 16430/20099 (81.75%) Loss: 2.052114 LR: 0.00000890 +[16:21:04] Epoch: 1 Batch: 16431/20099 (81.75%) Loss: 1.926757 LR: 0.00000890 +[16:21:05] Epoch: 1 Batch: 16432/20099 (81.76%) Loss: 2.080806 LR: 0.00000890 +[16:21:07] Epoch: 1 Batch: 16433/20099 (81.76%) Loss: 1.868304 LR: 0.00000890 +[16:21:09] Epoch: 1 Batch: 16434/20099 (81.77%) Loss: 2.096716 LR: 0.00000890 +[16:21:11] Epoch: 1 Batch: 16435/20099 (81.77%) 
Loss: 1.606771 LR: 0.00000890 +[16:21:13] Epoch: 1 Batch: 16436/20099 (81.78%) Loss: 1.903899 LR: 0.00000890 +[16:21:15] Epoch: 1 Batch: 16437/20099 (81.78%) Loss: 1.931796 LR: 0.00000889 +[16:21:17] Epoch: 1 Batch: 16438/20099 (81.79%) Loss: 2.159409 LR: 0.00000889 +[16:21:19] Epoch: 1 Batch: 16439/20099 (81.79%) Loss: 2.105321 LR: 0.00000889 +[16:21:20] Epoch: 1 Batch: 16440/20099 (81.80%) Loss: 2.027949 LR: 0.00000889 +[16:21:22] Epoch: 1 Batch: 16441/20099 (81.80%) Loss: 2.178613 LR: 0.00000889 +[16:21:24] Epoch: 1 Batch: 16442/20099 (81.81%) Loss: 1.841477 LR: 0.00000889 +[16:21:26] Epoch: 1 Batch: 16443/20099 (81.81%) Loss: 2.025831 LR: 0.00000889 +[16:21:28] Epoch: 1 Batch: 16444/20099 (81.82%) Loss: 2.324130 LR: 0.00000888 +[16:21:30] Epoch: 1 Batch: 16445/20099 (81.82%) Loss: 1.813631 LR: 0.00000888 +[16:21:32] Epoch: 1 Batch: 16446/20099 (81.82%) Loss: 1.483622 LR: 0.00000888 +[16:21:33] Epoch: 1 Batch: 16447/20099 (81.83%) Loss: 2.230134 LR: 0.00000888 +[16:21:35] Epoch: 1 Batch: 16448/20099 (81.83%) Loss: 2.088285 LR: 0.00000888 +[16:21:37] Epoch: 1 Batch: 16449/20099 (81.84%) Loss: 1.994093 LR: 0.00000888 +[16:21:39] Epoch: 1 Batch: 16450/20099 (81.84%) Loss: 1.872150 LR: 0.00000888 +[16:21:41] Epoch: 1 Batch: 16451/20099 (81.85%) Loss: 2.092474 LR: 0.00000887 +[16:21:43] Epoch: 1 Batch: 16452/20099 (81.85%) Loss: 2.442058 LR: 0.00000887 +[16:21:45] Epoch: 1 Batch: 16453/20099 (81.86%) Loss: 2.015718 LR: 0.00000887 +[16:21:46] Epoch: 1 Batch: 16454/20099 (81.86%) Loss: 2.232316 LR: 0.00000887 +[16:21:48] Epoch: 1 Batch: 16455/20099 (81.87%) Loss: 2.005383 LR: 0.00000887 +[16:21:50] Epoch: 1 Batch: 16456/20099 (81.87%) Loss: 1.947353 LR: 0.00000887 +[16:21:52] Epoch: 1 Batch: 16457/20099 (81.88%) Loss: 2.298584 LR: 0.00000887 +[16:21:54] Epoch: 1 Batch: 16458/20099 (81.88%) Loss: 1.957472 LR: 0.00000886 +[16:21:56] Epoch: 1 Batch: 16459/20099 (81.89%) Loss: 2.102442 LR: 0.00000886 +[16:21:58] Epoch: 1 Batch: 16460/20099 (81.89%) Loss: 2.182721 LR: 
0.00000886 +[16:21:59] Epoch: 1 Batch: 16461/20099 (81.90%) Loss: 2.035393 LR: 0.00000886 +[16:22:01] Epoch: 1 Batch: 16462/20099 (81.90%) Loss: 2.214000 LR: 0.00000886 +[16:22:03] Epoch: 1 Batch: 16463/20099 (81.91%) Loss: 1.996739 LR: 0.00000886 +[16:22:05] Epoch: 1 Batch: 16464/20099 (81.91%) Loss: 2.189920 LR: 0.00000886 +[16:22:07] Epoch: 1 Batch: 16465/20099 (81.92%) Loss: 1.935609 LR: 0.00000885 +[16:22:09] Epoch: 1 Batch: 16466/20099 (81.92%) Loss: 2.085215 LR: 0.00000885 +[16:22:10] Epoch: 1 Batch: 16467/20099 (81.93%) Loss: 2.205858 LR: 0.00000885 +[16:22:12] Epoch: 1 Batch: 16468/20099 (81.93%) Loss: 1.821169 LR: 0.00000885 +[16:22:14] Epoch: 1 Batch: 16469/20099 (81.94%) Loss: 2.239026 LR: 0.00000885 +[16:22:16] Epoch: 1 Batch: 16470/20099 (81.94%) Loss: 1.953988 LR: 0.00000885 +[16:22:18] Epoch: 1 Batch: 16471/20099 (81.95%) Loss: 1.897557 LR: 0.00000885 +[16:22:20] Epoch: 1 Batch: 16472/20099 (81.95%) Loss: 2.120228 LR: 0.00000884 +[16:22:22] Epoch: 1 Batch: 16473/20099 (81.96%) Loss: 1.921435 LR: 0.00000884 +[16:22:23] Epoch: 1 Batch: 16474/20099 (81.96%) Loss: 2.147715 LR: 0.00000884 +[16:22:25] Epoch: 1 Batch: 16475/20099 (81.97%) Loss: 2.137479 LR: 0.00000884 +[16:22:27] Epoch: 1 Batch: 16476/20099 (81.97%) Loss: 2.145004 LR: 0.00000884 +[16:22:29] Epoch: 1 Batch: 16477/20099 (81.98%) Loss: 1.923843 LR: 0.00000884 +[16:22:31] Epoch: 1 Batch: 16478/20099 (81.98%) Loss: 2.312792 LR: 0.00000884 +[16:22:33] Epoch: 1 Batch: 16479/20099 (81.99%) Loss: 2.204871 LR: 0.00000883 +[16:22:34] Epoch: 1 Batch: 16480/20099 (81.99%) Loss: 2.117625 LR: 0.00000883 +[16:22:36] Epoch: 1 Batch: 16481/20099 (82.00%) Loss: 2.031749 LR: 0.00000883 +[16:22:38] Epoch: 1 Batch: 16482/20099 (82.00%) Loss: 1.964625 LR: 0.00000883 +[16:22:40] Epoch: 1 Batch: 16483/20099 (82.01%) Loss: 2.349060 LR: 0.00000883 +[16:22:42] Epoch: 1 Batch: 16484/20099 (82.01%) Loss: 2.165146 LR: 0.00000883 +[16:22:44] Epoch: 1 Batch: 16485/20099 (82.02%) Loss: 2.178628 LR: 0.00000883 +[16:22:46] 
Epoch: 1 Batch: 16486/20099 (82.02%) Loss: 1.836853 LR: 0.00000882 +[16:22:47] Epoch: 1 Batch: 16487/20099 (82.03%) Loss: 2.248146 LR: 0.00000882 +[16:22:49] Epoch: 1 Batch: 16488/20099 (82.03%) Loss: 2.415990 LR: 0.00000882 +[16:22:51] Epoch: 1 Batch: 16489/20099 (82.04%) Loss: 1.786675 LR: 0.00000882 +[16:22:53] Epoch: 1 Batch: 16490/20099 (82.04%) Loss: 1.948336 LR: 0.00000882 +[16:22:55] Epoch: 1 Batch: 16491/20099 (82.05%) Loss: 2.164959 LR: 0.00000882 +[16:22:57] Epoch: 1 Batch: 16492/20099 (82.05%) Loss: 2.010401 LR: 0.00000882 +[16:22:58] Epoch: 1 Batch: 16493/20099 (82.06%) Loss: 2.409545 LR: 0.00000881 +[16:23:00] Epoch: 1 Batch: 16494/20099 (82.06%) Loss: 2.172247 LR: 0.00000881 +[16:23:02] Epoch: 1 Batch: 16495/20099 (82.07%) Loss: 2.104458 LR: 0.00000881 +[16:23:04] Epoch: 1 Batch: 16496/20099 (82.07%) Loss: 2.059661 LR: 0.00000881 +[16:23:06] Epoch: 1 Batch: 16497/20099 (82.08%) Loss: 2.255671 LR: 0.00000881 +[16:23:08] Epoch: 1 Batch: 16498/20099 (82.08%) Loss: 1.734671 LR: 0.00000881 +[16:23:10] Epoch: 1 Batch: 16499/20099 (82.09%) Loss: 2.002261 LR: 0.00000881 +[16:23:11] >> Evaluating batch 0 +[16:23:13] >> Evaluating batch 1 +[16:23:14] >> Evaluating batch 2 +[16:23:15] >> Evaluating batch 3 +[16:23:16] >> Evaluating batch 4 +[16:23:17] >> Evaluating batch 5 +[16:23:18] >> Evaluating batch 6 +[16:23:19] >> Evaluating batch 7 +[16:23:20] >> Evaluating batch 8 +[16:23:21] >> Evaluating batch 9 +[16:23:22] >> Evaluating batch 10 +[16:23:23] >> Evaluating batch 11 +[16:23:24] >> Evaluating batch 12 +[16:23:25] >> Evaluating batch 13 +[16:23:26] >> Evaluating batch 14 +[16:23:27] >> Evaluating batch 15 +[16:23:28] >> Evaluating batch 16 +[16:23:29] Epoch: 1 Step: 16500/20099 Evaluation: +[16:23:29] Avg Loss Since Last Eval: 2.0763 Val Loss: 2.1479 Validation loss delta: -0.0010 Perplexity: 8.5664 LR: 0.00000880 +[16:23:32] >> Checkpoint saved: epoch1_step16500, size: 0.1693 GB +[16:23:32] Epoch: 1 Batch: 16500/20099 (82.09%) Loss: 2.285379 LR: 
0.00000880 +[16:23:34] Epoch: 1 Batch: 16501/20099 (82.10%) Loss: 2.283371 LR: 0.00000880 +[16:23:36] Epoch: 1 Batch: 16502/20099 (82.10%) Loss: 2.249239 LR: 0.00000880 +[16:23:38] Epoch: 1 Batch: 16503/20099 (82.11%) Loss: 2.182446 LR: 0.00000880 +[16:23:40] Epoch: 1 Batch: 16504/20099 (82.11%) Loss: 2.295748 LR: 0.00000880 +[16:23:42] Epoch: 1 Batch: 16505/20099 (82.12%) Loss: 2.222879 LR: 0.00000880 +[16:23:43] Epoch: 1 Batch: 16506/20099 (82.12%) Loss: 1.888754 LR: 0.00000880 +[16:23:45] Epoch: 1 Batch: 16507/20099 (82.13%) Loss: 2.204117 LR: 0.00000879 +[16:23:47] Epoch: 1 Batch: 16508/20099 (82.13%) Loss: 2.051782 LR: 0.00000879 +[16:23:49] Epoch: 1 Batch: 16509/20099 (82.14%) Loss: 2.224120 LR: 0.00000879 +[16:23:51] Epoch: 1 Batch: 16510/20099 (82.14%) Loss: 2.142867 LR: 0.00000879 +[16:23:53] Epoch: 1 Batch: 16511/20099 (82.15%) Loss: 1.979385 LR: 0.00000879 +[16:23:55] Epoch: 1 Batch: 16512/20099 (82.15%) Loss: 1.811125 LR: 0.00000879 +[16:23:56] Epoch: 1 Batch: 16513/20099 (82.16%) Loss: 1.968368 LR: 0.00000879 +[16:23:58] Epoch: 1 Batch: 16514/20099 (82.16%) Loss: 2.105552 LR: 0.00000878 +[16:24:00] Epoch: 1 Batch: 16515/20099 (82.17%) Loss: 2.119487 LR: 0.00000878 +[16:24:02] Epoch: 1 Batch: 16516/20099 (82.17%) Loss: 2.135611 LR: 0.00000878 +[16:24:04] Epoch: 1 Batch: 16517/20099 (82.18%) Loss: 2.369372 LR: 0.00000878 +[16:24:06] Epoch: 1 Batch: 16518/20099 (82.18%) Loss: 2.232313 LR: 0.00000878 +[16:24:08] Epoch: 1 Batch: 16519/20099 (82.19%) Loss: 1.523172 LR: 0.00000878 +[16:24:09] Epoch: 1 Batch: 16520/20099 (82.19%) Loss: 2.272309 LR: 0.00000878 +[16:24:11] Epoch: 1 Batch: 16521/20099 (82.20%) Loss: 2.195114 LR: 0.00000877 +[16:24:13] Epoch: 1 Batch: 16522/20099 (82.20%) Loss: 2.411936 LR: 0.00000877 +[16:24:15] Epoch: 1 Batch: 16523/20099 (82.21%) Loss: 2.126309 LR: 0.00000877 +[16:24:17] Epoch: 1 Batch: 16524/20099 (82.21%) Loss: 1.969497 LR: 0.00000877 +[16:24:19] Epoch: 1 Batch: 16525/20099 (82.22%) Loss: 2.122985 LR: 0.00000877 +[16:24:21] 
Epoch: 1 Batch: 16526/20099 (82.22%) Loss: 2.025162 LR: 0.00000877 +[16:24:22] Epoch: 1 Batch: 16527/20099 (82.23%) Loss: 1.947451 LR: 0.00000877 +[16:24:24] Epoch: 1 Batch: 16528/20099 (82.23%) Loss: 2.262833 LR: 0.00000876 +[16:24:26] Epoch: 1 Batch: 16529/20099 (82.24%) Loss: 2.189519 LR: 0.00000876 +[16:24:28] Epoch: 1 Batch: 16530/20099 (82.24%) Loss: 2.193761 LR: 0.00000876 +[16:24:30] Epoch: 1 Batch: 16531/20099 (82.25%) Loss: 1.863230 LR: 0.00000876 +[16:24:32] Epoch: 1 Batch: 16532/20099 (82.25%) Loss: 1.641419 LR: 0.00000876 +[16:24:34] Epoch: 1 Batch: 16533/20099 (82.26%) Loss: 2.089165 LR: 0.00000876 +[16:24:35] Epoch: 1 Batch: 16534/20099 (82.26%) Loss: 1.938838 LR: 0.00000876 +[16:24:37] Epoch: 1 Batch: 16535/20099 (82.27%) Loss: 2.047857 LR: 0.00000875 +[16:24:39] Epoch: 1 Batch: 16536/20099 (82.27%) Loss: 2.164503 LR: 0.00000875 +[16:24:41] Epoch: 1 Batch: 16537/20099 (82.28%) Loss: 2.228244 LR: 0.00000875 +[16:24:43] Epoch: 1 Batch: 16538/20099 (82.28%) Loss: 2.022917 LR: 0.00000875 +[16:24:45] Epoch: 1 Batch: 16539/20099 (82.29%) Loss: 2.093437 LR: 0.00000875 +[16:24:47] Epoch: 1 Batch: 16540/20099 (82.29%) Loss: 1.986417 LR: 0.00000875 +[16:24:48] Epoch: 1 Batch: 16541/20099 (82.30%) Loss: 1.919136 LR: 0.00000875 +[16:24:50] Epoch: 1 Batch: 16542/20099 (82.30%) Loss: 1.906429 LR: 0.00000874 +[16:24:52] Epoch: 1 Batch: 16543/20099 (82.31%) Loss: 2.081245 LR: 0.00000874 +[16:24:54] Epoch: 1 Batch: 16544/20099 (82.31%) Loss: 2.002152 LR: 0.00000874 +[16:24:56] Epoch: 1 Batch: 16545/20099 (82.32%) Loss: 2.170120 LR: 0.00000874 +[16:24:58] Epoch: 1 Batch: 16546/20099 (82.32%) Loss: 2.112074 LR: 0.00000874 +[16:25:00] Epoch: 1 Batch: 16547/20099 (82.33%) Loss: 2.229463 LR: 0.00000874 +[16:25:01] Epoch: 1 Batch: 16548/20099 (82.33%) Loss: 2.110729 LR: 0.00000874 +[16:25:03] Epoch: 1 Batch: 16549/20099 (82.34%) Loss: 2.218425 LR: 0.00000873 +[16:25:05] Epoch: 1 Batch: 16550/20099 (82.34%) Loss: 1.827083 LR: 0.00000873 +[16:25:07] Epoch: 1 Batch: 
16551/20099 (82.35%) Loss: 2.236944 LR: 0.00000873 +[16:25:09] Epoch: 1 Batch: 16552/20099 (82.35%) Loss: 2.076031 LR: 0.00000873 +[16:25:11] Epoch: 1 Batch: 16553/20099 (82.36%) Loss: 2.235271 LR: 0.00000873 +[16:25:13] Epoch: 1 Batch: 16554/20099 (82.36%) Loss: 1.987821 LR: 0.00000873 +[16:25:14] Epoch: 1 Batch: 16555/20099 (82.37%) Loss: 1.788835 LR: 0.00000873 +[16:25:16] Epoch: 1 Batch: 16556/20099 (82.37%) Loss: 2.366536 LR: 0.00000872 +[16:25:18] Epoch: 1 Batch: 16557/20099 (82.38%) Loss: 2.405690 LR: 0.00000872 +[16:25:20] Epoch: 1 Batch: 16558/20099 (82.38%) Loss: 2.240904 LR: 0.00000872 +[16:25:22] Epoch: 1 Batch: 16559/20099 (82.39%) Loss: 1.982807 LR: 0.00000872 +[16:25:24] Epoch: 1 Batch: 16560/20099 (82.39%) Loss: 2.165594 LR: 0.00000872 +[16:25:25] Epoch: 1 Batch: 16561/20099 (82.40%) Loss: 2.156009 LR: 0.00000872 +[16:25:27] Epoch: 1 Batch: 16562/20099 (82.40%) Loss: 2.135953 LR: 0.00000872 +[16:25:29] Epoch: 1 Batch: 16563/20099 (82.41%) Loss: 2.054039 LR: 0.00000871 +[16:25:31] Epoch: 1 Batch: 16564/20099 (82.41%) Loss: 2.251221 LR: 0.00000871 +[16:25:33] Epoch: 1 Batch: 16565/20099 (82.42%) Loss: 2.108429 LR: 0.00000871 +[16:25:35] Epoch: 1 Batch: 16566/20099 (82.42%) Loss: 2.023752 LR: 0.00000871 +[16:25:37] Epoch: 1 Batch: 16567/20099 (82.43%) Loss: 1.892316 LR: 0.00000871 +[16:25:38] Epoch: 1 Batch: 16568/20099 (82.43%) Loss: 2.197621 LR: 0.00000871 +[16:25:40] Epoch: 1 Batch: 16569/20099 (82.44%) Loss: 2.212276 LR: 0.00000871 +[16:25:42] Epoch: 1 Batch: 16570/20099 (82.44%) Loss: 1.957202 LR: 0.00000870 +[16:25:44] Epoch: 1 Batch: 16571/20099 (82.45%) Loss: 2.097516 LR: 0.00000870 +[16:25:46] Epoch: 1 Batch: 16572/20099 (82.45%) Loss: 2.083212 LR: 0.00000870 +[16:25:48] Epoch: 1 Batch: 16573/20099 (82.46%) Loss: 2.195915 LR: 0.00000870 +[16:25:49] Epoch: 1 Batch: 16574/20099 (82.46%) Loss: 2.133075 LR: 0.00000870 +[16:25:51] Epoch: 1 Batch: 16575/20099 (82.47%) Loss: 2.241356 LR: 0.00000870 +[16:25:53] Epoch: 1 Batch: 16576/20099 (82.47%) 
Loss: 1.874364 LR: 0.00000870 +[16:25:55] Epoch: 1 Batch: 16577/20099 (82.48%) Loss: 2.366028 LR: 0.00000869 +[16:25:57] Epoch: 1 Batch: 16578/20099 (82.48%) Loss: 1.736152 LR: 0.00000869 +[16:25:59] Epoch: 1 Batch: 16579/20099 (82.49%) Loss: 2.118838 LR: 0.00000869 +[16:26:01] Epoch: 1 Batch: 16580/20099 (82.49%) Loss: 2.051070 LR: 0.00000869 +[16:26:02] Epoch: 1 Batch: 16581/20099 (82.50%) Loss: 1.910800 LR: 0.00000869 +[16:26:04] Epoch: 1 Batch: 16582/20099 (82.50%) Loss: 2.112578 LR: 0.00000869 +[16:26:06] Epoch: 1 Batch: 16583/20099 (82.51%) Loss: 1.881167 LR: 0.00000869 +[16:26:08] Epoch: 1 Batch: 16584/20099 (82.51%) Loss: 1.797178 LR: 0.00000868 +[16:26:10] Epoch: 1 Batch: 16585/20099 (82.52%) Loss: 2.132055 LR: 0.00000868 +[16:26:12] Epoch: 1 Batch: 16586/20099 (82.52%) Loss: 2.342200 LR: 0.00000868 +[16:26:14] Epoch: 1 Batch: 16587/20099 (82.53%) Loss: 2.033067 LR: 0.00000868 +[16:26:15] Epoch: 1 Batch: 16588/20099 (82.53%) Loss: 2.344885 LR: 0.00000868 +[16:26:17] Epoch: 1 Batch: 16589/20099 (82.54%) Loss: 1.907830 LR: 0.00000868 +[16:26:19] Epoch: 1 Batch: 16590/20099 (82.54%) Loss: 2.113581 LR: 0.00000868 +[16:26:21] Epoch: 1 Batch: 16591/20099 (82.55%) Loss: 2.015352 LR: 0.00000867 +[16:26:23] Epoch: 1 Batch: 16592/20099 (82.55%) Loss: 2.057253 LR: 0.00000867 +[16:26:25] Epoch: 1 Batch: 16593/20099 (82.56%) Loss: 2.164710 LR: 0.00000867 +[16:26:27] Epoch: 1 Batch: 16594/20099 (82.56%) Loss: 2.154039 LR: 0.00000867 +[16:26:28] Epoch: 1 Batch: 16595/20099 (82.57%) Loss: 2.231452 LR: 0.00000867 +[16:26:30] Epoch: 1 Batch: 16596/20099 (82.57%) Loss: 2.405390 LR: 0.00000867 +[16:26:32] Epoch: 1 Batch: 16597/20099 (82.58%) Loss: 2.132266 LR: 0.00000867 +[16:26:34] Epoch: 1 Batch: 16598/20099 (82.58%) Loss: 2.018621 LR: 0.00000866 +[16:26:36] Epoch: 1 Batch: 16599/20099 (82.59%) Loss: 1.993345 LR: 0.00000866 +[16:26:41] >> Cleaned up old temp checkpoint: epoch1_step14600 +[16:26:41] >> Temp checkpoint saved: epoch1_step16600, size: 0.1693 GB +[16:26:41] 
Epoch: 1 Batch: 16600/20099 (82.59%) Loss: 1.964015 LR: 0.00000866 +[16:26:43] Epoch: 1 Batch: 16601/20099 (82.60%) Loss: 2.126747 LR: 0.00000866 +[16:26:45] Epoch: 1 Batch: 16602/20099 (82.60%) Loss: 2.104041 LR: 0.00000866 +[16:26:47] Epoch: 1 Batch: 16603/20099 (82.61%) Loss: 2.161549 LR: 0.00000866 +[16:26:49] Epoch: 1 Batch: 16604/20099 (82.61%) Loss: 2.281396 LR: 0.00000866 +[16:26:50] Epoch: 1 Batch: 16605/20099 (82.62%) Loss: 1.860242 LR: 0.00000864 +[16:26:52] Epoch: 1 Batch: 16606/20099 (82.62%) Loss: 2.091843 LR: 0.00000864 +[16:26:54] Epoch: 1 Batch: 16607/20099 (82.63%) Loss: 2.312483 LR: 0.00000864 +[16:26:56] Epoch: 1 Batch: 16608/20099 (82.63%) Loss: 2.153271 LR: 0.00000864 +[16:26:58] Epoch: 1 Batch: 16609/20099 (82.64%) Loss: 2.004353 LR: 0.00000864 +[16:27:00] Epoch: 1 Batch: 16610/20099 (82.64%) Loss: 2.242671 LR: 0.00000864 +[16:27:02] Epoch: 1 Batch: 16611/20099 (82.65%) Loss: 2.304753 LR: 0.00000864 +[16:27:03] Epoch: 1 Batch: 16612/20099 (82.65%) Loss: 1.941058 LR: 0.00000863 +[16:27:05] Epoch: 1 Batch: 16613/20099 (82.66%) Loss: 2.105476 LR: 0.00000863 +[16:27:07] Epoch: 1 Batch: 16614/20099 (82.66%) Loss: 2.234357 LR: 0.00000863 +[16:27:09] Epoch: 1 Batch: 16615/20099 (82.67%) Loss: 2.176742 LR: 0.00000863 +[16:27:11] Epoch: 1 Batch: 16616/20099 (82.67%) Loss: 2.198997 LR: 0.00000863 +[16:27:13] Epoch: 1 Batch: 16617/20099 (82.68%) Loss: 2.499077 LR: 0.00000863 +[16:27:15] Epoch: 1 Batch: 16618/20099 (82.68%) Loss: 1.900685 LR: 0.00000863 +[16:27:16] Epoch: 1 Batch: 16619/20099 (82.69%) Loss: 2.113362 LR: 0.00000862 +[16:27:18] Epoch: 1 Batch: 16620/20099 (82.69%) Loss: 2.058198 LR: 0.00000862 +[16:27:20] Epoch: 1 Batch: 16621/20099 (82.70%) Loss: 1.930577 LR: 0.00000862 +[16:27:22] Epoch: 1 Batch: 16622/20099 (82.70%) Loss: 2.135709 LR: 0.00000862 +[16:27:24] Epoch: 1 Batch: 16623/20099 (82.71%) Loss: 2.101613 LR: 0.00000862 +[16:27:26] Epoch: 1 Batch: 16624/20099 (82.71%) Loss: 2.227145 LR: 0.00000862 +[16:27:28] Epoch: 1 Batch: 
16625/20099 (82.72%) Loss: 1.806254 LR: 0.00000862 +[16:27:29] Epoch: 1 Batch: 16626/20099 (82.72%) Loss: 2.036176 LR: 0.00000861 +[16:27:31] Epoch: 1 Batch: 16627/20099 (82.73%) Loss: 2.251123 LR: 0.00000861 +[16:27:33] Epoch: 1 Batch: 16628/20099 (82.73%) Loss: 2.193777 LR: 0.00000861 +[16:27:35] Epoch: 1 Batch: 16629/20099 (82.74%) Loss: 2.152320 LR: 0.00000861 +[16:27:37] Epoch: 1 Batch: 16630/20099 (82.74%) Loss: 2.110282 LR: 0.00000861 +[16:27:39] Epoch: 1 Batch: 16631/20099 (82.75%) Loss: 2.013544 LR: 0.00000861 +[16:27:41] Epoch: 1 Batch: 16632/20099 (82.75%) Loss: 2.095991 LR: 0.00000861 +[16:27:42] Epoch: 1 Batch: 16633/20099 (82.76%) Loss: 2.331870 LR: 0.00000860 +[16:27:44] Epoch: 1 Batch: 16634/20099 (82.76%) Loss: 2.015231 LR: 0.00000860 +[16:27:46] Epoch: 1 Batch: 16635/20099 (82.77%) Loss: 2.197946 LR: 0.00000860 +[16:27:48] Epoch: 1 Batch: 16636/20099 (82.77%) Loss: 2.052575 LR: 0.00000860 +[16:27:50] Epoch: 1 Batch: 16637/20099 (82.78%) Loss: 1.918877 LR: 0.00000860 +[16:27:52] Epoch: 1 Batch: 16638/20099 (82.78%) Loss: 2.265791 LR: 0.00000860 +[16:27:54] Epoch: 1 Batch: 16639/20099 (82.79%) Loss: 2.252563 LR: 0.00000860 +[16:27:55] Epoch: 1 Batch: 16640/20099 (82.79%) Loss: 1.908554 LR: 0.00000859 +[16:27:57] Epoch: 1 Batch: 16641/20099 (82.80%) Loss: 2.226805 LR: 0.00000859 +[16:27:59] Epoch: 1 Batch: 16642/20099 (82.80%) Loss: 2.310566 LR: 0.00000859 +[16:28:01] Epoch: 1 Batch: 16643/20099 (82.81%) Loss: 2.210073 LR: 0.00000859 +[16:28:03] Epoch: 1 Batch: 16644/20099 (82.81%) Loss: 2.056789 LR: 0.00000859 +[16:28:05] Epoch: 1 Batch: 16645/20099 (82.82%) Loss: 2.250406 LR: 0.00000859 +[16:28:06] Epoch: 1 Batch: 16646/20099 (82.82%) Loss: 2.018199 LR: 0.00000859 +[16:28:08] Epoch: 1 Batch: 16647/20099 (82.83%) Loss: 2.232755 LR: 0.00000858 +[16:28:10] Epoch: 1 Batch: 16648/20099 (82.83%) Loss: 2.188177 LR: 0.00000858 +[16:28:12] Epoch: 1 Batch: 16649/20099 (82.83%) Loss: 2.072391 LR: 0.00000858 +[16:28:14] Epoch: 1 Batch: 16650/20099 (82.84%) 
Loss: 1.813561 LR: 0.00000858 +[16:28:16] Epoch: 1 Batch: 16651/20099 (82.84%) Loss: 2.252062 LR: 0.00000858 +[16:28:18] Epoch: 1 Batch: 16652/20099 (82.85%) Loss: 2.136316 LR: 0.00000858 +[16:28:19] Epoch: 1 Batch: 16653/20099 (82.85%) Loss: 2.247184 LR: 0.00000858 +[16:28:21] Epoch: 1 Batch: 16654/20099 (82.86%) Loss: 2.088854 LR: 0.00000857 +[16:28:23] Epoch: 1 Batch: 16655/20099 (82.86%) Loss: 2.019944 LR: 0.00000857 +[16:28:25] Epoch: 1 Batch: 16656/20099 (82.87%) Loss: 2.085492 LR: 0.00000857 +[16:28:27] Epoch: 1 Batch: 16657/20099 (82.87%) Loss: 2.054235 LR: 0.00000857 +[16:28:29] Epoch: 1 Batch: 16658/20099 (82.88%) Loss: 2.069050 LR: 0.00000857 +[16:28:30] Epoch: 1 Batch: 16659/20099 (82.88%) Loss: 1.985282 LR: 0.00000857 +[16:28:32] Epoch: 1 Batch: 16660/20099 (82.89%) Loss: 2.067096 LR: 0.00000857 +[16:28:34] Epoch: 1 Batch: 16661/20099 (82.89%) Loss: 2.151713 LR: 0.00000856 +[16:28:36] Epoch: 1 Batch: 16662/20099 (82.90%) Loss: 2.030815 LR: 0.00000856 +[16:28:38] Epoch: 1 Batch: 16663/20099 (82.90%) Loss: 1.783173 LR: 0.00000856 +[16:28:40] Epoch: 1 Batch: 16664/20099 (82.91%) Loss: 2.056595 LR: 0.00000856 +[16:28:42] Epoch: 1 Batch: 16665/20099 (82.91%) Loss: 2.006794 LR: 0.00000856 +[16:28:43] Epoch: 1 Batch: 16666/20099 (82.92%) Loss: 2.132596 LR: 0.00000856 +[16:28:45] Epoch: 1 Batch: 16667/20099 (82.92%) Loss: 2.109039 LR: 0.00000856 +[16:28:47] Epoch: 1 Batch: 16668/20099 (82.93%) Loss: 2.349339 LR: 0.00000855 +[16:28:49] Epoch: 1 Batch: 16669/20099 (82.93%) Loss: 2.406927 LR: 0.00000855 +[16:28:51] Epoch: 1 Batch: 16670/20099 (82.94%) Loss: 2.061842 LR: 0.00000855 +[16:28:53] Epoch: 1 Batch: 16671/20099 (82.94%) Loss: 1.845564 LR: 0.00000855 +[16:28:55] Epoch: 1 Batch: 16672/20099 (82.95%) Loss: 2.232133 LR: 0.00000855 +[16:28:56] Epoch: 1 Batch: 16673/20099 (82.95%) Loss: 1.756716 LR: 0.00000855 +[16:28:58] Epoch: 1 Batch: 16674/20099 (82.96%) Loss: 2.037388 LR: 0.00000855 +[16:29:00] Epoch: 1 Batch: 16675/20099 (82.96%) Loss: 1.858126 LR: 
0.00000854 +[16:29:02] Epoch: 1 Batch: 16676/20099 (82.97%) Loss: 2.110776 LR: 0.00000854 +[16:29:04] Epoch: 1 Batch: 16677/20099 (82.97%) Loss: 1.919326 LR: 0.00000854 +[16:29:06] Epoch: 1 Batch: 16678/20099 (82.98%) Loss: 2.282291 LR: 0.00000854 +[16:29:08] Epoch: 1 Batch: 16679/20099 (82.98%) Loss: 2.130403 LR: 0.00000854 +[16:29:09] Epoch: 1 Batch: 16680/20099 (82.99%) Loss: 2.217645 LR: 0.00000854 +[16:29:11] Epoch: 1 Batch: 16681/20099 (82.99%) Loss: 2.053225 LR: 0.00000854 +[16:29:13] Epoch: 1 Batch: 16682/20099 (83.00%) Loss: 2.020172 LR: 0.00000853 +[16:29:15] Epoch: 1 Batch: 16683/20099 (83.00%) Loss: 2.056619 LR: 0.00000853 +[16:29:17] Epoch: 1 Batch: 16684/20099 (83.01%) Loss: 2.188424 LR: 0.00000853 +[16:29:19] Epoch: 1 Batch: 16685/20099 (83.01%) Loss: 2.219086 LR: 0.00000853 +[16:29:21] Epoch: 1 Batch: 16686/20099 (83.02%) Loss: 1.860166 LR: 0.00000853 +[16:29:22] Epoch: 1 Batch: 16687/20099 (83.02%) Loss: 1.879277 LR: 0.00000853 +[16:29:24] Epoch: 1 Batch: 16688/20099 (83.03%) Loss: 2.299792 LR: 0.00000853 +[16:29:26] Epoch: 1 Batch: 16689/20099 (83.03%) Loss: 2.137259 LR: 0.00000852 +[16:29:28] Epoch: 1 Batch: 16690/20099 (83.04%) Loss: 2.105556 LR: 0.00000852 +[16:29:30] Epoch: 1 Batch: 16691/20099 (83.04%) Loss: 2.079010 LR: 0.00000852 +[16:29:32] Epoch: 1 Batch: 16692/20099 (83.05%) Loss: 1.978904 LR: 0.00000852 +[16:29:34] Epoch: 1 Batch: 16693/20099 (83.05%) Loss: 2.201975 LR: 0.00000852 +[16:29:35] Epoch: 1 Batch: 16694/20099 (83.06%) Loss: 1.963457 LR: 0.00000852 +[16:29:37] Epoch: 1 Batch: 16695/20099 (83.06%) Loss: 2.051216 LR: 0.00000852 +[16:29:39] Epoch: 1 Batch: 16696/20099 (83.07%) Loss: 2.172128 LR: 0.00000851 +[16:29:41] Epoch: 1 Batch: 16697/20099 (83.07%) Loss: 2.409959 LR: 0.00000851 +[16:29:43] Epoch: 1 Batch: 16698/20099 (83.08%) Loss: 2.027288 LR: 0.00000851 +[16:29:45] Epoch: 1 Batch: 16699/20099 (83.08%) Loss: 1.703194 LR: 0.00000851 +[16:29:46] Epoch: 1 Batch: 16700/20099 (83.09%) Loss: 1.934295 LR: 0.00000851 +[16:29:48] 
Epoch: 1 Batch: 16701/20099 (83.09%) Loss: 2.006363 LR: 0.00000851 +[16:29:50] Epoch: 1 Batch: 16702/20099 (83.10%) Loss: 2.095083 LR: 0.00000851 +[16:29:52] Epoch: 1 Batch: 16703/20099 (83.10%) Loss: 2.324102 LR: 0.00000850 +[16:29:54] Epoch: 1 Batch: 16704/20099 (83.11%) Loss: 2.057350 LR: 0.00000850 +[16:29:56] Epoch: 1 Batch: 16705/20099 (83.11%) Loss: 2.076157 LR: 0.00000850 +[16:29:58] Epoch: 1 Batch: 16706/20099 (83.12%) Loss: 2.271578 LR: 0.00000850 +[16:29:59] Epoch: 1 Batch: 16707/20099 (83.12%) Loss: 1.967137 LR: 0.00000850 +[16:30:01] Epoch: 1 Batch: 16708/20099 (83.13%) Loss: 2.225040 LR: 0.00000850 +[16:30:03] Epoch: 1 Batch: 16709/20099 (83.13%) Loss: 2.192386 LR: 0.00000850 +[16:30:05] Epoch: 1 Batch: 16710/20099 (83.14%) Loss: 2.049857 LR: 0.00000849 +[16:30:07] Epoch: 1 Batch: 16711/20099 (83.14%) Loss: 2.231309 LR: 0.00000849 +[16:30:09] Epoch: 1 Batch: 16712/20099 (83.15%) Loss: 2.178263 LR: 0.00000849 +[16:30:11] Epoch: 1 Batch: 16713/20099 (83.15%) Loss: 2.078445 LR: 0.00000849 +[16:30:12] Epoch: 1 Batch: 16714/20099 (83.16%) Loss: 2.075748 LR: 0.00000849 +[16:30:14] Epoch: 1 Batch: 16715/20099 (83.16%) Loss: 2.081205 LR: 0.00000849 +[16:30:16] Epoch: 1 Batch: 16716/20099 (83.17%) Loss: 2.209902 LR: 0.00000849 +[16:30:18] Epoch: 1 Batch: 16717/20099 (83.17%) Loss: 2.120301 LR: 0.00000848 +[16:30:20] Epoch: 1 Batch: 16718/20099 (83.18%) Loss: 1.990799 LR: 0.00000848 +[16:30:22] Epoch: 1 Batch: 16719/20099 (83.18%) Loss: 1.914299 LR: 0.00000848 +[16:30:23] Epoch: 1 Batch: 16720/20099 (83.19%) Loss: 2.052778 LR: 0.00000848 +[16:30:25] Epoch: 1 Batch: 16721/20099 (83.19%) Loss: 2.114540 LR: 0.00000848 +[16:30:27] Epoch: 1 Batch: 16722/20099 (83.20%) Loss: 2.425771 LR: 0.00000848 +[16:30:29] Epoch: 1 Batch: 16723/20099 (83.20%) Loss: 1.733617 LR: 0.00000848 +[16:30:31] Epoch: 1 Batch: 16724/20099 (83.21%) Loss: 1.858983 LR: 0.00000847 +[16:30:33] Epoch: 1 Batch: 16725/20099 (83.21%) Loss: 2.135315 LR: 0.00000847 +[16:30:35] Epoch: 1 Batch: 
16726/20099 (83.22%) Loss: 2.274087 LR: 0.00000847 +[16:30:36] Epoch: 1 Batch: 16727/20099 (83.22%) Loss: 1.951154 LR: 0.00000847 +[16:30:38] Epoch: 1 Batch: 16728/20099 (83.23%) Loss: 2.326573 LR: 0.00000847 +[16:30:40] Epoch: 1 Batch: 16729/20099 (83.23%) Loss: 2.154121 LR: 0.00000847 +[16:30:42] Epoch: 1 Batch: 16730/20099 (83.24%) Loss: 2.051224 LR: 0.00000847 +[16:30:44] Epoch: 1 Batch: 16731/20099 (83.24%) Loss: 1.838811 LR: 0.00000846 +[16:30:46] Epoch: 1 Batch: 16732/20099 (83.25%) Loss: 2.110795 LR: 0.00000846 +[16:30:48] Epoch: 1 Batch: 16733/20099 (83.25%) Loss: 2.162132 LR: 0.00000846 +[16:30:49] Epoch: 1 Batch: 16734/20099 (83.26%) Loss: 2.387799 LR: 0.00000846 +[16:30:51] Epoch: 1 Batch: 16735/20099 (83.26%) Loss: 2.069029 LR: 0.00000846 +[16:30:53] Epoch: 1 Batch: 16736/20099 (83.27%) Loss: 1.947996 LR: 0.00000846 +[16:30:55] Epoch: 1 Batch: 16737/20099 (83.27%) Loss: 2.272533 LR: 0.00000846 +[16:30:57] Epoch: 1 Batch: 16738/20099 (83.28%) Loss: 2.231332 LR: 0.00000845 +[16:30:59] Epoch: 1 Batch: 16739/20099 (83.28%) Loss: 1.725709 LR: 0.00000845 +[16:31:01] Epoch: 1 Batch: 16740/20099 (83.29%) Loss: 2.056757 LR: 0.00000845 +[16:31:02] Epoch: 1 Batch: 16741/20099 (83.29%) Loss: 1.967078 LR: 0.00000845 +[16:31:04] Epoch: 1 Batch: 16742/20099 (83.30%) Loss: 1.892167 LR: 0.00000845 +[16:31:06] Epoch: 1 Batch: 16743/20099 (83.30%) Loss: 2.112472 LR: 0.00000845 +[16:31:08] Epoch: 1 Batch: 16744/20099 (83.31%) Loss: 2.010365 LR: 0.00000845 +[16:31:10] Epoch: 1 Batch: 16745/20099 (83.31%) Loss: 2.186443 LR: 0.00000844 +[16:31:12] Epoch: 1 Batch: 16746/20099 (83.32%) Loss: 2.163054 LR: 0.00000844 +[16:31:13] Epoch: 1 Batch: 16747/20099 (83.32%) Loss: 2.093272 LR: 0.00000844 +[16:31:15] Epoch: 1 Batch: 16748/20099 (83.33%) Loss: 2.104272 LR: 0.00000844 +[16:31:17] Epoch: 1 Batch: 16749/20099 (83.33%) Loss: 2.233150 LR: 0.00000844 +[16:31:19] Epoch: 1 Batch: 16750/20099 (83.34%) Loss: 2.408203 LR: 0.00000844 +[16:31:21] Epoch: 1 Batch: 16751/20099 (83.34%) 
Loss: 1.762165 LR: 0.00000844 +[16:31:23] Epoch: 1 Batch: 16752/20099 (83.35%) Loss: 2.066351 LR: 0.00000844 +[16:31:25] Epoch: 1 Batch: 16753/20099 (83.35%) Loss: 2.119678 LR: 0.00000844 +[16:31:26] Epoch: 1 Batch: 16754/20099 (83.36%) Loss: 2.371763 LR: 0.00000844 +[16:31:28] Epoch: 1 Batch: 16755/20099 (83.36%) Loss: 1.937698 LR: 0.00000844 +[16:31:30] Epoch: 1 Batch: 16756/20099 (83.37%) Loss: 2.006530 LR: 0.00000844 +[16:31:32] Epoch: 1 Batch: 16757/20099 (83.37%) Loss: 1.989969 LR: 0.00000844 +[16:31:34] Epoch: 1 Batch: 16758/20099 (83.38%) Loss: 2.179781 LR: 0.00000844 +[16:31:36] Epoch: 1 Batch: 16759/20099 (83.38%) Loss: 1.774055 LR: 0.00000843 +[16:31:37] Epoch: 1 Batch: 16760/20099 (83.39%) Loss: 2.144992 LR: 0.00000843 +[16:31:39] Epoch: 1 Batch: 16761/20099 (83.39%) Loss: 1.934946 LR: 0.00000843 +[16:31:41] Epoch: 1 Batch: 16762/20099 (83.40%) Loss: 2.200750 LR: 0.00000843 +[16:31:43] Epoch: 1 Batch: 16763/20099 (83.40%) Loss: 1.927474 LR: 0.00000843 +[16:31:45] Epoch: 1 Batch: 16764/20099 (83.41%) Loss: 1.737182 LR: 0.00000843 +[16:31:47] Epoch: 1 Batch: 16765/20099 (83.41%) Loss: 2.168481 LR: 0.00000843 +[16:31:49] Epoch: 1 Batch: 16766/20099 (83.42%) Loss: 1.973350 LR: 0.00000842 +[16:31:51] Epoch: 1 Batch: 16767/20099 (83.42%) Loss: 2.100538 LR: 0.00000842 +[16:31:52] Epoch: 1 Batch: 16768/20099 (83.43%) Loss: 2.000759 LR: 0.00000842 +[16:31:54] Epoch: 1 Batch: 16769/20099 (83.43%) Loss: 2.078461 LR: 0.00000842 +[16:31:56] Epoch: 1 Batch: 16770/20099 (83.44%) Loss: 2.377001 LR: 0.00000842 +[16:31:58] Epoch: 1 Batch: 16771/20099 (83.44%) Loss: 2.082760 LR: 0.00000842 +[16:32:00] Epoch: 1 Batch: 16772/20099 (83.45%) Loss: 1.696118 LR: 0.00000842 +[16:32:02] Epoch: 1 Batch: 16773/20099 (83.45%) Loss: 1.908051 LR: 0.00000841 +[16:32:03] Epoch: 1 Batch: 16774/20099 (83.46%) Loss: 2.004529 LR: 0.00000841 +[16:32:05] Epoch: 1 Batch: 16775/20099 (83.46%) Loss: 1.916346 LR: 0.00000841 +[16:32:07] Epoch: 1 Batch: 16776/20099 (83.47%) Loss: 2.141359 LR: 
0.00000841 +[16:32:09] Epoch: 1 Batch: 16777/20099 (83.47%) Loss: 2.037600 LR: 0.00000841 +[16:32:11] Epoch: 1 Batch: 16778/20099 (83.48%) Loss: 2.013016 LR: 0.00000841 +[16:32:13] Epoch: 1 Batch: 16779/20099 (83.48%) Loss: 1.780203 LR: 0.00000841 +[16:32:15] Epoch: 1 Batch: 16780/20099 (83.49%) Loss: 2.179477 LR: 0.00000840 +[16:32:16] Epoch: 1 Batch: 16781/20099 (83.49%) Loss: 2.392939 LR: 0.00000840 +[16:32:18] Epoch: 1 Batch: 16782/20099 (83.50%) Loss: 2.235525 LR: 0.00000840 +[16:32:20] Epoch: 1 Batch: 16783/20099 (83.50%) Loss: 2.167224 LR: 0.00000840 +[16:32:22] Epoch: 1 Batch: 16784/20099 (83.51%) Loss: 2.122760 LR: 0.00000840 +[16:32:24] Epoch: 1 Batch: 16785/20099 (83.51%) Loss: 2.288196 LR: 0.00000840 +[16:32:26] Epoch: 1 Batch: 16786/20099 (83.52%) Loss: 2.079150 LR: 0.00000840 +[16:32:28] Epoch: 1 Batch: 16787/20099 (83.52%) Loss: 2.091730 LR: 0.00000839 +[16:32:29] Epoch: 1 Batch: 16788/20099 (83.53%) Loss: 2.136585 LR: 0.00000839 +[16:32:31] Epoch: 1 Batch: 16789/20099 (83.53%) Loss: 2.106469 LR: 0.00000839 +[16:32:33] Epoch: 1 Batch: 16790/20099 (83.54%) Loss: 1.985318 LR: 0.00000839 +[16:32:35] Epoch: 1 Batch: 16791/20099 (83.54%) Loss: 1.866581 LR: 0.00000839 +[16:32:37] Epoch: 1 Batch: 16792/20099 (83.55%) Loss: 2.095175 LR: 0.00000839 +[16:32:39] Epoch: 1 Batch: 16793/20099 (83.55%) Loss: 1.824196 LR: 0.00000839 +[16:32:41] Epoch: 1 Batch: 16794/20099 (83.56%) Loss: 2.246424 LR: 0.00000838 +[16:32:42] Epoch: 1 Batch: 16795/20099 (83.56%) Loss: 2.189766 LR: 0.00000838 +[16:32:44] Epoch: 1 Batch: 16796/20099 (83.57%) Loss: 2.035667 LR: 0.00000838 +[16:32:46] Epoch: 1 Batch: 16797/20099 (83.57%) Loss: 2.094762 LR: 0.00000838 +[16:32:48] Epoch: 1 Batch: 16798/20099 (83.58%) Loss: 1.892124 LR: 0.00000838 +[16:32:50] Epoch: 1 Batch: 16799/20099 (83.58%) Loss: 2.465844 LR: 0.00000838 +[16:32:55] >> Cleaned up old temp checkpoint: epoch1_step14800 +[16:32:55] >> Temp checkpoint saved: epoch1_step16800, size: 0.1693 GB +[16:32:55] Epoch: 1 Batch: 
16800/20099 (83.59%) Loss: 1.784863 LR: 0.00000838 +[16:32:57] Epoch: 1 Batch: 16801/20099 (83.59%) Loss: 2.340324 LR: 0.00000837 +[16:32:59] Epoch: 1 Batch: 16802/20099 (83.60%) Loss: 2.238878 LR: 0.00000837 +[16:33:01] Epoch: 1 Batch: 16803/20099 (83.60%) Loss: 2.020341 LR: 0.00000837 +[16:33:03] Epoch: 1 Batch: 16804/20099 (83.61%) Loss: 1.807205 LR: 0.00000837 +[16:33:05] Epoch: 1 Batch: 16805/20099 (83.61%) Loss: 2.048045 LR: 0.00000837 +[16:33:07] Epoch: 1 Batch: 16806/20099 (83.62%) Loss: 2.288436 LR: 0.00000837 +[16:33:08] Epoch: 1 Batch: 16807/20099 (83.62%) Loss: 1.890078 LR: 0.00000837 +[16:33:10] Epoch: 1 Batch: 16808/20099 (83.63%) Loss: 2.154845 LR: 0.00000836 +[16:33:12] Epoch: 1 Batch: 16809/20099 (83.63%) Loss: 1.984071 LR: 0.00000836 +[16:33:14] Epoch: 1 Batch: 16810/20099 (83.64%) Loss: 1.910351 LR: 0.00000836 +[16:33:16] Epoch: 1 Batch: 16811/20099 (83.64%) Loss: 2.326294 LR: 0.00000836 +[16:33:18] Epoch: 1 Batch: 16812/20099 (83.65%) Loss: 2.118865 LR: 0.00000836 +[16:33:20] Epoch: 1 Batch: 16813/20099 (83.65%) Loss: 2.012086 LR: 0.00000836 +[16:33:21] Epoch: 1 Batch: 16814/20099 (83.66%) Loss: 2.215842 LR: 0.00000836 +[16:33:23] Epoch: 1 Batch: 16815/20099 (83.66%) Loss: 1.863859 LR: 0.00000835 +[16:33:25] Epoch: 1 Batch: 16816/20099 (83.67%) Loss: 2.080020 LR: 0.00000835 +[16:33:27] Epoch: 1 Batch: 16817/20099 (83.67%) Loss: 1.798467 LR: 0.00000835 +[16:33:29] Epoch: 1 Batch: 16818/20099 (83.68%) Loss: 2.022755 LR: 0.00000835 +[16:33:31] Epoch: 1 Batch: 16819/20099 (83.68%) Loss: 2.186172 LR: 0.00000835 +[16:33:33] Epoch: 1 Batch: 16820/20099 (83.69%) Loss: 2.192345 LR: 0.00000835 +[16:33:34] Epoch: 1 Batch: 16821/20099 (83.69%) Loss: 1.972157 LR: 0.00000835 +[16:33:36] Epoch: 1 Batch: 16822/20099 (83.70%) Loss: 2.086529 LR: 0.00000834 +[16:33:38] Epoch: 1 Batch: 16823/20099 (83.70%) Loss: 1.952470 LR: 0.00000834 +[16:33:40] Epoch: 1 Batch: 16824/20099 (83.71%) Loss: 1.963237 LR: 0.00000834 +[16:33:42] Epoch: 1 Batch: 16825/20099 (83.71%) 
Loss: 2.047339 LR: 0.00000834 +[16:33:44] Epoch: 1 Batch: 16826/20099 (83.72%) Loss: 2.113311 LR: 0.00000834 +[16:33:46] Epoch: 1 Batch: 16827/20099 (83.72%) Loss: 1.856907 LR: 0.00000834 +[16:33:47] Epoch: 1 Batch: 16828/20099 (83.73%) Loss: 1.812660 LR: 0.00000834 +[16:33:49] Epoch: 1 Batch: 16829/20099 (83.73%) Loss: 1.991629 LR: 0.00000833 +[16:33:51] Epoch: 1 Batch: 16830/20099 (83.74%) Loss: 2.099817 LR: 0.00000833 +[16:33:53] Epoch: 1 Batch: 16831/20099 (83.74%) Loss: 1.901524 LR: 0.00000833 +[16:33:55] Epoch: 1 Batch: 16832/20099 (83.75%) Loss: 2.231394 LR: 0.00000833 +[16:33:57] Epoch: 1 Batch: 16833/20099 (83.75%) Loss: 2.159597 LR: 0.00000833 +[16:33:59] Epoch: 1 Batch: 16834/20099 (83.76%) Loss: 1.990955 LR: 0.00000833 +[16:34:00] Epoch: 1 Batch: 16835/20099 (83.76%) Loss: 2.282653 LR: 0.00000833 +[16:34:02] Epoch: 1 Batch: 16836/20099 (83.77%) Loss: 2.094138 LR: 0.00000832 +[16:34:04] Epoch: 1 Batch: 16837/20099 (83.77%) Loss: 2.255760 LR: 0.00000832 +[16:34:06] Epoch: 1 Batch: 16838/20099 (83.78%) Loss: 2.026728 LR: 0.00000832 +[16:34:08] Epoch: 1 Batch: 16839/20099 (83.78%) Loss: 1.957673 LR: 0.00000832 +[16:34:10] Epoch: 1 Batch: 16840/20099 (83.79%) Loss: 2.093330 LR: 0.00000832 +[16:34:11] Epoch: 1 Batch: 16841/20099 (83.79%) Loss: 1.900071 LR: 0.00000832 +[16:34:13] Epoch: 1 Batch: 16842/20099 (83.80%) Loss: 2.009720 LR: 0.00000832 +[16:34:15] Epoch: 1 Batch: 16843/20099 (83.80%) Loss: 1.808057 LR: 0.00000831 +[16:34:17] Epoch: 1 Batch: 16844/20099 (83.81%) Loss: 2.067416 LR: 0.00000831 +[16:34:19] Epoch: 1 Batch: 16845/20099 (83.81%) Loss: 2.279677 LR: 0.00000831 +[16:34:21] Epoch: 1 Batch: 16846/20099 (83.82%) Loss: 1.973707 LR: 0.00000831 +[16:34:23] Epoch: 1 Batch: 16847/20099 (83.82%) Loss: 2.010552 LR: 0.00000831 +[16:34:24] Epoch: 1 Batch: 16848/20099 (83.83%) Loss: 2.111931 LR: 0.00000831 +[16:34:26] Epoch: 1 Batch: 16849/20099 (83.83%) Loss: 1.724504 LR: 0.00000831 +[16:34:28] Epoch: 1 Batch: 16850/20099 (83.84%) Loss: 1.752719 LR: 
0.00000830 +[16:34:30] Epoch: 1 Batch: 16851/20099 (83.84%) Loss: 1.999775 LR: 0.00000830 +[16:34:32] Epoch: 1 Batch: 16852/20099 (83.84%) Loss: 2.251132 LR: 0.00000830 +[16:34:34] Epoch: 1 Batch: 16853/20099 (83.85%) Loss: 1.759152 LR: 0.00000830 +[16:34:35] Epoch: 1 Batch: 16854/20099 (83.85%) Loss: 2.007309 LR: 0.00000830 +[16:34:37] Epoch: 1 Batch: 16855/20099 (83.86%) Loss: 2.010330 LR: 0.00000830 +[16:34:39] Epoch: 1 Batch: 16856/20099 (83.86%) Loss: 2.061253 LR: 0.00000830 +[16:34:41] Epoch: 1 Batch: 16857/20099 (83.87%) Loss: 2.302803 LR: 0.00000829 +[16:34:43] Epoch: 1 Batch: 16858/20099 (83.87%) Loss: 1.950537 LR: 0.00000829 +[16:34:45] Epoch: 1 Batch: 16859/20099 (83.88%) Loss: 2.169888 LR: 0.00000829 +[16:34:47] Epoch: 1 Batch: 16860/20099 (83.88%) Loss: 1.887194 LR: 0.00000829 +[16:34:48] Epoch: 1 Batch: 16861/20099 (83.89%) Loss: 2.528647 LR: 0.00000829 +[16:34:50] Epoch: 1 Batch: 16862/20099 (83.89%) Loss: 1.636225 LR: 0.00000829 +[16:34:52] Epoch: 1 Batch: 16863/20099 (83.90%) Loss: 2.132779 LR: 0.00000829 +[16:34:54] Epoch: 1 Batch: 16864/20099 (83.90%) Loss: 2.193437 LR: 0.00000828 +[16:34:56] Epoch: 1 Batch: 16865/20099 (83.91%) Loss: 2.299555 LR: 0.00000828 +[16:34:58] Epoch: 1 Batch: 16866/20099 (83.91%) Loss: 2.142249 LR: 0.00000828 +[16:35:00] Epoch: 1 Batch: 16867/20099 (83.92%) Loss: 2.150552 LR: 0.00000828 +[16:35:02] Epoch: 1 Batch: 16868/20099 (83.92%) Loss: 2.072528 LR: 0.00000828 +[16:35:03] Epoch: 1 Batch: 16869/20099 (83.93%) Loss: 2.051984 LR: 0.00000828 +[16:35:05] Epoch: 1 Batch: 16870/20099 (83.93%) Loss: 2.140445 LR: 0.00000828 +[16:35:07] Epoch: 1 Batch: 16871/20099 (83.94%) Loss: 2.257636 LR: 0.00000827 +[16:35:09] Epoch: 1 Batch: 16872/20099 (83.94%) Loss: 1.999200 LR: 0.00000827 +[16:35:11] Epoch: 1 Batch: 16873/20099 (83.95%) Loss: 1.639849 LR: 0.00000827 +[16:35:13] Epoch: 1 Batch: 16874/20099 (83.95%) Loss: 1.850664 LR: 0.00000827 +[16:35:15] Epoch: 1 Batch: 16875/20099 (83.96%) Loss: 1.932293 LR: 0.00000827 +[16:35:16] 
Epoch: 1 Batch: 16876/20099 (83.96%) Loss: 2.238001 LR: 0.00000827 +[16:35:18] Epoch: 1 Batch: 16877/20099 (83.97%) Loss: 2.221350 LR: 0.00000827 +[16:35:20] Epoch: 1 Batch: 16878/20099 (83.97%) Loss: 2.241984 LR: 0.00000826 +[16:35:22] Epoch: 1 Batch: 16879/20099 (83.98%) Loss: 2.069839 LR: 0.00000826 +[16:35:24] Epoch: 1 Batch: 16880/20099 (83.98%) Loss: 2.351850 LR: 0.00000826 +[16:35:26] Epoch: 1 Batch: 16881/20099 (83.99%) Loss: 2.362052 LR: 0.00000826 +[16:35:28] Epoch: 1 Batch: 16882/20099 (83.99%) Loss: 2.038764 LR: 0.00000826 +[16:35:29] Epoch: 1 Batch: 16883/20099 (84.00%) Loss: 1.991608 LR: 0.00000826 +[16:35:31] Epoch: 1 Batch: 16884/20099 (84.00%) Loss: 2.257091 LR: 0.00000826 +[16:35:33] Epoch: 1 Batch: 16885/20099 (84.01%) Loss: 1.986232 LR: 0.00000825 +[16:35:35] Epoch: 1 Batch: 16886/20099 (84.01%) Loss: 2.005443 LR: 0.00000825 +[16:35:37] Epoch: 1 Batch: 16887/20099 (84.02%) Loss: 2.043661 LR: 0.00000825 +[16:35:39] Epoch: 1 Batch: 16888/20099 (84.02%) Loss: 2.307811 LR: 0.00000825 +[16:35:41] Epoch: 1 Batch: 16889/20099 (84.03%) Loss: 2.150407 LR: 0.00000825 +[16:35:42] Epoch: 1 Batch: 16890/20099 (84.03%) Loss: 2.138336 LR: 0.00000825 +[16:35:44] Epoch: 1 Batch: 16891/20099 (84.04%) Loss: 2.147258 LR: 0.00000825 +[16:35:46] Epoch: 1 Batch: 16892/20099 (84.04%) Loss: 2.207033 LR: 0.00000824 +[16:35:48] Epoch: 1 Batch: 16893/20099 (84.05%) Loss: 1.876033 LR: 0.00000824 +[16:35:50] Epoch: 1 Batch: 16894/20099 (84.05%) Loss: 1.905821 LR: 0.00000824 +[16:35:52] Epoch: 1 Batch: 16895/20099 (84.06%) Loss: 1.873228 LR: 0.00000824 +[16:35:53] Epoch: 1 Batch: 16896/20099 (84.06%) Loss: 2.125677 LR: 0.00000824 +[16:35:55] Epoch: 1 Batch: 16897/20099 (84.07%) Loss: 2.106588 LR: 0.00000824 +[16:35:57] Epoch: 1 Batch: 16898/20099 (84.07%) Loss: 1.917327 LR: 0.00000824 +[16:35:59] Epoch: 1 Batch: 16899/20099 (84.08%) Loss: 2.020635 LR: 0.00000823 +[16:36:01] Epoch: 1 Batch: 16900/20099 (84.08%) Loss: 2.200465 LR: 0.00000823 +[16:36:03] Epoch: 1 Batch: 
16901/20099 (84.09%) Loss: 2.360320 LR: 0.00000823 +[16:36:05] Epoch: 1 Batch: 16902/20099 (84.09%) Loss: 1.702943 LR: 0.00000823 +[16:36:06] Epoch: 1 Batch: 16903/20099 (84.10%) Loss: 2.170109 LR: 0.00000823 +[16:36:08] Epoch: 1 Batch: 16904/20099 (84.10%) Loss: 2.277708 LR: 0.00000823 +[16:36:10] Epoch: 1 Batch: 16905/20099 (84.11%) Loss: 1.803392 LR: 0.00000823 +[16:36:12] Epoch: 1 Batch: 16906/20099 (84.11%) Loss: 1.981362 LR: 0.00000822 +[16:36:14] Epoch: 1 Batch: 16907/20099 (84.12%) Loss: 2.203875 LR: 0.00000822 +[16:36:16] Epoch: 1 Batch: 16908/20099 (84.12%) Loss: 2.002357 LR: 0.00000822 +[16:36:18] Epoch: 1 Batch: 16909/20099 (84.13%) Loss: 1.837436 LR: 0.00000822 +[16:36:19] Epoch: 1 Batch: 16910/20099 (84.13%) Loss: 2.062276 LR: 0.00000822 +[16:36:21] Epoch: 1 Batch: 16911/20099 (84.14%) Loss: 2.342928 LR: 0.00000822 +[16:36:23] Epoch: 1 Batch: 16912/20099 (84.14%) Loss: 1.974358 LR: 0.00000822 +[16:36:25] Epoch: 1 Batch: 16913/20099 (84.15%) Loss: 2.200863 LR: 0.00000821 +[16:36:27] Epoch: 1 Batch: 16914/20099 (84.15%) Loss: 2.112032 LR: 0.00000821 +[16:36:29] Epoch: 1 Batch: 16915/20099 (84.16%) Loss: 2.308681 LR: 0.00000821 +[16:36:30] Epoch: 1 Batch: 16916/20099 (84.16%) Loss: 1.731260 LR: 0.00000821 +[16:36:32] Epoch: 1 Batch: 16917/20099 (84.17%) Loss: 2.104476 LR: 0.00000821 +[16:36:34] Epoch: 1 Batch: 16918/20099 (84.17%) Loss: 2.085897 LR: 0.00000821 +[16:36:36] Epoch: 1 Batch: 16919/20099 (84.18%) Loss: 1.968170 LR: 0.00000821 +[16:36:38] Epoch: 1 Batch: 16920/20099 (84.18%) Loss: 2.192598 LR: 0.00000820 +[16:36:40] Epoch: 1 Batch: 16921/20099 (84.19%) Loss: 1.791328 LR: 0.00000820 +[16:36:42] Epoch: 1 Batch: 16922/20099 (84.19%) Loss: 1.703125 LR: 0.00000820 +[16:36:43] Epoch: 1 Batch: 16923/20099 (84.20%) Loss: 2.177454 LR: 0.00000820 +[16:36:45] Epoch: 1 Batch: 16924/20099 (84.20%) Loss: 2.228979 LR: 0.00000820 +[16:36:47] Epoch: 1 Batch: 16925/20099 (84.21%) Loss: 1.936711 LR: 0.00000820 +[16:36:49] Epoch: 1 Batch: 16926/20099 (84.21%) 
Loss: 2.053221 LR: 0.00000820 +[16:36:51] Epoch: 1 Batch: 16927/20099 (84.22%) Loss: 2.229975 LR: 0.00000820 +[16:36:53] Epoch: 1 Batch: 16928/20099 (84.22%) Loss: 1.690549 LR: 0.00000820 +[16:36:54] Epoch: 1 Batch: 16929/20099 (84.23%) Loss: 1.814277 LR: 0.00000820 +[16:36:56] Epoch: 1 Batch: 16930/20099 (84.23%) Loss: 1.886759 LR: 0.00000820 +[16:36:58] Epoch: 1 Batch: 16931/20099 (84.24%) Loss: 2.026542 LR: 0.00000820 +[16:37:00] Epoch: 1 Batch: 16932/20099 (84.24%) Loss: 2.192698 LR: 0.00000820 +[16:37:02] Epoch: 1 Batch: 16933/20099 (84.25%) Loss: 2.245118 LR: 0.00000820 +[16:37:04] Epoch: 1 Batch: 16934/20099 (84.25%) Loss: 2.169201 LR: 0.00000819 +[16:37:06] Epoch: 1 Batch: 16935/20099 (84.26%) Loss: 1.918150 LR: 0.00000819 +[16:37:07] Epoch: 1 Batch: 16936/20099 (84.26%) Loss: 2.087108 LR: 0.00000819 +[16:37:09] Epoch: 1 Batch: 16937/20099 (84.27%) Loss: 2.235621 LR: 0.00000819 +[16:37:11] Epoch: 1 Batch: 16938/20099 (84.27%) Loss: 2.171145 LR: 0.00000819 +[16:37:13] Epoch: 1 Batch: 16939/20099 (84.28%) Loss: 1.998971 LR: 0.00000819 +[16:37:15] Epoch: 1 Batch: 16940/20099 (84.28%) Loss: 1.929785 LR: 0.00000819 +[16:37:17] Epoch: 1 Batch: 16941/20099 (84.29%) Loss: 2.258292 LR: 0.00000818 +[16:37:19] Epoch: 1 Batch: 16942/20099 (84.29%) Loss: 2.068955 LR: 0.00000818 +[16:37:20] Epoch: 1 Batch: 16943/20099 (84.30%) Loss: 2.011949 LR: 0.00000818 +[16:37:22] Epoch: 1 Batch: 16944/20099 (84.30%) Loss: 2.187311 LR: 0.00000818 +[16:37:24] Epoch: 1 Batch: 16945/20099 (84.31%) Loss: 1.573499 LR: 0.00000818 +[16:37:26] Epoch: 1 Batch: 16946/20099 (84.31%) Loss: 2.176100 LR: 0.00000818 +[16:37:28] Epoch: 1 Batch: 16947/20099 (84.32%) Loss: 2.445177 LR: 0.00000818 +[16:37:30] Epoch: 1 Batch: 16948/20099 (84.32%) Loss: 2.114449 LR: 0.00000817 +[16:37:31] Epoch: 1 Batch: 16949/20099 (84.33%) Loss: 1.838074 LR: 0.00000817 +[16:37:33] Epoch: 1 Batch: 16950/20099 (84.33%) Loss: 2.123845 LR: 0.00000817 +[16:37:35] Epoch: 1 Batch: 16951/20099 (84.34%) Loss: 2.141633 LR: 
0.00000817 +[16:37:37] Epoch: 1 Batch: 16952/20099 (84.34%) Loss: 1.640726 LR: 0.00000817 +[16:37:39] Epoch: 1 Batch: 16953/20099 (84.35%) Loss: 2.508094 LR: 0.00000817 +[16:37:41] Epoch: 1 Batch: 16954/20099 (84.35%) Loss: 2.346501 LR: 0.00000817 +[16:37:43] Epoch: 1 Batch: 16955/20099 (84.36%) Loss: 2.356726 LR: 0.00000816 +[16:37:44] Epoch: 1 Batch: 16956/20099 (84.36%) Loss: 2.081290 LR: 0.00000816 +[16:37:46] Epoch: 1 Batch: 16957/20099 (84.37%) Loss: 2.198065 LR: 0.00000816 +[16:37:48] Epoch: 1 Batch: 16958/20099 (84.37%) Loss: 2.266855 LR: 0.00000816 +[16:37:50] Epoch: 1 Batch: 16959/20099 (84.38%) Loss: 1.977073 LR: 0.00000816 +[16:37:52] Epoch: 1 Batch: 16960/20099 (84.38%) Loss: 2.138112 LR: 0.00000816 +[16:37:54] Epoch: 1 Batch: 16961/20099 (84.39%) Loss: 1.947053 LR: 0.00000816 +[16:37:56] Epoch: 1 Batch: 16962/20099 (84.39%) Loss: 2.025668 LR: 0.00000815 +[16:37:57] Epoch: 1 Batch: 16963/20099 (84.40%) Loss: 2.123389 LR: 0.00000815 +[16:37:59] Epoch: 1 Batch: 16964/20099 (84.40%) Loss: 2.102684 LR: 0.00000815 +[16:38:01] Epoch: 1 Batch: 16965/20099 (84.41%) Loss: 2.156146 LR: 0.00000815 +[16:38:03] Epoch: 1 Batch: 16966/20099 (84.41%) Loss: 2.182614 LR: 0.00000815 +[16:38:05] Epoch: 1 Batch: 16967/20099 (84.42%) Loss: 2.149832 LR: 0.00000815 +[16:38:07] Epoch: 1 Batch: 16968/20099 (84.42%) Loss: 1.787732 LR: 0.00000815 +[16:38:09] Epoch: 1 Batch: 16969/20099 (84.43%) Loss: 1.978180 LR: 0.00000814 +[16:38:10] Epoch: 1 Batch: 16970/20099 (84.43%) Loss: 1.799998 LR: 0.00000814 +[16:38:12] Epoch: 1 Batch: 16971/20099 (84.44%) Loss: 1.943870 LR: 0.00000814 +[16:38:14] Epoch: 1 Batch: 16972/20099 (84.44%) Loss: 2.210814 LR: 0.00000814 +[16:38:16] Epoch: 1 Batch: 16973/20099 (84.45%) Loss: 2.219488 LR: 0.00000814 +[16:38:18] Epoch: 1 Batch: 16974/20099 (84.45%) Loss: 2.012368 LR: 0.00000814 +[16:38:20] Epoch: 1 Batch: 16975/20099 (84.46%) Loss: 1.894328 LR: 0.00000814 +[16:38:22] Epoch: 1 Batch: 16976/20099 (84.46%) Loss: 1.633489 LR: 0.00000813 +[16:38:23] 
Epoch: 1 Batch: 16977/20099 (84.47%) Loss: 2.245471 LR: 0.00000813 +[16:38:25] Epoch: 1 Batch: 16978/20099 (84.47%) Loss: 2.307497 LR: 0.00000813 +[16:38:27] Epoch: 1 Batch: 16979/20099 (84.48%) Loss: 2.178915 LR: 0.00000813 +[16:38:29] Epoch: 1 Batch: 16980/20099 (84.48%) Loss: 1.981733 LR: 0.00000813 +[16:38:31] Epoch: 1 Batch: 16981/20099 (84.49%) Loss: 1.898773 LR: 0.00000813 +[16:38:33] Epoch: 1 Batch: 16982/20099 (84.49%) Loss: 1.922602 LR: 0.00000813 +[16:38:35] Epoch: 1 Batch: 16983/20099 (84.50%) Loss: 2.043898 LR: 0.00000812 +[16:38:36] Epoch: 1 Batch: 16984/20099 (84.50%) Loss: 2.275217 LR: 0.00000812 +[16:38:38] Epoch: 1 Batch: 16985/20099 (84.51%) Loss: 2.183689 LR: 0.00000812 +[16:38:40] Epoch: 1 Batch: 16986/20099 (84.51%) Loss: 2.035990 LR: 0.00000812 +[16:38:42] Epoch: 1 Batch: 16987/20099 (84.52%) Loss: 1.878629 LR: 0.00000812 +[16:38:44] Epoch: 1 Batch: 16988/20099 (84.52%) Loss: 1.804050 LR: 0.00000812 +[16:38:46] Epoch: 1 Batch: 16989/20099 (84.53%) Loss: 2.214268 LR: 0.00000812 +[16:38:48] Epoch: 1 Batch: 16990/20099 (84.53%) Loss: 2.285652 LR: 0.00000811 +[16:38:49] Epoch: 1 Batch: 16991/20099 (84.54%) Loss: 1.891377 LR: 0.00000811 +[16:38:51] Epoch: 1 Batch: 16992/20099 (84.54%) Loss: 2.319561 LR: 0.00000811 +[16:38:53] Epoch: 1 Batch: 16993/20099 (84.55%) Loss: 2.112476 LR: 0.00000811 +[16:38:55] Epoch: 1 Batch: 16994/20099 (84.55%) Loss: 2.204563 LR: 0.00000811 +[16:38:57] Epoch: 1 Batch: 16995/20099 (84.56%) Loss: 1.818583 LR: 0.00000811 +[16:38:59] Epoch: 1 Batch: 16996/20099 (84.56%) Loss: 1.988327 LR: 0.00000811 +[16:39:01] Epoch: 1 Batch: 16997/20099 (84.57%) Loss: 2.262294 LR: 0.00000810 +[16:39:02] Epoch: 1 Batch: 16998/20099 (84.57%) Loss: 2.034229 LR: 0.00000810 +[16:39:04] Epoch: 1 Batch: 16999/20099 (84.58%) Loss: 1.943547 LR: 0.00000810 +[16:39:06] >> Evaluating batch 0 +[16:39:07] >> Evaluating batch 1 +[16:39:08] >> Evaluating batch 2 +[16:39:09] >> Evaluating batch 3 +[16:39:10] >> Evaluating batch 4 +[16:39:12] >> 
Evaluating batch 5 +[16:39:13] >> Evaluating batch 6 +[16:39:14] >> Evaluating batch 7 +[16:39:15] >> Evaluating batch 8 +[16:39:16] >> Evaluating batch 9 +[16:39:17] >> Evaluating batch 10 +[16:39:18] >> Evaluating batch 11 +[16:39:19] >> Evaluating batch 12 +[16:39:20] >> Evaluating batch 13 +[16:39:21] >> Evaluating batch 14 +[16:39:22] >> Evaluating batch 15 +[16:39:23] >> Evaluating batch 16 +[16:39:23] Epoch: 1 Step: 17000/20099 Evaluation: +[16:39:23] Avg Loss Since Last Eval: 2.0804 Val Loss: 2.1469 Validation loss delta: -0.0009 Perplexity: 8.5584 LR: 0.00000810 +[16:39:27] >> Cleaned up old temp checkpoint: epoch1_step15000 +[16:39:27] >> Temp checkpoint saved: epoch1_step17000, size: 0.1693 GB +[16:39:31] >> Checkpoint saved: epoch1_step17000, size: 0.1693 GB +[16:39:31] Epoch: 1 Batch: 17000/20099 (84.58%) Loss: 2.191501 LR: 0.00000810 +[16:39:32] Epoch: 1 Batch: 17001/20099 (84.59%) Loss: 2.395800 LR: 0.00000810 +[16:39:34] Epoch: 1 Batch: 17002/20099 (84.59%) Loss: 1.995746 LR: 0.00000810 +[16:39:36] Epoch: 1 Batch: 17003/20099 (84.60%) Loss: 2.097784 LR: 0.00000810 +[16:39:38] Epoch: 1 Batch: 17004/20099 (84.60%) Loss: 2.210976 LR: 0.00000809 +[16:39:40] Epoch: 1 Batch: 17005/20099 (84.61%) Loss: 1.958822 LR: 0.00000809 +[16:39:42] Epoch: 1 Batch: 17006/20099 (84.61%) Loss: 2.267553 LR: 0.00000809 +[16:39:43] Epoch: 1 Batch: 17007/20099 (84.62%) Loss: 1.867942 LR: 0.00000809 +[16:39:45] Epoch: 1 Batch: 17008/20099 (84.62%) Loss: 2.424419 LR: 0.00000809 +[16:39:47] Epoch: 1 Batch: 17009/20099 (84.63%) Loss: 2.219309 LR: 0.00000809 +[16:39:49] Epoch: 1 Batch: 17010/20099 (84.63%) Loss: 1.764623 LR: 0.00000809 +[16:39:51] Epoch: 1 Batch: 17011/20099 (84.64%) Loss: 1.998904 LR: 0.00000808 +[16:39:53] Epoch: 1 Batch: 17012/20099 (84.64%) Loss: 2.387442 LR: 0.00000808 +[16:39:55] Epoch: 1 Batch: 17013/20099 (84.65%) Loss: 1.869690 LR: 0.00000808 +[16:39:57] Epoch: 1 Batch: 17014/20099 (84.65%) Loss: 1.603001 LR: 0.00000808 +[16:39:59] Epoch: 1 Batch: 
17015/20099 (84.66%) Loss: 2.142349 LR: 0.00000808 +[16:40:01] Epoch: 1 Batch: 17016/20099 (84.66%) Loss: 1.916685 LR: 0.00000808 +[16:40:02] Epoch: 1 Batch: 17017/20099 (84.67%) Loss: 2.317330 LR: 0.00000808 +[16:40:04] Epoch: 1 Batch: 17018/20099 (84.67%) Loss: 2.166361 LR: 0.00000808 +[16:40:06] Epoch: 1 Batch: 17019/20099 (84.68%) Loss: 2.210110 LR: 0.00000808 +[16:40:08] Epoch: 1 Batch: 17020/20099 (84.68%) Loss: 1.822191 LR: 0.00000808 +[16:40:10] Epoch: 1 Batch: 17021/20099 (84.69%) Loss: 2.090557 LR: 0.00000808 +[16:40:12] Epoch: 1 Batch: 17022/20099 (84.69%) Loss: 2.224681 LR: 0.00000808 +[16:40:14] Epoch: 1 Batch: 17023/20099 (84.70%) Loss: 2.099369 LR: 0.00000808 +[16:40:16] Epoch: 1 Batch: 17024/20099 (84.70%) Loss: 2.002857 LR: 0.00000808 +[16:40:17] Epoch: 1 Batch: 17025/20099 (84.71%) Loss: 1.935846 LR: 0.00000807 +[16:40:19] Epoch: 1 Batch: 17026/20099 (84.71%) Loss: 1.808426 LR: 0.00000807 +[16:40:21] Epoch: 1 Batch: 17027/20099 (84.72%) Loss: 2.327192 LR: 0.00000807 +[16:40:23] Epoch: 1 Batch: 17028/20099 (84.72%) Loss: 2.161461 LR: 0.00000807 +[16:40:25] Epoch: 1 Batch: 17029/20099 (84.73%) Loss: 2.257541 LR: 0.00000807 +[16:40:27] Epoch: 1 Batch: 17030/20099 (84.73%) Loss: 1.998036 LR: 0.00000807 +[16:40:28] Epoch: 1 Batch: 17031/20099 (84.74%) Loss: 2.404791 LR: 0.00000807 +[16:40:30] Epoch: 1 Batch: 17032/20099 (84.74%) Loss: 2.379539 LR: 0.00000806 +[16:40:32] Epoch: 1 Batch: 17033/20099 (84.75%) Loss: 1.970291 LR: 0.00000806 +[16:40:34] Epoch: 1 Batch: 17034/20099 (84.75%) Loss: 2.261286 LR: 0.00000806 +[16:40:36] Epoch: 1 Batch: 17035/20099 (84.76%) Loss: 2.050778 LR: 0.00000806 +[16:40:38] Epoch: 1 Batch: 17036/20099 (84.76%) Loss: 2.045305 LR: 0.00000806 +[16:40:40] Epoch: 1 Batch: 17037/20099 (84.77%) Loss: 2.335869 LR: 0.00000806 +[16:40:41] Epoch: 1 Batch: 17038/20099 (84.77%) Loss: 2.319338 LR: 0.00000806 +[16:40:43] Epoch: 1 Batch: 17039/20099 (84.78%) Loss: 2.086765 LR: 0.00000805 +[16:40:45] Epoch: 1 Batch: 17040/20099 (84.78%) 
Loss: 2.159406 LR: 0.00000805 +[16:40:47] Epoch: 1 Batch: 17041/20099 (84.79%) Loss: 2.155032 LR: 0.00000805 +[16:40:49] Epoch: 1 Batch: 17042/20099 (84.79%) Loss: 2.356128 LR: 0.00000805 +[16:40:51] Epoch: 1 Batch: 17043/20099 (84.80%) Loss: 2.009349 LR: 0.00000805 +[16:40:52] Epoch: 1 Batch: 17044/20099 (84.80%) Loss: 1.917074 LR: 0.00000805 +[16:40:54] Epoch: 1 Batch: 17045/20099 (84.81%) Loss: 2.294897 LR: 0.00000805 +[16:40:56] Epoch: 1 Batch: 17046/20099 (84.81%) Loss: 2.063482 LR: 0.00000804 +[16:40:58] Epoch: 1 Batch: 17047/20099 (84.82%) Loss: 2.460309 LR: 0.00000804 +[16:41:00] Epoch: 1 Batch: 17048/20099 (84.82%) Loss: 1.886720 LR: 0.00000804 +[16:41:02] Epoch: 1 Batch: 17049/20099 (84.83%) Loss: 2.090027 LR: 0.00000804 +[16:41:03] Epoch: 1 Batch: 17050/20099 (84.83%) Loss: 1.806859 LR: 0.00000804 +[16:41:05] Epoch: 1 Batch: 17051/20099 (84.84%) Loss: 1.931793 LR: 0.00000804 +[16:41:07] Epoch: 1 Batch: 17052/20099 (84.84%) Loss: 2.059423 LR: 0.00000804 +[16:41:09] Epoch: 1 Batch: 17053/20099 (84.85%) Loss: 2.026283 LR: 0.00000803 +[16:41:11] Epoch: 1 Batch: 17054/20099 (84.85%) Loss: 1.881199 LR: 0.00000803 +[16:41:13] Epoch: 1 Batch: 17055/20099 (84.85%) Loss: 2.082657 LR: 0.00000803 +[16:41:15] Epoch: 1 Batch: 17056/20099 (84.86%) Loss: 1.911495 LR: 0.00000803 +[16:41:16] Epoch: 1 Batch: 17057/20099 (84.86%) Loss: 1.916013 LR: 0.00000803 +[16:41:18] Epoch: 1 Batch: 17058/20099 (84.87%) Loss: 2.184204 LR: 0.00000803 +[16:41:20] Epoch: 1 Batch: 17059/20099 (84.87%) Loss: 1.856809 LR: 0.00000803 +[16:41:22] Epoch: 1 Batch: 17060/20099 (84.88%) Loss: 2.080236 LR: 0.00000802 +[16:41:24] Epoch: 1 Batch: 17061/20099 (84.88%) Loss: 1.960490 LR: 0.00000802 +[16:41:26] Epoch: 1 Batch: 17062/20099 (84.89%) Loss: 1.995957 LR: 0.00000802 +[16:41:28] Epoch: 1 Batch: 17063/20099 (84.89%) Loss: 2.031104 LR: 0.00000802 +[16:41:30] Epoch: 1 Batch: 17064/20099 (84.90%) Loss: 2.029749 LR: 0.00000802 +[16:41:31] Epoch: 1 Batch: 17065/20099 (84.90%) Loss: 2.029975 LR: 
0.00000802 +[16:41:33] Epoch: 1 Batch: 17066/20099 (84.91%) Loss: 2.349007 LR: 0.00000802 +[16:41:35] Epoch: 1 Batch: 17067/20099 (84.91%) Loss: 2.254635 LR: 0.00000801 +[16:41:37] Epoch: 1 Batch: 17068/20099 (84.92%) Loss: 2.012547 LR: 0.00000801 +[16:41:39] Epoch: 1 Batch: 17069/20099 (84.92%) Loss: 2.081901 LR: 0.00000801 +[16:41:41] Epoch: 1 Batch: 17070/20099 (84.93%) Loss: 2.240936 LR: 0.00000801 +[16:41:43] Epoch: 1 Batch: 17071/20099 (84.93%) Loss: 2.137130 LR: 0.00000801 +[16:41:44] Epoch: 1 Batch: 17072/20099 (84.94%) Loss: 2.011303 LR: 0.00000801 +[16:41:46] Epoch: 1 Batch: 17073/20099 (84.94%) Loss: 2.115493 LR: 0.00000801 +[16:41:48] Epoch: 1 Batch: 17074/20099 (84.95%) Loss: 1.715613 LR: 0.00000800 +[16:41:50] Epoch: 1 Batch: 17075/20099 (84.95%) Loss: 2.100876 LR: 0.00000800 +[16:41:52] Epoch: 1 Batch: 17076/20099 (84.96%) Loss: 2.018236 LR: 0.00000800 +[16:41:54] Epoch: 1 Batch: 17077/20099 (84.96%) Loss: 1.568634 LR: 0.00000800 +[16:41:55] Epoch: 1 Batch: 17078/20099 (84.97%) Loss: 2.210957 LR: 0.00000800 +[16:41:57] Epoch: 1 Batch: 17079/20099 (84.97%) Loss: 1.859042 LR: 0.00000800 +[16:41:59] Epoch: 1 Batch: 17080/20099 (84.98%) Loss: 1.955690 LR: 0.00000800 +[16:42:01] Epoch: 1 Batch: 17081/20099 (84.98%) Loss: 2.244576 LR: 0.00000799 +[16:42:03] Epoch: 1 Batch: 17082/20099 (84.99%) Loss: 1.886267 LR: 0.00000799 +[16:42:05] Epoch: 1 Batch: 17083/20099 (84.99%) Loss: 1.767527 LR: 0.00000799 +[16:42:07] Epoch: 1 Batch: 17084/20099 (85.00%) Loss: 1.959054 LR: 0.00000799 +[16:42:08] Epoch: 1 Batch: 17085/20099 (85.00%) Loss: 1.747660 LR: 0.00000799 +[16:42:10] Epoch: 1 Batch: 17086/20099 (85.01%) Loss: 2.083838 LR: 0.00000799 +[16:42:12] Epoch: 1 Batch: 17087/20099 (85.01%) Loss: 2.039367 LR: 0.00000799 +[16:42:14] Epoch: 1 Batch: 17088/20099 (85.02%) Loss: 1.610425 LR: 0.00000798 +[16:42:16] Epoch: 1 Batch: 17089/20099 (85.02%) Loss: 1.924597 LR: 0.00000798 +[16:42:18] Epoch: 1 Batch: 17090/20099 (85.03%) Loss: 2.033648 LR: 0.00000798 +[16:42:19] 
Epoch: 1 Batch: 17091/20099 (85.03%) Loss: 2.048092 LR: 0.00000798 +[16:42:21] Epoch: 1 Batch: 17092/20099 (85.04%) Loss: 2.030400 LR: 0.00000798 +[16:42:23] Epoch: 1 Batch: 17093/20099 (85.04%) Loss: 2.186990 LR: 0.00000798 +[16:42:25] Epoch: 1 Batch: 17094/20099 (85.05%) Loss: 2.370811 LR: 0.00000798 +[16:42:27] Epoch: 1 Batch: 17095/20099 (85.05%) Loss: 1.966318 LR: 0.00000798 +[16:42:29] Epoch: 1 Batch: 17096/20099 (85.06%) Loss: 2.227926 LR: 0.00000798 +[16:42:31] Epoch: 1 Batch: 17097/20099 (85.06%) Loss: 1.970612 LR: 0.00000798 +[16:42:32] Epoch: 1 Batch: 17098/20099 (85.07%) Loss: 1.880152 LR: 0.00000798 +[16:42:34] Epoch: 1 Batch: 17099/20099 (85.07%) Loss: 2.285702 LR: 0.00000798 +[16:42:36] Epoch: 1 Batch: 17100/20099 (85.08%) Loss: 1.950794 LR: 0.00000798 +[16:42:38] Epoch: 1 Batch: 17101/20099 (85.08%) Loss: 2.210974 LR: 0.00000798 +[16:42:40] Epoch: 1 Batch: 17102/20099 (85.09%) Loss: 2.111454 LR: 0.00000797 +[16:42:42] Epoch: 1 Batch: 17103/20099 (85.09%) Loss: 2.065188 LR: 0.00000797 +[16:42:44] Epoch: 1 Batch: 17104/20099 (85.10%) Loss: 2.137998 LR: 0.00000797 +[16:42:45] Epoch: 1 Batch: 17105/20099 (85.10%) Loss: 2.428985 LR: 0.00000797 +[16:42:47] Epoch: 1 Batch: 17106/20099 (85.11%) Loss: 2.040183 LR: 0.00000797 +[16:42:49] Epoch: 1 Batch: 17107/20099 (85.11%) Loss: 2.357509 LR: 0.00000797 +[16:42:51] Epoch: 1 Batch: 17108/20099 (85.12%) Loss: 2.183233 LR: 0.00000797 +[16:42:53] Epoch: 1 Batch: 17109/20099 (85.12%) Loss: 2.294452 LR: 0.00000796 +[16:42:55] Epoch: 1 Batch: 17110/20099 (85.13%) Loss: 2.289980 LR: 0.00000796 +[16:42:57] Epoch: 1 Batch: 17111/20099 (85.13%) Loss: 2.086342 LR: 0.00000796 +[16:42:58] Epoch: 1 Batch: 17112/20099 (85.14%) Loss: 1.994547 LR: 0.00000796 +[16:43:00] Epoch: 1 Batch: 17113/20099 (85.14%) Loss: 2.055961 LR: 0.00000796 +[16:43:02] Epoch: 1 Batch: 17114/20099 (85.15%) Loss: 1.846628 LR: 0.00000796 +[16:43:04] Epoch: 1 Batch: 17115/20099 (85.15%) Loss: 1.792512 LR: 0.00000796 +[16:43:06] Epoch: 1 Batch: 
17116/20099 (85.16%) Loss: 2.193355 LR: 0.00000795 +[16:43:08] Epoch: 1 Batch: 17117/20099 (85.16%) Loss: 2.068602 LR: 0.00000795 +[16:43:10] Epoch: 1 Batch: 17118/20099 (85.17%) Loss: 1.911580 LR: 0.00000795 +[16:43:11] Epoch: 1 Batch: 17119/20099 (85.17%) Loss: 2.128372 LR: 0.00000795 +[16:43:13] Epoch: 1 Batch: 17120/20099 (85.18%) Loss: 1.848773 LR: 0.00000795 +[16:43:15] Epoch: 1 Batch: 17121/20099 (85.18%) Loss: 1.808911 LR: 0.00000795 +[16:43:17] Epoch: 1 Batch: 17122/20099 (85.19%) Loss: 1.705184 LR: 0.00000795 +[16:43:19] Epoch: 1 Batch: 17123/20099 (85.19%) Loss: 2.145293 LR: 0.00000794 +[16:43:21] Epoch: 1 Batch: 17124/20099 (85.20%) Loss: 2.040301 LR: 0.00000794 +[16:43:23] Epoch: 1 Batch: 17125/20099 (85.20%) Loss: 2.028472 LR: 0.00000794 +[16:43:24] Epoch: 1 Batch: 17126/20099 (85.21%) Loss: 2.176286 LR: 0.00000794 +[16:43:26] Epoch: 1 Batch: 17127/20099 (85.21%) Loss: 2.128289 LR: 0.00000794 +[16:43:28] Epoch: 1 Batch: 17128/20099 (85.22%) Loss: 1.957259 LR: 0.00000794 +[16:43:30] Epoch: 1 Batch: 17129/20099 (85.22%) Loss: 1.907363 LR: 0.00000794 +[16:43:32] Epoch: 1 Batch: 17130/20099 (85.23%) Loss: 2.096828 LR: 0.00000793 +[16:43:34] Epoch: 1 Batch: 17131/20099 (85.23%) Loss: 2.181234 LR: 0.00000793 +[16:43:35] Epoch: 1 Batch: 17132/20099 (85.24%) Loss: 2.112146 LR: 0.00000793 +[16:43:37] Epoch: 1 Batch: 17133/20099 (85.24%) Loss: 2.083989 LR: 0.00000793 +[16:43:39] Epoch: 1 Batch: 17134/20099 (85.25%) Loss: 2.124143 LR: 0.00000793 +[16:43:41] Epoch: 1 Batch: 17135/20099 (85.25%) Loss: 2.334980 LR: 0.00000793 +[16:43:43] Epoch: 1 Batch: 17136/20099 (85.26%) Loss: 2.074976 LR: 0.00000793 +[16:43:45] Epoch: 1 Batch: 17137/20099 (85.26%) Loss: 1.917490 LR: 0.00000792 +[16:43:47] Epoch: 1 Batch: 17138/20099 (85.27%) Loss: 1.915757 LR: 0.00000792 +[16:43:48] Epoch: 1 Batch: 17139/20099 (85.27%) Loss: 2.025529 LR: 0.00000792 +[16:43:50] Epoch: 1 Batch: 17140/20099 (85.28%) Loss: 2.124703 LR: 0.00000792 +[16:43:52] Epoch: 1 Batch: 17141/20099 (85.28%) 
Loss: 2.383836 LR: 0.00000792 +[16:43:54] Epoch: 1 Batch: 17142/20099 (85.29%) Loss: 2.215561 LR: 0.00000792 +[16:43:56] Epoch: 1 Batch: 17143/20099 (85.29%) Loss: 1.989034 LR: 0.00000792 +[16:43:58] Epoch: 1 Batch: 17144/20099 (85.30%) Loss: 2.036236 LR: 0.00000791 +[16:44:00] Epoch: 1 Batch: 17145/20099 (85.30%) Loss: 1.826297 LR: 0.00000791 +[16:44:01] Epoch: 1 Batch: 17146/20099 (85.31%) Loss: 1.941903 LR: 0.00000791 +[16:44:03] Epoch: 1 Batch: 17147/20099 (85.31%) Loss: 2.217383 LR: 0.00000791 +[16:44:05] Epoch: 1 Batch: 17148/20099 (85.32%) Loss: 2.159658 LR: 0.00000791 +[16:44:07] Epoch: 1 Batch: 17149/20099 (85.32%) Loss: 2.406987 LR: 0.00000791 +[16:44:09] Epoch: 1 Batch: 17150/20099 (85.33%) Loss: 1.932494 LR: 0.00000791 +[16:44:11] Epoch: 1 Batch: 17151/20099 (85.33%) Loss: 1.918618 LR: 0.00000790 +[16:44:13] Epoch: 1 Batch: 17152/20099 (85.34%) Loss: 2.149009 LR: 0.00000790 +[16:44:15] Epoch: 1 Batch: 17153/20099 (85.34%) Loss: 2.128085 LR: 0.00000790 +[16:44:16] Epoch: 1 Batch: 17154/20099 (85.35%) Loss: 1.992927 LR: 0.00000790 +[16:44:18] Epoch: 1 Batch: 17155/20099 (85.35%) Loss: 2.302804 LR: 0.00000790 +[16:44:20] Epoch: 1 Batch: 17156/20099 (85.36%) Loss: 2.302409 LR: 0.00000790 +[16:44:22] Epoch: 1 Batch: 17157/20099 (85.36%) Loss: 2.458788 LR: 0.00000790 +[16:44:24] Epoch: 1 Batch: 17158/20099 (85.37%) Loss: 2.335850 LR: 0.00000790 +[16:44:26] Epoch: 1 Batch: 17159/20099 (85.37%) Loss: 1.974927 LR: 0.00000790 +[16:44:28] Epoch: 1 Batch: 17160/20099 (85.38%) Loss: 1.966613 LR: 0.00000790 +[16:44:29] Epoch: 1 Batch: 17161/20099 (85.38%) Loss: 1.939694 LR: 0.00000790 +[16:44:31] Epoch: 1 Batch: 17162/20099 (85.39%) Loss: 2.148151 LR: 0.00000790 +[16:44:33] Epoch: 1 Batch: 17163/20099 (85.39%) Loss: 1.939029 LR: 0.00000790 +[16:44:35] Epoch: 1 Batch: 17164/20099 (85.40%) Loss: 2.324956 LR: 0.00000790 +[16:44:37] Epoch: 1 Batch: 17165/20099 (85.40%) Loss: 2.196288 LR: 0.00000789 +[16:44:39] Epoch: 1 Batch: 17166/20099 (85.41%) Loss: 2.177834 LR: 
0.00000789 +[16:44:40] Epoch: 1 Batch: 17167/20099 (85.41%) Loss: 2.068719 LR: 0.00000789 +[16:44:42] Epoch: 1 Batch: 17168/20099 (85.42%) Loss: 2.042240 LR: 0.00000789 +[16:44:44] Epoch: 1 Batch: 17169/20099 (85.42%) Loss: 2.424984 LR: 0.00000789 +[16:44:46] Epoch: 1 Batch: 17170/20099 (85.43%) Loss: 2.206494 LR: 0.00000789 +[16:44:48] Epoch: 1 Batch: 17171/20099 (85.43%) Loss: 2.312869 LR: 0.00000789 +[16:44:50] Epoch: 1 Batch: 17172/20099 (85.44%) Loss: 2.008868 LR: 0.00000788 +[16:44:52] Epoch: 1 Batch: 17173/20099 (85.44%) Loss: 2.134622 LR: 0.00000788 +[16:44:53] Epoch: 1 Batch: 17174/20099 (85.45%) Loss: 2.161809 LR: 0.00000788 +[16:44:55] Epoch: 1 Batch: 17175/20099 (85.45%) Loss: 2.010904 LR: 0.00000788 +[16:44:57] Epoch: 1 Batch: 17176/20099 (85.46%) Loss: 2.018464 LR: 0.00000788 +[16:44:59] Epoch: 1 Batch: 17177/20099 (85.46%) Loss: 2.034219 LR: 0.00000788 +[16:45:01] Epoch: 1 Batch: 17178/20099 (85.47%) Loss: 2.178152 LR: 0.00000788 +[16:45:03] Epoch: 1 Batch: 17179/20099 (85.47%) Loss: 2.159491 LR: 0.00000787 +[16:45:05] Epoch: 1 Batch: 17180/20099 (85.48%) Loss: 1.968548 LR: 0.00000787 +[16:45:06] Epoch: 1 Batch: 17181/20099 (85.48%) Loss: 1.871145 LR: 0.00000787 +[16:45:08] Epoch: 1 Batch: 17182/20099 (85.49%) Loss: 2.105959 LR: 0.00000787 +[16:45:10] Epoch: 1 Batch: 17183/20099 (85.49%) Loss: 2.116418 LR: 0.00000787 +[16:45:12] Epoch: 1 Batch: 17184/20099 (85.50%) Loss: 1.689605 LR: 0.00000787 +[16:45:14] Epoch: 1 Batch: 17185/20099 (85.50%) Loss: 2.093787 LR: 0.00000787 +[16:45:16] Epoch: 1 Batch: 17186/20099 (85.51%) Loss: 2.071362 LR: 0.00000786 +[16:45:17] Epoch: 1 Batch: 17187/20099 (85.51%) Loss: 2.191816 LR: 0.00000786 +[16:45:19] Epoch: 1 Batch: 17188/20099 (85.52%) Loss: 2.070408 LR: 0.00000786 +[16:45:21] Epoch: 1 Batch: 17189/20099 (85.52%) Loss: 1.933650 LR: 0.00000786 +[16:45:23] Epoch: 1 Batch: 17190/20099 (85.53%) Loss: 1.972380 LR: 0.00000786 +[16:45:25] Epoch: 1 Batch: 17191/20099 (85.53%) Loss: 1.849607 LR: 0.00000786 +[16:45:27] 
Epoch: 1 Batch: 17192/20099 (85.54%) Loss: 2.175886 LR: 0.00000786 +[16:45:28] Epoch: 1 Batch: 17193/20099 (85.54%) Loss: 2.155641 LR: 0.00000785 +[16:45:30] Epoch: 1 Batch: 17194/20099 (85.55%) Loss: 2.044591 LR: 0.00000785 +[16:45:32] Epoch: 1 Batch: 17195/20099 (85.55%) Loss: 2.204817 LR: 0.00000785 +[16:45:34] Epoch: 1 Batch: 17196/20099 (85.56%) Loss: 2.120708 LR: 0.00000785 +[16:45:36] Epoch: 1 Batch: 17197/20099 (85.56%) Loss: 1.942590 LR: 0.00000785 +[16:45:38] Epoch: 1 Batch: 17198/20099 (85.57%) Loss: 2.170893 LR: 0.00000785 +[16:45:40] Epoch: 1 Batch: 17199/20099 (85.57%) Loss: 2.493351 LR: 0.00000785 +[16:45:45] >> Cleaned up old temp checkpoint: epoch1_step15200 +[16:45:45] >> Temp checkpoint saved: epoch1_step17200, size: 0.1693 GB +[16:45:45] Epoch: 1 Batch: 17200/20099 (85.58%) Loss: 2.036138 LR: 0.00000784 +[16:45:47] Epoch: 1 Batch: 17201/20099 (85.58%) Loss: 2.152213 LR: 0.00000784 +[16:45:49] Epoch: 1 Batch: 17202/20099 (85.59%) Loss: 1.968125 LR: 0.00000784 +[16:45:50] Epoch: 1 Batch: 17203/20099 (85.59%) Loss: 2.203853 LR: 0.00000784 +[16:45:52] Epoch: 1 Batch: 17204/20099 (85.60%) Loss: 2.350096 LR: 0.00000784 +[16:45:54] Epoch: 1 Batch: 17205/20099 (85.60%) Loss: 2.035088 LR: 0.00000784 +[16:45:56] Epoch: 1 Batch: 17206/20099 (85.61%) Loss: 2.481073 LR: 0.00000784 +[16:45:58] Epoch: 1 Batch: 17207/20099 (85.61%) Loss: 2.153065 LR: 0.00000784 +[16:46:00] Epoch: 1 Batch: 17208/20099 (85.62%) Loss: 2.210736 LR: 0.00000784 +[16:46:02] Epoch: 1 Batch: 17209/20099 (85.62%) Loss: 2.038571 LR: 0.00000784 +[16:46:03] Epoch: 1 Batch: 17210/20099 (85.63%) Loss: 2.021170 LR: 0.00000784 +[16:46:05] Epoch: 1 Batch: 17211/20099 (85.63%) Loss: 1.926524 LR: 0.00000784 +[16:46:07] Epoch: 1 Batch: 17212/20099 (85.64%) Loss: 2.079737 LR: 0.00000784 +[16:46:09] Epoch: 1 Batch: 17213/20099 (85.64%) Loss: 1.967750 LR: 0.00000784 +[16:46:11] Epoch: 1 Batch: 17214/20099 (85.65%) Loss: 2.312788 LR: 0.00000783 +[16:46:13] Epoch: 1 Batch: 17215/20099 (85.65%) Loss: 
2.097534 LR: 0.00000783 +[16:46:15] Epoch: 1 Batch: 17216/20099 (85.66%) Loss: 2.160419 LR: 0.00000783 +[16:46:16] Epoch: 1 Batch: 17217/20099 (85.66%) Loss: 1.959460 LR: 0.00000783 +[16:46:18] Epoch: 1 Batch: 17218/20099 (85.67%) Loss: 2.072925 LR: 0.00000783 +[16:46:20] Epoch: 1 Batch: 17219/20099 (85.67%) Loss: 2.004448 LR: 0.00000783 +[16:46:22] Epoch: 1 Batch: 17220/20099 (85.68%) Loss: 2.073457 LR: 0.00000783 +[16:46:24] Epoch: 1 Batch: 17221/20099 (85.68%) Loss: 2.484454 LR: 0.00000782 +[16:46:26] Epoch: 1 Batch: 17222/20099 (85.69%) Loss: 2.040761 LR: 0.00000782 +[16:46:28] Epoch: 1 Batch: 17223/20099 (85.69%) Loss: 2.189120 LR: 0.00000782 +[16:46:30] Epoch: 1 Batch: 17224/20099 (85.70%) Loss: 2.018032 LR: 0.00000782 +[16:46:31] Epoch: 1 Batch: 17225/20099 (85.70%) Loss: 2.178329 LR: 0.00000782 +[16:46:33] Epoch: 1 Batch: 17226/20099 (85.71%) Loss: 2.017492 LR: 0.00000782 +[16:46:35] Epoch: 1 Batch: 17227/20099 (85.71%) Loss: 2.018399 LR: 0.00000782 +[16:46:37] Epoch: 1 Batch: 17228/20099 (85.72%) Loss: 1.923882 LR: 0.00000781 +[16:46:39] Epoch: 1 Batch: 17229/20099 (85.72%) Loss: 2.385854 LR: 0.00000781 +[16:46:41] Epoch: 1 Batch: 17230/20099 (85.73%) Loss: 2.192957 LR: 0.00000781 +[16:46:43] Epoch: 1 Batch: 17231/20099 (85.73%) Loss: 2.112211 LR: 0.00000781 +[16:46:44] Epoch: 1 Batch: 17232/20099 (85.74%) Loss: 2.062364 LR: 0.00000781 +[16:46:46] Epoch: 1 Batch: 17233/20099 (85.74%) Loss: 1.871177 LR: 0.00000781 +[16:46:48] Epoch: 1 Batch: 17234/20099 (85.75%) Loss: 2.107831 LR: 0.00000781 +[16:46:50] Epoch: 1 Batch: 17235/20099 (85.75%) Loss: 2.002338 LR: 0.00000780 +[16:46:52] Epoch: 1 Batch: 17236/20099 (85.76%) Loss: 1.909302 LR: 0.00000780 +[16:46:54] Epoch: 1 Batch: 17237/20099 (85.76%) Loss: 1.697102 LR: 0.00000780 +[16:46:56] Epoch: 1 Batch: 17238/20099 (85.77%) Loss: 2.139163 LR: 0.00000780 +[16:46:57] Epoch: 1 Batch: 17239/20099 (85.77%) Loss: 1.934740 LR: 0.00000780 +[16:46:59] Epoch: 1 Batch: 17240/20099 (85.78%) Loss: 1.790341 LR: 0.00000780 
+[16:47:01] Epoch: 1 Batch: 17241/20099 (85.78%) Loss: 2.079438 LR: 0.00000780 +[16:47:03] Epoch: 1 Batch: 17242/20099 (85.79%) Loss: 2.167928 LR: 0.00000779 +[16:47:05] Epoch: 1 Batch: 17243/20099 (85.79%) Loss: 2.125711 LR: 0.00000779 +[16:47:07] Epoch: 1 Batch: 17244/20099 (85.80%) Loss: 2.144293 LR: 0.00000779 +[16:47:08] Epoch: 1 Batch: 17245/20099 (85.80%) Loss: 1.897318 LR: 0.00000779 +[16:47:10] Epoch: 1 Batch: 17246/20099 (85.81%) Loss: 2.089487 LR: 0.00000779 +[16:47:12] Epoch: 1 Batch: 17247/20099 (85.81%) Loss: 2.109206 LR: 0.00000779 +[16:47:14] Epoch: 1 Batch: 17248/20099 (85.82%) Loss: 2.335516 LR: 0.00000779 +[16:47:16] Epoch: 1 Batch: 17249/20099 (85.82%) Loss: 1.900991 LR: 0.00000778 +[16:47:18] Epoch: 1 Batch: 17250/20099 (85.83%) Loss: 2.200330 LR: 0.00000778 +[16:47:20] Epoch: 1 Batch: 17251/20099 (85.83%) Loss: 2.193169 LR: 0.00000778 +[16:47:21] Epoch: 1 Batch: 17252/20099 (85.84%) Loss: 2.069787 LR: 0.00000778 +[16:47:23] Epoch: 1 Batch: 17253/20099 (85.84%) Loss: 2.380711 LR: 0.00000778 +[16:47:25] Epoch: 1 Batch: 17254/20099 (85.85%) Loss: 2.274825 LR: 0.00000778 +[16:47:27] Epoch: 1 Batch: 17255/20099 (85.85%) Loss: 1.904615 LR: 0.00000778 +[16:47:29] Epoch: 1 Batch: 17256/20099 (85.86%) Loss: 2.077132 LR: 0.00000778 +[16:47:31] Epoch: 1 Batch: 17257/20099 (85.86%) Loss: 2.166797 LR: 0.00000778 +[16:47:32] Epoch: 1 Batch: 17258/20099 (85.86%) Loss: 1.973248 LR: 0.00000778 +[16:47:34] Epoch: 1 Batch: 17259/20099 (85.87%) Loss: 2.247633 LR: 0.00000778 +[16:47:36] Epoch: 1 Batch: 17260/20099 (85.87%) Loss: 2.201734 LR: 0.00000778 +[16:47:38] Epoch: 1 Batch: 17261/20099 (85.88%) Loss: 1.935487 LR: 0.00000778 +[16:47:40] Epoch: 1 Batch: 17262/20099 (85.88%) Loss: 2.133951 LR: 0.00000778 +[16:47:42] Epoch: 1 Batch: 17263/20099 (85.89%) Loss: 2.126147 LR: 0.00000777 +[16:47:44] Epoch: 1 Batch: 17264/20099 (85.89%) Loss: 1.933670 LR: 0.00000777 +[16:47:46] Epoch: 1 Batch: 17265/20099 (85.90%) Loss: 2.353459 LR: 0.00000777 +[16:47:47] Epoch: 1 
Batch: 17266/20099 (85.90%) Loss: 1.878260 LR: 0.00000777 +[16:47:49] Epoch: 1 Batch: 17267/20099 (85.91%) Loss: 2.208635 LR: 0.00000777 +[16:47:51] Epoch: 1 Batch: 17268/20099 (85.91%) Loss: 1.995406 LR: 0.00000777 +[16:47:53] Epoch: 1 Batch: 17269/20099 (85.92%) Loss: 2.036774 LR: 0.00000777 +[16:47:55] Epoch: 1 Batch: 17270/20099 (85.92%) Loss: 1.879942 LR: 0.00000776 +[16:47:57] Epoch: 1 Batch: 17271/20099 (85.93%) Loss: 1.984814 LR: 0.00000776 +[16:47:59] Epoch: 1 Batch: 17272/20099 (85.93%) Loss: 2.154281 LR: 0.00000776 +[16:48:00] Epoch: 1 Batch: 17273/20099 (85.94%) Loss: 1.603649 LR: 0.00000776 +[16:48:02] Epoch: 1 Batch: 17274/20099 (85.94%) Loss: 2.429045 LR: 0.00000776 +[16:48:04] Epoch: 1 Batch: 17275/20099 (85.95%) Loss: 2.091626 LR: 0.00000776 +[16:48:06] Epoch: 1 Batch: 17276/20099 (85.95%) Loss: 2.333521 LR: 0.00000776 +[16:48:08] Epoch: 1 Batch: 17277/20099 (85.96%) Loss: 2.206443 LR: 0.00000775 +[16:48:10] Epoch: 1 Batch: 17278/20099 (85.96%) Loss: 1.879445 LR: 0.00000775 +[16:48:12] Epoch: 1 Batch: 17279/20099 (85.97%) Loss: 2.062108 LR: 0.00000775 +[16:48:13] Epoch: 1 Batch: 17280/20099 (85.97%) Loss: 1.934065 LR: 0.00000775 +[16:48:15] Epoch: 1 Batch: 17281/20099 (85.98%) Loss: 2.174314 LR: 0.00000775 +[16:48:17] Epoch: 1 Batch: 17282/20099 (85.98%) Loss: 2.366800 LR: 0.00000775 +[16:48:19] Epoch: 1 Batch: 17283/20099 (85.99%) Loss: 1.886599 LR: 0.00000775 +[16:48:21] Epoch: 1 Batch: 17284/20099 (85.99%) Loss: 2.048080 LR: 0.00000774 +[16:48:23] Epoch: 1 Batch: 17285/20099 (86.00%) Loss: 2.125031 LR: 0.00000774 +[16:48:25] Epoch: 1 Batch: 17286/20099 (86.00%) Loss: 2.138046 LR: 0.00000774 +[16:48:26] Epoch: 1 Batch: 17287/20099 (86.01%) Loss: 2.170060 LR: 0.00000774 +[16:48:28] Epoch: 1 Batch: 17288/20099 (86.01%) Loss: 2.533138 LR: 0.00000774 +[16:48:30] Epoch: 1 Batch: 17289/20099 (86.02%) Loss: 2.094114 LR: 0.00000774 +[16:48:32] Epoch: 1 Batch: 17290/20099 (86.02%) Loss: 2.213296 LR: 0.00000774 +[16:48:34] Epoch: 1 Batch: 17291/20099 
(86.03%) Loss: 1.922375 LR: 0.00000773 +[16:48:36] Epoch: 1 Batch: 17292/20099 (86.03%) Loss: 2.205354 LR: 0.00000773 +[16:48:38] Epoch: 1 Batch: 17293/20099 (86.04%) Loss: 2.112832 LR: 0.00000773 +[16:48:39] Epoch: 1 Batch: 17294/20099 (86.04%) Loss: 2.349554 LR: 0.00000773 +[16:48:41] Epoch: 1 Batch: 17295/20099 (86.05%) Loss: 2.135830 LR: 0.00000773 +[16:48:43] Epoch: 1 Batch: 17296/20099 (86.05%) Loss: 2.214923 LR: 0.00000773 +[16:48:45] Epoch: 1 Batch: 17297/20099 (86.06%) Loss: 2.028799 LR: 0.00000773 +[16:48:47] Epoch: 1 Batch: 17298/20099 (86.06%) Loss: 1.969951 LR: 0.00000772 +[16:48:49] Epoch: 1 Batch: 17299/20099 (86.07%) Loss: 2.016245 LR: 0.00000772 +[16:48:51] Epoch: 1 Batch: 17300/20099 (86.07%) Loss: 2.060121 LR: 0.00000772 +[16:48:52] Epoch: 1 Batch: 17301/20099 (86.08%) Loss: 2.200127 LR: 0.00000772 +[16:48:54] Epoch: 1 Batch: 17302/20099 (86.08%) Loss: 1.905559 LR: 0.00000772 +[16:48:56] Epoch: 1 Batch: 17303/20099 (86.09%) Loss: 2.082115 LR: 0.00000772 +[16:48:58] Epoch: 1 Batch: 17304/20099 (86.09%) Loss: 2.003947 LR: 0.00000772 +[16:49:00] Epoch: 1 Batch: 17305/20099 (86.10%) Loss: 2.056095 LR: 0.00000772 +[16:49:02] Epoch: 1 Batch: 17306/20099 (86.10%) Loss: 1.949459 LR: 0.00000772 +[16:49:03] Epoch: 1 Batch: 17307/20099 (86.11%) Loss: 2.052391 LR: 0.00000772 +[16:49:05] Epoch: 1 Batch: 17308/20099 (86.11%) Loss: 2.343960 LR: 0.00000772 +[16:49:07] Epoch: 1 Batch: 17309/20099 (86.12%) Loss: 2.408039 LR: 0.00000772 +[16:49:09] Epoch: 1 Batch: 17310/20099 (86.12%) Loss: 2.010841 LR: 0.00000772 +[16:49:11] Epoch: 1 Batch: 17311/20099 (86.13%) Loss: 1.919291 LR: 0.00000772 +[16:49:13] Epoch: 1 Batch: 17312/20099 (86.13%) Loss: 2.232831 LR: 0.00000771 +[16:49:14] Epoch: 1 Batch: 17313/20099 (86.14%) Loss: 1.839179 LR: 0.00000771 +[16:49:16] Epoch: 1 Batch: 17314/20099 (86.14%) Loss: 2.376810 LR: 0.00000771 +[16:49:18] Epoch: 1 Batch: 17315/20099 (86.15%) Loss: 1.999658 LR: 0.00000771 +[16:49:20] Epoch: 1 Batch: 17316/20099 (86.15%) Loss: 1.963540 
LR: 0.00000771 +[16:49:22] Epoch: 1 Batch: 17317/20099 (86.16%) Loss: 1.899183 LR: 0.00000771 +[16:49:24] Epoch: 1 Batch: 17318/20099 (86.16%) Loss: 1.985349 LR: 0.00000771 +[16:49:26] Epoch: 1 Batch: 17319/20099 (86.17%) Loss: 2.199787 LR: 0.00000770 +[16:49:27] Epoch: 1 Batch: 17320/20099 (86.17%) Loss: 2.281458 LR: 0.00000770 +[16:49:29] Epoch: 1 Batch: 17321/20099 (86.18%) Loss: 1.995444 LR: 0.00000770 +[16:49:31] Epoch: 1 Batch: 17322/20099 (86.18%) Loss: 1.796515 LR: 0.00000770 +[16:49:33] Epoch: 1 Batch: 17323/20099 (86.19%) Loss: 2.193122 LR: 0.00000770 +[16:49:35] Epoch: 1 Batch: 17324/20099 (86.19%) Loss: 1.890665 LR: 0.00000770 +[16:49:37] Epoch: 1 Batch: 17325/20099 (86.20%) Loss: 1.878411 LR: 0.00000770 +[16:49:39] Epoch: 1 Batch: 17326/20099 (86.20%) Loss: 2.080388 LR: 0.00000769 +[16:49:40] Epoch: 1 Batch: 17327/20099 (86.21%) Loss: 1.803379 LR: 0.00000769 +[16:49:42] Epoch: 1 Batch: 17328/20099 (86.21%) Loss: 2.037890 LR: 0.00000769 +[16:49:44] Epoch: 1 Batch: 17329/20099 (86.22%) Loss: 2.085538 LR: 0.00000769 +[16:49:46] Epoch: 1 Batch: 17330/20099 (86.22%) Loss: 2.430757 LR: 0.00000769 +[16:49:48] Epoch: 1 Batch: 17331/20099 (86.23%) Loss: 1.926998 LR: 0.00000769 +[16:49:50] Epoch: 1 Batch: 17332/20099 (86.23%) Loss: 2.235817 LR: 0.00000769 +[16:49:52] Epoch: 1 Batch: 17333/20099 (86.24%) Loss: 1.915742 LR: 0.00000768 +[16:49:53] Epoch: 1 Batch: 17334/20099 (86.24%) Loss: 2.317299 LR: 0.00000768 +[16:49:55] Epoch: 1 Batch: 17335/20099 (86.25%) Loss: 2.190443 LR: 0.00000768 +[16:49:57] Epoch: 1 Batch: 17336/20099 (86.25%) Loss: 2.035498 LR: 0.00000768 +[16:49:59] Epoch: 1 Batch: 17337/20099 (86.26%) Loss: 2.084763 LR: 0.00000768 +[16:50:01] Epoch: 1 Batch: 17338/20099 (86.26%) Loss: 2.080914 LR: 0.00000768 +[16:50:03] Epoch: 1 Batch: 17339/20099 (86.27%) Loss: 2.279551 LR: 0.00000768 +[16:50:05] Epoch: 1 Batch: 17340/20099 (86.27%) Loss: 2.102048 LR: 0.00000767 +[16:50:06] Epoch: 1 Batch: 17341/20099 (86.28%) Loss: 2.302230 LR: 0.00000767 
+[16:50:08] Epoch: 1 Batch: 17342/20099 (86.28%) Loss: 2.177974 LR: 0.00000767 +[16:50:10] Epoch: 1 Batch: 17343/20099 (86.29%) Loss: 2.056188 LR: 0.00000767 +[16:50:12] Epoch: 1 Batch: 17344/20099 (86.29%) Loss: 2.127283 LR: 0.00000767 +[16:50:14] Epoch: 1 Batch: 17345/20099 (86.30%) Loss: 2.146709 LR: 0.00000767 +[16:50:16] Epoch: 1 Batch: 17346/20099 (86.30%) Loss: 2.223658 LR: 0.00000767 +[16:50:18] Epoch: 1 Batch: 17347/20099 (86.31%) Loss: 2.191038 LR: 0.00000767 +[16:50:19] Epoch: 1 Batch: 17348/20099 (86.31%) Loss: 1.788854 LR: 0.00000767 +[16:50:21] Epoch: 1 Batch: 17349/20099 (86.32%) Loss: 2.120617 LR: 0.00000767 +[16:50:23] Epoch: 1 Batch: 17350/20099 (86.32%) Loss: 2.089490 LR: 0.00000767 +[16:50:25] Epoch: 1 Batch: 17351/20099 (86.33%) Loss: 1.964117 LR: 0.00000767 +[16:50:27] Epoch: 1 Batch: 17352/20099 (86.33%) Loss: 1.746458 LR: 0.00000767 +[16:50:29] Epoch: 1 Batch: 17353/20099 (86.34%) Loss: 1.463508 LR: 0.00000767 +[16:50:31] Epoch: 1 Batch: 17354/20099 (86.34%) Loss: 2.105454 LR: 0.00000766 +[16:50:32] Epoch: 1 Batch: 17355/20099 (86.35%) Loss: 2.284277 LR: 0.00000766 +[16:50:34] Epoch: 1 Batch: 17356/20099 (86.35%) Loss: 2.545760 LR: 0.00000766 +[16:50:36] Epoch: 1 Batch: 17357/20099 (86.36%) Loss: 2.373370 LR: 0.00000766 +[16:50:38] Epoch: 1 Batch: 17358/20099 (86.36%) Loss: 1.881050 LR: 0.00000766 +[16:50:40] Epoch: 1 Batch: 17359/20099 (86.37%) Loss: 2.212394 LR: 0.00000766 +[16:50:42] Epoch: 1 Batch: 17360/20099 (86.37%) Loss: 2.260129 LR: 0.00000766 +[16:50:44] Epoch: 1 Batch: 17361/20099 (86.38%) Loss: 2.002730 LR: 0.00000765 +[16:50:45] Epoch: 1 Batch: 17362/20099 (86.38%) Loss: 1.913905 LR: 0.00000765 +[16:50:47] Epoch: 1 Batch: 17363/20099 (86.39%) Loss: 2.123970 LR: 0.00000765 +[16:50:49] Epoch: 1 Batch: 17364/20099 (86.39%) Loss: 2.014408 LR: 0.00000765 +[16:50:51] Epoch: 1 Batch: 17365/20099 (86.40%) Loss: 2.123674 LR: 0.00000765 +[16:50:53] Epoch: 1 Batch: 17366/20099 (86.40%) Loss: 1.622434 LR: 0.00000765 +[16:50:55] Epoch: 1 
Batch: 17367/20099 (86.41%) Loss: 1.943944 LR: 0.00000765 +[16:50:56] Epoch: 1 Batch: 17368/20099 (86.41%) Loss: 2.383486 LR: 0.00000764 +[16:50:58] Epoch: 1 Batch: 17369/20099 (86.42%) Loss: 2.008016 LR: 0.00000764 +[16:51:00] Epoch: 1 Batch: 17370/20099 (86.42%) Loss: 2.052284 LR: 0.00000764 +[16:51:02] Epoch: 1 Batch: 17371/20099 (86.43%) Loss: 1.954945 LR: 0.00000764 +[16:51:04] Epoch: 1 Batch: 17372/20099 (86.43%) Loss: 2.108921 LR: 0.00000764 +[16:51:06] Epoch: 1 Batch: 17373/20099 (86.44%) Loss: 2.039291 LR: 0.00000764 +[16:51:08] Epoch: 1 Batch: 17374/20099 (86.44%) Loss: 2.054207 LR: 0.00000764 +[16:51:09] Epoch: 1 Batch: 17375/20099 (86.45%) Loss: 2.207550 LR: 0.00000763 +[16:51:11] Epoch: 1 Batch: 17376/20099 (86.45%) Loss: 2.382384 LR: 0.00000763 +[16:51:13] Epoch: 1 Batch: 17377/20099 (86.46%) Loss: 1.936982 LR: 0.00000763 +[16:51:15] Epoch: 1 Batch: 17378/20099 (86.46%) Loss: 2.106252 LR: 0.00000763 +[16:51:17] Epoch: 1 Batch: 17379/20099 (86.47%) Loss: 1.996331 LR: 0.00000763 +[16:51:19] Epoch: 1 Batch: 17380/20099 (86.47%) Loss: 1.953431 LR: 0.00000763 +[16:51:20] Epoch: 1 Batch: 17381/20099 (86.48%) Loss: 2.002789 LR: 0.00000763 +[16:51:22] Epoch: 1 Batch: 17382/20099 (86.48%) Loss: 1.942730 LR: 0.00000763 +[16:51:24] Epoch: 1 Batch: 17383/20099 (86.49%) Loss: 1.905152 LR: 0.00000763 +[16:51:26] Epoch: 1 Batch: 17384/20099 (86.49%) Loss: 2.195080 LR: 0.00000763 +[16:51:28] Epoch: 1 Batch: 17385/20099 (86.50%) Loss: 1.892500 LR: 0.00000763 +[16:51:30] Epoch: 1 Batch: 17386/20099 (86.50%) Loss: 2.035051 LR: 0.00000763 +[16:51:32] Epoch: 1 Batch: 17387/20099 (86.51%) Loss: 2.385957 LR: 0.00000763 +[16:51:33] Epoch: 1 Batch: 17388/20099 (86.51%) Loss: 2.052205 LR: 0.00000763 +[16:51:35] Epoch: 1 Batch: 17389/20099 (86.52%) Loss: 1.883700 LR: 0.00000762 +[16:51:37] Epoch: 1 Batch: 17390/20099 (86.52%) Loss: 1.715557 LR: 0.00000762 +[16:51:39] Epoch: 1 Batch: 17391/20099 (86.53%) Loss: 2.144867 LR: 0.00000762 +[16:51:41] Epoch: 1 Batch: 17392/20099 
(86.53%) Loss: 2.075301 LR: 0.00000762 +[16:51:43] Epoch: 1 Batch: 17393/20099 (86.54%) Loss: 2.113362 LR: 0.00000762 +[16:51:45] Epoch: 1 Batch: 17394/20099 (86.54%) Loss: 1.967252 LR: 0.00000762 +[16:51:46] Epoch: 1 Batch: 17395/20099 (86.55%) Loss: 1.899744 LR: 0.00000762 +[16:51:48] Epoch: 1 Batch: 17396/20099 (86.55%) Loss: 2.161981 LR: 0.00000761 +[16:51:50] Epoch: 1 Batch: 17397/20099 (86.56%) Loss: 1.730449 LR: 0.00000761 +[16:51:52] Epoch: 1 Batch: 17398/20099 (86.56%) Loss: 2.250108 LR: 0.00000761 +[16:51:54] Epoch: 1 Batch: 17399/20099 (86.57%) Loss: 2.054953 LR: 0.00000761 +[16:51:59] >> Cleaned up old temp checkpoint: epoch1_step15400 +[16:51:59] >> Temp checkpoint saved: epoch1_step17400, size: 0.1693 GB +[16:51:59] Epoch: 1 Batch: 17400/20099 (86.57%) Loss: 1.919746 LR: 0.00000761 +[16:52:01] Epoch: 1 Batch: 17401/20099 (86.58%) Loss: 1.799139 LR: 0.00000761 +[16:52:03] Epoch: 1 Batch: 17402/20099 (86.58%) Loss: 2.000528 LR: 0.00000761 +[16:52:05] Epoch: 1 Batch: 17403/20099 (86.59%) Loss: 1.805122 LR: 0.00000760 +[16:52:07] Epoch: 1 Batch: 17404/20099 (86.59%) Loss: 2.256586 LR: 0.00000760 +[16:52:09] Epoch: 1 Batch: 17405/20099 (86.60%) Loss: 1.913914 LR: 0.00000760 +[16:52:11] Epoch: 1 Batch: 17406/20099 (86.60%) Loss: 1.680895 LR: 0.00000760 +[16:52:12] Epoch: 1 Batch: 17407/20099 (86.61%) Loss: 2.058811 LR: 0.00000760 +[16:52:14] Epoch: 1 Batch: 17408/20099 (86.61%) Loss: 2.050990 LR: 0.00000760 +[16:52:16] Epoch: 1 Batch: 17409/20099 (86.62%) Loss: 2.085586 LR: 0.00000760 +[16:52:18] Epoch: 1 Batch: 17410/20099 (86.62%) Loss: 1.804055 LR: 0.00000759 +[16:52:20] Epoch: 1 Batch: 17411/20099 (86.63%) Loss: 2.061215 LR: 0.00000759 +[16:52:22] Epoch: 1 Batch: 17412/20099 (86.63%) Loss: 1.941276 LR: 0.00000759 +[16:52:24] Epoch: 1 Batch: 17413/20099 (86.64%) Loss: 2.062979 LR: 0.00000759 +[16:52:25] Epoch: 1 Batch: 17414/20099 (86.64%) Loss: 1.997889 LR: 0.00000759 +[16:52:27] Epoch: 1 Batch: 17415/20099 (86.65%) Loss: 2.135464 LR: 0.00000759 
+[16:52:29] Epoch: 1 Batch: 17416/20099 (86.65%) Loss: 2.187501 LR: 0.00000759 +[16:52:31] Epoch: 1 Batch: 17417/20099 (86.66%) Loss: 2.248392 LR: 0.00000758 +[16:52:33] Epoch: 1 Batch: 17418/20099 (86.66%) Loss: 1.904775 LR: 0.00000758 +[16:52:35] Epoch: 1 Batch: 17419/20099 (86.67%) Loss: 1.831022 LR: 0.00000758 +[16:52:37] Epoch: 1 Batch: 17420/20099 (86.67%) Loss: 2.134475 LR: 0.00000758 +[16:52:38] Epoch: 1 Batch: 17421/20099 (86.68%) Loss: 2.049401 LR: 0.00000758 +[16:52:40] Epoch: 1 Batch: 17422/20099 (86.68%) Loss: 2.266951 LR: 0.00000758 +[16:52:42] Epoch: 1 Batch: 17423/20099 (86.69%) Loss: 2.083439 LR: 0.00000758 +[16:52:44] Epoch: 1 Batch: 17424/20099 (86.69%) Loss: 2.195826 LR: 0.00000758 +[16:52:46] Epoch: 1 Batch: 17425/20099 (86.70%) Loss: 2.195502 LR: 0.00000758 +[16:52:48] Epoch: 1 Batch: 17426/20099 (86.70%) Loss: 2.305930 LR: 0.00000758 +[16:52:50] Epoch: 1 Batch: 17427/20099 (86.71%) Loss: 2.065771 LR: 0.00000758 +[16:52:52] Epoch: 1 Batch: 17428/20099 (86.71%) Loss: 2.051359 LR: 0.00000758 +[16:52:53] Epoch: 1 Batch: 17429/20099 (86.72%) Loss: 2.040518 LR: 0.00000758 +[16:52:55] Epoch: 1 Batch: 17430/20099 (86.72%) Loss: 2.187335 LR: 0.00000758 +[16:52:57] Epoch: 1 Batch: 17431/20099 (86.73%) Loss: 1.919889 LR: 0.00000757 +[16:52:59] Epoch: 1 Batch: 17432/20099 (86.73%) Loss: 1.883625 LR: 0.00000757 +[16:53:01] Epoch: 1 Batch: 17433/20099 (86.74%) Loss: 2.270657 LR: 0.00000757 +[16:53:03] Epoch: 1 Batch: 17434/20099 (86.74%) Loss: 2.191535 LR: 0.00000757 +[16:53:05] Epoch: 1 Batch: 17435/20099 (86.75%) Loss: 2.157299 LR: 0.00000757 +[16:53:06] Epoch: 1 Batch: 17436/20099 (86.75%) Loss: 1.945669 LR: 0.00000757 +[16:53:08] Epoch: 1 Batch: 17437/20099 (86.76%) Loss: 2.231283 LR: 0.00000757 +[16:53:10] Epoch: 1 Batch: 17438/20099 (86.76%) Loss: 2.337742 LR: 0.00000756 +[16:53:12] Epoch: 1 Batch: 17439/20099 (86.77%) Loss: 2.132945 LR: 0.00000756 +[16:53:14] Epoch: 1 Batch: 17440/20099 (86.77%) Loss: 1.901555 LR: 0.00000756 +[16:53:16] Epoch: 1 
Batch: 17441/20099 (86.78%) Loss: 2.196252 LR: 0.00000756 +[16:53:17] Epoch: 1 Batch: 17442/20099 (86.78%) Loss: 2.357275 LR: 0.00000756 +[16:53:19] Epoch: 1 Batch: 17443/20099 (86.79%) Loss: 2.076062 LR: 0.00000756 +[16:53:21] Epoch: 1 Batch: 17444/20099 (86.79%) Loss: 2.368969 LR: 0.00000756 +[16:53:23] Epoch: 1 Batch: 17445/20099 (86.80%) Loss: 2.316773 LR: 0.00000755 +[16:53:25] Epoch: 1 Batch: 17446/20099 (86.80%) Loss: 2.412952 LR: 0.00000755 +[16:53:27] Epoch: 1 Batch: 17447/20099 (86.81%) Loss: 2.167215 LR: 0.00000755 +[16:53:28] Epoch: 1 Batch: 17448/20099 (86.81%) Loss: 2.030721 LR: 0.00000755 +[16:53:30] Epoch: 1 Batch: 17449/20099 (86.82%) Loss: 2.032823 LR: 0.00000755 +[16:53:32] Epoch: 1 Batch: 17450/20099 (86.82%) Loss: 1.819517 LR: 0.00000755 +[16:53:34] Epoch: 1 Batch: 17451/20099 (86.83%) Loss: 2.154927 LR: 0.00000755 +[16:53:36] Epoch: 1 Batch: 17452/20099 (86.83%) Loss: 2.115144 LR: 0.00000754 +[16:53:38] Epoch: 1 Batch: 17453/20099 (86.84%) Loss: 1.819947 LR: 0.00000754 +[16:53:40] Epoch: 1 Batch: 17454/20099 (86.84%) Loss: 2.176224 LR: 0.00000754 +[16:53:41] Epoch: 1 Batch: 17455/20099 (86.85%) Loss: 2.136883 LR: 0.00000754 +[16:53:43] Epoch: 1 Batch: 17456/20099 (86.85%) Loss: 1.724549 LR: 0.00000754 +[16:53:45] Epoch: 1 Batch: 17457/20099 (86.86%) Loss: 2.220631 LR: 0.00000754 +[16:53:47] Epoch: 1 Batch: 17458/20099 (86.86%) Loss: 1.849589 LR: 0.00000754 +[16:53:49] Epoch: 1 Batch: 17459/20099 (86.87%) Loss: 2.021147 LR: 0.00000754 +[16:53:51] Epoch: 1 Batch: 17460/20099 (86.87%) Loss: 2.253992 LR: 0.00000754 +[16:53:53] Epoch: 1 Batch: 17461/20099 (86.87%) Loss: 1.699164 LR: 0.00000754 +[16:53:54] Epoch: 1 Batch: 17462/20099 (86.88%) Loss: 2.146445 LR: 0.00000754 +[16:53:56] Epoch: 1 Batch: 17463/20099 (86.88%) Loss: 2.080113 LR: 0.00000754 +[16:53:58] Epoch: 1 Batch: 17464/20099 (86.89%) Loss: 1.963135 LR: 0.00000754 +[16:54:00] Epoch: 1 Batch: 17465/20099 (86.89%) Loss: 2.147226 LR: 0.00000754 +[16:54:02] Epoch: 1 Batch: 17466/20099 
(86.90%) Loss: 1.900192 LR: 0.00000753 +[16:54:04] Epoch: 1 Batch: 17467/20099 (86.90%) Loss: 2.210519 LR: 0.00000753 +[16:54:06] Epoch: 1 Batch: 17468/20099 (86.91%) Loss: 2.061323 LR: 0.00000753 +[16:54:07] Epoch: 1 Batch: 17469/20099 (86.91%) Loss: 1.962737 LR: 0.00000753 +[16:54:09] Epoch: 1 Batch: 17470/20099 (86.92%) Loss: 1.767326 LR: 0.00000753 +[16:54:11] Epoch: 1 Batch: 17471/20099 (86.92%) Loss: 1.996546 LR: 0.00000753 +[16:54:13] Epoch: 1 Batch: 17472/20099 (86.93%) Loss: 2.376313 LR: 0.00000753 +[16:54:15] Epoch: 1 Batch: 17473/20099 (86.93%) Loss: 2.111506 LR: 0.00000752 +[16:54:17] Epoch: 1 Batch: 17474/20099 (86.94%) Loss: 2.174128 LR: 0.00000752 +[16:54:19] Epoch: 1 Batch: 17475/20099 (86.94%) Loss: 2.148231 LR: 0.00000752 +[16:54:20] Epoch: 1 Batch: 17476/20099 (86.95%) Loss: 1.927590 LR: 0.00000752 +[16:54:22] Epoch: 1 Batch: 17477/20099 (86.95%) Loss: 2.248884 LR: 0.00000752 +[16:54:24] Epoch: 1 Batch: 17478/20099 (86.96%) Loss: 2.426048 LR: 0.00000752 +[16:54:26] Epoch: 1 Batch: 17479/20099 (86.96%) Loss: 1.946288 LR: 0.00000752 +[16:54:28] Epoch: 1 Batch: 17480/20099 (86.97%) Loss: 2.331980 LR: 0.00000751 +[16:54:30] Epoch: 1 Batch: 17481/20099 (86.97%) Loss: 2.165002 LR: 0.00000751 +[16:54:32] Epoch: 1 Batch: 17482/20099 (86.98%) Loss: 1.737559 LR: 0.00000751 +[16:54:33] Epoch: 1 Batch: 17483/20099 (86.98%) Loss: 2.025131 LR: 0.00000751 +[16:54:36] Epoch: 1 Batch: 17484/20099 (86.99%) Loss: 2.005286 LR: 0.00000751 +[16:54:37] Epoch: 1 Batch: 17485/20099 (86.99%) Loss: 1.973143 LR: 0.00000751 +[16:54:39] Epoch: 1 Batch: 17486/20099 (87.00%) Loss: 1.710006 LR: 0.00000751 +[16:54:41] Epoch: 1 Batch: 17487/20099 (87.00%) Loss: 2.244121 LR: 0.00000751 +[16:54:43] Epoch: 1 Batch: 17488/20099 (87.01%) Loss: 2.342765 LR: 0.00000751 +[16:54:45] Epoch: 1 Batch: 17489/20099 (87.01%) Loss: 2.235284 LR: 0.00000751 +[16:54:47] Epoch: 1 Batch: 17490/20099 (87.02%) Loss: 1.948175 LR: 0.00000751 +[16:54:49] Epoch: 1 Batch: 17491/20099 (87.02%) Loss: 2.049271 
LR: 0.00000751 +[16:54:50] Epoch: 1 Batch: 17492/20099 (87.03%) Loss: 1.816251 LR: 0.00000751 +[16:54:52] Epoch: 1 Batch: 17493/20099 (87.03%) Loss: 2.252430 LR: 0.00000751 +[16:54:54] Epoch: 1 Batch: 17494/20099 (87.04%) Loss: 1.945635 LR: 0.00000750 +[16:54:56] Epoch: 1 Batch: 17495/20099 (87.04%) Loss: 2.090365 LR: 0.00000750 +[16:54:58] Epoch: 1 Batch: 17496/20099 (87.05%) Loss: 2.094879 LR: 0.00000750 +[16:55:00] Epoch: 1 Batch: 17497/20099 (87.05%) Loss: 1.921861 LR: 0.00000750 +[16:55:01] Epoch: 1 Batch: 17498/20099 (87.06%) Loss: 1.965526 LR: 0.00000750 +[16:55:03] Epoch: 1 Batch: 17499/20099 (87.06%) Loss: 2.302305 LR: 0.00000750 +[16:55:05] >> Evaluating batch 0 +[16:55:06] >> Evaluating batch 1 +[16:55:07] >> Evaluating batch 2 +[16:55:08] >> Evaluating batch 3 +[16:55:10] >> Evaluating batch 4 +[16:55:11] >> Evaluating batch 5 +[16:55:12] >> Evaluating batch 6 +[16:55:13] >> Evaluating batch 7 +[16:55:14] >> Evaluating batch 8 +[16:55:15] >> Evaluating batch 9 +[16:55:16] >> Evaluating batch 10 +[16:55:17] >> Evaluating batch 11 +[16:55:18] >> Evaluating batch 12 +[16:55:19] >> Evaluating batch 13 +[16:55:20] >> Evaluating batch 14 +[16:55:21] >> Evaluating batch 15 +[16:55:22] >> Evaluating batch 16 +[16:55:22] Epoch: 1 Step: 17500/20099 Evaluation: +[16:55:22] Avg Loss Since Last Eval: 2.0786 Val Loss: 2.1478 Validation loss delta: 0.0008 Perplexity: 8.5657 LR: 0.00000750 +[16:55:26] >> Checkpoint saved: epoch1_step17500, size: 0.1693 GB +[16:55:26] Epoch: 1 Batch: 17500/20099 (87.07%) Loss: 2.235569 LR: 0.00000750 +[16:55:28] Epoch: 1 Batch: 17501/20099 (87.07%) Loss: 1.715805 LR: 0.00000749 +[16:55:30] Epoch: 1 Batch: 17502/20099 (87.08%) Loss: 1.954907 LR: 0.00000749 +[16:55:31] Epoch: 1 Batch: 17503/20099 (87.08%) Loss: 2.258551 LR: 0.00000749 +[16:55:33] Epoch: 1 Batch: 17504/20099 (87.09%) Loss: 2.229052 LR: 0.00000749 +[16:55:35] Epoch: 1 Batch: 17505/20099 (87.09%) Loss: 2.292044 LR: 0.00000749 +[16:55:37] Epoch: 1 Batch: 17506/20099 
(87.10%) Loss: 2.261032 LR: 0.00000749 +[16:55:39] Epoch: 1 Batch: 17507/20099 (87.10%) Loss: 1.893339 LR: 0.00000749 +[16:55:41] Epoch: 1 Batch: 17508/20099 (87.11%) Loss: 2.165386 LR: 0.00000748 +[16:55:43] Epoch: 1 Batch: 17509/20099 (87.11%) Loss: 2.097392 LR: 0.00000748 +[16:55:44] Epoch: 1 Batch: 17510/20099 (87.12%) Loss: 2.204749 LR: 0.00000748 +[16:55:46] Epoch: 1 Batch: 17511/20099 (87.12%) Loss: 2.159243 LR: 0.00000748 +[16:55:48] Epoch: 1 Batch: 17512/20099 (87.13%) Loss: 2.176404 LR: 0.00000748 +[16:55:50] Epoch: 1 Batch: 17513/20099 (87.13%) Loss: 2.247217 LR: 0.00000748 +[16:55:52] Epoch: 1 Batch: 17514/20099 (87.14%) Loss: 2.210220 LR: 0.00000748 +[16:55:54] Epoch: 1 Batch: 17515/20099 (87.14%) Loss: 2.052430 LR: 0.00000747 +[16:55:56] Epoch: 1 Batch: 17516/20099 (87.15%) Loss: 2.379779 LR: 0.00000747 +[16:55:57] Epoch: 1 Batch: 17517/20099 (87.15%) Loss: 2.065267 LR: 0.00000747 +[16:55:59] Epoch: 1 Batch: 17518/20099 (87.16%) Loss: 2.148651 LR: 0.00000747 +[16:56:01] Epoch: 1 Batch: 17519/20099 (87.16%) Loss: 2.098794 LR: 0.00000747 +[16:56:03] Epoch: 1 Batch: 17520/20099 (87.17%) Loss: 2.067379 LR: 0.00000747 +[16:56:05] Epoch: 1 Batch: 17521/20099 (87.17%) Loss: 1.958378 LR: 0.00000747 +[16:56:07] Epoch: 1 Batch: 17522/20099 (87.18%) Loss: 2.113764 LR: 0.00000747 +[16:56:09] Epoch: 1 Batch: 17523/20099 (87.18%) Loss: 1.983912 LR: 0.00000747 +[16:56:11] Epoch: 1 Batch: 17524/20099 (87.19%) Loss: 2.129180 LR: 0.00000747 +[16:56:12] Epoch: 1 Batch: 17525/20099 (87.19%) Loss: 1.754702 LR: 0.00000747 +[16:56:14] Epoch: 1 Batch: 17526/20099 (87.20%) Loss: 2.087590 LR: 0.00000747 +[16:56:16] Epoch: 1 Batch: 17527/20099 (87.20%) Loss: 1.877912 LR: 0.00000747 +[16:56:18] Epoch: 1 Batch: 17528/20099 (87.21%) Loss: 2.185984 LR: 0.00000747 +[16:56:20] Epoch: 1 Batch: 17529/20099 (87.21%) Loss: 2.124600 LR: 0.00000746 +[16:56:22] Epoch: 1 Batch: 17530/20099 (87.22%) Loss: 1.892984 LR: 0.00000746 +[16:56:24] Epoch: 1 Batch: 17531/20099 (87.22%) Loss: 2.091397 
LR: 0.00000746 +[16:56:25] Epoch: 1 Batch: 17532/20099 (87.23%) Loss: 1.996607 LR: 0.00000746 +[16:56:27] Epoch: 1 Batch: 17533/20099 (87.23%) Loss: 2.079574 LR: 0.00000746 +[16:56:29] Epoch: 1 Batch: 17534/20099 (87.24%) Loss: 2.142843 LR: 0.00000746 +[16:56:31] Epoch: 1 Batch: 17535/20099 (87.24%) Loss: 2.078719 LR: 0.00000746 +[16:56:33] Epoch: 1 Batch: 17536/20099 (87.25%) Loss: 2.171797 LR: 0.00000745 +[16:56:35] Epoch: 1 Batch: 17537/20099 (87.25%) Loss: 2.290361 LR: 0.00000745 +[16:56:36] Epoch: 1 Batch: 17538/20099 (87.26%) Loss: 2.057257 LR: 0.00000745 +[16:56:38] Epoch: 1 Batch: 17539/20099 (87.26%) Loss: 2.095358 LR: 0.00000745 +[16:56:40] Epoch: 1 Batch: 17540/20099 (87.27%) Loss: 2.373081 LR: 0.00000745 +[16:56:42] Epoch: 1 Batch: 17541/20099 (87.27%) Loss: 2.271805 LR: 0.00000745 +[16:56:44] Epoch: 1 Batch: 17542/20099 (87.28%) Loss: 2.197390 LR: 0.00000745 +[16:56:46] Epoch: 1 Batch: 17543/20099 (87.28%) Loss: 2.493478 LR: 0.00000744 +[16:56:48] Epoch: 1 Batch: 17544/20099 (87.29%) Loss: 1.839786 LR: 0.00000744 +[16:56:49] Epoch: 1 Batch: 17545/20099 (87.29%) Loss: 1.865154 LR: 0.00000744 +[16:56:51] Epoch: 1 Batch: 17546/20099 (87.30%) Loss: 1.965575 LR: 0.00000744 +[16:56:53] Epoch: 1 Batch: 17547/20099 (87.30%) Loss: 1.922869 LR: 0.00000744 +[16:56:55] Epoch: 1 Batch: 17548/20099 (87.31%) Loss: 2.210639 LR: 0.00000744 +[16:56:57] Epoch: 1 Batch: 17549/20099 (87.31%) Loss: 1.990479 LR: 0.00000744 +[16:56:59] Epoch: 1 Batch: 17550/20099 (87.32%) Loss: 1.876554 LR: 0.00000743 +[16:57:00] Epoch: 1 Batch: 17551/20099 (87.32%) Loss: 2.072264 LR: 0.00000743 +[16:57:02] Epoch: 1 Batch: 17552/20099 (87.33%) Loss: 1.964428 LR: 0.00000743 +[16:57:04] Epoch: 1 Batch: 17553/20099 (87.33%) Loss: 2.291249 LR: 0.00000743 +[16:57:06] Epoch: 1 Batch: 17554/20099 (87.34%) Loss: 2.187897 LR: 0.00000743 +[16:57:08] Epoch: 1 Batch: 17555/20099 (87.34%) Loss: 2.060750 LR: 0.00000743 +[16:57:10] Epoch: 1 Batch: 17556/20099 (87.35%) Loss: 1.993210 LR: 0.00000743 
+[16:57:12] Epoch: 1 Batch: 17557/20099 (87.35%) Loss: 2.393697 LR: 0.00000743 +[16:57:13] Epoch: 1 Batch: 17558/20099 (87.36%) Loss: 1.958124 LR: 0.00000743 +[16:57:15] Epoch: 1 Batch: 17559/20099 (87.36%) Loss: 1.798334 LR: 0.00000743 +[16:57:17] Epoch: 1 Batch: 17560/20099 (87.37%) Loss: 2.376261 LR: 0.00000743 +[16:57:19] Epoch: 1 Batch: 17561/20099 (87.37%) Loss: 2.049006 LR: 0.00000743 +[16:57:21] Epoch: 1 Batch: 17562/20099 (87.38%) Loss: 2.257985 LR: 0.00000743 +[16:57:23] Epoch: 1 Batch: 17563/20099 (87.38%) Loss: 1.971873 LR: 0.00000743 +[16:57:25] Epoch: 1 Batch: 17564/20099 (87.39%) Loss: 2.170785 LR: 0.00000742 +[16:57:26] Epoch: 1 Batch: 17565/20099 (87.39%) Loss: 2.103040 LR: 0.00000742 +[16:57:28] Epoch: 1 Batch: 17566/20099 (87.40%) Loss: 2.058441 LR: 0.00000742 +[16:57:30] Epoch: 1 Batch: 17567/20099 (87.40%) Loss: 2.125111 LR: 0.00000742 +[16:57:32] Epoch: 1 Batch: 17568/20099 (87.41%) Loss: 2.287550 LR: 0.00000742 +[16:57:34] Epoch: 1 Batch: 17569/20099 (87.41%) Loss: 2.007423 LR: 0.00000742 +[16:57:36] Epoch: 1 Batch: 17570/20099 (87.42%) Loss: 1.912422 LR: 0.00000742 +[16:57:38] Epoch: 1 Batch: 17571/20099 (87.42%) Loss: 2.087461 LR: 0.00000741 +[16:57:39] Epoch: 1 Batch: 17572/20099 (87.43%) Loss: 2.213047 LR: 0.00000741 +[16:57:41] Epoch: 1 Batch: 17573/20099 (87.43%) Loss: 2.347178 LR: 0.00000741 +[16:57:43] Epoch: 1 Batch: 17574/20099 (87.44%) Loss: 1.943620 LR: 0.00000741 +[16:57:45] Epoch: 1 Batch: 17575/20099 (87.44%) Loss: 1.965196 LR: 0.00000741 +[16:57:47] Epoch: 1 Batch: 17576/20099 (87.45%) Loss: 2.231793 LR: 0.00000741 +[16:57:49] Epoch: 1 Batch: 17577/20099 (87.45%) Loss: 2.128377 LR: 0.00000741 +[16:57:51] Epoch: 1 Batch: 17578/20099 (87.46%) Loss: 2.405900 LR: 0.00000740 +[16:57:53] Epoch: 1 Batch: 17579/20099 (87.46%) Loss: 2.167300 LR: 0.00000740 +[16:57:54] Epoch: 1 Batch: 17580/20099 (87.47%) Loss: 1.992033 LR: 0.00000740 +[16:57:56] Epoch: 1 Batch: 17581/20099 (87.47%) Loss: 1.977361 LR: 0.00000740 +[16:57:58] Epoch: 1 
Batch: 17582/20099 (87.48%) Loss: 2.372686 LR: 0.00000740 +[16:58:00] Epoch: 1 Batch: 17583/20099 (87.48%) Loss: 2.371083 LR: 0.00000740 +[16:58:02] Epoch: 1 Batch: 17584/20099 (87.49%) Loss: 2.093752 LR: 0.00000740 +[16:58:04] Epoch: 1 Batch: 17585/20099 (87.49%) Loss: 1.923145 LR: 0.00000740 +[16:58:06] Epoch: 1 Batch: 17586/20099 (87.50%) Loss: 1.873164 LR: 0.00000740 +[16:58:07] Epoch: 1 Batch: 17587/20099 (87.50%) Loss: 2.343898 LR: 0.00000740 +[16:58:09] Epoch: 1 Batch: 17588/20099 (87.51%) Loss: 2.212794 LR: 0.00000740 +[16:58:11] Epoch: 1 Batch: 17589/20099 (87.51%) Loss: 2.031736 LR: 0.00000740 +[16:58:13] Epoch: 1 Batch: 17590/20099 (87.52%) Loss: 2.003627 LR: 0.00000740 +[16:58:15] Epoch: 1 Batch: 17591/20099 (87.52%) Loss: 1.730391 LR: 0.00000740 +[16:58:17] Epoch: 1 Batch: 17592/20099 (87.53%) Loss: 2.068016 LR: 0.00000739 +[16:58:19] Epoch: 1 Batch: 17593/20099 (87.53%) Loss: 1.565503 LR: 0.00000739 +[16:58:20] Epoch: 1 Batch: 17594/20099 (87.54%) Loss: 1.899391 LR: 0.00000739 +[16:58:22] Epoch: 1 Batch: 17595/20099 (87.54%) Loss: 1.910562 LR: 0.00000739 +[16:58:24] Epoch: 1 Batch: 17596/20099 (87.55%) Loss: 1.883582 LR: 0.00000739 +[16:58:26] Epoch: 1 Batch: 17597/20099 (87.55%) Loss: 2.436203 LR: 0.00000739 +[16:58:28] Epoch: 1 Batch: 17598/20099 (87.56%) Loss: 2.247070 LR: 0.00000739 +[16:58:30] Epoch: 1 Batch: 17599/20099 (87.56%) Loss: 2.383197 LR: 0.00000738 +[16:58:35] >> Cleaned up old temp checkpoint: epoch1_step15600 +[16:58:35] >> Temp checkpoint saved: epoch1_step17600, size: 0.1693 GB +[16:58:35] Epoch: 1 Batch: 17600/20099 (87.57%) Loss: 2.295342 LR: 0.00000738 +[16:58:37] Epoch: 1 Batch: 17601/20099 (87.57%) Loss: 1.992482 LR: 0.00000738 +[16:58:39] Epoch: 1 Batch: 17602/20099 (87.58%) Loss: 1.999650 LR: 0.00000738 +[16:58:41] Epoch: 1 Batch: 17603/20099 (87.58%) Loss: 2.313826 LR: 0.00000738 +[16:58:43] Epoch: 1 Batch: 17604/20099 (87.59%) Loss: 1.874575 LR: 0.00000738 +[16:58:44] Epoch: 1 Batch: 17605/20099 (87.59%) Loss: 2.285563 LR: 
0.00000738 +[16:58:46] Epoch: 1 Batch: 17606/20099 (87.60%) Loss: 1.992687 LR: 0.00000737 +[16:58:48] Epoch: 1 Batch: 17607/20099 (87.60%) Loss: 2.119013 LR: 0.00000737 +[16:58:50] Epoch: 1 Batch: 17608/20099 (87.61%) Loss: 2.160233 LR: 0.00000737 +[16:58:52] Epoch: 1 Batch: 17609/20099 (87.61%) Loss: 1.840156 LR: 0.00000737 +[16:58:54] Epoch: 1 Batch: 17610/20099 (87.62%) Loss: 2.300976 LR: 0.00000737 +[16:58:55] Epoch: 1 Batch: 17611/20099 (87.62%) Loss: 2.084265 LR: 0.00000737 +[16:58:57] Epoch: 1 Batch: 17612/20099 (87.63%) Loss: 1.776223 LR: 0.00000737 +[16:58:59] Epoch: 1 Batch: 17613/20099 (87.63%) Loss: 2.225344 LR: 0.00000737 +[16:59:01] Epoch: 1 Batch: 17614/20099 (87.64%) Loss: 1.906311 LR: 0.00000737 +[16:59:03] Epoch: 1 Batch: 17615/20099 (87.64%) Loss: 2.034387 LR: 0.00000737 +[16:59:05] Epoch: 1 Batch: 17616/20099 (87.65%) Loss: 1.933355 LR: 0.00000737 +[16:59:07] Epoch: 1 Batch: 17617/20099 (87.65%) Loss: 2.013640 LR: 0.00000737 +[16:59:09] Epoch: 1 Batch: 17618/20099 (87.66%) Loss: 2.048515 LR: 0.00000737 +[16:59:10] Epoch: 1 Batch: 17619/20099 (87.66%) Loss: 2.013034 LR: 0.00000737 +[16:59:12] Epoch: 1 Batch: 17620/20099 (87.67%) Loss: 2.206240 LR: 0.00000736 +[16:59:14] Epoch: 1 Batch: 17621/20099 (87.67%) Loss: 2.347862 LR: 0.00000736 +[16:59:16] Epoch: 1 Batch: 17622/20099 (87.68%) Loss: 2.076741 LR: 0.00000736 +[16:59:18] Epoch: 1 Batch: 17623/20099 (87.68%) Loss: 2.159196 LR: 0.00000736 +[16:59:20] Epoch: 1 Batch: 17624/20099 (87.69%) Loss: 1.809497 LR: 0.00000736 +[16:59:22] Epoch: 1 Batch: 17625/20099 (87.69%) Loss: 2.177937 LR: 0.00000736 +[16:59:23] Epoch: 1 Batch: 17626/20099 (87.70%) Loss: 2.155886 LR: 0.00000736 +[16:59:25] Epoch: 1 Batch: 17627/20099 (87.70%) Loss: 2.022970 LR: 0.00000735 +[16:59:27] Epoch: 1 Batch: 17628/20099 (87.71%) Loss: 2.060594 LR: 0.00000735 +[16:59:29] Epoch: 1 Batch: 17629/20099 (87.71%) Loss: 2.143426 LR: 0.00000735 +[16:59:31] Epoch: 1 Batch: 17630/20099 (87.72%) Loss: 2.029916 LR: 0.00000735 +[16:59:33] 
Epoch: 1 Batch: 17631/20099 (87.72%) Loss: 2.418743 LR: 0.00000735 +[16:59:35] Epoch: 1 Batch: 17632/20099 (87.73%) Loss: 1.934432 LR: 0.00000735 +[16:59:36] Epoch: 1 Batch: 17633/20099 (87.73%) Loss: 2.026227 LR: 0.00000735 +[16:59:38] Epoch: 1 Batch: 17634/20099 (87.74%) Loss: 1.899903 LR: 0.00000734 +[16:59:40] Epoch: 1 Batch: 17635/20099 (87.74%) Loss: 2.087386 LR: 0.00000734 +[16:59:42] Epoch: 1 Batch: 17636/20099 (87.75%) Loss: 2.075427 LR: 0.00000734 +[16:59:44] Epoch: 1 Batch: 17637/20099 (87.75%) Loss: 1.812583 LR: 0.00000734 +[16:59:46] Epoch: 1 Batch: 17638/20099 (87.76%) Loss: 2.168366 LR: 0.00000734 +[16:59:47] Epoch: 1 Batch: 17639/20099 (87.76%) Loss: 1.981191 LR: 0.00000734 +[16:59:49] Epoch: 1 Batch: 17640/20099 (87.77%) Loss: 2.203865 LR: 0.00000734 +[16:59:51] Epoch: 1 Batch: 17641/20099 (87.77%) Loss: 1.861481 LR: 0.00000734 +[16:59:53] Epoch: 1 Batch: 17642/20099 (87.78%) Loss: 2.230494 LR: 0.00000734 +[16:59:55] Epoch: 1 Batch: 17643/20099 (87.78%) Loss: 2.144921 LR: 0.00000734 +[16:59:57] Epoch: 1 Batch: 17644/20099 (87.79%) Loss: 2.140000 LR: 0.00000734 +[16:59:59] Epoch: 1 Batch: 17645/20099 (87.79%) Loss: 2.020084 LR: 0.00000734 +[17:00:00] Epoch: 1 Batch: 17646/20099 (87.80%) Loss: 2.117768 LR: 0.00000734 +[17:00:02] Epoch: 1 Batch: 17647/20099 (87.80%) Loss: 1.933041 LR: 0.00000734 +[17:00:04] Epoch: 1 Batch: 17648/20099 (87.81%) Loss: 2.227930 LR: 0.00000733 +[17:00:06] Epoch: 1 Batch: 17649/20099 (87.81%) Loss: 2.109669 LR: 0.00000733 +[17:00:08] Epoch: 1 Batch: 17650/20099 (87.82%) Loss: 2.393474 LR: 0.00000733 +[17:00:10] Epoch: 1 Batch: 17651/20099 (87.82%) Loss: 1.935545 LR: 0.00000733 +[17:00:12] Epoch: 1 Batch: 17652/20099 (87.83%) Loss: 2.100092 LR: 0.00000733 +[17:00:13] Epoch: 1 Batch: 17653/20099 (87.83%) Loss: 2.068494 LR: 0.00000733 +[17:00:15] Epoch: 1 Batch: 17654/20099 (87.84%) Loss: 2.245531 LR: 0.00000733 +[17:00:17] Epoch: 1 Batch: 17655/20099 (87.84%) Loss: 2.064369 LR: 0.00000732 +[17:00:19] Epoch: 1 Batch: 
17656/20099 (87.85%) Loss: 1.957474 LR: 0.00000732 +[17:00:21] Epoch: 1 Batch: 17657/20099 (87.85%) Loss: 2.269178 LR: 0.00000732 +[17:00:23] Epoch: 1 Batch: 17658/20099 (87.86%) Loss: 2.324436 LR: 0.00000732 +[17:00:24] Epoch: 1 Batch: 17659/20099 (87.86%) Loss: 1.933003 LR: 0.00000732 +[17:00:26] Epoch: 1 Batch: 17660/20099 (87.87%) Loss: 2.320109 LR: 0.00000732 +[17:00:28] Epoch: 1 Batch: 17661/20099 (87.87%) Loss: 2.125337 LR: 0.00000732 +[17:00:30] Epoch: 1 Batch: 17662/20099 (87.88%) Loss: 2.285500 LR: 0.00000731 +[17:00:32] Epoch: 1 Batch: 17663/20099 (87.88%) Loss: 1.969454 LR: 0.00000731 +[17:00:34] Epoch: 1 Batch: 17664/20099 (87.88%) Loss: 1.690562 LR: 0.00000731 +[17:00:36] Epoch: 1 Batch: 17665/20099 (87.89%) Loss: 2.244020 LR: 0.00000731 +[17:00:37] Epoch: 1 Batch: 17666/20099 (87.89%) Loss: 2.180691 LR: 0.00000731 +[17:00:39] Epoch: 1 Batch: 17667/20099 (87.90%) Loss: 1.858263 LR: 0.00000731 +[17:00:41] Epoch: 1 Batch: 17668/20099 (87.90%) Loss: 1.795972 LR: 0.00000731 +[17:00:43] Epoch: 1 Batch: 17669/20099 (87.91%) Loss: 1.923286 LR: 0.00000731 +[17:00:45] Epoch: 1 Batch: 17670/20099 (87.91%) Loss: 2.106896 LR: 0.00000731 +[17:00:47] Epoch: 1 Batch: 17671/20099 (87.92%) Loss: 2.419809 LR: 0.00000731 +[17:00:49] Epoch: 1 Batch: 17672/20099 (87.92%) Loss: 2.039883 LR: 0.00000731 +[17:00:50] Epoch: 1 Batch: 17673/20099 (87.93%) Loss: 2.273315 LR: 0.00000731 +[17:00:52] Epoch: 1 Batch: 17674/20099 (87.93%) Loss: 1.959483 LR: 0.00000731 +[17:00:54] Epoch: 1 Batch: 17675/20099 (87.94%) Loss: 2.292348 LR: 0.00000731 +[17:00:56] Epoch: 1 Batch: 17676/20099 (87.94%) Loss: 1.933493 LR: 0.00000730 +[17:00:58] Epoch: 1 Batch: 17677/20099 (87.95%) Loss: 2.140478 LR: 0.00000730 +[17:01:00] Epoch: 1 Batch: 17678/20099 (87.95%) Loss: 1.856125 LR: 0.00000730 +[17:01:01] Epoch: 1 Batch: 17679/20099 (87.96%) Loss: 2.062682 LR: 0.00000730 +[17:01:03] Epoch: 1 Batch: 17680/20099 (87.96%) Loss: 2.260612 LR: 0.00000730 +[17:01:05] Epoch: 1 Batch: 17681/20099 (87.97%) 
Loss: 2.011373 LR: 0.00000730 +[17:01:07] Epoch: 1 Batch: 17682/20099 (87.97%) Loss: 2.133018 LR: 0.00000730 +[17:01:09] Epoch: 1 Batch: 17683/20099 (87.98%) Loss: 1.878486 LR: 0.00000729 +[17:01:11] Epoch: 1 Batch: 17684/20099 (87.98%) Loss: 1.989194 LR: 0.00000729 +[17:01:12] Epoch: 1 Batch: 17685/20099 (87.99%) Loss: 2.178058 LR: 0.00000729 +[17:01:14] Epoch: 1 Batch: 17686/20099 (87.99%) Loss: 2.395487 LR: 0.00000729 +[17:01:16] Epoch: 1 Batch: 17687/20099 (88.00%) Loss: 1.917369 LR: 0.00000729 +[17:01:18] Epoch: 1 Batch: 17688/20099 (88.00%) Loss: 2.213969 LR: 0.00000729 +[17:01:20] Epoch: 1 Batch: 17689/20099 (88.01%) Loss: 2.026361 LR: 0.00000729 +[17:01:22] Epoch: 1 Batch: 17690/20099 (88.01%) Loss: 1.888976 LR: 0.00000728 +[17:01:24] Epoch: 1 Batch: 17691/20099 (88.02%) Loss: 2.220475 LR: 0.00000728 +[17:01:25] Epoch: 1 Batch: 17692/20099 (88.02%) Loss: 2.110624 LR: 0.00000728 +[17:01:27] Epoch: 1 Batch: 17693/20099 (88.03%) Loss: 2.161783 LR: 0.00000728 +[17:01:29] Epoch: 1 Batch: 17694/20099 (88.03%) Loss: 2.077018 LR: 0.00000728 +[17:01:31] Epoch: 1 Batch: 17695/20099 (88.04%) Loss: 1.980493 LR: 0.00000728 +[17:01:33] Epoch: 1 Batch: 17696/20099 (88.04%) Loss: 1.818541 LR: 0.00000728 +[17:01:35] Epoch: 1 Batch: 17697/20099 (88.05%) Loss: 1.978923 LR: 0.00000728 +[17:01:37] Epoch: 1 Batch: 17698/20099 (88.05%) Loss: 2.124783 LR: 0.00000728 +[17:01:38] Epoch: 1 Batch: 17699/20099 (88.06%) Loss: 2.372052 LR: 0.00000728 +[17:01:40] Epoch: 1 Batch: 17700/20099 (88.06%) Loss: 1.779910 LR: 0.00000728 +[17:01:42] Epoch: 1 Batch: 17701/20099 (88.07%) Loss: 2.069732 LR: 0.00000728 +[17:01:44] Epoch: 1 Batch: 17702/20099 (88.07%) Loss: 2.196401 LR: 0.00000728 +[17:01:46] Epoch: 1 Batch: 17703/20099 (88.08%) Loss: 2.084141 LR: 0.00000728 +[17:01:48] Epoch: 1 Batch: 17704/20099 (88.08%) Loss: 1.797920 LR: 0.00000727 +[17:01:50] Epoch: 1 Batch: 17705/20099 (88.09%) Loss: 2.074756 LR: 0.00000727 +[17:01:51] Epoch: 1 Batch: 17706/20099 (88.09%) Loss: 2.008380 LR: 
0.00000727 +[17:01:53] Epoch: 1 Batch: 17707/20099 (88.10%) Loss: 2.029838 LR: 0.00000727 +[17:01:55] Epoch: 1 Batch: 17708/20099 (88.10%) Loss: 2.123073 LR: 0.00000727 +[17:01:57] Epoch: 1 Batch: 17709/20099 (88.11%) Loss: 2.013156 LR: 0.00000727 +[17:01:59] Epoch: 1 Batch: 17710/20099 (88.11%) Loss: 2.300520 LR: 0.00000727 +[17:02:01] Epoch: 1 Batch: 17711/20099 (88.12%) Loss: 1.933515 LR: 0.00000726 +[17:02:03] Epoch: 1 Batch: 17712/20099 (88.12%) Loss: 2.129671 LR: 0.00000726 +[17:02:04] Epoch: 1 Batch: 17713/20099 (88.13%) Loss: 1.878718 LR: 0.00000726 +[17:02:06] Epoch: 1 Batch: 17714/20099 (88.13%) Loss: 2.067873 LR: 0.00000726 +[17:02:08] Epoch: 1 Batch: 17715/20099 (88.14%) Loss: 1.993014 LR: 0.00000726 +[17:02:10] Epoch: 1 Batch: 17716/20099 (88.14%) Loss: 2.073698 LR: 0.00000726 +[17:02:12] Epoch: 1 Batch: 17717/20099 (88.15%) Loss: 2.340602 LR: 0.00000726 +[17:02:14] Epoch: 1 Batch: 17718/20099 (88.15%) Loss: 2.361984 LR: 0.00000726 +[17:02:16] Epoch: 1 Batch: 17719/20099 (88.16%) Loss: 1.764903 LR: 0.00000726 +[17:02:17] Epoch: 1 Batch: 17720/20099 (88.16%) Loss: 2.232680 LR: 0.00000726 +[17:02:19] Epoch: 1 Batch: 17721/20099 (88.17%) Loss: 2.203587 LR: 0.00000726 +[17:02:21] Epoch: 1 Batch: 17722/20099 (88.17%) Loss: 1.875383 LR: 0.00000726 +[17:02:23] Epoch: 1 Batch: 17723/20099 (88.18%) Loss: 1.981509 LR: 0.00000726 +[17:02:25] Epoch: 1 Batch: 17724/20099 (88.18%) Loss: 2.154276 LR: 0.00000726 +[17:02:27] Epoch: 1 Batch: 17725/20099 (88.19%) Loss: 2.090257 LR: 0.00000725 +[17:02:28] Epoch: 1 Batch: 17726/20099 (88.19%) Loss: 2.161135 LR: 0.00000725 +[17:02:30] Epoch: 1 Batch: 17727/20099 (88.20%) Loss: 2.083909 LR: 0.00000725 +[17:02:32] Epoch: 1 Batch: 17728/20099 (88.20%) Loss: 2.006870 LR: 0.00000725 +[17:02:34] Epoch: 1 Batch: 17729/20099 (88.21%) Loss: 1.738441 LR: 0.00000725 +[17:02:36] Epoch: 1 Batch: 17730/20099 (88.21%) Loss: 1.899634 LR: 0.00000725 +[17:02:38] Epoch: 1 Batch: 17731/20099 (88.22%) Loss: 2.118209 LR: 0.00000725 +[17:02:39] 
Epoch: 1 Batch: 17732/20099 (88.22%) Loss: 1.882387 LR: 0.00000724 +[17:02:41] Epoch: 1 Batch: 17733/20099 (88.23%) Loss: 2.321584 LR: 0.00000724 +[17:02:43] Epoch: 1 Batch: 17734/20099 (88.23%) Loss: 2.042328 LR: 0.00000724 +[17:02:45] Epoch: 1 Batch: 17735/20099 (88.24%) Loss: 1.898383 LR: 0.00000724 +[17:02:47] Epoch: 1 Batch: 17736/20099 (88.24%) Loss: 1.864330 LR: 0.00000724 +[17:02:49] Epoch: 1 Batch: 17737/20099 (88.25%) Loss: 2.093995 LR: 0.00000724 +[17:02:51] Epoch: 1 Batch: 17738/20099 (88.25%) Loss: 2.156176 LR: 0.00000724 +[17:02:52] Epoch: 1 Batch: 17739/20099 (88.26%) Loss: 2.170602 LR: 0.00000723 +[17:02:54] Epoch: 1 Batch: 17740/20099 (88.26%) Loss: 2.152958 LR: 0.00000723 +[17:02:56] Epoch: 1 Batch: 17741/20099 (88.27%) Loss: 2.152933 LR: 0.00000723 +[17:02:58] Epoch: 1 Batch: 17742/20099 (88.27%) Loss: 2.097863 LR: 0.00000723 +[17:03:00] Epoch: 1 Batch: 17743/20099 (88.28%) Loss: 2.093624 LR: 0.00000723 +[17:03:02] Epoch: 1 Batch: 17744/20099 (88.28%) Loss: 1.947402 LR: 0.00000723 +[17:03:04] Epoch: 1 Batch: 17745/20099 (88.29%) Loss: 1.961210 LR: 0.00000723 +[17:03:05] Epoch: 1 Batch: 17746/20099 (88.29%) Loss: 2.145979 LR: 0.00000723 +[17:03:07] Epoch: 1 Batch: 17747/20099 (88.30%) Loss: 1.895670 LR: 0.00000723 +[17:03:09] Epoch: 1 Batch: 17748/20099 (88.30%) Loss: 2.367840 LR: 0.00000723 +[17:03:11] Epoch: 1 Batch: 17749/20099 (88.31%) Loss: 2.177567 LR: 0.00000723 +[17:03:13] Epoch: 1 Batch: 17750/20099 (88.31%) Loss: 2.069417 LR: 0.00000723 +[17:03:15] Epoch: 1 Batch: 17751/20099 (88.32%) Loss: 1.975558 LR: 0.00000723 +[17:03:17] Epoch: 1 Batch: 17752/20099 (88.32%) Loss: 1.785103 LR: 0.00000723 +[17:03:18] Epoch: 1 Batch: 17753/20099 (88.33%) Loss: 2.017456 LR: 0.00000722 +[17:03:20] Epoch: 1 Batch: 17754/20099 (88.33%) Loss: 2.025445 LR: 0.00000722 +[17:03:22] Epoch: 1 Batch: 17755/20099 (88.34%) Loss: 1.935558 LR: 0.00000722 +[17:03:24] Epoch: 1 Batch: 17756/20099 (88.34%) Loss: 1.691265 LR: 0.00000722 +[17:03:26] Epoch: 1 Batch: 
17757/20099 (88.35%) Loss: 2.150466 LR: 0.00000722 +[17:03:28] Epoch: 1 Batch: 17758/20099 (88.35%) Loss: 2.097889 LR: 0.00000722 +[17:03:30] Epoch: 1 Batch: 17759/20099 (88.36%) Loss: 2.283082 LR: 0.00000722 +[17:03:31] Epoch: 1 Batch: 17760/20099 (88.36%) Loss: 2.078397 LR: 0.00000721 +[17:03:33] Epoch: 1 Batch: 17761/20099 (88.37%) Loss: 2.075105 LR: 0.00000721 +[17:03:35] Epoch: 1 Batch: 17762/20099 (88.37%) Loss: 2.476645 LR: 0.00000721 +[17:03:37] Epoch: 1 Batch: 17763/20099 (88.38%) Loss: 2.207800 LR: 0.00000721 +[17:03:39] Epoch: 1 Batch: 17764/20099 (88.38%) Loss: 2.280474 LR: 0.00000721 +[17:03:41] Epoch: 1 Batch: 17765/20099 (88.39%) Loss: 2.246104 LR: 0.00000721 +[17:03:43] Epoch: 1 Batch: 17766/20099 (88.39%) Loss: 2.097852 LR: 0.00000721 +[17:03:44] Epoch: 1 Batch: 17767/20099 (88.40%) Loss: 2.095386 LR: 0.00000721 +[17:03:46] Epoch: 1 Batch: 17768/20099 (88.40%) Loss: 2.010395 LR: 0.00000721 +[17:03:48] Epoch: 1 Batch: 17769/20099 (88.41%) Loss: 2.240827 LR: 0.00000721 +[17:03:50] Epoch: 1 Batch: 17770/20099 (88.41%) Loss: 2.317030 LR: 0.00000721 +[17:03:52] Epoch: 1 Batch: 17771/20099 (88.42%) Loss: 1.914078 LR: 0.00000721 +[17:03:54] Epoch: 1 Batch: 17772/20099 (88.42%) Loss: 2.124379 LR: 0.00000721 +[17:03:56] Epoch: 1 Batch: 17773/20099 (88.43%) Loss: 2.031997 LR: 0.00000721 +[17:03:57] Epoch: 1 Batch: 17774/20099 (88.43%) Loss: 2.147969 LR: 0.00000720 +[17:03:59] Epoch: 1 Batch: 17775/20099 (88.44%) Loss: 2.081557 LR: 0.00000720 +[17:04:01] Epoch: 1 Batch: 17776/20099 (88.44%) Loss: 1.790178 LR: 0.00000720 +[17:04:03] Epoch: 1 Batch: 17777/20099 (88.45%) Loss: 2.189587 LR: 0.00000720 +[17:04:05] Epoch: 1 Batch: 17778/20099 (88.45%) Loss: 2.259582 LR: 0.00000720 +[17:04:07] Epoch: 1 Batch: 17779/20099 (88.46%) Loss: 2.193757 LR: 0.00000720 +[17:04:08] Epoch: 1 Batch: 17780/20099 (88.46%) Loss: 1.998036 LR: 0.00000720 +[17:04:10] Epoch: 1 Batch: 17781/20099 (88.47%) Loss: 2.237819 LR: 0.00000719 +[17:04:12] Epoch: 1 Batch: 17782/20099 (88.47%) 
Loss: 1.847376 LR: 0.00000719 +[17:04:14] Epoch: 1 Batch: 17783/20099 (88.48%) Loss: 2.302372 LR: 0.00000719 +[17:04:16] Epoch: 1 Batch: 17784/20099 (88.48%) Loss: 2.222530 LR: 0.00000719 +[17:04:18] Epoch: 1 Batch: 17785/20099 (88.49%) Loss: 2.163054 LR: 0.00000719 +[17:04:20] Epoch: 1 Batch: 17786/20099 (88.49%) Loss: 2.123675 LR: 0.00000719 +[17:04:21] Epoch: 1 Batch: 17787/20099 (88.50%) Loss: 2.047291 LR: 0.00000719 +[17:04:23] Epoch: 1 Batch: 17788/20099 (88.50%) Loss: 1.886324 LR: 0.00000718 +[17:04:25] Epoch: 1 Batch: 17789/20099 (88.51%) Loss: 2.026636 LR: 0.00000718 +[17:04:27] Epoch: 1 Batch: 17790/20099 (88.51%) Loss: 2.280516 LR: 0.00000718 +[17:04:29] Epoch: 1 Batch: 17791/20099 (88.52%) Loss: 1.925335 LR: 0.00000718 +[17:04:31] Epoch: 1 Batch: 17792/20099 (88.52%) Loss: 2.167463 LR: 0.00000718 +[17:04:32] Epoch: 1 Batch: 17793/20099 (88.53%) Loss: 2.033176 LR: 0.00000718 +[17:04:34] Epoch: 1 Batch: 17794/20099 (88.53%) Loss: 2.170818 LR: 0.00000718 +[17:04:36] Epoch: 1 Batch: 17795/20099 (88.54%) Loss: 2.071957 LR: 0.00000718 +[17:04:38] Epoch: 1 Batch: 17796/20099 (88.54%) Loss: 1.714383 LR: 0.00000718 +[17:04:40] Epoch: 1 Batch: 17797/20099 (88.55%) Loss: 2.283930 LR: 0.00000718 +[17:04:42] Epoch: 1 Batch: 17798/20099 (88.55%) Loss: 2.142462 LR: 0.00000718 +[17:04:43] Epoch: 1 Batch: 17799/20099 (88.56%) Loss: 2.096951 LR: 0.00000718 +[17:04:49] >> Cleaned up old temp checkpoint: epoch1_step15800 +[17:04:49] >> Temp checkpoint saved: epoch1_step17800, size: 0.1693 GB +[17:04:49] Epoch: 1 Batch: 17800/20099 (88.56%) Loss: 2.086728 LR: 0.00000718 +[17:04:51] Epoch: 1 Batch: 17801/20099 (88.57%) Loss: 2.153544 LR: 0.00000718 +[17:04:53] Epoch: 1 Batch: 17802/20099 (88.57%) Loss: 2.114238 LR: 0.00000717 +[17:04:55] Epoch: 1 Batch: 17803/20099 (88.58%) Loss: 1.773811 LR: 0.00000717 +[17:04:57] Epoch: 1 Batch: 17804/20099 (88.58%) Loss: 2.132750 LR: 0.00000717 +[17:04:58] Epoch: 1 Batch: 17805/20099 (88.59%) Loss: 2.149646 LR: 0.00000717 +[17:05:00] 
Epoch: 1 Batch: 17806/20099 (88.59%) Loss: 2.162595 LR: 0.00000717 +[17:05:02] Epoch: 1 Batch: 17807/20099 (88.60%) Loss: 2.044490 LR: 0.00000717 +[17:05:04] Epoch: 1 Batch: 17808/20099 (88.60%) Loss: 2.073534 LR: 0.00000717 +[17:05:06] Epoch: 1 Batch: 17809/20099 (88.61%) Loss: 2.203354 LR: 0.00000716 +[17:05:08] Epoch: 1 Batch: 17810/20099 (88.61%) Loss: 2.009174 LR: 0.00000716 +[17:05:10] Epoch: 1 Batch: 17811/20099 (88.62%) Loss: 1.987364 LR: 0.00000716 +[17:05:11] Epoch: 1 Batch: 17812/20099 (88.62%) Loss: 2.108571 LR: 0.00000716 +[17:05:13] Epoch: 1 Batch: 17813/20099 (88.63%) Loss: 2.216749 LR: 0.00000716 +[17:05:15] Epoch: 1 Batch: 17814/20099 (88.63%) Loss: 2.198654 LR: 0.00000716 +[17:05:17] Epoch: 1 Batch: 17815/20099 (88.64%) Loss: 1.938093 LR: 0.00000716 +[17:05:19] Epoch: 1 Batch: 17816/20099 (88.64%) Loss: 2.048977 LR: 0.00000716 +[17:05:21] Epoch: 1 Batch: 17817/20099 (88.65%) Loss: 2.127401 LR: 0.00000716 +[17:05:23] Epoch: 1 Batch: 17818/20099 (88.65%) Loss: 1.921334 LR: 0.00000716 +[17:05:25] Epoch: 1 Batch: 17819/20099 (88.66%) Loss: 2.101003 LR: 0.00000716 +[17:05:26] Epoch: 1 Batch: 17820/20099 (88.66%) Loss: 2.377211 LR: 0.00000716 +[17:05:28] Epoch: 1 Batch: 17821/20099 (88.67%) Loss: 2.196701 LR: 0.00000716 +[17:05:30] Epoch: 1 Batch: 17822/20099 (88.67%) Loss: 2.033611 LR: 0.00000716 +[17:05:32] Epoch: 1 Batch: 17823/20099 (88.68%) Loss: 2.112613 LR: 0.00000715 +[17:05:34] Epoch: 1 Batch: 17824/20099 (88.68%) Loss: 1.988445 LR: 0.00000715 +[17:05:36] Epoch: 1 Batch: 17825/20099 (88.69%) Loss: 2.464946 LR: 0.00000715 +[17:05:38] Epoch: 1 Batch: 17826/20099 (88.69%) Loss: 1.981993 LR: 0.00000715 +[17:05:39] Epoch: 1 Batch: 17827/20099 (88.70%) Loss: 1.897996 LR: 0.00000715 +[17:05:41] Epoch: 1 Batch: 17828/20099 (88.70%) Loss: 1.987734 LR: 0.00000715 +[17:05:43] Epoch: 1 Batch: 17829/20099 (88.71%) Loss: 2.091252 LR: 0.00000715 +[17:05:45] Epoch: 1 Batch: 17830/20099 (88.71%) Loss: 2.062995 LR: 0.00000714 +[17:05:47] Epoch: 1 Batch: 
17831/20099 (88.72%) Loss: 1.685725 LR: 0.00000714 +[17:05:49] Epoch: 1 Batch: 17832/20099 (88.72%) Loss: 2.255077 LR: 0.00000714 +[17:05:50] Epoch: 1 Batch: 17833/20099 (88.73%) Loss: 2.176946 LR: 0.00000714 +[17:05:52] Epoch: 1 Batch: 17834/20099 (88.73%) Loss: 2.283120 LR: 0.00000714 +[17:05:54] Epoch: 1 Batch: 17835/20099 (88.74%) Loss: 2.065267 LR: 0.00000714 +[17:05:56] Epoch: 1 Batch: 17836/20099 (88.74%) Loss: 2.273965 LR: 0.00000714 +[17:05:58] Epoch: 1 Batch: 17837/20099 (88.75%) Loss: 1.603566 LR: 0.00000714 +[17:06:00] Epoch: 1 Batch: 17838/20099 (88.75%) Loss: 2.361848 LR: 0.00000714 +[17:06:02] Epoch: 1 Batch: 17839/20099 (88.76%) Loss: 1.719803 LR: 0.00000714 +[17:06:03] Epoch: 1 Batch: 17840/20099 (88.76%) Loss: 2.097347 LR: 0.00000714 +[17:06:05] Epoch: 1 Batch: 17841/20099 (88.77%) Loss: 2.188777 LR: 0.00000714 +[17:06:07] Epoch: 1 Batch: 17842/20099 (88.77%) Loss: 2.086007 LR: 0.00000714 +[17:06:09] Epoch: 1 Batch: 17843/20099 (88.78%) Loss: 1.674153 LR: 0.00000714 +[17:06:11] Epoch: 1 Batch: 17844/20099 (88.78%) Loss: 1.934477 LR: 0.00000713 +[17:06:13] Epoch: 1 Batch: 17845/20099 (88.79%) Loss: 2.114703 LR: 0.00000713 +[17:06:15] Epoch: 1 Batch: 17846/20099 (88.79%) Loss: 1.869865 LR: 0.00000713 +[17:06:16] Epoch: 1 Batch: 17847/20099 (88.80%) Loss: 2.153941 LR: 0.00000713 +[17:06:18] Epoch: 1 Batch: 17848/20099 (88.80%) Loss: 1.988660 LR: 0.00000713 +[17:06:20] Epoch: 1 Batch: 17849/20099 (88.81%) Loss: 2.098267 LR: 0.00000713 +[17:06:22] Epoch: 1 Batch: 17850/20099 (88.81%) Loss: 2.158161 LR: 0.00000713 +[17:06:24] Epoch: 1 Batch: 17851/20099 (88.82%) Loss: 2.178480 LR: 0.00000712 +[17:06:26] Epoch: 1 Batch: 17852/20099 (88.82%) Loss: 2.169351 LR: 0.00000712 +[17:06:27] Epoch: 1 Batch: 17853/20099 (88.83%) Loss: 1.938846 LR: 0.00000712 +[17:06:29] Epoch: 1 Batch: 17854/20099 (88.83%) Loss: 1.951990 LR: 0.00000712 +[17:06:31] Epoch: 1 Batch: 17855/20099 (88.84%) Loss: 2.364931 LR: 0.00000712 +[17:06:33] Epoch: 1 Batch: 17856/20099 (88.84%) 
Loss: 2.037882 LR: 0.00000712 +[17:06:35] Epoch: 1 Batch: 17857/20099 (88.85%) Loss: 2.028296 LR: 0.00000712 +[17:06:37] Epoch: 1 Batch: 17858/20099 (88.85%) Loss: 2.349697 LR: 0.00000711 +[17:06:39] Epoch: 1 Batch: 17859/20099 (88.86%) Loss: 2.146251 LR: 0.00000711 +[17:06:40] Epoch: 1 Batch: 17860/20099 (88.86%) Loss: 2.281940 LR: 0.00000711 +[17:06:42] Epoch: 1 Batch: 17861/20099 (88.87%) Loss: 2.065648 LR: 0.00000711 +[17:06:44] Epoch: 1 Batch: 17862/20099 (88.87%) Loss: 2.080222 LR: 0.00000711 +[17:06:46] Epoch: 1 Batch: 17863/20099 (88.88%) Loss: 1.964322 LR: 0.00000711 +[17:06:48] Epoch: 1 Batch: 17864/20099 (88.88%) Loss: 2.045818 LR: 0.00000711 +[17:06:50] Epoch: 1 Batch: 17865/20099 (88.89%) Loss: 2.146172 LR: 0.00000711 +[17:06:52] Epoch: 1 Batch: 17866/20099 (88.89%) Loss: 2.075050 LR: 0.00000711 +[17:06:53] Epoch: 1 Batch: 17867/20099 (88.89%) Loss: 1.876192 LR: 0.00000711 +[17:06:55] Epoch: 1 Batch: 17868/20099 (88.90%) Loss: 1.633713 LR: 0.00000711 +[17:06:57] Epoch: 1 Batch: 17869/20099 (88.90%) Loss: 1.890222 LR: 0.00000711 +[17:06:59] Epoch: 1 Batch: 17870/20099 (88.91%) Loss: 2.202323 LR: 0.00000711 +[17:07:01] Epoch: 1 Batch: 17871/20099 (88.91%) Loss: 2.102167 LR: 0.00000711 +[17:07:03] Epoch: 1 Batch: 17872/20099 (88.92%) Loss: 1.839270 LR: 0.00000710 +[17:07:05] Epoch: 1 Batch: 17873/20099 (88.92%) Loss: 2.015069 LR: 0.00000710 +[17:07:06] Epoch: 1 Batch: 17874/20099 (88.93%) Loss: 2.213577 LR: 0.00000710 +[17:07:08] Epoch: 1 Batch: 17875/20099 (88.93%) Loss: 2.445965 LR: 0.00000710 +[17:07:10] Epoch: 1 Batch: 17876/20099 (88.94%) Loss: 2.367826 LR: 0.00000710 +[17:07:12] Epoch: 1 Batch: 17877/20099 (88.94%) Loss: 1.769125 LR: 0.00000710 +[17:07:14] Epoch: 1 Batch: 17878/20099 (88.95%) Loss: 2.318062 LR: 0.00000710 +[17:07:16] Epoch: 1 Batch: 17879/20099 (88.95%) Loss: 2.199411 LR: 0.00000709 +[17:07:18] Epoch: 1 Batch: 17880/20099 (88.96%) Loss: 2.319661 LR: 0.00000709 +[17:07:19] Epoch: 1 Batch: 17881/20099 (88.96%) Loss: 1.950304 LR: 
0.00000709 +[17:07:21] Epoch: 1 Batch: 17882/20099 (88.97%) Loss: 2.120035 LR: 0.00000709 +[17:07:23] Epoch: 1 Batch: 17883/20099 (88.97%) Loss: 1.683664 LR: 0.00000709 +[17:07:25] Epoch: 1 Batch: 17884/20099 (88.98%) Loss: 2.361920 LR: 0.00000709 +[17:07:27] Epoch: 1 Batch: 17885/20099 (88.98%) Loss: 2.172902 LR: 0.00000709 +[17:07:29] Epoch: 1 Batch: 17886/20099 (88.99%) Loss: 2.611073 LR: 0.00000709 +[17:07:31] Epoch: 1 Batch: 17887/20099 (88.99%) Loss: 1.832123 LR: 0.00000709 +[17:07:32] Epoch: 1 Batch: 17888/20099 (89.00%) Loss: 2.060222 LR: 0.00000709 +[17:07:34] Epoch: 1 Batch: 17889/20099 (89.00%) Loss: 2.115753 LR: 0.00000709 +[17:07:36] Epoch: 1 Batch: 17890/20099 (89.01%) Loss: 2.018043 LR: 0.00000709 +[17:07:38] Epoch: 1 Batch: 17891/20099 (89.01%) Loss: 2.035562 LR: 0.00000709 +[17:07:40] Epoch: 1 Batch: 17892/20099 (89.02%) Loss: 1.761590 LR: 0.00000709 +[17:07:42] Epoch: 1 Batch: 17893/20099 (89.02%) Loss: 2.229523 LR: 0.00000708 +[17:07:44] Epoch: 1 Batch: 17894/20099 (89.03%) Loss: 2.173752 LR: 0.00000708 +[17:07:45] Epoch: 1 Batch: 17895/20099 (89.03%) Loss: 2.024402 LR: 0.00000708 +[17:07:47] Epoch: 1 Batch: 17896/20099 (89.04%) Loss: 1.905504 LR: 0.00000708 +[17:07:49] Epoch: 1 Batch: 17897/20099 (89.04%) Loss: 2.049084 LR: 0.00000708 +[17:07:51] Epoch: 1 Batch: 17898/20099 (89.05%) Loss: 2.113774 LR: 0.00000708 +[17:07:53] Epoch: 1 Batch: 17899/20099 (89.05%) Loss: 2.029503 LR: 0.00000708 +[17:07:55] Epoch: 1 Batch: 17900/20099 (89.06%) Loss: 2.032397 LR: 0.00000707 +[17:07:57] Epoch: 1 Batch: 17901/20099 (89.06%) Loss: 1.882246 LR: 0.00000707 +[17:07:58] Epoch: 1 Batch: 17902/20099 (89.07%) Loss: 2.137533 LR: 0.00000707 +[17:08:00] Epoch: 1 Batch: 17903/20099 (89.07%) Loss: 2.252595 LR: 0.00000707 +[17:08:02] Epoch: 1 Batch: 17904/20099 (89.08%) Loss: 1.929169 LR: 0.00000707 +[17:08:04] Epoch: 1 Batch: 17905/20099 (89.08%) Loss: 1.988222 LR: 0.00000707 +[17:08:06] Epoch: 1 Batch: 17906/20099 (89.09%) Loss: 2.147563 LR: 0.00000707 +[17:08:08] 
Epoch: 1 Batch: 17907/20099 (89.09%) Loss: 2.078092 LR: 0.00000707 +[17:08:10] Epoch: 1 Batch: 17908/20099 (89.10%) Loss: 1.783320 LR: 0.00000707 +[17:08:11] Epoch: 1 Batch: 17909/20099 (89.10%) Loss: 1.861536 LR: 0.00000707 +[17:08:13] Epoch: 1 Batch: 17910/20099 (89.11%) Loss: 2.264801 LR: 0.00000707 +[17:08:15] Epoch: 1 Batch: 17911/20099 (89.11%) Loss: 2.007893 LR: 0.00000707 +[17:08:17] Epoch: 1 Batch: 17912/20099 (89.12%) Loss: 2.245257 LR: 0.00000707 +[17:08:19] Epoch: 1 Batch: 17913/20099 (89.12%) Loss: 2.510348 LR: 0.00000707 +[17:08:21] Epoch: 1 Batch: 17914/20099 (89.13%) Loss: 2.187956 LR: 0.00000706 +[17:08:23] Epoch: 1 Batch: 17915/20099 (89.13%) Loss: 2.200603 LR: 0.00000706 +[17:08:24] Epoch: 1 Batch: 17916/20099 (89.14%) Loss: 2.246451 LR: 0.00000706 +[17:08:26] Epoch: 1 Batch: 17917/20099 (89.14%) Loss: 1.981159 LR: 0.00000706 +[17:08:28] Epoch: 1 Batch: 17918/20099 (89.15%) Loss: 2.116902 LR: 0.00000706 +[17:08:30] Epoch: 1 Batch: 17919/20099 (89.15%) Loss: 2.226731 LR: 0.00000706 +[17:08:32] Epoch: 1 Batch: 17920/20099 (89.16%) Loss: 2.142377 LR: 0.00000706 +[17:08:34] Epoch: 1 Batch: 17921/20099 (89.16%) Loss: 1.907858 LR: 0.00000705 +[17:08:36] Epoch: 1 Batch: 17922/20099 (89.17%) Loss: 1.893241 LR: 0.00000705 +[17:08:37] Epoch: 1 Batch: 17923/20099 (89.17%) Loss: 1.731871 LR: 0.00000705 +[17:08:39] Epoch: 1 Batch: 17924/20099 (89.18%) Loss: 1.828278 LR: 0.00000705 +[17:08:41] Epoch: 1 Batch: 17925/20099 (89.18%) Loss: 2.215572 LR: 0.00000705 +[17:08:43] Epoch: 1 Batch: 17926/20099 (89.19%) Loss: 1.900466 LR: 0.00000705 +[17:08:45] Epoch: 1 Batch: 17927/20099 (89.19%) Loss: 1.966420 LR: 0.00000705 +[17:08:47] Epoch: 1 Batch: 17928/20099 (89.20%) Loss: 1.700089 LR: 0.00000705 +[17:08:49] Epoch: 1 Batch: 17929/20099 (89.20%) Loss: 1.965002 LR: 0.00000705 +[17:08:50] Epoch: 1 Batch: 17930/20099 (89.21%) Loss: 1.595601 LR: 0.00000705 +[17:08:52] Epoch: 1 Batch: 17931/20099 (89.21%) Loss: 2.120149 LR: 0.00000705 +[17:08:54] Epoch: 1 Batch: 
17932/20099 (89.22%) Loss: 2.189367 LR: 0.00000705 +[17:08:56] Epoch: 1 Batch: 17933/20099 (89.22%) Loss: 2.095162 LR: 0.00000705 +[17:08:58] Epoch: 1 Batch: 17934/20099 (89.23%) Loss: 1.639656 LR: 0.00000705 +[17:09:00] Epoch: 1 Batch: 17935/20099 (89.23%) Loss: 1.996701 LR: 0.00000704 +[17:09:02] Epoch: 1 Batch: 17936/20099 (89.24%) Loss: 2.109386 LR: 0.00000704 +[17:09:03] Epoch: 1 Batch: 17937/20099 (89.24%) Loss: 2.138952 LR: 0.00000704 +[17:09:05] Epoch: 1 Batch: 17938/20099 (89.25%) Loss: 1.821160 LR: 0.00000704 +[17:09:07] Epoch: 1 Batch: 17939/20099 (89.25%) Loss: 1.806112 LR: 0.00000704 +[17:09:09] Epoch: 1 Batch: 17940/20099 (89.26%) Loss: 2.380305 LR: 0.00000704 +[17:09:11] Epoch: 1 Batch: 17941/20099 (89.26%) Loss: 1.861227 LR: 0.00000704 +[17:09:13] Epoch: 1 Batch: 17942/20099 (89.27%) Loss: 1.916560 LR: 0.00000703 +[17:09:15] Epoch: 1 Batch: 17943/20099 (89.27%) Loss: 2.018984 LR: 0.00000703 +[17:09:16] Epoch: 1 Batch: 17944/20099 (89.28%) Loss: 2.181723 LR: 0.00000703 +[17:09:18] Epoch: 1 Batch: 17945/20099 (89.28%) Loss: 1.963849 LR: 0.00000703 +[17:09:20] Epoch: 1 Batch: 17946/20099 (89.29%) Loss: 2.141358 LR: 0.00000703 +[17:09:22] Epoch: 1 Batch: 17947/20099 (89.29%) Loss: 2.262803 LR: 0.00000703 +[17:09:24] Epoch: 1 Batch: 17948/20099 (89.30%) Loss: 1.765720 LR: 0.00000703 +[17:09:26] Epoch: 1 Batch: 17949/20099 (89.30%) Loss: 1.969283 LR: 0.00000703 +[17:09:28] Epoch: 1 Batch: 17950/20099 (89.31%) Loss: 2.067737 LR: 0.00000703 +[17:09:29] Epoch: 1 Batch: 17951/20099 (89.31%) Loss: 1.729749 LR: 0.00000703 +[17:09:31] Epoch: 1 Batch: 17952/20099 (89.32%) Loss: 2.096179 LR: 0.00000703 +[17:09:33] Epoch: 1 Batch: 17953/20099 (89.32%) Loss: 2.010196 LR: 0.00000703 +[17:09:35] Epoch: 1 Batch: 17954/20099 (89.33%) Loss: 2.233782 LR: 0.00000703 +[17:09:37] Epoch: 1 Batch: 17955/20099 (89.33%) Loss: 1.900432 LR: 0.00000703 +[17:09:39] Epoch: 1 Batch: 17956/20099 (89.34%) Loss: 2.063324 LR: 0.00000702 +[17:09:40] Epoch: 1 Batch: 17957/20099 (89.34%) 
Loss: 1.771828 LR: 0.00000702 +[17:09:42] Epoch: 1 Batch: 17958/20099 (89.35%) Loss: 2.336871 LR: 0.00000702 +[17:09:44] Epoch: 1 Batch: 17959/20099 (89.35%) Loss: 2.374394 LR: 0.00000702 +[17:09:46] Epoch: 1 Batch: 17960/20099 (89.36%) Loss: 2.073898 LR: 0.00000702 +[17:09:48] Epoch: 1 Batch: 17961/20099 (89.36%) Loss: 2.097205 LR: 0.00000702 +[17:09:50] Epoch: 1 Batch: 17962/20099 (89.37%) Loss: 2.179702 LR: 0.00000702 +[17:09:52] Epoch: 1 Batch: 17963/20099 (89.37%) Loss: 2.099352 LR: 0.00000701 +[17:09:53] Epoch: 1 Batch: 17964/20099 (89.38%) Loss: 1.899663 LR: 0.00000701 +[17:09:55] Epoch: 1 Batch: 17965/20099 (89.38%) Loss: 2.231411 LR: 0.00000701 +[17:09:57] Epoch: 1 Batch: 17966/20099 (89.39%) Loss: 2.054854 LR: 0.00000701 +[17:09:59] Epoch: 1 Batch: 17967/20099 (89.39%) Loss: 2.403244 LR: 0.00000701 +[17:10:01] Epoch: 1 Batch: 17968/20099 (89.40%) Loss: 1.937529 LR: 0.00000701 +[17:10:03] Epoch: 1 Batch: 17969/20099 (89.40%) Loss: 2.144378 LR: 0.00000701 +[17:10:05] Epoch: 1 Batch: 17970/20099 (89.41%) Loss: 2.156329 LR: 0.00000701 +[17:10:06] Epoch: 1 Batch: 17971/20099 (89.41%) Loss: 2.115625 LR: 0.00000701 +[17:10:08] Epoch: 1 Batch: 17972/20099 (89.42%) Loss: 1.965944 LR: 0.00000701 +[17:10:10] Epoch: 1 Batch: 17973/20099 (89.42%) Loss: 2.171092 LR: 0.00000701 +[17:10:12] Epoch: 1 Batch: 17974/20099 (89.43%) Loss: 1.909823 LR: 0.00000701 +[17:10:14] Epoch: 1 Batch: 17975/20099 (89.43%) Loss: 2.093071 LR: 0.00000701 +[17:10:16] Epoch: 1 Batch: 17976/20099 (89.44%) Loss: 1.913140 LR: 0.00000701 +[17:10:17] Epoch: 1 Batch: 17977/20099 (89.44%) Loss: 2.326950 LR: 0.00000700 +[17:10:19] Epoch: 1 Batch: 17978/20099 (89.45%) Loss: 2.089114 LR: 0.00000700 +[17:10:21] Epoch: 1 Batch: 17979/20099 (89.45%) Loss: 2.009624 LR: 0.00000700 +[17:10:23] Epoch: 1 Batch: 17980/20099 (89.46%) Loss: 1.958285 LR: 0.00000700 +[17:10:25] Epoch: 1 Batch: 17981/20099 (89.46%) Loss: 2.094510 LR: 0.00000700 +[17:10:27] Epoch: 1 Batch: 17982/20099 (89.47%) Loss: 1.758223 LR: 
0.00000700 +[17:10:29] Epoch: 1 Batch: 17983/20099 (89.47%) Loss: 2.000707 LR: 0.00000700 +[17:10:30] Epoch: 1 Batch: 17984/20099 (89.48%) Loss: 1.948707 LR: 0.00000700 +[17:10:32] Epoch: 1 Batch: 17985/20099 (89.48%) Loss: 2.217270 LR: 0.00000700 +[17:10:34] Epoch: 1 Batch: 17986/20099 (89.49%) Loss: 2.099501 LR: 0.00000700 +[17:10:36] Epoch: 1 Batch: 17987/20099 (89.49%) Loss: 2.207113 LR: 0.00000700 +[17:10:38] Epoch: 1 Batch: 17988/20099 (89.50%) Loss: 2.160863 LR: 0.00000700 +[17:10:40] Epoch: 1 Batch: 17989/20099 (89.50%) Loss: 2.235569 LR: 0.00000700 +[17:10:42] Epoch: 1 Batch: 17990/20099 (89.51%) Loss: 1.662771 LR: 0.00000700 +[17:10:43] Epoch: 1 Batch: 17991/20099 (89.51%) Loss: 2.004698 LR: 0.00000699 +[17:10:45] Epoch: 1 Batch: 17992/20099 (89.52%) Loss: 2.117875 LR: 0.00000699 +[17:10:47] Epoch: 1 Batch: 17993/20099 (89.52%) Loss: 1.828228 LR: 0.00000699 +[17:10:49] Epoch: 1 Batch: 17994/20099 (89.53%) Loss: 2.016907 LR: 0.00000699 +[17:10:51] Epoch: 1 Batch: 17995/20099 (89.53%) Loss: 2.215260 LR: 0.00000699 +[17:10:53] Epoch: 1 Batch: 17996/20099 (89.54%) Loss: 1.861059 LR: 0.00000699 +[17:10:55] Epoch: 1 Batch: 17997/20099 (89.54%) Loss: 2.173875 LR: 0.00000699 +[17:10:56] Epoch: 1 Batch: 17998/20099 (89.55%) Loss: 2.036344 LR: 0.00000698 +[17:10:58] Epoch: 1 Batch: 17999/20099 (89.55%) Loss: 1.817745 LR: 0.00000698 +[17:11:00] >> Evaluating batch 0 +[17:11:01] >> Evaluating batch 1 +[17:11:02] >> Evaluating batch 2 +[17:11:03] >> Evaluating batch 3 +[17:11:04] >> Evaluating batch 4 +[17:11:06] >> Evaluating batch 5 +[17:11:07] >> Evaluating batch 6 +[17:11:08] >> Evaluating batch 7 +[17:11:09] >> Evaluating batch 8 +[17:11:10] >> Evaluating batch 9 +[17:11:11] >> Evaluating batch 10 +[17:11:12] >> Evaluating batch 11 +[17:11:13] >> Evaluating batch 12 +[17:11:14] >> Evaluating batch 13 +[17:11:15] >> Evaluating batch 14 +[17:11:16] >> Evaluating batch 15 +[17:11:17] >> Evaluating batch 16 +[17:11:17] Epoch: 1 Step: 18000/20099 Evaluation: 
+[17:11:17] Avg Loss Since Last Eval: 2.0751 Val Loss: 2.1479 Validation loss delta: 0.0002 Perplexity: 8.5672 LR: 0.00000698 +[17:11:21] >> Cleaned up old temp checkpoint: epoch1_step16000 +[17:11:21] >> Temp checkpoint saved: epoch1_step18000, size: 0.1693 GB +[17:11:25] >> Checkpoint saved: epoch1_step18000, size: 0.1693 GB +[17:11:25] Epoch: 1 Batch: 18000/20099 (89.56%) Loss: 2.004705 LR: 0.00000698 +[17:11:27] Epoch: 1 Batch: 18001/20099 (89.56%) Loss: 2.205158 LR: 0.00000698 +[17:11:28] Epoch: 1 Batch: 18002/20099 (89.57%) Loss: 2.016294 LR: 0.00000698 +[17:11:30] Epoch: 1 Batch: 18003/20099 (89.57%) Loss: 1.971382 LR: 0.00000698 +[17:11:32] Epoch: 1 Batch: 18004/20099 (89.58%) Loss: 1.886335 LR: 0.00000698 +[17:11:34] Epoch: 1 Batch: 18005/20099 (89.58%) Loss: 1.924114 LR: 0.00000698 +[17:11:36] Epoch: 1 Batch: 18006/20099 (89.59%) Loss: 1.895831 LR: 0.00000698 +[17:11:38] Epoch: 1 Batch: 18007/20099 (89.59%) Loss: 1.896155 LR: 0.00000698 +[17:11:39] Epoch: 1 Batch: 18008/20099 (89.60%) Loss: 2.063445 LR: 0.00000698 +[17:11:41] Epoch: 1 Batch: 18009/20099 (89.60%) Loss: 1.721303 LR: 0.00000698 +[17:11:43] Epoch: 1 Batch: 18010/20099 (89.61%) Loss: 2.263575 LR: 0.00000698 +[17:11:45] Epoch: 1 Batch: 18011/20099 (89.61%) Loss: 2.260409 LR: 0.00000698 +[17:11:47] Epoch: 1 Batch: 18012/20099 (89.62%) Loss: 2.441817 LR: 0.00000697 +[17:11:49] Epoch: 1 Batch: 18013/20099 (89.62%) Loss: 2.200558 LR: 0.00000697 +[17:11:51] Epoch: 1 Batch: 18014/20099 (89.63%) Loss: 2.163806 LR: 0.00000697 +[17:11:53] Epoch: 1 Batch: 18015/20099 (89.63%) Loss: 2.083430 LR: 0.00000697 +[17:11:55] Epoch: 1 Batch: 18016/20099 (89.64%) Loss: 2.095789 LR: 0.00000697 +[17:11:57] Epoch: 1 Batch: 18017/20099 (89.64%) Loss: 2.156080 LR: 0.00000697 +[17:11:59] Epoch: 1 Batch: 18018/20099 (89.65%) Loss: 1.983117 LR: 0.00000697 +[17:12:00] Epoch: 1 Batch: 18019/20099 (89.65%) Loss: 2.137215 LR: 0.00000696 +[17:12:02] Epoch: 1 Batch: 18020/20099 (89.66%) Loss: 2.316269 LR: 0.00000696 
+[17:12:04] Epoch: 1 Batch: 18021/20099 (89.66%) Loss: 2.132652 LR: 0.00000696 +[17:12:06] Epoch: 1 Batch: 18022/20099 (89.67%) Loss: 2.201175 LR: 0.00000696 +[17:12:08] Epoch: 1 Batch: 18023/20099 (89.67%) Loss: 2.304817 LR: 0.00000696 +[17:12:10] Epoch: 1 Batch: 18024/20099 (89.68%) Loss: 1.968195 LR: 0.00000696 +[17:12:12] Epoch: 1 Batch: 18025/20099 (89.68%) Loss: 2.180826 LR: 0.00000696 +[17:12:14] Epoch: 1 Batch: 18026/20099 (89.69%) Loss: 1.883281 LR: 0.00000696 +[17:12:15] Epoch: 1 Batch: 18027/20099 (89.69%) Loss: 2.215320 LR: 0.00000696 +[17:12:17] Epoch: 1 Batch: 18028/20099 (89.70%) Loss: 2.149981 LR: 0.00000696 +[17:12:19] Epoch: 1 Batch: 18029/20099 (89.70%) Loss: 2.217269 LR: 0.00000696 +[17:12:21] Epoch: 1 Batch: 18030/20099 (89.71%) Loss: 1.900046 LR: 0.00000696 +[17:12:23] Epoch: 1 Batch: 18031/20099 (89.71%) Loss: 2.091825 LR: 0.00000696 +[17:12:25] Epoch: 1 Batch: 18032/20099 (89.72%) Loss: 2.042341 LR: 0.00000696 +[17:12:26] Epoch: 1 Batch: 18033/20099 (89.72%) Loss: 2.202927 LR: 0.00000695 +[17:12:28] Epoch: 1 Batch: 18034/20099 (89.73%) Loss: 1.931984 LR: 0.00000695 +[17:12:30] Epoch: 1 Batch: 18035/20099 (89.73%) Loss: 2.253047 LR: 0.00000695 +[17:12:32] Epoch: 1 Batch: 18036/20099 (89.74%) Loss: 2.353540 LR: 0.00000695 +[17:12:34] Epoch: 1 Batch: 18037/20099 (89.74%) Loss: 2.253605 LR: 0.00000695 +[17:12:36] Epoch: 1 Batch: 18038/20099 (89.75%) Loss: 2.340044 LR: 0.00000695 +[17:12:37] Epoch: 1 Batch: 18039/20099 (89.75%) Loss: 2.103763 LR: 0.00000695 +[17:12:39] Epoch: 1 Batch: 18040/20099 (89.76%) Loss: 1.990899 LR: 0.00000694 +[17:12:41] Epoch: 1 Batch: 18041/20099 (89.76%) Loss: 2.345346 LR: 0.00000694 +[17:12:43] Epoch: 1 Batch: 18042/20099 (89.77%) Loss: 2.633908 LR: 0.00000694 +[17:12:45] Epoch: 1 Batch: 18043/20099 (89.77%) Loss: 1.917744 LR: 0.00000694 +[17:12:47] Epoch: 1 Batch: 18044/20099 (89.78%) Loss: 2.171669 LR: 0.00000694 +[17:12:49] Epoch: 1 Batch: 18045/20099 (89.78%) Loss: 2.174442 LR: 0.00000694 +[17:12:50] Epoch: 1 
Batch: 18046/20099 (89.79%) Loss: 2.062431 LR: 0.00000694 +[17:12:52] Epoch: 1 Batch: 18047/20099 (89.79%) Loss: 2.188600 LR: 0.00000694 +[17:12:54] Epoch: 1 Batch: 18048/20099 (89.80%) Loss: 2.104252 LR: 0.00000694 +[17:12:56] Epoch: 1 Batch: 18049/20099 (89.80%) Loss: 2.041443 LR: 0.00000694 +[17:12:58] Epoch: 1 Batch: 18050/20099 (89.81%) Loss: 2.104284 LR: 0.00000694 +[17:13:00] Epoch: 1 Batch: 18051/20099 (89.81%) Loss: 2.017040 LR: 0.00000694 +[17:13:02] Epoch: 1 Batch: 18052/20099 (89.82%) Loss: 2.321356 LR: 0.00000694 +[17:13:03] Epoch: 1 Batch: 18053/20099 (89.82%) Loss: 2.158650 LR: 0.00000694 +[17:13:05] Epoch: 1 Batch: 18054/20099 (89.83%) Loss: 2.085393 LR: 0.00000693 +[17:13:07] Epoch: 1 Batch: 18055/20099 (89.83%) Loss: 1.976495 LR: 0.00000693 +[17:13:09] Epoch: 1 Batch: 18056/20099 (89.84%) Loss: 2.142013 LR: 0.00000693 +[17:13:11] Epoch: 1 Batch: 18057/20099 (89.84%) Loss: 2.144199 LR: 0.00000693 +[17:13:13] Epoch: 1 Batch: 18058/20099 (89.85%) Loss: 1.637200 LR: 0.00000693 +[17:13:15] Epoch: 1 Batch: 18059/20099 (89.85%) Loss: 2.053703 LR: 0.00000693 +[17:13:16] Epoch: 1 Batch: 18060/20099 (89.86%) Loss: 2.196311 LR: 0.00000693 +[17:13:18] Epoch: 1 Batch: 18061/20099 (89.86%) Loss: 2.102333 LR: 0.00000693 +[17:13:20] Epoch: 1 Batch: 18062/20099 (89.87%) Loss: 2.406469 LR: 0.00000693 +[17:13:22] Epoch: 1 Batch: 18063/20099 (89.87%) Loss: 1.898297 LR: 0.00000693 +[17:13:24] Epoch: 1 Batch: 18064/20099 (89.88%) Loss: 1.802793 LR: 0.00000693 +[17:13:26] Epoch: 1 Batch: 18065/20099 (89.88%) Loss: 2.227165 LR: 0.00000693 +[17:13:28] Epoch: 1 Batch: 18066/20099 (89.89%) Loss: 2.096587 LR: 0.00000693 +[17:13:30] Epoch: 1 Batch: 18067/20099 (89.89%) Loss: 2.089689 LR: 0.00000693 +[17:13:32] Epoch: 1 Batch: 18068/20099 (89.90%) Loss: 2.231888 LR: 0.00000692 +[17:13:33] Epoch: 1 Batch: 18069/20099 (89.90%) Loss: 2.012552 LR: 0.00000692 +[17:13:35] Epoch: 1 Batch: 18070/20099 (89.90%) Loss: 2.365656 LR: 0.00000692 +[17:13:37] Epoch: 1 Batch: 18071/20099 
(89.91%) Loss: 2.353521 LR: 0.00000692 +[17:13:39] Epoch: 1 Batch: 18072/20099 (89.91%) Loss: 2.417355 LR: 0.00000692 +[17:13:41] Epoch: 1 Batch: 18073/20099 (89.92%) Loss: 2.153215 LR: 0.00000692 +[17:13:43] Epoch: 1 Batch: 18074/20099 (89.92%) Loss: 2.378214 LR: 0.00000692 +[17:13:45] Epoch: 1 Batch: 18075/20099 (89.93%) Loss: 1.890756 LR: 0.00000691 +[17:13:46] Epoch: 1 Batch: 18076/20099 (89.93%) Loss: 2.140655 LR: 0.00000691 +[17:13:48] Epoch: 1 Batch: 18077/20099 (89.94%) Loss: 2.364367 LR: 0.00000691 +[17:13:50] Epoch: 1 Batch: 18078/20099 (89.94%) Loss: 1.870022 LR: 0.00000691 +[17:13:52] Epoch: 1 Batch: 18079/20099 (89.95%) Loss: 2.101709 LR: 0.00000691 +[17:13:54] Epoch: 1 Batch: 18080/20099 (89.95%) Loss: 2.057960 LR: 0.00000691 +[17:13:56] Epoch: 1 Batch: 18081/20099 (89.96%) Loss: 2.152176 LR: 0.00000691 +[17:13:58] Epoch: 1 Batch: 18082/20099 (89.96%) Loss: 1.915035 LR: 0.00000691 +[17:13:59] Epoch: 1 Batch: 18083/20099 (89.97%) Loss: 2.147628 LR: 0.00000691 +[17:14:01] Epoch: 1 Batch: 18084/20099 (89.97%) Loss: 2.180744 LR: 0.00000691 +[17:14:03] Epoch: 1 Batch: 18085/20099 (89.98%) Loss: 2.220371 LR: 0.00000691 +[17:14:05] Epoch: 1 Batch: 18086/20099 (89.98%) Loss: 2.095800 LR: 0.00000691 +[17:14:07] Epoch: 1 Batch: 18087/20099 (89.99%) Loss: 2.179121 LR: 0.00000691 +[17:14:09] Epoch: 1 Batch: 18088/20099 (89.99%) Loss: 1.686664 LR: 0.00000691 +[17:14:11] Epoch: 1 Batch: 18089/20099 (90.00%) Loss: 2.466885 LR: 0.00000690 +[17:14:12] Epoch: 1 Batch: 18090/20099 (90.00%) Loss: 2.092649 LR: 0.00000690 +[17:14:14] Epoch: 1 Batch: 18091/20099 (90.01%) Loss: 1.834007 LR: 0.00000690 +[17:14:16] Epoch: 1 Batch: 18092/20099 (90.01%) Loss: 2.004737 LR: 0.00000690 +[17:14:18] Epoch: 1 Batch: 18093/20099 (90.02%) Loss: 2.088456 LR: 0.00000690 +[17:14:20] Epoch: 1 Batch: 18094/20099 (90.02%) Loss: 1.900372 LR: 0.00000690 +[17:14:22] Epoch: 1 Batch: 18095/20099 (90.03%) Loss: 2.085242 LR: 0.00000690 +[17:14:24] Epoch: 1 Batch: 18096/20099 (90.03%) Loss: 2.080117 
LR: 0.00000689 +[17:14:25] Epoch: 1 Batch: 18097/20099 (90.04%) Loss: 2.095410 LR: 0.00000689 +[17:14:27] Epoch: 1 Batch: 18098/20099 (90.04%) Loss: 1.875842 LR: 0.00000689 +[17:14:29] Epoch: 1 Batch: 18099/20099 (90.05%) Loss: 2.062492 LR: 0.00000689 +[17:14:31] Epoch: 1 Batch: 18100/20099 (90.05%) Loss: 1.912337 LR: 0.00000689 +[17:14:33] Epoch: 1 Batch: 18101/20099 (90.06%) Loss: 2.026867 LR: 0.00000689 +[17:14:35] Epoch: 1 Batch: 18102/20099 (90.06%) Loss: 2.027820 LR: 0.00000689 +[17:14:37] Epoch: 1 Batch: 18103/20099 (90.07%) Loss: 2.213296 LR: 0.00000689 +[17:14:38] Epoch: 1 Batch: 18104/20099 (90.07%) Loss: 1.902693 LR: 0.00000689 +[17:14:40] Epoch: 1 Batch: 18105/20099 (90.08%) Loss: 2.221216 LR: 0.00000689 +[17:14:42] Epoch: 1 Batch: 18106/20099 (90.08%) Loss: 2.161276 LR: 0.00000689 +[17:14:44] Epoch: 1 Batch: 18107/20099 (90.09%) Loss: 2.094465 LR: 0.00000689 +[17:14:46] Epoch: 1 Batch: 18108/20099 (90.09%) Loss: 2.090825 LR: 0.00000689 +[17:14:48] Epoch: 1 Batch: 18109/20099 (90.10%) Loss: 2.232857 LR: 0.00000689 +[17:14:49] Epoch: 1 Batch: 18110/20099 (90.10%) Loss: 2.170771 LR: 0.00000688 +[17:14:51] Epoch: 1 Batch: 18111/20099 (90.11%) Loss: 2.149915 LR: 0.00000688 +[17:14:53] Epoch: 1 Batch: 18112/20099 (90.11%) Loss: 2.066336 LR: 0.00000688 +[17:14:55] Epoch: 1 Batch: 18113/20099 (90.12%) Loss: 2.395573 LR: 0.00000688 +[17:14:57] Epoch: 1 Batch: 18114/20099 (90.12%) Loss: 2.088757 LR: 0.00000688 +[17:14:59] Epoch: 1 Batch: 18115/20099 (90.13%) Loss: 2.069268 LR: 0.00000688 +[17:15:01] Epoch: 1 Batch: 18116/20099 (90.13%) Loss: 2.037172 LR: 0.00000688 +[17:15:02] Epoch: 1 Batch: 18117/20099 (90.14%) Loss: 2.035388 LR: 0.00000688 +[17:15:04] Epoch: 1 Batch: 18118/20099 (90.14%) Loss: 2.022200 LR: 0.00000688 +[17:15:06] Epoch: 1 Batch: 18119/20099 (90.15%) Loss: 1.911481 LR: 0.00000688 +[17:15:08] Epoch: 1 Batch: 18120/20099 (90.15%) Loss: 2.212917 LR: 0.00000688 +[17:15:10] Epoch: 1 Batch: 18121/20099 (90.16%) Loss: 2.009455 LR: 0.00000688 
+[17:15:12] Epoch: 1 Batch: 18122/20099 (90.16%) Loss: 2.066230 LR: 0.00000688 +[17:15:13] Epoch: 1 Batch: 18123/20099 (90.17%) Loss: 2.180559 LR: 0.00000688 +[17:15:15] Epoch: 1 Batch: 18124/20099 (90.17%) Loss: 2.044644 LR: 0.00000687 +[17:15:17] Epoch: 1 Batch: 18125/20099 (90.18%) Loss: 1.972743 LR: 0.00000687 +[17:15:19] Epoch: 1 Batch: 18126/20099 (90.18%) Loss: 1.908948 LR: 0.00000687 +[17:15:21] Epoch: 1 Batch: 18127/20099 (90.19%) Loss: 2.022417 LR: 0.00000687 +[17:15:23] Epoch: 1 Batch: 18128/20099 (90.19%) Loss: 2.231846 LR: 0.00000687 +[17:15:25] Epoch: 1 Batch: 18129/20099 (90.20%) Loss: 2.077019 LR: 0.00000687 +[17:15:26] Epoch: 1 Batch: 18130/20099 (90.20%) Loss: 1.983520 LR: 0.00000687 +[17:15:28] Epoch: 1 Batch: 18131/20099 (90.21%) Loss: 2.256837 LR: 0.00000686 +[17:15:30] Epoch: 1 Batch: 18132/20099 (90.21%) Loss: 1.957459 LR: 0.00000686 +[17:15:32] Epoch: 1 Batch: 18133/20099 (90.22%) Loss: 2.282062 LR: 0.00000686 +[17:15:34] Epoch: 1 Batch: 18134/20099 (90.22%) Loss: 2.317946 LR: 0.00000686 +[17:15:36] Epoch: 1 Batch: 18135/20099 (90.23%) Loss: 1.969807 LR: 0.00000686 +[17:15:38] Epoch: 1 Batch: 18136/20099 (90.23%) Loss: 1.639954 LR: 0.00000686 +[17:15:39] Epoch: 1 Batch: 18137/20099 (90.24%) Loss: 2.172732 LR: 0.00000686 +[17:15:41] Epoch: 1 Batch: 18138/20099 (90.24%) Loss: 2.030382 LR: 0.00000686 +[17:15:43] Epoch: 1 Batch: 18139/20099 (90.25%) Loss: 2.183272 LR: 0.00000686 +[17:15:45] Epoch: 1 Batch: 18140/20099 (90.25%) Loss: 2.279426 LR: 0.00000686 +[17:15:47] Epoch: 1 Batch: 18141/20099 (90.26%) Loss: 1.901422 LR: 0.00000686 +[17:15:49] Epoch: 1 Batch: 18142/20099 (90.26%) Loss: 2.196802 LR: 0.00000686 +[17:15:51] Epoch: 1 Batch: 18143/20099 (90.27%) Loss: 2.221325 LR: 0.00000686 +[17:15:53] Epoch: 1 Batch: 18144/20099 (90.27%) Loss: 2.332128 LR: 0.00000686 +[17:15:54] Epoch: 1 Batch: 18145/20099 (90.28%) Loss: 1.914622 LR: 0.00000685 +[17:15:56] Epoch: 1 Batch: 18146/20099 (90.28%) Loss: 2.454961 LR: 0.00000685 +[17:15:58] Epoch: 1 
Batch: 18147/20099 (90.29%) Loss: 2.219870 LR: 0.00000685 +[17:16:00] Epoch: 1 Batch: 18148/20099 (90.29%) Loss: 2.045647 LR: 0.00000685 +[17:16:02] Epoch: 1 Batch: 18149/20099 (90.30%) Loss: 2.206931 LR: 0.00000685 +[17:16:04] Epoch: 1 Batch: 18150/20099 (90.30%) Loss: 2.210404 LR: 0.00000685 +[17:16:05] Epoch: 1 Batch: 18151/20099 (90.31%) Loss: 2.217705 LR: 0.00000685 +[17:16:07] Epoch: 1 Batch: 18152/20099 (90.31%) Loss: 2.103740 LR: 0.00000685 +[17:16:09] Epoch: 1 Batch: 18153/20099 (90.32%) Loss: 1.846803 LR: 0.00000685 +[17:16:11] Epoch: 1 Batch: 18154/20099 (90.32%) Loss: 2.026046 LR: 0.00000685 +[17:16:13] Epoch: 1 Batch: 18155/20099 (90.33%) Loss: 2.066137 LR: 0.00000685 +[17:16:15] Epoch: 1 Batch: 18156/20099 (90.33%) Loss: 2.143274 LR: 0.00000685 +[17:16:17] Epoch: 1 Batch: 18157/20099 (90.34%) Loss: 2.185949 LR: 0.00000685 +[17:16:18] Epoch: 1 Batch: 18158/20099 (90.34%) Loss: 1.726983 LR: 0.00000685 +[17:16:20] Epoch: 1 Batch: 18159/20099 (90.35%) Loss: 2.124674 LR: 0.00000684 +[17:16:22] Epoch: 1 Batch: 18160/20099 (90.35%) Loss: 1.914822 LR: 0.00000684 +[17:16:24] Epoch: 1 Batch: 18161/20099 (90.36%) Loss: 2.093619 LR: 0.00000684 +[17:16:26] Epoch: 1 Batch: 18162/20099 (90.36%) Loss: 1.973718 LR: 0.00000684 +[17:16:28] Epoch: 1 Batch: 18163/20099 (90.37%) Loss: 2.176095 LR: 0.00000684 +[17:16:29] Epoch: 1 Batch: 18164/20099 (90.37%) Loss: 2.025962 LR: 0.00000684 +[17:16:31] Epoch: 1 Batch: 18165/20099 (90.38%) Loss: 2.021997 LR: 0.00000684 +[17:16:33] Epoch: 1 Batch: 18166/20099 (90.38%) Loss: 1.903714 LR: 0.00000683 +[17:16:35] Epoch: 1 Batch: 18167/20099 (90.39%) Loss: 2.163500 LR: 0.00000683 +[17:16:37] Epoch: 1 Batch: 18168/20099 (90.39%) Loss: 2.056345 LR: 0.00000683 +[17:16:39] Epoch: 1 Batch: 18169/20099 (90.40%) Loss: 1.826703 LR: 0.00000683 +[17:16:41] Epoch: 1 Batch: 18170/20099 (90.40%) Loss: 1.704078 LR: 0.00000683 +[17:16:42] Epoch: 1 Batch: 18171/20099 (90.41%) Loss: 1.863752 LR: 0.00000683 +[17:16:44] Epoch: 1 Batch: 18172/20099 
(90.41%) Loss: 2.031860 LR: 0.00000683 +[17:16:46] Epoch: 1 Batch: 18173/20099 (90.42%) Loss: 1.657881 LR: 0.00000683 +[17:16:48] Epoch: 1 Batch: 18174/20099 (90.42%) Loss: 2.188251 LR: 0.00000683 +[17:16:50] Epoch: 1 Batch: 18175/20099 (90.43%) Loss: 2.008827 LR: 0.00000683 +[17:16:52] Epoch: 1 Batch: 18176/20099 (90.43%) Loss: 1.992982 LR: 0.00000683 +[17:16:53] Epoch: 1 Batch: 18177/20099 (90.44%) Loss: 2.056496 LR: 0.00000683 +[17:16:55] Epoch: 1 Batch: 18178/20099 (90.44%) Loss: 2.167754 LR: 0.00000683 +[17:16:57] Epoch: 1 Batch: 18179/20099 (90.45%) Loss: 2.165286 LR: 0.00000683 +[17:16:59] Epoch: 1 Batch: 18180/20099 (90.45%) Loss: 2.116682 LR: 0.00000682 +[17:17:01] Epoch: 1 Batch: 18181/20099 (90.46%) Loss: 2.274207 LR: 0.00000682 +[17:17:03] Epoch: 1 Batch: 18182/20099 (90.46%) Loss: 2.332131 LR: 0.00000682 +[17:17:05] Epoch: 1 Batch: 18183/20099 (90.47%) Loss: 2.094793 LR: 0.00000682 +[17:17:06] Epoch: 1 Batch: 18184/20099 (90.47%) Loss: 1.919041 LR: 0.00000682 +[17:17:08] Epoch: 1 Batch: 18185/20099 (90.48%) Loss: 1.875718 LR: 0.00000682 +[17:17:10] Epoch: 1 Batch: 18186/20099 (90.48%) Loss: 1.858859 LR: 0.00000682 +[17:17:12] Epoch: 1 Batch: 18187/20099 (90.49%) Loss: 2.229933 LR: 0.00000682 +[17:17:14] Epoch: 1 Batch: 18188/20099 (90.49%) Loss: 1.889033 LR: 0.00000682 +[17:17:16] Epoch: 1 Batch: 18189/20099 (90.50%) Loss: 2.172183 LR: 0.00000682 +[17:17:18] Epoch: 1 Batch: 18190/20099 (90.50%) Loss: 1.922470 LR: 0.00000682 +[17:17:19] Epoch: 1 Batch: 18191/20099 (90.51%) Loss: 2.084778 LR: 0.00000682 +[17:17:21] Epoch: 1 Batch: 18192/20099 (90.51%) Loss: 1.843529 LR: 0.00000682 +[17:17:23] Epoch: 1 Batch: 18193/20099 (90.52%) Loss: 2.037268 LR: 0.00000682 +[17:17:25] Epoch: 1 Batch: 18194/20099 (90.52%) Loss: 2.015238 LR: 0.00000681 +[17:17:27] Epoch: 1 Batch: 18195/20099 (90.53%) Loss: 1.959667 LR: 0.00000681 +[17:17:29] Epoch: 1 Batch: 18196/20099 (90.53%) Loss: 2.263777 LR: 0.00000681 +[17:17:31] Epoch: 1 Batch: 18197/20099 (90.54%) Loss: 2.256994 
LR: 0.00000681 +[17:17:32] Epoch: 1 Batch: 18198/20099 (90.54%) Loss: 2.079698 LR: 0.00000681 +[17:17:34] Epoch: 1 Batch: 18199/20099 (90.55%) Loss: 2.081349 LR: 0.00000681 +[17:17:40] >> Cleaned up old temp checkpoint: epoch1_step16200 +[17:17:40] >> Temp checkpoint saved: epoch1_step18200, size: 0.1693 GB +[17:17:40] Epoch: 1 Batch: 18200/20099 (90.55%) Loss: 2.523502 LR: 0.00000681 +[17:17:42] Epoch: 1 Batch: 18201/20099 (90.56%) Loss: 2.030546 LR: 0.00000680 +[17:17:44] Epoch: 1 Batch: 18202/20099 (90.56%) Loss: 2.279252 LR: 0.00000680 +[17:17:45] Epoch: 1 Batch: 18203/20099 (90.57%) Loss: 2.193962 LR: 0.00000680 +[17:17:47] Epoch: 1 Batch: 18204/20099 (90.57%) Loss: 2.022379 LR: 0.00000680 +[17:17:49] Epoch: 1 Batch: 18205/20099 (90.58%) Loss: 2.336188 LR: 0.00000680 +[17:17:51] Epoch: 1 Batch: 18206/20099 (90.58%) Loss: 2.039440 LR: 0.00000680 +[17:17:53] Epoch: 1 Batch: 18207/20099 (90.59%) Loss: 1.648145 LR: 0.00000680 +[17:17:55] Epoch: 1 Batch: 18208/20099 (90.59%) Loss: 2.088834 LR: 0.00000680 +[17:17:57] Epoch: 1 Batch: 18209/20099 (90.60%) Loss: 2.494401 LR: 0.00000680 +[17:17:58] Epoch: 1 Batch: 18210/20099 (90.60%) Loss: 2.227100 LR: 0.00000680 +[17:18:00] Epoch: 1 Batch: 18211/20099 (90.61%) Loss: 2.416119 LR: 0.00000680 +[17:18:02] Epoch: 1 Batch: 18212/20099 (90.61%) Loss: 2.062824 LR: 0.00000680 +[17:18:04] Epoch: 1 Batch: 18213/20099 (90.62%) Loss: 1.908146 LR: 0.00000680 +[17:18:06] Epoch: 1 Batch: 18214/20099 (90.62%) Loss: 2.121631 LR: 0.00000680 +[17:18:08] Epoch: 1 Batch: 18215/20099 (90.63%) Loss: 1.950527 LR: 0.00000679 +[17:18:10] Epoch: 1 Batch: 18216/20099 (90.63%) Loss: 1.918663 LR: 0.00000679 +[17:18:11] Epoch: 1 Batch: 18217/20099 (90.64%) Loss: 1.962832 LR: 0.00000679 +[17:18:13] Epoch: 1 Batch: 18218/20099 (90.64%) Loss: 1.997926 LR: 0.00000679 +[17:18:15] Epoch: 1 Batch: 18219/20099 (90.65%) Loss: 1.877568 LR: 0.00000679 +[17:18:17] Epoch: 1 Batch: 18220/20099 (90.65%) Loss: 2.161923 LR: 0.00000679 +[17:18:19] Epoch: 1 Batch: 
18221/20099 (90.66%) Loss: 2.402639 LR: 0.00000679 +[17:18:21] Epoch: 1 Batch: 18222/20099 (90.66%) Loss: 2.319711 LR: 0.00000679 +[17:18:23] Epoch: 1 Batch: 18223/20099 (90.67%) Loss: 2.000299 LR: 0.00000679 +[17:18:24] Epoch: 1 Batch: 18224/20099 (90.67%) Loss: 2.109338 LR: 0.00000679 +[17:18:26] Epoch: 1 Batch: 18225/20099 (90.68%) Loss: 1.997355 LR: 0.00000679 +[17:18:28] Epoch: 1 Batch: 18226/20099 (90.68%) Loss: 2.038446 LR: 0.00000679 +[17:18:30] Epoch: 1 Batch: 18227/20099 (90.69%) Loss: 1.971798 LR: 0.00000679 +[17:18:32] Epoch: 1 Batch: 18228/20099 (90.69%) Loss: 1.743580 LR: 0.00000679 +[17:18:34] Epoch: 1 Batch: 18229/20099 (90.70%) Loss: 2.134880 LR: 0.00000678 +[17:18:36] Epoch: 1 Batch: 18230/20099 (90.70%) Loss: 2.037946 LR: 0.00000678 +[17:18:37] Epoch: 1 Batch: 18231/20099 (90.71%) Loss: 2.280535 LR: 0.00000678 +[17:18:39] Epoch: 1 Batch: 18232/20099 (90.71%) Loss: 2.319334 LR: 0.00000678 +[17:18:41] Epoch: 1 Batch: 18233/20099 (90.72%) Loss: 2.229506 LR: 0.00000678 +[17:18:43] Epoch: 1 Batch: 18234/20099 (90.72%) Loss: 2.069950 LR: 0.00000678 +[17:18:45] Epoch: 1 Batch: 18235/20099 (90.73%) Loss: 2.044002 LR: 0.00000678 +[17:18:47] Epoch: 1 Batch: 18236/20099 (90.73%) Loss: 2.199251 LR: 0.00000678 +[17:18:49] Epoch: 1 Batch: 18237/20099 (90.74%) Loss: 2.085798 LR: 0.00000678 +[17:18:51] Epoch: 1 Batch: 18238/20099 (90.74%) Loss: 2.021058 LR: 0.00000678 +[17:18:52] Epoch: 1 Batch: 18239/20099 (90.75%) Loss: 2.407196 LR: 0.00000678 +[17:18:54] Epoch: 1 Batch: 18240/20099 (90.75%) Loss: 2.024598 LR: 0.00000678 +[17:18:56] Epoch: 1 Batch: 18241/20099 (90.76%) Loss: 2.081717 LR: 0.00000678 +[17:18:58] Epoch: 1 Batch: 18242/20099 (90.76%) Loss: 1.911353 LR: 0.00000678 +[17:19:00] Epoch: 1 Batch: 18243/20099 (90.77%) Loss: 2.043551 LR: 0.00000677 +[17:19:02] Epoch: 1 Batch: 18244/20099 (90.77%) Loss: 2.257400 LR: 0.00000677 +[17:19:03] Epoch: 1 Batch: 18245/20099 (90.78%) Loss: 2.075047 LR: 0.00000677 +[17:19:05] Epoch: 1 Batch: 18246/20099 (90.78%) 
Loss: 2.165115 LR: 0.00000677 +[17:19:07] Epoch: 1 Batch: 18247/20099 (90.79%) Loss: 2.416836 LR: 0.00000677 +[17:19:09] Epoch: 1 Batch: 18248/20099 (90.79%) Loss: 2.076576 LR: 0.00000677 +[17:19:11] Epoch: 1 Batch: 18249/20099 (90.80%) Loss: 2.016102 LR: 0.00000677 +[17:19:13] Epoch: 1 Batch: 18250/20099 (90.80%) Loss: 2.277062 LR: 0.00000676 +[17:19:15] Epoch: 1 Batch: 18251/20099 (90.81%) Loss: 2.048624 LR: 0.00000676 +[17:19:16] Epoch: 1 Batch: 18252/20099 (90.81%) Loss: 2.034063 LR: 0.00000676 +[17:19:18] Epoch: 1 Batch: 18253/20099 (90.82%) Loss: 2.170323 LR: 0.00000676 +[17:19:20] Epoch: 1 Batch: 18254/20099 (90.82%) Loss: 2.185003 LR: 0.00000676 +[17:19:22] Epoch: 1 Batch: 18255/20099 (90.83%) Loss: 2.027947 LR: 0.00000676 +[17:19:24] Epoch: 1 Batch: 18256/20099 (90.83%) Loss: 2.100636 LR: 0.00000676 +[17:19:26] Epoch: 1 Batch: 18257/20099 (90.84%) Loss: 2.147670 LR: 0.00000676 +[17:19:28] Epoch: 1 Batch: 18258/20099 (90.84%) Loss: 2.421621 LR: 0.00000676 +[17:19:29] Epoch: 1 Batch: 18259/20099 (90.85%) Loss: 2.165501 LR: 0.00000676 +[17:19:31] Epoch: 1 Batch: 18260/20099 (90.85%) Loss: 2.042848 LR: 0.00000676 +[17:19:33] Epoch: 1 Batch: 18261/20099 (90.86%) Loss: 2.172343 LR: 0.00000676 +[17:19:35] Epoch: 1 Batch: 18262/20099 (90.86%) Loss: 2.005523 LR: 0.00000676 +[17:19:37] Epoch: 1 Batch: 18263/20099 (90.87%) Loss: 2.114500 LR: 0.00000676 +[17:19:39] Epoch: 1 Batch: 18264/20099 (90.87%) Loss: 1.780577 LR: 0.00000675 +[17:19:41] Epoch: 1 Batch: 18265/20099 (90.88%) Loss: 2.120799 LR: 0.00000675 +[17:19:42] Epoch: 1 Batch: 18266/20099 (90.88%) Loss: 2.259013 LR: 0.00000675 +[17:19:44] Epoch: 1 Batch: 18267/20099 (90.89%) Loss: 2.079130 LR: 0.00000675 +[17:19:46] Epoch: 1 Batch: 18268/20099 (90.89%) Loss: 2.325318 LR: 0.00000675 +[17:19:48] Epoch: 1 Batch: 18269/20099 (90.90%) Loss: 2.008859 LR: 0.00000675 +[17:19:50] Epoch: 1 Batch: 18270/20099 (90.90%) Loss: 2.069320 LR: 0.00000675 +[17:19:52] Epoch: 1 Batch: 18271/20099 (90.91%) Loss: 2.120573 LR: 
0.00000675 +[17:19:53] Epoch: 1 Batch: 18272/20099 (90.91%) Loss: 1.977761 LR: 0.00000675 +[17:19:55] Epoch: 1 Batch: 18273/20099 (90.91%) Loss: 2.163534 LR: 0.00000675 +[17:19:57] Epoch: 1 Batch: 18274/20099 (90.92%) Loss: 1.993819 LR: 0.00000675 +[17:19:59] Epoch: 1 Batch: 18275/20099 (90.92%) Loss: 1.991727 LR: 0.00000675 +[17:20:01] Epoch: 1 Batch: 18276/20099 (90.93%) Loss: 1.737078 LR: 0.00000675 +[17:20:03] Epoch: 1 Batch: 18277/20099 (90.93%) Loss: 2.163844 LR: 0.00000675 +[17:20:05] Epoch: 1 Batch: 18278/20099 (90.94%) Loss: 2.635512 LR: 0.00000674 +[17:20:06] Epoch: 1 Batch: 18279/20099 (90.94%) Loss: 1.964423 LR: 0.00000674 +[17:20:08] Epoch: 1 Batch: 18280/20099 (90.95%) Loss: 2.352328 LR: 0.00000674 +[17:20:10] Epoch: 1 Batch: 18281/20099 (90.95%) Loss: 2.581562 LR: 0.00000674 +[17:20:12] Epoch: 1 Batch: 18282/20099 (90.96%) Loss: 2.068626 LR: 0.00000674 +[17:20:14] Epoch: 1 Batch: 18283/20099 (90.96%) Loss: 2.196370 LR: 0.00000674 +[17:20:16] Epoch: 1 Batch: 18284/20099 (90.97%) Loss: 2.100296 LR: 0.00000674 +[17:20:18] Epoch: 1 Batch: 18285/20099 (90.97%) Loss: 1.817886 LR: 0.00000674 +[17:20:19] Epoch: 1 Batch: 18286/20099 (90.98%) Loss: 2.005213 LR: 0.00000674 +[17:20:21] Epoch: 1 Batch: 18287/20099 (90.98%) Loss: 2.257515 LR: 0.00000674 +[17:20:23] Epoch: 1 Batch: 18288/20099 (90.99%) Loss: 2.070512 LR: 0.00000674 +[17:20:25] Epoch: 1 Batch: 18289/20099 (90.99%) Loss: 2.233605 LR: 0.00000674 +[17:20:27] Epoch: 1 Batch: 18290/20099 (91.00%) Loss: 1.956137 LR: 0.00000674 +[17:20:29] Epoch: 1 Batch: 18291/20099 (91.00%) Loss: 1.892496 LR: 0.00000674 +[17:20:31] Epoch: 1 Batch: 18292/20099 (91.01%) Loss: 2.244969 LR: 0.00000673 +[17:20:33] Epoch: 1 Batch: 18293/20099 (91.01%) Loss: 2.247425 LR: 0.00000673 +[17:20:34] Epoch: 1 Batch: 18294/20099 (91.02%) Loss: 2.115074 LR: 0.00000673 +[17:20:36] Epoch: 1 Batch: 18295/20099 (91.02%) Loss: 1.971923 LR: 0.00000673 +[17:20:38] Epoch: 1 Batch: 18296/20099 (91.03%) Loss: 2.238026 LR: 0.00000673 +[17:20:40] 
Epoch: 1 Batch: 18297/20099 (91.03%) Loss: 2.173589 LR: 0.00000673 +[17:20:42] Epoch: 1 Batch: 18298/20099 (91.04%) Loss: 2.057692 LR: 0.00000673 +[17:20:44] Epoch: 1 Batch: 18299/20099 (91.04%) Loss: 1.731912 LR: 0.00000672 +[17:20:46] Epoch: 1 Batch: 18300/20099 (91.05%) Loss: 2.360095 LR: 0.00000672 +[17:20:47] Epoch: 1 Batch: 18301/20099 (91.05%) Loss: 2.147245 LR: 0.00000672 +[17:20:49] Epoch: 1 Batch: 18302/20099 (91.06%) Loss: 2.054597 LR: 0.00000672 +[17:20:51] Epoch: 1 Batch: 18303/20099 (91.06%) Loss: 1.745960 LR: 0.00000672 +[17:20:53] Epoch: 1 Batch: 18304/20099 (91.07%) Loss: 2.321170 LR: 0.00000672 +[17:20:55] Epoch: 1 Batch: 18305/20099 (91.07%) Loss: 1.976469 LR: 0.00000672 +[17:20:57] Epoch: 1 Batch: 18306/20099 (91.08%) Loss: 1.805069 LR: 0.00000672 +[17:20:59] Epoch: 1 Batch: 18307/20099 (91.08%) Loss: 1.898974 LR: 0.00000672 +[17:21:00] Epoch: 1 Batch: 18308/20099 (91.09%) Loss: 2.018485 LR: 0.00000672 +[17:21:02] Epoch: 1 Batch: 18309/20099 (91.09%) Loss: 2.045979 LR: 0.00000672 +[17:21:04] Epoch: 1 Batch: 18310/20099 (91.10%) Loss: 2.436414 LR: 0.00000672 +[17:21:06] Epoch: 1 Batch: 18311/20099 (91.10%) Loss: 1.983081 LR: 0.00000672 +[17:21:08] Epoch: 1 Batch: 18312/20099 (91.11%) Loss: 1.722857 LR: 0.00000672 +[17:21:10] Epoch: 1 Batch: 18313/20099 (91.11%) Loss: 1.791212 LR: 0.00000671 +[17:21:12] Epoch: 1 Batch: 18314/20099 (91.12%) Loss: 2.322313 LR: 0.00000671 +[17:21:13] Epoch: 1 Batch: 18315/20099 (91.12%) Loss: 2.223823 LR: 0.00000671 +[17:21:15] Epoch: 1 Batch: 18316/20099 (91.13%) Loss: 2.138594 LR: 0.00000671 +[17:21:17] Epoch: 1 Batch: 18317/20099 (91.13%) Loss: 2.237443 LR: 0.00000671 +[17:21:19] Epoch: 1 Batch: 18318/20099 (91.14%) Loss: 2.198127 LR: 0.00000671 +[17:21:21] Epoch: 1 Batch: 18319/20099 (91.14%) Loss: 2.305162 LR: 0.00000671 +[17:21:23] Epoch: 1 Batch: 18320/20099 (91.15%) Loss: 2.279042 LR: 0.00000671 +[17:21:25] Epoch: 1 Batch: 18321/20099 (91.15%) Loss: 2.049140 LR: 0.00000671 +[17:21:27] Epoch: 1 Batch: 
18322/20099 (91.16%) Loss: 1.891444 LR: 0.00000671 +[17:21:28] Epoch: 1 Batch: 18323/20099 (91.16%) Loss: 1.871263 LR: 0.00000671 +[17:21:30] Epoch: 1 Batch: 18324/20099 (91.17%) Loss: 2.406608 LR: 0.00000671 +[17:21:32] Epoch: 1 Batch: 18325/20099 (91.17%) Loss: 1.669776 LR: 0.00000671 +[17:21:34] Epoch: 1 Batch: 18326/20099 (91.18%) Loss: 2.239606 LR: 0.00000671 +[17:21:36] Epoch: 1 Batch: 18327/20099 (91.18%) Loss: 1.950848 LR: 0.00000670 +[17:21:38] Epoch: 1 Batch: 18328/20099 (91.19%) Loss: 1.796362 LR: 0.00000670 +[17:21:40] Epoch: 1 Batch: 18329/20099 (91.19%) Loss: 1.811717 LR: 0.00000670 +[17:21:41] Epoch: 1 Batch: 18330/20099 (91.20%) Loss: 2.029404 LR: 0.00000670 +[17:21:43] Epoch: 1 Batch: 18331/20099 (91.20%) Loss: 2.034078 LR: 0.00000670 +[17:21:45] Epoch: 1 Batch: 18332/20099 (91.21%) Loss: 1.942078 LR: 0.00000670 +[17:21:47] Epoch: 1 Batch: 18333/20099 (91.21%) Loss: 2.045277 LR: 0.00000670 +[17:21:49] Epoch: 1 Batch: 18334/20099 (91.22%) Loss: 2.144887 LR: 0.00000670 +[17:21:51] Epoch: 1 Batch: 18335/20099 (91.22%) Loss: 2.002846 LR: 0.00000670 +[17:21:53] Epoch: 1 Batch: 18336/20099 (91.23%) Loss: 1.899409 LR: 0.00000670 +[17:21:54] Epoch: 1 Batch: 18337/20099 (91.23%) Loss: 1.897315 LR: 0.00000670 +[17:21:56] Epoch: 1 Batch: 18338/20099 (91.24%) Loss: 2.173879 LR: 0.00000670 +[17:21:58] Epoch: 1 Batch: 18339/20099 (91.24%) Loss: 2.136481 LR: 0.00000670 +[17:22:00] Epoch: 1 Batch: 18340/20099 (91.25%) Loss: 2.128942 LR: 0.00000670 +[17:22:02] Epoch: 1 Batch: 18341/20099 (91.25%) Loss: 2.503619 LR: 0.00000669 +[17:22:04] Epoch: 1 Batch: 18342/20099 (91.26%) Loss: 2.130661 LR: 0.00000669 +[17:22:06] Epoch: 1 Batch: 18343/20099 (91.26%) Loss: 2.113306 LR: 0.00000669 +[17:22:07] Epoch: 1 Batch: 18344/20099 (91.27%) Loss: 2.172758 LR: 0.00000669 +[17:22:09] Epoch: 1 Batch: 18345/20099 (91.27%) Loss: 1.736252 LR: 0.00000669 +[17:22:11] Epoch: 1 Batch: 18346/20099 (91.28%) Loss: 2.086050 LR: 0.00000669 +[17:22:13] Epoch: 1 Batch: 18347/20099 (91.28%) 
Loss: 1.931801 LR: 0.00000669 +[17:22:15] Epoch: 1 Batch: 18348/20099 (91.29%) Loss: 1.932867 LR: 0.00000669 +[17:22:17] Epoch: 1 Batch: 18349/20099 (91.29%) Loss: 2.122732 LR: 0.00000669 +[17:22:19] Epoch: 1 Batch: 18350/20099 (91.30%) Loss: 1.977980 LR: 0.00000669 +[17:22:20] Epoch: 1 Batch: 18351/20099 (91.30%) Loss: 2.219812 LR: 0.00000669 +[17:22:22] Epoch: 1 Batch: 18352/20099 (91.31%) Loss: 2.149518 LR: 0.00000669 +[17:22:24] Epoch: 1 Batch: 18353/20099 (91.31%) Loss: 2.171475 LR: 0.00000669 +[17:22:26] Epoch: 1 Batch: 18354/20099 (91.32%) Loss: 1.790500 LR: 0.00000669 +[17:22:28] Epoch: 1 Batch: 18355/20099 (91.32%) Loss: 1.642383 LR: 0.00000668 +[17:22:30] Epoch: 1 Batch: 18356/20099 (91.33%) Loss: 1.978318 LR: 0.00000668 +[17:22:31] Epoch: 1 Batch: 18357/20099 (91.33%) Loss: 1.906888 LR: 0.00000668 +[17:22:33] Epoch: 1 Batch: 18358/20099 (91.34%) Loss: 2.045049 LR: 0.00000668 +[17:22:35] Epoch: 1 Batch: 18359/20099 (91.34%) Loss: 2.393996 LR: 0.00000668 +[17:22:37] Epoch: 1 Batch: 18360/20099 (91.35%) Loss: 2.415705 LR: 0.00000668 +[17:22:39] Epoch: 1 Batch: 18361/20099 (91.35%) Loss: 1.838671 LR: 0.00000668 +[17:22:41] Epoch: 1 Batch: 18362/20099 (91.36%) Loss: 1.911799 LR: 0.00000668 +[17:22:43] Epoch: 1 Batch: 18363/20099 (91.36%) Loss: 1.934760 LR: 0.00000668 +[17:22:44] Epoch: 1 Batch: 18364/20099 (91.37%) Loss: 1.930239 LR: 0.00000668 +[17:22:46] Epoch: 1 Batch: 18365/20099 (91.37%) Loss: 2.068558 LR: 0.00000668 +[17:22:48] Epoch: 1 Batch: 18366/20099 (91.38%) Loss: 2.095570 LR: 0.00000668 +[17:22:50] Epoch: 1 Batch: 18367/20099 (91.38%) Loss: 2.100580 LR: 0.00000668 +[17:22:52] Epoch: 1 Batch: 18368/20099 (91.39%) Loss: 1.865878 LR: 0.00000668 +[17:22:54] Epoch: 1 Batch: 18369/20099 (91.39%) Loss: 2.188638 LR: 0.00000667 +[17:22:55] Epoch: 1 Batch: 18370/20099 (91.40%) Loss: 2.192326 LR: 0.00000667 +[17:22:57] Epoch: 1 Batch: 18371/20099 (91.40%) Loss: 2.295754 LR: 0.00000667 +[17:22:59] Epoch: 1 Batch: 18372/20099 (91.41%) Loss: 1.689284 LR: 
0.00000667 +[17:23:01] Epoch: 1 Batch: 18373/20099 (91.41%) Loss: 2.270655 LR: 0.00000667 +[17:23:03] Epoch: 1 Batch: 18374/20099 (91.42%) Loss: 2.085134 LR: 0.00000667 +[17:23:05] Epoch: 1 Batch: 18375/20099 (91.42%) Loss: 2.369422 LR: 0.00000667 +[17:23:07] Epoch: 1 Batch: 18376/20099 (91.43%) Loss: 1.928385 LR: 0.00000666 +[17:23:09] Epoch: 1 Batch: 18377/20099 (91.43%) Loss: 2.055173 LR: 0.00000666 +[17:23:10] Epoch: 1 Batch: 18378/20099 (91.44%) Loss: 2.043360 LR: 0.00000666 +[17:23:12] Epoch: 1 Batch: 18379/20099 (91.44%) Loss: 2.200934 LR: 0.00000666 +[17:23:14] Epoch: 1 Batch: 18380/20099 (91.45%) Loss: 2.039237 LR: 0.00000666 +[17:23:16] Epoch: 1 Batch: 18381/20099 (91.45%) Loss: 2.202638 LR: 0.00000666 +[17:23:18] Epoch: 1 Batch: 18382/20099 (91.46%) Loss: 2.025159 LR: 0.00000666 +[17:23:20] Epoch: 1 Batch: 18383/20099 (91.46%) Loss: 2.141991 LR: 0.00000666 +[17:23:22] Epoch: 1 Batch: 18384/20099 (91.47%) Loss: 2.219597 LR: 0.00000666 +[17:23:24] Epoch: 1 Batch: 18385/20099 (91.47%) Loss: 1.981840 LR: 0.00000666 +[17:23:25] Epoch: 1 Batch: 18386/20099 (91.48%) Loss: 2.142974 LR: 0.00000666 +[17:23:27] Epoch: 1 Batch: 18387/20099 (91.48%) Loss: 2.116699 LR: 0.00000666 +[17:23:29] Epoch: 1 Batch: 18388/20099 (91.49%) Loss: 1.905943 LR: 0.00000666 +[17:23:31] Epoch: 1 Batch: 18389/20099 (91.49%) Loss: 1.965544 LR: 0.00000666 +[17:23:33] Epoch: 1 Batch: 18390/20099 (91.50%) Loss: 2.064021 LR: 0.00000665 +[17:23:35] Epoch: 1 Batch: 18391/20099 (91.50%) Loss: 2.184979 LR: 0.00000665 +[17:23:37] Epoch: 1 Batch: 18392/20099 (91.51%) Loss: 1.990743 LR: 0.00000665 +[17:23:38] Epoch: 1 Batch: 18393/20099 (91.51%) Loss: 1.844092 LR: 0.00000665 +[17:23:40] Epoch: 1 Batch: 18394/20099 (91.52%) Loss: 2.140414 LR: 0.00000665 +[17:23:42] Epoch: 1 Batch: 18395/20099 (91.52%) Loss: 1.959634 LR: 0.00000665 +[17:23:44] Epoch: 1 Batch: 18396/20099 (91.53%) Loss: 2.201645 LR: 0.00000665 +[17:23:46] Epoch: 1 Batch: 18397/20099 (91.53%) Loss: 2.017674 LR: 0.00000665 +[17:23:48] 
Epoch: 1 Batch: 18398/20099 (91.54%) Loss: 1.887501 LR: 0.00000665 +[17:23:50] Epoch: 1 Batch: 18399/20099 (91.54%) Loss: 1.976553 LR: 0.00000665 +[17:23:55] >> Cleaned up old temp checkpoint: epoch1_step16400 +[17:23:55] >> Temp checkpoint saved: epoch1_step18400, size: 0.1693 GB +[17:23:55] Epoch: 1 Batch: 18400/20099 (91.55%) Loss: 2.249673 LR: 0.00000665 +[17:23:57] Epoch: 1 Batch: 18401/20099 (91.55%) Loss: 2.009481 LR: 0.00000665 +[17:23:59] Epoch: 1 Batch: 18402/20099 (91.56%) Loss: 2.090874 LR: 0.00000665 +[17:24:00] Epoch: 1 Batch: 18403/20099 (91.56%) Loss: 1.690160 LR: 0.00000665 +[17:24:02] Epoch: 1 Batch: 18404/20099 (91.57%) Loss: 1.949294 LR: 0.00000664 +[17:24:04] Epoch: 1 Batch: 18405/20099 (91.57%) Loss: 1.991221 LR: 0.00000664 +[17:24:06] Epoch: 1 Batch: 18406/20099 (91.58%) Loss: 2.097525 LR: 0.00000664 +[17:24:08] Epoch: 1 Batch: 18407/20099 (91.58%) Loss: 2.059528 LR: 0.00000664 +[17:24:10] Epoch: 1 Batch: 18408/20099 (91.59%) Loss: 2.054844 LR: 0.00000664 +[17:24:12] Epoch: 1 Batch: 18409/20099 (91.59%) Loss: 1.852958 LR: 0.00000664 +[17:24:13] Epoch: 1 Batch: 18410/20099 (91.60%) Loss: 2.152265 LR: 0.00000664 +[17:24:15] Epoch: 1 Batch: 18411/20099 (91.60%) Loss: 1.833290 LR: 0.00000664 +[17:24:17] Epoch: 1 Batch: 18412/20099 (91.61%) Loss: 2.328901 LR: 0.00000664 +[17:24:19] Epoch: 1 Batch: 18413/20099 (91.61%) Loss: 2.168030 LR: 0.00000664 +[17:24:21] Epoch: 1 Batch: 18414/20099 (91.62%) Loss: 1.804657 LR: 0.00000664 +[17:24:23] Epoch: 1 Batch: 18415/20099 (91.62%) Loss: 2.174877 LR: 0.00000664 +[17:24:25] Epoch: 1 Batch: 18416/20099 (91.63%) Loss: 2.063121 LR: 0.00000664 +[17:24:27] Epoch: 1 Batch: 18417/20099 (91.63%) Loss: 2.119265 LR: 0.00000664 +[17:24:28] Epoch: 1 Batch: 18418/20099 (91.64%) Loss: 2.232717 LR: 0.00000663 +[17:24:30] Epoch: 1 Batch: 18419/20099 (91.64%) Loss: 2.120742 LR: 0.00000663 +[17:24:32] Epoch: 1 Batch: 18420/20099 (91.65%) Loss: 2.199342 LR: 0.00000663 +[17:24:34] Epoch: 1 Batch: 18421/20099 (91.65%) Loss: 
2.155524 LR: 0.00000663 +[17:24:36] Epoch: 1 Batch: 18422/20099 (91.66%) Loss: 1.961869 LR: 0.00000663 +[17:24:38] Epoch: 1 Batch: 18423/20099 (91.66%) Loss: 1.773931 LR: 0.00000663 +[17:24:40] Epoch: 1 Batch: 18424/20099 (91.67%) Loss: 2.082604 LR: 0.00000663 +[17:24:42] Epoch: 1 Batch: 18425/20099 (91.67%) Loss: 1.982359 LR: 0.00000663 +[17:24:43] Epoch: 1 Batch: 18426/20099 (91.68%) Loss: 1.930046 LR: 0.00000663 +[17:24:45] Epoch: 1 Batch: 18427/20099 (91.68%) Loss: 2.135198 LR: 0.00000663 +[17:24:47] Epoch: 1 Batch: 18428/20099 (91.69%) Loss: 1.780113 LR: 0.00000663 +[17:24:49] Epoch: 1 Batch: 18429/20099 (91.69%) Loss: 2.316741 LR: 0.00000663 +[17:24:51] Epoch: 1 Batch: 18430/20099 (91.70%) Loss: 2.105590 LR: 0.00000663 +[17:24:53] Epoch: 1 Batch: 18431/20099 (91.70%) Loss: 2.206619 LR: 0.00000663 +[17:24:55] Epoch: 1 Batch: 18432/20099 (91.71%) Loss: 2.088426 LR: 0.00000662 +[17:24:56] Epoch: 1 Batch: 18433/20099 (91.71%) Loss: 1.767898 LR: 0.00000662 +[17:24:58] Epoch: 1 Batch: 18434/20099 (91.72%) Loss: 1.984496 LR: 0.00000662 +[17:25:00] Epoch: 1 Batch: 18435/20099 (91.72%) Loss: 1.812868 LR: 0.00000662 +[17:25:02] Epoch: 1 Batch: 18436/20099 (91.73%) Loss: 2.266970 LR: 0.00000662 +[17:25:04] Epoch: 1 Batch: 18437/20099 (91.73%) Loss: 2.125788 LR: 0.00000662 +[17:25:06] Epoch: 1 Batch: 18438/20099 (91.74%) Loss: 2.031799 LR: 0.00000662 +[17:25:08] Epoch: 1 Batch: 18439/20099 (91.74%) Loss: 2.156986 LR: 0.00000662 +[17:25:09] Epoch: 1 Batch: 18440/20099 (91.75%) Loss: 2.113350 LR: 0.00000662 +[17:25:11] Epoch: 1 Batch: 18441/20099 (91.75%) Loss: 2.007675 LR: 0.00000662 +[17:25:13] Epoch: 1 Batch: 18442/20099 (91.76%) Loss: 2.235642 LR: 0.00000662 +[17:25:15] Epoch: 1 Batch: 18443/20099 (91.76%) Loss: 2.294758 LR: 0.00000662 +[17:25:17] Epoch: 1 Batch: 18444/20099 (91.77%) Loss: 2.291122 LR: 0.00000662 +[17:25:19] Epoch: 1 Batch: 18445/20099 (91.77%) Loss: 1.818213 LR: 0.00000662 +[17:25:21] Epoch: 1 Batch: 18446/20099 (91.78%) Loss: 2.424020 LR: 0.00000661 
+[17:25:22] Epoch: 1 Batch: 18447/20099 (91.78%) Loss: 2.229614 LR: 0.00000661 +[17:25:24] Epoch: 1 Batch: 18448/20099 (91.79%) Loss: 2.290539 LR: 0.00000661 +[17:25:26] Epoch: 1 Batch: 18449/20099 (91.79%) Loss: 2.015176 LR: 0.00000661 +[17:25:28] Epoch: 1 Batch: 18450/20099 (91.80%) Loss: 1.939888 LR: 0.00000661 +[17:25:30] Epoch: 1 Batch: 18451/20099 (91.80%) Loss: 2.155565 LR: 0.00000661 +[17:25:32] Epoch: 1 Batch: 18452/20099 (91.81%) Loss: 2.225537 LR: 0.00000661 +[17:25:33] Epoch: 1 Batch: 18453/20099 (91.81%) Loss: 2.078254 LR: 0.00000661 +[17:25:35] Epoch: 1 Batch: 18454/20099 (91.82%) Loss: 2.303090 LR: 0.00000661 +[17:25:37] Epoch: 1 Batch: 18455/20099 (91.82%) Loss: 2.247781 LR: 0.00000661 +[17:25:39] Epoch: 1 Batch: 18456/20099 (91.83%) Loss: 2.372938 LR: 0.00000661 +[17:25:41] Epoch: 1 Batch: 18457/20099 (91.83%) Loss: 2.056391 LR: 0.00000661 +[17:25:43] Epoch: 1 Batch: 18458/20099 (91.84%) Loss: 2.242462 LR: 0.00000661 +[17:25:45] Epoch: 1 Batch: 18459/20099 (91.84%) Loss: 2.127981 LR: 0.00000661 +[17:25:46] Epoch: 1 Batch: 18460/20099 (91.85%) Loss: 2.217747 LR: 0.00000660 +[17:25:48] Epoch: 1 Batch: 18461/20099 (91.85%) Loss: 2.080657 LR: 0.00000660 +[17:25:50] Epoch: 1 Batch: 18462/20099 (91.86%) Loss: 2.056182 LR: 0.00000660 +[17:25:52] Epoch: 1 Batch: 18463/20099 (91.86%) Loss: 2.178015 LR: 0.00000660 +[17:25:54] Epoch: 1 Batch: 18464/20099 (91.87%) Loss: 2.217439 LR: 0.00000660 +[17:25:56] Epoch: 1 Batch: 18465/20099 (91.87%) Loss: 1.958147 LR: 0.00000660 +[17:25:58] Epoch: 1 Batch: 18466/20099 (91.88%) Loss: 1.988809 LR: 0.00000660 +[17:25:59] Epoch: 1 Batch: 18467/20099 (91.88%) Loss: 2.021551 LR: 0.00000660 +[17:26:01] Epoch: 1 Batch: 18468/20099 (91.89%) Loss: 2.137855 LR: 0.00000660 +[17:26:03] Epoch: 1 Batch: 18469/20099 (91.89%) Loss: 2.297027 LR: 0.00000660 +[17:26:05] Epoch: 1 Batch: 18470/20099 (91.90%) Loss: 1.779277 LR: 0.00000660 +[17:26:07] Epoch: 1 Batch: 18471/20099 (91.90%) Loss: 2.017470 LR: 0.00000660 +[17:26:09] Epoch: 1 
Batch: 18472/20099 (91.91%) Loss: 2.142831 LR: 0.00000660 +[17:26:11] Epoch: 1 Batch: 18473/20099 (91.91%) Loss: 2.102731 LR: 0.00000660 +[17:26:13] Epoch: 1 Batch: 18474/20099 (91.92%) Loss: 2.043601 LR: 0.00000659 +[17:26:14] Epoch: 1 Batch: 18475/20099 (91.92%) Loss: 1.955654 LR: 0.00000659 +[17:26:16] Epoch: 1 Batch: 18476/20099 (91.92%) Loss: 2.155729 LR: 0.00000659 +[17:26:18] Epoch: 1 Batch: 18477/20099 (91.93%) Loss: 2.310088 LR: 0.00000659 +[17:26:20] Epoch: 1 Batch: 18478/20099 (91.93%) Loss: 2.086790 LR: 0.00000659 +[17:26:22] Epoch: 1 Batch: 18479/20099 (91.94%) Loss: 1.949475 LR: 0.00000659 +[17:26:24] Epoch: 1 Batch: 18480/20099 (91.94%) Loss: 2.324889 LR: 0.00000659 +[17:26:26] Epoch: 1 Batch: 18481/20099 (91.95%) Loss: 2.137190 LR: 0.00000659 +[17:26:27] Epoch: 1 Batch: 18482/20099 (91.95%) Loss: 2.033202 LR: 0.00000659 +[17:26:29] Epoch: 1 Batch: 18483/20099 (91.96%) Loss: 1.879902 LR: 0.00000659 +[17:26:31] Epoch: 1 Batch: 18484/20099 (91.96%) Loss: 1.733433 LR: 0.00000659 +[17:26:33] Epoch: 1 Batch: 18485/20099 (91.97%) Loss: 1.865685 LR: 0.00000659 +[17:26:35] Epoch: 1 Batch: 18486/20099 (91.97%) Loss: 2.006053 LR: 0.00000659 +[17:26:37] Epoch: 1 Batch: 18487/20099 (91.98%) Loss: 2.027108 LR: 0.00000659 +[17:26:39] Epoch: 1 Batch: 18488/20099 (91.98%) Loss: 2.125189 LR: 0.00000658 +[17:26:40] Epoch: 1 Batch: 18489/20099 (91.99%) Loss: 1.993268 LR: 0.00000658 +[17:26:42] Epoch: 1 Batch: 18490/20099 (91.99%) Loss: 2.025223 LR: 0.00000658 +[17:26:44] Epoch: 1 Batch: 18491/20099 (92.00%) Loss: 1.954008 LR: 0.00000658 +[17:26:46] Epoch: 1 Batch: 18492/20099 (92.00%) Loss: 2.049920 LR: 0.00000658 +[17:26:48] Epoch: 1 Batch: 18493/20099 (92.01%) Loss: 2.060196 LR: 0.00000658 +[17:26:50] Epoch: 1 Batch: 18494/20099 (92.01%) Loss: 2.221396 LR: 0.00000658 +[17:26:52] Epoch: 1 Batch: 18495/20099 (92.02%) Loss: 1.949296 LR: 0.00000658 +[17:26:54] Epoch: 1 Batch: 18496/20099 (92.02%) Loss: 2.216684 LR: 0.00000658 +[17:26:55] Epoch: 1 Batch: 18497/20099 
(92.03%) Loss: 1.797877 LR: 0.00000658 +[17:26:57] Epoch: 1 Batch: 18498/20099 (92.03%) Loss: 2.202593 LR: 0.00000658 +[17:26:59] Epoch: 1 Batch: 18499/20099 (92.04%) Loss: 1.974794 LR: 0.00000658 +[17:27:01] >> Evaluating batch 0 +[17:27:02] >> Evaluating batch 1 +[17:27:03] >> Evaluating batch 2 +[17:27:04] >> Evaluating batch 3 +[17:27:05] >> Evaluating batch 4 +[17:27:06] >> Evaluating batch 5 +[17:27:07] >> Evaluating batch 6 +[17:27:08] >> Evaluating batch 7 +[17:27:10] >> Evaluating batch 8 +[17:27:11] >> Evaluating batch 9 +[17:27:12] >> Evaluating batch 10 +[17:27:13] >> Evaluating batch 11 +[17:27:14] >> Evaluating batch 12 +[17:27:15] >> Evaluating batch 13 +[17:27:16] >> Evaluating batch 14 +[17:27:17] >> Evaluating batch 15 +[17:27:18] >> Evaluating batch 16 +[17:27:18] Epoch: 1 Step: 18500/20099 Evaluation: +[17:27:18] Avg Loss Since Last Eval: 2.0856 Val Loss: 2.1458 Validation loss delta: -0.0021 Perplexity: 8.5491 LR: 0.00000658 +[17:27:22] >> Checkpoint saved: epoch1_step18500, size: 0.1693 GB +[17:27:22] Epoch: 1 Batch: 18500/20099 (92.04%) Loss: 2.070910 LR: 0.00000658 +[17:27:24] Epoch: 1 Batch: 18501/20099 (92.05%) Loss: 2.200188 LR: 0.00000658 +[17:27:25] Epoch: 1 Batch: 18502/20099 (92.05%) Loss: 1.681633 LR: 0.00000657 +[17:27:27] Epoch: 1 Batch: 18503/20099 (92.06%) Loss: 2.008803 LR: 0.00000657 +[17:27:29] Epoch: 1 Batch: 18504/20099 (92.06%) Loss: 2.209777 LR: 0.00000657 +[17:27:31] Epoch: 1 Batch: 18505/20099 (92.07%) Loss: 2.010547 LR: 0.00000657 +[17:27:33] Epoch: 1 Batch: 18506/20099 (92.07%) Loss: 2.352647 LR: 0.00000657 +[17:27:35] Epoch: 1 Batch: 18507/20099 (92.08%) Loss: 2.049237 LR: 0.00000657 +[17:27:37] Epoch: 1 Batch: 18508/20099 (92.08%) Loss: 2.254377 LR: 0.00000657 +[17:27:38] Epoch: 1 Batch: 18509/20099 (92.09%) Loss: 1.905163 LR: 0.00000657 +[17:27:40] Epoch: 1 Batch: 18510/20099 (92.09%) Loss: 2.159025 LR: 0.00000657 +[17:27:42] Epoch: 1 Batch: 18511/20099 (92.10%) Loss: 2.104522 LR: 0.00000657 +[17:27:44] Epoch: 1 
Batch: 18512/20099 (92.10%) Loss: 2.090782 LR: 0.00000657 +[17:27:46] Epoch: 1 Batch: 18513/20099 (92.11%) Loss: 1.868900 LR: 0.00000657 +[17:27:48] Epoch: 1 Batch: 18514/20099 (92.11%) Loss: 2.178253 LR: 0.00000657 +[17:27:50] Epoch: 1 Batch: 18515/20099 (92.12%) Loss: 1.943151 LR: 0.00000657 +[17:27:52] Epoch: 1 Batch: 18516/20099 (92.12%) Loss: 2.097134 LR: 0.00000656 +[17:27:53] Epoch: 1 Batch: 18517/20099 (92.13%) Loss: 2.108878 LR: 0.00000656 +[17:27:55] Epoch: 1 Batch: 18518/20099 (92.13%) Loss: 1.919691 LR: 0.00000656 +[17:27:57] Epoch: 1 Batch: 18519/20099 (92.14%) Loss: 2.186897 LR: 0.00000656 +[17:27:59] Epoch: 1 Batch: 18520/20099 (92.14%) Loss: 2.186325 LR: 0.00000656 +[17:28:01] Epoch: 1 Batch: 18521/20099 (92.15%) Loss: 2.180423 LR: 0.00000656 +[17:28:03] Epoch: 1 Batch: 18522/20099 (92.15%) Loss: 1.992442 LR: 0.00000656 +[17:28:05] Epoch: 1 Batch: 18523/20099 (92.16%) Loss: 2.119346 LR: 0.00000656 +[17:28:07] Epoch: 1 Batch: 18524/20099 (92.16%) Loss: 2.086420 LR: 0.00000656 +[17:28:08] Epoch: 1 Batch: 18525/20099 (92.17%) Loss: 2.232122 LR: 0.00000656 +[17:28:10] Epoch: 1 Batch: 18526/20099 (92.17%) Loss: 2.112742 LR: 0.00000656 +[17:28:12] Epoch: 1 Batch: 18527/20099 (92.18%) Loss: 2.139407 LR: 0.00000656 +[17:28:14] Epoch: 1 Batch: 18528/20099 (92.18%) Loss: 1.721443 LR: 0.00000656 +[17:28:16] Epoch: 1 Batch: 18529/20099 (92.19%) Loss: 2.120173 LR: 0.00000656 +[17:28:18] Epoch: 1 Batch: 18530/20099 (92.19%) Loss: 1.952200 LR: 0.00000655 +[17:28:20] Epoch: 1 Batch: 18531/20099 (92.20%) Loss: 2.082420 LR: 0.00000655 +[17:28:21] Epoch: 1 Batch: 18532/20099 (92.20%) Loss: 1.943797 LR: 0.00000655 +[17:28:23] Epoch: 1 Batch: 18533/20099 (92.21%) Loss: 2.337121 LR: 0.00000655 +[17:28:25] Epoch: 1 Batch: 18534/20099 (92.21%) Loss: 2.154261 LR: 0.00000655 +[17:28:27] Epoch: 1 Batch: 18535/20099 (92.22%) Loss: 1.921984 LR: 0.00000655 +[17:28:29] Epoch: 1 Batch: 18536/20099 (92.22%) Loss: 2.241860 LR: 0.00000655 +[17:28:31] Epoch: 1 Batch: 18537/20099 
(92.23%) Loss: 2.158810 LR: 0.00000655 +[17:28:32] Epoch: 1 Batch: 18538/20099 (92.23%) Loss: 2.122480 LR: 0.00000655 +[17:28:34] Epoch: 1 Batch: 18539/20099 (92.24%) Loss: 2.175948 LR: 0.00000655 +[17:28:36] Epoch: 1 Batch: 18540/20099 (92.24%) Loss: 2.282676 LR: 0.00000655 +[17:28:38] Epoch: 1 Batch: 18541/20099 (92.25%) Loss: 1.977975 LR: 0.00000655 +[17:28:40] Epoch: 1 Batch: 18542/20099 (92.25%) Loss: 2.406963 LR: 0.00000655 +[17:28:42] Epoch: 1 Batch: 18543/20099 (92.26%) Loss: 2.249956 LR: 0.00000655 +[17:28:44] Epoch: 1 Batch: 18544/20099 (92.26%) Loss: 2.353301 LR: 0.00000654 +[17:28:45] Epoch: 1 Batch: 18545/20099 (92.27%) Loss: 1.787629 LR: 0.00000654 +[17:28:47] Epoch: 1 Batch: 18546/20099 (92.27%) Loss: 2.286444 LR: 0.00000654 +[17:28:49] Epoch: 1 Batch: 18547/20099 (92.28%) Loss: 1.700065 LR: 0.00000654 +[17:28:51] Epoch: 1 Batch: 18548/20099 (92.28%) Loss: 1.937444 LR: 0.00000654 +[17:28:53] Epoch: 1 Batch: 18549/20099 (92.29%) Loss: 1.863976 LR: 0.00000654 +[17:28:55] Epoch: 1 Batch: 18550/20099 (92.29%) Loss: 1.998828 LR: 0.00000654 +[17:28:56] Epoch: 1 Batch: 18551/20099 (92.30%) Loss: 2.319018 LR: 0.00000654 +[17:28:58] Epoch: 1 Batch: 18552/20099 (92.30%) Loss: 2.218042 LR: 0.00000654 +[17:29:00] Epoch: 1 Batch: 18553/20099 (92.31%) Loss: 2.173389 LR: 0.00000654 +[17:29:02] Epoch: 1 Batch: 18554/20099 (92.31%) Loss: 1.863856 LR: 0.00000654 +[17:29:04] Epoch: 1 Batch: 18555/20099 (92.32%) Loss: 2.091495 LR: 0.00000654 +[17:29:06] Epoch: 1 Batch: 18556/20099 (92.32%) Loss: 1.889516 LR: 0.00000654 +[17:29:08] Epoch: 1 Batch: 18557/20099 (92.33%) Loss: 1.814129 LR: 0.00000654 +[17:29:09] Epoch: 1 Batch: 18558/20099 (92.33%) Loss: 2.095965 LR: 0.00000653 +[17:29:11] Epoch: 1 Batch: 18559/20099 (92.34%) Loss: 2.098662 LR: 0.00000653 +[17:29:13] Epoch: 1 Batch: 18560/20099 (92.34%) Loss: 2.084881 LR: 0.00000653 +[17:29:15] Epoch: 1 Batch: 18561/20099 (92.35%) Loss: 1.856266 LR: 0.00000653 +[17:29:17] Epoch: 1 Batch: 18562/20099 (92.35%) Loss: 2.208614 
LR: 0.00000653 +[17:29:19] Epoch: 1 Batch: 18563/20099 (92.36%) Loss: 2.038973 LR: 0.00000653 +[17:29:21] Epoch: 1 Batch: 18564/20099 (92.36%) Loss: 2.149385 LR: 0.00000653 +[17:29:23] Epoch: 1 Batch: 18565/20099 (92.37%) Loss: 1.968051 LR: 0.00000653 +[17:29:24] Epoch: 1 Batch: 18566/20099 (92.37%) Loss: 1.950612 LR: 0.00000653 +[17:29:26] Epoch: 1 Batch: 18567/20099 (92.38%) Loss: 2.178420 LR: 0.00000653 +[17:29:28] Epoch: 1 Batch: 18568/20099 (92.38%) Loss: 2.039620 LR: 0.00000653 +[17:29:30] Epoch: 1 Batch: 18569/20099 (92.39%) Loss: 2.145299 LR: 0.00000653 +[17:29:32] Epoch: 1 Batch: 18570/20099 (92.39%) Loss: 2.082948 LR: 0.00000653 +[17:29:34] Epoch: 1 Batch: 18571/20099 (92.40%) Loss: 2.105550 LR: 0.00000653 +[17:29:36] Epoch: 1 Batch: 18572/20099 (92.40%) Loss: 2.055319 LR: 0.00000652 +[17:29:37] Epoch: 1 Batch: 18573/20099 (92.41%) Loss: 1.954497 LR: 0.00000652 +[17:29:39] Epoch: 1 Batch: 18574/20099 (92.41%) Loss: 2.095882 LR: 0.00000652 +[17:29:41] Epoch: 1 Batch: 18575/20099 (92.42%) Loss: 2.423534 LR: 0.00000652 +[17:29:43] Epoch: 1 Batch: 18576/20099 (92.42%) Loss: 1.885827 LR: 0.00000652 +[17:29:45] Epoch: 1 Batch: 18577/20099 (92.43%) Loss: 2.016953 LR: 0.00000652 +[17:29:47] Epoch: 1 Batch: 18578/20099 (92.43%) Loss: 1.802325 LR: 0.00000652 +[17:29:49] Epoch: 1 Batch: 18579/20099 (92.44%) Loss: 2.585333 LR: 0.00000652 +[17:29:50] Epoch: 1 Batch: 18580/20099 (92.44%) Loss: 2.004861 LR: 0.00000652 +[17:29:52] Epoch: 1 Batch: 18581/20099 (92.45%) Loss: 2.285186 LR: 0.00000652 +[17:29:54] Epoch: 1 Batch: 18582/20099 (92.45%) Loss: 2.204913 LR: 0.00000652 +[17:29:56] Epoch: 1 Batch: 18583/20099 (92.46%) Loss: 2.206007 LR: 0.00000652 +[17:29:58] Epoch: 1 Batch: 18584/20099 (92.46%) Loss: 2.077108 LR: 0.00000652 +[17:30:00] Epoch: 1 Batch: 18585/20099 (92.47%) Loss: 2.128728 LR: 0.00000652 +[17:30:01] Epoch: 1 Batch: 18586/20099 (92.47%) Loss: 2.184425 LR: 0.00000651 +[17:30:03] Epoch: 1 Batch: 18587/20099 (92.48%) Loss: 2.174702 LR: 0.00000651 
+[17:30:05] Epoch: 1 Batch: 18588/20099 (92.48%) Loss: 1.979606 LR: 0.00000651 +[17:30:07] Epoch: 1 Batch: 18589/20099 (92.49%) Loss: 1.969262 LR: 0.00000651 +[17:30:09] Epoch: 1 Batch: 18590/20099 (92.49%) Loss: 2.291821 LR: 0.00000651 +[17:30:11] Epoch: 1 Batch: 18591/20099 (92.50%) Loss: 2.020004 LR: 0.00000651 +[17:30:13] Epoch: 1 Batch: 18592/20099 (92.50%) Loss: 2.005043 LR: 0.00000651 +[17:30:14] Epoch: 1 Batch: 18593/20099 (92.51%) Loss: 1.991050 LR: 0.00000651 +[17:30:16] Epoch: 1 Batch: 18594/20099 (92.51%) Loss: 2.192938 LR: 0.00000651 +[17:30:18] Epoch: 1 Batch: 18595/20099 (92.52%) Loss: 1.842924 LR: 0.00000651 +[17:30:20] Epoch: 1 Batch: 18596/20099 (92.52%) Loss: 1.967454 LR: 0.00000651 +[17:30:22] Epoch: 1 Batch: 18597/20099 (92.53%) Loss: 1.984706 LR: 0.00000651 +[17:30:24] Epoch: 1 Batch: 18598/20099 (92.53%) Loss: 2.036189 LR: 0.00000651 +[17:30:25] Epoch: 1 Batch: 18599/20099 (92.54%) Loss: 2.076400 LR: 0.00000651 +[17:30:31] >> Cleaned up old temp checkpoint: epoch1_step16600 +[17:30:31] >> Temp checkpoint saved: epoch1_step18600, size: 0.1693 GB +[17:30:31] Epoch: 1 Batch: 18600/20099 (92.54%) Loss: 2.136930 LR: 0.00000650 +[17:30:33] Epoch: 1 Batch: 18601/20099 (92.55%) Loss: 2.203590 LR: 0.00000650 +[17:30:35] Epoch: 1 Batch: 18602/20099 (92.55%) Loss: 2.242135 LR: 0.00000650 +[17:30:36] Epoch: 1 Batch: 18603/20099 (92.56%) Loss: 2.149366 LR: 0.00000650 +[17:30:38] Epoch: 1 Batch: 18604/20099 (92.56%) Loss: 2.079739 LR: 0.00000650 +[17:30:40] Epoch: 1 Batch: 18605/20099 (92.57%) Loss: 2.105687 LR: 0.00000650 +[17:30:42] Epoch: 1 Batch: 18606/20099 (92.57%) Loss: 2.024431 LR: 0.00000650 +[17:30:44] Epoch: 1 Batch: 18607/20099 (92.58%) Loss: 2.034639 LR: 0.00000650 +[17:30:46] Epoch: 1 Batch: 18608/20099 (92.58%) Loss: 2.312706 LR: 0.00000650 +[17:30:47] Epoch: 1 Batch: 18609/20099 (92.59%) Loss: 1.852452 LR: 0.00000650 +[17:30:49] Epoch: 1 Batch: 18610/20099 (92.59%) Loss: 1.923555 LR: 0.00000650 +[17:30:51] Epoch: 1 Batch: 18611/20099 
(92.60%) Loss: 2.000809 LR: 0.00000650 +[17:30:53] Epoch: 1 Batch: 18612/20099 (92.60%) Loss: 1.888199 LR: 0.00000650 +[17:30:55] Epoch: 1 Batch: 18613/20099 (92.61%) Loss: 2.238435 LR: 0.00000650 +[17:30:57] Epoch: 1 Batch: 18614/20099 (92.61%) Loss: 1.866315 LR: 0.00000650 +[17:30:59] Epoch: 1 Batch: 18615/20099 (92.62%) Loss: 1.947032 LR: 0.00000650 +[17:31:00] Epoch: 1 Batch: 18616/20099 (92.62%) Loss: 2.433438 LR: 0.00000650 +[17:31:02] Epoch: 1 Batch: 18617/20099 (92.63%) Loss: 2.131267 LR: 0.00000650 +[17:31:04] Epoch: 1 Batch: 18618/20099 (92.63%) Loss: 1.948051 LR: 0.00000650 +[17:31:06] Epoch: 1 Batch: 18619/20099 (92.64%) Loss: 2.178027 LR: 0.00000650 +[17:31:08] Epoch: 1 Batch: 18620/20099 (92.64%) Loss: 2.228802 LR: 0.00000650 +[17:31:10] Epoch: 1 Batch: 18621/20099 (92.65%) Loss: 1.898891 LR: 0.00000649 +[17:31:12] Epoch: 1 Batch: 18622/20099 (92.65%) Loss: 2.019291 LR: 0.00000649 +[17:31:14] Epoch: 1 Batch: 18623/20099 (92.66%) Loss: 2.387329 LR: 0.00000649 +[17:31:15] Epoch: 1 Batch: 18624/20099 (92.66%) Loss: 2.464012 LR: 0.00000649 +[17:31:17] Epoch: 1 Batch: 18625/20099 (92.67%) Loss: 1.971589 LR: 0.00000649 +[17:31:19] Epoch: 1 Batch: 18626/20099 (92.67%) Loss: 2.178151 LR: 0.00000649 +[17:31:21] Epoch: 1 Batch: 18627/20099 (92.68%) Loss: 2.074365 LR: 0.00000649 +[17:31:23] Epoch: 1 Batch: 18628/20099 (92.68%) Loss: 2.064335 LR: 0.00000649 +[17:31:25] Epoch: 1 Batch: 18629/20099 (92.69%) Loss: 2.118189 LR: 0.00000649 +[17:31:27] Epoch: 1 Batch: 18630/20099 (92.69%) Loss: 2.140453 LR: 0.00000649 +[17:31:29] Epoch: 1 Batch: 18631/20099 (92.70%) Loss: 2.093363 LR: 0.00000649 +[17:31:30] Epoch: 1 Batch: 18632/20099 (92.70%) Loss: 2.368889 LR: 0.00000649 +[17:31:32] Epoch: 1 Batch: 18633/20099 (92.71%) Loss: 2.021134 LR: 0.00000649 +[17:31:34] Epoch: 1 Batch: 18634/20099 (92.71%) Loss: 2.218708 LR: 0.00000649 +[17:31:36] Epoch: 1 Batch: 18635/20099 (92.72%) Loss: 1.892248 LR: 0.00000648 +[17:31:38] Epoch: 1 Batch: 18636/20099 (92.72%) Loss: 2.027182 
LR: 0.00000648 +[17:31:40] Epoch: 1 Batch: 18637/20099 (92.73%) Loss: 2.000261 LR: 0.00000648 +[17:31:41] Epoch: 1 Batch: 18638/20099 (92.73%) Loss: 2.136497 LR: 0.00000648 +[17:31:43] Epoch: 1 Batch: 18639/20099 (92.74%) Loss: 2.118199 LR: 0.00000648 +[17:31:45] Epoch: 1 Batch: 18640/20099 (92.74%) Loss: 2.016747 LR: 0.00000648 +[17:31:47] Epoch: 1 Batch: 18641/20099 (92.75%) Loss: 1.917366 LR: 0.00000648 +[17:31:49] Epoch: 1 Batch: 18642/20099 (92.75%) Loss: 2.190498 LR: 0.00000648 +[17:31:51] Epoch: 1 Batch: 18643/20099 (92.76%) Loss: 2.409042 LR: 0.00000648 +[17:31:53] Epoch: 1 Batch: 18644/20099 (92.76%) Loss: 2.184630 LR: 0.00000648 +[17:31:54] Epoch: 1 Batch: 18645/20099 (92.77%) Loss: 1.783897 LR: 0.00000648 +[17:31:56] Epoch: 1 Batch: 18646/20099 (92.77%) Loss: 1.832634 LR: 0.00000648 +[17:31:58] Epoch: 1 Batch: 18647/20099 (92.78%) Loss: 2.293603 LR: 0.00000648 +[17:32:00] Epoch: 1 Batch: 18648/20099 (92.78%) Loss: 2.066734 LR: 0.00000648 +[17:32:02] Epoch: 1 Batch: 18649/20099 (92.79%) Loss: 2.010708 LR: 0.00000647 +[17:32:04] Epoch: 1 Batch: 18650/20099 (92.79%) Loss: 1.835790 LR: 0.00000647 +[17:32:05] Epoch: 1 Batch: 18651/20099 (92.80%) Loss: 2.315195 LR: 0.00000647 +[17:32:07] Epoch: 1 Batch: 18652/20099 (92.80%) Loss: 1.996179 LR: 0.00000647 +[17:32:09] Epoch: 1 Batch: 18653/20099 (92.81%) Loss: 2.290437 LR: 0.00000647 +[17:32:11] Epoch: 1 Batch: 18654/20099 (92.81%) Loss: 2.172856 LR: 0.00000647 +[17:32:13] Epoch: 1 Batch: 18655/20099 (92.82%) Loss: 2.273805 LR: 0.00000647 +[17:32:15] Epoch: 1 Batch: 18656/20099 (92.82%) Loss: 1.965993 LR: 0.00000647 +[17:32:16] Epoch: 1 Batch: 18657/20099 (92.83%) Loss: 2.040334 LR: 0.00000647 +[17:32:18] Epoch: 1 Batch: 18658/20099 (92.83%) Loss: 2.064215 LR: 0.00000647 +[17:32:20] Epoch: 1 Batch: 18659/20099 (92.84%) Loss: 1.930113 LR: 0.00000647 +[17:32:22] Epoch: 1 Batch: 18660/20099 (92.84%) Loss: 1.890174 LR: 0.00000647 +[17:32:24] Epoch: 1 Batch: 18661/20099 (92.85%) Loss: 1.977024 LR: 0.00000647 
+[17:32:26] Epoch: 1 Batch: 18662/20099 (92.85%) Loss: 2.100531 LR: 0.00000647 +[17:32:28] Epoch: 1 Batch: 18663/20099 (92.86%) Loss: 2.075784 LR: 0.00000646 +[17:32:29] Epoch: 1 Batch: 18664/20099 (92.86%) Loss: 2.078973 LR: 0.00000646 +[17:32:31] Epoch: 1 Batch: 18665/20099 (92.87%) Loss: 1.921682 LR: 0.00000646 +[17:32:33] Epoch: 1 Batch: 18666/20099 (92.87%) Loss: 2.073140 LR: 0.00000646 +[17:32:35] Epoch: 1 Batch: 18667/20099 (92.88%) Loss: 2.015704 LR: 0.00000646 +[17:32:37] Epoch: 1 Batch: 18668/20099 (92.88%) Loss: 2.001980 LR: 0.00000646 +[17:32:39] Epoch: 1 Batch: 18669/20099 (92.89%) Loss: 2.378247 LR: 0.00000646 +[17:32:41] Epoch: 1 Batch: 18670/20099 (92.89%) Loss: 1.904986 LR: 0.00000646 +[17:32:42] Epoch: 1 Batch: 18671/20099 (92.90%) Loss: 2.058527 LR: 0.00000646 +[17:32:44] Epoch: 1 Batch: 18672/20099 (92.90%) Loss: 2.001500 LR: 0.00000646 +[17:32:46] Epoch: 1 Batch: 18673/20099 (92.91%) Loss: 2.045680 LR: 0.00000646 +[17:32:48] Epoch: 1 Batch: 18674/20099 (92.91%) Loss: 2.374646 LR: 0.00000646 +[17:32:50] Epoch: 1 Batch: 18675/20099 (92.92%) Loss: 2.226331 LR: 0.00000646 +[17:32:52] Epoch: 1 Batch: 18676/20099 (92.92%) Loss: 1.977802 LR: 0.00000646 +[17:32:54] Epoch: 1 Batch: 18677/20099 (92.93%) Loss: 1.942435 LR: 0.00000645 +[17:32:55] Epoch: 1 Batch: 18678/20099 (92.93%) Loss: 1.825245 LR: 0.00000645 +[17:32:57] Epoch: 1 Batch: 18679/20099 (92.93%) Loss: 2.064878 LR: 0.00000645 +[17:32:59] Epoch: 1 Batch: 18680/20099 (92.94%) Loss: 2.237486 LR: 0.00000645 +[17:33:01] Epoch: 1 Batch: 18681/20099 (92.94%) Loss: 2.093668 LR: 0.00000645 +[17:33:03] Epoch: 1 Batch: 18682/20099 (92.95%) Loss: 2.129313 LR: 0.00000645 +[17:33:05] Epoch: 1 Batch: 18683/20099 (92.95%) Loss: 2.305095 LR: 0.00000645 +[17:33:07] Epoch: 1 Batch: 18684/20099 (92.96%) Loss: 2.067386 LR: 0.00000645 +[17:33:09] Epoch: 1 Batch: 18685/20099 (92.96%) Loss: 1.971169 LR: 0.00000645 +[17:33:10] Epoch: 1 Batch: 18686/20099 (92.97%) Loss: 2.109677 LR: 0.00000645 +[17:33:12] Epoch: 1 
Batch: 18687/20099 (92.97%) Loss: 2.283536 LR: 0.00000645 +[17:33:14] Epoch: 1 Batch: 18688/20099 (92.98%) Loss: 2.154419 LR: 0.00000645 +[17:33:16] Epoch: 1 Batch: 18689/20099 (92.98%) Loss: 2.238497 LR: 0.00000645 +[17:33:18] Epoch: 1 Batch: 18690/20099 (92.99%) Loss: 2.078596 LR: 0.00000645 +[17:33:20] Epoch: 1 Batch: 18691/20099 (92.99%) Loss: 2.267323 LR: 0.00000645 +[17:33:22] Epoch: 1 Batch: 18692/20099 (93.00%) Loss: 2.092633 LR: 0.00000645 +[17:33:23] Epoch: 1 Batch: 18693/20099 (93.00%) Loss: 2.002461 LR: 0.00000645 +[17:33:25] Epoch: 1 Batch: 18694/20099 (93.01%) Loss: 1.712382 LR: 0.00000645 +[17:33:27] Epoch: 1 Batch: 18695/20099 (93.01%) Loss: 1.858646 LR: 0.00000645 +[17:33:29] Epoch: 1 Batch: 18696/20099 (93.02%) Loss: 2.499231 LR: 0.00000645 +[17:33:31] Epoch: 1 Batch: 18697/20099 (93.02%) Loss: 2.223319 LR: 0.00000645 +[17:33:33] Epoch: 1 Batch: 18698/20099 (93.03%) Loss: 2.077700 LR: 0.00000644 +[17:33:35] Epoch: 1 Batch: 18699/20099 (93.03%) Loss: 2.001270 LR: 0.00000644 +[17:33:36] Epoch: 1 Batch: 18700/20099 (93.04%) Loss: 1.892269 LR: 0.00000644 +[17:33:38] Epoch: 1 Batch: 18701/20099 (93.04%) Loss: 2.018689 LR: 0.00000644 +[17:33:40] Epoch: 1 Batch: 18702/20099 (93.05%) Loss: 2.082895 LR: 0.00000644 +[17:33:42] Epoch: 1 Batch: 18703/20099 (93.05%) Loss: 1.855661 LR: 0.00000644 +[17:33:44] Epoch: 1 Batch: 18704/20099 (93.06%) Loss: 2.001112 LR: 0.00000644 +[17:33:46] Epoch: 1 Batch: 18705/20099 (93.06%) Loss: 1.860882 LR: 0.00000644 +[17:33:48] Epoch: 1 Batch: 18706/20099 (93.07%) Loss: 2.058739 LR: 0.00000644 +[17:33:49] Epoch: 1 Batch: 18707/20099 (93.07%) Loss: 2.156561 LR: 0.00000644 +[17:33:51] Epoch: 1 Batch: 18708/20099 (93.08%) Loss: 2.047520 LR: 0.00000644 +[17:33:53] Epoch: 1 Batch: 18709/20099 (93.08%) Loss: 2.263521 LR: 0.00000644 +[17:33:55] Epoch: 1 Batch: 18710/20099 (93.09%) Loss: 2.085829 LR: 0.00000644 +[17:33:57] Epoch: 1 Batch: 18711/20099 (93.09%) Loss: 2.424526 LR: 0.00000644 +[17:33:59] Epoch: 1 Batch: 18712/20099 
(93.10%) Loss: 1.983364 LR: 0.00000643 +[17:34:01] Epoch: 1 Batch: 18713/20099 (93.10%) Loss: 2.018081 LR: 0.00000643 +[17:34:02] Epoch: 1 Batch: 18714/20099 (93.11%) Loss: 2.169566 LR: 0.00000643 +[17:34:04] Epoch: 1 Batch: 18715/20099 (93.11%) Loss: 1.968771 LR: 0.00000643 +[17:34:06] Epoch: 1 Batch: 18716/20099 (93.12%) Loss: 2.036135 LR: 0.00000643 +[17:34:08] Epoch: 1 Batch: 18717/20099 (93.12%) Loss: 1.847755 LR: 0.00000643 +[17:34:10] Epoch: 1 Batch: 18718/20099 (93.13%) Loss: 2.001440 LR: 0.00000643 +[17:34:12] Epoch: 1 Batch: 18719/20099 (93.13%) Loss: 2.224553 LR: 0.00000643 +[17:34:14] Epoch: 1 Batch: 18720/20099 (93.14%) Loss: 2.151979 LR: 0.00000643 +[17:34:15] Epoch: 1 Batch: 18721/20099 (93.14%) Loss: 2.276133 LR: 0.00000643 +[17:34:17] Epoch: 1 Batch: 18722/20099 (93.15%) Loss: 2.107471 LR: 0.00000643 +[17:34:19] Epoch: 1 Batch: 18723/20099 (93.15%) Loss: 1.984312 LR: 0.00000643 +[17:34:21] Epoch: 1 Batch: 18724/20099 (93.16%) Loss: 2.004476 LR: 0.00000643 +[17:34:23] Epoch: 1 Batch: 18725/20099 (93.16%) Loss: 2.042790 LR: 0.00000643 +[17:34:25] Epoch: 1 Batch: 18726/20099 (93.17%) Loss: 2.188302 LR: 0.00000642 +[17:34:26] Epoch: 1 Batch: 18727/20099 (93.17%) Loss: 2.103880 LR: 0.00000642 +[17:34:28] Epoch: 1 Batch: 18728/20099 (93.18%) Loss: 1.967489 LR: 0.00000642 +[17:34:30] Epoch: 1 Batch: 18729/20099 (93.18%) Loss: 2.156760 LR: 0.00000642 +[17:34:32] Epoch: 1 Batch: 18730/20099 (93.19%) Loss: 1.999638 LR: 0.00000642 +[17:34:34] Epoch: 1 Batch: 18731/20099 (93.19%) Loss: 2.132552 LR: 0.00000642 +[17:34:36] Epoch: 1 Batch: 18732/20099 (93.20%) Loss: 2.038106 LR: 0.00000642 +[17:34:38] Epoch: 1 Batch: 18733/20099 (93.20%) Loss: 2.134585 LR: 0.00000642 +[17:34:39] Epoch: 1 Batch: 18734/20099 (93.21%) Loss: 2.178454 LR: 0.00000642 +[17:34:41] Epoch: 1 Batch: 18735/20099 (93.21%) Loss: 1.941187 LR: 0.00000642 +[17:34:43] Epoch: 1 Batch: 18736/20099 (93.22%) Loss: 1.861298 LR: 0.00000642 +[17:34:45] Epoch: 1 Batch: 18737/20099 (93.22%) Loss: 2.228157 
LR: 0.00000642 +[17:34:47] Epoch: 1 Batch: 18738/20099 (93.23%) Loss: 2.259131 LR: 0.00000642 +[17:34:49] Epoch: 1 Batch: 18739/20099 (93.23%) Loss: 1.827313 LR: 0.00000642 +[17:34:51] Epoch: 1 Batch: 18740/20099 (93.24%) Loss: 2.070163 LR: 0.00000642 +[17:34:52] Epoch: 1 Batch: 18741/20099 (93.24%) Loss: 2.206600 LR: 0.00000642 +[17:34:54] Epoch: 1 Batch: 18742/20099 (93.25%) Loss: 1.767934 LR: 0.00000642 +[17:34:56] Epoch: 1 Batch: 18743/20099 (93.25%) Loss: 1.962018 LR: 0.00000642 +[17:34:58] Epoch: 1 Batch: 18744/20099 (93.26%) Loss: 1.959214 LR: 0.00000642 +[17:35:00] Epoch: 1 Batch: 18745/20099 (93.26%) Loss: 2.339392 LR: 0.00000642 +[17:35:02] Epoch: 1 Batch: 18746/20099 (93.27%) Loss: 2.058571 LR: 0.00000642 +[17:35:04] Epoch: 1 Batch: 18747/20099 (93.27%) Loss: 2.025589 LR: 0.00000641 +[17:35:05] Epoch: 1 Batch: 18748/20099 (93.28%) Loss: 2.037528 LR: 0.00000641 +[17:35:07] Epoch: 1 Batch: 18749/20099 (93.28%) Loss: 2.028756 LR: 0.00000641 +[17:35:09] Epoch: 1 Batch: 18750/20099 (93.29%) Loss: 2.050617 LR: 0.00000641 +[17:35:11] Epoch: 1 Batch: 18751/20099 (93.29%) Loss: 1.990655 LR: 0.00000641 +[17:35:13] Epoch: 1 Batch: 18752/20099 (93.30%) Loss: 2.074683 LR: 0.00000641 +[17:35:15] Epoch: 1 Batch: 18753/20099 (93.30%) Loss: 2.065112 LR: 0.00000641 +[17:35:17] Epoch: 1 Batch: 18754/20099 (93.31%) Loss: 2.352881 LR: 0.00000641 +[17:35:18] Epoch: 1 Batch: 18755/20099 (93.31%) Loss: 2.209114 LR: 0.00000641 +[17:35:20] Epoch: 1 Batch: 18756/20099 (93.32%) Loss: 2.051444 LR: 0.00000641 +[17:35:22] Epoch: 1 Batch: 18757/20099 (93.32%) Loss: 2.088859 LR: 0.00000641 +[17:35:24] Epoch: 1 Batch: 18758/20099 (93.33%) Loss: 2.426763 LR: 0.00000641 +[17:35:26] Epoch: 1 Batch: 18759/20099 (93.33%) Loss: 2.168723 LR: 0.00000641 +[17:35:28] Epoch: 1 Batch: 18760/20099 (93.34%) Loss: 2.182183 LR: 0.00000641 +[17:35:30] Epoch: 1 Batch: 18761/20099 (93.34%) Loss: 2.304244 LR: 0.00000640 +[17:35:31] Epoch: 1 Batch: 18762/20099 (93.35%) Loss: 2.311611 LR: 0.00000640 
+[17:35:33] Epoch: 1 Batch: 18763/20099 (93.35%) Loss: 2.066550 LR: 0.00000640 +[17:35:35] Epoch: 1 Batch: 18764/20099 (93.36%) Loss: 2.306504 LR: 0.00000640 +[17:35:37] Epoch: 1 Batch: 18765/20099 (93.36%) Loss: 2.212830 LR: 0.00000640 +[17:35:39] Epoch: 1 Batch: 18766/20099 (93.37%) Loss: 1.993317 LR: 0.00000640 +[17:35:41] Epoch: 1 Batch: 18767/20099 (93.37%) Loss: 2.210887 LR: 0.00000640 +[17:35:42] Epoch: 1 Batch: 18768/20099 (93.38%) Loss: 1.817197 LR: 0.00000640 +[17:35:44] Epoch: 1 Batch: 18769/20099 (93.38%) Loss: 2.325694 LR: 0.00000640 +[17:35:46] Epoch: 1 Batch: 18770/20099 (93.39%) Loss: 2.396560 LR: 0.00000640 +[17:35:48] Epoch: 1 Batch: 18771/20099 (93.39%) Loss: 2.282889 LR: 0.00000640 +[17:35:50] Epoch: 1 Batch: 18772/20099 (93.40%) Loss: 2.000047 LR: 0.00000640 +[17:35:52] Epoch: 1 Batch: 18773/20099 (93.40%) Loss: 2.224808 LR: 0.00000640 +[17:35:54] Epoch: 1 Batch: 18774/20099 (93.41%) Loss: 2.046538 LR: 0.00000640 +[17:35:55] Epoch: 1 Batch: 18775/20099 (93.41%) Loss: 2.110552 LR: 0.00000639 +[17:35:57] Epoch: 1 Batch: 18776/20099 (93.42%) Loss: 2.005676 LR: 0.00000639 +[17:35:59] Epoch: 1 Batch: 18777/20099 (93.42%) Loss: 1.748749 LR: 0.00000639 +[17:36:01] Epoch: 1 Batch: 18778/20099 (93.43%) Loss: 2.110223 LR: 0.00000639 +[17:36:03] Epoch: 1 Batch: 18779/20099 (93.43%) Loss: 1.993221 LR: 0.00000639 +[17:36:05] Epoch: 1 Batch: 18780/20099 (93.44%) Loss: 1.864843 LR: 0.00000639 +[17:36:07] Epoch: 1 Batch: 18781/20099 (93.44%) Loss: 1.971040 LR: 0.00000639 +[17:36:08] Epoch: 1 Batch: 18782/20099 (93.45%) Loss: 2.099641 LR: 0.00000639 +[17:36:10] Epoch: 1 Batch: 18783/20099 (93.45%) Loss: 2.127069 LR: 0.00000639 +[17:36:12] Epoch: 1 Batch: 18784/20099 (93.46%) Loss: 2.238376 LR: 0.00000639 +[17:36:14] Epoch: 1 Batch: 18785/20099 (93.46%) Loss: 2.208950 LR: 0.00000639 +[17:36:16] Epoch: 1 Batch: 18786/20099 (93.47%) Loss: 2.097956 LR: 0.00000639 +[17:36:18] Epoch: 1 Batch: 18787/20099 (93.47%) Loss: 2.375259 LR: 0.00000639 +[17:36:20] Epoch: 1 
Batch: 18788/20099 (93.48%) Loss: 2.211732 LR: 0.00000639 +[17:36:21] Epoch: 1 Batch: 18789/20099 (93.48%) Loss: 2.009293 LR: 0.00000639 +[17:36:23] Epoch: 1 Batch: 18790/20099 (93.49%) Loss: 2.040404 LR: 0.00000639 +[17:36:25] Epoch: 1 Batch: 18791/20099 (93.49%) Loss: 2.039186 LR: 0.00000639 +[17:36:27] Epoch: 1 Batch: 18792/20099 (93.50%) Loss: 1.941736 LR: 0.00000639 +[17:36:29] Epoch: 1 Batch: 18793/20099 (93.50%) Loss: 2.182559 LR: 0.00000639 +[17:36:31] Epoch: 1 Batch: 18794/20099 (93.51%) Loss: 1.916400 LR: 0.00000639 +[17:36:33] Epoch: 1 Batch: 18795/20099 (93.51%) Loss: 2.411017 LR: 0.00000639 +[17:36:35] Epoch: 1 Batch: 18796/20099 (93.52%) Loss: 2.071110 LR: 0.00000638 +[17:36:36] Epoch: 1 Batch: 18797/20099 (93.52%) Loss: 2.140448 LR: 0.00000638 +[17:36:38] Epoch: 1 Batch: 18798/20099 (93.53%) Loss: 2.302137 LR: 0.00000638 +[17:36:40] Epoch: 1 Batch: 18799/20099 (93.53%) Loss: 2.134970 LR: 0.00000638 +[17:36:46] >> Cleaned up old temp checkpoint: epoch1_step16800 +[17:36:46] >> Temp checkpoint saved: epoch1_step18800, size: 0.1693 GB +[17:36:46] Epoch: 1 Batch: 18800/20099 (93.54%) Loss: 1.885792 LR: 0.00000638 +[17:36:48] Epoch: 1 Batch: 18801/20099 (93.54%) Loss: 1.947433 LR: 0.00000638 +[17:36:49] Epoch: 1 Batch: 18802/20099 (93.55%) Loss: 1.903567 LR: 0.00000638 +[17:36:51] Epoch: 1 Batch: 18803/20099 (93.55%) Loss: 2.278106 LR: 0.00000638 +[17:36:53] Epoch: 1 Batch: 18804/20099 (93.56%) Loss: 2.265444 LR: 0.00000638 +[17:36:55] Epoch: 1 Batch: 18805/20099 (93.56%) Loss: 2.026125 LR: 0.00000638 +[17:36:57] Epoch: 1 Batch: 18806/20099 (93.57%) Loss: 2.132472 LR: 0.00000638 +[17:36:59] Epoch: 1 Batch: 18807/20099 (93.57%) Loss: 1.751912 LR: 0.00000638 +[17:37:01] Epoch: 1 Batch: 18808/20099 (93.58%) Loss: 2.314101 LR: 0.00000638 +[17:37:02] Epoch: 1 Batch: 18809/20099 (93.58%) Loss: 2.162289 LR: 0.00000638 +[17:37:04] Epoch: 1 Batch: 18810/20099 (93.59%) Loss: 2.132331 LR: 0.00000637 +[17:37:06] Epoch: 1 Batch: 18811/20099 (93.59%) Loss: 1.926652 LR: 
0.00000637 +[17:37:08] Epoch: 1 Batch: 18812/20099 (93.60%) Loss: 2.130433 LR: 0.00000637 +[17:37:10] Epoch: 1 Batch: 18813/20099 (93.60%) Loss: 2.084897 LR: 0.00000637 +[17:37:12] Epoch: 1 Batch: 18814/20099 (93.61%) Loss: 2.275780 LR: 0.00000637 +[17:37:14] Epoch: 1 Batch: 18815/20099 (93.61%) Loss: 2.090987 LR: 0.00000637 +[17:37:16] Epoch: 1 Batch: 18816/20099 (93.62%) Loss: 2.123373 LR: 0.00000637 +[17:37:17] Epoch: 1 Batch: 18817/20099 (93.62%) Loss: 1.644003 LR: 0.00000637 +[17:37:19] Epoch: 1 Batch: 18818/20099 (93.63%) Loss: 2.177297 LR: 0.00000637 +[17:37:21] Epoch: 1 Batch: 18819/20099 (93.63%) Loss: 1.641241 LR: 0.00000637 +[17:37:23] Epoch: 1 Batch: 18820/20099 (93.64%) Loss: 2.119945 LR: 0.00000637 +[17:37:25] Epoch: 1 Batch: 18821/20099 (93.64%) Loss: 2.314338 LR: 0.00000637 +[17:37:27] Epoch: 1 Batch: 18822/20099 (93.65%) Loss: 1.986167 LR: 0.00000637 +[17:37:29] Epoch: 1 Batch: 18823/20099 (93.65%) Loss: 2.032900 LR: 0.00000637 +[17:37:30] Epoch: 1 Batch: 18824/20099 (93.66%) Loss: 1.994736 LR: 0.00000637 +[17:37:32] Epoch: 1 Batch: 18825/20099 (93.66%) Loss: 2.139333 LR: 0.00000637 +[17:37:34] Epoch: 1 Batch: 18826/20099 (93.67%) Loss: 2.302745 LR: 0.00000637 +[17:37:36] Epoch: 1 Batch: 18827/20099 (93.67%) Loss: 2.265903 LR: 0.00000637 +[17:37:38] Epoch: 1 Batch: 18828/20099 (93.68%) Loss: 1.683404 LR: 0.00000637 +[17:37:40] Epoch: 1 Batch: 18829/20099 (93.68%) Loss: 2.141759 LR: 0.00000637 +[17:37:41] Epoch: 1 Batch: 18830/20099 (93.69%) Loss: 2.068971 LR: 0.00000637 +[17:37:43] Epoch: 1 Batch: 18831/20099 (93.69%) Loss: 2.064252 LR: 0.00000636 +[17:37:45] Epoch: 1 Batch: 18832/20099 (93.70%) Loss: 1.994169 LR: 0.00000636 +[17:37:47] Epoch: 1 Batch: 18833/20099 (93.70%) Loss: 2.008592 LR: 0.00000636 +[17:37:49] Epoch: 1 Batch: 18834/20099 (93.71%) Loss: 2.055545 LR: 0.00000636 +[17:37:51] Epoch: 1 Batch: 18835/20099 (93.71%) Loss: 2.405630 LR: 0.00000636 +[17:37:53] Epoch: 1 Batch: 18836/20099 (93.72%) Loss: 1.993490 LR: 0.00000636 +[17:37:54] 
Epoch: 1 Batch: 18837/20099 (93.72%) Loss: 2.127676 LR: 0.00000636 +[17:37:56] Epoch: 1 Batch: 18838/20099 (93.73%) Loss: 2.072916 LR: 0.00000636 +[17:37:58] Epoch: 1 Batch: 18839/20099 (93.73%) Loss: 2.077196 LR: 0.00000636 +[17:38:00] Epoch: 1 Batch: 18840/20099 (93.74%) Loss: 2.022184 LR: 0.00000636 +[17:38:02] Epoch: 1 Batch: 18841/20099 (93.74%) Loss: 2.049360 LR: 0.00000636 +[17:38:04] Epoch: 1 Batch: 18842/20099 (93.75%) Loss: 1.737040 LR: 0.00000636 +[17:38:05] Epoch: 1 Batch: 18843/20099 (93.75%) Loss: 2.266951 LR: 0.00000636 +[17:38:07] Epoch: 1 Batch: 18844/20099 (93.76%) Loss: 2.244268 LR: 0.00000636 +[17:38:09] Epoch: 1 Batch: 18845/20099 (93.76%) Loss: 2.109494 LR: 0.00000635 +[17:38:11] Epoch: 1 Batch: 18846/20099 (93.77%) Loss: 2.008463 LR: 0.00000635 +[17:38:13] Epoch: 1 Batch: 18847/20099 (93.77%) Loss: 2.133718 LR: 0.00000635 +[17:38:15] Epoch: 1 Batch: 18848/20099 (93.78%) Loss: 1.984733 LR: 0.00000635 +[17:38:17] Epoch: 1 Batch: 18849/20099 (93.78%) Loss: 2.152451 LR: 0.00000635 +[17:38:18] Epoch: 1 Batch: 18850/20099 (93.79%) Loss: 1.904969 LR: 0.00000635 +[17:38:20] Epoch: 1 Batch: 18851/20099 (93.79%) Loss: 2.103430 LR: 0.00000635 +[17:38:22] Epoch: 1 Batch: 18852/20099 (93.80%) Loss: 1.832710 LR: 0.00000635 +[17:38:24] Epoch: 1 Batch: 18853/20099 (93.80%) Loss: 2.296971 LR: 0.00000635 +[17:38:26] Epoch: 1 Batch: 18854/20099 (93.81%) Loss: 2.229144 LR: 0.00000635 +[17:38:28] Epoch: 1 Batch: 18855/20099 (93.81%) Loss: 2.040869 LR: 0.00000635 +[17:38:30] Epoch: 1 Batch: 18856/20099 (93.82%) Loss: 1.851839 LR: 0.00000635 +[17:38:31] Epoch: 1 Batch: 18857/20099 (93.82%) Loss: 2.076534 LR: 0.00000635 +[17:38:33] Epoch: 1 Batch: 18858/20099 (93.83%) Loss: 1.885270 LR: 0.00000635 +[17:38:35] Epoch: 1 Batch: 18859/20099 (93.83%) Loss: 2.301797 LR: 0.00000635 +[17:38:37] Epoch: 1 Batch: 18860/20099 (93.84%) Loss: 2.399562 LR: 0.00000635 +[17:38:39] Epoch: 1 Batch: 18861/20099 (93.84%) Loss: 2.322425 LR: 0.00000635 +[17:38:41] Epoch: 1 Batch: 
18862/20099 (93.85%) Loss: 2.041503 LR: 0.00000635 +[17:38:43] Epoch: 1 Batch: 18863/20099 (93.85%) Loss: 1.829449 LR: 0.00000635 +[17:38:45] Epoch: 1 Batch: 18864/20099 (93.86%) Loss: 2.224522 LR: 0.00000635 +[17:38:46] Epoch: 1 Batch: 18865/20099 (93.86%) Loss: 2.050713 LR: 0.00000635 +[17:38:48] Epoch: 1 Batch: 18866/20099 (93.87%) Loss: 2.088889 LR: 0.00000634 +[17:38:50] Epoch: 1 Batch: 18867/20099 (93.87%) Loss: 1.903591 LR: 0.00000634 +[17:38:52] Epoch: 1 Batch: 18868/20099 (93.88%) Loss: 1.837787 LR: 0.00000634 +[17:38:54] Epoch: 1 Batch: 18869/20099 (93.88%) Loss: 1.899860 LR: 0.00000634 +[17:38:56] Epoch: 1 Batch: 18870/20099 (93.89%) Loss: 2.153050 LR: 0.00000634 +[17:38:58] Epoch: 1 Batch: 18871/20099 (93.89%) Loss: 2.115349 LR: 0.00000634 +[17:38:59] Epoch: 1 Batch: 18872/20099 (93.90%) Loss: 2.281894 LR: 0.00000634 +[17:39:01] Epoch: 1 Batch: 18873/20099 (93.90%) Loss: 2.020916 LR: 0.00000634 +[17:39:03] Epoch: 1 Batch: 18874/20099 (93.91%) Loss: 2.364813 LR: 0.00000634 +[17:39:05] Epoch: 1 Batch: 18875/20099 (93.91%) Loss: 2.139720 LR: 0.00000634 +[17:39:07] Epoch: 1 Batch: 18876/20099 (93.92%) Loss: 2.006542 LR: 0.00000634 +[17:39:09] Epoch: 1 Batch: 18877/20099 (93.92%) Loss: 2.313743 LR: 0.00000634 +[17:39:11] Epoch: 1 Batch: 18878/20099 (93.93%) Loss: 2.136689 LR: 0.00000634 +[17:39:12] Epoch: 1 Batch: 18879/20099 (93.93%) Loss: 2.237786 LR: 0.00000634 +[17:39:14] Epoch: 1 Batch: 18880/20099 (93.94%) Loss: 2.011243 LR: 0.00000634 +[17:39:16] Epoch: 1 Batch: 18881/20099 (93.94%) Loss: 1.684474 LR: 0.00000634 +[17:39:18] Epoch: 1 Batch: 18882/20099 (93.94%) Loss: 2.020290 LR: 0.00000634 +[17:39:20] Epoch: 1 Batch: 18883/20099 (93.95%) Loss: 2.218421 LR: 0.00000634 +[17:39:22] Epoch: 1 Batch: 18884/20099 (93.95%) Loss: 2.297864 LR: 0.00000634 +[17:39:24] Epoch: 1 Batch: 18885/20099 (93.96%) Loss: 2.290954 LR: 0.00000634 +[17:39:25] Epoch: 1 Batch: 18886/20099 (93.96%) Loss: 2.012318 LR: 0.00000634 +[17:39:27] Epoch: 1 Batch: 18887/20099 (93.97%) 
Loss: 2.335898 LR: 0.00000633 +[17:39:29] Epoch: 1 Batch: 18888/20099 (93.97%) Loss: 2.256416 LR: 0.00000633 +[17:39:31] Epoch: 1 Batch: 18889/20099 (93.98%) Loss: 2.014206 LR: 0.00000633 +[17:39:33] Epoch: 1 Batch: 18890/20099 (93.98%) Loss: 2.244164 LR: 0.00000633 +[17:39:35] Epoch: 1 Batch: 18891/20099 (93.99%) Loss: 2.372243 LR: 0.00000633 +[17:39:37] Epoch: 1 Batch: 18892/20099 (93.99%) Loss: 1.874442 LR: 0.00000633 +[17:39:38] Epoch: 1 Batch: 18893/20099 (94.00%) Loss: 2.349739 LR: 0.00000633 +[17:39:40] Epoch: 1 Batch: 18894/20099 (94.00%) Loss: 2.053530 LR: 0.00000633 +[17:39:42] Epoch: 1 Batch: 18895/20099 (94.01%) Loss: 2.091776 LR: 0.00000633 +[17:39:44] Epoch: 1 Batch: 18896/20099 (94.01%) Loss: 2.131095 LR: 0.00000633 +[17:39:46] Epoch: 1 Batch: 18897/20099 (94.02%) Loss: 1.939365 LR: 0.00000633 +[17:39:48] Epoch: 1 Batch: 18898/20099 (94.02%) Loss: 2.188623 LR: 0.00000633 +[17:39:50] Epoch: 1 Batch: 18899/20099 (94.03%) Loss: 2.270235 LR: 0.00000633 +[17:39:51] Epoch: 1 Batch: 18900/20099 (94.03%) Loss: 2.249335 LR: 0.00000633 +[17:39:53] Epoch: 1 Batch: 18901/20099 (94.04%) Loss: 1.662834 LR: 0.00000632 +[17:39:55] Epoch: 1 Batch: 18902/20099 (94.04%) Loss: 2.107272 LR: 0.00000632 +[17:39:57] Epoch: 1 Batch: 18903/20099 (94.05%) Loss: 2.056877 LR: 0.00000632 +[17:39:59] Epoch: 1 Batch: 18904/20099 (94.05%) Loss: 2.058735 LR: 0.00000632 +[17:40:01] Epoch: 1 Batch: 18905/20099 (94.06%) Loss: 2.231776 LR: 0.00000632 +[17:40:02] Epoch: 1 Batch: 18906/20099 (94.06%) Loss: 2.030151 LR: 0.00000632 +[17:40:04] Epoch: 1 Batch: 18907/20099 (94.07%) Loss: 2.122410 LR: 0.00000632 +[17:40:06] Epoch: 1 Batch: 18908/20099 (94.07%) Loss: 2.169614 LR: 0.00000632 +[17:40:08] Epoch: 1 Batch: 18909/20099 (94.08%) Loss: 2.206417 LR: 0.00000632 +[17:40:10] Epoch: 1 Batch: 18910/20099 (94.08%) Loss: 2.175643 LR: 0.00000632 +[17:40:12] Epoch: 1 Batch: 18911/20099 (94.09%) Loss: 1.995345 LR: 0.00000632 +[17:40:14] Epoch: 1 Batch: 18912/20099 (94.09%) Loss: 2.025333 LR: 
0.00000632 +[17:40:15] Epoch: 1 Batch: 18913/20099 (94.10%) Loss: 1.953658 LR: 0.00000632 +[17:40:17] Epoch: 1 Batch: 18914/20099 (94.10%) Loss: 1.967561 LR: 0.00000632 +[17:40:19] Epoch: 1 Batch: 18915/20099 (94.11%) Loss: 2.115386 LR: 0.00000632 +[17:40:21] Epoch: 1 Batch: 18916/20099 (94.11%) Loss: 2.210395 LR: 0.00000632 +[17:40:23] Epoch: 1 Batch: 18917/20099 (94.12%) Loss: 2.008346 LR: 0.00000632 +[17:40:25] Epoch: 1 Batch: 18918/20099 (94.12%) Loss: 2.211252 LR: 0.00000632 +[17:40:27] Epoch: 1 Batch: 18919/20099 (94.13%) Loss: 2.087225 LR: 0.00000632 +[17:40:28] Epoch: 1 Batch: 18920/20099 (94.13%) Loss: 1.850089 LR: 0.00000632 +[17:40:30] Epoch: 1 Batch: 18921/20099 (94.14%) Loss: 2.062256 LR: 0.00000632 +[17:40:32] Epoch: 1 Batch: 18922/20099 (94.14%) Loss: 1.976392 LR: 0.00000631 +[17:40:34] Epoch: 1 Batch: 18923/20099 (94.15%) Loss: 2.228108 LR: 0.00000631 +[17:40:36] Epoch: 1 Batch: 18924/20099 (94.15%) Loss: 2.288451 LR: 0.00000631 +[17:40:38] Epoch: 1 Batch: 18925/20099 (94.16%) Loss: 2.137476 LR: 0.00000631 +[17:40:39] Epoch: 1 Batch: 18926/20099 (94.16%) Loss: 1.900648 LR: 0.00000631 +[17:40:41] Epoch: 1 Batch: 18927/20099 (94.17%) Loss: 2.049162 LR: 0.00000631 +[17:40:43] Epoch: 1 Batch: 18928/20099 (94.17%) Loss: 1.781119 LR: 0.00000631 +[17:40:45] Epoch: 1 Batch: 18929/20099 (94.18%) Loss: 1.956563 LR: 0.00000631 +[17:40:47] Epoch: 1 Batch: 18930/20099 (94.18%) Loss: 2.024128 LR: 0.00000631 +[17:40:49] Epoch: 1 Batch: 18931/20099 (94.19%) Loss: 1.839046 LR: 0.00000631 +[17:40:51] Epoch: 1 Batch: 18932/20099 (94.19%) Loss: 2.398629 LR: 0.00000631 +[17:40:52] Epoch: 1 Batch: 18933/20099 (94.20%) Loss: 1.855192 LR: 0.00000631 +[17:40:54] Epoch: 1 Batch: 18934/20099 (94.20%) Loss: 1.978988 LR: 0.00000631 +[17:40:56] Epoch: 1 Batch: 18935/20099 (94.21%) Loss: 1.873926 LR: 0.00000631 +[17:40:58] Epoch: 1 Batch: 18936/20099 (94.21%) Loss: 2.253057 LR: 0.00000631 +[17:41:00] Epoch: 1 Batch: 18937/20099 (94.22%) Loss: 2.187789 LR: 0.00000631 +[17:41:02] 
Epoch: 1 Batch: 18938/20099 (94.22%) Loss: 2.210035 LR: 0.00000631 +[17:41:03] Epoch: 1 Batch: 18939/20099 (94.23%) Loss: 2.040642 LR: 0.00000631 +[17:41:05] Epoch: 1 Batch: 18940/20099 (94.23%) Loss: 2.014747 LR: 0.00000631 +[17:41:07] Epoch: 1 Batch: 18941/20099 (94.24%) Loss: 1.973518 LR: 0.00000631 +[17:41:09] Epoch: 1 Batch: 18942/20099 (94.24%) Loss: 2.036056 LR: 0.00000631 +[17:41:11] Epoch: 1 Batch: 18943/20099 (94.25%) Loss: 2.435516 LR: 0.00000630 +[17:41:13] Epoch: 1 Batch: 18944/20099 (94.25%) Loss: 2.070271 LR: 0.00000630 +[17:41:15] Epoch: 1 Batch: 18945/20099 (94.26%) Loss: 2.029995 LR: 0.00000630 +[17:41:16] Epoch: 1 Batch: 18946/20099 (94.26%) Loss: 2.152287 LR: 0.00000630 +[17:41:18] Epoch: 1 Batch: 18947/20099 (94.27%) Loss: 1.930840 LR: 0.00000630 +[17:41:20] Epoch: 1 Batch: 18948/20099 (94.27%) Loss: 2.202269 LR: 0.00000630 +[17:41:22] Epoch: 1 Batch: 18949/20099 (94.28%) Loss: 1.884151 LR: 0.00000630 +[17:41:24] Epoch: 1 Batch: 18950/20099 (94.28%) Loss: 2.040607 LR: 0.00000630 +[17:41:26] Epoch: 1 Batch: 18951/20099 (94.29%) Loss: 2.072490 LR: 0.00000630 +[17:41:27] Epoch: 1 Batch: 18952/20099 (94.29%) Loss: 1.976477 LR: 0.00000630 +[17:41:29] Epoch: 1 Batch: 18953/20099 (94.30%) Loss: 2.136546 LR: 0.00000630 +[17:41:31] Epoch: 1 Batch: 18954/20099 (94.30%) Loss: 1.837832 LR: 0.00000630 +[17:41:33] Epoch: 1 Batch: 18955/20099 (94.31%) Loss: 2.195533 LR: 0.00000630 +[17:41:35] Epoch: 1 Batch: 18956/20099 (94.31%) Loss: 2.130661 LR: 0.00000630 +[17:41:37] Epoch: 1 Batch: 18957/20099 (94.32%) Loss: 2.150196 LR: 0.00000629 +[17:41:39] Epoch: 1 Batch: 18958/20099 (94.32%) Loss: 2.515425 LR: 0.00000629 +[17:41:40] Epoch: 1 Batch: 18959/20099 (94.33%) Loss: 2.175138 LR: 0.00000629 +[17:41:42] Epoch: 1 Batch: 18960/20099 (94.33%) Loss: 2.110434 LR: 0.00000629 +[17:41:44] Epoch: 1 Batch: 18961/20099 (94.34%) Loss: 2.090422 LR: 0.00000629 +[17:41:46] Epoch: 1 Batch: 18962/20099 (94.34%) Loss: 2.086721 LR: 0.00000629 +[17:41:48] Epoch: 1 Batch: 
18963/20099 (94.35%) Loss: 2.134450 LR: 0.00000629 +[17:41:50] Epoch: 1 Batch: 18964/20099 (94.35%) Loss: 2.224017 LR: 0.00000629 +[17:41:52] Epoch: 1 Batch: 18965/20099 (94.36%) Loss: 1.878811 LR: 0.00000629 +[17:41:53] Epoch: 1 Batch: 18966/20099 (94.36%) Loss: 2.216249 LR: 0.00000629 +[17:41:55] Epoch: 1 Batch: 18967/20099 (94.37%) Loss: 2.389329 LR: 0.00000629 +[17:41:57] Epoch: 1 Batch: 18968/20099 (94.37%) Loss: 1.848604 LR: 0.00000629 +[17:41:59] Epoch: 1 Batch: 18969/20099 (94.38%) Loss: 2.376028 LR: 0.00000629 +[17:42:01] Epoch: 1 Batch: 18970/20099 (94.38%) Loss: 1.992151 LR: 0.00000629 +[17:42:03] Epoch: 1 Batch: 18971/20099 (94.39%) Loss: 2.107484 LR: 0.00000629 +[17:42:04] Epoch: 1 Batch: 18972/20099 (94.39%) Loss: 2.242568 LR: 0.00000629 +[17:42:06] Epoch: 1 Batch: 18973/20099 (94.40%) Loss: 1.898398 LR: 0.00000629 +[17:42:08] Epoch: 1 Batch: 18974/20099 (94.40%) Loss: 1.725408 LR: 0.00000629 +[17:42:10] Epoch: 1 Batch: 18975/20099 (94.41%) Loss: 1.672631 LR: 0.00000629 +[17:42:12] Epoch: 1 Batch: 18976/20099 (94.41%) Loss: 1.989157 LR: 0.00000629 +[17:42:14] Epoch: 1 Batch: 18977/20099 (94.42%) Loss: 2.038745 LR: 0.00000629 +[17:42:16] Epoch: 1 Batch: 18978/20099 (94.42%) Loss: 2.059676 LR: 0.00000628 +[17:42:17] Epoch: 1 Batch: 18979/20099 (94.43%) Loss: 2.221294 LR: 0.00000628 +[17:42:19] Epoch: 1 Batch: 18980/20099 (94.43%) Loss: 2.347026 LR: 0.00000628 +[17:42:21] Epoch: 1 Batch: 18981/20099 (94.44%) Loss: 2.304212 LR: 0.00000628 +[17:42:23] Epoch: 1 Batch: 18982/20099 (94.44%) Loss: 2.377649 LR: 0.00000628 +[17:42:25] Epoch: 1 Batch: 18983/20099 (94.45%) Loss: 1.896579 LR: 0.00000628 +[17:42:27] Epoch: 1 Batch: 18984/20099 (94.45%) Loss: 1.970669 LR: 0.00000628 +[17:42:29] Epoch: 1 Batch: 18985/20099 (94.46%) Loss: 2.072384 LR: 0.00000628 +[17:42:30] Epoch: 1 Batch: 18986/20099 (94.46%) Loss: 2.213712 LR: 0.00000628 +[17:42:32] Epoch: 1 Batch: 18987/20099 (94.47%) Loss: 1.848153 LR: 0.00000628 +[17:42:34] Epoch: 1 Batch: 18988/20099 (94.47%) 
Loss: 2.251995 LR: 0.00000628 +[17:42:36] Epoch: 1 Batch: 18989/20099 (94.48%) Loss: 2.018346 LR: 0.00000628 +[17:42:38] Epoch: 1 Batch: 18990/20099 (94.48%) Loss: 2.179210 LR: 0.00000628 +[17:42:40] Epoch: 1 Batch: 18991/20099 (94.49%) Loss: 2.129859 LR: 0.00000628 +[17:42:42] Epoch: 1 Batch: 18992/20099 (94.49%) Loss: 1.910518 LR: 0.00000628 +[17:42:43] Epoch: 1 Batch: 18993/20099 (94.50%) Loss: 2.022862 LR: 0.00000628 +[17:42:45] Epoch: 1 Batch: 18994/20099 (94.50%) Loss: 2.017751 LR: 0.00000628 +[17:42:47] Epoch: 1 Batch: 18995/20099 (94.51%) Loss: 2.208992 LR: 0.00000628 +[17:42:49] Epoch: 1 Batch: 18996/20099 (94.51%) Loss: 2.124212 LR: 0.00000628 +[17:42:51] Epoch: 1 Batch: 18997/20099 (94.52%) Loss: 2.006035 LR: 0.00000628 +[17:42:53] Epoch: 1 Batch: 18998/20099 (94.52%) Loss: 2.086472 LR: 0.00000628 +[17:42:55] Epoch: 1 Batch: 18999/20099 (94.53%) Loss: 2.119793 LR: 0.00000627 +[17:42:56] >> Evaluating batch 0 +[17:42:58] >> Evaluating batch 1 +[17:42:59] >> Evaluating batch 2 +[17:43:00] >> Evaluating batch 3 +[17:43:01] >> Evaluating batch 4 +[17:43:02] >> Evaluating batch 5 +[17:43:03] >> Evaluating batch 6 +[17:43:04] >> Evaluating batch 7 +[17:43:05] >> Evaluating batch 8 +[17:43:06] >> Evaluating batch 9 +[17:43:07] >> Evaluating batch 10 +[17:43:08] >> Evaluating batch 11 +[17:43:09] >> Evaluating batch 12 +[17:43:10] >> Evaluating batch 13 +[17:43:11] >> Evaluating batch 14 +[17:43:12] >> Evaluating batch 15 +[17:43:13] >> Evaluating batch 16 +[17:43:14] Epoch: 1 Step: 19000/20099 Evaluation: +[17:43:14] Avg Loss Since Last Eval: 2.0872 Val Loss: 2.1460 Validation loss delta: 0.0001 Perplexity: 8.5502 LR: 0.00000627 +[17:43:17] >> Cleaned up old temp checkpoint: epoch1_step17000 +[17:43:17] >> Temp checkpoint saved: epoch1_step19000, size: 0.1693 GB +[17:43:21] >> Checkpoint saved: epoch1_step19000, size: 0.1693 GB +[17:43:21] Epoch: 1 Batch: 19000/20099 (94.53%) Loss: 2.124066 LR: 0.00000627 +[17:43:23] Epoch: 1 Batch: 19001/20099 (94.54%) 
Loss: 1.977629 LR: 0.00000627 +[17:43:25] Epoch: 1 Batch: 19002/20099 (94.54%) Loss: 2.163679 LR: 0.00000627 +[17:43:27] Epoch: 1 Batch: 19003/20099 (94.55%) Loss: 2.211445 LR: 0.00000627 +[17:43:29] Epoch: 1 Batch: 19004/20099 (94.55%) Loss: 2.461773 LR: 0.00000627 +[17:43:30] Epoch: 1 Batch: 19005/20099 (94.56%) Loss: 2.194065 LR: 0.00000627 +[17:43:32] Epoch: 1 Batch: 19006/20099 (94.56%) Loss: 2.215211 LR: 0.00000627 +[17:43:34] Epoch: 1 Batch: 19007/20099 (94.57%) Loss: 2.036795 LR: 0.00000627 +[17:43:36] Epoch: 1 Batch: 19008/20099 (94.57%) Loss: 2.430990 LR: 0.00000627 +[17:43:38] Epoch: 1 Batch: 19009/20099 (94.58%) Loss: 2.280551 LR: 0.00000627 +[17:43:40] Epoch: 1 Batch: 19010/20099 (94.58%) Loss: 1.902504 LR: 0.00000627 +[17:43:42] Epoch: 1 Batch: 19011/20099 (94.59%) Loss: 2.033902 LR: 0.00000627 +[17:43:44] Epoch: 1 Batch: 19012/20099 (94.59%) Loss: 2.101706 LR: 0.00000627 +[17:43:46] Epoch: 1 Batch: 19013/20099 (94.60%) Loss: 2.397848 LR: 0.00000627 +[17:43:47] Epoch: 1 Batch: 19014/20099 (94.60%) Loss: 2.104726 LR: 0.00000627 +[17:43:49] Epoch: 1 Batch: 19015/20099 (94.61%) Loss: 2.174184 LR: 0.00000627 +[17:43:51] Epoch: 1 Batch: 19016/20099 (94.61%) Loss: 1.822861 LR: 0.00000627 +[17:43:53] Epoch: 1 Batch: 19017/20099 (94.62%) Loss: 1.988816 LR: 0.00000627 +[17:43:55] Epoch: 1 Batch: 19018/20099 (94.62%) Loss: 2.130287 LR: 0.00000627 +[17:43:57] Epoch: 1 Batch: 19019/20099 (94.63%) Loss: 2.064925 LR: 0.00000627 +[17:43:59] Epoch: 1 Batch: 19020/20099 (94.63%) Loss: 1.901406 LR: 0.00000626 +[17:44:01] Epoch: 1 Batch: 19021/20099 (94.64%) Loss: 1.936075 LR: 0.00000626 +[17:44:03] Epoch: 1 Batch: 19022/20099 (94.64%) Loss: 1.969674 LR: 0.00000626 +[17:44:04] Epoch: 1 Batch: 19023/20099 (94.65%) Loss: 2.344706 LR: 0.00000626 +[17:44:06] Epoch: 1 Batch: 19024/20099 (94.65%) Loss: 1.944178 LR: 0.00000626 +[17:44:08] Epoch: 1 Batch: 19025/20099 (94.66%) Loss: 2.168022 LR: 0.00000626 +[17:44:10] Epoch: 1 Batch: 19026/20099 (94.66%) Loss: 1.888553 LR: 
0.00000626 +[17:44:12] Epoch: 1 Batch: 19027/20099 (94.67%) Loss: 2.058875 LR: 0.00000626 +[17:44:14] Epoch: 1 Batch: 19028/20099 (94.67%) Loss: 1.962032 LR: 0.00000626 +[17:44:16] Epoch: 1 Batch: 19029/20099 (94.68%) Loss: 1.925310 LR: 0.00000626 +[17:44:17] Epoch: 1 Batch: 19030/20099 (94.68%) Loss: 2.049188 LR: 0.00000626 +[17:44:19] Epoch: 1 Batch: 19031/20099 (94.69%) Loss: 1.792070 LR: 0.00000626 +[17:44:21] Epoch: 1 Batch: 19032/20099 (94.69%) Loss: 1.949693 LR: 0.00000626 +[17:44:23] Epoch: 1 Batch: 19033/20099 (94.70%) Loss: 2.184498 LR: 0.00000626 +[17:44:25] Epoch: 1 Batch: 19034/20099 (94.70%) Loss: 2.177499 LR: 0.00000626 +[17:44:27] Epoch: 1 Batch: 19035/20099 (94.71%) Loss: 1.747552 LR: 0.00000626 +[17:44:28] Epoch: 1 Batch: 19036/20099 (94.71%) Loss: 2.124082 LR: 0.00000626 +[17:44:30] Epoch: 1 Batch: 19037/20099 (94.72%) Loss: 2.147720 LR: 0.00000626 +[17:44:32] Epoch: 1 Batch: 19038/20099 (94.72%) Loss: 1.956648 LR: 0.00000626 +[17:44:34] Epoch: 1 Batch: 19039/20099 (94.73%) Loss: 2.151826 LR: 0.00000626 +[17:44:36] Epoch: 1 Batch: 19040/20099 (94.73%) Loss: 2.102213 LR: 0.00000626 +[17:44:38] Epoch: 1 Batch: 19041/20099 (94.74%) Loss: 1.980709 LR: 0.00000625 +[17:44:40] Epoch: 1 Batch: 19042/20099 (94.74%) Loss: 2.130724 LR: 0.00000625 +[17:44:41] Epoch: 1 Batch: 19043/20099 (94.75%) Loss: 1.990044 LR: 0.00000625 +[17:44:43] Epoch: 1 Batch: 19044/20099 (94.75%) Loss: 2.223107 LR: 0.00000625 +[17:44:45] Epoch: 1 Batch: 19045/20099 (94.76%) Loss: 1.653735 LR: 0.00000625 +[17:44:47] Epoch: 1 Batch: 19046/20099 (94.76%) Loss: 2.267890 LR: 0.00000625 +[17:44:49] Epoch: 1 Batch: 19047/20099 (94.77%) Loss: 1.986172 LR: 0.00000625 +[17:44:51] Epoch: 1 Batch: 19048/20099 (94.77%) Loss: 1.642796 LR: 0.00000625 +[17:44:52] Epoch: 1 Batch: 19049/20099 (94.78%) Loss: 2.098203 LR: 0.00000625 +[17:44:54] Epoch: 1 Batch: 19050/20099 (94.78%) Loss: 2.115925 LR: 0.00000625 +[17:44:56] Epoch: 1 Batch: 19051/20099 (94.79%) Loss: 2.317911 LR: 0.00000625 +[17:44:58] 
Epoch: 1 Batch: 19052/20099 (94.79%) Loss: 1.670208 LR: 0.00000625 +[17:45:00] Epoch: 1 Batch: 19053/20099 (94.80%) Loss: 2.267307 LR: 0.00000625 +[17:45:02] Epoch: 1 Batch: 19054/20099 (94.80%) Loss: 2.235002 LR: 0.00000625 +[17:45:04] Epoch: 1 Batch: 19055/20099 (94.81%) Loss: 1.947238 LR: 0.00000625 +[17:45:05] Epoch: 1 Batch: 19056/20099 (94.81%) Loss: 2.132453 LR: 0.00000625 +[17:45:07] Epoch: 1 Batch: 19057/20099 (94.82%) Loss: 2.061967 LR: 0.00000625 +[17:45:09] Epoch: 1 Batch: 19058/20099 (94.82%) Loss: 2.113499 LR: 0.00000625 +[17:45:11] Epoch: 1 Batch: 19059/20099 (94.83%) Loss: 1.874085 LR: 0.00000625 +[17:45:13] Epoch: 1 Batch: 19060/20099 (94.83%) Loss: 2.107150 LR: 0.00000625 +[17:45:15] Epoch: 1 Batch: 19061/20099 (94.84%) Loss: 2.134414 LR: 0.00000625 +[17:45:17] Epoch: 1 Batch: 19062/20099 (94.84%) Loss: 2.422300 LR: 0.00000624 +[17:45:18] Epoch: 1 Batch: 19063/20099 (94.85%) Loss: 1.945963 LR: 0.00000624 +[17:45:20] Epoch: 1 Batch: 19064/20099 (94.85%) Loss: 2.076646 LR: 0.00000624 +[17:45:22] Epoch: 1 Batch: 19065/20099 (94.86%) Loss: 1.970679 LR: 0.00000624 +[17:45:24] Epoch: 1 Batch: 19066/20099 (94.86%) Loss: 1.979329 LR: 0.00000624 +[17:45:26] Epoch: 1 Batch: 19067/20099 (94.87%) Loss: 2.095322 LR: 0.00000624 +[17:45:28] Epoch: 1 Batch: 19068/20099 (94.87%) Loss: 1.995820 LR: 0.00000624 +[17:45:30] Epoch: 1 Batch: 19069/20099 (94.88%) Loss: 2.166484 LR: 0.00000624 +[17:45:31] Epoch: 1 Batch: 19070/20099 (94.88%) Loss: 1.972367 LR: 0.00000624 +[17:45:33] Epoch: 1 Batch: 19071/20099 (94.89%) Loss: 1.820091 LR: 0.00000624 +[17:45:35] Epoch: 1 Batch: 19072/20099 (94.89%) Loss: 1.982157 LR: 0.00000624 +[17:45:37] Epoch: 1 Batch: 19073/20099 (94.90%) Loss: 2.092489 LR: 0.00000624 +[17:45:39] Epoch: 1 Batch: 19074/20099 (94.90%) Loss: 1.747104 LR: 0.00000624 +[17:45:41] Epoch: 1 Batch: 19075/20099 (94.91%) Loss: 2.365318 LR: 0.00000624 +[17:45:43] Epoch: 1 Batch: 19076/20099 (94.91%) Loss: 1.736719 LR: 0.00000624 +[17:45:45] Epoch: 1 Batch: 
19077/20099 (94.92%) Loss: 2.252074 LR: 0.00000624 +[17:45:46] Epoch: 1 Batch: 19078/20099 (94.92%) Loss: 2.164671 LR: 0.00000624 +[17:45:48] Epoch: 1 Batch: 19079/20099 (94.93%) Loss: 2.262967 LR: 0.00000624 +[17:45:50] Epoch: 1 Batch: 19080/20099 (94.93%) Loss: 2.020811 LR: 0.00000624 +[17:45:52] Epoch: 1 Batch: 19081/20099 (94.94%) Loss: 2.084850 LR: 0.00000624 +[17:45:54] Epoch: 1 Batch: 19082/20099 (94.94%) Loss: 2.289929 LR: 0.00000624 +[17:45:56] Epoch: 1 Batch: 19083/20099 (94.95%) Loss: 1.871007 LR: 0.00000623 +[17:45:58] Epoch: 1 Batch: 19084/20099 (94.95%) Loss: 2.352017 LR: 0.00000623 +[17:45:59] Epoch: 1 Batch: 19085/20099 (94.95%) Loss: 2.158286 LR: 0.00000623 +[17:46:01] Epoch: 1 Batch: 19086/20099 (94.96%) Loss: 2.090886 LR: 0.00000623 +[17:46:03] Epoch: 1 Batch: 19087/20099 (94.96%) Loss: 2.222639 LR: 0.00000623 +[17:46:05] Epoch: 1 Batch: 19088/20099 (94.97%) Loss: 2.144529 LR: 0.00000623 +[17:46:07] Epoch: 1 Batch: 19089/20099 (94.97%) Loss: 2.232392 LR: 0.00000623 +[17:46:09] Epoch: 1 Batch: 19090/20099 (94.98%) Loss: 2.151242 LR: 0.00000623 +[17:46:11] Epoch: 1 Batch: 19091/20099 (94.98%) Loss: 2.151306 LR: 0.00000623 +[17:46:12] Epoch: 1 Batch: 19092/20099 (94.99%) Loss: 2.221907 LR: 0.00000623 +[17:46:14] Epoch: 1 Batch: 19093/20099 (94.99%) Loss: 1.769434 LR: 0.00000623 +[17:46:16] Epoch: 1 Batch: 19094/20099 (95.00%) Loss: 1.890808 LR: 0.00000623 +[17:46:18] Epoch: 1 Batch: 19095/20099 (95.00%) Loss: 2.033147 LR: 0.00000623 +[17:46:20] Epoch: 1 Batch: 19096/20099 (95.01%) Loss: 2.116609 LR: 0.00000623 +[17:46:22] Epoch: 1 Batch: 19097/20099 (95.01%) Loss: 2.210115 LR: 0.00000623 +[17:46:24] Epoch: 1 Batch: 19098/20099 (95.02%) Loss: 2.256449 LR: 0.00000623 +[17:46:25] Epoch: 1 Batch: 19099/20099 (95.02%) Loss: 2.313223 LR: 0.00000623 +[17:46:27] Epoch: 1 Batch: 19100/20099 (95.03%) Loss: 1.688814 LR: 0.00000623 +[17:46:29] Epoch: 1 Batch: 19101/20099 (95.03%) Loss: 2.271181 LR: 0.00000623 +[17:46:31] Epoch: 1 Batch: 19102/20099 (95.04%) 
Loss: 1.967984 LR: 0.00000623 +[17:46:33] Epoch: 1 Batch: 19103/20099 (95.04%) Loss: 1.996438 LR: 0.00000623 +[17:46:35] Epoch: 1 Batch: 19104/20099 (95.05%) Loss: 1.968906 LR: 0.00000622 +[17:46:37] Epoch: 1 Batch: 19105/20099 (95.05%) Loss: 2.072537 LR: 0.00000622 +[17:46:38] Epoch: 1 Batch: 19106/20099 (95.06%) Loss: 2.073584 LR: 0.00000622 +[17:46:40] Epoch: 1 Batch: 19107/20099 (95.06%) Loss: 1.953782 LR: 0.00000622 +[17:46:42] Epoch: 1 Batch: 19108/20099 (95.07%) Loss: 2.123389 LR: 0.00000622 +[17:46:44] Epoch: 1 Batch: 19109/20099 (95.07%) Loss: 1.984866 LR: 0.00000622 +[17:46:46] Epoch: 1 Batch: 19110/20099 (95.08%) Loss: 2.157506 LR: 0.00000622 +[17:46:48] Epoch: 1 Batch: 19111/20099 (95.08%) Loss: 2.095464 LR: 0.00000622 +[17:46:50] Epoch: 1 Batch: 19112/20099 (95.09%) Loss: 1.908317 LR: 0.00000622 +[17:46:51] Epoch: 1 Batch: 19113/20099 (95.09%) Loss: 2.032139 LR: 0.00000622 +[17:46:53] Epoch: 1 Batch: 19114/20099 (95.10%) Loss: 2.219545 LR: 0.00000622 +[17:46:55] Epoch: 1 Batch: 19115/20099 (95.10%) Loss: 2.160140 LR: 0.00000622 +[17:46:57] Epoch: 1 Batch: 19116/20099 (95.11%) Loss: 2.153012 LR: 0.00000622 +[17:46:59] Epoch: 1 Batch: 19117/20099 (95.11%) Loss: 2.140130 LR: 0.00000622 +[17:47:01] Epoch: 1 Batch: 19118/20099 (95.12%) Loss: 2.097713 LR: 0.00000622 +[17:47:03] Epoch: 1 Batch: 19119/20099 (95.12%) Loss: 2.416894 LR: 0.00000622 +[17:47:04] Epoch: 1 Batch: 19120/20099 (95.13%) Loss: 1.848419 LR: 0.00000622 +[17:47:06] Epoch: 1 Batch: 19121/20099 (95.13%) Loss: 2.056425 LR: 0.00000622 +[17:47:08] Epoch: 1 Batch: 19122/20099 (95.14%) Loss: 2.144402 LR: 0.00000622 +[17:47:10] Epoch: 1 Batch: 19123/20099 (95.14%) Loss: 1.922931 LR: 0.00000622 +[17:47:12] Epoch: 1 Batch: 19124/20099 (95.15%) Loss: 2.279506 LR: 0.00000622 +[17:47:14] Epoch: 1 Batch: 19125/20099 (95.15%) Loss: 2.003331 LR: 0.00000621 +[17:47:16] Epoch: 1 Batch: 19126/20099 (95.16%) Loss: 1.974100 LR: 0.00000621 +[17:47:17] Epoch: 1 Batch: 19127/20099 (95.16%) Loss: 1.791069 LR: 
0.00000621 +[17:47:19] Epoch: 1 Batch: 19128/20099 (95.17%) Loss: 2.304812 LR: 0.00000621 +[17:47:21] Epoch: 1 Batch: 19129/20099 (95.17%) Loss: 2.248230 LR: 0.00000621 +[17:47:23] Epoch: 1 Batch: 19130/20099 (95.18%) Loss: 2.053443 LR: 0.00000621 +[17:47:25] Epoch: 1 Batch: 19131/20099 (95.18%) Loss: 2.002259 LR: 0.00000621 +[17:47:27] Epoch: 1 Batch: 19132/20099 (95.19%) Loss: 1.951243 LR: 0.00000621 +[17:47:29] Epoch: 1 Batch: 19133/20099 (95.19%) Loss: 2.085312 LR: 0.00000621 +[17:47:30] Epoch: 1 Batch: 19134/20099 (95.20%) Loss: 2.450269 LR: 0.00000621 +[17:47:32] Epoch: 1 Batch: 19135/20099 (95.20%) Loss: 2.315492 LR: 0.00000621 +[17:47:34] Epoch: 1 Batch: 19136/20099 (95.21%) Loss: 1.946923 LR: 0.00000621 +[17:47:36] Epoch: 1 Batch: 19137/20099 (95.21%) Loss: 2.330607 LR: 0.00000621 +[17:47:38] Epoch: 1 Batch: 19138/20099 (95.22%) Loss: 2.296384 LR: 0.00000621 +[17:47:40] Epoch: 1 Batch: 19139/20099 (95.22%) Loss: 2.013048 LR: 0.00000621 +[17:47:42] Epoch: 1 Batch: 19140/20099 (95.23%) Loss: 2.147447 LR: 0.00000621 +[17:47:43] Epoch: 1 Batch: 19141/20099 (95.23%) Loss: 1.794285 LR: 0.00000621 +[17:47:45] Epoch: 1 Batch: 19142/20099 (95.24%) Loss: 2.182471 LR: 0.00000621 +[17:47:47] Epoch: 1 Batch: 19143/20099 (95.24%) Loss: 1.964452 LR: 0.00000621 +[17:47:49] Epoch: 1 Batch: 19144/20099 (95.25%) Loss: 1.981396 LR: 0.00000621 +[17:47:51] Epoch: 1 Batch: 19145/20099 (95.25%) Loss: 1.793979 LR: 0.00000621 +[17:47:53] Epoch: 1 Batch: 19146/20099 (95.26%) Loss: 1.831638 LR: 0.00000621 +[17:47:54] Epoch: 1 Batch: 19147/20099 (95.26%) Loss: 2.008935 LR: 0.00000621 +[17:47:56] Epoch: 1 Batch: 19148/20099 (95.27%) Loss: 1.768438 LR: 0.00000621 +[17:47:58] Epoch: 1 Batch: 19149/20099 (95.27%) Loss: 2.036927 LR: 0.00000621 +[17:48:00] Epoch: 1 Batch: 19150/20099 (95.28%) Loss: 2.165780 LR: 0.00000621 +[17:48:02] Epoch: 1 Batch: 19151/20099 (95.28%) Loss: 1.613072 LR: 0.00000621 +[17:48:04] Epoch: 1 Batch: 19152/20099 (95.29%) Loss: 1.954875 LR: 0.00000621 +[17:48:06] 
Epoch: 1 Batch: 19153/20099 (95.29%) Loss: 1.835941 LR: 0.00000620 +[17:48:07] Epoch: 1 Batch: 19154/20099 (95.30%) Loss: 2.073872 LR: 0.00000620 +[17:48:09] Epoch: 1 Batch: 19155/20099 (95.30%) Loss: 2.034414 LR: 0.00000620 +[17:48:11] Epoch: 1 Batch: 19156/20099 (95.31%) Loss: 1.940453 LR: 0.00000620 +[17:48:13] Epoch: 1 Batch: 19157/20099 (95.31%) Loss: 1.988105 LR: 0.00000620 +[17:48:15] Epoch: 1 Batch: 19158/20099 (95.32%) Loss: 2.059215 LR: 0.00000620 +[17:48:17] Epoch: 1 Batch: 19159/20099 (95.32%) Loss: 2.075120 LR: 0.00000620 +[17:48:19] Epoch: 1 Batch: 19160/20099 (95.33%) Loss: 1.976802 LR: 0.00000620 +[17:48:20] Epoch: 1 Batch: 19161/20099 (95.33%) Loss: 1.928421 LR: 0.00000620 +[17:48:22] Epoch: 1 Batch: 19162/20099 (95.34%) Loss: 2.027040 LR: 0.00000620 +[17:48:24] Epoch: 1 Batch: 19163/20099 (95.34%) Loss: 1.999191 LR: 0.00000620 +[17:48:26] Epoch: 1 Batch: 19164/20099 (95.35%) Loss: 2.044357 LR: 0.00000620 +[17:48:28] Epoch: 1 Batch: 19165/20099 (95.35%) Loss: 2.076180 LR: 0.00000620 +[17:48:30] Epoch: 1 Batch: 19166/20099 (95.36%) Loss: 2.222180 LR: 0.00000620 +[17:48:31] Epoch: 1 Batch: 19167/20099 (95.36%) Loss: 1.950299 LR: 0.00000620 +[17:48:33] Epoch: 1 Batch: 19168/20099 (95.37%) Loss: 2.166123 LR: 0.00000620 +[17:48:35] Epoch: 1 Batch: 19169/20099 (95.37%) Loss: 2.267328 LR: 0.00000620 +[17:48:37] Epoch: 1 Batch: 19170/20099 (95.38%) Loss: 1.901120 LR: 0.00000620 +[17:48:39] Epoch: 1 Batch: 19171/20099 (95.38%) Loss: 2.203068 LR: 0.00000620 +[17:48:41] Epoch: 1 Batch: 19172/20099 (95.39%) Loss: 1.718506 LR: 0.00000620 +[17:48:43] Epoch: 1 Batch: 19173/20099 (95.39%) Loss: 2.164622 LR: 0.00000620 +[17:48:44] Epoch: 1 Batch: 19174/20099 (95.40%) Loss: 2.154397 LR: 0.00000619 +[17:48:46] Epoch: 1 Batch: 19175/20099 (95.40%) Loss: 2.189280 LR: 0.00000619 +[17:48:48] Epoch: 1 Batch: 19176/20099 (95.41%) Loss: 1.990188 LR: 0.00000619 +[17:48:50] Epoch: 1 Batch: 19177/20099 (95.41%) Loss: 2.105336 LR: 0.00000619 +[17:48:52] Epoch: 1 Batch: 
19178/20099 (95.42%) Loss: 2.048081 LR: 0.00000619 +[17:48:54] Epoch: 1 Batch: 19179/20099 (95.42%) Loss: 2.012219 LR: 0.00000619 +[17:48:56] Epoch: 1 Batch: 19180/20099 (95.43%) Loss: 2.290995 LR: 0.00000619 +[17:48:57] Epoch: 1 Batch: 19181/20099 (95.43%) Loss: 2.003184 LR: 0.00000619 +[17:48:59] Epoch: 1 Batch: 19182/20099 (95.44%) Loss: 2.192874 LR: 0.00000619 +[17:49:01] Epoch: 1 Batch: 19183/20099 (95.44%) Loss: 2.045545 LR: 0.00000619 +[17:49:03] Epoch: 1 Batch: 19184/20099 (95.45%) Loss: 1.915342 LR: 0.00000619 +[17:49:05] Epoch: 1 Batch: 19185/20099 (95.45%) Loss: 1.938540 LR: 0.00000619 +[17:49:07] Epoch: 1 Batch: 19186/20099 (95.46%) Loss: 2.322129 LR: 0.00000619 +[17:49:09] Epoch: 1 Batch: 19187/20099 (95.46%) Loss: 2.157093 LR: 0.00000619 +[17:49:10] Epoch: 1 Batch: 19188/20099 (95.47%) Loss: 2.030738 LR: 0.00000619 +[17:49:12] Epoch: 1 Batch: 19189/20099 (95.47%) Loss: 1.976158 LR: 0.00000619 +[17:49:14] Epoch: 1 Batch: 19190/20099 (95.48%) Loss: 2.063291 LR: 0.00000619 +[17:49:16] Epoch: 1 Batch: 19191/20099 (95.48%) Loss: 1.848789 LR: 0.00000619 +[17:49:18] Epoch: 1 Batch: 19192/20099 (95.49%) Loss: 1.696146 LR: 0.00000619 +[17:49:20] Epoch: 1 Batch: 19193/20099 (95.49%) Loss: 2.008295 LR: 0.00000619 +[17:49:22] Epoch: 1 Batch: 19194/20099 (95.50%) Loss: 2.154993 LR: 0.00000619 +[17:49:23] Epoch: 1 Batch: 19195/20099 (95.50%) Loss: 2.204476 LR: 0.00000619 +[17:49:25] Epoch: 1 Batch: 19196/20099 (95.51%) Loss: 2.223096 LR: 0.00000619 +[17:49:27] Epoch: 1 Batch: 19197/20099 (95.51%) Loss: 2.149396 LR: 0.00000619 +[17:49:29] Epoch: 1 Batch: 19198/20099 (95.52%) Loss: 2.151521 LR: 0.00000619 +[17:49:31] Epoch: 1 Batch: 19199/20099 (95.52%) Loss: 1.866635 LR: 0.00000619 +[17:49:36] >> Cleaned up old temp checkpoint: epoch1_step17200 +[17:49:36] >> Temp checkpoint saved: epoch1_step19200, size: 0.1693 GB +[17:49:36] Epoch: 1 Batch: 19200/20099 (95.53%) Loss: 2.144485 LR: 0.00000619 +[17:49:38] Epoch: 1 Batch: 19201/20099 (95.53%) Loss: 2.156549 LR: 
0.00000619 +[17:49:40] Epoch: 1 Batch: 19202/20099 (95.54%) Loss: 2.143527 LR: 0.00000618 +[17:49:42] Epoch: 1 Batch: 19203/20099 (95.54%) Loss: 2.074331 LR: 0.00000618 +[17:49:44] Epoch: 1 Batch: 19204/20099 (95.55%) Loss: 1.984075 LR: 0.00000618 +[17:49:45] Epoch: 1 Batch: 19205/20099 (95.55%) Loss: 1.979135 LR: 0.00000618 +[17:49:47] Epoch: 1 Batch: 19206/20099 (95.56%) Loss: 2.152142 LR: 0.00000618 +[17:49:49] Epoch: 1 Batch: 19207/20099 (95.56%) Loss: 2.554687 LR: 0.00000618 +[17:49:51] Epoch: 1 Batch: 19208/20099 (95.57%) Loss: 2.116932 LR: 0.00000618 +[17:49:53] Epoch: 1 Batch: 19209/20099 (95.57%) Loss: 2.299011 LR: 0.00000618 +[17:49:55] Epoch: 1 Batch: 19210/20099 (95.58%) Loss: 2.098840 LR: 0.00000618 +[17:49:57] Epoch: 1 Batch: 19211/20099 (95.58%) Loss: 2.005692 LR: 0.00000618 +[17:49:59] Epoch: 1 Batch: 19212/20099 (95.59%) Loss: 1.906707 LR: 0.00000618 +[17:50:00] Epoch: 1 Batch: 19213/20099 (95.59%) Loss: 2.253526 LR: 0.00000618 +[17:50:02] Epoch: 1 Batch: 19214/20099 (95.60%) Loss: 2.148888 LR: 0.00000618 +[17:50:04] Epoch: 1 Batch: 19215/20099 (95.60%) Loss: 2.282751 LR: 0.00000618 +[17:50:06] Epoch: 1 Batch: 19216/20099 (95.61%) Loss: 2.112956 LR: 0.00000618 +[17:50:08] Epoch: 1 Batch: 19217/20099 (95.61%) Loss: 2.117719 LR: 0.00000618 +[17:50:10] Epoch: 1 Batch: 19218/20099 (95.62%) Loss: 2.298433 LR: 0.00000618 +[17:50:12] Epoch: 1 Batch: 19219/20099 (95.62%) Loss: 2.285566 LR: 0.00000618 +[17:50:14] Epoch: 1 Batch: 19220/20099 (95.63%) Loss: 2.140619 LR: 0.00000618 +[17:50:15] Epoch: 1 Batch: 19221/20099 (95.63%) Loss: 2.106320 LR: 0.00000618 +[17:50:17] Epoch: 1 Batch: 19222/20099 (95.64%) Loss: 2.344803 LR: 0.00000618 +[17:50:19] Epoch: 1 Batch: 19223/20099 (95.64%) Loss: 1.940233 LR: 0.00000617 +[17:50:21] Epoch: 1 Batch: 19224/20099 (95.65%) Loss: 1.980187 LR: 0.00000617 +[17:50:23] Epoch: 1 Batch: 19225/20099 (95.65%) Loss: 2.072789 LR: 0.00000617 +[17:50:25] Epoch: 1 Batch: 19226/20099 (95.66%) Loss: 2.085811 LR: 0.00000617 +[17:50:27] 
Epoch: 1 Batch: 19227/20099 (95.66%) Loss: 2.177317 LR: 0.00000617 +[17:50:28] Epoch: 1 Batch: 19228/20099 (95.67%) Loss: 2.180286 LR: 0.00000617 +[17:50:30] Epoch: 1 Batch: 19229/20099 (95.67%) Loss: 1.923912 LR: 0.00000617 +[17:50:32] Epoch: 1 Batch: 19230/20099 (95.68%) Loss: 2.132566 LR: 0.00000617 +[17:50:34] Epoch: 1 Batch: 19231/20099 (95.68%) Loss: 2.087453 LR: 0.00000617 +[17:50:36] Epoch: 1 Batch: 19232/20099 (95.69%) Loss: 1.976746 LR: 0.00000617 +[17:50:38] Epoch: 1 Batch: 19233/20099 (95.69%) Loss: 1.997505 LR: 0.00000617 +[17:50:40] Epoch: 1 Batch: 19234/20099 (95.70%) Loss: 2.364505 LR: 0.00000617 +[17:50:41] Epoch: 1 Batch: 19235/20099 (95.70%) Loss: 2.005272 LR: 0.00000617 +[17:50:43] Epoch: 1 Batch: 19236/20099 (95.71%) Loss: 2.084250 LR: 0.00000617 +[17:50:45] Epoch: 1 Batch: 19237/20099 (95.71%) Loss: 2.101813 LR: 0.00000617 +[17:50:47] Epoch: 1 Batch: 19238/20099 (95.72%) Loss: 1.898337 LR: 0.00000617 +[17:50:49] Epoch: 1 Batch: 19239/20099 (95.72%) Loss: 1.864245 LR: 0.00000617 +[17:50:51] Epoch: 1 Batch: 19240/20099 (95.73%) Loss: 1.887712 LR: 0.00000617 +[17:50:52] Epoch: 1 Batch: 19241/20099 (95.73%) Loss: 2.212617 LR: 0.00000617 +[17:50:54] Epoch: 1 Batch: 19242/20099 (95.74%) Loss: 2.278148 LR: 0.00000617 +[17:50:56] Epoch: 1 Batch: 19243/20099 (95.74%) Loss: 2.362340 LR: 0.00000617 +[17:50:58] Epoch: 1 Batch: 19244/20099 (95.75%) Loss: 2.214197 LR: 0.00000617 +[17:51:00] Epoch: 1 Batch: 19245/20099 (95.75%) Loss: 1.943503 LR: 0.00000617 +[17:51:02] Epoch: 1 Batch: 19246/20099 (95.76%) Loss: 2.040317 LR: 0.00000617 +[17:51:04] Epoch: 1 Batch: 19247/20099 (95.76%) Loss: 2.189617 LR: 0.00000617 +[17:51:05] Epoch: 1 Batch: 19248/20099 (95.77%) Loss: 2.145845 LR: 0.00000617 +[17:51:07] Epoch: 1 Batch: 19249/20099 (95.77%) Loss: 2.211088 LR: 0.00000617 +[17:51:09] Epoch: 1 Batch: 19250/20099 (95.78%) Loss: 2.423914 LR: 0.00000617 +[17:51:11] Epoch: 1 Batch: 19251/20099 (95.78%) Loss: 2.307972 LR: 0.00000616 +[17:51:13] Epoch: 1 Batch: 
19252/20099 (95.79%) Loss: 2.269443 LR: 0.00000616 +[17:51:15] Epoch: 1 Batch: 19253/20099 (95.79%) Loss: 2.143629 LR: 0.00000616 +[17:51:17] Epoch: 1 Batch: 19254/20099 (95.80%) Loss: 2.064335 LR: 0.00000616 +[17:51:18] Epoch: 1 Batch: 19255/20099 (95.80%) Loss: 2.286610 LR: 0.00000616 +[17:51:20] Epoch: 1 Batch: 19256/20099 (95.81%) Loss: 1.904684 LR: 0.00000616 +[17:51:22] Epoch: 1 Batch: 19257/20099 (95.81%) Loss: 1.972625 LR: 0.00000616 +[17:51:24] Epoch: 1 Batch: 19258/20099 (95.82%) Loss: 1.896065 LR: 0.00000616 +[17:51:26] Epoch: 1 Batch: 19259/20099 (95.82%) Loss: 2.162983 LR: 0.00000616 +[17:51:28] Epoch: 1 Batch: 19260/20099 (95.83%) Loss: 2.011559 LR: 0.00000616 +[17:51:30] Epoch: 1 Batch: 19261/20099 (95.83%) Loss: 1.939342 LR: 0.00000616 +[17:51:31] Epoch: 1 Batch: 19262/20099 (95.84%) Loss: 2.360000 LR: 0.00000616 +[17:51:33] Epoch: 1 Batch: 19263/20099 (95.84%) Loss: 2.007145 LR: 0.00000616 +[17:51:35] Epoch: 1 Batch: 19264/20099 (95.85%) Loss: 2.060386 LR: 0.00000616 +[17:51:37] Epoch: 1 Batch: 19265/20099 (95.85%) Loss: 2.213036 LR: 0.00000616 +[17:51:39] Epoch: 1 Batch: 19266/20099 (95.86%) Loss: 2.171942 LR: 0.00000616 +[17:51:41] Epoch: 1 Batch: 19267/20099 (95.86%) Loss: 1.997608 LR: 0.00000616 +[17:51:43] Epoch: 1 Batch: 19268/20099 (95.87%) Loss: 2.044609 LR: 0.00000616 +[17:51:44] Epoch: 1 Batch: 19269/20099 (95.87%) Loss: 2.135028 LR: 0.00000616 +[17:51:46] Epoch: 1 Batch: 19270/20099 (95.88%) Loss: 2.251157 LR: 0.00000616 +[17:51:48] Epoch: 1 Batch: 19271/20099 (95.88%) Loss: 2.073727 LR: 0.00000616 +[17:51:50] Epoch: 1 Batch: 19272/20099 (95.89%) Loss: 1.850901 LR: 0.00000616 +[17:51:52] Epoch: 1 Batch: 19273/20099 (95.89%) Loss: 2.174431 LR: 0.00000616 +[17:51:54] Epoch: 1 Batch: 19274/20099 (95.90%) Loss: 1.966506 LR: 0.00000616 +[17:51:56] Epoch: 1 Batch: 19275/20099 (95.90%) Loss: 1.960494 LR: 0.00000616 +[17:51:58] Epoch: 1 Batch: 19276/20099 (95.91%) Loss: 1.920515 LR: 0.00000616 +[17:51:59] Epoch: 1 Batch: 19277/20099 (95.91%) 
Loss: 2.121280 LR: 0.00000616 +[17:52:01] Epoch: 1 Batch: 19278/20099 (95.92%) Loss: 2.005651 LR: 0.00000616 +[17:52:03] Epoch: 1 Batch: 19279/20099 (95.92%) Loss: 2.349174 LR: 0.00000615 +[17:52:05] Epoch: 1 Batch: 19280/20099 (95.93%) Loss: 2.098175 LR: 0.00000615 +[17:52:07] Epoch: 1 Batch: 19281/20099 (95.93%) Loss: 2.254025 LR: 0.00000615 +[17:52:09] Epoch: 1 Batch: 19282/20099 (95.94%) Loss: 1.946934 LR: 0.00000615 +[17:52:10] Epoch: 1 Batch: 19283/20099 (95.94%) Loss: 2.241768 LR: 0.00000615 +[17:52:12] Epoch: 1 Batch: 19284/20099 (95.95%) Loss: 1.918355 LR: 0.00000615 +[17:52:14] Epoch: 1 Batch: 19285/20099 (95.95%) Loss: 2.156464 LR: 0.00000615 +[17:52:16] Epoch: 1 Batch: 19286/20099 (95.96%) Loss: 1.896860 LR: 0.00000615 +[17:52:18] Epoch: 1 Batch: 19287/20099 (95.96%) Loss: 1.747467 LR: 0.00000615 +[17:52:20] Epoch: 1 Batch: 19288/20099 (95.96%) Loss: 2.037144 LR: 0.00000615 +[17:52:22] Epoch: 1 Batch: 19289/20099 (95.97%) Loss: 1.990064 LR: 0.00000615 +[17:52:23] Epoch: 1 Batch: 19290/20099 (95.97%) Loss: 2.401813 LR: 0.00000615 +[17:52:25] Epoch: 1 Batch: 19291/20099 (95.98%) Loss: 2.035743 LR: 0.00000615 +[17:52:27] Epoch: 1 Batch: 19292/20099 (95.98%) Loss: 1.954222 LR: 0.00000615 +[17:52:29] Epoch: 1 Batch: 19293/20099 (95.99%) Loss: 2.066465 LR: 0.00000615 +[17:52:31] Epoch: 1 Batch: 19294/20099 (95.99%) Loss: 1.846586 LR: 0.00000615 +[17:52:33] Epoch: 1 Batch: 19295/20099 (96.00%) Loss: 2.185046 LR: 0.00000615 +[17:52:35] Epoch: 1 Batch: 19296/20099 (96.00%) Loss: 2.247917 LR: 0.00000615 +[17:52:36] Epoch: 1 Batch: 19297/20099 (96.01%) Loss: 2.504989 LR: 0.00000615 +[17:52:38] Epoch: 1 Batch: 19298/20099 (96.01%) Loss: 1.830147 LR: 0.00000615 +[17:52:40] Epoch: 1 Batch: 19299/20099 (96.02%) Loss: 1.966463 LR: 0.00000615 +[17:52:42] Epoch: 1 Batch: 19300/20099 (96.02%) Loss: 2.045630 LR: 0.00000615 +[17:52:44] Epoch: 1 Batch: 19301/20099 (96.03%) Loss: 2.121202 LR: 0.00000615 +[17:52:46] Epoch: 1 Batch: 19302/20099 (96.03%) Loss: 2.001701 LR: 
0.00000615 +[17:52:47] Epoch: 1 Batch: 19303/20099 (96.04%) Loss: 2.119357 LR: 0.00000615 +[17:52:49] Epoch: 1 Batch: 19304/20099 (96.04%) Loss: 2.324713 LR: 0.00000615 +[17:52:51] Epoch: 1 Batch: 19305/20099 (96.05%) Loss: 2.090404 LR: 0.00000615 +[17:52:53] Epoch: 1 Batch: 19306/20099 (96.05%) Loss: 2.235808 LR: 0.00000615 +[17:52:55] Epoch: 1 Batch: 19307/20099 (96.06%) Loss: 2.131451 LR: 0.00000614 +[17:52:57] Epoch: 1 Batch: 19308/20099 (96.06%) Loss: 1.998028 LR: 0.00000614 +[17:52:59] Epoch: 1 Batch: 19309/20099 (96.07%) Loss: 2.403607 LR: 0.00000614 +[17:53:00] Epoch: 1 Batch: 19310/20099 (96.07%) Loss: 2.035325 LR: 0.00000614 +[17:53:02] Epoch: 1 Batch: 19311/20099 (96.08%) Loss: 2.004435 LR: 0.00000614 +[17:53:04] Epoch: 1 Batch: 19312/20099 (96.08%) Loss: 2.523798 LR: 0.00000614 +[17:53:06] Epoch: 1 Batch: 19313/20099 (96.09%) Loss: 2.101306 LR: 0.00000614 +[17:53:08] Epoch: 1 Batch: 19314/20099 (96.09%) Loss: 2.082710 LR: 0.00000614 +[17:53:10] Epoch: 1 Batch: 19315/20099 (96.10%) Loss: 2.158450 LR: 0.00000614 +[17:53:12] Epoch: 1 Batch: 19316/20099 (96.10%) Loss: 2.085894 LR: 0.00000614 +[17:53:13] Epoch: 1 Batch: 19317/20099 (96.11%) Loss: 1.920276 LR: 0.00000614 +[17:53:15] Epoch: 1 Batch: 19318/20099 (96.11%) Loss: 2.350251 LR: 0.00000614 +[17:53:17] Epoch: 1 Batch: 19319/20099 (96.12%) Loss: 2.002548 LR: 0.00000614 +[17:53:19] Epoch: 1 Batch: 19320/20099 (96.12%) Loss: 2.005158 LR: 0.00000614 +[17:53:21] Epoch: 1 Batch: 19321/20099 (96.13%) Loss: 1.957619 LR: 0.00000614 +[17:53:23] Epoch: 1 Batch: 19322/20099 (96.13%) Loss: 2.168481 LR: 0.00000614 +[17:53:25] Epoch: 1 Batch: 19323/20099 (96.14%) Loss: 1.947701 LR: 0.00000614 +[17:53:26] Epoch: 1 Batch: 19324/20099 (96.14%) Loss: 2.057435 LR: 0.00000614 +[17:53:28] Epoch: 1 Batch: 19325/20099 (96.15%) Loss: 1.992551 LR: 0.00000614 +[17:53:30] Epoch: 1 Batch: 19326/20099 (96.15%) Loss: 2.271888 LR: 0.00000614 +[17:53:32] Epoch: 1 Batch: 19327/20099 (96.16%) Loss: 1.961425 LR: 0.00000614 +[17:53:34] 
Epoch: 1 Batch: 19328/20099 (96.16%) Loss: 2.078290 LR: 0.00000614 +[17:53:36] Epoch: 1 Batch: 19329/20099 (96.17%) Loss: 2.182646 LR: 0.00000614 +[17:53:38] Epoch: 1 Batch: 19330/20099 (96.17%) Loss: 2.046505 LR: 0.00000614 +[17:53:40] Epoch: 1 Batch: 19331/20099 (96.18%) Loss: 2.068397 LR: 0.00000614 +[17:53:41] Epoch: 1 Batch: 19332/20099 (96.18%) Loss: 2.285892 LR: 0.00000614 +[17:53:43] Epoch: 1 Batch: 19333/20099 (96.19%) Loss: 1.986217 LR: 0.00000614 +[17:53:45] Epoch: 1 Batch: 19334/20099 (96.19%) Loss: 2.115404 LR: 0.00000614 +[17:53:47] Epoch: 1 Batch: 19335/20099 (96.20%) Loss: 2.298965 LR: 0.00000613 +[17:53:49] Epoch: 1 Batch: 19336/20099 (96.20%) Loss: 1.903817 LR: 0.00000613 +[17:53:51] Epoch: 1 Batch: 19337/20099 (96.21%) Loss: 2.047595 LR: 0.00000613 +[17:53:53] Epoch: 1 Batch: 19338/20099 (96.21%) Loss: 2.126400 LR: 0.00000613 +[17:53:54] Epoch: 1 Batch: 19339/20099 (96.22%) Loss: 1.660000 LR: 0.00000613 +[17:53:56] Epoch: 1 Batch: 19340/20099 (96.22%) Loss: 2.151557 LR: 0.00000613 +[17:53:58] Epoch: 1 Batch: 19341/20099 (96.23%) Loss: 2.040665 LR: 0.00000613 +[17:54:00] Epoch: 1 Batch: 19342/20099 (96.23%) Loss: 2.017295 LR: 0.00000613 +[17:54:02] Epoch: 1 Batch: 19343/20099 (96.24%) Loss: 2.097999 LR: 0.00000613 +[17:54:04] Epoch: 1 Batch: 19344/20099 (96.24%) Loss: 2.218729 LR: 0.00000613 +[17:54:06] Epoch: 1 Batch: 19345/20099 (96.25%) Loss: 1.557062 LR: 0.00000613 +[17:54:07] Epoch: 1 Batch: 19346/20099 (96.25%) Loss: 2.061547 LR: 0.00000613 +[17:54:09] Epoch: 1 Batch: 19347/20099 (96.26%) Loss: 2.107486 LR: 0.00000613 +[17:54:11] Epoch: 1 Batch: 19348/20099 (96.26%) Loss: 1.981693 LR: 0.00000613 +[17:54:13] Epoch: 1 Batch: 19349/20099 (96.27%) Loss: 2.163957 LR: 0.00000613 +[17:54:15] Epoch: 1 Batch: 19350/20099 (96.27%) Loss: 1.940536 LR: 0.00000613 +[17:54:17] Epoch: 1 Batch: 19351/20099 (96.28%) Loss: 2.279514 LR: 0.00000613 +[17:54:19] Epoch: 1 Batch: 19352/20099 (96.28%) Loss: 2.062138 LR: 0.00000613 +[17:54:20] Epoch: 1 Batch: 
19353/20099 (96.29%) Loss: 2.368973 LR: 0.00000613 +[17:54:22] Epoch: 1 Batch: 19354/20099 (96.29%) Loss: 2.137545 LR: 0.00000613 +[17:54:24] Epoch: 1 Batch: 19355/20099 (96.30%) Loss: 1.982227 LR: 0.00000613 +[17:54:26] Epoch: 1 Batch: 19356/20099 (96.30%) Loss: 2.155360 LR: 0.00000613 +[17:54:28] Epoch: 1 Batch: 19357/20099 (96.31%) Loss: 1.993412 LR: 0.00000613 +[17:54:30] Epoch: 1 Batch: 19358/20099 (96.31%) Loss: 2.258946 LR: 0.00000613 +[17:54:31] Epoch: 1 Batch: 19359/20099 (96.32%) Loss: 2.113856 LR: 0.00000613 +[17:54:33] Epoch: 1 Batch: 19360/20099 (96.32%) Loss: 1.843676 LR: 0.00000613 +[17:54:35] Epoch: 1 Batch: 19361/20099 (96.33%) Loss: 2.309748 LR: 0.00000613 +[17:54:37] Epoch: 1 Batch: 19362/20099 (96.33%) Loss: 1.844963 LR: 0.00000613 +[17:54:39] Epoch: 1 Batch: 19363/20099 (96.34%) Loss: 2.246712 LR: 0.00000612 +[17:54:41] Epoch: 1 Batch: 19364/20099 (96.34%) Loss: 2.020667 LR: 0.00000612 +[17:54:43] Epoch: 1 Batch: 19365/20099 (96.35%) Loss: 2.043519 LR: 0.00000612 +[17:54:44] Epoch: 1 Batch: 19366/20099 (96.35%) Loss: 2.245112 LR: 0.00000612 +[17:54:46] Epoch: 1 Batch: 19367/20099 (96.36%) Loss: 1.809615 LR: 0.00000612 +[17:54:48] Epoch: 1 Batch: 19368/20099 (96.36%) Loss: 2.149504 LR: 0.00000612 +[17:54:50] Epoch: 1 Batch: 19369/20099 (96.37%) Loss: 2.042169 LR: 0.00000612 +[17:54:52] Epoch: 1 Batch: 19370/20099 (96.37%) Loss: 2.340504 LR: 0.00000612 +[17:54:54] Epoch: 1 Batch: 19371/20099 (96.38%) Loss: 1.903045 LR: 0.00000612 +[17:54:56] Epoch: 1 Batch: 19372/20099 (96.38%) Loss: 2.288682 LR: 0.00000612 +[17:54:57] Epoch: 1 Batch: 19373/20099 (96.39%) Loss: 2.155350 LR: 0.00000612 +[17:54:59] Epoch: 1 Batch: 19374/20099 (96.39%) Loss: 1.990450 LR: 0.00000612 +[17:55:01] Epoch: 1 Batch: 19375/20099 (96.40%) Loss: 1.896698 LR: 0.00000612 +[17:55:03] Epoch: 1 Batch: 19376/20099 (96.40%) Loss: 1.763433 LR: 0.00000612 +[17:55:05] Epoch: 1 Batch: 19377/20099 (96.41%) Loss: 1.811717 LR: 0.00000612 +[17:55:07] Epoch: 1 Batch: 19378/20099 (96.41%) 
Loss: 1.978618 LR: 0.00000612 +[17:55:09] Epoch: 1 Batch: 19379/20099 (96.42%) Loss: 1.934530 LR: 0.00000612 +[17:55:10] Epoch: 1 Batch: 19380/20099 (96.42%) Loss: 2.182753 LR: 0.00000612 +[17:55:12] Epoch: 1 Batch: 19381/20099 (96.43%) Loss: 1.943883 LR: 0.00000612 +[17:55:14] Epoch: 1 Batch: 19382/20099 (96.43%) Loss: 1.996518 LR: 0.00000612 +[17:55:16] Epoch: 1 Batch: 19383/20099 (96.44%) Loss: 2.188907 LR: 0.00000612 +[17:55:18] Epoch: 1 Batch: 19384/20099 (96.44%) Loss: 2.021890 LR: 0.00000612 +[17:55:20] Epoch: 1 Batch: 19385/20099 (96.45%) Loss: 2.134569 LR: 0.00000612 +[17:55:22] Epoch: 1 Batch: 19386/20099 (96.45%) Loss: 1.697063 LR: 0.00000612 +[17:55:23] Epoch: 1 Batch: 19387/20099 (96.46%) Loss: 2.006622 LR: 0.00000612 +[17:55:25] Epoch: 1 Batch: 19388/20099 (96.46%) Loss: 2.066455 LR: 0.00000612 +[17:55:27] Epoch: 1 Batch: 19389/20099 (96.47%) Loss: 1.704357 LR: 0.00000612 +[17:55:29] Epoch: 1 Batch: 19390/20099 (96.47%) Loss: 2.009863 LR: 0.00000612 +[17:55:31] Epoch: 1 Batch: 19391/20099 (96.48%) Loss: 2.196047 LR: 0.00000611 +[17:55:33] Epoch: 1 Batch: 19392/20099 (96.48%) Loss: 2.179289 LR: 0.00000611 +[17:55:35] Epoch: 1 Batch: 19393/20099 (96.49%) Loss: 2.236217 LR: 0.00000611 +[17:55:36] Epoch: 1 Batch: 19394/20099 (96.49%) Loss: 2.035829 LR: 0.00000611 +[17:55:38] Epoch: 1 Batch: 19395/20099 (96.50%) Loss: 1.961829 LR: 0.00000611 +[17:55:40] Epoch: 1 Batch: 19396/20099 (96.50%) Loss: 2.051625 LR: 0.00000611 +[17:55:42] Epoch: 1 Batch: 19397/20099 (96.51%) Loss: 2.077054 LR: 0.00000611 +[17:55:44] Epoch: 1 Batch: 19398/20099 (96.51%) Loss: 1.812675 LR: 0.00000611 +[17:55:46] Epoch: 1 Batch: 19399/20099 (96.52%) Loss: 2.121755 LR: 0.00000611 +[17:55:51] >> Cleaned up old temp checkpoint: epoch1_step17400 +[17:55:51] >> Temp checkpoint saved: epoch1_step19400, size: 0.1693 GB +[17:55:51] Epoch: 1 Batch: 19400/20099 (96.52%) Loss: 1.925099 LR: 0.00000611 +[17:55:53] Epoch: 1 Batch: 19401/20099 (96.53%) Loss: 1.917139 LR: 0.00000611 +[17:55:55] 
Epoch: 1 Batch: 19402/20099 (96.53%) Loss: 2.258086 LR: 0.00000611 +[17:55:57] Epoch: 1 Batch: 19403/20099 (96.54%) Loss: 2.195222 LR: 0.00000611 +[17:55:59] Epoch: 1 Batch: 19404/20099 (96.54%) Loss: 2.020311 LR: 0.00000611 +[17:56:00] Epoch: 1 Batch: 19405/20099 (96.55%) Loss: 1.865507 LR: 0.00000611 +[17:56:02] Epoch: 1 Batch: 19406/20099 (96.55%) Loss: 1.990460 LR: 0.00000611 +[17:56:04] Epoch: 1 Batch: 19407/20099 (96.56%) Loss: 2.076969 LR: 0.00000611 +[17:56:06] Epoch: 1 Batch: 19408/20099 (96.56%) Loss: 2.015680 LR: 0.00000611 +[17:56:08] Epoch: 1 Batch: 19409/20099 (96.57%) Loss: 2.140987 LR: 0.00000611 +[17:56:10] Epoch: 1 Batch: 19410/20099 (96.57%) Loss: 2.205754 LR: 0.00000611 +[17:56:12] Epoch: 1 Batch: 19411/20099 (96.58%) Loss: 2.132425 LR: 0.00000611 +[17:56:13] Epoch: 1 Batch: 19412/20099 (96.58%) Loss: 1.985133 LR: 0.00000611 +[17:56:15] Epoch: 1 Batch: 19413/20099 (96.59%) Loss: 2.017500 LR: 0.00000611 +[17:56:17] Epoch: 1 Batch: 19414/20099 (96.59%) Loss: 2.086883 LR: 0.00000611 +[17:56:19] Epoch: 1 Batch: 19415/20099 (96.60%) Loss: 1.951419 LR: 0.00000611 +[17:56:21] Epoch: 1 Batch: 19416/20099 (96.60%) Loss: 2.102908 LR: 0.00000611 +[17:56:23] Epoch: 1 Batch: 19417/20099 (96.61%) Loss: 2.181990 LR: 0.00000611 +[17:56:25] Epoch: 1 Batch: 19418/20099 (96.61%) Loss: 2.164812 LR: 0.00000611 +[17:56:26] Epoch: 1 Batch: 19419/20099 (96.62%) Loss: 1.808563 LR: 0.00000611 +[17:56:28] Epoch: 1 Batch: 19420/20099 (96.62%) Loss: 2.159036 LR: 0.00000611 +[17:56:30] Epoch: 1 Batch: 19421/20099 (96.63%) Loss: 2.462580 LR: 0.00000611 +[17:56:32] Epoch: 1 Batch: 19422/20099 (96.63%) Loss: 2.323354 LR: 0.00000611 +[17:56:34] Epoch: 1 Batch: 19423/20099 (96.64%) Loss: 2.112163 LR: 0.00000611 +[17:56:36] Epoch: 1 Batch: 19424/20099 (96.64%) Loss: 2.001533 LR: 0.00000611 +[17:56:38] Epoch: 1 Batch: 19425/20099 (96.65%) Loss: 1.966708 LR: 0.00000611 +[17:56:40] Epoch: 1 Batch: 19426/20099 (96.65%) Loss: 1.804164 LR: 0.00000610 +[17:56:41] Epoch: 1 Batch: 
19427/20099 (96.66%) Loss: 2.171651 LR: 0.00000610 +[17:56:43] Epoch: 1 Batch: 19428/20099 (96.66%) Loss: 1.965465 LR: 0.00000610 +[17:56:45] Epoch: 1 Batch: 19429/20099 (96.67%) Loss: 2.064736 LR: 0.00000610 +[17:56:47] Epoch: 1 Batch: 19430/20099 (96.67%) Loss: 2.281380 LR: 0.00000610 +[17:56:49] Epoch: 1 Batch: 19431/20099 (96.68%) Loss: 2.105890 LR: 0.00000610 +[17:56:51] Epoch: 1 Batch: 19432/20099 (96.68%) Loss: 2.204804 LR: 0.00000610 +[17:56:53] Epoch: 1 Batch: 19433/20099 (96.69%) Loss: 1.852398 LR: 0.00000610 +[17:56:54] Epoch: 1 Batch: 19434/20099 (96.69%) Loss: 2.217866 LR: 0.00000610 +[17:56:56] Epoch: 1 Batch: 19435/20099 (96.70%) Loss: 1.799012 LR: 0.00000610 +[17:56:58] Epoch: 1 Batch: 19436/20099 (96.70%) Loss: 2.312256 LR: 0.00000610 +[17:57:00] Epoch: 1 Batch: 19437/20099 (96.71%) Loss: 2.063627 LR: 0.00000610 +[17:57:02] Epoch: 1 Batch: 19438/20099 (96.71%) Loss: 2.141188 LR: 0.00000610 +[17:57:04] Epoch: 1 Batch: 19439/20099 (96.72%) Loss: 2.190587 LR: 0.00000610 +[17:57:06] Epoch: 1 Batch: 19440/20099 (96.72%) Loss: 2.201427 LR: 0.00000610 +[17:57:07] Epoch: 1 Batch: 19441/20099 (96.73%) Loss: 2.004861 LR: 0.00000610 +[17:57:09] Epoch: 1 Batch: 19442/20099 (96.73%) Loss: 1.692448 LR: 0.00000610 +[17:57:11] Epoch: 1 Batch: 19443/20099 (96.74%) Loss: 1.792293 LR: 0.00000610 +[17:57:13] Epoch: 1 Batch: 19444/20099 (96.74%) Loss: 2.099373 LR: 0.00000610 +[17:57:15] Epoch: 1 Batch: 19445/20099 (96.75%) Loss: 1.901873 LR: 0.00000610 +[17:57:17] Epoch: 1 Batch: 19446/20099 (96.75%) Loss: 1.975660 LR: 0.00000610 +[17:57:19] Epoch: 1 Batch: 19447/20099 (96.76%) Loss: 1.867064 LR: 0.00000610 +[17:57:20] Epoch: 1 Batch: 19448/20099 (96.76%) Loss: 2.299204 LR: 0.00000610 +[17:57:22] Epoch: 1 Batch: 19449/20099 (96.77%) Loss: 2.042770 LR: 0.00000610 +[17:57:24] Epoch: 1 Batch: 19450/20099 (96.77%) Loss: 2.163883 LR: 0.00000610 +[17:57:26] Epoch: 1 Batch: 19451/20099 (96.78%) Loss: 2.170896 LR: 0.00000610 +[17:57:28] Epoch: 1 Batch: 19452/20099 (96.78%) 
Loss: 1.844895 LR: 0.00000610 +[17:57:30] Epoch: 1 Batch: 19453/20099 (96.79%) Loss: 1.967607 LR: 0.00000610 +[17:57:32] Epoch: 1 Batch: 19454/20099 (96.79%) Loss: 2.094273 LR: 0.00000609 +[17:57:33] Epoch: 1 Batch: 19455/20099 (96.80%) Loss: 1.927925 LR: 0.00000609 +[17:57:35] Epoch: 1 Batch: 19456/20099 (96.80%) Loss: 1.993478 LR: 0.00000609 +[17:57:37] Epoch: 1 Batch: 19457/20099 (96.81%) Loss: 1.928728 LR: 0.00000609 +[17:57:39] Epoch: 1 Batch: 19458/20099 (96.81%) Loss: 2.242888 LR: 0.00000609 +[17:57:41] Epoch: 1 Batch: 19459/20099 (96.82%) Loss: 2.047242 LR: 0.00000609 +[17:57:43] Epoch: 1 Batch: 19460/20099 (96.82%) Loss: 1.965682 LR: 0.00000609 +[17:57:45] Epoch: 1 Batch: 19461/20099 (96.83%) Loss: 2.219653 LR: 0.00000609 +[17:57:46] Epoch: 1 Batch: 19462/20099 (96.83%) Loss: 2.083118 LR: 0.00000609 +[17:57:48] Epoch: 1 Batch: 19463/20099 (96.84%) Loss: 2.296796 LR: 0.00000609 +[17:57:50] Epoch: 1 Batch: 19464/20099 (96.84%) Loss: 2.037698 LR: 0.00000609 +[17:57:52] Epoch: 1 Batch: 19465/20099 (96.85%) Loss: 2.283341 LR: 0.00000609 +[17:57:54] Epoch: 1 Batch: 19466/20099 (96.85%) Loss: 2.258079 LR: 0.00000609 +[17:57:56] Epoch: 1 Batch: 19467/20099 (96.86%) Loss: 1.868088 LR: 0.00000609 +[17:57:58] Epoch: 1 Batch: 19468/20099 (96.86%) Loss: 1.925231 LR: 0.00000609 +[17:57:59] Epoch: 1 Batch: 19469/20099 (96.87%) Loss: 2.134203 LR: 0.00000609 +[17:58:01] Epoch: 1 Batch: 19470/20099 (96.87%) Loss: 2.055877 LR: 0.00000609 +[17:58:03] Epoch: 1 Batch: 19471/20099 (96.88%) Loss: 2.046229 LR: 0.00000609 +[17:58:05] Epoch: 1 Batch: 19472/20099 (96.88%) Loss: 2.131074 LR: 0.00000609 +[17:58:07] Epoch: 1 Batch: 19473/20099 (96.89%) Loss: 2.177550 LR: 0.00000609 +[17:58:09] Epoch: 1 Batch: 19474/20099 (96.89%) Loss: 1.931331 LR: 0.00000609 +[17:58:11] Epoch: 1 Batch: 19475/20099 (96.90%) Loss: 1.712581 LR: 0.00000609 +[17:58:12] Epoch: 1 Batch: 19476/20099 (96.90%) Loss: 2.054656 LR: 0.00000609 +[17:58:14] Epoch: 1 Batch: 19477/20099 (96.91%) Loss: 1.903771 LR: 
0.00000609 +[17:58:16] Epoch: 1 Batch: 19478/20099 (96.91%) Loss: 1.763790 LR: 0.00000609 +[17:58:18] Epoch: 1 Batch: 19479/20099 (96.92%) Loss: 1.917998 LR: 0.00000609 +[17:58:20] Epoch: 1 Batch: 19480/20099 (96.92%) Loss: 1.862601 LR: 0.00000609 +[17:58:22] Epoch: 1 Batch: 19481/20099 (96.93%) Loss: 1.981473 LR: 0.00000609 +[17:58:24] Epoch: 1 Batch: 19482/20099 (96.93%) Loss: 2.192888 LR: 0.00000609 +[17:58:26] Epoch: 1 Batch: 19483/20099 (96.94%) Loss: 2.191119 LR: 0.00000609 +[17:58:27] Epoch: 1 Batch: 19484/20099 (96.94%) Loss: 2.047935 LR: 0.00000609 +[17:58:29] Epoch: 1 Batch: 19485/20099 (96.95%) Loss: 2.176638 LR: 0.00000609 +[17:58:31] Epoch: 1 Batch: 19486/20099 (96.95%) Loss: 2.109624 LR: 0.00000609 +[17:58:33] Epoch: 1 Batch: 19487/20099 (96.96%) Loss: 1.885157 LR: 0.00000609 +[17:58:35] Epoch: 1 Batch: 19488/20099 (96.96%) Loss: 2.149888 LR: 0.00000609 +[17:58:37] Epoch: 1 Batch: 19489/20099 (96.97%) Loss: 2.200167 LR: 0.00000609 +[17:58:39] Epoch: 1 Batch: 19490/20099 (96.97%) Loss: 1.930757 LR: 0.00000609 +[17:58:40] Epoch: 1 Batch: 19491/20099 (96.97%) Loss: 1.809559 LR: 0.00000609 +[17:58:42] Epoch: 1 Batch: 19492/20099 (96.98%) Loss: 1.936587 LR: 0.00000609 +[17:58:44] Epoch: 1 Batch: 19493/20099 (96.98%) Loss: 1.901129 LR: 0.00000609 +[17:58:46] Epoch: 1 Batch: 19494/20099 (96.99%) Loss: 2.301649 LR: 0.00000609 +[17:58:48] Epoch: 1 Batch: 19495/20099 (96.99%) Loss: 2.171221 LR: 0.00000609 +[17:58:50] Epoch: 1 Batch: 19496/20099 (97.00%) Loss: 2.007379 LR: 0.00000608 +[17:58:52] Epoch: 1 Batch: 19497/20099 (97.00%) Loss: 1.939808 LR: 0.00000608 +[17:58:53] Epoch: 1 Batch: 19498/20099 (97.01%) Loss: 2.022080 LR: 0.00000608 +[17:58:55] Epoch: 1 Batch: 19499/20099 (97.01%) Loss: 2.302041 LR: 0.00000608 +[17:58:57] >> Evaluating batch 0 +[17:58:58] >> Evaluating batch 1 +[17:58:59] >> Evaluating batch 2 +[17:59:01] >> Evaluating batch 3 +[17:59:02] >> Evaluating batch 4 +[17:59:03] >> Evaluating batch 5 +[17:59:04] >> Evaluating batch 6 +[17:59:05] 
>> Evaluating batch 7 +[17:59:06] >> Evaluating batch 8 +[17:59:07] >> Evaluating batch 9 +[17:59:08] >> Evaluating batch 10 +[17:59:09] >> Evaluating batch 11 +[17:59:10] >> Evaluating batch 12 +[17:59:11] >> Evaluating batch 13 +[17:59:12] >> Evaluating batch 14 +[17:59:13] >> Evaluating batch 15 +[17:59:14] >> Evaluating batch 16 +[17:59:15] Epoch: 1 Step: 19500/20099 Evaluation: +[17:59:15] Avg Loss Since Last Eval: 2.0715 Val Loss: 2.1466 Validation loss delta: 0.0007 Perplexity: 8.5557 LR: 0.00000608 +[17:59:18] >> Checkpoint saved: epoch1_step19500, size: 0.1693 GB +[17:59:18] Epoch: 1 Batch: 19500/20099 (97.02%) Loss: 2.199610 LR: 0.00000608 +[17:59:20] Epoch: 1 Batch: 19501/20099 (97.02%) Loss: 2.145238 LR: 0.00000608 +[17:59:22] Epoch: 1 Batch: 19502/20099 (97.03%) Loss: 2.047187 LR: 0.00000608 +[17:59:24] Epoch: 1 Batch: 19503/20099 (97.03%) Loss: 1.855285 LR: 0.00000608 +[17:59:25] Epoch: 1 Batch: 19504/20099 (97.04%) Loss: 1.993897 LR: 0.00000608 +[17:59:27] Epoch: 1 Batch: 19505/20099 (97.04%) Loss: 2.341565 LR: 0.00000608 +[17:59:29] Epoch: 1 Batch: 19506/20099 (97.05%) Loss: 2.201092 LR: 0.00000608 +[17:59:31] Epoch: 1 Batch: 19507/20099 (97.05%) Loss: 2.396380 LR: 0.00000608 +[17:59:33] Epoch: 1 Batch: 19508/20099 (97.06%) Loss: 2.121985 LR: 0.00000608 +[17:59:35] Epoch: 1 Batch: 19509/20099 (97.06%) Loss: 2.030664 LR: 0.00000608 +[17:59:37] Epoch: 1 Batch: 19510/20099 (97.07%) Loss: 2.153253 LR: 0.00000608 +[17:59:38] Epoch: 1 Batch: 19511/20099 (97.07%) Loss: 2.185047 LR: 0.00000608 +[17:59:40] Epoch: 1 Batch: 19512/20099 (97.08%) Loss: 2.211670 LR: 0.00000608 +[17:59:42] Epoch: 1 Batch: 19513/20099 (97.08%) Loss: 1.995257 LR: 0.00000608 +[17:59:44] Epoch: 1 Batch: 19514/20099 (97.09%) Loss: 2.130027 LR: 0.00000608 +[17:59:46] Epoch: 1 Batch: 19515/20099 (97.09%) Loss: 1.905598 LR: 0.00000608 +[17:59:48] Epoch: 1 Batch: 19516/20099 (97.10%) Loss: 1.985156 LR: 0.00000608 +[17:59:50] Epoch: 1 Batch: 19517/20099 (97.10%) Loss: 2.293996 LR: 
0.00000608 +[17:59:51] Epoch: 1 Batch: 19518/20099 (97.11%) Loss: 2.030399 LR: 0.00000608 +[17:59:53] Epoch: 1 Batch: 19519/20099 (97.11%) Loss: 2.061881 LR: 0.00000608 +[17:59:55] Epoch: 1 Batch: 19520/20099 (97.12%) Loss: 2.167645 LR: 0.00000608 +[17:59:57] Epoch: 1 Batch: 19521/20099 (97.12%) Loss: 1.952795 LR: 0.00000608 +[17:59:59] Epoch: 1 Batch: 19522/20099 (97.13%) Loss: 2.161867 LR: 0.00000608 +[18:00:01] Epoch: 1 Batch: 19523/20099 (97.13%) Loss: 1.998260 LR: 0.00000608 +[18:00:03] Epoch: 1 Batch: 19524/20099 (97.14%) Loss: 2.282427 LR: 0.00000608 +[18:00:05] Epoch: 1 Batch: 19525/20099 (97.14%) Loss: 2.044101 LR: 0.00000608 +[18:00:06] Epoch: 1 Batch: 19526/20099 (97.15%) Loss: 2.195108 LR: 0.00000608 +[18:00:08] Epoch: 1 Batch: 19527/20099 (97.15%) Loss: 1.862700 LR: 0.00000608 +[18:00:10] Epoch: 1 Batch: 19528/20099 (97.16%) Loss: 2.144134 LR: 0.00000608 +[18:00:12] Epoch: 1 Batch: 19529/20099 (97.16%) Loss: 2.076379 LR: 0.00000608 +[18:00:14] Epoch: 1 Batch: 19530/20099 (97.17%) Loss: 2.001679 LR: 0.00000608 +[18:00:16] Epoch: 1 Batch: 19531/20099 (97.17%) Loss: 2.028637 LR: 0.00000607 +[18:00:18] Epoch: 1 Batch: 19532/20099 (97.18%) Loss: 2.187317 LR: 0.00000607 +[18:00:19] Epoch: 1 Batch: 19533/20099 (97.18%) Loss: 2.084725 LR: 0.00000607 +[18:00:21] Epoch: 1 Batch: 19534/20099 (97.19%) Loss: 2.155938 LR: 0.00000607 +[18:00:23] Epoch: 1 Batch: 19535/20099 (97.19%) Loss: 2.056368 LR: 0.00000607 +[18:00:25] Epoch: 1 Batch: 19536/20099 (97.20%) Loss: 2.011721 LR: 0.00000607 +[18:00:27] Epoch: 1 Batch: 19537/20099 (97.20%) Loss: 2.235137 LR: 0.00000607 +[18:00:29] Epoch: 1 Batch: 19538/20099 (97.21%) Loss: 2.068829 LR: 0.00000607 +[18:00:31] Epoch: 1 Batch: 19539/20099 (97.21%) Loss: 2.052560 LR: 0.00000607 +[18:00:32] Epoch: 1 Batch: 19540/20099 (97.22%) Loss: 2.337172 LR: 0.00000607 +[18:00:34] Epoch: 1 Batch: 19541/20099 (97.22%) Loss: 2.164357 LR: 0.00000607 +[18:00:36] Epoch: 1 Batch: 19542/20099 (97.23%) Loss: 2.013916 LR: 0.00000607 +[18:00:38] 
Epoch: 1 Batch: 19543/20099 (97.23%) Loss: 2.471647 LR: 0.00000607 +[18:00:40] Epoch: 1 Batch: 19544/20099 (97.24%) Loss: 2.270936 LR: 0.00000607 +[18:00:42] Epoch: 1 Batch: 19545/20099 (97.24%) Loss: 2.379933 LR: 0.00000607 +[18:00:44] Epoch: 1 Batch: 19546/20099 (97.25%) Loss: 2.250376 LR: 0.00000607 +[18:00:45] Epoch: 1 Batch: 19547/20099 (97.25%) Loss: 2.149836 LR: 0.00000607 +[18:00:47] Epoch: 1 Batch: 19548/20099 (97.26%) Loss: 2.164184 LR: 0.00000607 +[18:00:49] Epoch: 1 Batch: 19549/20099 (97.26%) Loss: 2.052629 LR: 0.00000607 +[18:00:51] Epoch: 1 Batch: 19550/20099 (97.27%) Loss: 2.255353 LR: 0.00000607 +[18:00:53] Epoch: 1 Batch: 19551/20099 (97.27%) Loss: 2.041041 LR: 0.00000607 +[18:00:55] Epoch: 1 Batch: 19552/20099 (97.28%) Loss: 2.272614 LR: 0.00000607 +[18:00:56] Epoch: 1 Batch: 19553/20099 (97.28%) Loss: 2.087205 LR: 0.00000607 +[18:00:58] Epoch: 1 Batch: 19554/20099 (97.29%) Loss: 2.354961 LR: 0.00000607 +[18:01:00] Epoch: 1 Batch: 19555/20099 (97.29%) Loss: 1.965375 LR: 0.00000607 +[18:01:02] Epoch: 1 Batch: 19556/20099 (97.30%) Loss: 1.888774 LR: 0.00000607 +[18:01:04] Epoch: 1 Batch: 19557/20099 (97.30%) Loss: 2.132423 LR: 0.00000607 +[18:01:06] Epoch: 1 Batch: 19558/20099 (97.31%) Loss: 2.215467 LR: 0.00000607 +[18:01:08] Epoch: 1 Batch: 19559/20099 (97.31%) Loss: 1.979129 LR: 0.00000607 +[18:01:09] Epoch: 1 Batch: 19560/20099 (97.32%) Loss: 1.986181 LR: 0.00000607 +[18:01:11] Epoch: 1 Batch: 19561/20099 (97.32%) Loss: 1.943447 LR: 0.00000607 +[18:01:13] Epoch: 1 Batch: 19562/20099 (97.33%) Loss: 2.131091 LR: 0.00000607 +[18:01:15] Epoch: 1 Batch: 19563/20099 (97.33%) Loss: 2.090927 LR: 0.00000607 +[18:01:17] Epoch: 1 Batch: 19564/20099 (97.34%) Loss: 2.122889 LR: 0.00000607 +[18:01:19] Epoch: 1 Batch: 19565/20099 (97.34%) Loss: 1.901665 LR: 0.00000607 +[18:01:21] Epoch: 1 Batch: 19566/20099 (97.35%) Loss: 2.179532 LR: 0.00000607 +[18:01:22] Epoch: 1 Batch: 19567/20099 (97.35%) Loss: 2.119359 LR: 0.00000607 +[18:01:24] Epoch: 1 Batch: 
19568/20099 (97.36%) Loss: 1.945777 LR: 0.00000607 +[18:01:26] Epoch: 1 Batch: 19569/20099 (97.36%) Loss: 2.149792 LR: 0.00000607 +[18:01:28] Epoch: 1 Batch: 19570/20099 (97.37%) Loss: 2.278142 LR: 0.00000607 +[18:01:30] Epoch: 1 Batch: 19571/20099 (97.37%) Loss: 2.122362 LR: 0.00000607 +[18:01:32] Epoch: 1 Batch: 19572/20099 (97.38%) Loss: 1.948481 LR: 0.00000607 +[18:01:34] Epoch: 1 Batch: 19573/20099 (97.38%) Loss: 2.017097 LR: 0.00000606 +[18:01:35] Epoch: 1 Batch: 19574/20099 (97.39%) Loss: 2.121473 LR: 0.00000606 +[18:01:37] Epoch: 1 Batch: 19575/20099 (97.39%) Loss: 1.868038 LR: 0.00000606 +[18:01:39] Epoch: 1 Batch: 19576/20099 (97.40%) Loss: 2.128089 LR: 0.00000606 +[18:01:41] Epoch: 1 Batch: 19577/20099 (97.40%) Loss: 2.029241 LR: 0.00000606 +[18:01:43] Epoch: 1 Batch: 19578/20099 (97.41%) Loss: 2.116193 LR: 0.00000606 +[18:01:45] Epoch: 1 Batch: 19579/20099 (97.41%) Loss: 1.792562 LR: 0.00000606 +[18:01:47] Epoch: 1 Batch: 19580/20099 (97.42%) Loss: 2.079478 LR: 0.00000606 +[18:01:48] Epoch: 1 Batch: 19581/20099 (97.42%) Loss: 1.954494 LR: 0.00000606 +[18:01:50] Epoch: 1 Batch: 19582/20099 (97.43%) Loss: 2.049985 LR: 0.00000606 +[18:01:52] Epoch: 1 Batch: 19583/20099 (97.43%) Loss: 1.880618 LR: 0.00000606 +[18:01:54] Epoch: 1 Batch: 19584/20099 (97.44%) Loss: 2.280022 LR: 0.00000606 +[18:01:56] Epoch: 1 Batch: 19585/20099 (97.44%) Loss: 2.034087 LR: 0.00000606 +[18:01:58] Epoch: 1 Batch: 19586/20099 (97.45%) Loss: 2.295334 LR: 0.00000606 +[18:02:00] Epoch: 1 Batch: 19587/20099 (97.45%) Loss: 1.760272 LR: 0.00000606 +[18:02:02] Epoch: 1 Batch: 19588/20099 (97.46%) Loss: 1.954733 LR: 0.00000606 +[18:02:03] Epoch: 1 Batch: 19589/20099 (97.46%) Loss: 1.967489 LR: 0.00000606 +[18:02:05] Epoch: 1 Batch: 19590/20099 (97.47%) Loss: 2.323667 LR: 0.00000606 +[18:02:07] Epoch: 1 Batch: 19591/20099 (97.47%) Loss: 2.148960 LR: 0.00000606 +[18:02:09] Epoch: 1 Batch: 19592/20099 (97.48%) Loss: 1.635573 LR: 0.00000606 +[18:02:11] Epoch: 1 Batch: 19593/20099 (97.48%) 
Loss: 2.146079 LR: 0.00000606 +[18:02:13] Epoch: 1 Batch: 19594/20099 (97.49%) Loss: 2.035960 LR: 0.00000606 +[18:02:15] Epoch: 1 Batch: 19595/20099 (97.49%) Loss: 1.550639 LR: 0.00000606 +[18:02:16] Epoch: 1 Batch: 19596/20099 (97.50%) Loss: 2.150113 LR: 0.00000606 +[18:02:18] Epoch: 1 Batch: 19597/20099 (97.50%) Loss: 1.886929 LR: 0.00000606 +[18:02:20] Epoch: 1 Batch: 19598/20099 (97.51%) Loss: 2.179450 LR: 0.00000606 +[18:02:22] Epoch: 1 Batch: 19599/20099 (97.51%) Loss: 2.228694 LR: 0.00000606 +[18:02:28] >> Cleaned up old temp checkpoint: epoch1_step17600 +[18:02:28] >> Temp checkpoint saved: epoch1_step19600, size: 0.1693 GB +[18:02:28] Epoch: 1 Batch: 19600/20099 (97.52%) Loss: 2.046846 LR: 0.00000606 +[18:02:30] Epoch: 1 Batch: 19601/20099 (97.52%) Loss: 2.107209 LR: 0.00000606 +[18:02:31] Epoch: 1 Batch: 19602/20099 (97.53%) Loss: 2.263826 LR: 0.00000606 +[18:02:33] Epoch: 1 Batch: 19603/20099 (97.53%) Loss: 2.066991 LR: 0.00000606 +[18:02:35] Epoch: 1 Batch: 19604/20099 (97.54%) Loss: 2.019565 LR: 0.00000606 +[18:02:37] Epoch: 1 Batch: 19605/20099 (97.54%) Loss: 1.901450 LR: 0.00000606 +[18:02:39] Epoch: 1 Batch: 19606/20099 (97.55%) Loss: 2.159849 LR: 0.00000606 +[18:02:41] Epoch: 1 Batch: 19607/20099 (97.55%) Loss: 1.895479 LR: 0.00000606 +[18:02:42] Epoch: 1 Batch: 19608/20099 (97.56%) Loss: 2.216279 LR: 0.00000606 +[18:02:44] Epoch: 1 Batch: 19609/20099 (97.56%) Loss: 2.306649 LR: 0.00000606 +[18:02:46] Epoch: 1 Batch: 19610/20099 (97.57%) Loss: 2.418483 LR: 0.00000606 +[18:02:48] Epoch: 1 Batch: 19611/20099 (97.57%) Loss: 2.172929 LR: 0.00000606 +[18:02:50] Epoch: 1 Batch: 19612/20099 (97.58%) Loss: 2.161781 LR: 0.00000606 +[18:02:52] Epoch: 1 Batch: 19613/20099 (97.58%) Loss: 1.833497 LR: 0.00000606 +[18:02:54] Epoch: 1 Batch: 19614/20099 (97.59%) Loss: 1.979941 LR: 0.00000606 +[18:02:56] Epoch: 1 Batch: 19615/20099 (97.59%) Loss: 1.928433 LR: 0.00000605 +[18:02:57] Epoch: 1 Batch: 19616/20099 (97.60%) Loss: 2.227236 LR: 0.00000605 +[18:02:59] 
Epoch: 1 Batch: 19617/20099 (97.60%) Loss: 2.179509 LR: 0.00000605 +[18:03:01] Epoch: 1 Batch: 19618/20099 (97.61%) Loss: 2.473178 LR: 0.00000605 +[18:03:03] Epoch: 1 Batch: 19619/20099 (97.61%) Loss: 2.369631 LR: 0.00000605 +[18:03:05] Epoch: 1 Batch: 19620/20099 (97.62%) Loss: 2.237721 LR: 0.00000605 +[18:03:07] Epoch: 1 Batch: 19621/20099 (97.62%) Loss: 2.056774 LR: 0.00000605 +[18:03:09] Epoch: 1 Batch: 19622/20099 (97.63%) Loss: 2.352124 LR: 0.00000605 +[18:03:11] Epoch: 1 Batch: 19623/20099 (97.63%) Loss: 1.850369 LR: 0.00000605 +[18:03:12] Epoch: 1 Batch: 19624/20099 (97.64%) Loss: 2.108496 LR: 0.00000605 +[18:03:14] Epoch: 1 Batch: 19625/20099 (97.64%) Loss: 1.915843 LR: 0.00000605 +[18:03:16] Epoch: 1 Batch: 19626/20099 (97.65%) Loss: 1.643032 LR: 0.00000605 +[18:03:18] Epoch: 1 Batch: 19627/20099 (97.65%) Loss: 2.248806 LR: 0.00000605 +[18:03:20] Epoch: 1 Batch: 19628/20099 (97.66%) Loss: 1.956812 LR: 0.00000605 +[18:03:22] Epoch: 1 Batch: 19629/20099 (97.66%) Loss: 2.260164 LR: 0.00000605 +[18:03:24] Epoch: 1 Batch: 19630/20099 (97.67%) Loss: 2.263349 LR: 0.00000605 +[18:03:25] Epoch: 1 Batch: 19631/20099 (97.67%) Loss: 2.224092 LR: 0.00000605 +[18:03:27] Epoch: 1 Batch: 19632/20099 (97.68%) Loss: 2.232020 LR: 0.00000605 +[18:03:29] Epoch: 1 Batch: 19633/20099 (97.68%) Loss: 2.182219 LR: 0.00000605 +[18:03:31] Epoch: 1 Batch: 19634/20099 (97.69%) Loss: 2.031585 LR: 0.00000605 +[18:03:33] Epoch: 1 Batch: 19635/20099 (97.69%) Loss: 1.928462 LR: 0.00000605 +[18:03:35] Epoch: 1 Batch: 19636/20099 (97.70%) Loss: 2.070324 LR: 0.00000605 +[18:03:37] Epoch: 1 Batch: 19637/20099 (97.70%) Loss: 2.301719 LR: 0.00000605 +[18:03:38] Epoch: 1 Batch: 19638/20099 (97.71%) Loss: 2.178490 LR: 0.00000605 +[18:03:40] Epoch: 1 Batch: 19639/20099 (97.71%) Loss: 2.415533 LR: 0.00000605 +[18:03:42] Epoch: 1 Batch: 19640/20099 (97.72%) Loss: 2.146120 LR: 0.00000605 +[18:03:44] Epoch: 1 Batch: 19641/20099 (97.72%) Loss: 2.192451 LR: 0.00000605 +[18:03:46] Epoch: 1 Batch: 
19642/20099 (97.73%) Loss: 2.035509 LR: 0.00000605 +[18:03:48] Epoch: 1 Batch: 19643/20099 (97.73%) Loss: 2.034524 LR: 0.00000605 +[18:03:49] Epoch: 1 Batch: 19644/20099 (97.74%) Loss: 2.241922 LR: 0.00000605 +[18:03:51] Epoch: 1 Batch: 19645/20099 (97.74%) Loss: 2.071701 LR: 0.00000605 +[18:03:53] Epoch: 1 Batch: 19646/20099 (97.75%) Loss: 2.121493 LR: 0.00000605 +[18:03:55] Epoch: 1 Batch: 19647/20099 (97.75%) Loss: 2.149298 LR: 0.00000605 +[18:03:57] Epoch: 1 Batch: 19648/20099 (97.76%) Loss: 2.009726 LR: 0.00000605 +[18:03:59] Epoch: 1 Batch: 19649/20099 (97.76%) Loss: 2.193168 LR: 0.00000605 +[18:04:01] Epoch: 1 Batch: 19650/20099 (97.77%) Loss: 2.124324 LR: 0.00000605 +[18:04:02] Epoch: 1 Batch: 19651/20099 (97.77%) Loss: 2.043886 LR: 0.00000605 +[18:04:04] Epoch: 1 Batch: 19652/20099 (97.78%) Loss: 2.154180 LR: 0.00000605 +[18:04:06] Epoch: 1 Batch: 19653/20099 (97.78%) Loss: 2.118556 LR: 0.00000605 +[18:04:08] Epoch: 1 Batch: 19654/20099 (97.79%) Loss: 2.494402 LR: 0.00000605 +[18:04:10] Epoch: 1 Batch: 19655/20099 (97.79%) Loss: 2.005381 LR: 0.00000605 +[18:04:12] Epoch: 1 Batch: 19656/20099 (97.80%) Loss: 2.073069 LR: 0.00000605 +[18:04:14] Epoch: 1 Batch: 19657/20099 (97.80%) Loss: 1.904256 LR: 0.00000604 +[18:04:15] Epoch: 1 Batch: 19658/20099 (97.81%) Loss: 2.419137 LR: 0.00000604 +[18:04:17] Epoch: 1 Batch: 19659/20099 (97.81%) Loss: 2.302383 LR: 0.00000604 +[18:04:19] Epoch: 1 Batch: 19660/20099 (97.82%) Loss: 2.082259 LR: 0.00000604 +[18:04:21] Epoch: 1 Batch: 19661/20099 (97.82%) Loss: 2.089575 LR: 0.00000604 +[18:04:23] Epoch: 1 Batch: 19662/20099 (97.83%) Loss: 2.005141 LR: 0.00000604 +[18:04:25] Epoch: 1 Batch: 19663/20099 (97.83%) Loss: 2.157150 LR: 0.00000604 +[18:04:27] Epoch: 1 Batch: 19664/20099 (97.84%) Loss: 2.117649 LR: 0.00000604 +[18:04:28] Epoch: 1 Batch: 19665/20099 (97.84%) Loss: 2.233055 LR: 0.00000604 +[18:04:30] Epoch: 1 Batch: 19666/20099 (97.85%) Loss: 2.041726 LR: 0.00000604 +[18:04:32] Epoch: 1 Batch: 19667/20099 (97.85%) 
Loss: 1.895770 LR: 0.00000604 +[18:04:34] Epoch: 1 Batch: 19668/20099 (97.86%) Loss: 2.282933 LR: 0.00000604 +[18:04:36] Epoch: 1 Batch: 19669/20099 (97.86%) Loss: 2.356988 LR: 0.00000604 +[18:04:38] Epoch: 1 Batch: 19670/20099 (97.87%) Loss: 2.037096 LR: 0.00000604 +[18:04:40] Epoch: 1 Batch: 19671/20099 (97.87%) Loss: 2.109510 LR: 0.00000604 +[18:04:41] Epoch: 1 Batch: 19672/20099 (97.88%) Loss: 2.202351 LR: 0.00000604 +[18:04:43] Epoch: 1 Batch: 19673/20099 (97.88%) Loss: 2.075749 LR: 0.00000604 +[18:04:45] Epoch: 1 Batch: 19674/20099 (97.89%) Loss: 2.060717 LR: 0.00000604 +[18:04:47] Epoch: 1 Batch: 19675/20099 (97.89%) Loss: 2.055551 LR: 0.00000604 +[18:04:49] Epoch: 1 Batch: 19676/20099 (97.90%) Loss: 2.157984 LR: 0.00000604 +[18:04:51] Epoch: 1 Batch: 19677/20099 (97.90%) Loss: 1.892918 LR: 0.00000604 +[18:04:53] Epoch: 1 Batch: 19678/20099 (97.91%) Loss: 1.907358 LR: 0.00000604 +[18:04:54] Epoch: 1 Batch: 19679/20099 (97.91%) Loss: 2.129120 LR: 0.00000604 +[18:04:56] Epoch: 1 Batch: 19680/20099 (97.92%) Loss: 1.932591 LR: 0.00000604 +[18:04:58] Epoch: 1 Batch: 19681/20099 (97.92%) Loss: 2.256490 LR: 0.00000604 +[18:05:00] Epoch: 1 Batch: 19682/20099 (97.93%) Loss: 1.997917 LR: 0.00000604 +[18:05:02] Epoch: 1 Batch: 19683/20099 (97.93%) Loss: 1.831437 LR: 0.00000604 +[18:05:04] Epoch: 1 Batch: 19684/20099 (97.94%) Loss: 2.303206 LR: 0.00000604 +[18:05:06] Epoch: 1 Batch: 19685/20099 (97.94%) Loss: 2.407388 LR: 0.00000604 +[18:05:08] Epoch: 1 Batch: 19686/20099 (97.95%) Loss: 2.134957 LR: 0.00000604 +[18:05:09] Epoch: 1 Batch: 19687/20099 (97.95%) Loss: 2.168730 LR: 0.00000604 +[18:05:11] Epoch: 1 Batch: 19688/20099 (97.96%) Loss: 2.110125 LR: 0.00000604 +[18:05:13] Epoch: 1 Batch: 19689/20099 (97.96%) Loss: 1.937375 LR: 0.00000604 +[18:05:15] Epoch: 1 Batch: 19690/20099 (97.97%) Loss: 1.886409 LR: 0.00000604 +[18:05:17] Epoch: 1 Batch: 19691/20099 (97.97%) Loss: 1.953246 LR: 0.00000604 +[18:05:19] Epoch: 1 Batch: 19692/20099 (97.98%) Loss: 2.011682 LR: 
0.00000604 +[18:05:21] Epoch: 1 Batch: 19693/20099 (97.98%) Loss: 2.048472 LR: 0.00000604 +[18:05:22] Epoch: 1 Batch: 19694/20099 (97.98%) Loss: 2.216143 LR: 0.00000604 +[18:05:24] Epoch: 1 Batch: 19695/20099 (97.99%) Loss: 2.253417 LR: 0.00000604 +[18:05:26] Epoch: 1 Batch: 19696/20099 (97.99%) Loss: 2.285841 LR: 0.00000604 +[18:05:28] Epoch: 1 Batch: 19697/20099 (98.00%) Loss: 2.406462 LR: 0.00000604 +[18:05:30] Epoch: 1 Batch: 19698/20099 (98.00%) Loss: 2.118507 LR: 0.00000604 +[18:05:32] Epoch: 1 Batch: 19699/20099 (98.01%) Loss: 2.098961 LR: 0.00000604 +[18:05:34] Epoch: 1 Batch: 19700/20099 (98.01%) Loss: 1.905943 LR: 0.00000604 +[18:05:35] Epoch: 1 Batch: 19701/20099 (98.02%) Loss: 2.091667 LR: 0.00000604 +[18:05:37] Epoch: 1 Batch: 19702/20099 (98.02%) Loss: 2.173095 LR: 0.00000604 +[18:05:39] Epoch: 1 Batch: 19703/20099 (98.03%) Loss: 2.439407 LR: 0.00000604 +[18:05:41] Epoch: 1 Batch: 19704/20099 (98.03%) Loss: 2.156247 LR: 0.00000604 +[18:05:43] Epoch: 1 Batch: 19705/20099 (98.04%) Loss: 1.948297 LR: 0.00000604 +[18:05:45] Epoch: 1 Batch: 19706/20099 (98.04%) Loss: 2.166853 LR: 0.00000604 +[18:05:47] Epoch: 1 Batch: 19707/20099 (98.05%) Loss: 1.843661 LR: 0.00000604 +[18:05:48] Epoch: 1 Batch: 19708/20099 (98.05%) Loss: 1.994346 LR: 0.00000604 +[18:05:50] Epoch: 1 Batch: 19709/20099 (98.06%) Loss: 2.203892 LR: 0.00000604 +[18:05:52] Epoch: 1 Batch: 19710/20099 (98.06%) Loss: 1.927269 LR: 0.00000604 +[18:05:54] Epoch: 1 Batch: 19711/20099 (98.07%) Loss: 2.060558 LR: 0.00000604 +[18:05:56] Epoch: 1 Batch: 19712/20099 (98.07%) Loss: 2.082078 LR: 0.00000604 +[18:05:58] Epoch: 1 Batch: 19713/20099 (98.08%) Loss: 2.192109 LR: 0.00000603 +[18:06:00] Epoch: 1 Batch: 19714/20099 (98.08%) Loss: 2.029804 LR: 0.00000603 +[18:06:01] Epoch: 1 Batch: 19715/20099 (98.09%) Loss: 1.949035 LR: 0.00000603 +[18:06:03] Epoch: 1 Batch: 19716/20099 (98.09%) Loss: 2.221664 LR: 0.00000603 +[18:06:05] Epoch: 1 Batch: 19717/20099 (98.10%) Loss: 1.803337 LR: 0.00000603 +[18:06:07] 
Epoch: 1 Batch: 19718/20099 (98.10%) Loss: 1.843779 LR: 0.00000603 +[18:06:09] Epoch: 1 Batch: 19719/20099 (98.11%) Loss: 2.468314 LR: 0.00000603 +[18:06:11] Epoch: 1 Batch: 19720/20099 (98.11%) Loss: 2.296453 LR: 0.00000603 +[18:06:13] Epoch: 1 Batch: 19721/20099 (98.12%) Loss: 1.811277 LR: 0.00000603 +[18:06:14] Epoch: 1 Batch: 19722/20099 (98.12%) Loss: 1.913281 LR: 0.00000603 +[18:06:16] Epoch: 1 Batch: 19723/20099 (98.13%) Loss: 1.965908 LR: 0.00000603 +[18:06:18] Epoch: 1 Batch: 19724/20099 (98.13%) Loss: 2.256288 LR: 0.00000603 +[18:06:20] Epoch: 1 Batch: 19725/20099 (98.14%) Loss: 2.020911 LR: 0.00000603 +[18:06:22] Epoch: 1 Batch: 19726/20099 (98.14%) Loss: 2.113656 LR: 0.00000603 +[18:06:24] Epoch: 1 Batch: 19727/20099 (98.15%) Loss: 2.008913 LR: 0.00000603 +[18:06:25] Epoch: 1 Batch: 19728/20099 (98.15%) Loss: 2.094574 LR: 0.00000603 +[18:06:27] Epoch: 1 Batch: 19729/20099 (98.16%) Loss: 2.083511 LR: 0.00000603 +[18:06:29] Epoch: 1 Batch: 19730/20099 (98.16%) Loss: 2.115383 LR: 0.00000603 +[18:06:31] Epoch: 1 Batch: 19731/20099 (98.17%) Loss: 2.062526 LR: 0.00000603 +[18:06:33] Epoch: 1 Batch: 19732/20099 (98.17%) Loss: 1.893872 LR: 0.00000603 +[18:06:35] Epoch: 1 Batch: 19733/20099 (98.18%) Loss: 2.380313 LR: 0.00000603 +[18:06:37] Epoch: 1 Batch: 19734/20099 (98.18%) Loss: 2.102864 LR: 0.00000603 +[18:06:38] Epoch: 1 Batch: 19735/20099 (98.19%) Loss: 1.706490 LR: 0.00000603 +[18:06:40] Epoch: 1 Batch: 19736/20099 (98.19%) Loss: 2.039380 LR: 0.00000603 +[18:06:42] Epoch: 1 Batch: 19737/20099 (98.20%) Loss: 2.046010 LR: 0.00000603 +[18:06:44] Epoch: 1 Batch: 19738/20099 (98.20%) Loss: 2.072414 LR: 0.00000603 +[18:06:46] Epoch: 1 Batch: 19739/20099 (98.21%) Loss: 2.169982 LR: 0.00000603 +[18:06:48] Epoch: 1 Batch: 19740/20099 (98.21%) Loss: 2.043205 LR: 0.00000603 +[18:06:50] Epoch: 1 Batch: 19741/20099 (98.22%) Loss: 2.296820 LR: 0.00000603 +[18:06:51] Epoch: 1 Batch: 19742/20099 (98.22%) Loss: 2.313753 LR: 0.00000603 +[18:06:53] Epoch: 1 Batch: 
19743/20099 (98.23%) Loss: 1.911609 LR: 0.00000603 +[18:06:55] Epoch: 1 Batch: 19744/20099 (98.23%) Loss: 2.019235 LR: 0.00000603 +[18:06:57] Epoch: 1 Batch: 19745/20099 (98.24%) Loss: 2.137435 LR: 0.00000603 +[18:06:59] Epoch: 1 Batch: 19746/20099 (98.24%) Loss: 1.996792 LR: 0.00000603 +[18:07:01] Epoch: 1 Batch: 19747/20099 (98.25%) Loss: 1.922562 LR: 0.00000603 +[18:07:03] Epoch: 1 Batch: 19748/20099 (98.25%) Loss: 1.859964 LR: 0.00000603 +[18:07:04] Epoch: 1 Batch: 19749/20099 (98.26%) Loss: 1.914468 LR: 0.00000603 +[18:07:06] Epoch: 1 Batch: 19750/20099 (98.26%) Loss: 2.225158 LR: 0.00000603 +[18:07:08] Epoch: 1 Batch: 19751/20099 (98.27%) Loss: 1.813281 LR: 0.00000603 +[18:07:10] Epoch: 1 Batch: 19752/20099 (98.27%) Loss: 1.793062 LR: 0.00000603 +[18:07:12] Epoch: 1 Batch: 19753/20099 (98.28%) Loss: 2.447685 LR: 0.00000603 +[18:07:14] Epoch: 1 Batch: 19754/20099 (98.28%) Loss: 2.159777 LR: 0.00000603 +[18:07:15] Epoch: 1 Batch: 19755/20099 (98.29%) Loss: 2.067502 LR: 0.00000603 +[18:07:17] Epoch: 1 Batch: 19756/20099 (98.29%) Loss: 2.146427 LR: 0.00000603 +[18:07:19] Epoch: 1 Batch: 19757/20099 (98.30%) Loss: 2.219773 LR: 0.00000603 +[18:07:21] Epoch: 1 Batch: 19758/20099 (98.30%) Loss: 1.832261 LR: 0.00000603 +[18:07:23] Epoch: 1 Batch: 19759/20099 (98.31%) Loss: 2.321684 LR: 0.00000603 +[18:07:25] Epoch: 1 Batch: 19760/20099 (98.31%) Loss: 1.928580 LR: 0.00000603 +[18:07:27] Epoch: 1 Batch: 19761/20099 (98.32%) Loss: 1.878603 LR: 0.00000603 +[18:07:28] Epoch: 1 Batch: 19762/20099 (98.32%) Loss: 2.133772 LR: 0.00000603 +[18:07:30] Epoch: 1 Batch: 19763/20099 (98.33%) Loss: 2.173616 LR: 0.00000603 +[18:07:32] Epoch: 1 Batch: 19764/20099 (98.33%) Loss: 1.866152 LR: 0.00000603 +[18:07:34] Epoch: 1 Batch: 19765/20099 (98.34%) Loss: 2.231261 LR: 0.00000603 +[18:07:36] Epoch: 1 Batch: 19766/20099 (98.34%) Loss: 2.208231 LR: 0.00000603 +[18:07:38] Epoch: 1 Batch: 19767/20099 (98.35%) Loss: 2.668456 LR: 0.00000603 +[18:07:40] Epoch: 1 Batch: 19768/20099 (98.35%) 
Loss: 2.096473 LR: 0.00000603 +[18:07:41] Epoch: 1 Batch: 19769/20099 (98.36%) Loss: 2.187817 LR: 0.00000603 +[18:07:43] Epoch: 1 Batch: 19770/20099 (98.36%) Loss: 2.127483 LR: 0.00000603 +[18:07:45] Epoch: 1 Batch: 19771/20099 (98.37%) Loss: 2.229463 LR: 0.00000603 +[18:07:47] Epoch: 1 Batch: 19772/20099 (98.37%) Loss: 2.026065 LR: 0.00000603 +[18:07:49] Epoch: 1 Batch: 19773/20099 (98.38%) Loss: 2.241276 LR: 0.00000603 +[18:07:51] Epoch: 1 Batch: 19774/20099 (98.38%) Loss: 2.317104 LR: 0.00000603 +[18:07:53] Epoch: 1 Batch: 19775/20099 (98.39%) Loss: 2.058139 LR: 0.00000603 +[18:07:54] Epoch: 1 Batch: 19776/20099 (98.39%) Loss: 1.926757 LR: 0.00000602 +[18:07:56] Epoch: 1 Batch: 19777/20099 (98.40%) Loss: 2.175685 LR: 0.00000602 +[18:07:58] Epoch: 1 Batch: 19778/20099 (98.40%) Loss: 2.164976 LR: 0.00000602 +[18:08:00] Epoch: 1 Batch: 19779/20099 (98.41%) Loss: 1.738712 LR: 0.00000602 +[18:08:02] Epoch: 1 Batch: 19780/20099 (98.41%) Loss: 1.832962 LR: 0.00000602 +[18:08:04] Epoch: 1 Batch: 19781/20099 (98.42%) Loss: 2.064605 LR: 0.00000602 +[18:08:06] Epoch: 1 Batch: 19782/20099 (98.42%) Loss: 1.983401 LR: 0.00000602 +[18:08:07] Epoch: 1 Batch: 19783/20099 (98.43%) Loss: 2.163121 LR: 0.00000602 +[18:08:09] Epoch: 1 Batch: 19784/20099 (98.43%) Loss: 2.137851 LR: 0.00000602 +[18:08:11] Epoch: 1 Batch: 19785/20099 (98.44%) Loss: 2.007523 LR: 0.00000602 +[18:08:13] Epoch: 1 Batch: 19786/20099 (98.44%) Loss: 1.790818 LR: 0.00000602 +[18:08:15] Epoch: 1 Batch: 19787/20099 (98.45%) Loss: 2.076005 LR: 0.00000602 +[18:08:17] Epoch: 1 Batch: 19788/20099 (98.45%) Loss: 1.591262 LR: 0.00000602 +[18:08:19] Epoch: 1 Batch: 19789/20099 (98.46%) Loss: 2.062265 LR: 0.00000602 +[18:08:20] Epoch: 1 Batch: 19790/20099 (98.46%) Loss: 2.057992 LR: 0.00000602 +[18:08:22] Epoch: 1 Batch: 19791/20099 (98.47%) Loss: 1.813899 LR: 0.00000602 +[18:08:24] Epoch: 1 Batch: 19792/20099 (98.47%) Loss: 1.949007 LR: 0.00000602 +[18:08:26] Epoch: 1 Batch: 19793/20099 (98.48%) Loss: 1.768726 LR: 
0.00000602 +[18:08:28] Epoch: 1 Batch: 19794/20099 (98.48%) Loss: 1.849072 LR: 0.00000602 +[18:08:30] Epoch: 1 Batch: 19795/20099 (98.49%) Loss: 2.052927 LR: 0.00000602 +[18:08:32] Epoch: 1 Batch: 19796/20099 (98.49%) Loss: 1.931696 LR: 0.00000602 +[18:08:33] Epoch: 1 Batch: 19797/20099 (98.50%) Loss: 2.181660 LR: 0.00000602 +[18:08:35] Epoch: 1 Batch: 19798/20099 (98.50%) Loss: 2.170081 LR: 0.00000602 +[18:08:37] Epoch: 1 Batch: 19799/20099 (98.51%) Loss: 2.017310 LR: 0.00000602 +[18:08:43] >> Cleaned up old temp checkpoint: epoch1_step17800 +[18:08:43] >> Temp checkpoint saved: epoch1_step19800, size: 0.1693 GB +[18:08:43] Epoch: 1 Batch: 19800/20099 (98.51%) Loss: 2.051925 LR: 0.00000602 +[18:08:44] Epoch: 1 Batch: 19801/20099 (98.52%) Loss: 2.289562 LR: 0.00000602 +[18:08:46] Epoch: 1 Batch: 19802/20099 (98.52%) Loss: 1.942138 LR: 0.00000602 +[18:08:48] Epoch: 1 Batch: 19803/20099 (98.53%) Loss: 2.093971 LR: 0.00000602 +[18:08:50] Epoch: 1 Batch: 19804/20099 (98.53%) Loss: 1.984400 LR: 0.00000602 +[18:08:52] Epoch: 1 Batch: 19805/20099 (98.54%) Loss: 1.872719 LR: 0.00000602 +[18:08:54] Epoch: 1 Batch: 19806/20099 (98.54%) Loss: 2.373830 LR: 0.00000602 +[18:08:56] Epoch: 1 Batch: 19807/20099 (98.55%) Loss: 2.054818 LR: 0.00000602 +[18:08:57] Epoch: 1 Batch: 19808/20099 (98.55%) Loss: 1.820733 LR: 0.00000602 +[18:08:59] Epoch: 1 Batch: 19809/20099 (98.56%) Loss: 1.875537 LR: 0.00000602 +[18:09:01] Epoch: 1 Batch: 19810/20099 (98.56%) Loss: 1.896393 LR: 0.00000602 +[18:09:03] Epoch: 1 Batch: 19811/20099 (98.57%) Loss: 2.518976 LR: 0.00000602 +[18:09:05] Epoch: 1 Batch: 19812/20099 (98.57%) Loss: 2.415454 LR: 0.00000602 +[18:09:07] Epoch: 1 Batch: 19813/20099 (98.58%) Loss: 2.033613 LR: 0.00000602 +[18:09:09] Epoch: 1 Batch: 19814/20099 (98.58%) Loss: 2.292053 LR: 0.00000602 +[18:09:10] Epoch: 1 Batch: 19815/20099 (98.59%) Loss: 1.899283 LR: 0.00000602 +[18:09:12] Epoch: 1 Batch: 19816/20099 (98.59%) Loss: 2.255978 LR: 0.00000602 +[18:09:14] Epoch: 1 Batch: 
19817/20099 (98.60%) Loss: 1.930538 LR: 0.00000602 +[18:09:16] Epoch: 1 Batch: 19818/20099 (98.60%) Loss: 2.062823 LR: 0.00000602 +[18:09:18] Epoch: 1 Batch: 19819/20099 (98.61%) Loss: 2.554799 LR: 0.00000602 +[18:09:20] Epoch: 1 Batch: 19820/20099 (98.61%) Loss: 2.461972 LR: 0.00000602 +[18:09:22] Epoch: 1 Batch: 19821/20099 (98.62%) Loss: 2.327936 LR: 0.00000602 +[18:09:24] Epoch: 1 Batch: 19822/20099 (98.62%) Loss: 1.930173 LR: 0.00000602 +[18:09:25] Epoch: 1 Batch: 19823/20099 (98.63%) Loss: 2.103911 LR: 0.00000602 +[18:09:27] Epoch: 1 Batch: 19824/20099 (98.63%) Loss: 1.908548 LR: 0.00000602 +[18:09:29] Epoch: 1 Batch: 19825/20099 (98.64%) Loss: 2.288671 LR: 0.00000602 +[18:09:31] Epoch: 1 Batch: 19826/20099 (98.64%) Loss: 2.085632 LR: 0.00000602 +[18:09:33] Epoch: 1 Batch: 19827/20099 (98.65%) Loss: 2.184224 LR: 0.00000602 +[18:09:35] Epoch: 1 Batch: 19828/20099 (98.65%) Loss: 2.046353 LR: 0.00000602 +[18:09:37] Epoch: 1 Batch: 19829/20099 (98.66%) Loss: 2.053692 LR: 0.00000602 +[18:09:38] Epoch: 1 Batch: 19830/20099 (98.66%) Loss: 1.720436 LR: 0.00000602 +[18:09:40] Epoch: 1 Batch: 19831/20099 (98.67%) Loss: 2.022283 LR: 0.00000602 +[18:09:42] Epoch: 1 Batch: 19832/20099 (98.67%) Loss: 2.128328 LR: 0.00000602 +[18:09:44] Epoch: 1 Batch: 19833/20099 (98.68%) Loss: 1.924510 LR: 0.00000602 +[18:09:46] Epoch: 1 Batch: 19834/20099 (98.68%) Loss: 2.167423 LR: 0.00000602 +[18:09:48] Epoch: 1 Batch: 19835/20099 (98.69%) Loss: 2.149616 LR: 0.00000602 +[18:09:50] Epoch: 1 Batch: 19836/20099 (98.69%) Loss: 2.190154 LR: 0.00000602 +[18:09:51] Epoch: 1 Batch: 19837/20099 (98.70%) Loss: 2.251785 LR: 0.00000602 +[18:09:53] Epoch: 1 Batch: 19838/20099 (98.70%) Loss: 1.972065 LR: 0.00000602 +[18:09:55] Epoch: 1 Batch: 19839/20099 (98.71%) Loss: 2.179947 LR: 0.00000602 +[18:09:57] Epoch: 1 Batch: 19840/20099 (98.71%) Loss: 1.809257 LR: 0.00000602 +[18:09:59] Epoch: 1 Batch: 19841/20099 (98.72%) Loss: 1.841779 LR: 0.00000602 +[18:10:01] Epoch: 1 Batch: 19842/20099 (98.72%) 
Loss: 1.967659 LR: 0.00000602 +[18:10:02] Epoch: 1 Batch: 19843/20099 (98.73%) Loss: 2.036695 LR: 0.00000602 +[18:10:04] Epoch: 1 Batch: 19844/20099 (98.73%) Loss: 2.005346 LR: 0.00000602 +[18:10:06] Epoch: 1 Batch: 19845/20099 (98.74%) Loss: 2.079934 LR: 0.00000602 +[18:10:08] Epoch: 1 Batch: 19846/20099 (98.74%) Loss: 2.151863 LR: 0.00000602 +[18:10:10] Epoch: 1 Batch: 19847/20099 (98.75%) Loss: 2.093193 LR: 0.00000602 +[18:10:12] Epoch: 1 Batch: 19848/20099 (98.75%) Loss: 1.804417 LR: 0.00000602 +[18:10:14] Epoch: 1 Batch: 19849/20099 (98.76%) Loss: 2.175230 LR: 0.00000602 +[18:10:15] Epoch: 1 Batch: 19850/20099 (98.76%) Loss: 2.240636 LR: 0.00000602 +[18:10:17] Epoch: 1 Batch: 19851/20099 (98.77%) Loss: 2.130238 LR: 0.00000602 +[18:10:19] Epoch: 1 Batch: 19852/20099 (98.77%) Loss: 2.086620 LR: 0.00000602 +[18:10:21] Epoch: 1 Batch: 19853/20099 (98.78%) Loss: 2.367168 LR: 0.00000601 +[18:10:23] Epoch: 1 Batch: 19854/20099 (98.78%) Loss: 1.819766 LR: 0.00000601 +[18:10:25] Epoch: 1 Batch: 19855/20099 (98.79%) Loss: 2.123184 LR: 0.00000601 +[18:10:27] Epoch: 1 Batch: 19856/20099 (98.79%) Loss: 2.113590 LR: 0.00000601 +[18:10:28] Epoch: 1 Batch: 19857/20099 (98.80%) Loss: 2.118157 LR: 0.00000601 +[18:10:30] Epoch: 1 Batch: 19858/20099 (98.80%) Loss: 1.897917 LR: 0.00000601 +[18:10:32] Epoch: 1 Batch: 19859/20099 (98.81%) Loss: 2.366243 LR: 0.00000601 +[18:10:34] Epoch: 1 Batch: 19860/20099 (98.81%) Loss: 1.888029 LR: 0.00000601 +[18:10:36] Epoch: 1 Batch: 19861/20099 (98.82%) Loss: 1.926271 LR: 0.00000601 +[18:10:38] Epoch: 1 Batch: 19862/20099 (98.82%) Loss: 2.253131 LR: 0.00000601 +[18:10:40] Epoch: 1 Batch: 19863/20099 (98.83%) Loss: 2.081148 LR: 0.00000601 +[18:10:41] Epoch: 1 Batch: 19864/20099 (98.83%) Loss: 2.161395 LR: 0.00000601 +[18:10:43] Epoch: 1 Batch: 19865/20099 (98.84%) Loss: 2.206704 LR: 0.00000601 +[18:10:45] Epoch: 1 Batch: 19866/20099 (98.84%) Loss: 2.363537 LR: 0.00000601 +[18:10:47] Epoch: 1 Batch: 19867/20099 (98.85%) Loss: 2.181461 LR: 
0.00000601 +[18:10:49] Epoch: 1 Batch: 19868/20099 (98.85%) Loss: 2.418191 LR: 0.00000601 +[18:10:51] Epoch: 1 Batch: 19869/20099 (98.86%) Loss: 2.045240 LR: 0.00000601 +[18:10:53] Epoch: 1 Batch: 19870/20099 (98.86%) Loss: 2.261252 LR: 0.00000601 +[18:10:55] Epoch: 1 Batch: 19871/20099 (98.87%) Loss: 2.352535 LR: 0.00000601 +[18:10:56] Epoch: 1 Batch: 19872/20099 (98.87%) Loss: 2.135545 LR: 0.00000601 +[18:10:58] Epoch: 1 Batch: 19873/20099 (98.88%) Loss: 1.981014 LR: 0.00000601 +[18:11:00] Epoch: 1 Batch: 19874/20099 (98.88%) Loss: 2.387724 LR: 0.00000601 +[18:11:02] Epoch: 1 Batch: 19875/20099 (98.89%) Loss: 1.827366 LR: 0.00000601 +[18:11:04] Epoch: 1 Batch: 19876/20099 (98.89%) Loss: 2.058333 LR: 0.00000601 +[18:11:06] Epoch: 1 Batch: 19877/20099 (98.90%) Loss: 2.179856 LR: 0.00000601 +[18:11:08] Epoch: 1 Batch: 19878/20099 (98.90%) Loss: 2.273023 LR: 0.00000601 +[18:11:09] Epoch: 1 Batch: 19879/20099 (98.91%) Loss: 2.151248 LR: 0.00000601 +[18:11:11] Epoch: 1 Batch: 19880/20099 (98.91%) Loss: 2.390322 LR: 0.00000601 +[18:11:13] Epoch: 1 Batch: 19881/20099 (98.92%) Loss: 1.923298 LR: 0.00000601 +[18:11:15] Epoch: 1 Batch: 19882/20099 (98.92%) Loss: 2.094088 LR: 0.00000601 +[18:11:17] Epoch: 1 Batch: 19883/20099 (98.93%) Loss: 2.342150 LR: 0.00000601 +[18:11:19] Epoch: 1 Batch: 19884/20099 (98.93%) Loss: 1.903969 LR: 0.00000601 +[18:11:21] Epoch: 1 Batch: 19885/20099 (98.94%) Loss: 2.450585 LR: 0.00000601 +[18:11:23] Epoch: 1 Batch: 19886/20099 (98.94%) Loss: 2.299467 LR: 0.00000601 +[18:11:24] Epoch: 1 Batch: 19887/20099 (98.95%) Loss: 2.011108 LR: 0.00000601 +[18:11:26] Epoch: 1 Batch: 19888/20099 (98.95%) Loss: 2.216524 LR: 0.00000601 +[18:11:28] Epoch: 1 Batch: 19889/20099 (98.96%) Loss: 2.243650 LR: 0.00000601 +[18:11:30] Epoch: 1 Batch: 19890/20099 (98.96%) Loss: 2.025542 LR: 0.00000601 +[18:11:32] Epoch: 1 Batch: 19891/20099 (98.97%) Loss: 2.224040 LR: 0.00000601 +[18:11:34] Epoch: 1 Batch: 19892/20099 (98.97%) Loss: 2.126376 LR: 0.00000601 +[18:11:35] 
Epoch: 1 Batch: 19893/20099 (98.98%) Loss: 2.143669 LR: 0.00000601 +[18:11:37] Epoch: 1 Batch: 19894/20099 (98.98%) Loss: 1.997896 LR: 0.00000601 +[18:11:39] Epoch: 1 Batch: 19895/20099 (98.99%) Loss: 2.360316 LR: 0.00000601 +[18:11:41] Epoch: 1 Batch: 19896/20099 (98.99%) Loss: 1.992538 LR: 0.00000601 +[18:11:43] Epoch: 1 Batch: 19897/20099 (98.99%) Loss: 1.909878 LR: 0.00000601 +[18:11:45] Epoch: 1 Batch: 19898/20099 (99.00%) Loss: 2.474989 LR: 0.00000601 +[18:11:47] Epoch: 1 Batch: 19899/20099 (99.00%) Loss: 2.139086 LR: 0.00000601 +[18:11:48] Epoch: 1 Batch: 19900/20099 (99.01%) Loss: 2.263630 LR: 0.00000601 +[18:11:50] Epoch: 1 Batch: 19901/20099 (99.01%) Loss: 2.060228 LR: 0.00000601 +[18:11:52] Epoch: 1 Batch: 19902/20099 (99.02%) Loss: 2.183789 LR: 0.00000601 +[18:11:54] Epoch: 1 Batch: 19903/20099 (99.02%) Loss: 2.224228 LR: 0.00000601 +[18:11:56] Epoch: 1 Batch: 19904/20099 (99.03%) Loss: 2.058662 LR: 0.00000601 +[18:11:58] Epoch: 1 Batch: 19905/20099 (99.03%) Loss: 2.343738 LR: 0.00000601 +[18:12:00] Epoch: 1 Batch: 19906/20099 (99.04%) Loss: 1.906108 LR: 0.00000601 +[18:12:02] Epoch: 1 Batch: 19907/20099 (99.04%) Loss: 1.927742 LR: 0.00000601 +[18:12:03] Epoch: 1 Batch: 19908/20099 (99.05%) Loss: 2.235109 LR: 0.00000601 +[18:12:05] Epoch: 1 Batch: 19909/20099 (99.05%) Loss: 2.001861 LR: 0.00000601 +[18:12:07] Epoch: 1 Batch: 19910/20099 (99.06%) Loss: 2.194060 LR: 0.00000601 +[18:12:09] Epoch: 1 Batch: 19911/20099 (99.06%) Loss: 1.997604 LR: 0.00000601 +[18:12:11] Epoch: 1 Batch: 19912/20099 (99.07%) Loss: 1.491943 LR: 0.00000601 +[18:12:13] Epoch: 1 Batch: 19913/20099 (99.07%) Loss: 2.068338 LR: 0.00000601 +[18:12:14] Epoch: 1 Batch: 19914/20099 (99.08%) Loss: 2.129529 LR: 0.00000601 +[18:12:16] Epoch: 1 Batch: 19915/20099 (99.08%) Loss: 1.912435 LR: 0.00000601 +[18:12:18] Epoch: 1 Batch: 19916/20099 (99.09%) Loss: 2.082454 LR: 0.00000601 +[18:12:20] Epoch: 1 Batch: 19917/20099 (99.09%) Loss: 1.930790 LR: 0.00000601 +[18:12:22] Epoch: 1 Batch: 
19918/20099 (99.10%) Loss: 1.794702 LR: 0.00000601 +[18:12:24] Epoch: 1 Batch: 19919/20099 (99.10%) Loss: 2.064025 LR: 0.00000601 +[18:12:26] Epoch: 1 Batch: 19920/20099 (99.11%) Loss: 1.681873 LR: 0.00000601 +[18:12:27] Epoch: 1 Batch: 19921/20099 (99.11%) Loss: 2.124971 LR: 0.00000601 +[18:12:29] Epoch: 1 Batch: 19922/20099 (99.12%) Loss: 2.194179 LR: 0.00000601 +[18:12:31] Epoch: 1 Batch: 19923/20099 (99.12%) Loss: 1.765832 LR: 0.00000601 +[18:12:33] Epoch: 1 Batch: 19924/20099 (99.13%) Loss: 2.186311 LR: 0.00000601 +[18:12:35] Epoch: 1 Batch: 19925/20099 (99.13%) Loss: 2.228449 LR: 0.00000601 +[18:12:37] Epoch: 1 Batch: 19926/20099 (99.14%) Loss: 1.966350 LR: 0.00000601 +[18:12:39] Epoch: 1 Batch: 19927/20099 (99.14%) Loss: 2.091926 LR: 0.00000601 +[18:12:40] Epoch: 1 Batch: 19928/20099 (99.15%) Loss: 2.112078 LR: 0.00000601 +[18:12:42] Epoch: 1 Batch: 19929/20099 (99.15%) Loss: 2.223415 LR: 0.00000601 +[18:12:44] Epoch: 1 Batch: 19930/20099 (99.16%) Loss: 2.091291 LR: 0.00000601 +[18:12:46] Epoch: 1 Batch: 19931/20099 (99.16%) Loss: 1.978330 LR: 0.00000601 +[18:12:48] Epoch: 1 Batch: 19932/20099 (99.17%) Loss: 1.728313 LR: 0.00000601 +[18:12:50] Epoch: 1 Batch: 19933/20099 (99.17%) Loss: 2.364385 LR: 0.00000601 +[18:12:52] Epoch: 1 Batch: 19934/20099 (99.18%) Loss: 2.016558 LR: 0.00000601 +[18:12:53] Epoch: 1 Batch: 19935/20099 (99.18%) Loss: 2.216303 LR: 0.00000601 +[18:12:55] Epoch: 1 Batch: 19936/20099 (99.19%) Loss: 2.105827 LR: 0.00000601 +[18:12:57] Epoch: 1 Batch: 19937/20099 (99.19%) Loss: 2.393059 LR: 0.00000601 +[18:12:59] Epoch: 1 Batch: 19938/20099 (99.20%) Loss: 2.184569 LR: 0.00000601 +[18:13:01] Epoch: 1 Batch: 19939/20099 (99.20%) Loss: 2.019128 LR: 0.00000601 +[18:13:03] Epoch: 1 Batch: 19940/20099 (99.21%) Loss: 2.103800 LR: 0.00000601 +[18:13:05] Epoch: 1 Batch: 19941/20099 (99.21%) Loss: 2.226369 LR: 0.00000601 +[18:13:06] Epoch: 1 Batch: 19942/20099 (99.22%) Loss: 2.003755 LR: 0.00000601 +[18:13:08] Epoch: 1 Batch: 19943/20099 (99.22%) 
Loss: 2.065042 LR: 0.00000601 +[18:13:10] Epoch: 1 Batch: 19944/20099 (99.23%) Loss: 2.323166 LR: 0.00000601 +[18:13:12] Epoch: 1 Batch: 19945/20099 (99.23%) Loss: 1.949398 LR: 0.00000601 +[18:13:14] Epoch: 1 Batch: 19946/20099 (99.24%) Loss: 2.002255 LR: 0.00000601 +[18:13:16] Epoch: 1 Batch: 19947/20099 (99.24%) Loss: 2.056298 LR: 0.00000601 +[18:13:17] Epoch: 1 Batch: 19948/20099 (99.25%) Loss: 2.044523 LR: 0.00000601 +[18:13:19] Epoch: 1 Batch: 19949/20099 (99.25%) Loss: 2.300149 LR: 0.00000601 +[18:13:21] Epoch: 1 Batch: 19950/20099 (99.26%) Loss: 2.122986 LR: 0.00000601 +[18:13:23] Epoch: 1 Batch: 19951/20099 (99.26%) Loss: 2.088663 LR: 0.00000601 +[18:13:25] Epoch: 1 Batch: 19952/20099 (99.27%) Loss: 2.109361 LR: 0.00000601 +[18:13:27] Epoch: 1 Batch: 19953/20099 (99.27%) Loss: 2.695279 LR: 0.00000601 +[18:13:29] Epoch: 1 Batch: 19954/20099 (99.28%) Loss: 2.065400 LR: 0.00000601 +[18:13:30] Epoch: 1 Batch: 19955/20099 (99.28%) Loss: 2.190050 LR: 0.00000601 +[18:13:32] Epoch: 1 Batch: 19956/20099 (99.29%) Loss: 2.142442 LR: 0.00000601 +[18:13:34] Epoch: 1 Batch: 19957/20099 (99.29%) Loss: 1.986890 LR: 0.00000601 +[18:13:36] Epoch: 1 Batch: 19958/20099 (99.30%) Loss: 1.784241 LR: 0.00000600 +[18:13:38] Epoch: 1 Batch: 19959/20099 (99.30%) Loss: 2.032623 LR: 0.00000600 +[18:13:40] Epoch: 1 Batch: 19960/20099 (99.31%) Loss: 2.015134 LR: 0.00000600 +[18:13:42] Epoch: 1 Batch: 19961/20099 (99.31%) Loss: 1.981108 LR: 0.00000600 +[18:13:43] Epoch: 1 Batch: 19962/20099 (99.32%) Loss: 1.877672 LR: 0.00000600 +[18:13:45] Epoch: 1 Batch: 19963/20099 (99.32%) Loss: 1.990401 LR: 0.00000600 +[18:13:47] Epoch: 1 Batch: 19964/20099 (99.33%) Loss: 2.051068 LR: 0.00000600 +[18:13:49] Epoch: 1 Batch: 19965/20099 (99.33%) Loss: 2.449492 LR: 0.00000600 +[18:13:51] Epoch: 1 Batch: 19966/20099 (99.34%) Loss: 2.365175 LR: 0.00000600 +[18:13:53] Epoch: 1 Batch: 19967/20099 (99.34%) Loss: 1.904671 LR: 0.00000600 +[18:13:55] Epoch: 1 Batch: 19968/20099 (99.35%) Loss: 1.993899 LR: 
0.00000600 +[18:13:56] Epoch: 1 Batch: 19969/20099 (99.35%) Loss: 1.825162 LR: 0.00000600 +[18:13:58] Epoch: 1 Batch: 19970/20099 (99.36%) Loss: 2.120810 LR: 0.00000600 +[18:14:00] Epoch: 1 Batch: 19971/20099 (99.36%) Loss: 2.126808 LR: 0.00000600 +[18:14:02] Epoch: 1 Batch: 19972/20099 (99.37%) Loss: 2.258442 LR: 0.00000600 +[18:14:04] Epoch: 1 Batch: 19973/20099 (99.37%) Loss: 1.917770 LR: 0.00000600 +[18:14:06] Epoch: 1 Batch: 19974/20099 (99.38%) Loss: 1.709959 LR: 0.00000600 +[18:14:08] Epoch: 1 Batch: 19975/20099 (99.38%) Loss: 2.094954 LR: 0.00000600 +[18:14:09] Epoch: 1 Batch: 19976/20099 (99.39%) Loss: 1.951792 LR: 0.00000600 +[18:14:11] Epoch: 1 Batch: 19977/20099 (99.39%) Loss: 1.990114 LR: 0.00000600 +[18:14:13] Epoch: 1 Batch: 19978/20099 (99.40%) Loss: 2.174391 LR: 0.00000600 +[18:14:15] Epoch: 1 Batch: 19979/20099 (99.40%) Loss: 1.989349 LR: 0.00000600 +[18:14:17] Epoch: 1 Batch: 19980/20099 (99.41%) Loss: 2.005177 LR: 0.00000600 +[18:14:19] Epoch: 1 Batch: 19981/20099 (99.41%) Loss: 2.212886 LR: 0.00000600 +[18:14:21] Epoch: 1 Batch: 19982/20099 (99.42%) Loss: 1.738271 LR: 0.00000600 +[18:14:22] Epoch: 1 Batch: 19983/20099 (99.42%) Loss: 1.654482 LR: 0.00000600 +[18:14:24] Epoch: 1 Batch: 19984/20099 (99.43%) Loss: 1.832176 LR: 0.00000600 +[18:14:26] Epoch: 1 Batch: 19985/20099 (99.43%) Loss: 1.949587 LR: 0.00000600 +[18:14:28] Epoch: 1 Batch: 19986/20099 (99.44%) Loss: 2.049768 LR: 0.00000600 +[18:14:30] Epoch: 1 Batch: 19987/20099 (99.44%) Loss: 2.207282 LR: 0.00000600 +[18:14:32] Epoch: 1 Batch: 19988/20099 (99.45%) Loss: 2.254671 LR: 0.00000600 +[18:14:33] Epoch: 1 Batch: 19989/20099 (99.45%) Loss: 1.989869 LR: 0.00000600 +[18:14:35] Epoch: 1 Batch: 19990/20099 (99.46%) Loss: 2.359429 LR: 0.00000600 +[18:14:37] Epoch: 1 Batch: 19991/20099 (99.46%) Loss: 2.252242 LR: 0.00000600 +[18:14:39] Epoch: 1 Batch: 19992/20099 (99.47%) Loss: 2.124191 LR: 0.00000600 +[18:14:41] Epoch: 1 Batch: 19993/20099 (99.47%) Loss: 2.152019 LR: 0.00000600 +[18:14:43] 
Epoch: 1 Batch: 19994/20099 (99.48%) Loss: 2.158744 LR: 0.00000600 +[18:14:45] Epoch: 1 Batch: 19995/20099 (99.48%) Loss: 2.042598 LR: 0.00000600 +[18:14:46] Epoch: 1 Batch: 19996/20099 (99.49%) Loss: 2.274831 LR: 0.00000600 +[18:14:48] Epoch: 1 Batch: 19997/20099 (99.49%) Loss: 2.210019 LR: 0.00000600 +[18:14:50] Epoch: 1 Batch: 19998/20099 (99.50%) Loss: 2.021503 LR: 0.00000600 +[18:14:52] Epoch: 1 Batch: 19999/20099 (99.50%) Loss: 2.177226 LR: 0.00000600 +[18:14:54] >> Evaluating batch 0 +[18:14:55] >> Evaluating batch 1 +[18:14:56] >> Evaluating batch 2 +[18:14:57] >> Evaluating batch 3 +[18:14:58] >> Evaluating batch 4 +[18:14:59] >> Evaluating batch 5 +[18:15:00] >> Evaluating batch 6 +[18:15:01] >> Evaluating batch 7 +[18:15:02] >> Evaluating batch 8 +[18:15:04] >> Evaluating batch 9 +[18:15:05] >> Evaluating batch 10 +[18:15:06] >> Evaluating batch 11 +[18:15:07] >> Evaluating batch 12 +[18:15:08] >> Evaluating batch 13 +[18:15:09] >> Evaluating batch 14 +[18:15:10] >> Evaluating batch 15 +[18:15:11] >> Evaluating batch 16 +[18:15:11] Epoch: 1 Step: 20000/20099 Evaluation: +[18:15:11] Avg Loss Since Last Eval: 2.0924 Val Loss: 2.1444 Validation loss delta: -0.0022 Perplexity: 8.5367 LR: 0.00000600 +[18:15:15] >> Cleaned up old temp checkpoint: epoch1_step18000 +[18:15:15] >> Temp checkpoint saved: epoch1_step20000, size: 0.1693 GB +[18:15:18] >> Checkpoint saved: epoch1_step20000, size: 0.1693 GB +[18:15:18] Epoch: 1 Batch: 20000/20099 (99.51%) Loss: 1.895693 LR: 0.00000600 +[18:15:20] Epoch: 1 Batch: 20001/20099 (99.51%) Loss: 2.118155 LR: 0.00000600 +[18:15:22] Epoch: 1 Batch: 20002/20099 (99.52%) Loss: 1.900028 LR: 0.00000600 +[18:15:24] Epoch: 1 Batch: 20003/20099 (99.52%) Loss: 1.885696 LR: 0.00000600 +[18:15:26] Epoch: 1 Batch: 20004/20099 (99.53%) Loss: 2.298121 LR: 0.00000600 +[18:15:27] Epoch: 1 Batch: 20005/20099 (99.53%) Loss: 2.129913 LR: 0.00000600 +[18:15:29] Epoch: 1 Batch: 20006/20099 (99.54%) Loss: 2.128814 LR: 0.00000600 +[18:15:31] 
Epoch: 1 Batch: 20007/20099 (99.54%) Loss: 2.134611 LR: 0.00000600 +[18:15:33] Epoch: 1 Batch: 20008/20099 (99.55%) Loss: 1.945534 LR: 0.00000600 +[18:15:35] Epoch: 1 Batch: 20009/20099 (99.55%) Loss: 2.313602 LR: 0.00000600 +[18:15:37] Epoch: 1 Batch: 20010/20099 (99.56%) Loss: 2.076821 LR: 0.00000600 +[18:15:39] Epoch: 1 Batch: 20011/20099 (99.56%) Loss: 2.217935 LR: 0.00000600 +[18:15:41] Epoch: 1 Batch: 20012/20099 (99.57%) Loss: 1.920306 LR: 0.00000600 +[18:15:43] Epoch: 1 Batch: 20013/20099 (99.57%) Loss: 2.286664 LR: 0.00000600 +[18:15:45] Epoch: 1 Batch: 20014/20099 (99.58%) Loss: 1.826551 LR: 0.00000600 +[18:15:46] Epoch: 1 Batch: 20015/20099 (99.58%) Loss: 2.038226 LR: 0.00000600 +[18:15:48] Epoch: 1 Batch: 20016/20099 (99.59%) Loss: 2.308253 LR: 0.00000600 +[18:15:50] Epoch: 1 Batch: 20017/20099 (99.59%) Loss: 2.430830 LR: 0.00000600 +[18:15:52] Epoch: 1 Batch: 20018/20099 (99.60%) Loss: 2.111906 LR: 0.00000600 +[18:15:54] Epoch: 1 Batch: 20019/20099 (99.60%) Loss: 1.901409 LR: 0.00000600 +[18:15:56] Epoch: 1 Batch: 20020/20099 (99.61%) Loss: 1.697669 LR: 0.00000600 +[18:15:58] Epoch: 1 Batch: 20021/20099 (99.61%) Loss: 1.972427 LR: 0.00000600 +[18:16:00] Epoch: 1 Batch: 20022/20099 (99.62%) Loss: 2.280633 LR: 0.00000600 +[18:16:01] Epoch: 1 Batch: 20023/20099 (99.62%) Loss: 2.210963 LR: 0.00000600 +[18:16:03] Epoch: 1 Batch: 20024/20099 (99.63%) Loss: 2.193432 LR: 0.00000600 +[18:16:05] Epoch: 1 Batch: 20025/20099 (99.63%) Loss: 2.002573 LR: 0.00000600 +[18:16:07] Epoch: 1 Batch: 20026/20099 (99.64%) Loss: 2.376453 LR: 0.00000600 +[18:16:09] Epoch: 1 Batch: 20027/20099 (99.64%) Loss: 2.042660 LR: 0.00000600 +[18:16:11] Epoch: 1 Batch: 20028/20099 (99.65%) Loss: 1.959887 LR: 0.00000600 +[18:16:13] Epoch: 1 Batch: 20029/20099 (99.65%) Loss: 2.093671 LR: 0.00000600 +[18:16:14] Epoch: 1 Batch: 20030/20099 (99.66%) Loss: 2.195789 LR: 0.00000600 +[18:16:16] Epoch: 1 Batch: 20031/20099 (99.66%) Loss: 2.164001 LR: 0.00000600 +[18:16:18] Epoch: 1 Batch: 
20032/20099 (99.67%) Loss: 1.986731 LR: 0.00000600 +[18:16:20] Epoch: 1 Batch: 20033/20099 (99.67%) Loss: 2.271636 LR: 0.00000600 +[18:16:22] Epoch: 1 Batch: 20034/20099 (99.68%) Loss: 2.018275 LR: 0.00000600 +[18:16:24] Epoch: 1 Batch: 20035/20099 (99.68%) Loss: 2.099380 LR: 0.00000600 +[18:16:25] Epoch: 1 Batch: 20036/20099 (99.69%) Loss: 2.018587 LR: 0.00000600 +[18:16:27] Epoch: 1 Batch: 20037/20099 (99.69%) Loss: 2.019433 LR: 0.00000600 +[18:16:29] Epoch: 1 Batch: 20038/20099 (99.70%) Loss: 2.112030 LR: 0.00000600 +[18:16:31] Epoch: 1 Batch: 20039/20099 (99.70%) Loss: 2.167210 LR: 0.00000600 +[18:16:33] Epoch: 1 Batch: 20040/20099 (99.71%) Loss: 2.285786 LR: 0.00000600 +[18:16:35] Epoch: 1 Batch: 20041/20099 (99.71%) Loss: 2.189596 LR: 0.00000600 +[18:16:37] Epoch: 1 Batch: 20042/20099 (99.72%) Loss: 2.201477 LR: 0.00000600 +[18:16:38] Epoch: 1 Batch: 20043/20099 (99.72%) Loss: 1.802700 LR: 0.00000600 +[18:16:40] Epoch: 1 Batch: 20044/20099 (99.73%) Loss: 1.943753 LR: 0.00000600 +[18:16:42] Epoch: 1 Batch: 20045/20099 (99.73%) Loss: 1.957204 LR: 0.00000600 +[18:16:44] Epoch: 1 Batch: 20046/20099 (99.74%) Loss: 2.256362 LR: 0.00000600 +[18:16:46] Epoch: 1 Batch: 20047/20099 (99.74%) Loss: 2.068656 LR: 0.00000600 +[18:16:48] Epoch: 1 Batch: 20048/20099 (99.75%) Loss: 1.874844 LR: 0.00000600 +[18:16:49] Epoch: 1 Batch: 20049/20099 (99.75%) Loss: 1.974484 LR: 0.00000600 +[18:16:51] Epoch: 1 Batch: 20050/20099 (99.76%) Loss: 2.000124 LR: 0.00000600 +[18:16:53] Epoch: 1 Batch: 20051/20099 (99.76%) Loss: 2.147340 LR: 0.00000600 +[18:16:55] Epoch: 1 Batch: 20052/20099 (99.77%) Loss: 2.334250 LR: 0.00000600 +[18:16:57] Epoch: 1 Batch: 20053/20099 (99.77%) Loss: 2.299127 LR: 0.00000600 +[18:16:59] Epoch: 1 Batch: 20054/20099 (99.78%) Loss: 1.934283 LR: 0.00000600 +[18:17:01] Epoch: 1 Batch: 20055/20099 (99.78%) Loss: 1.882757 LR: 0.00000600 +[18:17:02] Epoch: 1 Batch: 20056/20099 (99.79%) Loss: 2.052042 LR: 0.00000600 +[18:17:04] Epoch: 1 Batch: 20057/20099 (99.79%) 
Loss: 2.233625 LR: 0.00000600 +[18:17:06] Epoch: 1 Batch: 20058/20099 (99.80%) Loss: 2.161074 LR: 0.00000600 +[18:17:08] Epoch: 1 Batch: 20059/20099 (99.80%) Loss: 2.036236 LR: 0.00000600 +[18:17:10] Epoch: 1 Batch: 20060/20099 (99.81%) Loss: 1.957123 LR: 0.00000600 +[18:17:12] Epoch: 1 Batch: 20061/20099 (99.81%) Loss: 1.790294 LR: 0.00000600 +[18:17:14] Epoch: 1 Batch: 20062/20099 (99.82%) Loss: 2.163150 LR: 0.00000600 +[18:17:15] Epoch: 1 Batch: 20063/20099 (99.82%) Loss: 2.443207 LR: 0.00000600 +[18:17:17] Epoch: 1 Batch: 20064/20099 (99.83%) Loss: 1.768087 LR: 0.00000600 +[18:17:19] Epoch: 1 Batch: 20065/20099 (99.83%) Loss: 2.161706 LR: 0.00000600 +[18:17:21] Epoch: 1 Batch: 20066/20099 (99.84%) Loss: 1.910872 LR: 0.00000600 +[18:17:23] Epoch: 1 Batch: 20067/20099 (99.84%) Loss: 2.215328 LR: 0.00000600 +[18:17:25] Epoch: 1 Batch: 20068/20099 (99.85%) Loss: 2.215004 LR: 0.00000600 +[18:17:27] Epoch: 1 Batch: 20069/20099 (99.85%) Loss: 2.040601 LR: 0.00000600 +[18:17:28] Epoch: 1 Batch: 20070/20099 (99.86%) Loss: 1.849070 LR: 0.00000600 +[18:17:30] Epoch: 1 Batch: 20071/20099 (99.86%) Loss: 2.029732 LR: 0.00000600 +[18:17:32] Epoch: 1 Batch: 20072/20099 (99.87%) Loss: 2.323634 LR: 0.00000600 +[18:17:34] Epoch: 1 Batch: 20073/20099 (99.87%) Loss: 2.117156 LR: 0.00000600 +[18:17:36] Epoch: 1 Batch: 20074/20099 (99.88%) Loss: 1.723173 LR: 0.00000600 +[18:17:38] Epoch: 1 Batch: 20075/20099 (99.88%) Loss: 1.981044 LR: 0.00000600 +[18:17:40] Epoch: 1 Batch: 20076/20099 (99.89%) Loss: 2.037143 LR: 0.00000600 +[18:17:41] Epoch: 1 Batch: 20077/20099 (99.89%) Loss: 1.958102 LR: 0.00000600 +[18:17:43] Epoch: 1 Batch: 20078/20099 (99.90%) Loss: 1.980074 LR: 0.00000600 +[18:17:45] Epoch: 1 Batch: 20079/20099 (99.90%) Loss: 2.210082 LR: 0.00000600 +[18:17:47] Epoch: 1 Batch: 20080/20099 (99.91%) Loss: 2.048256 LR: 0.00000600 +[18:17:49] Epoch: 1 Batch: 20081/20099 (99.91%) Loss: 2.119569 LR: 0.00000600 +[18:17:51] Epoch: 1 Batch: 20082/20099 (99.92%) Loss: 1.897750 LR: 
0.00000600 +[18:17:53] Epoch: 1 Batch: 20083/20099 (99.92%) Loss: 2.091305 LR: 0.00000600 +[18:17:54] Epoch: 1 Batch: 20084/20099 (99.93%) Loss: 2.280691 LR: 0.00000600 +[18:17:56] Epoch: 1 Batch: 20085/20099 (99.93%) Loss: 2.147128 LR: 0.00000600 +[18:17:58] Epoch: 1 Batch: 20086/20099 (99.94%) Loss: 2.006140 LR: 0.00000600 +[18:18:00] Epoch: 1 Batch: 20087/20099 (99.94%) Loss: 2.240511 LR: 0.00000600 +[18:18:02] Epoch: 1 Batch: 20088/20099 (99.95%) Loss: 1.995956 LR: 0.00000600 +[18:18:04] Epoch: 1 Batch: 20089/20099 (99.95%) Loss: 2.086977 LR: 0.00000600 +[18:18:05] Epoch: 1 Batch: 20090/20099 (99.96%) Loss: 1.993428 LR: 0.00000600 +[18:18:07] Epoch: 1 Batch: 20091/20099 (99.96%) Loss: 2.169419 LR: 0.00000600 +[18:18:09] Epoch: 1 Batch: 20092/20099 (99.97%) Loss: 2.396623 LR: 0.00000600 +[18:18:11] Epoch: 1 Batch: 20093/20099 (99.97%) Loss: 1.850003 LR: 0.00000600 +[18:18:13] Epoch: 1 Batch: 20094/20099 (99.98%) Loss: 1.873596 LR: 0.00000600 +[18:18:15] Epoch: 1 Batch: 20095/20099 (99.98%) Loss: 1.948931 LR: 0.00000600 +[18:18:17] Epoch: 1 Batch: 20096/20099 (99.99%) Loss: 2.426093 LR: 0.00000600 +[18:18:18] Epoch: 1 Batch: 20097/20099 (99.99%) Loss: 2.223984 LR: 0.00000600 +[18:18:20] Epoch: 1 Batch: 20098/20099 (100.00%) Loss: 1.950486 LR: 0.00000600 +[18:18:21] Epoch: 1 Batch: 20099/20099 (100.00%) Loss: 2.265170 LR: 0.00000600 +[18:18:21] CPU usage: 64.4%, RAM usage: 30.6% +[18:18:21] Memory cleanup after epoch 1 +[18:18:23] CPU usage: 53.0%, RAM usage: 30.6% +[18:18:23] Epoch 1 average loss: 0.8819 +[18:18:23] >> Evaluating batch 0 +[18:18:24] >> Evaluating batch 1 +[18:18:25] >> Evaluating batch 2 +[18:18:26] >> Evaluating batch 3 +[18:18:27] >> Evaluating batch 4 +[18:18:28] >> Evaluating batch 5 +[18:18:29] >> Evaluating batch 6 +[18:18:31] >> Evaluating batch 7 +[18:18:32] >> Evaluating batch 8 +[18:18:33] >> Evaluating batch 9 +[18:18:34] >> Evaluating batch 10 +[18:18:35] >> Evaluating batch 11 +[18:18:36] >> Evaluating batch 12 +[18:18:37] >> 
Evaluating batch 13 +[18:18:38] >> Evaluating batch 14 +[18:18:39] >> Evaluating batch 15 +[18:18:40] >> Evaluating batch 16 +[18:18:40] Epoch: 1 Step: 20099/20099 Evaluation: +[18:18:40] Val Loss: 2.1445 Perplexity: 8.5377 LR: 0.00000600 +[18:18:40] Epoch 1 completed in 16320.25 seconds +[18:18:44] >> Checkpoint saved: epoch1_complete, size: 0.1690 GB +[18:18:48] >> Cleaned up old temp checkpoint: epoch1_step18200 +[18:18:48] >> Temp checkpoint saved: epoch1_step20099, size: 0.1690 GB +[18:18:48] Training complete. +[20:41:35] 2025-08-24 +[20:41:36] Tesla T4 +[20:41:36] +|===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| Active memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| Requested memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| GPU reserved memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 0 B | 0 B | 0 B | 0 B | +|---------------------------------------------------------------------------| +| Allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Active allocs | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| GPU reserved segments | 0 | 0 | 0 | 0 | 
+|---------------------------------------------------------------------------| +| Non-releasable allocs | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +[20:41:36] CPU usage: 94.8%, RAM usage: 27.2% +[20:41:36] Running with the following configuration: +[20:41:36] model_name: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B +[20:41:36] tokenizer: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B +[20:41:36] output_dir: /content/drive/MyDrive/llm/Discord-Hermes-3-8B +[20:41:36] train_path: /content/drive/MyDrive/data/None156_fix.csv +[20:41:36] checkpoint: /content/drive/MyDrive/llm/Discord-Hermes-3-8B/epoch1_step19000 +[20:41:36] lr: 3e-05 +[20:41:36] lr_floor: 6e-06 +[20:41:37] epochs: 1 +[20:41:37] batch_size: 5 +[20:41:37] accum_steps: 7 +[20:41:37] val_batch_size: 6 +[20:41:37] max_val_size: 100 +[20:41:37] max_length: 150 +[20:41:37] save_temp_frequency: 200 +[20:41:37] save_frequency: 500 +[20:41:37] eval_frequency: 500 +[20:41:37] save_pattern: y +[20:41:37] quantization: y +[20:41:37] quantization_bits: 4 +[20:41:37] lora: y +[20:41:37] frozen_lora_path: None +[20:41:37] lora_rank: 16 +[20:41:37] lora_alpha: 32 +[20:41:37] lora_dropout: 0.1 +[20:41:37] optimizer_weight_decay: 0.0 +[20:41:37] warmup_type: cosine +[20:41:37] warmup_ratio: 0.08 +[20:41:37] warmup_steps: 550 +[20:41:37] shuffle: y +[20:41:37] csv_column: text +[20:41:37] new_run: n +[20:41:37] label_smoothing: 0.05 +[20:41:37] SEED: 1 +[20:41:37] Using device: cuda +[20:41:37] Resuming from temp checkpoint: /content/drive/MyDrive/llm/Discord-Hermes-3-8B/epoch1_step19000 +[20:48:09] Embeddings shape after: torch.Size([128256, 4096]) +[20:48:19] Loaded trainable LoRA adapter from 
/content/drive/MyDrive/llm/Discord-Hermes-3-8B/epoch1_step19000 +[20:48:19] Trainable LoRA 'default': +[20:48:19] task_type: CAUSAL_LM +[20:48:19] peft_type: PeftType.LORA +[20:48:19] auto_mapping: None +[20:48:19] base_model_name_or_path: /content/drive/MyDrive/llm/NousResearch/Hermes-3-Llama-3.1-8B +[20:48:19] revision: None +[20:48:19] inference_mode: False +[20:48:19] r: 16 +[20:48:19] target_modules: {'q_proj', 'k_proj', 'v_proj', 'o_proj'} +[20:48:19] exclude_modules: None +[20:48:19] lora_alpha: 32 +[20:48:19] lora_dropout: 0.1 +[20:48:19] fan_in_fan_out: False +[20:48:19] bias: none +[20:48:19] use_rslora: True +[20:48:19] modules_to_save: None +[20:48:19] init_lora_weights: True +[20:48:19] layers_to_transform: None +[20:48:19] layers_pattern: None +[20:48:19] rank_pattern: {} +[20:48:19] alpha_pattern: {} +[20:48:19] megatron_config: None +[20:48:19] megatron_core: megatron.core +[20:48:19] trainable_token_indices: None +[20:48:19] loftq_config: {} +[20:48:19] eva_config: None +[20:48:19] corda_config: None +[20:48:19] use_dora: False +[20:48:19] use_qalora: False +[20:48:19] qalora_group_size: 16 +[20:48:19] layer_replication: None +[20:48:19] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) +[20:48:19] lora_bias: False +[20:48:19] target_parameters: None +[20:48:19] _custom_modules: None +[20:48:19] Embeddings shape after: torch.Size([128256, 4096]) +[20:48:34] Resumed from epoch 1, step 19001, file 1 +[20:48:34] Starting from CSV file... +[20:48:39] Splitting data into chunks of 11000... +[20:48:39] Using 7 processes across 10 chunks +[20:48:39] Using saved train/val split from checkpoint. +[20:48:39] Resuming scheduler with warmup steps: 229, total steps: 2871 +[20:48:39] Initializing scheduler with cosine schedule with warmup, warmup steps 550, total steps: 2871 +[20:48:39] Train/Val split: 100492 train, 100 val samples. 
+[20:48:48] Model: PeftModelForCausalLM +[20:48:48] Model config: LlamaConfig { + "architectures": [ + "LlamaForCausalLM" + ], + "attention_bias": false, + "attention_dropout": 0.0, + "bos_token_id": 128000, + "eos_token_id": 128040, + "head_dim": 128, + "hidden_act": "silu", + "hidden_size": 4096, + "initializer_range": 0.02, + "intermediate_size": 14336, + "max_position_embeddings": 131072, + "mlp_bias": false, + "model_type": "llama", + "num_attention_heads": 32, + "num_hidden_layers": 32, + "num_key_value_heads": 8, + "pretraining_tp": 1, + "quantization_config": { + "_load_in_4bit": true, + "_load_in_8bit": false, + "bnb_4bit_compute_dtype": "float16", + "bnb_4bit_quant_storage": "uint8", + "bnb_4bit_quant_type": "nf4", + "bnb_4bit_use_double_quant": true, + "llm_int8_enable_fp32_cpu_offload": false, + "llm_int8_has_fp16_weight": false, + "llm_int8_skip_modules": [ + "lm_head" + ], + "llm_int8_threshold": 6.0, + "load_in_4bit": true, + "load_in_8bit": false, + "quant_method": "bitsandbytes" + }, + "rms_norm_eps": 1e-05, + "rope_scaling": { + "factor": 8.0, + "high_freq_factor": 4.0, + "low_freq_factor": 1.0, + "original_max_position_embeddings": 8192, + "rope_type": "llama3" + }, + "rope_theta": 500000.0, + "tie_word_embeddings": false, + "torch_dtype": "float16", + "transformers_version": "4.55.2", + "use_cache": true, + "vocab_size": 128256 +} + +[20:48:48] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 +[20:48:48] +Optimizer: PagedAdamW ( +Parameter Group 0 + alpha: 0.0 + betas: (0.9, 0.95) + eps: 1e-08 + initial_lr: 3e-05 + lr: 0.0 + t_alpha: None + t_beta3: None + weight_decay: 0.0 +) +[20:48:48] Optimizer params: lr=3e-05, weight_decay=0.0, accum_steps=7 +[20:48:48] Scheduler: +[20:48:48] Training on 100492 training samples, 100 validation samples +[20:48:48] Average tokens per sample: 150.00 +[20:48:48] Estimated epoch time: ~301.73 min +[20:48:48] +|===========================================================================| +| PyTorch 
CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 5986 MiB | 7004 MiB | 335397 MiB | 329410 MiB | +|---------------------------------------------------------------------------| +| Active memory | 5986 MiB | 7004 MiB | 335397 MiB | 329410 MiB | +|---------------------------------------------------------------------------| +| Requested memory | 5983 MiB | 7000 MiB | 335022 MiB | 329039 MiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 7248 MiB | 7248 MiB | 7248 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 1261 MiB | 5879 MiB | 328754 MiB | 327493 MiB | +|---------------------------------------------------------------------------| +| Allocations | 2762 | 2840 | 33883 | 31121 | +|---------------------------------------------------------------------------| +| Active allocs | 2762 | 2840 | 33883 | 31121 | +|---------------------------------------------------------------------------| +| GPU reserved segments | 185 | 185 | 185 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 36 | 36 | 13826 | 13790 | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +[20:48:48] Restoring shuffle indices from training state for epoch 1 +[20:48:48] CPU usage: 42.2%, RAM usage: 36.8% 
+[20:48:49] Epoch 1 learning rate: 0.0 +[20:48:49] Starting epoch 1 +[20:49:50] Batch 19001: input_ids shape torch.Size([5, 150]), attention_mask shape torch.Size([5, 150]) +[20:49:51] Epoch: 1 Batch: 19001/20099 (94.54%) Loss: 1.978112 LR: 0.00000000 +[20:49:53] Epoch: 1 Batch: 19002/20099 (94.54%) Loss: 2.163214 LR: 0.00000000 +[20:49:55] Epoch: 1 Batch: 19003/20099 (94.55%) Loss: 2.210306 LR: 0.00000000 +[20:49:56] Epoch: 1 Batch: 19004/20099 (94.55%) Loss: 2.462724 LR: 0.00000000 +[20:49:58] Epoch: 1 Batch: 19005/20099 (94.56%) Loss: 2.192734 LR: 0.00000000 +[20:50:00] Epoch: 1 Batch: 19006/20099 (94.56%) Loss: 2.213096 LR: 0.00000000 +[20:50:01] Epoch: 1 Batch: 19007/20099 (94.57%) Loss: 2.035861 LR: 0.00000627 +[20:50:03] Epoch: 1 Batch: 19008/20099 (94.57%) Loss: 2.430056 LR: 0.00000627 +[20:50:05] Epoch: 1 Batch: 19009/20099 (94.58%) Loss: 2.280874 LR: 0.00000627 +[20:50:06] Epoch: 1 Batch: 19010/20099 (94.58%) Loss: 1.900641 LR: 0.00000627 +[20:50:08] Epoch: 1 Batch: 19011/20099 (94.59%) Loss: 2.030517 LR: 0.00000627 +[20:50:10] Epoch: 1 Batch: 19012/20099 (94.59%) Loss: 2.103296 LR: 0.00000627 +[20:50:12] Epoch: 1 Batch: 19013/20099 (94.60%) Loss: 2.395484 LR: 0.00000627 +[20:50:13] Epoch: 1 Batch: 19014/20099 (94.60%) Loss: 2.102773 LR: 0.00000627 +[20:50:15] Epoch: 1 Batch: 19015/20099 (94.61%) Loss: 2.169385 LR: 0.00000627 +[20:50:17] Epoch: 1 Batch: 19016/20099 (94.61%) Loss: 1.819225 LR: 0.00000627 +[20:50:18] Epoch: 1 Batch: 19017/20099 (94.62%) Loss: 1.983545 LR: 0.00000627 +[20:50:20] Epoch: 1 Batch: 19018/20099 (94.62%) Loss: 2.136305 LR: 0.00000627 +[20:50:22] Epoch: 1 Batch: 19019/20099 (94.63%) Loss: 2.064129 LR: 0.00000627 +[20:50:24] Epoch: 1 Batch: 19020/20099 (94.63%) Loss: 1.902418 LR: 0.00000627 +[20:50:25] Epoch: 1 Batch: 19021/20099 (94.64%) Loss: 1.938429 LR: 0.00000626 +[20:50:27] Epoch: 1 Batch: 19022/20099 (94.64%) Loss: 1.968538 LR: 0.00000626 +[20:50:29] Epoch: 1 Batch: 19023/20099 (94.65%) Loss: 2.344552 LR: 0.00000626 
+[20:50:31] Epoch: 1 Batch: 19024/20099 (94.65%) Loss: 1.947167 LR: 0.00000626 +[20:50:32] Epoch: 1 Batch: 19025/20099 (94.66%) Loss: 2.168672 LR: 0.00000626 +[20:50:34] Epoch: 1 Batch: 19026/20099 (94.66%) Loss: 1.889120 LR: 0.00000626 +[20:50:36] Epoch: 1 Batch: 19027/20099 (94.67%) Loss: 2.061914 LR: 0.00000626 +[20:50:38] Epoch: 1 Batch: 19028/20099 (94.67%) Loss: 1.961316 LR: 0.00000626 +[20:50:39] Epoch: 1 Batch: 19029/20099 (94.68%) Loss: 1.922917 LR: 0.00000626 +[20:50:41] Epoch: 1 Batch: 19030/20099 (94.68%) Loss: 2.051198 LR: 0.00000626 +[20:50:43] Epoch: 1 Batch: 19031/20099 (94.69%) Loss: 1.791793 LR: 0.00000626 +[20:50:45] Epoch: 1 Batch: 19032/20099 (94.69%) Loss: 1.950352 LR: 0.00000626 +[20:50:46] Epoch: 1 Batch: 19033/20099 (94.70%) Loss: 2.182775 LR: 0.00000626 +[20:50:48] Epoch: 1 Batch: 19034/20099 (94.70%) Loss: 2.177331 LR: 0.00000626 +[20:50:50] Epoch: 1 Batch: 19035/20099 (94.71%) Loss: 1.749549 LR: 0.00000626 +[20:50:51] Epoch: 1 Batch: 19036/20099 (94.71%) Loss: 2.126160 LR: 0.00000626 +[20:50:53] Epoch: 1 Batch: 19037/20099 (94.72%) Loss: 2.146404 LR: 0.00000626 +[20:50:55] Epoch: 1 Batch: 19038/20099 (94.72%) Loss: 1.955635 LR: 0.00000626 +[20:50:56] Epoch: 1 Batch: 19039/20099 (94.73%) Loss: 2.152267 LR: 0.00000626 +[20:50:58] Epoch: 1 Batch: 19040/20099 (94.73%) Loss: 2.099615 LR: 0.00000626 +[20:51:00] Epoch: 1 Batch: 19041/20099 (94.74%) Loss: 1.976633 LR: 0.00000626 +[20:51:02] Epoch: 1 Batch: 19042/20099 (94.74%) Loss: 2.129047 LR: 0.00000625 +[20:51:03] Epoch: 1 Batch: 19043/20099 (94.75%) Loss: 1.988193 LR: 0.00000625 +[20:51:05] Epoch: 1 Batch: 19044/20099 (94.75%) Loss: 2.221761 LR: 0.00000625 +[20:51:07] Epoch: 1 Batch: 19045/20099 (94.76%) Loss: 1.653421 LR: 0.00000625 +[20:51:08] Epoch: 1 Batch: 19046/20099 (94.76%) Loss: 2.268929 LR: 0.00000625 +[20:51:10] Epoch: 1 Batch: 19047/20099 (94.77%) Loss: 1.984380 LR: 0.00000625 +[20:51:12] Epoch: 1 Batch: 19048/20099 (94.77%) Loss: 1.642195 LR: 0.00000625 +[20:51:13] Epoch: 1 
Batch: 19049/20099 (94.78%) Loss: 2.097610 LR: 0.00000625 +[20:51:15] Epoch: 1 Batch: 19050/20099 (94.78%) Loss: 2.115116 LR: 0.00000625 +[20:51:17] Epoch: 1 Batch: 19051/20099 (94.79%) Loss: 2.317827 LR: 0.00000625 +[20:51:18] Epoch: 1 Batch: 19052/20099 (94.79%) Loss: 1.668196 LR: 0.00000625 +[20:51:20] Epoch: 1 Batch: 19053/20099 (94.80%) Loss: 2.269184 LR: 0.00000625 +[20:51:22] Epoch: 1 Batch: 19054/20099 (94.80%) Loss: 2.229957 LR: 0.00000625 +[20:51:23] Epoch: 1 Batch: 19055/20099 (94.81%) Loss: 1.947311 LR: 0.00000625 +[20:51:25] Epoch: 1 Batch: 19056/20099 (94.81%) Loss: 2.133463 LR: 0.00000625 +[20:51:27] Epoch: 1 Batch: 19057/20099 (94.82%) Loss: 2.061038 LR: 0.00000625 +[20:51:28] Epoch: 1 Batch: 19058/20099 (94.82%) Loss: 2.114843 LR: 0.00000625 +[20:51:30] Epoch: 1 Batch: 19059/20099 (94.83%) Loss: 1.873330 LR: 0.00000625 +[20:51:32] Epoch: 1 Batch: 19060/20099 (94.83%) Loss: 2.105238 LR: 0.00000625 +[20:51:33] Epoch: 1 Batch: 19061/20099 (94.84%) Loss: 2.132580 LR: 0.00000625 +[20:51:35] Epoch: 1 Batch: 19062/20099 (94.84%) Loss: 2.423683 LR: 0.00000625 +[20:51:37] Epoch: 1 Batch: 19063/20099 (94.85%) Loss: 1.946827 LR: 0.00000624 +[20:51:38] Epoch: 1 Batch: 19064/20099 (94.85%) Loss: 2.077487 LR: 0.00000624 +[20:51:40] Epoch: 1 Batch: 19065/20099 (94.86%) Loss: 1.968953 LR: 0.00000624 +[20:51:42] Epoch: 1 Batch: 19066/20099 (94.86%) Loss: 1.977991 LR: 0.00000624 +[20:51:43] Epoch: 1 Batch: 19067/20099 (94.87%) Loss: 2.096603 LR: 0.00000624 +[20:51:45] Epoch: 1 Batch: 19068/20099 (94.87%) Loss: 1.995829 LR: 0.00000624 +[20:51:47] Epoch: 1 Batch: 19069/20099 (94.88%) Loss: 2.165722 LR: 0.00000624 +[20:51:48] Epoch: 1 Batch: 19070/20099 (94.88%) Loss: 1.969299 LR: 0.00000624 +[20:51:50] Epoch: 1 Batch: 19071/20099 (94.89%) Loss: 1.821863 LR: 0.00000624 +[20:51:52] Epoch: 1 Batch: 19072/20099 (94.89%) Loss: 1.980339 LR: 0.00000624 +[20:51:54] Epoch: 1 Batch: 19073/20099 (94.90%) Loss: 2.093713 LR: 0.00000624 +[20:51:55] Epoch: 1 Batch: 19074/20099 
(94.90%) Loss: 1.747308 LR: 0.00000624 +[20:51:57] Epoch: 1 Batch: 19075/20099 (94.91%) Loss: 2.366522 LR: 0.00000624 +[20:51:59] Epoch: 1 Batch: 19076/20099 (94.91%) Loss: 1.737646 LR: 0.00000624 +[20:52:00] Epoch: 1 Batch: 19077/20099 (94.92%) Loss: 2.251599 LR: 0.00000624 +[20:52:02] Epoch: 1 Batch: 19078/20099 (94.92%) Loss: 2.161386 LR: 0.00000624 +[20:52:04] Epoch: 1 Batch: 19079/20099 (94.93%) Loss: 2.263873 LR: 0.00000624 +[20:52:06] Epoch: 1 Batch: 19080/20099 (94.93%) Loss: 2.019010 LR: 0.00000624 +[20:52:07] Epoch: 1 Batch: 19081/20099 (94.94%) Loss: 2.086088 LR: 0.00000624 +[20:52:09] Epoch: 1 Batch: 19082/20099 (94.94%) Loss: 2.288847 LR: 0.00000624 +[20:52:11] Epoch: 1 Batch: 19083/20099 (94.95%) Loss: 1.870238 LR: 0.00000624 +[20:52:12] Epoch: 1 Batch: 19084/20099 (94.95%) Loss: 2.354767 LR: 0.00000623 +[20:52:14] Epoch: 1 Batch: 19085/20099 (94.95%) Loss: 2.159588 LR: 0.00000623 +[20:52:16] Epoch: 1 Batch: 19086/20099 (94.96%) Loss: 2.087382 LR: 0.00000623 +[20:52:17] Epoch: 1 Batch: 19087/20099 (94.96%) Loss: 2.222804 LR: 0.00000623 +[20:52:19] Epoch: 1 Batch: 19088/20099 (94.97%) Loss: 2.145258 LR: 0.00000623 +[20:52:21] Epoch: 1 Batch: 19089/20099 (94.97%) Loss: 2.231341 LR: 0.00000623 +[20:52:23] Epoch: 1 Batch: 19090/20099 (94.98%) Loss: 2.151014 LR: 0.00000623 +[20:52:24] Epoch: 1 Batch: 19091/20099 (94.98%) Loss: 2.149684 LR: 0.00000623 +[20:52:26] Epoch: 1 Batch: 19092/20099 (94.99%) Loss: 2.225381 LR: 0.00000623 +[20:52:28] Epoch: 1 Batch: 19093/20099 (94.99%) Loss: 1.769606 LR: 0.00000623 +[20:52:29] Epoch: 1 Batch: 19094/20099 (95.00%) Loss: 1.890630 LR: 0.00000623 +[20:52:31] Epoch: 1 Batch: 19095/20099 (95.00%) Loss: 2.034448 LR: 0.00000623 +[20:52:33] Epoch: 1 Batch: 19096/20099 (95.01%) Loss: 2.114579 LR: 0.00000623 +[20:52:34] Epoch: 1 Batch: 19097/20099 (95.01%) Loss: 2.209268 LR: 0.00000623 +[20:52:36] Epoch: 1 Batch: 19098/20099 (95.02%) Loss: 2.254925 LR: 0.00000623 +[20:52:38] Epoch: 1 Batch: 19099/20099 (95.02%) Loss: 2.315436 
LR: 0.00000623 +[20:52:39] Epoch: 1 Batch: 19100/20099 (95.03%) Loss: 1.686467 LR: 0.00000623 +[20:52:41] Epoch: 1 Batch: 19101/20099 (95.03%) Loss: 2.270308 LR: 0.00000623 +[20:52:43] Epoch: 1 Batch: 19102/20099 (95.04%) Loss: 1.968389 LR: 0.00000623 +[20:52:45] Epoch: 1 Batch: 19103/20099 (95.04%) Loss: 1.994247 LR: 0.00000623 +[20:52:46] Epoch: 1 Batch: 19104/20099 (95.05%) Loss: 1.968544 LR: 0.00000623 +[20:52:48] Epoch: 1 Batch: 19105/20099 (95.05%) Loss: 2.068643 LR: 0.00000622 +[20:52:50] Epoch: 1 Batch: 19106/20099 (95.06%) Loss: 2.073535 LR: 0.00000622 +[20:52:51] Epoch: 1 Batch: 19107/20099 (95.06%) Loss: 1.955161 LR: 0.00000622 +[20:52:53] Epoch: 1 Batch: 19108/20099 (95.07%) Loss: 2.120068 LR: 0.00000622 +[20:52:55] Epoch: 1 Batch: 19109/20099 (95.07%) Loss: 1.983937 LR: 0.00000622 +[20:52:56] Epoch: 1 Batch: 19110/20099 (95.08%) Loss: 2.158452 LR: 0.00000622 +[20:52:58] Epoch: 1 Batch: 19111/20099 (95.08%) Loss: 2.097769 LR: 0.00000622 +[20:53:00] Epoch: 1 Batch: 19112/20099 (95.09%) Loss: 1.908696 LR: 0.00000622 +[20:53:01] Epoch: 1 Batch: 19113/20099 (95.09%) Loss: 2.033146 LR: 0.00000622 +[20:53:03] Epoch: 1 Batch: 19114/20099 (95.10%) Loss: 2.218906 LR: 0.00000622 +[20:53:05] Epoch: 1 Batch: 19115/20099 (95.10%) Loss: 2.161279 LR: 0.00000622 +[20:53:07] Epoch: 1 Batch: 19116/20099 (95.11%) Loss: 2.150926 LR: 0.00000622 +[20:53:08] Epoch: 1 Batch: 19117/20099 (95.11%) Loss: 2.138609 LR: 0.00000622 +[20:53:10] Epoch: 1 Batch: 19118/20099 (95.12%) Loss: 2.097715 LR: 0.00000622 +[20:53:12] Epoch: 1 Batch: 19119/20099 (95.12%) Loss: 2.417540 LR: 0.00000622 +[20:53:13] Epoch: 1 Batch: 19120/20099 (95.13%) Loss: 1.849390 LR: 0.00000622 +[20:53:15] Epoch: 1 Batch: 19121/20099 (95.13%) Loss: 2.056543 LR: 0.00000622 +[20:53:17] Epoch: 1 Batch: 19122/20099 (95.14%) Loss: 2.141981 LR: 0.00000622 +[20:53:18] Epoch: 1 Batch: 19123/20099 (95.14%) Loss: 1.924123 LR: 0.00000622 +[20:53:20] Epoch: 1 Batch: 19124/20099 (95.15%) Loss: 2.281999 LR: 0.00000622 
+[20:53:22] Epoch: 1 Batch: 19125/20099 (95.15%) Loss: 2.005001 LR: 0.00000622 +[20:53:23] Epoch: 1 Batch: 19126/20099 (95.16%) Loss: 1.968428 LR: 0.00000621 +[20:53:25] Epoch: 1 Batch: 19127/20099 (95.16%) Loss: 1.788190 LR: 0.00000621 +[20:53:27] Epoch: 1 Batch: 19128/20099 (95.17%) Loss: 2.302800 LR: 0.00000621 +[20:53:29] Epoch: 1 Batch: 19129/20099 (95.17%) Loss: 2.249286 LR: 0.00000621 +[20:53:30] Epoch: 1 Batch: 19130/20099 (95.18%) Loss: 2.052105 LR: 0.00000621 +[20:53:32] Epoch: 1 Batch: 19131/20099 (95.18%) Loss: 2.001174 LR: 0.00000621 +[20:53:34] Epoch: 1 Batch: 19132/20099 (95.19%) Loss: 1.953700 LR: 0.00000621 +[20:53:35] Epoch: 1 Batch: 19133/20099 (95.19%) Loss: 2.086836 LR: 0.00000621 +[20:53:37] Epoch: 1 Batch: 19134/20099 (95.20%) Loss: 2.452835 LR: 0.00000621 +[20:53:39] Epoch: 1 Batch: 19135/20099 (95.20%) Loss: 2.316084 LR: 0.00000621 +[20:53:40] Epoch: 1 Batch: 19136/20099 (95.21%) Loss: 1.945849 LR: 0.00000621 +[20:53:42] Epoch: 1 Batch: 19137/20099 (95.21%) Loss: 2.327879 LR: 0.00000621 +[20:53:44] Epoch: 1 Batch: 19138/20099 (95.22%) Loss: 2.295895 LR: 0.00000621 +[20:53:45] Epoch: 1 Batch: 19139/20099 (95.22%) Loss: 2.014965 LR: 0.00000621 +[20:53:47] Epoch: 1 Batch: 19140/20099 (95.23%) Loss: 2.146734 LR: 0.00000621 +[20:53:49] Epoch: 1 Batch: 19141/20099 (95.23%) Loss: 1.795921 LR: 0.00000621 +[20:53:51] Epoch: 1 Batch: 19142/20099 (95.24%) Loss: 2.182985 LR: 0.00000621 +[20:53:52] Epoch: 1 Batch: 19143/20099 (95.24%) Loss: 1.968587 LR: 0.00000621 +[20:53:54] Epoch: 1 Batch: 19144/20099 (95.25%) Loss: 1.981478 LR: 0.00000621 +[20:53:56] Epoch: 1 Batch: 19145/20099 (95.25%) Loss: 1.792846 LR: 0.00000621 +[20:53:57] Epoch: 1 Batch: 19146/20099 (95.26%) Loss: 1.831909 LR: 0.00000621 +[20:53:59] Epoch: 1 Batch: 19147/20099 (95.26%) Loss: 2.008496 LR: 0.00000621 +[20:54:01] Epoch: 1 Batch: 19148/20099 (95.27%) Loss: 1.771042 LR: 0.00000621 +[20:54:02] Epoch: 1 Batch: 19149/20099 (95.27%) Loss: 2.035701 LR: 0.00000621 +[20:54:04] Epoch: 1 
Batch: 19150/20099 (95.28%) Loss: 2.169387 LR: 0.00000621 +[20:54:06] Epoch: 1 Batch: 19151/20099 (95.28%) Loss: 1.612649 LR: 0.00000621 +[20:54:08] Epoch: 1 Batch: 19152/20099 (95.29%) Loss: 1.955926 LR: 0.00000621 +[20:54:09] Epoch: 1 Batch: 19153/20099 (95.29%) Loss: 1.834377 LR: 0.00000621 +[20:54:11] Epoch: 1 Batch: 19154/20099 (95.30%) Loss: 2.075702 LR: 0.00000620 +[20:54:13] Epoch: 1 Batch: 19155/20099 (95.30%) Loss: 2.035671 LR: 0.00000620 +[20:54:14] Epoch: 1 Batch: 19156/20099 (95.31%) Loss: 1.941365 LR: 0.00000620 +[20:54:16] Epoch: 1 Batch: 19157/20099 (95.31%) Loss: 1.987090 LR: 0.00000620 +[20:54:18] Epoch: 1 Batch: 19158/20099 (95.32%) Loss: 2.059231 LR: 0.00000620 +[20:54:19] Epoch: 1 Batch: 19159/20099 (95.32%) Loss: 2.076622 LR: 0.00000620 +[20:54:21] Epoch: 1 Batch: 19160/20099 (95.33%) Loss: 1.975419 LR: 0.00000620 +[20:54:23] Epoch: 1 Batch: 19161/20099 (95.33%) Loss: 1.929149 LR: 0.00000620 +[20:54:25] Epoch: 1 Batch: 19162/20099 (95.34%) Loss: 2.027461 LR: 0.00000620 +[20:54:26] Epoch: 1 Batch: 19163/20099 (95.34%) Loss: 1.998302 LR: 0.00000620 +[20:54:28] Epoch: 1 Batch: 19164/20099 (95.35%) Loss: 2.040854 LR: 0.00000620 +[20:54:30] Epoch: 1 Batch: 19165/20099 (95.35%) Loss: 2.074735 LR: 0.00000620 +[20:54:31] Epoch: 1 Batch: 19166/20099 (95.36%) Loss: 2.222879 LR: 0.00000620 +[20:54:33] Epoch: 1 Batch: 19167/20099 (95.36%) Loss: 1.950391 LR: 0.00000620 +[20:54:35] Epoch: 1 Batch: 19168/20099 (95.37%) Loss: 2.169336 LR: 0.00000620 +[20:54:36] Epoch: 1 Batch: 19169/20099 (95.37%) Loss: 2.268463 LR: 0.00000620 +[20:54:38] Epoch: 1 Batch: 19170/20099 (95.38%) Loss: 1.903302 LR: 0.00000620 +[20:54:40] Epoch: 1 Batch: 19171/20099 (95.38%) Loss: 2.200964 LR: 0.00000620 +[20:54:41] Epoch: 1 Batch: 19172/20099 (95.39%) Loss: 1.718770 LR: 0.00000620 +[20:54:43] Epoch: 1 Batch: 19173/20099 (95.39%) Loss: 2.163008 LR: 0.00000620 +[20:54:45] Epoch: 1 Batch: 19174/20099 (95.40%) Loss: 2.153473 LR: 0.00000620 +[20:54:47] Epoch: 1 Batch: 19175/20099 
(95.40%) Loss: 2.190738 LR: 0.00000619 +[20:54:48] Epoch: 1 Batch: 19176/20099 (95.41%) Loss: 1.991787 LR: 0.00000619 +[20:54:50] Epoch: 1 Batch: 19177/20099 (95.41%) Loss: 2.105792 LR: 0.00000619 +[20:54:52] Epoch: 1 Batch: 19178/20099 (95.42%) Loss: 2.050585 LR: 0.00000619 +[20:54:53] Epoch: 1 Batch: 19179/20099 (95.42%) Loss: 2.012550 LR: 0.00000619 +[20:54:55] Epoch: 1 Batch: 19180/20099 (95.43%) Loss: 2.289478 LR: 0.00000619 +[20:54:57] Epoch: 1 Batch: 19181/20099 (95.43%) Loss: 2.000117 LR: 0.00000619 +[20:54:58] Epoch: 1 Batch: 19182/20099 (95.44%) Loss: 2.192591 LR: 0.00000619 +[20:55:00] Epoch: 1 Batch: 19183/20099 (95.44%) Loss: 2.047304 LR: 0.00000619 +[20:55:02] Epoch: 1 Batch: 19184/20099 (95.45%) Loss: 1.916289 LR: 0.00000619 +[20:55:04] Epoch: 1 Batch: 19185/20099 (95.45%) Loss: 1.935992 LR: 0.00000619 +[20:55:05] Epoch: 1 Batch: 19186/20099 (95.46%) Loss: 2.320133 LR: 0.00000619 +[20:55:07] Epoch: 1 Batch: 19187/20099 (95.46%) Loss: 2.155389 LR: 0.00000619 +[20:55:09] Epoch: 1 Batch: 19188/20099 (95.47%) Loss: 2.032508 LR: 0.00000619 +[20:55:10] Epoch: 1 Batch: 19189/20099 (95.47%) Loss: 1.975265 LR: 0.00000619 +[20:55:12] Epoch: 1 Batch: 19190/20099 (95.48%) Loss: 2.061513 LR: 0.00000619 +[20:55:14] Epoch: 1 Batch: 19191/20099 (95.48%) Loss: 1.848305 LR: 0.00000619 +[20:55:15] Epoch: 1 Batch: 19192/20099 (95.49%) Loss: 1.694508 LR: 0.00000619 +[20:55:17] Epoch: 1 Batch: 19193/20099 (95.49%) Loss: 2.009422 LR: 0.00000619 +[20:55:19] Epoch: 1 Batch: 19194/20099 (95.50%) Loss: 2.153341 LR: 0.00000619 +[20:55:21] Epoch: 1 Batch: 19195/20099 (95.50%) Loss: 2.206749 LR: 0.00000619 +[20:55:22] Epoch: 1 Batch: 19196/20099 (95.51%) Loss: 2.221833 LR: 0.00000619 +[20:55:24] Epoch: 1 Batch: 19197/20099 (95.51%) Loss: 2.147700 LR: 0.00000619 +[20:55:26] Epoch: 1 Batch: 19198/20099 (95.52%) Loss: 2.150656 LR: 0.00000619 +[20:55:27] Epoch: 1 Batch: 19199/20099 (95.52%) Loss: 1.865967 LR: 0.00000619 +[20:56:01] >> Cleaned up old temp checkpoint: epoch1_step17600 
+[20:56:01] >> Temp checkpoint saved: epoch1_step19200, size: 0.1693 GB +[20:56:01] Epoch: 1 Batch: 19200/20099 (95.53%) Loss: 2.142875 LR: 0.00000619 +[20:56:02] Epoch: 1 Batch: 19201/20099 (95.53%) Loss: 2.152092 LR: 0.00000619 +[20:56:04] Epoch: 1 Batch: 19202/20099 (95.54%) Loss: 2.140530 LR: 0.00000619 +[20:56:06] Epoch: 1 Batch: 19203/20099 (95.54%) Loss: 2.072032 LR: 0.00000618 +[20:56:07] Epoch: 1 Batch: 19204/20099 (95.55%) Loss: 1.984781 LR: 0.00000618 +[20:56:09] Epoch: 1 Batch: 19205/20099 (95.55%) Loss: 1.979917 LR: 0.00000618 +[20:56:13] Epoch: 1 Batch: 19206/20099 (95.56%) Loss: 2.151002 LR: 0.00000618 +[20:56:14] Epoch: 1 Batch: 19207/20099 (95.56%) Loss: 2.552400 LR: 0.00000618 +[20:56:16] Epoch: 1 Batch: 19208/20099 (95.57%) Loss: 2.114375 LR: 0.00000618 +[20:56:18] Epoch: 1 Batch: 19209/20099 (95.57%) Loss: 2.299017 LR: 0.00000618 +[20:56:19] Epoch: 1 Batch: 19210/20099 (95.58%) Loss: 2.098571 LR: 0.00000618 +[20:56:21] Epoch: 1 Batch: 19211/20099 (95.58%) Loss: 2.009327 LR: 0.00000618 +[20:56:23] Epoch: 1 Batch: 19212/20099 (95.59%) Loss: 1.907591 LR: 0.00000618 +[20:56:25] Epoch: 1 Batch: 19213/20099 (95.59%) Loss: 2.253966 LR: 0.00000618 +[20:56:26] Epoch: 1 Batch: 19214/20099 (95.60%) Loss: 2.149771 LR: 0.00000618 +[20:56:28] Epoch: 1 Batch: 19215/20099 (95.60%) Loss: 2.279134 LR: 0.00000618 +[20:56:30] Epoch: 1 Batch: 19216/20099 (95.61%) Loss: 2.113948 LR: 0.00000618 +[20:56:32] Epoch: 1 Batch: 19217/20099 (95.61%) Loss: 2.117103 LR: 0.00000618 +[20:56:33] Epoch: 1 Batch: 19218/20099 (95.62%) Loss: 2.296872 LR: 0.00000618 +[20:56:35] Epoch: 1 Batch: 19219/20099 (95.62%) Loss: 2.284608 LR: 0.00000618 +[20:56:37] Epoch: 1 Batch: 19220/20099 (95.63%) Loss: 2.141250 LR: 0.00000618 +[20:56:39] Epoch: 1 Batch: 19221/20099 (95.63%) Loss: 2.107011 LR: 0.00000618 +[20:56:40] Epoch: 1 Batch: 19222/20099 (95.64%) Loss: 2.347450 LR: 0.00000618 +[20:56:42] Epoch: 1 Batch: 19223/20099 (95.64%) Loss: 1.942029 LR: 0.00000618 +[20:56:44] Epoch: 1 Batch: 
19224/20099 (95.65%) Loss: 1.978559 LR: 0.00000617 +[20:56:46] Epoch: 1 Batch: 19225/20099 (95.65%) Loss: 2.072915 LR: 0.00000617 +[20:56:47] Epoch: 1 Batch: 19226/20099 (95.66%) Loss: 2.084349 LR: 0.00000617 +[20:56:49] Epoch: 1 Batch: 19227/20099 (95.66%) Loss: 2.176692 LR: 0.00000617 +[20:56:51] Epoch: 1 Batch: 19228/20099 (95.67%) Loss: 2.179019 LR: 0.00000617 +[20:56:53] Epoch: 1 Batch: 19229/20099 (95.67%) Loss: 1.923511 LR: 0.00000617 +[20:56:54] Epoch: 1 Batch: 19230/20099 (95.68%) Loss: 2.132443 LR: 0.00000617 +[20:56:56] Epoch: 1 Batch: 19231/20099 (95.68%) Loss: 2.086932 LR: 0.00000617 +[20:56:58] Epoch: 1 Batch: 19232/20099 (95.69%) Loss: 1.977108 LR: 0.00000617 +[20:56:59] Epoch: 1 Batch: 19233/20099 (95.69%) Loss: 1.994963 LR: 0.00000617 +[20:57:01] Epoch: 1 Batch: 19234/20099 (95.70%) Loss: 2.359763 LR: 0.00000617 +[20:57:03] Epoch: 1 Batch: 19235/20099 (95.70%) Loss: 2.006109 LR: 0.00000617 +[20:57:05] Epoch: 1 Batch: 19236/20099 (95.71%) Loss: 2.083440 LR: 0.00000617 +[20:57:06] Epoch: 1 Batch: 19237/20099 (95.71%) Loss: 2.103277 LR: 0.00000617 +[20:57:08] Epoch: 1 Batch: 19238/20099 (95.72%) Loss: 1.896535 LR: 0.00000617 +[20:57:10] Epoch: 1 Batch: 19239/20099 (95.72%) Loss: 1.865528 LR: 0.00000617 +[20:57:11] Epoch: 1 Batch: 19240/20099 (95.73%) Loss: 1.888546 LR: 0.00000617 +[20:57:13] Epoch: 1 Batch: 19241/20099 (95.73%) Loss: 2.211688 LR: 0.00000617 +[20:57:15] Epoch: 1 Batch: 19242/20099 (95.74%) Loss: 2.278568 LR: 0.00000617 +[20:57:16] Epoch: 1 Batch: 19243/20099 (95.74%) Loss: 2.366391 LR: 0.00000617 +[20:57:18] Epoch: 1 Batch: 19244/20099 (95.75%) Loss: 2.212700 LR: 0.00000617 +[20:57:20] Epoch: 1 Batch: 19245/20099 (95.75%) Loss: 1.944273 LR: 0.00000617 +[20:57:21] Epoch: 1 Batch: 19246/20099 (95.76%) Loss: 2.041165 LR: 0.00000617 +[20:57:23] Epoch: 1 Batch: 19247/20099 (95.76%) Loss: 2.191855 LR: 0.00000617 +[20:57:25] Epoch: 1 Batch: 19248/20099 (95.77%) Loss: 2.144728 LR: 0.00000617 +[20:57:26] Epoch: 1 Batch: 19249/20099 (95.77%) 
Loss: 2.210048 LR: 0.00000617 +[20:57:28] Epoch: 1 Batch: 19250/20099 (95.78%) Loss: 2.424622 LR: 0.00000617 +[20:57:30] Epoch: 1 Batch: 19251/20099 (95.78%) Loss: 2.307872 LR: 0.00000617 +[20:57:31] Epoch: 1 Batch: 19252/20099 (95.79%) Loss: 2.268333 LR: 0.00000616 +[20:57:33] Epoch: 1 Batch: 19253/20099 (95.79%) Loss: 2.144069 LR: 0.00000616 +[20:57:35] Epoch: 1 Batch: 19254/20099 (95.80%) Loss: 2.063195 LR: 0.00000616 +[20:57:36] Epoch: 1 Batch: 19255/20099 (95.80%) Loss: 2.288700 LR: 0.00000616 +[20:57:38] Epoch: 1 Batch: 19256/20099 (95.81%) Loss: 1.901940 LR: 0.00000616 +[20:57:40] Epoch: 1 Batch: 19257/20099 (95.81%) Loss: 1.972109 LR: 0.00000616 +[20:57:41] Epoch: 1 Batch: 19258/20099 (95.82%) Loss: 1.897694 LR: 0.00000616 +[20:57:43] Epoch: 1 Batch: 19259/20099 (95.82%) Loss: 2.158423 LR: 0.00000616 +[20:57:45] Epoch: 1 Batch: 19260/20099 (95.83%) Loss: 2.009321 LR: 0.00000616 +[20:57:46] Epoch: 1 Batch: 19261/20099 (95.83%) Loss: 1.939477 LR: 0.00000616 +[20:57:48] Epoch: 1 Batch: 19262/20099 (95.84%) Loss: 2.358209 LR: 0.00000616 +[20:57:50] Epoch: 1 Batch: 19263/20099 (95.84%) Loss: 2.007946 LR: 0.00000616 +[20:57:51] Epoch: 1 Batch: 19264/20099 (95.85%) Loss: 2.058913 LR: 0.00000616 +[20:57:53] Epoch: 1 Batch: 19265/20099 (95.85%) Loss: 2.212206 LR: 0.00000616 +[20:57:55] Epoch: 1 Batch: 19266/20099 (95.86%) Loss: 2.173413 LR: 0.00000616 +[20:57:57] Epoch: 1 Batch: 19267/20099 (95.86%) Loss: 1.998665 LR: 0.00000616 +[20:57:58] Epoch: 1 Batch: 19268/20099 (95.87%) Loss: 2.045244 LR: 0.00000616 +[20:58:00] Epoch: 1 Batch: 19269/20099 (95.87%) Loss: 2.134426 LR: 0.00000616 +[20:58:02] Epoch: 1 Batch: 19270/20099 (95.88%) Loss: 2.253838 LR: 0.00000616 +[20:58:03] Epoch: 1 Batch: 19271/20099 (95.88%) Loss: 2.073817 LR: 0.00000616 +[20:58:05] Epoch: 1 Batch: 19272/20099 (95.89%) Loss: 1.851133 LR: 0.00000616 +[20:58:07] Epoch: 1 Batch: 19273/20099 (95.89%) Loss: 2.171045 LR: 0.00000616 +[20:58:08] Epoch: 1 Batch: 19274/20099 (95.90%) Loss: 1.961958 LR: 
0.00000616 +[20:58:10] Epoch: 1 Batch: 19275/20099 (95.90%) Loss: 1.956166 LR: 0.00000616 +[20:58:12] Epoch: 1 Batch: 19276/20099 (95.91%) Loss: 1.917084 LR: 0.00000616 +[20:58:14] Epoch: 1 Batch: 19277/20099 (95.91%) Loss: 2.122412 LR: 0.00000616 +[20:58:15] Epoch: 1 Batch: 19278/20099 (95.92%) Loss: 2.005834 LR: 0.00000616 +[20:58:17] Epoch: 1 Batch: 19279/20099 (95.92%) Loss: 2.346806 LR: 0.00000616 +[20:58:19] Epoch: 1 Batch: 19280/20099 (95.93%) Loss: 2.100619 LR: 0.00000615 +[20:58:20] Epoch: 1 Batch: 19281/20099 (95.93%) Loss: 2.253731 LR: 0.00000615 +[20:58:22] Epoch: 1 Batch: 19282/20099 (95.94%) Loss: 1.946220 LR: 0.00000615 +[20:58:24] Epoch: 1 Batch: 19283/20099 (95.94%) Loss: 2.240571 LR: 0.00000615 +[20:58:26] Epoch: 1 Batch: 19284/20099 (95.95%) Loss: 1.921088 LR: 0.00000615 +[20:58:27] Epoch: 1 Batch: 19285/20099 (95.95%) Loss: 2.157561 LR: 0.00000615 +[20:58:29] Epoch: 1 Batch: 19286/20099 (95.96%) Loss: 1.897148 LR: 0.00000615 +[20:58:31] Epoch: 1 Batch: 19287/20099 (95.96%) Loss: 1.748106 LR: 0.00000615 +[20:58:32] Epoch: 1 Batch: 19288/20099 (95.96%) Loss: 2.034826 LR: 0.00000615 +[20:58:34] Epoch: 1 Batch: 19289/20099 (95.97%) Loss: 1.990942 LR: 0.00000615 +[20:58:36] Epoch: 1 Batch: 19290/20099 (95.97%) Loss: 2.399631 LR: 0.00000615 +[20:58:37] Epoch: 1 Batch: 19291/20099 (95.98%) Loss: 2.035029 LR: 0.00000615 +[20:58:39] Epoch: 1 Batch: 19292/20099 (95.98%) Loss: 1.952772 LR: 0.00000615 +[20:58:41] Epoch: 1 Batch: 19293/20099 (95.99%) Loss: 2.066490 LR: 0.00000615 +[20:58:43] Epoch: 1 Batch: 19294/20099 (95.99%) Loss: 1.847305 LR: 0.00000615 +[20:58:44] Epoch: 1 Batch: 19295/20099 (96.00%) Loss: 2.187312 LR: 0.00000615 +[20:58:46] Epoch: 1 Batch: 19296/20099 (96.00%) Loss: 2.249157 LR: 0.00000615 +[20:58:48] Epoch: 1 Batch: 19297/20099 (96.01%) Loss: 2.504088 LR: 0.00000615 +[20:58:49] Epoch: 1 Batch: 19298/20099 (96.01%) Loss: 1.829207 LR: 0.00000615 +[20:58:51] Epoch: 1 Batch: 19299/20099 (96.02%) Loss: 1.963589 LR: 0.00000615 +[20:58:53] 
Epoch: 1 Batch: 19300/20099 (96.02%) Loss: 2.047724 LR: 0.00000615 +[20:58:54] Epoch: 1 Batch: 19301/20099 (96.03%) Loss: 2.120921 LR: 0.00000615 +[20:58:56] Epoch: 1 Batch: 19302/20099 (96.03%) Loss: 2.004623 LR: 0.00000615 +[20:58:58] Epoch: 1 Batch: 19303/20099 (96.04%) Loss: 2.120356 LR: 0.00000615 +[20:58:59] Epoch: 1 Batch: 19304/20099 (96.04%) Loss: 2.324379 LR: 0.00000615 +[20:59:01] Epoch: 1 Batch: 19305/20099 (96.05%) Loss: 2.088772 LR: 0.00000615 +[20:59:03] Epoch: 1 Batch: 19306/20099 (96.05%) Loss: 2.235310 LR: 0.00000615 +[20:59:04] Epoch: 1 Batch: 19307/20099 (96.06%) Loss: 2.132212 LR: 0.00000615 +[20:59:06] Epoch: 1 Batch: 19308/20099 (96.06%) Loss: 1.994984 LR: 0.00000614 +[20:59:08] Epoch: 1 Batch: 19309/20099 (96.07%) Loss: 2.398373 LR: 0.00000614 +[20:59:10] Epoch: 1 Batch: 19310/20099 (96.07%) Loss: 2.033854 LR: 0.00000614 +[20:59:11] Epoch: 1 Batch: 19311/20099 (96.08%) Loss: 2.002549 LR: 0.00000614 +[20:59:13] Epoch: 1 Batch: 19312/20099 (96.08%) Loss: 2.526049 LR: 0.00000614 +[20:59:15] Epoch: 1 Batch: 19313/20099 (96.09%) Loss: 2.100870 LR: 0.00000614 +[20:59:16] Epoch: 1 Batch: 19314/20099 (96.09%) Loss: 2.079069 LR: 0.00000614 +[20:59:18] Epoch: 1 Batch: 19315/20099 (96.10%) Loss: 2.153771 LR: 0.00000614 +[20:59:20] Epoch: 1 Batch: 19316/20099 (96.10%) Loss: 2.083469 LR: 0.00000614 +[20:59:21] Epoch: 1 Batch: 19317/20099 (96.11%) Loss: 1.919357 LR: 0.00000614 +[20:59:23] Epoch: 1 Batch: 19318/20099 (96.11%) Loss: 2.351133 LR: 0.00000614 +[20:59:25] Epoch: 1 Batch: 19319/20099 (96.12%) Loss: 2.002998 LR: 0.00000614 +[20:59:26] Epoch: 1 Batch: 19320/20099 (96.12%) Loss: 1.999341 LR: 0.00000614 +[20:59:28] Epoch: 1 Batch: 19321/20099 (96.13%) Loss: 1.955538 LR: 0.00000614 +[20:59:30] Epoch: 1 Batch: 19322/20099 (96.13%) Loss: 2.167962 LR: 0.00000614 +[20:59:31] Epoch: 1 Batch: 19323/20099 (96.14%) Loss: 1.948067 LR: 0.00000614 +[20:59:33] Epoch: 1 Batch: 19324/20099 (96.14%) Loss: 2.053443 LR: 0.00000614 +[20:59:35] Epoch: 1 Batch: 
19325/20099 (96.15%) Loss: 1.994024 LR: 0.00000614 +[20:59:37] Epoch: 1 Batch: 19326/20099 (96.15%) Loss: 2.267633 LR: 0.00000614 +[20:59:38] Epoch: 1 Batch: 19327/20099 (96.16%) Loss: 1.964378 LR: 0.00000614 +[20:59:40] Epoch: 1 Batch: 19328/20099 (96.16%) Loss: 2.074259 LR: 0.00000614 +[20:59:42] Epoch: 1 Batch: 19329/20099 (96.17%) Loss: 2.180869 LR: 0.00000614 +[20:59:43] Epoch: 1 Batch: 19330/20099 (96.17%) Loss: 2.047415 LR: 0.00000614 +[20:59:45] Epoch: 1 Batch: 19331/20099 (96.18%) Loss: 2.065015 LR: 0.00000614 +[20:59:47] Epoch: 1 Batch: 19332/20099 (96.18%) Loss: 2.287655 LR: 0.00000614 +[20:59:48] Epoch: 1 Batch: 19333/20099 (96.19%) Loss: 1.989528 LR: 0.00000614 +[20:59:50] Epoch: 1 Batch: 19334/20099 (96.19%) Loss: 2.117703 LR: 0.00000614 +[20:59:52] Epoch: 1 Batch: 19335/20099 (96.20%) Loss: 2.298967 LR: 0.00000614 +[20:59:54] Epoch: 1 Batch: 19336/20099 (96.20%) Loss: 1.903924 LR: 0.00000613 +[20:59:55] Epoch: 1 Batch: 19337/20099 (96.21%) Loss: 2.048628 LR: 0.00000613 +[20:59:57] Epoch: 1 Batch: 19338/20099 (96.21%) Loss: 2.127208 LR: 0.00000613 +[20:59:59] Epoch: 1 Batch: 19339/20099 (96.22%) Loss: 1.658756 LR: 0.00000613 +[21:00:00] Epoch: 1 Batch: 19340/20099 (96.22%) Loss: 2.150151 LR: 0.00000613 +[21:00:02] Epoch: 1 Batch: 19341/20099 (96.23%) Loss: 2.042761 LR: 0.00000613 +[21:00:04] Epoch: 1 Batch: 19342/20099 (96.23%) Loss: 2.015319 LR: 0.00000613 +[21:00:05] Epoch: 1 Batch: 19343/20099 (96.24%) Loss: 2.097989 LR: 0.00000613 +[21:00:07] Epoch: 1 Batch: 19344/20099 (96.24%) Loss: 2.220733 LR: 0.00000613 +[21:00:09] Epoch: 1 Batch: 19345/20099 (96.25%) Loss: 1.554869 LR: 0.00000613 +[21:00:11] Epoch: 1 Batch: 19346/20099 (96.25%) Loss: 2.061192 LR: 0.00000613 +[21:00:12] Epoch: 1 Batch: 19347/20099 (96.26%) Loss: 2.108182 LR: 0.00000613 +[21:00:14] Epoch: 1 Batch: 19348/20099 (96.26%) Loss: 1.979931 LR: 0.00000613 +[21:00:16] Epoch: 1 Batch: 19349/20099 (96.27%) Loss: 2.162380 LR: 0.00000613 +[21:00:17] Epoch: 1 Batch: 19350/20099 (96.27%) 
Loss: 1.940903 LR: 0.00000613 +[21:00:19] Epoch: 1 Batch: 19351/20099 (96.28%) Loss: 2.282442 LR: 0.00000613 +[21:00:21] Epoch: 1 Batch: 19352/20099 (96.28%) Loss: 2.060704 LR: 0.00000613 +[21:00:22] Epoch: 1 Batch: 19353/20099 (96.29%) Loss: 2.369459 LR: 0.00000613 +[21:00:24] Epoch: 1 Batch: 19354/20099 (96.29%) Loss: 2.137497 LR: 0.00000613 +[21:00:26] Epoch: 1 Batch: 19355/20099 (96.30%) Loss: 1.979867 LR: 0.00000613 +[21:00:27] Epoch: 1 Batch: 19356/20099 (96.30%) Loss: 2.156448 LR: 0.00000613 +[21:00:29] Epoch: 1 Batch: 19357/20099 (96.31%) Loss: 1.996593 LR: 0.00000613 +[21:00:31] Epoch: 1 Batch: 19358/20099 (96.31%) Loss: 2.260717 LR: 0.00000613 +[21:00:33] Epoch: 1 Batch: 19359/20099 (96.32%) Loss: 2.114215 LR: 0.00000613 +[21:00:34] Epoch: 1 Batch: 19360/20099 (96.32%) Loss: 1.845825 LR: 0.00000613 +[21:00:36] Epoch: 1 Batch: 19361/20099 (96.33%) Loss: 2.307250 LR: 0.00000613 +[21:00:38] Epoch: 1 Batch: 19362/20099 (96.33%) Loss: 1.844648 LR: 0.00000613 +[21:00:39] Epoch: 1 Batch: 19363/20099 (96.34%) Loss: 2.248711 LR: 0.00000613 +[21:00:41] Epoch: 1 Batch: 19364/20099 (96.34%) Loss: 2.023023 LR: 0.00000612 +[21:00:43] Epoch: 1 Batch: 19365/20099 (96.35%) Loss: 2.043382 LR: 0.00000612 +[21:00:45] Epoch: 1 Batch: 19366/20099 (96.35%) Loss: 2.246357 LR: 0.00000612 +[21:00:46] Epoch: 1 Batch: 19367/20099 (96.36%) Loss: 1.808795 LR: 0.00000612 +[21:00:48] Epoch: 1 Batch: 19368/20099 (96.36%) Loss: 2.152886 LR: 0.00000612 +[21:00:50] Epoch: 1 Batch: 19369/20099 (96.37%) Loss: 2.042317 LR: 0.00000612 +[21:00:51] Epoch: 1 Batch: 19370/20099 (96.37%) Loss: 2.341267 LR: 0.00000612 +[21:00:53] Epoch: 1 Batch: 19371/20099 (96.38%) Loss: 1.901681 LR: 0.00000612 +[21:00:55] Epoch: 1 Batch: 19372/20099 (96.38%) Loss: 2.290436 LR: 0.00000612 +[21:00:56] Epoch: 1 Batch: 19373/20099 (96.39%) Loss: 2.157007 LR: 0.00000612 +[21:00:58] Epoch: 1 Batch: 19374/20099 (96.39%) Loss: 1.990457 LR: 0.00000612 +[21:01:00] Epoch: 1 Batch: 19375/20099 (96.40%) Loss: 1.900661 LR: 
0.00000612 +[21:01:02] Epoch: 1 Batch: 19376/20099 (96.40%) Loss: 1.762744 LR: 0.00000612 +[21:01:03] Epoch: 1 Batch: 19377/20099 (96.41%) Loss: 1.808325 LR: 0.00000612 +[21:01:05] Epoch: 1 Batch: 19378/20099 (96.41%) Loss: 1.975487 LR: 0.00000612 +[21:01:07] Epoch: 1 Batch: 19379/20099 (96.42%) Loss: 1.929963 LR: 0.00000612 +[21:01:08] Epoch: 1 Batch: 19380/20099 (96.42%) Loss: 2.183448 LR: 0.00000612 +[21:01:10] Epoch: 1 Batch: 19381/20099 (96.43%) Loss: 1.944941 LR: 0.00000612 +[21:01:12] Epoch: 1 Batch: 19382/20099 (96.43%) Loss: 1.996176 LR: 0.00000612 +[21:01:13] Epoch: 1 Batch: 19383/20099 (96.44%) Loss: 2.188856 LR: 0.00000612 +[21:01:15] Epoch: 1 Batch: 19384/20099 (96.44%) Loss: 2.023151 LR: 0.00000612 +[21:01:17] Epoch: 1 Batch: 19385/20099 (96.45%) Loss: 2.136082 LR: 0.00000612 +[21:01:19] Epoch: 1 Batch: 19386/20099 (96.45%) Loss: 1.696579 LR: 0.00000612 +[21:01:20] Epoch: 1 Batch: 19387/20099 (96.46%) Loss: 2.007573 LR: 0.00000612 +[21:01:22] Epoch: 1 Batch: 19388/20099 (96.46%) Loss: 2.065821 LR: 0.00000612 +[21:01:24] Epoch: 1 Batch: 19389/20099 (96.47%) Loss: 1.703727 LR: 0.00000612 +[21:01:25] Epoch: 1 Batch: 19390/20099 (96.47%) Loss: 2.011709 LR: 0.00000612 +[21:01:27] Epoch: 1 Batch: 19391/20099 (96.48%) Loss: 2.194217 LR: 0.00000612 +[21:01:29] Epoch: 1 Batch: 19392/20099 (96.48%) Loss: 2.177925 LR: 0.00000611 +[21:01:30] Epoch: 1 Batch: 19393/20099 (96.49%) Loss: 2.240837 LR: 0.00000611 +[21:01:32] Epoch: 1 Batch: 19394/20099 (96.49%) Loss: 2.036586 LR: 0.00000611 +[21:01:34] Epoch: 1 Batch: 19395/20099 (96.50%) Loss: 1.963877 LR: 0.00000611 +[21:01:35] Epoch: 1 Batch: 19396/20099 (96.50%) Loss: 2.052092 LR: 0.00000611 +[21:01:37] Epoch: 1 Batch: 19397/20099 (96.51%) Loss: 2.076968 LR: 0.00000611 +[21:01:39] Epoch: 1 Batch: 19398/20099 (96.51%) Loss: 1.817231 LR: 0.00000611 +[21:01:41] Epoch: 1 Batch: 19399/20099 (96.52%) Loss: 2.123400 LR: 0.00000611 +[21:02:14] >> Temp checkpoint saved: epoch1_step19400, size: 0.1693 GB +[21:02:14] Epoch: 1 
Batch: 19400/20099 (96.52%) Loss: 1.924161 LR: 0.00000611 +[21:02:16] Epoch: 1 Batch: 19401/20099 (96.53%) Loss: 1.918088 LR: 0.00000611 +[21:02:17] Epoch: 1 Batch: 19402/20099 (96.53%) Loss: 2.260812 LR: 0.00000611 +[21:02:19] Epoch: 1 Batch: 19403/20099 (96.54%) Loss: 2.193776 LR: 0.00000611 +[21:02:21] Epoch: 1 Batch: 19404/20099 (96.54%) Loss: 2.020071 LR: 0.00000611 +[21:02:22] Epoch: 1 Batch: 19405/20099 (96.55%) Loss: 1.868532 LR: 0.00000611 +[21:02:24] Epoch: 1 Batch: 19406/20099 (96.55%) Loss: 1.988458 LR: 0.00000611 +[21:02:28] Epoch: 1 Batch: 19407/20099 (96.56%) Loss: 2.079351 LR: 0.00000611 +[21:02:29] Epoch: 1 Batch: 19408/20099 (96.56%) Loss: 2.015919 LR: 0.00000611 +[21:02:31] Epoch: 1 Batch: 19409/20099 (96.57%) Loss: 2.139397 LR: 0.00000611 +[21:02:33] Epoch: 1 Batch: 19410/20099 (96.57%) Loss: 2.206385 LR: 0.00000611 +[21:02:35] Epoch: 1 Batch: 19411/20099 (96.58%) Loss: 2.133883 LR: 0.00000611 +[21:02:36] Epoch: 1 Batch: 19412/20099 (96.58%) Loss: 1.984125 LR: 0.00000611 +[21:02:38] Epoch: 1 Batch: 19413/20099 (96.59%) Loss: 2.020163 LR: 0.00000611 +[21:02:40] Epoch: 1 Batch: 19414/20099 (96.59%) Loss: 2.087989 LR: 0.00000611 +[21:02:41] Epoch: 1 Batch: 19415/20099 (96.60%) Loss: 1.952047 LR: 0.00000611 +[21:02:43] Epoch: 1 Batch: 19416/20099 (96.60%) Loss: 2.104268 LR: 0.00000611 +[21:02:45] Epoch: 1 Batch: 19417/20099 (96.61%) Loss: 2.183434 LR: 0.00000611 +[21:02:47] Epoch: 1 Batch: 19418/20099 (96.61%) Loss: 2.168390 LR: 0.00000611 +[21:02:49] Epoch: 1 Batch: 19419/20099 (96.62%) Loss: 1.809287 LR: 0.00000611 +[21:02:50] Epoch: 1 Batch: 19420/20099 (96.62%) Loss: 2.158170 LR: 0.00000611 +[21:02:52] Epoch: 1 Batch: 19421/20099 (96.63%) Loss: 2.462716 LR: 0.00000611 +[21:02:54] Epoch: 1 Batch: 19422/20099 (96.63%) Loss: 2.324638 LR: 0.00000611 +[21:02:56] Epoch: 1 Batch: 19423/20099 (96.64%) Loss: 2.118031 LR: 0.00000611 +[21:02:57] Epoch: 1 Batch: 19424/20099 (96.64%) Loss: 2.003569 LR: 0.00000611 +[21:02:59] Epoch: 1 Batch: 19425/20099 
(96.65%) Loss: 1.966407 LR: 0.00000611 +[21:03:01] Epoch: 1 Batch: 19426/20099 (96.65%) Loss: 1.801728 LR: 0.00000611 +[21:03:03] Epoch: 1 Batch: 19427/20099 (96.66%) Loss: 2.171199 LR: 0.00000610 +[21:03:04] Epoch: 1 Batch: 19428/20099 (96.66%) Loss: 1.966954 LR: 0.00000610 +[21:03:06] Epoch: 1 Batch: 19429/20099 (96.67%) Loss: 2.069332 LR: 0.00000610 +[21:03:08] Epoch: 1 Batch: 19430/20099 (96.67%) Loss: 2.282226 LR: 0.00000610 +[21:03:09] Epoch: 1 Batch: 19431/20099 (96.68%) Loss: 2.108295 LR: 0.00000610 +[21:03:11] Epoch: 1 Batch: 19432/20099 (96.68%) Loss: 2.206185 LR: 0.00000610 +[21:03:13] Epoch: 1 Batch: 19433/20099 (96.69%) Loss: 1.853494 LR: 0.00000610 +[21:03:15] Epoch: 1 Batch: 19434/20099 (96.69%) Loss: 2.217936 LR: 0.00000610 +[21:03:16] Epoch: 1 Batch: 19435/20099 (96.70%) Loss: 1.800403 LR: 0.00000610 +[21:03:18] Epoch: 1 Batch: 19436/20099 (96.70%) Loss: 2.313794 LR: 0.00000610 +[21:03:20] Epoch: 1 Batch: 19437/20099 (96.71%) Loss: 2.067330 LR: 0.00000610 +[21:03:21] Epoch: 1 Batch: 19438/20099 (96.71%) Loss: 2.140300 LR: 0.00000610 +[21:03:23] Epoch: 1 Batch: 19439/20099 (96.72%) Loss: 2.192342 LR: 0.00000610 +[21:03:25] Epoch: 1 Batch: 19440/20099 (96.72%) Loss: 2.201669 LR: 0.00000610 +[21:03:26] Epoch: 1 Batch: 19441/20099 (96.73%) Loss: 2.007512 LR: 0.00000610 +[21:03:28] Epoch: 1 Batch: 19442/20099 (96.73%) Loss: 1.690128 LR: 0.00000610 +[21:03:30] Epoch: 1 Batch: 19443/20099 (96.74%) Loss: 1.790415 LR: 0.00000610 +[21:03:31] Epoch: 1 Batch: 19444/20099 (96.74%) Loss: 2.098956 LR: 0.00000610 +[21:03:33] Epoch: 1 Batch: 19445/20099 (96.75%) Loss: 1.904441 LR: 0.00000610 +[21:03:35] Epoch: 1 Batch: 19446/20099 (96.75%) Loss: 1.975704 LR: 0.00000610 +[21:03:36] Epoch: 1 Batch: 19447/20099 (96.76%) Loss: 1.869539 LR: 0.00000610 +[21:03:38] Epoch: 1 Batch: 19448/20099 (96.76%) Loss: 2.298446 LR: 0.00000610 +[21:03:40] Epoch: 1 Batch: 19449/20099 (96.77%) Loss: 2.042427 LR: 0.00000610 +[21:03:41] Epoch: 1 Batch: 19450/20099 (96.77%) Loss: 2.165522 
LR: 0.00000610 +[21:03:43] Epoch: 1 Batch: 19451/20099 (96.78%) Loss: 2.169574 LR: 0.00000610 +[21:03:45] Epoch: 1 Batch: 19452/20099 (96.78%) Loss: 1.844660 LR: 0.00000610 +[21:03:46] Epoch: 1 Batch: 19453/20099 (96.79%) Loss: 1.967025 LR: 0.00000610 +[21:03:48] Epoch: 1 Batch: 19454/20099 (96.79%) Loss: 2.094159 LR: 0.00000610 +[21:03:50] Epoch: 1 Batch: 19455/20099 (96.80%) Loss: 1.926441 LR: 0.00000609 +[21:03:51] Epoch: 1 Batch: 19456/20099 (96.80%) Loss: 1.998559 LR: 0.00000609 +[21:03:53] Epoch: 1 Batch: 19457/20099 (96.81%) Loss: 1.929996 LR: 0.00000609 +[21:03:55] Epoch: 1 Batch: 19458/20099 (96.81%) Loss: 2.244255 LR: 0.00000609 +[21:03:56] Epoch: 1 Batch: 19459/20099 (96.82%) Loss: 2.048966 LR: 0.00000609 +[21:03:58] Epoch: 1 Batch: 19460/20099 (96.82%) Loss: 1.966728 LR: 0.00000609 +[21:04:00] Epoch: 1 Batch: 19461/20099 (96.83%) Loss: 2.219174 LR: 0.00000609 +[21:04:02] Epoch: 1 Batch: 19462/20099 (96.83%) Loss: 2.083934 LR: 0.00000609 +[21:04:03] Epoch: 1 Batch: 19463/20099 (96.84%) Loss: 2.297644 LR: 0.00000609 +[21:04:05] Epoch: 1 Batch: 19464/20099 (96.84%) Loss: 2.037646 LR: 0.00000609 +[21:04:07] Epoch: 1 Batch: 19465/20099 (96.85%) Loss: 2.284017 LR: 0.00000609 +[21:04:08] Epoch: 1 Batch: 19466/20099 (96.85%) Loss: 2.258305 LR: 0.00000609 +[21:04:10] Epoch: 1 Batch: 19467/20099 (96.86%) Loss: 1.868079 LR: 0.00000609 +[21:04:12] Epoch: 1 Batch: 19468/20099 (96.86%) Loss: 1.929782 LR: 0.00000609 +[21:04:13] Epoch: 1 Batch: 19469/20099 (96.87%) Loss: 2.134636 LR: 0.00000609 +[21:04:15] Epoch: 1 Batch: 19470/20099 (96.87%) Loss: 2.052505 LR: 0.00000609 +[21:04:17] Epoch: 1 Batch: 19471/20099 (96.88%) Loss: 2.046832 LR: 0.00000609 +[21:04:19] Epoch: 1 Batch: 19472/20099 (96.88%) Loss: 2.131657 LR: 0.00000609 +[21:04:20] Epoch: 1 Batch: 19473/20099 (96.89%) Loss: 2.178327 LR: 0.00000609 +[21:04:22] Epoch: 1 Batch: 19474/20099 (96.89%) Loss: 1.931745 LR: 0.00000609 +[21:04:24] Epoch: 1 Batch: 19475/20099 (96.90%) Loss: 1.709950 LR: 0.00000609 
+[21:04:25] Epoch: 1 Batch: 19476/20099 (96.90%) Loss: 2.056588 LR: 0.00000609 +[21:04:27] Epoch: 1 Batch: 19477/20099 (96.91%) Loss: 1.903806 LR: 0.00000609 +[21:04:29] Epoch: 1 Batch: 19478/20099 (96.91%) Loss: 1.764072 LR: 0.00000609 +[21:04:30] Epoch: 1 Batch: 19479/20099 (96.92%) Loss: 1.917783 LR: 0.00000609 +[21:04:32] Epoch: 1 Batch: 19480/20099 (96.92%) Loss: 1.864176 LR: 0.00000609 +[21:04:34] Epoch: 1 Batch: 19481/20099 (96.93%) Loss: 1.983515 LR: 0.00000609 +[21:04:36] Epoch: 1 Batch: 19482/20099 (96.93%) Loss: 2.193128 LR: 0.00000609 +[21:04:37] Epoch: 1 Batch: 19483/20099 (96.94%) Loss: 2.191280 LR: 0.00000609 +[21:04:39] Epoch: 1 Batch: 19484/20099 (96.94%) Loss: 2.047396 LR: 0.00000609 +[21:04:41] Epoch: 1 Batch: 19485/20099 (96.95%) Loss: 2.175007 LR: 0.00000609 +[21:04:42] Epoch: 1 Batch: 19486/20099 (96.95%) Loss: 2.109110 LR: 0.00000609 +[21:04:44] Epoch: 1 Batch: 19487/20099 (96.96%) Loss: 1.885035 LR: 0.00000609 +[21:04:46] Epoch: 1 Batch: 19488/20099 (96.96%) Loss: 2.148341 LR: 0.00000609 +[21:04:47] Epoch: 1 Batch: 19489/20099 (96.97%) Loss: 2.202434 LR: 0.00000609 +[21:04:49] Epoch: 1 Batch: 19490/20099 (96.97%) Loss: 1.931261 LR: 0.00000609 +[21:04:51] Epoch: 1 Batch: 19491/20099 (96.97%) Loss: 1.811763 LR: 0.00000609 +[21:04:53] Epoch: 1 Batch: 19492/20099 (96.98%) Loss: 1.939105 LR: 0.00000609 +[21:04:54] Epoch: 1 Batch: 19493/20099 (96.98%) Loss: 1.899246 LR: 0.00000609 +[21:04:56] Epoch: 1 Batch: 19494/20099 (96.99%) Loss: 2.303148 LR: 0.00000609 +[21:04:58] Epoch: 1 Batch: 19495/20099 (96.99%) Loss: 2.167841 LR: 0.00000609 +[21:04:59] Epoch: 1 Batch: 19496/20099 (97.00%) Loss: 2.007464 LR: 0.00000609 +[21:05:01] Epoch: 1 Batch: 19497/20099 (97.00%) Loss: 1.941152 LR: 0.00000608 +[21:05:03] Epoch: 1 Batch: 19498/20099 (97.01%) Loss: 2.023023 LR: 0.00000608 +[21:05:05] Epoch: 1 Batch: 19499/20099 (97.01%) Loss: 2.300960 LR: 0.00000608 +[21:05:06] >> Evaluating batch 0 +[21:05:07] >> Evaluating batch 1 +[21:05:08] >> Evaluating batch 2 
+[21:05:09] >> Evaluating batch 3 +[21:05:10] >> Evaluating batch 4 +[21:05:11] >> Evaluating batch 5 +[21:05:12] >> Evaluating batch 6 +[21:05:13] >> Evaluating batch 7 +[21:05:14] >> Evaluating batch 8 +[21:05:15] >> Evaluating batch 9 +[21:05:16] >> Evaluating batch 10 +[21:05:17] >> Evaluating batch 11 +[21:05:18] >> Evaluating batch 12 +[21:05:19] >> Evaluating batch 13 +[21:05:20] >> Evaluating batch 14 +[21:05:21] >> Evaluating batch 15 +[21:05:21] >> Evaluating batch 16 +[21:05:22] Epoch: 1 Step: 19500/20099 Evaluation: +[21:05:22] Avg Loss Since Last Eval: 0.0531 Val Loss: 2.1462 Validation loss delta: 2.1462 Perplexity: 8.5523 LR: 0.00000608 +[21:05:26] >> Checkpoint saved: epoch1_step19500, size: 0.1693 GB +[21:05:26] Epoch: 1 Batch: 19500/20099 (97.02%) Loss: 2.200761 LR: 0.00000608 +[21:05:28] Epoch: 1 Batch: 19501/20099 (97.02%) Loss: 2.148013 LR: 0.00000608 +[21:05:29] Epoch: 1 Batch: 19502/20099 (97.03%) Loss: 2.046340 LR: 0.00000608 +[21:05:31] Epoch: 1 Batch: 19503/20099 (97.03%) Loss: 1.856627 LR: 0.00000608 +[21:05:33] Epoch: 1 Batch: 19504/20099 (97.04%) Loss: 1.993814 LR: 0.00000608 +[21:05:34] Epoch: 1 Batch: 19505/20099 (97.04%) Loss: 2.341378 LR: 0.00000608 +[21:05:36] Epoch: 1 Batch: 19506/20099 (97.05%) Loss: 2.199477 LR: 0.00000608 +[21:05:38] Epoch: 1 Batch: 19507/20099 (97.05%) Loss: 2.394095 LR: 0.00000608 +[21:05:39] Epoch: 1 Batch: 19508/20099 (97.06%) Loss: 2.121704 LR: 0.00000608 +[21:05:41] Epoch: 1 Batch: 19509/20099 (97.06%) Loss: 2.031300 LR: 0.00000608 +[21:05:43] Epoch: 1 Batch: 19510/20099 (97.07%) Loss: 2.154776 LR: 0.00000608 +[21:05:45] Epoch: 1 Batch: 19511/20099 (97.07%) Loss: 2.184183 LR: 0.00000608 +[21:05:46] Epoch: 1 Batch: 19512/20099 (97.08%) Loss: 2.213635 LR: 0.00000608 +[21:05:48] Epoch: 1 Batch: 19513/20099 (97.08%) Loss: 1.994876 LR: 0.00000608 +[21:05:50] Epoch: 1 Batch: 19514/20099 (97.09%) Loss: 2.129902 LR: 0.00000608 +[21:05:51] Epoch: 1 Batch: 19515/20099 (97.09%) Loss: 1.901718 LR: 0.00000608 
+[21:05:53] Epoch: 1 Batch: 19516/20099 (97.10%) Loss: 1.983424 LR: 0.00000608 +[21:05:55] Epoch: 1 Batch: 19517/20099 (97.10%) Loss: 2.291611 LR: 0.00000608 +[21:05:57] Epoch: 1 Batch: 19518/20099 (97.11%) Loss: 2.029920 LR: 0.00000608 +[21:05:58] Epoch: 1 Batch: 19519/20099 (97.11%) Loss: 2.062734 LR: 0.00000608 +[21:06:00] Epoch: 1 Batch: 19520/20099 (97.12%) Loss: 2.167249 LR: 0.00000608 +[21:06:02] Epoch: 1 Batch: 19521/20099 (97.12%) Loss: 1.950860 LR: 0.00000608 +[21:06:03] Epoch: 1 Batch: 19522/20099 (97.13%) Loss: 2.158612 LR: 0.00000608 +[21:06:05] Epoch: 1 Batch: 19523/20099 (97.13%) Loss: 1.995965 LR: 0.00000608 +[21:06:07] Epoch: 1 Batch: 19524/20099 (97.14%) Loss: 2.281001 LR: 0.00000608 +[21:06:09] Epoch: 1 Batch: 19525/20099 (97.14%) Loss: 2.043555 LR: 0.00000608 +[21:06:10] Epoch: 1 Batch: 19526/20099 (97.15%) Loss: 2.196559 LR: 0.00000608 +[21:06:12] Epoch: 1 Batch: 19527/20099 (97.15%) Loss: 1.863253 LR: 0.00000608 +[21:06:14] Epoch: 1 Batch: 19528/20099 (97.16%) Loss: 2.143423 LR: 0.00000608 +[21:06:15] Epoch: 1 Batch: 19529/20099 (97.16%) Loss: 2.078881 LR: 0.00000608 +[21:06:17] Epoch: 1 Batch: 19530/20099 (97.17%) Loss: 2.001253 LR: 0.00000608 +[21:06:19] Epoch: 1 Batch: 19531/20099 (97.17%) Loss: 2.029536 LR: 0.00000608 +[21:06:20] Epoch: 1 Batch: 19532/20099 (97.18%) Loss: 2.184704 LR: 0.00000607 +[21:06:22] Epoch: 1 Batch: 19533/20099 (97.18%) Loss: 2.083115 LR: 0.00000607 +[21:06:24] Epoch: 1 Batch: 19534/20099 (97.19%) Loss: 2.157794 LR: 0.00000607 +[21:06:25] Epoch: 1 Batch: 19535/20099 (97.19%) Loss: 2.055910 LR: 0.00000607 +[21:06:27] Epoch: 1 Batch: 19536/20099 (97.20%) Loss: 2.010116 LR: 0.00000607 +[21:06:29] Epoch: 1 Batch: 19537/20099 (97.20%) Loss: 2.234168 LR: 0.00000607 +[21:06:31] Epoch: 1 Batch: 19538/20099 (97.21%) Loss: 2.070967 LR: 0.00000607 +[21:06:32] Epoch: 1 Batch: 19539/20099 (97.21%) Loss: 2.053724 LR: 0.00000607 +[21:06:34] Epoch: 1 Batch: 19540/20099 (97.22%) Loss: 2.333262 LR: 0.00000607 +[21:06:36] Epoch: 1 
Batch: 19541/20099 (97.22%) Loss: 2.163237 LR: 0.00000607 +[21:06:37] Epoch: 1 Batch: 19542/20099 (97.23%) Loss: 2.011131 LR: 0.00000607 +[21:06:39] Epoch: 1 Batch: 19543/20099 (97.23%) Loss: 2.468477 LR: 0.00000607 +[21:06:41] Epoch: 1 Batch: 19544/20099 (97.24%) Loss: 2.267995 LR: 0.00000607 +[21:06:42] Epoch: 1 Batch: 19545/20099 (97.24%) Loss: 2.379897 LR: 0.00000607 +[21:06:44] Epoch: 1 Batch: 19546/20099 (97.25%) Loss: 2.250490 LR: 0.00000607 +[21:06:46] Epoch: 1 Batch: 19547/20099 (97.25%) Loss: 2.152672 LR: 0.00000607 +[21:06:47] Epoch: 1 Batch: 19548/20099 (97.26%) Loss: 2.165462 LR: 0.00000607 +[21:06:49] Epoch: 1 Batch: 19549/20099 (97.26%) Loss: 2.052712 LR: 0.00000607 +[21:06:51] Epoch: 1 Batch: 19550/20099 (97.27%) Loss: 2.255512 LR: 0.00000607 +[21:06:52] Epoch: 1 Batch: 19551/20099 (97.27%) Loss: 2.036903 LR: 0.00000607 +[21:06:54] Epoch: 1 Batch: 19552/20099 (97.28%) Loss: 2.273328 LR: 0.00000607 +[21:06:56] Epoch: 1 Batch: 19553/20099 (97.28%) Loss: 2.087305 LR: 0.00000607 +[21:06:58] Epoch: 1 Batch: 19554/20099 (97.29%) Loss: 2.354399 LR: 0.00000607 +[21:06:59] Epoch: 1 Batch: 19555/20099 (97.29%) Loss: 1.965425 LR: 0.00000607 +[21:07:01] Epoch: 1 Batch: 19556/20099 (97.30%) Loss: 1.888842 LR: 0.00000607 +[21:07:03] Epoch: 1 Batch: 19557/20099 (97.30%) Loss: 2.133152 LR: 0.00000607 +[21:07:04] Epoch: 1 Batch: 19558/20099 (97.31%) Loss: 2.215588 LR: 0.00000607 +[21:07:06] Epoch: 1 Batch: 19559/20099 (97.31%) Loss: 1.980094 LR: 0.00000607 +[21:07:08] Epoch: 1 Batch: 19560/20099 (97.32%) Loss: 1.983996 LR: 0.00000607 +[21:07:09] Epoch: 1 Batch: 19561/20099 (97.32%) Loss: 1.945713 LR: 0.00000607 +[21:07:11] Epoch: 1 Batch: 19562/20099 (97.33%) Loss: 2.131228 LR: 0.00000607 +[21:07:13] Epoch: 1 Batch: 19563/20099 (97.33%) Loss: 2.089635 LR: 0.00000607 +[21:07:14] Epoch: 1 Batch: 19564/20099 (97.34%) Loss: 2.124650 LR: 0.00000607 +[21:07:16] Epoch: 1 Batch: 19565/20099 (97.34%) Loss: 1.903804 LR: 0.00000607 +[21:07:18] Epoch: 1 Batch: 19566/20099 
(97.35%) Loss: 2.179860 LR: 0.00000607 +[21:07:20] Epoch: 1 Batch: 19567/20099 (97.35%) Loss: 2.118630 LR: 0.00000607 +[21:07:21] Epoch: 1 Batch: 19568/20099 (97.36%) Loss: 1.946252 LR: 0.00000607 +[21:07:23] Epoch: 1 Batch: 19569/20099 (97.36%) Loss: 2.150693 LR: 0.00000607 +[21:07:25] Epoch: 1 Batch: 19570/20099 (97.37%) Loss: 2.279199 LR: 0.00000607 +[21:07:26] Epoch: 1 Batch: 19571/20099 (97.37%) Loss: 2.124003 LR: 0.00000607 +[21:07:28] Epoch: 1 Batch: 19572/20099 (97.38%) Loss: 1.950289 LR: 0.00000607 +[21:07:30] Epoch: 1 Batch: 19573/20099 (97.38%) Loss: 2.014793 LR: 0.00000607 +[21:07:31] Epoch: 1 Batch: 19574/20099 (97.39%) Loss: 2.119580 LR: 0.00000606 +[21:07:33] Epoch: 1 Batch: 19575/20099 (97.39%) Loss: 1.867238 LR: 0.00000606 +[21:07:35] Epoch: 1 Batch: 19576/20099 (97.40%) Loss: 2.126518 LR: 0.00000606 +[21:07:37] Epoch: 1 Batch: 19577/20099 (97.40%) Loss: 2.029558 LR: 0.00000606 +[21:07:38] Epoch: 1 Batch: 19578/20099 (97.41%) Loss: 2.114726 LR: 0.00000606 +[21:07:40] Epoch: 1 Batch: 19579/20099 (97.41%) Loss: 1.791034 LR: 0.00000606 +[21:07:42] Epoch: 1 Batch: 19580/20099 (97.42%) Loss: 2.079845 LR: 0.00000606 +[21:07:43] Epoch: 1 Batch: 19581/20099 (97.42%) Loss: 1.952391 LR: 0.00000606 +[21:07:45] Epoch: 1 Batch: 19582/20099 (97.43%) Loss: 2.053125 LR: 0.00000606 +[21:07:47] Epoch: 1 Batch: 19583/20099 (97.43%) Loss: 1.879852 LR: 0.00000606 +[21:07:48] Epoch: 1 Batch: 19584/20099 (97.44%) Loss: 2.277481 LR: 0.00000606 +[21:07:50] Epoch: 1 Batch: 19585/20099 (97.44%) Loss: 2.036930 LR: 0.00000606 +[21:07:52] Epoch: 1 Batch: 19586/20099 (97.45%) Loss: 2.296636 LR: 0.00000606 +[21:07:54] Epoch: 1 Batch: 19587/20099 (97.45%) Loss: 1.760554 LR: 0.00000606 +[21:07:55] Epoch: 1 Batch: 19588/20099 (97.46%) Loss: 1.951828 LR: 0.00000606 +[21:07:57] Epoch: 1 Batch: 19589/20099 (97.46%) Loss: 1.969662 LR: 0.00000606 +[21:07:59] Epoch: 1 Batch: 19590/20099 (97.47%) Loss: 2.324870 LR: 0.00000606 +[21:08:00] Epoch: 1 Batch: 19591/20099 (97.47%) Loss: 2.147874 
LR: 0.00000606 +[21:08:02] Epoch: 1 Batch: 19592/20099 (97.48%) Loss: 1.635293 LR: 0.00000606 +[21:08:04] Epoch: 1 Batch: 19593/20099 (97.48%) Loss: 2.144351 LR: 0.00000606 +[21:08:05] Epoch: 1 Batch: 19594/20099 (97.49%) Loss: 2.038568 LR: 0.00000606 +[21:08:07] Epoch: 1 Batch: 19595/20099 (97.49%) Loss: 1.552045 LR: 0.00000606 +[21:08:09] Epoch: 1 Batch: 19596/20099 (97.50%) Loss: 2.147465 LR: 0.00000606 +[21:08:11] Epoch: 1 Batch: 19597/20099 (97.50%) Loss: 1.891292 LR: 0.00000606 +[21:08:12] Epoch: 1 Batch: 19598/20099 (97.51%) Loss: 2.180317 LR: 0.00000606 +[21:08:14] Epoch: 1 Batch: 19599/20099 (97.51%) Loss: 2.229229 LR: 0.00000606 +[21:08:19] >> Temp checkpoint saved: epoch1_step19600, size: 0.1693 GB +[21:08:19] Epoch: 1 Batch: 19600/20099 (97.52%) Loss: 2.047410 LR: 0.00000606 +[21:08:21] Epoch: 1 Batch: 19601/20099 (97.52%) Loss: 2.106380 LR: 0.00000606 +[21:08:23] Epoch: 1 Batch: 19602/20099 (97.53%) Loss: 2.263475 LR: 0.00000606 +[21:08:24] Epoch: 1 Batch: 19603/20099 (97.53%) Loss: 2.067778 LR: 0.00000606 +[21:08:26] Epoch: 1 Batch: 19604/20099 (97.54%) Loss: 2.020451 LR: 0.00000606 +[21:08:28] Epoch: 1 Batch: 19605/20099 (97.54%) Loss: 1.901429 LR: 0.00000606 +[21:08:29] Epoch: 1 Batch: 19606/20099 (97.55%) Loss: 2.157249 LR: 0.00000606 +[21:08:31] Epoch: 1 Batch: 19607/20099 (97.55%) Loss: 1.895662 LR: 0.00000606 +[21:08:33] Epoch: 1 Batch: 19608/20099 (97.56%) Loss: 2.216561 LR: 0.00000606 +[21:08:35] Epoch: 1 Batch: 19609/20099 (97.56%) Loss: 2.307118 LR: 0.00000606 +[21:08:36] Epoch: 1 Batch: 19610/20099 (97.57%) Loss: 2.419635 LR: 0.00000606 +[21:08:38] Epoch: 1 Batch: 19611/20099 (97.57%) Loss: 2.175984 LR: 0.00000606 +[21:08:40] Epoch: 1 Batch: 19612/20099 (97.58%) Loss: 2.160920 LR: 0.00000606 +[21:08:41] Epoch: 1 Batch: 19613/20099 (97.58%) Loss: 1.833313 LR: 0.00000606 +[21:08:43] Epoch: 1 Batch: 19614/20099 (97.59%) Loss: 1.977506 LR: 0.00000606 +[21:08:45] Epoch: 1 Batch: 19615/20099 (97.59%) Loss: 1.926411 LR: 0.00000606 +[21:08:46] 
Epoch: 1 Batch: 19616/20099 (97.60%) Loss: 2.224730 LR: 0.00000605 +[21:08:48] Epoch: 1 Batch: 19617/20099 (97.60%) Loss: 2.179622 LR: 0.00000605 +[21:08:50] Epoch: 1 Batch: 19618/20099 (97.61%) Loss: 2.471264 LR: 0.00000605 +[21:08:52] Epoch: 1 Batch: 19619/20099 (97.61%) Loss: 2.367577 LR: 0.00000605 +[21:08:53] Epoch: 1 Batch: 19620/20099 (97.62%) Loss: 2.237221 LR: 0.00000605 +[21:08:55] Epoch: 1 Batch: 19621/20099 (97.62%) Loss: 2.056040 LR: 0.00000605 +[21:08:57] Epoch: 1 Batch: 19622/20099 (97.63%) Loss: 2.350324 LR: 0.00000605 +[21:08:58] Epoch: 1 Batch: 19623/20099 (97.63%) Loss: 1.847331 LR: 0.00000605 +[21:09:00] Epoch: 1 Batch: 19624/20099 (97.64%) Loss: 2.109378 LR: 0.00000605 +[21:09:02] Epoch: 1 Batch: 19625/20099 (97.64%) Loss: 1.914697 LR: 0.00000605 +[21:09:04] Epoch: 1 Batch: 19626/20099 (97.65%) Loss: 1.644272 LR: 0.00000605 +[21:09:05] Epoch: 1 Batch: 19627/20099 (97.65%) Loss: 2.252537 LR: 0.00000605 +[21:09:07] Epoch: 1 Batch: 19628/20099 (97.66%) Loss: 1.957116 LR: 0.00000605 +[21:09:09] Epoch: 1 Batch: 19629/20099 (97.66%) Loss: 2.261369 LR: 0.00000605 +[21:09:10] Epoch: 1 Batch: 19630/20099 (97.67%) Loss: 2.267442 LR: 0.00000605 +[21:09:12] Epoch: 1 Batch: 19631/20099 (97.67%) Loss: 2.226410 LR: 0.00000605 +[21:09:14] Epoch: 1 Batch: 19632/20099 (97.68%) Loss: 2.233835 LR: 0.00000605 +[21:09:15] Epoch: 1 Batch: 19633/20099 (97.68%) Loss: 2.181283 LR: 0.00000605 +[21:09:17] Epoch: 1 Batch: 19634/20099 (97.69%) Loss: 2.031026 LR: 0.00000605 +[21:09:19] Epoch: 1 Batch: 19635/20099 (97.69%) Loss: 1.927615 LR: 0.00000605 +[21:09:21] Epoch: 1 Batch: 19636/20099 (97.70%) Loss: 2.067996 LR: 0.00000605 +[21:09:22] Epoch: 1 Batch: 19637/20099 (97.70%) Loss: 2.301143 LR: 0.00000605 +[21:09:24] Epoch: 1 Batch: 19638/20099 (97.71%) Loss: 2.177565 LR: 0.00000605 +[21:09:26] Epoch: 1 Batch: 19639/20099 (97.71%) Loss: 2.415480 LR: 0.00000605 +[21:09:27] Epoch: 1 Batch: 19640/20099 (97.72%) Loss: 2.148036 LR: 0.00000605 +[21:09:29] Epoch: 1 Batch: 
19641/20099 (97.72%) Loss: 2.194405 LR: 0.00000605 +[21:09:31] Epoch: 1 Batch: 19642/20099 (97.73%) Loss: 2.033323 LR: 0.00000605 +[21:09:32] Epoch: 1 Batch: 19643/20099 (97.73%) Loss: 2.031634 LR: 0.00000605 +[21:09:34] Epoch: 1 Batch: 19644/20099 (97.74%) Loss: 2.242031 LR: 0.00000605 +[21:09:36] Epoch: 1 Batch: 19645/20099 (97.74%) Loss: 2.073355 LR: 0.00000605 +[21:09:37] Epoch: 1 Batch: 19646/20099 (97.75%) Loss: 2.119832 LR: 0.00000605 +[21:09:39] Epoch: 1 Batch: 19647/20099 (97.75%) Loss: 2.147381 LR: 0.00000605 +[21:09:41] Epoch: 1 Batch: 19648/20099 (97.76%) Loss: 2.008725 LR: 0.00000605 +[21:09:43] Epoch: 1 Batch: 19649/20099 (97.76%) Loss: 2.193848 LR: 0.00000605 +[21:09:44] Epoch: 1 Batch: 19650/20099 (97.77%) Loss: 2.126035 LR: 0.00000605 +[21:09:46] Epoch: 1 Batch: 19651/20099 (97.77%) Loss: 2.037643 LR: 0.00000605 +[21:09:48] Epoch: 1 Batch: 19652/20099 (97.78%) Loss: 2.156614 LR: 0.00000605 +[21:09:49] Epoch: 1 Batch: 19653/20099 (97.78%) Loss: 2.118502 LR: 0.00000605 +[21:09:51] Epoch: 1 Batch: 19654/20099 (97.79%) Loss: 2.496036 LR: 0.00000605 +[21:09:53] Epoch: 1 Batch: 19655/20099 (97.79%) Loss: 2.002283 LR: 0.00000605 +[21:09:54] Epoch: 1 Batch: 19656/20099 (97.80%) Loss: 2.070154 LR: 0.00000605 +[21:09:56] Epoch: 1 Batch: 19657/20099 (97.80%) Loss: 1.906476 LR: 0.00000605 +[21:09:58] Epoch: 1 Batch: 19658/20099 (97.81%) Loss: 2.419716 LR: 0.00000604 +[21:09:59] Epoch: 1 Batch: 19659/20099 (97.81%) Loss: 2.301998 LR: 0.00000604 +[21:10:01] Epoch: 1 Batch: 19660/20099 (97.82%) Loss: 2.082315 LR: 0.00000604 +[21:10:03] Epoch: 1 Batch: 19661/20099 (97.82%) Loss: 2.089788 LR: 0.00000604 +[21:10:04] Epoch: 1 Batch: 19662/20099 (97.83%) Loss: 2.007788 LR: 0.00000604 +[21:10:06] Epoch: 1 Batch: 19663/20099 (97.83%) Loss: 2.155777 LR: 0.00000604 +[21:10:08] Epoch: 1 Batch: 19664/20099 (97.84%) Loss: 2.118854 LR: 0.00000604 +[21:10:10] Epoch: 1 Batch: 19665/20099 (97.84%) Loss: 2.233336 LR: 0.00000604 +[21:10:11] Epoch: 1 Batch: 19666/20099 (97.85%) 
Loss: 2.043792 LR: 0.00000604 +[21:10:13] Epoch: 1 Batch: 19667/20099 (97.85%) Loss: 1.893848 LR: 0.00000604 +[21:10:15] Epoch: 1 Batch: 19668/20099 (97.86%) Loss: 2.282824 LR: 0.00000604 +[21:10:16] Epoch: 1 Batch: 19669/20099 (97.86%) Loss: 2.357982 LR: 0.00000604 +[21:10:18] Epoch: 1 Batch: 19670/20099 (97.87%) Loss: 2.036724 LR: 0.00000604 +[21:10:20] Epoch: 1 Batch: 19671/20099 (97.87%) Loss: 2.109029 LR: 0.00000604 +[21:10:21] Epoch: 1 Batch: 19672/20099 (97.88%) Loss: 2.200966 LR: 0.00000604 +[21:10:23] Epoch: 1 Batch: 19673/20099 (97.88%) Loss: 2.075505 LR: 0.00000604 +[21:10:25] Epoch: 1 Batch: 19674/20099 (97.89%) Loss: 2.063268 LR: 0.00000604 +[21:10:27] Epoch: 1 Batch: 19675/20099 (97.89%) Loss: 2.054620 LR: 0.00000604 +[21:10:28] Epoch: 1 Batch: 19676/20099 (97.90%) Loss: 2.158583 LR: 0.00000604 +[21:10:30] Epoch: 1 Batch: 19677/20099 (97.90%) Loss: 1.894716 LR: 0.00000604 +[21:10:32] Epoch: 1 Batch: 19678/20099 (97.91%) Loss: 1.905698 LR: 0.00000604 +[21:10:34] Epoch: 1 Batch: 19679/20099 (97.91%) Loss: 2.126866 LR: 0.00000604 +[21:10:35] Epoch: 1 Batch: 19680/20099 (97.92%) Loss: 1.933876 LR: 0.00000604 +[21:10:37] Epoch: 1 Batch: 19681/20099 (97.92%) Loss: 2.253431 LR: 0.00000604 +[21:10:39] Epoch: 1 Batch: 19682/20099 (97.93%) Loss: 1.996334 LR: 0.00000604 +[21:10:40] Epoch: 1 Batch: 19683/20099 (97.93%) Loss: 1.831973 LR: 0.00000604 +[21:10:42] Epoch: 1 Batch: 19684/20099 (97.94%) Loss: 2.302943 LR: 0.00000604 +[21:10:44] Epoch: 1 Batch: 19685/20099 (97.94%) Loss: 2.411334 LR: 0.00000604 +[21:10:45] Epoch: 1 Batch: 19686/20099 (97.95%) Loss: 2.135217 LR: 0.00000604 +[21:10:47] Epoch: 1 Batch: 19687/20099 (97.95%) Loss: 2.168799 LR: 0.00000604 +[21:10:49] Epoch: 1 Batch: 19688/20099 (97.96%) Loss: 2.111564 LR: 0.00000604 +[21:10:51] Epoch: 1 Batch: 19689/20099 (97.96%) Loss: 1.937084 LR: 0.00000604 +[21:10:52] Epoch: 1 Batch: 19690/20099 (97.97%) Loss: 1.887712 LR: 0.00000604 +[21:10:54] Epoch: 1 Batch: 19691/20099 (97.97%) Loss: 1.951787 LR: 
0.00000604 +[21:10:56] Epoch: 1 Batch: 19692/20099 (97.98%) Loss: 2.012900 LR: 0.00000604 +[21:10:57] Epoch: 1 Batch: 19693/20099 (97.98%) Loss: 2.049324 LR: 0.00000604 +[21:10:59] Epoch: 1 Batch: 19694/20099 (97.98%) Loss: 2.214893 LR: 0.00000604 +[21:11:01] Epoch: 1 Batch: 19695/20099 (97.99%) Loss: 2.252084 LR: 0.00000604 +[21:11:02] Epoch: 1 Batch: 19696/20099 (97.99%) Loss: 2.282241 LR: 0.00000604 +[21:11:04] Epoch: 1 Batch: 19697/20099 (98.00%) Loss: 2.403964 LR: 0.00000604 +[21:11:06] Epoch: 1 Batch: 19698/20099 (98.00%) Loss: 2.118081 LR: 0.00000604 +[21:11:07] Epoch: 1 Batch: 19699/20099 (98.01%) Loss: 2.099753 LR: 0.00000604 +[21:11:09] Epoch: 1 Batch: 19700/20099 (98.01%) Loss: 1.905315 LR: 0.00000604 +[21:11:11] Epoch: 1 Batch: 19701/20099 (98.02%) Loss: 2.092693 LR: 0.00000604 +[21:11:13] Epoch: 1 Batch: 19702/20099 (98.02%) Loss: 2.173392 LR: 0.00000604 +[21:11:14] Epoch: 1 Batch: 19703/20099 (98.03%) Loss: 2.438276 LR: 0.00000604 +[21:11:16] Epoch: 1 Batch: 19704/20099 (98.03%) Loss: 2.157546 LR: 0.00000604 +[21:11:18] Epoch: 1 Batch: 19705/20099 (98.04%) Loss: 1.947139 LR: 0.00000604 +[21:11:19] Epoch: 1 Batch: 19706/20099 (98.04%) Loss: 2.170629 LR: 0.00000604 +[21:11:21] Epoch: 1 Batch: 19707/20099 (98.05%) Loss: 1.843884 LR: 0.00000604 +[21:11:23] Epoch: 1 Batch: 19708/20099 (98.05%) Loss: 1.996712 LR: 0.00000604 +[21:11:25] Epoch: 1 Batch: 19709/20099 (98.06%) Loss: 2.203949 LR: 0.00000604 +[21:11:26] Epoch: 1 Batch: 19710/20099 (98.06%) Loss: 1.928943 LR: 0.00000604 +[21:11:28] Epoch: 1 Batch: 19711/20099 (98.07%) Loss: 2.063842 LR: 0.00000604 +[21:11:30] Epoch: 1 Batch: 19712/20099 (98.07%) Loss: 2.082536 LR: 0.00000604 +[21:11:31] Epoch: 1 Batch: 19713/20099 (98.08%) Loss: 2.192016 LR: 0.00000604 +[21:11:33] Epoch: 1 Batch: 19714/20099 (98.08%) Loss: 2.026328 LR: 0.00000603 +[21:11:35] Epoch: 1 Batch: 19715/20099 (98.09%) Loss: 1.948423 LR: 0.00000603 +[21:11:36] Epoch: 1 Batch: 19716/20099 (98.09%) Loss: 2.223531 LR: 0.00000603 +[21:11:38] 
Epoch: 1 Batch: 19717/20099 (98.10%) Loss: 1.804175 LR: 0.00000603 +[21:11:40] Epoch: 1 Batch: 19718/20099 (98.10%) Loss: 1.847138 LR: 0.00000603 +[21:11:42] Epoch: 1 Batch: 19719/20099 (98.11%) Loss: 2.468766 LR: 0.00000603 +[21:11:43] Epoch: 1 Batch: 19720/20099 (98.11%) Loss: 2.296887 LR: 0.00000603 +[21:11:45] Epoch: 1 Batch: 19721/20099 (98.12%) Loss: 1.811715 LR: 0.00000603 +[21:11:47] Epoch: 1 Batch: 19722/20099 (98.12%) Loss: 1.916123 LR: 0.00000603 +[21:11:48] Epoch: 1 Batch: 19723/20099 (98.13%) Loss: 1.967202 LR: 0.00000603 +[21:11:50] Epoch: 1 Batch: 19724/20099 (98.13%) Loss: 2.255395 LR: 0.00000603 +[21:11:52] Epoch: 1 Batch: 19725/20099 (98.14%) Loss: 2.021192 LR: 0.00000603 +[21:11:53] Epoch: 1 Batch: 19726/20099 (98.14%) Loss: 2.113619 LR: 0.00000603 +[21:11:55] Epoch: 1 Batch: 19727/20099 (98.15%) Loss: 2.008102 LR: 0.00000603 +[21:11:57] Epoch: 1 Batch: 19728/20099 (98.15%) Loss: 2.094108 LR: 0.00000603 +[21:11:58] Epoch: 1 Batch: 19729/20099 (98.16%) Loss: 2.080100 LR: 0.00000603 +[21:12:00] Epoch: 1 Batch: 19730/20099 (98.16%) Loss: 2.113569 LR: 0.00000603 +[21:12:02] Epoch: 1 Batch: 19731/20099 (98.17%) Loss: 2.062838 LR: 0.00000603 +[21:12:03] Epoch: 1 Batch: 19732/20099 (98.17%) Loss: 1.894602 LR: 0.00000603 +[21:12:05] Epoch: 1 Batch: 19733/20099 (98.18%) Loss: 2.380128 LR: 0.00000603 +[21:12:07] Epoch: 1 Batch: 19734/20099 (98.18%) Loss: 2.104017 LR: 0.00000603 +[21:12:09] Epoch: 1 Batch: 19735/20099 (98.19%) Loss: 1.709727 LR: 0.00000603 +[21:12:10] Epoch: 1 Batch: 19736/20099 (98.19%) Loss: 2.038914 LR: 0.00000603 +[21:12:12] Epoch: 1 Batch: 19737/20099 (98.20%) Loss: 2.046742 LR: 0.00000603 +[21:12:14] Epoch: 1 Batch: 19738/20099 (98.20%) Loss: 2.071620 LR: 0.00000603 +[21:12:15] Epoch: 1 Batch: 19739/20099 (98.21%) Loss: 2.168152 LR: 0.00000603 +[21:12:17] Epoch: 1 Batch: 19740/20099 (98.21%) Loss: 2.043953 LR: 0.00000603 +[21:12:19] Epoch: 1 Batch: 19741/20099 (98.22%) Loss: 2.293775 LR: 0.00000603 +[21:12:20] Epoch: 1 Batch: 
19742/20099 (98.22%) Loss: 2.314892 LR: 0.00000603 +[21:12:22] Epoch: 1 Batch: 19743/20099 (98.23%) Loss: 1.911069 LR: 0.00000603 +[21:12:24] Epoch: 1 Batch: 19744/20099 (98.23%) Loss: 2.020998 LR: 0.00000603 +[21:12:26] Epoch: 1 Batch: 19745/20099 (98.24%) Loss: 2.140975 LR: 0.00000603 +[21:12:27] Epoch: 1 Batch: 19746/20099 (98.24%) Loss: 1.997423 LR: 0.00000603 +[21:12:29] Epoch: 1 Batch: 19747/20099 (98.25%) Loss: 1.922877 LR: 0.00000603 +[21:12:31] Epoch: 1 Batch: 19748/20099 (98.25%) Loss: 1.861700 LR: 0.00000603 +[21:12:32] Epoch: 1 Batch: 19749/20099 (98.26%) Loss: 1.913440 LR: 0.00000603 +[21:12:34] Epoch: 1 Batch: 19750/20099 (98.26%) Loss: 2.223578 LR: 0.00000603 +[21:12:36] Epoch: 1 Batch: 19751/20099 (98.27%) Loss: 1.813420 LR: 0.00000603 +[21:12:37] Epoch: 1 Batch: 19752/20099 (98.27%) Loss: 1.790297 LR: 0.00000603 +[21:12:39] Epoch: 1 Batch: 19753/20099 (98.28%) Loss: 2.447745 LR: 0.00000603 +[21:12:41] Epoch: 1 Batch: 19754/20099 (98.28%) Loss: 2.159322 LR: 0.00000603 +[21:12:43] Epoch: 1 Batch: 19755/20099 (98.29%) Loss: 2.067868 LR: 0.00000603 +[21:12:44] Epoch: 1 Batch: 19756/20099 (98.29%) Loss: 2.145474 LR: 0.00000603 +[21:12:46] Epoch: 1 Batch: 19757/20099 (98.30%) Loss: 2.218757 LR: 0.00000603 +[21:12:48] Epoch: 1 Batch: 19758/20099 (98.30%) Loss: 1.833680 LR: 0.00000603 +[21:12:49] Epoch: 1 Batch: 19759/20099 (98.31%) Loss: 2.323237 LR: 0.00000603 +[21:12:51] Epoch: 1 Batch: 19760/20099 (98.31%) Loss: 1.930634 LR: 0.00000603 +[21:12:53] Epoch: 1 Batch: 19761/20099 (98.32%) Loss: 1.875503 LR: 0.00000603 +[21:12:54] Epoch: 1 Batch: 19762/20099 (98.32%) Loss: 2.133965 LR: 0.00000603 +[21:12:56] Epoch: 1 Batch: 19763/20099 (98.33%) Loss: 2.174304 LR: 0.00000603 +[21:12:58] Epoch: 1 Batch: 19764/20099 (98.33%) Loss: 1.866125 LR: 0.00000603 +[21:12:59] Epoch: 1 Batch: 19765/20099 (98.34%) Loss: 2.230114 LR: 0.00000603 +[21:13:01] Epoch: 1 Batch: 19766/20099 (98.34%) Loss: 2.207559 LR: 0.00000603 +[21:13:03] Epoch: 1 Batch: 19767/20099 (98.35%) 
Loss: 2.667779 LR: 0.00000603 +[21:13:05] Epoch: 1 Batch: 19768/20099 (98.35%) Loss: 2.096428 LR: 0.00000603 +[21:13:06] Epoch: 1 Batch: 19769/20099 (98.36%) Loss: 2.190620 LR: 0.00000603 +[21:13:08] Epoch: 1 Batch: 19770/20099 (98.36%) Loss: 2.128263 LR: 0.00000603 +[21:13:10] Epoch: 1 Batch: 19771/20099 (98.37%) Loss: 2.224972 LR: 0.00000603 +[21:13:11] Epoch: 1 Batch: 19772/20099 (98.37%) Loss: 2.028450 LR: 0.00000603 +[21:13:13] Epoch: 1 Batch: 19773/20099 (98.38%) Loss: 2.242065 LR: 0.00000603 +[21:13:15] Epoch: 1 Batch: 19774/20099 (98.38%) Loss: 2.319241 LR: 0.00000603 +[21:13:16] Epoch: 1 Batch: 19775/20099 (98.39%) Loss: 2.059998 LR: 0.00000603 +[21:13:18] Epoch: 1 Batch: 19776/20099 (98.39%) Loss: 1.928091 LR: 0.00000603 +[21:13:20] Epoch: 1 Batch: 19777/20099 (98.40%) Loss: 2.175483 LR: 0.00000602 +[21:13:22] Epoch: 1 Batch: 19778/20099 (98.40%) Loss: 2.162094 LR: 0.00000602 +[21:13:23] Epoch: 1 Batch: 19779/20099 (98.41%) Loss: 1.738840 LR: 0.00000602 +[21:13:25] Epoch: 1 Batch: 19780/20099 (98.41%) Loss: 1.834609 LR: 0.00000602 +[21:13:27] Epoch: 1 Batch: 19781/20099 (98.42%) Loss: 2.065798 LR: 0.00000602 +[21:13:28] Epoch: 1 Batch: 19782/20099 (98.42%) Loss: 1.981241 LR: 0.00000602 +[21:13:30] Epoch: 1 Batch: 19783/20099 (98.43%) Loss: 2.163119 LR: 0.00000602 +[21:13:32] Epoch: 1 Batch: 19784/20099 (98.43%) Loss: 2.136414 LR: 0.00000602 +[21:13:33] Epoch: 1 Batch: 19785/20099 (98.44%) Loss: 2.007050 LR: 0.00000602 +[21:13:35] Epoch: 1 Batch: 19786/20099 (98.44%) Loss: 1.792248 LR: 0.00000602 +[21:13:37] Epoch: 1 Batch: 19787/20099 (98.45%) Loss: 2.074148 LR: 0.00000602 +[21:13:38] Epoch: 1 Batch: 19788/20099 (98.45%) Loss: 1.591648 LR: 0.00000602 +[21:13:40] Epoch: 1 Batch: 19789/20099 (98.46%) Loss: 2.062532 LR: 0.00000602 +[21:13:42] Epoch: 1 Batch: 19790/20099 (98.46%) Loss: 2.058855 LR: 0.00000602 +[21:13:44] Epoch: 1 Batch: 19791/20099 (98.47%) Loss: 1.814199 LR: 0.00000602 +[21:13:45] Epoch: 1 Batch: 19792/20099 (98.47%) Loss: 1.950415 LR: 
0.00000602 +[21:13:47] Epoch: 1 Batch: 19793/20099 (98.48%) Loss: 1.771776 LR: 0.00000602 +[21:13:49] Epoch: 1 Batch: 19794/20099 (98.48%) Loss: 1.851476 LR: 0.00000602 +[21:13:50] Epoch: 1 Batch: 19795/20099 (98.49%) Loss: 2.052677 LR: 0.00000602 +[21:13:52] Epoch: 1 Batch: 19796/20099 (98.49%) Loss: 1.928611 LR: 0.00000602 +[21:13:54] Epoch: 1 Batch: 19797/20099 (98.50%) Loss: 2.180693 LR: 0.00000602 +[21:13:55] Epoch: 1 Batch: 19798/20099 (98.50%) Loss: 2.166965 LR: 0.00000602 +[21:13:57] Epoch: 1 Batch: 19799/20099 (98.51%) Loss: 2.018901 LR: 0.00000602 +[21:14:03] >> Cleaned up old temp checkpoint: epoch1_step17800 +[21:14:03] >> Temp checkpoint saved: epoch1_step19800, size: 0.1693 GB +[21:14:03] Epoch: 1 Batch: 19800/20099 (98.51%) Loss: 2.052021 LR: 0.00000602 +[21:14:04] Epoch: 1 Batch: 19801/20099 (98.52%) Loss: 2.292603 LR: 0.00000602 +[21:14:06] Epoch: 1 Batch: 19802/20099 (98.52%) Loss: 1.940827 LR: 0.00000602 +[21:14:08] Epoch: 1 Batch: 19803/20099 (98.53%) Loss: 2.090777 LR: 0.00000602 +[21:14:09] Epoch: 1 Batch: 19804/20099 (98.53%) Loss: 1.982296 LR: 0.00000602 +[21:14:11] Epoch: 1 Batch: 19805/20099 (98.54%) Loss: 1.870217 LR: 0.00000602 +[21:14:13] Epoch: 1 Batch: 19806/20099 (98.54%) Loss: 2.375092 LR: 0.00000602 +[21:14:14] Epoch: 1 Batch: 19807/20099 (98.55%) Loss: 2.053111 LR: 0.00000602 +[21:14:16] Epoch: 1 Batch: 19808/20099 (98.55%) Loss: 1.820624 LR: 0.00000602 +[21:14:18] Epoch: 1 Batch: 19809/20099 (98.56%) Loss: 1.875758 LR: 0.00000602 +[21:14:19] Epoch: 1 Batch: 19810/20099 (98.56%) Loss: 1.896281 LR: 0.00000602 +[21:14:21] Epoch: 1 Batch: 19811/20099 (98.57%) Loss: 2.519021 LR: 0.00000602 +[21:14:23] Epoch: 1 Batch: 19812/20099 (98.57%) Loss: 2.417110 LR: 0.00000602 +[21:14:25] Epoch: 1 Batch: 19813/20099 (98.58%) Loss: 2.032626 LR: 0.00000602 +[21:14:26] Epoch: 1 Batch: 19814/20099 (98.58%) Loss: 2.289120 LR: 0.00000602 +[21:14:28] Epoch: 1 Batch: 19815/20099 (98.59%) Loss: 1.898158 LR: 0.00000602 +[21:14:30] Epoch: 1 Batch: 
19816/20099 (98.59%) Loss: 2.255989 LR: 0.00000602 +[21:14:31] Epoch: 1 Batch: 19817/20099 (98.60%) Loss: 1.928620 LR: 0.00000602 +[21:14:33] Epoch: 1 Batch: 19818/20099 (98.60%) Loss: 2.066255 LR: 0.00000602 +[21:14:35] Epoch: 1 Batch: 19819/20099 (98.61%) Loss: 2.556113 LR: 0.00000602 +[21:14:37] Epoch: 1 Batch: 19820/20099 (98.61%) Loss: 2.461286 LR: 0.00000602 +[21:14:38] Epoch: 1 Batch: 19821/20099 (98.62%) Loss: 2.330388 LR: 0.00000602 +[21:14:40] Epoch: 1 Batch: 19822/20099 (98.62%) Loss: 1.928169 LR: 0.00000602 +[21:14:42] Epoch: 1 Batch: 19823/20099 (98.63%) Loss: 2.100055 LR: 0.00000602 +[21:14:43] Epoch: 1 Batch: 19824/20099 (98.63%) Loss: 1.911302 LR: 0.00000602 +[21:14:45] Epoch: 1 Batch: 19825/20099 (98.64%) Loss: 2.287809 LR: 0.00000602 +[21:14:47] Epoch: 1 Batch: 19826/20099 (98.64%) Loss: 2.085364 LR: 0.00000602 +[21:14:48] Epoch: 1 Batch: 19827/20099 (98.65%) Loss: 2.183035 LR: 0.00000602 +[21:14:50] Epoch: 1 Batch: 19828/20099 (98.65%) Loss: 2.047106 LR: 0.00000602 +[21:14:52] Epoch: 1 Batch: 19829/20099 (98.66%) Loss: 2.056818 LR: 0.00000602 +[21:14:54] Epoch: 1 Batch: 19830/20099 (98.66%) Loss: 1.722013 LR: 0.00000602 +[21:14:55] Epoch: 1 Batch: 19831/20099 (98.67%) Loss: 2.022016 LR: 0.00000602 +[21:14:57] Epoch: 1 Batch: 19832/20099 (98.67%) Loss: 2.126859 LR: 0.00000602 +[21:14:59] Epoch: 1 Batch: 19833/20099 (98.68%) Loss: 1.922251 LR: 0.00000602 +[21:15:00] Epoch: 1 Batch: 19834/20099 (98.68%) Loss: 2.169086 LR: 0.00000602 +[21:15:02] Epoch: 1 Batch: 19835/20099 (98.69%) Loss: 2.153125 LR: 0.00000602 +[21:15:04] Epoch: 1 Batch: 19836/20099 (98.69%) Loss: 2.191390 LR: 0.00000602 +[21:15:05] Epoch: 1 Batch: 19837/20099 (98.70%) Loss: 2.248435 LR: 0.00000602 +[21:15:07] Epoch: 1 Batch: 19838/20099 (98.70%) Loss: 1.974731 LR: 0.00000602 +[21:15:09] Epoch: 1 Batch: 19839/20099 (98.71%) Loss: 2.181959 LR: 0.00000602 +[21:15:11] Epoch: 1 Batch: 19840/20099 (98.71%) Loss: 1.811167 LR: 0.00000602 +[21:15:12] Epoch: 1 Batch: 19841/20099 (98.72%) 
Loss: 1.840478 LR: 0.00000602 +[21:15:14] Epoch: 1 Batch: 19842/20099 (98.72%) Loss: 1.963462 LR: 0.00000602 +[21:15:16] Epoch: 1 Batch: 19843/20099 (98.73%) Loss: 2.038458 LR: 0.00000602 +[21:15:17] Epoch: 1 Batch: 19844/20099 (98.73%) Loss: 2.004476 LR: 0.00000602 +[21:15:19] Epoch: 1 Batch: 19845/20099 (98.74%) Loss: 2.078323 LR: 0.00000602 +[21:15:21] Epoch: 1 Batch: 19846/20099 (98.74%) Loss: 2.155626 LR: 0.00000602 +[21:15:22] Epoch: 1 Batch: 19847/20099 (98.75%) Loss: 2.095434 LR: 0.00000602 +[21:15:24] Epoch: 1 Batch: 19848/20099 (98.75%) Loss: 1.807209 LR: 0.00000602 +[21:15:26] Epoch: 1 Batch: 19849/20099 (98.76%) Loss: 2.173926 LR: 0.00000602 +[21:15:27] Epoch: 1 Batch: 19850/20099 (98.76%) Loss: 2.241233 LR: 0.00000602 +[21:15:29] Epoch: 1 Batch: 19851/20099 (98.77%) Loss: 2.129119 LR: 0.00000602 +[21:15:31] Epoch: 1 Batch: 19852/20099 (98.77%) Loss: 2.088399 LR: 0.00000602 +[21:15:32] Epoch: 1 Batch: 19853/20099 (98.78%) Loss: 2.365876 LR: 0.00000602 +[21:15:34] Epoch: 1 Batch: 19854/20099 (98.78%) Loss: 1.820472 LR: 0.00000601 +[21:15:36] Epoch: 1 Batch: 19855/20099 (98.79%) Loss: 2.125359 LR: 0.00000601 +[21:15:37] Epoch: 1 Batch: 19856/20099 (98.79%) Loss: 2.111696 LR: 0.00000601 +[21:15:39] Epoch: 1 Batch: 19857/20099 (98.80%) Loss: 2.118651 LR: 0.00000601 +[21:15:41] Epoch: 1 Batch: 19858/20099 (98.80%) Loss: 1.898112 LR: 0.00000601 +[21:15:43] Epoch: 1 Batch: 19859/20099 (98.81%) Loss: 2.364972 LR: 0.00000601 +[21:15:44] Epoch: 1 Batch: 19860/20099 (98.81%) Loss: 1.888286 LR: 0.00000601 +[21:15:46] Epoch: 1 Batch: 19861/20099 (98.82%) Loss: 1.925153 LR: 0.00000601 +[21:15:48] Epoch: 1 Batch: 19862/20099 (98.82%) Loss: 2.251948 LR: 0.00000601 +[21:15:49] Epoch: 1 Batch: 19863/20099 (98.83%) Loss: 2.079187 LR: 0.00000601 +[21:15:51] Epoch: 1 Batch: 19864/20099 (98.83%) Loss: 2.162825 LR: 0.00000601 +[21:15:53] Epoch: 1 Batch: 19865/20099 (98.84%) Loss: 2.204412 LR: 0.00000601 +[21:15:54] Epoch: 1 Batch: 19866/20099 (98.84%) Loss: 2.361954 LR: 
0.00000601 +[21:15:56] Epoch: 1 Batch: 19867/20099 (98.85%) Loss: 2.182274 LR: 0.00000601 +[21:15:58] Epoch: 1 Batch: 19868/20099 (98.85%) Loss: 2.418038 LR: 0.00000601 +[21:15:59] Epoch: 1 Batch: 19869/20099 (98.86%) Loss: 2.044775 LR: 0.00000601 +[21:16:01] Epoch: 1 Batch: 19870/20099 (98.86%) Loss: 2.260160 LR: 0.00000601 +[21:16:03] Epoch: 1 Batch: 19871/20099 (98.87%) Loss: 2.349758 LR: 0.00000601 +[21:16:05] Epoch: 1 Batch: 19872/20099 (98.87%) Loss: 2.136977 LR: 0.00000601 +[21:16:06] Epoch: 1 Batch: 19873/20099 (98.88%) Loss: 1.982842 LR: 0.00000601 +[21:16:08] Epoch: 1 Batch: 19874/20099 (98.88%) Loss: 2.386414 LR: 0.00000601 +[21:16:10] Epoch: 1 Batch: 19875/20099 (98.89%) Loss: 1.828253 LR: 0.00000601 +[21:16:11] Epoch: 1 Batch: 19876/20099 (98.89%) Loss: 2.059540 LR: 0.00000601 +[21:16:13] Epoch: 1 Batch: 19877/20099 (98.90%) Loss: 2.177837 LR: 0.00000601 +[21:16:15] Epoch: 1 Batch: 19878/20099 (98.90%) Loss: 2.270956 LR: 0.00000601 +[21:16:16] Epoch: 1 Batch: 19879/20099 (98.91%) Loss: 2.152571 LR: 0.00000601 +[21:16:18] Epoch: 1 Batch: 19880/20099 (98.91%) Loss: 2.391748 LR: 0.00000601 +[21:16:20] Epoch: 1 Batch: 19881/20099 (98.92%) Loss: 1.922534 LR: 0.00000601 +[21:16:21] Epoch: 1 Batch: 19882/20099 (98.92%) Loss: 2.095635 LR: 0.00000601 +[21:16:23] Epoch: 1 Batch: 19883/20099 (98.93%) Loss: 2.339785 LR: 0.00000601 +[21:16:25] Epoch: 1 Batch: 19884/20099 (98.93%) Loss: 1.905582 LR: 0.00000601 +[21:16:27] Epoch: 1 Batch: 19885/20099 (98.94%) Loss: 2.450078 LR: 0.00000601 +[21:16:28] Epoch: 1 Batch: 19886/20099 (98.94%) Loss: 2.299301 LR: 0.00000601 +[21:16:30] Epoch: 1 Batch: 19887/20099 (98.95%) Loss: 2.012414 LR: 0.00000601 +[21:16:32] Epoch: 1 Batch: 19888/20099 (98.95%) Loss: 2.217902 LR: 0.00000601 +[21:16:33] Epoch: 1 Batch: 19889/20099 (98.96%) Loss: 2.243109 LR: 0.00000601 +[21:16:35] Epoch: 1 Batch: 19890/20099 (98.96%) Loss: 2.026502 LR: 0.00000601 +[21:16:37] Epoch: 1 Batch: 19891/20099 (98.97%) Loss: 2.223279 LR: 0.00000601 +[21:16:38] 
Epoch: 1 Batch: 19892/20099 (98.97%) Loss: 2.128953 LR: 0.00000601 +[21:16:40] Epoch: 1 Batch: 19893/20099 (98.98%) Loss: 2.140095 LR: 0.00000601 +[21:16:42] Epoch: 1 Batch: 19894/20099 (98.98%) Loss: 1.992040 LR: 0.00000601 +[21:16:43] Epoch: 1 Batch: 19895/20099 (98.99%) Loss: 2.362668 LR: 0.00000601 +[21:16:45] Epoch: 1 Batch: 19896/20099 (98.99%) Loss: 1.991993 LR: 0.00000601 +[21:16:47] Epoch: 1 Batch: 19897/20099 (98.99%) Loss: 1.908153 LR: 0.00000601 +[21:16:48] Epoch: 1 Batch: 19898/20099 (99.00%) Loss: 2.475658 LR: 0.00000601 +[21:16:50] Epoch: 1 Batch: 19899/20099 (99.00%) Loss: 2.139417 LR: 0.00000601 +[21:16:52] Epoch: 1 Batch: 19900/20099 (99.01%) Loss: 2.265611 LR: 0.00000601 +[21:16:54] Epoch: 1 Batch: 19901/20099 (99.01%) Loss: 2.066012 LR: 0.00000601 +[21:16:55] Epoch: 1 Batch: 19902/20099 (99.02%) Loss: 2.183114 LR: 0.00000601 +[21:16:57] Epoch: 1 Batch: 19903/20099 (99.02%) Loss: 2.225366 LR: 0.00000601 +[21:16:59] Epoch: 1 Batch: 19904/20099 (99.03%) Loss: 2.059197 LR: 0.00000601 +[21:17:00] Epoch: 1 Batch: 19905/20099 (99.03%) Loss: 2.343226 LR: 0.00000601 +[21:17:02] Epoch: 1 Batch: 19906/20099 (99.04%) Loss: 1.905977 LR: 0.00000601 +[21:17:04] Epoch: 1 Batch: 19907/20099 (99.04%) Loss: 1.926025 LR: 0.00000601 +[21:17:05] Epoch: 1 Batch: 19908/20099 (99.05%) Loss: 2.233013 LR: 0.00000601 +[21:17:07] Epoch: 1 Batch: 19909/20099 (99.05%) Loss: 2.001720 LR: 0.00000601 +[21:17:09] Epoch: 1 Batch: 19910/20099 (99.06%) Loss: 2.192508 LR: 0.00000601 +[21:17:10] Epoch: 1 Batch: 19911/20099 (99.06%) Loss: 1.998633 LR: 0.00000601 +[21:17:12] Epoch: 1 Batch: 19912/20099 (99.07%) Loss: 1.491950 LR: 0.00000601 +[21:17:14] Epoch: 1 Batch: 19913/20099 (99.07%) Loss: 2.068771 LR: 0.00000601 +[21:17:16] Epoch: 1 Batch: 19914/20099 (99.08%) Loss: 2.128163 LR: 0.00000601 +[21:17:17] Epoch: 1 Batch: 19915/20099 (99.08%) Loss: 1.915859 LR: 0.00000601 +[21:17:19] Epoch: 1 Batch: 19916/20099 (99.09%) Loss: 2.079754 LR: 0.00000601 +[21:17:21] Epoch: 1 Batch: 
19917/20099 (99.09%) Loss: 1.931251 LR: 0.00000601 +[21:17:22] Epoch: 1 Batch: 19918/20099 (99.10%) Loss: 1.794970 LR: 0.00000601 +[21:17:24] Epoch: 1 Batch: 19919/20099 (99.10%) Loss: 2.064368 LR: 0.00000601 +[21:17:26] Epoch: 1 Batch: 19920/20099 (99.11%) Loss: 1.681525 LR: 0.00000601 +[21:17:27] Epoch: 1 Batch: 19921/20099 (99.11%) Loss: 2.125411 LR: 0.00000601 +[21:17:29] Epoch: 1 Batch: 19922/20099 (99.12%) Loss: 2.191348 LR: 0.00000601 +[21:17:31] Epoch: 1 Batch: 19923/20099 (99.12%) Loss: 1.762524 LR: 0.00000601 +[21:17:32] Epoch: 1 Batch: 19924/20099 (99.13%) Loss: 2.188381 LR: 0.00000601 +[21:17:34] Epoch: 1 Batch: 19925/20099 (99.13%) Loss: 2.229745 LR: 0.00000601 +[21:17:36] Epoch: 1 Batch: 19926/20099 (99.14%) Loss: 1.964939 LR: 0.00000601 +[21:17:38] Epoch: 1 Batch: 19927/20099 (99.14%) Loss: 2.091248 LR: 0.00000601 +[21:17:39] Epoch: 1 Batch: 19928/20099 (99.15%) Loss: 2.110506 LR: 0.00000601 +[21:17:41] Epoch: 1 Batch: 19929/20099 (99.15%) Loss: 2.221666 LR: 0.00000601 +[21:17:43] Epoch: 1 Batch: 19930/20099 (99.16%) Loss: 2.092208 LR: 0.00000601 +[21:17:44] Epoch: 1 Batch: 19931/20099 (99.16%) Loss: 1.979654 LR: 0.00000601 +[21:17:46] Epoch: 1 Batch: 19932/20099 (99.17%) Loss: 1.729502 LR: 0.00000601 +[21:17:48] Epoch: 1 Batch: 19933/20099 (99.17%) Loss: 2.357080 LR: 0.00000601 +[21:17:49] Epoch: 1 Batch: 19934/20099 (99.18%) Loss: 2.016843 LR: 0.00000601 +[21:17:51] Epoch: 1 Batch: 19935/20099 (99.18%) Loss: 2.214905 LR: 0.00000601 +[21:17:53] Epoch: 1 Batch: 19936/20099 (99.19%) Loss: 2.107324 LR: 0.00000601 +[21:17:54] Epoch: 1 Batch: 19937/20099 (99.19%) Loss: 2.392010 LR: 0.00000601 +[21:17:56] Epoch: 1 Batch: 19938/20099 (99.20%) Loss: 2.181722 LR: 0.00000601 +[21:17:58] Epoch: 1 Batch: 19939/20099 (99.20%) Loss: 2.021723 LR: 0.00000601 +[21:18:00] Epoch: 1 Batch: 19940/20099 (99.21%) Loss: 2.103037 LR: 0.00000601 +[21:18:01] Epoch: 1 Batch: 19941/20099 (99.21%) Loss: 2.227875 LR: 0.00000601 +[21:18:03] Epoch: 1 Batch: 19942/20099 (99.22%) 
Loss: 2.004650 LR: 0.00000601 +[21:18:05] Epoch: 1 Batch: 19943/20099 (99.22%) Loss: 2.066424 LR: 0.00000601 +[21:18:06] Epoch: 1 Batch: 19944/20099 (99.23%) Loss: 2.322552 LR: 0.00000601 +[21:18:08] Epoch: 1 Batch: 19945/20099 (99.23%) Loss: 1.947948 LR: 0.00000601 +[21:18:10] Epoch: 1 Batch: 19946/20099 (99.24%) Loss: 2.001580 LR: 0.00000601 +[21:18:11] Epoch: 1 Batch: 19947/20099 (99.24%) Loss: 2.056804 LR: 0.00000601 +[21:18:13] Epoch: 1 Batch: 19948/20099 (99.25%) Loss: 2.046386 LR: 0.00000601 +[21:18:15] Epoch: 1 Batch: 19949/20099 (99.25%) Loss: 2.299993 LR: 0.00000601 +[21:18:16] Epoch: 1 Batch: 19950/20099 (99.26%) Loss: 2.123426 LR: 0.00000601 +[21:18:18] Epoch: 1 Batch: 19951/20099 (99.26%) Loss: 2.092831 LR: 0.00000601 +[21:18:20] Epoch: 1 Batch: 19952/20099 (99.27%) Loss: 2.107985 LR: 0.00000601 +[21:18:22] Epoch: 1 Batch: 19953/20099 (99.27%) Loss: 2.699091 LR: 0.00000601 +[21:18:23] Epoch: 1 Batch: 19954/20099 (99.28%) Loss: 2.065516 LR: 0.00000601 +[21:18:25] Epoch: 1 Batch: 19955/20099 (99.28%) Loss: 2.189439 LR: 0.00000601 +[21:18:27] Epoch: 1 Batch: 19956/20099 (99.29%) Loss: 2.139875 LR: 0.00000601 +[21:18:28] Epoch: 1 Batch: 19957/20099 (99.29%) Loss: 1.987597 LR: 0.00000601 +[21:18:30] Epoch: 1 Batch: 19958/20099 (99.30%) Loss: 1.785277 LR: 0.00000601 +[21:18:32] Epoch: 1 Batch: 19959/20099 (99.30%) Loss: 2.030840 LR: 0.00000600 +[21:18:33] Epoch: 1 Batch: 19960/20099 (99.31%) Loss: 2.016146 LR: 0.00000600 +[21:18:35] Epoch: 1 Batch: 19961/20099 (99.31%) Loss: 1.983338 LR: 0.00000600 +[21:18:37] Epoch: 1 Batch: 19962/20099 (99.32%) Loss: 1.879156 LR: 0.00000600 +[21:18:38] Epoch: 1 Batch: 19963/20099 (99.32%) Loss: 1.990597 LR: 0.00000600 +[21:18:40] Epoch: 1 Batch: 19964/20099 (99.33%) Loss: 2.050892 LR: 0.00000600 +[21:18:42] Epoch: 1 Batch: 19965/20099 (99.33%) Loss: 2.450272 LR: 0.00000600 +[21:18:44] Epoch: 1 Batch: 19966/20099 (99.34%) Loss: 2.364906 LR: 0.00000600 +[21:18:45] Epoch: 1 Batch: 19967/20099 (99.34%) Loss: 1.904831 LR: 
0.00000600 +[21:18:47] Epoch: 1 Batch: 19968/20099 (99.35%) Loss: 1.992085 LR: 0.00000600 +[21:18:49] Epoch: 1 Batch: 19969/20099 (99.35%) Loss: 1.824592 LR: 0.00000600 +[21:18:50] Epoch: 1 Batch: 19970/20099 (99.36%) Loss: 2.119126 LR: 0.00000600 +[21:18:52] Epoch: 1 Batch: 19971/20099 (99.36%) Loss: 2.127517 LR: 0.00000600 +[21:18:54] Epoch: 1 Batch: 19972/20099 (99.37%) Loss: 2.261355 LR: 0.00000600 +[21:18:55] Epoch: 1 Batch: 19973/20099 (99.37%) Loss: 1.915739 LR: 0.00000600 +[21:18:57] Epoch: 1 Batch: 19974/20099 (99.38%) Loss: 1.710819 LR: 0.00000600 +[21:18:59] Epoch: 1 Batch: 19975/20099 (99.38%) Loss: 2.095576 LR: 0.00000600 +[21:19:01] Epoch: 1 Batch: 19976/20099 (99.39%) Loss: 1.952948 LR: 0.00000600 +[21:19:02] Epoch: 1 Batch: 19977/20099 (99.39%) Loss: 1.988061 LR: 0.00000600 +[21:19:04] Epoch: 1 Batch: 19978/20099 (99.40%) Loss: 2.172117 LR: 0.00000600 +[21:19:06] Epoch: 1 Batch: 19979/20099 (99.40%) Loss: 1.987535 LR: 0.00000600 +[21:19:07] Epoch: 1 Batch: 19980/20099 (99.41%) Loss: 2.006175 LR: 0.00000600 +[21:19:09] Epoch: 1 Batch: 19981/20099 (99.41%) Loss: 2.217468 LR: 0.00000600 +[21:19:11] Epoch: 1 Batch: 19982/20099 (99.42%) Loss: 1.738266 LR: 0.00000600 +[21:19:12] Epoch: 1 Batch: 19983/20099 (99.42%) Loss: 1.652343 LR: 0.00000600 +[21:19:14] Epoch: 1 Batch: 19984/20099 (99.43%) Loss: 1.832038 LR: 0.00000600 +[21:19:16] Epoch: 1 Batch: 19985/20099 (99.43%) Loss: 1.951116 LR: 0.00000600 +[21:19:17] Epoch: 1 Batch: 19986/20099 (99.44%) Loss: 2.049422 LR: 0.00000600 +[21:19:19] Epoch: 1 Batch: 19987/20099 (99.44%) Loss: 2.207121 LR: 0.00000600 +[21:19:21] Epoch: 1 Batch: 19988/20099 (99.45%) Loss: 2.255623 LR: 0.00000600 +[21:19:23] Epoch: 1 Batch: 19989/20099 (99.45%) Loss: 1.987094 LR: 0.00000600 +[21:19:24] Epoch: 1 Batch: 19990/20099 (99.46%) Loss: 2.357598 LR: 0.00000600 +[21:19:26] Epoch: 1 Batch: 19991/20099 (99.46%) Loss: 2.250891 LR: 0.00000600 +[21:19:28] Epoch: 1 Batch: 19992/20099 (99.47%) Loss: 2.123541 LR: 0.00000600 +[21:19:29] 
Epoch: 1 Batch: 19993/20099 (99.47%) Loss: 2.153937 LR: 0.00000600 +[21:19:31] Epoch: 1 Batch: 19994/20099 (99.48%) Loss: 2.160149 LR: 0.00000600 +[21:19:33] Epoch: 1 Batch: 19995/20099 (99.48%) Loss: 2.040557 LR: 0.00000600 +[21:19:34] Epoch: 1 Batch: 19996/20099 (99.49%) Loss: 2.279503 LR: 0.00000600 +[21:19:36] Epoch: 1 Batch: 19997/20099 (99.49%) Loss: 2.209139 LR: 0.00000600 +[21:19:38] Epoch: 1 Batch: 19998/20099 (99.50%) Loss: 2.021509 LR: 0.00000600 +[21:19:40] Epoch: 1 Batch: 19999/20099 (99.50%) Loss: 2.177733 LR: 0.00000600 +[21:19:41] >> Evaluating batch 0 +[21:19:42] >> Evaluating batch 1 +[21:19:43] >> Evaluating batch 2 +[21:19:44] >> Evaluating batch 3 +[21:19:45] >> Evaluating batch 4 +[21:19:46] >> Evaluating batch 5 +[21:19:47] >> Evaluating batch 6 +[21:19:48] >> Evaluating batch 7 +[21:19:49] >> Evaluating batch 8 +[21:19:50] >> Evaluating batch 9 +[21:19:51] >> Evaluating batch 10 +[21:19:52] >> Evaluating batch 11 +[21:19:53] >> Evaluating batch 12 +[21:19:54] >> Evaluating batch 13 +[21:19:55] >> Evaluating batch 14 +[21:19:55] >> Evaluating batch 15 +[21:19:56] >> Evaluating batch 16 +[21:19:57] Epoch: 1 Step: 20000/20099 Evaluation: +[21:19:57] Avg Loss Since Last Eval: 2.0924 Val Loss: 2.1448 Validation loss delta: -0.0014 Perplexity: 8.5400 LR: 0.00000600 +[21:20:01] >> Cleaned up old temp checkpoint: epoch1_step18000 +[21:20:01] >> Temp checkpoint saved: epoch1_step20000, size: 0.1693 GB +[21:20:04] >> Checkpoint saved: epoch1_step20000, size: 0.1693 GB +[21:20:04] Epoch: 1 Batch: 20000/20099 (99.51%) Loss: 1.894609 LR: 0.00000600 +[21:20:06] Epoch: 1 Batch: 20001/20099 (99.51%) Loss: 2.119290 LR: 0.00000600 +[21:20:08] Epoch: 1 Batch: 20002/20099 (99.52%) Loss: 1.900173 LR: 0.00000600 +[21:20:09] Epoch: 1 Batch: 20003/20099 (99.52%) Loss: 1.886672 LR: 0.00000600 +[21:20:11] Epoch: 1 Batch: 20004/20099 (99.53%) Loss: 2.296856 LR: 0.00000600 +[21:20:13] Epoch: 1 Batch: 20005/20099 (99.53%) Loss: 2.127817 LR: 0.00000600 +[21:20:14]
Epoch: 1 Batch: 20006/20099 (99.54%) Loss: 2.125939 LR: 0.00000600 +[21:20:16] Epoch: 1 Batch: 20007/20099 (99.54%) Loss: 2.132201 LR: 0.00000600 +[21:20:18] Epoch: 1 Batch: 20008/20099 (99.55%) Loss: 1.948098 LR: 0.00000600 +[21:20:19] Epoch: 1 Batch: 20009/20099 (99.55%) Loss: 2.311985 LR: 0.00000600 +[21:20:21] Epoch: 1 Batch: 20010/20099 (99.56%) Loss: 2.074785 LR: 0.00000600 +[21:20:23] Epoch: 1 Batch: 20011/20099 (99.56%) Loss: 2.217137 LR: 0.00000600 +[21:20:25] Epoch: 1 Batch: 20012/20099 (99.57%) Loss: 1.919242 LR: 0.00000600 +[21:20:26] Epoch: 1 Batch: 20013/20099 (99.57%) Loss: 2.283978 LR: 0.00000600 +[21:20:28] Epoch: 1 Batch: 20014/20099 (99.58%) Loss: 1.825777 LR: 0.00000600 +[21:20:30] Epoch: 1 Batch: 20015/20099 (99.58%) Loss: 2.039642 LR: 0.00000600 +[21:20:31] Epoch: 1 Batch: 20016/20099 (99.59%) Loss: 2.310894 LR: 0.00000600 +[21:20:33] Epoch: 1 Batch: 20017/20099 (99.59%) Loss: 2.431560 LR: 0.00000600 +[21:20:35] Epoch: 1 Batch: 20018/20099 (99.60%) Loss: 2.110026 LR: 0.00000600 +[21:20:37] Epoch: 1 Batch: 20019/20099 (99.60%) Loss: 1.900888 LR: 0.00000600 +[21:20:38] Epoch: 1 Batch: 20020/20099 (99.61%) Loss: 1.696613 LR: 0.00000600 +[21:20:40] Epoch: 1 Batch: 20021/20099 (99.61%) Loss: 1.971084 LR: 0.00000600 +[21:20:42] Epoch: 1 Batch: 20022/20099 (99.62%) Loss: 2.279451 LR: 0.00000600 +[21:20:43] Epoch: 1 Batch: 20023/20099 (99.62%) Loss: 2.208556 LR: 0.00000600 +[21:20:45] Epoch: 1 Batch: 20024/20099 (99.63%) Loss: 2.193953 LR: 0.00000600 +[21:20:47] Epoch: 1 Batch: 20025/20099 (99.63%) Loss: 2.005806 LR: 0.00000600 +[21:20:49] Epoch: 1 Batch: 20026/20099 (99.64%) Loss: 2.376236 LR: 0.00000600 +[21:20:50] Epoch: 1 Batch: 20027/20099 (99.64%) Loss: 2.040633 LR: 0.00000600 +[21:20:52] Epoch: 1 Batch: 20028/20099 (99.65%) Loss: 1.958661 LR: 0.00000600 +[21:20:54] Epoch: 1 Batch: 20029/20099 (99.65%) Loss: 2.091048 LR: 0.00000600 +[21:20:55] Epoch: 1 Batch: 20030/20099 (99.66%) Loss: 2.196643 LR: 0.00000600 +[21:20:57] Epoch: 1 Batch: 
20031/20099 (99.66%) Loss: 2.165956 LR: 0.00000600 +[21:20:59] Epoch: 1 Batch: 20032/20099 (99.67%) Loss: 1.989054 LR: 0.00000600 +[21:21:00] Epoch: 1 Batch: 20033/20099 (99.67%) Loss: 2.271550 LR: 0.00000600 +[21:21:02] Epoch: 1 Batch: 20034/20099 (99.68%) Loss: 2.020090 LR: 0.00000600 +[21:21:04] Epoch: 1 Batch: 20035/20099 (99.68%) Loss: 2.100436 LR: 0.00000600 +[21:21:06] Epoch: 1 Batch: 20036/20099 (99.69%) Loss: 2.018391 LR: 0.00000600 +[21:21:07] Epoch: 1 Batch: 20037/20099 (99.69%) Loss: 2.015633 LR: 0.00000600 +[21:21:09] Epoch: 1 Batch: 20038/20099 (99.70%) Loss: 2.111664 LR: 0.00000600 +[21:21:11] Epoch: 1 Batch: 20039/20099 (99.70%) Loss: 2.169130 LR: 0.00000600 +[21:21:12] Epoch: 1 Batch: 20040/20099 (99.71%) Loss: 2.284732 LR: 0.00000600 +[21:21:14] Epoch: 1 Batch: 20041/20099 (99.71%) Loss: 2.193870 LR: 0.00000600 +[21:21:16] Epoch: 1 Batch: 20042/20099 (99.72%) Loss: 2.200043 LR: 0.00000600 +[21:21:17] Epoch: 1 Batch: 20043/20099 (99.72%) Loss: 1.802087 LR: 0.00000600 +[21:21:19] Epoch: 1 Batch: 20044/20099 (99.73%) Loss: 1.943886 LR: 0.00000600 +[21:21:21] Epoch: 1 Batch: 20045/20099 (99.73%) Loss: 1.958809 LR: 0.00000600 +[21:21:22] Epoch: 1 Batch: 20046/20099 (99.74%) Loss: 2.254942 LR: 0.00000600 +[21:21:24] Epoch: 1 Batch: 20047/20099 (99.74%) Loss: 2.071559 LR: 0.00000600 +[21:21:26] Epoch: 1 Batch: 20048/20099 (99.75%) Loss: 1.877109 LR: 0.00000600 +[21:21:27] Epoch: 1 Batch: 20049/20099 (99.75%) Loss: 1.975144 LR: 0.00000600 +[21:21:29] Epoch: 1 Batch: 20050/20099 (99.76%) Loss: 2.000994 LR: 0.00000600 +[21:21:31] Epoch: 1 Batch: 20051/20099 (99.76%) Loss: 2.148230 LR: 0.00000600 +[21:21:33] Epoch: 1 Batch: 20052/20099 (99.77%) Loss: 2.332328 LR: 0.00000600 +[21:21:34] Epoch: 1 Batch: 20053/20099 (99.77%) Loss: 2.297668 LR: 0.00000600 +[21:21:36] Epoch: 1 Batch: 20054/20099 (99.78%) Loss: 1.931857 LR: 0.00000600 +[21:21:38] Epoch: 1 Batch: 20055/20099 (99.78%) Loss: 1.881793 LR: 0.00000600 +[21:21:39] Epoch: 1 Batch: 20056/20099 (99.79%) 
Loss: 2.050407 LR: 0.00000600 +[21:21:41] Epoch: 1 Batch: 20057/20099 (99.79%) Loss: 2.232180 LR: 0.00000600 +[21:21:43] Epoch: 1 Batch: 20058/20099 (99.80%) Loss: 2.163333 LR: 0.00000600 +[21:21:44] Epoch: 1 Batch: 20059/20099 (99.80%) Loss: 2.033811 LR: 0.00000600 +[21:21:46] Epoch: 1 Batch: 20060/20099 (99.81%) Loss: 1.958224 LR: 0.00000600 +[21:21:48] Epoch: 1 Batch: 20061/20099 (99.81%) Loss: 1.790466 LR: 0.00000600 +[21:21:49] Epoch: 1 Batch: 20062/20099 (99.82%) Loss: 2.163512 LR: 0.00000600 +[21:21:51] Epoch: 1 Batch: 20063/20099 (99.82%) Loss: 2.442458 LR: 0.00000600 +[21:21:53] Epoch: 1 Batch: 20064/20099 (99.83%) Loss: 1.767469 LR: 0.00000600 +[21:21:55] Epoch: 1 Batch: 20065/20099 (99.83%) Loss: 2.162555 LR: 0.00000600 +[21:21:56] Epoch: 1 Batch: 20066/20099 (99.84%) Loss: 1.913892 LR: 0.00000600 +[21:21:58] Epoch: 1 Batch: 20067/20099 (99.84%) Loss: 2.216797 LR: 0.00000600 +[21:22:00] Epoch: 1 Batch: 20068/20099 (99.85%) Loss: 2.215182 LR: 0.00000600 +[21:22:01] Epoch: 1 Batch: 20069/20099 (99.85%) Loss: 2.041317 LR: 0.00000600 +[21:22:03] Epoch: 1 Batch: 20070/20099 (99.86%) Loss: 1.849959 LR: 0.00000600 +[21:22:05] Epoch: 1 Batch: 20071/20099 (99.86%) Loss: 2.030911 LR: 0.00000600 +[21:22:06] Epoch: 1 Batch: 20072/20099 (99.87%) Loss: 2.323207 LR: 0.00000600 +[21:22:08] Epoch: 1 Batch: 20073/20099 (99.87%) Loss: 2.118185 LR: 0.00000600 +[21:22:10] Epoch: 1 Batch: 20074/20099 (99.88%) Loss: 1.719002 LR: 0.00000600 +[21:22:11] Epoch: 1 Batch: 20075/20099 (99.88%) Loss: 1.980797 LR: 0.00000600 +[21:22:13] Epoch: 1 Batch: 20076/20099 (99.89%) Loss: 2.036379 LR: 0.00000600 +[21:22:15] Epoch: 1 Batch: 20077/20099 (99.89%) Loss: 1.961114 LR: 0.00000600 +[21:22:17] Epoch: 1 Batch: 20078/20099 (99.90%) Loss: 1.979412 LR: 0.00000600 +[21:22:18] Epoch: 1 Batch: 20079/20099 (99.90%) Loss: 2.213207 LR: 0.00000600 +[21:22:20] Epoch: 1 Batch: 20080/20099 (99.91%) Loss: 2.047645 LR: 0.00000600 +[21:22:22] Epoch: 1 Batch: 20081/20099 (99.91%) Loss: 2.121185 LR: 
0.00000600 +[21:22:23] Epoch: 1 Batch: 20082/20099 (99.92%) Loss: 1.899416 LR: 0.00000600 +[21:22:25] Epoch: 1 Batch: 20083/20099 (99.92%) Loss: 2.089800 LR: 0.00000600 +[21:22:27] Epoch: 1 Batch: 20084/20099 (99.93%) Loss: 2.278829 LR: 0.00000600 +[21:22:28] Epoch: 1 Batch: 20085/20099 (99.93%) Loss: 2.147457 LR: 0.00000600 +[21:22:30] Epoch: 1 Batch: 20086/20099 (99.94%) Loss: 2.007672 LR: 0.00000600 +[21:22:32] Epoch: 1 Batch: 20087/20099 (99.94%) Loss: 2.243100 LR: 0.00000600 +[21:22:34] Epoch: 1 Batch: 20088/20099 (99.95%) Loss: 1.995599 LR: 0.00000600 +[21:22:35] Epoch: 1 Batch: 20089/20099 (99.95%) Loss: 2.086331 LR: 0.00000600 +[21:22:37] Epoch: 1 Batch: 20090/20099 (99.96%) Loss: 1.993614 LR: 0.00000600 +[21:22:39] Epoch: 1 Batch: 20091/20099 (99.96%) Loss: 2.170582 LR: 0.00000600 +[21:22:40] Epoch: 1 Batch: 20092/20099 (99.97%) Loss: 2.397893 LR: 0.00000600 +[21:22:42] Epoch: 1 Batch: 20093/20099 (99.97%) Loss: 1.849714 LR: 0.00000600 +[21:22:44] Epoch: 1 Batch: 20094/20099 (99.98%) Loss: 1.872685 LR: 0.00000600 +[21:22:46] Epoch: 1 Batch: 20095/20099 (99.98%) Loss: 1.949284 LR: 0.00000600 +[21:22:47] Epoch: 1 Batch: 20096/20099 (99.99%) Loss: 2.428811 LR: 0.00000600 +[21:22:49] Epoch: 1 Batch: 20097/20099 (99.99%) Loss: 2.222315 LR: 0.00000600 +[21:22:51] Epoch: 1 Batch: 20098/20099 (100.00%) Loss: 1.950915 LR: 0.00000600 +[21:22:52] Epoch: 1 Batch: 20099/20099 (100.00%) Loss: 2.265149 LR: 0.00000600 +[21:22:52] CPU usage: 64.4%, RAM usage: 43.5% +[21:22:52] Memory cleanup after epoch 1 +[21:22:53] CPU usage: 54.0%, RAM usage: 43.5% +[21:22:53] Epoch 1 average loss: 0.1139 +[21:22:53] >> Evaluating batch 0 +[21:22:54] >> Evaluating batch 1 +[21:22:55] >> Evaluating batch 2 +[21:22:56] >> Evaluating batch 3 +[21:22:57] >> Evaluating batch 4 +[21:22:58] >> Evaluating batch 5 +[21:22:59] >> Evaluating batch 6 +[21:23:00] >> Evaluating batch 7 +[21:23:01] >> Evaluating batch 8 +[21:23:02] >> Evaluating batch 9 +[21:23:03] >> Evaluating batch 10 +[21:23:04] 
>> Evaluating batch 11 +[21:23:05] >> Evaluating batch 12 +[21:23:06] >> Evaluating batch 13 +[21:23:06] >> Evaluating batch 14 +[21:23:07] >> Evaluating batch 15 +[21:23:08] >> Evaluating batch 16 +[21:23:09] Epoch: 1 Step: 20099/20099 Evaluation: +[21:23:09] Val Loss: 2.1447 Perplexity: 8.5396 LR: 0.00000600 +[21:23:09] Epoch 1 completed in 2060.51 seconds +[21:23:13] >> Checkpoint saved: epoch1_complete, size: 0.1690 GB +[21:23:16] >> Cleaned up old temp checkpoint: epoch1_step18200 +[21:23:16] >> Temp checkpoint saved: epoch1_step20099, size: 0.1690 GB +[21:23:17] Training complete.