shisa-v2.1-lfm2-1.2b/debug.log
ModelHub XC 224f68937d Initialize project; model provided by the ModelHub XC community
Model: shisa-ai/shisa-v2.1-lfm2-1.2b
Source: Original Platform
2026-06-19 09:44:13 +08:00

[2025-11-06 08:51:25,486] [WARNING] [py.warnings._showwarnmsg:112] [PID:3076758] /root/miniforge3/envs/axolotl/lib/python3.12/site-packages/deepspeed/runtime/zero/partition_parameters.py:240: UserWarning: expandable_segments not supported on this platform (Triggered internally at /pytorch/c10/hip/HIPAllocatorConfig.h:36.)
  tensor: Tensor = fn(*args, **kwargs)
[2025-11-06 08:51:32,980] [WARNING] [py.warnings._showwarnmsg:112] [PID:3076758] /root/miniforge3/envs/axolotl/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
  warnings.warn(  # warn only once
Extracting prompt in train dataset (num_proc=160): 0%| | 0/105170 [00:00<?, ? examples/s]
Extracting prompt in train dataset (num_proc=160): 0%| | 110/105170 [00:03<1:03:35, 27.53 examples/s]
Extracting prompt in train dataset (num_proc=160): 4%|█▎ | 3951/105170 [00:04<01:15, 1338.94 examples/s]
Extracting prompt in train dataset (num_proc=160): 6%|█▉ | 5842/105170 [00:04<00:46, 2139.82 examples/s]
Extracting prompt in train dataset (num_proc=160): 7%|██▍ | 7581/105170 [00:04<00:32, 3039.11 examples/s]
Extracting prompt in train dataset (num_proc=160): 9%|███ | 9307/105170 [00:04<00:24, 3949.89 examples/s]
Extracting prompt in train dataset (num_proc=160): 13%|████▏ | 13382/105170 [00:04<00:12, 7534.85 examples/s]
Extracting prompt in train dataset (num_proc=160): 16%|█████▏ | 17022/105170 [00:04<00:07, 11051.92 examples/s]
Extracting prompt in train dataset (num_proc=160): 19%|██████ | 19824/105170 [00:04<00:06, 12624.91 examples/s]
Extracting prompt in train dataset (num_proc=160): 21%|██████▊ | 22283/105170 [00:05<00:06, 13352.98 examples/s]
Extracting prompt in train dataset (num_proc=160): 23%|███████▍ | 24460/105170 [00:05<00:05, 13662.30 examples/s]
Extracting prompt in train dataset (num_proc=160): 25%|████████ | 26478/105170 [00:05<00:05, 13868.55 examples/s]
Extracting prompt in train dataset (num_proc=160): 28%|████████▊ | 29156/105170 [00:05<00:04, 16444.30 examples/s]
Extracting prompt in train dataset (num_proc=160): 30%|█████████▌ | 31230/105170 [00:05<00:04, 16927.48 examples/s]
Extracting prompt in train dataset (num_proc=160): 32%|██████████▏ | 33410/105170 [00:05<00:04, 14539.05 examples/s]
Extracting prompt in train dataset (num_proc=160): 49%|███████████████▋ | 51665/105170 [00:05<00:01, 50088.28 examples/s]
Extracting prompt in train dataset (num_proc=160): 66%|█████████████████████▏ | 69590/105170 [00:05<00:00, 79793.06 examples/s]
Extracting prompt in train dataset (num_proc=160): 90%|███████████████████████████▊ | 94350/105170 [00:06<00:00, 121502.26 examples/s]
Extracting prompt in train dataset (num_proc=160): 100%|███████████████████████████████| 105170/105170 [00:06<00:00, 15164.58 examples/s]
Applying chat template to train dataset (num_proc=160): 0%| | 0/105170 [00:00<?, ? examples/s]
Applying chat template to train dataset (num_proc=160): 0%| | 50/105170 [00:04<2:30:25, 11.65 examples/s]
Applying chat template to train dataset (num_proc=160): 1%|▏ | 675/105170 [00:04<08:15, 210.77 examples/s]
Applying chat template to train dataset (num_proc=160): 2%|▌ | 1938/105170 [00:04<02:17, 750.78 examples/s]
Applying chat template to train dataset (num_proc=160): 3%|▊ | 3050/105170 [00:04<01:16, 1341.84 examples/s]
Applying chat template to train dataset (num_proc=160): 5%|█▍ | 5273/105170 [00:04<00:34, 2935.94 examples/s]
Applying chat template to train dataset (num_proc=160): 6%|█▊ | 6618/105170 [00:04<00:25, 3798.52 examples/s]
Applying chat template to train dataset (num_proc=160): 9%|██▍ | 8989/105170 [00:05<00:16, 5978.80 examples/s]
Applying chat template to train dataset (num_proc=160): 11%|███▏ | 11951/105170 [00:05<00:10, 9212.12 examples/s]
Applying chat template to train dataset (num_proc=160): 13%|███▋ | 14157/105170 [00:05<00:08, 11311.43 examples/s]
Applying chat template to train dataset (num_proc=160): 15%|████▏ | 16107/105170 [00:05<00:07, 12101.63 examples/s]
Applying chat template to train dataset (num_proc=160): 17%|████▌ | 17962/105170 [00:05<00:06, 12748.17 examples/s]
Applying chat template to train dataset (num_proc=160): 20%|█████▎ | 20806/105170 [00:05<00:05, 14755.22 examples/s]
Applying chat template to train dataset (num_proc=160): 22%|█████▊ | 22626/105170 [00:05<00:05, 13836.62 examples/s]
Applying chat template to train dataset (num_proc=160): 23%|██████▏ | 24225/105170 [00:05<00:05, 13621.09 examples/s]
Applying chat template to train dataset (num_proc=160): 24%|██████▌ | 25754/105170 [00:06<00:06, 13176.93 examples/s]
Applying chat template to train dataset (num_proc=160): 27%|███████▎ | 28443/105170 [00:06<00:04, 15705.45 examples/s]
Applying chat template to train dataset (num_proc=160): 29%|███████▋ | 30166/105170 [00:06<00:04, 15903.40 examples/s]
Applying chat template to train dataset (num_proc=160): 30%|████████▏ | 31893/105170 [00:06<00:05, 13227.29 examples/s]
Applying chat template to train dataset (num_proc=160): 33%|████████▊ | 34205/105170 [00:06<00:04, 15487.94 examples/s]
Applying chat template to train dataset (num_proc=160): 34%|█████████▏ | 35947/105170 [00:06<00:05, 13727.27 examples/s]
Applying chat template to train dataset (num_proc=160): 36%|█████████▋ | 37785/105170 [00:06<00:04, 13706.64 examples/s]
Applying chat template to train dataset (num_proc=160): 39%|██████████▌ | 41145/105170 [00:06<00:03, 17628.22 examples/s]
Applying chat template to train dataset (num_proc=160): 41%|███████████ | 43030/105170 [00:07<00:03, 16625.80 examples/s]
Applying chat template to train dataset (num_proc=160): 43%|███████████▌ | 44862/105170 [00:07<00:04, 14248.72 examples/s]
Applying chat template to train dataset (num_proc=160): 44%|███████████▉ | 46451/105170 [00:07<00:04, 12630.48 examples/s]
Applying chat template to train dataset (num_proc=160): 46%|████████████▌ | 48782/105170 [00:07<00:03, 14955.29 examples/s]
Applying chat template to train dataset (num_proc=160): 48%|████████████▉ | 50448/105170 [00:07<00:03, 15266.67 examples/s]
Applying chat template to train dataset (num_proc=160): 50%|█████████████▍ | 52509/105170 [00:07<00:03, 16492.22 examples/s]
Applying chat template to train dataset (num_proc=160): 52%|█████████████▉ | 54268/105170 [00:07<00:03, 13054.62 examples/s]
Applying chat template to train dataset (num_proc=160): 53%|██████████████▎ | 55781/105170 [00:08<00:03, 12473.00 examples/s]
Applying chat template to train dataset (num_proc=160): 55%|██████████████▊ | 57497/105170 [00:08<00:03, 12879.94 examples/s]
Applying chat template to train dataset (num_proc=160): 56%|███████████████ | 58880/105170 [00:08<00:03, 12737.76 examples/s]
Applying chat template to train dataset (num_proc=160): 59%|███████████████▊ | 61705/105170 [00:08<00:02, 16520.53 examples/s]
Applying chat template to train dataset (num_proc=160): 60%|████████████████▎ | 63508/105170 [00:08<00:02, 16895.09 examples/s]
Applying chat template to train dataset (num_proc=160): 62%|████████████████▊ | 65430/105170 [00:08<00:02, 16404.91 examples/s]
Applying chat template to train dataset (num_proc=160): 65%|█████████████████▌ | 68209/105170 [00:08<00:02, 17411.58 examples/s]
Applying chat template to train dataset (num_proc=160): 67%|█████████████████▉ | 70007/105170 [00:08<00:02, 15183.55 examples/s]
Applying chat template to train dataset (num_proc=160): 68%|██████████████████▍ | 71610/105170 [00:09<00:02, 14578.54 examples/s]
Applying chat template to train dataset (num_proc=160): 70%|██████████████████▉ | 73807/105170 [00:09<00:02, 14997.86 examples/s]
Applying chat template to train dataset (num_proc=160): 72%|███████████████████▎ | 75370/105170 [00:09<00:02, 13514.99 examples/s]
Applying chat template to train dataset (num_proc=160): 73%|███████████████████▋ | 76794/105170 [00:09<00:02, 13564.88 examples/s]
Applying chat template to train dataset (num_proc=160): 74%|████████████████████ | 78181/105170 [00:09<00:02, 12037.23 examples/s]
Applying chat template to train dataset (num_proc=160): 77%|████████████████████▊ | 81009/105170 [00:09<00:01, 15904.44 examples/s]
Applying chat template to train dataset (num_proc=160): 79%|█████████████████████▏ | 82739/105170 [00:09<00:01, 12437.93 examples/s]
Applying chat template to train dataset (num_proc=160): 80%|█████████████████████▋ | 84495/105170 [00:10<00:01, 12873.48 examples/s]
Applying chat template to train dataset (num_proc=160): 82%|██████████████████████▏ | 86312/105170 [00:10<00:01, 13198.06 examples/s]
Applying chat template to train dataset (num_proc=160): 85%|██████████████████████▉ | 89292/105170 [00:10<00:00, 17001.18 examples/s]
Applying chat template to train dataset (num_proc=160): 87%|███████████████████████▍ | 91165/105170 [00:10<00:00, 16199.78 examples/s]
Applying chat template to train dataset (num_proc=160): 88%|███████████████████████▊ | 92909/105170 [00:10<00:00, 14763.83 examples/s]
Applying chat template to train dataset (num_proc=160): 91%|████████████████████████▋ | 96038/105170 [00:10<00:00, 18747.94 examples/s]
Applying chat template to train dataset (num_proc=160): 93%|█████████████████████████▏ | 98086/105170 [00:10<00:00, 18120.33 examples/s]
Applying chat template to train dataset (num_proc=160): 95%|████████████████████████▋ | 100039/105170 [00:10<00:00, 16410.97 examples/s]
Applying chat template to train dataset (num_proc=160): 97%|█████████████████████████▏| 101785/105170 [00:11<00:00, 14486.47 examples/s]
Applying chat template to train dataset (num_proc=160): 98%|█████████████████████████▌| 103358/105170 [00:11<00:00, 13775.91 examples/s]
Applying chat template to train dataset (num_proc=160): 100%|█████████████████████████▉| 104843/105170 [00:11<00:00, 10274.28 examples/s]
Applying chat template to train dataset (num_proc=160): 100%|███████████████████████████| 105170/105170 [00:12<00:00, 8452.21 examples/s]
Tokenizing train dataset (num_proc=160): 0%| | 0/105170 [00:00<?, ? examples/s]
Tokenizing train dataset (num_proc=160): 0%| | 10/105170 [00:06<17:47:56, 1.64 examples/s]
Tokenizing train dataset (num_proc=160): 0%| | 32/105170 [00:06<4:24:34, 6.62 examples/s]
Tokenizing train dataset (num_proc=160): 0%| | 56/105170 [00:06<2:05:39, 13.94 examples/s]
Tokenizing train dataset (num_proc=160): 0%| | 116/105170 [00:06<45:03, 38.85 examples/s]
Tokenizing train dataset (num_proc=160): 0%| | 257/105170 [00:06<15:24, 113.44 examples/s]
Tokenizing train dataset (num_proc=160): 0%|▏ | 307/105170 [00:06<12:32, 139.32 examples/s]
Tokenizing train dataset (num_proc=160): 1%|▎ | 581/105170 [00:06<04:51, 358.72 examples/s]
Tokenizing train dataset (num_proc=160): 1%|▍ | 897/105170 [00:07<02:55, 593.51 examples/s]
Tokenizing train dataset (num_proc=160): 1%|▌ | 1198/105170 [00:07<01:59, 869.76 examples/s]
Tokenizing train dataset (num_proc=160): 2%|▋ | 1769/105170 [00:07<01:07, 1535.37 examples/s]
Tokenizing train dataset (num_proc=160): 2%|▊ | 2024/105170 [00:07<01:00, 1706.84 examples/s]
Tokenizing train dataset (num_proc=160): 2%|█ | 2549/105170 [00:07<00:43, 2384.35 examples/s]
Tokenizing train dataset (num_proc=160): 3%|█▏ | 2884/105170 [00:07<00:40, 2545.10 examples/s]
Tokenizing train dataset (num_proc=160): 3%|█▍ | 3320/105170 [00:07<00:39, 2563.07 examples/s]
Tokenizing train dataset (num_proc=160): 4%|█▌ | 3682/105170 [00:07<00:36, 2798.12 examples/s]
Tokenizing train dataset (num_proc=160): 4%|█▊ | 4242/105170 [00:08<00:29, 3474.91 examples/s]
Tokenizing train dataset (num_proc=160): 5%|██ | 4846/105170 [00:08<00:33, 3020.68 examples/s]
Tokenizing train dataset (num_proc=160): 5%|██▍ | 5699/105170 [00:08<00:28, 3521.63 examples/s]
Tokenizing train dataset (num_proc=160): 6%|██▌ | 6152/105170 [00:08<00:29, 3345.92 examples/s]
Tokenizing train dataset (num_proc=160): 6%|██▊ | 6677/105170 [00:08<00:26, 3738.30 examples/s]
Tokenizing train dataset (num_proc=160): 7%|███ | 7316/105170 [00:08<00:22, 4337.63 examples/s]
Tokenizing train dataset (num_proc=160): 7%|███▎ | 7793/105170 [00:09<00:35, 2726.37 examples/s]
Tokenizing train dataset (num_proc=160): 8%|███▍ | 8167/105170 [00:09<00:38, 2540.90 examples/s]
Tokenizing train dataset (num_proc=160): 8%|███▌ | 8496/105170 [00:09<00:37, 2549.51 examples/s]
Tokenizing train dataset (num_proc=160): 9%|███▊ | 9005/105170 [00:09<00:32, 2994.71 examples/s]
Tokenizing train dataset (num_proc=160): 9%|███▉ | 9537/105170 [00:09<00:31, 3079.02 examples/s]
Tokenizing train dataset (num_proc=160): 9%|████▏ | 9899/105170 [00:09<00:29, 3194.26 examples/s]
Tokenizing train dataset (num_proc=160): 10%|████▏ | 10249/105170 [00:10<00:39, 2401.80 examples/s]
Tokenizing train dataset (num_proc=160): 10%|████▎ | 10535/105170 [00:10<00:42, 2228.54 examples/s]
Tokenizing train dataset (num_proc=160): 10%|████▍ | 10790/105170 [00:10<01:02, 1506.25 examples/s]
Tokenizing train dataset (num_proc=160): 10%|████▍ | 10989/105170 [00:10<01:00, 1561.95 examples/s]
Tokenizing train dataset (num_proc=160): 11%|████▌ | 11186/105170 [00:11<01:33, 1000.43 examples/s]
Tokenizing train dataset (num_proc=160): 11%|████▋ | 11353/105170 [00:11<01:36, 972.19 examples/s]
Tokenizing train dataset (num_proc=160): 11%|████▊ | 11501/105170 [00:11<01:43, 902.76 examples/s]
Tokenizing train dataset (num_proc=160): 11%|████▊ | 11616/105170 [00:11<01:43, 903.41 examples/s]
Tokenizing train dataset (num_proc=160): 11%|████▉ | 11730/105170 [00:11<01:41, 916.36 examples/s]
Tokenizing train dataset (num_proc=160): 11%|████▉ | 12013/105170 [00:11<01:17, 1199.45 examples/s]
Tokenizing train dataset (num_proc=160): 12%|████▉ | 12149/105170 [00:12<01:20, 1162.14 examples/s]
Tokenizing train dataset (num_proc=160): 12%|█████ | 12366/105170 [00:12<01:12, 1285.80 examples/s]
Tokenizing train dataset (num_proc=160): 12%|█████▏ | 12704/105170 [00:12<00:56, 1633.30 examples/s]
Tokenizing train dataset (num_proc=160): 12%|█████▎ | 12876/105170 [00:12<00:59, 1551.22 examples/s]
Tokenizing train dataset (num_proc=160): 12%|█████▎ | 13119/105170 [00:12<00:56, 1632.35 examples/s]
Tokenizing train dataset (num_proc=160): 13%|█████▍ | 13380/105170 [00:12<00:54, 1681.67 examples/s]
Tokenizing train dataset (num_proc=160): 13%|█████▌ | 13562/105170 [00:12<00:56, 1611.60 examples/s]
Tokenizing train dataset (num_proc=160): 13%|█████▋ | 13808/105170 [00:13<00:56, 1603.03 examples/s]
Tokenizing train dataset (num_proc=160): 13%|█████▋ | 14061/105170 [00:13<01:01, 1493.05 examples/s]
Tokenizing train dataset (num_proc=160): 14%|█████▊ | 14266/105170 [00:13<01:08, 1320.66 examples/s]
Tokenizing train dataset (num_proc=160): 14%|█████▉ | 14516/105170 [00:13<01:09, 1313.18 examples/s]
Tokenizing train dataset (num_proc=160): 14%|██████▏ | 15086/105170 [00:13<00:43, 2083.48 examples/s]
Tokenizing train dataset (num_proc=160): 15%|██████▎ | 15328/105170 [00:13<00:43, 2047.49 examples/s]
Tokenizing train dataset (num_proc=160): 15%|██████▎ | 15564/105170 [00:13<00:42, 2090.18 examples/s]
Tokenizing train dataset (num_proc=160): 15%|██████▍ | 15792/105170 [00:14<00:43, 2065.01 examples/s]
Tokenizing train dataset (num_proc=160): 15%|██████▌ | 16009/105170 [00:14<00:55, 1599.65 examples/s]
Tokenizing train dataset (num_proc=160): 15%|██████▊ | 16198/105170 [00:14<01:57, 756.36 examples/s]
Tokenizing train dataset (num_proc=160): 16%|██████▊ | 16339/105170 [00:15<01:59, 745.22 examples/s]
Tokenizing train dataset (num_proc=160): 16%|██████▉ | 16537/105170 [00:15<01:48, 818.59 examples/s]
Tokenizing train dataset (num_proc=160): 16%|███████ | 16809/105170 [00:15<01:33, 944.74 examples/s]
Tokenizing train dataset (num_proc=160): 16%|██████▉ | 17000/105170 [00:15<01:21, 1081.32 examples/s]
Tokenizing train dataset (num_proc=160): 16%|███████▏ | 17326/105170 [00:16<01:27, 999.61 examples/s]
Tokenizing train dataset (num_proc=160): 17%|███████▎ | 17485/105170 [00:16<01:46, 824.02 examples/s]
Tokenizing train dataset (num_proc=160): 17%|███████▍ | 17733/105170 [00:16<01:42, 854.27 examples/s]
Tokenizing train dataset (num_proc=160): 17%|███████▍ | 17843/105170 [00:16<01:50, 793.28 examples/s]
Tokenizing train dataset (num_proc=160): 17%|███████▍ | 18174/105170 [00:16<01:21, 1071.36 examples/s]
Tokenizing train dataset (num_proc=160): 17%|███████▌ | 18361/105170 [00:17<01:12, 1201.67 examples/s]
Tokenizing train dataset (num_proc=160): 18%|███████▌ | 18541/105170 [00:17<01:06, 1303.76 examples/s]
Tokenizing train dataset (num_proc=160): 18%|███████▋ | 18702/105170 [00:17<01:15, 1145.99 examples/s]
Tokenizing train dataset (num_proc=160): 18%|███████▉ | 18836/105170 [00:17<01:28, 973.70 examples/s]
Tokenizing train dataset (num_proc=160): 18%|███████▊ | 19037/105170 [00:17<01:15, 1140.69 examples/s]
Tokenizing train dataset (num_proc=160): 18%|███████▉ | 19280/105170 [00:17<01:02, 1383.82 examples/s]
Tokenizing train dataset (num_proc=160): 19%|███████▉ | 19550/105170 [00:17<00:50, 1682.84 examples/s]
Tokenizing train dataset (num_proc=160): 19%|████████ | 19746/105170 [00:18<00:56, 1520.65 examples/s]
Tokenizing train dataset (num_proc=160): 19%|████████▏ | 19919/105170 [00:18<00:58, 1445.12 examples/s]
Tokenizing train dataset (num_proc=160): 19%|████████▏ | 20078/105170 [00:18<01:03, 1337.83 examples/s]
Tokenizing train dataset (num_proc=160): 19%|████████▎ | 20222/105170 [00:18<01:06, 1273.43 examples/s]
Tokenizing train dataset (num_proc=160): 19%|████████▎ | 20357/105170 [00:18<01:11, 1188.40 examples/s]
Tokenizing train dataset (num_proc=160): 19%|████████▍ | 20494/105170 [00:18<01:11, 1180.03 examples/s]
Tokenizing train dataset (num_proc=160): 20%|████████▌ | 20815/105170 [00:19<01:12, 1160.02 examples/s]
Tokenizing train dataset (num_proc=160): 20%|████████▊ | 20934/105170 [00:19<02:04, 675.69 examples/s]
Tokenizing train dataset (num_proc=160): 20%|████████▊ | 21191/105170 [00:19<01:59, 701.94 examples/s]
Tokenizing train dataset (num_proc=160): 20%|████████▉ | 21407/105170 [00:20<02:05, 667.83 examples/s]
Tokenizing train dataset (num_proc=160): 21%|█████████ | 21653/105170 [00:20<01:57, 710.42 examples/s]
Tokenizing train dataset (num_proc=160): 21%|█████████ | 21743/105170 [00:20<02:07, 656.76 examples/s]
Tokenizing train dataset (num_proc=160): 21%|█████████▏ | 21918/105170 [00:20<01:54, 726.36 examples/s]
Tokenizing train dataset (num_proc=160): 21%|█████████▎ | 22238/105170 [00:21<01:28, 932.79 examples/s]
Tokenizing train dataset (num_proc=160): 21%|█████████▎ | 22364/105170 [00:21<01:31, 900.12 examples/s]
Tokenizing train dataset (num_proc=160): 21%|█████████▍ | 22533/105170 [00:21<01:23, 994.27 examples/s]
Tokenizing train dataset (num_proc=160): 22%|█████████▎ | 22784/105170 [00:21<01:10, 1163.91 examples/s]
Tokenizing train dataset (num_proc=160): 22%|█████████▍ | 22983/105170 [00:21<01:04, 1274.29 examples/s]
Tokenizing train dataset (num_proc=160): 22%|█████████▋ | 23124/105170 [00:21<01:26, 950.38 examples/s]
Tokenizing train dataset (num_proc=160): 22%|█████████▌ | 23267/105170 [00:22<01:20, 1014.10 examples/s]
Tokenizing train dataset (num_proc=160): 22%|█████████▌ | 23482/105170 [00:22<01:13, 1111.32 examples/s]
Tokenizing train dataset (num_proc=160): 22%|█████████▋ | 23604/105170 [00:22<01:19, 1032.16 examples/s]
Tokenizing train dataset (num_proc=160): 23%|█████████▉ | 23714/105170 [00:22<01:42, 796.32 examples/s]
Tokenizing train dataset (num_proc=160): 23%|█████████▉ | 23823/105170 [00:22<01:44, 775.80 examples/s]
Tokenizing train dataset (num_proc=160): 23%|██████████ | 23981/105170 [00:22<01:33, 864.36 examples/s]
Tokenizing train dataset (num_proc=160): 23%|██████████ | 24162/105170 [00:23<01:29, 907.95 examples/s]
Tokenizing train dataset (num_proc=160): 23%|█████████▉ | 24391/105170 [00:23<01:13, 1102.93 examples/s]
Tokenizing train dataset (num_proc=160): 23%|██████████ | 24582/105170 [00:23<01:10, 1141.02 examples/s]
Tokenizing train dataset (num_proc=160): 23%|██████████ | 24703/105170 [00:23<01:14, 1076.54 examples/s]
Tokenizing train dataset (num_proc=160): 24%|██████████▍ | 24819/105170 [00:23<01:23, 960.79 examples/s]
Tokenizing train dataset (num_proc=160): 24%|██████████▍ | 24949/105170 [00:23<01:23, 957.49 examples/s]
Tokenizing train dataset (num_proc=160): 24%|██████████▎ | 25256/105170 [00:23<00:56, 1415.02 examples/s]
Tokenizing train dataset (num_proc=160): 24%|██████████▍ | 25441/105170 [00:24<00:53, 1502.64 examples/s]
Tokenizing train dataset (num_proc=160): 24%|██████████▍ | 25622/105170 [00:24<00:53, 1498.80 examples/s]
Tokenizing train dataset (num_proc=160): 25%|██████████▌ | 25781/105170 [00:24<00:54, 1452.65 examples/s]
Tokenizing train dataset (num_proc=160): 25%|██████████▌ | 25964/105170 [00:24<01:01, 1281.67 examples/s]
Tokenizing train dataset (num_proc=160): 25%|██████████▋ | 26105/105170 [00:24<01:11, 1101.67 examples/s]
Tokenizing train dataset (num_proc=160): 25%|██████████▊ | 26326/105170 [00:24<00:58, 1342.69 examples/s]
Tokenizing train dataset (num_proc=160): 25%|███████████ | 26477/105170 [00:25<01:26, 911.55 examples/s]
Tokenizing train dataset (num_proc=160): 25%|███████████▏ | 26599/105170 [00:25<01:37, 802.05 examples/s]
Tokenizing train dataset (num_proc=160): 26%|██████████▉ | 26863/105170 [00:25<01:11, 1092.16 examples/s]
Tokenizing train dataset (num_proc=160): 26%|███████████ | 27099/105170 [00:25<01:03, 1233.73 examples/s]
Tokenizing train dataset (num_proc=160): 26%|███████████▏ | 27398/105170 [00:25<00:55, 1402.88 examples/s]
Tokenizing train dataset (num_proc=160): 26%|███████████▎ | 27554/105170 [00:25<00:57, 1351.88 examples/s]
Tokenizing train dataset (num_proc=160): 26%|███████████▎ | 27699/105170 [00:25<00:56, 1368.98 examples/s]
Tokenizing train dataset (num_proc=160): 27%|███████████▍ | 27962/105170 [00:26<00:48, 1595.83 examples/s]
Tokenizing train dataset (num_proc=160): 27%|███████████▌ | 28344/105170 [00:26<00:36, 2115.14 examples/s]
Tokenizing train dataset (num_proc=160): 27%|███████████▋ | 28574/105170 [00:26<01:06, 1155.71 examples/s]
Tokenizing train dataset (num_proc=160): 28%|████████████ | 29556/105170 [00:26<00:29, 2568.18 examples/s]
Tokenizing train dataset (num_proc=160): 30%|█████████████ | 31856/105170 [00:26<00:11, 6432.02 examples/s]
Tokenizing train dataset (num_proc=160): 32%|█████████████▌ | 33316/105170 [00:26<00:08, 8155.63 examples/s]
Tokenizing train dataset (num_proc=160): 33%|██████████████ | 34439/105170 [00:27<00:08, 7881.01 examples/s]
Tokenizing train dataset (num_proc=160): 34%|██████████████▍ | 35442/105170 [00:27<00:08, 7904.14 examples/s]
Tokenizing train dataset (num_proc=160): 35%|██████████████▉ | 36388/105170 [00:27<00:08, 8146.29 examples/s]
Tokenizing train dataset (num_proc=160): 37%|███████████████▋ | 39383/105170 [00:27<00:04, 13491.93 examples/s]
Tokenizing train dataset (num_proc=160): 39%|████████████████▍ | 41266/105170 [00:27<00:04, 14882.03 examples/s]
Tokenizing train dataset (num_proc=160): 42%|█████████████████▋ | 44153/105170 [00:27<00:03, 18649.57 examples/s]
Tokenizing train dataset (num_proc=160): 46%|███████████████████ | 47866/105170 [00:27<00:02, 23808.81 examples/s]
Tokenizing train dataset (num_proc=160): 50%|████████████████████▉ | 52351/105170 [00:27<00:01, 29802.64 examples/s]
Tokenizing train dataset (num_proc=160): 53%|██████████████████████▏ | 55542/105170 [00:27<00:01, 29908.27 examples/s]
Tokenizing train dataset (num_proc=160): 56%|███████████████████████▍ | 58625/105170 [00:28<00:01, 25757.21 examples/s]
Tokenizing train dataset (num_proc=160): 60%|█████████████████████████▎ | 63427/105170 [00:28<00:01, 31557.28 examples/s]
Tokenizing train dataset (num_proc=160): 66%|███████████████████████████▋ | 69218/105170 [00:28<00:00, 38704.85 examples/s]
Tokenizing train dataset (num_proc=160): 75%|███████████████████████████████▍ | 78776/105170 [00:28<00:00, 54475.47 examples/s]
Tokenizing train dataset (num_proc=160): 86%|████████████████████████████████████ | 90193/105170 [00:28<00:00, 70913.96 examples/s]
Tokenizing train dataset (num_proc=160): 93%|██████████████████████████████████████▉ | 97524/105170 [00:28<00:00, 69379.22 examples/s]
Tokenizing train dataset (num_proc=160): 100%|██████████████████████████████████████████| 105170/105170 [00:29<00:00, 3577.76 examples/s]