10 lines
34 KiB
Plaintext
[2025-11-06 08:51:25,486] [WARNING] [py.warnings._showwarnmsg:112] [PID:3076758] /root/miniforge3/envs/axolotl/lib/python3.12/site-packages/deepspeed/runtime/zero/partition_parameters.py:240: UserWarning: expandable_segments not supported on this platform (Triggered internally at /pytorch/c10/hip/HIPAllocatorConfig.h:36.)
tensor: Tensor = fn(*args, **kwargs)
[2025-11-06 08:51:32,980] [WARNING] [py.warnings._showwarnmsg:112] [PID:3076758] /root/miniforge3/envs/axolotl/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
Extracting prompt in train dataset (num_proc=160): 0%| | 0/105170 [00:00<?, ? examples/s]
Extracting prompt in train dataset (num_proc=160): 0%| | 110/105170 [00:03<1:03:35, 27.53 examples/s]
Extracting prompt in train dataset (num_proc=160): 4%|█▎ | 3951/105170 [00:04<01:15, 1338.94 examples/s]
Extracting prompt in train dataset (num_proc=160): 6%|█▉ | 5842/105170 [00:04<00:46, 2139.82 examples/s]
Extracting prompt in train dataset (num_proc=160): 7%|██▍ | 7581/105170 [00:04<00:32, 3039.11 examples/s]
Extracting prompt in train dataset (num_proc=160): 9%|███ | 9307/105170 [00:04<00:24, 3949.89 examples/s]
Extracting prompt in train dataset (num_proc=160): 13%|████▏ | 13382/105170 [00:04<00:12, 7534.85 examples/s]
Extracting prompt in train dataset (num_proc=160): 16%|█████▏ | 17022/105170 [00:04<00:07, 11051.92 examples/s]
Extracting prompt in train dataset (num_proc=160): 19%|██████ | 19824/105170 [00:04<00:06, 12624.91 examples/s]
Extracting prompt in train dataset (num_proc=160): 21%|██████▊ | 22283/105170 [00:05<00:06, 13352.98 examples/s]
Extracting prompt in train dataset (num_proc=160): 23%|███████▍ | 24460/105170 [00:05<00:05, 13662.30 examples/s]
Extracting prompt in train dataset (num_proc=160): 25%|████████ | 26478/105170 [00:05<00:05, 13868.55 examples/s]
Extracting prompt in train dataset (num_proc=160): 28%|████████▊ | 29156/105170 [00:05<00:04, 16444.30 examples/s]
Extracting prompt in train dataset (num_proc=160): 30%|█████████▌ | 31230/105170 [00:05<00:04, 16927.48 examples/s]
Extracting prompt in train dataset (num_proc=160): 32%|██████████▏ | 33410/105170 [00:05<00:04, 14539.05 examples/s]
Extracting prompt in train dataset (num_proc=160): 49%|███████████████▋ | 51665/105170 [00:05<00:01, 50088.28 examples/s]
Extracting prompt in train dataset (num_proc=160): 66%|█████████████████████▏ | 69590/105170 [00:05<00:00, 79793.06 examples/s]
Extracting prompt in train dataset (num_proc=160): 90%|███████████████████████████▊ | 94350/105170 [00:06<00:00, 121502.26 examples/s]
Extracting prompt in train dataset (num_proc=160): 100%|███████████████████████████████| 105170/105170 [00:06<00:00, 15164.58 examples/s]
Applying chat template to train dataset (num_proc=160): 0%| | 0/105170 [00:00<?, ? examples/s]
Applying chat template to train dataset (num_proc=160): 0%| | 50/105170 [00:04<2:30:25, 11.65 examples/s]
Applying chat template to train dataset (num_proc=160): 1%|▏ | 675/105170 [00:04<08:15, 210.77 examples/s]
Applying chat template to train dataset (num_proc=160): 2%|▌ | 1938/105170 [00:04<02:17, 750.78 examples/s]
Applying chat template to train dataset (num_proc=160): 3%|▊ | 3050/105170 [00:04<01:16, 1341.84 examples/s]
Applying chat template to train dataset (num_proc=160): 5%|█▍ | 5273/105170 [00:04<00:34, 2935.94 examples/s]
Applying chat template to train dataset (num_proc=160): 6%|█▊ | 6618/105170 [00:04<00:25, 3798.52 examples/s]
Applying chat template to train dataset (num_proc=160): 9%|██▍ | 8989/105170 [00:05<00:16, 5978.80 examples/s]
Applying chat template to train dataset (num_proc=160): 11%|███▏ | 11951/105170 [00:05<00:10, 9212.12 examples/s]
Applying chat template to train dataset (num_proc=160): 13%|███▋ | 14157/105170 [00:05<00:08, 11311.43 examples/s]
Applying chat template to train dataset (num_proc=160): 15%|████▏ | 16107/105170 [00:05<00:07, 12101.63 examples/s]
Applying chat template to train dataset (num_proc=160): 17%|████▌ | 17962/105170 [00:05<00:06, 12748.17 examples/s]
Applying chat template to train dataset (num_proc=160): 20%|█████▎ | 20806/105170 [00:05<00:05, 14755.22 examples/s]
Applying chat template to train dataset (num_proc=160): 22%|█████▊ | 22626/105170 [00:05<00:05, 13836.62 examples/s]
Applying chat template to train dataset (num_proc=160): 23%|██████▏ | 24225/105170 [00:05<00:05, 13621.09 examples/s]
Applying chat template to train dataset (num_proc=160): 24%|██████▌ | 25754/105170 [00:06<00:06, 13176.93 examples/s]
Applying chat template to train dataset (num_proc=160): 27%|███████▎ | 28443/105170 [00:06<00:04, 15705.45 examples/s]
Applying chat template to train dataset (num_proc=160): 29%|███████▋ | 30166/105170 [00:06<00:04, 15903.40 examples/s]
Applying chat template to train dataset (num_proc=160): 30%|████████▏ | 31893/105170 [00:06<00:05, 13227.29 examples/s]
Applying chat template to train dataset (num_proc=160): 33%|████████▊ | 34205/105170 [00:06<00:04, 15487.94 examples/s]
Applying chat template to train dataset (num_proc=160): 34%|█████████▏ | 35947/105170 [00:06<00:05, 13727.27 examples/s]
Applying chat template to train dataset (num_proc=160): 36%|█████████▋ | 37785/105170 [00:06<00:04, 13706.64 examples/s]
Applying chat template to train dataset (num_proc=160): 39%|██████████▌ | 41145/105170 [00:06<00:03, 17628.22 examples/s]
Applying chat template to train dataset (num_proc=160): 41%|███████████ | 43030/105170 [00:07<00:03, 16625.80 examples/s]
Applying chat template to train dataset (num_proc=160): 43%|███████████▌ | 44862/105170 [00:07<00:04, 14248.72 examples/s]
Applying chat template to train dataset (num_proc=160): 44%|███████████▉ | 46451/105170 [00:07<00:04, 12630.48 examples/s]
Applying chat template to train dataset (num_proc=160): 46%|████████████▌ | 48782/105170 [00:07<00:03, 14955.29 examples/s]
Applying chat template to train da
Tokenizing train dataset (num_proc=160): 0%| | 0/105170 [00:00<?, ? examples/s]
Tokenizing train dataset (num_proc=160): 0%| | 10/105170 [00:06<17:47:56, 1.64 examples/s]
Tokenizing train dataset (num_proc=160): 0%| | 32/105170 [00:06<4:24:34, 6.62 examples/s]
Tokenizing train dataset (num_proc=160): 0%| | 56/105170 [00:06<2:05:39, 13.94 examples/s]
Tokenizing train dataset (num_proc=160): 0%| | 116/105170 [00:06<45:03, 38.85 examples/s]
Tokenizing train dataset (num_proc=160): 0%| | 257/105170 [00:06<15:24, 113.44 examples/s]
Tokenizing train dataset (num_proc=160): 0%|▏ | 307/105170 [00:06<12:32, 139.32 examples/s]
Tokenizing train dataset (num_proc=160): 1%|▎ | 581/105170 [00:06<04:51, 358.72 examples/s]
Tokenizing train dataset (num_proc=160): 1%|▍ | 897/105170 [00:07<02:55, 593.51 examples/s]
Tokenizing train dataset (num_proc=160): 1%|▌ | 1198/105170 [00:07<01:59, 869.76 examples/s]
Tokenizing train dataset (num_proc=160): 2%|▋ | 1769/105170 [00:07<01:07, 1535.37 examples/s]
Tokenizing train dataset (num_proc=160): 2%|▊ | 2024/105170 [00:07<01:00, 1706.84 examples/s]
Tokenizing train dataset (num_proc=160): 2%|█ | 2549/105170 [00:07<00:43, 2384.35 examples/s]
Tokenizing train dataset (num_proc=160): 3%|█▏ | 2884/105170 [00:07<00:40, 2545.10 examples/s]
Tokenizing train dataset (num_proc=160): 3%|█▍ | 3320/105170 [00:07<00:39, 2563.07 examples/s]
Tokenizing train dataset (num_proc=160): 4%|█▌ | 3682/105170 [00:07<00:36, 2798.12 examples/s]
Tokenizing train dataset (num_proc=160): 4%|█▊ | 4242/105170 [00:08<00:29, 3474.91 examples/s]
Tokenizing train dataset (num_proc=160): 5%|██ | 4846/105170 [00:08<00:33, 3020.68 examples/s]
Tokenizing train dataset (num_proc=160): 5%|██▍ | 5699/105170 [00:08<00:28, 3521.63 examples/s]
Tokenizing train dataset (num_proc=160): 6%|██▌ | 6152/105170 [00:08<00:29, 3345.92 examples/s]
Tokenizing train dataset (num_proc=160): 6%|██▊ | 6677/105170 [00:08<00:26, 3738.30 examples/s]
Tokenizing train dataset (num_proc=160): 7%|███ | 7316/105170 [00:08<00:22, 4337.63 examples/s]
Tokenizing train dataset (num_proc=160): 7%|███▎ | 7793/105170 [00:09<00:35, 2726.37 examples/s]
Tokenizing train dataset (num_proc=160): 8%|███▍ | 8167/105170 [00:09<00:38, 2540.90 examples/s]
Tokenizing train dataset (num_proc=160): 8%|███▌ | 8496/105170 [00:09<00:37, 2549.51 examples/s]
Tokenizing train dataset (num_proc=160): 9%|███▊ | 9005/105170 [00:09<00:32, 2994.71 examples/s]
Tokenizing train dataset (num_proc=160): 9%|███▉ | 9537/105170 [00:09<00:31, 3079.02 examples/s]
Tokenizing train dataset (num_proc=160): 9%|████▏ | 9899/105170 [00:09<00:29, 3194.26 examples/s]
Tokenizing train dataset (num_proc=160): 10%|████▏ | 10249/105170 [00:10<0